Efficient synchronization algorithm
Let's say I have a large dataset at node_a (~10 MB, ~650k lines), and a non-master copy of that dataset at node_b, which means either node may hold some pieces that the other node does not have. My goal is to synchronize the contents of node_b with the contents of node_a. What is the most efficient way to do this?
The common-sense solution would be:
node_a: here is everything I have ... (sends the entire dataset)
node_b: here is what you don't have ... (sends the missing pieces back)
But this solution is far from perfect: every synchronization attempt requires node_a to send the full ~10 MB.
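A minimal sketch of that exchange, modelling the dataset as a set of lines (the function and variable names are just for illustration):

```python
def naive_sync(node_a_lines: set[str], node_b_lines: set[str]) -> set[str]:
    # node_a -> node_b: the entire ~10 MB dataset crosses the wire every time
    sent_by_a = set(node_a_lines)

    # node_b -> node_a: only the pieces node_a does not already have
    missing_on_a = node_b_lines - sent_by_a

    # after the round trip both nodes hold the union of the two datasets
    return sent_by_a | missing_on_a
```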
With a bit of extra bookkeeping, I could partition the dataset and send only the parts that have actually changed (see the sketch below).
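For example, a hedged sketch of that idea, splitting the dataset into fixed-size chunks and exchanging only chunk hashes; the chunk size, hashing scheme, and all names below are assumptions, not a worked-out protocol:

```python
import hashlib

CHUNK_LINES = 1024  # assumed chunk granularity

def chunks(lines: list[str]) -> list[list[str]]:
    return [lines[i:i + CHUNK_LINES] for i in range(0, len(lines), CHUNK_LINES)]

def digest(chunk: list[str]) -> str:
    return hashlib.sha256("\n".join(chunk).encode()).hexdigest()

def sync_b_with_a(a_lines: list[str], b_lines: list[str]) -> list[str]:
    a_chunks, b_chunks = chunks(a_lines), chunks(b_lines)

    # each side fingerprints its chunks; only these small digests cross the wire
    a_digests = [digest(c) for c in a_chunks]
    b_digests = [digest(c) for c in b_chunks]

    # node_b keeps the chunks whose digests already match and asks node_a
    # only for the ones that differ or that it does not have at all
    synced: list[list[str]] = []
    for i, a_chunk in enumerate(a_chunks):
        if i < len(b_chunks) and b_digests[i] == a_digests[i]:
            synced.append(b_chunks[i])   # already in sync, nothing transferred
        else:
            synced.append(a_chunk)       # only this chunk is transferred

    return [line for c in synced for line in c]
```

One caveat with fixed-offset chunks: a single insertion near the top shifts every later chunk, which is the problem rolling-checksum schemes such as rsync's are designed to handle.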
Can you think of any better solution?