Efficient synchronization algorithm


Let's say I have a large dataset at node_a (~10 MB, ~650k lines). There is also a copy of the dataset at node_b, but neither copy is the master version, which means there may be pieces on one node that are not available on the other. My goal is to synchronize the contents of node_b with the contents of node_a. What is the most efficient way to do this?

The common-sense solution would be:

node_a: here is everything I have ... (sends the entire dataset)

node_b: and here is what you don't have ... (sends back the missing parts)

But this solution is far from perfect: every synchronization attempt requires node_a to send the full ~10 MB.
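For concreteness, here is a minimal sketch of that naive exchange. It assumes the dataset can be treated as a set of lines and ignores the actual transport; both are my simplifying assumptions, not part of the setup above.

```python
def naive_sync(node_a_lines: set[str], node_b_lines: set[str]) -> None:
    """Naive bidirectional sync: node_a ships everything, node_b replies with the rest."""
    # node_a sends the entire dataset (~10 MB every single time)
    received_from_a = set(node_a_lines)

    # node_b merges in everything it was missing
    node_b_lines |= received_from_a

    # node_b replies with the parts node_a does not have
    missing_on_a = node_b_lines - received_from_a
    node_a_lines |= missing_on_a
```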

Being a bit smarter about it, I could partition the dataset and send only the partitions that differ (see the sketch below).
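Here is a rough sketch of that partition idea, assuming fixed-size chunks of lines and a per-chunk digest so node_a only has to ship the digests plus the chunks that actually differ. The chunk size and hashing scheme are illustrative assumptions, not something fixed by the problem.

```python
import hashlib

CHUNK_LINES = 1000  # assumed partition size


def chunk(lines: list[str]) -> list[list[str]]:
    """Split the dataset into fixed-size partitions of lines."""
    return [lines[i:i + CHUNK_LINES] for i in range(0, len(lines), CHUNK_LINES)]


def digest(chunk_lines: list[str]) -> str:
    """Compute a digest for one partition."""
    h = hashlib.sha256()
    for line in chunk_lines:
        h.update(line.encode("utf-8"))
    return h.hexdigest()


def sync_b_from_a(a_lines: list[str], b_lines: list[str]) -> list[str]:
    """Return node_b's new contents, transferring only the partitions that differ."""
    a_chunks, b_chunks = chunk(a_lines), chunk(b_lines)

    # Step 1: node_a sends only the per-chunk digests (a few KB instead of ~10 MB).
    a_digests = [digest(c) for c in a_chunks]

    # Step 2: node_b keeps the chunks whose digests match and requests the rest.
    new_b: list[str] = []
    for i, d in enumerate(a_digests):
        if i < len(b_chunks) and digest(b_chunks[i]) == d:
            new_b.extend(b_chunks[i])   # identical chunk, keep the local copy
        else:
            new_b.extend(a_chunks[i])   # transfer only this differing chunk
    return new_b
```

One obvious weakness of this sketch is that fixed-position chunking misaligns as soon as a line is inserted in the middle, so every later chunk appears to differ.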

Can you think of any better solution?