hadoop - How to send all records in a SequenceFile to one mapper instance?


First of all, I know that this entirely defeats the purpose of Hadoop, parallelism and MR. That being said, I have a very specific use case.

I want to send the contents of a complete sequence file, no matter how large, to a single mapper instance, but I cannot figure out how to do it.

I know that I can do this in a reducer using an identity mapper, but I do not want to go through the sorting / grouping overhead to bring the data into the reducer.

I also know that I can read a sequence file locally, without a mapper or reducer, but that does not fit my use case either.

Just increase your block size to be a bit larger than the file size. This will ensure that the file goes to a single mapper. The block size has to be set when the file is put into HDFS.
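To see why a larger block size has this effect, here is a rough, standalone sketch of the split arithmetic that FileInputFormat-style input formats apply by default. It assumes the usual rule that the split size is the block size clamped between the minimum and maximum split sizes, with a 10% slop on the last split. The class name `SplitCountSketch` and the file sizes are made up for illustration, and the code does not touch Hadoop itself.

```java
public class SplitCountSketch {
    // Assumed to match the slop factor FileInputFormat uses: the final
    // split may be up to 10% larger than the nominal split size.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped to [minSize, maxSize].
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts how many splits (and therefore mappers) one file would get.
    public static int countSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        int splits = 0;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail of the file becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 300 * mb; // hypothetical 300 MB sequence file

        // Block size smaller than the file: several splits.
        System.out.println(countSplits(fileLength, 128 * mb, 1, Long.MAX_VALUE)); // prints 3

        // Block size larger than the file: a single split.
        System.out.println(countSplits(fileLength, 512 * mb, 1, Long.MAX_VALUE)); // prints 1
    }
}
```

With the block size above the file length the loop never runs, so the whole file ends up in one split and is read by one mapper.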

Since you have indicated that these files are generated by another MR job, rather than put into HDFS by hand:

You can create your own InputFormat and override the getSplits() method.

getSplits() returns an array of InputSplits. Instead of breaking the file into several parts, return a single split.
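A minimal sketch of that idea, assuming the newer `org.apache.hadoop.mapreduce` API, where getSplits() returns a `List<InputSplit>` rather than an array, and assuming the input is read through `SequenceFileInputFormat`. The class name `SingleSplitSequenceFileInputFormat` is hypothetical, and this has not been run against any particular Hadoop version.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

// Hypothetical input format: one split per input file, however large.
public class SingleSplitSequenceFileInputFormat<K, V>
        extends SequenceFileInputFormat<K, V> {

    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        List<InputSplit> splits = new ArrayList<>();
        for (FileStatus file : listStatus(job)) {
            if (file.getLen() == 0) {
                continue; // nothing to read from an empty file
            }
            // A single split covering the file from byte 0 to its end.
            // The empty host list means no data-locality hint is given.
            splits.add(new FileSplit(file.getPath(), 0, file.getLen(), new String[0]));
        }
        return splits;
    }
}
```

It would be wired into the job with `job.setInputFormatClass(SingleSplitSequenceFileInputFormat.class)`. Note that this yields one mapper per input file, so a job whose input path contains several sequence files would still start several mappers.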
