hadoop - How to send all records in a SequenceFile to one mapper instance? -
First of all, I know that this goes against the whole point of Hadoop, parallelism, and MapReduce. That said, I have a very specific use case.
I want to send the contents of an entire sequence file, no matter how large, to a single mapper instance, but I don't understand how to accomplish this.
I know that I could do this in a reducer by using an identity mapper, but I don't want to incur the sort/shuffle overhead of moving the data to the reducer.
I also know that I could skip mappers and reducers entirely and read the sequence file locally, but that doesn't fit my use case either.
Just increase your block size to be a bit larger than the file size. This ensures that the file goes to a single mapper. This must be done when putting the file into HDFS.
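A minimal sketch of the upload step, assuming the new HDFS client API; the paths and the 512 MB block size are illustrative assumptions (pick any block size larger than your file, as a multiple of the checksum chunk size, 512 bytes by default):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutWithBigBlock {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side block size used for files created by this client.
        // 512 MB here, assuming the sequence file is smaller than that.
        conf.setLong("dfs.blocksize", 512L * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/local/data.seq"),     // hypothetical local path
                             new Path("/user/me/data.seq"));  // hypothetical HDFS path
    }
}
```

Because the whole file then fits in one block, the default FileInputFormat split computation (one split per block, roughly) produces a single split, and therefore a single mapper.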
Since you indicated that these files are generated by another MR job:
You can create your own InputFormat and override the getSplits() method.
getSplits() returns an array of InputSplits; instead of breaking the file into several splits, return a single split covering the whole file.
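A sketch of this idea, assuming the new `org.apache.hadoop.mapreduce` API. Rather than reimplementing getSplits() by hand, the same effect can be had by overriding isSplitable() to return false, since FileInputFormat.getSplits() then emits exactly one split per file; the class name is an illustrative assumption:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

// Declares every input file non-splittable, so the inherited
// getSplits() returns a single split spanning each whole file.
public class WholeFileSequenceInputFormat<K, V>
        extends SequenceFileInputFormat<K, V> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split: each file goes to one mapper
    }
}
```

Wire it into the job with `job.setInputFormatClass(WholeFileSequenceInputFormat.class);`. Note this gives one mapper per input file; if the job's input directory holds several sequence files and you need them all in one mapper, you would still have to override getSplits() itself (or merge the files first).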