apache pig - Embed shell in PIG script
I'm new to Pig and to shell pattern matching.
I have a file whose third column holds content like "M2534896R402Qnew". I need to extract the number between 'M' and 'R'. In my Pig script I have:

    raw = LOAD 'record.txt' USING PigStorage('\t') AS (chararray, chararray, chararray, chararray);
    data = STREAM raw THROUGH `command command`;

How can I transform the third column so that this number is extracted from the third column of every record in raw?
Thank you.
There is no need to use streaming for this. Pig's built-in UDF REGEX_EXTRACT can already handle it:

    $ cat record.txt
    f1 f2 M2534896R402Qnew f4
    f1 f2 M2534896R987Qxyz f4
    f1 f2 M2534897R421Qabc f4
    f1 f2 M47Rzxcvzxcv f4
    f1 f2 12345M000R f4
    f1 f2 M23551finn f4
    f1 f2 M298793R133R23quin f4

    $ cat test.pig
    raw = LOAD 'record.txt' USING PigStorage('\t') AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray);
    ext = FOREACH raw GENERATE REGEX_EXTRACT(f3, 'M(\\d+)R', 1);
    DUMP ext;

    $ pig -x local test.pig
    (2534896)
    (2534896)
    (2534897)
    (47)
    (000)
    ()
    (298793)

The sixth record has no 'R' after the digits, so the pattern does not match and the result is null, which DUMP prints as an empty tuple.

Note that REGEX_EXTRACT returns a chararray. If you want an int, you have to cast it.
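The cast mentioned in the note could look roughly like the sketch below. It assumes the same four-column, tab-separated record.txt as in the answer; the aliases `ext` and `num` are illustrative names, not something the original script requires.

```pig
-- Load the tab-separated file with an explicit schema (same as in the answer).
raw = LOAD 'record.txt' USING PigStorage('\t')
      AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray);

-- REGEX_EXTRACT returns a chararray, so cast the captured group to int.
-- Rows where the pattern does not match yield null, and the cast keeps them null.
ext = FOREACH raw GENERATE (int) REGEX_EXTRACT(f3, 'M(\\d+)R', 1) AS num;

DUMP ext;
```

With the cast in place, the field can be used in numeric comparisons or arithmetic; a FILTER on `num IS NOT NULL` would drop the records that did not match.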