Pandas merge returns a column with _x appended to the name


I have two dataframes. The columns in df1 are A, B, C, D, ... and the columns in df2 are A, B, E, F, ...

The keys I want to merge on are in column A. B is also (most likely) the same in both dataframes. This is a big dataset that I am still cleaning, so I do not have a very good overview of everything yet.

I run

  pd.merge(df1, df2, on='A')

and the result contains a column called B_x. Since the dataset is big and messy, I have not tried to investigate how B_x differs from B in df1 and B in df2.

So my question is just a general one: what does pandas mean when it appends _x to a column name in the merged dataframe?

Thanks

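Suffixes are added for any clashes in column names that are not involved in the merge operation; see the online docs. By default, _x marks the column that came from the left dataframe and _y the one from the right.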
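
As a small illustration of the clash, here is a sketch with two made-up frames (the values are hypothetical; only the column names follow the question). B exists in both frames but is not a merge key, so it comes back twice with suffixes:

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "q", "w"], "E": [0.1, 0.2, 0.4]})

# Merge on A only: B is present in both frames but is not a key,
# so pandas keeps both copies and disambiguates them with suffixes.
merged = pd.merge(df1, df2, on="A")
columns = list(merged.columns)
print(columns)
```

The suffixes can be changed with the `suffixes` argument of `pd.merge`, for example `suffixes=("_left", "_right")`.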

In your case, if you think the B columns are the same, you can simply merge on both columns:

  pd.merge(df1, df2, on=['A', 'B'])

What this will do, however, is return only the rows where A and B exist in both dataframes, because the default merge type is an inner merge.
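
To make the inner-merge behaviour concrete, here is a minimal sketch on hypothetical data. Only one (A, B) pair occurs in both frames, so only one row survives:

```python
import pandas as pd

# Hypothetical toy data: only the pair (1, "x") appears in both frames.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "q", "w"], "E": [0.1, 0.2, 0.4]})

# how="inner" is the default, so it does not need to be spelled out.
inner = pd.merge(df1, df2, on=["A", "B"])
print(inner)
```

Because B is now a key, it appears once in the result, without any suffix.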

So what you could do is compare the size of this merged dataframe with your first one and see whether they are the same. If they are, you can either merge on both columns or simply drop or rename the _x/_y suffixed B columns.
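
One possible reading of that size comparison, sketched on hypothetical data: merge once on A and once on A and B, then compare row counts. This is only a heuristic, and it assumes A has no duplicate keys in either frame. Keeping the left-hand B in the fallback branch is an arbitrary choice made for this illustration.

```python
import pandas as pd

# Hypothetical toy data in which B agrees for every matching A.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "E": [0.1, 0.2, 0.3]})

on_a = pd.merge(df1, df2, on="A")          # has B_x and B_y
on_ab = pd.merge(df1, df2, on=["A", "B"])  # has a single B

# With unique A keys, equal row counts mean B matched wherever A matched.
same_size = len(on_a) == len(on_ab)

if same_size:
    cleaned = on_ab
else:
    # Assumption for this sketch: keep the left frame's B, discard the right's.
    cleaned = on_a.drop(columns="B_y").rename(columns={"B_x": "B"})

print(same_size, list(cleaned.columns))
```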

I would spend time, though, determining whether these values really are the same and exist in both dataframes, in which case you may want to perform an outer merge:

  pd.merge(df1, df2, on=['A', 'B'], how='outer')

Then drop duplicate rows (and possibly any NaN rows), which should give you a clean merged dataframe.

  merged_df.drop_duplicates(subset=['A', 'B'], inplace=True)
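
Putting the outer merge and the clean-up together, here is a hedged sketch on hypothetical data. It uses `subset`, the keyword that current pandas versions accept in `drop_duplicates`, and returns new frames instead of modifying in place:

```python
import pandas as pd

# Hypothetical toy data: df1 has a repeated row, df2 has a key missing from df1.
df1 = pd.DataFrame({"A": [1, 2, 2], "B": ["x", "y", "y"], "C": [10, 20, 20]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "y", "w"], "E": [0.1, 0.2, 0.4]})

# Outer merge keeps rows from both frames, filling gaps with NaN.
merged_df = pd.merge(df1, df2, on=["A", "B"], how="outer")

# Keep one row per (A, B) pair.
deduped = merged_df.drop_duplicates(subset=["A", "B"])

# Optionally drop rows that have NaN in any column.
complete = deduped.dropna()

print(len(merged_df), len(deduped), len(complete))
```

Whether to drop the NaN rows depends on whether keys that appear in only one frame are meaningful for your dataset.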

See the online docs for drop_duplicates.
