python - Detect and exclude outliers in Pandas dataframe


I have a pandas dataframe with a few columns.

Now I know that certain rows are outliers, based on the value in a certain column.

For example, the column 'Volume' has all its values around 12.xx, except for one value that is 4000.

Now I would like to exclude the rows whose value in that column is an outlier like this.

So, essentially, I need to put a filter on the dataframe that selects all the rows where the values of a given column are within, say, 3 standard deviations of the mean.

Use boolean indexing, as you would in numpy.array:

  import numpy as np
  import pandas as pd

  # Example dataset of normally distributed data.
  df = pd.DataFrame({'Data': np.random.normal(size=200)})

  # Keep only the rows that are within +3 to -3 standard deviations in the column 'Data'.
  df[np.abs(df.Data - df.Data.mean()) <= (3 * df.Data.std())]

  # Or, if you prefer it the other way around:
  df[~(np.abs(df.Data - df.Data.mean()) > (3 * df.Data.std()))]

For a Series it is similar:

  S = pd.Series(np.random.normal(size=200))
  S[~((S - S.mean()).abs() > 3 * S.std())]
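To tie this back to the 'Volume' example from the question, here is a minimal sketch that wraps the same boolean-indexing filter in a helper. The function name `drop_outliers`, its `n_std` parameter, and the sample data (50 values near 12.5 plus a single 4000) are assumptions made up for illustration; they are not part of pandas or of the original question.

```python
import numpy as np
import pandas as pd


def drop_outliers(df, column, n_std=3.0):
    """Hypothetical helper: keep rows whose `column` value lies within
    n_std standard deviations of that column's mean."""
    deviation = (df[column] - df[column].mean()).abs()
    return df[deviation <= n_std * df[column].std()]


# Made-up data: 50 values near 12.5 plus a single extreme value of 4000.
rng = np.random.default_rng(0)
volume = np.append(rng.normal(loc=12.5, scale=0.2, size=50), 4000.0)
df = pd.DataFrame({"Volume": volume})

filtered = drop_outliers(df, "Volume")
print(len(df), len(filtered))
```

Note that the mean and standard deviation are computed with the outlier still included, so in very small samples a single extreme value may inflate the standard deviation enough that it never falls outside the 3-sigma band. With the 51 rows used here the 4000 row is far enough out to be dropped.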
