python - Detect and exclude outliers in Pandas dataframe
I have a pandas DataFrame with a few columns.
Now I know that certain rows are outliers, based on the value in a particular column.
For example, the column 'Volume' has all its values around 12.xx, and one value is 4000 (an outlier).
I would like to exclude the rows whose 'Volume' value is like this.
So, essentially, I need to put a filter on the DataFrame that selects all rows where the value of a given column is within, say, 3 standard deviations of the mean.
Use boolean indexing, as you would with a numpy.array:

    import numpy as np
    import pandas as pd

    # Example dataset of normally distributed data.
    df = pd.DataFrame({'Data': np.random.normal(size=200)})

    # Keep only the rows that are within +3 to -3 standard deviations in the column 'Data'.
    df[np.abs(df.Data - df.Data.mean()) <= (3 * df.Data.std())]

    # Or, if you prefer the other way around:
    df[~(np.abs(df.Data - df.Data.mean()) > (3 * df.Data.std()))]

For a Series it is similar:

    S = pd.Series(np.random.normal(size=200))
    S[~((S - S.mean()).abs() > 3 * S.std())]
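To tie this back to the 'Volume' example in the question, here is a minimal sketch that wraps the same boolean mask in a small helper. The function name `drop_outliers`, its `n_std` parameter, and the generated sample data are assumptions made for illustration; they are not part of pandas or of the question's real data.

```python
import numpy as np
import pandas as pd


def drop_outliers(df, column, n_std=3.0):
    """Keep rows whose `column` value is within n_std standard deviations of the column mean.

    Hypothetical helper for illustration; not a pandas API.
    """
    deviation = (df[column] - df[column].mean()).abs()
    return df[deviation <= n_std * df[column].std()]


# Made-up data: 199 values close to 12.5 plus one extreme value of 4000.
rng = np.random.default_rng(0)
volume = np.append(rng.normal(loc=12.5, scale=0.2, size=199), 4000.0)
df = pd.DataFrame({"Volume": volume})

filtered = drop_outliers(df, "Volume")
print(len(df), len(filtered))  # 200 199
```

Note that the mean and standard deviation are computed with the outlier still in the data, so a very extreme value inflates both. With only a handful of rows this can keep the outlier inside the 3-standard-deviation band. Rows where the column is NaN are dropped as well, because a comparison with NaN evaluates to False.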