I have a pandas dataframe for example
Col1
Col2
Col3
-1
2
-3
I want to pick the minimum absolute value: table2
Col1
Col2
Col3
Result
-1
2
-3
-1
For now I am using df.abs().idxmin(axis="columns") and I get: table3
Col1
Col2
Col3
Result
-1
2
-3
Col1
I would like to ask how can I convert table 3 to table 2?
Use np.argmin (numpy counterpart of DataFrame.idxmin). Since you want to extract the original values, it's more convenient to access those values at the numpy level.
I added an extra row to your MRE for demonstration:
cols = np.argmin(df.abs().to_numpy(), axis=1) # [0, 2]
rows = range(len(cols)) # [0, 1]
df['Result'] = df.to_numpy()[rows, cols]
# Col1 Col2 Col3 Result
# 0 -1 2 -3 -1
# 1 10 -5 4 4
You can use df.abs().min(axis=1)
>>> import pandas as pd
>>>
>>> data = {'col1': [1], 'col2': [2], 'col3': [-3]}
>>> df = pd.DataFrame(data)
>>> df
col1 col2 col3
0 1 2 -3
>>> df['Result'] = df.abs().min(axis=1)
>>> df
col1 col2 col3 Result
0 1 2 -3 1
Related
I have a Dataframe defined like :
df1 = pd.DataFrame({"col1":[1,np.nan,np.nan,np.nan,2,np.nan,np.nan,np.nan,np.nan],
"col2":[np.nan,3,np.nan,4,np.nan,np.nan,np.nan,5,6],
"col3":[np.nan,np.nan,7,np.nan,np.nan,8,9,np.nan, np.nan]})
I want to transform it into a DataFrame like:
df2 = pd.DataFrame({"col_name":['col1','col2','col3','col2','col1',
'col3','col3','col2','col2'],
"value":[1,3,7,4,2,8,9,5,6]})
If possible, can we reverse this process too? By that I mean convert df2 into df1.
I don't want to go through the DataFrame iteratively as it becomes too computationally expensive.
You can stack it:
out = (df1.stack().astype(int).droplevel(0)
.rename_axis('col_name').reset_index(name='value'))
Output:
col_name value
0 col1 1
1 col2 3
2 col3 7
3 col2 4
4 col1 2
5 col3 8
6 col3 9
7 col2 5
8 col2 6
To go from out back to df1, you could pivot:
out1 = pd.pivot(out.reset_index(), 'index', 'col_name', 'value')
I have a pandas dataframe.
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
col1 col2
0 1 3
1 2 4
I want to add the list lst=[10, 20] element-wise to 'col1' to have the following dataframe.
col1 col2
0 11 3
1 22 4
How to do that?
If you want to edit the column in-place you could do,
df['col1'] += lst
after which df will be,
col1 col2
0 11 3
1 22 4
Similarly, other types of mathematical operations are possible, such as,
df['col1'] *= lst
df['col1'] /= lst
If you want to create a new dataframe after addition,
df1 = df.copy()
df1['col1'] = df['col1'].add(lst, axis=0) # df['col1'].add(lst) outputs a series, df['col1']+lst also works
Now df1 is;
col1 col2
0 11 3
1 22 4
Given the following dataframe:
import pandas as pd
df = pd.DataFrame({'COL1': ['A', np.nan,'A'],
'COL2' : [np.nan,'A','A']})
df
COL1 COL2
0 A NaN
1 NaN A
2 A A
I would like to create a column ('COL3') that uses the value from COL1 per row unless that value is null (or NaN). If the value is null (or NaN), I'd like for it to use the value from COL2.
The desired result is:
COL1 COL2 COL3
0 A NaN A
1 NaN A A
2 A A A
Thanks in advance!
In [8]: df
Out[8]:
COL1 COL2
0 A NaN
1 NaN B
2 A B
In [9]: df["COL3"] = df["COL1"].fillna(df["COL2"])
In [10]: df
Out[10]:
COL1 COL2 COL3
0 A NaN A
1 NaN B B
2 A B A
You can use np.where to conditionally set column values.
df = df.assign(COL3=np.where(df.COL1.isnull(), df.COL2, df.COL1))
>>> df
COL1 COL2 COL3
0 A NaN A
1 NaN A A
2 A A A
If you don't mind mutating the values in COL2, you can update them directly to get your desired result.
df = pd.DataFrame({'COL1': ['A', np.nan,'A'],
'COL2' : [np.nan,'B','B']})
>>> df
COL1 COL2
0 A NaN
1 NaN B
2 A B
df.COL2.update(df.COL1)
>>> df
COL1 COL2
0 A A
1 NaN B
2 A A
Using .combine_first, which gives precedence to non-null values in the Series or DataFrame calling it:
import pandas as pd
import numpy as np
df = pd.DataFrame({'COL1': ['A', np.nan,'A'],
'COL2' : [np.nan,'B','B']})
df['COL3'] = df.COL1.combine_first(df.COL2)
Output:
COL1 COL2 COL3
0 A NaN A
1 NaN B B
2 A B A
If we mod your df slightly then you will see that this works and in fact will work for any number of columns so long as there is a single valid value:
In [5]:
df = pd.DataFrame({'COL1': ['B', np.nan,'B'],
'COL2' : [np.nan,'A','A']})
df
Out[5]:
COL1 COL2
0 B NaN
1 NaN A
2 B A
In [6]:
df.apply(lambda x: x[x.first_valid_index()], axis=1)
Out[6]:
0 B
1 A
2 B
dtype: object
first_valid_index will return the index value (in this case column) that contains the first non-NaN value:
In [7]:
df.apply(lambda x: x.first_valid_index(), axis=1)
Out[7]:
0 COL1
1 COL2
2 COL1
dtype: object
So we can use this to index into the series
You can also use mask which replaces the values where COL1 is NaN by column COL2:
In [8]: df.assign(COL3=df['COL1'].mask(df['COL1'].isna(), df['COL2']))
Out[8]:
COL1 COL2 COL3
0 A NaN A
1 NaN A A
2 A A A
how to get all column names where values in columns are 'f' or 't' into array ?
df['FTI'].value_counts()
instead of this 'FTI' i need array of returned columns. Is it possible?
Reproducible example:
df = pd.DataFrame({'col1':[1,2,3], 'col2':['f', 'f', 'f'], 'col3': ['t','t','t'], 'col4':['d','d','d']})
col1 col2 col3 col4
0 1 f t d
1 2 f t d
2 3 f t d
Such that, using eq and all:
>>> s = (df.eq('t') | df.eq('f')).all()
col1 False
col2 True
col3 True
col4 False
dtype: bool
To get the names:
>>> s[s].index.values
array(['col2', 'col3'], dtype=object)
To get the positions:
>>> np.flatnonzero(s) + 1
array([2, 3])
Yes. It is possible. Here is one way
You can get the columns like this.
cols=[]
for col in df.columns:
if df[col].str.contains('f|t').any()==True:
cols.append(col)
Then you can just use this for frequencies
f= pd.Series()
for col in cols:
f=pd.concat([f,df[col].value_counts()])
Input: CSV with 5 columns.
Expected Output: Unique combinations of 'col1', 'col2', 'col3'.
Sample Input:
col1 col2 col3 col4 col5
0 A B C 11 30
1 A B C 52 10
2 B C A 15 14
3 B C A 1 91
Sample Expected Output:
col1 col2 col3
A B C
B C A
Just expecting this as output. I don't need col4 and col5 in output. And also don't need any sum, count, mean etc. Tried using pandas to achieve this but no luck.
My code:
input_df = pd.read_csv("input.csv");
output_df = input_df.groupby(['col1', 'col2', 'col3'])
This code is returning 'pandas.core.groupby.DataFrameGroupBy object at 0x0000000009134278'.
But I need dataframe like above. Any help much appreciated.
df[['col1', 'col2', 'col3']].drop_duplicates()
First you can use .drop() to delete col4 and col5 as you said you don't need them.
df = df.drop(['col4', 'col5'], axis=1)
Then, you can use .drop_duplicates() to delete the duplicate rows in col1, col2 and col3.
df = df.drop_duplicates(['col1', 'col2', 'col3'])
df
The output:
col1 col2 col3
0 A B C
2 B C A
You noticed that in the output the index is 0, 2 instead of 0,1. To fix that you can do this:
df.index = range(len(df))
df
The output:
col1 col2 col3
0 A B C
1 B C A