I am trying to get the count of each category in a DataFrame:
import pandas as pd
data = {'col_1': ['a', 'b', 'c', 'd', 'c'], 'col_2': [3, 2, 1, 0, 4], 'col3': [99, 88, 77, 66, 55]}
df = pd.DataFrame.from_dict(data)
print(df.groupby(['col_1']).count())
Output:
col_2 col3
col_1
a 1 1
b 1 1
c 2 2
d 1 1
Why are there two columns, "col_2" and "col3", and how do I get only one column named "count"?
Desired output:
col_1 count
a 1
b 1
c 2
d 1
You can do:
print(df.groupby(['col_1'], as_index=False).agg(count=('col_2', 'count')))
Or:
print(df.groupby(['col_1'], as_index=False).size().rename(columns={'size': 'count'}))
Output:
col_1 count
a 1
b 1
c 2
d 1
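For comparison, a third variant, sketched assuming a recent pandas: GroupBy.size() returns one Series holding the number of rows per group, whereas count() counts the non-null values in every remaining column, which is why the original call returned both col_2 and col3.

```python
import pandas as pd

data = {'col_1': ['a', 'b', 'c', 'd', 'c'],
        'col_2': [3, 2, 1, 0, 4],
        'col3': [99, 88, 77, 66, 55]}
df = pd.DataFrame(data)

# size() counts rows per group, so no value column has to be picked;
# reset_index(name=...) turns the resulting Series into a two-column frame
counts = df.groupby('col_1').size().reset_index(name='count')
print(counts)
```

Because size() counts rows rather than non-null values, it can differ from count() when the data contains NaN.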
Is there a way to convert MultiIndex levels into normal value columns? I have a table with a MultiIndex like this:
   level_0  level_1  Value
0        0        0      A
1        0        1      B
2        1        0      C
3        1        1      D
I want to convert level_0 and level_1 to normal columns:
ID  col0  col1  Value
0      0     0      A
1      0     1      B
2      1     0      C
3      1     1      D
Any suggestions?
Thank you!
You can use reset_index followed by rename.
# Setup
import pandas as pd
my_index = pd.MultiIndex.from_arrays([(0, 1, 2, 3),
                                      (0, 0, 1, 1),
                                      (0, 1, 0, 1)],
                                     names=[None, 'level_0', 'level_1'])
df = pd.DataFrame({'Value': ['A', 'B', 'C', 'D']}, index=my_index)
>>> # level=['level_0', 'level_1'] works, too
>>> df = df.reset_index(level=[1, 2])
>>> df
level_0 level_1 Value
0 0 0 A
1 0 1 B
2 1 0 C
3 1 1 D
To rename the columns, you can do:
>>> df.rename(columns={'level_0': 'col0', 'level_1': 'col1'})
col0 col1 Value
0 0 0 A
1 0 1 B
2 1 0 C
3 1 1 D
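Putting both steps together, a sketch that also produces the ID column from the desired output. It assumes the unnamed first index level is what should become ID; the rename_axis('ID') call and the final reset_index() are my additions, not part of the answer above. Drop the final reset_index() if you would rather keep ID as the index.

```python
import pandas as pd

# Rebuild the example: an unnamed first level plus the levels level_0 and level_1
my_index = pd.MultiIndex.from_arrays([(0, 1, 2, 3),
                                      (0, 0, 1, 1),
                                      (0, 1, 0, 1)],
                                     names=[None, 'level_0', 'level_1'])
df = pd.DataFrame({'Value': ['A', 'B', 'C', 'D']}, index=my_index)

out = (df.reset_index(level=['level_0', 'level_1'])   # move the two named levels into columns
         .rename(columns={'level_0': 'col0', 'level_1': 'col1'})
         .rename_axis('ID')                             # name the remaining single-level index
         .reset_index())                                # turn ID into a regular column
print(out)
```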
I have a dataframe:
df =
col1  Aciton1
1     A
2     B
3     C
1     C
I want to edit df so that, when the value of col1 is greater than 1, a new column takes the Aciton1 value from the previous row (and otherwise holds "Initial Action").
So I will get:
col1  Aciton1  Previous_Aciton
1     A        Initial Action
2     B        A
3     C        B
1     C        Initial Action
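If there are no negative or 0 values and each group starts with 1, you can use DataFrameGroupBy.shift:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 1], 'Aciton1': ['A', 'B', 'C', 'C']})
# every 1 in col1 starts a new group; shift Aciton1 down one row within each group
df['Previous_Aciton'] = df.groupby(df['col1'].eq(1).cumsum())['Aciton1'].shift(fill_value='Initial Action')
print (df)
   col1 Aciton1 Previous_Aciton
0     1       A  Initial Action
1     2       B               A
2     3       C               B
3     1       C  Initial Action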
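A variant without groupby that follows the rule exactly as stated in the question (col1 greater than 1 takes the previous row's Aciton1). It is a sketch that assumes the rows are already in the intended order; unlike the shift-within-groups approach, it does not require each group to start at 1.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 1], 'Aciton1': ['A', 'B', 'C', 'C']})

# Previous row's action for every row (the first row gets NaN)
previous = df['Aciton1'].shift()
# Keep it only where col1 > 1; everywhere else use the placeholder text
df['Previous_Aciton'] = previous.where(df['col1'].gt(1), 'Initial Action')
print(df)
```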
Use:
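import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3], 'act1': ['A', 'B', 'C'], 'Previous_Aciton': ['G', 'H', 'I']})
temp = df.index[df['col1'] > 1]  # labels of rows with col1 > 1 (assumes a default RangeIndex)
# assign through a single .loc call instead of chained indexing
df.loc[temp, 'Previous_Aciton'] = df.loc[temp - 1, 'act1'].values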
Let's say I have a (pandas) dataframe like this:
Index A ID B C
1 a 1 0 0
2 b 2 0 0
3 c 2 a a
4 d 3 0 0
I want to copy the data from the third row into the second row, because their IDs match but the second row's data is not filled in. However, I want to leave column 'A' intact. I'm looking for a result like this:
Index A ID B C
1 a 1 0 0
2 b 2 a a
3 c 2 a a
4 d 3 0 0
What would you suggest as solution?
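You can try replacing '0' with NaN, then filling within each ID with ffill() + bfill() via groupby() + apply():
df[['B','C']] = df[['B','C']].replace('0', float('NaN'))
# group_keys=False keeps the original row index so the result aligns on assignment
df[['B','C']] = df.groupby('ID', group_keys=False)[['B','C']].apply(lambda x: x.ffill().bfill()).fillna('0')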
output of df:
Index A ID B C
0 1 a 1 0 0
1 2 b 2 a a
2 3 c 2 a a
3 4 d 3 0 0
Note: you can also use the transform() method in place of apply().
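The same fill can be sketched without apply. This version assumes the placeholders are the integer 0 (as in the dict-based example further down) rather than the string '0', and uses transform('first'), which takes the first non-null value within each ID and broadcasts it back to every row of that ID. If an ID had two different filled rows, only the first would be copied.

```python
import numpy as np
import pandas as pd

# Assumption: unfilled cells hold the integer 0, not the string '0'
df = pd.DataFrame({'A': ['a', 'b', 'c', 'd'], 'ID': [1, 2, 2, 3],
                   'B': [0, 0, 'a', 0], 'C': [0, 0, 'a', 0]}, index=[1, 2, 3, 4])

cols = ['B', 'C']
filled = (df[cols].replace(0, np.nan)   # treat 0 as "not filled"
                  .groupby(df['ID'])     # group rows by their ID
                  .transform('first')    # first non-null value per ID, broadcast back
                  .fillna(0))            # IDs without any data keep their 0
df[cols] = filled                        # column A is never touched
print(df)
```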
You can use combine_first:
s = df.loc[df[["B","C"]].ne("0").all(1)].set_index("ID")[["B", "C"]]
print (s.combine_first(df.set_index("ID")).reset_index())
ID A B C Index
0 1 a 0 0 1.0
1 2 b a a 2.0
2 2 c a a 3.0
3 3 d 0 0 4.0
import pandas as pd
data = { 'A': ['a', 'b', 'c', 'd'], 'ID': [1, 2, 2, 3], 'B': [0, 0, 'a', 0], 'C': [0, 0, 'a', 0]}
df = pd.DataFrame(data)
df.index += 1
index_to_be_replaced = 2
index_to_use_to_replace = 3
columns_to_replace = ['ID', 'B', 'C']
columns_not_to_replace = ['A']
x = df[columns_not_to_replace].loc[index_to_be_replaced]
y = df[columns_to_replace].loc[index_to_use_to_replace]
df.loc[index_to_be_replaced] = pd.concat([x, y])
print(df)
Output:
A ID B C
1 a 1 0 0
2 b 2 a a
3 c 2 a a
4 d 3 0 0
Does this solve your problem? I would also look at other pandas functions, such as join and merge.
For each row in a dataframe, I wish to create duplicates of it with an additional column to identify each duplicate.
E.g., the original dataframe is:
A | A
B | B
I wish to make duplicates of each row with an additional column to identify them, resulting in:
A | A | 1
A | A | 2
B | B | 1
B | B | 2
You can use df.reindex followed by a groupby on df.index.
df = df.reindex(df.index.repeat(2))
df['count'] = df.groupby(level=0).cumcount() + 1
df = df.reset_index(drop=True)
df
a b count
0 A A 1
1 A A 2
2 B B 1
3 B B 2
Similarly, using reindex and assign with np.tile:
import numpy as np
# the counter cycles 1, 2 once per original row, so tile [1, 2] len(df) times
df = (df.reindex(df.index.repeat(2))
        .assign(count=np.tile([1, 2], len(df)))
        .reset_index(drop=True))
df
a b count
0 A A 1
1 A A 2
2 B B 1
3 B B 2
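A generalization to n copies per row, sketched with three rows and n = 3 (both values chosen only for illustration). The counter has to cycle through 1..n once per original row, so it is the range 1..n that gets tiled len(df) times; the test below cross-checks it against the groupby/cumcount approach.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'C'], 'b': ['A', 'B', 'C']})
n = 3  # number of copies per row (illustrative value)

# Repeat each row n times, keeping copies of the same row adjacent
out = df.loc[df.index.repeat(n)]
# Counter 1..n, repeated once for every original row
out = out.assign(count=np.tile(np.arange(1, n + 1), len(df))).reset_index(drop=True)
print(out)
```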
Use Index.repeat with loc, and groupby with cumcount for the counter:
df = pd.DataFrame({'a': ['A', 'B'], 'b': ['A', 'B']})
print (df)
a b
0 A A
1 B B
df = df.loc[df.index.repeat(2)]
df['new'] = df.groupby(level=0).cumcount() + 1
df = df.reset_index(drop=True)
print (df)
a b new
0 A A 1
1 A A 2
2 B B 1
3 B B 2
Or:
import numpy as np
df = df.loc[df.index.repeat(2)]
# the counter cycles 1, 2 for each original row (len(df.index) // 2 rows)
df['new'] = np.tile(range(2), len(df.index) // 2) + 1
df = df.reset_index(drop=True)
print (df)
a b new
0 A A 1
1 A A 2
2 B B 1
3 B B 2
Setup
Borrowed from @jezrael
df = pd.DataFrame({'a': ['A', 'B'], 'b': ['A', 'B']})
a b
0 A A
1 B B
Solution 1
Create a pd.MultiIndex with pd.MultiIndex.from_product
Then use pd.DataFrame.reindex
idx = pd.MultiIndex.from_product(
[df.index, [1, 2]],
names=[df.index.name, 'New']
)
df.reindex(idx, level=0).reset_index('New')
New a b
0 1 A A
0 2 A A
1 1 B B
1 2 B B
Solution 2
This uses the same loc and reindex concept used by @cᴏʟᴅsᴘᴇᴇᴅ and @jezrael, but simplifies the final answer by multiplying a list by an int rather than using np.tile.
df.loc[df.index.repeat(2)].assign(New=[1, 2] * len(df))
a b New
0 A A 1
0 A A 2
1 B B 1
1 B B 2
Use pd.concat() to repeat, and then groupby with cumcount() to count:
In [24]: df = pd.DataFrame({'col1': ['A', 'B'], 'col2': ['A', 'B']})
In [25]: df
Out[25]:
col1 col2
0 A A
1 B B
In [26]: df_repeat = pd.concat([df]*3).sort_index()
In [27]: df_repeat
Out[27]:
col1 col2
0 A A
0 A A
0 A A
1 B B
1 B B
1 B B
In [28]: df_repeat["count"] = df_repeat.groupby(level=0).cumcount() + 1
In [29]: df_repeat # df_repeat.reset_index(drop=True); if index reset required.
Out[29]:
col1 col2 count
0 A A 1
0 A A 2
0 A A 3
1 B B 1
1 B B 2
1 B B 3
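Another option, assuming pandas 1.2 or later (where merge gained how='cross'): build a small frame holding the counter values and take the Cartesian product with it. The cross join preserves the order of the left frame, so the copies of each row stay adjacent without a sort.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B'], 'col2': ['A', 'B']})
copies = pd.DataFrame({'count': [1, 2, 3]})  # one row per desired copy

# Every row of df is paired with every row of copies
out = df.merge(copies, how='cross')
print(out)
```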
I have been trying to rearrange my dataframe to use it as input for a factorplot. The raw data would look like this:
A B C D
1 0 1 2 "T"
2 1 2 3 "F"
3 2 1 0 "F"
4 1 0 2 "T"
...
My question is how can I rearrange it into this form:
col val val2
1 A 0 "T"
1 B 1 "T"
1 C 2 "T"
2 A 1 "F"
...
I was trying:
df = DF.cumsum(axis=0).stack().reset_index(name="val")
However, this produces only one value column, not two. Thanks for your support.
I would use melt; you can then sort the result however you like:
pd.melt(df.reset_index(),id_vars=['index','D'], value_vars=['A','B','C']).sort_values(by='index')
Out[40]:
index D variable value
0 1 T A 0
4 1 T B 1
8 1 T C 2
1 2 F A 1
5 2 F B 2
9 2 F C 3
2 3 F A 2
6 3 F B 1
10 3 F C 0
3 4 T A 1
7 4 T B 0
11 4 T C 2
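Then rename the columns to match the desired output (here the melted result is stored as melted):
melted = pd.melt(df.reset_index(), id_vars=['index', 'D'], value_vars=['A', 'B', 'C']).sort_values(by='index')
melted.set_index('index').rename(columns={'variable': 'col', 'value': 'val', 'D': 'val2'})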
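A sketch that does the whole reshape in one chain, using the question's sample data. It relies on melt's var_name and value_name parameters to name the columns directly, and sorts with kind='mergesort' because a stable sort keeps A, B, C in order within each original row. Naming the index 'row' is my choice, not the question's.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 1], 'B': [1, 2, 1, 0],
                   'C': [2, 3, 0, 2], 'D': ['T', 'F', 'F', 'T']},
                  index=[1, 2, 3, 4])

out = (df.rename_axis('row').reset_index()               # keep the row label as a column
         .melt(id_vars=['row', 'D'], value_vars=['A', 'B', 'C'],
               var_name='col', value_name='val')         # one row per (row, column) pair
         .rename(columns={'D': 'val2'})
         .sort_values('row', kind='mergesort')           # stable: A, B, C stay in order
         .set_index('row')[['col', 'val', 'val2']])
print(out)
```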
Consider your dataframe df:
import pandas as pd
df = pd.DataFrame([
    [0, 1, 2, 'T'],
    [1, 2, 3, 'F'],
    [2, 1, 0, 'F'],
    [1, 0, 2, 'T'],
], index=[1, 2, 3, 4], columns=list('ABCD'))
Solution:
df.set_index('D', append=True) \
  .rename_axis(columns='col') \
  .rename_axis(index=[None, 'val2']) \
  .stack().to_frame('val') \
  .reset_index(['col', 'val2']) \
  [['col', 'val', 'val2']]