df I have:
   A  B  C
a  1  2  3
b  2  1  4
c  1  1  1
df I want:
   A  B  C
a  1  2  3
b  2  1  4
c  1  1  1
d  1 -1  1
I am able to get the df I want by using:
df.loc['d'] = df.loc['b'] - df.loc['a']
However, my actual df has 'a', 'b', 'c' rows for multiple IDs 'X', 'Y', etc.:
     A  B  C
X a  1  2  3
  b  2  1  4
  c  1  1  1
Y a  1  2  3
  b  2  1  4
  c  1  1  1
How can I create the same output with multiple IDs?
My original method:
df.loc['d'] = df.loc['b'] - df.loc['a']
fails with KeyError: 'b', because 'b' now lives in the second level of the MultiIndex while df.loc looks in the first.
Desired output:
     A  B  C
X a  1  2  3
  b  2  1  4
  c  1  1  1
  d  1 -1  1
Y a  1  2  3
  b  2  2  4
  c  1  1  1
  d  1  0  1
IIUC, loop over the first index level with groupby:
for i, sub in df.groupby(df.index.get_level_values(0)):
    df.loc[(i, 'd'), :] = sub.loc[(i, 'b')] - sub.loc[(i, 'a')]
print(df.sort_index())
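As a self-contained sketch (the two-level sample frame below is an assumption that mirrors the X/Y data in the question):

```python
import pandas as pd

# Rebuild the example: a two-level index of (ID, row label)
idx = pd.MultiIndex.from_product([['X', 'Y'], ['a', 'b', 'c']])
df = pd.DataFrame([[1, 2, 3], [2, 1, 4], [1, 1, 1]] * 2,
                  index=idx, columns=['A', 'B', 'C'])

# For each ID, add a 'd' row holding b - a
for i, sub in df.groupby(df.index.get_level_values(0)):
    df.loc[(i, 'd'), :] = sub.loc[(i, 'b')] - sub.loc[(i, 'a')]

print(df.sort_index())
```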
Or maybe:
k = (df.groupby(df.index.get_level_values(0), as_index=False)
       .apply(lambda s: pd.DataFrame(
           [s.loc[(s.name, 'b')].values - s.loc[(s.name, 'a')].values],
           columns=s.columns,
           index=pd.MultiIndex(levels=[[s.name], ['d']], codes=[[0], [0]])))
       .reset_index(drop=True, level=0))
pd.concat([k, df]).sort_index()
Data reshaping is a useful trick if you want to do a manipulation on a particular level of a MultiIndex. See the code below:
result = (df.unstack(0).T
            .assign(d=lambda x: x.b - x.a)
            .stack()
            .unstack(0))
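For reference, a runnable version of this reshape (the sample frame is assumed, matching the question's data):

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['X', 'Y'], ['a', 'b', 'c']])
df = pd.DataFrame([[1, 2, 3], [2, 1, 4], [1, 1, 1]] * 2,
                  index=idx, columns=['A', 'B', 'C'])

# unstack/transpose turns 'a'/'b'/'c' into columns that assign()
# can address by name; stack/unstack then restores the original shape
result = (df.unstack(0).T
            .assign(d=lambda x: x.b - x.a)
            .stack()
            .unstack(0))
print(result)
```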
Use pd.IndexSlice to slice rows 'a' and 'b'. Call diff, keep only the 'b' rows, and rename them to 'd'. Finally, concatenate the result onto the original df (DataFrame.append did the same job before its removal in pandas 2.0):
idx = pd.IndexSlice
df1 = df.loc[idx[:, ['a', 'b']], :].diff().loc[idx[:, 'b'], :].rename({'b': 'd'})
df2 = pd.concat([df, df1]).sort_index().astype(int)
Out[106]:
     A  B  C
X a  1  2  3
  b  2  1  4
  c  1  1  1
  d  1 -1  1
Y a  1  2  3
  b  2  2  4
  c  1  1  1
  d  1  0  1
I am trying to create a "two-entry table" from the columns of my df. I tried pivot_table / crosstab / groupby, but none of them produce the "two-entry table" layout I am after.
For example, if I have a dataframe like this:
df
A  B  C  D  E
1  0  0  1  1
0  1  0  1  0
1  1  1  1  1
I would like to transform my df into one that reads like a "two-entry table":
   A  B  C  D  E
A  2  1  1  2  2
B  1  2  1  2  1
C  1  1  1  1  1
D  2  2  1  3  1
E  2  1  1  1  2
To explain the first row: A has two 1s in its column, so A-A = 2; A-B = 1 because they share a 1 in the third row of df; A-C = 1 because they also share a 1 in the third row; and finally A-E = 2 because they share 1s in the first and third rows of df.
Use pd.DataFrame.dot with T:
df.T.dot(df)  # or df.T @ df
Output:
   A  B  C  D  E
A  2  1  1  2  2
B  1  2  1  2  1
C  1  1  1  1  1
D  2  2  1  3  2
E  2  1  1  2  2
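A quick check of the matrix-product identity behind this, with the question's data rebuilt by hand:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1], 'B': [0, 1, 1], 'C': [0, 0, 1],
                   'D': [1, 1, 1], 'E': [1, 0, 1]})

# Cell (i, j) of df.T @ df sums the products of columns i and j,
# i.e. it counts the rows where both 0/1 indicators are 1
co = df.T.dot(df)
print(co)
```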
Consider the following DataFrame:
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [list('abc'), list('def'), list('ghi')]})
>>> df
   A          B
0  1  [a, b, c]
1  2  [d, e, f]
2  3  [g, h, i]
This is one way to get the desired result:
>>> df['B'] = df['B'].apply(enumerate).apply(list)
>>> df = df.explode('B', ignore_index=True)
>>> df[['B1', 'B2']] = pd.DataFrame(df['B'].tolist())
>>> df.drop(columns='B')
   A  B1 B2
0  1   0  a
1  1   1  b
2  1   2  c
3  2   0  d
4  2   1  e
5  2   2  f
6  3   0  g
7  3   1  h
8  3   2  i
Is there a neater way?
A groupby on the index is an option:
df.explode('B').assign(
    B1=lambda df: df.groupby(level=0).cumcount())
   A  B  B1
0  1  a   0
0  1  b   1
0  1  c   2
1  2  d   0
1  2  e   1
1  2  f   2
2  3  g   0
2  3  h   1
2  3  i   2
You can always reset the index if you have no use for it:
df.explode('B').assign(
    B1=lambda df: df.groupby(level=0).cumcount()).reset_index(drop=True)
   A  B  B1
0  1  a   0
1  1  b   1
2  1  c   2
3  2  d   0
4  2  e   1
5  2  f   2
6  3  g   0
7  3  h   1
8  3  i   2
Since pandas version 1.3.0 you can explode multiple columns out of the box:
df.assign(
    B1=df.B.apply(len).apply(range)).explode(['B', 'B1'], ignore_index=True)
A B B1
0 1 a 0
1 1 b 1
2 1 c 2
3 2 d 0
4 2 e 1
5 2 f 2
6 3 g 0
7 3 h 1
8 3 i 2
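Put together as a runnable snippet (pandas >= 1.3; the sample frame is assumed):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [list('abc'), list('def'), list('ghi')]})

# Build a parallel range column, then explode both columns in one call;
# this requires the per-row list-likes to have matching lengths
out = df.assign(B1=df.B.apply(len).apply(range)).explode(
    ['B', 'B1'], ignore_index=True)
print(out)
```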
I think a faster option would be to run the reshaping outside pandas and then join back to the dataframe (of course, only tests can confirm or deny this):
from itertools import chain

# you can use np.concatenate instead:
# np.concatenate(df.B)
flattened = chain.from_iterable(df.B)
index = df.index.repeat([*map(len, df.B)])
flattened = pd.Series(flattened, index, name='B1')
(pd.concat([df.A, flattened], axis=1)
   .assign(B2=lambda df: df.groupby(level=0).cumcount()))
   A B1  B2
0  1  a   0
0  1  b   1
0  1  c   2
1  2  d   0
1  2  e   1
1  2  f   2
2  3  g   0
2  3  h   1
2  3  i   2
The pandas dataframe includes two columns, 'A' and 'B':
A  B
1  a b
2  a c d
3  x
Each value in column 'B' is a string containing a variable number of letters separated by spaces.
Is there a simple way to construct:
A B
1 a
1 b
2 a
2 c
2 d
3 x
You can use the following:
splitted = df.set_index("A")["B"].str.split(expand=True)
stacked = splitted.stack().reset_index(1, drop=True)
result = stacked.to_frame("B").reset_index()
print(result)
A B
0 1 a
1 1 b
2 2 a
3 2 c
4 2 d
5 3 x
For the sub steps, see below:
print(splitted)
0 1 2
A
1 a b None
2 a c d
3 x None None
print(stacked)
A
1 a
1 b
2 a
2 c
2 d
3 x
dtype: object
Or you may also use pd.melt:
splitted = df["B"].str.split(expand=True)
pd.melt(splitted.assign(A=df.A), id_vars="A", value_name="B")\
.dropna()\
.drop("variable", axis=1)\
.sort_values("A")
A B
0 1 a
3 1 b
1 2 a
4 2 c
7 2 d
2 3 x
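The split/stack route end to end, with the question's data assumed:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a b', 'a c d', 'x']})

# expand=True splits into as many columns as the longest row needs,
# padding with None; stack() then drops those gaps
splitted = df.set_index('A')['B'].str.split(expand=True)
result = (splitted.stack()
                  .reset_index(1, drop=True)
                  .to_frame('B')
                  .reset_index())
print(result)
```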
I have a dataframe with multiple group columns and a value column.
   a  b  val
0  A  C    1
1  A  D    1
2  A  D    1
3  A  D    2
4  B  E    0
For any one group, e.g. a == A, b == C, I can do value_counts on the series slice. How can I get the value counts of all possible combinations of the group columns, in a dataframe format similar to:
   a  b  val  counts
0  A  C    1       1
1  A  D    1       2
2  A  D    2       1
3  B  E    0       1
Is that what you want?
In [47]: df.groupby(['a','b','val']).size().reset_index()
Out[47]:
a b val 0
0 A C 1 1
1 A D 1 2
2 A D 2 1
3 B E 0 1
or this?
In [43]: df['counts'] = df.groupby(['a','b'])['val'].transform('size')
In [44]: df
Out[44]:
a b val counts
0 A C 1 1
1 A D 1 3
2 A D 1 3
3 A D 2 3
4 B E 0 1
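A self-contained version of the first approach; passing name= to reset_index labels the count column directly instead of leaving it as 0 (sample data copied from the question):

```python
import pandas as pd

df = pd.DataFrame({'a': list('AAAAB'),
                   'b': list('CDDDE'),
                   'val': [1, 1, 1, 2, 0]})

# size() counts rows per unique (a, b, val) combination
counts = (df.groupby(['a', 'b', 'val'])
            .size()
            .reset_index(name='counts'))
print(counts)
```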
I have n-by-n data in a csv in the following format:
-  A  B  C  D
A  0  1  2  4
B  2  0  3  1
C  1  0  0  5
D  2  5  4  0
...
I would like to read it and convert it to a long-format pandas dataframe like this:
Origin Dest Distance
A A 0
A B 1
A C 2
...
What is the best way to convert it? In the worst case, I'll write a for loop to read each line and append the transpose of it but there must be an easier way. Any help would be appreciated.
Use pd.melt()
Assuming, your dataframe looks like
In [479]: df
Out[479]:
- A B C D
0 A 0 1 2 4
1 B 2 0 3 1
2 C 1 0 0 5
3 D 2 5 4 0
In [480]: pd.melt(df, id_vars=['-'], value_vars=df.columns.values.tolist()[1:],
.....: var_name='Dest', value_name='Distance')
Out[480]:
- Dest Distance
0 A A 0
1 B A 2
2 C A 1
3 D A 2
4 A B 1
5 B B 0
6 C B 0
7 D B 5
8 A C 2
9 B C 3
10 C C 0
11 D C 4
12 A D 4
13 B D 1
14 C D 5
15 D D 0
Here df.columns.values.tolist()[1:] is the list of remaining columns, ['A', 'B', 'C', 'D'].
To rename '-' to 'Origin', you could use DataFrame.rename(columns={...}):
pd.melt(df, id_vars=['-'], value_vars=df.columns.values.tolist()[1:],
var_name='Dest', value_name='Distance').rename(columns={'-': 'Origin'})
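Runnable end to end (the matrix from the question, rebuilt by hand; value_vars can be omitted, since melt defaults to all non-id columns):

```python
import pandas as pd

df = pd.DataFrame({'-': list('ABCD'),
                   'A': [0, 2, 1, 2],
                   'B': [1, 0, 0, 5],
                   'C': [2, 3, 0, 4],
                   'D': [4, 1, 5, 0]})

# melt keeps '-' as the row label and turns every other column
# into (Dest, Distance) pairs; rename gives '-' a proper name
long_df = (pd.melt(df, id_vars=['-'], var_name='Dest', value_name='Distance')
             .rename(columns={'-': 'Origin'}))
print(long_df)
```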