Operations with different parts of the same dataframe

Operations with different parts of the same dataframe - python

Assume there is a dataframe:
kind value
0 1 1
1 1 2
2 1 3
3 1 4
4 1 5
5 2 6
6 2 7
7 2 8
8 2 9
9 2 10
We can do something with a filtered part of a dataframe:
df.loc[df['kind']==1, 'value'] = df.loc[df['kind']==1, 'value'] * 2
How to perform a calculation involving two or more parts of the same dataframe, assuming their size is equal? Something like this:
df.loc[df['kind']==1, 'value'] =
df.loc[df['kind']==1, 'value'] * df.loc[df['kind']==2, 'value']
(this doesn't work)

Try this:
In [107]: df.loc[df['kind']==1, 'value'] *= df.loc[df['kind']==2, 'value'].values
In [108]: df
Out[108]:
kind value
0 1 6
1 1 14
2 1 24
3 1 36
4 1 50
5 2 6
6 2 7
7 2 8
8 2 9
9 2 10

Use:
m = df['kind']==1
df.loc[m, 'value'] = df.loc[m, 'value'].values * df.loc[df['kind']==2, 'value'].values
print (df)
kind value
0 1 6
1 1 14
2 1 24
3 1 36
4 1 50
5 2 6
6 2 7
7 2 8
8 2 9
9 2 10

Related

How to keep counting although it start at 1 again

My df looks as follows:
import pandas as pd
d = {'col1': [1,2,3,3,1,2,2,3,4,1,1,2]
df= pd.DataFrame(data=d)
Now I want to add a new column with the following schemata:
col1
new_col
1
1
2
2
3
3
3
3
3
3
1
4
2
5
2
5
3
6
4
7
1
8
1
8
2
9
Once it starts again at 1 it should just keep counting.
At the moment I am at the point where I just add a column with difference:
df['diff'] = df['col1'].diff()
How to extend this approach?

Try with
df.col1.diff().ne(0).cumsum()
Out[94]:
0 1
1 2
2 3
3 3
4 4
5 5
6 5
7 6
8 7
9 8
10 8
11 9
Name: col1, dtype: int32

Try:
df["new_col"] = df["col1"].ne(df["col1"].shift()).cumsum()
>>> df
col1 new_col
0 1 1
1 2 2
2 3 3
3 3 3
4 1 4
5 2 5
6 2 5
7 3 6
8 4 7
9 1 8
10 1 8
11 2 9

Python dataframe add columns in groups of 3

I have a data-frame with n rows:
df = 1 2 3
4 5 6
4 2 3
3 1 9
6 7 0
9 2 5
I want to add a columns with the same value in groups of 3.
n (num rows) is for sure divided by 3.
So the new df will be:
df = 1 2 3 A
4 5 6 A
4 2 3 A
3 1 9 B
6 7 0 B
9 2 5 B
What is the best way to do so?

First remove last rows if not dividsable by 3 with DataFrame.iloc and then create 100% unique group by divide by 3 with integer division by 3:
print (df)
a b d
0 1 2 3
1 4 5 6
2 4 2 3
3 3 1 9
4 6 7 0
5 9 2 5
6 0 0 4 <- removed last row
N = 3
num = len(df) // N * N
df = df.iloc[:num]
df['groups'] = np.arange(len(df)) // N
print (df)
a b d groups
0 1 2 3 0
1 4 5 6 0
2 4 2 3 0
3 3 1 9 1
4 6 7 0 1
5 9 2 5 1

IIUC, groupby:
df['new_col'] = df.sum(1).groupby(np.arange(len(df))//3).transform('sum')
Output:
0 1 2 new_col
0 1 2 3 30
1 4 5 6 30
2 4 2 3 30
3 3 1 9 42
4 6 7 0 42
5 9 2 5 42

Can You Preserve Column Order When Pandas Dataframe.Combine Or DataFrame.Combine_First?

If you have 2 dataframes, represented as:
A F Y
0 1 2 3
1 4 5 6
And
B C T
0 7 8 9
1 10 11 12
When combining it becomes:
A B C F T Y
0 1 7 8 2 9 3
1 4 10 11 5 12 6
I would like it to become:
A F Y B C T
0 1 2 3 7 8 9
1 4 5 6 10 11 12
How do I combine 1 data frame with another but keep the original column order?

In [1294]: new_df = df.join(df1)
In [1295]: new_df
Out[1295]:
A F Y B C T
0 1 2 3 7 8 9
1 4 5 6 10 11 12
OR you can also use pd.merge(not a very clean solution though)
In [1297]: df['tmp' ] =1
In [1298]: df1['tmp'] = 1
In [1309]: pd.merge(df, df1, on=['tmp'], left_index=True, right_index=True).drop('tmp', 1)
Out[1309]:
A F Y B C T
0 1 2 3 7 8 9
1 4 5 6 10 11 12

pandas python sorting according to a pattern

I have a pandas data frame that consists of 5 columns. The second column has the numbers 1 to 500 repeated 5 times. As a shorter example the second column is something like this (1,4,2,4,3,1,1,2,4,3,2,1,4,3,2,3) and I want to sort it to look like this (1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4). The code i am using to sort is df=res.sort([2],ascending=True) but this code sorts it (1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4).
Any help will be much appreciated. Thanks

How's about this: sort by the cumcount and then the value itself:
In [11]: df = pd.DataFrame({"s": [1,4,2,4,3,1,1,2,4,3,2,1,4,3,2,3]})
In [12]: df.groupby("s").cumcount()
Out[12]:
0 0
1 0
2 0
3 1
4 0
5 1
6 2
7 1
8 2
9 1
10 2
11 3
12 3
13 2
14 3
15 3
dtype: int64
In [13]: df["s_cumcounts"] = df.groupby("s").cumcount()
In [14]: df.sort_values(["s_cumcounts", "s"])
Out[14]:
s s_cumcounts
0 1 0
2 2 0
4 3 0
1 4 0
5 1 1
7 2 1
9 3 1
3 4 1
6 1 2
10 2 2
13 3 2
8 4 2
11 1 3
14 2 3
15 3 3
12 4 3
In [15]: df = df.sort_values(["s_cumcounts", "s"])
In [16]: del df["s_cumcounts"]

How do I put a series (such as) the result of a pandas groupby.apply(f) into a new column of the dataframe?

I have a dataframe, that I want to calculate statitics on (value_count, mode, mean, etc.) and then put the result in a new column. My current solution is O(n**2) or so, and I'm sure there is likely a faster, obvious method that I'm overlooking.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(10, size=(100, 10)),
columns = list('abcdefghij'))
df['result'] = 0
groups = df.groupby([df.i, df.j])
for g in groups:
icol_eq = df.i == g[0][0]
jcol_eq = df.j == g[0][1]
i_and_j = icol_eq & jcol_eq
df['result'][i_and_j] = len(g[1])
The above works, but is extremely slow for large dataframes.
I tried
df['result'] = df.groupby([df.i, df.j]).apply(len)
but it doesn't seem to work.
Nor does
def f(g):
g['result'] = len(g)
return g
df.groupby([df.i, df.j]).apply(f)
Nor can I merge the resulting series of a df.groupby.apply(lambda x: len(x))

You want to use transform:
In [98]:
df['result'] = df.groupby([df.i, df.j]).transform(len)
df
Out[98]:
a b c d e f g h i j result
0 6 1 3 0 1 1 4 2 8 6 6
1 1 3 9 7 5 5 3 5 4 4 1
2 1 5 0 1 8 1 4 7 3 9 1
3 6 8 6 4 6 0 8 0 6 5 6
4 7 9 7 2 8 9 9 6 0 6 7
5 3 5 5 7 2 7 7 3 2 8 3
6 5 0 4 7 5 7 5 7 9 1 5
7 3 2 5 4 3 6 8 4 2 0 3
8 2 3 0 4 8 5 7 9 7 2 2
9 1 1 3 2 3 5 6 6 5 6 1
10 3 0 2 7 1 8 1 3 5 4 3
....
transform returns a Series with an index aligned to your original df so you can then add it as a column

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Operations with different parts of the same dataframe - python

Try this: In [107]: df.loc[df['kind']==1, 'value'] *= df.loc[df['kind']==2, 'value'].values In [108]: df Out[108]: kind value 0 1 6 1 1 14 2 1 24 3 1 36 4 1 50 5 2 6 6 2 7 7 2 8 8 2 9 9 2 10

Use: m = df['kind']==1 df.loc[m, 'value'] = df.loc[m, 'value'].values * df.loc[df['kind']==2, 'value'].values print (df) kind value 0 1 6 1 1 14 2 1 24 3 1 36 4 1 50 5 2 6 6 2 7 7 2 8 8 2 9 9 2 10

Related

How to keep counting although it start at 1 again

Python dataframe add columns in groups of 3

Can You Preserve Column Order When Pandas Dataframe.Combine Or DataFrame.Combine_First?

pandas python sorting according to a pattern

How do I put a series (such as) the result of a pandas groupby.apply(f) into a new column of the dataframe?

Categories

Resources