I have a dataframe like this:
a b c d m1 m2
3 2 2 2 5 4
1 4 1 1 5 4
3 2 2 3 5 4
I would like to multiply a and b by m1, and c and d by m2:
a b c d m1 m2
15 10 8 8 5 4
5 20 4 4 5 4
15 10 8 12 5 4
I'd also like to retain the original dataframe structure. This is fairly simple in Excel, but pandas is proving complicated: if I do the first multiplication (by m1) on its own, the resulting DataFrame drops the unused columns.
Cheers!
Use mul on a subset of columns defined by a list of column names:
df[['a', 'b']] = df[['a', 'b']].mul(df['m1'], axis=0)
df[['c', 'd']] = df[['c', 'd']].mul(df['m2'], axis=0)
print (df)
a b c d m1 m2
0 15 10 8 8 5 4
1 5 20 4 4 5 4
2 15 10 8 12 5 4
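As an aside, the axis=0 matters here: without it, mul aligns the Series against the column labels instead of the row index, which yields all-NaN results. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 1, 3], 'b': [2, 4, 2], 'm1': [5, 5, 5]})

# axis=0 aligns the multiplier Series with the row index (what we want)
good = df[['a', 'b']].mul(df['m1'], axis=0)
print(good['a'].tolist())  # [15, 5, 15]

# the default alignment is on column labels, so 'a'/'b' never match
# the Series' index labels 0, 1, 2 and every cell becomes NaN
bad = df[['a', 'b']].mul(df['m1'])
print(bad.isna().all().all())  # True
```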
Here's one way using np.repeat:
import numpy as np

df.iloc[:, :4] *= np.repeat(df.iloc[:, 4:].values, 2, axis=1)
print(df)
a b c d m1 m2
0 15 10 8 8 5 4
1 5 20 4 4 5 4
2 15 10 8 12 5 4
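To see what the right-hand side of that assignment looks like, np.repeat duplicates each multiplier column so the array lines up with the four value columns; a small sketch using the m1/m2 values from the example:

```python
import numpy as np

# the m1/m2 columns from the example frame
m = np.array([[5, 4],
              [5, 4],
              [5, 4]])

# repeat each column twice along axis=1 -> one multiplier per value column
rep = np.repeat(m, 2, axis=1)
print(rep.tolist())  # [[5, 5, 4, 4], [5, 5, 4, 4], [5, 5, 4, 4]]
```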
I have a dataframe like this:
ID Packet Type
1 1 A
2 1 B
3 2 A
4 2 C
5 2 B
6 3 A
7 3 C
8 4 C
9 4 B
10 5 B
11 6 C
12 6 B
13 6 A
14 7 A
I want to filter the dataframe so that I keep only the entries that belong to a packet of size n whose types are all different. There are only n types.
For this example let's use n=3 and the types A,B,C.
In the end I want this:
ID Packet Type
3 2 A
4 2 C
5 2 B
11 6 C
12 6 B
13 6 A
How do I do this with pandas?
Another solution, using .groupby + .filter:
df = df.groupby("Packet").filter(lambda x: len(x) == x["Type"].nunique() == 3)
print(df)
Prints:
ID Packet Type
2 3 2 A
3 4 2 C
4 5 2 B
10 11 6 C
11 12 6 B
12 13 6 A
You can use transform with nunique:
out = df[df.groupby('Packet')['Type'].transform('nunique')==3]
Out[46]:
ID Packet Type
2 3 2 A
3 4 2 C
4 5 2 B
10 11 6 C
11 12 6 B
12 13 6 A
I'd loop over the groupby object, filter and concatenate:
>>> pd.concat(frame for _,frame in df.groupby("Packet") if len(frame) == 3 and frame.Type.is_unique)
ID Packet Type
2 3 2 A
3 4 2 C
4 5 2 B
10 11 6 C
11 12 6 B
12 13 6 A
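All three approaches return the same rows for the sample data (here no packet repeats a type, so counting distinct types happens to coincide with checking the group length); a quick sanity check rebuilding the example:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(1, 15),
    'Packet': [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 6, 7],
    'Type': list('ABACBACCBBCBAA'),
})

# the three answers above, side by side
a = df.groupby('Packet').filter(lambda x: len(x) == x['Type'].nunique() == 3)
b = df[df.groupby('Packet')['Type'].transform('nunique') == 3]
c = pd.concat(f for _, f in df.groupby('Packet')
              if len(f) == 3 and f.Type.is_unique)

print(a['ID'].tolist())  # [3, 4, 5, 11, 12, 13]
```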
I have dataframe_a and dataframe_b with a variable number of columns but the same number of rows.
I need to subtract each column of dfb from all dfa columns and create a new dataframe containing the subtracted values.
Right now I'm doing this manually:
sub1 = dfa.subtract(dfb[0], axis = 0)
sub2 = dfa.subtract(dfb[1], axis = 0)
sub3 = dfa.subtract(dfb[2], axis = 0)
etc
then I'm using the concat function to concatenate all the columns:
subbed = pd.concat([sub1, sub2, sub3],axis=1,ignore_index=True)
subbed = pd.concat([dfa, subbed], axis=1)
This all seems horribly inefficient and makes me feel quite bad at programming lol. How would you do this without subtracting each column manually and then directly write the results to a new dataframe?
Setup
import pandas as pd
import numpy as np
from itertools import product
dfa = pd.DataFrame([[8, 7, 6]], range(5), [*'ABC'])
dfb = pd.DataFrame([[1, 2, 3, 4]], range(5), [*'DEFG'])
Pandas' concat
I use the operator method rsub with the axis=0 argument. See this Q&A for more information.
pd.concat({c: dfb.rsub(s, axis=0) for c, s in dfa.items()}, axis=1)
A B C
D E F G D E F G D E F G
0 7 6 5 4 6 5 4 3 5 4 3 2
1 7 6 5 4 6 5 4 3 5 4 3 2
2 7 6 5 4 6 5 4 3 5 4 3 2
3 7 6 5 4 6 5 4 3 5 4 3 2
4 7 6 5 4 6 5 4 3 5 4 3 2
Numpy's broadcasting
You can play around with it to learn how it works:
a = dfa.to_numpy()
b = dfb.to_numpy()
c = a[..., None] - b[:, None]
df = pd.DataFrame(dict(zip(
    product(dfa, dfb),
    c.reshape(5, -1).transpose()
)))
df
A B C
D E F G D E F G D E F G
0 7 6 5 4 6 5 4 3 5 4 3 2
1 7 6 5 4 6 5 4 3 5 4 3 2
2 7 6 5 4 6 5 4 3 5 4 3 2
3 7 6 5 4 6 5 4 3 5 4 3 2
4 7 6 5 4 6 5 4 3 5 4 3 2
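The trick in the broadcasting line is the added axes: a[..., None] has shape (5, 3, 1) and b[:, None] has shape (5, 1, 4), so the subtraction broadcasts to a (rows, dfa-columns, dfb-columns) cube. A sketch with stand-in arrays shaped like the setup above:

```python
import numpy as np

a = np.tile([8, 7, 6], (5, 1))     # stand-in for dfa.to_numpy(), shape (5, 3)
b = np.tile([1, 2, 3, 4], (5, 1))  # stand-in for dfb.to_numpy(), shape (5, 4)

# (5, 3, 1) - (5, 1, 4) broadcasts to (5, 3, 4)
c = a[..., None] - b[:, None]
print(c.shape)           # (5, 3, 4)
print(c[0, 0].tolist())  # [7, 6, 5, 4] -> column 'A' minus D, E, F, G
```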
I have a dataframe with hierarchical MultiIndex column names, including empty strings as column labels. How do I subset the second and third columns?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(15).reshape(5, 3),
                  index=[1, 2, 3, 4, 5],
                  columns=[['A', 'A', 'B'],
                           ['a', 'b', ''],
                           ['', 'x', '']])
df.columns.names = ["c_ix0", "c_ix1", "c_ix2"]
print(df)
c_ix0   A       B
c_ix1   a   b
c_ix2       x
1       0   1   2
2       3   4   5
3       6   7   8
4       9  10  11
5      12  13  14
expected output:
c_ix0   A   B
c_ix1   b
c_ix2   x
1       1   2
2       4   5
3       7   8
4      10  11
5      13  14
I believe you need xs:
a = df.xs('b', axis=1, level=1)
print (a)
c_ix0 A
c_ix2 x
1 1
2 4
3 7
4 10
5 13
b = df.xs('B', axis=1, level=0)
print (b)
c_ix1
c_ix2
1 2
2 5
3 8
4 11
5 14
If you want to select by position, use iloc:
c = df.iloc[:, 1]
print (c)
1 1
2 4
3 7
4 10
5 13
Name: (A, b, x), dtype: int32
EDIT:
d = df.iloc[:, [1, 2]]
print (d)
c_ix0   A   B
c_ix1   b
c_ix2   x
1       1   2
2       4   5
3       7   8
4      10  11
5      13  14
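For completeness, the same two columns can also be selected by their full label tuples with loc, assuming the labels from the example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(5, 3),
                  index=[1, 2, 3, 4, 5],
                  columns=[['A', 'A', 'B'],
                           ['a', 'b', ''],
                           ['', 'x', '']])
df.columns.names = ['c_ix0', 'c_ix1', 'c_ix2']

# each column is addressed by its full (c_ix0, c_ix1, c_ix2) tuple
d = df.loc[:, [('A', 'b', 'x'), ('B', '', '')]]
print(d.iloc[0].tolist())  # [1, 2]
```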
Given a sample MultiIndex:
idx = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'b', 'c', 'd']])
df = pd.DataFrame({'value' : np.arange(12)}, index=idx)
df
value
0 a 0
b 1
c 2
d 3
1 a 4
b 5
c 6
d 7
2 a 8
b 9
c 10
d 11
How can I efficiently convert this to a tabular format like so?
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
Furthermore, given the dataframe above, how can I bring it back to its original multi-indexed state?
What I've tried:
pd.DataFrame(df.values.reshape(-1, df.index.levels[1].size),
             index=df.index.levels[0], columns=df.index.levels[1])
Which works for the first problem, but I'm not sure how to bring it back to its original from there.
Using unstack and stack
In [5359]: dff = df['value'].unstack()
In [5360]: dff
Out[5360]:
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
In [5361]: dff.stack().to_frame('name')
Out[5361]:
name
0 a 0
b 1
c 2
d 3
1 a 4
b 5
c 6
d 7
2 a 8
b 9
c 10
d 11
By using get_level_values:
pd.crosstab(df.index.get_level_values(0), df.index.get_level_values(1), values=df.value, aggfunc=np.sum)
Out[477]:
col_0 a b c d
row_0
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
Another alternative, which you should think of when using stack/unstack (though unstack is clearly better in this case!) is pivot_table:
In [11]: df.pivot_table(values="value", index=df.index.get_level_values(0), columns=df.index.get_level_values(1))
Out[11]:
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
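As for the second half of the question, stack is the inverse of unstack, so the round trip restores the original frame; a quick check on the sample data:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'b', 'c', 'd']])
df = pd.DataFrame({'value': np.arange(12)}, index=idx)

wide = df['value'].unstack()           # the 3x4 tabular form
back = wide.stack().to_frame('value')  # and back again
print(back.equals(df))  # True
```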
I have two dataframes.
df1
Out[162]:
a b c
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
11 11 11 11
df2
Out[194]:
A B
0 a 3
1 b 4
2 c 5
I wish to create a 3rd column in df2 that maps df2['A'] to df1 and find the smallest number in df1 that's greater than the number in df2['B']. For example, for df2['C'].ix[0], it should go to df1['a'] and search for the smallest number that's greater than df2['B'].ix[0], which should be 4.
I had something like df2['C'] = df2['A'].map( df1[df1 > df2['B']].min() ). But this doesn't work, since df1 > df2['B'] doesn't line each column up with the corresponding row of df2['B']. Thanks.
Use apply for row-wise methods:
In [54]:
# create our data
import pandas as pd
df1 = pd.DataFrame({'a':list(range(12)), 'b':list(range(12)), 'c':list(range(12))})
df1
Out[54]:
a b c
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
11 11 11 11
[12 rows x 3 columns]
In [68]:
# create our 2nd dataframe, note I have deliberately used alternate values for column 'B'
df2 = pd.DataFrame({'A':list('abc'), 'B':[3,5,7]})
df2
Out[68]:
A B
0 a 3
1 b 5
2 c 7
[3 rows x 2 columns]
In [69]:
# apply row-wise function, must use axis=1 for row-wise
df2['C'] = df2.apply(lambda row: df1.loc[df1[row['A']] > row['B'], row['A']].min(), axis=1)
df2
Out[69]:
A B C
0 a 3 4
1 b 5 6
2 c 7 8
[3 rows x 3 columns]
There is some example usage in the pandas docs.
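Since each df1 column here is sorted ascending, the same lookup can also be done with a binary search via np.searchsorted instead of a full boolean scan per row; a sketch on the data above (it assumes every threshold has some strictly larger value in its column):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({c: range(12) for c in 'abc'})
df2 = pd.DataFrame({'A': list('abc'), 'B': [3, 5, 7]})

# side='right' returns the first position strictly greater than the threshold
df2['C'] = [df1[a].to_numpy()[np.searchsorted(df1[a].to_numpy(), b, side='right')]
            for a, b in zip(df2['A'], df2['B'])]
print(df2['C'].tolist())  # [4, 6, 8]
```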