Add column from other data frame based on condition - python

I have two data frames:
df1 =
ID Num
a 0
b 0
c 1
d 1
And a second one:
df =
ID
a
a
b
b
c
c
d
I want to add a Num column to df with the following rule:
If a is 0 in df1, then every a in df should be 0, and so on.
Desired output:
df =
ID Num
a 0
a 0
b 0
b 0
c 1
c 1
d 1
I did it with if conditions, but the code is very long and hard-coded.

Try this:
nummap = df1.set_index('ID').to_dict()['Num']
df['Num'] = df['ID'].map(nummap)
output
In [387]: df
Out[387]:
ID Num
0 a 0
1 a 0
2 b 0
3 b 0
4 c 1
5 c 1
6 d 1
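A slightly shorter variant of the same idea, as a sketch: map also accepts a Series, so the intermediate dict is optional. The frames are rebuilt from the question; the extra ID "e" is a hypothetical value, added only to show that IDs missing from df1 come back as NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": ["a", "b", "c", "d"], "Num": [0, 0, 1, 1]})
# "e" is a hypothetical ID that is absent from df1
df = pd.DataFrame({"ID": ["a", "a", "b", "b", "c", "c", "d", "e"]})

# A Series indexed by ID works as the lookup table for map
lookup = df1.set_index("ID")["Num"]
df["Num"] = df["ID"].map(lookup)

# IDs without a match become NaN, which also turns the column into floats
missing = df["Num"].isna()
```

If every ID is guaranteed to exist in df1, the column stays integer and no NaN handling is needed.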

Alternatively, use merge:
df = df.merge(df1)
Output:
ID Num
0 a 0
1 a 0
2 b 0
3 b 0
4 c 1
5 c 1
6 d 1
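The merge above relies on the defaults (an inner join on the shared column). A more explicit sketch, assuming you want to keep every row of df even when its ID has no match in df1, could look like this:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": ["a", "b", "c", "d"], "Num": [0, 0, 1, 1]})
df = pd.DataFrame({"ID": ["a", "a", "b", "b", "c", "c", "d"]})

# how="left" keeps every row of df in its original order;
# validate="many_to_one" raises if an ID appears more than once in df1
merged = df.merge(df1, on="ID", how="left", validate="many_to_one")
```

The validate check is optional; it guards against duplicated IDs in df1 silently multiplying rows.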

Related

Count combination of values in pandas dataframe

Let's say we have the following df:
id A B C D
123 1 1 0 0
456 0 1 1 0
786 1 0 0 0
The id column represents a unique client.
Columns A, B, C, and D represent a product. These columns' values are binary.
1 means the client has that product.
0 means the client doesn't have that product.
I want to create a matrix table of sorts that counts the number of combinations of products that exist for all users.
This would be the desired output, given the df provided above:
A B C D
A 2 1 0 0
B 0 2 1 0
C 0 1 1 0
D 0 0 1 0
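import pandas as pd
# Read the fixed-width table from the question (here saved as 'table.dat')
df = pd.read_fwf('table.dat', infer_nrows=1001)
cols = ['A', 'B', 'C', 'D']
df2 = df[cols]
# Transposed matrix times itself gives product-by-product co-occurrence counts
df2.T.dot(df2)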
Result:
A B C D
A 2 1 0 0
B 1 2 1 0
C 0 1 1 0
D 0 0 0 0
I think you want a dot product:
df2 = df.set_index('id')
out = df2.T.dot(df2)
Output:
A B C D
A 2 1 0 0
B 1 2 1 0
C 0 1 1 0
D 0 0 0 0
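For reference, a self-contained sketch of the same dot product, with the question's table built inline instead of read from a file. Entry (X, Y) of the result is the sum over clients of X times Y, which for 0/1 columns is the number of clients holding both products; the diagonal is the number of holders of each product.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [123, 456, 786], "A": [1, 0, 1], "B": [1, 1, 0], "C": [0, 1, 0], "D": [0, 0, 0]}
)

# Keep only the product columns by moving id into the index
products = df.set_index("id")

# (products.T @ products)[X, Y] counts clients that have both X and Y
cooc = products.T.dot(products)
```

The result is symmetric by construction, so it cannot reproduce an asymmetric table.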

How can I create a new column containing 0 and 1 values via groupby("col1")?

I have a dataframe like this:
df = pd.DataFrame({"col1":["a","a","a","b","b","c","c","c","c","d"]})
How can I create a new column containing 0 and 1 values via groupby("col1") ?
col1 col2
0 a 0
1 a 0
2 a 0
3 b 1
4 b 1
5 c 0
6 c 0
7 c 0
8 c 0
9 d 1
You can groupby col1 and take the remainder of the group number divided by 2:
df['col2'] = df.groupby('col1', sort=False).ngroup()%2
output:
col1 col2
0 a 0
1 a 0
2 a 0
3 b 1
4 b 1
5 c 0
6 c 0
7 c 0
8 c 0
9 d 1
Alternative form:
df['col2'] = df.groupby('col1', sort=False).ngroup().mod(2)
And in case you want odd groups to be 1 and even groups 0:
df['col2'] = df.groupby('col1', sort=False).ngroup().add(1).mod(2)
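To see why this works, it may help to look at the intermediate group numbers. A small sketch on the question's data (the name group_number is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "a", "a", "b", "b", "c", "c", "c", "c", "d"]})

# sort=False numbers the groups in order of first appearance: a=0, b=1, c=2, d=3
group_number = df.groupby("col1", sort=False).ngroup()

# The remainder modulo 2 alternates 0, 1, 0, 1 from one group to the next
df["col2"] = group_number % 2
```

With the default sort=True the groups would be numbered in sorted key order instead, which gives the same result here only because the values already appear in alphabetical order.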
Without groupby, try factorize:
df['new'] = df.col1.factorize()[0]%2
df
Out[151]:
col1 new
0 a 0
1 a 0
2 a 0
3 b 1
4 b 1
5 c 0
6 c 0
7 c 0
8 c 0
9 d 1
Or map each unique value to an alternating 0/1 using itertools.cycle:
from itertools import cycle
df['new'] = df.col1.map(dict(zip(df.col1.unique(), cycle([0,1]))))
df
Out[155]:
col1 new
0 a 0
1 a 0
2 a 0
3 b 1
4 b 1
5 c 0
6 c 0
7 c 0
8 c 0
9 d 1
[It appears the question was asking about flagging every other group with 0/1; this was not clear from the initial framing of the question, so this answer perhaps appears overly simplistic.]
Check if col1 is either b or d and convert the boolean True/False to an integer:
df = pd.DataFrame({"col1":["a","a","a","b","b","c","c","c","c","d"]})
df['col2'] = df['col1'].isin(['b','d']).astype(int)
col1 col2
0 a 0
1 a 0
2 a 0
3 b 1
4 b 1
5 c 0
6 c 0
7 c 0
8 c 0
9 d 1

Select rows where two or more columns are bigger than 0 in pandas

I am working with a dataframe in pandas. My dataframe has 55 columns and 70,000 rows.
How can I select the rows where two or more values are bigger than 0?
It now looks like this:
A B C D E
a 0 2 0 8 0
b 3 0 0 0 0
c 6 2 5 0 0
And would like to make this:
A B C D E F
a 0 2 0 8 0 true
b 3 0 0 0 0 false
c 6 2 5 0 0 true
I have tried converting it to just 0's and 1's and summing that, like so:
df[df > 0] = 1
df[(df > 0).sum(axis=1) >= 2]
But then I lose all the other info in the dataframe and I still want to be able to see the original values.
Try assigning to a column like this:
>>> df['F'] = df.gt(0).sum(axis=1).ge(2)
>>> df
A B C D E F
a 0 2 0 8 0 True
b 3 0 0 0 0 False
c 6 2 5 0 0 True
Or try with astype(bool):
>>> df['F'] = df.astype(bool).sum(axis=1).ge(2)
>>> df
A B C D E F
a 0 2 0 8 0 True
b 3 0 0 0 0 False
c 6 2 5 0 0 True
You are close; just assign the mask to a new column:
df['F'] = (df > 0).sum(axis=1) >= 2
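Or, with NumPy (note that count_nonzero also counts negative values, so this matches only if the data has none):
import numpy as np
df['F'] = np.count_nonzero(df, axis=1) >= 2
print(df)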
A B C D E F
a 0 2 0 8 0 True
b 3 0 0 0 0 False
c 6 2 5 0 0 True
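If the goal is to select the matching rows rather than flag them, the same mask can be used for boolean indexing, which leaves the original values untouched. A sketch on the question's sample data; restricting the mask to numeric columns is an assumption, made because the real 55-column frame may contain non-numeric columns:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 3, 6], "B": [2, 0, 2], "C": [0, 0, 5], "D": [8, 0, 0], "E": [0, 0, 0]},
    index=["a", "b", "c"],
)

# Build the mask from numeric columns only (assumption about the real data)
numeric = df.select_dtypes(include="number")
mask = numeric.gt(0).sum(axis=1).ge(2)

# Boolean indexing keeps the original values of the selected rows
selected = df[mask]
```

Because the mask is computed without modifying df, nothing is overwritten the way it is by df[df > 0] = 1.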

Concat() alternate group by python3.0

My goal here is to concat() alternate groups from two dataframes.
Desired result:
group ordercode quantity
0 A 1
B 1
C 1
D 1
0 A 1
B 3
1 A 1
B 2
C 1
1 A 1
B 1
C 2
My dataframe:
import pandas as pd
df1=pd.DataFrame([[0,"A",1],[0,"B",1],[0,"C",1],[0,"D",1],[1,"A",1],[1,"B",2],[1,"C",1]],columns=["group","ordercode","quantity"])
df2=pd.DataFrame([[0,"A",1],[0,"B",3],[1,"A",1],[1,"B",1],[1,"C",2]],columns=["group","ordercode","quantity"])
print(df1)
print(df2)
I have used dfff=pd.concat([df1,df2]).sort_index(kind="merge")
but I have got the below result:
group ordercode quantity
0 0 A 1
0 0 A 1
1 B 1
1 B 3
2 C 1
3 D 1
4 1 A 1
4 1 A 1
5 B 2
5 B 1
6 C 1
6 C 2
As you can see, the rows are interleaved one by one rather than group by group.
It should print like this:
group 0 of df1
group 0 of df2
group 1 of df1
group 1 of df2, and so on
Note:
I have created these DataFrame using groupby() function
df = pd.DataFrame(np.concatenate(df.apply(lambda x: [x[0]] * x[1], axis=1).to_numpy()),
columns=['ordercode'])
df['quantity'] = 1
df['group'] = sorted(list(range(0, len(df)//3, 1)) * 4)[0:len(df)]
df=df.groupby(['group', 'ordercode']).sum()
Question:
Where did I go wrong?
It seems to be sorting by the index.
I have also tried .set_index("group"), but that didn't work either.
Use cumcount to build a helper column, then sort on it with sort_values:
df1['g'] = df1.groupby('ordercode').cumcount()
df2['g'] = df2.groupby('ordercode').cumcount()
dfff = pd.concat([df1,df2]).sort_values(['group','g']).reset_index(drop=True)
print (dfff)
group ordercode quantity g
0 0 A 1 0
1 0 B 1 0
2 0 C 1 0
3 0 D 1 0
4 0 A 1 0
5 0 B 3 0
6 1 C 2 0
7 1 A 1 1
8 1 B 2 1
9 1 C 1 1
10 1 A 1 1
11 1 B 1 1
Finally, remove the helper column:
dfff = dfff.drop('g', axis=1)
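Note that in the output above the group 1 row C 2 from df2 lands ahead of the group 1 rows of df1, because its helper value g is 0. As an alternative sketch (a different technique from the cumcount helper, not the answerer's method), a stable sort on group alone keeps the concat order inside each group, so df1's rows come first and df2's rows follow:

```python
import pandas as pd

cols = ["group", "ordercode", "quantity"]
df1 = pd.DataFrame(
    [[0, "A", 1], [0, "B", 1], [0, "C", 1], [0, "D", 1], [1, "A", 1], [1, "B", 2], [1, "C", 1]],
    columns=cols,
)
df2 = pd.DataFrame(
    [[0, "A", 1], [0, "B", 3], [1, "A", 1], [1, "B", 1], [1, "C", 2]],
    columns=cols,
)

# mergesort is stable: rows with equal group keep their concat order (df1, then df2)
dfff = (
    pd.concat([df1, df2])
    .sort_values("group", kind="mergesort")
    .reset_index(drop=True)
)
```

This assumes each frame is already ordered the way you want inside a group; no helper column is needed.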

Create a categorical column based on different binary columns in python

I have a dataset that looks like this:
df = pd.DataFrame(data= [[0,0,1],[1,0,0],[0,1,0]], columns = ['A','B','C'])
A B C
0 0 0 1
1 1 0 0
2 0 1 0
I want to create a new column that holds, for each row, the name of the column that contains a 1:
A B C value
0 0 0 1 C
1 1 0 0 A
2 0 1 0 B
Use dot:
df['value'] = df.values.dot(df.columns)
Output:
A B C value
0 0 0 1 C
1 1 0 0 A
2 0 1 0 B
Using pd.DataFrame.idxmax:
df['value'] = df.idxmax(1)
print(df)
A B C value
0 0 0 1 C
1 1 0 0 A
2 0 1 0 B
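One caveat with idxmax: for a row that contains no 1 at all, it still returns the first column name. A minimal sketch of a guard, assuming such rows should get a missing value instead; the all-zero last row is hypothetical and not part of the question's data:

```python
import pandas as pd

# The last row is a hypothetical all-zero row added for illustration
df = pd.DataFrame([[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0]], columns=["A", "B", "C"])
flags = df[["A", "B", "C"]]

# idxmax alone would report "A" for the all-zero row, so blank out rows without any 1
df["value"] = flags.idxmax(axis=1).where(flags.any(axis=1))
```

Selecting the flag columns explicitly also keeps the new value column out of the calculation if the line is run a second time.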
