This question already has answers here:
Split cell into multiple rows in pandas dataframe
(5 answers)
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Closed 3 years ago.
I am working on the problem below:
import pandas as pd
df_temp = pd.DataFrame()
df_temp.insert(0, 'Label', ["A|B|C","A|C","C|B","A","B"])
df_temp.insert(1, 'ID', [1,2,3,4,5])
df_temp
Label ID
0 A|B|C 1
1 A|C 2
2 C|B 3
3 A 4
4 B 5
I want to convert this dataframe into something like the one below, where each label gets its own row alongside its ID.
Expected Output :
ID Label
1 A
1 B
1 C
2 A
2 C
3 C
3 B
4 A
5 B
Try this:
(df_temp.set_index('ID')['Label']
.str.split('|', expand=True)
.reset_index()
.melt('ID')
.drop('variable', axis=1)
.dropna()
.sort_values('ID'))
Output:
ID value
0 1 A
5 1 B
10 1 C
1 2 A
6 2 C
2 3 C
7 3 B
3 4 A
4 5 B
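As an alternative to the split + melt chain above, here is a minimal sketch using Series.str.split together with DataFrame.explode. It assumes a pandas version that provides explode (0.25 or later); the variable name result is only illustrative and is not part of the original answer.

```python
import pandas as pd

df_temp = pd.DataFrame({
    "Label": ["A|B|C", "A|C", "C|B", "A", "B"],
    "ID": [1, 2, 3, 4, 5],
})

# Turn each "A|B|C" string into a list, then give every list element its own row.
result = (
    df_temp.assign(Label=df_temp["Label"].str.split("|"))
    .explode("Label")
    .reset_index(drop=True)[["ID", "Label"]]
)
print(result)
```

This keeps the column name Label and the original row order, so no melt, dropna, or sort step is needed.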
This question already has answers here:
How do I melt a pandas dataframe?
(3 answers)
Closed 2 months ago.
I have a sample dataframe:
s_id c1_id c2_id c3_id
1 a b c
2 a b
3 x y z
How can I reshape the dataframe into the following?
s_id c_id
1 a
1 b
1 c
2 a
2 b
3 x
3 y
3 z
Here you go
df.set_index("s_id").stack().droplevel(1)
Result:
s_id
1 a
1 b
1 c
2 a
2 b
3 x
3 y
3 z
dtype: object
Explanation:
Set s_id as the index.
Apply stack so that the columns are stacked on top of each other.
Drop the index level that holds the stacked column names, because we don't need it.
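The steps in the explanation can be sketched end to end as follows. The sample frame is rebuilt by hand, and the blank cell in the question is assumed to be NaN; the explicit dropna and the final rename/reset_index are additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed sample data; the blank cell from the question is represented as NaN.
df = pd.DataFrame({
    "s_id": [1, 2, 3],
    "c1_id": ["a", "a", "x"],
    "c2_id": ["b", "b", "y"],
    "c3_id": ["c", np.nan, "z"],
})

result = (
    df.set_index("s_id")  # step 1: s_id becomes the index
    .stack()              # step 2: stack the columns into one long Series
    .dropna()             # drop the missing cell explicitly (a no-op if stack already dropped it)
    .droplevel(1)         # step 3: discard the level holding the old column names
    .rename("c_id")       # name the values so they become a proper column
    .reset_index()        # back to a two-column frame: s_id, c_id
)
print(result)
```

The explicit dropna is there because whether stack drops missing values on its own depends on the pandas version.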
This question already has answers here:
Pandas: Find rows which don't exist in another DataFrame by multiple columns
(2 answers)
Closed 1 year ago.
I have two data frames, df1 and df2. df1 contains 6 records and df2 contains 4 records. I want to get the records that do not match between them. My attempt raises ValueError: Can only compare identically-labelled DataFrame objects. I guess this is because the frames differ in length (6 rows versus 4), but how do I compare them and get the unmatched rows?
code
df1=
a b c
0 1 2 3
1 4 5 6
2 3 5 5
3 5 6 7
4 6 7 8
5 6 6 6
df2 =
a b c
0 3 5 5
1 5 6 7
2 6 7 8
3 6 6 6
index = (df1 != df2).any(axis=1)
df3 = df1.loc[index]
which gives:
ValueError: Can only compare identically-labelled DataFrame objects
Expected output:
a b c
0 1 2 3
1 4 5 6
I know that the error is due to the length, but is there any way to compare two data frames and get the unmatched records?
Use df.merge with indicator=True and keep every row whose _merge value is not "both":
In [173]: df = df1.merge(df2, indicator=True, how='outer').query('_merge != "both"').drop(columns='_merge')
In [174]: df
Out[174]:
a b c
0 1 2 3
1 4 5 6
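For reference, a self-contained sketch of the same indicator-based merge, with the two frames from the question rebuilt by hand. The names merged and unmatched are illustrative, and the final reset_index is an addition so that the row labels do not depend on how a given pandas version orders the outer merge.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 4, 3, 5, 6, 6],
                    "b": [2, 5, 5, 6, 7, 6],
                    "c": [3, 6, 5, 7, 8, 6]})
df2 = pd.DataFrame({"a": [3, 5, 6, 6],
                    "b": [5, 6, 7, 6],
                    "c": [5, 7, 8, 6]})

# indicator=True adds a _merge column: "left_only", "right_only" or "both".
merged = df1.merge(df2, how="outer", indicator=True)

# Keep the rows that appear in only one of the two frames.
unmatched = (
    merged[merged["_merge"] != "both"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)
print(unmatched)
```

With how='outer', rows that exist only in df2 would be returned as well; here there are none, so only the two df1 rows remain.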
MultiIndex.from_frame + isin
We can use MultiIndex.from_frame on both df1 and df2 to create the corresponding MultiIndexes. isin then tests whether each entry of the index built from df1 appears in the index built from df2, giving a boolean mask that, once negated, keeps only the non-matching rows.
i1 = pd.MultiIndex.from_frame(df1)
i2 = pd.MultiIndex.from_frame(df2)
df1[~i1.isin(i2)]
Result
a b c
0 1 2 3
1 4 5 6
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I am wondering how I can do the following operation most efficiently, so that it also scales to dataframes with millions of rows.
I have two pandas dataframes:
Data1:
Position Letter
1 a
2 b
3 c
4 b
5 a
Data2:
Weight Letter
1 a
2 b
3 c
Now I want to add an extra column (Weight) to Data1, resulting in the following:
Position Letter Weight
1 a 1
2 b 2
3 c 3
4 b 2
5 a 1
A straightforward way is to use merge:
df = df1.merge(df2, on=['Letter'])
print(df)
Position Letter Weight
0 1 a 1
1 5 a 1
2 2 b 2
3 4 b 2
4 3 c 3
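The output above is ordered by Letter rather than by Position. If the original row order of Data1 should be kept, as in the expected output, a left merge is one option. This is a sketch under the assumption that each Letter appears only once in Data2; the validate argument is added here to check that assumption and is not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"Position": [1, 2, 3, 4, 5],
                    "Letter": ["a", "b", "c", "b", "a"]})
df2 = pd.DataFrame({"Weight": [1, 2, 3],
                    "Letter": ["a", "b", "c"]})

# how="left" keeps every row of df1 in its original order;
# validate raises an error if a Letter occurs more than once in df2.
result = df1.merge(df2, on="Letter", how="left", validate="many_to_one")
print(result)
```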
This question already has answers here:
How to repeat a Pandas DataFrame?
(7 answers)
Closed 2 years ago.
I have a pandas column containing multiple strings. I want each of these strings to be repeated 3 times.
df = pd.DataFrame(data = ['a','b','c'])
Current dataframe:
0
0 a
1 b
2 c
I want to transform this table so it looks like this:
0
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 c
I can't seem to find an easy way to do this.
Anything will help.
Try:
df[0].repeat(3).reset_index(drop=True)
Out:
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 c
Name: 0, dtype: object
You can use repeat + reindex:
df = df.reindex(df.index.repeat(3))
Out[105]:
0
0 a
0 a
0 a
1 b
1 b
1 b
2 c
2 c
2 c
Or concat, which repeats the whole frame in a, b, c, a, b, c order rather than grouping the copies:
df = pd.concat([df]*3)
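To make the difference between the approaches concrete, here is a small sketch comparing reindex + repeat with concat on the frame from the question. The names repeated and tiled are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(data=["a", "b", "c"])

# Repeat each row three times in place: a, a, a, b, b, b, c, c, c.
repeated = df.reindex(df.index.repeat(3)).reset_index(drop=True)

# Stack three copies of the whole frame: a, b, c, a, b, c, a, b, c.
tiled = pd.concat([df] * 3, ignore_index=True)

print(repeated[0].tolist())
print(tiled[0].tolist())
```

Only the first variant matches the desired output directly; the concat result would need to be sorted to group the duplicates together.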
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 3 years ago.
I have two data frames. For simplicity, I provide two dummy data frames here.
A = pd.DataFrame({'id':[1,2,3], 'name':['a','b','c']})
B = pd.DataFrame({'id':[1,1,1,3,2,3,1]})
Now, I want to create a column in data frame B with the names that match the ids.
In this case, my desired output is:
B = pd.DataFrame({'id':[1,1,1,3,2,3,1], 'name':['a','a','a','c','b','c','a']})
I was trying to use .apply and lambda or try to come up with other ideas, but I could not make it work.
Thank you for your help.
Use pd.merge or .map: both use the id column as the key and bring the matching name values into the target dataframe.
df = pd.merge(B,A,on='id',how='left')
#or
B['name'] = B['id'].map(A.set_index('id')['name'])
print(df)
id name
0 1 a
1 1 a
2 1 a
3 3 c
4 2 b
5 3 c
6 1 a
print(B)
id name
0 1 a
1 1 a
2 1 a
3 3 c
4 2 b
5 3 c
6 1 a
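The two variants can be run side by side as in the sketch below. The names lookup, merged and mapped are illustrative, and assign is used instead of in-place assignment so that B stays unchanged. The map variant assumes each id appears only once in A, since the lookup Series needs a unique index.

```python
import pandas as pd

A = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
B = pd.DataFrame({"id": [1, 1, 1, 3, 2, 3, 1]})

# Variant 1: a left merge keeps every row of B in its original order.
merged = pd.merge(B, A, on="id", how="left")

# Variant 2: build an id -> name lookup Series and map it over B's ids.
lookup = A.set_index("id")["name"]
mapped = B.assign(name=B["id"].map(lookup))

print(merged)
print(mapped)
```

In both variants, ids in B that have no match in A end up with a missing value in name.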