I'm trying to convert a column whose values are lists into separate rows, grouped by specific columns.
That's the dataframe I have:
id rooms bathrooms facilities
111 1 2 [2, 3, 4]
222 2 3 [4, 5, 6]
333 2 1 [2, 3, 4]
That's the dataframe I need:
id rooms bathrooms facility
111 1 2 2
111 1 2 3
111 1 2 4
222 2 3 4
222 2 3 5
222 2 3 6
333 2 1 2
333 2 1 3
333 2 1 4
I first tried converting the facilities column to a list:
facilities = pd.DataFrame(df.facilities.tolist())
Then I joined the result back to the other columns, following the method from another suggested solution:
df[['id', 'rooms', 'bathrooms']].join(facilities).melt(id_vars=['id', 'rooms', 'bathrooms']).drop('variable', 1)
Unfortunately, it didn't work for me.
Is there another solution?
Thanks in advance!
You need explode (available since pandas 0.25):
df.explode('facilities')
# id rooms bathrooms facilities
#0 111 1 2 2
#0 111 1 2 3
#0 111 1 2 4
#1 222 2 3 4
#1 222 2 3 5
#1 222 2 3 6
#2 333 2 1 2
#2 333 2 1 3
#2 333 2 1 4
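To get from the exploded frame to the exact layout asked for in the question, the index can be reset and the column renamed. A minimal sketch, assuming pandas 0.25 or later and the sample data from the question (the name `out` is only for illustration):

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": [111, 222, 333],
    "rooms": [1, 2, 2],
    "bathrooms": [2, 3, 1],
    "facilities": [[2, 3, 4], [4, 5, 6], [2, 3, 4]],
})

# One row per list element; the other columns are repeated.
out = (
    df.explode("facilities")
      .rename(columns={"facilities": "facility"})
      .reset_index(drop=True)
)

# explode leaves the column with object dtype, so cast back to integers.
out["facility"] = out["facility"].astype(int)
print(out)
```

Resetting the index is optional; without it, each output row keeps the index label of the row it came from, as in the output above.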
Having lists as values in a dataframe is a bit awkward, so one workaround is to unpack the lists, store each element in its own column, and then use the melt function. Note that this assumes every list has exactly three elements.
# recreate your data
d = {"id":[111, 222, 333],
"rooms": [1,2,2],
"bathrooms": [2,3,1],
"facilities": [[2, 3, 4],[4, 5, 6],[2, 3, 4]]}
df = pd.DataFrame(d)
# unpack the lists
f0, f1, f2 = [],[],[]
for row in df.itertuples():
    f0.append(row.facilities[0])
    f1.append(row.facilities[1])
    f2.append(row.facilities[2])
df["f0"] = f0
df["f1"] = f1
df["f2"] = f2
# melt the dataframe
df = pd.melt(df, id_vars=['id', 'rooms', 'bathrooms'], value_vars=["f0", "f1", "f2"], value_name="facilities")
# optionally sort the values and remove the "variable" column
df.sort_values(by=['id'], inplace=True)
df = df[['id', 'rooms', 'bathrooms', 'facilities']]
I think that should get you the dataframe you need.
id rooms bathrooms facilities
0 111 1 2 2
3 111 1 2 3
6 111 1 2 4
1 222 2 3 4
4 222 2 3 5
7 222 2 3 6
2 333 2 1 2
5 333 2 1 3
8 333 2 1 4
The following will give the desired output
def changeDf(x):
    df_m = pd.DataFrame(columns=['id','rooms','bathrooms','facilities'])
    for index, fc in enumerate(x['facilities']):
        df_m.loc[index] = [x['id'], x['rooms'], x['bathrooms'], fc]
    return df_m
df_modified = df.apply(changeDf, axis=1)
df_final = pd.concat([i for i in df_modified])
print(df_final)
Here "df" is the input dataframe and "df_final" is the desired dataframe.
Try this (it needs numpy imported as np, and the ravel step assumes every list has the same length):
reps = [len(x) for x in df.facilities]
facilities = pd.Series(np.array(df.facilities.tolist()).ravel())
df = df.loc[df.index.repeat(reps)].reset_index(drop=True)
df.facilities = facilities
df
id rooms bathrooms facilities
0 111 1 2 2
1 111 1 2 3
2 111 1 2 4
3 222 2 3 4
4 222 2 3 5
5 222 2 3 6
6 333 2 1 2
7 333 2 1 3
8 333 2 1 4
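The ravel step above relies on every list having the same length. A sketch of the same repeat-and-assign idea that should also cope with lists of different lengths, using np.concatenate instead; the uneven sample lists are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data with lists of different lengths.
df = pd.DataFrame({
    "id": [111, 222, 333],
    "rooms": [1, 2, 2],
    "bathrooms": [2, 3, 1],
    "facilities": [[2, 3], [4, 5, 6], [2]],
})

# How many times each row has to be repeated.
reps = [len(x) for x in df["facilities"]]

# Flatten all lists into a single 1-D array.
flat = np.concatenate(df["facilities"].tolist())

# Repeat each row once per list element, then overwrite the list column.
out = df.loc[df.index.repeat(reps)].reset_index(drop=True)
out["facilities"] = flat
print(out)
```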
I have the following Pandas Dataframes:
df1:
C D E F G
111 222 333 444 555
666 777
df2:
A B
111 3
222 4
333 3
444 3
555 4
100 3
666 4
200 3
777 3
I need to look up each value of df1 in df2.A, then replace that value in df1 with the paired value from df2.B.
So the required output would be:
C D E F G
3 4 3 3 4
4 3
I tried a left merge and considered reshaping the values back across the columns, but there must be a simpler, cleaner way to search and replace directly. Any help is much appreciated.
First create a series mapping:
s = df2.set_index('A')['B']
Then apply this to each value (on pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map):
df1 = df1.applymap(s.get)
Try this:
temp=df2.set_index('A')['B']
print(df1.replace(temp))
Output:
C D E F G
0 3 4 3.0 3.0 4.0
1 4 3 NaN NaN NaN
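Both answers rest on the same idea: turn df2 into a lookup keyed by A and pass every cell of df1 through it. A self-contained sketch of that idea using a plain dict and column-wise Series.map, assuming the blank cells in df1 are missing values (NaN); the frames are rebuilt by hand from the question and the names `lookup` and `out` are illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "C": [111, 666],
    "D": [222, 777],
    "E": [333, np.nan],
    "F": [444, np.nan],
    "G": [555, np.nan],
})
df2 = pd.DataFrame({
    "A": [111, 222, 333, 444, 555, 100, 666, 200, 777],
    "B": [3, 4, 3, 3, 4, 3, 4, 3, 3],
})

# Plain dict from A to B.
lookup = dict(zip(df2["A"], df2["B"]))

# Look up every cell; values without a match (including NaN) become missing.
out = df1.apply(lambda col: col.map(lookup.get))
print(out)
```

Unlike replace, which leaves unmatched values as they are, this turns any value that is absent from df2.A into a missing value.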
Based on the dataframe (1) below, I wish to create a dataframe (2) where either y or z is equal to 2. Is there a way to do this conveniently?
And if I were to create a dataframe (3) that only contains rows from dataframe (1) but not dataframe (2), how should I approach it?
id x y z
0 324 1 2
1 213 1 1
2 529 2 1
3 347 3 2
4 109 2 2
...
df[df[['y','z']].eq(2).any(axis=1)]
Out[1205]:
id x y z
0 0 324 1 2
2 2 529 2 1
3 3 347 3 2
4 4 109 2 2
You can create df2 easily enough using a condition:
df2 = df1[df1.y.eq(2) | df1.z.eq(2)]
df2
x y z
id
0 324 1 2
2 529 2 1
3 347 3 2
4 109 2 2
Given df2 and df1, you can perform a set difference operation on the index, like this:
df3 = df1.loc[df1.index.difference(df2.index)]
df3
x y z
id
1 213 1 1
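For dataframe (3), an alternative to the index difference is to build the boolean mask once and negate it. A small sketch on the five sample rows from the question (the names `mask`, `df2`, and `df3` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "x": [324, 213, 529, 347, 109],
    "y": [1, 1, 2, 3, 2],
    "z": [2, 1, 1, 2, 2],
})

# True for rows where y or z equals 2.
mask = df[["y", "z"]].eq(2).any(axis=1)

df2 = df[mask]    # rows that satisfy the condition
df3 = df[~mask]   # all remaining rows
print(df3)
```

Negating the mask does not depend on the index labels being unique, which the index-difference approach does.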
You can do the following:
import pandas as pd
df = pd.read_csv('data.csv')
df2 = df[(df.y == 2) | (df.z == 2)]
print(df2)
Results:
id x y z
0 0 324 1 2
2 2 529 2 1
3 3 347 3 2
4 4 109 2 2
I have a dataframe1 like the following:
A B C D
1 111 a 9
2 121 b 8
3 122 c 7
4 121 d 6
5 131 e 5
Also, I have another dataframe2:
Code String
111 s
12 b
13 u
What I want is to create a dataframe like the following:
A B C D
1 111 S 9
2 121 b 8
3 122 c 7
4 121 b 6
5 131 u 5
That is, take the first n digits of column B (where n is the number of digits of the value in the Code column of dataframe2); if they match the code, column C in dataframe1 should be replaced with the corresponding String from dataframe2.
Is this what you want? The code is not very neat, but it works.
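import pandas as pd
# Assumes df1 has the default 0..n-1 index, because rows are addressed by position below.
DICT=df2.set_index('Code').T.to_dict('list')
Temp=[]
for key, value in DICT.items():
    n=len(str(key))
    D1={str(key):value[0]}
    # first n digits of B, and the String they map to for the current code
    Temp2=df1.B.astype(str).apply(lambda x: x[:n])
    T=Temp2.map(D1)
    Tempdf=pd.DataFrame({'Ori':df1.B,'Now':Temp2,'C':df1.C})
    # only the smallest B within each prefix group gets replaced
    TorF=(Tempdf.groupby(['Now'])['Ori'].transform('min') == Tempdf['Ori'])
    for pos in range(len(T)):
        if not TorF[pos]:
            # keep the original C value for the other rows
            T[pos]=Tempdf.loc[pos,'C']
    Temp.append(T)
# one row per code; take the first non-missing value for each df1 row
df1.C=pd.DataFrame(data=Temp).bfill().T.iloc[:,0]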
Out[255]:
A B C D
0 1 111 s 9
1 2 121 b 8
2 3 122 c 7
3 4 121 b 6
4 5 131 u 5
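The rule as stated in the question (the first n digits of B equal the code) can also be sketched as a plain prefix match with str.startswith. This is an assumption about the intended rule: unlike the expected table, it also changes the 122 row to b, because 122 starts with 12 as well. The frames are rebuilt from the question and the names `b_str` and `pairs` are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [111, 121, 122, 121, 131],
    "C": ["a", "b", "c", "d", "e"],
    "D": [9, 8, 7, 6, 5],
})
df2 = pd.DataFrame({"Code": [111, 12, 13], "String": ["s", "b", "u"]})

# Compare as strings so that "first n digits" becomes a prefix test.
b_str = df1["B"].astype(str)

# Apply shorter codes first so that a longer, more specific code wins.
pairs = sorted(
    zip(df2["Code"].astype(str), df2["String"]),
    key=lambda pair: len(pair[0]),
)
for code, string in pairs:
    df1.loc[b_str.str.startswith(code), "C"] = string

print(df1)
```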
How can I merge 2 dataframes df1 and df2 using a common column 'ADD' into df3?
Both df1 and df2 have a common column 'ADD'.
I want to use df2 as a mapping table to convert ADD into an ST value.
I have tried converting df2 into a Series or a dictionary, but neither seems to work.
df1 =
Name ADD
1 A 12
2 B 54
3 C 34
4 D 756
5 E 43
df2 =
ADD ST
1 12 CA
2 54 CA
3 34 TX
df3 =
Name ADD ST
1 A 12 CA
2 B 54 CA
3 C 34 TX
4 D 756 nan
5 E 43 nan
You have to do an outer merge (join):
In [11]: df1.merge(df2, how='outer')
Out[11]:
Name ADD ST
0 A 12 CA
1 B 54 CA
2 C 34 TX
3 D 756 NaN
4 E 43 NaN
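Because df2 only serves as a lookup table here, a left merge is another option: it keeps exactly the rows of df1 even if df2 were to contain ADD values that df1 lacks, and with the sample data it gives the same result as the outer merge. A sketch with the frames rebuilt from the question, which also shows the Series route the question was attempting (the name `st` is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"Name": list("ABCDE"), "ADD": [12, 54, 34, 756, 43]})
df2 = pd.DataFrame({"ADD": [12, 54, 34], "ST": ["CA", "CA", "TX"]})

# Keep every row of df1; ST is NaN where ADD has no match in df2.
df3 = df1.merge(df2, on="ADD", how="left")
print(df3)

# The same lookup with df2 turned into a Series indexed by ADD.
st = df1["ADD"].map(df2.set_index("ADD")["ST"])
```

The map variant requires the ADD values in df2 to be unique; the merge would instead produce one output row per match.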