Explode pandas dataframe column values to separate rows - python

I have following pandas dataframe:
id            term          code
2445 | 2716   abcd | efgh   2345
1287          hgtz          6567
I would like to explode the 'id' and 'term' columns. How can I explode multiple columns while keeping the values across the columns (id, term, code) together?
The expected output is:
id term code
2445 abcd 2345
2716 efgh 2345
1287 hgtz 6567
What I have tried so far is:
df.assign(id=df['id'].str.split(' | ')).explode('id')
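For anyone who wants to try the answers below, here is a minimal sketch reconstructing the sample frame from the tables above (column names and values are assumed from the question):

import pandas as pd

# Hypothetical reconstruction of the question's frame, for testing only.
df = pd.DataFrame({"id": ["2445 | 2716", "1287"],
                   "term": ["abcd | efgh", "hgtz"],
                   "code": [2345, 6567]})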

You're on the right track; you just need some help from concat and a list comprehension:
out = (
    pd.concat([df[col].str.split(r"\s*\|\s*").explode() for col in ["id", "term"]],
              axis=1)
    .join(df["code"])
)
Output:
print(out)
id term code
0 2445 abcd 2345
0 2716 efgh 2345
1 1287 hgtz 6567

Here is a way using .str.split() and explode(), which can accept multiple columns:
(df[['id', 'term']].stack()            # long format: one value per (row, column) pair
   .str.split(' | ', regex=False)      # split on the literal ' | ' separator
   .unstack()                          # back to columns id/term, now holding lists
   .explode(['id', 'term'])            # explode both columns together
   .join(df[['code']]))                # bring 'code' back via the row index
Output:
id term code
0 2445 abcd 2345
0 2716 efgh 2345
1 1287 hgtz 6567
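As a side note, on pandas 1.3 or newer DataFrame.explode itself accepts a list of columns, so a more compact sketch (splitting both columns first, same regex as the first answer, and assuming the df built above) is:

out = (df.assign(id=df["id"].str.split(r"\s*\|\s*"),
                 term=df["term"].str.split(r"\s*\|\s*"))
         .explode(["id", "term"]))
print(out)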

Related

Create a DataFrame from df1 and df2, taking the value from df2 when it is empty in df1

import pandas as pd

df1 = pd.DataFrame({'call_sign': ['ASD','BSD','CDSF','GFDFD','FHHH'], 'frn': ['123','124','','656','']})
df2 = pd.DataFrame({'call_sign': ['ASD','CDSF','BSD','GFDFD','FHHH'], 'frn': ['1234','','124','','765']})
I need to get a new df like:
df2 = pd.DataFrame({'call_sign': ['ASD','BSD','CDSF','GFDFD','FHHH'],'frn':['123','','124','656','765']})
I need to take frn from df2 if it's missing in df1 and create a new df.
Replace empty strings with missing values and use DataFrame.set_index with DataFrame.fillna; because the ordering should follow df2.call_sign, add DataFrame.reindex:
import numpy as np

df = (df1.set_index('call_sign').replace('', np.nan)
         .fillna(df2.set_index('call_sign').replace('', np.nan))
         .reindex(df2['call_sign']).reset_index())
print(df)
call_sign frn
0 ASD 123
1 CDSF NaN
2 BSD 124
3 GFDFD 656
4 FHHH 765
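A slightly more compact sketch of the same fillna idea, using Series.combine_first (assuming the df1/df2 from the question and the numpy import above):

s1 = df1.set_index('call_sign')['frn'].replace('', np.nan)
s2 = df2.set_index('call_sign')['frn'].replace('', np.nan)
# fill s1's missing values from s2, then restore df2's row order
df = s1.combine_first(s2).reindex(df2['call_sign']).reset_index()
print(df)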
If you want to update df2 you can use boolean indexing:
# is frn empty string?
m = df2['frn'].eq('')
# update those rows from the value in df1
df2.loc[m, 'frn'] = df2.loc[m, 'call_sign'].map(df1.set_index('call_sign')['frn'])
Updated df2:
call_sign frn
0 ASD 1234
1 CDSF
2 BSD 124
3 GFDFD 656
4 FHHH 765
# left-merge on call_sign; the overlapping 'frn' columns become frn_x (from df1) and frn_y (from df2)
temp = df1.merge(df2, how='left', on='call_sign')
# keep df1's value where it is non-empty, otherwise take df2's
df1['frn'] = temp.frn_x.where(temp.frn_x != '', temp.frn_y)
call_sign frn
0 ASD 123
1 BSD
2 CDSF 124
3 GFDFD 656
4 FHHH 765

Concatenating data from two files

There are 2 files opened with pandas. If there are common parts in the first column of the two files (e.g. JCW, COP, ECX), I want to paste the data of the second column of the second file into the matched rows of the first file, and if there is no match, I want to write NaN. Is there a way to do this?
File1:
0 1
0 JCW 574
1 MBM 4212
2 COP 7424
3 KVI 4242
4 ECX 424
File2:
0 1
0 G=COP d4ssd5vwe2e2
1 G=DDD dfd23e1rv515j5o
2 G=FEW cwdsuve615cdldl
3 G=JCW io55i5i55j8rrrg5f3r
4 G=RRR c84sdw5e5vwldk455
5 G=ECX j4ut84mnh54t65y
File1 after merging (expected output):
0 1 2
0 JCW 574 io55i5i55j8rrrg5f3r
1 MBM 4212 NaN
2 COP 7424 d4ssd5vwe2e2
3 KVI 4242 NaN
4 ECX 424 j4ut84mnh54t65y
Use Series.str.extract to build a Series of matched values from df2[0] based on the values in df1[0], and then merge with a left join using DataFrame.merge:
import numpy as np
import pandas as pd

df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)

s = df2[0].str.extract(f'({"|".join(df1[0])})', expand=False)
df = df1.merge(df2[[1]], how='left', left_on=0, right_on=s)
df.columns = np.arange(len(df.columns))
print(df)
0 1 2
0 JCW 574 io55i5i55j8rrrg5f3r
1 MBM 4212 NaN
2 COP 7424 d4ssd5vwe2e2
3 KVI 4242 NaN
4 ECX 424 j4ut84mnh54t65y
Or, if you need to match only the last 3 characters of column df1[0], use:
s = df2[0].str.extract(f'({"|".join(df1[0].str[-3:])})', expand=False)
df = df1.merge(df2[[1]], how='left', left_on=0, right_on=s)
df.columns = np.arange(len(df.columns))
print(df)
Have a look at the concat function of pandas with join='outer' (https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html); there is also a related question and answer that can help you.
The idea is to reindex each of your data frames to use the column that is now called "0" as the index, and then join the two data frames based on their indices.
Also, may I suggest that you do not paste an image of your dataframes, but upload the data in a form that other people can use to test their suggestions.
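For illustration, here is a minimal sketch of that reindex-and-join idea, with the data reconstructed from the question (integer column labels 0/1 assumed) and the 'G=' prefix stripped so the keys match; this is an assumption about how the keys line up, not part of the original answer:

import pandas as pd

df1 = pd.DataFrame({0: ['JCW', 'MBM', 'COP', 'KVI', 'ECX'],
                    1: [574, 4212, 7424, 4242, 424]})
df2 = pd.DataFrame({0: ['G=COP', 'G=DDD', 'G=FEW', 'G=JCW', 'G=RRR', 'G=ECX'],
                    1: ['d4ssd5vwe2e2', 'dfd23e1rv515j5o', 'cwdsuve615cdldl',
                        'io55i5i55j8rrrg5f3r', 'c84sdw5e5vwldk455', 'j4ut84mnh54t65y']})

# Strip the 'G=' prefix, index both frames on the key, and join on the index;
# rows of df1 with no match get NaN automatically.
right = df2.set_index(df2[0].str.replace('G=', '', regex=False))[[1]].rename(columns={1: 2})
out = df1.set_index(0).join(right).reset_index()
print(out)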

How to concat 2 dataframes, keeping rows with the same index

I have 2 dataframes that I want to concat while keeping the column order of df1:
df1:
index unnamed:0 unnamed:1 unnamed:393 unnamed:395
0 nan 1 ... 394 396
1 0 BB CC DD
df2:
index Service
220 ABC
222 ABB
394 CC
396 DD
....
The output should look like:
df3:
index
0 394 396
1 394 396
2 CC DD
3 CC DD
If I simply do df3 = pd.concat([df1, df2]), it just appends df2 as a whole to the end of df1.
It is almost the same when using df3 = pd.concat([df1, df2], axis=1, ignore_index=True).
I think there is some issue with multi-indexing, but I don't know what to try.
Thanks
You can use df1.join(df2), which aligns the two frames on their index and combines their columns.
See the pandas documentation for DataFrame.join, and the merging section of the user guide for more information about all the kinds of merging operations in pandas.
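As a toy illustration of what join on the index does (hypothetical data, not the OP's exact frames): rows are matched by index label and the columns of both frames are placed side by side:

import pandas as pd

# Two small frames sharing index labels 394 and 396.
left = pd.DataFrame({'a': ['x', 'y']}, index=[394, 396])
right = pd.DataFrame({'Service': ['CC', 'DD']}, index=[394, 396])
print(left.join(right))   # columns 'a' and 'Service' combined, aligned on the index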

Adding A Specific Column from a Pandas Dataframe to Another Pandas Dataframe

I am trying to add a column to a pandas dataframe (df1) that has a unique identifier ('id') column, using another dataframe (df2) that has the same unique identifier ('sameid'). I have tried merge, but I need to add only one specific column ('addthiscolumn'), not all of the columns. What is the best way to do this?
print df1
'id' 'column1'
0 aaa randomdata1
1 aab randomdata2
2 aac randomdata3
3 aad randomdata4
print df2
'sameid' 'irrelevant' 'addthiscolumn'
0 aaa irre1 1234
1 aab irre2 2345
2 aac irre3 3456
3 aad irre4 4567
4 aae irre5 5678
5 aad irre6 6789
Desired Result
print df1
'id' 'column1' 'addthiscolumn'
0 aaa randomdata1 1234
1 aab randomdata2 2345
2 aac randomdata3 3456
3 aad randomdata4 4567
Because you just want to merge a single column, you can select as follows:
df1.merge(df2[['sameid', 'addthiscolumn']], left_on='id', right_on='sameid')
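If you don't want the redundant key column in the result, you can drop it after the merge. A sketch with a simplified reconstruction of the question's frames (the duplicate 'aad' row in df2 is omitted here for brevity):

import pandas as pd

df1 = pd.DataFrame({'id': ['aaa', 'aab', 'aac', 'aad'],
                    'column1': ['randomdata1', 'randomdata2', 'randomdata3', 'randomdata4']})
df2 = pd.DataFrame({'sameid': ['aaa', 'aab', 'aac', 'aad', 'aae'],
                    'irrelevant': ['irre1', 'irre2', 'irre3', 'irre4', 'irre5'],
                    'addthiscolumn': [1234, 2345, 3456, 4567, 5678]})

# merge only the key and the wanted column, then drop the duplicate key
out = (df1.merge(df2[['sameid', 'addthiscolumn']], left_on='id', right_on='sameid')
          .drop(columns='sameid'))
print(out)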

Appending two dataframes with same columns, different order

I have two pandas dataframes.
from pandas import DataFrame

noclickDF = DataFrame([[0, 123, 321], [0, 1543, 432]],
                      columns=['click', 'id', 'location'])
clickDF = DataFrame([[1, 123, 421], [1, 1543, 436]],
                    columns=['click', 'location', 'id'])
I simply want to join such that the final DF will look like:
click | id | location
0 123 321
0 1543 432
1 421 123
1 436 1543
As you can see, the column names of both original DFs are the same, but not in the same order. Also, there is no common key column to join on.
You could also use pd.concat:
In [36]: pd.concat([noclickDF, clickDF], ignore_index=True)
Out[36]:
click id location
0 0 123 321
1 0 1543 432
2 1 421 123
3 1 436 1543
Under the hood, DataFrame.append calls pd.concat.
DataFrame.append has code for handling various types of input, such as Series, tuples, lists and dicts. If you pass it a DataFrame, it passes straight through to pd.concat, so using pd.concat is a bit more direct.
For future users (pandas > 0.23.0 or so):
You may also need to add sort=True to sort the non-concatenation axis when it is not already aligned (i.e. to retain the OP's desired concatenation behavior). I used the code contributed above and got a warning (see Python Pandas User Warning); the code below works and does not throw the warning.
In [36]: pd.concat([noclickDF, clickDF], ignore_index=True, sort=True)
Out[36]:
click id location
0 0 123 321
1 0 1543 432
2 1 421 123
3 1 436 1543
You can use append for that (note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on recent versions prefer pd.concat as shown above):
df = noclickDF.append(clickDF)
print(df)
click id location
0 0 123 321
1 0 1543 432
0 1 421 123
1 1 436 1543
and if you need to, you can reset the index with
df = df.reset_index(drop=True)
print(df)
click id location
0 0 123 321
1 0 1543 432
2 1 421 123
3 1 436 1543
