Joining/merging multiple dataframes - python

I have 4 dataframe objects, each with 1 row and X columns, that I want to join. Here's a screenshot of them:
I want them to become one big row.
Thanks to anybody who helps!

You could use pd.concat with axis=1, as below:
import pandas as pd
df1 = pd.DataFrame(columns=list('ABC'))
df1.loc[0] = [1,1.23,'Hello']
df2 = pd.DataFrame(columns=list('DEF'))
df2.loc[0] = [2,2.23,'Hello1']
df3 = pd.DataFrame(columns=list('GHI'))
df3.loc[0] = [3,3.23,'Hello3']
df4 = pd.DataFrame(columns=list('JKL'))
df4.loc[0] = [4,4.23,'Hello4']
pd.concat([df1,df2,df3,df4],axis=1)
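One caveat worth sketching: pd.concat with axis=1 lines rows up by index label, so the one-big-row result above relies on every frame using the same label (0). The frames below are hypothetical and only illustrate what may happen when the labels differ, and how resetting the index first avoids it.

```python
import pandas as pd

# Hypothetical one-row frames whose index labels differ (0 vs. 5).
left = pd.DataFrame({"A": [1], "B": [1.23]}, index=[0])
right = pd.DataFrame({"C": [2], "D": [2.23]}, index=[5])

# Different labels are not matched up, so the result has two rows padded with NaN.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)  # (2, 4)

# Resetting each index first makes every frame use label 0, giving one wide row.
frames = [left, right]
one_row = pd.concat([f.reset_index(drop=True) for f in frames], axis=1)
print(one_row.shape)  # (1, 4)
```

If the four dataframes in the question were built separately (for example, sliced out of larger frames), resetting the index this way is probably the safer default.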

Related

How can I add a column that has the same value

I was trying to add a new column to my dataset, but when I did, the column only had 1 index.
Is there a way to make one value appear in all indexes of a column?
import pandas as pd
df = pd.read_json('file_1.json', lines=True)
df2 = pd.read_json('file_2.json', lines=True)
df3 = pd.concat([df,df2])
df3 = df.loc[:, ['renderedContent']]
görüş_column = ['Milet İttifakı']
df3['Siyasi Yönelim'] = görüş_column
As I understand it, this could be a possible solution.
You mentioned these lines of code:
df3 = pd.concat([df,df2])
df3 = df.loc[:, ['renderedContent']]
You can change the first one to:
df3 = pd.concat([df,df2],axis=1)  # axis=1 adds the second dataframe as columns; the default, axis=0, adds it as rows
The second point is that I think you meant to write
df3 = df3.loc[:, ['renderedContent']]
instead of df3 = df.loc[:, ['renderedContent']].
Hope this solves your problem.
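For the part of the question about putting one value in every row, a minimal sketch follows. It assumes the goal is simply a constant column; the two small frames are hypothetical stand-ins for the JSON files, which are not available here. Assigning a scalar instead of a one-element list lets pandas repeat the value for every row.

```python
import pandas as pd

# Hypothetical stand-ins for the frames loaded from file_1.json and file_2.json.
df = pd.DataFrame({"renderedContent": ["first", "second"]})
df2 = pd.DataFrame({"renderedContent": ["third"]})

# Stack the rows and keep only the column of interest; copy() avoids writing to a slice.
df3 = pd.concat([df, df2], ignore_index=True).loc[:, ["renderedContent"]].copy()

# Assigning a scalar (not a one-element list) repeats the value on every row.
df3["Siyasi Yönelim"] = "Milet İttifakı"
print(df3)
```

Here ignore_index=True is a choice made for the sketch so the stacked rows get a fresh 0..n-1 index; it is not required for the scalar assignment to work.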

Efficient method comparing 2 different tables columns

Hi all,
I have 2 dfs and I need to check whether the values from the first match values in the second, for one specific column in each, and save the matching values in a new list. This is what I did, but it is taking quite a lot of time and I was wondering if there's a more efficient way. The lists are like in the image above, from 2 different tables.
for x in df_bd_names['Building_Name']:
    for y in df_sup['Source_String']:
        if x == y:
            matching_words_sup.append(x)
Thanks
Let's create both dataframes:
df1 = pd.DataFrame({
'Building_Name': ['Exces', 'Excs', 'Exec', 'Executer', 'Executor']
})
df2 = pd.DataFrame({
'Source_String': ['Executer', 'Executor', 'Executor Of', 'Executor For', 'Exeutor']
})
Perform inner merge between dataframes and convert first column to list:
pd.merge(df1, df2, left_on='Building_Name', right_on='Source_String', how='inner')['Building_Name'].tolist()
Output:
['Executer', 'Executor']
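import numpy as np
import pandas as pd
class DataFrameComparer:  # placeholder name; the original class header was not shown
    def __init__(self, df1, df2):
        self.df1 = df1
        self.df2 = df2
    def compareDFsEffectively(self):
        # Convert both frames to arrays and keep the sorted unique values present in both
        np1 = self.df1.to_numpy()
        np2 = self.df2.to_numpy()
        np_new = np.intersect1d(np1, np2)
        print(np_new)
        df_new = pd.DataFrame(np_new)
        print(df_new)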
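As another option alongside the merge and intersect1d answers, a boolean mask built with Series.isin may be the most direct replacement for the nested loop. This is a sketch using the sample data from the merge answer; the variable name matching_words_sup is borrowed from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"Building_Name": ["Exces", "Excs", "Exec", "Executer", "Executor"]})
df2 = pd.DataFrame({"Source_String": ["Executer", "Executor", "Executor Of", "Executor For", "Exeutor"]})

# Boolean mask: True where a Building_Name value also occurs in Source_String.
mask = df1["Building_Name"].isin(df2["Source_String"])
matching_words_sup = df1.loc[mask, "Building_Name"].tolist()
print(matching_words_sup)  # ['Executer', 'Executor']
```

One difference to keep in mind: the nested loop (and an inner merge) adds a value once per matching pair, so duplicates in Source_String multiply the results, whereas isin keeps each row of the first column at most once.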

Could I define multiple dataframes using pd.DataFrame()?

Is it possible to create multiple dataframes with the pandas library?
The following code is what I have tried, but it doesn't work...
df1, df2 = pd.DataFrame()
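You could do something like this (note that both names then refer to the same DataFrame object):
df1, df2 = (pd.DataFrame(),) * 2
Or, more explicitly, creating two independent objects:
df1, df2 = pd.DataFrame(), pd.DataFrame()
Or even (again, one shared object):
df1 = df2 = pd.DataFrame()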
See this answer for a great explanation.
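The three forms are not interchangeable, which a short sketch can show. The variable names below are illustrative only: tuple repetition and chained assignment bind both names to one object, while two constructor calls produce independent dataframes.

```python
import pandas as pd

# Tuple repetition copies the reference, not the DataFrame.
a1, a2 = (pd.DataFrame(),) * 2
print(a1 is a2)  # True

# Chained assignment also binds both names to one object.
b1 = b2 = pd.DataFrame()
b1["x"] = [1, 2]          # in-place change through b1 ...
print(list(b2.columns))   # ['x'] ... is visible through b2

# Two separate constructor calls give independent objects.
c1, c2 = pd.DataFrame(), pd.DataFrame()
c1["x"] = [1, 2]
print(c1 is c2, list(c2.columns))  # False []
```

If the dataframes are going to be filled in place afterwards, the explicit two-call form is most likely what is wanted.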

Join columns of several dataframes into a new dataframe

I have a dictionary of dataframes. Each of these dataframes has a column 'defrost_temperature'. What I want to do is make one new dataframe that collects all those columns, keeping them as separate columns.
This is what I am doing right now:
merged_defrosts = pd.DataFrame()
for key in df_dict.keys():
    merged_defrosts[key] = df_dict[key]["defrost_temperature"]
But unfortunately, only the first column is filled correctly. The other columns are filled with NaN, as shown in the screenshot.
The different defrosts are not necessarily the same length (the fourth dataframe has 108 rows, the others have 109).
You can try pd.merge on the index of the larger dataframe.
df_result = pd.DataFrame()
for i, df in enumerate(df_dict.values()):
    s1, s2 = f'_{i}', f'_{i+1}'
    m1, m2 = df_result.shape[0], df.shape[0]
    if m1 == 0:
        df_result = df
    elif m1 >= m2:
        df_result = df_result.merge(df, how='left', left_index=True, right_index=True, suffixes=(s1, s2))
    else:
        df_result = df.merge(df_result, how='left', left_index=True, right_index=True, suffixes=(s2, s1))
This creates undesired column names, though, which you can rename manually afterwards.
You could try to concat the dataframes horizontally after making the common column the index:
merged_defrosts = pd.concat(
    [df.set_index("defrost_temperature") for df in df_dict.values()]
).reset_index()
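A further option, sketched under the assumption that the NaN columns in the question come from the dataframes having different index labels: reset each column's index and pass a dict of Series to pd.concat with axis=1. The dict keys become the column names, and shorter columns are padded with NaN at the end. The df_dict contents below are hypothetical.

```python
import pandas as pd

# Hypothetical dictionary of frames with unrelated index labels and unequal lengths.
df_dict = {
    "a": pd.DataFrame({"defrost_temperature": [1.0, 2.0, 3.0]}, index=[10, 11, 12]),
    "b": pd.DataFrame({"defrost_temperature": [4.0, 5.0]}, index=[20, 21]),
}

# Reset each index to 0..n-1 so the columns line up by position, then join side by side.
merged_defrosts = pd.concat(
    {key: df["defrost_temperature"].reset_index(drop=True) for key, df in df_dict.items()},
    axis=1,
)
print(merged_defrosts)
```

This aligns values by row position rather than by the original index, so it only fits if position is what should line up across the defrosts.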

Retrieve multiple lookup values in large dataset?

I have two dataframes:
import pandas as pd
data = [['138249','Cat']
,['103669','Cat']
,['191826','Cat']
,['196655','Cat']
,['103669','Cat']
,['116780','Dog']
,['184831','Dog']
,['196655','Dog']
,['114333','Dog']
,['123757','Dog']]
df1 = pd.DataFrame(data, columns = ['Hash','Name'])
print(df1)
data2 = [
'138249',
'103669',
'191826',
'196655',
'116780',
'184831',
'114333',
'123757',]
df2 = pd.DataFrame(data2, columns = ['Hash'])
I want to write code that takes each item in the second dataframe, scans the leftmost column of the first dataframe, then returns all matching values from the first dataframe in a single cell in the second dataframe.
Here's the result I am aiming for:
Here's what I have tried:
#attempt one: use groupby to squish up the dataset. No results
past = df1.groupby('Hash')
print(past)
#attempt two: use merge. Result: empty dataframe
past1 = pd.merge(df1, df2, right_index=True, left_on='Hash')
print(past1)
#attempt three: use pivot. Result: not the right format.
past2 = df1.pivot(index = None, columns = 'Hash', values = 'Name')
print(past2)
I can do this in Excel with the VBA code here but this code crashes when I apply to my real dataset (likely because it is too big - approximately 30,000 rows long)
IIUC, first group df1 by Hash and join the names with agg, then reindex using df2:
df1.groupby('Hash')['Name'].agg(','.join).reindex(df2.Hash).reset_index()
     Hash     Name
0  138249      Cat
1  103669  Cat,Cat
2  191826      Cat
3  196655  Cat,Dog
4  116780      Dog
5  184831      Dog
6  114333      Dog
7  123757      Dog
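A variation on the same idea, as a sketch: build the joined names once, then use Series.map to write them into df2 as a new column, which keeps any other columns df2 already has. The data is a subset of the question's, and the hash '999999' is a made-up value added to show that lookups with no match come back as NaN.

```python
import pandas as pd

data = [["103669", "Cat"], ["196655", "Cat"], ["103669", "Cat"],
        ["196655", "Dog"], ["116780", "Dog"]]
df1 = pd.DataFrame(data, columns=["Hash", "Name"])

# '999999' is a hypothetical hash with no match in df1.
df2 = pd.DataFrame(["103669", "196655", "116780", "999999"], columns=["Hash"])

# One comma-joined string of names per hash, indexed by hash.
names_by_hash = df1.groupby("Hash")["Name"].agg(",".join)

# Look up each hash in df2; hashes missing from df1 become NaN.
df2["Name"] = df2["Hash"].map(names_by_hash)
print(df2)
```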
