Concatenating two dataframes with no common columns but same row dimension - python

I have two dataframes, df1 (dimension 2x3) and df2 (dimension 2x239) taken as an example; each has the same number of rows but a different number of columns.
I need to concatenate them to get a new dataframe df3 (dimension 2x242).
I used the concat function but ended up with a 4x242 result full of NaN values.
Need help.
Screenshot attached (Jupyter notebook).

You need to set the axis=1 parameter
pd.concat([df1.reset_index(), df2.reset_index()], axis=1)

You need to concatenate on columns. From your picture, the df2 index has duplicate values; reset it to a default index with .reset_index(drop=True):
out = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)

If you want to add these new columns alongside the existing rows, try the code below:
pd.concat([df1, df2], axis=1)
The output shape will be (2, total_cols).
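A minimal sketch of the approach above, on made-up frames with the same shapes as in the question (the column names here are placeholders, not the real ones):

import pandas as pd

# Hypothetical stand-ins for the frames in the question: same row count, different columns.
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})       # shape (2, 3)
df2 = pd.DataFrame({f'x{i}': [i, i + 1] for i in range(239)})     # shape (2, 239)

# Align on position rather than on the original index labels, then glue side by side.
df3 = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print(df3.shape)  # (2, 242) -- no extra rows, no NaN padding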

Related

Join column in dataframe to another dataframe - Pandas

I have 2 dataframes. One has a bunch of columns including f_uuid. The other dataframe has 2 columns, f_uuid and i_uuid.
The first dataframe may contain some f_uuids that the second dataframe doesn't, and vice versa.
I want the first dataframe to have a new column i_uuid (from the second dataframe), populated with the appropriate values for the matching f_uuid in the first dataframe.
How would I achieve this?
df1 = pd.merge(df1, df2, on='f_uuid')
If you want to keep all f_uuid from df1 (e.g. those not available in df2), you may run
df1 = pd.merge(df1, df2, on='f_uuid', how='left')
I think what you're looking for is a merge: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html?highlight=merge#pandas.DataFrame.merge
In your case, that would look like:
bunch_of_col_df.merge(other_df, on="f_uuid")
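A small sketch of both variants, using invented f_uuid values (the data below is illustrative only):

import pandas as pd

# Illustrative data: df1 has extra columns, df2 maps f_uuid -> i_uuid.
df1 = pd.DataFrame({'f_uuid': ['a', 'b', 'c'], 'value': [10, 20, 30]})
df2 = pd.DataFrame({'f_uuid': ['a', 'b', 'd'], 'i_uuid': ['i1', 'i2', 'i4']})

inner = pd.merge(df1, df2, on='f_uuid')             # keeps only f_uuids present in both ('a', 'b')
left = pd.merge(df1, df2, on='f_uuid', how='left')  # keeps every row of df1; unmatched i_uuid becomes NaN

print(inner)
print(left)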

How to apply a function row by row in merge syntax in Python pandas

I have two dataframes:
df1:
df2:
If I map the date in df2 from df1 using the merge command below, it gives me output the same as df1:
df2.merge(df1, how = 'left', on='Category')
But actually I need the output as below:
if only one date is returned, assign it to the category;
if multiple dates are returned and they are all the same, assign that single unique date once;
if multiple dates are returned and more than one unique date is available, assign None.
Required output:
Can anyone help with this? I'm struggling here.
Thanks in advance.
STEPS:
Use groupby and filter the required groups from the 1st dataframe.
Drop the duplicates from df1.
Perform the merge with this updated df1.
df1 = df1.groupby('Category').filter(
    lambda x: x['Date'].nunique() == 1).drop_duplicates()
df2.merge(df1, how='left', on='Category')
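The question's frames are only shown as screenshots, so here is a sketch on invented data that follows the stated rules (assign the date when a category maps to exactly one unique date, otherwise leave it empty):

import pandas as pd

# Invented stand-ins for the screenshots in the question.
df1 = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Date': ['2021-01-01', '2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01'],
})
df2 = pd.DataFrame({'Category': ['A', 'B', 'C', 'D']})

# Keep only categories whose dates collapse to a single unique value, then dedupe.
single_date = df1.groupby('Category').filter(
    lambda x: x['Date'].nunique() == 1).drop_duplicates()

# Left merge: A and C get their date; B (two different dates) and D (no match) get NaN.
out = df2.merge(single_date, how='left', on='Category')
print(out)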

Finding non-matching rows between two dataframes

I have a scenario where I want to find non-matching rows between two dataframes. Both dataframes will have around 30 columns and an id column that uniquely identifies each record/row. So, I want to check whether a row in df1 is different from the one in df2. df1 is an updated dataframe and df2 is the previous version.
I have tried the approach pd.concat([df1, df2]).drop_duplicates(keep=False), but it just combines both dataframes. Is there a way to do this? I would really appreciate the help.
The sample data looks like this for both dfs.
id user_id type status
There will be a total of 39 columns, which may have NULL values in them.
Thanks.
P.S. df2 will always be a subset of df1.
If your df1 and df2 have the same shape, you can easily compare them with this code:
df3 = pd.DataFrame(np.where(df1 == df2, True, False), columns=df1.columns)
You will see the boolean output False for every cell value that does not match.
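A short illustration of that element-wise comparison on two equally shaped frames (the column names and values are invented for the example):

import numpy as np
import pandas as pd

# Two frames with identical shape and columns; one cell differs.
df1 = pd.DataFrame({'id': [1, 2, 3], 'status': ['open', 'closed', 'open']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'status': ['open', 'open', 'open']})

mask = pd.DataFrame(np.where(df1 == df2, True, False), columns=df1.columns)
print(mask)                    # False marks the cells that differ
print(df1[~mask.all(axis=1)])  # rows of df1 that changed relative to df2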

appending in pandas - row wise

I'm trying to append two columns of my dataframe to an existing dataframe with this:
dataframe.append(df2, ignore_index=True)
and this does not seem to be working.
This is what I'm looking for (roughly): a dataframe with 2 columns and 6 rows.
Although this is not exact and it's just two print statements printing the two dataframes, I thought it might be helpful to have a selection of the data in mind.
I tried to use concat(), but that leads to some issues as well:
dataframe = pd.concat([dataframe, df2])
but that appears to concatenate the second dataframe as new columns rather than new rows, in addition to giving NaN values.
Any ideas on what I should do?
I assume this happened because your dataframes have different column names. Try assigning the first dataframe's column names to the second dataframe:
df2.columns = dataframe.columns
dataframe_new = pd.concat([dataframe, df2], ignore_index=True)
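A minimal sketch of that fix, with invented column names (the real names come from the screenshots in the question):

import pandas as pd

# Hypothetical frames: same number of columns, but differently named.
dataframe = pd.DataFrame({'name': ['a', 'b', 'c'], 'score': [1, 2, 3]})
df2 = pd.DataFrame({'col_x': ['d', 'e', 'f'], 'col_y': [4, 5, 6]})

# Without matching names, concat would produce 4 columns padded with NaN.
df2.columns = dataframe.columns
dataframe_new = pd.concat([dataframe, df2], ignore_index=True)
print(dataframe_new.shape)  # (6, 2) -- rows stacked, columns shared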

Create a dataframe where the columns are existing Dataframes

I have a problem.
I have made 3 queries to 3 different tables in a database where the data is similar, and stored the values in 3 different dataframes.
My question is: is there any way to make a new dataframe where each column is a DataFrame?
Like this image
https://imgur.com/pATNi80
Thank you!
I do not know exactly what you need, but you can try this:
pd.DataFrame([d["col_name"] for d in df])
where df is the dataframe shown in the image and col_name is the column name you want as a separate dataframe.
Thank you to jezrael for the answer.
df = pd.concat([df1, df2, df3], axis=1, keys=('df1','df2','df3'))
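A short sketch of that answer with three toy frames standing in for the query results:

import pandas as pd

# Toy stand-ins for the three query results.
df1 = pd.DataFrame({'val': [1, 2]})
df2 = pd.DataFrame({'val': [3, 4]})
df3 = pd.DataFrame({'val': [5, 6]})

# keys= builds a MultiIndex on the columns, so each source frame sits under its own top-level label.
df = pd.concat([df1, df2, df3], axis=1, keys=('df1', 'df2', 'df3'))
print(df)
print(df['df2'])  # select the block that came from df2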
