I have a problem.
I ran 3 queries against 3 different tables in a database where the data is similar, and stored the results in 3 different dataframes.
My question is: is there any way to make a new dataframe where each column is itself a dataframe?
Like in this image:
https://imgur.com/pATNi80
Thank you!
I do not know exactly what you need, but you can try this:
pd.DataFrame([d["col_name"] for d in df])
Here df is the dataframe shown in the image, and col_name is the name of the column you want as a separate dataframe.
Thank you to jezrael for the answer.
df = pd.concat([df1, df2, df3], axis=1, keys=('df1','df2','df3'))
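To make the effect of keys concrete, here is a minimal sketch. The three small frames and their column names are made up for illustration; they only stand in for the three query results.

```python
import pandas as pd

# Made-up stand-ins for the three query results (same columns in each).
df1 = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
df2 = pd.DataFrame({"id": [1, 2], "value": [30, 40]})
df3 = pd.DataFrame({"id": [1, 2], "value": [50, 60]})

# keys= adds an outer column level, so each source frame sits under its own
# top-level column label.
df = pd.concat([df1, df2, df3], axis=1, keys=("df1", "df2", "df3"))

print(df.columns.nlevels)  # two column levels: the key, then the original name
print(df["df2"])           # selecting a key returns that sub-frame
```

Selecting df["df2"] gives back the columns that came from the second frame, which is about as close as pandas gets to "a column that is a dataframe".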
Related
I have two dataframes, for example df1 (dimension: 2x3) and df2 (dimension: 2x239), each with the same number of rows but a different number of columns.
I need to concatenate them to get a new dataframe df3 (dimension: 2x242).
I used the concat function but ended up with a 4x242 dataframe that contains NaN values.
Need help.
A screenshot of the Jupyter notebook is attached.
You need to set the axis=1 parameter (note that reset_index() without drop=True also keeps the old index as an extra column):
pd.concat([df1.reset_index(), df2.reset_index()], axis=1)
You need to concat on columns. From your picture, the df2 index is duplicated, so you need to reset it to a normal index with .reset_index(drop=True):
out = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
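A small sketch of why the extra rows appear. The frames below are hypothetical and much narrower than the ones in the question; the assumption is only that the two indexes carry different labels.

```python
import pandas as pd

# Hypothetical stand-ins: same number of rows, but different index labels.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=[0, 1])
df2 = pd.DataFrame({"x": [7, 8], "y": [9, 10]}, index=[5, 6])

# concat with axis=1 aligns rows on index labels, so labels that do not match
# produce extra rows padded with NaN.
misaligned = pd.concat([df1, df2], axis=1)
print(misaligned.shape)  # (4, 5)

# Resetting both indexes to 0..n-1 lines the rows up by position instead.
out = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print(out.shape)  # (2, 5)
```

With drop=True the old index is discarded rather than added as a new column, so the column count stays at the sum of the two inputs.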
If you want to add these new columns to the same rows, try the code below:
pd.concat([df1,df2],axis=1)
The output shape will be (2, total_cols), provided both dataframes share the same index.
I have a dataframe that looks like this:
df1:
The other one with values is like this:
df2:
I want to update df1 values with df2, and the desired result is:
I don't know if it matters, but df1 has more columns than what I showed here.
I tried some solutions using unstack, join and melt, but couldn't make them work.
What is the best way to do this?
I'm trying to append two columns of my dataframe to an existing dataframe with this:
dataframe.append(df2, ignore_index = True)
and this does not seem to be working.
This is what I'm looking for (kind of) --> a dataframe with 2 columns and 6 rows:
Although this is not correct and uses two print statements to print the two dataframes, I thought it might be helpful to have a sample of the data in mind.
I tried to use concat(), but that leads to some issues as well.
dataframe = pd.concat([dataframe, df2])
That appears to concatenate the second dataframe as columns rather than rows, in addition to giving NaN values:
Any ideas on what I should do?
I assume this happened because your dataframes have different column names. Try assigning the first dataframe's column names to the second dataframe:
df2.columns = dataframe.columns
dataframe_new = pd.concat([dataframe, df2], ignore_index=True)
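The sketch below reproduces the symptom and the fix on made-up data. The column names are assumptions, not the ones from the question. As a side note, DataFrame.append returned a new dataframe rather than modifying the original, and it was removed in pandas 2.0, so pd.concat is the call to use either way.

```python
import pandas as pd

# Made-up data: the same kind of values stored under different column names.
dataframe = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})
df2 = pd.DataFrame({"label": ["d", "e", "f"], "points": [4, 5, 6]})

# Different names: concat keeps all four columns and fills the gaps with NaN.
stacked_wrong = pd.concat([dataframe, df2], ignore_index=True)
print(stacked_wrong.shape)  # (6, 4)

# Rename a copy (so df2 itself stays untouched), then stack the rows.
df2_renamed = df2.copy()
df2_renamed.columns = dataframe.columns
dataframe_new = pd.concat([dataframe, df2_renamed], ignore_index=True)
print(dataframe_new.shape)  # (6, 2)
```

Assigning to .columns relies on both frames having the same number of columns in the same order; if they do not, an explicit rename mapping is safer.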
I'm trying to write a script for a few ETL transformations. I have 34 fixed columns (df1), to which I have to map the column names of different input files containing different columns (df2).
df1(Standard Columns):
df2:
I have tried df.merge but that does not seem to solve my problem.
The expected result is for the columns in the input file df2 to be mapped to the same column names as df1, in the same order as they appear in df2, with their original values intact.
Expected Result :
Any help will be greatly appreciated!
A way to do this would be to have an intermediate step of mapping the columns.
For instance:
df2 = df2.rename(columns={'Department Code': 'Field 1 Dept Number', 'Column2': '2_column', .....})
And then you can merge the two dataframes on the columns of interest.
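A minimal sketch of the mapping step, under stated assumptions: the standard list below is a hypothetical three-entry stand-in for the 34 fixed columns, '3_column' is an invented name, and the mapping dictionary is written by hand. The final reindex, which puts the columns in the standard order, is also an assumption; drop it to keep the order from df2.

```python
import pandas as pd

# Hypothetical standard column names (the real list has 34 entries).
standard_columns = ["Field 1 Dept Number", "2_column", "3_column"]

# Hypothetical input file whose headers differ from the standard names.
df2 = pd.DataFrame({"Column2": ["x", "y"], "Department Code": [101, 102]})

# Hand-written mapping from input header to standard header.
column_map = {"Department Code": "Field 1 Dept Number", "Column2": "2_column"}

# rename returns a new frame; reindex then orders the columns like the
# standard list and adds any missing standard column, filled with NaN.
mapped = df2.rename(columns=column_map).reindex(columns=standard_columns)
print(mapped.columns.tolist())
```

The values are carried over unchanged; only the labels and the column order differ from the input.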
Does anyone know of an efficient way to create a new dataframe based off of two dataframes in Python/Pandas?
What I am trying to do is check whether a value from df1 is in df2 and, if it is, leave that row out of df3. I am working with student IDs: if a student ID from df1 is in df2, I do not want to include it in the new dataframe, df3.
So does anybody know an efficient way to do this? I have googled and looked on SO, but found nothing that works so far.
Assuming the column is called ID.
df3 = df1[~df1["ID"].isin(df2["ID"])].copy()
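Here is the same idea on a tiny made-up dataset, as a sketch only; the names and IDs are invented for illustration.

```python
import pandas as pd

# Made-up student data; only the "ID" column name comes from the answer above.
df1 = pd.DataFrame({"ID": [1, 2, 3, 4], "name": ["Ann", "Bob", "Cy", "Di"]})
df2 = pd.DataFrame({"ID": [2, 4, 5]})

# isin builds a boolean mask: True where the df1 ID also appears in df2.
in_df2 = df1["ID"].isin(df2["ID"])

# ~ inverts the mask, keeping only the rows whose ID is absent from df2.
df3 = df1[~in_df2].copy()
print(df3["ID"].tolist())  # [1, 3]
```

The .copy() makes df3 an independent dataframe, so later edits to it do not trigger warnings about modifying a slice of df1.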
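If both dataframes have the same length and the same index, you can also use:
print(df1.loc[df1['ID'] != df2['ID']])
and assign the result to a third dataframe. Note that this compares the IDs row by row rather than checking whether each ID appears anywhere in df2.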