I'm trying to pivot my df from wide to long, and I am attempting to replicate R's tidyr::pivot_longer() function. I have tried pd.wide_to_long() and pd.melt() but have had no success in correctly formatting the df. I also tried df.pivot() and got the same result.
Here is what a subset of the df (called df_wide) looks like: Rows are Store Numbers, Columns are Dates, Values are Total Sales
My current function looks like this:
df_wide.pivot(index = df_wide.index,
columns = ["Store", "Date", "Value"], # Output Col Names
values = df_wide.values)
My desired output is a df that looks like this:
Note - this question is distinct from merging, as it is looking at changing the structure of a single data frame
The stack() function is useful to achieve your objective, then reformat as needed:
pd.DataFrame(df_wide.stack()).reset_index(drop=False).rename(columns={'level_0':'Store', 'level_1':'Date', 0:'Value'})
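For a concrete check, here is a minimal sketch of the same idea on made-up data. The store numbers, dates, and sales figures are invented for illustration, and it assumes the stores are the index and the dates are the column labels, as the question describes. It also shows a melt() route, which needs the store numbers moved into a regular column first.

```python
import pandas as pd

# Hypothetical data: stores as the index, dates as the columns.
df_wide = pd.DataFrame(
    {"2021-01-01": [100, 200], "2021-01-02": [110, 210]},
    index=[1, 2],
)

# Route 1: stack() moves the column labels into an inner index level.
long_stack = (
    df_wide.rename_axis(index="Store", columns="Date")
    .stack()
    .rename("Value")
    .reset_index()
)

# Route 2: melt() needs the store numbers as a regular column first.
long_melt = (
    df_wide.rename_axis("Store")
    .reset_index()
    .melt(id_vars="Store", var_name="Date", value_name="Value")
)

print(long_stack)
print(long_melt)
```

Both routes give the same three columns (Store, Date, Value); only the row order differs, since stack() groups by store and melt() groups by date.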
I have a pandas dataframe that has duplicate names but with different values, and I want to remove the duplicate names but keep the rows. A snippet of my dataframe looks like this:
And my desired output would look like this:
I've tried using the built-in pandas function .drop_duplicates(), but I end up deleting all duplicates and their respective rows. My current code looks like this:
df = pd.read_csv("merged_db.csv", encoding = "unicode_escape", chunksize=50000)
df = pd.concat(df, ignore_index=True)
df2 = df.drop_duplicates(subset=['auth_given_name', 'auth_surname'])
and this is the output I am currently getting:
Basically, I want to return all the values of the coauthor but remove all duplicate data of the original author. My question is: what is the best way to achieve the output that I want? I tried using the subset parameter, but I don't believe I'm using it correctly. I also found a similar post, but I couldn't really apply it to Python. Thank you for your time!
You may consider this code
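import numpy as np
import pandas as pd

# With chunksize set, read_csv returns an iterator of chunks, so concatenate them first
df = pd.concat(pd.read_csv("merged_db.csv", encoding="unicode_escape", chunksize=50000), ignore_index=True)
first_author = df.columns[:24]  # assumes the first 24 columns describe the original author
df.loc[df.duplicated(first_author), first_author] = np.nan  # blank repeated author data, keep the rows
print(df)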
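To see the effect without the CSV file, here is a small sketch on invented data. The two author columns come from the question; the coauthor column and all of the values are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical data: author columns from the question plus an invented coauthor column.
df = pd.DataFrame(
    {
        "auth_given_name": ["Ann", "Ann", "Bob"],
        "auth_surname": ["Lee", "Lee", "Ray"],
        "coauthor": ["X", "Y", "Z"],
    }
)

author_cols = ["auth_given_name", "auth_surname"]

# duplicated() marks every repeat after the first occurrence.
is_repeat = df.duplicated(subset=author_cols)

# Blank the author columns on repeated rows, keeping the rows themselves.
df.loc[is_repeat, author_cols] = np.nan

print(df)
```

Unlike drop_duplicates(), this keeps every row, so each coauthor value survives while the repeated author details are replaced with missing values.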
I have a dataframe that looks like this:
df1:
The other one with values is like this:
df2:
I want to update df1 values with df2 and the desired result is:
I don't know if it matters, but df1 has more columns than what I showed here.
I tried some solutions using unstack, join and melt, but couldn't make them work.
What is the best way to do this?
I have a pandas DataFrame in this form:
How can I transform this into a new DataFrame with this form:
I am beginning to use Seaborn and Plotly for plotting, and it seems like they prefer data to be formatted in the second way.
Let's try set_index(), unstack(), and rename():
`df.set_index('Date').unstack().reset_index().rename(columns={'level_0':'Name',0:'Score'})`
How it works
df.set_index('Date')  # sets Date as the index
df.set_index('Date').unstack()  # moves the column labels into the index, giving a long Series
d = df.set_index('Date').unstack().reset_index()  # turns the index levels back into columns; the unnamed level becomes 'level_0' and the values column becomes 0
d.rename(columns={'level_0':'Name', 0:'Score'})  # renames the columns
Use the melt function in pandas:
df.melt(id_vars="Date", value_vars=["Andy", "Barry", "Cathy"], var_name="Name", value_name="Score")
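As a quick check, here is a minimal sketch on made-up data. The column names Date, Andy, Barry, and Cathy come from the call above; the dates and scores are invented. It also runs the set_index()/unstack() route on the same frame so the two results can be compared.

```python
import pandas as pd

# Hypothetical wide data: one row per date, one column per person.
df = pd.DataFrame(
    {
        "Date": ["2020-01-01", "2020-01-02"],
        "Andy": [1, 2],
        "Barry": [3, 4],
        "Cathy": [5, 6],
    }
)

# melt(): every column not listed in id_vars is unpivoted when value_vars is omitted.
long_df = df.melt(id_vars="Date", var_name="Name", value_name="Score")

# The unstack() route produces the same rows with a different column order.
via_unstack = (
    df.set_index("Date")
    .unstack()
    .reset_index()
    .rename(columns={"level_0": "Name", 0: "Score"})
)

print(long_df)
```

The result has one row per (Date, Name) pair, which is the long layout that Seaborn and Plotly generally expect.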
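This should work, after moving Date into the index so that only the name columns are stacked:
df.set_index('Date').stack().reset_index(level=1).rename(columns={'level_1':'Name', 0:'Score'})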
I have two dataframes, both with a column 'hotelCode' that is type string. I made sure to convert both columns to string beforehand.
The first dataframe, we'll call old_DF looks like so:
and the second dataframe new_DF looks like:
I have been trying to merge these unsuccessfully. I've tried
final_DF = new_DF.join(old_DF, on = 'hotelCode')
and get this error:
I've tried a variety of things: changing the index name, various merge/join/concat and just haven't been successful.
Ideally, I will have a new dataframe where you have columns [[hotelCode, oldDate, newDate]] under one roof.
import pandas as pd
final_DF = pd.merge(old_DF, new_DF, on='hotelCode', how='outer')
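Here is a small sketch of that merge on invented data. The column names hotelCode, oldDate, and newDate come from the question; the hotel codes and dates are made up. For comparison it also shows a join() call that works, on the assumption that the original error arose because join(..., on=...) matches the caller's column against the other frame's index rather than a same-named column.

```python
import pandas as pd

# Hypothetical frames sharing a 'hotelCode' key.
old_DF = pd.DataFrame({"hotelCode": ["A1", "B2"], "oldDate": ["2020-01-01", "2020-02-01"]})
new_DF = pd.DataFrame({"hotelCode": ["B2", "C3"], "newDate": ["2021-02-01", "2021-03-01"]})

# Make sure both key columns really share a dtype before merging.
old_DF["hotelCode"] = old_DF["hotelCode"].astype(str)
new_DF["hotelCode"] = new_DF["hotelCode"].astype(str)

# Outer merge keeps hotels that appear in either frame.
final_DF = pd.merge(old_DF, new_DF, on="hotelCode", how="outer")

# join() equivalent (left join): the other frame must carry the key as its index.
joined = new_DF.join(old_DF.set_index("hotelCode"), on="hotelCode")

print(final_DF)
```

With how='outer', hotels missing from one side get NaN in that side's date column; use how='inner' to keep only the hotels present in both.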
I have a pivot table with a multi-index in the column names, like this:
I want to keep the same data, which is correct, but I want to give each column a single name that summarizes all the index levels, to get something like this:
You can flatten a multi-index by converting it to a dataframe with text columns and joining them:
df.columns = df.columns.to_frame().astype(str).apply(''.join, axis=1)
The result should not be far from what you want. But as you have not given any reproducible example, I could not test against your data...
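Since no sample data was given, here is a sketch on an invented two-level column index. The labels ('sales', 'cost', 2020, 2021) and the underscore separator are assumptions for illustration. It uses a plain list comprehension rather than to_frame(), which is a different spelling of the same idea.

```python
import pandas as pd

# Hypothetical pivot-style frame with two-level column labels.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=pd.MultiIndex.from_product([["sales", "cost"], [2020, 2021]]),
)

# Each column label is a tuple of its levels; join them into one string.
df.columns = ["_".join(str(level) for level in col) for col in df.columns]

print(df.columns.tolist())
```

Casting each level with str() matters when a level holds non-string labels such as years, since str.join() only accepts strings.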