Pivoting DataFrame in Python Pandas

I'm trying to pivot my df from wide to long, and I am attempting to replicate R's tidyr::pivot_longer() function. I have tried pd.wide_to_long() and pd.melt() but have had no success in correctly formatting the df. I also attempted using df.pivot() and came to the same conclusion.
Here is what a subset of the df (called df_wide) looks like: rows are store numbers, columns are dates, and values are total sales.
My current code looks like this:
df_wide.pivot(index=df_wide.index,
              columns=["Store", "Date", "Value"],  # Output Col Names
              values=df_wide.values)
My desired output is a df that looks like this:
Note: this question is distinct from merging; it is about changing the structure of a single DataFrame.

The stack() function is useful to achieve your objective, then reformat as needed:
pd.DataFrame(df_wide.stack()).reset_index(drop=False).rename(columns={'level_0': 'Store', 'level_1': 'Date', 0: 'Value'})
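A self-contained sketch of the stack() answer on a small made-up df_wide (the store numbers, dates, and sales figures are invented for illustration). It also shows DataFrame.melt(), which is usually the closest pandas analogue of tidyr::pivot_longer(). If df_wide's index already has a name, reset_index() uses that name instead of 'level_0'.

```python
import pandas as pd

# Made-up wide data: rows are store numbers, columns are dates, values are total sales
df_wide = pd.DataFrame(
    {"2021-01-01": [100, 200], "2021-01-02": [110, 210]},
    index=[1, 2],
)

# stack() moves the date columns into an inner index level and returns a Series;
# reset_index() turns both unnamed index levels into level_0/level_1 and the values into column 0
long_stack = (
    df_wide.stack()
    .reset_index()
    .rename(columns={"level_0": "Store", "level_1": "Date", 0: "Value"})
)

# melt() produces the same rows; the index first becomes a regular "Store" column for id_vars
long_melt = (
    df_wide.rename_axis("Store")
    .reset_index()
    .melt(id_vars="Store", var_name="Date", value_name="Value")
)

print(long_stack)
print(long_melt.sort_values(["Store", "Date"]).reset_index(drop=True))
```

The two results differ only in row order: stack() walks row by row, while melt() walks column by column.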

Related

Pandas: How to drop column values that are duplicates but keep certain row values

I have a Pandas dataframe that has duplicate names but with different values, and I want to remove the duplicate names but keep the rows. A snippet of my dataframe looks like this:
And my desired output would look like this:
I've tried using the built-in pandas function .drop_duplicates(), but I end up deleting all duplicates and their respective rows. My current code looks like this:
df = pd.read_csv("merged_db.csv", encoding = "unicode_escape", chunksize=50000)
df = pd.concat(df, ignore_index=True)
df2 = df.drop_duplicates(subset=['auth_given_name', 'auth_surname'])
and this is the output I am currently getting:
Basically, I want to return all the values of the coauthor but remove all duplicate data of the original author. My question is: what is the best way to achieve the output that I want? I tried using the subset parameter, but I don't believe I'm using it correctly. I also found a similar post, but I couldn't really apply it to Python. Thank you for your time!

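You could try this code:
import numpy as np
import pandas as pd
# with chunksize, read_csv returns an iterator of chunks, so concatenate them into one DataFrame
df = pd.concat(pd.read_csv("merged_db.csv", encoding="unicode_escape", chunksize=50000), ignore_index=True)
first_author = df.columns[:24]  # assumes the first 24 columns hold the original author's data
# blank the author columns on every repeat of an already-seen author, keeping the rest of the row
df.loc[df.duplicated(first_author), first_author] = np.nan
print(df)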
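A runnable sketch of the same idea on a tiny made-up frame. The column names auth_given_name and auth_surname come from the question; the coauthor column and all values are invented. Rows that repeat an earlier author keep their coauthor data, and only the author columns are blanked.

```python
import numpy as np
import pandas as pd

# Made-up data: Ann Lee appears twice with different coauthors
df = pd.DataFrame({
    "auth_given_name": ["Ann", "Ann", "Bob"],
    "auth_surname": ["Lee", "Lee", "Ray"],
    "coauthor": ["Cho", "Diaz", "Eng"],
})

author_cols = ["auth_given_name", "auth_surname"]

# duplicated() is False for the first occurrence of each author and True for later repeats
dup = df.duplicated(subset=author_cols)

# Blank only the author columns on the repeats; every row and its coauthor survive
df.loc[dup, author_cols] = np.nan
print(df)
```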

Update rows of dataframe using columns from another dataframe

I have a dataframe that looks like this:
df1:
The other one with values is like this:
df2:
I want to update df1 values with df2, and the desired result is:
I don't know if it matters, but df1 has more columns than I showed here.
I tried some solutions using unstack, join and melt, but couldn't make them work.
What is the best way to do this?
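The question does not show df1 and df2, so this is only a sketch under an assumption: if both frames are indexed by the same key (the id column here is hypothetical) and share column names, DataFrame.update() overwrites df1 in place with the non-missing values of df2 and leaves df1's extra columns alone. If df2 has a different shape, it would first need reshaping (for example with melt or pivot) so that its labels line up with df1.

```python
import pandas as pd

# Hypothetical frames indexed by "id"; df1 has an extra column "b" that df2 lacks
df1 = pd.DataFrame({"id": [1, 2, 3], "a": [0.0, 0.0, 0.0], "b": [9, 9, 9]}).set_index("id")
df2 = pd.DataFrame({"id": [2, 3], "a": [5.5, 6.5]}).set_index("id")

# update() aligns on index and column labels and modifies df1 in place;
# cells with no matching non-NaN value in df2 keep their original values
df1.update(df2)
print(df1)
```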

How can I transform a DataFrame so that the headers become column values?

I have a Pandas DataFrame in this form:
How can I transform this into a new DataFrame with this form:
I am beginning to use Seaborn and Plotly for plotting, and it seems like they prefer data to be formatted in the second way.

Let's try set_index(), unstack(), and rename(columns=...):
`df.set_index('Date').unstack().reset_index().rename(columns={'level_0':'Name',0:'Score'})`
How it works
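df.set_index('Date')  # make Date the index so only the name columns get reshaped
df.set_index('Date').unstack()  # on a DataFrame, unstack() moves the column labels into the outer level of a MultiIndex and returns a Series
d = df.set_index('Date').unstack().reset_index()  # turns the index levels back into columns: the unnamed level becomes 'level_0', 'Date' keeps its name, and the values go into a column named 0
d.rename(columns={'level_0': 'Name', 0: 'Score'})  # give the columns meaningful names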
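A runnable version of the set_index()/unstack() steps on a small made-up frame; only the column names Date, Andy, Barry, and Cathy come from the question, and the scores are invented.

```python
import pandas as pd

# Made-up scores in wide form: one column per person
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02"],
    "Andy": [10, 11],
    "Barry": [20, 21],
    "Cathy": [30, 31],
})

stacked = df.set_index("Date").unstack()  # Series indexed by (person, Date)
d = stacked.reset_index()                 # columns: level_0, Date, 0
long_df = d.rename(columns={"level_0": "Name", 0: "Score"})
print(long_df)
```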

Use the melt function in pandas:
df.melt(id_vars="Date", value_vars=["Andy", "Barry", "Cathy"], var_name="Name", value_name="Score")
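A runnable sketch of the melt() answer with made-up scores (only the column names come from the question). melt() is the most direct tool for this reshape; value_vars can also be omitted, in which case every column not listed in id_vars is melted.

```python
import pandas as pd

# Made-up scores in wide form
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02"],
    "Andy": [10, 11],
    "Barry": [20, 21],
    "Cathy": [30, 31],
})

# Each former header becomes a value in "Name"; the cell values go into "Score"
long_df = df.melt(
    id_vars="Date",
    value_vars=["Andy", "Barry", "Cathy"],
    var_name="Name",
    value_name="Score",
)
print(long_df)
```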

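This should work (assuming Date is a regular column; it ends up as the index of the result):
df.set_index('Date').stack().reset_index(level=1).rename(columns={'level_1': 'Name', 0: 'Score'})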
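A runnable check of the stack() variant on made-up data. It assumes Date starts as a regular column, so it is moved into the index before stacking; otherwise the dates would be stacked along with the scores.

```python
import pandas as pd

# Made-up scores in wide form
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02"],
    "Andy": [10, 11],
    "Barry": [20, 21],
    "Cathy": [30, 31],
})

long_df = (
    df.set_index("Date")
    .stack()                    # Series indexed by (Date, person)
    .reset_index(level=1)       # person level -> column "level_1"; Date stays as the index
    .rename(columns={"level_1": "Name", 0: "Score"})
)
print(long_df)
```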

How can these two dataframes be merged on a specific key?

I have two dataframes, both with a column 'hotelCode' that is type string. I made sure to convert both columns to string beforehand.
The first dataframe, which we'll call old_DF, looks like this:
and the second dataframe new_DF looks like:
I have been trying to merge these unsuccessfully. I've tried
final_DF = new_DF.join(old_DF, on = 'hotelCode')
and get this error:
I've tried a variety of things: changing the index name, various merge/join/concat and just haven't been successful.
Ideally, I would end up with a single new dataframe with the columns [hotelCode, oldDate, newDate].

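import pandas as pd
# merge matches 'hotelCode' in both frames; join(on='hotelCode') instead matches it against old_DF's index
# how='outer' keeps hotel codes that appear in only one of the two frames
final_DF = pd.merge(old_DF, new_DF, on='hotelCode', how='outer')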
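A sketch with made-up hotel codes and dates (only the names hotelCode, oldDate, and newDate come from the question). Besides the outer merge from the answer, it shows a join() call that also works: join(on=...) matches the caller's column against the other frame's index, so old_DF is indexed by hotelCode first.

```python
import pandas as pd

# Made-up frames; B2 appears in both, A1 only in old_DF, C3 only in new_DF
old_DF = pd.DataFrame({"hotelCode": ["A1", "B2"], "oldDate": ["2020-01-01", "2020-02-01"]})
new_DF = pd.DataFrame({"hotelCode": ["B2", "C3"], "newDate": ["2021-02-01", "2021-03-01"]})

# Outer merge: every hotelCode from either frame, with NaN where a date is missing
final_DF = pd.merge(old_DF, new_DF, on="hotelCode", how="outer")
print(final_DF)

# join() variant: a left join of new_DF's hotelCode column against old_DF's index
joined = new_DF.join(old_DF.set_index("hotelCode"), on="hotelCode")
print(joined)
```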

remove multilevel column pivot table python

I have a pivot table with a MultiIndex on its columns, like this:
I want to keep the data (it is correct), but I want to give each column a single name that summarizes all of its index levels, to get something like this:

You can flatten a multi-index by converting it to a dataframe with text columns and joining them:
df.columns = df.columns.to_frame().astype(str).apply(''.join, axis=1)
The result should be close to what you want, but since you have not given a reproducible example, I could not test it against your data.
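A small reproducible sketch with made-up sales data. Passing values as a list to pivot_table() keeps the values name as an extra column level, which produces the kind of MultiIndex columns the question describes. Joining with "_" instead of "" keeps the level names readable; the separator is a choice, not part of the original answer.

```python
import pandas as pd

# Made-up long data
sales = pd.DataFrame({
    "store": ["s1", "s1", "s2"],
    "month": ["jan", "feb", "jan"],
    "sales": [1, 2, 3],
})

# values given as a list -> columns are a MultiIndex such as ("sales", "feb")
pt = sales.pivot_table(index="store", columns="month", values=["sales"], aggfunc="sum")

# Same idea as the answer: one row per column tuple, cast to str, then join the levels
pt.columns = pt.columns.to_frame().astype(str).apply("_".join, axis=1).tolist()
print(pt)
```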

We Keep Coding
