DataFrame.melt() not pivoting columns - python

I have a CSV file that contains years in columns like this:
I want to create one "year" column with the values in a new column.
I tried using pandas.melt, but it doesn't seem to be changing the dataframe.
Here is the relevant code:
international_df = pd.read_csv("data/International/PASSENGER_DATA.csv",delimiter=',')
international_df.melt(id_vars=['Country Name','Country Code','Indicator Name','Indicator Code'],var_name='year',value_name='Passengers').sort_values('Country Name')
I have tried adding the years to a list and passing that in to value_vars, but this doesn't work either. If value_vars is not specified (as above), melt should unpivot every column that isn't in id_vars. Any idea why this isn't working?

The .melt() function doesn't modify the dataframe in place; the returned frame needs to be assigned:
international_df = pd.read_csv("data/International/PASSENGER_DATA.csv",delimiter=',')
print(international_df)
newdf = international_df.melt(id_vars=['Country Name','Country Code','Indicator Name','Indicator Code'],var_name='year',value_name='Passengers').sort_values('Country Name')
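As a minimal, self-contained sketch of the fix (the tiny frame below is made up for illustration, since the original CSV isn't shown):

```python
import pandas as pd

# made-up data in the same wide shape as the CSV: one column per year
wide = pd.DataFrame({
    'Country Name': ['Aruba', 'Angola'],
    'Country Code': ['ABW', 'AGO'],
    'Indicator Name': ['Passengers carried'] * 2,
    'Indicator Code': ['IS.AIR.PSGR'] * 2,
    '2018': [100, 200],
    '2019': [110, 210],
})

# melt returns a NEW frame; assign it instead of discarding it
long = wide.melt(
    id_vars=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code'],
    var_name='year',
    value_name='Passengers',
).sort_values('Country Name')

print(long)
```

Each year column becomes a row, so the two countries times two years yield four rows with `year` and `Passengers` columns.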

Related

How to get column names in pandas of get_dummies

After I created a dataframe and ran get_dummies on it:
df_final=pd.get_dummies(df,columns=['type'])
I got the new columns that I want and everything is working.
My question is: how can I get the new column names produced by get_dummies? My dataframe is dynamic, so I can't reference them statically; I'd like to save all the new column names in a list.
An option would be:
df_dummy = pd.get_dummies(df, columns=target_cols)
df_dummy.columns.difference(df.columns).tolist()
where df is your original dataframe, df_dummy the output from pd.get_dummies, and target_cols your list of columns to get the dummies.
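A quick runnable sketch of that approach, using a hypothetical frame with a `type` column:

```python
import pandas as pd

# hypothetical input frame; 'type' is the column to dummy-encode
df = pd.DataFrame({'id': [1, 2, 3], 'type': ['a', 'b', 'a']})
target_cols = ['type']

df_dummy = pd.get_dummies(df, columns=target_cols)

# columns present after get_dummies but not before = the new dummy columns
new_cols = df_dummy.columns.difference(df.columns).tolist()
print(new_cols)
```

Note that `Index.difference` returns the names sorted, not in their original column order.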

Rename columns with a loop in Pandas

I need to rename all the columns of a dataframe (pandas) with ~100 columns. I created a list with all the new names stored, and I need a handy function to rename them. Many solutions online deal with this "manually" by stating the old column name, which is not feasible at this size.
I tried a simple for loop like:
for i in range(0,96):
    df.columns[i] = new_cols_list[i]
That is the way I would do it in R, but it throws an error:
"Index does not support mutable operations"
All you have to do is:
df.columns = new_cols_list
Use it only when you have to rename all columns. Here new_cols_list is the list containing the new column names, with length equal to the number of columns.
When you have to rename specific columns, then use 'rename' as shown in other answers.
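A minimal sketch of the whole-list assignment, using a small stand-in frame:

```python
import pandas as pd

# hypothetical 3-column frame standing in for the ~100-column one
df = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c'])
new_cols_list = ['x', 'y', 'z']

# assign the whole list at once instead of mutating one label at a time;
# the Index itself is immutable, which is what caused the original error
df.columns = new_cols_list
print(list(df))
```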
Use the rename function:
# df = some data frame
# new_col_list = new column names
# get the old columns names
old_columns = list(df)
# rename the columns in place
df.rename(columns={old_columns[idx]: name for (idx, name) in enumerate(new_col_list)}, inplace=True)
See also: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html
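Putting the rename approach together as a runnable sketch (the two-column frame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 2]], columns=['old_a', 'old_b'])
new_col_list = ['new_a', 'new_b']

# get the old column names
old_columns = list(df)

# build an old-name -> new-name mapping by position and rename in place
df.rename(
    columns={old_columns[idx]: name for idx, name in enumerate(new_col_list)},
    inplace=True,
)
print(list(df))
```

The mapping form also works when renaming only a subset of columns, which the direct list assignment cannot do.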

How can these two dataframes be merged on a specific key?

I have two dataframes, both with a column 'hotelCode' that is type string. I made sure to convert both columns to string beforehand.
The first dataframe, we'll call old_DF looks like so:
and the second dataframe new_DF looks like:
I have been trying to merge these unsuccessfully. I've tried
final_DF = new_DF.join(old_DF, on = 'hotelCode')
and get this error:
I've tried a variety of things: changing the index name, various merge/join/concat and just haven't been successful.
Ideally, I will have a new dataframe where you have columns [[hotelCode, oldDate, newDate]] under one roof.
import pandas as pd
final_DF = pd.merge(old_DF, new_DF, on='hotelCode', how='outer')
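The likely cause of the original error is that DataFrame.join aligns on the other frame's index by default, not on a column, whereas pd.merge matches on the named column in both frames. A runnable sketch with made-up frames mirroring the question's shape:

```python
import pandas as pd

# hypothetical data; both hotelCode columns are strings, as in the question
old_DF = pd.DataFrame({'hotelCode': ['A1', 'B2'],
                       'oldDate': ['2020-01-01', '2020-02-01']})
new_DF = pd.DataFrame({'hotelCode': ['A1', 'C3'],
                       'newDate': ['2021-01-01', '2021-03-01']})

# outer merge keeps keys that appear in either frame
final_DF = pd.merge(old_DF, new_DF, on='hotelCode', how='outer')
print(final_DF)
```

The result has the three columns [hotelCode, oldDate, newDate] under one roof, with NaN where a key is missing from one side.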

Use multiple rows as column header for pandas

I have a dataframe that I've imported as follows.
df = pd.read_excel("./Data.xlsx", sheet_name="Customer Care", header=None)
I would like to set the first three rows as column headers but can't figure out how to do this. I gave the following a try:
df.columns = df.iloc[0:3,:]
but that doesn't seem to work.
I saw something similar in this answer. But it only applies if all sub columns are going to be named the same way, which is not necessarily the case.
Any recommendations would be appreciated.
df = pd.read_excel(
"./Data.xlsx",
sheet_name="Customer Care",
header=[0,1,2]
)
This will tell pandas to read the first three rows of the excel file as multiindex column labels.
If you want to modify the rows after you load them then set them as columns
# set the first three rows as columns
df.columns = pd.MultiIndex.from_arrays(df.iloc[0:3].values)
# delete the first three rows (because they are now the column labels)
df = df.iloc[3:]
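A self-contained sketch of the second approach, with a made-up frame standing in for the one read with header=None (the first three rows play the role of the header rows):

```python
import pandas as pd

# stand-in for the raw sheet: three header rows followed by data rows
df = pd.DataFrame([
    ['Group', 'Group'],
    ['Sub1',  'Sub2'],
    ['Unit1', 'Unit2'],
    [1, 2],
    [3, 4],
])

# each of the first three rows becomes one level of the MultiIndex
df.columns = pd.MultiIndex.from_arrays(df.iloc[0:3].values)
# drop the header rows now that they live in the column labels
df = df.iloc[3:]
print(df)
```

Unlike header=[0,1,2] at read time, this route lets you edit the rows before turning them into column labels.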

How to feed new columns every time in a loop to a spark dataframe?

I have a task of reading each column of a Cassandra table into a dataframe to perform some operations. If a table has 5 columns, I want to feed the data like this:
first column in the first iteration
first and second columns in the second iteration, into the same dataframe
and so on.
I need generic code. Has anyone tried something similar? Please help me out with an example.
This will work:
df2 = pd.DataFrame()
for i in range(len(df.columns)):
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the slices as rows
    df2 = pd.concat([df2, df.iloc[:, 0:i+1]], sort=True)
Since the same column names repeat across iterations, df2 will not gain duplicate columns; it will keep adding rows instead.
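To see what that loop does, here is the same pattern run on a tiny made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

df2 = pd.DataFrame()
for i in range(len(df.columns)):
    # iteration 0 adds column 'a', iteration 1 adds 'a'+'b', iteration 2 adds all three
    df2 = pd.concat([df2, df.iloc[:, 0:i+1]], sort=True)

print(df2.shape)
```

Each iteration appends the 2 source rows, so the result has 6 rows and the union of columns (3), with NaN in the columns a slice did not include.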
You can extract the names from dataframe's schema and then access that particular column and use it the way you want to.
names = df.schema.names
columns = []
for name in names:
    columns.append(name)
# df[columns] -- use it the way you want
