Rename columns with a loop in Pandas - python

I need to rename all the columns of a pandas dataframe with ~100 columns. I created a list with all the new names and need a handy function to rename them. Many solutions online rename "manually" by stating each old column name, which is not practical at this size.
I tried a simple for loop like:
for i in range(0,96):
    df.columns[i] = new_cols_list[i]
That is the way I would do it in R, but it throws an error:
"Index does not support mutable operations"

All you have to do is:
df.columns = new_cols_list
Use it only when you have to rename all columns. new_cols_list is the list containing the new column names, and its length must equal the number of columns.
When you have to rename specific columns, then use 'rename' as shown in other answers.
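A minimal sketch of the direct assignment, with a length check added as a guard. The sample frame and the names in new_cols_list are made up for illustration.

```python
import pandas as pd

# Hypothetical three-column frame standing in for the ~100-column one.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "c"])
new_cols_list = ["x", "y", "z"]

# Assignment raises ValueError on a length mismatch; checking first
# gives a clearer message.
if len(new_cols_list) != len(df.columns):
    raise ValueError("new_cols_list must have one name per column")

# Replace every column name in one step, matched by position.
df.columns = new_cols_list
print(list(df.columns))
```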

Use the rename function:
# df = some data frame
# new_col_list = new column names
# get the old columns names
old_columns = list(df)
# rename the columns in place
df.rename(columns={old_columns[idx]: name for (idx, name) in enumerate(new_col_list)}, inplace=True)
See also: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html
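A possible variant of the rename approach, sketched with made-up column names: the mapping is built with zip, and rename is called without inplace so it returns a new frame. It assumes the old column names are unique, since a dict cannot hold the same key twice.

```python
import pandas as pd

# Hypothetical frame and replacement names, for illustration only.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["old_a", "old_b"])
new_col_list = ["new_a", "new_b"]

# Pair each existing name with its replacement by position.
mapping = dict(zip(df.columns, new_col_list))

# Without inplace=True, rename returns a new frame and leaves df untouched.
renamed = df.rename(columns=mapping)
print(list(renamed.columns))
```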

Related

How to rename or drop specific DataFrame column if there are other columns with same name

How can one rename a specific column in a pandas DataFrame when the column name is not unique?
Calling df.rename(columns={'old_name':'new_name'}) will rename all the columns with the name "old_name".
Same question for dropping a column when there are duplicate column names.
Since calling df.rename(columns={'old_name':'new_name'}) will rename all the columns called "old_name", renaming must be done with the column index.
Get all the indexes of the column of interest:
[col_index for col_index, col_name in enumerate(df.columns) if col_name == col_name_to_find]
Rename:
Once you know which index you'd like to rename: df.columns.values[col_index] = new_col_name
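As an alternative sketch, the same rename can be done by rebuilding the full list of names and assigning it back, which sidesteps writing into the index's underlying array. The frame and the name "val_2" are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a duplicated column name.
df = pd.DataFrame([[1, 2, 3]], columns=["id", "val", "val"])

# Positions of every column named "val".
positions = [i for i, name in enumerate(df.columns) if name == "val"]

# Rename only the second match by rebuilding the full list of names.
cols = list(df.columns)
cols[positions[1]] = "val_2"
df.columns = cols
print(list(df.columns))
```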
Dropping:
One option is to use pandas' built-in duplicated method, whose keep argument lets you keep only the first or last of the duplicated columns, or none of them (keep=False):
df.loc[:, ~df.columns.duplicated(keep="first")]
This is helpful only if you want to keep the first duplicate column, the last one, or none.
If you have more than 2 duplicated columns and want to keep one that is not the first or last, you can:
Get all the indexes of the column of interest (as explained above) as a list.
Remove the index you want to keep in the df from the list.
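Select the remaining columns by position, because df.drop works on labels and would remove every column that shares the name: df.iloc[:, [i for i in range(df.shape[1]) if i not in list_of_column_indexes_to_drop]]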
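The position-based steps above can be sketched as follows. This version selects the columns to keep with iloc, since label-based dropping cannot tell same-named columns apart. The frame and the choice to keep the middle duplicate are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with three columns sharing the name "val".
df = pd.DataFrame([[1, 2, 3, 4]], columns=["id", "val", "val", "val"])

# Positions of all "val" columns, then keep only the one at position 2.
positions = [i for i, name in enumerate(df.columns) if name == "val"]
positions_to_drop = [i for i in positions if i != 2]

# Select the surviving columns by position; labels are ambiguous here.
keep = [i for i in range(df.shape[1]) if i not in positions_to_drop]
result = df.iloc[:, keep]
print(result)
```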

DataFrame.melt() not pivoting columns

I have a CSV file that contains years in columns like this:
I want to create one "year" column with the values in a new column.
I tried using pandas.melt, but it doesn't seem to be changing the dataframe.
Here is the relevant code:
international_df = pd.read_csv("data/International/PASSENGER_DATA.csv",delimiter=',')
international_df.melt(id_vars=['Country Name','Country Code','Indicator Name','Indicator Code'],var_name='year',value_name='Passengers').sort_values('Country Name')
I have tried adding the years to a list and passing that in to value_vars, but this doesn't work either. If value_vars is not specified (as above), it should pivot on all columns that aren't in id_vars. Any idea why this isn't working?
The .melt() function does not modify the dataframe in place; it returns a new one, so the returned frame has to be saved:
international_df = pd.read_csv("data/International/PASSENGER_DATA.csv",delimiter=',')
print(international_df)
newdf = international_df.melt(id_vars=['Country Name','Country Code','Indicator Name','Indicator Code'],v
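To illustrate the point that melt returns a new frame, here is a small self-contained sketch. The data is invented and only loosely mirrors the passenger file; the column names other than "year" and "Passengers" are placeholders.

```python
import pandas as pd

# Hypothetical wide frame with one column per year.
wide = pd.DataFrame({
    "Country Name": ["A", "B"],
    "1990": [10, 20],
    "1991": [11, 21],
})

# melt returns a new frame; the result must be assigned to be kept.
long = wide.melt(id_vars=["Country Name"], var_name="year", value_name="Passengers")

# The original frame is unchanged.
print(wide.shape, long.shape)
```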

copy multiple columns with same name but different number into new df in python/pandas

I've got a huge dataframe physioDf and want to copy some of the columns into a different df called smallPhysioDf. I know that I can copy individual columns like this:
smallPhysioDf['column_name'] = physioDf['column_name'].values
But I now want to copy 30 columns belonging to the same variable. Each column name starts the same (e.g. "VariableName_") but ends with a specific number from 1 to 30. What would be the fastest way to copy all of those columns into smallPhysioDf? I believe I would have to use a for loop but I am not sure how. Very happy for any help.
It will be quickest to select all columns at once:
columns = [f"VariableName_{i}" for i in range(1, 31, 1)]
smallPhysioDf = physioDf[columns].copy()
If smallPhysioDf already exists, you can instead concatenate with pd.concat (if the VariableName columns are already in the DataFrame; DataFrame.append was removed in pandas 2.0), or merge (if the VariableName columns are new).
You can use filter to extract the column names:
for c in physioDf.filter(regex='^VariableName_').columns:
    smallPhysioDf[c] = physioDf[c]
Or you can use f-string:
for n in range(1,31):
    col_name = f'VariableName_{n}'
    smallPhysioDf[col_name] = physioDf[col_name]
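When smallPhysioDf does not exist yet, the filter call can also produce it in one step without a loop. A minimal sketch on invented data; the stricter regex (digits only after the prefix) is an assumption about the naming scheme.

```python
import pandas as pd

# Hypothetical stand-in for the large frame.
physioDf = pd.DataFrame({
    "VariableName_1": [1, 2],
    "VariableName_2": [3, 4],
    "Other": [5, 6],
})

# filter(regex=...) keeps columns whose name matches the pattern;
# copy() makes the result independent of physioDf.
smallPhysioDf = physioDf.filter(regex=r"^VariableName_\d+$").copy()
print(list(smallPhysioDf.columns))
```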

convert group of repeated columns to one column each using python

I have a csv file with repeated groups of columns and I want to convert each repeated group of columns into a single column.
I know that for this kind of problem we can use the pandas melt function, but only when the repeated columns belong to a single variable.
I already found a simple solution for my problem, but I don't think it's the best. I put the repeated columns of every variable into a list, then all those lists into a bigger list.
Then, when iterating over the bigger list, I use melt on every variable (the list of repeated columns of the same group).
Finally I concatenate the new dataframes into a single dataframe.
Here is my code:
import pandas as pd
file_name = 'file.xlsx'
df_final = pd.DataFrame()
# Create lists to hold headers & other variables
HEADERS = []
A = []
B = []
C = []
# Read the Excel file
df = pd.read_excel(file_name, sheet_name='Sheet1')
# Create a list of all the columns
columns = list(df)
# Split columns list into headers and other variables
for col in columns:
    if col.startswith('A'):
        A.append(col)
    elif col.startswith('B'):
        B.append(col)
    elif col.startswith('C'):
        C.append(col)
    else:
        HEADERS.append(col)
# For headers take into account only the first 17 variables
HEADERS = HEADERS[:17]
# Group column variables
All_cols = [A, B, C]
# Create a final DF (the loop variable is named cols to avoid shadowing the built-in list)
for cols in All_cols:
    df_x = pd.melt(df,
                   id_vars=HEADERS,
                   value_vars=cols,
                   var_name=cols[0],
                   value_name=cols[0] + '_Val')
    # Concatenate the melted frame onto the running result
    df_final = pd.concat([df_final, df_x], axis=1)
# Delete duplicate columns
df_final = df_final.loc[:, ~df_final.columns.duplicated()]
I want to find a better, more maintainable solution for my problem, and I want to have a dataframe for every group of columns (same variable) as a result.
As a beginner in Python, I can't find a way of doing this.
I'm joining an image that explains what I want in case I didn't make it clear enough.
joined image
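One possible way to get a dataframe per group, sketched under assumptions: the groups are identified by a known list of prefixes, and every column not starting with one of them is a header column. The frame, the prefixes, and the dict named melted are all hypothetical.

```python
import pandas as pd

# Hypothetical frame: one header column plus repeated A and B columns.
df = pd.DataFrame({
    "id": [1, 2],
    "A1": [10, 20], "A2": [11, 21],
    "B1": [30, 40], "B2": [31, 41],
})

prefixes = ["A", "B"]
headers = [c for c in df.columns if not c.startswith(tuple(prefixes))]

# One long-format frame per prefix, keyed by the prefix.
melted = {
    p: df.melt(
        id_vars=headers,
        value_vars=[c for c in df.columns if c.startswith(p)],
        var_name=p,
        value_name=f"{p}_Val",
    )
    for p in prefixes
}
print(melted["A"])
```

Keeping the groups in a dict means adding a new variable only requires extending the prefix list.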

How to feed new columns every time in a loop to a spark dataframe?

I have a task of reading each column of a Cassandra table into a dataframe to perform some operations. For example, if the table has 5 columns, I want:
first column in the first iteration
first and second column in the second iteration to the same dataframe
and likewise.
I need a generic code. Has anyone tried similar to this? Please help me out with an example.
This will work:
df2 = pd.DataFrame()
for i in range(len(df.columns)):
    # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
    df2 = pd.concat([df2, df.iloc[:, 0:i+1]], sort=True)
Because the same column names recur on each iteration, they are aligned with the existing columns rather than duplicated, so each iteration keeps adding rows.
You can extract the names from the dataframe's schema, then access each column and use it the way you want.
names = df.schema.names
columns = []
for name in names:
    columns.append(name)
    # df[columns]: use it the way you want
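The "first column, then first and second, and so on" pattern from the question can be sketched independently of Spark as a generator of growing name lists. The helper cumulative_columns and the sample names are hypothetical; in PySpark the names would come from df.columns and each list could be passed to df.select.

```python
def cumulative_columns(names):
    """Yield the first 1, 2, ..., n column names as lists."""
    for end in range(1, len(names) + 1):
        yield names[:end]

# Hypothetical column names; in PySpark these would come from df.columns.
names = ["c1", "c2", "c3"]
steps = list(cumulative_columns(names))

# Each entry could be passed to a Spark frame as df.select(*cols).
for cols in steps:
    print(cols)
```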
