How to get the column names created by pandas get_dummies - Python

After creating a data frame, I called get_dummies on it:
df_final=pd.get_dummies(df,columns=['type'])
I got the new columns that I want and everything is working.
My question is: how can I get the names of the new columns that get_dummies created? My dataframe is dynamic, so I can't hard-code the names. I want to save all the new column names in a list.

An option would be:
df_dummy = pd.get_dummies(df, columns=target_cols)
df_dummy.columns.difference(df.columns).tolist()
where df is your original dataframe, df_dummy is the output of pd.get_dummies, and target_cols is the list of columns to encode. Note that Index.difference returns the names in sorted order.
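As a minimal sketch of the approach above: the sample data here is hypothetical (only the 'type' column name comes from the question), and the list comprehension is an alternative, not part of the original answer, for cases where the creation order of the dummy columns matters.

```python
import pandas as pd

# Hypothetical sample data; only the 'type' column name comes from the question.
df = pd.DataFrame({"id": [1, 2, 3], "type": ["b", "a", "b"]})
target_cols = ["type"]

df_dummy = pd.get_dummies(df, columns=target_cols)

# Index.difference returns the new column names in sorted order.
new_cols_sorted = df_dummy.columns.difference(df.columns).tolist()

# A list comprehension keeps the order in which get_dummies created the columns.
new_cols_ordered = [c for c in df_dummy.columns if c not in df.columns]

print(new_cols_sorted)
print(new_cols_ordered)
```

Both variants assume that no dummy column happens to share a name with a column of the original dataframe.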

Related

DataFrame.melt() not pivoting columns

I have a CSV file that contains years in columns like this:
I want to create one "year" column with the values in a new column.
I tried using pandas.melt, but it doesn't seem to be changing the dataframe.
Here is the relevant code:
international_df = pd.read_csv("data/International/PASSENGER_DATA.csv",delimiter=',')
international_df.melt(id_vars=['Country Name','Country Code','Indicator Name','Indicator Code'],var_name='year',value_name='Passengers').sort_values('Country Name')
I have tried adding the years to a list and passing that in to value_vars, but this doesn't work either. If value_vars is not specified (as above), it should pivot on all columns that aren't in id_vars. Any idea why this isn't working?
The .melt() method doesn't modify the dataframe in place; it returns a new one, so you need to save the returned frame:
international_df = pd.read_csv("data/International/PASSENGER_DATA.csv",delimiter=',')
print(international_df)
newdf = international_df.melt(id_vars=['Country Name','Country Code','Indicator Name','Indicator Code'],v
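The behaviour can be checked on a small frame. This is only a sketch: the data below is a hypothetical stand-in for the CSV, with two of the id columns from the question and two made-up year columns.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: one column per year.
wide = pd.DataFrame({
    "Country Name": ["Aruba", "Belgium"],
    "Country Code": ["ABW", "BEL"],
    "2019": [10.0, 20.0],
    "2020": [11.0, 21.0],
})

# melt returns a new long-format frame; `wide` itself is left unchanged.
long = wide.melt(
    id_vars=["Country Name", "Country Code"],
    var_name="year",
    value_name="Passengers",
)

print(wide.shape)  # still the wide layout
print(long)
```

Reassigning the result to the original name would have the same effect as using a new variable.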

Averaging data of dataframe columns based on redundancy of another column

I want to average the data of one column in a pandas dataframe if the rows share the same 'id', which is stored in another column of the same dataframe. To make it simple, I have:
and I want:
where it is clear that the elements of the 'nx' and 'ny' columns have been averaged wherever the value of 'nodes' was the same. The column 'maille', on the other hand, has to remain untouched.
I'm trying with groupby, but so far I haven't managed to keep the column 'maille' as it is.
Any ideas?
Use GroupBy.transform, specifying the column names to aggregate in a list, and assign the result back:
cols = ['nx','ny']
df[cols] = df.groupby('nodes')[cols].transform('mean')
print (df)
Another idea with DataFrame.update:
df.update(df.groupby('nodes')[cols].transform('mean'))
print (df)
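A small worked example of the transform approach, using made-up values (the column names 'maille', 'nodes', 'nx' and 'ny' come from the question; the numbers are assumptions):

```python
import pandas as pd

# Made-up values; node 10 appears in two rows.
df = pd.DataFrame({
    "maille": [1, 1, 2, 2],
    "nodes": [10, 11, 10, 12],
    "nx": [1.0, 2.0, 3.0, 4.0],
    "ny": [0.0, 1.0, 2.0, 3.0],
})

cols = ["nx", "ny"]

# transform returns a result aligned to the original rows, so the group
# means can be written straight back without touching 'maille'.
df[cols] = df.groupby("nodes")[cols].transform("mean")

print(df)
```

Because transform keeps the original index and row count, no merge back onto the dataframe is needed.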

How can these two dataframes be merged on a specific key?

I have two dataframes, both with a column 'hotelCode' that is type string. I made sure to convert both columns to string beforehand.
The first dataframe, which we'll call old_DF, looks like so:
and the second dataframe new_DF looks like:
I have been trying to merge these unsuccessfully. I've tried
final_DF = new_DF.join(old_DF, on = 'hotelCode')
and get this error:
I've tried a variety of things: changing the index name, various merge/join/concat and just haven't been successful.
Ideally, I will have a new dataframe where you have columns [[hotelCode, oldDate, newDate]] under one roof.
import pandas as pd
final_DF = pd.merge(old_DF, new_DF, on='hotelCode', how='outer')
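A sketch of the merge on hypothetical data (the hotel codes and dates below are invented). It also shows a join-based variant: DataFrame.join with on= matches the caller's column against the index of the other frame, so the other frame needs 'hotelCode' as its index first. Whether that was the cause of the error in the question is an assumption, since the error text is not shown.

```python
import pandas as pd

# Invented sample data with string hotel codes.
old_DF = pd.DataFrame({"hotelCode": ["A1", "B2"],
                       "oldDate": ["2020-01-01", "2020-02-01"]})
new_DF = pd.DataFrame({"hotelCode": ["B2", "C3"],
                       "newDate": ["2021-02-01", "2021-03-01"]})

# Outer merge keeps hotel codes that appear in either frame.
final_DF = pd.merge(old_DF, new_DF, on="hotelCode", how="outer")

# join(on=...) looks up new_DF['hotelCode'] in the *index* of the other frame,
# so old_DF is indexed by 'hotelCode' first. This is a left join by default.
joined = new_DF.join(old_DF.set_index("hotelCode"), on="hotelCode")

print(final_DF)
print(joined)
```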

How to create a dataframe with the column included in groupby clause?

I have a data frame. It has 3 columns, including A and Amount. I have done a group by on 'A'. Now I want to put these values into a new data frame. How can I achieve this?
top_plt=pd.DataFrame(top_plt.groupby('A')['Amount'].sum())
The resulting dataframe contains only the Amount column but the groupby 'A' column is missing.
Example:
Result:
The DataFrame constructor is not necessary; it is better to pass as_index=False to groupby:
top_plt= top_plt.groupby('A', as_index=False)['Amount'].sum()
Or call reset_index on the result:
top_plt= top_plt.groupby('A')['Amount'].sum().reset_index()
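To see the difference between the variants, here is a sketch on invented data (only the column names 'A' and 'Amount' come from the question):

```python
import pandas as pd

# Invented sample data.
top_plt = pd.DataFrame({"A": ["x", "y", "x"], "Amount": [1, 2, 3]})

# Plain groupby: 'A' ends up in the index, so the result has no 'A' column.
as_series = top_plt.groupby("A")["Amount"].sum()

# as_index=False keeps 'A' as a regular column.
as_frame = top_plt.groupby("A", as_index=False)["Amount"].sum()

# reset_index moves 'A' from the index back into a column.
via_reset = top_plt.groupby("A")["Amount"].sum().reset_index()

print(as_series)
print(as_frame)
print(via_reset)
```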

How can the following code be used to rearrange column names

I am trying to understand how the following code rearranges the columns of the resulting dataframe to match the other dataframe.
df_with_intercept = df_with_intercept[df_scorecard['Feature_names'].values]
Please note that the 'Feature_names' column in df_scorecard contains all the column names used in df_with_intercept, with some scores against them.
The code above just rearranges the columns in df_with_intercept to match the order of the rows in 'Feature_names'.
This is done to enable dot multiplication of the corresponding variables with each other.
df_scorecard['Feature_names']
inputs_test_with_ref_cat_w_intercept = \
inputs_test_with_ref_cat_w_intercept[df_scorecard['Feature name'].values]
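Why the column order matters for the dot product can be shown with a sketch. The feature names, values and the 'Score' column below are hypothetical; only df_scorecard, df_with_intercept and 'Feature_names' come from the question.

```python
import pandas as pd

# Hypothetical scorecard: one row per feature, with a score for each.
df_scorecard = pd.DataFrame({
    "Feature_names": ["Intercept", "age", "income"],
    "Score": [1.0, 2.0, 3.0],  # 'Score' is an assumed column name
})

# Hypothetical inputs whose columns are in a different order.
df_with_intercept = pd.DataFrame({
    "income": [10.0, 20.0],
    "Intercept": [1.0, 1.0],
    "age": [5.0, 6.0],
})

# Selecting with an array of names returns the columns in that order.
df_with_intercept = df_with_intercept[df_scorecard["Feature_names"].values]

# With columns aligned to the scorecard rows, position i in each input row
# is multiplied by the score for the same feature.
totals = df_with_intercept.values.dot(df_scorecard["Score"].values)

print(list(df_with_intercept.columns))
print(totals)
```

Without the reordering step, the positional dot product would pair each value with the wrong score.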
I think this might help explain things a bit.
Updating column names of one Pandas dataframe from the column of another dataframe
Start with a dataframe of data
df_pokemon = pd.DataFrame({
"A": ["Eevee", "Vaporeon", "Flareon"],
"C": ["Pichu", "Pikachu", "Raichu"]})
This produces a dataframe which looks like this
Create the dataframe which has the label names
df_labels = pd.DataFrame({
"X": ["Eevvee_line", "Pikachu_line"]})
This produces a dataframe which looks like this
I can use the X column from df_labels to replace the column names in df_pokemon
df_pokemon.columns = df_labels['X'].tolist()
Thus
How to change the order of columns in one dataframe based on the data in another dataframe's column
Let's say we want to switch the columns in df_pokemon; we can do this.
I've created a new df_labels which has an updated order (Pikachu and Eevee are switched)
df_labels = pd.DataFrame({"X": ["Pikachu_line", "Eevvee_line"]})
I can use this data in column X to dictate the order in df_pokemon
df_pokemon[df_labels['X'].tolist()]
You will see the order of columns has changed
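Putting the two steps together as a runnable sketch. It assumes the first label list names the Eevee column first, and it keeps the 'Eevvee_line' spelling used above so the labels match. Note that selecting with a list returns a new dataframe, so the result has to be assigned if the new order should be kept.

```python
import pandas as pd

df_pokemon = pd.DataFrame({
    "A": ["Eevee", "Vaporeon", "Flareon"],
    "C": ["Pichu", "Pikachu", "Raichu"]})

# Step 1: rename the columns from a list of labels (assumed order: Eevee first).
df_pokemon.columns = ["Eevvee_line", "Pikachu_line"]

# Step 2: a second frame dictates the desired column order.
df_labels = pd.DataFrame({"X": ["Pikachu_line", "Eevvee_line"]})

# Selection returns a reordered copy; df_pokemon itself keeps its order.
reordered = df_pokemon[df_labels["X"].tolist()]

print(list(df_pokemon.columns))
print(list(reordered.columns))
```

Every label in df_labels['X'] must exist as a column in df_pokemon, otherwise the selection raises a KeyError.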
