How can the following code be used to rearrange column names - python

I'm trying to understand how the following code rearranges the columns of the resulting dataframe to match the order given in another dataframe.
df_with_intercept = df_with_intercept[df_scorecard['Feature_names'].values]
Please note that the 'Feature names' column in df_scorecard holds all the column names used in df_with_intercept, each with a score against it.
The code above just rearranges the columns in df_with_intercept to match the order of the rows in 'Feature names'.
This is done so that the relevant variables line up with each other for dot multiplication.
df_scorecard['Feature_names']
inputs_test_with_ref_cat_w_intercept = \
inputs_test_with_ref_cat_w_intercept[df_scorecard['Feature name'].values]

I think this might help explain things a bit.
Updating column names of one Pandas dataframe from the column of another dataframe
Start with a dataframe of data
import pandas as pd

df_pokemon = pd.DataFrame({
    "A": ["Eevee", "Vaporeon", "Flareon"],
    "C": ["Pichu", "Pikachu", "Raichu"]})
This produces a dataframe which looks like this
Create the dataframe which has the label names
df_labels = pd.DataFrame({
    "X": ["Pikachu_line", "Eevvee_line"]})
This produces a dataframe which looks like this
I can use the X column from df_labels to replace the column names in df_pokemon
df_pokemon.columns = df_labels['X'].tolist()
Thus
How to change the order of columns in one dataframe based on the data in another dataframe's column
Let's say we want to switch the columns in df_pokemon. We can do it like this.
I've created a new df_labels which has an updated order (Pikachu and Eevee are switched)
df_labels = pd.DataFrame({"X": ["Eevvee_line", "Pikachu_line"]})
I can use this data in column X to dictate the order in df_pokemon
df_pokemon[df_labels['X'].tolist()]
You will see the order of columns has changed
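Tying this back to the original question, here is a minimal sketch of why the reordering matters before a dot product. The data is made up for illustration: the "Score" column and the feature names "Intercept", "age" and "income" are assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical scorecard: one row per feature, with its score.
df_scorecard = pd.DataFrame({
    "Feature_names": ["Intercept", "age", "income"],
    "Score": [10.0, 2.0, 0.5],
})

# Hypothetical inputs whose columns are in a different order.
df_with_intercept = pd.DataFrame({
    "income": [100.0, 200.0],
    "Intercept": [1.0, 1.0],
    "age": [30.0, 40.0],
})

# Selecting with a list (or array) of labels returns the columns
# in the order of that list, not in their original order.
feature_order = df_scorecard["Feature_names"].values
df_with_intercept = df_with_intercept[feature_order]

# Column k of the inputs now lines up with row k of the scorecard,
# so a positional dot product multiplies each feature by its own score.
scores = df_with_intercept.values.dot(df_scorecard["Score"].values)

print(list(df_with_intercept.columns))
print(scores)
```

Without the reordering step, the positional dot product would still run, but it would pair "income" with the intercept's score and so on, which is why the columns are aligned first.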

Related

Create Dataframe with a certain number of columns

I have the following Dataframe:
Now I want to copy the column "Power" into new columns of the same DataFrame, as many times as I like.
The column names should be: Power_1; Power_2; Power_3.....
Creating the Dataframe is too complicated to share, but a simple example how to add the columns with a while-loop would be sufficient.
# Start at 1 so the new columns are named Power_1 ... Power_10
for i in range(1, 11):
    df[f"Power_{i}"] = df["Power"]
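If a loop is not required, the same result can probably be had in one step. The sketch below is only an illustration: the small frame and the name n_copies are made up, since the real DataFrame was not shared.

```python
import pandas as pd

# Made-up stand-in for the real frame.
df = pd.DataFrame({"Power": [1.5, 2.0, 3.25]})

n_copies = 3  # hypothetical number of copies wanted

# Build all the new columns first, then attach them in a single call.
copies = {f"Power_{i}": df["Power"] for i in range(1, n_copies + 1)}
df = df.assign(**copies)

print(df.columns.tolist())
```

Attaching the columns in one call avoids inserting into the frame repeatedly, which may matter when the number of copies is large.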

how to get column names in pandas of getdummies

After creating a data frame, I applied get_dummies to it:
df_final=pd.get_dummies(df,columns=['type'])
I got the new columns that I want and everything is working.
My question is, how can I get the names of the new columns that get_dummies created? My dataframe is dynamic, so I can't refer to them statically. I'd like to save all the new column names in a list.
An option would be:
df_dummy = pd.get_dummies(df, columns=target_cols)
df_dummy.columns.difference(df.columns).tolist()
where df is your original dataframe, df_dummy the output from pd.get_dummies, and target_cols your list of columns to get the dummies.
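A small self-contained run of that idea, on made-up data (the "type" and "value" columns are just an example):

```python
import pandas as pd

# Made-up frame with one categorical column.
df = pd.DataFrame({"type": ["a", "b", "a"], "value": [1, 2, 3]})
target_cols = ["type"]

df_dummy = pd.get_dummies(df, columns=target_cols)

# Columns present after get_dummies but not before.
# Note that Index.difference returns its result sorted.
new_cols = df_dummy.columns.difference(df.columns).tolist()

# Alternative that keeps the order in which the columns appear in df_dummy.
new_cols_in_order = [c for c in df_dummy.columns if c not in df.columns]

print(new_cols)
print(new_cols_in_order)
```

The two lists hold the same names. They only differ in order when the dummy columns do not already appear in sorted order.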

remove multilevel column pivot table python

I have a pivot table with a multi-index as the column names, like this:
The data is correct and I want to keep it as it is, but I want to give each column a single name that combines all the index levels, to get something like this:
You can flatten a multi-index by converting it to a dataframe with text columns and joining them:
df.columns = df.columns.to_frame().astype(str).apply(''.join, axis=1)
The result should not be far from what you want. But as you have not given any reproducible example, I could not test against your data...
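Since no example data was given, here is a sketch on an invented pivot table. It uses a slightly different approach from the one-liner above: a list comprehension that joins the levels with an underscore, so the parts of each name stay readable. The column names "region", "year", "amount" and "units" are assumptions.

```python
import pandas as pd

# Invented data, only to get a pivot table with two column levels.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "year": [2020, 2021, 2020, 2021],
    "amount": [10, 20, 30, 40],
    "units": [1, 2, 3, 4],
})
table = sales.pivot_table(index="region", columns="year",
                          values=["amount", "units"])

# Each column label is a tuple such as ("amount", 2020);
# turn every level into text and join the pieces into one name.
table.columns = ["_".join(str(level) for level in col)
                 for col in table.columns]

print(table.columns.tolist())
```

The data itself is untouched. Only the column labels change from tuples to single strings.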

How to feed new columns every time in a loop to a spark dataframe?

I have a task that involves reading the columns of a Cassandra table into a dataframe, one at a time, to perform some operations. For example, if the table has 5 columns, I want:
first column in the first iteration
first and second column in the second iteration to the same dataframe
and likewise.
I need a generic code. Has anyone tried similar to this? Please help me out with an example.
This will work:
df2 = pd.DataFrame()
for i in range(len(df.columns)):
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df2 = pd.concat([df2, df.iloc[:, 0:i + 1]], sort=True)
Since the same column names come up again in every iteration, and a dataframe will not hold the same column name twice here, each pass adds new rows instead of new columns.
You can extract the names from dataframe's schema and then access that particular column and use it the way you want to.
names = df.schema.names
columns = []
for name in names:
    columns.append(name)
    # df[columns]: use it the way you want
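The "one more column per iteration" part can be sketched without Spark at all. The helper name growing_column_lists and the column names below are made up. With a Spark DataFrame the names would come from df.columns, and each list could be passed to df.select.

```python
from typing import Iterator, List


def growing_column_lists(names: List[str]) -> Iterator[List[str]]:
    """Yield [first], [first, second], ... up to the full list of names."""
    for end in range(1, len(names) + 1):
        yield names[:end]


# Hypothetical column names standing in for df.columns.
names = ["id", "name", "city"]

steps = list(growing_column_lists(names))
for cols in steps:
    # With a Spark DataFrame, df.select(*cols) would give the frame
    # restricted to the columns collected so far.
    print(cols)
```

Keeping the column bookkeeping separate from the dataframe call makes the loop generic: it does not need to know how many columns the table has.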

Python numpy stack rows into a single column

I am working on a data frame like the following and want to reshape it into a single column, with a second column built from the original index:
I want to convert the above data frame by stacking each row (indexed by "year") into a single column (named "value"), and to create another column filled with the corresponding "year" of each value, giving a new data frame with two columns (value, year) like the following:
How can I quickly achieve this using any of the numpy commands?
Thank you.
It just came to me that I can do this rather quickly with the following code
df['year'] = df.index
stacked = df.set_index('year').stack()
df = stacked.reset_index(name='value')
df.drop('level_1', axis=1, inplace=True)
This should do the trick. I should have given it more thought before posting this question, sorry.
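Since the question asked about numpy commands, here is a sketch of the same reshape using ravel and repeat. The sample frame is invented, and the approach assumes every cell should be kept.

```python
import numpy as np
import pandas as pd

# Invented frame: one row per year, a few value columns.
df = pd.DataFrame({"a": [1, 3], "b": [2, 4]},
                  index=pd.Index([2000, 2001], name="year"))

# ravel() walks the values row by row, so each row's cells stay together.
values = df.to_numpy().ravel()

# Repeat each year once per column so it lines up with that row's values.
years = np.repeat(df.index.to_numpy(), df.shape[1])

result = pd.DataFrame({"value": values, "year": years})
print(result)
```

One difference from stack() to keep in mind: this version keeps missing values, whereas stack() has traditionally dropped them.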
