I am working on a data frame like the following and want to reshape it into a single column, with a second column built from the original index:
I want to convert the above data frame by stacking each row (indexed by "year") into a single column (named "value"), and add another column holding each value's corresponding "year", to produce a new data frame with two columns (value, year) like the following:
How can I quickly achieve this using any of the numpy commands?
Thank you.
It just came to me that I can do this rather quickly with the following code
df['year'] = df.index
stacked = df.set_index('year').stack()
df = stacked.reset_index(name='value')
df.drop('level_1', axis=1, inplace=True)
This should do the trick. I should have given it more thought before posting this question, sorry.
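For reference, here is a minimal self-contained sketch of the same idea on a small made-up frame. The column names a and b and the sample values are assumptions, since the original data is not shown. It stacks the rows, names the stacked values, and keeps only the year level as a column:

```python
import pandas as pd

# Hypothetical sample data: rows indexed by year, two arbitrary columns
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[2000, 2001])
df.index.name = "year"

long_df = (
    df.stack()                    # Series indexed by (year, original column)
      .rename("value")            # name the stacked values
      .reset_index(level="year")  # move only the year level into a column
      .reset_index(drop=True)     # discard the original column labels
      [["value", "year"]]         # order the columns as (value, year)
)
print(long_df)
```

Resetting only the year level avoids creating and then dropping the intermediate level_1 column.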
I am trying to reshape a data frame from 2 rows into 1 row, but I am encountering some issues. Do you have any idea how to do that? Here is the code and df:
Thanks!
If you are looking to convert two rows into one, you can do the following...
Stack the dataframe and reset the index at level=1. This turns the frame into long form, with the original column headers in one column (called level_1) and the data in another column (called 0).
Then set level_1 as the index, which moves the column names into the index.
Finally, remove the index name (level_1) and transpose the dataframe.
Code is shown below.
df3=df3.stack().reset_index(level=1).set_index('level_1')
df3.index.name = None
df3=df3.T
Output
df3
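Since the original df3 is not shown, the following is only a sketch on an assumed frame in which each of the two rows fills a different set of columns (the names a and b and the values are made up). It follows the same steps, with an explicit dropna() added because newer pandas versions may keep NaN entries when stacking:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: each row fills a different set of columns
df3 = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

wide = (
    df3.stack()
       .dropna()               # explicit, since newer pandas may keep NaN when stacking
       .reset_index(level=1)   # column labels become a column named "level_1"
       .set_index("level_1")   # then move those labels into the index
)
wide.index.name = None
wide = wide.T                  # one row, one column per original label
print(wide)
```

If both rows hold values for the same column, this produces duplicate column names, so the approach only suits frames where the rows complement each other.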
After I created a data frame, I applied get_dummies to it:
df_final=pd.get_dummies(df,columns=['type'])
I got the new columns that I want and everything is working.
My question is: how can I get the names of the new columns created by get_dummies? My dataframe is dynamic, so I can't refer to them statically. I would like to save all the new column names in a list.
An option would be:
df_dummy = pd.get_dummies(df, columns=target_cols)
df_dummy.columns.difference(df.columns).tolist()
where df is your original dataframe, df_dummy is the output of pd.get_dummies, and target_cols is your list of columns to get dummies for.
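A small runnable sketch of this, using made-up data (the id column and the category values are assumptions). Note that Index.difference returns the names sorted; the list comprehension at the end is an alternative that keeps the order in which the columns appear in df_dummy:

```python
import pandas as pd

# Hypothetical data with one categorical column to encode
df = pd.DataFrame({"id": [1, 2, 3], "type": ["x", "y", "x"]})
target_cols = ["type"]

df_dummy = pd.get_dummies(df, columns=target_cols)

# Names present after encoding but not before (sorted by Index.difference)
new_cols = df_dummy.columns.difference(df.columns).tolist()

# Same names, kept in the order they appear in df_dummy
new_cols_ordered = [c for c in df_dummy.columns if c not in df.columns]
print(new_cols)
```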
I have a dataframe df with df.shape: (971, 1).
And I have an array with anarray.shape: (971, 80).
How can I add the array to my dataframe so that the result has the shape (971, 81)?
I only find solutions where the array goes into one column, but in my case it should go into several columns.
I believe you need a helper DataFrame with the same index as df, and then DataFrame.join:
df = df.join(pd.DataFrame(anarray, index=df.index))
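A scaled-down sketch of the same join, with assumed shapes (5, 1) and (5, 3) instead of (971, 1) and (971, 80). The column name target and the feat_ prefix are hypothetical; the prefix just gives the new columns string names instead of the default integers 0, 1, 2:

```python
import numpy as np
import pandas as pd

# Hypothetical small stand-ins for the (971, 1) frame and (971, 80) array
df = pd.DataFrame({"target": range(5)}, index=list("abcde"))
anarray = np.arange(15).reshape(5, 3)

# Helper frame shares df's index so the rows line up on join
helper = pd.DataFrame(anarray, index=df.index).add_prefix("feat_")
out = df.join(helper)
print(out.shape)
```

Passing index=df.index matters when df does not have the default 0..n-1 index; without it the join would align on mismatched labels and fill the new columns with NaN.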
I am trying to understand how the following code rearranges the columns of the resulting dataframe to match the other dataframe.
df_with_intercept = df_with_intercept[df_scorecard['Feature_names'].values]
Please note that the 'Feature_names' column in df_scorecard holds all the column names used in df_with_intercept, with a score against each.
The code above just rearranges the columns in df_with_intercept to match the order of the rows in 'Feature_names'.
This is done so that the relevant variables line up with each other for dot multiplication.
df_scorecard['Feature_names']
inputs_test_with_ref_cat_w_intercept = \
inputs_test_with_ref_cat_w_intercept[df_scorecard['Feature name'].values]
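To illustrate why the ordering matters, here is a sketch with made-up data. The Score column name, the feature names, and all values are assumptions; only the indexing pattern comes from the code above:

```python
import pandas as pd

# Hypothetical scorecard: one row per feature, with a score for each
df_scorecard = pd.DataFrame({
    "Feature_names": ["Intercept", "age", "income"],
    "Score": [10.0, 2.0, 3.0],
})

# Hypothetical inputs whose columns are in a different order
df_with_intercept = pd.DataFrame({
    "income": [1.0, 0.0],
    "Intercept": [1.0, 1.0],
    "age": [0.0, 1.0],
})

# Selecting with an array of labels returns the columns in that order
aligned = df_with_intercept[df_scorecard["Feature_names"].values]

# Column i now matches score i, so a plain dot product is meaningful
totals = aligned.values.dot(df_scorecard["Score"].values)
print(totals)
```

Without the reordering step, the dot product would silently multiply each score by the wrong variable, because NumPy arrays carry no column labels.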
I think this might help explain things a bit.
Updating column names of one Pandas dataframe from the column of another dataframe
Start with a dataframe of data
df_pokemon = pd.DataFrame({
"A": ["Eevee", "Vaporeon", "Flareon"],
"C": ["Pichu", "Pikachu", "Raichu"]})
This produces a dataframe which looks like this
Create the dataframe which has the label names
df_labels = pd.DataFrame({
"X": ["Eevvee_line", "Pikachu_line"]})
This produces a dataframe which looks like this
I can use the X column from df_labels to replace the column names in df_pokemon
df_pokemon.columns = df_labels['X'].tolist()
Thus
How to change the order of columns in one dataframe based on the data in another dataframe's column
Let's say we want to switch the columns in df_pokemon. We can do this.
I've created a new df_labels which has an updated order (Pikachu and Eevee are switched)
df_labels = pd.DataFrame({"X": ["Pikachu_line", "Eevvee_line"]})
I can use this data in column X to dictate the order in df_pokemon
df_pokemon[df_labels['X'].tolist()]
You will see the order of columns has changed
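Putting both steps together, a runnable sketch of the walkthrough. It assumes the first labels frame lists the labels in the same order as the existing columns, and df_order is a hypothetical name for the second frame holding the desired order:

```python
import pandas as pd

df_pokemon = pd.DataFrame({
    "A": ["Eevee", "Vaporeon", "Flareon"],
    "C": ["Pichu", "Pikachu", "Raichu"]})

# Labels listed in the same order as the existing columns (assumed here)
df_labels = pd.DataFrame({"X": ["Eevvee_line", "Pikachu_line"]})
df_pokemon.columns = df_labels["X"].tolist()

# Hypothetical second frame listing the same labels in the desired order
df_order = pd.DataFrame({"X": ["Pikachu_line", "Eevvee_line"]})
reordered = df_pokemon[df_order["X"].tolist()]
print(reordered.columns.tolist())
```

Assigning to .columns renames by position, while indexing with a list of labels selects by name, so the first step needs the labels in the current column order and the second step needs labels that already exist.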
I want to create a new named column in a Pandas dataframe, insert a first value into it, and then add further values to the same column:
Something like:
import pandas
df = pandas.DataFrame()
df['New column'].append('a')
df['New column'].append('b')
df['New column'].append('c')
etc.
How do I do that?
If I understand correctly, you want to append a value to an existing column in a pandas data frame. The thing is, a DataFrame needs to keep a matrix-like shape, so the number of rows is equal for each column. What you can do is add a column with a default value and then update this value with:
for index, row in df.iterrows():
df.at[index, 'new_column'] = new_value
Don't do it, because it's slow:
updating an empty frame a-single-row-at-a-time. I have seen this method used WAY too much. It is by far the slowest. It is probably common place (and reasonably fast for some python structures), but a DataFrame does a fair number of checks on indexing, so this will always be very slow to update a row at a time. Much better to create new structures and concat.
Better to build a list of data and create the DataFrame with the constructor:
vals = ['a','b','c']
df = pandas.DataFrame({'New column':vals})
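If the values arrive one at a time, one way to follow this advice is to collect them in a plain Python list and build the frame once at the end. The sketch below assumes that workflow; the extra rows d and e are made up to show how later values can be added with a single concat instead of row-by-row updates:

```python
import pandas as pd

# Collect values in a list first; list.append is cheap
vals = []
for item in ["a", "b", "c"]:
    vals.append(item)

# Build the frame once from the collected values
df = pd.DataFrame({"New column": vals})

# Hypothetical later batch: add it in one concat rather than row by row
more = pd.DataFrame({"New column": ["d", "e"]})
df = pd.concat([df, more], ignore_index=True)
print(df)
```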
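If you need to fill the newly created column with random values, you could also use the following (assuming numpy is imported as np; the upper bound 9 is exclusive, so the values range from 1 to 8):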
df['new_column']= np.random.randint(1, 9, len(df))