Python numpy stack rows into a single column - python

I am working on a data frame like the following and want to reshape them into a single column and create another column using the original index:
convert the above data frame by stacking each row (indexed by "year") into a single column (named "value") and create another column filled with these values' corresponding "year" to generate a new data frame with two columns (value, year) like the following
How can I quickly achieve this using any of the numpy commands?
Thank you.

It just came to me that I can do this rather quickly with the following code
df['year'] = df.index
stacked = df.set_index('year').stack()
df = stacked.reset_index(name='value')
df.drop('level_1', axis=1, inplace=True)
This should do the trick. I should have gave it more thought before lodging this question, sorry.

Related

moving a row next to another row in panda data frame

I am trying to format a data frame from 2 rows to 1 rows. but I am encountering some issues. Do you have any idea on how to do that? Here the code and df:
Thanks!
If you are looking to convert two rows into one, you can do the following...
Stack the dataframe and reset the index at level=1, which will convert the data and columns into a stack. This will end up having each of the column headers as a column (called level_1) and the data as another column(called 0)
Then set the index as level_1, which will move the column names as index
Remove the index name (level_1). Then transpose the dataframe
Code is shown below.
df3=df3.stack().reset_index(level=1).set_index('level_1')
df3.index.name = None
df3=df3.T
Output
df3

how to get column names in pandas of getdummies

After i created a data frame and make the function get_dummies on my dataframe:
df_final=pd.get_dummies(df,columns=['type'])
I got the new columns that I want and everything is working.
My question is, how can I get the new columns names of the get dummies? my dataframe is dynamic so I can't call is staticly, I wish to save all the new columns names on List.
An option would be:
df_dummy = pd.get_dummies(df, columns=target_cols)
df_dummy.columns.difference(df.columns).tolist()
where df is your original dataframe, df_dummy the output from pd.get_dummies, and target_cols your list of columns to get the dummies.

Add array to a dataframe in python

I have a datframe df, with the df.shape: (971,1)
And I have an array with the anarray.shape: (971,80).
How can I add the array to my dataframe, so that I have the shape: (971,81).
I only find solutions where the array goes into one column, but in my case it should go into several columns.
I believe you need helper DataFrame with same index like df and then DataFrame.join:
df = df.join(pd.DataFrame(anarray, index=df.index))

How can the following code be used to rearrange column names

Trying to understand that how does the following code rearranges the columns of the resultant dataframe as per the other dataframe.
df_with_intercept = df_with_intercept[df_scorecard['Feature_names'].values]
Please note that 'Feature names' column in df_scorecard has all the column names used in df_with_intercept with some scores against it.
Above code just rearranged the columns in df_with_intercept to match the order of rows in 'Feature names'.
This is being done to enable dot multiplication of relevant variables with each other.
df_scorecard['Feature_names']
inputs_test_with_ref_cat_w_intercept = \
inputs_test_with_ref_cat_w_intercept[df_scorecard['Feature name'].values]
I think this might help explain things a bit.
Updating column names of one Pandas dataframe from the column of another dataframe
Start with a dataframe of data
df_pokemon = pd.DataFrame({
"A": ["Eevee", "Vaporeon", "Flareon"],
"C": ["Pichu", "Pikachu", "Raichu"]})
This produces a dataframe which looks like this
Create the dataframe which has the label names
df_labels = pd.DataFrame({
"X": ["Pikachu_line", "Eevvee_line"]})
This produces a dataframe which looks like this
I can use the X column from df_labels to replace the column names in df_pokemon
df_pokemon.columns = df_labels['X'].tolist()
Thus
How to change the order of columns in one dataframe based one the data in another dataframes column
Let say want to switch the columns in df_pokemon, we can do this.
I've created a new df_labels which has an updated order (pikachu and eevee are switched)
df_labels = pd.DataFrame({"X": ["Pikachu_line", "Eevvee_line"]})
I can use this data in column X to dictate the order in df_pokemon
df_pokemon[df_labels['X'].tolist()]
You will see the order of columns has changed

How to add values to a new column in pandas dataframe?

I want to create a new named column in a Pandas dataframe, insert first value into it, and then add another values to the same column:
Something like:
import pandas
df = pandas.DataFrame()
df['New column'].append('a')
df['New column'].append('b')
df['New column'].append('c')
etc.
How do I do that?
If I understand, correctly you want to append a value to an existing column in a pandas data frame. The thing is with DFs you need to maintain a matrix-like shape so the number of rows is equal for each column what you can do is add a column with a default value and then update this value with
for index, row in df.iterrows():
df.at[index, 'new_column'] = new_value
Dont do it, because it's slow:
updating an empty frame a-single-row-at-a-time. I have seen this method used WAY too much. It is by far the slowest. It is probably common place (and reasonably fast for some python structures), but a DataFrame does a fair number of checks on indexing, so this will always be very slow to update a row at a time. Much better to create new structures and concat.
Better to create a list of data and create DataFrame by contructor:
vals = ['a','b','c']
df = pandas.DataFrame({'New column':vals})
If in case you need to add random values to the newly created column, you could also use
df['new_column']= np.random.randint(1, 9, len(df))

Categories