Add array to a dataframe in python

I have a dataframe df with df.shape: (971, 1)
and an array anarray with anarray.shape: (971, 80).
How can I add the array to my dataframe so that it has the shape (971, 81)?
I only find solutions where the array goes into one column, but in my case it should go into several columns.

I believe you need a helper DataFrame with the same index as df, and then DataFrame.join:
df = df.join(pd.DataFrame(anarray, index=df.index))
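The index argument matters because join aligns rows by index label, not by position. A minimal sketch with stand-in data of the question's shapes; the column name target, the f0..f79 column names and the non-default index (as df might have after filtering rows) are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Stand-ins with the question's shapes; 'target' is a hypothetical column name.
# The index 0, 2, 4, ... mimics a frame whose rows were filtered earlier.
df = pd.DataFrame({"target": np.arange(971)}, index=np.arange(971) * 2)
anarray = np.random.default_rng(0).random((971, 80))

# Reusing df.index makes join match rows one-to-one; string column names
# (hypothetical f0..f79) avoid clashing with any existing integer labels.
extra = pd.DataFrame(
    anarray,
    index=df.index,
    columns=[f"f{i}" for i in range(anarray.shape[1])],
)
df = df.join(extra)
print(df.shape)
```

Without index=df.index the helper frame would get the default labels 0..970, and on an index like the one above most rows would fail to match and be filled with NaN.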

Related

Read 1st column, 2nd column, and nth column to last column of pandas dataframe

I have a pandas dataframe df.
There are 27 columns in df.
I want to read the 1st and 2nd columns, plus the 10th through the last column of df. I can do this with df.iloc[:, [0,1,9,10,11,.....,26]], but this is too tedious to type if the dataframe has many columns. What is a more elegant way to select the columns?
I am using Python 3.7.
If you want to select columns by their numerical position, iloc is the right tool. You can use np.arange to add a range of columns (such as from the 10th column, at position 9, to the last one).
import pandas as pd
import numpy as np
cols = [0, 1]
cols.extend(np.arange(9, df.shape[1]))  # positions 9..last = 10th through last column
df.iloc[:,cols]
Alternatively, you can use NumPy's r_ slicing trick:
df.iloc[:, np.r_[0:2, 9:df.shape[1]]]
You can use "list" and "range":
df.iloc[:,[0,1]+list(range(9,27))]
Or the NumPy way:
df.iloc[:,np.append([0,1],np.arange(9,27))]
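A quick sanity check that the positional approaches above select the same columns, on a made-up 27-column frame whose labels c0..c26 are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy frame with 27 columns, matching the question's column count.
df = pd.DataFrame(
    np.arange(2 * 27).reshape(2, 27),
    columns=[f"c{i}" for i in range(27)],
)

# Positions 0 and 1 (1st and 2nd columns), then 9 to the end (10th to last).
a = df.iloc[:, np.r_[0:2, 9:df.shape[1]]]
b = df.iloc[:, [0, 1] + list(range(9, df.shape[1]))]
c = df.iloc[:, np.append([0, 1], np.arange(9, df.shape[1]))]

print(list(a.columns))
```

Using df.shape[1] instead of a hard-coded 27 keeps each variant correct if the column count changes.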
If you know the column names, you can try:
df = df[['col1', 'col2', 'coln']]
If you don't know the exact column names, you can try this:
list_of_columns_index = [1, 2, 3, n]  # n stands for any further column positions
df = df[[df.columns[i] for i in list_of_columns_index]]
Suppose you know the name of the starting column (the 10th column in this case), and assume it is stored in starting_column_name.
Using the column name makes the code more readable and saves you the trouble of counting columns to find the right one.
num_columns = df.shape[1] # number of columns in dataframe
starting_column = df.columns.get_loc(starting_column_name)
features = df.iloc[:, np.r_[0:2, starting_column:num_columns]]
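The same selection can also be written with a label slice, since df.loc[:, name:] runs from that label through the last column (label slices include both ends). A sketch, assuming unique column labels; the frame and the label c9 are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(3 * 27).reshape(3, 27),
    columns=[f"c{i}" for i in range(27)],
)
starting_column_name = "c9"  # hypothetical label of the 10th column

# Position-based, as in the answer above.
num_columns = df.shape[1]
starting_column = df.columns.get_loc(starting_column_name)
features_by_position = df.iloc[:, np.r_[0:2, starting_column:num_columns]]

# Label-based: first two columns by position, then everything from the label on.
features_by_name = pd.concat(
    [df.iloc[:, :2], df.loc[:, starting_column_name:]], axis=1
)
```

If the column labels are not unique, get_loc can return a slice or boolean mask instead of an integer, so both variants assume unique names.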

Can I get concat() to ignore column names and work only based on the position of the columns?

The docs, at least as of version 0.24.2, specify that pandas.concat can ignore the index with ignore_index=True, but also note: "the index values on the other axes are still respected in the join."
Is there a way to avoid this, i.e. to concatenate based on the position only, and ignoring the names of the columns?
I see two options:
rename the columns so they match, or
convert to numpy, concatenate in numpy, then convert from numpy back to pandas
Are there more elegant ways?
For example, if I want to add the series s as an additional row to the dataframe df, I can:
convert s to frame
transpose it
rename its columns so they are the same as those of df
concatenate
It works, but it seems very "un-pythonic"!
A toy example is below; this example is with a dataframe and a series, but the same concept applies with two dataframes.
import pandas as pd
df=pd.DataFrame()
df['a']=[1]
df['x']='this'
df['y']='that'
s=pd.Series([3,'txt','more txt'])
st=s.to_frame().transpose()
st.columns=df.columns
out= pd.concat( [df, st] , axis=0, ignore_index=True)
In the case of one dataframe and one series, assuming df has the default integer index (0 to n-1) so that df.shape[0] is a new row label, you can do:
df.loc[df.shape[0], :] = s.values
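For two dataframes, the rename option from the question can be wrapped in a small helper so concat effectively matches columns by position. A sketch; concat_by_position is a hypothetical name, and it relabels copies of the inputs rather than changing how pd.concat itself aligns:

```python
import pandas as pd


def concat_by_position(frames):
    """Stack DataFrames vertically, matching columns by position, not by name.

    Hypothetical helper: every frame is relabelled with the first frame's
    column names, so all frames must have the same number of columns.
    """
    first_cols = frames[0].columns
    relabelled = []
    for frame in frames:
        if frame.shape[1] != len(first_cols):
            raise ValueError("all frames must have the same number of columns")
        copy = frame.copy()
        copy.columns = first_cols  # positional relabel; the input frame is untouched
        relabelled.append(copy)
    return pd.concat(relabelled, axis=0, ignore_index=True)


df1 = pd.DataFrame({"a": [1], "x": ["this"], "y": ["that"]})
df2 = pd.DataFrame({"p": [3], "q": ["txt"], "r": ["more txt"]})
out = concat_by_position([df1, df2])
print(out)
```

A series can go through the same helper after s.to_frame().T, which turns it into a one-row frame.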

Slicing a set of columns when a pandas dataframe does not include column labels

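How can I slice the last n columns from a pandas dataframe, assuming the dataframe does not include column labels? For instance, I want to slice the last 4 columns:
import numpy as np
import pandas as pd
data = np.random.uniform(0, 10, (4, 10)).astype(int)  # np.int was removed in NumPy 1.24; use int
df = pd.DataFrame(data)
print(df.ix[:, 4])  # .ix was removed in pandas 1.0, and [:, 4] selects only column 4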
Can someone fix this up?
Use iloc instead of ix, with -4: as the column slice, to take the last four columns by position:
print(df.iloc[:,-4:])
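A reproducible version of the same idea for any n, using a seeded generator in place of the question's unseeded random data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
data = rng.integers(0, 10, size=(4, 10))  # 4 rows x 10 unlabeled columns
df = pd.DataFrame(data)  # columns get the default integer labels 0..9

n = 4
last_n = df.iloc[:, -n:]  # position-based slice: the last n columns
print(last_n)
```

Because iloc works on positions, the same slice keeps working if the frame later receives string column names.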

Adding a new column to a pandas dataframe

I have a dataframe df with one column and 500k rows (the first 5 rows of df are shown below). I want to append new data to the existing column. The new data is a matrix of 200k rows and 1 column. How can I do it? I also want to add a new column named op.
X098_DE_time
0.046104
-0.037134
-0.089496
-0.084906
-0.038594
We can use the concat function after renaming the column of the second dataframe (df2, whose column is named op):
df2.rename(columns={'op': 'X098_DE_time'}, inplace=True)
new_df = pd.concat([df, df2], axis=0)
Note: if we don't rename df2's column, the resulting new_df will have two different columns, each filled with NaN in the rows that came from the other dataframe.
To add a new column, assign one value per row:
df["new column"] = list_of_values
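If the new data is a NumPy matrix rather than a dataframe, the column name can be given when wrapping it, which makes the rename unnecessary. A sketch using the five rows from the question; the 3 x 1 matrix stands in for the 200k x 1 data and its values are made up, as are the zero placeholders in op:

```python
import numpy as np
import pandas as pd

# The five rows shown in the question; the real frame has ~500k rows.
df = pd.DataFrame(
    {"X098_DE_time": [0.046104, -0.037134, -0.089496, -0.084906, -0.038594]}
)

# Hypothetical stand-in for the 200k x 1 matrix (values made up).
matrix = np.array([[0.01], [0.02], [0.03]])

# Naming the column while wrapping the array means no rename is needed.
df2 = pd.DataFrame(matrix, columns=["X098_DE_time"])

# ignore_index=True renumbers rows 0..n-1 instead of repeating df2's 0, 1, 2.
new_df = pd.concat([df, df2], axis=0, ignore_index=True)

# A new column needs exactly one value per row; zeros are placeholders here.
new_df["op"] = np.zeros(len(new_df))
print(new_df.shape)
```

Without ignore_index=True the result keeps both original indexes, so labels 0, 1 and 2 would each appear twice.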

Python numpy stack rows into a single column

I am working on a data frame like the following and want to reshape it into a single column, and create another column from the original index:
I want to convert the data frame above by stacking each row (indexed by "year") into a single column (named "value"), and create another column filled with each value's corresponding "year", producing a new data frame with two columns (value, year) like the following:
How can I quickly achieve this using any of the numpy commands?
Thank you.
It just came to me that I can do this rather quickly with the following code:
df['year'] = df.index  # copy the index into a regular column
stacked = df.set_index('year').stack()  # one entry per (year, original column) pair
df = stacked.reset_index(name='value')  # columns: year, level_1, value
df.drop('level_1', axis=1, inplace=True)  # drop the old column labels
This should do the trick. I should have given it more thought before posting this question, sorry.
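Since the question asked about NumPy, here is a sketch of a pure NumPy route to the same two columns. The question's original table is not shown, so the years, column names and values below are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's table.
df = pd.DataFrame(
    {"c1": [1, 4, 7], "c2": [2, 5, 8], "c3": [3, 6, 9]},
    index=[2000, 2001, 2002],
)

# ravel() flattens row by row (C order), which matches stacking each row;
# np.repeat repeats each year once per column so the two arrays line up.
values = df.to_numpy().ravel()
years = np.repeat(df.index.to_numpy(), df.shape[1])
result = pd.DataFrame({"value": values, "year": years})
print(result)
```

One difference to keep in mind: depending on the pandas version, stack() may drop missing values by default, while ravel() keeps every cell, so the two routes can differ when the data contains NaN.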
