How do I convert a numpy array into a dataframe column. Let's say I have created an empty dataframe, df, and I loop through code to create 5 numpy arrays. Each iteration of my for loop, I want to convert the numpy array I have created in that iteration into a column in my dataframe. Just to clarify, I do not want to create a new dataframe every iteration of my loop, I only want to add a column to the existing one. The code I have below is sketchy and not syntactically correct, but illustrates my point.
df = pd.dataframe()
for i in range(5):
arr = create_numpy_arr(blah) # creates a numpy array
df[i] = # convert arr to df column
This is the simplest way:
df['column_name']=pd.Series(arr)
Since you want to create a column and not an entire DataFrame from your array, you could do
import pandas as pd
import numpy as np
column_series = pd.Series(np.array([0, 1, 2, 3]))
To assign that column to an existing DataFrame:
df = df.assign(column_name=column_series)
The above will add a column named column_name into df.
If, instead, you don't have any DataFrame to assign those values to, you can pass a dict to the constructor to create a named column from your numpy array:
df = pd.DataFrame({ 'column_name': np.array([0, 1, 2, 3]) })
That will work
import pandas as pd
import numpy as np
df = pd.DataFrame()
for i in range(5):
arr = np.random.rand(10)
df[i] = arr
Maybe a simpler way is to use the vectorization
arr = np.random.rand(10, 5)
df = pd.DataFrame(arr)
Related
I trying to create a code to take all my data and create groups/DF.
For example, I have 4000 rows in my data but I want to read the first 100 and create a DataFRAME with this first 100, read the next 100 and create a another DataFRAme until the EOF.
I started with that but I just can take all the data:
for index, rows in df_.iterrows():
# Create list for the current row
my_list = rows.Temperatura
# append the list to the final list
Row_list.append(my_list)
You can numpy's array_split:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(list(range(1000)))
In [4]: np.array_split(df, df.count() / 100)
I am trying to assign values to some rows using pandas dataframe. Is there any function to do this?
For a whole column:
df = df.assign(column=value)
... where column is the name of the column.
For a specific column of a specific row:
df.at[row, column] = value
... where row is the index of the row, and column is the name of the column.
The later changes the dataframe "in place".
There is a good tutorial here.
Basically, try this:
import pandas as pd
import numpy as np
# Creating a dataframe
# Setting the seed value to re-generate the result.
np.random.seed(25)
df = pd.DataFrame(np.random.rand(10, 3), columns =['A', 'B', 'C'])
# np.random.rand(10, 3) has generated a
# random 2-Dimensional array of shape 10 * 3
# which is then converted to a dataframe
df
You will get something like this:
I am doing some modifications to a dataframe with a for loop. I am adding a new column every cycle of the for loop, however, I also drop this column at the end of the cycle. I would like to know if it is possible to store the values of this column per cycle, and create a new dataframe that is made of each of these columns that were generated per cycle. I am using the following code:
import numpy as np
import pandas as pd
newdf = np.zeros([1000,5])
df = pd.DataFrame(np.random.choice([0.0, 0.05], size=(1000,1000)))
for i in range(0, 10):
df['sum']= df.iloc[:, -1000:].sum(axis=1)
newdf[:,i] = df['sum']
df = df.drop('sum', 1)
However, I get the following error:
index 5 is out of bounds for axis 1 with size 5
Thanks
The issue occurs not because of anything that has to do with df, but because when i = 5, newdf[:, i] refers to the sixth column of a NumPy array containing only five columns. If, instead, you initialize newdf through newdf = np.zeros([1000, 10]), or loop only over range(5), then your code runs without errors.
I have a dataframe that currently looks like this:
import numpy as np
raw_data = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SP':[35.6,56.7,41,41],'1M':[-7.8,56,56,-3.4],'3M':[24,-31,53,5]}
import pandas as pd
df = pd.DataFrame(raw_data,columns=['Series_Date','SP','1M','3M'])
print df
I would like to transponse in a way such that all the value fields get transposed to the Value Column and the date is appended as a row item. The column name of the value field becomes a row for the Description column. That is the resulting Dataframe should look like this:
import numpy as np
raw_data = {'Series_Date':['2017-03-10','2017-03-10','2017-03-10','2017-03-13','2017-03-13','2017-03-13','2017-03-14','2017-03-14','2017-03-14','2017-03-15','2017-03-15','2017-03-15'],'Value':[35.6,-7.8,24,56.7,56,-31,41,56,53,41,-3.4,5],'Desc':['SP','1M','3M','SP','1M','3M','SP','1M','3M','SP','1M','3M']}
import pandas as pd
df = pd.DataFrame(raw_data,columns=['Series_Date','Value','Desc'])
print df
Could someone please help how I can flip and transpose my DataFrame this way?
Use pd.melt to transform DF from a wide format to a long one:
idx = "Series_Date" # identifier variable
pd.melt(df, id_vars=idx, var_name="Desc").sort_values(idx).reset_index(drop=True)
Using python3 I wrote a code for calculating data. Code is as follows:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
def data(symbols):
dates = pd.date_range('2016/01/01','2016/12/23')
df=pd.DataFrame(index=dates)
for symbol in symbols:
df_temp=pd.read_csv("/home/furqan/Desktop/Data/{}.csv".format(symbol),
index_col='Date',parse_dates=True,usecols=['Date',"Close"],
na_values = ['nan'])
df_temp=df_temp.rename(columns={'Close':symbol})
df=df.join(df_temp)
df=df.fillna(method='ffill')
df=df.fillna(method='bfill')
df=(df/df.ix[0,: ])
return df
symbols = ['FABL','HINOON']
df=data(symbols)
print(df)
p_value=(np.zeros((2,2),dtype="float"))
p_value[0,0]=0.5
p_value[1,1]=0.5
print(df.shape[1])
print(p_value.shape[0])
df=np.dot(df,p_value)
print(df.shape[1])
print(df.shape[0])
print(df)
When I print df for second time the index has vanished. I think the issue is due to matrix multiplication. How can I get the indexing and column headings back into df?
To resolve your issue, because you are using numpy methods, these typically return a numpy array which is why any existing columns and index labels will have been lost.
So instead of
df=np.dot(df,p_value)
you can do
df=df.dot(p_value)
Additionally because p_value is a pure numpy array, there is no column names here so you can either create a df using existing column names:
p_value=pd.DataFrame(np.zeros((2,2),dtype="float"), columns = df.columns)
or just overwrite the column names directly after calculating the dot product like so:
df.columns = ['FABL', 'HINOON']