Reshape (stack) pandas dataframe based on predefined number of rows


I have a pandas dataframe which looks like one long row.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
________________________________________________________________________________________________
2010 | 0.1 0.5 0.5 0.7 0.5 0.5 0.5 0.5 0.9 0.5 0.5 0.8 0.3 0.3 0.6
I would like to reshape it as:
0 1 2 3 4
____________________________________
|0| 0.1 0.5 0.5 0.7 0.5
2010 |1| 0.5 0.5 0.5 0.9 0.5
|2| 0.5 0.8 0.3 0.3 0.6
I can certainly do it with a loop, but I'm guessing (un)stack and/or pivot could do the trick; I just couldn't figure out how.
Symmetry/filling up blanks (if the data is not evenly divisible by the number of rows after unstacking) is not important for now.
EDIT:
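Meanwhile, I coded up the loop solution:
import pandas as pd

df = my_data_frame  # the long row, as a Series of 15 values indexed 0..14
dk = pd.DataFrame()
break_after = 5     # values per output row (15 values -> 3 rows of 5)
for i in range(len(df) // break_after):  # '//': range() needs an int in Python 3
    dl = pd.DataFrame(df[i*break_after:(i+1)*break_after]).T
    dl.columns = range(break_after)
    dk = pd.concat([dk, dl])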
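Since the question asks whether (un)stack can replace the loop, here is a minimal stack/unstack sketch. It assumes the data is a DataFrame with one long row per index value (the 2010 row from the question is rebuilt by hand) and that `n_cols`, the number of values per output row, is known in advance; both names are illustrative. Each original column position is split into an (output row, output column) pair, and the column part is unstacked.

```python
import pandas as pd

# Illustrative input: one long row indexed by year (values from the question)
df = pd.DataFrame(
    [[0.1, 0.5, 0.5, 0.7, 0.5, 0.5, 0.5, 0.5, 0.9, 0.5, 0.5, 0.8, 0.3, 0.3, 0.6]],
    index=[2010],
)
n_cols = 5  # assumed: values per output row (15 values / 3 rows)

# Long format: a Series indexed by (year, original column position)
s = df.stack()
year = s.index.get_level_values(0)
pos = s.index.get_level_values(1)

# Split each position into (output row, output column), then unstack the column part
s.index = pd.MultiIndex.from_arrays([year, pos // n_cols, pos % n_cols])
result = s.unstack()
print(result)
```

If the number of values is not a multiple of `n_cols`, `unstack` fills the missing cells of the last row with NaN, which covers the "filling up blanks" case without extra code.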

If there is only one index (2010), this will work fine:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.reshape(df.values, (3, 5)))  # 15 values -> 3 rows x 5 columns
df1['Index'] = '2010'
df1.set_index('Index', append=True, inplace=True)
df1 = df1.reorder_levels(['Index', None])  # put the 'Index' level first
Output:
0 1 2 3 4
Index
2010 0 0.1 0.5 0.5 0.7 0.5
1 0.5 0.5 0.5 0.9 0.5
2 0.5 0.8 0.3 0.3 0.6
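The same reshape idea extends to several index values without adding and reordering an index level. This is a sketch under the assumption that every original row holds the same number of values and that this number is divisible by the predefined row count `n_rows` (an illustrative name); the input data here is made up with `np.arange`. A row-major reshape makes each original row fill `n_rows` consecutive output rows, and `MultiIndex.from_product` labels them.

```python
import numpy as np
import pandas as pd

# Illustrative input: two index values, 15 values each
df = pd.DataFrame(np.arange(30, dtype=float).reshape(2, 15), index=[2010, 2011])
n_rows = 3  # assumed: predefined number of output rows per original row

# Row-major reshape: each original row fills n_rows consecutive output rows
values = df.to_numpy().reshape(len(df) * n_rows, -1)
index = pd.MultiIndex.from_product([df.index, range(n_rows)], names=["Index", None])
result = pd.DataFrame(values, index=index)
print(result)
```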

Related

How to transform the following dataset for time series analysis?

This is the dataset I want to transform for time series forecasting. The column names contain the store number.
df =
Date    store_1  store_2  store_3
1-1-21  0.5      0.2      0.3
1-2-21  0.3      0.7      0.1
1-3-21  0.6      0.9      0.4
I want to convert df to df1:
Date store number value
1-1-21 1 0.5
1-2-21 1 0.3
1-3-21 1 0.6
1-1-21 2 0.2
1-2-21 2 0.7
1-3-21 2 0.9
1-1-21 3 0.3
1-2-21 3 0.1
1-3-21 3 0.4

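Use melt:
import pandas as pd

out = df.melt(id_vars=['Date'], var_name='Store_number', value_name='Value')
# keep only the digits of 'store_N'; expand=False returns a Series
out['Store_number'] = out['Store_number'].str.extract(r'store_(\d+)', expand=False)
print(out)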
# Output:
Date Store_number Value
0 1-1-21 1 0.5
1 1-2-21 1 0.3
2 1-3-21 1 0.6
3 1-1-21 2 0.2
4 1-2-21 2 0.7
5 1-3-21 2 0.9
6 1-1-21 3 0.3
7 1-2-21 3 0.1
8 1-3-21 3 0.4
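An alternative sketch uses `pd.wide_to_long`, which splits column names like `store_1` into a stub (`store`) and a suffix (`1`) in one step. This assumes `Date` uniquely identifies each row, which `wide_to_long` requires for its `i` argument; the frame below just rebuilds the example data. A side effect worth noting is that the suffix is converted to a number, so `Store_number` comes out as integers rather than strings.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1-1-21", "1-2-21", "1-3-21"],
    "store_1": [0.5, 0.3, 0.6],
    "store_2": [0.2, 0.7, 0.9],
    "store_3": [0.3, 0.1, 0.4],
})

# stubnames="store" with sep="_" splits "store_1" into stub "store" and suffix 1;
# the suffix becomes the new "Store_number" index level
long = pd.wide_to_long(df, stubnames="store", i="Date", j="Store_number", sep="_")
out = long.rename(columns={"store": "Value"}).reset_index()
print(out)
```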
Update:
Can you suggest a way to get back to the original form? After forecasting, I need to turn the predictions back into a dataframe shaped like the original one.
out = out.pivot(index='Date', columns='Store_number', values='Value') \
.add_prefix('store_').rename_axis(columns=None).reset_index()
print(out)
Date store_1 store_2 store_3
0 1-1-21 0.5 0.2 0.3
1 1-2-21 0.3 0.7 0.1
2 1-3-21 0.6 0.9 0.4

Splitting by indices: I want to split the train + test from the data whose indices have been given. How shall I get train/test df?

For example, df is the data with features. I want to split it into train and test sets using indices that are given in a file. How do I get the train and test DataFrames?
df=
0 2 0.3 0.5 0.5
1 4 0.5 0.7 0.4
2 2 0.5 0.1 0.4
3 4 0.4 0.1 0.3
4 2 0.3 0.1 0.5
where the training indices are read from data_train.txt:
train = pd.read_csv('data_train.txt', header=None)
This dataframe holds the indices. How do I get the training rows from those indices?
Contents of data_train.txt (the data has 10000 rows; the training indices are listed in this file):
0
2
4
I want the rows at these indices, with all their features, as the training data.
The final train set should look like this (note the index):
0 2 0.3 0.5 0.5
2 2 0.5 0.1 0.4
4 2 0.3 0.1 0.5

If you have a df as given by:
0 1 2 3 4
0 0 2 0.3 0.5 0.5
1 1 4 0.5 0.7 0.4
2 2 2 0.5 0.1 0.4
3 3 4 0.4 0.1 0.3
4 4 2 0.3 0.1 0.5
and another train_indices as given by:
0
0 0
1 2
2 4
then all you need to do to get the corresponding rows of df depends on how the data is organised:
# train_indices is a one-column DataFrame; take column 0 as a list-like of ints
idx = train_indices[0]
# if you're trying to match the index of the df itself
train_df = df.iloc[idx.to_numpy()]
# if you're trying to match column 0, which might be important
# if it's not aligned to the index
train_df = df.loc[df[0].isin(idx)]
Both of these (in this case) return:
0 1 2 3 4
0 0 2 0.3 0.5 0.5
2 2 2 0.5 0.1 0.4
4 4 2 0.3 0.1 0.5
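The question also asks for the test set. A minimal sketch, assuming the numbers in data_train.txt are index labels of `df` (one per line, no header) and that every row not listed belongs to the test set; `io.StringIO` stands in for the real file so the example runs on its own. `df.loc` selects the listed labels and `df.drop` keeps the rest.

```python
import io

import pandas as pd

df = pd.DataFrame([
    [2, 0.3, 0.5, 0.5],
    [4, 0.5, 0.7, 0.4],
    [2, 0.5, 0.1, 0.4],
    [4, 0.4, 0.1, 0.3],
    [2, 0.3, 0.1, 0.5],
])

# Stand-in for data_train.txt: one row index per line, no header line
train_file = io.StringIO("0\n2\n4\n")
train_idx = pd.read_csv(train_file, header=None)[0]

train_df = df.loc[train_idx]        # rows whose index label is listed
test_df = df.drop(index=train_idx)  # every remaining row
print(train_df)
print(test_df)
```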

reshape dataframe with various length rows

I have a dataframe in python that looks like this:
ID Value
001 0.5
001 0.2
001 0.5
001 0.0
002 0.4
002 0.6
002 0.6
I would like the data to be reshaped into something like this:
ID Val1 Val2 Val3 Val4
001 0.5 0.2 0.5 0.0
002 0.4 0.6 0.6 NaN
Can anyone help with this? My first thought was de-melting the data with "pivot", but without a value denoting the "Val" position, it doesn't work as intended.
thanks!

Group by ID, reset the index within each group so the positions line up as columns, then unstack:
df.groupby('ID')['Value'].apply(lambda s: s.reset_index(drop=True)).unstack()
0 1 2 3
ID
1 0.5 0.2 0.5 0.0
2 0.4 0.6 0.6 NaN
Or, to keep ID as a regular column instead of the index:
df.sort_values('ID').groupby('ID')['Value'].apply(lambda s: s.reset_index(drop=True)).unstack().reset_index()
ID 0 1 2 3
0 1 0.5 0.2 0.5 0.0
1 2 0.4 0.6 0.6 NaN

You can assign an indexer series, then pivot:
res = df.assign(ValNum=df.groupby('ID').cumcount()+1)\
.pivot(index='ID', columns='ValNum', values='Value')\
.reset_index()
print(res)
ValNum ID 1 2 3 4
0 1 0.5 0.2 0.5 0.0
1 2 0.4 0.6 0.6 NaN

This might work:
>>> df = pd.DataFrame({"id": ["001"]*4 + ["002"]*3, "value": [0.5, 0.2, 0.5, 0.0, 0.4, 0.6, 0.6]})
>>> df
id value
0 001 0.5
1 001 0.2
2 001 0.5
3 001 0.0
4 002 0.4
5 002 0.6
6 002 0.6
>>> pd.concat([pd.Series(list(g["value"]), name=x) for x, g in df.groupby("id")], axis=1).T
0 1 2 3
001 0.5 0.2 0.5 0.0
002 0.4 0.6 0.6 NaN
All that's left is to rename the columns/rows.
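A sketch of that renaming step, reusing the concat approach and producing the Val1..Val4 column names from the question. The `f"Val{c + 1}"` naming and turning the id index back into an `ID` column are choices made here to match the desired output, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"id": ["001"] * 4 + ["002"] * 3,
                   "value": [0.5, 0.2, 0.5, 0.0, 0.4, 0.6, 0.6]})

# One Series per id; concat aligns them by position and pads short groups with NaN
wide = pd.concat(
    [pd.Series(list(g["value"]), name=x) for x, g in df.groupby("id")], axis=1
).T

# Rename positional columns 0..3 to Val1..Val4 and turn the id index into a column
wide.columns = [f"Val{c + 1}" for c in wide.columns]
wide = wide.rename_axis("ID").reset_index()
print(wide)
```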

Python pandas data frame reshape

The data shown below is a simplified example. The actual data frame has 3750 rows and 2 columns. I need to reshape it into another structure.
A A2
0.1 1
0.4 2
0.6 3
B B2
0.8 1
0.7 2
0.9 3
C C2
0.3 1
0.6 2
0.8 3
How can I reshape the data frame above horizontally, as follows:
A A2 B B2 C C2
0.1 1 0.8 1 0.3 1
0.4 2 0.7 2 0.6 2
0.6 3 0.9 3 0.8 3

You can reshape your data and create a new dataframe:
cols = 6
rows = 4
df = pd.DataFrame(df.values.T.reshape(cols,rows).T)
df.rename(columns=df.iloc[0]).drop(0)
A B C A2 B2 C2
1 0.1 0.8 0.3 1 1 1
2 0.4 0.7 0.6 2 2 2
3 0.6 0.9 0.8 3 3 3

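Try this if you don't want to hard-code your values:
import numpy as np
import pandas as pd

df['header'] = pd.to_numeric(df[0], errors='coerce')  # header rows ('A', 'B', ...) become NaN
l = df['header'].values
m_l = l.reshape((np.isnan(l).sum(), -1))[:, 1:]  # one row per block, header slot dropped
h = df[df['header'].isnull()][0].values
print(pd.DataFrame(dict(zip(h, m_l))))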
Output (only the value columns A, B and C are rebuilt; A2, B2 and C2 are not included):
A B C
0 0.1 0.8 0.3
1 0.4 0.7 0.6
2 0.6 0.9 0.8
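To get exactly the A A2 B B2 C C2 layout without hard-coding sizes, one option is to split the frame into blocks at each header row and concatenate the blocks side by side. This sketch assumes a header row is any row whose first cell is not numeric, and it rebuilds the example as a frame named `raw` (an illustrative name). Blocks of unequal length would be padded with NaN by `concat`.

```python
import pandas as pd

# Illustrative two-column frame where each block starts with a header row
raw = pd.DataFrame([
    ["A", "A2"], [0.1, 1], [0.4, 2], [0.6, 3],
    ["B", "B2"], [0.8, 1], [0.7, 2], [0.9, 3],
    ["C", "C2"], [0.3, 1], [0.6, 2], [0.8, 3],
])

# A header row is one whose first cell is not numeric
is_header = pd.to_numeric(raw[0], errors="coerce").isna()
block_id = is_header.cumsum().rename("block")  # 1 for the A block, 2 for B, ...

pieces = []
for _, block in raw.groupby(block_id):
    header = block.iloc[0].tolist()              # e.g. ["A", "A2"]
    body = block.iloc[1:].reset_index(drop=True)
    body.columns = header
    pieces.append(body)

# Place the blocks side by side and convert the object columns to numbers
wide = pd.concat(pieces, axis=1).apply(pd.to_numeric)
print(wide)
```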

Create multiple dataframes based on the original dataframe columns number

I've searched for quite a while but haven't found a similar question. If there is one, please let me know!
I am currently trying to divide one dataframe into n dataframes, where n is equal to the number of columns of the original dataframe. Every new dataframe must keep the first column of the original dataframe. Ideally, I'd also gather them all together in a list for later access.
To illustrate what I mean, here is a brief example:
>> original df
GeneID A B C D E
1 0.3 0.2 0.6 0.4 0.8
2 0.5 0.3 0.1 0.2 0.6
3 0.4 0.1 0.5 0.1 0.3
4 0.9 0.7 0.1 0.6 0.7
5 0.1 0.4 0.7 0.2 0.5
My desired output would be something like this:
>> df1
GeneID A
1 0.3
2 0.5
3 0.4
4 0.9
5 0.1
>> df2
GeneID B
1 0.2
2 0.3
3 0.1
4 0.7
5 0.4
....
And so on, until all the columns of the original dataframe are covered.
What would be the best solution?

You can use the columns attribute to get all column names and then create sub-dataframes (here oridf is the original dataframe):
outdflist = []
# for each column beyond the first:
for col in oridf.columns[1:]:
    # create a subdf with the desired columns:
    subdf = oridf[['GeneID', col]]
    # append subdf to the list of dfs:
    outdflist.append(subdf)
# to view all dataframes created:
for df in outdflist:
    print(df)
Output:
GeneID A
0 1 0.3
1 2 0.5
2 3 0.4
3 4 0.9
4 5 0.1
GeneID B
0 1 0.2
1 2 0.3
2 3 0.1
3 4 0.7
4 5 0.4
GeneID C
0 1 0.6
1 2 0.1
2 3 0.5
3 4 0.1
4 5 0.7
GeneID D
0 1 0.4
1 2 0.2
2 3 0.1
3 4 0.6
4 5 0.2
GeneID E
0 1 0.8
1 2 0.6
2 3 0.3
3 4 0.7
4 5 0.5
The for loop above can also be written more simply as a list comprehension:
outdflist = [oridf[['GeneID', col]]
             for col in oridf.columns[1:]]

You can do it with groupby:
d={'df'+ str(x): y for x , y in df.groupby(level=0,axis=1)}
d
Out[989]:
{'dfA': A
0 0.3
1 0.5
2 0.4
3 0.9
4 0.1, 'dfB': B
0 0.2
1 0.3
2 0.1
3 0.7
4 0.4, 'dfC': C
0 0.6
1 0.1
2 0.5
3 0.1
4 0.7, 'dfD': D
0 0.4
1 0.2
2 0.1
3 0.6
4 0.2, 'dfE': E
0 0.8
1 0.6
2 0.3
3 0.7
4 0.5, 'dfGeneID': GeneID
0 1
1 2
2 3
3 4
4 5}
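The groupby version above gives a separate dfGeneID entry and drops GeneID from the other frames, and grouping along `axis=1` is deprecated as of pandas 2.1. A sketch of a dict that keeps the key column in every frame, assuming (as in the question) that the key is the first column; the frame is a trimmed copy of the example data.

```python
import pandas as pd

df = pd.DataFrame({
    "GeneID": [1, 2, 3, 4, 5],
    "A": [0.3, 0.5, 0.4, 0.9, 0.1],
    "B": [0.2, 0.3, 0.1, 0.7, 0.4],
    "C": [0.6, 0.1, 0.5, 0.1, 0.7],
})

# One sub-DataFrame per value column, each keeping the first (key) column
key = df.columns[0]
d = {f"df{col}": df[[key, col]] for col in df.columns[1:]}
print(d["dfB"])
```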

You can create a list of column names, and manually loop through and create a new DataFrame each loop.
>>> import pandas as pd
>>> d = {'col1':[1,2,3], 'col2':[3,4,5], 'col3':[6,7,8]}
>>> df = pd.DataFrame(data=d)
>>> df
col1 col2 col3
0 1 3 6
1 2 4 7
2 3 5 8
>>> newstuff=[]
>>> columns = list(df)
>>> for column in columns:
...     newstuff.append(pd.DataFrame(data=df[column]))
Unless your dataframe is unreasonably large, the above code should do the job.
