Pandas pivot_table. How to change sorting - python

I have this dataframe df:
alpha1 week_day calendar_week
0 2.49 Freitag 2022-04-(01/07)
1 1.32 Samstag 2022-04-(01/07)
2 2.70 Sonntag 2022-04-(01/07)
3 3.81 Montag 2022-04-(01/07)
4 3.58 Dienstag 2022-04-(01/07)
5 3.48 Mittwoch 2022-04-(01/07)
6 1.79 Donnerstag 2022-04-(01/07)
7 2.12 Freitag 2022-04-(08/14)
8 2.41 Samstag 2022-04-(08/14)
9 1.78 Sonntag 2022-04-(08/14)
10 3.19 Montag 2022-04-(08/14)
11 3.33 Dienstag 2022-04-(08/14)
12 2.88 Mittwoch 2022-04-(08/14)
13 2.98 Donnerstag 2022-04-(08/14)
14 3.01 Freitag 2022-04-(15/21)
15 3.04 Samstag 2022-04-(15/21)
16 2.72 Sonntag 2022-04-(15/21)
17 4.11 Montag 2022-04-(15/21)
18 3.90 Dienstag 2022-04-(15/21)
19 3.16 Mittwoch 2022-04-(15/21)
and so on, with ascending calendar weeks.
I created a pivot table to generate a heatmap.
df_pivot = pd.pivot_table(df, values=['alpha1'], index=['week_day'], columns=['calendar_week'])
What I get is:
alpha1 \
calendar_week 2022-(04-29/05-05) 2022-(05-27/06-02) 2022-(07-29/08-04)
week_day
Dienstag 3.32 2.09 4.04
Donnerstag 3.27 2.21 4.65
Freitag 2.83 3.08 4.19
Mittwoch 3.22 3.14 4.97
Montag 2.83 2.86 4.28
Samstag 2.62 3.62 3.88
Sonntag 2.81 3.25 3.77
\
calendar_week 2022-(08-26/09-01) 2022-04-(01/07) 2022-04-(08/14)
week_day
Dienstag 2.92 3.58 3.33
Donnerstag 3.58 1.79 2.98
Freitag 3.96 2.49 2.12
Mittwoch 3.09 3.48 2.88
Montag 3.85 3.81 3.19
Samstag 3.10 1.32 2.41
Sonntag 3.39 2.70 1.78
As you can see, the column order of the pivot table is messed up. I need the columns (calendar weeks) in the same order as in the original dataframe.
I have been looking all over but couldn't find how to achieve this.
It would also be very nice if the rows kept their original order.
Any help will be greatly appreciated.
UPDATE
I didn't paste all the data; it would have been too long.
The calendar_week column consists of the following elements:
'2022-04-(01/07)',
'2022-04-(08/14)',
'2022-04-(15/21)',
'2022-04-(22/28)',
'2022-(04-29/05-05)',
'2022-05-(06/12)',
'2022-05-(13/19)',
'2022-05-(20/26)',
'2022-(05-27/06-02)',
'2022-06-(03/09)',
'2022-06-(10/16)',
'2022-06-(17/23)',
'2022-06-(24/30)',
'2022-07-(01/07)',
'2022-07-(08/14)',
'2022-07-(15/21)',
'2022-07-(22/28)',
'2022-(07-29/08-04)',
'2022-08-(05/11)',
etc....
Each occurs 7 times in df and represents one calendar week.
The desired order is the natural chronological order.
After pivoting the dataframe, the column order gets messed up. I guess it's due to the two different formats: 2022-(07-29/08-04) and 2022-07-(15/21).

Try running this:
df_pivot.sort_values(by = ['calendar_week'], axis = 1, ascending = True)
I got the following output. Is this what you wanted?
calendar_week  2022-04-(01/07)  2022-04-(08/14)  2022-04-(15/21)
week_day
Dienstag                  3.58             3.33             3.90
Donnerstag                1.79             2.98              NaN
Freitag                   2.49             2.12             3.01
Mittwoch                  3.48             2.88             3.16
Montag                    3.81             3.19             4.11
Be sure to fill in the NaN values afterwards, e.g. with the fillna() function.
I hope that answers it. :)

You can use an ordered Categorical for your week days and sort the dates after pivoting with sort_index:
# define the desired order of the days
days = ['Montag', 'Dienstag', 'Mittwoch', 'Donnerstag',
'Freitag', 'Samstag', 'Sonntag']
df_pivot = (df
.assign(week_day=pd.Categorical(df['week_day'], categories=days,
ordered=True))
.pivot_table(values='alpha1', index='week_day',
columns='calendar_week')
.sort_index(axis=1)
)
output:
calendar_week 2022-04-(01/07) 2022-04-(08/14) 2022-04-(15/21)
week_day
Montag 3.81 3.19 4.11
Dienstag 3.58 3.33 3.90
Mittwoch 3.48 2.88 3.16
Donnerstag 1.79 2.98 NaN
Freitag 2.49 2.12 3.01
Samstag 1.32 2.41 3.04
Sonntag 2.70 1.78 2.72
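Note that sort_index compares the calendar_week labels as plain strings. Because '(' sorts before '0', labels such as '2022-(04-29/05-05)' end up ahead of '2022-04-(01/07)' once the full data from the update is pivoted. A minimal sketch that orders the columns by the start date parsed out of each label instead, assuming every label follows one of the two formats shown in the update (week_start is a hypothetical helper, and the small df is a toy subset of the question's data):

```python
import pandas as pd


def week_start(label: str) -> pd.Timestamp:
    """Hypothetical helper: parse the first day of a calendar_week label.

    '2022-04-(01/07)'    -> 2022-04-01
    '2022-(04-29/05-05)' -> 2022-04-29
    Assumes only the two label formats shown in the question.
    """
    return pd.Timestamp(label.replace('(', '').split('/')[0])


days = ['Montag', 'Dienstag', 'Mittwoch', 'Donnerstag',
        'Freitag', 'Samstag', 'Sonntag']

# Toy subset of the question's data
df = pd.DataFrame({
    'alpha1': [3.81, 3.58, 2.12, 2.83],
    'week_day': ['Montag', 'Dienstag', 'Freitag', 'Montag'],
    'calendar_week': ['2022-04-(01/07)', '2022-04-(01/07)',
                      '2022-04-(08/14)', '2022-(04-29/05-05)'],
})

df_pivot = df.pivot_table(values='alpha1', index='week_day', columns='calendar_week')

# Columns: chronological order of the parsed start dates, not string order
df_pivot = df_pivot[sorted(df_pivot.columns, key=week_start)]

# Rows: Monday..Sunday; weekdays absent from the data become all-NaN rows
df_pivot = df_pivot.reindex(days)
print(df_pivot)
```

Reindexing with the days list is a simple alternative to the ordered Categorical for the rows; the column step works the same way on either pivot.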

Related

Index must be DatetimeIndex when filtering dataframe

I then have a function which looks for a specific date (in this case, 2022-01-26):
def get_days(data, date):
    df = pd.read_csv(data)
    df = df[(df['date'] >= date) & (df['date'] <= date)]
    get_trading_session_times(df)
Which returns:
v vw o c h l n date time
0 134730.0 3.6805 3.60 3.61 3.90 3.58 494 2022-01-26 09:00:00
1 72594.0 3.6324 3.60 3.62 3.70 3.57 376 2022-01-26 09:01:00
2 51828.0 3.6151 3.62 3.63 3.65 3.57 278 2022-01-26 09:02:00
3 40245.0 3.6343 3.63 3.65 3.65 3.62 191 2022-01-26 09:03:00
4 76428.0 3.6094 3.64 3.62 3.66 3.57 298 2022-01-26 09:04:00
.. ... ... ... ... ... ... ... ... ...
868 176.0 3.1300 3.13 3.13 3.13 3.13 2 2022-01-26 23:53:00
869 550.0 3.1200 3.12 3.12 3.12 3.12 3 2022-01-26 23:56:00
870 460.0 3.1211 3.12 3.12 3.12 3.12 3 2022-01-26 23:57:00
871 1175.0 3.1201 3.12 3.12 3.12 3.12 6 2022-01-26 23:58:00
872 559.0 3.1102 3.11 3.11 3.11 3.11 5 2022-01-26 23:59:00
[873 rows x 9 columns]
When I then try to look for only times between 09:00 and 09:30 like so:
def get_trading_session_times(df):
    df = df['time'].between_time('09:00', '09:30')
    print(df)
I get the following error:
Index must be DatetimeIndex when filtering dataframe
Full code:
import pandas as pd
data = 'data\BBIG.csv'
date = '2022-01-26'

def get_days(data, date):
    df = pd.read_csv(data)
    df = df[(df['date'] >= date) & (df['date'] <= date)]
    get_trading_session_times(df)

def get_trading_session_times(df):
    df = df['time'].between_time('09:00', '09:30')
    print(df)

get_days(data, date)
What am I doing wrong?
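between_time is only valid if your index is a DatetimeIndex; here the times are stored in a regular column.
As your time strings are consistently formatted (HH:MM:SS), their lexicographical order matches the chronological order, so you can compare them with Series.between. Use full HH:MM:SS bounds, otherwise 09:30:00 itself is excluded because '09:30:00' sorts after '09:30':
>>> df[df['time'].between('09:00:00', '09:30:00')]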
v vw o c h l n date time
0 134730.0 3.6805 3.60 3.61 3.90 3.58 494 2022-01-26 09:00:00
1 72594.0 3.6324 3.60 3.62 3.70 3.57 376 2022-01-26 09:01:00
2 51828.0 3.6151 3.62 3.63 3.65 3.57 278 2022-01-26 09:02:00
3 40245.0 3.6343 3.63 3.65 3.65 3.62 191 2022-01-26 09:03:00
4 76428.0 3.6094 3.64 3.62 3.66 3.57 298 2022-01-26 09:04:00
Update
If you prefer to compare datetime.time objects, convert the time column first:
from datetime import time
df['time'] = pd.to_datetime(df['time']).dt.time
out = df[df['time'].between(time(9, 0), time(9, 30))]
print(out)
# Output
v vw o c h l n date time
0 134730.0 3.6805 3.60 3.61 3.90 3.58 494 2022-01-26 09:00:00
1 72594.0 3.6324 3.60 3.62 3.70 3.57 376 2022-01-26 09:01:00
2 51828.0 3.6151 3.62 3.63 3.65 3.57 278 2022-01-26 09:02:00
3 40245.0 3.6343 3.63 3.65 3.65 3.62 191 2022-01-26 09:03:00
4 76428.0 3.6094 3.64 3.62 3.66 3.57 298 2022-01-26 09:04:00
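If you would rather keep using between_time, one option is to give the dataframe a DatetimeIndex built from the date and time columns first. A minimal sketch, assuming both columns hold strings in the format shown in the question (the three rows are taken from the question's output, other columns omitted):

```python
import pandas as pd

# Three rows from the question's output (other columns omitted)
df = pd.DataFrame({
    'v': [134730.0, 72594.0, 176.0],
    'date': ['2022-01-26', '2022-01-26', '2022-01-26'],
    'time': ['09:00:00', '09:01:00', '23:53:00'],
})

# Combine date and time into one timestamp and use it as the index,
# which is what between_time requires
ts = pd.to_datetime(df['date'] + ' ' + df['time'])
out = df.set_index(ts).between_time('09:00', '09:30')
print(out)
```

Because between_time only looks at the time of day, this also works when the frame spans several dates.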

Python Pandas: How to combine or merge two different size dataframes based on dates

I would like to merge or combine two dataframes of different sizes, df1 and df2, based on a range of dates, for example:
df1:
Date Open High Low
2021-07-01 8.43 8.44 8.22
2021-07-02 8.36 8.4 8.28
2021-07-06 8.22 8.23 8.06
2021-07-07 8.1 8.19 7.98
2021-07-08 8.07 8.1 7.91
2021-07-09 7.97 8.11 7.92
2021-07-12 8 8.2 8
2021-07-13 8.15 8.18 8.06
2021-07-14 8.18 8.27 8.12
2021-07-15 8.21 8.26 8.06
2021-07-16 8.12 8.23 8.07
df2:
Day of month Revenue Earnings
01 45000 4000
07 43500 5000
12 44350 6000
15 39050 7000
The result should look something like this:
combination:
Date Open High Low Earnings
2021-07-01 8.43 8.44 8.22 4000
2021-07-02 8.36 8.4 8.28 4000
2021-07-06 8.22 8.23 8.06 4000
2021-07-07 8.1 8.19 7.98 5000
2021-07-08 8.07 8.1 7.91 5000
2021-07-09 7.97 8.11 7.92 5000
2021-07-12 8 8.2 8 6000
2021-07-13 8.15 8.18 8.06 6000
2021-07-14 8.18 8.27 8.12 6000
2021-07-15 8.21 8.26 8.06 7000
2021-07-16 8.12 8.23 8.07 7000
The Earnings column is merged based on a range of dates. How can I do this in Python pandas?
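Try merge_asof. The key columns must be sorted and have the same dtype, so convert both to int64 (.dt.day returns int32 in recent pandas, and df2's '01', '07', ... may be strings):
# df1['Date'] = pd.to_datetime(df1['Date'])  # needed if Date is stored as strings
df1['Day of month'] = df1['Date'].dt.day.astype('int64')
df2['Day of month'] = pd.to_numeric(df2['Day of month']).astype('int64')
out = pd.merge_asof(df1, df2, on='Day of month', direction='backward')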
out
Out[213]:
Date Open High Low Day of month Revenue Earnings
0 2021-07-01 8.43 8.44 8.22 1 45000 4000
1 2021-07-02 8.36 8.40 8.28 2 45000 4000
2 2021-07-06 8.22 8.23 8.06 6 45000 4000
3 2021-07-07 8.10 8.19 7.98 7 43500 5000
4 2021-07-08 8.07 8.10 7.91 8 43500 5000
5 2021-07-09 7.97 8.11 7.92 9 43500 5000
6 2021-07-12 8.00 8.20 8.00 12 44350 6000
7 2021-07-13 8.15 8.18 8.06 13 44350 6000
8 2021-07-14 8.18 8.27 8.12 14 44350 6000
9 2021-07-15 8.21 8.26 8.06 15 39050 7000
10 2021-07-16 8.12 8.23 8.07 16 39050 7000
A more general approach is the following:
First, introduce a key both dataframes share.
In this case, that is the day of the month (or, potentially, multiple keys like day of the month and month): df1["day"] = df1["Date"].dt.day (convert df2's 'Day of month' strings to integers as well, so the keys match).
If you now merge (left-join df2 onto df1), some rows will find no match in df2, because days are missing there, and their Revenue / Earnings will be NaN. To fill the gaps, we could interpolate, or use the naïve approach: if we don't know the Revenue / Earnings for a specific day, we take the last known one and apply no further calculation. One way to achieve this is described here: How to replace NaNs by preceding or next values in pandas DataFrame? df.ffill() (df.fillna(method='ffill') in older pandas; that form is deprecated since pandas 2.1)
So we first merge on our key, following the doc https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html : merged = df1.merge(df2, left_on='day', right_on='Day of month', how='left'), and then forward-fill the Revenue / Earnings columns of merged.
Voilà!
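A minimal sketch of that merge-then-forward-fill approach on part of the question's data. The integer key column day is introduced for illustration; the '01', '07', ... strings in df2 are converted to integers so they match df1's day numbers:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': pd.to_datetime(['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07',
                            '2021-07-12', '2021-07-15', '2021-07-16']),
    'Open': [8.43, 8.36, 8.22, 8.10, 8.00, 8.21, 8.12],
})
df2 = pd.DataFrame({
    'Day of month': ['01', '07', '12', '15'],
    'Revenue': [45000, 43500, 44350, 39050],
    'Earnings': [4000, 5000, 6000, 7000],
})

# 1) Shared key: day of the month as int64 in both frames
df1['day'] = df1['Date'].dt.day.astype('int64')
df2['day'] = pd.to_numeric(df2['Day of month']).astype('int64')

# 2) Left join: days that have no row in df2 get NaN
merged = df1.merge(df2[['day', 'Revenue', 'Earnings']], on='day', how='left')

# 3) Forward-fill the gaps with the last known values (df1 is sorted by Date)
merged[['Revenue', 'Earnings']] = merged[['Revenue', 'Earnings']].ffill()
print(merged)
```

Unlike merge_asof, this only picks up a df2 value when that exact day also appears in df1: if 2021-07-15 were missing from df1, the 16th would keep the 6000 from the 12th.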

Select rows from DataReader based on value and transfer to DataFrame

I am doing a project where I read in the historical values for a given stock. I then want to copy the days where the price has jumped by +5% or -5% into a different dataframe.
But I am struggling with transferring the rows.
import pandas_datareader as web
import pandas as pd
import datetime
start = datetime.datetime(2015, 9, 1)
end = datetime.datetime(2019, 11, 2)
df1 = pd.DataFrame()
df = web.DataReader("amd", 'yahoo', start, end)
df['Close'] = df['Close'].astype(float)
df['Open'] = df['Open'].astype(float)
for row in df:
    df['perchange'] = ((df['Close']-df['Open'])/df['Open'])*100
    df['perchange'] = df['perchange'].astype(float)
    if df['perchange'] >= 5.0:
        df1 += df
    if ['perchange'] <= -5.0:
        df1 += df
df.to_csv('amd_volume_price_history.csv')
df1.to_csv('amd_5_to_5.csv')
You can create a new dataframe containing only the rows where the percentage change is greater than 5% in absolute value. Series.between is used here to build the mask for boolean indexing:
not_significant=((df['Close']-df['Open'])/df['Open']).between(-0.05,0.05)
df_filtered=df[~not_significant]
print(df_filtered)
Output
High Low Open Close Volume Adj Close
Date
2015-09-11 2.140000 1.810000 1.880000 2.010000 31010300 2.010000
2015-09-14 2.000000 1.810000 2.000000 1.820000 16458500 1.820000
2015-10-19 2.010000 1.910000 1.910000 2.010000 10670800 2.010000
2015-10-23 2.210000 2.100000 2.100000 2.210000 9564200 2.210000
2015-11-03 2.290000 2.160000 2.160000 2.280000 8705800 2.280000
... ... ... ... ... ... ...
2019-06-06 31.980000 29.840000 29.870001 31.820000 131267800 31.820000
2019-07-31 32.299999 30.299999 32.080002 30.450001 119190000 30.450001
2019-08-08 34.270000 31.480000 31.530001 33.919998 167278800 33.919998
2019-08-12 34.650002 32.080002 34.160000 32.430000 106936000 32.430000
2019-08-23 31.830000 29.400000 31.299999 29.540001 83681100 29.540001
[123 rows x 6 columns]
If you really need a Perchange column, you can create it by changing the code to:
df['Perchange']=(df['Close']-df['Open'])/df['Open']*100
not_significant=(df['Perchange']).between(-5,5)
df_filtered=df[~not_significant]
print(df_filtered)
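Since the data download may not work everywhere, here is a small offline sketch of the same boolean indexing, using a few Open/Close pairs copied from the outputs in this answer in place of the DataReader result:

```python
import pandas as pd

# A few Open/Close pairs from this answer's output, standing in for the downloaded data
df = pd.DataFrame(
    {'Open': [1.88, 2.00, 1.84, 1.91], 'Close': [2.01, 1.82, 1.86, 2.01]},
    index=pd.to_datetime(['2015-09-11', '2015-09-14', '2015-09-15', '2015-10-19']),
)

df['Perchange'] = (df['Close'] - df['Open']) / df['Open'] * 100
# between() is inclusive, so moves of exactly +/-5% count as not significant
not_significant = df['Perchange'].between(-5, 5)
df_filtered = df[~not_significant]
print(df_filtered)
```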
You can also compute the column with DataFrame.pct_change:
df['Perchange']=df[['Open','Close']].pct_change(axis=1).Close*100
Output
High Low Open Close Volume Adj Close \
Date
2015-09-11 2.140000 1.810000 1.880000 2.010000 31010300 2.010000
2015-09-14 2.000000 1.810000 2.000000 1.820000 16458500 1.820000
2015-10-19 2.010000 1.910000 1.910000 2.010000 10670800 2.010000
2015-10-23 2.210000 2.100000 2.100000 2.210000 9564200 2.210000
2015-11-03 2.290000 2.160000 2.160000 2.280000 8705800 2.280000
... ... ... ... ... ... ...
2019-06-06 31.980000 29.840000 29.870001 31.820000 131267800 31.820000
2019-07-31 32.299999 30.299999 32.080002 30.450001 119190000 30.450001
2019-08-08 34.270000 31.480000 31.530001 33.919998 167278800 33.919998
2019-08-12 34.650002 32.080002 34.160000 32.430000 106936000 32.430000
2019-08-23 31.830000 29.400000 31.299999 29.540001 83681100 29.540001
Perchange
Date
2015-09-11 6.914893
2015-09-14 -8.999997
2015-10-19 5.235603
2015-10-23 5.238102
2015-11-03 5.555550
... ...
2019-06-06 6.528285
2019-07-31 -5.081050
2019-08-08 7.580074
2019-08-12 -5.064401
2019-08-23 -5.622998
[123 rows x 7 columns]
Your code would look like this:
#Libraries
import pandas_datareader as web
import pandas as pd
import datetime
#Getting data
start = datetime.datetime(2015, 9, 1)
end = datetime.datetime(2019, 11, 2)
df = web.DataReader("amd", 'yahoo', start, end)
#Converting to float to calculate and filter
df['Close'] = df['Close'].astype(float)
df['Open'] = df['Open'].astype(float)
#Creating Perchange column.
df['Perchange']=(df['Close']-df['Open'])/df['Open']*100
#df['Perchange']=df[['Open','Close']].pct_change(axis=1).Close*100
#Filtering
not_significant=(df['Perchange']).between(-5,5)
df_filtered=df[~not_significant]
#Saving data.
df.to_csv('amd_volume_price_history.csv')
df_filtered.to_csv('amd_5_to_5.csv')
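EDIT: to keep each day with a move beyond ±5% together with up to four following trading days (a new jump starts a new window):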
df['Perchange']=(df['Close']-df['Open'])/df['Open']*100
significant=~(df['Perchange']).between(-5,5)
group_by_jump=significant.cumsum()
jump_and_4=group_by_jump.groupby(group_by_jump,sort=False).cumcount().le(4)&group_by_jump.ne(0)
df_filtered=df[jump_and_4]
print(df_filtered.head(50))
High Low Open Close Volume Adj Close Perchange
Date
2015-09-11 2.14 1.81 1.88 2.01 31010300 2.01 6.914893
2015-09-14 2.00 1.81 2.00 1.82 16458500 1.82 -8.999997
2015-09-15 1.87 1.81 1.84 1.86 6524400 1.86 1.086955
2015-09-16 1.90 1.85 1.87 1.89 4928300 1.89 1.069518
2015-09-17 1.94 1.87 1.90 1.89 5831600 1.89 -0.526315
2015-09-18 1.92 1.85 1.87 1.87 11814000 1.87 0.000000
2015-10-19 2.01 1.91 1.91 2.01 10670800 2.01 5.235603
2015-10-20 2.03 1.97 2.00 2.02 5584200 2.02 0.999999
2015-10-21 2.12 2.01 2.02 2.10 14944100 2.10 3.960392
2015-10-22 2.16 2.09 2.10 2.14 8208400 2.14 1.904772
2015-10-23 2.21 2.10 2.10 2.21 9564200 2.21 5.238102
2015-10-26 2.21 2.12 2.21 2.15 6313500 2.15 -2.714929
2015-10-27 2.16 2.10 2.12 2.15 5755600 2.15 1.415104
2015-10-28 2.20 2.12 2.14 2.18 6950600 2.18 1.869157
2015-10-29 2.18 2.11 2.15 2.13 4500400 2.13 -0.930232
2015-11-03 2.29 2.16 2.16 2.28 8705800 2.28 5.555550
2015-11-04 2.30 2.18 2.27 2.20 8205300 2.20 -3.083698
2015-11-05 2.24 2.17 2.21 2.20 4302200 2.20 -0.452488
2015-11-06 2.21 2.13 2.19 2.15 8997100 2.15 -1.826482
2015-11-09 2.18 2.10 2.15 2.11 6231200 2.11 -1.860474
2015-11-18 2.15 1.98 1.99 2.12 9384700 2.12 6.532657
2015-11-19 2.16 2.09 2.10 2.14 4704300 2.14 1.904772
2015-11-20 2.25 2.13 2.14 2.22 10727100 2.22 3.738314
2015-11-23 2.24 2.18 2.22 2.22 4863200 2.22 0.000000
2015-11-24 2.40 2.17 2.20 2.34 15859700 2.34 6.363630
2015-11-25 2.40 2.31 2.36 2.38 6914800 2.38 0.847467
2015-11-27 2.38 2.32 2.37 2.33 2606600 2.33 -1.687762
2015-11-30 2.37 2.25 2.34 2.36 9924400 2.36 0.854700
2015-12-01 2.37 2.31 2.36 2.34 5646400 2.34 -0.847457
2015-12-16 2.55 2.37 2.39 2.54 19543600 2.54 6.276144
2015-12-17 2.60 2.52 2.52 2.56 11374100 2.56 1.587300
2015-12-18 2.55 2.42 2.51 2.45 17988100 2.45 -2.390436
2015-12-21 2.53 2.43 2.47 2.53 6876600 2.53 2.429147
2015-12-22 2.78 2.54 2.55 2.77 24893200 2.77 8.627452
2015-12-23 2.94 2.75 2.76 2.83 30365300 2.83 2.536229
2015-12-24 3.00 2.86 2.88 2.92 11890900 2.92 1.388888
2015-12-28 3.02 2.86 2.91 3.00 16050500 3.00 3.092780
2015-12-29 3.06 2.97 3.04 3.00 15300900 3.00 -1.315788
2016-01-06 2.71 2.47 2.66 2.51 23759400 2.51 -5.639101
2016-01-07 2.48 2.26 2.43 2.28 22203500 2.28 -6.172843
2016-01-08 2.42 2.10 2.36 2.14 31822400 2.14 -9.322025
2016-01-11 2.36 2.12 2.16 2.34 19629300 2.34 8.333325
2016-01-12 2.46 2.28 2.40 2.39 17986100 2.39 -0.416666
2016-01-13 2.45 2.21 2.40 2.25 12749700 2.25 -6.250004
2016-01-14 2.35 2.21 2.29 2.21 15666600 2.21 -3.493447
2016-01-15 2.13 1.99 2.10 2.03 21199300 2.03 -3.333330
2016-01-19 2.11 1.90 2.08 1.95 18978900 1.95 -6.249994
2016-01-20 1.95 1.75 1.81 1.80 29243600 1.80 -0.552486
2016-01-21 2.18 1.81 1.82 2.09 26387900 2.09 14.835157
2016-01-22 2.17 1.98 2.11 2.02 16245500 2.02 -4.265399
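To see how the grouping in the EDIT works, here is the same logic on a short hypothetical series of daily percentage changes: cumsum starts a new window at every jump, cumcount numbers the rows inside each window, and rows before the first jump (window 0) are dropped:

```python
import pandas as pd

# Hypothetical daily percentage changes (not AMD data)
perchange = pd.Series([1.0, 6.0, 0.5, -1.0, 2.0, 0.3, 0.1, -7.0, 0.2])

significant = ~perchange.between(-5, 5)     # True on jump days
group_by_jump = significant.cumsum()         # window id, increments at each jump
pos_in_window = group_by_jump.groupby(group_by_jump, sort=False).cumcount()
# Jump day (position 0) plus up to 4 following rows; skip rows before the first jump
keep = pos_in_window.le(4) & group_by_jump.ne(0)

kept = perchange[keep].index.tolist()
print(kept)
```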
Try to integrate your code with these modifications:
1) You probably don't need any loop to calculate the new column:
df['perchange'] = ((df['Close']-df['Open'])/df['Open'])*100
df['perchange'] = df['perchange'].astype(float)
2) Define an empty df:
df1 = pd.DataFrame([])
3) Filter the old df with the loc method (get used to its notation, it is very useful) and add the result to the empty dataframe; this transfers the rows that satisfy the condition. DataFrame.append was removed in pandas 2.0, so use pd.concat:
df1 = pd.concat([df1, df.loc[(df['perchange'] <= -5.0) | (df['perchange'] >= 5.0)]])
print(df1)
Hope it helps.

How to group daily time series data into smaller dataframes of weeks

I have a dataframe that looks like this:
open high low close weekday
time
2011-11-29 2.55 2.98 2.54 2.75 1
2011-11-30 2.75 3.09 2.73 2.97 2
2011-12-01 2.97 3.14 2.93 3.06 3
2011-12-02 3.06 3.14 3.03 3.12 4
2011-12-03 3.12 3.13 2.75 2.79 5
2011-12-04 2.79 2.90 2.61 2.83 6
2011-12-05 2.83 2.93 2.78 2.88 0
2011-12-06 2.88 3.05 2.87 3.03 1
2011-12-07 3.03 3.08 2.93 2.99 2
2011-12-08 2.99 3.01 2.88 2.98 3
2011-12-09 2.98 3.04 2.93 2.97 4
2011-12-10 2.97 3.13 2.93 3.05 5
2011-12-11 3.05 3.38 2.99 3.25 6
The weekday column refers to 0 = Monday,...6 = Sunday.
I want to make groups of smaller dataframes only containing the data for Friday, Saturday, Sunday and Monday. So one subset would look like this:
2011-12-02 3.06 3.14 3.03 3.12 4
2011-12-03 3.12 3.13 2.75 2.79 5
2011-12-04 2.79 2.90 2.61 2.83 6
2011-12-05 2.83 2.93 2.78 2.88 0
Filter the four weekdays first, then drop duplicates (this keeps the first occurrence of each weekday, i.e. a single block):
df[df.weekday.isin([4,5,6,0])].drop_duplicates('weekday')
Out[10]:
open high low close weekday
2011-12-02 3.06 3.14 3.03 3.12 4
2011-12-03 3.12 3.13 2.75 2.79 5
2011-12-04 2.79 2.90 2.61 2.83 6
2011-12-05 2.83 2.93 2.78 2.88 0
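drop_duplicates returns only one subset. To get every Friday-to-Monday block as its own dataframe, one option is to start a new group at each Friday and split with groupby. A minimal sketch on the question's sample data (only the close column is kept for brevity):

```python
import pandas as pd

df = pd.DataFrame(
    {'close': [2.75, 2.97, 3.06, 3.12, 2.79, 2.83, 2.88, 3.03, 2.99, 2.98, 2.97, 3.05, 3.25],
     'weekday': [1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6]},
    index=pd.date_range('2011-11-29', periods=13, freq='D', name='time'),
)

fri_to_mon = df[df['weekday'].isin([4, 5, 6, 0])]
# A new group starts at every Friday (weekday == 4)
group_id = fri_to_mon['weekday'].eq(4).cumsum()
# Drop rows before the first Friday (group 0), then keep one dataframe per group
groups = [g for key, g in fri_to_mon.groupby(group_id) if key > 0]

for g in groups:
    print(g, end='\n\n')
```

The second block stops at Sunday because the sample ends on 2011-12-11; filter on len(g) == 4 if only complete blocks are wanted.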

python (pandas) - TypeError: must be str, not list when concatenating lists

I have this dataframe; please note the last column ("Yr_Mo_Dy") on the right:
In[38]: data.head()
Out[38]:
RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL Yr_Mo_Dy
0 15.04 14.96 13.17 9.29 NaN 9.87 13.67 10.25 10.83 12.58 18.50 15.04 61-1-1
1 14.71 NaN 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83 61-1-2
2 18.50 16.88 12.33 10.13 11.17 6.17 11.25 NaN 8.50 7.67 12.75 12.71 61-1-3
3 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88 61-1-4
4 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83 61-1-5
The type of the "Yr_Mo_Dy" column is object while the others are float64.
I simply want to change the order of the columns so that "Yr_Mo_Dy" is the first column in the dataframe.
I tried the following, but I get a TypeError. What's wrong?
In[39]: cols = data.columns.tolist()
In[40]: cols
Out[40]:
['RPT',
'VAL',
'ROS',
'KIL',
'SHA',
'BIR',
'DUB',
'CLA',
'MUL',
'CLO',
'BEL',
'MAL',
'Yr_Mo_Dy']
In[41]: cols = cols[-1] + cols[:-1]
TypeError Traceback (most recent call last)
<ipython-input-59-c0130d1863e8> in <module>()
----> 1 cols = cols[-1] + cols[:-1]
TypeError: must be str, not list
You need to add : to get a one-element list, because you need to concatenate two lists:
#string
print (cols[-1])
Yr_Mo_Dy
#one element list
print (cols[-1:])
['Yr_Mo_Dy']
cols = cols[-1:] + cols[:-1]
Alternatively, you can wrap the element in [], but it is less readable:
cols = [cols[-1]] + cols[:-1]
print (cols)
['Yr_Mo_Dy', 'RPT', 'VAL', 'ROS', 'KIL', 'SHA', 'BIR',
'DUB', 'CLA', 'MUL', 'CLO', 'BEL', 'MAL']
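Reordering the list does not change the dataframe itself; the new list still has to be used to select the columns. A minimal sketch on a reduced version of the question's frame (only three of its columns):

```python
import pandas as pd

data = pd.DataFrame({
    'RPT': [15.04, 14.71],
    'VAL': [14.96, float('nan')],
    'Yr_Mo_Dy': ['61-1-1', '61-1-2'],
})

cols = data.columns.tolist()
cols = cols[-1:] + cols[:-1]   # list + list, so no TypeError
data = data[cols]              # select the columns in the new order
print(data)
```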
Option 1
Use pd.DataFrame.insert and pd.DataFrame.pop to alter the dataframe in place. This solution generalizes well, as you can pop from and insert at any column position.
c = df.columns[-1]
df.insert(0, c, df.pop(c))
df
Yr_Mo_Dy RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL
0 61-1-1 15.04 14.96 13.17 9.29 NaN 9.87 13.67 10.25 10.83 12.58 18.50 15.04
1 61-1-2 14.71 NaN 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83
2 61-1-3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 NaN 8.50 7.67 12.75 12.71
3 61-1-4 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
4 61-1-5 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
Option 2
pd.DataFrame.reindex and np.roll (reindex_axis was removed in pandas 1.0)
import numpy as np
df.reindex(columns=np.roll(df.columns, 1))
Yr_Mo_Dy RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL
0 61-1-1 15.04 14.96 13.17 9.29 NaN 9.87 13.67 10.25 10.83 12.58 18.50 15.04
1 61-1-2 14.71 NaN 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83
2 61-1-3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 NaN 8.50 7.67 12.75 12.71
3 61-1-4 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
4 61-1-5 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
