Plotting a stacked horizontal barplot - python

I have this dataframe called "df_pressure":
Ranking Squad Press Succ Succ% Fail Fail%
11 1 Manchester City 4254 1381 32.5 2873 67.5
10 2 Liverpool 5360 1731 32.3 3629 67.7
5 3 Chelsea 5533 1702 30.8 3831 69.2
16 4 Tottenham 5477 1523 27.8 3954 72.2
0 5 Arsenal 4772 1440 30.2 3332 69.8
12 6 Manchester Utd 5069 1462 28.8 3607 71.2
18 7 West Ham 4917 1372 27.9 3545 72.1
9 8 Leicester City 5982 1719 28.7 4263 71.3
3 9 Brighton 5670 1832 32.3 3838 67.7
19 10 Wolves 5529 1633 29.5 3896 70.5
13 11 Newcastle Utd 5430 1460 26.9 3970 73.1
6 12 Crystal Palace 6041 1809 29.9 4232 70.1
2 13 Brentford 5566 1609 28.9 3957 71.1
1 14 Aston Villa 5515 1524 27.6 3991 72.4
15 15 Southampton 5869 1806 30.8 4063 69.2
7 16 Everton 6346 1892 29.8 4454 70.2
8 17 Leeds United 7078 2118 29.9 4960 70.1
4 18 Burnley 5527 1499 27.1 4028 72.9
17 19 Watford 5730 1656 28.9 4074 71.1
14 20 Norwich City 6146 1570 25.5 4576 74.5
I then decided to create another dataframe for some columns only:
df_pressure_perc=df_pressure[['Squad','Succ%','Fail%']]
df_pressure_perc.reset_index(drop=True, inplace=True)
df_pressure_perc.set_index('Squad')
print(df_pressure_perc)
Output:
Squad Succ% Fail%
0 Manchester City 32.5 67.5
1 Liverpool 32.3 67.7
2 Chelsea 30.8 69.2
3 Tottenham 27.8 72.2
4 Arsenal 30.2 69.8
5 Manchester Utd 28.8 71.2
6 West Ham 27.9 72.1
7 Leicester City 28.7 71.3
8 Brighton 32.3 67.7
9 Wolves 29.5 70.5
10 Newcastle Utd 26.9 73.1
11 Crystal Palace 29.9 70.1
12 Brentford 28.9 71.1
13 Aston Villa 27.6 72.4
14 Southampton 30.8 69.2
15 Everton 29.8 70.2
16 Leeds United 29.9 70.1
17 Burnley 27.1 72.9
18 Watford 28.9 71.1
19 Norwich City 25.5 74.5
Based on this new dataframe "df_pressure_perc", I decided to create a stacked barplot with the following code:
df_pressure_perc.plot(kind='barh', stacked=True, ylabel='Squad', colormap='tab10', figsize=(10, 6))
However, the Y axis of the plot is not labelled with the Squad names. How can I show the Squad names on the Y axis instead of 0-19?
Visualization(stacked barplot)
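A likely cause is that `set_index('Squad')` returns a new dataframe rather than modifying the existing one, so the index stays 0-19 unless the result is assigned. Below is a minimal sketch under that assumption; the three-row sample and the names `plot_df` and `plot_pressure` are illustrative only.

```python
import pandas as pd

# Small illustrative sample of the percentages dataframe
df_pressure_perc = pd.DataFrame({
    "Squad": ["Manchester City", "Liverpool", "Chelsea"],
    "Succ%": [32.5, 32.3, 30.8],
    "Fail%": [67.5, 67.7, 69.2],
})

# set_index returns a new frame; assign it so the index really becomes Squad
plot_df = df_pressure_perc.set_index("Squad")

def plot_pressure(frame):
    """Draw the stacked horizontal bars (requires matplotlib)."""
    # pandas labels the bars with the index, which is now the squad name
    ax = frame.plot(kind="barh", stacked=True, colormap="tab10", figsize=(10, 6))
    ax.invert_yaxis()  # optional: keep the first row at the top
    return ax
```

Calling `plot_pressure(plot_df)` should then label the bars with the squad names. Passing `x='Squad'` to `plot` on the original frame is another way to get the same labels.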


Special kind of dataframes merging — inserting into a dataframe according to date values

df_1 is as follows -
date id score
2019-05 5 78.9
2019-06 5 77.5
2019-07 5 80.2
2019-08 5 82.0
2019-05 2 79.9
2019-06 2 69.3
2019-07 2 75.2
2019-08 2 80.0
2019-05 70 68.8
2019-06 70 67.5
2019-07 70 70.2
2019-08 70 86.0
df_2 is as follows -
date id score
2019-01 2 79.1
2019-02 2 79.2
2019-03 2 75.2
2019-04 2 80.0
2019-01 5 78.9
2019-02 5 78.5
2019-03 5 80.8
2019-04 5 82.8
2019-01 70 68.4
2019-02 70 72.2
2019-03 70 70.5
2019-04 70 81.0
How can I merge them into one dataframe according to date and id, resulting in -
date id score
2019-01 2 79.1
2019-02 2 79.2
2019-03 2 75.2
2019-04 2 80.0
2019-05 2 79.9
2019-06 2 69.3
2019-07 2 75.2
2019-08 2 80.0
2019-01 5 78.9
2019-02 5 78.5
2019-03 5 80.8
2019-04 5 82.8
2019-05 5 78.9
2019-06 5 77.5
2019-07 5 80.2
2019-08 5 82.0
2019-01 70 68.4
2019-02 70 72.2
2019-03 70 70.5
2019-04 70 81.0
2019-05 70 68.8
2019-06 70 67.5
2019-07 70 70.2
2019-08 70 86.0
Use pd.concat, then sort by id and date to match the desired order:
pd.concat([df_1, df_2]).sort_values(["id", "date"]).reset_index(drop=True)
Concat and sort values:
pd.concat([df_1, df_2]).sort_values(['id', 'date'])
date id score
0 2019-01 2 79.1
1 2019-02 2 79.2
2 2019-03 2 75.2
3 2019-04 2 80.0
4 2019-05 2 79.9
5 2019-06 2 69.3
6 2019-07 2 75.2
7 2019-08 2 80.0
4 2019-01 5 78.9
5 2019-02 5 78.5
6 2019-03 5 80.8
7 2019-04 5 82.8
0 2019-05 5 78.9
1 2019-06 5 77.5
2 2019-07 5 80.2
3 2019-08 5 82.0
8 2019-01 70 68.4
9 2019-02 70 72.2
10 2019-03 70 70.5
11 2019-04 70 81.0
8 2019-05 70 68.8
9 2019-06 70 67.5
10 2019-07 70 70.2
11 2019-08 70 86.0

Plot seaborn boxplot for multiple columns and compare with a standard scale [duplicate]

I am new to data analysis. I would like to know how to draw boxplots for multiple columns (x-axis = Points, Score, Weigh) in a single graph, with the y-axis on a standardized scale for comparison. I have tried, but could not understand the code (Python + Pandas + Seaborn) for this. The dataset is as follows:
    Cars                 Points  Score  Weigh
0   Mazda RX4            3.90    2.620  16.46
1   Mazda RX4 Wag        3.90    2.875  17.02
2   Datsun 710           3.85    2.320  18.61
3   Hornet 4 Drive       3.08    3.215  19.44
4   Hornet Sportabout    3.15    3.440  17.02
5   Valiant              2.76    3.460  20.22
6   Duster 360           3.21    3.570  15.84
7   Merc 240D            3.69    3.190  20.00
8   Merc 230             3.92    3.150  22.90
9   Merc 280             3.92    3.440  18.30
10  Merc 280C            3.92    3.440  18.90
11  Merc 450SE           3.07    4.070  17.40
12  Merc 450SL           3.07    3.730  17.60
13  Merc 450SLC          3.07    3.780  18.00
14  Cadillac Fleetwood   2.93    5.250  17.98
15  Lincoln Continental  3.00    5.424  17.82
16  Chrysler Imperial    3.23    5.345  17.42
17  Fiat 128             4.08    2.200  19.47
18  Honda Civic          4.93    1.615  18.52
19  Toyota Corolla       4.22    1.835  19.90
20  Toyota Corona        3.70    2.465  20.01
21  Dodge Challenger     2.76    3.520  16.87
22  AMC Javelin          3.15    3.435  17.30
23  Camaro Z28           3.73    3.840  15.41
24  Pontiac Firebird     3.08    3.845  17.05
25  Fiat X1-9            4.08    1.935  18.90
26  Porsche 914-2        4.43    2.140  16.70
27  Lotus Europa         3.77    1.513  16.90
28  Ford Pantera L       4.22    3.170  14.50
29  Ferrari Dino         3.62    2.770  15.50
30  Maserati Bora        3.54    3.570  14.60
31  Volvo 142E           4.11    2.780  18.60
My output should look something like:
Output Boxplot Graph
With matplotlib:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("test_data.txt")
plt.rcParams['figure.figsize'] = (8,4)
data.boxplot(column=['Points', 'Score', 'Weigh'], grid=True, color='blue', fontsize=10, rot=30)
plt.show()
And with seaborn:
import pandas as pd
import seaborn as sns
data = pd.read_csv("test_data.txt")
ax = sns.boxplot(data=data, palette="Set2")
The pandas built-in boxplot might also work here:
boxplot = data.boxplot(column=['Points', 'Score', 'Weigh'])
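None of the snippets above puts the three columns on a common scale. One possible reading of "standardized scale" is a z-score per column (subtract the mean, divide by the standard deviation); that interpretation is an assumption, not something the question states. A minimal sketch on the first four rows, with `scaled`, `long_df` and `plot_standardized` as illustrative names:

```python
import pandas as pd

# First four rows of the dataset, for illustration only
data = pd.DataFrame({
    "Cars": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive"],
    "Points": [3.90, 3.90, 3.85, 3.08],
    "Score": [2.620, 2.875, 2.320, 3.215],
    "Weigh": [16.46, 17.02, 18.61, 19.44],
})
cols = ["Points", "Score", "Weigh"]

# z-score each column so all three share a comparable scale
scaled = (data[cols] - data[cols].mean()) / data[cols].std()

# Long format: one row per (column name, standardized value)
long_df = scaled.melt(var_name="measure", value_name="z_score")

def plot_standardized(long_frame):
    """Draw one box per original column (requires seaborn and matplotlib)."""
    import seaborn as sns
    return sns.boxplot(data=long_frame, x="measure", y="z_score")
```

Calling `plot_standardized(long_df)` should draw the three boxes side by side on the same y-axis.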

Not able to read txt file without comma separator in pandas python

CODE
import pandas
df = pandas.read_csv('biharpopulation.txt', delim_whitespace=True)
df.columns = ['SlNo','District','Total','Male','Female','Total','Male','Female','SC','ST','SC','ST']
DATA
SlNo District Total Male Female Total Male Female SC ST SC ST
1 Patna 729988 386991 342997 9236 5352 3884 15.5 0.2 38.6 68.7
2 Nalanda 473786 248246 225540 970 524 446 20.2 0.0 29.4 29.8
3 Bhojpur 343598 181372 162226 8337 4457 3880 15.3 0.4 39.1 46.7
4 Buxar 198014 104761 93253 8428 4573 3855 14.1 0.6 37.9 44.6
5 Rohtas 444333 233512 210821 25663 13479 12184 18.1 1.0 41.3 30.0
6 Kaimur 286291 151031 135260 35662 18639 17023 22.2 2.8 40.5 38.6
7 Gaya 1029675 529230 500445 2945 1526 1419 29.6 0.1 26.3 49.1
8 Jehanabad 174738 90485 84253 1019 530 489 18.9 0.07 32.6 32.4
9 Arawal 11479 57677 53802 294 179 115 18.8 0.04
10 Nawada 435975 223929 212046 2158 1123 1035 24.1 0.1 22.4 20.5
11 Aurangabad 472766 244761 228005 1640 865 775 23.5 0.1 35.7 49.7
Saran
12 Saran 389933 199772 190161 6667 3384 3283 12 0.2 33.6 48.5
13 Siwan 309013 153558 155455 13822 6856 6966 11.4 0.5 35.6 44.0
14 Gopalganj 267250 134796 132454 6157 2984 3173 12.4 0.3 32.1 37.8
15 Muzaffarpur 594577 308894 285683 3472 1789 1683 15.9 0.1 28.9 50.4
16 E. Champaran 514119 270968 243151 4812 2518 2294 13.0 0.1 20.6 34.3
17 W. Champaran 434714 228057 206657 44912 23135 21777 14.3 1.5 22.3 24.1
18 Sitamarhi 315646 166607 149039 1786 952 834 11.8 0.1 22.1 31.4
19 Sheohar 74391 39405 34986 64 35 29 14.4 0.0 16.9 38.8
20 Vaishali 562123 292711 269412 3068 1595 1473 20.7 0.1 29.4 29.9
21 Darbhanga 511125 266236 244889 841 467 374 15.5 0.0 24.7 49.5
22 Madhubani 481922 248774 233148 1260 647 613 13.5 0.0 22.2 35.8
23 Samastipur 628838 325101 303737 3362 2724 638 18.5 0.1 25.1 22.0
24 Munger 150947 80031 70916 18060 9297 8763 13.3 1.6 42.6 37.3
25 Begusarai 341173 177897 163276 1505 823 682 14.5 0.1 31.4 78.6
26 Shekhapura 103732 54327 49405 211 115 96 19.7 0.0 25.2 45.6
27 Lakhisarai 126575 65781 60794 5636 2918 2718 15.8 0.7 26.8 12.9
28 Jamui 242710 124538 118172 67357 34689 32668 17.4 4.8 24.5 26.7
The issue is with these 2 lines:
16 E. Champaran 514119 270968 243151 4812 2518 2294 13.0 0.1 20.6 34.3
17 W. Champaran 434714 228057 206657 44912 23135 21777 14.3 1.5 22.3 24.1
If you can remove the space inside the names "E. Champaran" and "W. Champaran" (so that they become "E.Champaran" and "W.Champaran"), then you can do this:
import pandas as pd
df = pd.read_csv('test.csv', sep=r'\s+', skip_blank_lines=True, skipinitialspace=True)
print(df)
SlNo District Total Male Female Total.1 Male.1 Female.1 SC ST SC.1 ST.1
0 1 Patna 729988 386991 342997 9236 5352 3884 15.5 0.20 38.6 68.7
1 2 Nalanda 473786 248246 225540 970 524 446 20.2 0.00 29.4 29.8
2 3 Bhojpur 343598 181372 162226 8337 4457 3880 15.3 0.40 39.1 46.7
3 4 Buxar 198014 104761 93253 8428 4573 3855 14.1 0.60 37.9 44.6
4 5 Rohtas 444333 233512 210821 25663 13479 12184 18.1 1.00 41.3 30.0
5 6 Kaimur 286291 151031 135260 35662 18639 17023 22.2 2.80 40.5 38.6
6 7 Gaya 1029675 529230 500445 2945 1526 1419 29.6 0.10 26.3 49.1
7 8 Jehanabad 174738 90485 84253 1019 530 489 18.9 0.07 32.6 32.4
8 9 Arawal 11479 57677 53802 294 179 115 18.8 0.04 NaN NaN
9 10 Nawada 435975 223929 212046 2158 1123 1035 24.1 0.10 22.4 20.5
10 11 Aurangabad 472766 244761 228005 1640 865 775 23.5 0.10 35.7 49.7
11 12 Saran 389933 199772 190161 6667 3384 3283 12.0 0.20 33.6 48.5
12 13 Siwan 309013 153558 155455 13822 6856 6966 11.4 0.50 35.6 44.0
13 14 Gopalganj 267250 134796 132454 6157 2984 3173 12.4 0.30 32.1 37.8
14 15 Muzaffarpur 594577 308894 285683 3472 1789 1683 15.9 0.10 28.9 50.4
15 16 E.Champaran 514119 270968 243151 4812 2518 2294 13.0 0.10 20.6 34.3
16 17 W.Champaran 434714 228057 206657 44912 23135 21777 14.3 1.50 22.3 24.1
17 18 Sitamarhi 315646 166607 149039 1786 952 834 11.8 0.10 22.1 31.4
18 19 Sheohar 74391 39405 34986 64 35 29 14.4 0.00 16.9 38.8
19 20 Vaishali 562123 292711 269412 3068 1595 1473 20.7 0.10 29.4 29.9
20 21 Darbhanga 511125 266236 244889 841 467 374 15.5 0.00 24.7 49.5
21 22 Madhubani 481922 248774 233148 1260 647 613 13.5 0.00 22.2 35.8
22 23 Samastipur 628838 325101 303737 3362 2724 638 18.5 0.10 25.1 22.0
23 24 Munger 150947 80031 70916 18060 9297 8763 13.3 1.60 42.6 37.3
24 25 Begusarai 341173 177897 163276 1505 823 682 14.5 0.10 31.4 78.6
25 26 Shekhapura 103732 54327 49405 211 115 96 19.7 0.00 25.2 45.6
26 27 Lakhisarai 126575 65781 60794 5636 2918 2718 15.8 0.70 26.8 12.9
27 28 Jamui 242710 124538 118172 67357 34689 32668 17.4 4.80 24.5 26.7
Your problem is that the file is parsed as whitespace-delimited, but some of your district names also contain spaces. If the columns in the file are actually separated by tabs, then since none of the district names contain '\t' characters, you can use the tab as the delimiter:
df = pandas.read_csv('biharpopulation.txt', delimiter='\t')
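If the file turns out to be separated by plain spaces rather than tabs, another option is to parse each line with a regular expression: a leading serial number, a trailing run of numbers, and whatever sits in between as the district name. This is only a sketch; the `COLUMNS` list (with `.1` suffixes for the repeated headers), the `ROW` pattern and `parse_rows` are hypothetical names introduced here. Lines that do not start with a number, such as the header or the stray "Saran" line, are skipped, and short rows are padded with NaN.

```python
import re
import pandas as pd

# Hypothetical column names; repeated headers get a ".1" suffix
COLUMNS = ["SlNo", "District", "Total", "Male", "Female",
           "Total.1", "Male.1", "Female.1", "SC", "ST", "SC.1", "ST.1"]

# serial number, district name (may contain spaces), trailing numbers
ROW = re.compile(r"^(\d+)\s+(.+?)\s+(\d+(?:\.\d+)?(?:\s+\d+(?:\.\d+)?)*)$")

def parse_rows(lines):
    rows = []
    for line in lines:
        m = ROW.match(line.strip())
        if m is None:
            continue  # header line or stray text
        slno, district, tail = m.groups()
        values = [float(v) for v in tail.split()]
        # pad rows that are missing trailing values
        values += [float("nan")] * (len(COLUMNS) - 2 - len(values))
        rows.append([int(slno), district, *values])
    return pd.DataFrame(rows, columns=COLUMNS)

sample = [
    "SlNo District Total Male Female Total Male Female SC ST SC ST",
    "9 Arawal 11479 57677 53802 294 179 115 18.8 0.04",
    "Saran",
    "16 E. Champaran 514119 270968 243151 4812 2518 2294 13.0 0.1 20.6 34.3",
]
df = parse_rows(sample)
print(df[["SlNo", "District", "Total"]])
```

For the real file, the lines could be read with `open('biharpopulation.txt')` and passed to `parse_rows`.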

Convert yearly cumulative data to monthly absolute values in Python

Say I have a yearly cumulative dataframe as follows:
date v1 v2
0 2019-10 109.23 126.17
1 2019-09 108.90 121.07
2 2019-08 95.96 85.40
3 2019-07 91.30 82.92
4 2019-06 80.19 26.04
5 2019-05 65.98 18.58
6 2019-04 38.80 9.87
7 2019-03 3.01 2.51
8 2019-02 3.01 2.49
9 2018-12 221.31 249.87
10 2018-11 215.59 137.92
11 2018-10 195.16 110.69
12 2018-09 160.45 101.15
13 2018-08 124.70 75.57
14 2018-07 122.98 52.48
15 2018-06 73.46 34.82
16 2018-05 42.22 34.61
17 2018-04 9.94 28.52
18 2018-03 4.07 28.52
19 2018-02 2.04 21.84
I wonder if it is possible to generate cum_v1 and cum_v2 for each year of data.
The logic is: cum_v1 for 2019-10 is v1 in 2019-10 minus v1 in 2019-09, and so on back through the year; for 2019-02, the first month with data, cum_v1 keeps the same value as v1; and all values for 2019-01 are set to 0. The same logic applies to 2018.
The desired output looks like this:
date v1 cum_v1 v2 cum_v2
0 2019-10 109.23 0.33 126.17 5.10
1 2019-09 108.90 12.94 121.07 35.67
2 2019-08 95.96 4.66 85.40 2.48
3 2019-07 91.30 11.11 82.92 56.88
4 2019-06 80.19 14.21 26.04 7.46
5 2019-05 65.98 27.18 18.58 8.71
6 2019-04 38.80 35.79 9.87 7.36
7 2019-03 3.01 0.00 2.51 0.02
8 2019-02 3.01 3.01 2.49 2.49
9 2019-01 0 0 0 0
10 2018-12 221.31 5.72 249.87 111.95
11 2018-11 215.59 20.43 137.92 27.23
12 2018-10 195.16 34.71 110.69 9.54
13 2018-09 160.45 35.75 101.15 25.58
14 2018-08 124.70 1.72 75.57 23.09
15 2018-07 122.98 49.52 52.48 17.66
16 2018-06 73.46 31.24 34.82 0.21
17 2018-05 42.22 32.28 34.61 6.09
18 2018-04 9.94 5.87 28.52 0.00
19 2018-03 4.07 2.03 28.52 6.68
20 2018-02 2.04 2.04 21.84 21.84
21 2018-01 0 0 0 0
Using DataFrame.groupby with diff, grouping by the year part of the date and selecting the value columns:
df[['cum_v1', 'cum_v2']] = df.groupby(df['date'].str[:4])[['v1', 'v2']].diff(-1).fillna(df[['v1', 'v2']])
print(df)
Output:
date v1 v2 cum_v1 cum_v2
0 2019-10 109.23 126.17 0.33 5.10
1 2019-09 108.90 121.07 12.94 35.67
2 2019-08 95.96 85.40 4.66 2.48
3 2019-07 91.30 82.92 11.11 56.88
4 2019-06 80.19 26.04 14.21 7.46
5 2019-05 65.98 18.58 27.18 8.71
6 2019-04 38.80 9.87 35.79 7.36
7 2019-03 3.01 2.51 0.00 0.02
8 2019-02 3.01 2.49 3.01 2.49
9 2018-12 221.31 249.87 5.72 111.95
10 2018-11 215.59 137.92 20.43 27.23
11 2018-10 195.16 110.69 34.71 9.54
12 2018-09 160.45 101.15 35.75 25.58
13 2018-08 124.70 75.57 1.72 23.09
14 2018-07 122.98 52.48 49.52 17.66
15 2018-06 73.46 34.82 31.24 0.21
16 2018-05 42.22 34.61 32.28 6.09
17 2018-04 9.94 28.52 5.87 0.00
18 2018-03 4.07 28.52 2.03 6.68
19 2018-02 2.04 21.84 2.04 21.84
Use DataFrameGroupBy.diff, grouping by Series.dt.year and passing the columns as a list. Replace the missing values in the last row of each group with the original values using DataFrame.fillna, add a prefix with DataFrame.add_prefix, and finally join the result to the original with DataFrame.join:
df['date'] = pd.to_datetime(df['date']).dt.to_period('m')
cols = ['v1','v2']
df = df.join(df.groupby(df['date'].dt.year)[cols].diff(-1).fillna(df[cols]).add_prefix('cum'))
print(df)
date v1 v2 cumv1 cumv2
0 2019-10 109.23 126.17 0.33 5.10
1 2019-09 108.90 121.07 12.94 35.67
2 2019-08 95.96 85.40 4.66 2.48
3 2019-07 91.30 82.92 11.11 56.88
4 2019-06 80.19 26.04 14.21 7.46
5 2019-05 65.98 18.58 27.18 8.71
6 2019-04 38.80 9.87 35.79 7.36
7 2019-03 3.01 2.51 0.00 0.02
8 2019-02 3.01 2.49 3.01 2.49
9 2018-12 221.31 249.87 5.72 111.95
10 2018-11 215.59 137.92 20.43 27.23
11 2018-10 195.16 110.69 34.71 9.54
12 2018-09 160.45 101.15 35.75 25.58
13 2018-08 124.70 75.57 1.72 23.09
14 2018-07 122.98 52.48 49.52 17.66
15 2018-06 73.46 34.82 31.24 0.21
16 2018-05 42.22 34.61 32.28 6.09
17 2018-04 9.94 28.52 5.87 0.00
18 2018-03 4.07 28.52 2.03 6.68
19 2018-02 2.04 21.84 2.04 21.84
EDIT:
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').resample('MS').sum()
cols = ['v1','v2']
df = (df.join(df.groupby(df.index.year)[cols].diff(-1).fillna(df[cols])
.add_prefix('cum')).to_period('m'))
print(df)
v1 v2 cumv1 cumv2
date
2018-02 2.04 21.84 -2.03 -6.68
2018-03 4.07 28.52 -5.87 0.00
2018-04 9.94 28.52 -32.28 -6.09
2018-05 42.22 34.61 -31.24 -0.21
2018-06 73.46 34.82 -49.52 -17.66
2018-07 122.98 52.48 -1.72 -23.09
2018-08 124.70 75.57 -35.75 -25.58
2018-09 160.45 101.15 -34.71 -9.54
2018-10 195.16 110.69 -20.43 -27.23
2018-11 215.59 137.92 -5.72 -111.95
2018-12 221.31 249.87 221.31 249.87
2019-01 0.00 0.00 -3.01 -2.49
2019-02 3.01 2.49 0.00 -0.02
2019-03 3.01 2.51 -35.79 -7.36
2019-04 38.80 9.87 -27.18 -8.71
2019-05 65.98 18.58 -14.21 -7.46
2019-06 80.19 26.04 -11.11 -56.88
2019-07 91.30 82.92 -4.66 -2.48
2019-08 95.96 85.40 -12.94 -35.67
2019-09 108.90 121.07 -0.33 -5.10
2019-10 109.23 126.17 109.23 126.17
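In the output above the rows are in ascending date order, so `diff(-1)` subtracts the following month and the differences come out negated. When the data is sorted ascending, a plain `diff()` within each year gives the month-over-month change instead. A minimal sketch on a five-row sample, assuming string dates in `YYYY-MM` form; the `monthly_` prefix and the `out` name are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2019-04", "2019-03", "2019-02", "2018-03", "2018-02"],
    "v1": [38.80, 3.01, 3.01, 4.07, 2.04],
    "v2": [9.87, 2.51, 2.49, 28.52, 21.84],
})
cols = ["v1", "v2"]

# Sort ascending so each row's predecessor is the previous month
out = df.sort_values("date").reset_index(drop=True)
year = out["date"].str[:4]

# Within each year: current cumulative value minus the previous one.
# The first month of a year has no predecessor, so it keeps its own value.
monthly = out.groupby(year)[cols].diff().fillna(out[cols]).round(2)
out = out.join(monthly.add_prefix("monthly_"))
print(out)
```

Sorting back with `sort_values("date", ascending=False)` would restore the original descending layout.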

Read excel row by row and do transpose, Python 3.6

I have an Excel file with the data below. I want to read the rows starting where the first column contains 'Area' and transpose them, then move on to the next row where the first column contains 'Area' and transpose that block, and so on.
The data contains 3 tables in total; I want to split them and then transpose each one. The first column contains the area code and the other column names are years.
Area 1980 1981 1982 1983
AU 33.7 38.8 40.2 42.5
BE 54.6 51.6 49.7 48.9
FI 43.2 49.6 58.8 71.1
Area 1979 1980 1981 1982
AU 29.8 33.7 38.8 40.2
BE 54.2 54.6 51.6 49.7
CA 39.4 44.3 50.6 48
Area 1978 1979 1980 1981
DK 58 57.2 54.5 53.2
FI 37.7 43.2 49.6 58.8
FR 41.6 49.9 55.4 58.5
Final Result expected:
Area variable value
AU 1980 33.7
other values
How to achieve this?
Assuming that we have the following list of DataFrames:
In [106]: dfs
Out[106]:
[ Area 1980 1981 1982 1983
0 AU 33.7 38.8 40.2 42.5
1 BE 54.6 51.6 49.7 48.9
2 FI 43.2 49.6 58.8 71.1, Area 1979 1980 1981 1982
0 AU 29.8 33.7 38.8 40.2
1 BE 54.2 54.6 51.6 49.7
2 CA 39.4 44.3 50.6 48.0, Area 1978 1979 1980 1981
0 DK 58.0 57.2 54.5 53.2
1 FI 37.7 43.2 49.6 58.8
2 FR 41.6 49.9 55.4 58.5]
First we concatenate them horizontally:
In [107]: df = pd.concat([x.set_index('Area') for x in dfs], axis=1)
In [108]: df
Out[108]:
1980 1981 1982 1983 1979 1980 1981 1982 1978 1979 1980 1981
AU 33.7 38.8 40.2 42.5 29.8 33.7 38.8 40.2 NaN NaN NaN NaN
BE 54.6 51.6 49.7 48.9 54.2 54.6 51.6 49.7 NaN NaN NaN NaN
CA NaN NaN NaN NaN 39.4 44.3 50.6 48.0 NaN NaN NaN NaN
DK NaN NaN NaN NaN NaN NaN NaN NaN 58.0 57.2 54.5 53.2
FI 43.2 49.6 58.8 71.1 NaN NaN NaN NaN 37.7 43.2 49.6 58.8
FR NaN NaN NaN NaN NaN NaN NaN NaN 41.6 49.9 55.4 58.5
Now we can stack the DataFrame and rename the columns:
In [109]: df.stack().reset_index() \
.rename(columns={'level_0':'Area','level_1':'variable',0:'value'})
Out[109]:
Area variable value
0 AU 1980 33.7
1 AU 1981 38.8
2 AU 1982 40.2
3 AU 1983 42.5
4 AU 1979 29.8
5 AU 1980 33.7
6 AU 1981 38.8
7 AU 1982 40.2
8 BE 1980 54.6
9 BE 1981 51.6
10 BE 1982 49.7
11 BE 1983 48.9
12 BE 1979 54.2
13 BE 1980 54.6
14 BE 1981 51.6
15 BE 1982 49.7
16 CA 1979 39.4
17 CA 1980 44.3
18 CA 1981 50.6
19 CA 1982 48.0
20 DK 1978 58.0
21 DK 1979 57.2
22 DK 1980 54.5
23 DK 1981 53.2
24 FI 1980 43.2
25 FI 1981 49.6
26 FI 1982 58.8
27 FI 1983 71.1
28 FI 1978 37.7
29 FI 1979 43.2
30 FI 1980 49.6
31 FI 1981 58.8
32 FR 1978 41.6
33 FR 1979 49.9
34 FR 1980 55.4
35 FR 1981 58.5
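The answer above starts from a list `dfs` that has already been split. One way that split could be done is to treat every row whose first cell is 'Area' as the start of a new table and then melt each table into long form. This is a sketch under the assumption that the sheet was read without a header (for example with `pd.read_excel(..., header=None)`); the shortened `raw` frame and the names `block_id`, `pieces` and `long_df` are illustrative.

```python
import pandas as pd

# Raw rows as they might look after reading the sheet with header=None
raw = pd.DataFrame([
    ["Area", 1980, 1981],
    ["AU", 33.7, 38.8],
    ["BE", 54.6, 51.6],
    ["Area", 1979, 1980],
    ["AU", 29.8, 33.7],
    ["CA", 39.4, 44.3],
])

# Every row whose first cell is "Area" starts a new table
block_id = (raw[0] == "Area").cumsum().rename("block")

pieces = []
for _, block in raw.groupby(block_id):
    # First row of the block holds the header: "Area" plus the years
    header = ["Area"] + [int(y) for y in block.iloc[0].tolist()[1:]]
    table = block.iloc[1:].copy()
    table.columns = header
    # Wide to long: one row per (Area, year, value)
    pieces.append(table.melt(id_vars=["Area"], var_name="variable",
                             value_name="value"))

long_df = pd.concat(pieces, ignore_index=True)
long_df["value"] = long_df["value"].astype(float)
print(long_df)
```

Unlike the horizontal concat, this keeps each table separate until the end, so repeated years across tables do not produce duplicate column labels.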
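What have you tried thus far?
Pandas is a really good library to use for data parsing etc.
You could implement something along the lines of:
import pandas as pd
# read without a header so the repeated 'Area' rows stay in the data
df = pd.read_csv(csv_filename, header=None)
def create_new_tables(df, block_size=4):
    # assumes each table is one 'Area' header row followed by three data rows
    tables = []
    for start in range(0, len(df), block_size):
        block = df.iloc[start:start + block_size]
        # create a new dataframe using the block's first row as column names
        newdf = block.iloc[1:].copy()
        newdf.columns = block.iloc[0]
        tables.append(newdf.set_index('Area').transpose())
    return tables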
