Selecting Rows in pandas multiindex using query - python

I have the following dataframe:
Attributes Adj Close
Symbols ADANIPORTS.NS ASIANPAINT.NS AXISBANK.NS BAJAJ-AUTO.NS BAJFINANCE.NS BAJAJFINSV.NS BHARTIARTL.NS INFRATEL.NS BPCL.NS BRITANNIA.NS ... TCS.NS TATAMOTORS.NS TATASTEEL.NS TECHM.NS TITAN.NS ULTRACEMCO.NS UPL.NS VEDL.NS WIPRO.NS ZEEL.NS
month day
1 1 279.239893 676.232860 290.424052 2324.556588 974.134152 3710.866499 290.157978 243.696764 146.170036 950.108271 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 240.371331 507.737111 236.844831 2340.821987 718.111446 3042.034076 277.125503 236.177303 122.136606 733.759396 ... -2.714824 2.830603 109.334502 -17.856865 13.293902 18.980020 0.689529 -0.006994 -3.862265 -10.423989
3 241.700116 498.997079 213.632179 2368.956136 746.050460 3292.162304 279.075750 231.213816 114.698633 686.986466 ... 0.075497 -0.629591 -0.241416 -0.260787 1.392858 -1.196444 -0.660421 -0.161608 -0.243293 -1.687734
4 223.532480 439.849441 201.245454 2391.910913 499.554044 2313.025635 287.582485 276.568762 104.650728 603.446742 ... -1.270405 0.178012 0.109399 -0.224380 -0.415277 -5.050810 -0.084462 -0.075032 3.924894 0.959136
5 213.588413 359.632790 187.594303 2442.596619 309.180993 1587.324934 260.401816 305.384079 95.571235 475.708696 ... -0.995601 -1.093621 0.214684 -1.189623 -2.503186 -0.511994 -0.512211 0.693024 -1.025715 -1.516946
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12 27 238.901700 500.376711 227.057510 2413.230611 748.599821 3299.320564 276.806537 242.597250 124.235449 727.263012 ... 2.770155 -4.410527 -0.031403 -5.315438 -1.792164 1.038870 -0.860125 -1.258880 -0.933370 -1.487581
28 236.105050 461.535601 218.893424 2375.671582 542.521903 2613.480190 284.374906 264.309625 117.807956 681.625725 ... -0.614677 -1.045941 0.688749 -0.375988 1.848569 -1.362454 37.301528 4.794349 -21.079648 -2.224608
29 215.606034 372.030459 203.876520 2450.112244 324.772498 1765.010912 257.278008 300.096024 108.679112 543.112336 ... 3.220893 -28.873421 0.197491 0.649738 0.737047 -6.121189 -1.165286 0.197648 0.250269 -0.064486
30 205.715512 432.342895 235.872734 2279.715479 515.535031 2164.257183 237.584375 253.401642 116.322402 634.503822 ... -1.190093 0.111826 -1.100066 -0.274475 -1.107278 -0.638013 -7.148901 -0.594369 -0.622608 0.368726
31 222.971462 490.784491 246.348255 2211.909688 670.891505 2671.694809 260.623987 230.032092 108.617400 719.389436 ... -1.950700 0.994181 -11.328524 -1.575859 -8.297147 1.151578 -0.059656 -0.650074 -0.648105 -0.749307
366 rows × 601 columns
To select the row for month 1 and day 1, I used the following code:
df.query('month ==1' and 'day ==1')
But this produced the following dataframe:
Attributes Adj Close
Symbols ADANIPORTS.NS ASIANPAINT.NS AXISBANK.NS BAJAJ-AUTO.NS BAJFINANCE.NS BAJAJFINSV.NS BHARTIARTL.NS INFRATEL.NS BPCL.NS BRITANNIA.NS ... TCS.NS TATAMOTORS.NS TATASTEEL.NS TECHM.NS TITAN.NS ULTRACEMCO.NS UPL.NS VEDL.NS WIPRO.NS ZEEL.NS
month day
1 1 279.239893 676.232860 290.424052 2324.556588 974.134152 3710.866499 290.157978 243.696764 146.170036 950.108271 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1 215.752040 453.336287 213.741552 2373.224390 517.295897 2289.618629 280.212598 253.640594 104.505893 620.435294 ... -2.526060 -1.059128 -2.052233 3.941005 25.233763 -41.377432 1.032536 7.398859 -4.622867 -1.506376
3 1 233.534958 472.889636 204.900776 2318.030298 561.193189 2697.357413 254.006857 250.426263 106.528327 649.475321 ... -2.269081 -1.375370 -1.734496 27.675276 -1.944131 0.401074 -0.852499 -0.119033 -1.723600 -1.930760
4 1 192.280787 467.604906 227.369618 1982.318034 506.188324 1931.920305 252.626459 226.062386 98.663596 637.086713 ... -0.044923 -0.111909 -0.181328 -1.943672 1.983368 -1.677000 -0.531217 0.032385 -0.956535 -2.015332
5 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
6 1 230.836429 509.991614 218.370072 2463.180957 526.564244 2231.603166 289.425584 298.146594 118.566019 754.736115 ... -0.807933 -1.509616 1.792957 10.396550 -1.060003 2.008286 1.029651 6.690478 -3.114476 0.766063
7 1 197.943186 355.930544 242.388461 2168.834937 412.196744 1753.647647 233.189894 241.823186 90.870574 512.000742 ... -1.630295 11.019253 -0.244958 2.188104 -0.505939 -0.564639 -1.747775 -0.394980 -2.736355 -0.140087
8 1 236.361903 491.867703 218.289537 2102.183175 657.764627 2792.688073 264.695685 249.063224 108.213277 662.192035 ... -1.655988 -1.555488 -1.199192 -0.565774 -1.831832 -4.770262 -0.442534 -6.168488 -0.267261 -3.324977
9 1 229.131335 372.101859 225.172708 2322.747894 333.243305 1800.901049 246.923254 287.262203 114.754666 562.854895 ... -2.419973 0.205031 -1.096847 -0.840121 -2.932670 1.719342 6.196965 -2.674245 -6.542936 -2.526353
10 1 208.748352 429.829772 222.081509 2095.421448 553.005620 2204.335371 259.718945 229.177512 102.475334 641.439810 ... 0.752312 -1.371583 -1.367145 -5.607321 3.259092 26.787332 -1.023199 -0.589042 0.507405 2.428903
11 1 248.233805 545.774276 241.743095 2390.945333 803.738236 3088.686081 277.757322 243.703551 131.933623 789.243830 ... -1.882445 -0.660089 -0.476966 -1.097497 -0.525270 -0.857579 -0.702017 0.016806 -0.792296 -0.368364
12 1 200.472858 353.177721 200.870312 2451.274841 295.858735 1556.379498 255.714673 301.000198 103.908244 514.528562 ... -0.789445 -14.382776 0.196276 -0.394203 7.600042 48.345830 -0.276618 -0.411825 2.271997 42.734886
12 rows × 601 columns
It returned day 1 of every month instead of the single row for month 1, day 1. What can I do to resolve this?

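Put the whole condition in one string. In 'month ==1' and 'day ==1', Python evaluates the and before query() is called; a non-empty string is truthy, so the expression returns just 'day ==1', and that is all query() sees: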
df.query('month == 1 and day == 1')
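A minimal sketch that reproduces the problem on a tiny stand-in frame (four rows with a (month, day) MultiIndex and a single hypothetical price column, using ADANIPORTS values from the printouts). query() can refer to MultiIndex level names, so the same expression should work on the full frame with its (Attributes, Symbols) columns.

```python
import pandas as pd

# Small stand-in for the question's frame: rows indexed by (month, day).
idx = pd.MultiIndex.from_tuples(
    [(1, 1), (1, 2), (2, 1), (3, 1)], names=["month", "day"]
)
df = pd.DataFrame(
    {"price": [279.239893, 240.371331, 215.752040, 233.534958]}, index=idx
)

# Python evaluates `and` before query() runs; a non-empty string is truthy,
# so this expression is simply the second string.
buggy_expr = 'month == 1' and 'day == 1'
print(buggy_expr)  # day == 1

# One string with `and` inside it lets query() combine both index levels.
row = df.query('month == 1 and day == 1')
print(row)

# Label-based alternative: a list holding one (month, day) tuple keeps a DataFrame.
same_row = df.loc[[(1, 1)]]
print(same_row)
```

The df.loc form avoids string parsing entirely, which can be handy when the month and day come from variables.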

Related

Using DataFrame Columns as id

Does anyone know how to transform this DataFrame so that the column names become a query ID (keeping the DataFrame's length) and the values are flattened? I am trying to learn about learning-to-rank algorithms. Thanks for the help.
AUD=X CAD=X CHF=X ... SGD=X THB=X ZAR=X
Date ...
2004-06-30 NaN 1.33330 1.25040 ... 1.72090 40.834999 6.12260
2004-07-01 NaN 1.33160 1.24900 ... 1.71420 40.716999 6.16500
2004-07-02 NaN 1.32270 1.23320 ... 1.71160 40.638000 6.12010
2004-07-05 NaN 1.32470 1.23490 ... 1.71480 40.658001 6.15010
2004-07-06 NaN 1.32660 1.23660 ... 1.71530 40.765999 6.20990
... ... ... ... ... ... ...
2021-07-19 1.352997 1.26169 0.91853 ... 1.35630 32.810001 14.38950
2021-07-20 1.362546 1.27460 0.91850 ... 1.36360 32.840000 14.53068
2021-07-21 1.362600 1.26751 0.92123 ... 1.36621 32.820000 14.59157
2021-07-22 1.360060 1.25689 0.91757 ... 1.36383 32.849998 14.57449
2021-07-23 1.354922 1.25640 0.91912 ... 1.35935 32.879002 14.69760
In [3]: df
Out[3]:
AUD=X CAD=X CHF=X SGD=X THB=X ZAR=X
Date
2004-06-30 NaN 1.3333 1.2504 1.7209 40.834999 6.1226
2004-07-01 NaN 1.3316 1.2490 1.7142 40.716999 6.1650
2004-07-02 NaN 1.3227 1.2332 1.7116 40.638000 6.1201
2004-07-05 NaN 1.3247 1.2349 1.7148 40.658001 6.1501
2004-07-06 NaN 1.3266 1.2366 1.7153 40.765999 6.2099
In [6]: df.columns = df.columns.str.slice(0, -2)
In [8]: df.T
Out[8]:
Date 2004-06-30 2004-07-01 2004-07-02 2004-07-05 2004-07-06
AUD NaN NaN NaN NaN NaN
CAD 1.333300 1.331600 1.3227 1.324700 1.326600
CHF 1.250400 1.249000 1.2332 1.234900 1.236600
SGD 1.720900 1.714200 1.7116 1.714800 1.715300
THB 40.834999 40.716999 40.6380 40.658001 40.765999
ZAR 6.122600 6.165000 6.1201 6.150100 6.209900
I'm still not super clear on the requirements, but this transformation might help.
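If "flattened" means one row per (Date, currency) pair with the currency name as the query ID, melt is one way to get there. A minimal sketch using a few values from the question; the column names qid and rate are hypothetical. Note that the result has (rows × columns) rows rather than the original length.

```python
import numpy as np
import pandas as pd

# A few rows and columns copied from the question's printout.
df = pd.DataFrame(
    {
        "AUD=X": [np.nan, np.nan],
        "CAD=X": [1.3333, 1.3316],
        "CHF=X": [1.2504, 1.2490],
    },
    index=pd.to_datetime(["2004-06-30", "2004-07-01"]),
)
df.index.name = "Date"

# Drop the '=X' suffix, as in the answer above.
df.columns = df.columns.str.slice(0, -2)

# Long format: one row per (Date, currency); the currency acts as the query id.
long_df = df.reset_index().melt(id_vars="Date", var_name="qid", value_name="rate")
print(long_df)
```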

How can I manipulate my DataFrame/Table in order to display in the following format?

How can I reshape the current output into the layout shown at the bottom? I've tried stacking and unstacking, but I can't quite hit the nail on the head. Any help would be highly appreciated.
My code:
import random
import pandas_datareader.data as web

portfolio_count = 0
Equity_perportfolio = []
Portfolio_sequence = []
while portfolio_count < 1:
    portfolio_count = portfolio_count + 1
    # n = number of assets in the portfolio
    n = 5
    # pick n random tickers from Tickers (the full candidate list, defined elsewhere)
    potential_portfolio = random.sample(Tickers, n)
    print("Portfolio number", portfolio_count)
    print(potential_portfolio)
    # Pull the relevant data about the selected stocks from Yahoo:
    # a DataFrame indexed by Date with (Attributes, Symbols) columns
    price_data = web.get_data_yahoo(potential_portfolio,
                                    start='2012-01-01',
                                    end='2021-03-31')
    # closing prices only, taken from the same download instead of a second request
    price_data_close = price_data['Close']
    print(price_data)
This gives me the following structure (ignore the NaNs):
Attributes Adj Close ... Volume
Symbols D HOLX PSX ... PSX MGM PG
Date ...
2012-01-03 36.209511 17.840000 NaN ... NaN 25873300.0 11565900.0
2012-01-04 35.912926 17.910000 NaN ... NaN 14717400.0 10595400.0
2012-01-05 35.837063 18.360001 NaN ... NaN 12437500.0 10085300.0
2012-01-06 35.471519 18.570000 NaN ... NaN 9079700.0 8421200.0
2012-01-09 35.423241 18.520000 NaN ... NaN 15750100.0 7836100.0
... ... ... ... ... ... ... ...
2021-03-25 75.220001 71.050003 82.440002 ... 2613300.0 9601500.0 7517300.0
2021-03-26 75.779999 73.419998 84.309998 ... 2368900.0 7809100.0 10820100.0
2021-03-29 76.699997 74.199997 82.529999 ... 1880600.0 7809700.0 11176000.0
2021-03-30 75.529999 73.870003 82.309998 ... 1960600.0 5668500.0 8090600.0
2021-03-31 75.959999 74.379997 81.540001 ... 2665200.0 7029900.0 9202600.0
However, I want the output in this format:
Date Symbols Open High Low Close Volume Adjusted
04/12/2020 MMM 172.130005 173.160004 171.539993 172.460007 2663600 171.050461
07/12/2020 MMM 171.720001 172.5 169.179993 170.149994 2526800 168.759323
08/12/2020 MMM 169.740005 172.830002 169.699997 172.460007 1730800 171.050461
08/12/2020 MMM 169.740005 172.830002 169.699997 172.460007 1730800 171.050461
11/12/2020 D 172.300003 174.649994 172.169998 174.020004 1875700 172.597702
11/12/2020 D 172.300003 174.649994 172.169998 174.020004 1875700 172.597702
11/12/2020 D 172.300003 174.649994 172.169998 174.020004 1875700 172.597702
14/12/2020 D 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
14/12/2020 D 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
14/12/2020 PSX 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
14/12/2020 PSX 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
15/12/2020 PSX 174.389999 175.059998 172.550003 174.679993 2270600 173.252304
18/12/2020 PSX 176.759995 177.460007 175.110001 176.419998 4682000 174.978088
18/12/2020 PSX 176.759995 177.460007 175.110001 176.419998 4682000 174.978088
23/12/2020 PG 175.300003 175.809998 173.960007 173.990005 1762600 172.567963
28/12/2020 PG 175.309998 176.399994 174.389999 174.710007 1403000 173.282074
29/12/2020 PG 175.550003 175.639999 173.149994 173.850006 1218900 172.429108
31/12/2020 PG 174.119995 174.869995 173.179993 174.789993 1841300 173.361404
05/01/2021 PG 172.009995 173.25 170.649994 171.580002 2295300 170.177643
07/01/2021 MMM 171.559998 173.460007 166.160004 169.720001 5863400 168.332855
07/01/2021 MMM 171.559998 173.460007 166.160004 169.720001 5863400 168.332855
07/01/2021 MMM 171.559998 173.460007 166.160004 169.720001 5863400 168.332855
08/01/2021 MMM 169.169998 169.539993 164.610001 166.619995 4808100 165.258179
13/01/2021 MMM 167.270004 167.740005 166.050003 166.279999 2098000 164.920959
15/01/2021 MMM 165.630005 166.259995 163.380005 165.550003 3550700 164.19693
19/01/2021 MMM 167.259995 169.550003 166.800003 169.119995 3903200 167.737747
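One possible approach, not taken from the original thread: move the Symbols level of the columns into the rows with stack, then turn the index into ordinary columns. A minimal sketch on a hypothetical two-date, two-symbol frame with placeholder values standing in for the web.get_data_yahoo result; with the real data, price_data would be the frame printed above. Depending on the pandas version, stack may emit a FutureWarning about its new implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the web.get_data_yahoo(...) result:
# Date index, (Attributes, Symbols) MultiIndex columns, placeholder values.
dates = pd.DatetimeIndex(["2020-12-04", "2020-12-07"], name="Date")
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close", "High", "Low", "Open", "Volume"], ["D", "MMM"]],
    names=["Attributes", "Symbols"],
)
price_data = pd.DataFrame(
    np.arange(len(dates) * len(columns), dtype=float).reshape(len(dates), -1),
    index=dates,
    columns=columns,
)

# Move the Symbols level from the columns into the rows, then flatten the index.
long_df = price_data.stack(level="Symbols").reset_index()

# Order rows and columns like the target layout in the question.
long_df = long_df.sort_values(["Date", "Symbols"]).reset_index(drop=True)
long_df = long_df[["Date", "Symbols", "Open", "High", "Low", "Close", "Volume", "Adj Close"]]
long_df.columns.name = None  # drop the leftover 'Attributes' label
print(long_df)
```

Older pandas versions drop rows that are entirely NaN during stack while newer ones keep them, so a dropna() afterwards makes the result consistent if the NaNs matter.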

List of dataframes with different column names to a single pandas dataframe

I have a list of 3 dataframes of stock tickers and prices that I want to combine into a single dataframe.
dataframes:
[ Date AMBU-B.CO BAVA.CO CARL-B.CO CHR.CO COLO-B.CO \
0 2020-01-02 112.500000 172.850006 984.400024 525.599976 814.000000
1 2020-01-03 111.300003 171.199997 989.799988 526.799988 812.000000
2 2020-01-06 108.150002 166.100006 1001.000000 519.599976 820.200012
3 2020-01-07 110.500000 170.000000 1002.000000 522.400024 823.599976
4 2020-01-08 109.599998 171.399994 993.000000 510.399994 820.000000
.. ... ... ... ... ... ...
308 2021-03-25 270.000000 295.200012 965.799988 562.599976 964.200012
309 2021-03-26 271.299988 302.000000 974.599976 548.599976 954.000000
310 2021-03-29 281.000000 294.000000 981.400024 575.000000 968.200012
311 2021-03-30 280.899994 282.600006 986.599976 567.400024 950.200012
312 2021-03-31 297.899994 286.399994 974.599976 576.400024 953.799988
DANSKE.CO DEMANT.CO DSV.CO FLS.CO ... NETC.CO \
0 110.349998 208.600006 769.799988 272.500000 ... 314.000000
1 107.900002 206.600006 751.400024 267.899994 ... 313.000000
2 106.699997 206.500000 752.400024 265.600006 ... 309.799988
3 107.750000 204.399994 753.799988 273.399994 ... 309.200012
4 108.250000 205.600006 755.799988 268.000000 ... 309.200012
.. ... ... ... ... ... ...
308 117.349998 260.399994 1170.000000 230.199997 ... 603.000000
309 120.050003 267.600006 1212.500000 237.800003 ... 603.500000
310 118.750000 267.100006 1206.000000 238.300003 ... 599.000000
311 120.500000 265.500000 1213.500000 243.600006 ... 592.000000
312 118.699997 268.700012 1244.500000 243.100006 ... 604.000000
NOVO-B.CO NZYM-B.CO ORSTED.CO PNDORA.CO RBREW.CO ROCK-B.CO \
0 388.700012 327.100006 681.000000 293.000000 603.000000 1584.0
1 383.200012 322.500000 677.400024 293.200012 605.200012 1567.0
2 382.049988 321.200012 670.200012 328.200012 601.599976 1547.0
3 381.700012 322.000000 662.000000 339.299988 612.200012 1546.0
4 382.500000 322.700012 645.000000 343.600006 602.200012 1531.0
.. ... ... ... ... ... ...
308 425.450012 403.399994 983.000000 655.799988 658.400024 2506.0
309 423.549988 404.100006 1013.500000 666.400024 666.599976 2672.0
310 431.549988 404.000000 1013.000000 678.400024 669.799988 2650.0
311 430.700012 401.500000 998.799988 678.400024 672.000000 2632.0
312 429.750000 406.299988 1024.500000 679.599976 663.400024 2674.0
SIM.CO TRYG.CO VWS.CO
0 776.0 196.399994 659.400024
1 764.5 195.600006 648.599976
2 751.5 195.000000 648.400024
3 753.5 200.000000 639.599976
4 762.0 197.500000 645.400024
.. ... ... ...
308 769.0 145.300003 1138.500000
309 775.5 146.500000 1187.000000
310 772.0 149.000000 1217.000000
311 781.0 149.800003 1245.000000
312 785.5 149.600006 1302.000000
[313 rows x 26 columns],
Date 1COV.DE ADS.DE ALV.DE BAS.DE BAYN.DE \
0 2020-01-02 42.180000 291.549988 221.500000 68.290001 73.519997
1 2020-01-03 41.900002 291.950012 219.050003 67.269997 72.580002
2 2020-01-06 39.889999 289.649994 217.699997 66.269997 71.739998
3 2020-01-07 40.130001 294.750000 218.199997 66.300003 72.129997
4 2020-01-08 40.830002 302.850006 218.300003 65.730003 74.000000
.. ... ... ... ... ... ...
314 2021-03-29 56.439999 264.100006 214.600006 70.029999 53.360001
315 2021-03-30 58.200001 265.000000 219.050003 71.879997 53.750000
316 2021-03-31 57.340000 266.200012 217.050003 70.839996 53.959999
317 2021-04-01 57.660000 267.950012 217.649994 71.629997 53.419998
318 2021-04-01 57.660000 267.950012 217.649994 71.629997 53.419998
BEI.DE BMW.DE CON.DE DAI.DE ... IFX.DE LIN.DE \
0 105.650002 74.220001 116.400002 49.974998 ... 20.684999 190.050003
1 105.650002 73.320000 113.980003 49.070000 ... 20.389999 185.300003
2 106.000000 73.050003 112.680000 48.805000 ... 20.045000 183.600006
3 105.750000 74.220001 115.120003 49.195000 ... 21.040001 185.300003
4 106.199997 74.410004 117.339996 49.470001 ... 21.309999 185.850006
.. ... ... ... ... ... ... ...
314 90.220001 85.599998 111.949997 73.709999 ... 34.880001 237.000000
315 90.040001 88.800003 113.449997 75.940002 ... 35.535000 238.500000
316 90.099998 88.470001 112.699997 76.010002 ... 36.154999 238.899994
317 90.500000 89.519997 112.760002 NaN ... 36.570000 238.699997
318 90.500000 89.519997 112.760002 74.970001 ... 36.570000 238.699997
MRK.DE MTX.DE MUV2.DE RWE.DE SAP.DE SIE.DE \
0 106.000000 258.100006 265.899994 26.959999 122.000000 118.639999
1 107.250000 257.799988 262.600006 26.840000 120.459999 116.360001
2 108.400002 258.000000 262.700012 26.450001 119.559998 115.820000
3 109.500000 262.299988 264.500000 27.049999 120.099998 116.559998
4 111.300003 263.000000 265.000000 27.170000 120.820000 117.040001
.. ... ... ... ... ... ...
314 145.949997 196.199997 260.200012 32.709999 104.300003 137.839996
315 145.949997 201.300003 265.000000 32.400002 103.559998 141.080002
316 145.800003 200.699997 262.600006 33.419998 104.419998 140.000000
317 145.800003 206.199997 266.049988 34.060001 106.000000 141.020004
318 145.800003 206.199997 266.049988 34.060001 106.000000 141.020004
VNA.DE VOW3.DE
0 48.419998 180.500000
1 48.599998 176.639999
2 48.450001 176.619995
3 48.709999 176.059998
4 48.970001 176.820007
.. ... ...
314 55.599998 229.750000
315 55.619999 240.550003
316 55.700001 238.600006
317 56.099998 235.850006
318 56.099998 235.850006
[319 rows x 31 columns],
Date ADE.OL AKRBP.OL BAKKA.OL DNB.OL EQNR.OL \
0 2020-01-02 106.800003 289.000000 664.0 165.800003 177.949997
1 2020-01-03 108.199997 292.899994 670.0 164.850006 180.949997
2 2020-01-06 107.000000 296.299988 654.0 164.899994 185.000000
3 2020-01-07 111.199997 295.700012 657.5 163.899994 183.000000
4 2020-01-08 108.800003 295.299988 668.5 166.000000 183.600006
.. ... ... ... ... ... ...
310 2021-03-25 133.000000 237.500000 633.0 178.050003 164.449997
311 2021-03-26 133.300003 244.199997 640.0 181.449997 167.649994
312 2021-03-29 131.100006 248.199997 660.0 182.000000 169.750000
313 2021-03-30 126.900002 244.800003 672.0 182.500000 168.600006
314 2021-03-31 125.900002 242.800003 677.5 182.000000 167.300003
GJF.OL LSG.OL MOWI.OL NAS.OL ... NHY.OL \
0 184.149994 59.240002 229.500000 4094.000000 ... 33.410000
1 185.100006 58.900002 229.800003 3986.000000 ... 32.660000
2 182.550003 59.000000 229.199997 3857.000000 ... 32.299999
3 184.600006 59.000000 227.199997 3964.000000 ... 32.220001
4 184.199997 59.700001 226.699997 3964.000000 ... 32.090000
.. ... ... ... ... ... ...
310 199.199997 70.680000 205.500000 53.299999 ... 50.060001
311 200.000000 71.959999 208.000000 53.020000 ... 53.080002
312 200.600006 73.099998 209.699997 55.000000 ... 53.060001
313 200.399994 73.419998 210.800003 60.759998 ... 53.419998
314 200.600006 73.099998 212.199997 66.400002 ... 54.759998
ORK.OL SALM.OL SCATC.OL SCHA.OL STB.OL SUBC.OL \
0 89.959999 454.000000 123.400002 271.299988 69.900002 105.900002
1 89.699997 453.899994 123.000000 272.100006 69.500000 107.150002
2 89.139999 453.500000 117.300003 268.299988 68.639999 108.150002
3 89.879997 447.700012 116.000000 272.299988 69.720001 107.699997
4 87.720001 451.799988 118.400002 271.899994 70.139999 107.250000
.. ... ... ... ... ... ...
310 84.000000 568.799988 235.000000 368.200012 81.779999 87.800003
311 84.400002 581.799988 237.600006 375.700012 83.860001 87.000000
312 84.839996 585.000000 244.600006 367.399994 84.540001 87.820000
313 84.800003 587.400024 246.399994 361.000000 85.400002 87.279999
314 83.839996 590.000000 258.600006 359.000000 86.139999 85.900002
TEL.OL TOM.OL YAR.OL
0 157.649994 287.799988 361.299988
1 158.800003 284.399994 356.000000
2 159.399994 280.000000 356.000000
3 156.850006 274.000000 351.399994
4 155.449997 278.600006 357.299988
.. ... ... ...
310 149.350006 376.200012 438.000000
311 149.050003 376.700012 444.000000
312 151.000000 378.500000 448.500000
313 150.600006 372.799988 447.200012
314 150.500000 370.299988 444.799988
[315 rows x 21 columns]]
I found out that pd.concat is the usual way to do this, but it does not seem to work for me:
df = pd.concat(dataframes)
df
It returns a lot of NaNs, and it should not. How can I solve this? In case it helps, all the dataframes use the same dates, from 2020-01-02 to 2021-03-31.
Date AMBU-B.CO BAVA.CO CARL-B.CO CHR.CO COLO-B.CO DANSKE.CO DEMANT.CO DSV.CO FLS.CO ... NHY.OL ORK.OL SALM.OL SCATC.OL SCHA.OL STB.OL SUBC.OL TEL.OL TOM.OL YAR.OL
0 2020-01-02 112.500000 172.850006 984.400024 525.599976 814.000000 110.349998 208.600006 769.799988 272.500000 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2020-01-03 111.300003 171.199997 989.799988 526.799988 812.000000 107.900002 206.600006 751.400024 267.899994 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2020-01-06 108.150002 166.100006 1001.000000 519.599976 820.200012 106.699997 206.500000 752.400024 265.600006 ... NaN NaN NaN
EDIT: here is how the dataframes are created in the first place:
import pandas as pd
import yfinance as yf

def motor_daily(ticker_list):
    # Use the start and end dates to get closing prices for the given stocks.
    df = yf.download(ticker_list, start=phase_2.start(),
                     end=phase_2.tomorrow()).Close
    return df

def ticker_data(ticker_lists):
    # Takes a list of ticker-name lists ("ticks") and calls motor_daily
    # on each one to get a DataFrame from the Yahoo API.
    data = []
    for ticks in ticker_lists:
        data.append(motor_daily(ticks))
    return data

res = ticker_data(list_of_test)
dataframes = [pd.DataFrame(lst) for lst in res]
I fixed it myself; here is what I did:
dataframes_concat = pd.concat(dataframes)
df1 = dataframes_concat.groupby('Date', as_index=True).first()
print(df1)
AMBU-B.CO BAVA.CO CARL-B.CO CHR.CO COLO-B.CO DANSKE.CO DEMANT.CO DSV.CO FLS.CO GMAB.CO ... NHY.OL ORK.OL SALM.OL SCATC.OL SCHA.OL STB.OL SUBC.OL TEL.OL TOM.OL YAR.OL
Date
2020-01-02 112.500000 172.850006 984.400024 525.599976 814.000000 110.349998 208.600006 769.799988 272.500000 1486.5 ... 33.410000 89.959999 454.000000 123.400002 271.299988 69.900002 105.900002 157.649994 287.799988 361.299988
2020-01-03 111.300003 171.199997 989.799988 526.799988 812.000000 107.900002 206.600006 751.400024 267.899994 1444.5 ... 32.660000 89.699997 453.899994 123.000000 272.100006 69.500000 107.150002 158.800003 284.399994 356.000000
2020-01-06 108.150002 166.100006 1001.000000 519.599976 820.200012 106.699997 206.500000 752.400024 265.600006 1419.5 ... 32.299999 89.139999 453.500000 117.300003 268.299988 68.639999 108.150002 159.399994 280.000000 356.000000
2020-01-07 110.500000 170.000000 1002.000000 522.400024 823.599976 107.750000 204.399994 753.799988 273.399994 1456.0 ... 32.220001 89.879997 447.700012 116.000000 272.299988 69.720001 107.699997 156.850006 274.000000 351.399994
2020-01-08 109.599998 171.399994 993.000000 510.399994 820.000000 108.250000 205.600006 755.799988 268.000000 1466.5 ... 32.090000 87.720001 451.799988 118.400002 271.899994 70.139999 107.250000 155.449997 278.600006 357.299988
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-03-26 271.299988 302.000000 974.599976 548.599976 954.000000 120.050003 267.600006 1212.500000 237.800003 2045.0 ... 53.080002 84.400002 581.799988 237.600006 375.700012 83.860001 87.000000 149.050003 376.700012 444.000000
2021-03-29 281.000000 294.000000 981.400024 575.000000 968.200012 118.750000 267.100006 1206.000000 238.300003 2028.0 ... 53.060001 84.839996 585.000000 244.600006 367.399994 84.540001 87.820000 151.000000 378.500000 448.500000
2021-03-30 280.899994 282.600006 986.599976 567.400024 950.200012 120.500000 265.500000 1213.500000 243.600006 2019.0 ... 53.419998 84.800003 587.400024 246.399994 361.000000 85.400002 87.279999 150.600006 372.799988 447.200012
2021-03-31 297.899994 286.399994 974.599976 576.400024 953.799988 118.699997 268.700012 1244.500000 243.100006 2087.0 ... 54.759998 83.839996 590.000000 258.600006 359.000000 86.139999 85.900002 150.500000 370.299988 444.799988
2021-04-01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
The last row is NaN because those markets were closed for Easter.
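The NaNs come from concat's default axis=0: it appends the frames' rows one after another, and each frame only has its own tickers, so every other ticker is NaN in those rows. The groupby('Date').first() fix collapses those rows again. A minimal sketch of a column-wise alternative, assuming every frame has a Date column; the two tiny frames are stand-ins built from a few values in the question. The .DE frame above repeats 2021-04-01, and concatenating along axis=1 can fail on duplicated index labels, hence the drop_duplicates.

```python
import pandas as pd

# Two tiny frames shaped like the question's: a Date column plus ticker columns.
a = pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"], "AMBU-B.CO": [112.5, 111.3]})
b = pd.DataFrame(
    {"Date": ["2020-01-02", "2020-01-03", "2020-01-03"], "ADE.OL": [106.8, 108.2, 108.2]}
)  # b repeats a date, like the duplicated 2021-04-01 row in the .DE frame
dataframes = [a, b]

# Default concat (axis=0) appends rows, so each frame's tickers are NaN in the other's rows.
stacked = pd.concat(dataframes)
print(stacked)

# Index each frame by Date (dropping repeated dates) and concatenate side by side.
wide = pd.concat(
    [d.drop_duplicates(subset="Date").set_index("Date") for d in dataframes], axis=1
)
print(wide)
```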

Rolling Mean not being calculated on a new column

I have a problem calculating the rolling mean of a column I added in my code. For some reason, it doesn't work on the column I added, but it works on a column from the original CSV.
The original dataframe from the CSV is as follows:
Open High Low Last Change Volume Open Int
Time
09/20/19 98.50 99.00 98.35 98.95 0.60 3305.0 0.0
09/19/19 100.35 100.75 98.10 98.35 -2.00 17599.0 0.0
09/18/19 100.65 101.90 100.10 100.35 0.00 18258.0 121267.0
09/17/19 103.75 104.00 100.00 100.35 -3.95 34025.0 122453.0
09/16/19 102.30 104.95 101.60 104.30 1.55 21403.0 127447.0
Ticker = pd.read_csv('\\......\Historical data\kcz19 daily.csv',
index_col=0, parse_dates=True)
Ticker['Return'] = np.log(Ticker['Last'] / Ticker['Last'].shift(1)).fillna('')
Ticker['ret20'] = Ticker['Return'].rolling(window=20, win_type='triang').mean()
print(Ticker.head())
Open High Low ... Open Int Return ret20
Time ...
09/20/19 98.50 99.00 98.35 ... 0.0
09/19/19 100.35 100.75 98.10 ... 0.0 -0.00608213 -0.00608213
09/18/19 100.65 101.90 100.10 ... 121267.0 0.0201315 0.0201315
09/17/19 103.75 104.00 100.00 ... 122453.0 0 0
09/16/19 102.30 104.95 101.60 ... 127447.0 0.0386073 0.0386073
The ret20 column should hold the rolling mean of the Return column, so it should only show values starting from row 21, whereas here it is just a copy of Return.
If I use the Last column instead, it works.
Below is the result using the Last column:
Open High Low ... Open Int Return ret20
Time ...
09/20/19 98.50 99.00 98.35 ... 0.0 NaN
09/19/19 100.35 100.75 98.10 ... 0.0 -0.00608213 NaN
09/18/19 100.65 101.90 100.10 ... 121267.0 0.0201315 NaN
09/17/19 103.75 104.00 100.00 ... 122453.0 0 NaN
09/16/19 102.30 104.95 101.60 ... 127447.0 0.0386073 NaN
09/13/19 103.25 103.60 102.05 ... 128707.0 -0.0149725 NaN
09/12/19 102.80 103.85 101.15 ... 128904.0 0.00823848 NaN
09/11/19 102.00 104.70 101.40 ... 132067.0 -0.00193237 NaN
09/10/19 98.50 102.25 98.00 ... 135349.0 -0.0175614 NaN
09/09/19 97.00 99.25 95.30 ... 137347.0 -0.0335283 NaN
09/06/19 95.35 97.30 95.00 ... 135399.0 -0.0122889 NaN
09/05/19 96.80 97.45 95.05 ... 136142.0 -0.0171477 NaN
09/04/19 95.65 96.95 95.50 ... 134864.0 0.0125002 NaN
09/03/19 96.00 96.60 94.20 ... 134685.0 -0.0109291 NaN
08/30/19 95.40 97.20 95.10 ... 134061.0 0.0135137 NaN
08/29/19 97.05 97.50 94.75 ... 132639.0 -0.0166584 NaN
08/28/19 97.40 98.15 95.95 ... 130573.0 0.0238601 NaN
08/27/19 97.35 98.00 96.40 ... 129921.0 -0.00410889 NaN
08/26/19 95.55 98.50 95.25 ... 129003.0 0.0035962 NaN
08/23/19 96.90 97.40 95.05 ... 130268.0 -0.0149835 98.97775
Appreciate any help
The .fillna('') puts an empty string in the first row, which turns Return into an object (mixed string/float) column, so the rolling calculation in Ticker['ret20'] no longer works on numbers.
Delete this and the code will run fine:
Ticker['Return'] = np.log(Ticker['Last'] / Ticker['Last'].shift(1))
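A minimal sketch of the dtype change, using the first five Last prices from the question. It uses a plain rolling mean with window=3 so the short series produces values; the question's win_type='triang' additionally requires SciPy and is left out here.

```python
import numpy as np
import pandas as pd

# Closing prices from the question's first rows (newest first, as in the CSV).
last = pd.Series([98.95, 98.35, 100.35, 100.35, 104.30], name="Last")

ret = np.log(last / last.shift(1))
print(ret.dtype)  # float64: the first value is NaN, the rest are numbers

ret_filled = ret.fillna('')
print(ret_filled.dtype)  # object: the '' forces a mixed-type column

# Rolling mean on the numeric column; the leading NaN just delays the first value.
rolled = ret.rolling(window=3).mean()
print(rolled)
```

If the empty first cell is only wanted for display, applying fillna('') to a copy used for printing keeps the numeric column intact for calculations.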

How to join a dataframe to a Series with different indices

I have a pandas data frame that looks like:
High Low ... Volume OpenInterest
2018-01-02 983.25 975.50 ... 8387 67556
2018-01-03 986.75 981.00 ... 7447 67525
2018-01-04 985.25 977.00 ... 8725 67687
2018-01-05 990.75 984.00 ... 7948 67975
I calculate the Average True Range and save it into a series:
n = 14  # ATR period
i = 0
TR_l = [0]
while i < (df.shape[0]-1):
    #TR = max(df.loc[i + 1, 'High'], df.loc[i, 'Close']) - min(df.loc[i + 1, 'Low'], df.loc[i, 'Close'])
    # positional access with .iloc, since df is indexed by date
    TR = max(df['High'].iloc[i+1], df['Close'].iloc[i]) - min(df['Low'].iloc[i+1], df['Close'].iloc[i])
    TR_l.append(TR)
    i = i + 1
TR_s = pd.Series(TR_l)
ATR = pd.Series(TR_s.ewm(span=n, min_periods=n).mean(), name='ATR_' + str(n))
With a 14-period rolling window ATR looks like:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 8.096064
14 7.968324
15 8.455205
16 9.046418
17 8.895405
18 9.088769
19 9.641879
20 9.516764
But when I do:
df = df.join(ATR)
The ATR column in df is all NaN. It's because the indexes are different between the data frame and ATR. Is there any way to add the ATR column into the data frame?
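If you want to keep the loop, the smallest change is to build the Series on the DataFrame's own index, so join() lines rows up by date instead of by 0..N-1. A minimal sketch on a hypothetical four-row frame: High and Low come from the question's printout, the Close values are made up, and n is set to 2 only so such a short example produces numbers.

```python
import pandas as pd

# Stand-in for the question's frame (High/Low from the printout; Close is hypothetical).
df = pd.DataFrame(
    {
        "High": [983.25, 986.75, 985.25, 990.75],
        "Low": [975.50, 981.00, 977.00, 984.00],
        "Close": [982.00, 984.50, 980.00, 989.00],
    },
    index=pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"]),
)
n = 2  # short span so this 4-row example produces values

TR_l = [0]
for i in range(df.shape[0] - 1):
    TR = max(df["High"].iloc[i + 1], df["Close"].iloc[i]) - min(df["Low"].iloc[i + 1], df["Close"].iloc[i])
    TR_l.append(TR)

# Building the Series on df.index makes join() align by date.
TR_s = pd.Series(TR_l, index=df.index)
ATR = TR_s.ewm(span=n, min_periods=n).mean().rename("ATR_" + str(n))
df = df.join(ATR)
print(df)
```

Assigning ATR.index = df.index before the join does the same thing, provided ATR has exactly one value per row of df.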
Consider shift to avoid the row-by-row while loop and list building. The example below uses Union Pacific (UNP) railroad stock data to demonstrate:
import pandas as pd
import pandas_datareader as pdr
stock_df = pdr.get_data_yahoo('UNP').loc['2019-01-01':'2019-03-29']
# SHIFT DATA ONE DAY BACK AND JOIN TO ORIGINAL DATA
stock_df = stock_df.join(stock_df.shift(-1), rsuffix='_future')
# CALCULATE TR DIFFERENCE BY ROW
stock_df['TR'] = stock_df.apply(lambda x: max(x['High_future'], x['Close']) - min(x['Low_future'], x['Close']), axis=1)
# CALCULATE EWM MEAN
n = 14
stock_df['ATR'] = stock_df['TR'].ewm(span=n, min_periods=n).mean()
Output
print(stock_df.head(20))
# High Low Open Close Volume Adj Close High_future Low_future Open_future Close_future Volume_future Adj Close_future TR ATR
# Date
# 2019-01-02 138.320007 134.770004 135.649994 137.779999 3606300.0 137.067413 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 5.610001 NaN
# 2019-01-03 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 5.900009 NaN
# 2019-01-04 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 2.970001 NaN
# 2019-01-07 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 14.240005 NaN
# 2019-01-08 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 2.449997 NaN
# 2019-01-09 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 6.279999 NaN
# 2019-01-10 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 1.940002 NaN
# 2019-01-11 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 2.590012 NaN
# 2019-01-14 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 2.619995 NaN
# 2019-01-15 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 2.819992 NaN
# 2019-01-16 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 3.990005 NaN
# 2019-01-17 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 4.160004 NaN
# 2019-01-18 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 3.929993 NaN
# 2019-01-22 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 3.590012 4.011254
# 2019-01-23 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 6.429993 4.376440
# 2019-01-24 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 1.779999 3.991223
# 2019-01-25 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 1.610001 3.643168
# 2019-01-28 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 2.179993 3.432011
# 2019-01-29 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 2.449997 3.291831
# 2019-01-30 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 160.990005 157.020004 160.750000 159.070007 7438600.0 158.247314 3.970001 3.387735
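As far as I can tell, shift(-1) puts the next day's High and Low on the current row, so each TR lands one row earlier than in the question's loop, which stores day i+1's range (against day i's close) on row i+1. A sketch that keeps the question's alignment by shifting Close forward instead, on a hypothetical four-row frame (High/Low from the question's printout, made-up Close values, n = 2 only so the short example produces numbers):

```python
import pandas as pd

# Hypothetical stand-in for the real price data.
df = pd.DataFrame(
    {
        "High": [983.25, 986.75, 985.25, 990.75],
        "Low": [975.50, 981.00, 977.00, 984.00],
        "Close": [982.00, 984.50, 980.00, 989.00],
    },
    index=pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"]),
)
n = 2

prev_close = df["Close"].shift(1)  # previous day's close on today's row
high = pd.concat([df["High"], prev_close], axis=1).max(axis=1)
low = pd.concat([df["Low"], prev_close], axis=1).min(axis=1)
df["TR"] = high - low  # first row falls back to High - Low (prev_close is NaN)
df["ATR"] = df["TR"].ewm(span=n, min_periods=n).mean()
print(df)
```

On the first row there is no previous close, so TR falls back to High - Low here, whereas the question's loop uses 0.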
