How can I process a DataFrame without a for loop? - python

My DataFrame is:
Date Open High Low Close Adj Close Volume
5932 2016-08-18 218.339996 218.899994 218.210007 218.860001 207.483215 52989300
5933 2016-08-19 218.309998 218.750000 217.740005 218.539993 207.179825 75443000
5934 2016-08-22 218.259995 218.800003 217.830002 218.529999 207.170364 61368800
5935 2016-08-23 219.250000 219.600006 218.899994 218.970001 207.587479 53399200
5936 2016-08-24 218.800003 218.910004 217.360001 217.850006 206.525711 71728900
5937 2016-08-25 217.399994 218.190002 217.220001 217.699997 206.383514 69224800
5938 2016-08-26 217.919998 219.119995 216.250000 217.289993 205.994827 122506300
5939 2016-08-29 217.440002 218.669998 217.399994 218.360001 207.009201 68606100
5940 2016-08-30 218.259995 218.589996 217.350006 218.000000 206.667908 58114500
5941 2016-08-31 217.610001 217.750000 216.470001 217.380005 206.080124 85269500
5942 2016-09-01 217.369995 217.729996 216.029999 217.389999 206.089645 97844200
5943 2016-09-02 218.389999 218.869995 217.699997 218.369995 207.018692 79293900
5944 2016-09-06 218.699997 219.119995 217.860001 219.029999 207.644394 56702100
5945 2016-09-07 218.839996 219.220001 218.300003 219.009995 207.625412 76554900
5946 2016-09-08 218.619995 218.940002 218.149994 218.509995 207.151398 73011600
5947 2016-09-09 216.970001 217.029999 213.250000 213.279999 202.193268 221589100
5948 2016-09-12 212.389999 216.809998 212.309998 216.339996 205.094223 168110900
5949 2016-09-13 214.839996 215.149994 212.500000 213.229996 202.145859 182828800
5950 2016-09-14 213.289993 214.699997 212.500000 213.149994 202.070023 134185500
5951 2016-09-15 212.960007 215.729996 212.750000 215.279999 204.089294 134427900
5952 2016-09-16 213.479996 213.690002 212.570007 213.369995 203.300430 155236400
Currently, I'm doing this:
state['open_price'] = lookback.Open.iloc[-1:].get_values()[0]
for ind, row in lookback.reset_index().iterrows():
    if ind < self.LOOKBACK_DAYS:
        state['close_' + str(self.LOOKBACK_DAYS - ind)] = row.Close
        state['open_' + str(self.LOOKBACK_DAYS - ind)] = row.Open
        state['volume_' + str(self.LOOKBACK_DAYS - ind)] = row.Volume
But this is exceedingly slow. Is there some more vectorized way to do this?
I am trying to convert this to:
cash 1.000000e+05
num_shares 0.000000e+00
cost_basis 0.000000e+00
open_price 1.316900e+02
close_20 1.301100e+02
open_20 1.302600e+02
volume_20 4.670420e+07
close_19 1.302100e+02
open_19 1.299900e+02
volume_19 4.320920e+07
close_18 1.300200e+02
open_18 1.300300e+02
volume_18 3.252300e+07
close_17 1.292200e+02
open_17 1.299300e+02
volume_17 8.207990e+07
close_16 1.300300e+02
open_16 1.294100e+02
volume_16 6.150570e+07
close_15 1.298000e+02
open_15 1.301100e+02
volume_15 7.057170e+07
close_14 1.298300e+02
open_14 1.300200e+02
volume_14 6.292560e+07
close_13 1.297300e+02
open_13 1.300700e+02
volume_13 6.162470e+07
close_12 1.305600e+02
open_12 1.297300e+02
...
close_10 1.308700e+02
open_10 1.308500e+02
volume_10 5.790620e+07
close_9 1.295400e+02
open_9 1.310600e+02
volume_9 8.018090e+07
close_8 1.297400e+02
open_8 1.297400e+02
volume_8 4.149650e+07
close_7 1.286400e+02
open_7 1.298500e+02
volume_7 7.279940e+07
close_6 1.288800e+02
open_6 1.287700e+02
volume_6 4.303370e+07
close_5 1.287100e+02
open_5 1.285900e+02
volume_5 5.105180e+07
close_4 1.286600e+02
open_4 1.288300e+02
volume_4 6.416770e+07
close_3 1.307000e+02
open_3 1.289300e+02
volume_3 9.253180e+07
close_2 1.309500e+02
open_2 1.307500e+02
volume_2 8.726900e+07
close_1 1.311300e+02
open_1 1.310000e+02
volume_1 8.600550e+07
Length: 64, dtype: float64

One way is to cheat and use the underlying NumPy arrays via .values.
I'll also show the steps I took to create an equivalent example:
import pandas as pd
from itertools import product
initial = ['cash', 'num_shares', 'somethingsomething']
initial_series = pd.Series([1, 2, 3], index = initial)
print(initial_series)
#Output:
cash 1
num_shares 2
somethingsomething 3
dtype: int64
These are just mocked stand-ins for the values at the start of your series.
df = pd.read_clipboard(sep=r'\s\s+')  # pure magic
print(df.head())
#Output:
Date Open ... Adj Close Volume
5932 2016-08-18 218.339996 ... 207.483215 52989300
5933 2016-08-19 218.309998 ... 207.179825 75443000
5934 2016-08-22 218.259995 ... 207.170364 61368800
5935 2016-08-23 219.250000 ... 207.587479 53399200
5936 2016-08-24 218.800003 ... 206.525711 71728900
[5 rows x 7 columns]
df is now essentially the dataframe you provided in the example. The clipboard trick comes from here and is a good read for pandas MCVEs.
to_select = ['Close', 'Open', 'Volume']
SOMELOOKBACK = 6000 #mocked
final_index = [f"{name}_{index}" for index, name in product((SOMELOOKBACK - df.index), to_select)]
This prepares the index labels, which look something like this:
['Close_68',
'Open_68',
'Volume_68',
'Close_67',
'Open_67',
'Volume_67',
...
]
Now select the relevant columns from the dataframe, use .values to get a 2D array, then flatten it to build the final series.
final_series = pd.Series(df[to_select].values.flatten(), index = final_index)
result = pd.concat([initial_series, final_series])  # Series.append was removed in pandas 2.0
#Output:
cash 1.000000e+00
num_shares 2.000000e+00
somethingsomething 3.000000e+00
Close_68 2.188600e+02
Open_68 2.183400e+02
Volume_68 5.298930e+07
Close_67 2.185400e+02
Open_67 2.183100e+02
Volume_67 7.544300e+07
Close_66 2.185300e+02
Open_66 2.182600e+02
Volume_66 6.136880e+07
...
Close_48 2.133700e+02
Open_48 2.134800e+02
Volume_48 1.552364e+08
Length: 66, dtype: float64
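If you want the exact key names from the question (close_20 ... volume_1, plus open_price), here is a minimal sketch of the same flatten trick. build_state is a hypothetical helper name, and it assumes lookback is ordered oldest to newest, as in the question's loop:
import pandas as pd

def build_state(lookback, lookback_days):
    # Vectorized replacement for the iterrows() loop: one flatten, no Python loop.
    cols = ['Close', 'Open', 'Volume']
    hist = lookback.iloc[:lookback_days]  # the rows the loop's ind < LOOKBACK_DAYS guard kept
    # Labels count down from lookback_days to 1, matching close_20 ... volume_1.
    labels = [f'{c.lower()}_{lookback_days - i}'
              for i in range(len(hist)) for c in cols]
    state = pd.Series(hist[cols].to_numpy().ravel(), index=labels)
    state['open_price'] = lookback['Open'].iloc[-1]
    return state
The row-major ravel() visits each row's Close, Open, Volume in turn, which is exactly the order the label list is built in, so values and labels stay aligned.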

Related

Iterate and save each stock's historical data in a dataframe without downloading to CSV

I would like to pull historical data from yfinance for a specific list of stocks. I want to store each stock in a separate dataframe (each stock with its own df).
I can download them to multiple CSVs with the code below, but I couldn't find a way to store them in different dataframes (without having to download them to CSV).
import yfinance
stocks = ['TSLA','MSFT','NIO','AAPL','AMD','ADBE','ALGN','AMZN','AMGN','AEP','ADI','ANSS','AMAT','ASML','TEAM','ADSK']
for i in stocks:
    df = yfinance.download(i, start='2015-01-01', end='2021-09-12')
    df.to_csv(i + '.csv')
I want my end result to be a dataframe called "TSLA" for TSLA historical data, another one called "MSFT" for MSFT data, and so on.
I tried:
stock = ['TSLA','MSFT','NIO','AAPL','AMD']
df_ = {}
for i in stock:
    df = yfinance.download(i, start='2015-01-01', end='2021-09-12')
    df_["{}".format(i)] = df
And I have to call each dataframe by key to get it, like df_["TSLA"], but this is not what I want. I need a dataframe called simply TSLA that has TSLA data, and so on. Is there a way to do it?
You don't need to download the data multiple times. You just have to split the whole dataset with groupby and create variables dynamically with locals():
stocks = ['TSLA', 'MSFT', 'NIO', 'AAPL', 'AMD', 'ADBE', 'ALGN', 'AMZN',
'AMGN', 'AEP', 'ADI', 'ANSS', 'AMAT', 'ASML', 'TEAM', 'ADSK']
data = yfinance.download(stocks, start='2015-01-01', end='2021-09-12')
for stock, df in data.groupby(level=1, axis=1):
    locals()[stock] = df.droplevel(level=1, axis=1)
    df.to_csv(f'{stock}.csv')
Output:
>>> TSLA
Adj Close Close High Low Open Volume
Date
2014-12-31 44.481998 44.481998 45.136002 44.450001 44.618000 11487500
2015-01-02 43.862000 43.862000 44.650002 42.652000 44.574001 23822000
2015-01-05 42.018002 42.018002 43.299999 41.431999 42.910000 26842500
2015-01-06 42.256001 42.256001 42.840000 40.841999 42.012001 31309500
2015-01-07 42.189999 42.189999 42.956001 41.956001 42.669998 14842000
... ... ... ... ... ... ...
2021-09-03 733.570007 733.570007 734.000000 724.200012 732.250000 15246100
2021-09-07 752.919983 752.919983 760.200012 739.260010 740.000000 20039800
2021-09-08 753.869995 753.869995 764.450012 740.770020 761.580017 18793000
2021-09-09 754.859985 754.859985 762.099976 751.630005 753.409973 14077700
2021-09-10 736.270020 736.270020 762.609985 734.520020 759.599976 15114300
[1686 rows x 6 columns]
>>> ANSS
Adj Close Close High Low Open Volume
Date
2014-12-31 82.000000 82.000000 83.480003 81.910004 83.080002 304600
2015-01-02 81.639999 81.639999 82.629997 81.019997 82.089996 282600
2015-01-05 80.860001 80.860001 82.070000 80.779999 81.290001 321500
2015-01-06 79.260002 79.260002 81.139999 78.760002 81.000000 344300
2015-01-07 79.709999 79.709999 80.900002 78.959999 79.919998 233300
... ... ... ... ... ... ...
2021-09-03 368.380005 368.380005 371.570007 366.079987 366.079987 293000
2021-09-07 372.070007 372.070007 372.410004 364.950012 369.609985 249500
2021-09-08 372.529999 372.529999 375.820007 369.880005 371.079987 325800
2021-09-09 371.970001 371.970001 375.799988 371.320007 372.519989 194900
2021-09-10 373.609985 373.609985 377.260010 372.470001 374.540009 278800
[1686 rows x 6 columns]
You can create a global or local variable like
globals()["TSLA"] = "some value"
print(TSLA)
locals()["TSLA"] = "some value"
print(TSLA)
but frankly it is a waste of time (and note that assigning through locals() only works reliably at module level, where locals() is globals()). It is much more useful to keep the dataframes in a dictionary.
With a dictionary you can use a for loop to run some code on all dataframes.
You can also select dataframes by name, etc.
Examples:
df_max = {}
for name, df in df_.items():
    df_max[name] = df.max()

name = input("What to display: ")
df_[name].plot()
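For completeness, a short sketch combining the single download with the dictionary route. It uses data.xs to select one ticker from the column MultiIndex, which avoids groupby(..., axis=1) (deprecated in recent pandas); the layout assumption is that the ticker sits on level 1 of the columns, as in the answer above:
import yfinance

stocks = ['TSLA', 'MSFT', 'NIO', 'AAPL', 'AMD']
data = yfinance.download(stocks, start='2015-01-01', end='2021-09-12')

# One download, then one frame per ticker in a dict (no dynamic variables).
frames = {ticker: data.xs(ticker, axis=1, level=1)
          for ticker in data.columns.get_level_values(1).unique()}

frames['TSLA'].head()
for name, df in frames.items():
    df.to_csv(f'{name}.csv')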

How can I use the Apply() function for this problem?

I have a df with random portfolios; they are shown as follows:
>>> random_portafolios
AAPL weight MSFT weight XOM weight JNJ weight JPM weight AMZN weight GE weight FB weight T weight
0 0.188478 0.068795 0.141632 0.147974 0.178185 0.040370 0.020516 0.047275 0.166774
1 0.236818 0.008540 0.082680 0.088380 0.453573 0.021001 0.014043 0.089811 0.005155
2 0.179750 0.071711 0.050107 0.089424 0.080108 0.106136 0.155139 0.073487 0.194138
3 0.214392 0.015681 0.034284 0.276342 0.118263 0.002101 0.057484 0.000317 0.281137
4 0.301469 0.099750 0.046454 0.093279 0.020095 0.073545 0.178752 0.146486 0.040168
5 0.132916 0.006199 0.305137 0.032262 0.090356 0.169671 0.205602 0.003686 0.054172
The random portfolios have different weights.
I also have one year of returns on these stocks:
>>> StockReturns.head()
AAPL MSFT XOM TWTR JPM AMZN GE FB T
Date
2017-01-04 -0.001164 -0.004356 -0.011069 0.025547 0.001838 0.004657 0.000355 0.015660 -0.005874
2017-01-05 0.005108 0.000000 -0.014883 0.013642 -0.009174 0.030732 -0.005674 0.016682 -0.002686
2017-01-06 0.011146 0.008582 -0.000499 0.004681 0.000123 0.019912 0.002853 0.022707 -0.019930
2017-01-09 0.009171 -0.003170 -0.016490 0.019220 0.000741 0.001168 -0.004979 0.012074 -0.012641
2017-01-10 0.001049 -0.000335 -0.012829 -0.007429 0.002837 -0.001280 -0.002859 -0.004404 0.000278
Now I want to add two columns, "Returns" and "Volatility", to that df. The returns and volatilities must be computed from the values of each row, for every row there is.
I tried to use the apply() function:
def complex_computation():
    WeightedReturns = StockReturns.mul(arr, axis=1)
    ReturnsDaily = WeightedReturns.sum(axis=1)
    mean_retorns_daily = np.mean(ReturnsDaily)
    Returns = ((1 + mean_retorns_daily) ** 252)
    cov_mat = StockReturns.cov()
    cov_mat_annual = cov_mat * 252
    Volatility = np.sqrt(np.dot(arr.T, np.dot(cov_mat_annual, arr)))
    return Returns, Volatility
What I'm looking for is for this result to land in the corresponding row (row 0, in this case), and the same for each following row. Python said that name 'Returns' is not defined.
After
def func(row):
    random_portafolios['Volatility'].append(Volatility)
    Returns, Volatility = complex_computation(row.values)
    return pd.Series({'NewColumn1': Retturns,
                      'NewColumn2': Volatility})
and
def run_apply(random_portafolios):
    df_result = random_portafolios.apply(func, axis=1)
    return df_result
How could I run my code correctly?
Help me!
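A minimal sketch of one way to wire this up, assuming the weight columns and the StockReturns columns refer to the same tickers in the same order (the mocked frames below are hypothetical stand-ins for the question's data; swap in the real ones). The key points are that the per-row function receives the weights as a Series, the means and annualized covariance only need to be computed once, and returning a pd.Series from apply(axis=1) yields the two new columns:
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tickers = ['AAPL', 'MSFT', 'XOM', 'JNJ', 'JPM', 'AMZN', 'GE', 'FB', 'T']
# Mocked stand-ins for the question's frames; swap in the real ones.
StockReturns = pd.DataFrame(rng.normal(0, 0.01, (252, 9)), columns=tickers)
random_portafolios = pd.DataFrame(rng.dirichlet(np.ones(9), 6),
                                  columns=[f'{t} weight' for t in tickers])

# Compute these once, outside the per-row function.
mean_daily = StockReturns.mean().to_numpy()
cov_mat_annual = (StockReturns.cov() * 252).to_numpy()

def portfolio_stats(row):
    # One row of weights in, annualized return and volatility out.
    w = row.to_numpy()
    returns = (1 + w @ mean_daily) ** 252
    volatility = np.sqrt(w @ cov_mat_annual @ w)
    return pd.Series({'Returns': returns, 'Volatility': volatility})

stats = random_portafolios.apply(portfolio_stats, axis=1)
random_portafolios = random_portafolios.join(stats)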

Generating monthly means for all columns without initializing a list for each column?

I have time series data and I want to generate the mean for each month, for each column. I have successfully done so, but by creating a list for each column, which wouldn't be feasible for thousands of columns.
How can I adapt my code to auto-populate the column names and values into a dataframe with thousands of columns?
For context, this data has 20 observations per hour for 12 months.
Original data:
timestamp 56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
2016-12-31 23:55:00 117.9673 17876.27 39.10074 9302.815 49.23963
2017-01-01 00:00:00 118.1080 17497.48 39.10759 9322.773 48.97919
2017-01-01 00:05:00 117.7809 17967.33 39.11348 9348.223 48.94284
Output:
56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
0 106.734147 16518.428734 16518.428734 7630.187992 45.992215
1 115.099825 18222.911023 18222.911023 9954.252911 47.334477
2 111.555504 19090.607211 19090.607211 9283.845649 48.939581
3 102.408996 18399.719852 18399.719852 7778.897037 48.130057
4 118.371951 20245.378742 20245.378742 9024.424210 64.796939
5 127.580516 21859.212675 21859.212675 9595.477455 70.952311
6 134.159082 22349.853561 22349.853561 10305.252112 75.195480
7 137.990638 21122.233427 21122.233427 10024.709142 74.755469
8 144.958318 18633.290818 18633.290818 11193.381098 66.776627
9 122.406489 20258.135923 20258.135923 10504.604420 61.793355
10 104.817850 18762.070668 18762.070668 9361.052983 51.802615
11 106.589672 20049.809554 20049.809554 9158.685383 51.611633
Successful code:
# separate data into months
v = list(range(1, 13))
data_month = []
for i in v:
    data_month.append(data[data.index.month == i])
# average per month for each sensor
mean_56TI1164 = []
mean_56FI1281 = []
mean_56TI1281 = []
mean_52FC1043 = []
mean_57TI1501 = []
for i in range(0, 12):
    mean_56TI1164.append(data_month[i]['56TI1164'].mean())
    mean_56FI1281.append(data_month[i]['56FI1281'].mean())
    mean_56TI1281.append(data_month[i]['56FI1281'].mean())
    mean_52FC1043.append(data_month[i]['52FC1043'].mean())
    mean_57TI1501.append(data_month[i]['57TI1501'].mean())
mean_df = {'56TI1164': mean_56TI1164, '56FI1281': mean_56FI1281, '56TI1281': mean_56TI1281, '52FC1043': mean_52FC1043, '57TI1501': mean_57TI1501}
mean_df = pd.DataFrame(mean_df, columns= ['56TI1164', '56FI1281', '56TI1281', '52FC1043', '57TI1501'])
mean_df
Unsuccessful attempt to condense:
col = list(data.columns)
mean_df = pd.DataFrame()
for i in range(0, 12):
    for j in col:
        mean_df[j].append(data_month[i][j].mean())
mean_df
As suggested by G. Anderson, you can use groupby as in this example:
import pandas as pd
import io
csv="""timestamp 56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
2016-12-31 23:55:00 117.9673 17876.27 39.10074 9302.815 49.23963
2017-01-01 00:00:00 118.1080 17497.48 39.10759 9322.773 48.97919
2017-01-01 00:05:00 117.7809 17967.33 39.11348 9348.223 48.94284
2018-01-01 00:05:00 120.0000 17967.33 39.11348 9348.223 48.94284
2018-01-01 00:05:00 124.0000 17967.33 39.11348 9348.223 48.94284"""
# The following lines read your data into a pandas dataframe;
# it may help if your data comes in the form you wrote in the question
from datetime import datetime
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')  # pd.datetime was removed in pandas 1.0
data = pd.read_csv(io.StringIO(csv), sep=r'\s+(?!\d\d:\d\d:\d\d)',
                   date_parser=dateparse, index_col=0, engine='python')
# Here is where your data is resampled by month and mean is calculated
data.groupby(pd.Grouper(freq='M')).mean()
# If you have missing months, use this instead:
#data.groupby(pd.Grouper(freq='M')).mean().dropna()
Result of data.groupby(pd.Grouper(freq='M')).mean().dropna() will be:
56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
timestamp
2016-12-31 117.96730 17876.270 39.100740 9302.815 49.239630
2017-01-31 117.94445 17732.405 39.110535 9335.498 48.961015
2018-01-31 122.00000 17967.330 39.113480 9348.223 48.942840
Please note that I used data.groupby(pd.Grouper(freq='M')).mean().dropna() to get rid of NaN for the missing months (I added some data for January 2018 skipping what's in between).
Also note that the convoluted read_csv uses a regular expression as a separator: \s+ means one or more whitespace characters, while the negative lookahead (?!\d\d:\d\d:\d\d) means "don't split on this whitespace if it is followed by something like 23:55:00".
Lastly, engine='python' avoids warnings when read_csv() is used with a regular-expression separator.
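Note that the question's desired output has one row per calendar month number (0-11), not one per year-month. If that is the shape you want, a small sketch grouping on the month of the DatetimeIndex instead:
# One row per calendar month across all years, matching the question's output shape.
monthly_means = data.groupby(data.index.month).mean()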

ValueError: cannot reindex from a duplicate axis Pandas

So I have an array of timeseries that are generated based on a fund_id:
def get_adj_nav(self, fund_id):
    df_nav = read_frame(
        super(__class__, self).filter(fund__id=fund_id, nav__gt=0).exclude(fund__account_class=0).order_by(
            'valuation_period_end_date'), coerce_float=True,
        fieldnames=['income_payable', 'valuation_period_end_date', 'nav', 'outstanding_shares_par'],
        index_col='valuation_period_end_date')
    df_dvd, skip = self.get_dvd(fund_id=fund_id)
    df_nav_adj = calculate_adjusted_prices(
        df_nav.join(df_dvd).fillna(0).rename_axis({'payout_per_share': 'dividend'}, axis=1), column='nav')
    return df_nav_adj
def json_total_return_table(request, fund_account_id):
    ts_list = []
    for fund_id in Fund.objects.get_fund_series(fund_account_id=fund_account_id):
        if NAV.objects.filter(fund__id=fund_id, income_payable__lt=0).exists():
            ts = NAV.objects.get_adj_nav(fund_id)['adj_nav']
            ts.name = Fund.objects.get(id=fund_id).account_class_description
            ts_list.append(ts.copy())
            print(ts)
    df_adj_nav = pd.concat(ts_list, axis=1)  # ====> Throws error
    cols_to_datetime(df_adj_nav, 'index')
    df_adj_nav = ffn.core.calc_stats(df_adj_nav.dropna()).to_csv(sep=',')
An example of what the time series look like:
valuation_period_end_date
2013-09-03 17.234000
2013-09-04 17.277000
2013-09-05 17.363000
2013-09-06 17.326900
2013-09-09 17.400800
2013-09-10 17.473000
2013-09-11 17.486800
2013-09-12 17.371600
....
Name: CLASS I, Length: 984, dtype: float64
Another timeseries:
valuation_period_end_date
2013-09-03 17.564700
2013-09-04 17.608500
2013-09-05 17.696100
2013-09-06 17.659300
2013-09-09 17.734700
2013-09-10 17.808300
2013-09-11 17.823100
2013-09-12 17.704900
....
Name: CLASS F, Length: 984, dtype: float64
The lengths differ between the time series, and I am wondering if that is the reason for the error I am getting: cannot reindex from a duplicate axis. I am new to pandas, so I was wondering if you have any advice.
Thanks
EDIT: Also the indexes aren't supposed to be unique.
Perhaps something like this would work. I've added the fund_id to the dataframe and reindexed it to the valuation_period_end_date and fund_id.
# Only the fourth line above the error changes.
ts = (
    NAV.objects.get_adj_nav(fund_id)['adj_nav']
    .to_frame()
    .assign(fund_id=fund_id)
    .reset_index()
    .set_index(['valuation_period_end_date', 'fund_id']))
Then concatenate with axis=0, group on the date and fund_id (assuming there is only one value per date and fund_id, you can take the first), and unstack fund_id to pivot it into columns:
df_adj_nav = (
    pd.concat(ts_list, axis=0)
    .groupby(['valuation_period_end_date', 'fund_id'])
    .first()  # already a DataFrame, so no .to_frame() needed
    .unstack('fund_id'))
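Separately, to see which inputs actually trigger "cannot reindex from a duplicate axis", a small sketch applied to the original ts_list of series: axis=1 concatenation aligns on the index, which must be unique in each input. If dropping duplicates is acceptable (the EDIT above suggests it may not be, hence the groupby approach), keeping the last observation per date also unblocks the concat:
# Find the series with repeated dates.
for ts in ts_list:
    dup_dates = ts.index[ts.index.duplicated()]
    if len(dup_dates):
        print(ts.name, 'has duplicate dates:', dup_dates.unique()[:5])

# Pragmatic fix: keep the last observation per date, then concatenate.
ts_list_unique = [ts[~ts.index.duplicated(keep='last')] for ts in ts_list]
df_adj_nav = pd.concat(ts_list_unique, axis=1)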

Import CSV and analyse in Python

Looking for some help on a small project. I am trying to learn Python and I'm totally lost on a problem. Please let me explain.
I have a CSV file that contains Apple share prices. So far I can import it into Python using the csv module; however, I need to analyse the data, generate monthly averages, and determine the best and worst six months. My CSV columns are Date and Price.
Help is much appreciated.
"Date","Open","High","Low","Close","Volume","Adj Close"
"2012-11-14",660.66,662.18,650.5,652.55,1668400,652.55
"2012-11-13",663,667.6,658.23,659.05,1594200,659.05
"2012-11-12",663.75,669.8,660.87,665.9,1405900,665.9
"2012-11-09",654.65,668.34,650.3,663.03,3114100,663.03
"2012-11-08",670.2,671.49,651.23,652.29,2597000,652.29
"2012-11-07",675,678.23,666.49,667.12,2232300,667.12
"2012-11-06",685.48,686.5,677.55,681.72,1582800,681.72
"2012-11-05",684.5,686.86,675.56,682.96,1635900,682.96
"2012-11-02",694.79,695.55,687.37,687.92,2324400,687.92
"2012-11-01",679.5,690.9,678.72,687.59,2050100,687.59
"2012-10-31",679.86,681,675,680.3,1537000,680.3
"2012-10-26",676.5,683.03,671.2,675.15,1950800,675.15
"2012-10-25",680,682,673.51,677.76,2401100,677.76
"2012-10-24",686.8,687,675.27,677.3,2496500,677.3
etc...
With pandas this would be
In [28]: df = pd.read_csv('my_data.csv', parse_dates=True, index_col=0, sep=',')
In [29]: df
Out[29]:
Open High Low Close Volume Adj Close
Date
2012-11-14 660.66 662.18 650.50 652.55 1668400 652.55
2012-11-13 663.00 667.60 658.23 659.05 1594200 659.05
2012-11-12 663.75 669.80 660.87 665.90 1405900 665.90
2012-11-09 654.65 668.34 650.30 663.03 3114100 663.03
2012-11-08 670.20 671.49 651.23 652.29 2597000 652.29
2012-11-07 675.00 678.23 666.49 667.12 2232300 667.12
2012-11-06 685.48 686.50 677.55 681.72 1582800 681.72
2012-11-05 684.50 686.86 675.56 682.96 1635900 682.96
2012-11-02 694.79 695.55 687.37 687.92 2324400 687.92
2012-11-01 679.50 690.90 678.72 687.59 2050100 687.59
2012-10-31 679.86 681.00 675.00 680.30 1537000 680.30
2012-10-26 676.50 683.03 671.20 675.15 1950800 675.15
2012-10-25 680.00 682.00 673.51 677.76 2401100 677.76
2012-10-24 686.80 687.00 675.27 677.30 2496500 677.30
In [30]: monthly = df.resample('1M').mean()
In [31]: monthly
Out[31]:
Open High Low Close Volume Adj Close
Date
2012-10-31 680.790 683.2575 673.745 677.6275 2096350 677.6275
2012-11-30 673.153 677.7450 665.682 670.0130 2020510 670.0130
You can then sort by the column you want:
In [33]: monthly.sort_values('Close')
Out[33]:
Open High Low Close Volume Adj Close
Date
2012-11-30 673.153 677.7450 665.682 670.0130 2020510 670.0130
2012-10-31 680.790 683.2575 673.745 677.6275 2096350 677.6275
You can even fetch the data from Yahoo Finance:
In [37]: from pandas_datareader import data as pddata  # pandas.io.data now lives in the pandas-datareader package
In [40]: df = pddata.DataReader('AAPL', data_source='yahoo', start='2012-01-01')
In [41]: df.resample('1M').mean().sort_values('Close')
Out[44]:
Open High Low Close Volume Adj Close
Date
2012-01-31 428.760000 431.008500 425.810500 428.578000 12249740.000000 424.804500
2012-02-29 494.803000 500.849000 491.437500 497.571000 20300990.000000 493.191000
2012-11-30 560.365385 566.118462 548.523846 555.789231 24861884.615385 554.970769
2012-05-31 565.785000 572.141364 558.397273 564.673182 18029781.818182 559.702273
2012-06-30 574.660952 578.889048 569.213333 574.562381 13360247.619048 569.504762
2012-03-31 576.858182 582.064545 570.245909 577.507727 25299250.000000 572.424545
2012-07-31 599.610000 604.920952 594.680476 601.068095 15152466.666667 595.776667
2012-04-30 609.607500 615.487500 598.650000 606.003000 27855340.000000 600.668500
2012-10-31 638.667143 643.650476 628.213810 634.714286 20651071.428571 631.828571
2012-08-31 641.527826 646.655217 637.138261 642.696087 12851252.173913 639.090870
2012-09-30 682.118421 687.007895 676.095263 681.568421 17291363.157895 678.470526
After you have read the items and saved the [month, mean_price] pairs in a list, you can sort the list:
import operator
values_list.sort(key=operator.itemgetter(1))
This will sort the values by price. To get the top n values:
print(values_list[-n:])
Or the bottom n:
print(values_list[:n])
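On current pandas versions, the whole task (monthly means, then the best and worst six months) fits in a short sketch; my_data.csv and the column names assume the CSV layout shown in the question:
import pandas as pd

df = pd.read_csv('my_data.csv', parse_dates=True, index_col=0)
monthly = df['Adj Close'].resample('1M').mean()
print(monthly.nlargest(6))   # best 6 months
print(monthly.nsmallest(6))  # worst 6 months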
