Minute data not working in Zipline - Python

I wanted to get some minute history data by using the following:
hist_minutes = data.history(context.aapl, 'price', 50, '1m')
This gave me the following error:
NoDataForSid: No minute data for sid 2509.
This is strange, because when I used '1d' instead of '1m' it did work. Why is that, and how can it be fixed so that I can also get minute data?

What data are you using? If you ingested the default Quandl bundle, that dataset in particular only has daily prices; it does not include minute prices. You'll need to gather your own minute data and either write a new bundle or ingest CSV files.
If you want to use anything that's not US equity daily pricing, you'll need your own data.
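For example, minute CSVs can be registered through Zipline's built-in csvdir bundle. A minimal sketch, placed in ~/.zipline/extension.py; the bundle name and paths here are placeholders, not from the original answer:
# ~/.zipline/extension.py -- register a custom minute-data bundle.
# 'custom-minute-bundle' and the CSV directory are placeholder names.
from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

register(
    'custom-minute-bundle',
    # expects per-symbol files under /path/to/csvs/minute/<SYMBOL>.csv
    csvdir_equities(['minute'], '/path/to/csvs'),
    calendar_name='NYSE',
)
Then ingest it with: zipline ingest -b custom-minute-bundle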
Source / Disclaimer: I'm a Zipline maintainer.

Related

Use ISIN to retrieve stock data in PYTHON

In my quest to retrieve daily stock prices over a 10-year period for the 600 companies of the EUROSTOXX 600 index, I'm facing some difficulties.
First question: does retrieving all of this with one piece of code seem feasible to you?
(I'm also considering adding the main financial indicators like ROI, ROE, EBIT, EPS, annual performance... and exporting all of this to one Excel sheet.)
I have collected all 600 ISINs. The question is, can I use them to retrieve the data from Yahoo Finance (or anywhere else), or should I find a way to get the 600 real tickers as defined by Yahoo?
If the latter, does anyone have a tip for that? I've been looking for lists, but this index apparently isn't very popular.
Thank you for reading!

Efficient Python/Pandas Selection and Sorting

My brother and I are working on reproducing the findings of this paper, where they use daily historical stock prices to predict future prices with incredibly high accuracy.
They used data from the Center for Research in Security Prices (CRSP), but we are using data from Quandl and are running into trouble with the runtimes of our preprocessing. As in the paper, for each ticker and each day, we want to create a vector of 32 z-scores comparing that ticker's momentum to the rest of the market's. As an intermediate step, they create a vector of 32 cumulative returns for each stock over a given time period (12 for the previous 12 months and 20 for the previous 20 days; this process is also briefly discussed in section 2.2 of the paper).
Because of the large amount of data, we are having problems with runtimes simply creating these cumulative-returns vectors. We imported the data from a 1.7 GB CSV and turned it into a pandas DataFrame. We then wrote a function that takes a given ticker and date and creates a cumulative-returns vector. I suspect that our trouble lies in how we select data out of the DataFrame, specifically the following lines (each is fast on its own, but both need to be run many times for each date):
prices = data[data.ticker == ticker_argument]
and
p = prices[prices.date == date_argument]
What would be the most efficient way of doing this, considering that the data is sorted by ticker and the data for each ticker is also sorted by date? I'd imagine you could use some sort of binary search to speed most of it up, but should I implement that by hand in plain Python, or is there a better pandas way to do it? Any help would be greatly appreciated.
The quandl csv is at http://www.jackmckeown.com/prices.csv if you want more info on how the data is formatted.
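One pandas-native speed-up for exactly this access pattern (a sketch, not from the original post, assuming the 'ticker' and 'date' columns shown above) is to build a sorted MultiIndex once, so each lookup becomes a binary-search index probe instead of two full boolean scans:
import pandas as pd

# One-time preprocessing: index by (ticker, date) and sort, so that
# .loc lookups use binary search on the index rather than scanning
# whole columns on every call.
data = pd.read_csv('prices.csv', parse_dates=['date'])
data = data.set_index(['ticker', 'date']).sort_index()

# All rows for one ticker -- replaces data[data.ticker == ticker_argument]:
prices = data.loc['AAPL']

# A single ticker/date row -- replaces both boolean selections:
p = data.loc[('AAPL', pd.Timestamp('2016-01-04'))]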

Complete sparse data in Graphite

I have a list of objects that each contain: Product, Price, Time. It basically holds product price change over time. So, for each price change you'll get a record with the product name, the new price and the exact second of the change.
I'm sending the data to graphite using the timestamp (written in python):
import socket

sock = socket.socket()
sock.connect(('my-host-ip', 2003))
# Graphite plaintext protocol: "<metric.path> <value> <timestamp>\n"
message = 'my.metric.prefix.%s %s %d\n' % (product, price, timestamp)
sock.sendall(message.encode())  # encode to bytes (required on Python 3)
sock.close()
Thing is, as prices do not change very often, the data points are very sparse, which means I get one point per product every few hours or days. If I look at Graphite at the exact time of a price change, I can see the data point. But if I want to look at price changes over time, I would like a constant line drawn from the data point of the price change going forward.
I tried using:
keepLastValue(my.metric.prefix.*)
It would work only if I look at the data points in a time frame of a few minutes, but not hours (and surely not days). Is there a way to do something like that in Graphite? Or do I have to push redundant data every minute to fill in the missing points?
I believe keepLastValue doesn't work for you at coarser time intervals because of the aggregation rules defined in storage-aggregation.conf. You can try xFilesFactor = 0 and aggregationMethod = last so that each aggregated point always keeps the last value of the metric.
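For reference, a hypothetical storage-aggregation.conf entry along those lines; the section name and metric pattern are assumptions based on the prefix above:
[product_prices]
pattern = ^my\.metric\.prefix\.
xFilesFactor = 0
aggregationMethod = last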
However, I think your concrete use case is much better served by StatsD gauges. You can set an arbitrary numerical value for a gauge in StatsD, and it will send (flush) its value to Graphite every 10 seconds by default. You can set the flush interval to a shorter period, such as 1 second, if you really need to record the exact second of the change. If the gauge is not updated by the next flush, StatsD sends the previous value again.
So StatsD gauges do essentially what you describe: they send redundant data points to fill in the missing ones.
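A minimal sketch of that gauge approach, assuming the Python statsd package and a StatsD daemon on localhost:8125 (product and price are the variables from the question's snippet):
import statsd

# Set the gauge on every price change; StatsD keeps flushing the last
# value to Graphite each interval, which fills the gaps between changes.
client = statsd.StatsClient('localhost', 8125)
client.gauge('my.metric.prefix.%s' % product, price)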
I had the same problem with sparse data as well.
I used the whisper database tools outlined in the link below to update my whisper files, which were aggregating load data on 10-minute intervals.
https://github.com/graphite-project/whisper
First, examine the file using whisper-info.py:
/opt/graphite/bin/whisper-info.py /opt/graphite/storage/whisper/test/system/myserver/n001/loadave.wsp
Then fix the aggregation method using whisper-resize.py:
/opt/graphite/bin/whisper-resize.py --xFilesFactor=0 --aggregationMethod=last /opt/graphite/storage/whisper/test/system/myserver/n001/loadave.wsp 600:259200
Please be cautious using whisper-resize, as it can result in data loss if you aren't careful!

How to convert daily to monthly netcdf files

I have downloaded climate model output in the form of netCDF files with one variable (pr) for the whole world at a daily time-step. My final goal is to have monthly data for Europe.
I have never used netCDF files before, and all the netCDF-specific software I could find doesn't seem to work on Windows. Since I program in R, I tried the ncdf4 package but ran into memory-size problems (my files are around 2 GB)... I am now trying the netCDF4 module in Python (my first time using Python, so go easy on me).
I have managed to install everything and found some code online to import the dataset:
from netCDF4 import Dataset

nc_fid = Dataset(nc_f, 'r')  # nc_f is the path to the .nc file
# Extract data from NetCDF file
lats = nc_fid.variables['lat'][:]
lons = nc_fid.variables['lon'][:]
time = nc_fid.variables['time'][:]
pp = nc_fid.variables['pr'][:]
However, all the tutorials I found are about how to make a netCDF file... I have no idea how to aggregate this daily rainfall (variable pr) into monthly values. Also, I have different types of calendar in different files, but I don't even know how to access that information:
time.calendar
AttributeError: 'numpy.ndarray' object has no attribute 'calendar'
Please help, I don't want to have to learn Linux just so I can sort out some data :(
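(Side note on the AttributeError above: time was sliced into a plain numpy array by [:]; the calendar attribute lives on the netCDF variable object itself. A minimal sketch using the same netCDF4 module:)
from netCDF4 import Dataset, num2date

nc_fid = Dataset(nc_f, 'r')
time_var = nc_fid.variables['time']  # the variable object, not a numpy array
print(time_var.calendar)             # e.g. 'standard' or '365_day'
print(time_var.units)                # e.g. 'days since 1850-01-01'

# Convert the raw time values to date objects, honouring the calendar:
dates = num2date(time_var[:], units=time_var.units, calendar=time_var.calendar)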
Why not avoid programming entirely and use NCO, which supplies the ncrcat command that aggregates data thusly:
ncrcat day*.nc month.nc
Voilà. See more ncrcat examples here.
Added 20160628: If, instead of a month-long timeseries, you want a monthly average, then use the same command with ncra in place of ncrcat. The manual explains things like this.
If you have a daily timestep and you want to calculate the monthly mean, you can do:
cdo monmean input_yyyy.nc output_yyyy.nc
It sounds as if you have several of these files, so you will need to merge them with:
cdo mergetime file_*.nc timeseries.nc
where the * is a wildcard for the years.

matplotlib.finance 5 minute intervals

I have been using matplotlib.finance to pull stock information. quotes_historical_yahoo() is a really easy function to use, but it seems to only let me pull daily data.
Is there a way, using matplotlib, to pull stock values in 5-minute intervals?
If not, can I get a suggestion for some other Python software that will do what I want?
There are several sources of historical data at varying resolutions here, but they don't go back very far. For example, you can only get ten days' worth of data at the 1-minute interval from Google Finance.
I use pandas for historical data via DataReader(), and then read_csv() for the sources above (but that can get tricky, and you will need to write your own code to format some of them into something useful).
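A minimal sketch of the DataReader route mentioned above (daily resolution only; the ticker, the dates, and the pandas-datareader package are assumptions, not from the original answer):
import pandas_datareader.data as web
from datetime import datetime

# Daily OHLCV bars pulled from Yahoo Finance via pandas-datareader.
df = web.DataReader('AAPL', 'yahoo', datetime(2016, 1, 1), datetime(2016, 6, 1))
print(df.head())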
