matplotlib.finance 5 minute intervals - python

I have been using matplotlib.finance to pull stock information. quotes_historical_yahoo() is a really easy function to use, but it seems to only let me pull daily data.
Is there a way to use matplotlib to pull stock values at 5-minute intervals?
If not, can anyone suggest another Python package that will do what I want?

There are several sources of historical data at varying resolutions, but they don't go back very far. For example, you can only get ten days' worth of data at the 1-minute interval from Google Finance.
I use pandas for historical data, using DataReader(), and then read_csv() for the above sources (that can get tricky, though: you will need to write your own code to format the data and make some of it useful).
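Once 1-minute data has been loaded into pandas, one possible way to get 5-minute intervals is to resample it. The following is only a sketch: the prices are synthetic stand-ins for whatever read_csv() returns, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute prices standing in for a downloaded CSV (made-up values).
index = pd.date_range("2024-01-02 09:30", periods=30, freq="min")
prices = pd.Series(100 + np.arange(30) * 0.1, index=index, name="price")

# Aggregate the 1-minute series into 5-minute open/high/low/close bars.
bars_5min = prices.resample("5min").ohlc()
print(bars_5min)
```

Resampling can only coarsen data that already exists at a finer resolution; it cannot turn daily quotes into 5-minute ones.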

Related

Plot timestamp distribution in a day over multiple dates

This seems like a trivial thing to solve, but every idea I have is very hacky. I have a series of timestamps that spans multiple days. What I'm interested in is the distribution of these timestamps (events) within 24 hours, e.g., to see whether there are more events in the morning.
My data is in a pandas.DataFrame, and the timestamp column has dtype datetime64[ns]. I tried matplotlib.pyplot.plot(data.timestamp.dt.time), but that gives an error. I also thought of subtracting the date from my timestamps so they all start on 'day 0', and formatting the X-axis in the plot to not show the date. That feels very clumsy. Is there a better way?
If you are interested in the distribution with resolution limited to, e.g., hours, you can:
Create a new column with extracted hour from your source timestamp.
Group your data by hour.
Generate your plot.
As you failed to post any sample data, I'm not able to post any code.
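With made-up sample data, the three steps above might look like the sketch below. The column name timestamp comes from the question; the timestamps themselves and the output file name are invented for illustration.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use a GUI window
import matplotlib.pyplot as plt

# Made-up events spread over three days.
data = pd.DataFrame({"timestamp": pd.to_datetime([
    "2021-03-01 08:15", "2021-03-01 08:40", "2021-03-01 17:05",
    "2021-03-02 08:55", "2021-03-02 12:30", "2021-03-03 08:10",
])})

# Step 1: extract the hour of day from each timestamp.
data["hour"] = data["timestamp"].dt.hour

# Step 2: count events per hour; reindex so hours with no events show as 0.
counts = data.groupby("hour").size().reindex(range(24), fill_value=0)

# Step 3: plot the distribution as a bar chart.
ax = counts.plot(kind="bar")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Number of events")
plt.savefig("events_by_hour.png")  # hypothetical output file name
```

Because the date part is discarded in step 1, events from all days fall into the same 24 bins, so no 'day 0' shifting or axis reformatting is needed.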

How do you store the results of some python code for use in another calculation?

I am a day trader who is new to Python and learning every day. I have written a basic script (or maybe you call it a function?) that pulls the best bid/offer data from an API for me on repeat every 5 seconds.
I now want a rolling average of the data coming in from the API every 5 seconds, so I can compare the current data against the rolling average.
My problem is that I have no idea where to start or what I should be looking to learn. Any help would be great, even just pointing me in the right direction.
Does the data need to be stored in a .csv file that is updated every 5 seconds, or can all this be done within the code?
Thanks in advance for any help. The code is below:
import time
from binance.client import Client

api_key = "###"
api_secret = "###"

# Create the client once, rather than on every pass through the loop.
client = Client(api_key, api_secret)

while True:
    # Fetch the latest ticker data, including best bid/ask quantities.
    ticker_info = client.get_ticker(symbol="ETHUSDT")
    bid_qty = int(float(ticker_info['bidQty']))
    ask_qty = int(float(ticker_info['askQty']))
    bbo_delta = ask_qty - bid_qty
    print("Ask=")
    print(ask_qty)
    print("Bid=")
    print(bid_qty)
    print("Delta=")
    print(bbo_delta)
    print("-")
    # Wait 5 seconds before polling again.
    time.sleep(5)
Questions of this type can have several possible solutions. From your explanation, I understand that you want to fetch data that is generated every five seconds.
If you are using Python, you can scrape the data with Beautiful Soup.
Because the data is updated every five seconds, it may become very large after a while. In that case, storing it properly in a CSV file, an Excel file or a database would be a great help.
Just scrape the data and store it in CSV format or, if you are using an API, store it in a proper DataFrame.
My suggestion would be to use Beautiful Soup (BS4): read some of the documentation and, in just a few lines of code, store the data in a CSV file.
Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
By a 'rolling average', do you mean over minutes, hours or days? You can get a one-minute rolling mean by putting the values into a list of len <= 12 and dropping the 'old' (only one minute old) values as new ones arrive. The list is going to get really big as the rolling time window gets big enough to be useful (len = 120k for a one-week average). It is hard to imagine volatility that would make a 5-second sampling interval valuable, but I know nothing about day trading. If you do want that short an interval and a data set of 100k samples, reading and writing to a file is going to be too slow.
Try writing code for a one hour rolling average with samples every minute. That will get you started. You can then post the code with specific questions and incrementally work to your goal.
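As a starting point, the fixed-length window described above can be sketched as follows. This version uses collections.deque with maxlen in place of a plain list, so the oldest value is dropped automatically; make_rolling_mean and the sample values are made up for illustration.

```python
from collections import deque


def make_rolling_mean(window_size):
    """Return a function that takes a new sample and returns the mean of
    the most recent `window_size` samples seen so far."""
    window = deque(maxlen=window_size)  # oldest sample is dropped automatically

    def update(value):
        window.append(value)
        return sum(window) / len(window)

    return update


# 12 samples at one every 5 seconds cover one minute.
rolling_mean = make_rolling_mean(12)

for sample in [10, 20, 30]:  # made-up values standing in for bbo_delta
    print(sample, rolling_mean(sample))
```

In the polling loop from the question, this would mean calling rolling_mean(bbo_delta) once per pass and comparing the result with the current bbo_delta, all in memory and without a file.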

Take dates and times from multiple columns to one datetime object with Python

I've got a dataset with multiple time values as below.
Area,Year,Month,Day of Week,Time of Day,Hour of Day
x,2016,1,6.0,108,1.0
z,2016,1,6.0,140,1.0
n,2016,1,6.0,113,1.0
p,2016,1,6.0,150,1.0
r,2016,1,6.0,158,1.0
I have been trying to transform this into a single datetime object to simplify the dataset and be able to do proper time series analysis against it.
For some reason I have been unable to get the right outcome using the datetime library from Python. Would anyone be able to point me in the right direction?
Update - Example of stats here.
https://data.pa.gov/Public-Safety/Crash-Incident-Details-CY-1997-Current-Annual-Coun/dc5b-gebx/data
I don't think there is a week column. Hmm. I wonder if I've missed something?
Any suggestions would be great. I'm really just looking to simplify this dataset, maybe even by creating another table / sheet for the causes of crash, as there are a lot of superfluous columns taking up a lot of space that could be labeled with simple ints.

Minute data not working zipline

I wanted to get some minute history data by using the following:
hist_minutes = data.history(context.aapl,'price',50,'1m')
This gave me the following error:
NoDataForSid:No minute data for sid 2509.
This is strange because when I used 1d instead of 1m it did work, so why is that? And how can this be fixed so that I can also get minute data?
What data are you using? If you ingested the default Quandl bundle, that dataset in particular only has daily prices, and does not have minutely prices. You'll need to gather your own minutely data and either write a new bundle or ingest csv files.
If you want to use anything that's not US equity daily pricing, you'll need your own data.
Source / Disclaimer: I'm a Zipline maintainer.

Transforming data in pandas

What would be the best way to approach this problem using python and pandas?
I have an excel file of electricity usage. It comes in an awkward structure and I want to transform it so that I can compare it to weather data based on date and time.
The structure looks like this (foo is a string and xx is a number):
100,foo,foo,foo,foo
200,foo,foo,foo,foo,foo,0000,kWh,15
300,20181101,xx,xx,xx,xx...(96 columns)xx,A
... several hundred more 300 type rows
The 100 and 200 rows identify the meter and provide a partial schema, i.e. the data is in kWh and at 15-minute intervals. The 300 rows contain the date, 96 columns of 15-minute power consumption (96 = 24 hours * 4 blocks of 15 minutes) and one column with a data quality flag.
I have previously processed all the data in other tools but I'm trying to learn how to do it in Python (jupyter notebook to be precise) and tap into the far more advanced analysis, modeling and visualisation tools available.
I think the thing to do is transform the data into a series of datetime and power. From there I can aggregate filter and compare however I like.
I am at a loss even to know what question to ask or what resource to look up to tackle this problem. I could just import the 300 rows as they are and loop through the rows and columns to create a new series in the right structure, which is easy enough to do. However, I strongly suspect there is a built-in method for doing this sort of thing, and I would greatly appreciate any advice on what the best strategies might be. Possibly I don't need to transform the data at all.
You can read the data into a DataFrame easily enough; you just have to step over the metadata rows, e.g.:
df = pd.read_csv(<file>, skiprows=[0,1], index_col=1, parse_dates=True, header=None)
This will read in the csv, skip over the first 2 lines, make the date column the index and try and parse it to a date type.
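To get from that wide layout to the single datetime/power series the question asks about, one possible approach is sketched below. It rests on assumptions: the file content is synthetic, each of the 96 values is taken to be stamped with the start of its 15-minute interval, and names such as power are made up. The date column is read as a plain column and parsed explicitly here, rather than through index_col and parse_dates.

```python
import io

import numpy as np
import pandas as pd

n_intervals = 96  # 24 hours * 4 blocks of 15 minutes

# Build a small synthetic file in the structure described in the question.
rows = ["100,foo,foo,foo,foo", "200,foo,foo,foo,foo,foo,0000,kWh,15"]
for day in ("20181101", "20181102"):
    values = ",".join(str(i) for i in range(n_intervals))
    rows.append(f"300,{day},{values},A")
raw = io.StringIO("\n".join(rows))

# Skip the two metadata rows; columns are then numbered 0..98.
df = pd.read_csv(raw, skiprows=[0, 1], header=None)

# Column 1 holds the date; the 96 readings follow it.
dates = pd.to_datetime(df[1].astype(str), format="%Y%m%d")
readings = df.iloc[:, 2:2 + n_intervals].to_numpy()

# Offset of each reading from midnight (assumed: start of the interval).
offsets = pd.to_timedelta(np.arange(n_intervals) * 15, unit="min")

# Add every offset to every date, then flatten both grids row by row.
timestamps = (dates.to_numpy()[:, None] + offsets.to_numpy()[None, :]).ravel()
power = pd.Series(readings.ravel(), index=pd.DatetimeIndex(timestamps), name="kWh")

print(power.head())
```

With a DatetimeIndex in place, the series can be resampled or joined against weather data by timestamp.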
