I am writing an API client that collects my Fitbit data; an extract is shown below. Does anyone know how to display the x axis as 24-hour time? The program creates a CSV, and the file does have Date and Time fields, but I could not get the time to display on the graph.
(I have deleted the beginning of this code, but it just contained the imports and the CLIENT_ID and CLIENT_SECRET constants.)
server = Oauth2.OAuth2Server(CLIENT_ID, CLIENT_SECRET)
server.browser_authorize()
ACCESS_TOKEN = str(server.fitbit.client.session.token['access_token'])
REFRESH_TOKEN = str(server.fitbit.client.session.token['refresh_token'])
auth2_client = fitbit.Fitbit(CLIENT_ID, CLIENT_SECRET, oauth2=True,
                             access_token=ACCESS_TOKEN, refresh_token=REFRESH_TOKEN)
yesterday = str((datetime.datetime.now() - datetime.timedelta(days=1)).strftime("%Y%m%d"))
yesterday2 = str((datetime.datetime.now() - datetime.timedelta(days=1)).strftime("%Y-%m-%d"))
yesterday3 = str((datetime.datetime.now() - datetime.timedelta(days=1)).strftime("%d/%m/%Y"))
today = str(datetime.datetime.now().strftime("%Y%m%d"))
fit_statsHR = auth2_client.intraday_time_series('activities/heart', base_date=yesterday2, detail_level='15min')
time_list = []
val_list = []
for i in fit_statsHR['activities-heart-intraday']['dataset']:
    val_list.append(i['value'])
    time_list.append(i['time'])
heartdf = pd.DataFrame({'Heart Rate': val_list, 'Time': time_list, 'Date': yesterday3})
heartdf.to_csv('/Users/zabiullahmohebzadeh/Desktop/python-fitbit-master/python-fitbit-master/Data/HeartRate - ' +
               yesterday + '.csv',
               columns=['Date', 'Time', 'Heart Rate'], header=True,
               index=False)
plt.plot(val_list, 'r-')
plt.ylabel('Heart Rate')
plt.show()
You can pass your x-values to plt.plot:
plt.plot(time_list, val_list, 'r-')
Without knowing how your time_list is formatted, I can't advise on the best way to get it into 24-hour time, I'm afraid.
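That said, if time_list holds plain "HH:MM:SS" strings (which is what the intraday dataset in the question appears to contain), a minimal sketch using matplotlib's date formatting could look like this. The sample values and the string-format assumption are mine, not confirmed by the question:

import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Assumed sample values in the same shape as the question's lists
time_list = ['00:00:00', '00:15:00', '00:30:00']
val_list = [61, 63, 60]

# Parse the "HH:MM:SS" strings into datetime objects so matplotlib can scale the axis
times = [datetime.datetime.strptime(t, '%H:%M:%S') for t in time_list]

fig, ax = plt.subplots()
ax.plot(times, val_list, 'r-')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))  # 24-hour tick labels
ax.set_ylabel('Heart Rate')
plt.show()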
Related
I am facing an issue where my Django application, which uses pandas processing code, does not load a specific page when deployed on the https://railway.app service and keeps giving an application server error, whereas it runs fine on a local server but takes more than 10-11 seconds to load that page. I would like to know the possible causes of this issue and how I can troubleshoot it.
Note: I am using an API built with Django REST framework.
View that renders the pandas processed data:
def nsebse(request):
    if request.user.is_authenticated:
        endpoint1 = rq.get('https://stockindex.up.railway.app/api/bse/').json()
        endpoint2 = rq.get('https://stockindex.up.railway.app/api/nse/').json()
        bseData = helperForBSENSE(endpoint1)
        nseData = helperForBSENSE(endpoint2)
        return render(request, 'nsebse.html', {'bseData': bseData, 'nseData': nseData})
    else:
        return render(request, 'notauth.html')
Helper function used in above segment:
def helperForBSENSE(data):
    df = pd.DataFrame(columns=['id', 'date', 'open', 'high', 'low', 'close', 'adj_close', 'volume'])
    for i in range(len(data)):
        item = data[i]
        df.loc[i] = [item['id'], item['date'], item['open'], item['high'], item['low'], item['close'], item['adj_close'], item['volume']]
    df['date'] = pd.to_datetime(df['date'])
    today, req_day = datetime(2023, 1, 12), datetime(2022, 1, 12)
    one_year_df = df[(df['date'] <= today) & (df['date'] > req_day)]
    fifty_two_week_high = max(one_year_df['high'])
    fifty_two_week_low = min(one_year_df['low'])
    day_high = df[df['date'] == today]['high']
    day_low = df[df['date'] == today]['low']
    previous_day = today - pd.Timedelta(days=1)
    today_close = float(df[df['date'] == today]['close'])
    prev_close = float(df[df['date'] == previous_day]['close'])
    today_open = float(df[df['date'] == today]['open'])
    today_gain_or_loss = float(today_close - prev_close)
    data = {
        'today_open': today_open,
        'today_close': today_close,
        'prev_close': prev_close,
        'today_gain_or_loss': today_gain_or_loss,
        'day_high': float(day_high),
        'day_low': float(day_low),
        'fifty_two_week_high': fifty_two_week_high,
        'fifty_two_week_low': fifty_two_week_low,
        'xAxis': list(df['date']),
        'yAxis': list(df['close']),
    }
    return data
Here is a live deployment link: https://stockindex.up.railway.app (click on the NSE/BSE or Companies tab for the page with the problem described above).
So I figured out the answer...
for i in range(len(data)):
    item = data[i]
    df.loc[i] = [item['id'], item['date'], item['open'], item['high'], item['low'], item['close'], item['adj_close'], item['volume']]
df['date'] = pd.to_datetime(df['date'])
I ended up profiling the code segment above and it took about 20 seconds to convert the JSON response to a DataFrame, so I used a built-in pandas method for that instead, which reduced the time taken from 20 seconds to 3 seconds.
Code used to replace the above:
df = pd.json_normalize(data)
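For context, a minimal sketch of how the start of the helper could look with this change (the function name here is just illustrative, and it assumes the endpoint returns a flat list of dicts with the same keys as before):

import pandas as pd

# Sketch: build the frame in one vectorised call instead of N individual df.loc[i] writes
def frame_from_api(data):
    df = pd.json_normalize(data)            # `data` is the list of dicts from the endpoint
    df['date'] = pd.to_datetime(df['date'])
    return df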
I'm trying to understand why this pipeline writes no output to BigQuery.
What I'm trying to achieve is to calculate the USD index for the last 10 years, starting from observations of different currency pairs.
All the data is in BigQuery and I need to organize it and sort it chronologically (if there is a better way to achieve this, I'm glad to read it, because I think this might not be the optimal approach).
The idea behind the class Currencies() is to start grouping (and keeping) the last observation of each currency pair (e.g. EURUSD), update all currency pair values as they "arrive", sort them chronologically and finally get the open, high, low and close values of the USD index for each day.
This code works in my Jupyter notebook and in Cloud Shell using DirectRunner, but when I use DataflowRunner it does not write any output. To see if I could figure it out, I tried just creating the data with beam.Create() and then writing it to BigQuery (which worked), and also just reading something from BQ and writing it to another table (which also worked), so my best guess is that the problem is in the beam.CombineGlobally part, but I don't know what it is.
The code is as follows:
import logging
import collections
import apache_beam as beam
from datetime import datetime
SYMBOLS = ['usdjpy', 'usdcad', 'usdchf', 'eurusd', 'audusd', 'nzdusd', 'gbpusd']
TABLE_SCHEMA = "date:DATETIME,index:STRING,open:FLOAT,high:FLOAT,low:FLOAT,close:FLOAT"
class Currencies(beam.CombineFn):
    def create_accumulator(self):
        return {}

    def add_input(self, accumulator, inputs):
        logging.info(inputs)
        date, currency, bid = inputs.values()
        if '.' not in date:
            date = date + '.0'
        date = datetime.strptime(date, '%Y-%m-%dT%H:%M:%S.%f')
        data = currency + ':' + str(bid)
        accumulator[date] = [data]
        return accumulator
    def merge_accumulators(self, accumulators):
        merged = {}
        for accum in accumulators:
            ordered_data = collections.OrderedDict(sorted(accum.items()))
            prev_date = None
            for date, date_data in ordered_data.items():
                if date not in merged:
                    merged[date] = {}
                if prev_date is None:
                    prev_date = date
                else:
                    prev_data = merged[prev_date]
                    merged[date].update(prev_data)
                    prev_date = date
                for data in date_data:
                    currency, bid = data.split(':')
                    bid = float(bid)
                    currency = currency.lower()
                    merged[date].update({
                        currency: bid
                    })
        return merged
    def calculate_index_value(self, data):
        return data['usdjpy'] * data['usdcad'] * data['usdchf'] / (data['eurusd'] * data['audusd'] * data['nzdusd'] * data['gbpusd'])
    def extract_output(self, accumulator):
        ordered = collections.OrderedDict(sorted(accumulator.items()))
        index = {}
        for dt, currencies in ordered.items():
            if not all([symbol in currencies.keys() for symbol in SYMBOLS]):
                continue
            date = str(dt.date())
            index_value = self.calculate_index_value(currencies)
            if date not in index:
                index[date] = {
                    'date': date,
                    'index': 'usd',
                    'open': index_value,
                    'high': index_value,
                    'low': index_value,
                    'close': index_value
                }
            else:
                max_value = max(index_value, index[date]['high'])
                min_value = min(index_value, index[date]['low'])
                close_value = index_value
                index[date].update({
                    'high': max_value,
                    'low': min_value,
                    'close': close_value
                })
        return index
def main():
    query = """
    select date,currency,bid from data_table
    where date(date) between '2022-01-13' and '2022-01-16'
    and currency like ('%USD%')
    """

    options = beam.options.pipeline_options.PipelineOptions(
        temp_location='gs://PROJECT/temp',
        project='PROJECT',
        runner='DataflowRunner',
        region='REGION',
        num_workers=1,
        max_num_workers=1,
        machine_type='n1-standard-1',
        save_main_session=True,
        staging_location='gs://PROJECT/stag'
    )

    with beam.Pipeline(options=options) as pipeline:
        inputs = (pipeline
                  | 'Read From BQ' >> beam.io.ReadFromBigQuery(query=query, use_standard_sql=True)
                  | 'Accumulate' >> beam.CombineGlobally(Currencies())
                  | 'Flat' >> beam.ParDo(lambda x: x.values())
                  | beam.io.Write(beam.io.WriteToBigQuery(
                      table='TABLE',
                      dataset='DATASET',
                      project='PROJECT',
                      schema=TABLE_SCHEMA))
                  )


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    main()
The way I execute this is from the shell, using python3 -m first_script (is this the way I should run these batch jobs?).
What am I missing or doing wrong? This is my first attempt to use Dataflow, so I'm probably making every mistake in the book.
For whom it may help: I faced a similar problem, but I had already used the same code for a different flow that had Pub/Sub as input, where it worked flawlessly, whereas with a file-based input it simply did not. After a lot of experimenting I found that in the options I had to change the flag
options = PipelineOptions(streaming=True, ..
to
options = PipelineOptions(streaming=False,
since of course it is not a streaming source; it's a bounded source, a batch. After I set this flag to False I found my rows in the BigQuery table. After it had finished, it even stopped the pipeline, as it was a batch operation. Hope this helps.
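For reference, a minimal sketch of batch-style options consistent with the snippets above (the placeholder project, region and bucket names are the asker's, and the exact option set is just an assumption):

from apache_beam.options.pipeline_options import PipelineOptions

# Bounded input (BigQuery / files): run as a batch job rather than a streaming one
options = PipelineOptions(
    runner='DataflowRunner',
    project='PROJECT',
    region='REGION',
    temp_location='gs://PROJECT/temp',
    streaming=False,  # the flag discussed above
)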
Working with Google (and any) API calls for the first time, I'm continually hitting a rate-limit threshold despite having limited my request rate. How would I go about changing the following code to a batch format to avoid this?
#API Call Function
from ratelimit import limits, sleep_and_retry
import requests
from googleapiclient.discovery import build
@sleep_and_retry
@limits(calls=1, period=4.5)
def pull_sheet_data(SCOPE, SPREADSHEET_ID, DATA_TO_PULL):
    creds = gsheet_api_check(SCOPE)
    service = build('sheets', 'v4', credentials=creds)
    sheet = service.spreadsheets()
    result = sheet.values().get(
        spreadsheetId=SPREADSHEET_ID,
        range=DATA_TO_PULL).execute()
    values = result.get('values', [])
    if not values:
        print('No data found.')
    else:
        rows = sheet.values().get(spreadsheetId=SPREADSHEET_ID,
                                  range=DATA_TO_PULL).execute()
        data = rows.get('values')
        print("COMPLETE: Data copied")
        return data
#list files in the active brews folder
activebrews = drive.ListFile({'q':"'0BxU70FB_wb-Da0x5amtYbUkybXc' in parents"}).GetList()
tabs = ['Brew Log','Fermentation Log','Centrifuge Log']
brewsheetdict ={}
#Pulls data from the entire spreadsheet tab.
for i in activebrews:
    for j in tabs:
        #set spreadsheet parameters
        SCOPE = ['https://www.googleapis.com/auth/drive.readonly']
        SPREADSHEET_ID = i['id']
        data = pull_sheet_data(SCOPE, SPREADSHEET_ID, j)
        dftitle = str(i['title'])
        dftab = str(j)
        dfname = dftitle + '_' + dftab
        brewsheetdict[dfname] = pd.DataFrame(data)
Thanks!
Thanks to Tanaike's suggestion, I settled on the following solution. There are still remaining issues that I haven't resolved. Namely, the resulting API calls occurred at a rate of 0.5 per second, well below the published limit, but anything faster would still hit rate-limiting issues. Additionally, the code stopped after executing the for loops on half of the list each time, requiring repeated executions. To work around the second problem, I included the indicated line to remove list items after they were iterated over, so re-running the code began with the last incomplete record each time.
import time

SCOPE = ['https://www.googleapis.com/auth/drive.readonly']
recordcount = 0
#if (current - start)/(recordcount) >= 1:
sleeptime = 2.2
start = time.time()
for i in activebrews:
    SPREADSHEET_ID = i['id']
    for j in tabs:
        data = pull_sheet_data(SCOPE, SPREADSHEET_ID, j)
        dftitle = str(i['title'])
        dftab = str(j)
        dfname = dftitle + '_' + dftab
        brewsheetdict[dfname] = pd.DataFrame(data)
        recordcount += 1
        time.sleep(sleeptime)
        end = time.time()
        print(recordcount, dfname, (end - start) / recordcount)
    activebrews.remove(i)  # remove list item after it has been iterated over
    time.sleep(1)

brewsheetdata = open("brewsheetdata.pkl", "wb")
pickle.dump(brewsheetdict, brewsheetdata)
brewsheetdata.close()
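Since the original question asked about a batch format, here is a possible sketch of fetching all three tabs of one spreadsheet in a single call with values().batchGet; this is my own suggestion for reducing the call count, not part of the workaround above, and it reuses the question's gsheet_api_check helper and build import:

# Sketch: one API call per spreadsheet instead of one get() per tab
def pull_all_tabs(SCOPE, SPREADSHEET_ID, ranges):
    creds = gsheet_api_check(SCOPE)
    service = build('sheets', 'v4', credentials=creds)
    result = service.spreadsheets().values().batchGet(
        spreadsheetId=SPREADSHEET_ID, ranges=ranges).execute()
    # One valueRange per requested tab, returned in the same order as `ranges`
    return [vr.get('values', []) for vr in result.get('valueRanges', [])]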
I have a class assignment to write a Python program to download the last 25 years of end-of-day data for the major global stock market indices from Yahoo Finance:
Dow Jones Index (USA)
S&P 500 (USA)
NASDAQ (USA)
DAX (Germany)
FTSE (UK)
HANGSENG (Hong Kong)
KOSPI (Korea)
CNX NIFTY (India)
Unfortunately, when I run the program an error occurs:
File "C:\ProgramData\Anaconda2\lib\site-packages\yahoofinancials\__init__.py", line 91, in format_date
form_date = datetime.datetime.fromtimestamp(int(in_date)).strftime('%Y-%m-%d')
ValueError: timestamp out of range for platform localtime()/gmtime() function
Below you can see the code that I have written. I'm trying to debug my mistakes. Can you help me out, please? Thanks.
from yahoofinancials import YahooFinancials
import pandas as pd
# Select Tickers and stock history dates
index1 = '^DJI'
index2 = '^GSPC'
index3 = '^IXIC'
index4 = '^GDAXI'
index5 = '^FTSE'
index6 = '^HSI'
index7 = '^KS11'
index8 = '^NSEI'
freq = 'daily'
start_date = '1993-06-30'
end_date = '2018-06-30'
# Function to clean data extracts
def clean_stock_data(stock_data_list):
    new_list = []
    for rec in stock_data_list:
        if 'type' not in rec.keys():
            new_list.append(rec)
    return new_list
# Construct yahoo financials objects for data extraction
dji_financials = YahooFinancials(index1)
gspc_financials = YahooFinancials(index2)
ixic_financials = YahooFinancials(index3)
gdaxi_financials = YahooFinancials(index4)
ftse_financials = YahooFinancials(index5)
hsi_financials = YahooFinancials(index6)
ks11_financials = YahooFinancials(index7)
nsei_financials = YahooFinancials(index8)
# Clean returned stock history data and remove dividend events from price history
daily_dji_data = clean_stock_data(dji_financials.get_historical_stock_data(start_date, end_date, freq)[index1]['prices'])
daily_gspc_data = clean_stock_data(gspc_financials.get_historical_stock_data(start_date, end_date, freq)[index2]['prices'])
daily_ixic_data = clean_stock_data(ixic_financials.get_historical_stock_data(start_date, end_date, freq)[index3]['prices'])
daily_gdaxi_data = clean_stock_data(gdaxi_financials.get_historical_stock_data(start_date, end_date, freq)[index4]['prices'])
daily_ftse_data = clean_stock_data(ftse_financials.get_historical_stock_data(start_date, end_date, freq)[index5]['prices'])
daily_hsi_data = clean_stock_data(hsi_financials.get_historical_stock_data(start_date, end_date, freq)[index6]['prices'])
daily_ks11_data = clean_stock_data(ks11_financials.get_historical_stock_data(start_date, end_date, freq)[index7]['prices'])
daily_nsei_data = clean_stock_data(nsei_financials.get_historical_stock_data(start_date, end_date, freq)[index8]['prices'])
stock_hist_data_list = [{'^DJI': daily_dji_data}, {'^GSPC': daily_gspc_data}, {'^IXIC': daily_ixic_data},
{'^GDAXI': daily_gdaxi_data}, {'^FTSE': daily_ftse_data}, {'^HSI': daily_hsi_data},
{'^KS11': daily_ks11_data}, {'^NSEI': daily_nsei_data}]
# Function to construct data frame based on a stock and its market index
def build_data_frame(data_list1, data_list2, data_list3, data_list4, data_list5, data_list6, data_list7, data_list8):
    data_dict = {}
    i = 0
    for list_item in data_list2:
        if 'type' not in list_item.keys():
            data_dict.update({list_item['formatted_date']: {'^DJI': data_list1[i]['close'], '^GSPC': list_item['close'],
                                                            '^IXIC': data_list3[i]['close'], '^GDAXI': data_list4[i]['close'],
                                                            '^FTSE': data_list5[i]['close'], '^HSI': data_list6[i]['close'],
                                                            '^KS11': data_list7[i]['close'], '^NSEI': data_list8[i]['close']}})
            i += 1
    tseries = pd.to_datetime(list(data_dict.keys()))
    df = pd.DataFrame(data=list(data_dict.values()), index=tseries,
                      columns=['^DJI', '^GSPC', '^IXIC', '^GDAXI', '^FTSE', '^HSI', '^KS11', '^NSEI']).sort_index()
    return df
Your problem is that your datetime stamps are in the wrong format. If you look at the error, it gives you the clue:
datetime.datetime.fromtimestamp(int(in_date)).strftime('%Y-%m-%d')
Notice the int(in_date) part?
It wants a Unix timestamp. There are several ways to get this: from the time module, from the calendar module, or using Arrow.
import datetime
import calendar
date = datetime.datetime.strptime("1993-06-30", "%Y-%m-%d")
start_date = calendar.timegm(date.utctimetuple())
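And if you prefer the time module mentioned above, an equivalent sketch (local-time based rather than UTC, which is my assumption about what you want):

import time
import datetime

date = datetime.datetime.strptime("1993-06-30", "%Y-%m-%d")
start_date = int(time.mktime(date.timetuple()))  # seconds since the epoch, local time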
* UPDATED *
OK, so I fixed it up to the dataframes portion. Here is my current code:
# Select Tickers and stock history dates
index = {'DJI' : YahooFinancials('^DJI'),
'GSPC' : YahooFinancials('^GSPC'),
'IXIC':YahooFinancials('^IXIC'),
'GDAXI':YahooFinancials('^GDAXI'),
'FTSE':YahooFinancials('^FTSE'),
'HSI':YahooFinancials('^HSI'),
'KS11':YahooFinancials('^KS11'),
'NSEI':YahooFinancials('^NSEI')}
freq = 'daily'
start_date = '1993-06-30'
end_date = '2018-06-30'
# Clean returned stock history data and remove dividend events from price history
daily = {}
for k in index:
tmp = index[k].get_historical_stock_data(start_date, end_date, freq)
if tmp:
daily[k] = tmp['^{}'.format(k)]['prices'] if 'prices' in tmp['^{}'.format(k)] else []
Unfortunately, I had to fix a couple of things in the yahoofinancials module. For the class YahooFinanceETL:
@staticmethod
def format_date(in_date, convert_type):
    try:
        x = int(in_date)
        convert_type = 'standard'
    except:
        convert_type = 'unixstamp'
    if convert_type == 'standard':
        if in_date < 0:
            form_date = datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=in_date)
        else:
            form_date = datetime.datetime.fromtimestamp(int(in_date)).strftime('%Y-%m-%d')
    else:
        split_date = in_date.split('-')
        d = date(int(split_date[0]), int(split_date[1]), int(split_date[2]))
        form_date = int(time.mktime(d.timetuple()))
    return form_date
AND:
# private static method to scrape data from yahoo finance
@staticmethod
def _scrape_data(url, tech_type, statement_type):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    script = soup.find("script", text=re.compile("root.App.main")).text
    data = loads(re.search("root.App.main\s+=\s+(\{.*\})", script).group(1))
    if tech_type == '' and statement_type != 'history':
        stores = data["context"]["dispatcher"]["stores"]["QuoteSummaryStore"]
    elif tech_type != '' and statement_type != 'history':
        stores = data["context"]["dispatcher"]["stores"]["QuoteSummaryStore"][tech_type]
    else:
        if "HistoricalPriceStore" in data["context"]["dispatcher"]["stores"]:
            stores = data["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]
        else:
            stores = data["context"]["dispatcher"]["stores"]["QuoteSummaryStore"]
    return stores
You will want to look at the daily dict and rewrite your build_data_frame function, which should be a lot simpler now since you are already working with a dictionary.
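As a starting point, a minimal sketch of what that rewrite might look like (this assumes the shape of daily implied by the 'formatted_date'/'close'/'type' keys used above, and is not the exact code run here):

import pandas as pd

def build_data_frame(daily):
    # One column of closing prices per index, keyed on the formatted date
    columns = {}
    for k, prices in daily.items():
        records = [p for p in prices if 'type' not in p]
        columns['^{}'.format(k)] = pd.Series({p['formatted_date']: p['close'] for p in records})
    df = pd.DataFrame(columns)
    df.index = pd.to_datetime(df.index)
    return df.sort_index()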
I am actually the maintainer and author of YahooFinancials. I just saw this post and wanted to personally apologize for the inconvenience and let you all know I will be working on fixing the module this evening.
Could you please open an issue on the module's Github page detailing this?
It would also be very helpful to know which version of python you were running when you encountered these issues.
https://github.com/JECSand/yahoofinancials/issues
I am at work right now; however, as soon as I get home in ~7 hours or so, I will attempt to code a fix and release it. I'll also work on the exception handling. I try my best to maintain this module, but my day (and often night-time) job is rather demanding. I will report back with the final results of these fixes and publish to PyPI when it is done and stable.
Also, if anyone else has any feedback or personal fixes you can offer, it would be a huge help. Proper credit will be given, of course. I am also in desperate need of contributors, so if anyone is interested in that as well, let me know. I really want to take YahooFinancials to the next level and have this project become a stable and reliable alternative for free financial data in Python projects.
Thank you for your patience and for using YahooFinancials.
This is my first post so please bear with me.
I have a large (~1GB) JSON file of Tweets I collected via Twitter's Streaming API. I am able to successfully parse this out into a CSV with the fields I need; however, it is painfully slow, even with the few entities I am extracting (user ID, lat/long, and parsing the Twitter date string to a date/time). What methods could I use to speed this up? It currently takes several hours, and I'm anticipating collecting more data.
import ujson
from datetime import datetime
from dateutil import tz
from csv import writer
import time
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60.
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)
start_time = time.time()
with open('G:\Programming Projects\GGS 681\dmv_raw_tweets1.json', 'r') as in_file, \
     open('G:\Programming Projects\GGS 681\dmv_tweets1.csv', 'w') as out_file:
    print >> out_file, 'user_id,timestamp,latitude,longitude'
    csv = writer(out_file)
    tweets_count = 0
    for line in in_file:
        tweets_count += 1
        tweets = ujson.loads(line)
        timestamp = []
        lats = ''
        longs = ''
        for tweet in tweets:
            tweet = tweets
            from_zone = tz.gettz('UTC')
            to_zone = tz.gettz('America/New_York')
            times = tweet['created_at']
        for tweet in tweets:
            times = tweets['created_at']
            utc = datetime.strptime(times, '%a %b %d %H:%M:%S +0000 %Y')
            utc = utc.replace(tzinfo=from_zone)  # comment out to parse to utc
            est = utc.astimezone(to_zone)  # comment out to parse to utc
            timestamp = est.strftime('%m/%d/%Y %I:%M:%S %p')  # use %p to differentiate AM/PM
        for tweet in tweets:
            if tweets['geo'] and tweets['geo']['coordinates'][0]:
                lats, longs = tweets['geo']['coordinates'][:2]
            else:
                pass
        row = (
            tweets['user']['id'],
            timestamp,
            lats,
            longs
        )
        values = [(value.encode('utf8') if hasattr(value, 'encode') else value) for value in row]
        csv.writerow(values)
end_time = time.time()
print "{} to execute this".format(hms_string(end_time - start_time))
It appears I may have solved this. Looking at the code I was actually running, it looks like my if/else statement below was incorrect.
for tweet in tweets:
    if tweets['geo'] and tweets['geo']['coordinates'][0]:
        lats, longs = tweets['geo']['coordinates'][:2]
    else:
        None
I was using else: None when I should have been using pass or continue. I also removed the inner iteration over tweets in my original code. It was then able to parse a 60 MB file in about 4 minutes. Still, if anyone has any tips for making this faster, I'm open to your suggestions.
Edit: I also used ujson, which significantly increased the speed of loading/dumping the JSON data from Twitter.
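For reference, a minimal sketch of the tightened per-line loop described above (a single pass with no inner for loops; this is my reconstruction, not the exact code that was run, and it reuses the in_file, from_zone, to_zone and csv objects set up earlier):

for line in in_file:
    tweet = ujson.loads(line)  # one tweet object per line
    utc = datetime.strptime(tweet['created_at'], '%a %b %d %H:%M:%S +0000 %Y').replace(tzinfo=from_zone)
    timestamp = utc.astimezone(to_zone).strftime('%m/%d/%Y %I:%M:%S %p')
    lats = longs = ''
    if tweet.get('geo') and tweet['geo'].get('coordinates'):
        lats, longs = tweet['geo']['coordinates'][:2]
    csv.writerow([tweet['user']['id'], timestamp, lats, longs])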