As per the title, my if/else below are not being considered — not sure why.
Here is my code:
cursor.execute("SELECT epic, MAX(timestamp) FROM market_data GROUP BY epic")
epics=(
"KA.D.MXUSLN.DAILY.IP",
"CS.D.BITCOIN.TODAY.IP",
"CS.D.CRYPTOB10.TODAY.IP")
for row in cursor:
for epic in epics:
# If epic exists in the market_data table then take the max timestamp and request new data with date1=maxtimestamp+1min and date2=now()
if epic in row['epic']:
date1 = row['max'] + datetime.timedelta(minutes=1)
date2 = datetime.datetime.now()
else:
# if epic not already in market_data table then fresh new request with date1=now() and date2=now()+1min
date1 = datetime.datetime.now()
date2 = datetime.datetime.now() + datetime.timedelta(minutes=1)
# URL PRODUCTION/LIVE Enviroment - demo most likely throttled and limited
fmt = "https://example.com/" + str(epic) + "/1/MINUTE/batch/start/{date1:%Y/%m/%d/%H/%M/0/0}/end/{date2:%Y/%m/%d/%H/%M/%S/0}?format=json"
# while date1 <= date2:
url = fmt.format(epic, date1=date1, date2=date2)
resp = requests.get(url, headers=headers)
print(url)
The output of cursor is:
CS.D.BITCOIN.TODAY.IP 2019-05-01 00:00:00
KA.D.MXUSLN.DAILY.IP 2020-02-14 14:26:00
The code above outputs this:
https://example.com/CS.D.BITCOIN.TODAY.IP/start/2019/05/01/00/01/0/0/end/2020/02/14/15/10/44/0?format=json
https://example.com/CS.D.CRYPTOB10.TODAY.IP/start/2020/02/14/15/10/0/0/end/2020/02/14/15/11/44/0?format=json
https://example/KA.D.MXUSLN.DAILY.IP/start/2020/02/14/14/27/0/0/end/2020/02/14/15/10/44/0?format=json
https://example.com/CS.D.BITCOIN.TODAY.IP/start/2020/02/14/15/10/0/0/end/2020/02/14/15/11/44/0?format=json
https://example.com/CS.D.CRYPTOB10.TODAY.IP/start/2020/02/14/15/10/0/0/end/2020/02/14/15/11/44/0?format=json
Note - as, epics "KA.D.MXUSLN.DAILY.IP" and "CS.D.BITCOIN.TODAY.IP are already in cursor, I expect the output to just be:
https://example.com/CS.D.BITCOIN.TODAY.IP/start/2019/05/01/00/01/0/0/end/2020/02/14/15/10/44/0?format=json
https://example.com/CS.D.CRYPTOB10.TODAY.IP/start/2020/02/14/15/10/0/0/end/2020/02/14/15/11/44/0?format=json
https://example/KA.D.MXUSLN.DAILY.IP/start/2020/02/14/14/27/0/0/end/2020/02/14/15/10/44/0?format=json
Why aren't my if and else being considered?
It is considered, but then you continue to iterate over the other epics anyway and print those too. You could use next instead of your inner for loop, if you find a match, remove it from the list of epics. and then any remaining epics can be handled afterwards as required
for row in cursor:
epic = next(epic for epic in epics if epic in row["epic"])
if epic is not None:
date1 = row['max'] + datetime.timedelta(minutes=1)
date2 = datetime.datetime.now()
epics.remove(epic)
else:
date1 = datetime.datetime.now()
date2 = datetime.datetime.now() + datetime.timedelta(minutes=1)
# URL PRODUCTION/LIVE Enviroment - demo most likely throttled and limited
fmt = "https://example.com/" + str(epic) + "/1/MINUTE/batch/start/{date1:%Y/%m/%d/%H/%M/0/0}/end/{date2:%Y/%m/%d/%H/%M/%S/0}?format=json"
# while date1 <= date2:
url = fmt.format(epic, date1=date1, date2=date2)
resp = requests.get(url, headers=headers)
print(url)
Note: This leaves an issue where your fmt url will contain None, if there are no matches, not sure how you wish to handle this.
Related
I am trying to do a function where I check if a date is in my excel file, and if unfortunately it is not. I retrieve the date before.
I succeeded with the after date and here is my code.
Only with the date before, I really can't do it.
i tried this for the day before:
def get_all_dates_between_2_dates_with_special_begin_substraction(Class, date_départ, date_de_fin, date_debut_analyse, exclus=False):
date_depart = date_départ
date_fin = date_de_fin
result_dates = []
inFile = "database/Calendar_US_Target.xlsx"
inSheetName = "Sheet1"
df =(pd.read_excel(inFile, sheet_name = inSheetName))
date_depart = datetime.datetime.strptime(date_depart, '%Y-%m-%d')
date_fin = datetime.datetime.strptime(date_fin, '%Y-%m-%d')
date_calcul_depart = datetime.datetime.strptime(date_debut_analyse, '%Y-%m-%d')
var_date_depart = date_depart
time_to_add = ""
if (Class.F0 == "mois"):
time_to_add = relativedelta(months=1)
if (Class.F0 == "trimestre"):
time_to_add = relativedelta(months=3)
if (Class.F0 == "semestre"):
time_to_add = relativedelta(months=6)
if (Class.F0 == "année"):
time_to_add = relativedelta(years=1)
while var_date_depart <= date_fin:
-------------------------------------------------------------
df['mask'] = (var_date_depart <= df['TARGETirs_holi']) # daybefore
print(df.head())
print(df[df.mask =="True"].head(1)) #want to check the last true value
------------------------------------------------------------------------------
if (result >= date_calcul_depart):
result = (str(result)[0:10])
result = result[8:10] + "/" + result[5:7] + "/" + result[0:4]
result_dates.append(str(result))
var_date_depart = var_date_depart + time_to_add
if (exclus == True):
result_dates = result_dates[1:-1]
return(result_dates)
I want to say, do a column (or a dataframe) where the first date is true where the first date smaller than the second then i take the last value who is true.
for example:
I have this array [12-05-2022,15-05-2022,16-05-2022 and 19-05-2022]
if i put 15-05-2022, it gives me 15-05-2022, but if i put 18-05-2022, its gives me 16-05-2022
Thanks!
I'm working on an elearning website and I'm trying to integrate Zoom meetings using the API
According to the official documentation, the start_time must be set to the yyyy-MM-ddTH:M:S.
Example : 2020-10-02T18:00:00
Based on that, this is the code I'm using.
class Zoom:
...
def parse_date(self, date):
parts = date.strip().split(' ')
part1 = parts[0]
part2 = parts[1]
parts1 = part1.split('/')
day = parts1[0]
month = parts1[1]
year = parts1[2]
parts2 = part2.split(':')
h = parts2[0]
m = parts2[1]
formatted_date = year + '-' + month + '-' + day + 'T' + h + ':' + m + ':00Z'
return formatted_date
def create_meeting(self, topic, start_date, password):
token = self.get_token()
conn = http.HTTPSConnection(Zoom.ZOOM_API_URL)
headers = {'authorization': "Bearer " + token, 'content-type': "application/json"}
data = {'topic': topic, 'type': 2, 'start_time': self.parse_date(start_date), 'timezone': 'Africa/Casablanca', 'password': password}
conn.request("POST", "/v2/users/me/meetings", json.dumps(data), headers)
response = json.loads(conn.getresponse().read().decode('utf-8'))
return response
zoom = Zoom('API_KEY', 'API_SECRET')
meeting = zoom.create_meeting(topic='Learning test', start_date='02/10/2020 18:00', password='123456')
The meeting is created but the start date is ignored as shown in the image
As you can see I specified 6 PM as a start date but it's 7 PM.
It seems the problem was caused by the Z at the end of the date. After removing it the date hour is no longer incremented.
I'm using an API to lookup historical stock market prices for a given company on the last day of each month. The problem is that the last day can sometimes fall on a weekend or holiday, in which case the API returns a KeyError. I've tried using an exception to handle this by adding n number to the date to get the next-closest valid one, but this is not foolproof.
Here is my existing code:
import os
from iexfinance.stocks import get_historical_data
import iexfinance
import pandas as pd
# Set API Keys
os.environ['IEX_API_VERSION'] = 'iexcloud-sandbox'
os.environ['IEX_TOKEN'] = 'Tsk_5798c0ab124d49639bb1575b322841c4'
stocks = ['AMZN', 'FDX', 'XXXXX', 'BAC', 'COST']
date = "20191130"
for stock in stocks:
try:
price_df = get_historical_data(stock, date, close_only=True,output_format='pandas')
price = price_df['close'].values[0]
print(price)
except KeyError:
date = str(int(date) - 1)
price_df = get_historical_data(stock, date, close_only=True, output_format='pandas')
price = price_df['close'].values[0]
print(price)
except iexfinance.utils.exceptions.IEXQueryError:
print(stock + " is not a valid company")
But if you change date = "20160131", then you get a KeyError again.
So is there a simple way to handle these exceptions and get the next-valid date?
Note that the API Key is public and for sandbox purposes, so feel free to use
I think this might work:
def get_prices(stocks, date):
for stock in stocks:
try:
price_df = get_historical_data(stock, date, close_only=True,output_format='pandas')
price = price_df['close'].values[0]
print(stock + " was # $" + str(price) + " on " + str(date))
except KeyError:
return get_prices(stocks, date = str(int(date) - 1))
print(stock + " was # $" + str(price) + " on " + str(date))
except iexfinance.utils.exceptions.IEXQueryError:
print(stock + " is not a valid company")
Im trying to use the eventful api to get information about only music events (concerts) between two dates. For example I want to get the below information about each concert from 20171012 to 20171013:
- city
- performer
- country
- latitude
- longitude
- genre
- title
- image
- StarTime
Im using a python example available online and change it to get the data above. But for now its not working Im just able to get this information:
{'latitude': '40.4',
'longitude': '-3.68333',
'start_time': '2017-10-12 20:00:00',
'city_name': 'Madrid', 'title': 'Kim Waters & Maysa Smooth en Hot Jazz Festival'}
But the performer, genre country and image url its not working. Do you know how to get that information? When I change the python example below to get this information it returns always a empty array.
python example working: (However, without getting the performer, genre, country and image url, if I add theese elements to the event_features I get an empty array)
import requests
import datetime
def get_event(user_key, event_location , start_date, end_date, event_features, fname):
data_lst = [] # output
start_year = int(start_date[0:4])
start_month = int(start_date[4:6])
start_day = int(start_date[6:])
end_year = int(end_date[0:4])
end_month = int(end_date[4:6])
end_day = int(end_date[6:])
start_date = datetime.date(start_year, start_month, start_day)
end_date = datetime.date(end_year, end_month, end_day)
step = datetime.timedelta(days=1)
while start_date <= end_date:
date = str(start_date.year)
if start_date.month < 10:
date += '0' + str(start_date.month)
else:
date += str(start_date.month)
if start_date.day < 10:
date += '0' + str(start_date.day)
else:
date += str(start_date.day)
date += "00"
date += "-" + date
url = "http://api.eventful.com/json/events/search?"
url += "&app_key=" + user_key
url += "&location=" + event_location
url += "&date=" + date
url += "&page_size=250"
url += "&sort_order=popularity"
url += "&sort_direction=descending"
url += "&q=music"
url+= "&c=music"
data = requests.get(url).json()
try:
for i in range(len(data["events"]["event"])):
data_dict = {}
for feature in event_features:
data_dict[feature] = data["events"]["event"][i][feature]
data_lst.append(data_dict)
except:
pass
print(data_lst)
start_date += step
def main():
user_key = ""
event_location = "Madrid"
start_date = "20171012"
end_date = "20171013"
event_location = event_location.replace("-", " ")
start_date = start_date
end_date = end_date
event_features = ["latitude", "longitude", "start_time"]
event_features += ["city_name", "title"]
event_fname = "events.csv"
get_event(user_key, event_location, start_date, end_date, event_features, event_fname)
if __name__ == '__main__':
main()
You should debug your problem and not to ignore all exceptions.
Replace lines try: ... except: pass by:
data = requests.get(url).json()
if "event" in data.get("event", {}):
for row in data["events"]["event"]:
# print(row) # you can look here what are the available data, while debugging
data_dict = {feature: row[feature] for feature in features}
data_lst.append(data_dict)
else:
pass # a problem - you can do something here
You will see a KeyError with a name of the missing feature that is not present in "row". You should fix missing features and read documentation about API of that service. Country feature is probably "country_name" similarly to "city_name". Maybe you should set the "include" parameter to specify more sections of details in search than defaults only.
An universal try: ... except: pass should never used, because "Errors should never pass silently." (The Zen of Python)
Read Handling Exceptions:
... The last except clause may omit the exception name(s), to serve as a wildcard. Use this with extreme caution, since it is easy to mask a real programming error in this way! ...
A more important command where unexpected exceptions are possible is requests.get(url).json(), e.g. TimeoutException. Anyway you should not continue the "while" loop if there is a problem.
If you look at the data returned by eventful.com, a few things are clear:
For country, the field to be used is country_name. This was missing from your "event_features" list
There can be multiple performers for each event. To get all the performers, you need to add "performers" to your "event_features" list
There is no field named Genre and hence you cannot find Genre
The "image" field is always None. This means there is no image available.
Here is modified code. Hopefully it works much better and it will help you move forward.
import datetime
import requests
data_lst = [] # output
event_features = ["latitude", "longitude", "start_time", "city_name",
"country_name", "title", "image", "performers"]
def get_event(user_key, event_location, start_date, end_date):
start_year = int(start_date[0:4])
start_month = int(start_date[4:6])
start_day = int(start_date[6:])
end_year = int(end_date[0:4])
end_month = int(end_date[4:6])
end_day = int(end_date[6:])
start_date = datetime.date(start_year, start_month, start_day)
end_date = datetime.date(end_year, end_month, end_day)
step = datetime.timedelta(days=1)
while start_date <= end_date:
date = str(start_date.year)
if start_date.month < 10:
date += '0' + str(start_date.month)
else:
date += str(start_date.month)
if start_date.day < 10:
date += '0' + str(start_date.day)
else:
date += str(start_date.day)
date += "00"
date += "-" + date
url = "http://api.eventful.com/json/events/search?"
url += "&app_key=" + user_key
url += "&location=" + event_location
url += "&date=" + date
url += "&page_size=250"
url += "&sort_order=popularity"
url += "&sort_direction=descending"
url += "&q=music"
url += "&c=music"
data = requests.get(url).json()
print "==== Data Returned by eventful.com ====\n", data
try:
for i in range(len(data["events"]["event"])):
data_dict = {}
for feature in event_features:
data_dict[feature] = data["events"]["event"][i][feature]
data_lst.append(data_dict)
except IndexError:
pass
print "===================================="
print data_lst
start_date += step
def main():
user_key = "Enter Your Key Here"
event_location = "Madrid"
start_date = "20171012"
end_date = "20171013"
event_location = event_location.replace("-", " ")
start_date = start_date
end_date = end_date
#event_fname = "events.csv"
get_event(user_key, event_location, start_date, end_date)
if __name__ == '__main__':
main()
I was able to successfully pull data from the Eventful API for the performer, image, and country fields. However, I don't think the Eventful Search API supports genre - I don't see it in their documentation.
To get country, I added "country_name", "country_abbr" to your event_features array. That adds these values to the resulting JSON:
'country_abbr': u'ESP',
'country_name': u'Spain'
Performer also can be retrieved by adding "performers" to event_features. That will add this to the JSON output:
'performers': {
u'performer': {
u'name': u'Kim Waters',
u'creator': u'evdb',
u'url': u'http://concerts.eventful.com/Kim-Waters?utm_source=apis&utm_medium=apim&utm_campaign=apic',
u'linker': u'evdb',
u'short_bio': u'Easy Listening / Electronic / Jazz', u'id': u'P0-001-000333271-4'
}
}
To retrieve images, add image to the event_features array. Note that not all events have images, however. You will either see 'image': None or
'image': {
u'medium': {
u'url': u'http://d1marr3m5x4iac.cloudfront.net/store/skin/no_image/categories/128x128/other.jpg',
u'width': u'128',
u'height': u'128'
},
u'thumb': {
u'url': u'http://d1marr3m5x4iac.cloudfront.net/store/skin/no_image/categories/48x48/other.jpg',
u'width': u'48',
u'height': u'48'
}
}
Good luck! :)
SECOND EDIT:
Finished snippet for adjusting timezones and converting format. See correct answer below for details leading to this solution.
tzvar = int(input("Enter the number of hours you'd like to add to the timestamp:"))
tzvarsecs = (tzvar*3600)
print (tzvarsecs)
def timestamp_to_str(timestamp):
return datetime.fromtimestamp(timestamp).strftime('%H:%M:%S %m/%d/%Y')
timestamps = soup('span', {'class': '_timestamp js-short-timestamp '})
dtinfo = [timestamp["data-time"] for timestamp in timestamps]
times = map(int, dtinfo)
adjtimes = [x+tzvarsecs for x in times]
adjtimesfloat = [float(i) for i in adjtimes]
dtinfofloat = [float(i) for i in dtinfo]
finishedtimes = [x for x in map(timestamp_to_str, adjtimesfloat)]
originaltimes = [x for x in map(timestamp_to_str, dtinfofloat)]
END SECOND EDIT
EDIT:
This code allows me to scrape the POSIX time from the HTML file and then add a number of hours entered by the user to the original value. Negative numbers will also work to subtract hours. The user will be working in whole hours as the changes are specifically to adjust for timezones.
tzvar = int(input("Enter the number of hours you'd like to add to the timestamp:"))
tzvarsecs = (tzvar*3600)
print (tzvarsecs)
timestamps = soup('span', {'class': '_timestamp js-short-timestamp '})
dtinfo = [timestamp["data-time"] for timestamp in timestamps]
times = map(int, dtinfo)
adjtimes = [x+tzvarsecs for x in times]
All that is left is a reverse of a function like the one suggested below. How do I convert each POSIX time in the list to a readable format using a function?
END EDIT
The code below creates a csv file containing data scraped from a saved Twitter HTML file.
Twitter converts all the timestamps to the user's local time in the browser. I would like to have an input option for the user to adjust the timestamps by a certain number of hours so that the data for the tweet reflects the tweeter's local time.
I'm currently scraping an element called 'title' that is a part of each permalink. I could just as easily scrape the POSIX time from each tweet instead.
title="2:29 PM - 28 Sep 2015"
vs
data-time="1443475777" data-time-ms="1443475777000"
How would I edit the following piece so it added a variable entered by the user to each timestamp? I don't need help with requesting input, I just need to know how to apply it to the list of timestamps after the input is passed to python.
timestamps = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
datetime = [timestamp["title"] for timestamp in timestamps]
Other questions related to this code/project.
Fix encoding error with loop in BeautifulSoup4?
Focusing in on specific results while scraping Twitter with Python and Beautiful Soup 4?
Using Python to Scrape Nested Divs and Spans in Twitter?
Full code.
from bs4 import BeautifulSoup
import requests
import sys
import csv
import re
from datetime import datetime
from pytz import timezone
url = input("Enter the name of the file to be scraped:")
with open(url, encoding="utf-8") as infile:
soup = BeautifulSoup(infile, "html.parser")
#url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
#headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
#r = requests.get(url, headers=headers)
#data = r.text.encode('utf-8')
#soup = BeautifulSoup(data, "html.parser")
names = soup('strong', {'class': 'fullname js-action-profile-name show-popup-with-id'})
usernames = [name.contents for name in names]
handles = soup('span', {'class': 'username js-action-profile-name'})
userhandles = [handle.contents[1].contents[0] for handle in handles]
athandles = [('#')+abhandle for abhandle in userhandles]
links = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
urls = [link["href"] for link in links]
fullurls = [permalink for permalink in urls]
timestamps = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
datetime = [timestamp["title"] for timestamp in timestamps]
messagetexts = soup('p', {'class': 'TweetTextSize js-tweet-text tweet-text'})
messages = [messagetext for messagetext in messagetexts]
retweets = soup('button', {'class': 'ProfileTweet-actionButtonUndo js-actionButton js-actionRetweet'})
retweetcounts = [retweet.contents[3].contents[1].contents[1].string for retweet in retweets]
favorites = soup('button', {'class': 'ProfileTweet-actionButtonUndo u-linkClean js-actionButton js-actionFavorite'})
favcounts = [favorite.contents[3].contents[1].contents[1].string for favorite in favorites]
images = soup('div', {'class': 'content'})
imagelinks = [src.contents[5].img if len(src.contents) > 5 else "No image" for src in images]
#print (usernames, "\n", "\n", athandles, "\n", "\n", fullurls, "\n", "\n", datetime, "\n", "\n",retweetcounts, "\n", "\n", favcounts, "\n", "\n", messages, "\n", "\n", imagelinks)
rows = zip(usernames,athandles,fullurls,datetime,retweetcounts,favcounts,messages,imagelinks)
rownew = list(rows)
#print (rownew)
newfile = input("Enter a filename for the table:") + ".csv"
with open(newfile, 'w', encoding='utf-8') as f:
writer = csv.writer(f, delimiter=",")
writer.writerow(['Usernames', 'Handles', 'Urls', 'Timestamp', 'Retweets', 'Favorites', 'Message', 'Image Link'])
for row in rownew:
writer.writerow(row)
Using your code as example, the var datetime store a list of string dates. So let's dissect the process in 3 steps, just for comprehension.
Example
>>> datetime = [timestamp["title"] for timestamp in timestamps]
>>> print(datetime)
['2:13 AM - 29 Sep 2015', '2:29 PM - 28 Sep 2015', '8:04 AM - 28 Sep 2015']
First step: convert it to a Python datetime object.
>>> datetime_obj = datetime.strptime('2:13 AM - 29 Sep 2015', '%H:%M %p - %d %b %Y')
>>> print(datetime_obj)
datetime.datetime(2015, 9, 29, 2, 13)
Second step: convert datetime object to a Python structured time object
>>> to_time = struct_date.timetuple()
>>> print(to_time)
time.struct_time(tm_year=2015, tm_mon=9, tm_mday=29, tm_hour=2, tm_min=13, tm_sec=0, tm_wday=1, tm_yday=272, tm_isdst=-1)
Third step: convert sturctured time object to time using time.mktime
>>> timestamp = time.mktime(to_time)
>>> print(timestamp)
1443503580.0
All together now.
import time
from datetime import datetime
...
def str_to_ts(str_date):
return time.mktime(datetime.strptime(str_date, '%H:%M %p - %d %b %Y').timetuple())
datetimes = [timestamp["title"] for timestamp in timestamps]
times = [i for i in map(str_to_ts, datetimes)]
PS: datetime is a bad choice for variable name. Specially in this context. :-)
Update
To apply a function to each value of list:
def add_time(timestamp, hours=0, minutes=0, seconds=0):
return timestamp + seconds + (minutes * 60) + (hours * 60 * 60)
datetimes = [timestamp["title"] for timestamp in timestamps]
times = [add_time(i, 5, 0, 0) for i in datetimes]
Update 2
To convert a timestamp to string formatted date:
def timestamp_to_str(timestamp):
return datetime.fromtimestamp(timestamp).strftime('%H:%M:%S %m/%d/%Y')
Example:
>>> from time import time
>>> from datetime import datetime
>>> timestamp_to_str(time())
'17:01:47 08/29/2016'
This is what I was thinking but not sure if this is what you're after:
>>> timestamps = ["1:00 PM - 28 Sep 2015", "2:00 PM - 28 Sep 2016", "3:00 PM - 29 Sep 2015"]
>>> datetime = dict(enumerate(timestamps))
>>> datetime
{0: '1:00 PM - 28 Sep 2015',
1: '2:00 PM - 28 Sep 2016',
2: '3:00 PM - 29 Sep 2015'}
It seems you are looking for datetime.timedelta (documentation here). You can convert your inputs into datetime.datetime objects in various ways, for example,
timestamp = datetime.datetime.fromtimestamp(1443475777)
Then you can perform arithmetic on them with timedelta objects. A timedelta just represents a change in time. You can construct one with an hours argument like so:
delta = datetime.timedelta(hours=1)
And then timestamp + delta will give you another datetime one hour in the future. Subtraction will work as well, as will other arbitrary time intervals.