I created a web scraping program that opens several URLs, checks which one of them has information related to tomorrow's date, and then prints some specific information from that URL. My problem is that sometimes none of the URLs in the list has information concerning tomorrow. In that case I would like the program to print something else, like "no data found". How could I accomplish that? Another doubt I have: do I need the while loop at the beginning? Thanks.
My code is:
from datetime import datetime, timedelta
tomorrow = datetime.now() + timedelta(days=1)
tomorrow = tomorrow.strftime('%d-%m-%Y')
day = ""
while day != tomorrow:
    for url in list_urls:
        browser.get(url)
        time.sleep(1)
        dia_page = browser.find_element_by_xpath("//*[@id='item2']/b").text
        dia_page = dia_page[-10:]
        day_uns = datetime.strptime(dia_page, "%d-%m-%Y")
        day = day_uns.strftime('%d-%m-%Y')
        if day == tomorrow:
            meals = browser.find_elements_by_xpath("//*[@id='item2']/span")
            meal_reg = browser.find_element_by_xpath("//*[@id='item_frm']/span[1]").text
            sopa2 = (meals[0].text)
            refeicao2 = (meals[1].text)
            sobremesa2 = (meals[2].text)
            print(meal_reg)
            print(sopa2)
            print(refeicao2)
            print(sobremesa2)
            break
No need for a while loop, you can use the for-else Python construct for this:
for url in list_urls:
    # do stuff
    if day == tomorrow:
        # do and print stuff
        break
else:  # break never encountered
    print("no data found")
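A minimal, self-contained sketch of the same for-else pattern, with the Selenium lookups replaced by a plain list so it runs anywhere. The names `find_menu` and `pages` are hypothetical stand-ins and are not part of the original code.

```python
from datetime import datetime, timedelta


def find_menu(pages, target_day):
    """Return the menu for target_day, or None if no page has it.

    `pages` is a hypothetical stand-in for the scraped URLs: a list of
    (day_string, menu) tuples instead of live browser lookups.
    """
    for day, menu in pages:
        if day == target_day:
            result = menu
            break
    else:  # runs only when the loop finished without hitting break
        result = None
    return result


tomorrow = (datetime.now() + timedelta(days=1)).strftime('%d-%m-%Y')
pages = [("01-01-2000", "soup"), (tomorrow, "stew")]

print(find_menu(pages, tomorrow) or "no data found")
print(find_menu(pages[:1], tomorrow) or "no data found")
```

The `else` belongs to the `for`, not to the `if`, which is why a single pass over the list is enough and the outer while loop is not needed.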
I’m trying to get the stock prices at which to buy or sell on specific dates. When a buy price is given, the sell price should be NaN. Likewise, when a sell price is given, the buy price has to be NaN. The function and code were originally proposed by Joseph Hart (https://medium.com/analytics-vidhya/sma-short-moving-average-in-python-c656956a08f8).
The return values of the function are (sig_buy_price, sig_sell_price). My data source is Pandas DataFrame, namely qqq_df. SMA_30 and SMA_100 are samples drawn from qqq_df.
The output does not give me the expected result stated above. Please find the code below. I need specific steps and code to resolve the issue. I look forward to hearing from forum members. Thanks.
def buy_sell(qqq_df):
    sig_price_buy = []
    sig_price_sell = []
    flag = -1
    for i in range(len(qqq_df)):
        if qqq_df['sma_30'][i] > qqq_df['sma_100'][i]:
            if flag != 1:
                sig_price_buy.append(qqq_df['close'][i])
                sig_price_sell.append(np.nan)
                print(qqq_df['date'][i])
            else:
                sig_price_buy.append(np.nan)
                sig_price_buy.append(np.nan)
        elif qqq_df['sma_30'][i] < qqq_df['sma_100'][i]:
            if flag != 0:
                sig_price_buy.append(np.nan)
                sig_price_sell.append(qqq_df['close'][i])
                print(qqq_df['date'][i])
                flag = 0
            else:
                sig_price_buy.append(np.nan)
                sig_price_sell.append(np.nan)
        else:
            sig_price_buy.append(np.nan)
            sig_price_sell.append(np.nan)
    return(sig_price_buy, sig_price_sell)
b, s = buy_sell(qqq_df = qqq_df)
print(b, s)
The following code worked for me:
def buy_sell(data):
    sig_price_buy = []
    sig_price_sell = []
    flag = -1
    for i in range(len(data)):
        if data['SMA30'][i] >= data['SMA100'][i] and flag != 1:
            sig_price_buy.append(data['TSLA'][i])
            sig_price_sell.append(np.nan)
            flag = 1
        elif data['SMA30'][i] <= data['SMA100'][i] and flag != 0:
            sig_price_buy.append(np.nan)
            sig_price_sell.append(data['TSLA'][i])
            flag = 0
        else:
            sig_price_buy.append(np.nan)
            sig_price_sell.append(np.nan)
    return (sig_price_buy, sig_price_sell)
The format of the input data must be:
TSLA = pd.read_csv("TSLA.csv")
close = TSLA.Close
data = pd.DataFrame()
data['SMA30'] = close.rolling(window = 30).mean()
data['SMA100'] = close.rolling(window = 100).mean()
data['TSLA'] = close
I tested it on TSLA stock, so never mind the different naming.
If you want to print it in a certain format, which I read in one of your comments, I recommend first running the function and then printing the resulting arrays formatted the way you like in a second step.
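As a rough sketch of that second step, the two returned lists can be walked together with the dates and turned into one line per signal. The helper name `format_signals`, the output layout, and the sample values are all made up for illustration. `float("nan")` is used here instead of `np.nan` only to keep the example free of dependencies; `math.isnan` treats both the same way.

```python
import math


def format_signals(dates, buys, sells):
    """Build one text line per row that carries a buy or a sell price."""
    lines = []
    for date, buy, sell in zip(dates, buys, sells):
        if not math.isnan(buy):
            lines.append(f"{date}  BUY   {buy:.2f}")
        elif not math.isnan(sell):
            lines.append(f"{date}  SELL  {sell:.2f}")
    return lines


nan = float("nan")  # stands in for np.nan
dates = ["2021-01-04", "2021-01-05", "2021-01-06"]  # illustrative values only
buys = [nan, 10.0, nan]
sells = [nan, nan, 12.5]

for line in format_signals(dates, buys, sells):
    print(line)
```

Keeping the formatting outside `buy_sell` means the function still returns two lists of equal length that can be assigned straight back to DataFrame columns.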
EDIT:
If you wish to access a yahoo finance stock directly from python, you can use the following framework:
import datetime
from os import mkdir, path
from urllib import request

def convertDateToYahooTime(date: datetime.datetime) -> int: #converts a date to a POSIX for the yahoo finance website
    return int(date.replace(hour=2, minute=0, second=0, microsecond=0).timestamp())

def downloadStock(stock: str, olderDate: datetime.datetime, recentDate: datetime.datetime) -> str: #downloads a stock from yahoo finance and returns the filename
    p1: int = convertDateToYahooTime(olderDate)
    p2: int = convertDateToYahooTime(recentDate)
    link = f"https://query1.finance.yahoo.com/v7/finance/download/{stock}?period1={p1}&period2={p2}&interval=1d&events=history&includeAdjustedClose=true"
    file = f"{stock}_{olderDate.date().strftime('%Y-%m-%d')}_{recentDate.date().strftime('%Y-%m-%d')}"
    if not path.exists('stocks'):
        mkdir('stocks')
    response = request.urlretrieve(link, f'stocks//{file}') #save the stock to a folder named 'stocks'
    return file

def fetchStock(stock: str, period: int) -> str: #fetches a stock and returns the filename
    today = datetime.datetime.today().replace(hour=0, minute=0, second=0, microsecond=0)
    past = datetime.datetime.fromtimestamp(today.timestamp() - period*86400)
    return downloadStock(stock, past, today)
downloadStock accesses the download link for any Yahoo Finance stock you want. fetchStock just makes your life easier: you enter the name of your stock and the number of days of data you want, counting backward from the present day. So if you do this:
STOCK = 'GOOG'
PERIOD = 1000
file = fetchStock(STOCK, PERIOD)
df = pd.read_csv('stocks//' + file)
df will hold the Google stock data for the past 1000 days. You can do this to load any stock you want. After this, you can repeat the steps I mentioned above to format the stock and run the SMA30/100 algorithm on it.
I'm not sure whether this is what you wanted, but I hope it helps and you have fun with it.
Hello Community Members,
I am very new to Python and to programming. I am currently working with a news API and printing the news it returns. I want this program to check the API and print something whenever there is an update. What can I do to achieve this?
CODE:
url = 'https://cryptopanic.com/api/v1/posts/?auth_token=<my token>&filter=hot'
html_link = requests.get(url)
datatype = html_link.json()
news_info = datatype['results']
latest_news = news_info[0]['title']
source = news_info[0]['source']['title']
print(latest_news)
I want the latest_news variable, which stores the news, to be printed whenever there is a new item in the list. I have tried a comparison approach but haven't gotten anywhere so far.
Does this fit your criteria? You have to run it every 5 minutes, or at whatever interval you want, and you will get the latest titles.
import requests, json
old_news_info = {"news": []}
try:
    old_news_info = json.load(open("old_news_info.json", "r"))
except:
    pass
url = 'https://cryptopanic.com/api/v1/posts/?auth_token=<token>&filter=hot'
print("waiting for response")
html_link = requests.get(url)
datatype = html_link.json()
if datatype != {'status': 'Incomplete', 'info': 'Token not found'}:
    news_info = datatype['results']
    if not news_info[0] in old_news_info["news"]:
        for news in news_info:
            if news in old_news_info["news"]:
                break
            else:
                old_news_info["news"].append(news)
                print(news["source"]['title'])
    json.dump(old_news_info, open("old_news_info.json", "w"), indent = 4)
else:
    print("Token not found")
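The core of the comparison can also be sketched without the network call or the JSON file: remember which items were already printed and report only the rest. This is a sketch under assumptions, not the API's documented contract. The helper `new_titles` is hypothetical, and the assumption that every item carries a unique "id" key should be checked against a real response (the "title" key is the one used in the question).

```python
def new_titles(results, seen_ids):
    """Return titles of items not seen before and record their ids.

    Assumes each item is a dict with "id" and "title" keys; adjust the
    key names to whatever the API actually returns.
    """
    fresh = []
    for item in results:
        if item["id"] not in seen_ids:
            seen_ids.add(item["id"])
            fresh.append(item["title"])
    return fresh


seen = set()
first_poll = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]
second_poll = [{"id": 3, "title": "C"}, {"id": 1, "title": "A"}]

print(new_titles(first_poll, seen))   # everything is new on the first poll
print(new_titles(second_poll, seen))  # only the unseen item is reported
```

In a long-running script this would sit inside a loop that fetches the results, calls the helper, and then waits with `time.sleep(300)` before polling again.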
I am trying to open several URLs (because they contain data I want to append to a list). I have a check saying "if amount in icl_dollar_amount_l" and then the rest of the code runs. However, I want the script to run the rest of the code only for the specific amount stored in the variable "amount".
Example:
Selenium opens X links and sees ['144,827.95', '5,199,024.87', '130,710.67'] in icl_dollar_amount_l, but I want it to skip '144,827.95' and '5,199,024.87' and only get the information for '130,710.67', which is already in the 'amount' variable.
Actual results:
It gets the web scraping information for amount '144,827.95' only and never even reaches '5,199,024.87' or '130,710.67'. I only want the web scraping information for '130,710.67', because that is the only amount in my amount variable.
print(icl_dollar_amount_l)
['144,827.95', '5,199,024.87', '130,710.67']
print(amount)
'130,710.67'
file2.py
def scrapeBOAWebsite(url,fcg_subject_l, gp_subject_l):
    from ICL_Awk_Checker import rps_amount_l2
    icl_dollar_amount_l = []
    amount_ack_missing_l = []
    file_total_l = []
    body_l = []
    for link in url:
        print(link)
        browser = webdriver.Chrome(options=options,
                                   executable_path=r'\\TEST\user$\TEST\Documents\driver\chromedriver.exe')
        # if 'P2 Cust ID 908554 File' in fcg_subject:
        browser.get(link)
        username = browser.find_element_by_name("dialog:username").get_attribute('value')
        submit = browser.find_element_by_xpath("//*[@id='dialog:continueButton']").click()
        body = browser.find_element_by_xpath("//*[contains(text(), 'Total:')]").text
        body_l.append(body)
        icl_dollar_amount = re.findall('(?:[\£\$\€]{1}[,\d]+.?\d*)', body)[0].split('$', 1)[1]
        icl_dollar_amount_l.append(icl_dollar_amount)
    if not missing_amount:
        logging.info("List is empty")
        print("List is empty")
    count = 0
    for amount in missing_amount:
        if amount in icl_dollar_amount_l:
            body = body_l[count]
            get_file_total = re.findall('(?:[\£\$\€]{1}[,\d]+.?\d*)', body)[0].split('$', 1)[1]
            file_total_l.append(get_file_total)
    return icl_dollar_amount_l, file_date_l, company_id_l, client_id_l, customer_name_l, file_name_l, file_total_l, \
           item_count_l, file_status_l, amount_ack_missing_l
I'm not sure I understand the problem, but this check
if amount in icl_dollar_amount_l:
doesn't tell you at which position '130,710.67' sits in icl_dollar_amount_l. Because count stays at 0, body_l[count] is always the first page. You also need
count = icl_dollar_amount_l.index(amount)
for amount in missing_amount:
    if amount in icl_dollar_amount_l:
        count = icl_dollar_amount_l.index(amount)
        body = body_l[count]
But this only works if you expect each amount to appear once in icl_dollar_amount_l. If it can appear several times, you would have to use a for-loop and check every element separately:
for amount in missing_amount:
    for count, item in enumerate(icl_dollar_amount_l):
        if amount == item:
            body = body_l[count]
But frankly, I don't know why you don't check it in the first loop, for link in url:, where you have direct access to icl_dollar_amount and body.
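A sketch of that last suggestion, with the browser work replaced by ready-made (amount, body) pairs so that it runs on its own. `match_wanted` and `pages` are hypothetical names; in the real function the pair would come from the regular expression and the page text inside the `for link in url:` loop.

```python
def match_wanted(pages, wanted):
    """Collect (amount, body) pairs whose amount is one of `wanted`.

    `pages` stands in for what the scraper reads per link; the check
    happens while amount and body are still paired, so no index is needed.
    """
    wanted = set(wanted)
    matches = []
    for amount, body in pages:
        if amount in wanted:
            matches.append((amount, body))
    return matches


pages = [
    ("144,827.95", "Total: $144,827.95"),
    ("5,199,024.87", "Total: $5,199,024.87"),
    ("130,710.67", "Total: $130,710.67"),
]

print(match_wanted(pages, ["130,710.67"]))
```

Checking inside the first loop also handles repeated amounts, since every page is tested on its own.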
I have a class assignment to write a Python program that downloads the last 25 years of end-of-day data for the major global stock market indices from Yahoo Finance:
Dow Jones Index (USA)
S&P 500 (USA)
NASDAQ (USA)
DAX (Germany)
FTSE (UK)
HANGSENG (Hong Kong)
KOSPI (Korea)
CNX NIFTY (India)
Unfortunately, when I run the program an error occurs.
  File "C:\ProgramData\Anaconda2\lib\site-packages\yahoofinancials\__init__.py", line 91, in format_date
    form_date = datetime.datetime.fromtimestamp(int(in_date)).strftime('%Y-%m-%d')
ValueError: timestamp out of range for platform localtime()/gmtime() function
If you see below, you can see the code that I have written. I'm trying to debug my mistakes. Can you help me out please? Thanks
from yahoofinancials import YahooFinancials
import pandas as pd
# Select Tickers and stock history dates
index1 = '^DJI'
index2 = '^GSPC'
index3 = '^IXIC'
index4 = '^GDAXI'
index5 = '^FTSE'
index6 = '^HSI'
index7 = '^KS11'
index8 = '^NSEI'
freq = 'daily'
start_date = '1993-06-30'
end_date = '2018-06-30'
# Function to clean data extracts
def clean_stock_data(stock_data_list):
    new_list = []
    for rec in stock_data_list:
        if 'type' not in rec.keys():
            new_list.append(rec)
    return new_list
# Construct yahoo financials objects for data extraction
dji_financials = YahooFinancials(index1)
gspc_financials = YahooFinancials(index2)
ixic_financials = YahooFinancials(index3)
gdaxi_financials = YahooFinancials(index4)
ftse_financials = YahooFinancials(index5)
hsi_financials = YahooFinancials(index6)
ks11_financials = YahooFinancials(index7)
nsei_financials = YahooFinancials(index8)
# Clean returned stock history data and remove dividend events from price history
daily_dji_data = clean_stock_data(dji_financials
.get_historical_stock_data(start_date, end_date, freq)[index1]['prices'])
daily_gspc_data = clean_stock_data(gspc_financials
.get_historical_stock_data(start_date, end_date, freq)[index2]['prices'])
daily_ixic_data = clean_stock_data(ixic_financials
.get_historical_stock_data(start_date, end_date, freq)[index3]['prices'])
daily_gdaxi_data = clean_stock_data(gdaxi_financials
.get_historical_stock_data(start_date, end_date, freq)[index4]['prices'])
daily_ftse_data = clean_stock_data(ftse_financials
.get_historical_stock_data(start_date, end_date, freq)[index5]['prices'])
daily_hsi_data = clean_stock_data(hsi_financials
.get_historical_stock_data(start_date, end_date, freq)[index6]['prices'])
daily_ks11_data = clean_stock_data(ks11_financials
.get_historical_stock_data(start_date, end_date, freq)[index7]['prices'])
daily_nsei_data = clean_stock_data(nsei_financials
.get_historical_stock_data(start_date, end_date, freq)[index8]['prices'])
stock_hist_data_list = [{'^DJI': daily_dji_data}, {'^GSPC': daily_gspc_data}, {'^IXIC': daily_ixic_data},
{'^GDAXI': daily_gdaxi_data}, {'^FTSE': daily_ftse_data}, {'^HSI': daily_hsi_data},
{'^KS11': daily_ks11_data}, {'^NSEI': daily_nsei_data}]
# Function to construct data frame based on a stock and it's market index
def build_data_frame(data_list1, data_list2, data_list3, data_list4, data_list5, data_list6, data_list7, data_list8):
    data_dict = {}
    i = 0
    for list_item in data_list2:
        if 'type' not in list_item.keys():
            data_dict.update({list_item['formatted_date']: {'^DJI': data_list1[i]['close'], '^GSPC': list_item['close'],
                                                            '^IXIC': data_list3[i]['close'], '^GDAXI': data_list4[i]['close'],
                                                            '^FTSE': data_list5[i]['close'], '^HSI': data_list6[i]['close'],
                                                            '^KS11': data_list7[i]['close'], '^NSEI': data_list8[i]['close']}})
        i += 1
    tseries = pd.to_datetime(list(data_dict.keys()))
    df = pd.DataFrame(data=list(data_dict.values()), index=tseries,
                      columns=['^DJI', '^GSPC', '^IXIC', '^GDAXI', '^FTSE', '^HSI', '^KS11', '^NSEI']).sort_index()
    return df
Your problem is that your datetime stamps are in the wrong format. If you look at the error message, it tells you:
datetime.datetime.fromtimestamp(int(in_date)).strftime('%Y-%m-%d')
Notice the int(in_date) part?
It wants the unix timestamp. There are several ways to get this, out of the time module or the calendar module, or using Arrow.
import datetime
import calendar
date = datetime.datetime.strptime("1993-06-30", "%Y-%m-%d")
start_date = calendar.timegm(date.utctimetuple())
* UPDATED *
OK so I fixed up to the dataframes portion. Here is my current code:
# Select Tickers and stock history dates
index = {'DJI': YahooFinancials('^DJI'),
         'GSPC': YahooFinancials('^GSPC'),
         'IXIC': YahooFinancials('^IXIC'),
         'GDAXI': YahooFinancials('^GDAXI'),
         'FTSE': YahooFinancials('^FTSE'),
         'HSI': YahooFinancials('^HSI'),
         'KS11': YahooFinancials('^KS11'),
         'NSEI': YahooFinancials('^NSEI')}
freq = 'daily'
start_date = '1993-06-30'
end_date = '2018-06-30'
# Clean returned stock history data and remove dividend events from price history
daily = {}
for k in index:
    tmp = index[k].get_historical_stock_data(start_date, end_date, freq)
    if tmp:
        daily[k] = tmp['^{}'.format(k)]['prices'] if 'prices' in tmp['^{}'.format(k)] else []
Unfortunately I had to fix a couple things in the yahoo module. For the class YahooFinanceETL:
@staticmethod
def format_date(in_date, convert_type):
    try:
        x = int(in_date)
        convert_type = 'standard'
    except:
        convert_type = 'unixstamp'
    if convert_type == 'standard':
        if in_date < 0:
            form_date = datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=in_date)
        else:
            form_date = datetime.datetime.fromtimestamp(int(in_date)).strftime('%Y-%m-%d')
    else:
        split_date = in_date.split('-')
        d = date(int(split_date[0]), int(split_date[1]), int(split_date[2]))
        form_date = int(time.mktime(d.timetuple()))
    return form_date
AND:
# private static method to scrap data from yahoo finance
@staticmethod
def _scrape_data(url, tech_type, statement_type):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    script = soup.find("script", text=re.compile("root.App.main")).text
    data = loads(re.search("root.App.main\s+=\s+(\{.*\})", script).group(1))
    if tech_type == '' and statement_type != 'history':
        stores = data["context"]["dispatcher"]["stores"]["QuoteSummaryStore"]
    elif tech_type != '' and statement_type != 'history':
        stores = data["context"]["dispatcher"]["stores"]["QuoteSummaryStore"][tech_type]
    else:
        if "HistoricalPriceStore" in data["context"]["dispatcher"]["stores"]:
            stores = data["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]
        else:
            stores = data["context"]["dispatcher"]["stores"]["QuoteSummaryStore"]
    return stores
You will want to look at the daily dict and rewrite your build_data_frame function, which should be a lot simpler now since you are already working with a dictionary.
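One possible shape for that rewrite, shown as a sketch rather than a tested replacement: regroup the `daily` dict by date first, then hand the result to pandas. The record keys 'formatted_date', 'close' and 'type' are the ones used in the question's code. `merge_closes` is a hypothetical helper name, and the prices below are made-up sample values.

```python
def merge_closes(daily):
    """Map each date to {index_name: close} across all indices."""
    by_date = {}
    for name, prices in daily.items():
        for rec in prices:
            if 'type' in rec:  # skip dividend/split events, as clean_stock_data does
                continue
            by_date.setdefault(rec['formatted_date'], {})[name] = rec['close']
    return by_date


# Made-up sample data in the same shape as the `daily` dict above.
daily = {
    'DJI': [{'formatted_date': '2018-06-28', 'close': 100.0},
            {'formatted_date': '2018-06-29', 'close': 101.0}],
    'GSPC': [{'formatted_date': '2018-06-28', 'close': 50.0},
             {'formatted_date': '2018-06-28', 'type': 'DIVIDEND'}],
}

merged = merge_closes(daily)
print(merged)
```

From there, `pd.DataFrame.from_dict(merged, orient='index').sort_index()` should give one row per date and one column per index, with NaN where an index has no quote for that day. Grouping by date also avoids relying on all eight lists lining up position by position.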
I am actually the maintainer and author of YahooFinancials. I just saw this post and wanted to personally apologize for the inconvenience and let you all know I will be working on fixing the module this evening.
Could you please open an issue on the module's Github page detailing this?
It would also be very helpful to know which version of python you were running when you encountered these issues.
https://github.com/JECSand/yahoofinancials/issues
I am at work right now, however as soon as I get home in ~7 hours or so I will attempt to code a fix and release it. I'll also work on the exception handling. I try my best to maintain this module, but my day (and often night time) job is rather demanding. I will report back with the final results of these fixes and publish to pypi when it is done and stable.
Also, if anyone else has feedback or personal fixes to offer, it would be a huge help in fixing this. Proper credit will be given, of course. I am also in desperate need of contributors, so if anyone is interested in that as well, let me know. I really want to take YahooFinancials to the next level and have this project become a stable and reliable alternative for free financial data for Python projects.
Thank you for your patience and for using YahooFinancials.
I've got a script that grabs every event off of a Google Calendar, then searches through those events and prints the ones that contain a search term to a file.
The problem I'm having is that I need them to be put in order by date, and this doesn't seem to do that.
while True:
    events = calendar_acc.events().list(calendarId='myCal',pageToken=page_token).execute()
    for event in events['items']:
        if 'Search Term' in event['summary']:
            #assign names to hit, and date
            find = event['summary']
            date = event['start'][u'dateTime']
            #only print the first ten digits of the date.
            month = date[5:7]
            day = date[8:10]
            year = date[0:4]
            formatted_date = month+"/"+day+"/"+year
            #write a line
            messy.write(formatted_date+" "+event['summary']+"\n\n")
I think there is a way to do this with the time module maybe, but I'm not sure. Any help is appreciated.
Just in case anyone else needs to do this. With the help of jedwards.
I ended up creating an empty list: hits
And then appending the ['start']['dateTime'] as a datetime.datetime object,
and ['summary'] to the list for each event that contained my "Search Term". Like so:
hits = []
while True:
    events = calendar_acc.events().list(calendarId='My_Calendar_ID', pageToken=page_token).execute()
    for event in events['items']:
        if "Search Term" in event['summary']:
            hits.append((dateutil.parser.parse(event['start']['dateTime']), event['summary']))
    page_token = events.get('nextPageToken')
    if not page_token:
        break
Then you just sort the list. In my case, I cut the datetime object down to just the date and then wrote the whole line to a file, but this code just prints it to the console.
hits.sort()
for x in hits:
    d = x[0]
    date = "%d/%d/%d" % (d.month, d.day, d.year)
    final = str(date)+"\t\t"+str(x[1])
    print(final)
Thanks again to jedwards in the comments!
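For reference, the same collect-sort-format idea can be sketched with only the standard library, on made-up events shaped like the API items used above. `sorted_hits` is a hypothetical helper. `datetime.fromisoformat` needs Python 3.7 or later and is assumed here to accept the timestamps as returned. All-day events carry a 'date' key instead of 'dateTime', so they would need separate handling.

```python
from datetime import datetime


def sorted_hits(events, term):
    """Return 'MM/DD/YYYY summary' lines for matching events, oldest first."""
    hits = []
    for event in events:
        if term in event.get('summary', ''):
            start = datetime.fromisoformat(event['start']['dateTime'])
            hits.append((start, event['summary']))
    hits.sort(key=lambda hit: hit[0])  # order by start time only
    return [f"{d.month:02d}/{d.day:02d}/{d.year} {s}" for d, s in hits]


# Made-up sample events for illustration.
events = [
    {'summary': 'Search Term B', 'start': {'dateTime': '2015-03-02T09:00:00-05:00'}},
    {'summary': 'Other', 'start': {'dateTime': '2015-01-01T09:00:00-05:00'}},
    {'summary': 'Search Term A', 'start': {'dateTime': '2015-02-10T14:30:00-05:00'}},
]

for line in sorted_hits(events, 'Search Term'):
    print(line)
```

Sorting on the parsed datetime rather than on the formatted MM/DD/YYYY string matters, because the formatted strings would sort by month before year.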
You can have the API return the list already sorted by start time (ascending) by setting the "orderBy" parameter to "startTime", which also requires singleEvents=True. Setting it to "updated" sorts by last modification time instead.
page_token = None
while True:
    events = service.events().list(calendarId=myID, orderBy='startTime', singleEvents=True, pageToken=page_token).execute()
    for event in events['items']:
        print(event)
    page_token = events.get('nextPageToken')
    if not page_token:
        break
For more information see: https://developers.google.com/calendar/v3/reference/events/list
Hope this helps.