Skip exceptions in Python

I have a newbie question:
Let's say I have this list of stocks in Python:
import requests

list = ["AMZN", "APPL", "BAC"]
try:
    for x in list:
        url = 'https://financialmodelingprep.com/api/v3/quote-short/' + x + '?apikey=demo'
        response = requests.request('GET', url)
        result = response.json()
        print(result[0]["price"])
except:
    pass
The second ticker will throw an exception. How do I make Python carry on with the third ticker no matter what happens with the second ticker's request?

Use try-except inside the for loop, like below:
import requests

list = ["AMZN", "APPL", "BAC"]
for x in list:
    try:
        url = 'https://financialmodelingprep.com/api/v3/quote-short/' + x + '?apikey=demo'
        response = requests.request('GET', url)
        result = response.json()
        print(result[0]["price"])
    except:
        pass

You can use continue
import requests

list = ["AMZN", "APPL", "BAC"]
for x in list:
    try:
        url = 'https://financialmodelingprep.com/api/v3/quote-short/' + x + '?apikey=demo'
        response = requests.request('GET', url)
        result = response.json()
        print(result[0]["price"])
    except:
        continue
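A bare except will also hide typos and swallow things like KeyboardInterrupt, so if you only want to skip tickers whose request or data lookup fails, a narrower variant might look like this (a sketch; which exception "APPL" actually triggers is an assumption):

import requests

tickers = ["AMZN", "APPL", "BAC"]
for x in tickers:
    try:
        url = 'https://financialmodelingprep.com/api/v3/quote-short/' + x + '?apikey=demo'
        result = requests.get(url).json()
        print(result[0]["price"])
    except (requests.RequestException, IndexError, KeyError) as e:
        # Log which ticker was skipped instead of failing silently.
        print(f"Skipping {x}: {e}")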

Related

Index error: list index out of range - How to skip a broken URL?

How can I tell my program to skip broken / non-existent URLs and continue with the task? Every time I run this, it stops whenever it encounters a URL that doesn't exist and gives the error: IndexError: list index out of range.
The range is URLs from 1 to 450, but there are some pages in the mix that are broken (for example, URL 133 doesn't exist).
import requests
import pandas as pd
import json
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup

df = pd.DataFrame()
for id in range(1, 450):
    url = f"https://liiga.fi/api/v1/shotmap/2022/{id}"
    res = requests.get(url)
    soup = BeautifulSoup(res.content, "lxml")
    s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
    s = s.replace('null', '"placeholder"')
    data = json.loads(s)
    data = json_normalize(data)
    matsit = pd.DataFrame(data)
    df = pd.concat([df, matsit], axis=0)

df.to_csv("matsit.csv", index=False)
I would assume your index error comes from the line of code with the following statement:
s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
You could solve it like this:
try:
    s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
except IndexError as IE:
    print(f"Indexerror: {IE}")
    continue
If the error does not occur on the line above, just catch the exception on the line where the index error is occurring. Alternatively, you can also just catch all exceptions with
try:
    code_where_exception_occurs
except Exception as e:
    print(f"Exception: {e}")
    continue
but I would recommend being as specific as possible, so that you handle all expected errors in the appropriate way.
In the example above, replace code_where_exception_occurs with your code. You could also put the try/except clause around the whole block of code inside the for loop, but it is best to catch all exceptions individually.
This should also work:
try:
    url = f"https://liiga.fi/api/v1/shotmap/2022/{id}"
    res = requests.get(url)
    soup = BeautifulSoup(res.content, "lxml")
    s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
    s = s.replace('null', '"placeholder"')
    data = json.loads(s)
    data = json_normalize(data)
    matsit = pd.DataFrame(data)
    df = pd.concat([df, matsit], axis=0)
except Exception as e:
    print(f"Exception: {e}")
    continue
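Following the recommendation to be specific, here is a sketch of the whole loop that catches the likely failure points individually (exactly which exceptions the broken IDs raise is an assumption):

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

frames = []
for id in range(1, 450):
    url = f"https://liiga.fi/api/v1/shotmap/2022/{id}"
    try:
        res = requests.get(url)
        s = BeautifulSoup(res.content, "lxml").select('html')[0].text
    except requests.RequestException as e:
        print(f"Request failed for {url}: {e}")
        continue
    except IndexError:
        print(f"No content for {url}, skipping")
        continue
    s = s.strip('jQuery1720724027235122559_1542743885014(').strip(')').replace('null', '"placeholder"')
    try:
        data = json.loads(s)
    except json.JSONDecodeError as e:
        print(f"Could not parse JSON for {url}: {e}")
        continue
    frames.append(pd.json_normalize(data))

df = pd.concat(frames, axis=0)
df.to_csv("matsit.csv", index=False)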
The main issue is that you get a 204 (No Content) response (e.g.: https://liiga.fi/api/v1/shotmap/2022/405) for some of the urls, so simply use an if-statement to check and handle this:
for i in range(400, 420):
    url = f"https://liiga.fi/api/v1/shotmap/2022/{i}"
    r = requests.get(url)
    if r.status_code != 200:
        print(f'Error occurred: {r.status_code} on url: {url}')
        #### log or do whatever you like to do in case of error
    else:
        data.append(pd.json_normalize(r.json()))
Note: As already mentioned in https://stackoverflow.com/a/73584487/14460824, there is no need to use BeautifulSoup; use pandas directly instead to keep your code clean.
Example
import requests, time
import pandas as pd

data = []
for i in range(400, 420):
    url = f"https://liiga.fi/api/v1/shotmap/2022/{i}"
    r = requests.get(url)
    if r.status_code != 200:
        print(f'Error occurred: {r.status_code} on url: {url}')
    else:
        data.append(pd.json_normalize(r.json()))

pd.concat(data, ignore_index=True)  #.to_csv("matsit", index=False)
Output
Error occurred: 204 on url: https://liiga.fi/api/v1/shotmap/2022/405

How to get all UniProt results as tsv for query

I'm looking for a programmatic way to get all the UniProt ids and sequences (Swiss-Prot + TrEMBL) for a given protein length, but if I run my query I only get the first 25 results. Is there any way to run a loop to get them all?
My code:
import requests, sys

WEBSITE_API = "https://rest.uniprot.org"

# Helper function to download data
def get_url(url, **kwargs):
    response = requests.get(url, **kwargs)
    if not response.ok:
        print(response.text)
        response.raise_for_status()
        sys.exit()
    return response

r = get_url(f"{WEBSITE_API}/uniprotkb/search?query=length%3A%5B100%20TO%20109%5D&fields=id,accession,length,sequence", headers={"Accept": "text/plain; format=tsv"})

with open("request.txt", "w") as file:
    file.write(r.text)
I ended up with a different URL which seems to work for giving back more/all results. It was taking a long time to return all ~5M, so I've restricted it to just 10,000 for now:
import requests
import pandas as pd
import io
import sys

# Helper function to download data
def get_url(url, **kwargs):
    response = requests.get(url, **kwargs)
    if not response.ok:
        print(response.text)
        response.raise_for_status()
        sys.exit()
    return response

# create the url
base_url = 'https://www.uniprot.org/uniprot/?query=length:[100%20TO%20109]'
additional_fields = [
    '&format=tab',
    '&limit=10000',  # currently limits the number of returned results; comment out this line to get all output
    '&columns=id,accession,length,sequence',
]
url = base_url + ''.join(additional_fields)
print(url)

# get the results as a large pandas table
r = get_url(url)
df = pd.read_csv(io.StringIO(r.text), sep='\t')
df
Output: a DataFrame with the id, accession, length and sequence columns.
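If you'd rather stay on the newer rest.uniprot.org endpoint from the question, it pages its results and advertises the next page in a Link response header; here is a hedged sketch that follows those links (the size parameter and the rel="next" Link header are assumptions about the current API):

import requests

def get_all_tsv(url, out_path):
    # Follow rel="next" links from the Link response header until exhausted.
    with open(out_path, "w") as fh:
        first = True
        while url:
            r = requests.get(url, headers={"Accept": "text/plain; format=tsv"})
            r.raise_for_status()
            lines = r.text.splitlines(keepends=True)
            fh.writelines(lines if first else lines[1:])  # drop the repeated header row
            first = False
            url = r.links.get("next", {}).get("url")  # None once there is no next page

get_all_tsv(
    "https://rest.uniprot.org/uniprotkb/search"
    "?query=length%3A%5B100%20TO%20109%5D&fields=id,accession,length,sequence&size=500",
    "request.txt",
)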

grequests module: how do I print just the URL and the Response?

Can anyone please explain how I can split the results to get just the plain URL and response?
I have tried so many times but got nothing; for now I can only print something like:
50
0.4110674999999999
........, [<Response [200]>], [<Response [200]>], [<Response [200]>]]
[......, ['http://example.com.com/catalogue/page-48.html'], ['http://example.com.com/catalogue/page-49.html'], ['http://example.com.com/catalogue/page-50.html']]
I need it like this:
<Response [200]>
https://example.com/
Thanks so much.
P.S. Also, why do I get this message on the console after installing the grequests module?
C:\P3\lib\site-packages\grequests.py:22: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (C:\\P3\\lib\\site-packages\\urllib3\\util\\ssl_.py)', 'urllib3.util (C:\\P3\\lib\\site-packages\\urllib3\\util\\__init__.py)'].
curious_george.patch_all(thread=False, select=False)
How can I fix it? Uninstall Python completely, install some patch, or what?
Thanks!
import grequests
from bs4 import BeautifulSoup
import time

def get_urls():
    urls = []
    for x in range(1, 51):
        urls.append(f'http://books.toscrape.com/catalogue/page-{x}.html')
    return urls

def get_data(urls):
    reqs = [grequests.get(link) for link in urls]
    resp = grequests.map(reqs)
    return resp

if __name__ == '__main__':
    start = time.perf_counter()
    urls = get_urls()
    url = len(get_urls())
    resp = get_data(urls)
    respo = len(get_data(urls))
    fin = time.perf_counter() - start

    resp_list = resp
    chunked_resp = list()
    chunk_size = respo
    urls_list = urls
    chunked_url = list()
    chunk_size = url

    print(urls)
    print(url)
    print(resp)
    print(respo)
    print(fin)

    resp_list = resp
    chunked_resp = list()
    chunk_size = 1
    for i in range(0, len(resp_list), chunk_size):
        chunked_resp.append(resp_list[i:i+chunk_size])
    print(chunked_resp)

    urls_list = urls
    chunked_url = list()
    chunk_size = 1
    for i in range(0, len(urls_list), chunk_size):
        chunked_url.append(urls_list[i:i+chunk_size])
    print(chunked_url)
OK, I have got a solution, but only for printing the URLs:
def get_data(urls):
    reqs = [grequests.get(link) for link in urls]
    resp = grequests.get(reqs)
    return resp

if __name__ == '__main__':
    start = time.perf_counter()
    urls = get_urls()
    resp = get_data(urls)
    resp = '\n'.join(resp)
    url = '\n'.join(urls)
http://books.toscrape.com/catalogue/page-48.html
http://books.toscrape.com/catalogue/page-49.html
http://books.toscrape.com/catalogue/page-50.html
resp = '\n'.join(resp)
TypeError: can only join an iterable
But for resp I get TypeError: can only join an iterable.
P.S. I only started learning Python a month ago... :(
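The TypeError comes from the fact that grequests.get only builds a request object (it is grequests.map that sends the batch and returns a list of responses), and str.join needs strings anyway. A minimal sketch that prints each response next to its URL, assuming map keeps the request order:

import grequests

urls = [f'http://books.toscrape.com/catalogue/page-{x}.html' for x in range(1, 51)]
reqs = [grequests.get(link) for link in urls]
resp = grequests.map(reqs)  # sends the batch; returns Response objects in request order

for link, r in zip(urls, resp):
    # r can be None when a request failed, so guard before printing it.
    print(r if r is not None else '<no response>')
    print(link)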

Why does my for loop only check the last item?

I've built an async program that's going to check if an element exists on multiple paths of a website.
The program has a base URL, and it gets the different paths of the domain to check from a JSON file (name.json).
If the element I'm looking for exists, the program should print out "1". But I've quickly realised that it only checks the last item in the JSON list.
import json
import grequests
from bs4 import BeautifulSoup

idlist = json.loads(open('name.json').read())
baseurl = 'https://steamcommunity.com/id/'

for uid in idlist:
    fullurl = baseurl + uid

rs = (grequests.get(fullurl) for uid in idlist)
resp = grequests.map(rs)

for r in resp:
    soup = BeautifulSoup(r.text, 'lxml')
    if soup.find('span', class_='actual_persona_name'):
        print('1')
    else:
        print('2')
The JSON file just consists of a random array to test the program.
["xyz",
"sdasda9229",
"sdasda923229",
"sda",
"sda",
"sda",
"sd2",
"aaaaaa",
"aaaaaaaaa",
"aa2092425",
"aaaa23917"]
After appending the id to the base URL, it's not getting stored. You have to store it and pass the complete URLs when constructing the GET requests:
complete_urls = []
for uid in idlist:
    fullurl = baseurl + uid
    complete_urls.append(fullurl)

rs = (grequests.get(fullurl) for fullurl in complete_urls)
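Equivalently, you can skip the intermediate list and build each URL inside the generator expression itself (a short sketch of the same fix):

# Each get() now receives its own freshly built URL instead of the last value bound to fullurl.
rs = (grequests.get(baseurl + uid) for uid in idlist)
resp = grequests.map(rs)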

Error when using Requests with Python 3.5 recursively (Google Places API)

I have been having a problem where I send a GET request and, if there is a next page token in the result, take that link and execute another request recursively until there is no next page token in the result.
The first request works fine, but when there is a next page token in the response and it tries to execute the new request, the result is an INVALID_REQUEST response; yet if I take the link given in the result and use it in Postman or in my browser, everything is fine.
I'm assuming it has something to do with requests running on different threads at the same time.
The second response from the request using Python:
{'html_attributions': [], 'status': 'INVALID_REQUEST', 'results': []}
Here is what I have:
import requests

def getPlaces(location, radius, type, APIKEY):
    url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?location="+location+"&radius="+radius+"&type="+type+"&key="+APIKEY
    print('Getting results for type ' + type + '...')
    r = requests.get(url)
    response = r.json()
    results = []
    if response['status'] == 'ZERO_RESULTS':
        print("Did not find results for the type "+type)
    else:
        print("Results for type "+type)
        for result in response['results']:
            results.append(result)
            print(result)
    print('Printing results')
    print(results)
    if 'next_page_token' in response:
        print("There is a next page")
        page_token = response['next_page_token']
        print(page_token)
        next_results = getNextPlace(page_token, APIKEY)
        print(next_results)
        results.append(next_results)
    return results

# Get the rest of the results
def getNextPlace(page_token, APIKEY):
    print('...')
    next_url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location='+location+'&radius='+radius+'&type='+type+'&pagetoken='+page_token+'&key='+APIKEY
    print(next_url)
    r = requests.get(next_url)
    response = r.json()
    results = []
    print(response)
    if response['status'] == 'ZERO_RESULTS':
        print("Did not find results")
    elif response['status'] == 'INVALID_REQUEST':
        print('Invalid response')
    else:
        for next_result in response['results']:
            results.append(next_result)
            print(next_result)
    if 'next_page_token' in response:
        new_page_token = response['next_page_token']
        getNext = getNextPlace(new_page_token, APIKEY)
        results.append(getNext)
    return results
Figured out the issue!
The Google API doesn't allow consecutive requests if the last request was made within ~2 seconds.
What I did was have the program sleep for 3 seconds and then send the request.
Now everything is working fine.
What you are trying to do can be done in one function, like this:
import time
import requests

def getPlaces(location, radius, API, i, type):
    url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?location="+location+"&radius="+radius+"&key="+API+"&types="+type
    r = requests.get(url)
    response = r.json()
    results = []
    for result in response['results']:
        results.append(result)
    l = []
    while True:
        if 'next_page_token' in response:
            page_token = response['next_page_token']
            l.append(page_token)
            next_url = url + '&pagetoken=' + l[i]
            i = i + 1
            time.sleep(3)
            r = requests.get(next_url)
            response = r.json()
            for next_result in response['results']:
                results.append(next_result)
        else:
            break
    return results
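For reference, a hypothetical call of that function (the coordinates, radius and type below are placeholder values; i should start at 0 because it indexes the list of page tokens collected inside the loop):

places = getPlaces("60.1699,24.9384", "1500", "YOUR_API_KEY", 0, "restaurant")
print(f"Collected {len(places)} places")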
Your code prints "Invalid response" because response['status'] == 'INVALID_REQUEST', i.e. the Google API service thinks your URL request is invalid.
As the documentation says, the parameters location, radius, type and key are required, and pagetoken is optional. So your second request URL is invalid because it does not include all of the required parameters.
Maybe you should try changing the url to:
next_url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location='+location+"&radius="+radius+"&type="+type+"&key="+APIKEY + "&pagetoken=" + page_token
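A slightly tidier way to make sure no required parameter gets dropped is to let requests build the query string from a dict (a sketch, not from the original answers; the parameter names mirror the ones used above):

import requests

def nearby_search(location, radius, type, APIKEY, page_token=None):
    # requests encodes the params dict into the query string, so the required
    # fields are always present; pagetoken is only added when one is available.
    params = {"location": location, "radius": radius, "type": type, "key": APIKEY}
    if page_token:
        params["pagetoken"] = page_token
    r = requests.get("https://maps.googleapis.com/maps/api/place/nearbysearch/json", params=params)
    return r.json()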
