Python Script skipping for loop

I am trying to iterate through some JSON data, but the information is spread across several pages. Working through the first page is fine, however the script just skips over the next set. The weird thing is that it executes fine in debug mode. I'm guessing it's a timing issue around the json loads, but I tried putting sleep timers around that code and the issue persisted.
import requests
import json

# First page of results
url = apipath + query + apikey
response = requests.get(url)
data = json.loads(response.text)

for x in data["results"]:
    nameList.append(x["name"])
    latList.append(x["geometry"]["location"]["lat"])
    lonList.append(x["geometry"]["location"]["lng"])

# Second page, requested with the next_page_token from the first response
pagetoken = "pagetoken=" + data["next_page_token"]
url = apipath + pagetoken + apikey
response = requests.get(url)
data = json.loads(response.text)

for x in data["results"]:
    nameList.append(x["name"])
    latList.append(x["geometry"]["location"]["lat"])
    lonList.append(x["geometry"]["location"]["lng"])

I would venture to guess that data["results"] comes back empty (or missing) on that second response, so the body of your for loop simply never runs. Have you tried putting a print above the loop? Try print(data["results"]) before entering it to make sure the data you want actually exists; if that looks empty, try just print(data) and see what the program is actually receiving.

Well it did end up being a timing issue. I placed a 2 second timer before the second request and it now will load the data just fine. I guess Python just couldn't keep up.
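For reference, a minimal sketch of the fix the asker describes, assuming this is the Google Places API (whose freshly issued next_page_token reportedly needs a moment before it becomes usable); apipath, query, apikey and the three lists are the variables from the question:

import time
import requests

nameList, latList, lonList = [], [], []

# apipath, query and apikey are the variables defined elsewhere in the question.
url = apipath + query + apikey
while True:
    data = requests.get(url).json()
    for x in data.get("results", []):
        nameList.append(x["name"])
        latList.append(x["geometry"]["location"]["lat"])
        lonList.append(x["geometry"]["location"]["lng"])

    token = data.get("next_page_token")
    if not token:
        break                           # no more pages
    time.sleep(2)                       # give the freshly issued token time to become valid
    url = apipath + "pagetoken=" + token + apikey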

Related

How do you loop through an API with a list of parameters and store the resulting calls in one DataFrame

I'm trying to loop over a list of match IDs (LMID5) and use them as parameters for API calls. I think the looping over the API calls is correct, since it prints the URLs, but I'm struggling to store the results of every call in the same DataFrame.
The API results come through as JSON, which I then normalise into a DataFrame.
When calling the API with just one parameter, this is how I create the DataFrame:
responsematchDetails = requests.get(url = matchDetails)
dfLM = pd.json_normalize(responsematchDetails.json()['data'])
The issue is when trying to loop through a list of parameters and store everything in one DataFrame. The code below is what I have written to try to loop many API calls using parameters from the list, but I'm struggling to store the data each time:
for i in list(LMID5):
    url = 'https://api.football-data-api.com/match?key=&match_id=' + str(i)
    rm = requests.get(url)
    print(url)
    for match in pd.json_normalize(rm.json()["data"]):
        dfMatchDetails = dfMatchDetails.append({[match]
                                                }, ignore_index=True)
Can you try this:
dfMatchDetails = pd.DataFrame()

for i in list(LMID5):
    url = 'https://api.football-data-api.com/match?key=&match_id=' + str(i)
    rm = requests.get(url)
    print(url)
    dfMatchDetails = pd.concat([dfMatchDetails, pd.json_normalize(rm.json()['data'])])
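A variation on the same idea, shown here as a sketch: concatenating inside the loop copies the growing DataFrame on every iteration, so it is usually cheaper to collect the per-call frames in a list and call pd.concat once at the end. LMID5 and the (redacted) API key are assumed to be the ones from the question.

import pandas as pd
import requests

frames = []
for i in LMID5:                          # LMID5 is the list of match IDs from the question
    url = 'https://api.football-data-api.com/match?key=&match_id=' + str(i)
    rm = requests.get(url)
    frames.append(pd.json_normalize(rm.json()['data']))

# One concat at the end instead of one per iteration
dfMatchDetails = pd.concat(frames, ignore_index=True)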

How can I speed up a GET request, or what is a faster method?

I have some code inside of an app that is slowing me down way too much, and it's a simple 'get' function...
This portion of the code just finds the location of a PDF on the internet and then extracts it. I thought it was the extraction process that was taking so long, but after some testing I believe it's the 'get' request. I am passing a variable into the URL because there are many different PDFs that the user can indirectly select. I have tried to use Kivy's UrlRequest, but I honestly can't get my head around getting a result from it; I have heard it is faster, though. I have another two 'post' sessions in different functions that work ten times faster than this one, so I'm not sure what the issue is.
The rest of my program works just fine; it's just this that sometimes adds upwards of 20-25 seconds to load times (which is unreasonable).
I will include a working extract of the problem below for you to try.
I have found that the first attempt at an "airport_loc" is the slowest; please try swapping out the airport_loc variable with some of these examples:
"YPAD"
"YMLT"
"YPPH"
What can I do different here to speed it up or simply make it more efficient?
import requests
from html2text import html2text
import re

s = requests.Session()

page = s.get('https://www.airservicesaustralia.com/aip/pending/dap/AeroProcChartsTOC.htm')
text = html2text(page.text)

airport_loc = "YSSY"

finding_airport = (re.search(r'.%s.' % re.escape(airport_loc), text)).group()
ap_id_loc = int(text.index(finding_airport)) + 6
ap_id_onward = text[ap_id_loc:]
next_loc = re.search(r'[(]Y\w\w\w[)]', ap_id_onward)
next_loc_stop = text.index(next_loc.group())
ap_id_to_nxt_ap = text[ap_id_loc:next_loc_stop]
needed_text = (html2text(ap_id_to_nxt_ap))

airport_id_less_Y = airport_loc[1:]

app_1 = re.search(r'%sGN.*' % re.escape(airport_id_less_Y), needed_text)
app_2 = re.search(r'%sII.*' % re.escape(airport_id_less_Y), needed_text)

try:
    if app_2.group():
        line_of_chart = (app_2.group())
except:
    if app_1.group():
        line_of_chart = (app_1.group())

chart_title = (re.search(r'\w\w\w\w\w\d\d[-]\d*[_][\d\w]*[.]pdf', line_of_chart)).group()

# getting exact pdf now
chart_PDF = ('https://www.airservicesaustralia.com/aip/pending/dap/' + chart_title)

retrieve = s.get(chart_PDF)
content = retrieve.content
print(content)
# from here on is working fine.
I haven't included the code that follows this because I don't think it's really relevant.
Please help me speed this thing up :(
It still takes about 3 seconds for me with just your code, so the latency might come from the server.
To make the request a little faster, I tried editing the HTTP adapter like this:
s.mount('http://', requests.adapters.HTTPAdapter(max_retries=0))
retrieve = s.get(chart_PDF)
It shows a little improvement (3 sec -> 2 sec), but it comes with a risk of failure.
Using asyncio or another async HTTP library is a better approach.
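As a rough illustration of that last suggestion, here is a minimal sketch using aiohttp to fetch several chart PDFs concurrently. The file names in chart_urls are placeholders; in practice they would be the chart_title values parsed from the TOC page as in the question.

import asyncio
import aiohttp

# Placeholder chart file names; in practice these come from parsing the TOC page.
chart_urls = [
    'https://www.airservicesaustralia.com/aip/pending/dap/' + name
    for name in ('EXAMPLE01-100_1.pdf', 'EXAMPLE02-100_1.pdf')
]

async def fetch(session, url):
    # Download one PDF and return its raw bytes.
    async with session.get(url) as resp:
        return await resp.read()

async def main():
    async with aiohttp.ClientSession() as session:
        pdfs = await asyncio.gather(*(fetch(session, url) for url in chart_urls))
        for url, content in zip(chart_urls, pdfs):
            print(url, len(content), 'bytes')

asyncio.run(main())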

How can I make the process of checking URLs faster?

I have to check 1,000,000 URLs as part of a project; they have randomly been set to be either valid or invalid. I have written the code and it works, but I was wondering if there is a way to make it better, i.e. faster or more efficient.
I don't know much about this world of efficiency, but I have heard the word multithreading thrown around. Would that help, and how do I do that?
import requests

# url = "http://*******/(number from 1 to 1,000,000)"
available_numbers = []

for i in range(1, 1000000):
    url = f"http://*************/{i}"
    data = requests.get(url)
    if data.status_code == 200:
        available_numbers.append(i)

print(available_numbers)
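Since the question explicitly asks about multithreading, here is a minimal sketch of one common approach, using concurrent.futures.ThreadPoolExecutor so that many requests are in flight at once. The placeholder URL is the masked one from the question; the timeout and worker count are arbitrary example values.

import requests
from concurrent.futures import ThreadPoolExecutor

def check(i):
    # Return i if the URL answers with HTTP 200, otherwise None.
    url = f"http://*************/{i}"          # masked URL from the question
    try:
        if requests.get(url, timeout=10).status_code == 200:
            return i
    except requests.RequestException:
        return None
    return None

available_numbers = []
with ThreadPoolExecutor(max_workers=50) as pool:   # worker count chosen arbitrarily
    for result in pool.map(check, range(1, 1000000)):
        if result is not None:
            available_numbers.append(result)

print(available_numbers)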

Python - Threads, sending 20 requests - first to arrive, first to serve?

So I was thinking of using requests to send, say, 20 requests to a site, and the first one to return a value should be the value my code uses before continuing. In other words, whenever there is a value inside a string (or whatever), just continue the code. However, I got stuck. What I have done so far only sends one request:
my_key = load["MyKey"]            # My own key for the website.
website_key = webkey
url = webUrl

client = MyClient(my_key)
task = Task(url, website_key)
values = client.createTask(task)
values.join()
value = values.get_response()
print(value)
So basically, with .join it waits for the value from the website and then returns it via get_response whenever it's ready. However, when I do this, it only does one request and then ends.
What I want is to send, say, 25 of them, and whichever hits a value first wins: end the others and continue (or end the program).
What would be the best solution for that?
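A common "first result wins" pattern is sketched below with concurrent.futures: submit every request to a thread pool and take whichever future completes first. The get_value helper is a hypothetical stand-in for the client/task calls in the question (that library isn't shown here), and my_key, website_key and webUrl are the question's own variables.

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def get_value(url):
    # Hypothetical stand-in for the question's client/task calls:
    # create a task for the site, wait for it, return its response.
    client = MyClient(my_key)
    task = Task(url, website_key)
    job = client.createTask(task)
    job.join()
    return job.get_response()

urls = [webUrl] * 25                        # the ~25 requests mentioned in the question

pool = ThreadPoolExecutor(max_workers=len(urls))
futures = [pool.submit(get_value, u) for u in urls]

done, not_done = wait(futures, return_when=FIRST_COMPLETED)
value = next(iter(done)).result()           # first value to arrive wins
for f in not_done:
    f.cancel()                              # cancels queued work; already-running threads finish on their own
pool.shutdown(wait=False)                   # don't block on the remaining requests

print(value)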

Too slow to get my wall from the Graph API

I want to know how much time (in seconds) it should take to get my whole Facebook wall as JSON from the Graph API.
It currently takes about 190 seconds to fetch all of my wall's posts (roughly 2000 posts across 131 pages of JSON).
The Python code follows; it just reads the posts.
Is there a problem in my code, and how can I cut the response time?
import json
import time
import urllib

accessToken = "Secret"
requestURL = "https://graph.facebook.com/me/feed?access_token=" + accessToken

beforeSec = time.time() * 1000
pages = 1

while 1:
    read = urllib.urlopen(requestURL).read()
    read = json.loads(read)
    data = read["data"]
    for i in range(0, len(data)):
        pass                      # just reading the posts
    try:
        requestURL = read["paging"]["next"]
        pages += 1
    except:
        break

afterSec = time.time() * 1000
print afterSec - beforeSec
It depends, of course, on how big the user's wall is... They have released a new batch function: http://developers.facebook.com/docs/reference/api/batch/
Maybe you can use that?
Your code is synchronous, so you download the pages one by one, which is very slow; you could download several pages in parallel instead.
Greenlets are the current hype for parallel work in Python, so give gevent a try (see the sketch below).
Of course, this only works if you can obtain the next page before downloading the entire previous page, so try to see if you can get the next paging link in a quick way.
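A minimal sketch of that gevent suggestion, under the assumption that the page URLs can be built up front (for example with offset-based paging) instead of being read from each previous response; the base URL and access token are the ones from the question, and the limit/offset parameters are illustrative:

import json
import urllib

import gevent
from gevent import monkey
monkey.patch_all()                 # patch sockets so the urllib calls can overlap

accessToken = "Secret"
base = "https://graph.facebook.com/me/feed?access_token=" + accessToken

# Assumption: paging links that can be computed in advance (offset-based).
page_urls = [base + "&limit=25&offset=%d" % (25 * i) for i in range(131)]

def fetch(url):
    # Download and parse one page of the feed.
    return json.loads(urllib.urlopen(url).read())

jobs = [gevent.spawn(fetch, url) for url in page_urls]
gevent.joinall(jobs)

posts = [post for job in jobs for post in job.value["data"]]
print len(posts)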
