Issues dynamically changing Python 'Requests' header to iterate through API URL endpoints - python

My issue is as follows:
I am attempting to pull down a list of all email address entries from an API. The data set is so large that it spans multiple API 'pages' with unique URLs. The page number can be specified as a parameter in the API request URL. I wrote a loop to try and collect email information from an API page, add the email addresses to a list, add 1 to the page number, and repeat the process up to 30 pages. Unfortunately it seems like the loop is only querying the same page 30 times and producing duplicates. I feel like I'm missing something simple (beginner here) but please let me know if anyone can help. Code is below:
import requests
import json
number = 1
user_list = []
parameters = {'page': number, 'per_page':50}
response = requests.get('https://api.com/profiles.json', headers=headers, params=parameters)
while number <=30:
    formatted_data = response.json()
    profiles = formatted_data['profiles']
    for dict in profiles:
        user_list.append(dict['email'])
    number = number + 1
print(sorted(user_list))

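Python evaluates number once, when the parameters dictionary is created, and stores that value (1) in it. Reassigning number later does not change what the dictionary holds.
That means you need to update the dictionary after every iteration.
You also need to place requests.get() inside your loop; otherwise the same response is parsed on every pass.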
import requests
import json
number = 1
user_list = []
parameters = {'page': number, 'per_page':50}
while number <=30:
    # headers is assumed to be defined elsewhere, as in the question
    response = requests.get('https://api.com/profiles.json', headers=headers, params=parameters)
    formatted_data = response.json()
    profiles = formatted_data['profiles']
    for dict_ in profiles: # avoid shadowing built-in names such as dict
        user_list.append(dict_['email'])
    number = number + 1
    parameters['page'] = number # update the page before the next request
print(sorted(user_list))
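For what it's worth, the same idea can be written so that the parameters are rebuilt on every pass, which makes it hard to forget the update. This is only a rough sketch: `collect_emails`, `fetch_page` and the fake data are made-up names for illustration, and stopping on an empty page is an assumption about how the API behaves past the last page.

```python
def collect_emails(fetch_page, max_pages=30):
    """Collect emails page by page.

    fetch_page is a hypothetical callable that takes the query
    parameters and returns the decoded JSON for that page.
    """
    emails = []
    for page in range(1, max_pages + 1):
        # Build the parameters inside the loop so each request
        # carries the current page number.
        params = {"page": page, "per_page": 50}
        data = fetch_page(params)
        profiles = data.get("profiles", [])
        if not profiles:
            # Assumption: an empty page means there is nothing left.
            break
        emails.extend(profile["email"] for profile in profiles)
    return emails


# Stand-in for the real API so the sketch runs without network access.
fake_pages = {
    1: [{"email": "b@example.com"}, {"email": "a@example.com"}],
    2: [{"email": "c@example.com"}],
}
requested_pages = []


def fake_fetch(params):
    requested_pages.append(params["page"])
    return {"profiles": fake_pages.get(params["page"], [])}


user_list = collect_emails(fake_fetch)
print(sorted(user_list))
```

With the real API, `fetch_page` would be a small function that calls `requests.get('https://api.com/profiles.json', headers=headers, params=params)` and returns `.json()`.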

Related

Search query keeps returning 0 for google maps API query

I'm trying to find the number of museums in each city in the UK by using the Google maps API. I keep getting a 0 search result with the following code. I thought it might be because I didn't enable billing on my Google Maps projects but I enabled billing and it still didn't work. Then I created a new API key and that didn't work either. Here is my code:
import requests
import json
api_key = ''
query = 'museums'
location = '51.509865,0.1276' # lat,lng of London
radius = 10000 # search radius in meters
url = f'https://maps.googleapis.com/maps/api/place/textsearch/json?query={query}&location={location}&radius={radius}&key={api_key}'
#url = f'https://maps.googleapis.com/maps/api/place/textsearch/json?query={query}&key={api_key}'
response = requests.get(url)
data = json.loads(response.text)
# retrieve the number of results
num_results = len(data['results'])
print(f'Number of results for "{query}" in "{location}": {num_results}')
I'm also open to trying a different method or package if that works.
And what it returns:
Number of results for "museum" in "51.509865,0.1276": 0
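One thing worth checking before counting results: as far as I know, the Text Search response is a JSON object that carries a top-level `status` string (for example `OK`, `ZERO_RESULTS` or `REQUEST_DENIED`) next to `results`, and an `error_message` when the request is rejected. An empty `results` list with a non-OK status points at the request or key rather than at the search itself. The helper name and the sample dictionary below are made up for illustration; the sample is hand-written, not real API output.

```python
def summarize_places_response(data):
    """Return (status, number of results, error message or None)."""
    status = data.get("status", "UNKNOWN")
    results = data.get("results", [])
    message = data.get("error_message")
    return status, len(results), message


# Hand-written placeholder shaped like a rejected request.
sample = {
    "results": [],
    "status": "REQUEST_DENIED",
    "error_message": "(placeholder message)",
}

status, count, message = summarize_places_response(sample)
print(f"status={status}, results={count}, error={message}")
```

In the code from the question this would be called as `summarize_places_response(data)` right after `data = json.loads(response.text)`.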

How do I get more than 500 codereviews with Gerrit REST Api?

I'm writing a Python script to count how many changes were made within a timeframe for all projects, but when I use the Gerrit REST API I can only get up to a maximum of 500 unique users, and I want to see all of them, even if I use a long timeframe (1 year Gerrit picture). This is my function for the API:
def requestAPICall(url):
    """
    does API stuff
    """
    response = requests.get(url)
    if response.status_code == 200:
        JSON_response = json.loads(response.text[4:])
        generateJSON(JSON_response)
        return (JSON_response, True)
    print("Error Occurred")
    return (response, False)
This is the link I used for the request in this case
https://chromium-review.googlesource.com/changes/?q=since:%222022-01-01%2011:26:25%20%2B0100%22+before:%222023-01-01%2011:31:25%20%2B0100%22
I have tried curl commands but I do not know if that works
There is a default limit on the number of returned items, and if you're making anonymous queries I don't believe you can change this. From the documentation:
The query string must be provided by the q parameter. The n parameter can be used to limit the returned results. The no-limit parameter can be used remove the default limit on queries and return all results (does not apply to anonymous requests). This might not be supported by all index backends.
However, you can retrieve paginated results using the start parameter:
If the number of changes matching the query exceeds either the internal limit or a supplied n query parameter, the last change object has a _more_changes: true JSON field set.
The S or start query parameter can be supplied to skip a number of changes from the list.
So if the final result sets _more_changes: true, you can make a subsequent request using the start parameter.
That means your Python code is going to look something like:
import json
import requests
import sys
class Gerrit:
    """Wrap up Gerrit API functionality in a simple class to make
    it easier to consume from our code. This limited example only
    supports the `changes` endpoint.
    See https://gerrit-review.googlesource.com/Documentation/rest-api.html
    for complete REST API documentation.
    """
    def __init__(self, baseurl):
        self.baseurl = baseurl
    def changes(self, query, start=None, limit=None, options=None):
        """This implements the API described in [1].
        [1]: https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html
        """
        params = {"q": query}
        if start is not None:
            params["S"] = start
        if limit is not None:
            params["n"] = limit
        if options is not None:
            params["o"] = options
        res = requests.get(f"{self.baseurl}/changes", params=params)
        print(f"fetched [{res.status_code}]: {res.url}", file=sys.stderr)
        res.raise_for_status()
        return json.loads(res.text[4:])
# And here is an example in which we use the Gerrit class to perform a
# query against https://chromium-review.googlesource.com. This is similar
# to the query in your question, but using a constrained date range in order
# to limit the total number of results.
g = Gerrit("https://chromium-review.googlesource.com")
all_results = []
start = 0
while True:
    res = g.changes(
        'since:"2022-12-31 00:00:00" before:"2023-01-01 00:00:00"',
        limit=200,
        start=start,
    )
    if not res:
        break
    all_results.extend(res)
    if not res[-1].get("_more_changes"):
        break
    start += len(res)
# Here we're just dumping all the results as a JSON document on
# stdout.
print(json.dumps(all_results))
This demonstrates how to use limit to control the number of results returned in a "page", and the start parameter to request additional pages of results.
But look out! The example query here covers only a couple of days and returns over 3000 results; I suspect that any attempt to fetch a year's worth of data, particularly with an anonymous connection, is going to run into some sort of server rate limits.
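A side note on the `text[4:]` slices in the code here: Gerrit puts the four characters `)]}'` in front of its JSON responses, which is why the body cannot be passed to `json.loads` directly. A slightly more defensive version only strips the prefix when it is actually there. `parse_gerrit_json` is a made-up helper name, and the sample body is hand-written rather than a real response.

```python
import json

# Prefix Gerrit prepends to JSON bodies.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(text):
    """Strip the Gerrit prefix if present, then decode the JSON."""
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    return json.loads(text)


# Hand-written sample body, shaped like a short list of changes.
sample_body = ")]}'\n[{\"_number\": 1, \"_more_changes\": true}]"
changes = parse_gerrit_json(sample_body)
print(changes)
```

In the `Gerrit.changes` method this would replace `json.loads(res.text[4:])` with `parse_gerrit_json(res.text)`.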

Python - save multiple responses from multiple requests

I am pulling JSON data from an api and I am looking to pass in a different parameter for each request and save each response
My current code
# create an empty list to store each account id
accounts = []
# store every id in the accounts list
for each in allAccounts['data']:
    accounts.append((each['id']))
# for each account, build the url with that account id
for id in accounts:
    urlAccounts = 'https://example.somewebsite.ie:000/v12345/accounts/'+id+'/users'
I save a response and print out the values.
accountReq = requests.get(urlAccounts, headers=headers)
allUsers = accountReq.json()
for each in allUsers['data']:
    print(each['username']," " +each['first_name'])
This is fine and it works but I only store the first ID's response.
How do I store the responses from all of the requests?
So I'm looking to send multiple requests where the ID changes every time and save each response essentially.
I'm using python version 3.10.4 .
My code for this in case anyone stumbles across this.
# list of each api url to use
link =[]
# for every id in accounts, add a new url to the link list
for i in accounts:
    link.append('https://example.somewebsite.ie:000/v12345/accounts/'+i+'/users')
# create a list with the responses of all the different requests
accountReq = []
for i in link:
    accountReq.append(requests.get(i, headers=headers).json())
# write to a txt file
with open('masterSheet.txt', 'x') as f:
    # for every response
    for each in accountReq:
        # get each account's data
        account = each['data']
        # for each account's data,
        # get each user's username and names
        for data in account:
            sheet=(data['username']+" "+" ",data['first_name'],data['last_name'])
            f.write(str(sheet)+"\n")
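If it is useful to know which account each response came from, the responses can also be kept in a dictionary keyed by account id instead of a flat list. A minimal sketch, assuming every response has a `data` list as in the question; `collect_users`, `fetch_json` and the fake responses are made-up names and data for illustration.

```python
# Placeholder base URL taken from the question.
BASE_URL = "https://example.somewebsite.ie:000/v12345"


def collect_users(account_ids, fetch_json):
    """Map each account id to the list of users returned for it.

    fetch_json is a hypothetical callable that takes a URL and
    returns the decoded JSON response.
    """
    users_by_account = {}
    for account_id in account_ids:
        url = f"{BASE_URL}/accounts/{account_id}/users"
        users_by_account[account_id] = fetch_json(url)["data"]
    return users_by_account


# Stand-in responses so the sketch runs without the real API.
fake_responses = {
    f"{BASE_URL}/accounts/a1/users": {"data": [{"username": "u1", "first_name": "Ann"}]},
    f"{BASE_URL}/accounts/b2/users": {"data": [{"username": "u2", "first_name": "Bob"}]},
}

users = collect_users(["a1", "b2"], fake_responses.__getitem__)
for account_id, account_users in users.items():
    for user in account_users:
        print(account_id, user["username"], user["first_name"])
```

With the real API, `fetch_json` would be something like `lambda url: requests.get(url, headers=headers).json()`.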

How to get a JSON value that varies with each request by its number

I made a request to the Instagram v1 API and it returns the response in JSON like this:
The JSON data on pastebin.com
I noticed that I can get the IDs and the number of IDs with:
IDs = response['reels'][ide]["media_ids"]
count=response['reels'][ide]["media_count"]
I don't know how to use these IDs to extract the story URLs, because the number of entries changes with the number of stories.
If there is another way to extract them, that may solve my problem too.
Note that the "url" key is not unique; it is also used in other values.
Assuming the "media URLs" are the values associated with a key "url" then you can just do this:
import json
def print_url(jdata):
    if isinstance(jdata, list):
        for v in jdata:
            print_url(v)
    elif isinstance(jdata, dict):
        if (url := jdata.get('url')):
            print(url)
        else:
            print_url(list(jdata.values()))
with open('instagram.json', encoding='utf-8') as data:
    print_url(json.load(data))
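If you would rather have the URLs in a list than printed, the same recursive walk can return them. This is a sketch, not a drop-in for the real response: `collect_urls` is a made-up name, the sample structure is invented, and unlike `print_url` it keeps descending into the other values of a dictionary that has a "url" key.

```python
def collect_urls(jdata, key="url"):
    """Return every string stored under `key`, at any nesting depth."""
    found = []
    if isinstance(jdata, list):
        for item in jdata:
            found.extend(collect_urls(item, key))
    elif isinstance(jdata, dict):
        value = jdata.get(key)
        if isinstance(value, str):
            found.append(value)
        # Keep walking the remaining values for nested matches.
        for name, child in jdata.items():
            if name != key:
                found.extend(collect_urls(child, key))
    return found


# Invented sample, only loosely shaped like a nested API response.
sample = {
    "reels": {
        "1": {
            "media_count": 2,
            "items": [
                {"url": "https://example.com/a.jpg"},
                {"video": {"url": "https://example.com/b.mp4"}},
            ],
        }
    }
}

urls = collect_urls(sample)
print(urls)
```

Since the "url" key is used in several places, the returned list may still need filtering by where in the structure each entry came from.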

How to add multiple responses into a single json object

I am making some requests to another site from my Flask API. Basically my Flask API is a proxy. Initially I substitute the parameters with the known company id and get all the worker ids. Given the worker ids, I make another request to get all their details. However, with the code below I only get the last response, which means only the details of the last worker. You can ignore the j==1 for now; I did it for testing purposes.
tempDict={}
updateDic={}
dictToSend={}
j=0
#i = companyid
#id=workerid
# I make several calls to url2 depending on the number of employee ids in number
for id in number:
    url2="someurl/" + str(i)+ "/contractors/"+str(id)
    r = requests.get(url2, headers={'Content-type': 'application/json',"Authorization":authenticate,'Accept': 'application/json'})
    print("id"+str(id))
    print(url2)
    loadJsonResponse2=json.loads(r.text)
    print(loadJsonResponse2)
    key = i
    tempDict.update(loadJsonResponse2)
    # I want to have all of their details and add the company number before
    print(tempDict)
    if(j==1):
        dictToSend[key]=tempDict
        return jsonify(dictToSend)
    j=j+1
return jsonify(dictToSend)
So I have all the workers ids and I request the other url to get all their details. The response is in json format. However I am only getting the last response with the above code. I did something like j==1 because I wanted to check the return.
dictToSend[key]=tempDict
return jsonify(dictToSend)
The key is the company id so that I can identify which company the worker is from.
How can I concatenate all the JSON responses and, at the end, add a key like "5":{concatenation of all json requests}?
Thank you,
The key for your JSON object is the company id:
#i = companyid
.
.
.
key = i
.
.
.
# You are adding all your responses to companyid,
# better make a key with companyid and workerid
# key = str(companyid) + ":" + str(workerid)
dictToSend[key]=tempDict
And here
# you may not need this, since there is already a loop iterating on workerid
if(j==1):
    dictToSend[key]=tempDict
    return jsonify(dictToSend)
j=j+1
# then the only useful line would be
dictToSend[key]=tempDict
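To get the shape asked for in the question, "5":{all worker details}, one option is to nest the workers under the company id, each under its own worker id. Note that dict.update() overwrites keys that already exist, so if every response has the same field names, only the last worker survives in tempDict. A minimal sketch, where `collect_worker_details`, `fetch_worker` and the fake data are made-up names for illustration:

```python
import json


def collect_worker_details(company_id, worker_ids, fetch_worker):
    """Return {company_id: {worker_id: details}} for all workers.

    fetch_worker is a hypothetical callable that takes the company id
    and worker id and returns the decoded JSON for that worker.
    """
    workers = {}
    for worker_id in worker_ids:
        # One entry per worker, so nothing gets overwritten.
        workers[str(worker_id)] = fetch_worker(company_id, worker_id)
    return {str(company_id): workers}


# Stand-in data so the sketch runs without the remote site.
fake_workers = {7: {"name": "Ann"}, 8: {"name": "Bob"}}


def fake_fetch(company_id, worker_id):
    return fake_workers[worker_id]


result = collect_worker_details(5, [7, 8], fake_fetch)
print(json.dumps(result))
```

In the Flask view, `fetch_worker` would wrap the requests.get call from the question, and the view would end with `return jsonify(result)`.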
