The output should be much longer than it is here.
I start with a GET request and parse a JSON list to extract an id, which I pass to a second function; that gives me a second ID, which I then use to call a third function. But I am only getting one entry when I should be getting many more.
The code is the following:
from requests.auth import HTTPBasicAuth
import requests
import json
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
def countries():
    data = requests.get("https://localhost:8543/api/netim/v1/countries/", verify=False, auth=HTTPBasicAuth("admin", "admin"))
    rep = data.json()
    return [elem.get("id", "") for elem in rep['items']]
def regions():
    for c in countries():
        url = requests.get("https://localhost:8543/api/netim/v1/countries/{}/regions".format(c), verify=False, auth=HTTPBasicAuth("admin", "admin"))
        response = url.json()
        return [cid.get("id", "") for cid in response['items']]
def city():
    for r in regions():
        api = requests.get("https://localhost:8543/api/netim/v1/regions/{}/cities".format(r), verify=False, auth=HTTPBasicAuth("admin", "admin"))
        resolt = api.json()
        return json.dumps([{"name": r.get("name", ""), "id": r.get("id", "")} for r in resolt['items']], indent=4)

city()
print(city())
The output is the following:
[
    {
        "name": "Herat",
        "id": "AF~HER~Herat"
    }
]
I should have a huge list, so what am I missing?
You need to go through all the iterations of your loop and collect the results, then serialize them to JSON and return them.
data = []
for r in regions():
    api = requests.get("https://localhost:8543/api/netim/v1/regions/{}/cities".format(r), verify=False, auth=HTTPBasicAuth("admin", "admin"))
    resolt = api.json()
    data.extend([{"name": r.get("name", ""), "id": r.get("id", "")} for r in resolt['items']])
return json.dumps(data, indent=4)
This would be a fix for city(), but you have the same problem in all your functions: return immediately exits the function and does nothing else, so effectively each of your for loops runs only one iteration.
I'll update my example here to give you a better idea of what's occurring.
Your functions are basically this:
def test_fn():
    for i in [1, 2, 3, 4]:
        return i

# output:
1
# We never see 2, 3, or 4 because we return before looping on them.
What you want:
def test_fn():
    results = []
    for i in [1, 2, 3, 4]:
        results.append(i)
    return results

# output
[1, 2, 3, 4]
It seems like you understand that the for loop takes some action once for each element in the list. What you're missing is that return ends the function immediately: no more for loop, no more actions. In your code you return inside the for loop, stopping any further iterations.
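If you want values one at a time without first collecting them into a list, another option (a general Python feature, not something from the code above) is to make the function a generator with yield:

```python
def test_fn():
    # unlike return, yield hands back one value and pauses the
    # function, so the loop resumes when the next value is requested
    for i in [1, 2, 3, 4]:
        yield i

print(list(test_fn()))  # -> [1, 2, 3, 4]
```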
What am I doing wrong? I have an extractor that works great but writing the test is stumping me and it's failing. Can anyone help me figure out where I'm going wrong?
from unittest.mock import MagicMock, patch
import pandas as pd
import requests
from my_project.task import extractor
from my_project.tests import utils
from prefect.logging import disable_run_logger
CONTACT_RECORD = utils.TEST_CONTACT_RECORD
PAGED_CONTACT_RECORD = utils.TEST_PAGED_CONTACT_RECORD
EXPECTED_CONTACT_RECORD = utils.EXPECTED_CONTACT_RECORD
@patch("requests.get")
def test_contact_extractor(get: MagicMock):
    """
    Should call "requests.get" once and return a json
    containing contact data.
    """
    get.return_value.json.return_value = CONTACT_RECORD
    with disable_run_logger():
        result = extractor.get_contacts()
    assert get.call_count == 1
    assert result == pd.DataFrame(EXPECTED_CONTACT_RECORD)
@patch("my_project.extractor.get_contacts")
def test_get_paged_contacts(get_contacts: MagicMock):
    """
    Should run "requests.get" until ['has-more'] is False
    and there is no offset value.
    """
    get_contacts.return_value.json.side_effect = [
        PAGED_CONTACT_RECORD,
        PAGED_CONTACT_RECORD,
        PAGED_CONTACT_RECORD,
        CONTACT_RECORD,
    ]
    with disable_run_logger():
        data = extractor.get_paged_contacts(
            endpoint=MagicMock, query_string=MagicMock, df=MagicMock
        )
    assert get_contacts.call_count == 4
    assert data == pd.DataFrame(EXPECTED_CONTACT_RECORD)
Some errors I'm getting are:
requests imported but not used
callable[[Union[str, bytes], ...], Response] has no attribute "return_value"
EDIT:
No longer getting the second error because I realized I had a typo, but currently getting:
AttributeError: 'NoneType' object has no attribute 'client'
Edit:
Here is my get_paged_contacts() function:
def get_paged_contacts(
    endpoint: str, query_string: typing.Dict[str, typing.Any], df: pd.DataFrame
) -> pd.DataFrame:
    """
    Return the results of the get request.
    Loops over api response and appends the results of a while loop for pagination, then
    merges the results with the previously extracted dataframe.
    """
    url = endpoint
    contacts = []
    response = requests.request("GET", url, headers=header, params=query_string).json()
    has_more = response["has-more"]
    offset = response["vid-offset"]
    while has_more is True:
        querystring = {"limit": "100", "archived": "false", "offset": offset}
        try:
            response = requests.request(
                "GET", url, headers=header, params=querystring
            ).json()
            time.sleep(10)
        except (requests.exceptions.ConnectionError, json.decoder.JSONDecodeError) as j:
            logger.error(f"Error occurred: {j}.")
            break
        for x in range(len(response["contacts"])):
            contacts.append(response["contacts"][x])
    contacts = json_normalize(contacts)
    merged = pd.concat([df, contacts])
    return merged
After checking the edited question, here is a possible approach. The code under test could be the following:
def get_paged_contacts(endpoint: str,
                       query_string: typing.Dict[str, typing.Any],
                       df: pd.DataFrame) -> pd.DataFrame:
    """
    Return the results of the get request.
    Loops over api response and appends the results of a while loop
    for pagination, then merges the results with the previously
    extracted dataframe.
    """
    url = endpoint
    contacts = []
    response = requests.request("GET", url,
                                headers=header,
                                params=query_string).json()
    has_more = response["has-more"]
    offset = response["vid-offset"]
    # Get the contacts coming from the first response
    contacts.extend(response['contacts'])
    while has_more:
        querystring = {"limit": "100",
                       "archived": "false", "offset": offset}
        try:
            response = requests.request("GET", url,
                                        headers=header,
                                        params=querystring).json()
            # Update the looping condition on every response
            has_more = response["has-more"]
            contacts.extend(response['contacts'])
            time.sleep(10)
        except (requests.exceptions.ConnectionError, json.decoder.JSONDecodeError) as j:
            logger.error(f"Error occurred: {j}.")
            break
    contacts = pd.json_normalize(contacts)
    merged = pd.concat([df, contacts])
    # Reset the dataframe index after concatenating
    merged.reset_index(drop=True, inplace=True)
    return merged
It can be refactored by having all requests inside the while loop, to avoid duplication, but it is not clear how you want to handle the query_string parameter, so I left it as it is. Then, the test code could be something like this:
@patch('my_project.task.extractor.requests.request')
def test_get_paged_contacts(request_mock):
    request_mock.return_value.json.side_effect = [
        PAGED_CONTACT_RECORD,
        PAGED_CONTACT_RECORD,
        PAGED_CONTACT_RECORD,
        CONTACT_RECORD,
    ]
    expected_df = pd.DataFrame(EXPECTED_CONTACT_RECORD)
    input_df = pd.DataFrame()
    res = get_paged_contacts('dummy_endpoint', None, input_df)
    assert request_mock.call_count == 4
    assert_frame_equal(res, expected_df)
The assert_frame_equal function is a utility provided by pandas to check two dataframes for equality, and it is particularly useful for unit testing with pandas dataframes; see the pandas documentation for details. Of course, you need to import it with from pandas.testing import assert_frame_equal
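As a small self-contained illustration (with made-up data, not the real contact records): == on two dataframes produces an element-wise boolean frame, which is ambiguous inside a bare assert, whereas assert_frame_equal raises a detailed AssertionError on any mismatch:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

a = pd.DataFrame({"vid": [1, 2], "email": ["x@example.com", "y@example.com"]})
b = pd.DataFrame({"vid": [1, 2], "email": ["x@example.com", "y@example.com"]})

# passes silently; would raise an AssertionError describing the
# differing values, dtypes, or index if the frames did not match
assert_frame_equal(a, b)
```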
The code below is a sample from my complete program; I tried to make it understandable.
It sends requests to a REST API. It starts with a URL and the number of pages for this specific search, and tries to fetch the content of each page.
Each page has several results. Each result becomes a FinalObject.
Because there are as many API requests as there are pages, I decided to use multi-threading and the concurrent.futures module.
It works but, as I'm new to coding and Python, I still have these two questions:
How do I use ThreadPoolExecutor sequentially in this case?
Is there a better way to handle multi-threading in this case?
from concurrent.futures import ThreadPoolExecutor
from requests import get as re_get

def main_function(global_page_number, headers, url_request):
    # create a list of page numbers
    pages_numbers_list = [i for i in range(global_page_number)]
    # for each page, call the page_handler (multi-threading)
    with ThreadPoolExecutor(max_workers=10) as executor:
        for item in pages_numbers_list:
            executor.submit(
                page_handler,
                item,
                url_request,
                headers
            )

def page_handler(page_number, url_request, headers):
    # we change the page number in the url request
    url_request = change_page(url_request, page_number)
    # new request with the new url
    result = re_get(url_request, headers=headers)
    result = result.json()
    # in the result, we find the list of dicts used to create the
    # final objects
    final_object_creation(result['results_list'])

def change_page(url_request, new_page_number):
    "to increment the value of the 'page=' attribute in the url"
    current_nb_page = ''
    start_nb = url_request.find("page=") + len('page=')
    # collect every digit of the current page number
    while start_nb < len(url_request) and url_request[start_nb].isdigit():
        current_nb_page += url_request[start_nb]
        start_nb += 1
    new_url_request = url_request.replace("page=" + current_nb_page,
                                          "page=" + str(new_page_number))
    return new_url_request

def final_object_creation(results_list):
    'builds the final objects from the requests.get() result'
    global current_id_decision, dict_decisions
    # each item in the results list should become an instance of the final object
    for item in results_list:
        # assign the identifier of the new Decision object
        current_id_decision += 1
        new_id = current_id_decision
        # create the Decision object and add it to the dict of decisions
        dict_decisions[new_id] = FinalObject(item)

class FinalObject:
    def __init__(self, content):
        self.content = content

current_id_decision = 0
dict_decisions = {}

main_function(1000, "headers", "https://api/v1.0/search?page=0&query=test")
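On the first question: submit gives no ordering guarantees by itself, but executor.map yields results in the same order as its inputs, which is the usual way to get "sequential" output from concurrent work. A minimal sketch with a stand-in fetch function (fetch_page here only simulates the real page_handler; it does no network I/O):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_page(page_number):
    # stand-in for the real page_handler: pretend each page
    # yields a small list of results
    return [f"result-{page_number}-{i}" for i in range(2)]

with ThreadPoolExecutor(max_workers=10) as executor:
    # map runs fetch_page concurrently, but yields the results
    # in the order of the input page numbers
    all_pages = list(executor.map(fetch_page, range(5)))

# all_pages[0] holds page 0's results, all_pages[1] page 1's, etc.
print(all_pages[0])  # -> ['result-0-0', 'result-0-1']
```

Collecting into one flat list afterwards (e.g. with itertools.chain) also avoids the shared global state that final_object_creation mutates from multiple threads.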
I want to transfer label_count and card_m to my main Flask Python file. How do I do that? I already tried importing them, but it didn't work. Also, if there is any solution for card_m, I don't want to repeat the request so many times.
import requests
import json
from itertools import chain
from collections import Counter

url = "https://api.trello.com/1/boards/OIeEN1vG/cards"
query = {
    'key': 'e8cac9f95a86819d54194324e95d4db8',
    'token': 'aee28b52f9f8486297d8656c82a467bb4991a1099e23db539604ac35954d5633'
}
response = requests.request(
    "GET",
    url,
    params=query
)
data = response.json()
card_labels_string = list(chain.from_iterable([d['labels'] for d in data]))
card_labels = [c["color"] for c in card_labels_string]
label_count = dict((i, card_labels.count(i)) for i in card_labels)
cards = dict(zip([d['name'] for d in data], [d['shortLink'] for d in data]))
card_m = {}
for key, value in cards.items():
    url_card = "https://api.trello.com/1/cards/{}/members".format(value)
    res = requests.request(
        "GET",
        url_card,
        params=query
    )
    names = [f['fullName'] for f in res.json()]
    card_m.update({key: names})
print(label_count, card_m)
OK, based on your comments I think I can help you out now. There are two things you should do to make this as clean as possible and to avoid bugs later on.
Right now your code is in the global scope. You should avoid that at all costs unless there is literally no other option. So the first thing to do is create a static class for holding this data. Maybe something like this:
import requests
from itertools import chain

class LabelHelper(object):
    card_m = {}
    label_count = None

    @classmethod
    def startup(cls):
        url = "https://api.trello.com/1/boards/OIeEN1vG/cards"
        query = {
            'key': 'e8cac9f95a86819d54194324e95d4db8',
            'token': 'aee28b52f9f8486297d8656c82a467bb4991a1099e23db539604ac35954d5633'
        }
        response = requests.request(
            "GET",
            url,
            params=query
        )
        data = response.json()
        card_labels_string = list(chain.from_iterable([d['labels'] for d in data]))
        card_labels = [c["color"] for c in card_labels_string]
        cls.label_count = dict((i, card_labels.count(i)) for i in card_labels)
        cards = dict(zip([d['name'] for d in data], [d['shortLink'] for d in data]))
        for key, value in cards.items():
            url_card = "https://api.trello.com/1/cards/{}/members".format(value)
            res = requests.request(
                "GET",
                url_card,
                params=query
            )
            names = [f['fullName'] for f in res.json()]
            cls.card_m.update({key: names})

    @classmethod
    def get_data(cls):
        return cls.label_count, cls.card_m
Now we need to run that startup method on this class before we start Flask via app.run. It can look something like this:
if __name__ == '__main__':
    LabelHelper.startup()
    app.run("your interface", your_port)
With that, the static variables are populated with the data. Now you just need to import that static class in whatever file you want and call get_data, and you will get what you want. Like this:
from labelhelper import LabelHelper

def some_function():
    label_count, card_m = LabelHelper.get_data()
FYI, labelhelper in the from import is lowercase because, in general, you would name the file containing that class labelhelper.py.
What do you mean, "transfer"? If you want to use them in another function, do this:
from main_python import other_function
print(label_count, card_m)
other_function(label_count, card_m)
So I'm using the requests Python library to make a series of requests, i.e. Req1, then Req2, then Req3.
The issue is that Req1 keeps repeating itself and never moves on to Req2.
Any help, please?
Code
While true:
Try:
session = requests.session()
r = session.get('Url').text #req1
postdata = 'the post data'
myheader = {'the headers'}
n = session.post('Myurl ', data=postdata, headers=myheaders).text #req2
Request keeps repeating the get request
Your indentation could be the problem, as only the code indented under the while True loop will be repeated.
This in turn would cause the rest of the code not to run, as the loop will never end.
Some errors I also noticed were:
After the try: there is no except:
The T in Try: is uppercase when it should be lowercase
The t in true: should be uppercase
The W in While should be lowercase
A proper example would be
while True:
    try:
        session = requests.session()
        r = session.get('https://example.com').text  # req1
        postdata = {'data': 'data'}
        myheaders = {'header': 'header'}
        n = session.post('https://example.com', data=postdata, headers=myheaders).text  # req2
    except:
        # Some logic for after an error occurs
        # ex:
        print("An error has occurred")
Now this is just nitpicking and isn't all that relevant, but the point of requests.session() is to reuse one connection (and any cookies) across requests, so creating it inside the loop on every iteration is a little redundant; create it once before the loop instead.
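A minimal sketch of that: the session is created once, and everything set on it (headers here, cookies and the connection pool in general) carries over to every request made through it. The URLs are placeholders and left commented out so the snippet runs without a network:

```python
import requests

# create the session once, outside any loop; all requests made
# through it share one connection pool and one set of cookies
session = requests.Session()

# headers set here are sent on every request made through the session
session.headers.update({"User-Agent": "my-script/1.0"})

# placeholder calls (commented out; substitute the real URLs):
# r = session.get("https://example.com").text           # req1
# n = session.post("https://example.com", data={}).text  # req2

print(session.headers["User-Agent"])  # -> my-script/1.0
```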
A Python Requests API client has a function that needs to re-execute if it runs unsuccessfully.
class Kitten(BaseClient):
    def create(self, **params):
        uri = self.BASE_URL
        data = dict(**(params or {}))
        r = self.client.post(uri, data=json.dumps(data))
        return r
If run with
api = Kitten()
data = {"email": "bill@dow.com", "currency": "USD", "country": "US"}
r = api.create(**data)
the issue is that whenever you run it, the first time it always returns the request as GET, even when it is POST. The first time the post is sent, it returns the GET list of entries.
The later requests (the second and after), api.create(**data) returns newly created entries as it should.
There is a status_code for get and post:
# GET
r.status_code == 200
# POST
r.status_code == 201
What would be a better Python way to re-execute when status_code is 200, until a valid 201 is returned?
If you know for sure that the 2nd post will always return your expected value, you can use a ternary operator to perform the check a second time:
class Kitten(BaseClient):
    def create(self, **params):
        uri = self.BASE_URL
        data = dict(**(params or {}))
        r = self._get_response(uri, data)
        return r if r.status_code == 201 else self._get_response(uri, data)

    def _get_response(self, uri, data):
        return self.client.post(uri, data=json.dumps(data))
Otherwise you can put it in a while loop where the condition is that the status code == 201.
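A sketch of that while-loop variant, with a retry cap so a persistently failing endpoint can't loop forever. The names here (retry_until_created, post_fn, FakeResponse) are illustrative stand-ins, not part of the original client; FakeResponse just simulates a first call returning 200 and a second returning 201:

```python
def retry_until_created(post_fn, max_attempts=5):
    """Call post_fn until it returns a 201 response,
    giving up after max_attempts tries."""
    for _ in range(max_attempts):
        r = post_fn()
        if r.status_code == 201:
            return r
    raise RuntimeError("never received a 201 response")

# --- tiny demo with a fake response object ---
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

# first call yields 200 (the stray GET), second yields 201
calls = iter([FakeResponse(200), FakeResponse(201)])
r = retry_until_created(lambda: next(calls))
print(r.status_code)  # -> 201
```

In the real class this would wrap self._get_response(uri, data), so create() keeps retrying the POST until the resource is actually created.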