Asynchronous requests inside the for loop in Python

I have this snippet:
import requests

config = {10: 'https://www.youtube.com/', 5: 'https://www.youtube.com/', 7: 'https://www.youtube.com/',
          3: 'https://sportal.com/', 11: 'https://sportal.com/'}

def test(arg):
    for key in arg.keys():
        requests.get(arg[key], timeout=key)

test(config)
This way everything happens synchronously. I want to do it asynchronously: iterate through the loop without waiting for the response of each address and move on to the next one, until I have iterated through all addresses in the dictionary. Then I want to wait until I get the responses for all addresses, and only after that leave the test function. I know I can do it with threading, but I read that the asyncio library can do it better; I just couldn't implement it. If anyone has an even better suggestion, I'm open to it. Here is my try:
async def test(arg):
    loop = asyncio.get_event_loop()
    tasks = [loop.run_in_executor(requests.get(arg[key], timeout=key) for key in arg.keys())]
    await asyncio.gather(*tasks)

asyncio.run(test(config))

Here is the solution:
import asyncio
import requests

def addresses(adr, to):
    requests.get(adr, timeout=to)

async def test(arg):
    loop = asyncio.get_event_loop()
    tasks = [loop.run_in_executor(None, addresses, arg[key], key) for key in arg.keys()]
    await asyncio.gather(*tasks)

asyncio.run(test(config))
Now it works asynchronously with the asyncio library instead of calling threading directly.
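Note that run_in_executor still hands each requests.get call to a thread-pool worker under the hood; asyncio only coordinates those threads. If you want the requests themselves to be non-blocking, a fully async HTTP client such as aiohttp avoids the thread pool entirely. A minimal sketch of that variant, assuming aiohttp is installed and using the same config dict:
import asyncio
import aiohttp

config = {10: 'https://www.youtube.com/', 5: 'https://www.youtube.com/', 7: 'https://www.youtube.com/',
          3: 'https://sportal.com/', 11: 'https://sportal.com/'}

async def fetch(session, url, timeout):
    # each coroutine awaits its own response without blocking the others
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=timeout)) as resp:
        return await resp.text()

async def test(arg):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url, timeout) for timeout, url in arg.items()]
        return await asyncio.gather(*tasks)

asyncio.run(test(config))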

Some good answers here. I had trouble with this myself (I do a lot of web scraping), so I created a package to help me: async-scrape (https://pypi.org/project/async-scrape/).
It supports GET and POST. I tried to make it as easy to use as possible. You just need to specify a handler function for the response when you instantiate, and then use the scrape_all method to do the work.
It uses the term scrape because I've built in some handlers for common errors that come up when scraping websites.
You can do some other things with it as well, like limiting the call rate if you find you're getting blocked.
An example of its use is:
# Create an instance
from async_scrape import AsyncScrape

def post_process(html, resp, **kwargs):
    """Function to process the gathered response from the request"""
    if resp.status == 200:
        return "Request worked"
    else:
        return "Request failed"

async_scrape = AsyncScrape(
    post_process_func=post_process,
    post_process_kwargs={},
    fetch_error_handler=None,
    use_proxy=False,
    proxy=None,
    pac_url=None,
    acceptable_error_limit=100,
    attempt_limit=5,
    rest_between_attempts=True,
    rest_wait=60,
    call_rate_limit=None,
    randomise_headers=True
)

urls = [
    "https://www.google.com",
    "https://www.bing.com",
]

resps = async_scrape.scrape_all(urls)
To do this inside a loop, I collect the results, add them to a set, and pop off the old ones. For example:
from async_scrape import AsyncScrape
from bs4 import BeautifulSoup as bs

def post_process(html, resp, **kwargs):
    """Function to process the gathered response from the request"""
    soup = bs(html, "html.parser")
    new_urls = [a["href"] for a in soup.find_all("a", {"class": "new_link_on_website"})]
    return [new_urls, resp]

async_scrape = AsyncScrape(
    post_process_func=post_process,
    post_process_kwargs={}
)

# Run the loop
urls = set(["https://initial_webpage.com/"])
processed = set()
all_resps = []

while len(urls):
    resps = async_scrape.scrape_all(urls)
    # Split successful and failed responses
    success_resps = [r for r in resps if not r["error"]]
    errored_resps = [r for r in resps if r["error"]]
    # Get what you want from the successful responses
    for r in success_resps:
        # Add found urls to urls
        urls |= set(r["func_resp"][0])  # "func_resp" is the key to the return from your handler function
        # Collect the response
        all_resps.append(r["func_resp"][1])
        # Add to processed urls
        processed.add(r["url"])  # "url" is the key to the url from the response
    # Remove processed urls
    urls = urls - processed

Related

requests: maximum number of attempts with a waiting time and, in case of failure, give a message in Python

The situation is that sometimes a request does not load or gets stuck. If that happens, or any other error occurs, I would like to retry it "n" times, waiting up to a maximum of 3 seconds for each attempt, and once the attempts are exhausted, show the message f"Could not process {type_1} and {type_2}". Everything runs in parallel with concurrent.futures. Could you help me with that?
import requests
import concurrent.futures
import json

data = [['PEN','USD'],['USD','EUR']]

def currency(element):
    type_1 = element[0]
    type_2 = element[1]
    s = requests.Session()
    url = f'https://usa.visa.com/cmsapi/fx/rates?amount=1&fee=0&utcConvertedDate=07%2F26%2F2022&exchangedate=07%2F26%2F2022&fromCurr={type_1}&toCurr={type_2}'
    a = s.get(url)
    response = json.loads(a)
    value = response["convertedAmount"]
    return value

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = executor.map(currency, data)
    for value in results:
        print(value)
Your code is almost there. Here, I modified a few things:
from concurrent.futures import ThreadPoolExecutor
import time
import requests

def convert_currency(tup):
    from_currency, to_currency = tup
    url = (
        "https://usa.visa.com/cmsapi/fx/rates?amount=1&fee=0"
        "&utcConvertedDate=07%2F26%2F2022&exchangedate=07%2F26%2F2022&"
        f"fromCurr={from_currency}&toCurr={to_currency}"
    )
    session = requests.Session()
    for _ in range(3):
        try:
            response = session.get(url, timeout=3)
            if response.ok:
                return response.json()["convertedAmount"]
        except requests.exceptions.ConnectTimeout:
            time.sleep(3)
    return f"Could not process {from_currency} and {to_currency}"

data = [["VND", "XYZ"], ['PEN','USD'], ["ABC", "XYZ"], ['USD','EUR'], ["USD", "XXX"]]

with ThreadPoolExecutor() as executor:
    results = executor.map(convert_currency, data)
    for value in results:
        print(value)
Notes
- I retried 3 times (see the for loop)
- Use timeout= to specify the timeout (in seconds)
- The .ok attribute will tell if the call was successful
- No need to import json, as the response object can decode JSON with the .json() method
- You might experiment between ThreadPoolExecutor and ProcessPoolExecutor to see which one performs better
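As an alternative to the manual retry loop, requests can also retry at the transport level through urllib3's Retry helper mounted on an HTTPAdapter. A rough sketch of that approach (the backoff and status list here are illustrative values, not taken from the question):
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,                                 # retry up to 3 times
    backoff_factor=1,                        # wait roughly 1s, 2s, 4s between attempts
    status_forcelist=[500, 502, 503, 504],   # also retry on these HTTP status codes
)
session.mount("https://", HTTPAdapter(max_retries=retries))

# timeouts still have to be passed per request
response = session.get("https://usa.visa.com/cmsapi/fx/rates", timeout=3)
Unlike the loop above, this raises an exception once the retries are exhausted rather than returning a message, so you would still wrap the call in a try/except to produce the f"Could not process ..." string.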

REST API requests - Use Concurrent.futures in the right way

The code below is a sample from my complete program; I tried to make it understandable.
It sends requests to a REST API. It starts with a URL and the number of pages for this specific search, then tries to fetch the content of each page.
Each page has several results. Each result becomes a FinalObject.
Because there are as many API requests as there are pages, I decided to use multi-threading and the concurrent.futures module.
It works, but as I'm new to coding and Python, I still have these two questions:
How can I use ThreadPoolExecutor sequentially in this case?
Is there a better way to handle multi-threading in this case?
from concurrent.futures import ThreadPoolExecutor
from requests import get as re_get

def main_function(global_page_number, headers, url_request):
    # create a list of page numbers
    pages_numbers_list = [i for i in range(global_page_number)]
    # for each page, call the page_handler (multi-threading)
    with ThreadPoolExecutor(max_workers=10) as executor:
        for item in pages_numbers_list:
            executor.submit(
                page_handler,
                item,
                url_request,
                headers
            )

def page_handler(page_number, url_request, headers):
    # we change the page number in the url request
    url_request = change_page(url_request, page_number)
    # new request with the new url
    result = re_get(url_request, headers=headers)
    result = result.json()
    # in the result, we find the list of dicts needed to create the
    # final objects
    final_object_creation(result['results_list'])

def change_page(url_request, new_page_number):
    "to change the value of the 'page=' attribute in the url"
    current_nb_page = ''
    start_nb = url_request.find("page=") + len('page=')
    # collect every digit of the current page number
    while start_nb < len(url_request) and url_request[start_nb].isdigit():
        current_nb_page += url_request[start_nb]
        start_nb += 1
    new_url_request = url_request.replace("page=" + current_nb_page,
                                          "page=" + str(new_page_number))
    return new_url_request

def final_object_creation(results_list):
    'from the object returned by requests.get(), build the final objects'
    global current_id_decision, dict_decisions
    # each item in the results list should become an instance of the final object
    for item in results_list:
        # set the identifier of the new Decision object
        current_id_decision += 1
        new_id = current_id_decision
        # create the Decision object and add it to the dict of decisions
        dict_decisions[new_id] = FinalObject(item)

class FinalObject:
    def __init__(self, content):
        self.content = content

current_id_decision = 0
dict_decisions = {}

main_function(1000, "headers", "https://api/v1.0/search?page=0&query=test")
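One common pattern here is to keep the Future objects returned by executor.submit (or use executor.map, which yields results in input order) so the results can be collected instead of written into globals. A rough sketch under that assumption; having page_handler return its results list, and the handle_result callback, are my illustrative changes rather than part of the original code:
from concurrent.futures import ThreadPoolExecutor, as_completed

def main_function(global_page_number, headers, url_request):
    pages = range(global_page_number)
    with ThreadPoolExecutor(max_workers=10) as executor:
        # submit every page and keep the Future objects
        futures = [executor.submit(page_handler, page, url_request, headers)
                   for page in pages]
        # as_completed yields each future as soon as its request finishes
        for future in as_completed(futures):
            results_list = future.result()   # re-raises any exception from the worker thread
            for item in results_list:
                handle_result(item)          # hypothetical callback, replace with your own logic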

Iteratively create a list of URLs from a list and call each one in a request

I'm trying to iteratively create URLs by combining two URL variables with a list of unique UUIDs that need to be added to the URL like so: .com/{test_adv_uuid}/. My current code looks like this:
import logging
import requests
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

order_id = 1
order_amount = 10

test_adv_uuid = [
    f'08d79951-1908-4f0f-91aa-40eee5ac0e3f',
    f'30816356-4be2-4def-9793-2913dc7dae82',
    f'40a9420a-3bad-4778-930d-0e3da35355d1',
    f'9c2c4477-21ea-4b90-b72b-db40c8ae9754',
    f'6f0fa70b-4914-458d-8b02-0b6a4475773f',
    f'9614bd9f-afa0-4d93-b709-beb38c99cd66',
]

test_adv_url_body = f'https://listen.whatever.com/{test_adv_uuid}/pixel.png'
test_adv_url_attr = f'https://listen.whatever.com/{test_adv_uuid}/pixel.png?order={order_id}&value={order_amount}'

# look up thread pool stuff for understanding on the download & logging
THREAD_POOL = 16

session = requests.Session()
session.mount(
    'https://',
    requests.adapters.HTTPAdapter(pool_maxsize=THREAD_POOL,
                                  max_retries=3,
                                  pool_block=True)
)

def get(url):
    response = session.get(url)
    logging.info("request was completed in %s seconds [%s]", response.elapsed.total_seconds(), response.url)
    if response.status_code != 200:
        logging.error("request failed, error code %s [%s]", response.status_code, response.url)
    if 500 <= response.status_code < 600:
        # server is overloaded? give it a break
        time.sleep(5)
    return response

def download(urls):
    with ThreadPoolExecutor(max_workers=THREAD_POOL) as executor:
        # wrap in a list() to wait for all requests to complete
        for response in list(executor.map(get, urls)):
            if response.status_code == 200:
                print(response.content)

def main():
    logging.basicConfig(
        format='%(asctime)s.%(msecs)03d %(levelname)-8s %(message)s',
        level=logging.INFO,
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    urls = [
        test_adv_url_body,
        test_adv_url_attr
    ]
    download(urls)

if __name__ == "__main__":
    main()
The output I get combines all the uuid items in the list into a single url and looks like:
request was completed in 0.286232 seconds [https://listen.whatever.com/%5B'08d79951-1908-4f0f-91aa-40eee5ac0e3f',%20'30816356-4be2-4def-9793-2913dc7dae82',%20'40a9420a-3bad-4778-930d-0e3da35355d1',%20'9c2c4477-21ea-4b90-b72b-db40c8ae9754',%20'6f0fa70b-4914-458d-8b02-0b6a4475773f',%20'9614bd9f-afa0-4d93-b709-beb38c99cd66'%5D/pixel.png?order=1&value=10]
How would I go about refactoring this so that each of these URLs with the dynamic UUIDs becomes a separate, single request?
Would I iteratively create the URLs in a list, or within the request call itself?
Try this:
# better style to name your lists with plurals
test_adv_uuids = [
    '08d79951-1908-4f0f-91aa-40eee5ac0e3f',
    '30816356-4be2-4def-9793-2913dc7dae82',
    '40a9420a-3bad-4778-930d-0e3da35355d1',
    '9c2c4477-21ea-4b90-b72b-db40c8ae9754',
    '6f0fa70b-4914-458d-8b02-0b6a4475773f',
    '9614bd9f-afa0-4d93-b709-beb38c99cd66',
]
# use list comprehensions to build the lists of urls
test_adv_url_bodies = [f'https://listen.whatever.com/{uuid}/pixel.png' for uuid in test_adv_uuids]
test_adv_url_attrs = [f'https://listen.whatever.com/{uuid}/pixel.png?order={order_id}&value={order_amount}' for uuid in test_adv_uuids]
and then in your main() you'd have
urls = test_adv_url_bodies + test_adv_url_attrs
This is technically slightly less efficient than building the urls list in a single for loop rather than with two list comprehensions, so I'll leave that as an exercise for the reader (one possible version is sketched below).
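For reference, that single-loop version might look like this (same URLs, just one pass over the UUIDs):
urls = []
for uuid in test_adv_uuids:
    # build both variants for each uuid in one pass
    urls.append(f'https://listen.whatever.com/{uuid}/pixel.png')
    urls.append(f'https://listen.whatever.com/{uuid}/pixel.png?order={order_id}&value={order_amount}')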

python asyncio asynchronously fetch data by key from a dict when the key becomes available

As the title says, my use case is like this:
I have an aiohttp server that accepts requests from clients. When a request comes in, I generate a unique request id for it and send a {req_id: req_payload} dict to some workers (the workers are not written in Python, so they run in another process). When the workers complete the work, I get back the responses and put them in a result dict like this: {req_id_1: res_1, req_id_2: res_2}.
Then I want my aiohttp server handler to await on the result dict above, so that when the specific response becomes available (by req_id) it can be sent back.
I built the example code below to try to simulate the process, but got stuck implementing the coroutine async def fetch_correct_res(req_id), which should fetch the correct response by req_id asynchronously, without blocking.
import random
import asyncio
import shortuuid

n_tests = 1000
idxs = list(range(n_tests))

req_ids = []
for _ in range(n_tests):
    req_ids.append(shortuuid.uuid())

res_dict = {}

async def fetch_correct_res(req_id):
    pass

async def handler(req):
    res = await fetch_correct_res(req)
    assert req == res, "the correct res for the req should exactly be the req itself."
    print("got correct res for req: {}".format(req))

async def randomly_put_res_to_res_dict():
    for _ in range(n_tests):
        random_idx = random.choice(idxs)
        await asyncio.sleep(random_idx / 1000)
        res_dict[req_ids[random_idx]] = req_ids[random_idx]
        print("req: {} is back".format(req_ids[random_idx]))
So:
Is it possible to make this solution work? How?
If the above is not possible, what would be the correct solution for this use case with asyncio?
Many thanks.
The only approach I can think of for now is to pre-create some asyncio.Queue objects with pre-assigned ids, then assign one queue to each incoming request, so the handler just awaits on that queue; when the response comes back I put it into that pre-assigned queue only, and after the request is fulfilled I collect the queue back to reuse it for the next incoming request. Not very elegant, but it would solve the problem, roughly as sketched below.
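A minimal sketch of that queue-per-request idea (simplified to create a fresh asyncio.Queue per request instead of recycling a pre-created pool, which is my simplification rather than the plan above):
import asyncio

pending = {}  # req_id -> asyncio.Queue holding that request's response

async def handler(req_id):
    queue = asyncio.Queue(maxsize=1)
    pending[req_id] = queue
    try:
        # suspends only this coroutine until the worker's result is delivered
        return await queue.get()
    finally:
        del pending[req_id]

async def deliver(req_id, result):
    # called when the worker's response for req_id comes back
    await pending[req_id].put(result)

async def main():
    task = asyncio.create_task(handler("req-1"))
    await asyncio.sleep(0.1)             # simulate the worker taking some time
    await deliver("req-1", "payload")    # the response arrives
    print(await task)                    # -> payload

asyncio.run(main())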
See if the sample implementation below fulfils your need.
Basically, you want to respond to each request (id) with its response, whose arrival order you cannot predict, in an asynchronous way.
So at request-handling time, populate the dict with {request_id: {'event': <asyncio.Event>, 'result': <result>}} and await asyncio.Event.wait(). Once the response is received, signal the event with asyncio.Event.set(), which releases the await; then fetch the response from the dict based on the request id.
I modified your code slightly to pre-populate the dict with the request id and await asyncio.Event.wait() until the signal comes from the response side.
import random
import asyncio
import shortuuid

n_tests = 10
idxs = list(range(n_tests))

req_ids = []
for _ in range(n_tests):
    req_ids.append(shortuuid.uuid())

res_dict = {}

async def fetch_correct_res(req_id, event):
    await event.wait()
    res = res_dict[req_id]['result']
    return res

async def handler(req, loop):
    print("incoming request id: {}".format(req))
    event = asyncio.Event()
    data = {req: {}}
    res_dict.update(data)
    res_dict[req]['event'] = event
    res_dict[req]['result'] = 'pending'
    res = await fetch_correct_res(req, event)
    assert req == res, "the correct res for the req should exactly be the req itself."
    print("got correct res for req: {}".format(req))

async def randomly_put_res_to_res_dict():
    random.shuffle(req_ids)
    for i in req_ids:
        await asyncio.sleep(random.randrange(2, 4))
        print("req: {} is back".format(i))
        if res_dict.get(i) is not None:
            event = res_dict[i]['event']
            res_dict[i]['result'] = i
            event.set()

loop = asyncio.get_event_loop()
tasks = asyncio.gather(handler(req_ids[0], loop),
                       handler(req_ids[1], loop),
                       handler(req_ids[2], loop),
                       handler(req_ids[3], loop),
                       randomly_put_res_to_res_dict())
loop.run_until_complete(tasks)
loop.close()
Sample output from the above code:
incoming request id: NDhvBPqMiRbteFD5WqiLFE
incoming request id: fpmk8yC3iQcgHAJBKqe2zh
incoming request id: M7eX7qeVQfWCCBnP4FbRtK
incoming request id: v2hAfcCEhRPUDUjCabk45N
req: VeyvAEX7YGgRZDHqa2UGYc is back
req: M7eX7qeVQfWCCBnP4FbRtK is back
got correct res for req: M7eX7qeVQfWCCBnP4FbRtK
req: pVvYoyAzvK8VYaHfrFA9SB is back
req: soP8NDxeQKYjgeT7pa3wtG is back
req: j3rcg5Lp59pQXuvdjCAyZe is back
req: NDhvBPqMiRbteFD5WqiLFE is back
got correct res for req: NDhvBPqMiRbteFD5WqiLFE
req: v2hAfcCEhRPUDUjCabk45N is back
got correct res for req: v2hAfcCEhRPUDUjCabk45N
req: porzHqMqV8SAuttteHRwNL is back
req: trVVqZrUpsW3tfjQajJfb7 is back
req: fpmk8yC3iQcgHAJBKqe2zh is back
got correct res for req: fpmk8yC3iQcgHAJBKqe2zh
This may work (note: I removed the UUIDs in order to know the req ids in advance):
import random
import asyncio

n_tests = 1000
idxs = list(range(n_tests))

req_ids = []
for i in range(n_tests):
    req_ids.append(i)

res_dict = {}

async def fetch_correct_res(req_id):
    while not res_dict.get(req_id):
        await asyncio.sleep(0.1)
    return req_ids[req_id]

async def handler(req):
    print("fetching req: ", req)
    res = await fetch_correct_res(req)
    assert req == res, "the correct res for the req should exactly be the req itself."
    print("got correct res for req: {}".format(req))

async def randomly_put_res_to_res_dict(future):
    for i in range(n_tests):
        res_dict[req_ids[i]] = req_ids[i]
        await asyncio.sleep(0.5)
        print("req: {} is back".format(req_ids[i]))
    future.set_result("done")

loop = asyncio.get_event_loop()
future = asyncio.Future()
asyncio.ensure_future(randomly_put_res_to_res_dict(future))
loop.run_until_complete(handler(10))
loop.close()
Is it the best solution? In my opinion, no. Basically this is the long-running-job pattern: you should have a (REST) API for submitting the job and another for checking its status, like:
http POST server:port/job
{some job json payload}
Response: 200 OK {"req_id": 1}

http GET server:port/job/1
Response: 200 OK {"req_id": 1, "status": "in process"}

http GET server:port/job/1
Response: 200 OK {"req_id": 1, "status": "done", "result": {}}
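Since the question involves an aiohttp server, a rough sketch of what such a submit/poll API could look like is below; the route names, the in-memory jobs dict, and the worker hand-off are illustrative assumptions, not part of the original answer:
from aiohttp import web
import itertools

jobs = {}                    # req_id -> {"status": ..., "result": ...}
next_id = itertools.count(1)

async def submit_job(request):
    payload = await request.json()
    req_id = next(next_id)
    jobs[req_id] = {"status": "in process", "result": None}
    # hand `payload` off to the external workers here (omitted)
    return web.json_response({"req_id": req_id})

async def job_status(request):
    req_id = int(request.match_info["req_id"])
    job = jobs.get(req_id)
    if job is None:
        return web.json_response({"error": "unknown req_id"}, status=404)
    return web.json_response({"req_id": req_id, **job})

app = web.Application()
app.add_routes([web.post("/job", submit_job),
                web.get("/job/{req_id}", job_status)])

if __name__ == "__main__":
    web.run_app(app)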

How to speed up Ajax requests in a Python YouTube scraper

I'm working on a simple scraper that crawls a YouTube video's comment page. The crawler uses Ajax to go through every comment on the page and then saves them to a JSON file. Even with a small number of comments (< 10), it still takes 3+ minutes for the comments to be parsed.
I've tried adding requests-cache and using ujson instead of json to see if there are any benefits, but there's no noticeable difference.
Here's the code I'm using currently:
import os
import sys
import time
import ujson
import requests
import requests_cache
import argparse
import lxml.html

requests_cache.install_cache('comment_cache')

from lxml.cssselect import CSSSelector

YOUTUBE_COMMENTS_URL = 'https://www.youtube.com/all_comments?v={youtube_id}'
YOUTUBE_COMMENTS_AJAX_URL = 'https://www.youtube.com/comment_ajax'

def find_value(html, key, num_chars=2):
    pos_begin = html.find(key) + len(key) + num_chars
    pos_end = html.find('"', pos_begin)
    return html[pos_begin: pos_end]

def extract_comments(html):
    tree = lxml.html.fromstring(html)
    item_sel = CSSSelector('.comment-item')
    text_sel = CSSSelector('.comment-text-content')
    photo_sel = CSSSelector('.user-photo')
    for item in item_sel(tree):
        yield {'cid': item.get('data-cid'),
               'name': item.get('data-name'),
               'ytid': item.get('data-aid'),
               'text': text_sel(item)[0].text_content(),
               'photo': photo_sel(item)[0].get('src')}

def extract_reply_cids(html):
    tree = lxml.html.fromstring(html)
    sel = CSSSelector('.comment-replies-header > .load-comments')
    return [i.get('data-cid') for i in sel(tree)]

def ajax_request(session, url, params, data, retries=10, sleep=20):
    for _ in range(retries):
        response = session.post(url, params=params, data=data)
        if response.status_code == 200:
            response_dict = ujson.loads(response.text)
            return response_dict.get('page_token', None), response_dict['html_content']
        else:
            time.sleep(sleep)

def download_comments(youtube_id, sleep=1, order_by_time=True):
    session = requests.Session()

    # Get Youtube page with initial comments
    response = session.get(YOUTUBE_COMMENTS_URL.format(youtube_id=youtube_id))
    html = response.text
    reply_cids = extract_reply_cids(html)

    ret_cids = []
    for comment in extract_comments(html):
        ret_cids.append(comment['cid'])
        yield comment

    page_token = find_value(html, 'data-token')
    session_token = find_value(html, 'XSRF_TOKEN', 4)

    first_iteration = True

    # Get remaining comments (the same as pressing the 'Show more' button)
    while page_token:
        data = {'video_id': youtube_id,
                'session_token': session_token}
        params = {'action_load_comments': 1,
                  'order_by_time': order_by_time,
                  'filter': youtube_id}
        if order_by_time and first_iteration:
            params['order_menu'] = True
        else:
            data['page_token'] = page_token

        response = ajax_request(session, YOUTUBE_COMMENTS_AJAX_URL, params, data)
        if not response:
            break

        page_token, html = response

        reply_cids += extract_reply_cids(html)
        for comment in extract_comments(html):
            if comment['cid'] not in ret_cids:
                ret_cids.append(comment['cid'])
                yield comment

        first_iteration = False
        time.sleep(sleep)

    # Get replies (the same as pressing the 'View all X replies' link)
    for cid in reply_cids:
        data = {'comment_id': cid,
                'video_id': youtube_id,
                'can_reply': 1,
                'session_token': session_token}
        params = {'action_load_replies': 1,
                  'order_by_time': order_by_time,
                  'filter': youtube_id,
                  'tab': 'inbox'}

        response = ajax_request(session, YOUTUBE_COMMENTS_AJAX_URL, params, data)
        if not response:
            break

        _, html = response

        for comment in extract_comments(html):
            if comment['cid'] not in ret_cids:
                ret_cids.append(comment['cid'])
                yield comment

        time.sleep(sleep)

def main(argv):
    parser = argparse.ArgumentParser(add_help=False, description=('Download Youtube comments without using the Youtube API'))
    parser.add_argument('--help', '-h', action='help', default=argparse.SUPPRESS, help='Show this help message and exit')
    parser.add_argument('--youtubeid', '-y', help='ID of Youtube video for which to download the comments')
    parser.add_argument('--output', '-o', help='Output filename (output format is line delimited JSON)')
    parser.add_argument('--timeorder', '-t', action='store_true', help='Download Youtube comments ordered by time')

    try:
        args = parser.parse_args(argv)

        youtube_id = args.youtubeid
        output = args.output
        start_time = time.time()

        if not youtube_id or not output:
            parser.print_usage()
            raise ValueError('you need to specify a Youtube ID and an output filename')

        print 'Downloading Youtube comments for video:', youtube_id
        count = 0
        with open(output, 'wb') as fp:
            for comment in download_comments(youtube_id, order_by_time=bool(args.timeorder)):
                print >> fp, ujson.dumps(comment, escape_forward_slashes=False)
                count += 1
                sys.stdout.write('Downloaded %d comment(s)\r' % count)
                sys.stdout.flush()

        elapsed_time = time.time() - start_time
        print '\nDone! Elapsed time (seconds):', elapsed_time

    except Exception, e:
        print 'Error:', str(e)
        sys.exit(1)

if __name__ == "__main__":
    main(sys.argv[1:])
I'm new to Python, so I'm not sure where the bottlenecks are. The finished script will be used to parse through 100,000+ comments, so performance is a large factor.
Would using multithreading solve the issue? If so, how would I refactor this to benefit from it?
Is this strictly a network issue?
Yes, multithreading will speed up the process. Run the network operations (i.e. the downloading) in separate threads.
Yes, it is a network-related issue.
Your requests are I/O bound. You make a request to YouTube, and it takes some time to get the response back; that depends mostly on the network, and you can't make an individual request faster. However, you can use threads to send multiple requests in parallel. That won't make the actual requests faster, but you will process more of them in less time.
Threading tutorials:
https://pymotw.com/2/threading/
http://www.tutorialspoint.com/python/python_multithreading.htm
An example somewhat similar to your task: http://www.toptal.com/python/beginners-guide-to-concurrency-and-parallelism-in-python
Also, since you will be doing a lot of scraping and processing, I would recommend using something like Scrapy; I personally use it for these kinds of tasks.
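As a rough illustration of the threading advice above (not part of the original answer), a concurrent.futures sketch that fetches several comment pages in parallel with a shared requests session might look like this; fetch_comment_page and the example URLs are hypothetical stand-ins for the Ajax calls in the question:
from concurrent.futures import ThreadPoolExecutor
import requests

session = requests.Session()

def fetch_comment_page(url):
    # hypothetical stand-in for one Ajax request from the scraper above
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response.text

urls = [
    'https://www.youtube.com/comment_ajax?page=1',  # illustrative URLs only
    'https://www.youtube.com/comment_ajax?page=2',
]

# run the I/O-bound downloads in parallel threads; results come back in input order
with ThreadPoolExecutor(max_workers=8) as executor:
    pages = list(executor.map(fetch_comment_page, urls))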
Making multiple requests at once will speed up the process, but if it's taking 3 minutes to parse 10 comments you have some other issues, and parsing 100,000 comments will take days. Unless there's a pressing reason to use lxml, I'd suggest you look at BeautifulSoup and let it provide you with lists of the comment tags and their text content rather than doing it yourself (see the sketch after this paragraph). I'm guessing most of the slowness is in lxml transforming the content you pass to it and then in your manual counting to find positions in a string. I'm also suspicious of the calls to sleep: what are those for?
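A minimal sketch of what that extraction might look like with BeautifulSoup, assuming the same '.comment-item' markup the lxml selectors target (the class names come from the question's code, not from verifying YouTube's current HTML):
from bs4 import BeautifulSoup

def extract_comments_bs(html):
    soup = BeautifulSoup(html, 'html.parser')
    # one pass over the page gives all comment items at once
    for item in soup.select('.comment-item'):
        text_node = item.select_one('.comment-text-content')
        photo_node = item.select_one('.user-photo')
        yield {
            'cid': item.get('data-cid'),
            'name': item.get('data-name'),
            'ytid': item.get('data-aid'),
            'text': text_node.get_text() if text_node else None,
            'photo': photo_node.get('src') if photo_node else None,
        }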
Assuming this
print >> fp, ujson.dumps(comment, escape_forward_slashes=False)
count += 1
sys.stdout.write('Downloaded %d comment(s)\r' % count)
is just for debugging, move it into download_comments and use logging so you can turn it on and off. Dumping each individual comment to JSON as you go is going to be slow; you might want to start writing these into a database now to avoid that. And re-examine why you're doing things one comment at a time: BeautifulSoup should give you a full list of comments and their text with each page load, so you can handle them in batches, which will be handy once you start parsing larger groups.
