Forcibly stopping a thread that is running a urllib download - python

I have some threads that are downloading content from various websites using Python's built-in urllib module. The code looks something like this:
from urllib.request import Request, urlopen
from threading import Thread

##do stuff

def download(url):
    req = urlopen(Request(url))
    return req.read()

url = "somerandomwebsite"
Thread(target=download, args=(url,)).start()
Thread(target=download, args=(url,)).start()

#Do more stuff
The user should have an option to stop loading data. While I can use flags/events to discard the data once the download finishes, I can't actually stop the download itself while it is in progress.
Is there a way to either stop the download (and preferably run something when it is stopped) or forcibly (and safely) kill the thread the download is running in?
Thanks in advance.

You can use urllib.request.urlretrieve instead, which takes a reporthook argument; raising an exception inside the hook aborts the download:
from urllib.request import urlretrieve
from threading import Thread

url = "someurl"
flag = 0

def dl_progress(count, blksize, filesize):
    # urlretrieve calls this hook after every block; raising here
    # aborts the download and ends the thread.
    global flag
    if flag:
        raise Exception('download canceled')

Thread(target=urlretrieve, args=(url, "test.rar", dl_progress)).start()

if cancel_download():  # e.g. the user pressed a cancel button
    flag = 1
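If you control the read loop yourself (rather than handing the whole transfer to urlretrieve), another option is to download in chunks and check a flag between chunks. A minimal sketch, assuming an 8 KB chunk size and a module-level threading.Event (both illustrative, not from the original answer):
from urllib.request import Request, urlopen
from threading import Thread, Event

cancelled = Event()

def download(url, chunk_size=8192):
    data = bytearray()
    with urlopen(Request(url)) as resp:
        while True:
            if cancelled.is_set():
                print("download stopped")   # do something on cancellation
                return None                 # abandon the partial download
            chunk = resp.read(chunk_size)
            if not chunk:                   # an empty read means EOF
                return bytes(data)
            data.extend(chunk)

Thread(target=download, args=("somerandomwebsite",)).start()
# later, e.g. from the UI thread:
cancelled.set()
The worst case is one chunk's worth of extra data after cancellation, which is usually an acceptable trade-off for not killing the thread forcibly.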

Related

How do I 'spam' a URL via Python?

I want to make a script that automatically sends a lot of requests to a URL via Python.
Example link: https://page-views.glitch.me/badge?page_id=page.id
I've tried Selenium, but that's very slow.
pip install requests

import requests

for i in range(100):  # Or whatever amount of requests you wish to send
    requests.get("https://page-views.glitch.me/badge?page_id=page.id")
Or, if you really wanted to hammer the address, you could use multiprocessing:
import multiprocessing as mp
import requests

def my_func(x):
    for i in range(x):
        print(requests.get("https://page-views.glitch.me/badge?page_id=page.id"))

def main():
    pool = mp.Pool(mp.cpu_count())
    pool.map(my_func, range(0, 100))

if __name__ == "__main__":
    main()
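Since the workload here is network-bound rather than CPU-bound, a single process reusing one connection can also go a long way. A small sketch (not from the original answers) using requests.Session, which keeps the TCP connection alive between requests, assuming the server permits keep-alive:
import requests

with requests.Session() as session:
    # Session reuses the underlying connection, avoiding a new
    # TCP handshake for every request to the same host.
    for i in range(100):
        session.get("https://page-views.glitch.me/badge?page_id=page.id")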
You can send multiple get() requests in a loop as follows:
for i in range(100):
    driver.get("https://page-views.glitch.me/badge?page_id=page.id")

How can I prevent my python program from loading when using pyngrok?

I have the following function in my script
import os, re
from pyngrok import ngrok

def server():
    os.system('kill -9 $(pgrep ngrok)')
    ngrok.connect(443, "tcp")
    while True:
        ngrok_tunnels = ngrok.get_tunnels()
        url = ngrok_tunnels[0].public_url
        if re.match("tcp://[0-9]*.tcp.ngrok.io:[0-9]*", url) is not None:
            print "your url is : " + url
            break
This is responsible for generating an ngrok TCP link, and it works, but the script then hangs instead of returning to the console.
How can I prevent it from blocking and just print the link? I was told to set monitor_thread to False, but I don't know how to configure that in my function. Thank you very much in advance.
The reason the script is “stuck” is that pyngrok starts ngrok with a thread to monitor logs, and the Python process can’t exit until all threads have been dealt with. You can stop the monitor thread, as shown here in the documentation, or, if you have no use for it, you can prevent it from starting in the first place:
import os, re
from pyngrok import ngrok
from pyngrok.conf import PyngrokConfig

def server():
    os.system('kill -9 $(pgrep ngrok)')
    ngrok_tunnel = ngrok.connect(443, "tcp", pyngrok_config=PyngrokConfig(monitor_thread=False))
    print("your url is : " + ngrok_tunnel.public_url)
However, this still won’t do what you want. If you do this, yes, you will be returned back to the console, but with that the ngrok process will also be stopped, as it is a subprocess of Python at this point. To leave the tunnels open, you need to leave the process running.
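If you want to print the URL and then deliberately keep the process (and therefore the tunnel) alive, you can block on the ngrok process until the user interrupts. A sketch based on the pattern in pyngrok's documentation (the KeyboardInterrupt handling is illustrative):
import os
from pyngrok import ngrok

def server():
    os.system('kill -9 $(pgrep ngrok)')
    ngrok_tunnel = ngrok.connect(443, "tcp")
    print("your url is : " + ngrok_tunnel.public_url)
    ngrok_process = ngrok.get_ngrok_process()
    try:
        # Block until CTRL-C so the tunnel stays open
        ngrok_process.proc.wait()
    except KeyboardInterrupt:
        ngrok.kill()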

Checking network connection and running a program based off result

I am writing a program that will test for an active network connection, then run our software downloader if there is one. If there is an update ready to download in the downloader, it will automatically download and install said update.
import urllib3
from subprocess import Popen
import subprocess

def run_downloader():
    return subprocess.Popen(['C:\ProgramFiles\PrecisionOSTech\Education\POSTdownloader.exe'])

def internet_on():
    if urllib3.urlopen('http://216.58.192.142', timeout=1):
        return True
    else:
        urllib3.URLError
        return False

if internet_on == True:
    run_downloader()
The expectation is to have the program test for an active connection and return either true or false. If true, then it should run the downloader.
Currently, the program runs with no errors but does not run the downloader upon completion. I can only imagine that the internet_on() method is not returning True as I want. If I run the subprocess.Popen line outside of the method, the downloader starts as planned.
Any assistance is appreciated!
Thanks
It should be something like this:
import urllib.request
from urllib.error import URLError
from subprocess import Popen
import subprocess

def run_downloader():
    # Raw string so the backslashes in the Windows path are taken literally
    return subprocess.Popen([r'C:\ProgramFiles\PrecisionOSTech\Education\POSTdownloader.exe'])

def ispageresponding():
    try:
        response_status_code = urllib.request.urlopen("http://www.stackoverflow.com").getcode()
        return response_status_code == 200
    except URLError:
        return False  # no connection, or the host is unreachable

if __name__ == '__main__':
    if ispageresponding():
        run_downloader()
Note that the connection-check function is called (with parentheses) in the if statement, rather than compared as a function object. Also, if you are using Python 3, you should use the urllib.request module.
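As a side note, a socket-level check is a lighter-weight way to test connectivity than a full HTTP request. A sketch (not part of the answer above), using Google's public DNS server at 8.8.8.8:53 as an illustrative target:
import socket

def internet_on(host="8.8.8.8", port=53, timeout=3):
    # Opening a TCP connection is enough to prove the network is up;
    # no HTTP parsing or page download is needed.
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False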

Unable to execute my script in the right way using thread

I've tried to create a scraper in Python, using Thread to make the execution time faster. The scraper is supposed to parse all the shop names along with their phone numbers, traversing multiple pages.
The script runs without any issues. As I'm very new to working with threads, I can hardly tell whether I'm doing it the right way.
This is what I've tried so far:
import requests
from lxml import html
import threading
from urllib.parse import urljoin

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"

def get_information(url):
    for pagelink in [url.format(page) for page in range(20)]:
        response = requests.get(pagelink).text
        tree = html.fromstring(response)
        for title in tree.cssselect("div.info"):
            name = title.cssselect("a.business-name span[itemprop=name]")[0].text
            try:
                phone = title.cssselect("div[itemprop=telephone]")[0].text
            except Exception:
                phone = ""
            print(f'{name} {phone}')

thread = threading.Thread(target=get_information, args=(link,))
thread.start()
thread.join()
The problem is that I can't find any difference in time or performance whether I run the above script with Thread or without it. If I'm going wrong, how can I execute the above script using Thread?
EDIT: I've tried to change the logic so it uses multiple links. Is it possible now? Thanks in advance.
You can use threading to scrape several pages in parallel, as below:
import requests
from lxml import html
import threading
from urllib.parse import urljoin

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"

def get_information(url):
    response = requests.get(url).text
    tree = html.fromstring(response)
    for title in tree.cssselect("div.info"):
        name = title.cssselect("a.business-name span[itemprop=name]")[0].text
        try:
            phone = title.cssselect("div[itemprop=telephone]")[0].text
        except Exception:
            phone = ""
        print(f'{name} {phone}')

threads = []
for url in [link.format(page) for page in range(20)]:
    thread = threading.Thread(target=get_information, args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
Note that the sequence of the data will not be preserved. If you scrape the pages one by one, the sequence of extracted data will be:
page_1_name_1
page_1_name_2
page_1_name_3
page_2_name_1
page_2_name_2
page_2_name_3
page_3_name_1
page_3_name_2
page_3_name_3
while with threading the data will be interleaved:
page_1_name_1
page_2_name_1
page_1_name_2
page_2_name_2
page_3_name_1
page_2_name_3
page_1_name_3
page_3_name_2
page_3_name_3
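If the original ordering matters to you, one option (not part of the answer above) is concurrent.futures.ThreadPoolExecutor, whose map() yields results in input order even though the requests run in parallel. A sketch reusing the answer's parsing logic; the worker count of 8 is illustrative:
import requests
from lxml import html
from concurrent.futures import ThreadPoolExecutor

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"

def scrape(url):
    tree = html.fromstring(requests.get(url).text)
    results = []
    for title in tree.cssselect("div.info"):
        name = title.cssselect("a.business-name span[itemprop=name]")[0].text
        try:
            phone = title.cssselect("div[itemprop=telephone]")[0].text
        except Exception:
            phone = ""
        results.append((name, phone))
    return results

with ThreadPoolExecutor(max_workers=8) as pool:
    # map() yields each page's results in the same order as the input URLs
    for page in pool.map(scrape, [link.format(p) for p in range(20)]):
        for name, phone in page:
            print(f'{name} {phone}')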

python wget library not returning if the internet goes down for some time. How can I return with an error if the internet is down?

I have the code below to download a file inside a loop:
import wget

try:
    wget.download(url)
except:
    pass
But if the internet goes down, it doesn't return, so my whole loop gets stuck.
I want to retry the download when the internet goes down, so I need to know whether an error has happened. How can I mitigate this?
One simple solution is to move your download code into a thread, so it becomes a separate unit of work that can be interrupted.
You can use Python's Thread and Timer classes to achieve it.
from threading import Thread, Timer
from functools import partial
import time
import urllib.request

def check_connectivity(t):
    try:
        urllib.request.urlopen("http://google.com", timeout=2)
    except Exception as e:
        # Note: _Thread__stop was a CPython 2 internal and no longer exists
        # in Python 3; there is no supported way to kill a thread externally.
        t._Thread__stop()

class Download(Thread):
    def run(self):
        print("Trying to download file....")
        con = partial(check_connectivity, self)
        while True:
            t = Timer(5, con)  # checks the connectivity every 5 seconds or less
            t.start()
            # your download code....

def main():
    down = Download()
    down.start()
    down.join()
You would move your main download loop inside the thread's run method and start a timer inside it that checks for network connectivity.
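A simpler mitigation, assuming the wget package downloads via urllib under the hood (an assumption, but consistent with its behavior), is to set a global socket timeout so a stalled connection raises instead of hanging forever, then retry in a loop. A sketch; the timeout, retry count, and back-off values are illustrative:
import socket
import time
import wget

socket.setdefaulttimeout(10)  # any blocking socket operation now fails after 10s

def download_with_retries(url, attempts=5, backoff=5):
    for attempt in range(attempts):
        try:
            return wget.download(url)
        except Exception as exc:  # timeout, DNS failure, connection reset, ...
            print(f"attempt {attempt + 1} failed: {exc}")
            time.sleep(backoff)
    raise RuntimeError(f"could not download {url} after {attempts} attempts")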
