I'm working on an application that downloads the source of a web page and captures its links.
It works, but if I connect the program to a GUI, it locks the corresponding button until the download is completed.
If I trigger the download via a separate thread, to avoid the button lock, it just freezes and does not complete execution.
Is this normal? Or am I missing something?
The code is below. If I call grab() from a separate thread, nothing happens, not even an error.
The function update_observers() only notifies the observers and does nothing else.
The observer is responsible for making any changes, in this case redrawing the GUI.
def grab(self, url):
    try:
        self._status = 'Downloading page.'
        self.update_observers()
        inpu = urllib2.urlopen(url)
    except URLError, e:
        self._status = 'Error: '+ e.reason
        self.update_observers()
        return None
    resp = []
    self._status = 'Parsing links'
    self.update_observers()
    for line in inpu.readlines():
        for reg in self._regexes:
            links = reg.findall(line)
            for link in links:
                resp.append(link)
    self._status = 'Ready.'
    self.update_observers()
    return resp
This code is called here:
def grab(self, widget):
    t = Thread(target=self.work)
    t.setDaemon(True)
    t.start()
def work(self):
    print "Working"
    self.links = None
    self.links = self.grabber.grab(self.txtLink.get_text())
    for link in self.links:
        self.store.append([link])
    print "Ok."
If I move the code from work() to grab, removing the threading stuff, it's all ok.
I just called gtk.gdk.threads_init() before gtk.main() and everything worked perfectly without any changes.
I have below code to download a file inside a loop,
import wget
try:
    wget.download(url)
except:
    pass
But if the Internet goes down, it doesn't return!
So my whole loop is stuck.
I want to retry the same download if the Internet goes down, so I need to know whether an error occurred.
How can I mitigate this?
One simple solution is to move your download code into a separate thread that can be told to stop.
You can use Python's Thread and Timer classes from the threading module to achieve it.
from threading import Thread, Timer, Event
import urllib.request

def check_connectivity(stop_event):
    try:
        urllib.request.urlopen("http://google.com", timeout=2)
    except Exception:
        # A thread cannot be killed from outside, so signal the loop to stop.
        stop_event.set()

class Download(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.stopped = Event()

    def run(self):
        print("Trying to download file....")
        while not self.stopped.is_set():
            # Checks the connectivity after 5 seconds.
            t = Timer(5, check_connectivity, args=(self.stopped,))
            t.start()
            # your download code....
            t.join()  # wait for the check before looping again

def main():
    down = Download()
    down.start()
    down.join()
You could move your main download loop inside the thread's run method and start a timer inside it that checks the network connectivity.
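Since the question asks how to repeat the download when the connection drops, here is a minimal sketch of a retry loop in Python 3. It is not taken from the wget package: the name download_with_retry and its parameters are made up for illustration, and fetch is assumed to be any callable that raises OSError on a network failure (in Python 3, urllib.error.URLError and socket.timeout are both OSError subclasses).

```python
import time


def download_with_retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is a hypothetical zero-argument callable that performs one
    download and raises OSError when the network fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait a little before the next try
    # Every attempt failed: let the caller see the last error.
    raise last_error
```

For a real download, fetch could wrap something like urllib.request.urlopen(url, timeout=10).read(); passing a timeout is what makes a dead connection raise instead of hanging.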
The class BrokenLinkTest in the code below does the following.
takes a web page url
finds all the links in the web page
gets the headers of the links concurrently (this is done to check whether each link is broken)
prints 'completed' when all the headers are received.
from bs4 import BeautifulSoup
import requests
import threading
class BrokenLinkTest(object):
    def __init__(self, url):
        self.url = url
        self.thread_count = 0
        self.lock = threading.Lock()
    def execute(self):
        soup = BeautifulSoup(requests.get(self.url).text)
        self.lock.acquire()
        for link in soup.find_all('a'):
            url = link.get('href')
            threading.Thread(target=self._check_url(url))
        self.lock.acquire()
    def _on_complete(self):
        self.thread_count -= 1
        if self.thread_count == 0: #check if all the threads are completed
            self.lock.release()
            print "completed"
    def _check_url(self, url):
        self.thread_count += 1
        print url
        result = requests.head(url)
        print result
        self._on_complete()
BrokenLinkTest("http://www.example.com").execute()
Can the concurrency/synchronization part be done in a better way? I did it using threading.Lock. This is my first experiment with Python threading.
def execute(self):
    soup = BeautifulSoup(requests.get(self.url).text)
    threads = []
    for link in soup.find_all('a'):
        url = link.get('href')
        t = threading.Thread(target=self._check_url, args=(url,))
        t.start()
        threads.append(t)
    for thread in threads:
        thread.join()
You could use the join method to wait for all the threads to finish.
Note I also added a start call, and passed the bound method object to the target param. In your original example you were calling _check_url in the main thread and passing the return value to the target param.
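As an alternative to creating and joining the threads by hand, the standard library's concurrent.futures module offers a thread pool. This is a different technique from the join-based code above, sketched for Python 3; check_all and the check parameter are hypothetical names, and check stands in for whatever per-link work is needed (for example a function that calls requests.head).

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, check, max_workers=8):
    """Run check(url) for every URL in a thread pool.

    Returns a dict mapping each URL to the value check returned, or to
    the exception it raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() starts the work; the futures are collected per URL.
        futures = {url: pool.submit(check, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()  # blocks until this one is done
            except Exception as exc:
                results[url] = exc
    return results
```

Leaving the with block waits for all workers, so no counter or lock is needed to know when everything has completed.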
In CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads will not speed up CPU-bound work. Also, it is unclear what is actually happening:
You never actually start a thread; you only initialize it.
The threads themselves do nothing other than decrement the thread count.
Threads only improve performance when the program spends its time waiting on I/O (sending requests, writing to files, and so on), because other threads can run in the meantime.
My tkinter app has 2 threads (I need them), and I found a wonderful function on Stack Overflow, tkloop(), which is made for tkinter's restriction that only the main thread may touch the GUI; it uses a Queue. It does show a tkMessageBox when I do this:
self.q.put((tkMessageBox.askyesno,("Cannot download it", "Download \"" + tag +"\" via internet site"),{}, self.q1 ))
But when I wrote my own function, it somehow doesn't get executed:
self.q.put((self.topleveldo,(resultlist),{},None))
There's only one class App:
self.q=Queue()
def tkloop(self):
    try:
        while True:
            f, a, k, qr = self.q.get_nowait()
            print f
            r = f(*a,**k)
            if qr: qr.put(r)
    except:
        pass
    self.okno.after(100, self.tkloop)
def topleveldo(resultlist):
    print ("executed - actually this never prints")
    self.choice=Toplevel()
    self.choices=Listbox(self.choice)
    for result in resultlist:
        self.choices.insert(END,str(result))
    choosebutton=Button(text="Vybrat",command=self.readchoice)
def readchoice(self):
    choice=int(self.choices.curselection())
    self.choice.destroy()
    self.q1.put(choice)
Another piece of code, in a method of class App, is run by the second thread:
def method(self):
    self.q1=Queue()
    self.q.put((self.topleveldo,(resultlist),{},None))
    print ("it still prints this, but then it waits forever for q1.get, because self.topleveldo is never executed")
    choice=self.q1.get()
Log errors in the tkloop exception handler - right now you don't know if the call to topleveldo failed (it probably did). The problem is that (1) (resultlist) is just resultlist, not a tuple with 1 argument like topleveldo expects. And (2) tkloop only puts a response if the 4th parameter in the message is a queue. You can fix it with:
self.q.put((self.topleveldo,(resultlist,),{},self.q1))
Added:
tkloop should always return a message, even if it caught an exception, so that callers can reliably call q.get() to get a response. One way to do this is to return the exception that the called function raised:
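from Queue import Empty

def tkloop(self):
    while True:
        qr = None
        try:
            f, a, k, qr = self.q.get_nowait()
        except Empty:
            # nothing left to run; fall through and reschedule
            break
        try:
            print f
            r = f(*a,**k)
            if qr:
                qr.put(r)
        except Exception, e:
            # send the exception back so the caller's get() does not block forever
            if qr:
                try:
                    qr.put(e)
                except:
                    # log it
                    pass
    self.okno.after(100, self.tkloop)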
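The same idea, stripped of the tkinter parts so it can be tried on its own, might look like the sketch below. It is written for Python 3 (where the module is named queue), and drain_calls is a made-up name: in a real app it would be called from the GUI thread, for example from the periodic after() callback, and each message is assumed to be a (func, args, kwargs, reply_queue) tuple as in the question.

```python
import queue


def drain_calls(calls):
    """Run every pending (func, args, kwargs, reply_queue) message once.

    A reply is always sent when reply_queue is given: either the return
    value or the exception that func raised.
    """
    while True:
        try:
            func, args, kwargs, reply = calls.get_nowait()
        except queue.Empty:
            return  # queue is drained; the caller reschedules the next poll
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            result = exc  # hand the failure back instead of swallowing it
        if reply is not None:
            reply.put(result)
```

Note that args must be a real tuple, so a single argument is written (resultlist,) with the trailing comma.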
I am trying to write a simple function to download a file in Python.
The code is something like this:
def download(url , dest):
    urllib.urlretrieve(url, dest)
My issue is that I want to be able to cancel the download while it is in progress. How should I approach this?
This function runs in the background of the app and is triggered by a button. Now I am trying to stop it with another button.
The platform is XBMC.
A simple class to do the same as your download function:
import urllib
import threading
class Downloader:
    def __init__(self):
        self.stop_down = False
        self.thread = None
    def download(self, url, destination):
        self.thread = threading.Thread(target=self.__down, args=(url, destination))
        self.thread.start()
    def __down(self, url, dest):
        _continue = True
        handler = urllib.urlopen(url)
        # binary mode, so the downloaded bytes are written unchanged
        self.fp = open(dest, "wb")
        while not self.stop_down and _continue:
            data = handler.read(4096)
            self.fp.write(data)
            _continue = data  # an empty read means the download is finished
        handler.close()
        self.fp.close()
    def cancel(self):
        self.stop_down = True
So, when someone clicks the "Cancel" button you have to call the cancel() method.
Please note that this will not remove the partially downloaded file if you cancel it, but that should not be hard to achieve using os.unlink(), for example.
The following example script shows how to use it, starting the download of a ~20 MB file and cancelling it after 5 seconds:
import time
if __name__ == "__main__":
    url = "http://ftp.postgresql.org/pub/source/v9.2.3/postgresql-9.2.3.tar.gz"
    down = Downloader()
    down.download(url, "file")
    print "Download started..."
    time.sleep(5)
    down.cancel()
    print "Download canceled"
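For the cleanup mentioned above, a rough Python 3 sketch of the same chunked loop that also deletes the partial file on cancel could look like this. The function names are invented for illustration, the flag is a threading.Event instead of a plain boolean, and source is assumed to be any object with a read(size) method (a urllib response would qualify); the demo uses an in-memory stream so it runs without a network.

```python
import io
import os
import tempfile
import threading


def copy_until_cancelled(source, dest_path, cancel_event, chunk_size=4096):
    """Copy source to dest_path in chunks.

    Returns True if the copy completed, False if it was cancelled (in
    which case the partial file is removed).
    """
    completed = False
    with open(dest_path, "wb") as out:
        while not cancel_event.is_set():
            data = source.read(chunk_size)
            if not data:  # empty read: nothing left to copy
                completed = True
                break
            out.write(data)
    if not completed:
        os.unlink(dest_path)  # do not leave a half-written file behind
    return completed


def demo():
    """Copy an in-memory stream twice: once to completion, once pre-cancelled."""
    dest = os.path.join(tempfile.mkdtemp(), "file")
    finished = copy_until_cancelled(io.BytesIO(b"x" * 10000), dest, threading.Event())
    size = os.path.getsize(dest)
    cancelled = threading.Event()
    cancelled.set()
    stopped = copy_until_cancelled(io.BytesIO(b"x" * 10000), dest, cancelled)
    return finished, size, stopped, os.path.exists(dest)
```

A "Cancel" button handler would then only need to call set() on the event that was passed in.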
If you are canceling by pressing CTRL+C, then you can catch the built-in KeyboardInterrupt exception and proceed with whatever you think is best.
In this case, if I cancel in the middle of a download, I simply want that partial file to be deleted:
def download(url , dest):
    try:
        urllib.urlretrieve(url, dest)
    except KeyboardInterrupt:
        if os.path.exists(dest):
            os.remove(dest)
    except Exception, e:
        raise
I'm working with threads and I need to download a website with a thread. I also have a thread that sends a request to the site but doesn't wait for an answer.
The one that doesn't wait is this:
class peticion(Thread):
    def __init__(self, url):
        Thread.__init__(self)
        self.url = url
    def run(self):
        f = urllib.urlopen(self.url)
        f.close()
This one works correctly. However, the one that has to wait for the response takes a seemingly random amount of time to complete, from 5 seconds to 2 minutes, or it may never finish. This is the class:
class playerConn(Thread):
    def __init__(self, ev):
        Thread.__init__(self)
        self.ev = ev
    def run(self):
        try:
            params = urllib.urlencode('''params go here''')
            f = urllib.urlopen('''site goes here''')
            resp = f.read()
            f.close()
        finally:
            # do something with the response
            pass
Whether or not I use the try...finally statement it doesn't work, the code after the urlopen function won't get to execute.
What can I do?
It appears to just be a problem with the URL; the code is fine and does not appear to do anything wrong.
I bet you have some type of problem from the website, maybe a 404 or similar.
Try opening something in localhost, just to test.
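One way to run such a localhost test without setting anything up is to start a throwaway HTTP server in a background thread and fetch from it with an explicit timeout. This is only a sketch in Python 3 using the standard library; HelloHandler and fetch_from_local_server are hypothetical names, and the empty ProxyHandler is there on the assumption that a proxy configured in the environment should not be used for 127.0.0.1.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_from_local_server(timeout=5):
    """Serve one page on 127.0.0.1 and fetch it; returns (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler bypasses any proxy set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open("http://127.0.0.1:%d/" % port, timeout=timeout) as resp:
            return resp.status, resp.read()
    finally:
        server.shutdown()
        server.server_close()
```

If this returns quickly while the real site does not, the delay is on the remote side, and the timeout argument at least turns a hang into an exception that the thread can handle.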