I can't figure out the problem in this code.
class Threader(threading.Thread):
    def __init__(self, queue, url, host):
        threading.Thread.__init__(self)
        self.queue = queue
        self.url = url
        self.host = host

    def run(self):
        print self.url  # http://www.stackoverflow.com
        with contextlib.closing(urllib2.urlopen(self.url)) as u:
            source = u.read()
        print "hey"  # this is not printing!
        source = self.con()
        doc = Document(source)
        self.queue.put((doc, self.host))
When I run this code, print self.url successfully outputs the url, but print "hey" never runs. So basically, I believe there is something in contextlib that is blocking the code. I also tried the conventional urlopen call without contextlib, but that doesn't work either. Furthermore, I wrapped it in try - except, but the program doesn't raise any error. So what may be the problem here?
Your code doesn't work as posted; I have taken the liberty to adapt it a bit (adding the imports; it also doesn't know about Document and self.con) and to make it compatible with Python 2 (that's what I use here at the moment), and it works:
from __future__ import with_statement
import threading, Queue, urllib2, contextlib

class Threader(threading.Thread):
    def __init__(self, queue, url, host):
        threading.Thread.__init__(self)
        self.queue = queue
        self.url = url
        self.host = host

    def run(self):
        print self.url
        with contextlib.closing(urllib2.urlopen(self.url)) as u:
            source = u.read()
        print "hey"

if '__main__' == __name__:
    t = Threader(Queue.Queue(), 'http://www.stackoverflow.com', '???')
    t.start()
    t.join()
EDIT: it works with "with" and contextlib as well.
Since the problem persists even when using plain urllib, the most probable cause is that the url you are trying to open does not respond.
You should try to:
- open the url in a browser or a simple web client (like wget on Linux)
- set the timeout parameter of urllib2.urlopen
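To see the timeout suggestion in action without a remote host, here is a small sketch. It uses Python 3's urllib.request rather than the answer's urllib2, and a deliberately silent local socket stands in for the unresponsive server; the one-second timeout is an arbitrary demo value.

```python
import socket
import urllib.request

# A socket that accepts TCP connections (via the listen backlog) but never
# answers, standing in for a web server that does not respond.
server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

timed_out = False
try:
    # Without timeout= this call could block indefinitely.
    urllib.request.urlopen("http://127.0.0.1:%d/" % port, timeout=1)
except OSError:   # socket.timeout and URLError both derive from OSError
    timed_out = True
finally:
    server.close()

print("timed out:", timed_out)
```

With a timeout set, the hang turns into a catchable exception after one second instead of a thread that silently never finishes.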
To get the specifics out of the way: I'm writing an open source P2P social network over IPFS and Flask -- I know, it's been done. I'm choosing Flask because pyinstaller can put it in an exe file.
I am attempting to update my IPNS every 10 minutes to publish all status updates I've added to the network during those 10 minutes. The updater lives in the cron method of the setup class (in library.py). At first, I started cron in a thread from setup's __init__; the server hung. Then I moved the thread creation to app.before_first_request; the server still hangs.
https://pastebin.com/bXHTuH83 (main.py)
from flask import Flask, jsonify
from library import *

#=========================TO BE DELETED=========================================
def pretty(json):
    json = dumps(loads(json), indent=4, sort_keys=True)
    return json
#===============================================================================

app = Flask(__name__)
GANN = setup()

@app.before_first_request
def cron_job():
    Thread(target=GANN.cron())

@app.route("/")
def home():
    return "Hello World!!!"

if __name__ == "__main__":
    app.run(port="80", debug=True, threaded=True)
https://pastebin.com/W5P8Tpvd (library.py)
from threading import Thread
from time import time, sleep
import urllib.request
from json import loads, dumps

def api(*argv, **kwargs):
    url = "http://127.0.0.1:5001/api/v0/"
    for arg in argv:
        arg = arg.replace(" ", "/")
        if arg[:-1] != "/":
            arg += "/"
        url += arg
    url = url[0:-1]
    if kwargs:
        url += "?"
        for val in kwargs:
            url = url + val + "=" + kwargs[val] + "&"
        url = url[0:-1]
    print(url)
    try:
        with urllib.request.urlopen(url, timeout=300) as response:
            return response.read()
    except:
        return b"""{"ERROR": "CANNOT CONNECT TO IPFS!"}"""

class setup():
    def __init__(self):
        api("files", "mkdir", arg="/GANN", parents="True")
        self.root_hash = ""

    def update_root(self):
        try:
            for entry in loads(api("files", "ls", l="True").decode())["Entries"]:
                if entry["Name"] == "GANN":
                    self.root_hash = entry["Hash"]
        except:
            return """{"ERROR": "CANNOT FIND ROOT DIRECTORY"}"""

    def publish_root(self):
        api("name", "publish", arg=self.root_hash)

    def cron(self):
        while True:
            print("CRON Thread Started!")
            self.update_root()
            self.publish_root()
            sleep(600)
I have searched the web for a couple of days and have yet to find a threading technique that splits off from the main process without hanging the server for other requests. I believe I'm on a single-stream connection, as IPFS blocks connections to every other device in my home while it's running. It takes a couple of minutes for the CLI IPNS update to go through, so I set urllib's timeout to 300 seconds.
I think the threading code is not correct.
@app.before_first_request
def cron_job():
    Thread(target=GANN.cron())
Here you created a Thread object. The target argument must be a callable, but you already called your method here, so the right way would be

Thread(target=GANN.cron)

so that the thread can call the cron function later. Having said that, the Thread must also be started so that it actually invokes the target you gave it. So it must be like

thread_cron = Thread(target=GANN.cron)
thread_cron.start()

Since you called GANN.cron() yourself, the method started executing in the main thread, and your app hung!
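The difference is easy to demonstrate with a toy function standing in for GANN.cron (all names below are illustrative):

```python
import threading

events = []

def cron():
    # stand-in for GANN.cron: just record which thread ran it
    events.append("ran in " + threading.current_thread().name)

# Wrong: cron() is executed right here, in the main thread, and Thread
# receives its return value (None) as target. If cron looped forever,
# this line itself would hang, which is what happened to the Flask app.
t_wrong = threading.Thread(target=cron())
t_wrong.start()
t_wrong.join()          # finishes instantly; its target is None

# Right: pass the function object and start the thread explicitly.
t_right = threading.Thread(target=cron)
t_right.start()
t_right.join()

print(events)           # first entry from MainThread, second from the worker
```

The wrong version still "works" for a fast function, which makes the bug easy to miss; it only hangs when the called function never returns, as an infinite cron loop does.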
I'm new to mitmproxy, and I can't figure out how to use it from a Python script.
I want to embed mitmproxy in my Python script like a library, specify everything (port, host, and so on), and modify the Request or Response from within my Python script.
So when I start my script like this
python sample.py
everything will run automatically, without launching mitmproxy from the command line like this
mitmproxy -s sample.py
Thanks for reading.
You can use something like this. This code was taken from an issue posted on the mitmproxy GitHub, found here.
from mitmproxy import proxy, options
from mitmproxy.tools.dump import DumpMaster
from mitmproxy.addons import core

class AddHeader:
    def __init__(self):
        self.num = 0

    def response(self, flow):
        self.num = self.num + 1
        print(self.num)
        flow.response.headers["count"] = str(self.num)

addons = [
    AddHeader()
]

opts = options.Options(listen_host='127.0.0.1', listen_port=8080)
pconf = proxy.config.ProxyConfig(opts)
m = DumpMaster(None)
m.server = proxy.server.ProxyServer(pconf)
# print(m.addons)
m.addons.add(addons)
print(m.addons)
# m.addons.add(core.Core())

try:
    m.run()
except KeyboardInterrupt:
    m.shutdown()
Start mitmproxy in the background programmatically to integrate it into an existing app:
from mitmproxy.options import Options
from mitmproxy.proxy.config import ProxyConfig
from mitmproxy.proxy.server import ProxyServer
from mitmproxy.tools.dump import DumpMaster
import threading
import asyncio
import time

class Addon(object):
    def __init__(self):
        self.num = 1

    def request(self, flow):
        flow.request.headers["count"] = str(self.num)

    def response(self, flow):
        self.num = self.num + 1
        flow.response.headers["count"] = str(self.num)
        print(self.num)

# see source mitmproxy/master.py for details
def loop_in_thread(loop, m):
    asyncio.set_event_loop(loop)  # This is the key.
    m.run_loop(loop.run_forever)

if __name__ == "__main__":
    options = Options(listen_host='0.0.0.0', listen_port=8080, http2=True)
    m = DumpMaster(options, with_termlog=False, with_dumper=False)
    config = ProxyConfig(options)
    m.server = ProxyServer(config)
    m.addons.add(Addon())

    # run mitmproxy in the background, especially when integrating with another server
    loop = asyncio.get_event_loop()
    t = threading.Thread(target=loop_in_thread, args=(loop, m))
    t.start()

    # Other servers, such as a web server, might be started then.
    time.sleep(20)
    print('going to shutdown mitmproxy')
    m.shutdown()
from my gist
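The loop_in_thread trick is not specific to mitmproxy. The same pattern, running an asyncio event loop in a daemon thread and feeding it work from the main thread, can be sketched with the standard library alone (the job coroutine and names below are illustrative):

```python
import asyncio
import threading

# A private event loop that will live in a worker thread, the same shape
# as loop_in_thread above but with no mitmproxy involved.
loop = asyncio.new_event_loop()

def loop_in_thread(loop):
    asyncio.set_event_loop(loop)   # make the loop current in this thread
    loop.run_forever()

t = threading.Thread(target=loop_in_thread, args=(loop,), daemon=True)
t.start()

async def job(x):
    await asyncio.sleep(0.01)
    return x * 2

# The main thread stays free for other work; coroutines are handed to the
# background loop and awaited through a concurrent.futures.Future.
future = asyncio.run_coroutine_threadsafe(job(21), loop)
result = future.result(timeout=5)
print(result)   # 42

loop.call_soon_threadsafe(loop.stop)   # orderly shutdown, like m.shutdown()
t.join()
```

run_coroutine_threadsafe is the safe way to submit work from outside the loop's thread; calling loop methods directly from another thread is not thread-safe.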
I'm new to Python. I want to open a web page (here, google.com) using pywebkitgtk,
then count down in another thread;
when time's up, send a signal to the webview and save the page's HTML to a file.
Is there a way to open a web page in gtk.main and count down in a background thread, then send a signal to the GUI and make the GUI do something?
reference material:
Downloading a page’s content with python and WebKit
using a separate thread to run code, two approaches for threads in PyGTK.
Here is my code. It cannot run; I guess I do not understand Python's classes...
#!/usr/bin/env python
import sys, threading
import gtk, webkit
import time
import gobject

gobject.threads_init()
google = "http://www.google.com"

class WebView(webkit.WebView):
    # return the page's content
    def get_html(self):
        self.execute_script('oldtitle=document.title;document.title=document.documentElement.innerHTML;')
        html = self.get_main_frame().get_title()
        self.execute_script('document.title=oldtitle;')
        return html

# wait 5 seconds and send a signal
class TimeSender(gobject.GObject, threading.Thread):
    def __init__(self):
        self.__gobject_init__()
        threading.Thread.__init__(self)

    def run(self):
        print "sleep 5 seconds"
        time.sleep(5)
        self.emit("Sender_signal")

gobject.type_register(TimeSender)
gobject.signal_new("Sender_signal", TimeSender, gobject.SIGNAL_RUN_FIRST, gobject.TYPE_NONE, ())

# PywebkitGTK, open google.com, receive signal
class Window(gtk.Window, gobject.GObject):
    def __init__(self, time_sender, url):
        self.__gobject_init__()
        gtk.Window.__init__(self)
        time_sender.connect('Sender_signal', self._finished_loading)
        self._url = url

    def open_page(self):
        view = WebView()
        view.get_html()
        view.open(self._url)
        self.add(view)
        gtk.main()

    # write html to file
    def _finished_loading(self, view):
        with open("pagehtml.html", 'w') as f:
            f.write(view.get_html())
        gtk.main_quit()

    '''
    def user_callback(object):
        with open("pagehtml2.html", 'w') as f:
            f.write(view.get_html())
        gtk.main_quit()
    '''

if __name__ == '__main__':
    time_sender = TimeSender()
    window = Window(time_sender, google)
    #time_sender.connect("Sender_signal", user_callback)
    time_sender.start()
    window.open_page()
I got an error:
AttributeError: 'TimeSender' object has no attribute 'get_html'
I've been confused for a few days... thanks
It looks like you are confused about signals/objects and threads. The _finished_loading method does not get the view as a parameter because you are not passing it; if you store the view where the handler can reach it (on the window, or globally), it works. The following piece of code works as expected.
#!/usr/bin/env python
import sys, threading
import gtk, webkit
import time
import gobject

gobject.threads_init()
google = "http://www.google.com"

class WebView(webkit.WebView):
    # return the page's content
    def get_html(self):
        self.execute_script('oldtitle=document.title;document.title=document.documentElement.innerHTML;')
        html = self.get_main_frame().get_title()
        self.execute_script('document.title=oldtitle;')
        return html

# wait 5 seconds and send a signal
class TimeSender(gobject.GObject, threading.Thread):
    def __init__(self):
        self.__gobject_init__()
        threading.Thread.__init__(self)

    def myEmit(self):
        window.emit("Sender_signal")

    def run(self):
        print "sleep 5 seconds"
        time.sleep(5)
        gobject.idle_add(self.myEmit)

gobject.type_register(TimeSender)

# PywebkitGTK, open google.com, receive signal
class Window(gtk.Window, gobject.GObject):
    def __init__(self, time_sender, url):
        self.__gobject_init__()
        gtk.Window.__init__(self)
        self.connect('Sender_signal', self._finished_loading)
        self._url = url

    def open_page(self):
        self.view = WebView()
        self.view.get_html()
        self.view.open(self._url)
        self.add(self.view)
        gtk.main()

    # write html to file
    def _finished_loading(self, view1):
        with open("pagehtml.html", 'w') as f:
            f.write(self.view.get_html())
        gtk.main_quit()

    '''
    def user_callback(object):
        with open("pagehtml2.html", 'w') as f:
            f.write(view.get_html())
        gtk.main_quit()
    '''

if __name__ == '__main__':
    gobject.signal_new("Sender_signal", Window, gobject.SIGNAL_RUN_FIRST, gobject.TYPE_NONE, ())
    time_sender = TimeSender()
    window = Window(time_sender, google)
    #time_sender.connect("Sender_signal", user_callback)
    time_sender.start()
    window.open_page()
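The essential fix above is that the worker thread never emits the signal itself; gobject.idle_add asks the GTK main loop to run the emit. For readers without GTK installed, the same hand-off can be sketched with a stdlib queue standing in for idle_add (the idle_add, countdown, and callback names below are illustrative stand-ins, not GTK API):

```python
import queue
import threading
import time

pending = queue.Queue()          # callbacks waiting for the "main loop"

def idle_add(callback):
    # like gobject.idle_add: worker threads only enqueue work,
    # they never touch main-loop state directly
    pending.put(callback)

ran_in = []

def finished_loading():
    # stand-in for Window._finished_loading: must run on the main thread
    ran_in.append(threading.current_thread().name)

def countdown():
    time.sleep(0.05)             # stand-in for the 5-second wait
    idle_add(finished_loading)

threading.Thread(target=countdown).start()

# stand-in for gtk.main(): pull one scheduled callback and run it here
pending.get(timeout=5)()
print(ran_in)   # ['MainThread']
```

The callback runs on the main thread even though it was scheduled from the worker, which is exactly the property GUI toolkits require.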
I have the following problem: whenever a child thread wants to perform some IO operation (writing to a file, downloading a file), the program hangs. In the following example, the program hangs on opener.retrieve. If I execute python main.py, the program blocks inside the retrieve call. If I execute python ./src/tmp.py, everything is fine. I don't understand why. Can anybody explain to me what is happening?
I am using Python 2.7 on a Linux system (kernel 3.5.0-27).
File layout:

main.py
./src
    __init__.py
    tmp.py

main.py:

import src.tmp

tmp.py:
import threading
import urllib

class DownloaderThread(threading.Thread):
    def __init__(self, pool_sema, i):
        threading.Thread.__init__(self)
        self.pool_sema = pool_sema
        self.daemon = True
        self.i = i

    def run(self):
        try:
            opener = urllib.FancyURLopener({})
            opener.retrieve("http://www.greenteapress.com/thinkpython/thinkCSpy.pdf", "/tmp/" + str(self.i) + ".pdf")
        finally:
            self.pool_sema.release()

class Downloader(object):
    def __init__(self):
        maxthreads = 1
        self.pool_sema = threading.BoundedSemaphore(value=maxthreads)

    def download_folder(self):
        for i in xrange(20):
            self.pool_sema.acquire()
            print "Downloading", i
            t = DownloaderThread(self.pool_sema, i)
            t.start()

d = Downloader()
d.download_folder()
I managed to get it to work by hacking urllib.py: if you inspect it, you will see many import statements dispersed within the code, i.e. it imports things 'on the fly' and not just when the module loads.
So the real reason is still unknown, but it is probably some deadlock in Python's import machinery, and not worth investigating further. You just shouldn't run nontrivial code during an import; that's asking for trouble.
If you insist, you can get it to work by moving all those scattered import statements to the beginning of urllib.py.
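A simpler fix than patching urllib.py is to avoid running the driver code at import time altogether. A minimal sketch of the guarded module shape, with the download replaced by a trivial placeholder task:

```python
import threading

def fetch(results, i):
    # placeholder for the real opener.retrieve call
    results.append(i)

def download_folder():
    results = []
    threads = [threading.Thread(target=fetch, args=(results, i))
               for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)

# Starting threads as a side effect of `import src.tmp` is what deadlocked
# above; with this guard, importing the module is harmless and the driver
# only runs when the file is executed directly.
if __name__ == "__main__":
    print(download_folder())
```

With the guard in place, python main.py imports the module safely and can call download_folder() itself after the import completes.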
I'm working with threads, and I need to download a website with a thread. I also have a thread that sends the request to the site but doesn't wait for an answer.
The one that doesn't wait is this:
class peticion(Thread):
    def __init__(self, url):
        Thread.__init__(self)
        self.url = url

    def run(self):
        f = urllib.urlopen(self.url)
        f.close()
This one works correctly. However, the one that has to wait for the response takes a seemingly random time to complete, from 5 seconds to 2 minutes, or it may never finish. This is the class:
class playerConn(Thread):
    def __init__(self, ev):
        Thread.__init__(self)
        self.ev = ev

    def run(self):
        try:
            params = urllib.urlencode('''params go here''')
            f = urllib.urlopen('''site goes here''')
            resp = f.read()
            f.close()
        finally:
            # do something with the response
            pass
Whether or not I use the try...finally statement, it doesn't work; the code after the urlopen call never gets to execute.
What can I do?
It appears to just be a problem with the URL; the code is fine and does not seem to do anything wrong.
I bet you are getting some kind of error from the website, maybe a 404 or similar.
Try opening something on localhost, just to test.
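The localhost test can itself be scripted; here a throwaway http.server instance stands in for the real site (Python 3 shown, while the question's code uses Python 2's urllib; the handler and response body are made up for the demo):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # a fixed, known response so the client side is easy to verify
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the test output quiet

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/" % server.server_port
with urllib.request.urlopen(url, timeout=5) as f:
    resp = f.read()
print(resp)  # b'hello'

server.shutdown()
```

If this local round-trip completes instantly while the real site hangs, the problem is on the remote end, not in the threading code.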