Python simple threading won't end without doing anything (maybe)

When I run the following code (using "sudo python servers.py"), the process seems to finish immediately, printing just "test".
Why doesn't the proxy_server function run? Or maybe it does and I just don't realize it (because the first line of the proxy function doesn't print anything).
This is a stripped-down example; I didn't want to include unnecessary content, yet it still demonstrates my problem:
import os,sys,thread,socket,select,struct,time

HTTP_PORT = 80
FTP_PORT = 21
FTP_DATA_PORT = 20

IP_IN = '10.0.1.3'
IP_OUT = '10.0.3.3'

sys_http = 'http_proxy'
sys_ftp = 'ftp_proxy'
sys_ftp_data = 'ftp_data_proxy'

def main():
    try:
        thread.start_new_thread(proxy_server, (HTTP_PORT, IP_IN, sys_http, http_handler))
        thread.start_new_thread(proxy_server, (FTP_PORT, IP_IN, sys_ftp, http_handler))
        thread.start_new_thread(proxy_server, (FTP_DATA_PORT, IP_OUT, sys_ftp_data, http_handler))
        print "test"
    except Exception:
        print 'Error!'
        sys.exit(1)

def proxy_server(host, port, fileName, handler):
    print "Proxy Server Running on ", host, ":", port

def http_handler(src, sock):
    return ''

if __name__ == '__main__':
    main()
What am I missing or doing wrong?

First, you have indentation problems related to using mixed tabs and spaces for indentation. While they didn't cause your code to misbehave in this particular case, they will cause you problems later if you don't stick to consistently using one or the other. They had already broken the displayed indentation in your original post; the print "test" line in main looked misaligned.
Second, instead of the low-level thread module, you should be using threading. Your problem is occurring because, as documented in the thread module documentation,
When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executing try ... finally clauses or executing object destructors.
threading threads let you explicitly define whether other threads should survive the death of the main thread, and default to surviving. In general, threading is much easier to use correctly.
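For example, a minimal sketch of the same main() rewritten with threading (same call arguments as the question's code; threading.Thread objects are non-daemon by default, so the process stays alive until they finish):

import threading

def main():
    threads = []
    for args in ((HTTP_PORT, IP_IN, sys_http, http_handler),
                 (FTP_PORT, IP_IN, sys_ftp, http_handler),
                 (FTP_DATA_PORT, IP_OUT, sys_ftp_data, http_handler)):
        t = threading.Thread(target=proxy_server, args=args)
        t.start()  # non-daemon: the interpreter won't exit while it runs
        threads.append(t)
    print "test"
    for t in threads:
        t.join()  # wait explicitly for the proxy threads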

Related

Python module function returning immediately in thread

I've written a couple of Twitter scrapers in Python, and am writing another script to keep them running even if they suffer a timeout, disconnection, etc.
My current solution is as follows:
Each scraper file has a doScrape/1 function in it, which will start up a scraper and run it once, e.g.:
def doScrape(logger):
    try:
        with DBWriter(logger=logger) as db:
            logger.log_info("starting", __name__)
            s = PastScraper(db.getKeywords(), TwitterAuth(), db, logger)
            s.run()
    finally:
        logger.log_info("Done", __name__)
Where run is a near-infinite loop, which won't break unless there is an exception.
In order to run one of each kind of scraper at once, I'm using this code (with a few extra imports):
from threading import Thread

class ScraperThread(Thread):
    def __init__(self, module, logger):
        super(ScraperThread, self).__init__()
        self.module = module  # Module should contain a doScrape(logger) function
        self.logger = logger

    def run(self):
        while True:
            try:
                print "Starting!"
                print self.module.doScrape
                self.module.doScrape(self.logger)
            except:  # if for any reason we get disconnected, reconnect
                self.logger.log_debug("Restarting scraper", __name__)

if __name__ == "__main__":
    with Logger(level="all", handle=open(sys.argv[1], "a")) as l:
        past = ScraperThread(PastScraper, l)
        stream = ScraperThread(StreamScraper, l)
        past.start()
        stream.start()
        past.join()
        stream.join()
However, it appears that my call to doScrape above returns immediately: "Starting!" is printed to the console repeatedly, and the "Done" message in the finally block is never written to the log. When run individually, like so:
if __name__ == "__main__":
    # Example instantiation
    from Scrapers.Logging import Logger
    with Logger(level="all", handle=open(sys.argv[1], "a")) as l:
        doScrape(l)
The code runs forever, as expected. I'm a bit stumped.
Is there anything silly that I might have missed?
Get rid of the diaper pattern in your run() method; that is, get rid of that catch-all exception handler. You'll probably see the error printed there then. I think there may be something wrong in the DBWriter or other code you're calling from your doScrape function. Perhaps it is not thread-safe. That would explain why running it from the main program directly works, but calling it from a thread fails.
Aha, solved it! It was actually that I didn't realise that a default argument (here in TwitterAuth()) is evaluated at definition time. TwitterAuth reads the API key settings from a file handle, and the default argument opens up the default config file. Since this file handle is generated at definition time, both threads had the same handle, and once one had read it, the other one tried to read from the end of the file, throwing an exception. This is remedied by resetting the file before use, and using a mutex.
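A small illustration of the pitfall (hypothetical names, not the actual TwitterAuth code); the default is evaluated once, when the def statement runs, so every call shares the same handle:

import StringIO

def read_key(handle=StringIO.StringIO("api-key\n")):  # default built once, at definition time
    return handle.readline()

print repr(read_key())  # 'api-key\n'
print repr(read_key())  # ''  -- the second caller reads from EOF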
Cheers to Irmen de Jong for pointing me in the right direction.

Block python's atexit during a crash?

I have a Python script which uses atexit.register() to run a function that persists a list of dictionaries when the program exits. However, this code also runs when the script exits due to a crash or runtime error, which usually results in the data becoming corrupted.
Is there any way to block it from running when the program exits abnormally?
EDIT: To clarify, this involves a program using Flask, and I'm trying to prevent the data persistence code from running on an exit that results from an error being raised.
You don't want to use atexit with Flask. You want to use Flask signals. It sounds like you are specifically looking for the request_finished signal.
from flask import request_finished

def request_finished_handler(sender, response, **extra):
    sender.logger.debug('Request context is about to close down. '
                        'Response: %s', response)
    # do some fancy storage stuff.

request_finished.connect(request_finished_handler, app)
The benefit of request_finished is that it only fires after a successful response. That means that so long as there isn't an error in another signal, you should be good.
One way: at global level in the main program:
abnormal_termination = False

def your_cleanup_function():
    # Add next two lines at the top
    if abnormal_termination:
        return
    # ...

# At end of main program:
try:
    main()  # your original code goes here
except Exception:  # replace according to what *you* consider "abnormal"
    abnormal_termination = True  # stop atexit handler
Not pretty, but straightforward ;-)
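Put together, a minimal runnable sketch of that pattern (the persistence step here is a stand-in print):

import atexit

abnormal_termination = False

def persist_data():
    if abnormal_termination:
        print "crash detected -- skipping persistence"
        return
    print "persisting data"  # stand-in for the real save

atexit.register(persist_data)

try:
    raise RuntimeError("boom")  # stand-in for the program crashing
except Exception:
    abnormal_termination = True  # the atexit handler will now bail out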

Performing an action upon unexpected exit in Python

I was wondering if there was a way to perform an action before the program closes. I am running a program over a long time and I want to be able to close it and have the data saved in a text file or something, but there is no way for me to interfere with the while True loop I have running, and simply saving the data each loop would be highly inefficient.
So is there a way that I can save data, say a list, when I hit the X or destroy the program? I have been looking at the atexit module but have had no luck, except when I set the program to finish at a certain point.
def saveFile(list):
    print "Saving List"
    with open("file.txt", "a") as test_file:
        test_file.write(str(list[-1]))

atexit.register(saveFile(list))
That is my whole atexit part of the code and like I said, it runs fine when I set it to close through the while loop.
Is this possible, to save something when the application is terminated?
Your atexit usage is wrong. It expects a function and its arguments, but you're just calling your function right away and passing the result to atexit.register(). Try:
atexit.register(saveFile, list)
Be aware that this uses the list reference as it exists at the time you call atexit.register(), so if you assign to list afterwards, those changes will not be picked up. Modifying the list itself without reassigning should be fine, though.
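A quick runnable sketch of both points (save_file and data are stand-ins):

import atexit

def save_file(lst):
    print "Saving", lst

data = [1, 2]
atexit.register(save_file, data)  # correct: runs save_file(data) at exit

data.append(3)  # seen at exit: the registered list object was mutated
data = [9, 9]   # NOT seen: the handler still holds the original list
# At exit this prints: Saving [1, 2, 3]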
You could use the handle_exit context manager from this ActiveState recipe:
http://code.activestate.com/recipes/577997-handle-exit-context-manager/
It handles SystemExit, KeyboardInterrupt, SIGINT, and SIGTERM, with a simple interface:
def cleanup():
    print 'do some cleanup here'

def main():
    print 'do something'

if __name__ == '__main__':
    with handle_exit(cleanup):
        main()
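The recipe itself is at the link above; a rough sketch of how such a context manager can be built (not the recipe's exact code) is:

import signal, sys
from contextlib import contextmanager

@contextmanager
def handle_exit(cleanup):
    def handler(signum, frame):
        sys.exit(1)  # turn SIGINT/SIGTERM into SystemExit
    old_int = signal.signal(signal.SIGINT, handler)
    old_term = signal.signal(signal.SIGTERM, handler)
    try:
        yield
    except (SystemExit, KeyboardInterrupt):
        pass  # swallow the exit so cleanup below always runs
    finally:
        signal.signal(signal.SIGINT, old_int)
        signal.signal(signal.SIGTERM, old_term)
        cleanup()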
There's nothing you can do in reaction to a SIGKILL. It kills your process immediately, without any allowed cleanup.
Catch the SystemExit exception at the top of your application, then rethrow it.
There are a couple of approaches to this. As some have commented, you could use signal handling ... your [Ctrl]+[C] from the terminal where this is running in the foreground dispatches a SIGINT signal to your process (from the terminal's driver).
Another approach would be to use a non-blocking os.read() on sys.stdin.fileno, such that you're polling your keyboard once during every loop to see if an "exit" keystroke or sequence has been entered.
A similarly non-blocking polling approach can be implemented using the select module's functionality. I've seen that used with the termios and tty modules. (It seems inelegant that it needs all those to save, set changes to, and restore the terminal settings, and I've also seen some examples using os and fcntl; I'm not sure when or why one would prefer one over the other if os.isatty(sys.stdin.fileno()).)
Yet another approach would be to use the curses module with window.nodelay() or window.timeout() to set your desired input behavior, and then either window.getch() or window.getkey() to poll for any input. A sketch of the select-based variant follows.
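A rough sketch of the select-based variant (POSIX; terminal input is line-buffered, so type "quit" and press Enter):

import select, sys, time

data = []
while True:
    # zero timeout: returns immediately instead of blocking on stdin
    readable, _, _ = select.select([sys.stdin], [], [], 0)
    if readable and sys.stdin.readline().strip() == "quit":
        break
    data.append("work")  # stand-in for the real loop body
    time.sleep(0.1)

print "saving %d items" % len(data)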

Execute Python code without invoking the import statement each time

Here is a sample Python script. How do I run this script multiple times from the command line so that the import line is not called every time? The import statement takes too long to load.
import arcpy
val = arcpy.GetCellValue_management("D:\dem-merged\lidar_wsg84", "-95.090174910630012 29.973962146120652", "")
print str(val)
This problem has no solution if you strictly want this script to be called from another program by issuing 'python script.py' on the command line.
If you want to do the "heavy import" only once, you have to start the Python script only once.
Think about starting a daemon, which will start once and then process calls from other programs. This way all initialization has to be done only once, and subsequent calls will be fast.
And if you split your Python code into two parts (one for the daemon, one for the daemon client), you'll be able to call 'python client.py' from another program, but the actual computation will be performed by the daemon, which is started just once.
As an example:
daemon.py
import socket
#import arcpy

def actual_work():
    #val = arcpy.GetCellValue_management("D:\dem-merged\lidar_wsg84", "-95.090174910630012 29.973962146120652", "")
    #return str(val)
    return 'dummy_reply'

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(('127.0.0.1', 6666))
        while True:
            data, addr = sock.recvfrom(4096)
            reply = actual_work()
            sock.sendto(reply, addr)
    except KeyboardInterrupt:
        pass
    finally:
        sock.close()

if __name__ == '__main__':
    main()
client.py
import socket
import sys

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1)
    try:
        sock.sendto('', ('127.0.0.1', 6666))
        reply, _ = sock.recvfrom(4096)
        print reply
    except socket.timeout:
        sys.exit(1)
    finally:
        sock.close()

if __name__ == '__main__':
    main()
It's virtually impossible. Once you leave the interpreter, the modules that were imported are no longer in memory. It's similar to asking Firefox to keep large webpages in memory because reading from the cache takes too long: once Firefox (or Python) is shut off, it's pretty much bye-bye to anything in RAM.
You can make the load time a bit faster, but at your own risk, by running
python -O
You can also add another 'O' (python -OO) to make it go just a bit faster still. However, this can make some programs buggy and doesn't always work.
You could copy the functions you need into your program by doing
from arcpy import <what you need>
and that might make things go slightly faster.
As far as I know the module gets imported once. So if you do:
import a
import a
it only gets imported once. So instead of running the script many times, maybe you can change it to make all the copies in one go.
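You can see the caching in sys.modules:

import sys

import json  # first import: the module is loaded and cached
assert 'json' in sys.modules
import json  # second import: a no-op; Python reuses the cached module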
If you have to run this specific script many times, I think you can't avoid the import and you'll have to import it every time.
One solution I can think of is to have a server process that runs persistently that does the actual work, while the script that's actually invoked from the command line merely issues requests to that script. This is a fair bit of work, but it may be worth it.
The only solution I can think of is to copy the individual function(s) you need into your code manually, if what you need to execute is small enough.
If you need help on how to do this, just ask in the comments.
Looking at your use case (calling it from a Ruby on Rails webservice), one of the easiest ways would be to use XML-RPC. Use the SimpleXMLRPCServer from the Python standard lib, and then use a Ruby client (Ruby seems to have xmlrpc in its standard lib).
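A sketch of the server side (the arcpy call is taken from the question; get_cell_value is a name chosen here):

# server.py -- started once, so the slow import is paid a single time
import arcpy
from SimpleXMLRPCServer import SimpleXMLRPCServer

def get_cell_value(raster, point):
    return str(arcpy.GetCellValue_management(raster, point, ""))

server = SimpleXMLRPCServer(("127.0.0.1", 8000))
server.register_function(get_cell_value)
server.serve_forever()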
Easy.
Write your own simple shell using the cmd module and use the runpy module to run your scripts. Import your big module in the shell program and pass it to the scripts using init_globals.
Look through the docs for http://pypi.python.org/pypi/cmd2/ and it should be fairly clear how you can write your own simple shell, even if it just has two commands, one to edit a file and one to run it.
runpy is part of the Python standard library http://docs.python.org/library/runpy.html and you may not need it, but it is useful to know that the import and module loading mechanism can be controlled and even modified by your command shell.
Have you ever wondered where the name "var1" goes when you execute something like var1 = 25? How does Python find what var1 refers to when you later execute print var1? The answer is that these names are in a dictionary and if you understand what Python dictionaries are and what they can do, it seems like an obvious solution to the problem of connecting names with values. But there's more. Python can have lots of namespaces and you can manipulate those namespaces the same way you manipulate dictionaries. Read this http://www.diveintopython.net/html_processing/locals_and_globals.html to understand the locals and globals namespace. Here is another discussion that will help http://lucumr.pocoo.org/2011/2/1/exec-in-python/
Play around with exec, as in the question "globals and locals in python exec()", until you understand how it works. Then build your command shell to import the module one time at the beginning, and write your scripts to only import the module if it is not already available. When a script is run from inside your shell, the module will already be there.
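A bare-bones sketch of such a shell (two commands; runpy.run_path needs Python 2.7+, and the arcpy import is the one-time cost):

import cmd, runpy
import arcpy  # paid once, when the shell starts

class ArcShell(cmd.Cmd):
    prompt = 'arcpy> '

    def do_run(self, path):
        """run <script.py> -- execute a script with arcpy preloaded"""
        runpy.run_path(path, init_globals={'arcpy': arcpy})

    def do_quit(self, line):
        return True  # returning True exits the cmdloop

if __name__ == '__main__':
    ArcShell().cmdloop()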

Overriding basic signals (SIGINT, SIGQUIT, SIGKILL??) in Python

I'm writing a program that adds normal UNIX accounts (i.e. modifying /etc/passwd, /etc/group, and /etc/shadow) according to our corp's policy. It also does some slightly fancy stuff like sending an email to the user.
I've got all the code working, but there are three pieces of code that are very critical, which update the three files above. The code is already fairly robust because it locks those files (ex. /etc/passwd.lock), writes to a temporary file (ex. /etc/passwd.tmp), and then overwrites the original file with the temporary one. I'm fairly pleased that it won't interfere with other running versions of my program or the system useradd, usermod, passwd, etc. programs.
The thing that I'm most worried about is a stray ctrl+c, ctrl+d, or kill command in the middle of these sections. This has led me to the signal module, which seems to do precisely what I want: ignore certain signals during the "critical" region.
I'm using an older version of Python, which doesn't have signal.SIG_IGN, so I have an awesome "pass" function:
def passer(*a):
    pass
The problem that I'm seeing is that signal handlers don't work the way that I expect.
Given the following test code:
def passer(a=None, b=None):
    pass

def signalhander(enable):
    signallist = (signal.SIGINT, signal.SIGQUIT, signal.SIGABRT, signal.SIGPIPE, signal.SIGALRM, signal.SIGTERM, signal.SIGKILL)
    if enable:
        for i in signallist:
            signal.signal(i, passer)
    else:
        for i in signallist:
            signal.signal(i, abort)
    return

def abort(a=None, b=None):
    sys.exit('\nAccount was not created.\n')
    return

signalhander(True)
print('Enabled')
time.sleep(10)  # ^C during this sleep
The problem with this code is that a ^C (SIGINT) during the time.sleep(10) call causes that function to stop, and then my signal handler takes over as desired. However, that doesn't solve my "critical" region problem above, because I can't tolerate having whatever statement encounters the signal fail.
I need some sort of signal handler that will just completely ignore SIGINT and SIGQUIT.
The Fedora/RH command "yum" is written in Python and does basically exactly what I want. If you do a ^C while it's installing anything, it will print a message like "Press ^C within two seconds to force kill." Otherwise, the ^C is ignored. I don't really care about the two-second warning, since my program completes in a fraction of a second.
Could someone help me implement a signal handler for CPython 2.3 that doesn't cause the current statement/function to cancel before the signal is ignored?
As always, thanks in advance.
Edit: After S.Lott's answer, I've decided to abandon the signal module.
I'm just going to go back to try: except: blocks. Looking at my code, there are two things that happen for each critical region that cannot be aborted: overwriting the file with file.tmp, and removing the lock once finished (otherwise other tools will be unable to modify the file until it is manually removed). I've put each of those in its own function inside a try: block, and the except: simply calls the function again. That way the function will just re-call itself in the event of KeyboardInterrupt or EOFError, until the critical code is completed.
I don't think I can get into too much trouble, since I'm only catching user-provided exit commands, and even then, only for two to three lines of code. Theoretically, if those exceptions could be raised fast enough, I suppose I could get the "maximum recursion depth exceeded" error, but that seems far out.
Any other concerns?
Pseudo-code:
def criticalRemoveLock(file):
    try:
        if os.path.isfile(file):
            os.remove(file)
        else:
            return True
    except (KeyboardInterrupt, EOFError):
        return criticalRemoveLock(file)

def criticalOverwrite(tmp, file):
    try:
        if os.path.isfile(tmp):
            shutil.copy2(tmp, file)
            os.remove(tmp)
        else:
            return True
    except (KeyboardInterrupt, EOFError):
        return criticalOverwrite(tmp, file)
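For reference, on Pythons where signal.SIG_IGN is available, the signal-based guard the question originally wanted looks roughly like this (save the old handlers, ignore during the critical region, then restore; SIG_IGN ignores the signal at the OS level, so the current statement is not interrupted):

import signal

def critical(work):
    old = {}
    for sig in (signal.SIGINT, signal.SIGQUIT):
        old[sig] = signal.signal(sig, signal.SIG_IGN)
    try:
        work()  # e.g. the overwrite-and-unlock steps
    finally:
        for sig, handler in old.items():
            signal.signal(sig, handler)  # restore previous behaviour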
There is no real way to make your script really safe. Of course you can ignore signals and catch a keyboard interrupt using try: except:, but it is up to your application to be idempotent against such interrupts, and it must be able to resume operations after dealing with an interrupt at some kind of savepoint.
The only thing that you can really do is to work on temporary files (not the original files) and move them into the final destination after doing the work. I think such file operations are supposed to be "atomic" from the filesystem perspective. Otherwise, in case of an interrupt, restart your processing from the start with clean data.
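A sketch of that write-temp-then-rename pattern (os.rename is atomic on POSIX when source and destination are on the same filesystem; on Windows it fails if the destination exists):

import os

def atomic_write(path, data):
    tmp = path + '.tmp'
    f = open(tmp, 'w')
    try:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes hit the disk first
    finally:
        f.close()
    os.rename(tmp, path)  # readers see either the old or the new file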
