Is multiprocessing or threading appropriate in this case in Python/Django?

I have a function like this in Django:
def uploaded_files(request):
    global source
    global password
    global destination
    username = request.user.username
    log_id = request.user.id
    b = File.objects.filter(users_id=log_id, flag='F') # Get the user id from session .delete() to use delete
    source = 'sachet.adhikari@69.43.202.97:/home/sachet/my_files'
    password = 'password'
    destination = '/home/zurelsoft/my_files/'
    a = Host.objects.all() #Lists hosts
    command = subprocess.Popen(['sshpass', '-p', password, 'rsync', '--recursive', source],
                               stdout=subprocess.PIPE)
    command = command.communicate()[0]
    lines = (x.strip() for x in command.split('\n'))
    remote = [x.split(None, 4)[-1] for x in lines if x]
    base_name = [os.path.basename(ok) for ok in remote]
    files_in_server = base_name[1:]
    total_files = len(files_in_server)
    info = subprocess.Popen(['sshpass', '-p', password, 'rsync', source, '--dry-run'],
                            stdout=subprocess.PIPE)
    information = info.communicate()[0]
    command = information.split()
    filesize = command[1]
    #st = int(os.path.getsize(filesize))
    #filesize = size(filesize, system=alternative)
    date = command[2]
    users_b = User.objects.all()
    return render_to_response('uploaded_files.html', {'files': b, 'username':username, 'host':a, 'files_server':files_in_server, 'file_size':filesize, 'date':date, 'total_files':total_files, 'list_users':users_b}, context_instance=RequestContext(request))
The main purpose of the function is to transfer a file from the server to the local machine and write the data into the database. What I want: there is a single 10GB file, which will take a long time to copy. Since the copying happens using rsync on the command line, I want to let the user use other menus while the file is being transferred. How can I achieve that? For example, when the user presses OK, the transfer starts on the command line, and I want to show the user a "The file is being transferred" message and stop the busy cursor from spinning, or something like that. Is multiprocessing or threading appropriate in this case? Thanks

Assuming that function runs inside a view, your browser will time out before the 10GB file has finished transferring. Maybe you should rethink your architecture for this?
There are probably several ways to do this, but here are some that come to mind right now:
One solution is to have an intermediary store the status of the file transfer. Before you begin the process that transfers the file, set a flag somewhere, such as in a database, saying the process has begun. Then, if you make your subprocess call blocking, wait for it to complete, check the output of the command if possible, and update the flag you set earlier.
Then have whatever front end you have poll the status of the file transfer.
Another solution: if you make the subprocess call non-blocking, as in your example, use a thread that sits there reading the stdout and updating an intermediary store, which your front end can query to get a more 'real time' update of the transfer process.
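A minimal sketch of the thread-plus-status-store idea, using only the standard library. The in-memory dictionary, the job IDs and the helper names (start_transfer, get_status) are assumptions made for illustration and do not come from the question; with several web worker processes the status would have to live in a database or cache instead.

```python
import subprocess
import sys
import threading

# Hypothetical in-process status store. A real deployment would more likely
# use a database row or a cache so that every web worker sees the same state.
_status = {}
_status_lock = threading.Lock()


def set_status(job_id, state):
    with _status_lock:
        _status[job_id] = state


def get_status(job_id):
    """What a polling view could return to the front end."""
    with _status_lock:
        return _status.get(job_id, "unknown")


def run_transfer(job_id, cmd):
    """Run cmd to completion and record the outcome under job_id."""
    set_status(job_id, "running")
    try:
        returncode = subprocess.call(cmd)  # blocks this worker thread only
    except OSError:
        returncode = -1  # the command could not be started at all
    set_status(job_id, "done" if returncode == 0 else "failed")


def start_transfer(job_id, cmd):
    """Start the transfer in a background thread and return immediately."""
    set_status(job_id, "queued")
    worker = threading.Thread(target=run_transfer, args=(job_id, cmd))
    worker.daemon = True
    worker.start()
    return worker


if __name__ == "__main__":
    # Stand-in command; in the question this would be the sshpass/rsync list.
    t = start_transfer("demo", [sys.executable, "-c", "pass"])
    t.join()
    print(get_status("demo"))
```

The view would call start_transfer and return its response right away, while a second, cheap view returns get_status(job_id) for the front end to poll.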

What you need is Celery.
It lets you spawn a job as a parallel task and return the HTTP response right away.

RaviU's solutions would certainly work.
Another option is to call a blocking subprocess in its own thread. This thread could be responsible for setting a flag or status information (in memcache, the database, or just a file on the hard drive) as well as clearing it when the transfer is complete. Personally, there is no love lost between me and parsing rsync's stdout, so I usually just ask the OS for the file size.
Also, if you don't need the file absolutely ASAP, adding "-c" to do a checksum can be good for those giant files. Source: personal experience trying to transfer giant video files over a spotty campus network.
I will say the one problem with all of the solutions so far is that they don't scale to "N" files. Even if you make sure each file can only be transferred once at a time, a lot of different files will eventually bog down the system. You might be better off using some sort of task queue unless you know it will only ever be one file at a time. I haven't used one recently, but a quick Google search yielded Celery, which doesn't look too bad.
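To illustrate the "N files" concern without pulling in a full task queue, here is a rough standard-library sketch that caps the number of simultaneous transfers and refuses to start the same file twice. It is a stand-in for what a task queue such as Celery would provide, not a replacement for one; MAX_PARALLEL_TRANSFERS and submit_transfer are hypothetical names.

```python
import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_TRANSFERS = 2  # assumed limit; tune for the machine

_pool = ThreadPoolExecutor(max_workers=MAX_PARALLEL_TRANSFERS)
_in_flight = {}  # file name -> Future of the transfer queued or running
_in_flight_lock = threading.Lock()


def _transfer(name, cmd):
    """Run the transfer command, then forget the file so it can be re-queued."""
    try:
        return subprocess.call(cmd)
    finally:
        with _in_flight_lock:
            _in_flight.pop(name, None)


def submit_transfer(name, cmd):
    """Queue a transfer unless the same file is already queued or running."""
    with _in_flight_lock:
        future = _in_flight.get(name)
        if future is None:
            future = _pool.submit(_transfer, name, cmd)
            _in_flight[name] = future
        return future


if __name__ == "__main__":
    job = submit_transfer("demo", [sys.executable, "-c", "pass"])
    print(job.result())  # exit status of the command
```

Extra submissions beyond the limit simply wait in the pool's queue, so a burst of requests cannot start an unbounded number of rsync processes.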

Every web server has a facility for uploading files. For large files, it divides the file into chunks and merges them once every chunk has been received. What you can do here is put a hidden tag with a value attribute in your HTML page. Whenever your upload web service returns an OK message, change the hidden value to something relevant, and write a function that keeps reading the value of that hidden element to check whether the file upload has finished.

Related

Python thread fails to time out when terminal command requires user input

The Task
I'm building a Python script, the purpose of which is to audit a number of .tex files, and one step in this auditing process is to test whether each file will compile, each file being compiled using the terminal command xelatex filename.tex.
These are the methods with which I'm testing whether a given file compiles:
def run_xelatex(self):
    """ Ronseal. """
    self.latex_process = Popen(["xelatex", "current.tex"], stdout=PIPE)
    lines = self.latex_process.stdout.readlines()
    for line in self.latex_process.stdout:
        self.screentext = self.screentext+line.decode("utf-8")+"\n"
def attempt_to_compile(self):
    """ Attempt to compile an article, and kill the process if
    necessary. """
    thread = Thread(target=self.run_xelatex())
    thread.start()
    thread.join(3)
    if thread.is_alive():
        self.latex_process.kill()
        thread.join()
        return False
    return True
In English: I create a thread, which in turn creates a process, which in turn tries to compile a given file. If the thread times out, then that file is marked as being uncompilable.
The Problem
The problem is that, if xelatex finds some bad syntax, it asks the user for manual input in order to resolve the issue. But then, for some reason, the thread does not time out when the process is waiting for user input. This means that, when I try to run the script, it stops in mid-flow at several points, until I mash the return key to get things going again. This is not ideal.
What I Want
An explanation of why a thread fails to time out when a process within it asks for user input.
A solution to the problem, either by forcing the thread to time out in the above circumstances, or by preventing xelatex from asking for user input.
Alternatively, an explanation for why what I'm trying to achieve is totally insane, and a suggestion for a better line of attack.
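Two things in the code above are worth noting. Thread(target=self.run_xelatex()) calls run_xelatex immediately in the calling thread, because of the trailing parentheses, and only passes its return value (None) as the target, so thread.join(3) never gets anything to time out on. And since the child inherits the terminal's stdin, xelatex can sit waiting for keyboard input. A possible line of attack, sketched here under the assumption that Python 3.5+ is available, is to drop the thread and let subprocess.run enforce the timeout; the helper name compiles is hypothetical.

```python
import subprocess
import sys


def compiles(cmd, timeout=3):
    """Return True if cmd exits with status 0 within timeout seconds."""
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # the child can never wait on the keyboard
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,            # on expiry the child is killed
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in command so the sketch runs without a TeX installation.
    print(compiles([sys.executable, "-c", "pass"]))
```

For the audit itself the call might look like compiles(["xelatex", "-interaction=nonstopmode", "-halt-on-error", "current.tex"]), where the two flags ask xelatex not to prompt and to stop at the first error; the 3-second limit is taken from the question and may need raising for larger documents.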

Faster alternatives to Popen for CAN bus access?

I'm currently using Popen to send instructions to a utility (canutils... the cansend function in particular) via the command line.
The entire function looks like this.
def _CANSend(self, register, value, readWrite = 'write'):
    """send a CAN frame"""
    queue=self.CANbus.queue
    cobID = hex(0x600 + self.nodeID) #assign nodeID
    indexByteLow,indexByteHigh,indexByteHigher,indexByteHighest = _bytes(register['index'], register['objectDataType'])
    subIndex = hex(register['subindex'])
    valueByteLow,valueByteHigh,valueByteHigher,valueByteHighest = _bytes(value, register['objectDataType'])
    io = hex(COMMAND_SPECIFIER[readWrite])
    frame = ["cansend", self.formattedCANBus, "-i", cobID, io, indexByteLow, indexByteHigh, subIndex, valueByteLow, valueByteHigh, valueByteHigher, valueByteHighest, "0x00"]
    Popen(frame,stdout=PIPE)
    a=queue.get()
    queue.task_done()
    return a
I was running into some issues when trying to send frames in rapid succession (the Popen line is what actually executes the command that sends the frame), and found that the Popen line was taking somewhere on the order of 35 ms to execute... every other line took less than 2 us.
So... what might be a better way to invoke the cansend function (which, again, is part of the canutils utility... _CANSend is the Python function above that calls it) more rapidly?

I suspect that most of that time is due to the overhead of forking every time you run cansend. To get rid of it, you'll want an approach that doesn't have to create a new process for each send.
According to this blog post, SocketCAN is supported by Python 3.3. It should let your program create and use CAN sockets directly, without starting a new process for each frame. That's probably the direction you'll want to go.
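As a rough sketch of that direction, assuming Linux and Python 3.3 or later: the socket is opened once and reused, and each frame is packed with struct in the layout the kernel's raw CAN sockets expect. The helper names and the interface name used in the usage note are assumptions, and the payload shown is only an example, not the register layout from the question.

```python
import socket
import struct

# Layout of a classic SocketCAN frame: 32-bit CAN ID, 8-bit data length,
# 3 padding bytes, then 8 data bytes.
CAN_FRAME_FORMAT = "=IB3x8s"


def build_can_frame(can_id, data):
    """Pack a CAN ID and up to 8 payload bytes into a raw SocketCAN frame."""
    if len(data) > 8:
        raise ValueError("classic CAN frames carry at most 8 data bytes")
    return struct.pack(CAN_FRAME_FORMAT, can_id, len(data), data.ljust(8, b"\x00"))


def open_can_socket(interface):
    """Open a raw CAN socket once and reuse it for every frame (Linux only)."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    sock.bind((interface,))
    return sock


def send_frame(sock, can_id, data):
    """Send one frame on an already open socket; no new process is created."""
    sock.send(build_can_frame(can_id, data))
```

Usage would be along the lines of sock = open_can_socket("can0") once at start-up, then send_frame(sock, 0x600 + node_id, payload) per frame, where "can0" is an assumed interface name.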

Block execution until a file is created/modified

I have a Python HTTP server. On a certain GET request a file is created, which is then returned as the response. Creating the file, or modifying (updating) it, might take a second.
Hence, I cannot return the file as the response immediately. How do I approach such a problem? Currently I have a solution like this:
while not os.path.isfile('myfile'):
    time.sleep(0.1)
return myfile
This seems very inconvenient; is there possibly a better way?
A simple notification would do, but I don't have control over the process which creates/updates the files.

You could use Watchdog for a nicer way to watch the file system?

Something like this will remove the os call:
while updating:
    time.sleep(0.1)
return myfile
...
def updateFile():
    global updating
    # updating file
    updating = False

Implementing blocking I/O operations in synchronous HTTP requests is a bad approach. If many people run the same procedure simultaneously, you may soon run out of threads (if there is a limited thread pool). I'd do the following:
A client requests the file creation URI. A file-generating procedure is started in a background process (some asynchronous task system), and the user gets a file id / name in the HTTP response. Next, the client makes AJAX calls every once in a while (polling) to check whether the file has been created/modified (a separate file serve/check-if-exists URI). When the file is finally created, the user is redirected (js window.location) to the file-serving URI.
This approach will require a bit more work, but eventually it will pay off.

You can try using os.path.getmtime, which gives you the modification time of the file, and return the file if that is less than 1 second ago. I also suggest you only make a limited number of tries, or you will be stuck in an infinite loop if the file doesn't get created/modified. And as @Krzysztof Rosiński pointed out, you should probably think about doing it in a non-blocking way.
import os
import time

def wait_for_fresh_file(file_path, tries=10, delay=0.1, max_age=1.0):
    # Return file_path once it was modified less than max_age seconds ago.
    for _ in range(tries):
        try:
            if time.time() - os.path.getmtime(file_path) < max_age:
                return file_path
        except OSError:
            pass  # the file does not exist yet
        time.sleep(delay)
    return None  # gave up after the allotted number of tries

pygtk textview getbuffer and write at the same time

I am trying to make a program (in Python) that, as I write, saves the text to a file and opens it in a certain window that I have already created. I have looked all around for a viable solution, but it would seem that multi-threading may be the only option.
I was hoping that when the option autorun is "activated" it will:
while 1:
    wbuffer = textview.get_buffer()
    text = wbuffer.get_text(wbuffer.get_start_iter(), wbuffer.get_end_iter())
    openfile = open(filename,"w")
    openfile.write(text)
    openfile.close()
I am using pygtk and have a textview window, but when I get the buffer it sits forever.
I am thinking that I need to multi-thread it and queue it so one thread will be writing the buffer while it is being queued.
My source is here. (I think the statement is at line 177.)
Any help is much appreciated. :)
And here is the function:
def autorun(save):
    filename = None
    chooser = gtk.FileChooserDialog("Save File...", None,
                                    gtk.FILE_CHOOSER_ACTION_SAVE,
                                    (gtk.STOCK_CANCEL, gtk.RESPONSE_CANCEL,
                                     gtk.STOCK_SAVE, gtk.RESPONSE_OK))
    response = chooser.run()
    if response == gtk.RESPONSE_OK: filename = chooser.get_filename()
    filen = filename
    addr = (filename)
    addressbar.set_text("file://" + filename)
    web.open(addr)
    chooser.destroy()
    wbuffer = textview.get_buffer()
    while 1:
        text = wbuffer.get_text(wbuffer.get_start_iter(), wbuffer.get_end_iter())
        time.sleep(1)
        openfile = open(filename,"w")
        openfile.write(text)
        openfile.close()

Though it is not easy to see exactly what the GTK code not included here is doing, the main problem is that control needs to be returned to the GTK main loop. Otherwise the program will hang.
So if you have a long process (like this eternal one here), then you need to thread it. The problem is that you need the thread to exit nicely when the main program quits, so you'll have to redesign a bit around that. Also, threading with GTK needs to be initialized correctly (look here).
However, I don't think you need threading. Instead, you could connect the changed signal of your TextBuffer to a function that writes the buffer to the target file (if the user has put the program in autorun mode). A problem arises if the buffer gets large or the program is slow, in which case you should consider threading the callback of the changed signal. This solution requires making sure you don't get into a situation where save requests pile up because the user types faster than the computer saves. It takes some design thought.
So, finally, the easier solution: you may not want the buffer to be saved on every key press. In that case, you could put the save function (which could look like your first code block without the loop) on a timeout instead. Just don't make the timeout too short.
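The timeout idea could be sketched roughly as follows. The AutoSaver class is a hypothetical helper that is independent of GTK, so it can be tried on its own; the PyGTK wiring is shown only as a comment, using the names from the question and a 2-second interval chosen arbitrarily.

```python
import os
import tempfile


class AutoSaver(object):
    """Write text to a file, skipping the write when nothing has changed."""

    def __init__(self, filename):
        self.filename = filename
        self.last_saved = None

    def save(self, text):
        """Save text if it differs from the last saved copy.

        Always returns True so that, used as a timeout callback, the
        timer keeps firing.
        """
        if text != self.last_saved:
            with open(self.filename, "w") as f:
                f.write(text)
            self.last_saved = text
        return True


# With PyGTK the callback could be registered roughly like this
# (not run here; textview and filename come from the question):
#
#   saver = AutoSaver(filename)
#   def on_timeout():
#       wbuffer = textview.get_buffer()
#       return saver.save(wbuffer.get_text(wbuffer.get_start_iter(),
#                                          wbuffer.get_end_iter()))
#   gobject.timeout_add(2000, on_timeout)  # interval in milliseconds

if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "autosave-demo.txt")
    AutoSaver(demo_path).save("some text")
    print(open(demo_path).read())
```

Because the callback returns quickly and control goes back to the main loop between calls, the window stays responsive without any threads.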

UNIX named PIPE end of file

I'm trying to use a unix named pipe to output statistics of a running service. I intend to provide a similar interface as /proc where one can see live stats by catting a file.
I'm using code similar to this in my Python program:
while True:
    f = open('/tmp/readstatshere', 'w')
    f.write('some interesting stats\n')
    f.close()
/tmp/readstatshere is a named pipe created by mknod.
I then cat it to see the stats:
$ cat /tmp/readstatshere
some interesting stats
It works fine most of the time. However, if I cat the entry several times in quick succession, sometimes I get multiple lines of some interesting stats instead of one. Once or twice, it has even gone into an infinite loop, printing that line forever until I killed it. The only fix I've found so far is to put a delay of, say, 500ms after f.close() to prevent this issue.
I'd like to know why exactly this happens and if there is a better way of dealing with it.
Thanks in advance

A pipe is simply the wrong solution here. If you want to present a consistent snapshot of the internal state of your process, write that to a temporary file and then rename it to the "public" name. This will prevent all issues that can arise from other processes reading the state while you're updating it. Also, do NOT do that in a busy loop, but ideally in a thread that sleeps for at least one second between updates.
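The write-then-rename approach might look like the sketch below, assuming Python 3.3+ for os.replace. The function name publish_stats is hypothetical. The temporary file is created next to the public file so that the rename stays on one filesystem, which is what makes it atomic on POSIX systems.

```python
import os
import tempfile


def publish_stats(text, public_path):
    """Atomically replace public_path with a file containing text."""
    directory = os.path.dirname(os.path.abspath(public_path))
    # Same directory as the target, so the final rename never crosses
    # a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".stats-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        # mkstemp creates the file readable by its owner only; open it up
        # so other users can cat the published file.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, public_path)
    except BaseException:
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "readstatshere")
    publish_stats("some interesting stats\n", target)
    print(open(target).read())
```

A reader that opens the public name sees either the old snapshot or the new one in full, never a half-written file.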

What about a UNIX socket instead of a pipe?
In this case, you can react on each connect by providing fresh data just in time.
The only downside is that you cannot cat the data; you'll have to create a new socket handle and connect() to the socket file.
MYSOCKETFILE = '/tmp/mysocket'
import socket
import os
try:
    os.unlink(MYSOCKETFILE)  # remove a stale socket file from a previous run
except OSError: pass
s = socket.socket(socket.AF_UNIX)
s.bind(MYSOCKETFILE)
s.listen(10)
while True:
    s2, peeraddr = s.accept()
    s2.send(b'These are my actual data')  # bytes literal works on Python 2.6+ and 3
    s2.close()
Program querying this socket:
MYSOCKETFILE = '/tmp/mysocket'
import socket
import os
s = socket.socket(socket.AF_UNIX)
s.connect(MYSOCKETFILE)
while True:
    d = s.recv(100)
    if not d: break  # the server closed the connection
    print(d)
s.close()

I think you should use FUSE.
It has Python bindings, see http://pypi.python.org/pypi/fuse-python/
This allows you to compose answers to questions formulated as POSIX filesystem system calls.

Don't write to an actual file. That's not what /proc does. Procfs presents a virtual (non-disk-backed) filesystem which produces the information you want on demand. You can do the same thing, but it'll be easier if it's not tied to the filesystem. Instead, just run a web service inside your Python program, and keep your statistics in memory. When a request comes in for the stats, formulate them into a nice string and return them. Most of the time you won't need to waste cycles updating a file which may not even be read before the next update.
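A small sketch of the in-process web service idea, assuming Python 3 and only the standard library. The stats dictionary, its single counter, and the helper names are made up for illustration; the point is that the string is built only when a request arrives.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory statistics, updated by the service as it runs.
stats = {"requests_served": 0}
stats_lock = threading.Lock()


class StatsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Format the current statistics only when somebody asks for them.
        with stats_lock:
            body = json.dumps(stats).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the service's own output free of access logs


def start_stats_server(port=0):
    """Serve stats on localhost in a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), StatsHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server


def fetch_stats(port):
    """Client side: read the stats back as a dictionary."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", "/")
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_stats_server()
    print(fetch_stats(server.server_address[1]))
    server.shutdown()
```

Instead of cat, a command-line HTTP client pointed at the chosen port would show the live numbers.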

You need to unlink the pipe after you issue the close. I think this is because there is a race condition where the pipe can be opened for reading again before cat finishes and it thus sees more data and reads it out, leading to multiples of "some interesting stats."
Basically you want something like:
while True:
    os.mkfifo(the_pipe)
    f = open(the_pipe, 'w')
    f.write('some interesting stats')
    f.close()
    os.unlink(the_pipe)
Update 1: call to mkfifo
Update 2: as noted in the comments, there is a race condition in this code as well with multiple consumers.
