I have a Python script that is called from a Java program.
The Java program feeds data to the Python script via sys.stdin, and receives data back from the Python process's output stream.
Here is what is known:
Running the command 'python script.py' from the Java program on 10MB of data takes about 35 seconds.
However, running 'python script.py > temp.data' and then cat temp.data is significantly faster.
The performance gap becomes even more drastic as the data gets larger.
To address this, I am thinking maybe there is a way to change sys.stdout to mimic what I am doing.
Or maybe I can pipe the Python script's output to a virtual file.
Any recommendations?
This is probably a buffering problem when you have the Java program writing to one filehandle and reading from another filehandle. The ordering of those operations on the Java side, and the size of the writes, is suboptimal and it's slowing itself down.
I would try "python -u script.py" to see what it does when you ask Python to unbuffer, which should be slower overall but might trick your calling program into interleaving its reads and writes differently, perhaps faster.
The larger fix, I think, is to batch your output, as you are already doing in your test, and read the resulting file, or to use POSIX select() or filehandle events to control how your Java program times its writes and reads.
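On the Python side, one cheap experiment in the other direction is to re-wrap sys.stdout with a much larger block buffer, so Python hands the pipe fewer, bigger writes. A minimal sketch, where the 1 MiB size is an arbitrary guess you would tune:

import os
import sys

# Re-wrap stdout with a 1 MiB block buffer so output reaches the Java
# side in a few large writes instead of many small ones; flush (or exit
# cleanly) at the end so the tail of the data is not lost.
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 1 << 20)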
I am writing a Python program to analyze log files. So basically I have about 30000 medium-size log files and my Python script is designed to perform some simple (line-by-line) analysis of each log file. Roughly it takes less than 5 seconds to process one file.
So once I set up the processing, I just left it there, and when I came back after about 14 hours my Python script had simply paused right after analyzing one log file; it seems it never wrote the analysis output for that file to the file system, and that's it. No more progress.
I checked the memory usage and it seems fine (less than 1G). I also tried writing to the file system (a touch test), which also works as normal. So my question is: how should I proceed to debug this issue? Could anyone share some thoughts on that? I hope this is not too general. Thanks.
You may use the trace module ("Trace or track Python statement execution") and/or pdb, the Python debugger module.
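For example, running the script under the trace module prints each source line as it executes, which makes the stall point obvious at the cost of being very slow (substitute your script's real name):

python -m trace --trace analyze_logs.py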
Try this tool https://github.com/khamidou/lptrace with the command:
sudo python lptrace -p <process_id>
It will print every Python function your program invokes and may help you understand where your program is stuck or whether it is in an infinite loop.
If it does not output anything, your program is probably stuck, so try
pstack <process_id>
to check the stack trace and find out where it is stuck. The output of pstack is C frames, but I believe you can still find something useful in it to solve your problem.
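Another option, not mentioned above but worth knowing: on Python 3 the standard-library faulthandler module can dump the Python traceback of every thread on demand, which tells you the exact line the process is stuck on. A minimal sketch:

import faulthandler
import signal

# Put this near the top of the analysis script. Later, from a shell, run
#   kill -USR1 <process_id>
# and the Python traceback of every thread is printed to stderr.
faulthandler.register(signal.SIGUSR1)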
So this one is a doozie, and a little too specific to find an answer online.
I am writing to a file in C++ and reading that file in Python at the same time to move a robot. Or trying to.
When I try running both programs at the same time, the C++ one runs first and then the Python one runs.
Here's the command I use:
./ColorFollow & python fileToHex.py
This happens even if I switch the order of commands.
Even if I run them in different terminals (which is the same thing, just covering all bases).
Both the Python and C++ code read / write in 'infinite' loops, so these two should run until I say stop.
The code works fine; when the Python script finally runs the robot moves as intended. It's just that the code doesn't run at the same time.
Is there a way to make this happen, or is this impossible?
If you need more information, lemme know, but the code is pretty much what you'd expect it to be.
If you are using Linux, & puts ./ColorFollow in the background, so ColorFollow and fileToHex.py will run as two independent processes.
At the same time, the composition ./ColorFollow | python fileToHex.py looks interesting, because you redirect the stdout of ColorFollow to the stdin of fileToHex.py - this can synchronize the scripts: ColorFollow prints some code string upon exit, fileToHex.py reads it and exits as well.
Alternatively, I would create an empty flag file like /var/run/ColorFollow.flag and write 1 to it when one of the processes exits. Not a pipe - because we do not care which process starts first. So, if the next loop step of ColorFollow sees 1 in the file, it deletes the file and exits (meaning fileToHex already exited). The same for fileToHex - check for the flag file on each loop step and, if it exists, delete it and exit.
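A minimal sketch of that handshake on the Python side, with the robot-control work left as a placeholder:

import os

FLAG = "/var/run/ColorFollow.flag"

while True:
    # ... one step of reading the file and moving the robot ...
    if os.path.exists(FLAG):
        os.remove(FLAG)   # consume the flag so only one partner reacts
        break             # the C++ side has already exited

The C++ loop would do the mirror image: check for the file each iteration, and create it on exit.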
I know I can read the output of another script in Python by e.g. calling some_program | print_input.py and using sys.stdin in print_input.py like this:
import sys

if __name__ == '__main__':
    while True:
        print sys.stdin.read(1024)
But is it also possible to restart some_program and still get its output without restarting print_input.py?
The idea is that some_program may crash, so I will have to restart it without losing the current state of print_input.py.
Additional info that might be needed:
Launching some_program from within print_input.py using e.g. subprocess is not an option unfortunately.
Low latency requirements, so no (long) blocking calls.
The output of some_program is massive.
I can't modify some_program.
The elegant/usual solution would be to use named pipes. Create a pipe using mkfifo, pipe the output of some_program to it, and the Python script can just read from the pipe. Both programs can be restarted without issues.
I'm not sure about performance, but no disk I/O should be involved (even though the pipe appears as a file).
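A minimal sketch of the reader, assuming a FIFO created beforehand with mkfifo /tmp/some_program.fifo and a producer started with some_program > /tmp/some_program.fifo (the path is a placeholder):

import sys

FIFO = '/tmp/some_program.fifo'

while True:
    with open(FIFO) as f:          # blocks until a writer opens the pipe
        while True:
            chunk = f.read(1024)
            if not chunk:          # EOF: some_program exited or crashed
                break              # loop around, wait for the restart
            sys.stdout.write(chunk)

When some_program dies, the reader sees EOF, loops around, and blocks on open() again until the restarted producer reopens the FIFO, so print_input.py keeps its state.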
Another possibility would be to create a temporary file on a tmpfs or ramfs filesystem, have some_program write to it, and have the Python script repeatedly try reading it. But IMO this is strictly worse than using pipes.
I'm running an external process and I need to get its stdout immediately so I can push it to a textview. On GNU/Linux I can use "usePTY=True" to get the stdout line by line; unfortunately usePTY is not available on Windows.
I'm fairly new to twisted, is there a way to achieve the same result on Windows with some twisted (or python maybe) magic stuff?
on GNU/Linux I can use "usePTY=True" to get the stdout by line
Sort of! What usePTY=True actually does is create a PTY (a "pseudo-terminal" - the thing you always get when you log in to a shell on GNU/Linux unless you have a real terminal which no one does anymore :) instead of a boring old pipe. A PTY is a lot like a pipe but it has some extra features - but more importantly for you, a PTY is strongly associated with interactive sessions (ie, a user) whereas a pipe is pretty strongly associated with programmatic uses (think foo | bar - no user ever sees the output of foo).
This means that people tend to use existence of a PTY as stdout as a signal that they should produce output in a timely manner - because a human is waiting to see it. On the flip side, the existence of a regular old pipe as stdout is taken as a signal that another program is consuming the output and they should instead produce output in the most efficient way possible.
What this tends to mean in practice is that if a program has a PTY then it will line buffer its output and if it has a pipe then it will "block" buffer its output (usually gather up about 4kB of data before writing any of it) - because line buffering is less efficient.
The thing to note here is that it is the program you are running that does this buffering. Whether you pass usePTY=True or usePTY=False makes no direct difference to that buffering: it is just a hint to the program you are running what kind of output buffering it should do.
This means that you might run programs that block buffer even if you pass usePTY=True and vice versa.
However... Windows doesn't have PTYs. So programs on Windows can't consider PTYs as a hint for how to buffer their output.
I don't actually know if there is another hint that it is conventional for programs to respect on Windows. I've never come across one, at least.
If you're lucky, then the program you're running will have some way for you to request line-buffered output. If you're running Python, then it does - the PYTHONUNBUFFERED environment variable controls this, as does the -u command line option (and I think they both work on Windows).
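So if the child process happens to be Python, something like this sketch should behave the same on Windows as on GNU/Linux ("child.py" and the protocol body are placeholders for your actual process and textview code):

import sys
from twisted.internet import protocol, reactor

class TextviewFeeder(protocol.ProcessProtocol):
    def outReceived(self, data):
        sys.stdout.write(data)   # push to your textview here instead

# -u (and PYTHONUNBUFFERED=1, belt and braces) ask the child not to
# block-buffer its stdout, so outReceived fires promptly per line.
reactor.spawnProcess(TextviewFeeder(), sys.executable,
                     [sys.executable, "-u", "child.py"],
                     env={"PYTHONUNBUFFERED": "1"})
reactor.run()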
Incidentally, if you plan to pass binary data between the two processes, then you probably also want to put stdio into binary mode in the child process:
import os, sys, msvcrt
msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
msvcrt.setmode(sys.stderr.fileno(), os.O_BINARY)
I have two machines connected by a switch. I have a popular server application which we can call "SXC_SERVER" on machine A and I interrogate the "SXC_SERVER" with the corresponding application from machine B, which I'll call "SXC_CLIENT". What I am trying to do is two-fold:
firstly, gain the traffic flow of SXC_SERVER and SXC_CLIENT interaction through tcpdump. The interaction between the two is a simple GET and RESPONSE, but I require the traffic traces.
secondly, I am wanting to log the Resident Set Size (RSS) usage of the SXC_SERVER process during each interaction/iteration
Moreover, I don't just need one traffic trace of the communication and one memory usage log of the SXC_SERVER process otherwise I wouldn't be writing this because I could go away and do that in ten minutes... In fact I am aiming to do very many! But let's say here for simplicity I want to do 10.
Since this will be very labor-intensive - it would require me to be at both machines, stopping and starting the SXC_CLIENT-to-SXC_SERVER interrogation, the tcpdump traffic capture, and the RSS memory-usage logging of SXC_SERVER - I want to write an automation script.
But! I am not a programmer, or software guy...(darn)
However, that said, I can imagine a separate client/server program that oversees this automation, which we can call AUTO_SERVER and AUTO_CLIENT. My thought is that machine B would run AUTO_CLIENT and machine A would run AUTO_SERVER. The aim of both is to facilitate the automation, i.e. the stopping and starting of the tcpdump capture and of the memory logging of the SXC_SERVER process on machine A before machine B queries SXC_SERVER with SXC_CLIENT (if you follow me!).
Effectively after one run of the SXC_SERVER-to-SXC_CLIENT GET/RESPONSE interaction I'll end up with:
one traffic capture *.pcap file called n1.pcap
and one memory log dump (of the RSS associated to the process) called n1.csv.
I am not a programmer or software guy but I can see a rough method (to the best of my ability) to achieve this, as follows:
Machine A: AUTO_SERVER
BEGIN:
msgReceived = open socket(listen on port *n*)
DO
1. wait for machine B to tell me when to start watch (as in the program) to log the RSS memory usage of the SXC_SERVER process, using the hardcoded command:
watch -n 0.1 'ps -p $(pgrep -d"," -x snmpd) -o rss= | awk '\''{ i += $1 } END { print i }'\'' >> ~/Desktop/mem_logs/mem_i.csv
UNTIL (msgReceived == "FINISH")
quit
END.
Machine B: AUTO_CLIENT
BEGIN:
open socket(new)
for i in 1 to 10, do
1. locally start tcpdump with a hardcoded tcpdump command, using the relevant filter to capture only the SXC_SERVER-to-SXC_CLIENT traffic, and set the output flag to write all captured traffic to a PCAP file called n*i*.pcap, where *i* is the integer of the current for loop, saving the file in the folder "~/Desktop/test_captures/".
2. Send the GET request to SXC_SERVER
3. wait for RESPONSE reply from SXC_SERVER
4. after receiving the reply, tell machine A to stop the watch command
i++
5. send string "FINISH" to machine A.
END.
As you can see, I assume this would be achieved by using a separate, small client/server-like program (which here I've called AUTO_SERVER and AUTO_CLIENT) on both machines. The really rough pseudo-code design should be self-explanatory.
I have found a small client/server socket program located here: http://www.velvetcache.org/2010/06/14/python-unix-sockets which I think may be suitable if I edit it, but I am not sure how exactly I can feasibly achieve this. Which is where you may be able to provide some assistance.
Can Python do this kind of automation?
Can it be done with a single bash script?
Do you think I am on the right path with this?
Or have you any helpful suggestions?
Regards.
You can use Python for this kind of thing, but I would strongly recommend using SSH for the bulk of the work (rather than coding the connection stuff yourself), and then using either a bash script or Python script to launch the tcpdump etc. processes.
Your question, however, is a bit too open-ended for Stack Overflow - it sounds like you are asking someone to write this program for you, rather than asking for help with a specific problem.
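To give a flavour of that approach, here is a rough sketch that machine B alone could run. Everything in it is a placeholder: "hostA" assumes key-based SSH login to machine A, snmpd is the process name taken from the question's watch command, and ./sxc_client stands in for the real SXC_CLIENT invocation:

import subprocess

for i in range(1, 11):
    # capture the SXC traffic locally on machine B (tcpdump usually
    # needs root, so run the script accordingly)
    capture = subprocess.Popen(
        ["tcpdump", "-w", "n%d.pcap" % i, "host", "hostA"])
    # sample the server's RSS on machine A over SSH every 0.1s;
    # -tt forces a tty so the remote loop dies when ssh is killed
    rss_log = subprocess.Popen(
        ["ssh", "-tt", "hostA",
         "while :; do ps -p $(pgrep -d, -x snmpd) -o rss= >> mem_%d.csv; "
         "sleep 0.1; done" % i])
    subprocess.call(["./sxc_client"])   # one GET/RESPONSE interaction
    rss_log.terminate()
    capture.terminate()

This collapses AUTO_SERVER and AUTO_CLIENT into plain SSH commands, so only machine B needs a script.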