Stdin redirection from Python

Say I have a program called some_binary that can read data as:
some_binary < input
where input is usually a file on disk. I would like to send input to some_binary from Python without writing anything to disk.
For example input is typically a file with the following contents:
0 0.2
0 0.4
1 0.2
0 0.3
0 0.5
1 0.7
To simulate something like that in Python I have:
import numpy as np
# Random binary integers (0 or 1); random_integers is deprecated, so use randint
first_column = np.random.randint(0, 2, size=6)
# Random floats between 0 and 1
second_column = np.random.random(6)
How can I feed the concatenation of first_column and second_column to some_binary as if I was calling some_binary < input from the command line, and collect stdout in a string?
I have the following:
import subprocess

def run_shell_command(cmd, my_input, cwd=None):
    # note: stdin expects a file object or descriptor, not arbitrary in-memory data
    retVal = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                              stdin=my_input, cwd=cwd)
    return retVal.stdout.read().strip('\n')
But I am not sure I am heading in the right direction.

Yes, you are heading in the right direction.
You can use Python's subprocess.check_output() function, which is a convenience wrapper around subprocess.Popen(). Popen needs more infrastructure; for example, you have to call communicate() on the object returned by Popen for anything to happen.
Something like
output = subprocess.check_output([cmd], stdin=my_input)
should work in your case.
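To feed in-memory data (rather than an open file) and capture stdout in one go, Popen.communicate() is the usual tool. A minimal sketch, assuming some_binary is on your PATH:

import subprocess
import numpy as np

first_column = np.random.randint(0, 2, size=6)  # random 0/1 values
second_column = np.random.random(6)             # random floats in [0, 1)

# Build the same text that would normally live in the input file.
text = '\n'.join('%d %.1f' % (b, f) for b, f in zip(first_column, second_column))

# Pipe it to the program's stdin and collect stdout, entirely in memory.
proc = subprocess.Popen(['some_binary'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = proc.communicate(text.encode())
result = out.decode().strip()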

Related

Print out a specific part of the output result in python

I have a function that does something and displays a line of output mixed from integers and strings.
However, I just want to print the last part of the output, which is the number 5 that comes after the dots:
The number 5 is the value of the OID, and it could be 5 (as ON)
or 6 (as OFF).
Is there any way to specify that in a print or if condition?
Here is the function:
import subprocess, sys
p = subprocess.Popen(["powershell.exe",
"snmpwalk -v1 -c public 192.168.178.213 .1.3.6.1.4.1.9986.3.22.1.6.1.1.15"],
stdout=sys.stdout)
p.communicate()
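One way to do this is to capture the output instead of passing it straight through to sys.stdout, then split off the last token. A sketch, assuming the OID value is the final whitespace-separated field of the snmpwalk line:

import subprocess

p = subprocess.Popen(
    ["powershell.exe",
     "snmpwalk -v1 -c public 192.168.178.213 .1.3.6.1.4.1.9986.3.22.1.6.1.1.15"],
    stdout=subprocess.PIPE)
out, _ = p.communicate()

# Typical snmpwalk output ends in something like "... = INTEGER: 5"
value = int(out.decode().strip().split()[-1])
print("ON" if value == 5 else "OFF")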

Using subprocess and .format

So I have this code so far:
from subprocess import call
a, b = 10000000, 10000100
call('samtools faidx file.fa chr22:{}-{}'.format(a, b), shell = True)
but when I run it, the numbers assigned to a and b do not seem to go into the {} brackets as format should make them.
Am I using format wrong here, or is my code itself wrong?
(file.fa is a file that holds a DNA sequence for chromosome 22)
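For what it's worth, str.format does substitute the values here; printing the command string before calling it is a quick way to confirm:

from subprocess import call

a, b = 10000000, 10000100
cmd = 'samtools faidx file.fa chr22:{}-{}'.format(a, b)
print(cmd)  # samtools faidx file.fa chr22:10000000-10000100
call(cmd, shell=True)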

Reversing a byte string in Python

I found the following code in Python:
import subprocess

def ExtractShellcodeArm(_arg_name):
    ObjDumpOutput(_arg_name)  # helper defined elsewhere in the original program
    print("\033[101m\033[1mExtracted Shellcode:\033[0m\n")
    proc = subprocess.Popen(['objdump', '-d', _arg_name],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    while True:
        line = proc.stdout.readline()
        if line != b'':
            array = line.decode().rstrip().split(':')
            if len(array) > 1:
                if array[1]:
                    array2 = array[1].split(' ')
                    array2 = array2[0].strip()
                    if array2:
                        sc_part = '\t"'
                        sc_part += '\\x'
                        sc_part += '\\x'.join(a + b for a, b in zip(array2[::2], array2[1::2]))
                        sc_part += '"+'
                        print(sc_part)
        else:
            break
After I run this code in Python 3, it gives me the result of the objdump tool like the following:
"\xe2\x8f\x60\x01"+
"\xe1\x2f\xff\x16"+
"\x22\x0c"+
"\x46\x79"+
"\x31\x0e"+
"\x20\x01"+
"\x27\x04"+
"\xdf\x01"+
"\x1b\x24"+
"\x1c\x20"+
"\x27\x01"+
"\xdf\x01"+
"\x6c\x6c\x65\x48"+
"\x6f\x57\x20\x6f"+
"\x0a\x64\x6c\x72"+
But I want it to show the result in big-endian format. How can I change this representation in the Python function? For example, I want the code to show the result like the following:
"\x01\x60\x8f\xe2"+
"\x16\xff\x2f\xe1"+
"\x0c\x22"+
"\x79\x46"+
...
It's not the prettiest code, but this reverses s two characters (one byte) at a time:
''.join(a + b for a, b in zip(s[-2::-2], s[::-2]))
You should store each complete opcode (set of bytes) as an element in a list when you parse them, and then iterate over the list, flipping the bytes in each opcode one at a time. For example, rather than opcodes "\xcd\x80" + "\xeb\xfe", use opcodes = ["\xcd\x80", "\xeb\xfe"]. You should have no problem iterating over the list and reversing each opcode.
Another option is using shell utilities to reverse the bytes before they are received by Python by piping the objdump command to tools like sed and awk to do this by splitting up the bytes on each line into columns and then printing the columns backwards.
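A minimal sketch of that byte-wise reversal as a standalone helper (the function name is made up for illustration):

def reverse_bytes(hexstr):
    """Reverse a string of hex digits one byte (two characters) at a time."""
    return ''.join(hexstr[i:i + 2] for i in range(len(hexstr) - 2, -1, -2))

# 'e28f6001' -> '01608fe2', which formats to "\x01\x60\x8f\xe2"
assert reverse_bytes('e28f6001') == '01608fe2'

In ExtractShellcodeArm, you would apply this to array2 before the join that inserts the \x escapes.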

StringIO and pandas read_csv

I'm trying to mix StringIO and BytesIO with pandas and struggling with some basic stuff. For example, I can't get "output" below to work, whereas "output2" does work. But "output" is closer to the real-world case I'm trying to handle. The "output2" approach comes from an old pandas example, but it isn't really a useful way for me to do it.
import io # note for python 3 only
# in python2 need to import StringIO
output = io.StringIO()
output.write('x,y\n')
output.write('1,2\n')
output2 = io.StringIO("""x,y
1,2
""")
They seem to be the same in terms of type and contents:
type(output) == type(output2)
Out[159]: True
output.getvalue() == output2.getvalue()
Out[160]: True
But no, not the same:
output == output2
Out[161]: False
More to the point of the problem I'm trying to solve:
pd.read_csv(output) # ValueError: No columns to parse from file
pd.read_csv(output2) # works fine, same as reading from a file
io.StringIO here is behaving just like a file -- you wrote to it, and now the file pointer is pointing at the end. When you try to read from it after that, there's nothing after the point you wrote, so: no columns to parse.
Instead, just like you would with an ordinary file, seek to the start, and then read:
>>> output = io.StringIO()
>>> output.write('x,y\n')
4
>>> output.write('1,2\n')
4
>>> output.seek(0)
0
>>> pd.read_csv(output)
x y
0 1 2
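An equivalent approach, if you would rather not track the file position yourself, is to wrap the accumulated text in a fresh buffer:

import io
import pandas as pd

output = io.StringIO()
output.write('x,y\n')
output.write('1,2\n')

# A new StringIO built from getvalue() starts at position 0, so no seek is needed.
df = pd.read_csv(io.StringIO(output.getvalue()))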

Repeatedly write to stdin and read from stdout of a process from python

I have a piece of Fortran code that reads some numbers from STDIN and writes results to STDOUT. For example:
do
  read (*,*) x
  y = x*x
  write (*,*) y
enddo
So I can start the program from a shell and get the following sequence of inputs/outputs:
5.0
25.0
2.5
6.25
Now I need to do this from within python. After futilely wrestling with subprocess.Popen and looking through old questions on this site, I decided to use pexpect.spawn:
import pexpect, os
p = pexpect.spawn('squarer')
p.setecho(False)
p.write("2.5" + os.linesep)
res = p.readline()
and it works. The problem is, the real data I need to pass between python and my fortran program is an array of 100,000 (or more) double precision floats. If they're contained in an array called x, then
p.write(' '.join(["%.10f"%k for k in x]) + os.linesep)
times out with the following error message from pexpect:
buffer (last 100 chars):
before (last 100 chars):
after: <class 'pexpect.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 8574
child_fd: 3
closed: False
timeout: 30
delimiter: <class 'pexpect.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
unless x has fewer than 303 elements. Is there a way to pass large amounts of data to/from the STDIN/STDOUT of another program?
I have tried splitting the data into smaller chunks, but then I lose a lot in speed.
Thanks in advance.
Found a solution using the subprocess module, so I'm posting it here for reference if anyone needs to do the same thing.
import subprocess as sbp
import numpy as np

class ExternalProg:
    def __init__(self, arg_list):
        self.opt = sbp.Popen(arg_list, stdin=sbp.PIPE, stdout=sbp.PIPE,
                             shell=True, close_fds=True)
    def toString(self, x):
        return ' '.join(["%.12f" % k for k in x])
    def toFloat(self, x):
        # parse a whitespace-separated line of numbers into an array
        return np.array(x.strip().split(), dtype=np.float64)
    def sendString(self, string):
        if not string.endswith('\n'):
            string = string + '\n'
        self.opt.stdin.write(string)
    def sendArray(self, x):
        self.opt.stdin.write(self.toString(x) + '\n')
    def readInt(self):
        return int(self.opt.stdout.readline().strip())
    def sendScalar(self, x):
        if type(x) == int:
            self.opt.stdin.write("%i\n" % x)
        elif type(x) == float:
            self.opt.stdin.write("%.12f\n" % x)
    def readArray(self):
        return self.toFloat(self.opt.stdout.readline())
    def close(self):
        self.opt.kill()
The class is invoked with an external program called 'optimizer' as:
optim = ExternalProg(['./optimizer'])
optim.sendScalar(500) # send the optimizer the length of the state vector, for example
optim.sendArray(init_x) # the initial guess for x
optim.sendArray(init_g) # the initial gradient g
next_x = optim.readArray() # get the next estimate of x
next_g = evaluateGradient(next_x) # calculate gradient at next_x from within python
# repeat until convergence
On the fortran side (the program compiled to give the executable 'optimizer'), a 500-element vector would be read in so:
read(*,*) input_vector(1:500)
and would be written out so:
write(*,'(500f18.11)') output_vector(1:500)
and that's it! I've tested it with state vectors up to 200,000 elements (which is the upper limit of what I need right now). Hope this helps someone other than myself. This solution works with ifort and xlf90, but not with gfortran for some reason I don't understand.
example squarer.py program (it just happens to be in Python, use your Fortran executable):
#!/usr/bin/python
import sys
data= sys.stdin.readline() # expecting lots of data in one line
processed_data= data[-2::-1] # reverse without the newline
sys.stdout.write(processed_data+'\n')
example target.py program:
import thread, Queue          # Python 2 modules (threading/queue in Python 3)
import subprocess as sbp

class Companion(object):
    "A companion process manager"
    def __init__(self, cmdline):
        "Start the companion process"
        self.companion = sbp.Popen(
            cmdline, shell=False,
            stdin=sbp.PIPE,
            stdout=sbp.PIPE)
        self.putque = Queue.Queue()
        self.getque = Queue.Queue()
        thread.start_new_thread(self._sender, (self.putque,))
        thread.start_new_thread(self._receiver, (self.getque,))
    def _sender(self, que):
        "Actually sends the data to the companion process"
        while 1:
            datum = que.get()
            if datum is Ellipsis:
                break
            self.companion.stdin.write(datum)
            if not datum.endswith('\n'):
                self.companion.stdin.write('\n')
    def _receiver(self, que):
        "Actually receives data from the companion process"
        while 1:
            datum = self.companion.stdout.readline()
            que.put(datum)
    def close(self):
        self.putque.put(Ellipsis)
    def send(self, data):
        "Schedule a long line to be sent to the companion process"
        self.putque.put(data)
    def recv(self):
        "Get a long line of output from the companion process"
        return self.getque.get()

def main():
    my_data = '12345678 ' * 5000
    my_companion = Companion(("/usr/bin/python", "squarer.py"))
    my_companion.send(my_data)
    my_answer = my_companion.recv()
    print my_answer[:20]  # don't print the long stuff
    # rinse, repeat
    my_companion.close()

if __name__ == "__main__":
    main()
The main function contains the code you will use: set up a Companion object, companion.send a long line of data, companion.recv a line. Repeat as necessary.
Here's a huge simplification: Break your Python into two things.
python source.py | squarer | python sink.py
The squarer application is your Fortran code. It reads from stdin and writes to stdout.
Your source.py is your Python script that does
import sys, os
sys.stdout.write(' '.join(["%.10f" % k for k in x]) + os.linesep)
Or, perhaps something a tiny bit simpler, i.e.
from __future__ import print_function
print( ' '.join(["{0:.10f}".format(k) for k in x]) )
And your sink.py is something like this:
import fileinput
for line in fileinput.input():
    # process the line
    pass
Separating source, squarer and sink gets you 3 separate processes (instead of 2) and will use more cores. More cores == more concurrency == more fun.
I think that you are only adding one line break here:
p.write(' '.join(["%.10f"%k for k in x]) + os.linesep)
instead of adding one per line.
Looks like you're timing out (the default timeout is, I believe, 30 seconds) because preparing, sending, receiving, and processing that much data takes a lot of time. Per the docs, timeout= is an optional named parameter to the expect method, which you're not calling; maybe there's an undocumented way to set the default timeout in the initializer, which could be found by poring over the sources (or, worst case, created by hacking those sources).
If the Fortran program read and saved (say) 100 items at a time, with a prompt, syncing up would become enormously easier. Could you modify your Fortran code for the purpose, or would you rather go for the undocumented / hack approach?
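For reference, pexpect also accepts a timeout argument when spawning, which sets the default used by later reads and expects; a small sketch:

import pexpect

# Raise the default timeout from 30 seconds, since large transfers are slow.
p = pexpect.spawn('squarer', timeout=300)
p.setecho(False)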
