Splitting a large file in python by calling a command line function


I am trying to split a file into a number of parts from a Python script. Here is my snippet:
import subprocess
def bashCommandFunc(commandToRun):
    process = subprocess.Popen(commandToRun.split(), stdout=subprocess.PIPE)
    output = process.communicate()
    return output
filepath = "/Users/user/Desktop/TempDel/part-00000"
numParts = "5"
splitCommand = "split -l$((`wc -l < " + filepath + "`/" + numParts + ")) " + filepath
The resulting value of splitCommand is:
'split -l$((`wc -l < /Users/user/Desktop/TempDel/part-00000`/5)) /Users/user/Desktop/TempDel/part-00000'
If I run this command in a terminal, it splits the file as it's supposed to, but it fails when run through the subprocess function above.
I have tested the function with other, simpler commands and it works fine.
I believe the backtick character (`) might be the issue.
What is the workaround to get this command working?
Is there a better way to split a file into n parts from Python?
Thanks

You'll have to let Python run this line through a full shell rather than executing it as a single command. You can do that by passing the shell=True option and not splitting your command string. But you really shouldn't do that if any part of the command may be influenced by users (a huge security risk).
You could do this more safely by first calling wc, reading its result, and then calling split. Or even implement the whole thing in pure Python instead of calling out to other commands.
What happens now is that commandToRun.split() breaks the string on whitespace and no shell ever sees it, so split is called with -l$((`wc as its first argument, -l as its second, and so on; the backticks and $((...)) are passed through literally instead of being expanded.
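A minimal sketch of the shell=True route, assuming a POSIX system where /bin/sh, wc and split are available. split_via_shell and its cwd parameter are hypothetical names; shlex.quote() is added so a path containing spaces cannot break the command. Like the original, it uses integer (floor) division, so a line count that isn't divisible by the number of parts yields one extra, shorter file, and a file with fewer lines than parts makes split fail on -l0.

```python
import shlex
import subprocess

def split_via_shell(filepath, num_parts, cwd=None):
    """Run the original wc/split pipeline through /bin/sh (POSIX only)."""
    # Quote the path so spaces or shell metacharacters in it are not interpreted.
    path = shlex.quote(filepath)
    # Backticks and $((...)) are shell syntax, so this string needs shell=True.
    command = "split -l$((`wc -l < {0}` / {1})) {0}".format(path, int(num_parts))
    # check=True raises CalledProcessError if split exits with a non-zero status;
    # split writes xaa, xab, ... into cwd (the current directory if None).
    subprocess.run(command, shell=True, check=True, cwd=cwd)
```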
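A pure-Python sketch of that second suggestion. It makes two passes so the file is never held in memory at once: one to count lines, one to copy them out with itertools.islice. The function name and the ".partN" output naming are assumptions, not anything split defines. Unlike split -l with a floored line count, it always produces exactly num_parts files, giving the first total % num_parts of them one extra line.

```python
import itertools

def split_into_parts(path, num_parts):
    """Split a file into num_parts files with (nearly) equal line counts."""
    # First pass: count lines without loading the whole file.
    with open(path, "rb") as f:
        total = sum(1 for _ in f)
    # Spread lines evenly: the first `extra` parts get one more line each.
    base, extra = divmod(total, num_parts)
    out_paths = []
    with open(path, "rb") as src:
        for i in range(num_parts):
            size = base + (1 if i < extra else 0)
            out_path = "{0}.part{1}".format(path, i)  # hypothetical naming scheme
            with open(out_path, "wb") as dst:
                # islice keeps consuming the same file iterator, so each part
                # continues where the previous one stopped.
                dst.writelines(itertools.islice(src, size))
            out_paths.append(out_path)
    return out_paths
```

For example, split_into_parts("/Users/user/Desktop/TempDel/part-00000", 5) would write part-00000.part0 through part-00000.part4 next to the input file.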
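You can see this by splitting the command string the same way bashCommandFunc does:

```python
filepath = "/Users/user/Desktop/TempDel/part-00000"
split_command = "split -l$((`wc -l < " + filepath + "`/5)) " + filepath
tokens = split_command.split()  # what Popen receives as the argument list
print(tokens)
# ['split', '-l$((`wc', '-l', '<', '/Users/user/Desktop/TempDel/part-00000`/5))',
#  '/Users/user/Desktop/TempDel/part-00000']
```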

Related

How to use input() function of Python in bash script?

I am trying to integrate a Python script into a bash script. However when I use the input() function, I am getting an EOFError. How can I fix this problem?
#!/bin/bash
python3 <<END
print(input(">>> "))
END

You cannot supply both the script and the user input through the program's standard input, yet that is in effect what you're trying to do: << redirects standard input, so Python reads the script from it and nothing is left for input().
Ideally, you would provide the script as a command-line argument using -c SCRIPT instead of passing it on stdin with a <<END ... END heredoc:
#!/bin/bash
python3 -c 'print(input(">>> "))'
Note that you may need to mind your quoting and escaping in case you have a more complicated Python script with nested quotes.
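To see the difference from Python itself, this sketch runs both variants with subprocess, using sys.executable in place of python3 and feeding the "user input" through input= (both are assumptions made for the demonstration):

```python
import subprocess
import sys

# Script passed with -c: stdin stays free, so input() reads the user's line.
ok = subprocess.run([sys.executable, "-c", 'print(input(">>> "))'],
                    input="hello\n", capture_output=True, text=True)
print(repr(ok.stdout))  # '>>> hello\n'

# Script passed on stdin (like the heredoc): Python reads stdin to EOF to get
# the program text, so input() finds nothing left and raises EOFError.
bad = subprocess.run([sys.executable, "-"],
                     input='print(input(">>> "))\n', capture_output=True, text=True)
print("EOFError" in bad.stderr)  # True
```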
You can still let the script run over multiple lines, if you need to:
#!/bin/bash
python3 -c '
import os.path
path_name = input("enter a path name >>> ")
file_exists = os.path.exists(path_name)
print("file " + path_name + " " +
("exists" if file_exists else "does not exist"))
'
Note that you will get into trouble when you want to use single quotes in your Python script, for example to print doesn't instead of does not.
You can work around that in several ways. The one I consider most flexible (apart from putting you into quoting hell) is to surround the Python script with double quotes instead and properly escape all inner double quotes and other characters that the shell interprets:
#!/bin/bash
python3 -c "
print(\"It doesn't slice your bread.\")
print('But it can', 'unsliced'[2:7], 'your strings.')
print(\"It's only about \$0. Neat, right?\")
"
Note that I also escaped $, as the shell would otherwise interpret it inside the surrounding double quotes and the result may not be what you wanted.
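Another of those approaches keeps the single quotes and writes each inner ' as '"'"' (close the quote, add a double-quoted ', reopen). Python's shlex.quote() produces exactly that form, so a sketch like the following can generate correctly quoted python3 -c commands; it assumes a POSIX /bin/sh and uses sys.executable instead of python3.

```python
import shlex
import subprocess
import sys

script = 'print("It doesn\'t slice your bread.")\nprint("It\'s only about $0. Neat, right?")'
# shlex.quote wraps the text in single quotes and rewrites each ' as '"'"',
# so nothing inside, including $0, is interpreted by the shell.
command = shlex.quote(sys.executable) + " -c " + shlex.quote(script)
print(command)  # a line you could paste into a bash script
result = subprocess.run(command, shell=True, capture_output=True, text=True, check=True)
print(result.stdout)
```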

Python call to external software command

I have a slight problem. I have a piece of software with a command that takes two inputs: maf2hal inputfile outputfile.
I need to call this command from a Python script. The script asks the user for the path of the input file and the path of the output file and stores them in two variables. The problem is that when I call maf2hal with the two variable names as arguments, I get the error cannot locate file.
Is there a way around this? Here's my code:
folderfound = "n"  # looping condition
while (folderfound == "n"):
    path = raw_input("Enter path of file to convert (with the extension) > ")
    if not os.path.exists(path):
        print "\tERROR! file not found. Maybe file doesn't exist or no extension was provided. Try again!\n"
    else:
        print "\tFile found\n"
        folderfound = "y"
folderfound = "y"  # looping condition
while (folderfound == "y"):
    outName = raw_input("Enter path of output file to be created > ")
    if os.path.exists(outName):
        print "\tERROR! File already exists \n\tEither delete the existing file or enter a new file name\n\n"
    else:
        print "Creating output file....\n"
        outputName = outName + ".maf"
        print "Done\n"
        folderfound = "n"
hal_input = outputName  # inputfilename, 1st argument
hal_output = outName + ".hal"  # outputfilename, 2nd argument
call("maf2hal hal_input hal_output", shell=True)

This is wrong:
call("maf2hal hal_input hal_output", shell=True)
It should be:
call(["maf2hal", hal_input, hal_output])
Otherwise you're giving "hal_input" as the actual file name, rather than using the variable.
You should not use shell=True unless absolutely necessary, and in this case it is not only unnecessary, it is pointlessly inefficient. Just call the executable directly, as above.
For bonus points, use check_call() instead of call(), because the former will actually check the return value and raise an exception if the program failed. Using call() doesn't, so errors may go unnoticed.
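A minimal sketch combining both points, written for Python 3. run_maf2hal and its program parameter are hypothetical, added so the executable name can be swapped out; it assumes maf2hal is on PATH. In Python 3, a missing executable raises FileNotFoundError, separate from the CalledProcessError that check_call raises on a non-zero exit status.

```python
import subprocess

def run_maf2hal(hal_input, hal_output, program="maf2hal"):
    # Each argument is its own list item: the variables' values are used,
    # no shell is involved, and spaces in paths need no quoting.
    # check_call raises CalledProcessError on a non-zero exit status.
    subprocess.check_call([program, hal_input, hal_output])

try:
    run_maf2hal("in.maf", "out.hal")
except FileNotFoundError:
    print("maf2hal is not on PATH")
except subprocess.CalledProcessError as err:
    print("maf2hal failed with exit status", err.returncode)
```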

There are a few problems. Your first reported error was that the call to the shell can't find the maf2hal program; that sounds like a path issue. You need to verify that the command is in the path of the shell that is being created.
Second, your call is passing the literal words "hal_input" and "hal_output". You'll need to build the command first so that it contains the values of those variables:
cmd = "maf2hal {0} {1}".format(hal_input, hal_output)
call(cmd, shell=True)

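Your code is literally trying to open a file called hal_input, not using the contents of your variable with the same name. It looks like you're using the subprocess module, so you can just change it to call(["maf2hal", hal_input, hal_output]) to use the contents. Don't add shell=True to a list like this: on POSIX only the first item is then treated as the command, and the remaining items are passed to the shell itself rather than to maf2hal.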

At the end of your code:
call("maf2hal hal_input hal_output", shell=True)
you are literally passing that string, so maf2hal receives the words hal_input and hal_output rather than your paths. You need to concatenate your strings together first, either by adding them or by using .join, e.g.:
call("maf2hal " + hal_input + " " + hal_output, shell=True)
or
call(" ".join(["maf2hal", hal_input, hal_output]), shell=True)

Permission error when running jar from python

I have a .jar archive that loads a file and then does some things with it and writes it to the disk again.
If I call this .jar directly from the command prompt, everything works. But when I try to do it from within python, I get the following error:
Input file ("C:\xxx.txt") was not found or was not readable.
This is my python code:
import sys, os, subprocess
if os.path.isdir(sys.argv[1]):
    for file in os.listdir("."):
        print (" ".join(['java', '-jar', sys.argv[2], 'd', "\"" + os.path.abspath(file) + "\"", "\""+os.path.join(os.path.join(os.path.abspath(os.path.dirname(file)), "output"), file) + "\""]))
        subprocess.call(['java', '-jar', sys.argv[2], 'd', "\"" + os.path.abspath(file) + "\"", "\""+os.path.join(os.path.join(os.path.abspath(os.path.dirname(file)), "output"), file) + "\""])
When I copy the printed statement into the commandline, the jar executes perfectly; everything works. I tried running cmd as an admin, but that didn't help.

The problem is the extra quotes you're adding. When you pass subprocess a list of args, it already quotes them appropriately; if you quote them yourself, it'll end up quoting your quotes, so instead of passing an argument that, when unquoted, means C:\xxx.txt, you'll be passing an argument that, when unquoted, means "C:\xxx.txt", which is not a valid pathname.
The rule of thumb for Windows* is: If you know exactly what each argument should be, pass them as a list, and don't try to quote them yourself; if you know exactly what the final command-line string should be, pass it as a string, and don't try to break it into a list of separate arguments yourself.
* Note that this is only for Windows. On POSIX, unless you're using shell=True, you should basically never use a string.
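A sketch applying that rule to the jar call: build the arguments without any added quote characters and let subprocess do the quoting. jar_args is a hypothetical helper; subprocess.list2cmdline is the long-standing (though not publicly documented) function subprocess uses on Windows to join a list into a command line, called here only to make the quoting visible.

```python
import os
import subprocess

def jar_args(jar_path, filename):
    # Plain paths, no manual quotes: subprocess quotes them if needed.
    src = os.path.abspath(filename)
    dst = os.path.join(os.path.dirname(src), "output", os.path.basename(filename))
    return ["java", "-jar", jar_path, "d", src, dst]

# Quotes are added only around arguments that contain spaces:
print(subprocess.list2cmdline(["java", "-jar", "tool.jar", "d", r"C:\some dir\xxx.txt"]))
# java -jar tool.jar d "C:\some dir\xxx.txt"

# Hand-added quotes are escaped and become part of the argument itself:
print(subprocess.list2cmdline([r'"C:\xxx.txt"']))
# \"C:\xxx.txt\"
```

In the loop from the question, the call would then become subprocess.call(jar_args(sys.argv[2], file)).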

Run windows executable from python script with multiple arguments

I am working on a program that will find some files and provide the file information to an NSIS script. The NSIS script accepts a command line like this:
makensis.exe /DON="This is one" /DOD="c:\path1\path2 to dir\path 3" scriptfile.nsi
The values of the switches will change on each execution of the program. I have tried to get this to execute using subprocess.call and subprocess.Popen. The issue I am having has to do with quoting.
First of all the subprocess calls seem to put the entire argument statement between double quotes making NSIS see them as one argument. Second I am having some difficulty getting the individual switches properly quoted on the command line. Here is a snippet of what my program currently looks like.
subprocess.Popen([setup.profile['NSISExe'], ' /DON="' + setup.profile['DESC'] + '" /DOD="' + setup.profile['InstallDir'] + \
'" /DMT="' + app.machine_type.get() + '" /DSD="' + os.path.join(WinShellVar.LOCAL_APPDATA, 'MLC CAD', appname) + \
'" /DXV=X6 ' + setup.profile['NSISScript']])
And here is the output from NSIS
Can't open script " /DON="Mastercam X6 Standard" /DOD="C:\Users\John\Desktop" /DMT="mill" /DSD="C:\Users\John\AppData\Local\MLC CAD\mcdeftool" /DXV=X6 bin\package.002.nsi"
As you can see, I am using a mixed bag of data, getting some bits from dicts and some from class calls (go easy on me if my terms are somewhat incorrect; I have been learning Python for about 4 days now, so please correct me, just nicely). If using data like this is "unpythonic", let me know.
Looking forward to your input

Disclaimer: I don't use Windows.
I think you probably want something like:
subprocess.Popen([setup.profile['NSISExe'], '/DON=' + setup.profile['DESC'],
'/DOD=' + setup.profile['InstallDir'],
'/DMT=' + app.machine_type.get(),
'/DSD=' + os.path.join(WinShellVar.LOCAL_APPDATA, 'MLC CAD', appname),
'/DXV=X6',
setup.profile['NSISScript']])
When the shell reads the command line, it splits it on unquoted, unescaped whitespace. When you pass a list to Popen, it expects each list element to already look the way that argument would after the shell split it. The other option is to pass a string (instead of a list) exactly as you would type it into the Windows shell and pass shell=True to Popen. But that method isn't preferred, as it is much more vulnerable to shell injection.
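If the set of switches varies, a small helper can build that list from a dict. nsis_args is a hypothetical name, and the values below are the example ones from the question; subprocess.list2cmdline (what Popen uses on Windows to join the list) shows the resulting command line. Whether makensis accepts a switch quoted as a whole, as in "/DON=This is one", is up to its own parser, so treat that as an assumption to verify.

```python
import subprocess

def nsis_args(makensis, defines, script):
    # One list item per switch: "/DNAME=value" stays a single argument even
    # when the value contains spaces, so no quotes are added by hand.
    args = [makensis]
    for name, value in defines.items():
        args.append("/D{0}={1}".format(name, value))
    args.append(script)
    return args

args = nsis_args("makensis.exe",
                 {"ON": "This is one", "OD": r"c:\path1\path2 to dir\path 3"},
                 "scriptfile.nsi")
print(subprocess.list2cmdline(args))
# makensis.exe "/DON=This is one" "/DOD=c:\path1\path2 to dir\path 3" scriptfile.nsi
```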

Passing shell commands with Python os.system() or subprocess.check_call()

I'm trying to call 'sed' from Python and having troubles passing the command line via either subprocess.check_call() or os.system().
I'm on Windows 7, but using the 'sed' from Cygwin (it's in the path).
If I do this from the Cygwin shell, it works fine:
$ sed 's/&nbsp;/\ /g' <"C:foobar" >"C:foobar.temp"
In Python, I've got the full pathname I'm working with in "name". I tried:
command = r"sed 's/&nbsp;/\ /g' " + "<" '\"' + name + '\" >' '\"' + name + '.temp' + '\"'
subprocess.check_call(command, shell=True)
All the concatenation is there to make sure I have double quotes around the input and output filenames (in case there are blank spaces in the Windows file path).
I also tried it replacing the last line with:
os.system(command)
Either way, I get this error:
sed: -e expression #1, char 2: unterminated `s' command
'amp' is not recognized as an internal or external command,
operable program or batch file.
'nbsp' is not recognized as an internal or external command,
operable program or batch file.
Yet, as I said, it works OK from the console. What am I doing wrong?

The shell used by subprocess is probably not the shell you want. You can specify the shell with executable='path/to/executable'. Different shells have different quoting rules.
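Another way around the quoting differences, sketched under the assumption that Cygwin's sed is on PATH: skip the shell entirely. sed_replace is a hypothetical helper; Python opens the files in place of the < and > redirections, and the sed expression is passed as one list item, so neither cmd.exe nor bash ever sees the & characters.

```python
import subprocess

def sed_replace(src, dst, sed="sed"):
    # No shell=True: the expression reaches sed verbatim (no quoting needed,
    # & is not a command separator), and stdin/stdout are real file handles.
    with open(src, "rb") as f_in, open(dst, "wb") as f_out:
        subprocess.check_call([sed, "s/&nbsp;/ /g"], stdin=f_in, stdout=f_out)
```

Usage with the question's variable would be sed_replace(name, name + ".temp").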
Even better might be to skip subprocess altogether, and write this as pure Python:
with open("c:foobar") as f_in:
    with open("c:foobar.temp", "w") as f_out:
        for line in f_in:
            f_out.write(line.replace('&nbsp;', ' '))

I agree with Ned Batchelder's assessment, but you might want to consider the following code, because it likely does what you ultimately want to accomplish, and that can be done easily with the help of Python's fileinput module:
import fileinput
f = fileinput.input('C:foobar', inplace=1)
for line in f:
    line = line.replace('&nbsp;', ' ')
    print line,
f.close()
print 'done'
This will effectively update the given file in place, as the keyword's name suggests. There's also an optional backup= keyword (not used above) which saves a copy of the original file if desired.
BTW, a word of caution about using something like C:foobar to specify the file name because on Windows it means a file of that name in whatever the current directory is on drive C:, which might not be what you want.
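The code above is Python 2. A Python 3 sketch of the same idea, also using the backup= keyword; replace_nbsp_in_place is a hypothetical name, and the backup is written as the file name plus ".bak".

```python
import fileinput

def replace_nbsp_in_place(path):
    # inplace=True redirects print() output into the file being read;
    # backup=".bak" keeps the original alongside it as path + ".bak".
    with fileinput.input(path, inplace=True, backup=".bak") as f:
        for line in f:
            # end="" because each line already ends with its newline.
            print(line.replace("&nbsp;", " "), end="")
```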

I think you'll find that, in Windows Python, it's not actually using the Cygwin shell to run your command; it's using cmd.exe instead.
And, cmd doesn't play well with single quotes the way bash does.
You only have to do the following to confirm that:
c:\pax> echo hello >hello.txt
c:\pax> type "hello.txt"
hello
c:\pax> type 'hello.txt'
The system cannot find the file specified.
I think the best idea would be to use Python itself to process the file. The Python language is a cross-platform one which is meant to remove all those platform-specific inconsistencies, such as the one you've just found.
