I am trying to execute the following command in python using plumbum:
sort -u -f -t$'\t' -k1,1 file1 > file2
However, I am having issues passing the -t$'\t' argument. Here is my code:
from plumbum.cmd import sort
separator = r"-t$'\t'"
print separator
cmd = (sort["-u", "-f", separator, "-k1,1", "file1"]) > "file2"
print cmd
print cmd()
I can see problems right away in the output of print separator and print cmd:
-t$'\t'
/usr/bin/sort -u -f "-t\$'\\t'" -k1,1 file1 > file2
The argument is wrapped in double quotes.
An extra \ before $ and \t is inserted.
How should I pass this argument to plumbum?
You may have stumbled into limitations of the command-line escaping.
I could make it work with the subprocess module, passing a real tab character literally:
import subprocess

# redirect to file2 via stdout rather than a shell ">" redirection
with open("file2", "w") as out:
    p = subprocess.Popen(["sort", "-u", "-f", "-t\t", "-k1,1", "file1"], stdout=out)
    p.wait()
Also, full python short solution that does what you want:
with open("file1") as fr, open("file2","w") as fw:
fw.writelines(sorted(set(fr),key=lambda x : x.split("\t")[0]))
The full python solution doesn't handle uniqueness exactly the way sort does: if two lines have the same first field but different second fields, sort -u -k1,1 keeps only one of them, whereas the set keeps both.
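If that difference matters, here is a rough sketch (not part of the original answer) that deduplicates on the case-folded first field, which is closer to what sort -u -f -k1,1 does:
# keep only the first line seen for each case-folded first field, then sort on that key
seen = {}
with open("file1") as fr:
    for line in fr:
        key = line.split("\t")[0].lower()
        seen.setdefault(key, line)
with open("file2", "w") as fw:
    fw.writelines(sorted(seen.values(), key=lambda x: x.split("\t")[0].lower()))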
EDIT: unchecked, but you just confirmed that it works: simply tweaking your plumbum code with
separator = "-t\t"
does the job. Still, of the three options, I'd recommend the full python solution, since it doesn't involve an external process and is therefore more pythonic and portable.
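For completeness, a sketch of the tweaked plumbum code, following the snippet from the question:
from plumbum.cmd import sort

separator = "-t\t"  # a real tab character, no shell quoting needed
cmd = (sort["-u", "-f", separator, "-k1,1", "file1"]) > "file2"
cmd()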
I have a bash script that calls a python script with parameters.
In the bash script, I'm reading a file that contains one row of parameters, each wrapped in double quotes, and then I call the python script with the line I read.
My problem is that the python script receives the parameters split on spaces.
The line looks like this: "param_a" "Param B" "Param C"
Code Example:
Bash Script:
LINE=`cat $tmp_file`
id=`python /full_path/script.py $LINE`
Python Script:
print sys.argv[1]
print sys.argv[2]
print sys.argv[3]
Received output:
"param_a"
"Param
B"
Wanted output:
param_a
Param B
Param C
How can I send the parameters to the Python script the way I need?
Thanks!
What about
id=`python /full_path/script.py $tmp_file`
and
import sys
for line in open(sys.argv[1]):
    print(line)
?
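If the goal is to recover the individual quoted parameters rather than whole lines, a small sketch (not part of the original answer) using the standard shlex module could do the splitting inside Python:
import shlex
import sys

with open(sys.argv[1]) as f:
    # shlex.split honours the double quotes, so "Param B" stays a single parameter
    params = shlex.split(f.read())
print(params)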
The issue is in how bash passes the arguments; Python has nothing to do with it.
So you have to solve all of this before sending it to Python. I decided to use awk and xargs for this (but xargs is the actual MVP here).
LINE=$(cat $tmp_file)
awk -v ORS="\0" -v FPAT='"[^"]+"' '{for (i=1;i<=NF;i++){print substr($i,2,length($i)-2)}}' <<<$LINE |
xargs -0 python ./script.py
First $(..) is preferred over backticks, because it is more readable. You are making a variable after all.
awk only reads from stdin or a file, but you can force it to read from a variable with the <<<, also called "here string".
With awk I loop over all fields (as defined by the regex in the FPAT variable), and print them without the "".
The output record separator I chose is the NULL character (-v ORS='\0'); xargs will split on this character.
xargs will now parse the piped input by separating the arguments on NULL characters (set with -0) and execute the command given with the parsed arguments.
Note: while awk is found on most UNIX systems, I make use of FPAT, which is a GNU awk extension, and GNU awk might not be your default (on Ubuntu, for example), but it is usually just an install of gawk away.
Also, the next command would be a quick and easy solution, but it is generally considered unsafe, since eval will execute everything it receives.
eval "python ./script.py $LINE"
This can be done using bash arrays:
tmp_file='gash.txt'
# Set IFS to " which splits on double quotes and removes them
# Using read is preferable to using the external program cat
# read -a reads into the array called "line"
# UPPERCASE variable names are discouraged because of collisions with bash variables
IFS=\" read -ra line < "$tmp_file"
# That leaves blank and space elements in "line",
# we create a new array called "params" without those elements
declare -a params
for ((i=0; i < ${#line[@]}; i++))
do
    p="${line[i]}"
    if [[ -n "$p" && "$p" != " " ]]
    then
        params+=("$p")
    fi
done
# `backticks` are frowned upon because of poor readability
# I've called the python script "gash.py"
id=$(python ./gash.py "${params[@]}")
echo "$id"
gash.py:
import sys
print "1",sys.argv[1]
print "2",sys.argv[2]
print "3",sys.argv[3]
Gives:
1 param_a
2 Param B
3 Param C
I am trying to store the number of files in a directory in a variable and their names in an array, but I'm unable to store the file names in the array.
Here is the piece of code I have written.
import os
temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1
command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list=[os.system(command)]
for i in range(len(file_list)):
    os.system('tail -1 file_list[i]')
Your shell scripting is orders of magnitude too complex.
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
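A sketch of what the subprocess.run() variant might look like (capture_output and text require Python 3.7 or later):
import os
import subprocess

# run tail over every entry in the current directory and capture its output as text
result = subprocess.run(['tail', '-qn1'] + os.listdir('.'),
                        capture_output=True, text=True, check=True)
output = result.stdout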
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
    with open(filename, 'r') as handle:
        for line in handle:
            pass
        # print the last one only
        print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
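A rough sketch of that idea (the last_line helper and the 1024-byte guess are made up for illustration; the guess must be at least as long as the longest last line):
import os

def last_line(path, guess=1024):
    # seek to at most `guess` bytes before the end and read from there
    with open(path, 'rb') as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - guess))
        lines = handle.read().splitlines()
    return lines[-1].decode() if lines else ''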
os.system returns the exitcode of the command and not the output. Try using subprocess.check_output with shell=True
Example:
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")
Edit (as suggested by @tripleee): you probably don't want to do this, as it will get crazy. Python has great functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of files and folders inside that folder. Once you have this, you can just do len(names) to get the count that your first command was trying to compute.
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole file path, /home/demo/file.txt, whereas os.listdir will just give you the file name, file.txt.
The ls -l /home/demo/ | wc -l command doesn't give the correct value either, because ls -l prints a "total X" line at the top in addition to the file entries, which is why you had to subtract 1.
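A sketch of getting that count without shelling out at all (this counts directories too; filter with os.path.isfile if you only want files):
import os

no_of_files = len(os.listdir("/home/demo"))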
You could likely use a loop without much issue:
import os

files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    with open(f, 'rb') as fh:
        last = fh.readlines()[-1].decode()
    # no explicit fh.close() needed; the with block closes the file
    print('file: {0}\n{1}\n'.format(f, last))
Output:
file: file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
import subprocess

for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')
The decode is optional; for text, "utf-8" usually works, and if it's a mix of binary and text then something such as "iso-8859-1" should usually work.
You are not able to store the file names because os.system does not return the command's output the way you expect it to.
From the docs
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system executes the shell command as-is; to capture the output of such shell commands you have to use the Python subprocess module.
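A minimal sketch of that, capturing the directory listing rather than the exit status (the path is the one from the question):
import subprocess

# check_output returns the command's stdout as bytes; decode it to get text
output = subprocess.check_output(["ls", "/home/demo/"])
file_list = output.decode("utf-8").split()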
Note : In your case you can get file names using either glob module or os.listdir(): see How to list all files of a directory
I'd like to execute the following UNIX command in Python:
cd 2017-02-10; pwd; echo missing > 123.txt
The date directory DATE = 2017-02-10 and OUT = 123.txt are already variables in Python so I have tried variations of
call("cd", DATE, "; pwd; echo missing > ", OUT)
using the subprocess.call function, but I'm struggling to find documentation on running multiple UNIX commands at once, which are normally separated with ; or have their output redirected with >.
Doing the commands on separate lines in Python doesn't work either because it "forgets" what was executed on the previous line and essentially resets.
You can pass a shell script as a single argument, with strings to be substituted as out-of-band arguments, as follows:
import subprocess

date = '2017-02-10'
out = '123.txt'
subprocess.call(
    ['cd "$1"; pwd; echo missing >"$2"',  # shell script to run
     '_',    # $0 for that script
     date,   # $1 for that script
     out,    # $2 for that script
    ], shell=True)
This is much more secure than substituting your date and out values into a string which is evaluated by the shell as code, because these values are treated as literals: A date of $(rm -rf ~) will not in fact try to delete your home directory. :)
Doing the commands on separate lines in Python doesn't work either because it "forgets" what was executed on the previous line and essentially resets.
This is because if you have separate calls to subprocess.call it will run each command in its own shell, and the cd call has no effect on the later shells.
One way around that would be to change the directory in the Python script itself before doing the rest. Whether or not this is a good idea depends on what the rest of the script does. Do you really need to change directory? Why not just write "missing" to 2017-02-10/123.txt from Python directly? Why do you need the pwd call?
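If you do go the change-directory route, a minimal sketch (assuming DATE and OUT are the variables from the question) might look like this:
import os
import subprocess

os.chdir(DATE)              # change directory in the Python process itself
subprocess.call(["pwd"])    # child processes now inherit that directory
with open(OUT, "w") as f:
    f.write("missing\n")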
Assuming you're looping through a list of directories and want to output the full path of each and also create files with "missing" in them, you could perhaps do this instead:
import os

base = "/path/to/parent"
for DATE, OUT in [["2017-02-10", "123.txt"], ["2017-02-11", "456.txt"]]:
    date_dir = os.path.join(base, DATE)
    print(date_dir)
    out_path = os.path.join(date_dir, OUT)
    out = open(out_path, "w")
    out.write("missing\n")
    out.flush()
    out.close()
The above could use some error handling in case you don't have permission to write to the file or the directory doesn't exist, but your shell commands don't have any error handling either.
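A sketch of the kind of error handling meant here (the directory check and the exception choice are illustrative additions, not part of the original answer):
try:
    if not os.path.isdir(date_dir):
        os.makedirs(date_dir)          # create the date directory if it is missing
    with open(out_path, "w") as out:
        out.write("missing\n")
except (OSError, IOError) as exc:      # no permission, bad path, and so on
    print("could not write {0}: {1}".format(out_path, exc))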
>>> date = "2017-02-10"
>>> command = "cd " + date + "; pwd; echo missing > 123.txt"
>>> import os
>>> os.system(command)
I am trying to incorporate this sed command to remove the last comma in a JSON file.
sed -i -e '1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\1 /' file.json
When I run this on the command line, it works fine. When I try to run it as a subprocess like so, it doesn't work.
Popen("sed -e '1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\1 /' file.json",shell=True).wait()
What am I doing wrong?
It doesn't work because when you write \1 in a regular Python string, Python interprets it as \x01, so the regular expression sed receives doesn't work / is illegal.
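A quick illustration of the escaping issue (just a demonstration, not part of the fix):
print(repr("\1"), len("\1"))    # '\x01' 1 -- Python consumed the escape
print(repr(r"\1"), len(r"\1"))  # '\\1' 2  -- the raw string keeps the backslash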
That is already better:
check_call(["sed","-i","-e",r"1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\1 /","file.json"])
because splitting the command into a real list and passing your regex as a raw string has a better chance of working. And check_call is what you need to just call a process without caring about its output.
But I would do even better: since python is good at processing files, given your rather simple problem, I would create a fully portable version, no need for sed:
# read the file
with open("file.json") as f:
    contents = f.read().rstrip().rstrip(",")  # strip last newline/space, then strip last comma
# write back the file
with open("file.json", "w") as f:
    f.write(contents)
In general, you might try the following solutions:
Pass the raw string, as was mentioned
Escape the '\' character.
This code also does what you need:
Popen("sed -e '1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\\1 /' file.json", shell=True).wait()
or
try:
    check_call(["sed", "-i", "-e", "1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\\1 /", "file.json"])
except:
    pass  # or handle the error
I'm trying to call 'sed' from Python and having trouble passing the command line via either subprocess.check_call() or os.system().
I'm on Windows 7, but using the 'sed' from Cygwin (it's in the path).
If I do this from the Cygwin shell, it works fine:
$ sed 's/ /\ /g' <"C:foobar" >"C:foobar.temp"
In Python, I've got the full pathname I'm working with in "name". I tried:
command = r"sed 's/ /\ /g' " + "<" '\"' + name + '\" >' '\"' + name + '.temp' + '\"'
subprocess.check_call(command, shell=True)
All the concatenation is there to make sure I have double quotes around the input and output filenames (in case there are blank spaces in the Windows file path).
I also tried it replacing the last line with:
os.system(command)
Either way, I get this error:
sed: -e expression #1, char 2: unterminated `s' command
'amp' is not recognized as an internal or external command,
operable program or batch file.
'nbsp' is not recognized as an internal or external command,
operable program or batch file.
Yet, as I said, it works OK from the console. What am I doing wrong?
The shell used by subprocess is probably not the shell you want. You can specify the shell with executable='path/to/executable'. Different shells have different quoting rules.
Even better might be to skip subprocess altogether, and write this as pure Python:
with open("c:foobar") as f_in:
with open("c:foobar.temp", "w") as f_out:
for line in f_in:
f_out.write(line.replace(' ', ' '))
I agree with Ned Batchelder's assessment, but you might want to consider the following code, because it likely does what you ultimately want to accomplish and is easily done with the help of Python's fileinput module:
import fileinput

f = fileinput.input('C:foobar', inplace=1)
for line in f:
    line = line.replace(' ', ' ')
    print line,
f.close()
print 'done'
This will effectively update the given file in place, as the use of the inplace keyword suggests. There's also an optional backup= keyword -- not used above -- which will save a copy of the original file if desired.
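For example, the call above could be changed to keep a backup of the original (the '.bak' suffix is just an example):
f = fileinput.input('C:foobar', inplace=1, backup='.bak')  # keeps the original as C:foobar.bak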
BTW, a word of caution about using something like C:foobar to specify the file name because on Windows it means a file of that name in whatever the current directory is on drive C:, which might not be what you want.
I think you'll find that, in Windows Python, it's not actually using the Cygwin shell to run your command; it's using cmd.exe instead.
And, cmd doesn't play well with single quotes the way bash does.
You only have to do the following to confirm that:
c:\pax> echo hello >hello.txt
c:\pax> type "hello.txt"
hello
c:\pax> type 'hello.txt'
The system cannot find the file specified.
I think the best idea would be to use Python itself to process the file. The Python language is a cross-platform one which is meant to remove all those platform-specific inconsistencies, such as the one you've just found.