Hi, I'm trying to call the following command from Python:
comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v "#" | sed "s/\t//g"
How can I make this call when the inputs to the comm command are themselves piped?
Is there an easy and straightforward way to do it?
I tried the subprocess module:
subprocess.call("comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'")
Without success, it says:
OSError: [Errno 2] No such file or directory
Or do I have to create the individual calls and chain them together with PIPE, as described in the subprocess documentation:
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
Process substitution (<()) is bash-only functionality. Thus, you need a shell, but it can't be just any shell (like /bin/sh, as used by shell=True on non-Windows platforms) -- it needs to be bash.
subprocess.call(['bash', '-c', "comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'"])
By the way, if you're going to go this route with arbitrary filenames, pass them out-of-band (as below: passing _ as $0, File1.txt as $1, and File2.txt as $2):
subprocess.call(['bash', '-c',
    '''comm -3 <(awk '{print $1}' "$1" | sort | uniq) '''
    '''        <(awk '{print $1}' "$2" | sort | uniq) '''
    ''' | grep -v '#' | tr -d "\t"''',
    '_', "File1.txt", "File2.txt"])
That said, the best-practices approach is indeed to set up the chain yourself. The below is tested with Python 3.6 (note the need for the pass_fds argument to subprocess.Popen to make the file descriptors referred to via /dev/fd/## links available):
import subprocess

awk_filter = '''! /#/ && !seen[$1]++ { print $1 }'''

# First pipeline: awk | sort -u over File1.txt
p1 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File1.txt', 'r'),
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen(['sort', '-u'],
                      stdin=p1.stdout,
                      stdout=subprocess.PIPE)

# Second pipeline: awk | sort -u over File2.txt
p3 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File2.txt', 'r'),
                      stdout=subprocess.PIPE)
p4 = subprocess.Popen(['sort', '-u'],
                      stdin=p3.stdout,
                      stdout=subprocess.PIPE)

# comm reads the two pipelines via /dev/fd/N; pass_fds keeps those fds open.
p5 = subprocess.Popen(['comm', '-3',
                       ('/dev/fd/%d' % (p2.stdout.fileno(),)),
                       ('/dev/fd/%d' % (p4.stdout.fileno(),))],
                      pass_fds=(p2.stdout.fileno(), p4.stdout.fileno()),
                      stdout=subprocess.PIPE)
p6 = subprocess.Popen(['tr', '-d', '\t'],
                      stdin=p5.stdout,
                      stdout=subprocess.PIPE)
result = p6.communicate()
This is a lot more code, but (assuming that the filenames are parameterized in the real world) it's also safer code -- you aren't vulnerable to bugs like ShellShock that are triggered by the simple act of starting a shell, and don't need to worry about passing variables out-of-band to avoid injection attacks (except in the context of arguments to commands -- like awk -- that are scripting language interpreters themselves).
That said, another thing to think about is just implementing the whole thing in native Python.
lines_1 = set(line.split()[0] for line in open('File1.txt', 'r') if '#' not in line)
lines_2 = set(line.split()[0] for line in open('File2.txt', 'r') if '#' not in line)
not_common = (lines_1 - lines_2) | (lines_2 - lines_1)
for line in sorted(not_common):
    print(line)
Also check out plumbum; it makes life easier:
http://plumbum.readthedocs.io/en/latest/
Pipelining
This may be wrong, but you can try this:
from plumbum.cmd import grep, comm, awk, sort, uniq, sed
_c1 = awk['{print $1}', 'File1.txt'] | sort | uniq
_c2 = awk['{print $1}', 'File2.txt'] | sort | uniq
chain = comm['-3', _c1(), _c2() ] | grep['-v', '#'] | sed['s/\t//g']
chain()
Let me know if this goes wrong; I'll try to fix it.
Edit: As pointed out, I missed the process substitution, and I think it would have to be done explicitly by redirecting the output of the commands above to temporary files and then using those files as the arguments to comm.
So the above would now actually become:
from plumbum.cmd import grep, comm, awk, sort, uniq, sed
_c1 = awk['{print $1}', 'File1.txt'] | sort | uniq
_c2 = awk['{print $1}', 'File2.txt'] | sort | uniq
(_c1 > "/tmp/File1.txt")(), (_c2 > "/tmp/File2.txt")()
chain = comm['-3', "/tmp/File1.txt", "/tmp/File2.txt" ] | grep['-v', '#'] | sed['s/\t//g']
chain()
Alternatively, you can use the approach described by @Charles, making use of mkfifo.
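If you want to go the mkfifo route, here is a rough, hedged sketch of the idea in Python (assumptions of mine: a POSIX system, temporary FIFO paths of my own choosing, sort -u as shorthand for sort | uniq, and the grep -v '#' step left out for brevity):
import os
import subprocess
import tempfile

# Create two named pipes to stand in for the <(...) process substitutions.
tmpdir = tempfile.mkdtemp()
fifo1 = os.path.join(tmpdir, 'f1')
fifo2 = os.path.join(tmpdir, 'f2')
os.mkfifo(fifo1)
os.mkfifo(fifo2)

# Each writer blocks on opening its FIFO until comm opens it for reading.
w1 = subprocess.Popen("awk '{print $1}' File1.txt | sort -u > " + fifo1, shell=True)
w2 = subprocess.Popen("awk '{print $1}' File2.txt | sort -u > " + fifo2, shell=True)

# comm reads both FIFOs; the tabs are stripped in Python instead of sed.
output = subprocess.check_output(['comm', '-3', fifo1, fifo2])
w1.wait()
w2.wait()
print(output.decode().replace('\t', ''))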
Related
I'm trying to concatenate Python variables into os.system; the command seems to execute, but it doesn't properly pick up the assigned value.
I've tried using both os.system and subprocess, but neither of them works. Here are some of my attempts.
interface = os.popen("netstat -i | awk '$1 ~ /^w/ {print $1}'")
os.system("iw dev %s station dump" % (interface))
Another attempt:
interface = os.popen("netstat -i | awk '$1 ~ /^w/ {print $1}'")
os.system("iw dev" +interface+ "station dump")
Another attempt:
p1 = subprocess.Popen(["netstat", "-i"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["awk", '$1 ~ /^w/ {print $1}'], stdin=p1.stdout,
stdout=subprocess.PIPE)
displayInterface = p2.communicate()[0].decode('ascii')
retrieveMac = subprocess.Popen(["iw", "dev", displayInterface, "station", "dump"])
In this line:
displayInterface = p2.communicate()[0].decode('ascii')
displayInterface results in a string with a trailing newline. I don't know whether you need the decode(), but you need to strip the newline.
displayInterface = p2.communicate()[0].rstrip()
You can specify the character(s) to strip in an argument to rstrip() if necessary.
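For completeness, a minimal sketch of the corrected chain (assuming Python 3, where communicate() returns bytes that need decoding, and exactly one matching interface):
import subprocess

p1 = subprocess.Popen(["netstat", "-i"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["awk", "$1 ~ /^w/ {print $1}"],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # allow p1 to receive SIGPIPE if p2 exits early

# Decode and strip the trailing newline before using the value as an argument.
interface = p2.communicate()[0].decode("ascii").rstrip("\n")

subprocess.call(["iw", "dev", interface, "station", "dump"])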
I use this bash command for extracting a string from a text file:
cat a.txt | grep 'a' | grep 'b' | grep 'c' | cut -d" " -f1
How can I implement this in Python? I don't want to call OS commands because it should be a cross-platform script.
You may try this,
with open(file) as f:  # open the file
    for line in f:  # iterate over the lines
        if all(i in line for i in ('a', 'b', 'c')):  # check if the line contains all of (a, b, c)
            print line.split(" ")[0]  # if yes then split on space and print the first value
You can always use the os library to do a system call:
import os
bashcmd = " cat a.txt | grep 'a' | grep 'b' | grep 'c' | cut -d' ' -f1"
print os.system( bashcmd )
My text file (unfortunately) looks like this...
<amar>[amar-1000#Fem$$$_Y](1){india|1000#Fem$$$,mumbai|1000#Mas$$$}
<akbar>[akbar-1000#Fem$$$_Y](1){}
<john>[-0000#$$$_N](0){USA|0100#$avi$$,NJ|0100#$avi$$}
It contains the customer name followed by some information. The sequence is:
a text string followed by a list, a set and then a dictionary:
<> [] () {}
This is not a Python-compatible file, so the data is not in a directly usable form. I want to process the file and extract some information.
amar 1000 | 1000 | 1000
akbar 1000
john 0000 | 0100 | 0100
1) the name between <>
2) the number between - and # in the list
3 & 4) split the dictionary on commas and take the numbers between | and # (there can be more than 2 entries here)
I am open to using any tool best suited for this task.
The following Python script will read your text file and give you the desired results:
import re, itertools

with open("input.txt", "r") as f_input:
    for line in f_input:
        reLine = re.match(r"<(\w+)>\[(.*?)\].*?{(.*?)\}", line)
        lNumbers = [re.findall(r".*?(\d+).*?", entry) for entry in reLine.groups()[1:]]
        lNumbers = list(itertools.chain.from_iterable(lNumbers))
        print reLine.group(1), " | ".join(lNumbers)
This prints the following output:
amar 1000 | 1000 | 1000
akbar 1000
john 0000 | 0100 | 0100
As the grammar is quite complex, you might find a proper parser the best solution.
#!/usr/bin/env python
import fileinput
from pyparsing import Word, Regex, Optional, Suppress, ZeroOrMore, alphas, nums
name = Suppress('<') + Word(alphas) + Suppress('>')
reclist = Suppress('[' + Optional(Word(alphas)) + '-') + Word(nums) + Suppress(Regex("[^]]+]"))
digit = Suppress('(' + Word(nums) + ')')
dictStart = Suppress('{')
dictVals = Suppress(Word(alphas) + '|') + Word(nums) + Suppress('#' + Regex('[^,}]+') + Optional(','))
dictEnd = Suppress('}')
parser = name + reclist + digit + dictStart + ZeroOrMore(dictVals) + dictEnd
for line in fileinput.input():
    print ' | '.join(parser.parseString(line))
This solution uses the pyparsing library; running it produces:
$ python parse.py file
amar | 1000 | 1000 | 1000
akbar | 1000
john | 0000 | 0100 | 0100
You can add all delimiters to the FS variable in awk and count fields, like:
awk -F'[<>#|-]' '{ print $2, $4, $6, $8 }' infile
In case you have more than two entries between curly braces, you could use a loop to traverse all fields until the last one, like:
awk -F'[<>#|-]' '{
    printf "%s %s ", $2, $4
    for (i = 6; i <= NF; i += 2) {
        printf "%s ", $i
    }
    printf "\n"
}' infile
Both commands yield the same results:
amar 1000 1000 1000
akbar 1000
john 0000 0100 0100
You could use regexes to capture the values.
Sample:
a="<john>[-0000#$$$_N](0){USA|0100#$avi$$,NJ|0100#$avi$$}"
name=" ".join(re.findall("<(\w+)>[\s\S]+?-(\d+)#",a)[0])
others=re.findall("\|(\d+)#",a)
print name+" | "+" | ".join(others) if others else " "
output:
'john 0000 | 0100 | 0100'
Full code:
with open("input.txt","r") as inp:
for line in inp:
name=re.findall("<(\w+)>[\s\S]+?-(\d+)#",line)[0]
others=re.findall("\|(\d+)#",line)
print name+" | "+" | ".join(others) if others else " "
For one line of your file:
test='<amar>[amar-1000#Fem$$$_Y](1){india|1000#Fem$$$,mumbai|1000#Mas$$$}'
Replace < with an empty string and remove everything after > to get the name:
echo $test | sed -e 's/<//g' | sed -e 's/>.*//g'
Get all the 4-digit sequences:
echo $test | grep -o '[0-9]\{4\}'
Replace the spaces with your favorite separator:
sed -e 's/ /|/g'
Putting it all together:
echo $(echo $test | sed -e 's/<//g' | sed -e 's/>.*//g') $(echo $test | grep -o '[0-9]\{4\}') | sed -e 's/ /|/g'
This will output:
amar|1000|1000|1000
With a quick script you have it: your_script.sh input_file output_file
#!/bin/bash
IFS=$'\n' # line delimiter
# empty your output file
cp /dev/null "$2"
for i in $(cat "$1"); do
    newline=`echo $(echo $i | sed -e 's/<//g' | sed -e 's/>.*//g') $(echo $i | grep -o '[0-9]\{4\}') | sed -e 's/ /|/g'`
    echo $newline >> "$2"
done
cat "$2"
I've three files:
file1.txt:
XYZ与ABC
DFC什么
FBFBBBFde
warlaugh世界
file2.txt:
XYZ 与 ABC
warlaugh 世界
file3.txt:
XYZ with abc
DFC whatever
FBFBBBF
world of warlaugh
file2.txt is a processed version of file1.txt with spaces added. The lines of file1.txt align with those of file3.txt, i.e. XYZ与ABC <-> XYZ with abc.
The processing threw away some lines for some reason (they are missing from file2.txt), but what's more important is to retrieve the corresponding lines from file3.txt after processing.
How could I check for which lines have been removed in file2.txt and then produce a file4.txt that looks like this:
file4.txt:
XYZ with abc
world of warlaugh
I could do it with Python, but I'm sure there's a simple way with sed/awk or bash tricks:
with open('file1.txt', 'r') as file1, open('file2.txt') as file2, open('file3.txt', 'r') as file3:
    file2_nospace = [i.replace(' ', '') for i in file2.readlines()]
    file2_indices = [i for i, j in enumerate(file1.readlines()) if j in file2_nospace]
    file4 = [j for i, j in enumerate(file3.readlines()) if i in file2_indices]
open('file4.txt', 'w').write('\n'.join(file4))
How can I create file4.txt with sed/awk/grep or bash tricks?
First remove the spaces from file2.txt to make its lines look like those in file1.txt:
sed 's/ //g' file2.txt
Then use that as a set of patterns to match against file1.txt. Do this with grep -f, and use -n to see the line numbers in file1.txt that match the patterns constructed from file2.txt:
$ grep -nf <(sed 's/ //g' file2.txt) file1.txt
1:XYZ与ABC
4:warlaugh世界
Now remove everything after the : to make new patterns that will match the numbered lines of file3.txt:
$ grep -nf <(sed 's/ //g' file2.txt) file1.txt | sed 's/:.*/:/'
1:
4:
To add a line number to each line of file3.txt, use this:
$ nl -s':' file3.txt | sed -r 's/^ +//'
1:XYZ with abc
2:DFC whatever
3:FBFBBBF
4:world of warlaugh
Now you can use the first output as patterns to match against the second:
$ grep -f <(grep -nf <(sed 's/ //g' file2.txt) file1.txt | sed 's/:.*/:/') <(nl -s':' file3.txt | sed -r 's/^ +//')
1:XYZ with abc
4:world of warlaugh
And to remove the leading line numbers, simply use cut:
$ grep -f <(grep -nf <(sed 's/ //g' file2.txt) file1.txt | sed 's/:.*/:/') <(nl -s':' file3.txt | sed -r 's/^ +//') | cut -d':' -f2
XYZ with abc
world of warlaugh
Finally, save the result to file4.txt:
$ grep -f <(grep -nf <(sed 's/ //g' file2.txt) file1.txt | sed 's/:.*/:/') <(nl -s':' file3.txt | sed -r 's/^ +//') | cut -d':' -f2 > file4.txt
You can do it similarly in a single call to awk:
awk 'FILENAME ~ /file2.txt/ { gsub(/ /, ""); a[$0]; next }
FILENAME ~ /file1.txt/ && $0 in a { b[FNR]; next }
FILENAME ~ /file3.txt/ && FNR in b { print }' file2.txt file1.txt file3.txt
You can also use two awks to avoid using the FILENAME variable:
awk 'FNR==NR { gsub(/ /, ""); a[$0]; next }
$0 in a { print FNR }' file2.txt file1.txt |
awk 'FNR==NR { a[$0]; next } FNR in a { print }' - file3.txt
Use > file4.txt to output to file4.txt after either.
Basically it's:
take file2.txt, strip the spaces, and store the lines in an associative array;
store the line numbers from file1.txt whose lines appear in that associative array in a second associative array keyed by line number;
test whether each line number of file3.txt is in the second associative array and print the line when there's a match.
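If it helps to see those three steps outside awk, here's a rough Python sketch of the same idea (my own sketch, assuming the files use consistent line endings; the filenames mirror the question):
# Step 1: lines of file2.txt with spaces stripped, kept in a set.
with open('file2.txt') as f2:
    stripped = {line.replace(' ', '') for line in f2}

# Step 2: line numbers of file1.txt whose lines appear in that set.
with open('file1.txt') as f1:
    keep = {n for n, line in enumerate(f1) if line in stripped}

# Step 3: write the lines of file3.txt at those line numbers to file4.txt.
with open('file3.txt') as f3, open('file4.txt', 'w') as f4:
    f4.writelines(line for n, line in enumerate(f3) if n in keep)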
Loop through the original file, and look for the corresponding line in file2.
When the lines match, print the corresponding line from file3.
linenr=0
filternr=1
for line in $(cat file1.txt); do
    (( linenr = linenr + 1 ))
    line2=$(sed -n ${filternr}p file2.txt | cut -d" " -f1)
    if [[ "${line}" = ${line2}* ]]; then
        (( filternr = filternr + 1 ))
        sed -n ${linenr}p file3.txt
    fi
done > file4.txt
When the files are large (actually, when the number of lines in file2 is large), you will want to change this solution to avoid having sed walk through file2 and file3 every time. Such a solution would be less simple to write/understand/maintain...
Looking through each file only once can be done with diff and redirection of stdin.
This solution only works when you are sure the files do not contain a '|' character:
#!/bin/bash
function mycheck {
    if [ -z "${filteredline}" ]; then
        exec 0<file2.txt
        read filteredline
    fi
    line2=${filteredline%% *}
    if [[ "${line}" = ${line2}* ]]; then
        echo ${line} | sed 's/.*|\t//'
        read filteredline
        if [ -z "${filteredline}" ]; then
            break;
        fi
    fi
}
IFS="
"
for line in $(diff -y file1.txt file3.txt); do
    mycheck "${line}"
done > file4.txt
I have a file with multiple KV pairs.
Input:
$ cat input.txt
k1:v1 k2:v2 k3:v3
...
I am only interested in the values; the keys (names) are just there to remember what each value means. Essentially I am looking to cut the keys out so that I can plot the value columns.
Output:
$ ...
v1 v2 v3
Is there a bash one-liner that can help me achieve this?
UPDATE
This is how I am currently doing it (it looks ugly):
>> cat input.txt | python -c "import sys; \
lines = sys.stdin.readlines(); \
values = [[i.split(':')[1] for i in item] for item in \
[line.split() for line in lines]]; \
import os; [os.system('echo %s'%v) for v in \
['\t'.join(value) for value in values]]" > output.txt
Is this OK for you?
sed -r 's/\w+://g' yourfile
test:
kent$ echo "k1:v1 k2:v2 k3:v3"|sed -r 's/\w+://g'
v1 v2 v3
Update:
Well, if your key contains "-" etc., see below:
kent$ echo "k1#-$%-^=:v1 k2:v2 k3:v3"|sed -r 's/[^ ]+://g'
v1 v2 v3
awk -v FS=':' -v RS=' ' -v ORS=' ' '{print $2}' foo.txt
http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators
I see sed, awk and python, so here's plain bash:
while IFS=' ' read -a kv ; do printf '%s ' "${kv[@]#*:}" ; done < input.txt
Just for good measure, here's a Perl version:
perl -n -e 'print(join(" ",values%{{@{[split(/[:\s]/,$_)]}}})," ")' < input.txt
The order of the values changes, though, so it's probably not going to be what you want.
Solution with awk:
awk '{split($0,p," "); for(kv in p) {split(p[kv],a,":"); printf "%s ",a[2];} print ""}' foo.txt
Try this
Input.txt
k1:v1 k2:v2 k3:v3
Code
awk -F " " '{for( i =1 ; i<=NF ;i+=1) print $i}' Input.txt | cut -d ":" -f 2 | tr '\n' ' '
Output
v1 v2 v3