Comparing two files and removing all whitespaces

Is there a more elegant way of comparing these two files?
Right now I am getting the following error message: syntax error near unexpected token (... diff <( tr -d ' '.
result = Popen("diff <( tr -d ' \n' <" + file1 + ") <( tr -d ' \n' <"
               + file2 + ") | wc -l", shell=True, stdout=PIPE).stdout.read()
Python seems to read "\n" as a literal character.

The constructs you are using are interpreted by bash and do not form a standalone statement that you can pass to system() or exec().
<( ${CMD} )
< ${FILE}
${CMD1} | ${CMD2}
As such, you will need to wire up the redirection and pipelines yourself, or call on bash to interpret the line for you (as @wizzwizz4 suggests).
A better solution would be to use something like difflib that will perform this internally to your process rather than calling on system() / fork() / exec().
Using difflib.unified_diff will give you a similar result:
import difflib
def read_file_no_blanks(filename):
    with open(filename, 'r') as f:
        lines = f.readlines()
    for line in lines:
        if line == '\n':
            continue
        yield line
def count_differences(diff_lines):
    diff_count = 0
    for line in diff_lines:
        if line[0] not in [ '-', '+' ]:
            continue
        if line[0:3] in [ '---', '+++' ]:
            continue
        diff_count += 1
    return diff_count
a_lines = list(read_file_no_blanks('a'))
b_lines = list(read_file_no_blanks('b'))
diff_lines = difflib.unified_diff(a_lines, b_lines)
diff_count = count_differences(diff_lines)
print('differences: %d' % ( diff_count ))

This fails because you are using bash syntax: with shell=True, Popen hands the command to /bin/sh, which on many systems is not bash and does not understand process substitution (the <( ... ) construct).
If you wish to do it this way, either write a shell script or invoke bash explicitly (shell=True is then unnecessary, because the list already names the program to run):
result = Popen(['bash', '-c',
                "diff <( tr -d ' \n' <" + file1 + ") <( tr -d ' \n' <"
                + file2 + ") | wc -l"], stdout=PIPE).stdout.read()
This is not an elegant solution, however, since it relies on GNU coreutils and bash. A more elegant solution would be pure Python. You could do this with the difflib module and the re module.
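A minimal sketch of that pure-Python route, under some assumptions: the helper names (strip_whitespace, read_stripped, files_match_ignoring_whitespace, similarity) are made up for illustration, and the regex \s+ removes tabs and other whitespace as well, whereas tr -d ' \n' removes only spaces and newlines.

```python
import difflib
import re


def strip_whitespace(text):
    """Return text with every whitespace character removed."""
    return re.sub(r"\s+", "", text)


def read_stripped(path):
    """Read a file and return its content without any whitespace."""
    with open(path, "r") as f:
        return strip_whitespace(f.read())


def files_match_ignoring_whitespace(path_a, path_b):
    """True if both files are identical once whitespace is removed."""
    return read_stripped(path_a) == read_stripped(path_b)


def similarity(path_a, path_b):
    """Ratio in [0, 1] from difflib.SequenceMatcher; 1.0 means identical."""
    matcher = difflib.SequenceMatcher(
        None, read_stripped(path_a), read_stripped(path_b)
    )
    return matcher.ratio()
```

A plain equality check answers "are they the same?"; SequenceMatcher is only needed if you also want a measure of how far apart the two files are.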

Related

Get a string in Shell/Python with subprocess

Following up on my earlier question, "Get a string in Shell/Python using sys.argv", I need to change my code: main.py now has to launch a subprocess from this function:
def download_several_apps(self):
    subproc_two = subprocess.Popen(["./readtext.sh", self.inputFileName_download], stdout=subprocess.PIPE)
Here is my file readtext.sh
#!/bin/bash
filename="$1"
counter=1
while IFS=: true; do
    line=''
    read -r line
    if [ -z "$line" ]; then
        break
    fi
    python3 ./download.py \
        -c ./credentials.json \
        --blobs \
        "$line"
done < "$filename"
And my download.py file
if (len(sys.argv) == 2):
    downloaded_apk_default_location = 'Downloads/'
else:
    readtextarg = os.popen("ps " + str(os.getppid()) + " | awk ' { out = \"\"; for(i = 6; i <= NF; i++) out = out$i\" \" } END { print out } ' ").read()
    textarg = readtextarg.split(" ")[1 : -1][0]
    downloaded_apk_default_location = 'Downloads/'+textarg[1:]
How can I get and print self.inputFileName_download in my download.py file?
I used sys.argv as answered by @tripleee in my previous post, but it doesn't work the way I need.

OK, I changed the last line to:
downloaded_apk_default_location = 'Downloads/'+textarg.split("/")[-1]
to get the text file name.

The shell indirection seems completely superfluous here.
import download
with open(self.inputFileName_download) as apks:
    for line in apks:
        if line == '\n':
            break
        blob = line.rstrip('\n')
        download.something(blob=blob, credentials='./credentials.json')
... where obviously I had to speculate about what the relevant function from download.py might be called.
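The file-reading part of that loop can be factored out and tested on its own. The following is only a sketch: read_blobs is a hypothetical helper name, and it mirrors the shell loop by stopping at the first empty line.

```python
def read_blobs(path):
    """Yield each line of the file at `path`, stopping at the first blank line."""
    with open(path) as apks:
        for line in apks:
            blob = line.rstrip("\n")
            if not blob:
                # Same stopping rule as the `[ -z "$line" ]` test in readtext.sh
                break
            yield blob
```

Each yielded value could then be passed to whatever function download.py actually exposes, with the input file name available in the same process and no need to inspect the parent's command line.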

How to solve a subprocess that contains a '|'

This code does not work.
I wrote it like this:
str = "curl -s 'URL_ADDRESS' | tail -1".split()
p = subprocess.Popen(str,stdout=subprocess.PIPE).stdout
data = p.read()
p.close()
print(data)
But the result is b''.
What's the problem with this?

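When you use subprocess without a shell, '|' is not interpreted as a pipe; it is passed to curl as a literal argument. Chain two processes instead, connecting the first one's stdout to the second one's stdin. Note also that split() leaves the quote characters in the URL argument, so build the argument lists directly.
This will solve the problem.
curl = ["curl", "-s", "URL_ADDRESS"]
tail = ["tail", "-1"]
temp = subprocess.Popen(curl, stdout=subprocess.PIPE).stdout
temp1 = subprocess.Popen(tail, stdin=temp, stdout=subprocess.PIPE).stdout
temp.close()
data = temp1.read()
temp1.close()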
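The same two-process chain can be written as a small reusable function. This is a sketch, not a drop-in replacement: run_pipeline is a hypothetical name, and the two commands are stand-ins that use the Python interpreter itself instead of curl and tail so the example runs anywhere.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer` without a shell; return the consumer's stdout bytes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy so the producer sees a closed pipe if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


# Stand-in for "curl -s URL": prints three lines.
producer_cmd = [sys.executable, "-c", "print('first'); print('second'); print('last')"]
# Stand-in for "tail -1": prints the last line of its input.
consumer_cmd = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().splitlines()[-1])",
]
result = run_pipeline(producer_cmd, consumer_cmd)
print(result)
```

Using communicate() rather than reading the pipe directly also waits for the consumer to finish, so no zombie processes are left behind.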

Calling bash command inside Python returns error, but works in terminal

Here is the excerpt of my code related to this:
def grd_commands(directory):
    for filename in os.listdir(directory)[1:]:
        print filename
        new_filename = ''
        first_letter = ''
        second_letter = ''
        bash_command = 'gmt grdinfo ' + filename + ' -I-'
        print bash_command
        coordinates = Popen(bash_command, stdout=PIPE, shell=True)
        coordinates = coordinates.communicate()
        latlong = re.findall(r'^\D*?([-+]?\d+)\D*?[-+]?\d+\D*?([-+]?\d+)', coordinates)
        if '-' in latlong[1]:
            first_letter = 'S'
        else:
            first_letter = 'N'
        if '-' in latlong[0]:
            second_letter = 'W'
        else:
            second_letter = 'E'
        new_filename = first_letter + str(latlong[1]) + second_letter + str(latlong[0]) + '.grd'
        Popen('gmt grdconvert ' + str(filename) + ' ' + new_filename, shell=True)
filename is the name of the file that is being passed to the function. When I run my code, I receive this error:
/bin/sh: gmt: command not found
Traceback (most recent call last):
  File "/Users/student/Desktop/Code/grd_commands.py", line 38, in <module>
    main()
  File "/Users/student/Desktop/Code/grd_commands.py", line 10, in main
    grd_commands(directory)
  File "/Users/student/Desktop/Code/grd_commands.py", line 23, in grd_commands
    latlong = re.findall(r'^\D*?([-+]?\d+)\D*?[-+]?\d+\D*?([-+]?\d+)', coordinates).split('\n')
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer
If I print out the string bash_command and try entering it into terminal it fully functions. Why doesn't it work when being called by my Python script?

The entire command line is being treated as a single command name. You need to either use shell=True to have the shell parse it as a command line:
coordinates = Popen(bash_command, stdout=PIPE, shell=True)
or preferably store the command name and its arguments as separate elements of a list:
bash_command = ['gmt', 'grdinfo', filename, '-I-']
coordinates = Popen(bash_command, stdout=PIPE)

Popen takes a list of arguments. The documentation warns against using shell=True:
Passing shell=True can be a security hazard if combined with untrusted input.
Try this:
from subprocess import Popen, PIPE
bash_command = 'gmt grdinfo ' + filename + ' -I-'
print(bash_command)
coordinates = Popen(bash_command.split(), stdout=PIPE)
print(coordinates.communicate()[0])
Ensure gmt is installed in a location specified by PATH in your /etc/environment file:
PATH=$PATH:/path/to/gmt
Alternatively, specify the path to gmt in bash_command:
bash_command = '/path/to/gmt grdinfo ' + filename + ' -I-'
You should be able to find the path with:
which gmt

As other people have suggested, an actual list is the best approach instead of a string. With a list, each element reaches the program as a single argument exactly as written, so a filename containing spaces needs no backslash escaping (adding one would change the name that gmt sees).
for filename in os.listdir(directory)[1:]:
    bash_command = ['gmt', 'grdinfo', filename, '-I-']
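To illustrate the list form, here is a sketch in which a child Python process stands in for gmt (so it runs without GMT installed); run_and_capture is a hypothetical helper name. It also shows that communicate() returns a (stdout, stderr) tuple, which is why passing its result straight to re.findall raises the TypeError shown in the question.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run `args` (a list, no shell) and return its standard output as text."""
    process = subprocess.Popen(args, stdout=subprocess.PIPE)
    # communicate() returns a tuple (stdout, stderr); only stdout is piped here.
    stdout_bytes, _ = process.communicate()
    return stdout_bytes.decode()


# Stand-in for ['gmt', 'grdinfo', filename, '-I-']: echo the argument back.
filename = "my file with spaces.grd"
echoed = run_and_capture(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename]
)
print(echoed)
```

The filename arrives in the child process as one argument, spaces included, without any quoting or escaping.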

grep: write error: Broken pipe with subprocess

I get a couple of "grep: write error" messages when I run this code.
What am I missing?
This is only part of it:
while d <= datetime.datetime(year, month, daysInMonth[month]):
    day = d.strftime("%Y%m%d")
    print day
    results = [day]
    first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True, stdout=subprocess.PIPE, )
    output1=first.communicate()[0]
    d += delta
    day = d.strftime("%Y%m%d")
    second=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True, stdout=subprocess.PIPE, )
    output2=second.communicate()[0]
    articleList = (output1.split('\n'))
    articleList2 = (output2.split('\n'))
    results.append( len(articleList)+len(articleList2))
    w.writerow(tuple(results))
    d += delta

When you do
A | B
in a shell, process A's output is piped into process B as input. If process B shuts down before reading all of process A's output (e.g. because it found what it was looking for, which is the function of the -l option), then process A may complain that its output pipe was prematurely closed.
These errors are basically harmless, and you can work around them by redirecting stderr in the subprocesses to /dev/null.
A better approach, though, may simply be to use Python's powerful regex capabilities to read the files:
import glob
import re
def fileContains(fn, pat):
    with open(fn) as f:
        for line in f:
            if re.search(pat, line):
                return True
    return False
first = []
for file in glob.glob(monthDir +"/"+day+"*.txt"):
    if fileContains(file, 'Algeria|Bahrain') and fileContains(file, 'Protest|protesters'):
        first.append(file)
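A variant of this approach that also mimics grep's -i (ignore case) and -w (whole word) flags might look like the sketch below. The function names are hypothetical, and the regex word boundary \b is only an approximation of what grep -w considers a word.

```python
import glob
import os
import re


def compile_word_pattern(alternatives):
    """Build a case-insensitive, whole-word regex matching any of the given words."""
    joined = "|".join(re.escape(word) for word in alternatives)
    return re.compile(r"\b(?:" + joined + r")\b", re.IGNORECASE)


def file_contains(path, pattern):
    """True if any line of the file matches the compiled pattern."""
    with open(path) as f:
        return any(pattern.search(line) for line in f)


def find_articles(month_dir, day):
    """Return the day's .txt files that mention both a country and a protest word."""
    countries = compile_word_pattern(["Algeria", "Bahrain"])
    protests = compile_word_pattern(["Protest", "protesters"])
    paths = sorted(glob.glob(os.path.join(month_dir, day + "*.txt")))
    return [
        path
        for path in paths
        if file_contains(path, countries) and file_contains(path, protests)
    ]
```

Because no external process is involved, there is no pipe to break and len(find_articles(monthDir, day)) gives the article count directly.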

To find the files matching two patterns, the command structure should be:
grep -l pattern1 $(grep -l pattern2 files)
$(command) substitutes the output of the command into the command line.
So your script should be:
first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' $(grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt)", shell=True, stdout=subprocess.PIPE, )
and similarly for second.

If you are just looking for whole words, you could use the count() member function:
# assuming names is a list of filenames
for fn in names:
    with open(fn) as infile:
        text = infile.read().lower()
    # remove punctuation
    text = text.replace(',', '')
    text = text.replace('.', '')
    words = text.split()
    print "Algeria:", words.count('algeria')
    print "Bahrain:", words.count('bahrain')
    print "protesters:", words.count('protesters')
    print "protest:", words.count('protest')
If you want more powerful filtering, use re.
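As one possible reading of "use re", the sketch below tokenizes with a regular expression instead of stripping individual punctuation characters. count_words is a hypothetical helper, and the pattern [a-z]+ assumes plain ASCII words.

```python
import re
from collections import Counter


def count_words(text, words_of_interest):
    """Count whole-word, case-insensitive occurrences of each word of interest."""
    # Lowercase first, then pull out runs of letters; punctuation is skipped.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {word: counts[word] for word in words_of_interest}


sample = "Protest in Algeria. Protesters, protest!"
print(count_words(sample, ["algeria", "bahrain", "protest", "protesters"]))
```

A Counter returns 0 for words that never occur, so absent words need no special handling.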

Add a stderr argument to the Popen call; the appropriate value depends on the Python version. The following works on Python versions below 3 and redirects grep's error messages into the same pipe as its standard output:
first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True, stdout=subprocess.PIPE, stderr = subprocess.STDOUT)

How to print output using python?

When this .exe file runs it prints a screen full of information and I want to print a particular line out to the screen, here on line "6":
cmd = ' -a ' + str(a) + ' -b ' + str(b) + str(Output)
process = Popen(cmd, shell=True, stderr=STDOUT, stdout=PIPE)
outputstring = process.communicate()[0]
outputlist = outputstring.splitlines()
Output = outputlist[5]
print cmd
This works fine:
cmd = ' -a ' + str(a) + ' -b ' + str(b)
This doesn't work:
cmd = ' -a ' + str(a) + ' -b ' + str(b) + str(Output)
I get an error saying Output isn't defined. But when I cut and paste:
outputstring = process.communicate()[0]
outputlist = outputstring.splitlines()
Output = outputlist[5]
before the cmd statement, it tells me that process isn't defined. str(Output) should be what is printed on line 6 when the .exe is run.

You're trying to append the result of a call into the call itself. You have to run the command once without the + str(Output) part to get the output in the first place.
Think about it this way. Let's say I was adding some numbers together.
z = 5 + b
b = z + 2
I have to define either z or b before the statements, depending on the order of the two statements. I can't use a variable before I know what it is. You're doing the same thing, using the Output variable before you define it.

It's not supposed to be a "dance" to move things around. It's a matter of what's on the left side of the "=". If it's on the left side, it's getting created; if it's on the right side it's being used.
As it is, your example can't work even a little bit because line one wants part of output, which isn't created until the end.
The easiest way to understand this is to work backwards. What do you want to see as the final result?
print output[5]
Right? So to get there, you have to get this from a larger string, right?
output= outputstring.splitlines()
print output[5]
So where did outputstring come from? It was from some subprocess.
outputstring = process.communicate()[0]
output= outputstring.splitlines()
print output[5]
So where did process come from? It was created by subprocess.Popen.
process = Popen(cmd, shell=True, stderr=STDOUT, stdout=PIPE)
outputstring = process.communicate()[0]
output= outputstring.splitlines()
print output[5]
So where did cmd come from? I can't tell. Your example doesn't make sense on what command is being executed.
cmd = ?
process = Popen(cmd, shell=True, stderr=STDOUT, stdout=PIPE)
outputstring = process.communicate()[0]
output= outputstring.splitlines()
print output[5]
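Put together in execution order, and written for Python 3, the chain looks like the sketch below. The real command is unknown, so a child Python process that prints ten numbered lines stands in for the .exe; that stand-in is purely an assumption for illustration.

```python
import subprocess
import sys

# Stand-in for the real .exe and its -a/-b arguments (an assumption):
# a child Python process that prints ten numbered lines.
cmd = [sys.executable, "-c", "for i in range(1, 11): print('line', i)"]

# 1. Start the process, merging stderr into stdout as in the question.
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# 2. Collect everything it printed and decode the bytes to text.
outputstring = process.communicate()[0].decode()
# 3. Split into lines and pick the sixth one (index 5).
outputlist = outputstring.splitlines()
sixth_line = outputlist[5]
print(sixth_line)
```

Every name is assigned before the line that reads it, which is exactly the ordering the question's code gets wrong.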

Just change your first line to:
cmd = ' -a ' + str(a) + ' -b ' + str(b)
and the print statement at the end to:
print cmd + str(Output)
This is without knowing exactly what it is you want to print.
It seems as if your problem is trying to use Output before you actually define what the Output variable is (as the posts above say).

Like you said, a variable has to be defined before you can use it. Therefore, when you call str(Output) above Output = outputlist[5], Output doesn't exist yet. You need the actual call first:
cmd = ' -a ' + str(a) + ' -b ' + str(b)
Then you can print the output of that command. The line
cmd_return = ' -a ' + str(a) + ' -b ' + str(b) + str(Output)
should go directly above print cmd_return.
