Python RegEx Woes

I'm not sure why this isn't working:
import re
import csv
def check(q, s):
    match = re.search(r'%s' % q, s, re.IGNORECASE)
    if match:
        return True
    else:
        return False
tstr = []
# test strings
tstr.append('testthisisnotworking')
tstr.append('This is a TEsT')
tstr.append('This is a TEST mon!')
f = open('testwords.txt', 'rU')
reader = csv.reader(f)
for type, term, exp in reader:
    for i in range(2):
        if check(exp, tstr[i]):
            print exp + " hit on " + tstr[i]
        else:
            print exp + " did NOT hit on " + tstr[i]
f.close()
testwords.txt contains this line:
blah, blah, test
So essentially 'test' is the RegEx pattern. Nothing complex, just a simple word. Here's the output:
test did NOT hit on testthisisnotworking
test hit on This is a TEsT
test hit on This is a TEST mon!
Why does it NOT hit on the first string? I also tried \s*test\s* with no luck. Help?

By default, the csv module preserves the whitespace around fields in the input (this can be changed by using a different "dialect"). So exp contains " test" with a leading space.
A quick way to fix this would be to add:
exp = exp.strip()
after you read from the CSV file.
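As an alternative to calling strip(), the reader itself can be asked to drop the whitespace that follows each delimiter. The following is a minimal sketch, written for Python 3 and using an in-memory string as a stand-in for testwords.txt (the variable names are illustrative and not part of the original code):

```python
import csv
import io
import re

# In-memory stand-in for testwords.txt, holding the line from the question.
data = "blah, blah, test\n"

# Default behaviour: the space after each comma stays in the field.
default_row = next(csv.reader(io.StringIO(data)))

# skipinitialspace=True drops whitespace immediately after the delimiter.
clean_row = next(csv.reader(io.StringIO(data), skipinitialspace=True))

print(repr(default_row[2]))  # ' test'
print(repr(clean_row[2]))    # 'test'

# A pattern with a leading space cannot match a string that contains no space.
print(bool(re.search(default_row[2], 'testthisisnotworking', re.IGNORECASE)))  # False
print(bool(re.search(clean_row[2], 'testthisisnotworking', re.IGNORECASE)))    # True
```

Note that skipinitialspace only handles whitespace after a delimiter, so strip() is still the safer choice if trailing spaces are possible.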

Adding a print repr(exp) at the top of the first for loop shows that exp is ' test'; note the leading space.
This isn't that surprising, since csv.reader() splits on commas and keeps whatever whitespace follows them. Try changing your code to the following:
for type, term, exp in reader:
    exp = exp.strip()
    for s in tstr:
        if check(exp, s):
            print exp + " hit on " + s
        else:
            print exp + " did NOT hit on " + s
Note that in addition to the strip() call, which removes the leading and trailing whitespace, I changed your second for loop to iterate directly over the strings in tstr instead of over a range. There was actually a bug in your code: tstr contains three values, but you only checked the first two, because for i in range(2) only gives you i=0 and i=1.

Related

Python - Need help in printing "with at least 3 spaces between columns and be left-aligned for names and right-aligned for number of occurrence."

I am having trouble producing output with at least 3 spaces between columns, left-aligned for names and right-aligned for the number of occurrences. Please guide me; I am trying to solve this programming problem.
def nameCount(fname1,fname2):
    firstFile = open(fname1, 'r')
    fContent = firstFile.read()
    firstFile.close()
    secondFile = open(fname2, 'r')
    sContent = secondFile.read()
    secondFile.close()
    #Split first and last name to the following variables.
    for content in fContent:
        (first, last) = sContent.split()
        countFirstName = 0
        countSecondName = 0
        if first == content or last == content:
            countFirstName += 1
            countSecondName += 1
    thankYouMessage = 'Thank you for using the nameCount() function'
    return thankYouMessage

To print with space between pieces of text, just use "\n", which moves to a new line.
print("Hello" + "\n" + "\n" + "\n" + "\n" + "World")
Each "\n" starts a new line.

You can try using the %s format specifier to format the columns.
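To illustrate the formatting approach: printf-style formatting accepts a field width, left-aligns when the width is preceded by -, and right-aligns otherwise. Here is a rough sketch for Python 3 with made-up data (the counts dictionary and the three-space gap are assumptions for illustration, not the asker's actual files):

```python
# Hypothetical name counts; in the real task these would come from the files.
counts = {"Alice": 3, "Bob": 12, "Christopher": 1}

# Column widths are taken from the longest name and the longest number.
name_width = max(len(name) for name in counts)
count_width = max(len(str(count)) for count in counts.values())

# %-*s left-aligns the name, %*d right-aligns the count; '*' reads the width
# from the argument tuple. Three literal spaces separate the two columns.
lines = ["%-*s   %*d" % (name_width, name, count_width, count)
         for name, count in counts.items()]

for line in lines:
    print(line)
```

Because the widths are computed from the data, the gap between columns never drops below the three literal spaces in the format string.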

Python: Remove character only from end of string if character is ="/"

I add different values to the Houdini variables with Python.
Some of these variables are file paths and end with a "/"; others are just names and do not end with a "/".
In my current code I use [:-1] to remove the last character of the file path, so I don't have the "/".
The problem is that if I add a value like "Var_ABC", the result will be "Var_AB", since it also removes the last character.
How can I remove the last character only if the last character is a "/"?
This is what I have, and it works so far:
def set_vars():
    count = hou.evalParm('vars_names')
    user_name = hou.evalParm('user_name')
    for idx in range( 1,count+1):
        output = hou.evalParm('vars_' + str(idx))
        vars_path_out = hou.evalParm('vars_path_' + str(idx))
        vars_path = vars_path_out[:-1]
        hou.hscript("setenv -g " + output + "=" + vars_path)
        final_vars = hou.hscript("setenv -g " + output + "=" + vars_path)
    hou.ui.displayMessage(user_name +", " + "all variables are set.")
Thank you

As @jasonharper mentioned in the comments, you should probably use rstrip here. It is built in and, IMO, more readable than the conditional one-liner:
vars_path_out.rstrip('/')
This returns the string with any trailing / characters removed. If the string does not end with /, it is returned as-is.
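One caveat worth checking: rstrip('/') removes every trailing slash, not just one. If exactly one slash should go, an endswith() check is closer to the stated requirement. A small sketch in plain Python 3 (the helper name drop_one_slash is made up for illustration, and no Houdini calls are involved):

```python
def drop_one_slash(path):
    """Return path without its final '/', if it has one."""
    # endswith() is safe on an empty string, unlike path[-1].
    return path[:-1] if path.endswith('/') else path

print(drop_one_slash('Var_ABC'))       # Var_ABC
print(drop_one_slash('/proj/shots/'))  # /proj/shots
print(drop_one_slash('/proj//'))       # /proj/
print('/proj//'.rstrip('/'))           # /proj
print(repr(drop_one_slash('')))        # ''
```

For typical file paths with a single trailing slash, both approaches give the same result.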

Try this in your code:
vars_path_out = hou.evalParm('vars_path_' + str(idx))
if vars_path_out[-1] == '/':
    vars_path = vars_path_out[:-1]
else:
    vars_path = vars_path_out
Or, based on jasonharper's comment:
vars_path = vars_path_out.rstrip('/')
This is much better than the first version.

Use the endswith method to check whether it ends with /:
if vars_path_out.endswith('/'):
Or simply check the last character:
if vars_path_out[-1] == '/':
Like this:
vars_path = vars_path_out[:-1] if vars_path_out.endswith('/') else vars_path_out
Or like this:
if vars_path_out.endswith('/'):
    vars_path = vars_path_out[:-1]
else:
    vars_path = vars_path_out
Another way is the rstrip method:
vars_path = vars_path_out.rstrip('/')

Python not ignoring empty items in list

I have this code to print some strings to a text file, but I need Python to ignore all empty items, so it doesn't print empty lines.
I wrote this code, which is simple but should do the trick:
lastReadCategories = open('c:/digitalLibrary/' + connectedUser + '/lastReadCategories.txt', 'w')
for category in lastReadCategoriesList:
    if category.split(",")[0] is not "" and category is not None:
        lastReadCategories.write(category + '\n')
        print(category)
    else: print("/" + category + "/")
lastReadCategories.close()
I can see no problem with it, yet, python keeps printing the empty items to the file. All categories are written in this notation: "category,timesRead", that's why I ask python to see if the first string before the comma is not empty. Then I see if the whole item is not empty (is not None). In theory I guess it should work, right?
P.S.: I've already tried asking the if to check if 'category' is not "" and is not " ", still, the same result.

Test for boolean truth instead, and reverse your tests so that you are certain .split() will work in the first place; None.split() would throw an exception:
if category is not None and category.split(",")[0]:
The empty string is 'false-y', there is no need to test it against anything.
You could even just test for:
if category and not category.startswith(','):
for the same end result.
From comments, it appears you have newlines cluttering up your data. Strip those away when testing:
for category in lastReadCategoriesList:
    category = category.rstrip('\n')
    if category and not category.startswith(','):
        lastReadCategories.write(category + '\n')
        print(category)
    else: print("/{}/".format(category))
Note that you can simply alter category inside the loop; this avoids having to call .rstrip() multiple times.
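Putting those pieces together, the filtering rule can be pulled out into a small helper so it is easy to try on sample data without touching a file. This is only a sketch for Python 3: the function name clean_categories and the sample list are invented for illustration.

```python
def clean_categories(raw_categories):
    """Return the entries that have a non-empty name before the comma."""
    kept = []
    for category in raw_categories:
        if category is None:
            continue  # nothing to strip or split
        category = category.rstrip('\n')
        # Truthiness rejects '', startswith(',') rejects entries with no name.
        if category and not category.startswith(','):
            kept.append(category)
    return kept

sample = ['A,52', 'B,1\n', '\n', '', ',3', None, 'C,50']
print(clean_categories(sample))  # ['A,52', 'B,1', 'C,50']
```

The returned list can then be written to the file with one '\n' per entry.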

rstrip() your category before writing it back to the file:
lastReadCategories = open('c:/digitalLibrary/' + connectedUser +'/lastReadCategories.txt', 'w')
for category in lastReadCategoriesList:
    if category.split(",")[0] is not "" and category is not None:
        lastReadCategories.write(category.rstrip() + '\n')
        print(category.rstrip())
    else: print("/" + category + "/")
lastReadCategories.close()
I was able to test it with your sample list provided (without writing it to file):
lastReadCategoriesList = ['A,52', 'B,1\n', 'C,50', ',3']
for category in lastReadCategoriesList:
    if category.split(",")[0] is not "" and category is not None:
        print(category.rstrip())
    else: print("/" + category + "/")
>>> ================================ RESTART ================================
>>>
A,52
B,1
C,50
/,3/
>>>

The classic way to test for a blank string (i.e., one that contains only whitespace, as opposed to '') is with str.strip():
>>> st=' '
>>> bool(st)
True
>>> bool(st.strip())
False
This also works on an empty string:
>>> bool(''.strip())
False
You have if category.split(",")[0] is not "" ... and this is not the recommended way. You can do this:
if category.split(',')[0] and ...
Or, if you want to be wordier:
if bool(category.split(',')[0]) is not False and ...
And you may be dealing with an issue with leading whitespace in the CSV:
>>> ' ,'.split(',')
[' ', '']
>>> ' ,val'.split(',')
[' ', 'val']

Replace single quotes with double quotes in python, for use with insert into database

Was wondering whether anyone has a clever solution for fixing bad insert statements in Python, exported by a not-so-clever program. It didn't double the single quotes inside strings that contain a single quote. To make it a bit easier, all the values being inserted are strings.
So it has:
INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');
instead of:
INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');
Obviously there are many lines like this, and I don't want to replace the enclosing single quotes as well.
Was thinking of using split and join, but I'm not sure how to update the split values while looping over them. Sorry, I'm a noob. Something like the below, where I'm not sure how to do the #update bit:
import sys
fileIN = open('a.sql', "r")
line = fileIN.readline()
while line:
    bits = line.split("','")
    for bit in bits:
        if bit.find("'") > -1:
            #update bit
    line_out = "','".join(bits)
    sys.stdout.write(line_out)
    line = fileIN.readline()
Thanks

Based on katrielalex's suggestion, how about this:
>>> import re
>>> s = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
>>> def repl(m):
...     if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):
...         return m.group(0)
...     return m.group(1) + "''" + m.group(2)
...
>>> re.sub("(.)'(.)", repl, s)
"INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');"
And if you're into lookaround assertions (a negative lookbehind plus a negative lookahead), this is the headache-inducing pure regex version:
re.sub("((?<![(,])'(?![,)]))", "''", s)
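For anyone who wants to try the lookaround version on its own, here is a minimal check in Python 3. It assumes, as the question states, that every value is a quoted string, and additionally that no embedded quote sits directly next to a comma or parenthesis. The function name double_inner_quotes is hypothetical.

```python
import re

# A quote is treated as a delimiter if it directly follows '(' or ','
# or directly precedes ',' or ')'. Every other quote gets doubled.
INNER_QUOTE = re.compile(r"(?<![(,])'(?![,)])")

def double_inner_quotes(statement):
    return INNER_QUOTE.sub("''", statement)

bad = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
print(double_inner_quotes(bad))
# INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');
```

Running it twice would double the already-doubled quotes again, so it should only be applied to the unfixed export.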

while line:
    # Restrain line2 to inside parentheses
    line1, rest = line.split('(')
    line2, line3 = rest.split(')')
    # A bit cleaner
    new_bits = []
    for bit in line2.split(','):
        # Remove border ' characters
        bit = bit[1:-1]
        # Duplicate the ones inside
        if "'" in bit:
            bit = bit.replace("'", "''")
        # Re-add border '
        new_bits.append("'" + bit + "'")
    sys.stdout.write(line1 + '(' + ','.join(new_bits) + ')' + line3)
    line = fileIN.readline()

Warning: This depends way too much on the formatting of the SQL statement. However, if your input is only ever going to have the format "statements (params) end" then this will work every time.
import sys
fileIN = open('a.sql', "r")
line = fileIN.readline()
while line:
    #split out the parameters (between the ()'s)
    start, temp = line.split("(")
    params, end = temp.split(")")
    #replace the "'"s in the parameters (without the start and end quote)
    newParams = "','".join([x.replace("'", "''") for x in params[1:-1].split("','")])
    #join the statement back together
    line_out = start + "('" + newParams + "')" + end
    #next line
    sys.stdout.write(line_out)
    line = fileIN.readline()
Explanation:
Split the string into 3 parts: The query start, the parameters, and the end.
The list comprehension takes the parameters (without the starting/ending 's), splits them on the three-character separator ',' and, for every element in the resulting list (the individual data entries), replaces the 's with ''s.
The last line then joins the query start, the new params (with the parenthesis and quotes that were removed previously), and the end of the statement.

Another answer:
a = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
open_par = a.find("(")
close_par = a.find(")")
b = a[open_par+1:close_par]
c = b.split(",")
d = map(lambda x: '"' + x.strip().strip("'") + '"',c)
result = a[:open_par+1] + ",".join(d) + a[close_par:]

Went with:
import sys
import re
def repl(m):
    if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):
        return m.group(0)
    return m.group(1) + "''" + m.group(2)
fileIN = open('a.sql', "r")
line = fileIN.readline()
while line:
    line_out = re.sub("(.)'(.)", repl, line)
    sys.stdout.write(line_out)
    # Next line.
    line = fileIN.readline()

Python RegEx nested search and replace

I need to do a RegEx search and replace of all commas found inside of quote blocks.
i.e.
"thing1,blah","thing2,blah","thing3,blah",thing4
needs to become
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
my code:
inFile = open(inFileName,'r')
inFileRl = inFile.readlines()
inFile.close()
p = re.compile(r'["]([^"]*)["]')
for line in inFileRl:
    pg = p.search(line)
    # found comment block
    if pg:
        q = re.compile(r'[^\\],')
        # found comma within comment block
        qg = q.search(pg.group(0))
        if qg:
            # Here I want to reconstitute the line and print it with the replaced text
            #print re.sub(r'([^\\])\,',r'\1\,',pg.group(0))
I need to filter only the columns I want based on a RegEx, filter further,
then do the RegEx replace, then reconstitute the line back.
How can I do this in Python?

The csv module is perfect for parsing data like this, as csv.reader in the default dialect ignores quoted commas. csv.writer reinserts the quotes due to the presence of commas. I used StringIO to give a file-like interface to a string.
import csv
import StringIO
s = '''"thing1,blah","thing2,blah","thing3,blah"
"thing4,blah","thing5,blah","thing6,blah"'''
source = StringIO.StringIO(s)
dest = StringIO.StringIO()
rdr = csv.reader(source)
wtr = csv.writer(dest)
for row in rdr:
    wtr.writerow([item.replace('\\,',',').replace(',','\\,') for item in row])
print dest.getvalue()
result:
"thing1\,blah","thing2\,blah","thing3\,blah"
"thing4\,blah","thing5\,blah","thing6\,blah"

General edit
The question originally contained
"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4
which is no longer there. I also hadn't noticed the r'[^\\],' pattern, so I have completely rewritten my answer.
I am assuming that
"thing1,blah","thing2,blah","thing3,blah",thing4
and
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
are the strings as displayed, not their repr().
import re
ss = '"thing1,blah","thing2,blah","thing3\\,blah",thing4 '
regx = re.compile('"[^"]*"')
def repl(mat, ri = re.compile('(?<!\\\\),') ):
    # prefix each comma that is not already escaped with a backslash
    return ri.sub('\\\\,',mat.group())
print ss
print repr(ss)
print
print regx.sub(repl, ss)
print repr(regx.sub(repl, ss))
result
"thing1,blah","thing2,blah","thing3\,blah",thing4
'"thing1,blah","thing2,blah","thing3\\,blah",thing4 '
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
'"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4 '

You can try this regex:
>>> re.sub('(?<!"),(?!")', r"\\,",
...        '"thing1,blah","thing2,blah","thing3,blah",thing4')
#Gives "thing1\,blah","thing2\,blah","thing3\,blah",thing4
The logic behind this is to substitute a , with \, when it is neither immediately preceded nor immediately followed by a ".
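A caveat with the lookaround pattern above is that it also escapes commas between unquoted fields when neither neighbour is a quote. Here is a sketch of an alternative, assuming quoted blocks never contain an embedded double quote: match each quoted block first, then escape only the commas inside it. The names below are illustrative and the code targets Python 3.

```python
import re

QUOTED_BLOCK = re.compile(r'"[^"]*"')
UNESCAPED_COMMA = re.compile(r'(?<!\\),')

def escape_quoted_commas(line):
    """Prefix commas inside double-quoted blocks with a backslash."""
    def fix_block(match):
        # A function replacement avoids backslash processing of a template.
        return UNESCAPED_COMMA.sub(lambda _: '\\,', match.group(0))
    return QUOTED_BLOCK.sub(fix_block, line)

line = '"thing1,blah","thing2\\,blah",thing3,thing4'
print(escape_quoted_commas(line))
# "thing1\,blah","thing2\,blah",thing3,thing4
```

Commas that are already escaped are left alone, so applying the function twice gives the same result as applying it once.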

I came up with an iterative solution using several regex functions:
finditer(), findall(), group(), start() and end()
There's a way to turn all this into a recursive function that calls itself.
Any takers?
outfile = open(outfileName,'w')
p = re.compile(r'["]([^"]*)["]')
q = re.compile(r'([^\\])(,)')
for line in outfileRl:
    pg = p.finditer(line)
    pglen = len(p.findall(line))
    if pglen > 0:
        mpgstart = 0
        mpgend = 0
        for i,mpg in enumerate(pg):
            if i == 0:
                outfile.write(line[:mpg.start()])
            qg = q.finditer(mpg.group(0))
            qglen = len(q.findall(mpg.group(0)))
            if i > 0 and i < pglen:
                outfile.write(line[mpgend:mpg.start()])
            if qglen > 0:
                for j,mqg in enumerate(qg):
                    if j == 0:
                        outfile.write( mpg.group(0)[:mqg.start()] )
                    outfile.write( re.sub(r'([^\\])(,)',r'\1\\\2',mqg.group(0)) )
                    if j == (qglen-1):
                        outfile.write( mpg.group(0)[mqg.end():] )
            else:
                outfile.write(mpg.group(0))
            if i == (pglen-1):
                outfile.write(line[mpg.end():])
            mpgstart = mpg.start()
            mpgend = mpg.end()
    else:
        outfile.write(line)
outfile.close()

Have you looked into str.replace()?
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old
replaced by new. If the optional argument count is given, only the
first count occurrences are replaced.
Here is some documentation.
Hope this helps.
