Python: use a list index as a function argument - python

I'm trying to use list indices as arguments for a function that performs regex searches and substitutions over some text files. The different search patterns have been assigned to variables and I've put the variables in a list that I want to feed the function as it loops through a given text.
When I call the function using a list index as an argument, nothing happens (the program runs, but no substitutions are made in my text files). However, I know the rest of the code is working, because if I call the function with any of the search variables individually it behaves as expected.
When I give the print function the same list index as I'm trying to use to call my function it prints exactly what I'm trying to give as my function argument, so I'm stumped!
search1 = re.compile(r'pattern1')
search2 = re.compile(r'pattern2')
search3 = re.compile(r'pattern3')
searches = ['search1', 'search2', 'search2']
i = 0
for …
…
def fun(find)
…
fun(searches[i])
if i <= 2:
i += 1
…
As mentioned, if I use fun(search1) the script edits my text files as wished. Likewise, if I add the line print(searches[i]) it prints search1 (etc.), which is what I'm trying to give as an argument to fun.
Being new to Python and programming, I have a limited investigative skill set, but after poking around as best I could and subsequently running print(searches.index(search1)) and getting a pattern1 is not in list error, my leading (and only) theory is that I'm giving my function the actual regex expression rather than the variable it's stored in?
Much thanks for any forthcoming help!

Try changing your searches list to [search1, search2, search3] instead of ['search1', 'search2', 'search2']. The quoted version contains plain strings, not the compiled regex objects.
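To illustrate the difference, here is a minimal sketch. The patterns, the sample text and the two-argument fun are made up for the example; they are not taken from the elided parts of the question's code.

```python
import re

# Hypothetical patterns standing in for the question's search1/search2.
search1 = re.compile(r'cat')
search2 = re.compile(r'dog')

as_strings = ['search1', 'search2']   # just the names, as plain text
as_patterns = [search1, search2]      # the compiled pattern objects themselves

def fun(find, text):
    # re.sub accepts either a pattern string or a compiled pattern.
    return re.sub(find, 'X', text)

text = 'cat and dog'
unchanged = fun(as_strings[0], text)   # searches for the literal text "search1"
changed = fun(as_patterns[0], text)    # searches with the compiled r'cat' pattern
print(unchanged)
print(changed)
```

This would also explain why print(searches[i]) shows search1: it is printing the string 'search1', not the pattern stored in the variable of that name.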

Thanks to all for the help. eyl327's comment that I should use a list or dictionary to store my regular expressions pointed me in the right direction.
However, because I was using regex in my search patterns, I couldn't get it to work until I also created a list of compiled expressions (discovered via this thread on stored regex strings).
I much appreciate juanpa.arrivillaga's point that I should have provided an MRE (please forgive me; with a highly limited skill set, this in itself can be hard to do). I'll just give an excerpt of a slightly amended version of my actual code demonstrating the answer (once again, please forgive its long-windedness; I'm not presently able to do anything more elegant):
…
# put regex search patterns in a list
rawExps = ['search pattern 1', 'search pattern 2', 'search pattern 3']
# create a new list of compiled search patterns
compiledExps = [regex.compile(expression, regex.V1) for expression in rawExps]
i = 0
storID = 0
newText = ""
for file in filepathList:
    for expression in compiledExps:
        with open(file, 'r') as text:
            thisText = text.read()
        lines = thisText.splitlines()
        setStorID = regex.search(compiledExps[i], thisText)
        if setStorID is not None:
            storID = int(setStorID.group())
        for line in lines:
            def idSub(find):
                global storID
                global newText
                match = regex.search(find, line)
                if match is not None:
                    newLine = regex.sub(find, str(storID), line) + "\n"
                    newText = newText + newLine
                    storID = plus1(int(storID), 1)
                else:
                    newLine = line + "\n"
                    newText = newText + newLine
            # list index number can be used as an argument in the function call
            idSub(compiledExps[i])
        if i <= 2:
            i += 1
        write()
        newText = ""
    i = 0

Related

Replacing specific substrings in a specific part of a string

I have the following text file that needs to be edited in a certain manner. Only the part of the file inside the (:init section should be overwritten; nothing else should be edited.
File:
(define (problem bin-picking-doosra)
(:domain bin-picking-second)
;(:requirements :typing :negative-preconditions)
(:objects
)
(:init
(batsmen first_batsman)
(bowler none_bowler)
(umpire third_umpire)
(spectator no_spectator)
)
(:goal (and
(batsmen first_batsman)
(bowler last_bowler)
(umpire third_umpire)
(spectator full_spectator)
)
)
)
In this file I want to replace every line inside the (:init section with the required string. In this case, I want to replace:
1. (batsmen first_batsman) with (batsmen none_batsman)
2. (bowler none_bowler) with (bowler first_bowler)
3. (umpire third_umpire) with (umpire leg_umpire)
4. (spectator no_spectator) with (spectator empty_spectator)
The code I currently have is the following:
file_path = "/home/mus/problem_turtlebot.pddl"
s = open(file_path).read()
s = s.replace('(batsmen first_batsman)', '(batsmen '+ predicate_batsmen + '_batsman)')
f = open(file_path, 'w')
f.write(s)
f.close()
The variable predicate_batsmen here contains the word none. It works fine this way, but this code only handles the first replacement listed above.
There are three problems that I have.
1. This code also changes the '(batsmen first_batsman)' line in the (:goal section, which I don't want. I only want it to change the (:init section.
2. For the other strings in the (:init section, I currently have to repeat this code with a different statement. E.g. for '(bowler none_bowler)', i.e. the second replacement listed above, I have to copy the same lines again, which I think is not good coding technique. Is there a better way?
3. Consider the first string in (:init that is to be overwritten, i.e. (batsmen first_batsman). Is there a way in Python to replace whatever is written in the question-mark part of a string like (batsmen ??????_batsman) with none? For now it is 'first', but even if it is 'second' ((batsmen second_batsman)) or 'last' ((batsmen last_batsman)), I want to replace it with 'none' ((batsmen none_batsman)).
Any ideas on these issues?
Thanks
First of all you need to find the init-group. The init-group seems to have the structure:
(:init
...
)
where ... is some repetition of text contained inside parentheses, e.g. "(batsmen first_batsman)". Regular expressions are a powerful way to locate these kinds of patterns in text. If you are not familiar with regular expressions (or regex for short), have a look here.
The following regex locates this group:
import re
#Matches the items in the init-group:
item_regex = r"\([\w ]+\)\s+"
#Matches the init-group including items:
init_group_regex = re.compile(r"(\(:init\s+({})+\))".format(item_regex))
init_group = init_group_regex.search(s).group()
Now you have the init-group in init_group. The next step is to locate the terms you want to replace, and actually replace them. re.sub can do just that! First store the mappings in a dictionary:
mappings = {'batsmen first_batsman': 'batsmen '+ predicate_batsmen + '_batsman',
'bowler none_bowler': 'bowler first_bowler',
'umpire third_umpire': 'umpire leg_umpire',
'spectator no_spectator': 'spectator empty_spectator'}
Finding the occurrences and replacing them by their corresponding value one-by-one:
for key, val in mappings.items():
    init_group = re.sub(key, val, init_group)
Finally you can replace the init-group in the original string:
s = init_group_regex.sub(init_group, s)
This is really flexible! You can use regex in mappings to have it match anything you like, including:
mappings = {r'batsmen \w+_batsman': 'batsmen ' + predicate_batsmen + '_batsman'}
to match 'batsmen none_batsman', 'batsmen first_batsman' etc. Note that the key here does not include the surrounding parentheses, so the replacement value must not add them either.
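Putting the steps of this answer together, a self-contained sketch might look like the following. It makes a few assumptions that are not in the original: the file is shortened to an inline string, the literal 'none' stands in for predicate_batsmen, the helper rewrite_init is a made-up name, and a function is passed to sub instead of a replacement string (this sidesteps any backslash handling in the replacement text).

```python
import re

# Shortened, inline stand-in for the PDDL file from the question.
pddl = """(define (problem bin-picking-doosra)
(:domain bin-picking-second)
(:init
(batsmen first_batsman)
(bowler none_bowler)
)
(:goal (and
(batsmen first_batsman)
(bowler last_bowler)
)
)
)
"""

# One "(word word)" item followed by whitespace.
item_regex = r"\([\w ]+\)\s+"
# The whole init-group: "(:init", one or more items, closing ")".
init_group_regex = re.compile(r"\(:init\s+(?:{})+\)".format(item_regex))

# Regex keys, so any "<something>_batsman" or "<something>_bowler" is caught.
mappings = {
    r"batsmen \w+_batsman": "batsmen none_batsman",
    r"bowler \w+_bowler": "bowler first_bowler",
}

def rewrite_init(match):
    # Apply every mapping to the matched init-group only.
    block = match.group()
    for pattern, replacement in mappings.items():
        block = re.sub(pattern, replacement, block)
    return block

result = init_group_regex.sub(rewrite_init, pddl)
print(result)
```

Because only the text matched by init_group_regex is passed through the mappings, the (:goal section is left as it was.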

Can you spot the problem with this REGEX statement?

I'm running .txt files through a for loop which should slice out keywords and .append them to lists. For some reason my regex statements are returning really odd results.
My first statement which iterates through the full filenames and slices out the keyword works well.
# Creates a workflow list of file names within target directory for further iteration
stack = os.listdir(
    "/Users/me/Documents/software_development/my_python_code/random/countries"
)
# declares list, to be filled, and their associated regular expression, to be used,
# in the primary loop
names = []
name_pattern = r"-\s(.*)\.txt"
# PRIMARY LOOP
for entry in stack:
    if entry == ".DS_Store":
        continue
    # extraction of country name from file name into `names` list
    name_match = re.search(name_pattern, entry)
    name = name_match.group(1)
    names.append(name)
This works fine and creates the list that I expect
However, once I move on to a similar process with the actual contents of files, it no longer works.
religions = []
reli_pattern = r"religion\s=\s(.+)."
# PRIMARY LOOP
for entry in stack:
    if entry == ".DS_Store":
        continue
    # opens and reads file within `contents` variable
    file_path = (
        "/Users/me/Documents/software_development/my_python_code/random/countries" + "/" + entry
    )
    selection = open(file_path, "rb")
    contents = str(selection.read())
    # extraction of religion type and placement into `religions` list
    reli_match = re.search(reli_pattern, contents)
    religion = reli_match.group(1)
    religions.append(religion)
The results should be something like: "therevada", "catholic", "sunni" etc.
Instead I'm getting seemingly random pieces of text from the document that have nothing to do with my regex, like ruler names and stat values that do not contain the word "religion".
To try and figure this out I isolated some of the code in the following way:
contents = "religion = catholic"
reli_pattern = r"religion\s=\s(.*)\s"
reli_match = re.search(reli_pattern, contents)
print(reli_match)
And None is printed to the console so I am assuming the problem is with my REGEX. What silly mistake am I making which is causing this?
Your regular expression (religion\s=\s(.*)\s) requires trailing whitespace (the last \s there). Since your string doesn't have any, the search finds nothing and re.search returns None.
You should either:
Change your regex to r"religion\s=\s(.*)", or
Change the string you're searching to have trailing whitespace (i.e. 'religion = catholic' becomes 'religion = catholic ').
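Both options can be checked against the isolated test string from the question. This is only a small sketch; the variable names strict, relaxed and padded are made up for the comparison.

```python
import re

contents = "religion = catholic"

# Original pattern: the trailing \s needs a whitespace character after the value.
strict = re.search(r"religion\s=\s(.*)\s", contents)

# Option 1: the same pattern without the trailing \s.
relaxed = re.search(r"religion\s=\s(.*)", contents)

# Option 2: the original pattern, but the searched string now ends in a space.
padded = re.search(r"religion\s=\s(.*)\s", contents + " ")

print(strict)
print(relaxed.group(1))
print(padded.group(1))
```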

In Python,if startswith values in tuple, I also need to return which value

I have an area codes file I put in a tuple
for line1 in area_codes_file.readlines():
    if area_code_extract.search(line1):
        area_codes.append(area_code_extract.search(line1).group())
area_codes = tuple(area_codes)
and a file I read into Python full of phone numbers.
If a phone number starts with one of the area codes in the tuple, I need to do two things:
1. Keep the number.
2. Know which area code it matched, as I need to put the area codes in brackets.
So far, I was only able to do 1:
for line in txt.readlines():
    is_number = phonenumbers.parse(line,"GB")
    if phonenumbers.is_valid_number(is_number):
        if line.startswith(area_codes):
            print (line)
How do I do the second part?
The simple (if not necessarily highest performance) approach is to check each prefix individually, and keep the first match:
for line in txt:
    is_number = phonenumbers.parse(line,"GB")
    if phonenumbers.is_valid_number(is_number):
        if line.startswith(area_codes):
            print(line, next(filter(line.startswith, area_codes)))
Since we know filter(line.startswith, area_codes) will get at least one hit (the startswith check has just passed), we just pull the first hit using next.
Note: On Python 2, you should start the file with from future_builtins import filter to get the generator based filter (which will also save work by stopping the search when you get a hit). Python 3's filter already behaves like this.
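The same idea can be tried without the phonenumbers library. The sketch below uses made-up area codes and phone numbers, and a hypothetical helper matching_prefix that returns None instead of raising when nothing matches.

```python
area_codes = ("020", "0161", "0121")  # hypothetical sample area codes

def matching_prefix(line, prefixes):
    """Return the first prefix that line starts with, or None if there is none."""
    return next((p for p in prefixes if line.startswith(p)), None)

for number in ["02079460000", "07700900000"]:  # made-up sample lines
    code = matching_prefix(number, area_codes)
    if code is not None:
        # Put the matched area code in brackets, as the question asks.
        print("(" + code + ")" + number[len(code):])
```

Giving next a default of None means the separate startswith(area_codes) check is no longer needed before the lookup.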
For potentially higher performance, the way to both test all prefixes at once and figure out which value hit is to use regular expressions:
import re
# Function that will match any of the given prefixes returning a match obj on hit
area_code_matcher = re.compile(r'|'.join(map(re.escape, area_codes))).match
for line in txt:
    is_number = phonenumbers.parse(line,"GB")
    if phonenumbers.is_valid_number(is_number):
        # Returns None on miss, match object on hit
        m = area_code_matcher(line)
        if m is not None:
            # Whatever matched is in the 0th grouping
            print(line, m.group())
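One caveat with the alternation approach: Python's re tries the alternatives from left to right, so if one area code is a prefix of another, the order in which they are joined matters. A hedged sketch with hypothetical codes, sorting the longest codes first so the most specific one wins (find_area_code is a made-up helper name):

```python
import re

# Hypothetical codes; "020" is deliberately a prefix of "0207".
area_codes = ("020", "0207", "0161")

# Longest alternatives first, so "0207" is tried before "020".
by_length = sorted(area_codes, key=len, reverse=True)
area_code_matcher = re.compile("|".join(map(re.escape, by_length))).match

def find_area_code(line):
    """Return the area code line starts with, or None on a miss."""
    m = area_code_matcher(line)
    return m.group() if m is not None else None

print(find_area_code("02079460000"))
print(find_area_code("02089460000"))
```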
One final approach works if the area codes are of fixed length: rather than using startswith, you can slice directly; you know the hit because you sliced it off yourself:
# If there are a lot of area codes, using a set/frozenset will allow much faster lookup
area_codes_set = frozenset(area_codes)
for line in txt:
    is_number = phonenumbers.parse(line,"GB")
    if phonenumbers.is_valid_number(is_number):
        # Assuming lines that match always start with ###
        if line[:3] in area_codes_set:
            print(line, line[:3])

Trying to use Python script to add strings to file

I have a Spanish novel in a plain text file, and I want to make a Python script that puts a translation in brackets after difficult words. I have a list of the words (with translations) I want to do this with in a separate text file, which I have tried to format correctly.
I've forgotten everything I knew about Python, which was very little to begin with, so I'm struggling.
This is a script someone helped me with:
bookin = (open("C:\Users\King Kong\Documents\_div_tekstfiler_\coc_es.txt")).read()
subin = open("C:\Users\King Kong\Documents\_div_tekstfiler_\cocdict.txt")
for line in subin.readlines():
    ogword, meaning = line.split()
    subword = ogword + " (" + meaning + ")"
    bookin.replace(ogword, subword)
    ogword = ogword.capitalize()
    subword = ogword + " (" + meaning + ")"
    bookin.replace(ogword, subword)
subin.close()
bookout = open("fileout.txt", "w")
bookout.write(bookin)
bookout.close()
When I ran this, I got this error message:
Traceback (most recent call last):
File "C:\Python27\translscript_secver.py", line 4, in <module>
ogword, meaning = line.split()
ValueError: too many values to unpack
The novel is pretty big, and the dictionary I've made consists of about ten thousand key-value pairs.
Does this mean there's something wrong with the dictionary? Or it's too big?
Been researching this a lot, but I can't seem to make sense of it. Any advice would be appreciated.
line.split() in ogword, meaning = line.split() returns a list, and in this case it may be returning more than 2 values. Write your code in a way that can handle more than two values. For instance, by assigning line.split() to a list and then asserting that the list has two items:
mylist = line.split()
assert len(mylist) == 2
ogword, meaning = line.split()[:2]
line.split() returns a list of the words (space-separated tokens) in line. The error you get suggests that somewhere your dictionary contains more than just a pair. You may add a trace message to locate the error (see below).
If your dictionary contains richer definitions than a single synonym, you may use the following lines, which put the first word in ogword and the following ones in meaning.
words = line.split()
ogword, meaning = words[0], " ".join(words[1:])
If your dictionary syntax is more complex (composed ogword), you have to rely on an explicit separator. You can still use split to divide your lines (line.split("=") will split a line on "=" characters)
Edit: to skip and display bad lines, replace ogword, meaning = line.split() with
try:
    ogword, meaning = line.split()
except ValueError:
    print("wrongly formatted line: " + line)
    continue
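As a variation on the ideas above, str.split also takes a maxsplit argument, so a line can be split only once, on the first run of whitespace. The sketch below assumes a dictionary format of one word followed by its meaning; parse_entry and the sample lines are made up for illustration.

```python
def parse_entry(line):
    """Split a dictionary line into (word, meaning), or return None if it is malformed."""
    # Split at most once, on the first run of whitespace.
    parts = line.split(None, 1)
    if len(parts) != 2:
        return None
    word, meaning = parts
    # The remainder keeps its trailing newline, so strip it.
    return word, meaning.strip()

# Made-up sample lines: a plain pair, a multi-word meaning, and a bad line.
for sample in ["casa house\n", "madrugada early morning\n", "solo\n"]:
    print(parse_entry(sample))
```

Lines for which parse_entry returns None can then be reported and skipped, in the same spirit as the try/except version.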
split() returns a single list, i.e. one item, and you are trying to assign this one thing to two variables.
It will work if the number of items in the list is equal to the number of variables on the left hand side of the assignment statement. I.e., the list is unpacked and the individual parts are assigned to the variables on the left hand side.
In this case, as pointed out by @Josvic Zammit, the problem occurs when there are more than 2 items in the list, so it cannot be properly "unpacked" and assigned.

(permutation/Anagram) words find in python 2.72 (need help to find what's wrong with my code)

I hope this request is legit.
I'm taking a programming course in Python for engineers, so I'm kind of new at this business.
Anyway, in my homework I was asked to write a function which receives two strings and checks if one is a permutation/anagram of the other (meaning they both have exactly the same letters and the same number of appearances of each letter).
I've found some great code here while searching, but I still don't get what's wrong with my code (and it's important for me to know for my studying process).
We got a tests file which is supposed to check our functions, and it gave me this error:
Traceback (most recent call last):
File "C:\Users\Or\Desktop\תכנות\4\hw4\123456789_a4.py", line 110, in <module>
test_hw4()
File "C:\Users\Or\Desktop\תכנות\4\hw4\123456789_a4.py", line 97, in test_hw4
test(is_anagram('Tom Marvolo Riddle','I Am Lord Voldemort'), True)
File "C:\Users\Or\Desktop\תכנות\4\hw4\123456789_a4.py", line 31, in is_anagram
s2_list.sort()
NameError: global name 's2_list' is not defined
this is my code:
def is_anagram(string1, string2):
    string1 = string1.lower() #turns Capital letter to small ones
    string2 = string2.lower()
    string1 = string1.replace(" ","") #turns the words inside the string to one word
    string2 = string2.replace(" ","")
    if len(string1)!= len(string2):
        return False
    s1_list = [string1[i] for i in range(len(string1))] #creates a list of string 1 letters
    a2_list = [string1[k] for k in range(len(string1))]
    s1_list.sort() #sorting the list
    s2_list.sort()
    booli=False
    k=0
    for i in s1_list: #for loop which compares each letter in the two lists
        if s1_list[k]==s2_list[k]:
            booli = True
            k=k+1
        else:
            booli=False
            break
    return booli
Does anyone know how to fix it?
Thanks!
It looks like you have a typo: a2_list should be s2_list, and it should be built from string2 rather than string1. That section should read:
s1_list = [string1[i] for i in range(len(string1))] #creates a list of string 1 letters
s2_list = [string2[k] for k in range(len(string2))]
s1_list.sort() #sorting the list
s2_list.sort()
FWIW, here is an interactive prompt example of how to tell if two strings are anagrams of one another:
>>> string1 = 'Logarithm'
>>> string2 = 'algorithm'
>>> sorted(string1.lower()) == sorted(string2.lower()) # see if they are anagrams
True
If you make a listify function and use it to set your s1_list and s2_list, it might be easier to see that there are multiple things that look wrong with your code, unless you intended both s1_list and s2_list to be populated from the same string.
def listify(string):
    return [c for c in string]
Then you can simply do s1_list = listify(string1) and s2_list = ... to set the values.
I would probably turn at least the 'check if the two lists are the same' into a function, so I could use an early return to indicate falseness (so instead of starting with booli as true, setting it on each iteration through the loop and breaking out of the loop if false).
If you look at the join method of Python strings, you might find inspiration for another way to check if s1_list and s2_list are the same.
Try this one-liner instead:
sorted(s1.lower().replace(' ', '')) == sorted(s2.lower().replace(' ', ''))
Python strings are sequences of characters, so sorted() can turn them into sorted lists of characters. We just need to take care of uppercase and whitespace first. The == operator then takes care of the actual comparison.
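Wrapped into the function the homework asks for, the one-liner might look like this. It is only a sketch; the inner helper normalize is a made-up name, and the test strings come from the traceback and the interactive example above.

```python
def is_anagram(string1, string2):
    """Return True if the two strings use the same letters the same number of times."""
    def normalize(text):
        # Ignore case and spaces, then sort the remaining characters.
        return sorted(text.lower().replace(" ", ""))
    return normalize(string1) == normalize(string2)

print(is_anagram('Tom Marvolo Riddle', 'I Am Lord Voldemort'))
print(is_anagram('Logarithm', 'algorithm'))
```

Comparing the two sorted lists with == also covers the length check, since lists of different lengths are never equal.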
