What is the correct (pythonic) way to do something like this?
var = 'The quick brown fox'

def exists(query, string):
    if query in string:
        return query
    else:
        return None

if thing = exists('fox', var):
    print(thing.upper())
This is my example, but what I'm really trying to do is check if a Selenium web element exists. I want to avoid setting the result to a variable because that defeats the purpose of "exists". Also, I don't want to perform the search twice returning true/false the first time and then again, if it's true, to do something with it.
This is one of those cases where there's more than one way to do it. A few things you could do:
Treat the result as a collection of zero to one elements (it makes more sense if you call it something like find_one in such cases):
def find_one(query, string):
    if query in string:
        return [query]
    return []
Then you can use the function in a for loop:
for existing_element in find_one(query, string):
    # Do something with existing element
    break
else:
    # Here if we don't have any elements (note the `break` above)
    pass
Pass a callback as the first argument:
def if_exists(cb, query, string):
    if query in string:
        cb(query)

def run_on_valid_query(q):
    # Do something with q
    pass

if_exists(run_on_valid_query, query, string)
Bite the bullet and use an intermediate variable:
result = extract_from(query, string)
if result: # Do work here
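On Python 3.8 or later, an assignment expression (the := operator from PEP 572) allows the assign-and-test form the question asks for. A minimal sketch reusing the question's exists function; the shout variable is an assumption added only so the result can be inspected:

```python
def exists(query, string):
    """Return the query if it occurs in string, otherwise None."""
    return query if query in string else None

var = 'The quick brown fox'
shout = None  # hypothetical holder for the result

# Requires Python 3.8+: := binds the result and tests it in one step.
if (thing := exists('fox', var)) is not None:
    shout = thing.upper()
    print(shout)
```

The name thing is still bound, so this is a shorter spelling of the intermediate-variable option rather than a way around it.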
Given a stringified phone number of non-zero length, write a function that returns all mnemonics for this phone number in any order.
def phoneNumberMnemonics(phoneNumber, Mnemonics=[''], idx=0):
    number_lookup={'0':['0'], '1':['1'], '2':['a','b','c'], '3':['d','e','f'], '4':['g','h','i'], '5':['j','k','l'], '6':['m','n','o'], '7':['p','q','r','s'], '8':['t','u','v'], '9':['w','x','y','z']}
    if idx==len(phoneNumber):
        return Mnemonics
    else:
        new_Mnemonics=[]
        for letter in number_lookup[phoneNumber[idx]]:
            for mnemonic in Mnemonics:
                new_Mnemonics.append(mnemonic+letter)
        phoneNumberMnemonics(phoneNumber, new_Mnemonics, idx+1)
If I use the input "1905", my function outputs null. Using a print statement right before the return statement, I can see that the list Mnemonics is
['1w0j', '1x0j', '1y0j', '1z0j', '1w0k', '1x0k', '1y0k', '1z0k', '1w0l', '1x0l', '1y0l', '1z0l']
Which is the correct answer. Why is null being returned?
I am not very good at implementing recursion (yet?), your help is appreciated.
There are different recursive expressions of this problem, but the simplest to think about when you are starting out is a "pure functional" one. This means you never mutate recursively determined values. Rather compute fresh new ones: lists, etc. (Python does not give you a choice regarding strings; they're always immutable.) In this manner you can think about values only, not how they're stored and what's changing them, which is extremely error prone.
A pure-functional way to think about this problem is this:
If the phone number is the empty string, then the return value is just a list containing the empty string.
Else break the number into its first character and the rest. Recursively get all the mnemonics R of the rest. Then find all the letters corresponding to the first character and prepend each of them to each member of R to make a new string. (This is a Cartesian product, which comes up often in recursion.) Return all of those strings.
In this expression, the pure function has the form
M(n: str) -> list[str]:
It's accepting a string of digits and returning a list of mnemonics.
Putting this thought into Python is fairly simple:
LETTERS_BY_DIGIT = {
    '0':['0'],
    '1':['1'],
    '2':['a','b','c'],
    '3':['d','e','f'],
    '4':['g','h','i'],
    '5':['j','k','l'],
    '6':['m','n','o'],
    '7':['p','q','r','s'],
    '8':['t','u','v'],
    '9':['w','x','y','z'],
}

def mneumonics(n: str):
    if len(n) == 0:
        return ['']
    rest = mneumonics(n[1:])
    first = LETTERS_BY_DIGIT[n[0]]
    rtn = []  # A fresh list to return.
    for f in first:  # Cartesian product:
        for r in rest:  # first X rest
            rtn.append(f + r)  # Fresh string
    return rtn

print(mneumonics('1905'))
Note that this code does not mutate the recursive return values rest at all. It makes a new list of new strings.
When you've mastered all the Python idioms, you'll see a slicker way to code the same thing:
def mneumonics(n: str):
    return [''] if len(n) == 0 else [
        c + r for c in LETTERS_BY_DIGIT[n[0]] for r in mneumonics(n[1:])]
Is this the most efficient code to solve this problem? Absolutely not. But this isn't a very practical thing to do anyway. It's better to go for a simple, correct solution that's easy to understand rather than worry about efficiency before you have a solid grasp of this way of thinking.
As others have said, using recursion at all on this problem is not a great choice if this were a production requirement.
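For comparison, here is a sketch of a non-recursive version built on itertools.product from the standard library. The function name mnemonics_iterative is an assumption, and the lookup table stores each letter group as a string instead of a list, which product accepts equally well:

```python
from itertools import product

LETTERS_BY_DIGIT = {
    '0': '0', '1': '1', '2': 'abc', '3': 'def', '4': 'ghi',
    '5': 'jkl', '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}

def mnemonics_iterative(n):
    # One group of candidate letters per digit, in digit order.
    letter_groups = (LETTERS_BY_DIGIT[digit] for digit in n)
    # product() yields one tuple per combination; join each into a string.
    return [''.join(combo) for combo in product(*letter_groups)]

print(mnemonics_iterative('1905'))
```

With an empty input, product() yields a single empty tuple, so the result is [''], matching the base case of the recursive version.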
The correct list (Mnemonics) was generated for the deepest call of the recursion. However, it was not passed back to previous calls.
To fix this, Mnemonics must be returned in every case, not just in the base case, and in the "else" block it must be set to the output of the recursive call to phoneNumberMnemonics.
def phoneNumberMnemonics(phoneNumber, Mnemonics=[''], idx=0):
    number_lookup={'0':['0'], '1':['1'], '2':['a','b','c'], '3':['d','e','f'], '4':['g','h','i'], '5':['j','k','l'], '6':['m','n','o'], '7':['p','q','r','s'], '8':['t','u','v'], '9':['w','x','y','z']}
    print(idx, len(phoneNumber))
    if idx==len(phoneNumber):
        pass
    else:
        new_Mnemonics=[]
        for letter in number_lookup[phoneNumber[idx]]:
            for mnemonic in Mnemonics:
                new_Mnemonics.append(mnemonic+letter)
        Mnemonics=phoneNumberMnemonics(phoneNumber, new_Mnemonics, idx+1)
    return Mnemonics
I still feel that I'm lacking sophistication in my understanding of recursion. Advice, feedback, and clarifications are welcome.
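The underlying rule is that a Python function which finishes without executing a return statement returns None, and this applies to every recursive call separately. A minimal sketch with hypothetical function names, unrelated to the phone-number problem, isolates the difference:

```python
def count_down_broken(n):
    if n == 0:
        return 'done'
    count_down_broken(n - 1)  # result is computed, then discarded

def count_down_fixed(n):
    if n == 0:
        return 'done'
    return count_down_fixed(n - 1)  # result is passed up to the caller

print(count_down_broken(3))  # None
print(count_down_fixed(3))   # done
```

Only the innermost call of the broken version reaches a return; every outer call falls off the end of the function and hands None to its caller.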
Not quite sure what the correct title should be.
I have a function with 2 inputs, def color_matching(color_old, color_new). It should check the strings in both arguments and assign a new string to whichever one has a hit.
def color_matching(color_old, color_new):
    if ('<color: none' in color_old):
        color_old = "NoHighlightColor"
    elif ('<color: none' in color_new):
        color_new = "NoHighlightColor"
And so forth. The problem is that each of the arguments can be matched to 1 of 14 different categories ("NoHighlightColor" being one of them). I'm sure there is a better way to do this than repeating the if statement 28 times for each mapping but I'm drawing a blank.
You can first parse your input arguments. If, for example, an argument looks like this:
old_color='<color: none attr:ham>'
you can parse it to get only the value of the attribute you need:
_old_color=old_color.split(':')[1].split()[0]
That way _old_color='none'.
Then you can use a dictionary such as {'none':'NoHighlightColor'}; let's call it colors_dict:
old_color=colors_dict.get(_old_color, old_color)
That way, if _old_color exists as a key in the dictionary, old_color gets the value of that key; otherwise old_color remains unchanged.
So your final code should look similar to this:
def color_matching(color_old, color_new):
    """ Assuming you've predefined colors_dict """
    # Parsing to get both colors
    _old_color=color_old.split(':')[1].split()[0]
    _new_color=color_new.split(':')[1].split()[0]
    # Checking if the first one is a hit
    _result_color = colors_dict.get(_old_color, None)
    # If it was a hit (not None) then assign it to the first argument
    if _result_color:
        color_old = _result_color
    else:
        color_new = colors_dict.get(_new_color, color_new)
    # Return both values so the caller can see the reassignment
    return color_old, color_new
You can replace conditionals with a data structure:
def match(color):
    matches = {'<color: none': 'NoHighlightColor', ... }
    # items() works on both Python 2 and 3; iteritems() is Python 2 only
    for substring, ret in matches.items():
        if substring in color:
            return ret
But you seem to have a problem that requires a proper parser for the format you are trying to recognize.
You might build one from simple string operations like "<color:none jaja:a>".split(':')
You could maybe hack one together with a massive regex.
Or use a powerful parser generated by a library like this one
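To make the lookup-table idea concrete, here is a small runnable sketch. It assumes each argument should be mapped independently (which differs slightly from the if/elif in the question), and every table entry except '<color: none' is a made-up placeholder:

```python
# Hypothetical table: only the first entry comes from the question.
CATEGORY_BY_SUBSTRING = {
    '<color: none': 'NoHighlightColor',
    '<color: yellow': 'YellowHighlight',  # assumed example entry
}

def match(color):
    """Return the category of the first matching substring, else the input."""
    for substring, category in CATEGORY_BY_SUBSTRING.items():
        if substring in color:
            return category
    return color

def color_matching(color_old, color_new):
    # The same lookup serves both arguments, so no branch is written twice.
    return match(color_old), match(color_new)

print(color_matching('<color: none attr:ham>', '<color: yellow attr:spam>'))
```

Adding a fifteenth category then means adding one dictionary entry rather than two more branches.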
The code below works, but looks very ugly. I'm looking for a more pythonic way to write the same thing.
The goal:
React on a result of a function that returns multiple values.
Example function
def myfilterfunc(mystr):
if 'la' in mystr:
return True, mystr
return False, None
This returns True and a string (if the string contains "la"), or False and nothing.
In a second function, I'm passing myfilterfunc as an optional parameter
def mymainfunc(mystr,filterfunc=None):
This function fills a returnlist.
If no function is given, the result is not filtered and is added as is.
If a filter function is given and it returns True, the returned string is added. (This is just an example that would easily work with one return value, but I'm trying to get the syntax right for a more complicated setup.)
if filterfunc:
    tmp_status,tmp_string = filterfunc(mystr[startpos:nextitem])
    if tmp_status:
        returnlist.append(tmp_string)
else:
    returnlist.append(mystr[startpos:nextitem])
Any idea how I can write this without using temporary variables to store the return values of the function?
Full "working" test code below
def string2list(mystr,splitlist,filterfunc=None):
    returnlist = []
    startpos = 0
    nextitem = -1
    matched = True
    while matched:
        matched = False
        for sub in splitlist:
            if startpos == 0:
                tmpi = mystr.find(sub)
            else:
                tmpi = mystr.find(sub,startpos + 1)
            if (tmpi > 0) and ((nextitem < 0) or (nextitem > tmpi)):
                nextitem = tmpi
                matched = True
        if filterfunc:
            tmp_status,tmp_string = filterfunc(mystr[startpos:nextitem])
            if tmp_status:
                returnlist.append(tmp_string)
        else:
            returnlist.append(mystr[startpos:nextitem])
        startpos = nextitem
        nextitem = -1
    return returnlist

def myfilterfunc(mystr):
    if 'la' in mystr:
        return True,mystr
    return False,''

splitlist = ['li','la']
mytext = '''
li1
li2
li3
fg4
fg5
fg6
la7
la
la
tz
tz
tzt
tz
end
'''
print string2list(mytext,splitlist)
print
print string2list(mytext,splitlist,myfilterfunc)
If this is going to happen often you can factor out the ugliness:
def filtered(f, x):
    if f:
        status, result = f(x)
        return result if status else x
    else:
        return x
used like
returnlist.append(filtered(filterfunc, mystr[startpos:nextitem]))
so that if you have many similar optional filters the code remains readable. This works because in Python functions/closures are first class citizens and you can pass them around like other values.
But then if the logic is about always adding (either the filtered or the unfiltered) why not just write the filter to return the input instead of (False, "") in case of failure?
That would make the code simpler to understand...
returnlist.append(filterfunc(mystr[startpos:nextitem]))
I think there are two better approaches to your problem that don't involve using two return values.
The first is to simply return a Boolean value and not a string at all. This works if your filter is always going to return the string it was passed unmodified if it returns a string at all (e.g. if the first value is True). This approach will let you avoid using temporary values at all:
if filterfunc:
    if filterfunc(mystr[startpos:nextitem]):
        returnlist.append(mystr[startpos:nextitem])
(Note, I'd suggest renaming filterfunc to predicate if you go this route.)
The other option will work if some filterfunc might return a different second value than it was passed under some situations, but never the 2-tuple True, None. In this approach you simply use the single value as both the signal and the payload. If it's None, you ignore it. If it's anything else, you use it. This does require a temporary variable, but only one (and it's a lot less ugly).
if filterfunc:
    result = filterfunc(mystr[startpos:nextitem])
    if result is not None:
        returnlist.append(result)
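Here is a self-contained sketch of the single-value approach, where None means "drop this item". The names keep_if_la and collect are hypothetical, and the splitting logic from the question is replaced by a ready-made list of chunks:

```python
def keep_if_la(text):
    """Hypothetical filter: return the (possibly transformed) text, or None to drop it."""
    return text.strip() if 'la' in text else None

def collect(chunks, filterfunc=None):
    results = []
    for chunk in chunks:
        if filterfunc is None:
            results.append(chunk)  # no filter: keep everything as is
            continue
        result = filterfunc(chunk)
        if result is not None:     # None is the "rejected" signal
            results.append(result)
    return results

print(collect(['li1\n', 'la7\n', 'tz\n']))
print(collect(['li1\n', 'la7\n', 'tz\n'], keep_if_la))
```

This only works while None is never a legitimate payload; if it could be, the two-value return from the question is the safer contract.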
I'm using the following code:
def recentchanges(bot=False,rclimit=20):
    """
    #description: Gets the last 20 pages edited on the recent changes and the user who edited each
    """
    recent_changes_data = {
        'action':'query',
        'list':'recentchanges',
        'rcprop':'user|title',
        'rclimit':rclimit,
        'format':'json'
    }
    if bot is False:
        recent_changes_data['rcshow'] = '!bot'
    else:
        pass
    data = urllib.urlencode(recent_changes_data)
    response = opener.open('http://runescape.wikia.com/api.php',data)
    content = json.load(response)
    pages = tuple(content['query']['recentchanges'])
    for title in pages:
        return title['title']
When I do recentchanges() I only get one result. If I print it though, all the pages are printed.
Am I just misunderstanding or is this something relating to python?
Also, opener is:
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
Once a return statement is reached in a function, that function's execution ends, so the loop never gets to a second return. In order to return all the values you need to pack them in a list or tuple:
...
returnList = [title['title'] for title in pages]
return returnList
This uses a list comprehension to build a list of all the objects you want the function to return and then returns it.
Then you can unpack individual results from the returned list:
answerList = recentchanges()
for element in answerList:
    print element
The problem you are having is that a function ends at the first return statement it executes.
So, in the lines
for title in pages:
    return title['title']
it returns only the first value: pages[0]['title'].
One way around this is to use a list comprehension, i.e.
return [ title['title'] for title in pages ]
Another option is to make recentchanges a generator and use yield.
for title in pages:
    yield title['title']
return ends the function. So the loop only executes once, because you're returning in the loop. Think about it: how would the caller get subsequent values once the first value has been returned? Would they have to call the function again? But that would start it over again. Should Python wait until the loop is complete to return all the values at once? But where would they go and how would Python know to do this?
You might provide a generator here by yielding rather than returning it. You could also just return a generator:
return (page['title'] for page in pages)
Either way, the caller can then convert it to a list if desired, or iterate over it directly:
titles = list(recentchanges())
# or
for title in recentchanges():
    print title
Alternatively, you can just return the list of titles:
return [page['title'] for page in pages]
Since you use return, your function will end after returning the first value.
There are two alternatives:
you can append the titles to a list and return that, or
you can use yield instead of return to turn your function into a generator.
The latter is probably more pythonic, because you could then use it like this:
for title in recentchanges():
    # do something with the title
    pass
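The three behaviours discussed in these answers can be compared side by side without any network access. This is a sketch in Python 3 syntax; the pages list is a made-up stand-in for the parsed API response, and the function names are hypothetical:

```python
# Stand-in for content['query']['recentchanges']; the titles are invented.
pages = [{'title': 'Page A'}, {'title': 'Page B'}, {'title': 'Page C'}]

def first_title_only(pages):
    for page in pages:
        return page['title']  # leaves the function on the first pass

def titles_as_list(pages):
    return [page['title'] for page in pages]

def titles_as_generator(pages):
    for page in pages:
        yield page['title']   # hands back one title, then resumes here

print(first_title_only(pages))
print(titles_as_list(pages))
print(list(titles_as_generator(pages)))
```

The generator version produces titles lazily, so a caller that stops early never pays for the remaining ones.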
I have written a little program that parses log files of anywhere between a few thousand lines to a few hundred thousand lines. For this, I have a function in my code which parses every line, looks for keywords, and returns the keywords with the associated values.
These log files consist of small sections. Each section has some values I'm interested in and want to store as a dictionary.
I have simplified the sample below, but the idea is the same.
My original function looked like this, it gets called between 100 and 10000 times per run, so you can understand why I want to optimize it:
def parse_txt(f):
    d = {}
    for line in f:
        if not line:
            pass
        elif 'apples' in line:
            d['apples'] = True
        elif 'bananas' in line:
            d['bananas'] = True
        elif line.startswith('End of section'):
            return d

f = open('fruit.txt','r')
d = parse_txt(f)
print d
The problem I run into, is that I have a lot of conditionals in my program, because it checks for a lot of different things and stores the values for it. And when checking every line for anywhere between 0 and 30 keywords, this gets slow fast. I don't want to do that, because, not every time I run the program I'm interested in everything. I'm only ever interested in 5-6 keywords, but I'm parsing every line for 30 or so keywords.
In order to optimize it, I wrote the following by using exec on a string:
def make_func(args):
    func_str = """
def parse_txt(f):
    d = {}
    for line in f:
        if not line:
            pass
"""
    if 'apples' in args:
        func_str += """
        elif 'apples' in line:
            d['apples'] = True
"""
    if 'bananas' in args:
        func_str += """
        elif 'bananas' in line:
            d['bananas'] = True
"""
    func_str += """
        elif line.startswith('End of section'):
            return d"""
    print func_str
    exec(func_str)
    return parse_txt

args = ['apples','bananas']
fun = make_func(args)
f = open('fruit.txt','r')
d = fun(f)
print d
This solution works great, because it speeds up the program by an order of magnitude and it is relatively simple. Depending on the arguments I put in, it will give me the first function, but without checking for all the stuff I don't need.
For example, if I give it args=['bananas'], it will not check for 'apples', which is exactly what I want to do.
This makes it much more efficient.
However, I do not like this solution very much, because it is not very readable, is difficult to change, and is very error-prone whenever I modify something. Besides that, it feels a little bit dirty.
I am looking for alternative or better ways to do this. I have tried using a set of functions to call on every line, and while this worked, it did not offer me the speed increase that my current solution gives me, because it adds a few function calls for every line. My current solution doesn't have this problem, because it only has to be called once at the start of the program. I have read about the security issues with exec and eval, but I do not really care about that, because I'm the only one using it.
EDIT:
I should add that, for the sake of clarity, I have greatly simplified my function. From the answers I understand that I didn't make this clear enough.
I do not check for keywords in a consistent way. Sometimes I need to check for 2 or 3 keywords in a single line, sometimes just for 1. I also do not treat the result in the same way. For example, sometimes I extract a single value from the line I'm on, sometimes I need to parse the next 5 lines.
I would try defining a list of keywords you want to look for ("keywords") and doing this:
for word in keywords:
    if word in line:
        d[word] = True
Or, using a list comprehension:
dict([(word,True) for word in keywords if word in line])
Unless I'm mistaken this shouldn't be much slower than your version.
No need to use eval here, in my opinion. You're right in that an eval based solution should raise a red flag most of the time.
Edit: as you have to perform a different action depending on the keyword, I would just define function handlers and then use a dictionary like this:
def keyword_handler_word1(line):
    (...)

(...)

def keyword_handler_wordN(line):
    (...)

keyword_handlers = { 'word1': keyword_handler_word1, (...), 'wordN': keyword_handler_wordN }
Then, in the actual processing code:
for word in keywords:
    # keyword_handlers[word] is a function
    keyword_handlers[word](line)
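A runnable sketch of the handler-dictionary idea follows. Everything in it is an assumption for illustration: the handler names, the extra d argument that lets a handler store its result, the explicit "word in line" test, and the sample lines:

```python
def handle_apples(line, d):
    d['apples'] = True

def handle_bananas(line, d):
    # A handler is free to extract a value instead of setting a flag.
    d['bananas'] = line.split('=')[-1].strip()

KEYWORD_HANDLERS = {'apples': handle_apples, 'bananas': handle_bananas}

def parse_txt(lines, keywords):
    d = {}
    # Look the handlers up once, outside the per-line loop.
    active = [(word, KEYWORD_HANDLERS[word]) for word in keywords]
    for line in lines:
        if line.startswith('End of section'):
            break
        for word, handler in active:
            if word in line:
                handler(line, d)
    return d

sample = ['apples here\n', 'bananas = 7\n', 'End of section\n', 'apples again\n']
print(parse_txt(sample, ['bananas']))
```

Only the requested keywords are tested on each line, which is the property the exec-based version was built to get. Whether the extra function call per hit is fast enough for the real log files would need measuring.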
Use regular expressions. Something like this:
>>> import re
>>> lookup = {'a': 'apple', 'b': 'banane'} # keyword: characters to look for
>>> pattern = '|'.join('(?P<%s>%s)' % (key, val) for key, val in lookup.items())
>>> re.search(pattern, 'apple aaa').groupdict()
{'a': 'apple', 'b': None}
def create_parser(fruits):
    def parse_txt(f):
        d = {}
        for line in f:
            if not line:
                pass
            elif line.startswith('End of section'):
                return d
            else:
                for testfruit in fruits:
                    if testfruit in line:
                        d[testfruit] = True
    # Hand the specialised parser back to the caller
    return parse_txt
This is what you want: create a test function dynamically.
Depending on what you really want to do, it is, of course, possible to remove one level of complexity and define
def parse_txt(f, fruits):
    [...]
or
def parse_txt(fruits, f):
    [...]
and work with functools.partial.
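A sketch of the functools.partial variant, using the parse_txt(fruits, f) argument order so the keyword list can be fixed positionally. The sample lines and the name parse_bananas are assumptions, and unlike the original this version also returns d when the input ends without an "End of section" line:

```python
from functools import partial

def parse_txt(fruits, f):
    d = {}
    for line in f:
        if line.startswith('End of section'):
            break
        for fruit in fruits:
            if fruit in line:
                d[fruit] = True
    return d

# Fix the keyword list once; the result takes only the file-like argument.
parse_bananas = partial(parse_txt, ['bananas'])

lines = ['apples\n', 'bananas\n', 'End of section\n']
print(parse_bananas(lines))
```

Any iterable of lines works in place of the list, including an open file object.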
You can use a set, like this:
fruit = set(['cocos', 'apple', 'lime'])
need = set(['cocos', 'pineapple'])
need.intersection(fruit)
This returns a set containing only 'cocos'.
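Applied to the log-parsing question, the set idea might look like the sketch below. The names wanted and keywords_in are hypothetical. The approach matches whole whitespace-separated words only, so it is not a drop-in replacement for the substring test 'apples' in line:

```python
wanted = {'apples', 'bananas'}  # the keywords of interest for this run

def keywords_in(line):
    # Intersect the wanted keywords with the words that occur on the line.
    return wanted & set(line.split())

print(keywords_in('three apples and pears'))
print(keywords_in('no fruit here'))
```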