Converting code to a list comprehension - python

I wrote a bit of code to iterate over a list to see if a line contained one or more keywords:
STRINGS_TO_MATCH = [ "Foo",
                     "Bar",
                     "Oggle" ]
for string in STRINGS_TO_MATCH:
    if string in line:
        val_from_line = line.split(' ')[-1]
Does anyone happen to know if there is a way to make this more readable? Would a list comprehension be a better fit here?

The thing to remember here is that comprehensions are expressions, whose purpose is to create a value - list comprehensions create lists, dict comprehensions create dicts, and set comprehensions create sets. They are unlikely to help in this case, because you aren't creating any such object.
Your code sample is incomplete, because it doesn't do anything with the val_from_line values that it extracts. I am presuming that you want to extract the last "word" from a line which contains any of the strings in STRINGS_TO_MATCH, but it's difficult to work with such incomplete information so this answer might, for all I know, be totally useless.
Assuming I'm correct, the easiest way to find out if line contains any of the STRINGS_TO_MATCH is to use the expression
any(s in line for s in STRINGS_TO_MATCH)
This uses a so-called generator expression, which is similar to a list comprehension - the interpreter can iterate over it to produce a sequence of values - but it doesn't go as far as creating a list of the values, it just creates them as the client code (in this case the any built-in function) requests them. So I might rewrite your code as
if any(s in line for s in STRINGS_TO_MATCH):
    val_from_line = line.split(' ')[-1]
I'll leave you to decide what you actually want to do after that, with the warning note that after this code executes val_from_line may or may not exist (depending on whether or not the condition was true), which is never an entirely comfortable situation.
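One way to avoid the "may or may not exist" situation is sketched below in Python 3. The helper name last_word_if_match and the sample lines are assumptions made for illustration; they are not part of the original question. The sketch also shows where a list comprehension does fit: once you are building a list of extracted values from several lines.

```python
STRINGS_TO_MATCH = ["Foo", "Bar", "Oggle"]


def last_word_if_match(line, keywords=STRINGS_TO_MATCH):
    """Return the last space-separated word of line if it contains
    any keyword, otherwise None (hypothetical helper)."""
    if any(s in line for s in keywords):
        return line.split(' ')[-1]
    return None


# Hypothetical sample input, not from the original question.
lines = ["Foo level 42", "nothing here", "the Bar is open"]

# A list comprehension fits here because a list is actually being built.
values = [line.split(' ')[-1] for line in lines
          if any(s in line for s in STRINGS_TO_MATCH)]
print(values)
```

With the helper, the result is always defined: it is either the extracted word or None, so later code can test for None instead of wondering whether the name exists.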

Related

Python list comprehension syntax showing output expression as "optional"?

The following page (see below) has a syntax description for a Python list comprehension which says that the output expression is "Optional." I haven't seen this "optional" designation elsewhere, and it doesn't seem that a list comprehension would work without it. E.g.:
>>> llist = [1, 2, 3]
# list comprehension with output expression works
>>> listc = [num for num in llist]
# list comprehension without output expression fails
>>> listc2 =[for num in llist]
  File "<stdin>", line 1
    listc2 =[for num in llist]
             ^
SyntaxError: invalid syntax
Here is the page:
https://python-reference.readthedocs.io/en/latest/docs/comprehensions/list_comprehension.html
and here is the description from that page:
[expression(variable) for variable in input_set [predicate][, …]]
expression
Optional. An output expression producing members of the new set from members of the input set that satisfy the predicate expression.
variable
Required. Variable representing members of an input set.
input_set
Required. Represents the input set.
predicate
Optional. Expression acting as a filter on members of the input set.
[, …]]
Optional. Another nested comprehension.
Possibly they are trying to say that you can start a list comprehension with a bare variable, but that is still an expression, correct?
Looks like the doc is a bit unclear. You do need something on the left hand side. Otherwise the comprehension doesn't make much sense.
The page you have linked is an unofficial reference for Python, and as you can see on the GitHub page, it hasn't been updated in about 4 years. If you would like up-to-date and, more importantly, correct information, go to the official documentation.
Here is the link relevant to list comprehensions and their syntax:
https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
A list comprehension consists of brackets containing an expression
followed by a for clause, then zero or more for or if clauses.
The page you mentioned is fairly clear about the expression. In [expression(variable) for variable in input_set [predicate][, …]], what is optional is transforming the variable: you may leave the variable as is, and it will still work. Moreover, you may put anything you like there, even something not remotely connected to the variable, such as the number 42, and it still works.
Probably the intention of the page's authors was to explain that you don't have to do anything with the loop variable if you don't want to, and can leave it as is.
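The points made in the answers above can be checked with a short Python 3 sketch. The variable names other than llist are illustrative assumptions. It shows that something must always appear before the for clause, but that it can be the bare loop variable, a transformation of it, or an unrelated constant, while the if clause really is optional.

```python
llist = [1, 2, 3]

# The output expression may be the bare loop variable...
same = [num for num in llist]

# ...or a transformation of it...
doubled = [num * 2 for num in llist]

# ...or something unrelated to the variable, such as a constant.
constants = [42 for num in llist]

# The trailing if clause (the "predicate") is the genuinely optional part.
odd = [num for num in llist if num % 2 == 1]

print(same, doubled, constants, odd)
```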

Storing phrases from a string into a list

I've found this bit of code that works exactly as intended, yet I'm puzzled as to why.
The idea is to extract information from each line (without spaces or extra tabulation symbols).
The code I found is the following:
def extract_information(line: str) -> list:
    return [phrase.strip() for phrase in line.split(' ') if phrase]
And it works! But since it's a one-liner, I'm having a hard time deciphering it; I'm used to fully written-out loops.
For example:
print(extract_information("Marni FIGHTS FOR LIFE Old Shack Will rule the kingdom"))
Should become:
['Marni', 'FIGHTS FOR LIFE', 'Old Shack', 'Will rule the kingdom']
Does anyone have a clue about this?
Python supports something called List comprehension.
Summary
It consists of brackets containing an expression followed by a for
clause, then zero or more for or if clauses. The expressions can be
anything, meaning you can put in all kinds of objects in lists.
Behind the scenes
Thus, you can read the above expression this way:
[return something for something in listofsomethings if something exists ]
Note that the above expression is only for educational purposes and is not valid in any way.
Therefore in your particular case, it could be translated to this:
result = []
for phrase in line.split(' '):
    if phrase:
        result.append(phrase.strip())
return result
So as you can see, it does exactly what you expect it to do. Both do the same thing, but the list comprehension is generally considered more Pythonic.
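For reference, here is the loop form as a complete, runnable Python 3 function. One assumption is labeled loudly: the expected output in the question only makes sense if the phrases are separated by runs of two or more spaces, so this sketch splits on two spaces rather than one. The sample string is rebuilt with explicit space runs for the same reason, and the run lengths are invented for illustration.

```python
def extract_information(line: str) -> list:
    """Loop equivalent of the one-liner from the question."""
    result = []
    # Assumption: phrases are separated by two or more spaces.
    for phrase in line.split('  '):
        if phrase:  # skip empty strings produced by longer runs of spaces
            result.append(phrase.strip())  # trim a leftover single space
    return result


# Hypothetical spacing: runs of 4, 3 and 5 spaces between the phrases.
sample = ("Marni" + " " * 4 + "FIGHTS FOR LIFE" + " " * 3
          + "Old Shack" + " " * 5 + "Will rule the kingdom")
print(extract_information(sample))
```

The if phrase test drops the empty strings that split() produces when a run of spaces is longer than the separator, and strip() removes the single space left over from an odd-length run.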
It's list comprehension. Run something simple like this to see how it works. It's basically a way to loop through a list in a succinct way.
x=[1,2,3,4,5,6]
y=[a*2 for a in x]

Performing Counts, Sorting/mapping Large Dicts

I'm doing this week's 'easy' Daily Programmer Challenge on Reddit. The description is at the link, but essentially the challenge is to read a text file from a url and do a word count. Needless to say the resulting output is a fairly large dictionary object. I have a few questions, mostly regarding accessing or sorting keys according to their value.
First, I developed the code according to what I currently understand about OOP and good Python style. I wanted it to be as robust as possible but I also wanted to use the least amount of imported modules. My goal is to become a good programmer, thus I believe it's important to develop a strong foundation and figure out how to do things myself whenever possible. That being said, the code:
from urllib2 import urlopen

class Word(object):
    def __init__(self):
        self.word_count = {}

    def alpha_only(self, word):
        """Converts word to lowercase and removes any non-alphabetic characters."""
        x = ''
        for letter in word:
            s = letter.lower()
            if s in 'abcdefghijklmnopqrstuvwxyz':
                x += s
        if len(x) > 0:
            return x

    def count(self, line):
        """Takes a line from the file and builds a list of lowercased words containing only alphabetic chars.
        Adds each word to word_count if not already present, if present increases the count by 1."""
        words = [self.alpha_only(x) for x in line.split(' ') if self.alpha_only(x) != None]
        for word in words:
            if word in self.word_count:
                self.word_count[word] += 1
            elif word != None:
                self.word_count[word] = 1

class File(object):
    def __init__(self,book):
        self.book = urlopen(book)
        self.word = Word()

    def strip_line(self,line):
        """Strips newlines, tabs, and return characters from beginning and end of line. If remaining string > 1,
        splits up the line and passes it along to the count method of the word object."""
        s = line.strip('\n\r\t')
        if s > 1:
            self.word.count(s)

    def process_book(self):
        """Main processing loop, will not begin processing until the first line after the line containing "START".
        After processing it will close the file."""
        begin = False
        for line in self.book:
            if begin == True:
                self.strip_line(line)
            elif 'START' in line:
                begin = True
        self.book.close()

book = File('http://www.gutenberg.org/cache/epub/47498/pg47498.txt')
book.process_book()
count = book.word.word_count
So now I have a fairly accurate and robust word count that probably doesn't have any duplicates or blank entries, but is nevertheless a dict object containing over 3k key/value pairs. I can't iterate over it using for k,v in count or it gives me the exception ValueError: too many values to unpack, which rules out using list comprehension or mapping to a function to perform any kind of sorting.
I was reading this HowTo on Sorting and playing with it a few minutes ago and noticed that for x in count.items() lets me iterate through a list of key/value pairs without throwing a ValueError exception, so I removed the line count = book.word.word_count and added the following:
s_count = sorted(book.word.word_count.items(), key=lambda count: count[1], reverse=True)
# Delete the original dict, it is no longer needed
del book.word.word_count
Now I finally have a sorted list of words, s_count. PHEW! So, my questions are:
Is a dict even the best data type to perform the original counting? Would a list of tuples like that returned by count.items() have been preferable? But that would probably slow it down, right?
This seems kind of 'clunky', as I'm building a dict, converting it to a list containing tuples, then sorting the list and returning a new list. However, it is my understanding that dictionaries allow me to perform the fastest lookups, so am I missing something here?
I read briefly about hashing. While I think I understand that the point is that hashing will save space in memory and allow me to perform faster look-ups and comparisons, wouldn't the trade off be that the program becomes more computationally expensive(higher CPU load) because it would then be calculating hashes for each word? Is hashing relevant here?
Any feedback on naming conventions (which I am terrible at), or any other suggestions about basically anything (including style), would be greatly appreciated.
Are you sure that for k,v in count: gives the exception ValueError: too many values to unpack? I expect it to give ValueError: need more than 1 value to unpack.
When you iterate over a dict (e.g. in a for loop) you just get the keys, you don't get the values. If you want key, value pairs you need to use the dict's iteritems() method as mentioned by figs in the comment (or in Python 3 the items() method).
Of course, you can always do something like:
for k in count:
    print k, count[k]
...
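A small Python 3 sketch of the iteration behaviour described above (the question's code is Python 2, so the use of Python 3 and the sample dict are assumptions for illustration). Iterating over a dict yields its keys, so for k, v in count tries to unpack each key string into two names. That raises ValueError unless the key happens to have exactly two characters: a longer key has too many values to unpack, a one-character key has too few.

```python
# Hypothetical word counts standing in for the real result.
count = {"the": 3, "a": 2, "whale": 1}

# Iterating over a dict yields only its keys.
keys = [k for k in count]

# Unpacking a key string into (k, v) fails for "the": three characters
# cannot be unpacked into two names.
try:
    for k, v in count:
        pass
    unpack_failed = False
except ValueError:
    unpack_failed = True

# items() yields (key, value) tuples, which unpack cleanly.
pairs = [(k, v) for k, v in count.items()]
print(keys, unpack_failed, pairs)
```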
I think that most of your questions are more suited to Code Review than to Stack Overflow. But since you've asked so nicely here, I'll mention a few points. :)
It's rather inefficient to build up a string char by char, so your alpha_only() method would be better if it collected chars in a list then used the str.join() method to join them into a single string. The usual Python idiom would do that using a list comprehension.
The list comprehension in your count() method calls alpha_only() twice for each word, which is inefficient.
You could make your strip() call simpler by using the default argument, as that strips all white space (and you don't need to preserve space chars in this application). Similarly, using split() with its default arg will split on any runs of blank space, which is probably better in this application, since giving an arg of a single space means that you'll get some empty strings in the list returned by split if there are any runs of multiple spaces within a line.
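The two suggestions above might look like the following Python 3 sketch. It is written as a standalone function rather than a method, and the sample line is invented, so treat both as assumptions rather than a drop-in replacement for the original class.

```python
def alpha_only(word):
    """Lowercase word and keep only the letters a-z.
    Returns None if nothing is left, as the original method does."""
    letters = [c for c in word.lower() if c in 'abcdefghijklmnopqrstuvwxyz']
    # Join once at the end instead of growing a string char by char.
    return ''.join(letters) or None


# Hypothetical line with runs of spaces, a tab and a trailing newline.
line = "  The  quick, brown\tfox!  -- \n"

# split() with no argument strips the ends and splits on any whitespace run.
default_split = line.split()

# Splitting on a single space leaves empty strings and keeps the tab.
single_space_split = line.strip().split(' ')

print(default_split)
print(single_space_split)
```

Because split() with no argument already ignores leading and trailing whitespace, a separate strip() call is not needed before it.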
...
You mention hashing in your question, and whether it's useful for this application. Yes, it is. Python dictionaries actually use hashing of their keys, so you don't need to worry about the details. And yes, a dictionary is a good data structure to use for this task. There are fancier forms of dictionary that make things a bit simpler, but to use them does require importing a (standard) module. But using a dictionary (of some flavour or another) to hold data and then generating a list of tuples from it for final sorting is a fairly common practice in Python. And there's no need to specifically delete the dictionary when you've finished with it if the program's about to terminate anyway.
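As a minimal Python 3 sketch of the pattern described above: count with a dict, then sort the items for output. The answer does not name the "fancier" dictionary it has in mind; collections.Counter from the standard library is one such option and is shown here as an assumption. The word list is invented for illustration.

```python
from collections import Counter

# Hypothetical input standing in for the words of the book.
words = ["the", "whale", "the", "sea", "whale", "the"]

# Plain dict: hashing of the keys happens automatically on each lookup.
word_count = {}
for word in words:
    word_count[word] = word_count.get(word, 0) + 1

# Convert to a list of (word, count) tuples and sort by count, descending.
s_count = sorted(word_count.items(), key=lambda item: item[1], reverse=True)

# Counter does the counting and the sorting in two calls.
counter = Counter(words)
top_two = counter.most_common(2)

print(s_count)
print(top_two)
```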
...
As for the duplicated call of alpha_only(), whenever you find yourself doing that sort of thing it's a sign that a list comprehension isn't really suitable for the task and that you should just use a normal for loop so that you can save the result of the function call rather than having to recalculate it. Eg,
words = []
for word in line.split():
    word = self.alpha_only(word)
    if word is not None:
        words.append(word)

Python comparing elements in two lists

I have two lists:
a - dictionary which contains keywords such as ["impeccable", "obvious", "fantastic", "evident"] as elements of the list
b - sentences which contains sentences such as ["I am impeccable", "you are fantastic", "that is obvious", "that is evident"]
The goal is to use the dictionary list as a reference.
The process is as follows:
Take an element for the sentences list and run it against each element in the dictionary list. If any of the elements exists, then spit out that sentence to a new list
Repeating step 1 for each of the elements in the sentences list.
Any help would be much appreciated.
Thanks.
Below is the code:
sentences = "The book was awesome and envious","splendid job done by those guys", "that was an amazing sale"
dictionary = "awesome","amazing", "fantastic","envious"
##Find Matches
for match in dictionary:
    if any(match in value for value in sentences):
        print match
Now that you've fixed the original problem, and fixed the next problem with doing the check backward, and renamed all of your variables, you have this:
for match in dictionary:
    if any(match in value for value in sentences):
        print match
And your problem with it is:
The way I have the code written i can get the dictionary items but instead i want to print the sentences.
Well, yes, your match is a dictionary item, and that's what you're printing, so of course that's what you get.
If you want to print the sentences that contain the dictionary item, you can't use any, because the whole point of that function is to just return True if any elements are true. It won't tell you which ones; in fact, if there are more than one, it'll stop at the first one.
If you don't understand functions like any and the generator expressions you're passing to them, you really shouldn't be using them as magic invocations. Figure out how to write them as explicit loops, and you will be able to answer these problems for yourself easily. (Note that the any docs directly show you how to write an equivalent loop.)
For example, your existing code is equivalent to:
for match in dictionary:
    for value in sentences:
        if match in value:
            print match
            break
Written that way, it should be obvious how to fix it. First, you want to print the sentence instead of the word, so print value instead of match (and again, it would really help if you used meaningful variable names like sentence and word instead of meaningless names like value and misleading names like match…). Second, you want to print all matching sentences, not just the first one, so don't break. So:
for match in dictionary:
    for value in sentences:
        if match in value:
            print value
And if you go back to my first answer, you may notice that this is the exact same structure I suggested.
You can simplify or shorten this by using comprehensions and iterator functions, but not until you understand the simple version, and how those comprehensions and iterator functions work.
First translate your algorithm into pseudocode instead of a vague description, like this:
for each sentence:
    for each element in the dictionary:
        if the element is in the sentence:
            spit out the sentence to a new list
The only one of these steps that isn't completely trivial to convert to Python is "spit out the sentence to a new list". To do that, you'll need to have a new list before you get started, like a_new_list = [], and then you can call append on it.
Once you convert this to Python, you will discover that "I am impeccable and fantastic" gets spit out twice. If you don't want that, you need to find the appropriate place to break out of the inner loop and move on to the next sentence, which is also trivial to convert to Python.
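The pseudocode above might translate to Python 3 as in the sketch below. The names words, matching and matching_short, and the sample sentences, are assumptions chosen for illustration. The break after the first hit is what keeps a sentence with two keywords from being added twice.

```python
words = ["impeccable", "obvious", "fantastic", "evident"]
# Hypothetical sentences, including one that contains two keywords.
sentences = ["I am impeccable and fantastic",
             "you are fantastic",
             "nothing to see here",
             "that is obvious"]

matching = []
for sentence in sentences:
    for word in words:
        if word in sentence:
            matching.append(sentence)
            break  # one match is enough; move on to the next sentence

# Once the explicit loops are understood, the same result can be
# written with a list comprehension and any().
matching_short = [s for s in sentences if any(w in s for w in words)]

print(matching)
```

Here any() is used per sentence rather than per word, so its stopping at the first match is exactly the behaviour wanted.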
Now that you've posted your code… I don't know what problem you were asking about, but there's at least one thing obviously wrong with it.
sentences is a list of sentences.
So, for partial in sentences means each partial will be a sentence, like "I am impeccable".
dictionary is a list of words. So, for value in dictionary means each value will be a word, like "impeccable".
Now, you're checking partial in value for each value for each partial. That will never be true. "I am impeccable" is not in "impeccable".
If you turn that around, and check whether value in partial, it will give you something that's at least true sometimes, and that may even be what you actually want, but I'm not sure.
As a side note, if you used better names for your variables, this would be a lot more obvious. partial and value don't tell you what those things actually are; if you'd called them sentence and word it would be pretty clear that sentence in word is never going to be true, and that word in sentence is probably what you wanted.
Also, it really helps to look at intermediate values to debug things like this. When you use an explicit for statement, you can print(partial) to see each thing that partial holds, or you can put a breakpoint in your debugger, or you can step through in a visualizer like this one. If you have to break the any(genexpr) up into an explicit loop to do that, then do so. (If you don't know how, then you probably don't understand what generator expressions or the any function do, and have just copied and pasted random code you didn't understand and tried changing random things until it worked… in which case you should stop doing that and learn what they actually mean.)

For...in questions (Python)

I was trying some different ways to run some for...in loops. Consider a list of lists:
list_of_lists = []
list = [1, 2, 3, 4, 5]
for i in range(len(list)):
    list_of_lists.append(list) # each entry in l_o_l will now be list
Now let's say I want to have the first "column" of l_o_l be included in a separate list, a.
There are several ways I can go about this. For example:
a = [list[0] for list in list_of_lists] # this works (a = [1, 1, 1, 1, 1])
OR
a=[]
for list in list_of_lists:
    a.append(list[0]) #this also works
For the second example, however, I would imagine the "full" expansion to be equivalent to
a=[]
a.append(list[0] for list in list_of_lists) #but this, obviously, produces a generator error
The working "translation" is, in fact,
a=[]
a.append([list[0] for list in list_of_lists]) #this works
My question is on interpretation and punctuation, then. How come Python "knows" to append/does append the list brackets around the "list[0] for list in list_of_lists" expansion (and thus requires it in any rewrite)?
The issue here is that list comprehensions and generator expressions are not just loops, they are more than that.
List comprehensions are designed to be an easy way to build up a list from an iterable, as you have shown.
Your latter two examples both don't work - in both cases you are appending the wrong thing to the list - in the first case, a generator, the second appends a list inside your existing list. Neither of these are what you want.
You are trying to do something in two different ways at the same time, and it doesn't work. Just use the list comprehension - it does what you want to do in the most efficient and readable way.
Your main problem is you seem to have taken list comprehensions and generator expressions and not understood what they are and what they are trying to do. I suggest you try to understand them further before using them.
My question is on interpretation and punctuation, then. How come
Python "knows" to append/does append the list brackets around the
"list[0] for list in list_of_lists" expansion (and thus requires it in
any rewrite)?
Not sure what that is supposed to mean. I think you might be unaware that in addition to list comprehensions [i*2 for i in range(0,3)] there are also generator expressions (i*2 for i in range(0,3)).
Generator expressions are a syntax for creating generators that perform a mapping, just like a list comprehension (but as a generator). Any list comprehension [c] can be rewritten list(c). The reason there is a naked c inside list() is that where a generator expression appears as the sole argument to a call, it is permitted to drop its parentheses. This is what you are seeing in a.append(list[0] for list in list_of_lists).
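The difference between the variants discussed in this thread can be seen in a short Python 3 sketch. The names row and r are assumptions, used here so that the built-in list is not shadowed as it is in the question.

```python
import types

row = [1, 2, 3, 4, 5]
list_of_lists = [row for _ in range(len(row))]

# List comprehension: builds the list of first elements directly.
a = [r[0] for r in list_of_lists]

# Explicit loop: append one element per iteration.
b = []
for r in list_of_lists:
    b.append(r[0])

# append() with a bare generator expression adds ONE item:
# the generator object itself, not its values.
d = []
d.append(r[0] for r in list_of_lists)

# extend() consumes the generator and adds each value.
c = []
c.extend(r[0] for r in list_of_lists)

# list() around a generator expression equals the list comprehension.
e = list(r[0] for r in list_of_lists)

print(a, b, c, e, len(d))
```

So the brackets are not something Python adds for you: a.append(...) always appends exactly one object, and whether that object is a generator or a list depends on what you wrote inside the call.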
