Storing phrases from a string into a list

I've found this bit of code that works exactly as intended, yet I'm puzzled as to why.
The idea is to extract information from each line (without spaces or extra tabulation symbols).
The code I found is the following:
def extract_information(line: str) -> list:
    return [phrase.strip() for phrase in line.split(' ') if phrase]
And it works! But since it's a one-liner, I'm having a hard time deciphering it; I'm used to fully written-out loops.
For example:
print(extract_information("Marni FIGHTS FOR LIFE Old Shack Will rule the kingdom"))
Should become:
['Marni', 'FIGHTS FOR LIFE', 'Old Shack', 'Will rule the kingdom']
Does anyone have a clue about this?

Python supports something called a list comprehension.
Summary
It consists of brackets containing an expression followed by a for
clause, then zero or more for or if clauses. The expressions can be
anything, meaning you can put in all kinds of objects in lists.
Behind the scenes
Thus, you can read the above expression this way:
[return something for something in listofsomethings if something exists ]
Note that the above expression is only for educational purposes and is not valid Python.
Therefore in your particular case, it could be translated to this:
result = []
for phrase in line.split(' '):
    if phrase:
        result.append(phrase.strip())
So as you can see, it does exactly what you expect it to do: result ends up holding the same list that the comprehension returns. Both do the exact same thing, but a list comprehension is generally considered more Pythonic.
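To see the equivalence for yourself, here is a minimal sketch that puts the two forms side by side. The function names and the two-space separator (sep) are assumptions made for illustration; they are not taken from the question's code.

```python
def extract_with_comprehension(line, sep="  "):
    # sep is an assumed two-space separator, chosen only for this illustration
    return [phrase.strip() for phrase in line.split(sep) if phrase]


def extract_with_loop(line, sep="  "):
    result = []                            # the list the comprehension builds for you
    for phrase in line.split(sep):         # the "for" clause
        if phrase:                         # the "if" clause: skip empty strings
            result.append(phrase.strip())  # the leading expression
    return result


sample = "Marni  FIGHTS FOR LIFE  Old Shack    Will rule the kingdom"
print(extract_with_comprehension(sample))
print(extract_with_loop(sample))
```

Both calls print the same four phrases; the empty string produced by the run of four spaces is dropped by the if clause.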

It's a list comprehension. Run something simple like this to see how it works. It's basically a succinct way to loop through a list and build a new one.
x = [1, 2, 3, 4, 5, 6]
y = [a * 2 for a in x]
print(y)  # [2, 4, 6, 8, 10, 12]

Related

Perform regular expression on list containing multiple lines of strings

I need to perform regular expression to clean up the strings in the list below.
X = ["This is a # scary wolf! ",
     "welcome to a mysterious jungle",
     "2020 was the year to remember forever_never",
     "Remember the name s - John",
     "I admire you"]
I feel comfortable performing the regular expressions themselves on strings alone, however I'm having trouble figuring out how to handle this example as they're placed in a list.
The output is supposed to look like this:
This is scary wolf
welcome to mysterious jungle
was the year to remember forever
Remember the name John
I admire you
What would be the best approach to tackling this problem?

If you feel comfortable handling the individual strings, then you could write a function that takes an individual string in and returns it cleaned.
def your_regex_function(one_string):
    # your code here that applies regular expressions to one_string
    # and stores the result in cleaned_string
    return cleaned_string
Then you need to make a new list of strings using that function, which you can do with a list comprehension:
cleaned_strings = [your_regex_function(element) for element in X]
The list comprehension will cycle through each element of X, run it through the function, and put the cleaned individual string as a new element into a new list.
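As a rough sketch of how the two pieces fit together, the example below plugs a made-up cleaning rule into that pattern. The rule (keep only letters and spaces, then collapse whitespace) and the name clean are assumptions for illustration; they do not reproduce the exact output the question asks for.

```python
import re


def clean(one_string):
    # Hypothetical rule: replace anything that is not a letter or a space...
    letters_only = re.sub(r"[^A-Za-z ]+", " ", one_string)
    # ...then collapse runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", letters_only).strip()


X = ["This is a # scary wolf! ", "I admire you"]
cleaned = [clean(element) for element in X]
print(cleaned)
```

Swapping in your own regular expressions only means changing the body of clean; the list comprehension stays the same.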

Converting code to a list comprehension

I wrote a bit of code to iterate over a list to see if a line contained one or more keywords:
STRINGS_TO_MATCH = ["Foo",
                    "Bar",
                    "Oggle"]
for string in STRINGS_TO_MATCH:
    if string in line:
        val_from_line = line.split(' ')[-1]
Does anyone happen to know if there is a way to make this more readable? Would a list comprehension be a better fit here?

The thing to remember here is that comprehensions are expressions, whose purpose is to create a value - list comprehensions create lists, dict comprehensions create dicts, and set comprehensions create sets. They are unlikely to help in this case, because you aren't creating any such object.
Your code sample is incomplete, because it doesn't do anything with the val_from_line values that it extracts. I am presuming that you want to extract the last "word" from a line which contains any of the strings in STRINGS_TO_MATCH, but it's difficult to work with such incomplete information so this answer might, for all I know, be totally useless.
Assuming I'm correct, the easiest way to find out if line contains any of the STRINGS_TO_MATCH is to use the expression
any(s in line for s in STRINGS_TO_MATCH)
This uses a so-called generator expression, which is similar to a list comprehension - the interpreter can iterate over it to produce a sequence of values - but it doesn't go as far as creating a list of the values, it just creates them as the client code (in this case the any built-in function) requests them. So I might rewrite your code as
if any(s in line for s in STRINGS_TO_MATCH):
    val_from_line = line.split(' ')[-1]
I'll leave you to decide what you actually want to do after that, with the warning note that after this code executes val_from_line may or may not exist (depending on whether or not the condition was true), which is never an entirely comfortable situation.
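One possible way around that uncomfortable situation, sketched under the assumption that you want the last word or nothing at all, is to wrap the test in a small function that always returns a value. The name last_word_if_match and the sample lines are hypothetical.

```python
STRINGS_TO_MATCH = ["Foo", "Bar", "Oggle"]


def last_word_if_match(line, keywords=STRINGS_TO_MATCH):
    """Return the last space-separated word of line if any keyword occurs in it, else None."""
    if any(s in line for s in keywords):
        return line.split(' ')[-1]
    return None  # the result is always defined, even when nothing matched


lines = ["Foo level is 42", "nothing to see here", "the Bar closes at midnight"]
values = [last_word_if_match(line) for line in lines]
print(values)
```

Here a list comprehension does make sense, because the goal of the last step is to build a list of results.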

Searching a string for an exact match from a list in Python

I'm working on a project that searches specific users' Twitter streams from my followers list and retweets them. The code below works fine, except that it also matches when the string appears inside a longer word (for instance, if the desired string was only "man" but they wrote "manager", it'd get retweeted). I'm still pretty new to Python, and my hunch is that regex will be the way to go, but my attempts have proved useless thus far.
if tweet["user"]["screen_name"] in friends:
    for phrase in list:
        if phrase in tweet["text"].lower():
            print tweet
            api.retweet(tweet["id"])
            return True

Since you only want to match whole words the easiest way to get Python to do this is to split the tweet text into a list of words and then test for the presence of each of your words using in.
There's an optimization you can use because position isn't important: by building a set from the word list you make searching much faster (technically, O(1) rather than O(n)) because of the fast hashed access used by sets and dicts (thank you Tim Peters, also author of The Zen of Python).
The full solution is:
if tweet["user"]["screen_name"] in friends:
    tweet_words = set(tweet["text"].lower().split())
    for phrase in list:
        if phrase in tweet_words:
            print tweet
            api.retweet(tweet["id"])
            return True
This is still not a complete solution, though. Really, you should also take care of things like stripping leading and trailing punctuation from each word. You could write a function to do that, and call it with the tweet text as an argument instead of using a .split() method call.
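A minimal sketch of such a function, assuming that stripping ASCII punctuation from both ends of each word is good enough, could look like this. The name words_in is made up for illustration.

```python
import string


def words_in(text):
    """Return the set of lower-cased words in text, without leading/trailing punctuation."""
    stripped = (word.strip(string.punctuation) for word in text.lower().split())
    return {word for word in stripped if word}  # drop tokens that were only punctuation


tweet_words = words_in("He is the MAN, apparently!")
print("man" in tweet_words)                   # True
print("man" in words_in("Ask the manager."))  # False
```

In the code above you would then build tweet_words with words_in(tweet["text"]) instead of the .lower().split() chain.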
Given that optimization it occurred to me that iteration in Python could be avoided altogether if the phrases were a set also (the iteration will still happen, but at C speeds rather than Python speeds). So in the code that follows let's suppose that you have during initialization executed the code
tweet_words = set(l.lower() for l in list)
By the way, list is a terrible name for a variable, since by using it you make the Python list type unavailable under its usual name (though you can still get at it with tricks like type([])). Perhaps better to call it word_list or something else both more meaningful and not an existing name. You will have to adapt this code to your needs, it's just to give you the idea. Note that tweet_words only has to be set once.
list = ['Python', 'Perl', 'COBOL']
tweets = [
    "This vacation just isn't worth the bother",
    "Goodness me she's a great Perl programmer",
    "This one slides by under the radar",
    "I used to program COBOL but I'm all right now",
    "A visit to the doctor is not reported"
]
tweet_words = set(w.lower() for w in list)
for tweet in tweets:
    if set(tweet.lower().split()) & tweet_words:
        print(tweet)

If you want to use regexes to do this, look for a pattern that is of the form \b<string>\b. In your case this would be:
import re
pattern = re.compile(r"\bman\b")
if re.search(pattern, tweet["text"].lower()):
    pass  # do your thing
\b matches a word boundary in a regex, so prefixing and suffixing your pattern with it makes the pattern match only as a whole word. Hope it helps.
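If you have several words to look for, the same idea can be extended by joining them into one alternation. This is only a sketch: the phrases list and the helper name has_whole_word are assumptions, and it presumes every phrase starts and ends with a letter or digit, since \b only works next to word characters.

```python
import re

phrases = ["man", "python"]  # hypothetical search terms

# re.escape keeps any special characters in a phrase from being read as regex syntax
pattern = re.compile(r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b")


def has_whole_word(text):
    """Return True if any phrase occurs in text as a whole word."""
    return pattern.search(text.lower()) is not None


print(has_whole_word("The manager left early"))  # False
print(has_whole_word("A man left early"))        # True
```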

Python comparing elements in two lists

I have two lists:
a - dictionary which contains keywords such as ["impeccable", "obvious", "fantastic", "evident"] as elements of the list
b - sentences which contains sentences such as ["I am impeccable", "you are fantastic", "that is obvious", "that is evident"]
The goal is to use the dictionary list as a reference.
The process is as follows:
1. Take an element from the sentences list and run it against each element in the dictionary list. If any of those elements appears in the sentence, spit out that sentence to a new list.
2. Repeat step 1 for each of the elements in the sentences list.
Any help would be much appreciated.
Thanks.
Below is the code:
sentences = "The book was awesome and envious", "splendid job done by those guys", "that was an amazing sale"
dictionary = "awesome", "amazing", "fantastic", "envious"
## Find Matches
for match in dictionary:
    if any(match in value for value in sentences):
        print match

Now that you've fixed the original problem, and fixed the next problem with doing the check backward, and renamed all of your variables, you have this:
for match in dictionary:
    if any(match in value for value in sentences):
        print match
And your problem with it is:
The way I have the code written i can get the dictionary items but instead i want to print the sentences.
Well, yes, your match is a dictionary item, and that's what you're printing, so of course that's what you get.
If you want to print the sentences that contain the dictionary item, you can't use any, because the whole point of that function is to just return True if any elements are true. It won't tell you which ones; in fact, if there is more than one, it'll stop at the first one.
If you don't understand functions like any and the generator expressions you're passing to them, you really shouldn't be using them as magic invocations. Figure out how to write them as explicit loops, and you will be able to answer these problems for yourself easily. (Note that the any docs directly show you how to write an equivalent loop.)
For example, your existing code is equivalent to:
for match in dictionary:
    for value in sentences:
        if match in value:
            print match
            break
Written that way, it should be obvious how to fix it. First, you want to print the sentence instead of the word, so print value instead of match (and again, it would really help if you used meaningful variable names like sentence and word instead of meaningless names like value and misleading names like match…). Second, you want to print all matching sentences, not just the first one, so don't break. So:
for match in dictionary:
    for value in sentences:
        if match in value:
            print value
And if you go back to my first answer, you may notice that this is the exact same structure I suggested.
You can simplify or shorten this by using comprehensions and iterator functions, but not until you understand the simple version, and how those comprehensions and iterator functions work.
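For reference once the loop version is clear, one possible shortened form is sketched below. Note that it is not a line-for-line equivalent: it walks the sentences in order and keeps each matching sentence once, whereas the nested loops above print a sentence once per matching word. The name matching is an assumption.

```python
sentences = ["The book was awesome and envious",
             "splendid job done by those guys",
             "that was an amazing sale"]
dictionary = ["awesome", "amazing", "fantastic", "envious"]

# Keep a sentence if at least one dictionary word occurs in it
matching = [sentence for sentence in sentences
            if any(word in sentence for word in dictionary)]
print(matching)
```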

First translate your algorithm into pseudocode instead of a vague description, like this:
for each sentence:
    for each element in the dictionary:
        if the element is in the sentence:
            spit out the sentence to a new list
The only one of these steps that isn't completely trivial to convert to Python is "spit out the sentence to a new list". To do that, you'll need to have a new list before you get started, like a_new_list = [], and then you can call append on it.
Once you convert this to Python, you will discover that "I am impeccable and fantastic" gets spit out twice. If you don't want that, you need to find the appropriate place to break out of the inner loop and move on to the next sentence, which is also trivial to convert to Python.
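If you want to check your own conversion against something, here is one possible translation of the pseudocode, with the break placed right after the append. The function name and the sample data are made up for illustration.

```python
def sentences_with_keywords(sentences, dictionary):
    a_new_list = []
    for sentence in sentences:
        for element in dictionary:
            if element in sentence:
                a_new_list.append(sentence)  # "spit out the sentence to a new list"
                break                        # move on to the next sentence
    return a_new_list


dictionary = ["impeccable", "obvious", "fantastic", "evident"]
sentences = ["I am impeccable and fantastic", "you are fine", "that is obvious"]
print(sentences_with_keywords(sentences, dictionary))
```

Without the break, the first sentence would be appended twice, once for "impeccable" and once for "fantastic".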

Now that you've posted your code… I don't know what problem you were asking about, but there's at least one thing obviously wrong with it.
sentences is a list of sentences.
So, for partial in sentences means each partial will be a sentence, like "I am impeccable".
dictionary is a list of words. So, for value in dictionary means each value will be a word, like "impeccable".
Now, you're checking partial in value for each value for each partial. That will never be true. "I am impeccable" is not in "impeccable".
If you turn that around, and check whether value in partial, it will give you something that's at least true sometimes, and that may even be what you actually want, but I'm not sure.
As a side note, if you used better names for your variables, this would be a lot more obvious. partial and value don't tell you what those things actually are; if you'd called them sentence and word it would be pretty clear that sentence in word is never going to be true, and that word in sentence is probably what you wanted.
Also, it really helps to look at intermediate values to debug things like this. When you use an explicit for statement, you can print(partial) to see each thing that partial holds, or you can put a breakpoint in your debugger, or you can step through the code in a visualizer. If you have to break the any(genexpr) up into an explicit loop to do that, then do so. (If you don't know how, then you probably don't understand what generator expressions or the any function do, and have just copied and pasted random code you didn't understand and tried changing random things until it worked… in which case you should stop doing that and learn what they actually mean.)

Python 3.x dictionary-based keygen help?

I've been studying Python for one month and I'm trying to make a keygen application using a dictionary. The idea was to compare each letter in name = input('Name: ') to dict.keys() and print as result dict.values() for each letter of name equal to dict.keys(). That's what I wrote:
name = input('Name: ')
kalg = dict()
kalg['a'] = '50075'
kalg['b'] = '18099'
kalg['c'] = '89885'
# etc...
I tried writing this...
for x in kalg.keys():
    print(x)
...but I need to keep the print(x) result, and I don't know how to do it! If I do this:
for x in kalg.keys():
    a = x
'a' keeps only the last key of the dictionary :(. I thought it was because print(x) prints each key of dict.keys() on a new line, but I don't know how to solve it (I tried converting the type etc., but it didn't work).
Please can you help me solve this? I also don't know how to compare each letter of a string with another string and print dict.values() as the result, in the right position.
Sorry for this stupid question, but I'm too excited about writing Python apps :)
# Karl
I'm studying Python from two different books: 'Learning Python' by Mark Lutz, which covers Python 2, and a pocket book which covers Python 3. I looked up list comprehensions in the pocket one and I managed to write three other variants of this keygen. Now I want to ask you how I can implement the source code of this keygen in a real application with a GUI which verifies whether the name_textbox and key_textbox captions match (I come from BASIC, so that was what I used to write, just to give you an idea) the keygen output. I know I can try to do this on my own (I did, but with no success), but I would like to first complete the book (the pocket one) and understand all the main aspects of Python. Thank you for your patience.

Calling print can't "keep" anything (since there is no variable to store it in), and repeatedly assigning to a variable replaces the previous assignments. (I don't understand your reasoning about the problem; how print(x) behaves has nothing to do with how a = x behaves, as they're completely different things to be doing.)
Your question boils down to "how do I keep a bunch of results from several similar operations?" and on a conceptual level, the answer is "put them into a container". But explicitly putting things into the container is more tedious than is really necessary. You have an English description of the data you want: "dict.values() for each letter of name equal to dict.keys()". And in fact the equivalent Python is shockingly similar.
Of course, we don't actually want a separate copy of dict.values() for each matching letter; and we don't actually want to compare the letter to the entire set of dict.keys(). As programmers, we must be more precise: we are checking whether the letter is a key of the dict, i.e. if it is in the set of dict.keys(). Fortunately, that test is trivial to write: for a given letter, we check letter in dict. When the letter is found, we want the corresponding value; we get that by looking it up normally, thus dict[letter].
Then we wrap that all up with our special syntax that gives us what we want: the list comprehension. We put the brackets for a list, and then inside we write (some expression that calculates a result from the input element) for (a variable name for the input elements, so we can use it in that first expression) in (the source of input elements); and we can additionally filter the input elements at the same time, by adding if (some condition upon the input element).
So that's simple enough: [kalg[letter] for letter in name if letter in kalg]. Notice that I have name as the "source of elements", because that's what it should be. You explained that perfectly clearly in your description of the problem - why are you iterating over dict.keys() in your existing for-loops? :)
Now, this expression will give us a list of the results, so e.g. ['foo', 'bar', 'baz']. If we want one continuous string (I assume all the values in your dict are strings), then we'll need to join them up. Fortunately, that's easy as well. In fact, since we're going to pass the results to a function taking one argument, there is a special syntax rule that will let us drop the square brackets, making things look quite a bit neater.
It's also easier than you're making it to initialize the dict in the first place; idiomatic Python code rarely actually needs the word dict.
Putting it all together:
kalg = {'a': '50075', 'b': '18099', 'c': '89885'} # etc.
name = input('Name: ')
print(''.join(kalg[letter] for letter in name if letter in kalg))
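To try this out without typing a name at the prompt, the same expression can be wrapped in a small function. This is just a sketch; the name make_key and the test inputs are assumptions, and the three-entry table stands in for the full dictionary.

```python
def make_key(name, table):
    """Join the table values for every letter of name that is a key of table."""
    return ''.join(table[letter] for letter in name if letter in table)


kalg = {'a': '50075', 'b': '18099', 'c': '89885'}  # shortened table for the example
print(make_key('cab', kalg))  # 898855007518099
print(make_key('a-b', kalg))  # 5007518099 (the '-' is skipped by the filter)
```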

I can only guess, but this could be what you want:
name = input('Name: ')
kalg = {'a':'50075', 'b': '18099', 'c': '89885'}
keylist = [kalg[letter] for letter in name]
print(" ".join(keylist))
