Continuous letter check for items in list [duplicate] - python

This question already has answers here:
Determine prefix from a set of (similar) strings
(11 answers)
Closed 2 years ago.
I need to know how to identify the common prefix of the strings in a list. For example,
list = ['nomad', 'normal', 'nonstop', 'noob']
The answer should be 'no', since every string in the list starts with 'no'.
I was wondering if there is a method that iterates over the letters of all the strings in the list at the same time and checks whether those letters are all the same.

Use os.path.commonprefix; it does exactly what you want.
In [1]: list = ['nomad', 'normal', 'nonstop', 'noob']
In [2]: import os.path as p
In [3]: p.commonprefix(list)
Out[3]: 'no'
As an aside, naming a list "list" shadows the built-in list type, so you can no longer reach it under that name. I would recommend using a different variable name.
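The same call with a non-shadowing variable name, plus one detail worth knowing: despite living in os.path, commonprefix compares strings character by character, not path component by path component. The path pair below is only an illustration of that behaviour.

```python
import os.path

words = ['nomad', 'normal', 'nonstop', 'noob']  # not named `list`, so the built-in stays usable
print(os.path.commonprefix(words))  # no

# Character-wise comparison: the result need not be a valid path component.
print(os.path.commonprefix(['/usr/lib', '/usr/local']))  # /usr/l
```

If you ever do want whole path components, Python 3.5+ also provides os.path.commonpath.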

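Here is code that needs no imports (l is the list from the question, renamed so it does not shadow the built-in list):
l = ['nomad', 'normal', 'nonstop', 'noob']
for i in range(len(l[0]) + 1):
    if False in [l[0][:i] == j[:i] for j in l]:  # some word differs in its first i chars
        print(l[0][:i-1])
        break
else:  # no mismatch at all: the whole first word is the common prefix
    print(l[0])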
gives output:
no

There is no built-in function for this (os.path.commonprefix, from the standard library, is the closest thing). If you are looking for short Python code that can do it for you, here's my attempt:
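def longest_common_prefix(words):
    i = 0
    shortest = min(len(word) for word in words)  # bound i; identical words would otherwise loop forever
    while i <= shortest and len(set([word[:i] for word in words])) <= 1:
        i += 1
    return words[0][:i-1]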
Explanation: words is a list of strings (it has to support words[0], so a one-shot iterator won't do). The list comprehension
[word[:i] for word in words]
uses string slices to take the first i letters of each string. At the beginning, these would all be empty strings. Then, it would consist of the first letter of each word. Then the first two letters, and so on.
Casting to a set removes duplicates. For example, set([1, 2, 2, 3]) == {1, 2, 3}. By casting our list of prefixes to a set, we remove duplicates. If the length of the set is less than or equal to one, then the prefixes are all identical.
The counter i just keeps track of how many letters are identical so far.
We return words[0][:i-1]. We arbitrarily choose the first word and take its first i-1 letters (which would be the same for any word in the list). The reason that it's i-1 and not i is that i gets incremented before we check whether all of the words still share the same prefix.
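A different technique, sketched here rather than taken from the answer above: in lexicographic order every word lies between the smallest and the largest one, so whatever prefix min(words) and max(words) share is shared by all words. That reduces the problem to comparing two strings. The function name common_prefix_minmax is hypothetical; as far as I know, CPython's os.path.commonprefix relies on the same min/max idea.

```python
def common_prefix_minmax(words):
    """Common prefix of all strings, found by comparing only min and max."""
    if not words:
        return ''
    # Any word w with lo <= w <= hi must start with whatever lo and hi share.
    lo, hi = min(words), max(words)
    for i, ch in enumerate(lo):
        # hi cannot run out first: if hi were a proper prefix of lo, hi < lo.
        if ch != hi[i]:
            return lo[:i]
    return lo  # lo is a prefix of hi, hence of every word

print(common_prefix_minmax(['nomad', 'normal', 'nonstop', 'noob']))  # no
```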

Here's a fun one:
l = ['nomad', 'normal', 'nonstop', 'noob']
def common_prefix(lst):
    for s in zip(*lst):
        if len(set(s)) == 1:
            yield s[0]
        else:
            return
result = ''.join(common_prefix(l))
Result:
'no'
To answer the spirit of your question: zip(*lst) is what allows you to "iterate over the letters of every string in the list at the same time". For example, list(zip(*lst)) looks like this (zip stops at the end of the shortest string):
[('n', 'n', 'n', 'n'), ('o', 'o', 'o', 'o'), ('m', 'r', 'n', 'o'), ('a', 'm', 's', 'b')]
Now all you need to do is check each group: if all of its letters are the same (len(set(s)) == 1), keep that letter, and join the kept letters back into a string. The generator stops at the first group whose letters differ.
As an aside, you probably don't want to name your list list. Any later call to list() will then fail, because the name no longer refers to the built-in. It's bad practice to shadow built-in names.
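The same zip(*lst) idea can also be written without a hand-written generator by letting itertools.takewhile stop at the first mismatching column. This is a sketch; common_prefix_takewhile is a hypothetical name.

```python
from itertools import takewhile

def common_prefix_takewhile(words):
    # zip(*words) yields one tuple per position: the i-th letter of every word.
    # takewhile stops at the first position where those letters disagree.
    same = takewhile(lambda letters: len(set(letters)) == 1, zip(*words))
    return ''.join(letters[0] for letters in same)

print(common_prefix_takewhile(['nomad', 'normal', 'nonstop', 'noob']))  # no
```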

Related

python list comprehension with cls

I encountered a snippet of code like the following:
array = ['a', 'b', 'c']
ids = [array.index(cls.lower()) for cls in array]
I'm confused about two points:
What does [... for cls in array] mean? Since cls is a reserved keyword for class, why not just use [... for s in array]?
Why bother writing something this complicated instead of just [i for i in range(len(array))]?
I believe this code was written by someone more experienced with Python than me, and I believe he must have had some reason for doing so...
cls is not a reserved word for class; that would be a very poor choice of name by the language designer. Many programmers use it by convention (most commonly as the first parameter of a classmethod), but it is no more reserved than the parameter name self.
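The keyword module can confirm this directly. The Shape class below is a hypothetical example of the classmethod convention; the last lines show cls used as an ordinary loop variable, as in the question.

```python
import keyword

print(keyword.iskeyword('class'))  # True
print(keyword.iskeyword('cls'))    # False

# By convention, cls names the class passed to a classmethod.
class Shape:
    @classmethod
    def describe(cls):
        return cls.__name__

print(Shape.describe())  # Shape

# It is just a name, so it works as a comprehension variable too.
array = ['a', 'b', 'c']
print([cls.upper() for cls in array])  # ['A', 'B', 'C']
```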
If you use distinct upper and lower case characters in the list, you will see the difference:
array = ['a', 'b', 'c', 'B','A','c']
ids = [array.index(cls.lower()) for cls in array]
print(ids)
[0, 1, 2, 1, 0, 2]
The value at position 3 is 1 instead of 3 because the first occurrence of a lowercase 'b' is at index 1. Similarly, the value at the last position is 2 instead of 5 because the first 'c' is at index 2.
This list comprehension requires that the array always contain a lowercase instance of every uppercase letter. For example, ['a', 'B', 'c'] would make it crash with a ValueError, because 'b' is not in the list. Hopefully there are other safeguards in the rest of the program to ensure that this requirement is always met.
A safer and more efficient way to write this would be to build a dictionary of first character positions before going through the array to get indexes. Each list.index call scans the list, so this would make the time complexity O(n) instead of O(n^2). It could also help make the process more robust.
array = ['a', 'b', 'c', 'B','A','c','Z']
firstchar = {c:-i for i,c in enumerate(array[::-1],1-len(array))}
ids = [firstchar.get(c.lower()) for c in array]
print(ids)
[0, 1, 2, 1, 0, 2, None]
The firstchar dictionary contains the first index in array containing a given letter. It is built by going backward through the array so that the smallest index remains when there are multiple occurrences of the same letter.
{'Z': 6, 'c': 2, 'A': 4, 'B': 3, 'b': 1, 'a': 0}
Then, going through the array to form ids, each character finds the corresponding index in O(1) time by using the dictionary.
Using the .get() method allows the list comprehension to survive an upper case letter without a corresponding lowercase value in the list. In this example it returns None but it could also be made to return the letter's index or the index of the first uppercase instance.
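If the reversed enumerate with a negative start reads as too clever, the same firstchar dictionary can be built going forward with dict.setdefault, which only stores a key the first time it is seen. A sketch under that swap of technique, with the same array as above:

```python
array = ['a', 'b', 'c', 'B', 'A', 'c', 'Z']

# setdefault leaves an existing entry untouched, so each character keeps
# the index of its first occurrence.
firstchar = {}
for i, c in enumerate(array):
    firstchar.setdefault(c, i)

ids = [firstchar.get(c.lower()) for c in array]
print(ids)  # [0, 1, 2, 1, 0, 2, None]
```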
Some developers may be experienced but still write terrible code and just "skate on by".
Having said that, your suggested replacement for question #2 would give different output if the list contained two of any element: the original code returns the index of the first occurrence of each (lowercased) element, whereas yours would give each item's own index. It would also differ if the array elements weren't lowercase.

list comprehension without if but with else

My question aims to use the else condition of a for-loop in a list comprehension.
example:
empty_list = []
def example_func(text):
    for a in text.split():
        for b in a.split(","):
            empty_list.append(b)
        else:
            empty_list.append(" ")
I would like to make it cleaner by using a list comprehension with both for-loops.
But how can I do this while including an escape-clause for one of the loops (in this case the 2nd)?
I know I can use if, with and without else, in a list comprehension. But what about using else without an if statement?
Is there a way to make the interpreter understand it as the escape-clause of a for loop?
Any help is much appreciated!
EDIT:
Thanks for the answers! In fact, I'm trying to translate Morse code.
The input is a string containing Morse code.
Each word is separated by 3 spaces. Each letter of each word is separated by 1 space.
def decoder(code):
    str_list = []
    for i in code.split("   "):  # words are separated by 3 spaces
        for e in i.split():
            str_list.append(morse_code_dic[e])
        else:
            str_list.append(" ")
    return "".join(str_list[:-1]).capitalize()
print(decoder(".. -   .-- .- ...   .-   --. --- --- -..   -.. .- -.--"))
I want to break down the whole sentence into words, then translate each word.
After the inner loop (the translation of one word) finishes, its escape-clause else adds a space, so that the structure of the whole sentence is preserved. That way, the 3 spaces are translated to one space.
As noted in the comments, that else does not really make much sense: an else after a for loop holds code that runs only if the loop terminates normally (i.e. not via break). Your loop always terminates normally, so the else is always executed.
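To make the for/else point concrete, here is a small sketch with a hypothetical find_first_negative function, where the else branch matters because the loop can exit through break:

```python
def find_first_negative(numbers):
    for n in numbers:
        if n < 0:
            result = n
            break
    else:
        # Runs only if the loop finished without hitting `break`.
        result = None
    return result

print(find_first_negative([3, -1, 4]))  # -1
print(find_first_negative([1, 2]))      # None
```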
So this is not really an answer to the question how to do that in a list comprehension, but more of an alternative. Instead of adding spaces after all words, then removing the last space and joining everything together, you could just use two nested join generator expressions, one for the sentence and one for the words:
def decoder(code):
    return " ".join("".join(morse_code_dic[e] for e in i.split())
                    for i in code.split("   ")).capitalize()
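A runnable version of the nested-join decoder. morse_code_dic is not shown in the question, so the table below is a hypothetical, deliberately partial one covering only the letters in the sample; it assumes, as the question states, three spaces between words and one space between letters.

```python
# Hypothetical partial table: only the letters needed for the sample below.
morse_code_dic = {
    '..': 'I', '-': 'T', '.--': 'W', '.-': 'A', '...': 'S',
    '--.': 'G', '---': 'O', '-..': 'D', '-.--': 'Y',
}

def decoder(code):
    # Outer split: words (3 spaces). Inner split: letters (any whitespace).
    return " ".join("".join(morse_code_dic[e] for e in word.split())
                    for word in code.split("   ")).capitalize()

sample = ".. -   .-- .- ...   .-   --. --- --- -..   -.. .- -.--"
print(decoder(sample))  # It was a good day
```

str.capitalize lowercases everything after the first character, which is why the uppercase table values come out as a normal sentence.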
As mentioned in the comments, the else clause in your particular example is pointless because it always runs. Let's contrive an example that would let us investigate the possibility of simulating a break and else.
Take the following string:
s = 'a,b,c b,c,d c,d,e, d,e,f'
Let's say you wanted to split the string by spaces and commas as before, but you only wanted to preserve the elements of the inner split up to the first occurrence of c:
out = []
for i in s.split():
    for e in i.split(','):
        if e == 'c':
            break
        out.append(e)
    else:
        out.append('-')
The break can be simulated using the arcane two-argument form of iter, which accepts a callable and a sentinel value; iteration stops as soon as the callable returns the sentinel:
>>> x = list('abcd')
>>> list(iter(iter(x).__next__, 'c'))
['a', 'b']
You can implement the else by chaining the inner iterable with ['-'].
>>> from itertools import chain
>>> x = list('abcd')
>>> list(iter(chain(x, ['-']).__next__, 'c'))
['a', 'b']
>>> y = list('def')
>>> list(iter(chain(y, ['-']).__next__, 'c'))
['d', 'e', 'f', '-']
Notice that the placement of chain is crucial here. If you were to chain the dash to the outer iterator, it would always be appended, not only when c is not encountered:
>>> list(chain(iter(iter(x).__next__, 'c'), ['-']))
['a', 'b', '-']
You can now simulate the entire nested loop with a single expression:
from itertools import chain
out = [e for i in s.split() for e in iter(chain(i.split(','), ['-']).__next__, 'c')]
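To check that the single expression really reproduces the nested loop with break and else, both can be run side by side on the same string. This is only a verification sketch; loop_out and comp_out are hypothetical names.

```python
from itertools import chain

s = 'a,b,c b,c,d c,d,e, d,e,f'

# Explicit nested loop: the inner else fires only when no 'c' triggered a break.
loop_out = []
for i in s.split():
    for e in i.split(','):
        if e == 'c':
            break
        loop_out.append(e)
    else:
        loop_out.append('-')

# Single-expression version: iter(callable, sentinel) stops at 'c',
# and the chained '-' is only reached when no 'c' occurs.
comp_out = [e for i in s.split() for e in iter(chain(i.split(','), ['-']).__next__, 'c')]

print(loop_out)              # ['a', 'b', 'b', 'd', 'e', 'f', '-']
print(loop_out == comp_out)  # True
```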

How to find the same elements in two lists without turning them into sets?

I am making a function that corrects misspelled English words, but I have a problem.
What I want to do is find the letters the two words have in common. I know I can do this with the intersection method of sets, but that removes all duplicate letters.
wrong_word='addition'
probably_right_word='addiction'
common_letters=common(wrong_word, probably_right_word)
#answer should be ['a','d','d','i','t','i','o','n'] here letter 'c' is not present that's what I wanted.
# set(wrong_word) & set(probably_right_word) removes the duplicate letters, so it is not a valid answer.
#other example of my problem
list1=[1,1,2,3,4,1]
list2=[1,1,3,6]
result=[1,1,3]
#as shown result is the list of the similar elements in the both list.
Builds a dict of counts of elements in l2. Uses that to decide which elements in l1 to include.
w1='addition'
w2='addiction'
#other example of my problem
list1=[1,1,2,3,4,1]
list2=[1,1,3,6]
def f(l1, l2):
    # build dict of counts of elements in l2
    l2c = {}
    for i in l2:
        l2c[i] = l2c.get(i, 0) + 1
    res = []
    for x in l1:
        if l2c.get(x,0) > 0:
            res.append(x)
            l2c[x]-=1
    return res
print(f(list1,list2))
print(f(w1,w2))
This achieves what you've asked for, but for real use cases this algorithm could be problematic. I've given examples in my comment above on the main thread of why this could cause issues, depending on what you are trying to do.
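The same counting approach can lean on collections.Counter, which returns 0 for missing keys and so replaces the manual dict.get bookkeeping. A sketch, with common_elements as a hypothetical name:

```python
from collections import Counter

def common_elements(l1, l2):
    """Multiset intersection of l1 and l2, kept in l1's order."""
    remaining = Counter(l2)  # how many copies of each element l2 can still match
    result = []
    for x in l1:
        if remaining[x] > 0:  # a missing key counts as 0 in a Counter
            result.append(x)
            remaining[x] -= 1
    return result

print(common_elements([1, 1, 2, 3, 4, 1], [1, 1, 3, 6]))  # [1, 1, 3]
print(common_elements('addition', 'addiction'))
# ['a', 'd', 'd', 'i', 't', 'i', 'o', 'n']

# Counter's & operator computes the same multiset, but elements() groups
# equal items together instead of preserving the original order.
print(list((Counter('addition') & Counter('addiction')).elements()))
```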
Try the following:
>>> [char for char in wrong_word if char in probably_right_word]
['a', 'd', 'd', 'i', 't', 'i', 'o', 'n']
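Simply iterate through the characters of one word and add each one to the list only if it also appears in the other word. Note that this checks membership, not counts: with list1 and list2 from the question it returns [1, 1, 3, 1] rather than [1, 1, 3].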

How to print the first letter of each word in a sentence?

I got this question on a quiz last week, and a lot of people got it wrong, so I am pretty sure it will be on our midterm:
Write a function that takes as a parameter a list of strings and
returns a list containing the first letter of each of the strings.
That is, if the input parameter is ["Daniel","Nashyl","Orla",
"Simone","Zakaria"], your function should return ['D', 'N', 'O', 'S',
'Z']. The file you submit should include a main() function that
demonstrates that your function works.
I know you can use slicing like [#:#] to print any letters of a word or sentence.
>>> x = "Java, Python, Ruby"
>>> x[:13]
'Java, Python,'
>>> x[:-1]
'Java, Python, Rub'
>>> x[:1]
'J'
But I get confused when it comes to printing the first letter of a bunch of words. I also think that the ".split" function is needed here. I am using Python 3.3.3.
def first_letters(lst):
    return [s[:1] for s in lst]
def main():
    lst = ["Daniel","Nashyl","Orla", "Simone","Zakaria"]
    assert first_letters(lst) == ['D', 'N', 'O', 'S', 'Z']
if __name__=="__main__":
    main()
str.split takes a string and breaks it into a list of strings. Your input is already a list of strings, therefore you do not need .split.
"mystring"[:1] gets the first character of the string (or "" if the string is "" to begin with). Apply this to each string in the input list, and return the result.
You can do this with a list comprehension. You'll definitely want to read about them! Here's a minimal example that does what you're looking for:
>>> L = ["Daniel","Nashyl","Orla", "Simone","Zakaria"]
>>> [item[0] for item in L]
['D', 'N', 'O', 'S', 'Z']
This loops through each name in your list and creates a new list from the first letter of each item in the original list. For example, "Daniel"[0] == 'D'. No .split is needed.
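The two answers differ in one edge case: s[:1] (slicing) tolerates an empty string, while item[0] (indexing) raises IndexError. The names list below, with an empty string added, is a hypothetical input chosen to show that difference.

```python
names = ["Daniel", "", "Orla"]  # note the empty string in the middle

# Slicing never raises: an empty string just gives an empty slice.
print([s[:1] for s in names])  # ['D', '', 'O']

# Indexing raises IndexError when it reaches the empty string.
raised = False
try:
    print([s[0] for s in names])
except IndexError:
    raised = True
print(raised)  # True
```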
List comprehensions are cool, and you should indeed learn to use them, but let me explain a bit of what's going on here, since in your question you said you're confused about how to do it with a bunch of strings.
So, you have a list of strings. Lists are an iterable collection, which means we can iterate through it using, for example, a for loop:
words = ["Daniel","Nashyl","Orla", "Simone","Zakaria"]
for word in words:
    print(word[:1])
I'm sure you were taught about loops like this in class. Now, instead of printing the first letter, let's construct a new list that contains those letters:
result = []
for word in words:
    result.append(word[:1])
Here I created a new list, then for every word, I appended the starting letter of that word to the new list. A list comprehension does the same thing, with a more obscure syntax, more elegance, and a bit more efficiency:
result = [word[:1] for word in words]
This is the gist of it.

Manipulating counter information - Python 2.7

I'm fairly new to Python and I have this program that I was tinkering with. It's supposed to get a string from input and display which character is the most frequent.
stringToData = raw_input("Please enter your string: ")
# import the collections module
import collections
# gets the data needed from the collection
letter, count = collections.Counter(stringToData).most_common(1)[0]
# prints the results
print "The most frequent character is %s, which occurred %d times." % (
letter, count)
However, if every character in the string occurs only once, it displays just one letter and says it's the most frequent character. I thought about changing the number in the parentheses of most_common(number), but I didn't want it to display the counts of all the other letters every time.
Thank you to all that help!
As I explained in the comment:
You can leave off the parameter to most_common to get a list of all characters, ordered from most common to least common. Then just loop through that result and collect the characters as long as the counter value is still the same. That way you get all characters that are most common.
Counter.most_common(n) returns the n most common elements from the counter. If n is not specified, it returns all elements from the counter, ordered by count.
>>> collections.Counter('abcdab').most_common()
[('a', 2), ('b', 2), ('c', 1), ('d', 1)]
You can use this behavior to simply loop through all elements, ordered by their count. As long as the count is the same as that of the first element in the output, you know that the element occurred just as often in the string.
>>> c = collections.Counter('abcdefgabc')
>>> maxCount = c.most_common(1)[0][1]
>>> elements = []
>>> for element, count in c.most_common():
...     if count != maxCount:
...         break
...     elements.append(element)
...
>>> elements
['a', 'c', 'b']
>>> [e for e, n in c.most_common() if n == maxCount]
['a', 'c', 'b']
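Applied to the original question, a small helper can report every character tied for the highest count, including the case where each character occurs once. This is a sketch with a hypothetical most_frequent function; it takes the maximum with max() instead of relying on most_common's ordering, and is written to run on both Python 2.7 and 3.

```python
from collections import Counter

def most_frequent(text):
    """Return (all characters tied for the highest count, that count)."""
    counts = Counter(text)
    if not counts:
        return [], 0
    top = max(counts.values())
    return [ch for ch, n in counts.items() if n == top], top

chars, count = most_frequent("abcdefgabc")
print("The most frequent characters are %s, which occurred %d times." % (
    ", ".join(chars), count))
```

The order of tied characters follows the Counter's iteration order, so sort chars if you need a stable listing.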
