Compare variable number of list elements - python

I have a list containing a string in each element. I want to compare the characters of each string, position by position, from the first character onward. The loop runs over the length of the shortest string in the list.
For example:
strs = ["flower", "flow", "flight"]
The comparison would look something like this:
for sub_i in range(len(min(strs, key=len))):
    if strs[0][sub_i] == strs[1][sub_i] == strs[2][sub_i]:
        pass  # do something
How would I expand this so that I can have an arbitrary number of elements in strs? (Instead of just 3 in my example)

For some k
len(set([s[:k] for s in strs])) == 1
Example:
strs = ["flower", "flow", "flight"]
k = 2
if len(set([s[:k] for s in strs])) == 1:
    # do something
    print("same")
Output:
same

For arbitrary lengths, you can zip() the strings. This will automatically iterate over the length of the shortest string. Then determine whether all the letters are the same. The code below converts each group of letters to a set() and checks its length (which will be 1 if all elements are equal), but of course there are other ways:
strs = ["flower", "flow", "flight"]
for letters in zip(*strs):
    if len(set(letters)) == 1:
        # do something
        print(letters)
Prints:
('f', 'f', 'f')
('l', 'l', 'l')
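One of those other ways, as a rough sketch: compare every letter to the first one with all() and stop at the first mismatch. The function name shared_prefix is made up for illustration, and stopping at the first mismatch is an assumption about what "do something" should be.

```python
def shared_prefix(strs):
    """Return the leading characters that every string in strs has in common."""
    prefix = []
    # zip(*strs) yields one tuple of characters per position and stops
    # at the end of the shortest string, however many strings there are.
    for letters in zip(*strs):
        # all() compares each character in the tuple to the first one.
        if not all(ch == letters[0] for ch in letters):
            break
        prefix.append(letters[0])
    return "".join(prefix)


print(shared_prefix(["flower", "flow", "flight"]))                  # fl
print(shared_prefix(["flower", "flow", "flight", "fly", "floor"]))  # fl
```

Nothing in the function depends on there being three strings, so the same call works for any list length.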


Continuous letter check for items in list [duplicate]

I need to know how to identify prefixes in strings in a list. For example,
list = ['nomad', 'normal', 'nonstop', 'noob']
Its answer should be 'no' since every string in the list starts with 'no'
I was wondering if there is a method that iterates each letter in strings in the list at the same time and checks each letter is the same with each other.
Use os.path.commonprefix; it will do exactly what you want.
In [1]: list = ['nomad', 'normal', 'nonstop', 'noob']
In [2]: import os.path as p
In [3]: p.commonprefix(list)
Out[3]: 'no'
As an aside, naming a list "list" shadows the built-in list class, so list() is no longer reachable under that name in that scope. I would recommend using a different variable name.
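A small sketch of the same call outside IPython, plus two edge cases worth knowing about. The variable name words is only there to avoid shadowing list; the path strings are made-up examples.

```python
import os.path

words = ['nomad', 'normal', 'nonstop', 'noob']
print(os.path.commonprefix(words))  # no

# The comparison is purely character-based, so with real paths the result
# can stop in the middle of a directory name.
print(os.path.commonprefix(['/usr/lib', '/usr/local']))  # /usr/l

# An empty list gives an empty string rather than an error.
print(repr(os.path.commonprefix([])))  # ''
```

Because it works on plain characters, the function is just as usable for ordinary words as it is for paths.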
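Here is a version without libraries:
l = ['nomad', 'normal', 'nonstop', 'noob']
prefix = l[0]  # used if the whole first string is a common prefix
for i in range(len(l[0]) + 1):
    # stop at the first length where some string's prefix differs
    if any(l[0][:i] != j[:i] for j in l):
        prefix = l[0][:i-1]
        break
print(prefix)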
gives output:
no
There is no built-in function to do this. If you are looking for short Python code that can do this for you, here's my attempt:
def longest_common_prefix(words):
    i = 0
    # The bound on i keeps the loop from running forever when all words are identical.
    while i <= len(words[0]) and len(set([word[:i] for word in words])) <= 1:
        i += 1
    return words[0][:i-1]
Explanation: words is an iterable of strings. The list comprehension
[word[:i] for word in words]
uses string slices to take the first i letters of each string. At the beginning, these would all be empty strings. Then, it would consist of the first letter of each word. Then the first two letters, and so on.
Casting to a set removes duplicates. For example, set([1, 2, 2, 3]) = {1, 2, 3}. By casting our list of prefixes to a set, we remove duplicates. If the length of the set is less than or equal to one, then they are all identical.
The counter i just keeps track of how many letters are identical so far.
We return words[0][:i-1]. We arbitrarily choose the first word and take the first i-1 letters (which would be the same for any word in the list). The reason that it's i-1 and not i is that i gets incremented before we check if all of the words still share the same prefix.
Here's a fun one:
l = ['nomad', 'normal', 'nonstop', 'noob']
def common_prefix(lst):
    for s in zip(*lst):
        if len(set(s)) == 1:
            yield s[0]
        else:
            return
result = ''.join(common_prefix(l))
Result:
'no'
To answer the spirit of your question - zip(*lst) is what allows you to "iterate letters in every string in the list at the same time". For example, list(zip(*lst)) would look like this:
[('n', 'n', 'n', 'n'), ('o', 'o', 'o', 'o'), ('m', 'r', 'n', 'o'), ('a', 'm', 's', 'b')]
Now all you need to do is find out the common elements, i.e. the len of set for each group, and if they're common (len(set(s)) == 1) then join it back.
As an aside, you probably don't want to call your list by the name list. Any time you call list() afterwards is gonna be a headache. It's bad practice to shadow built-in names.
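If you prefer not to write a generator, the same zip(*lst) idea can be sketched with itertools.takewhile, which keeps taking column tuples as long as the test holds. The function name common_prefix_takewhile is invented here so it does not clash with the generator version.

```python
from itertools import takewhile


def common_prefix_takewhile(lst):
    # Keep the leading column tuples whose characters are all identical;
    # takewhile stops at the first column that fails the test.
    same = takewhile(lambda column: len(set(column)) == 1, zip(*lst))
    return ''.join(column[0] for column in same)


words = ['nomad', 'normal', 'nonstop', 'noob']
print(common_prefix_takewhile(words))  # no
```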

Compare list with an exact or scrambled match of a long text

I have the following list. I want to iterate over it, find whether each item has a scramble match in the long string aapxjdnrbtvldptfzbbdbbzxtndrvjblnzjfpvhdhhpxjdnrbt, and return the number of matches. The example below should return 4.
A scrambled string starts and ends with the same letters as the original; the remaining letters are rearranged.
long_string = 'aapxjdnrbtvldptfzbbdbbzxtndrvjblnzjfpvhdhhpxjdnrbt'
my_list = [
    'axpaj',  # this is scrambled version of aapxj
    'apxaj',  # this is scrambled version of aapxj
    'dnrbt',  # this is exact match of dnrbt
    'pjxdn',  # this is scrambled version of pxjdn
    'abd',
]
matches = 0
for l in my_list:
    # check for exact match
    if l in long_string:
        matches += 1
    # check for a scramble match
    # ...
# matches = 1. Wrong should be 4.
def is_anagram(str1, str2):
    str1_list = list(str1)
    str1_list.sort()
    str2_list = list(str2)
    str2_list.sort()
    return (str1_list == str2_list)
is_anagram('axpaj' , 'aapxjdnrbtvldptfzbbdbbzxtndrvjblnzjfpvhdhhpxjdnrbt')
['a', 'a', 'j', 'p', 'x']
['a', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'd', ...]
This creates sorted match strings for each different word length required. It builds them on the fly to avoid excess processing.
(Edit: Oops, the previous version assumed one long string in doing the caching. Thanks for the catch, #BeRT2me!)
long_string = 'aapxjdnrbtvldptfzbbdbbzxtndrvjblnzjfpvhdhhpxjdnrbt'
my_list = [
    'axpaj',  # this is scrambled version of aapxj
    'apxaj',  # this is scrambled version of aapxj
    'dnrbt',  # this is exact match of dnrbt
    'pjxdn',  # this is scrambled version of pxjdn
    'abd',
]
anagrams = {}  # anagrams contains sorted slices for each word length
def is_anagram(str1, str2):
    lettercount = len(str1)
    cachekey = (str2, lettercount)
    if cachekey not in anagrams:
        # build the list for that letter length
        anagrams[cachekey] = [sorted(str2[x:x+lettercount]) for x in range(len(str2)-lettercount+1)]
    return (sorted(str1) in anagrams[cachekey])
matches = 0
for l in my_list:
    if is_anagram(l, long_string):
        matches += 1
print(f"There are {matches} in my list.")
I think step 1 of finding a solution is to write code which cuts off the first and last letter of a string.
someFunction("abcde") should return "bcd"
Next you'd need some way to check if the letters in the string are all the same. I would do this by alphabetising and comparing corresponding elements.
alphabetize("khfj") gives "fhjk"
isSame("abc","abc") gives True
You'd then need a way of cutting the string into every substring of a specified length, e.g.:
thisFunction("abcdef", 2) gives ["ab", "bc", "cd", "de", "ef"]
Once you have every possible substring of the right length, you can check for scramble matches by checking each item in the list against every substring of long_string with the same length.
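The steps above could be put together roughly like this. All function names here (inner, alphabetize, substrings, is_scramble, count_matches) are placeholders standing in for someFunction, alphabetize, isSame and thisFunction; the sketch assumes every word is non-empty and that a word counts at most once, even if it matches several substrings.

```python
def inner(s):
    """Cut off the first and last letter: inner("abcde") gives "bcd"."""
    return s[1:-1]


def alphabetize(s):
    """Sort the letters: alphabetize("khfj") gives "fhjk"."""
    return "".join(sorted(s))


def substrings(s, n):
    """Every substring of length n, in order of appearance."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def is_scramble(word, candidate):
    # Same first letter, same last letter, same inner letters in any order.
    # An exact match also satisfies this test.
    return (len(word) == len(candidate)
            and word[0] == candidate[0]
            and word[-1] == candidate[-1]
            and alphabetize(inner(word)) == alphabetize(inner(candidate)))


def count_matches(words, text):
    # A word counts once if at least one substring of the same length matches.
    return sum(
        any(is_scramble(word, sub) for sub in substrings(text, len(word)))
        for word in words
    )


long_string = 'aapxjdnrbtvldptfzbbdbbzxtndrvjblnzjfpvhdhhpxjdnrbt'
my_list = ['axpaj', 'apxaj', 'dnrbt', 'pjxdn', 'abd']
print(count_matches(my_list, long_string))  # 4
```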
Here's a basic approach – iterate over all substrings of long_string with the matching length and check if they are equal to the search item after sorting.
matches = 0
for x in my_list:
    # simple case 1: exact match
    if x in long_string:
        matches += 1
        continue
    # sort the string - returns a list
    x = sorted(x)
    # simple case 2: exact match after sorting
    if ''.join(x) in long_string:
        matches += 1
        continue
    window_size = len(x)
    # + 1 so that the last window of the string is checked too
    for i in range(len(long_string) - window_size + 1):
        # sort the substring
        y = sorted(long_string[i:i+window_size])
        # compare - note that both x and y are lists,
        # but it doesn't make any difference for this purpose
        if x == y:
            matches += 1
            break
Note that this won't be very efficient because the substrings are re-sorted for every loop. An easy way to optimize would be to store the sorted substrings in a dictionary that maps substring lengths to sets of sorted substrings ({4: {'aapx', 'ajpx', 'djpx', ...}}).
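A possible sketch of that dictionary optimization, assuming the same plain-anagram test as the loop above (no first/last letter constraint). The names count_anagram_matches and sorted_windows are made up for the example.

```python
def count_anagram_matches(words, text):
    sorted_windows = {}  # window length -> set of sorted substrings
    matches = 0
    for word in words:
        size = len(word)
        # Build the set for this length only once, the first time it is needed.
        if size not in sorted_windows:
            sorted_windows[size] = {
                ''.join(sorted(text[i:i + size]))
                for i in range(len(text) - size + 1)
            }
        # A set lookup replaces re-sorting every window for every word.
        if ''.join(sorted(word)) in sorted_windows[size]:
            matches += 1
    return matches


long_string = 'aapxjdnrbtvldptfzbbdbbzxtndrvjblnzjfpvhdhhpxjdnrbt'
my_list = ['axpaj', 'apxaj', 'dnrbt', 'pjxdn', 'abd']
print(count_anagram_matches(my_list, long_string))  # 4
```

Each substring is sorted once per word length instead of once per word, which is where the saving comes from when many words share a length.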
Throw a little Regex at it for an interesting solution:
Finds str matches that:
Start and end the same.
Are the same length.
Only contain the same inner letters as the sublist.
Checks that the sorted item matches the sorted substring, and that the two don't match unsorted.
import re
def is_anagram(str1, str2):
    return sorted(str1) == sorted(str2) and str1 != str2
def get_matches(str1, str2):
    start, middle, end = str1[0], str1[1:-1], str1[-1]
    pattern = fr'{start}[{middle}]{{{len(middle)}}}{end}'
    return re.findall(pattern, str2)
matches = 0
for l in my_list:
    for item in get_matches(l, long_string):
        if is_anagram(item, l):
            matches += 1
print(matches)
# Output:
4

How to remove items from a list without the [''] part

I am trying to get the word "Test" by taking each character out of the list using positions within it.
Here is my code:
test1 = ["T", "E", "S", "T"]
one = test1[0:1]
two = test1[1:2]
three = test1[2:3]
four = test1[3:4]
print(one, two, three, four)
At the moment my output from the program is:
['T'] ['E'] ['S'] ['T']
Although that does read "Test" it has [] around each letter which I don't want.
[a:b] returns a list with every value from index a up to, but not including, index b.
If you just want to access a single value from a list you just need to point to the index of the value to access. E.g.
s = ['T', 'e', 's', 't']
print(s[0]) # T
print(s[0:1]) # ['T']
The problem is that you are using slices of the list, not elements. The syntax l[i1:i2] returns a list with all elements of l from index i1 up to, but not including, index i2. Out-of-range slice bounds are silently clamped, whereas an out-of-range index raises an IndexError. To do what you intended you can do:
one = test1[0]
two = test1[1]
...
You have slicing and indexing confused. You are using slicing where you should use indexing.
Slicing always returns a new object of the same type, containing the selected elements. Slicing a list always gives you a list again:
>>> test1 = ["T","E","S","T"]
>>> test1[1:3]
['E', 'S']
>>> test1[:1]
['T']
while indexing uses individual positions only (no : colons to separate start and end positions), and gives you the individual elements from the list:
>>> test1[0]
'T'
>>> test1[1]
'E'
Not that you need to use indexing at all. Use the str.join() method instead; given a separator string, this joins the string elements of a list together with that delimiter in between. Use the empty string:
>>> ''.join(test1)
'TEST'
Try this:
test1 = ["T","E","S","T"]
final = ""
# append each element to the result string, one position at a time
for i in range(0, len(test1)):
    final = final + str(test1[i])
print(final)
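The loop above calls str() on every element, which only matters if the list might hold something other than strings. A rough equivalent with join, assuming a made-up list items that mixes types:

```python
items = ["T", 3, "S", "T"]

# str.join() only accepts strings, so convert each element first.
word = ''.join(map(str, items))
print(word)  # T3ST
```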

Manipulating counter information - Python 2.7

I'm fairly new to Python and I have this program that I was tinkering with. It's supposed to get a string from input and display which character is the most frequent.
stringToData = raw_input("Please enter your string: ")
# import the collections module
import collections
# gets the data needed from the collection
letter, count = collections.Counter(stringToData).most_common(1)[0]
# prints the results
print "The most frequent character is %s, which occurred %d times." % (
letter, count)
However, if the string has 1 of each character, it only displays one letter and says it's the most frequent character. I thought about changing the number in the parentheses in most_common(number), but I didn't want it to display how many times the other letters occurred every time.
Thank you to all that help!
As I explained in the comment:
You can leave off the parameter to most_common to get a list of all characters, ordered from most common to least common. Then just loop through that result and collect the characters as long as the counter value is still the same. That way you get all characters that are most common.
Counter.most_common(n) returns the n most common elements from the counter. Or in case where n is not specified, it will return all elements from the counter, ordered by the count.
>>> collections.Counter('abcdab').most_common()
[('a', 2), ('b', 2), ('c', 1), ('d', 1)]
You can use this behavior to simply loop through all elements, ordered by their count. As long as the count is the same as that of the first element in the output, you know that the element still occurred the same number of times in the string.
>>> c = collections.Counter('abcdefgabc')
>>> maxCount = c.most_common(1)[0][1]
>>> elements = []
>>> for element, count in c.most_common():
...     if count != maxCount:
...         break
...     elements.append(element)
...
>>> elements
['a', 'c', 'b']
>>> [e for e, n in c.most_common() if n == maxCount]
['a', 'c', 'b']
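For comparison, a sketch of the same idea that takes the maximum count directly instead of looping until the count changes. It is written for Python 3 (the question uses 2.7), the function name most_frequent is invented, and the result is sorted only to make the order predictable.

```python
from collections import Counter


def most_frequent(text):
    """Return (characters tied for the highest count, that count)."""
    counts = Counter(text)
    if not counts:
        # Empty input: nothing to report.
        return [], 0
    top = max(counts.values())
    # Keep every character whose count equals the maximum.
    return sorted(ch for ch, n in counts.items() if n == top), top


letters, count = most_frequent('abcdefgabc')
print(letters, count)  # ['a', 'b', 'c'] 2
```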

sorting a list in python

My aim is to sort a list of strings alphabetically, except that words starting with "s" should be at the start of the list (they should be sorted as well), followed by the other words.
The below function does that for me.
def mysort(words):
    mylist1 = sorted([i for i in words if i[:1] == "s"])
    mylist2 = sorted([i for i in words if i[:1] != "s"])
    list = mylist1 + mylist2
    return list
I am just looking for alternative approaches to achieve this or if anyone can find any issues with the code above.
You could do it in one line, with:
sorted(words, key=lambda x: 'a' + x if x.startswith('s') else 'b' + x)
The sorted() function takes a keyword argument key, which is used to translate the values in the list before comparisons are done.
For example:
sorted(words, key=str.lower)
# Will do a sort that ignores the case, since instead
# of checking 'A' vs. 'b' it will check str.lower('A')
# vs. str.lower('b').
sorted(intlist, key=abs)
# Will sort a list of integers by magnitude, regardless
# of whether they're negative or positive:
# >>> sorted([-5,2,1,-8], key=abs)
# [1, 2, -5, -8]
The trick I used translated strings like this when doing the sorting:
"hello" => "bhello"
"steve" => "asteve"
And so "steve" would come before "hello" in the comparisons, since the comparisons are done with the a/b prefix.
Note that this only affects the keys used for comparisons, not the data items that come out of the sort.
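To make that last point concrete, here is a small sketch that prints the translated keys next to the sorted result. The helper name s_first and the word list are invented for the example.

```python
words = ["hello", "steve", "apple", "sam", "zebra"]


def s_first(word):
    # Prefix 'a' for words starting with 's' and 'b' for everything else,
    # so the 's' words compare as smaller.
    return ('a' if word.startswith('s') else 'b') + word


# The keys that sorted() actually compares:
print([s_first(w) for w in words])
# ['bhello', 'asteve', 'bapple', 'asam', 'bzebra']

# The items that come out are the original, unprefixed strings:
print(sorted(words, key=s_first))
# ['sam', 'steve', 'apple', 'hello', 'zebra']
```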
1. You can use generator expression inside sorted.
2. You can use str.startswith.
3. Don't use list as a variable name.
4. Use key=str.lower in sorted.
def mysort(words):
    mylist1 = sorted((i for i in words if i.startswith(("s","S"))),key=str.lower)
    mylist2 = sorted((i for i in words if not i.startswith(("s","S"))),key=str.lower)
    return mylist1 + mylist2
Why str.lower?
>>> "abc" > "BCD"
True
>>> "abc" > "BCD".lower() #fair comparison
False
>>> l = ['z', 'a', 'b', 's', 'sa', 'sb', '', 'sz']
>>> sorted(l, key=lambda x:(x[0].replace('s','\x01').replace('S','\x01') if x else '') + x[1:])
['', 's', 'sa', 'sb', 'sz', 'a', 'b', 'z']
This key function replaces, for the purpose of sorting, every value starting with S or s with a \x01 which sorts before everything else.
Along the lines of Integer's answer, I like using a tuple slightly better because it is cleaner and also more general (works for arbitrary elements, not just strings):
sorted(words, key=lambda x : ((1 if x[:1] in ("S", "s") else 2), x))
Explanation:
The key parameter allows sorting a list based on the values of f(item) instead of on the values of item, where f is an arbitrary function.
In this case the function is anonymous (lambda) and returns a tuple where the first element is the "group" you want your element to end up in (e.g. 1 if the string starts with an "s" and 2 otherwise).
Using a tuple works because tuple comparison is lexicographical on the elements and therefore in the sorting the group code will weigh more than the element.
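A runnable sketch of the tuple key, with one addition that is not in the answer above: the second element is lowercased so that case does not affect the order within a group. The function name group_then_word and the sample words are made up.

```python
words = ["banana", "Sun", "apple", "sea", "Zoo", "cherry"]


def group_then_word(word):
    # Group 0 for words starting with "s" or "S", group 1 for the rest.
    # word[:1] is used instead of word[0] so an empty string does not fail.
    group = 0 if word[:1] in ("s", "S") else 1
    # Tuples compare element by element: group first, then the lowercased word.
    return (group, word.lower())


print(sorted(words, key=group_then_word))
# ['sea', 'Sun', 'apple', 'banana', 'cherry', 'Zoo']
```

The group number decides the order first; the word itself is only consulted to break ties inside a group.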
