I encountered a snippet of code like the following:
array = ['a', 'b', 'c']
ids = [array.index(cls.lower()) for cls in array]
I'm confused about two points:
What does [... for cls in array] mean? Since cls is a reserved keyword for class, why not just use [... for s in array]?
Why bother writing something this complicated instead of just [i for i in range(len(array))]?
I believe this code was written by someone more experienced with Python than me, and I believe they must have had some reason for doing so...
cls is not a reserved word for class. That would be a very poor choice of name by the language designer. Many programmers may use it by convention but it is no more reserved than the parameter name self.
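If you want to check this yourself, the standard library's keyword module can tell you which names are actually reserved. A quick sketch (the variable name is_reserved is just for illustration):

```python
import keyword

# `class` is a true keyword; `cls` and `self` are ordinary identifiers
# that are only used by convention.
is_reserved = {name: keyword.iskeyword(name) for name in ("class", "cls", "self")}
print(is_reserved)  # {'class': True, 'cls': False, 'self': False}
```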
If you use distinct upper and lower case characters in the list, you will see the difference:
array = ['a', 'b', 'c', 'B','A','c']
ids = [array.index(cls.lower()) for cls in array]
print(ids)
[0, 1, 2, 1, 0, 2]
The value at position 3 is 1 instead of 3 because the first occurrence of the lowercase 'b' is at index 1. Similarly, the value at the last position is 2 instead of 5 because the first 'c' is at index 2.
This list comprehension requires that the array always contain a lowercase instance of every uppercase letter. For example, ['a', 'B', 'c'] would raise a ValueError because 'b' is not in the list. Hopefully there are other safeguards in the rest of the program to ensure that this requirement is always met.
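To see the failure concretely, here is a small sketch that catches the exception instead of letting it propagate (error_message is a name I made up for the example):

```python
array = ['a', 'B', 'c']

try:
    # 'B'.lower() is 'b', which does not appear anywhere in the list
    ids = [array.index(cls.lower()) for cls in array]
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # reports that 'b' is not in the list
```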
A safer, and more efficient way to write this would be to build a dictionary of character positions before going through the array to get indexes. This would make the time complexity O(n) instead of O(n^2). It could also help make the process more robust.
array = ['a', 'b', 'c', 'B','A','c','Z']
firstchar = {c:-i for i,c in enumerate(array[::-1],1-len(array))}
ids = [firstchar.get(c.lower()) for c in array]
print(ids)
[0, 1, 2, 1, 0, 2, None]
The firstchar dictionary contains the first index in array containing a given letter. It is built by going backward through the array so that the smallest index remains when there are multiple occurrences of the same letter.
{'Z': 6, 'c': 2, 'A': 4, 'B': 3, 'b': 1, 'a': 0}
Then, going through the array to form ids, each character finds the corresponding index in O(1) time by using the dictionary.
Using the .get() method allows the list comprehension to survive an upper case letter without a corresponding lowercase value in the list. In this example it returns None but it could also be made to return the letter's index or the index of the first uppercase instance.
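As a rough sketch of the "return the letter's own index" variant, the dictionary can also be built with a forward pass and setdefault, and the second argument of .get() used as the fallback. This is only one possible way to do it, not necessarily what the original author intended:

```python
array = ['a', 'b', 'c', 'B', 'A', 'c', 'Z']

# First index of every element, built with a forward pass:
# setdefault only stores the index the first time a value is seen.
firstchar = {}
for i, c in enumerate(array):
    firstchar.setdefault(c, i)

# Fall back to the element's own first index when no lowercase form exists.
ids = [firstchar.get(c.lower(), firstchar[c]) for c in array]
print(ids)  # [0, 1, 2, 1, 0, 2, 6]
```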
Some developers might be experienced, but actually terrible with the code they write and just "skate on by".
Having said that, your suggested code for question #2 would give a different result if the list contained any element twice. The original code returns the index of the first occurrence of each element, whereas yours gives each individual item's own index. The results would also differ if the array elements weren't all lowercase.
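A tiny example with a duplicated element shows the two approaches diverging (by_index and by_range are names chosen just for this illustration):

```python
array = ['a', 'b', 'a']

by_index = [array.index(s.lower()) for s in array]  # first occurrence of each value
by_range = [i for i in range(len(array))]           # position of each item

print(by_index)  # [0, 1, 0]
print(by_range)  # [0, 1, 2]
```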
I need to know how to identify prefixes in strings in a list. For example,
list = ['nomad', 'normal', 'nonstop', 'noob']
The answer should be 'no', since every string in the list starts with 'no'.
I was wondering if there is a method that iterates over the letters of all the strings in the list at the same time and checks whether the letters at each position are the same.
Use os.path.commonprefix; it will do exactly what you want.
In [1]: list = ['nomad', 'normal', 'nonstop', 'noob']
In [2]: import os.path as p
In [3]: p.commonprefix(list)
Out[3]: 'no'
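One thing worth knowing is that commonprefix compares character by character, not path component by path component, which is exactly why it works for plain words. A small sketch with made-up paths shows the effect:

```python
import os.path

paths = ['/usr/lib', '/usr/local']

# The result is a common string prefix, which need not be a valid directory.
prefix = os.path.commonprefix(paths)
print(prefix)  # /usr/l
```

If you ever need whole path components instead, os.path.commonpath is the function meant for that.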
As an aside, naming a list "list" shadows the built-in list class for the rest of that scope, so I would recommend using a different variable name.
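Here is some code that does it without libraries:
l = ['nomad', 'normal', 'nonstop', 'noob']
for i in range(1, len(l[0]) + 1):
    if False in [l[0][:i] == j[:i] for j in l]:
        print(l[0][:i-1])
        break
else:
    # no mismatch found: the whole first word is the common prefix
    print(l[0])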
gives output:
no
There is no built-in function to do this. If you are looking for short python code that can do this for you, here's my attempt:
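def longest_common_prefix(words):
    i = 0
    # stop once i passes the shortest word; otherwise identical words would loop forever
    while i <= min(map(len, words)) and len(set([word[:i] for word in words])) <= 1:
        i += 1
    return words[0][:i-1]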
Explanation: words is an iterable of strings. The list comprehension
[word[:i] for word in words]
uses string slices to take the first i letters of each string. At the beginning, these would all be empty strings. Then, it would consist of the first letter of each word. Then the first two letters, and so on.
Casting to a set removes duplicates. For example, set([1, 2, 2, 3]) = {1, 2, 3}. By casting our list of prefixes to a set, we remove duplicates. If the length of the set is less than or equal to one, then they are all identical.
The counter i just keeps track of how many letters are identical so far.
We return words[0][:i-1]. We arbitrarily choose the first word and take its first i-1 letters (which would be the same for any word in the list). The reason that it's i-1 and not i is that i gets incremented before we check whether all of the words still share the same prefix.
Here's a fun one:
l = ['nomad', 'normal', 'nonstop', 'noob']
def common_prefix(lst):
    for s in zip(*lst):
        if len(set(s)) == 1:
            yield s[0]
        else:
            return
result = ''.join(common_prefix(l))
Result:
'no'
To answer the spirit of your question - zip(*lst) is what allows you to "iterate letters in every string in the list at the same time". For example, list(zip(*lst)) would look like this:
[('n', 'n', 'n', 'n'), ('o', 'o', 'o', 'o'), ('m', 'r', 'n', 'o'), ('a', 'm', 's', 'b')]
Now all you need to do is find out the common elements, i.e. the len of set for each group, and if they're common (len(set(s)) == 1) then join it back.
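One property of zip worth keeping in mind here: it stops at the shortest input, so the prefix found this way can never be longer than the shortest word. A quick sketch with a shorter word thrown in (the words list is made up for illustration):

```python
words = ['nomad', 'no', 'noob']

# zip(*words) yields one tuple per character position and stops
# as soon as the shortest string ('no') is exhausted.
columns = list(zip(*words))
print(columns)  # [('n', 'n', 'n'), ('o', 'o', 'o')]
```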
As an aside, you probably don't want to call your list by the name list. Any time you call list() afterwards is gonna be a headache. It's bad practice to shadow built-in names.
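To illustrate the headache, here is a small sketch that shadows list inside a function only, so the damage stays local (shadow_demo is a hypothetical helper, not part of the question):

```python
def shadow_demo():
    list = ['nomad', 'normal']  # the local name now hides the built-in
    try:
        list('abc')             # tries to call the list object, not the class
    except TypeError as exc:
        return str(exc)

print(shadow_demo())  # complains that a 'list' object is not callable
```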
I'm trying to get a list consisting of the indexes for each item of another sequence.
Sounds easy enough in theory.
a = 'string of letters'
b = [a.index(x) for x in a]
But it doesn't work. I've tried list comprehensions, simple for loops, using enumerate etc, but every time b will return the same index for duplicates in a.
That is, 's' in a, for example, will return 0 in b for both the first and last item because they're the same character.
I'm guessing it's a cache or something like that, as a way for Python to speed things up.
In any case, I can't figure this out and I'd appreciate some help as to how I can get this working as well as maybe an explanation of why this happens.
Thanks a lot for the input. I did figure it out with enumerate, actually.
To elaborate, I had two lists, a and b. a contains both uppercase and lowercase characters. b consists of the same characters as a, but shifted by a certain number of positions, like in a cipher.
I wanted to keep the case of the characters in b at the same positions after the 'encoding', but for that I needed the index of each uppercase character in a.
Anyway, it was as simple as this:
a = 'tEXt'
c = [x for x, y in enumerate(a) if y.isupper()]
b = ['x', 't', 't', 'e']  # the encoded version of 'a', returned from elsewhere as a string and converted here to a list
for x in c:
    b[x] = b[x].upper()
b = ''.join(b)
b
'xTTe'
.index just returns the first occurrence of a character in a string - this has nothing to do with caches. It seems like you just want the list of numbers from 0 until your string length-1:
b = list(range(len(a)))
You do not mention why you need this, but it's pretty rare to need something like this in Python. Note that in Python 3 range returns a special type of its own representing an immutable sequence of numbers, so you need to explicitly convert it to a list if you do actually need one.
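A short sketch of the behaviour described above, using the string from the question: .index always starts searching from the beginning unless you pass a start position, while enumerate gives you every item's own position.

```python
a = 'string of letters'

first_s = a.index('s')      # search starts at position 0
second_s = a.index('s', 1)  # optional second argument: where to start searching
print(first_s, second_s)    # 0 16

# enumerate pairs each character with its own position, duplicates included
positions = [i for i, _ in enumerate(a)]
print(positions == list(range(len(a))))  # True
```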
I refactored the code you posted as an answer, let me know if I understood things correctly.
from typing import List

def copy_case(a: str, b: str) -> str:
    res_chars: List[str] = []
    curr_a: str
    curr_b: str
    for curr_a, curr_b in zip(a, b):
        if curr_a.isupper():
            curr_b = curr_b.upper()
        else:
            curr_b = curr_b.lower()
        res_chars.append(curr_b)
    return ''.join(res_chars)

print(copy_case('tEXt', 'xTTe'))
One approach could be to build a dictionary, iterating over the distinct letters in the string and using re.finditer to obtain the index of all occurrences in the string. So going step by step:
import re
a = 'string of letters'
We can find the unique letters in the string by taking a set:
letters = set(a.replace(' ',''))
# {'e', 'f', 'g', 'i', 'l', 'n', 'o', 'r', 's', 't'}
Then we can use a dictionary comprehension to build the dictionary, in which the values are lists generated by iterating over all match instances returned by re.finditer (re.escape guards against characters that are special in regular expressions):
{w: [m.start() for m in re.finditer(re.escape(w), a)] for w in letters}
{'i': [3],
'o': [7],
'f': [8],
'l': [10],
'g': [5],
'e': [11, 14],
't': [1, 12, 13],
's': [0, 16],
'n': [4],
'r': [2, 15]}
A dict is probably better than a list for this purpose:
foo = {x: [] for x in a}  # creates a dict whose keys are the unique values in a
for i, x in enumerate(a):
    foo[x].append(i)  # adds each index to the list for its character
For example, for the string 'abababababa':
{'a': [0, 2, 4, 6, 8, 10], 'b': [1, 3, 5, 7, 9]}
Sounds like you're trying to get a list of the indices of each input char as an output. So, for s, you would get [0, 16], or something along those lines.
So for each input char, you would add its position to the right list.
Storing the results in a dict seems like a good approach, so, something like:
def index_dict(stringy):
    d = {}
    for index, char in enumerate(stringy):
        if char not in d:
            d[char] = []
        d[char].append(index)
    return d
The index() method always finds the first occurrence. You need to find all occurrences. So, the above function will give you a dict whose keys are the chars of your input string, and the value for each key is a list of the indices where that char is found.
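The same idea can be written a little more compactly with collections.defaultdict, which creates the empty list for you the first time a key is seen. This is just an alternative sketch; index_dict_dd is a hypothetical name:

```python
from collections import defaultdict

def index_dict_dd(text):
    """Map each character to the list of positions where it occurs."""
    d = defaultdict(list)  # missing keys start out as an empty list
    for index, char in enumerate(text):
        d[char].append(index)
    return dict(d)         # convert back to a plain dict for the caller

result = index_dict_dd('string of letters')
print(result['s'])  # [0, 16]
print(result['t'])  # [1, 12, 13]
```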
I am trying to use the number of items inside each list that is nested in another list to make a bar chart. However, when I try to get the length of an individual inner list, I get NameError: name 'no' is not defined
mesasnovio = [[no,nope,yes],[a,b,c,d],[f,g,h,i],[j,k,l,m],[s,t]]
print (len(mesasnovio[1]))
I want it to be able to show me the length
>>> 3
You forgot the quotes around your strings.
mesasnovio = [['no','nope','yes'],['a','b','c','d'],['f','g','h','i'],['j','k','l','m'],['s','t']]
print (len(mesasnovio[1]))
Also, because lists are zero-indexed in Python, mesasnovio[1] refers to ['a', 'b', 'c', 'd']. So you will get a length of 4, not 3.
Here is one way, using map, to get the lengths of all the lists inside your list:
lenl=list(map(len,mesasnovio))
lenl
Out[595]: [3, 4, 4, 4, 2]
lenl[0]
3
You need to wrap all of your strings in quotations.
mesasnovio = [['no','nope','yes'],['a','b','c','d'],['f','g','h','i'],['j','k','l','m'],['s','t']]
Others have addressed the issue of your strings not being quoted, but if you would like the lengths of all the inner lists:
lengths = list(map(len, mesasnovio))
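Since the goal was a bar chart, it may help to keep each inner list's position together with its length. A minimal sketch, assuming the position in the outer list is what you want on the x axis (lengths_by_index is a made-up name):

```python
mesasnovio = [['no', 'nope', 'yes'], ['a', 'b', 'c', 'd'], ['f', 'g', 'h', 'i'],
              ['j', 'k', 'l', 'm'], ['s', 't']]

# position of each inner list -> number of items in it
lengths_by_index = {i: len(inner) for i, inner in enumerate(mesasnovio)}
print(lengths_by_index)  # {0: 3, 1: 4, 2: 4, 3: 4, 4: 2}
```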
I am trying to write a function 'add_to_hist' that takes a character and a histogram and adds an occurrence of that character to the histogram. If the character doesn't already have an entry, it should create one. For example:
>>>hist = [['B',1],['a',3],['n',2],['!',1]]
>>>add_to_hist('a',hist)
>>>hist
Should return: [['B', 1], ['a', 4], ['n', 2], ['!', 1]]
Here is what I have so far:
def add_to_hist(x,hist):
if x in hist:
hist['a'] = hist['a'] + 1
return hist
else: hist.append(x)
return (hist)
You chose to represent your histogram as a list of 2-element lists; however, in the else branch of your add_to_hist function you append the item itself. You should append [x, 1]. Also, for the same reason, you cannot check if x in hist, because x is an item (a str), but the elements of hist are lists.
There are also other errors in your function (incorrect indentation; use of 'a' instead of x).
In case using a list of lists is not a requirement, there are better ways to do this, for example using a dict instead of a list, a defaultdict(int), or the collections.Counter class.
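For instance, a rough sketch of the collections.Counter route, with a conversion back to the list-of-lists shape in case that format is still needed at the end (as_lists is a name I picked for the example):

```python
from collections import Counter

hist = Counter({'B': 1, 'a': 3, 'n': 2, '!': 1})

hist['a'] += 1  # existing entry is incremented
hist['z'] += 1  # missing entries count as 0, so this creates 'z' with 1

# convert back to the original list-of-lists representation
as_lists = [[char, count] for char, count in hist.items()]
print(as_lists)  # [['B', 1], ['a', 4], ['n', 2], ['!', 1], ['z', 1]]
```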
It looks like your code has been written with dictionaries in mind:
hist = {'B': 1, 'a': 4, 'n': 2, '!': 1}
def add_to_hist(x, hist):
    if x in hist:
        hist[x] = hist[x] + 1
    else:
        hist[x] = 1
    return hist
It'd be easier to represent your histogram as a dictionary, as then you could directly access elements that match x.
However, as is (assuming you're forced to use lists), here's how you'd solve this problem:
def add_to_hist(x, hist):
    for i in range(len(hist)):
        if x == hist[i][0]:
            hist[i][1] = hist[i][1] + 1
            return hist
    hist.append([x, 1])
    return hist
It's a list of lists, so you cannot directly do "x in hist": that would try to match x against each element of hist (each of which is a list), so it will never succeed. You have to run through hist, get each element, and then compare x with the first item of that element. If you find it, add one to the second item of that inner list, and return.
Now, if you run through the entire for loop without finding a matching element, you know it doesn't exist, so you can append a new value to the hist.
from pprint import *
sites = [['a','b','c'],['d','e','f'],[1,2,3]]
pprint(sites)
for site in sites:
    sites.remove(site)
pprint(sites)
outputs:
[['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3]]
[['d', 'e', 'f']]
Why is the result not None, or an empty list []?
It's because you're modifying a list as you're iterating over it. You should never do that.
For something like this, you should make a copy of the list and iterate over that.
for site in sites[:]:
    sites.remove(site)
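Depending on what you actually want, there are also ways that avoid the removal loop altogether. A couple of hedged alternatives, using the list from the question (the filter condition and the alias name are only illustrations):

```python
sites = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3]]
alias = sites  # a second reference to the same list object

# Keep only some items: build the filtered list first, then replace
# the contents in place with slice assignment.
sites[:] = [site for site in sites if isinstance(site[0], str)]
print(sites)  # [['a', 'b', 'c'], ['d', 'e', 'f']]

# Remove everything: clear() empties the list in place.
sites.clear()
print(alias)  # []
```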
Because resizing a collection while iterating over it is the Python equivalent to undefined behaviour in C and C++. You may get an exception or subtly wrong behaviour. Just don't do it. In this particular case, what likely happens under the hood is:
1. The iterator starts with index 0, stores that it is at index 0, and gives you the item stored at that index.
2. You remove the item at index 0 and everything afterwards is moved to the left by one to fill the hole.
3. The iterator is asked for the next item, and faithfully increments the index it's at by one, stores 1 as the new index, and gives you the item at that index. But because of said moving of items caused by the remove operation, the item at index 1 is the item that started out at index 2 (the last item).
4. You delete that.
5. The iterator is asked for the next item, but signals end of iteration because the next index (2) is out of range (the valid range is now just 0..0).
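The steps above can be observed directly by recording which items the loop actually visits. A small sketch (visited is a helper list added only for the trace):

```python
sites = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3]]
visited = []

for site in sites:
    visited.append(site)  # record what the iterator handed us
    sites.remove(site)    # shifts the remaining items one slot to the left

print(visited)  # [['a', 'b', 'c'], [1, 2, 3]]
print(sites)    # [['d', 'e', 'f']]
```

The middle item is never visited, which is why it is the one left over.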
Normally I would expect the iterator to bail out because of modifying the connected list. With a dictionary, this would happen at least.
Why is the ['d', 'e', 'f'] item not removed? I can only guess: probably the iterator has an internal counter (or is even only based on the "fallback iteration protocol" with __getitem__).
I.e., the first item yielded is sites[0], i.e. ['a', 'b', 'c']. This is then removed from the list.
The second one is sites[1] - which is [1, 2, 3] because the indexes have changed. This is removed as well.
And the third would be sites[2] - but as this would be an index error, the iterator stops.
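For comparison, this is what the dictionary case mentioned above looks like. A minimal sketch with a made-up dict; the exception is caught only so the message can be shown:

```python
counts = {'a': 1, 'b': 2}

try:
    for key in counts:
        del counts[key]  # changes the dict's size during iteration
except RuntimeError as exc:
    message = str(exc)
    print(message)  # dictionary changed size during iteration
```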