python get list element according to alphabet

I have a list of names alphabetically, like:
list = ['ABC', 'ACE', 'BED', 'BRT', 'CCD', ..]
How can I get an element by its starting letter? Do I have to iterate over the list, or does Python have a function for this? I'm new to Python, so this may be a really naive question.
Suppose I want the second element among the names that start with 'A'; in this case I get 'ACE'.

If you're going to do multiple searches, you should take the one-time hit of iterating through everything and build a dictionary (or, to make it simpler, collections.defaultdict):
from collections import defaultdict
d = defaultdict(list)
words = ['ABC', 'ACE', 'BED', 'BRT', 'CCD', ...]
for word in words:
    d[word[0]].append(word)
(Note that you shouldn't name your own variable list, as it shadows the built-in.)
Now you can easily query for the second word starting with "A":
d["A"][1] == "ACE"
or the first two words for each letter:
first_two = {c: w[:2] for c, w in d.items()}
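One caveat with d["A"][1]: it raises IndexError when a letter has fewer words, and reading a missing letter from a defaultdict silently inserts an empty list. A minimal sketch that hides both cases behind a default value, assuming you prefer None over an exception (build_index and nth_with_initial are hypothetical helper names):

```python
from collections import defaultdict

def build_index(words):
    """Group words by their first character in a single pass."""
    index = defaultdict(list)
    for word in words:
        if word:  # skip empty strings, which have no first character
            index[word[0]].append(word)
    return index

def nth_with_initial(index, letter, n, default=None):
    """Return the n-th (0-based) word starting with `letter`, or `default`.

    index.get() is used instead of index[letter] so that looking up a
    missing letter does not insert an empty list into the defaultdict.
    """
    bucket = index.get(letter, [])
    return bucket[n] if 0 <= n < len(bucket) else default

words = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
index = build_index(words)
print(nth_with_initial(index, 'A', 1))  # ACE
print(nth_with_initial(index, 'Z', 0))  # None
```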

Using a generator expression and itertools.islice:
>>> import itertools
>>> names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
>>> next(itertools.islice((name for name in names if name.startswith('A')), 1, 2), 'no-such-name')
'ACE'
>>> names = ['ABC', 'BBD', 'BED', 'BRT', 'CCD']
>>> next(itertools.islice((name for name in names if name.startswith('A')), 1, 2), 'no-such-name')
'no-such-name'
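The same one-liner can be wrapped in a reusable function. This is a sketch; nth_starting_with is a hypothetical name, and because the generator is lazy, scanning stops as soon as the requested match is found:

```python
from itertools import islice

def nth_starting_with(names, prefix, n, default=None):
    """Return the n-th (0-based) name that starts with `prefix`, or `default`.

    The generator is consumed only up to the n-th match, so the rest of
    `names` is never scanned once that match is found.
    """
    matches = (name for name in names if name.startswith(prefix))
    return next(islice(matches, n, n + 1), default)

names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
print(nth_starting_with(names, 'A', 1))                  # ACE
print(nth_starting_with(names, 'B', 5, 'no-such-name'))  # no-such-name
```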

Simply group all the elements by their first character:
from itertools import groupby
from operator import itemgetter
example = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
d = {g:list(values) for g, values in groupby(example, itemgetter(0))}
Now, to get the values starting with 'A':
print(d.get('A', []))
This is most useful when you have a static list and will run multiple queries since, as you can see, getting the 3rd item starting with 'A' (d['A'][2]) is then done in O(1).
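The groupby version relies on the list already being sorted, as it is in the question: groupby only merges consecutive items with the same key. A short sketch of what happens with unsorted input (the shuffled name list is made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

names = ['BED', 'ABC', 'BRT', 'ACE', 'CCD']  # deliberately not sorted

# groupby only merges *adjacent* items, so the later 'A' and 'B' runs
# overwrite the earlier ones in the dict comprehension.
broken = {g: list(v) for g, v in groupby(names, itemgetter(0))}
print(broken['A'])  # ['ACE']  (the 'ABC' run was overwritten)

# Sorting first turns every letter into a single contiguous run.
fixed = {g: list(v) for g, v in groupby(sorted(names), itemgetter(0))}
print(fixed['A'])   # ['ABC', 'ACE']
```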

You might want to use a list comprehension:
mylist = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
elements_starting_with_A = [i for i in mylist if i[0] == 'A']
# ['ABC', 'ACE']
second = elements_starting_with_A[1]
# 'ACE'

In addition to the list comprehensions others have mentioned, lists also have a sort() method:
mylist = ['AA', 'BB', 'AB', 'CA', 'AC']
newlist = [i for i in mylist if i[0] == 'A']
newlist.sort()
print(newlist)
# ['AA', 'AB', 'AC']

The simple solution is to iterate over the whole list, which is O(n):
(name for name in names if name.startswith('A'))
However, you could sort the names and then use a binary search, in O(log(n)), to find where the names starting with a given letter begin and end (using lexicographic comparison). The bisect module will help you find the bounds:
from bisect import bisect_left
names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
names.sort()
lower = bisect_left(names, 'B')
upper = bisect_left(names, chr(1+ord('B')))
print([names[i] for i in range(lower, upper)])
# ['BED', 'BRT']
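The two bisect calls can be packaged as a function that returns the matching slice. A sketch assuming the list is already sorted and the key is a single character; names_with_initial is a hypothetical helper name:

```python
from bisect import bisect_left

def names_with_initial(sorted_names, letter):
    """Return the items of `sorted_names` that start with `letter`.

    Assumes `sorted_names` is sorted. Every string starting with `letter`
    is >= letter and < the next code point, so two binary searches find
    the bounds in O(log n).
    """
    lower = bisect_left(sorted_names, letter)
    upper = bisect_left(sorted_names, chr(ord(letter) + 1))
    return sorted_names[lower:upper]

names = sorted(['ABC', 'ACE', 'BED', 'BRT', 'CCD'])
print(names_with_initial(names, 'B'))     # ['BED', 'BRT']
print(names_with_initial(names, 'A')[1])  # ACE
```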

Related

how do I align a list of strings to the right?

Is there a way for me to 'align' a list of strings to the right? I'm performing a counting sort here and I want to sort my characters from the right.
For example, given a list of strings:
list = ['abc', 'a', 'qwerty', 'cd']
The length of the longest string in the list is 6 (qwerty):
list = ['abc', 'a', 'qwerty', 'cd']
biggest = max(list, key=len)
max_index = len(biggest) - 1
list2 = []
for col in range(-max_index, 0):
    for i in list:
        list2.append(i[abs(col)])
Because my other strings are not as long as qwerty, this raises an IndexError. How do I 'align' all my strings to the right, so that when I sort from the last letter, 'a' is aligned with the 'y' of 'qwerty'?
     a
    cd
   abc
qwerty
And I would like to accomplish this without padding.
You can sort your whole list by length and use the format specification mini-language for output:
data = ['abc', 'a', 'qwerty', 'cd']
s = sorted(data, key=len)  # sorted copy of your list
maxlen = len(s[-1])  # the longest string is the last element of the sorted list
for l in s:
    print(f"{l:>{maxlen}}")  # the strings in data stay unpadded; only the printed output is right-aligned
Output:
     a
    cd
   abc
qwerty
As far as I understand the question, this should be the solution:
list_1 = ['aaaz', 'abc', 'a', 'qwerty', 'cd', "xxxxxxxxca"]
def my_sort(data):
    # compare the strings by their characters read from right to left
    inverted = sorted(data, key=lambda x: x[::-1])
    return inverted
max_len = max([len(s) for s in list_1])  # not used by my_sort
list_2 = my_sort(list_1)
print(list_2)
# ['a', 'xxxxxxxxca', 'abc', 'cd', 'qwerty', 'aaaz']
I understand that the strings should be sorted alphabetically but from right to left.
list_1 = ['abc', 'a', 'qwerty', 'cd']
biggest = max(list_1, key=len)
biggest = len(biggest)
# with manual padding
list_2 = []
for i in list_1:
    list_2.append(' ' * (biggest - len(i)) + i)
# the same right alignment with the format mini-language
list_3 = []
for i in list_1:
    list_3.append(f"{i:>{biggest}}")
I'd go with this approach
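The question's real goal is a counting sort that reads characters from the right. A sketch of a least-significant-digit radix sort that does this without padding the strings: a column to the left of a shorter string's first character gets a virtual key 0 that sorts before every real character. That blank-sorts-first rule is an assumption (it makes shorter strings come first). Each pass distributes the strings into per-character buckets, a list-based variant of counting sort that is stable, which LSD radix sort requires. The function name is hypothetical, and the bucket count is sized by the largest code point present, which is fine for ASCII text.

```python
def lsd_radix_sort_right_aligned(words):
    """Stable LSD radix sort that reads strings from the right.

    Strings are never padded: a column to the left of a shorter string's
    first character gets the key 0, which sorts before every real
    character (assumption: shorter strings should come first).
    """
    if not words:
        return []
    width = max(len(w) for w in words)
    # keys: 0 = "no character here", ord(c) + 1 for a real character c
    num_keys = max((ord(c) for w in words for c in w), default=0) + 2
    result = list(words)
    # column 0 is the last character of every string; column width - 1 is
    # the first character of the longest string
    for col in range(width):
        buckets = [[] for _ in range(num_keys)]
        for w in result:
            key = ord(w[-1 - col]) + 1 if col < len(w) else 0
            buckets[key].append(w)
        # concatenating buckets in key order keeps equal keys in their
        # previous order, which is what makes each pass stable
        result = [w for bucket in buckets for w in bucket]
    return result

print(lsd_radix_sort_right_aligned(['abc', 'a', 'qwerty', 'cd']))
# ['a', 'cd', 'abc', 'qwerty']
```

Under the blank-sorts-first assumption the result equals sorted(words, key=lambda s: (len(s), s)); if you instead want to compare strings by their reversed characters, the key=lambda x: x[::-1] answer is the simpler route.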

How to split a list into smaller lists python

I have a nested list that looks something like:
lst = [['ID1', 'A'],['ID1','B'],['ID2','AAA'], ['ID2','DDD']...]
Is it possible for me to split lst into smaller lists by ID, so that each small list contains the elements with the same ID? The result should look something like:
lst1 = [['ID1', 'A'], ['ID1', 'B']...]
lst2 = [['ID2', 'AAA'], ['ID2', 'DDD']...]
You can use groupby:
from itertools import groupby
grp_lists = []
for i, grp in groupby(lst, key=lambda x: x[0]):
    grp_lists.append(list(grp))
print(grp_lists[0])
[['ID1', 'A'], ['ID1', 'B']]
print(grp_lists[1])
[['ID2', 'AAA'], ['ID2', 'DDD']]
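groupby only groups adjacent rows, so the answer above assumes rows with the same ID already sit next to each other. If they can be interleaved, sort by ID first. A sketch with a made-up interleaved input:

```python
from itertools import groupby
from operator import itemgetter

lst = [['ID2', 'AAA'], ['ID1', 'A'], ['ID2', 'DDD'], ['ID1', 'B']]  # IDs interleaved

by_id = itemgetter(0)
# sorted() is stable, so rows keep their original relative order within an ID
grp_lists = [list(grp) for _, grp in groupby(sorted(lst, key=by_id), key=by_id)]
print(grp_lists)
# [[['ID1', 'A'], ['ID1', 'B']], [['ID2', 'AAA'], ['ID2', 'DDD']]]
```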
Using collections.defaultdict:
lst = [['ID1', 'A'], ['ID1', 'B'], ['ID2', 'AAA'], ['ID2', 'DDD']]
from collections import defaultdict
result = defaultdict(list)
for item in lst:
    result[item[0]].append(item)
print(list(result.values()))
output:
[[['ID1', 'A'], ['ID1', 'B']], [['ID2', 'AAA'], ['ID2', 'DDD']]]
Without external functions: build a set of unique indexes, then loop over the original list building a new list for each of the indexes and filling it with list items that contain that index:
lst = [['ID1', 'A'],['ID1','B'],['ID2','AAA'], ['ID2','DDD']]
unique_set = set(elem[0] for elem in lst)
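# compare with ==; "in" would do a substring test ('ID1' in 'ID11' is True)
lst2 = [[elem for elem in lst if elem[0] == every_unique] for every_unique in unique_set]
print(lst2)
Result (a set has no fixed iteration order, so the groups may come out in a different order):
[[['ID2', 'AAA'], ['ID2', 'DDD']], [['ID1', 'A'], ['ID1', 'B']]]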
(It is possible to move unique_set into the final line, making it a one-liner. But that would make it less clear what happens.)
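If you want the groups in the order in which the IDs first appear, rather than in set order, one option is to collect the IDs with dict.fromkeys, which keeps insertion order (guaranteed for dicts since Python 3.7). Like the set version, this scans lst once per ID:

```python
lst = [['ID1', 'A'], ['ID1', 'B'], ['ID2', 'AAA'], ['ID2', 'DDD']]

# dict keys are unique and keep first-seen order, unlike set iteration order
unique_ids = list(dict.fromkeys(elem[0] for elem in lst))
lst2 = [[elem for elem in lst if elem[0] == uid] for uid in unique_ids]
print(lst2)
# [[['ID1', 'A'], ['ID1', 'B']], [['ID2', 'AAA'], ['ID2', 'DDD']]]
```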
If you want separate variables, as in your example result:
lst1 = [sub_lst for sub_lst in lst if sub_lst[0] == 'ID1']
and
lst2 = [sub_lst for sub_lst in lst if sub_lst[0] == 'ID2']
From that, you can make a function:
def create_sub_list(id_str, original_lst):
    return [x for x in original_lst if x[0] == id_str]
And call it like this:
lst1 = create_sub_list('ID1', lst)
If you want a dictionary of the sub-lists, for easier access, you can use:
from functools import reduce
def reduce_dict(ret_dict, sub_lst):
    if sub_lst[0] not in ret_dict:
        ret_dict[sub_lst[0]] = sub_lst[1:]
    else:
        ret_dict[sub_lst[0]] += sub_lst[1:]
    return ret_dict
grouped_dict = reduce(reduce_dict, lst, dict())
(sub_lst[1:] collects every value after the ID, so this also works if a row holds more than one string after the ID. Don't replace it with sub_lst[1]: that would store plain strings, and += would then concatenate them into 'AB' instead of building a list.)
And then, to access the elements of the dictionary, you use the ID strings:
print(grouped_dict['ID1'])
This will print:
['A', 'B']
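The same dictionary can also be built with a plain loop and dict.setdefault, which some readers find easier to follow than reduce. This is an alternative sketch, not what the answer above uses:

```python
lst = [['ID1', 'A'], ['ID1', 'B'], ['ID2', 'AAA'], ['ID2', 'DDD']]

grouped_dict = {}
for row in lst:
    # setdefault returns the existing list for this ID, creating it on first use
    grouped_dict.setdefault(row[0], []).extend(row[1:])

print(grouped_dict['ID1'])  # ['A', 'B']
print(grouped_dict)         # {'ID1': ['A', 'B'], 'ID2': ['AAA', 'DDD']}
```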

combine lists in list/dict comprehension way

Is it possible to apply a list/dictionary comprehension to the following code to get ["abc", "ab", "cd"]?
tk = {}
tk[1] = ["abc"]
tk[2] = ["ab", "cd"]
tk[3] = ["ef", "gh"]
t = (1, 2)
combined = []
combined.append(tk[i]) for i in t #does not work. note, not all tk values are used, this is based on t.
I could think of ll = [tk[i] for i in t], but that gives a list of lists, which then has to be flattened:
ll = [tk[i] for i in t]
[item for sublist in ll for item in sublist]
But this is not a one-liner. I wonder if there is a better way.
If the order of values in the desired list matters, the generalized way to achieve this would be to sort the items in the dict by key and merge the value lists. Note that this merges every value in the dict; the output below assumes tk holds only keys 1 and 2. For example:
>>> from operator import itemgetter
>>> from itertools import chain
>>> list(chain.from_iterable(i for _, i in sorted(tk.items(), key=itemgetter(0))))
['abc', 'ab', 'cd']
Simply iterate through the values of your dictionary. Something like this:
>>> tk = {}
>>> tk[1] = ['abc']
>>> tk[2] = ['ab', 'cd']
>>> combined = [x for t in tk.values() for x in t]
>>> print(combined)
['abc', 'ab', 'cd']
In Python versions before 3.7, a regular dict did not guarantee the order of its keys, so there you could use an OrderedDict to maintain the order of your lists (since 3.7, plain dicts keep insertion order):
>>> import collections
>>> tk = collections.OrderedDict()
>>> tk[1] = ['abc']
>>> tk[2] = ['ab', 'cd']
>>> combined = [x for t in tk.values() for x in t]
>>> print(combined)
['abc', 'ab', 'cd']
Note that [tk[i] for i in (1, 2)], as you proposed, won't have the desired results. You still need to iterate through the values inside each list.
>>> [tk[i] for i in (1, 2)]
[['abc'], ['ab', 'cd']]
Also note that [tk[i] for i in tk], as you proposed later, gives the same lists as tk.values(). So the proposed solution [x for t in tk.values() for x in t] is equivalent to what you achieved, but in one line.
Given your constraint of choosing a sequence of keys manually:
>>> tk = {}
>>> tk[1] = ["abc"]
>>> tk[2] = ["ab", "cd"]
>>> tk[3] = ["ef", "gh"]
You want:
>>> [vals for i in (1,2) for vals in tk[i]]
['abc', 'ab', 'cd']
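The same selection can be written with itertools.chain.from_iterable, which flattens one level of nesting lazily. Only the keys listed in t are used, in that order:

```python
from itertools import chain

tk = {1: ["abc"], 2: ["ab", "cd"], 3: ["ef", "gh"]}
t = (1, 2)  # only these keys are used, in this order

combined = list(chain.from_iterable(tk[i] for i in t))
print(combined)  # ['abc', 'ab', 'cd']
```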

count how many times a combination occurs in a list

I created a list by doing this:
list3= [zip(Indiener1, Indiener2)]
Both elements are long lists of names.
But I want to add, as a third element of each small combined pair, the number of times that combination of names occurs in the whole list3, because I have to do calculations with that number.
I tried list3.count() but that function only wanted to take one item.
How can I do this?
from collections import Counter
list1=["a","b","d","b"]
list2=["5","u","55","u"]
list3=zip(list1,list2)
print(Counter(list3))
It outputs:
Counter({('b', 'u'): 2, ('a', '5'): 1, ('d', '55'): 1})
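The question asks for the count as a third element of each pair. One way, assuming you want to keep every pair (duplicates included) in its original position, is to count first and then build triples. The pairs are stored in a list because zip() returns a one-shot iterator in Python 3:

```python
from collections import Counter

list1 = ["a", "b", "d", "b"]
list2 = ["5", "u", "55", "u"]

pairs = list(zip(list1, list2))  # materialize: a zip object can only be iterated once
counts = Counter(pairs)

# Append each pair's total count as a third element.
list3 = [(name1, name2, counts[(name1, name2)]) for name1, name2 in pairs]
print(list3)
# [('a', '5', 1), ('b', 'u', 2), ('d', '55', 1), ('b', 'u', 2)]
```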
Use a counter and reverse the pairings to get ("foo","bar") == ("bar","foo"):
l1 =["foo","bar","foobar"]
l2 = ["bar","foo","bar"]
from collections import Counter
c = Counter(zip(l1,l2))
for k, v in c.items():
    rev = tuple(reversed(k))
    print("pairing {} appears {}".format(k, v + c.get(rev, 0)))
To avoid getting double output ('foo', 'bar') and ('bar', 'foo') you can add rev to a set and check that it has not been seen already:
from collections import Counter
c = Counter(zip(l1,l2))
seen = set()
for k, v in c.items():
    rev = tuple(reversed(k))
    if k not in seen:
        seen.add(rev)
        print("pairing {} appears {} times".format(k, v + c.get(rev, 0)))
pairing ('foo', 'bar') appears 2 times
pairing ('foobar', 'bar') appears 1 times
Since ("foo","bar") and ("bar","foo") are considered the same, you have to count on something like sets, where order doesn't matter:
>>> from collections import Counter
>>> l1 = ['John', 'Doe', 'Paul', 'Pablo', 'Paul', 'Doe']
>>> l2 = ['Doe', 'John', 'Doe', 'Doe', 'Doe', 'Paul']
>>> print(Counter(frozenset(pair) for pair in zip(l1, l2)))
Counter({frozenset({'Paul', 'Doe'}): 3, frozenset({'John', 'Doe'}): 2, frozenset({'Doe', 'Pablo'}): 1})
You can also sort the pairs before counting, but a set makes the purpose more explicit.
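Sorting each pair turns it into a canonical tuple, so ('John', 'Doe') and ('Doe', 'John') become the same key. Unlike a frozenset, a sorted tuple also keeps a pair of identical names as two elements. A sketch using the same sample lists:

```python
from collections import Counter

l1 = ['John', 'Doe', 'Paul', 'Pablo', 'Paul', 'Doe']
l2 = ['Doe', 'John', 'Doe', 'Doe', 'Doe', 'Paul']

# sorted() puts both names of a pair in one canonical order
counts = Counter(tuple(sorted(pair)) for pair in zip(l1, l2))
print(counts[('Doe', 'Paul')])  # 3
print(counts[('Doe', 'John')])  # 2
```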

check if a number already exists in a list in python

I am writing a python program where I will be appending numbers into a list, but I don't want the numbers in the list to repeat. So how do I check if a number is already in the list before I do list.append()?
You could do
if item not in mylist:
    mylist.append(item)
But you should really use a set, like this:
myset = set()
myset.add(item)
EDIT: If order is important but your list is very big, you should probably use both a list and a set, like so:
mylist = []
myset = set()
for item in ...:
    if item not in myset:
        mylist.append(item)
        myset.add(item)
This way, you get fast lookup for element existence, but you keep your ordering. If you use the naive solution, you will get O(n) performance for the lookup, and that can be bad if your list is big.
Or, as @larsman pointed out, you can use OrderedDict to the same effect:
from collections import OrderedDict
mydict = OrderedDict()
for item in ...:
    mydict[item] = True
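Since Python 3.7 a plain dict also preserves insertion order, so for a one-off de-duplication of an existing list, dict.fromkeys does the same job as the OrderedDict approach (the numbers below are made up):

```python
numbers = [3, 1, 3, 2, 1, 5]

# dict keys are unique and keep insertion order, so this drops repeats
# while preserving the order in which each number was first seen
unique_numbers = list(dict.fromkeys(numbers))
print(unique_numbers)  # [3, 1, 2, 5]
```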
If you want to have unique elements in your list, why not use a set, provided that order does not matter to you:
>>> s = set()
>>> s.add(2)
>>> s.add(4)
>>> s.add(5)
>>> s.add(2)
>>> s
{2, 4, 5}
If order matters, you can use:
>>> def addUnique(l, num):
...     if num not in l:
...         l.append(num)
...     return l
...
You can also find an OrderedSet recipe, which is referred to in the Python documentation.
If you want your numbers in ascending order, you can add them to a set and then sort the set into an ascending list.
s = set()
if number1 not in s:
    s.add(number1)
if number2 not in s:
    s.add(number2)
...
s = sorted(s)  # now a list in ascending order
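If the numbers must stay sorted while you add them, rather than being sorted once at the end, the bisect module can keep a list ordered and check for duplicates at the same time. A sketch; add_sorted_unique is a hypothetical helper name:

```python
from bisect import bisect_left

def add_sorted_unique(sorted_list, number):
    """Insert `number` into `sorted_list` (kept ascending) unless present.

    The duplicate check is a binary search, O(log n); the insertion itself
    still shifts elements, so it is O(n) in the worst case.
    """
    i = bisect_left(sorted_list, number)
    if i == len(sorted_list) or sorted_list[i] != number:
        sorted_list.insert(i, number)
    return sorted_list

nums = []
for n in [5, 2, 8, 2, 5, 1]:
    add_sorted_unique(nums, n)
print(nums)  # [1, 2, 5, 8]
```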
You could probably use a set object instead. Just add numbers to the set; a set never holds duplicates.
To check if a number is in a list, you can use the in keyword.
Let's create a list:
exampleList = [1, 2, 3, 4, 5]
Now let's see if it contains the number 4:
contains = 4 in exampleList
print(contains)
# True
As you want to append only when an element is not in the list, the not in operator can also help:
exampleList2 = ["a", "b", "c", "d", "e"]
notcontain = "e" not in exampleList2
print(notcontain)
# False
But, as others have mentioned, you may want to consider using a different data structure, more specifically, set. See the examples below (Source):
>>> basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
>>> print(basket)  # show that duplicates have been removed
{'orange', 'banana', 'pear', 'apple'}
>>> 'orange' in basket  # fast membership testing
True
>>> 'crabgrass' in basket
False
>>> # Demonstrate set operations on unique letters from two words
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a  # unique letters in a
{'a', 'r', 'b', 'c', 'd'}
>>> a - b  # letters in a but not in b
{'r', 'd', 'b'}
>>> a | b  # letters in a or b or both
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b  # letters in both a and b
{'a', 'c'}
>>> a ^ b  # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}
