I have strings describing a range of characters alphabetically, made up of two characters separated by a hyphen. I'd like to expand them out into a list of the individual characters like this:
'a-d' -> ['a','b','c','d']
'B-F' -> ['B','C','D','E','F']
What would be the best way to do this in Python?
In [19]: s = 'B-F'
In [20]: list(map(chr, range(ord(s[0]), ord(s[-1]) + 1)))
Out[20]: ['B', 'C', 'D', 'E', 'F']
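The trick is to convert both characters to their Unicode code points with ord(), build a range() over them (adding 1 so the end character is included), and map each code point back to a character with chr().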
P.S. Since you require a list, the list(map(...)) construct can be replaced with a list comprehension.
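Wrapped into a small function, the same ord()/range() trick might look like the sketch below. The name char_range and the validation rules (exactly one character, a hyphen, one character, with the start not after the end) are assumptions for illustration, not part of the answer above.

```python
def char_range(spec):
    """Expand a spec like 'a-d' into ['a', 'b', 'c', 'd'] (hypothetical helper)."""
    start, sep, end = spec.partition('-')
    # Assumed input format: one character, a hyphen, one character.
    if sep != '-' or len(start) != 1 or len(end) != 1:
        raise ValueError(f"expected 'X-Y', got {spec!r}")
    if ord(start) > ord(end):
        raise ValueError(f"start {start!r} comes after end {end!r}")
    # ord() gives the code point; chr() maps each code point back to a character.
    return [chr(code) for code in range(ord(start), ord(end) + 1)]


print(char_range('a-d'))  # ['a', 'b', 'c', 'd']
print(char_range('B-F'))  # ['B', 'C', 'D', 'E', 'F']
```

Because it works on raw code points, a mixed-case spec such as 'Z-a' would also pick up the punctuation between the two alphabets; the string-module answer further down avoids that by staying within one alphabet.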
Along with aix's excellent answer using map(), you could do this with a list comprehension:
>>> s = "A-F"
>>> [chr(item) for item in range(ord(s[0]), ord(s[-1])+1)]
['A', 'B', 'C', 'D', 'E', 'F']
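import string

def lis(strs):
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    # Slice the matching alphabet from the first letter through the last one.
    if strs[0] in upper:
        return list(upper[upper.index(strs[0]): upper.index(strs[-1]) + 1])
    if strs[0] in lower:
        return list(lower[lower.index(strs[0]): lower.index(strs[-1]) + 1])
    # Neither alphabet matched: fail loudly instead of returning None.
    raise ValueError(f"not a letter range: {strs!r}")

print(lis('a-d'))
print(lis('B-F'))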
output:
['a', 'b', 'c', 'd']
['B', 'C', 'D', 'E', 'F']
I am trying to insert an element into a list after every occurrence of a given value. But each insertion changes the length of the list, so the loop never reaches the last element.
my_list = ['a', 'b', 'c', 'd', 'e', 'a']
aq = len(my_list)
for i in range(aq):
    if my_list[i] == 'a':
        my_list.insert(i+1, 'g')
        aq = aq+1
print(my_list)
The output I get is:
['a', 'g', 'b', 'c', 'd', 'e', 'a']
The output I am trying to get is:
['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
How can I get that?
Changing aq inside the loop does not change the range: range(aq) is evaluated once, when the loop starts, and the resulting range object does not change afterwards. There are two ways to do this. The easy way is to build a new list:
newlist = []
for c in my_list:
    newlist.append(c)
    if c == 'a':
        newlist.append('g')
The trickier way is to modify the list in place: use list.index() with a start position (lists have no .find() method) to locate the next 'a', insert a 'g' after it, and then keep searching from just past the inserted 'g'.
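The in-place variant might look like this sketch. It relies on list.index(value, start), which raises ValueError once no further match exists. Each insert() shifts the tail of the list, so this is quadratic in the worst case, one more reason to prefer building a new list.

```python
my_list = ['a', 'b', 'c', 'd', 'e', 'a']

i = 0
while True:
    try:
        # Search for the next 'a' starting at position i.
        i = my_list.index('a', i)
    except ValueError:
        break  # no more 'a' after position i
    my_list.insert(i + 1, 'g')
    i += 2  # skip past the 'a' and the 'g' just inserted

print(my_list)  # ['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
```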
Here is a nice way to write it using itertools.chain.from_iterable from the standard library:
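from itertools import chain

my_list = ['a', 'b', 'c', 'd', 'e', 'a']
# Matches become ('a', 'g'); everything else becomes the 1-tuple (x,).
# A bare x would be split into characters if it were a longer string.
my_list = list(chain.from_iterable((x, "g") if x == "a" else (x,) for x in my_list))
# ['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']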
Here, every occurrence of "a" is replaced with "a", "g" in the list, and all other elements are left alone.
I'm trying to create a list of lists from a single list. I can do this when every sublist has the same number of elements; however, that will not always be the case.
As mentioned, the function below works only when all the sublists have the same number of elements.
I've tried using a regular expression, pattern2 = re.compile(r'\d\d\d\d\d\d'), to check whether an element matches a pattern, because the first value of each new sublist will always be 6 digits and it will be the only value in that format. However, I'm not sure how to make it stop at the next match and start another list.
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]
The code above only works if every sublist contains the same number of elements.
Below is what I expect.
OldList = [111111, 'a', 'b', 'c', 'd', 222222, 'a', 'b', 'c', 333333, 'a', 'd', 'e', 'f']
DesiredList = [[111111, 'a', 'b', 'c', 'd'], [222222, 'a', 'b', 'c'], [333333, 'a', 'd', 'e', 'f']]
There is likely a more efficient way to do this (with fewer loops), but here is one approach: find the indexes of the breakpoints, append None to the list of indexes so that the last slice runs to the end, and then slice the list from each index to the next. If your 6-digit numbers are really strings, you can drop the str() inside re.match().
import re
d = [111111,'a','b','c','d',222222,'a','b','c',333333,'a','d','e','f']
indexes = [i for i, x in enumerate(d) if re.match(r'\d{6}', str(x))]
groups = [d[s:e] for s, e in zip(indexes, indexes[1:] + [None])]
print(groups)
# [[111111, 'a', 'b', 'c', 'd'], [222222, 'a', 'b', 'c'], [333333, 'a', 'd', 'e', 'f']]
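If you would rather not compute the breakpoints first, a single-pass generator also works. This sketch assumes a header is anything whose str() is exactly six digits; it uses re.fullmatch because re.match only anchors at the start, so a value like 1234567 would also count as a header. The name split_on_headers is hypothetical.

```python
import re

HEADER = re.compile(r'\d{6}')

def split_on_headers(items):
    """Yield sublists, starting a new one at each six-digit header."""
    group = []
    for item in items:
        # fullmatch requires the whole string to be exactly six digits.
        if HEADER.fullmatch(str(item)) and group:
            yield group
            group = []
        group.append(item)
    if group:
        yield group  # the final group has no following header

d = [111111, 'a', 'b', 'c', 'd', 222222, 'a', 'b', 'c', 333333, 'a', 'd', 'e', 'f']
print(list(split_on_headers(d)))
```

Any items before the first header end up in their own leading group rather than being dropped.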
You can use a fold.
First, define a function to locate the start flag:
>>> def is_start_flag(v):
...     return len(v) == 6 and v.isdigit()
Keeping the test in its own function makes it easy to adapt if the flags are not exactly what you expected, to exclude false positives, or even to switch to a regex.
Then use functools.reduce:
>>> L = ['111111', 'a', 'b', 'c', 'd', '222222', 'a', 'b', 'c', '333333', 'a', 'd', 'e', 'f']
>>> import functools
>>> functools.reduce(lambda acc, x: acc+[[x]] if is_start_flag(x) else acc[:-1]+[acc[-1]+[x]], L, [])
[['111111', 'a', 'b', 'c', 'd'], ['222222', 'a', 'b', 'c'], ['333333', 'a', 'd', 'e', 'f']]
If the next element x is a start flag, append a new list [x] to the accumulator. Otherwise, add x to the current list, i.e. the last list of the accumulator.
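Because acc[:-1] + [acc[-1] + [x]] copies the accumulator at every step, the reduce version does quadratic work, and it raises IndexError if the first element is not a start flag (acc[-1] on an empty list). A plain loop builds the same result in linear time. This is a sketch: the name group_by_flags is hypothetical, and opening a group for leading non-flag elements is an assumed policy.

```python
def is_start_flag(v):
    return len(v) == 6 and v.isdigit()

def group_by_flags(items):
    groups = []
    for x in items:
        if is_start_flag(x) or not groups:
            groups.append([x])      # start a new group
        else:
            groups[-1].append(x)    # extend the current group in place
    return groups

L = ['111111', 'a', 'b', 'c', 'd', '222222', 'a', 'b', 'c', '333333', 'a', 'd', 'e', 'f']
print(group_by_flags(L))
# [['111111', 'a', 'b', 'c', 'd'], ['222222', 'a', 'b', 'c'], ['333333', 'a', 'd', 'e', 'f']]
```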
I have two lists of objects:
a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
b = ['a', 'b', 'd', 'f', 'e', 'f']
I would like to identify the repeated sequences of two or more consecutive objects and count their occurrences, like:
['a', 'b']: 2
['e', 'f']: 3
['f', 'e', 'f']: 2
The first sequence, ['a', 'b'], appears once in a and once in b, so its total count is 2. The second sequence, ['e', 'f'], appears twice in a and once in b, so its total is 3. The third sequence, ['f', 'e', 'f'], appears once in a and once in b, so its total is 2.
Is there a good way to do this in Python?
Also, the universe of objects is limited. Is there an efficient solution that uses a hash table?
If you only need to handle two lists, the following approach should work, although I am not sure it is the most efficient solution.
A nice description of finding n-grams is given in this blog post.
This approach fixes a minimum sequence length of 2 and tries every length from there up to the length of each list.
We then collect all the n-grams of both lists into one combined list and count how many times each sequence occurs.
Finally we return a dictionary of all the sequences that occur more than once.
def find_repeating(list_a, list_b):
    min_len = 2

    def find_ngrams(input_list, n):
        return zip(*[input_list[i:] for i in range(n)])

    seq_list_a = []
    for seq_len in range(min_len, len(list_a) + 1):
        seq_list_a += [val for val in find_ngrams(list_a, seq_len)]

    seq_list_b = []
    for seq_len in range(min_len, len(list_b) + 1):
        seq_list_b += [val for val in find_ngrams(list_b, seq_len)]

    all_sequences = seq_list_a + seq_list_b

    counter = {}
    for seq in all_sequences:
        counter[seq] = counter.get(seq, 0) + 1

    filtered_counter = {k: v for k, v in counter.items() if v > 1}
    return filtered_counter
Do let me know if you are unsure about anything.
>>> list_a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
>>> list_b = ['a', 'b', 'd', 'f', 'e', 'f']
>>> print(find_repeating(list_a, list_b))
{('a', 'b'): 2, ('e', 'f'): 3, ('f', 'e'): 2, ('f', 'e', 'f'): 2}
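The find_ngrams helper works by zipping n shifted copies of the list: zip stops at the shortest slice, so each resulting tuple is one window of n consecutive items. Tuples are hashable, which is what lets the counting step use a plain dict, i.e. a hash table. A quick check of the helper in isolation:

```python
def find_ngrams(input_list, n):
    # input_list[0:], input_list[1:], ..., input_list[n-1:] zipped together
    # yield every window of n consecutive items.
    return zip(*[input_list[i:] for i in range(n)])

print(list(find_ngrams(['a', 'b', 'd', 'f'], 2)))  # [('a', 'b'), ('b', 'd'), ('d', 'f')]
print(list(find_ngrams(['a', 'b', 'd', 'f'], 3)))  # [('a', 'b', 'd'), ('b', 'd', 'f')]
```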
When you mentioned that you were looking for an efficient solution, my first thought was the approaches to solving the longest common subsequence problem. But in your case, we actually need to enumerate all common (contiguous) subsequences so that we can count them, so a dynamic programming solution will not do. Here's my solution. It's certainly shorter than SSSINISTER's solution (mostly because I use the collections.Counter class).
#!/usr/bin/env python3
from collections import Counter


def find_repeating(sequence_a, sequence_b, min_len=2):
    # Find all subsequences
    subseq_a = [tuple(sequence_a[start:stop])
                for start in range(len(sequence_a) - min_len + 1)
                for stop in range(start + min_len, len(sequence_a) + 1)]
    subseq_b = [tuple(sequence_b[start:stop])
                for start in range(len(sequence_b) - min_len + 1)
                for stop in range(start + min_len, len(sequence_b) + 1)]
    # Find common subsequences
    common = set(subseq_a) & set(subseq_b)
    # Count common subsequences
    return Counter(tup for tup in (subseq_a + subseq_b) if tup in common)
Resulting in ...
>>> list_a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
>>> list_b = ['a', 'b', 'd', 'f', 'e', 'f']
>>> print(find_repeating(list_a, list_b))
Counter({('e', 'f'): 3, ('f', 'e'): 2, ('a', 'b'): 2, ('f', 'e', 'f'): 2})
The advantage of using collections.Counter is that you don't need to write the iterate-and-count code yourself, and you get access to all of the dict methods as well as a few specialized methods for working with the counts.
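For example, a few of those specialized methods applied to the counts printed above (the Counter here is simply rebuilt from that output):

```python
from collections import Counter

# The counts printed by find_repeating above.
counts = Counter({('e', 'f'): 3, ('f', 'e'): 2, ('a', 'b'): 2, ('f', 'e', 'f'): 2})

# most_common(n): the n highest counts as (key, count) pairs.
print(counts.most_common(1))  # [(('e', 'f'), 3)]

# Missing keys count as 0 instead of raising KeyError.
print(counts[('x', 'y')])     # 0

# Counters support arithmetic; subtraction keeps only positive counts.
print(counts - Counter({('e', 'f'): 1}))
```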
I'm working on a list like this: a = ['a','b','','','c','d']. The real list contains thousands of entries. Is there an elegant way to turn a into [['a','b'],['c','d']], given that the data is so large?
You can use itertools.groupby for this. It groups runs of consecutive empty strings and runs of consecutive non-empty strings; the list comprehension then keeps only the groups whose key (the result of the lambda) is True.
>>> from itertools import groupby
>>> [list(g) for k, g in groupby(a, lambda s: s != '') if k]
[['a', 'b'], ['c', 'd']]
Another example:
>>> b = ['a','b','','','c','d', '', 'e', 'f', 'g', '', '', 'h']
>>> [list(g) for k, g in groupby(b, lambda s: s != '') if k]
[['a', 'b'], ['c', 'd'], ['e', 'f', 'g'], ['h']]
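Since groupby is itself lazy, the groups can also be streamed one at a time instead of collected into one big list, which may help with very large inputs. Using bool as the key is a shortcut that assumes '' is the only falsy value in the data (None or 0 would also act as separators); the generator name non_empty_runs is hypothetical.

```python
from itertools import groupby

def non_empty_runs(items):
    """Lazily yield each run of consecutive truthy items as a list."""
    for is_truthy, run in groupby(items, key=bool):
        if is_truthy:
            yield list(run)

b = ['a', 'b', '', '', 'c', 'd', '', 'e', 'f', 'g', '', '', 'h']
for run in non_empty_runs(b):
    print(run)  # ['a', 'b'], then ['c', 'd'], ['e', 'f', 'g'], ['h']
```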
In Python I have a list of elements aList and a list of indices myIndices. Is there any way to retrieve, all at once, the items of aList at the positions listed in myIndices?
Example:
>>> aList = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> myIndices = [0, 3, 4]
>>> aList.A_FUNCTION(myIndices)
['a', 'd', 'e']
I don't know of any list method that does this, but you could use a list comprehension:
>>> [aList[i] for i in myIndices]
['a', 'd', 'e']
Definitely use a list comprehension, but here is a function-based alternative (list has no method that does this). It is admittedly an awkward use of itemgetter, and I'm posting it mostly for the sake of knowledge. Note that it returns a tuple:
>>> from operator import itemgetter
>>> a_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> my_indices = [0, 3, 4]
>>> itemgetter(*my_indices)(a_list)
('a', 'd', 'e')
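One caveat worth knowing: with a single index, itemgetter returns the bare item rather than a 1-tuple, and with no indices it raises TypeError. A small wrapper smooths this over; the name pick is hypothetical.

```python
from operator import itemgetter

def pick(seq, indices):
    """Return the items of seq at the given indices as a list (hypothetical helper)."""
    indices = list(indices)
    if not indices:
        return []  # itemgetter() with no arguments raises TypeError
    result = itemgetter(*indices)(seq)
    # itemgetter returns a bare item for one index, a tuple for several.
    return [result] if len(indices) == 1 else list(result)

a_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(itemgetter(3)(a_list))    # d
print(pick(a_list, [3]))        # ['d']
print(pick(a_list, [0, 3, 4]))  # ['a', 'd', 'e']
```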
Indexing by lists can be done in numpy. Convert your base list to a numpy array and then apply another list as an index:
>>> from numpy import array
>>> array(aList)[myIndices]
array(['a', 'd', 'e'], dtype='<U1')
If you need to, convert back to a plain Python list at the end with tolist():
>>> from numpy import array
>>> a = array(aList)[myIndices]
>>> a.tolist()
['a', 'd', 'e']
In some cases, for example when your data already lives in a numpy array, this can be more convenient than a list comprehension.
You could use map (wrapped in list() on Python 3, where map returns an iterator):
list(map(aList.__getitem__, myIndices))
or operator.itemgetter:
import operator
f = operator.itemgetter(*myIndices)
f(aList)
If you don't need a list with simultaneous access to all the elements, but just want to use the selected items one after another (or pass them to something that will), it's more efficient to use a generator expression rather than a list comprehension:
(aList[i] for i in myIndices)
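For instance, a generator expression can be fed straight into a consuming function such as str.join, so no intermediate list is built. The trade-off is that a generator can only be iterated once:

```python
aList = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
myIndices = [0, 3, 4]

# join consumes the generator one item at a time.
print(''.join(aList[i] for i in myIndices))  # ade

# A generator is exhausted after a single pass.
picked = (aList[i] for i in myIndices)
print(list(picked))  # ['a', 'd', 'e']
print(list(picked))  # []
```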
Alternatively, you could take a functional approach using map and a lambda function:
>>> list(map(lambda i: aList[i], myIndices))
['a', 'd', 'e']
I wasn't happy with these solutions, so I created a Flexlist class that simply extends the list class, and allows for flexible indexing by integer, slice or index-list:
class Flexlist(list):
    def __getitem__(self, keys):
        if isinstance(keys, (int, slice)):
            return list.__getitem__(self, keys)
        return [self[k] for k in keys]
Then, for your example, you could use it with:
aList = Flexlist(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
myIndices = [0, 3, 4]
vals = aList[myIndices]
print(vals) # ['a', 'd', 'e']
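A quick check of how the class handles each kind of key: ints and slices fall through to the normal list behaviour, while any other iterable (here a tuple) is treated as a collection of indices. Note that a slice comes back as a plain list rather than a Flexlist, because list.__getitem__ builds a list.

```python
class Flexlist(list):
    def __getitem__(self, keys):
        # Plain ints and slices keep normal list behaviour.
        if isinstance(keys, (int, slice)):
            return list.__getitem__(self, keys)
        # Anything else is treated as an iterable of indices.
        return [self[k] for k in keys]

aList = Flexlist(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print(aList[1])        # b
print(aList[-1])       # g
print(aList[1:3])      # ['b', 'c']
print(aList[(0, -1)])  # ['a', 'g']
```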