Keep strings that occur N times or more - python

I have a list that is
mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
And I used Counter from collections on this list to get the result:
from collections import Counter
counts = Counter(mylist)
#Counter({'a': 3, 'c': 2, 'b': 2, 'd': 1})
Now I want to subset this so that I have all elements that occur some number of times, for example: 2 times or more - so that the output looks like this:
['a', 'b', 'c']
This seems like it should be a simple task - but I have not found anything that has helped me so far.
Can anyone suggest somewhere to look? I am also not attached to using Counter if I have taken the wrong approach. I should note I am new to python so I apologise if this is trivial.

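[s for s, c in counts.items() if c >= 2]  # on Python 2, use counts.iteritems()
# => ['a', 'b', 'c'] (first-seen order on Python 3.7+; arbitrary order on older versions)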
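The comprehension above can be wrapped in a small helper so the threshold becomes a parameter. This is only a sketch: the name keep_at_least is made up for illustration, and it assumes Python 3.7+ so that Counter keeps first-seen order.

```python
from collections import Counter

def keep_at_least(items, n):
    """Return the distinct items that occur at least n times."""
    counts = Counter(items)
    # Counter is a dict subclass, so on Python 3.7+ iteration follows
    # the order in which each item first appeared in `items`.
    return [item for item, count in counts.items() if count >= n]

mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
print(keep_at_least(mylist, 2))  # ['a', 'b', 'c']
print(keep_at_least(mylist, 3))  # ['a']
```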

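Try this:
def get_duplicatesarrval(arrval):
    dup_array = arrval[:]
    # Remove one occurrence of every distinct value; whatever is left occurred at least twice.
    for i in set(arrval):
        dup_array.remove(i)
    return list(set(dup_array))
mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
print(get_duplicatesarrval(mylist))
Result (the order may vary, since it comes from a set):
['a', 'b', 'c']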

The usual way would be to use a list comprehension, as @Adaman does.
In the special case of 2 or more, you can also subtract one Counter from another:
>>> counts = Counter(mylist) - Counter(set(mylist))
>>> list(counts)
['a', 'b', 'c']
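The subtraction trick is not limited to 2: subtracting n - 1 from every count leaves only the keys that occur at least n times, because Counter subtraction drops non-positive results. A sketch under that assumption (the function name at_least_by_subtraction is hypothetical):

```python
from collections import Counter

def at_least_by_subtraction(items, n):
    counts = Counter(items)
    # A Counter holding n - 1 for every distinct item.
    threshold = Counter(dict.fromkeys(counts, n - 1))
    # Counter.__sub__ keeps only the entries whose result is positive.
    return list(counts - threshold)

mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
print(sorted(at_least_by_subtraction(mylist, 2)))  # ['a', 'b', 'c']
print(sorted(at_least_by_subtraction(mylist, 3)))  # ['a']
```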

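from itertools import groupby
mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
# groupby only groups adjacent equal items, so this relies on mylist already being grouped (e.g. sorted)
res = [i for i, j in groupby(mylist) if len(list(j)) >= 2]
print(res)
['a', 'b', 'c']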
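Since groupby only merges adjacent equal items, an input that is not already grouped needs sorting first. A minimal sketch, assuming the items are sortable (the helper name keep_at_least_sorted and the shuffled sample are made up for illustration):

```python
from itertools import groupby

def keep_at_least_sorted(items, n):
    # Sorting puts equal items next to each other so groupby sees one run per value.
    return [key for key, run in groupby(sorted(items))
            if sum(1 for _ in run) >= n]

shuffled = ['c', 'a', 'b', 'a', 'd', 'c', 'a', 'b']
print(keep_at_least_sorted(shuffled, 2))  # ['a', 'b', 'c']
```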

I think the answers above are better, but I believe this is the simplest method to understand:
mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
newlist = []
newlist.append(mylist[0])
for i in mylist:
    if i in newlist:
        continue
    else:
        newlist.append(i)
# Note: this keeps one copy of every distinct element (including 'd'); it does not filter by count.
print(newlist)
>>>['a', 'b', 'c', 'd']

Related

Group items if trailed by string [duplicate]

I have a list called list_of_strings that looks like this:
['a', 'b', 'c', 'a', 'd', 'c', 'e']
I want to split this list by a value (in this case c). I also want to keep c in the resulting split.
So the expected result is:
[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
Any easy way to do this?

You can use more_itertools+ to accomplish this simply and clearly:
from more_itertools import split_after
lst = ["a", "b", "c", "a", "d", "c", "e"]
list(split_after(lst, lambda x: x == "c"))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
Another example, here we split words by simply changing the predicate:
lst = ["ant", "bat", "cat", "asp", "dog", "carp", "eel"]
list(split_after(lst, lambda x: x.startswith("c")))
# [['ant', 'bat', 'cat'], ['asp', 'dog', 'carp'], ['eel']]
+ A third-party library that implements itertools recipes and more. Install it with: pip install more_itertools

stuff = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
You can find out the indices with 'c' like this, and add 1 because you'll be splitting after it, not at its index:
indices = [i + 1 for i, x in enumerate(stuff) if x == 'c']
Then extract slices like this:
split_stuff = [stuff[i:j] for i, j in zip([0] + indices, indices + [None])]
The zip gives you a list of tuples analogous to (indices[i], indices[i + 1]), with the concatenated [0] allowing you to extract the first part and [None] extracting the last slice (stuff[i:])
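The two steps above can be combined into one function. This is a sketch rather than the answerer's code: the name split_after_value is hypothetical, and dropping an empty final piece (when the list ends with the separator) is an added assumption.

```python
def split_after_value(seq, value):
    # Positions just past each occurrence of `value`.
    cuts = [i + 1 for i, x in enumerate(seq) if x == value]
    pieces = [seq[i:j] for i, j in zip([0] + cuts, cuts + [None])]
    # If seq ends with `value` (or is empty), the last slice is empty; drop it.
    return [piece for piece in pieces if piece]

stuff = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(split_after_value(stuff, 'c'))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```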

You could try something like the following:
list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
output = [[]]
for x in list_of_strings:
    output[-1].append(x)
    if x == 'c':
        output.append([])
Though it should be noted that this will append an empty list to your output if your input's last element is 'c'
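One way to avoid that trailing empty list is to collect the current group separately and only append it when it is non-empty. A minimal sketch (the function name split_keep is made up for illustration):

```python
def split_keep(items, sep):
    output = []
    current = []
    for x in items:
        current.append(x)
        if x == sep:
            # Close the group right after the separator.
            output.append(current)
            current = []
    # Only keep a final group if something followed the last separator.
    if current:
        output.append(current)
    return output

print(split_keep(['a', 'b', 'c', 'a', 'd', 'c', 'e'], 'c'))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
print(split_keep(['a', 'b', 'c'], 'c'))
# [['a', 'b', 'c']]
```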

def spliter(value, array):
    res = []
    while value in array:
        index = array.index(value)
        res.append(array[:index + 1])
        array = array[index + 1:]
    if array:
        # Append last elements
        res.append(array)
    return res
a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(spliter('b',a))
# [['a', 'b'], ['c', 'a', 'd', 'c', 'e']]
print(spliter('c',a))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

What about this. It should only iterate over the input once and some of that is in the index method, which is executed as native code.
def splitkeep(v, c):
    curr = 0
    try:
        nex = v.index(c)
        while True:
            yield v[curr: (nex + 1)]
            curr = nex + 1
            nex += v[curr:].index(c) + 1
    except ValueError:
        if v[curr:]: yield v[curr:]
print(list(splitkeep( ['a', 'b', 'c', 'a', 'd', 'c', 'e'], 'c')))
Result:
[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
I wasn't sure if you wanted to keep an empty list at the end of the result if the final value was the value you were splitting on. I made an assumption you wouldn't, so I put a condition in excluding the final value if it's empty.
This has the result that the input [] results in only [] when arguably it might result in [[]].
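Note that v[curr:].index(c) builds a new slice on every pass. list.index accepts an optional start position, so a variant can search in place instead. This is a sketch of that idea, not the answerer's code; the name splitkeep_no_copy is hypothetical.

```python
def splitkeep_no_copy(v, c):
    start = 0
    while True:
        try:
            # Search from `start` without copying the tail of the list.
            end = v.index(c, start)
        except ValueError:
            break
        yield v[start:end + 1]
        start = end + 1
    # Yield whatever follows the last separator, if anything.
    if start < len(v):
        yield v[start:]

print(list(splitkeep_no_copy(['a', 'b', 'c', 'a', 'd', 'c', 'e'], 'c')))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```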

How about this rather playful script:
a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
b = ''.join(a).split('c') # ['ab', 'ad', 'e']
c = [x + 'c' if i < len(b)-1 else x for i, x in enumerate(b)] # ['abc', 'adc', 'e']
d = [list(x) for x in c if x]
print(d) # [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
It can also handle beginnings and endings with a "c"
a = ['c', 'a', 'b', 'c', 'a', 'd', 'c', 'e', 'c']
d -> [['c'], ['a', 'b', 'c'], ['a', 'd', 'c'], ['e', 'c']]

list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
value = 'c'
new_list = []
temp_list = []
for item in list_of_strings:
    if item == value:  # compare by equality; `is` tests identity and is unreliable for strings
        temp_list.append(item)
        new_list.append(temp_list[:])
        temp_list.clear()
    else:
        temp_list.append(item)
if temp_list:
    new_list.append(temp_list)
print(new_list)

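You can try the snippet below, which uses more_itertools. Note that sliced cuts the list into fixed-length chunks, so this only works when the separator recurs at a regular interval, as it does here.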
>>> l = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
>>> from more_itertools import sliced
>>> list(sliced(l,l.index('c')+1))
Output is:
[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

Removing duplicate characters from a list in Python where the pattern repeats

I am monitoring a serial port that sends data that looks like this:
['','a','a','a','a','a','a','','b','b','b','b','b','b','b','b',
'','','c','c','c','c','c','c','','','','d','d','d','d','d','d','d','d',
'','','e','e','e','e','e','e','','','a','a','a','a','a','a',
'','','','b','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c',
'','','','d','d','d','d','d','d','','','e','e','e','e','e','e',
'','','a','a','a','a','a','a','','b','b','b','b','b','b','b','b',
'','','c','c','c','c','c','c','','','','d','d','d','d','d','d','d','d',
'','','e','e','e','e','e','e','','','a','a','a','a','a','a',
'','','','b','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c',
'','','','d','d','d','d','d','d','','','e','e','e','e','e','e','','']
I need to be able to convert this into:
['a','b','c','d','a','b','c','d','a','b','c','d','a','b','c','d']
So I'm removing duplicates and empty strings, but also retaining the number of times the pattern repeats itself.
I haven't been able to figure it out. Can someone help?

Here's a solution using a list comprehension and itertools.zip_longest: keep an element only if it's not an empty string, and not equal to the next element. You can use an iterator to skip the first element, to avoid the cost of slicing the list.
from itertools import zip_longest
def remove_consecutive_duplicates(lst):
    ahead = iter(lst)
    next(ahead, None)  # skip the first element; the default avoids StopIteration on an empty list
    return [x for x, y in zip_longest(lst, ahead) if x and x != y]
Usage:
>>> remove_consecutive_duplicates([1, 1, 2, 2, 3, 1, 3, 3, 3, 2])
[1, 2, 3, 1, 3, 2]
>>> remove_consecutive_duplicates(my_list)
['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd',
'e', 'a', 'b', 'c', 'd', 'e']
I'm assuming either that there are no duplicates separated by empty strings (e.g. 'a', '', 'a'), or that you don't want to remove such duplicates. If this assumption is wrong, then you should filter out the empty strings first:
>>> example = ['a', '', 'a']
>>> remove_consecutive_duplicates([ x for x in example if x ])
['a']

You can loop over the list and add the appropriate conditions. For the result you are expecting, you just need to check that the current element is non-empty and differs from the previous one:
current_sequence = ['','a','a','a','a','a','a','','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c','','','','d','d','d','d','d','d','d','d','','','e','e','e','e','e','e','','','a','a','a','a','a','a','','','','b','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c','','','','d','d','d','d','d','d','','','e','e','e','e','e','e','','','a','a','a','a','a','a','','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c','','','','d','d','d','d','d','d','d','d','','','e','e','e','e','e','e','','','a','a','a','a','a','a','','','','b','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c','','','','d','d','d','d','d','d','','','e','e','e','e','e','e','','']
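sequence_list = []
for x in range(len(current_sequence)):
    if current_sequence[x]:
        # x == 0 has no previous element; without this guard, index -1 would compare against the last element
        if x == 0 or current_sequence[x] != current_sequence[x-1]:
            sequence_list.append(current_sequence[x])
print(sequence_list)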

You need something like this:
li = ['','a','a','a','a','a','a','','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c','','','','d','d','d','d','d','d','d','d','','','e','e','e','e','e','e','','','a','a','a','a','a','a','','','','b','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c','','','','d','d','d','d','d','d','','','e','e','e','e','e','e','','','a','a','a','a','a','a','','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c','','','','d','d','d','d','d','d','d','d','','','e','e','e','e','e','e','','','a','a','a','a','a','a','','','','b','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c','','','','d','d','d','d','d','d','','','e','e','e','e','e','e','','']
new_li = []
e_ = ''
for e in li:
    if len(e) > 0 and e_ != e:
        new_li.append(e)
    e_ = e  # remember the previous element
print(new_li)
Output:
['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e']

You can use itertools.groupby. If your list is ll:
from itertools import groupby
ll = [i for i in ll if i]  # drop the empty strings first
out = []
for k, g in groupby(ll, key=lambda x: ord(x)):
    out.append(chr(k))
print(out)
# prints ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', ...

from itertools import groupby
from operator import itemgetter
# data <- your data
a = [k for k, v in groupby(data) if k] # approach 1
b = list(filter(bool, map(itemgetter(0), groupby(data)))) # approach 2
assert a == b
print(a)
Result:
['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e']
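Whether the empty strings are removed before or after grouping changes the result when the same value appears on both sides of a gap. The sketch below shows both behaviours on a small made-up sample; the function name collapse_runs and its drop_empty_first flag are hypothetical.

```python
from itertools import groupby

def collapse_runs(data, drop_empty_first=False):
    if drop_empty_first:
        # Removing '' first lets runs on either side of a gap merge.
        data = [x for x in data if x]
    # One key per run of equal adjacent items, skipping runs of ''.
    return [key for key, _ in groupby(data) if key]

sample = ['', 'a', 'a', '', 'a', 'b', 'b', '', '']
print(collapse_runs(sample))                         # ['a', 'a', 'b']
print(collapse_runs(sample, drop_empty_first=True))  # ['a', 'b']
```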

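Using set, you can remove the duplicates from the list. Note, however, that a set discards the original order and the repetition of the pattern, and the empty string remains as one of its elements: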
data = ['','a','a','a','a','a','a','','b','b','b','b','b','b','b','b',
'','','c','c','c','c','c','c','','','','d','d','d','d','d','d','d','d',
'','','e','e','e','e','e','e','','','a','a','a','a','a','a',
'','','','b','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c',
'','','','d','d','d','d','d','d','','','e','e','e','e','e','e',
'','','a','a','a','a','a','a','','b','b','b','b','b','b','b','b',
'','','c','c','c','c','c','c','','','','d','d','d','d','d','d','d','d',
'','','e','e','e','e','e','e','','','a','a','a','a','a','a',
'','','','b','b','b','b','b','b','b','b','b','','','c','c','c','c','c','c',
'','','','d','d','d','d','d','d','','','e','e','e','e','e','e','','']
print(set(data))

Python: How to update dictionary with step-index from list

I am a week-old Python learner. I would like to know the following. Let's say:
list = ["a", "A", "b", "B", "c", "C"]
I need to put them into a dictionary to get a result like this:
dict = {"a": "A", "b": "B", "c": "C"}
I tried to use list indices within dict.update({list[n::2]: list[n+1::2]}) and for n in range(0, (len(list)/2)).
I think I did something wrong. Please correct me.
Thank you in advance.

Try the following:
>>> lst = ['a', 'A', 'b', 'B', 'c', 'C']
>>> dct = dict(zip(lst[::2],lst[1::2]))
>>> dct
{'a': 'A', 'b': 'B', 'c': 'C'}
Explanation:
>>> lst[::2]
['a', 'b', 'c']
>>> lst[1::2]
['A', 'B', 'C']
>>> zip(lst[::2], lst[1::2])
# this actually gives a zip iterator which contains:
# [('a', 'A'), ('b', 'B'), ('c', 'C')]
>>> dict(zip(lst[::2], lst[1::2]))
# here each tuple is interpreted as key value pair, so finally you get:
{'a': 'A', 'b': 'B', 'c': 'C'}
NOTE: Don't give your variables the same names as Python built-ins such as list and dict.
A corrected version of your program would be:
lst = ['a', 'A', 'b', 'B', 'c', 'C']
dct = {}
for n in range(0, len(lst) // 2):
    # the n-th pair sits at positions 2*n (key) and 2*n + 1 (value)
    dct.update({lst[2 * n]: lst[2 * n + 1]})
print(dct)
Yours did not work because you used slices in each iteration, instead of accessing each individual element. lst[0::2] gives ['a', 'b', 'c'] and lst[1::2] gives ['A', 'B', 'C']. So for the first iteration, when n == 0 you are trying to update the dictionary with the pair ['a', 'b', 'c'] : ['A', 'B', 'C'] and you will get a type error as list can not be assigned as key to the dictionary as lists are unhashable.
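The failure described above can be reproduced in a few lines. This is only an illustrative sketch; the exact wording of the error message differs between Python versions, so it is printed rather than quoted here.

```python
lst = ['a', 'A', 'b', 'B', 'c', 'C']
dct = {}
error_message = None
try:
    # lst[0::2] is the list ['a', 'b', 'c'], which cannot be a dict key.
    dct.update({lst[0::2]: lst[1::2]})
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(dct)  # still empty, because the update never happened
```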

You can use dictionary comprehension like this:
>>> l = list("aAbBcCdD")
>>> l
['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D']
>>> { l[i] : l[i+1] for i in range(0,len(l),2)}
{'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}
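Another option that avoids index arithmetic is to zip a single iterator with itself, so that each step consumes a key and then a value. A minimal sketch, assuming the list has an even number of elements (a trailing unpaired element would be silently dropped):

```python
lst = ['a', 'A', 'b', 'B', 'c', 'C']

it = iter(lst)
# zip pulls from the same iterator twice per pair: first the key, then the value.
pairs = dict(zip(it, it))

print(pairs)  # {'a': 'A', 'b': 'B', 'c': 'C'}
```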

The code below should fit your question:
a = ["a", "A", "B","b", "c","C","d", "D"]
b = {}
for each in range(len(a)):
    if each % 2 == 0:
        b[a[each]] = a[each + 1]
print(b)

How can I copy each element of a list a distinct, specified number of times?

I am using Python 3 and I want to create a new list with the elements of the first list repeated as many times as the corresponding number in the second list.
For example:
char = ['a', 'b', 'c']
int = [2, 4, 3]
result = ['a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
Thanks, all, for the help.

One-liner solution
Iterate over both lists simultaneously with zip, and create sub-lists for each element with the correct length. Join them with itertools.chain:
from itertools import chain
list(chain(*([l]*n for l, n in zip(char, int))))
Output:
['a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
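A variant of the same idea uses itertools.repeat with chain.from_iterable, which avoids building the intermediate sub-lists and does not shadow the built-in int. The variable names chars and counts are chosen here for illustration.

```python
from itertools import chain, repeat

chars = ['a', 'b', 'c']
counts = [2, 4, 3]

# repeat(ch, n) yields ch exactly n times; chain.from_iterable flattens lazily.
result = list(chain.from_iterable(repeat(ch, n) for ch, n in zip(chars, counts)))

print(result)  # ['a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
```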

char = ['a', 'b', 'c']
ints = [2, 4, 3]
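Solution 1: Using numpy
import numpy as np
result = np.repeat(char, ints)  # returns a NumPy array; call .tolist() for a plain list
Solution 2: Pure python
result = []
for i, c in zip(ints, char):
    result.extend(c*i)  # c*i is a string such as 'aa'; extend adds its characters one by one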
Output:
['a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']

Using zip
Ex:
c = ['a', 'b', 'c']
intVal = [2, 4, 3]
result = []
for i, v in zip(c, intVal):
    result.extend(list(i*v))
print(result)
Output:
['a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']

With for loops, very basic:
results = list()
for k, i in enumerate(integers):  # integers is the list of repeat counts
    results_to_add = char[k]*i
    results.extend(results_to_add)

char = ['a', 'b', 'c']
rep = [2, 4, 3]
res = [c*i.split(",") for i,c in zip(char, rep )] # [['a', 'a'], ['b', 'b', 'b', 'b'], ['c', 'c', 'c']]
print([item for sublist in res for item in sublist]) # flattening the list
EDIT:
One-liner using itertools.chain:
from itertools import chain
print(list(chain(*[c*i.split(",") for (i,c) in zip(char, rep)])))
OUTPUT:
['a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']

One-liner using a list comprehension and sum(list_, []):
sum([[x]*y for x,y in zip(char_, int_)], [])
>>> char_ = ['a', 'b', 'c']
>>> int_ = [2, 4, 3]
>>> print(sum([[x]*y for x,y in zip(char_, int_)], []))
['a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
Alternative:
list(itertools.chain.from_iterable([[x]*y for x,y in zip(char_, int_)]))
In the timings below, the sum version looks slightly faster than itertools.chain for this small input:
>>> timeit.repeat(lambda:list(itertools.chain.from_iterable([[x]*y for x,y in zip(char_, int_)])), number = 1000000)
[1.2130177360377274, 1.115080286981538, 1.1174913379945792]
>>> timeit.repeat(lambda:sum([[x]*y for x,y in zip(char_, int_)], []), number = 1000000)
[1.0470570910256356, 0.9831087450147606, 0.9912429330288433]
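Those timings are for three-element lists. sum(..., []) creates a new list at every addition, so its cost grows quadratically with the number of sub-lists. A nested comprehension does the same job in a single pass; the sketch below uses a hypothetical helper name, repeat_each.

```python
def repeat_each(values, counts):
    # For each (value, count) pair, emit `value` once per step of range(count).
    return [value for value, count in zip(values, counts) for _ in range(count)]

char_ = ['a', 'b', 'c']
int_ = [2, 4, 3]
print(repeat_each(char_, int_))
# ['a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
```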
