I am trying to keep elements of a list that appear at least twice, and remove the elements that appear less than twice.
For example, my list can look like:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
I want to get a list with the letters that appear at least twice, i.e. the following list:
letters_appear_twice = ['a', 'b'].
But since this is part of a bigger program, I don't know exactly what my list looks like, only that I want to keep the letters that are repeated at least twice. For the sake of understanding, though, we can assume I know what the list looks like!
I have tried the following:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
for x in set(letters):
    if letters.count(x) > 2:
        while x in letters:
            letters.remove(x)
print(letters)
But this doesn't quite work like I want it to...
Thank you in advance for any help!
letters = ['a', 'a', 'b', 'b', 'b', 'c']
res = []
for x in set(letters):
    if letters.count(x) >= 2:
        res.append(x)
print(res)
Prints:
['b', 'a']
Using your code above, you can make a new list and append to it.
new_list = []
for x in set(letters):
    if letters.count(x) >= 2:
        new_list.append(x)
print(new_list)
Output
['b', 'a']
It's easier to create a new list than to manipulate the source list:
def letters_more_or_equal_to_k(letters, k):
    result = []
    for x in set(letters):
        if letters.count(x) >= k:
            result.append(x)
    result.sort()
    return result

def main():
    letters = ['a', 'a', 'b', 'b', 'b', 'c']
    k = 2
    result = letters_more_or_equal_to_k(letters, k)
    print(result)  # prints ['a', 'b']

if __name__ == "__main__":
    main()
If you don't mind losing the original order, here's one possible solution:
from collections import Counter
letters = ['a', 'a', 'b', 'b', 'b', 'c']
c = Counter(letters)
to_remove = {x for x, i in c.items() if i < 2}
result = list(set(letters) - to_remove)
print(result)
Output:
['a', 'b']
You can always sort the result later.
Because Counter tallies every element in a single pass, this avoids calling count() once per unique element, which matters as the list grows.
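For comparison, the Counter can also drive the filter directly, without the set difference (a minimal sketch of the same idea):
from collections import Counter

letters = ['a', 'a', 'b', 'b', 'b', 'c']
# keep only the letters whose total count is at least 2
result = sorted(x for x, n in Counter(letters).items() if n >= 2)
print(result)  # ['a', 'b']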
Say I have a list of options and I want to pick a certain number randomly.
In my case, say the options are in a list ['a', 'b', 'c', 'd', 'e'] and I want my script to return 3 elements.
However, there is also the case of two options that cannot appear at the same time. That is, if option 'a' is picked randomly, then option 'b' cannot be picked. And the same applies the other way round.
So valid outputs are: ['a', 'c', 'd'] or ['c', 'd', 'b'], while things like ['a', 'b', 'c'] would not because they contain both 'a' and 'b'.
To fulfil these requirements, I am fetching 3 options plus another one to compensate for a possible discard. Then I keep a set() with the mutually exclusive pair, remove items from it as they are picked, and check whether both elements have appeared:
import random
mutually_exclusive = set({'a', 'b'})
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
shuffled_options = random.sample(options, num_options_to_return + 1)
elements_returned = 0
for item in shuffled_options:
    if elements_returned >= num_options_to_return:
        break

    if item in mutually_exclusive:
        mutually_exclusive.remove(item)
        if not mutually_exclusive:
            # if both elements have appeared, then the set is empty so we cannot return the current value
            continue

    print(item)
    elements_returned += 1
However, I may be overcoding and Python may have better ways to handle these requirements. Going through random's documentation I couldn't find ways to do this out of the box. Is there a better solution than my current one?
One way to do this is to use itertools.combinations to create all of the possible results, filter out the invalid ones, and make a random.choice from that:
>>> from itertools import combinations
>>> from random import choice
>>> def is_valid(t):
...     return 'a' not in t or 'b' not in t
...
>>> choice([
...     t
...     for t in combinations('abcde', 3)
...     if is_valid(t)
... ])
('c', 'd', 'e')
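If there were several mutually exclusive pairs, is_valid could be generalized along these lines (a sketch; the pairs parameter and its default are my own illustration, not from the answer):
# hypothetical generalization: a combination is valid only if it does not
# contain both members of any mutually exclusive pair
def is_valid(t, pairs=(('a', 'b'),)):
    return not any(x in t and y in t for x, y in pairs)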
Maybe a bit naive, but you could generate samples until your condition is met:
import random
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
mutually_exclusive = set({'a', 'b'})
while True:
    shuffled_options = random.sample(options, num_options_to_return)
    if sum(item in mutually_exclusive for item in shuffled_options) < 2:
        break
print(shuffled_options)
You can restructure your options.
import random
options = [('a', 'b'), 'c', 'd', 'e']
n_options = 3
selected_option = random.sample(options, n_options)
result = [item if not isinstance(item, tuple) else random.choice(item)
          for item in selected_option]
print(result)
I would implement it with sets:
import random
mutually_exclusive = {'a', 'b'}
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
while True:
    s = random.sample(options, num_options_to_return)
    print('Sample is', s)
    if not mutually_exclusive.issubset(s):
        break
    print('Discard!')
print('Final sample:', s)
Prints (for example):
Sample is ['a', 'b', 'd']
Discard!
Sample is ['b', 'a', 'd']
Discard!
Sample is ['e', 'a', 'c']
Final sample: ['e', 'a', 'c']
I created the below function and I think it's worth sharing too ;-)
def random_picker(options, n, mutually_exclusives=None):
    if mutually_exclusives is None:
        return random.sample(options, n)
    elif any(len(pair) != 2 for pair in mutually_exclusives):
        raise ValueError('Length of pairs of the mutually_exclusives iterable must be 2')
    res = []
    while len(res) < n:
        item_index = random.randint(0, len(options) - 1)
        item = options[item_index]
        if any(item == exc and pair[-(i - 1)] in res
               for pair in mutually_exclusives
               for i, exc in enumerate(pair)):
            continue
        res.append(options.pop(item_index))
    return res
Where:
options is the list of available options to pick from.
n is the number of items you want to be picked from options.
mutually_exclusives is an iterable containing tuples of pairs of mutually exclusive items.
You can use it as follows:
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3)
['c', 'e', 'a']
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3, [('a', 'b')])
['d', 'b', 'e']
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3, [('a', 'b'), ('a', 'c')])
['e', 'd', 'a']
import random
l = [['a','b'], ['c'], ['d'], ['e']]
x = [random.choice(i) for i in random.sample(l,3)]
Here l is a two-dimensional list, where the first level reflects an and relation and the second level an or relation.
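The same structure extends to more than one exclusive group; for example (a sketch with a second, made-up exclusive pair 'd'/'f'):
import random

# hypothetical second exclusive pair ('d', 'f') added for illustration
l = [['a', 'b'], ['c'], ['d', 'f'], ['e']]
x = [random.choice(i) for i in random.sample(l, 3)]
print(x)  # e.g. ['b', 'e', 'd'] -- never both 'a' and 'b', nor both 'd' and 'f'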
Rationale: I start a script with a list that contains multiple heaps. At every step, I want to "join" heaps, but keep both joined indexes pointing to the same element.
In short, given a list of mutable elements, such that list[a] is list[b] returns True, how can I assign (list[a] = [new]) while keeping list[a] and list[b] references to the same mutable container?
So for example, if each heap was represented by a letter, we would have
t is ['a', 'b', 'c', 'd']
t is ['ac', 'b', 'ac', 'd'] (0 and 2 merge)
t is ['abc', 'abc', 'abc', 'd'] (0 and 1 merge, but since 0 also refers to 2, it is as if 0/2 and 1 merged... into 0/1/2)
At this stage, if I did a set(map(id, t)), I would expect it to only have TWO elements.
My problem is that I cannot seem to affect the object being pointed to directly, so I have to iterate over the entire list, picking out any ids that match either merge index, and assign directly.
Is there any way to change the underlying object rather than all pointers to it?
Full Example of desired behavior:
>>> my_list = [['a'], ['b'], ['c']]
>>> merge(my_list, 0, 2) # mutates, returns None
>>> my_list
[['a', 'c'], ['b'], ['a', 'c']]
>>> my_list[0] is my_list[2]
True
>>> merge(my_list, 0, 1)
>>> my_list
[['a', 'c', 'b'], ['a', 'c', 'b'], ['a', 'c', 'b']]
>>> my_list[0] is my_list[1]
True
>>> my_list[1] is my_list[2]
True
The problem is that, if within merge I simply call
my_list[arg1] = my_list[arg2] = my_list[arg1]+my_list[arg2]
it ONLY affects the entries at arg1 and arg2. I want it to affect any other entries that may point to the elements at either my_list[arg1] or my_list[arg2], so that eventually my_list is just a collection of pointers to the same big heap, that has absorbed all the little ones.
This is as close as you are going to get:
def merge(primary, first, second):
    primary[first] += primary[second]
    primary[second] = primary[first]
first = ['a']
second = ['b']
third = ['c']
main = [first, second, third]
print(main)
merge(main, 0, 2)
print(main)
assert main[0] is main[2]
merge(main, 0, 1)
print(main)
assert main[1] is main[2]
print(first, second, third)
and the printout:
[['a'], ['b'], ['c']]
[['a', 'c'], ['b'], ['a', 'c']]
[['a', 'c', 'b'], ['a', 'c', 'b'], ['a', 'c', 'b']]
(['a', 'c', 'b'], ['b'], ['c'])
As you can see, the list elements all end up being the same object, but the lists that went into the main list are not.
edit
Unfortunately, this also fails if you don't include the first list element in every merge.
So there are no short-cuts here. If you want every element to either be the same element, or just be equal, you are going to have to traverse the list to get it done:
def merge(primary, first, second):
    targets = primary[first], primary[second]
    result = primary[first] + primary[second]
    for index, item in enumerate(primary):
        if item in targets:
            primary[index] = result
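With this traversal-based merge, a quick check (my own test calls, not part of the answer) shows every slot ending up as the same object, even when two already-merged groups are combined:
main = [['a'], ['b'], ['c'], ['d']]
merge(main, 0, 2)   # -> [['a', 'c'], ['b'], ['a', 'c'], ['d']]
merge(main, 1, 3)   # -> [['a', 'c'], ['b', 'd'], ['a', 'c'], ['b', 'd']]
merge(main, 0, 1)   # merges the two existing groups
print(main)
# [['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd']]
print(len(set(map(id, main))))  # 1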
Chain assignment:
>>> my_list = [['a'], ['b'], ['c']]
>>> my_list[0] = my_list[2] = my_list[0]+my_list[2]
>>> my_list
[['a', 'c'], ['b'], ['a', 'c']]
>>> my_list[0] += [1,2,3]
>>> my_list
[['a', 'c', 1, 2, 3], ['b'], ['a', 'c', 1, 2, 3]]
>>>
If you do a = c = a + c first, it produces a == c == list_object_001; when you then do a = b = a + b, it produces a == b == list_object_002, while c still points at list_object_001, so it goes wrong.
You can't do this even with a pointer if you don't take care of the order of assignment. I think you should maintain a map to get things right.
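A minimal sketch of that map idea (the names groups and slot_to_group are my own, not from the comment): keep one canonical list per group and a group id per slot, so a merge only ever touches the map.
groups = {0: ['a'], 1: ['b'], 2: ['c']}   # group id -> heap contents
slot_to_group = [0, 1, 2]                 # list index -> group id

def merge(i, j):
    gi, gj = slot_to_group[i], slot_to_group[j]
    if gi == gj:
        return
    groups[gi] += groups[gj]
    # repoint every slot that referenced group gj at group gi
    for idx, g in enumerate(slot_to_group):
        if g == gj:
            slot_to_group[idx] = gi
    del groups[gj]

merge(0, 2)
merge(0, 1)
print([groups[g] for g in slot_to_group])
# [['a', 'c', 'b'], ['a', 'c', 'b'], ['a', 'c', 'b']] -- all three are the same list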
As far as I can see, I don't think it's possible to do this using the mutability of Python lists. If you have elements X and Y, both of which are already amalgamations of other elements, and you want to merge them, list mutability will not be able to telegraph the change to both sides. You can make X, Y and all the elements pointing at the same object as X be the same, or you can make X, Y and all the elements pointing at the same object as Y be the same, but not both.
I think the best bet is to define a custom object to represent the elements, which has a notion of parent nodes and ultimate parent nodes.
class Nodes(object):
    def __init__(self, input1):
        self.parent = None
        self.val = [input1]

    def ultimate_parent(self):
        if self.parent:
            return self.parent.ultimate_parent()
        else:
            return self

    def merge(self, child):
        self.ultimate_parent().val += child.ultimate_parent().val
        child.ultimate_parent().parent = self.ultimate_parent()

    def __repr__(self):
        return str(self.ultimate_parent().val)

def merge(node_list, i, j):
    node_list[i].merge(node_list[j])

list1 = [Nodes(x) for x in 'abcd']
print(list1)
merge(list1, 0, 2)
print(list1)
merge(list1, 1, 3)
print(list1)
merge(list1, 0, 1)
print(list1)
This will output:
[['a'], ['b'], ['c'], ['d']]
[['a', 'c'], ['b'], ['a', 'c'], ['d']]
[['a', 'c'], ['b', 'd'], ['a', 'c'], ['b', 'd']]
[['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd'], ['a', 'c', 'b', 'd']]
My data look like this:
let = ['a', 'b', 'a', 'c', 'a']
How do I remove the duplicates? I want my output to be something like this:
['b', 'c']
When I use the set function, I get:
set(['a', 'c', 'b'])
This is not what I want.
One option would be (as derived from Ritesh Kumar's answer here)
let = ['a', 'b', 'a', 'c', 'a']
onlySingles = [x for x in let if let.count(x) < 2]
which gives
>>> onlySingles
['b', 'c']
Try this,
>>> let
['a', 'b', 'a', 'c', 'a']
>>> dict.fromkeys(let).keys()
['a', 'c', 'b']
>>>
Sort the input, then removing duplicates becomes trivial:
data = ['a', 'b', 'a', 'c', 'a']

def uniq(data):
    last = None
    result = []
    for item in data:
        if item != last:
            result.append(item)
            last = item
    return result

print(uniq(sorted(data)))
# prints ['a', 'b', 'c']
This is basically the shell's cat data | sort | uniq idiom.
The cost is O(N * log N), same as with a tree-based set.
Instead of sorting, or linearly re-scanning the list to count each item's occurrences, count all the occurrences once and then filter on the items that appear exactly once:
>>> from collections import Counter
>>> let = ['a', 'b', 'a', 'c', 'a']
>>> [k for k, v in Counter(let).items() if v == 1]
['c', 'b']
You have to look at the sequence at least once regardless - although it makes sense to limit the number of times you do so.
If you really want to avoid any kind of set or otherwise hashed container (because you perhaps can't use them?), then yes, you can sort it, then use:
>>> from itertools import groupby, islice
>>> [k for k,v in groupby(sorted(let)) if len(list(islice(v, 2))) == 1]
['b', 'c']
I wrote a function to create a nested list.
For example:
input = ['a','b','c','','d','e','f','g','','d','s','d','a','']
I want each '' to end the current sublist.
As a result, I want a nested list like:
[['a','b','c'],['d','e','f','g'],['d','s','d','a']]
Try the following implementation
>>> def foo(inlist, delim=''):
        start = 0
        try:
            while True:
                stop = inlist.index(delim, start)
                yield inlist[start:stop]
                start = stop + 1
        except ValueError:
            # in case '' is not the final delimiter
            if start < len(inlist):
                yield inlist[start:]
            return
>>> list(foo(inlist))
[['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]
Another possible implementation uses itertools.groupby, but then you have to filter the result to remove the [''] groups. Though it may look like a one-liner, the implementation above is more Pythonic, as it is intuitive and readable.
>>> from itertools import ifilter, groupby
>>> list(ifilter(lambda e: '' not in e,
...              (list(v) for k, v in groupby(inlist, key=lambda e: e == ''))))
[['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]
I'd use itertools.groupby:
l = ['a','b','c','','d','e','f','g','','d','s','d','a','']
from itertools import groupby
[list(g) for k, g in groupby(l, bool) if k]
gives
[['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]
def nester(nput):
    out = [[]]
    for n in nput:
        if n == '':
            out.append([])
        else:
            out[-1].append(n)
    if out[-1] == []:
        out = out[:-1]
    return out
Edited to add a check for an empty list at the end.
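A quick check against the example input from the question (my own test call):
data = ['a', 'b', 'c', '', 'd', 'e', 'f', 'g', '', 'd', 's', 'd', 'a', '']
print(nester(data))
# [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]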