I am trying to understand the process of creating a function that can replace duplicate strings in a list of strings. for example, I want to convert this list
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
to this
mylist = ['a', 'b', 'x', 'x', 'c', 'x']
initially, I know I need create my function and iterate through the list
def replace(foo):
newlist= []
for i in foo:
if foo[i] == foo[i+1]:
foo[i].replace('x')
return foo
However, I know there are two problems with this. the first is that I get an error stating
list indices must be integers or slices, not str
so I believe I should instead be operating on the range of this list, but I'm not sure how to implement it. The other being that this would only help me if the duplicate letter comes directly after my iteration (i).
Unfortunately, that's as far as my understanding of the problem reaches. If anyone can provide some clarification on this procedure for me, I would be very grateful.
Go through the list, and keep track of what you've seen in a set. Replace things you've seen before in the list with 'x':
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
seen = set()
for i, e in enumerate(mylist):
if e in seen:
mylist[i] = 'x'
else:
seen.add(e)
print(mylist)
# ['a', 'b', 'x', 'x', 'c', 'x']
Simple Solution.
my_list = ['a', 'b', 'b', 'a', 'c', 'a']
new_list = []
for i in range(len(my_list)):
if my_list[i] in new_list:
new_list.append('x')
else:
new_list.append(my_list[i])
print(my_list)
print(new_list)
# output
#['a', 'b', 'b', 'a', 'c', 'a']
#['a', 'b', 'x', 'x', 'c', 'x']
The other solutions use indexing, which isn't necessarily required.
Really simply, you could check if the value is in the new list, else you can append x. If you wanted to use a function:
old = ['a', 'b', 'b', 'a', 'c']
def replace_dupes_with_x(l):
tmp = list()
for char in l:
if char in tmp:
tmp.append('x')
else:
tmp.append(char)
return tmp
new = replace_dupes_with_x(old)
You can use the following solution:
from collections import defaultdict
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
ret, appear = [], defaultdict(int)
for c in mylist:
appear[c] += 1
ret.append(c if appear[c] == 1 else 'x')
Which will give you:
['a', 'b', 'x', 'x', 'c', 'x']
Related
I am trying to insert an element in the list at multiple instances. But by doing this, the length of the list is constantly changing. So, it is not reaching the last element.
my_list = ['a', 'b', 'c', 'd', 'e', 'a']
aq = len(my_list)
for i in range(aq):
if my_list[i] == 'a':
my_list.insert(i+1, 'g')
aq = aq+1
print(my_list)
The output I am getting is -
['a', 'g', 'b', 'c', 'd', 'e', 'a']
The output I am trying to get is -
['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
How can I get that?
Changing aq in the loop does not change the range. That created an iterator when you entered the loop, and that iterator won't change. There are two ways to do this. The easy way is to build a new list:
newlist = []
for c in my_list:
newlist.append(c)
if c == 'a':
newlist.append('g')
The trickier way is to use .find() to find the next instance of 'a' and insert a 'g' after it, then keep searching for the next one.
Here is a nice way to write it using the built-in itertools.chain.from_iterable:
from itertools import chain
my_list = ['a', 'b', 'c', 'd', 'e', 'a']
my_list = list(chain.from_iterable((x, "g") if x == "a" else x for x in my_list))
# ['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
Here, every occurance of "a" is replaced with "a", "g" in the list, otherwise the elements are left alone.
Say I have a list of options and I want to pick a certain number randomly.
In my case, say the options are in a list ['a', 'b', 'c', 'd', 'e'] and I want my script to return 3 elements.
However, there is also the case of two options that cannot appear at the same time. That is, if option 'a' is picked randomly, then option 'b' cannot be picked. And the same applies the other way round.
So valid outputs are: ['a', 'c', 'd'] or ['c', 'd', 'b'], while things like ['a', 'b', 'c'] would not because they contain both 'a' and 'b'.
To fulfil these requirements, I am fetching 3 options plus another one to compensate a possible discard. Then, I keep a set() with the mutually exclusive condition and keep removing from it and check if both elements have been picked or not:
import random
mutually_exclusive = set({'a', 'b'})
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
shuffled_options = random.sample(options, num_options_to_return + 1)
elements_returned = 0
for item in shuffled_options:
if elements_returned >= num_options_to_return:
break
if item in mutually_exclusive:
mutually_exclusive.remove(item)
if not mutually_exclusive:
# if both elements have appeared, then the set is empty so we cannot return the current value
continue
print(item)
elements_returned += 1
However, I may be overcoding and Python may have better ways to handle these requirements. Going through random's documentation I couldn't find ways to do this out of the box. Is there a better solution than my current one?
One way to do this is use itertools.combinations to create all of the possible results, filter out the invalid ones and make a random.choice from that:
>>> from itertools import combinations
>>> from random import choice
>>> def is_valid(t):
... return 'a' not in t or 'b' not in t
...
>>> choice([
... t
... for t in combinations('abcde', 3)
... if is_valid(t)
... ])
...
('c', 'd', 'e')
Maybe a bit naive, but you could generate samples until your condition is met:
import random
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
mutually_exclusive = set({'a', 'b'})
while True:
shuffled_options = random.sample(options, num_options_to_return)
if all (item not in mutually_exclusive for item in shuffled_options):
break
print(shuffled_options)
You can restructure your options.
import random
options = [('a', 'b'), 'c', 'd', 'e']
n_options = 3
selected_option = random.sample(options, n_options)
result = [item if not isinstance(item, tuple) else random.choice(item)
for item in selected_option]
print(result)
I would implement it with sets:
import random
mutually_exclusive = {'a', 'b'}
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
while True:
s = random.sample(options, num_options_to_return)
print('Sample is', s)
if not mutually_exclusive.issubset(s):
break
print('Discard!')
print('Final sample:', s)
Prints (for example):
Sample is ['a', 'b', 'd']
Discard!
Sample is ['b', 'a', 'd']
Discard!
Sample is ['e', 'a', 'c']
Final sample: ['e', 'a', 'c']
I created the below function and I think it's worth sharing it too ;-)
def random_picker(options, n, mutually_exclusives=None):
if mutually_exclusives is None:
return random.sample(options, n)
elif any(len(pair) != 2 for pair in mutually_exclusives):
raise ValueError('Lenght of pairs of mutually_exclusives iterable, must be 2')
res = []
while len(res) < n:
item_index = random.randint(0, len(options) - 1)
item = options[item_index]
if any(item == exc and pair[-(i - 1)] in res for pair in mutually_exclusives
for i, exc in enumerate(pair)):
continue
res.append(options.pop(item_index))
return res
Where:
options is the list of available options to pick from.
n is the number of items you want to be picked from options
mutually_exclusives is an iterable containing tuples pairs of mutually exclusive items
You can use it as follows:
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3)
['c', 'e', 'a']
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3, [('a', 'b')])
['d', 'b', 'e']
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3, [('a', 'b'), ('a', 'c')])
['e', 'd', 'a']
import random
l = [['a','b'], ['c'], ['d'], ['e']]
x = [random.choice(i) for i in random.sample(l,3)]
here l is a two-dimensional list, where the fist level reflects an and relation and the second level an or relation.
My data look like this:
let = ['a', 'b', 'a', 'c', 'a']
How do I remove the duplicates? I want my output to be something like this:
['b', 'c']
When I use the set function, I get:
set(['a', 'c', 'b'])
This is not what I want.
One option would be (as derived from Ritesh Kumar's answer here)
let = ['a', 'b', 'a', 'c', 'a']
onlySingles = [x for x in let if let.count(x) < 2]
which gives
>>> onlySingles
['b', 'c']
Try this,
>>> let
['a', 'b', 'a', 'c', 'a']
>>> dict.fromkeys(let).keys()
['a', 'c', 'b']
>>>
Sort the input, then removing duplicates becomes trivial:
data = ['a', 'b', 'a', 'c', 'a']
def uniq(data):
last = None
result = []
for item in data:
if item != last:
result.append(item)
last = item
return result
print uniq(sorted(data))
# prints ['a', 'b', 'c']
This is basically the shell's cat data | sort | uniq idiom.
The cost is O(N * log N), same as with a tree-based set.
Instead of sorting, or linearly scanning and re-counting the main list for its occurrences each time.
Count the number of occurrences and then filter on items that appear once...
>>> from collections import Counter
>>> let = ['a', 'b', 'a', 'c', 'a']
>>> [k for k, v in Counter(let).items() if v == 1]
['c', 'b']
You have to look at the sequence at least once regardless - although it makes sense to limit the amount of times you do so.
If you really want to avoid any type or set or otherwise hashed container (because you perhaps can't use them?), then yes, you can sort it, then use:
>>> from itertools import groupby, islice
>>> [k for k,v in groupby(sorted(let)) if len(list(islice(v, 2))) == 1]
['b', 'c']
If I have a list like so:
List = ['a', 'b', 'c', 'd', 'e',]
what is the simplest way to get a Boolean answer if, I for instance asked it if 'g' was inside this list?
print 'g' in ['a', 'b', 'c', 'd']
l = ['a','b','c','d']
if 'g' in l:
print True
List = ['a', 'b', 'c', 'd', 'e']
def inlist (lst, character):
if character in lst and type(lst) is list:
return True
else:
return False
print inlist(List, 'g')
As you expect, this prints: False
NOTE: try to name your lists other than List, as that can cause some confusion when reading.
I wrote a function to create a nested list.
For example:
input= ['a','b','c','','d','e','f','g','','d','s','d','a','']
I want to create a sublist before ''
As a return I want a nested list like:
[['a','b','c'],['d','e','f','g'],['d','s','d','a']]
Try the following implementation
>>> def foo(inlist, delim = ''):
start = 0
try:
while True:
stop = inlist.index(delim, start)
yield inlist[start:stop]
start = stop + 1
except ValueError:
# if '' may not be the end delimiter
if start < len(inlist):
yield inlist[start:]
return
>>> list(foo(inlist))
[['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]
Another possible implementation could be by itertools.groupby. But then you have to filter the result to remove the ['']. But though it might look to be one-liner yet the above implementation is more pythonic as its intuitive and readable
>>> from itertools import ifilter, groupby
>>> list(ifilter(lambda e: '' not in e,
(list(v) for k,v in groupby(inlist, key = lambda e:e == ''))))
[['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]
I'd use itertools.groupby:
l = ['a','b','c','','d','e','f','g','','d','s','d','a','']
from itertools import groupby
[list(g) for k, g in groupby(l, bool) if k]
gives
[['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]
def nester(nput):
out = [[]]
for n in nput:
if n == '':
out.append([])
else:
out[-1].append(n)
if out[-1] == []:
out = out[:-1]
return out
edited to add check for empty list at end