List with duplicated values and suffix - python

I have a list, a:
a = ['a','b','c']
and need to duplicate some values with the suffix _ind added this way (order is important):
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
I tried:
b = [[x, x + '_ind'] for x in a]
c = [item for sublist in b for item in sublist]
print(c)
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
Is there some better, more pythonic solution?

You could make it a generator:
def mygen(lst):
    for item in lst:
        yield item
        yield item + '_ind'
>>> a = ['a','b','c']
>>> list(mygen(a))
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
You could also do it with itertools.product, itertools.starmap, itertools.chain, or nested comprehensions, but in most cases I would prefer a simple, easy-to-understand custom generator function.
With Python 3.3 and later, you can also use yield from (generator delegation) to make this solution a bit more concise:
def mygen(lst):
    for item in lst:
        yield from (item, item + '_ind')

It can be shortened a little bit by moving the options to the inner for loop in the list comprehension:
a = ['a','b','c']
[item for x in a for item in (x, x + '_ind')]
# ['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']

Another alternative, using extended slice assignment (Python 2.x and 3.x):
result = [None] * len(a) * 2
result[::2], result[1::2] = a, map(lambda x: x + '_ind', a)
result
# ['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']

You can use itertools.chain():
import itertools
l = ['a','b','c']
new_list = list(itertools.chain.from_iterable([[i, i+"_ind"] for i in l]))
print(new_list)
Output:
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
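A small variation on the chain approach, sketched under the assumption that the intermediate list of lists is not needed: passing a generator expression of tuples to itertools.chain.from_iterable gives the same result without building the inner lists first. The names l and new_list follow the answer above.

```python
from itertools import chain

l = ['a', 'b', 'c']

# Each element produces a (value, value + suffix) pair; chain.from_iterable
# flattens the pairs lazily, so the original order is preserved.
new_list = list(chain.from_iterable((i, i + "_ind") for i in l))
print(new_list)
```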

Before list comprehensions and generators were invented/became widespread, people used to think much simpler*:
>>> a = ['a', 'b', 'c']
>>> b = []
>>> for x in a: b.extend([x, x+'_ind'])
...
>>> b
['a', 'a_ind', 'b', 'b_ind', 'c', 'c_ind']
* I don't mean that those constructs/tools are evil, just wanted to point out that there is a simple solution.

Since you asked for "simple", I thought I'd throw this in (albeit, maybe not the pythonic way):
mylist1 = []
for i in mylist:
    mylist1.append(i)
    mylist1.append(i + '_ind')

Related

Remove NOT duplicates value from list

The scenario is something like this:
After joining several lists using:
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
mainlist = list1 + list2 + list3
mainlist.sort()
mainlist now looks like that:
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']
I would like to remove every value that is not duplicated. If a value appears more than once in mainlist, it must be left untouched; if it appears only once, I would like to delete it.
I tried this approach, but something isn't working:
for i in mainlist:
    if mainlist.count(i) <= 1:
        mainlist.remove(i)
    else:
        continue
but what I get back is a list that looks like this:
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'E'] # value "D" is no longer present. Why?
What I would like to get is a list like this:
mainlist = ['A', 'A', 'B', 'B', 'C', 'C'] #All values NOT duplicates have been deleted
I can delete the duplicates with the below code:
for i in mainlist:
    if mainlist.count(i) > 1:
        mainlist.remove(i)
    else:
        continue
and then as a final result:
mainlist = ['A','B','C']
But the real question is: how can I delete the non-duplicates in a list?
You can find duplicates like this:
duplicates = [item for item in mainlist if mainlist.count(item) > 1]
You can use collections.Counter() to keep track of the frequencies of each item:
from collections import Counter
counts = Counter(mainlist)
[item for item in mainlist if counts[item] > 1]
This outputs:
['A', 'A', 'B', 'B', 'C', 'C']
Use collections.Counter to count the list elements. Use list comprehension to keep only the elements that occur more than once. Note that the list does not have to be sorted.
from collections import Counter
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
mainlist = list1 + list2 + list3
cnt = Counter(mainlist)
print(cnt)
# Counter({'A': 2, 'B': 2, 'C': 2, 'D': 1, 'E': 1})
dups = [x for x in mainlist if cnt[x] > 1]
print(dups)
# ['A', 'B', 'A', 'B', 'C', 'C']
Another solution, using numpy:
import numpy as np
u, c = np.unique(mainlist, return_counts=True)
out = np.repeat(u[c > 1], c[c > 1])
print(out)
Prints:
['A' 'A' 'B' 'B' 'C' 'C']
Your problem is that you modify the list while iterating over it. After "D" is removed, "E" shifts down to index 6, which the loop has already visited, so the loop stops without ever checking "E".
Create a copy of the list and only operate on that list:
new_list = list(mainlist)
for i in mainlist:
    if mainlist.count(i) <= 1:
        new_list.remove(i)
    else:
        continue
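To see the skipped element directly, here is a minimal sketch that records which items the original loop actually visits (the scratch and visited lists are hypothetical helpers added for illustration). It is followed by one way to filter in place without a manual copy, assuming collections.Counter is acceptable.

```python
from collections import Counter

mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']

# Reproduce the original loop on a scratch copy and record each visited item.
scratch = list(mainlist)
visited = []
for i in scratch:
    visited.append(i)
    if scratch.count(i) <= 1:
        scratch.remove(i)

print(visited)  # 'E' never shows up: it slid into the slot the loop had just left
print(scratch)

# Build the filtered list first, then replace the contents in place.
counts = Counter(mainlist)
mainlist[:] = [x for x in mainlist if counts[x] > 1]
print(mainlist)
```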
If you want to output only a list of duplicate elements in your lists, you can use sets and a comprehension to keep only the duplicates.
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
fulllist = list1 + list2 + list3
fullset = set(list1) | set(list2) | set(list3)
dups = [x for x in fullset if fulllist.count(x) > 1]
print(dups) # e.g. ['A', 'C', 'B'] (set order is arbitrary)

How to remove elements from a list that appear less than k = 2?

I am trying to keep elements of a list that appear at least twice, and remove the elements that appear less than twice.
For example, my list can look like:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
I want to get a list with the letters that appear at least twice, that is, the following list:
letters_appear_twice = ['a', 'b']
But since this is part of a bigger code, I don't know exactly what my lists looks like, only that I want to keep the letters that are repeated at least twice. But for the sake of understanding, we can assume I know what the list looks like!
I have tried the following:
'''
letters = ['a', 'a', 'b', 'b', 'b', 'c']
for x in set(letters):
    if letters.count(x) > 2:
        while x in letters:
            letters.remove(x)
print(letters)
'''
But this doesn't quite work like I want it to...
Thank you in advance for any help!
letters = ['a', 'a', 'b', 'b', 'b', 'c']
res = []
for x in set(letters):
    if letters.count(x) >= 2:
        res.append(x)
print(res)
Prints:
['b', 'a']
Starting from your code above, you can make a new list and append to it:
new_list = []
for x in set(letters):
    if letters.count(x) >= 2:
        new_list.append(x)
print(new_list)
Output
['b', 'a']
It is easier to create a new list than to manipulate the source list:
def letters_more_or_equal_to_k(letters, k):
    result = []
    for x in set(letters):
        if letters.count(x) >= k:
            result.append(x)
    result.sort()
    return result

def main():
    letters = ['a', 'a', 'b', 'b', 'b', 'c']
    k = 2
    result = letters_more_or_equal_to_k(letters, k)
    print(result) # prints ['a', 'b']

if __name__ == "__main__":
    main()
If you don't mind shuffling the values, here's one possible solution:
from collections import Counter
letters = ['a', 'a', 'b', 'b', 'b', 'c']
c = Counter(letters)
to_remove = {x for x, i in c.items() if i < 2}
result = list(set(letters) - to_remove)
print(result)
Output:
['a', 'b']
You can always sort later.
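Because Counter tallies every element in a single pass, this solution tends to pay off for lists with more than ~10 unique elements, compared with calling letters.count() once per unique value.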
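If the order of first appearance matters, a possible variation (assuming Python 3.7 or later, where Counter, as a dict subclass, keeps insertion order) is to filter the counter itself. The function name keep_at_least and its parameter k are made up for this sketch.

```python
from collections import Counter

def keep_at_least(items, k=2):
    """Return the distinct items that occur at least k times, in first-seen order."""
    counts = Counter(items)  # one pass over the list
    return [item for item, n in counts.items() if n >= k]

letters = ['a', 'a', 'b', 'b', 'b', 'c']
print(keep_at_least(letters))
```

No set difference is involved, so the result does not need to be sorted afterwards.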

Python - Removing duplicates elements while remain the index

I have two lists like:
x = ['A','A','A','B','B','C','C','C','D']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108','0109']
I want to remove the duplicate elements of the list, which can be done with the answer in Removing elements that have consecutive duplicates.
However, the output I expect is like
['A','B','C','D']
['0101','0104','0106','0109']
That is
For x, I want to remove the duplicate elements.
For list_date, I want to keep only the dates that correspond to the remaining elements in x.
Do you have any way to implement this?
2020-06-14 updated:
Thank you for the answers!
My data also has the case
y = ['A','A','A','B','B','C','C','C','D','A','A','C','C','B','B','B']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108','0109','0110','0111','0112','0113','0114','0115','0116']
The output should be
['A','B','C','D','A','C','B']
['0101','0104','0106','0109','0110','0112','0114']
How should I process the list like this?
You can use zip() to couple your data to your dates, use a loop and a set to remove dupes and zip() again to get single lists from it:
x = ['A','A','A','B','B','C','C','C','D']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108','0109']
r = []
k = zip(x,list_date)
s = set()
# go over the zipped values and remove dupes
for el in k:
    if el[0] in s:
        continue
    # not a dupe, add to result and set
    r.append(el)
    s.add(el[0])
data, dates = map(list, zip(*r))
print(data)
print(dates)
Output:
['A', 'B', 'C', 'D']
['0101', '0104', '0106', '0109']
See How to iterate through two lists in parallel?
Try this:
x = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D']
list_date = ['0101', '0102', '0103', '0104', '0105', '0106', '0107', '0108', '0109']
op = dict()
y = []
for i in range(len(x)):
    if x[i] not in y:
        y.append(x[i])
        op[x[i]] = list_date[i]
z = list(op.values())
print(y)
print(z)
Output
['A', 'B', 'C', 'D']
['0101', '0104', '0106', '0109']
You can use the zip function for this:
l = ['A','A','A','B','B','C','C','C','D']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108','0109']
new_l = []
new_list_date = []
for i, j in zip(l, list_date):
    if i not in new_l:
        new_l.append(i)
        new_list_date.append(j)
print(new_l)
#['A', 'B', 'C', 'D']
print(new_list_date)
#['0101', '0104', '0106', '0109']
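The answers above drop every repeat of a value, so they do not cover the updated case in the question, where a value may reappear later and should start a new entry. One possible way to handle that, offered as a sketch rather than the accepted method, is itertools.groupby, which groups only consecutive equal items; the first (value, date) pair of each run is kept. The variable names pairs, data and dates are assumptions for illustration.

```python
from itertools import groupby

y = ['A','A','A','B','B','C','C','C','D','A','A','C','C','B','B','B']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108',
             '0109','0110','0111','0112','0113','0114','0115','0116']

# groupby starts a new group whenever the key (the letter) changes,
# so next(group) is the first (letter, date) pair of each consecutive run.
pairs = [next(group) for _, group in groupby(zip(y, list_date), key=lambda p: p[0])]

data = [letter for letter, _ in pairs]
dates = [date for _, date in pairs]
print(data)   # ['A', 'B', 'C', 'D', 'A', 'C', 'B']
print(dates)  # ['0101', '0104', '0106', '0109', '0110', '0112', '0114']
```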

Splitting string values in list into individual values, Python

I have a list of values in which some values are words separated by commas, but are considered single strings as shown:
l = ["a",
"b,c",
"d,e,f"]
#end result should be
#new_list = ['a','b','c','d','e','f']
I want to split those strings and was wondering if there's a one-liner or something short to do this. So far, I was thinking of just iterating through l, calling .split(',') on each element, and then merging the results, but that seems like it would take a while to run.
import itertools
new_list = list(itertools.chain(*[x.split(',') for x in l]))
print(new_list)
# ['a', 'b', 'c', 'd', 'e', 'f']
Kind of unusual but you could join all your elements with , and then split them:
l = ["a",
"b,c",
"d,e,f"]
newList = ','.join(l).split(',')
print(newList)
Output:
['a', 'b', 'c', 'd', 'e', 'f']
Here's a one-liner using a (nested) list comprehension:
new_list = [item for csv in l for item in csv.split(',')]
Not exactly a one-liner, but 2 lines:
>>> l = ["a",
...      "b,c",
...      "d,e,f"]
>>> ll =[]
>>> [ll.extend(x.split(',')) for x in l]
[None, None, None]
>>> ll
['a', 'b', 'c', 'd', 'e', 'f']
The accumulator needs to be created separately since the result of x.split(',') cannot be unpacked inside a comprehension.

Concise way to remove elements from list by index in Python

I have a list of characters and list of indexes
myList = ['a','b','c','d']
toRemove = [0,2]
and I'd like to get this in one operation
myList = ['b','d']
I could do this, but is there a faster way?
toRemove.reverse()
for i in toRemove:
    myList.pop(i)
Concise answer
>>> myList = ['a','b','c','d']
>>> toRemove = [0,2]
>>>
>>> [v for i, v in enumerate(myList) if i not in toRemove]
['b', 'd']
>>>
You could use a list comprehension as other answers have suggested, but to make it truly faster I would suggest using a set for the set of indices you want removed.
>>> myList = ['a','b','c','d']
>>> toRemove = set([0,2])
>>> [x for i,x in enumerate(myList) if i not in toRemove]
['b', 'd']
Checking every element in myList against every element in toRemove is O(n*m) (where n is the length of myList and m is the length of toRemove). If you use a set, checking for membership is O(1), so the whole procedure becomes O(n). Keep in mind though, the difference in speed will not be noticeable unless toRemove is really big (say more than a thousand).
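The complexity argument above could be wrapped up as follows. This is only a sketch: the function name remove_indices is hypothetical, and it returns a new list rather than modifying the input.

```python
def remove_indices(items, indices):
    """Return a copy of items without the positions listed in indices."""
    to_remove = set(indices)  # O(1) membership tests instead of scanning a list
    return [x for i, x in enumerate(items) if i not in to_remove]

myList = ['a', 'b', 'c', 'd']
print(remove_indices(myList, [0, 2]))
```

Because the indices are only used for membership tests, they do not need to be sorted or reversed first.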
If you wanted to, you could use numpy.
import numpy as np
myList = ['a','b','c','d']
toRemove = [0,2]
new_list = np.delete(myList, toRemove)
Result:
>>> new_list
array(['b', 'd'], dtype='<U1')
Note that new_list is a numpy array.
One-liner:
>>> [myList[x] for x in range(len(myList)) if x not in [0, 2]]
['b', 'd']
You could write a function to do it for you.
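def removethese(lst, *args):
    # Delete from the highest index down so earlier deletions don't shift later ones.
    for arg in sorted(args, reverse=True):
        del lst[arg]
Then do
mylist = ['a', 'b', 'c', 'd', 'e']
removethese(mylist, 0, 1, 4)
mylist is now ['c', 'd']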
