Removing Elements in A from B - python

I have two lists, let's say:
a = [1,2,3]
b = [1,2,3,1,2,3]
I would like to remove 1, 2 and 3 from list b, but not all occurrences. The resulting list should have:
b = [1,2,3]
I currently have:
for element in a:
    try:
        b.remove(element)
    except ValueError:
        pass
However, this has poor performance when a and b get very large. Is there a more efficient way of getting the same results?
EDIT
To clarify 'not all occurrences', I mean I do not wish to remove both '1's from b, as there was only one '1' in a.

I would do this:
set_a = set(a)
new_b = []
for x in b:
    if x in set_a:
        set_a.remove(x)
    else:
        new_b.append(x)
Unlike the other set solutions, this maintains order in b (if you care about that).
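One caveat with the set-based loop above: a set forgets how many times a value occurs in a, so a = [1, 1] would only remove a single 1 from b. A minimal sketch of the same order-preserving idea with collections.Counter, assuming "remove one occurrence per occurrence in a" is what is wanted (the function name remove_once_each is made up for illustration):

```python
from collections import Counter

def remove_once_each(a, b):
    """Return b with one occurrence dropped per occurrence in a, keeping order."""
    remaining = Counter(a)      # how many of each value still need removing
    kept = []
    for x in b:
        if remaining[x] > 0:    # missing keys read as 0 in a Counter
            remaining[x] -= 1   # consume one pending removal
        else:
            kept.append(x)
    return kept

print(remove_once_each([1, 2, 3], [1, 2, 3, 1, 2, 3]))
print(remove_once_each([1, 1], [1, 1, 1, 2]))
```

This is still a single pass over each list, so it stays linear in len(a) + len(b).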

I would do something like this:
from collections import defaultdict
a = [1, 2, 3]
b = [1, 2, 3, 1, 2, 3]
# Build up the count of occurrences in b
d = defaultdict(int)
for bb in b:
    d[bb] += 1
# Remove one for each occurrence in a
for aa in a:
    d[aa] -= 1
# Create a list for all elements that still have a count of one or more
result = []
for k, v in d.items():  # items() works on both Python 2 and 3
    if v > 0:
        result += [k] * v
Or, if you are willing to be slightly more obscure:
from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
from operator import iadd
result = reduce(iadd, [[k] * v for k, v in d.items() if v > 0], [])
The defaultdict holds a count of the occurrences of each key. Once it has been built up from b, it is decremented for each occurrence of a key in a. Then we collect the elements that are still left over, allowing them to occur multiple times.
defaultdict is available from Python 2.5 on. If you are using Python 2.7 or later (or 3.1 or later), you can look into collections.Counter.
Later: you can also generalize this and create subtractions of counter-style defaultdicts:
from collections import defaultdict
from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
from operator import iadd
a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
def build_dd(lst):
    d = defaultdict(int)
    for item in lst:
        d[item] += 1
    return d
def subtract_dd(left, right):
    # Keep every key of left; keys missing from right subtract zero
    return {k: v - right.get(k, 0) for k, v in left.items()}
db = build_dd(b)
da = build_dd(a)
result = reduce(iadd,
                [[k] * v for k, v in subtract_dd(db, da).items() if v > 0],
                [])
print(result)
But the reduce expression is pretty obscure now.
Later still: in python 2.7 and later, using collections.Counter, it looks like this:
from collections import Counter
base = [1, 2, 3]
missing = [4, 5, 6]
extra = [7, 8, 9]
a = base + missing
b = base * 4 + extra
result = Counter(b) - Counter(a)
print(result)
assert result == dict([(k, 3) for k in base] + [(k, 1) for k in extra])
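The Counter difference above is still a multiset, not a list. A small sketch of turning it back into a list for the lists from the original question, using Counter.elements(); the sorted() call is only there to make the output order predictable and is an addition of this sketch:

```python
from collections import Counter

a = [1, 2, 3]
b = [1, 2, 3, 1, 2, 3]

# Counter subtraction drops keys whose count falls to zero or below
leftover = Counter(b) - Counter(a)

# elements() repeats each key as many times as its count
result = sorted(leftover.elements())
print(result)
```

Note that this does not preserve the original order of b; it only preserves how many times each value remains.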

Generally, you want to avoid calling list.remove() in a loop (you are right, it hurts performance badly, since each call is a linear scan). Looking up elements in a dictionary or a set is also much faster (O(1) on average) than in a list, so create a set out of a (and, if order doesn't matter, out of b).
Something like this:
sa = set(a)
new_b = [x for x in b if x not in sa]
# this creates a third list, but I bet that's OK.
However, I have no idea what your actual algorithm for choosing elements to remove is. Please elaborate on "but not all occurrences".

Related

Filtering a (Nx1) list in Python

I have a list of the form
[(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
I want to scan the list and return those elements whose second entries are repeated. (I apologize that I couldn't frame this better.)
For example, in the given list the pairs are (2,3),(4,3) and I see that 3 is repeated so I wish to return 2 and 4. Similarly, from (3,4),(1,4),(5,4) I will return 3, 1, and 5 because 4 is repeated.
I have implemented the bubble search but that is obviously very slow.
for i in range(0,p):
    for j in range(i+1,p):
        if (arr[i][1] == arr[j][1]):
            print(arr[i][0],arr[j][0])
How do I go about it?
You can use collections.defaultdict. This will return a mapping from the second item to a list of first items. You can then filter for repetition via a dictionary comprehension.
from collections import defaultdict
lst = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
d = defaultdict(list)
for i, j in lst:
    d[j].append(i)
print(d)
# defaultdict(list, {3: [2, 4], 4: [3, 1, 5], 5: [6]})
res = {k: v for k, v in d.items() if len(v)>1}
print(res)
# {3: [2, 4], 4: [3, 1, 5]}
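The question asks for the first items themselves (2 and 4, then 3, 1 and 5) rather than a dictionary. A minimal sketch of flattening the grouped result, assuming a flat list is the desired shape (the names groups and repeated_firsts are illustrative):

```python
from collections import defaultdict

lst = [(2, 3), (4, 3), (3, 4), (1, 4), (5, 4), (6, 5)]

# Group first items by their second item, as in the answer above
groups = defaultdict(list)
for first, second in lst:
    groups[second].append(first)

# Keep only groups with a repeated second item, then flatten them
repeated_firsts = [first
                   for firsts in groups.values() if len(firsts) > 1
                   for first in firsts]
print(repeated_firsts)
```

Both passes are linear in the length of the list, compared with the quadratic nested loop in the question.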
Using numpy avoids explicit for loops:
import numpy as np
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
a = np.array(l)
items, counts = np.unique(a[:,1], return_counts=True)
is_duplicate = np.isin(a[:,1], items[counts > 1]) # get elements that have more than one count
print(a[is_duplicate, 0]) # return elements with duplicates
# tuple(map(tuple, a[is_duplicate, :])) # use this to get tuples in output
(toggle comment to get output in form of tuples)
pandas is another option:
import pandas as pd
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
df = pd.DataFrame(l, columns=['first', 'second'])
df.groupby('second').filter(lambda x: len(x) > 1)

Group repeated elements of a list

I am trying to create a function that receives a list and returns another list with the repeated elements grouped together.
For example, for the input A = [2,2,1,1,3,2] (the list is not sorted), the function would return result = [[1,1], [2,2,2]]. The result doesn't need to be sorted.
I already did it in Wolfram Mathematica, but now I have to translate it to Python 3. Mathematica has functions like Select, Map and Split that make this very simple, without long loops with a lot of instructions.
result = [[x] * A.count(x) for x in set(A) if A.count(x) > 1]
Simple approach:
def grpBySameConsecutiveItem(l):
    rv = []
    last = None
    for elem in l:
        if last is None:
            last = [elem]
            continue
        if elem == last[0]:
            last.append(elem)
            continue
        if len(last) > 1:
            rv.append(last)
        last = [elem]
    # Flush the final run, which the loop above never appends
    if last is not None and len(last) > 1:
        rv.append(last)
    return rv
print(grpBySameConsecutiveItem([1,2,1,1,1,2,2,3,4,4,4,4,5,4]))
Output:
[[1, 1, 1], [2, 2], [4, 4, 4, 4]]
You can sort your output afterwards if you want to have it sorted, or sort your input list first; in that case equal numbers all end up next to each other, so you no longer get separate consecutive runs.
See this https://stackoverflow.com/a/4174955/7505395 for how to sort lists of lists depending on an index (just use 0), as all items within each inner list are identical.
You could also use itertools; it has tools such as takewhile and groupby that make this look much neater.
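As a rough sketch of the itertools route, itertools.groupby splits an iterable into runs of equal consecutive items, so the consecutive grouping can be written in a few lines (the function name consecutive_runs is made up for this example):

```python
from itertools import groupby

def consecutive_runs(items):
    """Return runs of equal consecutive items that are longer than one."""
    # groupby yields (key, iterator over one run of equal items)
    runs = (list(group) for _, group in groupby(items))
    return [run for run in runs if len(run) > 1]

print(consecutive_runs([1, 2, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 4]))
```

Like the hand-written loop, this only merges neighbours; equal values separated by other values stay in separate runs.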
The following version ignores whether equal items are consecutive and just collects them all:
def grpByValue(lis):
    d = {}
    for key in lis:
        if key in d:
            d[key] += 1
        else:
            d[key] = 1
    # print(d)  # uncomment to inspect the counts
    rv = []
    for k in d:
        if d[k] < 2:
            continue
        rv.append([])
        for n in range(0, d[k]):
            rv[-1].append(k)
    return rv
data = [1,2,1,1,1,2,2,3,4,4,4,4,5,4]
print(grpByValue(data))
Output:
[[1, 1, 1, 1], [2, 2, 2], [4, 4, 4, 4, 4]]
You could do this with a list comprehension:
A = [1,1,1,2,2,3,3,3]
B = []
[B.append([n]*A.count(n)) for n in A if B.count([n]*A.count(n)) == 0]
outputs [[1,1,1],[2,2],[3,3,3]]
Or more pythonically:
A = [1,2,2,3,4,1,1,2,2,2,3,3,4,4,4]
B = []
for n in A:
    if B.count([n]*A.count(n)) == 0:
        B.append([n]*A.count(n))
outputs [[1,1,1],[2,2,2,2,2],[3,3,3],[4,4,4,4]]
This works with a sorted or unsorted list; if you need to sort the list beforehand, you can do for n in sorted(A).
This is a job for Counter(). Iterating over each element, x, and checking A.count(x) has a O(N^2) complexity. Counter() will count how many times each element exists in your iterable in one pass and then you can generate your result by iterating over that dictionary.
>>> from collections import Counter
>>> A = [2,2,1,1,3,2]
>>> counts = Counter(A)
>>> result = [[key] * value for key, value in counts.items() if value > 1]
>>> result
[[2, 2, 2], [1, 1]]

Create a complement of list preserving duplicate values

Given list a = [1, 2, 2, 3] and its sublist b = [1, 2] find a list complementing b in such a way that sorted(a) == sorted(b + complement). In the example above the complement would be a list of [2, 3].
It is tempting to use list comprehension:
complement = [x for x in a if x not in b]
or sets:
complement = list(set(a) - set(b))
However, both of these approaches will return complement = [3].
An obvious way of doing it would be:
complement = a[:]
for element in b:
    complement.remove(element)
But that feels deeply unsatisfying and not very Pythonic. Am I missing an obvious idiom or is this the way?
As pointed out below, performance is also a concern: this is O(n^2). Is there a more efficient way?
The only more declarative and thus Pythonic way that pops into my mind and that improves performance for large b (and a) is to use some sort of counter with decrement:
from collections import Counter
class DecrementCounter(Counter):
    def decrement(self,x):
        if self[x]:
            self[x] -= 1
            return True
        return False
Now we can use list comprehension:
b_count = DecrementCounter(b)
complement = [x for x in a if not b_count.decrement(x)]
Here we thus keep track of the counts in b, for each element in a we look whether it is part of b_count. If that is indeed the case we decrement the counter and ignore the element. Otherwise we add it to the complement. Note that this only works, if we are sure such complement exists.
After you have constructed the complement, you can check if the complement exists with:
not bool(+b_count)
If this is False, then such complement cannot be constructed (for instance a=[1] and b=[1,3]). So a full implementation could be:
b_count = DecrementCounter(b)
complement = [x for x in a if not b_count.decrement(x)]
if +b_count:
    raise ValueError('complement cannot be constructed')
If dictionary lookup runs in O(1) (which it usually does, only in rare occasions it is O(n)), then this algorithm runs in O(|a|+|b|) (so the sum of the sizes of the lists). Whereas the remove approach will usually run in O(|a|×|b|).
In order to reduce the complexity of your already valid approach, you could use collections.Counter (which is a specialized dictionary with fast lookup) to count items in both lists.
Then update the counts by subtracting values, and in the end keep only the items whose count is > 0 and rebuild the list by chaining them with itertools.chain:
from collections import Counter
import itertools
a = [1, 2, 2, 2, 3]
b = [1, 2]
print(list(itertools.chain.from_iterable(x*[k] for k,x in (Counter(a)-Counter(b)).items() if x > 0)))
result:
[2, 2, 3]
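An O(n log n) approach: sort both lists, then walk through them in step:
a = [1, 2, 2, 3]
b = [1, 2]
a.sort()  # note: this sorts the input lists in place
b.sort()
L = []
i = j = 0
while i < len(a) and j < len(b):
    if a[i] < b[j]:
        # a[i] has no partner in b: keep it
        L.append(a[i])
        i += 1
    elif a[i] > b[j]:
        # b[j] has no partner in a (cannot happen if b is a sublist of a)
        L.append(b[j])
        j += 1
    else:
        # matched pair: drop both
        i += 1
        j += 1
# Whatever is left over in either list is unmatched
while i < len(a):
    L.append(a[i])
    i += 1
while j < len(b):
    L.append(b[j])
    j += 1
print(L)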
If the order of elements in the complement doesn't matter, then collections.Counter is all that is needed:
from collections import Counter
a = [1, 2, 3, 2]
b = [1, 2]
complement = list((Counter(a) - Counter(b)).elements()) # complement = [2, 3]
If the order of items in the complement should be the same order as in the original list, then use something like this:
from collections import Counter, defaultdict
from itertools import count
a = [1,2,3,2]
b = [2,1]
c = Counter(b)
d = defaultdict(count)
complement = [x for x in a if next(d[x]) >= c[x]] # complement = [3, 2]
Main idea: if the values are not unique, make them unique
def add_duplicate_position(items):
    element_counter = {}
    for item in items:
        element_counter[item] = element_counter.setdefault(item,-1) + 1
        yield element_counter[item], item
assert list(add_duplicate_position([1, 2, 2, 3])) == [(0, 1), (0, 2), (1, 2), (0, 3)]
def create_complementary_list_with_duplicates(a,b):
    a = list(add_duplicate_position(a))
    b = set(add_duplicate_position(b))
    return [item for _,item in [x for x in a if x not in b]]
a = [1, 2, 2, 3]
b = [1, 2]
assert create_complementary_list_with_duplicates(a,b) == [2, 3]

Python - split list of lists by value

I want to split the following list of lists
a = [["aa",1,3],
     ["aa",3,3],
     ["sdsd",1,3],
     ["sdsd",6,0],
     ["sdsd",2,5],
     ["fffffff",1,3]]
into the three following lists of lists:
a1 = [["aa",1,3],
      ["aa",3,3]]
a2 = [["sdsd",1,3],
      ["sdsd",6,0],
      ["sdsd",2,5]]
a3 = [["fffffff",1,3]]
That is, according to the first value of each list. I need to do this for a list of lists with thousands of elements... How can I do it efficiently?
You're better off making a dictionary. If you really want to make a bunch of variables, you'll have to use globals(), which isn't really recommended.
a = [["aa",1,3],
     ["aa",3,3],
     ["sdsd",1,3],
     ["sdsd",6,0],
     ["sdsd",2,5],
     ["fffffff",1,3]]
d = {}
for sub in a:
    key = sub[0]
    if key not in d:
        d[key] = []
    d[key].append(sub)
Or:
import collections
d = collections.defaultdict(list)
for sub in a:
    d[sub[0]].append(sub)
If the input is already sorted (or at least grouped) by its first element:
from itertools import groupby
from operator import itemgetter
a = [["aa",1,3],
["aa",3,3],
["sdsd",1,3],
["sdsd",6,0],
["sdsd",2,5],
["fffffff",1,3]]
b = { k : list(v) for k, v in groupby(a, itemgetter(0))}
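groupby only merges neighbouring rows, so if the rows are not already grouped by their first element, the same key would show up in several separate groups and the dict comprehension would keep only the last one. A small sketch, assuming the input may arrive in arbitrary order, that sorts on the key first (the sample rows here are made up):

```python
from itertools import groupby
from operator import itemgetter

rows = [["sdsd", 1, 3],
        ["aa", 1, 3],
        ["sdsd", 6, 0],
        ["aa", 3, 3]]

# sorted() is stable, so rows with the same key keep their relative order
rows_sorted = sorted(rows, key=itemgetter(0))
groups = {k: list(v) for k, v in groupby(rows_sorted, key=itemgetter(0))}
print(groups)
```

The sort makes this O(n log n); the dictionary-based answers stay O(n) and do not need sorted input.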
Create a dictionary with the first element as the key and a list of the matching sublists as the value. Each value is then the group of sublists that share the same first element. For example,
a = [["aa", 1, 3],
["aa", 3, 3],
["sdsd", 1, 3],
["sdsd", 6, 0],
["sdsd", 2, 5],
["fffffff", 1, 3]]
d = {}
for e in a:
    d[e[0]] = d.get(e[0]) or []
    d[e[0]].append(e)
And now you can simply get the lists separately,
a1 = d['aa']
a2 = d['sdsd']
A defaultdict will work nicely here:
a = [["aa",1,3],
["aa",3,3],
["sdsd",1,3],
["sdsd",6,0],
["sdsd",2,5],
["fffffff",1,3]]
from collections import defaultdict
d = defaultdict(list)
for thing in a:
    d[thing[0]].append(thing)
for separate_list in d.values():
    print(separate_list)
Output
[['aa', 1, 3], ['aa', 3, 3]]
[['sdsd', 1, 3], ['sdsd', 6, 0], ['sdsd', 2, 5]]
[['fffffff', 1, 3]]

Unique items in a list with condition

If I have a list in Python, say
thing = [[20,0,1],[20,0,2],[20,1,1],[20,0],[30,1,1]]
I would want to have a resulting list
thing = [[20,1,1],[20,0,2],[30,1,1]]
That is, if the first element is the same, remove duplicates and give priority to the number 1 in the second element. Lastly, the third element must also be unique for a given first element.
In this previous question we solved a complicated method where, for a transaction, it details a purchased unit. I want to output other units in that course. If two transactions exist that relate to two units in one course, it will display them as duplicates (or times each subsequent unit).
The aim of this question is to ensure that this duplication is stopped. Because of the complexity of this solution, it has resulted in a series of questions. Thanks to everyone who has helped so far.
I am not sure you would like this, but it works with your example:
[list(i) + j for i, j in dict([(tuple(x[:2]), x[2:]) for x in sorted(thing, key=lambda x:len(x))]).items()]
EDIT:
Here is a more detailed version (note that it fits your description of the problem better; sorting ONLY by the length of each sublist may not be the best solution):
thing = [[20,0,1],[20,0,2],[20,1,1],[20,0],[30,1,1]]
dico = {}
for x in thing:
    if tuple(x[:2]) not in dico:
        dico[tuple(x[:2])] = x[2:]
        continue
    if tuple(x[:2])[1] < x[1]:
        dico[tuple(x[:2])] = x[2:]
new_thing = []
for i, j in dico.items():
    new_thing.append(list(i) + j)
You might want to try using the unique_everseen function from the itertools recipes.
As a first step, here is a solution excluding [20, 0]:
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
thing = [[20,0,1],[20,0,2],[20,1,1],[30,1,1]]
thing.sort(key=lambda x: 0 if x[1] == 1 else 1)
print(list(unique_everseen(thing, key=lambda x: (x[0], x[2]))))
Output:
[[20, 1, 1], [30, 1, 1], [20, 0, 2]]
thing = [[20,0,1],[20,0,2],[20,1,1],[20,0,1],[30,1,1]]
d = {}
for e in thing:
    k = (e[0], e[2])
    if k not in d or (d[k][1] != 1 and e[1] == 1):
        d[k] = list(e)
print(list(d.values()))
Output (the order may differ, since plain dicts were unordered before Python 3.7):
[[20, 0, 2], [30, 1, 1], [20, 1, 1]]
If you don't need the initial list:
thing = [[20,0,1],[20,0,2],[20,1,1],[20,0,1],[30,1,1]]
d = {}
for e in thing:
    k = (e[0], e[2])
    if k not in d or (d[k][1] != 1 and e[1] == 1):
        d[k] = e
thing = list(d.values())
[[20, 0, 2], [30, 1, 1], [20, 1, 1]]
If you want to keep the order of your lists, use OrderedDict:
from collections import OrderedDict
d = OrderedDict()
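Spelled out, the OrderedDict variant could look like the sketch below. It reuses the loop from the answer above unchanged; only the container differs, so each key keeps the position at which it was first seen even when its value is later replaced by a row with 1 in the second position:

```python
from collections import OrderedDict

thing = [[20, 0, 1], [20, 0, 2], [20, 1, 1], [20, 0, 1], [30, 1, 1]]

d = OrderedDict()
for e in thing:
    k = (e[0], e[2])  # unique on (first element, third element)
    # Insert new keys; replace an existing row only by one with 1 in the middle
    if k not in d or (d[k][1] != 1 and e[1] == 1):
        d[k] = e

result = list(d.values())
print(result)
# [[20, 1, 1], [20, 0, 2], [30, 1, 1]]
```

On Python 3.7 and later a plain dict gives the same ordering, so OrderedDict mainly matters for older interpreters.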
