Find non-common elements in lists - python

I'm trying to write a piece of code that can automatically factor an expression. For example,
if I have two lists [1,2,3,4] and [2,3,5], the code should be able to find the common elements in the two lists, [2,3], and combine the rest of the elements together in a new list, being [1,4,5].
From this post: How to find list intersection?
I see that the common elements can be found by
set([1,2,3,4]) & set([2,3,5])
Is there an easy way to retrieve non-common elements from each list, in my example being [1,4] and [5]?
I can go ahead and do a for loop:
lists = [[1,2,3,4],[2,3,5]]
nonCommon = []
common = [2,3]
for eachList in lists:
    for elem in eachList:
        if elem not in common:
            nonCommon.append(elem)
But this seems redundant and inefficient. Does Python provide any handy function that can do that? Thanks in advance!

Use the symmetric difference operator for sets (aka the XOR operator):
>>> set([1,2,3]) ^ set([3,4,5])
set([1, 2, 4, 5])

Old question, but looks like python has a built-in function to provide exactly what you're looking for: .difference().
EXAMPLE
list_one = [1,2,3,4]
list_two = [2,3,5]
one_not_two = set(list_one).difference(list_two)
# set([1, 4])
two_not_one = set(list_two).difference(list_one)
# set([5])
This could also be written as:
one_not_two = set(list_one) - set(list_two)
Timing
I ran some timing tests on both and it appears that .difference() has a slight edge, to the tune of 10 - 15% but each method took about an eighth of a second to filter 1M items (random integers between 500 and 100,000), so unless you're very time sensitive, it's probably immaterial.
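For reference, here is a minimal sketch of how such a comparison might be run with timeit; the sizes and value ranges mirror the description above, but this is not the exact benchmark.
import random
import timeit

# illustrative data: 1M random integers in the range described above
list_one = [random.randint(500, 100000) for _ in range(1000000)]
list_two = [random.randint(500, 100000) for _ in range(1000000)]

print(timeit.timeit(lambda: set(list_one).difference(list_two), number=10))
print(timeit.timeit(lambda: set(list_one) - set(list_two), number=10))
Note that .difference() accepts any iterable, so it avoids building a second set; that alone may account for the 10 - 15% edge.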
Other Notes
It appears the OP is looking for a solution that provides two separate lists (or sets): one containing the items of the first list not in the second, and vice versa. Most of the previous answers return a single list or set that includes all of those items.
There is also the question as to whether items that may be duplicated in the first list should be counted multiple times, or just once.
If the OP wants to maintain duplicates, a list comprehension could be used, for example:
one_not_two = [ x for x in list_one if x not in list_two ]
two_not_one = [ x for x in list_two if x not in list_one ]
...which is roughly the same solution as posed in the original question, only a little cleaner. This method would maintain duplicates from the original list but is considerably (like multiple orders of magnitude) slower for larger data sets.
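If you need to keep duplicates but the quadratic comprehension above is too slow, a standard idiom (not part of the original answer, just a common refinement) is to build a set once for the membership tests:
list_one = [1, 2, 3, 4]
list_two = [2, 3, 5]

# build sets once so each membership test is O(1) rather than O(n)
set_one, set_two = set(list_one), set(list_two)
one_not_two = [x for x in list_one if x not in set_two]  # keeps duplicates from list_one
two_not_one = [x for x in list_two if x not in set_one]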

You can use the intersection concept to deal with this kind of problem.
b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
set(b1).intersection(b2)
Out[22]: {4, 5}
The best thing about this approach is that it is also fast for large data: with b1 containing 607,139 elements and b2 containing 296,029, this logic gets me the result in about 2.9 seconds.

You can call the set's __xor__ method directly (this is what the ^ operator invokes):
set([1,2,3,4]).__xor__(set([2,3,5]))
or
a = set([1,2,3,4])
b = set([2,3,5])
a.__xor__(b)

You can use the symmetric_difference method:
x = {1,2,3}
y = {2,3,4}
z = x.symmetric_difference(y)
Output will be: z = {1, 4}

This should get the common and remaining elements
lis1 = [1,2,3,4,5,6,2,3,1]
lis2 = [4,5,8,7,10,6,9,8]
common = list(dict.fromkeys([l1 for l1 in lis1 if l1 in lis2]))
remaining = list(filter(lambda i: i not in common, lis1 + lis2))
# common = [4, 5, 6]
# remaining = [1, 2, 3, 2, 3, 1, 8, 7, 10, 9, 8]

All the good solutions, from a basic DSA-style approach to built-in functions:
# Time: O(n + m) -- one pass over each list
def solution1(arr1, arr2):
    seen = {}
    for x in arr1:
        seen.setdefault(x, [False, False])[0] = True
    for x in arr2:
        seen.setdefault(x, [False, False])[1] = True
    # keep the keys that appear in exactly one of the two lists
    return [key for key, (in1, in2) in seen.items() if in1 != in2]

def solution2(arr1, arr2):
    return set(arr1) ^ set(arr2)

def solution3(arr1, arr2):
    return (set(arr1).difference(arr2), set(arr2).difference(arr1))

def solution4(arr1, arr2):
    return set(arr1).__xor__(set(arr2))
print(solution1([1,2,3], [2,4,6]))
print(solution2([1,2,3], [2,4,6]))
print(solution3([1,2,3], [2,4,6]))
print(solution4([1,2,3], [2,4,6]))

Related

Remove Variable(s) in List A if Variable(s) is/are in List B, Python

Like the title states I want to remove variables in one list if they happen to be in another list. I have tried various techniques but I can't seem to get a proper code. Can anyone help with this?
You may use list comprehension if you want to maintain the order:
>>> l = [1,2,3,4]
>>> l2 = [1,5,6,3]
>>> [x for x in l if x not in l2]
[2, 4]
In case the order of elements in original list don't matter, you may use set:
>>> list(set(l) - set(l2))
[2, 4]
def returnNewList(a, b):
    h = {}
    for e in b:
        h[e] = True
    return [e for e in a if e not in h]
A hash table is used to keep the runtime complexity linear.
If list b is sorted, then instead of a hash table you can use binary search; the complexity in that case is O(n log m), where m is the length of b. A sketch follows.
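As a sketch of that binary-search variant (assuming b is already sorted; the function name returnNewListSorted is made up here to mirror returnNewList above), using the standard bisect module:
from bisect import bisect_left

def returnNewListSorted(a, b):
    # b must be sorted for binary search to be valid
    def in_b(x):
        i = bisect_left(b, x)
        return i < len(b) and b[i] == x
    return [e for e in a if not in_b(e)]

print(returnNewListSorted([1, 2, 3, 4], [1, 3, 5]))  # [2, 4]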
There are several ways
# just make a new list
[i for i in a if i not in b]
# use sets
list(set(a).difference(set(b)))
I figured it out; however, is there a shorter way to write this code?
a = [0,1,2,3,4,5,6,7,8]
b = [0,5,8]
for i in a[:]:  # iterate over a copy: removing from a list while iterating over it skips elements
    if i in b:
        a.remove(i)

How to find second smallest UNIQUE number in a list?

I need to create a function that returns the second smallest unique number, which means if
list1 = [5,4,3,2,2,1], I need to return 3, because 2 is not unique.
I've tried:
def second(list1):
    result = sorted(list1)[1]
    return result
and
def second(list1):
    result = sorted(list(set(list1)))[1]
    return result
but they all return 2.
EDIT1:
Thanks guys! I got it working using this final code:
def second(list1):
    b = [i for i in list1 if list1.count(i) == 1]
    b.sort()
    result = b[1]
    return result
EDIT 2:
Okay guys... really confused. My prof just told me that if list1 = [1,1,2,3,4], it should return 2, because 2 is still the second smallest number, and if list1 = [1,2,2,3,4], it should return 3.
The code in EDIT1 won't work if list1 = [1,1,2,3,4].
I think I need to do something like: if the duplicated number is at position list1[0], then remove all duplicates and return the second number; otherwise, just use the code from EDIT1.
Without using anything fancy, why not just get a list of uniques, sort it, and get the second list item?
a = [5,4,3,2,2,1] #second smallest is 3
b = [i for i in a if a.count(i) == 1]
b.sort()
>>> b[1]
3
a = [5,4,4,3,3,2,2,1] #second smallest is 5
b = [i for i in a if a.count(i) == 1]
b.sort()
>>> b[1]
5
Obviously you should test that your list has at least two unique numbers in it. In other words, make sure b has a length of at least 2.
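A minimal sketch of that guard, wrapping the same count-based approach (the name second_smallest_unique is made up, and raising ValueError is just one way to signal the failure case):
def second_smallest_unique(nums):
    # keep only the values that occur exactly once, then sort them
    uniques = sorted(i for i in nums if nums.count(i) == 1)
    if len(uniques) < 2:
        raise ValueError("need at least two unique values")
    return uniques[1]

print(second_smallest_unique([5, 4, 3, 2, 2, 1]))  # 3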
Remove non-unique elements: use sort/itertools.groupby or collections.Counter.
Use min (O(n)) to determine the minimum instead of sort (O(n log n)). (In any case, if you are using groupby, the data is already sorted.) I missed the fact that the OP wanted the second minimum, so sorting is still the better option here.
Sample Code
Using Counter
>>> from collections import Counter
>>> sorted(k for k, v in Counter(list1).items() if v == 1)[1]
3
Using Itertools
>>> from itertools import groupby
>>> sorted(k for k, g in groupby(sorted(list1)) if len(list(g)) == 1)[1]
3
Here's a fancier approach that doesn't use count (which means it should have significantly better performance on large datasets).
from collections import defaultdict

def getUnique(data):
    dd = defaultdict(lambda: 0)
    for value in data:
        dd[value] += 1
    result = [key for key in dd.keys() if dd[key] == 1]
    result.sort()
    return result
a = [5,4,3,2,2,1]
b = getUnique(a)
print(b)
# [1, 3, 4, 5]
print(b[1])
# 3
Okay guys! I got the working code, thanks to all of your help in getting me to think along the right track. This code works:
def second(list1):
    if len(list1) != len(set(list1)):
        result = sorted(list1)[2]
        return result
    elif len(list1) == len(set(list1)):
        result = sorted(list1)[1]
        return result
Okay, using set() on a list is not going to help here. It doesn't purge the duplicated elements; it only collapses each value to a single copy. What I mean is:
l1 = [5,4,3,2,2,1]
print set(l1)
Prints
set([1, 2, 3, 4, 5])
Here you're not removing the elements that had duplicates; the collection just becomes unique.
In your example you want to remove all duplicated elements.
Try something like this.
l1 = [5,4,3,2,2,1]
newlist = []
for i in l1:
    if l1.count(i) == 1:
        newlist.append(i)
print newlist
In this example, this prints
[5, 4, 3, 1]
Then you can use heapq to get the second smallest number in your list, like this:
print heapq.nsmallest(2, newlist)[-1]
(Remember to import heapq first.) The above snippet prints 3 for you.
This should do the trick. Cheers!

Optimize search to find next matching value in a list

I have a program that goes through a list and, for each object, finds the next instance that has a matching value. When it does, it prints out the location of each object. The program runs perfectly fine, but the trouble I am running into is that when I run it with a large volume of data (~6,000,000 objects in the list) it takes much too long. If anyone could provide insight into how I can make the process more efficient, I would greatly appreciate it.
def search(objects):  # parameter renamed from "list" to avoid shadowing the built-in
    original = objects
    matchedvalues = []
    count = 0
    for x in original:
        targetValue = x.getValue()
        count = count + 1
        copy = original[count:]
        for y in copy:
            if targetValue == y.getValue():  # getValue() must be called; comparing the bound method is always False
                print(str(x.getLocation()) + "," + str(y.getLocation()))
                break
Perhaps you can make a dictionary that contains a list of indexes that correspond to each item, something like this:
values = [1,2,3,1,2,3,4]
from collections import defaultdict

def get_matches(x):
    my_dict = defaultdict(list)
    for ind, ele in enumerate(x):
        my_dict[ele].append(ind)
    return my_dict
Result:
>>> get_matches(values)
defaultdict(<type 'list'>, {1: [0, 3], 2: [1, 4], 3: [2, 5], 4: [6]})
Edit:
I added this part, in case it helps:
values = [1,1,1,1,2,2,3,4,5,3]

def get_next_item_ind(x, ind):
    my_dict = get_matches(x)
    indexes = my_dict[x[ind]]
    temp_ind = indexes.index(ind)
    if len(indexes) > temp_ind + 1:
        return indexes[temp_ind + 1]
    return None
Result:
>>> get_next_item_ind(values, 0)
1
>>> get_next_item_ind(values, 1)
2
>>> get_next_item_ind(values, 2)
3
>>> get_next_item_ind(values, 3)
>>> get_next_item_ind(values, 4)
5
>>> get_next_item_ind(values, 5)
>>> get_next_item_ind(values, 6)
9
>>> get_next_item_ind(values, 7)
>>> get_next_item_ind(values, 8)
There are a few ways you could increase the efficiency of this search by minimising additional memory use (particularly when your data is BIG).
you can operate directly on the list you are passing in and don't need to make copies of it; this way you won't need original = list or copy = original[count:]
you can use slices of the original list to test against, and enumerate(p) to iterate through these slices. You won't need the extra variable count, and enumerate(p) is efficient in Python
Re-implemented, this would become:
def search(p):
    # iterate over p
    for i, value in enumerate(p):
        # if value occurs more than once, print locations
        # do not re-test values that have already been tested (value not in p[:i])
        if value not in p[:i] and value in p[(i + 1):]:
            print(value, ':', i, p[(i + 1):].index(value))
v = [1,2,3,1,2,3,4]
search(v)
1 : 0 2
2 : 1 2
3 : 2 2
Implementing it this way will only print out the values / locations where a value is repeated (which I think is what you intended in your original implementation).
Other considerations:
More than 2 occurrences of value: If the value repeats many times in the list, then you might want to implement a function to walk recursively through the list. As it is, the question doesn't address this - and it may be that it doesn't need to in your situation.
using a dictionary: I completely agree with Akavall above, dictionaries are a great way of looking up values in Python - especially if you need to look up values again later in the program. This will work best if you construct a dictionary instead of a list when you originally create the list. But if you are only doing this once, it is going to cost you more time to construct the dictionary and query over it than simply iterating over the list as described above (see the sketch below).
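As a sketch of that idea applied to the original problem (getValue/getLocation are assumed from the question's objects, and the name search_with_index is made up), building the value-to-locations index in one pass and then emitting each next-match pair directly:
from collections import defaultdict

def search_with_index(objects):
    # one pass: group the locations of the objects by value
    index = defaultdict(list)
    for obj in objects:
        index[obj.getValue()].append(obj.getLocation())
    # one pass over the groups: each location pairs with the next occurrence
    for locations in index.values():
        for here, following in zip(locations, locations[1:]):
            print(str(here) + "," + str(following))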
Hope this helps!

Sort list of lists by unique reversed absolute condition

Context - developing algorithm to determine loop flows in a power flow network.
Issue:
I have a list of lists, each list represents a loop within the network determined via my algorithm. Unfortunately, the algorithm will also pick up the reversed duplicates.
i.e.
L1 = [a, b, c, -d, -a]
L2 = [a, d, c, -b, -a]
(Please note that c should not be negative, it is correct as written due to the structure of the network and defined flows)
Now these two loops are equivalent, simply following the reverse structure throughout the network.
I wish to retain L1, whilst discarding L2 from the list of lists.
Thus if I have a list of 6 loops, of which 3 are reversed duplicates I wish to retain all three.
Additionally, The loop does not have to follow the format specified above. It can be shorter, longer, and the sign structure (e.g. pos pos pos neg neg) will not occur in all instances.
I have been attempting to sort this by reversing the list and comparing the absolute values.
I am completely stumped and any assistance would be appreciated.
Based upon some of the code provided by mgibson I was able to create the following.
def Check_Dup(Loops):
    Act = []
    while Loops:
        L = Loops.pop()
        Act.append(L)
        Loops = Popper(Loops, L)
    return Act

def Popper(Loops, L):
    for loop in Loops[:]:  # iterate over a copy, since we remove from Loops
        Rev = loop[::-1]
        if len(loop) == len(L) and all(abs(x) == abs(y) for x, y in zip(L, Rev)):
            Loops.remove(loop)
    return Loops
This code should run until there are no loops left, discarding the duplicates each time. I'm accepting mgibson's answer as it provided the necessary keys to create the solution.
I'm not sure I get your question, but reversing a list is easy:
a = [1,2]
a_rev = a[::-1] #new list -- if you just want an iterator, reversed(a) also works.
To compare the absolute values of a and a_rev:
all( abs(x) == abs(y) for x,y in zip(a,a_rev) )
which can be simplified to:
all( abs(x) == abs(y) for x,y in zip(a,reversed(a)) )
Now, in order to make this as efficient as possible, I would first sort the arrays based on the absolute value:
your_list_of_lists.sort(key=lambda x: [abs(v) for v in x])
Now you know that if two lists are going to be equal, they have to be adjacent in the list and you can just pull that out using enumerate:
def cmp_list(x, y):
    return True if x == y else all(abs(a) == abs(b) for a, b in zip(x, y))

duplicate_idx = [idx for idx, val in enumerate(your_list_of_lists[1:])
                 if cmp_list(val, your_list_of_lists[idx])]
# now remove duplicates:
for idx in reversed(duplicate_idx):
    _ = your_list_of_lists.pop(idx)
If your (sub) lists are either strictly increasing or strictly decreasing, this becomes MUCH simpler.
lists = list(set( tuple(sorted(x)) for x in your_list_of_lists ) )
I don't see how they can be equivalent if you have c in both directions - one of them must be -c
>>> a,b,c,d = range(1,5)
>>> L1 = [a, b, c, -d, -a]
>>> L2 = [a, d, -c, -b, -a]
>>> L1 == [-x for x in reversed(L2)]
True
now you can write a function to collapse those two loops into a single value
>>> def normalise(loop):
...     return min(loop, [-x for x in reversed(loop)])
...
>>> normalise(L1)
[1, 2, 3, -4, -1]
>>> normalise(L2)
[1, 2, 3, -4, -1]
A good way to eliminate duplicates is to use a set, we just need to convert the lists to tuples
>>> L=[L1, L2]
>>> set(tuple(normalise(loop)) for loop in L)
set([(1, 2, 3, -4, -1)])
[pair[0] for pair in frozenset(tuple(sorted((c, negReversed(c)))) for c in cycles)]
Where:
def negReversed(loop):
    return tuple(-x for x in loop[::-1])
and where cycles must be tuples.
This takes each cycle, computes its duplicate, and sorts them (putting them in a pair that are canonically equivalent). The set frozenset(...) uniquifies any duplicates. Then you extract the canonical element (in this case I arbitrarily chose it to be pair[0]).
Keep in mind that your algorithm might be returning cycles starting in arbitrary places. If this is the case (i.e. your algorithm might return either [1,2,-3] or [-3,1,2]), then you need to consider these as equivalent necklaces.
There are many ways to canonicalize necklaces. The above way is less efficient because we don't care about canonicalizing the necklace directly: we just treat the entire equivalence class as the canonical element, by turning each cycle (a,b,c,d,e) into {(a,b,c,d,e), (e,a,b,c,d), (d,e,a,b,c), (c,d,e,a,b), (b,c,d,e,a)}. In your case since you consider negatives to be equivalent, you would turn each cycle into {(a,b,c,d,e), (e,a,b,c,d), (d,e,a,b,c), (c,d,e,a,b), (b,c,d,e,a), (-a,-b,-c,-d,-e), (-e,-a,-b,-c,-d), (-d,-e,-a,-b,-c), (-c,-d,-e,-a,-b), (-b,-c,-d,-e,-a)}. Make sure to use frozenset for performance, as set is not hashable:
[next(iter(eq)) for eq in {frozenset(eqClass(c)) for c in cycles}]
where:
def eqClass(cycle):
    for rotation in rotations(cycle):
        yield rotation
        yield tuple(-x for x in rotation)
where rotations() is something like the recipes in "Efficient way to shift a list in Python", but yielding tuples; a minimal sketch follows.
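A minimal sketch of such a rotations() helper (the name is just what the usage above assumes), yielding each rotation as a hashable tuple:
def rotations(cycle):
    # yield every rotation of the cycle as a tuple
    cycle = tuple(cycle)
    for i in range(len(cycle)):
        yield cycle[i:] + cycle[:i]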

Linear merging for lists in Python

I'm working through Google's Python class exercises. One of the exercises is this:
Given two lists sorted in increasing order, create and return a merged list of all the elements in sorted order. You may modify the passed in lists. Ideally, the solution should work in "linear" time, making a single pass of both lists.
The solution I came up with was:
def linear_merge(list1, list2):
    list1.extend(list2)
    return sorted(list1)
It passed the test function, but the solution given is this:
def linear_merge(list1, list2):
    result = []
    # Look at the two lists so long as both are non-empty.
    # Take whichever element [0] is smaller.
    while len(list1) and len(list2):
        if list1[0] < list2[0]:
            result.append(list1.pop(0))
        else:
            result.append(list2.pop(0))
    # Now tack on what's left
    result.extend(list1)
    result.extend(list2)
    return result
Included as part of the solution was this:
Note: the solution above is kind of cute, but unfortunately list.pop(0) is
not constant time with the standard python list implementation, so the
above is not strictly linear time. An alternate approach uses pop(-1) to
remove the endmost elements from each list, building a solution list which
is backwards. Then use reversed() to put the result back in the correct
order. That solution works in linear time, but is more ugly.
Why are these two solutions so different? Am I missing something, or are they being unnecessarily complicated?
They're encouraging you to think about the actual method (algorithm) of merging two sorted lists. Suppose you had two stacks of paper with names on them, each in alphabetical order, and you wanted to make one sorted stack from them. You wouldn't just lump them together and then sort that from scratch; that would be too much work. You'd make use of the fact that each pile is already sorted, so you can just take the one that comes first off of one pile or the other, and put them into a new stack.
As you noted, your solution works perfectly. So why the complexity? Well, for a start
Ideally, the solution should work in "linear" time, making a single
pass of both lists.
Well, you're not explicitly passing through any lists, but you are calling sorted(). So how many times will sorted() pass over the lists?
Well, I don't actually know. Normally, a sorting algorithm would operate in something like O(n*log(n)) time, though look at this quote from the Python docs:
The Timsort algorithm used in Python does multiple sorts efficiently
because it can take advantage of any ordering already present in a
dataset.
Maybe someone who knows timsort better can figure it out.
But what they're doing in the solution is using the fact that they know they have 2 sorted lists. So rather than starting from "scratch" with sorted, they're picking off elements one by one.
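For reference, the standard library already provides this linear merge: heapq.merge consumes two (or more) sorted iterables and lazily yields a merged sorted stream in a single pass, without popping from the front of either list.
from heapq import merge

list1 = [1, 3, 5, 9]
list2 = [2, 4, 6]
print(list(merge(list1, list2)))  # [1, 2, 3, 4, 5, 6, 9]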
I like the #Abhijit approach the most. Here is a slightly more pythonic/readable version of his code snippet:
def linear_merge(list1, list2):
    result = []
    while list1 and list2:
        result.append((list1 if list1[-1] > list2[-1] else list2).pop(-1))
    return (result + list1 + list2)[-1::-1]
With the help of built-in Python features, we:
- don't need to explicitly check if the lists are empty with the len function
- can merge/append empty lists and the result will remain unchanged, so no need for explicit checking
- can combine multiple statements (if readability allows), which sometimes makes the code more compact
def linear_merge(list1, list2):
    result = []
    while list1 and list2:
        result.append((list1 if list1[-1] > list2[-1] else list2).pop(-1))
    if len(list1):
        result += list1[-1::-1]
    if len(list2):
        result += list2[-1::-1]
    return result[-1::-1]
The solution by #Abhijit and #intel do not work in all cases because they have not reversed the leftover parts of the original lists. If we have list1 = [1, 2, 3, 5, 9, 11, 13, 17] and list2 = [6, 7, 12, 15] then their solution would give [5, 3, 2, 1, 6, 7, 9, 11, 12, 13, 15, 17] where we would want [1, 2, 3, 5, 6, 7, 9, 11, 12, 13, 15, 17].
Your solution is O(n log n), which means that if your lists were 10 times as long, the program would take (roughly) 30 times as much time. Their solution would only take 10 times as long.
Pop off the end of the lists until one is empty. I think this is linear, and the reverses are linear too. Ugly, but a solution.
def linear_merge(list1, list2):
    # NOT return sorted(list1 + list2), as this is not linear
    list3 = []
    rem = []
    empty = False
    while not empty:
        # Get last items from each list, if they exist
        if len(list1) > 0:
            a = list1[-1]
        else:
            rem = list2[:]
            empty = True
        if len(list2) > 0:
            b = list2[-1]
        else:
            rem = list1[:]
            empty = True
        # Pop the one that's largest onto the new list
        if not empty:
            if a > b:
                list3.append(a)
                list1.pop()
            else:
                list3.append(b)
                list2.pop()
    # add the (reversed) remainder to the list
    rem.reverse()
    list3 += rem
    # reverse the entire list
    list3.reverse()
    return list3
A slightly refined but still ugly solution (in Python 3.5):
def linear_merge(list1: list, list2: list):
    result = []
    while len(list1) and len(list2):
        result.append((list1 if list1[-1] > list2[-1] else list2).pop(-1))
    result += list1 if len(list1) else list2
    return result[-1::-1]
def linear_merge(list1, list2):
    a = list1 + list2
    a.sort()
    return a
