OK, let's try this again. I have one set of data. I want to make two copies and sort each copy in descending order on a different column, then compute the cumulative sum of the respective column. When I run the following code, the two loops that call print (setA[x][2]) give different results.
set = [[2,2,0],[1,3,0],[3,1,0]]
def getkey_setA (item):
    return item[0]
setA = sorted(set, key=getkey_setA, reverse=True)
def getkey_setB (item):
    return item[1]
setB = sorted(set, key=getkey_setB, reverse=True)
setA[0][2] = setA[0][0]
setB[0][2] = setB[0][1]
for x in range(1, 3):
    setA[x][2] = setA[x-1][2] + setA[x][0]
    print(setA[x][2])
for x in range(1, 3):
    setB[x][2] = setB[x-1][2] + setB[x][1]
for x in range(1, 3):
    print (setA[x][2])
This produces:
5
6
8
6
but I expected it to produce
5
6
5
6
instead.
sorted() creates a shallow copy of the sequence being sorted. This means that your nested lists are not copied, they are merely referenced:
>>> set = [[2,2,0],[1,3,0],[3,1,0]]
>>> setA = sorted(set, key=getkey_setA, reverse=True)
>>> setB = sorted(set, key=getkey_setB, reverse=True)
>>> setA[0] is set[2]
True
>>> setB[2] is set[2]
True
>>> setA[0] is setB[2]
True
So the last element in set is exactly the same object as setA[0] and setB[2]. Making changes to any one of those references is reflected in the others:
>>> setA[0][2]
0
>>> setA[0][2] = 42
>>> setB[2]
[3, 1, 42]
>>> set[2]
[3, 1, 42]
This is why the set object (from which you produced your sorted setA and setB lists) is also changed after running your code:
>>> set
[[2, 2, 8], [1, 3, 6], [3, 1, 9]]
You need to create a proper copy of the nested lists; you could use the copy.deepcopy() function to create a recursive copy of the list objects, or you could use a generator expression when sorting:
setA = sorted((subl[:] for subl in set), key=getkey_setA, reverse=True)
setB = sorted((subl[:] for subl in set), key=getkey_setB, reverse=True)
This shallowly copies the nested lists; this is fine because those nested lists only contain immutable objects themselves.
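For comparison, here is a minimal sketch of the copy.deepcopy() route mentioned above, using the data from the question. The names data, set_a, set_b and cumulative are illustrative and not from the original code (data is used so the built-in set is not shadowed).

```python
import copy

data = [[2, 2, 0], [1, 3, 0], [3, 1, 0]]

# Each sorted list gets its own independent copies of the rows.
set_a = sorted(copy.deepcopy(data), key=lambda row: row[0], reverse=True)
set_b = sorted(copy.deepcopy(data), key=lambda row: row[1], reverse=True)

def cumulative(rows, col):
    """Store the running total of column `col` in column 2 of each row."""
    total = 0
    for row in rows:
        total += row[col]
        row[2] = total

cumulative(set_a, 0)
cumulative(set_b, 1)

print([row[2] for row in set_a])  # [3, 5, 6]
print([row[2] for row in set_b])  # [3, 5, 6]
print(data)                       # the original rows are untouched
```

Because no row object is shared, filling in one list can no longer change the totals of the other.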
I have two lists of the same length which contain a variety of different elements. I'm trying to compare them to find the number of elements which exist in both lists, but at different indexes.
Here are some example inputs/outputs to demonstrate what I mean:
>>> compare([1, 2, 3, 4], [4, 3, 2, 1])
4
>>> compare([1, 2, 3], [1, 2, 3])
0
# Each item in the first list has the same index in the other
>>> compare([1, 2, 4, 4], [1, 4, 4, 2])
2
# The 4s at the 3rd position don't count, since they have the same index
>>> compare([1, 2, 3, 3], [5, 3, 5, 5])
1
# Duplicates don't count
The lists are always the same size.
This is the algorithm I have so far:
def compare(list1, list2):
    # Eliminate any direct matches. Filter the pairs once, so that list2
    # is not filtered against an already-shortened list1.
    pairs = [(a, b) for (a, b) in zip(list1, list2) if a != b]
    list1 = [a for (a, b) in pairs]
    list2 = [b for (a, b) in pairs]
    out = 0
    for possible in list1:
        if possible in list2:
            index = list2.index(possible)
            del list2[index]
            out += 1
    return out
Is there a more concise and eloquent way to do the same thing?
This python function does hold for the examples you provided:
def compare(list1, list2):
    D = {e:i for i, e in enumerate(list1)}
    return len(set(e for i, e in enumerate(list2) if D.get(e) not in (None, i)))
Since duplicates don't count, you can use sets to find the unique elements of each list (a set only holds unique elements). Then intersect the two sets and use list.index to keep only the shared elements whose first positions differ:
def compare(l1, l2):
    s1, s2 = set(l1), set(l2)
    shared = s1 & s2 # intersection, only the elements in both
    return len([e for e in shared if l1.index(e) != l2.index(e)])
You can actually bring this down to a one-liner if you want:
def compare(l1, l2):
    return len([e for e in set(l1) & set(l2) if l1.index(e) != l2.index(e)])
Alternative:
Functionally you can use the reduce builtin (in python3, you have to do from functools import reduce first). This avoids construction of the list which saves excess memory usage. It uses a lambda function to do the work.
def compare(l1, l2):
    return reduce(lambda acc, e: acc + int(l1.index(e) != l2.index(e)),
                  set(l1) & set(l2), 0)
A brief explanation:
reduce is a functional programming construct that traditionally reduces an iterable to a single item. Here we use reduce to reduce the set intersection to a single value.
lambda functions are anonymous functions. Saying lambda x, y: x + y is like saying def func(x, y): return x + y, except that the function has no name. reduce takes a function as its first argument. The first argument the lambda receives when used with reduce (acc here) is the result of the previous call, the accumulator.
set(l1) & set(l2) is a set consisting of unique elements that are in both l1 and l2. It is iterated over, and each element is taken out one at a time and used as the second argument to the lambda function.
0 is the initial value for the accumulator. We use this since we assume there are 0 shared elements with different indices to start.
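To make the roles of the accumulator and the element concrete, here is a sketch of the same reduce call with a named function in place of the lambda. The name step is an assumption for illustration. Note that list.index only looks at the first occurrence of each element, so this agrees with the examples in the question but is not guaranteed to for every arrangement of duplicates.

```python
from functools import reduce

def compare(l1, l2):
    def step(acc, e):
        # acc is the running count so far; e is the next shared element
        return acc + int(l1.index(e) != l2.index(e))
    # Start from 0 and fold step over the unique shared elements.
    return reduce(step, set(l1) & set(l2), 0)

print(compare([1, 2, 3, 4], [4, 3, 2, 1]))  # 4
print(compare([1, 2, 3], [1, 2, 3]))        # 0
print(compare([1, 2, 4, 4], [1, 4, 4, 2]))  # 2
print(compare([1, 2, 3, 3], [5, 3, 5, 5]))  # 1
```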
I don't claim it is the simplest answer, but it is a one-liner.
import numpy as np
import itertools
l1 = [1, 2, 3, 4]
l2 = [1, 3, 2, 4]
print(len(np.unique(list(itertools.chain.from_iterable([[a,b] for a,b in zip(l1,l2) if a!= b])))))
I explain:
[[a,b] for a,b in zip(l1,l2) if a!= b]
is the list of couples from zip(l1,l2) with different items. Number of elements in this list is number of positions where items at same position differ between the two lists.
Then, list(itertools.chain.from_iterable(...)) merges the component lists of a list. For instance:
>>> list(itertools.chain.from_iterable([[3,2,5],[5,6],[7,5,3,1]]))
[3, 2, 5, 5, 6, 7, 5, 3, 1]
Then, discard duplicates with np.unique(), and take len().
I want to re-assign each item in a list in Python.
In [20]: l = [1,2,3,4,5]
In [21]: for i in l:
....:     i = i + 1
....:
....:
But the list didn't change at all.
In [22]: l
Out[22]: [1, 2, 3, 4, 5]
I want to know why this happened. Could anybody explain list iteration in detail? Thanks.
You can't do it like that; you are merely changing the value bound to the name i. On each iteration of the for loop, i is bound to a value in the list. It is not a pointer in the sense that changing the value of i changes a value in the list. Instead, it is simply a name, and you are just changing the value that name refers to. In this case, i = i + 1 binds i to the value i + 1. So you aren't actually affecting the list itself; to do that you have to set it by index.
>>> L = [1,2,3,4,5]
>>> for i in range(len(L)):
...     L[i] = L[i] + 1
...
>>> L
[2, 3, 4, 5, 6]
Some pythonistas may prefer to iterate like this:
for i, n in enumerate(L): # where i is the index, n is each number
    L[i] = n + 1
However you can easily achieve the same result with a list comprehension:
>>> L = [1,2,3,4,5]
>>> L = [n + 1 for n in L]
>>> L
[2, 3, 4, 5, 6]
For more info: http://www.effbot.org/zone/python-objects.htm
This is because of how Python handles variables and the values they reference.
You should modify the list element itself:
for i in range(len(l)):
    l[i] += 1
>>> a = [1, 2, 3, 4, 5]
>>> a = [i + 1 for i in a]
>>> a
[2, 3, 4, 5, 6]
Initially i is a pointer to the item inside the list, but when you reassign it, it will point to the new number, that is why the list will not be changed.
For a list of mutable objects it would work:
class Number(object):
    def __init__(self, n):
        self.n = n
    def increment(self):
        self.n += 1
    def __repr__(self):
        return 'Number(%d)' % self.n
a = [Number(i) for i in range(5)]
print(a)
for i in a:
    i.increment()
print(a)
But ints are immutable: an operation on an int gives you a new int object, which is why it doesn't work in your case.
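The difference between rebinding a name and mutating an object can be seen side by side in a short sketch. The names numbers and rows are illustrative only.

```python
numbers = [1, 2, 3]
for i in numbers:
    i = i + 1        # rebinds the name i; the list still holds the old ints

rows = [[1], [2], [3]]
for row in rows:
    row.append(0)    # mutates the very list object that rows refers to

print(numbers)  # [1, 2, 3]
print(rows)     # [[1, 0], [2, 0], [3, 0]]
```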
I was wondering if there is a way in Python to modify collections without creating new ones. E.g.:
lst = [1, 2, 3, 4, 5, 6]
new_lst = [i for i in lst if i > 3]
Works just fine, but a new collection is created. Is there a reason, that Python collections lack a filter() method (or similar) that would modify the collection object in place?
If you want to do this in place, just use
lst[:] = [i for i in lst if i > 3]
This won't be faster or save any memory, but it changes the object in place, if this is the semantics you need.
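What "in place" means here can be checked with a second name for the same list. This is a small sketch; alias is a name made up for the illustration.

```python
lst = [1, 2, 3, 4, 5, 6]
alias = lst                      # second name for the same list object

lst[:] = [i for i in lst if i > 3]
print(alias)         # [4, 5, 6]  (the shared object was changed)
print(alias is lst)  # True

lst = [i for i in lst if i > 4]  # plain assignment rebinds lst only
print(alias)         # [4, 5, 6]  (alias still sees the old object)
print(lst)           # [5, 6]
```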
The other answers are correct; if you want all the names pointing to the old list to point to the new list you can use slice assignment.
However, that's not truly in-place creation; the new list is first created elsewhere. The link in Sven's answer is good.
The reason there isn't one that truly operates in-place is that while making a new list like that is O(n), each truly in-place item removal would be O(k) by itself, where k is the length of the list from the removal point on. The only way to avoid that with Python lists is to use some temporary storage, which is what you're doing by using slice assignment.
An example of an in-place O(n) filter on a collections.deque, in case you don't need to store your data in a list:
from collections import deque
def dequefilter(deck, condition):
    for _ in range(len(deck)):
        item = deck.popleft()
        if condition(item):
            deck.append(item)
deck = deque((1, 2, 3, 4, 5))
dequefilter(deck, lambda x: x > 2)  # any one-argument predicate works
print(deck)
# deque([3, 4, 5])
Correcting @larsmans's original solution, you could either do
i = 0
while i < len(lst):
    if lst[i] <= 3:
        del lst[i]
    else:
        i += 1
or
i = len(lst)
while i > 0:
    if lst[i-1] <= 3:
        del lst[i-1]
    i -= 1
The reason is the "index shift" which happens with the del. If I del at a certain index, that index needs to be re-examined because it now holds a different value.
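The index shift is easy to see by running a deliberately buggy loop next to the corrected one. In this sketch, naive_remove and careful_remove are hypothetical names; the first always advances the index, the second only advances when nothing was deleted.

```python
def naive_remove(lst):
    """Buggy on purpose: advances i even right after a deletion."""
    i = 0
    while i < len(lst):
        if lst[i] <= 3:
            del lst[i]
        i += 1
    return lst

def careful_remove(lst):
    """Re-examines index i after a deletion, since a new value moved there."""
    i = 0
    while i < len(lst):
        if lst[i] <= 3:
            del lst[i]
        else:
            i += 1
    return lst

print(naive_remove([1, 2, 3, 4, 5, 6]))    # [2, 4, 5, 6]  (the 2 was skipped)
print(careful_remove([1, 2, 3, 4, 5, 6]))  # [4, 5, 6]
```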
Maybe I'm slightly late, but since no other "O(n) time/O(1) memory" solutions have been posted, and some people even claimed that it is impossible, I think I should post this.
# Retains the elements of xs for which p returned true
def retain(xs, p):
    w = 0
    for x in xs:
        if p(x):
            xs[w] = x
            w += 1
    del xs[w:]
The lst[:] solution by @Sven Marnach is one option. You can also perform this operation in-place, using constant extra memory, with
>>> i = 0
>>> while i < len(lst):
...     if lst[i] <= 3:
...         del lst[i]
...     else:
...         i += 1
...
>>> lst
[4, 5, 6]
... but this solution is not very readable and takes quadratic time due to all the element shifting involved.
Because it's not needed.
lst[:] = [i for i in lst if i > 3]
I think this is an in-place transformation:
lst = [1,2,3,4,5,6,7,8,9,10,11]
to_exclude = [8,4,11,9]
print('lst == %s\nto_exclude == %s' % (lst,to_exclude))
for i in range(len(lst)-1,-1,-1):
    if lst[i] in to_exclude:
        lst.pop(i)
print('\nlst ==', lst)
result
lst == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
to_exclude == [8, 4, 11, 9]
lst == [1, 2, 3, 5, 6, 7, 10]
I want to take the difference between lists x and y:
>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 3, 5, 7, 9]
>>> x - y
# should return [0, 2, 4, 6, 8]
Use a list comprehension to compute the difference while maintaining the original order from x:
[item for item in x if item not in y]
If you don't need list properties (e.g. ordering), use a set difference, as the other answers suggest:
list(set(x) - set(y))
To allow x - y infix syntax, override __sub__ on a class inheriting from list:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(args)
    def __sub__(self, other):
        return self.__class__(*[item for item in self if item not in other])
Usage:
x = MyList(1, 2, 3, 4)
y = MyList(2, 5, 2)
z = x - y
Use set difference
>>> z = list(set(x) - set(y))
>>> z
[0, 8, 2, 4, 6]
Or you might just have x and y be sets so you don't have to do any conversions.
If duplicates and ordering matter:
a = [1,2,3,3,3,3,4]
b = [1,3]
[i for i in a if i not in b or b.remove(i)]
result: [2, 3, 3, 3, 4]
Note that this removes the matched items from b as a side effect.
That is a "set subtraction" operation. Use the set data structure for that.
In Python 2.7:
x = {1,2,3,4,5,6,7,8,9,0}
y = {1,3,5,7,9}
print x - y
Output:
>>> print x - y
set([0, 8, 2, 4, 6])
For many use cases, the answer you want is:
ys = set(y)
[item for item in x if item not in ys]
This is a hybrid between aaronasterling's answer and quantumSoup's answer.
aaronasterling's version does len(y) item comparisons for each element in x, so it takes quadratic time. quantumSoup's version uses sets, so it does a single constant-time set lookup for each element in x—but, because it converts both x and y into sets, it loses the order of your elements.
By converting only y into a set, and iterating x in order, you get the best of both worlds—linear time, and order preservation.*
However, this still has a problem from quantumSoup's version: It requires your elements to be hashable. That's pretty much built into the nature of sets.** If you're trying to, e.g., subtract a list of dicts from another list of dicts, but the list to subtract is large, what do you do?
If you can decorate your values in some way that they're hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:
ys = {tuple(item.items()) for item in y}
[item for item in x if tuple(item.items()) not in ys]
If your types are a bit more complicated (e.g., JSON-compatible values, which are either hashable scalars or lists or dicts whose values are recursively of the same kind), you can still use this solution. But some types just can't be converted into anything hashable.
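For JSON-like values, one way to "decorate" them is a recursive conversion into hashable equivalents. The sketch below assumes values are dicts, lists, or already-hashable scalars; freeze is a hypothetical helper name, not something from the standard library.

```python
def freeze(value):
    """Recursively convert dicts and lists into hashable equivalents."""
    if isinstance(value, dict):
        # frozenset ignores key order, so equal dicts freeze to equal values
        return frozenset((key, freeze(val)) for key, val in value.items())
    if isinstance(value, list):
        return tuple(freeze(item) for item in value)
    return value  # assumed to be a hashable scalar already

x = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}, {"id": 3, "tags": ["c"]}]
y = [{"tags": [], "id": 2}]

ys = {freeze(item) for item in y}
result = [item for item in x if freeze(item) not in ys]
print(result)  # [{'id': 1, 'tags': ['a', 'b']}, {'id': 3, 'tags': ['c']}]
```

Using frozenset for dicts, rather than a tuple of items, means two dicts with the same contents match even if their keys were inserted in a different order.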
If your items aren't, and can't be made, hashable, but they are comparable, you can at least get log-linear time (O(N*log M), which is a lot better than the O(N*M) time of the list solution, but not as good as the O(N+M) time of the set solution) by sorting and using bisect:
import bisect
ys = sorted(y)
def bisect_contains(seq, item):
    # bisect_left returns the index of the first element >= item
    index = bisect.bisect_left(seq, item)
    return index < len(seq) and seq[index] == item
[item for item in x if not bisect_contains(ys, item)]
If your items are neither hashable nor comparable, then you're stuck with the quadratic solution.
* Note that you could also do this by using a pair of OrderedSet objects, for which you can find recipes and third-party modules. But I think this is simpler.
** The reason set lookups are constant time is that all it has to do is hash the value and see if there's an entry for that hash. If it can't hash the value, this won't work.
If the lists allow duplicate elements, you can use Counter from collections:
from collections import Counter
result = list((Counter(x)-Counter(y)).elements())
If you need to preserve the order of elements from x:
result = [ v for c in [Counter(y)] for v in x if not c[v] or c.subtract([v]) ]
The other solutions have one of a few problems:
1. They don't preserve order, or
2. They don't remove a precise count of elements, e.g. for x = [1, 2, 2, 2] and y = [2, 2] they convert y to a set, and either remove all matching elements (leaving [1] only) or remove one of each unique element (leaving [1, 2, 2]), when the proper behavior would be to remove 2 twice, leaving [1, 2], or
3. They do O(m * n) work, where an optimal solution can do O(m + n) work
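The count problem can be reproduced with the x and y from the example above. This is a sketch contrasting a set-based filter with Counter subtraction.

```python
from collections import Counter

x = [1, 2, 2, 2]
y = [2, 2]

# Set-based filtering drops every 2, regardless of how many y asks for.
ys = set(y)
print([v for v in x if v not in ys])               # [1]

# Counter subtraction removes exactly two 2s. Note that elements() groups
# equal items together, so in general the original order of x is lost.
print(list((Counter(x) - Counter(y)).elements()))  # [1, 2]
```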
Alain was on the right track with Counter to solve #2 and #3, but that solution will lose ordering. The solution that preserves order (removing the first n copies of each value for n repetitions in the list of values to remove) is:
from collections import Counter
x = [1,2,3,4,3,2,1]
y = [1,2,2]
remaining = Counter(y)
out = []
for val in x:
    if remaining[val]:
        remaining[val] -= 1
    else:
        out.append(val)
# out is now [3, 4, 3, 1], having removed the first 1 and both 2s.
To make it remove the last copies of each element, just change the for loop to for val in reversed(x): and add out.reverse() immediately after exiting the for loop.
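Written out, the "remove the last copies" variant might look like the following sketch. The function name subtract_last is an assumption for illustration.

```python
from collections import Counter

def subtract_last(x, y):
    """Remove the last copies of y's values from x, keeping x's order."""
    remaining = Counter(y)
    out = []
    for val in reversed(x):      # walk x from the end
        if remaining[val]:
            remaining[val] -= 1  # consume one pending removal
        else:
            out.append(val)
    out.reverse()                # restore the original left-to-right order
    return out

print(subtract_last([1, 2, 3, 4, 3, 2, 1], [1, 2, 2]))  # [1, 3, 4, 3]
```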
Constructing the Counter is O(n) in terms of y's length, iterating x is O(n) in terms of x's length, and Counter membership testing and mutation are O(1), while list.append is amortized O(1) (a given append can be O(n), but for many appends, the overall big-O averages O(1) since fewer and fewer of them require a reallocation), so the overall work done is O(m + n).
You can also check whether any elements of y were not removed from x by testing:
remaining = +remaining # Removes all keys with zero counts from Counter
if remaining:
    # remaining contained elements with non-zero counts
    pass
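The unary plus matters because a Counter whose counts have all dropped to zero is still non-empty, and therefore truthy. Here is a small sketch with made-up keys:

```python
from collections import Counter

remaining = Counter({"a": 0, "b": 2, "c": 0})
print("a" in remaining)   # True: the key exists with a zero count
print(bool(remaining))    # True: non-empty even ignoring "b"

remaining = +remaining    # keeps only strictly positive counts
print(remaining)          # Counter({'b': 2})

all_used = +Counter({"a": 0})
print(bool(all_used))     # False: nothing was left over
```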
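Looking up values in a set is faster than looking them up in a list, provided the set is built once, outside the comprehension (writing item not in set(y) inside the comprehension would rebuild the set for every item):
ys = set(y)
[item for item in x if item not in ys]
For a large y this scales better than:
[item for item in x if item not in y]
Both preserve the order of x.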
We can also use set methods to find the difference between two lists:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]
list(set(x).difference(y))
[0, 2, 4, 6, 8]
Try this.
def subtract_lists(a, b):
    """ Subtracts two lists. Throws ValueError if b contains items not in a """
    # Terminate if b is empty, otherwise remove b[0] from a and recurse
    return a if len(b) == 0 else [a[:i] + subtract_lists(a[i+1:], b[1:])
                                  for i in [a.index(b[0])]][0]
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> y = [1,3,5,7,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0]
>>> x = [1,2,3,4,5,6,7,8,9,0,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0, 9] #9 is only deleted once
>>>
The answer provided by @aaronasterling looks good; however, it is not compatible with the default interface of list: x = MyList(1, 2, 3, 4) vs x = MyList([1, 2, 3, 4]). The code below is friendlier to the standard list interface:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(*args)
    def __sub__(self, other):
        return self.__class__([item for item in self if item not in other])
Example:
x = MyList([1, 2, 3, 4])
y = MyList([2, 5, 2])
z = x - y
from collections import Counter
y = Counter(y)
x = Counter(x)
print(list(x-y))
Let:
>>> xs = [1, 2, 3, 4, 3, 2, 1]
>>> ys = [1, 3, 3]
Keep each unique item only once xs - ys == {2, 4}
Take the set difference:
>>> set(xs) - set(ys)
{2, 4}
Remove all occurrences xs - ys == [2, 4, 2]
>>> [x for x in xs if x not in ys]
[2, 4, 2]
If ys is large, convert only [1] ys into a set for better performance:
>>> ys_set = set(ys)
>>> [x for x in xs if x not in ys_set]
[2, 4, 2]
Only remove same number of occurrences xs - ys == [2, 4, 2, 1]
from collections import Counter
def diff(xs, ys):
    counter = Counter(ys)
    for x in xs:
        if counter[x] > 0:
            counter[x] -= 1
            continue
        yield x
>>> list(diff(xs, ys))
[2, 4, 2, 1]
[1] Converting xs to set and taking the set difference is unnecessary (and slower, as well as order-destroying) since we only need to iterate once over xs.
This example subtracts two lists:
# List of pairs of points
pairs = []  # named pairs so the built-in list is not shadowed
pairs.append([(602, 336), (624, 365)])
pairs.append([(635, 336), (654, 365)])
pairs.append([(642, 342), (648, 358)])
pairs.append([(644, 344), (646, 356)])
pairs.append([(653, 337), (671, 365)])
pairs.append([(728, 13), (739, 32)])
pairs.append([(756, 59), (767, 79)])
items_to_remove = []
items_to_remove.append([(642, 342), (648, 358)])
items_to_remove.append([(644, 344), (646, 356)])
print("Initial List Size: ", len(pairs))
for a in items_to_remove:
    for b in pairs[:]:  # iterate over a copy so removing is safe
        if a == b:
            pairs.remove(b)
print("Final List Size: ", len(pairs))
list1 = ['a', 'c', 'a', 'b', 'k']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']
for e in list1:
    try:
        list2.remove(e)
    except ValueError:
        print(f'{e} not in list')
list2
# ['a', 'a', 'c', 'd', 'e', 'f']
This changes list2. If you want to preserve list2, copy it and run this code on the copy.
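A sketch of the copying variant, assuming a shallow copy is enough (the elements here are strings). The name result is made up for the example.

```python
list1 = ['a', 'c', 'a', 'b', 'k']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']

result = list2.copy()  # shallow copy; list2 itself is left alone
for e in list1:
    try:
        result.remove(e)
    except ValueError:
        pass  # e does not occur (any more) in result

print(result)  # ['a', 'a', 'c', 'd', 'e', 'f']
print(list2)   # unchanged
```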
def listsubtraction(parent,child):
    answer=[]
    for element in parent:
        if element not in child:
            answer.append(element)
    return answer
I think this should work. I am a beginner so pardon me for any mistakes