Related
I want to get the values that appear in one of the lists but not in the others. I even tried using '<>', it says invalid syntax. I am trying using list comprehensions.
com_list = []
a1 = [1,2,3,4,5]
b1 = [6,4,2,1]
come_list = [a for a in a1 for b in b1 if a != b ]
Output:
[1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5]
My expected output would be `[3, 5, 6]
What you want is called symmetric difference, you can do:
a1 = [1,2,3,4,5]
b1 = [6,4,2,1]
set(a1).symmetric_difference(b1)
# {3, 5, 6}
which you can also write as:
set(a1) ^ set(b1)
If you really want a list in the end, just convert it:
list(set(a1) ^ set(b1))
# [3, 5, 6]
a1 = [1,2,3,4,5]
b1 = [6,4,2,1]
If you really want to do that using list comprehensions, well, here it is, but it's really not the right thing to do here.
A totally inefficient version:
# Don't do that !
sym_diff = [x for x in a1+b1 if x in a1 and x not in b1 or x in b1 and x not in a1]
print(sym_diff)
# [3, 5, 6]
It would be a bit better using sets to test membership efficiently:
# Don't do that either
a1 = set([1,2,3,4,5])
b1 = set([6,4,2,1])
sym_diff = [x for x in a1|b1 if x in a1 and x not in b1 or x in b1 and x not in a1]
print(sym_diff)
# [3, 5, 6]
But if you start using sets, which is the right thing to do here, use them all the way properly and use symmetric_difference.
You can do
come_list =[i for i in list((set(a1) - set(b1))) + list((set(b1) - set(a1)))]
print(come_list)
Output
[3, 5, 6]
This new list contains all unique numbers for both of the lists together.
the problem with this line come_list = [a for a in a1 for b in b1 if a != b ] is that the items iterating over each item in the first list over all the items in the second list to check if it's inited but it's not giving unique numbers between both.
Consider two sorted numpy arrays:
import numpy as np
a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])
How do I:
1. Find the elements that appear in both lists, and
2. Remove only one instance of that occurrence from each list.
That is the output should be:
a = [1,2,4,8,10,21]
b = [3,3,18,22]
So even if there are duplicates, only one instance is removed. However if the lists are
c = np.array([1,2,4,4,6,8,10,10,10,21])
d = np.array([3,3,4,6,10,10,18,22])
I expect to obtain the new outputs:
c = [1,2,4,8,10,21]
d = [3,3,18,22]
which is the same as above. The difference is the number of 10's in the list. Each of the two 10's in list d takes away one 10 each from c leaving the same result.
This post was the closest match to my question, but it removed all instances of repeats from both lists.
You can use collections.Counter:
from collections import Counter
import numpy as np
a = np.array([1, 2, 4, 4, 6, 8, 10, 10, 21])
b = np.array([3, 3, 4, 6, 10, 18, 22])
ca = Counter(a)
cb = Counter(b)
result_a = sorted((ca - cb).elements())
result_b = sorted((cb - ca).elements())
print(result_a)
print(result_b)
Output
[1, 2, 4, 8, 10, 21]
[3, 3, 18, 22]
It returns the same result for (as expected):
a = np.array([1, 2, 4, 4, 6, 8, 10, 10, 10, 21])
b = np.array([3, 3, 4, 6, 10, 10, 18, 22])
You can find the indices of first occurences of intersecting items using np.searchsorted as following and then remove them using np.delete() function:
In [58]: intersect = a[np.in1d(a, b)]
In [59]: mask1 = np.searchsorted(a, intersect)
In [60]: mask2 = np.searchsorted(b, intersect)
In [61]: np.delete(a, mask1)
Out[61]: array([ 1, 2, 4, 8, 10, 21])
In [62]: np.delete(b, mask2)
Out[62]: array([ 3, 3, 18, 22])
I'm not 100% sure what you're looking to do based on the question, but I have been able to duplicate the output using the methods described.
import numpy as np
# List of b that are not in a
a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])
newb = [x for x in b if x not in a]
print(newb)
# REMOVE ONE DUPLICATED ELEMENT FROM LIST
import collections
counter=collections.Counter(a)
print(counter)
newa = list(a)
for k,v in counter.items():
if v > 1:
newa.remove(k)
print(newa)
If you don't mind the verbosity:
import numpy as np
a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])
common_values = set(a) & set(b)
a = a.tolist()
b = b.tolist()
for value in common_values:
a.remove(value)
b.remove(value)
a = np.array(a)
b = np.array(b)
Using for loops:
import numpy as np
a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])
for i, val in enumerate(a):
if val in b:
a = np.delete(a, np.where(a == val)[0][0])
b = np.delete(b, np.where(b == val)[0][0])
for i, val in enumerate(b):
if val in a:
a = np.delete(a, np.where(a == val)[0][0])
b = np.delete(b, np.where(b == val)[0][0])
print(a)
print(b)
Outputs:
[1,2,4,8,10,21]
[3,3,18,22]
Here is a numpy approach:
import numpy as np
a = np.array([1,2,4,4,6,8,10,10,21])
b = np.array([3,3,4,6,10,18,22])
# join and sort (with Tim sort this should be O(n))
ab = np.concatenate([a,b])
i = ab.argsort(kind="stable")
abo = ab[i]
# mark 1st of each group of equal values
d = np.flatnonzero(np.diff(abo,prepend=abo[0]-1,append=abo[-1]+1))
# mark sorted total by origin (a -> False, b -> True)
ig = i>=len(a)
# compare origins of first and last of each group of equal values
# if they are different mark for deletion
dupl = ig[d[:-1]] ^ ig[d[1:]-1]
# finally, delete
ar = np.delete(a,i[d[:-1][dupl]])
br = np.delete(b,i[d[1:][dupl]-1]-len(a))
# inspect
ar
array([ 1, 2, 4, 8, 10, 21])
br
array([ 3, 3, 18, 22])
I was working on an algorithm and in that, we are trying to write every line in the code such that it adds up a good performance to the final code.
In one situation we have to add lists (more than two specifically). I know some of the ways to join more than two lists also I have looked upon StackOverflow but none of the answers are giving account on the performance of the method.
Can anyone show, what are the ways we can join more than two lists and their respective performance?
Edit : The size of the list is varying from 2 to 13 (to be specific).
Edit Duplicate : I have been specifically asking for the ways we can add and their respected questions and in duplicate question its limited to only 4 methods
There are multiples ways using which you can join more than two list.
Assuming that we have three list,
a = ['1']
b = ['2']
c = ['3']
Then, for joining two or more lists in python,
1)
You can simply concatenate them,
output = a + b + c
2)
You can do it using list comprehension as well,
res_list = [y for x in [a,b,c] for y in x]
3)
You can do it using extend() as well,
a.extend(b)
a.extend(c)
print(a)
4)
You can do it by using * operator as well,
res = [*a,*b,*c]
For calculating performance, I have used timeit module present in python.
The performance of the following methods are;
4th method < 1st method < 3rd method < 2nd [method on the basis of
time]
That means If you are going to use " * operator " for concatenation of more than two lists then you will get the best performance.
Hope you got what you were looking for.
Edit:: Image showing performance of all the methods (Calculated using timeit)
I did some simple measurements, here are my results:
import timeit
from itertools import chain
a = [*range(1, 10)]
b = [*range(1, 10)]
c = [*range(1, 10)]
tests = ("""output = list(chain(a, b, c))""",
"""output = a + b + c""",
"""output = [*chain(a, b, c)]""",
"""output = a.copy();output.extend(b);output.extend(c);""",
"""output = [*a, *b, *c]""",
"""output = a.copy();output+=b;output+=c;""",
"""output = a.copy();output+=[*b, *c]""",
"""output = a.copy();output += b + c""")
results = sorted((timeit.timeit(stmt=test, number=1, globals=globals()), test) for test in tests)
for i, (t, stmt) in enumerate(results, 1):
print(f'{i}.\t{t}\t{stmt}')
Prints on my machine (AMD 2400G, Python 3.6.7):
1. 6.010000106471125e-07 output = [*a, *b, *c]
2. 7.109999842214165e-07 output = a.copy();output += b + c
3. 7.720000212430023e-07 output = a.copy();output+=b;output+=c;
4. 7.820001428626711e-07 output = a + b + c
5. 1.0520000159885967e-06 output = a.copy();output+=[*b, *c]
6. 1.4030001693754457e-06 output = a.copy();output.extend(b);output.extend(c);
7. 1.4820000160398195e-06 output = [*chain(a, b, c)]
8. 2.525000127207022e-06 output = list(chain(a, b, c))
If you are going to concatenate a variable number of lists together, your input is going to be a list of lists (or some equivalent collection). The performance tests need to take this into account because you are not going to be able to do things like list1+list2+list3.
Here are some test results (1000 repetitions):
option1 += loop 0.00097 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option2 itertools.chain 0.00138 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option3 functools.reduce 0.00174 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option4 comprehension 0.00188 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option5 extend loop 0.00127 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option6 deque 0.00180 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
This would indicate that a += loop through the list of lists is the fastest approach
And the source to produce them:
allLists = [ list(range(10)) for _ in range(5) ]
def option1():
result = allLists[0].copy()
for lst in allLists[1:]:
result += lst
return result
from itertools import chain
def option2(): return list(chain(*allLists))
from functools import reduce
def option3():
return list(reduce(lambda a,b:a+b,allLists))
def option4(): return [ e for l in allLists for e in l ]
def option5():
result = allLists[0].copy()
for lst in allLists[1:]:
result.extend(lst)
return result
from collections import deque
def option6():
result = deque()
for lst in allLists:
result.extend(lst)
return list(result)
from timeit import timeit
count = 1000
t = timeit(lambda:option1(), number = count)
print(f"option1 += loop {t:.5f}",option1()[:15])
t = timeit(lambda:option2(), number = count)
print(f"option2 itertools.chain {t:.5f}",option2()[:15])
t = timeit(lambda:option3(), number = count)
print(f"option3 functools.reduce {t:.5f}",option3()[:15])
t = timeit(lambda:option4(), number = count)
print(f"option4 comprehension {t:.5f}",option4()[:15])
t = timeit(lambda:option5(), number = count)
print(f"option5 extend loop {t:.5f}",option5()[:15])
t = timeit(lambda:option6(), number = count)
print(f"option6 deque {t:.5f}",option6()[:15])
So I need to have a code that checks one integer, and checks if the integer after it is the same value. If so, it will add the value to x.
input1 = [int(i) for i in str(1234441122)]
x= 0
So my code currently gives the result [1, 2, 3, 4, 4, 4, 1, 1 ,2 ,2]. I want it to give the result of x = 0+4+4+1+2.
I do not know any way to do that.
The following will work. Zip together adjacent pairs and only take the first elements if they are the same as the second ones:
>>> lst = [1, 2, 3, 4, 4, 4, 1, 1, 2, 2]
>>> sum(x for x, y in zip(lst, lst[1:]) if x == y)
11
While this should be a little less [space-]efficent in theory (as the slice creates an extra list), it still has O(N) complexity in time and space and is well more readable than most solutions based on indexed access. A tricky way to avoid the slice while still being concise and avoiding any imports would be:
>>> sum((lst[i] == lst[i-1]) * lst[i] for i in range(1, len(lst))) # Py2: xrange
11
This makes use of the fact that lst[i]==lst[i-1] will be cast to 0 or 1 appropriately.
Another way using itertools.groupby
l = [1, 2, 3, 4, 4, 4, 1, 1 ,2 ,2]
from itertools import groupby
sum(sum(g)-k for k,g in groupby(l))
#11
You can try this:
s = str(1234441122)
new_data = [int(a) for i, a in enumerate(s) if i+1 < len(s) and a == s[i+1]]
print(new_data)
final_data = sum(new_data)
Output:
[4, 4, 1, 2]
11
No need for that list. You can remove the "non-repeated" digits from the string already:
>>> n = 1234441122
>>> import re
>>> sum(map(int, re.sub(r'(.)(?!\1)', '', str(n))))
11
You are simply iterating on string and converting character to integer. You need to iterate and compare to next character.
a = str(1234441122)
sum = 0
for i,j in enumerate(a[:-1]):
if a[i] == a[i+1]:
sum+=int(a[i])
print(sum)
Output
11
Try this one too:
input1 = [int(i) for i in str(1234441122)]
x= 0
res = [input1[i] for i in range(len(input1)-1) if input1[i+1]==input1[i]]
print(res)
print(sum(res))
Output:
[4, 4, 1, 2]
11
Here's a slightly more space efficient version of #schwobaseggl's answer.
>>> lst = [1, 2, 3, 4, 4, 4, 1, 1, 2, 2]
>>> it = iter(lst)
>>> next(it) # throw away first value
>>> sum(x for x,y in zip(lst, it) if x == y)
11
Alernatively, using an islice from the itertools module is equivalent but looks a bit nicer.
>>> from itertools import islice
>>> sum(x for x,y in zip(lst, islice(lst, 1, None, 1)) if x == y)
11
Ok so I have two lists:
x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]
I compare them in such a way so I get the following output:
x = [3, 4]
y = [1, 5, 6]
The basic is idea to go through each list and compare them. If they have an element in common remove that element. But only one of that element not all of them. If they don't have an element in common leave it. Two identical lists would become x = [], y = []
Here is my rather hacked up and pretty lame solution. I hoping other can recommend a better and / or more pythonic way of doing this. 3 loops seems excessive...
done = True
while not done:
done = False
for x in xlist:
for y in ylist:
if x == y:
xlist.remove(x)
ylist.remove(y)
done = False
print xlist, ylist
Thanks as always for taking the time to read this question. XOXO
It's possible that the data structure you are looking for is the multiset (or "bag"), and if so, a good way to implement it in Python is to use collections.Counter:
>>> from collections import Counter
>>> x = Counter([1, 2, 3, 4])
>>> y = Counter([1, 1, 2, 5, 6])
>>> x - y
Counter({3: 1, 4: 1})
>>> y - x
Counter({1: 1, 5: 1, 6: 1})
If you want to convert the Counter objects back to lists with multiplicity, you can use the elements method:
>>> list((x - y).elements())
[3, 4]
>>> list((y - x).elements())
[1, 5, 6]
If you don't care about order, use collections.Counter to do it in one line:
>>> Counter(x)-Counter(y)
Counter({3: 1, 4: 1})
>>> Counter(y)-Counter(x)
Counter({1: 1, 5: 1, 6: 1})
If you care about order, you can probably iterate through your lists grabbing elements from the above dictionaries:
def prune(seq, toPrune):
"""Prunes elements from front of seq in O(N) time"""
remainder = Counter(seq)-Counter(toPrune)
R = []
for x in reversed(seq):
if remainder.get(x):
remainder[x] -= 1
R.insert(0,x)
return R
Demo:
>>> prune(x,y)
[3, 4]
>>> prune(y,x)
[1, 1, 5, 6]
To build on Gareth's answer:
>>> a = Counter([1, 2, 3, 4])
>>> b = Counter([1, 1, 2, 5, 6])
>>> (a - b).elements()
[3, 4]
>>> (b - a).elements()
[1, 5, 6]
Benchmark code:
from collections import Counter
from collections import defaultdict
import random
# short lists
#a = [1, 2, 3, 4, 7, 8, 9]
#b = [1, 1, 2, 5, 6, 8, 8, 10]
# long lists
a = []
b = []
for i in range(0, 1000):
q = random.choice((1, 2, 3, 4))
if q == 1:
a.append(i)
elif q == 2:
b.append(i)
elif q == 3:
a.append(i)
b.append(i)
else:
a.append(i)
b.append(i)
b.append(i)
# Modifies the lists in-place! Naughty! And it doesn't actually work, to boot.
def original(xlist, ylist):
done = False
while not done:
done = True
for x in xlist:
for y in ylist:
if x == y:
xlist.remove(x)
ylist.remove(y)
done = False
return xlist, ylist # not strictly necessary, see above
def counter(xlist, ylist):
x = Counter(xlist)
y = Counter(ylist)
return ((x-y).elements(), (y-x).elements())
def nasty(xlist, ylist):
x = sum(([i]*(xlist.count(i)-ylist.count(i)) for i in set(xlist)),[])
y = sum(([i]*(ylist.count(i)-xlist.count(i)) for i in set(ylist)),[])
return x, y
def gnibbler(xlist, ylist):
d = defaultdict(int)
for i in xlist: d[i] += 1
for i in ylist: d[i] -= 1
return [k for k,v in d.items() for i in range(v)], [k for k,v in d.items() for i in range(-v)]
# substitute algorithm to test in the call
for x in range(0, 100000):
original(list(a), list(b))
Running the Insufficiently Rigorous Benchmarks[tm] (short lists are the original ones, long lists are randomly generated lists approximately 1000 entries long with a mix of matches and repeats, times given in multipliers of the Original algorithm):
100K iterations, short lists 1K iterations, long lists
Original 1.0 1.0
Counter 9.3 0.06
Nasty 2.9 1.4
Gnibbler 2.4 0.02
Note 1: The creation of the Counter object seems to overshadow the actual algorithm at small list sizes.
Note 2: Original and gnibbler are the same at list lengths of approximately 35, above which gnibbler (and Counter) are faster.
Just using collections.defaultdict so will work on Python2.5+
>>> x = [1, 2, 3, 4]
>>> y = [1, 1, 2, 5, 6]
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in x:
... d[i] += 1
...
>>> for i in y:
... d[i] -= 1
...
>>> [k for k,v in d.items() for i in range(v)]
[3, 4]
>>> [k for k,v in d.items() for i in range(-v)]
[1, 5, 6]
I find this is better than range (or xrange) if the number repetitions get large
>>> from itertools import repeat
>>> [k for k,v in d.items() for i in repeat(None, v)]
Quite nasty :P
a = sum(([i]*(x.count(i)-y.count(i)) for i in set(x)),[])
b = sum(([i]*(y.count(i)-x.count(i)) for i in set(y)),[])
x,y = a,b
This is simple if you dont care about the duplicates:
>>> x=[1,2,3,4]
>>> y=[1,1,2,5,6]
>>> list(set(x).difference(set(y)))
[3, 4]
>>> list(set(y).difference(set(x)))
[5, 6]