Find common elements in two lists in linear time complexity - Python

I have two unsorted lists of integers without duplicates. Both contain the same elements, but not in the same order, and I want to find the indices of the common elements between the two lists with the lowest possible time complexity. For example:
list1 = [1, 8, 5, 3, 4]
list2 = [5, 4, 1, 3, 8]
the output should be :
list1[0] With list2[2]
list1[1] With list2[4]
list1[2] With list2[0]
and so on.
I thought of using set.intersection and then finding the indices with the 'index' function, but I didn't know how to print the output in the right way.
This is what I've tried:
b = set(list1).intersection(list2)
ina = [list1.index(x) for x in b]
inb = [list2.index(x) for x in b]
print (ina , inb )

To find them in linear time you should use some kind of hashing. The easiest way in Python is to use a dict:
list1 = [1, 8, 5, 3, 4]
list2 = [5, 4, 1, 3, 8]
common = set(list1).intersection(list2)
dict2 = {e: i for i, e in enumerate(list2) if e in common}
result = [(i, dict2[e]) for i, e in enumerate(list1) if e in common]
The result will be
[(0, 2), (1, 4), (2, 0), (3, 3), (4, 1)]
You can use something like this to format and print it:
for i1, i2 in result:
    print(f"list1[{i1}] with list2[{i2}]")
You get:
list1[0] with list2[2]
list1[1] with list2[4]
list1[2] with list2[0]
list1[3] with list2[3]
list1[4] with list2[1]

Create a dictionary that maps elements of one list to their indexes. Then update it to have the indexes of the corresponding elements of the other list. Then any element that has two indices is in the intersection.
intersect = {x: [i] for i, x in enumerate(list1)}
for i, x in enumerate(list2):
    if x in intersect:
        intersect[x].append(i)
for l in intersect.values():
    if len(l) == 2:
        print(f'list1[{l[0]}] with list2[{l[1]}]')

a = [1, 8, 5, 3, 4]
b = [5, 4, 1, 3, 8]
e2i = {e: i for (i, e) in enumerate(b)}
for i, e in enumerate(a):
    if e in e2i:
        print('list1[%d] with list2[%d]' % (i, e2i[e]))

Building on the excellent answers here, you can squeeze a little more juice out of the lemon by not bothering to record the indices of a. (Those indices are just 0 through len(a) - 1 anyway and you can add them back later if needed.)
e2i = {e: i for (i, e) in enumerate(b)}
# iterate over a itself: enumerate(a) would yield (index, element) tuples, which are never keys of e2i
output = [e2i.get(e) for e in a]
output
# [2, 4, 0, 3, 1]
With len(a) == len(b) == 5000 on my machine this code runs a little better than twice as fast as Björn Lindqvist's code (after I modified his code to store the output rather than print it).
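For anyone who wants to check the timing claim on their own machine, here is a minimal sketch using timeit. The function names pairs_with_indices and indices_only are made up for this example, and the inputs are random permutations of range(5000), which is an assumption about the test data. The numbers will vary with the machine and the Python version, so no expected timings are shown.

```python
import random
import timeit

def pairs_with_indices(a, b):
    # Map each element of b to its index, then pair it with the index in a.
    e2i = {e: i for i, e in enumerate(b)}
    return [(i, e2i[e]) for i, e in enumerate(a) if e in e2i]

def indices_only(a, b):
    # Same lookup, but only the index in b is stored (None if e is not in b).
    e2i = {e: i for i, e in enumerate(b)}
    return [e2i.get(e) for e in a]

if __name__ == "__main__":
    a = random.sample(range(5000), 5000)
    b = random.sample(range(5000), 5000)
    for fn in (pairs_with_indices, indices_only):
        t = timeit.timeit(lambda: fn(a, b), number=200)
        print(f"{fn.__name__}: {t:.3f} s for 200 runs")
```

Both functions build the same dictionary, so any difference comes from the work done per element of a.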

Related

Comparing two lists and making new list

So let's say I have two lists a=[1,2,3,4,5,6] and b=[2,34,5,67,5,6]. I want to create a third list which will have 1 where the elements of a and b differ and 0 where they are the same, so the result above would be c=[1,1,1,1,0,0].
You can zip the lists and compare them in a list comprehension. This takes advantage of the fact that booleans are equivalent to 1 and 0 in python:
a=[1,2,3,4,5,6]
b=[2,34,5,67,5,6]
[int(m != n) for m, n in zip(a, b)]
# [1, 1, 1, 1, 0, 0]
Try a list comprehension over elements of each pair of items in the list with zip:
[ 0 if i == j else 1 for i,j in zip(a,b) ]
Iterating with a for loop is an option, though list comprehension may be more efficient.
a=[1,2,3,4,5,6]
b=[2,34,5,67,5,6]
c=[]
for i in range(len(a)):
    if a[i] == b[i]:
        c.append(0)
    else:
        c.append(1)
print(c)
prints
[1, 1, 1, 1, 0, 0]
If you will have multiple vector operations and they need to be fast, check out numpy.
import numpy as np
a=[1,2,3,4,5,6]
b=[2,34,5,67,5,6]
a = np.array(a)
b = np.array(b)
c = (a != b).astype(int)
# array([1, 1, 1, 1, 0, 0])
I don't know if this is exactly what you're looking for, but this should work:
Edit: just found out that Joe Thor commented almost exactly the same thing a few minutes earlier than me, lmao
a = [1, 2, 3, 4, 5, 6]
b = [2, 34, 5, 67, 5, 6]
results = []
for f in range(0, len(a)):
    if a[f] == b[f]:
        results.append(0)
    else:
        results.append(1)
print(results)
This can be done fairly simply using a for loop. It does assume that both lists, a and b, are the same length. Example code would look something like this:
a = [1,2,3,4,5,6]
b = [2,34,5,67,5,6]
c = []
if len(a) == len(b):
    for i in range(0, len(a)):
        if a[i] != b[i]:
            c.append(1)
        else:
            c.append(0)
This can also be done using list comprehension:
a = [1,2,3,4,5,6]
b = [2,34,5,67,5,6]
c = []
if len(a) == len(b):
    c = [int(i != j) for i, j in zip(a, b)]
The list comprehension code is from this thread: Comparing values in two lists in Python
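The snippets above silently leave c empty when the lengths differ. As a minimal sketch of a stricter alternative, the hypothetical helper diff_flags below (the name is an assumption, not from the thread) raises an error instead. Whether failing loudly is the right behaviour depends on your use case.

```python
def diff_flags(a, b):
    """Return 1 where a and b differ and 0 where they match, position by position."""
    if len(a) != len(b):
        # zip() would silently stop at the shorter list, so fail loudly instead.
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    return [int(x != y) for x, y in zip(a, b)]

print(diff_flags([1, 2, 3, 4, 5, 6], [2, 34, 5, 67, 5, 6]))
# [1, 1, 1, 1, 0, 0]
```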
a = [1, 2, 3, 4, 5, 6]
b = [2, 34, 5, 67, 5,6]
c = []
index = 0
x = 1
y = 0
for i in range(len(a)):  # loop from index 0 to the last index
    if a[index] != b[index]:  # compare the elements at this index
        c.append(x)  # if not equal, append '1' to c
        index += 1  # increment index to move to the next position in both lists
    else:
        c.append(y)
        index += 1
print(c)
This should work for two lists of any type.
tstlist = ["w","s","u"]
lstseasons = ["s","u","a","w"]
lstbool_Seasons = [1 if ele in tstlist else 0 for ele in lstseasons]
Output: lstbool_Seasons = [1,1,0,1]
This is the first time I have posted anything, still figuring out how things work here, so please forgive faux pas...
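One caveat on the membership test above: `ele in tstlist` scans the list each time, so the whole comprehension is quadratic in the worst case. A minimal sketch of a set-based variant follows. It assumes the elements are hashable, and the name wanted is introduced here only for illustration.

```python
tstlist = ["w", "s", "u"]
lstseasons = ["s", "u", "a", "w"]

wanted = set(tstlist)  # assumes the elements are hashable
flags = [1 if ele in wanted else 0 for ele in lstseasons]
print(flags)  # [1, 1, 0, 1]
```

For unhashable elements (lists, dicts), the original list-based test is still the one to use.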

Inserting elements of one list into another list at different positions in python

Consider the two lists:
a=[1,2,3]
and
b=[10,20,30],
and a list of positions
pos=[p1,p2,p3]
giving the positions that the elements of b should take in the final list of 6 elements given by the union of a and b, where p1 is the position of b[0]=10, p2 is the position of b[1]=20 and p3 is the position of b[2]=30.
What is the best python approach to this problem?
You could create the output list by extending it with slices of a and appending the next item of b where needed:
def insert(a, b, positions):
    # reorder b and positions so that positions are in increasing order
    positions, b = zip(*sorted(zip(positions, b)))
    out = []
    a_idx = 0
    it_b = iter(b)
    for pos in positions:
        slice_length = pos - len(out)
        out.extend(a[a_idx:a_idx + slice_length])
        out.append(next(it_b))
        a_idx += slice_length
    out.extend(a[a_idx:])
    return out
An example:
a=[1,2,3]
b=[10,20,30]
pos=[0, 1, 5]
insert(a, b, pos)
# [10, 20, 1, 2, 3, 30]
pos = [0, 2, 4]
insert(a, b, pos)
# [10, 1, 20, 2, 30, 3]
pos=[5, 3, 0]
insert(a, b, pos)
# [30, 1, 2, 20, 3, 10]
If you make the indices and values into a dictionary, you can then loop over the range of the combined lengths. If the index is in the dict, use the value, otherwise take the next value from a:
a = [1,2,3]
b = [10,20,30]
pos =[2,0,5]
p_b = dict(zip(pos, b))
it_a = iter(a)
[p_b[i] if i in p_b else next(it_a) for i in range(len(a) + len(b))]
# [20, 1, 10, 2, 3, 30]
You will need to ensure that the lengths of the lists and the positions all make sense. If they don't, you can run out of a values, which will raise a StopIteration exception.
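As a minimal sketch of what "making sense" could mean here, the hypothetical function merge_at_positions below wraps the dictionary approach and checks the inputs first. The function name and the particular checks are assumptions for illustration, not part of the original answer.

```python
def merge_at_positions(a, b, pos):
    """Place b[k] at index pos[k] of the result and fill the other slots from a, in order."""
    total = len(a) + len(b)
    if len(pos) != len(b):
        raise ValueError("need exactly one position per element of b")
    if len(set(pos)) != len(pos):
        raise ValueError("positions must be unique")
    if any(not 0 <= p < total for p in pos):
        raise ValueError(f"positions must lie in range(0, {total})")
    p_b = dict(zip(pos, b))
    it_a = iter(a)
    # After the checks above, exactly len(a) slots are left for a, so next() cannot run dry.
    return [p_b[i] if i in p_b else next(it_a) for i in range(total)]

print(merge_at_positions([1, 2, 3], [10, 20, 30], [2, 0, 5]))
# [20, 1, 10, 2, 3, 30]
```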
You can use a defaultdict for a similar approach, which simplifies the list comprehension at the expense of a slightly more complicated setup:
from collections import defaultdict
a = [1,2,3]
b = [10,20,30]
pos =[4,0,2]
it_a = iter(a)
d = defaultdict(lambda: next(it_a))
d.update(dict(zip(pos, b)))
[d[i] for i in range(len(a) + len(b))]
# [20, 1, 30, 2, 10, 3]

Sum integer list when next integer is the same value

So I need to have a code that checks one integer, and checks if the integer after it is the same value. If so, it will add the value to x.
input1 = [int(i) for i in str(1234441122)]
x= 0
So my code currently gives the result [1, 2, 3, 4, 4, 4, 1, 1 ,2 ,2]. I want it to give the result of x = 0+4+4+1+2.
I do not know any way to do that.
The following will work. Zip together adjacent pairs and only take the first elements if they are the same as the second ones:
>>> lst = [1, 2, 3, 4, 4, 4, 1, 1, 2, 2]
>>> sum(x for x, y in zip(lst, lst[1:]) if x == y)
11
While this should be a little less space-efficient in theory (as the slice creates an extra list), it still has O(N) complexity in time and space and is much more readable than most solutions based on indexed access. A tricky way to avoid the slice while still being concise and avoiding any imports would be:
>>> sum((lst[i] == lst[i-1]) * lst[i] for i in range(1, len(lst))) # Py2: xrange
11
This makes use of the fact that lst[i]==lst[i-1] will be cast to 0 or 1 appropriately.
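To make the bool-to-int trick concrete, here is a small sketch that spells out the individual terms of that sum. The variable name terms is introduced only for this illustration.

```python
lst = [1, 2, 3, 4, 4, 4, 1, 1, 2, 2]

# bool is a subclass of int, so True and False behave as 1 and 0 in arithmetic.
assert isinstance(True, int)
assert True * 4 == 4 and False * 4 == 0

# Each term is lst[i] when it equals its predecessor, and 0 otherwise.
terms = [(lst[i] == lst[i - 1]) * lst[i] for i in range(1, len(lst))]
print(terms)       # [0, 0, 0, 4, 4, 0, 1, 0, 2]
print(sum(terms))  # 11
```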
Another way using itertools.groupby
l = [1, 2, 3, 4, 4, 4, 1, 1 ,2 ,2]
from itertools import groupby
sum(sum(g)-k for k,g in groupby(l))
#11
You can try this:
s = str(1234441122)
new_data = [int(a) for i, a in enumerate(s) if i+1 < len(s) and a == s[i+1]]
print(new_data)
final_data = sum(new_data)
Output:
[4, 4, 1, 2]
11
No need for that list. You can remove the "non-repeated" digits from the string already:
>>> n = 1234441122
>>> import re
>>> sum(map(int, re.sub(r'(.)(?!\1)', '', str(n))))
11
You are simply iterating over the string and converting each character to an integer. You need to iterate and compare each character to the next one.
a = str(1234441122)
total = 0  # named total to avoid shadowing the built-in sum()
for i in range(len(a) - 1):
    if a[i] == a[i+1]:
        total += int(a[i])
print(total)
Output
11
Try this one too:
input1 = [int(i) for i in str(1234441122)]
x= 0
res = [input1[i] for i in range(len(input1)-1) if input1[i+1]==input1[i]]
print(res)
print(sum(res))
Output:
[4, 4, 1, 2]
11
Here's a slightly more space-efficient version of @schwobaseggl's answer.
>>> lst = [1, 2, 3, 4, 4, 4, 1, 1, 2, 2]
>>> it = iter(lst)
>>> next(it) # throw away first value
>>> sum(x for x,y in zip(lst, it) if x == y)
11
Alternatively, using islice from the itertools module is equivalent but looks a bit nicer.
>>> from itertools import islice
>>> sum(x for x,y in zip(lst, islice(lst, 1, None, 1)) if x == y)
11

Python equivalent of R "split"-function

In R, you could split a vector according to the factors of another vector:
> a <- 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> b <- rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> split(a,b)
$`1`
[1] 1 3 5 7 9
$`2`
[1] 2 4 6 8 10
Thus, this groups a list (in Python terms) according to the values of another list (following the order of the factors).
Is there anything handy in Python like that, apart from the itertools.groupby approach?
From your example, it looks like each element in b contains the 1-indexed list in which the node will be stored. Python lacks the automatic numeric variables that R seems to have, so we'll return a tuple of lists. If you can use zero-indexed lists and you only need two lists (i.e., where your R use case has only the values 1 and 2, in Python they'll be 0 and 1), the result should look like this:
>>> a = range(1, 11)
>>> b = [0,1] * 5
>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
Then you can use itertools.compress:
import itertools
def split(x, f):
    return list(itertools.compress(x, f)), list(itertools.compress(x, (not i for i in f)))
If you need more general input (multiple numbers), something like the following will return an n-tuple:
def split(x, f):
    count = max(f) + 1
    # range replaces Python 2's xrange
    return tuple(list(itertools.compress(x, (el == i for el in f))) for i in range(count))
>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
Edit: warning, this is a groupby solution, which is not what the OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.
Here's one way with itertools.
import itertools
# make your sample data
a = range(1,11)
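# list(...) is needed on Python 3, where zip returns an iterator that cannot be indexed
b = list(zip(*zip(range(len(a)), itertools.cycle((1, 2)))))[1]
{k: list(zip(*g))[1] for k, g in itertools.groupby(sorted(zip(b, a)), lambda x: x[0])}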
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}
This gives you a dictionary, which is analogous to the named list that you get from R's split.
As a long time R user I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
from collections import defaultdict
def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res
>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
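As far as I know, R's split returns the groups ordered by factor level, whereas the defaultdict above keeps the order in which labels first appear. A minimal sketch of a variant that sorts the keys follows. The name split_sorted is hypothetical, and the sketch assumes the labels in f can be compared with each other.

```python
from collections import defaultdict

def split_sorted(x, f):
    """Group the values of x by the matching label in f; keys come out in sorted order."""
    groups = defaultdict(list)
    for value, label in zip(x, f):
        groups[label].append(value)
    # Convert to a plain dict so a missing key raises KeyError instead of creating a new group.
    return {label: groups[label] for label in sorted(groups)}

print(split_sorted(range(1, 11), [2, 1] * 5))
# {1: [2, 4, 6, 8, 10], 2: [1, 3, 5, 7, 9]}
```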
You could try:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]
results in:
In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]
In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]
To make this generalise you can simply iterate over the unique elements in b:
splits = {}
for index in set(b):
    splits[index] = [a[k] for k in (i for i,j in enumerate(b) if j == index)]

How do you calculate the greatest number of repetitions in a list?

If I have a list in Python like
[1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1]
How do I calculate the greatest number of repeats for any element? In this case 2 is repeated a maximum of 4 times and 1 is repeated a maximum of 3 times.
Is there a way to do this but also record the index at which the longest run began?
Use groupby; it groups consecutive elements by value:
from itertools import groupby
group = groupby([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1])
print(max(group, key=lambda k: len(list(k[1]))))
And here is the code in action:
>>> group = groupby([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1])
>>> print(max(group, key=lambda k: len(list(k[1]))))
(2, <itertools._grouper object at 0xb779f1cc>)
>>> group = groupby([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 3, 3, 3, 3, 3])
>>> print(max(group, key=lambda k: len(list(k[1]))))
(3, <itertools._grouper object at 0xb7df95ec>)
From the Python documentation:
"The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes"
# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
If you also want the index of the longest run you can do the following:
group = groupby([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 3, 3, 3, 3, 3])
result = []
index = 0
for k, g in group:
    length = len(list(g))
    result.append((k, length, index))
    index += length
print(max(result, key=lambda a: a[1]))
Loop through the list, keep track of the current number and how many times it has been repeated, and compare that to the most times you've seen that number repeated.
Counts = {}
Current = None
Current_Count = 0
LIST = [1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1]
for i in LIST:
    if Current == i:
        Current_Count += 1
    else:
        Current_Count = 1
        Current = i
    # Counts.get avoids a KeyError the first time a value is seen
    if Current_Count > Counts.get(i, 0):
        Counts[i] = Current_Count
print(Counts)
If you want it for just any element (i.e. the element with the most repetitions), you could use:
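from functools import reduce  # reduce is not a built-in on Python 3
def f(state, x):
    # tuple parameters are not allowed on Python 3, so unpack inside the function
    v, l, m = state
    nl = l+1 if x==v else 1
    return (x, nl, max(m,nl))
maxrep = reduce(f, l, (0,0,0))[2]  # l here is the input list
This only counts continuous repetitions (the result for [1,2,2,2,1,2] would be 3) and only records the element with the maximum number.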
Edit: Made definition of f a bit shorter ...
This is my solution:
def longest_repetition(l):
    if l == []:
        return None
    element = l[0]
    new = []
    lar = []
    for e in l:
        if e == element:
            new.append(e)
        else:
            if len(new) > len(lar):
                lar = new
            new = []
            new.append(e)
            element = e
    if len(new) > len(lar):
        lar = new
    return lar[0]
- You can make a new copy of the list with only the unique values, plus a corresponding list of hits.
- Then take the max of the hits list and use its index to get the most repeated item.
oldlist = ["A", "B", "E", "C","A", "C","D","A", "E"]
newlist=[]
hits=[]
for i in range(len(oldlist)):
    if oldlist[i] in newlist:
        hits[newlist.index(oldlist[i])] += 1
    else:
        newlist.append(oldlist[i])
        hits.append(1)
#find the most repeated item
temp_max_hits=max(hits)
temp_max_hits_index=hits.index(temp_max_hits)
print(newlist[temp_max_hits_index])
print(temp_max_hits)
But I don't know whether this is the fastest way to do it or whether there are faster solutions.
If you think there is a faster or more efficient solution, kindly let us know.
I'd use a hashmap of item to counter.
Every time you see a 'key' succession, increment its counter value. If you hit a new element, set the counter to 1 and keep going. At the end of this linear search, you should have the maximum succession count for each number.
This code seems to work:
l = [1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1]
previous = None
# value/repetition pair
greatest = (-1, -1)
reps = 1
for e in l:
    if e == previous:
        reps += 1
    else:
        if reps > greatest[1]:
            greatest = (previous, reps)
        previous = e
        reps = 1
if reps > greatest[1]:
    greatest = (previous, reps)
print(greatest)
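The hashmap idea described above can also be sketched so that it records where each longest run starts, which the question asked about. This is only an illustration under some assumptions: the function name longest_runs is made up, the input must support indexing, and ties are resolved in favour of the earliest run.

```python
def longest_runs(seq):
    """Map each value to (length, start index) of its longest consecutive run."""
    best = {}
    run_start = 0
    for i, item in enumerate(seq):
        if i > 0 and item != seq[i - 1]:
            run_start = i  # a new run begins here
        run_length = i - run_start + 1
        # Strict '>' keeps the earliest run when two runs tie in length.
        if run_length > best.get(item, (0, 0))[0]:
            best[item] = (run_length, run_start)
    return best

print(longest_runs([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1]))
# {1: (3, 5), 2: (4, 1)}
```

The overall longest run can then be read off with max(best.items(), key=lambda kv: kv[1][0]).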
I wrote this code and it works easily:
lst = [4,7,2,7,7,7,3,12,57]
maximum=0
for i in lst:
    count = lst.count(i)
    if count>maximum:
        maximum=count
        indexx = lst.index(i)
print(lst[indexx])
