Optimize search to find next matching value in a list

Optimize search to find next matching value in a list - python

I have a program that goes through a list and for each objects finds the next instance that has a matching value. When it does it prints out the location of each objects. The program runs perfectly fine but the trouble I am running into is when I run it with a large volume of data (~6,000,000 objects in the list) it will take much too long. If anyone could provide insight into how I can make the process more efficient, I would greatly appreciate it.
def search(list):
original = list
matchedvalues = []
count = 0
for x in original:
targetValue = x.getValue()
count = count + 1
copy = original[count:]
for y in copy:
if (targetValue == y.getValue):
print (str(x.getLocation) + (,) + str(y.getLocation))
break

Perhaps you can make a dictionary that contains a list of indexes that correspond to each item, something like this:
values = [1,2,3,1,2,3,4]
from collections import defaultdict
def get_matches(x):
my_dict = defaultdict(list)
for ind, ele in enumerate(x):
my_dict[ele].append(ind)
return my_dict
Result:
>>> get_matches(values)
defaultdict(<type 'list'>, {1: [0, 3], 2: [1, 4], 3: [2, 5], 4: [6]})
Edit:
I added this part, in case it helps:
values = [1,1,1,1,2,2,3,4,5,3]
def get_next_item_ind(x, ind):
my_dict = get_matches(x)
indexes = my_dict[x[ind]]
temp_ind = indexes.index(ind)
if len(indexes) > temp_ind + 1:
return(indexes)[temp_ind + 1]
return None
Result:
>>> get_next_item_ind(values, 0)
1
>>> get_next_item_ind(values, 1)
2
>>> get_next_item_ind(values, 2)
3
>>> get_next_item_ind(values, 3)
>>> get_next_item_ind(values, 4)
5
>>> get_next_item_ind(values, 5)
>>> get_next_item_ind(values, 6)
9
>>> get_next_item_ind(values, 7)
>>> get_next_item_ind(values, 8)

There are a few ways you could increase the efficiency of this search by minimising additional memory use (particularly when your data is BIG).
you can operate directly on the list you are passing in, and don't need to make copies of it, in this way you won't need: original = list, or copy = original[count:]
you can use slices of the original list to test against, and enumerate(p) to iterate through these slices. You won't need the extra variable count and, enumerate(p) is efficient in Python
Re-implemented, this would become:
def search(p):
# iterate over p
for i, value in enumerate(p):
# if value occurs more than once, print locations
# do not re-test values that have already been tested (if value not in p[:i])
if value not in p[:i] and value in p[(i + 1):]:
print(e, ':', i, p[(i + 1):].index(e))
v = [1,2,3,1,2,3,4]
search(v)
1 : 0 2
2 : 1 2
3 : 2 2
Implementing it this way will only print out the values / locations where a value is repeated (which I think is what you intended in your original implementation).
Other considerations:
More than 2 occurrences of value: If the value repeats many times in the list, then you might want to implement a function to walk recursively through the list. As it is, the question doesn't address this - and it may be that it doesn't need to in your situation.
using a dictionary: I completely agree with Akavall above, dictionary's are a great way of looking up values in Python - especially if you need to lookup values again later in the program. This will work best if you construct a dictionary instead of a list when you originally create the list. But if you are only doing this once, it is going to cost you more time to construct the dictionary and query over it than simply iterating over the list as described above.
Hope this helps!

Related

Indexing a list with an unique index

I have a list say l = [10,10,20,15,10,20]. I want to assign each unique value a certain "index" to get [1,1,2,3,1,2].
This is my code:
a = list(set(l))
res = [a.index(x) for x in l]
Which turns out to be very slow.
l has 1M elements, and 100K unique elements. I have also tried map with lambda and sorting, which did not help. What is the ideal way to do this?

You can do this in O(N) time using a defaultdict and a list comprehension:
>>> from itertools import count
>>> from collections import defaultdict
>>> lst = [10, 10, 20, 15, 10, 20]
>>> d = defaultdict(count(1).next)
>>> [d[k] for k in lst]
[1, 1, 2, 3, 1, 2]
In Python 3 use __next__ instead of next.
If you're wondering how it works?
The default_factory(i.e count(1).next in this case) passed to defaultdict is called only when Python encounters a missing key, so for 10 the value is going to be 1, then for the next ten it is not a missing key anymore hence the previously calculated 1 is used, now 20 is again a missing key and Python will call the default_factory again to get its value and so on.
d at the end will look like this:
>>> d
defaultdict(<method-wrapper 'next' of itertools.count object at 0x1057c83b0>,
{10: 1, 20: 2, 15: 3})

The slowness of your code arises because a.index(x) performs a linear search and you perform that linear search for each of the elements in l. So for each of the 1M items you perform (up to) 100K comparisons.
The fastest way to transform one value to another is looking it up in a map. You'll need to create the map and fill in the relationship between the original values and the values you want. Then retrieve the value from the map when you encounter another of the same value in your list.
Here is an example that makes a single pass through l. There may be room for further optimization to eliminate the need to repeatedly reallocate res when appending to it.
res = []
conversion = {}
i = 0
for x in l:
if x not in conversion:
value = conversion[x] = i
i += 1
else:
value = conversion[x]
res.append(value)

Well I guess it depends on if you want it to return the indexes in that specific order or not. If you want the example to return:
[1,1,2,3,1,2]
then you can look at the other answers submitted. However if you only care about getting a unique index for each unique number then I have a fast solution for you
import numpy as np
l = [10,10,20,15,10,20]
a = np.array(l)
x,y = np.unique(a,return_inverse = True)
and for this example the output of y is:
y = [0,0,2,1,0,2]
I tested this for 1,000,000 entries and it was done essentially immediately.

Your solution is slow because its complexity is O(nm) with m being the number of unique elements in l: a.index() is O(m) and you call it for every element in l.
To make it O(n), get rid of index() and store indexes in a dictionary:
>>> idx, indexes = 1, {}
>>> for x in l:
... if x not in indexes:
... indexes[x] = idx
... idx += 1
...
>>> [indexes[x] for x in l]
[1, 1, 2, 3, 1, 2]
If l contains only integers in a known range, you could also store indexes in a list instead of a dictionary for faster lookups.

You can use collections.OrderedDict() in order to preserve the unique items in order and, loop over the enumerate of this ordered unique items in order to get a dict of items and those indices (based on their order) then pass this dictionary with the main list to operator.itemgetter() to get the corresponding index for each item:
>>> from collections import OrderedDict
>>> from operator import itemgetter
>>> itemgetter(*lst)({j:i for i,j in enumerate(OrderedDict.fromkeys(lst),1)})
(1, 1, 2, 3, 1, 2)

For completness, you can also do it eagerly:
from itertools import count
wordid = dict(zip(set(list_), count(1)))
This uses a set to obtain the unique words in list_, pairs
each of those unique words with the next value from count() (which
counts upwards), and constructs a dictionary from the results.
Original answer, written by nneonneo.

How to find second smallest UNIQUE number in a list?

I need to create a function that returns the second smallest unique number, which means if
list1 = [5,4,3,2,2,1], I need to return 3, because 2 is not unique.
I've tried:
def second(list1):
result = sorted(list1)[1]
return result
and
def second(list1):
result = list(set((list1)))
return result
but they all return 2.
EDIT1:
Thanks guys! I got it working using this final code:
def second(list1):
b = [i for i in list1 if list1.count(i) == 1]
b.sort()
result = sorted(b)[1]
return result
EDIT 2:
Okay guys... really confused. My Prof just told me that if list1 = [1,1,2,3,4], it should return 2 because 2 is still the second smallest number, and if list1 = [1,2,2,3,4], it should return 3.
Code in eidt1 wont work if list1 = [1,1,2,3,4].
I think I need to do something like:
if duplicate number in position list1[0], then remove all duplicates and return second number.
Else if duplicate number postion not in list1[0], then just use the code in EDIT1.

Without using anything fancy, why not just get a list of uniques, sort it, and get the second list item?
a = [5,4,3,2,2,1] #second smallest is 3
b = [i for i in a if a.count(i) == 1]
b.sort()
>>>b[1]
3
a = [5,4,4,3,3,2,2,1] #second smallest is 5
b = [i for i in a if a.count(i) == 1]
b.sort()
>>> b[1]
5
Obviously you should test that your list has at least two unique numbers in it. In other words, make sure b has a length of at least 2.

Remove non unique elements - use sort/itertools.groupby or collections.Counter
Use min - O(n) to determine the minimum instead of sort - O(nlongn). (In any case if you are using groupby the data is already sorted) I missed the fact that OP wanted the second minimum, so sorting is still a better option here
Sample Code
Using Counter
>>> sorted(k for k, v in Counter(list1).items() if v == 1)[1]
1
Using Itertools
>>> sorted(k for k, g in groupby(sorted(list1)) if len(list(g)) == 1)[1]
3

Here's a fancier approach that doesn't use count (which means it should have significantly better performance on large datasets).
from collections import defaultdict
def getUnique(data):
dd = defaultdict(lambda: 0)
for value in data:
dd[value] += 1
result = [key for key in dd.keys() if dd[key] == 1]
result.sort()
return result
a = [5,4,3,2,2,1]
b = getUnique(a)
print(b)
# [1, 3, 4, 5]
print(b[1])
# 3

Okay guys! I got the working code thanks to all your help and helping me to think on the right track. This code works:
`def second(list1):
if len(list1)!= len(set(list1)):
result = sorted(list1)[2]
return result
elif len(list1) == len(set(list1)):
result = sorted(list1)[1]
return result`

Okay, here usage of set() on a list is not going to help. It doesn't purge the duplicated elements. What I mean is :
l1=[5,4,3,2,2,1]
print set(l1)
Prints
[0, 1, 2, 3, 4, 5]
Here, you're not removing the duplicated elements, but the list gets unique
In your example you want to remove all duplicated elements.
Try something like this.
l1=[5,4,3,2,2,1]
newlist=[]
for i in l1:
if l1.count(i)==1:
newlist.append(i)
print newlist
This in this example prints
[5, 4, 3, 1]
then you can use heapq to get your second largest number in your list, like this
print heapq.nsmallest(2, newlist)[-1]
Imports : import heapq, The above snippet prints 3 for you.
This should to the trick. Cheers!

PYTHON - creating multiple lists in a loop [duplicate]

This question already has answers here:
How can you dynamically create variables? [duplicate]
(8 answers)
Closed 7 years ago.
I'm trying to put together a programming project at my university, but I'm stuck. The "math" part I've got covered, but I need help with lists and loops in Python.
I'm working with graphs, let me show one example, to make it clearer:
3 node graph
As you can see, I have 3 nodes in it and 2 edges, nothing fancy. I need to compute the closest route between each pair of nodes, and I have that covered already. Now I need to put it either in n lists, n elements each, or in a n x n, so either:
a = [0, 1, 2]
b = [1, 0, 1]
c = [2, 1, 0]
or a table (or matrix?) like this:
table 3x3, where I only really need the part with the white background.
When I read data to work on, I put each node into a list, as a new element, so with these 3 steps I gradually get:
lw = []
lw = ['a']
lw = ['a','b']
lw = ['a','b','c']
Is there any way, to create empty lists out of elemnts of lw? I would really like to name them like dis_a, dis_b, dis_c etc. I tried to do it with a dictionary, but then I couldn't manipulate with those lists, like I usually do. And truth be told, I'd much prefer the solution using lists, as I already have a latter part of the program written for this.
EDIT:
Ok, so one of you asked for input/output to make my question clearer. Mmy input is:
lw = ['a','b','c']
my desired output is:
a = []
b = []
c = []
or something like that (lists that can be easly identified with the nodes, that I have listed in lw)
EDIT2:
Ok, so now I have a number of lists created like this (still sticking to the example):
dis['a']
dis['b']
dis['c']
I have a working command
path[X][Y]
which on input takes the names of the nodes (as in lw list ex. 'a'), and on input returns a list of
nodes on the shortest path from X to Y. What I need to be doing now, is to take the length of it with
len(path[X][Y])
And substract 1 from it (it's counting the "starting" point as well, so it's correcting it). Then I have
to put this number in a corresponding place in a list. I would like to do it in a loop, so it would
automatically append numbers to existing lists, so I would automatically get from
dis['a'] = []
dis['b'] = []
dis['c'] = []
to
dis['a'] = [0, 1, 2]
dis['b'] = [1, 0, 1]
dis['c'] = [2, 1, 0]
Don't worry, about it calculating the path twice (ex. from a to b and then from b to a), It doesn't have to be perfect ;) I tried to create such method, but I have no idea, how to store the results in said lists (or if I'm even correct). Here is my proposition:
def lo():
for i in range(0, len(lw)):
for j in range (0, len(lw)):
dis[i].append(path[lw[i]][lw[j]])

What couldn't you manipulate with a dictionary?
dis_a, dis_b, dis_c... could as well be dis['a'], dis['b'] and dis['c']...
You could do:
dis = {}
for elem in lw:
dis[elem] = []
# ...
Want to loop over all dis-es now?
for elem, val in dis.iteritems():
print elem # your a, b, or c
print val # your [] corresponding to a, b, or c
If you want to add elements, and find min and max (as requested in the comments):
Let's say initially dis['a'] gets [0, 1, 2]
and ...
dis['b'] = [1, 0, 1]
dis['c'] = [2, 1, 0]
and now you want to add a new item to all of the dis-es...
item = 5
for elem in dis:
dis[elem].append(item)
and now you want to find the max ...
for elem in dis:
print max(dis[elem])

If I'm correct you want to dynamically create variables.
Here is short answer, for more information checkout this post How can you dynamically create variables via a while loop? Actually there you can find different approaches like using a dict.
>>> lw = ['a', 'b', 'c']
>>> for l in lw:
exec("{prefix}{var}=[]".format(prefix="dis_", var=l))
>>> dis_a
[]
>>> dis_b.append('some')
>>> dis_b
['some']

Find non-common elements in lists

I'm trying to write a piece of code that can automatically factor an expression. For example,
if I have two lists [1,2,3,4] and [2,3,5], the code should be able to find the common elements in the two lists, [2,3], and combine the rest of the elements together in a new list, being [1,4,5].
From this post: How to find list intersection?
I see that the common elements can be found by
set([1,2,3,4]&set([2,3,5]).
Is there an easy way to retrieve non-common elements from each list, in my example being [1,4] and [5]?
I can go ahead and do a for loop:
lists = [[1,2,3,4],[2,3,5]]
conCommon = []
common = [2,3]
for elem in lists:
for elem in eachList:
if elem not in common:
nonCommon += elem
But this seems redundant and inefficient. Does Python provide any handy function that can do that? Thanks in advance!!

Use the symmetric difference operator for sets (aka the XOR operator):
>>> set([1,2,3]) ^ set([3,4,5])
set([1, 2, 4, 5])

Old question, but looks like python has a built-in function to provide exactly what you're looking for: .difference().
EXAMPLE
list_one = [1,2,3,4]
list_two = [2,3,5]
one_not_two = set(list_one).difference(list_two)
# set([1, 4])
two_not_one = set(list_two).difference(list_one)
# set([5])
This could also be written as:
one_not_two = set(list_one) - set(list_two)
Timing
I ran some timing tests on both and it appears that .difference() has a slight edge, to the tune of 10 - 15% but each method took about an eighth of a second to filter 1M items (random integers between 500 and 100,000), so unless you're very time sensitive, it's probably immaterial.
Other Notes
It appears the OP is looking for a solution that provides two separate lists (or sets) - one where the first contains items not in the second, and vice versa. Most of the previous answers return a single list or set that include all of the items.
There is also the question as to whether items that may be duplicated in the first list should be counted multiple times, or just once.
If the OP wants to maintain duplicates, a list comprehension could be used, for example:
one_not_two = [ x for x in list_one if x not in list_two ]
two_not_one = [ x for x in list_two if x not in list_one ]
...which is roughly the same solution as posed in the original question, only a little cleaner. This method would maintain duplicates from the original list but is considerably (like multiple orders of magnitude) slower for larger data sets.

You can use Intersection concept to deal with this kind of problems.
b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
set(b1).intersection(b2)
Out[22]: {4, 5}
Best thing about using this code is it works pretty fast for large data also. I have b1 with 607139 and b2 with 296029 elements when i use this logic I get my results in 2.9 seconds.

You can use the .__xor__ attribute method.
set([1,2,3,4]).__xor__(set([2,3,5]))
or
a = set([1,2,3,4])
b = set([2,3,5])
a.__xor__(b)

You can use symmetric_difference command
x = {1,2,3}
y = {2,3,4}
z = set.difference(x,y)
Output will be : z = {1,4}

This should get the common and remaining elements
lis1=[1,2,3,4,5,6,2,3,1]
lis2=[4,5,8,7,10,6,9,8]
common = list(dict.fromkeys([l1 for l1 in lis1 if l1 in lis2]))
remaining = list(filter(lambda i: i not in common, lis1+lis2))
common = [4, 5, 6]
remaining = [1, 2, 3, 2, 3, 1, 8, 7, 10, 9, 8]

All the good solutions, starting from basic DSA style to using inbuilt functions:
# Time: O(2n)
def solution1(arr1, arr2):
map = {}
maxLength = max(len(arr1), len(arr2))
for i in range(maxLength):
if(arr1[i]):
if(not map.get(arr1[i])):
map[arr1[i]] = [True, False]
else:
map[arr1[i]][0] = True
if(arr2[i]):
if(not map.get(arr2[i])):
map[arr2[i]] = [False, True]
else:
map[arr2[i]][1] = False
res = [];
for key, value in map.items():
if(value[0] == False or value[1] == False):
res.append(key)
return res
def solution2(arr1, arr2):
return set(arr1) ^ set(arr2)
def solution3(arr1, arr2):
return (set(arr1).difference(arr2), set(arr2).difference(arr1))
def solution4(arr1, arr2):
return set(arr1).__xor__(set(arr2))
print(solution1([1,2,3], [2,4,6]))
print(solution2([1,2,3], [2,4,6]))
print(solution3([1,2,3], [2,4,6]))
print(solution4([1,2,3], [2,4,6]))

Python Values in Lists

I am using Python 3.0 to write a program. In this program I deal a lot with lists which I haven't used very much in Python.
I am trying to write several if statements about these lists, and I would like to know how to look at just a specific value in the list. I also would like to be informed of how one would find the placement of a value in the list and input that in an if statement.
Here is some code to better explain that:
count = list.count(1)
if count > 1
(This is where I would like to have it look at where the 1 is that the count is finding)
Thank You!

Check out the documentation on sequence types and list methods.
To look at a specific element in the list you use its index:
>>> x = [4, 2, 1, 0, 1, 2]
>>> x[3]
0
To find the index of a specific value, use list.index():
>>> x.index(1)
2
Some more information about exactly what you are trying to do would be helpful, but it might be helpful to use a list comprehension to get the indices of all elements you are interested in, for example:
>>> [i for i, v in enumerate(x) if v == 1]
[2, 4]
You could then do something like this:
ones = [i for i, v in enumerate(your_list) if v == 1]
if len(ones) > 1:
# each element in ones is an index in your_list where the value is 1
Also, naming a variable list is a bad idea because it conflicts with the built-in list type.
edit: In your example you use your_list.count(1) > 1, this will only be true if there are two or more occurrences of 1 in the list. If you just want to see if 1 is in the list you should use 1 in your_list instead of using list.count().
You can use list.index() to find elements in the list besides the first one, but you would need to take a slice of the list starting from one element after the previous match, for example:
your_list = [4, 2, 1, 0, 1, 2]
i = -1
while True:
try:
i = your_list[i+1:].index(1) + i + 1
print("Found 1 at index", i)
except ValueError:
break
This should give the following output:
Found 1 at index 2
Found 1 at index 4

First off, I would strongly suggest reading through a beginner’s tutorial on lists and other data structures in Python: I would recommend starting with Chapter 3 of Dive Into Python, which goes through the native data structures in a good amount of detail.
To find the position of an item in a list, you have two main options, both using the index method. First off, checking beforehand:
numbers = [2, 3, 17, 1, 42]
if 1 in numbers:
index = numbers.index(1)
# Do something interesting
Your other option is to catch the ValueError thrown by index:
numbers = [2, 3, 17, 1, 42]
try:
index = numbers.index(1)
except ValueError:
# The number isn't here
pass
else:
# Do something interesting
One word of caution: avoid naming your lists list: quite aside from not being very informative, it’ll shadow Python’s native definition of list as a type, and probably cause you some very painful headaches later on.

You can find out in which index is the element like this:
idx = lst.index(1)
And then access the element like this:
e = lst[idx]
If what you want is the next element:
n = lst[idx+1]
Now, you have to be careful - what happens if the element is not in the list? a way to handle that case would be:
try:
idx = lst.index(1)
n = lst[idx+1]
except ValueError:
# do something if the element is not in the list
pass

list.index(x)
Return the index in the list of the first item whose value is x. It is an error if there is no such item.
--
In the docs you can find some more useful functions on lists: http://docs.python.org/tutorial/datastructures.html#more-on-lists
--
Added suggestion after your comment: Perhaps this is more helpful:
for idx, value in enumerate(your_list):
# `idx` will contain the index of the item and `value` will contain the value at index `idx`

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Optimize search to find next matching value in a list - python

Related

Indexing a list with an unique index

How to find second smallest UNIQUE number in a list?

PYTHON - creating multiple lists in a loop [duplicate]

Find non-common elements in lists

Python Values in Lists

Categories

Resources