I have a set of related integers which I need to search for across a huge amount of data, and I am wondering what is considered the most Pythonic or efficient way of going about this.
For example, if I have a list of integers:
query = [1,5,7,8]
And need to find all objects that contain these values:
record_1 = [0,5,7,8,10,11,12]
record_2 = [1,3,5,8,10,13,14]
record_3 = [1,4,5,6,7,8,11]
record_4 = [1,5,6,7,8,10,14]
record_5 = [1,5,8,9,11,13,16]
I know it wouldn't be too difficult to load each record into a larger list and iteratively test each for whether or not they contain all the integers found in the query, but I am wondering if there's a more Pythonic way of doing it, or if there's a more efficient method than testing every value (which will become expensive when scaling).
Thanks in advance!
If the numbers in queries and records are unique, I would represent them as sets (or frozensets, if the records never change and need to be hashable). Let's assume you have a list of records and a query.
The filter function is applied to the list of records: for each record, the lambda is called, and the record is kept only if the lambda returns True. The lambda checks whether query is a subset of the current record. Because filter returns an iterator in Python 3, the result is converted to a list.
query = set([1,5,7,8])
records = [
set([0,5,7,8,10,11,12]),
set([1,3,5,8,10,13,14]),
set([1,4,5,6,7,8,11]),
set([1,5,6,7,8,10,14]),
set([1,5,8,9,11,13,16]),
]
matches = list(filter(lambda r: query.issubset(r), records))
print(matches)
Output:
[{1, 4, 5, 6, 7, 8, 11}, {1, 5, 6, 7, 8, 10, 14}]
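The same subset test can also be written as a list comprehension using the <= operator, which for sets means "is a subset of". This is only a stylistic variant of the filter call above, sketched with the same query and records:

```python
query = {1, 5, 7, 8}
records = [
    {0, 5, 7, 8, 10, 11, 12},
    {1, 3, 5, 8, 10, 13, 14},
    {1, 4, 5, 6, 7, 8, 11},
    {1, 5, 6, 7, 8, 10, 14},
    {1, 5, 8, 9, 11, 13, 16},
]

# query <= record is True when every element of query is also in record
matches = [record for record in records if query <= record]
print(matches)
```

Which form reads better is a matter of taste; both still test every record once.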
Using map with issubset:
for record, is_match in zip(records, map(query.issubset, records)):
    if is_match:
        print(record)
{1, 4, 5, 6, 7, 8, 11}
{1, 5, 6, 7, 8, 10, 14}
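Both answers above still test every record for every query. If the same records are searched many times, one possible alternative (not taken from the answers, just a sketch) is an inverted index that maps each integer to the positions of the records containing it, so a query only intersects a few small sets. The names build_index and find_matches are made up for this illustration:

```python
from collections import defaultdict

def build_index(records):
    """Map each integer to the set of record positions that contain it."""
    index = defaultdict(set)
    for pos, record in enumerate(records):
        for value in record:
            index[value].add(pos)
    return index

def find_matches(index, query):
    """Return the sorted positions of records containing every query value."""
    # Sort the position sets by size so the intersection starts small
    postings = sorted((index.get(v, set()) for v in query), key=len)
    if not postings:
        return []  # assumption: an empty query matches nothing
    result = set(postings[0])
    for positions in postings[1:]:
        result &= positions
        if not result:
            break  # no record can match any more
    return sorted(result)

records = [
    [0, 5, 7, 8, 10, 11, 12],
    [1, 3, 5, 8, 10, 13, 14],
    [1, 4, 5, 6, 7, 8, 11],
    [1, 5, 6, 7, 8, 10, 14],
    [1, 5, 8, 9, 11, 13, 16],
]
index = build_index(records)
print(find_matches(index, [1, 5, 7, 8]))
```

Building the index costs one pass over all the data, so this probably only pays off when many queries are run against the same records.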
I'm using Python.
A clear algorithm will be enough for me.
I've tried using a dictionary, counting the occurrences of each character and adding it if it is not already in the list.
But I'm not sure whether this has the lowest possible complexity.
Use the built-in Counter(list).most_common(n) method, as below.
from collections import Counter
input_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 5, 7, 3, 1]
most_common_values = [value[0] for value in Counter(input_list).most_common(2)]
print(most_common_values)
This outputs: [1, 2].
The advantages of this approach are that it is fast, simple, and returns the items ordered from most to least common. In addition, if there is a tie in count, the tied items are returned in the order they were first encountered (Python 3.7 and later), as the example above shows.
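To make the tie behaviour visible, this short sketch prints the counts alongside the values for the same input list. It assumes Python 3.7 or later, where elements with equal counts keep the order in which they were first seen:

```python
from collections import Counter

counts = Counter([1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 5, 7, 3, 1])

# 1 appears three times; 2, 3, 5 and 7 each appear twice
top_three = counts.most_common(3)
print(top_three)
```

Here 2 and 3 come before 5 and 7 only because they were seen earlier in the input.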
Use the built-in Counter class from the collections module.
I have a list:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
There are multiple if statements that check whether a number is in the list a.
while True:
    if 3 in a:
        some_work1 #(different)
    if 4 in a:
        some_work2 #(different)
    if 8 in a:
        some_work3 #(different)
    if 11 in a:
        some_work4 #(different)
    if 12 in a:
        some_work5 #(different)
Are there any faster (lower CPU usage) methods for these multiple if checks? (The list a is always the same; it does not change between iterations.) There are no duplicated items in the list a. The pieces of work do not overlap.
Python 3.8.7
Use a set, which has constant-time insertion and lookup on average. In comparison, the in operator performs a linear search through your list a on every check.
I'm not exactly sure what your use-case is without seeing your larger code. I'm assuming your use-case treats a as a list of flags. As such, a set fits the bill.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
a = set(a) # pass an iterable
# or simply
a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
# or built at runtime
a = set()
a.add(1)
a.add(2)
if 3 in a:
    some_work1
If you want a more efficient switch statement, you have already found it. Python uses if..elif for this: the conditions are evaluated in sequence and evaluation stops at the first one that is true. If more than one outcome can match, use a dict (e.g. {3: functor3, 4: functor4, ...}). A functor is a callable, i.e. an object with a __call__() method defined. A plain function or a lambda also satisfies this.
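As a rough sketch of the dict idea, the handlers below are hypothetical stand-ins that return a string instead of doing real work, and a is a small set of flags chosen only for the example:

```python
def some_work1():
    return "work1"

def some_work2():
    return "work2"

# Hypothetical handlers keyed by the number that triggers them
dispatch = {3: some_work1, 4: some_work2}

a = {1, 2, 3, 5}  # flags that are currently set (example data)

# Call the handler of every key that is present in a
results = [handler() for key, handler in dispatch.items() if key in a]
print(results)
```

Only the handler for 3 runs here, because 4 is not in a.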
A set is an unordered collection that does not allow duplicates. It's like a dictionary but with the values removed, leaving just keys. As you know, dictionary keys are unique, and likewise members of a set are unique. Here we just want a set for performance.
Option 1
You could use a dictionary whose keys are the numbers in a and whose values are the corresponding functions. You loop over it once and store the needed functions in a list (to_call). In the while loop you simply iterate over this list and call its members.
def some_work1():
    print("work1")

def some_work2():
    print("work2")

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
func = {3: some_work1, 4: some_work2}
to_call = []
for k in func:
    if k in a:
        to_call.append(func[k])
while True:
    for f in to_call:
        f()
Option 2
Write some kind of code generator that reads a and generates a .py file containing the function calls.
I have recently created a GUI which contains tables. The user can insert values into the cells, as shown in the figure below.
I want to use the values given by the user in some calculations. Rows can be added and removed as the user chooses. In other words, the data I get from the user could come from just one row or from several rows.
I manage to obtain all values from the tables automatically and assign them to a Python list. Each row gives 5 elements in the list.
I have achieved that. The data in the Python list has to be processed and organised. This is exactly what I need help with. I have been thinking about it for a few days and cannot figure out how to continue...
Data from the table as a Python list, as an example:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, '', 11, 12, 13, 14]
What I want to achieve:
I want to split the list into len(data)/5 sublists of 5 elements each (3 sublists here).
I have also achieved this.
def split_seq(seq, num_pieces):
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

for row in split_seq(data, len(data) // 5):
    print(row)
Output would be:
[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]
['', 11, 12, 13, 14]
The difficult part starts here.
I want to take each split list, run it through an if condition, store its values as variables, and pass those values to an external function.
Something like this (pseudocode):
for i in range(len(splitted_list1)):
    if splitted_list1[0] == '':
        do nothing
    else:
        x_+str(i)= splitted_list1[i]
externalfunc(x_1,x_2,x_3,x_4,x_5)
for i in range(len(splitted_list2)):
    if splitted_list2[0] == '':
        do nothing
    else:
        x_+str(i)= splitted_list2[i]
externalfunc(x_1,x_2,x_3,x_4,x_5)
This continues depending on the number of splitted_lists
..............
I appreciate any help, and you are welcome to suggest another way to get around this.
Use one single list and pass that to externalfunc.
The line x_+str(i)= ... is not valid Python: you cannot build a variable name such as x_0 by concatenating strings on the left-hand side of an assignment. It would also mean that the function has to take a possibly unknown number of variables. Instead, you can group the values into one list and index them by position, which is what you already have: the groups are splitted_list0, splitted_list1, etc.
You also do not need a generator that yields the different lists one at a time. Instead, you can split the data and put the pieces into one larger list. This is a two-dimensional list. It may seem scary, but it's just some lists inside another list.
To pass the numbers to externalfunc, pass each split list as a single argument. This results in calls like externalfunc(splitted_list0) and so on.
The final code ends up something like the following:
# Split the data into separate lists:
split_lists = [data[i*(len(data)//3):(i+1)*(len(data)//3)] for i in range(3)]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, '', 11, 12, 13, 14] becomes
# [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], ['', 11, 12, 13, 14]]
# Pass each list to the external function:
for i in range(len(split_lists)):
    externalfunc(split_lists[i])
    # split_lists[i] will be [1, 2, 3, 4, 5] or [6, 7, 8, 9, 10], etc.
Note that the 3 in the first line of code is the number of lists to split the data into. It can be changed to any number that divides the length of data evenly; with integer division, any leftover elements at the end would be dropped. Remember to change every 3 if it's hardcoded, or just use a variable. Finally, the function externalfunc receives the list of 5 numbers as its first and only argument.
def externalfunc(nums):
    for n in nums:
        # n = each number in the list
        # do something with n, write it somewhere, store it, print it.
        print(n)
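If externalfunc really has to take five separate arguments, as in the question, one possible variation is to unpack each row with the * operator and skip rows whose first cell is empty. This is only a sketch: the body of externalfunc (a simple sum) and the name ROW_LEN are placeholders invented for the example.

```python
def externalfunc(x_1, x_2, x_3, x_4, x_5):
    # Placeholder for the real function; here it just sums the row
    return x_1 + x_2 + x_3 + x_4 + x_5

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, '', 11, 12, 13, 14]
ROW_LEN = 5  # each table row contributes 5 values

# Slice the flat list into consecutive rows of ROW_LEN items
rows = [data[i:i + ROW_LEN] for i in range(0, len(data), ROW_LEN)]

results = []
for row in rows:
    if row[0] == '':
        continue  # skip rows whose first cell is empty
    results.append(externalfunc(*row))  # *row unpacks the 5 values

print(results)
```

Slicing by row length rather than by number of pieces means the code does not need to know in advance how many rows the user entered.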
I would like to get the unions of two sets of frozensets. I'm only interested in the union of frozensets that don't intersect. Another way to look at it is that I'm only interested in unions whose length equals the combined length of both frozensets. Ideally I would like to skip any frozensets that do intersect with each other, for a massive speedup. I expect many frozensets to have at least one element in common. Here is the code I have so far in Python. I would like it to be as fast as possible, as I'm working with a large dataset. Each frozenset has no more than 20 elements, but there will be somewhere around 1,000 frozensets in a set. All numbers will be between 0 and 100. I'm open to converting to other types if that would allow my program to run faster, but I don't want any repeated elements, and order is not important.
sets1 = set([frozenset([1,2,3]),frozenset([4,5,6]),frozenset([8,10,11])])
sets2 = set([frozenset([8,9,10]),frozenset([6,7,3])])
newSets = set()
for fset in sets1:
    for fset2 in sets2:
        newSet = fset.union(fset2)
        if len(newSet) == len(fset)+len(fset2):
            newSets.add(frozenset(newSet))
The correct output is:
set([frozenset([1,2,3,8,9,10]),frozenset([4,5,6,8,9,10]),frozenset([8,10,11,6,7,3])])
sets1 = set([frozenset([1,2,3]),frozenset([4,5,6]),frozenset([8,10,11])])
sets2 = set([frozenset([8,9,10]),frozenset([6,7,3])])
union_ = set()
for s1 in sets1:
    for s2 in sets2:
        if s1.isdisjoint(s2):
            union_.add(s1 | s2)
print(union_)
{frozenset({3, 6, 7, 8, 10, 11}), frozenset({1, 2, 3, 8, 9, 10}), frozenset({4, 5, 6, 8, 9, 10})}
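Since the question says all numbers lie between 0 and 100 and that other types are acceptable, another option worth measuring (a different technique from the isdisjoint loop above, not a confirmed speedup) is to encode each frozenset as an integer bitmask. Two masks are then disjoint exactly when their bitwise AND is zero. The helper names to_mask and from_mask are invented for this sketch:

```python
def to_mask(fset):
    """Encode a set of small non-negative ints as a single int bitmask."""
    mask = 0
    for n in fset:
        mask |= 1 << n
    return mask

def from_mask(mask):
    """Decode a bitmask back into a frozenset of ints."""
    return frozenset(i for i in range(mask.bit_length()) if (mask >> i) & 1)

sets1 = {frozenset([1, 2, 3]), frozenset([4, 5, 6]), frozenset([8, 10, 11])}
sets2 = {frozenset([8, 9, 10]), frozenset([6, 7, 3])}

masks1 = [to_mask(s) for s in sets1]
masks2 = [to_mask(s) for s in sets2]

# m1 & m2 == 0 means no shared element; m1 | m2 is then the union
union_masks = {m1 | m2 for m1 in masks1 for m2 in masks2 if m1 & m2 == 0}
union_ = {from_mask(m) for m in union_masks}
print(union_)
```

This still compares every pair, so any gain would come only from cheaper per-pair work; timing both versions on real data is the only way to know whether it helps.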
In python, set() is an unordered collection with no duplicate elements. However, I am not able to understand how it generates the output.
For example, consider the following:
>>> x = [1, 1, 2, 2, 2, 2, 2, 3, 3]
>>> set(x)
set([1, 2, 3])
>>> y = [1, 1, 6, 6, 6, 6, 6, 8, 8]
>>> set(y)
set([8, 1, 6])
>>> z = [1, 1, 6, 6, 6, 6, 6, 7, 7]
>>> set(z)
set([1, 6, 7])
Shouldn't the output of set(y) be: set([1, 6, 8])? I tried the above two in Python 2.6.
Sets are unordered, as you say. Even though one way to implement sets is using a tree, they can also be implemented using a hash table (meaning getting the keys in sorted order may not be that trivial).
If you'd like to sort them, you can simply perform:
sorted(set(y))
which will produce a sorted list containing the set's elements. (Not a set. Again, sets are unordered.)
Otherwise, the only thing guaranteed by set is that it makes the elements unique (nothing will be there more than once).
Hope this helps!
As an unordered collection type, set([8, 1, 6]) is equivalent to set([1, 6, 8]).
While it might be nicer to display the set contents in sorted order, that would make the repr() call more expensive.
Internally, the set type is implemented using a hash table: a hash function is used to separate items into a number of buckets to reduce the number of equality operations needed to check if an item is part of the set.
To produce the repr() output it just outputs the items from each bucket in turn, which is unlikely to be the sorted order.
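The bucket idea can be illustrated with a deliberately simplified model. It assumes CPython, where a small non-negative integer hashes to itself, and a table of 8 slots; real sets also handle collisions and resize, which this sketch ignores:

```python
values = [1, 6, 8]
TABLE_SIZE = 8  # assumed number of slots in a small table

# Slot index for each value in this simplified model
slots = {v: hash(v) % TABLE_SIZE for v in values}
print(slots)

# Reading the table slot by slot gives the display order
order = sorted(values, key=lambda v: slots[v])
print(order)
```

In this model 8 lands in slot 0, ahead of 1 and 6, which matches the set([8, 1, 6]) output in the question. It is an implementation detail and not something to rely on.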
As +Volatility and yourself pointed out, sets are unordered. If you need the elements to be in order, just call sorted on the set:
>>> y = [1, 1, 6, 6, 6, 6, 6, 8, 8]
>>> sorted(set(y))
[1, 6, 8]
Python's sets will iterate and print out in some order, but exactly what that order will be is arbitrary, and not guaranteed to remain the same after additions and removals. (The same was true of dictionaries before Python 3.7, which made dictionary insertion order part of the language.)
Here's an example of a set changing order after a lot of values are added and then removed:
>>> s = set([1,6,8])
>>> print(s)
{8, 1, 6}
>>> s.update(range(10,100000))
>>> for v in range(10, 100000):
...     s.remove(v)
...
>>> print(s)
{1, 6, 8}
This is implementation dependent though, and so you should not rely upon it.
After reading the other answers, I still had trouble understanding why the set comes out un-ordered.
Mentioned this to my partner and he came up with this metaphor: take marbles. You put them in a tube a tad wider than marble width : you have a list. A set, however, is a bag. Even though you feed the marbles one-by-one into the bag; when you pour them from a bag back into the tube, they will not be in the same order (because they got all mixed up in a bag).