Python: subset of list as equally distributed as possible?

I have a range of possible values, for example:
possible_values = range(100)
I have a list with unsystematic (but unique) numbers within that range, for example:
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
I want to create a new list of length < len(somelist) including a subset of these values but as equally distributed as possible over the range of possible values. For example:
length_newlist = 2
newlist = some_function(somelist, length_newlist, possible_values)
print(newlist)
Which would then ideally output something like
[33, 77]
So I want neither a random sample nor a sample chosen from equally spaced integers. I'd like a sample based on a distribution (here a uniform distribution) over the interval of possible values.
Is there a function or an easy way to achieve this?

What about taking the values of your list that are closest to certain pivots over the range? I.e.:
def some_function(somelist, length_list, possible_values):
    a = min(possible_values)
    b = max(possible_values)
    chunk_size = (b - a) / (length_list + 1)
    new_list = []
    for i in range(1, length_list + 1):
        pivot = a + i * chunk_size
        # pick the element of somelist closest to this pivot
        new_list.append(min(somelist, key=lambda x: abs(x - pivot)))
    return new_list
possible_values = range(100)
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
length_newlist = 2
newlist = some_function(somelist, length_newlist, possible_values)
print(newlist)
In any case, I'd also recommend taking a look at NumPy's random sampling functions, which could help you as well.
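For instance, a possible starting point with NumPy (note this draws a uniformly random sample, not one spread over the range):
import numpy as np
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
print(np.random.choice(somelist, size=2, replace=False))  # e.g. [33 5]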

Suppose your range is 0..N-1, and you want a list of K <= N-1 values. Then define an "ideal" list of K values that has your desired distribution over the full range (I am frankly not sure exactly what distribution you have in mind, but hopefully you do). Finally, take the closest matches to those ideal values from your randomly chosen longer-than-K sublist to get your properly distributed K-length sublist.
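A minimal sketch of that idea, assuming a uniform target distribution over 0..N-1 (the function name and spacing rule are my own choices):
def closest_matches(sublist, K, N):
    # K evenly spaced "ideal" targets strictly inside the range 0..N-1
    targets = [(i + 1) * (N - 1) / (K + 1) for i in range(K)]
    # for each target, pick the nearest available value
    return [min(sublist, key=lambda x: abs(x - t)) for t in targets]

print(closest_matches([0, 5, 10, 15, 20, 33, 77, 99], 2, 100))  # [33, 77]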

I think you should check the random.sample(population, k) function. It samples the population into a k-length list.
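For example (the output varies, since the sample is uniformly random rather than spread over the range):
import random
somelist = [0, 5, 10, 15, 20, 33, 77, 99]
print(random.sample(somelist, 2))  # e.g. [5, 99]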

Related

Pythonic way to group values from a list based on values from another list

I have 2 lists:
List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]
I want to generate a list of lists such that I group values in list B by determining if they are in between values in list A. So in this example, list B would be grouped into:
[[2,19,23], [26,30,32,34,36]]
Is there any clean way in python to achieve this without multiple nested for loops?
I tried a messy double-nested loop structure and was not pleased with how clunky it was (due to lack of readability).
Group the values of List_B according to the index they would have if inserted into List_A. The standard library provides functionality in the bisect module to figure out (using a standard bisection algorithm) where each value would go; it provides functionality in the itertools module to group adjacent values in an input sequence according to some predicate ("key" function).
This looks like:
from bisect import bisect
from itertools import groupby
List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]
groups = groupby(List_B, key=lambda x: bisect(List_A, x))
print([list(group) for key, group in groups])
which gives [[2, 19, 23], [26, 30, 32, 34, 36]] as requested.
bisect.bisect is an alias for bisect.bisect_right; that is, a value in List_B that is equal to a value from List_A will be put at the beginning of a later list. To have it as the end of the previous list instead, use bisect.bisect_left.
bisect.bisect also relies on List_A being sorted, naturally.
itertools.groupby will group adjacent values; it will make separate groups for values that belong in the same "bin" but are separated by values that belong in a different "bin". If this is an issue, sort the input first.
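For example, a hedged variant of the groupby call above that sorts List_B first (sorting makes same-bin values adjacent):
groups = groupby(sorted(List_B), key=lambda x: bisect(List_A, x))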
This will be O(N * lg M) where N is the length of List_B and M is the length of List_A. That is: finding a bin takes logarithmic time in the number of bins, and this work is repeated for each value to be binned.
This will not generate empty lists if there is a bin that should be empty; the actual indices into List_A are ignored by the list comprehension in this example.
This is the simplest way I can think of to code it:
result = []
for start, end in zip(List_A, List_A[1:]):
    result.append([i for i in List_B if start <= i < end])
It's O(N*M), so not very efficient for large lists.
You could make it more efficient by sorting List_B (I assume List_A is already sorted) and stepping through both of them together, but it will be more complicated.
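As a rough sketch of that single-pass idea, assuming both lists are sorted (the function name and binning convention are my own):
def group_between(bounds, values):
    result = [[] for _ in range(len(bounds) - 1)]
    i = 0
    for v in values:
        # advance to the bin whose upper bound exceeds v
        while i < len(result) - 1 and v >= bounds[i + 1]:
            i += 1
        if bounds[i] <= v < bounds[i + 1]:
            result[i].append(v)
    return result

print(group_between([1, 25, 40], [2, 19, 23, 26, 30, 32, 34, 36]))
# [[2, 19, 23], [26, 30, 32, 34, 36]]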

creating a function for for-loop across specific set of ranges in python

How can you loop across multiple different ranges? For example, say I want to run a for loop over a specific set of ranges:
1 to 2, 4 to 6 and 10 to 13.
I don't know the exact final number of ranges, so a function might be useful.
With the straightforward approach of
for i in range(1,2) + range(4,6) + range(10,13)
I should get
[1, 4, 5, 10, 11, 12]
My question is that this is not very efficient if I don't know the total number of ranges I'm covering, and I can't even create this as a function without a large number of parameters.
So I want to have something like
for i in range(a,b) + range(c,d) + range(e,f), ...
but as a simple function.
Can someone help with an efficient way of doing this?
Simple, though most likely the least efficient, solution: use a nested for loop.
def multi_range_loop(ranges):
    all_values = []
    for rg in ranges:
        for i in range(rg[0], rg[1]):
            all_values.append(i)
    print(all_values)

my_ranges = [[1,3],[4,7],[10,14]]
multi_range_loop(my_ranges)
You can chain the ranges with itertools:
from itertools import chain
total_range = chain(range(1,2), range(4,6), range(10,13))
print(list(total_range))
#[1, 4, 5, 10, 11, 12]
Here is another solution with a function that returns a generator:
def total_range(lst):
    return (i for tup in lst for i in range(tup[0], tup[1]))

list_of_ranges = [(1,2), (4,6), (10,13)]
print(list(total_range(list_of_ranges)))
To get a single iterable object from the list of iterables formed by the different ranges, you can use the chain function as in the following answer: https://stackoverflow.com/a/67547449/1287983.
And in order to have dynamic ranges in a function, you can do the following:
from itertools import chain

def get_ranges(list_ranges):
    lower_bounds, upper_bounds = zip(*list_ranges)
    return list(chain(*map(range, lower_bounds, upper_bounds)))

get_ranges([(1,2), (4,6), (10,13)])
# [1, 4, 5, 10, 11, 12]
The function above returns the list of values. If you want to iterate efficiently over the resulting values, just return the iterator instead of the list, as in the code below.
def get_ranges(list_ranges):
    lower_bounds, upper_bounds = zip(*list_ranges)
    return chain(*map(range, lower_bounds, upper_bounds))

for val in get_ranges([(1,2), (4,6), (10,13), (2,5)]):
    print(val)
1
4
5
10
11
12
2
3
4

Element-wise product of two 2-D lists

I can't use NumPy or any other library functions, as this is an assignment I have to do; I have to define my own way.
I am writing a function that takes two lists (2 dimensional) as arguments. The function should calculate the element-wise product of both lists and store them in a third list and return this resultant list from the function.
An example of the input lists are:
list1:
[[2,3,5,6,7],[5,2,9,3,7]]
list2:
[[5,2,9,3,7],[1,3,5,2,2]]
The function prints the following list:
[[10, 6, 45, 18, 49], [5, 6, 45, 6, 14]]
That is 2*5=10, 3*2=6, 5*9=45 ... and so on.
This is my code below, but it only works for a 2-D list with exactly two lists (elements) inside it, like the example above, and works perfectly fine for that. What I want is to edit my code so that, no matter how many lists (elements) there are in the 2-D list, it prints out their element-wise product in a new 2-D list. E.g. it should also work for
[[5,2,9,3,7],[1,3,5,2,2],[1,3,5,2,2]]
or
[[5,2,9,3,7],[1,3,5,2,2],[1,3,5,2,2],[5,2,9,3,7]]
or any number of lists there are within the whole list.
def ElementwiseProduct(l, l2):
    i = 0
    newlist = []   # create empty list to put products of elements in later
    newlist2 = []
    newlist3 = []  # empty list to put both new lists, which will have products in them
    while i == 0:
        a = 0
        while a < len(l[i]):
            prod = l[i][a] * l2[i][a]  # corresponding product of list elements
            newlist.append(prod)       # adding the product to the new list
            a += 1
        i += 1
    while i == 1:
        a = 0
        while a < len(l[i]):
            prod = l[i][a] * l2[i][a]  # corresponding product of list elements
            newlist2.append(prod)      # adding the product to the new list
            a += 1
        i += 1
    newlist3.append(newlist)
    newlist3.append(newlist2)
    print newlist3
#2 dimensional list example
list1=[[2,3,5,6,7],[5,2,9,3,7]]
list2=[[5,2,9,3,7],[1,3,5,2,2]]
ElementwiseProduct(list1,list2)
You can zip the two lists in a list comprehension, then further zip the resulting sublists and then finally multiply the items:
list2 = [[5,2,9,3,7],[1,3,5,2,2]]
list1 = [[2,3,5,6,7],[5,2,9,3,7]]
result = [[a*b for a, b in zip(i, j)] for i, j in zip(list1, list2)]
print(result)
# [[10, 6, 45, 18, 49], [5, 6, 45, 6, 14]]
In case the lists/sublists do not have the same number of elements, itertools.izip_longest (itertools.zip_longest in Python 3) can be used to generate fill values, such as an empty sublist for the shorter outer list, or 0 for a shorter sublist:
from itertools import izip_longest

list1 = [[2,3,5,6]]
list2 = [[5,2,9,3,7],[1,3,5,2,2]]
result = [[a*b for a, b in izip_longest(i, j, fillvalue=0)]
          for i, j in izip_longest(list1, list2, fillvalue=[])]
print(result)
# [[10, 6, 45, 18, 0], [0, 0, 0, 0, 0]]
You may change the inner fillvalue from 0 to 1 to return the elements of the longer sublists as-is, instead of homogeneous 0s.
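For instance, with the same inputs as above:
result = [[a*b for a, b in izip_longest(i, j, fillvalue=1)]
          for i, j in izip_longest(list1, list2, fillvalue=[])]
print(result)
# [[10, 6, 45, 18, 7], [1, 3, 5, 2, 2]]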
Reference:
List comprehensions
Here is a function that can handle any type of iterable, nested to any level (any number of dimensions, not just 2):
def elementwiseProd(iterA, iterB):
    def multiply(a, b):
        try:
            iter(a)
        except TypeError:
            # a is a plain number
            return a * b
        # a is itself iterable: recurse into it
        return elementwiseProd(a, b)
    return [multiply(*pair) for pair in zip(iterA, iterB)]
This function works recursively. For each element in a list, it checks if the element is iterable. If it is, the output element is a list containing the elementwise multiplication of the iterables. If not, the product of the numbers is returned.
This solution will work on mixed nested types. A couple of assumptions that are made here are that all the levels of nesting are the same size, and that an element that is a number in one iterable (vs a nested iterable), is always a number in the other.
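For example, on inputs nested to different depths (under those assumptions):
list1 = [[1, [2, 3]], [4]]
list2 = [[5, [6, 7]], [8]]
print(elementwiseProd(list1, list2))
# [[5, [12, 21]], [32]]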
In fact, this snippet can be extended to apply any n-ary function to any n iterables:
def elementwiseApply(op, *iters):
    def apply(op, *items):
        try:
            iter(items[0])
        except TypeError:
            # plain numbers: apply the operation directly
            return op(*items)
        # nested iterables: recurse
        return elementwiseApply(op, *items)
    return [apply(op, *items) for items in zip(*iters)]
To do multiplication, you would use operator.mul:
from operator import mul
list1=[[2,3,5,6,7], [5,2,9,3,7]]
list2=[[5,2,9,3,7], [1,3,5,2,2]]
elementwiseApply(mul, list1, list2)
produces
[[10, 6, 45, 18, 49], [5, 6, 45, 6, 14]]
In Python, it's generally better to loop directly over the items in a list, rather than looping indirectly using indices. It makes the code easier to read as well as more efficient since it avoids the tedious index arithmetic.
Here's how to solve your problem using traditional for loops. We use the built-in zip function to iterate over two (or more) lists simultaneously.
def elementwise_product(list1, list2):
    result = []
    for seq1, seq2 in zip(list1, list2):
        prods = []
        for u, v in zip(seq1, seq2):
            prods.append(u * v)
        result.append(prods)
    return result

list1 = [[2,3,5,6,7], [5,2,9,3,7]]
list2 = [[5,2,9,3,7], [1,3,5,2,2]]
print(elementwise_product(list1, list2))
output
[[10, 6, 45, 18, 49], [5, 6, 45, 6, 14]]
We can use list comprehensions to make that code a lot more compact. It may seem harder to read at first, but you'll get used to list comprehensions with practice.
def elementwise_product(list1, list2):
    return [[u*v for u, v in zip(seq1, seq2)]
            for seq1, seq2 in zip(list1, list2)]
You could use NumPy arrays. They are your best option, as NumPy is implemented in C and is therefore much faster computationally.
First, install NumPy. Open your terminal (CMD if you're on Windows) and type
pip install numpy
or, on Linux, sudo pip install numpy
Then, go on to write your code
import numpy as np
list1=np.array([[2,3,5,6,7],[5,2,9,3,7]]) #2 dimensional list example
list2=np.array([[5,2,9,3,7],[1,3,5,2,2]])
prod = np.multiply(list1,list2)
# or simply, as suggested by Mad Physicist,
prod = list1*list2

Prune a list of combinations based on sets

While this question is formulated using the Python programming language, I believe it is more of a programming logic problem.
I have a list of all possible combinations, i.e.: n choose k
I can prepare such a list using
import itertools
bits_list = list(itertools.combinations(range(n), k))
If n is 100 and k is 5, then the length of bits_list will be 75,287,520.
Now, I want to prune this list, such that numbers appear in groups, or they don't. Let's use the following sets as an example:
Set 1: [0, 1, 2]
Set 2: [57, 58]
Set 3: [10, 15, 20, 25]
Set 4: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Here, the elements of each set need to appear together in any member of bits_list, or not at all.
So far, I only have been able to think of a brute-force if-else method of solving this problem, but the number of if-else conditions will be very large this way.
Here's what I have:
bits_list = [x for x in list(itertools.combinations(range(n), k))
             if all(y in x for y in [0, 1, 2]) or
                all(y not in x for y in [0, 1, 2])]
Now, this only covers Set 1. I would like to do this for many sets. If the length of a set is greater than the value of k, we can ignore that set (for example, k = 5 and Set 4).
Note that the ultimate aim is to have k iterate over a range, say [5:25], and work on the combined list. The size of the list grows exponentially here and is, computationally speaking, very expensive!
With k as 10, the Python interpreter interrupts the process before completion on any average laptop with 16 GB RAM. I need to find a solution that fits in the memory of a relatively modern server (not a cluster or a server farm).
Any help is greatly appreciated!
P.S.: Intuitively, think of this problem as generating all the possible cases for people boarding a public bus or train system. Usually, you board an entire group or you don't board anyone.
UPDATE:
For the given sets above, if k = 5, then a valid member of bits_list would be [0, 1, 2, 57, 58], i.e.: a combination of Set1 and Set2. If k = 10, then we could have built Set1 + Set2 + Set3 + NoSetElement as a possible member. #DonkeyKong's solution made me realize I haven't mentioned this explicitly in my question.
I have a lot of sets; I intend to use enough sets to prune the full list of combinations such that the bits_list eventually fits into memory.
#9000's suggestion is perfectly valid here, that during each iteration, I can save the combinations as actual bits.
This still gets crushed by a memory error (which I don't see how you're getting away from if you insist on a list) at a certain point (around n=90, k=5), but it is much faster than your current implementation. For n=80 and k=5, my rudimentary benchmarking had my solution at 2.6 seconds and yours around 52 seconds.
The idea is to construct the disjoint and subset parts of your filter separately. The disjoint part is trivial, and the subset part is calculated by taking the itertools.product of all disjoint combinations of length k - set_len and the individual elements of your set.
from itertools import combinations, product, chain

n = 80
k = 5

set1 = {0, 1, 2}
nots = set(range(n)) - set1

disj_part = list(combinations(nots, k))
subs_part = [tuple(chain(x, els)) for x, *els in
             product(combinations(nots, k - len(set1)), *([e] for e in set1))]
full_l = disj_part + subs_part
If you actually represented your bits as bits, that is, as 0/1 values in the binary representation of an integer n bits long with exactly k bits set, the amount of RAM you'd need to store the data would be drastically smaller.
Also, you'd be able to use bit operations to check whether all bits in a mask are set (value & mask == mask) or all unset (value & mask == 0).
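A minimal sketch of that representation (the helper below is my own; Python integers serve as arbitrary-width bit strings):
def mask_of(indices):
    m = 0
    for i in indices:
        m |= 1 << i
    return m

set_masks = [mask_of(s) for s in [[0, 1, 2], [57, 58]]]

def keep(value):
    # keep value only if, for every set, its bits are all set or all unset
    return all(value & m == m or value & m == 0 for m in set_masks)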
The brute force will probably take less time than you'd spend thinking about a more clever algorithm, so it's totally OK for a one-off filtering.
If you must execute this often and quickly, and your n is in small hundreds or less, I'd rather use cython to describe the brute-force algorithm efficiently than look at algorithmic improvements. Modern CPUs can efficiently operate on 64-bit numbers; you won't benefit much from not comparing a part of the number.
OTOH if your n is really large, and the number of sets to compare to is also large, you could partition your bits for efficient comparison.
Let's suppose you can efficiently compare a chunk of 64 bits, and your bit lists contain e.g. 100 chunks each. Then you can do the same thing you'd do with strings: compare chunk by chunk, and if one of the chunks fails to match, do not compare the rest.
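A minimal sketch of that chunk-by-chunk check, assuming both the value and the mask are stored as equal-length lists of 64-bit integers:
def mask_all_set(value_chunks, mask_chunks):
    # all() short-circuits on the first chunk that fails to match
    return all(v & m == m for v, m in zip(value_chunks, mask_chunks))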
A faster implementation would be to replace the if and all() statements in:
bits_list = [x for x in list(itertools.combinations(range(n), k))
             if all(y in x for y in [0, 1, 2]) or
                all(y not in x for y in [0, 1, 2])]
with Python's set operations isdisjoint() and issuperset() (the combination must be a superset of the given set, not a subset of it):
bits_generator = (set(x) for x in itertools.combinations(range(n), k))
first_set = set([0, 1, 2])
filter_bits = (x for x in bits_generator
               if x.issuperset(first_set) or
                  x.isdisjoint(first_set))
answer_for_first_set = list(filter_bits)
I could keep going with generators, and with generators you won't run out of memory, but you will be waiting and hastening the heat death of the universe. That's not because of Python's runtime or other implementation details, but because some problems are just not feasible even in computer time if you pick large N and K values.
Based on the ideas from #Mitch's answer, I created a solution with slightly different thinking than originally presented in the question. Instead of creating the list (bits_list) of all combinations and then pruning the combinations that do not match the sets, I built bits_list from the sets.
import itertools
all_sets = [[0, 1, 2], [3, 4, 5], [6, 7], [8], [9, 19, 29], [10, 20, 30],
            [11, 21, 31], [12, 22, 32], ...[57, 58], ... [95], [96], [97]]

bits_list = [list(itertools.chain.from_iterable(x)) for y in [1, 2, 3, 4, 5]
             for x in itertools.combinations(all_sets, y)]
Here, instead of finding n choose k and then, for each k, looping to find the combinations that match the sets, I started from the sets, and even included the individual members as sets by themselves, thereby removing the need for the two components discussed in #Mitch's answer (the disjoint part and the subset part).
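If, at a given iteration, only members of one particular total length k are wanted, a simple follow-up filter over the list built above would be:
bits_list_k = [b for b in bits_list if len(b) == k]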

"[float(n)-50 for n in range(100)]" - what does this do?

I stumbled upon this one-liner:
[float(n)-50 for n in range(100)]
Could somebody tell me what it does? It's supposed to return a vector of float values.
Best,
Marius
That's a list comprehension that reads "create a list of 100 elements such that the element at index n is equal to float(n) - 50".
It's a list comprehension:
List comprehensions provide a concise way to create lists. Common
applications are to make new lists where each element is the result of
some operations applied to each member of another sequence or
iterable, or to create a subsequence of those elements that satisfy a
certain condition.
For example, assume we want to create a list of squares, like:
>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
We can obtain the same result with:
squares = [x**2 for x in range(10)]
This is also equivalent to squares = map(lambda x: x**2, range(10)),
but it’s more concise and readable.
It means the same as:
[float(x) for x in range(-50, 50)]
Or (at least in Python 2):
map(float, range(-50, 50))
which are self-explanatory if you know how list comprehensions or the map function work: They transform the integer range -50...50 into a list of floats (the upper 50 is exclusive). The result is the list:
[-50.0, -49.0 ... 49.0]
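A quick way to verify that equivalence:
assert [float(n) - 50 for n in range(100)] == [float(x) for x in range(-50, 50)]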
