Creating data in loop subject to moving condition - python

I am trying to create a list of data in a for loop and then store this list in another list if it satisfies some condition. My code is:
import numpy as np

R = 10
lam = 1
proc_length = 100
L = 1
# Empty list to store lists
exponential_procs_lists = []
for procs in range(0, R):
    # Draw exponential random variables
    z_exponential = np.random.exponential(lam, proc_length)
    # Sort values into increasing order
    z_exponential.sort()
    # Insert 0 at the start of the array
    z_dat_r = np.insert(z_exponential, 0, 0)
    total = np.sum(np.diff(z_dat_r))  # renamed from `sum` to avoid shadowing the builtin
    if total < 5*L:
        exponential_procs_lists.append(z_dat_r)
which will store some of the R lists that satisfy the sum < 5L condition. My question is: what is the best way to store R lists where the sum of each list is less than 5L? The lists can be of different lengths, but they must satisfy the condition that the sum of the increments is less than 5*L. Any help much appreciated.

Okay, so based on your comment, I take it that you want to generate an exponential_procs_lists, inside which every sublist has a sum < 5*L.
Well, I modified your code to chop the sublists as soon as the sum exceeds 5*L.
Edit: see the answer history for my previous answer, which used the approach above.
Looking closer, notice that you don't actually need the discrete difference array at all. You're computing the difference array, summing it up, and checking whether the sum is < 5L; if it is, you append the original array.
But notice this:
if your array is [0, 0.00760541, 0.22281415, 0.60476231], its difference array is [0.00760541, 0.21520874, 0.38194816].
If you add the first x terms of the difference array, you get the (x+1)th element of the original array. So you really just need to keep the elements that are less than 5L:
import numpy as np

R = 10
lam = 1
proc_length = 5
L = 1
exponential_procs_lists = []

def chop(nums, target):
    # keep the leading elements that are strictly smaller than target
    good_list = []
    for num in nums:
        if num >= target:
            break
        good_list.append(num)
    return good_list

for procs in range(0, R):
    z_exponential = np.random.exponential(lam, proc_length)
    z_exponential.sort()
    z_dat_r = np.insert(z_exponential, 0, 0)
    good_list = chop(z_dat_r, 5*L)
    exponential_procs_lists.append(good_list)
You could probably also use a binary search (for better time complexity) or a filter lambda; that's up to you.
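Since z_dat_r is already sorted, the chop can also be done with a binary search, e.g. NumPy's searchsorted; a small sketch (the name chop_bisect is my own):

```python
import numpy as np

def chop_bisect(nums, target):
    # nums is sorted ascending; searchsorted with side="left" finds the first
    # index whose element is >= target, so everything before it is kept
    cut = np.searchsorted(nums, target, side="left")
    return nums[:cut]

print(chop_bisect(np.array([0.0, 0.3, 1.2, 4.9, 5.0, 6.1]), 5.0).tolist())  # [0.0, 0.3, 1.2, 4.9]
```

side="left" makes the cut fall just before the first element that is >= target, matching chop's break condition.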

Related

Recovering Subsets in Subset Sum Problem - Not All Subsets Appear

I was brushing up on dynamic programming (DP) when I came across this problem. I managed to use DP to determine how many solutions there are in the subset sum problem.
def SetSum(num_set, num_sum):
    # Initialize DP matrix with base cases set to 1
    matrix = [[0 for i in range(0, num_sum+1)] for j in range(0, len(num_set)+1)]
    for i in range(len(num_set)+1):
        matrix[i][0] = 1
    for i in range(1, len(num_set)+1):  # Iterate through set elements
        for j in range(1, num_sum+1):  # Iterate through sums
            if num_set[i-1] > j:  # When current element is greater than sum, take the previous solution
                matrix[i][j] = matrix[i-1][j]
            else:
                matrix[i][j] = matrix[i-1][j] + matrix[i-1][j-num_set[i-1]]
    # Retrieve elements of subsets
    subsets = SubSets(matrix, num_set, num_sum)
    return matrix[len(num_set)][num_sum]
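As a sanity check on the count that SetSum returns, here is a compact one-dimensional sketch of the same counting DP (my own helper, covering only the counting, not the recovery):

```python
def count_subsets(nums, target):
    # dp[j] = number of subsets summing to j, updated one element at a time
    dp = [1] + [0] * target
    for x in nums:
        for j in range(target, x - 1, -1):  # iterate backwards so each element is used at most once
            dp[j] += dp[j - x]
    return dp[target]

print(count_subsets([1, 2, 3, 4, 5], 10))  # 3 -> [1,2,3,4], [2,3,5], [1,4,5]
```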
Based on Subset sum - Recover Solution, I used the following method to retrieve the subsets since the set will always be sorted:
def SubSets(matrix, num_set, num):
    # Initialize variables
    height = len(matrix)
    width = num
    subset_list = []
    s = matrix[0][num-1]  # Keeps track of the number until a change occurs
    for i in range(1, height):
        current = matrix[i][width]
        if current > s:
            s = current  # keeps track of the changing value
            cnt = i - 1  # backwards counter, -1 to exclude current value already appended to list
            templist = []  # to store current subset
            templist.append(num_set[i-1])  # Adds current element to subset
            total = num - num_set[i-1]  # Initial total will be sum - max element
            while cnt > 0:  # Loop backwards to find remaining elements
                if total >= num_set[cnt-1]:  # Takes current element if it is less than total
                    templist.append(num_set[cnt-1])
                    total = total - num_set[cnt-1]
                cnt = cnt - 1
            templist.sort()
            subset_list.append(templist)  # Add subset to solution set
    return subset_list
However, since it is a greedy approach it only works when the max element of each subset is distinct. If two subsets have the same max element then it only returns the one with the larger values. So for elements [1, 2, 3, 4, 5] with sum of 10 it only returns
[1, 2, 3, 4] , [1, 4, 5]
When it should return
[1, 2, 3, 4] , [2, 3, 5] , [1, 4, 5]
I could add another loop inside the while loop to leave out each element but that would increase the complexity to O(rows^3) which can potentially be more than the actual DP, O(rows*columns). Is there another way to retrieve the subsets without increasing the complexity? Or to keep track of the subsets while the DP approach is taking place? I created another method that can retrieve all of the unique elements in the solution subsets in O(rows):
def RecoverSet(matrix, num_set):
    height = len(matrix) - 1
    width = len(matrix[0]) - 1
    subsets = []
    while height > 0:
        current = matrix[height][width]
        top = matrix[height-1][width]
        if current > top:
            subsets.append(num_set[height-1])
        if top == 0:
            width = width - num_set[height-1]
        height -= 1
    return subsets
Which would output [1, 2, 3, 4, 5]. However, getting the actual subsets from it seems like solving the subset problem all over again. Any ideas/suggestions on how to store all of the solution subsets (not print them)?
That's actually a very good question, and it seems you mostly have the right intuition.
The DP approach allows you to build a 2D table and essentially encode how many subsets sum up to the desired target sum, which takes time O(target_sum*len(num_set)).
Now if you want to actually recover all solutions, this is another story in the sense that the number of solution subsets might be very large, in fact much larger than the table you built while running the DP algorithm. If you want to find all solutions, you can use the table as a guide but it might take a long time to find all subsets. In fact, you can find them by going backwards through the recursion that defined your table (the if-else in your code when filling up the table). What do I mean by that?
Well, let's say you try to find the solutions with only the filled table at your disposal. The first thing to do, to tell whether there is a solution at all, is to check that the element at row len(num_set) and column num has value > 0, indicating that at least one subset sums up to num. Now there are two possibilities: either the last number in num_set is used in a solution, in which case we must check whether there is a subset of all the numbers except that last one which sums up to num - num_set[-1]. This is one possible branch in the recursion. The other is when the last number in num_set is not used in a solution, in which case we must check whether we can still sum up to num using all the numbers except that last one.
If you keep going you will see that the recovery can be done by running the recursion backwards. By keeping track of the numbers along the way (the different paths through the table that lead to the desired sum) you can retrieve all solutions, but again bear in mind that the running time might be very long, because we want to actually find all solutions, not just know that they exist.
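The backwards recursion described above can be sketched minimally as follows (my own names; without the table as a guide it re-explores dead branches, so it is only meant to illustrate the two-way branching):

```python
def all_subsets(nums, target):
    # enumerate every subset of nums[:i] that sums to t
    def rec(i, t):
        if t == 0:
            return [[]]   # one successful (empty) completion
        if i == 0 or t < 0:
            return []     # dead branch
        skip = rec(i - 1, t)                                             # nums[i-1] not used
        take = [[nums[i - 1]] + s for s in rec(i - 1, t - nums[i - 1])]  # nums[i-1] used
        return skip + take

    return [sorted(s) for s in rec(len(nums), target)]

print(all_subsets([1, 2, 3, 4, 5], 10))  # [[1, 2, 3, 4], [2, 3, 5], [1, 4, 5]]
```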
This code should be what you are looking for recovering solutions given the filled matrix:
def recover_sol(matrix, set_numbers, target_sum):
    up_to_num = len(set_numbers)

    ### BASE CASES (BOTTOM OF RECURSION) ###
    # If the target_sum becomes negative or there is no solution in the matrix,
    # return an empty list and inform that this branch is not a successful one
    if target_sum < 0 or matrix[up_to_num][target_sum] == 0:
        return [], False
    # If the bottom of the recursion is reached, that is, target_sum is 0, just return
    # an empty list and inform that this is a successful solution
    if target_sum == 0:
        return [], True

    ### IF NOT A BASE CASE, RECURSE ###
    # Case 1: last number in set_numbers is not used in the solution --> same target but one item less
    s1_sols, success1 = recover_sol(matrix, set_numbers[:-1], target_sum)
    # Case 2: last number in set_numbers is used in the solution --> target is lowered by item up_to_num
    s2_sols, success2 = recover_sol(matrix, set_numbers[:-1], target_sum - set_numbers[up_to_num-1])
    # If Case 2 is a success but the bottom of the recursion was reached,
    # so that it returned an empty list, set the current solution to the current item
    if s2_sols == [] and success2:
        # The set of solutions is the list containing one one-item list (hence the list in a list)
        s2_sols = [[set_numbers[up_to_num-1]]]
    # Otherwise there are already solutions: go through the multiple solutions
    # of Case 2 and add the current number to each of them
    else:
        s2_sols = [[set_numbers[up_to_num-1]] + s2_subsol for s2_subsol in s2_sols]
    # Join the lists of solutions for both cases, and report success
    # if either case returned a successful solution
    return s1_sols + s2_sols, success1 or success2
For the full solution, with matrix filling AND recovery of solutions, you can then do:
def subset_sum(set_numbers, target_sum):
    n_numbers = len(set_numbers)
    # Initialize DP matrix with base cases set to 1
    matrix = [[0 for i in range(0, target_sum+1)] for j in range(0, n_numbers+1)]
    for i in range(n_numbers+1):
        matrix[i][0] = 1
    for i in range(1, n_numbers+1):  # Iterate through set elements
        for j in range(1, target_sum+1):  # Iterate through sums
            if set_numbers[i-1] > j:  # When current element is greater than sum, take the previous solution
                matrix[i][j] = matrix[i-1][j]
            else:
                matrix[i][j] = matrix[i-1][j] + matrix[i-1][j-set_numbers[i-1]]
    return recover_sol(matrix, set_numbers, target_sum)[0]
Cheers!

Efficiently find index of smallest number larger than some value in a large sorted list

If I have a long list of sorted numbers, and I want to find the index of the smallest element larger than some value, is there a way to implement it more efficiently than using binary search on the entire list?
For example:
import random

c = 0
x = [0 for x in range(50000)]
for n in range(50000):
    c += random.randint(1, 100)
    x[n] = c
What would be the most efficient way of finding the location of the largest element in x smaller than some number, z?
I know that you can already do:
import bisect
idx = bisect.bisect(x, z)
But assuming that this would be performed many times, would there be an even more efficient way than binary search? Since the range of the list is large, creating a dict of all possible integers uses too much memory. Would it be possible to create a smaller list of say every 5000 numbers and use that to speed up the lookup to a specific portion of the large list?
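One option, assuming the queries really do arrive in large batches, is to vectorize them with NumPy's searchsorted, which runs all the binary searches in compiled code rather than one bisect call per query (a sketch with toy data):

```python
import numpy as np

x = np.array([3, 10, 14, 27])    # sorted data
queries = np.array([12, 3, 26])  # many lookups at once
# for each query, the index of the first element strictly greater than it
idx = np.searchsorted(x, queries, side="right")
print(idx.tolist())  # [2, 1, 3]
```

This has the same O(log n) cost per query as bisect, but amortizes the Python-level overhead across the whole batch.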
Can you try whether this could be a solution?
It takes a while to generate the list, but reporting the result seems fast.
Given the list:
import random

limit = 50  # to set the number of elements
c = 0
x = [0 for x in range(limit)]
for n in range(limit):
    c += random.randint(1, 100)
    x[n] = c
print(x)
Since it is a sorted list, you could retrieve the value using a for loop:
z = 1600  # reference for the lookup
res = ()
for i, n in enumerate(x):
    if n > z:
        res = (i, n)
        break
print(res)  # the index and value of the first element that matches the break condition
print(x[res[0]-1])  # the element just before
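For comparison, bisect already reports both pieces of information from the loop above in O(log n), without scanning the list (a small sketch with toy data):

```python
import bisect

x = [3, 10, 14, 27]  # sorted data
z = 12
i = bisect.bisect_right(x, z)  # index of the first element strictly greater than z
print((i, x[i]))   # (2, 14): index and value of the smallest element larger than z
print(x[i - 1])    # 10: the element just before
```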

Pythonic way of checking if indefinite # of consec elements in list sum to given value

Having trouble figuring out a nice way to get this task done.
Say I have a list of triangular numbers up to 1000 -> [0, 1, 3, 6, 10, 15, ...] etc.
Given a number, I want to return the consecutive elements in that list that sum to that number.
i.e.
64 --> [15,21,28]
225 --> [105,120]
371 --> [36, 45, 55, 66, 78, 91]
if there's no consecutive numbers that add up to it, return an empty list.
882 --> [ ]
Note that the length of consecutive elements can be any number - 3,2,6 in the examples above.
The brute force way would iteratively check every possible consecutive pairing possibility for each element (start at 0, look at the sum of [0,1], look at the sum of [0,1,3], etc. until the sum is greater than the target number). But that's probably O(n^2) or maybe worse. Any way to do it better?
UPDATE:
Ok, so a friend of mine figured out a solution that works at O(n) (I think) and is pretty intuitively easy to follow. This might be similar (or the same) to Gabriel's answer, but it was just difficult for me to follow and I like that this solution is understandable even from a basic perspective. this is an interesting question, so I'll share her answer:
from functools import reduce  # needed on Python 3; reduce was a builtin on Python 2

def findConsec(input1=7735):
    list1 = range(1, 1001)
    newlist = [reduce(lambda x, y: x+y, list1[0:i]) for i in list1]  # triangular numbers
    curr = 0
    end = 2
    num = sum(newlist[curr:end])
    while num != input1:
        if num < input1:
            num += newlist[end]
            end += 1
        elif num > input1:
            num -= newlist[curr]
            curr += 1
        if curr == end:
            return []
    if num == input1:
        return newlist[curr:end]
A 3-iteration max solution
Another solution is to start close to where your number should be and walk forward from one position behind it. For any number in the triangular list vec, its value is defined by its index as:
vec[i] = sum(range(0,i+1))
Dividing the sum value you are looking for by the length of the group gives the average of the group, which therefore lies within the group's range, although it may not itself exist in the list.
Therefore, you can set the starting point for finding a group of n numbers whose sum matches a value val to the integer part of that division. As the average may not be in the list, the starting position is the one that minimizes the difference to it.
# vec as np.ndarray -> the triangular or whatever-type series
# val as int -> sum of the n elements you are looking for
# n as int -> number of elements to be summed
import numpy as np

def seq_index(vec, n, val):
    index0 = int(np.argmin(abs(vec - (val/n))) - n//2 - 1)  # covers odd and even n values
    intsum = 0  # running sum to keep track of
    count = 0   # counter
    seq = []    # indices of vec that sum up to val
    while count <= 2:  # walk forward from just before the initial guess of where the group begins
        intsum = sum(vec[(index0+count):(index0+count+n)])
        if intsum == val:
            seq.append(list(range(index0+count, index0+count+n)))
        count += 1
    return seq
# Example
vec = []
for i in range(0, 100):
    vec.append(sum(range(0, i)))  # build the triangular series from i = 0 (0) up to i = 99 (4851)
vec = np.array(vec)  # convert to numpy to make it easier to query ranges

# looking for 3 consecutive elements that sum to 4
indices = seq_index(vec, 3, 4)
print(indices[0])
print(vec[indices[0]])
print(sum(vec[indices[0]]))
Returns
print(indices[0]) -> [1, 2, 3]
print(vec[indices[0]]) -> [0 1 3]
print(sum(vec[indices[0]])) -> 4 (which we were looking for)
This seems like an algorithm question rather than a question about how to do it in Python.
Thinking backwards, I would copy the list and use it in a way similar to the Sieve of Eratosthenes. I would not consider the numbers that are greater than x. Then I would start from the greatest number and sum backwards. If I get greater than x, I subtract the greatest number (excluding it from the solution) and continue to sum backwards.
This seems the most efficient way to me, and it is actually O(n): you never go back (or forward, in this backward algorithm), except when you subtract or remove the biggest element, which doesn't need access to the list again, just a temp var.
To answer Dunes' question:
Yes, there is a reason: to subtract the next largest in the no-solution case where the sum gets too large. Going from the first element, hitting a no-solution would require access to the list again, or to the temporary solution list, to subtract a set of elements that sums greater than the next element to add. You risk increasing the complexity by accessing more elements.
To improve efficiency in the cases where an eventual solution is at the beginning of the sequence, you can search for the smaller and larger pair using binary search. Once a pair of 2 elements smaller than x is found, you can sum the pair; if the sum is larger than x you go left, otherwise you go right. This search has logarithmic complexity in theory. In practice complexity is not always what it is in theory, and you can do whatever you like :)
You should pick the first three elements, sum them, and then keep subtracting the first of the three and adding the next element in the list, checking each time whether the sum adds up to whatever number you want. That would be O(n).
# vec as np.ndarray, whatever as the target sum (both assumed already defined)
import numpy as np

itsum = sum(vec[0:3])  # the sum of the first window of three elements
sequence = [list(range(0, 3))] if itsum == whatever else []  # index windows that add up to whatever
for i in range(3, len(vec)):
    itsum -= vec[i-3]  # drop the element leaving the window
    itsum += vec[i]    # add the element entering the window
    if itsum == whatever:
        sequence.append(list(range(i-2, i+1)))  # list of index windows that add up to whatever
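A runnable sketch of the same sliding-window idea, generalized to any window length (my own version; it assumes the series is sorted and nonnegative, and that a match needs at least two elements):

```python
def consec_sum(vec, target):
    # two pointers: grow the window on the right, shrink from the left on overshoot
    lo = hi = s = 0
    while hi < len(vec):
        s += vec[hi]
        hi += 1
        while s > target and lo < hi:  # shrink when the window sum overshoots
            s -= vec[lo]
            lo += 1
        if s == target and hi - lo >= 2:
            return vec[lo:hi]
    return []

triangles = [i * (i + 1) // 2 for i in range(45)]  # 0, 1, 3, 6, 10, ... up to 990
print(consec_sum(triangles, 64))   # [15, 21, 28]
print(consec_sum(triangles, 371))  # [36, 45, 55, 66, 78, 91]
```

Because each index enters and leaves the window at most once, this is O(n) overall.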
The solution you provide in the question isn't truly O(n) time complexity: the way you compute your triangle numbers makes the computation O(n^2). The list comprehension throws away the previous work that went into calculating the last triangle number. That is: tn_i = tn_(i-1) + i (where tn is a triangle number). Since you also store the triangle numbers in a list, your space complexity is not constant, but related to the size of the number you are looking for. Below is an identical algorithm, but with O(n) time complexity and O(1) space complexity (written for Python 3).
# for python 2, replace things like `highest = next(high)` with `highest = high.next()`
from itertools import count, takewhile, accumulate

def find(to_find):
    # next(low) == lowest number in total
    # next(high) == highest number not in total
    low = accumulate(count(1))   # generator of triangle numbers
    high = accumulate(count(1))
    total = highest = next(high)
    # highest = highest number in the sequence that sums to total
    # definitely can't find a solution if the highest number in the sum is greater than to_find
    while highest <= to_find:
        # found a solution
        if total == to_find:
            # keep taking numbers from the low iterator until we find the highest number in the sum
            return list(takewhile(lambda x: x <= highest, low))
        elif total < to_find:
            # add the next highest triangle number not in the sum
            highest = next(high)
            total += highest
        else:  # total > to_find
            # subtract the lowest triangle number in the sum
            total -= next(low)
    return []

What is fastest way to determine numbers are within specific range of each other in Python?

I have list of numbers as follows -
L = [ 1430185458, 1430185456, 1430185245, 1430185246, 1430185001 ]
I am trying to determine which numbers are within a range of 2 of each other. The list will be unsorted when I receive it.
If there are numbers within a range of 2 of each other, I have to return "1" at the exact position the number was received in.
I was able to achieve the desired result; however, the code runs very slowly. My approach involves sorting the list, then iterating it twice with two pointers and comparing successively. I will have millions of records coming in as separate lists.
Just trying to see what the best possible approach to this problem is.
Edit - Apologies, as I was away for a while. The list can have any number of elements, from 1 to n. The idea is to return either 0 or 1 in the exact position the number was received in. I cannot post the actual code I implemented, but here is pseudocode.
a. create a new list as a list of lists, with the second part set to 0 for each element. We start by assuming that no numbers are within range of 2 of each other.
[[1430185458,0], [1430185456,0], [1430185245,0], [1430185246,0], [1430185001,0]]
b. sort original list
c. compare the first element to the second, the second to the third, and so on until the end of the list is reached, and whenever the difference is less than or equal to 2, update the corresponding second elements from step a to 1.
[[1430185458,1], [1430185456,1], [1430185245,1], [1430185246,1], [1430185001,0]]
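The three steps above can be sketched directly; sorting indices instead of values keeps track of the original positions, and the flat 0/1 list can be zipped with the values afterwards if the list-of-lists form is needed (a minimal version, names my own):

```python
def mark_within_two(nums):
    # sort the indices by value so that in-range numbers become neighbours
    order = sorted(range(len(nums)), key=lambda i: nums[i])
    result = [0] * len(nums)
    for a, b in zip(order, order[1:]):  # compare each pair of sorted neighbours
        if nums[b] - nums[a] <= 2:
            result[a] = result[b] = 1
    return result

L = [1430185458, 1430185456, 1430185245, 1430185246, 1430185001]
print(mark_within_two(L))  # [1, 1, 1, 1, 0]
```

Comparing sorted neighbours is enough: if two values are within 2 of each other, every value between them in sorted order is within 2 of its neighbour as well.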
The goal is to be fast, so that presumably means an O(N) algorithm. Building an NxN difference matrix is O(N^2), so that's not good at all. Sorting is O(N*log(N)), so that's out, too. Assuming average case O(1) behavior for dictionary insert and lookup, the following is an O(N) algorithm. It rips through a list of a million random integers in a couple of seconds.
def in_range(numbers):
    result = [0] * len(numbers)
    index = {}
    for idx, number in enumerate(numbers):
        for offset in range(-2, 3):
            match_idx = index.get(number + offset)
            if match_idx is not None:
                result[match_idx] = result[idx] = 1
        index[number] = idx
    return result
Update
I have to return "1" at exact same position number was received in.
The update to the question asks for a list of the form [[1,1],[2,1],[5,0]] given an input of [1,2,5]. I didn't do that. Instead, my code returns [1,1,0] given [1,2,5]. It's about 15% faster to produce that simple 0/1 list compared to the [[value,in_range],...] list. The desired list can easily be created using zip:
zip(numbers,in_range(numbers)) # Generator
list(zip(numbers,in_range(numbers))) # List of (value,in_range) tuples
I think this does what you need (process() rewrites L in place into the 0/1 mask). Very likely it's still optimizable, though:
def process(L):
    s = sorted((v, k) for k, v in enumerate(L))
    for k in range(len(L)):  # start from the assumption that nothing is in range
        L[k] = 0
    j = 0
    for i, (v, k) in enumerate(s):
        while j < i and v - s[j][0] > 2:
            j += 1
        while j < i:
            L[s[j][1]] = 1
            L[s[i][1]] = 1
            j += 1

calculate the sum of repeated numbers in a tuple

I have a tuple of 50 numbers with digits 0...9 (with repetition), and I want to calculate the sum of each repeated digit and create a new tuple entry for each repeated digit. How can I do that in Python?
(1,2,2,3,4,6,9,1,3,5,6,9,2,2,2,4,6,8,...,9), so I want the sum of each repeated number, like sumof2, sumof3, and so on. I don't know how to proceed.
Try using the groupby() function from itertools:
from itertools import groupby

data = (1, 2, 2, 2, 3, 3)  # trailing elements elided in the original
for key, group in groupby(data):
    print("The sum of", key, "is", sum(group))
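One caveat worth knowing: groupby() only merges consecutive equal values, so unsorted data should be sorted first. A quick sketch of the difference:

```python
from itertools import groupby

data = (1, 2, 2, 3, 2)
by_runs = [(k, sum(g)) for k, g in groupby(data)]           # groups consecutive runs only
by_value = [(k, sum(g)) for k, g in groupby(sorted(data))]  # groups all equal values
print(by_runs)   # [(1, 1), (2, 4), (3, 3), (2, 2)]
print(by_value)  # [(1, 1), (2, 6), (3, 3)]
```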
If you wanted to do this without itertools (because reasons), then the best approach would be to use a 'remembering' variable (this code could probably be cleaned a little):
sums = []
prev = None  # sentinel: no previous element seen yet
curr_sum = 0
for element in data:  # assumes equal digits are adjacent, i.e. data is sorted
    if element != prev:
        if prev is not None:
            sums.append(curr_sum)
        curr_sum = 0
        prev = element
    curr_sum += element
sums.append(curr_sum)
This will leave you with a list of the sums.
OR, with dictionaries even!
sums = {}
for element in data:
    sums[element] = data.count(element) * element
# sums[4] == sum of all the 4s
Maybe collections.Counter might help in this case if I'm reading the question correctly.
From what I understand, you want the sum of the repeated elements inside a tuple, matched up with the digit value?
This is by no means an efficient way of solving it, but hopefully it helps. I found this answer to a different kind of question helpful in solving yours:
How to count the frequency of the elements in a list? Answered by YOU
from collections import Counter

data = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 2, 3, 4, 5, 6)  # trailing elements elided in the original
results = ()
counter = Counter(data)
for key, count in sorted(counter.items()):
    results += ((key, key * count),)
print(results)
