Randomly select numbers from a list with a condition - python

I have a list(range(30000)). I need to randomly select numbers from it such that it takes a certain count of numbers, say 'n', in total, while at the same time taking 'k' values between two index positions.
For example:
a = [1, 2, 3, 4, 5, 6, 7, 8, ..., 20, 21, 22, ..., 88, 89, ..., 30k]
I need to select 5000 numbers in total: 'x' numbers between the 0th index and the 100th index, 'y' numbers between 100 and 200, etc.
There is no hard rule that it must be exactly 5000 numbers. But the samples must lie between 0-100, 100-200, etc.
I saw random.choice, but how do I give it a condition?
To put the question accurately: I need 50 numbers from 0-100, 200-300, etc.

Here's one way to approach this:
import random

# define the sample size of each interval,
# or for your specific case generate a sequence
# that adds up to 5000
size = [5, 2, 10]
n = 300
step = 100

# iterate over both a range (or sequence) and size
# and take random samples accordingly
out = [random.sample(range(i - step, i), k)
       for i, k in zip(range(step, n + step, step), size)]
print(out)
[[6, 86, 96, 62, 53], [115, 176], [245, 259, 297, 249, 225, 281, 264, 274, 275, 206]]
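Adapting the same idea to the clarified requirement (50 numbers from each of 0-100, 200-300, and so on up to 30000) might look like the sketch below; the alternating-block layout is an assumption taken from the question's last sentence:

```python
import random

step = 100  # width of each block
k = 50      # how many distinct numbers to draw per block

# sample k distinct values from every other 100-wide block: [0, 100), [200, 300), ...
out = [random.sample(range(start, start + step), k)
       for start in range(0, 30000, 2 * step)]
```

random.sample guarantees the 50 values inside each block are distinct, which random.choice would not.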

This is a simple script that does what you asked for. Note that the window bounds have to be stepped after sampling, not before, or the first window (0-100) is skipped; also, random.randint can repeat values, so switch to random.sample if the 50 numbers per window must be distinct.
import random

a = 0
b = 100
l = list()
for z in range(150):  # windows 0-100, 200-300, ..., 29800-29900
    for x in range(50):
        l.append(random.randint(a, b))
    a = a + 200
    b = b + 200


Julia vs Python implementation for https://projecteuler.net/problem=2

This was very strange to me and I don't know where I am messing up in Julia.

Python implementation
import numpy as np

# Function to generate the Fibonacci series
def fibonacci_fun(fib_series_fun, num_terms_fun):
    # Initialize the first value in series
    first_num_fun = 1
    # Append the first value to series
    fib_series_fun.append(first_num_fun)
    # Initialize the value preceding the first number; technically always 0 but 1 in this problem
    num_bef_start_fun = 1
    # Add the first_num to the num_bef_start value and append it
    next_num_fun = first_num_fun + num_bef_start_fun
    fib_series_fun.append(next_num_fun)
    # While we do not have num_terms number of elements in Fibonacci series, repeat
    while len(fib_series_fun) != num_terms_fun:
        # Add the previous 2 numbers of the series to itself
        new_num_fun = fib_series_fun[len(fib_series_fun) - 2] + fib_series_fun[len(fib_series_fun) - 1]
        fib_series_fun.append(new_num_fun)
    return fib_series_fun

# Main function
def main():
    # Define the number of terms in Fibonacci series
    num_terms = 25
    # Define empty list to store Fibonacci series values
    fib_series = []
    # Run the fibonacci_fun to get the Fibonacci series
    fib_series = fibonacci_fun(fib_series, num_terms)
    print(fib_series)
    fib_series = np.array(fib_series)
    # Values less than 4 million
    values_less_than = fib_series[fib_series < 4000000]
    # Get only the even values in the Fibonacci series
    even_values_less_than_fib_series = values_less_than[values_less_than % 2 == 0]
    print(even_values_less_than_fib_series)
    sum_of_fib_series = sum(even_values_less_than_fib_series)
    print("The sum of even values in the Fibonacci series is %d.\n" % sum_of_fib_series)
    print("Process finished!")

main()
Julia Implementation
using Printf

# Function to generate the Fibonacci series
function fibonacci_fun(fib_series_fun, num_terms_fun)
    # Initialize the first value in series
    first_num_fun = 1
    # Append the first value to series
    append!(fib_series_fun, first_num_fun)
    # Initialize the value preceding the first number; technically always 0 but 1 in this problem
    num_bef_start_fun = 1
    # Add the first_num to the num_bef_start value and append it
    next_num_fun = first_num_fun + num_bef_start_fun
    append!(fib_series_fun, next_num_fun)
    # While we do not have num_terms number of elements in Fibonacci series, repeat
    while length(fib_series_fun) != num_terms_fun
        # Add the previous 2 numbers of the series to itself
        new_num_fun = fib_series_fun[length(fib_series_fun)-1] + fib_series_fun[length(fib_series_fun)]
        append!(fib_series_fun, new_num_fun)
    end
    return fib_series_fun
end

# Main function
function main()
    # Define the number of terms in Fibonacci series
    num_terms = 25
    # Define empty array of Int16 to store Fibonacci series values
    global fib_series = Int16[]
    # Run the fibonacci_fun to get the Fibonacci series
    fib_series = fibonacci_fun(fib_series, num_terms)
    println(fib_series)
    # Values less than 4 million
    values_less_than = fib_series[fib_series .< 4000000]
    # Get only the even values in the Fibonacci series
    even_values_less_than_fib_series = values_less_than[values_less_than .% 2 .== 0]
    println(even_values_less_than_fib_series)
    sum_of_fib_series = sum(even_values_less_than_fib_series)
    @printf("The sum of even values in the Fibonacci series is %d.\n", sum_of_fib_series)
    println("Process finished!")
end

main()
Python Implementation Output:
[1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393]
[ 2 8 34 144 610 2584 10946 46368]
The sum of even values in the Fibonacci series is 60696.
Julia Implementation Output:
Int16[1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, -19168, 9489, -9679]
Int16[2, 8, 34, 144, 610, 2584, 10946, -19168]
The sum of even values in the Fibonacci series is -4840.
Process finished!
TL;DR: You are using a number that exceeds the upper limit of Int16, which causes it to wrap around with the remainder, returning a negative number.
Julia has fixed-width integer types like Int16, as well as Int, which is an alias for the machine word size (Int64 on most systems). A general rule is to use the smallest fixed-width type that fits your data for storage, since it is more memory-efficient (which is what it is being used for in your code, so that's good). The difference is that Int64 comfortably holds every value in this problem, while Int16 limits itself to 16 bits, so it has tight bounds: [-2^15, 2^15 - 1], i.e. -32768 to 32767.
The values from 46368 and up are above those bounds, so they wrap around.
So you could solve your problem by changing the declaration to:
global fib_series = Int64[]
or, if you wanted to:
global fib_series = Int[]
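The wraparound is easy to reproduce outside Julia. A minimal Python sketch of signed 16-bit wrapping (wrap_int16 is an illustrative helper, not a library function) yields exactly the odd numbers seen in the Julia output above:

```python
def wrap_int16(n):
    """Wrap an integer into the signed 16-bit range [-2**15, 2**15 - 1]."""
    return (n + 2**15) % 2**16 - 2**15

# fib(22) + fib(23) = 28657 + 46368 = 75025, which exceeds 32767:
print(wrap_int16(46368))  # the 46368 term itself already wraps to -19168
print(wrap_int16(75025))  # and the next sum wraps to 9489
```

These are precisely the -19168 and 9489 entries in the Int16 output, which is why the even-value sum comes out negative.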

How to group approximately adjacent list

I have a list whose values are approximately adjacent:
x = [10, 11, 13, 70, 71, 73, 170, 171, 172, 174]
I need to separate it into lists that each have minimum deviation, i.e.
y = [[10, 11, 13], [70, 71, 73], [170, 171, 172, 174]]
You can see that y is grouped into 3 separate lists, and the list breaks wherever there is a huge deviation.
Can you give me a tip or any source to solve this?
The zip function is your friend when you need to compare items of a list with their successor or predecessor:
x = [10, 11, 13, 70, 71, 73, 170, 171, 172, 174]
threshold = 50
breaks = [i for i, (a, b) in enumerate(zip(x, x[1:]), 1) if b - a > threshold]
groups = [x[s:e] for s, e in zip([0] + breaks, breaks + [None])]
print(groups)
[[10, 11, 13], [70, 71, 73], [170, 171, 172, 174]]
breaks will contain the index (i) of each element (b) that is greater than its predecessor (a) by more than the threshold value.
Using zip() again allows you to pair up these break indexes to form start/end ranges which you can apply to the original list to get your groupings.
Note that I used a fixed threshold to detect a "huge" deviation, but you can use a percentage or any formula/condition of your choice in place of if b - a > threshold. If the deviation calculation is complex, you will probably want to make a deviates() function and use it in the list comprehension (if deviates(a, b)) so that it remains intelligible.
If zip() and list comprehensions are too advanced, you can do the same thing using a simple for-loop:
def deviates(a, b):  # example of a (huge) deviation detection function
    return b - a > 50

groups = []      # resulting list of groups
previous = None  # track previous number for comparison
for number in x:
    if not groups or deviates(previous, number):
        groups.append([number])    # 1st item or deviation, add new group
    else:
        groups[-1].append(number)  # approximately adjacent, add to last group
    previous = number              # remember previous value for next loop
Something like this should do the trick:
test_list = [10, 11, 13, 70, 71, 73, 170, 171, 172, 174]

def group_approximately_adjacent(numbers):
    if not numbers:
        return []
    current_number = numbers.pop(0)
    cluster = [current_number]
    clusters = [cluster]
    while numbers:
        next_number = numbers.pop(0)
        if is_approximately_adjacent(current_number, next_number):
            cluster.append(next_number)
        else:
            cluster = [next_number]
            clusters.append(cluster)
        current_number = next_number
    return clusters

def is_approximately_adjacent(a, b):
    deviation = 0.25
    return abs(a * (1 + deviation)) > abs(b) > abs(a * (1 - deviation))
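For reference, the same relative-deviation idea (a 25% tolerance between consecutive values) can be condensed into one self-contained function; the name and default threshold here are illustrative:

```python
def group_relative(numbers, deviation=0.25):
    """Group consecutive numbers whose relative difference stays within `deviation`."""
    groups = []
    for n in numbers:
        prev = groups[-1][-1] if groups else None
        if prev is not None and abs(prev) * (1 - deviation) < abs(n) < abs(prev) * (1 + deviation):
            groups[-1].append(n)  # close enough to the previous number: same group
        else:
            groups.append([n])    # first number, or too far away: start a new group
    return groups

groups = group_relative([10, 11, 13, 70, 71, 73, 170, 171, 172, 174])
```

Unlike the pop(0)-based version, this leaves the input list untouched.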

How to make sure that a list of generated numbers follow a uniform distribution

I have a list of 150 numbers from 0 to 149. I would like to use a for loop with 150 iterations in order to generate 150 lists of 6 numbers such that, in each iteration k, the number k is included as well as 5 different random numbers. For example:
S0 = [0, r1, r2, r3, r4, r5] # r1, r2,..., r5 are random numbers between 0 and 150
S1 = [1, r1', r2', r3', r4', r5'] # r1', r2',..., r5' are new random numbers between 0 and 150
...
S149 = [149, r1'', r2'', r3'', r4'', r5'']
In addition, the numbers in each list have to be different and with a minimum distance of 5. This is the code I am using:
import random
import numpy as np

final_list = []
for k in range(150):
    S = [k]
    for it in range(5):
        domain = [ele for ele in range(150) if ele not in S]
        d = 0
        x = k
        while d < 5:
            d = np.inf
            x = random.sample(domain, 1)[0]
            for ch in S:
                if np.abs(ch - x) < d:
                    d = np.abs(ch - x)
        S.append(x)
    final_list.append(S)
Output:
[[0, 149, 32, 52, 39, 126],
[1, 63, 16, 50, 141, 79],
[2, 62, 21, 42, 35, 71],
...
[147, 73, 38, 115, 82, 47],
[148, 5, 78, 115, 140, 43],
[149, 36, 3, 15, 99, 23]]
Now, the code works, but I would like to know if it is possible to force the number of repetitions of each number across all iterations to be approximately the same. A histogram of how many times each number appears in the generated lists shows that some numbers appear more than 10 times while others appear only 2 times. Is it possible to reduce this level of variation so that the distribution is approximately uniform? Thanks.
First, I am not sure that your assertion that the current results are not uniformly distributed is necessarily correct. It would seem prudent to me to try and examine the histogram over several repetitions of the process, rather than just one.
I am not a statistician, but when I want to approximate uniform distribution (and assuming that the functions in random provide uniform distribution), what I try to do is to simply accept all results returned by random functions. For that, I need to limit the choices given to these functions ahead of calling them. This is how I would go about your task:
import random
import numpy as np

N = 150

def random_subset(n):
    result = []
    cands = set(range(N))
    for i in range(6):
        result.append(n)                   # initially, n is the number that must appear in the result
        cands -= set(range(n - 4, n + 5))  # remove candidates less than 5 away
        n = random.choice(list(cands))     # select next number
    return result

result = np.array([random_subset(n) for n in range(N)])
print(result)
Simply put, whenever I add a number n to the result set, I remove from the selection candidates a neighborhood of the proper size, to ensure no number at a distance of less than 5 can be selected afterwards.
The code is not optimized (multiple set-to-list conversions), but it works (as per my understanding).
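A quick way to convince yourself the approach works is to assert the minimum-distance property directly; this sketch restates random_subset from above so the check is self-contained:

```python
import random

N = 150

def random_subset(n):
    result = []
    cands = set(range(N))
    for i in range(6):
        result.append(n)                   # n is forced into the result
        cands -= set(range(n - 4, n + 5))  # drop everything closer than 5
        n = random.choice(list(cands))
    return result

# every pair within a subset should be at least 5 apart, over many random trials
ok = all(abs(a - b) >= 5
         for _ in range(100)
         for s in [random_subset(random.randrange(N))]
         for i, a in enumerate(s) for b in s[i + 1:])
```

Since each appended number's 9-wide neighborhood is removed before the next draw, the property holds by construction, not just empirically.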
You can force it to be precisely uniform, if you so desire.
Apologies for the mix of globals and locals, this seemed the most readable. You would want to rewrite according to how variable your constants are =)
import random

SIZE = 150
SAMPLES = 5

def get_samples():
    pool = list(range(SIZE)) * SAMPLES
    random.shuffle(pool)
    items = []
    for i in range(SIZE):
        selection, pool = pool[:SAMPLES], pool[SAMPLES:]
        item = [i] + selection
        items.append(item)
    return items
Then you will have exactly 5 of each (and one more in the leading position, which is a weird data structure).
>>> import collections
>>> set(collections.Counter(vv for v in get_samples() for vv in v).values())
{6}
The method above does not guarantee the last 5 numbers are unique, in fact, you would expect ~10/150 to have a duplicate. If that is important, you need to filter your distribution a little more and decide how well you value tight uniformity, duplicates, etc.
If your numbers are approximately what you gave above, you can also patch up the results (fairly) and hope to avoid long search times (not the case for SAMPLES sizes closer to SIZE):
def get_samples():
    pool = list(range(SIZE)) * SAMPLES
    random.shuffle(pool)
    i = 0
    while i < len(pool):
        if i % SAMPLES == 0:
            seen = set()
        v = pool[i]
        if v in seen:  # duplicate within this segment: swap it away
            dst = random.choice(range(len(pool)))
            pool[dst], pool[i] = pool[i], pool[dst]
            i = min(i - i % SAMPLES, dst - dst % SAMPLES)  # restart from the earlier affected segment
        else:
            seen.add(v)
            i += 1
    items = []
    for i in range(SIZE):
        selection, pool = pool[:SAMPLES], pool[SAMPLES:]
        assert len(set(selection)) == SAMPLES, selection
        item = [i] + selection
        items.append(item)
    return items
This will typically take less than 5 passes through to clean up any duplicates, and should leave all arrangements satisfying your conditions equally likely.

Find the value of an index and the n-closest neighbors in a NumPy array

I wondered if anyone could tell me how to find the index of a number and the indexes of the n-closest neighbors in a NumPy array.
For example, in this array I would like to find the index of the value 87 and its four closest neighbors: 86, 88 to the left and 78, 43 to the right.
a = np.random.randint(1, 101, 7)
array([79, 86, 88, 87, 78, 43, 57])
If you want to change the values from time to time, this should do the trick, although it would be expensive for large arrays:
a = np.array([79, 86, 88, 87, 78, 43, 57])
number = 87
n_nearest = 4
index = np.where(a == number)[0][0] # np.where returns (array([3]),)
a = a[max(0, index - n_nearest // 2):index + n_nearest // 2 + 1]
nearests = np.delete(a, n_nearest // 2)
print(nearests)
Output: [86 88 78 43]
First, find the index of the value whose neighbors you want (this may not work with duplicate values, though).
You should do max(0, index - 2) in case the value you want is at the beginning of the array (position 0 or 1).
Then, delete the number itself from the result. What remains are the neighbors you want.
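If duplicates are a concern, one option is to collect a window around every occurrence; the neighbors helper below is a sketch, not a standard NumPy function:

```python
import numpy as np

def neighbors(a, number, n_nearest=4):
    """For each occurrence of `number` in `a`, return its up-to-n_nearest surrounding values."""
    half = n_nearest // 2
    out = []
    for index in np.where(a == number)[0]:
        window = a[max(0, index - half): index + half + 1]  # clipped at the array start
        out.append(np.delete(window, min(index, half)))     # drop the matched value itself
    return out

result = neighbors(np.array([79, 86, 88, 87, 78, 43, 57]), 87)
```

Windows near either end of the array simply come out shorter than n_nearest, which matches the max(0, ...) clamping used above.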
I had a go at it, with the caveat that I'm not hugely experienced with python or numpy - only a couple of months
(...so I'd also look for someone else to chip in a much cleaner/simpler/better method!)
from functools import reduce
import operator
import numpy as np

a = np.array([5, 10, 15, 12, 88, 86, 5, 87, 1, 2, 3, 87, 1, 2, 3])
look_for = 87

# find indices where a == 87:
np.nonzero(a == look_for)

# get as iterable
np.nonzero(a == look_for)[0]

# put into a list comprehension with the delta in indices you want, using
# 'range' to generate the values between index-delta and index+delta,
# then wrap it in list() to realize the range iterator:
delta = 2
[list(range(i - delta, i + delta + 1)) for i in np.nonzero(a == 87)[0]]

# the above gives you a list of lists; flatten it into a single list
reduce(operator.concat, [list(range(i - delta, i + delta + 1)) for i in np.nonzero(a == 87)[0]])

# we want only unique values, so one way is to turn the above into a set
set(reduce(operator.concat, [list(range(i - delta, i + delta + 1)) for i in np.nonzero(a == 87)[0]]))

# the above gives you a set with all the indices, with only unique values.
# one remaining problem is that it could still have values < 0 or >= a.size, so
# you'd now want to wrap it all in another list comprehension to get rid of
# any values < 0 or >= a.size
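Putting that final filtering step into code (continuing with the same example array and delta) might look like:

```python
from functools import reduce
import operator
import numpy as np

a = np.array([5, 10, 15, 12, 88, 86, 5, 87, 1, 2, 3, 87, 1, 2, 3])
look_for = 87
delta = 2

# unique candidate indices around every occurrence of look_for
raw = set(reduce(operator.concat,
                 [list(range(i - delta, i + delta + 1))
                  for i in np.nonzero(a == look_for)[0]]))
# keep only indices that actually exist in the array
indices = sorted(i for i in raw if 0 <= i < a.size)
```

With occurrences at positions 7 and 11, the two windows overlap at index 9, and the set keeps that index only once.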

Faster way of searching than nested loops in python

I'm trying to find the optimal order for least expected cost for an array.
The input is:
input = [[390, 185, 624], [686, 351, 947], [276, 1023, 1024], [199, 148, 250]]
This is an array of four choices, the first number being a cost and the second two being the probability of getting the result, the first ([i][1]) of which is the numerator and the second ([i][2]) is the denominator.
The goal is to find the optimal order of these value/probability pairs that will provide the result at the least total cost.
def answer(input):
    from itertools import permutations
    length = len(input)
    best_total = 999
    for combination in permutations(input):
        # print combination
        total = 0.0
        for i in range(0, length):
            current_value = 1.0
            for j in range(0, i):
                current_value = current_value * (1.0 -
                    (float(combination[j][1]) / float(combination[j][2])))
            total = total + (float(combination[i][0]) * current_value)
            if total > best_total:
                i = length
        # print total
        if total <= best_total:
            best_total = total
            best_combination = combination
    answer = map(input.index, best_combination)
    return answer
Running:
print answer(input)
should return
[2, 3, 0, 1]
for the given input.
This is obviously an exhaustive search, which becomes very slow very quickly with more than four choices. I've considered binary search trees as the input for those is very similar, however I can't figure out how to implement it.
I've been working on this for four days and can't seem to come up with a fast version that works for any input (assuming positive costs and probabilities).
This isn't for homework or anything, just a puzzle I've been trying to figure out.
I would determine the value of each case in the original array, store these values, and then sort the list. (This is Python 3, so I don't know if that affects you.) By an adjacent-exchange argument, putting choice i before choice j lowers the expected cost exactly when c_i / p_i < c_j / p_j, so the optimal order sorts by cost divided by success probability, ascending.
Determining the value of each case in the original array and storing it:
inputA = [[390, 185, 624], [686, 351, 947], [276, 1023, 1024], [199, 148, 250]]
results = []
for idx, val in enumerate(inputA):
    results.append((val[0] * val[2] / val[1], idx))  # cost / (num/den) = cost * den / num
Sorting the list (ascending), extracting positions:
l = lambda t: t[1]
print(list(map(l, sorted(results))))  # [2, 3, 0, 1]
Iterating over the list is O(n), and the sort is O(n log n). Map/list/print iterates over it again for O(n), so overall performance is O(n log n).
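Sorting by cost divided by success probability (ascending) can be cross-checked against the exhaustive search for the example input; this self-contained sketch does exactly that, with illustrative names:

```python
from itertools import permutations

def expected_cost(order, items):
    """Expected total cost when trying choices in the given order."""
    total, survive = 0.0, 1.0
    for cost, num, den in (items[i] for i in order):
        total += cost * survive      # this cost is paid only if nothing earlier succeeded
        survive *= 1.0 - num / den   # probability we still need the next choice
    return total

items = [[390, 185, 624], [686, 351, 947], [276, 1023, 1024], [199, 148, 250]]
# greedy order: ascending cost / (num/den), i.e. cost * den / num
greedy = sorted(range(len(items)), key=lambda i: items[i][0] * items[i][2] / items[i][1])
# exhaustive order: minimum expected cost over all permutations
best = min(permutations(range(len(items))), key=lambda p: expected_cost(p, items))
```

The greedy sort runs in O(n log n), replacing the O(n!) permutation scan.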
