I generate randomized float values in nested loops: the inner loop produces 67 values, and the outer loop runs 64 times. At the end of each outer iteration I want to store those 67 values as one array.
As an example, if I had 4 values per loop and 3 total loops of integers, I would like it to be like this:
values = [[0, 4, 5, 1],[6, 6, 5, 3],[0,0,0,7]]
so that I can identify them as separate arrays. I know how to return the values, but I am unsure of the best way to append them after they are created. Forgive me, as I am unskilled with the logic.
import math
import random
funcs = []
coord = []
pi = math.pi
funcAmt = 0
coordAmt = 0
repeatAmt = 0
coordPass = 0
while funcAmt < 64:
    while coordAmt < 67:
        coordAmt += 1
        uniform = round(random.uniform(-pi, pi), 2)
        print("Coord [",coordAmt,"] {",uniform,"} Func:", funcAmt + 1)
        if uniform in coord:
            repeatAmt += 1
            print("Repeat Found!")
            coordAmt -= 1
            print("Repeat [",repeatAmt,"] Resolved")
            pass
        else:
            coordPass += 1
            coord.append(uniform)
    #<<<Append Here>>>
    funcAmt += 1
    coord.clear()
    coordAmt = 0
In my given code above, it would be similar to:
func = [
[<67 items>],
...63 more times
]
Your "append here" logic should append the coordinate list and then clear that list for the next iteration of the outer loop:
funcs.append(coord[:]) # The slice notation makes a copy of the list
coord.clear() # or simply coord = []
You should learn to use a for loop. This will simplify your looping: you don't have to maintain the counts yourself. For instance:
for funcAmt in range(64):
    for coordAmt in range(67):
        ...
You might also look up how to make a "list comprehension", which can reduce your process to a single line of code -- a long, involved line, but readable with proper white space.
Does that get you moving?
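For what it is worth, the list-comprehension idea could look something like the sketch below. It assumes that drawing from the grid of two-decimal values between -3.14 and 3.14 is an acceptable stand-in for rounding random.uniform(-pi, pi); the names grid and make_funcs are made up for illustration. random.sample picks without replacement, so each inner list is free of duplicates without a separate repeat check.

```python
import random

# Every two-decimal value from -3.14 to 3.14 (assumption: this grid is an
# acceptable replacement for round(random.uniform(-pi, pi), 2)).
grid = [k / 100 for k in range(-314, 315)]

def make_funcs(n_funcs=64, n_coords=67):
    # One inner list per outer "loop"; sample() picks without replacement,
    # so no value repeats inside an inner list.
    return [random.sample(grid, n_coords) for _ in range(n_funcs)]

funcs = make_funcs()
print(len(funcs), len(funcs[0]))
```

The result is a list of 64 lists with 67 distinct values each, which is the nested shape shown in the question.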
There are a couple of ways around this. Instead of using while loops and counters, you could just use for loops. Or at least do that for the outer loop, since it looks like you still want to check for repeats. Here's an example using the dimensions from your small example, 3 and 4:
from math import pi
import random
coord_sets = 3
coords = 4
biglist = []
for i in range(coord_sets):
    coords_set = []
    non_repeating_coords = 0
    while non_repeating_coords < coords:
        new_coord = round(random.uniform(-1.0*pi, pi), 2)
        if new_coord not in coords_set:
            coords_set.append(new_coord)
            non_repeating_coords += 1
    biglist.append(coords_set)
print(biglist)
You can use sets because they don't allow duplicate values:
from math import pi
import random
funcs = []
funcAmt = 0
while funcAmt < 64: # This is the number of loops
    myset = set()
    while len(myset) < 67: # This is the length of each set
        uniform = round(random.uniform(-pi, pi), 2)
        myset.add(uniform)
    funcs.append(list(myset)) # Append randomly generated set as a list
    funcAmt += 1
print(funcs)
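Maybe you can benefit from arrays in numpy:
import numpy as np
funcs = np.random.uniform(-np.pi, np.pi, [64, 67])
This creates an array of shape (64, 67), one row per loop, with values drawn uniformly between -pi and pi. Note that the values are not rounded and repeats within a row are not checked.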
I'm not sure if I am even asking this question the right way, but here goes:
Say I want to create a Python list with 20 non-zero integer elements, and those elements must sum to 87.
How can I go about this so that the integers chosen minimize the standard deviation of the list as a whole (I'm not sure this is the right metric)?
The following code example works, but I'm thinking there must be a better way to do this:
import pandas as pd
import numpy as np
target = 87
target_length = 20
starter_series = pd.Series([1 for val in range(target_length)])
while True:
    current_sum = starter_series.sum()
    if current_sum==target:
        break
    if target - current_sum > 20:
        starter_series += 1
        continue
    else:
        to_be_added = target - current_sum
        index_points = np.random.choice(starter_series.index.to_list(), to_be_added, replace=False)
        starter_series.loc[index_points] += 1
This simple code should work:
n = 20
s = 87
q,r = divmod(s,n)
l = [q+1]*r + [q]*(n-r)
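Every element produced this way is either q or q+1, so no two elements differ by more than 1, which is what keeps the spread as small as possible for a fixed sum. A small sketch wrapping the same divmod idea in a function; the name even_split is hypothetical, and it assumes total >= parts so that no element is zero:

```python
def even_split(total, parts):
    # q is the base value; r of the elements get one extra so the sum is exact.
    q, r = divmod(total, parts)
    return [q + 1] * r + [q] * (parts - r)

result = even_split(87, 20)
print(result)
print(sum(result), max(result) - min(result))
```

If the order matters, the result can be passed through random.shuffle afterwards without changing the sum or the spread.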
tmp = []
while len(tmp) == 0:
    for i in range(0,385):
        # Multiplying by 100 in order to remove the decimal point
        if randint(0,10000) < chance*100:
            tmp.append(i)
return tmp
This is the code I'm currently using to help clear things up. The output will be something like this.
[34, 234, 243, 321]
My current solution is very inefficient. I tried this:
sample(range(0,385), math.ceil(chance*3.85))
But it doesn't produce the same effect. Also if you could tell me the name of what this is called that would be great.
import numpy as np
n = 385
chance = 0.015 # Chance of 1.5%
main_list = np.arange(n) # Generate initial list
rnd = np.random.uniform(size=main_list.shape) # Generate random number between 0 and 1
sublist = main_list[rnd < chance] # Select numbers
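Picking each element independently with a fixed probability is usually called Bernoulli sampling. Below is a pure-Python sketch of the same idea for cases where numpy is not available. The function name bernoulli_sample is made up, and the retry loop is an assumption that mirrors the "keep trying until the result is non-empty" behaviour of the question's code.

```python
import random

def bernoulli_sample(n, chance, rng=random):
    # Keep each index in range(n) independently with probability `chance`.
    if n < 1 or not 0 < chance <= 1:
        raise ValueError("need n >= 1 and 0 < chance <= 1")
    picked = []
    while not picked:  # retry until at least one index is selected
        picked = [i for i in range(n) if rng.random() < chance]
    return picked

print(bernoulli_sample(385, 0.015))
```

Unlike sample(range(0,385), k), the length of the result varies from call to call, which is why the two approaches do not produce the same effect.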
This is a two-part question. I have to select two indexes by drawing random integers and finding which range of a list each one falls into. The two selections must not both come from the same range.
Selection1 = random.randint(0,100)
Selection2 = random.randint(0,100)
For the sake of this argument, say:
Selection1 = 10
Selection2 = 17
And the list would be like so: [25, 50, 75, 100]
Both would return index 0 because they fall between 0 and 25.
So both fall into the first range. My problem is working out how to test a value against a range (i.e. 0-25) so that it returns the matching index (return list[0]).
What is the syntax for this type of logic in Python?
I'm sure I can figure out how to return different indexes if they fall in the same range, probably by just restarting the loop, but any advice on that wouldn't hurt.
I'll give the code I'm working with right now as a guideline. The part I'm struggling with is mostly at the bottom.
Code Here
def roulette_selection(decimal_list, chromosome_fitness, population):
    percentages = []
    for i in range(population):
        result = decimal_list[i]/chromosome_fitness
        result = result * 100
        percentages.append(result)
    print(percentages)
    range_in_fitness = []
    current_percent = 0
    for i in range(population):
        current_percent = percentages[i] + current_percent
        range_in_fitness.append(current_percent)
    parent1 = random.randint(0, 100)
    parent2 = random.randint(0, 100)
    for i in range(population):
        if parent1 >= range_in_fitness[i] and parent1<=range_in_fitness[i+1]:
            pass  # unfinished: the range lookup is the missing piece
    print(parent1, parent2)
    print(range_in_fitness)
If your list of ranges is sorted, or it is acceptable to sort it, and is contiguous (no gaps), you can use Python's bisect module to do this in an efficient manner. Example:
>>> l = [25, 50, 75, 100]
>>> import bisect
>>> bisect.bisect(l, 10)
0
>>> bisect.bisect(l, 17)
0
>>> bisect.bisect(l, 55)
2
>>> bisect.bisect(l, 25)
1
bisect.bisect returns the index at which the input number would have to be inserted to keep the list sorted. This is a little confusing to think about at first: in the case of 55 above, it returns 2 because 55 would be inserted at index 2, as it falls between the current values at indices 1 and 2. If you give it a number exactly on a range boundary, it will 'fall to the right', as evidenced by the bisect(l, 25) example.
The linked documentation includes a set of recipes for searching through sorted lists using bisect.
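Applied to the roulette selection in the question, a sketch might look like the following. It assumes the fitness values are positive numbers and that there are at least two of them; pick_two_parents is a hypothetical name, and spinning again until the second index differs is just one way to avoid returning the same range twice.

```python
import bisect
import itertools
import random

def pick_two_parents(fitnesses, rng=random):
    if len(fitnesses) < 2:
        raise ValueError("need at least two candidates")
    # Running totals play the role of the range boundaries, e.g.
    # [25, 25, 25, 25] -> [25, 50, 75, 100].
    bounds = list(itertools.accumulate(fitnesses))
    total = bounds[-1]

    def spin():
        # Draw a point in [0, total) and look up which range it falls in.
        # The min() guards against float rounding at the upper edge.
        point = rng.random() * total
        return min(bisect.bisect(bounds, point), len(bounds) - 1)

    first = spin()
    second = spin()
    while second == first:  # spin again if both land in the same range
        second = spin()
    return first, second

print(pick_two_parents([25, 25, 25, 25]))
```

Because the boundaries are cumulative sums, a candidate with a larger fitness owns a wider range and is proportionally more likely to be picked.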
Given an input val and a list of range delimiters delims, here are two approaches:
# Both methods require range_delims to be sorted
range_delims = [25,50,75,100]
# Simple way
def find_range1(val, delims):
    for i,d in enumerate(delims):
        if val < d: return i
print(find_range1(10, range_delims)) # 0
print(find_range1(17, range_delims)) # 0
print(find_range1(32, range_delims)) # 1
print(find_range1(64, range_delims)) # 2
print(find_range1(96, range_delims)) # 3
print(find_range1(101, range_delims)) # None
# More explicit, possibly unnecessarily so
def find_range2(val, delims):
    lbl = [float('-inf')] + delims
    ubl = delims + [float('inf')]
    for (i,(lb,ub)) in enumerate(zip(lbl, ubl)):
        if lb <= val < ub: return i
print(find_range2(10, range_delims)) # 0
print(find_range2(17, range_delims)) # 0
print(find_range2(32, range_delims)) # 1
print(find_range2(64, range_delims)) # 2
print(find_range2(96, range_delims)) # 3
print(find_range2(101, range_delims)) # 4
The first just compares val to the elements of delims and when it finds that val is less than the element, returns the index of that element.
The second is a little more verbose, generating both upper and lower bounds, and ensuring that val is between them. For interior elements of delims the bounds are list elements, for the 2 exterior elements of delims, the bounds are the element and either + or - infinity.
Note: Both approaches require the input list of delimiters to be sorted. There are ways to deal with different delimiter list formats, but it looks like you have a sorted list of delimiters (or could sort it without issue).
Instead of a complete shuffle, I am looking for a partial shuffle function in python.
Example : "string" must give rise to "stnrig", but not "nrsgit"
It would be better if I can define a specific "percentage" of characters that have to be rearranged.
Purpose is to test string comparison algorithms. I want to determine the "percentage of shuffle" beyond which an(my) algorithm will mark two (shuffled) strings as completely different.
Update :
Here is my code. Improvements are welcome !
import random
percent_to_shuffle = int(input("Give the percent value to shuffle : "))
to_shuffle = list(input("Give the string to be shuffled : "))
num_of_chars_to_shuffle = int((len(to_shuffle)*percent_to_shuffle)/100)
for i in range(0,num_of_chars_to_shuffle):
    x=random.randint(0,(len(to_shuffle)-1))
    y=random.randint(0,(len(to_shuffle)-1))
    z=to_shuffle[x]
    to_shuffle[x]=to_shuffle[y]
    to_shuffle[y]=z
print(''.join(to_shuffle))
This problem is simpler than it looks, and, as usual, the language has the right tools so that nothing stands between you and the idea:
import random
def pashuffle(string, perc=10):
    data = list(string)
    for index, letter in enumerate(data):
        if random.randrange(0, 100) < perc/2:
            new_index = random.randrange(0, len(data))
            data[index], data[new_index] = data[new_index], data[index]
    return "".join(data)
Your problem is tricky, because there are some edge cases to think about:
Strings with repeated characters (i.e. how would you shuffle "aaaab"?)
How do you measure chained character swaps or rearranged blocks?
In any case, the metric defined to shuffle strings up to a certain percentage is likely to be the same you are using in your algorithm to see how close they are.
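One simple candidate for such a metric is the fraction of positions whose character changed. The sketch below (changed_fraction is a made-up name, and it assumes both strings have the same length) also shows the repeated-character caveat: swapping two identical characters changes nothing that the metric can see.

```python
def changed_fraction(original, shuffled):
    # Fraction of positions where the two equal-length strings differ.
    if len(original) != len(shuffled):
        raise ValueError("strings must have the same length")
    if not original:
        return 0.0
    differing = sum(a != b for a, b in zip(original, shuffled))
    return differing / len(original)

print(changed_fraction("string", "stnrig"))  # 3 of 6 positions differ
print(changed_fraction("aaaab", "aaaab"))    # swapping two a's is invisible
```

Whether this is the right measure depends on the comparison algorithm under test; an edit-distance based comparison would judge the same pair of strings differently.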
My code to shuffle n characters:
import random
def shuffle_n(s, n):
    idx = list(range(len(s)))  # a list, so that random.shuffle can reorder it
    random.shuffle(idx)
    idx = idx[:n]
    mapping = dict((idx[i], idx[i-1]) for i in range(n))
    return ''.join(s[mapping.get(x,x)] for x in range(len(s)))
Basically chooses n positions to swap at random, and then exchanges each of them with the next in the list... This way it ensures that no inverse swaps are generated and exactly n characters are swapped (if there are characters repeated, bad luck).
Explained run with 'string', 3 as input:
idx is [0, 1, 2, 3, 4, 5]
we shuffle it, now it is [5, 3, 1, 4, 0, 2]
we take just the first 3 elements, now it is [5, 3, 1]
those are the characters that we are going to swap
s t r i n g
  ^   ^   ^
t (1) will be i (3)
i (3) will be g (5)
g (5) will be t (1)
the rest will remain unchanged
so we get 'sirgnt'
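The walkthrough can be replayed deterministically by fixing the chosen positions instead of shuffling them. In this sketch, apply_cycle is a hypothetical helper that contains only the mapping step of shuffle_n:

```python
def apply_cycle(s, idx):
    # Position idx[i] receives the character from position idx[i-1];
    # for i == 0, idx[-1] wraps around, which closes the cycle.
    mapping = {idx[i]: idx[i - 1] for i in range(len(idx))}
    return ''.join(s[mapping.get(pos, pos)] for pos in range(len(s)))

print(apply_cycle('string', [5, 3, 1]))  # sirgnt
```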
The bad thing about this method is that it does not generate all the possible variations, for example, it could not make 'gnrits' from 'string'. This could be fixed by making partitions of the indices to be shuffled, like this:
import random
def randparts(l):
    n = len(l)
    s = random.randint(0, n-1) + 1
    if s >= 2 and n - s >= 2: # the split makes two valid parts
        yield l[:s]
        for p in randparts(l[s:]):
            yield p
    else: # the split would make a single cycle
        yield l
def shuffle_n(s, n):
    idx = list(range(len(s)))  # a list, so that random.shuffle can reorder it
    random.shuffle(idx)
    # x must be bound by the outer clause before range(len(x)) is evaluated
    mapping = dict((x[i], x[i-1])
                   for x in randparts(idx[:n])
                   for i in range(len(x)))
    return ''.join(s[mapping.get(x,x)] for x in range(len(s)))
import random
def partial_shuffle(a, part=0.5):
    # which characters are to be shuffled:
    idx_todo = random.sample(range(len(a)), int(len(a) * part))
    # what are the new positions of these to-be-shuffled characters:
    idx_target = idx_todo[:]
    random.shuffle(idx_target)
    # map all "normal" character positions {0:0, 1:1, 2:2, ...}
    mapper = dict((i, i) for i in range(len(a)))
    # update with all shuffles in the string: {old_pos:new_pos, old_pos:new_pos, ...}
    mapper.update(zip(idx_todo, idx_target))
    # use mapper to modify the string:
    return ''.join(a[mapper[i]] for i in range(len(a)))
for i in range(5):
    print(partial_shuffle('abcdefghijklmnopqrstuvwxyz', 0.2))
prints
abcdefghljkvmnopqrstuxwiyz
ajcdefghitklmnopqrsbuvwxyz
abcdefhwijklmnopqrsguvtxyz
aecdubghijklmnopqrstwvfxyz
abjdefgcitklmnopqrshuvwxyz
Evil, and using a deprecated API (the cmp argument of sorted, which Python 3 removed):
import random
# adjust constant to taste
# 0 -> no effect, 0.5 -> completely shuffled, 1.0 -> reversed
# Of course this assumes your input is already sorted ;)
''.join(sorted(
'abcdefghijklmnopqrstuvwxyz',
cmp = lambda a, b: cmp(a, b) * (-1 if random.random() < 0.2 else 1)
))
maybe like so:
>>> s = 'string'
>>> shufflethis = list(s[2:])
>>> random.shuffle(shufflethis)
>>> s[:2]+''.join(shufflethis)
'stingr'
Taking from fortran's idea, i'm adding this to collection. It's pretty fast:
import random
def partial_shuffle(st, p=20):
    p = int(round(p/100.0*len(st)))
    idx = range(len(st))  # was len(s), an undefined name
    sample = random.sample(idx, p)
    res=str()
    samptrav = 1
    for i in range(len(st)):
        if i in sample:
            res += st[sample[-samptrav]]
            samptrav += 1
            continue
        res += st[i]
    return res
Is there an efficient way in Python to count how many values of an array fall within certain intervals? The number of intervals I will be using may get quite large.
For example:
mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
some_function(mylist, startpoints):
    # startpoints = [0,10,20]
    count values in range [0,9]
    count values in range [10,19]
    output = [10,9]
You will have to iterate over the list at least once.
The solution below works with any sequence and interval type that implements comparison (<, >, etc.) and uses the bisect algorithm to find the correct position among the intervals, so it is very fast.
It will work with floats, text, or whatever. Just pass a sequence and a list of the intervals.
from collections import defaultdict
from bisect import bisect_left
def count_intervals(sequence, intervals):
    count = defaultdict(int)
    intervals.sort()
    for item in sequence:
        pos = bisect_left(intervals, item)
        if pos == len(intervals):
            count[None] += 1
        else:
            count[intervals[pos]] += 1
    return count
data = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
print(count_intervals(data, [10, 20]))
Will print
defaultdict(<class 'int'>, {10: 10, 20: 9})
Meaning that you have 10 values <= 10 and 9 values that are > 10 and <= 20 (bisect_left puts a value equal to a boundary into that boundary's bucket).
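If you want the counts as a plain list keyed by start points, as in the question, bisect_right can map each value to the interval whose start it has passed. This is a sketch under the assumption that the last start point only marks the end of the final interval and that values outside all intervals are ignored; count_by_startpoints is a made-up name.

```python
from bisect import bisect_right

def count_by_startpoints(values, startpoints):
    starts = sorted(startpoints)
    counts = [0] * (len(starts) - 1)
    for v in values:
        # Index of the half-open interval [starts[i], starts[i+1]) holding v.
        i = bisect_right(starts, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
print(count_by_startpoints(mylist, [0, 10, 20]))  # [10, 9]
```

Each lookup is a binary search, so the cost grows with the logarithm of the number of intervals rather than with the number itself.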
I don't know how large your list will get but here's another approach.
import numpy as np
mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
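# bin edges match the start points; all bins are half-open except the last, which includes 20
counts, edges = np.histogram(mylist, bins=[0, 10, 20])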
You can also use a combination of value_counts() and pd.cut() to help you get the job done.
import pandas as pd
mylist = [4,4,1,18,2,15,6,14,2,16,2,17,12,3,12,4,15,5,17]
split_mylist = pd.cut(pd.Series(mylist), [0, 10, 20]).value_counts(sort=False)
print(split_mylist)
This piece of code will return this:
(0, 10] 10
(10, 20] 9
dtype: int64
Then you can use the tolist() method to get what you want:
split_mylist = split_mylist.tolist()
print(split_mylist)
Output: [10, 9]
If the numbers are integers, as in your example, representing the intervals as frozensets can perhaps be fastest (worth trying). I am not sure whether the intervals are guaranteed to be mutually exclusive; if they are not, then:
intervals = [frozenset(range(10)), frozenset(range(10, 20))]
counts = [0] * len(intervals)
for n in mylist:
    for i, inter in enumerate(intervals):
        if n in inter:
            counts[i] += 1
If the intervals are mutually exclusive, this code could be sped up a bit by breaking out of the inner loop right after the increment. However, for mutually exclusive intervals of integers >= 0, there's an even more attractive option: first, prepare an auxiliary index, e.g. given your startpoints data structure that could be
indices = [sum(i >= x for x in startpoints) - 1 for i in range(max(startpoints))]
and then
counts = [0] * len(intervals)
for n in mylist:
    if 0 <= n < len(indices):
        counts[indices[n]] += 1
This can be adjusted if the intervals can be < 0 (everything needs to be offset by -min(startpoints) in that case).
If the "numbers" can be arbitrary floats (or decimal.Decimals, etc), not just integer, the possibilities for optimization are more restricted. Is that the case...?