I'm working in Python and I would like to insert a string into another string at a random location, but with the random location drawn from a probability distribution that favors certain positions over others. Specifically, I want insertions to land toward the beginning of the original string more often than toward the end.
For example, suppose the insertion string is "I go here" and the original string is "this is a test string and it can be long." I want to insert the insertion string at a random location in the original string, but if I perform the insertion, say, 100 times, I want "I go here this is a test string and it can be long." to come up more often than "this is a test string and it can be long. I go here". I also want to be able to tune the probability distribution.
Any help is appreciated.
You can use the random.gauss() function.
It returns a random number drawn from a Gaussian distribution.
The function takes two parameters, mean and sigma: mean is the expected mean of the outputs, and sigma is the standard deviation, i.e. how far the values tend to fall from the mean.
Try something like this:
import random

original_str_len = 7
mean = 0
sigma = original_str_len  # just an example; a smaller sigma favors the start more strongly
r = random.gauss(mean, sigma)  # random number, can be negative
r = abs(r)
insertion_index = min(int(r), original_str_len)  # clamp so the index is always valid
Note that this code is not perfect, but the general idea should work.
I recommend reading more about the Gaussian distribution.
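Building on the snippet above, a fuller sketch that actually performs the insertion (insert_biased and sigma_scale are illustrative names, not standard functions; a smaller sigma_scale pushes insertions harder toward the start):

```python
import random

def insert_biased(original, insertion, sigma_scale=0.5):
    """Insert `insertion` into `original` at a Gaussian-biased index.

    The index is |gauss(0, sigma)| clamped to the string length, so
    positions near the start are the most likely. sigma_scale tunes
    the spread: small values cluster insertions near the beginning,
    large values make the position nearly uniform.
    """
    sigma = sigma_scale * len(original)
    index = min(int(abs(random.gauss(0, sigma))), len(original))
    return original[:index] + insertion + original[index:]

random.seed(0)  # reproducible demo
print(insert_biased("this is a test string and it can be long.", "I go here "))
```

Run many times, the histogram of insertion points is a folded Gaussian peaked at index 0, which gives the tunable front-loaded bias described above.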
For a research project, we are currently using the Gambler's Ruin algorithm to produce random postfix expressions (terms) over the term variables "x", "y", and "z" and an operator "*", as shown in the method below:
from random import choice, uniform

def gamblers_ruin_algorithm(prob=0.3,
                            min_term_length=1,
                            max_term_length=None):
    """
    Generate a random term using the gambler's ruin algorithm

    :type prob: float
    :param prob: Probability of growing the size of a random term

    :type max_term_length: int
    :param max_term_length: Maximum length of the generated term
    """
    term_variables = ["x", "y", "z"]
    substitutions = ("EE*", "I")
    term = "E"
    term_length = 0

    # randomly build a term
    while "E" in term:
        rand = uniform(0, 1)
        if rand < prob or term_length < min_term_length:
            index = 0
            term_length += 1
        else:
            index = 1
        if (max_term_length is not None and
                term_length >= max_term_length):
            term = term.replace("E", "I")
            break
        term = term.replace("E", substitutions[index], 1)

    # randomly replace operands
    while "I" in term:
        term = term.replace("I", choice(term_variables), 1)

    return term
This method produces random postfix terms like the following:
xyz*xx***zyz*zzx*zzyy***x**x****x**
The issue with this method is that when run thousands of times, it tends to frequently produce duplicate expressions.
Is there a different algorithm for producing random postfix expressions of an arbitrary length that minimizes the probability of producing the same expression more than once?
Your basic problem is not the algorithm; it's the way you force the minimum size of the resulting expression. That procedure introduces an important bias into the generation of the first min_term_length operators. If the expression manages to grow further, that bias will slowly decrease, but it will never disappear.
Until the expression reaches the minimum length, you replace the first E with EE*. So the first few expressions are always:
E
EE*
EE*E*
EE*E*E*
...
When the minimum length is reached, the function starts replacing E with I with probability 1-prob, which is 70% using the default argument. If this succeeds for all the Es, the function will return a tree with the above shape.
Suppose that min_term_length is 5. Once the minimum is reached, the expression is EE*E*E*E*E*, which contains six Es, and the probability of six successive tests all choosing not to extend the expression is 0.7^6, or about 11.8%. In that case the expression becomes II*I*I*I*I*, and the six Is will be randomly replaced by a variable name. There are three variables, making a total of 3^6 = 729 different postfix expressions. If you take a large number of samples, the fact that more than a tenth of them fall into just 729 possible expressions will certainly create lots of duplicates. That's unnecessary, because there are actually 42 possible shapes of a postfix expression with five operators (the fifth Catalan number), and hence 42 × 729 = 30618 possible postfix expressions. If all of those could be produced, you'd expect less than one duplicate in a hundred thousand samples.
Note that the bias introduced by forcing a particular replacement for the first min terms will continue to show up for longer strings as well. For example, if the algorithm happens to expand the string exactly once after the forced phase, it will choose one of only six shapes, although there are 132 possibilities for six operators. So you can expect duplicates at that size as well, although somewhat fewer.
Instead of forcing a choice when the string is still short, you should let the algorithm just continue until the gambler is ruined or the maximum length occurs. If the gambler is ruined too soon, throw out that sample and start over. That will slow things down a bit, but it's still quite practical. If tossing out so many possibilities annoys you, you could instead pre-generate all possible patterns of the minimum length -- as noted above, if the minimum length is six operators, then that's 132 shapes, which are easy to enumerate -- and select one of those at random as the starting point.
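A sketch of that rejection approach, keeping the question's E/I grammar (unbiased_term and its retry loop are my illustration, not code from the question):

```python
from random import choice, uniform

def unbiased_term(prob=0.3, min_term_length=1, max_term_length=50):
    """Run the gambler's-ruin expansion with no forced growth; if the
    gambler is ruined before min_term_length operators, discard the
    sample and start over."""
    while True:
        term = "E"
        term_length = 0  # number of operators so far
        while "E" in term:
            if term_length >= max_term_length:
                term = term.replace("E", "I")
                break
            if uniform(0, 1) < prob:
                term = term.replace("E", "EE*", 1)  # grow the term
                term_length += 1
            else:
                term = term.replace("E", "I", 1)    # settle an operand
        if term_length >= min_term_length:
            break  # accepted: long enough, with no forced bias
        # too short: throw this sample out and redraw
    while "I" in term:
        term = term.replace("I", choice("xyz"), 1)
    return term

print(unbiased_term(min_term_length=3))
```

With the default prob=0.3 most draws die early, so the retry loop does some extra work, but every accepted sample is drawn from the full, unbiased distribution over shapes.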
You have four digits: x, y, z, and * such that:
x = 1
y = 2
z = 3
* = 4
So any expression can be expressed as a number using those digits. For example, the postfix expression xy*z* is 12434. And every such number maps to a unique expression.
With this technique, you can map each expression to a unique 32 bit or 64 bit number. And there are many good techniques for generating unique random numbers. See, for example, https://stackoverflow.com/a/34420445/56778.
So:
1. Generate a bunch of unique random numbers.
2. For each random number:
3. Convert it to that modified base 5 number.
4. Generate the expression from that number.
You can of course combine the 3rd and 4th steps. That is, instead of generating a '1', generate an 'x'.
There will of course be some limit on the length of the expressions. Each digit requires two bits to represent, so the maximum length of an expression from a 32 bit number will be 16 characters. You can extend to longer expressions easily enough by generating 64 bit random numbers. Or 128. Or whatever you like. The basic algorithm remains the same.
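One concrete sketch of the encoding (function names are illustrative; here the digit string is read as a base-5 value, and a decoded random number whose base-5 digits include 0, or whose string is not a well-formed postfix expression, would simply be skipped):

```python
DIGIT = {"x": 1, "y": 2, "z": 3, "*": 4}
CHAR = {v: c for c, v in DIGIT.items()}

def expression_to_number(expr):
    """Read the expression as a base-5 numeral with digits 1-4."""
    n = 0
    for ch in expr:
        n = n * 5 + DIGIT[ch]
    return n

def number_to_expression(n):
    """Invert expression_to_number. Raises if a base-5 digit is 0,
    since no character maps to it; such random draws must be skipped."""
    chars = []
    while n:
        n, d = divmod(n, 5)
        if d == 0:
            raise ValueError("digit 0: no corresponding character")
        chars.append(CHAR[d])
    return "".join(reversed(chars))

print(expression_to_number("xy*z*"))
```

Because the mapping is one-to-one, feeding it unique random numbers yields unique expressions.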
I want to evaluate the following derivative using the values in the list y, but I can't figure out what I am doing wrong. I am supposed to get 1.9256 for y=5, 1.9956 for y=10, and 2.1356 for y=20. What I'm trying to do is ask the user to input an equation with x as its variable, differentiate this equation with respect to x, ask the user to input as many values as he wants, and evaluate the expression using these inputted values. Thanks for your help.
import sympy as sym
print('Sensitivity: ')
#exp = input('Enter the expression to find the sensitivity: ')
exp='(0.007*(x**2))+(1.8556*x)-1.8307'
#values=list(map(float,input('Enter the values at which you want to compute the sensitivity seperated by spaces: ').split()))
values=[5,10,20]
x=sym.Symbol('x')
differential=str(sym.diff(exp,x))
print(differential)
for i in y:
    expression=differential.replace(str(x),str(values))
    value=eval(expression)
    print('The sensitivity at',i,'is: ',value)
What I believe you intended to write is:
import sympy as sym
exp='(0.007*(x**2))+(1.8556*x)-1.8307'
values=[5,10,20]
x=sym.Symbol('x')
differential=str(sym.diff(exp,x))
print(differential)
for value in values:
    expression=differential.replace(str(x),str(value))
    result=eval(expression)
    print('The sensitivity at',value,'is: ',result)
...which emits as output:
The sensitivity at 5 is: 1.9256
The sensitivity at 10 is: 1.9956
The sensitivity at 20 is: 2.1356
Note the changes:
We're iterating for value in values -- values exists, y does not.
We're assigning the eval result to a separate variable (in this case, result)
This still is not by any means good code. Good code would not do string substitution to substitute values into code. Substituting repr(value) instead of str(value) would be at least somewhat less broken.
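For illustration, an eval-free version that substitutes with sympy's subs instead of editing strings; a minimal sketch using the question's expression:

```python
import sympy as sym

x = sym.Symbol('x')
expr = sym.sympify('(0.007*(x**2))+(1.8556*x)-1.8307')
derivative = sym.diff(expr, x)  # 0.014*x + 1.8556

for v in [5, 10, 20]:
    # subs plugs the value into the symbolic expression directly,
    # so no string replacement or eval is needed
    sensitivity = float(derivative.subs(x, v))
    print('The sensitivity at', v, 'is:', sensitivity)
```

This keeps the whole computation inside sympy, so malformed or malicious input can never be executed as Python code.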
Guys, thanks for having me; I've got a question already.
What I want to do is get the sum of the list, without a for loop, after splitting the given text on math symbols. For example, the text (1+3) * 3 should respect the usual math operator priority for the calculation.
My first question is how to add, subtract, multiply, or divide using the list, and then how to check the priority and evaluate it first.
# my calc
a = input()
result = a.split('+')
print(sum(result))
sol1: split on the brackets and do mul/div earlier and sum/sub later, but I know that split is not the best way!
sol2: make a tree. I don't know what that is, lol, but I'll keep it in mind.
It has been answered here, I know, but with no split:
Calculator in python
You could use eval (but be aware that it is usually a bad practice, see this answer):
result = eval(input())
If you input a string like (3-8)*4+5/2, the result will be automatically computed using normal priorities: -17.5.
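If you want normal operator priority without eval's risks, one common sketch walks the parsed syntax tree with a whitelist of operators; the parser itself handles the brackets and precedence (safe_eval is an illustrative name, not a standard function):

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_eval(text):
    """Evaluate a numeric expression with + - * / and parentheses,
    refusing names, calls, and everything else."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(text, mode="eval").body)

print(safe_eval("(1+3) * 3"))  # 12
```

Unlike eval, this raises ValueError on input like __import__('os'), so it is safe to point at user input.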
I am trying to find out how to calculate the percentage of my data that falls into different number ranges. My data looks like this:
0.81761
0.255319
0.359551
0.210191
0.374046
0.188406
0.179487
0.265152
0.207792
0.202614
0.150943
..and I have these ranges:
0-0.3
0.3-0.7
0.7-1
I want to know what percentage of my data falls into each specific number range. So, for example:
0-0.3 -> 72.7%
0.3-0.7 -> 18.18%
0.7-1 -> 9.09%
Does anybody know how to do this calculation?
If you are using COUNTIF and COUNTIFS, it is easier to refer to each "bin" individually, with one formula per bin, e.g. something like =COUNTIFS(A:A,">=0",A:A,"<0.3")/COUNT(A:A).
If you want to refer to your entire range of "bins" in the same formula, then you can use the FREQUENCY function (at least in Excel), but the formula needs to be entered as an array formula over the results range.
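If you are working in Python rather than a spreadsheet, a small sketch of the same binning (bin_percentages is an illustrative helper; each bin is half-open except the last, which includes its upper edge):

```python
def bin_percentages(data, edges):
    """Return the percentage of values in each [edges[i], edges[i+1]) bin."""
    counts = [0] * (len(edges) - 1)
    for v in data:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    return [100 * c / len(data) for c in counts]

data = [0.81761, 0.255319, 0.359551, 0.210191, 0.374046, 0.188406,
        0.179487, 0.265152, 0.207792, 0.202614, 0.150943]
print(bin_percentages(data, [0, 0.3, 0.7, 1]))
```

On the question's eleven values this gives roughly 72.7%, 18.18%, and 9.09% for the three ranges, matching the expected output.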
My question is rather complicated for me to explain, as I'm not really good at maths, but I'll try to be as clear as possible.
I'm trying to code a cluster in Python, which will generate words given a charset (e.g. with lowercase letters: aaaa, aaab, aaac, ..., zzzz) and perform various operations on them.
I'm trying to work out how to calculate, given the charset and the number of nodes, what range each node should work on (e.g. node1: aaaa-azzz, node2: baaa-czzz, node3: daaa-ezzz, ...). Is it possible to devise an algorithm that computes this, and if so, how could I implement it in Python?
I really don't know how to do that, so any help would be much appreciated.
Any way that you could compute a small integer from the string would be fine for clustering. For example, compute a hash with md5, and look at a byte of it:
import hashlib

s = "aaac"
num_nodes = 5  # or whatever
m = hashlib.md5(s.encode())       # hash the string's bytes
node = m.digest()[0] % num_nodes  # first byte of the digest, mod the node count
print(node)
This won't guarantee to evenly distribute all the strings, but it will be close.
You should be able to treat your words as numerals in a strange base. For example, say you have a..z as your charset (26 characters), 4-character strings, and you want to distribute them equally among 10 machines. Then there are a total of 26^4 strings, so each machine gets 26^4/10 of them. The first machine gets strings 0 through 26^4/10, the next 26^4/10 through 26^4/5, etc.
To convert the numbers to strings, just write the number in base 26 using your charset as the digits. So 0 is 'aaaa' and 26^4/10 = 2*26^3 + 15*26^2 + 15*26 + 15 is 'cppp'.
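The scheme above can be sketched in Python (int_to_word and node_ranges are illustrative names; 'a' plays the role of digit 0):

```python
CHARSET = "abcdefghijklmnopqrstuvwxyz"

def int_to_word(n, charset=CHARSET, length=4):
    """Write n in base len(charset), padded to `length` characters."""
    base = len(charset)
    chars = []
    for _ in range(length):
        n, d = divmod(n, base)
        chars.append(charset[d])
    return "".join(reversed(chars))

def node_ranges(num_nodes, charset=CHARSET, length=4):
    """Split the full word space into num_nodes contiguous ranges."""
    total = len(charset) ** length
    step = total // num_nodes
    ranges = []
    for i in range(num_nodes):
        lo = i * step
        # the last node absorbs any remainder from the integer division
        hi = total - 1 if i == num_nodes - 1 else (i + 1) * step - 1
        ranges.append((int_to_word(lo, charset, length),
                       int_to_word(hi, charset, length)))
    return ranges

print(node_ranges(10))
```

Unlike the hashing approach, this gives each node a contiguous alphabetical range, which matches the node1: aaaa-azzz style of partition the question asks for.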