How do I generate test data for my Python script? - python

An equation takes values in the following form:
x = [0x02, 0x00]  # which is later converted internally, in the called function, to 0x300
y = [0x01, 0xFF]
z = [0x01, 0x0F]
How do I generate a series of test values for this function?
For instance, I want to send 100-odd values from a for loop:
for i in range(0, 300):
    # where a, b are derived from a range
    x = [a, b]
My question was a bit unclear, so let me clarify.
What I wanted to ask is how I can make x = [a, b] take different values of a and b.

Use generators:
def gen_xyz(max_iteration):
    for i in range(0, max_iteration):
        # code which will generate the next (x, y, z)
        yield (x, y, z)
for x, y, z in gen_xyz(1000):
    f(x, y, z)
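A concrete version of the generator, as a sketch: it assumes each argument is a two-element list of byte values (0x00 to 0xFF), like the [0x02, 0x00] examples in the question. The seed parameter is a hypothetical addition so a failing test run can be reproduced, and f is a hypothetical stand-in for the function under test.

```python
import random

def gen_xyz(max_iteration, seed=None):
    """Yield max_iteration (x, y, z) triples of random two-byte lists.

    Assumption: each argument is a list [high_byte, low_byte] with values 0..255.
    """
    rng = random.Random(seed)  # private RNG, so the same seed gives the same data

    def byte_pair():
        return [rng.randint(0x00, 0xFF), rng.randint(0x00, 0xFF)]

    for _ in range(max_iteration):
        yield byte_pair(), byte_pair(), byte_pair()

def f(x, y, z):
    # Hypothetical stand-in for the real function under test.
    print(x, y, z)

for x, y, z in gen_xyz(5, seed=42):
    f(x, y, z)
```

Using a generator means the test values are produced one at a time, so even a very large max_iteration does not build a big list in memory.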

The hex() function?
import random
for i in range(10):
    a1, a2 = random.randint(1, 100), random.randint(1, 100)
    x = [hex(a1), hex(a2)]
    print(x)
This outputs something similar to:
['0x21', '0x4f']
['0x59', '0x5c']
['0x61', '0x40']
['0x57', '0x45']
['0x1a', '0x11']
['0x4c', '0x49']
['0x40', '0x1b']
['0x1f', '0x7']
['0x8', '0x2b']
['0x1e', '0x13']
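Note that hex() returns strings such as '0x21', not numbers; writing 0x21 in source code simply produces the integer 33. If the function expects integers, a deterministic sweep over ranges of a and b may be more useful as test data than random values. A sketch with itertools.product, where the specific ranges are arbitrary placeholders:

```python
from itertools import product

# Arbitrary example ranges: a in 0x00..0x03, b stepping by 0x40 through 0x00..0xFF.
a_values = range(0x00, 0x04)
b_values = range(0x00, 0x100, 0x40)

# Every combination of a and b, as [a, b] lists of plain ints.
test_inputs = [[a, b] for a, b in product(a_values, b_values)]

for x in test_inputs:
    # Format as hex only for display; x itself still holds integers.
    print([hex(v) for v in x])
```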

Related

I have a simple question about the logic behind this Python code

I'm a beginner in Python, and I'm stuck on some function code.
def max_of_two(x, y):
    if x > y:
        return x
    return y
def max_of_three(x, y, z):
    return max_of_two(x, max_of_two(y, z))
print(max_of_three(30, 25, 50))
Can someone explain the logic behind putting the first function (max_of_two()) inside the parameters of the second function (max_of_three())? I've seen functions defined inside other functions, and that's not a problem, but I've never seen a function inside the parameters of another function... I'm very confused. I know what the code does: it returns the greatest of the numbers. I understood the first function perfectly, but the second one confused me...
x = 1
y = 2
z = 3
max_of_two(y, z)
# returns 3
max_of_two(x, max_of_two(y, z))
# is the same as
max_of_two(x, z)
# is the same as
max_of_two(x, 3)
The return value of the inner call is used as an argument to the outer call, because the inner call is evaluated first.
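One way to see this evaluation order is to record every call to max_of_two. The calls list is added only for illustration; otherwise the functions are the ones from the question.

```python
calls = []  # records the arguments of each max_of_two call, in order

def max_of_two(x, y):
    calls.append((x, y))
    if x > y:
        return x
    return y

def max_of_three(x, y, z):
    return max_of_two(x, max_of_two(y, z))

result = max_of_three(1, 2, 3)
print(calls)   # [(2, 3), (1, 3)]: the inner call runs first
print(result)  # 3
```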
This is not putting a function inside parameters. First, I recommend you understand parameters vs. arguments; here's a quote from "Parameter" vs "Argument":
Old post, but another way of saying it: argument is the value/variable/reference being passed in, parameter is the receiving variable used w/in the function/block
def max_of_three(x, y, z):
    return max_of_two(x, max_of_two(y, z))
For example, (x, y, z) are parameters of max_of_three, and (y, z) are arguments passed to max_of_two.
——————————————————————————————————————————
Then you should understand function calls. max_of_two(y, z) is an example of a function call: you call the function max_of_two, and the call evaluates to the return value corresponding to your arguments.
In this case, when you write:
max_of_two( x, max_of_two( y, z ) )
you first get the return value of max_of_two for (y, z), then pass x and that return value to another call of max_of_two, and finally return the new return value from max_of_three. This is equivalent to:
retval = max_of_two( y, z )
retval2 = max_of_two( x, retval )
return retval2
It's like a nested if in other languages. The second function takes three arguments and passes them to the first function, which compares them in pairs.
If you wanted to write max_of_three(x, y, z) as a single function, it would look like a succession of if statements with an intermediate variable:
def max_of_three(x, y, z):
    if x > y:
        temp = x
    else:
        temp = y
    if temp > z:
        result = temp
    else:
        result = z
    return result
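In practice, Python's built-in max() already accepts any number of arguments. The pairwise idea behind max_of_three also extends to any number of values with functools.reduce; max_of_many is a hypothetical name used only to illustrate the nesting:

```python
from functools import reduce

def max_of_two(x, y):
    if x > y:
        return x
    return y

def max_of_many(*values):
    # reduce(max_of_two, [a, b, c, d]) computes
    # max_of_two(max_of_two(max_of_two(a, b), c), d)
    return reduce(max_of_two, values)

print(max_of_many(30, 25, 50))  # 50
print(max(30, 25, 50))          # 50, using the built-in
```

Note that reduce nests from the left, while max_of_three nests from the right; for a maximum the result is the same either way.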

Test whole range of possible inputs for a given question

Is there a quick way to find the maximum value (a float) of a function, and the corresponding arguments x and y, where both are integers between 0 and 100 (inclusive)? Do I need to use an assert statement or something like that to get the range of all possible inputs?
def fun_A(x, y):
    import math
    if x == y:
        return 0
    first = math.cos((y % 75) * (math.pi / 180))
    second = math.sin((x % 30) * (math.pi / 180))
    return (first + second) / (abs(x - y))
For small problems like this it is probably fast enough to evaluate every possible combination and choose the maximum. The numpy library makes this easy to write and pretty fast as well:
import numpy as np
def fun_A(x, y):
    first = np.cos((y % 75) * (np.pi / 180))
    second = np.sin((x % 30) * (np.pi / 180))
    # np.where evaluates both branches, so silence the division-by-zero warning where x == y
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(x == y, 0, (first + second) / abs(x - y))
x, y = np.mgrid[0:101, 0:101]
f = fun_A(x, y)
maxindex = np.argmax(f)
print('Max =', f.flat[maxindex], ' at x =', x.flat[maxindex], 'y =', y.flat[maxindex])
Output:
Max = 1.4591796850315724 at x = 89 y = 88
Things to note:
I've just replaced calls to math with calls to np.
x and y are matrices, which allows us to evaluate every possible combination of the two values in one function call.
I would do this for the tan function:
from math import tan
y = 0
x = 0
for x_iteration in range(0, 101):
    if tan(x_iteration) > y:
        x = x_iteration
        y = tan(x_iteration)
x = int(x)
y = int(y)
It's fairly straightforward to write a program to solve this:
max_result = None
max_x = 0
max_y = 0
for x in range(0, 101):
    for y in range(0, 101):
        result = fun_A(x, y)
        if max_result is None or result > max_result:
            max_result = result
            max_x = x
            max_y = y
print(f"x={max_x} and y={max_y} produced the maximum result of {max_result}")
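A more compact version of the same brute-force search, as a sketch: itertools.product generates every (x, y) pair and max() with a key function picks the pair with the largest fun_A value. fun_A is copied from the question, with the import moved to module level.

```python
import math
from itertools import product

def fun_A(x, y):
    if x == y:
        return 0
    first = math.cos((y % 75) * (math.pi / 180))
    second = math.sin((x % 30) * (math.pi / 180))
    return (first + second) / abs(x - y)

# All 101 * 101 integer pairs with 0 <= x, y <= 100.
pairs = product(range(101), repeat=2)
best_x, best_y = max(pairs, key=lambda p: fun_A(*p))
print(best_x, best_y, fun_A(best_x, best_y))
```

This should agree with the NumPy answer's result (x = 89, y = 88). It is slower than the vectorized version but needs no third-party library.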

Finding minima of curve inside curve_fit

Summary:
I have a function that I want to put through curve_fit. It is piecewise, but its pieces do not have a clean analytical form. I've gotten the process to work by including a somewhat janky for loop, but it runs pretty slowly: 1-2 minutes for relatively small data sets with N=10,000. I'm hoping for advice on how to (a) use NumPy broadcasting to speed up the operation (no for loop) or (b) do something totally different that gets the same type of results, but much faster.
The function z=f(x,y; params) I'm working with is piecewise: it increases monotonically in y until it reaches the maximum value of f at the point y_0, and then saturates and becomes constant. My problem is that the break-point y_0 is not analytic, so it requires some optimization. This would be relatively easy, except that I have real data in both x and y, and the break-point is a function of the fitting parameter c. All the data, x, y, and z, have instrumentation noise.
Example problem:
The function below has been changed to make it easier to illustrate the problem I'm trying to deal with. Yes, I realize it is analytically solvable, but my real problem is not.
f(x,y; c) =
    y*(x-c*y),    for y <= x/(2*c)
    x**2/(4*c),   for y >  x/(2*c)
The break-point y_0 = x/(2*c) is found by taking the derivative of f with respect to y and solving for the maximum. The maximum f_max = x**2/(4*c) is found by putting y_0 back into f. The problem is that the break-point depends on both the x-value and the fitting parameter c, so I can't compute the break-point outside the internal loop.
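For reference, the derivation of the break-point for the toy function, written out (assuming c > 0, so that the second derivative -2c is negative and the critical point is a maximum):

$$
\begin{aligned}
f(x, y; c) &= y\,(x - c\,y) \\
\frac{\partial f}{\partial y} &= x - 2c\,y = 0 \quad\Rightarrow\quad y_0 = \frac{x}{2c} \\
f_{\max} = f(x, y_0; c) &= \frac{x}{2c}\left(x - \frac{x}{2}\right) = \frac{x^2}{4c}
\end{aligned}
$$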
Code
I have reduced the number of points to ~500 points to allow the code to run in a reasonable amount of time. My real data has >10,000 points.
import numpy as np
from scipy.optimize import curve_fit, fminbound
import matplotlib.pyplot as plt
def function(xy, c=1):
    x, y = xy  # Python 3 no longer allows tuple unpacking in the signature
    fxn_xy = lambda x, y: y*(x - c*y)
    y_max = np.zeros(len(x))  # array in which to put the max values of y
    fxn_max = np.zeros(len(x))  # array to put the results
    # This loop is the first part I'd like to optimize, since it requires
    # O(1/delta) time for each individual pass through the fitting function
    for i, X in enumerate(x):
        # X represents the specific value of x to solve for
        fxn_y = lambda y: fxn_xy(X, y)
        # reduce the function to just one variable (y)
        # by inputting the given X value for the loop
        max_y = fminbound(lambda Y: -fxn_y(Y), 0, X, full_output=True)
        y_max[i] = max_y[0]
        fxn_max[i] = -max_y[1]
    return np.where(y <= y_max,
                    fxn_xy(x, y),
                    fxn_max
                    )
# Create and plot 'real' data
delta = 0.01
xs = [0.5, 1., 1.5, 2.]  # distinct x values; each is repeated once per y point below
# create repeated x for each xs value. +1 used to match the size of y, below
x = np.hstack([[X]*(int(X//delta + 1)) for X in xs])
# create a sweep from 0 to the current value of x, with spacing=delta
y = np.hstack([np.arange(0, X, delta) for X in xs])
z = function((x, y), c=0.75)
# introduce random instrumentation noise to x, y, z
x += np.random.normal(0, 0.005, size=len(x))
y += np.random.normal(0, 0.005, size=len(y))
z += np.random.normal(0, 0.05, size=len(z))
fig = plt.figure(1,(12,8))
axis1 = fig.add_subplot(121)
axis2 = fig.add_subplot(122)
axis1.scatter(y,x,s=1)
#axis1.plot(x)
#axis1.plot(z)
axis1.set_ylabel("x value")
axis1.set_xlabel("y value")
axis2.scatter(y,z, s=1)
axis2.set_xlabel("y value")
axis2.set_ylabel("z(x,y)")
# Curve-fitting process to find the optimal c
popt, pcov = curve_fit(function, (x, y), z, bounds=(0, 2))
axis2.scatter(y, function((x, y), *popt), s=0.5, c='r')
print("c_est = {:.3g} ".format(popt[0]))
plt.show()
The results are plotted below, with "real" x,y,z values (blue) and fitted values (red).
Notes: my intuition is that the answer involves broadcasting the x variable so that I can use it in fminbound. But that may be naive. Thoughts?
Thanks everybody!
Edit: to clarify, the x-values are not always fixed in groups like that; they could instead be swept while the y-values are held steady, which is unfortunately why I need some way of handling x so many times.
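One way to remove the per-point fminbound loop, as a sketch: replace it with a golden-section search that runs on all x values at once, keeping the interval ends in NumPy arrays. This is a swapped-in technique, not a scipy function; it assumes f is unimodal in y on [0, x] for every point (true for the toy function, which is concave in y when c > 0). golden_max is a hypothetical helper name, and the fixed iteration count stands in for fminbound's tolerance test.

```python
import numpy as np

def golden_max(func, lo, hi, n_iter=60):
    """Elementwise maximum of a unimodal func on [lo, hi], vectorized over arrays.

    func must accept an array of candidate points with the same shape as lo/hi.
    As with any comparison-based search, y is accurate to about sqrt(machine eps).
    """
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0  # about 0.618; interval shrinks by this factor
    a = np.asarray(lo, dtype=float).copy()
    b = np.asarray(hi, dtype=float).copy()
    for _ in range(n_iter):
        c1 = b - inv_phi * (b - a)
        c2 = a + inv_phi * (b - a)
        keep_left = func(c1) > func(c2)  # where True, the maximum lies in [a, c2]
        b = np.where(keep_left, c2, b)
        a = np.where(keep_left, a, c1)
    y_best = 0.5 * (a + b)
    return y_best, func(y_best)

def fxn_xy(x, y, c):
    return y * (x - c * y)

def function(xy, c=1.0):
    x, y = xy
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One vectorized search over all points instead of a Python loop over x.
    y_max, fxn_max = golden_max(lambda Y: fxn_xy(x, Y, c), np.zeros_like(x), x)
    return np.where(y <= y_max, fxn_xy(x, y, c), fxn_max)
```

Because the signature is still function(xy, c), it should plug into curve_fit(function, (x, y), z, bounds=(0, 2)) the same way as the loop version. Each iteration costs two vectorized evaluations of f over all points, so the run time depends on n_iter rather than on a Python-level loop over N.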
There are several things that can be optimised. One problem is the data structure. If I understand the code correctly, you look for the maximum separately for every x value. However, you built the structure such that the same x values are repeated over and over again, so you waste a lot of computational effort.
I am not sure how difficult the evaluation of f is in reality, but I assume that it is not significantly more costly than the optimization. So in my solution I just calculate the full array, look for the maximum, and change the values coming afterwards.
I guess my code can be optimized as well, but right now it looks like:
import numpy as np
from scipy.optimize import leastsq
import matplotlib.pyplot as plt
def function(yArray, x=1, c=1):
    out = np.fromiter((y * (x - c * y) for y in yArray), float)
    pos = np.argmax(out)
    outMax = out[pos]
    return np.fromiter((v if i < pos else outMax for i, v in enumerate(out)), float)
def residuals(param, xArray, yList, zList):
    c = param[0]  # leastsq passes the parameters as a 1-element array
    zListTheory = [function(yArray, x=X, c=c) for yArray, X in zip(yList, xArray)]
    diffList = [zArray - zArrayTheory for zArray, zArrayTheory in zip(zList, zListTheory)]
    out = [item for diffArray in diffList for item in diffArray]
    return out
# Create and plot 'real' data
delta = 0.01
xArray = np.array([0.5, 1., 1.5, 2.])  # keep this as a parameter
# create a sweep from 0 to the current value of x, with spacing=delta
yList = [np.arange(0, X, delta) for X in xArray]  # as a list of arrays
zList = [function(yArray, x=X, c=0.75) for yArray, X in zip(yList, xArray)]
fig = plt.figure(1, (12, 8))
ax = fig.add_subplot(1, 1, 1)
for y, z in zip(yList, zList):
    ax.plot(y, z)
# introduce random instrumentation noise
yRList = [yArray + np.random.normal(0, 0.02, size=len(yArray)) for yArray in yList]
zRList = [zArray + np.random.normal(0, 0.02, size=len(zArray)) for zArray in zList]
ax.set_prop_cycle(None)
for y, z in zip(yRList, zRList):
    ax.plot(y, z, marker='o', markersize=2, ls='')
sol, cov, info, msg, ier = leastsq(residuals, x0=.9, args=(xArray, yRList, zRList), full_output=True)
print("c_est = {:.3g} ".format(sol[0]))
plt.show()
This prints something like the following (the exact value depends on the random noise):
>> c_est = 0.752
The plot shows the original curves together with the noisy data.
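The answer's function builds its output with two Python-level generators. A NumPy-only equivalent for a single yArray, as a sketch (function_np is a hypothetical name; it computes the same values, with np.where replacing the second generator):

```python
import numpy as np

def function_np(yArray, x=1, c=1):
    out = yArray * (x - c * yArray)  # evaluate f on the whole sweep at once
    pos = np.argmax(out)             # index of the break-point
    idx = np.arange(len(out))
    # Keep the values before the maximum, then hold the maximum constant.
    return np.where(idx < pos, out, out[pos])
```

When out is non-decreasing up to its maximum, as it is here for c > 0 and an ascending y sweep, np.maximum.accumulate(out) gives the same result in a single call.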

How to create logarithmic function with base x in python

I want to create a logarithmic function with base x and then plot it: y = log_x(10).
So I use:
y = math.log(10, x)
but it returned an error saying: only length-1 arrays can be converted to Python scalars.
So what is the correct way to create a log function with base x?
The simple way to get a "smoother" line is to increase the number of points (i.e., make length bigger).
Also, you likely want to sort your x list before calculating and plotting:
length = 100  # or higher
:
x = sorted([random.uniform(rand_min, rand_max) for r in range(length)])
y = [math.log(10, _x) for _x in x]
Since you want two lists of values (x and y), you have to generate the x list first and then use it to generate the y list:
import math
import random
length = 10
rand_min = 0.02
rand_max = 0.91
x = [random.uniform(rand_min, rand_max) for r in range(length)]
y = [math.log(10, _x) for _x in x]
Here you have lists x and y, both of length length.
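The error in the question comes from passing a NumPy array to math.log, which only accepts a single number. If x is already a NumPy array, the change-of-base identity log_x(10) = ln(10) / ln(x) works on the whole array at once. A sketch, reusing the sample range 0.02 to 0.91 from the answer; x must stay positive and away from 1, where the logarithm's base is invalid:

```python
import numpy as np

# Evenly spaced (already sorted) sample points for the x axis.
x = np.linspace(0.02, 0.91, 200)
# Change of base: log_x(10) = ln(10) / ln(x), evaluated elementwise.
y = np.log(10) / np.log(x)
```

The two arrays can then be plotted directly, for example with matplotlib's plt.plot(x, y).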

Solving an equation for a variable

How can I get this to give me x = z*y/a?
from sympy import *
x, y, a, z = symbols('x y a z')
z = a*x/y
solve(z, x)  # returns [0]!
# would like to get z*y/a
solve(z, x) correctly returns [0] because your code is effectively asking:
What's the value of x that would cause z to become 0?
What you really want to do (as described here) is solve a*x/y == z, which can be done as follows:
from sympy import *
x,y,a,z = symbols('x y a z')
equation = a*x/y
new_eq = solve(equation - z, x) # its value is [y*z/a]
Don't assign z = a*x/y, and don't pass z to solve.
solve(expr, symbol) determines what values of symbol will make expr equal 0. If you want to figure out what value of x makes z equal a*x/y, you want z - a*x/y to equal 0:
solve(z - a*x/y, x)
You do not want to assign z = a*x/y. In Python, = means assignment, which is something entirely different from equality.
I think the answer to this question can be of help. Applied to your example, this gives:
>>> from sympy import *
>>> x,y,a,z = symbols('x y a z')
>>> l = z
>>> r = a*x/y
>>> solve(l-r,x)
[y*z/a]
As all the other answers point out the solution,
I would like to emphasize the use of Eq instances here.
An Eq object represents an equality relation between two objects.
To use an Eq object, your code should look something like this:
In []: from sympy import symbols, Eq, solve
In []: a, x, y, z = symbols('a, x, y, z')
In []: foo = Eq(z, a*x/y)
In []: solve(foo, x)
Out[]: [y*z/a]
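Putting the answers together in one short script: rebinding z with = replaces the symbol, so solve only sees a*x/y and returns its root [0]; stating the equation with Eq, or as a difference that should equal zero, gives the expected result.

```python
from sympy import Eq, solve, symbols

a, x, y, z = symbols("a x y z")

# Mistake from the question: solve(expr, x) finds where expr == 0.
expr = a * x / y
print(solve(expr, x))              # [0]

# Correct: state the equation z = a*x/y explicitly and solve it for x.
print(solve(Eq(z, a * x / y), x))  # [y*z/a]

# Equivalent: move everything to one side so the expression should equal 0.
print(solve(z - a * x / y, x))     # [y*z/a]
```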
