Python Machine Learning - Figure out Mathematical Equation for expected outcome


Maybe I don't need machine learning and am just looking for the right term to find some working examples. Basically, I have rows of values and I want to figure out which formula(s) have at least a 75% success rate across all the rows given.
Let's say, for example, I have these values
1 2 3 6
7 9 1 63
10 1 2 20
9 3 3 33
What I'm trying to develop is an algorithm that will try every combination of the basic math operators ( + / * - ) on the values in [0], [1], [2] and check whether the result equals [3]. Here, multiplying the first three values, [0] * [1] * [2] = [3], works for the first three rows and fails only for the fourth, so it hits my 75% rate.

This actually feels like a rare reasonable use for eval:
from itertools import product
from statistics import mean
from typing import Sequence

def build_expression(ops: Sequence[str], values: Sequence[int]) -> str:
    return " ".join(
        c for t in zip(map(str, values), ops) for c in t
    ) + " " + str(values[-1])

def apply_ops(ops: Sequence[str], values: Sequence[int]) -> int:
    """apply_ops(['+', '-'], [1, 2, 3]) == 1 + 2 - 3"""
    assert len(ops) + 1 == len(values)
    return eval(build_expression(ops, values))

def score_op_accuracy(
    ops: Sequence[str],
    data: Sequence[Sequence[int]]
) -> float:
    assert all(len(ops) + 2 == len(values) for values in data)
    return mean(
        100.0
        if apply_ops(ops, values[:-1]) == values[-1] else
        0.0
        for values in data
    )

data = [
    [1, 2, 3, 6],
    [7, 9, 1, 63],
    [10, 1, 2, 20],
    [9, 3, 3, 33],
]
possible_ops = list(product(*([list("+-*/%")] * 2)))
valid_ops = [
    ops for ops in possible_ops if score_op_accuracy(ops, data) >= 75.0
]
for ops in valid_ops:
    for values in data:
        left = values[:-1]
        right = values[-1]
        result = apply_ops(ops, left)
        equals = "==" if right == result else "!="
        print(f"{build_expression(ops, left)} {equals} {result}")
    print("-" * 20)
prints:
1 * 2 * 3 == 6
7 * 9 * 1 == 63
10 * 1 * 2 == 20
9 * 3 * 3 != 81
--------------------

def plus(x, y):
    return x + y

def minus(x, y):
    return x - y

def times_by(x, y):
    return x * y

def divided_by(x, y):
    return x / y

def find_solution(l):
    x, y, z, targ = l
    for func1 in [plus, minus, times_by, divided_by]:
        for func2 in [plus, minus, times_by, divided_by]:
            if func2(func1(x, y), z) == targ:
                print(f'({x} {func1.__name__} {y}) {func2.__name__} {z} = {targ}')
            if func1(x, func2(y, z)) == targ:
                print(f'{x} {func1.__name__} ({y} {func2.__name__} {z}) = {targ}')

find_solution([7, 9, 1, 63])
output:
(7 times_by 9) times_by 1 = 63
7 times_by (9 times_by 1) = 63
(7 times_by 9) divided_by 1 = 63
7 times_by (9 divided_by 1) = 63
I'll leave some edge cases and errors (dividing by zero, notably) to you
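One way to cover the division-by-zero case and to score an operator pair across all rows is sketched below. It is only an illustration: the names OPS, apply_left_to_right and success_rate are made up for this example, and it assumes operators are applied strictly left to right, ignoring the usual precedence.

```python
from itertools import product
import operator

# Hypothetical helper table: operator symbol -> function.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def apply_left_to_right(symbols, values):
    """Fold values left to right; return None on division by zero."""
    result = values[0]
    for symbol, value in zip(symbols, values[1:]):
        try:
            result = OPS[symbol](result, value)
        except ZeroDivisionError:
            return None
    return result

def success_rate(symbols, rows):
    """Fraction of rows whose last value equals the folded result."""
    hits = sum(apply_left_to_right(symbols, row[:-1]) == row[-1] for row in rows)
    return hits / len(rows)

rows = [[1, 2, 3, 6], [7, 9, 1, 63], [10, 1, 2, 20], [9, 3, 3, 33]]
winners = [s for s in product(OPS, repeat=2) if success_rate(s, rows) >= 0.75]
print(winners)
```

A combination that divides by zero on some row simply counts as a miss for that row instead of raising.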

I'm not 100% sure if this is what you want to do, but here's what I came up with
df
0 1 2 3
0 1 2 3 6
1 7 9 1 63
2 10 1 2 20
3 9 3 3 33
Code
from itertools import permutations
import numpy as np

# df is the DataFrame shown above
for index, data in df.iterrows():
    t = []
    for p in permutations(data[0:3]):
        # four results per permutation: sum, subtract, divide, multiply
        t.extend([
            p[0] + p[1] + p[2],
            p[0] - p[1] - p[2],
            p[0] / p[1] / p[2],
            p[0] * p[1] * p[2]])
    if np.mean([x == data[3] for x in t[0::4]]) >= .75:
        print(data.values, 'sum')
    elif np.mean([x == data[3] for x in t[1::4]]) >= .75:
        print(data.values, 'subtract')
    elif np.mean([x == data[3] for x in t[2::4]]) >= .75:
        print(data.values, 'divide')
    elif np.mean([x == data[3] for x in t[3::4]]) >= .75:  # index 3, not 4, selects the products
        print(data.values, 'multiply')
Output
[1 2 3 6] sum
[ 7  9  1 63] multiply
[10  1  2 20] multiply

It's maybe a linear regression problem. Through linear regression, you can get a formula similar to the following:
y = a * x1 + b * x2 + c * x3 + d
a, b, c and d are the parameters that need to be learned.
First, copy your data to test.txt, one space-separated row per line.
Then, read and train:
# Import libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
# Read data to a DataFrame
df = pd.read_csv('test.txt', header=None, sep=' ')
df.columns = list('ABCD')
# Train a linear regression model
reg = LinearRegression()
reg.fit(df[['A', 'B', 'C']], df['D'])
# Get score, i.e. R^2, the coefficient of determination (not a classification accuracy)
print(reg.score(df[['A', 'B', 'C']], df['D'])) # 1.0
# Get model's prediction
print(reg.predict(df[['A', 'B', 'C']])) # [ 6. 63. 20. 33.]
# Get parameters
print(reg.coef_, reg.intercept_) # [2.54761905 6.61904762 2.30952381] -16.71428571428573
So, the final formula is:
y = 2.54761905 * x1 + 6.61904762 * x2 + 2.30952381 * x3 - 16.71428571428573
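With only four rows and four parameters the fit is exact by construction, so a score of 1.0 says little here. If you have more data, the model will generalize better.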

Related

Is there a way to insert an arbitrary symbol before the Python output value?

I'm calculating a matrix with Python, and I want to visually separate the right-hand side of each equation. Is there a way?
x - y - 2z = 4
2x - y - z = 2
2x +y +4z = 16
I want to make the expression above like this when I print out the matrix from the function I created
1 -1 -2 | 4
2 -1 -1 | 2
2 1 4 | 16
Same as the rref result of this
1 0 0 | 24
0 1 0 | 72
0 0 1 | -26
def showMatrix():
    print("\n")
    for i in sd:
        for j in i:
            print(j, end="\t")
        print("\n")

def getone(pp):
    for i in range(len(sd[0])):
        if sd[pp][pp] != 1:
            q00 = sd[pp][pp]
            for j in range(len(sd[0])):
                sd[pp][j] = sd[pp][j] / q00

def getzero(r, c):
    for i in range(len(sd[0])):
        if sd[r][c] != 0:
            q04 = sd[r][c]
            for j in range(len(sd[0])):
                sd[r][j] = sd[r][j] - ((q04) * sd[c][j])

sd = [
    [1, 1, 2, 9],
    [2, 4, -3, 1],
    [3, 6, -5, 0]
]
showMatrix()
for i in range(len(sd)):
    getone(i)
    for j in range(len(sd)):
        if i != j:
            getzero(j, i)
    showMatrix()
print("FiNAL result")
showMatrix()

Here is a function which takes a list of 4 numbers and returns a string representing an equation in x,y,z. It handles coefficients which are negative, zero, or +/-1 appropriately:
def make_equation(nums):
    coefficients = nums[:3]
    variables = 'xyz'
    terms = []
    for c, v in zip(coefficients, variables):
        if c == 0:
            continue
        elif c == 1:
            coef = ''
        elif c == -1:
            coef = '-'
        else:
            coef = str(c)
        terms.append(coef + v)
    s = ' + '.join(terms)
    s = s.replace('+ -', '- ')
    return s + ' = ' + str(nums[3])
Typical example:
make_equation([2,-3,1,6])
With output:
'2x - 3y + z = 6'
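If the goal is the augmented-matrix layout from the question rather than an equation string, a small formatter along these lines might be enough. This is a sketch: format_augmented is a hypothetical helper name, and it assumes the last entry of every row is the right-hand side.

```python
def format_augmented(matrix, sep="|"):
    """Render each row as 'coefficients | right-hand side'."""
    lines = []
    for row in matrix:
        *coeffs, rhs = row  # assumption: the last entry is the right-hand side
        left = " ".join(f"{c:g}" for c in coeffs)
        lines.append(f"{left} {sep} {rhs:g}")
    return "\n".join(lines)

sd = [
    [1, -1, -2, 4],
    [2, -1, -1, 2],
    [2, 1, 4, 16],
]
print(format_augmented(sd))
```

The `:g` format drops the trailing `.0` from floats, which helps once row reduction has turned the entries into floats.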

limiting data set to be used xlim

I have lots of files that contain x, y, yerr columns. I read them, store them, and apply a transformation to the x values. I would then like to restrict the data I use afterwards to a range of the transformed x values (newx in the code below):
for key, value in files_data.items():
    file_short_name = key
    D_value_sale = value[1]
    data = pd.DataFrame(value[0])
    if data.shape[1] == 3:
        data.columns = ["x", "y", "yerr"]
    else:
        data.columns = ["x", "y"]
    D = D_value_sale
    b = 111
    c = 222
    data["newx"] = -c*(((data.x*(1/(1+D)))-b)/b)
    data["newy"] = (data.y-data.y.min())/(data.y.max()-data.y.min())
    w = data[(data.newx < 20000) & (data.newx > 8000)]
    dfx = w["newx"]
    dfy = w["newy"]
    peak = GaussianModel()
    pars = offset.make_params(c=np.median(dfy))
    pars += peak.guess(dfy, x= dfy, amplitude=-0.5)
    result = model.fit(dfy, pars, dfx)

If I'm understanding correctly what you are asking this is what you could do:
for key, value in files_data.items():
    file_short_name = key
    # main = value[1]
    data = pd.DataFrame(value[0])
    if data.shape[1] == 3:
        data.columns = ["x", "y", "yerr"]
    else:
        # Here you should define what happens in case
        # the data isn't what you expected it to be
        continue  # placeholder: skip files with an unexpected shape
    data["newx"] = data.x + 1  # Perform whatever transformation you need
    # data["newy"] = data.y * (1.01234) # Etc.
    # Then you can limit the newx column by doing
    # (upper_limit and lower_limit are values you choose):
    limited = data[(data.newx < upper_limit) & (data.newx > lower_limit)]
What you're doing won't work if you want to preserve the relationship between columns. When you assign the data columns to their own variables xval, yval and error, you are implicitly "losing" their relationship.

I'll open with the same caveat of "if I'm understanding you correctly": the crux of what you are looking for is the boolean array that you have created to apply your limits:
data = data[(data[0] >= xlim[0]) & (data[0] <= xlim[1])]
This boolean array can be saved and applied to any array of the same shape.
idx = (data[0] >= xlim[0]) & (data[0] <= xlim[1])
filtered_data = data[0][idx]
filtered_newxval = newxval[idx]
By way of a more complete and independent example, see the code below where this concept can be applied to multidimensional arrays and pandas dataframes
import numpy as np
import pandas as pd
np.random.seed(42)
x = np.random.randint(0, 20, 10)
y = np.random.randint(0, 20, 10)
print("x", x)
# >>> x [ 6 19 14 10 7 6 18 10 10 3]
print("y", y)
# >>> y [ 7 2 1 11 5 1 0 11 11 16]
xmin = 3
xmax = 17
idx = (x >= xmin) & (x <= xmax)
data = np.vstack((x, y))
print("filtered_data:\n", data[:, idx])
# >>> filtered_data:
# [[ 6 14 10 7 6 10 10 3]
# [ 7 1 11 5 1 11 11 16]]
df = pd.DataFrame({"x": x, "y": y})
df["xnew"] = df["x"] * 2
print(df[idx])
# >>> x y xnew
# >>> 0 6 7 12
# >>> 2 14 1 28
# >>> 3 10 11 20
# >>> 4 7 5 14
# >>> 5 6 1 12
# >>> 7 10 11 20
# >>> 8 10 11 20
# >>> 9 3 16 6

Given an array find element pairs whose sum is equal to the given sum and return the sum of their indices

Hey guys, as you've read in the title, I am trying to find the element pairs in an array whose sum equals a given sum, and return the sum of their respective indices.
I was able to find the element pairs for the given sum but failed to return the sum of their indices. Here is my code:
arr = [1, 4, 2, 3, 0, 5]
sum = 7
x = min(arr)
y = max(arr)
while x < y:
    if x + y > sum:
        y -= 1
    elif x + y < sum:
        x += 1
    else:
        print("(", x, y, ")")
        x += 1
My output:
( 2 5 )
( 3 4 )
This is what I need to do further:
2 + 5 = 7 → Indices 2 + 5 = 7;
3 + 4 = 7 → Indices 3 + 1 = 4;
7 + 4 = 11 → Return 11;
Thanks in Advance!

You can try using a nested loop:
arr = [1, 4, 2, 3, 0, 5]
sums = 7
tlist = []
for i in range(len(arr)):
    for j in range(len(arr)):
        # never pair an element with itself
        if i != j and arr[i] + arr[j] == sums:
            # record each index pair only once, whatever its order
            if (i, j) not in tlist and (j, i) not in tlist:
                tlist.append((i, j))
                print("index ->", i, " ", j)
                print("sum=", i + j)
output:
index -> 1 3
sum= 4
index -> 2 5
sum= 7

You could use itertools for easily checking sum for combinations like,
>>> import itertools
>>> num = 7
>>> for a, b in itertools.combinations(arr, 2):
...     if a + b == num:
...         aindex, bindex = arr.index(a), arr.index(b)
...         indices_sum = aindex + bindex
...         print('[element sum]: {} + {} = {} [indices sum]: {} + {} = {}'.format(a, b, a + b, aindex, bindex, indices_sum))
...
[element sum]: 4 + 3 = 7 [indices sum]: 1 + 3 = 4
[element sum]: 2 + 5 = 7 [indices sum]: 2 + 5 = 7
>>> arr
[1, 4, 2, 3, 0, 5]

You could take a different approach by calculating the difference then checking if each element is present in the first array or not.
arr = [1, 4, 2, 3, 0, 5]
the_sum = 7
diff = [the_sum - x for x in arr]
for idx, elem in enumerate(diff):
    try:
        index = arr.index(elem)
        sum_of_indices = idx + index
        print("{} + {} = {}".format(idx, index, sum_of_indices))
    except ValueError:
        pass
output
1 + 3 = 4
2 + 5 = 7
3 + 1 = 4
5 + 2 = 7
To remove the duplicates, it's easy to take a frozenset of each indices tuple:
a = [(2,1), (1,2), (3,2), (2,3)]
{frozenset(x) for x in a} # {frozenset({2, 3}), frozenset({1, 2})}

Mixed integer program python

I have an optimization problem where I am trying to maximize the total of column Z while picking exactly one row for each unique value in column X, under the constraint that the Y values of the picked rows must add up to less than or equal to (in this example) 23.
For example, I have this sample data:
X Y Z
1 9 25
1 7 20
1 5 5
2 9 20
2 7 10
2 5 5
3 9 10
3 7 5
3 5 5
The result should look like this:
X Y Z
1 9 25
2 9 20
3 5 5
This is a duplicate of "Set up linear programming optimization in R using LpSolve?", which has a solution, but I need the same in Python.

Those who want some help getting started with pulp in Python can refer to http://ojs.pythonpapers.org/index.php/tppm/article/view/111
The GitHub repo https://github.com/coin-or/pulp/tree/master/doc/KPyCon2009 could be handy as well.
Below is the Python code for the sample problem in the question:
import pandas as pd
import pulp

X = [1, 1, 1, 2, 2, 2, 3, 3, 3]
Y = [9, 7, 5, 9, 7, 5, 9, 7, 5]
Z = [25, 20, 5, 20, 10, 5, 10, 5, 5]
df = pd.DataFrame({'X': X, 'Y': Y, 'Z': Z})
allx = df['X'].unique()
# (w, b): the b-th row (1-based) within the group X == w
possible_values = [(w, b) for w in allx for b in range(1, 4)]
x = pulp.LpVariable.dicts('arr', (allx, range(1, 4)),
                          lowBound=0,
                          upBound=1,
                          cat=pulp.LpInteger)
model = pulp.LpProblem("Optim", pulp.LpMaximize)
# Objective: total Z of the chosen rows
model += sum([x[w][b]*df[df['X']==w].reset_index()['Z'][b-1] for (w, b) in possible_values])
# Constraint: total Y of the chosen rows must not exceed 23
model += sum([x[w][b]*df[df['X']==w].reset_index()['Y'][b-1] for (w, b) in possible_values]) <= 23, \
    "Maximum_number_of_Y"
# Exactly one row per unique X value
for value in allx:
    model += sum([x[w][b] for (w, b) in possible_values if w == value]) >= 1
for value in allx:
    model += sum([x[w][b] for (w, b) in possible_values if w == value]) <= 1
## View definition
print(model)
model.solve()
print("The chosen rows are out of a total of %s:" % len(possible_values))
for v in model.variables():
    print(v.name, "=", v.varValue)
For solution in R
d=data.frame(x=c(1,1,1,2,2,2,3,3,3),y=c(9,7,5,9,7,5,9,7,5),z=c(25,20,5,20,10,5,10,5,3))
library(lpSolve)
all.x <- unique(d$x)
d[lp(direction = "max",
objective.in = d$z,
const.mat = rbind(outer(all.x, d$x, "=="), d$y),
const.dir = rep(c("==", "<="), c(length(all.x), 1)),
const.rhs = rep(c(1, 23), c(length(all.x), 1)),
all.bin = TRUE)$solution == 1,]
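For a data set as small as the one in the question, the expected result can also be cross-checked without a solver by enumerating every way of picking one row per X value. This is only a brute-force sketch (best_selection is a hypothetical name), not a replacement for the integer program, because the number of combinations grows exponentially with the number of groups.

```python
from itertools import product

# (X, Y, Z) rows from the question
rows = [
    (1, 9, 25), (1, 7, 20), (1, 5, 5),
    (2, 9, 20), (2, 7, 10), (2, 5, 5),
    (3, 9, 10), (3, 7, 5), (3, 5, 5),
]

def best_selection(rows, y_limit):
    """Pick one row per X value, maximizing total Z with total Y <= y_limit."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    best = None
    for choice in product(*groups.values()):
        if sum(r[1] for r in choice) > y_limit:
            continue  # violates the Y budget
        if best is None or sum(r[2] for r in choice) > sum(r[2] for r in best):
            best = choice
    return best

print(best_selection(rows, 23))
```

For the sample data this selects the same three rows as the expected result in the question.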

Calculate the extended gcd using a recursive function in Python

I am given the function gcd, which is defined as follows:
def gcd(a, b):
    if (0 == a % b):
        return b
    return gcd(b, a%b)
Now I am asked to write a recursive function gcd2(a,b) that returns a list of three numbers (g, s, t) where g = gcd(a, b) and g = s*a + t*b.
This means that you would enter two values (a and b) into the gcd(a, b) function. The value it returns equals g in the next function.
These same a and b values are then called into gcd2(a, b). The recursive part is then used to find the values for s and t so that g = s*a + t*b.
I am not sure how to approach this because I can't really envision what the "stopping-condition" would be, or what exactly I'd be looping through recursively to actually find s and t. Can anyone help me out?

The key insight is that we can work backwards, finding s and t for each a and b in the recursion. So say we have a = 21 and b = 15. We need to work through each iteration, using several values: a, b, a % b, and c, where a = c * b + a % b. First, let's consider each step of the basic GCD algorithm:
21 = 1 * 15 + 6
15 = 2 * 6 + 3
6 = 2 * 3 + 0 -> end recursion
So our gcd (g) is 3. Once we have that, we determine s and t for 6 and 3. To do so, we begin with g, expressing it in terms of (a, b, s, t = 3, 0, 1, -1):
3 = 1 * 3 + -1 * 0
Now we want to eliminate the 0 term. From the last line of the basic algorithm, we know that 0 = 6 - 2 * 3:
3 = 1 * 3 + -1 * (6 - 2 * 3)
Simplifying, we get
3 = 1 * 3 + -1 * 6 + 2 * 3
3 = 3 * 3 + -1 * 6
Now we swap the terms:
3 = -1 * 6 + 3 * 3
So we have s == -1 and t == 3 for a = 6 and b = 3. So given those values of a and b, gcd2 should return (3, -1, 3).
Now we step back up through the recursion, and we want to eliminate the 3 term. From the next-to-last line of the basic algorithm, we know that 3 = 15 - 2 * 6. Simplifying and swapping again (slowly, so that you can see the steps clearly...):
3 = -1 * 6 + 3 * (15 - 2 * 6)
3 = -1 * 6 + 3 * 15 - 6 * 6
3 = -7 * 6 + 3 * 15
3 = 3 * 15 + -7 * 6
So for this level of recursion, we return (3, 3, -7). Now we want to eliminate the 6 term.
3 = 3 * 15 + -7 * (21 - 1 * 15)
3 = 3 * 15 + 7 * 15 - 7 * 21
3 = 10 * 15 - 7 * 21
3 = -7 * 21 + 10 * 15
And voila, we have calculated s and t for 21 and 15.
So schematically, the recursive function will look like this:
def gcd2(a, b):
    if (0 == a % b):
        # calculate s and t
        return b, s, t
    else:
        g, s, t = gcd2(b, a % b)
        # calculate new_s and new_t
        return g, new_s, new_t
Note that for our purposes here, using a slightly different base case simplifies things:
def gcd2(a, b):
    if (0 == b):
        return a, 1, -1
    else:
        g, s, t = gcd2(b, a % b)
        # calculate new_s and new_t
        return g, new_s, new_t
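One way to fill in the missing step, offered as a sketch rather than the only possible answer: the recursive call gives g = s * b + t * (a % b), and substituting a % b = a - (a // b) * b yields g = t * a + (s - (a // b) * t) * b. Using the b == 0 base case from the schematic, that reads:

```python
def gcd2(a, b):
    """Return (g, s, t) with g = gcd(a, b) and g == s*a + t*b."""
    if b == 0:
        # g = 1*a + t*0 holds for any t; -1 matches the walkthrough
        return a, 1, -1
    g, s, t = gcd2(b, a % b)
    # g = s*b + t*(a % b) and a % b = a - (a // b)*b
    # => g = t*a + (s - (a // b)*t)*b
    return g, t, s - (a // b) * t

print(gcd2(21, 15))  # (3, -7, 10), as in the worked example
```

The intermediate calls return (3, -1, 3) for (6, 3) and (3, 3, -7) for (15, 6), matching the hand calculation.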

The base case (stopping condition) is:
if a % b == 0:
    # a = b*k for the integer k = a//b
    # rearranges to b = -1*a + (k+1)*b
    # ( g = s*a + t*b )
    return (b, -1, a // b + 1)  # (g, s, t); // keeps the result an integer
However the exercise is to rewrite the recursive part:
g1, s1, t1 = gcd2(b, a % b)  # where g1 = s1*b + t1*(a%b)
g, s, t = ???  # where g = s*a + t*b
return (g, s, t)
in terms of g1, s1 and t1... which boils down to rewriting a%b in terms of a and b.

"Write a recursive function in Python", at least in CPython, cries for this: be aware of http://docs.python.org/library/sys.html#sys.getrecursionlimit. This is, in my opinion, one of the most important answers to this question. Please do some research on this topic yourself. Also, this thread might be insightful: Python: What is the hard recursion limit for Linux, Mac and Windows?
In conclusion, try to use an iterative instead of a recursive approach in Python whenever possible.
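For completeness, an iterative version of the extended Euclidean algorithm might look like the sketch below. The function name extended_gcd_iterative is made up for this example. It keeps two running coefficient pairs instead of unwinding a call stack, so the recursion limit never comes into play. Note that s and t are not unique, so the pair it returns can differ from the recursive result while still satisfying g = s*a + t*b.

```python
def extended_gcd_iterative(a, b):
    """Return (g, s, t) with g = gcd(a, b) and g == s*a + t*b, without recursion."""
    old_r, r = a, b   # remainders
    old_s, s = 1, 0   # coefficients of a
    old_t, t = 0, 1   # coefficients of b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

g, s, t = extended_gcd_iterative(21, 15)
print(g, s, t)  # 3 -2 3, since -2*21 + 3*15 == 3
```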

This is based on the Euclidean algorithm. The recursive version comes first; replacing the recursion with a while loop is better still, with less execution overhead.
def gcd(m, n):
    # assume m >= n
    if m < n:
        (m, n) = (n, m)
    if (m % n) == 0:
        return n
    else:
        diff = m - n
        # diff > n? Possible!
        return gcd(max(n, diff), min(n, diff))
It can be improved with a while loop:
def gcd(m, n):
    if m < n:
        (m, n) = (n, m)
    while (m % n) != 0:
        diff = m - n
        (m, n) = (max(n, diff), min(n, diff))
    return n

We Keep Coding
