solving math equation for data loading problem - python

I have a dataframe df_N with n observations. I want to write code that creates a new dataframe df_M from the records of df_N. The number of observations in df_M (i.e. m observations) is several orders of magnitude greater than the number of observations in df_N.
The number of observations in df_M can be represented by the following formula:
m = (n*(2^x)) + n*y + z
Note that the first part of the equation is the series n, n*2, n*4, n*8, ... i.e. n times 2^x.
Note that all values are integers.
For example, if n = 8 and m = 82, the values in the formula would be
82 = (8*(2^3)) + 8*2 + 2 = 64 + 16 + 2
i.e. x = 3, y = 2 and z = 2.
Also note that (n*(2^x)) > n*y > z always holds. This constraint restricts the number of solutions to the equation.
Is there a way to solve this equation in Python and find the values of x, y and z, given n and m?
Once the values of x, y and z are determined, I'd be able to write code that creates additional records for each term of the equation and combines them into a single dataframe df_M.

Assuming that you want to maximize the n*(2^x) term first and then n*y over z, and that m and n are positive while x, y and z are non-negative:
x = (m // n).bit_length() - 1
m -= n * 2**x
y = m // n
z = m - y*n
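A quick check with the numbers from the question (n = 8, m = 82), wrapping the same steps in a small helper function:
def solve_xyz(n, m):
    # largest x with n * 2**x <= m
    x = (m // n).bit_length() - 1
    r = m - n * 2**x
    y = r // n          # maximize n*y over z
    z = r - y * n
    return x, y, z

print(solve_xyz(8, 82))  # prints (3, 2, 2)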

Generating matrices by inputting different values of time?

I am kind of stuck on a problem. Basically, I need to generate a sequence of matrices at different time points; however, I only seem to get one. This is the code I have:
import numpy as np

def E_matrix(degs, N_in, T2_in, T1_in):
    """
    Generates a matrix.

    Parameters
    ----------
    degs: angles in degrees, used to generate the flip angle in radians.
    N_in: the given flip angle (alpha) is repeated N_in times.
    TE_inc: echoes every TE_inc (TE_inc = 0.1)
    T2_in: T2_in = 0.2
    T1_in: T1_in = 0.1

    Returns
    -------
    Matrix
    """
    # Convert flip angle into [rad]. The given flip angle is repeated N_in times.
    alpha = np.array((degs/180.0) * np.pi)
    # Additional variables
    pn = np.repeat(alpha, N_in)
    N = len(pn)
    # Create state matrices Omega before and after.
    E_matrix_preRF = np.identity(3)
    E_matrix_postRF = np.eye(3, N)
    # Prepare the x-values array (same every time)
    xs = np.arange(0, 10 - 1) * 0.1
    # A for loop to generate the experienced matrix E.
    for k in np.arange(0, N - 1):
        for i in np.arange(0, len(xs) - 1):
            E_matrix_preRF[:2][k] = np.exp(-xs[i]/T2_in) * E_matrix_postRF[:2][k]
            E_matrix_preRF[2][2] = np.exp(-xs[i]/T1_in) * E_matrix_postRF[2][2] + (1 - np.exp(-xs[i]/T1_in))
    return E_matrix_preRF
This is where I am confused: if I am iterating over a range of values, why does it not give me 7 matrices? Instead it gives me one, built from the last value of the np.arange (i.e. 0.70):
0.0301974 0 0
0 0.0301974 0
0 0 1
I would like 7 matrices (3x3), one at each time point, and then I would like to multiply each matrix by another set of 3x3 matrices. The final multiplication involves a 3x1 vector, so my end product would be 7 vectors of size 3x1, one at each of those time points.
I would appreciate the help very much.
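The two assignments inside the loop write into the same slots of E_matrix_preRF on every pass, so each iteration overwrites the previous one and only the last time value survives. A minimal sketch of the structural fix, collecting one matrix per time point in a list (this keeps only the diagonal relaxation form from the code above and leaves out the RF handling and the additive T1 recovery term):
import numpy as np

def relaxation_matrices(xs, T2_in, T1_in):
    # One 3x3 matrix per time value, appended to a list instead of
    # overwriting a single array on every loop pass.
    mats = []
    for t in xs:
        E = np.diag([np.exp(-t / T2_in),
                     np.exp(-t / T2_in),
                     np.exp(-t / T1_in)])
        mats.append(E)
    return mats

xs = np.arange(7) * 0.1                # 7 time points: 0.0 ... 0.6
mats = relaxation_matrices(xs, T2_in=0.2, T1_in=0.1)
v = np.array([0.0, 0.0, 1.0])          # example 3x1 vector
results = [E @ v for E in mats]        # 7 vectors, one per time point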

Python: Need to find such coefficients that multiplying them by known data points minimises SD

I have a system of equations of the form
$n_i q = a_i, \quad 1 \le i \le N$
where only $a_i$ are known. I am trying to find the numbers $n_i$ which should be close to small positive integers. I also know the bound for possible values of $q$. How do I solve such a system in Python?
Alternatively, we can say that I want to find such small positive integers that
$SD\left(\frac{a_1}{n_1}, \dots, \frac{a_i}{n_i}, \dots, \frac{a_N}{n_N}\right)$
is minimized.
I thought about using least squares, or partial derivatives and Newton's method. I'm not sure how to make scipy's curve_fit work for $mx+b$ when I know that $b=0$ but don't know any of the $x$ values, except that they should be close to integers.
Here is a guess at a solution with inline explanations and simulated data.
import numpy as np
import scipy.optimize
from numpy.random import default_rng
'''
nq = a
"a" ranges between 5 and 37.
a_min, expected to be close to 5, will be scaled by 1/q to 1 <= n_min <= 12;
all other "a" will be scaled by the same q.
n will have the same distribution as "a".
To have 3/4 of the elements of n be 12 or less, either "a" has a uniform
distribution and q needs to take a fixed value:
(37 - 5)*0.75 + 5 = 29
q = 29/12 ~ 2.4
or the distribution of "a" needs to be made non-uniform; an exponential example:
n_min < 12 < n_max
5 < 12q < 37
0 < (12q - 5)/(37 - 5) < 1
0.75^p = (12q - 5)/(37 - 5)
p = log((12q - 5)/(37 - 5)) / log(0.75)
This all means problem bounds of
a_min  a_max   n_min   n_max      q       p
    5     37   1.000   7.400  5.000       -
    5     37   1.622  12.000  3.083   0.000  (bad here and above)
    5     37   2.069  15.310  2.417   1.000
    5     37  10.000  74.000  0.500  12.047
    5     37  12.000  88.800  0.417     inf  (bad here and below)
Regardless of the distribution of "a", for "n" to include 12 in its possible
range, 5/12 <= q <= 37/12 or 0.417 ~<= q ~<= 3.083.
'''
amin_theory = 5
amax_theory = 37
nmax_lower = 12 # Highest value of n for the lower 3/4 elements
nmin_upper = nmax_lower # Lowest value of n for the highest element
nmax_upper_theory = amax_theory/amin_theory*nmax_lower # Highest value of n for the highest element
qmin_theory = amin_theory/nmax_lower
qnom_theory = ((amax_theory - amin_theory)*0.75 + amin_theory)/nmax_lower
qmax_theory = amax_theory/nmax_lower
N = 14 # about a dozen
rand = default_rng(seed=0)
n_ideal = rand.integers(low=2, high=15, size=N)
n_noise = n_ideal + rand.uniform(-0.1, 0.1, size=N)
q_ideal = 2.4
a = n_noise*q_ideal
amin = a.min()
amax = a.max()
nmax_upper = amax/amin*nmax_lower
qmin = amin/nmax_lower
qmax = amax/nmax_lower
'''
Rounding produces sawtooth error which is impractical to optimize. A reasonable approximation is,
for the highest value whose relative quantization error is the lowest and sawtooth the highest-
frequency, try every integer in range, assuming that within the given integer the cost from
quantization error is differentiable.
'''
n_upper = np.arange(nmin_upper, 1 + np.round(nmax_upper))
n_upper_bounds = n_upper[:, np.newaxis] + ((-0.5, +0.5),)
# nq = a
q_init = amax / n_upper
q_bounds = (amax / n_upper_bounds)[:, ::-1]
n_rounded = np.round(a[:, np.newaxis] / q_init[np.newaxis, :])
# For each value of q, the error is the sum of squares of differences of n_rounded
def err(q: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    n = a[:, np.newaxis] / q[np.newaxis, :]
    e = (n_rounded - n)**2
    return e, n

def errsum(q: np.ndarray) -> float:
    e, n = err(q)
    return e.sum()

result = scipy.optimize.minimize(
    fun=errsum, x0=q_init, bounds=q_bounds,
)
q_opt = result.x
errs, n_opt = err(q_opt)
errs = errs.sum(axis=0)
order = np.argsort(errs)
q_opt = q_opt[order]
errs = errs[order]
n_opt = n_opt[:, order]
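To read off the winning candidate, a couple of lines on top of the arrays the script already builds (after the argsort, column 0 is the lowest-error solution):
best_q = q_opt[0]
best_n = np.round(n_opt[:, 0]).astype(int)  # the small positive integers sought
print(best_q, best_n, errs[0])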

How to go about data modelling?

I've spent the last 2 hours or so figuring out how to approach data modelling for my two variables. I am supposed to demonstrate/explain how I would handle the relationship between the two following variables in data modelling:
Pressure24h DangerLevel24h
1000.2 45
1014.8 90
990.8 14
998.4 95
1002.1 46
1006 21
There are another 185,000 rows to work with, but that's just a very small sample of it. Pressure24h is measured in hectopascals and DangerLevel24h is measured as a percentage. That's the only information I have to work with.
Is there any method that can be used to approach this?
I created a scatter plot to show the relationship, but that is as far as I have gotten so far:
https://i.stack.imgur.com/Ty5Yn.png
Here's my code as discussed in the comments:
def lobf(*cords):
    cords = cords[0]
    print(cords)
    x_mean, y_mean = 0, 0
    for x, y in cords:
        x_mean += x              # accumulate x sum
        y_mean += y              # accumulate y sum
    x_mean /= len(cords)         # get x mean
    y_mean /= len(cords)         # get y mean
    # Step 2 from https://www.varsitytutors.com/hotmath/hotmath_help/topics/line-of-best-fit
    sigma_numerator, sigma_denominator = 0, 0
    for xi, yi in cords:
        sigma_numerator += (xi - x_mean)*(yi - y_mean)  # get numerator
        sigma_denominator += (xi - x_mean)**2           # get denominator
    m = sigma_numerator/sigma_denominator  # get slope
    c = y_mean - m*x_mean                  # get y-intercept
    return m, c

data_values = [(2, 2), (4, 4)]  # sample data values; you can put yours here
# Loop in increments of 5 to avoid the blue blob you got.
# You can change the increment as per your choice.
predicted_values = []
increment = 5
m, c = lobf(data_values)
for i in range(data_values[-1][0] + increment, len(data_values)*100, increment):  # you can consider DangerLevel24h as your x
    # Start by incrementing the last x value of your data.
    predicted_values.append((i, i*m + c))  # appends (x, y)
print(predicted_values)
You can then plot every value from predicted_values. By stepping through every 5th value (or whatever increment you choose), you keep the predictions from merging into the blue blob you got. This method will also help you predict future values that aren't in your data. You could also try the Pearson correlation coefficient, which is closely related to this method.
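For comparison, numpy can produce the same slope and intercept with a one-line least-squares fit; a quick sketch on the sample data_values above:
import numpy as np

xs = [2, 4]                       # x values from data_values
ys = [2, 4]                       # y values from data_values
m, c = np.polyfit(xs, ys, deg=1)  # degree-1 fit returns slope, intercept
print(m, c)                       # should match lobf(data_values)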

Solving linear program with matrices in CVXPY

I'm using CVXPY in Python 3 to try to model the following linear program in X (an N by T matrix). Let
R be an N by 1 matrix where each row is the sum of the corresponding row of values in X.
P be a 1 by T matrix defined in terms of X such that P_t = 1/(G - d - x_t).
I want to solve for an ideal X such that we:
minimize (X x P)
subject to:
the sum of each row i in X has to be at least the value in R_i
each value in X has to be at least 0
Any thoughts? I have the following code and am not getting any luck:
from cvxpy import *
import numpy as np

X = Variable(N, T)
P = np.random.randn(T, 1)
R = cumsum(X, axis=0)  # using cumsum, per http://www.cvxpy.org/en/latest/tutorial/functions/index.html#vector-matrix-functions
objective = Minimize(sum_entries(square(X*P)))  # think this is good
constraints = [0 <= X, cumsum(X, axis=0) >= R]
prob = Problem(objective, constraints)
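For what it's worth, here is a minimal sketch of this kind of LP in the current CVXPY interface, under two assumptions that differ from the post: P is treated as fixed data (as in the posted code, where it is random) rather than a function of X, and R is a given vector of required row sums rather than something computed from X:
import cvxpy as cp
import numpy as np

N, T = 4, 3                             # hypothetical sizes for illustration
rng = np.random.default_rng(0)
P = rng.random(T)                       # fixed stand-in for the 1-by-T matrix P
R = np.ones(N)                          # assumed given minimum row sums

X = cp.Variable((N, T), nonneg=True)    # each value in X at least 0
objective = cp.Minimize(cp.sum(X @ P))  # sum of entries of X x P
constraints = [cp.sum(X, axis=1) >= R]  # each row sum at least R_i
prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.value, X.value)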

Solving CVXPY Matrix Optimization Linear Programming

I'm trying to solve for the ideal matrix X in the following linear program setup:
X = N by T matrix which is our variable. For simplicity, let's set N to 4 and T to 3.
X_column_sum = 1 by T matrix where each column value is the sum of all values of the corresponding column in X
R = N by 1 matrix with randomly determined values
G = constant (let's set to 100 for simplicity)
d = 1 by T matrix whose values take in the range [0, G-1]
P = 1 by T matrix equal to X_column_sum + d
C = X dotted with P
I want to minimize the sum of the entries of C, while preserving the following constraints:
all values in X have to be >= 0
the sum of all values in each row of X has to be at least equal to the corresponding value in R
I tried the following code using cvxpy in python, but to no avail:
from cvxpy import *
import numpy as np

N, T = 4, 3  # as above, for simplicity
G = 100
X = Variable(N, T)
d = np.random.randn(1, T)
d *= G - 1
X_column_sum = cumsum(X, axis=0)
P = X_column_sum + d
R = np.array([[10]]*N)  # all set to 10 for testing
objective = Minimize(sum_entries(X*P))  # think this is good
constraints = [0 <= X, cumsum(X, axis=0) >= R]
prob = Problem(objective, constraints)
print("Optimal value", prob.solve())
print("Optimal X is", X.value)  # a numpy matrix
