I have two arrays with two rows each that I use in a calculation whose last step is a sum. When I sum with .sum() I get one answer, and when I sum manually with +, explicitly indexing each row, I get a different answer:
>>> ad=[[0.,0.,0.,-0.91,0.34],[0.,0.,1.,0.93,0.65]]
>>> nad=np.array(ad)
>>> W=[[0.29,0.],[0.23,0.]]
>>> NW = np.array(W)
>>> (nad[:,4] * (nad[:,3] + 0.96 * NW[nad[:,2].astype(int)][0][0])).sum() #use sum() on row 0 and 1
0.5707160000000001
>>> (nad[0,4] * (nad[0,3] + 0.96 * NW[nad[0,2].astype(int)][0])) #calculate by explicitly indexing row 0
-0.21474400000000005
>>> (nad[1,4] * (nad[1,3] + 0.96 * NW[nad[1,2].astype(int)][0])) #calculate by explicitly indexing row 1
0.74802
>>> (nad[0,4] * (nad[0,3] + 0.96 * NW[nad[0,2].astype(int)][0])) + (nad[1,4] * (nad[1,3] + 0.96 * NW[nad[1,2].astype(int)][0])) #calculate by explicitly indexing row 0 and row 1
0.533276
The difference (0.570716 - 0.533276) might look small, but it blows up over iterative calculations. I feel both approaches should give the same value, so why do they differ? What am I missing?
Summing by hand also gives the latter answer (the one with +). Is numpy's sum() doing something I don't know about?
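One possible explanation, sketched under the assumption that the intent is to look up column 0 of NW separately for each row of nad: the trailing [0][0] in the vectorised expression collapses the lookup to a single scalar, so both rows use 0.29. The names rows, scalar_lookup and per_row_lookup are mine, not from the question.

```python
import numpy as np

nad = np.array([[0., 0., 0., -0.91, 0.34],
                [0., 0., 1., 0.93, 0.65]])
NW = np.array([[0.29, 0.], [0.23, 0.]])

rows = nad[:, 2].astype(int)   # one NW row index per row of nad: [0, 1]

# NW[rows] is a 2x2 array; [0][0] keeps only its first element,
# so the same scalar is applied to every row of nad.
scalar_lookup = NW[rows][0][0]

# NW[rows, 0] keeps column 0 of each selected row instead.
per_row_lookup = NW[rows, 0]

vectorised = (nad[:, 4] * (nad[:, 3] + 0.96 * per_row_lookup)).sum()
manual = sum(nad[i, 4] * (nad[i, 3] + 0.96 * NW[int(nad[i, 2])][0])
             for i in range(len(nad)))
print(scalar_lookup, per_row_lookup, vectorised, manual)
```

With the per-row lookup, the vectorised sum and the row-by-row sum should agree up to floating-point rounding.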
I'm stuck on returning the result from a function that checks samples for an A/B test. The calculation is correct, but somehow I get the result twice. The code and output are below.
def test(sample1, sample2):
    for i in it.chain(range(len(sample1)), range(len(sample2))):
        alpha = .05
        difference = (sample1['step_conversion'][i] - sample2['step_conversion'][i])/100
        if (i > 0):
            p_combined = (sample1['unq_user'][i] + sample2['unq_user'][i]) / (sample1['unq_user'][i-1] + sample2['unq_user'][i-1])
            z_value = difference / mth.sqrt(
                p_combined * (1 - p_combined) * (1 / sample1['unq_user'][i-1] + 1 / sample2['unq_user'][i-1]))
            distr = st.norm(0, 1)
            p_value = (1 - distr.cdf(abs(z_value))) * 2
            print(sample1['event_name'][i], 'p-value: ', p_value)
            if p_value < alpha:
                print('Deny H0')
            else:
                print('Accept H0')
    return
So I need the result in the output just once (marked with the box), but I get it twice, once for each sample.
When using Pandas dataframes, you should avoid most for loops, and use the standard vectorised approach. Use NumPy where applicable.
First, I've reset the indexes (indices) of the dataframes, to be sure .loc can be used with a standard numerical index.
sample1 = sample1.reset_index()
sample2 = sample2.reset_index()
The below does what I think your for loop does.
I can't test it, and without a clear description, example dataframes and expected outcome, it is anyone's guess if the code below does what you want. But it may get close, and mostly serves as an example of the vectorised approach.
import numpy as np
import scipy.stats as st  # assumption: `st` in the question refers to scipy.stats
difference = (sample1['step_conversion'] - sample2['step_conversion']) / 100
n = len(sample1)
# Note that `.loc` slices are label-based and include the end label, so `.loc[:n-1]` runs up to and including label n-1
p_combined = ((sample1.loc[1:, 'unq_user'] + sample2.loc[1:, 'unq_user']).reset_index(drop=True) /
(sample1.loc[:n-1, 'unq_user'] + sample2.loc[:n-1, 'unq_user'])).reset_index(drop=True)
z_value = difference / np.sqrt(
p_combined * (1 - p_combined) * (
1 / sample1.loc[:n-1, 'unq_user'] + 1 / sample2.loc[:n-1, 'unq_user']))
distr = st.norm(0, 1)  # standard normal distribution; assumes `st` is scipy.stats
p_value = (1 - distr.cdf(np.abs(z_value))) * 2
sample1['p_value'] = p_value
print(sample1)
# The below prints a boolean array that is True for elements where the condition holds.
# You can also use e.g. `print(sample1[p_value < alpha])` to show only those rows.
alpha = 0.05
print('Deny H0:')
print(p_value < alpha)
print('Accept H0:')
print(p_value >= alpha)
No for loop needed, and for a large dataframe, the above will be notably faster.
Note that the .reset_index(drop=True) is a bit ugly. But if that is not there, Pandas will divide the two dataframes by equal indices, which is not what we want. This way, that is avoided.
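Another way to pair each row with the previous one, without slicing and re-indexing, is Series.shift(1). This is only a sketch: the funnel numbers are made up, the names prev1, prev2, result and reject_H0 are mine, and I assume st is scipy.stats as the question's code suggests.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Made-up funnel data, for illustration only
sample1 = pd.DataFrame({'event_name': ['start', 'cart', 'pay'],
                        'unq_user': [1000, 600, 300],
                        'step_conversion': [100.0, 60.0, 50.0]})
sample2 = pd.DataFrame({'event_name': ['start', 'cart', 'pay'],
                        'unq_user': [1000, 650, 390],
                        'step_conversion': [100.0, 65.0, 60.0]})
alpha = 0.05

# shift(1) moves every value down one row, so row i holds the value of row i-1;
# row 0 has no predecessor and becomes NaN.
prev1 = sample1['unq_user'].shift(1)
prev2 = sample2['unq_user'].shift(1)

difference = (sample1['step_conversion'] - sample2['step_conversion']) / 100
p_combined = (sample1['unq_user'] + sample2['unq_user']) / (prev1 + prev2)
z_value = difference / np.sqrt(p_combined * (1 - p_combined) * (1 / prev1 + 1 / prev2))

result = sample1[['event_name']].copy()
result['p_value'] = 2 * st.norm.sf(np.abs(z_value))   # two-sided p-value
result['reject_H0'] = result['p_value'] < alpha       # NaN compares as False
print(result)
```

Because shift keeps the original index, every intermediate Series stays aligned with the rows of sample1, and the first row simply ends up with a NaN p-value.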
Hi,
I have a dataset with 7 columns (see image). First I want to group by the Name column, then assign weights as follows:
Compute 10% of 1/n for all n IDs in a Name, if the Name has more than one Provider. Here n is the count of unique IDs for one Name, so for Sammy, for example, n = 2.
Add 5% of 1/n if the column Accel_5 is 1, add an extra 10% of 1/n if Accel_10 is 1, and add an extra 15% of 1/n if Accel_15 is 1.
Add 10% of 1/n for each additional tech.
Altogether: group by Name (Sammy, Josh, Sarah), then compute 10% of 1/n (if the Provider count is greater than 1) + 5% of 1/n (if Accel_5 is equal to 1) + 10% of 1/n (if Accel_10 is equal to 1) + 15% of 1/n (if Accel_15 is equal to 1) + 10% of 1/n (for each additional tech).
I have been able to group by Name and get the number of unique IDs per Name, but I am stuck after that. See the sample code below:
sample = pd.read_csv("Records.csv")
test = sample.groupby("Name")
test["ID"].nunique()
Link to data: Link to image depicted above
I appreciate your help.
Thanks.
You could try to create a custom function, and then use .apply() as:
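def assign_weights(x):
    x = x.copy()  # work on a copy of the group rather than modifying it in place
    n = len(x['ID'].unique())
    x["Weight"] = 0.0
    # 1. more than one provider for this name
    n_providers = len(x['Provider'].unique())
    if n_providers > 1:
        x["Weight"] += 0.1 * 1/n
    # 2. compare the values: `1 in x['Accel_5']` would test the index labels instead
    if (x['Accel_5'] == 1).any():
        x["Weight"] += 0.05 * 1/n
    if (x['Accel_10'] == 1).any():
        x["Weight"] += 0.1 * 1/n
    if (x['Accel_15'] == 1).any():
        x["Weight"] += 0.15 * 1/n
    # 3. 10% per distinct tech (adjust if only techs beyond the first should count)
    n_tech = len(x['Tech'].unique())
    x["Weight"] += 0.1 * n_tech
    return x

sample.groupby("Name").apply(assign_weights)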
This creates a new column Weight, based on the conditions 1, 2 and 3 you supplied. Because you did not specify the input data in an appropriate manner (not using an image), I have not tested the code, but I believe it should work as intended.
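One caveat when checking whether a group contains a 1: on a pandas Series, the `in` operator tests the index labels, not the values. A small sketch with a made-up Series (the names flags, label_check and value_check are mine):

```python
import pandas as pd

# A column as a groupby group might see it: the index labels are not 0, 1, 2
flags = pd.Series([0, 0, 1], index=[10, 11, 12])

label_check = 1 in flags            # looks for an index label equal to 1
value_check = (flags == 1).any()    # looks for a value equal to 1
print(label_check, value_check)
```

Comparing the values and reducing with .any() is the safer test inside a function passed to .apply().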
I am trying to convert these rate equations to Python code. I have done a lot of research but can't find a clear path to follow. Any help will be appreciated.
This is the newly updated code I wrote using the guide from Tom10. What do you think?
import numpy as np
# import numpy as sum # not necessary, just for convenience, and replaces the builtin
# set N_core value
N_CORE = 0
# set the initial conditions appropriately (you need to set these correctly)
N = np.ones(8)
r = np.ones((8, 8))
dN = np.zeros(8) # the value here is not important for your equations
# set constant for equation 1
R_P1abs37 = 20
F_P1 = 20
R_P1abs47 = 40
W_3317 = 1.0
# set constant for equation 2
W_6142 = 90
W_5362 = 80
# Set you constants appropriately for equation 3
R_P2abs35 = 30
F_P2 = 40
R_L2se34 = 50
F_L2 = 90
# equation 4 constants
W_2214 = 20
#equation 5 constants
R_P1abs13 = 30
R_L2se32 = 20
F_L1 = 10
# equation 1 formula
dN[7] = sum(r[7,:]*N[7]) + (R_P1abs37*F_P1) + (R_P1abs47*F_P1) + (W_3317*N[3]**2)
# equation 2 formula
dN[6] = (r[7,6]*N[7]) - sum(r[6,:]*N[6]) - (W_6142*N[6]*N[1]) + (W_5362*N[5]*N[3])
# equation 3 formula
dN[5] = sum(r[:,5]*N) - sum(r[5,:]*N[5]) + R_P2abs35*F_P2 - R_L2se34*F_L2 - W_5362*N[5]*N[3]
# equation 4 formula
dN[4] = sum(r[:,4]*N) - sum(r[4,:]*N[4]) - (R_P1abs47*F_P1) + (R_L2se34*F_L2) + (W_2214*N[2]**2)+ (W_6142*N[6]*N[1])
# equation 5 formula (outer parentheses keep the continuation line in the same expression)
dN[3] = (sum(r[:,3]*N) - sum(r[3,:]*N[3]) + (R_P1abs13*F_P1) - (R_P1abs37*F_P1) - (R_P2abs35*F_P2)
         - (R_L2se32*F_L1) - ((2*W_3317)*N[3]**2) - (W_5362*N[5]*N[3]))
# equation 6 formula
dN[2] = sum(r[:,2]*N) - (r[2,1]*N[2]) + (R_L2se32*F_L1) - ((2*W_2214)*N[2]**2) + (W_6142*N[6]*N[1])+(W_5362*N[5]*N[3])
# equation 7 formula (the W terms are assumed to be products, as in the other equations)
dN[1] = sum(r[:,1] * N) - (R_P1abs13*F_P1) + (W_2214*N[2]**2) + (W_3317*N[3]**2) - (W_6142*N[6]*N[1])
#equation for N CORE
N_CORE = sum(dN)
print(N_CORE)
Here is a list of relevant issues based on your question and comments:
Usually if the summation is over i, then everything without an i subscript is constant for that sum. (Mathematically these constant terms can just be brought out of the sum; so the first equation is a bit odd where the N_7 could be moved out of the sum but I think they're keeping it in to show the symmetry with the other equations which all have an r*N term).
The capital sigma symbol (Σ) means you need to do a sum, which you can do in a loop, but both Python lists and numpy have a sum function. Numpy has the additional advantage that multiplication is applied element by element, which makes the expression simpler. So a[0]*b[0] + a[1]*b[1] + a[2]*b[2] is simply sum(a*b) for numpy arrays, and sum([a[i]*b[i] for i in range(len(a))]) for Python lists.
Therefore using numpy, the setup and your third equation would look like:
import numpy as np
from numpy import sum  # not necessary, just for convenience; it replaces the builtin
# set the initial conditions appropriately (you need to set these correctly)
N = np.ones(7, dtype=float)
# r seems to be a coupling matrix, and should be set according to your system
r = np.ones((7, 7), dtype=float)
# the values for dN are not important for your equations because dN only appears on the left side of the equations, so we just make a place to store the results
dN = np.zeros(7, dtype=float)
# Set your constants appropriately
R_P2abs35 = 1.0
F_P2 = 1.0
R_L2se34 = 1.0
F_L2 = 1.0
W_5362 = 1.0
dN[5] = sum(r[:,5]*N) - sum(r[5,:]*N[5]) + R_P2abs35*F_P2 - R_L2se34*F_L2 - W_5362*N[5]*N[3]
Note that although the expressions in the sums look similar, the first is essentially a dot product between two vectors and the second is a scalar times a vector so N[5] could be taken out of the sum (but I left it there to match the equation).
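To make the distinction between the two sums concrete, here is a small check with random placeholder values (the real N and r come from your system; the names gain, loss, same_gain and same_loss are just for this illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = rng.random(7)          # placeholder populations
r = rng.random((7, 7))     # placeholder coupling matrix

gain = np.sum(r[:, 5] * N)       # sum over i of r[i, 5] * N[i]
loss = np.sum(r[5, :] * N[5])    # sum over j of r[5, j] * N[5]

same_gain = r[:, 5] @ N              # the first sum is a dot product
same_loss = N[5] * r[5, :].sum()     # in the second, N[5] factors out of the sum
print(gain, same_gain, loss, same_loss)
```

Both forms give the same numbers up to rounding, so which one to write is a matter of matching the equations on paper.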
Final note: I see you're new to S.O. so I thought it would be helpful if I answered this question for you. In the future, please show some attempt at the code -- it really helps a lot.
I'm trying to get the Taylor series for this function
Which should be similar to this, considering that d is centered around rs
However, when I try to apply the example of #Saullo to my problem,
As you can see, the result eliminates "d" from the Taylor series, which is not what I want.
Some additional info about the function:
Am I doing something wrong? Is there a way to get my result without "d" being eliminated?
Any help is appreciated
The code
Thank you for your response and interest in helping me; here is my code so far #asmeurer
import sympy as sy
#import numpy as np
from sympy import init_printing
init_printing(use_latex=True)
# Define the variable and the function to approximate
z, d, r_s, N_e, r_t, r_b = sy.symbols('z d r_s N_e r_t r_b')
# Define W_model
def W_model(r_t=r_t, r_b=r_b, r_s=r_s, z=z):
    s_model = sy.sqrt(pow(r_t, 2) - pow(r_s*sy.sin(z), 2)) - sy.sqrt(pow(r_b, 2) - pow(r_s*sy.sin(z), 2))
    d_model = r_t - r_b
    STEC_approx = N_e * s_model
    VTEC_approx = N_e * d_model
    return STEC_approx/VTEC_approx
f = W_model()
# printing Standard model
f
# Some considerations for modifying the standard model
rb = r_s - d/2
rt = r_s + d/2
f = W_model(r_b=rb, r_t=rt, r_s=r_s, z=z)
# printing My model
f
## Finding the Taylor series approximation for W_model
num_of_terms = 2
# creates a generator
taylor_series = f.series(x=d, n=None)
# takes the number of terms desired for your generator
taylor_series = sum([next(taylor_series) for i in range(num_of_terms)])
taylor_series
The issue is that your expression is complicated enough that series doesn't know that the odd order terms are zero (you get complicated expressions for them, but if you call simplify() on them, they go to 0). Consider
In [62]: s = f.series(d, n=None)
In [63]: a0 = next(s)
In [64]: a1 = next(s)
In [65]: simplify(a0)
Out[65]:
       rₛ
────────────────
   _____________
  ╱   2    2
╲╱  rₛ ⋅cos (z)
In [66]: simplify(a1)
Out[66]: 0
If you print a0 and a1 they are both complicated expressions. In fact, you need to get several terms (up to a3) before series gets a term that doesn't cancel to 0:
In [73]: simplify(a3)
Out[73]:
      _____________
 2   ╱   2    2        2
d ⋅╲╱  rₛ ⋅cos (z) ⋅sin (z)
───────────────────────────
           3    6
       8⋅rₛ ⋅cos (z)
If you do f.series(d, n=3), it gives the expansion up to d**2 (n=3 means + O(d**3)). You can simplify the expression quite a bit using
collect(expr.removeO(), d, simplify)
Internally, when you give series an explicit n, it uses the term-by-term generator to get as many terms as it needs to give the proper O(d**n) expansion. If you use the generator yourself (n=None), you need to do this manually.
In general, the iterator is not guaranteed to give you the next order term. If you want guarantees that you have all the terms, you need to provide an explicit n. The O term returned by series is always correct (meaning all the lower order terms are complete).
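The two calling styles are easier to compare on a much simpler function. cos(x) is my choice for illustration here, not the function from the question, and the variable names are assumptions:

```python
import sympy as sy

x = sy.symbols('x')
expr = sy.cos(x)

# n=None returns a generator that yields one term at a time
gen = expr.series(x, n=None)
first_terms = [next(gen) for _ in range(3)]

# An explicit n returns the expansion with an O(x**n) term attached
truncated = expr.series(x, 0, 5)
polynomial = truncated.removeO()   # drop the O term to get a plain polynomial

print(first_terms)
print(truncated)
```

For a simple function like this the generator and the explicit-n call agree; the difference only shows up when some generated terms are complicated expressions that simplify to zero.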
I get this error when using a python script that calculates pi using the Gauss-Legendre algorithm. You can only use up to 1024 iterations before getting this:
C:\Users\myUsernameHere>python Desktop/piWriter.py
End iteration: 1025
Traceback (most recent call last):
File "Desktop/piWriter.py", line 15, in <module>
vars()['t' + str(sub)] = vars()['t' + str(i)] - vars()['p' + str(i)] * math.
pow((vars()['a' + str(i)] - vars()['a' + str(sub)]), 2)
OverflowError: long int too large to convert to float
Here is my code:
import math
a0 = 1
b0 = 1/math.sqrt(2)
t0 = .25
p0 = 1
finalIter = input('End iteration: ')
finalIter = int(finalIter)
for i in range(0, finalIter):
    sub = i + 1
    vars()['a' + str(sub)] = (vars()['a' + str(i)] + vars()['b' + str(i)])/ 2
    vars()['b' + str(sub)] = math.sqrt((vars()['a' + str(i)] * vars()['b' + str(i)]))
    vars()['t' + str(sub)] = vars()['t' + str(i)] - vars()['p' + str(i)] * math.pow((vars()['a' + str(i)] - vars()['a' + str(sub)]), 2)
    vars()['p' + str(sub)] = 2 * vars()['p' + str(i)]
n = i
pi = math.pow((vars()['a' + str(n)] + vars()['b' + str(n)]), 2) / (4 * vars()['t' + str(n)])
print(pi)
Ideally, I want to be able to plug in a very large number as the iteration value and come back a while later to see the result.
Any help appreciated!
Thanks!
Floats can only represent numbers up to sys.float_info.max, or 1.7976931348623157e+308. Once you have an int with more than 308 digits (or so), you are stuck. Your iteration fails when p1024 has 309 digits:
179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216L
You'll have to find a different algorithm for pi, one that doesn't require such large values.
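The limit is easy to reproduce in isolation. A minimal sketch (variable names are mine) showing that the value p reaches after 1024 doublings is the first power of two that no longer fits in a float:

```python
import sys

p = 2 ** 1024               # p starts at 1 and doubles on every iteration
digits = len(str(p))        # number of decimal digits in p

try:
    float(p)                # the int -> float conversion behind the traceback
    overflowed = False
except OverflowError:
    overflowed = True

largest_ok = float(2 ** 1023)   # one doubling earlier still converts fine
print(digits, overflowed, largest_ok, sys.float_info.max)
```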
Actually, you'll have to be careful with floats all around, since they are only approximations. If you modify your program to print the successive approximations of pi, it looks like this:
2.914213562373094923430016933707520365715026855468750000000000
3.140579250522168575088244324433617293834686279296875000000000
3.141592646213542838751209274050779640674591064453125000000000
3.141592653589794004176383168669417500495910644531250000000000
3.141592653589794004176383168669417500495910644531250000000000
3.141592653589794004176383168669417500495910644531250000000000
3.141592653589794004176383168669417500495910644531250000000000
In other words, after only 4 iterations, your approximation has stopped getting better. This is due to inaccuracies in the floats you are using, perhaps starting with 1/math.sqrt(2). Computing many digits of pi requires a very careful understanding of the numeric representation.
As noted in a previous answer, the float type has an upper bound on number size. In typical implementations, sys.float_info.max is 1.7976931348623157e+308, which reflects the 11-bit exponent field of a 64-bit IEEE 754 floating point number: the largest finite value is just under 2**1024. (Note that 1024*math.log(2)/math.log(10) is about 308.2547155599.)
You can extend the exponent range to about six decimal digits by using the Decimal number type, whose default context allows exponents up to 999999. Here is an example (snipped from an ipython interpreter session):
In [48]: import decimal, math
In [49]: g=decimal.Decimal('1e12345')
In [50]: g.sqrt()
Out[50]: Decimal('3.162277660168379331998893544E+6172')
In [51]: math.sqrt(g)
Out[51]: inf
This illustrates that decimal's sqrt() function performs correctly with larger numbers than does math.sqrt().
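Building on that, here is a rough sketch of the same Gauss-Legendre iteration done entirely in Decimal, so both the range and the precision are under your control. The function name gauss_legendre_pi and the 10 guard digits are my own choices, not anything from the original script.

```python
from decimal import Decimal, localcontext

def gauss_legendre_pi(digits, iterations):
    """Approximate pi with the Gauss-Legendre iteration in Decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits + 10            # a few guard digits against rounding
        a = Decimal(1)
        b = 1 / Decimal(2).sqrt()
        t = Decimal(1) / 4
        p = Decimal(1)
        for _ in range(iterations):
            a_next = (a + b) / 2
            b = (a * b).sqrt()
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        return (a + b) ** 2 / (4 * t)

pi_50 = gauss_legendre_pi(50, 6)
print(pi_50)
```

The number of correct digits roughly doubles with each iteration, so a handful of iterations already exceeds 50 digits; asking for more digits means raising the precision, not just the iteration count.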
As noted above, getting lots of digits is going to be tricky, but looking at all those vars hurts my eyes. So here's a version of your code after (1) replacing your use of vars with dictionaries, and (2) using ** instead of the math functions:
a, b, t, p = {}, {}, {}, {}
a[0] = 1
b[0] = 2**-0.5
t[0] = 0.25
p[0] = 1
finalIter = 4
for i in range(finalIter):
    sub = i + 1
    a[sub] = (a[i] + b[i]) / 2
    b[sub] = (a[i] * b[i])**0.5
    t[sub] = t[i] - p[i] * (a[i] - a[sub])**2
    p[sub] = 2 * p[i]
n = i
pi_approx = (a[n] + b[n])**2 / (4 * t[n])
Instead of playing games with vars, I've used dictionaries to store the values (the link there is to the official Python tutorial) which makes your code much more readable. You can probably even see an optimization or two now.
As noted in the comments, you really don't need to store all the values, only the last, but I think it's more important that you see how to do things without dynamically creating variables. Instead of a dict, you could also have simply appended the values to a list, but lists are always zero-indexed and you can't easily "skip ahead" and set values at arbitrary indices. That can occasionally be confusing when working with algorithms, so let's start simple.
Anyway, the above gives me
>>> print(pi_approx)
3.141592653589794
>>> print(pi_approx-math.pi)
8.881784197001252e-16
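For completeness, a sketch of the variant that keeps only the latest values instead of storing every step. The function name gauss_legendre_float is hypothetical, and this is still plain float arithmetic, so it cannot get past float precision.

```python
import math

def gauss_legendre_float(iterations):
    """Gauss-Legendre iteration keeping only the current a, b, t and p."""
    a, b, t, p = 1.0, 2 ** -0.5, 0.25, 1
    for _ in range(iterations):
        a_next = (a + b) / 2
        b = math.sqrt(a * b)
        t -= p * (a - a_next) ** 2
        a = a_next
        p *= 2
    return (a + b) ** 2 / (4 * t)

pi_approx = gauss_legendre_float(3)
print(pi_approx, pi_approx - math.pi)
```

Three iterations already exhaust what a float can hold, so running more of them with floats does not improve the result.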
A simple solution is to install and use the arbitrary-precision mpmath module, which now supports Python 3. However, since I completely agree with DSM that your use of vars() to create variables on the fly is an undesirable way to implement the algorithm, I've based my answer on his rewrite of your code and [trivially] modified it to make use of mpmath to do the calculations.
If you insist on using vars(), you could probably do something similar, although I suspect it might be more difficult and the result would definitely be harder to read, understand, and modify.
from mpmath import mpf # arbitrary-precision float type
a, b, t, p = {}, {}, {}, {}
a[0] = mpf(1)
b[0] = mpf(2**-0.5)
t[0] = mpf(0.25)
p[0] = mpf(1)
finalIter = 10000
for i in range(finalIter):
    sub = i + 1
    a[sub] = (a[i] + b[i]) / 2
    b[sub] = (a[i] * b[i])**0.5
    t[sub] = t[i] - p[i] * (a[i] - a[sub])**2
    p[sub] = 2 * p[i]
n = i
pi_approx = (a[n] + b[n])**2 / (4 * t[n])
print(pi_approx) # 3.14159265358979