performing more complex calculations on groupby objects


I want to run some calculations on a large dataset of options. First I split the data by strike price and expiry, then I want to apply the set of calculations shown below to each subgroup. I have been able to split the data with groupby, and the calculation works when tested on a single subgroup. The only problem is combining the two.
Here is the code I used to group my data:
grouped = df.groupby(['Expiry', 'Strike'])
I read online that the apply function can be used for this, but the examples only covered simple functions such as sums or averages.
Here is the calculation that I would like to perform on each subgroup, where x, y, z, u and R are columns of the subset (the same columns for every subgroup):
import numpy as np
def p(d, S, B, c):
    return d * S + B - c
def b_t(r, b_old, S, d, d_old, t):
    return np.exp(r * t) * b_old + S * (d_old - d)
def e_t(d_old, S, c, r, t, b_old):
    return d_old * S - c + np.exp(r * t) * b_old
P_results = []
B_results = []
E_results = []
# b_old and d_old must hold the starting values before the loop runs
for i, (d, S, c, t, r) in enumerate(zip(x, y, z, u, R)):
    B = b_t(r, b_old, S, d, d_old, t)
    P = p(d, S, B, c)
    E = e_t(d_old, S, c, r, t, b_old)
    print('i={},P={},B={},E={}'.format(i, P, B, E))
    B_results.append(B)
    P_results.append(P)
    E_results.append(E)
    # carry the current values into the next step
    b_old = B
    d_old = d
I thought it might work if I could save each subset as its own DataFrame variable, but I haven't been able to do that.
I hope this is clear. I think posting some data would help, but I am not sure how best to upload it here.
I'd much appreciate your help!
UPDATE 1: Found a solution that works
grouped = df.groupby(['Expiry', 'Strike'])
lg = list(grouped)
P_results = []
l_results = []
B_results = []
E_results = []
for l in range(len(lg)):
    df2 = lg[l][1]
    d_old = df2.iloc[0, 4]
    S_old = df2.iloc[0, 8]
    c_old = df2.iloc[0, 10]
    b_old = c_old - d_old * S_old
    x = df2.iloc[1:, 4]
    y = df2.iloc[1:, 8]
    z = df2.iloc[1:, 10]
    u = df2.iloc[1:, 9]
    R = df2.iloc[1:, 7]
    for i, (d, S, c, t, r) in enumerate(zip(x, y, z, u, R)):
        B = b_t(r, b_old, S, d, d_old, t)
        P = p(d, S, B, c)
        E = e_t(d_old, S, c, r, t, b_old)
        print('i={},P={},B={},E={}'.format(i, P, B, E))
        l_results.append(l)
        B_results.append(B)
        P_results.append(P)
        E_results.append(E)
        b_old = B
        d_old = d
BB = pd.DataFrame(np.column_stack([l_results, P_results,
                                   E_results, B_results]),
                  columns=['l', 'P', 'E', 'B'])
All I did was convert grouped into a list, pull out each group in a for loop, and then run the calculations in a second, inner for loop. The output is not the prettiest (l_results is there to show which group each result belongs to), but it seems sufficient for now. If there is a better way, please let me know!
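One possible alternative, sketched under assumptions: iterate over the groupby object directly, which yields each key together with its sub-DataFrame, and collect one result frame per group. The column names delta, spot, price, dt and rate are hypothetical stand-ins for the positional columns 4, 8, 10, 9 and 7 used above, replicate is an illustrative helper rather than part of pandas, and the data is made up.

```python
import numpy as np
import pandas as pd

def replicate(sub):
    """Run the recurrence over one (Expiry, Strike) group."""
    first = sub.iloc[0]
    d_old = first["delta"]
    b_old = first["price"] - d_old * first["spot"]
    rows = []
    for row in sub.iloc[1:].itertuples(index=False):
        d, S, c, t, r = row.delta, row.spot, row.price, row.dt, row.rate
        growth = np.exp(r * t)
        B = growth * b_old + S * (d_old - d)   # same formula as b_t
        P = d * S + B - c                      # same formula as p
        E = d_old * S - c + growth * b_old     # same formula as e_t
        rows.append({"P": P, "E": E, "B": B})
        b_old, d_old = B, d
    return pd.DataFrame(rows)

# Tiny made-up dataset: two groups of three rows each.
df = pd.DataFrame({
    "Expiry": ["2024-01"] * 3 + ["2024-02"] * 3,
    "Strike": [100] * 3 + [110] * 3,
    "delta": [0.5, 0.6, 0.4, 0.3, 0.2, 0.1],
    "spot": [100.0, 102.0, 98.0, 100.0, 99.0, 97.0],
    "price": [5.0, 6.0, 4.0, 2.0, 1.5, 1.0],
    "dt": [1 / 252] * 6,
    "rate": [0.01] * 6,
})

pieces = []
for (expiry, strike), sub in df.groupby(["Expiry", "Strike"]):
    out = replicate(sub)
    # Label every result row with the group it came from.
    out.insert(0, "Strike", strike)
    out.insert(0, "Expiry", expiry)
    pieces.append(out)

BB = pd.concat(pieces, ignore_index=True)
print(BB)
```

Labelling each piece with its key replaces the numeric l column, so the results can be filtered by expiry and strike afterwards.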

Related

Finding the positions of multiple objects in an image, but only when they are next to each other

I want to find the positions of multiple objects in an image, but only when they are next to each other. I would like to use parameters to decide whether 2 or 3 objects are next to each other, and then get their coordinates.
At the moment I can only find one object, and that works. Code:
import base64
import pickle
import time
from PIL import Image
def diff(a, b):
    # squared distance between two pixel tuples
    return sum((a - b) ** 2 for a, b in zip(a, b))
def g_c_o(c, _b):
    time.sleep(1)
    s_i_p = ''
    c_b_64 = _b.execute_script("return arguments[0].toDataURL('image/png').substring(21);", c)
    c_i = base64.b64decode(c_b_64)
    with open(r"canvas.png", 'wb') as f:
        f.write(c_i)
    with open("files/important.pickle", "rb") as f:
        d_r = pickle.load(f)
    if s_i_p == '' and d_r[10] is not None and d_r[10] != '':
        s_i_p = d_r[10]
    c_s_i = Image.open('canvas.png')
    i_s = c_s_i.size
    s_i = Image.open("findMe.png")
    w_s = s_i.size
    x0, y0 = w_s[0] // 2, w_s[1] // 2
    p = s_i.getpixel((x0, y0))[:-1]
    # b holds (best difference so far, x, y)
    b = (100, 0, 0)
    c = []
    for x in range(i_s[0]):
        for y in range(i_s[1]):
            i_p_s = c_s_i.getpixel((x, y))
            d = diff(i_p_s, p)
            if d < b[0]:
                b = (d, x, y)
    x, y = b[1:]
    return [x, y]
Here are the images, in case they help:
Image to find: [image]
Find in: [image]
It works, and this is what I find: [image]
Can anyone tell me if I can find 2 or 3 that are next to each other and coordinates of them, not just the one?
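One way to approach this, sketched under assumptions: instead of keeping only the single best pixel, collect every position whose diff is below a threshold, then group the positions that lie close together and keep the groups of the wanted size. The helper name group_neighbors, the max_gap parameter and the sample coordinates are hypothetical; the sketch works on plain (x, y) tuples and does not touch the image code above.

```python
def group_neighbors(points, max_gap):
    """Group (x, y) points; two points are linked when both their x and y
    distances are at most max_gap. Chains of linked points form one group."""
    remaining = set(points)
    groups = []
    while remaining:
        stack = [remaining.pop()]
        group = []
        while stack:
            px, py = stack.pop()
            group.append((px, py))
            near = {q for q in remaining
                    if abs(q[0] - px) <= max_gap and abs(q[1] - py) <= max_gap}
            remaining -= near
            stack.extend(near)
        groups.append(sorted(group))
    return sorted(groups)

# Made-up match positions: three in a row, one isolated, two in a row.
matches = [(10, 50), (40, 50), (70, 50), (300, 200), (120, 400), (150, 400)]
groups = group_neighbors(matches, max_gap=35)

# Keep only the groups made of 2 or 3 neighbouring objects.
wanted = [g for g in groups if len(g) in (2, 3)]
print(wanted)
```

A sensible max_gap would probably be close to the width of the object being searched for, so that only objects sitting side by side are linked.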

Intersecting two sets, retaining all (up to) three parts efficiently

If you have two sets a and b and intersect them, there are three interesting parts (which may be empty): h(ead) elements of a not in b, i(ntersection) elements in both a and b, and t(ail) elements of b not in a.
For example: {1, 2, 3} & {2, 3, 4} -> h:{1}, i:{2, 3}, t:{4} (not actual Python code, clearly)
One very clean way to code that in Python:
h, i, t = a - b, a & b, b - a
I figure that this can be slightly more efficient though:
h, t = a - (i := a & b), b - i
This first computes the intersection and then subtracts only that from a and from b, which should help if i is small and a and b are large, although whether it is truly faster depends on how the subtraction is implemented. As far as I can tell, it's not likely to be worse.
I was unable to find such an operator or function. Since I can imagine efficient implementations that would perform the three-way split of a and b into h, i, and t in fewer iterations, am I missing something like this that already exists?
from magical_set_stuff import hit
h, i, t = hit(a, b)

It's not in Python, and I haven't seen such a thing in a 3rd-party library either.
Here's a perhaps unexpected approach that's largely insensitive to which sets are bigger than others, and to how much overlap among inputs there may be. I dreamed it up when facing a related problem: suppose you had 3 input sets, and wanted to derive the 7 interesting sets of overlaps (in A only, B only, C only, both A and B, both A and C, both B and C, or in all 3). This version strips that down to the 2-input case. In general, assign a unique power of 2 to each input, and use those as bit flags:
from collections import defaultdict
def hit(a, b):
    # flag 1: seen in a, flag 2: seen in b, flag 3: seen in both
    x2flags = defaultdict(int)
    for x in a:
        x2flags[x] = 1
    for x in b:
        x2flags[x] |= 2
    result = [None, set(), set(), set()]
    for x, flag in x2flags.items():
        result[flag].add(x)
    return result[1], result[3], result[2]
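As a sketch of the general idea described above (a unique power of 2 per input), the same loop can be written for any number of sets. The function name split_by_membership is hypothetical, and the result is keyed by the bit-flag value instead of being returned as a tuple.

```python
from collections import defaultdict

def split_by_membership(*sets):
    """Map each bit-flag combination to the elements with that membership.

    Input number k contributes the flag 1 << k, so with three inputs the key 5
    (binary 101) holds the elements found in the first and third set only.
    """
    flags = defaultdict(int)
    for k, s in enumerate(sets):
        bit = 1 << k
        for x in s:
            flags[x] |= bit
    # One (possibly empty) set per non-zero flag combination.
    parts = {flag: set() for flag in range(1, 1 << len(sets))}
    for x, flag in flags.items():
        parts[flag].add(x)
    return parts

a, b, c = {1, 2, 3, 4}, {3, 4, 5, 6}, {1, 4, 6, 7}
parts = split_by_membership(a, b, c)
for flag, members in parts.items():
    print(format(flag, "03b"), sorted(members))
```

With three inputs this yields the 7 overlap sets mentioned above; with two inputs the keys 1, 3 and 2 correspond to h, i and t.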

I won't accept my own answer unless nobody manages to beat my own solution or any of the good and concise Python ones.
But for anyone interested in some numbers:
from random import randint
from timeit import timeit
def grismar(a: set, b: set):
    h, i, t = set(), set(), b.copy()
    for x in a:
        if x in t:
            i.add(x)
            t.remove(x)
        else:
            h.add(x)
    return h, i, t
def good(a: set, b: set):
    return a - b, a & b, b - a
def better(a: set, b: set):
    h, t = a - (i := a & b), b - i
    return h, i, t
def ok(a: set, b: set):
    return a - (a & b), a & b, b - (a & b)
from collections import defaultdict
def tim(a, b):
    x2flags = defaultdict(int)
    for x in a:
        x2flags[x] = 1
    for x in b:
        x2flags[x] |= 2
    result = [None, set(), set(), set()]
    for x, flag in x2flags.items():
        result[flag].add(x)
    return result[1], result[3], result[2]
def pychopath(a, b):
    h, t = set(), b.copy()
    h_add = h.add
    t_remove = t.remove
    i = {x for x in a
         if x in t and not t_remove(x) or h_add(x)}
    return h, i, t
def enke(a, b):
    t = b - (i := a - (h := a - b))
    return h, i, t
xs = set(randint(0, 10000) for _ in range(10000))
ys = set(randint(0, 10000) for _ in range(10000))
# validation
g = (f(xs, ys) for f in (grismar, good, better, ok, tim, enke))
l = set(tuple(tuple(sorted(s)) for s in t) for t in g)
assert len(l) == 1, 'functions are not equivalent'
# warmup, not competing
timeit(lambda: grismar(xs, ys), number=500)
# competition
print('a - b, a & b, b - a ', timeit(lambda: good(xs, ys), number=10000))
print('a - (i := a & b), b - i ', timeit(lambda: better(xs, ys), number=10000))
print('a - (a & b), a & b, b - (a & b) ', timeit(lambda: ok(xs, ys), number=10000))
print('tim ', timeit(lambda: tim(xs, ys), number=10000))
print('grismar ', timeit(lambda: grismar(xs, ys), number=10000))
print('pychopath ', timeit(lambda: pychopath(xs, ys), number=10000))
print('b - (i := a - (h := a - b)) ', timeit(lambda: enke(xs, ys), number=10000))
Results:
a - b, a & b, b - a 5.6963334
a - (i := a & b), b - i 5.3934624
a - (a & b), a & b, b - (a & b) 9.7732018
tim 16.3080373
grismar 7.709292500000004
pychopath 6.76331460000074
b - (i := a - (h := a - b)) 5.197220600000001
So far, the optimisation proposed by @enke in the comments appears to win out:
    t = b - (i := a - (h := a - b))
    return h, i, t
Edit: added @Pychopath's result, which is indeed substantially faster than my own, although @enke's result is still the one to beat (and likely won't be with just Python). If @enke posts their own answer, I'd happily accept it as the answer.

Optimized version of yours, seems to be about 20% faster than yours in your benchmark:
def hit(a, b):
    h, t = set(), b.copy()
    h_add = h.add
    t_remove = t.remove
    i = {x for x in a
         if x in t and not t_remove(x) or h_add(x)}
    return h, i, t
And you might want to do this at the start, especially if the two sets can have significantly different sizes:
    if len(a) > len(b):
        return hit(b, a)[::-1]
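Putting the two snippets together, as a sketch: the size check goes first, so the comprehension always runs over the smaller set. The name hit_swapped and the sample sets are only for illustration; the last lines compare the result with the plain set operators.

```python
def hit_swapped(a, b):
    # Iterate over the smaller set. Swapping the inputs swaps head and tail,
    # so the result tuple is reversed on the way back.
    if len(a) > len(b):
        return hit_swapped(b, a)[::-1]
    h, t = set(), b.copy()
    h_add = h.add
    t_remove = t.remove
    # set.remove and set.add both return None, so "not t_remove(x)" is True
    # (x lands in i) while "h_add(x)" is falsy (x lands in h only).
    i = {x for x in a
         if x in t and not t_remove(x) or h_add(x)}
    return h, i, t

a = set(range(0, 20))
b = set(range(15, 23))
print(hit_swapped(a, b))
print(hit_swapped(a, b) == (a - b, a & b, b - a))
```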

sympy - substitute a specific power

I have this equality.
import sympy as sp
D, L, V = sp.symbols("D, L, V", real=True, positive=True)
Veq = sp.Eq(V, sp.pi * D**3 / 4 * (sp.Rational(2, 3) + L / D))
I would like to solve Veq for D**3. If I try the direct approach, sp.solve(Veq, D**3), the computation takes a while and eventually gives me a tremendously long result that is useless to me.
My attempt was to substitute D**3 with a new symbol and then solve for it. Unfortunately, the substitution also replaces the other D in the equality:
t = sp.symbols("t")
print(Veq.subs(D**3, t))
>>> Eq(V, pi*t*(L/t**(1/3) + 2/3)/4)
Note the term L/t**(1/3). I would like it to be L/D after the substitution. So far I've managed to reach my goal by manipulating the expression with this code:
res = sp.Mul(*[a.subs(D**3, sp.symbols("t")) if a.has(D**3) else a for a in Veq.args[1].args])
Veq = sp.Eq(V, res)
print(Veq)
>>> Eq(V, pi*t*(2/3 + L/D)/4)
I'm wondering, is there some flag for subs that I can use to reach my goal? Or some other method?

If you want the substitution to be exact you can use the exact flag:
>>> var('D V L')
(D, V, L)
>>> Veq = sp.Eq(V, sp.pi * D**3 / 4 * (sp.Rational(2, 3) + L / D))
>>> Veq.subs(D**3,y,exact=True)
Eq(V, pi*y*(2/3 + L/D)/4)
>>> solve(Veq.subs(D**3,y,exact=True),y)
[12*D*V/(pi*(2*D + 3*L))]
The exact flag appears to be ignored when assumptions are given:
>>> D, L, V = symbols("D, L, V", real=True, positive=True)
>>> (D**3+D).subs(D**3,y,exact=True)
y**(1/3) + y
>>> D, L, V = symbols("D, L, V")
>>> (D**3+D).subs(D**3,y,exact=True)
D + y
You can use replace for your situation:
>>> D, L, V = symbols("D, L, V", real=True, positive=True)
>>> (D**3+D).replace(D**3,y)
D + y
But since your expression is a Relational you have to use replace on the arguments, not the Relational (or else you will get an error):
>>> eq = Eq(D**3, D - 1)
>>> eq.func(*[a.replace(D**3,y) for a in eq.args])
Eq(y, D - 1)
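Applied to the original equation, the replace approach might look like the following sketch. It assumes SymPy is installed; y is a fresh symbol introduced here to stand for D**3, and each side of the equation is rewritten separately so that replace is never called on the Relational itself.

```python
import sympy as sp

D, L, V = sp.symbols("D, L, V", real=True, positive=True)
y = sp.Symbol("y", positive=True)  # stands in for D**3

Veq = sp.Eq(V, sp.pi * D**3 / 4 * (sp.Rational(2, 3) + L / D))

# Rewrite each side on its own, then rebuild the equation.
Veq_y = sp.Eq(*[side.replace(D**3, y) for side in Veq.args])

# The equation is linear in y, so solving is quick.
solutions = sp.solve(Veq_y, y)
print(Veq_y)
print(solutions)
```

The L/D term is left alone because replace only touches subexpressions that match D**3 exactly.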

How to find reverse of pow(a,b,c) in python?

The pow(a, b, c) function in Python returns (a**b) % c. If I have the values of b and c and the result of this operation (res = pow(a, b, c)), how can I find the value of a?

Despite the statements in the comments, this is not the discrete logarithm problem. It more closely resembles the RSA problem, in which c is the product of two large primes, b is the encryption exponent, and a is the unknown plaintext. I always like to make x the unknown variable you want to solve for, so you have y = x^b mod c where y, b, and c are known, and you want to solve for x. Solving it involves the same basic number theory as in RSA: you must compute z = b^(-1) mod λ(c), and then you can solve for x via x = y^z mod c. λ is Carmichael's lambda function, but you can also use Euler's phi (totient) function instead. We have reduced the original problem to computing an inverse mod λ(c). This is easy to do if c is easy to factor or we already know the factorization of c, and hard otherwise. If c is small then brute force is an acceptable technique and you can ignore all the complicated math.
Here is some code showing these steps:
import functools
import math
def egcd(a, b):
    """Extended gcd of a and b. Returns (d, x, y) such that
    d = a*x + b*y where d is the greatest common divisor of a and b."""
    x0, x1, y0, y1 = 1, 0, 0, 1
    while b != 0:
        q, a, b = a // b, b, a % b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0
def inverse(a, n):
    """Returns the inverse x of a mod n, i.e. x*a = 1 mod n. Raises a
    ZeroDivisionError if gcd(a,n) != 1."""
    d, a_inv, n_inv = egcd(a, n)
    if d != 1:
        raise ZeroDivisionError('{} is not coprime to {}'.format(a, n))
    else:
        return a_inv % n
def lcm(*x):
    """
    Returns the least common multiple of its arguments. At least two arguments must be
    supplied.
    :param x:
    :return:
    """
    if not x or len(x) < 2:
        raise ValueError("at least two arguments must be supplied to lcm")
    lcm_of_2 = lambda x, y: (x * y) // math.gcd(x, y)
    return functools.reduce(lcm_of_2, x)
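def carmichael_pp(p, e):
    # Carmichael's function of the prime power p**e.
    phi = pow(p, e - 1) * (p - 1)
    # lambda equals phi for odd primes and for 2 and 4;
    # for 2**e with e >= 3 it is phi / 2 = 2**(e - 2).
    if (p % 2 == 1) or (e <= 2):
        return phi
    else:
        return phi // 2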
def carmichael_lambda(pp):
    """
    pp is a sequence representing the unique prime-power factorization of the
    integer whose Carmichael function is to be computed.
    :param pp: the prime-power factorization, a sequence of pairs (p,e) where p is prime and e>=1.
    :return: Carmichael's function result
    """
    return lcm(*[carmichael_pp(p, e) for p, e in pp])
a = 182989423414314437
b = 112388918933488834121
c = 128391911110189182102909037 * 256
y = pow(a, b, c)
lam = carmichael_lambda([(2,8), (128391911110189182102909037, 1)])
z = inverse(b, lam)
x = pow(y, z, c)
print(x)
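For a feel of the steps with numbers small enough to check by hand, here is a sketch using c = 7 * 11. It assumes Python 3.8 or later, where the built-in pow accepts a negative exponent together with a modulus; the values of a, b and c are made up for illustration. Because c is tiny, the brute-force search mentioned above is included as a cross-check.

```python
import math

a = 5                # the value we pretend not to know
b = 7                # known exponent
c = 7 * 11           # modulus with a known factorization

y = pow(a, b, c)     # the known result

# Carmichael's lambda for a product of two distinct odd primes p and q
# is lcm(p - 1, q - 1).
lam = (6 * 10) // math.gcd(6, 10)

# b must be coprime to lam; otherwise the inverse does not exist
# and pow raises ValueError.
z = pow(b, -1, lam)
x = pow(y, z, c)
print(y, lam, z, x)

# Brute force over all residues, feasible only because c is tiny.
candidates = [k for k in range(c) if pow(k, b, c) == y]
print(candidates)
```

Here the brute-force search finds a single candidate, which matches the value recovered through the inverse exponent.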

The best you can do is something like this:
a = 12
b = 5
c = 125
def is_int(a):
    # compare against the nearest integer so values just below an integer also pass
    return abs(a - round(a)) <= 1e-5
# ============= Without C ========== #
print("Process without c")
rslt = pow(a, b)
print("a**b:", rslt)
print("a:", pow(rslt, (1.0 / b)))
# ============= With C ========== #
print("\nProcess with c")
rslt = pow(a, b, c)
i = 0
while True:
    a = pow(rslt + i*c, (1.0 / b))
    if is_int(a):
        break
    else:
        i += 1
print("a**b % c:", rslt)
print("a:", a)
You can never be sure that you have found the original value of a; the loop stops at the first value that is compatible with your inputs. The algorithm relies on a, b and c being integers. If they are not, there is no way to single out a likely candidate for the original value.
Outputs:
Process without c
a**b: 248832
a: 12.000000000000002
Process with c
a**b % c: 82
a: 12.000000000000002

whats wrong with this while loop in Python?

OK, so I have spent hours trying to resolve this. I feel it's some simple error, but I cannot find it.
The section I am having issues with is the second half of the code. There seems to be an infinite loop somewhere in the two nested while loops. If anyone is able to help, that would be great. Thanks in advance.
import sympy as sym
import random
A, B, C, D, E, F, G, H, I, J = sym.symbols('A, B, C, D, E, F, G, H, I, J')
picks_a_person = [A, B, C, D, E, F, G, H, I, J] #List of people picking a name from a hat
person_gets_picked = [A, B, C, D, E, F, G, H, I, J] # List of names drawn from a hat
def re_draws(p):
    n = 0
    count = 0
    while n < 1000: #Repeats the test 1000 times for an accurate percentage
        n += 1
        random.shuffle(person_gets_picked) #Chooses a random order of the list of names drawn
        for i in range(p):
            if person_gets_picked[i] == picks_a_person[i]: #Checks for all 'p' elements of the lists are different
                count = count + 1
    print("count = " + str(count)) #Returns the number of times a re-draw was not required
import numpy as np
from collections import Counter
total = []
while len(total) < 1000:
    order = []
    picks_a_person = [A, B, C, D, E, F, G, H, I, J]
    person_gets_picked = [A, B, C, D, E, F, G, H, I, J]
    while len(order) < 10:
        a = person_gets_picked[random.randint(0, (len(person_gets_picked)-1))]
        if a != picks_a_person[0]:
            order.append(a)
            person_gets_picked.remove(a)
            del picks_a_person[0]
    total.append(order)
Counter(np.array(total)[:,1])

While there are a lot of odd things about your code, this is where it gets into an infinite loop:
    picks_a_person = [A, B, C, D, E, F, G, H, I, J]
    person_gets_picked = [A, B, C, D, E, F, G, H, I, J]
    while len(order) < 10:
        a = person_gets_picked[random.randint(0, (len(person_gets_picked)-1))]
        if a != picks_a_person[0]:
            order.append(a)
            person_gets_picked.remove(a)
            del picks_a_person[0]
    total.append(order)
Let's do some rubber duck debugging. Each successful draw removes the drawn name from person_gets_picked and the current picker from the front of picks_a_person, so after nine successful draws picks_a_person = [J] and person_gets_picked holds the single name nobody has drawn yet. What happens if that name is J? The order list still has only nine elements, so the while loop keeps running.
At that point random.randint(0, (len(person_gets_picked)-1)) will always return 0, a will always be set to J, and since picks_a_person[0] == J the condition if a != picks_a_person[0] will never evaluate to True. Hence order will never get its 10th element, and you have yourself an infinite loop.
It doesn't take an unusual run of random numbers for this to occur: all that needs to happen is for J to remain undrawn until the last person picks.
So why don't you write your whole loop as:
persons = [A, B, C, D, E, F, G, H, I, J]
persons_num = len(persons)
total = [random.sample(persons, persons_num) for _ in range(1000)]
And you're done.
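Note that a plain shuffle can still hand someone their own name. If that rule matters, one simple option (an assumption about the intent, not something stated in the answer above) is to redraw the whole order until nobody holds their own name. The function name draw_without_self_picks and the string labels are hypothetical.

```python
import random

def draw_without_self_picks(persons, rng=random):
    """Shuffle until no person is paired with their own name.

    For lists of this size roughly one attempt in three succeeds,
    so only a few reshuffles are needed on average.
    """
    if len(persons) < 2:
        raise ValueError("need at least two persons")
    while True:
        order = rng.sample(persons, len(persons))
        if all(picker != picked for picker, picked in zip(persons, order)):
            return order

persons = list("ABCDEFGHIJ")
total = [draw_without_self_picks(persons) for _ in range(1000)]
print(total[0])
```

Redrawing the whole order avoids the dead end of the original loop, because an attempt never gets stuck on a last name that cannot be assigned.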
