Guys, I have the following short version of my code:
ab10=10
ab11=78
ab12=68
bc10=91
bc11=73
bc12=54
df10=87
df11=81
df12=90
b1=ab10/df10
b2=ab11/df11
b3=ab12/df12
c1=bc10/df10
c2=bc11/df11
c3=bc12/df12
m1=bc10/ab10
m2=bc11/ab11
m3=bc12/ab12
Is there a shorter way to do these divisions? I have more and more such variables to calculate, by year, from 10 to 12.
I tried for i in range(10, 12) but it doesn't work. In my actual code the variables ab10, ab11 and so on are themselves computed from other variables. Doing everything manually takes a lot of time, given that the years are not limited to 10, 11 and 12; there are at least 10 of them.
I would appreciate any thoughts or code shared to point me in the right direction and make my work more efficient.
You could also consider using numpy if the amount of numbers is very high; the numbers are stored as arrays (vectors), on which you can perform element-wise operations:
import numpy as np
ab1 = np.array([10,78,68])
bc1 = np.array([91,73,54])
df1 = np.array([87,81,90])
b = ab1/df1
c = bc1/df1
m = bc1/ab1
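Each resulting array then holds one ratio per year, so b[0] plays the role of b1 in the original code (on Python 3, or with float arrays, since integer arrays would floor-divide on Python 2):

print(b)     # roughly [0.115, 0.963, 0.756]
print(b[0])  # same value as ab10/df10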
You can use a dictionary for this kind of problem:
my_dict = {'ab10': 10, 'ab11':78, 'df10':87}
b1 = my_dict['ab10'] / my_dict['df10']
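Building on that, here is a minimal sketch (using the numbers from the question) that keys each series by year, so the ratios can be computed in a comprehension instead of writing b1, b2, b3 by hand. Note that range(10, 12) only produces 10 and 11, which may be why your earlier attempt missed a year:

ab = {10: 10, 11: 78, 12: 68}
bc = {10: 91, 11: 73, 12: 54}
df = {10: 87, 11: 81, 12: 90}

years = range(10, 13)  # 10, 11 and 12; raise the end point for more years

b = {year: ab[year] / df[year] for year in years}
c = {year: bc[year] / df[year] for year in years}
m = {year: bc[year] / ab[year] for year in years}

print(b[10])  # same value as b1 = ab10/df10 in the original code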
I am using sympy's sympify function to evaluate a dynamic formula against data in a dataframe.
import sympy as sy

def evaluate_function(formula, dataframe):
    gfg_exp = sy.sympify(formula)
    dataframe_dict = dataframe.to_dict()
    gfg_exp = gfg_exp.subs(dataframe_dict)
    return gfg_exp

df['result'] = df.apply(lambda row: evaluate_function(formula=condition_to_check, dataframe=row), axis=1)
Sample data looks like:
A B
200 400
320 100
formula: A/B > 1
This works for small datasets (around 20k records finish quickly), but when the dataset is huge, around 1 million records,
it takes a long time to finish the computation.
Is there any other way to do this?
Thanks in advance.
You might try using lambdify to convert your expression into a Python function, rather than using subs. See the documentation https://docs.sympy.org/latest/modules/utilities/lambdify.html#sympy.utilities.lambdify.lambdify
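For illustration, a minimal sketch of that idea (assuming the formula only references column names; condition_to_check and the sample data are taken from the question): the expression is parsed and lambdified once, and the resulting function is called on whole columns instead of row by row with subs:

import pandas as pd
import sympy as sy

df = pd.DataFrame({'A': [200, 320], 'B': [400, 100]})
condition_to_check = 'A/B > 1'

expr = sy.sympify(condition_to_check)                      # parse once
symbols = sorted(expr.free_symbols, key=lambda s: s.name)  # e.g. [A, B]
func = sy.lambdify(symbols, expr, modules='numpy')         # fast vectorised function

df['result'] = func(*[df[s.name] for s in symbols])        # one call over whole columns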
In the image above, column B contains multiples of A1 and column C is the running total C = C + B (working down the rows).
I worked out that in order for C to reach 50 in 20 rows, A1 has to be 0.2631579, but I'd like to simplify that to a function that returns a list: list = exp(50, 20).
I'm not sure about the terminology for such a script, so researching beforehand didn't really bring anything up, sorry.
Well, based on your problem statement, we know that:
Bn = (n-1)×a and Cn = n×(n-1)×a/2 (here a is the value of A1).
So we only have to solve for a with C20 = 50, or more generally Cn = m. This is simply: a = 2×m/(n×(n-1)).
So the function is simply:
def find_threshold(m, n):
    return 2.0 * m / (n * (n - 1))
For your sample input, the lower bound is:
>>> find_threshold(50,20)
0.2631578947368421
If you plug this value into the Excel sheet, you will obtain 50 (although there can be small rounding errors). Assuming arithmetic on the numbers is done in constant time, this script works in constant time as well (O(1)), so it is quite fast even if the row count were huge.
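If you also want the list from the question (presumably the running totals in column C), here is a small sketch reusing find_threshold; the name exp comes from the question:

def exp(total, rows):
    a = find_threshold(total, rows)                 # value for A1
    b = [(i - 1) * a for i in range(1, rows + 1)]   # column B: multiples of a
    c = []
    running = 0.0
    for value in b:                                 # column C: running total of B
        running += value
        c.append(running)
    return c

For example:
>>> round(exp(50, 20)[-1], 9)
50.0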
Here is my problem:
I would like to define an array of persons and change the entries of this array in a for loop. Since I also want to look at the asymptotics of the resulting distribution, I need to repeat this simulation quite a lot, so I use a matrix to store the individual arrays, one per row. I know how to do this with two for loops:
import random
import numpy as np

nobs = 100
rep = 10**2
steps = 10**2
dmoney = 1

state = np.matrix([[10] * nobs] * rep)

for i in range(steps):
    for j in range(rep):
        sample = random.sample(range(state.shape[1]), 2)
        state[j, sample[0]] = state[j, sample[0]] + dmoney
        state[j, sample[1]] = state[j, sample[1]] - dmoney
I thought I would use the multiprocessing library, but I don't know how to do it, because in my simple mind the workers would manipulate the same global matrix in parallel, which I read is not a good idea.
So, how can I do this, to speed up calculations?
Thanks in advance.
OK, so this might not be much use; I haven't profiled it to see if there's a speed-up, but list comprehensions will be a little faster than plain loops anyway.
...
y_ix = np.arange(rep)  # create once, as it is the same for each loop
for i in range(steps):
    # the two locations in the population to swap need refreshing each loop;
    # replace=False picks two distinct indices, like random.sample did
    x_ix = np.array([np.random.choice(nobs, 2, replace=False) for j in range(rep)])
    state[y_ix, x_ix[:, 0]] += dmoney
    state[y_ix, x_ix[:, 1]] -= dmoney
PS: what numpy splits over multiple processors depends on which libraries it was compiled against (BLAS etc.). You will be able to find information online about this.
EDIT: I can confirm, after comparing the original with the numpy-indexed version above, that the original method is faster!
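One note on the multiprocessing part of the question (a sketch, not from the original answer): since each repetition (row of state) evolves independently, workers never need to share the matrix; each one can simulate its own row and the results can be stacked afterwards. simulate_row below is a hypothetical helper:

import numpy as np
from multiprocessing import Pool

nobs = 100
rep = 10**2
steps = 10**2
dmoney = 1

def simulate_row(seed):
    # each worker simulates one independent row of the state matrix
    rng = np.random.RandomState(seed)
    row = np.full(nobs, 10)
    for _ in range(steps):
        a, b = rng.choice(nobs, 2, replace=False)  # two distinct positions
        row[a] += dmoney
        row[b] -= dmoney
    return row

if __name__ == '__main__':
    with Pool() as pool:
        state = np.vstack(pool.map(simulate_row, range(rep)))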
I am working with a large amount of data and am trying to use the fftpack in scipy to help me out. In Matlab, I have code that looks like:
val1 = fft(val1,fft_window_size);
where fft_window_size is 2^19.
This outputted:
val1
ans = -5.5162
ans = 4.5001 - 0.0263i
ans = -2.4261 + 0.0256i
ans = 0.8575 - 0.0233i
ans = -0.2189 + 0.0531i
These are the first 5 elements of val1.
In Python, I used:
val1=scipy.fftpack.fft(val1,fft_window_size)
where the fft_window_size was the same as listed above (2**19) and got completely different answers for the first 5 indices:
('val1', array([-7.6888 +0j, 5.2122 - 0.07556j,
-1.4928+0.02275j, 0.15854 +0.01481j, -0.07528+0.03379j]))
I've looked at as many examples as I could find and couldn't find a good answer as to why they are so vastly different. I checked val1 all the way up to this command and the two matched perfectly (with Python having more decimal places). I don't think this is a rounding issue, but I'm not sure what else to look at.
Any thoughts would be good.
Using: Python 2.7.1 on Windows
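One way to narrow this down (a hypothetical check, not from the original thread): verify that the transform itself agrees between implementations on identical, identically padded input. If it does, the discrepancy most likely comes from the data being passed in on each side rather than from fftpack:

import numpy as np
import scipy.fftpack

x = np.random.rand(1000)   # stand-in for val1
n = 2**19                  # fft_window_size; both libraries zero-pad x to this length

a = scipy.fftpack.fft(x, n)
b = np.fft.fft(x, n)
print(np.allclose(a, b))   # expected True: the transforms themselves match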
Hello, I fear this question has a very simple answer, but I just can't seem to find an appropriate and efficient solution (I have limited Python experience). I am writing an application that downloads historic weather data from a third-party API (Wunderground). The thing is, sometimes there's no value for a given hour (e.g., we have 20 degrees at 5 AM, no value for 6 AM, and 21 degrees at 7 AM). I need exactly one temperature value for every hour, so I figured I could fit the data I do have and evaluate the missing points (using SciPy's polyfit). That's all cool; however, I am having trouble getting my program to detect whether the list has missing hours and, if so, insert the missing hour and calculate a temperature value. I hope that makes sense.
My attempt at handling the hours and temperatures list is the following:
from scipy import polyfit

# Evaluate simple quadratic function
def tempcal(array, x):
    return array[0]*x**2 + array[1]*x + array[2]

# Sample data, note it has missing hours.
# My final hrs list should look like range(25), with matching temperatures at every point
hrs = [1,2,3,6,9,11,13,14,15,18,19,20]
temps = [14.0,14.5,14.5,15.4,17.8,21.3,23.5,24.5,25.5,23.4,21.3,19.8]

# Fit coefficients
coefs = polyfit(hrs, temps, 2)

# Cycle control
i = 0
done = False
while not done:
    # It has a missing hour, insert it and calculate a temperature
    if hrs[i] != i:
        hrs.insert(i, i)
        temps.insert(i, tempcal(coefs, i))
    # We are done, leave now
    if i == 24:
        done = True
    i += 1
I can see why this isn't working; the program will eventually try to access indexes out of range for the hrs list. I am also aware that modifying a list's length inside a loop has to be done carefully. Surely enough, I am either not being careful enough or just overlooking a simpler solution altogether.
In my googling attempts to help myself I came across pandas (the library), but I feel like I can solve this problem without it (and I would rather do so).
Any input is greatly appreciated. Thanks a lot.
When i equals 21, that refers to the twenty-second value in the list, but at that point there are only 21 values, so the index is out of range.
In the future I recommend using PyCharm with breakpoints for debugging, or a try-except construction.
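For example, a hypothetical try-except around the indexing makes the failure obvious:

try:
    current_hour = hrs[i]
except IndexError:
    print('i =', i, 'but hrs only has', len(hrs), 'values')
    raise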
I'm not sure I would recommend this way of interpolating values; I would have used the closest points surrounding the missing values instead of fitting the whole dataset. But using numpy, your proposed way is fairly straightforward.
import numpy as np

hrs = np.array(hrs)
temps = np.array(temps)

newTemps = np.empty(25)
newTemps.fill(-300)  # just fill it with some invalid data; temperatures don't go this low, so it should be safe

# fill in original values (hour h is stored at index h - 1)
newTemps[hrs - 1] = temps

# get indices of missing values
missing = np.nonzero(newTemps == -300)[0]

# calculate and insert missing values
newTemps[missing] = tempcal(coefs, missing + 1)
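For completeness, a self-contained sketch putting the fit from the question together with the fill-in above (np.polyfit is used in place of the scipy import, and hour h is stored at index h - 1 as in the snippet):

import numpy as np

def tempcal(array, x):
    # evaluate the fitted quadratic at x
    return array[0]*x**2 + array[1]*x + array[2]

hrs = np.array([1, 2, 3, 6, 9, 11, 13, 14, 15, 18, 19, 20])
temps = np.array([14.0, 14.5, 14.5, 15.4, 17.8, 21.3, 23.5, 24.5, 25.5, 23.4, 21.3, 19.8])

coefs = np.polyfit(hrs, temps, 2)

newTemps = np.full(25, -300.0)                   # -300 marks "missing"
newTemps[hrs - 1] = temps                        # keep the measured hours
missing = np.nonzero(newTemps == -300)[0]
newTemps[missing] = tempcal(coefs, missing + 1)  # fill the gaps from the fit

print(newTemps)                                  # one temperature per hour 1..25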