I'm trying to learn some Python and am currently working through a few stock market examples. I ran across something called an Accumulated Distribution Line (a technical indicator) and tried to follow its mathematical expression until I reached the following line:
ADL[i] = ADL[i-1] + money flow volume[i]
Now, I have the money flow volume in column index 8 and an empty column for the ADL at index 9 (column positions in a csv file). How would I actually compute the mathematical expression above in Python? (I am currently using Python with Pandas.)
I tried using the range function like this:
for i in range(1,stock["Money flow volume"])):
stock.iloc[0,9] = stock.iloc[(i-1),9] + stock.iloc[i,8]
But I think I'm doing something wrong.
That just looks like a cumulative sum with an unspecified base case, so I'd just use the built-in cumsum functionality.
import pandas as pd
df = pd.DataFrame(dict(mfv=range(10)))
df['adl'] = df['mfv'].cumsum()
This should do what you want relatively efficiently.
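To make the link to the recurrence explicit, here is a minimal sketch that compares the loop form of ADL[i] = ADL[i-1] + money flow volume[i] with cumsum. The sample values and the column names mfv, adl and adl_from_start are made up for illustration, and the base case ADL[0] = money flow volume[0] is an assumption, since the question does not specify one.

```python
import pandas as pd

# Hypothetical money flow volume values; real data would come from the CSV.
df = pd.DataFrame({"mfv": [10.0, -4.0, 6.5, 3.0, -2.5]})

# Loop form of the recurrence, assuming ADL[0] = mfv[0] as the base case.
adl_loop = [df["mfv"].iloc[0]]
for i in range(1, len(df)):
    adl_loop.append(adl_loop[i - 1] + df["mfv"].iloc[i])

# Vectorized form: the running total produces the same values.
df["adl"] = df["mfv"].cumsum()

# A non-zero starting value (hypothetical) can simply be added afterwards.
start = 100.0
df["adl_from_start"] = start + df["mfv"].cumsum()
```

If a different base case is needed, only the starting offset changes; the running total itself stays the same.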
I'm new at using Xarray (using it inside jupyter notebooks), and up to now everything has worked like a charm, except when I started to look at how much RAM is used by my functions (e.g. htop), which is confusing me (I didn't find anything on stackexchange).
I am combining monthly data to yearly means, taking into account month lengths, masking nan values and also using specific months only, which requires the use of groupby and resample.
As I can see from the memory profiler, these operations temporarily take up ~15 GB of RAM, which in itself is not a problem because I have 64 GB of RAM at hand.
Nonetheless, it seems that some memory stays blocked permanently, even though I call these methods inside a function. For the function below, ~4 GB of memory stays blocked although the resulting xarray object only has a size of ~440 MB (55*10**6 float64 entries); with more complex operations, even more memory stays blocked.
Explicitly using del, gc.collect() or DataArray.close() inside the function did not change anything.
A basic function to compute a weighted yearly mean from monthly data looks like this:
import xarray as xr
test=xr.open_dataset(path)['pr']
def weighted_temporal_mean(ds):
    """
    Taken from https://ncar.github.io/esds/posts/2021/yearly-averages-xarray/
    Compute yearly average from monthly data taking into account month length and
    masking nan values
    """
    # Determine the month length
    month_length = ds.time.dt.days_in_month
    # Calculate the weights
    wgts = month_length.groupby("time.year") / month_length.groupby("time.year").sum()
    # Setup our masking for nan values
    cond = ds.isnull()
    ones = xr.where(cond, 0.0, 1.0)
    # Calculate the numerator
    obs_sum = (ds * wgts).resample(time="AS").sum(dim="time")
    # Calculate the denominator
    ones_out = (ones * wgts).resample(time="AS").sum(dim="time")
    # Return the weighted average
    return obs_sum / ones_out
wm=weighted_temporal_mean(test)
print("nbytes in MB:", wm.nbytes / (1024*1024))
Any idea how to ensure that the memory is freed up, or am I overlooking something and this behavior is actually expected?
Thank you!
The only hypothesis I have for this behavior is that some of the operations on the passed-in ds modify it in place and increase its size, since, apart from the returned object, it is the only object that should survive after the function has run.
That can be easily verified by using del on the ds structure used as input after the function is run. (If you need the data afterwards, re-read it, or make a deepcopy before calling the function).
If that does not resolve the problem, then this is an issue with the xarray project, and I'd advise you to open an issue in their project.
I have a big table of data that I read from Excel in Python and on which I perform some calculations. My dataframe looks like this (my real table is bigger and more complex, but the logic stays the same):
with : My_cal_spread=set1+set2 and Errors = abs( My_cal_spread - spread)
My goal is to use Scipy Minimize to find the single combination of (Set1, Set2) that can be used in every row so that My_cal_spread is as close as possible to Spread, that is, the combination that gives the minimum possible sum of errors.
This is the solution I get when I use the Excel solver; I'm looking to implement the same solution using Scipy. Thanks
My code looks like this :
lnt=len(df['Spread'])
df['my_cal_Spread']=''
i=0
while i<lnt:
    df['my_cal_Spread'].iloc[i]=df['set2'].iloc[i]+df['set1'].iloc[i]
    df['errors'].iloc[i] = abs(df['my_cal_Spread'].iloc[i]-df['Spread'].iloc[i])
    i=i+1
errors_sum=sum(df['errors'])
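One possible way to set this up with Scipy is sketched below. It assumes the simplest reading of the question: a single pair (set1, set2) is shared by every row, and the objective is the sum of |set1 + set2 - Spread|. The data, the helper name total_abs_error and the starting guess are all hypothetical. Note that in this toy form only the sum set1 + set2 matters, so the optimum is not unique; a real table would presumably tie set1 and set2 to other columns.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize

# Hypothetical data standing in for the Excel table.
df = pd.DataFrame({"Spread": [1.0, 2.0, 2.5, 4.0, 10.0]})

def total_abs_error(params, spread):
    """Sum of |set1 + set2 - Spread| over all rows for one (set1, set2) pair."""
    set1, set2 = params
    my_cal_spread = set1 + set2
    return np.abs(my_cal_spread - spread).sum()

spread = df["Spread"].to_numpy()
initial_guess = np.array([0.0, 0.0])  # arbitrary starting point

# Nelder-Mead is derivative-free, which suits the non-smooth absolute value.
result = minimize(total_abs_error, initial_guess, args=(spread,), method="Nelder-Mead")

# Write the fitted values back without a row-by-row loop.
set1_opt, set2_opt = result.x
df["my_cal_Spread"] = set1_opt + set2_opt
df["errors"] = (df["my_cal_Spread"] - df["Spread"]).abs()
errors_sum = df["errors"].sum()
```

The column arithmetic at the end replaces the while loop: pandas applies the subtraction and abs() to the whole column at once.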
I am brand new to Python and am attempting to convert a function I wrote in R to Python. The R function is described here:
How to optimize this process?
From my reading it looks like the best way to do this in python would be to use a for loop that would take the following form
for line 1 in probe test
    find user in U_lookup
    find movie in M_lookup
    take the value found in U_lookup and retrieve that line number from knn_text
    take the values found in that row of knn_text, and retrieve the line numbers from dfm
    for those line numbers in dfm, retrieve column=U_lookup
    take the average of the non zero values found
    save value into pandas dataframe in new column for that line
Is this the most efficient (in terms of speed of calculation) way to complete an operation like this? Coming from R so I wasn't sure if there was better functionality for something like this within the pandas package or not.
As a followup, is there an equivalent in python to the function dput() in R? dput essentially provides code to easily share a subset of data for questions like this.
You can use df.apply(my_func, axis=1) to apply a function/calculation to each row of a dataframe,
where my_func contains the required calculations.
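As a rough illustration of the row-wise pattern, the sketch below averages the non-zero values in each row, which mirrors one step of the pseudocode in the question. The table, its column names and the body of my_func are invented for the example. Because apply calls a Python function once per row, a vectorized expression is usually faster when one exists, so an equivalent one is shown for comparison.

```python
import pandas as pd

# Hypothetical ratings table; column names are illustrative only.
df = pd.DataFrame({"a": [0, 4, 3], "b": [5, 0, 3], "c": [0, 0, 1]})

def my_func(row):
    """Average of the non-zero values in one row (NaN if there are none)."""
    non_zero = row[row != 0]
    return non_zero.mean()

# Row-wise version: my_func receives each row as a Series.
row_wise = df.apply(my_func, axis=1)

# Vectorized version: mask zeros as NaN, then let mean() skip them.
vectorized = df.where(df != 0).mean(axis=1)

df["avg_non_zero"] = row_wise
```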
I'm currently writing some code involving financial calculations, in particular an exponential moving average. To do the job I have tried Pandas and Talib:
talib_ex=pd.Series(talib.EMA(self.PriceAdjusted.values,timeperiod=200),self.PriceAdjusted.index)
pandas_ex=self.PriceAdjusted.ewm(span=200,adjust=True,min_periods=200-1).mean()
They both work fine, but they give different results at the beginning of the array:
So is there some parameter I need to change in pandas' EWMA, or is this a bug I should worry about?
Thanks in advance
Luca
For the talib ema, the formula is:
So if you want the pandas EMA to match the talib one, you should use it as:
pandas_ex=self.PriceAdjusted.ewm(span=200,adjust=False,min_periods=200-1).mean()
Set adjust to False, according to the documentation (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html), if you want to use the same formula as talib:
When adjust is True (default), weighted averages are calculated using weights (1-alpha)**(n-1), (1-alpha)**(n-2), ..., 1-alpha, 1.
When adjust is False, weighted averages are calculated recursively as:
weighted_average[0] = arg[0]; weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i].
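A small check of the quoted recursion against pandas can look like the sketch below. The price values and the short span are made up so the numbers are easy to follow; alpha = 2 / (span + 1) is how pandas derives alpha from span. This only compares the two pandas modes with a hand-written loop and says nothing about how talib seeds its own average.

```python
import numpy as np
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.5, 12.0, 13.0, 14.5])  # hypothetical prices
span = 3
alpha = 2.0 / (span + 1)  # pandas' definition of alpha when span is given

# Manual recursion: weighted_average[0] = arg[0], then the update rule.
manual = [prices.iloc[0]]
for x in prices.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

# adjust=False follows the recursion; adjust=True uses the explicit weights.
recursive = prices.ewm(span=span, adjust=False).mean()
weighted = prices.ewm(span=span, adjust=True).mean()

matches_manual = np.allclose(recursive.to_numpy(), manual)
differs_from_adjusted = not np.allclose(recursive.to_numpy(), weighted.to_numpy())
```

The two modes agree on the first value and drift apart most in the early rows, which is where the question reports the mismatch.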
You can also reference here:
https://en.wikipedia.org/wiki/Moving_average
PS: However, in my project I still find some small differences between talib and pandas.ewm, and I don't know why yet...
When using the xarray package for Python 2.7, is it possible to group over multiple parameters like you can in pandas? In essence, an operation like:
data.groupby(['time.year','time.month']).mean()
if you wanted to get mean values for each year and month of a dataset.
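For reference, the pandas behaviour the question alludes to can be sketched as follows, using a made-up daily series. Passing a list of keys to groupby yields one group per (year, month) pair.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering two years.
idx = pd.date_range("2000-01-01", periods=730, freq="D")
data = pd.Series(np.arange(730, dtype=float), index=idx)

# Group by two keys at once; the result is indexed by (year, month).
monthly_mean = data.groupby([data.index.year, data.index.month]).mean()
```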
Unfortunately, xarray does not support grouping with multiple arguments yet. This is something we would like to support and it would be relatively straightforward, but nobody has had the time to implement it yet (contributions would be welcome!).
An easy way around is to construct a multiindex and group by that "new" coordinate:
da_multiindex = da.stack(my_multiindex=['time.year','time.month'])
da_mean = da_multiindex.groupby("my_multiindex").mean()
da_mean.unstack() # go back to normal index