TA-Lib evaluation order for building series - Python

I'm building indicator series from market prices using TA-Lib. I made a couple of implementations of the same concept and ran into the same issue in every one of them: to obtain a correct series of values I must reverse the input series and then reverse the resulting series. The Python code that calls the TA-Lib library through a convenience wrapper is:
rsi1 = np.asarray(run_example(function_name,
                              arguments,
                              30,
                              weeklyNoFlatOpen[0],
                              weeklyNoFlatHigh[0],
                              weeklyNoFlatLow[0],
                              weeklyNoFlatClose[0],
                              weeklyNoFlatVolume[0][::-1]))
rsi2 = np.asarray(run_example(function_name,
                              arguments,
                              30,
                              weeklyNoFlatOpen[0][::-1],
                              weeklyNoFlatHigh[0][::-1],
                              weeklyNoFlatLow[0][::-1],
                              weeklyNoFlatClose[0][::-1],
                              weeklyNoFlatVolume[0][::-1]))[::-1]
The graphs of both series can be seen here (the indicator is actually SMA):
The green line is clearly computed in reverse order (from sample n to 0) and the red one in the expected order. To get the red line I have to reverse both the input series and the output series.
The code for this test is available here: python code
Has anybody observed the same behavior?

I found what was wrong with my approach. The short answer is that the MA indicator puts the first valid value at position zero of the results array, so the result series starts at zero and has N fewer samples than the input series (where N is the period value in this case). The reversed-computation idea was completely wrong.
Here's the proof: after adding 30 zeros at the beginning and removing the trailing values, the indicator fits over the input series nicely.
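For illustration, here is a minimal sketch of that alignment idea using the plain talib.SMA binding rather than the question's run_example wrapper, and NaN padding instead of zeros (the array names and lengths are made up):

import numpy as np
import talib  # the TA-Lib Python binding

period = 30
close = np.random.random(200)  # stand-in for weeklyNoFlatClose[0]

# The plain binding returns an array the same length as the input,
# with the first (period - 1) entries set to NaN, so it is already
# aligned with the input series and no reversing is needed.
sma = talib.SMA(close, timeperiod=period)

# If a helper instead returns only the valid values (input length minus
# the warm-up), alignment is just left-padding, as described above:
valid = sma[~np.isnan(sma)]
aligned = np.concatenate([np.full(len(close) - len(valid), np.nan), valid])
assert len(aligned) == len(close)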


Is there another way to convert ee.Number to float except getInfo()?

Hello friends!
Summary:
I have an ee.FeatureCollection containing around 8,500 ee.Point objects. I would like to calculate the distance of these points to a given coordinate, let's say (0.0, 0.0).
For this I use the function geopy.distance.distance() (ref: https://geopy.readthedocs.io/en/latest/#module-geopy.distance). As input the function takes two coordinates, each as a tuple of two floats.
Problem: When I convert the coordinates from an ee.List to floats, I always use the getInfo() function. I know this is a callback and it is very time-intensive, but I don't know another way to extract them. Long story short: extracting the data as ee.Number takes less than a second; if I want them as floats it takes more than an hour. Is there any trick to fix this?
Code:
fc_containing_points = ee.FeatureCollection('projects/ee-philadamhiwi/assets/Flensburg_100')  # ee.FeatureCollection
list_containing_points = fc_containing_points.toList(fc_containing_points.size())  # ee.List
fc_containing_points_length = fc_containing_points.size()  # ee.Number
for index in range(fc_containing_points_length.getInfo()):  # I need to convert ee.Number to int here
    point_tmp = list_containing_points.get(index)  # ee.ComputedObject
    point = ee.Feature(point_tmp)  # transform ee.ComputedObject to ee.Feature
    coords = point.geometry().coordinates()  # ee.List containing 2 ee.Numbers
    # when I run the loop without the next line,
    # I get all the data I want as ee.Number in under 1 sec
    coords_as_tuple_of_ints = (coords.getInfo()[1], coords.getInfo()[0])  # tuple containing 2 floats
    # when I add this line, the loop takes hours
PS: This is my first question, please be patient with me.
I would use .map instead of your loop. This stays server-side until you export the table (or possibly call .getInfo on the whole thing):
fc_containing_points = ee.FeatureCollection('projects/ee-philadamhiwi/assets/Flensburg_100')
fc_with_distance = fc_containing_points.map(
    lambda feature: feature.set("distance_to_point",
                                feature.distance(ee.Feature(ee.Geometry.Point([0.0, 0.0])))))
# Then export using ee.batch.Export.table.toXXX or call getInfo
(An alternative might be to use ee.Image.paint to convert the target point to an image, then use ee.Image.distance to calculate the distance to the point (as an image), and then use reduceRegions over the feature collection with all points. But 1) you can only calculate distance up to a certain range and 2) I don't think it would be any faster.)
To comment on your code: you are probably aware that loops (especially client-side loops) are frowned upon in GEE, primarily for the performance reasons you've run into, but also note that every .getInfo call on a server-side object incurs a round-trip cost. So this line
coords_as_tuple_of_ints = (coords.getInfo()[1],coords.getInfo()[0])
would take roughly twice as long as this:
coords_client = coords.getInfo()
coords_as_tuple_of_ints = (coords_client[1],coords_client[0])
Finally, you could always just export your entire feature collection to a shapefile (using ee.batch.Export.table...., as above) and do all the operations locally using geopy.
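As an illustration of keeping the round trips to a single .getInfo call and then doing the distance math locally with geopy, here is a minimal sketch; the asset path comes from the question, everything else (variable names, the target coordinate) is assumed:

import ee
import geopy.distance

ee.Initialize()

fc = ee.FeatureCollection('projects/ee-philadamhiwi/assets/Flensburg_100')

# Build a server-side list of [lon, lat] pairs, then fetch it with ONE getInfo call.
coords_server = fc.toList(fc.size()).map(
    lambda f: ee.Feature(f).geometry().coordinates())
coords_client = coords_server.getInfo()  # single round trip for all ~8500 points

target = (0.0, 0.0)  # (lat, lon) as geopy expects
distances_km = [
    geopy.distance.distance(target, (lat, lon)).km
    for lon, lat in coords_client        # GEE coordinates come back as [lon, lat]
]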

Pandas - value changes when adding a new column to a dataframe

I'm trying to add a new column to a dataframe using the following code.
labels = df1['labels']
df2['labels'] = labels
However, in the later part of my program, I found that there might be something wrong with the assignment. So, I checked it using
labels.equals(other=df2['labels'])
and it returned False. (I added this check immediately after the assignment.)
I also tried to:
- print out part of labels and df2, and it turns out that some lines are indeed different;
- check the max and min values of both series, and they are different;
- check the number of unique values in both series using len(set(labels)) and len(set(df2['labels'])), and they differ a lot;
- test with a smaller amount of data, which works totally fine.
My dataframe is rather large (40 million+ rows), so I cannot print it all out and check the values. Does anyone have any idea what might lead to this kind of problem? Or are there any suggestions for further tests?
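There is no accepted answer in this thread, but one common cause of exactly this symptom is that pandas aligns on the index during column assignment: if df1 and df2 have different indexes, the values are reordered, or turned into NaN where the indexes don't overlap. A minimal sketch of that effect and a way to bypass alignment (the data here is made up, not the asker's):

import pandas as pd

df1 = pd.DataFrame({'labels': [10, 20, 30]}, index=[0, 1, 2])
df2 = pd.DataFrame({'x': ['a', 'b', 'c']}, index=[2, 1, 0])  # same rows, different index order

labels = df1['labels']
df2['labels'] = labels                 # assignment aligns on index, not position
print(labels.equals(df2['labels']))    # False -- the index order (and positions of values) differ

# If positional assignment is what you actually want, drop the index first:
df2['labels'] = labels.to_numpy()      # or labels.reset_index(drop=True)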

How does the adjust method work in Pandas ewm() function?

When calculating an exponentially weighted average in Pandas, the parameter adjust is set to a default value of True.
I know what the adjust parameter does (but not how it does it, which is what I want to know).
When adjust=True the EWA is calculated for every point in the sample, but when adjust=False, for a window of size n you must wait for n observations before the first EWA value can be calculated.
I looked at the pandas documentation, but it only shows that adjust=True becomes equivalent to adjust=False for later values. It doesn't say how the earlier values are computed in the adjust=True case.
https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows
I even looked at the pandas code on github:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/window/ewm.py
see L99 onwards, but it just seems to use the regular ewm formula for the earlier points?
This blog post demonstrates the difference between the two versions of ewm based on the following data points:
https://towardsdatascience.com/trading-toolbox-02-wma-ema-62c22205e2a9
I tried to replicate the results in the blog post for the earlier data points, using the formula at L99 above, where every time I calculate the mean I use the current value and all preceding ewm values. Is this what the pandas ewm function does, i.e. use all previous values when calculating the mean?
i  Price   alpha^i   ewm                                                                                        ewm.mean
0          1
1  22.273  0.181818  = 22.273*1/1 = 22.273                                                                      22.273
2  22.194  0.03306   = (22.194*1 + 22.273*0.03306)/(1 + 0.03306) = 22.20615                                     22.23958
3  22.085  0.00601   = (22.085*1 + 22.194*0.181818 + 22.273*0.03306)/(1 + 0.181818 + 0.03306) = 22.10643        22.19519
The results are different from those shown in the blog post, but if the method were correct they should be exactly the same.
Can someone please tell me where I'm going wrong?
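For reference, here is a minimal sketch of the two weighting schemes as documented by pandas (the price series is made up); note that the adjust=True weights are powers of (1 - alpha), not of alpha:

import numpy as np
import pandas as pd

prices = pd.Series([22.27, 22.19, 22.08, 22.17, 22.18])  # made-up sample data
alpha = 2 / (10 + 1)                                      # e.g. span = 10

# adjust=True: y_t = sum_i (1-alpha)^i * x_{t-i} / sum_i (1-alpha)^i, i = 0..t
def ewm_adjust_true(x, alpha):
    out = []
    for t in range(len(x)):
        w = (1 - alpha) ** np.arange(t + 1)   # weights for x_t, x_{t-1}, ..., x_0
        out.append(np.dot(w, x[t::-1]) / w.sum())
    return np.array(out)

# adjust=False: y_0 = x_0, then y_t = (1-alpha)*y_{t-1} + alpha*x_t
def ewm_adjust_false(x, alpha):
    out = [x[0]]
    for t in range(1, len(x)):
        out.append((1 - alpha) * out[-1] + alpha * x[t])
    return np.array(out)

print(np.allclose(ewm_adjust_true(prices.to_numpy(), alpha),
                  prices.ewm(alpha=alpha, adjust=True).mean()))   # True
print(np.allclose(ewm_adjust_false(prices.to_numpy(), alpha),
                  prices.ewm(alpha=alpha, adjust=False).mean()))  # True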

Resampling the signal array in Python to use signal.fftconvolve()

I have a problem with two signals in arrays that I want to use with the function fftconvolve.
They represent two measurements of the same time duration, and the start and end of the signals are matched.
The trouble is that, because each measurement was taken at a different sampling rate, the array lengths are different:
LS1 = len(SIG1)  # -> LS1 = 819
LS2 = len(SIG2)  # -> LS2 = 3441
therefore the convolution is not calculated properly.
What I need is basically a way to correctly down-sample the longer signal array so that LS1 = LS2.
I have tried using mode='same', as the function description suggests:
KOR=signal.fftconvolve(SIG1, SIG2, mode='same')
but the output still seems strange and I really don't know whether the calculation is correct.
Here is an example of the signal convolution plot.
Thank you for any help.
SOLUTION: It was quick & simple! Thank you J. Piquard! The 'resample' function does the trick:
SIG2 = signal.resample(SIG2, LS1)
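A minimal end-to-end sketch of that fix (the signals here are synthetic stand-ins for SIG1 and SIG2):

import numpy as np
from scipy import signal

# Synthetic stand-ins: same duration, different sampling rates
t1 = np.linspace(0, 1, 819)    # coarse sampling  -> like SIG1
t2 = np.linspace(0, 1, 3441)   # fine sampling    -> like SIG2
SIG1 = np.sin(2 * np.pi * 5 * t1)
SIG2 = np.sin(2 * np.pi * 5 * t2)

LS1 = len(SIG1)

# Down-sample the longer signal to the length of the shorter one
SIG2_resampled = signal.resample(SIG2, LS1)

# Now both arrays cover the same duration with the same number of samples
KOR = signal.fftconvolve(SIG1, SIG2_resampled, mode='same')
print(len(KOR))  # 819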

Incorrect mean from pandas dataframe

So here's an interesting thing:
Using Python 2.7:
I've got a dataframe of about 5,100 entries, each with a number (melting point) in a column titled 'Tm'. Using the code:
self.sort_df[['Tm']].mean(axis=0)
I get a mean of:
Tm 92.969204
dtype: float64
This doesn't make sense because no entry has a Tm of greater than 83.
Does .mean() not work for this many values? I've tried paring down the dataset and it seems to work for ~1,000 entries, but considering I have a full dataset of 150,000 to run at once, I'd like to know if I need to find a different way to calculate the mean.
A more readable syntax would be:
sort_df['Tm'].mean()
Try sort_df['Tm'].value_counts() or sort_df['Tm'].max() to see what values are present. Some unexpected values must have crept in.
The .mean() function gives an accurate result irrespective of the dataset size.
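For example, a quick way to hunt for the offending values without printing the whole column (sort_df and the 'Tm' column come from the question; the specific checks are just illustrative):

import pandas as pd

# Summary statistics reveal outliers immediately
print(sort_df['Tm'].describe())      # count, mean, std, min, max, quartiles

# Show the largest values -- anything above the expected 83 is suspect
print(sort_df['Tm'].nlargest(10))

# If the column was read with mixed/string types, coerce and re-check
tm_numeric = pd.to_numeric(sort_df['Tm'], errors='coerce')
print(tm_numeric.mean(), tm_numeric.isna().sum())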
