I have time-series data and I am trying to calculate the angle (in degrees) between two points. Here is what I did so far, but it doesn't seem to give the correct result:
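import numpy as np
import pandas as pd

pts = 2  # number of bars between the two points (used in the shift calls below)
df = pd.read_csv("EURUSD.csv")
df = df.reset_index()
df['A'] = np.rad2deg(np.arctan2(df['Low'] - df['Low'].shift(pts), df['index'] - df['index'].shift(pts)))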
However, sometimes this gives me weird outputs like:
2693 3.141258
2702 -3.141383
2708 -3.141451
2719 -3.141033
2724 -3.140893
2734 3.141550
I have also tried the following code:
df['A'] = ((df['Low']-df['Low'].shift(pts))/(df['index']-df['index'].shift(pts)))
2693 -0.000334
2702 0.000210
2708 0.000142
2719 0.000560
2724 0.000700
2734 -0.000043
What am I doing wrong here?
Here is a screenshot of what I'm trying to do. I'm simply trying to reproduce that -48 degree angle in Python. I am not trying to detect these points automatically; I have spotted them manually and just need to do the calculation.
I guess your question is: how do I calculate the angle between two lines, where each line is defined by a single point and a common origin? You then want to perform this operation for a series of x1, x2 points recorded over time.
Here you can find the arithmetic, and here an example.
To get the angle of the line between the two points, you'll need the following:
price difference (looks like 1.29250 - 1.29650 = -0.004)
number of bars between the two points (that appears to be 10 bars)
price-to-bar ratio (you'll have to look at the settings for that particular chart)
price_diff = -0.004
bars = 10
price_to_bar = unknown  # placeholder: take this value from the chart's scaling settings
x = bars * price_to_bar
Final output:
import numpy as np
round(np.angle(complex(x, price_diff), deg=True), 0)
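A minimal sketch of the same calculation, vectorised so it can be applied to a DataFrame column. `price_per_bar` is a hypothetical parameter standing in for the chart's price-to-bar ratio, and the sample `Low` values are made up. `np.degrees(np.arctan2(dy, dx))` gives the same angle as `np.angle(complex(dx, dy), deg=True)`. The example also shows why the ratio matters: the same two points give different angles under different scalings, and if one bar is counted as 1 unit (as with `df['index']` in the question), a price move of a few ten-thousandths gives an angle very close to 0 degrees.

```python
import numpy as np
import pandas as pd


def chart_angle(price_diff, bars, price_per_bar):
    """Angle in degrees of the line between two chart points.

    price_per_bar is a hypothetical chart-scaling parameter: the price change
    that spans the same on-screen distance as one bar does horizontally.
    """
    x = np.asarray(bars) * price_per_bar  # horizontal distance in price units
    return np.degrees(np.arctan2(price_diff, x))


# Same line, two different chart scalings -> two different angles
print(round(float(chart_angle(-0.004, 10, 0.0004)), 1))  # -45.0
print(round(float(chart_angle(-0.004, 10, 0.0002)), 1))  # -63.4

# Column version with made-up prices: angle between each bar and the one pts bars earlier
df = pd.DataFrame({"Low": [1.29650, 1.29600, 1.29250]})
pts = 2
df["angle"] = chart_angle(df["Low"] - df["Low"].shift(pts), pts, 0.0004)
```

The first `pts` rows of `angle` are NaN because there is no earlier bar to compare against.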
I'm trying to convert some points tabulated in a .csv file into a netCDF file.
This is my .csv file: https://1drv.ms/u/s!AhZf0QH5jEVSjWfnPtJjJgmXf-i0?e=WEpMyU
In my spreadsheet, I have the unique location of each point (not a regular grid over the whole area, but the points are spaced by 0.1 degree) and an SP value per year for up to 100 years ahead.
To work with this data, I need it in the same layout as other sources that use netCDF data arranged as sp(time, lat, lon), so that I can evaluate and visualize the values of this specific region by year (using Panoply or ncview, for example).
For that, I came up with this code:
import pandas as pd
import xarray as xr
import numpy as np
csv_file = 'example.csv'
df = pd.read_csv(csv_file)
df = pd.melt(df, id_vars=["lon", "lat"], var_name="time", value_name="sp")
df['time']= pd.to_datetime(df['time'])
df = df.set_index(["time", "lat", "lon"])
df = df.astype('float32')
ds = df.to_xarray()  # a new name, so the xarray module imported as xr is not shadowed
xc = ds.fillna(0)
xc.to_netcdf(csv_file + '.nc')
This produced a netCDF file like this: https://1drv.ms/u/s!AhZf0QH5jEVSjWfnPtJjJgmXf-i0?e=WEpMyU
At first, my code seems to work and creates the netCDF file without problems. However, I noticed that in some places I am creating "leakage" of points, i.e. the same values are interpolated in some direction (north-south and west-east) where that shouldn't happen.
If you do a simple plot before converting to xarray, you can see there are 3 west segments and one south segment:
And this ends up being partly masked when I fill the NaNs with 0 and plot it again:
Checking the netCDF file in Panoply, I got something similar:
So I started checking every step of my code to see if I missed something. My first guess was the melt step, but I'm not 100% sure, because if I plot df I can't see any leaking or extrapolation in the same region:
import seaborn
joint_axes = seaborn.jointplot(
    x="lon", y="lat", data=df, s=0.5
)
Does anyone have any idea what's happening here?
For now, a solution that would help me would be to fill in the missing coordinates with 0 within my domain, using the minimum and maximum latitudes and longitudes.
My first (and unconventional) idea was to create a 0.1 x 0.1 grid with values equal to zero and feed my existing values into this grid.
However, the reindex method would let me do this in a few lines. My doubt is whether I should do it before or after the df.melt in my code.
This is where I am now:
csv_file = '/Users/helioguerraneto/Desktop/example.csv'
df = pd.read_csv(csv_file)
lonmin, lonmax = df['lon'].min(), df['lon'].max()
latmin, latmax = df['lat'].min(), df['lat'].max()
df = pd.melt(df, id_vars=["lon", "lat"], var_name="time", value_name="sp")
df['time']= pd.to_datetime(df['time'])
df = df.set_index(["time", "lat", "lon"])
df = df.astype('float32')
ds = df.to_xarray()  # a new name, so the xarray module imported as xr is not shadowed
xc = ds.reindex(lat=np.arange(latmin, latmax, 0.1), lon=np.arange(lonmin, lonmax, 0.1), fill_value=0)
xc.to_netcdf(csv_file + '.nc')
It seems like reindex is the way to go, but I need to keep the original data. I was expecting some zeros, but not over the whole area:
I think I found something that might help! My goal now could be the same as what's happening here: How to interpolate latitude/longitude and heading in Pandas
But instead of nearest-neighbour interpolation, I could just match the exact coordinates. Maybe the real problem here is merging 100 grids in the end.
Any suggestions?
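One possible cause of the all-zero result, offered as an assumption rather than a diagnosis: `reindex` matches labels exactly, and `np.arange` with a float step of 0.1 can produce values that are not bit-for-bit equal to the coordinates read from the CSV (it also normally excludes the stop value). Labels that fail to match get `fill_value=0`, and the original values are dropped. The sketch below does the reindex in pandas after the melt: it rounds both the grid and the CSV coordinates to one decimal and extends the stop by half a step. The two-point, one-year sample DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two points on a 0.1-degree grid, one year of values
df = pd.DataFrame({
    "lon": [-40.1, -39.9],
    "lat": [-20.0, -19.8],
    "2020-01-01": [1.5, 2.5],
})

lonmin, lonmax = df["lon"].min(), df["lon"].max()
latmin, latmax = df["lat"].min(), df["lat"].max()

# Round the regular grid and extend the stop by half a step so the max is included
step = 0.1
lats = np.round(np.arange(latmin, latmax + step / 2, step), 1)
lons = np.round(np.arange(lonmin, lonmax + step / 2, step), 1)

long_df = pd.melt(df, id_vars=["lon", "lat"], var_name="time", value_name="sp")
long_df["time"] = pd.to_datetime(long_df["time"])
# Round the CSV coordinates the same way so the labels compare equal
long_df["lat"] = long_df["lat"].round(1)
long_df["lon"] = long_df["lon"].round(1)
long_df = long_df.set_index(["time", "lat", "lon"])

# Every (time, lat, lon) combination on the grid; missing points become 0
full_index = pd.MultiIndex.from_product(
    [long_df.index.levels[0], lats, lons], names=["time", "lat", "lon"]
)
filled = long_df.reindex(full_index, fill_value=0).astype("float32")
```

`filled.to_xarray()` then gives a dense sp(time, lat, lon) array. If you prefer to reindex in xarray instead, the same idea is to round the dataset's `lat`/`lon` coordinates (for example with `assign_coords`) before calling `reindex(lat=lats, lon=lons, fill_value=0)`; xarray's `reindex` also accepts `method="nearest"` together with a `tolerance`, which avoids exact float matching.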
I have a pandas DataFrame with 8664 rows, shown here:
frame_3_LW contains the following columns of importance to me: EASTVEL, NORTHVEL, Z_NAP, DATE+TIME. The columns are defined as follows:
EASTVEL = Flow of the current, where negative (-) values are west and positive (+) values are east.
NORTHVEL = Flow of the current, where negative (-) values are south and positive (+) values are north.
Z_NAP = Depth of water
DATE+TIME = Date + time in this format: 2021-11-17 10:00:00
The problem I encounter is the following: I want to generate a plot with EASTVEL on the x-axis and Z_NAP on the y-axis.
I used a simple call:
df.plot(x = 'EASTVEL', y = 'Z_NAP')
However, because I have so many values, I get an unclear plot with lots of lines. I would like just one line describing the course of EASTVEL (x-axis) against Z_NAP (y-axis). Can anybody help me with that? It would be a big help!
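A line plot connects the points in row order, so with 8664 rows taken at many times the line keeps jumping back and forth between depths. One way to get a single line, sketched here under the assumption that an average over time is what you want, is to group the rows into depth bins and take the mean EASTVEL per bin. The function name `depth_profile`, the `bin_size` value and the sample data are hypothetical; the column names are the ones from the question.

```python
import numpy as np
import pandas as pd


def depth_profile(df, bin_size=0.5):
    """Mean EASTVEL per depth bin, sorted by depth (one x value per depth)."""
    # Snap each depth to the nearest multiple of bin_size
    depth_bin = (df["Z_NAP"] / bin_size).round() * bin_size
    return df.groupby(depth_bin)["EASTVEL"].mean().sort_index()


# Hypothetical sample: two time steps measured at roughly the same three depths
sample = pd.DataFrame({
    "EASTVEL": [0.2, 0.4, -0.1, 0.1, 0.3, 0.0],
    "Z_NAP": [-1.0, -2.0, -3.0, -1.1, -2.1, -2.9],
})
profile = depth_profile(sample, bin_size=1.0)
```

With matplotlib, `plt.plot(profile.values, profile.index)` then draws EASTVEL on the x-axis against depth on the y-axis as one line, e.g. `depth_profile(frame_3_LW)`.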
For my work, I have to read in a CSV file and do calculations with it. The CSV file consists of 9 different columns and about 150 rows of values acquired from sensors. First, the horizontal acceleration was determined, and the distance was derived from it by double integration. This is the lower of the two plots in the picture. The upper plot represents the so-called force data: the orange graph is the 9th column of the CSV file and the blue graph is the 7th column.
As you can see, I have drawn two vertical lines in the lower plot. These lines mark the x-values that correspond, in the upper plot, to the global minimum of the orange function and to its intersection with the blue function. Now I want to do the following, but I need some help. First, I want the intersection point between the first vertical line and the graph to become (0,0), i.e. the function has to be shifted down. How do I achieve this? Furthermore, the piece of the function before this first intersection point (shown in purple) should be omitted, so that the function really starts at this point. How can I do this?
In the following picture I try to demonstrate how I would like to do that:
If you need my code, here it is:
import numpy as np
import matplotlib.pyplot as plt
import math as m
import loaddataa as ld
import scipy.integrate as inte
from scipy.signal import find_peaks
import pandas as pd
import os
# Loading of the values
path, _ = os.path.split(os.path.realpath(__file__))  # directory of this script
path = path + "\\Data\\1 Fabienne\\Test1\\left foot\\50cm"
dataListStride = ld.loadData(path)
indexStrideData = 0
strideData = dataListStride[indexStrideData]
#%%Calculation of the horizontal acceleration
def horizontal(yAngle, yAcceleration, xAcceleration):
    a = ((m.cos(m.radians(yAngle)))*yAcceleration)-((m.sin(m.radians(yAngle)))*xAcceleration)
    return a
resultsHorizontal = list()
for i in range(len(strideData)):
    strideData_yAngle = strideData.to_numpy()[i, 2]
    strideData_xAcceleration = strideData.to_numpy()[i, 4]
    strideData_yAcceleration = strideData.to_numpy()[i, 5]
    resultsHorizontal.append(horizontal(strideData_yAngle, strideData_yAcceleration, strideData_xAcceleration))
resultsHorizontal.insert(0, 0)
#plt.plot(x_values, resultsHorizontal)
#x-axis "convert" into time: 100 Hertz makes 0.01 seconds
scale_factor = 0.01
x_values = np.arange(len(resultsHorizontal)) * scale_factor
#Calculation of the global high and low points
plt.scatter(heel_one.idxmax()*scale_factor,heel_one.max(), color='red')
plt.scatter(heel_one.idxmin()*scale_factor,heel_one.min(), color='blue')
plt.scatter(heel_two.idxmax()*scale_factor,heel_two.max(), color='orange')
plt.scatter(heel_two.idxmin()*scale_factor,heel_two.min(), color='green')#!
#Plot of force data
plt.plot(x_values[:-1],strideData.iloc[:,7]) #force heel
plt.plot(x_values[:-1],strideData.iloc[:,9]) #force toe
# while - loop to calculate the point of intersection with the blue function
i = heel_one.idxmax()
while strideData.iloc[i,7] > strideData.iloc[i,9]:
    i = i-1
# Length calculation between global minimum orange function and intersection with blue function
#%% Integration of horizontal acceleration
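# cumulative_trapezoid replaces cumtrapz, which newer SciPy versions no longer provide
velocity = inte.cumulative_trapezoid(resultsHorizontal, x_values)
plt.plot(x_values[:-1], velocity)
#%% Integration of the velocity
s = inte.cumulative_trapezoid(velocity, x_values[:-1])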
I hope it's clear what I want to do. Thanks for helping me!
I didn't dig all the way through your code, but the following tricks may be useful.
Say you have x and y values:
import numpy as np
x = np.linspace(0,3,100)
y = x**2
Now, you only want the values corresponding to, say, .5 < x < 1.5. First, create a boolean mask for the arrays as follows:
mask = np.logical_and(.5 < x, x < 1.5)
(If this seems magical, then run x < 1.5 in your interpreter and observe the results).
Then use this mask to select your desired x and y values:
x_masked = x[mask]
y_masked = y[mask]
Then, you can translate all these values so that the first x,y pair is at the origin:
x_translated = x_masked - x_masked[0]
y_translated = y_masked - y_masked[0]
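The same trim-and-shift can also start from an x position instead of a mask, which matches the question's vertical line. A minimal sketch, assuming `x` is sorted in ascending order; `start_at` and `x_start` are hypothetical names, and in your code `x_start` would be the x position of the first vertical line (for example the index found by your while loop times `scale_factor`). Note that after two `cumulative_trapezoid` calls, `s` has two fewer samples than `x_values`, so it pairs with `x_values[:-2]`.

```python
import numpy as np


def start_at(x, y, x_start):
    """Drop samples before x_start and shift so the first kept point is (0, 0)."""
    # Index of the first sample with x >= x_start (x must be sorted ascending)
    start = np.searchsorted(x, x_start)
    x_kept, y_kept = x[start:], y[start:]
    return x_kept - x_kept[0], y_kept - y_kept[0]


x = np.arange(10) * 0.01  # 100 Hz sampling, as in the question
y = x ** 2
x0, y0 = start_at(x, y, 0.025)
```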
Is this the type of thing you were looking for?
I am trying to use the pandas ewm function to calculate exponentially weighted moving averages. However, I've noticed that information seems to carry through the entire time series. This means that every data point's MA depends on a different number of previous data points, so the ewm function at every data point is mathematically different.
I think someone here had a similar question:
Does Pandas calculate ewm wrong?
But I tried their method, and I am not getting the functionality I want.
import pandas as pd

def EMA(arr, window):
    # Keep the first `window` rows as their simple moving average (NaN until the window is full)
    sma = arr.rolling(window=window, min_periods=window).mean()[:window]
    rest = arr[window:]
    return pd.concat([sma, rest]).ewm(com=window, adjust=False).mean()

a = pd.DataFrame([x for x in range(100)])
print(list(EMA(a, 10)[0])[-1])
print(list(EMA(a[50:], 10)[0])[-1])
In this example, I have an array of 0 through 99. I calculate moving averages on this array and on the sub-array of 50 through 99. The last moving average should be the same, since I am using a window of only 10. But when I run this code I get two different values, indicating that ewm is indeed dependent on the entire series.
IIUC, you are asking for ewm in a rolling window, which means that every window of 10 rows returns a single number. If that is the case, then we can use a stride trick:
Edit: the updated function works on Series only.
def EMA(arr, window=10, alpha=0.5):
    ret = pd.Series(index=arr.index, name=arr.name, dtype='float64')
    l = len(arr)
    stride = arr.strides[0]
    ret.iloc[window-1:] = (pd.DataFrame(np.lib.stride_tricks.as_strided(arr,
    return ret
a = pd.Series([x for x in range(100)])
# 98 97.500169
# 99 98.500169
# Name: 9, dtype: float64
# 98 97.500169
# 99 98.500169
# Name: 9, dtype: float64
EMA(a, 2).tail(2)
98 97.75
99 98.75
dtype: float64
Test on random data:
import matplotlib.pyplot as plt
a = pd.Series(np.random.uniform(0,1,10000))
fig, ax = plt.subplots(figsize=(12,6))
EMA(a,alpha=0.99, window=2).plot(ax=ax)
EMA(a,alpha=0.99, window=1500).plot(ax=ax)
Output: we can see that the larger window (green) is less volatile than the smaller window (orange).
This can be achieved by working with the formula for exponential smoothing and cancelling the lagged terms. The formula can be found on the pandas ewm documentation page.
The following code demonstrates that no memory is left after adjustment. For every point, the fixed window of information used is L=1000. And the factor f should be included if one desires to have the equivalent for the adjust=True version (for adjust=False just get rid of the f factor).
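A minimal sketch of the same fixed-window idea (this is not the answerer's code): keep only the last `window` observations and give them normalised weights, y_t = sum_{i=0}^{window-1} (1-alpha)^i x_{t-i} / sum_{i=0}^{window-1} (1-alpha)^i, which is the adjust=True weighting applied to a window. The function name `windowed_ewm` is hypothetical; `rolling(...).apply` and `ewm(alpha=..., adjust=True)` are standard pandas. Because only the last `window` points enter each value, the full series and the sub-series from the question give the same final value.

```python
import numpy as np
import pandas as pd


def windowed_ewm(series, window, alpha):
    """EWMA that only uses the last `window` points (adjust=True-style weights)."""
    # Weights for lags window-1, ..., 1, 0: oldest first, newest gets weight 1
    weights = (1 - alpha) ** np.arange(window - 1, -1, -1)
    weights /= weights.sum()
    # With raw=True each window arrives as a NumPy array ordered oldest to newest
    return series.rolling(window).apply(lambda w: np.dot(w, weights), raw=True)


a = pd.Series(np.arange(100, dtype=float))
full = windowed_ewm(a, window=10, alpha=0.5)
tail = windowed_ewm(a[50:], window=10, alpha=0.5)
```

The first `window - 1` values are NaN, since a full window is required before any output is produced.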
This seems to be possible in pandas 1.5 by combining rolling with win_type:
series.rolling(window=10, win_type='exponential').mean(tau=0.5, center=10, sym=False)  # series: the pd.Series to smooth
I use a non-symmetric exponential window whose center is set to the window size, so that the exponential function decays towards the past.
This yields the same results as the EMA function provided by Quang Hoang.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def EMA(arr, window=10, alpha=0.5):
    ret = pd.Series(index=arr.index, name=arr.name, dtype='float64')
    l = len(arr)
    stride = arr.strides[0]
    ret.iloc[window-1:] = (pd.DataFrame(np.lib.stride_tricks.as_strided(arr,
    return ret
a = pd.Series([x for x in range(100)])
builtin= a.rolling(window=10, win_type='exponential').mean(tau=0.5, center=10, sym=False)
custom = EMA(a)  # defaults: window=10, alpha=0.5
custom.plot.line(label="Custom EMA")
builtin.plot.line(label="Built-in EMA")
plt.legend()
I am trying to visualize the correlation of the Result column with every other column.
A_B A_C B_C Result
0 0.318182 0.925311 0.860465 91
1 -0.384030 0.991803 0.996344 12
2 -0.818182 0.411765 0.920000 53
3 0.444444 0.978261 0.944444 64
A_B = (A-B)/(A+B), and the other columns are computed correspondingly.
This works for a small number of columns, but as I increase the number of columns, the number of rows in the heatmap keeps stacking up. Is there a more compact way to represent it?
The following code reproduces the output:
import pandas as pd
import seaborn as sns
data = {'A':[232,243,12,546,67,12,78,11,245],
df = pd.DataFrame(data,columns=['A','B','C','Result'])
# Compute (A-B)/(A+B), (A-C)/(A+C), and so on for every pair of columns
colnames = df.columns.tolist()[:-1]
for i,c in enumerate(colnames):
    if i!=len(colnames):
        for k in range(i+1,len(colnames)):
            df[c + '_' + colnames[k]] = (df[c] - df[colnames[k]]) / (df[c] + df[colnames[k]])
newdf = df[['A_B','A_C','B_C','Result']].copy()
# Correlation of A_B, A_C, B_C with Result (dropping Result's correlation with itself)
plot = pd.DataFrame(newdf.corr().iloc[:-1,-1])
A technique I have heard of, but cannot find a source for, is representing each correlation factor as a mini-rectangle, like this:
Treating the map as a 3x3 matrix with (0,0) at the bottom-left, A_B would be placed at (1,1), A_C at (2,1), and B_C at (2,2).
But I don't understand how to do this.
You can plot the correlation of each column against the Result column and against the other columns as well. Below is one way to do so. Providing the x and y tick labels makes the correlations easier to compare, and you can also annotate the heat map with the correlation values.
cor = newdf.corr()
sns.heatmap(cor, xticklabels=cor.columns.values,
yticklabels=cor.columns.values, annot=True)
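For the compact layout described in the question, a minimal sketch under the assumption that each cell (a, b) should hold the correlation between Result and (a-b)/(a+b): build a square table indexed by the original columns and fill only one triangle. The function name `pairwise_result_corr` and the sample numbers are hypothetical. With n original columns the table stays n x n however many ratio columns there are, instead of growing to one heatmap row per ratio.

```python
import numpy as np
import pandas as pd


def pairwise_result_corr(df, base_cols, target="Result"):
    """Square table: cell (a, b) holds corr(target, (a-b)/(a+b)); unused cells are NaN."""
    table = pd.DataFrame(np.nan, index=base_cols, columns=base_cols)
    for i, a in enumerate(base_cols):
        for b in base_cols[i + 1:]:
            ratio = (df[a] - df[b]) / (df[a] + df[b])
            table.loc[a, b] = ratio.corr(df[target])
    return table


# Hypothetical sample data
df = pd.DataFrame({
    "A": [232, 243, 12, 546, 67],
    "B": [120, 80, 300, 90, 10],
    "C": [5, 200, 40, 30, 70],
    "Result": [91, 12, 53, 64, 30],
})
table = pairwise_result_corr(df, ["A", "B", "C"])
```

`sns.heatmap(table, annot=True)` then draws one small rectangle per pair; seaborn leaves the NaN cells blank. `table.iloc[:-1, 1:]` trims the row and column that are always empty.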