How can I convert back? [duplicate]

This question already has answers here:
Pandas reverse of diff()
(6 answers)
Closed 4 years ago.
I converted my time series into a stationary time series with differencing:
data['consumption_diff'] = data.consumption - data.consumption.shift(1)
How can I convert consumption_diff back into consumption?

You can use NumPy's r_ object, which concatenates and flattens arrays, together with cumsum(), which computes a cumulative sum:
import numpy as np
undiffed = np.r_[data.consumption.iloc[0], data.consumption_diff.iloc[1:]].cumsum()
That is how you can undifference time series data, which is helpful if you have predicted differenced values for future dates and need to convert them back. However, you already have the undifferenced values here: data.consumption is your original, undifferenced data.
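As a quick sanity check, here is a minimal sketch (with a toy series standing in for the question's data) verifying that the round trip reproduces the original values:
import numpy as np
import pandas as pd

# toy series standing in for the question's data['consumption']
data = pd.DataFrame({'consumption': [10.0, 12.0, 11.5, 13.0]})
data['consumption_diff'] = data.consumption - data.consumption.shift(1)

# first original value + cumulative sum of the differences recovers the series
undiffed = np.r_[data.consumption.iloc[0], data.consumption_diff.iloc[1:]].cumsum()
assert np.allclose(undiffed, data.consumption.values)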

subset fail on np.meshgrid generated dataframe [duplicate]

This question already has answers here:
Working with floating point NumPy arrays for comparison and related operations
(1 answer)
What is the best way to compare floats for almost-equality in Python?
(18 answers)
Pandas Dataframe Comparison and Floating Point Precision
(1 answer)
Closed 19 days ago.
I generate a DataFrame of lon/lat values like this:
import numpy as np
import pandas as pd

a = np.arange(89.7664, 89.7789, 1e-4)
b = np.arange(20.6897, 20.7050, 1e-4)
temp_arr = np.array(np.meshgrid(a, b)).T.reshape(-1, 2)
np_df = pd.DataFrame(temp_arr, columns=['lon', 'lat'])
and it creates the DataFrame I want.
When I try to subset on the first lon value,
len(np_df[np_df['lon'] == 89.7664])
it returns 153. But when I try to subset on one of the last lon values,
len(np_df[np_df['lon'] == 89.7788])
it returns 0.
I wonder what is wrong here. Thank you
Use numpy.isclose to compare floats within a tolerance:
len(np_df[np.isclose(np_df['lon'], 89.7788)])
If that still does not work, scaling by 10000, rounding, and casting to integers should help:
len(np_df[np_df['lon'].mul(10000).round().astype(int).eq(897788)])
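The root cause is that np.arange accumulates floating point error, so later grid values are not exactly the decimal literals you typed, while the start value is stored exactly as given. A minimal sketch (same arange call as above) illustrating why the first comparison matches and the last does not:
import numpy as np

a = np.arange(89.7664, 89.7789, 1e-4)
# the start value is stored exactly as typed, so exact comparison works
print(a[0] == 89.7664)             # True
# later values carry accumulated rounding error, so exact comparison fails
print(a[-1] == 89.7788)            # typically False
print(np.isclose(a[-1], 89.7788))  # True: tolerance-based comparison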

Combine numpy arrays [duplicate]

This question already has answers here:
NumPy stack or append array to array
(3 answers)
Closed 5 months ago.
I have three numpy arrays:
import numpy as np

a1 = np.array([5.048e-02, 2.306e+00, 0.000e+00])
a2 = np.array([1.018e-01, 4.077e+00, 0.100e+00])
a3 = np.array([1.02e-01, 5.077e+00, 0.200e+00])
As a combined result I would like to have:
array([[5.048e-02, 1.018e-01, 1.02e-01],
       [2.306e+00, 4.077e+00, 5.077e+00],
       [0.000e+00, 0.100e+00, 0.200e+00]])
How can I do this with numpy?
(Please excuse me for the error.)
np.array([a1, a2, a3]).T
Just create a new numpy array from those three individual arrays; since your expected output has a1, a2, and a3 as columns rather than rows, transpose the result with .T.
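A minimal runnable sketch (array values from the question); np.column_stack is an equivalent alternative:
import numpy as np

a1 = np.array([5.048e-02, 2.306e+00, 0.000e+00])
a2 = np.array([1.018e-01, 4.077e+00, 0.100e+00])
a3 = np.array([1.02e-01, 5.077e+00, 0.200e+00])

# stack the arrays as rows, then transpose so each input becomes a column
combined = np.array([a1, a2, a3]).T
# equivalently: np.column_stack((a1, a2, a3)) or np.stack((a1, a2, a3), axis=1)
print(combined)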

Need to plot Pairplot for a dataframe that has duplicate indices [duplicate]

This question already has answers here:
dataframe to long format
(2 answers)
Reshape wide to long in pandas
(2 answers)
Closed 9 months ago.
I have a DataFrame df of shape (310, 7) and need to plot a pairplot for it. But I get the error ValueError: cannot reindex from a duplicate axis when I do it the regular way.
sns.pairplot(df,hue='Class')
ValueError: cannot reindex from a duplicate axis
The data is of this form:
P_incidence P_tilt L_angle S_slope P_radius S_Degree Class
0 38.505273 16.964297 35.112814 21.540976 127.632875 7.986683 Normal
1 54.920858 18.968430 51.601455 35.952428 125.846646 2.001642 Normal
2 44.362490 8.945435 46.902096 35.417055 129.220682 4.994195 Normal
3 48.318931 17.452121 48.000000 30.866809 128.980308 -0.910941 Normal
4 45.701789 10.659859 42.577846 35.041929 130.178314 -3.388910 Normal
I tried removing the duplicates using:
df.loc[df['L_angle'].duplicated(), 'L_angle'] = ''
But this method converts the column to an object dtype, and I am not able to undo that.
The expected output plot is as follows:
[expected pairplot image]
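This error is usually caused by duplicate index labels rather than duplicate data values, so deleting values is not needed. A minimal sketch, with a toy frame standing in for df, that resets the index before plotting:
import numpy as np
import pandas as pd
import seaborn as sns

# toy frame with duplicated index labels, mimicking the situation above
df = pd.DataFrame({
    'P_incidence': np.random.rand(6),
    'P_tilt': np.random.rand(6),
    'Class': ['Normal', 'Abnormal'] * 3,
}, index=[0, 1, 2, 0, 1, 2])

# replace the duplicated index with a fresh 0..n-1 RangeIndex
df = df.reset_index(drop=True)
sns.pairplot(df, hue='Class')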

Subtle differences in data calculations from array vs list [duplicate]

This question already has answers here:
Different std in pandas vs numpy
(2 answers)
Why is pandas.Series.std() different from numpy.std()?
(1 answer)
Different result for std between pandas and numpy
(1 answer)
Difference between numpy var() and pandas var()
(1 answer)
Closed 2 years ago.
As you can see in the code below, I calculate the variance of the 'open' column in two different ways. The only difference is that in the second version I grab the values rather than the column containing them. Why would this lead to different variance calculations?
apple_prices = pd.read_csv('apple_prices.csv')
print(apple_prices['open'].values.var())
#prints 102.22564310059172
print(apple_prices['open'].var())
#prints 103.82291877403847
The difference arises because pandas.Series.var has a default ddof (delta degrees of freedom) of 1, so it divides by n - 1, while numpy.ndarray.var has a default ddof of 0 and divides by n. Manually setting ddof produces the same result:
import pandas as pd
import numpy as np
np.random.seed(0)
x = pd.Series(np.random.rand(100))
print(x.var(ddof=1))
# 0.08395738934787107
print(x.values.var(ddof=1))
# 0.08395738934787107
See the documentation at:
pandas.Series.var
numpy.var
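For intuition, var with ddof=d divides the sum of squared deviations from the mean by n - d; a minimal sketch verifying both defaults by hand on the same random data as above:
import numpy as np
import pandas as pd

np.random.seed(0)
x = pd.Series(np.random.rand(100))

# sum of squared deviations from the mean
ss = ((x - x.mean()) ** 2).sum()
print(ss / (len(x) - 1))  # matches x.var(), the pandas default (ddof=1)
print(ss / len(x))        # matches x.values.var(), the numpy default (ddof=0)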

How to change value of remainder of a row in a numpy array once a certain condition is met? [duplicate]

This question already has answers here:
Can NumPy take care that an array is (nonstrictly) increasing along one axis?
(2 answers)
Closed 3 years ago.
I have a 2d numpy array of the form:
array = [[0,0,0,1,0], [0,1,0,0,0], [1,0,0,0,0]]
I'd like to go to each of the rows, iterate over the entries until the value 1 is found, then replace every subsequent value in that row to a 1. The output would then look like:
array = [[0,0,0,1,1], [0,1,1,1,1], [1,1,1,1,1]]
My actual data set is very large, so I was wondering if there is a specialized numpy function that does something like this, or if there's an obvious way to do it that I'm missing.
Thanks!
You can use np.apply_along_axis to apply a row-wise function:
import numpy as np

array = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 0]])

def myfunc(l):
    # find the first 1 in the row (assumes each row contains at least one 1)
    i = 0
    while l[i] != 1:
        i += 1
    # keep zeros up to that point, ones from there on
    return [0] * i + [1] * (len(l) - i)

print(np.apply_along_axis(myfunc, 1, array))
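For a very large array, a vectorized alternative (a sketch; equivalent for this 0/1 data) is np.maximum.accumulate, which computes a running maximum along each row so everything after the first 1 becomes 1, with no Python-level loop:
import numpy as np

array = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 0]])
# running maximum along each row: once a 1 appears, it propagates rightward
print(np.maximum.accumulate(array, axis=1))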
