ValueError: Found infinity in column x - python

I get the error ValueError: Found infinity in column x.
The traceback says:
---> 20 model.fit(df)
242 df['x'] = pd.to_numeric(df['x'])
243 if np.isinf(df['x'].values).any():
--> 244 raise ValueError('Found infinity in column x.')
245 df['d'] = pd.to_datetime(df['d'])
246 if df['d'].isnull().any():
I cannot understand what this error message means, because as far as I can tell there are no infinite values in df. What is wrong with my code, and how should I fix it?
My code is:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from fbprophet import Prophet
for i in range(10):
    # i is an int, so convert it before concatenating it into the file name
    df = pd.read_csv('data' + str(i) + '.csv', encoding='shift-jis')
    model = Prophet()
    model.fit(df)
    future_data = model.make_future_dataframe(periods=12, freq = 'm')
    forecast_data = model.predict(future_data)
    model.plot(forecast_data)
    model.plot_components(forecast_data)
    plt.show()

You need to remove the infinite values from your DataFrame. replace returns a new DataFrame rather than modifying it in place, so assign the result back:
df = df.replace([np.inf, -np.inf], np.nan)
Once the infinite values have been replaced with NaN, you can drop those rows with dropna:
df = df.dropna(subset=["YourColumn"], how="all")
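To confirm whether a column really contains infinite values before cleaning it, something like the sketch below may help. The column names x and d follow the traceback in the question; the sample values are made up for illustration. Infinite values often go unnoticed because a CSV cell containing the text inf is parsed as a float.

```python
import numpy as np
import pandas as pd

# Made-up sample data; "x" and "d" mirror the column names in the traceback.
df = pd.DataFrame({
    "d": pd.date_range("2020-01-31", periods=4, freq="D"),
    "x": [1.0, np.inf, 3.0, -np.inf],
})

# Count the infinite entries so you can see whether the error is justified.
n_inf = int(np.isinf(df["x"]).sum())
print("infinite values in x:", n_inf)

# Replace +/-inf with NaN in that column, then drop the affected rows.
clean = df.assign(x=df["x"].replace([np.inf, -np.inf], np.nan)).dropna(subset=["x"])
print(clean)
```

Whether dropping the rows is appropriate depends on the data; for a time series you may prefer to fix the values at the source instead.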

Related

IndexError: index 395469 is out of bounds for axis 0 with size 390

I built a pandas DataFrame (about user activity on a website) called df with 4 columns: pass_id, item_id, ratings and date_view. I have a fixed number of 390 users and a fixed number of 2637 items.
Then I did this:
from sklearn import model_selection
train_data, test_data = model_selection.train_test_split(df, test_size=0.25)
print(np.shape(train_data))
(2289, 4)
and then I need to do this:
import numpy as np
train_data_matrix = np.zeros((390,2637))
for line in train_data.itertuples():
    train_data_matrix[line[1]-1, line[2]-1] = line[3]
but I get this error message:
IndexError: index 395469 is out of bounds for axis 0 with size 390
What can I do?
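The index 395469 comes from line[1]-1, so at least one pass_id is 395470, not a number between 1 and 390. A minimal sketch of one way around this, assuming the IDs are arbitrary identifiers, is to map them to contiguous row and column positions with pd.factorize. The sample values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample data; only the column names come from the question.
df = pd.DataFrame({
    "pass_id": [395470, 12, 395470, 7001],
    "item_id": [90001, 90001, 55, 300],
    "ratings": [5, 3, 4, 1],
})

# factorize assigns each distinct ID a position 0..n-1 in order of appearance.
user_codes, user_ids = pd.factorize(df["pass_id"])
item_codes, item_ids = pd.factorize(df["item_id"])

# Size the matrix from the number of distinct IDs, then fill it in one step.
matrix = np.zeros((len(user_ids), len(item_ids)))
matrix[user_codes, item_codes] = df["ratings"].to_numpy()
print(matrix)
```

If you split into train and test sets, factorize the full df first so that both splits share the same mapping.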

ValueError: Found unknown categories ['Off', 'On'] in column 0 during fit

This is my code. I do not know where the error is.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels
from sklearn.preprocessing import OrdinalEncoder,OneHotEncoder
Light = ['On','Off']
Watering = ['Low','High']
# create combinations for all parameters
experiments = [(x,y) for x in Light for y in Watering]
exp_df = pd.DataFrame(experiments,columns=['A','B'])
print(exp_df)
OHE_model = OneHotEncoder(handle_unknown = 'ignore')
enc = OrdinalEncoder(categories=[['A','B'],['Low','High']])
encoded_df = pd.DataFrame(enc.fit_transform(exp_df[['A','B']]),columns=['A','B'])
#define the experiments order which must be random
encoded_df['exp_order'] = np.random.choice(np.arange(4),4,replace=False)
encoded_df['outcome'] = [25,37,55,65]
and this is the error:
ValueError: Found unknown categories ['Off', 'On'] in column 0 during fit
The error indicates that the encoder found On and Off in column 0 where you told it to expect A and B. The categories argument lists the values that occur in each column, not the column names.
Use this for your enc variable instead:
enc = OrdinalEncoder(categories=[['On','Off'],['Low','High']])
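For reference, here is a small self-contained sketch of the corrected encoder applied to the same four experiments. Each inner list gives the values of one column in the order they should be numbered, so On maps to 0 and Off to 1.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

Light = ['On', 'Off']
Watering = ['Low', 'High']
experiments = [(x, y) for x in Light for y in Watering]
exp_df = pd.DataFrame(experiments, columns=['A', 'B'])

# One list of category values per column, in column order.
enc = OrdinalEncoder(categories=[Light, Watering])
encoded = enc.fit_transform(exp_df[['A', 'B']])
encoded_df = pd.DataFrame(encoded, columns=['A', 'B'])
print(encoded_df)
```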

pycaret, regression on target column

I'm trying to apply some machine learning based regression on data from a CSV file. My columns are:
Index(['date', 'customer_id', 'product_category', 'payment_method',
'value [USD]', 'time_on_site', 'clicks_in_site', 'USD/[Minutes]',
'USD/clicks_in_site'],
dtype='object')
When I run:
from pycaret.regression import *
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
exp_reg = setup(data = df, target='value [USD]', session_id=123,
                high_cardinality_features = ['product_category'],
                normalize = True,
                ignore_features = ['customer_id', 'date', 'time_on_site']
                )
I get the following error message:
KeyError Traceback (most recent call last)
<ipython-input-43-20eab85de0cc> in <module>()
2 high_cardinality_features = ['product_category'],
3 normalize = True,
----> 4 ignore_features = ['customer_id', 'date', 'time_on_site']
5 )
6
5 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in drop(self, labels, errors)
5285 if mask.any():
5286 if errors != "ignore":
-> 5287 raise KeyError(f"{labels[mask]} not found in axis")
5288 indexer = indexer[~mask]
5289 return self.delete(indexer)
KeyError: "['value [USD]'] not found in axis"
I found the solution. The column name 'value [USD]' was the problem: after renaming it, the code works as intended. It probably has something to do with the brackets in the column name, which may be interpreted as a dictionary or list, but I'm not sure.
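The rename itself can be done in pandas before calling setup. This is only a sketch: the new name value_usd is an arbitrary choice, and the sample rows are made up.

```python
import pandas as pd

# Made-up rows; the column names come from the question.
df = pd.DataFrame({
    "value [USD]": [10.0, 20.0],
    "clicks_in_site": [3, 5],
})

# Give the target a name without brackets or spaces (value_usd is hypothetical).
df = df.rename(columns={"value [USD]": "value_usd"})
print(df.columns.tolist())
```

After that, pass target='value_usd' to setup.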

Python - linear regression TypeError: invalid type promotion

I am trying to run a linear regression and I think I am having issues with a data type. I have tested line by line and everything works until I reach the last line, where I get TypeError: invalid type promotion. Based on my research, I think it is due to the date format.
Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
data=pd.read_excel('C:\\Users\\Proximo\\PycharmProjects\\Counts\\venv\\Counts.xlsx')
data['DATE'] = pd.to_datetime(data['DATE'])
data.plot(x = 'DATE', y = 'COUNT', style = 'o')
plt.title('Corona Spread Over the Time')
plt.xlabel('Date')
plt.ylabel('Count')
plt.show()
X=data['DATE'].values.reshape(-1,1)
y=data['COUNT'].values.reshape(-1,1)
X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=.2,random_state=0)
regressor = LinearRegression()
regressor.fit(X_train,Y_train)
y_pre = regressor.predict(X_test)
When I run it, this is the full error I get:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-c9e943251026> in <module>
----> 1 y_pre = regressor.predict(X_test)
2
c:\users\slavi\pycharmprojects\coronavirus\venv\lib\site-packages\sklearn\linear_model\_base.py in predict(self, X)
223 Returns predicted values.
224 """
--> 225 return self._decision_function(X)
226
227 _preprocess_data = staticmethod(_preprocess_data)
c:\users\slavi\pycharmprojects\coronavirus\venv\lib\site-packages\sklearn\linear_model\_base.py in _decision_function(self, X)
207 X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
208 return safe_sparse_dot(X, self.coef_.T,
--> 209 dense_output=True) + self.intercept_
210
211 def predict(self, X):
c:\users\Proximo\pycharmprojects\Count\venv\lib\site-packages\sklearn\utils\extmath.py in safe_sparse_dot(a, b, dense_output)
149 ret = np.dot(a, b)
150 else:
--> 151 ret = a @ b
152
153 if (sparse.issparse(a) and sparse.issparse(b)
TypeError: invalid type promotion
My dates look like this:
array([['2020-01-20T00:00:00.000000000'],
['2020-01-21T00:00:00.000000000'],
['2020-01-22T00:00:00.000000000'],
['2020-01-23T00:00:00.000000000'],
['2020-01-24T00:00:00.000000000'],
['2020-01-25T00:00:00.000000000'],
['2020-01-26T00:00:00.000000000'],
['2020-01-27T00:00:00.000000000'],
['2020-01-28T00:00:00.000000000'],
['2020-01-29T00:00:00.000000000'],
['2020-01-30T00:00:00.000000000'],
['2020-01-31T00:00:00.000000000'],
['2020-02-01T00:00:00.000000000'],
['2020-02-02T00:00:00.000000000']], dtype='datetime64[ns]')
Any suggestions on how to resolve this issue?
I think linear regression does not work with datetime data. You need to convert it to numeric data.
For example:
import numpy as np
import pandas as pd
import datetime as dt
X_test = pd.DataFrame(np.array([
['2020-01-24T00:00:00.000000000'],
['2020-01-25T00:00:00.000000000'],
['2020-01-26T00:00:00.000000000'],
['2020-01-27T00:00:00.000000000'],
['2020-01-28T00:00:00.000000000'],
['2020-01-29T00:00:00.000000000'],
['2020-01-30T00:00:00.000000000'],
['2020-01-31T00:00:00.000000000'],
['2020-02-01T00:00:00.000000000'],
['2020-02-02T00:00:00.000000000']], dtype='datetime64[ns]'))
X_test.columns = ["Date"]
X_test['Date'] = pd.to_datetime(X_test['Date'])
X_test['Date']=X_test['Date'].map(dt.datetime.toordinal)
Try this approach; it should work.
Note: it is better to convert the training set dates to numeric values in the same way and train on that data, so that the model is fitted and evaluated on the same representation.
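Putting the note into practice, a minimal end-to-end sketch might look like this. The DATE and COUNT column names come from the question, but the counts here are invented and perfectly linear so the result is easy to check.

```python
import datetime as dt

import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented data: COUNT grows by 2 per day.
data = pd.DataFrame({
    "DATE": pd.date_range("2020-01-20", periods=6, freq="D"),
    "COUNT": [1, 3, 5, 7, 9, 11],
})

# Convert the dates to ordinal day numbers BEFORE fitting,
# so training and prediction use the same numeric feature.
X = data["DATE"].map(dt.datetime.toordinal).to_numpy().reshape(-1, 1)
y = data["COUNT"].to_numpy()

regressor = LinearRegression().fit(X, y)

# Predict for the day after the last observation.
next_day = pd.Timestamp("2020-01-26").toordinal()
prediction = regressor.predict([[next_day]])
print(prediction)
```

The ordinal is a day count, so the fitted slope reads as change in COUNT per day.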

Python pandas recursive function polynomial form

I am trying to create a recursive function using a pandas DataFrame in Python.
From what I have read, there seem to be a few different methods: a for/if loop, DataFrame.apply, or scipy.signal.lfilter. However, lfilter doesn't work for me, as my recursive formula can be of polynomial form.
The recursive formula I am looking to implement is:
x(t) = A * Bid + B * x(t-1)^C + BidQ
I looked through some examples, and one possibility is of the form below.
import pandas as pd
import datetime as dt
import numpy as np
import scipy.stats as stats
import scipy.optimize as optimize
from scipy.signal import lfilter
#xw.func
#xw.ret(expand='table')
def py_Recursive(v, lamda_, AscendType):
    df = pd.DataFrame(v, columns=['Bid', 'Ask', 'BidQ', 'AskQ'])
    df = df.sort_index(ascending=AscendType)
    # lfilter returns a NumPy array, so wrap it in a named Series before joining
    NewBid = pd.Series(lfilter([1], [1,-2], df['Bid'].astype(float)), index=df.index, name='NewBid')
    df = df.join(NewBid)
    df = df.sort_index(ascending=True)
    return df
lamda_ is a decay function that may be used in the future, and AscendType is either True or False.
My input data set for v is as follows:
v =
763.1 763.3 89 65
762.5 762.7 861 687
772.1 772.3 226 761
770.6 770.8 927 333
777.8 778.0 59 162
786.5 786.7 125 431
784.7 784.9 915 595
776.8 777.0 393 843
777.7 777.9 711 935
771.6 771.8 871 956
770.0 770.2 727 300
768.7 768.9 565 923
I was not able to run your code, but I think you can build the column recursively with the formula you gave like this:
df = pd.DataFrame(v, columns=['Bid', 'Ask', 'BidQ', 'AskQ'])
# initialise your parameters, but they can be a function of something else
A, B, C = 10, 2, 0.5
x0 = 1.0
# create the column x filled with x0 first (as a float, since later values are floats)
df['x'] = x0
# now change each row depending on the previous one and other information
for i in range(1,len(df)):
    df.loc[i,'x'] = A*df.loc[i,'Bid'] + B*df.loc[i-1,'x']**C + df.loc[i,'BidQ']
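As a variation on the loop above, the same recursion can be run over NumPy arrays and written back once, which avoids repeated df.loc assignments. This is only a sketch: recursive_column is a hypothetical helper name, the defaults A=10, B=2, C=0.5, x0=1 are the placeholder values from this answer, and v holds just the first three rows of the question's data.

```python
import numpy as np
import pandas as pd

def recursive_column(df, A=10.0, B=2.0, C=0.5, x0=1.0):
    """Compute x(t) = A*Bid(t) + B*x(t-1)**C + BidQ(t), with x(0) = x0."""
    bid = df["Bid"].to_numpy(dtype=float)
    bid_q = df["BidQ"].to_numpy(dtype=float)
    x = np.empty(len(df))
    if len(df) == 0:
        return pd.Series(x, index=df.index, name="x")
    x[0] = x0
    # Each value depends on the previous one, so this loop cannot be vectorised.
    for t in range(1, len(df)):
        x[t] = A * bid[t] + B * x[t - 1] ** C + bid_q[t]
    return pd.Series(x, index=df.index, name="x")

# First three rows of the sample data from the question.
v = [
    [763.1, 763.3, 89, 65],
    [762.5, 762.7, 861, 687],
    [772.1, 772.3, 226, 761],
]
df = pd.DataFrame(v, columns=["Bid", "Ask", "BidQ", "AskQ"])
df["x"] = recursive_column(df)
print(df)
```

Note that x(t-1)**C with a fractional C gives NaN if a previous value ever becomes negative.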
I tinkered with various approaches, and below is a more complete version of the code.
import pandas as pd
import datetime as dt
import numpy as np
import scipy.stats as stats
import scipy.optimize as optimize
from scipy.signal import lfilter
# if using xlwings addin
#xw.func
#xw.ret(expand='table')
def py_Recursive(v, A=10, B=2, C=0.5):
    # A, B and C are function parameters here, but they can be a function of something else
    df = pd.DataFrame(v, columns=['Bid', 'Ask', 'BidQ', 'AskQ'])
    # create the column Trend filled with 1 first
    df['Trend'] = 1.0
    # now change each row depending on the previous one and other information
    for i in range(1,len(df)):
        df.loc[i,'Trend'] = A * df.loc[i,'Bid'] + B * df.loc[i-1,'Trend']**C + df.loc[i,'BidQ']
    return df
