Drawing a trendline using matplotlib - python

I've been trying to graph some price vs. time data, and I cannot figure out a way of drawing a trendline on it. The dates are datetime objects. The graph itself is fine. However, using polyfit as I do below throws an error.
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
from datetime import datetime
import numpy as np
import matplotlib.pylab as plb
notes = pd.read_csv("tsla.csv")
notes.dropna(inplace=True)
date_list = notes['x']
price_list = notes['Close']
date_list = date_list.tolist()
price_list = price_list.tolist()
for i in range(len(date_list)):
    date_list[i] = (date_list[i][:-8])
    date_list[i] = date_list[i][:-5] + date_list[i][-3:-1]
    ##print(len(date_list[i]))
    date_list[i] = datetime.strptime(date_list[i], "%m/%d/%y")
    ##print(date_list[i])
##print(date_list)
price_list = list(map(lambda x: int(x), price_list))
plt.plot(date_list, price_list)
plt.ylabel("Prices")
plt.xlabel("Dates")
# calc the trendline (it is simply a linear fitting)
z = np.polyfit(date_list, price_list, 1)
p = np.poly1d(z)
plb.plot(x,p(x),"r--")
##### Showing time series line graph below
plt.show()
Error below
Traceback (most recent call last):
  File "/Users/ramapriyansrivatsanpd/Documents/Python for finance - fintech soc.py", line 42, in <module>
    z = np.polyfit(date_list, price_list, 1)
  File "<__array_function__ internals>", line 5, in polyfit
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/polynomial.py", line 590, in polyfit
    x = NX.asarray(x) + 0.0
TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'float'

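I know it's been 10 months, but maybe this helps you or someone else.
You could cast the y-axis values to int and the x-axis values to datetime objects, and after that convert the dates to numbers with x = mdates.date2num(x), where mdates is matplotlib.dates. polyfit needs numeric x values, which is why passing the datetime objects directly fails.
After this it worked for me.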
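A minimal sketch of that approach, assuming a handful of made-up dates and prices in place of tsla.csv (the names dates, prices and x_num are illustrative, not from the question). The dates are converted to floats with date2num before fitting, and the fitted line is evaluated on those same floats:

```python
from datetime import datetime, timedelta

import numpy as np
import matplotlib.dates as mdates

# Made-up sample data: five consecutive days, price rising by 2 per day.
dates = [datetime(2020, 1, 1) + timedelta(days=i) for i in range(5)]
prices = [100, 102, 104, 106, 108]

# polyfit cannot do arithmetic on datetime objects, so convert them to
# floats (days since matplotlib's epoch) first.
x_num = mdates.date2num(dates)

# Degree-1 fit: slope is in price units per day.
slope, intercept = np.polyfit(x_num, prices, 1)
trend = np.poly1d([slope, intercept])

# Evaluate the trendline at the numeric x values.
trend_values = trend(x_num)
```

To draw it, something like plt.plot(dates, prices) followed by plt.plot(dates, trend_values, "r--") should work, since the trend values line up one-to-one with the original datetime objects.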

Related

What is invalid index to scalar variable error in python?

I am quite new to python so please bear with me.
Currently, this is my code
import statistics
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from datetime import datetime
df = pd.read_csv(r"/Users/aaronhuang/Documents/Desktop/ffp/exfileCLEAN2.csv", skiprows=[1]) # replace this with wherever the file is.
start_time = datetime.now()
magnitudes = df['Magnitude '].values
times = df['Time '].values
average = statistics.mean(magnitudes)
sd = statistics.stdev(magnitudes)
below = sd*3
i = 0
while(i < len(df['Magnitude '])):
    if(abs(df['Magnitude '][i]) <= (average - below)):
        print(df['Time '][i])
        outlier_indicies=(df['Time '][i])
    i += 1
window = 2
num = 1
x = times[outlier_indicies[num]-window:outlier_indicies[num]+window+1]
y = magnitudes[outlier_indicies[num]-window:outlier_indicies[num]+window+1]
plt.plot(x, y)
plt.xlabel('Time (units)')
plt.ylabel('Magnitude (units)')
plt.show()
fig = plt.figure()
It outputs this:
/Users/aaronhuang/.conda/envs/EXTTEst/bin/python "/Users/aaronhuang/PycharmProjects/EXTTEst/Code sandbox.py"
2456116.494
2456116.535
2456116.576
2456116.624
2456116.673
2456116.714
2456116.799
2456123.527
2456166.634
2456570.526
2456595.515
2457485.722
2457497.93
2457500.674
2457566.874
2457567.877
Traceback (most recent call last):
  File "/Users/aaronhuang/PycharmProjects/EXTTEst/Code sandbox.py", line 38, in <module>
    x = times[outlier_indicies[num]-window:outlier_indicies[num]+window+1]
IndexError: invalid index to scalar variable.
Process finished with exit code 1
How can I solve this error? I would like my code to take the "time" values printed, and graph them to their "magnitude" values. If there are any questions please leave a comment.
Thank you
I can't tell exactly what you are trying to do. The indexing expression you are using should evaluate to something like times[10:20], which goes from the 10th to the 20th index of times. The problem is that (I'm guessing) the numbers you have in there aren't ints but timestamps. Note also that outlier_indicies is overwritten with a single time value on each pass through the loop, so it ends up as a scalar, and indexing it with [num] is what raises the error.
Maybe you want something like this, once outlier_indicies is a list of the outlier times:
mask = (times > outlier_indicies[num-window]) & (times < outlier_indicies[num+window+1])
x = times[mask]
y = magnitudes[mask]
But I'm really just guessing, and obviously I can't see your data.
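One way to read the intent is to collect the integer positions of the outliers and slice a window around one of them. The sketch below assumes that reading and uses made-up arrays in place of the CSV columns. The 1.5-standard-deviation cut-off and the name outlier_positions are hypothetical, chosen only so that the tiny sample contains an outlier:

```python
import numpy as np

# Made-up data standing in for the 'Time ' and 'Magnitude ' columns.
times = np.array([10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0])
magnitudes = np.array([5.0, 5.1, 4.9, 1.0, 5.0, 5.2, 4.8])

# Hypothetical cut-off: more than 1.5 standard deviations below the mean.
threshold = magnitudes.mean() - 1.5 * magnitudes.std()

# Integer positions (not time values) of the outliers, so they can be
# used directly as slice bounds.
outlier_positions = np.flatnonzero(magnitudes <= threshold)

window = 2
pos = outlier_positions[0]

# Clamp the start so a window near the beginning does not wrap around.
start = max(pos - window, 0)
x = times[start:pos + window + 1]
y = magnitudes[start:pos + window + 1]
```

Keeping positions rather than time values is the design choice here: slicing needs integers, while the time values are floats such as 2456116.494.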

Lambdify throws TypeError: can't multiply sequence by non-int of type 'float' when I try to plot through Matplotlib

Whenever I try to run this:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
from numpy import *
import sympy
from sympy.abc import x,y
from sympy import symbols
from sympy.plotting import plot
def Plot_2D_RHS(xdata,ydata,RHS):
    x = symbols('x')
    RHS=sympy.sympify(RHS)
    RHS=sympy.lambdify([x], RHS)
    plt.figure(figsize=(6, 4))
    plt.scatter(xdata, ydata, label='Data')
    plt.plot(xdata, RHS(xdata), label='Fit')
    plt.legend(loc='best')
    plt.show()
xdata=[1, 4 , 6, 8 ,9,10]
ydata=[2, 5, 7, 8,11,12]
RHS='9.8+3.7*x'
Plot_2D_RHS(xdata,ydata,RHS)
I get
Traceback (most recent call last):
  File "C:\Users\Essam\Source\Repos\Curve-Fitting\NetwonWillRemember\Trail1\Trail1.py", line 182, in <module>
    Plot_2D_RHS(xdata,ydata,RHS)
  File "C:\Users\Essam\Source\Repos\Curve-Fitting\NetwonWillRemember\Trail1\Trail1.py", line 16, in Plot_2D_RHS
    plt.plot(xdata, RHS(xdata), label='Fit')
  File "<lambdifygenerated-1>", line 2, in _lambdifygenerated
TypeError: can't multiply sequence by non-int of type 'float'
Press any key to continue . . .
and no plot is shown. However, for some reason, if I change the RHS to something like 'cos(x)' it works with no problems. How can I solve this issue without using SymPy's plot, since it doesn't offer scatter plots?
In [23]: sympify('9.8+3.7*x')
Out[23]: 3.7⋅x + 9.8
In [24]: f=lambdify([x],sympify('9.8+3.7*x'))
In [25]: print(f.__doc__)
Created with lambdify. Signature:
func(x)
Expression:
3.7*x + 9.8
Source code:
def _lambdifygenerated(x):
    return (3.7*x + 9.8)
Imported modules:
It works fine with an array argument, but not with a list:
In [26]: f(np.arange(3))
Out[26]: array([ 9.8, 13.5, 17.2])
In [27]: f([1,2,3])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-27-816cce84b257> in <module>
----> 1 f([1,2,3])
<lambdifygenerated-2> in _lambdifygenerated(x)
1 def _lambdifygenerated(x):
----> 2 return (3.7*x + 9.8)
TypeError: can't multiply sequence by non-int of type 'float'
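A small sketch of the fix this points to: convert the list to a NumPy array before calling the generated function. To keep the example free of SymPy, the function below is a hand-written copy of the source that lambdify printed above; the names generated and x_arr are illustrative only. The 'cos(x)' case presumably works because the NumPy cosine that lambdify substitutes converts a list to an array by itself, whereas a plain float-times-list multiplication does not.

```python
import numpy as np

# Hand-written copy of the function body that lambdify printed above.
def generated(x):
    return 3.7 * x + 9.8

xdata = [1, 4, 6, 8, 9, 10]

# Passing the plain list fails: Python cannot multiply a list by a float.
try:
    generated(xdata)
    list_failed = False
except TypeError:
    list_failed = True

# Converting to an array first makes the arithmetic element-wise.
x_arr = np.asarray(xdata, dtype=float)
fit = generated(x_arr)
```

In the question's Plot_2D_RHS, the equivalent change would be to call RHS(np.asarray(xdata)) instead of RHS(xdata).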

How to Prevent TypeError: only size-1 arrays can be converted to Python scalars from happening

I am trying to visualise a dataset with matplotlib.
The code is:
import time as ti
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import csv
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split
from scipy.interpolate import *
data = pd.read_csv("includes\\csv.csv")
#x = array(data["day"])
#y = np.array(data["balance"])
x = float(np.array(data["day"]))
y = float(np.array(data["balance"]))
p1 = np.polyfit(x, y, 1)
print(p1)
plt.plot(x, y, "o")
plt.plot(x, polyval(p1, x), "-r")
plt.show()
The error that occurs is:
Traceback (most recent call last):
  File "mittel.py", line 19, in <module>
    x = float(np.array(data["day"]))
TypeError: only size-1 arrays can be converted to Python scalars
I am wondering why that happens, because the CSV file I am using is this simple:
balance,day
242537,28-5
246362,29-5
246659,30-5
246844,31-5
I have been working on this for hours.
Any answers appreciated.
The day column in your CSV file holds values like '28-5', '29-5', and so on.
np.array(data['day']) returns an array, and float() can only convert a single value, not a whole array, so you get the TypeError.
Change the two lines that assign x and y to this:
x = [float(day_str.split('-')[0]) for day_str in np.array(data["day"])]
y = np.array(data["balance"], dtype=float)
I solved it by formatting it into a n/m/y format.
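Taking only the day number breaks as soon as the data crosses a month boundary, so parsing the column as dates is the more robust route. A minimal sketch, assuming the values are day-month pairs and assuming a year, since the file does not state one (ASSUMED_YEAR is a hypothetical placeholder):

```python
from datetime import datetime

import numpy as np

# Values copied from the CSV excerpt in the question.
days = ["28-5", "29-5", "30-5", "31-5"]
balance = np.array([242537, 246362, 246659, 246844], dtype=float)

ASSUMED_YEAR = 2020  # hypothetical: the file does not state a year

# Parse each "day-month" string into a real date.
dates = [datetime.strptime(f"{d}-{ASSUMED_YEAR}", "%d-%m-%Y") for d in days]

# Use whole days elapsed since the first row as the numeric x axis.
x = np.array([(d - dates[0]).days for d in dates], dtype=float)

# Linear fit and the fitted values at each x.
slope, intercept = np.polyfit(x, balance, 1)
fitted = np.polyval([slope, intercept], x)
```

Here slope is the average change in balance per day, and x and fitted can be passed straight to plt.plot.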

How can use scipy with a datetime without the right formatting?

I am trying to visualise a dataset and its average with scipy.interpolate and matplotlib.
But when I try to run the code, which I expected to work fine, it gives me this error:
  File "mittel.py", line 19, in <module>
    p1 = polyfit(x, y, 1)
  File "C:\Users\simon\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\lib\polynomial.py", line 589, in polyfit
    x = NX.asarray(x) + 0.0
TypeError: can only concatenate str (not "float") to str
And the code is:
import time as ti
import pandas as pd
from numpy import *
from matplotlib import pyplot as plt
import csv
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split
from scipy.interpolate import *
data = pd.read_csv("includes\\csv.csv")
x = array(data["day"])
y = array(data["balance"])
p1 = polyfit(x, y, 1)
print(p1)
plt.plot(x, y, "o")
plt.plot(x, polyval(p1, x), "-r")
plt.show()
I have already tried to convert the x array to a string with
x = str(x)
but that didn't help at all.
My csv file looks like this:
balance,day
242537,28-5
246362,29-5
246659,30-5
246844,31-5
Do you know why that error occurs?
x = NX.asarray(x) + 0.0
TypeError: can only concatenate str (not "float") to str
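As you can see here, + is interpreted as string concatenation because x holds strings, whereas polyfit needs to add a float. So instead of converting x to a string, convert its elements to numbers. Calling float() on the whole array does not work (it raises "only size-1 arrays can be converted to Python scalars"), and values such as '28-5' have to be parsed first, for example by taking the day part:
x = array([float(d.split('-')[0]) for d in data["day"]])
y = array(data["balance"], dtype=float)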

Type error while calculating numpy array mean

I am trying to perform linear discriminant analysis using the code given below
#!/usr/bin/python3
import pandas as pd
df = pd.read_excel('Hazara1.xlsx', sheetname='Sheet1')
feature_dict = {i:label for i,label in zip(
    range(15), ("DYS19","DYS389I","DYS389II","DYS390","DYS391","DYS392","DYS393","DYS437","DYS438","DYS439","DYS448","DYS456","DYS458","DYS635","Y_GATA_H4",))}
df.columns = [l for i,l in sorted(feature_dict.items())] + ['class label']
df.dropna(how="all", inplace=True)
from sklearn.preprocessing import LabelEncoder
X = df[[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]].values
y = df['class label'].values
enc = LabelEncoder()
label_encoder = enc.fit(y)
y = label_encoder.transform(y) + 1
label_dict = {1: 'Central_Asia', 2: 'South_Asia', 3:'Russia',4:'East_Asia',5:'Hazara'}
from matplotlib import pyplot as plt
import numpy as np
import math
np.set_printoptions(precision=15)
mean_vectors = []
for cl in range(1,6):
    mean_vectors.append(np.mean(X[y==cl], axis=0))
Where df contains data as shown here.
But when I execute the above code I get following error:
Traceback (most recent call last):
  File "iris2.py", line 30, in <module>
    mean_vectors.append(np.mean(X[y==cl], axis=0))
  File "/home/ammar/anaconda3/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 2878, in mean
    out=out, keepdims=keepdims)
  File "/home/ammar/anaconda3/lib/python3.5/site-packages/numpy/core/_methods.py", line 65, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Is there a way to solve this problem?
