Error plotting scikit-learn dataset training and test data - python

I am trying to plot the training and test data from a scikit-learn dataset.
import sys, os
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
plt.switch_backend('agg')
%matplotlib inline
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = np.matrix(diabetes.target[:-20]).T
diabetes_y_test = np.matrix(diabetes.target[-20:]).T
plt.scatter(diabetes_X_train, diabetes_y_train, color='black')
plt.scatter(diabetes_X_test, diabetes_y_test, color='red')
but I have the following error:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 422 and the array at index 1 has size 1
I checked the shape of the matrices and the training data has (422,1) and the test data (20,1). What is causing this error?

plt.scatter expects to plot two same-shaped datasets against each other. If they aren't 1D, they will be flattened. It does not make sense to flatten X in a machine-learning problem.
Check the dimensions of X_train and y_train. You'll see that they aren't compatible. This is a 2D plot, so you can only plot one set of numbers against another. X is a matrix: every row is a bunch of numbers.
So you can do this:
import numpy as np
import matplotlib.pyplot as plt
x, y = np.random.random((422, 1)), np.random.random((422, 1))
plt.scatter(x, y)
But you can't do this:
X, y = np.random.random((422, 10)), np.random.random((422, 1))
plt.scatter(X, y)
Which is essentially what you're trying to do. (I don't think you want to transpose y by the way.)
So this should work for you:
plt.scatter(diabetes_X_train[:, 0], diabetes_y_train)
But that only shows the relationship with one feature of X.
Assuming you're just trying to explore the data, I recommend checking out seaborn.pairplot. It's perfect for this sort of thing.
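A minimal sketch of the shape issue, using random data as a stand-in for the diabetes arrays (the names x_train, y_matrix, x_flat and y_flat are illustrative, not from the question). One assumption worth flagging: the np.matrix wrapper used in the question may contribute to the error, because a matrix stays 2-D even after flattening, whereas a plain array flattens to 1-D.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for one feature column and its targets (422 samples each).
x_train = rng.random((422, 1))
y_matrix = np.matrix(rng.random(422)).T   # same construction as in the question

# A plain ndarray flattens to 1-D ...
print(x_train.ravel().shape)              # (422,)
# ... but np.matrix is always 2-D, so ravel() yields a 1 x 422 matrix.
print(y_matrix.ravel().shape)             # (1, 422)

# Converting to a plain array first gives two matching 1-D inputs.
x_flat = x_train[:, 0]
y_flat = np.asarray(y_matrix).ravel()
print(x_flat.shape, y_flat.shape)         # (422,) (422,)
```

With both arrays 1-D and of equal length, plt.scatter(x_flat, y_flat) receives the kind of input it expects.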

Related

Plotting Numpy Nd array (3d to 2d)

I've trained a model and exported it to an .h5 file, then used that .h5 model to make predictions on time-series datasets. To do this, I converted a pandas DataFrame to a NumPy array and added a dummy dimension. For plotting, the data must be 2D rather than a 3D array, so I reshaped it to 2D, but the plot shows nothing. How can I plot the prediction results?
Full code:
from keras.models import load_model
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
model = tf.keras.models.load_model('finaltemp.h5',compile = True)
df = pd.read_excel("new.xls")
#rescaling
mean = df.mean()
std = df.std()
df_new = (df-mean)/std
#pandas to numpy
numpy_array = df_new.to_numpy()
#add dummy dim
x = np.expand_dims(numpy_array, axis=0)
#predict
predictions = model.predict(x)
print(predictions)
# array([[[-0.05154558],
#         [-0.01212088],
#         [-0.07192875],
#         ...,
#         [ 0.24430084],
#         [-0.04761859],
#         [-0.1841197 ]]], dtype=float32)
#get shapes
predictions.shape
# (1, 31390, 1)
#reshape to 2D
newarr = predictions.reshape(1,31390*1)
print(newarr)
# [[-0.05154558 -0.01212088 -0.07192875 ... 0.24430084 -0.04761859
#   -0.1841197 ]]
#plot
plt.plot(newarr)
plt.show()
final result
Following @ShubhamSharma's comment, I changed the plot call to:
plt.plot(predictions.squeeze())
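A small sketch of why the two calls behave differently, using a tiny hypothetical array in place of the real model output. For 2-D input, plt.plot draws each column as its own line, so a (1, N) array becomes N lines of a single point each, which are invisible without markers. squeeze() removes the length-1 axes and leaves one 1-D series.

```python
import numpy as np

# Hypothetical stand-in for model.predict output: (batch, timesteps, 1).
predictions = np.arange(5, dtype=np.float32).reshape(1, 5, 1)

as_row = predictions.reshape(1, -1)   # one row, five columns
as_1d = predictions.squeeze()         # all length-1 axes removed

print(as_row.shape)   # (1, 5)
print(as_1d.shape)    # (5,)
```

An alternative under the same assumption is predictions.reshape(-1), which also yields a 1-D array.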

Plot Tensor based on Numpy Array

I want to create a 3-dimensional tensor based on three NumPy arrays. My idea is to create three NumPy arrays for the column, row and layer. Then I want to plot this tensor like the example image.
For instance a tensor like this one:
I tried to create such a plot with matplotlib, but I understand that my approach is not working because it tries to plot the values within the tensor and not the shape of the tensor.
How can I plot a tensor like the one in the example?
Here is what I have tried:
import numpy as np
import matplotlib.pyplot as plt
def createRowCol():
    array = np.arange(25, dtype='float')
    matrix = np.matrix(array.reshape(5, 5))
    return matrix

def createLayer():
    array = np.arange(6, dtype='float')
    matrix = np.matrix(array.reshape((3, 2)))
    return matrix
row = createRowCol()
col = createRowCol()
layer = createLayer()
fig=plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(row,col,layer) # Error occurs because "layer" has a different shape than "row" and "col"
plt.show()
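This thread has no answer, so the following is only a sketch of one possible direction, not a confirmed solution. ax.scatter needs three coordinate arrays of identical shape. np.indices can build such arrays from the tensor's dimensions alone, so the plot reflects the tensor's shape rather than its values. The 5 x 5 x 3 dimensions are an assumption.

```python
import numpy as np

# Assumed tensor dimensions: 5 rows, 5 columns, 3 layers.
shape = (5, 5, 3)

# np.indices returns one index grid per axis, each with the full tensor shape.
row, col, layer = np.indices(shape)
print(row.shape, col.shape, layer.shape)   # (5, 5, 3) (5, 5, 3) (5, 5, 3)

# Flatten to one (row, col, layer) coordinate per tensor cell.
coords = np.column_stack([row.ravel(), col.ravel(), layer.ravel()])
print(coords.shape)                        # (75, 3)
```

Passing row.ravel(), col.ravel() and layer.ravel() to ax.scatter on a 3D axes would then draw one marker per cell.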

How do I apply FFT on a 3D Array

I have a 3D array that has the shape (features, timestep, samples). I would like to apply the numpy fft function on each feature for the length of timestep for each sample. I have this, but I am uncertain whether this is the best way or whether there needs to be a loop to iterate through each sample.
import numpy as np
x_train_fft = np.fft.fft(x_train, axis=0) #selected axis 0 as this is the axis of features
Looks like this was the way to do it:
X_transform_FFT = []
for i in range(x_train.shape[0]):
    f = abs(np.fft.fft(x_train[i, :, :], axis=1))
    X_transform_FFT.append(f)
# keep the converted array instead of discarding it
X_transform_FFT = np.asarray(X_transform_FFT)
print(X_transform_FFT)
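A sketch of a loop-free alternative, assuming the array really is laid out as (features, timestep, samples) as the question states. Under that assumption the timestep axis of the full array is axis 1, and np.fft.fft can transform along it for every feature and sample in one call. The random data and the sizes are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed layout from the question: (features, timestep, samples).
x_train = rng.random((4, 16, 3))

# Vectorized: transform along the timestep axis for every feature and sample.
fft_vectorized = np.abs(np.fft.fft(x_train, axis=1))

# Equivalent explicit loop over features; each slice is (timestep, samples),
# so the timestep axis of the slice is axis 0.
fft_loop = np.asarray(
    [np.abs(np.fft.fft(x_train[i], axis=0)) for i in range(x_train.shape[0])]
)

print(fft_vectorized.shape)                    # (4, 16, 3)
print(np.allclose(fft_vectorized, fft_loop))   # True
```

Note that inside a per-feature loop the slice is 2-D, so axis numbers shift by one: axis=1 on a slice refers to the last axis of the full array. Which axis is correct depends on the actual data layout.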

Error using sklearn and linear regression: shapes (1,16) and (1,1) not aligned: 16 (dim 1) != 1 (dim 0)

I wanted to learn about machine learning, stumbled upon Siraj on YouTube and his Udacity videos, and wanted to try to pick up a few things.
His video in reference: https://www.youtube.com/watch?v=vOppzHpvTiQ&index=1&list=PL2-dafEMk2A7YdKv4XfKpfbTH5z6rEEj3
In his video, he imported and read a txt file, but when I tried to recreate the txt file it couldn't be read correctly. Instead, I tried to create a pandas DataFrame with the same data and perform the linear regression/predict on it, but then I got the error below.
Found input variables with inconsistent numbers of samples: [1, 16] and something about passing 1d arrays and I need to reshape them.
Then when I tried to reshape them following this post: Sklearn : ValueError: Found input variables with inconsistent numbers of samples: [1, 6]
I get this error....
shapes (1,16) and (1,1) not aligned: 16 (dim 1) != 1 (dim 0)
This is my code below. I know it's probably a syntax error; I'm just not familiar with scikit-learn yet and would like some help.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
#DF = pd.read_fwf('BrainBodyWeight.txt')
DF = pd.DataFrame()
DF['Brain'] = [3.385, .480, 1.350, 465.00,36.330, 27.660, 14.830, 1.040, 4.190, 0.425, 0.101, 0.920, 1.000, 0.005, 0.060, 3.500 ]
DF['Body'] = [44.500, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5,58, 6.40, 4, 5.7,6.6, .140,1, 10.8]
try:
    x = DF['Brain']
    y = DF['Body']
    x = x.tolist()
    y = y.tolist()
    x = np.asarray(x)
    y = np.asarray(y)
    body_reg = linear_model.LinearRegression()
    body_reg.fit(x.reshape(-1,1),y.reshape(-1,1))
    plt.scatter(x,y)
    plt.plot(x,body_reg.predict(x))
    plt.show()
except Exception as e:
    print(e)
Can anyone explain why sklearn doesn't like my input????
According to the documentation, LinearRegression.fit() requires an x array with shape [n_samples, n_features]. That's why you reshape your x array before calling fit. If you don't, you'll have an array with shape (16,), which does not meet the required [n_samples, n_features] shape because it has no n_features dimension.
x = DF['Brain']
x = x.tolist()
x = np.asarray(x)
# 16 samples, no feature axis
x.shape
# (16,)
# 16 samples, 1 feature
x.reshape(-1,1).shape
# (16, 1)
The same requirement goes for the LinearRegression.predict function (and also for consistency), you just simply need to do the same reshaping when calling the predict function.
plt.plot(x,body_reg.predict(x.reshape(-1,1)))
Or alternatively you can just reshape the x array before calling any functions.
And for future reference, you can easily get the underlying NumPy array of values by just calling DF['Brain'].values. You don't need to cast it to list -> numpy array. So you can just use this instead of all the conversion:
x = DF['Brain'].values.reshape(-1,1)
y = DF['Body'].values.reshape(-1,1)
body_reg = linear_model.LinearRegression()
body_reg.fit(x, y)
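The two reshape forms look similar but mean different things under the [n_samples, n_features] convention. A short sketch with the first four Brain values from the question (the variable name brain is illustrative):

```python
import numpy as np

brain = np.array([3.385, 0.480, 1.350, 465.0])   # first four values from the question

print(brain.shape)                  # (4,)    1-D: no feature axis
print(brain.reshape(-1, 1).shape)   # (4, 1)  4 samples, 1 feature
print(brain.reshape(1, -1).shape)   # (1, 4)  1 sample, 4 features
```

For a single-feature regression the (-1, 1) form is the one that matches the data. The (1, -1) form describes a single sample, which is consistent with the (1,16) shape in the error message from the question.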

Data Visualization : Matplotlib and Numpy throwing value error

I am new to machine learning. I was teaching myself data visualization with Matplotlib. My code is pretty simple.
It takes a NumPy array (x = np.random.rand(1, 100)) of shape (1, 100).
It converts the array x into y (y = np.sin(x)).
The final task is to visualise this as a bar chart (plt.bar(x, y, label="BAR", color='r')).
But it is throwing a ValueError. There are already answers to this question, but none has worked for me so far.
In one answer for this question By unutbu
he explains that this error is raised "whenever one tries to evaluate an array in boolean context".
I am unable to understand how I am using these arrays as boolean?
MY CODE:
import matplotlib.pyplot as plt
import numpy as np
#arguments are shape: 1=row; 100=columns
x = np.random.rand(1, 100)
y = np.cos(x)
#bars
plt.bar(x, y, label='Bars1', color='pink')
#legends
plt.legend()
#show the figure
plt.show()
You need to replace
x = np.random.rand(1, 100)
with
x = np.random.rand(100)
The reason is that the former gives you an array of arrays (with one array inside, but it is still a 2D array overall with dimensions 1-by-100), while the latter gives you a 1D array (of length 100). In order to visualize it with plt, you need the latter.
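A quick way to see the difference is to print the shapes. This sketch also shows ravel() as an option for flattening an existing 2-D array instead of regenerating it (the names x_2d and x_1d are illustrative):

```python
import numpy as np

x_2d = np.random.rand(1, 100)   # 2-D: one row, 100 columns
x_1d = np.random.rand(100)      # 1-D: 100 values

print(x_2d.shape, x_2d.ndim)    # (1, 100) 2
print(x_1d.shape, x_1d.ndim)    # (100,) 1

# An existing 2-D array can also be flattened instead of regenerated.
print(x_2d.ravel().shape)       # (100,)
```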
