Why am I getting the error below?
ValueError: Expected 2D array, got scalar array instead: array=5.5.
Reshape your data either using array.reshape(-1, 1) if your data has a
single feature or array.reshape(1, -1) if it contains a single sample.
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv("decision-tree-regression-dataset.csv",sep = ";",header = None)
x = df.iloc[:,0].values.reshape(-1,1)
y = df.iloc[:,1].values.reshape(-1,1)
# decision tree regression
from sklearn.tree import DecisionTreeRegressor
tree_reg = DecisionTreeRegressor() # random_state = 0
tree_reg.fit(x,y)
tree_reg.predict(5.5)
x_ = np.arange(min(x),max(x),0.01).reshape(-1,1)
y_head = tree_reg.predict(x_)
# visualize
plt.scatter(x,y,color="red")
plt.plot(x_,y_head,color = "green")
plt.xlabel("tribun level")
plt.ylabel("ucret")
plt.show()
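Pass a 2D array to predict instead of a bare scalar:
tree_reg.predict([[5.5]])
Note the double brackets [[ ]]: predict expects a 2D array of shape (sample_num, feature_num), so a single value with a single feature has shape (1, 1).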
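To make the shape requirement concrete, here is a minimal sketch. The training data is a synthetic stand-in for the CSV in the question (an assumption, not the original dataset), and random_state=0 is only there to make the run repeatable.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the CSV data (assumption): 10 samples, 1 feature.
x = np.arange(1, 11, dtype=float).reshape(-1, 1)  # shape (10, 1)
y = (100.0 / x).ravel()                           # shape (10,)

tree_reg = DecisionTreeRegressor(random_state=0)
tree_reg.fit(x, y)

# One sample with one feature: shape (1, 1), which is what predict expects.
single = np.array([[5.5]])
pred = tree_reg.predict(single)
print(single.shape, pred.shape)

# Equivalent: start from a scalar and reshape it into one row.
pred_from_scalar = tree_reg.predict(np.array(5.5).reshape(1, -1))
```

Either form works; the list-of-lists version is just shorter to type for a one-off prediction.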
I trained a model, saved it as an .h5 file, and then loaded that file to make predictions on a time-series dataset. To do this I converted the pandas DataFrame to a NumPy array and added a dummy dimension. For plotting I need a 2D array rather than the 3D prediction output, so I reshaped it to 2D, but the plot shows nothing. How can I plot the prediction results?
Full code:
from keras.models import load_model
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
model = tf.keras.models.load_model('finaltemp.h5',compile = True)
df = pd.read_excel("new.xls")
#rescaling
mean = df.mean()
std = df.std()
df_new = (df-mean)/std
#pandas to numpy
numpy_array = df_new.to_numpy()
#add dummy dim
x = np.expand_dims(numpy_array, axis=0)
#predict
predictions = model.predict(x)
print(predictions)
array([[[-0.05154558],
[-0.01212088],
[-0.07192875],
...,
[ 0.24430084],
[-0.04761859],
[-0.1841197 ]]], dtype=float32)
#get shapes
predictions.shape
(1, 31390, 1)
#reshape to 2D
newarr = predictions.reshape(1,31390*1)
print(newarr)
[[-0.05154558 -0.01212088 -0.07192875 ... 0.24430084 -0.04761859
-0.1841197 ]]
#plot
plt.plot(newarr)
plt.show()
Following @ShubhamSharma's comment, I changed the plot call to:
plt.plot(predictions.squeeze())
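For anyone wondering why the reshape version drew nothing: matplotlib treats each column of a 2D array as a separate line, so an array of shape (1, 31390) becomes 31390 lines of one point each, and a one-point line without a marker is invisible. A minimal sketch of the two shapes, using random numbers as a stand-in for the model output (an assumption):

```python
import numpy as np

# Stand-in for model.predict(x) (assumption): same shape as in the question.
rng = np.random.default_rng(0)
predictions = rng.normal(size=(1, 31390, 1)).astype(np.float32)

row = predictions.reshape(1, -1)  # shape (1, 31390): one row, 31390 columns
flat = predictions.squeeze()      # shape (31390,): a plain 1-D series

print(row.shape, flat.shape)
# plt.plot(row)  -> 31390 separate one-point lines, nothing visible
# plt.plot(flat) -> a single line through 31390 points
```

predictions.reshape(-1) or predictions.ravel() would give the same 1-D result as squeeze() here.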
I have a task to create a 30x40 feature matrix with random integers between 1 and 100:
import numpy as np
matrix= np.random.randint(1,100,size=(30,40))
Next I need to rescale the elements of the matrix to the range 5 to 10:
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler()
scaler.fit (5,10)
matrix1 = scaler.fit_transform(matrix)
Which gives me this error:
ValueError: Expected 2D array, got scalar array instead:
array=5.0.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample
I've tried reshaping the data:
matrix.reshape(-1,1)
but I get the same error.
I think you need to define the feature range when you create an instance of MinMaxScaler like this:
scaler = preprocessing.MinMaxScaler(feature_range=(5, 10))
And then you could fit and transform the data like this:
matrix1 = scaler.fit_transform(matrix)
The last line is a short form for:
scaler.fit(matrix)
matrix1 = scaler.transform(matrix)
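Put together, a minimal runnable sketch looks like the following. The seeded generator is my own choice for repeatability, and the upper bound of 101 assumes 100 itself should be a possible value (the upper bound is exclusive, as it is in np.random.randint).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# 30x40 matrix of random integers from 1 to 100 inclusive (seed is an assumption).
rng = np.random.default_rng(0)
matrix = rng.integers(1, 101, size=(30, 40))

# The target range goes into the constructor, not into fit().
scaler = MinMaxScaler(feature_range=(5, 10))
matrix1 = scaler.fit_transform(matrix)

# MinMaxScaler works column by column: each column's minimum maps to 5
# and its maximum maps to 10.
print(matrix1.min(axis=0)[:3], matrix1.max(axis=0)[:3])
```

Note that the scaling is per feature (per column), so the same raw value can map to different scaled values in different columns.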
The dataset I'm using has some categorical columns, and I applied OneHotEncoder to them. Then I tried to join the array of numeric features with the array produced by OneHotEncoder, to form a single array with all the features.
The first array has shape (5074382, 82) and the second has shape (5074382, 9276434).
I tried:
features_final = np.column_stack((features2, features_encoded))
features_final will be used instead of features.
features_encoded: shape (5074382, 9276434), dtype('float64'), type scipy.sparse.csr.csr_matrix
features2: shape (5074382, 82), dtype('float64'), type numpy.ndarray
The code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.options.display.max_columns = None #Display all dataframe columns in a Jupyter Python Notebook
pd.set_option('display.max_rows', 1000)
get_ipython().run_line_magic('matplotlib', 'inline')
CIC2019 = pd.read_csv(r"DrDoS_DNS.csv")
remove = lambda x: x.strip() # remove the blanks in column names
columns = list(CIC2019.columns)
new_columns = list(map(lambda x: x.strip(), columns)) # removing blanks
CIC2019 = pd.read_csv(r"CSV-01-12\DrDoS_DNS.csv", names =new_columns, header = None, skiprows=1,nrows=None)
CIC2019.rename(columns={"Unnamed: 0": "ID"}, inplace=True)
CIC2019 = CIC2019.dropna()
CIC2019.isna().sum()
features = CIC2019.drop("Label", axis =1)
# # Handling categorical attributes
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
CIC2019["Label"]
Label_encoded = encoder.fit_transform(CIC2019["Label"].to_numpy().reshape(1,-1))
features[["Flow ID","Source IP","Timestamp","SimillarHTTP","Destination IP"]]
features2 = features.drop(["Flow ID","Source IP","Timestamp","Destination IP","SimillarHTTP"], axis =1)
features2 = features2.to_numpy()
features_encoded = encoder.fit_transform(features[["Flow ID","Source IP","Timestamp","Destination IP",]].to_numpy())
#"SimillarHTTP" : error when you added this
# # Training - Linear Regression
features_final = np.column_stack((features2, features_encoded))
I got the error:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 5074382 and the array at index 1 has size 1
What is going wrong, and how can I fix it?
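A likely cause, given the types listed above: np.column_stack is not sparse-aware, so it treats the csr_matrix as a single object rather than as a 2D array, which is where the "size 1" in the message comes from. One possible fix is to stack with scipy.sparse.hstack and keep the result sparse. The sketch below uses tiny stand-in arrays (assumptions), not the real data:

```python
import numpy as np
from scipy import sparse

# Tiny stand-ins (assumptions) for the arrays in the question.
dense = np.arange(12, dtype=np.float64).reshape(4, 3)        # like features2
encoded = sparse.csr_matrix(np.eye(4, 5, dtype=np.float64))  # like features_encoded

# Convert the dense block to sparse, then stack column-wise.
features_final = sparse.hstack([sparse.csr_matrix(dense), encoded], format="csr")
print(features_final.shape)
```

Calling features_encoded.toarray() and then using np.column_stack would also line the shapes up, but a dense 5074382 x 9276434 float64 array is far too large to fit in memory, so staying sparse is probably the only practical route here.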
I am trying to plot the training and test data from a scikit-learn dataset.
import sys, os
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
plt.switch_backend('agg')
%matplotlib inline
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = np.matrix(diabetes.target[:-20]).T
diabetes_y_test = np.matrix(diabetes.target[-20:]).T
plt.scatter(diabetes_X_train, diabetes_y_train, color='black')
plt.scatter(diabetes_X_test, diabetes_y_test, color='red')
but I have the following error:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 422 and the array at index 1 has size 1
I checked the shape of the matrices and the training data has (422,1) and the test data (20,1). What is causing this error?
plt.scatter expects two datasets of the same shape to plot against each other. If they aren't 1D, they will be flattened, and flattening X does not make sense in a machine-learning problem.
Check the dimensions of X_train and y_train; you'll see that they aren't compatible. This is a 2D plot, so you can only plot one set of numbers against another, whereas X is a matrix in which every row is a bunch of numbers.
So you can do this:
import numpy as np
import matplotlib.pyplot as plt
x, y = np.random.random((422, 1)), np.random.random((422, 1))
plt.scatter(x, y)
But you can't do this:
X, y = np.random.random((422, 10)), np.random.random((422, 1))
plt.scatter(X, y)
Which is essentially what you're trying to do. (I don't think you want to transpose y by the way.)
So this should work for you:
plt.scatter(diabetes_X_train[:, 0], diabetes_y_train)
But that only shows the relationship with one feature of X.
Assuming you're just trying to explore the data, I recommend checking out seaborn.pairplot. It's perfect for this sort of thing.
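A quick way to do the shape check suggested above is sketched here with synthetic stand-ins for the diabetes arrays (an assumption). It also shows one thing worth knowing about the question's code: an np.matrix always stays 2-D, even when flattened, so converting to plain 1-D arrays before plotting is one option.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((422, 1))          # stand-in for diabetes_X_train
y_train = np.matrix(rng.random(422)).T  # built like diabetes_y_train in the question

print(X_train.shape, y_train.shape)     # both (422, 1)
print(y_train.ravel().shape)            # a matrix stays 2-D when flattened

# Plain 1-D arrays of equal length are the safest input for a scatter plot.
x_1d = X_train[:, 0]
y_1d = np.asarray(y_train).ravel()
print(x_1d.shape, y_1d.shape)
# plt.scatter(x_1d, y_1d, color='black')
```

Building y directly as diabetes.target[:-20], without np.matrix, would avoid the conversion step altogether.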
I am using scikit-learn's single-variable linear regression to predict y from x. The value I want to predict for is a float. How can I convert the float into a NumPy array so that predict accepts it?
import matplotlib.pyplot as plt
import pandas
import numpy as np
from sklearn import linear_model
import sys
colnames = ['charge_time', 'running_time']
data = pandas.read_csv('trainingdata.txt', names=colnames)
data = data[data.running_time < 8]
x = np.array(list(data.charge_time))
y = np.array(list(data.running_time))
clf = linear_model.LinearRegression() # Creating a Linear Regression Model
clf.fit(x[:,np.newaxis], y) # Fitting x and y array as training set
data = float(sys.stdin.readline()) # Input is Float e.g. 4.8
print clf.predict(data[:,np.newaxis]) # As per my understanding parameter should be in 1-D array.
First of all, a suggestion not directly related to your question:
You don't need to do x = np.array(list(data.charge_time)), you can directly call x = np.array(data.charge_time) or, even better, x = data.charge_time.values which directly returns the underlying ndarray.
It is also not clear to me why you're adding a dimension to the input arrays using np.newaxis.
Regarding your question, predict expects an array-like parameter: a list, a NumPy array, or similar.
So you should be able to just do data = np.array([float(sys.stdin.readline())]). Putting the float value in a list ([]) is needed because without it numpy would create a 0-d array (i.e. a single value, which is not sliceable) instead of a 1-d array.
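A minimal sketch of the difference, with made-up training data (an assumption) and a hard-coded 4.8 standing in for the value read from stdin:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data (assumption): y is exactly 2 * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

clf = LinearRegression()
clf.fit(x[:, np.newaxis], y)

value = 4.8                 # stand-in for float(sys.stdin.readline())
scalar = np.array(value)    # 0-d array: cannot be sliced with [:, np.newaxis]
data = np.array([value])    # 1-d array of shape (1,)

# Same slicing as in the question: shape (1,) becomes (1, 1).
pred = clf.predict(data[:, np.newaxis])
print(scalar.ndim, data.shape, pred)
```

With the value wrapped in a list, the original data[:, np.newaxis] call works unchanged, and predict receives one sample with one feature.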