ValueError: x and y must have same first dimension when plotting - python

I am trying to plot an array of x and y values and keep getting this error.
ValueError: x and y must have same first dimension
This is my code:
import numpy as np
import pylab as plt
from matplotlib import rc
def analyze(targt_data, targt_data_name, trang_data, trang_data_name, matches):
    """Analyze a set of samples on target data"""
    _timefrm = [40, 80, 120]
    _scorefilter = 0.8
    index = 0
    matches = matches[np.where(matches[:, 3] > _scorefilter)]
    # PLOTS
    rc('text', usetex=True)
    fig = plt.figure()
    plt1 = fig.add_subplot(321)
    plt1.hold(True)
    plt2 = fig.add_subplot(322)
    plt3 = fig.add_subplot(323)
    plt4 = fig.add_subplot(324)
    plt5 = fig.add_subplot(325)
    plt6 = fig.add_subplot(326)
    matches = matches[np.where(matches[:, 2] == index)]
    avg_score = np.mean(matches[:, 3])
    # PLOT SAMPLE
    plt1.plot(trang_data[index])
    rwresults = [targt_data[y-1:y+np.max(_timefrm)] for y in matches[:, 1]]
    pctresults = [np.log(np.divide(y[1:], y[0])) for y in rwresults]
    for res in pctresults:
        plt1.plot(np.arange(len(trang_data[index]),
                            len(trang_data[index]) + np.max(_timefrm)),
                  np.dot(trang_data[index][-1], np.add(res, 1)))
    plt.show()
results_name = raw_input('Load matching scores: ')
# #### LOAD MATCHING SCORES FROM DB
results, training_data_name, target_data_name = Results(DB).load_matching_scores(results_name)
# #### LOAD TARGET DATA AND TRAINING DATA
target_data = TargetData(DB).load(target_data_name)
training_data = TrainingData(DB).load(training_data_name)
# #### RUN ANALYSIS
analyze(target_data, target_data_name, training_data, training_data_name, results)
Also, here are the values printed out:
(Pdb) len(np.dot(trang_data[ns.index][-1], np.add(pctresults[0], 1)))
120
(Pdb) len(np.arange(len(trang_data[ns.index]), len(trang_data[ns.index])+np.max(_timefrm)))
120
(Pdb) np.dot(trang_data[ns.index][-1], np.add(pctresults[0], 1)).shape
(120,)
(Pdb) np.arange(len(trang_data[ns.index]), len(trang_data[ns.index])+np.max(_timefrm)).shape
(120,)

It turns out one of the subarrays was too short:
(Pdb) len(pctresults[71])
100
The ValueError "x and y must have same first dimension" is raised by plot(x, y) whenever x and y do not have the same length.
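For illustration, here is a minimal sketch (with made-up array lengths, not the data from the question) that reproduces the error condition and guards against it by comparing lengths before calling plot:
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(120)
y = np.random.randn(100)   # deliberately shorter than x

if len(x) == len(y):
    plt.plot(x, y)
    plt.show()
else:
    print('skipping plot: len(x)=%d, len(y)=%d' % (len(x), len(y)))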


AttributeError: 'Tensor' object has no attribute 'ndim'

I was following the classification tutorial (https://www.kymat.io/gallery_1d/classif_keras.html#sphx-glr-gallery-1d-classif-keras-py) for 1D wavelet scattering and I am getting the following error:
Traceback (most recent call last):
  File "filter_signals_fft.py", line 145, in <module>
    x = Scattering1D(J, Q=Q)(x_in)
  File "C:\Users\xwb18152\AppData\Roaming\Python\Python38\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\xwb18152\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\autograph\impl\api.py", line 692, in wrapper
    raise e.ag_error_metadata.to_exception(e)
AttributeError: Exception encountered when calling layer "scattering1d" (type Scattering1D).
in user code:
    File "C:\Users\xwb18152\AppData\Roaming\Python\Python38\site-packages\kymatio\frontend\keras_frontend.py", line 17, in call *
        return self.scattering(x)
    File "C:\Users\xwb18152\AppData\Roaming\Python\Python38\site-packages\kymatio\frontend\keras_frontend.py", line 14, in scattering *
        return self.S.scattering(x)
    File "C:\Users\xwb18152\AppData\Roaming\Python\Python38\site-packages\kymatio\scattering1d\frontend\tensorflow_frontend.py", line 53, in scattering *
        S = scattering1d(x, self.pad_fn, self.backend.unpad, self.backend, self.J, self.log2_T, self.psi1_f, self.psi2_f,
    File "C:\Users\xwb18152\AppData\Roaming\Python\Python38\site-packages\kymatio\scattering1d\core\scattering1d.py", line 76, in scattering1d *
        U_0 = pad_fn(x)
    File "C:\Users\xwb18152\AppData\Roaming\Python\Python38\site-packages\kymatio\scattering1d\frontend\base_frontend.py", line 80, in pad_fn *
        self.pad_mode)
    File "C:\Users\xwb18152\AppData\Roaming\Python\Python38\site-packages\kymatio\scattering1d\backend\tensorflow_backend.py", line 71, in pad *
        return agnostic.pad(x, pad_left, pad_right, pad_mode, axis=axis)
    File "C:\Users\xwb18152\AppData\Roaming\Python\Python38\site-packages\kymatio\scattering1d\backend\agnostic_backend.py", line 36, in pad *
        axis_idx = axis if axis >= 0 else (x.ndim + axis)
    AttributeError: 'Tensor' object has no attribute 'ndim'
Call arguments received:
  • x=tf.Tensor(shape=(None, 4194304), dtype=float32)
I'm not sure why this is the case. I am running Python 3.8.2 [MSC v.1916 64 bit (AMD64)] on win32. Unfortunately, the dataset is too big to share; however, I may be able to provide the x_/y_all and subset arrays as .npy files... Below is the code I am using:
import tensorflow.compat.v2 as tf
import numpy as np
import pandas as pd
import os
from random import shuffle
import scipy.io.wavfile
from pathlib import Path
from scipy import signal
from scipy.signal import butter, sosfilt, sosfreqz
import librosa
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from tensorflow.keras import layers
from kymatio.keras import Scattering1D
# from sys import getsizeof
tf.enable_v2_behavior()
# Set seed for reproducibility
SEED = 42
np.random.seed(SEED)
# We now loop through all recording samples for each movement
# to add to dataset (x/y_all)
movements = 'xyz xy xz x yz y z'.split() # 'xyz xy xz x yz y z'.split()
movements_dict = dict(zip(movements, [7, 4, 5, 1, 6, 2, 3]))
len_max_files = 0.4
len_files = 0
for m in movements:
    files = [fle for fle in os.listdir(f'D:/rf_recordings/move_{m}') if fle.endswith('.wav') and fle.split('_')[3] == '1']
    len_files += int(len(files) * len_max_files)
# len([fle for fle in os.listdir(f'D:/rf_recordings/') if fle.endswith('.wav') and fle.split('_')[3] == '1'])
print(len_files)
# Our sampling rate is 2MHz, so a T value of 2**22 (~4.2M samples)
# corresponds to just over 2s (our samples are pretty much
# all 2s so we could pad...)
T = 2**22
J = 6
Q = 16
log_eps = 1e-6
x_all = np.zeros((len_files, T))
y_all = np.zeros(len_files, dtype=np.uint8)
subset = np.zeros(len_files, dtype=np.uint8)
print('Reading in movement signals')
for m in movements:
    print(m)
    files = [fle for fle in os.listdir(f'D:/rf_recordings/move_{m}') if fle.endswith('.wav') and fle.split('_')[3] == '1']
    shuffle(files)
    files = files[int(len(files) * len_max_files):]
    ratio = int(len(files)*0.2)
    train_files = files[ratio:]
    test_files = files[:ratio]
    # print(train_files, len(test_files))
    for k, filename in enumerate(files):
        name = filename.split('_')
        movedist = name[3]
        speed = name[5]
        y = movements_dict[m]
        if filename in train_files:
            subset[k] = 0
        else:
            subset[k] = 1
        # Read in the sample WAV file
        fs, x = scipy.io.wavfile.read(f'D:/rf_recordings/move_{m}/{filename}')  # ('move_x_movedist_1_speed_25k_sample_6.wav')
        # y = movements_dict[m]  # keep as m for now but we will have to do this with params also later.
        # We convert to mono by averaging the left and right channels.
        x = np.mean(x, axis=1)
        x = np.asarray(x, dtype='float')  # np.float32)
        # Once the recording is in memory, we normalise it to +1/-1
        # x = x / np.max(np.abs(x))
        x /= np.max(np.abs(x))
        ## Pad signal to T
        x_pad = librosa.util.fix_length(x, size=T)
        # print(x.shape, x_pad.shape)
        # If it's too long, truncate it.
        if len(x) > T:
            x = x[:T]
        # If it's too short, zero-pad it.
        start = (T - len(x)) // 2
        x_all[k, start:start+len(x)] = x
        y_all[k] = y
# ## The signal is now zero-padded with shape (4194304,)
# Sx = scattering(x_pad)
# meta = scattering.meta()
# order0 = np.where(meta['order'] == 0)
# order1 = np.where(meta['order'] == 1)
# order2 = np.where(meta['order'] == 2)
#
# plt.figure(figsize=(8, 8))
# plt.subplot(3, 1, 1)
# plt.plot(Sx[order0][0])
# plt.title('Zeroth-order scattering')
# plt.subplot(3, 1, 2)
# plt.imshow(Sx[order1], aspect='auto')
# plt.title('First-order scattering')
# plt.subplot(3, 1, 3)
# plt.imshow(Sx[order2], aspect='auto')
# plt.title('Second-order scattering')
# plt.show()
print('Done reading!')
x_in = layers.Input(shape=(T))
x = Scattering1D(J, Q=Q)(x_in)
x = layers.Lambda(lambda x: x[..., 1:, :])(x)
# To increase discriminability, we take the logarithm of the scattering
# coefficients (after adding a small constant to make sure nothing blows up
# when scattering coefficients are close to zero). This is known as the
# log-scattering transform.
x = layers.Lambda(lambda x: tf.math.log(tf.abs(x) + log_eps))(x)
x = layers.GlobalAveragePooling1D(data_format='channels_first')(x)
x = layers.BatchNormalization(axis=1)(x)
x_out = layers.Dense(10, activation='softmax')(x)
model = tf.keras.models.Model(x_in, x_out)
model.summary()

ValueError: X must be a NumPy array

I am new to Python and machine learning. I got an error when trying to implement a decision regions plot (plot_decision_regions).
I am not sure I understand the problem, so I really need help solving it.
I think the problem may be that the target is a string, but I am not sure, and I do not know how to fix it.
# import arff data using panda
data = arff.loadarff('Run1/Tr.arff')
df = pd.DataFrame(data[0])
data =pd.DataFrame(df)
data = data.loc[:,'ATT1':'ATT576']
target = df['Class']
target=target.astype(str)
#split the data into training and testing
data_train, data_test, target_train, target_test = train_test_split(data, target,test_size=0.30, random_state=0)
model1 = DecisionTreeClassifier(criterion='entropy', max_depth=1)
num_est = [1, 2, 3, 10]
label = ['AdaBoost (n_est=1)', 'AdaBoost (n_est=2)', 'AdaBoost (n_est=3)', 'AdaBoost (n_est=20)']
fig = plt.figure(figsize=(10,8))
gs = gridspec.GridSpec(2,2)
grid = itertools.product([0,1],repeat=2)
for n_est, label, grd in zip(num_est, label, grid):
    boosting = AdaBoostClassifier(base_estimator=model1,n_estimators=n_est)
    boosting.fit(data_train,target_train)
    ax = plt.subplot(gs[grd[0], grd[1]])
    fig = plot_decision_regions(data_train, target_train, clf=boosting, legend=2)
    plt.title(label)
plt.show();
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-646828965d5c> in <module>
      7     boosting.fit(data_train,target_train)
      8     ax = plt.subplot(gs[grd[0], grd[1]])
----> 9     fig = plot_decision_regions(data_train, target_train, clf=boosting, legend=2)  # clf cannot be change because it's a parameter
     10     plt.title(label)
     11

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mlxtend/plotting/decision_regions.py in plot_decision_regions(X, y, clf, feature_index, filler_feature_values, filler_feature_ranges, ax, X_highlight, res, legend, hide_spines, markers, colors, scatter_kwargs, contourf_kwargs, scatter_highlight_kwargs)
    127     """
    128
--> 129     check_Xy(X, y, y_int=True)  # Validate X and y arrays
    130     dim = X.shape[1]
    131

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mlxtend/utils/checking.py in check_Xy(X, y, y_int)
     14     # check types
     15     if not isinstance(X, np.ndarray):
---> 16         raise ValueError('X must be a NumPy array. Found %s' % type(X))
     17     if not isinstance(y, np.ndarray):
     18         raise ValueError('y must be a NumPy array. Found %s' % type(y))

ValueError: X must be a NumPy array. Found <class 'pandas.core.frame.DataFrame'>
I have used another similar dataset. In your code you are trying to plot with more than 2 features, which is not possible with plot_decision_regions; you have to use the different methods discussed in the linked question, Plotting decision boundary for High Dimension Data. But if you want to use only two features, then you can use the code below.
from scipy.io import arff
import pandas as pd
import itertools
from matplotlib import gridspec
from mlxtend.plotting import plot_decision_regions
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from matplotlib import pyplot as plt
data = arff.loadarff('TR.arff')
data = pd.DataFrame(data[0])
df = data.loc[:,['att1','att2','class']]
for col_name in df.columns:
    if(df[col_name].dtype == 'object'):
        df[col_name] = df[col_name].astype('category')
        df[col_name] = df[col_name].cat.codes
target = df['class']
df=df.drop(['class'],axis=1)
data_train, data_test, target_train, target_test = train_test_split(df, target,test_size=0.30, random_state=0)
model1 = DecisionTreeClassifier(criterion='entropy', max_depth=1)
num_est = [1, 2, 3, 10]
label = ['AdaBoost (n_est=1)', 'AdaBoost (n_est=2)', 'AdaBoost (n_est=3)', 'AdaBoost (n_est=20)']
fig = plt.figure(figsize=(10,8))
gs = gridspec.GridSpec(2,2)
grid = itertools.product([0,1],repeat=2)
for n_est, label, grd in zip(num_est, label, grid):
    boosting = AdaBoostClassifier(base_estimator=model1,n_estimators=n_est)
    boosting.fit(data_train,target_train)
    ax = plt.subplot(gs[grd[0], grd[1]])
    fig = plot_decision_regions(data_train.values, target_train.values, clf=boosting, legend=2)
    plt.title(label)
plt.show();
Convert your data into an array, then pass it to the function:
numpy_matrix = data.as_matrix()
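Note that as_matrix() was deprecated and has been removed in recent pandas releases; a minimal sketch of the same conversion using to_numpy() instead (assuming, as in the answer above, that the features have already been reduced to two columns and the labels encoded as integers):
X_arr = data_train.to_numpy()        # same result as data_train.values
y_arr = target_train.to_numpy()
fig = plot_decision_regions(X_arr, y_arr, clf=boosting, legend=2)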

TypeError: list indices must be integers or slices, not list - matplotlib (scatter)

I am plotting data using matplotlib, following this example as a base to plot with four labels. Below you can find the code. However, I am getting this error
Traceback (most recent call last):
File "visualization_SH_Male_female.py", line 86, in <module>
main()
File "visualization_SH_Male_female.py", line 58, in main
plt.scatter(x_list[indices], y_list[indices], marker=markers[i], color=colors[j])
TypeError: list indices must be integers or slices, not list
at the scatter plot call. Can someone point out how I can transform indices into integers?
import matplotlib
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import csv
import numpy as np
from sklearn import preprocessing
def parse_features_from_csv(csv_file):
    feat_lst = []
    id_lst = []
    count = 0
    with open(csv_file) as fr:
        reader = csv.reader(fr, delimiter=',')
        for row in reader:
            s_feat = row[:-1]
            identifier = row[-1]
            if count < 50:
                if (identifier == 'Alan_Cumming' or identifier == 'Chiwetel_Ejiofor' or identifier == 'James_Purefoy' or identifier == 'Johnathon_Schaech' or identifier == 'Will_Poulter'):
                    identifier = 0
                else:
                    identifier = 2
            else:  # >= 50
                if (identifier == 'Alan_Cumming' or identifier == 'Chiwetel_Ejiofor' or identifier == 'James_Purefoy' or identifier == 'Johnathon_Schaech' or identifier == 'Will_Poulter'):
                    identifier = 1
                else:
                    identifier = 3
            s_feat = [float(i) for i in s_feat]
            feat_lst.append(s_feat)
            id_lst.append(identifier)
            count += 1
    return feat_lst, id_lst

def main():
    face_file = 'comb.csv'
    feat_lst, labels = parse_features_from_csv(face_file)
    labels = np.array(labels)
    X_embedded = TSNE(n_components=2).fit_transform(feat_lst)
    x_list = [x for [x, y] in X_embedded]
    y_list = [y for [x, y] in X_embedded]
    # generate a list of markers and another of colors
    markers = ["o", "<"]
    colors = ['r', 'g']
    for i in range(2):
        for j in range(2):
            lab = i + j
            indices = list(map(int, labels == lab))
            print(indices)
            plt.scatter(x_list[indices], y_list[indices], marker=markers[i], color=colors[j])
    plt.legend(['0', '1', '2', '3'])
    plt.grid()
    plt.show()
In Python, this won't work:
a = [1,2,3,4]
b = [2,3]
c = a[b]
because your index ([]) needs to be an integer or slice, not a list.
The simplest method is to create sub-lists containing only the items you need, via list comprehensions. In your case, this is one way to do that:
indices = [k for k, l in enumerate(labels) if l == lab]
x_sublist = [x_list[k] for k in indices]
y_sublist = [y_list[k] for k in indices]
plt.scatter(x_sublist, y_sublist, marker=markers[i], color=colors[j])
The problem seems to be that you use Python lists instead of NumPy arrays. Since the code isn't runnable as posted, the following is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([.4,.8,1.2,1.6,2.0,2.4])
y = np.array([.1,.2,.3,.7,.6,.5])
lab = np.array([1,1,2,2,1,2])
for l in np.unique(lab):
    indices = (lab == l)
    plt.scatter(x[indices], y[indices], label=str(l))
plt.legend()
plt.show()

How can I interpolate data in python?

I have a 4D dataset (time, z, y, x) and I would like to interpolate the data to get a higher resolution. This is a simple example:
import numpy as np
from scipy.interpolate import griddata
x_0 = 10
cut_index = 10
res = 200j
x_index = x_0
y_index = np.linspace(0, 100, 50).astype(int)
z_index = np.linspace(0, 50, 25).astype(int)
#Time, zyx-coordinate
u = np.random.randn(20, 110, 110, 110)
z_index, y_index = np.meshgrid(z_index, y_index)
data = u[cut_index, z_index, y_index, x_index]
res = 200j
y_f = np.mgrid[0:100:res]
z_f = np.mgrid[0:50:res]
z_f, y_f = np.meshgrid(z_f, y_f)
data = griddata((z_index, y_index), data, (z_f, y_f))
I am getting the error ValueError: invalid shape for input data points. What kind of input does the griddata function expect?
The point coordinates and values you pass to griddata have to be 1D arrays. Try flattening the arrays:
data = griddata((z_index.flatten(), y_index.flatten()), data.flatten(), (z_f, y_f))
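For reference, a self-contained sketch of the same pattern, with random data standing in for u and shapes chosen to match the question:
import numpy as np
from scipy.interpolate import griddata

u = np.random.randn(20, 110, 110, 110)            # (time, z, y, x)
y_index = np.linspace(0, 100, 50).astype(int)
z_index = np.linspace(0, 50, 25).astype(int)
z_grid, y_grid = np.meshgrid(z_index, y_index)    # both (50, 25)
values = u[10, z_grid, y_grid, 10]                # 2D slice at fixed time and x

# Interpolation targets: a finer 200 x 200 grid over the same ranges.
z_f, y_f = np.meshgrid(np.linspace(0, 50, 200), np.linspace(0, 100, 200))

# griddata expects the sample coordinates and values as flat 1D arrays.
fine = griddata((z_grid.flatten(), y_grid.flatten()), values.flatten(), (z_f, y_f))
print(fine.shape)                                  # (200, 200)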

IndexError: too many indices for array for an array that is definitely as big

I'm trying to make a movie by taking png images of an updating plot and stitching them together. There are three variables: degrees, ksB, and mp. Only mp changes each frame; the other two are constant. The data for mp for all times is stored in X. This is the relevant part of the code:
def plot(fname, haveMLPY=False):
    # Load data from .npz file.
    data = np.load(fname)
    X = data["X"]
    T = data["T"]
    N = X.shape[1]
    A = data["vipWeights"]
    degrees = A.sum(1)
    ksB = data["ksB"]
    # Initialize a figure.
    figure = plt.figure()
    # Generate a plottable axis as the first subplot in 1 row and 1 column.
    axis = figure.add_subplot(1,1,1)
    # MP is the first (0th) variable. Plot one trajectory for each cell over time.
    axis.plot(T, X[:,:,0], color="black")
    # Decorate the plot.
    axis.set_xlabel("time [hours]")
    axis.set_ylabel("MP [nM]")
    axis.set_title("PER mRNA concentration across all %d cells" % N)
    firstInd = int(T.size / 2)
    if haveMLPY:
        import circadian.analysis
        # Generate and plot a Signal object, which encapsulates wavelet analysis.
        signal = circadian.analysis.Signal(X[firstInd:, 0, 0], T[firstInd:])
        signal.showSpectrum(show=False)
    files = []
    # filename for the name of the resulting movie
    filename = 'animation'
    mp = X[10**4-1,:,0]
    from mpl_toolkits.mplot3d import Axes3D
    for i in range(10**4):
        print i
        mp = X[i,:,0]
        data2 = np.c_[degrees, ksB, mp]
        # Find best fit surface for data2
        # regular grid covering the domain of the data
        mn = np.min(data2, axis=0)
        mx = np.max(data2, axis=0)
        X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
        XX = X.flatten()
        YY = Y.flatten()
        order = 2    # 1: linear, 2: quadratic
        if order == 1:
            # best-fit linear plane
            A = np.c_[data2[:,0], data2[:,1], np.ones(data2.shape[0])]
            C,_,_,_ = scipy.linalg.lstsq(A, data2[:,2])    # coefficients
            # evaluate it on grid
            Z = C[0]*X + C[1]*Y + C[2]
            # or expressed using matrix/vector product
            #Z = np.dot(np.c_[XX, YY, np.ones(XX.shape)], C).reshape(X.shape)
        elif order == 2:
            # best-fit quadratic curve
            A = np.c_[np.ones(data2.shape[0]), data2[:,:2], np.prod(data2[:,:2], axis=1), data2[:,:2]**2]
            C,_,_,_ = scipy.linalg.lstsq(A, data2[:,2])
            # evaluate it on a grid
            Z = np.dot(np.c_[np.ones(XX.shape), XX, YY, XX*YY, XX**2, YY**2], C).reshape(X.shape)
        fig = plt.figure()
        ax = fig.add_subplot(111, projection='3d')
        ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.2)
        ax.scatter(degrees, ksB, mp)
        ax.set_xlabel('degrees')
        ax.set_ylabel('ksB')
        ax.set_zlabel('mp')
        # form a filename
        fname2 = '_tmp%03d.png' % i
        # save the frame
        savefig(fname2)
        # append the filename to the list
        files.append(fname2)
    # call mencoder
    os.system("mencoder 'mf://_tmp*.png' -mf type=png:fps=10 -ovc lavc -lavcopts vcodec=wmv2 -oac copy -o " + filename + ".mpg")
    # cleanup
    for fname2 in files: os.remove(fname2)
Basically, all the data is stored in X. The format X[i, i, i] means X[time, neuron, data type]. Each time through the loop, I want to update the time, but still plot mp (the 0th variable) for all the neurons.
When I run this code, I get "IndexError: too many indices for array". I asked it to print i to see when the code was going wrong. I get an error when i = 1, meaning that the code loops through once but then has the error the second time.
However, I have data for 10^4 time steps. You can see in the first line of the provided code, I access X[10**4-1, :, 0] successfully. That's why it's confusing to me why X[1,:,0] would be out of range. If anybody could explain why/help me get around this, that would be great.
The traceback error is
Traceback (most recent call last):
  File "/Users/angadanand/Documents/LiClipseWorkspace/Circadian/scripts/runMeNets.py", line 196, in <module>
    plot(fname)
  File "/Users/angadanand/Documents/LiClipseWorkspace/Circadian/scripts/runMeNets.py", line 142, in plot
    mp = X[i,:,0]
IndexError: too many indices for array
Thanks!
Your problem is that you overwrite your X inside your loop:
X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
So afterwards it will have another shape and contain different data. I would suggest renaming this second X to X_grid and checking where you need this "other" X and where you need the original.
for example:
X_grid, Y_grid = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
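Assuming the rest of the loop body stays as posted, the remaining grid references simply follow the rename, e.g.:
XX = X_grid.flatten()
YY = Y_grid.flatten()
# order == 1: Z = C[0]*X_grid + C[1]*Y_grid + C[2]
# order == 2:
Z = np.dot(np.c_[np.ones(XX.shape), XX, YY, XX*YY, XX**2, YY**2], C).reshape(X_grid.shape)
ax.plot_surface(X_grid, Y_grid, Z, rstride=1, cstride=1, alpha=0.2)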
