How to read and display MNIST dataset? - python
The code below opens the mnist dataset as a csv
import numpy as np
import csv
import matplotlib.pyplot as plt
with open('C:/Z_Uni/Individual_Project/Python_Projects/NeuralNet/MNIST_Dataset/mnist_train.csv/mnist_train.csv', 'r') as csv_file:
    for data in csv.reader(csv_file):
        # The first column is the label
        label = data[0]
        # The rest of columns are pixels
        pixels = data[1:]
        # Make those columns into a array of 8-bits pixels
        # This array will be of 1D with length 784
        # The pixel intensity values are integers from 0 to 255
        pixels = np.array(pixels, dtype='uint8')
        print(pixels.shape)
        # Reshape the array into 28 x 28 array (2-dimensional array)
        pixels = pixels.reshape((28, 28))
        print(pixels.shape)
        # Plot
        plt.title('Label is {label}'.format(label=label))
        plt.imshow(pixels, cmap='gray')
        plt.show()
        break # This stops the loop, I just want to see one
I got the code above from someone and cannot get it to display the mnist digits.
I get the error:
Traceback (most recent call last):
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\Test_View_Mnist.py", line 16, in <module>
    pixels = np.array(pixels, dtype='uint8')
ValueError: invalid literal for int() with base 10: '1x1'
When I remove dtype='uint8'
I get the error:
Traceback (most recent call last):
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\Test_View_Mnist.py", line 24, in <module>
    plt.imshow(pixels, cmap='gray')
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\_api\deprecation.py", line 456, in wrapper
    return func(*args, **kwargs)
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\pyplot.py", line 2640, in imshow
    _ret = gca().imshow(
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\_api\deprecation.py", line 456, in wrapper
    return func(*args, **kwargs)
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\__init__.py", line 1412, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\axes\_axes.py", line 5488, in imshow
    im.set_data(X)
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\image.py", line 706, in set_data
    raise TypeError("Image data of dtype {} cannot be converted to "
TypeError: Image data of dtype <U5 cannot be converted to float
Process finished with exit code 1
Could someone explain why this error is happening and how to fix it?
Thanks.
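There are two problems here. (1) You need to skip the first row, because it is a header holding column names (1x1, 1x2 and so on) rather than pixel values. That is why int() fails on '1x1'. (2) The pixel strings must be converted to a numeric dtype; the code below uses int64 (uint8 would also hold values from 0 to 255 once the header is skipped). The code below solves both. next(csvreader) skips the first row.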
import numpy as np
import csv
import matplotlib.pyplot as plt
with open('./mnist_test.csv', 'r') as csv_file:
    csvreader = csv.reader(csv_file)
    # Skip the header row (column names such as 1x1, 1x2, ...)
    next(csvreader)
    for data in csvreader:
        # The first column is the label
        label = data[0]
        # The rest of columns are pixels
        pixels = data[1:]
        # Convert those columns into a 1D integer array of length 784
        # The pixel intensity values are integers from 0 to 255
        pixels = np.array(pixels, dtype='int64')
        print(pixels.shape)
        # Reshape the array into 28 x 28 array (2-dimensional array)
        pixels = pixels.reshape((28, 28))
        print(pixels.shape)
        # Plot
        plt.title('Label is {label}'.format(label=label))
        plt.imshow(pixels, cmap='gray')
        plt.show()
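To check the header-skipping step without the real file, here is a minimal sketch that parses a tiny in-memory CSV. The header names ("label", "1x1", ..., "28x28"), the sample row and the helper read_digits are all made up for illustration; only the 784-pixel layout comes from the question.

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for the MNIST CSV: one header row, then one sample
# whose first pixel is 255 and whose remaining 783 pixels are 0.
header = ["label"] + [f"{r}x{c}" for r in range(1, 29) for c in range(1, 29)]
sample = ["7", "255"] + ["0"] * 783
csv_text = ",".join(header) + "\n" + ",".join(sample) + "\n"


def read_digits(csv_file):
    """Yield (label, 28x28 uint8 image) pairs, skipping the header row."""
    reader = csv.reader(csv_file)
    next(reader)  # the first row holds column names, not pixel values
    for row in reader:
        label = int(row[0])
        # Parse each pixel string explicitly, then store as 8-bit values.
        image = np.array([int(v) for v in row[1:]], dtype=np.uint8).reshape(28, 28)
        yield label, image


label, image = next(read_digits(io.StringIO(csv_text)))
print(label, image.shape, image.dtype, image[0, 0])
```

With a real file, the same generator could be fed an open file object, and each image passed to plt.imshow(image, cmap='gray').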
Related
Can't get correct input for DBSCAN clustering
I have a node2vec embedding stored as a .csv file; the values form a square symmetric matrix. I have two versions of this, one with node names in the first column and another with node names in the first row. I would like to cluster this data with DBSCAN, but I can't seem to figure out how to get the input right. I tried this:
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn import metrics
input_file = "node2vec-labels-on-columns.emb"
# for tab delimited use:
df = pd.read_csv(input_file, header = 0, delimiter = "\t")
# put the original column names in a python list
original_headers = list(df.columns.values)
emb = df.as_matrix()
db = DBSCAN(eps=0.3, min_samples=10).fit(emb)
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
print("Estimated number of clusters: %d" % n_clusters_)
print("Estimated number of noise points: %d" % n_noise_)
This leads to an error:
dbscan.py:14: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  emb = df.as_matrix()
Traceback (most recent call last):
  File "dbscan.py", line 15, in <module>
    db = DBSCAN(eps=0.3, min_samples=10).fit(emb)
  File "C:\Python36\lib\site-packages\sklearn\cluster\_dbscan.py", line 312, in fit
    X = self._validate_data(X, accept_sparse='csr')
  File "C:\Python36\lib\site-packages\sklearn\base.py", line 420, in _validate_data
    X = check_array(X, **check_params)
  File "C:\Python36\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
    return f(**kwargs)
  File "C:\Python36\lib\site-packages\sklearn\utils\validation.py", line 646, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "C:\Python36\lib\site-packages\sklearn\utils\validation.py", line 100, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I've tried other input methods that lead to the same error. All the tutorials I can find use datasets imported from sklearn, so they are of no help in figuring out how to read from a file. Can anyone point me in the right direction?
The error does not come from reading the dataset from a file; it comes from the content of the dataset. DBSCAN is meant to be used on numerical data and, as stated in the error, it does not support NaNs. If you want to cluster strings or labels, you should find some other model.
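One way to see where the NaNs come from is to load the file with the node names as the index and count missing values before clustering. The sketch below uses a made-up three-row, tab-delimited table in place of the real embedding file; the column names and values are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-delimited embedding with node names in the first column.
# Node B is missing its second value on purpose.
text = "node\td1\td2\nA\t0.1\t0.2\nB\t0.3\t\nC\t0.5\t0.6\n"

# index_col=0 keeps the node names out of the numeric matrix.
df = pd.read_csv(io.StringIO(text), sep="\t", header=0, index_col=0)

# Coerce anything non-numeric to NaN, then report how many values are missing.
numeric = df.apply(pd.to_numeric, errors="coerce")
print("NaN count per column:")
print(numeric.isna().sum())

# Drop incomplete rows (one possible policy; imputing is another).
clean = numeric.dropna()
emb = clean.to_numpy(dtype=float)
print(emb.shape, np.isfinite(emb).all())
```

Once emb contains only finite numbers, it can be passed to DBSCAN(...).fit(emb) as in the question. Whether dropping rows is acceptable depends on the data, so treat that step as an assumption.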
I want to use matplotlib to show a grayscale figure transformed by TensorFlow 2.0
I am new to TensorFlow 2.0. After loading an image, I want to plot the grayscale version produced by TensorFlow, but an error came up.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
im = tf.io.read_file('/home/1.png')
image = tf.image.decode_png(im)
image_gray = tf.image.rgb_to_grayscale(image)
plt.figure()
plt.imshow(image_gray)
Then this error appears:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/pyplot.py", line 2677, in imshow
    None else {}), **kwargs)
  File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/__init__.py", line 1599, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/cbook/deprecation.py", line 369, in wrapper
    return func(*args, **kwargs)
  File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/cbook/deprecation.py", line 369, in wrapper
    return func(*args, **kwargs)
  File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 5679, in imshow
    im.set_data(X)
  File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/image.py", line 690, in set_data
    .format(self._A.shape))
TypeError: Invalid shape (321, 327, 1) for image data
But the original image, before the grayscale conversion, displays without any problem:
plt.figure()
plt.imshow(image)
plt.show()
The important part of your error message is:
TypeError: Invalid shape (321, 327, 1) for image data
TensorFlow's rgb_to_grayscale documents this output shape:
The size of the last dimension of the output is 1, containing the Grayscale value of the pixels.
Matplotlib can't handle grayscale images stored that way; it expects a shape like (321, 327), i.e. without the trailing dimension of size 1. The result of rgb_to_grayscale is a TensorFlow tensor, so you can drop the extra dimension with tf.squeeze (or convert it with .numpy() and use NumPy's squeeze):
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
im = tf.io.read_file('/home/1.png')
image = tf.image.decode_png(im)
image_gray = tf.squeeze(tf.image.rgb_to_grayscale(image), axis=-1) # <-- !
plt.figure()
plt.imshow(image_gray, cmap='gray')
plt.show()
Hope that helps!
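The shape change itself can be checked with plain NumPy, without TensorFlow or an image file. This is only a sketch: the zero-filled array stands in for the grayscale tensor, with its shape taken from the error message.

```python
import numpy as np

# Stand-in for the grayscale result: height x width x 1, as in the error message.
image_gray = np.zeros((321, 327, 1), dtype=np.uint8)

# Drop only the trailing channel axis; axis=-1 raises if that axis is not of length 1.
image_2d = np.squeeze(image_gray, axis=-1)
print(image_gray.shape, "->", image_2d.shape)
```

Passing axis=-1 rather than calling squeeze() with no arguments avoids accidentally removing a height or width axis when an image happens to be one pixel tall or wide.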
KMeans in Python: ValueError: setting an array element with a sequence
I am trying to perform kmeans clustering in Python using numpy and sklearn. I have a txt file with 45 columns and 645 rows. The first row is Y and remaining 644 rows are X. My Python code is:
import numpy as np
import matplotlib.pyplot as plt
import csv
from sklearn.cluster import KMeans

#The following code reads the first row and terminates the loop
with open('trainDataXY.txt','r') as f:
    read = csv.reader(f)
    for first_row in read:
        y = list(first_row)
        break

#The following code skips the first row and reads rest of the rows
firstLine = True
with open('trainDataXY.txt','r') as f1:
    readY = csv.reader(f1)
    for rows in readY:
        if firstLine:
            firstLine=False
            continue
        x = list(readY)

X = np.array((x,y), dtype=object)
kmean = KMeans(n_clusters=2)
kmean.fit(X)
I get an error at this line: kmean.fit(X)
The error I get is:
Traceback (most recent call last):
  File "D:\file_path\kmeans.py", line 25, in <module>
    kmean.fit(X)
  File "C:\Anaconda2\lib\site-packages\sklearn\cluster\k_means_.py", line 812, in fit
    X = self._check_fit_data(X)
  File "C:\Anaconda2\lib\site-packages\sklearn\cluster\k_means_.py", line 786, in _check_fit_data
    X = check_array(X, accept_sparse='csr', dtype=np.float64)
  File "C:\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 373, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
trainDataXY.txt
1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5
47,64,50,39,66,51,46,37,43,37,37,35,36,34,37,38,37,39,104,102,103,103,102,108,109,107,106,115,116,116,120,122,121,121,116,116,131,131,130,132,126,127,131,128,127
47,65,58,30,39,48,47,35,42,37,38,37,37,36,38,38,38,40,104,103,103,103,101,108,110,108,106,116,115,116,121,121,119,121,116,116,133,131,129,132,127,128,132,126,127
49,69,55,28,56,64,50,30,41,37,39,37,38,36,39,39,39,40,105,103,104,104,103,110,110,108,107,116,115,117,120,120,117,121,115,116,134,131,129,134,128,125,134,126,127
51,78,52,46,56,74,50,28,38,38,39,38,38,37,40,39,39,41,96,101,99,104,97,101,111,101,104,115,116,116,119,110,112,119,116,116,135,130,129,135,120,108,133,120,125
55,79,53,65,52,102,55,28,36,39,40,38,39,37,40,39,40,42,79,86,84,105,84,57,110,85,76,117,118,115,110,66,86,117,117,118,123,130,130,129,106,93,130,113,114
48,80,59,81,50,120,63,26,31,39,40,39,40,38,42,37,41,42,53,73,77,90,47,34,76,52,63,106,102,97,80,33,68,105,105,113,115,130,124,111,83,91,128,105,110
45,95,56,86,38,137,60,27,27,39,40,38,40,37,41,52,38,41,24,44,44,79,40,32,48,26,28,63,52,59,42,30,62,79,67,77,116,121,122,114,96,90,126,93,103
45,93,47,86,35,144,60,26,27,39,40,45,39,38,43,87,46,58,33,21,26,62,42,49,49,37,24,33,41,56,29,28,68,79,58,74,115,111,115,119,117,104,132,92,97
48,85,50,83,37,142,62,25,29,57,47,77,43,64,61,115,70,101,41,28,28,48,39,46,42,38,37,47,43,74,32,28,64,86,80,81,127,113,99,130,140,112,139,92,97
48,94,78,77,30,138,57,28,29,91,66,94,61,94,103,129,89,140,38,34,32,38,33,43,38,36,39,50,39,75,31,33,65,89,82,84,127,112,100,133,141,107,136,95,97
45,108,158,77,30,140,67,29,26,104,97,113,92,106,141,137,116,151,33,32,32,43,44,40,37,34,37,54,86,77,55,48,77,112,83,109,120,111,105,124,133,98,129,89,99
48,139,173,64,40,159,61,55,27,115,117,128,106,124,150,139,125,160,27,26,29,54,51,47,36,36,32,80,125,105,97,96,86,130,102,118,117,104,105,118,117,92,130,94,97
131,157,143,66,87,130,57,118,26,124,137,129,133,138,156,133,132,173,29,25,28,81,48,38,48,32,24,134,165,144,149,142,110,145,147,161,114,112,103,118,115,94,126,87,102
160,162,146,78,116,127,52,133,71,116,141,125,125,141,169,115,110,161,69,53,46,97,79,47,76,59,32,148,147,134,165,152,111,155,139,145,116,113,101,118,105,86,123,92,99
Your data matrix should not be of type object. It should be a matrix of numbers of shape n_samples x n_features. This error usually crops up when people try to convert a list of samples into a data matrix, each sample is an array or a list, and at least one of the samples does not have the same length as the others. You can check for that by evaluating np.unique(list(map(len, X))). Your case is different: np.array((x,y), dtype=object) pairs the list of remaining rows with the single label row, so the result is an object array holding two lists rather than a numeric matrix. Make sure you obtain a data matrix. The first thing to try is to replace the line X = np.array((x,y), dtype=object) with something that creates a data matrix. You should also opt for using numpy.recfromcsv to read your data. It will make everything easier to read.
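As an illustration of "something that creates a data matrix", here is a minimal sketch using numpy.loadtxt on a shortened, made-up version of the file. It assumes the first row holds one label per column and that each column is a sample, which is why the remaining rows are transposed. That layout is a guess from the question, not something the answer states.

```python
import io

import numpy as np

# Hypothetical miniature of trainDataXY.txt: first row is labels, the rest are values.
text = "1,1,2,2\n47,64,37,35\n47,65,37,38\n49,69,37,39\n"

data = np.loadtxt(io.StringIO(text), delimiter=",")

y = data[0].astype(int)  # one label per column
X = data[1:].T           # assumption: columns are samples, so transpose to n_samples x n_features

print(X.shape, y.shape, X.dtype)
```

Here X is a plain float64 matrix of shape (4, 3), the kind of input that KMeans(n_clusters=2).fit(X) expects. With the real file, np.loadtxt('trainDataXY.txt', delimiter=',') would take the place of the StringIO stand-in.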
translation/rotation through phase correlation in python
I have two pictures: the original, and a copy that I have modified so that it's translated up and left a bit and then rotated 90 degrees (so the shape of the picture is transposed as well). Now I'd like to determine how many pixels (or any distance unit) the modified picture is translated from the original, as well as the degrees of rotation relative to the original.
Phase correlation is supposed to solve this problem by first converting the coordinates to log-polar coordinates, then doing a number of things so that in the end you get a correlation matrix. From that matrix I'm supposed to find the peak, and its (x,y) position will reveal the translation and rotation somehow. This link explains it much better: Phase correlation
This is the code I have:
import scipy as sp
from scipy import ndimage
from PIL import Image
from math import *
import numpy as np

def logpolar(input,silent=False):
    # This takes a numpy array and returns it in Log-Polar coordinates.
    if not silent: print("Creating log-polar coordinates...")
    # Create a cartesian array which will be used to compute log-polar coordinates.
    coordinates = sp.mgrid[0:max(input.shape)*2,0:360]
    # Compute a normalized logarithmic gradient
    log_r = 10**(coordinates[0,:]/(input.shape[0]*2.)*log10(input.shape[1]))
    # Create a linear gradient going from 0 to 2*Pi
    angle = 2.*pi*(coordinates[1,:]/360.)
    # Using scipy's map_coordinates(), we map the input array on the log-polar
    # coordinate. Do not forget to center the coordinates!
    if not silent: print("Interpolation...")
    lpinput = ndimage.interpolation.map_coordinates(input,
        (log_r*sp.cos(angle)+input.shape[0]/2.,
         log_r*sp.sin(angle)+input.shape[1]/2.),
        order=3,mode='constant')
    # Returning log-normal...
    return lpinput

def load_image( infilename ) :
    img = Image.open( infilename )
    img.load()
    data = np.asarray( img, dtype="int32" )
    return data

def save_image( npdata, outfilename ) :
    img = Image.fromarray( np.asarray( np.clip(npdata,0,255), dtype="uint8"), "L" )
    img.save( outfilename )

image = load_image("C:/images/testing_image1.jpg")
target = load_image("C:/images/testing_otherimage.jpg")
# Conversion to log-polar coordinates
lpimage = logpolar(image)
lptarget = logpolar(target)
# Correlation through FFTs
Fcorr = np.fft.fft(lpimage)*np.fft.fft(lptarget)
correlation = np.fft.ifft(Fcorr)
The problem I have now is that this code gives the following output:
Traceback (most recent call last):
  File "./phase.py", line 44, in <module>
    lpimage = logpolar(image)
  File "./phase.py", line 24, in logpolar
    order=3,mode='constant')
  File "C:\Python27\lib\site-packages\scipy\ndimage\interpolation.py", line 295, in map_coordinates
    raise RuntimeError('invalid shape for coordinate array')
RuntimeError: invalid shape for coordinate array
As I have only a very superficial understanding of what exactly happens in the whole phase correlation process, I'm unclear on what the problem is. To check whether something is wrong with the input, I added save_image(image,"C:/testing.jpg") right after loading the image, to see if there's something wrong with the numpy array from my images. And sure enough, the images I convert to a numpy array cannot be converted back to an image. This is the error I get:
Traceback (most recent call last):
  File "./phase.py", line 41, in <module>
    save_image(image,"C:/testing.jpg")
  File "./phase.py", line 36, in save_image
    img = Image.fromarray( np.asarray( np.clip(npdata,0,255), dtype="uint8"), "L" )
  File "C:\Python27\lib\site-packages\PIL\Image.py", line 1917, in fromarray
    raise ValueError("Too many dimensions.")
ValueError: Too many dimensions.
Taking a peek at the original documentation didn't give me much inspiration on what the problem could be.
I don't think the code that converts images to numpy arrays is wrong, as I've tested the type with print type(image) and the results looked legit. Yet I can't convert it back to an image. Any help I can get would be greatly appreciated.
I think the problem is that you are passing a 3D image array (height x width x channels, e.g. R,G,B and possibly A) into your function, whereas it only accepts a 2D array. Try using a single channel to determine the transformation, e.g. image = load_image("/path/to/image")[:,:,0]
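To make the dimensionality point concrete, the sketch below builds a small random array in place of a loaded colour image and reduces it to 2D in two ways. The array size and the choice of averaging as an alternative to picking one channel are illustrative assumptions, not part of the answer.

```python
import numpy as np

# Stand-in for load_image(...) on a colour JPEG: height x width x 3 channels.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 6, 3)).astype(np.int32)

# Option 1 (from the answer): keep a single channel.
first_channel = rgb[:, :, 0]

# Option 2 (an alternative): average the channels into one grayscale plane.
mean_gray = rgb.mean(axis=2)

print(rgb.ndim, first_channel.shape, mean_gray.shape)
```

Both results are 2D arrays of shape (4, 6), which is the kind of input a function written for single-plane images expects.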
MemoryError during Fast Fourier Transform on an image using NumPy arrays under Windows
The code below computes the Fourier transform of a .tiff image on my Ubuntu 11.04 machine. On Windows XP it produces a memory error. What should I change? Thank you.
def fouriertransform(result): #function for Fourier transform computation
    for filename in glob.iglob ('*.tif'):
        imgfourier = scipy.misc.imread(filename) #read the image
        arrayfourier = numpy.array([imgfourier])#make an array
        # Take the fourier transform of the image.
        F1 = fftpack.fft2(arrayfourier)
        # Now shift so that low spatial frequencies are in the center.
        F2 = fftpack.fftshift(F1)
        # the 2D power spectrum is:
        psd2D = np.abs(F2)**2
        L = psd2D
        np.set_printoptions(threshold=3)
        #np.set_printoptions(precision = 3, threshold = None, edgeitems = None, linewidth = 3, suppress = True, nanstr = None, infstr = None, formatter = None)
        for subarray in L:
            for array in subarray:
                for array in subarray:
                    for elem in array:
                        print '%3.10f\n' % elem
The error output is:
Traceback (most recent call last):
  File "C:\Documents and Settings\HrenMudak\Мои документы\Моя музыка\fourier.py", line 27, in <module>
    F1 = fftpack.fft2(arrayfourier)
  File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 571, in fft2
    return fftn(x,shape,axes,overwrite_x)
  File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 521, in fftn
    return _raw_fftn_dispatch(x, shape, axes, overwrite_x, 1)
  File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 535, in _raw_fftn_dispatch
    return _raw_fftnd(tmp,shape,axes,direction,overwrite_x,work_function)
  File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 463, in _raw_fftnd
    x, copy_made = _fix_shape(x, s[i], waxes[i])
  File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 134, in _fix_shape
    z = zeros(s,x.dtype.char)
MemoryError
I've tried to run your code, except that I replaced the mahotas.imread with the scipy.misc.imread function, because I don't have that library, and I could not reproduce your error. Some further remarks:
- can you try to use the scipy.misc.imread function instead of the mahotas function? I suppose the issue could be there
- what is the actual exception that is thrown? (+other output?)
- what are the dimensions of your image? Gray-scale / RGB?
Printing all values for a large image could indeed take up quite some memory, so it might be better to visualize the results with e.g. matplotlib's imshow function.
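For comparison, here is a minimal sketch of the 2D power spectrum computed with numpy.fft on a small synthetic gray-scale array, keeping the input 2D rather than wrapping it in an extra list as numpy.array([imgfourier]) does in the question. The 4 x 4 array of ones is made up for illustration, and this is not claimed to be the fix for the MemoryError, only a way to check shapes and values on something small.

```python
import numpy as np

# Synthetic 2D gray-scale "image"; a real one would come from an image reader.
image = np.ones((4, 4), dtype=np.float64)

# 2D FFT, then shift so that low spatial frequencies sit in the center.
spectrum = np.fft.fftshift(np.fft.fft2(image))

# 2D power spectrum.
psd2d = np.abs(spectrum) ** 2

print(image.shape, psd2d.shape)
print(psd2d[2, 2])  # all the energy of a constant image sits in the zero-frequency bin
```

The resulting psd2d can be inspected with matplotlib's imshow, as the answer suggests, instead of printing every element.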