How to read and display MNIST dataset? - python

The code below opens the mnist dataset as a csv
import numpy as np
import csv
import matplotlib.pyplot as plt
with open('C:/Z_Uni/Individual_Project/Python_Projects/NeuralNet/MNIST_Dataset/mnist_train.csv/mnist_train.csv', 'r') as csv_file:
    for data in csv.reader(csv_file):
        # The first column is the label
        label = data[0]
        # The rest of columns are pixels
        pixels = data[1:]
        # Make those columns into a array of 8-bits pixels
        # This array will be of 1D with length 784
        # The pixel intensity values are integers from 0 to 255
        pixels = np.array(pixels, dtype='uint8')
        print(pixels.shape)
        # Reshape the array into 28 x 28 array (2-dimensional array)
        pixels = pixels.reshape((28, 28))
        print(pixels.shape)
        # Plot
        plt.title('Label is {label}'.format(label=label))
        plt.imshow(pixels, cmap='gray')
        plt.show()
        break # This stops the loop, I just want to see one
I got the code above from someone and cannot get it to display the mnist digits.
I get the error:
Traceback (most recent call last):
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\Test_View_Mnist.py", line 16, in <module>
    pixels = np.array(pixels, dtype='uint8')
ValueError: invalid literal for int() with base 10: '1x1'
When I remove dtype='uint8', I get the error:
Traceback (most recent call last):
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\Test_View_Mnist.py", line 24, in <module>
    plt.imshow(pixels, cmap='gray')
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\_api\deprecation.py", line 456, in wrapper
    return func(*args, **kwargs)
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\pyplot.py", line 2640, in imshow
    _ret = gca().imshow(
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\_api\deprecation.py", line 456, in wrapper
    return func(*args, **kwargs)
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\__init__.py", line 1412, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\axes\_axes.py", line 5488, in imshow
    im.set_data(X)
  File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet\source\lib\site-packages\matplotlib\image.py", line 706, in set_data
    raise TypeError("Image data of dtype {} cannot be converted to "
TypeError: Image data of dtype <U5 cannot be converted to float
Process finished with exit code 1
Could someone explain why this error is happening and how to fix it?
Thanks.

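There are two problems here. (1) You need to skip the first row, because it is a header row containing the column names (1x1, 1x2, etc.) rather than pixel values; that header is what triggers the '1x1' error. (2) The remaining values are read as strings, so they must be converted to an integer dtype before plotting; int64 is used here. The code below handles both. next(csvreader) skips the first row.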
import numpy as np
import csv
import matplotlib.pyplot as plt
with open('./mnist_test.csv', 'r') as csv_file:
    csvreader = csv.reader(csv_file)
    # Skip the header row (label, 1x1, 1x2, ...)
    next(csvreader)
    for data in csvreader:
        # The first column is the label
        label = data[0]
        # The rest of columns are pixels
        pixels = data[1:]
        # Convert those columns from strings into an array of 64-bit integers
        # This array will be of 1D with length 784
        # The pixel intensity values are integers from 0 to 255
        pixels = np.array(pixels, dtype='int64')
        print(pixels.shape)
        # Reshape the array into 28 x 28 array (2-dimensional array)
        pixels = pixels.reshape((28, 28))
        print(pixels.shape)
        # Plot (one figure per row; close the window to see the next digit)
        plt.title('Label is {label}'.format(label=label))
        plt.imshow(pixels, cmap='gray')
        plt.show()
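As a minimal sketch of the same idea that can be run without the real file: the snippet below builds a one-row, in-memory CSV with a header and parses it. The helper name read_mnist_rows and the synthetic data are assumptions made for illustration, not part of the original code. It also converts each value with int() first, after which uint8 is sufficient for values from 0 to 255.

```python
import csv
import io

import numpy as np


def read_mnist_rows(csv_file, has_header=True):
    """Yield (label, 28x28 uint8 array) pairs from an MNIST-style CSV file object.

    Hypothetical helper: assumes one label column followed by 784 pixel columns.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader)  # skip the column names (label, 1x1, 1x2, ...)
    for row in reader:
        label = int(row[0])
        # Convert the strings to Python ints first, then store them as 8-bit pixels
        pixels = np.array([int(value) for value in row[1:]], dtype=np.uint8)
        yield label, pixels.reshape(28, 28)


# Synthetic stand-in for the real file (assumption: same column layout)
header = ["label"] + [f"{r}x{c}" for r in range(1, 29) for c in range(1, 29)]
row = ["7"] + [str(i % 256) for i in range(784)]
fake_csv = io.StringIO(",".join(header) + "\n" + ",".join(row) + "\n")

label, pixels = next(read_mnist_rows(fake_csv))
print(label, pixels.shape, pixels.dtype)
```

With the real file, the same generator could be fed the object returned by open(...), and each pixels array passed to plt.imshow(pixels, cmap='gray').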

Related

Can't get correct input for DBSCAN clustering

I have a node2vec embedding stored as a .csv file, values are a square symmetric matrix. I have two versions of this, one with node names in the first column and another with node names in the first row. I would like to cluster this data with DBSCAN, but I can't seem to figure out how to get the input right. I tried this:
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn import metrics
input_file = "node2vec-labels-on-columns.emb"
# for tab delimited use:
df = pd.read_csv(input_file, header = 0, delimiter = "\t")
# put the original column names in a python list
original_headers = list(df.columns.values)
emb = df.as_matrix()
db = DBSCAN(eps=0.3, min_samples=10).fit(emb)
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
print("Estimated number of clusters: %d" % n_clusters_)
print("Estimated number of noise points: %d" % n_noise_)
This leads to an error:
dbscan.py:14: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
emb = df.as_matrix()
Traceback (most recent call last):
File "dbscan.py", line 15, in <module>
db = DBSCAN(eps=0.3, min_samples=10).fit(emb)
File "C:\Python36\lib\site-packages\sklearn\cluster\_dbscan.py", line 312, in fit
X = self._validate_data(X, accept_sparse='csr')
File "C:\Python36\lib\site-packages\sklearn\base.py", line 420, in _validate_data
X = check_array(X, **check_params)
File "C:\Python36\lib\site-packages\sklearn\utils\validation.py", line 73, in inner_f
return f(**kwargs)
File "C:\Python36\lib\site-packages\sklearn\utils\validation.py", line 646, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "C:\Python36\lib\site-packages\sklearn\utils\validation.py", line 100, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I've tried other input methods that lead to the same error. All the tutorials I can find use datasets imported from sklearn, so they are of no help in figuring out how to read from a file. Can anyone point me in the right direction?
The error does not come from the fact that you are reading the dataset from a file, but from the content of the dataset.
DBSCAN is meant to be used on numerical data. As stated in the error, it does not support NaNs.
If you want to cluster strings or labels, you should look for a different model.
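One possible way to check this is sketched below. It assumes the variant of the file with node names in the first column and no header row, and it uses a tiny in-memory, tab-separated sample in place of the real embedding file; both are assumptions for illustration. Reading the names into the index keeps them out of the numeric matrix, and the NaN count shows whether anything non-numeric or missing slipped through.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical 3-node sample standing in for the real embedding file
tsv_text = "a\t0.0\t0.2\t0.9\nb\t0.2\t0.0\t0.8\nc\t0.9\t0.8\t0.0\n"

# index_col=0 moves the node names into the index instead of the data
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=None, index_col=0)
node_names = df.index.tolist()

# Purely numeric matrix; this raises if a column still contains text
emb = df.to_numpy(dtype=float)

# Inspect the matrix before handing it to a clustering model
nan_count = int(np.isnan(emb).sum())
is_clean = bool(np.isfinite(emb).all())
print(emb.shape, nan_count, is_clean)
```

If is_clean is True, emb can be passed to DBSCAN(...).fit(emb) as in the question. If nan_count is non-zero, df.isna().sum() shows which columns are affected; a mismatch between the header row and the data rows is one common cause.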

I want to use matplotlib to show a grayscale image converted by TensorFlow 2.0

I am a newcomer to TensorFlow 2.0. After loading an image, I want to plot the grayscale version produced by TensorFlow, but unfortunately an error came up.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
im = tf.io.read_file('/home/1.png')
image = tf.image.decode_png(im)
image_gray = tf.image.rgb_to_grayscale(image)
plt.figure()
plt.imshow(image_gray)
Then the error pops:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/pyplot.py", line 2677, in imshow
None else {}), **kwargs)
File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/__init__.py", line 1599, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/cbook/deprecation.py", line 369, in wrapper
return func(*args, **kwargs)
File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/cbook/deprecation.py", line 369, in wrapper
return func(*args, **kwargs)
File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 5679, in imshow
im.set_data(X)
File "/home/zhongl/miniconda3/envs/tf2/lib/python3.7/site-packages/matplotlib/image.py", line 690, in set_data
.format(self._A.shape))
TypeError: Invalid shape (321, 327, 1) for image data
But the original image, before the grayscale conversion, displays without any problem.
plt.figure()
plt.imshow(image)
plt.show()
The important part of your error message is:
TypeError: Invalid shape (321, 327, 1) for image data
According to its documentation, TensorFlow's rgb_to_grayscale keeps a trailing channel dimension:
The size of the last dimension of the output is 1, containing the Grayscale value of the pixels.
Matplotlib, however, does not accept that shape for grayscale images; it expects a shape like (321, 327), i.e. without the trailing single-element dimension.
rgb_to_grayscale returns a TensorFlow tensor, which has no squeeze method of its own. Convert it to a NumPy array with .numpy() and then use NumPy's squeeze method to get rid of the additional dimension (tf.squeeze would work as well):
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
im = tf.io.read_file('/home/1.png')
image = tf.image.decode_png(im)
image_gray = tf.image.rgb_to_grayscale(image).numpy().squeeze() # <-- !
plt.figure()
plt.imshow(image_gray, cmap='gray')
plt.show()
Hope that helps!
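The shape change itself can be checked with NumPy alone. This is a small sketch in which a zero-filled array stands in for the converted image (an assumption, so that neither TensorFlow nor the PNG file is needed):

```python
import numpy as np

# Stand-in for the grayscale result: height x width x 1 channel
gray_3d = np.zeros((321, 327, 1), dtype=np.uint8)

# Drop only the trailing channel axis; axis=-1 fails loudly if its size is not 1
gray_2d = np.squeeze(gray_3d, axis=-1)

print(gray_3d.shape, "->", gray_2d.shape)
```

Passing axis=-1 is a deliberate choice here: a bare squeeze() would also remove a height or width of 1, which is rarely what is wanted for images.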

KMeans in Python: ValueError: setting an array element with a sequence

I am trying to perform kmeans clustering in Python using numpy and sklearn.
I have a txt file with 45 columns and 645 rows. The first row is Y and remaining 644 rows are X.
My Python code is:
import numpy as np
import matplotlib.pyplot as plt
import csv
from sklearn.cluster import KMeans
#The following code reads the first row and terminates the loop
with open('trainDataXY.txt','r') as f:
    read = csv.reader(f)
    for first_row in read:
        y = list(first_row)
        break
#The following code skips the first row and reads rest of the rows
firstLine = True
with open('trainDataXY.txt','r') as f1:
    readY = csv.reader(f1)
    for rows in readY:
        if firstLine:
            firstLine=False
            continue
        x = list(readY)
X = np.array((x,y), dtype=object)
kmean = KMeans(n_clusters=2)
kmean.fit(X)
I get an error at this line: kmean.fit(X)
The error I get is:
Traceback (most recent call last):
  File "D:\file_path\kmeans.py", line 25, in <module>
    kmean.fit(X)
  File "C:\Anaconda2\lib\site-packages\sklearn\cluster\k_means_.py", line 812, in fit
    X = self._check_fit_data(X)
  File "C:\Anaconda2\lib\site-packages\sklearn\cluster\k_means_.py", line 786, in _check_fit_data
    X = check_array(X, accept_sparse='csr', dtype=np.float64)
  File "C:\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 373, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
trainDataXY.txt
1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5
47,64,50,39,66,51,46,37,43,37,37,35,36,34,37,38,37,39,104,102,103,103,102,108,109,107,106,115,116,116,120,122,121,121,116,116,131,131,130,132,126,127,131,128,127
47,65,58,30,39,48,47,35,42,37,38,37,37,36,38,38,38,40,104,103,103,103,101,108,110,108,106,116,115,116,121,121,119,121,116,116,133,131,129,132,127,128,132,126,127
49,69,55,28,56,64,50,30,41,37,39,37,38,36,39,39,39,40,105,103,104,104,103,110,110,108,107,116,115,117,120,120,117,121,115,116,134,131,129,134,128,125,134,126,127
51,78,52,46,56,74,50,28,38,38,39,38,38,37,40,39,39,41,96,101,99,104,97,101,111,101,104,115,116,116,119,110,112,119,116,116,135,130,129,135,120,108,133,120,125
55,79,53,65,52,102,55,28,36,39,40,38,39,37,40,39,40,42,79,86,84,105,84,57,110,85,76,117,118,115,110,66,86,117,117,118,123,130,130,129,106,93,130,113,114
48,80,59,81,50,120,63,26,31,39,40,39,40,38,42,37,41,42,53,73,77,90,47,34,76,52,63,106,102,97,80,33,68,105,105,113,115,130,124,111,83,91,128,105,110
45,95,56,86,38,137,60,27,27,39,40,38,40,37,41,52,38,41,24,44,44,79,40,32,48,26,28,63,52,59,42,30,62,79,67,77,116,121,122,114,96,90,126,93,103
45,93,47,86,35,144,60,26,27,39,40,45,39,38,43,87,46,58,33,21,26,62,42,49,49,37,24,33,41,56,29,28,68,79,58,74,115,111,115,119,117,104,132,92,97
48,85,50,83,37,142,62,25,29,57,47,77,43,64,61,115,70,101,41,28,28,48,39,46,42,38,37,47,43,74,32,28,64,86,80,81,127,113,99,130,140,112,139,92,97
48,94,78,77,30,138,57,28,29,91,66,94,61,94,103,129,89,140,38,34,32,38,33,43,38,36,39,50,39,75,31,33,65,89,82,84,127,112,100,133,141,107,136,95,97
45,108,158,77,30,140,67,29,26,104,97,113,92,106,141,137,116,151,33,32,32,43,44,40,37,34,37,54,86,77,55,48,77,112,83,109,120,111,105,124,133,98,129,89,99
48,139,173,64,40,159,61,55,27,115,117,128,106,124,150,139,125,160,27,26,29,54,51,47,36,36,32,80,125,105,97,96,86,130,102,118,117,104,105,118,117,92,130,94,97
131,157,143,66,87,130,57,118,26,124,137,129,133,138,156,133,132,173,29,25,28,81,48,38,48,32,24,134,165,144,149,142,110,145,147,161,114,112,103,118,115,94,126,87,102
160,162,146,78,116,127,52,133,71,116,141,125,125,141,169,115,110,161,69,53,46,97,79,47,76,59,32,148,147,134,165,152,111,155,139,145,116,113,101,118,105,86,123,92,99
Your data matrix should not be of type object. It should be a matrix of numbers of shape n_samples x n_features.
This error usually crops up when people try to convert a list of samples into a data matrix, and each sample is an array or a list, and at least one of the samples does not have the same length as the others. This can be figured out by evaluating np.unique(list(map(len, X))).
In your case it is different. Make sure you obtain a data matrix. The first thing to try is to replace the line X = np.array((x,y), dtype=object) with something that creates a data matrix.
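You could also use numpy.recfromcsv to read your data (it is deprecated in recent NumPy releases, where numpy.genfromtxt with delimiter=',' is the closest replacement). It will make everything easier to read.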
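A minimal sketch of building such a data matrix with numpy.loadtxt is given below. It uses a shortened, four-column excerpt in the style of trainDataXY.txt, and it assumes that each column is one sample whose label sits in the first row; that reading of the file layout is an assumption, so drop the transpose if the rows are the samples.

```python
import io

import numpy as np

# Shortened excerpt in the style of trainDataXY.txt (first row = labels)
sample_text = """1,1,2,2
47,64,37,37
47,65,37,38
49,69,37,39
"""

# loadtxt parses every field as a number, so no object arrays are involved
data = np.loadtxt(io.StringIO(sample_text), delimiter=",")

y = data[0].astype(int)  # labels from the first row
X = data[1:].T           # assumption: columns are samples -> (n_samples, n_features)

print(X.shape, y)
```

An X built this way is a plain float array of shape n_samples x n_features, which is the form that KMeans(n_clusters=2).fit(X) expects.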

translation/rotation through phase correlation in python

I have two pictures, one that was the original and another one that I have modified so that it's translated up and left a bit and then rotated 90 degrees (so the shape of the picture is transposed as well).
Now I'd like to determine how many pixels (or any distance unit) the modified picture is translated from the original, as well as the degrees of rotation relative to the original. Phase correlation is supposed to solve this problem by first converting the coordinates to logpolar coordinates, then doing a number of things so that in the end you get a correlation matrix. From that matrix I'm supposed to find the peak and the (x,y) combination will reveal the translation and rotation somehow. This link explains it much better:
Phase correlation
This is the code I have:
import scipy as sp
from scipy import ndimage
from PIL import Image
from math import *
import numpy as np
def logpolar(input,silent=False):
    # This takes a numpy array and returns it in Log-Polar coordinates.
    if not silent: print("Creating log-polar coordinates...")
    # Create a cartesian array which will be used to compute log-polar coordinates.
    coordinates = sp.mgrid[0:max(input.shape)*2,0:360]
    # Compute a normalized logarithmic gradient
    log_r = 10**(coordinates[0,:]/(input.shape[0]*2.)*log10(input.shape[1]))
    # Create a linear gradient going from 0 to 2*Pi
    angle = 2.*pi*(coordinates[1,:]/360.)
    # Using scipy's map_coordinates(), we map the input array on the log-polar
    # coordinate. Do not forget to center the coordinates!
    if not silent: print("Interpolation...")
    lpinput = ndimage.interpolation.map_coordinates(input,
                                        (log_r*sp.cos(angle)+input.shape[0]/2.,
                                         log_r*sp.sin(angle)+input.shape[1]/2.),
                                        order=3,mode='constant')
    # Returning log-normal...
    return lpinput
def load_image( infilename ) :
    img = Image.open( infilename )
    img.load()
    data = np.asarray( img, dtype="int32" )
    return data
def save_image( npdata, outfilename ) :
    img = Image.fromarray( np.asarray( np.clip(npdata,0,255), dtype="uint8"), "L" )
    img.save( outfilename )
image = load_image("C:/images/testing_image1.jpg")
target = load_image("C:/images/testing_otherimage.jpg")
# Conversion to log-polar coordinates
lpimage = logpolar(image)
lptarget = logpolar(target)
# Correlation through FFTs
Fcorr = np.fft.fft(lpimage)*np.fft.fft(lptarget)
correlation = np.fft.ifft(Fcorr)
The problem I have now is that this code will give as output:
Traceback (most recent call last):
File "./phase.py", line 44, in <module>
lpimage = logpolar(image)
File "./phase.py", line 24, in logpolar
order=3,mode='constant')
File "C:\Python27\lib\site-packages\scipy\ndimage\interpolation.py", line 295, in map_coordinates
raise RuntimeError('invalid shape for coordinate array')
RuntimeError: invalid shape for coordinate array
As I just have a very superficial understanding of what exactly is happening in the whole phase correlation process, I'm unclear on what the problem is about. I have tried to see if something's wrong with the input so I added save_image(image,"C:/testing.jpg") right after loading the image to see if there's something wrong with the numpy array from my images. And sure enough, the images I convert to np array, cannot be converted back to an image. This is the error I get:
Traceback (most recent call last):
File "./phase.py", line 41, in <module>
save_image(image,"C:/testing.jpg")
File "./phase.py", line 36, in save_image
img = Image.fromarray( np.asarray( np.clip(npdata,0,255), dtype="uint8"), "L" )
File "C:\Python27\lib\site-packages\PIL\Image.py", line 1917, in fromarray
raise ValueError("Too many dimensions.")
ValueError: Too many dimensions.
Taking a peek at the original documentation didn't give me much inspiration on what the problem could be. I don't think the code that converts images to numpy arrays is wrong, as I've checked the type with print type(image) and the results looked legitimate. Yet I can't convert it back to an image. Any help I can get would be greatly appreciated.
I think the problem is that you are passing a 3D image array (height x width x channels, e.g. R, G, B and possibly A) into your function, whereas it only accepts a 2D array. Try using a single channel to determine the transformation, e.g.
image = load_image("/path/to/image")[:,:,0]
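Once the input is a 2D single-channel array, the translation part of phase correlation can be sketched with NumPy's FFT routines alone. The function name phase_correlation_shift and the random test image are assumptions for illustration; the sketch covers only integer, wrap-around translation and not the log-polar step used for rotation.

```python
import numpy as np


def phase_correlation_shift(reference, moved):
    """Estimate the integer (row, col) shift that maps `reference` onto `moved`.

    Hypothetical helper: both inputs must be 2D arrays of the same shape.
    """
    if reference.ndim != 2 or moved.ndim != 2:
        raise ValueError("expected 2D single-channel arrays")
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    # Normalised cross-power spectrum: keeps only the phase difference
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around)
    shift = [p if p <= s // 2 else p - s for p, s in zip(peak, correlation.shape)]
    return tuple(int(v) for v in shift)


# Synthetic check: shift a random image 5 rows down and 3 columns left
rng = np.random.default_rng(0)
image = rng.random((64, 64))
moved = np.roll(image, shift=(5, -3), axis=(0, 1))

estimated_shift = phase_correlation_shift(image, moved)
print(estimated_shift)
```

Note that this uses fft2 and the complex conjugate of one spectrum, whereas the question's code multiplies two 1D fft results directly; that difference is worth checking once the shape error is resolved.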

MemoryError during Fast Fourier Transform on an image using NumPy arrays under Windows

The code below computes the Fourier transform of a .tiff image without problems on my Ubuntu 11.04 machine. On Windows XP it produces a MemoryError. What should I change? Thank you.
def fouriertransform(result): #function for Fourier transform computation
    for filename in glob.iglob ('*.tif'):
        imgfourier = scipy.misc.imread(filename) #read the image
        arrayfourier = numpy.array([imgfourier])#make an array
        # Take the fourier transform of the image.
        F1 = fftpack.fft2(arrayfourier)
        # Now shift so that low spatial frequencies are in the center.
        F2 = fftpack.fftshift(F1)
        # the 2D power spectrum is:
        psd2D = np.abs(F2)**2
        L = psd2D
        np.set_printoptions(threshold=3)
        #np.set_printoptions(precision = 3, threshold = None, edgeitems = None, linewidth = 3, suppress = True, nanstr = None, infstr = None, formatter = None)
        for subarray in L:
            for array in subarray:
                for array in subarray:
                    for elem in array:
                        print '%3.10f\n' % elem
The error output is:
Traceback (most recent call last):
File "C:\Documents and Settings\HrenMudak\Мои документы\Моя музыка\fourier.py", line 27, in <module>
F1 = fftpack.fft2(arrayfourier)
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 571, in fft2
return fftn(x,shape,axes,overwrite_x)
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 521, in fftn
return _raw_fftn_dispatch(x, shape, axes, overwrite_x, 1)
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 535, in _raw_fftn_dispatch
return _raw_fftnd(tmp,shape,axes,direction,overwrite_x,work_function)
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 463, in _raw_fftnd
x, copy_made = _fix_shape(x, s[i], waxes[i])
File "C:\Python27\lib\site-packages\scipy\fftpack\basic.py", line 134, in _fix_shape
z = zeros(s,x.dtype.char)
MemoryError
I tried to run your code, except that I replaced mahotas.imread with scipy.misc.imread because I don't have that library, and I could not reproduce your error.
Some further remarks:
- Can you try scipy.misc.imread instead of the mahotas function? I suspect the issue could be there.
- What is the actual exception that is thrown (plus any other output)?
- What are the dimensions of your image? Is it grayscale or RGB? Printing all values for a large image could indeed take up quite some memory, so it might be better to visualize the results with e.g. matplotlib's imshow function.
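To make the last remark concrete, here is a small sketch that checks the array's dimensions, estimates the size of the complex FFT output, and computes the power spectrum of a 2D array with NumPy. The helper name power_spectrum_2d, the channel averaging, and the 8 x 8 test array are assumptions for illustration; the real image would come from the image reader instead.

```python
import numpy as np


def power_spectrum_2d(image):
    """Return the centred 2D power spectrum of a grayscale or colour image array."""
    image = np.asarray(image, dtype=np.float64)
    if image.ndim == 3:
        # Assumption: collapse colour channels by averaging them
        image = image.mean(axis=2)
    if image.ndim != 2:
        raise ValueError("expected a 2D or 3D image array")
    # Shift so that low spatial frequencies end up in the centre
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.abs(spectrum) ** 2


demo = np.ones((8, 8))

# Rough size of the complex128 FFT output: 16 bytes per element
estimated_bytes = demo.size * np.dtype(np.complex128).itemsize

psd = power_spectrum_2d(demo)
print(demo.shape, estimated_bytes, psd.max())
```

Printing demo.shape and estimated_bytes for the real image before the transform gives a quick indication of whether the allocation is plausible on a 32-bit Windows XP system, and plt.imshow(np.log1p(psd)) is usually more informative than printing every element.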
