What does matrix[x] for different x indicate? - python

While using the MNIST dataset from Kaggle, I have noticed that all the tutorials use mnist[x] for different values of x to retrieve different pictures.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
mnist=pd.read_csv(r"(dir of dataset)").values
img=mnist[1]
img.shape=(28,28)
plt.imshow(img)
plt.show()
My question is what mnist[1] retrieves. I have also noticed that mnist[-1] works, which is why I am confused.

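Here the matrix is a 2D NumPy array (the result of .values), which you can think of as an array of arrays. With more than two dimensions, each inner "array" is itself another "matrix".
So your "matrix[x]" simply means the element at index x along the first axis, i.e. the (x+1)th row, because indexing starts at 0.
For a matrix that holds a dataset, the first dimension is usually the sample index, with one row per sample.
So your "matrix[x]" is the array of values (here, the pixels) of the (x+1)th sample. Negative indices count from the end, so mnist[-1] is the last sample.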
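The indexing described above can be tried on a small stand-in array. This is only a sketch: the array below is made-up data with 5 rows of 784 values, an assumed layout that is not taken from the actual Kaggle file.

```python
import numpy as np

# Hypothetical stand-in for the dataset: 5 samples, 784 pixel values each.
mnist = np.arange(5 * 784).reshape(5, 784)

second = mnist[1]   # row with index 1, i.e. the second sample
last = mnist[-1]    # negative indices count from the end: the last sample

print(second.shape)                                  # (784,)
print(np.array_equal(last, mnist[len(mnist) - 1]))   # True

# A flat row of 784 values can be viewed as a 28x28 image.
img = second.reshape(28, 28)
```

If the real file also has a label column, each row would hold more than 784 values and that column would need to be dropped before reshaping.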

Related

Histogram of 2D arrays and determine array which contains highest and lowest values

I have a 2D array of shape 5 and 10. So 5 different arrays with 10 values. I am hoping to get a histogram and see which array is on the lower end versus higher end of a histogram. Hope that makes sense. I am attaching an image of an example of what I mean (labeled example).
Looking for one histogram but the histogram is organized by the distribution of the highest and lowest of each array.
I'm having trouble doing this with Python. I tried a few ways of doing this:
# setting up 2d array
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
np.random.seed(1234)
array_2d = np.random.random((5,20))
I thought you could maybe just plot all the histograms of each array (5 of them) like this:
for i in range(5):
    plt.hist(signal.detrend(array_2d[i,:],type='constant'),bins=20)
plt.show()
And then looking to see which array's histogram is furthest to the right or left, but not sure if that makes too much sense...
Then also considered using .ravel to make the 2D array into a 1D array which makes a nice histogram. But all the values within each array are being shifted around so it's difficult to tell which array is on the lower or higher end of the histogram:
plt.hist(signal.detrend(array_2d.ravel(),type='constant'),bins=20)
plt.xticks(np.linspace(-1,1,10));
How might I get a histogram of the 5 arrays (shape 5, 10) and get the range of the arrays with the lowest values versus array with highest values?
Also please let me know if this is unclear or not possible at all too haha. Thanks!
Maybe you could use a kdeplot? This would replace each input value with a small Gaussian curve and sum them.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(1234)
array_2d = np.random.random((5, 20))
sns.kdeplot(data=pd.DataFrame(array_2d.T, columns=range(1, 6)), palette='Set1', multiple='layer')
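Besides plotting, the question of which array sits at the low or high end can be approached numerically. The sketch below ranks the rows by their mean and reports each row's range. Using the mean as the ranking criterion is an assumption, not something stated in the question; a median or another statistic may suit better.

```python
import numpy as np

np.random.seed(1234)
array_2d = np.random.random((5, 20))

# Per-row summary statistics (axis=1 reduces over the values of each array).
row_min = array_2d.min(axis=1)
row_max = array_2d.max(axis=1)
row_mean = array_2d.mean(axis=1)

# Rows with the lowest and highest mean (assumed ranking criterion).
lowest = int(np.argmin(row_mean))
highest = int(np.argmax(row_mean))

print("lowest row:", lowest, "range:", (row_min[lowest], row_max[lowest]))
print("highest row:", highest, "range:", (row_min[highest], row_max[highest]))
```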

How to apply scipy.savgol_filter to pandas Series?

In code:
import pandas as pd
import numpy as np
from scipy.signal import savgol_filter
s = pd.Series(np.random.rand(20), index=np.arange(20))
def smooth(x):
    return savgol_filter(x, 7, 3)
s.apply(smooth)
I got the error "If mode is 'interp', window_length must be less than or equal to the size of x."
The reason, I think, is that Series.apply passes the values of the Series to smooth one at a time, so it receives only 1 value each time, which is smaller than window_length=7.
If I use pd.DataFrame.apply(), there is a parameter axis that can be set to get the correct result.
I also want to avoid using smooth(s) which returns a NumPy array instead of pandas.Series.
Is there any way to .apply() the Savitzky-Golay filter on a pandas.Series directly?
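One possible approach, sketched here rather than taken from an answer on the page, is to filter the whole Series at once and wrap the result in a new Series that reuses the original index. Series.pipe passes the entire Series to the function, unlike Series.apply, which works element by element.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

s = pd.Series(np.random.rand(20), index=np.arange(20))

def smooth(x):
    # x is the whole Series here, not a single element.
    filtered = savgol_filter(x.to_numpy(), 7, 3)
    # Wrap the NumPy result so the original index is preserved.
    return pd.Series(filtered, index=x.index)

smoothed = s.pipe(smooth)
print(type(smoothed).__name__)
```

Calling smooth(s) directly gives the same result; pipe only keeps the method-chaining style.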

Python: Convert 2d point cloud to grayscale image

I have an array of variable length filled with 2D coordinate points (coming from a point cloud) which are distributed around (0,0), and I want to convert them into a 2D matrix (a grayscale image).
# have
array = [(1.0,1.1),(0.0,0.0),...]
# want
matrix = [[0,100,...],[255,255,...],...]
How would I achieve this using Python and NumPy?
Looks like matplotlib.pyplot.hist2d is what you are looking for.
It basically bins your data into 2-dimensional bins (with a size of your choice).
The matplotlib documentation has the details, and a working example is given below.
import numpy as np
import matplotlib.pyplot as plt
data = [np.random.randn(1000), np.random.randn(1000)]
plt.scatter(data[0], data[1])
Then you can call hist2d on your data, for instance like this:
plt.hist2d(data[0], data[1], bins=20)
Note that the arguments of hist2d are two 1-dimensional arrays, so you will have to do a bit of reshaping of your data before feeding it to hist2d.
Quick solution using only NumPy, without the need for matplotlib and therefore without plots:
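import numpy as np
# given a list or array of (x, y) points "array" and a desired image shape "[x,y]"
array = np.asarray(array)
# histogram2d returns the counts together with the bin edges
matrix, xedges, yedges = np.histogram2d(array[:,0], array[:,1], bins=[x,y])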
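To go from the raw counts to an actual grayscale image, the counts can be scaled to the 0 to 255 range. The following is a minimal sketch under stated assumptions: the random points, the 64x48 image size, and the linear scaling by the maximum count are all illustrative choices, not part of the original answers.

```python
import numpy as np

# Hypothetical point cloud: 1000 (x, y) points distributed around (0, 0).
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 2))

width, height = 64, 48  # assumed image size in pixels

# Count how many points fall into each (x, y) bin.
counts, xedges, yedges = np.histogram2d(
    points[:, 0], points[:, 1], bins=[width, height]
)

# histogram2d puts x along the first axis; transpose so rows correspond to y.
# Scale linearly so the fullest bin maps to 255.
image = (255 * counts.T / counts.max()).astype(np.uint8)

print(image.shape, image.dtype)
```

With this orientation, row 0 corresponds to the smallest y values, so the image may need to be flipped vertically depending on the display convention.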

Plot 3rd axis of a 3D numpy array

I have a 3D numpy array that is a stack of 2D (m,n) images at certain timestamps, t. So my array is of shape (t, m, n). I want to plot the value of one of the pixels as a function of time.
e.g.:
import numpy as np
import matplotlib.pyplot as plt
data_cube = []
for i in range(10):
    a = np.random.rand(100,100)
    data_cube.append(a)
So my (t, m, n) data now has shape (10,100,100). Say I wanted a 1D plot of the value at index [12][12] at each of the 10 steps, I would do:
plt.plot(data_cube[:][12][12])
plt.show()
But I'm getting index out of range errors. I thought I might have my indices mixed up, but every plot I generate seems to be in the 'wrong' axis, i.e. across one of the 2D arrays, but instead I want it 'through' the vertical stack. Thanks in advance!
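Here is the solution: since you are already using NumPy, convert your final list to an array and just use slicing. The problem in your case was two-fold:
First: your final data_cube was not an array but a list of arrays. For a list, you would have to iterate over the values.
Second: the slicing was incorrect. data_cube[:] is just a copy of the whole list, so data_cube[:][12] is the same as data_cube[12], which is out of range for a list of 10 elements.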
import numpy as np
import matplotlib.pyplot as plt
data_cube = []
for i in range(10):
    a = np.random.rand(100,100)
    data_cube.append(a)
data_cube = np.array(data_cube) # Added this step
plt.plot(data_cube[:,12,12]) # Modified the slicing
A less verbose version that avoids iteration:
data_cube = np.random.rand(10, 100,100)
plt.plot(data_cube[:,12,12])
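The difference between the two kinds of indexing can be checked without plotting. This is a small sketch; the names frames, cube and pixel_over_time are chosen here for illustration.

```python
import numpy as np

# A plain Python list of ten 100x100 arrays, as in the question.
frames = [np.random.rand(100, 100) for _ in range(10)]

# frames[:] is a copy of the whole list, so frames[:][12] is just frames[12],
# which does not exist in a list of 10 elements.
try:
    frames[:][12][12]
except IndexError as err:
    print("list indexing failed:", err)

# After conversion to an array, one slice can cut through the stack.
cube = np.array(frames)
pixel_over_time = cube[:, 12, 12]
print(pixel_over_time.shape)  # (10,)
```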

How to compute mean of an 4D array individually?

I have an array img of shape 64x128x512x3 that is concatenated from three images of shape 64x128x512. I want to compute the mean of each image individually. Hence, I wrote the code below:
import numpy as np
img_means = np.mean(img, (0, 1, 2)))
Is it correct? My expected result is that img_means[0,:,:,:] is the mean of the first image, img_means[1,:,:,:] the mean of the second image, and img_means[2,:,:,:] the mean of the third image.
Yes, it is correct, but note that img_means is just an array of three numbers (each one is the mean of the corresponding image), so you index it as img_means[0], not img_means[0,:,:,:].
Your code as posted does not run in Python 3.x because of the extra closing parenthesis.
Do it like this:
First generate the data:
import numpy as np
img=np.arange(64*128*512*3).reshape(64,128,512,3)
And this is what you want:
img_means=[img[:,:,:,i].mean() for i in range(img.shape[3]) ]
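The two approaches from the answers can be compared directly. The sketch below uses a much smaller random array (shape 4x8x16x3, chosen here only to keep it fast) and checks that reducing over the first three axes gives the same three numbers as the per-image loop.

```python
import numpy as np

# Small stand-in with the same layout: three "images" stacked on the last axis.
img = np.random.rand(4, 8, 16, 3)

# Reduce over the first three axes in one call.
means_axis = img.mean(axis=(0, 1, 2))

# Compute the mean of each image separately.
means_loop = np.array([img[..., i].mean() for i in range(img.shape[-1])])

print(means_axis.shape)                     # (3,)
print(np.allclose(means_axis, means_loop))  # True
```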
