I am trying to understand what this code does. I am going through some examples about NumPy and plotting and I can't figure out what u and v are. I know u is an array of two arrays, each of size 10000. What does v = u.max(axis=0) do? Is the max function being invoked part of the standard Python library? When I plot the histogram I get a pdf defined by 2x rather than the flat pdf of a uniform distribution.
import numpy as np
import numpy.random as rand
import matplotlib.pyplot as plt
np.random.seed(123)
u=rand.uniform(0,1,[2,10000])
v=u.max(axis=0)
plt.figure()
plt.hist(v,100,normed=1,color='blue')
plt.ylim([0,2])
plt.show()
u.max(), or equivalently np.max(u), will give you the maximum value in the whole array, i.e. a single value. It's the NumPy method here, not the built-in max from the standard library. You often want to find the maximum value along a particular axis/dimension, and that's what is happening here.
u has shape (2, 10000), and u.max(axis=0) gives you the max along axis 0, returning an array of shape (10000,). If you did u.max(axis=1) you would get an array of shape (2,).
Simple illustration/example:
>>> a = np.array([[1,2],[3,4]])
>>> a
array([[1, 2],
[3, 4]])
>>> a.max(axis=0)
array([3, 4])
>>> a.max(axis=1)
array([2, 4])
>>> a.max()
4
In the first three lines you import the modules (libraries relied upon in the rest of the code): numpy, a numerical library; numpy.random, its submodule for generating random numbers; and matplotlib.pyplot, which provides plotting functions.
The rest is described line by line:
np.random.seed(123)
A computer does not really generate random numbers; it picks numbers from a long, deterministic sequence (for a more thorough explanation see http://en.wikipedia.org/wiki/Random_number_generation). If you want to reproduce a run with the same random numbers, the computer needs to know where in this sequence to start picking. That is what this line of code does: anybody who runs the same piece of code with the same seed ends up with the same 'random' numbers.
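For example, a minimal sketch showing that reseeding makes the draws repeat exactly:

import numpy as np

np.random.seed(123)
a = np.random.uniform(0, 1, 3)
np.random.seed(123)          # restart at the same point in the sequence
b = np.random.uniform(0, 1, 3)
print(np.array_equal(a, b))  # True: same seed, same 'random' numbers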
u=rand.uniform(0,1,[2,10000])
This generates a 2x10000 array of random numbers uniformly distributed between 0 and 1, so every value in that interval is equally likely (more information: http://en.wikipedia.org/wiki/Uniform_distribution_(continuous) ). You are creating two arrays within an array, which you can check with len(u) and len(u[0]), or simply u.shape.
v=u.max(axis=0)
Typing u.max? in IPython refers you to the docs. It selects a maximum, and the axis argument determines along which dimension the maximum is taken. Try the following:
a = np.arange(4).reshape((2,2))
np.amax(a, axis=0) # gives array([2, 3])
np.amax(a, axis=1) # gives array([1, 3])
The rest of the code sets up the histogram plot. There are 100 bins in total and the bars are colored blue; plt.ylim limits the y-axis to [0, 2]; and normed=1 normalizes the histogram so that the total area of the bars is 1, i.e. it approximates the probability density function. (In recent matplotlib versions normed has been removed; density=True is the equivalent.)
I can't clearly make out what the true purpose or application of the code was, but this is in essence what it is doing. Incidentally, the 2x shape of your histogram is expected: for the maximum of two independent uniform(0, 1) samples, P(max <= x) = x^2, so the pdf is its derivative, 2x.
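As a quick check (a minimal sketch, using density=True in place of the removed normed keyword), you can overlay the theoretical density 2x on the histogram:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)
u = np.random.uniform(0, 1, (2, 10000))
v = u.max(axis=0)  # element-wise max of the two rows

plt.hist(v, 100, density=True, color='blue')  # density=True replaces normed=1
x = np.linspace(0, 1, 100)
plt.plot(x, 2 * x, 'r-', label='pdf 2x')  # theoretical density of the max of two uniforms
plt.legend()
plt.show()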
I have a 2D array of shape (5, 20), so 5 different arrays with 20 values each. I am hoping to get a histogram and see which array is on the lower end versus the higher end of the histogram. Hope that makes sense. I am attaching an image of an example of what I mean (labeled example).
Looking for one histogram but the histogram is organized by the distribution of the highest and lowest of each array.
I'm having trouble doing this with Python. I tried a few ways of doing this:
# setting up the 2D array
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

np.random.seed(1234)
array_2d = np.random.random((5, 20))
I thought you could maybe just plot all the histograms of each array (5 of them) like this:
for i in range(5):
    plt.hist(signal.detrend(array_2d[i, :], type='constant'), bins=20)
plt.show()
And then looking to see which array's histogram is furthest to the right or left, but not sure if that makes too much sense...
Then I also considered using .ravel() to flatten the 2D array into a 1D array, which makes a nice histogram. But the values from all the arrays get mixed together, so it's difficult to tell which array is on the lower or higher end of the histogram:
plt.hist(signal.detrend(array_2d.ravel(),type='constant'),bins=20)
plt.xticks(np.linspace(-1,1,10));
How might I get a histogram of the 5 arrays (shape (5, 20)) and get the range of the array with the lowest values versus the array with the highest values?
Also please let me know if this is unclear or not possible at all too haha. Thanks!
Maybe you could use a kdeplot? This would replace each input value with a small Gaussian curve and sum them.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

np.random.seed(1234)
array_2d = np.random.random((5, 20))

# One column per array; the column names (1-5) become the legend labels
sns.kdeplot(data=pd.DataFrame(array_2d.T, columns=range(1, 6)), palette='Set1', multiple='layer')
plt.show()
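If you would rather stick with plain histograms, here is a minimal sketch that labels each of the 5 arrays so you can see which one sits to the left or right:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)
array_2d = np.random.random((5, 20))

# Overlay one semi-transparent histogram per row, with a legend entry each
for i in range(5):
    plt.hist(array_2d[i, :], bins=20, alpha=0.5, label=f'array {i}')
plt.legend()
plt.show()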
I want to know how different two numpy matrices are. matrix1 and matrix2 could be quite similar, e.g. 80% of the values the same but shifted... I attach images of two identical arrays that differ in a small sequence of values in the top right.
from skimage.util import compare_images
#matrix1 & matrix2 are numpy arrays
compare_images(matrix1, matrix2, method='diff')
Gives me a first comparison, but what about two numpy matrices, one of which is, for example, left-shifted by a couple of columns?
from scipy.signal import correlate2d
import matplotlib.pyplot as plt

corr = correlate2d(matrix1, matrix2)
plt.figure(figsize=(10, 10))
plt.imshow(corr)
plt.grid(False)
plt.show()
This displays the correlation and seems a nice method, but I do not understand how the results are laid out, since the differences between the images are in the top right.
Otherwise:
picture1_norm = picture1/np.sqrt(np.sum(picture1**2))
picture2_norm = picture2/np.sqrt(np.sum(picture2**2))
print(np.sum(picture2_norm*picture1_norm))
Returns a similarity value in the range 0-1; for example 0.9942.
What could be a good method?
Correlation between two matrices is a legitimate measure of how similar both are. If both contain the same values, the (normalized) correlation will be 1, and your (maximum?) value of 0.9942 is already very close to that.
Regarding translational (in)variance of your result, have a closer look at the mode argument of scipy.signal.correlate2d, which defines how differing sizes along both axes of your matrices are handled and how far one matrix slides over the other when calculating the correlation.
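The peak of the cross-correlation can also estimate the shift itself. A minimal sketch under assumed data (a hypothetical matrix1 and a copy rolled two columns to the left):

import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
matrix1 = rng.random((20, 20))
matrix2 = np.roll(matrix1, -2, axis=1)  # hypothetical two-column left shift

# Subtract the means so the peak reflects structure rather than overall brightness
corr = correlate2d(matrix1 - matrix1.mean(), matrix2 - matrix2.mean(), mode='full')

# For zero shift the peak of the 'full' cross-correlation sits at (rows-1, cols-1);
# its offset from that position estimates the translation between the matrices
peak = np.unravel_index(np.argmax(corr), corr.shape)
shift = (peak[0] - (matrix1.shape[0] - 1), peak[1] - (matrix1.shape[1] - 1))
print(shift)  # expected: roughly (0, 2) for a two-column left shift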
I have an array of values and would like to create a matrix from that, where each row is my starting point vector multiplied by a sample from a (normal) distribution.
The number of rows of this matrix will then depend on the number of samples I want.
%pylab
my_vec = array([1,2,3])
my_rand_vec = my_vec*randn(100)
The last command does not work because the array shapes do not match.
I could think of using a for loop, but I am trying to leverage array operations.
Try this
my_rand_vec = my_vec[None,:]*randn(100)[:,None]
For a small number of samples I get, for example:
import numpy as np
my_vec = np.array([1,2,3])
my_rand_vec = my_vec[None,:]*np.random.randn(5)[:,None]
my_rand_vec
# array([[ 0.45422416, 0.90844831, 1.36267247],
# [-0.80639766, -1.61279531, -2.41919297],
# [ 0.34203295, 0.6840659 , 1.02609885],
# [-0.55246431, -1.10492863, -1.65739294],
# [-0.83023829, -1.66047658, -2.49071486]])
Your solution my_vec*randn(100) does not work because * performs element-wise multiplication, which only works if both arrays have identical (or broadcastable) shapes.
What you have to do is add an extra dimension using [None,:] and [:,None] so that numpy's broadcasting works.
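To see the broadcasting at work, a small sketch of the shapes involved:

import numpy as np

my_vec = np.array([1, 2, 3])   # shape (3,)
r = np.random.randn(5)         # shape (5,)

print(my_vec[None, :].shape)   # (1, 3)
print(r[:, None].shape)        # (5, 1)

# (1, 3) * (5, 1) broadcasts to (5, 3): each row is my_vec scaled by one sample
print((my_vec[None, :] * r[:, None]).shape)  # (5, 3)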
As a side note, I would recommend not using pylab. Instead, use import ... as ... to import modules, as pointed out here.
What you want is exactly the outer product of the two vectors:
my_rand_vec = numpy.outer(randn(100), my_vec)
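A quick check (a minimal sketch) that the outer product and the broadcasting expression agree:

import numpy as np

my_vec = np.array([1, 2, 3])
r = np.random.randn(100)

# Both produce the same (100, 3) matrix: row i is my_vec scaled by r[i]
assert np.allclose(my_vec[None, :] * r[:, None], np.outer(r, my_vec))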
You can pass the dimensions of the array you require to numpy.random.randn (note that this draws a different sample for every element):
my_rand_vec = my_vec*np.random.randn(100,3)
To multiply each vector by the same random number, you need to add an extra axis:
my_rand_vec = my_vec*np.random.randn(100)[:,np.newaxis]
I'm using a function in Python's OpenCV library to get the optical flow of my hand as I move it around. Specifically http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html#calcopticalflowfarneback
This function outputs a numpy array
import cv2

flow = cv2.calcOpticalFlowFarneback(prevgray, gray, 0.5, 3, 15, 3, 5, 1.2, 0)
print(flow.shape)  # prints (480, 320, 2)
So flow is a matrix with each entry a vector. I want a way to quantify this matrix, so I thought of using the L1 matrix norm (numpy.linalg.norm(flow, 1)), which throws an "improper dimensions to norm" error.
I'm thinking about getting around this by calculating the euclidean norm of every vector and then finding the L1 norm of a matrix with the distances of the vectors.
I'm having trouble iterating through the flow matrix efficiently. I have done it using two for loops by going first through columns and then rows, but it's way too slow.
r, c, d = flow.shape
flowprime = numpy.zeros((r, c), flow.dtype)
for i in range(0, r):
    for j in range(0, c):
        flowprime[i, j] = numpy.linalg.norm(flow[i, j], 2)
print(numpy.linalg.norm(flowprime, 1))
I had also tried using numpy.nditer but
for x in numpy.nditer(flow, op_flags=['readwrite']):
    print(x)
just prints a single value rather than a vector.
What would be the fastest way to iterate through a numpy matrix with vectors as entries, norm them and then take the L1 norm?
As of numpy version 1.9, norm takes an axis argument.
Aside from that, if you can state precisely what you want, you can almost surely ask numpy to do it without loops. E.g., assuming no complex entries or missing values, the simplest case is np.sqrt((flow**2).sum()), or, for the case I think you describe, np.linalg.norm(np.sqrt((flow**2).sum(axis=-1)), 1).
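For instance, a minimal sketch using the axis argument (assuming numpy >= 1.9, with random data standing in for the real flow field):

import numpy as np

# Random stand-in for the optical-flow output, shape (480, 320, 2)
flow = np.random.randn(480, 320, 2)

# Euclidean norm of each 2-vector in one vectorized call
magnitudes = np.linalg.norm(flow, ord=2, axis=-1)  # shape (480, 320)

# Induced L1 norm (maximum absolute column sum) of the magnitude matrix
print(np.linalg.norm(magnitudes, ord=1))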
I am trying to make a 2D representation of a 3D data in matplotlib.
I have some data files, for example:
a_1.dat
a_2.dat
a_3.dat
b_1.dat
b_2.dat
b_3.dat
From each data file I can extract the letter, the number, and a parameter associated with the letter-number pair.
I am trying to make a scatter plot where one axis is the range of letters, another axis is the range of numbers, and the scattered points represent the magnitude of the parameter associated with each letter-number pair. I would prefer is this was a 2D plot with a colorbar of some kind, as opposed to a 3D plot.
At this point, I can make a stack of 2d numpy arrays, where each 2d array looks something like
[a 1 val_a1
a 2 val_a2
a 3 val_a3]
[b 1 val_b1
b 2 val_b2
b 3 val_b3]
First question: Is this the best way to store the data for the plot I am trying to make?
Second question: How do I make the plot using python (I am most familiar with matplotlib pyplot)?
To decide whether your way of storing the data is correct, you should consider how you will use it. If you only want to use it for plotting as described here, then for the sake of simplicity you can just use three 1D arrays. If, however, you wish for a tighter structure, you might consider using a 2D array with a custom dtype.
With this in mind, you can easily create a 2D scatter plot with different colors, where the exact color is determined by the value associated with each (letter, number) pair.
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import cm

# You might note that in this simple case using numpy for creating the arrays
# is actually unnecessary, as simple lists would suffice
letters = np.array(['a', 'a', 'a', 'b', 'b', 'b'])
numbers = np.array([1, 2, 3, 1, 2, 3])
values = np.array([1, 2, 3, 1.5, 3.5, 4.5])

items = len(letters)

# x and y should be numbers, so we first feed the x axis some integers.
# The parameter c defines the color values and cmap the color mapping.
plt.scatter(range(items), numbers, c=values, cmap=cm.jet)

# Now that the data is plotted, we can re-label the xticks with the letters
plt.xticks(range(items), letters)
plt.show()
Hopefully, this should be enough for a good start.
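A small variant of the above (a sketch, not the only way) that gives each distinct letter a single x position and adds the colorbar you mentioned:

import numpy as np
from matplotlib import pyplot as plt

letters = ['a', 'a', 'a', 'b', 'b', 'b']
numbers = [1, 2, 3, 1, 2, 3]
values = [1, 2, 3, 1.5, 3.5, 4.5]

# Map each distinct letter to one integer x position
unique_letters = sorted(set(letters))
x = [unique_letters.index(l) for l in letters]

sc = plt.scatter(x, numbers, c=values, cmap='viridis')
plt.xticks(range(len(unique_letters)), unique_letters)
plt.colorbar(sc, label='parameter value')
plt.show()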