plt.eventplot refuses lineoffsets - python

This should be quite easy to reproduce:
plt.eventplot(positions=[1, 2, 3], lineoffsets=[1, 2, 3])
raises
ValueError: lineoffsets and positions are unequal sized sequences
For reasons I can't figure out, because they clearly aren't.

If I understand correctly you want to plot 3 lines, at different starting heights (offsets). The way this works with plt.eventplot is as follows:
import numpy as np
import matplotlib.pyplot as plt
positions = np.array([1, 2, 3])[:,np.newaxis] # or np.array([[1], [2], [3]])
offsets = [1,2,3]
plt.eventplot(positions, lineoffsets=offsets)
plt.show()
You have to set an offset for each group of data you want to plot. A flat list such as [1, 2, 3] is treated as a single group of three events, which is why it does not match three offsets. In your case, you have to turn the list into a 2D array (shape (m, n), with m the number of datasets and n the number of data points per set). This way plt.eventplot knows it has to use a different offset for each group of data. Also see this example.
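As a minimal sketch of the same idea with groups of different sizes (the event times below are made up for illustration), a list of lists should also work, with one offset per inner list:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt

# Made-up event times: one inner list per group, lengths may differ
positions = [[0.5, 1.0], [1.5], [2.0, 2.5, 3.0]]
offsets = [1, 2, 3]  # one offset per group

fig, ax = plt.subplots()
collections = ax.eventplot(positions, lineoffsets=offsets, linelengths=0.8)

# eventplot returns one EventCollection per group
offsets_used = [float(c.get_lineoffset()) for c in collections]
print(len(collections), offsets_used)
# plt.show()  # uncomment when using an interactive backend
```

The number of inner lists and the number of offsets have to agree; the lengths of the inner lists do not.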

Related

Python: how to compute mean and percentiles over a list of arrays avoiding zeros

I am working with satellite images that are GeoTIFF files in the form of arrays. Let's say that I have multiple images with blank spaces (because of clouds or other elements). I am collecting those arrays in a list.
from rasterio.plot import show
LST = [array1, array2]
f,ax=plt.subplots(1,2, figsize=[20,20])
show(lst_paris2, cmap='hot_r', vmin=vmin, ax=ax[0])
show(lst_paris3, cmap='hot_r', vmin=vmin1, ax=ax[1])
I would like to compute, for each pixel (i.e. cell i,j of the array), the mean as numpy.mean(LST) and the percentiles as numpy.percentile(LST, [5,50,95]), avoiding the zero values.
Your LST variable seems to be a list of two lists/arrays. It would help to use np.hstack to make a single np.array from those two lists. Then you can do the calculations as @midtownguru stated in the comments.
import numpy as np
array1 = [1,0,2,0,3,0,4,0,5,0,6,0,7,0,8,0,9,0,10]
array2 = [15,16,17,18,19]
# Stack two arrays to make a single np.array
LST = np.hstack([array1, array2])
print(LST.shape)
>>> (24,)
# Now you can calculate mean, percentile etc. without 0's
np.mean(LST[LST != 0])
>>> 9.333333333333334
np.percentile(LST[LST != 0], [5, 85])
>>> array([ 1.7, 16.9])
You can use the .nonzero() method to index only the non-zero entries:
import numpy as np
lst = np.asarray(LST)
np.percentile(lst[lst.nonzero()], [5, 50, 95])
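Both answers above pool all values into one set of statistics. If the per-pixel result from the question is what you need, one possible sketch is to stack the images, turn the zeros into NaN and use the NaN-aware reductions along the stacking axis. The two tiny 2x2 images are made up to stand in for the rasters, and zeros are assumed to mean "no data":

```python
import numpy as np

# Made-up stand-ins for two co-registered rasters; 0 is assumed to mean "no data"
img1 = np.array([[1.0, 0.0],
                 [3.0, 4.0]])
img2 = np.array([[3.0, 2.0],
                 [0.0, 8.0]])

stack = np.stack([img1, img2]).astype(float)  # shape (n_images, rows, cols)
stack[stack == 0] = np.nan                    # mask the blank pixels

pixel_mean = np.nanmean(stack, axis=0)                    # shape (rows, cols)
pixel_pct = np.nanpercentile(stack, [5, 50, 95], axis=0)  # shape (3, rows, cols)
print(pixel_mean)
print(pixel_pct.shape)
```

A pixel that is zero in every image has no valid samples, so it comes out as NaN (NumPy also emits a RuntimeWarning in that case).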

Plot 3rd axis of a 3D numpy array

I have a 3D numpy array that is a stack of 2D (m,n) images at certain timestamps, t. So my array is of shape (t, m, n). I want to plot the value of one of the pixels as a function of time.
e.g.:
import numpy as np
import matplotlib.pyplot as plt
data_cube = []
for i in xrange(10):
    a = np.random.rand(100,100)
    data_cube.append(a)
So my (t, m, n) array now has shape (10,100,100). Say I wanted a 1D plot of the value at index [12][12] at each of the 10 steps; I would do:
plt.plot(data_cube[:][12][12])
plt.show()
But I'm getting index out of range errors. I thought I might have my indices mixed up, but every plot I generate seems to be in the 'wrong' axis, i.e. across one of the 2D arrays, but instead I want it 'through' the vertical stack. Thanks in advance!
Here is the solution: since you are already using numpy, convert your final list to an array and just use slicing. The problem in your case was two-fold:
First: your final data_cube was not an array. With a list, you would have to iterate over the values.
Second: the slicing was incorrect.
import numpy as np
import matplotlib.pyplot as plt
data_cube = []
for i in range(10):
    a = np.random.rand(100,100)
    data_cube.append(a)
data_cube = np.array(data_cube) # Added this step
plt.plot(data_cube[:,12,12]) # Modified the slicing
A less verbose version that avoids iteration:
data_cube = np.random.rand(10, 100,100)
plt.plot(data_cube[:,12,12])
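To see why the original indexing failed, here is a small sketch (the names frames, cube and series are only illustrative) comparing chained indexing on a list with array slicing:

```python
import numpy as np

rng = np.random.default_rng(0)
frames = [rng.random((100, 100)) for _ in range(10)]  # list of 10 images

# frames[:] is just a copy of the list, so [12] asks for the 13th image
try:
    frames[:][12][12]
    list_error = None
except IndexError as exc:
    list_error = exc

cube = np.array(frames)    # shape (10, 100, 100)
series = cube[:, 12, 12]   # pixel (12, 12) at every time step
print(type(list_error).__name__, cube.shape, series.shape)
```

With only 10 images in the list, index 12 does not exist, which is the "index out of range" error from the question. The array slice instead picks one pixel from every image.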

How do I make a scatter plot with these data?

I am trying to make a 2D representation of a 3D data in matplotlib.
I have some data files, for example:
a_1.dat
a_2.dat
a_3.dat
b_1.dat
b_2.dat
b_3.dat
From each data file I can extract the letter, the number, and a parameter associated with the letter-number pair.
I am trying to make a scatter plot where one axis is the range of letters, another axis is the range of numbers, and the scattered points represent the magnitude of the parameter associated with each letter-number pair. I would prefer if this were a 2D plot with a colorbar of some kind, as opposed to a 3D plot.
At this point, I can make a stack of 2d numpy arrays, where each 2d array looks something like
[a 1 val_a1
a 2 val_a2
a 3 val_a3]
[b 1 val_b1
b 2 val_b2
b 3 val_b3]
First question: Is this the best way to store the data for the plot I am trying to make?
Second question: How do I make the plot using python (I am most familiar with matplotlib pyplot)?
To determine whether your way of storing the data is right, you should consider how you use it. If you only want to use it for plotting as described here, then for the sake of simplicity you can just use three 1D arrays. If, however, you want a tighter structure, you might consider using a 2D array with a custom dtype.
Having this in mind, you can easily create a 2D scatter plot with different colors, where exact color is determined by the value associated with each pair (letter, number).
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import cm
# You might note that in this simple case using numpy for creating array
# was actually unnecessary as simple lists would suffice
letters = np.array(['a', 'a', 'a', 'b', 'b', 'b'])
numbers = np.array([1, 2, 3, 1, 2, 3])
values = np.array([1, 2, 3, 1.5, 3.5, 4.5])
items = len(letters)
# x and y should be numbers, so we first feed it some integers
# Parameter c defines color values and cmap defines color mappings
plt.scatter(range(items), numbers, c=values, cmap=cm.jet)
# Now that the data is plotted, we can re-set xticks
plt.xticks(range(items), letters)
Hopefully, this should be enough for a good start.
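A possible variation, sketched here as an assumption about what the question wants: give each distinct letter a single x position (instead of one position per data point) and add the colorbar. The data is the same as above. The viridis colormap and the names labels, x_codes and points are choices made for this sketch:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt

letters = np.array(['a', 'a', 'a', 'b', 'b', 'b'])
numbers = np.array([1, 2, 3, 1, 2, 3])
values = np.array([1, 2, 3, 1.5, 3.5, 4.5])

# Map each distinct letter to an integer position on the x axis
labels, x_codes = np.unique(letters, return_inverse=True)

fig, ax = plt.subplots()
points = ax.scatter(x_codes, numbers, c=values, cmap="viridis")
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
fig.colorbar(points, ax=ax, label="value")
# plt.show()  # uncomment when using an interactive backend
```

Here all 'a' points share x = 0 and all 'b' points share x = 1, so the x axis really is the range of letters.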

scipy.ndimage.filters.convolve - different modes along different axes?

Several of the functions in scipy.ndimage.filters, including scipy.ndimage.filters.convolve, have a "mode" parameter that defines how it behaves at the boundaries. mode='constant' uses a constant value for points beyond the boundaries, while mode='wrap' wraps around. This applies to all axes.
I want to do a convolution on a 2d array (for example) so that:
Points with axis 0 outside the boundaries wrap around
Points with axis 1 outside the boundaries are constant
What's the most efficient way to do this?
I could use mode='wrap' and add some dead space at the end of the axis I want to be constant:
import numpy
from scipy import misc, ndimage
lena = misc.lena()
image = numpy.vstack((lena, numpy.zeros(lena.shape[1])))
weights = numpy.array([[1, 1, 1],
                       [1, 8, 1],
                       [1, 1, 1]])/16.
convimage = ndimage.convolve(image, weights, mode='wrap')[0:lena.shape[1],]
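One way to make the padding idea explicit is sketched below. It is not a confirmed best practice, and it has not been benchmarked for efficiency. Only axis 1 is padded with zeros (by the kernel half-width), the convolution runs with mode='wrap', and the padding is cropped again. The random image stands in for the real one, and the constant is assumed to be 0:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((6, 5))  # stand-in for the real image
weights = np.array([[1, 1, 1],
                    [1, 8, 1],
                    [1, 1, 1]]) / 16.0

r = weights.shape[1] // 2  # kernel half-width along axis 1

# Pad axis 1 with zeros so that wrapping along axis 1 only ever reads zeros
padded = np.pad(image, ((0, 0), (r, r)), mode="constant", constant_values=0)

# Wrap everywhere, then crop the padded columns away again
result = ndimage.convolve(padded, weights, mode="wrap")[:, r:-r]
print(result.shape)
```

Axis 0 still wraps because it was left unpadded, while every read past the edge of axis 1 lands in the zero padding.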

Uniform Random Numbers

I am trying to understand what this code does. I am going through some examples about numpy and plotting and I can't figure out what u and v are. I know u is an array of two arrays each with size 10000. What does v=u.max(axis=0) do? Is the max function being invoked part of the standard python library? When I plot the histogram I get a pdf defined by 2x as opposed to a normal uniform distribution.
import numpy as np
import numpy.random as rand
import matplotlib.pyplot as plt
np.random.seed(123)
u=rand.uniform(0,1,[2,10000])
v=u.max(axis=0)
plt.figure()
plt.hist(v,100,normed=1,color='blue')
plt.ylim([0,2])
plt.show()
u.max(), or equivalently np.max(u), will give you the maximum value in the array, i.e. a single value. It's the NumPy function here, not part of the standard library. You often want to find the maximum value along a particular axis/dimension and that's what is happening here.
u has shape (2,10000), and u.max(axis=0) gives you the max along axis 0, returning an array with shape (10000,). If you did u.max(axis=1) you would get an array with shape (2,).
Simple illustration/example:
>>> a = np.array([[1,2],[3,4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.max(axis=0)
array([3, 4])
>>> a.max(axis=1)
array([2, 4])
>>> a.max()
4
In the first three lines you load different modules (libraries that are relied upon in the rest of the code): numpy, which is a numerical library; numpy.random, which provides random number generation; and matplotlib, which provides plotting functions.
The rest is described here:
np.random.seed(123)
A computer does not really generate random numbers; rather, it picks numbers from a long deterministic sequence (for a more correct explanation of how this is done, see http://en.wikipedia.org/wiki/Random_number_generation). In essence, if you want to reproduce the work with the same random numbers, the computer needs to know where in this sequence to start picking numbers. This is what this line of code does. If anybody else runs the same piece of code, they end up with the same 'random' numbers.
u=rand.uniform(0,1,[2,10000])
This generates two sets of 10000 random numbers, uniformly distributed between 0 and 1, so any value between 0 and 1 is equally likely. (Again, more information can be found here: http://en.wikipedia.org/wiki/Uniform_distribution_(continuous) ). You are creating two arrays within an array. This can be checked by doing: len(u) and len(u[0]).
v=u.max(axis=0)
The u.max? command in IPython refers you to the docs. It basically selects a maximum, and the axis determines along which dimension the maximum is taken. Try the following:
a = np.arange(4).reshape((2,2))
np.amax(a, axis=0) # gives array([2, 3])
np.amax(a, axis=1) # gives array([1, 3])
The rest of the code sets up the histogram plot. There are 100 bins in total in the histogram and the bars are colored blue. The y-axis is limited to the range 0 to 2, and normed=1 normalizes the histogram so that its total area is 1, i.e. it shows a probability density rather than raw counts (newer Matplotlib versions call this parameter density).
I can't clearly make out what the true purpose or application of the code was, but this is in essence what it is doing.
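On the 2x shape mentioned in the question: for two independent uniform numbers on [0, 1], the maximum is at most x only if both are at most x, so P(max <= x) = x * x and the density is the derivative, 2x. The sketch below checks this numerically. It uses a larger sample than the original code and NumPy's default_rng instead of np.random.seed, both chosen just for this sketch:

```python
import numpy as np

rng = np.random.default_rng(123)
u = rng.uniform(0, 1, size=(2, 100_000))
v = u.max(axis=0)  # element-wise max of the two rows

# Empirical density on 20 bins versus the theoretical density 2x
density, edges = np.histogram(v, bins=20, range=(0, 1), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_abs_error = np.abs(density - 2 * centers).max()
print(v.mean(), max_abs_error)
```

The sample mean should land near 2/3, the mean of a distribution with density 2x on [0, 1], and the binned density should follow the line 2x closely.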
