Why is numpy.ravel() required in this code that produces small multiples? - python

I found some code to generate a set of small multiples and it is working perfectly.
fig, axes = plt.subplots(6,3, figsize=(21,21))
fig.subplots_adjust(hspace=.3, wspace=.175)
for ax, data in zip(axes.ravel(), clean_sets):
    ax.plot(data.ETo, "o")
The line for ax, data in zip(axes.ravel(), clean_sets): contains .ravel(), but I do not understand what this is actually doing or why it is necessary.
If I take a look at the docs I find the following:
Return a contiguous flattened array.
A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.
I guess the axes returned by plt.subplots() is a multidimensional array that can't be iterated over directly, but really I'm not sure. A simple explanation would be greatly appreciated.
What is the purpose of using .ravel() in this case?

Your guess is correct. plt.subplots() returns either an Axes or a numpy array of several axes, depending on the input. In case a 2D grid is defined by the arguments nrows and ncols, the returned numpy array will be a 2D array as well.
This behaviour is explained in the pyplot.subplots documentation inside the squeeze argument,
squeeze : bool, optional, default: True
If True, extra dimensions are squeezed out from the returned Axes object:
if only one subplot is constructed (nrows=ncols=1), the resulting single Axes object is returned as a scalar.
for Nx1 or 1xN subplots, the returned object is a 1D numpy object array of Axes objects.
for NxM, subplots with N>1 and M>1 are returned as a 2D array.
If False, no squeezing at all is done: the returned Axes object is always a 2D array containing Axes instances, even if it ends up being 1x1.
Since here you have plt.subplots(6,3) and hence N>1, M>1, the resulting object is necessarily a 2D array, independent of what squeeze is set to.
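The squeeze rules quoted above can be checked directly. This is a minimal sketch, assuming a non-interactive Agg backend (chosen here only so that it runs without a display); the variable names are illustrative and not from the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# One subplot: a single Axes object, not an array.
fig1, single_ax = plt.subplots()

# 1xN (or Nx1): a 1D numpy array of Axes.
fig2, row_axes = plt.subplots(1, 3)

# NxM with N>1 and M>1: a 2D numpy array of Axes.
fig3, grid_axes = plt.subplots(6, 3)

# squeeze=False: always a 2D array, even for a single subplot.
fig4, unsqueezed = plt.subplots(1, 1, squeeze=False)

print(isinstance(single_ax, np.ndarray))  # False
print(row_axes.shape)                     # (3,)
print(grid_axes.shape)                    # (6, 3)
print(unsqueezed.shape)                   # (1, 1)

plt.close("all")
```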
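Iterating over a 2D array yields its rows (here, arrays of three Axes each) rather than the individual Axes, so the array has to be flattened before it is zipped with the datasets. Options are
zip(axes.ravel(), clean_sets)
zip(axes.flatten(), clean_sets)
zip(axes.flat, clean_sets)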
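To see how the three options differ, here is a small sketch that uses a plain integer array as a stand-in for the 2D array of Axes (the name grid is made up for illustration). All three give the elements in the same row-by-row order; they differ only in whether a new array is created.

```python
import numpy as np

# Stand-in for the 2D array of Axes: a 2x3 grid of integers.
grid = np.arange(6).reshape(2, 3)

# Iterating over the 2D array directly yields whole rows, not elements.
rows = [row for row in grid]

flat_view = grid.ravel()    # 1D array; a view of the same memory when possible
flat_copy = grid.flatten()  # 1D array; always an independent copy
flat_iter = grid.flat       # iterator over the elements, no new array

print(len(rows), rows[0].shape)                            # 2 (3,)
print(flat_view.shape, np.shares_memory(grid, flat_view))  # (6,) True
print(flat_copy.shape, np.shares_memory(grid, flat_copy))  # (6,) False
print([int(v) for v in flat_iter])                         # [0, 1, 2, 3, 4, 5]
```

For a loop that only reads the Axes, as in the question, any of the three works equally well.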

Related

1D plots from 3D array

I have a 3D data cube and I am trying to make a plot of the first axis at a specific value of the other two axes. The goal is to make a velocity plot at given coordinates in the sky.
I have tried to create a 1D array from the 3D array by putting in my values for the last two axes. This is what I have tried:
achan=50
dchan = 200
lmcdata[:][achan][dchan] #this array has three axes, vchan, achan, dchan.
I am expecting an array of size 120 as there are 120 velocity channels that make up the vchan axis. When trying the code above I keep getting an array of size 655 which is the number of entries for the dchan axis.
Python slicing works from left to right. In this case, lmcdata[:] is returning the whole lmcdata list. So, lmcdata[:][achan][dchan] is equivalent to just lmcdata[achan][dchan].
For higher level indexing and slicing tasks like this, I highly recommend the numpy package. You will be able to slice lmcdata as expected after turning it into a numpy array: lmcdata = np.asarray(lmcdata).
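As a sketch of the difference, assuming lmcdata is (or has been converted to) a numpy array: the 120 velocity channels and 655 dchan entries come from the question, while the size of the achan axis (210) is made up here because the question does not state it.

```python
import numpy as np

# Hypothetical cube with axes (vchan, achan, dchan). Only 120 and 655 are
# taken from the question; 210 is an invented size for the achan axis.
lmcdata = np.zeros((120, 210, 655), dtype=np.float32)

achan = 50
dchan = 200

# Chained indexing: [:] changes nothing, then the two indices consume
# axis 0 and axis 1, leaving the dchan axis.
chained = lmcdata[:][achan][dchan]

# Tuple indexing: keep all of axis 0 and fix axes 1 and 2.
spectrum = lmcdata[:, achan, dchan]

print(chained.shape)   # (655,)
print(spectrum.shape)  # (120,)
```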

Using np.ravel to specify yerr in errorbar plot

My code generates values and corresponding standard deviations in sets of 3, i.e. 3x1 arrays. I want to plot them all together as a categorical errorbar plot. For specifying the yerr, since it only accepts scalar or (N,) or N x 2, I used np.ravel to convert all the 3x1 arrays to one single N x 1 array. But I still get the error ValueError: err must be [ scalar | N, Nx1 or 2xN array-like ]
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
names_p=['p1','p1','p1','p2','p2','p2','p3','p3','p3','p4','p4','p4','p5','p5','p5','p6','p6','p6'] #### The names are repeated three times because for each variable I have three values
y=(p1sdm2N_ratem,p2sdm2N_ratem,p3sdm2N_ratem,p4sdm2N_ratem,p5sdm2N_ratem,p6sdm2N_ratem) #### each of these 6 elements is 3 x 1 E.g. p1sdm2N_ratem=(0.04,0.02,0.03)
c=np.ravel((p1sdm2N_ratestd,p2sdm2N_ratestd,p3sdm2N_ratestd,p4sdm2N_ratestd,p5sdm2N_ratestd,p6sdm2N_ratestd)) ### each of these 6 elements is 3x1 e.g. p1sdm2N_ratestd=(0.001,0.003,0.001)
plt.errorbar(names_p,y,yerr=c)
This gives the error I mentioned before, even though c is an 18x1 array. (It's not an array of an array, I checked.)
Note, with the way I've set up my variables,
plt.scatter(names_p,y)
and
plt.errorbar(names_p,y,yerr=None)
work, but without the errorbars, of course.
I'd appreciate any help!
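No answer is shown for this question. One likely cause, assuming the variables are exactly as described, is that only the standard deviations were flattened: y is still a nested 6 x 3 structure while c has 18 entries, so their lengths do not match. The sketch below flattens both. The numbers are placeholders (the question only shows the p1 values, repeated here for all six variables), and the names means, stds and labels are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data: the p1 example values from the question, repeated six times.
means = [(0.04, 0.02, 0.03)] * 6
stds = [(0.001, 0.003, 0.001)] * 6

labels = ["p1", "p2", "p3", "p4", "p5", "p6"]
names_p = [label for label in labels for _ in range(3)]  # each name three times

y_nested = np.asarray(means)  # shape (6, 3): what the original y amounts to
y = np.ravel(means)           # shape (18,)
c = np.ravel(stds)            # shape (18,), same length as y

fig, ax = plt.subplots()
ax.errorbar(names_p, y, yerr=c, fmt="o")  # fmt="o" draws markers without a line
plt.close(fig)

print(y_nested.shape, y.shape, c.shape)
```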

Sum of all slices along given axis of a numpy array

I have a numpy array of shape (3,12,7,5). I would like to have the sum of all slices along the first axis of this array.
data = np.random.randint(low=0, high=8000, size=3*12*7*5).reshape(3,12,7,5)
data[0,...].sum()
data[1,...].sum()
data[2,...].sum()
np.array((data[0,...].sum(), data[1,...].sum(), data[2,...].sum()))
First, I thought this should be possible using np.sum(data, axis=...) but it is not.
How do I perform this calculation in a single shot? What is the numpy magic?
For a generic ndarray, you could reshape into a 2D array, keeping the number of elements along the first axis same and merging all of the remaining axes as the second axis and finally sum along that axis, like so -
data.reshape(data.shape[0],-1).sum(axis=1)
For a 4D array, you could include the axes along which the summation is to be performed. So, to solve our case, we would have -
data.sum(axis=(1,2,3))
This could be extended to make it work for generic ndarrays by creating a tuple of appropriate axis IDs and thus avoid reshaping, like so -
data.sum(axis=tuple(np.arange(1,data.ndim)))
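A quick check that the approaches above agree with the slice-by-slice sums from the question. This is only a sketch: it uses a seeded random generator so the run is reproducible, and range instead of np.arange for the axis tuple, which gives the same axes.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
data = rng.integers(low=0, high=8000, size=(3, 12, 7, 5))

# Reference: sum each slice along the first axis separately.
per_slice = np.array([data[i, ...].sum() for i in range(data.shape[0])])

via_reshape = data.reshape(data.shape[0], -1).sum(axis=1)
via_axes = data.sum(axis=(1, 2, 3))
via_generic = data.sum(axis=tuple(range(1, data.ndim)))

print(np.array_equal(per_slice, via_reshape))  # True
print(np.array_equal(per_slice, via_axes))     # True
print(np.array_equal(per_slice, via_generic))  # True
```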

Concatenating numpy arrays of different shapes

I have several N-dimensional arrays of different shapes and want to combine them into a new (N+1)-dimensional array, where the new axis has a length corresponding to the number of initial N-d arrays.
This answer is sufficient if the original arrays are all the same shape; however, it does not work if they have different shapes.
I don't really want to reshape the arrays to a congruent size and fill with empty elements due to the subsequent analysis I need to perform on the final array.
Specifically, I have four 4D arrays. One of the things I want to do with the resulting 5D array is plot parts of the four arrays on the same matplotlib figure. Obviously I could plot each one separately, however soon I will have more than four 4D arrays and am looking for a dynamic solution.
While I was writing this, Sven gave the same answer in the comments...
Put the arrays in a python list in the following manner:
arrays = []  # Python names cannot start with a digit
arrays.append(array_4d_1)
arrays.append(array_4d_2)
...
Then you can unpack them:
for array_4d in arrays:
    pass  # plot array_4d on the figure
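A runnable version of the same idea, as a sketch: the three arrays and their shapes are invented for illustration, and pulling out a 1D slice stands in for whatever part of each array would be plotted.

```python
import numpy as np

# Hypothetical 4D arrays with different shapes.
array_a = np.ones((2, 3, 4, 5))
array_b = np.ones((3, 3, 2, 6))
array_c = np.ones((1, 5, 4, 2))

# A plain list holds them side by side; nothing is reshaped or padded.
arrays = [array_a, array_b, array_c]
shapes = [arr.shape for arr in arrays]

# Per-array processing in a loop, e.g. take one 1D slice from each array.
first_lines = [arr[0, 0, 0, :] for arr in arrays]

print(shapes)
print([line.size for line in first_lines])  # [5, 6, 2]
```

Adding a further array later only means appending it to the list; the loop does not change.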

matplotlib.pyplot.hist returns a histogram where all bins have the same value when I have varying data

I am trying to create a histogram in python using matplotlib.pyplot.hist.
I have an array of data that varies; however, when I put my code into python, the histogram is returned with values in all bins equal to each other, or equal to zero, which is not correct.
The histogram should look like the line graph above it, with bins roughly the same height and in the same shape as the graph above.
The line graph above the histogram is there to illustrate what my data looks like and to show that my data does vary.
My data array is called spectrumnoise and is just a function I have created against an array x
x=np.arange(0.1,20.1,0.1)
The code I am using to create the histogram and the line graph above it is
import matplotlib.pyplot as mpl
mpl.plot(x,spectrumnoise)
mpl.hist(spectrumnoise,bins=50,histtype='step')
mpl.show()
I have also tried using
mpl.hist((x,spectrumnoise),bins=50,histtype='step')
I have also changed the number of bins countless times to see if that helps, and tried normalising the histogram function, but nothing works.
Image of the output of the code can be seen here
The problem is that spectrumnoise is a list of arrays, not a numpy.ndarray. When you hand hist a list of arrays as its first argument, it treats each element as a separate dataset to plot. All the bins have the same height because each 'dataset' in the list has only one value in it!
From the hist docstring:
Multiple data can be provided via x as a list of datasets
of potentially different length ([x0, x1, ...]), or as
a 2-D ndarray in which each column is a dataset.
Try combining spectrumnoise into a single array (np.vstack produces an (N, 1) array, which hist treats as one dataset):
mpl.hist(np.vstack(spectrumnoise),50)
As an aside, looking at your code there's absolutely no reason to convert your data to lists in the first place. What you ought to do is operate directly on slices in your array, e.g.:
data[20:40] += y1
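The effect described in this answer can be reproduced with made-up data. This is a sketch, assuming spectrumnoise is a list of one-element arrays as the answer states; noise_list is a hypothetical stand-in, and the Agg backend is used only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for spectrumnoise: a list of 20 one-element arrays.
noise_list = [rng.normal(size=1) for _ in range(20)]

fig, (ax_list, ax_flat) = plt.subplots(1, 2)

# List of arrays: every element is treated as its own dataset.
counts_list, _, _ = ax_list.hist(noise_list, bins=5, histtype="step")

# One 1D array: a single dataset with 20 values.
noise_flat = np.concatenate(noise_list)
counts_flat, _, _ = ax_flat.hist(noise_flat, bins=5, histtype="step")

print(np.asarray(counts_list).shape)  # (20, 5): one histogram per element
print(np.asarray(counts_flat).shape)  # (5,): one histogram for all values
plt.close(fig)
```

In the first call each of the 20 "datasets" contributes exactly one count, which is why the bars all end up at the same height.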
