Looping within matplotlib - python

I am trying to plot multiple graphs on a single set of axes.
I have a 2D array of data and want to break it down into 111 1D arrays and plot them. Here is an example of my code so far:
from numpy import *
import matplotlib.pyplot as plt
x = linspace(1, 130, 130) # create a 1D array of 130 evenly spaced values (1.0 to 130.0) to set as the x axis
y = Te25117.data # set 2D array of data as y
plt.plot(x, y[1], x, y[2], x, y[3])
This code works fine, but I cannot see a way of writing a loop that does this inside the plot call itself. I can only make the code work if I explicitly write out each index, which is not ideal! (The range of numbers I need to loop over is 1 to 111.)

Let me guess... long-time MATLAB user?
Matplotlib automatically adds a new line to the current plot if you don't create a new one. So your code can simply be:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1, 130, 130)  # 130 evenly spaced values (1.0 to 130.0) for the x axis
y = Te25117.data  # 2D array of data, one curve per row
L = len(y)  # I assume you can infer the size of the data this way...
#L = 111  # ...or hard-code it if you don't know any better
for i in range(L):
    plt.plot(x, y[i], linewidth=1)  # each call adds another line to the same axes
plt.show()

import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2])
y = np.array([[1, 2],
              [3, 4]])
# Iterating over a 2D array yields its rows: [1, 2], then [3, 4]
for y_i in y:
    plt.plot(x, y_i)
plt.show()
This plots both rows as lines in one figure.
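The loop can also be dropped entirely: when y passed to plot is 2D, Matplotlib draws each column as a separate line. Since each row of the question's array is one curve, transposing it lets a single call draw all of them. A minimal sketch, assuming one curve per row; the random array is a stand-in for Te25117.data, which is not available here:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(1, 130, 130)          # 130 evenly spaced x values
y = np.random.rand(111, 130)          # stand-in for Te25117.data: 111 curves of 130 points

fig, ax = plt.subplots()
# y.T has shape (130, 111): plot draws one line per column, i.e. one per row of y.
lines = ax.plot(x, y.T, linewidth=1)
```

The call returns a list with one Line2D per curve, which is handy if you later want to restyle individual lines.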

Related

Is there a way to slice an x,y array diagonally?

I have a 3D array (time, y direction, x direction), and I want to split it up spatially. However, is there a way to slice a spatial array diagonally instead of just in y and x?
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
data = np.random.rand(100,45,60)
data_1 = data[:,0:30,0:30]
X,Y = np.meshgrid(np.arange(0,60,1),np.arange(0,45,1))
plt.contourf(X,Y,data[2])
plt.show()
plt.contourf(data_1[2])
plt.xlim(0,60)
plt.ylim(0,45)
plt.show()
The first graph shows the contour plot of data, and the second shows data_1, but is there a way to slice it diagonally? For example, where the red line is.
By slicing I mean selecting only sections of the 3D data array in the x and y directions. For example, get only the data under the red arrow.
import numpy as np
from numpy import ma
import matplotlib.pyplot as plt
data = np.random.rand(5,45,60)
data1 = data[2,0:30,0:30]
x2, y2 = np.meshgrid(np.arange(0, 30, 1), np.arange(0, 30, 1))
data1 = ma.masked_where(x2 + y2 > 30, data1)  # hide every point where x + y > 30, i.e. beyond the diagonal
plt.contourf(x2, y2, data1)
plt.xlim(0,60)
plt.ylim(0,45)
plt.show()
I have used a masked array above, but it is also possible to use np.where instead and set values to np.nan (the np.NaN alias was removed in NumPy 2.0):
data1 = np.where(x2 + y2 > 30, np.nan, data1)
Matplotlib will also not plot NaN values.
Setting values to NaN, however, will lose the original values, while a mask simply hides them (removing the mask will retrieve the original values). NaNs can also be tricky in comparisons. So a mask may be better.
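Because the question's data is 3D (time, y, x), the same diagonal cut can be applied to every time step at once. The sketch below uses random stand-in data and the same x + y > 30 cut as above; it also illustrates the point just made: the masked array still holds the original values, while np.where overwrites them with NaN.

```python
import numpy as np
from numpy import ma

data = np.random.rand(5, 45, 60)           # (time, y, x), random stand-in data
ny, nx = data.shape[1:]
yy, xx = np.mgrid[0:ny, 0:nx]              # yy[i, j] == i, xx[i, j] == j
below_diagonal = (xx + yy) <= 30           # keep points on or below the line x + y = 30

# Masked array: repeat the 2D condition along the time axis,
# since masked_where wants a condition with the same shape as the data.
full_mask = np.broadcast_to(~below_diagonal, data.shape).copy()
masked = ma.masked_where(full_mask, data)

# NaN version: np.where broadcasts the 2D condition itself, but overwrites the hidden values.
nan_filled = np.where(below_diagonal, data, np.nan)

# The masked array still holds every original value ...
assert np.array_equal(masked.data, data)
# ... and both versions agree on what is visible.
assert np.array_equal(masked.filled(np.nan), nan_filled, equal_nan=True)
```

Any single time step, e.g. masked[2], can then be passed to plt.contourf as in the answer.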

y axis has decreasing values instead of increasing ones for plt

I am trying to build a histogram and here is my code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
x = ['0','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','38','40','41','42','43','44','45','48','50','51','53','54','57','60','64','70','77','93','104','108','147'] #sample names
y = ['164','189','288','444','311','216','122','111','92','54','45','31','31','30','18','15','15','10','4','15','2','8','6','4','7','5','3','3','1','10','3','3','3','2','4','2','1','1','1','2','2','1','1','1','1','1','2','1','2','2','2','1','1','2','1','1','1','1']
plt.bar(x, y)
plt.xlabel('Number of Methods')
plt.ylabel('Variables')
plt.show()
Here is the histogram I obtain:
I would like the values in the y axis to be in an increasing order. This means that 1 should be first followed by 3, 5, 7, etc. How can I fix this?
They're not decreasing; they're in the order in which they appear in the list, because the list items are strings. Try
x = [int(i) for i in x]
y = [int(i) for i in y]
to convert them to numbers before plotting.
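With strings, Matplotlib treats each value as a category and places the categories along the axis in the order they first appear, which is why the y axis looked scrambled. A minimal sketch of the same fix using NumPy for the conversion; the lists are shortened here, and the question's full lists work the same way:

```python
import numpy as np
import matplotlib.pyplot as plt

x = ['0', '1', '2', '3', '4', '5']                 # shortened sample of the question's data
y = ['164', '189', '288', '444', '311', '216']

# Convert the strings to integers so the axes are numeric, not categorical.
x_num = np.array(x).astype(int)
y_num = np.array(y).astype(int)

fig, ax = plt.subplots()
bars = ax.bar(x_num, y_num)
ax.set_xlabel('Number of Methods')
ax.set_ylabel('Variables')
```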

Removing datapoints outside interval for both axes of a plot

I am trying to plot some data using matplotlib.
import numpy as np
import matplotlib.pyplot as plt
x_data = np.arange(0,100)
y_data = np.random.randint(11, size=(100,))
plt.plot(x_data, y_data)
plt.show()
This, of course, works fine. However, I would like to remove the data that is outside a given interval (e.g. keep only 4 <= y_data <= 6). For the y_data, this is done by
y_data_2 = [x for x in y_data if 4 <= x <= 6]
However, since the first dimensions are no longer equal, you are no longer able to plot y_data_2 vs. x_data. If you try to
plt.plot(x_data, y_data_2)
you will, of course, get an error stating that
ValueError: x and y must have same first dimension, but have shapes (100,) and (35,)
My question is thus twofold: is there a simple way for me to remove the equivalent datapoints in x_data? Also, is there a way I could find the indices of the points that are to be removed?
Thank you.
You can use masking together with indexing. Here you create a boolean mask that captures the y values lying between 4 and 6. You then apply this mask to both x_data and y_data to get the corresponding values. This way you don't need any for loops or list comprehensions.
import numpy as np
import matplotlib.pyplot as plt
x_data = np.arange(0,100)
y_data = np.random.randint(11, size=(100,))
mask = (y_data>=4) & (y_data<=6)  # True where 4 <= y <= 6
plt.plot(x_data[mask], y_data[mask], 'bo')  # both arrays are filtered by the same mask, so their lengths match
plt.show()
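The same mask also answers the second part of the question, finding the indices of the points to remove. np.flatnonzero returns the positions where a boolean array is True, so applying it to the inverted mask gives the dropped indices. A short sketch with a seeded generator so the run is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)
x_data = np.arange(0, 100)
y_data = rng.integers(0, 11, size=100)      # integers 0..10, like np.random.randint(11, ...)

mask = (y_data >= 4) & (y_data <= 6)        # points to keep
kept_idx = np.flatnonzero(mask)             # indices of points inside [4, 6]
removed_idx = np.flatnonzero(~mask)         # indices of points that are dropped

x_kept, y_kept = x_data[mask], y_data[mask] # equal-length arrays, safe to plot together
```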
First, you can get the index of y_data_2 in y_data, and then get the subarray x_data_2 of x_data. Then, plot the x_data_2, y_data_2.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x_data = np.arange(0,100)
y_data = np.random.randint(11, size=(100,))
y = pd.Series(y_data)
y_data_2 = [x for x in y_data if 4 <= x <= 6]
index = y[y.isin(y_data_2)].index
print(index)
x_data_2 = x_data[index]
plt.plot(x_data, y_data)
plt.scatter(x_data_2, y_data_2)
plt.show()

How to adjust branch lengths of dendrogram in matplotlib (like in astrodendro)? [Python]

Here is my resulting plot below but I would like it to look like the truncated dendrograms in astrodendro such as this:
There is also a really cool looking dendrogram from this paper that I would like to recreate in matplotlib.
Below is the code for generating an iris data set with noise variables and plotting the dendrogram in matplotlib.
Does anyone know how to either: (1) truncate the branches like in the example figures; and/or (2) to use astrodendro with a custom linkage matrix and labels?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
import astrodendro
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial import distance
def iris_data(noise=None, palette="hls", desat=1):
    # Iris dataset
    iris = load_iris()
    X = pd.DataFrame(iris.data,
                     index=[*map(lambda x: f"iris_{x}", range(150))],
                     columns=[*map(lambda x: x.split(" (cm)")[0].replace(" ", "_"), iris.feature_names)])
    y = pd.Series(iris.target,
                  index=X.index,
                  name="Species")
    # map_colors(y, mode=1, palette=palette, desat=desat) is a helper not shown here,
    # so palette/desat are unused; this simple fixed mapping stands in for it.
    c = y.map(lambda x: {0: "red", 1: "green", 2: "blue"}[x])
    if noise is not None:
        X_noise = pd.DataFrame(
            np.random.RandomState(0).normal(size=(X.shape[0], noise)),
            index=X.index,
            columns=[*map(lambda x: f"noise_{x}", range(noise))]
        )
        X = pd.concat([X, X_noise], axis=1)
    return (X, y, c)
def dism2linkage(DF_dism, method="ward"):
    """
    Input: A (m x m) dissimilarity Pandas DataFrame object where the diagonal is 0
    Output: Hierarchical clustering encoded as a linkage matrix
    Further reading:
    http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.linkage.html
    https://pypi.python.org/pypi/fastcluster
    """
    # Linkage matrix (DataFrame.as_matrix() was removed in pandas 1.0; use to_numpy())
    Ar_dist = distance.squareform(DF_dism.to_numpy())
    return linkage(Ar_dist, method=method)
# Get data
X_iris_with_noise, y_iris, c_iris = iris_data(50)
# Get distance matrix
df_dism = 1 - X_iris_with_noise.corr().abs()
# Get linkage matrix
Z = dism2linkage(df_dism)
# Create dendrogram ("seaborn-white" is named "seaborn-v0_8-white" from Matplotlib 3.6 on)
with plt.style.context("seaborn-v0_8-white"):
    fig, ax = plt.subplots(figsize=(13, 3))
    D_dendro = dendrogram(
        Z,
        labels=df_dism.index,
        color_threshold=3.5,
        count_sort="ascending",
        # link_color_func=lambda k: colors[k],
        ax=ax
    )
    ax.set_ylabel("Distance")
I'm not sure this really constitutes a practical answer, but it does allow you to generate dendrograms with truncated hanging lines. The trick is to generate the plot as normal, then manipulate the resulting matplotlib plot to recreate the lines.
I couldn't get your example to work locally, so I've just created a dummy dataset.
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
a = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
b = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
X = np.concatenate((a, b),)
Z = linkage(X, 'ward')
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
dendrogram(Z, ax=ax)
The resulting plot is the usual long-arm dendrogram.
Now for the more interesting bit. A dendrogram is made up of a number of LineCollection objects (one for each colour). To update the lines we iterate through these, extracting the details about their constituent paths, modifying these to remove any lines reaching to a y of zero, and then recreating a LineCollection for these modified paths.
The updated path is then added to the axes, and the original is removed.
The one tricky part is determining what height to draw to instead of zero. Since we are iterating over each of the dendrogram's paths, we don't know which point came before; we basically have no idea where we are. However, we can exploit the fact that hanging lines hang vertically. Assuming there are no lines on the same x, we can look up the other known y values for a given x and use them as the basis for our new y. The downside is that, to make sure we have this number, we have to pre-scan the data.
Note: if your dendrogram can have hanging lines on the same x, you would also need to take y into account and search for the nearest y above the point at that x.
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
from scipy.cluster.hierarchy import dendrogram
# Z is the linkage matrix from the previous snippet.
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
dendrogram(Z, ax=ax)
for c in list(ax.collections):  # iterate over a copy, since we add to and remove from the axes
    paths = []
    for path in c.get_paths():
        segments = []
        y_at_x = {}
        # Pre-pass over all vertices, to find the lowest y value at each x value.
        # We can use this to calculate where to cut our lines.
        for vertex, _ in path.iter_segments():
            x, y = vertex
            # Don't store if the y is zero, or if it's higher than the current low.
            if y > 0 and y < y_at_x.get(x, np.inf):
                y_at_x[x] = y
        for vertex, _ in path.iter_segments():
            x, y = vertex
            if y == 0:
                # If we know the last y at this x, use it - 0.5, limited to >= 0
                y = max(0, y_at_x.get(x, 0) - 0.5)
            segments.append([x, y])
        paths.append(segments)
    lc = LineCollection(paths, colors=c.get_colors())  # recreate a LineCollection with the same colours
    ax.add_collection(lc)
    c.remove()  # remove the original LineCollection (ax.collections.remove(c) fails on recent Matplotlib)
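A different technique that avoids editing the collections after the fact is to ask dendrogram for its coordinates with no_plot=True and draw the links yourself. The returned icoord and dcoord lists hold the four x and four y values of each U-shaped link, and leaf ends sit at y = 0, so shortening them is a small change. This is a sketch, not the approach above: plot_truncated_dendrogram is a hypothetical helper, it draws plain ax.plot lines, and the hang offset plays the role of the 0.5 used above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))               # dummy data
Z = linkage(X, 'ward')

def plot_truncated_dendrogram(Z, ax, hang=0.5):
    """Hypothetical helper: draw a dendrogram whose leaf lines hang `hang` below their link."""
    d = dendrogram(Z, no_plot=True)        # compute coordinates only, draw nothing
    for xs, ys, color in zip(d['icoord'], d['dcoord'], d['color_list']):
        ys = list(ys)                      # copy so d itself is left untouched
        # ys = [left_bottom, top, top, right_bottom]; a bottom of 0 is a leaf end.
        if ys[0] == 0:
            ys[0] = max(0, ys[1] - hang)
        if ys[3] == 0:
            ys[3] = max(0, ys[2] - hang)
        ax.plot(xs, ys, color=color, linewidth=1)
    # scipy places the leaves at x = 5, 15, 25, ...
    ax.set_xticks(5 + 10 * np.arange(len(d['ivl'])))
    ax.set_xticklabels(d['ivl'])
    return d

fig, ax = plt.subplots()
plot_truncated_dendrogram(Z, ax)
```

Because the leaf ends are rewritten before anything is drawn, there is no need to pre-scan paths or guess which point came before.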
The resulting dendrogram looks like this:

What is the most efficient way to plot 3d array in Python?

For example:
volume = np.random.rand(512, 512, 512)
where the array items represent the grayscale color of each pixel.
The following code works, but is far too slow:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') no longer works on recent Matplotlib
volume = np.random.rand(20, 20, 20)
for x in range(volume.shape[0]):
    for y in range(volume.shape[1]):
        for z in range(volume.shape[2]):
            v = volume[x, y, z]
            ax.scatter(x, y, z, color=(v, v, v, 1))  # one scatter call per point: this is the slow part
plt.show()
For better performance, avoid calling ax.scatter multiple times, if possible.
Instead, pack all the x,y,z coordinates and colors into 1D arrays (or
lists), then call ax.scatter once:
ax.scatter(x, y, z, c=volume.ravel())
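One way to build those flat coordinate arrays without Python loops is np.indices, which returns the index grid for a given shape. A sketch on a 20x20x20 random cube, matching the question's test size, with a single scatter call instead of one per point:

```python
import numpy as np
import matplotlib.pyplot as plt

volume = np.random.rand(20, 20, 20)    # grayscale values in [0, 1), as in the question

# np.indices returns one integer array per axis, each of shape volume.shape,
# holding every voxel's x, y and z index.
x, y, z = np.indices(volume.shape)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# A single scatter call: flatten coordinates and values to 1D (same C order)
# and let a grayscale colormap turn the values into colors.
points = ax.scatter(x.ravel(), y.ravel(), z.ravel(), c=volume.ravel(), cmap='Greys')
```

Call plt.show() afterwards to display the figure.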
The problem (in terms of both CPU time and memory) grows as size**3, where size is the side length of the cube.
Moreover, ax.scatter will try to render all size**3 points without regard to
the fact that most of those points are obscured by those on the outer
shell.
It would help to reduce the number of points in volume -- perhaps by
summarizing or resampling/interpolating it in some way -- before rendering it.
We can also reduce the CPU and memory required from O(size**3) to O(size**2)
by only plotting the outer shell:
import functools
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def cartesian_product_broadcasted(*arrays):
    """
    http://stackoverflow.com/a/11146645/190597 (senderle)
    """
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    dtype = np.result_type(*arrays)
    rows, cols = functools.reduce(np.multiply, broadcasted[0].shape), len(broadcasted)
    out = np.empty(rows * cols, dtype=dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T
# @profile  # uncomment and run `python -m memory_profiler script.py` to measure memory usage
def main():
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1, projection='3d')
    size = 512
    volume = np.random.rand(size, size, size)
    x, y, z = cartesian_product_broadcasted(*[np.arange(size, dtype='int16')]*3).T
    mask = ((x == 0) | (x == size-1)
            | (y == 0) | (y == size-1)
            | (z == 0) | (z == size-1))
    x = x[mask]
    y = y[mask]
    z = z[mask]
    volume = volume.ravel()[mask]
    ax.scatter(x, y, z, c=volume, cmap=plt.get_cmap('Greys'))
    plt.show()
if __name__ == '__main__':
    main()
But note that even when plotting only the outer shell, to achieve a plot with size=512 we still need around 1.3 GiB of memory. Also beware that if you have enough total memory but, for lack of RAM, the program has to use swap space, the overall speed of the program will slow down dramatically. If you find yourself in this situation, the only solutions are to find a smarter way to render an acceptable image using fewer points, or to buy more RAM.
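The snippet above still builds the full size³ grid of coordinates before masking it, so only the plotting step shrinks to O(size²). A sketch that builds the shell coordinates directly from the cube's six faces keeps the coordinate arrays O(size²) as well; the volume array itself is of course still size³. shell_coordinates is a hypothetical helper, not a NumPy function, and assumes size >= 2.

```python
import numpy as np
import matplotlib.pyplot as plt

def shell_coordinates(size):
    """Hypothetical helper: (n, 3) integer coordinates of the voxels on the outer
    shell of a size x size x size cube, built from its six faces (size >= 2)."""
    i, j = np.meshgrid(np.arange(size), np.arange(size), indexing='ij')
    i, j = i.ravel(), j.ravel()
    faces = []
    for fixed in (0, size - 1):
        k = np.full_like(i, fixed)
        faces.append(np.column_stack([k, i, j]))   # x fixed at 0 or size-1
        faces.append(np.column_stack([i, k, j]))   # y fixed
        faces.append(np.column_stack([i, j, k]))   # z fixed
    # Edges and corners lie on more than one face; drop the duplicate rows.
    return np.unique(np.concatenate(faces), axis=0)

size = 64
volume = np.random.rand(size, size, size)
x, y, z = shell_coordinates(size).T
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x, y, z, c=volume[x, y, z], cmap='Greys')   # color each shell voxel by its value
```

The six faces hold 6·size² rows before deduplication, so for size=512 the coordinate arrays stay in the tens of megabytes instead of growing with size³.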
First, a dense grid of 512x512x512 points is far too much data to plot, not so much for technical reasons as because you won't be able to see anything useful in the resulting plot. You probably need to extract some isosurfaces, look at slices, etc. If most of the points are invisible, that is probably fine, but then you should pass ax.scatter only the nonzero points to make it faster.
That said, here's how you can do it much more quickly. The trick is to eliminate all Python loops, including ones hidden in libraries like itertools.
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
# Make this bigger to generate a dense grid.
N = 8
# Create some random data.
volume = np.random.rand(N, N, N)
# Create the x, y, and z coordinate arrays. We use
# numpy's broadcasting to do all the hard work for us.
# We could shorten this even more by using np.meshgrid.
x = np.arange(volume.shape[0])[:, None, None]
y = np.arange(volume.shape[1])[None, :, None]
z = np.arange(volume.shape[2])[None, None, :]
x, y, z = np.broadcast_arrays(x, y, z)
# Turn the volumetric data into an RGB array that's
# just grayscale. There might be better ways to make
# ax.scatter happy.
c = np.tile(volume.ravel()[:, None], [1, 3])
# Do the plotting in a single call.
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') no longer works on recent Matplotlib
ax.scatter(x.ravel(),
           y.ravel(),
           z.ravel(),
           c=c)
plt.show()
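The comment about np.meshgrid holds with indexing='ij', which makes the first output vary along the first axis, matching the broadcasting trick; the default indexing='xy' swaps the first two axes. A quick check of the equivalence on a deliberately non-cubic shape, so swapped axes would show up:

```python
import numpy as np

volume = np.random.rand(4, 5, 6)    # non-cubic on purpose

# Broadcasting version, as in the answer above.
x_b, y_b, z_b = np.broadcast_arrays(
    np.arange(volume.shape[0])[:, None, None],
    np.arange(volume.shape[1])[None, :, None],
    np.arange(volume.shape[2])[None, None, :],
)

# meshgrid version: indexing='ij' keeps x along axis 0, y along axis 1, z along axis 2.
x_m, y_m, z_m = np.meshgrid(*(np.arange(n) for n in volume.shape), indexing='ij')

same = all(np.array_equal(b, m) for b, m in [(x_b, x_m), (y_b, y_m), (z_b, z_m)])
print(same)  # True
```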
A similar solution can be achieved with product from itertools:
from itertools import product
import numpy as np
from matplotlib import pyplot as plt
N = 8
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(projection="3d")
space = np.array([*product(range(N), range(N), range(N))]) # all possible triplets of numbers from 0 to N-1
volume = np.random.rand(N, N, N) # generate random data
ax.scatter(space[:,0], space[:,1], space[:,2], c=space / N, s=volume.ravel() * 300) # color by position, size by value
plt.show()
