Contour plot with multiple files - python

I have a sequence of data files, each containing two columns of data (x value and z value). I want to assign each file a unique constant y value in a loop and then use the x, y, z values to make a contour plot.
import glob
import matplotlib.pyplot as plt
import numpy as np
files=glob.glob(r'C:\Users\DDT\Desktop\DATA TIANYU\materials\AB2O4\synchronchron\OX1\YbFe1Mn1O4_2cyc_600_meth_ox1-*.xye')
s1=1
for file in files:
    t1=s1/3
    x,z = np.loadtxt(file,skiprows=3,unpack=True, usecols=[0,1])
    def f(x, y):
        return x*0 +y*0 +z
    l1=np.size(x)
    y=np.full(l1, t1,dtype=int)
    X,Y=np.meshgrid(x,y)
    Z = f(X,Y)
    plt.contour(X,Y,Z)
    s1=s1+1
    continue
plt.show()
The code runs without errors, but all I get is an empty figure.
What mistake did I make?

It is hard to guess what you're trying to do, so here is an attempt. It assumes that all x-arrays are equal and that the y values really make sense (which is doubtful if the files are read in an unspecified order). To get a useful plot, collect the data from all the files before starting to plot.
import glob
import matplotlib.pyplot as plt
import numpy as np
files = glob.glob('........')
zs = []
for file in files:
    x, z = np.loadtxt(file, skiprows=3, unpack=True, usecols=[0, 1])
    zs.append(z)
# without creating a new x, the x from the last file will be used
# x = np.linspace(0, 15, 10)
y = np.linspace(-100, 1000, len(zs))
zs = np.array(zs)
fig, axs = plt.subplots(ncols=2)
axs[0].scatter(np.tile(x, y.size), np.repeat(y, x.size), c=zs)
axs[1].contour(x, y, zs)
plt.show()
With simulated random data, the scatter plot and the contour plot would look like:
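As a rough, self-contained sketch of the "collect first, plot once" idea, the snippet below replaces the real files with in-memory stand-ins. The `fake_file` helper, the Gaussian-shaped z values and the four shift values are all made up for illustration; only the `np.loadtxt` call mirrors the code above.

```python
import io
import numpy as np

def fake_file(shift):
    """Hypothetical stand-in for one data file: 3 header lines, then x and z columns."""
    x = np.linspace(0, 15, 10)
    z = np.exp(-(x - shift) ** 2)
    rows = "\n".join(f"{xi} {zi}" for xi, zi in zip(x, z))
    return io.StringIO("header 1\nheader 2\nheader 3\n" + rows)

# Four simulated files instead of glob.glob(...)
files = [fake_file(s) for s in (3.0, 6.0, 9.0, 12.0)]

zs = []
for f in files:
    x, z = np.loadtxt(f, skiprows=3, unpack=True, usecols=[0, 1])
    zs.append(z)

zs = np.array(zs)                    # shape (n_files, n_points)
y = np.arange(1, len(zs) + 1) / 3    # one constant y per file, like s1/3 in the question
```

With these shapes, `plt.contour(x, y, zs)` gets an x of length n_points, a y of length n_files and a 2-D z of shape (n_files, n_points). In the question's loop, `np.full(l1, t1, dtype=int)` truncates `s1/3` to an integer and every row of `Y` holds the same value, so each `contour` call appears to receive a grid with no extent in y, which would explain the empty figure.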

Related

3D plot of Excel data

I'm trying to recreate this plot using some of my own excel data but I've hit a wall. So far I have:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_excel(r'/path/to/data.xlsx')
yr = df['Year']
jd = df['Jday']
dc = df['Discharge']
x = np.asarray(yr)
y = np.asarray(jd)
z = np.asarray(dc)
X,Y,Z = np.meshgrid(x,y,z)
ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(X,Y,Z, cmap='autumn')
ax.set_xlabel("Year")
ax.set_ylabel("Jday")
ax.set_zlabel("Discharge")
plt.show()
But when I run this I get:
Traceback (most recent call last):
  File "/Users/Desktop/main.py", line 19, in <module>
    ax.plot_surface(X,Y,Z, cmap='autumn')
  File "/Users/venv/lib/python3.10/site-packages/matplotlib/_api/deprecation.py", line 412, in wrapper
    return func(*inner_args, **inner_kwargs)
  File "/Users/venv/lib/python3.10/site-packages/mpl_toolkits/mplot3d/axes3d.py", line 1581, in plot_surface
    raise ValueError("Argument Z must be 2-dimensional.")
ValueError: Argument Z must be 2-dimensional.
Any help would be appreciated.
EDIT:
I changed my code to:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_excel(r'/path/to/data.xlsx')
yr = df['Year']
jd = df['Jday']
dc = df['Discharge']
X = np.asarray(yr).reshape(-1,2)
Y = np.asarray(jd).reshape(-1,2)
Z = np.asarray(dc).reshape(-1,2)
fig = plt.figure(figsize=(14,8))
ax = plt.axes(projection='3d')
my_cmap = plt.get_cmap('seismic')
surf = ax.plot_surface(X,Y,Z,
cmap = my_cmap,
edgecolor = 'none')
fig.colorbar(surf, ax=ax,
shrink = 0.5, aspect = 5)
plt.show()
When I run this it produces the following plot:
Which obviously doesn't match the other plot. It seems to be plotting the data from each year in a single line instead of creating filled in polygons which is what I think it's supposed to do. I have a feeling this issue has to do with the .reshape function but I'm not entirely sure.
Note: original answer completely rewritten!
The problem, as the error message states, is that the Z argument must be two-dimensional. You don't need np.meshgrid here at all. It is typically used to build a grid of all possible X/Y combinations, from which you then calculate the response matrix Z. Since all of your data is already read in, you only need to reshape the 1-D arrays into 2-D arrays:
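# X, Y, Z are the 1-D column arrays; this assumes they fill a square grid
target_shape = (int(np.sqrt(X.shape[0])), -1)  # reshape needs integer dimensions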
X = np.reshape(X, target_shape)
Y = np.reshape(Y, target_shape)
Z = np.reshape(Z, target_shape)
Have a look at the documentation of np.reshape for some more information.
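A plain reshape only lines up if the rows are sorted and every year has the same number of days. The sketch below shows one way to get there with NumPy alone, sorting by year and then by day before reshaping. The column values are invented stand-ins for the spreadsheet (3 years of 4 days each), not the asker's data.

```python
import numpy as np

# Hypothetical long-format columns: 3 years x 4 days, deliberately shuffled
rng = np.random.default_rng(0)
years = np.repeat([2000, 2001, 2002], 4)
jdays = np.tile([1, 2, 3, 4], 3)
discharge = (years - 2000) * 10.0 + jdays
order = rng.permutation(years.size)
years, jdays, discharge = years[order], jdays[order], discharge[order]

# np.lexsort sorts by the LAST key first: primary key year, secondary key day
idx = np.lexsort((jdays, years))

n_years = np.unique(years).size
n_days = np.unique(jdays).size

# Each row of the 2-D arrays is one year, each column one day of the year
X = years[idx].reshape(n_years, n_days)
Y = jdays[idx].reshape(n_years, n_days)
Z = discharge[idx].reshape(n_years, n_days)
```

The three arrays now share the shape (n_years, n_days) and can be passed to `ax.plot_surface(X, Y, Z)`. If some years have missing days the reshape will fail, and the data would need to be filled or interpolated first.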

Heatmap from 3D-data, with float-numbers

I am trying to generate a heatmap from 3D data in a csv file. Each line of the csv file has the format x,y,z. The problem is that when I create an array to link the values, I can't use float numbers as indices. When I set the dtype to int in np.loadtxt(), the code works fine, but this cuts the resolution to half of what the csv file provides. Is there another way of linking the values?
The code so far is:
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
fname = 'test18.csv'
x, y, z = np.loadtxt(fname, delimiter=',', dtype=float).T
pltZ = np.zeros((y.max()+1, x.max()+1), dtype=float)
pltZ[y, x] = z
heat_map = sb.heatmap(pltZ, cmap=plt.cm.rainbow)
plt.show()
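One possible approach, sketched here under the assumption that the x and y values lie on a regular grid, is to let `np.unique` translate the float coordinates into integer positions. The five rows below are invented stand-ins for the csv contents.

```python
import numpy as np

# Hypothetical x, y, z rows with half-integer coordinates
data = np.array([
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 2.0],
    [0.0, 0.5, 3.0],
    [0.5, 0.5, 4.0],
    [1.0, 0.5, 5.0],
])
x, y, z = data.T

# return_inverse gives, for every row, the position of its coordinate
# in the sorted array of distinct values: an integer usable as an index
x_vals, x_idx = np.unique(x, return_inverse=True)
y_vals, y_idx = np.unique(y, return_inverse=True)

# Cells without a measurement stay NaN instead of silently becoming 0
grid = np.full((y_vals.size, x_vals.size), np.nan)
grid[y_idx, x_idx] = z
```

`grid` can then be handed to `sb.heatmap(grid, ...)`, with `x_vals` and `y_vals` available as tick labels. Exact float comparison is what makes this work, so coordinates that differ only by rounding noise would need to be rounded first.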

How to use pandas with matplotlib to create 3D plots

I am struggling a bit with the pandas transformations needed to make data render in 3D in matplotlib. The data I have is usually in columns of numbers (usually time and some value). So let's create some test data to illustrate.
import pandas as pd
pattern = ("....1...."
"....1...."
"..11111.."
".1133311."
"111393111"
".1133311."
"..11111.."
"....1...."
"....1....")
# create the data and coords
Zdata = list(map(lambda d:0 if d == '.' else int(d), pattern))
Zinverse = list(map(lambda d:1 if d == '.' else -int(d), pattern))
Xdata = [x for y in range(1,10) for x in range(1,10)]
Ydata = [y for y in range(1,10) for x in range(1,10)]
# pivot the data into columns
data = [d for d in zip(Xdata,Ydata,Zdata,Zinverse)]
# create the data frame
df = pd.DataFrame(data, columns=['X','Y','Z',"Zi"], index=zip(Xdata,Ydata))
df.head(5)
Edit: This block of data is demo data that would normally come from a query on a database that may need more cleaning and transforms before plotting. In this case data is already aligned and there are no problems aside having one more column we don't need (Zi).
So the numbers in pattern are transferred into height data in the Z column of df ('Zi' being the inverse image) and with that as the data frame I've struggled to come up with this pivot method which is 3 separate operations. I wonder if that can be better.
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
Xs = df.pivot(index='X', columns='Y', values='X').values
Ys = df.pivot(index='X', columns='Y', values='Y').values
Zs = df.pivot(index='X', columns='Y', values='Z').values
ax.plot_surface(Xs,Ys,Zs, cmap=cm.RdYlGn)
plt.show()
Although I have something working I feel there must be a better way than what I'm doing. On a big data set I would imagine doing 3 pivots is an expensive way to plot something. Is there a more efficient way to transform this data ?
I guess you can avoid some steps during the preparation of the data by not using pandas (only numpy arrays) and by using some convenience functions provided by numpy, such as linspace and meshgrid.
I rewrote your code to do so, trying to keep the same logic and the same variable names :
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
pattern = ("....1...."
"....1...."
"..11111.."
".1133311."
"111393111"
".1133311."
"..11111.."
"....1...."
"....1....")
# Extract the value according to your logic
Zdata = list(map(lambda d:0 if d == '.' else int(d), pattern))
# Assuming the pattern is always a square
size = int(len(Zdata) ** 0.5)
# Create a mesh grid for plotting the surface
Xdata = np.linspace(1, size, size)
Ydata = np.linspace(1, size, size)
Xs, Ys = np.meshgrid(Xdata, Ydata)
# Convert the Zdata to a numpy array with the appropriate shape
Zs = np.array(Zdata).reshape((size, size))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Plot the surface
ax.plot_surface(Xs, Ys, Zs, cmap=cm.RdYlGn)
plt.show()
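If the data has to stay in a DataFrame, a middle ground might be to pivot only once and rebuild the coordinate grids from the pivot's row and column labels. This is a sketch on a made-up 3x3 frame that reuses the column names X, Y and Z from the question; it is not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 long-format frame with the same column names as in the question
df = pd.DataFrame({
    "X": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Y": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Z": [0, 1, 0, 1, 9, 1, 0, 1, 0],
})

# A single pivot yields the height grid: rows follow X, columns follow Y
grid = df.pivot(index="X", columns="Y", values="Z")
Zs = grid.to_numpy()

# The row and column labels already hold the coordinates, so meshgrid
# can rebuild Xs and Ys without two further pivots
Ys, Xs = np.meshgrid(grid.columns.to_numpy(), grid.index.to_numpy())
```

`Xs`, `Ys` and `Zs` have the same layout as the three pivots in the question and can go straight into `ax.plot_surface(Xs, Ys, Zs, cmap=cm.RdYlGn)`.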

How to adjust branch lengths of dendrogram in matplotlib (like in astrodendro)? [Python]

Here is my resulting plot below but I would like it to look like the truncated dendrograms in astrodendro such as this:
There is also a really cool looking dendrogram from this paper that I would like to recreate in matplotlib.
Below is the code for generating an iris data set with noise variables and plotting the dendrogram in matplotlib.
Does anyone know how to either: (1) truncate the branches like in the example figures; and/or (2) to use astrodendro with a custom linkage matrix and labels?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
import astrodendro
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial import distance
def iris_data(noise=None, palette="hls", desat=1):
    # Iris dataset
    X = pd.DataFrame(load_iris().data,
                     index = [*map(lambda x:f"iris_{x}", range(150))],
                     columns = [*map(lambda x: x.split(" (cm)")[0].replace(" ","_"), load_iris().feature_names)])
    y = pd.Series(load_iris().target,
                  index = X.index,
                  name = "Species")
    # map_colors is the asker's own helper; its definition is not included in the post
    c = map_colors(y, mode=1, palette=palette, desat=desat)#y.map(lambda x:{0:"red",1:"green",2:"blue"}[x])
    if noise is not None:
        X_noise = pd.DataFrame(
            np.random.RandomState(0).normal(size=(X.shape[0], noise)),
            index=X.index,
            columns=[*map(lambda x:f"noise_{x}", range(noise))]
        )
        X = pd.concat([X, X_noise], axis=1)
    return (X, y, c)
def dism2linkage(DF_dism, method="ward"):
    """
    Input: A (m x m) dissimilarity Pandas DataFrame object where the diagonal is 0
    Output: Hierarchical clustering encoded as a linkage matrix
    Further reading:
    http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.linkage.html
    https://pypi.python.org/pypi/fastcluster
    """
    #Linkage Matrix
    Ar_dist = distance.squareform(DF_dism.to_numpy())
    return linkage(Ar_dist,method=method)
# Get data
X_iris_with_noise, y_iris, c_iris = iris_data(50)
# Get distance matrix
df_dism = 1- X_iris_with_noise.corr().abs()
# Get linkage matrix
Z = dism2linkage(df_dism)
#Create dendrogram
with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots(figsize=(13,3))
    D_dendro = dendrogram(
        Z,
        labels=df_dism.index,
        color_threshold=3.5,
        count_sort = "ascending",
        #link_color_func=lambda k: colors[k]
        ax=ax
    )
    ax.set_ylabel("Distance")
I'm not sure this really constitutes a practical answer, but it does allow you to generate dendrograms with truncated hanging lines. The trick is to generate the plot as normal, then manipulate the resulting matplotlib plot to recreate the lines.
I couldn't get your example to work locally, so I've just created a dummy dataset.
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
a = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
b = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
X = np.concatenate((a, b),)
Z = linkage(X, 'ward')
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
dendrogram(Z, ax=ax)
The resulting plot is the usual long-arm dendrogram.
Now for the more interesting bit. A dendrogram is made up of a number of LineCollection objects (one for each colour). To update the lines we iterate through these, extracting the details about their constituent paths, modifying these to remove any lines reaching to a y of zero, and then recreating a LineCollection for these modified paths.
The updated path is then added to the axes, and the original is removed.
The one tricky part is determining what height to draw to instead of zero. Since we are iterating over each dendrogram path in isolation, we don't know which point came before, so we basically have no idea where we are in the tree. However, we can exploit the fact that hanging lines hang vertically. Assuming there are no other lines on the same x, we can look up the other known y values for a given x and use them as the basis for the new y. The downside is that, to make sure we have this number, we have to pre-scan the data.
Note: If you can get dendrogram hanging lines on the same x, you would need to include the y and search for nearest y above this x to do this.
import numpy as np
from matplotlib.path import Path
from matplotlib.collections import LineCollection
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
dendrogram(Z, ax=ax);
for c in ax.collections[:]: # use [:] to get a copy, since we're adding to the same list
    paths = []
    for path in c.get_paths():
        segments = []
        y_at_x = {}
        # Pre-pass over all elements, to find the lowest y value at each x value.
        # We can use this to calculate where to cut our lines.
        for n, seg in enumerate(path.iter_segments()):
            x, y = seg[0]
            # Don't store if the y is zero, or if it's higher than the current low.
            if y > 0 and y < y_at_x.get(x, np.inf):
                y_at_x[x] = y
        for n, seg in enumerate(path.iter_segments()):
            x, y = seg[0]
            if y == 0:
                # If we know the last y at this x, use it - 0.5, limit > 0
                y = max(0, y_at_x.get(x, 0) - 0.5)
            segments.append([x,y])
        paths.append(segments)
    lc = LineCollection(paths, colors=c.get_colors()) # Recreate a LineCollection with the same params
    ax.add_collection(lc)
    c.remove() # Remove the original LineCollection (ax.collections.remove(c) no longer works in recent matplotlib)
The resulting dendrogram looks like this:
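A different route, not taken in the answer above, would be to skip the post-processing of matplotlib artists and work on the link coordinates directly. SciPy's `dendrogram` returns a dict whose "icoord" and "dcoord" entries hold four x and four y values per U-shaped link, and it accepts `no_plot=True` to compute them without drawing. The sketch below only covers the truncation step, on hand-written y values; `truncate_legs` and the `stub` parameter are hypothetical names.

```python
def truncate_legs(dcoord, stub=0.5):
    """Shorten legs that reach y == 0 so they hang only `stub` below their link.

    Each entry of dcoord is [y_left, y_top, y_top, y_right] for one U-shaped link.
    """
    shortened = []
    for y_left, y_top_left, y_top_right, y_right in dcoord:
        if y_left == 0:
            y_left = max(0.0, y_top_left - stub)
        if y_right == 0:
            y_right = max(0.0, y_top_right - stub)
        shortened.append([y_left, y_top_left, y_top_right, y_right])
    return shortened

# Hypothetical y coordinates for three leaves: two leaves merge at height 1.0,
# then that cluster merges with the third leaf at height 3.0
dcoord = [[0.0, 1.0, 1.0, 0.0], [1.0, 3.0, 3.0, 0.0]]
short = truncate_legs(dcoord)
```

With real data, each pair of entries from "icoord" and the shortened "dcoord" could be drawn with `ax.plot(xs, ys)`. Tick labels and colours would then have to be set by hand, which is the price of not letting `dendrogram` draw.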

Contourf plotting from spreadsheet columns in python

I want to plot a coloured contour graph with x,y,z from 3 columns of a comma delimited text file, but each time I try the code below, I get ValueError: too many values to unpack (expected 3) error. I would be grateful if that could be resolved.
I would also like to know if there is another (probably better) code for plotting the 3 independent columns.
This is the code:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
import scipy.interpolate
N = 100000
long_col, lat_col, Bouguer_col = np.genfromtxt(r'data.txt', unpack=True)
xi = np.linspace(long_col.min(), long_col.max(), N)
yi = np.linspace(lat_col.min(), lat_col.max(), N)
zi = scipy.interpolate.griddata((long_col, lat_col), Bouguer_col, (xi[None,:], yi[:,None]), method='cubic')
fig = plt.figure()
plt.contourf(xi, yi, zi)
plt.xlabel("Long")
plt.ylabel("Lat")
plt.show()
This is the 'data.txt' sample data.
Lat, Long, Elev, ObsGrav, Anomalies
6.671482000000001022e+00,7.372505999999999560e+00,3.612977999999999952e+02,9.780274000000000233e+05,-1.484474523360840976e+02
6.093078000000000216e+00,7.480882000000001142e+00,1.599972999999999956e+02,9.780334000000000233e+05,-1.492942383352201432e+02
6.092045999999999850e+00,7.278669999999999973e+00,1.462445999999999913e+02,9.780663000000000466e+05,-1.190960417173337191e+02
6.402087429999999912e+00,7.393360939999999992e+00,5.237939999999999827e+02,9.780468000000000466e+05,-8.033459449396468699e+01
6.264082730000000154e+00,7.518244540000000420e+00,2.990849999999999795e+02,9.780529000000000233e+05,-1.114865156192099676e+02
6.092975000000000030e+00,7.482914000000000065e+00,1.416474000000000046e+02,9.780338000000000466e+05,-1.525697779102483764e+02
6.383570999999999884e+00,7.289616999999999791e+00,2.590403000000000020e+02,9.780963000000000466e+05,-8.300666170357726514e+01
6.318417000000000172e+00,7.557638000000000744e+00,1.672036999999999978e+02,9.780693000000000466e+05,-1.246774551668204367e+02
6.253779999999999895e+00,7.268805999999999656e+00,1.059429999999999978e+02,9.781026999999999534e+05,-9.986763240839354694e+01
6.384635000000000282e+00,7.291032000000000401e+00,2.615624000000000251e+02,9.780963000000000466e+05,-8.256190758384764194e+01
If the data file looks exactly like the one in the question, you first of all have 5 columns, which you cannot unpack into 3 variables.
Next, you have a header line which you do not want to be part of the data. Also, the header line is separated by ,<space>, while the data is separated by ,.
So in total you need
import numpy as np
a,b,c,d,e = np.genfromtxt("data.txt", unpack=True, delimiter=",", skip_header=1)
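If only three of the five columns are needed, `usecols` can pick them while reading. The sketch below does this on the first three sample rows, held in memory so that it runs without the file. Treating the Anomalies column as the `Bouguer_col` of the question is an assumption.

```python
import io
import numpy as np

# First three rows of the sample data, in memory for illustration
sample = io.StringIO(
    "Lat, Long, Elev, ObsGrav, Anomalies\n"
    "6.671482000000001022e+00,7.372505999999999560e+00,3.612977999999999952e+02,"
    "9.780274000000000233e+05,-1.484474523360840976e+02\n"
    "6.093078000000000216e+00,7.480882000000001142e+00,1.599972999999999956e+02,"
    "9.780334000000000233e+05,-1.492942383352201432e+02\n"
    "6.092045999999999850e+00,7.278669999999999973e+00,1.462445999999999913e+02,"
    "9.780663000000000466e+05,-1.190960417173337191e+02\n"
)

# Columns 0, 1 and 4 are Lat, Long and Anomalies; the header line is skipped
lat, lon, anomaly = np.genfromtxt(
    sample, delimiter=",", skip_header=1, usecols=(0, 1, 4), unpack=True
)
```

The three arrays can then be fed to `scipy.interpolate.griddata` as in the question. It is probably worth lowering `N` a lot: with `N = 100000` the target grid has 10^10 points, which is about 80 GB of float64 values.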
