How to make hexbin plots from a data file using seaborn? - python

I'm pretty new to using matplotlib and seaborn, and I couldn't really find any "for dummies" guides on how to do this. I keep getting error messages trying to use code from the guides I can find. I guess I'm having difficulty taking their pieces of code and knowing how to apply it to my problem.
I'd like to make a plot like the ones here: 1 and 2. I have a data file with two columns of data ranging from -180 to 180.
This is my attempt at the code:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import kendalltau
sns.set(style="ticks")
f2 = open("dihs23")
lines = f2.readlines()
f2.close()
x = []
y = []
for line in lines:
    p = line.split()
    x.append(float(p[0]))
    y.append(float(p[1]))
sns.jointplot(x, y, kind="hex", stat_func=kendalltau, color="#4CB391")
sns.plt.show()
Which returns the error
Traceback (most recent call last):
File "heatmap.py", line 30, in <module>
sns.jointplot(x, y, kind="hex", stat_func=kendalltau, color="#4CB391")
File "/usr/local/lib/python2.7/dist-packages/seaborn/distributions.py", line 973, in jointplot
xlim=xlim, ylim=ylim)
File "/usr/local/lib/python2.7/dist-packages/seaborn/axisgrid.py", line 1133, in __init__
x = x[not_na]
TypeError: only integer arrays with one element can be converted to an index
I'm guessing there's some aspect to the format of the data that is part of the problem, but I'm not sure how to fix it.
Thank you for the help!

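Try converting your lists to NumPy arrays and passing those to jointplot instead. The traceback shows seaborn indexing the input with a mask (x = x[not_na]), which works on arrays but not on plain Python lists: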
x_axis = np.asarray(x)
y_axis = np.asarray(y)
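As a rough sketch of why the conversion helps, the snippet below loads two whitespace-separated columns directly into arrays and compares mask indexing on an array and on a list. The sample text is a made-up stand-in for the dihs23 file, and np.loadtxt is only one possible way to read it, not something the original answer prescribes.

```python
import io
import numpy as np

# Hypothetical stand-in for the two-column file "dihs23" from the question.
sample = io.StringIO("-170.5 12.0\n35.2 -90.1\n179.9 0.3\n")

# loadtxt parses whitespace-separated columns; unpack=True returns one array per column.
x, y = np.loadtxt(sample, unpack=True)

# A boolean mask can index an ndarray ...
mask = np.array([True, False, True])
kept = x[mask]

# ... but indexing a plain list with the same mask raises TypeError,
# the same kind of failure as x[not_na] in the traceback.
try:
    x.tolist()[mask]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

print(kept, list_indexing_failed)
```

With real data, the arrays x and y would then be handed to sns.jointplot in place of the lists.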

Related

3D plot of Excel data

I'm trying to recreate this plot using some of my own excel data but I've hit a wall. So far I have:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_excel(r'/path/to/data.xlsx')
yr = df['Year']
jd = df['Jday']
dc = df['Discharge']
x = np.asarray(yr)
y = np.asarray(jd)
z = np.asarray(dc)
X,Y,Z = np.meshgrid(x,y,z)
ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(X,Y,Z, cmap='autumn')
ax.set_xlabel("Year")
ax.set_ylabel("Jday")
ax.set_zlabel("Discharge")
plt.show()
But when I run this I get:
Traceback (most recent call last):
File "/Users/Desktop/main.py", line 19, in <module>
ax.plot_surface(X,Y,Z, cmap='autumn')
File "/Users/venv/lib/python3.10/site-packages/matplotlib/_api/deprecation.py", line 412, in wrapper
return func(*inner_args, **inner_kwargs)
File "/Users/venv/lib/python3.10/site-packages/mpl_toolkits/mplot3d/axes3d.py", line 1581, in plot_surface
raise ValueError("Argument Z must be 2-dimensional.")
ValueError: Argument Z must be 2-dimensional.
Any help would be appreciated.
EDIT:
I changed my code to:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_excel(r'/path/to/data.xlsx')
yr = df['Year']
jd = df['Jday']
dc = df['Discharge']
X = np.asarray(yr).reshape(-1,2)
Y = np.asarray(jd).reshape(-1,2)
Z = np.asarray(dc).reshape(-1,2)
fig = plt.figure(figsize=(14,8))
ax = plt.axes(projection='3d')
my_cmap = plt.get_cmap('seismic')
surf = ax.plot_surface(X,Y,Z,
                       cmap = my_cmap,
                       edgecolor = 'none')
fig.colorbar(surf, ax=ax,
             shrink = 0.5, aspect = 5)
plt.show()
When I run this it produces the following plot:
Which obviously doesn't match the other plot. It seems to be plotting the data from each year in a single line instead of creating filled in polygons which is what I think it's supposed to do. I have a feeling this issue has to do with the .reshape function but I'm not entirely sure.
Note: original answer completely rewritten!
The problem is, as the error message states, that the Z argument must be two-dimensional. You don't need np.meshgrid here at all. It is typically used to build a grid of all possible X/Y combinations, from which you then calculate a response matrix Z. Since all your data is already read in, you only need to reshape the 1D arrays into 2D arrays. Note that np.sqrt returns a float, so it has to be cast to int, and that this choice of shape only suits data lying on a square grid:
target_shape = (int(np.sqrt(X.shape[0])), -1)
X = np.reshape(X, target_shape)
Y = np.reshape(Y, target_shape)
Z = np.reshape(Z, target_shape)
Have a look at the documentation of np.reshape for some more information.
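If the data is not square, the reshape can use the real grid dimensions instead. The following is a minimal sketch under the assumption that the spreadsheet holds one row per (Year, Jday) pair, sorted by year and then by day, with every year covering the same days; the values are invented for illustration.

```python
import numpy as np

# Hypothetical long-form data: 3 years x 4 days, one row per (year, day) pair,
# sorted by year and then by day (an assumption about the spreadsheet layout).
years = np.repeat([2001, 2002, 2003], 4)
days = np.tile([1, 2, 3, 4], 3)
discharge = np.arange(12, dtype=float)

n_years = np.unique(years).size
n_days = np.unique(days).size

# Each row of the 2D arrays is one year, each column one day.
X = years.reshape(n_years, n_days)
Y = days.reshape(n_years, n_days)
Z = discharge.reshape(n_years, n_days)

print(X.shape, Y.shape, Z.shape)
```

Arrays shaped like this could then be passed to ax.plot_surface(X, Y, Z) as in the question.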

Iterating over imshow and plotting multiple 2D maps with common colorbar using matplotlib

I have 4 input data files. I am trying to plot 2D contour maps with a common colorbar by reading data from these input files. I have taken inspiration from the following answers:
1) How can I create a standard colorbar for a series of plots in python
2) Matplotlib 2 Subplots, 1 Colorbar
Code :
import numpy as np
import matplotlib.pyplot as plt
#Reading data from input files
dat1 = np.genfromtxt('curmapdown.dat', delimiter=' ')
dat2 = np.genfromtxt('curmapup.dat', delimiter=' ')
dat3 = np.genfromtxt('../../../zika/zika1/CalculateCurvature/curmapdown.dat', delimiter=' ')
dat4 = np.genfromtxt('../../../zika/zika1/CalculateCurvature/curmapup.dat', delimiter=' ')
data=[]
for i in range(1,5):
    data.append('dat%d'%i)
fig, axes = plt.subplots(nrows=2, ncols=2)
# Error comes from this part
for ax,dat in zip(axes.flat,data):
    im = ax.imshow(dat, vmin=0, vmax=1)
fig.colorbar(im, ax=axes.ravel().tolist())
plt.show()
Error :
Traceback (most recent call last):
File "2dmap.py", line 15, in <module>
im = ax.imshow(dat, vmin=0, vmax=1)
File "/usr/lib/python2.7/dist-packages/matplotlib/__init__.py", line 1814, in inner
return func(ax, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/matplotlib/axes/_axes.py", line 4947, in imshow
im.set_data(X)
File "/usr/lib/python2.7/dist-packages/matplotlib/image.py", line 449, in set_data
raise TypeError("Image data can not convert to float")
TypeError: Image data can not convert to float
You are appending the strings "dat1", "dat2", and so on to the list data, not the arrays themselves. When plotting, imshow tries to convert such a string to float, which fails. For example,
plt.imshow("hello")
will recreate the error you are seeing.
You want the data itself, which you have loaded into the variables dat1, dat2, etc. So remove the first for loop and do something like
data = [dat1, dat2, dat3, dat4]
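Putting that together, a possible sketch looks like the following. The random arrays are hypothetical stand-ins for the four files, the helper name plot_all is made up for this example, and taking vmin/vmax from the data is just one option (the question fixes them at 0 and 1).

```python
import numpy as np

# Hypothetical stand-ins for the four arrays loaded with np.genfromtxt.
rng = np.random.default_rng(0)
dat1, dat2, dat3, dat4 = (rng.random((5, 5)) for _ in range(4))

# Collect the arrays themselves, not their names as strings.
data = [dat1, dat2, dat3, dat4]

# A shared colour scale needs one vmin/vmax that covers every panel.
vmin = min(d.min() for d in data)
vmax = max(d.max() for d in data)


def plot_all(arrays, vmin, vmax):
    """Draw each array in a 2x2 grid with one common colorbar."""
    import matplotlib.pyplot as plt  # imported lazily; only needed when plotting

    fig, axes = plt.subplots(nrows=2, ncols=2)
    for ax, arr in zip(axes.flat, arrays):
        im = ax.imshow(arr, vmin=vmin, vmax=vmax)
    fig.colorbar(im, ax=axes.ravel().tolist())
    return fig
```

Calling plot_all(data, vmin, vmax) followed by plt.show() would display the four maps with the shared colorbar.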

How do I draw a "K" (candlestick) plot with matplotlib.finance for data in the format below?

The code is below:
import tushare as ts
import matplotlib.pyplot as plt
from matplotlib.finance import candlestick_ohlc as candle
stock=ts.get_hist_data('000581',ktype='w')
The data in "stock" has the following form: [image: contents of the stock DataFrame]
Then I run the code below:
vals=stock.iloc[:,0:4]
fig=plt.figure()
ax=fig.add_subplot(111)
candle(ax,vals)
I get the error below:
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/anaconda3/lib/python3.5/site-packages/matplotlib/finance.py", line 735, in candlestick_ohlc
alpha=alpha, ochl=False)
File "/usr/local/anaconda3/lib/python3.5/site-packages/matplotlib/finance.py", line 783, in _candlestick
t, open, high, low, close = q[:5]
ValueError: not enough values to unpack (expected 5, got 4)
How can I resolve it?
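candlestick needs the data in a very specific format and order. Each entry must be a sequence that starts with the time value (the traceback shows it unpacking t, open, high, low, close), and if you use _ohlc, the values after the time must be in open-high-low-close order. Assuming datep, openp, highp and so on are sequences holding each column, the array for the candlestick graph can be prepared as follows:
candleArray = []
i = 0
while i < len(datep):
    newLine = datep[i], openp[i], highp[i], lowp[i], closep[i], volumep[i], pricechange[i], pchange[i]
    candleArray.append(newLine)
    i += 1
Then, you can call candlestick with the array candleArray.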
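The same row layout can also be built with zip, shown here as a small sketch. All numbers are invented, and the dates are assumed to be floats already (for example produced with matplotlib.dates.date2num); only the five values that the traceback shows being unpacked are included.

```python
# Hypothetical column data; the dates are assumed to be already converted to
# floats (for example with matplotlib.dates.date2num).
datep = [736330.0, 736337.0, 736344.0]
openp = [10.0, 10.5, 10.2]
highp = [10.8, 10.9, 10.6]
lowp = [9.9, 10.1, 10.0]
closep = [10.5, 10.2, 10.4]

# zip pairs up the i-th entry of every column, giving one
# (time, open, high, low, close) tuple per candle.
candle_rows = list(zip(datep, openp, highp, lowp, closep))

for row in candle_rows:
    t, o, h, l, c = row  # the same five-value unpacking the traceback shows
    # Sanity check: the low and high must bracket the open and close.
    assert l <= min(o, c) and h >= max(o, c)

print(candle_rows[0])
```

A list shaped like candle_rows is what would be passed as the second argument to candlestick_ohlc.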

How do I generate a histogram from a list of values using matplotlib?

I've been trying to plot a histogram in Python with matplotlib.
I've got two datasets, the heights of a sample of men and of women, each as a Python list imported from a CSV file.
The code that I'm using:
import csv
import numpy as np
from matplotlib import pyplot as plt
men=[]
women=[]
with open('women.csv','r') as f:
    r1=csv.reader(f, delimiter=',')
    for row in r1:
        women+=[row[0]]
with open('men.csv','r') as f:
    r2=csv.reader(f, delimiter=',')
    for row in r2:
        men+=[row[0]]
fig = plt.figure()
ax = fig.add_subplot(111)
numBins = 20
ax.hist(men,numBins,color='blue',alpha=0.8)
ax.hist(women,numBins,color='red',alpha=0.8)
plt.show()
and the error that I get:
Traceback (most recent call last):
File "//MEME/Users/Meme/Miniconda3/Lib/idlelib/test.py", line 22, in <module>
ax.hist(men,numBins,color='blue',alpha=0.8)
File "\\MEME\Users\Meme\Miniconda3\lib\site-packages\matplotlib\__init__.py", line 1811, in inner
return func(ax, *args, **kwargs)
File "\\MEME\Users\Meme\Miniconda3\lib\site-packages\matplotlib\axes\_axes.py", line 5983, in hist
raise ValueError("color kwarg must have one color per dataset")
ValueError: color kwarg must have one color per dataset
NOTE: this assumes your files contain multiple comma-separated lines and that the first entry in each line is the height.
The bug is in how you append data to the women and men lists: row[0] is a string, not a number, which confuses matplotlib. I suggest you run this code before plotting to see what the lists contain:
import csv
import numpy as np
from matplotlib import pyplot as plt
men=[]
women=[]
with open('women.csv','r') as f:
    r1=csv.reader(f, delimiter=',')
    for row in r1:
        women+=[(row[0])]
with open('men.csv','r') as f:
    r2=csv.reader(f, delimiter=',')
    for row in r2:
        men+=[(row[0])]
fig = plt.figure()
ax = fig.add_subplot(111)
# Inspect the lists: every entry is still a string
print(men)
print(women)
#numBins = 20
#ax.hist(men,numBins,color='blue',alpha=0.8)
#ax.hist(women,numBins,color='red',alpha=0.8)
#plt.show()
A sample output will be
['1','3','3']
['2','3','1']
So in the loops, convert each string to a float or an integer, e.g. women += [float(row[0])] and men += [float(row[0])].
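A compact way to do that conversion while reading is sketched below. The helper name read_heights and the in-memory sample text are made up for illustration; with real files, the handle would come from open('women.csv', newline='').

```python
import csv
import io


def read_heights(handle):
    """Return the first column of a CSV file as a list of floats."""
    # Skip blank lines, which csv.reader yields as empty rows.
    return [float(row[0]) for row in csv.reader(handle) if row]


# Hypothetical file contents standing in for women.csv and men.csv.
women = read_heights(io.StringIO("162.5,a\n158.0,b\n170.2,c\n"))
men = read_heights(io.StringIO("175.0\n181.3\n168.9\n"))

print(women, men)
```

Lists of floats like these can be passed to ax.hist without the colour error, since matplotlib then sees each list as a single dataset.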

Plotting a contour plot in Python with Matplotlib with data with imshow

I am trying to plot a contour plot with Python's Matplotlib package, and I'm trying to get results similar to those in this other Stack Overflow post. However, I get TypeError: Invalid dimensions for image data, as shown in the full traceback below.
Traceback (most recent call last):
File "./plot_3.py", line 68, in <module>
plt.imshow(zi, vmin=temp.min(), vmax=temp.max(), origin="lower", extent=[x.min(), x.max(), y.min(), y.max()])
File "/usr/lib64/python2.7/site-packages/matplotlib/pyplot.py", line 3022, in imshow
**kwargs)
File "/usr/lib64/python2.7/site-packages/matplotlib/__init__.py", line 1812, in inner
return func(ax, *args, **kwargs)
File "/usr/lib64/python2.7/site-packages/matplotlib/axes/_axes.py", line 4947, in imshow
im.set_data(X)
File "/usr/lib64/python2.7/site-packages/matplotlib/image.py", line 453, in set_data
raise TypeError("Invalid dimensions for image data")
TypeError: Invalid dimensions for image data
I'm unsure what this means, as googling brought up no useful results on how to fix it. The code is listed below, and the data that I'm using can be found here. The code parses the file and returns the data to the main script, which is then supposed to plot it. To run it with the specific file that I posted above, use ./plot_3.py 20.0. x ranges from 0 to 0.3 with 61 grid points, while y ranges from 0 to 0.4 with 81 grid points. The data is in the format x,y,temperature, where I want the temperature values to be the contour values.
from __future__ import print_function, division
import math
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import sys
import matplotlib.cm as cm
from matplotlib.mlab import griddata
import scipy.interpolate
def ParseFile(filename):
    x = []
    y = []
    temp = []
    infile = open(filename, 'r')
    lines = [line.strip() for line in infile.readlines()]
    for line in lines:
        x.append(float(line.split(',')[0]))
        y.append(float(line.split(',')[1]))
        temp.append(float(line.split(',')[2]))
    return np.array(x), np.array(y), np.array(temp)
time = str(sys.argv[1])
filename = time + "_sec.dat"
x,y,temp = ParseFile(filename)
xi = np.linspace(min(x), max(x))
yi = np.linspace(min(y), max(y))
zi = scipy.interpolate.griddata((x,y),temp,(xi,yi),method="linear")
matplotlib.rcParams['xtick.direction'] = 'out'
matplotlib.rcParams['ytick.direction'] = 'out'
plt.imshow(zi, vmin=temp.min(), vmax=temp.max(), origin="lower",
           extent=[x.min(), x.max(), y.min(), y.max()])
plt.colorbar()
plt.show()
I think the problem is that the interpolate.griddata function needs the points to be interpolated in a gridded format, not as two 1D arrays. With two 1D arrays, zi comes back one-dimensional, which imshow cannot display as an image.
Adding a meshgrid line after the (xi, yi) declaration should, I think, fix your problem:
x,y,temp = ParseFile(filename)
xi = np.linspace(min(x), max(x))
yi = np.linspace(min(y), max(y))
#create the 2D grid for interpolation:
xi, yi = np.meshgrid(xi,yi)
zi = scipy.interpolate.griddata((x,y),temp,(xi,yi),method="linear")
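To see what the meshgrid call changes, the sketch below compares the shapes before and after it, using NumPy only. The axis ranges are taken from the question, and the names diagonal_points, Xi and Yi are chosen just for this illustration.

```python
import numpy as np

# Axis ranges matching the question: x in [0, 0.3], y in [0, 0.4].
# linspace defaults to 50 samples, as in the original code.
xi = np.linspace(0.0, 0.3)
yi = np.linspace(0.0, 0.4)

# Paired element by element, the two 1D arrays describe only 50 points
# along a diagonal line.
diagonal_points = np.column_stack([xi, yi])

# meshgrid expands them into every (x, y) combination on a 50 x 50 grid.
Xi, Yi = np.meshgrid(xi, yi)

print(diagonal_points.shape, Xi.shape, Yi.shape)
```

Passing (Xi, Yi) as the query points gives griddata a 2D layout to fill, so the interpolated result has the two dimensions that imshow expects.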
