Histograms in Python using matplotlib - python

I'm trying to make a histogram. I've searched for the right code, but nothing I've tried works. This is my code right now:
import matplotlib.pyplot as plt
import numpy as np
with open('gaubg.csv') as f:
    v = np.loadtxt(f, delimiter= ',', dtype="float", skiprows=1, usecols='None')
plt.hist(v, bins=100)
plt.xlabel("G-r0")
plt.ylabel('# of stars')
plt.title("Bottom half g-r0")
plt.show()
gaubg.csv is a CSV file containing about 600,000 data points (floats, not ints) related to the color of stars. Every time I run the script, this error message shows up:
Traceback (most recent call last):
File "gaub.py", line 5, in <module>
v = np.loadtxt(f, delimiter= ',', dtype="float", skiprows=1, usecols='None')
File "/sdss/ups/prd/numpy/v1_6_1/Linux/lib/python2.7/sitepackages/numpy/lib/npyio.py", line 794, in loadtxt
vals = [vals[i] for i in usecols]
TypeError: list indices must be integers, not str
I have no idea what that means. I've been trying to fix the code but I'm not sure how. If you could point out the obvious error(s) I'd be grateful!

usecols= 'None'
should be
usecols= None
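Alternatively, omit the usecols argument altogether, since None is its default. Because you passed the string 'None', numpy iterated over its characters ('N', 'o', 'n', 'e') and tried to use each one as a column index, which is why the error says list indices must be integers, not str.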
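As a minimal sketch of the fix, the snippet below reads a small in-memory stand-in for gaubg.csv. The real file's layout is not shown in the question, so the header names and values here are assumptions. It shows both leaving usecols out and selecting a column by integer index.

```python
import io

import numpy as np

# Hypothetical stand-in for gaubg.csv: one header row, one value per line.
one_column = "g_r0\n0.41\n0.38\n0.52\n"

# With usecols omitted (it defaults to None), every column is read.
v = np.loadtxt(io.StringIO(one_column), delimiter=",", skiprows=1)

# If only some columns are wanted, pass integer indices, never a string.
two_columns = "id,g_r0\n1.0,0.41\n2.0,0.38\n"
second = np.loadtxt(io.StringIO(two_columns), delimiter=",",
                    skiprows=1, usecols=(1,))

print(v.shape)
print(second)
```

The resulting array v can then be passed to plt.hist(v, bins=100) as in the question.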

Related

matplotlib fill_between raises errors of "ufunc 'isfinite' not supported for the input types..."?

I want to fill the gap between two lines in a plot using matplotlib, but it always raises the error shown below. I found a similar question on the forum, but its solutions do not solve my problem.
import numpy as np
import pandas as pd
import openpyxl, os
import matplotlib.pyplot as plt
#read and convert data from an excel file to two dimensional numpy.array
df = pd.read_excel("Water.xlsx").values
#x axis is year, from 1900 to 2017
year = np.arange(1900,2018,1)
#read data from df and convert them to one-dimensional numpy.array
W_low = df[:,[29]].ravel()
W_high = df[:,[30]].ravel()
#plot the two lines and fill in the gap
fig, ax1 = plt.subplots(1, 1, sharex=True)
#no errors are raised until this line
ax1.fill_between (year, W_low, W_high)
Error message
File "C:\Users\Christina\.spyder-py3\temp.py", line 24, in <module>
ax1.fill_between (year, W_low, W_high)
File "C:\Users\Christina\anaconda3\lib\site-packages\matplotlib\__init__.py", line 1599, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\Christina\anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 5233, in fill_between
y1 = ma.masked_invalid(self.convert_yunits(y1))
File "C:\Users\Christina\anaconda3\lib\site-packages\numpy\ma\core.py", line 2377, in masked_invalid
condition = ~(np.isfinite(a))
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I have solved the problem myself. It seems the dataset I used previously was not clean enough. I copy-pasted the whole dataset into a new Excel sheet and ran the same script, and it then worked perfectly.
Hope my solution helps others who encounter the same problem.
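A common cause of this TypeError is a column that pandas reads as dtype object, for example because a few cells hold text or blanks; np.isfinite does not accept object arrays. The sketch below assumes that was the situation in Water.xlsx, which the post does not confirm, and the sample values are made up. It coerces such a column to floats instead of rebuilding the spreadsheet by hand.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for one spreadsheet column containing a stray text cell.
raw = np.array([1.0, 2.5, "n/a", 4.0], dtype=object)

try:
    np.isfinite(raw)
except TypeError as exc:
    # Same failure mode as in the fill_between traceback.
    print("object dtype fails:", type(exc).__name__)

# errors="coerce" turns anything that is not a number into NaN.
clean = pd.to_numeric(pd.Series(raw), errors="coerce").to_numpy(dtype=float)

print(clean.dtype)
print(np.isfinite(clean))
```

Checking W_low.dtype and W_high.dtype before plotting is a quick way to see whether this applies: a float dtype is fine, while object points to unclean cells.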

Pcolormesh or column_stack function has memory error when I try creating images in a loop even after closing figures

I am quite new to Python and am trying to plot data with pcolormesh. I have to create many images in a loop from quite large arrays. The code that generates the data works fine and creates the arrays; the problem arises when I try to plot. The first iteration creates the image fine, but the second raises a memory error unlike the ones I've seen before. It comes from the column_stack function, which apparently multiplies my x and y arrays together, and I'm not sure why it would do that. I've tried closing the figure and deleting all variables before the next iteration, but the problem is still there. Here's an example of the function I call within the loop to plot the data:
import numpy as np
import matplotlib.pyplot as plt
import pathlib
import sys
import os
from scipy import stats
def dataplot(x, y, time, outputfolder, filenumber, gamma, colormap):
    clrmap = plt.get_cmap(colormap)
    xum = x * 1e6 # convert to microns
    yum = y * 1e6
    fig = plt.figure()
    cax = plt.pcolormesh(xum, yum, gamma.T, cmap=clrmap)
    cbar = plt.colorbar(cax, orientation='vertical')
    # raw strings keep the LaTeX backslashes from being read as escapes
    cbar.set_label(r'$\gamma$')
    plt.xlabel(r'x/$\mu$m')
    plt.ylabel(r'y/$\mu$m')
    plt.title('gamma ' + time + 'ps')
    plt.tight_layout()
    plt.savefig(outputfolder + filenumber + 'GammaGrid_' +'.png', dpi=300)
    # delete everything after:
    plt.clf()
    plt.close(fig)
    del x
    del xum
    del y
    del yum
    del gamma
Where:
>>> gamma.shape
(26000,3000)
>>> xum.shape
(26000,)
>>> yum.shape
(3000,)
And the following error arises on the second loop iteration:
Traceback (most recent call last):
File "gamma_grid.py", line 245, in <module>
dataplot(x,y,time,outputfolder,sdffilenumber, gamma_grid, colormap, stat)
File "gamma_grid.py", line 153, in dataplot
cax = plt.pcolormesh(xum, yum, gamma.T, cmap=clrmap)
File "/users/anaconda3/lib/python3.7/site-packages/matplotlib/pyplot.py", line 2758, in pcolormesh
**({"data": data} if data is not None else {}), **kwargs)
File "/users/anaconda3/lib/python3.7/site-packages/matplotlib/__init__.py", line 1599, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "/users/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 6176, in pcolormesh
coords = np.column_stack((X, Y)).astype(float, copy=False)
MemoryError: Unable to allocate 1.16 GiB for an array with shape (78000000, 2) and data type float64
The number of rows in the array from the memory error (78,000,000) equals xum.size * yum.size (26,000 * 3,000), and I'm not sure why pcolormesh builds an array that large. Any help would be greatly appreciated.
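One hedged reading of the traceback: pcolormesh stores an (x, y) pair for every point of the grid, so the 1-D xum and yum are expanded to the full 26,000 by 3,000 grid before the column_stack call shown in the error. The sketch below reproduces the size arithmetic and the shape on small stand-in arrays, then thins the data with a stride. The stride value step and the stand-in sizes are assumptions for illustration. Whether downsampling is acceptable depends on the data, and this does not explain why only the second iteration runs out of memory.

```python
import numpy as np


def coords_bytes(nx, ny, itemsize=8):
    """Bytes needed to hold one float64 (x, y) pair per grid point."""
    return nx * ny * 2 * itemsize


# 26000 * 3000 points * 2 coordinates * 8 bytes is about 1.16 GiB.
print(round(coords_bytes(26000, 3000) / 2**30, 2))

# Small stand-ins; the real arrays are (26000,), (3000,) and (26000, 3000).
x = np.linspace(0.0, 1.0, 260)
y = np.linspace(0.0, 1.0, 30)
gamma = np.random.rand(x.size, y.size)

# Expanding 1-D axes to a grid and stacking gives one row per grid point.
X, Y = np.meshgrid(x, y)
coords = np.column_stack((X.ravel(), Y.ravel()))
print(coords.shape)  # (7800, 2)

# Hypothetical workaround: keep every 10th sample along each axis.
step = 10
x_small, y_small = x[::step], y[::step]
gamma_small = gamma[::step, ::step]
print(gamma_small.shape)  # (26, 3)
```

With a 300 dpi figure only a few thousand pixels wide, plotting every one of 78 million cells adds little visible detail, so passing the thinned arrays to pcolormesh may be enough.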

ValueError: Could not convert string to float: "nbformat":4

After a long calculation, I have files containing the following values.
(Values are separated by "\t", and each line ends with "\n".)
0.0000008375000 829.685601736 555.939928236
0.0000008376000 829.511081539 555.889353246
0.0000008377000 829.336613968 555.838785601
0.0000008378000 829.162199002 555.7882253
0.0000008379000 828.987836621 555.737672342
0.0000008380000 828.813526805 555.687126727
0.0000008381000 828.639269533 555.636588453
Then I tried to plot these files. (Each file's name starts with P.)
fList = np.array(gl.glob("P*"))
for i in fList:
    f = open(i, "r")
    data = f.read()
    data = data.replace("\n", "\t")
    data = np.array(data.split("\t"))[:-1].reshape(-1,3)
    plt.plot(data[:,0], data[:,1], label=i)
Then I ended up with the following error.
(The error pointer indicates that it happened at the line plt.plot(data[:,0], data[:,1], label=i).)
ValueError: could not convert string to float: "nbformat": 4,
I've looked at other tutorials and walkthroughs but unfortunately could not work out how to fix this. Any help or advice would be greatly appreciated.
You can directly use numpy to read in the file into three arrays:
import numpy as np
import matplotlib.pyplot as plt
from glob import glob
fList = glob("P*")
for i in fList:
    x,y,z = np.loadtxt(i, unpack=True)
    plt.plot(x,y, label=i)
plt.legend()
plt.show()
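The "nbformat": 4 text in the error looks like the JSON content of a Jupyter notebook, which suggests that the pattern "P*" also matched a notebook file whose name starts with P. The post does not confirm this, so treat it as a guess. A minimal sketch, with hypothetical file names P001 and Plot.ipynb, that skips notebooks before loading:

```python
import os
import tempfile
from glob import glob

import numpy as np

# Build a throwaway folder with one data file and one notebook-like file.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "P001"), "w") as f:
    f.write("0.0000008375000\t829.685601736\t555.939928236\n"
            "0.0000008376000\t829.511081539\t555.889353246\n")
with open(os.path.join(workdir, "Plot.ipynb"), "w") as f:
    f.write('{"cells": [], "nbformat": 4, "nbformat_minor": 2}\n')

curves = {}
for path in sorted(glob(os.path.join(workdir, "P*"))):
    if path.endswith(".ipynb"):
        continue  # not a data file, so do not try to parse it
    x, y, z = np.loadtxt(path, unpack=True)
    curves[os.path.basename(path)] = (x, y)

print(list(curves))
```

Each entry in curves can then be drawn with plt.plot(x, y, label=name). A narrower pattern, if the data files share an extension, would avoid the explicit check.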

Length-1 Arrays and Python Scalars Via plt.text

I'm trying to use plt.text to plot temperature values at their associated lat/lon points on a plot.
After reviewing the plt.text documentation, it appears that the plotted value (third arg) has to be a number and that the number has to be a whole number, NOT a number with decimals.
Below is the code that I'm trying to work with and the associated traceback error that I'm receiving:
Script Code:
data = np.loadtxt('/.../.../.../tmax_day0', delimiter=',', skiprows=1)
grid_x, grid_y = np.mgrid[-85:64:dx, 34:49:dx]
temp = data[:,2]
#print temp
grid_z = griddata((data[:,1],data[:,0]), data[:,2], (grid_x,grid_y), method='linear')
x,y = m(data[:,1], data[:,0]) # flip lat/lon
grid_x,grid_y = m(grid_x,grid_y)
#m.plot(x,y, 'ko', markersize=2)
def str_to_float(str):
    try:
        number = float(str)
    except ValueError:
        number = 0.0
    return number
fmt = str_to_float(temp)
#annotate point temperature on plot
plt.text(grid_x, grid_y, fmt, fontdict=None)
Traceback Error:
Traceback (most recent call last):
File "plotpoints.py", line 56, in <module>
fmt = str_to_float(temp)
File "plotpoints.py", line 51, in str_to_float
number = float(str)
TypeError: only length-1 arrays can be converted to Python scalars
Data sample from text file tmax_day0:
latitude,longitude,value
36.65408,-83.21783,90
41.00928,-74.73628,92.02
43.77714,-71.75598,90
44.41944,-72.01944,88.8
39.5803,-79.3394,79
38.3154,-76.5501,86
38.91444,-82.09833,94
40.64985,-75.44771,92.6
41.25389,-70.05972,81.2
39.45202,-74.56699,90.88
I was able to achieve plotting data values only by using the following code:
for i in range(len(temp)):
    plt.text(x[i], y[i], temp[i], va="top", family="monospace")
float() converts a single value, but temp is a numpy array with many elements, so float(temp) raises this TypeError. Plain Python functions such as float() do not operate on a numpy array element by element.
Going from your comment, you would first need to split the string into a proper list:
fmt = fmt[0].split()
I think that should work to create a new (normal) list of strings. Then map that to an array of floats:
list_of_floats = np.array(list(map(float, fmt)))
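Since np.loadtxt already returns floats, no string conversion is needed for the sample data. What plt.text expects is one position and one label per call, and the label can be any formatted string, decimals included. The sketch below uses the first three rows of the tmax_day0 sample and plots raw longitude/latitude; it leaves out the map projection m(...) from the question, and the one-decimal format is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# First rows of the sample file: latitude, longitude, value.
data = np.array([
    [36.65408, -83.21783, 90.0],
    [41.00928, -74.73628, 92.02],
    [43.77714, -71.75598, 90.0],
])
lat, lon, temp = data[:, 0], data[:, 1], data[:, 2]

fig, ax = plt.subplots()
ax.plot(lon, lat, "ko", markersize=2)

# One text call per point; each value is formatted to a string first.
labels = [
    ax.text(x, y, f"{t:.1f}", va="top", family="monospace")
    for x, y, t in zip(lon, lat, temp)
]

print([label.get_text() for label in labels])
```

This is the same idea as the loop that worked in the question, written with zip instead of indexing.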

ValueError when trying to save ndarray (Numpy)

I am trying to translate a project I have in MATLAB to Python+Numpy because MATLAB keeps running out of memory. The file I have is rather long, so I have tried to make a minimal example that shows the same error.
Basically I'm making a 2d histogram of a dataset, and want to save it after some processing. The problem is that the numpy.save function throws a "ValueError: setting an array element with a sequence" when I try to save the output of the histogram function. I can't find the problem when I look at the docs of Numpy.
My version of Python is 2.6.6, Numpy version 1.4.1 on a Debian distro.
import numpy as np
import random
n_samples = 5
rows = 5
out_file = file('dens.bin','wb')
x_bins = np.arange(-2.005,2.005,0.01)
y_bins = np.arange(-0.5,n_samples+0.5)
listy = [random.gauss(0,1) for r in range(n_samples*rows)]
dens = np.histogram2d( listy, \
range(n_samples)*rows, \
[y_bins, x_bins])
print 'Write data'
np.savez(out_file, dens)
out_file.close()
Full output:
$ python error.py
Write data
Traceback (most recent call last):
File "error.py", line 19, in <module>
np.savez(out_file, dens)
File "/usr/lib/pymodules/python2.6/numpy/lib/io.py", line 439, in savez
format.write_array(fid, np.asanyarray(val))
File "/usr/lib/pymodules/python2.6/numpy/core/numeric.py", line 312, in asanyarray
return array(a, dtype, copy=False, order=order, subok=True)
ValueError: setting an array element with a sequence.
Note that np.histogram2d actually returns a tuple of three arrays: the histogram followed by the bin edges along each of its two dimensions. If you want to save all three of these, you have to unpack them as @Francesco said. (np.savez appends .npz to a filename that lacks it, so the examples below use dens.npz to keep the saved and loaded names the same.)
dens = np.histogram2d(listy,
                      range(n_samples)*rows,
                      [y_bins, x_bins])
np.savez('dens.npz', *dens)
Alternatively, if you only need the histogram itself, you could save just that.
np.savez('dens.npz', dens[0])
If you want to keep track of which of these is which, use the **kwds instead of the *args
denskw = dict(zip(['hist','y_bins','x_bins'], dens))
np.savez('dens.npz', **denskw)
Then, you can load it like
dens = np.load('dens.npz')
hist = dens['hist']# etc
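For readers on Python 3, here is a hedged round-trip sketch of the keyword-argument approach. The names index_edges and value_edges and the temporary path are mine, not from the question, and the sample index is placed on the first axis so that each set of bins matches the data it describes.

```python
import os
import random
import tempfile

import numpy as np

n_samples, rows = 5, 5
values = [random.gauss(0, 1) for _ in range(n_samples * rows)]
index = list(range(n_samples)) * rows  # range must be a list to repeat it

index_edges = np.arange(-0.5, n_samples + 0.5)
value_edges = np.arange(-2.005, 2.005, 0.01)

# histogram2d returns the counts plus the edges along each dimension.
hist, e0, e1 = np.histogram2d(index, values, bins=[index_edges, value_edges])

# Saving with keywords stores each array under a readable name.
path = os.path.join(tempfile.mkdtemp(), "dens.npz")
np.savez(path, hist=hist, index_edges=e0, value_edges=e1)

with np.load(path) as archive:
    loaded = {name: archive[name] for name in archive.files}

print(sorted(loaded))
print(loaded["hist"].shape)
```

Passing the tuple itself to np.savez is what triggered the original ValueError, because numpy tried to pack three arrays of different shapes into one.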
