Using Python, I am trying to write tests that compare the current output with an expected output. The output is a matplotlib figure and I would like to do this without saving the figure to a file.
I had the idea to find the cryptographic hash of the object so that I would just need to compare one hash with another to confirm that the entire object is unchanged from what is expected.
This works fine for a numpy array as follows:
import numpy as np
import hashlib
np.random.seed(1)
A = np.random.rand(10,100)
actual_hash = hashlib.sha1(A).hexdigest()
expected_hash = '38f682cab1f0bfefb84cdd6b112b7d10cde6147f'
assert actual_hash == expected_hash
When I try this on a matplotlib object I get: TypeError: object supporting the buffer API required
import hashlib
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(0,100,1000)
Y = np.sin(0.5*X)
plt.plot(X,Y)
fig = plt.gcf()
actual_hash = hashlib.sha1(fig).hexdigest() #this raises the TypeError
Any ideas how I can use hashlib to find the cryptographic hash of a matplotlib object?
Thanks.
You can get the figure as a numpy array using buffer_rgba(). Before using it you must actually draw the figure:
draw must be called at least once before this function will work and
to update the renderer for any subsequent changes to the Figure.
import hashlib
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(0,100,1000)
Y = np.sin(0.5*X)
plt.plot(X,Y)
canvas = plt.gcf().canvas
canvas.draw()
actual_hash = hashlib.sha1(np.array(canvas.buffer_rgba())).hexdigest()
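For use in a test, the same idea could be wrapped in a small helper. This is only a sketch: figure_hash and make_figure are hypothetical names, and the Agg backend is assumed so that no window is needed. Note that the hash covers rendered pixels, so it may change between matplotlib versions, fonts, or DPI settings.

```python
import hashlib

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np


def figure_hash(fig):
    """Hypothetical helper: SHA-1 of the rendered RGBA pixels of a figure."""
    fig.canvas.draw()  # the renderer must run before the buffer is read
    pixels = np.asarray(fig.canvas.buffer_rgba())
    return hashlib.sha1(pixels.tobytes()).hexdigest()


def make_figure(frequency):
    """Hypothetical stand-in for the plotting code under test."""
    fig, ax = plt.subplots()
    x = np.linspace(0, 100, 1000)
    ax.plot(x, np.sin(frequency * x))
    return fig


first = figure_hash(make_figure(0.5))
second = figure_hash(make_figure(0.5))   # same plot drawn again
changed = figure_hash(make_figure(0.7))  # a visibly different plot
print(first == second, first == changed)
```

In a real test you would compare the result against a stored expected hash, generated on the same environment the tests run in.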
Hello and thank you in advance for your help!
I am getting ValueError: You must specify a period or x must be a pandas object with a DatetimeIndex with a freq not set to None when I try to do a time series decomposition that pulls from GitHub. I think I have a basic understanding of the error, but I do not get this error when I directly pull the data from the file on my computer, instead of pulling from GitHub. Why do I only get this error when I pull my data from GitHub? And how should I change my code so that I no longer get this error?
import pandas as pd
import numpy as np
%matplotlib inline
from statsmodels.tsa.seasonal import seasonal_decompose
topsoil = pd.read_csv('https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/meanDickinson.csv',parse_dates=True)
topsoil = topsoil.dropna()
topsoil.head()
topsoil.plot();
result = seasonal_decompose(topsoil['Topsoil'],model='ad')
from pylab import rcParams
rcParams['figure.figsize'] = 12,5
result.plot();
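Try this. The Date column is converted to datetimes and set as the index with an explicit daily frequency before seasonal_decompose is called: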
import pandas as pd
import numpy as np
%matplotlib inline
from statsmodels.tsa.seasonal import seasonal_decompose
topsoil = pd.read_csv('https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/meanDickinson.csv',parse_dates=True)
topsoil = topsoil.dropna()
topsoil.head()
topsoil.plot();
topsoil['Date'] = pd.to_datetime(topsoil['Date'])
topsoil = topsoil.set_index('Date').asfreq('D')
result = seasonal_decompose(topsoil, model='ad')
from pylab import rcParams
rcParams['figure.figsize'] = 12,5
result.plot();
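To see what the two added lines change, here is a small sketch using pandas only. The CSV text is made-up sample data, not the real file. It assumes the file has a Date column and no index column, in which case parse_dates=True leaves Date unparsed, and setting the index alone does not give it a frequency.

```python
import io

import pandas as pd

# Made-up sample data standing in for the real CSV
csv_text = "Date,Topsoil\n2020-01-01,1.0\n2020-01-02,1.5\n2020-01-03,2.0\n"

raw = pd.read_csv(io.StringIO(csv_text), parse_dates=True)
# parse_dates=True only tries to parse the index, so Date is still text here
date_is_datetime = pd.api.types.is_datetime64_any_dtype(raw["Date"])

indexed = raw.assign(Date=pd.to_datetime(raw["Date"])).set_index("Date")
freq_before = indexed.index.freq  # a DatetimeIndex, but with no frequency set

daily = indexed.asfreq("D")
freq_after = daily.index.freq  # the index now carries a daily frequency

print(date_is_datetime, freq_before, freq_after)
```

Be aware that asfreq('D') inserts rows of NaN for any missing days, which you may need to fill before decomposing.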
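Try adding these keyword arguments to the seasonal_decompose call (newer statsmodels releases name the first one period instead of freq):
freq=12, extrapolate_trend=12
The full code would look like this: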
from pylab import rcParams
import matplotlib.pyplot as plt
import statsmodels.api as sm
rcParams['figure.figsize'] = 12, 8
# data.Column stands for your own series
decomposition = sm.tsa.seasonal_decompose(data.Column, model='additive', freq=12, extrapolate_trend=12)
fig = decomposition.plot()
plt.show()
While trying to plot a simple graph in Jupyter Notebook with matplotlib, I came across a strange problem that I had never had before.
I've seen that it has happened to other people before, but the answers talk about backends and other complicated things that I can't follow, since I have only a rather basic knowledge of Python.
Here comes the code:
import numpy as np
import matplotlib.pyplot as plt
time_samples = np.arange(17000)
force_samples = np.arange(17000)
plt.plot(time_samples,force_samples)
plt.show()
time_samples2 = np.random.rand(1,1000)
force_samples2 = np.random.rand(1,1000)
plt.plot(time_samples2,force_samples2)
plt.show()
And this is what I get:
I have no clue why this is happening.
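I think the array dimension is the issue. x and y should each be a 1D array. np.random.rand(1,1000) returns an array of shape (1, 1000), and matplotlib treats each column of a 2D input as a separate line, so you get 1000 lines of a single point each and nothing visible is drawn. Use np.random.rand(1000) instead: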
import numpy as np
import matplotlib.pyplot as plt
time_samples = np.arange(17000)
force_samples = np.arange(17000)
plt.plot(time_samples,force_samples)
plt.show()
time_samples2 = np.random.rand(1000)
force_samples2 = np.random.rand(1000)
plt.plot(time_samples2,force_samples2)
plt.show()
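You can check this yourself by counting the line objects that plot returns. A small sketch, assuming the Agg backend so it runs without a display; the variable names are just for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
time_2d = rng.random((1, 1000))   # shape (1, 1000), like np.random.rand(1, 1000)
force_2d = rng.random((1, 1000))

fig, (ax_2d, ax_1d) = plt.subplots(1, 2)

# 2D input: every column becomes its own line
lines_2d = ax_2d.plot(time_2d, force_2d)

# Flattened to 1D: a single line through all the points
lines_1d = ax_1d.plot(time_2d.ravel(), force_2d.ravel())

print(len(lines_2d), len(lines_1d))
```

If your data already comes as a (1, N) array, ravel() or indexing with [0] are two ways to get a 1D view of it.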
I have a series of data which consists of values from several experiments (1-40, in the MWE it is 1-5). The overall amount of entries in my original data is ~4.000.000, which I try to smooth in order to display it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import spline
from statsmodels.nonparametric.smoothers_lowess import lowess
df = pd.DataFrame()
df["values"] = np.random.randint(100000, 200000, 1000)
df["id"] = [1,2,3,4,5] * 200
plt.figure(1, figsize=(11.69,8.27))
# Both fail for my amount of data:
plt.plot(spline(df["values"], df["id"], range(100)), "r-")
plt.plot(lowess(df["values"], df["id"]), "r-")
Both scipy.interpolate and statsmodels.nonparametric.smoothers_lowess.lowess throw out-of-memory exceptions for my data. Is there an efficient way to solve this, like geom_smooth() in ggplot2 for GNU R?
I can't quite tell what you're getting at with all the dimensions to your data, but one very simple thing you can try is to just use the 'markevery' kwarg like so:
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(1,100,int(1E7))  # the number of samples must be an integer
y=x**2
plt.figure(1, figsize=(11.69,8.27))
plt.plot(x,y,markevery=100)
plt.show()
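This will only draw a marker at every nth point (n=100 here). Note that markevery subsamples the markers only, so it thins the plot when you use a marker style; a plain line is still drawn through every point.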
If that doesn't help then you may want to try just a simple numpy interpolation with fewer samples like so:
x_large=np.linspace(1,100,int(1E7))
y_large=x_large**2
x_small=np.linspace(1,100,int(1E3))
y_small=np.interp(x_small,x_large,y_large)
plt.plot(x_small,y_small)
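Since interpolation only picks values near the new sample points, noisy data may still look noisy afterwards. A different technique, not the one above, is to average fixed-size blocks of consecutive samples, which reduces the point count and smooths at the same time. A rough sketch with made-up data; block_average is a hypothetical helper name:

```python
import numpy as np


def block_average(values, block_size):
    """Hypothetical helper: mean of each run of block_size consecutive samples."""
    values = np.asarray(values, dtype=float)
    usable = (values.size // block_size) * block_size  # drop an incomplete last block
    return values[:usable].reshape(-1, block_size).mean(axis=1)


# Made-up noisy signal standing in for the real measurements
rng = np.random.default_rng(0)
noisy = np.linspace(0.0, 1.0, 1_000_000) + rng.normal(scale=0.1, size=1_000_000)

smooth = block_average(noisy, 1000)  # 1,000,000 samples -> 1,000 averaged points
small_example = block_average([1, 2, 3, 4, 5], 2)
print(smooth.shape, small_example)
```

This uses memory proportional to the input only, so it should be far cheaper than spline or lowess on millions of points, at the cost of a cruder smoother.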
I'm trying to color-code a series of lines in a plot based on values from the Python datetime module. I've tried mapping the datetime data (as a numpy array) to RGBA using ScalarMappable; however, I'm running into difficulties. Please find a segment of the code below:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.cm as mplcm
import matplotlib.colors as colors
clmap = 'jet'
cm = plt.get_cmap(clmap)
iDT = dt.datetime(2012,1,1,0,0,0)
fDT = dt.datetime(2012,1,2,0,0,0)
cNorm = colors.Normalize( vmin=iDT, vmax=fDT )
scalarMap = mplcm.ScalarMappable( norm=cNorm, cmap=cm )
scalarMap.set_array(retDays)
fig,ax = plt.subplots()
ax.set_color_cycle( [scalarMap.to_rgba(x) for x in retDays] )
Where retDays is a numpy array of datetime values.
I get the following error when using set_color_cycle:
ValueError: setting an array element with a sequence.
Your help is greatly appreciated.
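One possible way around this, offered as a sketch rather than a confirmed diagnosis: the assumption is that Normalize wants plain numbers, so the datetimes are first converted with matplotlib.dates.date2num. The ret_days list below is a made-up stand-in for retDays, and the colors are passed to each plot call directly instead of through a color cycle.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering
import matplotlib.colors as colors
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

start = dt.datetime(2012, 1, 1, 0, 0, 0)
end = dt.datetime(2012, 1, 2, 0, 0, 0)

# Made-up stand-in for retDays: five timestamps, six hours apart
ret_days = [start + dt.timedelta(hours=6 * i) for i in range(5)]

# Convert datetimes to floats (days) so they can be normalized
day_numbers = mdates.date2num(ret_days)
norm = colors.Normalize(vmin=mdates.date2num(start), vmax=mdates.date2num(end))
cmap = plt.get_cmap("jet")
line_colors = cmap(norm(day_numbers))  # one RGBA row per timestamp

fig, ax = plt.subplots()
x = np.linspace(0, 1, 50)
for offset, color in enumerate(line_colors):
    ax.plot(x, x + offset, color=color)  # dummy lines, one per timestamp
```

The same norm and cmap can be handed to a ScalarMappable if a colorbar is wanted.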
I followed this demo and modified it to suit my needs, and it worked. Then I changed it to use a function so that I could draw two graphs, and now it doesn't work at all with either plt.show() or plt.savefig().
Here's my code:
import csv
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
# I converted excel to a csv file
data = [x for x in csv.reader(open('ASS1_Q1.csv'))]
question1 = {}
question1['males'] = []
question1['females'] = []
for x in data:
    if x[0].lower() == "male":
        question1["males"].append(float(x[1]))
    elif x[0].lower() == "female":
        question1['females'].append(float(x[1]))
    else:
        print "Not a valid dataline", x
def plot_graph(data, filename):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    n, bins, patches = ax.hist(np.array(data), bins=13, align='mid', facecolor='#888888')
    ax.set_xlabel('Speed in kph')
    ax.set_ylabel('Amount of Females')
    ax.set_xlim(min(data), max(data))  # lower and upper limits are separate arguments
    # plt.savefig(filename)
    plt.show()
plot_graph(question1['males'], "ASS1Q1-males.eps")
#plot_graph(question1['females'], "ASSQ2-females.eps")
print summary(question1['males'])
print summary(question1['females'])
Can someone explain why this is happening? What am I doing wrong?
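Try removing the lines below. Agg is a non-interactive backend that only renders to files, so plt.show() has no window to open: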
import matplotlib
matplotlib.use('Agg')
The command
python -c 'import matplotlib; matplotlib.use("")'
will show you the valid string arguments that can be sent to matplotlib.use.
On my machine, 'Agg' is listed as valid, though I get no output when this is set. If you are curious, you could just keep trying various options until you find one that works.
When you find the one that you prefer, you may also find it more convenient to set something like
backend : GtkAgg
in your ~/.matplotlib/matplotlibrc instead of using matplotlib.use(...).
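If you do want to keep Agg, for example on a machine without a display, writing the figure out with savefig should still work. A minimal sketch with made-up data, saving into an in-memory buffer instead of a real file:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive: renders to files or buffers only
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.hist([1, 2, 2, 3, 3, 3], bins=3)  # made-up data

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # a filename works here as well
png_bytes = buffer.getvalue()

backend_name = matplotlib.get_backend()
print(backend_name, len(png_bytes))
```

Calling fig.savefig on the figure object, rather than plt.savefig, also avoids depending on which figure happens to be current.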