I'm trying to understand how time-series plotting works in matplotlib.
Unfortunately, this doc just loads data straight from a file using numpy, which makes it very cryptic for anyone who isn't fluent in numpy.
From the doc:
with cbook.get_sample_data('goog.npz') as datafile:
    r = np.load(datafile)['price_data'].view(np.recarray)
r = r[-30:]  # get the last 30 days
# Matplotlib works better with datetime.datetime than np.datetime64, but the
# latter is more portable.
date = r.date.astype('O')
In my case, I have a dictionary with datetime keys and int values, which I can transform into an array or list, but I haven't managed to produce anything that pyplot will accept, and the doc isn't much help, especially for time series.
import numpy as np
def toArray(d):
    # d maps datetime -> int; the result is a 2D object array of (datetime, int) rows
    data = list(d.items())
    return np.array(data)
>>>
[datetime.datetime(2020, 5, 4, 16, 44) -13]
[datetime.datetime(2020, 5, 4, 16, 45) 7]
[datetime.datetime(2020, 5, 4, 16, 46) -11]
[datetime.datetime(2020, 5, 4, 16, 47) -75]
[datetime.datetime(2020, 5, 4, 16, 48) -41]
[datetime.datetime(2020, 5, 4, 16, 49) -39]
[datetime.datetime(2020, 5, 4, 16, 50) -4]
The most important part is to separate the X-axis data from the Y-axis data (in your case, the dates from the values). Using your toArray() function to retrieve the data, the following code produces the desired result:
import matplotlib.pyplot as plt
data = toArray(your_dict)
fig, ax = plt.subplots(figsize=(20, 10))
dates = [x[0] for x in data]
values = [x[1] for x in data]
ax.plot(dates, values, 'o-')
ax.set_title("Default")
fig.autofmt_xdate()
plt.show()
Note how we split the 2D array of (date, value) rows into two 1D lists, dates and values.
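A self-contained variant that plots the dictionary directly, without the intermediate NumPy array. The sample dict rebuilds the values from the question; its insertion order is deliberately reversed (a hypothetical twist) to show why sorting by key matters, and DateFormatter("%H:%M") is just one possible choice for minute-level data:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; call plt.show() interactively instead
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical dict shaped like the question's: datetime -> int.
# Keys are inserted newest-first on purpose, so the dict is not in time order.
start = dt.datetime(2020, 5, 4, 16, 44)
vals = [-13, 7, -11, -75, -41, -39, -4]
data = {start + dt.timedelta(minutes=i): v for i, v in reversed(list(enumerate(vals)))}

# Sort the (key, value) pairs by date, then unzip them into two sequences
dates, values = zip(*sorted(data.items()))

fig, ax = plt.subplots()
(line,) = ax.plot(dates, values, "o-")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # show hours:minutes on the x axis
fig.autofmt_xdate()  # rotate the date labels so they don't overlap
```

Without the sort, the line would be drawn in dict order, i.e. from the newest point back to the oldest.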
So, from the database, I'm trying to plot a histogram using the matplotlib library in Python,
as shown here:
import sqlite3
import pandas as pd
cnx = sqlite3.connect('practice.db')
sql = pd.read_sql_query('''
SELECT CAST((deliverydistance/1)as int)*1 as bin, count(*)
FROM orders
group by 1
order by 1;
''',cnx)
which outputs a DataFrame with two columns, bin and count(*).
From the SQL result, I try to extract the columns using a for loop and place them in lists.
distance = []
counts = []
for x, y in sql.iterrows():
    y = y["count(*)"]
    counts.append(y)
    distance.append(x)
print(distance)
print(counts)
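Note that iterrows() yields (index, row) pairs, so x in the loop above is the row index, not the bin value; the two only coincide because the bins happen to run consecutively from 0. A minimal sketch of reading the columns directly, using a hypothetical in-memory stand-in for practice.db with a few made-up deliverydistance values:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for practice.db: an in-memory `orders` table
cnx = sqlite3.connect(":memory:")
pd.DataFrame({"deliverydistance": [0.2, 0.7, 1.5, 1.9, 2.1, 3.8, 3.9, 3.99]}).to_sql(
    "orders", cnx, index=False
)

# Same query as in the question: truncate each distance to an int bin and count rows per bin
sql = pd.read_sql_query(
    """
    SELECT CAST((deliverydistance/1) AS int)*1 AS bin, count(*)
    FROM orders
    GROUP BY 1
    ORDER BY 1;
    """,
    cnx,
)

# Take the bin values and counts straight from their columns instead of the row index
distance = sql["bin"].tolist()
counts = sql["count(*)"].tolist()
print(distance, counts)  # [0, 1, 2, 3] [2, 2, 1, 3]
```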
OUTPUT:
distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418, 4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]
When I plot a histogram
plt.hist(counts,bins=distance)
I get this output:
[Image: resulting histogram]
My question is: how do I put the count on the Y axis and the distance on the X axis? plt.hist doesn't seem to let me do that.
You could also skip the for loop and plot directly from your pandas DataFrame using
sql.bin.plot(kind='hist', weights=sql['count(*)'])
or, with the for loop:
import matplotlib.pyplot as plt
import pandas as pd
distance = []
counts = []
for x, y in sql.iterrows():
    y = y["count(*)"]
    counts.append(y)
    distance.append(x)
plt.hist(distance, bins=distance, weights=counts)
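A self-contained sketch of the weights approach using the distance and counts lists from the question. One detail: with bins=distance, the last bin [17, 18] is closed, so distances 17 and 18 share one bar. Putting the edges at half-integers (an assumption that you want one bar per integer distance) centers each bar on its distance, so each bar's height is exactly that distance's count:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; call plt.show() interactively instead
import matplotlib.pyplot as plt

distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418,
          4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]

# Edges at -0.5, 0.5, ..., 18.5: one bin per integer distance
edges = np.arange(min(distance), max(distance) + 2) - 0.5

fig, ax = plt.subplots()
# Each distance value occurs once, weighted by its count, so bar height == count
heights, bin_edges, patches = ax.hist(distance, bins=edges, weights=counts)
ax.set_xlabel("distance")
ax.set_ylabel("count")
```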
You can skip the middle section where you count the instances of each distance. Check out this example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'distance':np.round(20 * np.random.random(100))})
df['distance'].hist(bins = np.arange(0,21,1))
Pandas has a built-in histogram plot that counts and then plots the occurrences of each distance. You can specify the bins (in this case, edges from 0 to 20 with a width of 1).
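The counting step can be reproduced with np.histogram, which matplotlib's hist (and therefore the pandas .hist plot) uses to compute bar heights. This sketch seeds the random generator only so it is reproducible; note that the last bin [19, 20] is closed, so rounded distances 19 and 20 land in the same bar:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
df = pd.DataFrame({"distance": np.round(20 * rng.random(100))})

bins = np.arange(0, 21, 1)
# Bar heights that df["distance"].hist(bins=bins) would draw
counts, edges = np.histogram(df["distance"], bins=bins)

# Cross-check against an explicit count of each rounded distance
per_value = df["distance"].value_counts().sort_index()
print(counts)
```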
If you are looking for a horizontal histogram rather than a bar chart, pass orientation='horizontal':
import matplotlib.pyplot as plt
distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
# plt.style.use('dark_background')
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418, 4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]
plt.hist(counts, bins=distance, orientation='horizontal')
Use plt.bar, which takes the x positions and the bar heights directly:
plt.bar(distance,counts)
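A slightly fuller sketch of the bar-chart approach with the lists from the question: distance goes on the X axis and counts on the Y axis. The width=1.0 and edge color are optional styling choices that make adjacent bars touch, like a histogram:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; call plt.show() interactively instead
import matplotlib.pyplot as plt

distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418,
          4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]

fig, ax = plt.subplots()
# One bar per distance; its height is the precomputed count
bars = ax.bar(distance, counts, width=1.0, edgecolor="black")
ax.set_xticks(distance)
ax.set_xlabel("distance")
ax.set_ylabel("count")
```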
I want to achieve the following workflow (as an edge case in a larger project):
1. Create a Dask array from a 2D NumPy array
2. Correlate with map_overlap using depth=1 and no boundary, similar to dask_image.ndfilters.correlate
3. Compute and store the Dask array in the original NumPy array
I have trouble achieving step 3 without doubling memory usage. I get artifacts at the chunk boundaries when using dask_array.store(numpy_array, compute=True), but not when I use numpy_array = dask_array.compute().
My attempt at a minimal reproducible example that shares my workflow uses dask_image.ndfilters.correlate:
import numpy as np
import dask.array as da
import matplotlib.pyplot as plt
import dask_image.ndfilters as da_ndf
def initialize_arrays():
    array = np.ones((150, 100), dtype=np.uint8)
    dask_array = da.from_array(array, chunks=((8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6),
                                              (8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4)))
    return array, dask_array
# Variant 1: store the result back into the source array (shows artifacts)
array, dask_array = initialize_arrays()
weight_sums = da_ndf.correlate(dask_array, weights=np.ones((3, 3)), mode='constant', cval=0.0)
weight_sums.store(array, compute=True)
array_store = array.copy()
# Variant 2: compute into a new array (no artifacts, but doubles memory)
array, dask_array = initialize_arrays()
weight_sums = da_ndf.correlate(dask_array, weights=np.ones((3, 3)), mode='constant', cval=0.0)
array_compute = weight_sums.compute()
[Image: 2D arrays showing artifacts at chunk boundaries]
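One plausible explanation for the artifacts (an assumption, not confirmed in the post): da.from_array reads slices of array lazily, so when store writes finished chunks back into that same array, chunks computed later can read overlap (halo) pixels that have already been overwritten. The NumPy-only sketch below imitates this with hypothetical helpers box_sum_3x3 and blockwise_box_sum (no Dask involved); it illustrates only the aliasing effect, not Dask's scheduling:

```python
import numpy as np

def box_sum_3x3(a):
    """3x3 box sum (correlation with np.ones((3, 3))), zero outside `a`."""
    p = np.pad(a, 1)  # zero padding, like mode='constant', cval=0.0
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def blockwise_box_sum(src, dst, block=8):
    """Process src block by block, reading a 1-pixel halo from src and
    writing each finished block into dst (similar in spirit to map_overlap + store)."""
    h, w = src.shape
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            r1, c1 = min(r0 + block, h), min(c0 + block, w)
            # Block extended by a 1-pixel halo, clipped at the array border
            hr0, hc0 = max(r0 - 1, 0), max(c0 - 1, 0)
            hr1, hc1 = min(r1 + 1, h), min(c1 + 1, w)
            res = box_sum_3x3(src[hr0:hr1, hc0:hc1])
            # Keep only the block's own pixels, dropping the halo
            dst[r0:r1, c0:c1] = res[r0 - hr0:r1 - hr0, c0 - hc0:c1 - hc0]

ones = np.ones((150, 100), dtype=np.int64)
expected = box_sum_3x3(ones)

# Separate output buffer: every halo read sees the original values
out_of_place = np.empty_like(ones)
blockwise_box_sum(ones.copy(), out_of_place)

# Output aliases the input: later blocks read halo pixels that earlier
# blocks have already overwritten, so errors appear along block boundaries
in_place = ones.copy()
blockwise_box_sum(in_place, in_place)

print(np.array_equal(out_of_place, expected))  # True
print(np.array_equal(in_place, expected))      # False
```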
I have a set of data like this:
numpy.array([[3, 7],[5, 8],[6, 19],[8, 59],[10, 42],[12, 54], [13, 32], [14, 19], [99, 19]])
which I want to split into a number of chunks with a given percentage of overlap, for each column separately. For example, splitting column 1 into 3 chunks with 50% overlap results in this 2D array:
[[3, 5, 6, 8,],
[6, 8, 10, 12,],
[10, 12, 13, 14,]]
(ignoring the last row, [13, 14, 99], which is not the same size as the rest).
I'm trying to write a function that takes the array, the number of chunks, and the overlap percentage, and returns the result.
That's a window function, so use skimage.util.view_as_windows:
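import numpy as np
from skimage.util import view_as_windows
in_arr = np.array([[3, 7], [5, 8], [6, 19], [8, 59], [10, 42], [12, 54], [13, 32], [14, 19], [99, 19]])
out = view_as_windows(in_arr[:, 0], window_shape=4, step=2)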
If you need numpy only, you can use this recipe
A fairly fast NumPy-only approach is:
import numpy as np
def rolling(a, window, step):
    shape = ((a.size - window) // step + 1, window)
    # Assumes a is contiguous: consecutive elements are a.itemsize bytes apart
    strides = (step * a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
And you can call it like so:
rolling(arr[:,0].copy(), 4, 2)
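Remark: I got unexpected output for rolling(arr[:,0], 4, 2), so I took a copy instead. The reason is that arr[:,0] is a non-contiguous view: its elements are arr.strides[0] bytes apart rather than a.itemsize, so the hard-coded strides read the wrong memory. .copy() makes the column contiguous.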
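A sketch that avoids the copy by using the view's real stride (a.strides[0]) instead of a.itemsize, plus a hypothetical split_overlapping helper matching the question's requested signature (column, number of chunks, overlap fraction). The window-size formula is an assumption: it picks the largest window for which n_chunks windows with that overlap fit in the column. sliding_window_view requires NumPy 1.20 or later:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

def rolling_any_stride(a, window, step):
    """Windows of length `window`, `step` elements apart, as a read-only view of 1D `a`.
    Uses a.strides[0] (the real byte distance between elements) instead of
    a.itemsize, so a non-contiguous column such as arr[:, 0] needs no copy."""
    n = (a.shape[0] - window) // step + 1
    s = a.strides[0]
    return as_strided(a, shape=(n, window), strides=(step * s, s), writeable=False)

def split_overlapping(col, n_chunks, overlap):
    """Hypothetical helper: at most `n_chunks` equal-size windows of 1D `col`,
    overlapping by the fraction `overlap` (0.5 means 50%). A tail that does
    not fill a whole window is dropped."""
    # Largest window w with w + (n_chunks - 1) * w * (1 - overlap) <= len(col)
    window = int(len(col) / (1 + (n_chunks - 1) * (1 - overlap)))
    step = max(1, int(window * (1 - overlap)))
    return sliding_window_view(col, window)[::step][:n_chunks]

arr = np.array([[3, 7], [5, 8], [6, 19], [8, 59], [10, 42],
                [12, 54], [13, 32], [14, 19], [99, 19]])
print(rolling_any_stride(arr[:, 0], 4, 2))
print(split_overlapping(arr[:, 0], 3, 0.5))
```

Both calls return [[3, 5, 6, 8], [6, 8, 10, 12], [10, 12, 13, 14]] for column 1, matching the example in the question.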
I am trying to do a piecewise linear regression in Python; the data is defined in the code below.
I need to fit 3 lines, one for each section. Any idea how? I have the following code, but it doesn't give the fit I want; the result is shown below. Any help would be appreciated.
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
def piecewise(x, x0, x1, y0, y1, k0, k1, k2):
    # Three linear pieces: x <= x0, x0 < x < x1, x >= x1
    return np.piecewise(x,
                        [x <= x0, np.logical_and(x0 < x, x < x1), x >= x1],
                        [lambda x: k0*x + y0,
                         lambda x: k1*(x - x0) + y1 + k0*x0,
                         lambda x: k2*(x - x1) + y0 + y1 + k0*x0 + k1*(x1 - x0)])
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18,19,20,21], dtype=float)
y1 = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,145,147,149,151,153,155])
y1 = np.flip(y1, 0)
perr_min = np.inf
p_best = None
# Random restarts: keep the parameter set with the smallest total absolute error
for n in range(100):
    k = np.random.rand(7) * 20
    try:
        p, e = optimize.curve_fit(piecewise, x1, y1, p0=k)
    except RuntimeError:  # curve_fit did not converge from this starting point
        continue
    perr = np.sum(np.abs(y1 - piecewise(x1, *p)))
    if perr < perr_min:
        perr_min = perr
        p_best = p
xd = np.linspace(0, 21, 100)
plt.figure()
plt.plot(x1, y1, "o")
y_out = piecewise(xd, *p_best)
plt.plot(xd, y_out)
plt.show()
[Image: data with fit]
Thanks.
A very simple method (without iteration, without initial guess) can solve this problem.
The calculation method comes from page 30 of this paper: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf (copy below).
The next figure shows the result:
The equation of the fitted function is:
Or, equivalently:
H is the Heaviside function.
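For reference, this is the general form of a continuous three-segment piecewise linear function written with the Heaviside function H. The breakpoints p_1 < p_2 and coefficients a, b, c, d are generic symbols, not the values fitted in the answer:

```latex
y(x) = a + b\,x + c\,(x - p_1)\,H(x - p_1) + d\,(x - p_2)\,H(x - p_2),
\qquad
H(t) = \begin{cases} 0, & t < 0 \\ 1, & t \ge 0 \end{cases}
```

The slope is b for x < p_1, b + c between p_1 and p_2, and b + c + d for x > p_2; each added term vanishes at its own breakpoint, so the three lines join continuously.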
In addition, the details of the numerical calculation are given below:
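This is not the method from the paper, but a different, simpler technique that also needs no initial guess: for fixed breakpoints p1 < p2, the continuous model y = a + b*x + c*max(x - p1, 0) + d*max(x - p2, 0) is linear in a, b, c, d, so it can be solved by ordinary least squares. Scanning a grid of candidate breakpoints and keeping the lowest squared error gives the three lines. The function names and the grid size are assumptions of this sketch:

```python
import numpy as np

def design(x, p1, p2):
    # Columns: 1, x, (x - p1)H(x - p1), (x - p2)H(x - p2)
    return np.column_stack([np.ones_like(x), x,
                            np.maximum(x - p1, 0), np.maximum(x - p2, 0)])

def fit_three_segments(x, y, n_grid=101):
    """Brute-force breakpoint search; least squares for each candidate pair."""
    grid = np.linspace(x.min(), x.max(), n_grid)[1:-1]  # interior candidates only
    best = (np.inf, None, None, None)
    for i, p1 in enumerate(grid):
        for p2 in grid[i + 1:]:
            A = design(x, p1, p2)
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((A @ coef - y) ** 2))
            if sse < best[0]:
                best = (sse, p1, p2, coef)
    return best

def predict(x, p1, p2, coef):
    return design(x, p1, p2) @ coef

# Data from the question
x = np.arange(1, 22, dtype=float)
y = np.flip(np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47,
                      98.36, 112.25, 126.14, 140.03, 145, 147, 149, 151, 153, 155]))

sse, p1, p2, coef = fit_three_segments(x, y)
print(f"breakpoints: {p1:.2f}, {p2:.2f}; SSE = {sse:.3f}")
```

A finer grid (larger n_grid) locates the breakpoints more precisely at a quadratic cost in lstsq calls.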
I am plotting two lists using the matplotlib Python library. There are two arrays, x and y, which look like this when plotted:
[Image: plot of x and y]
The code I used is:
import matplotlib.pyplot as plt
plt.plot(x,y,"bo")
plt.fill(x,y,'#99d8c9')
It plots the points and then connects them with a line, but it does not connect the points correctly. Points 0 and 2 on the x axis are connected instead of 1 and 2. Similarly, at the other end it connects points 17 and 19 instead of 18 and 19. I also tried plotting a simple line graph using:
plt.plot(x,y)
But it still connected the points wrongly. I would really appreciate it if anyone could point me in the right direction as to why this is happening and how to resolve it.
Thanks!!
Matplotlib draws a line by connecting the points in the order they appear in the arrays, so your points are connected in a 'strange' way (although exactly as you told matplotlib to, e.g. from (0,1) to (3,2)). You can fix this by simply sorting the data before plotting.
#! /usr/bin/env python
import matplotlib.pyplot as plt
x = [20, 21, 22, 23, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
y = [ 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
x2,y2 = zip(*sorted(zip(x,y),key=lambda x: x[0]))
plt.plot(x2,y2)
plt.show()
That should give you what you want.
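An equivalent NumPy sketch of the same fix: np.argsort returns the indices that sort x, and applying those indices to both arrays keeps each (x, y) pair together. kind="stable" is an optional choice that keeps ties in their original order:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; call plt.show() interactively instead
import matplotlib.pyplot as plt

x = np.array([20, 21, 22, 23, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18])
y = np.array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1])

order = np.argsort(x, kind="stable")  # indices that put x in ascending order
xs, ys = x[order], y[order]           # reorder both arrays with the same indices

fig, ax = plt.subplots()
ax.plot(xs, ys, "bo-")  # points are now joined left to right
```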