So from the database, I'm trying to plot a histogram using the matplot lib library in python.
as shown here:
cnx = sqlite3.connect('practice.db')
sql = pd.read_sql_query('''
SELECT CAST((deliverydistance/1)as int)*1 as bin, count(*)
FROM orders
group by 1
order by 1;
''',cnx)
which outputs
This
From the sql table, I try to extract the columns using a for loop and place them in array.
distance =[]
counts = []
for x,y in sql.iterrows():
y = y["count(*)"]
counts.append(y)
distance.append(x)
print(distance)
print(counts)
OUTPUT:
distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418, 4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]
When I plot a histogram
plt.hist(counts,bins=distance)
I get this out put:
click here
My question is, how do I make it so that the count is on the Y axis and the distance is on the X axis? It doesn't seem to allow me to put it there.
you could also skip the for loop and plot direct from your pandas dataframe using
sql.bin.plot(kind='hist', weights=sql['count(*)'])
or with the for loop
import matplotlib.pyplot as plt
import pandas as pd
distance =[]
counts = []
for x,y in sql.iterrows():
y = y["count(*)"]
counts.append(y)
distance.append(x)
plt.hist(distance, bins=distance, weights=counts)
You can skip the middle section where you count the instances of each distance. Check out this example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'distance':np.round(20 * np.random.random(100))})
df['distance'].hist(bins = np.arange(0,21,1))
Pandas has a built-in histogram plot which counts, then plots the occurences of each distance. You can specify the bins (in this case 0-20 with a width of 1).
If you are not looking for a bar chart and are looking for a horizontal histogram, then you are looking to pass orientation='horizontal':
distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
# plt.style.use('dark_background')
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418, 4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]
plt.hist(counts,bins=distance, orientation='horizontal')
Use :
plt.bar(distance,counts)
Related
Trying to understand how the time series of matplotlib works.
Unfortunately, this doc just load data straight from a file using bumpy, which makes it very cryptic for non-fluent numpy adepts.
From the doc:
with cbook.get_sample_data('goog.npz') as datafile:
r = np.load(datafile)['price_data'].view(np.recarray)
r = r[-30:] # get the last 30 days
# Matplotlib works better with datetime.datetime than np.datetime64, but the
# latter is more portable.
date = r.date.astype('O')
In my case, I have a dictionary of datetime (key) and int, which I can transform to an array or list, but I wasn't quite successful to get anything that pyplot would take and the doc isn't much of help, especially for timeseries.
def toArray(dict):
data = list(dict.items())
return np.array(data)
>>>
[datetime.datetime(2020, 5, 4, 16, 44) -13]
[datetime.datetime(2020, 5, 4, 16, 45) 7]
[datetime.datetime(2020, 5, 4, 16, 46) -11]
[datetime.datetime(2020, 5, 4, 16, 47) -75]
[datetime.datetime(2020, 5, 4, 16, 48) -41]
[datetime.datetime(2020, 5, 4, 16, 49) -39]
[datetime.datetime(2020, 5, 4, 16, 50) -4]
The most important part is to split X axis from Y axis (in your case - dates from values). Using your function toArray() to retrieve data, the following code produces a desired result:
import matplotlib.pyplot as plt
data = toArray(your_dict)
fig, ax = plt.subplots(figsize=(20, 10))
dates = [x[0] for x in data]
values = [x[1] for x in data]
ax.plot(dates, values, 'o-')
ax.set_title("Default")
fig.autofmt_xdate()
plt.show()
Note how we split data from 2D array of dates and values into two 1D arrays dates and values.
The sample data is as follows:
unique_list = ['home0', 'page_a0', 'page_b0', 'page_a1', 'page_b1',
'page_c1', 'page_b2', 'page_a2', 'page_c2', 'page_c3']
sources = [0, 0, 1, 2, 2, 3, 3, 4, 4, 7, 6]
targets = [3, 4, 4, 3, 5, 6, 8, 7, 8, 9, 9]
values = [2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2]
Using the sample code from the documentation
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 15,
thickness = 20,
line = dict(color = "black", width = 0.5),
label = unique_list,
color = "blue"
),
link = dict(
source = sources,
target = targets,
value = values
))])
fig.show()
This outputs the following sankey diagram
However, I would like to get all the values which end in the same number in the same vertical column, just like how the leftmost column has all of it's nodes ending with a 0. I see in the docs that it is possible to move the node positions, however I was wondering if there was a cleaner way to do it other than manually inputting x and y values. Any help appreciated.
In go.Sankey() set arrangement='snap' and adjust x and y positions in x=<list> and y=<list>. The following setup will place your nodes as requested.
Plot:
Please note that the y-values are not explicitly set in this example. As soon as there are more than one node for a common x-value, the y-values will be adjusted automatically for all nodes to be displayed in the same vertical position. If you do want to set all positions explicitly, just set arrangement='fixed'
Edit:
I've added a custom function nodify() that assigns identical x-positions to label names that have a common ending such as '0' in ['home0', 'page_a0', 'page_b0']. Now, if you as an example change page_c1 to page_c2 you'll get this:
Complete code:
import plotly.graph_objects as go
unique_list = ['home0', 'page_a0', 'page_b0', 'page_a1', 'page_b1',
'page_c1', 'page_b2', 'page_a2', 'page_c2', 'page_c3']
sources = [0, 0, 1, 2, 2, 3, 3, 4, 4, 7, 6]
targets = [3, 4, 4, 3, 5, 6, 8, 7, 8, 9, 9]
values = [2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2]
def nodify(node_names):
node_names = unique_list
# uniqe name endings
ends = sorted(list(set([e[-1] for e in node_names])))
# intervals
steps = 1/len(ends)
# x-values for each unique name ending
# for input as node position
nodes_x = {}
xVal = 0
for e in ends:
nodes_x[str(e)] = xVal
xVal += steps
# x and y values in list form
x_values = [nodes_x[n[-1]] for n in node_names]
y_values = [0.1]*len(x_values)
return x_values, y_values
nodified = nodify(node_names=unique_list)
# plotly setup
fig = go.Figure(data=[go.Sankey(
arrangement='snap',
node = dict(
pad = 15,
thickness = 20,
line = dict(color = "black", width = 0.5),
label = unique_list,
color = "blue",
x=nodified[0],
y=nodified[1]
),
link = dict(
source = sources,
target = targets,
value = values
))])
fig.show()
I am a beginner in Python and I use NearestNeighbors in sklearn and the output is:
print(neigh.kneighbors([[0.00015217, 0.00050968, 0.00044049, 0.00014538,
0.00077339, 0.0020284 , 0.00047572]]))
And the output is:
(array([[1.01980586e-08, 7.73354596e-05, 7.73354596e-05, 1.20134585e-04,
1.39792434e-04, 1.48002389e-04, 1.98794609e-04, 4.63512739e-04,
5.31436554e-04, 5.36960418e-04, 5.72679303e-04, 6.28187320e-04,
6.67923141e-04, 7.51928163e-04, 8.97313642e-04, 1.00023442e-03,
1.06114362e-03, 1.11943158e-03, 1.12626043e-03, 1.20185118e-03,
1.51073901e-03, 1.71592746e-03, 1.73362257e-03]]),array([[ 0, 16, 15,
19,1, 23, 5, 8, 20, 9,6, 10, 17, 3, 21, 22,14, 2, 13, 7, 11, 12,
18]],dtype=int64))
I would like to import these data to csv because I need both the arrays in csv. how can I separate these arrays?
hh = neigh.kneighbors([[0.00015217, 0.00050968, 0.00044049, 0.00014538,
0.00077339, 0.0020284 , 0.00047572]])
first_array = hh[0]
second_array = hh[1]
I am trying to do a piecewise linear regression in Python and the data looks like this,
I need to fit 3 lines for each section. Any idea how? I am having the following code, but the result is shown below. Any help would be appreciated.
import numpy as np
import matplotlib
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy import optimize
def piecewise(x,x0,x1,y0,y1,k0,k1,k2):
return np.piecewise(x , [x <= x0, np.logical_and(x0<x, x< x1),x>x1] , [lambda x:k0*x + y0, lambda x:k1*(x-x0)+y1+k0*x0 lambda x:k2*(x-x1) y0+y1+k0*x0+k1*(x1-x0)])
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18,19,20,21], dtype=float)
y1 = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,145,147,149,151,153,155])
y1 = np.flip(y1,0)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18,19,20,21], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,145,147,149,151,153,155])
y = np.flip(y,0)
perr_min = np.inf
p_best = None
for n in range(100):
k = np.random.rand(7)*20
p , e = optimize.curve_fit(piecewise, x1, y1,p0=k)
perr = np.sum(np.abs(y1-piecewise(x1, *p)))
if(perr < perr_min):
perr_min = perr
p_best = p
xd = np.linspace(0, 21, 100)
plt.figure()
plt.plot(x1, y1, "o")
y_out = piecewise(xd, *p_best)
plt.plot(xd, y_out)
plt.show()
data with fit
Thanks.
A very simple method (without iteration, without initial guess) can solve this problem.
The method of calculus comes from page 30 of this paper : https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf (copy below).
The next figure shows the result :
The equation of the fitted function is :
Or equivalently :
H is the Heaviside function.
In addition, the details of the numerical calculus are given below :
I am plotting two lists using matplotlib python library. There are two arrays x and y which look like this when plotted-
Click here for plot (sorry don't have enough reputation to post pictures here)
The code used is this-
import matplotlib.pyplot as plt
plt.plot(x,y,"bo")
plt.fill(x,y,'#99d8cp')
It plots the points then connects the points using a line. But the problem is that it is not connecting the points correctly. Point 0 and 2 on x axis are connected wrongly instead of 1 and 2. Similarly on the other end it connects points 17 to 19, instead of 18 to 19. I also tried plotting simple line graph using-
plt.plot(x,y)
But then too it wrongly connected the points. Would really appreciated if anyone could point me in right direction as to why this is happening and what can be done to resolve it.
Thanks!!
The lines of matplotlib expects that the coordinates are in order, therefore you are connecting your points in a 'strange' way (although exactly like you told matplotlib to do, e.g. from (0,1) to (3,2)). You can fix this by simply sorting the data prior to plotting.
#! /usr/bin/env python
import matplotlib.pyplot as plt
x = [20, 21, 22, 23, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
y = [ 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
x2,y2 = zip(*sorted(zip(x,y),key=lambda x: x[0]))
plt.plot(x2,y2)
plt.show()
That should give you what you want, as shown below: