From the database, I'm trying to plot a histogram using the matplotlib library in Python.
The query is shown here:
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

cnx = sqlite3.connect('practice.db')
sql = pd.read_sql_query('''
    SELECT CAST((deliverydistance/1) AS int)*1 AS bin, count(*)
    FROM orders
    GROUP BY 1
    ORDER BY 1;
''', cnx)
which outputs a table with one row per bin and its count.
From the resulting table, I try to extract the columns using a for loop and place them in lists.
distance =[]
counts = []
for x, y in sql.iterrows():
    y = y["count(*)"]
    counts.append(y)
    distance.append(x)
print(distance)
print(counts)
OUTPUT:
distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418, 4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]
When I plot a histogram
plt.hist(counts,bins=distance)
I get this output:
My question is, how do I make it so that the count is on the Y axis and the distance is on the X axis? It doesn't seem to allow me to put it there.
You could also skip the for loop and plot directly from your pandas DataFrame using
sql.bin.plot(kind='hist', weights=sql['count(*)'])
or, with the for loop:
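import matplotlib.pyplot as plt
import pandas as pd
distance = []
counts = []
for x, y in sql.iterrows():
    y = y["count(*)"]
    counts.append(y)
    distance.append(x)
# hist needs one more edge than there are bins, so add a closing edge
edges = distance + [distance[-1] + 1]
plt.hist(distance, bins=edges, weights=counts)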
You can skip the middle section where you count the instances of each distance. Check out this example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'distance':np.round(20 * np.random.random(100))})
df['distance'].hist(bins = np.arange(0,21,1))
Pandas has a built-in histogram plot that counts and then plots the occurrences of each distance. You can specify the bins (in this case 0-20 with a width of 1).
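To make the counting step concrete, here is a minimal pure-Python sketch of what a fixed-width histogram computes before anything is drawn. It mirrors the CAST(... AS int) binning from the SQL query in the question. The sample distances and the helper name bin_counts are made up for illustration.

```python
from collections import Counter

def bin_counts(values, width=1):
    """Count how many values fall into each bin of the given width.

    Each value is truncated to the lower edge of its bin, which is what
    CAST((v / width) AS int) * width does for non-negative values.
    """
    counts = Counter(int(v / width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical raw delivery distances (not taken from the real table).
sample = [0.2, 0.9, 1.5, 2.0, 2.7, 2.99]
print(bin_counts(sample))  # {0: 2, 1: 1, 2: 3}
```

When the raw values are available, letting the plotting library do this counting avoids the separate aggregation step.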
If you want a horizontal histogram rather than a bar chart, pass orientation='horizontal':
distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
# plt.style.use('dark_background')
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418, 4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]
plt.hist(counts,bins=distance, orientation='horizontal')
Since the counts are already aggregated, use a bar chart:
plt.bar(distance,counts)
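A short sketch of why the original call looked wrong, and of the bar-chart fix. plt.hist treats its first argument as raw samples, so plt.hist(counts, bins=distance) asks how many of the count values fall between 0 and 18, and only four of them do. The function name plot_counts and the axis labels are illustrative choices, not from the original posts.

```python
distance = list(range(19))
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418,
          4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]

# Values of `counts` that land inside the bin edges 0..18.
# These are the only samples plt.hist(counts, bins=distance) can draw.
inside = [c for c in counts if distance[0] <= c <= distance[-1]]
print(inside)  # [9, 4, 1, 3]

def plot_counts(distance, counts):
    """Draw pre-aggregated counts as bars: distance on x, count on y."""
    # Imported here so the check above runs even without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # width=1.0 with align="edge" makes each bar span its whole bin.
    ax.bar(distance, counts, width=1.0, align="edge", edgecolor="black")
    ax.set_xlabel("delivery distance (bin)")
    ax.set_ylabel("number of orders")
    return fig, ax
```

Call plot_counts(distance, counts) and then matplotlib.pyplot.show() to display the chart.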
The sample data is as follows:
unique_list = ['home0', 'page_a0', 'page_b0', 'page_a1', 'page_b1',
'page_c1', 'page_b2', 'page_a2', 'page_c2', 'page_c3']
sources = [0, 0, 1, 2, 2, 3, 3, 4, 4, 7, 6]
targets = [3, 4, 4, 3, 5, 6, 8, 7, 8, 9, 9]
values = [2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2]
Using the sample code from the documentation
fig = go.Figure(data=[go.Sankey(
node = dict(
pad = 15,
thickness = 20,
line = dict(color = "black", width = 0.5),
label = unique_list,
color = "blue"
),
link = dict(
source = sources,
target = targets,
value = values
))])
fig.show()
This outputs the following Sankey diagram:
However, I would like all the nodes whose labels end in the same number to sit in the same vertical column, just as the leftmost column contains all the nodes ending in 0. I see in the docs that it is possible to move the node positions, but I was wondering whether there is a cleaner way to do it than manually entering x and y values. Any help is appreciated.
In go.Sankey() set arrangement='snap' and adjust x and y positions in x=<list> and y=<list>. The following setup will place your nodes as requested.
Plot:
Please note that the y-values are not explicitly set in this example. As soon as there is more than one node for a common x-value, the y-values are adjusted automatically so that all nodes are displayed in the same vertical position. If you do want to set all positions explicitly, just set arrangement='fixed'.
Edit:
I've added a custom function nodify() that assigns identical x-positions to label names that have a common ending, such as '0' in ['home0', 'page_a0', 'page_b0']. Now if you, for example, change page_c1 to page_c2, you'll get this:
Complete code:
import plotly.graph_objects as go
unique_list = ['home0', 'page_a0', 'page_b0', 'page_a1', 'page_b1',
'page_c1', 'page_b2', 'page_a2', 'page_c2', 'page_c3']
sources = [0, 0, 1, 2, 2, 3, 3, 4, 4, 7, 6]
targets = [3, 4, 4, 3, 5, 6, 8, 7, 8, 9, 9]
values = [2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2]
def nodify(node_names):
    # unique name endings
    ends = sorted(set(e[-1] for e in node_names))
    # intervals
    steps = 1/len(ends)
    # x-values for each unique name ending
    # for input as node position
    nodes_x = {}
    xVal = 0
    for e in ends:
        nodes_x[str(e)] = xVal
        xVal += steps
    # x and y values in list form
    x_values = [nodes_x[n[-1]] for n in node_names]
    y_values = [0.1]*len(x_values)
    return x_values, y_values
nodified = nodify(node_names=unique_list)
# plotly setup
fig = go.Figure(data=[go.Sankey(
arrangement='snap',
node = dict(
pad = 15,
thickness = 20,
line = dict(color = "black", width = 0.5),
label = unique_list,
color = "blue",
x=nodified[0],
y=nodified[1]
),
link = dict(
source = sources,
target = targets,
value = values
))])
fig.show()
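One limitation of nodify() is that it looks only at the last character of each label, so a label such as 'page_a10' would be grouped with 'home0'. The sketch below is a hypothetical variant (the name column_positions is mine, not part of Plotly) that reads the full trailing number with a regular expression. It keeps the same spacing rule as nodify(), one step of 1/len(levels) per column.

```python
import re

def column_positions(labels):
    """Return one x position per label, grouped by the trailing integer."""
    suffixes = []
    for label in labels:
        match = re.search(r"(\d+)$", label)
        if match is None:
            raise ValueError(f"label {label!r} has no numeric suffix")
        suffixes.append(int(match.group(1)))

    levels = sorted(set(suffixes))   # distinct column numbers
    step = 1 / len(levels)           # same spacing as nodify()
    x_of_level = {level: k * step for k, level in enumerate(levels)}
    return [x_of_level[s] for s in suffixes]

unique_list = ['home0', 'page_a0', 'page_b0', 'page_a1', 'page_b1',
               'page_c1', 'page_b2', 'page_a2', 'page_c2', 'page_c3']
print(column_positions(unique_list))
# [0.0, 0.0, 0.0, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 0.75]
```

The returned list can be passed as x= in the node dict, in place of nodified[0].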
I was trying to plot a 3D box in a 3D scatter plot. Basically, this was the result of an optimization problem (background is here). The box is the largest empty box possible given all the points.
In the Plotly docs I noticed an example of a 3D cube built using Mesh3d. I copied this:
import plotly.graph_objects as go
x=[ 0.93855, 0.20203, 0.54967, 0.58658, 0.39931, 0.06736, 0.61786, 0.36016, 0.12761, 0.71581, 0.81998, 0.04528, 0.08231, 0.41814, 0.58679, 0.21181, 0.34489, 0.21812, 0.46830, 0.81898,
0.57360, 0.18453, 0.99792, 0.37970, 0.51954, 0.84264, 0.22431, 0.31440, 0.23893, 0.28493, 0.76353, 0.45365, 0.44480, 0.94911, 0.98050, 0.28615, 0.02626, 0.85477, 0.60404, 0.47469,
0.10588, 0.55919, 0.42194, 0.34432, 0.80530, 0.88291, 0.53627, 0.45454, 0.01345, 0.84411, 0.04520, 0.35532, 0.45255, 0.99365, 0.72259, 0.08634, 0.78806, 0.28674, 0.57993, 0.84025,
0.22766, 0.51236, 0.83945, 0.21910, 0.41881, 0.18910, 0.00183, 0.59310, 0.12687, 0.45273, 0.14348, 0.66694, 0.28690, 0.32822, 0.93954, 0.34411, 0.25276, 0.14377, 0.08142, 0.05422,
0.51448, 0.48659, 0.66585, 0.25156, 0.69205, 0.21175, 0.72413, 0.92027, 0.79572, 0.13293, 0.81984, 0.25584, 0.42517, 0.41333, 0.75978, 0.60823, 0.83418, 0.37497, 0.10177, 0.01215]
y=[ 0.61424, 0.39918, 0.57526, 0.04537, 0.24058, 0.18701, 0.18450, 0.82907, 0.66274, 0.96315, 0.58458, 0.12807, 0.38695, 0.30646, 0.88417, 0.63859, 0.40404, 0.06445, 0.19149, 0.91259,
0.99317, 0.67468, 0.12954, 0.11868, 0.79252, 0.98170, 0.74706, 0.28944, 0.55650, 0.91190, 0.26978, 0.94868, 0.82534, 0.37846, 0.38055, 0.42637, 0.26349, 0.09109, 0.10308, 0.63728,
0.37470, 0.85528, 0.19407, 0.29683, 0.71095, 0.72789, 0.47052, 0.54725, 0.62322, 0.52442, 0.32547, 0.54581, 0.51336, 0.58652, 0.76841, 0.00042, 0.80743, 0.32560, 0.29931, 0.19091,
0.95850, 0.42236, 0.70728, 0.85435, 0.79661, 0.14909, 0.80658, 0.36827, 0.46344, 0.92196, 0.09802, 0.02856, 0.73966, 0.55969, 0.34595, 0.80634, 0.18350, 0.84283, 0.04560, 0.41515,
0.50151, 0.52665, 0.44211, 0.48040, 0.39643, 0.99743, 0.18206, 0.09721, 0.33793, 0.69245, 0.97670, 0.70870, 0.75288, 0.51147, 0.22298, 0.84305, 0.62014, 0.41474, 0.82815, 0.42865]
z=[ 0.13338, 0.81253, 0.46946, 0.76145, 0.83335, 0.96434, 0.79175, 0.20481, 0.60056, 0.26519, 0.89917, 0.16271, 0.02890, 0.49017, 0.18970, 0.16751, 0.47065, 0.85533, 0.73768, 0.14031,
0.92923, 0.11933, 0.40330, 0.46713, 0.69964, 0.25784, 0.87656, 0.25886, 0.64603, 0.92604, 0.83728, 0.71988, 0.48486, 0.57123, 0.78618, 0.70429, 0.30544, 0.20687, 0.47584, 0.58176,
0.43336, 0.35453, 0.96509, 0.98293, 0.88605, 0.70571, 0.51733, 0.09292, 0.69618, 0.76415, 0.82743, 0.99876, 0.86101, 0.58373, 0.03917, 0.60540, 0.59567, 0.94481, 0.35552, 0.80555,
0.97449, 0.31020, 0.61952, 0.48569, 0.50740, 0.69248, 0.01918, 0.04973, 0.21958, 0.98663, 0.09143, 0.24220, 0.96312, 0.66227, 0.91103, 0.26285, 0.28079, 0.10938, 0.07499, 0.34065,
0.83692, 0.33815, 0.89640, 0.06275, 0.01852, 0.08153, 0.88351, 0.08171, 0.87036, 0.51620, 0.90021, 0.67128, 0.36607, 0.54804, 0.72661, 0.18951, 0.11629, 0.46170, 0.24500, 0.88841]
fig = go.Figure(data=[
go.Scatter3d(x=x, y=y, z=z,
mode='markers',
marker=dict(size=2)
),
go.Mesh3d(
# 8 vertices of a cube
x=[0.608, 0.608, 0.998, 0.998, 0.608, 0.608, 0.998, 0.998],
y=[0.091, 0.963, 0.963, 0.091, 0.091, 0.963, 0.963, 0.091],
z=[0.140, 0.140, 0.140, 0.140, 0.571, 0.571, 0.571, 0.571],
i = [7, 0, 0, 0, 4, 4, 6, 6, 4, 0, 3, 2],
j = [3, 4, 1, 2, 5, 6, 5, 2, 0, 1, 6, 3],
k = [0, 7, 2, 3, 6, 7, 1, 1, 5, 5, 7, 6],
opacity=0.6,
color='#DC143C'
)
])
fig.show()
However, the rendered picture clearly shows the individual triangles. Is there a better way to draw a 3D box in Plotly?
For me using the argument flatshading = True did the job.
Code
fig = go.Figure(data=[
go.Scatter3d(x=x, y=y, z=z,
mode='markers',
marker=dict(size=2)
),
go.Mesh3d(
# 8 vertices of a cube
x=[0.608, 0.608, 0.998, 0.998, 0.608, 0.608, 0.998, 0.998],
y=[0.091, 0.963, 0.963, 0.091, 0.091, 0.963, 0.963, 0.091],
z=[0.140, 0.140, 0.140, 0.140, 0.571, 0.571, 0.571, 0.571],
i = [7, 0, 0, 0, 4, 4, 6, 6, 4, 0, 3, 2],
j = [3, 4, 1, 2, 5, 6, 5, 2, 0, 1, 6, 3],
k = [0, 7, 2, 3, 6, 7, 1, 1, 5, 5, 7, 6],
opacity=0.6,
color='#DC143C',
flatshading = True
)
])
Output
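Because the box is axis-aligned, it is fully described by two opposite corners. The sketch below builds the Mesh3d arguments from those corners, reusing the vertex order and the i, j, k triangle indices from the code above. The helper name box_mesh is hypothetical. It only prepares plain lists and does not call Plotly itself.

```python
def box_mesh(lo, hi):
    """Return x, y, z, i, j, k lists for an axis-aligned box.

    lo and hi are (x, y, z) tuples for two opposite corners.
    """
    (x0, y0, z0), (x1, y1, z1) = lo, hi
    return {
        # 8 vertices: bottom face (z0) first, then top face (z1)
        "x": [x0, x0, x1, x1, x0, x0, x1, x1],
        "y": [y0, y1, y1, y0, y0, y1, y1, y0],
        "z": [z0, z0, z0, z0, z1, z1, z1, z1],
        # 12 triangles, two per face
        "i": [7, 0, 0, 0, 4, 4, 6, 6, 4, 0, 3, 2],
        "j": [3, 4, 1, 2, 5, 6, 5, 2, 0, 1, 6, 3],
        "k": [0, 7, 2, 3, 6, 7, 1, 1, 5, 5, 7, 6],
    }

box = box_mesh((0.608, 0.091, 0.140), (0.998, 0.963, 0.571))
print(box["x"])
```

The dict can be unpacked into the trace, for example go.Mesh3d(**box, opacity=0.6, color='#DC143C', flatshading=True).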
In matplotlib, you can use Poly3DCollection to create the shape you need by defining its corners.
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from itertools import product
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# create list of corners
z = list(product([-1,1], repeat=3))
# set verts connectors
verts = [[z[0],z[1],z[5],z[4]], [z[4],z[6],z[7],z[5]], [z[7], z[6], z[2], z[3]], [z[2], z[0], z[1], z[3]],
[z[5], z[7], z[3], z[1]], [z[0], z[2], z[6], z[4]]]
ax.set_xlim3d(-2,2)
ax.set_ylim3d(-2,2)
ax.set_zlim3d(-2,2)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
# plot sides
ax.add_collection3d(Poly3DCollection(verts,facecolors='blue', linewidths=1, edgecolors='black', alpha=.1))
plt.show()
Output:
I am trying to do a piecewise linear regression in Python, and the data looks like this:
I need to fit three lines, one for each section. Any idea how? I have the following code, but the result is shown below. Any help would be appreciated.
import numpy as np
import matplotlib
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy import optimize
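def piecewise(x, x0, x1, y0, y1, k0, k1, k2):
    # Three linear pieces with breakpoints at x0 and x1.
    # The last condition uses >= so that x == x1 is not left uncovered.
    return np.piecewise(
        x,
        [x <= x0, np.logical_and(x0 < x, x < x1), x >= x1],
        [lambda x: k0*x + y0,
         lambda x: k1*(x - x0) + y1 + k0*x0,
         lambda x: k2*(x - x1) + y0 + y1 + k0*x0 + k1*(x1 - x0)])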
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18,19,20,21], dtype=float)
y1 = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,145,147,149,151,153,155])
y1 = np.flip(y1,0)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18,19,20,21], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,145,147,149,151,153,155])
y = np.flip(y,0)
perr_min = np.inf
p_best = None
for n in range(100):
    k = np.random.rand(7)*20
    p, e = optimize.curve_fit(piecewise, x1, y1, p0=k)
    perr = np.sum(np.abs(y1 - piecewise(x1, *p)))
    if perr < perr_min:
        perr_min = perr
        p_best = p
xd = np.linspace(0, 21, 100)
plt.figure()
plt.plot(x1, y1, "o")
y_out = piecewise(xd, *p_best)
plt.plot(xd, y_out)
plt.show()
data with fit
Thanks.
A very simple method (no iteration, no initial guess) can solve this problem.
The calculation method comes from page 30 of this paper: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf (copy below).
The next figure shows the result:
The equation of the fitted function is:
Or equivalently:
H is the Heaviside function.
In addition, the details of the numerical calculation are given below:
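Separately from the paper's method, the data in the question can be cross-checked with a brute-force approach: try every way of cutting the 21 points into three consecutive runs, fit an ordinary least-squares line to each run, and keep the split with the smallest total squared error. This is a swapped-in technique, not the calculation referenced above, and it does not force the three lines to join at the breaks. The helper names fit_line and best_three_segments are hypothetical, and the sketch uses only the standard library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def best_three_segments(xs, ys, min_pts=2):
    """Try all split indices (a, b); return (total_sse, (a, b), fits)."""
    best = None
    n = len(xs)
    for a in range(min_pts, n - 2 * min_pts + 1):
        for b in range(a + min_pts, n - min_pts + 1):
            fits = [fit_line(xs[s:e], ys[s:e])
                    for s, e in ((0, a), (a, b), (b, n))]
            total = sum(f[2] for f in fits)
            if best is None or total < best[0]:
                best = (total, (a, b), fits)
    return best

# Data from the question, with y reversed as np.flip does there.
x = list(range(1, 22))
y = [5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36,
     112.25, 126.14, 140.03, 145, 147, 149, 151, 153, 155][::-1]

total, (a, b), fits = best_three_segments(x, y)
print("segments end after x =", x[a - 1], "and x =", x[b - 1])
for slope, intercept, _ in fits:
    print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

For this data the search places the breaks after x = 6 and x = 15, with slopes close to -2, -13.89 and -2. The exhaustive search is quadratic in the number of points, which is fine for 21 points but would need a smarter strategy for large data sets.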