Generate a data set consisting of N=100 2-dimensional samples - python

How do I generate a data set consisting of N = 100 two-dimensional samples x = (x₁, x₂)ᵀ ∈ ℝ² drawn from a 2-dimensional Gaussian distribution, with mean
µ = (1, 1)ᵀ
and covariance matrix
Σ = [[0.3, 0.2],
     [0.2, 0.2]]
I'm told that in Matlab you could use the function randn, but I don't know how to do the equivalent in Python.

Just to elaborate on @EamonNerbonne's answer: the following uses a Cholesky decomposition of the covariance matrix to generate correlated variables from uncorrelated, normally distributed random variables.
import numpy as np
import matplotlib.pyplot as plt

linalg = np.linalg

N = 1000
mean = [1, 1]
cov = [[0.3, 0.2], [0.2, 0.2]]

# Method 1: let numpy draw from the multivariate normal directly.
data = np.random.multivariate_normal(mean, cov, N)

# Method 2: transform uncorrelated standard normals with the Cholesky factor.
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
uncorrelated = np.random.standard_normal((2, N))
data2 = np.dot(L, uncorrelated) + np.array(mean).reshape(2, 1)
# print(data2.shape)
# (2, 1000)

plt.scatter(data2[0, :], data2[1, :], c='green')
plt.scatter(data[:, 0], data[:, 1], c='yellow')
plt.show()
The yellow dots were generated by np.random.multivariate_normal; the green dots were generated by multiplying uncorrelated normal samples by the Cholesky factor L.
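As a quick sanity check, the empirical covariance of either sample set should come out close to cov (note the layouts: data2 is 2 x N, data is N x 2):
print(np.cov(data2))               # ~[[0.3, 0.2], [0.2, 0.2]]
print(np.cov(data, rowvar=False))  # same check for the N x 2 layout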

You are looking for numpy.random.multivariate_normal
Code
>>> import numpy
>>> print(numpy.random.multivariate_normal([1, 1], [[0.3, 0.2], [0.2, 0.2]], 100))
[[ 0.02999043 0.09590078]
[ 1.35743021 1.08199363]
[ 1.15721179 0.87750625]
[ 0.96879114 0.94503228]
[ 1.23989167 1.13473083]
[ 1.55917608 0.81530847]
[ 0.89985651 0.7071519 ]
[ 0.37494324 0.739433 ]
[ 1.45121732 1.17168444]
[ 0.69680785 1.2727178 ]
[ 0.35600769 0.46569276]
[ 2.14187488 1.8758589 ]
[ 1.59276393 1.54971412]
[ 1.71227009 1.63429704]
[ 1.05013136 1.1669758 ]
[ 1.34344004 1.37369725]
[ 1.82975724 1.49866636]
[ 0.80553877 1.26753018]
[ 1.74331784 1.27211784]
[ 1.23044292 1.18110192]
[ 1.07675493 1.05940509]
[ 0.15495771 0.64536509]
[ 0.77409745 1.0174171 ]
[ 1.20062726 1.3870498 ]
[ 0.39619719 0.77919884]
[ 0.87209168 1.00248145]
[ 1.32273339 1.54428262]
[ 2.11848535 1.44338789]
[ 1.45226461 1.42061198]
[ 0.33775737 0.24968543]
[ 1.06982557 0.64674411]
[ 0.92113229 1.0583153 ]
[ 0.54987592 0.73198037]
[ 1.06559727 0.77891362]
[ 0.84371805 0.72957046]
[ 1.83614557 1.40582746]
[ 0.53146009 0.72294094]
[ 0.98927818 0.73732053]
[ 1.03984002 0.89426628]
[ 0.38142362 0.32471126]
[ 1.44464929 1.15407227]
[-0.22601279 0.21045592]
[-0.01995875 0.45051782]
[ 0.58779449 0.44486237]
[ 1.31335981 0.92875936]
[ 0.42200098 0.6942829 ]
[ 0.10714426 0.11083002]
[ 1.44997839 1.19052704]
[ 0.78630506 0.45877582]
[ 1.63432202 1.95066539]
[ 0.56680926 0.92203111]
[ 0.08841491 0.62890576]
[ 1.4703602 1.4924649 ]
[ 1.01118864 1.44749407]
[ 1.19936276 1.02534702]
[ 0.67893239 0.8482461 ]
[ 0.71537211 0.53279103]
[ 1.08031573 1.00779064]
[ 0.66412568 0.57121041]
[ 0.96098528 0.72318386]
[ 0.7690299 0.76058713]
[ 0.77466896 0.77559282]
[ 0.47906664 0.58602633]
[ 0.52481326 0.78486453]
[-0.40240438 0.17374116]
[ 0.75730444 0.22365892]
[ 0.67811008 1.17730408]
[ 1.62245699 1.71775386]
[ 1.12317847 1.04252136]
[-0.06461117 0.23557416]
[ 0.46299482 0.51585414]
[ 0.88125676 1.23284201]
[ 0.57920534 0.63765861]
[ 0.88239858 1.32092112]
[ 0.63500551 0.94788141]
[ 1.76588148 1.63856465]
[ 0.65026599 0.6899672 ]
[ 0.06854287 0.29712499]
[ 0.61575737 0.87526625]
[ 0.30057552 0.54475194]
[ 0.66578769 0.21034844]
[ 0.94670438 0.7699764 ]
[ 0.39870371 0.91681577]
[ 1.37531351 1.62337899]
[ 1.92350877 1.34382017]
[ 0.56631877 0.77456137]
[ 1.18702642 0.63700271]
[ 0.74002244 1.04535471]
[ 0.3272063 0.75097037]
[ 1.57583435 1.55809705]
[ 0.44325124 0.39620769]
[ 0.59762516 0.58304621]
[ 0.72253698 0.68302097]
[ 0.93459597 1.01101948]
[ 0.50139577 0.52500942]
[ 0.84696441 0.68679341]
[ 0.63483432 0.22205385]
[ 1.43642478 1.34724612]
[ 1.58663111 1.49941374]
[ 0.73832806 0.95690866]]

Although numpy has handy utility functions, you can always "rescale" multiple independent, normally distributed variables to match a given covariance matrix. If you generate a column vector x (or many such vectors grouped in a matrix) in which each element is normally distributed, and you scale it by a matrix M, the result has covariance M Mᵀ. Conversely, if you decompose your covariance C into the form M Mᵀ, it's really simple to generate such a distribution even without the utility functions numpy provides: just multiply your batch of normally distributed vectors by M (see the sketch after this list).
This is perhaps not the answer you're directly looking for, but it's useful to keep in mind, e.g.:
- if you ever find yourself scaling the result of the random generation, you could instead fold the scaling into your initial covariance;
- if you ever need to port the code to a library that doesn't directly support such a utility method, it's very easy to implement yourself.
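A minimal sketch of that idea, using Cholesky as one convenient way to obtain an M with M Mᵀ = C:
import numpy as np
C = np.array([[0.3, 0.2],
              [0.2, 0.2]])
M = np.linalg.cholesky(C)                  # one valid M with M @ M.T == C
x = np.random.standard_normal((2, 100000)) # uncorrelated standard normals
samples = M @ x                            # each column now has covariance ~C
print(np.cov(samples))                     # close to C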

Related

Python: How to create trajectory between start and end positions with specified speeds in a vectorized way?

I am trying to find a way to create a trajectory between two points with specified speeds in a vectorized way.
For example start point is [0, 0] and end point is [25, 15].
Next I specify speeds [25, 4, 8, 2] and their corresponding probabilities [0.4, 0.2, 0.3, 0.1]. So for each time interval, the speed can be 25 m/s (with probability 40%), 4 m/s (probability 20%), etc.
Here is an example of desired output:
[[0,0], [3,1], [6,3.5], [12,7], [14,8], [19,11], [25,15]]
As you can see, the object moved from [0,0] to [3,1] at a speed of e.g. 4 m/s, then from [3,1] to [6,3.5] at e.g. 8 m/s, etc.
(Note: this is just an example with approximate coordinates.)
Here is my attempt to create a script to generate such trajectories:
import numpy as np
from math import asin, degrees
ue_speed = [3, 4, 8, 25]
ue_speed_prob = [0.4, 0.2, 0.3, 0.1]
steps = 20 # this parameter hardcoded and should be removed
square_size = 100 # need for scaling
time_interval = 100 # need for scaling
start_coord = [0, 0]
end_coord = [25, 15]
b = end_coord[0] - start_coord[0]
a = end_coord[1] - start_coord[1]
c = (a**2 + b**2)**0.5
theta = [degrees(asin(a/c))] * (steps - 1)
start = [start_coord]
v = np.random.choice(ue_speed, size=steps-1, p=ue_speed_prob)
R = np.expand_dims(((v * time_interval) / square_size), axis=-1)
xy = np.dstack((np.cos(theta), np.sin(theta))) * R
trajectory_ = np.hstack((np.zeros((1, 1, 2)), np.cumsum(xy, axis=1)))
trajectory = np.abs(trajectory_[0] + start)
Output:
array([[ 0. , 0. ],
[ 2.69850332, 1.31075545],
[ 5.39700664, 2.62151089],
[ 8.09550996, 3.93226634],
[10.79401327, 5.24302179],
[14.3920177 , 6.99069571],
[21.58802655, 10.48604357],
[24.28652987, 11.79679902],
[26.98503318, 13.10755446],
[30.58303761, 14.85522839],
[33.28154093, 16.16598384],
[35.98004425, 17.47673929],
[38.67854756, 18.78749473],
[45.87455641, 22.28284259],
[49.47256084, 24.03051652],
[56.66856969, 27.52586437],
[59.36707301, 28.83661982],
[62.06557632, 30.14737527],
[65.66358075, 31.8950492 ],
[69.26158517, 33.64272312]])
The output is not correct; the end point should be [25, 15].
Is it possible to change the code above to generate correct results?
First, I think you were fooled by a simple yet subtle mathematical error. In your illustration, you show the first path segment going from (0,0) to (3,1). This, however, does not correspond to a "speed" of 4, as your object moves along the diagonal path. For the first line segment you would get a speed of
v = (1**2 + 3**2)**0.5 (= 3.162)
Even less obvious to me is how you got a speed of 4 in the second line segment, but you mentioned approximate coordinates, so I will assume these are not the exact coordinates you are looking for.
That aside, you will not generally be able to reach the exact end point with the specified discrete speeds, so I will show a solution that gets just past the end point and stops there.
import numpy as np
ue_speed = [3, 4, 8, 25]
ue_speed_prob = [0.4, 0.2, 0.3, 0.1]
square_size = 100 # need for scaling
time_interval = 100 # need for scaling
start_coord = [0, 0]
end_coord = [25, 15]
b = end_coord[0] - start_coord[0]
a = end_coord[1] - start_coord[1]
c = (a**2+b**2)**0.5
theta = np.arctan2(a,b)
steps = int(c//(min(ue_speed)*time_interval/square_size) + 1) # fixed scaling issue
v = np.random.choice(ue_speed, size=steps, p=ue_speed_prob)
R = np.cumsum((v * time_interval) / square_size)
R = R[:np.argmax(R>c)+1]
P = np.column_stack((R * np.cos(theta) + start_coord[0],
                     R * np.sin(theta) + start_coord[1]))
trajectory = P
This will print trajectory as
[[ 2.57247878 1.54348727]
[ 5.14495755 3.08697453]
[12.00490096 7.20294058]
[33.4422241 20.06533446]]
Note: You also made a serious error in your code when you converted the result of math.asin to degrees and then fed it to trigonometric functions like np.sin and np.cos, which expect radians. I strongly recommend you stick to radian angle values and only convert them to degrees when you want to print them.
I also recommend using an arctan2-style function to correctly get the angle from an x and y coordinate, as this also works for negative x directions.
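A tiny demonstration of the degrees pitfall, using the question's own geometry:
import numpy as np
from math import asin, degrees
a, b = 15, 25
c = (a**2 + b**2)**0.5
theta_deg = degrees(asin(a / c))   # ~30.96, in degrees
print(np.cos(theta_deg), np.sin(theta_deg))                           # wrong: 30.96 is treated as radians
print(np.cos(np.radians(theta_deg)), np.sin(np.radians(theta_deg)))  # correct direction components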
A correct, vectorized solution follows.
You will notice one of the samples is the end point. You can play with the numbers.
Also, I think it is best to stay with Cartesian coordinates; there is no need to work with angles at all for this.
import numpy as np
# ue_speed = np.array([3, 4, 8, 25])
ue_speed = np.array([1, 1, 1, 1])
ue_speed_prob = np.array([0.4, 0.2, 0.3, 0.1])
steps = 200 # this parameter hardcoded and should be removed
time_interval = 1 # need for scaling
start_coord = np.array([0, 0])
end_coord = np.array([25, 15])
r_direction = end_coord - start_coord
r_hat = r_direction / np.linalg.norm(r_direction)
v = np.random.choice(ue_speed, size=steps + 1, p=ue_speed_prob)
dr = v * time_interval
dr_vecs = r_hat[np.newaxis, :] * dr[:, np.newaxis]
r_vecs = np.cumsum(dr_vecs, axis=0)
r = start_coord + r_vecs
print(r_vecs)
With given parameters, this outputs
[[ 0.85749293 0.51449576]
[ 1.71498585 1.02899151]
[ 2.57247878 1.54348727]
[ 3.4299717 2.05798302]
[ 4.28746463 2.57247878]
[ 5.14495755 3.08697453]
[ 6.00245048 3.60147029]
[ 6.85994341 4.11596604]
[ 7.71743633 4.6304618 ]
[ 8.57492926 5.14495755]
[ 9.43242218 5.65945331]
[ 10.28991511 6.17394907]
[ 11.14740803 6.68844482]
[ 12.00490096 7.20294058]
[ 12.86239389 7.71743633]
[ 13.71988681 8.23193209]
[ 14.57737974 8.74642784]
[ 15.43487266 9.2609236 ]
[ 16.29236559 9.77541935]
[ 17.14985851 10.28991511]
[ 18.00735144 10.80441086]
[ 18.86484437 11.31890662]
[ 19.72233729 11.83340237]
[ 20.57983022 12.34789813]
[ 21.43732314 12.86239389]
[ 22.29481607 13.37688964]
[ 23.15230899 13.8913854 ]
[ 24.00980192 14.40588115]
[ 24.86729485 14.92037691]
[ 25.72478777 15.43487266]
[ 26.5822807 15.94936842]
[ 27.43977362 16.46386417]
[ 28.29726655 16.97835993]
[ 29.15475947 17.49285568]
[ 30.0122524 18.00735144]
[ 30.86974533 18.5218472 ]
[ 31.72723825 19.03634295]
[ 32.58473118 19.55083871]
[ 33.4422241 20.06533446]
[ 34.29971703 20.57983022]
[ 35.15720995 21.09432597]
[ 36.01470288 21.60882173]
[ 36.87219581 22.12331748]
[ 37.72968873 22.63781324]
[ 38.58718166 23.15230899]
[ 39.44467458 23.66680475]
[ 40.30216751 24.18130051]
[ 41.15966043 24.69579626]
[ 42.01715336 25.21029202]
[ 42.87464629 25.72478777]
[ 43.73213921 26.23928353]
[ 44.58963214 26.75377928]
[ 45.44712506 27.26827504]
[ 46.30461799 27.78277079]
[ 47.16211091 28.29726655]
[ 48.01960384 28.8117623 ]
[ 48.87709677 29.32625806]
[ 49.73458969 29.84075381]
[ 50.59208262 30.35524957]
[ 51.44957554 30.86974533]
[ 52.30706847 31.38424108]
[ 53.16456139 31.89873684]
[ 54.02205432 32.41323259]
[ 54.87954725 32.92772835]
[ 55.73704017 33.4422241 ]
[ 56.5945331 33.95671986]
[ 57.45202602 34.47121561]
[ 58.30951895 34.98571137]
[ 59.16701187 35.50020712]
[ 60.0245048 36.01470288]
[ 60.88199773 36.52919864]
[ 61.73949065 37.04369439]
[ 62.59698358 37.55819015]
[ 63.4544765 38.0726859 ]
[ 64.31196943 38.58718166]
[ 65.16946235 39.10167741]
[ 66.02695528 39.61617317]
[ 66.88444821 40.13066892]
[ 67.74194113 40.64516468]
[ 68.59943406 41.15966043]
[ 69.45692698 41.67415619]
[ 70.31441991 42.18865195]
[ 71.17191283 42.7031477 ]
[ 72.02940576 43.21764346]
[ 72.88689869 43.73213921]
[ 73.74439161 44.24663497]
[ 74.60188454 44.76113072]
[ 75.45937746 45.27562648]
[ 76.31687039 45.79012223]
[ 77.17436331 46.30461799]
[ 78.03185624 46.81911374]
[ 78.88934917 47.3336095 ]
[ 79.74684209 47.84810525]
[ 80.60433502 48.36260101]
[ 81.46182794 48.87709677]
[ 82.31932087 49.39159252]
[ 83.17681379 49.90608828]
[ 84.03430672 50.42058403]
[ 84.89179965 50.93507979]
[ 85.74929257 51.44957554]
[ 86.6067855 51.9640713 ]
[ 87.46427842 52.47856705]
[ 88.32177135 52.99306281]
[ 89.17926427 53.50755856]
[ 90.0367572 54.02205432]
[ 90.89425013 54.53655008]
[ 91.75174305 55.05104583]
[ 92.60923598 55.56554159]
[ 93.4667289 56.08003734]
[ 94.32422183 56.5945331 ]
[ 95.18171475 57.10902885]
[ 96.03920768 57.62352461]
[ 96.89670061 58.13802036]
[ 97.75419353 58.65251612]
[ 98.61168646 59.16701187]
[ 99.46917938 59.68150763]
[100.32667231 60.19600339]
[101.18416523 60.71049914]
[102.04165816 61.2249949 ]
[102.89915109 61.73949065]
[103.75664401 62.25398641]
[104.61413694 62.76848216]
[105.47162986 63.28297792]
[106.32912279 63.79747367]
[107.18661571 64.31196943]
[108.04410864 64.82646518]
[108.90160157 65.34096094]
[109.75909449 65.85545669]
[110.61658742 66.36995245]
[111.47408034 66.88444821]
[112.33157327 67.39894396]
[113.18906619 67.91343972]
[114.04655912 68.42793547]
[114.90405205 68.94243123]
[115.76154497 69.45692698]
[116.6190379 69.97142274]
[117.47653082 70.48591849]
[118.33402375 71.00041425]
[119.19151667 71.51491 ]
[120.0490096 72.02940576]
[120.90650253 72.54390152]
[121.76399545 73.05839727]
[122.62148838 73.57289303]
[123.4789813 74.08738878]
[124.33647423 74.60188454]
[125.19396715 75.11638029]
[126.05146008 75.63087605]
[126.90895301 76.1453718 ]
[127.76644593 76.65986756]
[128.62393886 77.17436331]
[129.48143178 77.68885907]
[130.33892471 78.20335482]
[131.19641763 78.71785058]
[132.05391056 79.23234634]
[132.91140349 79.74684209]
[133.76889641 80.26133785]
[134.62638934 80.7758336 ]
[135.48388226 81.29032936]
[136.34137519 81.80482511]
[137.19886811 82.31932087]
[138.05636104 82.83381662]
[138.91385397 83.34831238]
[139.77134689 83.86280813]
[140.62883982 84.37730389]
[141.48633274 84.89179965]
[142.34382567 85.4062954 ]
[143.20131859 85.92079116]
[144.05881152 86.43528691]
[144.91630445 86.94978267]
[145.77379737 87.46427842]
[146.6312903 87.97877418]
[147.48878322 88.49326993]
[148.34627615 89.00776569]
[149.20376907 89.52226144]
[150.061262 90.0367572 ]
[150.91875493 90.55125296]
[151.77624785 91.06574871]
[152.63374078 91.58024447]
[153.4912337 92.09474022]
[154.34872663 92.60923598]
[155.20621955 93.12373173]
[156.06371248 93.63822749]
[156.92120541 94.15272324]
[157.77869833 94.667219 ]
[158.63619126 95.18171475]
[159.49368418 95.69621051]
[160.35117711 96.21070626]
[161.20867003 96.72520202]
[162.06616296 97.23969778]
[162.92365589 97.75419353]
[163.78114881 98.26868929]
[164.63864174 98.78318504]
[165.49613466 99.2976808 ]
[166.35362759 99.81217655]
[167.21112051 100.32667231]
[168.06861344 100.84116806]
[168.92610637 101.35566382]
[169.78359929 101.87015957]
[170.64109222 102.38465533]
[171.49858514 102.89915109]
[172.35607807 103.41364684]]
BONUS
If you do not want to hard-code the steps, you can do
steps = int((np.linalg.norm(end_coord - start_coord) / np.min(ue_speed)) / time_interval)
steps = steps if steps > 0 else 1
and the last sample will be approximately the end coord.

What does negative radius mean for ellipsoid

I've been using the Ellipsoid fit python module from https://github.com/aleksandrbazhin/ellipsoid_fit_python and I've mostly found it to be relatively good, but I've recently been running some data through it and I notice that I'm getting lots of negative radii:
points = np.array([[ 0.09149729, 0.03684962, -0.02292631],
[ 0.09248848, 0.03587991, -0.02036695],
[ 0.09290258, 0.03932948, -0.02168421],
[ 0.11715488, 0.02191344, -0.03957262],
[ 0.09938425, 0.02479092, -0.01535327],
[ 0.09911977, 0.02794963, -0.01118133],
[ 0.12063151, 0.03880141, -0.01510232],
[ 0.11984777, 0.02508288, -0.02870339],
[ 0.10012223, 0.02373475, -0.02195443],
[ 0.09790555, 0.02624265, -0.01190708],
[ 0.10180188, 0.02583424, -0.01340349],
[ 0.12224249, 0.02299428, -0.03712141],
[ 0.12637239, 0.03043518, -0.02760782],
[ 0.12438858, 0.02703345, -0.02828939],
[ 0.0974825 , 0.02577809, -0.01916746],
[ 0.12031736, 0.02822308, -0.03366493],
[ 0.1021885 , 0.02674174, -0.03242179],
[ 0.10101997, 0.03994928, -0.01519449],
[ 0.12693756, 0.03200349, -0.02941957],
[ 0.09250743, 0.0386544 , -0.02030381],
[ 0.11748721, 0.02688126, -0.02310617],
[ 0.11888266, 0.03919276, -0.01614771],
[ 0.1175726 , 0.02390139, -0.03775631],
[ 0.09802308, 0.02690862, -0.02278864],
[ 0.0974572 , 0.02665273, -0.0109419 ],
[ 0.11867452, 0.03764389, -0.01400771],
[ 0.10302589, 0.04016999, -0.01659405],
[ 0.12613943, 0.03701292, -0.02291183],
[ 0.12622967, 0.03926508, -0.01887258]])
centre3, radii3, evecs3, v3 = ellipsoid_fit(points)
# radii3 = [-0.00490022, 0.05778404, -0.01372089]
For some reason the ellipsoid_fit function applies a sign to the radii. I don't understand why a radius would have a sign; shouldn't they just be absolute values?
Can I simply ignore these signs and take the absolute values? If not, what does a negative radius mean?
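One way to probe what the signs mean is to look at the quadric the module actually fitted: if the eigenvalues of its quadratic-form matrix do not all share the same sign, the best-fitting quadric is not an ellipsoid at all (e.g. a hyperboloid), and the "radii" lose their usual meaning. A hedged sketch, assuming v3 holds the ten algebraic coefficients in the common (A, B, C, D, E, F, G, H, I, J) ordering of A x² + B y² + C z² + 2D xy + 2E xz + 2F yz + 2G x + 2H y + 2I z + J = 0 — this may not match the module's internal ordering:
import numpy as np
A, B, C, D, E, F = v3[0], v3[1], v3[2], v3[3], v3[4], v3[5]
Q = np.array([[A, D, E],
              [D, B, F],
              [E, F, C]])
print(np.linalg.eigvalsh(Q))  # mixed signs => the fitted quadric is not an ellipsoid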

Trying to create a graph dynamically

I can manually create a chart of kmeans data, with 5 centroids (code below).
from scipy.cluster.vq import kmeans, vq
from pylab import plot, show

# computing K-Means with K = 5 (5 clusters)
centroids,_ = kmeans(data, 5)
# assign each sample to a cluster
idx,_ = vq(data, centroids)
# some plotting using numpy's logical indexing
plot(data[idx==0,0], data[idx==0,1], 'ob',
     data[idx==1,0], data[idx==1,1], 'oy',
     data[idx==2,0], data[idx==2,1], 'or',
     data[idx==3,0], data[idx==3,1], 'og',
     data[idx==4,0], data[idx==4,1], 'om')
plot(centroids[:,0], centroids[:,1], 'sg', markersize=15)
show()
Now, I am trying to figure out how to create the chart dynamically in Python. I think it should be something like this (below), but it doesn't actually work.
for i in range(2, 20):
    plot(data[idx==[i],0], data[idx==[i],1], 'some_dynamic_color')
plot(centroids[:,0], centroids[:,1], 'sg', markersize=15)
show()
Finally, here is my array of data, for reference. Not sure it's even relevant to the problem at hand.
array([[ 0.01160815, 0.28552583],
[ 0.01495681, 0.24965798],
[ 0.52218559, 0.26969486],
[ 0.16408791, 0.30713289],
[ 0.35037607, 0.28401598],
[-0.32413957, 0.53144262],
[ 0.10853278, 0.19756793],
[ 0.08275109, 0.18140047],
[-0.04350157, 0.26407197],
[-0.04789838, 0.31644537],
[-0.03852801, 0.21557165],
[ 0.02213885, 0.20033466],
[-0.80612714, 0.35888803],
[-0.27971428, 0.3195602 ],
[ 0.21359135, 0.14144335],
[ 0.09936109, 0.22313638],
[ 0.15504834, 0.17022939],
[ 0.47012351, 0.41452523],
[ 0.28616062, 0.23098198],
[ 0.25941178, 0.14843141],
[ 0.20049158, 0.23769455],
[-0.19766684, 0.39110416],
[-0.29619519, 0.53520109],
[ 0.29319037, 0.23907492],
[ 0.16644319, 0.18737667],
[ 0.37407685, 0.22463339],
[-0.34262982, 0.40264906],
[ 0.52658291, 0.3542729 ],
[ 0.5747167 , 0.50042607],
[ 0.15607962, 0.20861585],
[-0.50769188, 0.34266008],
[ 0.43373588, 0.22526141],
[ 0.1624051 , 0.29859298],
[ 0.22789948, 0.20157262],
[-0.1179015 , 0.21471169],
[ 0.26108742, 0.26604149],
[ 0.10019146, 0.25547835],
[ 0.18906467, 0.19078555],
[-0.02575308, 0.2877592 ],
[-0.45292564, 0.51866493],
[ 0.11516754, 0.21504329],
[ 0.10020043, 0.23943587],
[ 0.21402611, 0.34297039],
[ 0.24574342, 0.15734118],
[ 0.58083355, 0.22886509],
[ 0.33975699, 0.33309233],
[ 0.19002609, 0.14372212],
[ 0.35220577, 0.23879166],
[ 0.27427999, 0.1529184 ],
[ 0.06261825, 0.18908223],
[ 0.25005859, 0.21363957],
[ 0.1676683 , 0.26111871],
[ 0.14703364, 0.25532777],
[ 0.26130579, 0.14012819],
[-0.14897454, 0.23037735],
[-0.26827493, 0.23193457],
[ 0.51701526, 0.17887009],
[-0.05870745, 0.18040883],
[ 0.25651599, 0.227289 ],
[ 0.06881783, 0.28114007],
[ 0.43079653, 0.21510341]])
Any thoughts on how I can create the chart dynamically?
Thanks.
The index i of the for loop should run from 0 to 4 (there are 5 centroids), e.g. reusing the color codes from your manual version:
colors = ['ob', 'oy', 'or', 'og', 'om']
for i in range(0, 5):
    plot(data[idx==i,0], data[idx==i,1], colors[i])
I reproduced it as below, using matplotlib and scikit-learn.
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
from sklearn.cluster import KMeans

data = np.array(...)  # your data array from the question
kmeans = KMeans(n_clusters=5)
kmeans.fit(data)
y_kmeans = kmeans.predict(data)
viridis = cm.get_cmap('viridis', 5)
for i in range(0, len(data)):
    plt.scatter(data[i,0], data[i,1], c=viridis(y_kmeans[i]), s=50)
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5)
plt.show()
k-Means ref https://jakevdp.github.io/PythonDataScienceHandbook/05.11-k-means.html
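For larger data sets you can avoid the Python-level loop by passing all labels to a single scatter call; a minimal sketch reusing the kmeans fit above:
plt.scatter(data[:, 0], data[:, 1], c=y_kmeans, cmap='viridis', s=50)
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5)
plt.show()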

matplotlib plot multiple 2 point lines, not joined up

I'd like to draw a series of small ticks on a scatter plot; the pairs of x and y points are stored in two n×2 arrays. Instead of short ticks between each pair of points, it is drawing lines between all the points. Do I need to create n separate lines?
xs.round(2)
Out[212]:
array([[ 555.59, 557.17],
[ 867.64, 869. ],
[ 581.95, 583.25],
[ 822.08, 823.47],
[ 198.46, 199.91],
[ 887.29, 888.84],
[ 308.68, 310.06],
[ 340.1 , 341.52],
[ 351.68, 353.21],
[ 789.45, 790.89]])
ys.round(2)
Out[213]:
array([[ 737.55, 738.78],
[ 404.7 , 406.17],
[ 7.17, 8.69],
[ 276.72, 278.16],
[ 84.71, 86.1 ],
[ 311.89, 313.14],
[ 615.63, 617.08],
[ 653.9 , 655.32],
[ 76.33, 77.62],
[ 858.54, 859.93]])
plt.plot(xs, ys)
The easiest solution is indeed to plot n lines.
import numpy as np
import matplotlib.pyplot as plt
xs =np.array([[ 555.59, 557.17],
[ 867.64, 869. ],
[ 581.95, 583.25],
[ 822.08, 823.47],
[ 198.46, 199.91],
[ 887.29, 888.84],
[ 308.68, 310.06],
[ 340.1 , 341.52],
[ 351.68, 353.21],
[ 789.45, 790.89]])
ys = np.array([[ 737.55, 738.78],
[ 404.7 , 406.17],
[ 7.17, 8.69],
[ 276.72, 278.16],
[ 84.71, 86.1 ],
[ 311.89, 313.14],
[ 615.63, 617.08],
[ 653.9 , 655.32],
[ 76.33, 77.62],
[ 858.54, 859.93]])
for (x, y) in zip(xs, ys):
    plt.plot(x, y, color="crimson")
plt.show()
If n is very large, a more efficient solution would be to use a single LineCollection to show all lines. The advantage is that this can be drawn faster, since only a single collection is used instead of n line plots.
from matplotlib.collections import LineCollection

# data as above.
seq = np.concatenate((xs[:,:,np.newaxis], ys[:,:,np.newaxis]), axis=2)
c = LineCollection(seq)
plt.gca().add_collection(c)
plt.gca().autoscale()
plt.show()
You need to iterate over the end points of the arrays xs and ys:
import matplotlib.pyplot as plt
import numpy as np
xs = np.array([[ 555.59, 557.17],
[ 867.64, 869. ],
[ 581.95, 583.25],
[ 822.08, 823.47],
[ 198.46, 199.91],
[ 887.29, 888.84],
[ 308.68, 310.06],
[ 340.1 , 341.52],
[ 351.68, 353.21],
[ 789.45, 790.89]])
ys = np.array([[ 737.55, 738.78],
[ 404.7 , 406.17],
[ 7.17, 8.69],
[ 276.72, 278.16],
[ 84.71, 86.1 ],
[ 311.89, 313.14],
[ 615.63, 617.08],
[ 653.9 , 655.32],
[ 76.33, 77.62],
[ 858.54, 859.93]])
for x, y in zip(xs, ys):
    plt.plot(x, y)
plt.show()

tf.matmul to find theta that minimizes the cost function

I am using TensorFlow to calculate the value of theta that minimizes the cost function, where theta = (XᵀX)⁻¹ Xᵀ y (the Normal Equation).
I am following along the tutorial in https://github.com/ageron/handson-ml/blob/master/09_up_and_running_with_tensorflow.ipynb
booktheta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT,X)),XT),y)
mytheta = tf.matmul(tf.matrix_inverse(tf.matmul(XT,X)),tf.matmul(XT,y))
with tf.Session() as sess:
    book_tvalue = booktheta.eval()
print(book_tvalue)
[[ -3.74651413e+01]
[ 4.35734153e-01]
[ 9.33829229e-03]
[ -1.06622010e-01]
[ 6.44106984e-01]
[ -4.25131839e-06]
[ -3.77322501e-03]
[ -4.26648885e-01]
[ -4.40514028e-01]]
with tf.Session() as sess:
    my_tvalue = mytheta.eval()
print(my_tvalue)
[[ -3.74218750e+01]
[ 4.35844421e-01]
[ 9.34600830e-03]
[ -1.06735229e-01]
[ 6.44439697e-01]
[ -4.22634184e-06]
[ -3.77440453e-03]
[ -4.26208496e-01]
[ -4.40002441e-01]]
I am trying to understand why the two are different and if I am interpreting the way theta is being calculated accurately.
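The two expressions are algebraically identical, so the difference most likely comes from floating-point rounding: TensorFlow computes in float32 by default, and the two groupings of the matrix products round their intermediate results differently. A minimal NumPy sketch with hypothetical random data (not the tutorial's housing data) demonstrates the same kind of drift:
import numpy as np
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 9)).astype(np.float32)  # hypothetical stand-in data
y = rng.normal(size=(1000, 1)).astype(np.float32)
XT = X.T
inv = np.linalg.inv(XT @ X)                    # stays in float32
theta_book = (inv @ XT) @ y                    # ((X^T X)^-1 X^T) y
theta_mine = inv @ (XT @ y)                    # (X^T X)^-1 (X^T y)
print(np.abs(theta_book - theta_mine).max())   # small but nonzero in float32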
