How to plot lines for individual rows in matplotlib? - python

Each row in the dataset has three datapoints. How can I plot a line for each one, as indicated?
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
dataset = pd.read_csv('data.csv')
X = dataset.iloc[:, 0:2].values
y = dataset.iloc[:, -1].values
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], y, marker='.', color="red")
ax.set_xlabel("Cone")
ax.set_ylabel("Time")
ax.set_zlabel("Temp")
plt.show()
This is the data:
cone,ramp,temp
4,15,1141
4,60,1162
4,150,1183
5,15,1159
5,60,1186
5,150,1207
6,15,1185
6,60,1222
6,150,1243
7,15,1201
7,60,1239
7,150,1257
8,15,1211
8,60,1249
8,150,1271
9,15,1224
9,60,1260
9,150,1280
10,15,1251
10,60,1285
10,150,1305
11,15,1272
11,60,1294
11,150,1315
12,15,1285
12,60,1306
12,150,1326
13,15,1310
13,60,1331
13,150,1348
14,15,1351
14,60,1365
14,150,1384

One way is to loop over the unique values of the cone column:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection='3d')
for u in dataset["cone"].unique():
    extracted_df = dataset[dataset["cone"] == u]
    values = extracted_df.values
    ax.plot(values[:, 0], values[:, 1], values[:, 2], color="red")
ax.set_xlabel("Cone")
ax.set_ylabel("Time")
ax.set_zlabel("Temp")
plt.show()
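The same per-cone loop can also be written with pandas `groupby`, which hands back each cone's rows directly. The sketch below is a minimal variant, assuming the column names `cone`, `ramp`, and `temp` from the CSV header in the question. It inlines only the first three cones so it runs without `data.csv`; the `Agg` backend and the output file name are illustrative choices, not part of the original answer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older Matplotlib)

# First three cones from the question's data, inlined for a self-contained run.
csv_text = """cone,ramp,temp
4,15,1141
4,60,1162
4,150,1183
5,15,1159
5,60,1186
5,150,1207
6,15,1185
6,60,1222
6,150,1243
"""
dataset = pd.read_csv(io.StringIO(csv_text))

fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection="3d")

lines = []
for cone, group in dataset.groupby("cone"):
    # One line per cone, connecting its three (ramp, temp) points.
    lines.extend(ax.plot(group["cone"].to_numpy(),
                         group["ramp"].to_numpy(),
                         group["temp"].to_numpy(),
                         color="red", marker="."))

ax.set_xlabel("Cone")
ax.set_ylabel("Time")
ax.set_zlabel("Temp")
fig.savefig("cone_lines.png")  # hypothetical output name
```

Reading the full `data.csv` with `pd.read_csv('data.csv')` instead of the inlined text should give one line for each of the eleven cones.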

Related

How to label clusters after applying k-means clustering to a dataset?

I have a dataset in .csv format which looks like this -
data
x,y,z, label
2,1,3, A
5,3,1, B
6,2,2, C
9,5,3, B
2,3,4, A
4,1,4, A
I would like to apply k-means clustering to the above dataset, which is 3-dimensional (x, y, z). After that, I would like to visualize the clusters in three dimensions, with each cluster's label shown in the diagram.
For a 2-dimensional dataset I have used the following:
kmeans_labels = cluster.KMeans(n_clusters=5).fit_predict(data)
and plotted the result with:
plt.scatter(standard_embedding[:, 0], standard_embedding[:, 1], c=kmeans_labels, s=0.1, cmap='Spectral');
Similarly, I would like to plot the 3-dimensional clustering with labels. Please let me know if you need more details.
Could something like that be a good solution?
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
data = np.array([[2,1,3], [5,3,1], [6,2,2], [9,5,3], [2,3,4], [4,1,4]])
cluster_count = 3
km = KMeans(cluster_count)
clusters = km.fit_predict(data)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(data[:, 0], data[:, 1], data[:, 2], c=clusters, alpha=1)
labels = ["A", "B", "C"]
for i, label in enumerate(labels):
    ax.text(km.cluster_centers_[i, 0], km.cluster_centers_[i, 1], km.cluster_centers_[i, 2], label)
ax.set_title("3D K-Means Clustering")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
plt.show()
EDIT
If you want a legend instead, just do this:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
data = np.array([[2,1,3], [5,3,1], [6,2,2], [9,5,3], [2,3,4], [4,1,4]])
cluster_count = 3
km = KMeans(cluster_count)
clusters = km.fit_predict(data)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(data[:, 0], data[:, 1], data[:, 2], c=clusters, alpha=1)
handles = scatter.legend_elements()[0]
ax.legend(title="Clusters", handles=handles, labels = ["A", "B", "C"])
ax.set_title("3D K-Means Clustering")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
plt.show()
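Both snippets above attach "A", "B", "C" to cluster indices 0, 1, 2 in a fixed order, while k-means numbers its clusters arbitrarily. If the names should reflect the `label` column of the CSV, one possible approach (an assumption on my part, not something the question asks for explicitly) is to name each cluster after the most common original label among its members. A minimal sketch, where `true_labels` and `cluster_names` are illustrative names:

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

# Data and original labels taken from the question's CSV.
data = np.array([[2, 1, 3], [5, 3, 1], [6, 2, 2], [9, 5, 3], [2, 3, 4], [4, 1, 4]])
true_labels = np.array(["A", "B", "C", "B", "A", "A"])

# n_init and random_state are set only to make the run repeatable.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = km.fit_predict(data)

# For each cluster id, pick the most common original label among its members.
cluster_names = {}
for cluster_id in np.unique(clusters):
    members = true_labels[clusters == cluster_id]
    cluster_names[int(cluster_id)] = Counter(members.tolist()).most_common(1)[0][0]

print(cluster_names)
```

The resulting names can then be passed to `ax.text` or `ax.legend` in place of the hard-coded list. Note that two clusters can end up with the same name when one original label dominates both.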

How to plot scatter plot using python?

I have used this code to create clusters, and I want to draw a scatter plot of the clusters. vecAssembler_01 produces data with ID and features; both should be used for the scatter plot. When I run the code in Google Colab, I get the error message RecursionError: maximum recursion depth exceeded in comparison. Please correct me if I am wrong.
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
import numpy as np
import matplotlib.pyplot as plt
FEATURES_COL = ['Height(CM)', 'Weight(KG)',
                'Crossing', 'Finishing', 'HeadingAccuracy',
                'ShortPassing', 'Volleys', 'Dribbling', 'Curve',
                'FKAccuracy', 'LongPassing', 'BallControl',
                'Acceleration', 'SprintSpeed', 'Agility',
                'Reactions', 'Balance', 'ShotPower', 'Jumping',
                'Stamina', 'Strength', 'LongShots', 'Aggression',
                'Interceptions', 'Positioning', 'Vision', 'Penalties',
                'Composure', 'Marking', 'StandingTackle', 'SlidingTackle']
vecAssembler_01 = VectorAssembler(inputCols=FEATURES_COL, outputCol="features")
df_kmeansn = vecAssembler_01.transform(df).select('ID','features')
df_kmeansn.show()
#df_kmeansn.plot("ID","features",kind="scatter")
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
x = df_kmeansn.ID
y = df_kmeansn.features
ax.scatter(x, y, alpha=0.8, edgecolors='none')
The output of the df_kmeansn is as shown below.
I'm not sure you can plot a Spark DataFrame directly; perhaps you should call toPandas() first:
# ...
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df_pandas = df_kmeansn.toPandas()  # collect the Spark DataFrame into a pandas DataFrame
x = df_pandas.ID
y = df_pandas.features
ax.scatter(x, y, alpha=0.8, edgecolors='none')
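Even after the conversion to pandas (PySpark's method for this is `toPandas()`), each cell of the `features` column still holds a whole vector, so it probably cannot be passed to `scatter` as a single y value. A hedged sketch of one way around this is to stack the vectors into a 2-D array and plot two components against each other. The small DataFrame below is a stand-in with placeholder numbers, not real data from the question, and `feature_matrix` is an illustrative name:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the converted DataFrame: each "features" cell holds one vector.
# The numbers are placeholders, not real player data.
df_pandas = pd.DataFrame({
    "ID": [1, 2, 3],
    "features": [[170.0, 72.0], [182.0, 80.0], [175.0, 68.0]],
})

# Stack the per-row vectors into an (n_rows, n_features) array.
# With real Spark vectors, v.toArray() may be needed instead of np.asarray(v).
feature_matrix = np.vstack([np.asarray(v, dtype=float) for v in df_pandas["features"]])

fig, ax = plt.subplots()
# Plot the first two components against each other.
ax.scatter(feature_matrix[:, 0], feature_matrix[:, 1], alpha=0.8, edgecolors="none")
ax.set_xlabel("Height(CM)")
ax.set_ylabel("Weight(KG)")
fig.savefig("features_scatter.png")  # hypothetical output name
```

The axis labels assume the first two entries of `FEATURES_COL`; any other pair of columns of `feature_matrix` can be chosen the same way.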

How to generate a series of histograms in matplotlib?

I would like to generate a series of histograms like the one shown below:
The above visualization was done in TensorFlow, but I'd like to reproduce it in matplotlib.
EDIT:
Using plt.fill_between as suggested by @SpghttCd, I have the following code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
colors = cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
x = np.arange(100)
for i in range(10):
    y = np.random.rand(100)
    plt.fill_between(x, y + 10-i, 10-i,
                     facecolor=colors[i],
                     edgecolor='w')
plt.show()
This works great, but is it possible to use histogram instead of a continuous curve?
EDIT:
joypy-based approach, as mentioned in october's comment:
import pandas as pd
import joypy
import numpy as np
from matplotlib import cm
df = pd.DataFrame()
for i in range(0, 400, 20):
    df[i] = np.random.normal(i/410*5, size=30)
joypy.joyplot(df, overlap=2, colormap=cm.OrRd_r, linecolor='w', linewidth=.5)
For finer control of colors, you can define a color gradient function which accepts a fractional index and start and stop color tuples:
def color_gradient(x=0.0, start=(0, 0, 0), stop=(1, 1, 1)):
    r = np.interp(x, [0, 1], [start[0], stop[0]])
    g = np.interp(x, [0, 1], [start[1], stop[1]])
    b = np.interp(x, [0, 1], [start[2], stop[2]])
    return (r, g, b)
Usage:
joypy.joyplot(df, overlap=2, colormap=lambda x: color_gradient(x, start=(.78, .25, .09), stop=(1.0, .64, .44)), linecolor='w', linewidth=.5)
Examples with different start and stop tuples:
original answer:
You could iterate over the data arrays you'd like to plot with plt.fill_between, setting the colors to some gradient and the line color to white:
Creating some sample data:
import numpy as np
t = np.linspace(-1.6, 1.6, 11)
y = np.cos(t)**2
y2 = lambda : y + np.random.random(len(y))/5-.1
plot the series:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
colors = cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
for i in range(10):
    plt.fill_between(t+i, y2()+10-i/10, 10-i/10, facecolor = colors[i], edgecolor='w')
If you want it more closely tailored to your example, you should perhaps consider providing some sample data.
EDIT:
As I commented below, I'm not quite sure I understand what you want, or whether it is the best approach for your task. So here is some code which, besides the approach from your edit, plots two samples of how to present a bunch of histograms so that they are easier to compare:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cm as cm
N = 10
np.random.seed(42)
colors=cm.OrRd_r(np.linspace(.2, .6, N))
fig1 = plt.figure()
x = np.arange(100)
for i in range(10):
    y = np.random.rand(100)
    plt.fill_between(x, y + 10-i, 10-i,
                     facecolor=colors[i],
                     edgecolor='w')
data = np.random.binomial(20, .3, (N, 100))
fig2, axs = plt.subplots(N, figsize=(10, 6))
for i, d in enumerate(data):
    axs[i].hist(d, range(20), color=colors[i], label=str(i))
fig2.legend(loc='upper center', ncol=5)
fig3, ax = plt.subplots(figsize=(10, 6))
ax.hist(data.T, range(20), color=colors, label=[str(i) for i in range(N)])
fig3.legend(loc='upper center', ncol=5)
This leads to the following plots:
your plot from your edit:
N histograms in N subplots:
N histograms side by side in one plot:
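For the follow-up question of whether the stacked `fill_between` look can use histograms instead of continuous curves, one option is to compute the bin counts with `np.histogram` and draw them as step-shaped bands. This is only a sketch under my own assumptions: the overlap factor of 1.5, the integer bin edges, and the `bands` list are illustrative choices, and the sample data is random binomial data of the same kind as in the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

N = 10
rng = np.random.default_rng(42)
data = rng.binomial(20, 0.3, (N, 100))   # N samples of 100 draws each
bins = np.arange(21)                     # integer bin edges 0..20
colors = plt.get_cmap("OrRd_r")(np.linspace(0.2, 0.6, N))

fig, ax = plt.subplots(figsize=(6, 6))
bands = []
for i, sample in enumerate(data):
    counts, edges = np.histogram(sample, bins=bins)
    heights = counts / counts.max() * 1.5       # scale so neighbouring rows overlap
    heights = np.append(heights, heights[-1])   # one value per edge for step="post"
    baseline = N - i                            # later rows sit lower and are drawn on top
    bands.append(ax.fill_between(edges, baseline + heights, baseline,
                                 step="post", facecolor=colors[i],
                                 edgecolor="w", zorder=i))
ax.set_yticks([])
fig.savefig("stacked_histograms.png")  # hypothetical output name
```

The `step="post"` argument is what turns each band into flat-topped bars; without it the same counts would be joined by sloped segments.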

Scatterplot in matplotlib with legend and randomized point order

I'm trying to build a scatterplot of a large amount of data from multiple classes in python/matplotlib. Unfortunately, it appears that I have to choose between having my data randomised and having legend labels. Is there a way I can have both (preferably without manually coding the labels?)
Minimum reproducible example:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
X = np.random.normal(0, 1, [5000, 2])
Y = np.random.normal(0.5, 1, [5000, 2])
data = np.concatenate([X,Y])
classes = np.concatenate([np.repeat('X', X.shape[0]),
                          np.repeat('Y', Y.shape[0])])
Plotting with randomized points:
plot_idx = np.random.permutation(data.shape[0])
colors, _ = pd.factorize(classes)  # factorize returns (codes, uniques); only the codes are used
fig, ax = plt.subplots()
ax.scatter(data[plot_idx, 0],
           data[plot_idx, 1],
           c=colors[plot_idx],
           label=classes[plot_idx],
           alpha=0.4)
plt.legend()
plt.show()
This gives me the wrong legend.
Plotting with the correct legend:
from matplotlib import cm
unique_classes = np.unique(classes)
colors = cm.Set1(np.linspace(0, 1, len(unique_classes)))
fig, ax = plt.subplots()
# "class" is a reserved word in Python, so the loop variable is named cls
for i, cls in enumerate(unique_classes):
    ax.scatter(data[classes == cls, 0],
               data[classes == cls, 1],
               c=colors[i],
               label=cls,
               alpha=0.4)
plt.legend()
plt.show()
But now the points are not randomized and the resulting plot is not representative of the data.
I'm looking for something that would give me a result like I get as follows in R:
library(ggplot2)
X <- matrix(rnorm(10000, 0, 1), ncol=2)
Y <- matrix(rnorm(10000, 0.5, 1), ncol=2)
data <- as.data.frame(rbind(X, Y))
data$classes <- rep(c('X', 'Y'), each=nrow(X))
plot_idx <- sample(nrow(data))
ggplot(data[plot_idx,], aes(x=V1, y=V2, color=classes)) +
  geom_point(alpha=0.4, size=3)
You need to create the legend manually. This is not a big problem though. You can loop over the labels and create a legend entry for each. Here one may use a Line2D with a marker similar to the scatter as handle.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
X = np.random.normal(0, 1, [5000, 2])
Y = np.random.normal(0.5, 1, [5000, 2])
data = np.concatenate([X,Y])
classes = np.concatenate([np.repeat('X', X.shape[0]),
                          np.repeat('Y', Y.shape[0])])
plot_idx = np.random.permutation(data.shape[0])
colors,labels = pd.factorize(classes)
fig, ax = plt.subplots()
sc = ax.scatter(data[plot_idx, 0],
                data[plot_idx, 1],
                c=colors[plot_idx],
                alpha=0.4)
h = lambda c: plt.Line2D([],[],color=c, ls="",marker="o")
plt.legend(handles=[h(sc.cmap(sc.norm(i))) for i in range(len(labels))],
           labels=list(labels))
plt.show()
Alternatively, you can use a special scatter handler, as shown in the question "Why doesn't the color of the points in a scatter plot match the color of the points in the corresponding legend?", but that seems a bit overkill here.
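Another route, assuming Matplotlib 3.1 or newer, is to let the scatter build the handles itself with `legend_elements()`, the same call used for a legend in the k-means thread earlier on this page. The sketch below uses smaller sample sizes and a seeded generator purely to keep it quick and repeatable; `codes` is an illustrative name for the integer class codes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (500, 2))
Y = rng.normal(0.5, 1, (500, 2))
data = np.concatenate([X, Y])
classes = np.concatenate([np.repeat("X", len(X)), np.repeat("Y", len(Y))])

plot_idx = rng.permutation(len(data))    # randomized drawing order
codes, labels = pd.factorize(classes)    # integer code per point, unique class names

fig, ax = plt.subplots()
sc = ax.scatter(data[plot_idx, 0], data[plot_idx, 1],
                c=codes[plot_idx], alpha=0.4)

# One handle per distinct value of c, in ascending code order,
# which matches the order of the names returned by factorize.
handles, _ = sc.legend_elements()
legend = ax.legend(handles, list(labels))
fig.savefig("shuffled_scatter.png")  # hypothetical output name
```

Because the points are drawn in a single `scatter` call, the shuffled order is preserved while the legend still gets one entry per class.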
It's a bit of a hack, but you can save the axis limits, set the labels by drawing points well outside the limits of the plot, and then resetting the axis limits as follows:
from matplotlib import cm
plot_idx = np.random.permutation(data.shape[0])
color_idx, unique_classes = pd.factorize(classes)
colors = cm.Set1(np.linspace(0, 1, len(unique_classes)))
fig, ax = plt.subplots()
ax.scatter(data[plot_idx, 0],
           data[plot_idx, 1],
           c=colors[color_idx[plot_idx]],
           alpha=0.4)
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# draw one dummy point per class far outside the visible area, just to create legend entries
for i in range(len(unique_classes)):
    ax.scatter(xlim[1]*10,
               ylim[1]*10,
               c=colors[i],
               label=unique_classes[i])
ax.set_xlim(xlim)
ax.set_ylim(ylim)
plt.legend()
plt.show()

Plotting Accelerometer Data With Respect To Time

I am trying to plot accelerometer data against time. My CSV looks like this (columns: time, x, y, z):
1518999378635,2.275090217590332,8.601768493652344,3.691260576248169
1518999378653,2.38462495803833,8.633491516113281,4.0964789390563965
1518999378658,2.449866771697998,8.506000518798828,4.082113742828369
1518999378667,2.4372973442077637,8.166622161865234,4.016273498535156
1518999378675,1.8381483554840088,8.848969459533691,4.086902141571045
1518999378681,1.1402385234832764,8.762179374694824,4.225766181945801
1518999378688,1.7818846702575684,8.652046203613281,3.6110546588897705
1518999378694,2.076371431350708,8.80467700958252,4.0527849197387695
1518999378700,2.3720552921295166,8.471882820129395,4.120420932769775
My initial bet (as given below!) was to use a scatter with time as color, however the output is, well, not very obvious.
from numpy import genfromtxt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
if __name__ == "__main__":
    print("Plotting Accelerometer Data")
    acm_data = genfromtxt("acm_data.csv", delimiter=',', names="time, acc_x, acc_y, acc_z")
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    x = acm_data["acc_x"]
    y = acm_data["acc_y"]
    z = acm_data["acc_z"]
    c = acm_data["time"]
    # pass the colormap by name; plt.hot() returns None and only changes the global default
    ax.scatter(x, y, z, c=c, cmap="hot")
    plt.show()
The output looks like this:
and is not very interpretable. What would be the best way to handle this?
Thanks.
Something like this:
import matplotlib.pyplot as plt
x = [0, 1, 2, 3]
x_accel = [5, 6, 3, 4]
y_accel = [2, 7, 6, 8]
z_accel = [1, 2, 3, 4]
plt.subplot(3, 1, 1)
plt.plot(x, x_accel, '.-')
plt.title('A tale of 3 subplots')
plt.ylabel('X acceleration')
plt.subplot(3, 1, 2)
plt.plot(x, y_accel, '.-')
plt.xlabel('time (s)')
plt.ylabel('Y acceleration')
plt.subplot(3, 1, 3)
plt.plot(x, z_accel, '.-')
plt.xlabel('time (s)')
plt.ylabel('Z acceleration')
plt.show()
Generates:
Of course you'll have to mess with your axes and what not to make the presentation of your data as clear as possible. But in general, this is much clearer than what is posted in your question.
Well, here's my answer (break it into 3 2-dimensional plots):
from numpy import genfromtxt
import matplotlib.pyplot as plt
import numpy as np
if __name__ == "__main__":
    print("Plotting Accelerometer Data")
    acm_data = genfromtxt("acm_data.csv", delimiter=',', names="time, acc_x, acc_y, acc_z")
    fig = plt.figure()
    x = acm_data["acc_x"]
    y = acm_data["acc_y"]
    z = acm_data["acc_z"]
    t = acm_data["time"]
    for dat, num, axis in zip((x,y,z), range(311, 314), "XYZ"):
        plt.subplot(num)
        plt.plot(t, dat, ".")
        plt.title("%s-axis" %axis)
    plt.show()
Which gave me this as the visual output:
Visual output
Which is more readable than color-coding.
Notes:
1) If you want to connect them, remove the "." or change it to "-"
2) This was on Python 3.4
3) If you wanted, you could also add labels on the left and bottom of the graphs.
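Both answers plot against the raw time column. If that column is a Unix timestamp in milliseconds (an assumption suggested by the 13-digit values in the question, not stated there), converting it to seconds since the first sample makes the x-axis easier to read. A minimal sketch using the first three rows of the question's CSV, with `elapsed_s` as an illustrative name:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# First three rows of the question's CSV, inlined so the sketch runs on its own.
csv_text = """1518999378635,2.275090217590332,8.601768493652344,3.691260576248169
1518999378653,2.38462495803833,8.633491516113281,4.0964789390563965
1518999378658,2.449866771697998,8.506000518798828,4.082113742828369
"""
acm_data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                         names="time,acc_x,acc_y,acc_z")

# Assumption: the time column is a Unix timestamp in milliseconds.
elapsed_s = (acm_data["time"] - acm_data["time"][0]) / 1000.0

# Three stacked axes that share one time axis, so zooming stays in sync.
fig, axes = plt.subplots(3, 1, sharex=True, figsize=(8, 6))
for ax, name in zip(axes, ("acc_x", "acc_y", "acc_z")):
    ax.plot(elapsed_s, acm_data[name], ".-")
    ax.set_ylabel(name)
axes[-1].set_xlabel("time since first sample (s)")
fig.savefig("accelerometer.png")  # hypothetical output name
```

Replacing the inlined text with the path to `acm_data.csv` should plot the full recording in the same layout.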
