features = ["Ask1", "Bid1", "smooth_midprice", "BidSize1", "AskSize1"]
client = InfluxDBClient(host='127.0.0.1', port=8086, database='data',
username=username, password=password)
series = "DCIX_2016_11_15"
sql = "SELECT * FROM {} where time >= '{}' AND time <= '{}' ".format(series,FROMT,TOT)
df = pd.DataFrame(client.query(sql).get_points())
#Separating out the features
X = df.loc[:, features].values
# Standardizing the features
X = StandardScaler().fit_transform(X)
tsne = TSNE(n_components=3, n_jobs=5).fit_transform(X)
I would like to map my 5 features into a 2D or 3D plot, but I am a bit confused about how to do that. How can I build a plot from that information?
You already have most of the work done. t-SNE is a common visualization for understanding high-dimensional data, and right now the variable tsne is an array where each row represents a set of (x, y, z) coordinates from the obtained embedding. You could use other visualizations if you would like, but t-SNE is probably a good starting place.
As far as actually seeing the results, even though you have the coordinates available you still need to plot them somehow. The matplotlib library is a good option, and that's what we'll use here.
To plot in 2D you have a couple of options. You can either keep most of your code the same and simply perform a 2D t-SNE with
tsne = TSNE(n_components=2, n_jobs=5).fit_transform(X)
Or you can just use the components you have and only look at two of them at a time. The following snippet should handle either case:
import matplotlib.pyplot as plt
plt.scatter(*zip(*tsne[:,:2]))
plt.show()
The zip(*...) transposes your data so that you can pass the x coordinates and the y coordinates individually to scatter(), and the [:,:2] piece selects two coordinates to view. You could omit it if your data is already 2D, or you could replace it with something like [:,[0,2]] to view, for example, the 0th and 2nd components of a higher-dimensional embedding rather than just the first 2.
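As a quick check of what the transposition and the column selection do, here is a minimal sketch that uses a small made-up array (emb is a hypothetical stand-in for the t-SNE output, not the real data):

```python
import numpy as np

# Hypothetical stand-in for the t-SNE output: 4 points, 3 coordinates each.
emb = np.arange(12).reshape(4, 3)

# zip(*rows) regroups the rows into columns, i.e. it transposes the data.
xs, ys = zip(*emb[:, :2])

# The same two columns obtained with plain NumPy indexing.
same_x = np.array_equal(np.array(xs), emb[:, 0])
same_y = np.array_equal(np.array(ys), emb[:, 1])

# Fancy indexing selects non-adjacent components, here the 0th and 2nd.
pair = emb[:, [0, 2]]
```

Since the columns are identical, plt.scatter(emb[:, 0], emb[:, 1]) should draw the same picture as the zip version, and some people find it easier to read.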
To plot in 3D the code looks much the same, at least for a minimal version.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(*zip(*tsne))
plt.show()
The main differences are a use of 3D plotting libraries and making a 3D subplot.
Adding color: t-SNE visualizations are typically more helpful if they're color-coded somehow. One example might be the smooth midprice you currently have stored in X[:,2]. For exploratory visualizations, I find 2D plots more helpful, so I'll use that as the example:
plt.scatter(*zip(*tsne[:,:2]), c=X[:,2])
You still need the imports and whatnot, but by passing the keyword argument c you can color code the scatter plot. To adjust how that numeric data is displayed, you could use a different color map like so:
plt.scatter(*zip(*tsne[:,:2]), c=X[:,2], cmap='RdBu')
As the name might suggest, this colormap consists of a gradient between red and blue, and the lower values of X[:,2] will correspond to red.
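To see roughly what c and cmap do together, the sketch below normalises a few made-up values to the 0..1 range and looks them up in the colormap by hand. The values array is a hypothetical stand-in for X[:,2], and scatter() does the equivalent steps for you internally.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Hypothetical stand-in for the standardized smooth midprice in X[:, 2].
values = np.array([-2.0, 0.0, 2.0])

# Step 1: rescale the data linearly so min -> 0 and max -> 1.
norm = Normalize(vmin=values.min(), vmax=values.max())

# Step 2: look up each rescaled value in the colormap.
cmap = plt.get_cmap("RdBu")
rgba = cmap(norm(values))  # one (r, g, b, a) row per value

low, high = rgba[0], rgba[-1]
```

Here low has a larger red than blue channel and high the reverse, which matches the red-to-blue ordering described above. Appending "_r" to the name (for example 'RdBu_r') gives the reversed colormap.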
I am currently taking a Matplotlib class. I was given an image to create the image as a 3D subplot 4 times at 4 different angles. It's a linear plot. As the data changes the plots change colors. As it's an image, I'm not certain where the actual changes start. I don't want an exact answer, just an explanation of how this would work. I have found many methods for doing this for a small list but this has 75 data points and I can't seem to do it without adding 75 entries.
I've also tried to understand cmap but I am confused on it as well.
Also, it needs to be done without Seaborn.
This is part of the photo.
I am finding your question a little bit hard to understand. What I think you need is a function to map the input x/y argument onto a colour in your chosen colour map. See the below example:
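import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib
def number_to_colour(number, total_number):
    # Sample total_number colours evenly across the colour map, then pick by index
    return plt.cm.rainbow(np.linspace(0,1.,total_number))[list(number)]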
x = np.arange(12)
y = x*-3.
z = x
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c=number_to_colour(x, len(x)))
plt.show()
plt.cm.rainbow(np.linspace(0,1.,total_number)) creates an array of colours of length total_number, evenly spaced across the colour map (in this case rainbow). Modifying the indexing of this array (or changing np.linspace to another function with the desired scaling) should give you the colour scaling that you need.
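For the 75-point case, a minimal sketch of both ideas might look like this. The squared spacing is only an illustrative choice of "another function", not something taken from the question:

```python
import numpy as np
import matplotlib.pyplot as plt

n_points = 75

# Evenly spaced positions along the colour map: one RGBA row per data point,
# so there is no need to type out 75 colour entries by hand.
positions = np.linspace(0.0, 1.0, n_points)
linear = plt.cm.rainbow(positions)

# Same end points, but squaring the positions keeps more of the points
# in the early part of the colour map (an assumed, illustrative scaling).
skewed = plt.cm.rainbow(positions ** 2)
```

Either array can be passed as c= to ax.scatter(...), since each has one colour per point.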
First of all, if anyone has a link to a good tutorial on creating colormaps with geoviews or holoviews and transporting that to a dashboard, please send a link. I am trying to mimic what they did at the timestamp in the video here. I am also having a hard time finding good documentation of geoviews other than the few examples on their website, so a pointer to the full docs would be great.
Anyway, I have a pretty basic plot, I think. It is a mesh of x, a mesh of y, and a mesh of z values. I want to plot this in geoviews. It basically contains interpolated motions from GPS stations, and I want to make a colormap of the z value. I can plot this really easily with matplotlib with a simple
plot = plt.scatter(mesh_x, mesh_y, c = z1, cmap = cm.hsv)
but trying to get this into geoviews makes a really funky dataframe.
running print(np.shape(mesh_x),np.shape(mesh_y), np.shape(z1)) shows the shape of all of these are (41,348). If I try to put them into a single array with a = np.array((mesh_x,mesh_y,z1)) I get an array of shape (3,41,348) as expected. From here I am really just guessing on what to do. When I try to put this into a geoviews points data frame with
points = [a[0], a[1], a[2]]
df = gv.Points(points)
df.dframe()
it shows two columns, Longitude and Latitude, with incorrect values. Here is a screenshot of what it shows, in case it's helpful.
I have tried converting to an xarray because it seems that is preferred in all the examples shown on geoviews website but that looks funky as well. When I try xrtest = xr.DataArray((mesh_x,mesh_y,z1)) I get a xarray that looks like this
At this point I have no idea what to do. I have tried a few different ways that I thought might work, but I can't remember all of them. This is where I am at now. I am sure I am doing something completely wrong; I just have no idea how to do it correctly. Thank you.
Assuming you want a points plot as you are using in Matplotlib, the HoloViews equivalent to plt.scatter is hv.Points. hv.Points accepts a tidy data format that you can get by transposing the data compared to Matplotlib:
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline
mesh_x = [1,2,3,6]
mesh_y = [6,2,8,0]
z1 = [0.5, 4, 6,2]
plot = plt.scatter(mesh_x, mesh_y, c = z1, cmap = cm.hsv)
import holoviews as hv
hv.extension('matplotlib')
hv.Points(zip(mesh_x,mesh_y,z1), kdims=["x","y"], vdims=["z"]).opts(color='z', cmap="hsv")
Here kdims=["x","y"], is optional but is explicit about the key dimensions you want. You may also want to consider hvPlot, which handles the same data format as plt.scatter:
import pandas as pd
df = pd.DataFrame(dict(x=mesh_x,y=mesh_y,z=z1))
import hvplot.pandas
df.hvplot.scatter(x="x", y="y", c="z", cmap="hsv")
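For 2D meshes like the (41,348) arrays in the question, "tidy" means one row per grid point. A minimal NumPy sketch of that reshaping, with made-up grid values standing in for the real GPS data, could be:

```python
import numpy as np

# Hypothetical grids with the same (41, 348) shape as in the question.
mesh_x, mesh_y = np.meshgrid(np.linspace(0, 1, 348), np.linspace(0, 1, 41))
z1 = mesh_x + mesh_y  # made-up z values

# Flatten each grid and stack them as columns: one (x, y, z) row per point.
tidy = np.column_stack([mesh_x.ravel(), mesh_y.ravel(), z1.ravel()])
n_rows, n_cols = tidy.shape
```

An array shaped like tidy (or a DataFrame built from its columns) is the kind of input I would expect hv.Points(..., kdims=["x","y"], vdims=["z"]) or hvPlot to accept, though I have not run that step here.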
I want to create 10 violin plots within one diagram. I looked at many examples like this one: Violin plot matplotlib, which shows what I would like to have at the end.
But I did not know how to adapt it to a real data set. They all just generate some random, normally distributed data.
I have data in form D[10,730] and if I try to adapt it from the link above with :
example:
axes[0].violinplot(all_data,showmeans=False,showmedians=True)
my code:
axes[0].violinplot(D,showmeans=False,showmedians=True)
it does not work.
It should draw 10 violin plots side by side (one per entry along the first dimension of D).
So what do my data need to look like to get the same type of violin plot?
You just need to transpose your data array D.
axes[0].violinplot(D.T,showmeans=False,showmedians=True)
This appears to be a small bug in matplotlib. The axes are treated inconsistently for a list of 1D arrays and a 2D array: each entry of the list becomes one violin, whereas a 2D array yields one violin per column.
import numpy as np
import matplotlib.pyplot as plt
n_datasets = 10
n_samples = 730
data = np.random.randn(n_datasets,n_samples)
fig, axes = plt.subplots(1,3)
# http://matplotlib.org/examples/statistics/boxplot_vs_violin_demo.html
axes[0].violinplot([d for d in data])
# should be equivalent to:
axes[1].violinplot(data)
# is actually equivalent to
axes[2].violinplot(data.T)
You should file a bug report.
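If you want to confirm which axis is being used, one way is to count the violin bodies that violinplot() returns. This is a small sketch with a reduced sample count (50 instead of 730, purely to keep it fast) and the non-interactive Agg backend so that it runs without a display:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
D = rng.normal(size=(10, 50))  # 10 datasets with 50 samples each

fig, (ax_given, ax_transposed) = plt.subplots(1, 2)

# A 2D array is split by column, so D as given yields 50 violins.
as_given = ax_given.violinplot(D)
# After transposing, each row of D is one column, giving 10 violins.
transposed = ax_transposed.violinplot(D.T)

n_given = len(as_given["bodies"])
n_transposed = len(transposed["bodies"])
```

The "bodies" entry of the returned dictionary holds one filled shape per violin, so its length is the number of violins drawn.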
If I want to color a square grid with a different color in each grid cell, then it is possible in MATLAB with a simple call to the imagesc command, like here.
What if I want to color different cells in a grid like this:
Is this functionality available by default in either Python or MATLAB? I tried discretizing this grid with very small square cells and then coloring each cell. That works, but it seems crude. Is there a smarter way to get this done?
In Python, matplotlib has a built-in polar projection for the axes. This projection allows you to automatically use almost every plotting method in polar coordinates. In particular, you need to use pcolor or pcolormesh as follows
import numpy as np
from matplotlib import pyplot as plt
r = np.linspace(0,4,5)
theta = np.linspace(0,2*np.pi,10)
theta,r = np.meshgrid(theta,r)
values = np.random.rand(*(theta.shape))
ax = plt.subplot(111,polar=True)
ax.pcolor(theta,r,values)
plt.show()
Note that this will produce a plot like this
which is almost what you want. The obvious problem is that the patch vertices are joined by straight lines and not lines that follow the circle arc. You can solve this by making the angles array denser. Here is a possible way to do it.
import numpy as np
from matplotlib import pyplot as plt
r = np.linspace(0,4,5)
theta = np.linspace(0,2*np.pi,10)
values = np.random.rand(r.size,theta.size)
dense_theta = np.linspace(0,2*np.pi,100)
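v_indeces = np.zeros_like(dense_theta,dtype=int)  # np.int was removed in NumPy 1.24
i = -1
# For each dense angle, record the index of the coarse angle bin it falls in
for j,dt in enumerate(dense_theta):
    if dt>=theta[i+1]:
        i+=1
    v_indeces[j] = i
T,R = np.meshgrid(dense_theta,r)
dense_values = np.zeros_like(T)
# Copy each coarse value into all dense cells of its bin
for i,v in enumerate(values):
    for j,ind in enumerate(v_indeces):
        dense_values[i,j] = v[ind]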
ax = plt.subplot(111,polar=True)
ax.pcolor(T,R,dense_values)
plt.show()
Which would produce
I am not aware of a way to do this in MATLAB, but I googled around and found this, which says it can produce pcolor plots in polar coordinates. You should check it out.
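As an aside, the index bookkeeping for the denser angle array can also be written without explicit loops. The following is a sketch of one alternative using np.searchsorted, not the only way to do it, and it assumes theta is sorted in increasing order:

```python
import numpy as np

r = np.linspace(0, 4, 5)
theta = np.linspace(0, 2 * np.pi, 10)
values = np.random.rand(r.size, theta.size)
dense_theta = np.linspace(0, 2 * np.pi, 100)

# For each dense angle, find the index of the last coarse angle <= it.
# side="right" counts the coarse angles that are <= the dense angle.
idx = np.searchsorted(theta, dense_theta, side="right") - 1

# Repeat each coarse column across all dense angles in its bin.
dense_values = values[:, idx]
```

dense_values can then be passed to ax.pcolor(T, R, dense_values) together with T,R = np.meshgrid(dense_theta, r), in the same way as the loop version.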
I was wondering if there's a way to plot a data cube in Python. I mean I have three coordinate for every point
x=part.points[:,0]
y=part.points[:,1]
z=part.points[:,2]
And for every point I have a scalar field t(x,y,z)
I would like to plot a 3D data cube showing the position of the point and for every point a color which is proportional to the scalar field t in that point.
I tried with histogramdd but it didn't work.
You can use matplotlib.
Here you have a working example (that moves!):
import random
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D
mypoints = []
for _ in range(100):
    mypoints.append([random.random(),         #x
                     random.random(),         #y
                     random.random(),         #z
                     random.randint(10,100)]) #scalar
data = list(zip(*mypoints)) # list() is needed on Python 3, where zip returns an iterator
fig = pyplot.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data[0], data[1], data[2], c=data[3])
pyplot.show()
You probably have to customize the relation of your scalar values with the corresponding colors.
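One way to customize that relation is to fix the colour limits and add a colorbar. The sketch below assumes the scalar lies between 10 and 100, as in the random example above; the 'viridis' colormap and the label are arbitrary choices, and the Agg backend is only there so that it runs without a display:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, needed on older Matplotlib

rng = np.random.default_rng(0)
xyz = rng.random((100, 3))            # made-up point positions
t = rng.integers(10, 101, size=100)   # made-up scalar field values

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

# vmin/vmax pin the ends of the colormap to fixed scalar values,
# so colours stay comparable between different plots.
sc = ax.scatter(xyz[:, 0], xyz[:, 1], xyz[:, 2],
                c=t, cmap="viridis", vmin=10, vmax=100)
fig.colorbar(sc, ax=ax, label="t")
```

With real data, xyz and t would be replaced by part.points and the values of the scalar field.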
Matplotlib has a very nice look but it can be slow drawing and moving these 3D drawings when you have many points. In these cases I used to use Gnuplot controlled by gnuplot.py. Gnuplot can also be used directly as a subprocess as shown here and here.
Another option is the Dots plot produced by MathGL, which is a GPL plotting library. It also does not need much memory if you save in a bitmap format (PNG, JPEG, GIF and so on).