Plotting from array to geoviews/holoviews. Converting to xarray needed? - python

First of all, if anyone has a link to a good tutorial on creating colormaps with geoviews or holoviews and transferring them to a dashboard, please send it. I am trying to mimic what they did at the timestamp in the video here. I am also having a hard time finding good documentation for geoviews other than the few examples on their website, so a pointer to the full docs would be great.
Anyway, I think I have a pretty basic plot. It is a mesh of x values, a mesh of y values, and a mesh of z values. I want to plot this in geoviews. It contains interpolated motions from GPS stations, and I want to make a colormap of the z value. I can plot this really easily with matplotlib with a simple
plot = plt.scatter(mesh_x, mesh_y, c = z1, cmap = cm.hsv)
but trying to get this into geoviews produces a really funky dataframe.
Running print(np.shape(mesh_x),np.shape(mesh_y), np.shape(z1)) shows that the shape of each of these is (41,348). If I try to put them into a single array with a = np.array((mesh_x,mesh_y,z1)) I get an array of shape (3,41,348), as expected. From here I am really just guessing what to do. When I try to put this into a geoviews points data frame with
points = [a[0], a[1], a[2]]
df = gv.Points(points)
df.dframe()
and then run df.dframe(), it shows two columns, longitude and latitude, with incorrect values. Here is a screenshot of what it shows, if it's helpful.
I have tried converting to an xarray because that seems to be preferred in all the examples on the geoviews website, but that looks funky as well. When I try xrtest = xr.DataArray((mesh_x,mesh_y,z1)) I get an xarray that looks like this
At this point I have no idea what to do. I have tried a few different ways that I thought might work, but I can't remember all of them. This is where I am now. I am sure I am doing something completely wrong; I just have no idea how to do it correctly. Thank you

Assuming you want a points plot as you are using in Matplotlib, the HoloViews equivalent to plt.scatter is hv.Points. hv.Points accepts a tidy data format that you can get by transposing the data compared to Matplotlib:
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline
mesh_x = [1,2,3,6]
mesh_y = [6,2,8,0]
z1 = [0.5, 4, 6,2]
plot = plt.scatter(mesh_x, mesh_y, c = z1, cmap = cm.hsv)
import holoviews as hv
hv.extension('matplotlib')
hv.Points(zip(mesh_x,mesh_y,z1), kdims=["x","y"], vdims=["z"]).opts(color='z', cmap="hsv")
Here kdims=["x","y"] is optional, but it makes the key dimensions you want explicit. You may also want to consider hvPlot, which handles the same data format as plt.scatter:
import pandas as pd
df = pd.DataFrame(dict(x=mesh_x,y=mesh_y,z=z1))
import hvplot.pandas
df.hvplot.scatter(x="x", y="y", c="z", cmap="hsv")
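The example above uses short 1D lists, whereas the question's mesh_x, mesh_y and z1 are 2D arrays of shape (41, 348). A minimal sketch of one way to get them into the same tidy layout is to flatten each mesh and stack the results as columns. The array contents below are made-up stand-ins for the real data, and the hv.Points call is shown only as a comment, on the assumption that one point per grid node is what you want to plot:

```python
import numpy as np

# Hypothetical stand-ins with the same shape as the question's meshes.
x = np.linspace(-120.0, -115.0, 348)
y = np.linspace(32.0, 36.0, 41)
mesh_x, mesh_y = np.meshgrid(x, y)          # both have shape (41, 348)
z1 = np.hypot(mesh_x - x.mean(), mesh_y - y.mean())

# Flatten each 2D mesh and stack them as columns: one row per grid node.
table = np.column_stack([mesh_x.ravel(), mesh_y.ravel(), z1.ravel()])
print(table.shape)  # (14268, 3)

# The table could then take the place of the zipped lists, e.g.
# hv.Points(table, kdims=["x", "y"], vdims=["z"]).opts(color="z", cmap="hsv")
```

Each row of table is one (x, y, z) triple, which is the "transposed" layout described above.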

Related

Python: Histogram return wrong values for counts (EDIT: more general with example)

EDIT: I've found a general example where it doesn't work either!
I am trying to extract the data for a histogram, but the counts seem wrong. As an example:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.rand(1000000)
bins = np.arange(0,1,0.0001)
a,b,c = plt.hist(data,bins)
This gives me this rather messy histogram, and I've saved the counts as a and the bin edges as b. Now, plotting a against b, I should expect the same histogram, right? But that's not what I get:
plt.scatter(b[0:len(b)-1],a,s=2)
which gives me this, which doesn't match at all! Furthermore, when I try to find the maximum value of a, it gives me 144, which fits fine with the scatter plot, but not with the histogram function.
If I count the numbers myself with the following code:
len(np.intersect1d(np.where(data>=b[np.argmax(a)]),np.where(data<b[np.argmax(a)+1])))
then it also gives me 144, in agreement with the extracted values. So is the displayed histogram just wrong for some reason, and should I ignore it and just use the extracted data?
Old, unedited post:
For a physics course I am trying to bin my results in the following way:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as ss
from scipy.optimize import curve_fit
plt.rc("font", family=["Helvetica", "Arial"])
plt.rc("axes", labelsize=18)
plt.rc("xtick", labelsize=16, top=True, direction="in")
plt.rc("ytick", labelsize=16, right=True, direction="in")
plt.rc("axes", titlesize=22)
plt.rc("legend", fontsize=16)
data_Ra = np.loadtxt('Ra226_cal2_ch001.txt',skiprows=5)
t_Ra = data_Ra[:,0]*10**-8 # time in seconds
channels_Ra = data_Ra[:,1]
channels_Ra = channels_Ra[np.where(channels_Ra>0)] # removing all the measurements at channel = 0
intervalspace = 2 #The intervals in which we count
bins=np.arange(0,4000,intervalspace)
counts, intervals , stuff = plt.hist(channels_Ra,bins)
plt.xlabel('Channels')
plt.ylabel('Counts')
plt.show()
Here, the histogram plot looks totally fine, with a maximum near 13000 counts. But when I then use np.max(counts), I get about 24000, and when I try to just plot the values it gives me with:
plt.scatter(intervals[0:len(intervals)-1]+intervalspace/2,counts,s=1)
plt.xlabel('Channels')
plt.ylabel('Counts')
plt.title('Ra225')
plt.show()
it looks like this, which is totally different, and I can't figure out why. I am expecting the scatter plot to resemble the histogram, and while the peaks are located at the same x-values, the heights do not match.
This problem occurs in other large datasets as well.
I don't think I'm allowed to drop the txt file here, so I'm not sure how much more I can show, but any help will be appreciated!
I don't know why you interpret the results in that way.
If you look at the histogram plot, you will be able to see the maximum value of the y-axis is 25,000. That means that there are some values close to 25,000. This fact can be verified in the scatter plot.
Your scatter plot shows the actual values. It would be clearer if you described what your expected plot looks like.
If you want to discard some outlier points, you should apply some filtering before plotting the data.
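One way to check the extracted counts independently of how the bars are drawn is to recompute them with np.histogram, which plt.hist uses internally for the binning. The sketch below uses a smaller random sample and fewer bins than the question, purely to keep it quick; the sizes and the seed are assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, chosen arbitrarily
data = rng.random(100_000)               # uniform samples in [0, 1)
bins = np.linspace(0.0, 1.0, 1001)       # 1000 bins of width 0.001

counts, edges = np.histogram(data, bins)

# Count the fullest bin by hand, as in the question.
i = np.argmax(counts)
manual = np.count_nonzero((data >= edges[i]) & (data < edges[i + 1]))
print(manual == counts[i])               # True

# Bin centres are the natural x positions when re-plotting the counts.
centres = 0.5 * (edges[:-1] + edges[1:])
print(len(centres) == len(counts))       # True
```

If the hand count and the returned count agree, the extracted values can be trusted. Any remaining mismatch with the drawn bars is then most likely a rendering effect, since thousands of bars narrower than a pixel cannot all be drawn distinctly.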

Seaborn showing x-tick labels overlapping

I am trying to make a box plot that looks like this.
Now, there are a lot of tickmarks that I do not need and truly do not show any additional information.
The code I am using is the following:
plot=sns.boxplot(y=MSE, x=Sim,
width=0.5,
palette='colorblind')
plot=sns.stripplot(y=MSE, x=Sim,
jitter=True,
marker='o',
alpha=0.15,
color='black')
plt.xlabel('xlabel')
plt.ylabel('ylabel')
plt.gca().invert_xaxis()
Here MSE and Sim are two numpy arrays of 400 elements each.
I reviewed some solutions that use locator_params and set_xticklabels. However, I want to know:
why this happens, and
whether there is a simple transformation of the MSE and Sim arrays that solves this.
I hope my questions are clear enough.
Thanks in advance.
I am not sure what you have as Sim; if it is an array of floats, then the values are converted to categories before plotting. Since the labels are not useful, what you can do is use a range of values that is as long as the y-values.
Even with that, the labels still overlap a lot, because you are trying to fit 400 x ticks onto the x-axis and the default font size is set to something readable. For example:
import pandas as pd
import numpy as np
import seaborn as sns  # needed for sns.boxplot below
from matplotlib import pyplot as plt
fig,ax = plt.subplots(figsize=(15,6))
MSE = [np.random.normal(0,1,10) for i in range(100)]
Sim = np.arange(len(MSE))
g = sns.boxplot(y=MSE, x=Sim, width=0.5,palette='colorblind',ax=ax)
You can set the font size to be smaller so that the labels don't overlap, but I guess they would be hardly readable:
So, since as you said the labels are not useful in your case, you can do:
ax.set(xticks=Sim[0::10])
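On the "why" part of the question: the overlap follows from the x values being treated as categories, so each distinct value gets its own tick. A small sketch of that counting argument, and of keeping only every tenth tick as in the last line above, with a made-up Sim array standing in for the real one:

```python
import numpy as np

# Hypothetical stand-in for the question's Sim array: 400 distinct floats.
Sim = np.linspace(0.0, 1.0, 400)

# One category (and therefore one tick label) per distinct x value.
n_ticks = len(np.unique(Sim))
print(n_ticks)  # 400

# Categories sit at integer positions 0..n-1, so keep every tenth one.
positions = np.arange(n_ticks)[::10]
labels = [f"{value:.2f}" for value in np.unique(Sim)[::10]]
print(len(positions), labels[:3])  # 40 ['0.00', '0.03', '0.05']

# With a Matplotlib Axes named ax (as in the answer), these could be applied
# via ax.set_xticks(positions) followed by ax.set_xticklabels(labels).
```

Rounding or binning Sim before plotting would reduce the number of categories instead, but that changes which points share a box, so it is only appropriate if nearby Sim values really belong together.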

t-SNE map into 2D or 3D plot

features = ["Ask1", "Bid1", "smooth_midprice", "BidSize1", "AskSize1"]
client = InfluxDBClient(host='127.0.0.1', port=8086, database='data',
username=username, password=password)
series = "DCIX_2016_11_15"
sql = "SELECT * FROM {} where time >= '{}' AND time <= '{}' ".format(series,FROMT,TOT)
df = pd.DataFrame(client.query(sql).get_points())
#Separating out the features
X = df.loc[:, features].values
# Standardizing the features
X = StandardScaler().fit_transform(X)
tsne = TSNE(n_components=3, n_jobs=5).fit_transform(X)
I would like to map my 5 features into a 2D or 3D plot, but I am a bit confused about how to do that. How can I build a plot from that information?
You already have most of the work done. t-SNE is a common visualization for understanding high-dimensional data, and right now the variable tsne is an array where each row represents a set of (x, y, z) coordinates from the obtained embedding. You could use other visualizations if you would like, but t-SNE is probably a good starting place.
As far as actually seeing the results, even though you have the coordinates available you still need to plot them somehow. The matplotlib library is a good option, and that's what we'll use here.
To plot in 2D you have a couple of options. You can either keep most of your code the same and simply perform a 2D t-SNE with
tsne = TSNE(n_components=2, n_jobs=5).fit_transform(X)
Or you can just use the components you have and only look at two of them at a time. The following snippet should handle either case:
import matplotlib.pyplot as plt
plt.scatter(*zip(*tsne[:,:2]))
plt.show()
The zip(*...) transposes your data so that you can pass the x coordinates and the y coordinates individually to scatter(), and the [:,:2] piece selects two coordinates to view. You could omit it if your data is already 2D, or you could replace it with something like [:,[0,2]] to view, for example, the 0th and 2nd components of a higher-dimensional embedding rather than just the first 2.
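To make the transposition concrete, here is a small sketch with a made-up array in place of the real t-SNE output. It checks that zip(*...) yields the same coordinates as plain column indexing, which is an equivalent alternative:

```python
import numpy as np

# Made-up 3D embedding standing in for the tsne array: 4 points, 3 components.
tsne = np.array([[0.0, 1.0, 2.0],
                 [3.0, 4.0, 5.0],
                 [6.0, 7.0, 8.0],
                 [9.0, 10.0, 11.0]])

# zip(*rows) turns a sequence of (x, y) rows into one tuple of xs and one of ys.
xs, ys = zip(*tsne[:, :2])
print(np.array_equal(xs, tsne[:, 0]), np.array_equal(ys, tsne[:, 1]))  # True True

# Column indexing gives the same kind of result without going through zip.
first, third = tsne[:, [0, 2]].T
print(first.tolist(), third.tolist())  # [0.0, 3.0, 6.0, 9.0] [2.0, 5.0, 8.0, 11.0]
```

So plt.scatter(tsne[:, 0], tsne[:, 1]) would draw the same points as the zip version.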
To plot in 3D the code looks much the same, at least for a minimal version.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(*zip(*tsne))
plt.show()
The main differences are a use of 3D plotting libraries and making a 3D subplot.
Adding color: t-SNE visualizations are typically more helpful if they're color-coded somehow. One example might be the smooth midprice you currently have stored in X[:,2]. For exploratory visualizations, I find 2D plots more helpful, so I'll use that as the example:
plt.scatter(*zip(*tsne[:,:2]), c=X[:,2])
You still need the imports and whatnot, but by passing the keyword argument c you can color code the scatter plot. To adjust how that numeric data is displayed, you could use a different color map like so:
plt.scatter(*zip(*tsne[:,:2]), c=X[:,2], cmap='RdBu')
As the name might suggest, this colormap consists of a gradient between red and blue, and the lower values of X[:,2] will correspond to red.

Plotting vertical profile of wind barbs with matplotlib

I am trying to plot the two horizontal wind components u and v in a vertical profile using matplotlib. Next to that, I would like to plot wind barbs at the same heights, so that the two plots share the y-axis. I would like to use the sharey option so that I can control the y-limits more easily using values of z.
So far I did this:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
u = np.array([1,2,3,5,7,7,7,7])
v = np.array([-1,-1,-1,1,3,3,3,3])
z = np.array([2,10,50,100,200,300,400,500])
fig,ax = plt.subplots(figsize=[15,8],ncols=2,sharey=True)
ax[0].plot(u,z,label='U-component')
ax[0].plot(v,z,label='V-component')
ax[0].axvline(0,color='k')
ax[0].legend(loc=4)
Xq,Yq = np.meshgrid(1,np.arange(0,u.shape[0]))
ax[1].barbs(Xq,Yq,u,v)
plt.show()
which gives me the following plot:
image with sharey-option
As you can see, the barbs function plots the wind barbs against the index of the array, but I would like to plot them against my z array (just like I did with the u and v arrays). It should look something like this:
without sharey-option
Here I simply switched off the sharey-option to show what I would like to have. But as I mentioned, with this option I cannot set the ylimits of the barb-plot using z-values.
Can anyone help me to get this done?
I hope I made myself clear, if not, please help me to improve my question. The code should work as a minimal working example.
thanks in advance
philipp
Not sure I understood correctly, but if I did then you should try changing the meshgrid line to:
Xq,Yq = np.meshgrid(1, z)
The point is to place the origin of each barb at the coordinate (1, z) rather than at (1, index).
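A quick way to see what changes is to compare the y positions produced by the two meshgrid calls. This sketch uses only NumPy and the z array from the question; the variable name Yq_index is introduced here for illustration:

```python
import numpy as np

z = np.array([2, 10, 50, 100, 200, 300, 400, 500])  # heights from the question

# Original call: y positions are the array indices 0..7.
_, Yq_index = np.meshgrid(1, np.arange(z.shape[0]))
# Suggested call: y positions are the heights themselves.
Xq, Yq = np.meshgrid(1, z)

print(Yq_index.ravel().tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
print(Yq.ravel().tolist())        # [2, 10, 50, 100, 200, 300, 400, 500]
print(Xq.shape, Yq.shape)         # (8, 1) (8, 1)
```

With Yq holding the heights, the barbs land on the same y values as the u and v profiles, so the shared y-axis and its limits can be expressed in units of z.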

Issue Controlling Size of Holoviews + Datashader with Matplotlib Backend

I'm currently trying to use holoviews+datashader with the matplotlib backend. The data I'm using has very different x and y ranges and the result is that the datashader plots are stretched unhelpfully. The opts and output keywords I've tried using can solve the problem with the holoviews only plots but not once datashade is applied.
For example:
import holoviews as hv
hv.extension('matplotlib')
import numpy as np
from holoviews.operation.datashader import datashade
np.random.seed(1)
positions = np.random.multivariate_normal((0,0),[[0.1,0.1], [0.1,50.0]], (1000000,))
positions2 = np.random.multivariate_normal((0,0),[[0.1,0.1], [0.1,50]], (1000,))
points = hv.Points(positions,label="Points")
points2 = hv.Points(positions2,label="Points2")
plot = datashade(points) + points2
plot
Generates:
datashader and points output
I can control the size of the points only plot using the fig_size opts keyword
e.g. points2(plot=dict(fig_size=200))
but the same doesn't work for datashader plots. Any advice for changing the size of such datashader figures with matplotlib would be greatly appreciated. Ideally, I'd like to use functions and not cell magic keywords so the code can be ported to a script.
Thanks!
Changing the size of matplotlib plots in HoloViews is always controlled by the outer container, so when you have a Layout you can change the size on that object, e.g. in your example that would be:
plot = datashade(points) + points2
plot.opts(plot=dict(fig_size=200))
The other part that might be confusing is that RGB elements (which is what the datashade operation returns) use aspect='equal' by default. You can change that by setting aspect to 'square' or an explicit aspect ratio:
datashade(points).opts(plot=dict(fig_size=200, aspect='square'))
Putting that together you might want to do something like this:
plot = datashade(points).opts(plot=dict(aspect='square')) + points2
plot.opts(plot=dict(fig_size=200))
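To see why the datashaded panel comes out so stretched under aspect='equal', it can help to compare the spread of the data along each axis. A rough sketch using the covariance from the question follows; the use of NumPy's default_rng generator is my substitution for np.random.seed, and the exact shape of the figure will also depend on the ranges HoloViews picks:

```python
import numpy as np

cov = np.array([[0.1, 0.1], [0.1, 50.0]])   # covariance from the question

# Ratio of the standard deviations implied by the covariance diagonal.
expected_ratio = np.sqrt(cov[1, 1] / cov[0, 0])
print(round(float(expected_ratio), 2))  # 22.36

# The same ratio estimated from samples drawn as in the question.
rng = np.random.default_rng(1)
positions = rng.multivariate_normal((0, 0), cov, size=1_000_000)
spread_x, spread_y = positions.std(axis=0)
sample_ratio = spread_y / spread_x
```

With one data unit drawn at the same length on both axes, a y spread roughly 22 times the x spread gives a tall, thin image, which is why switching to aspect='square' or an explicit ratio helps here.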
