How to overplot arrays of different shape? - python

I'm trying to overplot two arrays with different shapes but I'm unable to project one on the top of the other. For example:
#importing the relevant packages
import numpy as np
import matplotlib.pyplot as plt
def overplot(data1,data2):
'''
This function should make a contour plot
of data2 over the data1 plot.
'''
#creating the figure
fig = plt.figure()
#adding an axe
ax = fig.add_axes([1,1,1,1])
#making the plot for the
#first dataset
ax.imshow(data1)
#overplotting the contours
#for the second dataset
ax.contour(data2, projection = data2,
levels = [0.5,0.7])
#showing the figure
plt.show(fig)
return
if __name__ == '__main__':
'''
testing zone
'''
#creating two mock datasets
data1 = np.random.rand(3,3)
data2 = np.random.rand(9,9)
#using the overplot
overplot(data1,data2)
Currently, my output is something like:
While what I actually would like is to project the contours of the second dataset into the first one. This way, if I got images of the same object but with different resolution for the cameras I would be able to do such plots. How can I do that?
Thanks for your time and attention.

It's generally best to make the data match, and then plot it. This way you have complete control over how things are done.
In the simple example you give, you could use repeat along each axis to expand the 3x3 data to match the 9x9 data. That is, you could use, data1b = np.repeat(np.repeat(data1, 3, axis=1), 3, axis=0) to give:
But for the more interesting case of images, like you mention at the end of your question, then the axes probably won't be integer multiples and you'll be better served by a spline or other type interpolation. This difference is an example of why it's better to have control over this yourself, since there are many ways to to this type of mapping.

Related

How can I plot only particular values in xarray?

I am using data from cdasws to plot dynamic spectra. I am following the example found here https://cdaweb.gsfc.nasa.gov/WebServices/REST/jupyter/CdasWsExample.html
This is my code which I have modified to obtain a dynamic spectra for STEREO.
from cdasws import CdasWs
from cdasws.datarepresentation import DataRepresentation
import matplotlib.pyplot as plt
cdas = CdasWs()
import numpy as np
datasets = cdas.get_datasets(observatoryGroup='STEREO')
for index, dataset in enumerate(datasets):
print(dataset['Id'], dataset['Label'])
variables = cdas.get_variables('STEREO_LEVEL2_SWAVES')
for variable_1 in variables:
print(variable_1['Name'], variable_1['LongDescription'])
data = cdas.get_data('STEREO_LEVEL2_SWAVES', ['avg_intens_ahead'],
'2020-07-11T02:00:00Z', '2020-07-11T03:00:00Z',
dataRepresentation = DataRepresentation.XARRAY)[1]
print(data)
plt.figure(figsize = (15,7))
# plt.ylim(100,1000)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.yscale('log')
sorted_data.transpose().plot()
plt.xlabel("Time",size=18)
plt.ylabel("Frequency (kHz)",size=18)
plt.show()
Using this code gives a plot that looks something like this,
My question is, is there anyway of plotting this spectrum only for a particular frequency? For example, I want to plot just the intensity values at 636 kHz, is there any way I can do that?
Any help is greatly appreciated, I dont understand xarray, I have never worked with it before.
Edit -
Using the command,
data_stereo.avg_intens_ahead.loc[:,625].plot()
generates a plot that looks like,
While this is useful, what I needed is;
for the dynamic spectrum, if i choose a particular frequency like 600khz, can it display something like this (i have just added white boxes to clarify what i mean) -
If you still want the plot to be 2D, but to include a subset of your data along one of the dimensions, you can provide an array of indices or a slice object. For example:
data_stereo.avg_intens_ahead.sel(
frequency=[625]
).plot()
Or
# include a 10% band on either side
data_stereo.avg_intens_ahead.sel(
frequency=slice(625*0.9, 625*1.1)
).plot()
Alternatively, if you would actually like your plot to show white space outside this selected area, you could mask your data with where:
data_stereo.avg_intens_ahead.where(
data_stereo.frequency==625
).plot()

Holoviews: how to customize histogram for linked time series Curve plots

I am just getting started with Holoviews. My questions are on customizing histograms, but also I am sharing a complete example as it may be helpful for other newbies to look at, since the documentation for Holoviews is very thorough but can be overwhelming.
I have a number of time series in text files loaded as Pandas DataFrames where:
each file is for a specific location
at each location about 10 time series were collected, each with about 15,000 points
I am building a small interactive tool where a Selector can be used to choose the location / DataFrame, and then another Selector to pick 3 of 10 of the time series to be plotted together.
My goal is to allow linked zooms (both x and y scales). The questions and code will focus on this aspect of the tool.
I cannot share the actual data I am using, unfortunately, as it is proprietary, but I have created 3 random walks with specific data ranges that are consistent with the actual data.
## preliminaries ##
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews.util.transform import dim
from holoviews.selection import link_selections
from holoviews import opts
from holoviews.operation.datashader import shade, rasterize
import hvplot.pandas
hv.extension('bokeh', width=100)
## create random walks (one location) ##
data_df = pd.DataFrame()
npoints=15000
np.random.seed(71)
x = np.arange(npoints)
y1 = 1300+2.5*np.random.randn(npoints).cumsum()
y2 = 1500+2*np.random.randn(npoints).cumsum()
y3 = 3+np.random.randn(npoints).cumsum()
data_df.loc[:,'x'] = x
data_df.loc[:,'rand1'] = y1
data_df.loc[:,'rand2'] = y2
data_df.loc[:,'rand3'] = y3
This first block is just to plot the data and show how, by design, one of the random walks have different range from the other two:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],value_label='y',width=800,height=400)
As a result, although hvplot subplots work out of the box (for linking), ranges are different so the scaling is not quite there:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],
value_label='y',subplots=True,width=800,height=200).cols(1)
So, my first attempt was to adapt the Python-based Points example from Linked brushing in the documentation:
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
hv.Points(data_df, dim).opts(color=c)
for c, dim in zip(colors, [['x', d] for d in dims])
])
link_selections(layout).opts(opts.Points(width=1200, height=300)).cols(1)
That is already an amazing result for a 20 minutes effort!
However, what I would really like is to plot a curve rather than points, and also see a histogram, so I adapted the comprehension syntax to work with Curve (after reading the documentation pages Applying customization, and Composing elements):
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([hv.Curve(data_df,'x',dim).opts(height=300,width=1200,
color=c).hist(dim) for c,
dim in zip(colors,[d for d in dims])])
link_selections(layout).cols(1)
Which is almost exactly what I want. But I still struggle with the different layers of opts syntax.
Question 1: with the comprehension from the last code block, how would I make the histogram share color with the curves?
Now, suppose I want to rasterize the plots (although I do not think is quite yet necessary with 15,000 points like in this case), I tried to adapt the first example with Points:
cmaps = ['Blues', 'Greens', 'Reds']
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
shade(rasterize(hv.Points(data_df, dims),
cmap=c)).opts(width=1200, height = 400).hist(dims[1])
for c, dims in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
This is a decent start, but again I struggle with the options/customization.
Question 2: in the above cod block, how would I pass the colormaps (it does not work as it is now), and how do I make the histogram reflect data values as in the previous case (and also have the right colormap)?
Thank you!
Sander answered how to color the histogram, but for the other question about coloring the datashaded plot, Datashader renders your data with a colormap rather than a single color, so the parameter is named cmap rather than color. So you were correct to use cmap in the datashaded case, but (a) cmap is actually a parameter to shade (which does the colormapping of the output of rasterize), and (b) you don't really need shade, as you can let Bokeh do the colormapping in most cases nowadays, in which case cmap is an option rather than an argument. Example:
from bokeh.palettes import Blues, Greens, Reds
cmaps = [Blues[256][200:], Greens[256][200:], Reds[256][200:]]
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
rasterize(hv.Points(data_df, ds)).opts(cmap=c,width=1200, height = 400).hist(dims[1])
for c, ds in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
To answer your first question to make the histogram share the color of the curve, I've added .opts(opts.Histogram(color=c)) to your code.
When you have a layout you can specify the options of an element inside the layout like that.
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout(
[hv.Curve(data_df,'x',dim)
.opts(height=300,width=600, color=c)
.hist(dim)
.opts(opts.Histogram(color=c))
for c, dim in zip(colors,[d for d in dims])]
)
link_selections(layout).cols(1)

How to Convert Color-Code Legends from Logarithmic Scale to Actual Values?

What is the best way to display actual vallues in color-code legend when using logarithmic scale color coding in plotly.figure_factory.create_choropleth?
Here is the sample code:
import plotly.figure_factory as ff
fips = df['fips']
values = np.log10(df['values'])
endpts = list(np.linspace(0, 4, len(colorscale) - 1))
fig = ff.create_choropleth(fips=fips, values=values, scope = ['usa'], binning_endpoints = endpts)
Here is what I have currently:
Here is what I wish to have:
Exactly same as above map except in the legend displaying actual numbers instead of log10(values). For example instead of 0.0-0.5, and 0.5-1.0 (meaning 10^0-to-10^1/2, and 10^1/2-to-10^1) I would like to see: 1-3, 4-10 and so forth.
I am not familiar with Plotly API and since you do not provide a minimal working example, it is hard for me to test, but I am quite confident that you could specify a colormap. If so, then you could just convert the colormap in logarithmic scale while feeding the numbers in liner scale.

Plotting a large point cloud using plotly produces a blank graph

Plotting a fairly large point cloud in python using plotly produces a graph with axes (not representative of the data range) and no data points.
The code:
import pandas as pd
import plotly.express as px
import numpy as np
all_res = np.load('fullshelf4_11_2019.npy' )
all_res.shape
(3, 6742382)
np.max(all_res[2])
697.5553566696478
np.min(all_res[2])
-676.311654692491
frm = pd.DataFrame(data=np.transpose(all_res[0:, 0:]),columns=["X", "Y", "Z"])
fig = px.scatter_3d(frm, x='X', y='Y', z='Z')
fig.update_traces(marker=dict(size=4))
fig.update_layout(margin=dict(l=0, r=0, b=0, t=0))
fig.show()
Alternatively you could generate random data and follow the process through
all_res = np.random.rand(3, 6742382)
Which also produces a blank graph with a axis scales that are incorrect.
So -- what am I doing wrong, and is there a better way to plot such a moderately large data set?
Thanks for your help!
Try plotting using ipyvolume.It can handle large point cloud datasets.
It seems like that's too much data for WebGL to handle. I managed to plot 100k points, but 1M points already caused Jupyter to crash. However, a 3D scatterplot of 6.7 million points is of questionable value anyway. You probably won't be able to make any sense out of it (except for data boundaries maybe) and it will be super slow to rotate etc.
I would try to think of alternative approaches, depending on what you want to do. Maybe pick a representative subset of points and plot those.
I would suggest using pythreejs for a point cloud. It has very good performance, even for a large number of points.
import pythreejs as p3
import numpy as np
N = 1_000_000
# Positions centered around the origin
positions = np.random.normal(loc=0.0, scale=100.0, size=(N, 3)).astype('float32')
# Create a buffer geometry with random color for each point
geometry = p3.BufferGeometry(
attributes={'position': p3.BufferAttribute(array=positions),
'color': p3.BufferAttribute(
array=np.random.random((N, 3)).astype('float32'))})
# Create a points material
material = p3.PointsMaterial(vertexColors='VertexColors', size=1)
# Combine the geometry and material into a Points object
points = p3.Points(geometry=geometry, material=material)
# Create the scene and the renderer
view_width = 700
view_height = 500
camera = p3.PerspectiveCamera(position=[800.0, 0, 0], aspect=view_width/view_height)
scene = p3.Scene(children=[points, camera], background="#DDDDDD")
controller = p3.OrbitControls(controlling=camera)
renderer = p3.Renderer(camera=camera, scene=scene, controls=[controller],
width=view_width, height=view_height)
renderer

Python plot lines with specific x values from numpy

I have a situation with a bunch of datafiles, these datafiles have a number of samples in a given time frame that depends on the system. i.e. At time t=1 for instance I might have a file with 10 items, or 20 items, at later times in that file I will always have the same number of items. The format is time, x, y, z in columns, and loaded into a numpy array. The time values show which frame, but as mentioned there's always the same, let's go with 10 as a sample. So I'll have a (10,4) numpy array where the time values are identical, but there are many frames in the file, so lets say 100 frames, so really I have (1000,4). I want to plot the data with time on the x-axis and manipulations of the other data on the y, but I am unsure how to do this with line plot methods in matplotlib. Normally to provide both x,y values I believe I need to do a scatter plot, so I'm hoping there's a better way to do this. What I ideally want is to treat each line that has the same time code as a different series (so it will colour differently), and the next bit of data for that same line number in the next frame (time value) will be labelled the same colour, giving those good contiguous lines. We can look at the time column and figure out how many items share a time code, let's call it "n". Sample code:
a = numpy.loadtxt('sampledata.txt')
plt.plot(a[:0,:,n],a[:1,:1])
plt.show()
I think this code expresses what I'm going for, though it doesn't work.
Edit:
I hope this is what you wanted.
seaborn scatterplot can categorize data to some groups which have the same codes (time code in this case) and use the same colors to them.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r"E:\Programming\Python\Matplotlib\timecodes.csv",
names=["time","x","y","z","code"]) #use your file
df["time"]=pd.to_datetime(df["time"]) #recognize the data as Time
df["x"]=df["time"].dt.day # I changed the data into "Date only" and imported to x column. Easier to see on graph.
#just used random numbers in y and z in my data.
sns.scatterplot("x", "y", data = df, hue = "code") #hue does the grouping
plt.show()
I used csv file here but you can do to your text file as well by adding sep="\t" in the argument. I also added a code in the file. If you have it the code can group the data in the graph, so you don't have to separate or make a hierarchical index. If you want to change colors or grouping please see seaborn website.
Hope this helps.
Alternative, the method I used, but Tim's answer is still accurate as well. Since the time codes are not date/time information I modified my own code to add tags as a second column I call "p" (they're polymers).
import numpy as np
import pandas as pd
datain = np.loadtxt('somefile.txt')
df = pd.DataFrame(data = datain, columns = ["t","p","x","y","z"])
ax = sns.scatterplot("t","x", data = df, hue = "p")
plt.show()
And of course the other columns can be plotted similarly if desired.

Categories