Python: define color curve section


I am trying to replicate the following figure:
The color gradient goes from blue to red and indicates the state of a material. I can currently plot each curve: each line is defined by two points and is then simply plotted using pyplot (from the matplotlib library). I also have a clear idea of how to compute the associated value.
However, as can be seen for example at points 9 and 22 in the first figure, the value is different in the overlapping area. I currently have no clue how to do that effectively.
The only idea I have comes from this solution: basically, I would have to turn each curve section into a polygon. But that looks very heavy, and perhaps not the best solution in this case.
I am mainly asking for leads that could help me achieve this, or just a smarter way to look at the problem!
The code that produced this figure:
import shelve
import matplotlib.pyplot as plt
import os
path = "C:/Users/***/Desktop/Python/PyHugo/"
d = shelve.open(os.path.join(path, 'output.db'))
pointMat = d['curve']
d.close()
fig = plt.figure()
ax = fig.add_subplot(111)
for matID in pointMat.keys():
    for couple in range(len(pointMat[matID]) - 1):
        plt.plot([pointMat[matID][couple][0][0], pointMat[matID][couple][1][0]],
                 [pointMat[matID][couple][0][1], pointMat[matID][couple][1][1]])
plt.show()
Points are stored in the pointMat dictionary. Each area has a set of points, and each area is a specific material. It is represented in the first figure by the black line (around 540), so in the current example there are 2 materials.
Then, the first pair of points is given by:
print(pointMat[0][0])
Result: [[20, 20], [0, 40]], i.e. the first pair of points in the first material.
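To color each segment by the value you compute (blue to red), one option is matplotlib's LineCollection with a diverging colormap. A minimal sketch, assuming a hypothetical point_mat with the same nesting as pointMat and made-up per-segment values; "coolwarm" is just one blue-to-red colormap. This colors each segment uniformly, so on its own it does not resolve the overlap question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# Hypothetical data with the same nesting as pointMat:
# material id -> list of segments, each segment a pair of [x, y] points.
point_mat = {
    0: [[[20, 20], [0, 40]], [[0, 40], [30, 60]]],
    1: [[[30, 60], [50, 70]]],
}
# Hypothetical per-segment values (e.g. the computed material state).
values = {0: [0.2, 0.8], 1: [0.5]}

# Flatten into one list of segments and one matching list of values.
segments = [seg for mat_id in point_mat for seg in point_mat[mat_id]]
colors = [v for mat_id in point_mat for v in values[mat_id]]

fig, ax = plt.subplots()
lc = LineCollection(segments, cmap="coolwarm", linewidths=3)
lc.set_array(np.asarray(colors))  # one scalar per segment, mapped through the colormap
ax.add_collection(lc)
ax.autoscale()
fig.colorbar(lc, ax=ax, label="state")
```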
Edit 1: code added, off-topic question removed
EDIT 2: Instead of plotting the curves, I am now mapping the values onto a grid (a discretisation of the phenomenon). The problem has too many variations, and this seemed a better idea. Thank you for the time spent trying to help me!
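A minimal NumPy sketch of mapping segment values onto a grid, under stated assumptions: rasterize_segments is a hypothetical helper, each segment carries one value, and a cell crossed by several segments simply sums their values, so overlapping areas come out different. The sampling density and the summing rule are illustrative choices, not the author's method.

```python
import numpy as np

def rasterize_segments(segments, values, shape, extent, samples=200):
    """Hypothetical helper: accumulate one value per segment onto a grid.

    segments: list of ((x0, y0), (x1, y1)); extent: (xmin, xmax, ymin, ymax).
    Cells crossed by several segments end up with the sum of their values.
    """
    grid = np.zeros(shape)
    ny, nx = shape
    xmin, xmax, ymin, ymax = extent
    for ((x0, y0), (x1, y1)), v in zip(segments, values):
        # sample points evenly along the segment
        t = np.linspace(0.0, 1.0, samples)
        xs = x0 + t * (x1 - x0)
        ys = y0 + t * (y1 - y0)
        # map coordinates to integer cell indices, clipped to the grid
        ix = np.clip(((xs - xmin) / (xmax - xmin) * nx).astype(int), 0, nx - 1)
        iy = np.clip(((ys - ymin) / (ymax - ymin) * ny).astype(int), 0, ny - 1)
        # count each cell once per segment, however many samples fall in it
        cells = np.unique(np.stack([iy, ix], axis=1), axis=0)
        grid[cells[:, 0], cells[:, 1]] += v
    return grid

# Two overlapping horizontal segments: the shared cells receive 1.0 + 2.0.
segments = [((0, 0), (10, 0)), ((0, 0), (5, 0))]
grid = rasterize_segments(segments, [1.0, 2.0], shape=(10, 10), extent=(0, 10, 0, 10))
```

The grid could then be displayed with plt.imshow(grid, origin="lower", extent=(0, 10, 0, 10), cmap="coolwarm").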

You need to make your lines semi-transparent using the alpha parameter:
plt.plot([pointMat[matID][couple][0][0], pointMat[matID][couple][1][0]],
         [pointMat[matID][couple][0][1], pointMat[matID][couple][1][1]],
         alpha=0.7)
where 0.7 is just an example value.
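A self-contained sketch of the effect, assuming hypothetical pointMat-like data in which two collinear segments overlap between x = 0 and x = 10. With alpha below 1, the overlapping stretch is drawn darker because the two strokes are composited on top of each other; for two layers of opacity a, the combined opacity is 1 - (1 - a)**2.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Hypothetical data: two materials whose segments overlap on part of y = 40 - x.
pointMat = {0: [[[20, 20], [0, 40]]], 1: [[[10, 30], [0, 40]]]}

fig, ax = plt.subplots()
for matID in pointMat:
    for (x0, y0), (x1, y1) in pointMat[matID]:
        # alpha < 1 lets overlapping segments blend instead of hiding each other
        ax.plot([x0, x1], [y0, y1], color="tab:blue", linewidth=6, alpha=0.7)

# Opacity where the two strokes overlap
alpha = 0.7
overlap_opacity = 1 - (1 - alpha) ** 2
print(round(overlap_opacity, 2))  # 0.91
```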

Related

How to tell datashader to use gray color for NA values when plotting categorical data

I have a very large dataset that I cannot plot directly using HoloViews. I want to make a scatterplot with categorical data. Unfortunately, my data is very sparse and many points have NA as their category. I would like to make these points gray. Is there any way to tell Datashader what I want to do?
Here is how I do it now (more or less as proposed in https://holoviews.org/user_guide/Large_Data.html).
Here is an example:
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
import datashader as ds
from datashader.colors import Sets1to3
from holoviews.operation.datashader import datashade,spread
raw_data = [('Alice', 60, 'London', 5),
            ('Bob', 14, 'Delhi', 7),
            ('Charlie', 66, np.nan, 11),
            ('Dave', np.nan, 'Delhi', 15),
            ('Eveline', 33, 'Delhi', 4),
            ('Fred', 32, 'New York', np.nan),
            ('George', 95, 'Paris', 11)
            ]
# Create a DataFrame object
df = pd.DataFrame(raw_data, columns=['Name', 'Age', 'City', 'Experience'])
df['City']=pd.Categorical(df['City'])
x='Age'
y='Experience'
color='City'
cats=df[color].cat.categories
# Make dummy-points (currently the only way to make a legend: https://holoviews.org/user_guide/Large_Data.html)
for cat in cats:
    # Just to make clear how many points of a given category we have
    print(cat, ((df[color] == cat) & (df[x].notnull()) & (df[y].notnull())).sum())
color_key=[(name,color) for name, color in zip(cats,Sets1to3)]
color_points = hv.NdOverlay({n: hv.Points([0,0], label=str(n)).opts(color=c,size=0) for n,c in color_key})
# Create the plot with datashader
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),)
datashaded=datashade(points,aggregator=ds.by(color)).opts(width=800, height=480)
(spread(datashaded,px=4, shape='square')*color_points).opts(legend_position='right')
It produces the following picture:
You can see some issues:
Most importantly, although there is just one person from Paris, you can see that the NA person (Charlie) is also drawn in purple, the color for Paris. Is there a way to make that dot gray? I have tried many plots, and the NAs always seem to take the color of the last item in the legend.
There are also some minor issues that I did not want to open separate questions for. (If you think they deserve their own questions, please tell me; I am new to Stack Overflow and appreciate your advice.)
One other problem:
The dots are not all of the same size. This is quite ugly. Is there a way to change that?
I also have a question: does Datashader internally use the .cat.categories attribute to decide which color to use? How does Datashader determine its colors? I wonder whether the legend is always in the correct order, showing the correct colors: if you permute the order in cats, then color_key and cats are no longer in the same order and the legend shows the wrong colors. It seems to always work the way I do it, but I am a bit unsure.
And maybe someone wants to give their opinion on whether Points is okay to use for scatterplots in this case. I do not see any difference from Scatter, and semantically there is not really one variable that causes the other (one might argue that age causes experience here, but I am going to plot variables where such causal relationships are not at all easy to identify), so if I understood the documentation https://holoviews.org/reference/elements/bokeh/Points.html correctly, it is best to use Points.

Most importantly, although there is just one person from Paris, you can see that the NA person (Charlie) is also drawn in purple, the color for Paris. Is there a way to make that dot gray? I have tried many plots, and the NAs always seem to take the color of the last item in the legend.
Right now, I believe Datashader replaces NaNs with zeros (see https://github.com/holoviz/datashader/blob/master/datashader/transfer_functions/__init__.py#L351). It seems like a good feature request to be able to supply Datashader with a color to use for NaNs instead. In the meantime, I'd recommend replacing the NaNs with an actual category name like "Other", "Missing" or "Unknown"; both the coloring and the legend should then reflect that name.
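A minimal pandas sketch of that workaround, using a trimmed copy of the question's City column. The label "Missing" is an arbitrary choice; the category has to be added before fillna, because a categorical column only accepts values that are among its categories.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"City": ["London", "Delhi", np.nan, "Paris"]})
df["City"] = pd.Categorical(df["City"])

# Add an explicit category for missing values, then fill the NaNs with it,
# so the aggregation and the legend both see a real "Missing" category.
df["City"] = df["City"].cat.add_categories("Missing").fillna("Missing")
print(list(df["City"].cat.categories))  # ['Delhi', 'London', 'Paris', 'Missing']
```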
One other problem: The dots are not all of the same size. This is quite ugly. Is there a way to change that?
Usually Datashader in a Bokeh HoloViews plot will render once initially before it is put into a Bokeh layout, and will be triggered to update once the layout is finished with a final version. Here, the initial rendering is being auto-ranged to precisely the range of the data points, then clipped by the boundaries of the plot (making squares near the edges become rectangles), and then the range of the plot is updated once the legend is added. To see how that works, remove *color_points and you'll see the same shape of dots, but now cropped by the plot edges:
You can manually trigger an update to the plot by zooming or panning slightly once it's displayed, but to force it to update without needing manual intervention, you can supply an explicit plot range:
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),).redim.range(Age=(0,90), Experience=(0,14))
It would be great if you could file a bug report on HoloViews asking why it is not refreshing automatically in this case when the legend is included. Hopefully a simple fix!
I also have a question: does Datashader internally use the .cat.categories attribute to decide which color to use? How does Datashader determine its colors? I wonder whether the legend is always in the correct order, showing the correct colors: if you permute the order in cats, then color_key and cats are no longer in the same order and the legend shows the wrong colors. It seems to always work the way I do it, but I am a bit unsure.
Definitely! Whenever you show a legend, you should be sure to pass in the color key as a dictionary so that you can be sure that the legend and the plot coloring use the same colors for each category. Just pass in color_key={k:v for k,v in color_key} when you call datashade.
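A sketch of building the color key as a dictionary, assuming a hypothetical four-color palette in place of Sets1to3 and the "Missing" category suggested in the first part of this answer. Mapping "Missing" to gray (#808080) is an assumption about what you want; the resulting dict would then be passed as datashade(..., color_key=color_key), and the legend built from the same dict.

```python
# Hypothetical palette standing in for datashader.colors.Sets1to3
palette = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]
cats = ["Delhi", "London", "Paris", "Missing"]

# One name -> color mapping shared by the plot and the legend
color_key = dict(zip(cats, palette))
color_key["Missing"] = "#808080"  # assumption: show the filled-in NA category in gray

# Because colors are looked up by name, iterating the categories in any
# order still pairs each name with the same color.
legend_entries = [(name, color_key[name]) for name in reversed(cats)]
```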
And maybe someone wants to give their opinion on whether Points is okay to use for scatterplots in this case. I do not see any difference from Scatter, and semantically there is not really one variable that causes the other (one might argue that age causes experience here, but I am going to plot variables where such causal relationships are not at all easy to identify), so if I understood the documentation https://holoviews.org/reference/elements/bokeh/Points.html correctly, it is best to use Points.
Points is meant for a 2D location in a plane where both axes are interchangeable, such as the physical x,y location on a geometric plane. Points has two independent dimensions, and expects any dependent dimensions to be used for the color, marker shape, etc. It's true that if you don't know which variable might depend on the other, a Scatter is tricky to use, but simply choosing to put something on the x axis will lead people to think that that variable is independent, so there's not much that can be done about it. So Points is definitely not appropriate in this case.

Python/Seaborn: What does the horizontal distribution of the data points within each category mean, or is it random?

It seems like the horizontal distribution of the data points within each category of the strip plot is almost random every time you plot (using Seaborn). Is it for ease of readability or for some other meaningful purpose?
I am using Python 3.0 and the Seaborn-provided dataset 'tips' for this question.
import seaborn as sns
tips = sns.load_dataset("tips")
When I run the same code below twice, the distribution of the points within each category differs. Here is the code, which you can run a couple of times:
ax = sns.stripplot(x="day", y="total_bill", data=tips, alpha=.55,
                   palette='Set1', jitter=True, linewidth=1)
Now, if you look at the plots (if you ran it twice, for example), you will notice that the points are not distributed the same way in the two plots:
Why are the points not distributed identically across two separate runs? Also, looking at the points along the horizontal axis: is there a reason why (for example) one red point is further left than another red point, or is it simply for readability?
Thank you in advance!

After a bit more research, I believe that the horizontal distribution of the data points is random but uniform (thank you @ImportanceOfBeingErnest for pointing me to the code). So, to answer my own question: there is no hidden meaning in the horizontal distribution. The horizontal spread is simply there for visibility, and it changes between runs or stays the same depending on whether a random seed is set.

I think both displays are identical along the vertical axis (i.e. both distributions are equal, since they represent the same scatter plot of a given dataset). The slight visual differences come from the positions along the horizontal (categorical day) axis: the jitter option (jitter=True) adds a small random horizontal offset around the position of the day each point belongs to. Jitter helps distinguish points with the same total_bill value, which would otherwise be drawn on top of each other. So the difference between runs comes from jitter=True, which is used for readability.
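A NumPy-only illustration of the idea both answers describe; this is not seaborn's implementation. jitter_positions is a hypothetical helper that draws uniform offsets around a category's x position (width 0.2 is an arbitrary choice), and passing a seed makes the offsets repeatable between runs.

```python
import numpy as np

def jitter_positions(category_index, n_points, width=0.2, seed=None):
    """Hypothetical helper: uniform random offsets around a category position."""
    rng = np.random.default_rng(seed)
    return category_index + rng.uniform(-width, width, size=n_points)

a = jitter_positions(0, 5)           # unseeded: differs from run to run
b = jitter_positions(0, 5, seed=42)  # seeded: identical on every run
c = jitter_positions(0, 5, seed=42)
print(np.array_equal(b, c))  # True
```

The vertical values are untouched; only the horizontal position within the category's band changes.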

Making a simple plot smooth in python

I have made a simple program that generates a path avoiding objects (objects here are shown in green and yellow). The path is shown in red.
This is meant to be used to navigate a small car; however, it can't make sharp 45° turns, and I need a way to make the path smoother.
The yellow area is just a safety zone so there is no problem if it slightly cuts into it.
The plot is made using the following code (it updates as the object moves around):
path_data, = plt.plot(path_x, path_y, 'r-')
Image of path and object:
Regards, Jakob
Edit: The difference (from what I can tell) between my problem and the one answered in the other thread is that I will not know beforehand what degree my curve will have, and it will be used continuously, so I cannot plot it and then decide the degree myself. (I'm not an experienced programmer, so I could very well be wrong.)

You could try generating smoother data before plotting it. Usually one does this by fitting a smooth function to the data. Here you should swap the x and y axes so that the path becomes a proper function (only for the interpolation step).
One function you could use:
from scipy.interpolate import interp1d
The doc: https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
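As an alternative that needs neither a polynomial degree nor an axis swap (the concern raised in the question's edit), here is a sketch of Chaikin's corner-cutting algorithm. This is a swapped-in technique, not what the answer above proposes. It assumes path_x and path_y are the same sequences passed to plt.plot; each iteration replaces every corner by two points at 1/4 and 3/4 along the adjacent segments, so sharp turns are progressively rounded while the endpoints stay fixed. The smoothed path cuts slightly inside each turn.

```python
import numpy as np

def chaikin_smooth(path_x, path_y, iterations=3):
    """Chaikin corner cutting with fixed endpoints; returns smoothed x and y arrays."""
    pts = np.column_stack([path_x, path_y]).astype(float)
    for _ in range(iterations):
        p0, p1 = pts[:-1], pts[1:]
        q = 0.75 * p0 + 0.25 * p1  # point 1/4 along each segment
        r = 0.25 * p0 + 0.75 * p1  # point 3/4 along each segment
        inner = np.empty((2 * len(q), 2))
        inner[0::2] = q
        inner[1::2] = r
        # keep the original start and end points
        pts = np.vstack([pts[:1], inner, pts[-1:]])
    return pts[:, 0], pts[:, 1]

# A path with one sharp 90-degree corner at (1, 0)
sx, sy = chaikin_smooth([0, 1, 1], [0, 0, 1])
```

In the live plot, the smoothed path could replace the raw one with path_data.set_data(sx, sy).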

matplotlib legend performance issue

I am using Jupyter Notebook with Python 3.6.2 and matplotlib to plot some data.
When I plot my data, I want to add a legend to the plot (basically to know which line is which)
However calling plt.legend takes a lot of time (almost as much as the plot itself, which to my understanding should just be instant).
Minimal toy problem that reproduces the issue:
import numpy as np
import matplotlib.pyplot as plt
# Toy useless data (one million x 4)
my_data = np.random.rand(1000000,4)
plt.plot(my_data)
#plt.legend(['A','C','G','T'])
plt.show()
The data here is just random and useless, but it reproduces my problem:
If I uncomment the plt.legend line, the run takes almost twice as long.
Why? Shouldn't the legend just look at the plot, see that 4 plots have been made, and draw a box assigning each color to the corresponding string?
Why is a simple legend taking so much time?
Am I missing something?

Replicating the answer by @bnaecker so that this question has an answer:
By default, the legend will be placed in the "best" location, which requires computing how many points from each line are inside a potential legend box. If there are many points, this can take a while. Drawing is much faster when specifying a location other than "best", e.g. plt.legend(loc=3).
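A sketch of that fix applied to the question's toy data. loc="lower left" is the string form of loc=3; any fixed location skips the "best" placement search. Passing the Line2D handles explicitly is optional.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Toy useless data (one million x 4), as in the question
my_data = np.random.rand(1_000_000, 4)

fig, ax = plt.subplots()
lines = ax.plot(my_data)  # one Line2D per column
# A fixed location avoids scanning every plotted point for the "best" spot
legend = ax.legend(lines, ["A", "C", "G", "T"], loc="lower left")
```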

Best way to create a 2D Contour Map with Python

I am trying to create a 2D Contour Map in Python that looks like this:
In this case, it is a map of chemical concentration for a number of points on the map. But for the sake of simplicity, we could just say it's elevation.
I am given the map, in this case 562 by 404px. I am given a number of X & Y coordinates with the given value at that point. I am not given enough points to smoothly connect the line, and sometimes very few data points to draw from. It's my understanding that Spline plots should be used to smoothly connect the points.
I see that there are a number of libraries out there for Python which assist in creation of the contour maps similar to this.
Matplotlib's Pyplot Contour looks promising.
Numpy also looks to have some potential
But to me, I don't see a clear winner. I'm not really sure where to start, being new to this programming graphical data such as this.
So my question really is, what's the best library to use? Simpler would be preferred. Any insight you could provide that would help get me started the proper way would be fantastic.
Thank you.

In the numpy example that you show, the author is actually using Matplotlib. While there are several plotting libraries, Matplotlib is the most popular for simple 2D plots like this. I'd probably use that unless there is a compelling reason not to.
A general strategy would be to try to find something that looks like what you want in the Matplotlib example gallery and then modify the source code. Another good source of high quality Matplotlib examples that I like is:
http://astroml.github.com/book_figures/

NumPy is an N-dimensional array library, not a plotting package.
You don't need data for every pixel. Simply mask your data array: Matplotlib will automatically plot the areas it can and leave the other areas blank.
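A minimal sketch of the masking idea, assuming a hypothetical gridded elevation array with a rectangular hole of NaNs where no data exists. np.ma.masked_invalid masks the NaNs, and contourf leaves the masked cells blank.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical elevation grid with a region that has no data
x = np.linspace(0, 10, 50)
y = np.linspace(0, 8, 40)
X, Y = np.meshgrid(x, y)
Z = np.sin(X / 2) + np.cos(Y / 2)
Z[10:20, 15:30] = np.nan  # 10 x 15 block of missing values

Zm = np.ma.masked_invalid(Z)  # mask the cells without data
fig, ax = plt.subplots()
cs = ax.contourf(X, Y, Zm, levels=10)
fig.colorbar(cs, ax=ax)
```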

I was having this same question. I found that matplotlib has interpolation which can be used to smoothly connect discrete X-Y points.
The following docs helped me get through it:
Matplotlib's matplotlib.tri.LinearTriInterpolator docs.
Matplotlib's Contour Plot of Irregularly Spaced Data example
How I used the above resources loading x, y, z points in from a CSV to make a topomap end-to-end
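A sketch of the LinearTriInterpolator approach, assuming randomly generated scattered samples in place of the CSV and a made-up value field; the 562 x 404 extent is taken from the question. Triangulation connects the scattered points, the interpolator evaluates them on a regular grid, and grid points outside the convex hull of the samples come back masked, so contourf leaves them blank. Note that this is linear, not spline, interpolation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import matplotlib.tri as mtri
import numpy as np

# Hypothetical scattered samples (x, y, value), e.g. loaded from a CSV
rng = np.random.default_rng(0)
x = rng.uniform(0, 562, 40)
y = rng.uniform(0, 404, 40)
z = np.hypot(x - 281, y - 202)  # made-up value field

triang = mtri.Triangulation(x, y)              # Delaunay triangulation of the samples
interp = mtri.LinearTriInterpolator(triang, z)

# Evaluate on a regular grid covering the 562 x 404 map
xi, yi = np.meshgrid(np.linspace(0, 562, 200), np.linspace(0, 404, 150))
zi = interp(xi, yi)  # masked array; points outside the triangulation are masked

fig, ax = plt.subplots()
cs = ax.contourf(xi, yi, zi, levels=12)
ax.plot(x, y, "k.", markersize=3)  # show where the samples are
```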
