The 3D surface plot in plotly never shows the data. The plot itself shows up, but nothing appears in it, as if I had plotted an empty DataFrame.
At first, I tried something like the solution I found here (Plotly Plot surface 3D not displayed), but had the same result: another plot with no data.
df3 = pd.DataFrame({'x':[1, 2, 3, 4, 5],'y':[10, 20, 30, 40, 50],'z': [5, 4, 3, 2, 1]})
iplot(dict(data=[Surface(x=df3['x'], y=df3['y'], z=df3['z'])]))
So I tried the code from the plotly website (the first cell of this notebook: https://plot.ly/python/3d-scatter-plots/), exactly as it is there, just to see if their example worked, but I get an error.
I am getting this:
https://lh3.googleusercontent.com/sOxRsIDLVkBGKTksUfVqm3HtaSQAN_ybQq2HLA-aclzEU-9ekmvd1ETdfsswC2SdbysizOI=s151
But I should get this:
https://lh3.googleusercontent.com/5Hy2Z-97_vwd3ftKBA6dYZfikJHnA-UMEjd3PHvEvdBzw2m2zeEHBtneLC1jzO3RmE2lyw=s151
Observation: could not post the images because of lack of reputation.
In order to plot a surface you have to provide a value for each point. In this case your x and y are series of size 5, which means that your z should have shape (5, 5).
If I had a bit more info I could give you more details, but for a minimal working example try passing a (5, 5) dataframe, numpy array or even a list of lists as the z value of data.
EDIT:
In a notebook environment the following code works for me:
from plotly import offline
from plotly import graph_objs as go
offline.init_notebook_mode(connected=False)
df3 = {'x':[1, 2, 3, 4, 5],'y':[10, 20, 30, 40, 50],'z': [[5, 4, 3, 2, 1]]*5}
offline.iplot(dict(data=[go.Surface(x=df3['x'], y=df3['y'], z=df3['z'])]))
I'm using plotly 3.7.0.
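If the heights come from a function of x and y rather than from a ready-made table, the (5, 5) grid described above can be built with a nested list comprehension. This is only a sketch: height is a hypothetical function chosen for illustration, and the orientation (one row per y value, one column per x value) is the convention assumed here for Surface.

```python
x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]


def height(xv, yv):
    # Hypothetical surface function, used only for illustration.
    return xv * yv / 10.0


# Build one row per y value and one column per x value,
# so z has len(y) rows and len(x) columns.
z = [[height(xv, yv) for xv in x] for yv in y]
```

The resulting z could then be passed as go.Surface(x=x, y=y, z=z) in place of the repeated row used in the example above.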
I have a dataframe for which I want to precisely control the legend labels when using df.plot(). Example:
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),columns=['a', 'b', 'c'])
When plotting this dataframe, the column names are shown as labels, but I would like to add more text above those labels in LaTeX format, for instance $V_{sd}$. In the end I want my legend to look like:
(transparent) $V_{sd}$
(blue) a
(orange) b
(green) c
What is written inside the parentheses is the color of the label/line, which I want to control precisely as well.
One way to do this is using matplotlib.pyplot.plot and make an empty plot with the extra label and then plot each column/row one by one, but I wonder if there is an easier way to do it with pandas since I have a bunch of dataframes.
Do you mean this:
df.plot(color=['b','orange','green'])
plt.legend(title='$V_{sd}$')
Output:
With this code snippet, I'm expecting a line plot with one line per hue, which has these distinct values: [1, 5, 10, 20, 40].
import math
import pandas as pd
import seaborn as sns
sns.set(style="whitegrid")
TANH_SCALING = [1, 5, 10, 20, 40]
X_VALUES = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
COLUMNS = ['x', 'y', 'hue group']
tanh_df = pd.DataFrame(columns=COLUMNS)
for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[sc]}
    tanh_df = tanh_df.append(
        pd.DataFrame(data=data, columns=COLUMNS),
        ignore_index=True
    )
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df);
However, what I get is a hue legend with values [0, 15, 30, 45], and an additional line, like so:
Is this a bug or am I missing something obvious?
This is a known bug of seaborn when the hue can be cast to integers. You could add a prefix to the hue so casting to integers fails:
for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[f'A{sc}']}  # changes here
    tanh_df = tanh_df.append(
        pd.DataFrame(data=data, columns=COLUMNS),
        ignore_index=True
    )
Output:
Or after you created your data:
# data creation
for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[sc]}  # original numeric hue values
    tanh_df = tanh_df.append(
        pd.DataFrame(data=data, columns=COLUMNS),
        ignore_index=True
    )
# hue manipulation
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1],
             hue='A_' + tanh_df[COLUMNS[2]].astype(str),  # change hue here
             data=tanh_df);
As @LudvigH's comment on the other answer says, this isn't a bug, even if the default behavior is surprising in this case. As explained in the docs:
The default treatment of the hue (and to a lesser extent, size) semantic, if present, depends on whether the variable is inferred to represent “numeric” or “categorical” data. In particular, numeric variables are represented with a sequential colormap by default, and the legend entries show regular “ticks” with values that may or may not exist in the data. This behavior can be controlled through various parameters, as described and illustrated below.
Here are two specific ways to control the behavior.
If you want to keep the numeric color mapping but have the legend show the exact values in your data, set legend="full":
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df, legend="full")
If you want to have seaborn treat the levels of the hue parameter as discrete categorical values, pass a named categorical colormap or either a list or dictionary of the specific colors you want to use:
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df, palette="deep")
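The dictionary form of palette mentioned above can be built directly from the hue levels. A minimal sketch, assuming the TANH_SCALING levels from the question; the color names are an arbitrary pick from matplotlib's named "tab:" colors, not something the question prescribes.

```python
TANH_SCALING = [1, 5, 10, 20, 40]

# Arbitrary illustrative colors; any valid matplotlib color would do.
COLORS = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]

# Map each hue level to exactly one color.
palette = dict(zip(TANH_SCALING, COLORS))
```

Passing palette=palette to the sns.lineplot call should then give one legend entry per level, each in its assigned color, provided the dictionary keys match the values stored in the hue column.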
I've started to use Holoviews with Python3 and Jupyter notebooks, and I'm looking for a good way to put long names and units on my plot axis. An example looks like this:
import holoviews as hv
import pandas as pd
from IPython.display import display
hv.notebook_extension()
dataframe = pd.DataFrame({"time": [0, 1, 2, 3],
                          "photons": [10, 30, 20, 15],
                          "norm_photons": [0.33, 1, 0.67, 0.5],
                          "rate": [1, 3, 2, 1.5]}, index=[0, 1, 2, 3])
hvdata = hv.Table(dataframe, kdims=["time"])
display(hvdata.to.curve(vdims='rate'))
This gives me a nice plot, but instead of 'time' on the x-axis and 'rate' on the y-axis, I would prefer something like 'Time (ns)' and 'Rate (1/s)', but I don't want to type that in the code every time.
I've found this blog post by PhilippJFR which kind of does what I need, but the DFrame() function which he uses is deprecated, so I would like to avoid using that, if possible. Any ideas?
Turns out it's easy to do but hard to find in the documentation. You just pass a holoviews.Dimension instead of a string as the kdims parameter:
hvdata = hv.Table(dataframe, kdims=[hv.Dimension('time', label='Time', unit='ns')])
display(hvdata.to.curve(vdims=hv.Dimension('rate', label='Rate', unit='1/s')))
You can find good alternatives in this SO question:
Setting x and y labels with holoviews
I like doing it like this:
Create a tuple with the name of the variable and the long name you would like to see printed on the plot:
hvdata = hv.Table(
    dataframe,
    kdims=[('time', 'Time (ns)')],
    vdims=[('rate', 'Rate (1/s)')],
)
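To avoid retyping the labels for every plot, the (name, label) tuples can be kept in one place and looked up by column name. A minimal sketch in plain Python: DIMENSION_LABELS and dims are hypothetical names introduced here and are not part of HoloViews.

```python
# Hypothetical lookup table: column name -> long label with unit.
DIMENSION_LABELS = {
    "time": "Time (ns)",
    "rate": "Rate (1/s)",
}


def dims(*names):
    """Return (name, label) tuples, falling back to the bare name."""
    return [(name, DIMENSION_LABELS.get(name, name)) for name in names]


kdims = dims("time")
vdims = dims("rate")
```

These lists could then be passed as kdims=kdims and vdims=vdims in the hv.Table call, so the long names are only written once.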
I want to draw a histogram of some data. Sorry that I cannot attach a sample histogram, as I don't have enough reputation; I hope my description of the problem is clear. I am using Python with pandas, and I noticed that any NaN value is treated as 0 by pandas. Is there a method I can use to include the count of NaN values in the histogram? What I mean is that the x-axis should show NaN as a value as well. Thank you very much.
I was looking for the same thing. I ended up with the following solution:
import matplotlib.pyplot as plt
import pandas

figure = plt.figure(figsize=(6, 9), dpi=100)
graph = figure.add_subplot(111)
freq = pandas.Series(data).value_counts()  # counts of the non-missing values
graph.bar(freq.index, freq.values)  # gives the graph without NaN
n_missing = pandas.isna(data).sum()  # number of missing values
graph.bar([0], [n_missing])  # gives a bar for the number of missing values at x=0
figure.show()
This gave me a histogram with a column at 0 showing the number of missing values in the data.
Did you try replacing NaN with some other unique value and then plotting the histogram?
x = -1  # placeholder: any sentinel value that does not occur in the data
plt.hist(df.replace(np.nan, x))
As pointed out by Sreeram TP, it is possible to use the argument dropna=False in the function value_counts to include the counts of NaNs.
df = pd.DataFrame({'feature1': [1, 2, 2, 4, 3, 2, 3, 4, np.nan],
                   'feature2': [4, 4, 3, 4, 1, 4, 3, np.nan, np.nan]})
# Calculates the histogram for feature1
counts = df['feature1'].value_counts(dropna=False)
counts.plot.bar(title='feat1', grid=True)
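If the NaN bar should carry an explicit label and the bars should appear in value order, the counts can be sorted and relabelled before plotting. A sketch assuming pandas and NumPy are available; the label formatting is just one possible choice.

```python
import numpy as np
import pandas as pd

feature1 = pd.Series([1, 2, 2, 4, 3, 2, 3, 4, np.nan])

# Count every value, including NaN, then order the entries by value.
counts = feature1.value_counts(dropna=False).sort_index()

# Replace the NaN index entry with a readable string label.
labels = ["NaN" if pd.isna(v) else f"{v:g}" for v in counts.index]
labelled = pd.Series(counts.to_numpy(), index=labels)
```

Calling labelled.plot.bar() would then draw one bar per value plus one bar labelled NaN.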
By using .iloc[::-1] on the output of value_counts(), you can reverse its order.
The code would look like this:
df["column"].value_counts().iloc[::-1]
I would like to get the data of a single contour from evenly spaced 2D data (image-like data).
Based on the example found in a similar question: How can I get the (x,y) values of the line that is ploted by a contour plot (matplotlib)?
>>> import matplotlib.pyplot as plt
>>> x = [1,2,3,4]
>>> y = [1,2,3,4]
>>> m = [[15,14,13,12],[14,12,10,8],[13,10,7,4],[12,8,4,0]]
>>> cs = plt.contour(x,y,m, [9.5])
>>> cs.collections[0].get_paths()
The result of this call into cs.collections[0].get_paths() is:
[Path([[ 4. 1.625 ]
[ 3.25 2. ]
[ 3. 2.16666667]
[ 2.16666667 3. ]
[ 2. 3.25 ]
[ 1.625 4. ]], None)]
Based on the plots, this result makes sense and appears to be a collection of (y,x) pairs for the contour line.
Other than manually looping over this return value, extracting the coordinates and assembling arrays for the line, are there better ways to get data back from a matplotlib.path object? Are there pitfalls to be aware of when extracting data from a matplotlib.path?
Alternatively, are there other options within matplotlib, or better yet numpy/scipy, to do a similar thing? Ideally I would get a high-resolution vector of (x,y) pairs describing the line, which could be used for further analysis, as in general my datasets are not as small or simple as the example above.
For a given path, you can get the points like this:
p = cs.collections[0].get_paths()[0]
v = p.vertices
x = v[:,0]
y = v[:,1]
from: http://matplotlib.org/api/path_api.html#module-matplotlib.path
Users of Path objects should not access the vertices and codes arrays
directly. Instead, they should use iter_segments() to get the
vertex/code pairs. This is important, since many Path objects, as an
optimization, do not store a codes at all, but have a default one
provided for them by iter_segments().
Otherwise, I'm not really sure what your question is. zip is a sometimes useful built-in function when working with coordinates.
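As a small illustration of the zip remark, here is how a list of (x, y) vertices, like the ones printed in the question, can be split into separate coordinate sequences and joined back again. Plain Python, no matplotlib needed.

```python
# Vertices copied from the Path output in the question (as printed there).
vertices = [
    (4.0, 1.625),
    (3.25, 2.0),
    (3.0, 2.16666667),
    (2.16666667, 3.0),
    (2.0, 3.25),
    (1.625, 4.0),
]

# zip(*...) transposes the list of pairs into two tuples.
xs, ys = zip(*vertices)

# zip(xs, ys) goes the other way and rebuilds the pairs.
pairs = list(zip(xs, ys))
```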
The vertices of all paths can be obtained as numpy arrays of float64 simply via:
cs.allsegs[i][j] # for element j, in level i
where cs is defined as in the original question as:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 2, 3, 4]
m = [[15, 14, 13, 12], [14, 12, 10, 8], [13, 10, 7, 4], [12, 8, 4, 0]]
cs = plt.contour(x, y, m, [9.5])
More detailed:
Going through the collections and extracting the paths and vertices is not the most straightforward or fastest thing to do. The returned ContourSet object actually exposes the segments directly via cs.allsegs, a nested list indexed as [level][element][vertex_coord]:
num_levels = len(cs.allsegs)
num_element = len(cs.allsegs[0]) # in level 0
num_vertices = len(cs.allsegs[0][0]) # of element 0, in level 0
num_coord = len(cs.allsegs[0][0][0]) # of vertex 0, in element 0, in level 0
See reference:
https://matplotlib.org/stable/api/contour_api.html
I am facing a similar problem, and stumbled over this matplotlib list discussion.
Basically, it is possible to strip away the plotting and call the underlying functions directly. It is not super convenient, but possible. The solution is also not pixel precise, as there is probably some interpolation going on in the underlying code.
import matplotlib.pyplot as plt
import matplotlib._cntr as cntr  # private module; no longer present in newer matplotlib releases
import numpy as np  # the original used scipy's aliases of these numpy functions
data = np.zeros((6,6))
data[2:4,2:4] = 1
plt.imshow(data,interpolation='none')
level=0.5
X,Y = np.meshgrid(np.arange(data.shape[0]),np.arange(data.shape[1]))
c = cntr.Cntr(X, Y, data.T)
nlist = c.trace(level, level, 0)
segs = nlist[:len(nlist)//2]
for seg in segs:
    plt.plot(seg[:,0],seg[:,1],color='white')
plt.show()