I am trying to use hvplot.line to plot two y variables in a line chart. My goal is to format the hover text the way I want (say, one decimal place). I built a Bokeh HoverTool with the standard format syntax and tried to pass it with ".opts(tools=)".
But the formatting is not reflected in the plot. I specified the format '0.0', but the hover text still shows three decimals. What did I do wrong?
My code looks something like this:
import pandas as pd
import numpy as np
import hvplot.pandas
hvplot.extension('bokeh')
from numpy import random
from bokeh.models import HoverTool
df=pd.DataFrame({'length':np.linspace(0,4000,6),
'slope':np.linspace(1.7,2.4,6),
'Direction':np.linspace(1.2,-0.5,6),
'clearance':random.rand(6),
'weight':random.rand(6)},)
hover=HoverTool(tooltips=[('clearance','@clearance{0.0}'),('weight','@weight{0.0}')])
df.hvplot.line(x='length',y=['slope','Direction'],invert=True,hover_cols=['clearance','weight']).opts(tools=[hover])
But if I reduce the number of y variables to just one, it works fine.
Replace the last line of code with:
df.hvplot.line(x='length',y=['Direction'],invert=True,hover_cols=['clearance','weight']).opts(tools=[hover])
You can pass the tools to the plot call as a keyword argument.
Change your code from the commented-out call to the one below it:
# df.hvplot.line(x='length',y=['slope','Direction'], hover_cols=['clearance','weight'], invert=True).opts(tools=[hover])
df.hvplot.line(x='length',y=['slope','Direction'], hover_cols=['clearance','weight'], tools=[hover], invert=True)
and your HoverTool, including its formatter, is applied.
Minimal Example
import hvplot.pandas
import numpy as np
import pandas as pd
from bokeh.models import HoverTool
hvplot.extension('bokeh')
df=pd.DataFrame({
'length':np.linspace(0,4000,6),
'slope':np.linspace(1.7,2.4,6),
'Direction':np.linspace(1.2,-0.5,6),
'clearance':np.random.rand(6),
'weight':np.random.rand(6)}
)
hover=HoverTool(tooltips=[('clearance','@clearance{0.0}'),('weight','@weight{0.0}')])
df.hvplot.line(
x='length',
y=['slope','Direction'],
hover_cols=['clearance','weight'],
tools=[hover],
invert=True
)
Related
I want to change the following graph to a line graph or another type that is easier to understand.
import os
import itertools
import pvlib
import numpy as np
import pandas as pd
import matplotlib.style
import matplotlib as mpl
import matplotlib.pyplot as plt
from pvlib import clearsky, atmosphere, solarposition
from pvlib.location import Location
from pvlib.iotools import read_tmy3
tus = Location(28.6, 77.2, 'Asia/Kolkata', 216, 'Delhi')
times = pd.date_range(start='2016-01-01', end='2016-12-31', freq='1min', tz=tus.tz)
cs = tus.get_clearsky(times)
cs.plot();
plt.grid()
plt.ylabel('Irradiance $W/m^2$');
plt.title('Ineichen, climatological turbidity');
Irradiance data:
Also, how can I place the legend in a suitable position?
By default, pandas draws line plots. The graph you are showing is actually a line graph; there are just so many points that it looks like a solid fill!
To see this, try plotting only the first 1440 rows (one day), i.e.:
cs.iloc[:1440].plot()
If you want to visualize annual data you need to do some kind of aggregation for it to make sense. For example, resample to average daily irradiance and plot this:
cs.resample('1d').mean().plot()
To change the legend use the plt.legend command:
cs.resample('1d').mean().plot()
plt.legend(loc='upper right')
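To make the row counts concrete without downloading anything, here is a minimal sketch on synthetic data. The frame `cs_demo`, its `ghi` column, and the sine-shaped "daylight" curve are all made up for illustration and are not pvlib output; only the `iloc` slice and the `resample(...).mean()` call mirror the advice above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the clear-sky frame: one week of one-minute samples.
index = pd.date_range("2016-01-01", periods=7 * 1440, freq="min")
minutes = np.arange(len(index)) % 1440
# Crude daylight bump between 06:00 and 18:00, zero at night (illustrative only).
ghi = np.clip(np.sin((minutes - 360) / 720 * np.pi), 0, None) * 1000
cs_demo = pd.DataFrame({"ghi": ghi}, index=index)

one_day = cs_demo.iloc[:1440]          # first 1440 rows = the first calendar day
daily = cs_demo.resample("1D").mean()  # one averaged row per day

print(len(cs_demo), len(one_day), len(daily))
```

A year at one-minute resolution has over half a million rows per column, so plotting the raw frame draws far more points than the figure has pixels; the daily means reduce that to a few hundred points.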
Suppose I have this length.csv file content:
How can I use pandas to draw a dot (scatter) plot based on the xy and yx columns?
import pandas as pd
# Raw string, so that \t and \f in the path are not read as tab and form-feed escapes
df = pd.read_csv(r'C:\path\to\folder\length.csv')
Now if you print df, you will get the following
df.plot(x='yx', y='xy', kind='scatter')
You can change the plot kind to other types such as line, bar, etc.
Refer to https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html
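As a rough sketch of switching the plot kind, the snippet below reads a small in-memory CSV instead of a file. The four rows are hypothetical, since the real content of length.csv is not shown here; only the column names xy and yx come from the question.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import pandas as pd

# Hypothetical file content; replace io.StringIO(...) with the path to length.csv.
csv_text = "xy,yx\n1,10\n2,14\n3,9\n4,17\n"
df = pd.read_csv(io.StringIO(csv_text))

# Same data, two kinds: separate dots versus a connected line.
ax_scatter = df.plot(x="xy", y="yx", kind="scatter")
ax_line = df.plot(x="xy", y="yx", kind="line")
```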
You can easily use matplotlib. The plot method in Pandas is a wrapper for matplotlib.
If you wish to use Pandas, you can do it as such:
import pandas as pd
df = pd.read_csv('length.csv')
df.plot(x='xy', y='yx')
If you decide to go ahead with matplotlib, you can do as follows:
import matplotlib.pyplot as plt
import pandas as pd
# Include the next line only if you are in a notebook (like Jupyter or Colab)
%matplotlib inline
df = pd.read_csv('length.csv')
plt.plot(df['xy'], df['yx'])
plt.xlabel('xy')
plt.ylabel('yx')
plt.title('xy vs yx Plot')
plt.show()
Here were my 3 assigned data analysis tasks:
Display the complaint type and city together.
Plot a bar graph of count vs. complaint types.
Display the major complaint types and their count.
Here is my code:
import pandas as pd
import numpy as np
work_on = data[['Complaint Type','City']]
import matplotlib.pyplot as plt
from matplotlib import style
%matplotlib inline
koten = work_on['Complaint Type'].value_counts().head(10).plot(kind='bar')
koten
-- bar graph that was obtained
-- This displays a bar graph, but when I use the following code:
style.use('ggplot')
plt.plot(work_on['Complaint Type'].value_counts().head(10))
plt.xlabel('Values')
plt.ylabel('Names')
plt.title('first')
plt.show()
-- this throws an error:
ValueError: could not convert string to float: 'Traffic Signal Condition'
My question: I am using the .plot(kind=) method, which only works for kind='bar' and produced the graph I shared, but when I switch to the matplotlib method I get errors such as: ValueError: could not convert string to float: 'Traffic Signal Condition'. Is there any other good method in Python to display such non-numerical data?
Here is a glimpse of my data:
Two columns to be worked on
It's not clear from the question what the desired plot should actually show. If it is a line plot across the different categories, you would need to provide the indices of the categories to the plt.plot function.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame([])
df["ComplaintType"] = np.random.choice(list("ABCDEFGHIJ"), size=50)
counts = df["ComplaintType"].value_counts()
plt.plot(range(len(counts)), counts)
plt.xticks(range(len(counts)), counts.index)
plt.show()
It is somewhat questionable whether it makes sense to connect different categories with a line, though.
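If separate bars are the better fit for category counts, the same trick of plotting against integer positions and relabelling the ticks carries over to plt.bar. This is only a sketch on random letters, like the example above; the axis labels are my own choice and the real complaint data is not used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in for the 'Complaint Type' column (illustrative only).
rng = np.random.default_rng(0)
complaints = pd.Series(rng.choice(list("ABCDEFGHIJ"), size=50), name="ComplaintType")
counts = complaints.value_counts()

fig, ax = plt.subplots()
positions = list(range(len(counts)))
ax.bar(positions, counts.values)   # numeric x positions avoid the string-to-float error
ax.set_xticks(positions)
ax.set_xticklabels(counts.index)   # category names as tick labels
ax.set_xlabel("Complaint type")
ax.set_ylabel("Count")
```

Recent matplotlib releases also accept the category strings directly as x values, so on an up-to-date installation `ax.bar(counts.index, counts.values)` should work as well.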
For extensive plotting scripts, I use matplotlib's rcParams to configure some standard plot settings for pandas DataFrames.
This works well for colors and font sizes but not for the default colormap as described here
Here's my current approach:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
# global plotting options
plt.rcParams.update(plt.rcParamsDefault)
matplotlib.style.use('ggplot')
plt.rcParams['lines.linewidth'] = 2.5
plt.rcParams['axes.facecolor'] = 'silver'
plt.rcParams['xtick.color'] = 'k'
plt.rcParams['ytick.color'] = 'k'
plt.rcParams['text.color'] = 'k'
plt.rcParams['axes.labelcolor'] = 'k'
plt.rcParams.update({'font.size': 10})
plt.rcParams['image.cmap'] = 'Blues' # this doesn't show any effect
# dataframe with random data
df = pd.DataFrame(np.random.rand(10, 3))
# this shows the standard colormap
df.plot(kind='bar')
plt.show()
# this shows the right colormap
df.plot(kind='bar', cmap=cm.get_cmap('Blues'))
plt.show()
The first plot does not pick up the colormap from rcParams (which it normally should?):
It only works if I pass it as an argument as in the second plot:
Is there any way to define the standard colormap for pandas DataFrame plots, permanently?
Thanks in advance!
There is no supported, official way to do it; you are stuck because of pandas's internal _get_standard_colors function that hardcodes the use of matplotlib.rcParams['axes.color_cycle'] and falls back to list('bgrcmyk'):
colors = list(plt.rcParams.get('axes.color_cycle',
list('bgrcmyk')))
There are various hacks you can use, however; one of the simplest, which works for all pandas.DataFrame.plot() calls, is to wrap pandas.tools.plotting.plot_frame:
import matplotlib
import pandas as pd
import pandas.tools.plotting as pdplot
def plot_with_matplotlib_cmap(*args, **kwargs):
    kwargs.setdefault("colormap", matplotlib.rcParams.get("image.cmap", "Blues"))
    return pdplot.plot_frame_orig(*args, **kwargs)
pdplot.plot_frame_orig = pdplot.plot_frame
pdplot.plot_frame = plot_with_matplotlib_cmap
pd.DataFrame.plot = pdplot.plot_frame
To test in a notebook:
%matplotlib inline
import pandas as pd, numpy as np
ax = pd.DataFrame(np.random.random((1000,10))).plot()
...yields:
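Note that the wrapper above targets old releases: pandas.tools.plotting and the axes.color_cycle rcParam have both since been removed. On current versions, a possible alternative is to build the color cycle from a colormap and store it in rcParams['axes.prop_cycle'], which pandas reads for its default series colors. The sketch below assumes that behaviour; the number of colors and the sampling range 0.3 to 0.9 are arbitrary choices of mine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from cycler import cycler

n_colors = 5  # arbitrary choice for this sketch
cmap = plt.get_cmap("Blues")
# Sample the colormap away from its near-white end so every series stays visible.
colors = [cmap(x) for x in np.linspace(0.3, 0.9, n_colors)]
plt.rcParams["axes.prop_cycle"] = cycler(color=colors)

df = pd.DataFrame(np.random.rand(10, 3))
ax = df.plot(kind="bar")  # no cmap argument; colors come from the cycle set above
```

Unlike a true colormap, the cycle repeats after n_colors series, so pick n_colors at least as large as the widest frame you expect to plot.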
I'm trying to use Bokeh to plot a pandas DataFrame with a DateTime column containing years and a numeric column. If the DateTime column is specified as x, the behaviour is as expected (years on the x-axis). However, if I use set_index to turn the DateTime column into the index of the DataFrame and then specify only y in the TimeSeries call, I get time in milliseconds on the x-axis. A minimal example:
import pandas as pd
import numpy as np
from bokeh.charts import TimeSeries, output_file, show
output_file('fig.html')
test = pd.DataFrame({'datetime':pd.date_range('1/1/1880', periods=2000),'foo':np.arange(2000)})
fig = TimeSeries(test,x='datetime',y='foo')
show(fig)
output_file('fig2.html')
test = test.set_index('datetime')
fig2 = TimeSeries(test,y='foo')
show(fig2)
Is this the expected behaviour or a bug? I would expect the same picture with both approaches.
Cheers!!
Bokeh used to add an index for internal reasons but as of not-so-recent versions (>= 0.12.x) it no longer does this. Also it's worth noting that the bokeh.charts API has been deprecated and removed. The equivalent code using the stable bokeh.plotting API yields the expected result:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import row
output_file('fig.html')
test = pd.DataFrame({'datetime':pd.date_range('1/1/1880', periods=2000),'foo':np.arange(2000)})
fig = figure(x_axis_type="datetime")
fig.line(x='datetime',y='foo', source=test)
test = test.set_index('datetime')
fig2 = figure(x_axis_type="datetime")
fig2.line(x='datetime', y='foo', source=test)
show(row(fig, fig2))