plot.ly Bar plot axis labels - python

I am trying to draw a bar plot of a pandas DataFrame column.
df[z1z2].head()
MN-SW_TO_MN-SE 562
IA-2_TO_MN-SE 345
MN-SW_TO_MN-WC 259
MN-SW_TO_MN-SW 184
ND_TO_MN-NW 163
Name: z1z2, dtype: int64
data = [Bar(y=df['z1z2'].value_counts()[0:50])]
iplot(data)
Note that df['z1z2'].iplot(kind=Bar.....) fails with credential errors. I guess you can't call iplot directly from pandas offline?
In any case, how do I label the x-axis with the categories, i.e. MN-SW_TO_MN-SE etc.?
I am working offline within Jupyter notebook.
If someone can also point me to documentation that shows examples of what else is possible for Bar plots or any other plot I would be grateful.
Yes, I do know where the full reference documentation is for plot.ly. However, it is not intuitive and not easy to use for anyone that isn't a developer.

You can pass your categorical x-values directly to Plotly. In the example below the first column contains the categories (x=df.iloc[:,0]).
import string
import pandas as pd
import plotly
plotly.plotly.sign_in('username', 'api_key')
data = [[c, i] for i, c in enumerate(string.ascii_uppercase)]
df = pd.DataFrame(data)
plotly.plotly.plot([plotly.graph_objs.Bar(x=df.iloc[:,0], y=df.iloc[:,1])])
What else is possible with Plotly's bar charts? The best way for me was to look at existing examples and try to reverse engineer them. The documentation is fairly complete but cryptic without examples.
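For the offline Jupyter case in the question, the categories are already available as the index of the value_counts() result, so they can be passed as x. The following is a minimal sketch, assuming Plotly 4 or newer (where plotly.graph_objects is available and no sign-in is needed); the sample values and the helper name make_bar_figure are made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the question's df['z1z2'] column.
df = pd.DataFrame({"z1z2": ["MN-SW_TO_MN-SE"] * 3
                           + ["IA-2_TO_MN-SE"] * 2
                           + ["ND_TO_MN-NW"]})

# value_counts() returns a Series sorted by count, with the categories as its index.
counts = df["z1z2"].value_counts().head(50)


def make_bar_figure(counts):
    """Build a bar figure with the category labels on the x-axis."""
    # Imported here so the data preparation above runs even without Plotly.
    import plotly.graph_objects as go

    fig = go.Figure(go.Bar(x=list(counts.index), y=list(counts.values)))
    fig.update_layout(xaxis_title="z1z2", yaxis_title="count")
    return fig


# In a notebook: make_bar_figure(counts).show()
```

Passing counts.index explicitly is what puts the category names on the axis; without x, Plotly falls back to positional numbers.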

Related

Grouped bar chart of multiindex

First of all: I'm completely new to Python.
I'm trying to visualize some measured data. Each entry has a quadrant, number and sector. The original data lies in a .xlsx file. I've managed to use a .pivot_table to sort the data according to its sector. Due to overlapping, number and quadrant also have to be indexed. Now I want to plot it as a bar chart, where the bars are grouped by sector and the colors represent the quadrant.
But because number also has to be indexed, it shows up in the bar chart as a separate group. There should only be three groups, 0, i and a.
MWE:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
d = {'quadrant': ["0","0","0","0","0","0","I","I","I","I","I","I","I","I","I","I","I","I","II","II","II","II","II","II","II","II","II","II","II","II","III","III","III","III","III","III","III","III","III","III","III","III","IV","IV","IV","IV","IV","IV","IV","IV","IV","IV","IV","IV"], 'sector': [0,"0","0","0","0","0","a","a","a","a","a","a","i","i","i","i","i","i","a","a","a","a","a","a","i","i","i","i","i","i","a","a","a","a","a","a","i","i","i","i","i","i","a","a","a","a","a","a","i","i","i","i","i","i"], 'number': [1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6], 'Rz_m': [67.90,44.17,44.30,63.43,49.87,39.33,61.17,69.37,66.20,44.20,64.77,39.93,44.33,50.97,55.90,51.33,58.23,44.53,50.03,47.40,58.67,71.57,57.60,70.77,63.93,47.37,46.90,34.73,41.27,48.23,58.30,47.07,50.53,51.20,32.67,50.37,37.50,55.50,41.20,48.07,56.80,49.77,40.87,44.43,44.00,60.03,63.73,72.80,51.60,45.53,60.27,71.00,59.63,48.70]}
df = pd.DataFrame(data=d)
B = df.pivot_table(index=['sector','number', 'quadrant'])
B.unstack().plot.bar(y='Rz_m')
The data viz ecosystem in Python is pretty diverse and there are multiple libraries you can use to produce the same chart. Matplotlib is a very powerful library, but it's also quite low-level, meaning you often have to do a lot of preparatory work before getting to the chart, so usually you'll find people use seaborn for static visualisations, especially if there is a scientific element to them (it has built-in support for things like error bars, etc.)
Seaborn is built on top of matplotlib and, out of the box, offers many chart types that support exploratory data analysis. For your example, if I understood it right, it would be as simple as:
import seaborn as sns
sns.catplot(x="sector", y="Rz_m", hue="quadrant", data=df, ci=None,
            height=6, kind="bar", palette="muted")
And the output would look like this:
Note that in your example, you left the quotes off one of the zeroes in sector, so 0 and "0" are plotted as separate categories. If you're using seaborn, you don't need to pivot the data; just feed it the df as you've defined it.
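If you would rather stay with pandas plotting, one possible approach is to normalise sector to strings and pivot so that sector forms the index and quadrant the columns; number is then averaged away instead of becoming its own group. This is a minimal sketch with a shortened, made-up subset of the data, and averaging over number is an assumption about what the chart should show:

```python
import pandas as pd

# Shortened, made-up subset of the question's data; note the stray integer 0.
df = pd.DataFrame({
    "quadrant": ["0", "0", "I", "I", "I", "I", "II", "II", "II", "II"],
    "sector":   [0, "0", "a", "a", "i", "i", "a", "a", "i", "i"],
    "number":   [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    "Rz_m":     [67.9, 44.1, 61.1, 69.3, 44.3, 50.9, 50.0, 47.4, 63.9, 47.3],
})

# Make 0 and "0" the same category.
df["sector"] = df["sector"].astype(str)

# One row per sector, one column per quadrant; values are the mean over number.
grouped = df.pivot_table(index="sector", columns="quadrant",
                         values="Rz_m", aggfunc="mean")

# grouped.plot.bar() then draws one group of bars per sector,
# coloured by quadrant (requires matplotlib).
```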
For interactive visualisations (with tooltips, zoom, pan, etc.), you can also check out bokeh.
There is an interesting wrinkle to this: how to center the nested bars on the label. By default the bars are drawn with center alignment, which works fine for an odd number of columns. However, for an even number, you'd want them to be centered on the right edge. You can make a small alteration in the source file categorical.py, starting at line 1642, like so:
# Draw the bars
offpos = barpos + self.hue_offsets[j]
barfunc(offpos, self.statistic[:, j], -self.nested_width,
        color=self.colors[j], align="edge",
        label=hue_level, **kws)
Save the .png and then change it back, but it's not ideal. Probably worth flagging up to the library maintainers.

same data produces different pandas plot

I created a graph using DOGE crypto data:
import pandas as pd
import matplotlib.pyplot as plt
df2 = pd.read_csv("https://raw.githubusercontent.com/peoplecure/pandoras-box/master/doge.csv")
plt.plot(df2['begins_at'], df2['open_price'])
plt.show()
The graph above looks fine. But when I try to create a graph another way with the exact same data, the graph looks totally off:
from pandas import DataFrame
df = DataFrame(DOGE_data)
plt.plot(df['begins_at'], df['open_price'])
plt.show()
Regrettably, I don't have a way to share the data used in the second method. However, the data used in the first graph was created from df. I was hoping someone has an idea of what may be going on here.
The messed up y-axis could be the hint: Usually, with numerical data, there would be 4-12 y-axis label ticks and markers. Then, usually, with non-numerical data, there is one tick for each "category".
Check the data type of y-data in the second dataset: df['open_price'].dtype
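A minimal sketch of that check, with made-up values standing in for the second dataset (the assumption being that its prices arrived as strings, for example from an API response):

```python
import pandas as pd

# Made-up rows standing in for DOGE_data; the prices are strings here.
df = pd.DataFrame({
    "begins_at": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "open_price": ["0.0047", "0.0056", "0.0105"],
})

# A non-numeric dtype here means matplotlib treats every value as a category.
print(df["open_price"].dtype)

# Convert to real numbers and dates before plotting.
df["open_price"] = pd.to_numeric(df["open_price"])
df["begins_at"] = pd.to_datetime(df["begins_at"])

print(df["open_price"].dtype)  # float64
```

After the conversion, plt.plot(df['begins_at'], df['open_price']) should give a continuous numeric y-axis again.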

How can I loop through a list of elements and create time series plots in Python

Here is a sample of the data I'm working with: WellAnalyticalData. I'd like to loop through each well name and create a time series chart for each parameter, with sample date on the x-axis and the value on the y-axis. I don't think I want subplots; I'm just looking for individual plots of each analyte for each well. I've used pandas to try grouping by well name and then attempting to plot, but that doesn't seem to be the way to go. I'm fairly new to Python and I think I'm also having trouble figuring out how to construct the loop statement. I'm running Python 3.x and am using the matplotlib library to generate the plots.
So, if I understand your question correctly, you want one plot for each combination of Well and Parameter. No subplots, just a new plot for each combination. Each plot should have SampleDate on the x-axis and Value on the y-axis. I've written a loop here that does just that, although you'll see that, since your data has just one date per well per parameter, each plot is just a single dot.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({'WellName':['A','A','A','A','B','B','C','C','C'],
                   'SampleDate':['2018-02-15','2018-03-31','2018-06-07','2018-11-14','2018-02-15','2018-11-14','2018-02-15','2018-03-31','2018-11-14'],
                   'Parameter':['Arsenic','Lead','Iron','Magnesium','Arsenic','Iron','Arsenic','Lead','Magnesium'],
                   'Value':[0.2,1.6,0.05,3,0.3,0.79,0.3,2.7,2.8]
                   })
for well in df.WellName.unique():
    temp1 = df[df.WellName==well]
    for param in temp1.Parameter.unique():
        fig = plt.figure()
        temp2 = temp1[temp1.Parameter==param]
        plt.scatter(temp2.SampleDate,temp2.Value)
        plt.title('Well {} and Parameter {}'.format(well,param))
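The same loop can also be written with a single groupby over both columns. This is a sketch rather than a drop-in replacement: it uses a made-up sample with several dates per series, converts SampleDate to real datetimes so the x-axis is ordered in time, and the helper name plot_each_series is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample with more than one date per well and parameter.
df = pd.DataFrame({
    "WellName": ["A", "A", "A", "B", "B"],
    "SampleDate": ["2018-02-15", "2018-06-07", "2018-11-14",
                   "2018-02-15", "2018-11-14"],
    "Parameter": ["Arsenic", "Arsenic", "Arsenic", "Iron", "Iron"],
    "Value": [0.2, 0.25, 0.3, 0.79, 0.81],
})
df["SampleDate"] = pd.to_datetime(df["SampleDate"])


def plot_each_series(df):
    """Return one figure per (well, parameter) combination."""
    figures = {}
    for (well, param), group in df.groupby(["WellName", "Parameter"]):
        fig, ax = plt.subplots()
        group = group.sort_values("SampleDate")
        ax.plot(group["SampleDate"], group["Value"], marker="o")
        ax.set_title("Well {} and Parameter {}".format(well, param))
        ax.set_xlabel("SampleDate")
        ax.set_ylabel("Value")
        figures[(well, param)] = fig
    return figures


figures = plot_each_series(df)
```

Keeping the figures in a dictionary makes it easy to save or close each one afterwards, for example with figures[("A", "Arsenic")].savefig(...).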

Python Bokeh: Plotting same chart multiple times in gridplot

I'm currently trying to get an overview of plots of data from different dates. To get a good feel for the data I would like to show relevant plots next to each other. This means I want to use the same plot multiple times in the gridplot command. However, I noticed that when I use the same chart multiple times, it is only shown once in the final .html file. My first attempt at solving this was to use copy.deepcopy on the charts, but this gave the following error:
RuntimeError: Cannot get a property value 'label' from a LineGlyph instance before HasProps.__init__
My approach has been as follows:
from bokeh.charts import Line, output_file, show, gridplot
import pandas as pd
output_file('test.html')
plots = []
df = pd.DataFrame([[1,2], [3,1], [2,2]])
print(df)
df.columns = ['x', 'y']
for i in range(10):
    plots.append(Line(df, x='x', y='y', title='Forecast: ' + str(i),
                      plot_width=250, plot_height=250))
plot_matrix = []
for i in range(len(plots)-1, 2, -1):
    plot_matrix.append([plots[i-3], plots[i-2], plots[i]])
p = gridplot(plot_matrix)
show(p)
The result is an HTML page with a grid plot in which many graphs are missing. Each graph is shown exactly once (instead of the 3 times required), which leads me to think that gridplot does not like me using the same object multiple times. An obvious fix is to simply create every graph 3 times as a different object, which I will do for now, but not only is this inefficient, it also hurts my eyes when looking at my code. I'm hoping somebody has a more elegant solution to my problem.
EDIT: made code runnable
This is not possible. Bokeh plots (or Bokeh objects in general) may not be re-used in layouts.
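What does work is building a fresh plot object for every cell of the grid, which a small factory function keeps tidy. The sketch below assumes a recent Bokeh release with the bokeh.plotting API (the bokeh.charts module used in the question is no longer part of Bokeh); make_plot is a hypothetical helper name.

```python
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.plotting import figure

df = pd.DataFrame([[1, 2], [3, 1], [2, 2]], columns=["x", "y"])


def make_plot(i):
    """Create a new figure on every call, so no object is shared between cells."""
    p = figure(title="Forecast: " + str(i), width=250, height=250)
    p.line(df["x"], df["y"])
    return p


# Same layout as in the question, but every cell gets its own figure.
plot_matrix = [[make_plot(i - 3), make_plot(i - 2), make_plot(i)]
               for i in range(9, 2, -1)]
grid = gridplot(plot_matrix)

# To write and open the page: output_file("test.html"); show(grid)
# (both can be imported from bokeh.plotting)
```

The figures are still created several times, but the duplication lives in one function call instead of in copied code.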

How to output a large number of histograms in a pandas groupby

df is a dataframe with a days column. There are 100 days. I want to look at a histogram for my data column for each of the 100 days. The problem is that this code outputs everything on a single chart and all histograms are stacked together. Two questions:
Any advice to get one histogram for each day?
Any advice to save each histogram to an appropriately named file?
Note: When I replace hist in my code below with describe, it perfectly gives me 100 describe series. Also, the type of the grouper.get_group(days) object is pandas.series.
My simple code:
grouper = df.groupby('days')['data']
for days in grouper.groups.keys():
    print(grouper.get_group(days).hist())
One option would be to use inline plotting either in ipython qtconsole or ipython notebook:
%matplotlib inline
import matplotlib.pyplot as plt
for days in grouper.groups.keys():
    grouper.get_group(days).hist()
    plt.show()
Actually, if you are using the IPython notebook, you can simply do:
df.groupby('days')['data'].hist()
Any function chained onto the groupby is applied to each group in turn (not literally in parallel); this is the strength of groupby.
No need to iterate yourself.
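For the second question, saving each histogram under its own name, one option is to create a separate figure per group and pass its axes to hist. This is a sketch with made-up data; the temporary output directory and the file name pattern hist_day_<day>.png are arbitrary choices.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: three days with a few observations each.
df = pd.DataFrame({
    "days": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "data": [0.1, 0.4, 0.5, 1.2, 1.1, 0.9, 2.0, 2.2, 2.1],
})

out_dir = tempfile.mkdtemp()  # replace with any existing folder
saved = []

for day, series in df.groupby("days")["data"]:
    fig, ax = plt.subplots()       # a new figure per day avoids stacking
    series.hist(ax=ax)
    ax.set_title("days = {}".format(day))
    path = os.path.join(out_dir, "hist_day_{}.png".format(day))
    fig.savefig(path)
    plt.close(fig)                 # free memory when looping over many groups
    saved.append(path)
```

Closing each figure after saving matters with 100 groups, since matplotlib otherwise keeps every open figure in memory.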
