How to graph a function in Python using the plotnine library

I've been a longtime R user, recently transitioning to Python, and I've been trying to carry over my knowledge of plotting with ggplot2, since it is so intuitive. Plotnine is supposedly the most ggplot2-like plotting library, and I've successfully recreated most graphs with it, except, critically, how to plot regular functions.
In R (with ggplot2), you can define an equation as a function, pass it to a stat_function() layer, and supply the graph's limits in place of the data argument to plot a parabola or the like. However, the syntax for setting the graph's limits must be different in Python (perhaps using numpy?), and equations seem to be defined using sympy, which is another divergence for me.
So how would I go about plotting functions with plotnine? Those two hurdles are the differences from ggplot2 that I think are causing me trouble, since plotnine has so few examples online.
P.S. This is an example of what I want to recreate in Python using plotnine:
library(ggplot2)
basic_plot <- function(x) x^2 + 2.5
graph <- ggplot(data.frame(x = c(-5, 5)), aes(x = x)) +
  stat_function(fun = basic_plot)
graph

You don't need numpy; it works just fine the "standard" way! :)
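from plotnine import ggplot, aes, stat_function
import pandas as pd

# The two x values only set the range over which stat_function evaluates fun;
# nothing is drawn for the data points themselves.
(ggplot(pd.DataFrame(data={"x": [-5, 5]}), aes(x="x"))
 + stat_function(fun=lambda x: x**2 + 2.5))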

One of the main differences that tripped me up is visible in the question's code. Specifically, aesthetics take bare column names in R but strings in plotnine:
in R
aes(x = x) or aes(x)
in plotnine
aes(x = 'x')

Related

Problems re-implementing the fipy mesh20x20 example from my own IDE

I am currently using fipy but am still relatively new to the nuances of the package. While I have been able to regenerate the desired heatmap for the mesh20x20 diffusion example from the examples folder using the command line, I have struggled to replicate it within the Spyder IDE. I am using Python 3.8. Generating it from the "examples" folder on the command line is simple enough; however, when I attempt to "re-program" it myself, I end up with iterations of a different result. I am hoping to regenerate the smooth color transition from the examples folder, as opposed to the discrete two-color image I have been limited to at present. I believe there is an issue with the viewer in some capacity, and related issues may have cropped up for others in the past, potentially relating to colorbar reformatting, though I have not yet managed to implement those workarounds effectively. Setting datamin and datamax in Viewer() did not work.
I would be greatly indebted for any assistance the community could provide.
from fipy.terms.transientTerm import TransientTerm
from fipy.terms.implicitDiffusionTerm import ImplicitDiffusionTerm
from fipy.terms.explicitDiffusionTerm import ExplicitDiffusionTerm
from fipy.meshes.nonUniformGrid2D import NonUniformGrid2D
from fipy.variables.cellVariable import CellVariable
from fipy.viewers.matplotlibViewer.matplotlib2DViewer import Matplotlib2DViewer
####
#Global Inputs
D=1
steps=10
#Dimensional Inputs
nx=20
dx=1
ny=20
dy=1
L=dx*nx
#Temporal Inputs
#nt=20
#dt=1
#cell variable initial values
value=0
#construct mesh from dimensional pts
mesh=NonUniformGrid2D(nx=nx, dx=dx, ny=ny, dy=dy)
#construct term variables phi with name, mesh design
phi=CellVariable(name="solutionvariable", mesh=mesh, value=0)
#construct boundary conditions
#Dirichlet conditions on the top-left and bottom-right faces; unconstrained faces (top right and bottom left) default to zero-flux (Neumann) conditions
valueTopLeft=0
valueBottomRight=1
#assign boundary conditions to a face or cell
X, Y=mesh.faceCenters
facesTopLeft=((mesh.facesLeft & (Y > L/2 )) | (mesh.facesTop &( X < L/2)))
facesBottomRight=((mesh.facesRight & (Y < L/2)) | (mesh.facesBottom & (X > L/2)))
#constrain variables
phi.constrain(valueTopLeft, facesTopLeft)
phi.constrain(valueBottomRight, facesBottomRight)
#equation construction
eq=TransientTerm()==ExplicitDiffusionTerm(coeff=D)
#equation solving and either viewing and/or extraction
timestepduration=0.9 *(dx**2)/(2*D)
for step in range(steps):
    eq.solve(var=phi, dt=timestepduration)
    print(phi[step])
    viewer=Matplotlib2DViewer(vars=phi, datamin=0, datamax=1)
    viewer.axes.set_title("Solutionvbl(Step %d)" % (step+1,))
Figured it out, I think. I was using ExplicitDiffusionTerm, and the example uses ImplicitDiffusionTerm. When I first tried the implicit term, all I got back was a blank monochromatic image (and zeros printed for phi[step] at the end). I am happy to report that once a "kickstart" value is provided for the CellVariable's value (I used 0.001), ImplicitDiffusionTerm is used, and timestepduration is increased from the explicit limit of 0.9*dx**2/(2*D) to the 9*dx**2/(2*D) used in the example documentation, the result more or less matches the image generated from the command line. Grateful to have this sorted. Hope this helps anyone else who runs into a similar problem.

Python: A histogram for selected columns split by a variable

I do most of my work in R and am trying to explore a bit more of Python. My fluency in the latter is pretty rubbish, so explaining anything super simple won't offend me :)
I am starting some exploratory analysis and want to show the distribution of each variable by what will become the target variable. The outcome I would like is a histogram for every column in the DF, with the data split by the target. In R this is super simple; in the example below, x, y and z are the columns and 'cut' is the target.
How could I reproduce this in Python?
# R
library(ggplot2)
library(tidyr)
shinyStuff <- gather(diamonds,KPI,numbers,x:z)
ggplot(data = shinyStuff)+geom_histogram(aes(x=numbers,color=cut),stat='count') + facet_wrap(~KPI)
I have tried looping over DF like this:
# Python
for num, col in enumerate(diamonds):
    print(num)
    plt.figure()
    axs[num].hist(diamonds[diamonds['cut']=='Fair'].iloc[:,num],alpha=0.6)
    axs[num].hist(diamonds[diamonds['cut']=='Good'].iloc[:,num],alpha=0.6)
This didn't work full stop.
I have tried splitting the DF and mapping
# Python
fig, ax = plt.subplots()
diamonds[diamonds['cut']=='Fair'].hist(figsize = (16,20),color='red',ax=ax,alpha=0.6)
diamonds[diamonds['cut']=='Good'].hist(figsize = (16,20),color='blue',ax=ax,alpha=0.6);
This just overwrites the first.
I tried a few more things which I won't post. They may well have been along the right lines, but I am not versed enough in Python to get them right, so I don't think a list of failed examples will help here.
I am using Python 3 and open to all solutions using any dependencies.
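A minimal sketch of one pandas/matplotlib approach, assuming a DataFrame shaped like diamonds (the data below is hypothetical random data, since diamonds ships with R's ggplot2 rather than pandas). DataFrame.melt plays the role of tidyr::gather, and each facet is its own Axes that receives one hist() call per target level. In the loop attempt, axs was never created and plt.figure() started a new figure each time; in the second attempt, pandas most likely cleared the passed ax's figure because DataFrame.hist needs one subplot per column, which would explain the overwrite.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for diamonds: three numeric columns plus the target "cut".
rng = np.random.default_rng(0)
n = 300
diamonds = pd.DataFrame({
    "x": rng.normal(5.7, 1.1, n),
    "y": rng.normal(5.7, 1.1, n),
    "z": rng.normal(3.5, 0.7, n),
    "cut": rng.choice(["Fair", "Good", "Ideal"], n),
})

# Long format, the pandas counterpart of gather(diamonds, KPI, numbers, x:z).
long_df = diamonds.melt(id_vars="cut", value_vars=["x", "y", "z"],
                        var_name="KPI", value_name="numbers")

kpis = long_df["KPI"].unique()
# One panel per KPI, similar to facet_wrap(~KPI).
fig, axs = plt.subplots(1, len(kpis), figsize=(12, 4))
for ax, kpi in zip(axs, kpis):
    panel = long_df[long_df["KPI"] == kpi]
    # One semi-transparent histogram per target level, overlaid on the same Axes.
    for cut, group in panel.groupby("cut"):
        ax.hist(group["numbers"], bins=20, alpha=0.6, label=cut)
    ax.set_title(kpi)
    ax.legend(title="cut")
fig.tight_layout()
plt.show()
```

If seaborn is an option, seaborn.displot(long_df, x="numbers", hue="cut", col="KPI") is closer to the one-line ggplot2 version, since it handles the faceting and the split by target itself.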

Pandas: where is autocorrelation_plot?

I'm trying to plot an autocorrelation_plot() of a time series using pandas.
According to this SO post, pandas.tools was removed in 0.24.0 and the autocorrelation_plot function can now be found in the pandas.plotting module. However, the API reference has no entry for this function.
I'm able to plot an autocorrelation by importing the function but where can I find the documentation?
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import autocorrelation_plot # works fine

# Noisy straight line as example data
slope = -1
offset = 250
noise_scale = 100
npts = 100
x = np.linspace(0, 100, npts)
y = slope*x + noise_scale*np.random.rand(npts) + offset
autocorrelation_plot(y)
plt.show()
Python: 3.7.2
Pandas: 0.24.1
I think this would probably be more appropriate as an issue on GitHub.
In any case, autocorrelation_plot and the similar plots (andrews_curves, radviz,...) are probably going to be moved out of pandas, into a separate package. So you can expect to have to call something like pandas_matplotlib.autocorrelation_plot() in the future (see #28177).
In the meantime, I'm adding it and some other missing functions to the documentation in #28179. When the pull request is merged, you'll be able to see the docs at https://dev.pandas.io. But there isn't much of interest in the entry for autocorrelation_plot.
Have a look at:
https://github.com/pandas-dev/pandas/blob/v0.24.1/pandas/plotting/_misc.py#L600
Looks like it was buried in the plotting._misc source code.
You can at least find a reference and a short doc here: https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#visualization-autocorrelation
Btw, you can search the docs for any keyword: https://pandas.pydata.org/pandas-docs/stable/search.html?q=autocorrelation_plot&check_keywords=yes&area=default#
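The documentation is also available offline: a minimal sketch using only the standard library's inspect.getdoc, which returns the function's docstring (with indentation cleaned up) for whichever pandas version is installed. This shows the docstring itself, not a rendered web page.

```python
import inspect

from pandas.plotting import autocorrelation_plot

# The docstring ships with the installed package, so it matches your pandas version.
doc = inspect.getdoc(autocorrelation_plot)
print(doc)
```

In an interactive session, help(autocorrelation_plot) (or autocorrelation_plot? in IPython/Jupyter) shows the same text.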

What simple filter could I use to de-noise my data?

I'm processing some experimental data in Python 3. The data (raw_data in my code) is pretty noisy.
One of my goals is to find the peaks, and for this I'd like to filter out the noise. Based on what I found in the documentation of SciPy's signal module, the theory of filtering seems to be really complicated, and unfortunately I have zero background. Of course I'll have to learn it sooner or later, and I intend to, but right now it isn't worth the time (and learning filter theory isn't the purpose of my work), so I shamefully copied the code in Lyken Syu's answer without understanding the background:
import numpy as np
from scipy import signal as sg
from matplotlib import pyplot as plt
# [...] code, resulting in this:
raw_data = [arr_of_xvalues, arr_of_yvalues] # xvalues are in decreasing order
# <magic beyond my understanding>
n = 20 # the larger n is, the smoother the curve will be
b = [1.0 / n] * n
a = 2
filt = sg.lfilter(b, a, raw_data)
filtered = sg.lfilter(b, a, filt)
# <\magic>
plt.plot(filtered[0], filtered[1], ".")
plt.show()
It kind of works:
What concerns me is the curve the filter adds from 0 to the beginning of my dataset. I guess it's a property of the IIR filter I used, but I don't know how to prevent it. Also, I haven't been able to get other filters to work so far. I need to use this code on other experimental results like this one, so I need a somewhat more general solution than, e.g., cutting out all y<10 points.
Is there a better (possibly simpler) way, or a choice of filter that is easy to implement without a serious theoretical background?
If possible, how could I prevent my filter from adding that curve to my data?
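Two things in the copied code likely produce that curve. With b = [1/n]*n and a scalar a, lfilter computes a moving average, and it starts from a zero initial state, so its first outputs climb up from 0 (a startup transient). Because raw_data stacks x and y, lfilter also filters along the last axis of both rows, so the x values get smoothed and ramped too. (With a = 2, lfilter additionally divides by a[0], halving the output on each pass.) A minimal sketch of a simpler alternative, swapping in a Savitzky-Golay filter (scipy.signal.savgol_filter) applied to y only, followed by scipy.signal.find_peaks. The data is synthetic, and the window_length, polyorder and prominence values are assumptions to tune for real measurements.

```python
import numpy as np
from scipy.signal import savgol_filter, find_peaks

# Hypothetical noisy data standing in for raw_data: one peak on a constant baseline.
rng = np.random.default_rng(1)
x = np.linspace(100, 0, 500)                    # decreasing, like the question's x values
clean = 20 + 50 * np.exp(-((x - 40) / 3) ** 2)  # baseline 20 plus a peak at x = 40
y = clean + rng.normal(0, 4, x.size)

# Smooth only y; x is left untouched. window_length must be odd: larger windows
# smooth more but also flatten narrow peaks. The default edge handling fits a
# polynomial near the ends, so there is no ramp from 0.
y_smooth = savgol_filter(y, window_length=31, polyorder=3)

# Locate peaks on the smoothed curve; the prominence threshold ignores small noise wiggles.
peaks, _ = find_peaks(y_smooth, prominence=10)
print(x[peaks])
```

If you prefer to keep a moving average, scipy.signal.filtfilt(b, a, y) runs the filter forwards and backwards and pads the edges, which also avoids the ramp and the shift that a single lfilter pass introduces.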

ggplot geom_histogram behaves differently between Python and R

I am trying to do some exploratory data analysis, and I have a data frame with an integer age column and a "category" column. Making a histogram of the age is easy enough. What I want to do is keep this age histogram but color the bars based on the categorical variable.
import numpy as np
import pandas as pd
# ageSeries is the integer age column, e.g. data["Age"]
ageSeries.hist(bins=np.arange(-0.5, 116.5, 1))
I was able to do what I wanted easily in one line with ggplot2 in R
ggplot(data, aes(x=Age, fill=Category)) + geom_histogram(binwidth = 1)
I wasn't able to find a good solution in Python, but then I realized there was a ggplot2 library for Python and installed it. I tried to do the same ggplot command...
ggplot(data, aes(x="Age", fill="Category")) + geom_histogram(binwidth = 1)
Looking at these results, we can see that the different categories are treated as different series and overlaid rather than stacked. I don't want to mess around with transparencies, and I still want to maintain the overall distribution of the population.
Is this something I can fix with a parameter in the ggplot call, or is there a straightforward way to do this in Python at all without doing a bunch of extra dataframe manipulations?
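A minimal sketch in plain matplotlib, assuming a DataFrame named data with the question's Age and Category columns (the values below are hypothetical random data). Passing one array per category to Axes.hist with stacked=True stacks the bars instead of overlaying them, so the bar tops still trace the overall age distribution. The bins reuse the question's np.arange(-0.5, 116.5, 1).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: integer ages and a categorical column.
rng = np.random.default_rng(2)
data = pd.DataFrame({
    "Age": rng.integers(0, 100, 500),
    "Category": rng.choice(["A", "B", "C"], 500),
})

bins = np.arange(-0.5, 116.5, 1)  # one bin per integer age, as in the question
labels = sorted(data["Category"].unique())
ages_per_category = [data.loc[data["Category"] == c, "Age"] for c in labels]

fig, ax = plt.subplots()
# stacked=True piles each category on top of the previous ones; the returned
# counts are cumulative, so the last array holds the overall histogram.
counts, _, _ = ax.hist(ages_per_category, bins=bins, stacked=True, label=labels)
ax.set_xlabel("Age")
ax.legend(title="Category")
plt.show()
```

For what it's worth, plotnine's geom_histogram stacks by default (position="stack"), as ggplot2 does, so the one-liner with aes(x="Age", fill="Category") should behave like the R version there.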
