Fixing axis spacing (ticks) in Bokeh scatter plots - python

I'm generating scatter plots with Bokeh with differing numbers of Y values for each X value. When Bokeh generates the plot, it automatically pads the x-axis spacing based on the number of values plotted. I would like all values on the x-axis to be spaced evenly, regardless of the number of individual data points. I've looked into manually setting the ticks, but it looks like I have to set the spacing myself using this approach (i.e., specify the exact positions). I would like it to set the spacing evenly and automatically, as it does when plotting single x,y value pairs. Can this be done?
Here is an example showing the behavior.
import pandas
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
days =['Mon','Mon','Mon', 'Tues', 'Tues', 'Weds','Weds','Weds','Weds']
vals = [1,3,5,2,3,6,3,2,4]
df = pandas.DataFrame({'Day': days, 'Values':vals})
source = ColumnDataSource(df)
p = figure(x_range=df['Day'].tolist())
p.circle(x='Day', y='Values', source=source)
show(p)

You are passing a list of strings as the range. This creates a categorical axis. However, the list of categories for the range is expected to be unique, with no duplicates. You are passing a list with duplicate values. This is actually invalid usage, and the result is undefined behavior. You should pass a unique list of categorical factors, in the order you want them to appear, for the range.
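A minimal sketch of that fix, reusing the data from the question. The helper name make_plot is hypothetical, and p.scatter is used in place of p.circle on the assumption that a reasonably recent Bokeh release is installed. The duplicates are removed with dict.fromkeys, which keeps the first occurrence of each day in its original order.

```python
days = ['Mon', 'Mon', 'Mon', 'Tues', 'Tues', 'Weds', 'Weds', 'Weds', 'Weds']
vals = [1, 3, 5, 2, 3, 6, 3, 2, 4]

# Unique categorical factors, in order of first appearance.
factors = list(dict.fromkeys(days))


def make_plot():
    """Build the scatter plot on a categorical x-axis (hypothetical helper)."""
    # Imported here so the de-duplication above runs without Bokeh installed.
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure

    source = ColumnDataSource(data={'Day': days, 'Values': vals})
    # x_range receives each category exactly once.
    p = figure(x_range=factors)
    p.scatter(x='Day', y='Values', source=source)
    return p
```

Calling show(make_plot()) after importing show from bokeh.plotting should then display the three days at equal spacing.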

Related

Plotly returning evenly spaced piecharts

I am trying to plot a pie chart using plotly, but it seems to always return evenly sized slices regardless of the values provided.
import plotly.express as px
df_africa['CompFreq'].value_counts().tolist() # check for the particular order the labels should be in
sizes = df_africa['CompFreq'].value_counts().tolist()
labels = ['Monthly', 'Yearly', 'Weekly']
# Plot
fig = px.pie(sizes, names=labels, color_discrete_sequence=px.colors.sequential.RdBu)
fig.show()
The sizes variable contains the list below
[923, 168, 40]
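For plotly.express, the first argument needs to be a DataFrame, dictionary, or array-like, as explained in the documentation. Because sizes is passed as that first argument and no values are given, each label is simply counted once, so the slices come out equal.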
However, for your use case, I think it would be simplest to just directly pass your list of sector sizes to the values parameter:
fig = px.pie(names=labels, values=sizes, color_discrete_sequence=px.colors.sequential.RdBu)
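One way to avoid typing the labels by hand is to take both the labels and the sizes from the same value_counts() result, so they cannot get out of step. This is only a sketch: the responses series below is made-up stand-in data for df_africa['CompFreq'], and make_pie is a hypothetical helper name.

```python
import pandas as pd

# Made-up stand-in for df_africa['CompFreq'].
responses = pd.Series(['Monthly'] * 5 + ['Yearly'] * 3 + ['Weekly'] * 1)

# value_counts() sorts by frequency; index and values stay paired.
counts = responses.value_counts()
labels = counts.index.tolist()
sizes = counts.tolist()


def make_pie():
    """Build the pie chart from the paired labels and sizes (hypothetical helper)."""
    # Imported here so the counting above runs without plotly installed.
    import plotly.express as px

    return px.pie(names=labels, values=sizes,
                  color_discrete_sequence=px.colors.sequential.RdBu)
```

make_pie().show() would then draw slices proportional to the counts.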

Using Seaborn Catplot scatterplot creates a numerically unordered y-axis

Using this dataset, I tried to make a categorical scatterplot with two mutation types (WES + RNA-Seq & WES) on the x-axis and a small set of numerically ordered, evenly scaled numbers on the y-axis. Although I was able to get the x-axis the way I intended, the y-axis instead shows every single value in the Mutation Count column as its own tick. In addition, the axis follows the descending order of the rows in the dataset, so it isn't numerically ordered either. How can I go about fixing these aspects of the graph?
The code and its output are shown below:
import seaborn as sns
g = sns.catplot(x="Sequencing Type", y="Mutation Count", hue="Sequencing Type", data=tets, height=16, aspect=0.8)

Heatmap or other two variable histogram option?

I have a dataframe with two columns, the first one can have an integer from 0-15, the other one can have an integer from 0-10.
The df has approximately 10,000 rows.
I want to plot some sort of grid, (15x10) that can visually represent how many instances of each combination I have throughout the dataframe, ideally displaying the actual number on every grid cell.
I have tried both Seaborn and Matplotlib.
In Seaborn I tried a jointplot which almost did it but I can't get it to show an actual 15x10 grid. I also tried a heatmap but it gave me an error (see below) and I wasn't able to find anything on it.
I also tried plotting some sort of 3D histogram.
Finally I tried pivoting the data but Pandas calculates the numbers as values instead of treating them as "buckets".
Not sure where to go from here.
*heatmap error: "ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''"
sns.heatmap(x='pressure_bucket', y='rate_bucket', data=df)
The closest to what I want is something like this, ideally with the actual numbers in each cell
https://imgur.com/a/d4qWIod
Thanks to all in advance!
We can use plt.imshow to display a heat map:
import matplotlib.pyplot as plt
import seaborn as sns
# count each (c1, c2) combination; the result has columns c1, c2 and value
counts = df.groupby(['c1'])['c2'].value_counts().rename('value').reset_index()
# pivot to c1 as index, c2 as columns
counts = counts.pivot(index='c1', columns='c2', values='value')
# after reading your question carefully, there's another step
# add rows for any c1 values (0-15) that never occur
counts = counts.reindex(range(16))
# add columns for any c2 values (0-10 inclusive) that never occur
counts = counts.reindex(range(11), axis=1)
# fill all missing values with 0
counts = counts.fillna(0)
# imshow
plt.figure(figsize=(15,10))
plt.imshow(counts, cmap='hot')
plt.grid(False)
plt.show()
# sns would give a color bar legend
plt.figure(figsize=(15,10))
sns.heatmap(counts, cmap='hot')
plt.show()
Output (random entries)
Output sns:
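To also print the actual number in every cell, as the question asks, a possible variation is to build the count grid with pd.crosstab and let seaborn annotate it. This is a sketch under assumptions: the tiny df below is made-up data, the column names c1 and c2 follow the answer above, c2 is taken to run from 0 to 10 inclusive, and plot_counts is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample data; c1 is in 0-15 and c2 is in 0-10.
df = pd.DataFrame({'c1': [0, 0, 3, 15, 15, 15],
                   'c2': [0, 0, 10, 2, 2, 2]})

# Count every (c1, c2) combination, then add the rows and columns that
# never occur so the grid always covers the full value ranges.
counts = (pd.crosstab(df['c1'], df['c2'])
          .reindex(index=range(16), columns=range(11), fill_value=0))


def plot_counts(counts):
    """Draw the grid with the count written in each cell (hypothetical helper)."""
    # Imported here so the counting above runs without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(15, 10))
    # annot=True writes the value in each cell; fmt='d' formats it as an integer.
    sns.heatmap(counts, annot=True, fmt='d', cmap='hot', ax=ax)
    return fig
```

Passing fill_value=0 to reindex keeps the counts as integers, which is what fmt='d' expects.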

How do I plot a bar graph from matplotlib/seaborn with an int list as value and a string list as x axis?

I have a list of ints arr=[1,2,3,4] and a list of strings list(df) (which returns a list of column headers of the dataframe). I want to plot a bar graph such that the x axis labels are taken from the list of strings and the value for them are taken from the list of ints.
For example, if list(df) returns [a,b,c,d], there would be a graph with a,b,c,d marked on the x axis and corresponding values of 1,2,3,4 respectively on the y axis.
I can't figure out a way to do that. Please help.
It doesn't seem like an intuitive thing to do. I followed the example here to create this code:
import matplotlib.pyplot as plt
vals=[1,2,3,4,5]
inds=range(len(vals))
labels=["A","B","C","D","E"]
fig,ax = plt.subplots()
rects = ax.bar(inds, vals)
ax.set_xticks([ind+0.5 for ind in inds])
ax.set_xticklabels(labels)
and get this output:
In the first half of the code I'm just setting up the variables.
In the second half I call plt.subplots() so I can get the axis (ax) handle to put the tickmarks in as well as the rects. Setting the tickmarks determines where the labels will be, so I shifted them by 0.5 to the right, otherwise they would be at the leftmost edge of each box.
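Note that the 0.5 shift depends on the bars being left-aligned. Assuming matplotlib 2.0 or later, where ax.bar centers each bar on its x position by default, the ticks can sit on the positions themselves. A sketch of that variant, with make_bar_chart as a hypothetical helper name:

```python
vals = [1, 2, 3, 4, 5]
labels = ["A", "B", "C", "D", "E"]

# One x position per bar: 0, 1, 2, ...
positions = list(range(len(vals)))


def make_bar_chart():
    """Draw the bars with one text label per bar (hypothetical helper)."""
    # Imported here so the setup above runs without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # With the default align='center', each bar is centered on its position.
    ax.bar(positions, vals)
    # Put a tick at each bar's center and label it.
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    return fig
```

Calling make_bar_chart() followed by plt.show() should place each label under the middle of its bar.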

Date labels intersecting

I'm using Matplotlib to plot data on Ubuntu 15.10. My y-axis has numeric values and my x-axis timestamps.
My problem is that the date labels overlap each other, which looks bad. How do I increase the distance between the x-axis ticks/labels while keeping them evenly spaced? Since the automatic tick selection was poor, I'm okay with manually setting the number of date ticks. Any other solution is appreciated, too.
Besides, I'm using the following DateFormatter:
formatter = DateFormatter('%m/%d/%y')
axis = plt.gca()
axis.xaxis.set_major_formatter(formatter)
You could add the following to your code:
plt.gcf().autofmt_xdate()
This automatically formats the x-axis for you (by default it rotates the labels by 30 degrees and right-aligns them).
You can also manually set the amount of x ticks that show on your x-axis to avoid it getting crowded, by using the following:
ax = plt.gca()
max_xticks = 10
xloc = plt.MaxNLocator(max_xticks)
ax.xaxis.set_major_locator(xloc)
I personally use both together as it makes the graph look much nicer when using dates.
You can simply set the locations you want to be labeled:
axis.set_xticks(x[[0, len(x) // 2, -1]])
where x is your NumPy array of timestamps (indexing with a list of positions like this does not work on a plain Python list).
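The same idea extends to any number of ticks. The sketch below picks evenly spaced positions from the timestamps and combines them with the DateFormatter and autofmt_xdate() mentioned above. Both function names are hypothetical, and dates is assumed to be a sorted list of datetime objects.

```python
def evenly_spaced_indices(n_points, n_ticks):
    """Return up to n_ticks indices spread evenly over range(n_points)."""
    if n_points <= 0 or n_ticks <= 0:
        return []
    if n_ticks == 1:
        return [0]
    if n_ticks >= n_points:
        return list(range(n_points))
    # Always include the first and last point; space the rest between them.
    step = (n_points - 1) / (n_ticks - 1)
    return [round(i * step) for i in range(n_ticks)]


def plot_with_sparse_date_ticks(dates, values, n_ticks=5):
    """Plot values against dates, labelling only n_ticks of the dates."""
    # Imported here so the index helper runs without matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.dates import DateFormatter

    fig, ax = plt.subplots()
    ax.plot(dates, values)
    # Place ticks only at the selected timestamps.
    ax.set_xticks([dates[i] for i in evenly_spaced_indices(len(dates), n_ticks)])
    ax.xaxis.set_major_formatter(DateFormatter('%m/%d/%y'))
    # Rotate and right-align the labels so they do not collide.
    fig.autofmt_xdate()
    return fig
```

With n_ticks=3 this picks the first, middle and last timestamps, like the one-liner above.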
