2-dimensional PCoA plot with skbio - python

I have a Jensen-Shannon distance (JSD) matrix and I would like to visualise it with Principal Coordinate Analysis (PCoA). I compute the JSD with SciPy and run the PCoA with skbio. I can successfully obtain a 3D PCoA plot. Below are my command and output.
import matplotlib.pyplot as plt
from skbio import DistanceMatrix
from skbio.stats.ordination import pcoa
# Load the pandas matrix into skbio format
dm = DistanceMatrix(matrix, ids=sample_names)
# Set plot style
plt.style.use('ggplot')
pcoa_results = pcoa(dm)
fig = pcoa_results.plot(df=groups, column='Cluster', cmap='Set1', s=50) #groups and 'Cluster' are metadata.
Note that, while DistanceMatrix() and pcoa() return skbio object instances, pcoa_results.plot() returns a matplotlib figure.
However, I would like a two-dimensional plot with only PCo1 and PCo2, for example like the graph below, taken from Costea et al. 2018.
Costea et al. used R, but I would like to use Python. Is it possible to get a 2D plot with skbio? If not, which other tool would you suggest?
Thanks in advance!

I found a solution for my question.
I don't think skbio.stats.ordination.OrdinationResults.plot offers a 2D option at all, but perhaps I am wrong.
Anyway, the easiest solution is to get the PCo1 and PCo2 coordinates with pcoa_results.samples[['PC1', 'PC2']], where pcoa_results is the OrdinationResults instance returned by pcoa(). Then you can plot them with Matplotlib or Seaborn, whichever you prefer.
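The plotting step could look roughly like the sketch below. It is not skbio's own plotting routine: plot_pcoa_2d is a hypothetical helper, and the coordinates and cluster labels are made-up stand-ins for pcoa_results.samples['PC1'], pcoa_results.samples['PC2'] and the 'Cluster' metadata column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a plot window
import matplotlib.pyplot as plt
import numpy as np


def plot_pcoa_2d(pc1, pc2, clusters, ax=None):
    """Scatter PCo1 against PCo2, one colour per cluster (hypothetical helper)."""
    pc1, pc2, clusters = np.asarray(pc1), np.asarray(pc2), np.asarray(clusters)
    if ax is None:
        _, ax = plt.subplots()
    # One scatter call per cluster so each gets its own colour and legend entry
    for label in np.unique(clusters):
        mask = clusters == label
        ax.scatter(pc1[mask], pc2[mask], s=50, label=str(label))
    ax.set_xlabel("PCo1")
    ax.set_ylabel("PCo2")
    ax.legend(title="Cluster")
    return ax


# Made-up values standing in for pcoa_results.samples[['PC1', 'PC2']]
pc1 = [0.31, 0.28, -0.22, -0.30, 0.02]
pc2 = [0.10, -0.05, 0.18, -0.12, 0.25]
clusters = ["A", "A", "B", "B", "C"]

ax = plot_pcoa_2d(pc1, pc2, clusters)
```

With a real result you would pass pcoa_results.samples['PC1'], pcoa_results.samples['PC2'] and groups['Cluster'] instead, assuming the rows of groups are in the same order as the samples.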

Related

Default Colormap of Seaborn When Making a 2D Histogram?

In Python, let's say I define two numpy arrays that I want to display in a 2D histogram:
import seaborn
import numpy as np
import matplotlib.pyplot as plt
num_samples = int(8e+4)
u0 = np.random.rand(num_samples)
E_incid = 90*np.random.rand(num_samples)+10
Now, with seaborn, I get this output:
However, since I need to switch to matplotlib, I want the same color scheme in plt.hist2d(). I looped over all available colormaps (cf. this link) to see which cmap resembles the one from seaborn, but I did not find a match. The closest match I found (with cmap='PuBu'):
Clearly, these two plots do not look similar; how can I force matplotlib to use the same colormap as seaborn? Thanks!

How to plot a correlation chart in Python?

I have a dataframe like this:
I want to plot a correlation chart between variables "BTC" and "B3" similar to this one here:
https://charts.coinmetrics.io/correlations/
Can anyone point me to some material where I can study how to do it?
Numpy and matplotlib are libraries that address your needs, see their docs related to your use case:
https://numpy.org/doc/1.20/reference/generated/numpy.correlate.html#numpy.correlate
https://matplotlib.org/stable/tutorials/index.html
These are well known and widely used, but they are not part of the Python standard library.
Some would also use pandas, but IMHO it is not needed here at all.
You will also need to load your data into your app. Use something like the csv module in the standard library, or use requests and json if you can call an API to get your data.
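As a rough sketch with only numpy and matplotlib, assuming the chart you are after is a rolling correlation over time: the example below uses numpy.corrcoef (the Pearson coefficient) rather than numpy.correlate, because corrcoef returns a value between -1 and 1. The rolling_corr helper, the 30-point window and both series are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a plot window
import matplotlib.pyplot as plt
import numpy as np


def rolling_corr(a, b, window):
    """Pearson correlation of a and b over each trailing window (hypothetical helper)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    out = np.full(a.shape, np.nan)  # the first window-1 entries stay NaN
    for end in range(window, len(a) + 1):
        out[end - 1] = np.corrcoef(a[end - window:end], b[end - window:end])[0, 1]
    return out


# Made-up series standing in for the "BTC" and "B3" columns
rng = np.random.default_rng(0)
btc = np.cumsum(rng.normal(size=200))
b3 = 0.5 * btc + np.cumsum(rng.normal(size=200))

corr = rolling_corr(btc, b3, window=30)

fig, ax = plt.subplots()
ax.plot(corr)
ax.axhline(0, color="grey", linewidth=0.8)
ax.set_ylim(-1, 1)
ax.set_ylabel("30-point rolling correlation")
```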
To plot a correlation chart in Python:
import matplotlib.pyplot as plt
l = ['id1','id2','id3','id4']
y1 = [3,5,6,2]
y2 = [4,2,5,1]
fig,ax = plt.subplots()
ax.scatter(y1, y2)
for i, txt in enumerate(l):
    ax.annotate(txt, (y1[i], y2[i]))
plt.show()

Access data in matplotlib histogram axes

Where is the data plotted stored in a matplotlib ax object drawing a histogram?
My scenario:
I've written a function which draws a custom histogram using matplotlib. I am writing a unit test and would like to test whether the plotted data
Ideal behaviour:
import matplotlib.pyplot as plt
f, ax = plt.subplots()
ax.hist(some_data)
data_i_want = ax.plotted_data
I'm not sure what exactly you want to achieve, but the plt.hist(...) function returns the data for the histogram:
histinfo = plt.hist(data)
histinfo[0] #This is the information about the # of instances
histinfo[1] #This is the information about the position of the bins
If you want to get the information from the plot itself at all cost (assuming you have a barplot):
# https://matplotlib.org/3.1.1/api/container_api.html
container = ax.containers[0]
# https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.patches.Rectangle.html#matplotlib.patches.Rectangle
for rect in container:
    print(rect.xy)
The axes keep a list of containers, and each container holds the plotted bars (rectangles). The URLs in the comments document everything you can read from them.
P.S.: You will probably have to adapt the code to your specific case, but this is one way to get information back out of the plot. (There may be a better way to do it; this is the best I know.)
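For a unit test, the two approaches can be checked against each other. The sketch below uses made-up data and compares the values returned by ax.hist with the heights and left edges of the rectangles stored on the axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for tests
import matplotlib.pyplot as plt
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])  # made-up data

fig, ax = plt.subplots()
counts, bin_edges, bars = ax.hist(data, bins=5)

# Recover the same numbers from the axes object alone
container = ax.containers[0]
heights = np.array([rect.get_height() for rect in container])
lefts = np.array([rect.get_x() for rect in container])
```

Here heights should match counts and lefts should match bin_edges[:-1], so a test can assert on either source.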

Seaborn bar plot - different y axis values?

I am very new to coding and really stuck with a graph I am trying to produce for a Uni assignment.
This is what it looks like
I am pretty happy with the styling; my concern is with the y axis. I understand that because one value is much higher than the rest, it is difficult to see the true values further down the scale.
Is there any way to change this?
Or can anyone recommend a different graph type that may show this data more clearly?
Thanks!
You can try using a combination of ScalarFormatter on the y-axis and MultipleLocator to specify the tick frequency of the y-axis values. You can read more about this in "Customising tricks for visualising data in Python".
import numpy as np
import seaborn as sns  # seaborn.apionly has been removed from current seaborn releases
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
ax_data = sns.barplot(x= PoliceForce, y = TotalNRMReferrals) # change as per how you are plotting, just for an example
ax_data.yaxis.set_major_locator(ticker.MultipleLocator(40)) # it would have a tick frequency of 40, change 40 to the tick-frequency you want.
ax_data.yaxis.set_major_formatter(ticker.ScalarFormatter())
plt.show()
Based on your current graph, I would suggest lowering the tick-frequency (try with values lower than 100, say 50 for instance). This would present the graph in a more readable fashion. I hope this helps answer your question.
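Another option, offered as a suggestion rather than a fix for the code above, is a logarithmic y axis, which keeps small bars visible next to one very large bar. The force names and counts below are made up; the same set_yscale call should also work on the Axes that sns.barplot returns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a plot window
import matplotlib.pyplot as plt

# Made-up values: one force dwarfs the others
forces = ["Force A", "Force B", "Force C", "Force D"]
referrals = [2400, 85, 40, 12]

fig, ax = plt.subplots()
ax.bar(forces, referrals)
ax.set_yscale("log")  # equal distances on the axis now mean equal ratios
ax.set_ylabel("Total NRM referrals (log scale)")
```

The trade-off is that bar heights are no longer proportional to the values, so it helps to say "log scale" in the axis label.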

Python matplotlib. Trying to plot binary signal, getting interpolation

Hey, probably a simple question, but I can't find the answer to this. I am monitoring a series of bits with a timestamp. I can plot their state just fine, but in the plots there is a skewed line between 0->1 and 1->0, where there should just be a straight vertical line at the time they switch. How do I avoid this skewed line? It makes it look like I have values in between when I don't.
EDIT: As pointed out below, using step instead of plot solves the problem. Thanks:)
You can use the plt.step function instead of plot:
import numpy as np
import matplotlib.pyplot as plt
def heaviside(x):
    return .5*(np.sign(x)+1.)
x = np.linspace(0,100,10)
y = heaviside(np.random.random(10)-.5)
plt.step(x,y)
plt.ylim(-1.5,1.5)
plt.show()
You can use a stemplot:
plt.stem(x, y)
or a step plot
plt.step(x, y)
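For timestamped bits it may also matter where the jump is drawn. A minimal sketch with made-up timestamps, assuming each sample holds its value until the next timestamp, which is what where="post" draws (the default is where="pre"):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a plot window
import matplotlib.pyplot as plt

timestamps = [0.0, 1.5, 2.0, 4.2, 5.0]  # made-up sample times
bits = [0, 1, 1, 0, 1]

fig, ax = plt.subplots()
# where="post": the value at timestamps[i] is held until timestamps[i + 1]
(line,) = ax.step(timestamps, bits, where="post")
ax.set_ylim(-0.5, 1.5)
```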
