How to plot a correlation chart in Python? - python

I have a dataframe like this:
I want to plot a correlation chart between variables "BTC" and "B3" similar to this one here:
https://charts.coinmetrics.io/correlations/
Can anyone point me to some material where I can study how to do it?

NumPy and Matplotlib are libraries that address your needs; see their docs related to your use case:
https://numpy.org/doc/1.20/reference/generated/numpy.correlate.html#numpy.correlate
https://matplotlib.org/stable/tutorials/index.html
These are well known and widely used, but they are not part of the Python standard library.
Some would also use pandas, but IMHO it is not needed here.
You will also need to load your data into your app. Use something like csv from the standard library, or requests and json if you can call an API to get your data.
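A minimal sketch of the kind of rolling correlation shown on the linked chart, using only NumPy and Matplotlib. Note that numpy.correlate computes the cross-correlation of two sequences, not a correlation coefficient; numpy.corrcoef returns the Pearson coefficient, so it is used here. The series btc and b3 are synthetic stand-ins for the question's columns, and the helper rolling_corr and the 30-point window are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def rolling_corr(x, y, window):
    """Pearson correlation of x and y over a trailing window (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.full(x.shape, np.nan)  # NaN until a full window is available
    for end in range(window, len(x) + 1):
        xs = x[end - window:end]
        ys = y[end - window:end]
        out[end - 1] = np.corrcoef(xs, ys)[0, 1]
    return out


# Synthetic data standing in for the "BTC" and "B3" columns (assumption)
rng = np.random.default_rng(0)
btc = np.cumsum(rng.normal(size=200))
b3 = 0.5 * btc + np.cumsum(rng.normal(size=200))

corr = rolling_corr(btc, b3, window=30)

fig, ax = plt.subplots()
ax.plot(corr)
ax.axhline(0, color="grey", linewidth=0.8)
ax.set_ylim(-1, 1)
ax.set_xlabel("observation")
ax.set_ylabel("30-point rolling correlation")
ax.set_title("BTC vs B3 (synthetic data)")
```

Call plt.show() afterwards to display the chart. With real data, replace btc and b3 with the two columns loaded from your file or API.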

To plot a correlation chart in Python:
import matplotlib.pyplot as plt
l = ['id1','id2','id3','id4']
y1 = [3,5,6,2]
y2 = [4,2,5,1]
fig,ax = plt.subplots()
ax.scatter(y1, y2)
for i, txt in enumerate(l):
    ax.annotate(txt, (y1[i], y2[i]))
plt.show()

Related

2-dimensional PCoA plot with skbio

I have a Jensen-Shannon distance (JSD) matrix and I would like to visualise it with Principal Coordinate Analysis (PCoA). I obtain the JSD with Scipy, and make the PCoA with Skbio. I can successfully obtain a 3D PCoA plot. Below, is my output and command.
import matplotlib.pyplot as plt
from skbio import DistanceMatrix
from skbio.stats.ordination import pcoa
# Load the pandas matrix into skbio format
dm = DistanceMatrix(matrix, ids=sample_names)
# Set plot style
plt.style.use('ggplot')
pcoa_results = pcoa(dm)
fig = pcoa_results.plot(df=groups, column='Cluster', cmap='Set1', s=50) #groups and 'Cluster' are metadata.
I note that, while DistanceMatrix() and pcoa() return skbio object instances, pcoa_results.plot() returns a matplotlib fig.
However, I would like a two-dimensional plot, with only PCo1 and PCo2, for example like the graph below, taken from Costea et al. 2018.
Costea et al. used R, but I would like to use Python. Is it possible to get a 2D plot with Skbio? If not, which other tool would you suggest?
Thanks in advance!
I found a solution for my question.
I don't think skbio.stats.ordination.OrdinationResults.plot offers a 2D option at all, but perhaps I am wrong.
Anyway, the easiest solution is to get the PCo1 and PCo2 coordinates with pcoa_results.samples[['PC1', 'PC2']] (where pcoa_results is the OrdinationResults instance returned by pcoa()). Then you can plot them with Matplotlib or Seaborn, whichever you prefer.
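A rough sketch of that last step with Matplotlib. To keep it runnable without skbio, the DataFrame samples below is a hand-made stand-in for pcoa_results.samples; its values and sample names are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for pcoa_results.samples (assumption: one row per sample,
# with columns named 'PC1', 'PC2', ... as in the answer above)
samples = pd.DataFrame(
    {
        "PC1": [0.21, -0.15, 0.05, -0.11],
        "PC2": [0.02, 0.18, -0.20, 0.00],
        "PC3": [0.01, -0.03, 0.04, -0.02],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Keep only the first two principal coordinates
coords = samples[["PC1", "PC2"]]

fig, ax = plt.subplots()
ax.scatter(coords["PC1"], coords["PC2"], s=50)

# Label each point with its sample name
for row in coords.itertuples():
    ax.annotate(row.Index, (row.PC1, row.PC2))

ax.set_xlabel("PCo1")
ax.set_ylabel("PCo2")
```

With real results, replace the stand-in with samples = pcoa_results.samples and call plt.show() to display the figure.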

How can I loop through a list of elements and create time series plots in Python

Here is a sample of the data I'm working with: WellAnalyticalData. I'd like to loop through each well name and create a time series chart for each parameter, with sample date on the x-axis and the value on the y-axis. I don't think I want subplots; I'm just looking for individual plots of each analyte for each well. I've used pandas to try grouping by well name and then attempting to plot, but that doesn't seem to be the way to go. I'm fairly new to Python and I think I'm also having trouble figuring out how to construct the loop statement. I'm running Python 3.x and am using the matplotlib library to generate the plots.
So, if I understand your question correctly, you want one plot for each combination of Well and Parameter. No subplots, just a new plot for each combination. Each plot should have SampleDate on the x-axis and Value on the y-axis. I've written a loop here that does just that, although you'll see that, since your data has just one date per well per parameter, the plots are just a single dot.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline  # Jupyter/IPython magic; remove this line in a plain Python script
df = pd.DataFrame({'WellName':['A','A','A','A','B','B','C','C','C'],
'SampleDate':['2018-02-15','2018-03-31','2018-06-07','2018-11-14','2018-02-15','2018-11-14','2018-02-15','2018-03-31','2018-11-14'],
'Parameter':['Arsenic','Lead','Iron','Magnesium','Arsenic','Iron','Arsenic','Lead','Magnesium'],
'Value':[0.2,1.6,0.05,3,0.3,0.79,0.3,2.7,2.8]
})
for well in df.WellName.unique():
    temp1 = df[df.WellName==well]
    for param in temp1.Parameter.unique():
        fig = plt.figure()
        temp2 = temp1[temp1.Parameter==param]
        plt.scatter(temp2.SampleDate,temp2.Value)
        plt.title('Well {} and Parameter {}'.format(well,param))
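The same loop can also be sketched with a single pandas groupby over both columns, which avoids the nested filtering. The small DataFrame below is invented for illustration (it has more than one date per combination so that lines are visible), and the figures dict is just a convenient way to keep a handle on each plot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample data with the same column names as in the question
df = pd.DataFrame({
    "WellName": ["A", "A", "A", "B", "B"],
    "SampleDate": ["2018-02-15", "2018-03-31", "2018-02-15",
                   "2018-02-15", "2018-11-14"],
    "Parameter": ["Arsenic", "Arsenic", "Lead", "Arsenic", "Arsenic"],
    "Value": [0.2, 0.25, 1.6, 0.3, 0.35],
})

# Parse the dates so the x-axis is a real time axis, not strings
df["SampleDate"] = pd.to_datetime(df["SampleDate"])

figures = {}
# One group (and one figure) per (well, parameter) combination
for (well, param), group in df.groupby(["WellName", "Parameter"]):
    group = group.sort_values("SampleDate")
    fig, ax = plt.subplots()
    ax.plot(group["SampleDate"], group["Value"], marker="o")
    ax.set_xlabel("SampleDate")
    ax.set_ylabel("Value")
    ax.set_title("Well {} and Parameter {}".format(well, param))
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    figures[(well, param)] = fig
```

Each figure can then be shown with plt.show() or written to disk with fig.savefig(...).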

Access data in matplotlib histogram axes

Where is the data plotted stored in a matplotlib ax object drawing a histogram?
My scenario:
I've written a function which draws a custom histogram using matplotlib. I am writing a unit test and would like to test whether the plotted data
Ideal behaviour:
import matplotlib.pyplot as plt
f, ax = plt.subplots()
ax.hist(some_data)
data_i_want = ax.plotted_data
I'm not sure what exactly you want to achieve, but the plt.hist(...) function returns the data for the histogram:
histinfo = plt.hist(data)
histinfo[0] # the counts: number of values falling in each bin
histinfo[1] # the bin edges (one more entry than there are bins)
If you want to get the information from the plot itself at all cost (assuming you have a barplot):
container = ax.containers[0] #https://matplotlib.org/3.1.1/api/container_api.html
for rect in container: #https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.patches.Rectangle.html#matplotlib.patches.Rectangle
    print(rect.xy)
You can get the containers, which hold the plotted bars (rectangles); the URLs in the comments document everything available on them.
P.S.: You will probably have to adapt the code to your specific case, but this is one way to get information from the plot. (There may be a better way; this is the best I know.)
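For the unit-test scenario in the question, both approaches can be combined in a short sketch: keep the values returned by ax.hist and compare them with what is actually drawn on the axes. The sample data and the choice of 5 bins are arbitrary.

```python
import matplotlib.pyplot as plt

some_data = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # arbitrary sample data

fig, ax = plt.subplots()

# ax.hist returns the counts, the bin edges and the drawn bar patches
counts, bin_edges, patches = ax.hist(some_data, bins=5)

# The same information can be read back from the axes afterwards:
# each bar is a Rectangle whose height is the count for that bin
heights = [rect.get_height() for rect in ax.patches]
lefts = [rect.get_x() for rect in ax.patches]
```

Here heights should match counts, and lefts should match all bin edges except the last one, which is what a test on the plotted data can assert.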

Seaborn bar plot - different y axis values?

I am very new to coding and just really stuck with a graph I am trying to produce for a Uni assignment
This is what it looks like
I am pretty happy with the styling; my concern is with the y-axis. I understand that because I have one value much higher than the rest, it is difficult to see the true values further down the scale.
Is there any way to change this?
Or can anyone recommend a different graph type that may show this data more clearly?
Thanks!
You can try using a combination of ScalarFormatter on the y-axis and MultipleLocator to set the tick frequency of the y-axis values. You can read more about customising data visualisations in "Customising tricks for visualising data in Python".
import numpy as np
import seaborn as sns # seaborn.apionly was removed in seaborn 0.9
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
ax_data = sns.barplot(x= PoliceForce, y = TotalNRMReferrals) # change as per how you are plotting, just for an example
ax_data.yaxis.set_major_locator(ticker.MultipleLocator(40)) # it would have a tick frequency of 40, change 40 to the tick-frequency you want.
ax_data.yaxis.set_major_formatter(ticker.ScalarFormatter())
plt.show()
Based on your current graph, I would suggest lowering the tick-frequency (try with values lower than 100, say 50 for instance). This would present the graph in a more readable fashion. I hope this helps answer your question.
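A self-contained sketch of the same idea using plain Matplotlib bars, so it runs without the original data. The force names and referral counts are made up, with one value much larger than the rest to mimic the situation in the question.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Made-up data: one bar dwarfs the others
forces = ["Force A", "Force B", "Force C", "Force D"]
referrals = [12, 30, 45, 400]

fig, ax = plt.subplots()
ax.bar(forces, referrals)

# Place a major tick every 50 units on the y-axis
ax.yaxis.set_major_locator(ticker.MultipleLocator(50))
# Plain numeric tick labels
ax.yaxis.set_major_formatter(ticker.ScalarFormatter())
```

Changing the 50 passed to MultipleLocator changes the tick spacing; call plt.show() to display the result.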

How to extract data from matplotlib plot

I have a wxPython program which reads from different datasets, performs various types of simple on-the-fly analysis on the data and plots various combinations of the datasets to matplotlib canvas. I would like to have the opportunity to dump currently plotted data to file for more sophisticated analysis later on.
The question is: are there any methods in matplotlib that allow access to the data currently plotted in matplotlib.Figure?
Jakub is right about modifying the Python script to write out the data directly from the source from which it was sent into the plot; that's the way I'd prefer to do this. But for reference, if you do need to get data out of a plot, I think this should do it:
gca().get_lines()[n].get_xydata()
Alternatively you can get the x and y data sets separately:
line = gca().get_lines()[n]
xd = line.get_xdata()
yd = line.get_ydata()
matplotlib.pyplot.gca can be used to extract data from matplotlib plots. Here is a simple example:
import matplotlib.pyplot as plt
plt.plot([1,2,3],[4,5,6])
ax = plt.gca()
line = ax.lines[0]
line.get_xydata()
On running this, you will see 2 outputs - the plot and the data:
array([[1., 4.],
[2., 5.],
[3., 6.]])
You can also get the x data and y data separately.
On running line.get_xdata(), you will get:
array([1, 2, 3])
And on running line.get_ydata(), you will get:
array([4, 5, 6])
Note: gca stands for "get current axes".
To sum up, for future reference:
If plotting with plt.plot() or plt.stem() or plt.step() you can get a list of Line2D objects with:
ax = plt.gca() # to get the axes
ax.get_lines()
For plt.pie(), plt.bar() or plt.barh() you can get a list of wedge or rectangle objects with:
ax = plt.gca() # to get the axes
ax.patches # an attribute, not a method, so no parentheses
Then, depending on the situation, you can get the data by running get_xdata() and get_ydata() (see Line2D for more info), or, for example, get_height() for a bar plot (see Rectangle for more info).
In general, for all basic plotting functions, you can find what you are looking for by running ax.get_children(), which returns a list of the child Artists (the base class that includes all of the figure's elements).
It's Python, so you can modify the source script directly so the data is dumped before it is plotted.
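A short sketch of reading data back from both kinds of artists; the plotted values are arbitrary.

```python
import matplotlib.pyplot as plt

# Line plot: data lives on Line2D objects
fig1, ax1 = plt.subplots()
ax1.plot([1, 2, 3], [4, 5, 6])
line = ax1.get_lines()[0]
x_values = list(line.get_xdata())
y_values = list(line.get_ydata())

# Bar plot: data lives on Rectangle patches
fig2, ax2 = plt.subplots()
ax2.bar(["a", "b", "c"], [3, 5, 2])
bar_heights = [rect.get_height() for rect in ax2.patches]
```

For the bar plot, the category labels are stored as tick labels rather than on the rectangles, so only the heights come from ax2.patches.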
I know this is an old question, but I feel there is a solution better than the ones offered here so I decided to write this answer.
You can use unittest.mock.patch to temporarily replace the matplotlib.axes.Axes.plot function:
from unittest.mock import patch
def save_data(self, *args, **kwargs):
    # save the data that was passed into the plot function
    print(args)
with patch('matplotlib.axes.Axes.plot', new=save_data):
    # some code that will eventually plot data
    a_function_that_plots()
Once you exit the with block, Axes.plot will resume normal behavior.
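A runnable version of this idea, as a sketch: the list captured and the body of a_function_that_plots are invented here to stand in for real plotting code, and the replacement collects the arguments instead of printing them.

```python
from unittest.mock import patch

import matplotlib.axes
import matplotlib.pyplot as plt

captured = []  # hypothetical store for whatever gets passed to Axes.plot


def save_data(self, *args, **kwargs):
    # Record the positional arguments instead of drawing anything
    captured.append(args)
    return []  # Axes.plot normally returns a list of Line2D objects


def a_function_that_plots():
    # Stand-in for code that eventually calls Axes.plot
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [4, 5, 6])
    plt.close(fig)


original_plot = matplotlib.axes.Axes.plot

with patch("matplotlib.axes.Axes.plot", new=save_data):
    a_function_that_plots()

# Outside the with block the original method is back in place
restored = matplotlib.axes.Axes.plot is original_plot
```

After this runs, captured holds one tuple with the x and y lists that were passed to plot. Note that this only intercepts Axes.plot; other plotting methods such as bar or scatter would need their own patches.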
