How does this transparent extension come with a plot in lineplot? - python

The plot in documentation looks like this :
with code
sns.lineplot(x="timepoint", y="signal",
hue="region", style="event",
data=fmri)
and mine comes out like this, for the code:
sns.lineplot(
# data=fmri,
x=df["_C_UP"]["s"][:10],
y=df["_C_UP"]["px"][:10]
# hue="event"
);
How do I get the same effect for those lines (the transparent band of color around them)?
Here is what my data looks like:
#Energy s py pz px dxy dyz dz2 dxz dx2 tot
50 -17.98094 0.72320 0.31781 0.00000 0.31882 0.0 0.0 0.0 0.0 0.0 1.35982
51 -17.87394 0.29726 0.14415 0.00000 0.14491 0.0 0.0 0.0 0.0 0.0 0.58632
52 -17.76794 0.63694 0.02456 0.00000 0.02484 0.0 0.0 0.0 0.0 0.0 0.68634
53 -17.66194 1.78595 0.06032 0.00001 0.06139 0.0 0.0 0.0 0.0 0.0 1.90766
54 -17.55494 1.97809 0.09038 0.00001 0.09192 0.0 0.0 0.0 0.0 0.0 2.16040

In the fmri datasets, there are actually multiple observations for each time point and subgroup, for example, at timepoint == 14 :
fmri[fmri['timepoint']==14]
subject timepoint event region signal
1 s5 14 stim parietal -0.080883
57 s13 14 stim parietal -0.033713
58 s12 14 stim parietal -0.068297
59 s11 14 stim parietal -0.114469
60 s10 14 stim parietal -0.052288
61 s9 14 stim parietal -0.130267
So the line you see is actually the mean of all these observations (stratified by group), and the ribbon is the 95% confidence interval of this mean. You can turn the ribbon off with ci=None (in seaborn 0.12 and later, the equivalent is errorbar=None):
sns.lineplot(x="timepoint", y="signal",
hue="region", style="event",
data=fmri,ci=None)
So to get the exact plot, you need to have multiple observations or replicates. If you don't, and your intention is to just connect the points, you cannot get a confidence interval.
If you want to look at a trend line, one thing you can try is a polynomial smooth. And it makes sense to plot the data as points too.
Using an example from the same fmri dataset:
df = fmri[(fmri['subject']=="s5") & (fmri['event']== "stim") & (fmri['region'] == "frontal")]
sns.regplot(data=df,x = "timepoint",y = "signal",order=3)
Or use a loess smooth, which is more complicated (see this post about what is drawn):
import matplotlib.pyplot as plt
from skmisc.loess import loess
lfit = loess(df['timepoint'],df['signal'])
lfit.fit()
pred = lfit.predict(df['timepoint'], stderror=True)
conf = pred.confidence()
fig, ax = plt.subplots()
sns.scatterplot(data=df,x = "timepoint",y = "signal",ax=ax)
sns.lineplot(x = df["timepoint"],y = pred.values,ax=ax,color="#A2D2FF")
ax.fill_between(df['timepoint'],conf.lower, conf.upper, alpha=0.1,color="#A2D2FF")

It depends on the data. The plot from the seaborn documentation that you show is based on a dataset where for every x value there are several y values (repeated measurements). The lines in the plot then indicate the means of those y values, and the shaded regions indicate the associated 95% confidence intervals.
In your data, there is only one y value for each x value, so there is no way to calculate a confidence interval.
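To make the aggregation concrete, here is a minimal sketch of what the line and the ribbon represent. The data is synthetic and the names obs and stats are made up for illustration. It uses a normal-approximation interval (mean ± 1.96 standard errors), whereas seaborn bootstraps by default, so the band will be close to but not identical to what sns.lineplot draws.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Synthetic data (hypothetical): 8 replicate measurements at each of 10 x values
rng = np.random.default_rng(0)
x = np.repeat(np.arange(10), 8)
y = np.sin(x / 2) + rng.normal(scale=0.3, size=x.size)
obs = pd.DataFrame({"x": x, "y": y})

# Aggregate the replicates: mean and standard error of the mean per x value
stats = obs.groupby("x")["y"].agg(["mean", "sem"])

# Normal-approximation 95% interval (an assumption; seaborn bootstraps instead)
stats["lower"] = stats["mean"] - 1.96 * stats["sem"]
stats["upper"] = stats["mean"] + 1.96 * stats["sem"]

# The line is the mean, the translucent band is the interval
fig, ax = plt.subplots()
ax.plot(stats.index, stats["mean"])
ax.fill_between(stats.index, stats["lower"], stats["upper"], alpha=0.2)
```

With a single observation per x value the standard error is undefined, which is why no band can be drawn for data like that in the question.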

Related

Adding labels information in k-core decomposition

I would need to visualize labels in a network where I extract kcore information.
The dataset is
Source Target Edge_Weight Label_Source Label_Target
0 A F 29.1 0.0 0.0
1 A G 46.9 0.0 1.0
2 A B 24.4 0.0 1.0
3 C F 43.4 0.0 0.0
4 C N 23.3 0.0 1.0
5 D S 18.0 1.0 0.0
6 D G 67.6 1.0 0.0
7 D B 37.2 1.0 1.0
8 D E 46.9 1.0 2.0
For extracting kcore information I used the code
G = nx.from_pandas_edgelist(df, 'Source', 'Target')
kcore=nx.k_core(G)
plt.subplot(122)
nx.draw(kcore)
plt.show()
Do you know how I can add the label information?
My expected value would be a graph which has colors based on their labels (it does not matter which color to assign to distinct labels values. The values are 0, 1, 2).
Many thanks
A way to do what you want is to create a colormap and associate it with your node labels. You can then use the node_color argument of the nx.draw function to set the color of the nodes. Additionally, you can use plt.scatter to create empty plots to set up a legend for the labels in your graph.
See code below:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import cm
df=pd.read_fwf('graph.txt') #Stored your dataset in a file called 'graph.txt'
G = nx.from_pandas_edgelist(df, 'Source', 'Target')
kcore=nx.k_core(G)
N_colors=3
cm_dis=np.linspace(0, 1,N_colors)
colors = [cm.viridis(x) for x in cm_dis]
color_nodes=[]
for node in kcore:
    #Finding out label of the node
    temp_src=df.index[df['Source'] == node].tolist()
    temp_targ=df.index[df['Target']==node].tolist()
    if len(temp_targ)!=0:
        label=df['Label_Target'][temp_targ[0]]
        color=colors[int(label)]
    elif len(temp_src)!=0:
        label=df['Label_Source'][temp_src[0]]
        color=colors[int(label)]
    #Setting up legend
    if color not in color_nodes:
        plt.scatter([],[],color=color,label=str(label))
    color_nodes.append(color)
#Draw graph
nx.draw(kcore,with_labels=True,node_color=color_nodes)
plt.legend()
plt.show()
And the output gives:
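As a variation on the per-node lookup loop, the node-to-label mapping can be built once with pandas. This is only a sketch: the frame is the question's edge list typed in by hand, and node_label is a name introduced here. Like the loop, it prefers the label found in the Target columns when a node appears on both sides (node G, for example, carries two different labels in the data).

```python
import pandas as pd

# The edge list from the question, typed in by hand
df = pd.DataFrame({
    "Source": list("AAACCDDDD"),
    "Target": list("FGBFNSGBE"),
    "Label_Source": [0, 0, 0, 0, 0, 1, 1, 1, 1],
    "Label_Target": [0, 1, 1, 0, 1, 0, 0, 1, 2],
})

# Stack (node, label) pairs, targets first so their label wins on conflicts
targets = df[["Target", "Label_Target"]].rename(
    columns={"Target": "node", "Label_Target": "label"})
sources = df[["Source", "Label_Source"]].rename(
    columns={"Source": "node", "Label_Source": "label"})
pairs = pd.concat([targets, sources], ignore_index=True)

# Keep the first label seen for each node and turn it into a dict
node_label = (pairs.drop_duplicates("node", keep="first")
                   .set_index("node")["label"]
                   .astype(int)
                   .to_dict())
```

The mapping can then be handed to networkx as a list of numbers, for instance node_color=[node_label[n] for n in kcore] together with a cmap argument, which lets matplotlib pick the colors.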

Degree Centrality and Clustering Coefficient in Adjacent matrix

Based on a dataset extracted from this link: Brain and Cosmic Web samples, I'm trying to do some Complex Network analysis.
The paper The Quantitative Comparison Between the Neuronal Network and the Cosmic Web claims to have used this dataset, as well as its adjacency matrices
"Mij, i.e., a matrix with rows/columns equal to the number of detected nodes, with value Mij = 1 if the nodes are separated by a distance ≤ llink , or Mij = 0 otherwise".
I then probed into the matrix, like so:
import pandas as pd
from astropy.io import fits
with fits.open('mind_dataset/matrix_CEREBELLUM_large.fits') as data:
    matrix_cerebellum = pd.DataFrame(data[0].data)
which does not print a sparse matrix, but rather a matrix with distances from nodes expressed as pixels.
I've learned that the correspondence between 1 pixel and scale is:
neuronal_web_pixel = 0.32 # micrometers
And came up with a method in order to convert pixels to microns:
def pixels_to_scale(df, mind=False, cosmos=False):
    one_pixel_equals_parsec = cosmic_web_pixel
    one_pixel_equals_micron = neuronal_web_pixel
    if mind:
        df = df/one_pixel_equals_micron
    if cosmos:
        df = df/one_pixel_equals_parsec
    return df
Then, another method to binarize the matrix after the conversion:
def binarize_matrix(df, mind=False, cosmos=False):
    if mind:
        brain_Llink = 16.0 # microns
        # distances less than 16 microns
        brain_mask = (df<=brain_Llink)
        # convert to 1
        df = df.where(brain_mask, 1.0)
    if cosmos:
        cosmos_Llink = 1.2 # 1.2 mpc
        brain_mask = (df<=cosmos_Llink)
        df = df.where(brain_mask, 1.0)
    return df
Finally, with:
matrix_cerebellum = pixels_to_scale(matrix_cerebellum, mind=True)
matrix_cerebellum = binarize_matrix(matrix_cerebellum, mind=True)
matrix_cerebellum.head(5) prints my sparse matrix of (mostly) 0.0s and 1.0s:
0 1 2 3 4 5 6 7 8 9 ... 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 rows × 1858 columns
Now I would like to calculate:
Degree Centrality of the network, given by the formula:
Cd(j) = kj / (n - 1)
where kj is the number of (undirected) connections to/from node j and n is the total number of nodes in the entire network.
Clustering Coefficient, which quantifies the existence of infrastructure within the local vicinity of nodes, given by the formula:
C(j) = 2yj / (kj (kj - 1))
in which yj is the number of links between neighboring nodes of node j.
For finding Degree Centrality, I have tried:
# find connections by adding matrix row values
matrix_cerebellum['K'] = matrix_cerebellum.sum(axis=1)
# applying formula
matrix_cerebellum['centrality'] = matrix_cerebellum['K']/matrix_cerebellum.shape[0]-1
Generates:
... K centrality
9.0 -0.995156
6.0 -0.996771
7.0 -0.996771
11.0 -0.996233
11.0 -0.994080
According to the paper, I should be finding:
"For the cerebellum slices we measured 〈k〉 ∼ 1.9 − 3.7",
For the average numbers of connections per node.
Also I'm finding negative centralities.
Does anyone know how to apply any of these formulas based on the dataframe above?
This is not really a programming question, but I will try to answer it. The webpage with the data sources states that the adjacency matrix files for brain samples give distances between connected nodes, expressed in pixels of the images used to reconstruct the networks. The paper then explains that, to get the real adjacency matrix Mij (with 0 and 1 values only), the authors consider two nodes connected when the distance between them is at most 16 micrometers. I don't see any information on how many pixels in the image correspond to one micrometer. This would be needed to compute the same matrix Mij that the authors used in their calculations.
Furthermore, the value 〈k〉 is not the degree centrality or the clustering coefficient (both of which depend on a node), but rather the average number of connections per node in the network, computed using the matrix Mij. The paper then compares the observed distributions of degree centralities and clustering coefficients in the brain and cosmic networks to the distributions one would see in a random network with the same number of nodes and the same value of 〈k〉. The conclusion is that brain and cosmic networks are highly non-random.
Edits:
1. The conversion of 0.32 micrometers per pixel seems to be right. In the files with data on brain samples (both for cortex and cerebellum) the largest value is 50 pixels, which with this conversion corresponds to 16 micrometers. This suggests that the authors of the paper already thresholded the matrices, listing in them only distances not exceeding 16 micrometers. In view of this, to obtain the matrix Mij with 0 and 1 values only, one simply needs to replace all non-zero values with 1. An issue is that using the matrices obtained in this way one gets 〈k〉 = 9.22 for cerebellum and 〈k〉 = 7.13 for cortex, which is somewhat outside the ranges given in the paper. I don't know how to account for this discrepancy.
2. Negative centrality values are due to a mistake (missing parentheses) in the code. It should be:
matrix_cerebellum['centrality'] = matrix_cerebellum['K']/(matrix_cerebellum.shape[0] - 1)
3. Clustering coefficient and degree centrality of each node can be computed using tools provided by the networkx library:
from astropy.io import fits
import networkx as nx
# get the adjacency matrix for cortex
with fits.open('matrix_CORTEX_large.fits') as data:
    M = data[0].data
    M[M > 0] = 1
# create a graph object (from_numpy_matrix was removed in NetworkX 3.0)
G_cortex = nx.from_numpy_array(M)
# compute degree centrality of all nodes
centrality = nx.degree_centrality(G_cortex)
# compute clustering coefficient of all nodes
clustering = nx.clustering(G_cortex)
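For reference, the two formulas from the question can also be applied directly to a binary adjacency matrix with NumPy. The sketch below uses a tiny hand-made graph (a triangle 0-1-2 plus a node 3 attached to node 2) rather than the FITS data, and assumes the matrix is symmetric with a zero diagonal. The names A, k, centrality and clustering are chosen here for illustration.

```python
import numpy as np

# Hypothetical 4-node graph: triangle 0-1-2, plus node 3 linked to node 2
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
n = A.shape[0]

# k_j: number of connections of node j (row sum of the binary matrix)
k = A.sum(axis=1)

# Degree centrality Cd(j) = k_j / (n - 1); note the parentheses
centrality = k / (n - 1)

# y_j: links between neighbors of j = triangles through j = diag(A^3) / 2
links_between_neighbors = np.diagonal(A @ A @ A) / 2

# Clustering C(j) = 2 y_j / (k_j (k_j - 1)), set to 0 where k_j < 2
denom = k * (k - 1)
clustering = np.divide(2 * links_between_neighbors, denom,
                       out=np.zeros(n), where=denom > 0)

# <k>: average number of connections per node
mean_degree = k.mean()
```

Setting the clustering coefficient to 0 for nodes with fewer than two neighbors matches the convention networkx uses. The same precedence point explains the question's output: with K = 9 and 1858 rows, 9/1858 - 1 evaluates to about -0.995156, while 9/(1858 - 1) is about 0.0048.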

Change tick frequency for datetime axis [duplicate]

I have the following dataframe:
Date Prod_01 Prod_02
19 2018-03-01 49870 0.0
20 2018-04-01 47397 0.0
21 2018-05-01 53752 0.0
22 2018-06-01 47111 0.0
23 2018-07-01 53581 0.0
24 2018-08-01 55692 0.0
25 2018-09-01 51886 0.0
26 2018-10-01 56963 0.0
27 2018-11-01 56732 0.0
28 2018-12-01 59196 0.0
29 2019-01-01 57221 5.0
30 2019-02-01 55495 472.0
31 2019-03-01 65394 753.0
32 2019-04-01 59030 1174.0
33 2019-05-01 64466 2793.0
34 2019-06-01 58471 4413.0
35 2019-07-01 64785 6110.0
36 2019-08-01 63774 8360.0
37 2019-09-01 64324 9558.0
38 2019-10-01 65733 11050.0
And I need to plot a time series of the 'Prod_01' column.
The 'Date' column is in the pandas datetime format.
So I used the following command:
plt.figure(figsize=(10,4))
plt.plot('Date', 'Prod_01', data=test, linewidth=2, color='steelblue')
plt.xticks(rotation=45, horizontalalignment='right');
Output:
However, I want to change the frequency of the xticks to one month, so I get one tick and one label for each month.
I have tried the following command:
plt.figure(figsize=(10,4))
plt.plot('Date', 'Prod_01', data=test, linewidth=2, color='steelblue')
plt.xticks(np.arange(1, len(test), 1), test['Date'] ,rotation=45, horizontalalignment='right');
But I get this:
How can I solve this problem?
Thanks in advance.
I'm not very familiar with pandas data frames, but I can't see why this wouldn't work with any pyplot figure.
According to the top SO answer on a related post by ImportanceOfBeingErnest:
The spacing between ticklabels is exclusively determined by the space between ticks on the axes.
So, to change the distance between ticks (and therefore between labels), you can do this.
Suppose someone who likes cluttered, base-10-centered axes displays the following graph:
It takes the following code, which imports matplotlib.ticker:
import numpy as np
import matplotlib.pyplot as plt
# Import this, too
import matplotlib.ticker as ticker
# Arbitrary graph with x-axis = [-32..32]
x = np.linspace(-32, 32, 1024)
y = np.sinc(x)
# -------------------- Look Here --------------------
# Access plot's axes
axs = plt.axes()
# Set distance between major ticks (which always have labels)
axs.xaxis.set_major_locator(ticker.MultipleLocator(5))
# Sets distance between minor ticks (which don't have labels)
axs.xaxis.set_minor_locator(ticker.MultipleLocator(1))
# -----------------------------------------------------
# Plot and show graph
plt.plot(x, y)
plt.show()
To change where the labels are placed, you can change the distance between the 'major ticks'. You can also change the smaller 'minor ticks' in between, which don't have a number attached. E.g., on a clock, the hour ticks have numbers on them and are larger (major ticks) with smaller, unlabeled ones between marking the minutes (minor ticks).
By changing the --- Look Here --- part to:
# -------------------- Look Here --------------------
# Access plot's axes
axs = plt.axes()
# Set distance between major ticks (which always have labels)
axs.xaxis.set_major_locator(ticker.MultipleLocator(8))
# Sets distance between minor ticks (which don't have labels)
axs.xaxis.set_minor_locator(ticker.MultipleLocator(4))
# -----------------------------------------------------
You can generate the cleaner and more elegant graph below:
Hope that helps!
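For a datetime axis like the one in the question, the same locator idea applies, but with the date-aware locators in matplotlib.dates instead of MultipleLocator. The following is a sketch rather than a tested fix for the asker's exact frame: test is a made-up stand-in with one row per month, and the "%Y-%m" label format is an arbitrary choice.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for the question's frame: one row per month
test = pd.DataFrame({
    "Date": pd.date_range("2018-03-01", periods=20, freq="MS"),
    "Prod_01": range(20),
})

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(test["Date"], test["Prod_01"], linewidth=2, color="steelblue")

# One major tick at the start of every month
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# Label each tick as year-month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.tick_params(axis="x", labelrotation=45)

tick_count = len(ax.get_xticks())
```

Because the locator works in date units, there is no need to pass tick positions by hand as in the question's np.arange attempt, which mixes integer positions with a date axis.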

smooth plotting all columns of a data-frame

I have a data frame of:
Index Date AA BB CC DD EE FF
0 2019-01-15 0.0 -1.0 0.0 0.0 0.0 2.0
1 2019-01-17 0.0 -1.0 -1.0 -1.0 0.0 2.0
2 2019-01-22 1.0 -1.0 1.0 -1.0 0.0 2.0
3 2019-01-24 0.0 0.0 0.0 0.0 0.0 2.0
4 2019-01-29 1.0 0.0 -1.0 0.0 -1.0 2.0
5 2019-01-31 0.0 -1.0 0.0 0.0 0.0 2.0
6 2019-02-05 1.0 1.0 1.0 0.0 1.0 2.0
7 2019-02-12 2.0 1.0 1.0 0.0 2.0 2.0
which I'm plotting with:
dfs = dfs.melt('Date', var_name = 'cols', value_name = 'vals')
ax = sns.lineplot(x = "Date", y = 'vals', hue = 'cols',
style = 'cols', markers = True, dashes = False, data = dfs)
ax.set_xticklabels(dfs['Date'].dt.strftime('%d-%m-%Y'))
plt.xticks(rotation = -90)
plt.tight_layout()
plt.show()
resulting:
which is ugly. I want to have the markers in the exact place as what is in the data-frame but the lines to be smoothed. I'm aware of scipy -> spline (e.g. here), however that seems to be too much hassle to convert all the columns. There is also Pandas -> resample -> interpolate (e.g. here) which is very close to what I want but I have to turn the Date column to index which I don't want to do...
I would appreciate if you could help me know what is the best Pythonic way to do this.
P.S. A complete version of my code can be seen here.
I think you need to write a custom plotting function that iterates over all
columns and plots interpolated data to specified axes instance. Look at the following code:
import pandas as pd
import numpy as np
# data = pd.read_clipboard()
# data.drop(['Index'], axis=1, inplace=True)
def add_smooth_plots(df, ax, timecolumn='Date', interpolation_method='cubic', colors='rgbky'):
    from itertools import cycle
    ind = pd.to_datetime(df.loc[:, timecolumn])
    tick_labels = ind.dt.strftime("%Y-%m-%d")
    color = cycle(colors)
    for i, col in enumerate(df.columns):
        if col != timecolumn:
            c = next(color)
            s = pd.Series(df.loc[:, col].values, index=ind)
            intp = s.resample('0.5D').interpolate(method=interpolation_method)
            true_ticks = intp.index.isin(ind)
            vals = intp.values
            intp = intp.reset_index()
            ticks = intp.index[true_ticks]
            ax.plot(np.arange(len(vals)), vals, label=col, color=c)
    ax.set_xticks(ticks)
    ax.set_xticklabels(tick_labels.values, rotation=45)
    ax.legend(title='Columns')
    return ax
from matplotlib import pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
add_smooth_plots(data, ax)
plt.show()
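On the concern about turning Date into the index: set_index returns a new frame, so the interpolation can work on a temporary copy while the original keeps Date as an ordinary column. A minimal sketch, assuming SciPy is installed (pandas needs it for method="cubic") and using only the first two value columns of the question's data; the name smooth is introduced here.

```python
import pandas as pd

# First two value columns of the question's frame, typed in by hand
data = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-15", "2019-01-17", "2019-01-22", "2019-01-24",
        "2019-01-29", "2019-01-31", "2019-02-05", "2019-02-12",
    ]),
    "AA": [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 2.0],
    "BB": [-1.0, -1.0, -1.0, 0.0, 0.0, -1.0, 1.0, 1.0],
})

# set_index returns a copy, so `data` itself is left untouched;
# upsample to a 12-hour grid and fill the gaps with a cubic interpolation
smooth = (data.set_index("Date")
              .resample(pd.Timedelta(hours=12))
              .interpolate(method="cubic"))
```

Plotting smooth as lines against smooth.index and the original data as markers against data["Date"] puts both on a true date axis, so the markers sit exactly at the recorded points.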

Line Plot in Matplotlib, by frequency of date

So I have a dataframe in pandas like below:
date max min rain snow ice
0 2019-01-01 58 39 0.06 0.0 0.0
1 2019-01-01 58 39 0.06 0.0 0.0
2 2019-01-01 58 39 0.06 0.0 0.0
3 2019-01-01 58 39 0.06 0.0 0.0
4 2019-01-01 58 39 0.06 0.0 0.0
The goal is to create a line plot which shows, on the x axis, the max temperature, and on the y axis, the frequency of each date for that temperature.
So basically, the list of dates are shop transactions and I want to see the effect the temperature has on the number of transactions per day.
I've tried to use this which groups the weather_frame by date, but I can't get my plot to show the temperature on the x axis.
max_temp = weather_frame.groupby(weather_frame.date).size()
I've attached the file below. I had to delete some of it to stay within the size limits for paste bin so, the graph may appear corrupted. Data Link
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
date_freq = weather_frame.groupby(weather_frame.date).size()
max_temp = weather_frame.groupby(weather_frame.date)['max'].mean()
sns.set()
plt.figure()
sns.regplot(x=max_temp, y=date_freq)
plt.xlabel('Maximum Temperature')
plt.ylabel('Number of Transactions per Day')
It looks like there is a slight positive relationship between max temperature and number of transactions per day.
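If a plain line plot is wanted instead of a regression plot, the two groupby results can be produced in one step and sorted by temperature. This is a sketch on made-up rows in the question's layout (one row per transaction); the column names transactions and max_temp are introduced here.

```python
import pandas as pd

# Hypothetical transactions: one row per sale, weather repeated per day
weather_frame = pd.DataFrame({
    "date": ["2019-01-01"] * 3 + ["2019-01-02"] * 5 + ["2019-01-03"] * 2,
    "max": [58] * 3 + [64] * 5 + [51] * 2,
})

# One row per day: number of transactions and that day's max temperature
daily = (weather_frame.groupby("date")
                      .agg(transactions=("max", "size"),
                           max_temp=("max", "first"))
                      .sort_values("max_temp"))
```

Calling daily.plot(x="max_temp", y="transactions") then puts temperature on the x axis and the daily transaction count on the y axis. Sorting first keeps the line from zigzagging back and forth across the plot.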
