Adding labels information in k-core decomposition - python

I need to visualize labels in a network from which I extract k-core information.
The dataset is
  Source Target  Edge_Weight  Label_Source  Label_Target
0      A      F         29.1           0.0           0.0
1      A      G         46.9           0.0           1.0
2      A      B         24.4           0.0           1.0
3      C      F         43.4           0.0           0.0
4      C      N         23.3           0.0           1.0
5      D      S         18.0           1.0           0.0
6      D      G         67.6           1.0           0.0
7      D      B         37.2           1.0           1.0
8      D      E         46.9           1.0           2.0
For extracting kcore information I used the code
G = nx.from_pandas_edgelist(df, 'Source', 'Target')
kcore=nx.k_core(G)
plt.subplot(122)
nx.draw(kcore)
plt.show()
Do you know how I can add the label information?
My expected output would be a graph whose nodes are colored according to their labels (it does not matter which color is assigned to each label value; the values are 0, 1, 2).
Many thanks

One way to do this is to create a colormap and associate it with your node labels. You can then use the node_color argument of the nx.draw function to set the color of each node. Additionally, you can use plt.scatter to create empty plots that serve as legend entries for the labels in your graph.
See code below:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import cm
df=pd.read_fwf('graph.txt') #Stored your dataset in a file called 'graph.txt'
G = nx.from_pandas_edgelist(df, 'Source', 'Target')
kcore=nx.k_core(G)
N_colors=3
cm_dis=np.linspace(0, 1,N_colors)
colors = [cm.viridis(x) for x in cm_dis]
color_nodes=[]
for node in kcore:
    #Finding out label of the node
    temp_src=df.index[df['Source'] == node].tolist()
    temp_targ=df.index[df['Target']==node].tolist()
    if len(temp_targ)!=0:
        label=df['Label_Target'][temp_targ[0]]
        color=colors[int(label)]
    elif len(temp_src)!=0:
        label=df['Label_Source'][temp_src[0]]
        color=colors[int(label)]
    #Setting up legend
    if color not in color_nodes:
        plt.scatter([],[],color=color,label=str(label))
    color_nodes.append(color)
#Draw graph
nx.draw(kcore,with_labels=True,node_color=color_nodes)
plt.legend()
plt.show()
And the output gives:
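The per-node lookup in the loop above can also be done once, up front, by building a node-to-label dictionary. This is only a sketch: the DataFrame is retyped from the question, and the names source_labels, target_labels, node_label and node_values are made up for illustration. It keeps the same precedence as the loop (the first row where the node appears as a target wins, otherwise the first row where it appears as a source).

```python
import pandas as pd
import networkx as nx

# Edge list retyped from the question (Edge_Weight is not needed here)
df = pd.DataFrame({
    "Source": list("AAACCDDDD"),
    "Target": list("FGBFNSGBE"),
    "Label_Source": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    "Label_Target": [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 2.0],
})

G = nx.from_pandas_edgelist(df, "Source", "Target")
kcore = nx.k_core(G)  # main core when k is not given

# First occurrence of each node in each column
source_labels = df.drop_duplicates("Source").set_index("Source")["Label_Source"]
target_labels = df.drop_duplicates("Target").set_index("Target")["Label_Target"]

# Target labels overwrite source labels, as in the loop above
node_label = {**source_labels.to_dict(), **target_labels.to_dict()}

# One integer label per node, in the node order of the k-core
node_values = [int(node_label[n]) for n in kcore]
print(dict(zip(kcore, node_values)))
```

The list node_values can then be passed as node_color to nx.draw together with a cmap. Note that in the sample data node G has Label_Target 1.0 in one row and 0.0 in another, so "first occurrence wins" is a choice, not something the data dictates.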


How to plot sequential data, changing the color according to cluster

I have a dataframe with information about the date and the cluster that each date belongs to (the clustering was done beforehand, based on the temperatures collected for each day). I want to plot this data in sequence, like a stacked bar chart, changing the color of each element according to the assigned cluster. Here is my table (the data goes up to 100 days):
Date        order  ClusterNo2  constant
2020-08-07      1         3.0         1
2020-08-08      2         0.0         1
2020-08-09      3         1.0         1
2020-08-10      4         3.0         1
2020-08-11      5         1.0         1
2020-08-12      6         1.0         1
2020-08-13      7         3.0         1
2020-08-14      8         2.0         1
2020-08-15      9         2.0         1
2020-08-16     10         2.0         1
2020-08-17     11         2.0         1
2020-08-18     12         1.0         1
2020-08-19     13         1.0         1
2020-08-20     14         0.0         1
2020-08-21     15         0.0         1
2020-08-22     16         1.0         1
Note: I can't simply group the data by cluster because the plot should be sequential. I thought about writing code to count the elements of each cluster sequentially, but then I would face the same problem when plotting. Does anyone know how to solve this?
The expected result should be something like this (the numbers inside the bars represent the cluster, the x-axis the time in days, and the bar width the number of consecutive days with the same cluster):
You could use the dates for the x-axis, the 'constant' column for the y-axis,
and the Cluster id for the coloring.
You can create a custom legend using a list of colored rectangles.
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd
import numpy as np
N = 100
df = pd.DataFrame({'Date': pd.date_range('2020-08-07', periods=N, freq='D'),
                   'order': np.arange(1, N + 1),
                   'ClusterNo2': np.random.randint(0, 4, N).astype(float),
                   'constant': 1})
df['ClusterNo2'] = df['ClusterNo2'].astype(int) # convert to integers
fig, ax = plt.subplots(figsize=(15, 3))
num_clusters = df['ClusterNo2'].max() + 1
colors = plt.cm.Set2.colors
ax.bar(x=range(len(df)), height=df['constant'], width=1, color=[colors[i] for i in df['ClusterNo2']], edgecolor='none')
ax.set_xticks(range(len(df)))
labels = ['' if i % 3 != 0 else day.strftime('%d\n%b %Y') if i == 0 or day.day <= 3 else day.strftime('%d')
          for i, day in enumerate(df['Date'])]
ax.set_xticklabels(labels)
ax.margins(x=0, y=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
legend_handles = [plt.Rectangle((0, 0), 0, 0, color=colors[i], label=f'{i}') for i in range(num_clusters)]
ax.legend(handles=legend_handles, title='Clusters', bbox_to_anchor=(1.01, 1.01), loc='upper left')
fig.tight_layout()
plt.show()
You could just plot a normal bar graph, with 1 bar corresponding to 1 day. If you make the width also 1, it will look as if the patches are contiguous.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# simulate data
total_datapoints = 16
total_clusters = 4
order = np.arange(total_datapoints)
clusters = np.random.randint(0, total_clusters, size=total_datapoints)
# map clusters to colors
cmap = plt.cm.tab10
bounds = np.arange(total_clusters + 1)
norm = BoundaryNorm(bounds, cmap.N)
colors = [cmap(norm(cluster)) for cluster in clusters]
# plot
fig, ax = plt.subplots()
ax.bar(order, np.ones_like(order), width=1, color=colors, align='edge')
# xticks
change_points = np.where(np.diff(clusters) != 0)[0] + 1
change_points = np.unique([0] + change_points.tolist() + [total_datapoints])
ax.set_xticks(change_points)
# annotate clusters
for ii, dx in enumerate(np.diff(change_points)):
    xx = change_points[ii] + dx/2
    ax.text(xx, 0.5, str(clusters[int(xx)]), ha='center', va='center')
ax.set_xlabel('Time (days)')
plt.show()
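The "number of elements of each cluster sequentially" that the question mentions can also be computed directly with pandas, if you prefer one wide bar per run over one bar per day. A minimal sketch, using the 16 ClusterNo2 values shown in the question; the names run_id and runs are made up for illustration:

```python
import pandas as pd

# ClusterNo2 values from the 16 rows shown in the question
clusters = pd.Series([3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1])

# A new run starts wherever the value differs from the previous one
run_id = (clusters != clusters.shift()).cumsum()

# One row per run: which cluster it is and how many consecutive days it lasts
runs = clusters.groupby(run_id).agg(cluster="first", length="count")

# Position (in days) at which each run starts
runs["start"] = runs["length"].cumsum() - runs["length"]
print(runs)
```

Each row of runs then describes one segment, which could be drawn for example with ax.barh(0, length, left=start) and annotated with the cluster number at start + length / 2.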

How does this transparent extension come with a plot in lineplot?

The plot in documentation looks like this :
with code
sns.lineplot(x="timepoint", y="signal",
hue="region", style="event",
data=fmri)
and mine comes out like this for the code:
sns.lineplot(
# data=fmri,
x=df["_C_UP"]["s"][:10],
y=df["_C_UP"]["px"][:10]
# hue="event"
);
How do I get the same effect for those lines (the transparent band of color around them)?
Here is what my data looks like:
#Energy s py pz px dxy dyz dz2 dxz dx2 tot
50 -17.98094 0.72320 0.31781 0.00000 0.31882 0.0 0.0 0.0 0.0 0.0 1.35982
51 -17.87394 0.29726 0.14415 0.00000 0.14491 0.0 0.0 0.0 0.0 0.0 0.58632
52 -17.76794 0.63694 0.02456 0.00000 0.02484 0.0 0.0 0.0 0.0 0.0 0.68634
53 -17.66194 1.78595 0.06032 0.00001 0.06139 0.0 0.0 0.0 0.0 0.0 1.90766
54 -17.55494 1.97809 0.09038 0.00001 0.09192 0.0 0.0 0.0 0.0 0.0 2.16040
In the fmri dataset, there are actually multiple observations for each time point and subgroup, for example, at timepoint == 14:
fmri[fmri['timepoint']==14]
subject timepoint event region signal
1 s5 14 stim parietal -0.080883
57 s13 14 stim parietal -0.033713
58 s12 14 stim parietal -0.068297
59 s11 14 stim parietal -0.114469
60 s10 14 stim parietal -0.052288
61 s9 14 stim parietal -0.130267
So the line you see, is actually the mean of all these observations (stratified by group) and the ribbon is the 95% confidence interval of this mean. For example, you can turn this off by doing:
sns.lineplot(x="timepoint", y="signal",
             hue="region", style="event",
             data=fmri, ci=None)  # in seaborn 0.12 and later, use errorbar=None instead
So to get the exact plot, you need to have multiple observations or replicates. If you don't, and your intention is to just connect the points, you cannot get a confidence interval.
If you want to look at a trend line, one thing you can try is a polynomial smooth. And it makes sense to plot the data as points too.
Using an example from the same fmri dataset:
df = fmri[(fmri['subject']=="s5") & (fmri['event']== "stim") & (fmri['region'] == "frontal")]
sns.regplot(data=df,x = "timepoint",y = "signal",order=3)
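For reference, order=3 asks for a cubic polynomial fit. As far as I know this is an ordinary least-squares polynomial fit, which you can reproduce with NumPy. A minimal sketch on made-up, noise-free data (x, y, coeffs and trend are illustrative names, not part of the fmri example):

```python
import numpy as np

# Made-up data lying exactly on a known cubic
x = np.arange(10, dtype=float)
y = 0.5 * x**3 - 2.0 * x**2 + x + 3.0

# Least-squares fit of a degree-3 polynomial (highest power first)
coeffs = np.polyfit(x, y, deg=3)

# Evaluate the fitted polynomial to get the trend line
trend = np.polyval(coeffs, x)
print(coeffs)
```

With real, noisy data the fitted coefficients will of course not match any "true" polynomial; the trend values are what you would draw as the smooth line.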
Or use a loess smooth, which is more complicated (see this post about what is drawn below):
import matplotlib.pyplot as plt
from skmisc.loess import loess
lfit = loess(df['timepoint'],df['signal'])
lfit.fit()
pred = lfit.predict(df['timepoint'], stderror=True)
conf = pred.confidence()
fig, ax = plt.subplots()
sns.scatterplot(data=df,x = "timepoint",y = "signal",ax=ax)
sns.lineplot(x = df["timepoint"],y = pred.values,ax=ax,color="#A2D2FF")
ax.fill_between(df['timepoint'],conf.lower, conf.upper, alpha=0.1,color="#A2D2FF")
It depends on the data. The plot from the seaborn documentation that you show is based on a dataset where for every x value there are several y values (repeated measurements). The lines in the plot then indicate the means of those y values, and the shaded regions indicate the associated 95% confidence intervals.
In your data, there is only one y value for each x value, so there is no way to calculate a confidence interval.
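To make the point about replicates concrete, here is a small sketch that computes a mean and an approximate 95% band per x value with pandas. It uses made-up data and a normal approximation (mean ± 1.96 standard errors); seaborn's default interval is, as far as I know, bootstrap-based, so the numbers will not match exactly. The names obs, stats and single are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up data: 5 x values with 8 repeated measurements each
x = np.repeat(np.arange(5), 8)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
obs = pd.DataFrame({"x": x, "y": y})

# Mean and standard error of the mean for every x value
stats = obs.groupby("x")["y"].agg(["mean", "sem", "count"])
stats["lower"] = stats["mean"] - 1.96 * stats["sem"]
stats["upper"] = stats["mean"] + 1.96 * stats["sem"]
print(stats)

# With a single observation per x, the standard error is undefined (NaN),
# so there is nothing to draw a band from
single = obs.drop_duplicates("x").groupby("x")["y"].agg(["mean", "sem"])
print(single)
```

The lower and upper columns are what ax.fill_between would shade; in the single-observation case both are NaN.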

Pandas to bipartite graph

I have already added nodes to my graph, but I can't work out how to add the edges. The edges correspond to any value of 1 in my pivot table. The table has the following form:
movie_id 1 2 3 4 5 ... 500
user_id ...
501 1.0 0.0 1.0 0.0 0.0 ... 0.0
502 1.0 0.0 0.0 0.0 0.0 ... 0.0
503 0.0 0.0 0.0 0.0 0.0 ... 1.0
504 0.0 0.0 0.0 1.0 0.0 ... 0.0
. ...
.
1200
This is the code i have used for my nodes:
B = nx.Graph()
B.add_nodes_from(user_rating_pivoted.index, bipartite=0)
B.add_nodes_from(user_rating_pivoted.columns, bipartite=1)
And i imagine the edges should be formed in a similar way :
add_edges_from(...) for idx, row in user_rating_pivoted.iterrows())
Let's add prefixes to those indices and columns, and use them as nodes to more easily associate the connections:
print(df)
movie_1 movie_2 movie_3 movie_4 movie_5 movie_6
user_1 1.0 1.0 1.0 1.0 0.0 0.0
user_2 1.0 0.0 0.0 0.0 0.0 0.0
user_3 0.0 1.0 0.0 0.0 0.0 1.0
user_4 1.0 0.0 1.0 0.0 1.0 0.0
To get the edges (and keep the node names), we can reshape the dataframe a little with pandas. stack gives a Series with a MultiIndex of (user, movie) pairs, which we can filter on the values that equal 1. Then we can use add_edges_from to add all the edges:
B = nx.Graph()
B.add_nodes_from(df.index, bipartite=0)
B.add_nodes_from(df.columns, bipartite=1)
s = df.stack()
B.add_edges_from(s[s==1].index)
We can use bipartite_layout for a nice layout of the bipartite graph:
top = nx.bipartite.sets(B)[0]
pos = nx.bipartite_layout(B, top)
nx.draw(B, pos=pos,
node_color='lightgreen',
node_size=2500,
with_labels=True)
Note that such highly sparse matrices are likely to lead to disconnected graphs, i.e. graphs in which not all nodes are connected to some other node, and attempting to obtain both sets will then raise an error, as specified in the documentation:
AmbiguousSolution – Raised if the input bipartite graph is disconnected and no container with all nodes in one bipartite set is provided. When determining the nodes in each bipartite set more than one valid solution is possible if the input graph is disconnected.
In such case you can just plot as a regular graph with:
from matplotlib import rcParams
rcParams['figure.figsize'] = 10, 8
nx.draw(B,
node_color='lightgreen',
node_size=2000,
with_labels=True)
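Since the nodes were added with a bipartite attribute, another option for the disconnected case is to pick one node set by that attribute and hand it to bipartite_layout directly, instead of asking nx.bipartite.sets to infer it. A minimal sketch on a made-up pivot table in which user_3 and movie_3 have no ratings (the variable name users is illustrative):

```python
import pandas as pd
import networkx as nx

# Made-up pivot table: user_3 and movie_3 end up as isolated nodes
df = pd.DataFrame(
    [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 0.0]],
    index=["user_1", "user_2", "user_3"],
    columns=["movie_1", "movie_2", "movie_3"],
)

B = nx.Graph()
B.add_nodes_from(df.index, bipartite=0)
B.add_nodes_from(df.columns, bipartite=1)

# (user, movie) pairs whose value is 1 become edges
s = df.stack()
B.add_edges_from(s[s == 1].index)

# Select one side by the stored attribute rather than inferring it
users = [n for n, d in B.nodes(data=True) if d["bipartite"] == 0]
pos = nx.bipartite_layout(B, users)
print(pos)
```

The resulting pos dictionary can be passed to nx.draw(B, pos=pos, ...) as in the layout example above, so the two-column layout is kept even though the graph is disconnected.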

Color Bar Chart based on values in Dataframe

I have plotted a stacked bar chart (see here: https://imgur.com/a/ESJeHuF), formed out of the dataframe below.
condition1 condition2 condition3
timestamp
2019-10-30 01:41:43 1.0 4.0 0.0
2019-10-30 01:50:11 1.0 2.0 4.0
2019-10-30 01:50:59 1.0 2.0 4.0
2019-10-30 01:51:36 1.0 2.0 4.0
2019-10-30 01:52:27 1.0 3.0 4.0
2019-10-30 01:53:10 2.0 4.0 0.0
2019-10-31 02:25:14 5.0 0.0 0.0
2019-10-31 04:15:54 5.0 0.0 0.0
I would like the colors in the bar chart to match their corresponding values in the dataframe via this color list:
color_list = ['r', 'g', 'b', 'm', 'k', 'k']
(e.g. if a value for the second-to-last timestamp is 5, that segment of the stacked bar should be colored 'k', with that behavior repeated for all segments of the stacked bar chart columns).
The code below plots the stacked bars but miscolors them (the link above shows this). It only assigns the first three colors to all of the values, although there are more corresponding colors/values in the DataFrame. The correct plot should have the timestamps on the x-axis, and the segments of the bars for each condition in the correct colors.
fig = plt.figure()
ax = fig.add_subplot(111)
df.plot.bar(stacked=True, rot=1, legend=False, ax=fig.gca(), colors=color_list)
I would greatly appreciate any help, thank you in advance.
I don't know how important to you is the choice of colors.
I've just found a solution that seems to fix your problem. The only caveat is that development is easier if you accept one of the available color schemes. Otherwise, if you have to make a colormap by hand, you can find examples using LinearSegmentedColormap from matplotlib.colors.
The code:
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
data_color = [0.,1.,2.,3.,4.,5.] #data range from conditions columns
data_color = [x / max(data_color) for x in data_color]
custom_map = plt.get_cmap('Accent') #one of the stored color schemes (plt.cm.get_cmap was removed in Matplotlib 3.9)
custom = custom_map(data_color) #mapping the color info to the variable custom
fig = plt.figure()
ax = fig.add_subplot(111)
df.plot.bar(stacked=True, rot=1, legend=False, ax=fig.gca(), color=custom)
plt.show()
The display:
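If each segment should take its color from its own value, as the question describes, one option is to build the stack yourself, column by column, and pass ax.bar one color per bar. This is only a sketch under the assumption that every value is an integer from 0 to 5 that indexes directly into color_list; the DataFrame is retyped from the question, and bottom and seg_colors are illustrative names.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

color_list = ['r', 'g', 'b', 'm', 'k', 'k']

# Data retyped from the question
df = pd.DataFrame(
    {"condition1": [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 5.0, 5.0],
     "condition2": [4.0, 2.0, 2.0, 2.0, 3.0, 4.0, 0.0, 0.0],
     "condition3": [0.0, 4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0]},
    index=pd.to_datetime([
        "2019-10-30 01:41:43", "2019-10-30 01:50:11", "2019-10-30 01:50:59",
        "2019-10-30 01:51:36", "2019-10-30 01:52:27", "2019-10-30 01:53:10",
        "2019-10-31 02:25:14", "2019-10-31 04:15:54"]),
)

fig, ax = plt.subplots()
x = np.arange(len(df))
bottom = np.zeros(len(df))  # running height of each stacked bar

for col in df.columns:
    heights = df[col].to_numpy()
    # One color per segment, looked up from the segment's own value
    seg_colors = [color_list[int(v)] for v in heights]
    ax.bar(x, heights, bottom=bottom, color=seg_colors, width=0.8)
    bottom = bottom + heights

ax.set_xticks(x)
ax.set_xticklabels(df.index.strftime('%Y-%m-%d %H:%M:%S'), rotation=90)
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Segments with value 0 have zero height and are therefore invisible, whatever color they are given.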

smooth plotting all columns of a data-frame

I have a data frame of:
Index Date AA BB CC DD EE FF
0 2019-01-15 0.0 -1.0 0.0 0.0 0.0 2.0
1 2019-01-17 0.0 -1.0 -1.0 -1.0 0.0 2.0
2 2019-01-22 1.0 -1.0 1.0 -1.0 0.0 2.0
3 2019-01-24 0.0 0.0 0.0 0.0 0.0 2.0
4 2019-01-29 1.0 0.0 -1.0 0.0 -1.0 2.0
5 2019-01-31 0.0 -1.0 0.0 0.0 0.0 2.0
6 2019-02-05 1.0 1.0 1.0 0.0 1.0 2.0
7 2019-02-12 2.0 1.0 1.0 0.0 2.0 2.0
which I'm plotting with:
dfs = dfs.melt('Date', var_name = 'cols', value_name = 'vals')
ax = sns.lineplot(x = "Date", y = 'vals', hue = 'cols',
style = 'cols', markers = True, dashes = False, data = dfs)
ax.set_xticklabels(dfs['Date'].dt.strftime('%d-%m-%Y'))
plt.xticks(rotation = -90)
plt.tight_layout()
plt.show()
resulting:
which is ugly. I want to have the markers in the exact place as what is in the data-frame but the lines to be smoothed. I'm aware of scipy -> spline (e.g. here), however that seems to be too much hassle to convert all the columns. There is also Pandas -> resample -> interpolate (e.g. here) which is very close to what I want but I have to turn the Date column to index which I don't want to do...
I would appreciate if you could help me know what is the best Pythonic way to do this.
P.S. A complete version of my code can be seen here.
I think you need to write a custom plotting function that iterates over all
columns and plots interpolated data to specified axes instance. Look at the following code:
import pandas as pd
import numpy as np
# data = pd.read_clipboard()
# data.drop(['Index'], axis=1, inplace=True)
def add_smooth_plots(df, ax, timecolumn='Date', interpolation_method='cubic', colors='rgbky'):
    from itertools import cycle
    ind = pd.to_datetime(df.loc[:, timecolumn])
    tick_labels =ind.dt.strftime("%Y-%m-%d")
    color = cycle(colors)
    for i, col in enumerate(df.columns):
        if col != timecolumn:
            c = next(color)
            s = pd.Series(df.loc[:, col].values, index=ind)
            intp = s.resample('0.5D').interpolate(method=interpolation_method)
            true_ticks = intp.index.isin(ind)
            vals = intp.values
            intp = intp.reset_index()
            ticks = intp.index[true_ticks]
            ax.plot(np.arange(len(vals)), vals, label=col, color=c)
    ax.set_xticks(ticks)
    ax.set_xticklabels(tick_labels.values, rotation=45)
    ax.legend(title='Columns')
    return ax
from matplotlib import pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
add_smooth_plots(data, ax)
plt.show()
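Since the question asks for markers at the exact data positions, it may help to check that the interpolated curve really passes through the original points. A minimal sketch for the AA column only, retyped from the question; it uses a 12-hour pd.Timedelta as the resampling step (an equivalent of the half-day step above) and, like the function above, needs SciPy for the cubic method. The names smooth and max_gap are illustrative.

```python
import numpy as np
import pandas as pd

# Dates and the AA column, retyped from the question
dates = pd.to_datetime(["2019-01-15", "2019-01-17", "2019-01-22", "2019-01-24",
                        "2019-01-29", "2019-01-31", "2019-02-05", "2019-02-12"])
s = pd.Series([0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 2.0], index=dates, name="AA")

# Upsample to a half-day grid and fill the gaps with a cubic interpolation
smooth = s.resample(pd.Timedelta(hours=12)).interpolate(method="cubic")

# Largest difference between the smooth curve and the data at the original dates
max_gap = np.abs(smooth.loc[dates].to_numpy() - s.to_numpy()).max()
print(len(smooth), max_gap)
```

Because the curve goes through the original points, markers can be added on top of the smooth line by plotting the original values at their positions, for example with ax.plot(ticks, s.values, 'o') inside the loop of the function above.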
