mne connectivity_circle / chord diagram - python

I am attempting to create a chord diagram using plot_connectivity_circle from the mne_connectivity.viz module.
My data is similar to the following, where letter and number represent separate nodes and count represents the number of connections between those nodes:
import random
import string
import pandas as pd
random.seed(10)
df = pd.DataFrame({'letter':[random.choice(string.ascii_lowercase) for x in range(20)],
                   'number':[str(random.randint(0,22)) for x in range(20)],
                   'Count':[random.randint(20,50) for x in range(20)]})
The documentation for mne cites examples in which a square matrix of connectivity scores is used to create the chord diagram which differs from my use case.
However, it also states that a 1D array can be used for the connectivity scores if arrays of indices are also passed that correspond to the correct list of node names. I therefore assume that df.Count can be used to represent the connectivity scores?
Given my data, I can't figure out how to pass the relevant data to the node_names and indices arguments in the correct order and would appreciate some guidance please!
For reference, I have achieved a similar visualisation using the holoviews library but find the options for customisation to be lacking. Code and output for that visualisation included below as an example:
import numpy as np
import holoviews as hv
from holoviews import opts, dim
hv.extension('bokeh')
hv.output(size=350)
nodes = list(set(df['letter'].tolist() + df['number'].tolist()))
nodes = hv.Dataset(pd.DataFrame(nodes, columns=['node']))
chord = hv.Chord((df, nodes))
chord.opts(
opts.Chord(
labels = 'node', label_text_font_size='12pt',
node_color='node', node_cmap='Category20', node_size=10,
edge_color='number', edge_cmap='Category20', edge_alpha=0.9, edge_line_width=1)
)

For the record, I have found an acceptable solution to this issue.
I returned to the original data (the above was the result of groupby and count to get df.Count values) and used crosstab() to generate a dataframe containing the connectivity scores. I referred to the answer to this post for direction
I then transformed the result to an adjacency matrix using to_numpy() which could be passed to the con argument for plot_connectivity_circle().
A list of the columns from the crosstab() can then be passed to the node_names argument.
I don't have time to post a working example of my code right now but will hopefully find time later.
If anyone knowledgeable in the use of mne and plot_connectivity_circle can help answer the original question given the data in the form described in the original post, I'd be very interested to learn how it is done!
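For the edge-list form in the original question, here is a minimal sketch of how node_names, a 1D con and the matching indices could be assembled, along with the equivalent square matrix. The small df is hand-written stand-in data, and position and adjacency are illustrative names only. The final call is left commented out because it assumes mne_connectivity is installed and that indices is a tuple of two index arrays, as the documentation describes.

```python
import numpy as np
import pandas as pd

# Hand-written stand-in for the question's df (same three columns).
df = pd.DataFrame({
    "letter": ["a", "b", "a", "c"],
    "number": ["1", "1", "2", "3"],
    "Count": [30, 25, 40, 22],
})

# One shared, ordered list of node names covering both columns.
node_names = sorted(set(df["letter"]) | set(df["number"]))
position = {name: i for i, name in enumerate(node_names)}

# 1D form: con[k] is the strength between indices[0][k] and indices[1][k].
con = df["Count"].to_numpy()
indices = (
    df["letter"].map(position).to_numpy(),
    df["number"].map(position).to_numpy(),
)

# Square form: symmetric adjacency matrix over the same node order.
adjacency = np.zeros((len(node_names), len(node_names)))
np.add.at(adjacency, indices, con)
adjacency = adjacency + adjacency.T

# Assumed call (needs mne_connectivity, so it is not run here):
# from mne_connectivity.viz import plot_connectivity_circle
# plot_connectivity_circle(con, node_names, indices=indices)
```

The key point is that both index arrays refer to positions in the same node_names list, so the order of node_names only has to be consistent with the mapping used to build indices.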

Related

How can I plot only particular values in xarray?

I am using data from cdasws to plot dynamic spectra. I am following the example found here https://cdaweb.gsfc.nasa.gov/WebServices/REST/jupyter/CdasWsExample.html
This is my code, which I have modified to obtain a dynamic spectrum for STEREO.
from cdasws import CdasWs
from cdasws.datarepresentation import DataRepresentation
import matplotlib.pyplot as plt
cdas = CdasWs()
import numpy as np
datasets = cdas.get_datasets(observatoryGroup='STEREO')
for index, dataset in enumerate(datasets):
    print(dataset['Id'], dataset['Label'])
variables = cdas.get_variables('STEREO_LEVEL2_SWAVES')
for variable_1 in variables:
    print(variable_1['Name'], variable_1['LongDescription'])
data = cdas.get_data('STEREO_LEVEL2_SWAVES', ['avg_intens_ahead'],
                     '2020-07-11T02:00:00Z', '2020-07-11T03:00:00Z',
                     dataRepresentation = DataRepresentation.XARRAY)[1]
print(data)
plt.figure(figsize = (15,7))
# plt.ylim(100,1000)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.yscale('log')
# sorted_data was never defined in this snippet; plot the fetched variable directly
data.avg_intens_ahead.transpose().plot()
plt.xlabel("Time",size=18)
plt.ylabel("Frequency (kHz)",size=18)
plt.show()
Using this code gives a plot that looks something like this,
My question is, is there any way of plotting this spectrum only for a particular frequency? For example, I want to plot just the intensity values at 636 kHz. Is there any way I can do that?
Any help is greatly appreciated. I don't understand xarray, as I have never worked with it before.
Edit -
Using the command,
data_stereo.avg_intens_ahead.loc[:,625].plot()
generates a plot that looks like,
While this is useful, what I needed is;
for the dynamic spectrum, if I choose a particular frequency like 600 kHz, can it display something like this (I have just added white boxes to clarify what I mean) -
If you still want the plot to be 2D, but to include a subset of your data along one of the dimensions, you can provide an array of indices or a slice object. For example:
data_stereo.avg_intens_ahead.sel(
frequency=[625]
).plot()
Or
# include a 10% band on either side
data_stereo.avg_intens_ahead.sel(
frequency=slice(625*0.9, 625*1.1)
).plot()
Alternatively, if you would actually like your plot to show white space outside this selected area, you could mask your data with where:
data_stereo.avg_intens_ahead.where(
data_stereo.frequency==625
).plot()
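Note that sel with a plain value needs an exact match on the coordinate, so asking for 636 kHz fails if the grid only has 625 kHz. Below is a small sketch on synthetic data (the array, its coordinate values and the names da, line and masked are made up for illustration) that picks the nearest channel with method="nearest" and then reuses that value for the where mask.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for data_stereo.avg_intens_ahead: 5 times x 4 frequencies.
da = xr.DataArray(
    np.random.default_rng(0).random((5, 4)),
    coords={"time": np.arange(5),
            "frequency": [125.0, 375.0, 625.0, 875.0]},
    dims=("time", "frequency"),
)

# 1D time series at the channel closest to the requested 636 kHz.
line = da.sel(frequency=636, method="nearest")
nearest = float(line.frequency)

# Same 2D shape as the original, but NaN everywhere except that channel,
# so a 2D plot would show blank space outside it.
masked = da.where(da.frequency == nearest)
```

line.plot() would then give a line plot over time, while masked.plot() keeps the 2D layout with everything else blanked out.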

How to specify date bin ranges for Seaborn displot

Problem statement
I am creating a distribution plot of flood events per N year periods starting in 1870. I am using Pandas and Seaborn. I need help with...
specifying the date range of each bin when using sns.displot, and
clearly representing my bin size specifications along the x axis.
To clarify this problem, here is the data that I am working with, what I have tried, and a description of the desired output.
The Data
The data I am using is available from the U.S. Weather service.
import pandas as pd
import bs4
import urllib.request
link = "https://water.weather.gov/ahps2/crests.php?wfo=jan&gage=jacm6&crest_type=historic"
webpage=str(urllib.request.urlopen(link).read())
soup = bs4.BeautifulSoup(webpage)
tbl = soup.find('div', class_='water_information')
vals = tbl.get_text().split(r'\n')
tcdf = pd.Series(vals).str.extractall(r'\((?P<Rank>\d+)\)\s(?P<Stage>\d+\.\d+)\sft\son\s(?P<Date>\d{2}\/\d{2}\/\d{4})')\
    .reset_index(drop=True)
tcdf['Stage'] = tcdf.Stage.astype(float)
total_crests_events = len(tcdf)
tcdf['Rank'] = tcdf.Rank.astype(int)
tcdf['Date'] = pd.to_datetime(tcdf.Date)
What works
I am able to plot the data with Seaborn's displot, and I can manipulate the number of bins with the bins command.
The second image is closer to my desired output. However, I do not think that it's clear where the bins start and end. For example, the first two bins (reading left to right) clearly start before and end after 1880, but the precise years are not clear.
import seaborn as sns
# fig. 1: data distribution using default bin parameters
sns.displot(data=tcdf,x="Date")
# fig. 2: data distribution using 40 bins
sns.displot(data=tcdf,x="Date",bins=40)
What fails
I tried specifying date ranges using the bins input. The approach is loosely based on a previous SO thread.
my_bins = pd.date_range(start='1870',end='2025',freq='5YS')
sns.displot(data=tcdf,x="Date",bins=my_bins)
This attempt, however, produced a TypeError
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
This is a long question, so I imagine that some clarification might be necessary. Please do not hesitate to ask questions in the comments.
Thanks in advance.
Seaborn internally converts its input data to numbers so that it can do math on them, and it uses matplotlib's "unit conversion" machinery to do that. So the easiest way to pass bins that will work is to convert them with matplotlib's date converter first:
import matplotlib.dates as mdates
sns.displot(data=tcdf, x="Date", bins=mdates.date2num(my_bins))
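To check what those numeric edges do without drawing anything, here is a small sketch with made-up dates standing in for tcdf["Date"]. It converts both the edges and the dates with date2num, counts events per bin with numpy, and builds readable range labels. The names dates, edges, counts and labels are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates

# Made-up event dates standing in for tcdf["Date"].
dates = pd.to_datetime(["1871-03-01", "1874-06-15", "1876-01-20", "1883-11-02"])

# Bin edges every 5 years, built the same way as in the question.
my_bins = pd.date_range(start="1870", end="1885", freq="5YS")

# Convert edges and data to matplotlib's float date representation.
edges = mdates.date2num(my_bins)
counts, _ = np.histogram(mdates.date2num(dates), bins=edges)

# Human-readable label for each bin, e.g. for tick labels or a table.
labels = [f"{lo.year}-{hi.year}" for lo, hi in zip(my_bins[:-1], my_bins[1:])]
```

Here edges plays the role of the bins= argument in the displot call, and labels is just one possible way to make the start and end of each bin explicit.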

Plotly line chart_'module' object is not callable why this error

I am plotting a line chart using plotly from a pivot table, but I get the error 'module' object is not callable when I try to plot. Everything before that runs without issues. Why does this happen? Please see my code below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
import plotly.offline as pyo
import plotly.graph_objs as go
import plotly.express as px
df1 = pd.read_csv("Funct_TDH_RISE_Corners_2p0_A.txt",delim_whitespace=True)
df1.to_csv('TDH.csv',index=False)
df1['SPEC_MIN']=5
df1['SPEC_MAX']=30
df1
TDH_PVT= pd.pivot_table(df1, index = ['Device_ID'],values = ['TDH_Rise[ns]'])
#When I try to run the below code error is coming
data=[go.Scatter(
    x=TDH_PVT.index,
    y=TDH_PVT.values,
    mode='lines',
    name='TDH_RISE'
)]
layout=go.layout(title='TDH RISE')
figure=go.Figure(data=data,layout=layout)
pyo.plot(figure)
My pivot table is given below
TDH_Rise[ns]
Device_ID
FF_2649 19.228333
FF_2650 19.499167
FF_2651 19.365000
FS_2859 20.425000
FS_2860 20.252500
FS_2861 20.557500
SF_2754 21.700000
SF_2755 21.743333
SF_2756 21.528000
SS_2544 21.678333
SS_2545 21.642500
SS_2546 21.655000
TT_2439 20.730000
TT_2440 20.688333
TT_2441 18.642500
The actual plotly class is Layout, with a capital L. Python is case sensitive, and go.layout (lowercase) is a submodule rather than a class, so calling it raises the 'module' object is not callable error. You just need to change the line
layout=go.layout(title='TDH RISE')
to
layout=go.Layout(title='TDH RISE')
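The same error can be reproduced with the standard library alone, which may make the cause easier to see. This is only an analogy: datetime is used here because it also has a module and a class that differ by how they are reached, and it does not involve plotly at all.

```python
import datetime

# Calling the module itself fails, just like go.layout(...).
try:
    datetime(2020, 1, 1)
except TypeError as exc:
    message = str(exc)

# Calling the class inside the module works, just like go.Layout(...).
fixed = datetime.datetime(2020, 1, 1)

print(message)
print(callable(datetime), callable(datetime.datetime))
```

A quick callable(...) check like the last line is a handy way to tell whether a name refers to a module or to something you can actually call.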
bootstrap upper lower floor ceilings and alter per nodes var.b
check the syntax on the whitespace, add s.
I would use '"idx"', "make_union", assists by grouping two dfs, (df1,df2).
performing more granular level lims' xor subs.
use abr, aprx, idx, best_fit, ***
triangulation patterns(Fourier)-h / dequeue.linked(priority) | chain.attrs.map.map(Map) for meta.
***, knn, euclidean
increase samples from min_samples_name=int,
underfit the model, allow auto.py.gui for mapping procedures for new win32 pop up/ interactive.
"==" for bottom two shapes/fig \
use dot or Rsq in congruency with the triangulation (Euclidean /
Manhattan) call to calls(recursive), if node path leaves initiative.
memory=None
warm_start=T/F for usage calls.
have you tried scoring mechanisms? scoring='accuracy'
n_jobs=-1
Gini criterion
for the pivot, use null xor start for later time under a set | cond.
I had to perform a very similar process, and my approach was much different, as it was population, density, foot traffic and density based with erroneous factors, bases, subs, triangulation of est, rsq, differential, partial differentiation,(pred - actuals) and data points were all able to be referenced back. to initial call.
a great though advanced, "pyautogui" will perform get [actions;] [].collect
|N|_n=biaserrweights(w, x, rec)_in.dot(xwa, xwb, xwc)*dist(euc(x)):
err.% for 'i' in weighted(i, j)range,
else:
J =0,
return J-jw(x**2i)/n(map)
J = future distance, add *args based on conditions as in a filter.

python scipy 3D interpolation / look up table

I have data produced from Comsol which I would like to use as a look up table in a Python / Scipy program I am building. The output from comsol looks like B(ri,thick,L) and will contain approximately 20,000 entries. An example of the output is shown below for a reduced 3x3x3 version.
While I have found many good solutions for 3D interpolation using e.g. RegularGridInterpolator (first link below), I am still looking for a solution using the lookup table style. The second link below seems close, however I am unsure how the method interpolates over all three dimensions.
I am having a hard time believing that a lookup table requires such an elaborate implementation, so any suggestions are most appreciated!
COMSOL data example
interpolate 3D volume with numpy and or scipy
Interpolating data from a look up table
I was able to figure this out and wanted to pass on my solution to the next person. I found that merely averaging the two closest points found via a cKDTree yielded errors as large as 10%.
Instead, I used the cKDTree to find the appropriate entry in the scattered lookup table / data file and assign it to the correct entry of a 3D numpy array (you can save this numpy array to file if you like). Then I use RegularGridInterpolator on this array. Errors were on the order of 0.5 percent, which was an order of magnitude better than averaging the cKDTree neighbours.
import numpy as np
from scipy.spatial import cKDTree
from scipy.interpolate import RegularGridInterpolator
l_data = np.linspace(.125,0.5,16)# np.linspace(0.01,0.1,10) #Range for "short L"
ri_data = np.linspace(0.005,0.075,29)
thick_data = np.linspace(0.0025,0.1225,25)
#xyz data with known bounds above
F = np.zeros((np.size(l_data),np.size(ri_data),np.size(thick_data)))
LUT = np.genfromtxt('a_data_file.csv', delimiter = ',')
F_val = LUT[:, 3]
tree_small_l = cKDTree(LUT[:, :3]) #xyz coords
for ri_iter in np.arange(np.size(ri_data)):
    for thick_iter in np.arange(np.size(thick_data)):
        for l_iter in np.arange(np.size(l_data)):
            dist,ind = tree_small_l.query(((l_data[l_iter],ri_data[ri_iter],thick_data[thick_iter])))
            F[l_iter,ri_iter,thick_iter] = F_val[ind].T
interp_F_func = RegularGridInterpolator((l_data, ri_data, thick_data), F)
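For anyone who wants to try the idea without a data file, here is a self-contained sketch of the same approach on a tiny synthetic table. The axes, the linear field function and the shuffled lut array are all made up for illustration. A linear field is used on purpose, because trilinear interpolation reproduces it exactly, which makes the result easy to check. The triple loop is replaced by a single vectorised tree.query call.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.interpolate import RegularGridInterpolator

# Hypothetical grid axes (stand-ins for l_data, ri_data, thick_data).
x = np.linspace(0.0, 1.0, 3)
y = np.linspace(0.0, 2.0, 3)
z = np.linspace(0.0, 3.0, 3)

def field(a, b, c):
    # Linear test function, so linear interpolation should be exact.
    return a + 2 * b + 3 * c

# Build a lookup table with columns x, y, z, value, then shuffle the rows
# to mimic an exported file whose row order is unknown.
gx, gy, gz = np.meshgrid(x, y, z, indexing="ij")
grid_points = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
rows = np.column_stack([grid_points, field(gx, gy, gz).ravel()])
lut = rows[np.random.default_rng(0).permutation(len(rows))]

# Find the table row nearest to every grid point in one call.
tree = cKDTree(lut[:, :3])
_, nearest = tree.query(grid_points)
values = lut[nearest, 3].reshape(gx.shape)

# Interpolate on the rebuilt regular grid.
interp = RegularGridInterpolator((x, y, z), values)
result = interp([[0.25, 0.5, 1.0]])[0]
print(result)
```

If the table really does sit on a regular grid, the nearest-neighbour step is only being used to put the rows back in grid order; the actual interpolation is done by RegularGridInterpolator.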

Plotting trajectories in python using matplotlib

I'm having some trouble using matplotlib to plot the path of something.
Here's a basic version of the type of thing I'm doing.
Essentially, I'm seeing if the value breaks a certain threshold (6 in this case) at any point during the path and then doing something with it later on.
Now, I have 3 lists set-up. The end_vector will be based on the other two lists. If the value breaks past 2 any time during a single simulation, I will add the last position of the object to my end_vector
trajectories_vect is something I want to keep track of my trajectories for all 5 simulations, by keeping a list of lists. I'll clarify this below. And, timestep_vect stores the path for a single simulation.
from random import gauss
from matplotlib import pyplot as plt
import numpy as np
starting_val = 5
T = 1 #1 year
delta_t = .1 #time-step
N = int(T/delta_t) #how many points on the path looked at
trials = 5 #number of simulations
#main iterative loop
end_vect = []
trajectories_vect = []
for k in range(trials):
    s_j = starting_val
    timestep_vect = []
    for j in range(N-1):
        xi = gauss(0,1.0)
        s_j *= xi
        timestep_vect.append(s_j)
    trajectories_vect.append(timestep_vect)
    if max(timestep_vect) > 5:
        end_vect.append(timestep_vect[-1])
    else:
        end_vect.append(0)
Okay, at this part if I print my trajectories, I get something like this (I only posted two simulations, instead of the full 5):
[[ -3.61689976e+00 2.85839230e+00 -1.59673115e+00 6.22743522e-01
1.95127718e-02 -1.72827152e-02 1.79295788e-02 4.26807446e-02
-4.06175288e-02] [ 4.29119818e-01 4.50321728e-01 -7.62901016e-01
-8.31124346e-02 -6.40330554e-03 1.28172906e-02 -1.91664737e-02
-8.29173982e-03 4.03917926e-03]]
This is good and what I want to happen.
Now, my problem is that I don't know how to plot my path (y-axis) against my time (x-axis) properly.
First, I want to put my data into numpy arrays because I'll need to use them later on to compute some statistics and other things which from experience numpy makes very easy.
#creating numpy arrays from list
#might need to use this with matplotlib somehow
np_trajectories = np.array(trajectories_vect)
time_array = np.arange(1,10)
Here's the crux of the issue though. When I put my trajectories (y-axis) into matplotlib, it doesn't treat each "list" (row in numpy) as one path. Instead of getting 5 paths for 5 simulations, I am getting 9 paths for 5 simulations. I believe I am passing the data in wrongly, so it is using the 9 time intervals in the wrong way.
#matplotlib stuff
plt.plot(np_trajectories)
plt.xlabel('timestep')
plt.ylabel('trajectories')
plt.show()
Here's the image produced:
Obviously, this is wrong for the aforementioned reason. Instead, I want to have 5 paths based on the 5 lists (rows) in my trajectories. I seem to understand what the problem is but don't know how to go about fixing it.
Thanks in advance for the help.
When you call np_trajectories = np.array(trajectories_vect), your list of trajectories is transformed into a 2d numpy array. The information about its dimensions is stored in np_trajectories.shape, and, in your case, is (5, 9). Therefore, when you pass np_trajectories to plt.plot(), the plotting library assumes that the y-values are stored in the first dimension, while the second dimension describes individual lines to plot.
In your case, all you need to do is to transpose your np_trajectories array. In numpy, it is as simple as
plt.plot(np_trajectories.T)
plt.xlabel('timestep')
plt.ylabel('trajectories')
plt.show()
If you want to plot the x-axis as time, instead of steps of one, you have to define your time progression as a list or an array. In numpy, you can do something like
times = np.linspace(0, T, N-1)
plt.plot(times, np_trajectories.T)
plt.xlabel('timestep')
plt.ylabel('trajectories')
plt.show()
which produces the following figure:
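If you want to confirm the column-versus-row behaviour without looking at a figure, you can count the line objects that plot returns. This is a sketch with random stand-in data of the same (5, 9) shape. The Agg backend is selected only so that it runs without a display, and the names lines_wrong and lines_right are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: 5 simulations (rows) with 9 time steps each (columns).
np_trajectories = np.random.default_rng(0).normal(size=(5, 9))
times = np.linspace(0, 1, 9)

fig, (ax_wrong, ax_right) = plt.subplots(1, 2)

# A 2D array is plotted column by column: 9 lines of 5 points each.
lines_wrong = ax_wrong.plot(np_trajectories)

# After transposing, each simulation becomes one column: 5 lines of 9 points.
lines_right = ax_right.plot(times, np_trajectories.T)

n_wrong, n_right = len(lines_wrong), len(lines_right)
print(n_wrong, n_right)
```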
