I am plotting a line chart with plotly from a pivot table, but I get the error 'module' object is not callable when I try to plot. Everything before that step runs without issues. Why does this happen? Please see my code below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
import plotly.offline as pyo
import plotly.graph_objs as go
import plotly.express as px
df1 = pd.read_csv("Funct_TDH_RISE_Corners_2p0_A.txt",delim_whitespace=True)
df1.to_csv('TDH.csv',index=False)
df1['SPEC_MIN']=5
df1['SPEC_MAX']=30
df1
TDH_PVT= pd.pivot_table(df1, index = ['Device_ID'],values = ['TDH_Rise[ns]'])
# The error is raised when I run the code below
data=[go.Scatter(
x=TDH_PVT.index,
y=TDH_PVT.values,
mode='lines',
name='TDH_RISE'
)]
layout=go.layout(title='TDH RISE')
figure=go.Figure(data=data,layout=layout)
pyo.plot(figure)
My pivot table is given below
           TDH_Rise[ns]
Device_ID
FF_2649       19.228333
FF_2650       19.499167
FF_2651       19.365000
FS_2859       20.425000
FS_2860       20.252500
FS_2861       20.557500
SF_2754       21.700000
SF_2755       21.743333
SF_2756       21.528000
SS_2544       21.678333
SS_2545       21.642500
SS_2546       21.655000
TT_2439       20.730000
TT_2440       20.688333
TT_2441       18.642500
The plotly class you want is go.Layout, with a capital L. Python is case-sensitive, and the lowercase go.layout is a module rather than a class, so calling it raises the 'module' object is not callable exception. You just need to change the line
layout=go.layout(title='TDH RISE')
to
layout=go.Layout(title='TDH RISE')
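To see where the message comes from, here is a minimal sketch that reproduces the same kind of error with the standard library only. It uses email.message (a module) and email.message.Message (a class) as stand-ins for go.layout and go.Layout. The try_call helper is hypothetical and exists only for this illustration.

```python
import email.message


def try_call(obj):
    """Call obj with no arguments and report what happened."""
    try:
        obj()
        return "ok"
    except TypeError as exc:
        return str(exc)


# Lowercase name: a module, so calling it fails
module_result = try_call(email.message)
# Capitalised name: a class, so calling it builds an object
class_result = try_call(email.message.Message)

print(module_result)
print(class_result)
```

The same distinction applies in plotly: the lowercase name is a namespace that holds layout-related classes, while the capitalised name is the class you instantiate.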
bootstrap upper lower floor ceilings and alter per nodes var.b
check the syntax on the whitespace, add s.
I would use '"idx"', "make_union", assists by grouping two dfs, (df1,df2).
performing more granular level lims' xor subs.
use abr, aprx, idx, best_fit, ***
triangulation patterns(Fourier)-h / dequeue.linked(priority) | chain.attrs.map.map(Map) for meta.
***, knn, euclidean
increase samples from min_samples_name=int,
underfit the model, allow auto.py.gui for mapping procedures for new win32 pop up/ interactive.
"==" for bottom two shapes/fig \
use dot or Rsq in congruency with the triangulation (Euclidean /
Manhattan) call to calls(recursive), if node path leaves initiative.
memory=None
warm_start=T/F for usage calls.
have you tried scoring mechanisms? scoring='accuracy'
n_jobs=-1
Gini criterion
for the pivot, use null xor start for later time under a set | cond.
I had to perform a very similar process, and my approach was much different, as it was population, density, foot traffic and density based with erroneous factors, bases, subs, triangulation of est, rsq, differential, partial differentiation,(pred - actuals) and data points were all able to be referenced back. to initial call.
a great though advanced, "pyautogui" will perform get [actions;] [].collect
|N|_n=biaserrweights(w, x, rec)_in.dot(xwa, xwb, xwc)*dist(euc(x)):
err.% for 'i' in weighted(i, j)range,
else:
J =0,
return J-jw(x**2i)/n(map)
J = future distance, add *args based on conditions as in a filter.
I am attempting to create a chord diagram using plot_connectivity_circle from mne_connectivity.viz library.
My data is similar to the following, where letter and number represent separate nodes and count represents the number of connections between those nodes:
import random
import string
import pandas as pd
random.seed(10)
df = pd.DataFrame({'letter':[random.choice(string.ascii_lowercase) for x in range(20)],
                   'number':[str(random.randint(0,22)) for x in range(20)],
                   'Count':[random.randint(20,50) for x in range(20)]})
The documentation for mne cites examples in which a square matrix of connectivity scores is used to create the chord diagram which differs from my use case.
However, it also states that a 1D array can be used for the connectivity scores if arrays of indices are also passed that correspond to the correct list of node names. Therefore, I assume that df.Count can be used to represent the connectivity scores?
Given my data, I can't figure how to pass the relevant data to the node_names and indices arguments in the correct order and would appreciate some guidance please!
For reference, I have achieved a similar visualisation using the holoviews library but find the options for customisation to be lacking. Code and output for that visualisation included below as an example:
import numpy as np
import holoviews as hv
from holoviews import opts, dim
hv.extension('bokeh')
hv.output(size=350)
nodes = list(set(df['letter'].tolist() + df['number'].tolist()))
nodes = hv.Dataset(pd.DataFrame(nodes, columns=['node']))
chord = hv.Chord((df, nodes))
chord.opts(
opts.Chord(
labels = 'node', label_text_font_size='12pt',
node_color='node', node_cmap='Category20', node_size=10,
edge_color='number', edge_cmap='Category20', edge_alpha=0.9, edge_line_width=1)
)
For the record, I have found an acceptable solution to this issue.
I returned to the original data (the above was the result of a groupby and count to get the df.Count values) and used crosstab() to generate a dataframe containing the connectivity scores. I referred to the answer to this post for direction.
I then transformed the result to an adjacency matrix using to_numpy() which could be passed to the con argument for plot_connectivity_circle().
A list of the columns from the crosstab() can then be passed to the node_names argument.
I don't have time to post a working example of my code right now but will hopefully find time later.
If anyone knowledgeable in the use of mne and plot_connectivity_circle can help answer the original question, given the data in the form described in the original post, I'd be very interested to learn how it is done!
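For the 1D form asked about in the original question, something along these lines may work. This is an untested sketch, not a confirmed solution: it assumes that the indices argument takes a tuple of two integer arrays giving, for each entry of con, the positions of the two connected nodes in node_names. The small df below is hypothetical and only mirrors the shape of the data in the question.

```python
import pandas as pd

# Hypothetical edge list with the same columns as the question's df
df = pd.DataFrame({
    "letter": ["a", "b", "a"],
    "number": ["1", "1", "2"],
    "Count": [30, 25, 41],
})

# One flat list of node names covering both columns
node_names = sorted(set(df["letter"]) | set(df["number"]))

# Map each node name to its position in node_names
position = {name: i for i, name in enumerate(node_names)}

# Two index arrays: one entry per row of df, pointing into node_names
indices = (
    df["letter"].map(position).to_numpy(),
    df["number"].map(position).to_numpy(),
)

# 1D connectivity scores, aligned with the index arrays
con = df["Count"].to_numpy(dtype=float)

# With mne_connectivity installed, the call would then look like:
# from mne_connectivity.viz import plot_connectivity_circle
# plot_connectivity_circle(con, node_names, indices=indices)
```

The key point is that con, indices[0] and indices[1] all have one entry per connection, in the same order.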
Problem statement
I am creating a distribution plot of flood events per N year periods starting in 1870. I am using Pandas and Seaborn. I need help with...
specifying the date range of each bin when using sns.displot, and
clearly representing my bin size specifications along the x axis.
To clarify this problem, here is the data that I am working with, what I have tried, and a description of the desired output.
The Data
The data I am using is available from the U.S. National Weather Service.
import pandas as pd
import bs4
import urllib.request
link = "https://water.weather.gov/ahps2/crests.php?wfo=jan&gage=jacm6&crest_type=historic"
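# str() on the raw bytes keeps each newline as the two literal characters "\n",
# which is why the text is split on the raw string r'\n' below
webpage=str(urllib.request.urlopen(link).read())
soup = bs4.BeautifulSoup(webpage, 'html.parser')
tbl = soup.find('div', class_='water_information')
vals = tbl.get_text().split(r'\n')
# Extract rank, stage (ft) and date from lines like "(1) 43.28 ft on 04/17/1979"
tcdf = pd.Series(vals).str.extractall(r'\((?P<Rank>\d+)\)\s(?P<Stage>\d+\.\d+)\sft\son\s(?P<Date>\d{2}\/\d{2}\/\d{4})')\
    .reset_index(drop=True)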
tcdf['Stage'] = tcdf.Stage.astype(float)
total_crests_events = len(tcdf)
tcdf['Rank'] = tcdf.Rank.astype(int)
tcdf['Date'] = pd.to_datetime(tcdf.Date)
What works
I am able to plot the data with Seaborn's displot, and I can change the number of bins with the bins parameter.
The second image is closer to my desired output. However, I do not think that it's clear where the bins start and end. For example, the first two bins (reading left to right) clearly start before and end after 1880, but the precise years are not clear.
import seaborn as sns
# fig. 1: data distribution using default bin parameters
sns.displot(data=tcdf,x="Date")
# fig. 2: data distribution using 40 bins
sns.displot(data=tcdf,x="Date",bins=40)
What fails
I tried specifying date ranges using the bins input. The approach is loosely based on a previous SO thread.
my_bins = pd.date_range(start='1870',end='2025',freq='5YS')
sns.displot(data=tcdf,x="Date",bins=my_bins)
This attempt, however, produced a TypeError
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
This is a long question, so I imagine that some clarification might be necessary. Please do not hesitate to ask questions in the comments.
Thanks in advance.
Seaborn internally converts its input data to numbers so that it can do math on them, and it uses matplotlib's "unit conversion" machinery to do that. So the easiest way to pass bins that will work is to convert them with matplotlib's date converter:
import matplotlib.dates as mdates
sns.displot(data=tcdf, x="Date", bins=mdates.date2num(my_bins))
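As a rough check of what the converted bins do, the sketch below counts a handful of dates per 5-year bin with NumPy instead of plotting them. The dates are made up for illustration and stand in for tcdf["Date"]; only the bin construction follows the question.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates

# Hypothetical crest dates standing in for tcdf["Date"]
dates = pd.to_datetime(
    ["1871-03-01", "1874-12-31", "1880-06-15", "1979-04-17", "2020-02-17"]
)

# Bin edges at the start of every fifth year, as in the question
my_bins = pd.date_range(start="1870", end="2025", freq="5YS")

# Convert both the edges and the data to matplotlib's float date numbers
edges = mdates.date2num(my_bins)
values = mdates.date2num(dates)

# Count how many dates fall between consecutive edges
counts, _ = np.histogram(values, bins=edges)

# Print only the non-empty bins with their start and end years
for left, right, n in zip(my_bins[:-1], my_bins[1:], counts):
    if n:
        print(f"{left.year}-{right.year}: {n}")
```

Because the edges are explicit, each bar covers a known range of years, and the same my_bins values can be reused as tick positions if the bin boundaries should be visible on the x axis.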
I used statannot to perform a statistical test on some basic data, but the results from the statistical test don't seem correct. I.e. a couple of my comparisons come up with "P_val=0.000e+00 U_stat=0.000e+00", which I think should not be possible. Is there something wrong with my data frame and/or code?
Here is the data frame I am using:
and here is my code:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from statannot import add_stat_annotation
import scipy.stats as sp
data = pd.read_excel('Z:/DMF/GROUPS/gr_Veening/Users/Vik/scRNA-seq/FACSAria/Adherence-invasion assays/adherence_invasion_assay_a549-RFP 4-6-21.xlsx',sheet_name="Sheet2", header = 0)
sns.set_theme(style="darkgrid")
ax1 = sns.boxplot(x="Strain", y="adherence_counts", data=data)
x = "Strain"
y = "adherence_counts"
order = ["D39", "D39 Δcps", "19F", "19F ΔcomCDE"]
ax1 = sns.boxplot(data=data, x=x, y=y, order=order)
plt.title("Adherence Assay")
plt.ylabel('CFU/ml')
plt.xlabel('')
ax1.set(xticklabels=["D39", r"D39 Δ$\it{cps}$", "19F", r"19F Δ$\it{comCDE}$"])
add_stat_annotation(ax1, data=data, x=x, y=y, order=order,
box_pairs=[("D39", "19F"), ("D39", "D39 Δcps"), ("D39 Δcps", "19F"), ("19F", "19F ΔcomCDE")],
test='Mann-Whitney', text_format='star', loc='inside', verbose=2)
Finally, here are the results from this statistical test:
D39 v.s. D39 Δcps: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction, P_val=0.000e+00 U_stat=0.000e+00
D39 Δcps v.s. 19F: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction, P_val=1.000e+00 U_stat=2.000e+00
19F v.s. 19F ΔcomCDE: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction, P_val=7.617e-01 U_stat=8.000e+00
D39 v.s. 19F: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction, P_val=0.000e+00 U_stat=0.000e+00
C:\Users\Vik\anaconda3\lib\site-packages\scipy\stats\stats.py:7171: RuntimeWarning: divide by zero encountered in double_scalars
z = (bigu - meanrank) / sd
Any help would be greatly appreciated, thanks!
Your problems come from two parts:
Statistically, in some of your comparisons (such as "D39" vs "19F"), every value in one group is larger than every value in the other, hence the U statistic of 0 and the extreme p-value. Such results are entirely possible. They follow from the test examining only the ranks of the values provided, which has advantages and limitations. (Also, the Mann-Whitney test is not well suited to such small sample sizes, especially with scipy assuming equivariance.)
Now, the line z = (bigu - meanrank) / sd failing means that sd = np.sqrt(T * n1 * n2 * (n1+n2+1) / 12.0) is 0, so in this case n1 and/or n2 is 0 (these are len(x) and len(y); see the scipy source). So:
There is a bug in statannot, because this can happen silently if order and box_pairs both refer to a series that does not exist in the dataframe. I'll correct this in statannotations. Thank you, then.
However, I cannot reproduce your Warning with a copy of your dataframe.
If this were the only bug, you should see a missing box in your plot at the point you showed us.
If not, is it possible you updated some of the code but did not copy the last output here? Otherwise, there may be something more to uncover, please let us know.
EDIT: As discovered in the discussion, the second problem can happen in statannot if there is a mismatch between a label in order, box_pairs and in the dataset. This has been patched in statannotations, a fork of statannot.
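To illustrate the first point, here is a small pure-Python sketch of the U statistic and an exact permutation p-value for two fully separated groups. The values are hypothetical, and this is not scipy's or statannot's implementation, just the textbook definition applied to tiny samples.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count pairs where a value in x exceeds a value in y (ties count 0.5)."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value obtained by enumerating every regrouping."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    observed = abs(u_statistic(x, y) - mean_u)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count regroupings at least as extreme as the observed one
        hits += abs(u_statistic(gx, gy) - mean_u) >= observed
    return hits / total


# Hypothetical counts: every value in `low` is below every value in `high`
low = [1.0, 2.0, 3.0]
high = [10.0, 11.0, 12.0]

u = u_statistic(low, high)
p = exact_two_sided_p(low, high)
print(u, p)
```

With three values per group, complete separation gives U = 0, but the exact two-sided p-value cannot go below 2/20 = 0.1, which is one way to see why such small samples limit what the test can show.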
Seaborn is great for creating faceted plots based on a categorical variable encoding the class of each facet. However, this assumes your categories are mutually exclusive. Is it possible to create a Seaborn FacetGrid (or similar) based on a set of indicator variables?
As a concrete example, think about comparing patients that are infected with one or more viruses, and plotting an attribute of interest by virus. It's possible that a patient carries more than one virus, so creating a single virus column to build a grid on is not possible. You can, however, create a set of indicator variables (one for each virus) that flags the virus for each patient. There does not seem to be a way of passing a set of indicator variables to any of the Seaborn functions to do this.
I can't imagine I'm the first person to come across this scenario, so I'm hoping there are suggestions for how to do this without coding it by hand in Matplotlib.
I don't see how to do it with FacetGrid, possibly because this isn't facetting the data, since a data-record might appear several times or only once in the plot. One of the standard tricks with a set of bitfields is to read them as binary, so you see each combination of the bits. That's unambiguous but gets messy:
import pandas as pd
import seaborn as sns
from numpy.random import random, randint
from numpy import concatenate
import matplotlib.pyplot as plt
# Dummy data: four 0/1 columns (Species, v1, v2, v3) and two random coordinates
vdata = pd.DataFrame(concatenate((randint(2, size=(32,4)), random(size=(32,2))), axis=1))
vdata.columns=['Species','v1','v2','v3','x','y']
# Making a binary number out of the "virusX?" fields
vdata['binary_v'] = vdata.v1 + vdata.v2*2 + vdata.v3*4
# Plotting group membership by row
#g = sns.FacetGrid(vdata, col="Species", row='binary_v')
#g.map(plt.scatter, "x", "y")
#g.add_legend()
#plt.savefig('multiple_facet_binary_row') # Unreadably big.
h = sns.FacetGrid(vdata, col="Species", hue="binary_v")
h.map(plt.scatter, "x","y")
h.add_legend()
plt.savefig('multiple_facet_binary_hue')
If you have too many indicators to deal with the combinatorial explosion, explicitly making the new subsets works:
# Pull out one subset per indicator; a record appears once for each virus it carries
bdata = vdata[vdata.v1 + vdata.v2 + vdata.v3 == 0.].copy()
assert len(bdata) > 0  # make sure at least one record carries no virus
bdata['Virus'] = 'none'
for i in ['v1','v2','v3']:
    on = vdata[vdata[i]==1.].copy()
    on['Virus'] = i
    bdata = pd.concat((bdata, on))  # DataFrame.append was removed in pandas 2.0
j = sns.FacetGrid(bdata, col='Species', row='Virus')
j.map(plt.scatter, 'x', 'y')
j.add_legend()
j.savefig('multiple_facet_refish')
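The same "one row per record per virus" table can also be built in one step by reshaping to long form with melt, which may be easier to maintain when there are many indicators. This is a sketch under assumptions: the patients frame and its column names are hypothetical, and records with no virus are simply dropped rather than kept in a 'none' group.

```python
import pandas as pd

# Hypothetical patients: v1..v3 are 0/1 indicator columns
patients = pd.DataFrame({
    "patient": ["a", "b", "c", "d"],
    "v1": [1, 0, 1, 0],
    "v2": [0, 1, 1, 0],
    "v3": [0, 0, 1, 0],
    "x": [0.1, 0.4, 0.7, 0.9],
})

# One row per (patient, virus) pair, with the indicator value in "present"
long = patients.melt(
    id_vars=["patient", "x"],
    value_vars=["v1", "v2", "v3"],
    var_name="Virus",
    value_name="present",
)

# Keep only the pairs where the patient actually carries the virus
long = long[long["present"] == 1].drop(columns="present")
print(long)
```

The resulting frame has a single categorical Virus column, so it can be passed to sns.FacetGrid(long, col="Virus") like any other tidy dataset. A patient with several viruses then shows up in several facets.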
I want to draw a box plot for a variable in a data frame, xldata['yaxis_data'], grouped according to a 1/0 mapping stored in another array (one_zero_map).
I have working code for this, but I am not sure it is the best way. Any help would be great.
The reason I am unsure is that I would expect a direct way for boxplot to understand what I want if I pass one_zero_map and xldata['yaxis_data'] directly, without creating good_ones and bad_ones and then putting them in a list called final_list.
%matplotlib inline
import matplotlib.pyplot as plt
good_ones=[val for ind, val in zip(one_zero_map,xldata['yaxis_data']) if ind==1]
bad_ones=[val for ind, val in zip(one_zero_map,xldata['yaxis_data']) if ind==0]
final_list=[good_ones,bad_ones]
plt.boxplot(final_list)
To be clearer about what I am looking for: I want the Python equivalent of this R code
# Boxplot of MPG by Car Cylinders
boxplot(mpg~cyl,data=mtcars, main="Car Milage Data",
xlab="Number of Cylinders", ylab="Miles Per Gallon")
or the Python equivalent of this graphlab call
sales.show(view='BoxWhisker Plot',x='zipcode',y='price')
You can use the boxplot method directly from pandas DataFrames. This code is equivalent to your R example:
# statsmodels only needed to get the R mtcars dataset
import statsmodels.api as sm
mtcars = sm.datasets.get_rdataset('mtcars').data
mtcars.boxplot('mpg', by='cyl')
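Applied to the data in the question, the same idea might look like the sketch below. The values for xldata and one_zero_map are hypothetical, and the column name flag is made up for the example. The only assumption is that one_zero_map has one entry per row of xldata.

```python
import pandas as pd

# Hypothetical stand-ins for xldata['yaxis_data'] and one_zero_map
xldata = pd.DataFrame({"yaxis_data": [2.0, 3.5, 1.0, 4.0, 2.5]})
one_zero_map = [1, 0, 1, 1, 0]

# Attach the 0/1 map as a column so pandas can group on it
labelled = xldata.assign(flag=one_zero_map)

# Same split as good_ones / bad_ones, without the list comprehensions
grouped = labelled.groupby("flag")["yaxis_data"].apply(list)
print(grouped)

# With matplotlib installed, this draws one box per flag value:
# labelled.boxplot(column="yaxis_data", by="flag")
```

Once the map is a column, the by argument does the grouping that final_list did by hand.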