Frequency vs total count bar plot in Python

I have a dataframe with the following structure:
,mphA,gyrA,parC,tet59,qnrVC
sample1,TRUE,FALSE,FALSE,FALSE,FALSE
sample2,TRUE,FALSE,FALSE,FALSE,TRUE
sample3,FALSE,FALSE,FALSE,TRUE,FALSE
sample4,FALSE,FALSE,FALSE,TRUE,TRUE
sample5,TRUE,FALSE,TRUE,FALSE,TRUE
sample6,TRUE,TRUE,FALSE,FALSE,FALSE
sample7,TRUE,TRUE,TRUE,FALSE,TRUE
sample8,TRUE,TRUE,TRUE,TRUE,TRUE
sample9,FALSE,TRUE,TRUE,FALSE,TRUE
sample10,TRUE,TRUE,FALSE,FALSE,TRUE
I need to generate a frequency vs total count bar plot in Python, similar to the following figure. It's a combination of three plots, so I guess they have to be plotted independently and placed on a single canvas. I frequently see this plot in journals, so I assume it has already been implemented somewhere. However, I did not have any success searching online. Does anybody know how it can be done? Thanks.

This kind of figure is known as an UpSet plot. It can be done easily using the UpSetPlot package:
https://pypi.org/project/UpSetPlot/
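
A minimal sketch of how this might look for the table in the question. The pandas part computes the numbers the figure displays (per-gene totals and the size of each gene combination). The `plot_upset` helper is a hypothetical name introduced here, and it assumes an UpSetPlot release that provides `from_indicators` (0.6 or later, as far as I know).

```python
import io

import pandas as pd

csv_text = """,mphA,gyrA,parC,tet59,qnrVC
sample1,TRUE,FALSE,FALSE,FALSE,FALSE
sample2,TRUE,FALSE,FALSE,FALSE,TRUE
sample3,FALSE,FALSE,FALSE,TRUE,FALSE
sample4,FALSE,FALSE,FALSE,TRUE,TRUE
sample5,TRUE,FALSE,TRUE,FALSE,TRUE
sample6,TRUE,TRUE,FALSE,FALSE,FALSE
sample7,TRUE,TRUE,TRUE,FALSE,TRUE
sample8,TRUE,TRUE,TRUE,TRUE,TRUE
sample9,FALSE,TRUE,TRUE,FALSE,TRUE
sample10,TRUE,TRUE,FALSE,FALSE,TRUE
"""

# Read everything as text, then compare against "TRUE" to get real booleans.
raw = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype=str)
df = raw.eq("TRUE")

# Total count per gene (the horizontal bars of an UpSet plot).
gene_totals = df.sum().sort_values(ascending=False)

# Number of samples per exact gene combination (the vertical bars).
combo_counts = df.groupby(list(df.columns)).size()

print(gene_totals)
print(combo_counts)


def plot_upset(indicators):
    """Draw the UpSet plot (hypothetical helper; needs upsetplot installed)."""
    import matplotlib.pyplot as plt
    from upsetplot import UpSet, from_indicators

    # Assumption: every column of `indicators` is a boolean membership flag.
    upset_data = from_indicators(list(indicators.columns), data=indicators)
    UpSet(upset_data, subset_size="count", show_counts=True).plot()
    plt.show()
```

Calling `plot_upset(df)` should then draw the combined figure: the membership matrix, the intersection sizes above it, and the per-gene totals beside it.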

Related

Identifying Plot Name or Visualization Implementation

I'm working on a dataset of SMS records [datetime_entry, sms_sent] and I want to copy a really effective trend visual from a well-cited electricity demand study. Does anyone know the name of this plot, or of an implementation of something similar in Python (I'm not sure the original was done in Python)?
I know how to subplot the 4 charts after splitting the data by quarter, I'm just stumped on the plot type and stylization.
This is what matplotlib calls an eventplot.
Essentially, each vertical line represents an occurrence of a MWh demand value during that specific hour, so each row in the plot should have as many vertical lines as there are days in that quarter.
While this works for these data, relying on the combination of alpha level and data density can be unreliable when the data change, because the number of overlapping points is not readily visible. You can create a similar visualization using hist2d, where you specify the bins manually.
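
A rough sketch of both approaches side by side. The demand values are synthetic (one row per hour, one value per day of a 90-day quarter), and the bin edges are arbitrary choices made for illustration, not values from the study.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_hours, n_days = 24, 90
hours = np.arange(n_hours)

# Synthetic demand: a daily cycle plus day-to-day noise (made-up numbers).
base = 50 + 20 * np.sin((hours - 6) / 24 * 2 * np.pi)
demand = base[:, None] + rng.normal(0, 5, size=(n_hours, n_days))

fig, (ax_event, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# eventplot: one row per hour, one translucent tick per day.
collections = ax_event.eventplot(
    demand, lineoffsets=hours, linelengths=0.8, colors="k", alpha=0.3
)
ax_event.set_xlabel("demand")
ax_event.set_ylabel("hour of day")

# hist2d: same data, but overlap is counted explicitly in manual bins.
x_edges = np.linspace(demand.min(), demand.max(), 41)
y_edges = np.arange(-0.5, n_hours + 0.5, 1.0)
counts, _, _, image = ax_hist.hist2d(
    demand.ravel(), np.repeat(hours, n_days), bins=[x_edges, y_edges]
)
fig.colorbar(image, ax=ax_hist, label="days per bin")
ax_hist.set_xlabel("demand")

fig.savefig("demand_eventplot_vs_hist2d.png")
```

With hist2d the colour encodes an actual count, so the picture stays readable even if the number of days per row changes.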

Whisker plots are not clear enough to analyze data

I'm trying to analyze a set of costs using python.
The columns in the data frame are:
'TotalCharges', 'TotalPayments', 'TotalDirectVariableCost', 'TotalDirectFixedCost', 'TotalIndirectVariableCost', 'TotalIndirectFixedCost'.
When I tried to plot them using whisker plots, this is how they were displayed:
I need to properly analyze these data and understand their behavior.
The following are my questions.
Is there any way to make the whisker plots clearer?
Since these are costs, I believe we cannot discard the extreme values as outliers. Keeping the data as it is, what else can I use to represent it more clearly?
Thanks
There are a couple of things you could do:
use a larger plot area
rotate the axis
plot one axis on a log scale
That said, I think you should examine once again your understanding of what a box and whisker plot is for.
Additionally, you might consider posting this on the Math or Cross Validated site as this doesn't have much to do with code.
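
As an illustration of the larger plot area and log scale suggestions, here is a small sketch with matplotlib. The cost values are synthetic, strongly skewed numbers generated only so the example runs; the column names come from the question.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

columns = [
    "TotalCharges", "TotalPayments", "TotalDirectVariableCost",
    "TotalDirectFixedCost", "TotalIndirectVariableCost",
    "TotalIndirectFixedCost",
]

# Synthetic, strictly positive, heavy-tailed "costs" (assumption, not real data).
rng = np.random.default_rng(0)
costs = pd.DataFrame(
    {name: rng.lognormal(mean=6 + i * 0.5, sigma=1.0, size=500)
     for i, name in enumerate(columns)}
)

# A wide figure gives each box room.
fig, ax = plt.subplots(figsize=(12, 6))
result = ax.boxplot([costs[name] for name in columns])

# Rotated labels keep the long column names readable.
ax.set_xticks(range(1, len(columns) + 1))
ax.set_xticklabels(columns, rotation=45, ha="right")

# A log scale stops the extreme values from squashing the boxes.
ax.set_yscale("log")
ax.set_ylabel("cost (log scale)")

fig.tight_layout()
fig.savefig("cost_boxplots_log.png")
```

Note that a log axis only works for strictly positive values, so zero or negative costs would need separate handling.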

Is there a way to count the number of points within a certain area on a graph?

I've got output graphs that look like this:
My question is, is there an easy way for me to count the number of points within each of the obvious 'lines' or 'streaks' of particles? In other words, I need to find the density of each of the streaks separately. Most of them are overlapping which is where my issue comes in.
I've tried specifying x and y limits but again, the overlapping comes into play. The existing code is just importing and plotting values.
Ken, thanks for your comment. I went down that path and found that single linkage works best for the type of clusters I have. I also had to find a multiplying factor for my own data first, because the clustering was failing where the data overlapped. With this data the different colours represent different clusters. The dendrogram x-axis is labelled with the cluster densities, but they aren't in order! I have yet to find an efficient way around this. I manually adjusted the dendrogram to produce 2 clusters first, which told me the density of the first shell (it produced 2 clusters: 1 with the first shell and 1 with everything else). Then I repeated it for 3, 4, etc.
Sorry if none of this makes sense! It's quite late/early here.
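
For anyone following the same route, here is a minimal sketch of single-linkage clustering with SciPy that returns the number of points per cluster directly, without reading it off the dendrogram. The three streaks are synthetic and well separated, which is an assumption; overlapping streaks, as in the real data, would likely need rescaling first (the multiplying factor mentioned above).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Three synthetic horizontal streaks with different point counts.
sizes = [50, 80, 120]
offsets = [0.0, 3.0, 6.0]
streaks = []
for n, offset in zip(sizes, offsets):
    x = np.linspace(0, 10, n)
    y = offset + rng.normal(0, 0.05, n)
    streaks.append(np.column_stack([x, y]))
points = np.vstack(streaks)

# Single linkage merges clusters by their closest pair of points,
# which suits long, thin streaks.
tree = linkage(points, method="single")

# Cut the tree so that at most 3 clusters remain.
labels = fcluster(tree, t=3, criterion="maxclust")

# Cluster labels start at 1, so drop the unused 0 slot.
counts = np.bincount(labels)[1:]
print(counts)
```

Cutting the tree once with the desired number of clusters avoids repeating the dendrogram adjustment for 2, 3, 4 clusters and so on.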

Python graph with three entities and legends

I have three lists in Python. The lists are given below:
Server_name=['server_1','server_1','server_1', 'server_1','server_2', 'server_2','server_2','server_2']
Month_name=['may','may','june','aug','may','june','july','sept']
Error_count=[10,20,10,30,40,10,20,50]
I want to plot a graph something like the one below.
The diagram shows the total error count for each server and its corresponding months.
I have tried different approaches but was unable to get the right graph, with legends and the total count for all three entities.
How should I build my code so that I get the graph above? Please suggest.
Appreciate your help.
Try using the vincent module. It is used to produce these types of graphs:
https://github.com/wrobstory/vincent
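
If installing vincent is not an option, a grouped bar chart with pandas and matplotlib is one alternative. This is a sketch of one plausible reading of the desired graph (months on the x-axis, one bar per server, errors summed per server and month), not the vincent approach itself. The month order is an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

Server_name = ['server_1', 'server_1', 'server_1', 'server_1',
               'server_2', 'server_2', 'server_2', 'server_2']
Month_name = ['may', 'may', 'june', 'aug', 'may', 'june', 'july', 'sept']
Error_count = [10, 20, 10, 30, 40, 10, 20, 50]

df = pd.DataFrame({
    "Server_name": Server_name,
    "Month_name": Month_name,
    "Error_count": Error_count,
})

# Sum the errors per (month, server), then put servers in columns.
totals = (
    df.groupby(["Month_name", "Server_name"])["Error_count"]
    .sum()
    .unstack(fill_value=0)
)

# Put the months in calendar order instead of alphabetical order.
month_order = ["may", "june", "july", "aug", "sept"]
totals = totals.reindex(month_order)

# One group of bars per month, one bar (and legend entry) per server.
ax = totals.plot(kind="bar", figsize=(8, 4))
ax.set_xlabel("Month")
ax.set_ylabel("Total error count")
ax.legend(title="Server")

plt.tight_layout()
plt.savefig("errors_by_month_and_server.png")
```

The two 'may' entries for server_1 are added together, so that bar shows 30.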

Plotting graphs with error ribbons in python

For a while I've been trying to come up with a good way to graphically represent a data series along with its estimated error.
Recently I saw some graphs where the data was plotted as a line, with a background 'ribbon' filling the area between the lines plotting data +/- sigma.
Is there a name for this type of graph, and is there any python toolkit which has the capability to make such plots?
A simple way to fake it with matplotlib would also be useful - right now I'm just plotting three lines, but I don't know how to fill the area between them.
I would use the fill_between method. Look at the Our Favorite Recipes section of the manual for matplotlib for some good examples. They have one that looks like this:
and another that looks like this:
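
A minimal sketch of the fill_between approach, using a made-up sine series and a made-up sigma that grows with x:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data series and error estimate (illustrative values only).
x = np.linspace(0, 10, 200)
y = np.sin(x)
sigma = 0.1 + 0.05 * x

fig, ax = plt.subplots()

# The data itself as a line.
ax.plot(x, y, label="data")

# The ribbon: shade everything between y - sigma and y + sigma.
band = ax.fill_between(x, y - sigma, y + sigma, alpha=0.3, label="+/- 1 sigma")

ax.legend()
fig.savefig("error_ribbon.png")
```

The `alpha` value keeps the ribbon translucent so the line stays visible on top of it.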
