Identifying Plot Name or Visualization Implementation - python

I'm working on a dataset of SMS records [datetime_entry, sms_sent] and I'd like to copy a really effective trend visual from a well-cited electricity demand study. Does anyone know the name of this plot, or how to implement something similar in Python? (I'm not sure the original was done in Python.)
I know how to subplot the 4 charts after splitting the data by quarter; I'm just stumped on the plot type and styling.

This is what matplotlib calls an eventplot.
Essentially, each vertical line represents one occurrence of a MWh demand value during that specific hour, so each row in the plot should have as many vertical lines as there are days in that quarter.
While it works in this plot for these data, relying on the combination of alpha level and data density can be unreliable as the data change, because the number of overlapping points is not readily visible. So you can also create a similar visualization using hist2d, where you manually specify your bins.
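As a rough sketch of both approaches side by side, here is a version on synthetic data. The `demand` array, the 90-day quarter, and the bin edges are all made up for illustration and are not taken from the study; swap in your own per-hour values.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_days, n_hours = 90, 24  # assumed: one quarter of hourly readings

# Synthetic stand-in data: demand[d, h] is the value on day d at hour h.
hours = np.arange(n_hours)
daily_shape = 50 + 20 * np.sin((hours - 6) / 24 * 2 * np.pi)
demand = daily_shape + rng.normal(0, 5, size=(n_days, n_hours))

fig, (ax_event, ax_hist) = plt.subplots(1, 2, figsize=(10, 5), sharey=True)

# eventplot: one row per hour, one vertical tick per day in that row.
positions = [demand[:, h] for h in range(n_hours)]
events = ax_event.eventplot(positions, lineoffsets=hours, linelengths=0.8,
                            colors="k", alpha=0.3)
ax_event.set_xlabel("demand")
ax_event.set_ylabel("hour of day")

# hist2d: same axes, but overlap is counted explicitly in hand-picked bins.
x_edges = np.linspace(demand.min(), demand.max(), 41)  # 40 demand bins
y_edges = np.arange(-0.5, n_hours + 0.5, 1.0)          # one bin per hour
counts, _, _, image = ax_hist.hist2d(demand.ravel(), np.tile(hours, n_days),
                                     bins=[x_edges, y_edges], cmap="Greys")
fig.colorbar(image, ax=ax_hist, label="days per bin")
ax_hist.set_xlabel("demand")

fig.savefig("eventplot_vs_hist2d.png")
```

With hist2d the colour encodes an actual count per bin, so the picture does not depend on choosing an alpha that happens to suit the data density.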

Related

Whisker plots are not clear enough to analyze data

I'm trying to analyze a set of costs using Python.
The columns in the data frame are:
'TotalCharges', 'TotalPayments', 'TotalDirectVariableCost', 'TotalDirectFixedCost', 'TotalIndirectVariableCost', 'TotalIndirectFixedCost'.
When I tried to plot them using whisker plots, this is how they were displayed:
I need to properly analyze these data and understand their behavior.
The following are my questions.
Is there any way that I can make the whisker plots clearer?
I believe that since these are costs, we cannot ignore them as outliers. So, keeping the data as it is, what else can I use to represent the data more clearly?
Thanks
There are a couple of things you could do:
larger print area
rotate the axis
plot one axis on a log scale
That said, I think you should examine once again your understanding of what a box and whisker plot is for.
Additionally, you might consider posting this on the Math or Cross Validated site as this doesn't have much to do with code.
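A minimal sketch of the first and third suggestions (a larger figure and a log-scaled value axis), using made-up, strictly positive cost data. The `costs` frame below is hypothetical, and a log axis only works if your real costs are all greater than zero.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
cols = ["TotalCharges", "TotalPayments", "TotalDirectVariableCost"]

# Hypothetical, heavily right-skewed positive costs standing in for the real data.
costs = pd.DataFrame({c: rng.lognormal(mean=8, sigma=1.5, size=500) for c in cols})

fig, ax = plt.subplots(figsize=(10, 6))  # larger print area
result = ax.boxplot([costs[c] for c in cols])

# Label each box and tilt the labels so long column names stay readable.
ax.set_xticks(range(1, len(cols) + 1))
ax.set_xticklabels(cols, rotation=30, ha="right")

# Log scale on the value axis: extreme costs stay in the plot without
# squashing the boxes into a thin line near zero.
ax.set_yscale("log")
ax.set_ylabel("cost (log scale)")

fig.tight_layout()
fig.savefig("costs_boxplot_log.png")
```

To rotate the boxes so they lie horizontally, boxplot takes vert=False (newer Matplotlib releases use orientation="horizontal" instead), in which case the log scale goes on the x axis.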

Is there any way to suppress/normalize/average out peaks in graph in python pandas

I am working on a data visualization problem where I am plotting daily active users on about 15k pages against time/dates in Python. On some days I have peaks on specific pages, but those peaks are artificially created and affect the cumulative results. I want to show overall trends, either by suppressing the peaks or by adjusting the data in some other way.
I am plotting using Pandas with Python, in jupyter notebook.
Question: Is there any efficient way to solve this problem?
A sample graph is attached. The red line is the original graph, and the blue line is my attempt to suppress the peaks. The x-axis shows the date and the y-axis shows the sum of daily traffic.

How to plot a dataframe that contains values spread over a large spectrum of values?

I have the following dataframe, resulted from running grid search over several regression models:
As you can see, many values are grouped around 0.0009, but several are a few orders of magnitude larger (-1.6, -2.3, etc.).
I would like to plot these results, but I don't seem to find a way to get a readable plot. I have tried a bar plot, but I get something like:
How can I make this bar plot more readable? Or what other kind of plot would be more suitable to visualize such data?
Edit: Here is the dataframe, exported as CSV:
,a,b,c,d
LinearRegression,0.000858399508896,-4.11609208874e+20,0.000952538859738,0.000952538859733
RandomForestRegressor,-1.62264355718,-2.30218457629,0.0008957696846039999,0.0008990722465239999
ElasticNet,0.000883257900658,0.0008525502791760002,0.000884706195921,0.000929498696126
Lasso,7.92193516085e-05,-1.84086765436e-05,7.92193516085e-05,-1.84086765436e-05
ExtraTreesRegressor,-6.320170496909999,-6.30420308033,,
Ridge,0.0008584791396339999,0.0008601028734780001,,
SGDRegressor,-4.62522968756,,,
You could give the graph a log scale, which is often used for plotting data with a very large range. This muddies the interpretation slightly, because equal distances along the axis now represent equal ratios (orders of magnitude) rather than equal differences. You can read about log scales here:
https://en.wikipedia.org/wiki/Logarithmic_scale
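One caveat: a plain log axis cannot show zero or negative values, and this table has several. Matplotlib's "symlog" scale (linear near zero, logarithmic further out) is one way around that. Here is a rough sketch using the CSV from the question; the `linthresh` value of 1e-5 is just a guess that suits these numbers, not a recommended setting.

```python
import io
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# The CSV posted in the question; empty fields are read as NaN.
csv_text = """,a,b,c,d
LinearRegression,0.000858399508896,-4.11609208874e+20,0.000952538859738,0.000952538859733
RandomForestRegressor,-1.62264355718,-2.30218457629,0.0008957696846039999,0.0008990722465239999
ElasticNet,0.000883257900658,0.0008525502791760002,0.000884706195921,0.000929498696126
Lasso,7.92193516085e-05,-1.84086765436e-05,7.92193516085e-05,-1.84086765436e-05
ExtraTreesRegressor,-6.320170496909999,-6.30420308033,,
Ridge,0.0008584791396339999,0.0008601028734780001,,
SGDRegressor,-4.62522968756,,,
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

fig, ax = plt.subplots(figsize=(10, 6))
df.plot.bar(ax=ax)

# Symmetric log: linear within +/- linthresh, logarithmic outside it,
# so negative scores are drawn instead of being dropped.
ax.set_yscale("symlog", linthresh=1e-5)  # linthresh is an assumed value
ax.axhline(0, color="black", linewidth=0.8)
ax.set_ylabel("score (symlog scale)")

fig.tight_layout()
fig.savefig("grid_search_symlog.png")
```

The -4.1e+20 value for LinearRegression will still dominate the axis range, so it may be worth clipping or dropping it before plotting if it is a known failure case.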

3D plot in python

I have worked out a table somewhat like the one in the link. The ultimate goal for plotting is to find out if there is a seasonal change pattern for certain products in a state. I have tried to figure out a 3-D plot in python, with x-axis being product name, y-axis being month and z-axis being YR2012 and YR2013 respectively.
Another small, related question: how can I tell Python that the SALESMONTH column contains month data rather than plain integers?
Thanks!

Proper data visualization to graph sleep data

The data I'd like to visualize is my personal sleep data sourced from a Zeo (www.myzeo.com if you're not familiar). The data is a ~50x1000 table, with each row representing a night of sleep and each column holding an integer from 0-5 representing the sleep 'type' recorded in a 30 second interval. So the first column is the score for the 1st 30 seconds of sleep, the 2nd column is the score for the 2nd 30 second interval of sleep, and so on.
To start, I'd like to simply plot one row (night) of sleep data where the sleep type is mapped to a color. I've been browsing matplotlib's gallery and examples, but it's a bit overwhelming for a beginner to figure out what the most appropriate plot type is.
It seems like this color bar (2nd one?) might be close to what I'm looking for, but I'm not sure.
Any recommendations?
This is an extremely specific and narrowly focused question. That said, I see two problems with the color bar visualization proposed:
It only differentiates between different data segments by color. A short interval of sleep disruption may be too narrow to be easily visible (a slice 1 pixel in width is not very large)
Depending on your audience, many color palettes don't cater well to those with color blindness. That could further degrade the ability of a colorbar based plot to convey information.
If you look at the example charts on the MyZeo site, they use a bar chart that conveys information based on color and height. So long as the number of intervals sampled is reasonable, a bar or line chart would be fair choices for your data. (Though if your dataset would require 1,000 separate bars, you may want to consider dithering your dataset so that it displays cleanly)
This matplotlib example appears to provide a bar chart with coloring based on height:
http://matplotlib.org/examples/pylab_examples/hist_colormapped.html
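As a rough sketch of that idea for a single night, assuming random stand-in data: the `night` array is made up, and the reduction used here (taking the most common stage in each 5-minute block) is just one possible way to thin out 1,000 intervals, not something taken from the Zeo charts.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Hypothetical night: 1000 intervals of 30 seconds, each a sleep type 0..5.
night = rng.integers(0, 6, size=1000)

# Thin the data out: most common sleep type per block of 10 intervals (5 minutes).
block = 10
blocks = night.reshape(-1, block)
stage = np.array([np.bincount(b, minlength=6).argmax() for b in blocks])

# Encode the sleep type twice, as bar height and as colour, so the chart
# does not rely on colour alone. viridis is generally considered to be
# friendlier to colour-blind readers than a rainbow palette.
colors = plt.cm.viridis(stage / 5)
minutes = np.arange(len(stage)) * block * 0.5  # start of each block, in minutes

fig, ax = plt.subplots(figsize=(12, 3))
bars = ax.bar(minutes, stage, width=block * 0.5, color=colors, align="edge")
ax.set_xlabel("minutes since start of recording")
ax.set_ylabel("sleep type (0-5)")  # type 0 shows up as a gap

fig.tight_layout()
fig.savefig("sleep_night.png")
```

If you would rather keep every 30 second interval, skip the block step and pass `night` straight to the bar call with a width of 0.5.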
If you do become interested in data visualization, books such as Tufte's The Visual Display of Quantitative Information may be worth a read: it's a classic primer on the design choices involved when displaying several dimensions of information in the same figure.
