Proper data visualization to graph sleep data - python

The data I'd like to visualize is my personal sleep data sourced from a Zeo (www.myzeo.com if you're not familiar). The data is a ~50x1000 table in which each row represents a night of sleep and each column holds an integer from 0-5 representing the sleep 'type' recorded in a 30-second interval. So the first column is the score for the first 30 seconds of sleep, the second column the score for the second 30-second interval, and so on.
To start, I'd like to simply plot one row (night) of sleep data where the sleep type is mapped to a color. I've been browsing matplotlib's gallery and examples, but it's a bit overwhelming for a beginner to figure out what the most appropriate plot type is.
It seems like this color bar (2nd one?) might be close to what I'm looking for, but I'm not sure.
Any recommendations?

This is an extremely specific and narrowly focused question. That said, I see two problems with the color bar visualization proposed:
It differentiates between data segments by color alone. A short interval of sleep disruption may be too narrow to be easily visible (a slice 1 pixel wide is not very large).
Depending on your audience, many color palettes don't cater well to those with color blindness. That could further degrade the ability of a colorbar based plot to convey information.
If you look at the example charts on the MyZeo site, they use a bar chart that conveys information through both color and height. So long as the number of intervals sampled is reasonable, a bar or line chart would be a fair choice for your data. (Though if your dataset would require 1,000 separate bars, you may want to consider downsampling it so that it displays cleanly.)
This matplotlib example appears to provide a bar chart with coloring based on height:
http://matplotlib.org/examples/pylab_examples/hist_colormapped.html
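A minimal sketch of the bar-chart idea for a single night, assuming the night is a sequence of integers 0-5 as described in the question. The function names, the majority-vote thinning, and the choice of the "viridis" colormap are illustrative assumptions, not anything taken from the Zeo charts or the linked example.

```python
import numpy as np
import matplotlib.pyplot as plt

N_TYPES = 6  # sleep types are integers 0-5, per the question


def downsample(night, factor):
    """Collapse each run of `factor` samples into its most common sleep type."""
    night = np.asarray(night, dtype=int)
    usable = len(night) - len(night) % factor  # drop the incomplete tail
    blocks = night[:usable].reshape(-1, factor)
    return np.array([np.bincount(b, minlength=N_TYPES).argmax() for b in blocks])


def plot_night(night, factor=10):
    """Bar chart where both height and color encode the sleep type."""
    stages = downsample(night, factor)
    cmap = plt.get_cmap("viridis", N_TYPES)  # one discrete color per type
    bar_minutes = factor * 0.5               # each sample covers 30 seconds
    starts = np.arange(len(stages)) * bar_minutes

    fig, ax = plt.subplots(figsize=(10, 2.5))
    # Integer input indexes the colormap directly, giving one RGBA row per bar.
    # A score of 0 draws as a zero-height bar, i.e. a visible gap.
    ax.bar(starts, stages, width=bar_minutes, align="edge", color=cmap(stages))
    ax.set_xlabel("Minutes since start of recording")
    ax.set_ylabel("Sleep type (0-5)")
    return fig, ax
```

Calling plot_night(table[0]) on the first row of the table and then plt.show() should display one night; with factor=10, a 1000-sample night becomes 100 bars of 5 minutes each.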
If you do become interested in data visualization, books such as Tufte's The Visual Display of Quantitative Information may be worth the read: it's a classic primer on the design choices involved when displaying several dimensions of information on the same figure.

Related

Identifying Plot Name or Visualization Implementation

I'm working on a dataset of SMS records [datetime_entry, sms_sent] and I was looking to copy a really effective trend visual from a well-cited electricity demand study. Does anyone know the name of this plot, or of an implementation of something similar in Python (I'm not sure the original was done in Python)?
I know how to subplot the 4 charts after splitting the data by quarter, I'm just stumped on the plot type and stylization.
This is what matplotlib calls an eventplot.
Essentially each vertical line represents an occurrence of a MWh demand during that specific hour. So each row in the plot should have as many vertical lines as there are days in that quarter.
While it works in this plot for these data, relying on the combination of alpha level and data density can be unreliable as the data change, because the number of overlapping points is not readily visible. So you can also create a similar visualization using hist2d, where you manually specify your bins.
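To make the comparison concrete, here is a rough sketch that draws the same data both ways. The data is synthetic (a made-up daily demand curve plus noise), and the function names, bin counts, and styling are assumptions for illustration only; the original study's data and styling are not reproduced.

```python
import numpy as np
import matplotlib.pyplot as plt


def make_synthetic_demand(n_days=90, seed=0):
    """Hypothetical demand values, shape (n_days, 24): one value per hour per day."""
    rng = np.random.default_rng(seed)
    hours = np.arange(24)
    base = 50 + 20 * np.sin((hours - 6) / 24 * 2 * np.pi)
    return base + rng.normal(0, 5, size=(n_days, 24))


def plot_event_and_hist2d(demand, n_value_bins=40):
    n_days, n_hours = demand.shape
    fig, (ax_event, ax_hist) = plt.subplots(1, 2, figsize=(11, 4), sharey=True)

    # eventplot: one row per hour, one vertical tick per day at that day's value.
    ax_event.eventplot(list(demand.T), lineoffsets=np.arange(n_hours),
                       linelengths=0.8, colors="k", alpha=0.3)
    ax_event.set_xlabel("Demand")
    ax_event.set_ylabel("Hour of day")

    # hist2d: the same points, but overlap is counted explicitly per bin.
    values = demand.ravel()
    hours = np.tile(np.arange(n_hours), n_days)
    value_edges = np.linspace(values.min(), values.max(), n_value_bins + 1)
    hour_edges = np.arange(n_hours + 1) - 0.5  # one bin centred on each hour
    counts, _, _, image = ax_hist.hist2d(values, hours,
                                         bins=[value_edges, hour_edges])
    fig.colorbar(image, ax=ax_hist, label="Days in bin")
    ax_hist.set_xlabel("Demand")
    return fig, counts
```

In the hist2d panel the color of each cell is a count, so density no longer depends on how the alpha values of overlapping ticks happen to blend.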

How can I plot multiple y variables stacked in a bubble plot?

I am trying to visualise some data for a construction project. Each week, for a few years, different vehicles will be accessing the site. I need to produce a graph showing when each vehicle will be accessing the site. The way I'd like to do this is to have a single graph that shows all vehicles, so that weeks without any vehicles are empty and can be clearly identified, whereas busy weeks will have lots of data visible at the same time.
I want to have the y axis be vehicle type (let's say cars, vans and trucks for now) and x axis be time (weeks of the year). I want to use bubbles to display the number of each vehicle, so if there is a dot at the coordinates (Van, Week 1) you will know that vans will be used during week 1, and the bubble size will tell you how many.
My question is essentially - what is this graph called? I want it to be called something like a "discrete stacked bubble plot" or something but that doesn't exist. Please see my example below. Any ideas on how to do this? Or am I approaching the problem the wrong way?
Example of what I want it to look like
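One possible way to sketch the plot described above is a plain matplotlib scatter plot with one row per vehicle type and the marker area scaled by the count. The input layout (a dict of per-week counts), the function names, and the scale factor are all assumptions made for this example.

```python
import matplotlib.pyplot as plt


def bubble_points(counts_by_vehicle):
    """Flatten {vehicle: [count per week]} into x, y, size lists, skipping empty weeks."""
    vehicles = list(counts_by_vehicle)
    xs, ys, sizes = [], [], []
    for row, vehicle in enumerate(vehicles):
        for week, count in enumerate(counts_by_vehicle[vehicle], start=1):
            if count > 0:
                xs.append(week)
                ys.append(row)
                sizes.append(count)
    return vehicles, xs, ys, sizes


def plot_bubbles(counts_by_vehicle, scale=40):
    vehicles, xs, ys, sizes = bubble_points(counts_by_vehicle)
    fig, ax = plt.subplots(figsize=(10, 3))
    # `s` is the marker area in points^2, so area grows linearly with the count.
    ax.scatter(xs, ys, s=[scale * n for n in sizes], alpha=0.6)
    ax.set_yticks(range(len(vehicles)))
    ax.set_yticklabels(vehicles)
    ax.set_ylim(-0.5, len(vehicles) - 0.5)
    ax.set_xlabel("Week")
    return fig, ax
```

For example, plot_bubbles({"Cars": [2, 0, 1], "Vans": [0, 0, 3], "Trucks": [1, 0, 0]}) leaves week 2 empty, which is the "quiet week" effect the question asks for.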

How to offset (or unstack) data points within the same date in a Waterfall plotly chart?

After extensive research, the closest I found to my issue was this question.
I'm building a waterfall chart to illustrate the cashflow of a certain company. That being said, I have several cashflow entries on the same date.
Currently I'm using waterfallmode='overlay' but I tried group as well. I have trouble reading such a chart because the positive and negative entries overlap and it all gets a bit confusing.
What I want exactly is to unstack (or offset) each entry within a day, so that the entries sit side by side, parallel to each other, as opposed to overlapped or stacked.
The closest settings I found to deal with it, and why they don't work are:
waterfallgap and waterfallgroupgap (Creating a FigureWidget object)
offset and offsetgroup
None of them work because they offset the entire group (as opposed to each entry).
I guess one solution would be to force a minute's difference in each of the Date entries. But I'm sure there is a cleaner solution.
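For reference, the timestamp workaround mentioned above could be sketched in pandas as follows. This is only the fallback the question describes, not the cleaner solution being asked for; the column names and the helper name are hypothetical, and how the resulting dates render in the waterfall chart has not been checked here.

```python
import pandas as pd


def spread_same_day_entries(df, date_col="date", step="1min"):
    """Shift the k-th entry sharing a date forward by k * step so no two coincide."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # 0, 1, 2, ... within each group of identical dates, in original row order
    position = out.groupby(date_col).cumcount()
    out[date_col] = out[date_col] + position * pd.Timedelta(step)
    return out


cashflow = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-04", "2021-01-04", "2021-01-05"],
    "amount": [100, -40, 25, -10],
})
spread = spread_same_day_entries(cashflow)
```

On an axis that spans months, a one-minute step is likely too small to see, so a larger step (a few hours, say) may be needed for the entries to look visibly separated.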

Python: how to plot points with little overlapping

I am using python to plot points. The plot shows relationship between area and the # of points of interest (POIs) in this area. I have 3000 area values and 3000 # of POI values.
Now the plot looks like this:
The problem is that, in the lower left, points overlap each other so severely that it is hard to get much information out of the plot. Most areas are not that big and they don't have many POIs.
I want to make a plot with little overlapping. I am wondering whether I can use an unevenly spaced axis or a histogram to make a beautiful plot. Can anyone help me?
I would suggest using a logarithmic scale for the y axis. You can either use pyplot.semilogy(...) or pyplot.yscale('log') (http://matplotlib.org/api/pyplot_api.html).
Note that points where area <= 0 will not be rendered.
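A small sketch of the log-scale suggestion, assuming x and y are plain numeric sequences. The function name, the marker size, and the alpha value are illustrative choices; the explicit filter just makes visible how many points a log axis would silently leave out.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_log_y(x, y):
    """Scatter plot with a logarithmic y axis; reports how many points were dropped."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > 0  # a log axis cannot show zero or negative values
    fig, ax = plt.subplots()
    ax.scatter(x[keep], y[keep], s=8, alpha=0.5)
    ax.set_yscale("log")
    return fig, ax, int((~keep).sum())
```

If both variables are heavily skewed toward small values, calling ax.set_xscale("log") as well may spread out the lower-left cluster further.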
I think we have two major choices here: first, adjusting this plot, and second, displaying your data in another type of plot.
For the first option, I would suggest clipping the boundaries. You have plenty of space around the borders. If you limit the plot to the boundaries of the data, your data would scale better. On top of that, you may choose to plot the points with smaller dots, so that they overlap less.
The second option would be to display the data in a different view, such as a histogram. This might give better insight into how your data is distributed among different bins. But this would be a completely different type of view from the former plot.
I would suggest first adjusting the plot by limiting its boundaries to the data points, so that the plot area has enough space to scale the data, and trying histograms later. But as I mentioned, these are two different things and would give different insights about your data.
For adjusting you might try this:
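import matplotlib.pyplot as plt
x1, x2, y1, y2 = plt.axis()  # current limits, in the order (xmin, xmax, ymin, ymax)
plt.axis((x1, x2, y1, y2))  # pass tighter values here; as written this reapplies the same limits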
You would probably need to make minor adjustments to the axis variables. Note that there are most likely better options than this, but it was the first thing that came to mind.
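As a rough illustration of the adjustment described in this answer, the sketch below sets the axis limits from the data itself and uses small, semi-transparent markers. The function name, the padding fraction, and the marker settings are assumptions, not values from the answer.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_tight(x, y, pad=0.02):
    """Scatter plot whose limits hug the data, with `pad` as a fraction of each range."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=4, alpha=0.4)  # small, translucent dots overlap less
    x_pad = pad * (x.max() - x.min())
    y_pad = pad * (y.max() - y.min())
    ax.axis((x.min() - x_pad, x.max() + x_pad,
             y.min() - y_pad, y.max() + y_pad))
    return fig, ax
```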

How to visualize IP addresses as they change in python?

I've written a little script that collects my external IP address every time I open a new terminal window and appends it, as well as the current time, to a text file. I'm looking for ideas on a way to visualize when/how often my IP address changes. I bounce between home and campus and could separate them using the script, but it would be nice to visualize them separately.
I frequently use matplotlib. Any ideas?
Plot your IP as a point on the xkcd internet map (or some zoomed in subset of the map, to better show different but closely neighboring IPs).
Plot each point "stacked" proportional to how often you've had that IP, and color the IPs to make more recent points brighter, less recent points proportionally darker.
"When" is one dimensional temporal data, which is well shown by a timeline. At larger timescales, you'd probably lose the details, but most any plot of "when" would have this defect.
For "How often", a 2D bar plot of time vs frequency, divided into buckets for each day/week/month, would be a standard way to go. A moving average might also be informative.
You could combine the timeline & bar plot, with the timeline visible when you're zoomed in & the frequency display when zoomed out.
How about a bar plot with time on the horizontal axis where the width of each bar is the length of time your computer held a particular IP address and the height of each bar is inversely proportional to the width? That would also give a plot of when vs how often.
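A possible sketch of this variable-width bar idea, assuming the log has already been parsed into (datetime, ip) pairs in time order. The parsing step is left out because the file format is not given; the function names and the use of hours as the unit are assumptions.

```python
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt


def ip_runs(observations):
    """Merge consecutive observations of the same IP into (start, end, ip) runs."""
    runs = []
    for when, ip in observations:
        if runs and runs[-1][2] == ip:
            runs[-1][1] = when          # same IP seen again: extend the run
        else:
            if runs:
                runs[-1][1] = when      # previous IP is assumed held until this change
            runs.append([when, when, ip])
    # A trailing run with a single observation has no known duration; drop it.
    return [tuple(r) for r in runs if r[1] > r[0]]


def plot_runs(runs):
    """Bar width = time an IP was held, bar height = 1 / that time (in hours)."""
    ips = sorted({ip for _, _, ip in runs})
    colors = {ip: f"C{i % 10}" for i, ip in enumerate(ips)}
    fig, ax = plt.subplots(figsize=(10, 3))
    for start, end, ip in runs:
        hours = (end - start).total_seconds() / 3600
        ax.bar(mdates.date2num(start), 1 / hours, width=hours / 24,
               align="edge", color=colors[ip])
    ax.xaxis_date()  # x values are matplotlib date numbers (days)
    ax.set_ylabel("1 / hours held")
    return fig, ax
```

Short-lived addresses show up as tall, thin bars and long-lived ones as low, wide bars, and coloring by IP keeps home and campus addresses visually distinct.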
You could also interpret the data as a pulse density modulated signal, like what you get on a SuperAudio CD. You could graph this or even listen to the data. As there's no obvious time length for an IP change event, the length of a pulse would be a tunable parameter. Along similar lines, you could view the data as a square wave (triangular wave, sawtooth &c), where each IP change event is a level transition. Sounds like a fun Pure Data project.
There's a section in the matplotlib user guide about drawing bars on a chart to represent ranges. I've never done that myself but it seems appropriate for what you're looking for.
Since you mentioned a terminal, I'll assume you are on a UNIX-like system. Using the -f switch with the command line utility tail lets you continuously monitor the end of a file. You could also use something like inotify, which can monitor file changes, or dnotify (placing the file in its own directory), which usually comes standard on most distributions (you can then call tail -n 1 to get the last line). Once the line changes, you can grab the current system time since the epoch using Python's time.time() and subtract the time of the last change from it, then plot this difference using matplotlib. I assume you could categorize the times into ranges to make the graphing easier on yourself: one bar for change intervals of less than 1 hour, another for changes between 1 and 5 hours, and so on.
There is a Python implementation of tail -f located here if you don't want to use it directly. Upon a detection of a change in the file, you could perform the above.
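The bucketing step described in this answer might look roughly like the sketch below, assuming the change times have already been collected as epoch seconds (for example from time.time()). The first two buckets follow the answer; the buckets beyond 5 hours and all names are assumptions added for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

INNER_EDGES_HOURS = [1, 5, 24]  # boundaries between buckets, in hours
BUCKET_LABELS = ["< 1 h", "1-5 h", "5-24 h", ">= 24 h"]


def bucket_change_intervals(change_times):
    """Count the gaps between consecutive IP changes (epoch seconds) per bucket."""
    times = np.sort(np.asarray(change_times, dtype=float))
    gaps_hours = np.diff(times) / 3600
    # side="right" puts a gap of exactly 1 h into the "1-5 h" bucket, and so on.
    bucket_index = np.searchsorted(INNER_EDGES_HOURS, gaps_hours, side="right")
    return np.bincount(bucket_index, minlength=len(BUCKET_LABELS))


def plot_buckets(counts):
    fig, ax = plt.subplots()
    ax.bar(BUCKET_LABELS, counts)
    ax.set_xlabel("Time between IP changes")
    ax.set_ylabel("Number of changes")
    return fig, ax
```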
