Whisker plots are not clear enough to analyze data - python

I'm trying to analyze a set of costs using Python.
The columns in the data frame are:
'TotalCharges', 'TotalPayments', 'TotalDirectVariableCost', 'TotalDirectFixedCost', 'TotalIndirectVariableCost', 'TotalIndirectFixedCost'.
When I plot them as whisker plots, this is how they are displayed:
I need to properly analyze these data and understand their behavior.
The following are my questions.
Is there any way to make the whisker plots clearer?
Since these are costs, I believe we cannot discard them as outliers. So, keeping the data as it is, what else can I use to represent it more clearly?
Thanks

There are a couple of things you could do:
use a larger print area
rotate the axis
plot one axis on a log scale
That said, I think you should examine once again your understanding of what a box and whisker plot is for.
Additionally, you might consider posting this on the Math or Cross Validated site as this doesn't have much to do with code.
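As a rough sketch of the three suggestions combined (bigger figure, horizontal boxes, log-scaled value axis): the data below is randomly generated and only stands in for the real data frame columns, so the shapes are an assumption, not the asker's actual costs. A log axis also assumes every cost is strictly positive.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: right-skewed, strictly positive "costs".
# With a real data frame, df[name].to_numpy() would take the place of each array.
rng = np.random.default_rng(0)
columns = ["TotalCharges", "TotalPayments", "TotalDirectVariableCost",
           "TotalDirectFixedCost", "TotalIndirectVariableCost",
           "TotalIndirectFixedCost"]
data = {name: rng.lognormal(mean=8, sigma=1.5, size=500) for name in columns}

fig, ax = plt.subplots(figsize=(12, 6))                 # larger plot area
result = ax.boxplot([data[name] for name in columns],
                    vert=False)                         # rotated: horizontal boxes
ax.set_yticks(range(1, len(columns) + 1))               # boxes sit at positions 1..6
ax.set_yticklabels(columns)
ax.set_xscale("log")                                    # log scale on the value axis
ax.set_xlabel("Cost (log scale)")
fig.tight_layout()
```

The large values are still drawn, nothing is dropped; the log axis just stops them from squashing the boxes into a thin line.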

Related

How to make a Linear-LogLog plot in Python?

I would like to be able to make a plot with the x axis scaled as the log of the log. These types of plots are coming up rather frequently in my ME courses and I've been trying to figure out how to do it, and find that there doesn't seem to be a clean builtin way to do it either in python or in Matlab. Both support semilog (linear vs log(x)) and log-log (log(y) vs log(x)) plots, but I'm not seeing any way to get a semiloglog (linear vs log(log(x))) or log-loglog (log(y) vs log(log(x))) plot.
Anyone have any ideas on this? I could do some conversions by simply applying log(log(xvalues)) to my data, but drawing the grid, tick marks, and labels gets kinda tricky, so I was hoping someone might have already built a library for this.
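One possible route in matplotlib, as a sketch rather than a polished library: the "function" axis scale takes a forward and an inverse transform, so the axis itself can be scaled as log(log(x)) while the tick labels keep the original x values. The sample data and the tick positions below are made up for illustration, and the scale is only defined for x > 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

def forward(x):
    """Axis transform x -> log(log(x)); only defined for x > 1."""
    x = np.asarray(x, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.log(np.log(x))

def inverse(u):
    """Inverse transform u -> exp(exp(u))."""
    u = np.asarray(u, dtype=float)
    with np.errstate(over="ignore"):
        return np.exp(np.exp(u))

# Made-up sample data, all x values greater than 1
x = np.logspace(0.5, 6, 200)
y = np.log(np.log(x))  # appears as a straight line on a log(log(x)) axis

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xscale("function", functions=(forward, inverse))

# Ticks are placed by hand; the automatic locator assumes a linear axis
exponents = [1, 2, 3, 4, 6]
ax.set_xticks([10.0 ** k for k in exponents])
ax.set_xticklabels([rf"$10^{{{k}}}$" for k in exponents])
ax.set_xlim(x.min(), x.max())
ax.grid(True)
```

Adding ax.set_yscale("log") on top of this would give the log(y) vs log(log(x)) variant, provided all y values are positive.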

How to make a plot that connect its points to its closest neighbors?

I am working on a code to project a given object onto a plane.
The code works fine (at least it seems like it) in achieving that purpose, the only issue I'm having is in plotting my results.
In the image below, for instance, I'm plotting the projection of a parallelepiped (its edges, to be more precise) in a plane of my choice.
I would like to make a plot where each point is connected to its closest neighbor. I'm not very confident that this approach would get the job done, but I think it would be worth a shot.
Different ideas to get there are also welcome!
Any thoughts?
Thanks in advance.
Note: I also tried using a solid line style when plotting as opposed to the pixel marker style, but the result I got was not quite what I expected to say the least:
When telling matplotlib to plot a sequence of points and join them with a line, it creates a straight line between two adjacent points in your input data. To create several lines, it's often easier to split your plot command into several ones. An alternative is to arrange your points such that they form the edges you want, but that would be much more complicated in your case.
As discussed in the comments, separating each edge into its own separate plot command worked for your case.
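A minimal sketch of the "one plot command per edge" idea. The vertex coordinates here are invented (the question's projection code is not shown), and the bit trick for listing the edges is just one convenient way to enumerate them for a parallelepiped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical projected parallelepiped: 8 vertices in the plane.
# Vertex i = origin + (bit 0 of i)*a + (bit 1 of i)*b + (bit 2 of i)*c
origin = np.array([0.0, 0.0])
a = np.array([2.0, 0.2])
b = np.array([0.5, 1.5])
c = np.array([-0.7, 0.8])
vertices = np.array([
    origin + (i & 1) * a + ((i >> 1) & 1) * b + ((i >> 2) & 1) * c
    for i in range(8)
])

# Two vertices share an edge when their indices differ in exactly one bit
edges = [(i, i ^ (1 << k))
         for i in range(8) for k in range(3)
         if i < (i ^ (1 << k))]

fig, ax = plt.subplots()
for i, j in edges:
    # One plot call per edge, so no stray line joins unrelated points
    ax.plot(vertices[[i, j], 0], vertices[[i, j], 1], color="tab:blue")
ax.set_aspect("equal")
```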

Easy way to plot second plot as section of first plot

I have a time-series plot of data in which I want to examine a section in more detail. Kind of like this, but with the second plot below the first, and with the box bounding the x-axis labels rather than the section of data.
Is there a simple way to go about this or am I going to have to write this from scratch?
EDIT: This past question seems to be after the same thing, but was never solved.
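A rough starting point, not a full solution: two stacked axes, with the section of interest shaded in the top plot and shown on its own in the bottom plot. The signal and the section bounds (t0, t1) are made up, and the shaded span marks the data region rather than the x-axis labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up time series
rng = np.random.default_rng(0)
t = np.linspace(0, 100, 2000)
signal = np.sin(t) + 0.1 * rng.normal(size=t.size)

# Hypothetical section to examine in detail
t0, t1 = 40.0, 50.0

fig, (ax_full, ax_zoom) = plt.subplots(2, 1, figsize=(10, 6))

# Top: the whole series, with the section shaded
ax_full.plot(t, signal)
ax_full.axvspan(t0, t1, color="orange", alpha=0.3)

# Bottom: only the selected section
mask = (t >= t0) & (t <= t1)
ax_zoom.plot(t[mask], signal[mask])
ax_zoom.set_xlim(t0, t1)
fig.tight_layout()
```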

Python Matplotlib nonlinear scaling in contour plot

I'm having some trouble visualizing a certain dataset that I have in a contour plot. The issue is that I have a bunch of datapoints (X,Y,Z) for which the Z values range from about 2 to 0, where a lot of the interesting features are located in the 0 to 0.3 range. Using a normal scaling, they are very difficult to see, as illustrated in this image:
Now, I have thought about what else to do. Of course there is logarithmic scaling, but then I first need to think about some sort of mapping, and I am not 100% sure how one would do that. Inspired by this question one could think of a mapping of the type scaling(x) = Log(x/min)/Log(max/min) which worked reasonably well in that question.
Also interesting was the follow-up discussed here, where they used some sort of ArcSinh scaling function. That seemed to enlarge the small features quite well, proportionally to the whole.
So my question is twofold, I suppose.
How would one scale the data in my contour plot in such a way that the small amplitude features do not get blown away by the outliers?
Would you do it using either of the methods mentioned above, or using something completely different?
I am rather new to python and I am constantly amazed by all the things that are already out there, so I am sure there might be a built in way that is better than anything I mentioned above.
For completeness I uploaded the datafile here (the upload site is robustfiles.com, which a quick google search told me is a trustworthy website to share things like these)
I plotted the above with
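import numpy as np
import matplotlib.pyplot as plt

# Raw string, so the backslashes in the Windows path are not read as escapes
data = np.load(r"D:\SavedData\ThreeQubitRess44SpecHighResNormalFreqs.npy")
# X and Y (the B and frequency grids) are defined elsewhere and not shown here
fig, ax1 = plt.subplots(1, figsize=(16, 16))
cs = ax1.contourf(X, Y, data, 210, alpha=1, cmap='jet')
fig.colorbar(cs, ax=ax1, shrink=0.9)
ax1.set_title("Freq vs B")
ax1.set_ylabel('Frequency (GHz)'); ax1.set_xlabel('B (arb.)')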
Excellent question.
Don't scale the data. With most scaling functions you end up compromising between ranges.
Instead, use a custom colormap. That way, you won't have to remap your actual data and can easily customize the visualization of the regions you'd like to highlight. Another example can be found in the SciPy cookbook, and there are quite a few more on the internet.
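To make the custom colormap idea concrete, here is a small sketch. The data is synthetic (a sharp peak near 2 on top of small ripples between 0 and 0.3), and the colour stops are arbitrary choices: most of the colour changes are packed into the lowest 15% of the map, which corresponds to values 0 to 0.3 when the colour range is 0 to 2.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, Normalize

# Synthetic stand-in for the real data: small ripples (0 to 0.3) plus one tall peak
x = np.linspace(-3, 3, 200)
y = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(x, y)
Z = 1.7 * np.exp(-(X**2 + Y**2) / 0.1) + 0.15 * (1 + np.sin(3 * X) * np.cos(3 * Y))

# (position, colour) stops; positions are fractions of the colour range.
# Four of the five stops sit in the lowest 15%, i.e. values 0 to 0.3.
stops = [(0.00, "black"), (0.05, "navy"), (0.10, "cyan"),
         (0.15, "yellow"), (1.00, "darkred")]
cmap = LinearSegmentedColormap.from_list("low_range_emphasis", stops)

fig, ax = plt.subplots(figsize=(8, 8))
cs = ax.contourf(X, Y, Z, levels=200, cmap=cmap, norm=Normalize(vmin=0, vmax=2))
fig.colorbar(cs, ax=ax, shrink=0.9)
```

The data itself is untouched, so the colorbar still reads in the original units.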
Another option is to break the plot into 2 separate regions by breaking the axis like so

Python: how to plot points with little overlapping

I am using python to plot points. The plot shows relationship between area and the # of points of interest (POIs) in this area. I have 3000 area values and 3000 # of POI values.
Now the plot looks like this:
The problem is that, in the lower left, points are severely overlapping each other so it is hard to get enough information. Most areas are not that big and they don't have many POIs.
I want to make a plot with little overlapping. I am wondering whether I can use an unevenly spaced axis or a histogram to make a beautiful plot. Can anyone help me?
I would suggest using a logarithmic scale for the y axis. You can either use pyplot.semilogy(...) or pyplot.yscale('log') (http://matplotlib.org/api/pyplot_api.html).
Note that points where area <= 0 will not be rendered.
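A quick sketch of the log-scale suggestion. The 3000 points are randomly generated, and putting area on the y axis is an assumption made to match the note above; the small, semi-transparent markers are an extra touch, not part of the original suggestion.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 3000 areas (all positive) and POI counts that grow with area
rng = np.random.default_rng(1)
area = rng.lognormal(mean=0.0, sigma=1.0, size=3000)
pois = rng.poisson(lam=2.0 * area)

fig, ax = plt.subplots()
ax.scatter(pois, area, s=4, alpha=0.4)  # small, semi-transparent markers
ax.set_yscale("log")                    # same effect as pyplot.yscale('log')
ax.set_xlabel("# of POIs")
ax.set_ylabel("Area (log scale)")
```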
I think we have two major choices here. First adjusting this plot, and second choosing to display your data in another type of plot.
In the first option, I would suggest clipping the boundaries. You have plenty of space around the borders. If you limit the plot to the boundaries, your data would scale better. On top of that, you may choose to plot the points with smaller dots, so that they overlap less.
The second option would be to display the data in a different view, such as a histogram. This might give better insight into the distribution of your data among different bins. But this would be a completely different type of view compared to the former plot.
I would suggest first adjusting the plot by limiting its boundaries to the data points, so that the plot area has enough space to scale the data, and trying histograms later. But as I mentioned, these are two different things and would give different insights about your data.
For adjusting you might try this:
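import matplotlib.pyplot as plt
x1, x2, y1, y2 = plt.axis()  # current limits: (xmin, xmax, ymin, ymax)
plt.axis((x1, x2, y1, y2))   # pass tightened values here to clip the empty margins
You would probably need to make minor adjustments to the axis variables. Note that there are definitely better options than this, but this was the first thing that came to my mind.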
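For illustration, both options side by side on made-up data: the scatter plot with its limits clipped to the data range and smaller dots, and a histogram of the areas. The variable names and the bin count are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 3000 areas and POI counts
rng = np.random.default_rng(2)
area = rng.lognormal(mean=0.0, sigma=1.0, size=3000)
pois = rng.poisson(lam=2.0 * area)

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(12, 5))

# Option 1: clip the axes to the data range and use smaller dots
ax_scatter.scatter(area, pois, s=3)
ax_scatter.set_xlim(area.min(), area.max())
ax_scatter.set_ylim(pois.min(), pois.max())
ax_scatter.set_xlabel("Area")
ax_scatter.set_ylabel("# of POIs")

# Option 2: a different view, here the distribution of areas
counts, bin_edges, _ = ax_hist.hist(area, bins=50)
ax_hist.set_xlabel("Area")
ax_hist.set_ylabel("Number of regions")
fig.tight_layout()
```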
