Plotting histogram error - want to swap x and y axis - python

I have an array with values [100,101,102,103,104,105]. When I plot it as a histogram, the values 100, 101, ... appear on the x axis, but I want them on the y axis. Any suggestions on how to do it?

First of all, are you sure you want a histogram? By definition, a histogram shows values against their count of occurrences. If you do want a histogram, but with the axes rotated, it can be done using the 'orientation' keyword.
import numpy as np
import matplotlib.pyplot as plt
vals = np.random.randint(100,110,(100))
print(vals)
plt.hist(vals, orientation='horizontal')
plt.show()
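To make explicit what the horizontal histogram is counting, here is a minimal sketch that tallies the occurrences with np.unique and draws them with barh. The fixed seed and the variable names are my own choices for illustration, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)            # fixed seed, only for reproducibility
vals = rng.integers(100, 110, size=100)   # 100 integers in [100, 110)

# distinct values and how often each one occurs
levels, counts = np.unique(vals, return_counts=True)

fig, ax = plt.subplots()
ax.barh(levels, counts)    # values on the y axis, counts on the x axis
ax.set_xlabel("count")
ax.set_ylabel("value")
# plt.show()  # uncomment to display the figure
```

Because the data are integers, one bar per distinct value gives the same picture as a histogram with one bin per integer.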
Following further clarification, it looks like you want a horizontal bar plot. Using your example, I modified it to use barh() instead of bar() and added another source of data, since otherwise the result is too trivial. A bar plot needs two columns: one is the position of each bar, the other is its length.
array = [101,102,103,104,105]
values = np.random.randint(0,10, size=len(array))
print(values)
plt.barh(array,values)
plt.show()
Output:
[3 9 2 4 7]

Related

Cumulative histogram for 2D data in Python

My data consists of a 2-D array of masses and distances. I want to produce a plot where the x-axis is distance and the y axis is the number of data elements with distance <= x (i.e. a cumulative histogram plot). What is the most efficient way to do this with Python?
PS: the masses are irrelevant since I already have filtered by mass, so all I am trying to produce is a plot using the distance data.
Example plot below:
You can combine numpy.cumsum() and plt.step():
import matplotlib.pyplot as plt
import numpy as np
N = 15
distances = np.random.uniform(1, 4, N).cumsum()
counts = np.random.uniform(0.5, 3, N)
plt.step(distances, counts.cumsum())
plt.show()
Alternatively, plt.bar can be used to draw the histogram, with the bar widths given by the differences between successive distances. An extra distance has to be appended so that the last bar also gets a width.
plt.bar(distances, counts.cumsum(), width=np.diff(distances, append=distances[-1]+1), align='edge')
plt.autoscale(enable=True, axis='x', tight=True) # make x-axis tight
Instead of appending a value, a zero (for example) could be prepended, depending on the exact interpretation of the data. The negative widths then make each bar extend to the left of its distance.
plt.bar(distances, counts.cumsum(), width=-np.diff(distances, prepend=0), align='edge')
This is what I figured I can do given a 1D array of data:
plt.figure()
counts = np.ones(len(data))
plt.step(np.sort(data), counts.cumsum())
plt.show()
This also seems to work with duplicate elements, since the count goes up by one for each occurrence of an x value.
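A slightly more compact variant of the same idea, as a sketch: the sorted distances are the x values and the running count 1..n is the y value. Passing where="post" makes each step jump at its data point, so the height at x equals the number of elements <= x. The helper names cumulative_counts and count_leq and the sample distances are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def cumulative_counts(data):
    """Return the sorted values and the running count 1..n."""
    xs = np.sort(np.asarray(data, dtype=float))
    ys = np.arange(1, xs.size + 1)
    return xs, ys

def count_leq(data, x):
    """Number of elements in data that are <= x."""
    return int(np.searchsorted(np.sort(data), x, side="right"))

distances = [0.5, 1.2, 1.2, 2.0, 3.7]   # made-up sample data with a duplicate
xs, ys = cumulative_counts(distances)

fig, ax = plt.subplots()
ax.step(xs, ys, where="post")           # step rises at each data point
ax.set_xlabel("distance")
ax.set_ylabel("number of elements <= distance")
# plt.show()  # uncomment to display the figure
```

count_leq is only there to check single values, for example count_leq(distances, 1.2) counts both duplicates.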

Scatter matrix with few variables with target in Y axis

I am trying to plot each variable against SalePrice. I tried pd.scatter_matrix, but I get a number of unnecessary plots for the various combinations. What I am looking for is SalePrice on the y axis and one scatter plot for each variable in the data set. Here is the code I tried.
data_prep_num['Sales_test_data']=data_sales_price_old
att=['Sales_test_data','YearBuilt','LotArea','MSSubClass','BsmtFinSF1','TotalBsmtSF','1stFlrSF','2ndFlrSF','GrLivArea','GarageArea']
pd.scatter_matrix(data_prep_num[att],alpha=.4,figsize=(30,30))
If you want to use pd.plotting.scatter_matrix but only want one of the rows (i.e. the Sales_test_data column), you can iterate over the plotting axes, and hide the combinations you don't want.
Assuming the SalePrice is the very first column (index 0):
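import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# scatter_matrix returns a 2D array of Axes, one per pair of columns
axes = pd.plotting.scatter_matrix(data_prep_num[att], alpha=0.4, figsize=(30,30))
# keep row 0 (Sales_test_data on the y axis) and hide every other row
for i in range(np.shape(axes)[0]):
    if i != 0:
        for j in range(np.shape(axes)[1]):
            axes[i,j].set_visible(False)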
Note: This is obviously not super efficient when you start having lots of columns though.
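If only the target-versus-feature panels are wanted, an alternative sketch is to skip scatter_matrix and draw one scatter plot per feature. The function name scatter_against_target, the grid layout and the small random demo frame are assumptions for illustration; only the column names are taken from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def scatter_against_target(df, target, features, ncols=3):
    """Draw one scatter plot per feature, with the target on the y axis."""
    nrows = -(-len(features) // ncols)   # ceiling division
    fig, axes = plt.subplots(nrows, ncols,
                             figsize=(4 * ncols, 3 * nrows), squeeze=False)
    flat = axes.ravel()
    for ax, col in zip(flat, features):
        ax.scatter(df[col], df[target], alpha=0.4)
        ax.set_xlabel(col)
        ax.set_ylabel(target)
    for ax in flat[len(features):]:      # hide unused panels in the grid
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes

# hypothetical demo data standing in for data_prep_num
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Sales_test_data": rng.uniform(50_000, 500_000, 50),
    "YearBuilt": rng.integers(1900, 2010, 50),
    "LotArea": rng.uniform(1_000, 20_000, 50),
    "GrLivArea": rng.uniform(500, 4_000, 50),
    "GarageArea": rng.uniform(0, 1_000, 50),
})
features = ["YearBuilt", "LotArea", "GrLivArea", "GarageArea"]
fig, axes = scatter_against_target(demo, "Sales_test_data", features)
# plt.show()  # uncomment to display the figure
```

This only creates as many panels as there are features, so it avoids building the full n-by-n matrix first.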

In matplotlib, plotting the histogram using plot and the returns of hist

To test the return values of hist, I want to draw them with plot in matplotlib. hist returns the following:
import matplotlib.pyplot as plt
counts, bins, bars = plt.hist(x)
where x is the vector of data for the histogram.
I have tried the following syntax
plt.plot(bins,counts)
I get the following error
Error: x and y must have the same first dimension, but have shapes (501,) and (500,)
Thank you for your answers.
From the matplotlib documentation of plt.hist():
bins : array
The edges of the bins. Length nbins + 1 (nbins left edges
and right edge of last bin). Always a single array even when multiple
data sets are passed in.
So the returned array bins has length nbins + 1, because it contains the left edge of every bin plus the right edge of the last bin.
You might not want to include the right edge of the last bin, therefore you can slice the array:
plt.plot(bins[:-1], counts)
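Another option, sketched here, is to plot the counts at the bin centers rather than at the left edges, so that each point sits in the middle of its bar. The random sample data and the choice of 20 bins are only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                 # made-up sample data

fig, ax = plt.subplots()
counts, bins, bars = ax.hist(x, bins=20, alpha=0.5)

# bins holds 21 edges for 20 bins; average neighbouring edges to get centers
centers = 0.5 * (bins[:-1] + bins[1:])
ax.plot(centers, counts, marker="o")
# plt.show()  # uncomment to display the figure
```

Here centers and counts have the same length, so the shape mismatch from the question does not occur.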
Try this:
import matplotlib.pyplot as plt
plt.hist(x)
plt.show()
This is the simplest one I guess.

How do I plot a bar graph from matplotlib/seaborn with an int list as value and a string list as x axis?

I have a list of ints arr=[1,2,3,4] and a list of strings list(df) (which returns a list of column headers of the dataframe). I want to plot a bar graph such that the x axis labels are taken from the list of strings and the value for them are taken from the list of ints.
For example, if list(df) returns [a,b,c,d], the graph would have ticks a, b, c, d on the x axis with corresponding values of 1, 2, 3, 4 on the y axis.
I can't figure out a way to do that. Please help.
It doesn't seem like an intuitive thing to do. I followed the example here to create this code:
import matplotlib.pyplot as plt
vals=[1,2,3,4,5]
inds=range(len(vals))
labels=["A","B","C","D","E"]
fig,ax = plt.subplots()
rects = ax.bar(inds, vals)
ax.set_xticks([ind+0.5 for ind in inds])
ax.set_xticklabels(labels)
and get this output:
In the first half of the code I'm just setting up the variables.
In the second half I call plt.subplots() so I can get the axis (ax) handle, which is needed to set the tick marks, and then draw the bars (rects). Setting the tick marks determines where the labels will be, so I shifted them by 0.5 to the right; otherwise they would sit at the leftmost edge of each bar. (This applies to Matplotlib versions before 2.0, where bars were left-aligned by default.)
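For reference, a sketch of the same plot assuming Matplotlib 2.0 or later, where bars are centered on their x positions by default. In that case the labels can be attached with the tick_label argument and no manual shifting is needed. The values and labels follow the question's example.

```python
import matplotlib.pyplot as plt

vals = [1, 2, 3, 4]
labels = ["a", "b", "c", "d"]
inds = range(len(vals))

fig, ax = plt.subplots()
# bars are centered on inds; tick_label puts one label under each bar
rects = ax.bar(inds, vals, tick_label=labels)
# plt.show()  # uncomment to display the figure
```

With a DataFrame, labels would simply be list(df), as in the question.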

Summing lines in pyplot

I have a pyplot figure with a few lines on it. I would like to be able to draw an extra line, which would be a sum of all others' values. The lines are not plotted against the same x values (they are visually shorter in the plot - see the image). The resulting line would be somewhat above all others.
One idea I have for it requires obtaining a line's y value in a specific x point. Is there such a function? Or does pyplot/matplotlib support summing lines' values?
Superposition is the short answer to your question: read this for more.
Example:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(100) #range of x axis
y1 = np.random.rand(100,1) #some random numbers
y2 = np.random.rand(100,1)
#this will only plot y1 value
plt.plot(x,y1)
plt.show()
#this will plot summation of two elements
plt.plot(x,y1+y2)
plt.show()
I took a second look at your question. Your y values have different lengths, so they cannot simply be added as in the example above. What you can do is create 4 equal-sized lists, with zeros wherever a line has no value, and then apply superposition to them (simply add them all up and plot the result).
For the future generations: numpy.interp() was my solution to this problem.
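A minimal sketch of the numpy.interp() approach: resample every line onto a common x grid and add the results. The two sample lines are invented, and treating a line as zero outside its own x range (via left=0.0 and right=0.0) is an assumption that matches the zero-padding idea above.

```python
import numpy as np
import matplotlib.pyplot as plt

# two made-up lines sampled at different x values
x1 = np.array([0.0, 1.0, 2.0, 3.0])
y1 = np.array([1.0, 2.0, 1.0, 0.5])
x2 = np.array([1.5, 2.5, 3.5, 4.5])
y2 = np.array([0.5, 1.5, 1.0, 0.0])
lines = [(x1, y1), (x2, y2)]

# common grid: sorted union of all x values
grid = np.union1d(x1, x2)

# interpolate each line on the grid (zero outside its range) and add them up
total = sum(np.interp(grid, x, y, left=0.0, right=0.0) for x, y in lines)

fig, ax = plt.subplots()
for x, y in lines:
    ax.plot(x, y)
ax.plot(grid, total, "k--", label="sum")
ax.legend()
# plt.show()  # uncomment to display the figure
```

np.interp expects the x values of each line to be increasing, which is the case here.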
