I would like to make a spike graph: at each x value I draw a line of length y. I could use plt.bar with thin bars. My problem is that I can specify the bar width, but it is given in data coordinates, which is inadequate for this purpose. I would like to set the width of the bars to, say, 1 pt.
plt.bar(x,y,width=0.1,align='center',color='black')
There is http://matplotlib.org/users/transforms_tutorial.html, but it does not answer my question: x and y should stay in data coordinates and only the width of the bar should be given in "true size" coordinates, so passing transform=... will not work. The other problem is that I cannot find a way to specify data in "true size" units (pt, cm, inches, ...). (The final purpose is to produce a PDF file for inclusion in a paper.)
EDIT: I could use vlines instead of bar; however, the original question still puzzles me: how can I specify different coordinate systems for different parameters of the same object, and how can I use true-size units?
You could rescale the width each time, something like:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 1000, 10)
y = np.random.randint(0, 10, 10)
plt.bar(x, y, width=0.001*max(x))
plt.show()
This makes the width of each bar always a fixed fraction (here 0.001) of the total x range.
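If the goal is really a fixed on-paper width, another option (a minimal sketch reusing the x and y above; the output filename is arbitrary) is vlines, since its linewidth is given in points rather than data coordinates:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 1000, 10)
y = np.random.randint(0, 10, 10)
# linewidth is measured in points, so each spike stays 1 pt wide in the saved PDF
plt.vlines(x, 0, y, colors='black', linewidth=1)
plt.savefig('spikes.pdf')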
I'm plotting a dataset where the size of the data arrays is larger than the size of the figure, even larger than the resolution of my screen. As shown in the example below, matplotlib does a remarkably good job rendering the data. This is just an example dataset; my real dataset is far more unpredictable. I am concerned that there may be occasions when some important data is not shown. How does matplotlib decide what to show?
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10000)
y = np.zeros(10000)
for i in range(0, 10000, 100):
    y[i] = x[i]
x_spikes = np.random.choice(x, size=10, replace=False)
y[x_spikes] = 10000 + x[x_spikes]
plt.plot(x, y)
print(sorted(x_spikes))
[375, 2828, 3494, 6526, 6855, 6902, 6923, 7117, 7831, 9558]
The plt.plot command creates one or more Line2D objects. Those lines have a linewidth. The unit of the linewidth is points (the default being 1.5 points).
Independent of the pixel resolution, all data is hence shown; no data is lost.
What can happen though is that if you make the linewidth very narrow, features may get lost due to antialiasing.
To ensure that this is not happening, you can always use a linewidth of at least ppi/dpi, i.e. 72/dpi in the matplotlib case. The default dpi is 100, so as long as the linewidth is greater than or equal to 0.72 points, all points are shown. (In Jupyter the default dpi is often 72, hence 72/72 == 1, and a linewidth of 1 would be needed.)
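In code that could look roughly like this (a sketch reusing x and y from the question's snippet; your dpi handling may differ):
import matplotlib.pyplot as plt
dpi = plt.rcParams['figure.dpi']      # 100 by default
lw = max(1.5, 72.0 / dpi)             # never thinner than one device pixel
plt.plot(x, y, linewidth=lw)
plt.show()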
All of this applies to lines. For bar plots (where the width is in data coordinates) this is different. Images might also not show all data, though imshow has the interpolation argument to control the interpolation behaviour.
I'm trying to draw the best-fitting line for given (x, y) data points.
The image shows the data points (red pixels) and the estimated line (green), which I obtained using the following library call.
import numpy as np
# A is the design matrix (not shown in the original snippet); for a line y = m*x + c
# it would typically be np.vstack([x, np.ones(len(x))]).T
m, c = np.linalg.lstsq(A, y)[0]
Documentation for the library module used.
We can see that the data points are roughly symmetrically distributed. The problem is: why does this line not have a gradient similar to the long symmetry axis through the data points? Can you please explain whether this result is correct, and if so, how it gives the minimum error? (The line is drawn correctly using the gradient returned by the lstsq method.) Thank you.
EDIT
Here is the code I'm trying. The input image can be downloaded from here. In this code I have not forced the line to pass through the centre of the pixel distribution. (Note: here I've used polyfit instead of lstsq; both give the same result.)
import numpy as np
import cv2
import math
img = cv2.imread('points.jpg', 1)
h, w = img.shape[:2]
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
points = np.argwhere(gray > 10)   # (row, col) positions of the bright (red) pixels
y = points[:, 0]
x = points[:, 1]
m, c = np.polyfit(x, y, 1)   # least-squares fit of a line
# calculate two coordinates (x1, y1), (x2, y2) on the line
angle = np.arctan(m)
x1, y1, length = 0, int(c), 500
x2 = int(round(math.ceil(x1 + length * np.cos(angle)),0))
y2 = int(round(math.ceil(y1 + length * np.sin(angle)),0))
# draw the line on the color image
cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 1, cv2.LINE_8)
# show the output image
cv2.namedWindow("Display window", cv2.WINDOW_AUTOSIZE)
cv2.imshow("Display window", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
How can I make the line pass through the longest symmetry axis of the pixel distribution? Can I use principal component analysis?
It's hard to say why this would be the case. The bottom line is that I can't see the data you're using, and I can't see what the calculated slope and y intercept are for that data.
Here are a few things that could explain what we're seeing:
(1) The density of data points is actually quite different than it appears to a casual glance and everything is working properly.
(2) You're sending the wrong arguments to the least squares function and you've got a GIGO situation. (I haven't used numpy's least squares algorithm, so I can't check this.)
(3) The scatter plot and the line plot don't agree on the scale of the axes.
(4) The least squares function in question is broken.
(5) You're not passing the same data to the least squares algorithm as you're passing to the plotting routine.
(6) The data formatting is funky so that the scatter plot and least squares routines are interpreting your data differently.
I can't know which of these is the problem, and unless it's (3), I expect we'd need more data to be able to distinguish between these possibilities.
Here's how I'd proceed if I were you: (1) Create a small artificial data set that sits on a line and pass it to the least squares function and see if it spits out the right numbers. See if these look right when plotted or not. (2) If this looks okay, record the output of the least squares algorithm, see if you can find another least squares program to calculate the slope and y intercept and compare them. If they're the same, it's probably not the routine, it's probably something to do with plotting.
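As an illustration of step (1), a tiny sanity check might look like this (a sketch, assuming the same lstsq usage as in the question):
import numpy as np
# points that lie exactly on y = 2x + 1
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
A = np.vstack([x, np.ones(len(x))]).T
m, c = np.linalg.lstsq(A, y, rcond=None)[0]
print(m, c)   # should be very close to 2.0 and 1.0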
If you get this far and it's still a mystery, let us know what you've found and maybe we can make another suggestion.
Good luck.
If the red dots truly represent your data, you are probably applying your linear regression function in a way that forces the line through the origin. How do I know? When using linear regression on two variables x and y, the line will pass through a few specific points, for example the point (mean of x, mean of y), and, depending on your specification, a calculated or specified intercept on the y axis. If all values of x and y are positive, you will get a line that looks like yours when the line is forced through the origin. Not much more can be said before you provide some reproducible data and code.
EDIT:
I didn't have much luck with the reproducible sample provided, so I built an example with random numbers to elaborate on my original answer. I think statsmodels is a decent library for linear regression analysis. First, I'll address this earlier comment:
If all values of x and y are positive, you will get a line that looks like yours when the line is forced through the origin.
You'll see an increasing effect of this the larger your numbers are (the further away from the origin your numbers are). Using sm.OLS(y, sm.add_constant(x)).fit() and sm.OLS(y, x).fit() on two different sets of numbers will show you exactly what I mean. First, I'll run a regression on the dataset below without an estimated constant (the line goes through the origin). This gives us a plot that resembles your original plot:
# Libraries
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
# Data
np.random.seed(123)
x = np.random.normal(size=2500) + 100
y = x * 2 + np.random.normal(size=2500) + 100
# Regression
results1 = sm.OLS(y,x).fit()
regLine_origin = x*results1.params[0]
# Plot
fig, ax = plt.subplots()
ax.scatter(x, y, c='red', s=4)
ax.scatter(x, regLine_origin, c = 'green', s = 1)
ax.patch.set_facecolor('black')
plt.show()
Next, I'll include a constant in the regression. Now, the yellow line will represent what I think you were after in your question:
# Libraries
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
# Data
np.random.seed(123)
x = np.random.normal(size=2500) + 100
y = x * 2 + np.random.normal(size=2500) + 100
# Regression
results1 = sm.OLS(y,x).fit()
results2 = sm.OLS(y,sm.add_constant(x)).fit()
regLine_origin = x*results1.params[0]
regLine_constant = results2.params[0] + x*results2.params[1]
# Plot
fig, ax = plt.subplots()
ax.scatter(x, y, c='red', s=4)
ax.scatter(x, regLine_origin, c = 'green', s = 1)
ax.scatter(x, regLine_constant, c = 'yellow', s = 1)
ax.patch.set_facecolor('black')
plt.show()
And lastly, we can take a look at what happens when the numbers are closer to the origin, so to speak. Here, I'll remove the +100 part when the numbers are generated:
# The following is changed in the snippet above:
# Data
x = np.random.normal(size=2500)
y = x * 2 + np.random.normal(size=2500)
And that's why I think your original regression line is forced to go through the origin. Have a look at the statsmodels package; you can study the details of the estimate by running print(results2.summary()).
And as you've already seen in the snippets above, you'll have direct access to the regression coefficients by using results2.params.
Edit2: My explanation still isn't 100% valid. The x and y values will have to differ a bit in size to see this effect. You'll certainly find situations where the line goes through the origin no matter the size of the numbers.
Have a look at the different x labels, and you'll see what I mean.
Beginning python/numpy user here. I do an analysis of a 2D function in the XY plane. Using 2 loops through x and y I compute the function value and store it into an array for later plotting. I ran into a couple of problems.
Let's say my XY range is -10 to 10. How do I accommodate that when storing the computed values into my data array? (Only non-negative numbers are allowed as indices.) For now I just add an offset to x and y to make them non-negative.
From my data I know that the extreme is at x=-3 and y=2. When I plot the computed array, first of all the axis labels are wrong. I would like y to go the mathematical way (up).
I would like the axis labels to run from -10 to 10. I tried extent but that did not come out right.
Again from my data I know that the extreme is at x=-3 and y=2. In the plot, when I hover the mouse over the graphics, the max value is shown at x=12 and y=7. It seems x and y have been swapped. When I move the mouse, the displayed x and y values run as follows: x grows larger when moving the mouse right, etc. (OK), but y runs the wrong way and grows larger when moving DOWN.
As a side note, it would be nice to have the function value shown in the plot window as well, next to x and y.
Here is my code:
size = 10
q = np.zeros((2*size, 2*size))
for xs in range(-size, +size):
    for ys in range(-size, +size):
        q[xs+size, ys+size] = my_function_of_x_and_y(xs, ys)
im = plt.imshow(q, cmap='rainbow', interpolation='none')
plt.show()
One more thing. I would like not to mess with the q array too badly as I later want to find the extreme spot in it.
idxmin = np.argmin(q)
xmin,ymin = np.unravel_index(idxmin, q.shape)
xmin= xmin-size
ymin= ymin-size
So that I get this:
>>> xmin,ymin
(-3, 2)
>>>
Here is my plot:
Here is the desired plot (made in Photoshop; axis lines would be nice):
Not too sure why setting extent did not work for you, but this is how I have implemented it:
q = np.random.randint(-10,10, size=(20, 20))
im = plt.imshow(q, cmap='rainbow', interpolation='none',extent=[-10,10,-10,10])
plt.vlines(0,10,-10)
plt.hlines(0,10,-10)
plt.show()
Use the vlines and hlines methods to draw the centre lines.
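If the remaining problem is that y grows downwards, a possible tweak (a sketch only, assuming the array is filled as q[x, y] like in the question) is to transpose the array and pass origin='lower':
import numpy as np
import matplotlib.pyplot as plt
q = np.random.randint(-10, 10, size=(20, 20))
# imshow draws axis 0 as rows (vertical), so transpose to put x horizontally;
# origin='lower' places row 0 at the bottom so y increases upwards
plt.imshow(q.T, cmap='rainbow', interpolation='none',
           origin='lower', extent=[-10, 10, -10, 10])
plt.vlines(0, -10, 10)
plt.hlines(0, -10, 10)
plt.show()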
To simplify my problem (it's not exactly like that but I prefer simple answers to simple questions):
I have several 2D maps that portray rectangular regions. I'd like to add axes and ticks to the maps to show the distances on them (with matplotlib, since the old code uses it), but the problem is that the regions are of different sizes. I'd like to put nice, clear ticks on the axes, but the widths and heights of the maps can be anything...
To try to explain what I mean: let's say I have a map of a region whose size is 4.37 km * 6.42 km. I want ticks on the x axis at 0, 1, 2, 3, and 4 km, and on the y axis at 0, 1, 2, 3, 4, 5, and 6 km. However, the image and the axes reach a bit further than 4 km and 6 km, since the region is larger than 4 km * 6 km.
The space between the ticks can be constant, 1 km. However, the sizes of the maps vary quite a lot (let's say, between 5 and 15 km), and they are float values. My current script knows the size of the region and can scale the image to the right height/width ratio, but how do I tell it where to put the ticks?
There may already be a solution for this problem, but since I couldn't find suitable search words for it, I had to ask here...
Just set the tick locator to use matplotlib.ticker.MultipleLocator(x) where x is the spacing that you want (e.g. 1.0 in your example above).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
x = np.arange(20)
y = x * 0.1
fig, ax = plt.subplots()
ax.plot(x, y)
ax.xaxis.set_major_locator(MultipleLocator(1.0))
ax.yaxis.set_major_locator(MultipleLocator(1.0))
# Forcing the plot to be labeled with "plain" integers instead of scientific notation
ax.xaxis.set_major_formatter(FormatStrFormatter('%i'))
plt.show()
The advantage to this is that no matter how we zoom or interact with the plot, it will always be labeled with ticks 1 unit apart.
This should give you ticks at all integer values within your current axis limits on the x axis:
import matplotlib.pyplot as plt
import math
# get values for the axis limits (unless you already have them)
xmin,xmax = plt.xlim()
# get the outermost integer values using floor and ceiling
# (I need to convert them to int to avoid a DeprecationWarning),
# then get all the integer values between them using range
new_xticks = range(int(math.ceil(xmin)),int(math.floor(xmax)+1))
plt.xticks(new_xticks, new_xticks)
# passing the same argument twice here because the first gives the tick locations
# and the second gives the tick labels, which should just be the numbers
Repeat for the y axis.
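For instance, the same idea for the y axis could look like this (a sketch, untested in your exact setup):
ymin, ymax = plt.ylim()
new_yticks = range(int(math.ceil(ymin)), int(math.floor(ymax)) + 1)
plt.yticks(new_yticks, new_yticks)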
Out of curiosity: what kind of ticks do you get by default?
Okay, I tried your versions, but unfortunately I couldn't make them work, since there was some scaling and PDF locating stuff that left me (and your code suggestions) badly confused. But by testing them, I learned a lot of Python again, thanks!
I finally managed to find a solution that isn't very exact but satisfies my needs. Here is how I did it.
In my version, one kilometre is divided into a number of parts given by a suitable integer constant named STEP_PART. The bigger STEP_PART is, the more accurate the axis values are (and if it is too big, the axis becomes messy to read). For example, if STEP_PART is 5, the accuracy is 1 km / 5 = 200 m, and ticks are placed every 200 m.
STEP_PART = 5    # at the start of the program

height = 6.42    # these are actually given elsewhere,
width = 4.37     # but just as an example...

# Make tick vectors as lists, now in the format 0, 1, 2, ... instead of 0, 0.2, ...
# (they should be divided by STEP_PART later to get the right values)
vHeight = list(range(0, int(STEP_PART * height)))
vWidth = list(range(0, int(STEP_PART * width)))
To avoid making too many axis labels (0, 1, 2, ... are enough; 0, 0.2, 0.4, ... is far too much), we replace the non-integer km values with the empty string "". Simultaneously, we divide the integer km values by STEP_PART to get the right values.
for j in range(len(vHeight)):
    if j % STEP_PART != 0:
        vHeight[j] = ""
    else:
        vHeight[j] = int(vHeight[j] / STEP_PART)
for i in range(len(vWidth)):
    if i % STEP_PART != 0:
        vWidth[i] = ""
    else:
        vWidth[i] = int(vWidth[i] / STEP_PART)
Later, after creating the graph and axes, the ticks are set like this (x axis as an example). Here, x is the actual width of the picture, obtained with the shape() command (I don't exactly understand how... there is quite a lot of scaling and stuff in the code I'm modifying).
xt = np.linspace(0, x-1, len(vWidth)+1)   # to place the ticks at evenly spaced positions
locs, labels = mpl.xticks(xt, vWidth, fontsize=9)
Repeat for the y axis. The result is a graph with ticks every 200 m but labels only at the integer km values. The accuracy of those axes is therefore 200 m; it's not exact, but it was enough for me. The script would be even better if I could find out how to grow the size of the integer ticks...
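One possible way to get bigger ticks at the whole kilometres (a sketch only, assuming mpl is the pyplot module used above and that xt/vWidth are as defined there) is to register the integer positions as major ticks and the 200 m positions as minor ticks, then give the major ticks a longer length:
from matplotlib.ticker import FixedLocator, FixedFormatter
ax = mpl.gca()
km_pos = [xt[i] for i, lab in enumerate(vWidth) if lab != ""]
km_lab = [str(lab) for lab in vWidth if lab != ""]
m_pos = [xt[i] for i, lab in enumerate(vWidth) if lab == ""]
ax.xaxis.set_major_locator(FixedLocator(km_pos))
ax.xaxis.set_major_formatter(FixedFormatter(km_lab))
ax.xaxis.set_minor_locator(FixedLocator(m_pos))
ax.tick_params(axis='x', which='major', length=8)   # long ticks at whole kilometres
ax.tick_params(axis='x', which='minor', length=3)   # short ticks every 200 m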
How can I plot a 2D array as an image with Matplotlib, having the y scale relative to the power of two of the y value?
For instance the first row of my array will have a height in the image of 1, the second row will have a height of 4, etc. (units are irrelevant)
It's not simple to explain in words, so please look at this image (that's the kind of result I want): http://support.sas.com/rnd/app/da/new/802ce/iml/chap1/images/wavex1k.gif
As you can see, the first row is two times smaller than the upper one, and so on.
For those interested in why I am trying to do this:
I have a pretty big array (10, 700000) of floats, representing the discrete wavelet transform coefficients of a sound file. I am trying to plot the scalogram using those coefficients.
I could copy the array x times until I get the desired image row size but the memory cannot hold so much information...
Have you tried to transform the axis? For example:
ax = subplot(111)
ax.yaxis.set_ticks([0, 2, 4, 8])
imshow(data)
This means there must be gaps in the data for the non-existent coordinates, unless there is a way to provide a transform function instead of just lists (never tried).
Edit:
I admit it was just a lead, not a complete solution. Here is what I meant in more detail.
Let's assume you have your data in an array, a. You can use a transform like this one:
class arr(object):
    @staticmethod
    def mylog2(x):
        lx = 0
        while x > 1:
            x >>= 1
            lx += 1
        return lx

    def __init__(self, array):
        self.array = array

    def __getitem__(self, index):
        return self.array[arr.mylog2(index + 1)]

    def __len__(self):
        return 1 << len(self.array)
Basically it will transform the first coordinate of an array or list with the mylog2 function (which you can change as you wish; it's home-made as a simplification of log2). The advantage is that you can re-use it for another transform should you need it, and you can easily control it too.
Then map your array to this one, which doesn't make a copy but only holds a reference in the instance:
b = arr(a)
Now you can display it, for example:
ax = subplot(111)
ax.yaxis.set_ticks([16, 8, 4, 2, 1, 0])
axis([-0.5, 4.5, 31.5, 0.5])
imshow(b, interpolation="nearest")
Here is a sample (with an array containing random values):
http://img691.imageshack.us/img691/8883/clipboard01f.png
The best way I've found to make a scalogram using matplotlib is to use imshow, similar to the implementation of specgram. Using rectangles is slow, because you're having to make a separate glyph for each value. Similarly, you don't want to have to bake things into a uniform NumPy array, because you'll probably run out of memory fast, since your highest level is going to be about as long as half your signal.
Here's an example using SciPy and PyWavelets:
from pylab import *
import pywt
import scipy.io.wavfile as wavfile
# Find the highest power of two less than or equal to the input.
def lepow2(x):
    return 2 ** int(floor(log2(x)))

# Make a scalogram given an MRA tree.
def scalogram(data):
    bottom = 0
    vmin = min(map(lambda x: min(abs(x)), data))
    vmax = max(map(lambda x: max(abs(x)), data))
    gca().set_autoscale_on(False)
    for row in range(0, len(data)):
        scale = 2.0 ** (row - len(data))
        imshow(
            array([abs(data[row])]),
            interpolation='nearest',
            vmin=vmin,
            vmax=vmax,
            extent=[0, 1, bottom, bottom + scale])
        bottom += scale

# Load the signal, take the first channel, limit length to a power of 2 for simplicity.
rate, signal = wavfile.read('kitten.wav')
signal = signal[0:lepow2(len(signal)), 0]
tree = pywt.wavedec(signal, 'db5')

# Plotting.
gray()
scalogram(tree)
show()
You may also want to scale values adaptively per-level.
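For example, one rough way to do that (an untested sketch; it replaces the body of the loop above rather than using the global vmin/vmax) is to normalise each level by its own maximum:
for row in range(0, len(data)):
    scale = 2.0 ** (row - len(data))
    level = abs(data[row])
    peak = level.max() if level.max() > 0 else 1.0   # guard against an all-zero level
    imshow(
        array([level / peak]),
        interpolation='nearest',
        vmin=0.0,
        vmax=1.0,
        extent=[0, 1, bottom, bottom + scale])
    bottom += scale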
The approach above works pretty well for me. The only problem I have is that matplotlib creates a hairline-thin space between levels. I'm still looking for a way to fix this.
P.S. - Even though this question is pretty old now, I figured I'd respond here, because this page came up on Google when I was looking for a method of creating scalograms using MPL.
You can look at matplotlib.image.NonUniformImage. But that only helps with having a nonuniform axis; I don't think you're going to be able to plot adaptively like you want to (I think each point in the image is always going to have the same area, so you would have to repeat the wider rows multiple times). Is there any reason you need to plot the full array? Obviously the full detail isn't going to show up in any plot, so I would suggest heavily downsampling the original matrix, then copying rows as required to get the image without running out of memory.
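For instance, a crude downsampling step could look like this (a sketch only; coeffs is a placeholder name for your list of coefficient arrays, and the block maximum is just one possible reduction):
import numpy as np

def downsample(level, width=2000):
    # keep at most `width` samples per level, taking the max over equal-sized
    # chunks so that narrow spikes are not silently dropped
    level = np.abs(np.asarray(level))
    if len(level) <= width:
        return level
    step = len(level) // width
    return level[:step * width].reshape(-1, step).max(axis=1)

small = [downsample(level) for level in coeffs]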
If you want both to be able to zoom and to save memory, you could do the drawing "by hand". Matplotlib allows you to draw rectangles (they would be your "rectangular pixels"):
from matplotlib import patches
from matplotlib.pyplot import subplot

axes = subplot(111)
axes.add_patch(patches.Rectangle((0.2, 0.2), 0.5, 0.5))
Note that the extents of the axes are not set by add_patch(), but you can set them yourself to the values you want (axes.set_xlim,…).
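For example, continuing the snippet above (the limits here are arbitrary):
axes.set_xlim(0, 1)
axes.set_ylim(0, 1)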
PS: It looks to me like thrope's response (matplotlib.image.NonUniformImage) can actually do what you want, in a simpler way than the "manual" method described here!
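For reference, a minimal NonUniformImage sketch could look like this (untested here; the data and the row positions are only placeholders):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import NonUniformImage

x = np.linspace(0.0, 1.0, 64)                 # column centres (uniform)
y = np.array([1.0, 2.0, 4.0, 8.0, 16.0])      # row centres (powers of two)
z = np.random.rand(len(y), len(x))            # placeholder data

fig, ax = plt.subplots()
im = NonUniformImage(ax, interpolation='nearest',
                     extent=(x[0], x[-1], y[0], y[-1]))
im.set_data(x, y, z)
ax.add_image(im)
ax.set_xlim(x[0], x[-1])
ax.set_ylim(y[0], y[-1])
plt.show()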