I have a bunch of images like this one:
The corresponding data is not available. I need to automatically retrieve about 100 points (regularly x-spaced) on the blue curve. All curves are very similar, so I need at least 1 pixel precision, but sub-pixel would be preferred. The good news is all curves start from 0,0 and end at 1,1, so we may forget about the grid.
Any hint on Python libs that could help, or any other approach? Thanks!
I saved your image to a file 14154233_input.png. Then this program
import pylab as plt
import numpy as np
# Read the image from disk and suppress everything that is gray (grid, background)
im = plt.imread("14154233_input.png")[:,:,:3]
im -= im.mean(axis=2).reshape(im.shape[0], im.shape[1], 1).repeat(3,axis=2)
im_maxnorm = im.max(axis=2)
# Find y-position of remaining line
ypos = np.ones((im.shape[1])) * np.nan
for i in range(im_maxnorm.shape[1]):
    if im_maxnorm[:,i].max() < 0.01:
        continue
    ypos[i] = np.argmax(im_maxnorm[:,i])
# Pick only values that are set
ys = 1-ypos[np.isfinite(ypos)]
# Normalize to 0,1
ys -= ys.min()
ys /= ys.max()
# Create x values
xs = np.linspace(0,1,ys.shape[0])
# Create plot of both
# read and filtered image and
# data extracted
plt.figure(figsize=(4,8))
plt.subplot(211)
plt.imshow(im_maxnorm)
plt.subplot(212, aspect="equal")
plt.plot(xs,ys)
plt.show()
Produces this plot:
You can then do with xs and ys whatever you want. Maybe you should put this code in a function that returns xs and ys or so.
One could improve the precision by fitting a Gaussian (or a similar peak model) to each column. If you really need that, tell me.
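If it helps, a rough sketch of one such refinement is shown below; it fits a parabola to the three samples around each column's maximum rather than a full Gaussian (refine_peak is a made-up helper name, so treat this as an illustration only):
import numpy as np

def refine_peak(col):
    """Sub-pixel peak position of one column via a 3-point parabola fit."""
    i = int(np.argmax(col))
    if i == 0 or i == len(col) - 1:
        return float(i)  # cannot interpolate at the border
    y0, y1, y2 = col[i - 1], col[i], col[i + 1]
    denom = y0 - 2 * y1 + y2
    if denom == 0:
        return float(i)
    return i + 0.5 * (y0 - y2) / denom  # vertex of the fitted parabola

# e.g. ypos[i] = refine_peak(im_maxnorm[:, i]) in the loop above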
First, read the image via
from scipy.misc import imread
im = imread("thefile.png")
(On recent SciPy versions, where scipy.misc.imread has been removed, imageio.imread or matplotlib.pyplot.imread can be used instead.)
This gives a 3D numpy array with the third dimension being the color channels (RGB + alpha). The curve is in the blue channel, but so is the grid; the red channel, on the other hand, contains the grid but not the curve. So we use
a = im[:,:,2].astype(float) - im[:,:,0].astype(float)  # cast to float to avoid uint8 wrap-around
Now, we want the position of the maximum along each column. With one pixel precision, it is given by
y0 = np.argmax(a, axis=0)
The result of this is zero where there is no blue curve in the column, i.e. outside the frame. One can get the limits of the frame by
xmin, xmax = np.where(y0 > 0)[0][[0, -1]]
With this, you can rescale the x axis.
Then, you want subpixel resolution. Let us focus on a single column
f=a[:,x]
We use a single iteration of Newton's method to refine the position of the extremum:
y1 = y0 - f'(y0) / f''(y0)
Note that we cannot iterate further because of the discrete sampling. Nonetheless, we want a good approximation of the derivatives, so we will use a 5-point scheme for both.
coefprime = np.array([1,-8, 0, 8, -1], float)
coefsec = np.array([-1, 16, -30, 16, -1], float)
y = y0[x]
y1 = y - np.dot(f[y-2:y+3], coefprime) / np.dot(f[y-2:y+3], coefsec)
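Putting these pieces together for every column inside the frame could look roughly like the sketch below (it reuses the thefile.png name from above and is only a sketch of the steps described, not a drop-in solution):
import numpy as np
import matplotlib.pyplot as plt

im = plt.imread("thefile.png")
# Blue minus red; cast to float to avoid uint8 wrap-around
a = im[:, :, 2].astype(float) - im[:, :, 0].astype(float)

y0 = np.argmax(a, axis=0)                   # pixel-precision positions
xmin, xmax = np.where(y0 > 0)[0][[0, -1]]   # frame limits

coefprime = np.array([1, -8, 0, 8, -1], float)
coefsec = np.array([-1, 16, -30, 16, -1], float)

ys = []
for x in range(xmin, xmax + 1):
    f = a[:, x]
    y = int(y0[x])
    if 2 <= y < len(f) - 2:                 # need 5 samples around the maximum
        window = f[y - 2:y + 3]
        y = y - np.dot(window, coefprime) / np.dot(window, coefsec)
    ys.append(y)
xs = np.linspace(0, 1, len(ys))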
P.S. : Thorsten Kranz was faster than me (at least here), but my answer has the subpixel precision and my way of extracting the blue curve is probably more understandable.
Using matplotlib (or anything else, if something better exists), I want to populate a scatter plot image by using a grayscale image as its distribution. I have found many resources for creating heat maps from images, but not the other way around.
The input image will be like this one.
I think I understand what you're going for, but I'm not certain. I also don't really understand what this would be used for so I'm extra uncertain about this answer, but here goes:
So by loading the image we can evaluate each pixel position and its intensity. We can use that intensity as a "fitness" value and probabilistically add it to our plot so that we can get some of that "density" of points that you want to see. I picked a really simple equation as a decider (I just cubed the value), but feel free to replace that with whatever you want.
import cv2
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import random
# select func
def selection(value):
    return value**3 >= random.randint(0, 255**3);

# populate the sample
def populate(img):
    # get res
    h, w = img.shape;

    # go through and populate
    sx = [];
    sy = [];
    for y in range(0, h):
        for x in range(0, w):
            val = img[y, x];

            # use intensity to decide if it gets in
            # replace with what you want this function to look like
            if selection(val):
                sx.append(x);
                sy.append(h - y); # opencv is top-left origin
    return sx, sy;
# I'm using opencv to pull the image into code, use whatever you like
# matplotlib can also do something similar, but I'm not familiar with its format
img = cv2.imread("circ.png");
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY);
# lets take a sample
sx, sy = populate(img);
# find the bigger square size
h, w = img.shape;
side = None;
if h > w:
    side = h;
else:
    side = w;
# make a square graph
fig, ax = plt.subplots();
ax.scatter(sx, sy, s = 4);
ax.set_xlim((0, side));
ax.set_ylim((0, side));
x0,x1 = ax.get_xlim();
y0,y1 = ax.get_ylim();
ax.set_aspect(abs(x1-x0)/abs(y1-y0));
fig.savefig("out.png", dpi=600);
plt.show();
Feel free to replace opencv with whatever image library you're comfortable with. I'm pretty sure matplotlib can open images as well, but openCV is what I'm most familiar with so I used that.
As far as I can tell, you're trying to generate random coordinates that follow a distribution described by a grayscale image: the brighter each point, the more likely that point's coordinates will be generated. Your problem can thus be solved by a rejection sampler, as follows.
Assume you know the width and height of the image in pixels, call them w and h.
1. Generate two random numbers: one in the interval [0, w) and one in [0, h). These are the x and y coordinates, respectively.
2. Get the pixel at the coordinates x and y in the image. This can be done using interpolation, but describing interpolation techniques is beyond the scope of this answer, so we will use only the nearest pixel ("nearest neighbor") in the image: take the pixel at coordinates floor(x) and floor(y) (in which case step 1 devolves to generating random integers). Convert the pixel somehow to a number p in the interval [0, 1]; in this answer we will assume black is 0 and white is 1, to simplify matters.
3. With probability p, return the point (x, y). Otherwise, go to step 1.
Roughly speaking, the time complexity of this algorithm depends on the numbers of "bright points" the input image has, compared to the number of "dark points". In general, the "brighter" the image, the higher the acceptance rate (and the faster the algorithm runs).
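A minimal sketch of such a rejection sampler, assuming the image is already a grayscale numpy array img with values 0-255 (the function name and n_points are made up):
import numpy as np

def rejection_sample(img, n_points, rng=None):
    """Draw n_points (x, y) samples whose density follows the image brightness."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape
    pts = []
    while len(pts) < n_points:
        x = int(rng.integers(0, w))   # step 1: propose a pixel uniformly
        y = int(rng.integers(0, h))
        p = img[y, x] / 255.0         # step 2: nearest pixel, black = 0, white = 1
        if rng.random() < p:          # step 3: accept with probability p
            pts.append((x, y))
    return pts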
I have written code to detect the edges of a flame in an image, and I can get the edge line. It consists of many pixel points stored in an array (data in my code). Now, based on data, I would like to calculate the length of the edge. The idea is to calculate the distance between every pair of consecutive points in data and sum them all to get the length. I am really stuck on that part. Please help me, many thanks.
Here is a processed image:
Here is the original image that converted to the processed image, I put in the code is to compare the result:
import cv2
import matplotlib.pyplot as plt
if __name__ == '__main__':
    path = '1897_1.jpg'  # processed image
    pic = cv2.imread(path)
    original = cv2.imread('1897_2.jpg')  # original image
    img2 = cv2.flip(original, 1)
    b, g, r = cv2.split(pic)
    img4 = cv2.flip(b, 1)
    h, w = img4.shape
    data = []
    th_val = 20
    for i in range(h):
        for j in range(w):
            val = img4[i, j]
            if (val >= th_val):
                data.append(j)
                break
    b1 = range(len(data))
    b2 = len(data)
    result = [b2]
    print(b2)
    plt.figure(figsize=(10, 8))
    plt.subplot(121)
    plt.imshow(img4)
    plt.plot(data, b1)
    plt.axis('off')
    plt.subplot(122)
    plt.plot(data, b1)
    plt.imshow(img2)
    plt.axis('off')
I came up with a very simple solution. It is far from optimal, but it works for this example and it is a good starting point. Unfortunately, this solution is not optimal for the blue channel, where the curve is not smooth, but it works for the green and red channels.
data contains the horizontal coordinate of the first pixel exceeding the threshold in each row. So consecutive points are separated by 1 pixel along the vertical axis and by data[i+1] - data[i] along the horizontal axis. These two values can be considered the two legs of a right triangle, and the hypotenuse is the distance we want to accumulate. So, here is the solution:
length = 0
for i in range(0, len(data) - 1):
    cathetus = data[i+1] - data[i]
    hypotenuse = (cathetus**2 + 1**2) ** 0.5  # note ** 0.5: writing **1/2 would divide the square by 2
    length += hypotenuse
print(length)
Update
I have come up with two solutions: a hardcoded one and one based on library functions. Let us start with the first one: a moving mean is a rather good estimator of the signal in the presence of noise. When the noise is not very strong and there is no missing data, you can use this approach. In the example below we select the points with x in [1,2,3], calculate the mean y of these points, and assign that mean to the coordinate x=2. Next we select the points with x in [2,3,4], and so on. As a result, we obtain a mean_data list with y coordinates and mean_x with x coordinates. We can calculate the length with the approach described above. You may also increase the amount of smoothing by averaging over 4 or more points from data.
mean_data = []
mean_x = range(1, len(data) - 1)
for i in range(0, len(data) - 2):
    mean_d = (data[i] + data[i+1] + data[i+2]) / 3
    mean_data.append(mean_d)
Another approach is to use the smoothing tools from the scipy package. One of them is shown below. When calculating the length, you will have to adjust to the new x axis xnew.
from scipy.interpolate import make_interp_spline  # scipy.interpolate.spline has been removed from recent SciPy
import numpy as np

# transform initial data to np.arrays
b1_ = np.array(b1)
data_ = np.array(data)
# create a new x axis with more data points
xnew = np.linspace(b1_.min(), b1_.max(), 50)  # 50 is the number of points in between
smoothed_data = make_interp_spline(b1_, data_)(xnew)
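For example, a sketch of how the length could then be computed on the resampled curve, reusing xnew and smoothed_data from the snippet above:
import numpy as np

# Segment lengths are the hypotenuses of the steps along both axes;
# the horizontal step is no longer a fixed 1 pixel, so take it from xnew.
dx = np.diff(xnew)
dy = np.diff(smoothed_data)
length = np.sum(np.sqrt(dx**2 + dy**2))
print(length)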
I'm trying to draw the best fitting line for given (x,y) data points.
The image below shows the data points (red pixels) and the estimated line (green), which I obtained using the following library call.
import numpy as np
m, c = np.linalg.lstsq(A, y)[0]
Documentation for the library module used.
We can see the data points are roughly symmetrically distributed. The problem is: why does this line not have a gradient similar to the long symmetric axis through the data points? Can you please explain whether this result is correct, and if so, how it gives the minimum error? (The line is drawn correctly using the gradient returned by the lstsq method.) Thank you.
EDIT
Here is the code I'm trying. The input image can be downloaded from here. In this code I have not forced the line to pass through the center of the pixel distribution. (Note: here I've used polyfit instead of lstsq; both give the same results.)
import numpy as np
import cv2
import math
img = cv2.imread('points.jpg',1);
h, w = img.shape[:2]
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
points = np.argwhere(gray>10) # (row, col) pairs of the red pixels
y = points[:,0]
x = points[:,1]
m, c = np.polyfit(x, y, 1) # calculate least square fit line
# calculate two cordinates (x1,y1),(x2,y2) on the line
angle = np.arctan(m)
x1, y1, length = 0, int(c), 500
x2 = int(round(math.ceil(x1 + length * np.cos(angle)),0))
y2 = int(round(math.ceil(y1 + length * np.sin(angle)),0))
# draw line on the color image
cv2.line(img, (x1, y1), (x2, y2), (0,255,0), 1, cv2.LINE_8)
# show output the image
cv2.namedWindow("Display window", cv2.WINDOW_AUTOSIZE);
cv2.imshow("Display window", img);
cv2.waitKey(0);
cv2.destroyAllWindows()
How can I have the line pass through the longest symmetric axis of the pixel distribution? Can I use principal component analysis?
It's hard to say why this would be the case. The bottom line is that I can't see the data you're using, and I can't see what the calculated slope and y-intercept are for that data.
Here are a couple of things that could explain what we're seeing:
(1) The density of data points is actually quite different than it appears to a casual glance and everything is working properly.
(2) You're sending the wrong arguments to the least squares function and you've got a GIGO situation. (I haven't used numpy's least squares algorithm, so I can't check this.)
(3) The scatter plot and the line plot don't agree on the scale of the axes.
(4) The least squares function in question is broken.
(5) You're not passing the same data to the least squares algorithm as you're passing to the plotting routine.
(6) The data formatting is funky so that the scatter plot and least squares routines are interpreting your data differently.
I can't know which of these is the problem, and unless it's (3), I expect we'd need more data to be able to distinguish between these possibilities.
Here's how I'd proceed if I were you: (1) Create a small artificial data set that sits on a line and pass it to the least squares function and see if it spits out the right numbers. See if these look right when plotted or not. (2) If this looks okay, record the output of the least squares algorithm, see if you can find another least squares program to calculate the slope and y intercept and compare them. If they're the same, it's probably not the routine, it's probably something to do with plotting.
If you get this far and it's still a mystery, let us know what you've found and maybe we can make another suggestion.
Good luck.
If the red dots truly represent your data, you are probably applying your linear regression function in a way that forces the line through the origin. How do I know? When using linear regression on two variables x and y, the line will pass through a few specific points, for example the point given by the average of x and the average of y, and, depending on your specification, a calculated or specified intercept on the y axis. If all values of x and y are positive, you will get a line that looks like yours if the line is forced through the origin. Not much more can be said before you provide some reproducible data and code.
EDIT:
I didn't have much luck with the reproducible sample provided, so I built an example with random numbers to elaborate on my original answer. I think statsmodels is a decent library for linear regression analysis. First, I'll address this earlier comment:
If all variables of x and y are positive, you will have a line that looks like yours if the line is forced through the origin.
You'll see an increasing effect of this the larger your numbers are (the further away from the origin your numbers are). Using sm.OLS(y, sm.add_constant(x)).fit() and sm.OLS(y, x).fit() on two different sets of numbers will show you exactly what I mean. First, I'll run a regression on the dataset below without an estimated constant (the line goes through the origin). This gives us a plot that resembles your original plot:
# Libraries
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
# Data
np.random.seed(123)
x = np.random.normal(size=2500) + 100
y = x * 2 + np.random.normal(size=2500) + 100
# Regression
results1 = sm.OLS(y,x).fit()
regLine_origin = x*results1.params[0]
# Plot
fig, ax = plt.subplots()
ax.scatter(x, y, c='red', s=4)
ax.scatter(x, regLine_origin, c = 'green', s = 1)
ax.patch.set_facecolor('black')
plt.show()
Next, I'll include a constant in the regression. Now, the yellow line will represent what I think you were after in your question:
# Libraries
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
# Data
np.random.seed(123)
x = np.random.normal(size=2500) + 100
y = x * 2 + np.random.normal(size=2500) + 100
# Regression
results1 = sm.OLS(y,x).fit()
results2 = sm.OLS(y,sm.add_constant(x)).fit()
regLine_origin = x*results1.params[0]
regLine_constant = results2.params[0] + x*results2.params[1]
# Plot
fig, ax = plt.subplots()
ax.scatter(x, y, c='red', s=4)
ax.scatter(x, regLine_origin, c = 'green', s = 1)
ax.scatter(x, regLine_constant, c = 'yellow', s = 1)
ax.patch.set_facecolor('black')
plt.show()
And lastly, we can take a look at what happens when the numbers are closer to the origin, so to speak. Here, I'll remove the +100 part when the numbers are produced:
# The following is changed in the snippet above:
# Data
x = np.random.normal(size=2500)
y = x * 2 + np.random.normal(size=2500)
And that's why I think your original regression line is set to go through the origin. Have a look at the statsmodels package. Here you can study the details of the estimate by running print(results2.summary()):
And as you've already seen in the snippets above, you'll have direct access to the regression coefficients by using results2.params.
Edit2: My explanation still isn't 100% valid. The x and y values will have to differ a bit in size to see this effect. You'll certainly find situations where the line goes through the origin no matter the size of the numbers.
Have a look at the different x labels, and you'll see what I mean.
I have a spatial set of data with Z values I want to interpolate using some matplotlib or scipy module. My XY points have a concave shape and I don't want interpolated values in the empty zone. Is there a method that easily allows the user to set a maximum distance between points, to avoid interpolation in the empty zone?
I struggled with the same question and found a workaround by re-using the kd-tree implementation that scipy itself uses for nearest-neighbour interpolation, masking the interpolated result array using the distances returned by the kd-tree query.
Consider the example code below:
import numpy as np
import scipy.interpolate
import matplotlib.pyplot as plt
# Generate some random data
xy = np.random.random((2**15, 2))
z = np.sin(10*xy[:,0]) * np.cos(10*xy[:,1])
grid = np.meshgrid(
np.linspace(0, 1, 512),
np.linspace(0, 1, 512)
)
# Interpolate
result1 = scipy.interpolate.griddata(xy, z, tuple(grid), 'linear')
# Show
plt.figimage(result1)
plt.show()
# Remove rectangular window
mask = np.logical_and.reduce((xy[:,0] > 0.2, xy[:,0] < 0.8, xy[:,1] > 0.2, xy[:,1] < 0.8))
xy, z = xy[~mask], z[~mask]
# Interpolate
result2 = scipy.interpolate.griddata(xy, z, tuple(grid), 'linear')
# Show
plt.figimage(result2)
plt.show()
This generates the following two images. Notice the strong interpolation artefacts caused by the missing rectangular window in the centre of the data.
Now if we run the code below on the same example data, the following image is obtained.
THRESHOLD = 0.01
from scipy.interpolate.interpnd import _ndim_coords_from_arrays
from scipy.spatial import cKDTree
# Construct kd-tree, functionality copied from scipy.interpolate
tree = cKDTree(xy)
xi = _ndim_coords_from_arrays(tuple(grid), ndim=xy.shape[1])
dists, indexes = tree.query(xi)
# Copy the original result, then mask missing values with NaNs
result3 = result2.copy()  # note: result2[:] would be a view, not a copy
result3[dists > THRESHOLD] = np.nan
# Show
plt.figimage(result3)
plt.show()
I realize it may not be the visual effect you're after exactly. Especially if your dataset is not very dense you'll need to have a high distance threshold value in order for legitimately interpolated data not to be masked. If your data is dense enough, you might be able to get away with a relatively small radius, or maybe come up with a smarter cut-off function. Hope that helps.
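One possible "smarter cut-off", as a sketch only: query several nearest neighbours and mask on their mean distance, which is less sensitive to a single isolated sample (k=5 and the reuse of THRESHOLD are arbitrary choices here, reusing tree, xi and result2 from above):
# Query the 5 nearest samples instead of 1 and mask on their mean distance,
# so a single isolated point no longer keeps a large area unmasked.
dists_k, _ = tree.query(xi, k=5)
result4 = result2.copy()
result4[dists_k.mean(axis=-1) > THRESHOLD] = np.nan
plt.figimage(result4)
plt.show()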
How can I plot an 2D array as an image with Matplotlib having the y scale relative to the power of two of the y value?
For instance the first row of my array will have a height in the image of 1, the second row will have a height of 4, etc. (units are irrelevant)
It's not simple to explain with words so look at this image please (that's the kind of result I want):
(example image: http://support.sas.com/rnd/app/da/new/802ce/iml/chap1/images/wavex1k.gif)
As you can see, the first row is 2 times smaller than the upper one, and so on.
For those interested in why I am trying to do this:
I have a pretty big array (10, 700000) of floats, representing the discrete wavelet transform coefficients of a sound file. I am trying to plot the scalogram using those coefficients.
I could copy the array x times until I get the desired image row size but the memory cannot hold so much information...
Have you tried to transform the axis? For example:
ax = subplot(111)
ax.yaxis.set_ticks([0, 2, 4, 8])
imshow(data)
This means there must be gaps in the data for the non-existent coordinates, unless there is a way to provide a transform function instead of just lists (never tried).
Edit:
I admit it was just a lead, not a complete solution. Here is what I meant in more details.
Let's assume you have your data in an array, a. You can use a transform like this one:
class arr(object):
    @staticmethod
    def mylog2(x):
        lx = 0
        while x > 1:
            x >>= 1
            lx += 1
        return lx

    def __init__(self, array):
        self.array = array

    def __getitem__(self, index):
        return self.array[arr.mylog2(index + 1)]

    def __len__(self):
        return 1 << len(self.array)
Basically it will transform the first coordinate of an array or list with the mylog2 function (which you can replace with whatever you wish - it's home-made as a simplification of log2). The advantage is that you can re-use it for another transform should you need it, and you can easily control it too.
Then map your array to this one, which doesn't make a copy but a local reference in the instance:
b = arr(a)
Now you can display it, for example:
ax = subplot(111)
ax.yaxis.set_ticks([16, 8, 4, 2, 1, 0])
axis([-0.5, 4.5, 31.5, 0.5])
imshow(b, interpolation="nearest")
Here is a sample (with an array containing random values):
(sample image: http://img691.imageshack.us/img691/8883/clipboard01f.png)
The best way I've found to make a scalogram using matplotlib is to use imshow, similar to the implementation of specgram. Using rectangles is slow, because you're having to make a separate glyph for each value. Similarly, you don't want to have to bake things into a uniform NumPy array, because you'll probably run out of memory fast, since your highest level is going to be about as long as half your signal.
Here's an example using SciPy and PyWavelets:
from pylab import *
import pywt
import scipy.io.wavfile as wavfile
# Find the highest power of two less than or equal to the input.
def lepow2(x):
    return int(2 ** floor(log2(x)))  # int() so the result can be used as a slice index

# Make a scalogram given an MRA tree.
def scalogram(data):
    bottom = 0

    vmin = min(map(lambda x: min(abs(x)), data))
    vmax = max(map(lambda x: max(abs(x)), data))

    gca().set_autoscale_on(False)

    for row in range(0, len(data)):
        scale = 2.0 ** (row - len(data))

        imshow(
            array([abs(data[row])]),
            interpolation='nearest',
            vmin=vmin,
            vmax=vmax,
            extent=[0, 1, bottom, bottom + scale])

        bottom += scale

# Load the signal, take the first channel, limit length to a power of 2 for simplicity.
rate, signal = wavfile.read('kitten.wav')
signal = signal[0:lepow2(len(signal)), 0]
tree = pywt.wavedec(signal, 'db5')
# Plotting.
gray()
scalogram(tree)
show()
You may also want to scale values adaptively per-level.
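For instance, a per-level normalization could look roughly like this, replacing the global vmin/vmax behaviour above (only a sketch, reusing tree and scalogram from the snippet):
import numpy as np

# Normalize each decomposition level independently before plotting,
# so that quiet levels are not washed out by the loudest one.
norm_tree = [np.abs(level) / max(np.abs(level).max(), 1e-12) for level in tree]
scalogram(norm_tree)
show()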
This works pretty well for me. The only problem I have is that matplotlib creates a hairline-thin space between levels. I'm still looking for a way to fix this.
P.S. - Even though this question is pretty old now, I figured I'd respond here, because this page came up on Google when I was looking for a method of creating scalograms using MPL.
You can look at matplotlib.image.NonUniformImage. But that only helps with having nonuniform axes - I don't think you're going to be able to plot adaptively like you want to (each point in the image is always going to have the same area, so you would have to repeat the wider rows multiple times). Is there any reason you need to plot the full array? Obviously the full detail isn't going to show up in any plot, so I would suggest heavily downsampling the original matrix so you can copy rows as required to get the image without running out of memory.
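A rough sketch of that downsample-and-repeat idea (the target width and the coefficient array here are stand-ins; real wavelet levels of different lengths may need to be handled one by one):
import numpy as np

coeffs = np.random.rand(10, 700000)   # stand-in for the (10, 700000) coefficient array
target_cols = 2000                    # heavily downsample along the time axis
step = coeffs.shape[1] // target_cols
small = coeffs[:, ::step]
# Repeat each row so that the i-th level is 2**i pixels tall, then imshow the result.
img = np.vstack([np.repeat(small[i:i+1, :], 2 ** i, axis=0)
                 for i in range(small.shape[0])])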
If you want both to be able to zoom and save memory, you could do the drawing "by hand". Matplotlib allows you to draw rectangles (they would be your "rectangular pixels"):
from matplotlib import patches
axes = subplot(111)
axes.add_patch(patches.Rectangle((0.2, 0.2), 0.5, 0.5))
Note that the extents of the axes are not set by add_patch(), but you can set them yourself to the values you want (axes.set_xlim,…).
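A minimal sketch of that idea, with rows whose heights double at each level (the array, colormap, and sizes are all made up):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches, cm

a = np.random.rand(5, 16)             # hypothetical coefficient array, one row per level
fig, axes = plt.subplots()
bottom = 0.0
for row in range(a.shape[0]):
    height = 2.0 ** row               # each row twice as tall as the previous one
    width = 1.0 / a.shape[1]
    for col in range(a.shape[1]):
        axes.add_patch(patches.Rectangle((col * width, bottom), width, height,
                                         color=cm.gray(a[row, col])))
    bottom += height
axes.set_xlim(0, 1)                   # extents are not set by add_patch()
axes.set_ylim(0, bottom)
plt.show()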
PS: It looks to me like thrope's response (matplotlib.image.NonUniformImage) can actually do what you want, in a simpler way than the "manual" method described here!