I'm building an automated game bot in Python on OS X 10.8.2, and while researching Python GUI automation I discovered autopy. The mouse manipulation API is great, but it seems that the screen capture methods rely on deprecated OpenGL methods...
Are there any efficient ways of getting the color value of a pixel in OS X? The only way I can think of now is to use os.system("screencapture foo.png"), but the process seems to have unnecessary overhead as I'll be polling very quickly.
A small improvement, but using the TIFF compression option for screencapture is a bit quicker:
$ time screencapture -t png /tmp/test.png
real 0m0.235s
user 0m0.191s
sys 0m0.016s
$ time screencapture -t tiff /tmp/test.tiff
real 0m0.079s
user 0m0.028s
sys 0m0.026s
This does have a lot of overhead, as you say (the subprocess creation, writing/reading from disc, compressing/decompressing).
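For reference, a rough sketch of that subprocess-based polling (assuming the screencapture CLI and PIL/Pillow are available; the helper name is made up) might look like this - every poll pays for process creation plus a round trip through the filesystem:

import subprocess
from PIL import Image

def pixel_via_screencapture(x, y, path="/tmp/screen.tiff"):
    # Capture silently (-x) as TIFF, then read the pixel back with PIL
    subprocess.check_call(["screencapture", "-x", "-t", "tiff", path])
    im = Image.open(path)
    return im.getpixel((x, y))

print(pixel_via_screencapture(100, 100))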
Instead, you could use PyObjC to capture the screen using CGWindowListCreateImage. I found it took about 70 ms (~14 fps) to capture a 1680x1050 pixel screen and have the values accessible in memory.
A few random notes:
Importing the Quartz.CoreGraphics module is the slowest part, about 1 second. The same is true for importing most of the PyObjC modules. It's unlikely to matter in this case, but for short-lived processes you might be better off writing the tool in ObjC.
Specifying a smaller region is a bit quicker, but not hugely (~40 ms for a 100x100px block, ~70 ms for 1680x1050). Most of the time seems to be spent in just the CGDataProviderCopyData call - I wonder if there's a way to access the data directly, since we don't need to modify it?
The ScreenPixel.pixel function is pretty quick, but accessing large numbers of pixels is still slow (since 0.01 ms * 1680*1050 is about 17 seconds) - if you need to access lots of pixels, it's probably quicker to struct.unpack_from them all in one go (see the sketch after the code below).
Here's the code:
import time
import struct

import Quartz.CoreGraphics as CG


class ScreenPixel(object):
    """Captures the screen using CoreGraphics, and provides access to
    the pixel values.
    """

    def capture(self, region = None):
        """region should be a CGRect, something like:

        >>> import Quartz.CoreGraphics as CG
        >>> region = CG.CGRectMake(0, 0, 100, 100)
        >>> sp = ScreenPixel()
        >>> sp.capture(region=region)

        The default region is CG.CGRectInfinite (captures the full screen)
        """

        if region is None:
            region = CG.CGRectInfinite
        else:
            # TODO: Odd widths cause the image to warp. This is likely
            # caused by the offset calculation in ScreenPixel.pixel, and
            # could be modified to allow odd widths
            if region.size.width % 2 > 0:
                emsg = "Capture region width should be even (was %s)" % (
                    region.size.width)
                raise ValueError(emsg)

        # Create screenshot as CGImage
        image = CG.CGWindowListCreateImage(
            region,
            CG.kCGWindowListOptionOnScreenOnly,
            CG.kCGNullWindowID,
            CG.kCGWindowImageDefault)

        # Intermediate step, get pixel data as CGDataProvider
        prov = CG.CGImageGetDataProvider(image)

        # Copy data out of CGDataProvider, becomes string of bytes
        self._data = CG.CGDataProviderCopyData(prov)

        # Get width/height of image
        self.width = CG.CGImageGetWidth(image)
        self.height = CG.CGImageGetHeight(image)

    def pixel(self, x, y):
        """Get pixel value at given (x, y) screen coordinates

        Must call capture first.
        """

        # Pixel data is unsigned char (8-bit unsigned integer),
        # and there are four per pixel (blue, green, red, alpha)
        data_format = "BBBB"

        # Calculate offset, based on
        # http://www.markj.net/iphone-uiimage-pixel-color/
        offset = 4 * ((self.width*int(round(y))) + int(round(x)))

        # Unpack data from string into Python'y integers
        b, g, r, a = struct.unpack_from(data_format, self._data, offset=offset)

        # Return BGRA as RGBA
        return (r, g, b, a)


if __name__ == '__main__':
    # Timer helper-function
    import contextlib

    @contextlib.contextmanager
    def timer(msg):
        start = time.time()
        yield
        end = time.time()
        print "%s: %.02fms" % (msg, (end-start)*1000)

    # Example usage
    sp = ScreenPixel()

    with timer("Capture"):
        # Take screenshot (takes about 70ms for me)
        sp.capture()

    with timer("Query"):
        # Get pixel value (takes about 0.01ms)
        print sp.width, sp.height
        print sp.pixel(0, 0)

    # To verify screen-cap code is correct, save all pixels to PNG,
    # using http://the.taoofmac.com/space/projects/PNGCanvas
    from pngcanvas import PNGCanvas
    c = PNGCanvas(sp.width, sp.height)
    for x in range(sp.width):
        for y in range(sp.height):
            c.point(x, y, color = sp.pixel(x, y))

    with open("test.png", "wb") as f:
        f.write(c.dump())
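As mentioned in the notes above, if you need lots of pixels it is probably quicker to unpack them all at once rather than call ScreenPixel.pixel in a loop. A minimal sketch (assuming capture() has been called and that _data holds tightly packed 8-bit BGRA values with no row padding):

import struct

sp = ScreenPixel()
sp.capture()

# One unpack call for the whole buffer: (b, g, r, a, b, g, r, a, ...)
fmt = "B" * (sp.width * sp.height * 4)
flat = struct.unpack_from(fmt, sp._data)
b, g, r, a = flat[0:4]   # first (top-left) pixel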
I came across this post while searching for a solution to get screenshots on Mac OS X for real-time processing. I tried using ImageGrab from PIL as suggested in some other posts, but couldn't get the data fast enough (only about 0.5 fps).
The answer https://stackoverflow.com/a/13024603/3322123 in this post, which uses PyObjC, saved my day! Thanks @dbr!
However, my task requires getting all pixel values rather than just a single pixel, and also addresses the third note by @dbr, so I added a new method to this class to get a full image, in case anyone else needs it.
The image data are returned as a numpy array of shape (height, width, 3), which can be used directly for post-processing in numpy or OpenCV etc. Getting individual pixel values from it also becomes pretty trivial using numpy indexing.
I tested the code with a 1600 x 1000 screenshot - getting the data using capture() took ~30 ms and converting it to a numpy array with getimage() took only ~50 ms on my MacBook. So now I have >10 fps, and even faster for smaller regions.
import numpy as np

def getimage(self):
    imgdata = np.fromstring(self._data, dtype=np.uint8).reshape(len(self._data) / 4, 4)
    return imgdata[:self.width * self.height, :-1].reshape(self.height, self.width, 3)
Note that I throw away the "alpha" channel from the four BGRA channels.
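For example, assuming getimage has been added as a method of the ScreenPixel class above, usage could look like this:

sp = ScreenPixel()
sp.capture()

img = sp.getimage()   # numpy array with shape (height, width, 3), channels in B, G, R order
print(img.shape)
print(img[0, 0])      # the top-left pixel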
This was all so very helpful that I had to come back to comment, but I don't have the reputation. I do, however, have sample code combining the answers above for a lightning-quick screen capture and save, thanks to @dbr and @qqg!
import time
import numpy as np
from scipy.misc import imsave
import Quartz.CoreGraphics as CG
image = CG.CGWindowListCreateImage(
    CG.CGRectInfinite,
    CG.kCGWindowListOptionOnScreenOnly,
    CG.kCGNullWindowID,
    CG.kCGWindowImageDefault)
prov = CG.CGImageGetDataProvider(image)
_data = CG.CGDataProviderCopyData(prov)

width = CG.CGImageGetWidth(image)
height = CG.CGImageGetHeight(image)

imgdata = np.fromstring(_data, dtype=np.uint8).reshape(len(_data) / 4, 4)
numpy_img = imgdata[:width * height, :-1].reshape(height, width, 3)
imsave('test_fast.png', numpy_img)
Related
Let a numpy array video of shape (T, w, h, 3) be given. Here T is a positive integer representing the number of frames, w is a positive integer representing the width, and h is a positive integer representing the height. Every entry of video is an integer from 0 to 255. In other words, video is a numpy array that represents a video in the sense that video[t] is an RGB image for every non-negative integer t < T. After video is given, another array of floats time of shape (T,) is given. This array satisfies time[0] = 0 and time[t] < time[t+1] for every non-negative integer t < T-1. An example of the above situation is given here:
import numpy as np

shape = (200, 500, 1000, 3)
random = np.random.randint(0, 256, shape, dtype=np.uint16)

time = np.zeros((shape[0]), dtype=np.float16)
time[0] = 0
for i in range(1, shape[0]):
    x = np.random.random_sample()
    time[i] = time[i-1] + x
My goal is to save video and time as a playable video file such that:
The video file is in format of either avi or mp4 (so that we can just double click it and play it).
Each frame of the video respects the time array in the following sense: for every non-negative integer t < T-1, the viewer is seeing the picture video[t] during the time period from time[t] to time[t+1]. The moment time[T-1] is the end of the video.
If possible, keep the original size (in the given example the size is (500,1000)).
How can I achieve this? I tried using OpenCV's video writer, and it seems I have to enter some FPS information which I do not have, because the time array can be very non-uniform in terms of when each picture is displayed.
That is impossible with OpenCV. OpenCV's VideoWriter only supports fixed/constant frame rate. Anything based on that will require rounding to the nearest frame time and/or higher-than-necessary frame rates and duplicated frames (or rather frames that contain no change).
You want presentation timestamps (PTS). That's an inherent aspect of media containers and streams. A word of caution: some video players may assume a "reasonable" time span between frames, and may glitch otherwise, like becoming laggy/unresponsive because the whole GUI is tied to video timing... That's the fault of the video player though.
Use PyAV. It's the only ffmpeg wrapper for python I know of that actually uses API calls rather than messing around with subprocesses.
Here's the relevant example: https://github.com/PyAV-Org/PyAV/blob/main/examples/numpy/generate_video_with_pts.py
In short: set frame.pts = int(round(my_pts / stream.codec_context.time_base)) where my_pts is something in seconds.
I wrote that example, derived from the sibling "fixed rate" example. I put some effort into getting the ffmpeg API usage "right" (time bases, containers/streams/contexts) but if it happens to fail or act up, you're allowed and encouraged to question what I did there.
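A minimal sketch of that approach, assuming PyAV with libx264 and a millisecond time base (the toy video/time arrays here just stand in for yours):

from fractions import Fraction

import av
import numpy as np

# Placeholder inputs: video is (T, h, w, 3) uint8 RGB, time is (T,) seconds
T, h, w = 10, 480, 640
video = np.random.randint(0, 256, (T, h, w, 3), dtype=np.uint8)
time = np.concatenate(([0.0], np.cumsum(np.random.random_sample(T - 1))))

container = av.open("out.mp4", mode="w")
stream = container.add_stream("libx264", rate=30)     # nominal rate; the PTS rule below is what matters
stream.width = w
stream.height = h
stream.pix_fmt = "yuv420p"
stream.codec_context.time_base = Fraction(1, 1000)    # millisecond time base

for img, t in zip(video, time):
    frame = av.VideoFrame.from_ndarray(img, format="rgb24")
    # Presentation timestamp expressed in units of the stream time base
    frame.pts = int(round(float(t) / stream.codec_context.time_base))
    for packet in stream.encode(frame):
        container.mux(packet)

for packet in stream.encode():                         # flush the encoder
    container.mux(packet)
container.close()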
The solution to your problem is to generate all the video frames necessary for a given FPS value; since a video needs a constant frame rate, you first have to decide at which granularity you want your video.
After you have decided on the FPS value, you generate all the required video frames, so that you can use the export-to-video method with a constant frame rate.
The numpy array representing a frame image stays the same in the video array as the last one displayed until it is time to change to another one. The chosen frame rate FPS then determines how accurately the changes to a new frame image hit the specified time values.
Below is Python code with an improved version of generating the time values. It generates all the video frames, and the explanation is carried by self-explanatory variable names. The logic of the algorithm is to generate a single frame image and repeat it as a frame of the resulting video as long as the next value on the time axis is not reached. Once the next value on the time axis is reached, a new image is generated and repeated as long as the video time does not exceed the next time value. The code writes the created data to an .mp4 file:
import numpy as np
import cv2 as cv

FPS = 15
fps_timeDelta = 1 / FPS

noOfImages = 5              # 200
imageShape = (210, 297, 3)  # (500, 1000, 3)

vidWriter = cv.VideoWriter(
    'opencv_writeVideo.mp4',
    cv.VideoWriter_fourcc(*'MPEG'),
    FPS, (imageShape[1], imageShape[0])
)

vidFrameTime = np.concatenate(
    (np.zeros(1), np.add.accumulate(
        np.random.random_sample(size=noOfImages)))
)

vidTime = 0.0
indxVidFrameTime = 1
singleImageRGB = np.random.randint(
    0, 256, imageShape, dtype=np.uint8)
cv.imshow("singleImageRGB", singleImageRGB / 255)
cv.waitKey(0)

while vidTime <= vidFrameTime[-1]:
    vidTime += fps_timeDelta
    if vidTime >= vidFrameTime[indxVidFrameTime]:
        singleImageRGB = np.random.randint(0, 256, imageShape, dtype=np.uint8)
        indxVidFrameTime += 1
    vidWriter.write(singleImageRGB)

vidWriter.release()
I've been fooling around lately with taking the webcam's video stream and giving it a pixel-dependent time delay.
A very simple example for that idea is the famous rolling shutter, but when applied in order of seconds instead of 1/100ths, it looks like this https://youtu.be/mQ0hS7l9ckY
Now, rolling shutter is fun and all, but I want something more general. I want a delay map, a (height, width, 3) shaped array that tells me how far back to go in the video. Pseudo-code for this would be
output_image[y, x, c] = video_cache[delay_map[y,x,c], y, x, c]
where the first index of the video cache is time, y,x are self-explanatory, and c is the color channel (BGR because open cv is weird).
In essence, each pixel of the output is a pixel of the video at the same position, but at a time determined by the delay map at the very same position.
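A small self-contained sketch of that pseudo-code using plain numpy fancy indexing (the shapes and random data here are just placeholders):

import numpy as np

# video_cache: (T, height, width, 3); delay_map: integer delay per pixel/channel
T, height, width = 8, 4, 6
video_cache = np.random.randint(0, 256, (T, height, width, 3), dtype=np.uint8)
delay_map = np.random.randint(0, T, (height, width, 3))

# Open index grids broadcast against delay_map, so that
# output_image[y, x, c] == video_cache[delay_map[y, x, c], y, x, c]
y, x, c = np.ogrid[0:height, 0:width, 0:3]
output_image = video_cache[delay_map, y, x, c]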
Here's the solution I have now: I flattened everything, I access the video cache similar to how you unravel multi-index nonsense, and once I'm done I reshape the result into an image.
This solution works pretty fast, and I'm pretty proud of it. It almost keeps up with my webcam's frame rate (I think I average about 20 of these per second).
I think the flattening and reshaping of each frame costs me some time, and if I could get rid of those I'd get much better results.
Link to the whole file at the bottom.
Here's a skeleton of my implementation.
I have a class called CircularCacheDelayAccess. It stores a cache of video frames (with given number of frames, called cache_size in my implementation). It enables you to store frames, and get the delay-mapped frame.
Instead of pushing all the frames around each time I store a new one, I keep an index that goes around in a circle, and video[delay=3] would be found via something like cache[index-3]. Thanks to python's funny negative index tricks, I don't even have to get the positive modulo.
The delay_map is actually a float array; when I use circ_cache.getFrame I input the integer part of delay_map.flatten(), and then I use the fractional part to interpolate between frames.
import numpy as np

class CircularCacheDelayAccess:
    def __init__(self, img_shape: tuple, cache_size: int):
        self.image_shape = img_shape
        self.cache_size = cache_size
        # some useful stuff
        self.multi_index_shape = (cache_size,) + img_shape
        self.image_size = int(np.prod(img_shape))
        self.size = cache_size * self.image_size

        # the index, going around in circles
        self.cache_index = 0

        self.cache = np.empty(self.size)

        # raveled_image_indices is a running index over a frame; it is the same thing as writing
        # y, x, c = np.mgrid[0:height, 0:width, 0:3]
        # raveled_image_indices = c + 3 * (x + width * y)
        # but it's a lot easier
        self.raveled_image_indices = np.arange(self.image_size)

    def store(self, image: np.ndarray):
        # (in my implementation I check that the shape matches and raise a ValueError if it does not)
        self.cache_index = (self.cache_index + 1) % self.cache_size
        # since cache holds entire image frames, the start of each frame is index * image size
        cIndex = self.image_size * self.cache_index
        self.cache[cIndex: cIndex + self.image_size] = image.flatten()

    def getFrame(self, delay_map: np.ndarray):
        # delay_map may either have shape == self.image_shape, or shape = (self.image_size,)
        # (more asserts, for the shape of delay_map, and to check its values do not exceed the cache size)
        # (if delay_map.shape == image_shape, I flatten it. If we were already given a flattened version,
        # there's no need to do so)
        frame = self.cache[self.image_size * (self.cache_index - delay_map)
                           + self.raveled_image_indices].reshape(self.image_shape)
        return frame
As I've already stated, this works pretty well, but I think I could get it to work better if I could just side-step the flatten and reshape steps.
Also, keeping a flattened version of an array that makes sense in its full-shaped form is pretty awkward.
And, I've mentioned the interpolation part. It felt wrong to do that in CircularCacheDelayAccess, but doing the interpolation after I call getFrame twice means I need the fractional part of delay_map to be in the full-shaped form, and I need the int part flattened, which is pretty silly.
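Here is a small sketch of what that interpolation looks like from the outside (assuming circ_cache is a CircularCacheDelayAccess instance and delay_map is a float array in the full image shape):

import numpy as np

delay_int = np.floor(delay_map).astype(int)
delay_frac = delay_map - delay_int                       # full-shaped fractional part

newer = circ_cache.getFrame(delay_int.flatten())         # frame at floor(delay)
older = circ_cache.getFrame((delay_int + 1).flatten())   # one frame further back
output_image = (1 - delay_frac) * newer + delay_frac * older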
Here are some fun examples which would probably be pretty hard to understand without seeing the video, but are still fun to look at. It looks even better with a face, but I don't think I should show my face here, so sorry about that:
horizontal rolling shutter, color delay psychedelia, my weirdest effect so far
And here is a link to the entire code, with capture and stuff if you wanna mess around with it and read the entire code.
Thanks in advance!
I have the following snippet:
from aicspylibczi import CziFile
from pathlib import Path
pth = Path('/Volumes/USB/20x_HE.czi')
czi = CziFile(pth)
image, shp = czi.read_image(C=0, M=0) # very slow
The parameters C and M are there to slice the big array into little numpy pieces.
The file is 3.4 GB and it takes too long (on a MacBook with 8 GB RAM), so I always abort it.
I think that's not okay, because I only want the first slice of the array, not the whole matrix.
You can try the slideio Python package (http://slideio.com). It makes use of internal image pyramids; you can read the image partially at high resolution or the whole image at low resolution.
The code below rescales the image so that the width of the delivered raster will be 500 pixels (the height is computed to keep the image size ratio).
import slideio
slide = slideio.open_slide(file_path="/data/a.czi", driver_id="CZI")
scene = slide.get_scene(0)
block = scene.read_block(size=(500,0))
By slice do you mean the first slice of a z-stack? The package you are using, aicspylibczi, allows you to specify a z coordinate e.g. to read the first z-slice:
image, shp = czi.read_image(C=0, M=0, Z=0)
I am trying to draw a textured square using Python, OpenGL and GLFW.
Here are all the images I need to show you.
Sorry for the way of posting images, but I don't have enough reputation to post more than 2 links (and I can't even post a photo).
I am getting this:
[the second image from the album]
Instead of that:
[the first image from the album]
BUT if I use some different jpg files:
some of them are being displayed properly,
some of them are being displayed properly until I rotate them 90 degrees (I mean using numpy's rot90 function on an array with RGB components) and then send them to the GPU. It then looks like this (the colors don't change, I only get some distortion):
Before rotation:
[the third image from the album]
After rotation:
[the fourth image from the album]
It all depends on the file.
Does anybody know what I'm doing wrong? Or see anything that I don't see?
Code:
First, I do the thing with initializing glfw, creating a window, etc.
if __name__ == '__main__':
    import sys
    import glfw
    import OpenGL.GL as gl
    import numpy as np
    from square import square
    from imio import imread, rgb_flag, swap_rb
    from txio import tx2gpu, txrefer

    glfw.glfwInit()
    win = glfw.glfwCreateWindow(800, 800, "Hello")
    glfw.glfwMakeContextCurrent(win)
    glfw.glfwSwapInterval(1)
    gl.glClearColor(0.75, 0.75, 0.75, 1.0)
Then I load an image using OpenCV imread function and I remember about swapping red with blue. Then I send the image to gpu - I will describe tx2gpu in a minute.
    image = imread('../imtools/image/ummagumma.jpg')
    if not rgb_flag: swap_rb(image)
    #image = np.rot90(image)
    tx_id = tx2gpu(image)
The swap_rb() function (defined in a different file, imported):
def swap_rb(mat):
    X = mat[:,:,2].copy()
    mat[:,:,2] = mat[:,:,0]
    mat[:,:,0] = X
    return mat
Then comes the main loop (in a while I will describe txrefer and square):
    while not glfw.glfwWindowShouldClose(win):
        gl.glClear(gl.GL_COLOR_BUFFER_BIT)
        txrefer(tx_id); square(2); txrefer(0)
        glfw.glfwSwapBuffers(win)
        glfw.glfwPollEvents()
And here is the end of the main function:
    glfw.glfwDestroyWindow(win)
    glfw.glfwTerminate()
NOW IMPORTANT THINGS:
A function that defines a square looks like this:
def square(scale=1.0, color=None, solid=True):
    s = scale*.5
    if type(color) != type(None):
        if solid:
            gl.glBegin(gl.GL_TRIANGLE_FAN)
        else:
            gl.glBegin(gl.GL_LINE_LOOP)
        gl.glColor3f(*color[0][:3]); gl.glVertex3f(-s,-s,0)
        gl.glColor3f(*color[1][:3]); gl.glVertex3f(-s,s,0)
        gl.glColor3f(*color[2][:3]); gl.glVertex3f(s,s,0)
        gl.glColor3f(*color[3][:3]); gl.glVertex3f(s,-s,0)
    else:
        if solid:
            gl.glBegin(gl.GL_TRIANGLE_FAN)
        else:
            gl.glBegin(gl.GL_LINE_LOOP)
        gl.glTexCoord2f(0,0); gl.glVertex3f(-s,-s,0)
        gl.glTexCoord2f(0,1); gl.glVertex3f(-s,s,0)
        gl.glTexCoord2f(1,1); gl.glVertex3f(s,s,0)
        gl.glTexCoord2f(1,0); gl.glVertex3f(s,-s,0)
    gl.glEnd()
And the texturing functions look like this:
import OpenGL.GL as gl

unit_symbols = [
    gl.GL_TEXTURE0, gl.GL_TEXTURE1, gl.GL_TEXTURE2,
    gl.GL_TEXTURE3, gl.GL_TEXTURE4,
    gl.GL_TEXTURE5, gl.GL_TEXTURE6, gl.GL_TEXTURE7,
    gl.GL_TEXTURE8, gl.GL_TEXTURE9,
    gl.GL_TEXTURE10, gl.GL_TEXTURE11, gl.GL_TEXTURE12,
    gl.GL_TEXTURE13, gl.GL_TEXTURE14,
    gl.GL_TEXTURE15, gl.GL_TEXTURE16, gl.GL_TEXTURE17,
    gl.GL_TEXTURE18, gl.GL_TEXTURE19,
    gl.GL_TEXTURE20, gl.GL_TEXTURE21, gl.GL_TEXTURE22,
    gl.GL_TEXTURE23, gl.GL_TEXTURE24,
    gl.GL_TEXTURE25, gl.GL_TEXTURE26, gl.GL_TEXTURE27,
    gl.GL_TEXTURE28, gl.GL_TEXTURE29,
    gl.GL_TEXTURE30, gl.GL_TEXTURE31]

def tx2gpu(image, flip=True, unit=0):
    gl.glActiveTexture(unit_symbols[unit])
    texture_id = gl.glGenTextures(1)
    gl.glBindTexture(gl.GL_TEXTURE_2D, texture_id)
    gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_WRAP_S, gl.GL_REPEAT)
    gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_WRAP_T, gl.GL_REPEAT)
    gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_MAG_FILTER, gl.GL_LINEAR)
    gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_MIN_FILTER, gl.GL_LINEAR)
    yres, xres, cres = image.shape
    from numpy import flipud
    gl.glTexImage2D(gl.GL_TEXTURE_2D, 0, gl.GL_RGB, xres, yres, 0, gl.GL_RGB, gl.GL_UNSIGNED_BYTE, flipud(image))
    gl.glBindTexture(gl.GL_TEXTURE_2D, 0)
    return texture_id

def txrefer(tex_id, unit=0):
    gl.glColor4f(1, 1, 1, 1)
    gl.glActiveTexture(unit_symbols[unit])
    if tex_id != 0:
        gl.glEnable(gl.GL_TEXTURE_2D)
        gl.glBindTexture(gl.GL_TEXTURE_2D, tex_id)
    else:
        gl.glBindTexture(gl.GL_TEXTURE_2D, 0)
        gl.glDisable(gl.GL_TEXTURE_2D)
The problem you have there is an alignment issue. OpenGL's initial alignment setting for "unpacking" images is that each row starts on a 4-byte boundary. This bites you when the image width times the bytes per pixel is not a multiple of 4. But it's easy enough to change:
glPixelStorei(GL_UNPACK_ALIGNMENT, 1)
would probably do the trick for you. Call it right before glTex[Sub]Image.
Another thing: your unit_symbols list is completely unnecessary. The OpenGL specification explicitly says that GL_TEXTUREn = GL_TEXTURE0 + n, so you can simply do glActiveTexture(GL_TEXTURE0 + n). Also, when loading a texture image the unit is largely irrelevant; the only way it matters is that loading a texture requires binding it, which happens in a texture unit, and a texture can be bound in any texture unit desired.
Personally, I use the highest texture unit for loading images, to avoid accidentally clobbering required state.
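Putting both suggestions together, a sketch of how tx2gpu from the question could look (untested against your setup, but it shows where the calls go):

import OpenGL.GL as gl
from numpy import flipud

def tx2gpu(image, flip=True, unit=0):
    gl.glActiveTexture(gl.GL_TEXTURE0 + unit)   # no lookup table needed
    texture_id = gl.glGenTextures(1)
    gl.glBindTexture(gl.GL_TEXTURE_2D, texture_id)
    gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_WRAP_S, gl.GL_REPEAT)
    gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_WRAP_T, gl.GL_REPEAT)
    gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_MAG_FILTER, gl.GL_LINEAR)
    gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_MIN_FILTER, gl.GL_LINEAR)
    yres, xres, cres = image.shape
    # Rows in the numpy array are tightly packed, so drop the 4-byte row alignment
    gl.glPixelStorei(gl.GL_UNPACK_ALIGNMENT, 1)
    gl.glTexImage2D(gl.GL_TEXTURE_2D, 0, gl.GL_RGB, xres, yres, 0,
                    gl.GL_RGB, gl.GL_UNSIGNED_BYTE, flipud(image))
    gl.glBindTexture(gl.GL_TEXTURE_2D, 0)
    return texture_id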
So I have an array (it's large - 2048x2048), and I would like to do some element-wise operations that depend on where the elements are. I'm very confused about how to do this (I was told not to use for loops, and when I tried that my IDE froze and it was going really slow).
Onto the question:
h = aperatureimage
h[:,:] = 0
indices = np.where(aperatureimage>1)
for True in h:
h[index] = np.exp(1j*k*z)*np.exp(1j*k*(x**2+y**2)/(2*z))/(1j*wave*z)
So I have an index, which is (I'm assuming here) essentially a 'cropped' version of my larger aperatureimage array. *Note: Aperature image is a grayscale image converted to an array, it has a shape or text on it, and I would like to find all the 'white' regions of the aperature and perform my operation.
How can I access the individual x/y values of index so that I can perform my exponential operation? When I try index[:,None], the program spits out 'ValueError: broadcast dimensions too large'. I also get 'array is not broadcastable to correct shape'. Any help would be appreciated!
One more clarification: x and y are the only values I would like to change (essentially the points in my array where there is white, z, k, and whatever else are defined previously).
EDIT:
I'm not sure the code I posted above is correct; it returns two empty arrays. When I do this, though:
index = (aperatureimage==1)
print len(index)
Actually, nothing I've done so far works correctly. I have a 2048x2048 image with a 128x128 white square in the middle of it. I would like to convert this image to an array, look through all the values and determine the index values (x,y) where the array is not black (I only have white/black, bilevel image didn't work for me). I would then like to take all the values (x,y) where the array is not 0, and multiply them by the h[index] value listed above.
I can post more information if necessary. If you can't tell, I'm stuck.
EDIT2: Here's some code that might help - I think I have the problem above solved (I can now access members of the array and perform operations on them). But - for some reason the Fx values in my for loop never increase, it loops Fy forever....
import sys, os
from scipy.signal import *
import numpy as np
import Image, ImageDraw, ImageFont, ImageOps, ImageEnhance, ImageColor

def createImage(aperature, type):
    imsize = aperature*8 # Add 0 padding to make it nice
    middle = imsize/2 # The middle (physical 0) of our image will be the imagesize/2
    im = Image.new("L", (imsize,imsize)) # Make a grayscale image with imsize*imsize pixels
    draw = ImageDraw.Draw(im) # Create a new draw method
    box = ((middle-aperature/2, middle-aperature/2), (middle+aperature/2, middle+aperature/2)) # Bounding box for aperature
    if type == 'Rectangle':
        draw.rectangle(box, fill = 'white') # Draw rectangle in the box and color it white
    del draw
    return im, middle

def Diffraction(aperaturediameter = 1, type = 'Rectangle', z = 2000000, wave = .001):
    # Constants
    deltaF = 1/8 # Image will be 8mm wide
    z = 1/3.
    wave = 0.001
    k = 2*pi/wave

    # Now let's get to work
    aperature = aperaturediameter * 128 # Aperaturediameter (in mm) to some pixels
    im, middle = createImage(aperature, type) # Create an image depending on type of aperature
    aperaturearray = np.array(im) # Turn image into numpy array

    # Fourier Transform of Aperature
    Ta = np.fft.fftshift(np.fft.fft2(aperaturearray))/(len(aperaturearray))

    # Transforming and calculating of Transfer Function Method
    H = aperaturearray.copy() # Copy image so H (transfer function) has the same dimensions as aperaturearray
    H[:,:] = 0 # Set H to 0
    U = aperaturearray.copy()
    U[:,:] = 0
    index = np.nonzero(aperaturearray) # Find nonzero elements of aperaturearray
    H[index[0],index[1]] = np.exp(1j*k*z)*np.exp(-1j*k*wave*z*((index[0]-middle)**2+(index[1]-middle)**2)) # Free space transfer for ap array
    Utfm = abs(np.fft.fftshift(np.fft.ifft2(Ta*H))) # Compute intensity at distance z

    # Fourier Integral Method
    apindex = np.nonzero(aperaturearray)
    U[index[0],index[1]] = aperaturearray[index[0],index[1]] * np.exp(1j*k*((index[0]-middle)**2+(index[1]-middle)**2)/(2*z))
    Ufim = abs(np.fft.fftshift(np.fft.fft2(U))/len(U))

    # Save image
    fim = Image.fromarray(np.uint8(Ufim))
    fim.save("PATH\Fim.jpg")
    ftfm = Image.fromarray(np.uint8(Utfm))
    ftfm.save("PATH\FTFM.jpg")
    print "that may have worked..."
    return

if __name__ == '__main__':
    Diffraction()
You'll need numpy, scipy, and PIL to work with this code.
When I run this, it goes through the code, but there is no data in the saved images (everything is black). Now I have a real problem here, as I don't entirely understand the math I'm doing (this is for HW), and I don't have a firm grasp on Python.
U[index[0],index[1]] = aperaturearray[index[0],index[1]] * np.exp(1j*k*((index[0]-middle)**2+(index[1]-middle)**2)/(2*z))
Should that line work for performing elementwise calculations on my array?
Could you perhaps post a minimal, yet complete, example? One that we can copy/paste and run ourselves?
In the meantime, in the first two lines of your current example:
h = aperatureimage
h[:,:] = 0
you set both 'aperatureimage' and 'h' to 0. That's probably not what you intended. You might want to consider:
h = aperatureimage.copy()
This generates a copy of aperatureimage while your code simply points h to the same array as aperatureimage. So changing one changes the other.
Be aware, copying very large arrays might cost you more memory than you would prefer.
What I think you are trying to do is this:
import numpy as np
N = 2048
M = 64
a = np.zeros((N, N))
a[N/2-M:N/2+M,N/2-M:N/2+M]=1
x,y = np.meshgrid(np.linspace(0, 1, N), np.linspace(0, 1, N))
b = a.copy()
indices = np.where(a>0)
b[indices] = np.exp(x[indices]**2+y[indices]**2)
Or something similar. This, in any case, sets some values in 'b' based on the x/y coordinates where 'a' is bigger than 0. Try visualizing it with imshow. Good luck!
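For instance, a quick visual check (assuming matplotlib is available):

import matplotlib.pyplot as plt

# Show where b was modified, using the a/b arrays from the snippet above
plt.imshow(b, cmap="gray")
plt.colorbar()
plt.show()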
Concerning the edit
You should normalize your output so it fits in an 8-bit integer. Currently, one of your arrays has a maximum value much larger than 255 and one has a maximum much smaller. Try this instead:
fim = Image.fromarray(np.uint8(255*Ufim/np.amax(Ufim)))
fim.save("PATH\Fim.jpg")
ftfm = Image.fromarray(np.uint8(255*Utfm/np.amax(Utfm)))
ftfm.save("PATH\FTFM.jpg")
Also consider np.zeros_like() instead of copying and clearing H and U.
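For example, a small sketch of that replacement:

# Instead of copying aperaturearray and then zeroing it out:
H = np.zeros_like(aperaturearray)
U = np.zeros_like(aperaturearray)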
Finally, I personally very much like working with ipython when developing something like this. If you put the code in your Diffraction function at the top level of your script (in place of 'if __name__ &c.'), then you can access the variables directly from ipython. A quick command like np.amax(Utfm) would show you that there are indeed values != 0. imshow() is always nice to look at matrices.