Let a numpy array video of shape (T, w, h, 3) be given, where T is a positive integer representing the number of frames, w is a positive integer representing the width, and h is a positive integer representing the height. Every entry of video is an integer from 0 to 255; in other words, video represents a video in the sense that video[t] is an RGB image for every non-negative integer t < T. Along with video, an array of floats time of shape (T,) is given. This array satisfies time[0] = 0 and time[t] < time[t+1] for every non-negative integer t < T-1. An example of the above situation is given here:
import numpy as np

shape = (200, 500, 1000, 3)
video = np.random.randint(0, 256, shape, dtype=np.uint16)
time = np.zeros((shape[0],), dtype=np.float16)
time[0] = 0
for i in range(1, shape[0]):
    x = np.random.random_sample()
    time[i] = time[i - 1] + x
My goal is to save video and time as a playable video file such that:
The video file is in format of either avi or mp4 (so that we can just double click it and play it).
Each frame of the video respects the time array in the following sense: for every non-negative integer t < T-1, the viewer sees the picture video[t] during the time period from time[t] to time[t+1]. The moment time[T-1] is the end of the video.
If possible, keep the original size (in the given example the size is (500,1000)).
How can I achieve this? I tried using OpenCV's VideoWriter, but it seems I have to enter some fps value, which I do not have, because the time array can be very non-uniform in terms of when each picture is displayed.
That is impossible with OpenCV. OpenCV's VideoWriter only supports fixed/constant frame rate. Anything based on that will require rounding to the nearest frame time and/or higher-than-necessary frame rates and duplicated frames (or rather frames that contain no change).
You want presentation timestamps (PTS). That's an inherent aspect of media containers and streams. A word of caution: some video players may assume a "reasonable" time span between frames, and may glitch otherwise, like becoming laggy/unresponsive because the whole GUI is tied to video timing... That's the fault of the video player though.
Use PyAV. It's the only ffmpeg wrapper for python I know of that actually uses API calls rather than messing around with subprocesses.
Here's the relevant example: https://github.com/PyAV-Org/PyAV/blob/main/examples/numpy/generate_video_with_pts.py
In short: set frame.pts = int(round(my_pts / stream.codec_context.time_base)) where my_pts is something in seconds.
I wrote that example, derived from the sibling "fixed rate" example. I put some effort into getting the ffmpeg API usage "right" (time bases, containers/streams/contexts) but if it happens to fail or act up, you're allowed and encouraged to question what I did there.
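For concreteness, here is a minimal sketch of that approach applied to the arrays from the question. The codec choice ("h264"), the millisecond time base, and the nominal rate=30 are assumptions you may want to adjust:

import av  # PyAV
import numpy as np
from fractions import Fraction

# video: (T, h, w, 3) uint8 frames (what from_ndarray expects), time: (T,) seconds
T, h, w = video.shape[0], video.shape[1], video.shape[2]

container = av.open("output.mp4", mode="w")
stream = container.add_stream("h264", rate=30)  # nominal rate; the PTS values rule
stream.width = w
stream.height = h
stream.pix_fmt = "yuv420p"
stream.codec_context.time_base = Fraction(1, 1000)  # 1 ms resolution (an assumption)

for t in range(T):
    frame = av.VideoFrame.from_ndarray(video[t].astype(np.uint8), format="rgb24")
    frame.pts = int(round(time[t] / stream.codec_context.time_base))
    for packet in stream.encode(frame):
        container.mux(packet)

for packet in stream.encode():  # flush the encoder
    container.mux(packet)
container.close()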
The solution to your problem is to generate all the video frames needed for a given FPS value; since a video needs a constant frame rate, you first have to decide at which granularity you want your video.
Once you have decided on the FPS value, you generate all the required video frames, so you can use the export-to-video method with a constant frame rate.
The numpy array representing the current frame image stays the same as the last one displayed until it is time to change to another one. The chosen frame rate FPS then determines how accurately the changes to a new frame image hit the specified time values.
Below is Python code with an improved version of generating the time values. It generates all the video frames, with self-explanatory variable names in place of further explanations. The logic of the algorithm is to generate a single frame image and repeat it as a frame of the resulting video as long as the next value on the time axis is not reached; once that value is reached, a new image is generated and repeated in the same way. The code writes the created data to an .mp4 file:
import numpy as np
import cv2 as cv

FPS = 15
fps_timeDelta = 1 / FPS
noOfImages = 5              # 200
imageShape = (210, 297, 3)  # (500, 1000, 3)

vidWriter = cv.VideoWriter(
    'opencv_writeVideo.mp4',
    cv.VideoWriter_fourcc(*'mp4v'),  # 'mp4v' suits the .mp4 container
    FPS, (imageShape[1], imageShape[0])
)

# cumulative sums of random inter-frame gaps, starting at 0
vidFrameTime = np.concatenate(
    (np.zeros(1), np.add.accumulate(
        np.random.random_sample(size=noOfImages)))
)

vidTime = 0.0
indxVidFrameTime = 1
singleImageRGB = np.random.randint(
    0, 256, imageShape, dtype=np.uint8)
cv.imshow("singleImageRGB", singleImageRGB / 255)
cv.waitKey(0)

while vidTime <= vidFrameTime[-1]:
    vidTime += fps_timeDelta
    if vidTime >= vidFrameTime[indxVidFrameTime]:
        singleImageRGB = np.random.randint(0, 256, imageShape, dtype=np.uint8)
        indxVidFrameTime += 1
    vidWriter.write(singleImageRGB)

vidWriter.release()
I've been fooling around lately with taking my webcam's video stream and giving it a pixel-dependent time delay.
A very simple example of that idea is the famous rolling shutter, but when applied on the order of seconds instead of hundredths of a second, it looks like this: https://youtu.be/mQ0hS7l9ckY
Now, rolling shutter is fun and all, but I want something more general. I want a delay map, a (height, width, 3) shaped array that tells me how far back to go in the video. Pseudo-code for this would be
output_image[y, x, c] = video_cache[delay_map[y,x,c], y, x, c]
where the first index of the video cache is time, y, x are self-explanatory, and c is the color channel (BGR, because OpenCV is weird).
In essence, each pixel of the output is a pixel of the video at the same position, but at a time determined by the delay map at the very same position.
Here's the solution I have now: I flattened everything, I access the video cache similar to how you unravel multi-index nonsense, and once I'm done I reshape the result into an image.
This solution works pretty fast, and I'm pretty proud of it. It almost keeps up with my webcam's frame rate (I average about 20 of these per second).
I think the flattening and reshaping of each frame costs me some time, and if I could get rid of those I'd get much better results.
Link to the whole file at the bottom.
Here's a skeleton of my implementation.
I have a class called CircularCacheDelayAccess. It stores a cache of video frames (with given number of frames, called cache_size in my implementation). It enables you to store frames, and get the delay-mapped frame.
Instead of pushing all the frames around each time I store a new one, I keep an index that goes around in a circle, and video[delay=3] would be found via something like cache[index-3]. Thanks to python's funny negative index tricks, I don't even have to get the positive modulo.
The delay_map is actually a float array; when I use circ_cache.getFrame I input the integer part of delay_map.flatten(), and then I use the fractional part to interpolate between frames.
class CircularCacheDelayAccess:
    def __init__(self, img_shape: tuple, cache_size: int):
        self.image_shape = img_shape
        self.cache_size = cache_size
        # some useful stuff
        self.multi_index_shape = (cache_size,) + img_shape
        self.image_size = int(np.prod(img_shape))
        self.size = cache_size * self.image_size
        # the index, going around in circles
        self.cache_index = 0
        self.cache = np.empty(self.size)
        # raveled_image_indices is a running index over a frame; it is the same thing as writing
        # y, x, c = np.mgrid[0:height, 0:width, 0:3]
        # raveled_image_indices = c + 3 * (x + width * y)
        # but it's a lot easier
        self.raveled_image_indices = np.arange(self.image_size)

    def store(self, image: np.ndarray):
        # (in my implementation I check that the shape matches and raise a ValueError if it does not)
        self.cache_index = (self.cache_index + 1) % self.cache_size
        # since the cache holds entire image frames, the start of each frame is index * image_size
        cIndex = self.image_size * self.cache_index
        self.cache[cIndex: cIndex + self.image_size] = image.flatten()

    def getFrame(self, delay_map: np.ndarray):
        # delay_map may either have shape == self.image_shape, or shape == (self.image_size,)
        # (more asserts, for the shape of delay_map, and to check its values do not exceed the cache size)
        # (if delay_map.shape == image_shape, I flatten it; if we were already given a flattened
        # version, there's no need to do so)
        frame = self.cache[self.image_size * (self.cache_index - delay_map)
                           + self.raveled_image_indices].reshape(self.image_shape)
        return frame
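For reference, here is a minimal usage sketch of the class above on a live webcam feed, with a row-dependent delay map that reproduces the horizontal rolling shutter; the camera index, cache size, and frame shape are assumptions:

import cv2
import numpy as np

cache_size = 60
shape = (480, 640, 3)  # must match the webcam's frame shape
circ_cache = CircularCacheDelayAccess(shape, cache_size)

# delay grows with the row index -> horizontal rolling shutter
delay_map = np.broadcast_to(
    np.linspace(0, cache_size - 1, shape[0])[:, None, None], shape
).astype(int)

cap = cv2.VideoCapture(0)  # assumed camera index
while True:
    ok, frame = cap.read()
    if not ok:
        break
    circ_cache.store(frame)  # note: the first cache_size outputs read uninitialized frames
    out = circ_cache.getFrame(delay_map.flatten()).astype(np.uint8)
    cv2.imshow("delayed", out)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()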
As I've already stated, this works pretty well, but I think I could get it to work better if I could just side-step the flatten and reshape steps.
Also, keeping a flattened version of an array that makes sense in its full-shaped form is pretty awkward.
And, I've mentioned the interpolation part. It felt wrong to do that in CircularCacheDelayAccess, but doing the interpolation after calling getFrame twice means I need the fractional part of delay_map in the full-shaped form and the int part flattened, which is pretty silly.
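To make that last point concrete, here is a sketch of the interpolation step as described, under the current design, for a float-valued delay_map; the names delay_int and delay_frac are illustrative, not from the original code:

import numpy as np

delay_int = np.floor(delay_map).astype(int)  # integer part, flattened for getFrame
delay_frac = delay_map - delay_int           # fractional part, kept full-shaped

near = circ_cache.getFrame(delay_int.flatten())         # frame at floor(delay)
far = circ_cache.getFrame(delay_int.flatten() + 1)      # one step further back in time
output = (1.0 - delay_frac) * near + delay_frac * far   # per-pixel linear blend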
Here are some fun examples which would probably be pretty hard to understand without seeing the video, but are still fun to look at. It looks even better with a face, but I don't think I should show my face here, so sorry about that:
horizontal rolling shutter, color delay psychedelia, my weirdest effect so far
And here is a link to the entire code, with capture and stuff if you wanna mess around with it and read the entire code.
Thanks in advance!
Currently I have some tracks that simulate people walking around in a 1280x720-pixel area over a span of 12 hours. The recordings stored are the x, y coordinates and the timeframe (in seconds) of each specific recording.
I want to create a movie sequence that shows how the people walk over the 12 hours. To do this I split the data into 43200 frames, so that each frame corresponds to one second. My end goal is to use these data in a machine learning algorithm.
The idea is then simple: initialize the frames, loop through all the x, y coordinates, and add them to the array frames at their respective timeframes:
>>> frames = np.zeros((43200, 720, 1280, 1))
>>> for track in tracks:
>>>     for x, y, time in track:
>>>         frames[int(time), y, x] = 255  # to visualize the walking
This would in theory create 43200 frames that could be saved as an mp4, gif, or some other format and played. However, the problem occurs when I try to initialize the numpy array:
>>> np.zeros((43200, 720, 1280, 1))
MemoryError: Unable to allocate 297. GiB for an array with shape (43200, 720, 1280, 1) and data type float64
This makes sense, because I'm trying to allocate:
>>> (43200 * 1280 * 720 * 8) / 1024**3
296.630859375
I then thought about saving each frame to an .npy file, but each file would be 7.4 MB, which sums up to 320 GB.
I also thought about splitting the frames up into five different arrays:
>>> a = np.zeros((8640, 720, 1280, 1))
>>> b = np.zeros((8640, 720, 1280, 1))
>>> c = np.zeros((8640, 720, 1280, 1))
>>> d = np.zeros((8640, 720, 1280, 1))
>>> e = np.zeros((8640, 720, 1280, 1))
But I think that seems cumbersome and it does not feel like the best solution. It will most likely slow the training of my machine learning algorithm. Is there a smarter way to do this?
I would just build the video a few frames at a time, then join the frames together using ffmpeg. There should be no need to store the whole video in memory at once based on the description of the use case.
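For example, here is a rough sketch of that idea using a raw-video pipe into ffmpeg, so that only one frame lives in memory at a time; the output filename, the 1 fps rate, and the grayscale pixel format are assumptions:

import subprocess
import numpy as np

width, height, n_frames = 1280, 720, 43200

# ffmpeg reads raw grayscale frames from stdin and encodes them to mp4
proc = subprocess.Popen(
    ["ffmpeg", "-y",
     "-f", "rawvideo", "-pix_fmt", "gray", "-s", f"{width}x{height}",
     "-r", "1",   # one frame per second of data
     "-i", "-",   # read frames from stdin
     "-pix_fmt", "yuv420p", "walks.mp4"],
    stdin=subprocess.PIPE,
)

for t in range(n_frames):
    frame = np.zeros((height, width), dtype=np.uint8)
    # ... set frame[y, x] = 255 for every track point recorded at second t ...
    proc.stdin.write(frame.tobytes())

proc.stdin.close()
proc.wait()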
I think you will have to split your data into different, smaller arrays, and that probably won't be an issue for machine learning purposes.
However, I don't know if you will be able to create these five numpy arrays, as together they will still take 297 GiB of RAM.
I would probably:
save the numpy arrays as PNGs, using for instance matplotlib.pyplot.imsave (a sketch follows this list), or
store them as short videos, as a person won't be seen for longer than that on your video anyway, or
reduce the fps or the resolution if you really want the whole video in one variable
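A minimal sketch of the first option, saving each one-second frame as a PNG; the directory name and the uint8 dtype (which alone cuts memory use 8x versus float64) are my own choices:

import numpy as np
import matplotlib.pyplot as plt

height, width = 720, 1280
for t in range(43200):
    frame = np.zeros((height, width), dtype=np.uint8)  # one second of data at a time
    # ... set frame[y, x] = 255 for the track points at second t ...
    # assumes the frames/ directory already exists
    plt.imsave(f"frames/frame_{t:05d}.png", frame, cmap="gray", vmin=0, vmax=255)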
Let me also add that:
the snippet of code you gave can be executed much faster with frames = np.ones((43200, 1280, 720, 1)) * 255, as nested for loops are very expensive
if you were to create an array by setting all of its coefficients one by one, it would be more efficient to initialize it with np.empty(shape), as that would spare you the time needed to set all the coefficients to zero only to overwrite them in your for loop
I am using the sliding-window technique on an image and extracting the mean pixel values of each window. The results look something like this: [[[[215.015625][123.55036272][111.66057478]]]]. Now the question is how I could save all these values, for every window, into a txt or CSV file, because I want to use them later to compare similarities. Whatever I tried, the error is the same: that it is a 4D array and not a 1D or 2D one. I'll appreciate any help, really! Thank you in advance.
import cv2
import matplotlib.pyplot as plt
import numpy as np

# read the image and define the stepSize and window size
# (width, height)
image2 = cv2.imread("bird.jpg")  # your image path
image = cv2.resize(image2, (224, 224))
tmp = image  # for drawing a rectangle
stepSize = 10
(w_width, w_height) = (60, 60)  # window size
mean_values = []
for x in range(0, image.shape[1] - w_width, stepSize):
    for y in range(0, image.shape[0] - w_height, stepSize):
        window = image[x:x + w_width, y:y + w_height, :]
        # classify content of the window with your classifier and
        # determine if the window includes an object (cell) or not
        # draw window on image
        cv2.rectangle(tmp, (x, y), (x + w_width, y + w_height), (255, 0, 0), 2)
        # mean and standard deviation of the current window
        mean_val, std_dev = cv2.meanStdDev(window)
        mean_val = mean_val[:3]
        mean_values.append([mean_val])

plt.imshow(np.array(tmp).astype('uint8'))
# show all windows
plt.show()

mean_values = np.asarray(mean_values)
print(mean_values)
Human Readable Option
Assuming that you want the data to be human readable, saving the data takes a little bit more work. My search showed me that there's this solution for saving 3D data to a text file. However, it's pretty simple to extend this example to 4D for your use case. This code is taken and adapted from that post, thank you Joe Kington and David Cheung.
import numpy as np

data = np.arange(2 * 3 * 4 * 5).reshape((2, 3, 4, 5))

with open('test.csv', 'w') as outfile:
    # We write this header for readability; the pound symbol
    # will cause numpy to ignore it
    outfile.write('# Array shape: {0}\n'.format(data.shape))

    # Iterating through an n-dimensional array produces slices along
    # the last axis. This is equivalent to data[i,:,:] in this case.
    # Because we are dealing with 4D data instead of 3D data,
    # we need to add another for loop that's nested inside of the
    # previous one.
    for threeD_data_slice in data:
        for twoD_data_slice in threeD_data_slice:
            # The formatting string indicates that I'm writing out
            # the values in left-justified columns 7 characters in width
            # with 2 decimal places.
            np.savetxt(outfile, twoD_data_slice, fmt='%-7.2f')
            # Writing out a break to indicate different slices...
            outfile.write('# New slice\n')
And then once the data has been saved, all you need to do is load it and reshape it; np.loadtxt() will read the data in as a 2D array, but np.reshape() allows us to recover the original structure. Again, this code is adapted from the previous post.
new_data = np.loadtxt('test.csv')

# Note that this returned a 2D array!
print(new_data.shape)

# However, going back to 4D is easy if we know the
# original shape of the array
new_data = new_data.reshape((2, 3, 4, 5))

# Just to check that they're the same...
assert np.all(new_data == data)
Binary Option
Assuming that human readability is not necessary, I would recommend using the built-in *.npy format which is described here. This stores the data in a binary format.
You can save the array by doing np.save('NAME_OF_ARRAY.npy', ARRAY_TO_BE_SAVED) and then load it with SAVED_ARRAY = np.load('NAME_OF_ARRAY.npy').
You can also save several numpy arrays in a single zip file with the np.savez() function, like so: np.savez('MANY_ARRAYS.npz', ARRAY_ONE, ARRAY_TWO). You load the zipped arrays in a similar fashion: SEVERAL_ARRAYS = np.load('MANY_ARRAYS.npz').
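As a quick sketch of the round trip (the file name and the keyword name means are placeholder choices): np.load on an .npz returns an archive object indexed by name, with positionally saved arrays stored as 'arr_0', 'arr_1', and so on.

import numpy as np

mean_values = np.arange(2 * 3 * 4 * 5, dtype=float).reshape((2, 3, 4, 5))
np.savez('mean_values.npz', means=mean_values)  # named array in the archive

archive = np.load('mean_values.npz')
restored = archive['means']  # positional saves would be archive['arr_0'], ...
assert np.array_equal(restored, mean_values)  # shape and values survive intact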
I am attempting to make a painting based on the mass of the universe, with pi and the gravitational constant of Earth at sea level converted to binary. I've done the math and I have the right dimensions, and it should only need less than a megabyte of RAM, but I'm running into a "maximum allowed dimension exceeded" ValueError.
Here is the code:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
boshi = 123456789098765432135790864234579086542098765432135321 # universal mass
genesis = boshi ** 31467 # padding
artifice = np.binary_repr(genesis) # formatting
A = int(artifice)
D = np.array(A).reshape(A, (1348, 4117))
plt.imsave('hello_world.png', D, cmap=cm.gray) # save image
I keep running into the error at D = np.array(...), and maybe my reshape is too big, but it's only a little bigger than 4K. It seems like this should be no problem for a GPU-enhanced Colab. It doesn't run on my home machine either, with the same error. Would this be fixed with more RAM?
Making it Work
The problem is that artifice = np.binary_repr(genesis) creates a string. The string consists of 1348 * 4117 = 5549716 digits, all of them zeros and ones. If you convert the string to a python integer, A = int(artifice), you will (A) wait a very long time, and (B) get a non-iterable object. The array you create with np.array(A) will have a single element.
The good news is that you can bypass the time-consuming step entirely using the fact that the string artifice is already an iterable:
D = np.array(list(artifice), dtype=np.uint8).reshape(1348, 4117)
The step list(artifice) will take a couple of seconds since it has to split up the string, but everything else should be quite fast.
Plotting is easy from there with plt.imsave('hello_world.png', D, cmap=cm.gray):
Colormaps
You can easily change the color map to coolwarm or whatever you want when you save the image. Keep in mind that your image is binary, so only two of the values will actually matter:
plt.imsave('hello_world2.png', D, cmap=cm.coolwarm)
Exploration
You have an opportunity here to add plenty of color to your image. Normally, a PNG is 8-bit. For example, instead of converting genesis to bits, you can take the bytes from it to construct an image. You can also take nibbles (half-bytes) to construct an indexed image with 16 colors. With a little padding, you can even make sure that you have a multiple of three data points, and create a full color RGB image in any number of ways. I will not go into the more complex options, but I would like to explore making a simple image from the bytes.
5549716 bits is 693715 = 5 * 11 * 12613 bytes (with four leading zero bits). This is a very nasty factorization leading to an image size of 55x12613, so let's remove that upper nibble: while 693716's factorization is just as bad as 693715's, 693714 factors very nicely into 597 * 1162.
You can convert your integer to an array of bytes using its own to_bytes method:
from math import ceil
byte_genesis = genesis.to_bytes(ceil(genesis.bit_length() / 8), 'big')
The reason that I use the built-in ceil rather than np.ceil is that it returns an integer rather than a float.
Converting the huge integer is very fast because the bytes object has direct access to the data of the integer: even if it makes a copy, it does virtually no processing. It may even share the buffer since both bytes and int are nominally immutable. Similarly, you can create a numpy array from the bytes as just a view to the same memory location using np.frombuffer:
img = np.frombuffer(byte_genesis, dtype=np.uint8)[1:].reshape(597, 1162)
The [1:] is necessary to chop off the leading nibble, since byte_genesis must be large enough to hold the entirety of genesis. You could also chop off on the bytes side:
img = np.frombuffer(byte_genesis[1:], dtype=np.uint8).reshape(597, 1162)
The results are identical. Here is what the picture looks like:
plt.imsave('hello_world3.png', img, cmap=cm.viridis)
The result is too large to upload (because it's not a binary image), but here is a randomly selected sample:
I am not sure if this is aesthetically what you are looking for, but hopefully this provides you with a place to start looking at how to convert very large numbers into data buffers.
More Options, Because this is Interesting
I wanted to look at using nibbles rather than bytes here, since that would allow you to have 16 colors per pixel and twice as many pixels. You can get a 1162x1194 image starting from
temp = np.frombuffer(byte_genesis, dtype=np.uint8)[1:]
Here is one way to unpack the nibbles:
img = np.empty((1162, 1194), dtype=np.uint8)
img.ravel()[::2] = np.bitwise_and(temp >> 4, 0x0F)
img.ravel()[1::2] = np.bitwise_and(temp, 0x0F)
With a colormap like jet, you get:
plt.imsave('hello_world4.png', img, cmap=cm.jet)
Another option (going in the opposite direction, in a manner of speaking) is not to use colormaps at all. Instead, you can divide your space by a factor of three and generate your own colors in RGB space. Luckily, one of the prime factors of 693714 is 3. You can therefore have a 398x581 image (693714 == 3 * 398 * 581). How you interpret the data is, even more than usual, up to you.
Side Note Before I Continue
With the black-and-white binary image, you could control the color, size and orientation of the image. With 8-bit data, you could control how the bits were sampled (8 or fewer, as in the 4-bit example), the endianness of your interpretation, the color map, and the image size. With full color, you can treat each triple as a separate color, treat the entire dataset as three consecutive color planes, or even do something like apply a Bayer filter to the array. All in addition to the other options like size, ordering, number of bits per sample, etc.
The following will show the color triples and three color planes options for now.
Full Color Images
To treat each set of 3 consecutive bytes as an RGB triple, you can do something like this:
img = temp.reshape(398, 581, 3)
plt.imsave('hello_world5.png', img)
Notice that there is no colormap in this case.
Interpreting the data as three color planes requires an extra step because plt.imsave expects the last dimension to have size 3. np.rollaxis is a good tool for this:
img = np.rollaxis(temp.reshape(3, 398, 581), 0, 3)
plt.imsave('hello_world6.png', img)
I could not reproduce your problem, because the line A = int(artifice) took forever. I replaced it with a for loop to cast each digit on its own. The code worked then and produced the desired image.
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

boshi = 123456789098765432135790864234579086542098765432135321
genesis = boshi ** 31467
artifice = np.binary_repr(genesis)

# cast each binary digit on its own and place it in the 2D array
D = np.zeros((1348, 4117), dtype=int)
for i, digit in enumerate(artifice):
    D[i // 4117, i % 4117] = int(digit)
plt.imsave('hello_world.png', D, cmap=cm.gray)
I'm in the process of building an automated game bot in Python on OS X 10.8.2 and in the process of researching Python GUI automation I discovered autopy. The mouse manipulation API is great, but it seems that the screen capture methods rely on deprecated OpenGL methods...
Are there any efficient ways of getting the color value of a pixel in OS X? The only way I can think of now is to use os.system("screencapture foo.png") but the process seems to have unneeded overhead as I'll be polling very quickly.
A small improvement, but using the TIFF compression option for screencapture is a bit quicker:
$ time screencapture -t png /tmp/test.png
real 0m0.235s
user 0m0.191s
sys 0m0.016s
$ time screencapture -t tiff /tmp/test.tiff
real 0m0.079s
user 0m0.028s
sys 0m0.026s
This does have a lot of overhead, as you say (the subprocess creation, writing/reading from disc, compressing/decompressing).
Instead, you could use PyObjC to capture the screen using CGWindowListCreateImage. I found it took about 70ms (~14fps) to capture a 1680x1050 pixel screen, and have the values accessible in memory.
A few random notes:
Importing the Quartz.CoreGraphics module is the slowest part, about 1 second. The same is true for importing most of the PyObjC modules. This is unlikely to matter in this case, but for short-lived processes you might be better off writing the tool in ObjC.
Specifying a smaller region is a bit quicker, but not hugely (~40ms for a 100x100px block, ~70ms for 1680x1050). Most of the time seems to be spent in just the CGDataProviderCopyData call; I wonder if there's a way to access the data directly, since we don't need to modify it?
The ScreenPixel.pixel function is pretty quick, but accessing large numbers of pixels is still slow (since 0.01ms * 1680*1050 is about 17 seconds). If you need to access lots of pixels, it's probably quicker to struct.unpack_from them all in one go.
Here's the code:
import time
import struct

import Quartz.CoreGraphics as CG


class ScreenPixel(object):
    """Captures the screen using CoreGraphics, and provides access to
    the pixel values.
    """

    def capture(self, region=None):
        """region should be a CGRect, something like:

        >>> import Quartz.CoreGraphics as CG
        >>> region = CG.CGRectMake(0, 0, 100, 100)
        >>> sp = ScreenPixel()
        >>> sp.capture(region=region)

        The default region is CG.CGRectInfinite (captures the full screen)
        """

        if region is None:
            region = CG.CGRectInfinite
        else:
            # TODO: Odd widths cause the image to warp. This is likely
            # caused by offset calculation in ScreenPixel.pixel, and
            # could be modified to allow odd widths
            if region.size.width % 2 > 0:
                emsg = "Capture region width should be even (was %s)" % (
                    region.size.width)
                raise ValueError(emsg)

        # Create screenshot as CGImage
        image = CG.CGWindowListCreateImage(
            region,
            CG.kCGWindowListOptionOnScreenOnly,
            CG.kCGNullWindowID,
            CG.kCGWindowImageDefault)

        # Intermediate step, get pixel data as CGDataProvider
        prov = CG.CGImageGetDataProvider(image)

        # Copy data out of CGDataProvider, becomes string of bytes
        self._data = CG.CGDataProviderCopyData(prov)

        # Get width/height of image
        self.width = CG.CGImageGetWidth(image)
        self.height = CG.CGImageGetHeight(image)

    def pixel(self, x, y):
        """Get pixel value at given (x,y) screen coordinates

        Must call capture first.
        """

        # Pixel data is unsigned char (8bit unsigned integer),
        # and there are four per pixel (blue, green, red, alpha)
        data_format = "BBBB"

        # Calculate offset, based on
        # http://www.markj.net/iphone-uiimage-pixel-color/
        offset = 4 * ((self.width * int(round(y))) + int(round(x)))

        # Unpack data from string into Python'y integers
        b, g, r, a = struct.unpack_from(data_format, self._data, offset=offset)

        # Return BGRA as RGBA
        return (r, g, b, a)


if __name__ == '__main__':
    # Timer helper-function
    import contextlib

    @contextlib.contextmanager
    def timer(msg):
        start = time.time()
        yield
        end = time.time()
        print "%s: %.02fms" % (msg, (end - start) * 1000)

    # Example usage
    sp = ScreenPixel()

    with timer("Capture"):
        # Take screenshot (takes about 70ms for me)
        sp.capture()

    with timer("Query"):
        # Get pixel value (takes about 0.01ms)
        print sp.width, sp.height
        print sp.pixel(0, 0)

    # To verify screen-cap code is correct, save all pixels to PNG,
    # using http://the.taoofmac.com/space/projects/PNGCanvas
    from pngcanvas import PNGCanvas
    c = PNGCanvas(sp.width, sp.height)
    for x in range(sp.width):
        for y in range(sp.height):
            c.point(x, y, color=sp.pixel(x, y))

    with open("test.png", "wb") as f:
        f.write(c.dump())
I came across this post while searching for a solution to grab screenshots in Mac OS X for real-time processing. I tried using ImageGrab from PIL as suggested in some other posts, but couldn't get the data fast enough (only about 0.5 fps).
The answer https://stackoverflow.com/a/13024603/3322123 in this post, using PyObjC, saved my day! Thanks @dbr!
However, my task requires getting all pixel values rather than just a single pixel, and also, to comment on the third note by @dbr, I added a new method to this class to get a full image, in case anyone else needs it.
The image data are returned as a numpy array with dimensions (height, width, 3), which can be used directly for post-processing in numpy or opencv etc. Getting individual pixel values from it also becomes pretty trivial using numpy indexing.
I tested the code with a 1600 x 1000 screenshot: getting the data with capture() took ~30 ms, and converting it to a np array with getimage() takes only ~50 ms on my Macbook. So now I have >10 fps, and even faster for smaller regions.
import numpy as np

def getimage(self):
    imgdata = np.fromstring(self._data, dtype=np.uint8).reshape(len(self._data) / 4, 4)
    return imgdata[:self.width * self.height, :-1].reshape(self.height, self.width, 3)
Note I throw away the "alpha" channel from the BGRA 4-channel data.
This was all so very helpful, I had to come back to comment; however, I don't have the reputation. I do, however, have sample code combining the answers above, for a lightning-quick screen capture / save, thanks to @dbr and @qqg!
import time
import numpy as np
from scipy.misc import imsave
import Quartz.CoreGraphics as CG

image = CG.CGWindowListCreateImage(CG.CGRectInfinite, CG.kCGWindowListOptionOnScreenOnly, CG.kCGNullWindowID, CG.kCGWindowImageDefault)

prov = CG.CGImageGetDataProvider(image)
_data = CG.CGDataProviderCopyData(prov)

width = CG.CGImageGetWidth(image)
height = CG.CGImageGetHeight(image)

imgdata = np.fromstring(_data, dtype=np.uint8).reshape(len(_data) / 4, 4)
numpy_img = imgdata[:width * height, :-1].reshape(height, width, 3)

imsave('test_fast.png', numpy_img)