I'm interested in migrating from Psychtoolbox to Shady for my stimulus presentation. I looked through the online docs, but it is not very clear to me how to replicate in Shady what I'm currently doing in MATLAB.
What I do is actually very simple. For each trial:
- I load from disk a single image (I do luminance linearization offline) that contains all the frames I plan to display in that trial. The stimulus is 1000x1000 px and I present 25 frames, so the image is 5000x5000 px. I only use black-and-white images, so I have a single int8 value per pixel.
- I transfer the entire image from the CPU to the GPU.
- At some point (externally controlled) I copy the first frame to the video buffer and present it.
- At some other point (externally controlled) I trigger the presentation of the remaining 24 frames (copying the relevant part of the image to the video buffer for each video frame, and then calling flip()).
The external control happens by having another machine communicate with the stimulus presentation code over TCP/IP. After the control PC sends a command to the presentation PC and this is executed, the presentation PC needs to send back an acknowledgement message to the control PC. I need to send three ACK messages, one when the first frame appears on screen, one when the 2nd frame appears on screen, and one when the 25th frame appears on screen (this way the control PC can easily verify if a frame has been dropped).
In MATLAB I do this by calling the blocking method flip() to present a frame, and when it returns I send the ACK to the control PC.
That's it. How would I do that in Shady? Is there an example that I should look at?
The places to look for this information are the docstrings of Shady.Stimulus and Shady.Stimulus.LoadTexture, as well as the included example script animated-textures.py.
Like most things Python, there are multiple ways to do what you want. Here's how I would do it:
import Shady
w = Shady.World()
s = w.Stimulus( [frame00, frame01, frame02, ...], multipage=True )
where each frameNN is a 1000x1000-pixel numpy array (either floating-point or uint8).
Alternatively you can ask Shady to load directly from disk:
s = w.Stimulus('trial01/*.png', multipage=True)
where directory trial01 contains twenty-five 1000x1000-pixel image files, named (say) 00.png through 24.png so that they get sorted correctly. Or you could supply an explicit list of filenames.
Either way, whether you loaded from memory or from disk, the frames are all transferred to the graphics card in that call. You can then (time-critically) switch between them with:
s.page = 0 # or any number up to 24 in your case
Note that, due to our use of the multipage option, we're using the "page" animation mechanism (create one OpenGL texture per frame) instead of the default "frame" mechanism (create one 1000x25000 OpenGL texture) because the latter would exceed the maximum allowable dimensions for a single texture on many graphics cards. The distinction between these mechanisms is discussed in the docstring for the Shady.Stimulus class as well as in the aforementioned interactive demo:
python -m Shady demo animated-textures
To prepare the next trial, you might use .LoadPages() (new in Shady version 1.8.7). This loops through the existing "pages" loading new textures into the previously-used graphics-card texture buffers, and adds further pages as necessary:
s.LoadPages('trial02/*.png')
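If your frames are currently stored as one 5000x5000 mosaic, the multipage route needs them as separate 1000x1000 arrays. Below is a minimal numpy sketch of that cutting step. The function name split_mosaic is hypothetical, and it assumes a 2-D grayscale array whose frames are arranged in reading order (left to right along the top row of the array, then the next row down).

```python
import numpy as np

def split_mosaic(mosaic, rows, cols):
    """Cut a 2-D (rows*h, cols*w) mosaic into rows*cols frames of shape (h, w).

    Frames come out in reading order: along the top row of the array first,
    then the next row down, and so on.
    """
    H, W = mosaic.shape
    if H % rows or W % cols:
        raise ValueError("mosaic dimensions must be divisible by the grid size")
    h, w = H // rows, W // cols
    # (rows, h, cols, w) -> (rows, cols, h, w) -> (rows*cols, h, w)
    tiles = mosaic.reshape(rows, h, cols, w).swapaxes(1, 2)
    return tiles.reshape(rows * cols, h, w)
```

In your case, list(split_mosaic(mosaic, 5, 5)) would give 25 arrays of 1000x1000 that could be passed as the frame list to w.Stimulus(..., multipage=True). The reshape/swapaxes trick avoids a Python loop, but the final reshape makes one copy of the data.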
Now, you mention that your established workflow is to concatenate the frames as a single 5000x5000-pixel image. My solutions above assume that you have done the work of cutting it up again into 1000x1000-pixel frames, presumably using numpy calls (sounds like you might be doing the equivalent in Matlab at the moment). If you're going to keep saving as 5000x5000, the best way of staying in control of things might indeed be to maintain your own code for cutting it up. But it's worth mentioning that you could take the entirely different strategy of transferring it all in one go:
s = w.Stimulus('trial01_5000x5000.png', size=1000)
This loads the entire pre-prepared 5000x5000 image from disk (or again from memory, if you want to pass a 5000x5000 numpy array instead of a filename) into a single texture in the graphics card's memory. However, because of the size specification, the Stimulus will only show the lower-left 1000x1000-pixel portion of the array. You can then switch "frames" by shifting the carrier relative to the envelope. For example, if you were to say:
s.carrierTranslation = [-1000, -2000]
then you would be looking at the frame located one "column" across and two "rows" up in your 5x5 array.
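To step through the frames in order with this approach, you need a mapping from frame number to translation. The sketch below (the helper name carrier_translation is hypothetical) assumes the frames are laid out in reading order with frame 0 at the top-left of the image as displayed, and it relies on the two behaviours described above: zero translation shows the lower-left tile, and [-1000, -2000] means one column across and two rows up.

```python
def carrier_translation(frame, rows=5, cols=5, size=1000):
    """Return an [x, y] carrierTranslation that brings `frame` into view.

    Assumes frames are in reading order (frame 0 at top-left as displayed),
    and that with zero translation the lower-left tile is shown.
    """
    if not 0 <= frame < rows * cols:
        raise ValueError("frame index out of range")
    col = frame % cols
    row_from_top = frame // cols
    row_from_bottom = (rows - 1) - row_from_top
    # Negative values move the visible window right (x) and up (y).
    return [-col * size, -row_from_bottom * size]
```

Usage would be along the lines of s.carrierTranslation = carrier_translation(k) for k in 0..24. If your mosaic is ordered differently, only the row/column arithmetic needs to change.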
As a final note, remember that you could take advantage of Shady's on-the-fly gamma-correction and dithering: they're happening anyway unless you explicitly disable them, though of course they have no physical effect if you leave the stimulus .gamma at 1.0 and use integer pixel values. So you could generate your stimuli as separate 1000x1000 arrays, each containing unlinearized floating-point values in the range [0.0, 1.0], and let Shady worry about everything beyond that.
Background
For a research project, we are recording video data from two cameras and feeding a synchronization pulse directly into the microphone ADC every second.
Problem
We want to derive a frame time stamp in the clock of the pulse source for each camera frame to relate the camera images temporally. With our current methods (see below), we get a frame offset of around 2 frames between the cameras. Unfortunately, inspection of the video shows that we are clearly 6 frames off (at least at one point) between the cameras.
I assume that this is because we are relating audio and video signal wrong (see below).
Approach I think I need help with
I read that in the MP4 container there should be PTS times for video and audio. How do we access those programmatically? Python would be perfect, but if we have to call ffmpeg via system calls, we can do that too.
What we currently fail with
The original idea was to find video and audio times as
audio_sample_times = np.arange(N_audiosamples) / audio_sampling_rate
video_frame_times = np.arange(N_videoframes) / video_frame_rate
then identify audio_pulse_times in the audio_sample_times base, calculate the relative position of each video frame time between the audio_pulse_times around it, and map that same relative position onto the corresponding source_pulse_times.
However, a first indication that this approach is problematic is already that for some videos, N_audiosamples/audio_sampling_rate differs from N_videoframes/video_frame_rate by multiple frames.
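The mapping described above (detect pulse onsets in the audio, then take each frame's relative position between the neighbouring pulses) is piecewise-linear interpolation, which numpy's np.interp does directly. Below is a minimal sketch. The threshold and min_gap values, and the function names, are hypothetical, and it assumes clean pulses whose rising edges cross the threshold once. It does not fix wrong video timestamps; it only makes the mapping step explicit.

```python
import numpy as np

def pulse_onsets(signal, times, threshold, min_gap=0.0):
    """Times at which |signal| rises above `threshold` (rising edges).

    Onsets closer than `min_gap` seconds to the previously kept onset are
    dropped, to suppress ringing or glitches right after a pulse.
    """
    above = np.abs(np.asarray(signal)) > threshold
    # Rising edge at i: sample i is above threshold and sample i-1 is not.
    edges = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    onsets = np.asarray(times)[edges]
    if min_gap > 0 and onsets.size:
        kept = [onsets[0]]
        for t in onsets[1:]:
            if t - kept[-1] >= min_gap:
                kept.append(t)
        onsets = np.array(kept)
    return onsets

def to_source_clock(video_times, audio_pulse_times, source_pulse_times):
    """Map frame times onto the pulse source's clock by linear interpolation
    between the surrounding pulses. Frames before the first or after the last
    pulse are clamped to the end values (np.interp does not extrapolate)."""
    return np.interp(video_times, audio_pulse_times, source_pulse_times)
```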
What I have found by now
OpenCV's cv2.CAP_PROP_POS_MSEC seems to do exactly what we do, and not access any PTS ...
Edit: What I took from the winning answer
import av
import numpy as np
from tqdm import tqdm

container = av.open(video_path)
signal = []
audio_sample_times = []
video_sample_times = []
for frame in tqdm(container.decode(video=0, audio=0)):
    if isinstance(frame, av.audio.frame.AudioFrame):
        # Start time of this audio frame in seconds (pts is in units of time_base),
        # then one timestamp per sample at the audio sampling rate.
        start = float(frame.pts * frame.time_base)
        sample_times = start + np.arange(frame.samples) / frame.sample_rate
        audio_sample_times += list(sample_times)
        # Keep channel 0 only (assumes an interleaved/packed sample format).
        signal_f_ch0 = frame.to_ndarray().reshape((-1, len(frame.layout.channels))).T[0]
        signal += list(signal_f_ch0)
    elif isinstance(frame, av.video.frame.VideoFrame):
        video_sample_times.append(float(frame.pts * frame.time_base))
signal = np.abs(np.array(signal))
audio_sample_times = np.array(audio_sample_times)
video_sample_times = np.array(video_sample_times)
Unfortunately, in my particular case, all pts are consecutive and gapless, so the result is the same as with the naive solution ...
Using visual cues, we identified a ~10 s section of the videos somewhere within which they fall out of sync, but we can't find any trace of that in the data.
You need to run ffprobe to retrieve the PTS times. I don't know the exact command, but if you're ok with another package, try ffmpegio:
pip install ffmpegio-core
# OR
pip install ffmpegio  # if you also want to use it to read video frames & audio samples
If you're on Windows, see this doc on where ffmpeg.exe can be found automatically.
Then you can run
import ffmpegio
frames = ffmpegio.probe.frames('video.mp4', intervals=10)
This will return the frame info as a list of dicts for the first 10 packets (of mixed streams, in pts order). If you remove the intervals argument, it'll retrieve every frame (which will take a long time).
Inspect each dict of frames and decide which entries you need (say 'media_type', 'stream_index', 'pts' and 'pts_time'). Then add an entries argument containing these:
frames = ffmpegio.probe.frames('video.mp4', intervals=10,
entries=['media_type', 'stream_index', 'pts','pts_time'])
Once you're happy with what it returns, incorporate it into your program.
The intervals argument accepts many different formats; please read the docs.
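If you would rather call ffprobe directly than go through ffmpegio, here is a minimal sketch using subprocess and ffprobe's JSON output. It assumes ffprobe is on your PATH. The function names are hypothetical, and the requested entries mirror the ones suggested above. Showing frames makes ffprobe decode the whole file, so expect it to be slow on long recordings.

```python
import json
import subprocess

def ffprobe_frame_times(path):
    """Run ffprobe on `path` and return one tuple per decoded frame."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "frame=media_type,stream_index,pts,pts_time",
        "-of", "json",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_frames(out)

def parse_frames(json_text):
    """Turn ffprobe's JSON into (media_type, stream_index, pts, pts_time) tuples.

    Missing values (or "N/A" times) become None.
    """
    rows = []
    for f in json.loads(json_text).get("frames", []):
        pts_time = f.get("pts_time")
        rows.append((
            f.get("media_type"),
            int(f["stream_index"]) if "stream_index" in f else None,
            int(f["pts"]) if "pts" in f else None,
            float(pts_time) if pts_time not in (None, "N/A") else None,
        ))
    return rows
```

Splitting parsing from the subprocess call lets you check the parsing on saved ffprobe output without re-running the probe.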
What this or any other FFmpeg-based approach does not offer you is getting this info together with the data frames. You need to read in the frame timing data separately and mesh it with the data yourself. If you prefer a solution with more control (but perhaps more coding), look into PyAV, which interfaces with FFmpeg's underlying libraries. I'm fairly certain you can retrieve pts together with the frame data.
Disclaimer: This function has not been tested extensively. So, you may encounter an issue. If you have, please report on GitHub and I'll fix it asap.
I'm writing a little application to generate a GIF from a kifu file (a file format used to record a game of Japanese chess). I'm currently using Matplotlib to draw the board and the pieces, and the matplotlib.animation.FuncAnimation class combined with numpngw.AnimatedPNGWriter to write the GIF. However, it uses more than 800 MB of RAM to generate a single GIF with 80 frames. On reflection, this value doesn't seem surprising, because (as I understand it) each frame is 1700x1000 pixels and in color. So, to keep every frame in memory, it needs a minimum of 1700*1000*80*(bytes per pixel), which is a huge amount of RAM (about 408 MB at 3 bytes per RGB pixel, or 544 MB at 4 bytes per RGBA pixel).
Is there a way to minimize this amount either with matplotlib or with another library? I suppose I need to compress frames after creating them instead of keeping them raw but I can't figure out how to do that.
Thank you very much
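One way to reduce the footprint is to quantize each frame to an 8-bit palette image as soon as it is rendered, keep only those, and let Pillow write the GIF. This swaps in Pillow's GIF writer in place of numpngw. A palette ("P" mode) frame stores 1 byte per pixel, so 80 frames of 1700x1000 would need about 136 MB instead of 408 MB for RGB. Below is a minimal sketch. render_frame is a hypothetical stand-in for your board-drawing code, and the 64-colour palette is an assumption.

```python
from PIL import Image, ImageDraw

def render_frame(i, size=(1700, 1000)):
    """Hypothetical stand-in for drawing one board position as an RGB image."""
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    x = 50 + 20 * i  # move a "piece" so consecutive frames differ
    draw.rectangle([x, 100, x + 100, 200], fill="black")
    return img

def write_gif(path, n_frames, duration_ms=500):
    frames = []
    for i in range(n_frames):
        rgb = render_frame(i)
        # Quantize immediately: the palette image uses 1 byte per pixel, and the
        # full RGB image can be garbage-collected before the next frame is drawn.
        frames.append(rgb.quantize(colors=64))
    frames[0].save(path, save_all=True, append_images=frames[1:],
                   duration=duration_ms, loop=0)
```

With Matplotlib (Agg backend), each frame could be obtained after fig.canvas.draw() via Image.frombuffer("RGBA", fig.canvas.get_width_height(), fig.canvas.buffer_rgba()), converted to RGB, and then quantized as above.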
I have a collection of rather large tiff files (typical resolution is 30k x 30k to 50k x 50k). These have some interesting properties: 1. they contain several different resolutions of the same image, and 2. the data seems to be compressed as JPEG tiles (512x512). They also seem to be BigTIFFs.
I'd like to read specific regions of interest (more or less random access pattern) and I'd like it to be as fast as possible (roughly as fast as decompressing the needed tiles, plus a little overhead). Decompressing the whole file, or even one of the resolution layers, is not practical. I'm using Python.
I tried opening with skimage.io.MultiImage('test.tiff')[level], and this works (produces correct data). It does decompress each resolution layer completely, although not the whole file, so it is okay at the low resolutions but not really useful for the highest resolutions. There didn't seem to be any way to select a region of interest in skimage.io, or in any of the libraries it uses (imread, PIL, ...).
I also tried using OpenSlide using img.read_region((x,y), level, (width, height)). This library seems made exactly for this type of data, and is very fast, but unfortunately produces incorrect data for some regions. Until the bug is fixed upstream, I can't use it.
Lastly, using very recent versions, tifffile 2020.6.3 and imagecodecs 2020.5.30 (older versions don't work; I think at least 2018.10.18 is needed), I could list the tiles, using code modified from the tifffile documentation:
import tifffile

with tifffile.TiffFile('test.tiff') as tif:
    fh = tif.filehandle
    for page in tif.pages:
        jpegtables = page.tags.get('JPEGTables', None)
        if jpegtables is not None:
            jpegtables = jpegtables.value
        for index, (offset, bytecount) in enumerate(
            zip(page.dataoffsets, page.databytecounts)
        ):
            fh.seek(offset)
            data = fh.read(bytecount)
            tile, indices, shape = page.decode(data, index, jpegtables)
            print(tile.shape, indices, shape)
It seems the page.decode() call actually decompresses each tile (tile is a numpy array with pixel data). It is not obvious how to only get the index but not decompress. I'm also not sure how fast this would be. This leaves any selection of a region of interest and reassembly of tiles as an exercise to the user.
How do I efficiently read regions of interest out of files like this? Does someone have example code to do that with tifffile? Or, is there another library that would do the trick?
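The selection-and-reassembly part can be done with plain index arithmetic: work out which tiles overlap the region, decode only those, and copy the overlapping parts into an output array. Below is a minimal sketch. read_region is a hypothetical name, and decode_tile is a caller-supplied function (also hypothetical) that returns one decoded tile by index. It assumes tiles are numbered row-major (left to right, then top to bottom) and padded to full size at the right and bottom edges, which is how TIFF stores tiles with PlanarConfiguration=1.

```python
import math
import numpy as np

def read_region(decode_tile, image_shape, tile_shape, y0, x0, height, width):
    """Assemble image[y0:y0+height, x0:x0+width] from the overlapping tiles only.

    decode_tile(index) must return an array whose first two axes are
    tile_shape; trailing axes (e.g. RGB samples) are carried through.
    """
    img_h, img_w = image_shape
    tile_h, tile_w = tile_shape
    if height <= 0 or width <= 0 or not (0 <= y0 < img_h and 0 <= x0 < img_w):
        raise ValueError("region is empty or outside the image")
    tiles_across = math.ceil(img_w / tile_w)
    y1, x1 = min(y0 + height, img_h), min(x0 + width, img_w)  # clip to image
    out = None
    for ty in range(y0 // tile_h, (y1 - 1) // tile_h + 1):
        for tx in range(x0 // tile_w, (x1 - 1) // tile_w + 1):
            tile = np.asarray(decode_tile(ty * tiles_across + tx))
            if out is None:  # allocate once dtype and sample axes are known
                out = np.empty((y1 - y0, x1 - x0) + tile.shape[2:], dtype=tile.dtype)
            ty0, tx0 = ty * tile_h, tx * tile_w  # tile origin in image coordinates
            oy0, oy1 = max(y0, ty0), min(y1, ty0 + tile_h)
            ox0, ox1 = max(x0, tx0), min(x1, tx0 + tile_w)
            out[oy0 - y0:oy1 - y0, ox0 - x0:ox1 - x0] = (
                tile[oy0 - ty0:oy1 - ty0, ox0 - tx0:ox1 - tx0])
    return out
```

With tifffile, decode_tile could wrap the body of the loop in the question: seek to page.dataoffsets[index], read page.databytecounts[index] bytes, and take the first element returned by page.decode(data, index, jpegtables), reshaped to (tile height, tile width, samples) if it carries extra singleton axes. Newer tifffile releases also provide a zarr-store interface (TiffFile.aszarr) that performs this tile selection itself; check the documentation for your version.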
I saved the image to the clipboard, and when I read the image information from the clipboard and saved it locally, the image quality changed. How can I save it to maintain the original high quality?
from PIL import ImageGrab
im = ImageGrab.grabclipboard()
im.save('somefile.png','PNG')
I tried adding the parameter quality=95 to im.save(), but it didn't work. The original image is 131K, and the saved image is 112K.
The size of the file is not directly related to the quality of the image. It also depends on how efficiently the encoder does its job. As it is PNG, the process is lossless, so you don't need to worry - the quality is retained.
Note that the quality parameter has a different meaning when saving JPEG files versus PNG files:
With JPEG files, if you specify a lower quality you are effectively allowing the encoder to discard more information and give up image quality in return for a smaller file size.
With PNG, encoding and decoding are lossless. The quality is a hint to the encoder about how much time to spend compressing the file (always losslessly) and about the types of filtering/encoding that may suit best. It is more akin to gzip's --best or --fast options.
Further information about the PNG format is available on Wikipedia.
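A quick way to convince yourself that PNG size differences say nothing about quality is to encode the same image with different settings and compare the decoded pixels. In Pillow, the documented PNG save options for this trade-off are compress_level (0 to 9) and optimize. The gradient image below is just a hypothetical stand-in for the clipboard image.

```python
import io
from PIL import Image

def png_bytes(img, **options):
    """Encode `img` as PNG in memory with the given Pillow save options."""
    buf = io.BytesIO()
    img.save(buf, format="PNG", **options)
    return buf.getvalue()

img = Image.linear_gradient("L").convert("RGB")  # stand-in test image
fast = png_bytes(img, compress_level=1)
small = png_bytes(img, optimize=True)
print(len(fast), len(small))  # the byte counts may differ...
# ...but decoding either file yields exactly the original pixels.
same = all(Image.open(io.BytesIO(d)).tobytes() == img.tobytes() for d in (fast, small))
print(same)
```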
Without analysing the content of the two images it is impossible to say why the sizes differ - there could be many reasons:
One encoder may have noticed that the image contains fewer than 256 colours and so decided to use a palette, whereas the other may not have. That could make the file sizes differ by a factor of 3, yet the quality would be identical.
One encoder may use a larger buffer and spend longer looking for repeating patterns in the image. For a simplistic example, imagine the image was 32,000 pixels wide and each line was the same as the one above. If one encoder uses an 8kB buffer, it can never spot that the image just repeats over and over down the page so it has to encode every single line in full, whereas an encoder with a 64kB buffer might just be able to use 1 byte per line and use the PNG filtering to say "same as line above".
One encoder might decide, on grounds of simplicity of code or for lack of code space, to always encode the data in a 16-bit version even if it could use just 8 bits.
One encoder might decide it is always going to store an alpha layer even if the image is opaque, because that may make the code/data cleaner or simpler.
One encoder may always elect to do no filtering, whilst the other has the code required to do sub, up, average or Paeth filtering.
One encoder may not have enough memory to hold the entire image, so it may have to use a simplistic approach to be assured that it can handle whatever turns up later in the image stream.
I just made these examples up; don't take them as gospel. I am just trying to illustrate some possibilities.
To reproduce an exact copy of a file from the clipboard, the clipboard must contain a byte-for-byte copy of the original. This does not happen when the content comes from the "Copy" function in a program.
In theory a program could be created to do that by setting a blob-type object with a copy of the original file, but that would be highly inefficient and defeat the purpose of the clipboard.
Some points:
- When you copy into the clipboard using the file manager, the clipboard will hold a reference to the original file (not the entire file, which can potentially be much larger than RAM).
- Most programs will set the clipboard contents to some "useful version" of the displayed or selected data. This is very much subject to interpretation by the creator of the program.
- Parsing the clipboard content when reading an image is again subject to the whims of the library used to process the data and pack it back into an image format.
Generally if you want to copy a file exactly you will be better off just copying the original file.
Having said that: Evaluate the purpose of the copy-paste process and decide whether the data you get from the clipboard is "good enough" for the intended purpose. This obviously depends on what you want to use it for.
I have a logic analyser project that records several hundred million 16-bit values (~100-500 million), and I need to display anything from a few hundred samples to the entire capture as the user zooms.
When you zoom out the whole system gets a huge performance hit as it's loading a massive chunk from the file.
It occurred to me this morning that it would be more efficient to "stride" through the file at the user's screen resolution, since you can't physically display anything between pixels anyway. This doesn't solve the massive memory hit from the file size, though.
Is there a way I can take a huge data set and stream it down in chunks efficiently?
I was thinking streaming from start to start + view size by horiz resolution. This makes a very choppy zoom though.
The program uses Python, but I am open to calling something in C if it already exists.
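A common refinement of the stride idea is to reduce each screen column's samples to a summary instead of picking one sample, so short glitches between strided samples are not skipped. Below is a minimal sketch. It memory-maps the raw capture (only the pages actually touched are read) and processes it in bounded chunks. The function name, the raw uint16 file layout, and the chunk size are assumptions. Because a logic analyser word usually packs 16 digital channels, it uses bitwise AND/OR per column: a bit set in OR but clear in AND means that channel toggled inside that column. For analog data, np.minimum/np.maximum would be the natural swap.

```python
import numpy as np

def view_summary(path, start, stop, n_pixels, dtype=np.uint16, chunk=1 << 22):
    """Per-pixel-column (AND, OR) summary of samples[start:stop] in a raw file."""
    data = np.memmap(path, dtype=dtype, mode="r")  # maps the file; reads lazily
    stop = min(stop, data.shape[0])
    if not 0 <= start < stop or n_pixels < 1:
        raise ValueError("empty or invalid view")
    length = stop - start
    if length <= n_pixels:
        seg = np.array(data[start:stop])  # few enough samples to draw directly
        return seg, seg
    # Integer column edges: column k covers samples edges[k]:edges[k+1];
    # every column gets at least one sample because length > n_pixels.
    edges = start + (np.arange(n_pixels + 1, dtype=np.int64) * length) // n_pixels
    all_high = np.empty(n_pixels, dtype=data.dtype)
    any_high = np.empty(n_pixels, dtype=data.dtype)
    # Handle about `chunk` samples at a time to cap temporary memory.
    cols_per_chunk = max(1, (n_pixels * chunk) // length)
    for c0 in range(0, n_pixels, cols_per_chunk):
        c1 = min(c0 + cols_per_chunk, n_pixels)
        seg = np.asarray(data[edges[c0]:edges[c1]])
        offsets = edges[c0:c1] - edges[c0]
        all_high[c0:c1] = np.bitwise_and.reduceat(seg, offsets)
        any_high[c0:c1] = np.bitwise_or.reduceat(seg, offsets)
    return all_high, any_high
```

Fully zoomed out, this still touches every sample once per redraw, which is why pre-computed zoom levels help for the coarsest views.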
Well, I don't know if this is actually a question about programming or about design overall.
For the "zooming" problem with visualizations I suggest:
Have pre-computed/cached versions for some zoom levels. Ideally, the gradation should be chosen based on user behaviour.
When the user zooms in, you simultaneously:
- calculate the "proper" data, or load pre-computed aggregated data from a deeper zoom layer and crop it to your view frame;
- cheat by rendering low-res data from the previous layer, or smooth it with some approximation (but make sure to somehow tell the user that the data is not finalized).
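One way to realise "pre-computed versions for some zoom levels" is a pyramid of summaries, where each level aggregates blocks of factor**k raw samples. When drawing, you pick the coarsest level that still has at least one entry per screen pixel. Below is a minimal in-memory sketch. The function names, factor=4, and min_len are assumptions, and bitwise AND/OR are used as the aggregate on the assumption of packed digital channels. In practice each level could be written to disk once and memory-mapped.

```python
import numpy as np

def build_pyramid(samples, factor=4, min_len=1024):
    """List of (AND, OR) summary levels; level k aggregates factor**k samples."""
    samples = np.asarray(samples)
    levels = [(samples, samples)]  # level 0 is the raw data
    all_high, any_high = samples, samples
    while all_high.shape[0] // factor >= min_len:
        # Drop the ragged tail for simplicity; a real implementation would
        # keep the final partial block so the end of the capture is covered.
        n = (all_high.shape[0] // factor) * factor
        all_high = np.bitwise_and.reduce(all_high[:n].reshape(-1, factor), axis=1)
        any_high = np.bitwise_or.reduce(any_high[:n].reshape(-1, factor), axis=1)
        levels.append((all_high, any_high))
    return levels

def pick_level(levels, factor, start, stop, n_pixels):
    """Return (k, all_high, any_high) for the view [start, stop) from the
    coarsest level whose block size is at most the samples per pixel."""
    per_pixel = max(1, (stop - start) // n_pixels)
    k = 0
    while k + 1 < len(levels) and factor ** (k + 1) <= per_pixel:
        k += 1
    block = factor ** k
    lo, hi = levels[k]
    s, e = start // block, -(-stop // block)  # ceil division for the end index
    return k, lo[s:e], hi[s:e]
```

The chosen slice is at most factor times wider than the screen, so a final per-pixel reduction over it stays cheap. This also gives a natural "low-res first" fallback: show the coarser level while the finer one loads.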
Aside from that, think about whether you can optimize the way you store the data. Trees may make your life much easier, both for partial disk reads/searches and for storing aggregated data.
In my opinion, there is no point in displaying even a few hundred samples unless they form some kind of image/shape. I guess one can look at a hundred numbers if they are properly structured (colored). Several hundred, I doubt it: at that point you replace the actual data with some visualization (plots, charts, maps, ...).
To approach the problem, you may define some rule for when to stop displaying the actual data. For instance, if the digit height falls below, say, 10 pixels, you display a message such as "selected numbers are from rows 200...300, columns 400...500", or some graphical alternative showing the corner coordinates and the count of numbers.