I'm trying to create a data set from an AVI file I have, and I know I've made a mistake somewhere.
The AVI file is 1,827 KB (4:17), but after running my code to convert the frames into arrays of numbers, I now have a file that is 1,850,401 KB. This seems a little large to me.
How can I reduce the size of my data set / where did I go wrong?
# Program to read a video
# and extract frames
import cv2
import numpy as np
import time

# Function to extract frames
def FrameCapture(path):
    # Path to video file
    vidObj = cv2.VideoCapture(path)
    # Used as counter variable
    count = 0
    # Checks whether frames were extracted
    success = 1
    newDataSet = []
    try:
        while success:
            # vidObj object calls the read
            # function to extract frames
            success, image = vidObj.read()
            img_reverted = cv2.bitwise_not(image)
            new_img = img_reverted / 255.0
            newDataSet.append(new_img)
            #new_img >> "frame%d.txt" % count
            # Saves the frames with frame-count
            #cv2.imwrite("frame%d.jpg" % count, image)
            count += 1
    except:
        timestr = time.strftime("%Y%m%d-%H%M%S")
        np.save("DataSet" + timestr, newDataSet)

# Driver code
if __name__ == '__main__':
    # Calling the function
    FrameCapture("20191212-150041output.avi")
I'm going to guess that the video mainly consists of similar pixels blocked together, which the codec has compressed down to such a small file size. When you load single images into arrays, all that compression goes away, and depending on the fps of the video you will have thousands of uncompressed images. When you first load an image it will be stored as a numpy array of dtype uint8, and the image size will be WIDTH * HEIGHT * N_COLOR_CHANNELS bytes. After you divide it by 255.0 to normalize between 0 and 1, the dtype changes to float64 and the image size increases eightfold. You can use this information to calculate the expected size of the images.
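To make the size arithmetic concrete, here is a small check (the frame dimensions are made up for illustration; substitute your video's actual width and height):

```python
import numpy as np

# Hypothetical frame size; replace with your video's real dimensions.
width, height, channels = 640, 480, 3

frame_u8 = np.zeros((height, width, channels), dtype=np.uint8)
frame_f64 = frame_u8 / 255.0  # division by a float promotes to float64

print(frame_u8.nbytes)   # 921600  == width * height * channels bytes
print(frame_f64.nbytes)  # 7372800 == eight times larger
```

Multiply the per-frame size by the frame count (fps × duration) and the gigabytes add up fast.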
So your options are to either decrease the height and width of your images (downscale), change to grayscale, or, if your application allows it, stick with uint8 values. If the images don't change too much and you don't need thousands of them, you could also save only every 10th frame, or whatever seems reasonable. If you need them all as-is but they don't fit in memory, consider using a generator to load them on demand. It will be slower, but at least it will run.
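A minimal sketch of the generator idea — `load_frame` here is a stand-in for whatever per-frame loading you actually use (e.g. `cv2.imread` on saved frames):

```python
def frame_generator(paths, load_frame):
    """Yield frames one at a time instead of holding them all in memory."""
    for path in paths:
        yield load_frame(path)

# Demo with a fake loader; in practice load_frame could be cv2.imread.
fake_loader = lambda p: f"frame-data-for-{p}"
gen = frame_generator(["a.jpg", "b.jpg"], fake_loader)
print(next(gen))  # frame-data-for-a.jpg
```

Only one frame is ever materialized at a time, so memory use stays flat regardless of how many frames the video has.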
Related
I am attempting to convert a sound file into an image, then back into that same sound file in Python. First, I'm reading the .wav with Python's wave library, extracting the frames, and then arranging the bytes as RGB tuples in a square image.
The output is cool and looks like this
but when I try to convert the image back to a sound file, the result is horrid. Not sure what I'm doing wrong here.
import wave
from PIL import Image
import numpy as np
from math import sqrt

w = wave.open("sample.wav", mode="rb")
frames = w.readframes(w.getnframes())

pixels = []

##### FRAMES CONVERTED TO PIXEL TUPLES #####
for i in range(0, w.getnframes(), 3):
    pixels.append((frames[i], frames[i + 1], frames[i + 2]))

##### FIT TO SQUARE IMAGE #####
dimensions = int(sqrt(w.getnframes() / 3))
img = []
for x in range(0, dimensions):
    row = []
    for y in range(0, dimensions):
        row.append(pixels[x * dimensions + y])
    img.append(row)

array = np.array(img, dtype=np.uint8)
new_image = Image.fromarray(array)
new_image.save('new.png')

p = Image.open("new.png", mode="r")
flatten = [x for sets in list(p.getdata()) for x in sets]

###### WAV RE-CREATION ######
sampleRate = w.getframerate()  # hertz
obj = wave.open('sound.wav', 'w')
obj.setnchannels(w.getnchannels())
obj.setsampwidth(2)
obj.setframerate(sampleRate)
for i in range(0, len(flatten)):
    obj.writeframesraw((flatten[i]).to_bytes(8, "big"))
obj.close()
You are introducing loss in your conversion to pixels.
First, you will lose one or two frames at the end with for i in range(0,w.getnframes(),3):, when the number of frames is not a multiple of three.
Second, your dimensions = int(sqrt(w.getnframes()/3)) and then writing dimensions squared pixels will lose many frames when the number of frames divided by three is not a square.
Third, and most importantly, you are ignoring the sample width, as well as the number of channels. You are only saving the low eight bits of each sample in the image. If the sample width is 16 bits, you are essentially saving noise in the image.
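To avoid the first two kinds of loss, the raw frame bytes can be packed so that nothing at the tail is dropped. Here is a sketch (a hypothetical helper pair, not the asker's code): pad the byte stream with zeros to a multiple of three and record the original length, so the exact bytes come back out. Note this keeps every byte of each sample, so 16-bit samples survive intact instead of being truncated to their low byte.

```python
def bytes_to_pixels(frames: bytes):
    """Pack raw bytes into RGB triples, padding the tail with zeros."""
    padded = frames + b"\x00" * (-len(frames) % 3)  # pad to a multiple of 3
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return pixels, len(frames)                       # remember the true length

def pixels_to_bytes(pixels, length):
    """Flatten RGB triples back to bytes and strip the padding."""
    data = bytes(b for px in pixels for b in px)
    return data[:length]

frames = bytes(range(16))                  # e.g. eight 16-bit samples
pixels, n = bytes_to_pixels(frames)
assert pixels_to_bytes(pixels, n) == frames  # lossless round trip
```

For the image itself you would still need to handle the square-dimensions issue (pad the pixel list out to dimensions², then truncate on the way back) in the same record-the-length style.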
I am trying to create a video editor, where obviously, you will be able to remove and add frames. My thinking, was to convert the video file itself into an array of frames which can then be manipulated.
Using this answer's code, I did that. This works fine for small video files, but with big ones a memory error quickly occurs, because, of course, memory is storing hundreds of uncompressed images.
This is the exact code I am using:
import numpy
import cv2

def video_to_frames(file):
    """Splits a video file into a numpy array of frames"""
    video = cv2.VideoCapture(file)
    frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    frame_width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    buffer = numpy.empty((frame_count, frame_height, frame_width, 3), numpy.dtype("uint8"))
    index_count = 0
    running = True
    while index_count < frame_count and running:  # Reads each frame into the array
        running, buffer[index_count] = video.read()
        index_count += 1
    video.release()
    return buffer  # Returns the numpy array of frames

print(video_to_frames("Video.mp4"))
Finally, here is the exact memory error I got: MemoryError: Unable to allocate 249. GiB for an array with shape (46491, 1000, 1920, 3) and data type uint8
So I have two questions really:
Is this the most efficient way to go about manipulating a video?
If it is, how can I go about storing all those frames without running into a memory error?
Thank you.
In Python, I'm loading a GIF with PIL. I extract the first frame, modify it, and put it back. I save the modified GIF with the following code
imgs[0].save('C:\\etc\\test.gif',
save_all=True,
append_images=imgs[1:],
duration=10,
loop=0)
Where imgs is an array of images that makes up the GIF, and duration is the delay between frames in milliseconds. I'd like to make the duration value the same as in the original GIF, but I'm unsure how to extract either the total duration of a GIF or the frames displayed per second.
As far as I'm aware, the header of a GIF file does not provide any fps information.
Does anyone know how I could get the correct value for duration?
Thanks in advance
Edit: Example of gif as requested:
Retrieved from here.
In GIF files, each frame has its own duration. So there is no general fps for a GIF file. The way PIL supports this is by providing an info dict that gives the duration of the current frame. You could use seek and tell to iterate through the frames and calculate the total duration.
Here is an example program that calculates the average frames per second for a GIF file.
import os
from PIL import Image

FILENAME = os.path.join(os.path.dirname(__file__),
                        'Rotating_earth_(large).gif')

def get_avg_fps(PIL_Image_object):
    """ Returns the average framerate of a PIL Image object """
    PIL_Image_object.seek(0)
    frames = duration = 0
    while True:
        try:
            frames += 1
            duration += PIL_Image_object.info['duration']
            PIL_Image_object.seek(PIL_Image_object.tell() + 1)
        except EOFError:
            return frames / duration * 1000

def main():
    img_obj = Image.open(FILENAME)
    print(f"Average fps: {get_avg_fps(img_obj)}")

if __name__ == '__main__':
    main()
If you assume that the duration is equal for all frames, you can just do:
print(1000 / Image.open(FILENAME).info['duration'])
I have two videos. One is at 10 fps, and the other is the same video, also at 10 fps, but interpolated from its 5 fps version. I want to see how accurate the frame interpolation is by comparing the RGB values of every frame. I can extract every frame from both videos. However, I can only get the RGB values of one frame at a time. I use the following code:
from PIL import Image

im = Image.open('frame1.jpg')
pix = im.load()
for x in range(0, 640):
    for y in range(0, 480):
        print(pix[x, y])
This code can only find the RGB values of one frame, and I have hundreds of frames. The frames of my original video are named frame1.jpg, frame2.jpg, ..., frame100.jpg etc., and the other video's frames are saved as frames1.jpg, frames2.jpg, ..., frames100.jpg etc. Is there a way to automate this?
You can use glob (comes natively with Python) to load all the images and store them simultaneously, or process them one at a time.
import glob
from PIL import Image

for frame_file in glob.glob(path + "*.jpg"):
    im = Image.open(frame_file)
    pix = im.load()
    for x in range(0, 640):
        for y in range(0, 480):
            print(pix[x, y])
This will do what you did, but it will loop over all files in path with JPG format. If you want something more specific, you can add requests and I'll add to my answer. But this will sequentially load all images, allowing you to process them subsequently.
Put all that stuff in a loop.
for i in range(1, 101):
    file = 'frame%d.jpg' % i
    im = Image.open(file)
    # ...
I have a multiband satellite image stored in the band interleaved pixel (BIP) format along with a separate header file. The header file provides the details such as the number of rows and columns in the image, and the number of bands (can be more than the standard 3).
The image itself is stored like this (assume a 5 band image):
[B1][B2][B3][B4][B5][B1][B2][B3][B4][B5] ... and so on (basically 5 bytes - one for each band - for each pixel starting from the top left corner of the image).
I need to separate out each of these bands as PIL images in Python 3.2 (on Windows 7 64 bit), and currently I think I'm approaching the problem incorrectly. My current code is as follows:
def OpenBIPImage(file, width, height, numberOfBands):
    """
    Opens a raw image file in the BIP format and returns a list
    comprising each band as a separate PIL image.
    """
    bandArrays = []
    with open(file, 'rb') as imageFile:
        data = imageFile.read()
        currentPosition = 0
        for i in range(height * width):
            for j in range(numberOfBands):
                if i == 0:
                    bandArrays.append(bytearray(data[currentPosition : currentPosition + 1]))
                else:
                    bandArrays[j].extend(data[currentPosition : currentPosition + 1])
                currentPosition += 1
    bands = [Image.frombytes('L', (width, height), bytes(bandArray)) for bandArray in bandArrays]
    return bands
This code takes way too long to open a BIP file, surely there must be a better way to do this. I do have the numpy and scipy libraries as well, but I'm not sure how I can use them, or if they'll even help in any way.
Since the number of bands in the image is also variable, I'm finding it hard to figure out a way to read the file quickly and separate the image into its component bands.
And just for the record, I have tried messing with the list methods in the loops (using slices, not using slices, using only append, using only extend etc), it doesn't particularly make a difference as the major time is lost because of the number of iterations involved - (width * height * numberOfBands).
Any suggestions or advice would be really helpful. Thanks.
If you can find a fast function to load the binary data in a big python list (or numpy array), you can de-interleave the data using the slicing notation:
band0 = biglist[::nbands]
band1 = biglist[1::nbands]
....
Does that help?
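With numpy the whole de-interleave can be done in one shot. A sketch, assuming the raw BIP bytes are already in memory as a uint8 array (a tiny made-up image stands in for the file; `numpy.fromfile(file, dtype=numpy.uint8)` would load the real data):

```python
import numpy as np

width, height, nbands = 4, 2, 5  # tiny made-up image for illustration
# Stand-in for the raw file bytes; each value marks its position.
raw = np.arange(width * height * nbands, dtype=np.uint8)

# Reshape to (pixels, bands); each column is then one full band.
planes = raw.reshape(height * width, nbands)
band0 = planes[:, 0].reshape(height, width)

print(band0[0])  # [ 0  5 10 15] -- every nbands-th byte, as expected
```

Each band array can then go straight to `Image.frombytes('L', (width, height), band.tobytes())`, with no Python-level loop over width * height * numberOfBands elements.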
Standard PIL
To load an image from a file, use the open function in the Image module.
>>> import Image
>>> im = Image.open("lena.ppm")
If successful, this function returns an Image object. You can now use instance attributes to examine the file contents.
>>> print im.format, im.size, im.mode
PPM (512, 512) RGB
The format attribute identifies the source of an image. If the image was not read from a file, it is set to None. The size attribute is a 2-tuple containing width and height (in pixels). The mode attribute defines the number and names of the bands in the image, and also the pixel type and depth. Common modes are "L" (luminance) for greyscale images, "RGB" for true colour images, and "CMYK" for pre-press images.
The Python Imaging Library also allows you to work with the individual bands of a multi-band image, such as an RGB image. The split method creates a set of new images, each containing one band from the original multi-band image. The merge function takes a mode and a tuple of images, and combines them into a new image. The following sample swaps the three bands of an RGB image:
Splitting and merging bands
r, g, b = im.split()
im = Image.merge("RGB", (b, g, r))
So I think you should simply derive the mode and then split accordingly.
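Here is a small runnable check of that split/merge swap, using the modern `from PIL import Image` import style and a tiny hand-built image so the result is easy to verify:

```python
from PIL import Image

# Build a 2x1 RGB image with known pixel values.
im = Image.new("RGB", (2, 1))
im.putpixel((0, 0), (10, 20, 30))
im.putpixel((1, 0), (40, 50, 60))

# split() returns one greyscale ("L") image per band;
# merge() reassembles them in whatever order we choose.
r, g, b = im.split()
swapped = Image.merge("RGB", (b, g, r))

print(swapped.getpixel((0, 0)))  # (30, 20, 10)
```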
PIL with Spectral Python (SPy python module)
However, as you pointed out in your comments below, you are not dealing with a normal RGB image with 3 bands. So to deal with that, SpectralPython (a pure python module which requires PIL) might just be what you are looking for.
Specifically - http://spectralpython.sourceforge.net/class_func_ref.html#spectral.io.bipfile.BipFile
spectral.io.bipfile.BipFile deals with Image files with Band Interleaved Pixel (BIP) format.
Hope this helps.
I suspect that the repeated extend calls are what's slow; it's better to allocate everything up front:
def OpenBIPImage(file, width, height, numberOfBands):
    """
    Opens a raw image file in the BIP format and returns a list
    comprising each band as a separate PIL image.
    """
    # Pre-allocate one zero-filled bytearray per band
    bandArrays = [bytearray(height * width) for j in range(numberOfBands)]
    with open(file, 'rb') as imageFile:
        data = imageFile.read()
    currentPosition = 0
    for i in range(height * width):
        for j in range(numberOfBands):
            bandArrays[j][i] = data[currentPosition]
            currentPosition += 1
    bands = [Image.frombytes('L', (width, height), bytes(bandArray)) for bandArray in bandArrays]
    return bands
My measurements don't show such a slowdown from the iteration count alone:
import time

def x():
    height, width, numberOfBands = 1401, 801, 6
    before = time.time()
    for i in range(height * width):
        for j in range(numberOfBands):
            pass
    print(time.time() - before)

>>> x()
0.937999963760376