Fast string to array copying in Python

I'm looking to cut up image data into regularly sized screen blocks. Currently the method I've been using is this:
def getScreenBlocksFastNew(bmpstr):
    pixelData = array.array('c')
    step = imgWidth * 4
    pixelCoord = (blockY * blockSizeY * imgWidth +
                  blockSizeX * blockX) * 4
    for y in range(blockSizeY):
        pixelData.extend(bmpstr[pixelCoord : pixelCoord + blockSizeX * 4])
        pixelCoord += step
    return pixelData
bmpstr is a string of the raw pixel data, stored as one byte per RGBA value. (I also have the option of using a tuple of ints; they seem to take about the same amount of time either way.) This creates an array for one block of pixels, selected by setting blockX, blockY and blockSizeX, blockSizeY. Currently blockSizeX = blockSizeY = 22, which is the optimal screen block size for what I am doing.
My problem is that this process takes about .0045 seconds per 5 executions, and extrapolating that out to the 2000+ screen blocks needed to cover the full picture gives about 1.7 seconds per picture, which is far too slow.
I am looking to make this process faster, but I'm not sure what the proper algorithm will be. I am looking to have my pixelData array pre-created so I don't have to reinstantiate it every time. However this leaves me with a question: what is the fastest way to copy the pixel RGBA values from bmpstr to an array, without using extend or append? Do I need to set each value individually? That can't be the most efficient way.
For example, how can I copy values bmpstr[0:100] into pixelData[0:100] without using extend or setting each value individually?
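For the literal slice question, array.array supports slice assignment from another array of the same typecode, e.g. pixelData[0:100] = array.array('c', bmpstr[0:100]), provided pixelData has already been allocated to the right length, so no per-element loop is needed. If numpy is available, though, the whole block copy can collapse into a single slice. A rough sketch, not from the original post; imgHeight is an assumed global alongside the ones already used above:
import numpy as np

# Interpret the raw RGBA bytes as a 2-D view once (no copy): one image row per
# array row, 4 bytes per pixel.
frame = np.frombuffer(bmpstr, dtype=np.uint8).reshape(imgHeight, imgWidth * 4)

def get_block(blockX, blockY):
    # One slice per block; the row copies happen in C rather than in a Python loop.
    block = frame[blockY * blockSizeY : (blockY + 1) * blockSizeY,
                  blockX * blockSizeX * 4 : (blockX + 1) * blockSizeX * 4]
    return block.copy()  # .copy() yields a contiguous, independent buffer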

Related

My python loop through a dataframe slows down over time

I'm looping through a very large dataframe (11361 x 22679) and converting the values of each row to a pixel image using pyplot. So in the end I should have 11361 images with 151 x 151 pixels (I add 0's to the end to make it square).
allDF is a list of 33 DataFrames that correspond to the 33 subdirectories in newFileNames that the images need to be saved to.
I've tried deleting each DataFrame and image at the end of each iteration.
I've tried converting the float values to int.
I've tried gc.collect() at the end of each iteration (even though I know it's redundant)
I've taken measures not to store any additional values by always referencing the original data.
The only thing that helps is if I process one frame at a time. It still slows down, but because there are fewer iterations it's not as slow. So, I think the inner loop or one of the functions is the issue.
def shape_pixels(imglist):
    for i in range(122):
        imglist.append(0.0)
    imgarr = np.array(imglist).reshape((151,151))
    return imgarr
def create_rbg_image(subpath, imgarr, imgname):
    # create/save image
    img = plt.imshow(imgarr, cmap=rgbmap)
    plt.axis('off')
    plt.savefig(dirpath+subpath+imgname,
                transparent=True,
                bbox_inches=0, pad_inches=0)
for i in range(len(allDF)):
    for j in range(len(allDF[i])):
        fname = allDF[i]['File Name'].iloc[j][0:36]
        newlist = allDF[i].iloc[j][1:].tolist()
        newarr = shape_pixels(newlist)
        create_rbg_image(newFileNames[i]+'\\', newarr, fname)
I'd like to be able to run the code for the entire dataset and just come back to it when it's done, but I ran it overnight and got less than 1/3 of the way through. If it continues to slow down I'll never be done.
The first minute generates over 150 images The second generates 80. Then 48, 32, 27, and so on.. eventually it takes several minutes to create just one.
plt.close('all') helped significantly, but I switched to using PIL and hex values. This was significantly more efficient, and I was able to generate all 11k+ images in under 20 minutes.
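For reference, a minimal sketch of the PIL route (not the poster's actual code; the 0-255 scaling and the grayscale mode are assumptions about the data):
from PIL import Image
import numpy as np

def save_image(imgarr, outpath):
    # Scale the 151x151 float array to 0-255 and write it directly,
    # bypassing the matplotlib figure machinery entirely.
    arr = np.asarray(imgarr, dtype=np.float64)
    span = max(arr.max() - arr.min(), 1e-9)
    arr = (255 * (arr - arr.min()) / span).astype(np.uint8)
    Image.fromarray(arr, mode='L').save(outpath)

If you stay with pyplot instead, calling plt.close('all') after every savefig keeps figures from accumulating, which is the usual cause of this kind of progressive slowdown.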

Using numpy stride-tricks to operate on two different-sized moving windows

I am processing a 1-dimensional array of data. I am looking for a greater-than-expected variance within a 7-item window, but would then need to correct values within a larger, 20-item window.
I'm using Python and NumPy to accomplish the task. I started using numpy stride_tricks to create a moving window over the original array. stride_tricks seemed the fastest way computationally to find the higher variance in the smaller windows. I get stuck when attempting to expand the window to correct the data.
Here's my current code:
with open('Sather-line-352-original.txt') as f:
    array = np.array(list(map(int, f)))

# shape defines the dimensions of the new temp array.
# strides define memory-based coordinates of original array items.
def pystride(array, frame_length, strided_items):
    num_frames = 1 + ((len(array) - frame_length) // strided_items)
    row_stride = array.itemsize * strided_items
    col_stride = array.itemsize
    a_strided = np.lib.stride_tricks.as_strided(
        array,
        shape=(num_frames, frame_length),
        strides=(row_stride, col_stride)
    )
    return a_strided

def find_max_min(array):
    max_diff = 120
    for sub in pystride(array, frame_length=7, strided_items=2):
        max_val = max(sub)
        min_val = min(sub)
        if abs(max_val - min_val) >= max_diff:
            # assign 'pointers' in original array indicating where large diffs are found.
            sub[0] = int('{:<05}'.format(sub[0]))

find_max_min(array)
Particularly, is there a way to determine where the as_strided sub-array is in the original array? I've been modifying the data by appending 000 at the end of an integer value to act as a makeshift pointer, but that seems like a hack at best. Could I temporarily resize the sub-array for modification, and then resize back to the smaller window to continue scanning?
Here is a snippet of the array:
93,94,91,90,93,85,79,60,50,48,54,58,47,49,63,91,134,165,184,178,161,161,154,151,140,129,113,87,51,23,14,17,33,59,91,127,154,165,165,160,163
An example of data that requires correction occurs when the values fall from 140 to 14 within 7 values. Finding this means that everything from 23 to 33 will need to be bumped up to lie between 51 and 59.
Any ideas would be appreciated.
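One way to keep track of where each window sits without writing sentinel values into the data (a sketch built on the pystride function above, with the actual correction step left out): the i-th window returned by pystride starts at index i * strided_items in the original array, so flagged regions can be recorded as plain indices.
def find_large_diffs(array, frame_length=7, strided_items=2, max_diff=120):
    flagged = []
    for i, sub in enumerate(pystride(array, frame_length, strided_items)):
        if abs(int(sub.max()) - int(sub.min())) >= max_diff:
            start = i * strided_items
            flagged.append((start, start + frame_length))
    # Each (start, stop) pair is a slice into the original array, so the larger
    # 20-item correction window can be taken as array[start : start + 20].
    return flagged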

Tkinter: format RGB values into string

I am new to Tkinter (and Python) and I would like to find the most efficient way to format RGB values into a string so it can be used with the PhotoImage.put() function.
Let's say I have a Numpy rank 3 array in which the RGB values are stored, the 3rd dimension having a length of 3 for red, green and blue respectively. The most intuitive way to proceed would be:
for i in range(0, n_pixels_x):
    for j in range(0, n_pixels_y):
        hexcode = "#%02x%02x%02x" % (array[i,j,0], array[i,j,1], array[i,j,2])
        img.put(hexcode, (j,i))
Unfortunately, this is way too slow for large images.
As described in the PhotoImage Wiki, it is possible to pass one large string to put() so the function is called only once. Then, I need to efficiently convert my array into such a string, which should be formatted like this (for a 4x2 image):
"{#ff0000 #ff0000 #ff0000 #ff0000} {#ff0000 #ff0000 #ff0000 #ff0000}"
Again, this could easily be done with nested for loops, but I would like to avoid them for efficiency reasons. Is there any way to use join() in order to do what I want?
If needed, I can store the content of my array differently, the only constraint being that I should be able to modify the color values easily.
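One join()-based arrangement would look roughly like this (a sketch, assuming the array has shape (n_pixels_y, n_pixels_x, 3) with dtype uint8 and one image row per array row; note that the per-pixel generator is still a Python-level loop, so it mainly saves the repeated put() calls):
flat = ((array[..., 0].astype('uint32') << 16) |
        (array[..., 1].astype('uint32') << 8) |
        array[..., 2])
rows = ("{" + " ".join("#%06x" % v for v in row) + "}" for row in flat)
img.put(" ".join(rows))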
Edit: After working on this a bit, I found a way to format my values approximately 10 times faster than by using nested loops. Here is the commented piece of code:
# 1. Create RGB array
array = np.zeros((n_pixels_x*n_pixels_y, 3))
array = np.asarray(array, dtype = "uint32")
array[1,:] = [0, 100, 255]
# 2. Create a format string
fmt_str = "{" + " ".join(["#%06x"]*n_pixels_x) + "}"
fmt_str = " ".join([fmt_str]*n_pixels_y)
# 3. Convert RGB values to hex
array_hex = (array[:,0]<<16) + (array[:,1]<<8) + array[:,2]
# 4. Format array
img_str = fmt_str % tuple(array_hex)
For a 640x480 array, steps 3 and 4 take ~0.1s to execute on my laptop (evaluated with timeit.default_timer()). Using nested loops, it takes between 0.9s and 1.0s.
I would still like to reduce the computation time, but I'm not sure if any improvement is still possible at this point.
I was able to find another way to format my array, and this really seems to be the quickest solution. The solution is to simply use Image and ImageTk to generate an image object directly from the array:
from PIL import Image, ImageTk

array = np.zeros((height, width, 3), 'uint8')
imageObject = Image.fromarray(array)
img = ImageTk.PhotoImage(image=imageObject, mode='RGB')
This takes approximately 0.02s to run, which is good enough for my needs, and there is no need to use the put() function.
I actually found this answer from another question: How do I convert a numpy array to (and display) an image?

Python memory usage: dicts and variables with a large dataset

So, I'm making a game in Python 3.4. In the game I need to keep track of a map. It is a map of joined rooms, starting at (0,0) and continuing in every direction, generated in a filtered-random way (only correct matches for the next position are used for a random list selection).
I have several types of rooms, which have a name, and a list of doors:
RoomType = namedtuple('Room','Type,EntranceLst')
typeA = RoomType("A",["Bottom"])
...
For the map at the moment I keep a dict of positions and the type of room:
currentRoomType = typeA
currentRoomPos = (0,0)
navMap = {currentRoomPos: currentRoomType}
I have a loop that generates 9,000,000 rooms to test the memory usage.
I get around 600 to 800 MB when I run it.
I was wondering if there is a way to optimize that.
I tried replacing
navMap = {currentRoomPos: currentRoomType}
with
navMap = {currentRoomPos: "A"}
but this doesn't make a real difference in usage.
Now I was wondering if I could - and should - keep a list of all the types, and for every type keep the positions on which it occurs. I do not know however if it will make a difference with the way python manages its variables.
This is pretty much a thought-experiment, but if anything useful comes from it I will probably implement it.
You can use sys.getsizeof(object) to get the size of a Python object. However, you have to be careful when calling sys.getsizeof on containers: it only gives the size of the container, not the content -- see this recipe for an explanation of how to get the total size of a container, including contents. In this case, we don't need to go quite so deep: we can just manually add up the size of the container and the size of its contents.
The sizes of the types in question are:
# room type size
>>> sys.getsizeof(RoomType("A",["Bottom"])) + sys.getsizeof("A") + sys.getsizeof(["Bottom"]) + sys.getsizeof("Bottom")
233
# position size
>>> sys.getsizeof((0,0)) + 2*sys.getsizeof(0)
120
# One character size
>>> sys.getsizeof("A")
38
Let's look at the different options, assuming you have N rooms:
Dictionary from position -> room_type. This involves keeping N*(size(position) + size(room_type)) = 353 N bytes in memory.
Dictionary from position -> 1-character string. This involves keeping N*158 bytes in memory.
Dictionary from type -> set of positions. This involves keeping N*120 bytes plus a tiny overhead with storing dictionary keys.
In terms of memory usage, the third option is clearly better. However, as is often the case, you have a CPU memory tradeoff. It's worth thinking briefly about the computational complexity of the queries you are likely to do. To find the type of a room given its position, with each of the three choices above you have to:
Look up the position in a dictionary. This is an O(1) lookup, so you'll always have the same run time (approximately), independent of the number of rooms (for a large number of rooms).
Same
Look at each type, and for each type, ask if that position is in the set of positions for that type. This is an O(ntypes) lookup, that is, the time it takes is proportional to the number of types that you have. Note that, if you had gone for a list instead of a set to store the rooms of a given type, this would grow to O(nrooms * ntypes), which would kill your performance.
As always, when optimising, it is important to consider the effect of an optimisation on both memory usage and CPU time. The two are often at odds.
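To make the third option concrete, here is a small illustration (not from the answer itself) of the dictionary-of-sets layout and its O(ntypes) lookup:
rooms_by_type = {"A": set(), "B": set()}   # one set of positions per room type
rooms_by_type["A"].add((0, 0))

def type_at(pos):
    # Scan the types; each membership test on a set is O(1) on average.
    for room_type, positions in rooms_by_type.items():
        if pos in positions:
            return room_type
    return None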
As an alternative, you could consider keeping the types in a 2-dimensional numpy array of characters, if your map is sufficiently rectangular. I believe this would be far more efficient. Each character in a numpy array is a single byte, so the memory usage would be much less, and the CPU time would still be O(1) lookup from room position to type:
# Generate a 20 x 10 rectangular map
>>> map = np.repeat('a', 200).reshape(20, 10)
>>> map.nbytes
200 # i.e. 1 byte per character.
Some additional small-scale optimisations:
Encode the room type as an int rather than a string. Ints have size 24 bytes, while one-character strings have size 38.
Encode the position as a single integer, rather than a tuple. For instance:
# Random position
xpos = 5
ypos = 92
# Encode the position as a single int, using the high-order digits for x and the low-order digits for y
pos = xpos*1000 + ypos
# Recover the x and y values of the position.
xpos = pos // 1000
ypos = pos % 1000
Note that this kills readability, so it's only worth doing if you want to squeeze out the last bits of performance. In practice, you might want to use a power of 2, rather than a power of 10, as your multiplier (but a power of 10 helps with debugging and readability). Note that this brings your number of bytes per position from 120 to 24. If you do go down this route, consider defining a Position class using __slots__ to tell Python how to allocate memory, and add xpos and ypos properties to the class. You don't want to litter your code with pos // 1000 and pos % 1000 statements.
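A minimal sketch of such a Position class (illustrative only, keeping the power-of-10 multiplier for readability):
class Position:
    __slots__ = ('pos',)              # no per-instance __dict__, so less memory

    def __init__(self, xpos, ypos):
        self.pos = xpos * 1000 + ypos

    @property
    def xpos(self):
        return self.pos // 1000

    @property
    def ypos(self):
        return self.pos % 1000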

Windowing an audio signal in Python for a gammatone filterbank implementation

I am new to programming, particularly to Python. I am trying to implement an auditory model using 4th order gammatone filters. I need to break down a signal into 39 channels. When I used a smaller signal (about 884726 bits), the code ran well, but I think the buffers get full, so I have to restart the shell to run the code a second time. I tried using flush() but it didn't work out.
So, I decided to window the signal using a Hanning window but couldn't succeed at it either. To be very clear, I need to break an audio signal into 39 channels, rectify it (half wave) and then pass it into a second bank of 4th order filters, this time into 10 channels. I am trying to downsample the signal before sending it into the second bank of filters. This is the piece of code that implements the filter bank using the coefficients generated by another function. The dimensions of b are 39x4096.
def filterbank_application(input, b, verbose=False):
    """
    A function to run the input through a bandpass filter bank with parameters defined by the b and a coefficients.

    Parameters:
    * input (type: array-like matrix of floats) - input signal. (Required)
    * b (type: array-like matrix of floats) - the b coefficients of each filter in shape b[numOfFilters][numOfCoeffs]. (Required)

    Returns:
    * y (type: numpy array of floats) - an array with inner dimensions equal to that of the input and outer dimension equal to
      the length of fc (i.e. the number of bandpass filters in the bank) containing the outputs to each filter. The output
      signal of the nth filter can be accessed using y[n].
    """
    input = np.array(input)
    bshape = np.shape(b)
    nFilters = bshape[0]
    lengthFilter = bshape[1]
    shape = (nFilters,) + (np.shape(input))
    shape = np.array(shape[0:])
    shape[-1] = shape[-1] + lengthFilter - 1
    y = np.zeros((shape))
    for i in range(nFilters):
        if(verbose):
            sys.stdout.write("\r" + str(int(np.round(100.0*i/nFilters))) + "% complete.")
            sys.stdout.flush()
        x = np.array(input)
        y[i] = signal.fftconvolve(x, b[i])
    if(verbose): sys.stdout.write("\n")
    return y

samplefreq, input = wavfile.read('sine_sweep.wav')
input = input.transpose()
input = (input[0] + input[1])/2
b_coeff1 = gammatone_filterbank(samplefreq, 39)
Output = filterbank_application(input, b_coeff1)
Rect_Output = half_rectification(Output)
I want to window the audio into chunks of 20 seconds each. I would appreciate it if you could let me know an efficient way of windowing my signal, as the whole audio will be 6 times bigger than the signal I am using. Thanks in advance.
You may have a problem with memory consumption if you run a 32-bit Python. Your code consumes approximately 320 octets (bytes) per sample (40 buffers, 8 octets per sample). The maximum memory available is 2 GB, which means that the absolute maximum size for the signal is around 6 million samples. If your file is around 100 seconds long, then you may start having problems.
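For the windowing itself, a straightforward cut into 20-second chunks could look like this (a sketch using the variables already defined above; note that chunking a convolution introduces edge effects at the chunk boundaries unless consecutive chunks overlap by the filter length, as in overlap-add):
chunk_len = 20 * samplefreq                      # samples per 20 s window
for start in range(0, len(input), chunk_len):
    chunk = input[start : start + chunk_len]
    out = filterbank_application(chunk, b_coeff1)
    rect = half_rectification(out)
    # ... downsample `rect`, run the second filter bank, then discard it
    # before moving on, so only one chunk's worth of data stays in memory.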
There are two ways out of that problem (if that really is the problem, but I cannot see any evident reason why your code would otherwise crash). Either get a 64-bit Python or rewrite your code to use memory in a more practical way.
If I have understood your problem correctly, you want to:
run the signal through 39 FIR filters (4096 points each)
half-rectify the resulting signals
downsample the resulting half-rectified signal
filter each of the downsampled rectified signals by 10 FIR filters (or IIR?)
This will give you 39 x 10 signals which give you the attack and frequency response of the incoming auditory signal.
How I would do this is:
take the original signal and keep it in memory (if it does not fit, that can be fixed by a trick called memmap, but if your signal is not very long it will fit)
take the first gammatone filter and run the convolution (scipy.signal.fftconvolve)
run the half-wave rectification (sig = np.clip(sig, 0, None, out=sig))
downsample the signal (e.g. scipy.signal.decimate)
run the 10 filters (e.g. scipy.signal.fftconvolve)
repeat steps 2-5 for all other gammatones
This way you do not need to keep the 39 copies of the filtered signal in memory, if you only need the end results.
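A minimal sketch of steps 2-6 (assuming b_coeff1 holds the 39 gammatone FIRs, while b_coeff2 for the 10 second-stage FIRs and the decimation factor q are placeholder names, not from the question):
from scipy import signal
import numpy as np

results = []
for gamma in b_coeff1:
    y = signal.fftconvolve(input, gamma)            # step 2: one gammatone channel
    y = np.clip(y, 0, None, out=y)                  # step 3: half-wave rectification
    y = signal.decimate(y, q)                       # step 4: downsample
    results.append([signal.fftconvolve(y, b2) for b2 in b_coeff2])  # step 5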
Without seeing the complete application and knowing more about the environment it is difficult to say whether you really have a memory problem.
Just a stupid signal-processing question: Why half-wave rectification? Why not full-wave rectification: sig = np.abs(sig)? The low-pass filtering is easier with full-wave rectified signal, and the audio signals should anyway be rather symmetric.
There are a few things which you might want to change in your code:
you convert input into an array as the first thing in your function - there is no need to do it again within the loop (just use input instead of x when running the fftconvolve)
creating an empty y could be done by y = np.empty((b.shape[0], input.shape[0] + b.shape[1] - 1)); this will be more readable and gets rid of a number of unnecessary variables
input.transpose() takes some time and memory and is not required. You may instead do: input = np.average(input, axis=1). This will average every row in the array (i.e. average the channels).
There is nothing wrong with your sys.stdout.write, etc. There the flush is used because otherwise the text is written into a buffer and only shown on the screen when the buffer is full.
