I'm trying to draw fast lines using pygame that aren't rendered directly to the screen. I keep a Python list with one entry per pixel of the desired resolution, and each entry stores an integer count of how many times that pixel has been hit by the line algorithm. Using this, a 2D heat map is built up: rather than drawing a flat pixel value, pixel values are incremented based on the number of times a line runs through them, and "hot" pixels get brighter colours.
The reason for doing it this way is that we don't know in advance how many of these lines are going to get drawn, and what the maximum number of times any given pixel is going to be hit. Since we'd like to scale the output so that each rendering has the correct maximum and minimum RGB values, we can't just draw to the screen.
Is there a better way to draw these lines than a relatively naive Bresenham's algorithm? Here's the critical part of the drawLine function:
# before the loop, to save repeated multiplications
xm = []
for i in range(resolution[0]):
    xm.append(i * resolution[0])

# inside of drawLine, index into the f list, of size resolution[0] * resolution[1]
for x in range(x0, x1 + 1):
    if steep:
        idx = y + xm[x]
        f[idx] += 1
    else:
        idx = x + xm[y]
        f[idx] += 1
The end result is scaled and drawn to the screen based on the maximum value inside of f. For example, if the maximum value is 1000, then you can assume the RGB value of each of the pixels is (f[i] * 255) / 1000.
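As a rough sketch of that scaling pass (the names below are illustrative, and how the scaled values are finally blitted is left out):

fmax = max(f) or 1                             # guard against an all-zero heat map
scaled = [(hits * 255) // fmax for hits in f]  # one 0-255 value per pixel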
The profile information says that runtime is dominated by the index lookups into f. Using previous questions here, I've confirmed that these plain lists are faster than numpy arrays or the array module for this access pattern, but for drawing lines like this it still seems like there's room to improve.
What's a good and fast method for drawing an unknown number of lines to the screen, knowing that you'll be scaling the output in the end to render to the screen? Is there a good way to get rid of the index overhead?
Try Cython or something similar. (If you do, I would be interested in knowing if/how much that helped)
Cython is a programming language to simplify writing C and C++ extension modules for the CPython Python runtime. Strictly speaking, Cython syntax is a superset of Python syntax, additionally supporting: direct calling of C functions, or C++ functions/methods, from Cython code; and strong typing of Cython variables, classes, and class attributes as C types. Cython compiles to C or C++ code rather than Python, and the result is used as a Python Extension Module or as a stand-alone application embedding the CPython runtime. (http://en.wikipedia.org/wiki/Cython)
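As a rough illustration only (not your actual drawLine), the hot loop might look something like this in a hypothetical drawline.pyx; the typed memoryview and C ints are what remove the per-index Python overhead:

# cython: boundscheck=False, wraparound=False
# Hedged sketch: mirrors the indexing of the original loop; the Bresenham
# error-term bookkeeping is elided.
def draw_line(int[::1] f, int x0, int x1, int y, int width, bint steep):
    cdef int x
    for x in range(x0, x1 + 1):
        if steep:
            f[y + x * width] += 1
        else:
            f[x + y * width] += 1
        # ... update y and the error term here, as in the original drawLine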
Related
I have an image stored as a bytestring b'' and am performing per-pixel operations. Right now the fastest way I've found is to use the struct module to pack and unpack the bytes during modification, then save the pixels to a bytearray:
# retrieve image data, stored as a bytestring
pixels = buff.get(rect, 1.0, "CIE LCH(ab) alpha double",
                  Gegl.AbyssPolicy.CLAMP)
# generator splitting the data into 32-byte chunks, one per pixel's four 8-byte LCHA channels
pixels_iter = (pixels[x:x + 32] for x in range(0, len(pixels), 32))
new_pixels = bytearray()
# when using `pool.map`, the loop was placed in its own function
for pixel in pixels_iter:
    l, c, h, a = struct.unpack('dddd', pixel)
    # simple operation for now: lower chroma if bright and saturated
    c = c - (l * c) / 100
    new_pixels += struct.pack('dddd', l, c, h, a)
# save the new data; everything from here on is handled by GEGL
shadow.set(rect, "CIE LCH(ab) alpha double", bytes(new_pixels))
Problem is this takes about 3 1/2 seconds for a 7MP image on my workstation. Fair but not ideal if updates are frequently requested. From what I've gathered, it seems the constant array modification and possibly struct [un]packing are the main culprits. I've refactored this probably a dozen times and I think I'm out of ideas for optimizing this.
I've tried:
- struct.unpacking the whole bytestring once instead of each pixel as needed: lost about 20%.
- collections.deque (admittedly I'm not familiar with its technicalities): lost 10-30% depending on the implementation.
- Similar results with other iterator helpers like map/join.
- numpy.array (I also admittedly know basically nothing about numpy in general): similar results to deque.
- multiprocessing, which seemed to be bottlenecked when I appended the pool.map results to new_pixels. It actually lost about 10%, which seems wild, as usually I can just lazily throw threads at problems. The pixels_iter was regrouped into equally sized sublists, one per worker, so new_pixels concatenated 8 large lists instead of a few million small ones, which I thought would be faster. I'm tempted to retry this one, as I might have botched it with my 4 am implementation. In theory it could also work by saving multiple small sections of the image buffer, avoiding the concatenation to new_pixels entirely, but that would vastly increase code complexity elsewhere.
- Converting pixels itself into a bytearray and modifying it in place using slice ranges: lost ~30%, but also halved memory usage.
Completely separate interpreters like Pypy are off the table, as I'm not the one bundling the Python version.
NumPy should produce much faster results than a manual loop, if you use it properly. Using it properly means using NumPy operations over whole arrays, not just looping manually over a NumPy array.
For example,
import numpy

new_pixels = bytearray(pixels)
# a writable float64 view onto the bytearray: L, C, H, A values interleaved
as_numpy = numpy.frombuffer(new_pixels, dtype=float)
# c = c - (l * c) / 100, applied to every pixel at once
as_numpy[1::4] *= 1 - as_numpy[::4] / 100
Now new_pixels contains the adjusted values.
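Put together with the snippet from the question (the buffer format string and Gegl calls are copied from there), the whole operation might look roughly like this:

import numpy

pixels = buff.get(rect, 1.0, "CIE LCH(ab) alpha double", Gegl.AbyssPolicy.CLAMP)
new_pixels = bytearray(pixels)
as_numpy = numpy.frombuffer(new_pixels, dtype=float)   # writable view of L, C, H, A
as_numpy[1::4] *= 1 - as_numpy[::4] / 100              # lower chroma where L is high
shadow.set(rect, "CIE LCH(ab) alpha double", bytes(new_pixels))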
I am quite new to Python but I've started using it for some data analysis and now I love it. Before, I used C, which I find just terrible for file I/O.
I am working on a script which computes the radial distribution function between a set of N = 10000 (ten thousand) points in a 3D box with periodic boundary conditions (PBC). Basically, I have a file of 10000 lines like this:
0.037827 0.127853 -0.481895
12.056849 -12.100216 1.607448
10.594823 1.937731 -9.527205
-5.333775 -2.345856 -9.217283
-5.772468 -10.625633 13.097802
-5.290887 12.135528 -0.143371
0.250986 7.155687 2.813220
which represents the coordinates of the N points. What I have to do is compute the distance between every pair of points (so I have to consider all 49995000 combinations of 2 elements) and then do some operation on it.
Of course the most taxing part of the program is the loop over the 49995000 combinations.
My current function looks like this:
g = [0 for i in range(Nbins)]
for i in range(Nparticles):
    for j in range(i + 1, Nparticles):
        # compute distance and apply PBC
        dx = coors[i][0] - coors[j][0]
        if dx > halfSide:
            dx -= boxSide
        elif dx < -halfSide:
            dx += boxSide
        dy = coors[i][1] - coors[j][1]
        if dy > halfSide:
            dy -= boxSide
        elif dy < -halfSide:
            dy += boxSide
        dz = coors[i][2] - coors[j][2]
        if dz > halfSide:
            dz -= boxSide
        elif dz < -halfSide:
            dz += boxSide
        d2 = dx**2 + dy**2 + dz**2
        if d2 < (cutoff * boxSide)**2:
            g[int(sqrt(d2) / dr)] += 2
Notice: coors is a nested array, created using loadtxt() on the data file.
I basically recycled a function that I used in another program, written in C.
I am not using itertools.combinations() because I have noticed that the program runs slightly slower if I use it for some reason (one pass takes around 111 s with it, while it runs in around 106 s with this implementation).
This function takes around 106 s to run, which is terrible considering that I have to analyze some 500 configuration files.
Now, my question is: is there a general way to make this kind of loop run faster using Python? I guess that the corresponding C code would be faster, but I would like to stick to Python because it is so much easier to use.
I would like to stress that even though I'm certainly looking for a particular solution to my problem, I would like to know how to iterate more efficiently when using Python in general.
PS Please try to explain the code in your answers as much as possible (if you write any) because I'm a newbie in Python.
Update
First, I want to say that I know that, since there is a cutoff, I could write a more efficient algorithm if I divide the box in smaller boxes and only compute the distances in neighboring boxes, but for now I would only like to speed up this particular algorithm.
I also want to say that using Cython (I followed this tutorial) I managed to speed up everything a little bit, as expected (77 s, where before it took 106).
If memory is not an issue (and it probably isn't given that the actual amount of data won't be different from what you are doing now), you can use numpy to do the math for you and put everything into an NxN array (around 800MB at 8 bytes/float).
Given the operations your code is trying to do, I do not think you need any loops outside numpy:
import numpy

g = numpy.zeros((Nbins,))
coors = numpy.array(coors)

# compute distances and apply PBC, one axis at a time, over the whole NxN grid
dx = numpy.subtract.outer(coors[:, 0], coors[:, 0])
dx[dx < -halfSide] += boxSide
dx[dx > halfSide] -= boxSide
dy = numpy.subtract.outer(coors[:, 1], coors[:, 1])
dy[dy < -halfSide] += boxSide
dy[dy > halfSide] -= boxSide
dz = numpy.subtract.outer(coors[:, 2], coors[:, 2])
dz[dz < -halfSide] += boxSide
dz[dz > halfSide] -= boxSide
d2 = dx**2 + dy**2 + dz**2

# Avoid counting diagonal elements: inf would do as well as nan
numpy.fill_diagonal(d2, numpy.nan)

# This has the same length as the number of times the
# if-statement in the loops would have triggered
cond = numpy.sqrt(d2[d2 < (cutoff * boxSide)**2]) / dr

# See http://stackoverflow.com/a/24100418/2988730
numpy.add.at(g, cond.astype(numpy.int_), 2)
The point is to always stay away from looping in Python when dealing with large amounts of data. Python loops are inefficient. They perform lots of book-keeping operations that slow down the math code.
Libraries such as numpy and dynd provide operations that run the loops you want "under the hood", usually bypassing the bookkeeping with a C implementation. The advantage is that you can do the Python-level bookkeeping once for each enormous array and then get on with processing the raw numbers, instead of dealing with Python wrapper objects (numbers are full-blown objects in Python; there are no primitive types).
In this particular example, I have recast your code to use a sequence of array-level operations that do the same thing that your original code did. This requires restructuring how you think about the steps a little, but should save a lot on runtime.
You've got an interesting problem there. There are some common guidelines for high-performance Python; these were written with Python 2 in mind but should mostly carry over to Python 3.
1. Profile your code. %timeit and %%timeit in jupyter/ipython are quick and convenient for interactive sessions, but cProfile and line_profiler are valuable for finding bottlenecks.
2. Here is a link to a short essay covering the basics from the Python documentation that I found helpful: https://www.python.org/doc/essays/list2str
3. NumPy is a great package for vectorized operations. Note that numpy arrays are generally slower to work with than small- to medium-sized lists, but the memory savings are huge. The speed gain over lists is substantial when working with multi-dimensional arrays. Also, if you start observing cache misses and page faults with pure Python, the numpy benefit will be even greater.
4. I have been using Cython recently with a fair amount of success, for cases where numpy/scipy don't quite cut it with their built-in functions.
5. Check out scipy, which has a huge library of functions for scientific computing. There's a lot to explore; e.g., scipy.spatial.distance.pdist computes all nC2 pairwise distances quickly. In the test run below, pairwise distances for 10k items completed in 375 ms. 100k items would probably break my machine without refactoring, though.
import numpy as np
from scipy.spatial.distance import pdist

xyz_list = np.random.rand(10000, 3)

xyz_list
Out[2]:
array([[ 0.95763306,  0.47458207,  0.24051024],
       [ 0.48429121,  0.12201472,  0.80701931],
       [ 0.26035835,  0.76394588,  0.7832222 ],
       ...,
       [ 0.07835084,  0.8775841 ,  0.20906537],
       [ 0.73248369,  0.60908474,  0.57163023],
       [ 0.68945879,  0.19393467,  0.23771904]])

In [10]: %timeit xyz_pairwise = pdist(xyz_list)
1 loop, best of 3: 375 ms per loop

In [12]: xyz_pairwise = pdist(xyz_list)
In [13]: len(xyz_pairwise)
Out[13]: 49995000
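If you go this route, here is a hedged sketch of how the pdist output could replace the double loop from the question (coors, cutoff, boxSide, Nbins, and dr are the question's names). Note that pdist does not apply the periodic minimum-image correction, so this is only illustrative:

import numpy as np
from scipy.spatial.distance import pdist

d = pdist(coors)                     # all 49995000 pairwise distances
d = d[d < cutoff * boxSide]          # keep only distances inside the cutoff
# assumes Nbins * dr covers the cutoff, as in the original indexing
counts, edges = np.histogram(d, bins=Nbins, range=(0, Nbins * dr))
g = 2 * counts                       # the original loop adds 2 per pair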
Happy exploring!
Again I have a question concerning large loops.
Suppose I have a function limits:

def limits(a, b):
    # evaluate the integral with lower limit a and upper limit b
    return result  # a float
A and B are simple np.arrays that store my values a and b. Now I want to evaluate the integral roughly 300,000^2 / 2 times, because A and B each have a length of 300,000 and the integral is symmetric.
In Python I tried several approaches, such as itertools.combinations_with_replacement to create the combinations of A and B and then feed them into the integral, but that takes a huge amount of time and completely overloads memory.
Is there any way, for example transferring the loop in another language, to speed this up?
I would like to run the loop
for i in range(len(A)):
    for j in range(len(B)):
        np.histogram(limits(A[i], B[j]))
I think histogramming the return value of limits is desirable, so that I don't have to store additional arrays that grow quadratically.
From what I have read, Python is not really the best choice for this kind of iterative approach.
So would it be reasonable to evaluate this loop in another language from within Python, and if so, how? I know there are ways to hand code off to another language, but I have never done it so far.
Thanks for your help.
If you're worried about memory footprint, all you need to do is bin the results as you go in the for loop.
import numpy as np

num_bins = 100
bin_upper_limits = np.linspace(-456, 456, num=num_bins - 1)
# (the last bin has no upper limit; it goes from 456 to infinity)
bin_count = np.zeros(num_bins)

for a in A:
    for b in B:
        if b < a:
            # you said the integral is symmetric, so we can skip these, right?
            continue
        new_result = limits(a, b)
        which_bin = np.digitize([new_result], bin_upper_limits)
        bin_count[which_bin] += 1
So nothing large is saved in memory.
As for speed, I imagine that the overwhelming majority of time is spent evaluating limits(a,b). The looping and binning is plenty fast in this case, even in python. To convince yourself of this, try replacing the line new_result = limits(a,b) with new_result = 234. You'll find that the loop runs very fast. (A few minutes on my computer, much much less than the 4 hour figure you quote.) Python does not loop very fast compared to C, but it doesn't matter in this case.
Whatever you do to speed up the limits() call (including implementing it in another language) will speed up the program.
If you change the algorithm, there is vast room for improvement. Let's take an example of what it seems you're doing. Let's say A and B are 0,1,2,3. You're integrating a function over the ranges 0-->0, 0-->1, 1-->1, 1-->2, 0-->2, etc. etc. You're re-doing the same work over and over. If you have integrated 0-->1 and 1-->2, then you can add up those two results to get the integral 0-->2. You don't have to use a fancy integration algorithm, you just have to add two numbers you already know.
Therefore it seems to me that you can compute integrals in all the smallest ranges (0-->1, 1-->2, 2-->3), store the results in an array, and add subsets of the results to get the integral over whatever range you want. If you want this program to run in a few minutes instead of 4 hours, I suggest thinking through an alternative algorithm along those lines.
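Here is a hedged sketch of that idea, assuming the sorted values of A can serve as the breakpoints and limits(a, b) is your existing integration routine:

import numpy as np

breakpoints = np.sort(A)
# integrals over each smallest interval [breakpoints[k], breakpoints[k + 1]]
pieces = [limits(breakpoints[k], breakpoints[k + 1])
          for k in range(len(breakpoints) - 1)]
cumulative = np.concatenate(([0.0], np.cumsum(pieces)))

# the integral from breakpoints[i] to breakpoints[j] (i <= j) is now a subtraction
def integral_between(i, j):
    return cumulative[j] - cumulative[i]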
(Sorry if I'm misunderstanding the problem you're trying to solve.)
I need to replace all the white(ish) pixels in a PNG image with alpha transparency.
I'm using Python in AppEngine and so do not have access to libraries like PIL, imagemagick etc. AppEngine does have an image library, but is pitched mainly at image resizing.
I found the excellent little pyPNG module and managed to knock up a little function that does what I need:
make_transparent.py
pseudo-code for the main loop would be something like:

for each pixel:
    if pixel looks "quite white":
        set pixel values to transparent
    otherwise:
        keep existing pixel values

and (assuming 8-bit values) "quite white" would be:

each r, g, b value is greater than 240
AND each r, g, b value is within 20 of the others
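One possible concrete reading of that test, as a hedged sketch (the function name and defaults are illustrative):

def quite_white(r, g, b, threshold=240, tolerance=20):
    # every channel is bright, and all channels are close to one another
    return (min(r, g, b) > threshold
            and max(r, g, b) - min(r, g, b) <= tolerance)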
This is the first time I've worked with raw pixel data in this way, and although it works, it also performs extremely poorly. It seems like there must be a more efficient way of processing the data without iterating over each pixel in this manner? (Matrices?)
I was hoping someone with more experience in dealing with these things might be able to point out some of my more obvious mistakes/improvements in my algorithm.
Thanks!
This still visits every pixel, but may be faster:
from array import array

# `threshold`, `tolerance`, and `nearly_eq` are assumed to be defined as in the
# original make_transparent.py
new_pixels = []
for row in pixels:
    new_row = array('B', row)
    i = 0
    while i < len(new_row):
        r = new_row[i]
        g = new_row[i + 1]
        b = new_row[i + 2]
        if r > threshold and g > threshold and b > threshold:
            m = int((r + g + b) / 3)
            if nearly_eq(r, m, tolerance) and nearly_eq(g, m, tolerance) and nearly_eq(b, m, tolerance):
                new_row[i + 3] = 0
        i += 4
    new_pixels.append(new_row)
It avoids the slice-based generator, which copies the entire row of pixels for every pixel (less one pixel each time).
It also pre-allocates the output row by directly copying the input row, and then only writes the alpha value of pixels which have changed.
Even faster would be to not allocate a new set of pixels at all, and just write directly over the pixels in the source image (assuming you don't need the source image for anything else).
Honestly, the only heuristic I could conceive is picking a few arbitrary, random points on your image and using a flood fill.
This only works well if your image has large contiguous white portions (if your image is an object with few or no holes in front of a background, then you're in luck: you actually have a heuristic for which points to flood fill from).
(disclaimer: I am no image guru =/ )
I'm quite sure there is no short cut for this. You have to visit every single pixel.
The issue seems to have more to do with loops in Python than with images.
Python loops are extremely slow; it is best to avoid them and use built-in looping constructs instead.
Here, if you were willing to copy the image, you could use a list comprehension:
def make_transparent(pixel):
    if pixel looks "quite white": return transparent
    else: return pixel

newImage = [make_transparent(p) for p in oldImage]
===SOLVED===
Thanks for your suggestions and comments. By building on the flood_fill algorithm given in the book Beginning Python Visualization (Chapter 9, Image Processing), I have implemented what I wanted. I can count the objects, get an enclosing rectangle for each object (and therefore its height and width), and finally construct NumPy arrays or matrices for each of them.
Although it is not an optimized approach, it does what I want. The source code (lab2.py) and the png file (lab2-particles.png) that I use have been put under http://code.google.com/p/ccnworks/source/browse/#svn/trunk/AtSc450.
You need NumPy and PIL installed, and matplotlib to see the histogram. The core of the code lies in the objfind function, where the main recursive object search occurs.
One further update:
SciPy's ndimage.label() does exactly what I want, too.
Cheers to David Warde-Farley and Zachary Pincus from the NumPy and SciPy mailing lists for pointing this out to me :)
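For reference, a minimal sketch of the ndimage.label() route (the thresholding step and variable names are illustrative assumptions, not my exact code):

import numpy as np
from scipy import ndimage

binary = image_array > 0                      # nonzero wherever a particle shadow is
labels, num_objects = ndimage.label(binary)   # 4-connected by default
slices = ndimage.find_objects(labels)         # bounding box (slices) for each object
first = (labels[slices[0]] == 1).astype(int)  # 0/1 array for the first object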
=============
Hello,
I have an image that contains the shadows of ice particles measured by a particle spectrometer. I want to be able to identify each object, so that I can later classify and use them further in my calculations.
In essence, what I want to do is implement something like a fuzzy selection tool with which I can select each entity.
How could I easily solve this problem? (Preferably using Python)
Thanks.
NOTE: In my question I refer to each group of connected pixels as an object or entity. My intention is to extract them and create NumPy array representations as shown below (here using the top-left object; a 1 marks a pixel that is present and a 0 marks one that is not). This object's shape is 3 by 3, i.e. 3 pixels high by 3 pixels wide. These are projections of real ice particles onto a 2D domain, under the assumption that they are spherical with an equivalent radius of (height + width) / 2; some scaling from pixels to actual sizes, and volume calculations, will follow later.
import numpy as np
np.array([[1,1,1], [1,1,1], [0,0,1]])
array([[1, 1, 1],
       [1, 1, 1],
       [0, 0, 1]])
Here is a section from the image which I am going to use.
screenshot http://img43.imageshack.us/img43/2327/particles.png
1. Scan every square (e.g. from the top-left, left-to-right, top-to-bottom).
2. When you hit a blue square:
   a. Record this square as the location of a new object.
   b. Find all the other contiguous blue squares (e.g. by looking at the neighbours of this square, and the neighbours of those neighbours, and so on) and mark them as being part of the same object.
3. Continue to scan.
4. When you find another blue square, test whether it is already part of a known object before going back to step 2; alternatively, in step 2b, erase any square after you have associated it with an object. A rough sketch of this scan follows below.
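A hedged Python sketch of that scan, assuming grid is a 2D list of 0/1 values (purely illustrative, not tuned for speed):

def label_objects(grid):
    height, width = len(grid), len(grid[0])
    labels = [[0] * width for _ in range(height)]
    current = 0
    for y in range(height):
        for x in range(width):
            if grid[y][x] and not labels[y][x]:
                current += 1
                stack = [(y, x)]              # flood fill with an explicit stack
                while stack:
                    cy, cx = stack.pop()
                    if (0 <= cy < height and 0 <= cx < width
                            and grid[cy][cx] and not labels[cy][cx]):
                        labels[cy][cx] = current
                        stack.extend([(cy + 1, cx), (cy - 1, cx),
                                      (cy, cx + 1), (cy, cx - 1)])
    return labels, current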
I used to do this kind of analysis on micrographs and eventually put everything I needed into an image processing and analysis package written in C, driven via Tcl. (It worked with 512 x 512 images only, which explains why 512 crops up so often. There were images with pixels of various sizes allocated, but most of the work was done with 8-bit pixels, which explains why there is that business of 0xff and maximum meaningful count of 254 on an image.)
Briefly, the 'zz' at the beginning of the Tcl commands sends the remainder of the line to the package's parser, which calls the appropriate C routine with the given arguments. Right after the 'zz' is an argument that indicates the input and output of the command. (There can be multiple inputs but only a single output.) 'r' indicates a 512 x 512 x 8-bit image. The third word is the name of the command to be invoked; 'graphs' marks up an image as described in the text below. So, 'zz rr graphs' means 'Call the ZZ parser; input an r image to the graphs command and get back an r image.' The rest of the Tcl command line specifies which of the pre-allocated images to use. (The 'g' image is an ROI, i.e., region-of-interest, image; almost all ZZ ops are done under ROI control.) So, 'r1 r1 g8' means 'Use r1 as input, use r1 as output (that is, mark up the input image itself), and do the operation wherever the corresponding pixel on image g8 --- that is, r8, used as an ROI --- is >0.'
I don't think it is available online anywhere, but if you want to pick through the source code or even compile the whole shebang, I'll be happy to send it to you. Here's an excerpt from the manual (but I think I see some errors in the manual at this late date --- that's embarrassing ...):
Example 6. Counting features.
Problem
Counting is a common task. The items counted are called “features”, and it is usually necessary to prepare images carefully so that features correspond one-to-one with the real objects to be counted. Here, however, we ignore image preparation and consider, instead, the mechanics of counting. The first counting exercise is to find out how many features are on the images in the directory ./cells.
Approach
First, let us define “feature”. A feature is the largest group of “set” (non-zero) pixels all of which can be reached by travelling from one set pixel to another along north-south-east-west (up-down-right-left) routes, starting from a given set pixel. The zz command that detects and marks such features on an image is “zz rr graphs R:src R:dest G:ROI”, so called because the mathematical term for such a feature is a “graph”. If all the pixels on an image are set, then there is only a single graph on the image, but it contains 262144 pixels (512 * 512). If pixels are set and clear (equal to zero) in a checkerboard pattern, then there will be 131072 (512 * 512 / 2) graphs, but each will contain only one pixel.
Briefly explained, “zz rr graphs” starts in the upper-left corner of an image and scans each succeeding row left to right until it finds a set pixel, then finds all the set pixels attached to that one through north, south, east, or west borders (“4-connected”). It then sets all pixels in that graph to 1 (0x01). After finding and marking graph 1, it starts scanning again at the pixel after the one where it first discovered graph 1, this time ignoring any pixels that already belong to a graph. The first 254 graphs that it finds will be marked uniquely; all graphs found after that, however, will be marked with the value 255 (0xff) and so cannot be distinguished from each other. The key to being able to count any number of graphs accurately is to process each image in stages, that is, find the number of graphs on an image and, if the number is greater than 254, erase the 254 graphs just found, repeating the process until 254 or fewer graphs are found. The Tcl language provides the means to set up control of this operation.
Let us begin to build the commands needed by reading a ZZ image file into an R image and detecting and marking the graphs. Before the processing loop, we declare and zero a variable to hold the total number of features in the image series. Within the processing loop, we begin by reading the image file into an R image and detecting and marking the graphs.
zz ur to $inDir/$img r1
zz rr graphs r1 r1 g8
Next, we zero some variables to keep track of the counts, then use the “ra max” command to find out whether more than 254 graphs were detected.
set nGraphs [ zz ra max r1 a1 g1 ]
If nGraphs does equal 255, then the 254 accurately counted graphs should be added to the total, the graphs from 1 through 254 should be erased, and the count repeated for as many times as it takes to reduce the number of graphs below 255.
while {$nGraphs == 255} {
    incr sumGraphs 254
    zz rbr lt r1 155 r1 g1 0 255
    set sumGraphs 0
    zz rr graphs r1 r1 g8
    set nGraphs [ zz ra max r1 a1 g8 ]
}
When the “while” loop exits, the variable nGraphs must hold a number less than 255, that is, a number of accurately counted graphs; this is added to the rising total of the number of features in the image series.
incr sumGraphs $nGraphs
After the processing loop, print out the total number of features found in the series.
puts "Total number of features in $inDir \
    images $beginImg through $endImg is $sumGraphs."
Looking at the image you provided, all you need to do next is to apply a simple region growing algorithm.
If I were using MATLAB, I would use the bwlabel/bwboundaries functions. I believe there's an equivalent function somewhere in NumPy or SciPy, or you could use OpenCV with Python wrappers, as suggested by #kwatford.
OpenCV has a Python interface that you might find useful.
Connected component analysis may be what you are looking for.
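As a hedged illustration of that suggestion, assuming a binary uint8 image named mask (the variable name is mine, not from the question):

import cv2

# num_labels counts the background as label 0, so there are num_labels - 1 objects
num_labels, labels = cv2.connectedComponents(mask)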