I want to implement a 1024x1024 monochromatic grid; I need to read data from any cell and insert rectangles of various dimensions. I have tried a list of lists (used like a 2D array), and found that a list of booleans is slower than a list of integers. I have tried a 1D list, and it was slower than the 2D one. Numpy is about 10 times slower than a standard Python list. The fastest way I have found is PIL with a monochromatic bitmap accessed via the "load" method, but I want it to run a lot faster, so I tried to compile it with Shed Skin; unfortunately there is no PIL support there. Do you know any way of implementing such a grid faster without rewriting it in C or C++?
Raph's suggestion of using array is good, but it won't help on CPython; in fact, I'd expect it to be 10-15% slower. However, if you use it on PyPy (http://pypy.org/), I'd expect excellent results.
One thing I might suggest is using Python's built-in array class (http://docs.python.org/library/array.html), with a type of 'B'. Coding will be simplest if you use one byte per pixel, but if you want to save memory, you can pack 8 to a byte, and access using your own bit manipulation.
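As a rough illustration of the one-byte-per-cell idea, here is a minimal sketch (the grid size and the helper names are just placeholders, not anything from your code):

from array import array

W = H = 1024
grid = array('B', [0]) * (W * H)        # flat 1024x1024 grid, one byte per cell

def get_cell(x, y):
    return grid[y * W + x]

def fill_rect(x0, y0, w, h, value=1):
    # fill one row of the rectangle at a time with a slice assignment,
    # which is much cheaper than setting each cell in a Python loop
    row = array('B', [value]) * w
    for y in range(y0, y0 + h):
        start = y * W + x0
        grid[start:start + w] = row

Packing 8 cells per byte would cut memory by a factor of eight, at the cost of doing the bit masking yourself.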
I would look into Cython, which translates Python into C that is readily compiled (or compiled for you if you use distutils). Just compiling your code with Cython will make it faster for something like this, but you can get much greater speed-ups by adding a few cdef statements. If you use it with Numpy, then you can quickly access Numpy arrays. The speed-up can be quite large by using Cython in this manner. However, it would be easier to help you if you provided some example code.
Related
I wrote a program using normal Python, and I now think it would be a lot better to use numpy instead of standard lists. The problem is there are a number of things where I'm confused how to use numpy, or whether I can use it at all.
In general how do np.arrays work? Are they dynamic in size like a C++ vector, or do I have to declare their length and type beforehand like a standard C++ array? In my program I've got a lot of cases where I create a list
ex_list = [] and then cycle through something and append to it with ex_list.append(some_lst). Can I do something like that with a numpy array? What if I knew the size of ex_list, could I declare an empty one and then add to it?
If I can't, let's say I only read from this list afterwards: would it be worth converting it to numpy, i.e. is accessing a numpy array faster?
Can I do more complicated operations on each element using a numpy array (not just adding 5 to each element, etc.)? Example below.
full_pallete = [(int(1+i*(255/127.5)),0,0) for i in range(0,128)]
full_pallete += [col for col in right_palette if col[1]!=0 or col[2]!=0 or col==(0,0,0)]
In other words, does it make sense to convert to a numpy array and then cycle through it using something other than for loop?
Numpy arrays can be appended to (see http://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html), although in general calling the append function many times in a loop has a heavy performance cost; it is generally better to pre-allocate a large array and then fill it as necessary. This is because the arrays themselves have a fixed size under the hood, but this is hidden from you in Python.
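As a small sketch of the pre-allocate-and-fill pattern (the size and the values are made up):

import numpy as np

n = 100000
result = np.empty(n)           # allocate once
for i in range(n):
    result[i] = i * 0.5        # fill in place, no per-step reallocation
# compare with result = np.append(result, value) inside the loop,
# which copies the entire array on every call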
Yes, Numpy is well designed for many operations similar to these. In general, however, you don't want to be looping through numpy arrays (or arrays in general in Python) if they are very large. By using inbuilt numpy functions, you basically make use of all sorts of compiled speed-up benefits. As an example, rather than looping through and checking each element for a condition, you would use numpy.where().
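For instance, a tiny example of replacing an explicit loop with numpy.where (the data is made up):

import numpy as np

values = np.array([3, 250, 17, 300, 42])
idx = np.where(values > 100)[0]                # indices meeting the condition: [1, 3]
clipped = np.where(values > 100, 100, values)  # new array built from the condition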
The real reason to use numpy is to benefit from pre-compiled mathematical functions and data processing utilities on large arrays - both those in the core numpy library as well as many other packages that use them.
This is a follow up to this question
What are the benefits / drawbacks of a list of lists compared to a numpy array of OBJECTS with regards to MEMORY?
I'm interested in understanding the speed implications of using a numpy array vs a list of lists when the array is of type object.
If anyone is interested in the object I'm using:
import gmpy2 as gm
gm.mpfr('0') # <-- this is the object
The biggest usual benefits of numpy, as far as speed goes, come from being able to vectorize operations, which means you replace a Python loop around a Python function call with a C loop around some inlined C (or even custom SIMD assembly) code. There are probably no built-in vectorized operations for arrays of mpfr objects, so that main benefit vanishes.
However, there are some places where you'll still benefit:
Some operations that would require a copy in pure Python are essentially free in numpy—transposing a 2D array, slicing a column or a row, even reshaping the dimensions are all done by wrapping a pointer to the same underlying data with different striding information. Since your initial question specifically asked about A.T, yes, this is essentially free (see the short sketch after this list).
Many operations can be performed in-place more easily in numpy than in Python, which can save you some more copies.
Even when a copy is needed, it's faster to bulk-copy a big array of memory and then refcount all of the objects than to iterate through nested lists deep-copying them all the way down.
It's a lot easier to write your own custom Cython code to vectorize an arbitrary operation with numpy than with Python.
You can still get some benefit from using np.vectorize around a normal Python function, pretty much on the same order as the benefit you get from a list comprehension over a for statement.
Within certain size ranges, if you're careful to use the appropriate striding, numpy can allow you to optimize cache locality (or VM swapping, at larger sizes) relatively easily, while there's really no way to do that at all with lists of lists. This is much less of a win when you're dealing with an array of pointers to objects that could be scattered all over memory than when dealing with values that can be embedded directly in the array, but it's still something.
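A short sketch of the "views are essentially free" and in-place points above, using a placeholder object array rather than mpfr values:

import numpy as np

A = np.empty((3, 4), dtype=object)
A[:] = 0                 # fill with placeholder objects
T = A.T                  # transpose: a view, no data copied
col = A[:, 1]            # slicing a column: also a view
T[0, 2] = 99             # writes through to A[2, 0], since the data is shared
assert A[2, 0] == 99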
As for disadvantages… well, one obvious one is that using numpy restricts you to CPython or sometimes PyPy (hopefully in the future that "sometimes" will become "almost always", but it's not quite there as of 2014); if your code would run faster in Jython or IronPython or non-NumPyPy PyPy, that could be a good reason to stick with lists.
I have a Cython function like cdef generate_data(size) where I'd like to:
initialise an array of bytes of length size
call external (C) function to populate the array using array ptr and size
return the array as something understandable by Python (bytearray, bytes, your suggestions)
I have seen many approaches on the internet but I'm looking for the best/recommended way of doing this in my simple case. I want to:
avoid memory reallocations
avoid using numpy
ideally use something that works in Python 3 and 2.7, although a 2.7 solution is good enough.
I'm using Cython 0.20.
For allocating memory, I have you covered.
After that, just take a pointer (possibly at the data attribute if you use cpython.array.array like I recommend) and pass that along. You can return the cpython.array.array type and it will become a Python array.
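A minimal Cython sketch of that approach; the external C function fill_buffer() and its header mylib.h are hypothetical stand-ins for whatever your library provides:

# generate_data.pyx
from cpython cimport array
import array

cdef extern from "mylib.h":                    # hypothetical header
    void fill_buffer(unsigned char *buf, size_t n)

cpdef array.array generate_data(Py_ssize_t size):
    cdef array.array template = array.array('B', [])
    # allocate `size` uninitialised bytes in one step, no reallocations
    cdef array.array buf = array.clone(template, size, zero=False)
    fill_buffer(buf.data.as_uchars, size)      # pass the raw pointer and size to C
    return buf                                 # reaches Python as a plain array.array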
Original Question:
I have a question about the Python Aggdraw module that I cannot find in the Aggdraw documentation. I'm using the ".polygon" command which renders a polygon on an image object and takes input coordinates as its argument.
My question is if anyone knows or has experience with what types of sequence containers the xy coordinates can be in (list, tuple, generator, itertools-generator, array, numpy-array, deque, etc), and most importantly which input type will help Aggdraw render the image in the fastest possible way?
The docs only mention that the polygon method takes: "A Python sequence (x, y, x, y, …)"
I'm thinking that Aggdraw is optimized for some sequence types more than others, and/or that some sequence types have to be converted first, and thus some types will be faster than others. So maybe someone knows these details about Aggdraw's inner workings, either in theory or from experience?
I have done some preliminary testing, and will do more soon, but I still want to know the theory behind why one option might be faster, because it might be that I'm not doing the tests properly or that there are some additional ways to optimize Aggdraw rendering that I didn't know about.
(Btw, this may seem like trivial optimization, but not when the goal is to be able to render tens of thousands of polygons quickly and to be able to zoom in and out of them. So for this question I don't want suggestions for other rendering modules (from my testing Aggdraw appears to be one of the fastest anyway). I also know that there are other optimization bottlenecks like coordinate-to-pixel transformations etc, but for now I'm only focusing on the final step of Aggdraw's internal rendering speed.)
Thanks a bunch, curious to see what knowledge and experience others out there have with Aggdraw.
A Winner? Some Preliminary Tests
I have now conducted some preliminary tests and reported the results in an Answer further down the page if you want the details. The main finding is that rounding float coordinates to integer pixel coordinates, and supplying them in arrays, is the fastest way to make Aggdraw render an image or map, leading to rendering speedups on the scale of 650%, at speeds comparable with well-known and commonly used GIS software. What remains is to find fast ways to optimize coordinate transformations and shapefile loading, and these are daunting tasks indeed. For all the findings check out my Answer post further down the page.
I'm still interested to hear if you have done any tests of your own, or if you have other useful answers or comments. I'm still curious about the answers to the Bonus question if anyone knows.
Bonus question:
If you don't know the specific answer to this question, it might still help if you know which programming language the actual Aggdraw rendering is done in. I've read that the Aggdraw module is just a Python binding for the original C++ Anti-Grain Geometry library, but I'm not entirely sure what that actually means. Does it mean that the Aggdraw Python commands are simply a way of accessing and activating the C++ library "behind the scenes", so that the actual rendering is done in C++ and at C++ speeds? If so, then I would guess that C++ would have to convert the Python sequence to a C++ sequence, and the optimization would be to find out which Python sequence can be converted the fastest to a C++ sequence. Or is the Aggdraw module simply the original library rewritten in pure Python (and thus much slower than the C++ version)? If so, which Python types does it support, and which is fastest for the type of rendering work it has to do?
A Winner? Some Preliminary Tests
Here are the results from my initial tests of which input types are fastest for aggdraw rendering. One clue was in the aggdraw docs, which say that aggdraw.polygon() only takes "sequences": officially defined as "str, unicode, list, tuple, bytearray, buffer, xrange" (http://docs.python.org/2/library/stdtypes.html). Luckily, however, I found that there are additional input types that aggdraw rendering accepts. After some testing I came up with the following list of input container types that aggdraw (and maybe also PIL) rendering supports:
tuples
lists
arrays
Numpy arrays
deques
Unfortunately, aggdraw does not support, and raises errors for, coordinates contained in:
generators
itertools generators
sets
dictionaries
And then for the performance testing! The test polygons were a subset of 20 000 (multi)polygons from the Global Administrative Units Database of worldwide sub-national province boundaries, loaded into memory using the PyShp shapefile reader module (http://code.google.com/p/pyshp/). To ensure that the tests only measured aggdraw's internal rendering speed I made sure to start the timer only after the polygon coordinates were already transformed to aggdraw image pixel coordinates, AND after I had created a list of input arguments with the correct input type and aggdraw.Pen and .Brush objects. I then timed and ran the rendering using itertools.starmap with the preloaded coordinates and arguments:
t = time.time()
iterat = itertools.starmap(draw.polygon, args)  # draw is the aggdraw.Draw() object
for runfunc in iterat:  # iterating through the itertools generator consumes and runs it
    pass
print time.time() - t
My findings confirm the traditional notion that tuples and arrays are the fastest Python sequences to iterate over: both ended up at the top. Lists were about 50% slower, and so were numpy arrays (this was initially surprising given the speed reputation of Numpy arrays, but then I read that Numpy arrays are only fast when one uses the internal Numpy functions on them, and that for normal Python iteration they are generally slower than other types). Deques, usually considered to be fast, turned out to be the slowest (almost 100%, i.e. 2x slower).
### Coordinates as FLOATS
### Pure rendering time (seconds) for 20 000 polygons from the GADM dataset
tuples   8.90130587328
arrays   9.03419164657
lists   13.424952522
numpy   13.1880489246
deque   16.8887938784
In other words, if you usually use lists for aggdraw coordinates, you should know that you can gain a roughly 50% performance improvement by putting them into a tuple or array instead. Not the most radical improvement, but still useful and easy to implement.
But wait! I did find another way to squeeze more performance out of the aggdraw module, and quite a lot at that. I forget why I did it, but when I tried rounding the transformed floating-point coordinates to the nearest pixel integer (i.e. int(round(eachcoordinate))) before rendering them, I got a 6.5x rendering speedup (650%) compared to the most common list container: a worthwhile and also easy optimization. Surprisingly, the array container type turns out to be about 25% faster than tuples when the renderer doesn't have to worry about rounding numbers. This pre-rounding leads to no loss of visual detail that I could see, because each floating-point coordinate can only be assigned to one pixel anyway; that is probably also why pre-converting/pre-rounding the coordinates before sending them off to the aggdraw renderer speeds up the process, since aggdraw then doesn't have to. A potential caveat is that discarding the decimal information could change how aggdraw does its anti-aliasing, but to my eye the final map still looks equally anti-aliased and smooth. Finally, this rounding optimization must be weighed against the time it takes to round the numbers in Python, but from what I can see the cost of pre-rounding does not outweigh the benefits of the rendering speedup. Fast ways to round and convert the coordinates deserve further exploration.
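For reference, the pre-rounding step boils down to something like this sketch, where transformed_coords, pen and brush stand in for the already-transformed coordinates and the aggdraw.Pen and .Brush objects:

import array
pixel_coords = array.array('i', (int(round(c)) for c in transformed_coords))
draw.polygon(pixel_coords, pen, brush)   # draw is the aggdraw.Draw() object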
### Coordinates as INTEGERS (rounded to pixels)
### Pure rendering time (seconds) for 20 000 polygons from the GADM dataset
arrays   1.40970077294
tuples   2.19892537074
lists    6.70839555276
numpy    6.47806400659
deque    7.57472232757
In conclusion then: arrays and tuples are the fastest container types to use when providing aggdraw (and possibly also PIL?) with drawing coordinates.
Given the hefty rendering speeds that can be obtained when using the correct input type with aggdraw, it becomes particularly crucial and rewarding to find even the slightest optimizations for other aspects of the map rendering process, such as coordinate transformation routines (I am already exploring and finding for instance that Numpy is particularly fast for such purposes).
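As a sketch of the kind of Numpy-based transform I mean (the scale and offset names are placeholders, not tested values):

import array
import numpy as np
coords = np.array(flat_xy_coords, dtype=float).reshape(-1, 2)       # [[x, y], [x, y], ...]
pixels = np.rint(coords * [xscale, -yscale] + [xoffset, yoffset]).astype(int)
flat_pixels = array.array('i', pixels.ravel().tolist())             # flat x,y,x,y,... ready for aggdraw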
A more general finding from all of this is that Python can potentially be used for very fast map rendering applications, which further opens up the possibilities for Python geospatial scripting; e.g. the entire GADM dataset of 200 000+ provinces could theoretically be rendered in about 1.5*10 = 15 seconds, not counting the coordinate-to-image transformation, which is way faster than QGIS and even ArcGIS, which in my experience struggles with displaying the GADM dataset.
All results were obtained on an 8-core, 2-year-old Windows 7 machine, using Python 2.6.5. Whether these container types are also the most efficient when it comes to loading and/or processing the data is a question that has to be tested and answered in another post. It would be interesting to hear whether someone else already has good insights on these aspects.
I'm using Python to set up a computationally intense simulation, then running it in a custom-built C extension, and finally processing the results in Python. During the simulation, I want to store a fixed number of floats (C doubles converted to PyFloatObjects) representing my variables at every time step, but I don't know how many time steps there will be in advance. Once the simulation is done, I need to pass the results back to Python in a form where the data logged for each individual variable is available as a list-like object (for example a (wrapper around a) contiguous array, piece-wise contiguous array, or column in a matrix with a fixed stride).
At the moment I'm creating a dictionary mapping the name of each variable to a list containing PyFloatObject objects. This format is perfect for working with in the post-processing stage but I have a feeling the creation stage could be a lot faster.
Time is quite crucial, since the simulation is a computationally heavy task already. I expect that a combination of A. buying lots of memory and B. setting up the experiment wisely will allow the entire log to fit in RAM. However, with my current dict-of-lists solution, keeping every variable's log in a contiguous section of memory would require a lot of copying and overhead.
My question is: What is a clever, low-level way of quickly logging gigabytes of doubles in memory with minimal space/time overhead, that still translates to a neat python data structure?
Clarification: when I say "logging", I mean storing until after the simulation. Once that's done a post-processing phase begins and in most cases I'll only store the resulting graphs. So I don't actually need to store the numbers on disk.
Update: In the end, I changed my approach a little and added the log (as a dict mapping variable names to sequence types) to the function parameters. This allows you to pass in objects such as lists or array.arrays or anything that has an append method. It adds a little time overhead because I'm using the PyObject_CallMethodObjArgs function to call the append method instead of PyList_Append or similar. Using arrays allows you to reduce the memory load, which appears to be the best I can do short of writing my own expanding storage type. Thanks everyone!
You might want to consider doing this in Cython, instead of as a C extension module. Cython is smart, and lets you do things in a pretty pythonic way, while at the same time letting you use both C datatypes and Python datatypes.
Have you checked out the array module? It allows you to store lots of scalar, homogeneous types in a single collection.
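For instance, a sketch of what per-variable logs could look like with it (the variable names are invented):

from array import array

log = {'velocity': array('d'), 'pressure': array('d')}
log['velocity'].append(3.14)    # stored as a raw 8-byte C double, not a PyFloatObject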
If you're truly "logging" these, and not just returning them to CPython, you might try opening a file and fprintf'ing them.
BTW, realloc might be your friend here, whether you go with a C extension module or Cython.
This is going to be more a huge dump of ideas rather than a consistent answer, because it sounds like that's what you're looking for. If not, I apologize.
The main thing you're trying to avoid here is storing billions of PyFloatObjects in memory. There are a few ways around that, but they all revolve around storing billions of plain C doubles instead, and finding some way to expose them to Python as if they were sequences of PyFloatObjects.
To make Python (or someone else's module) do the work, you can use a numpy array, a standard library array, a simple hand-made wrapper on top of the struct module, or ctypes. (It's a bit odd to use ctypes to deal with an extension module, but there's nothing stopping you from doing it.) If you're using struct or ctypes, you can even go beyond the limits of your memory by creating a huge file and mmapping windows of it as needed.
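As a small ctypes sketch of viewing a block of doubles as a Python sequence (here the buffer is allocated by ctypes itself; in your case the address and length would come from the extension module):

import ctypes

n = 1000
buf = (ctypes.c_double * n)()      # stands in for memory your C module allocated
buf[42] = 3.14
first_three = buf[:3]              # slicing yields an ordinary list of Python floats
# given only a raw address from the extension, the same view can be built with
# (ctypes.c_double * n).from_address(addr)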
To make your C module do the work, instead of actually returning a list, return a custom object that meets the sequence protocol, so that when someone calls, say, foo.__getitem__(i), you convert the underlying C array[i] to a PyFloatObject on the fly.
Another advantage of mmap is that, if you're creating the arrays iteratively, you can create them by just streaming to a file, and then use them by mmapping the resulting file back as a block of memory.
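Roughly along these lines (the file name and values are made up):

import mmap, struct
from array import array

with open('log.bin', 'wb') as f:
    array('d', [0.1, 0.2, 0.3]).tofile(f)           # streaming phase: append as you go

with open('log.bin', 'rb') as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    third = struct.unpack_from('d', m, 2 * 8)[0]    # read the third double in place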
Otherwise, you need to handle the allocations. If you're using the standard array, it takes care of auto-expanding as needed, but otherwise, you're doing it yourself. The code to do a realloc and copy if necessary isn't that difficult, and there's lots of sample code online, but you do have to write it. Or you may want to consider building a strided container that you can expose to Python as if it were contiguous even though it isn't. (You can do this directly via the complex buffer protocol, but personally I've always found that harder than writing my own sequence implementation.) If you can use C++, vector is an auto-expanding array, and deque is a strided container (and if you've got the SGI STL rope, it may be an even better strided container for the kind of thing you're doing).
As the other answer pointed out, Cython can help for some of this. Not so much for the "exposing lots of floats to Python" part; you can just move pieces of the Python part into Cython, where they'll get compiled into C. If you're lucky, all of the code that needs to deal with the lots of floats will work within the subset of Python that Cython implements, and the only things you'll need to expose to actual interpreted code are higher-level drivers (if even that).