I would like to create numpy.ndarray objects that hold complex integer values in them. NumPy does have complex support built-in, but for floating-point formats (float and double) only; I can create an ndarray with dtype='cfloat', for example, but there is no analogous dtype='cint16'. I would like to be able to create arrays that hold complex values represented using either 8- or 16-bit integers.
I found this mailing list post from 2007 where someone inquired about such support. The only workaround they recommended involved defining a new dtype that holds pairs of integers. This seems to represent each array element as a tuple of two values, but it's not clear what other work would need to be done in order to make the resulting data type work seamlessly with arithmetic functions.
I also considered another approach based on registration of user-defined types with NumPy. I don't have a problem with going to the C API to set this up if it will work well. However, the documentation for the type descriptor structure seems to suggest that the type's kind field only supports signed/unsigned integer, floating-point, and complex floating-point numeric types. It's not clear that I would be able to get anywhere trying to define a complex integer type.
What are some recommendations for an approach that may work?
Whatever scheme I select, it must be amenable to wrapping of existing complex integer buffers without performing a copy. That is, I would like to be able to use PyArray_SimpleNewFromData() to expose the buffer to Python without having to make a copy of the buffer first. The buffer would be in interleaved real/imaginary format already, and would either be an array of int8_t or int16_t.
I also deal with lots of complex integer data, generally basebanded data.
I use
dtype = np.dtype([('re', np.int16), ('im', np.int16)])
It's not perfect, but it adequately describes the data. I use it for loading into memory without doubling the size of the data. It also has the advantage of being able to load and store transparently with HDF5.
DATATYPE H5T_COMPOUND {
H5T_STD_I16LE "re";
H5T_STD_I16LE "im";
}
Using it is straightforward, just a little different from working with the built-in complex types.
x = np.zeros((3,3),dtype)
x[0,0]['re'] = 1
x[0,0]['im'] = 2
x
>> array([[(1, 2), (0, 0), (0, 0)],
>> [(0, 0), (0, 0), (0, 0)],
>> [(0, 0), (0, 0), (0, 0)]],
>> dtype=[('re', '<i2'), ('im', '<i2')])
To do math with it, I convert to a native complex float type. The obvious approach doesn't work, but it's also not that hard.
y = x.astype(np.complex64) # doesn't work, only gets the real part
y = x['re'] + 1.j*x['im'] # works, but slow and big
y = x.view(np.int16).astype(np.float32).view(np.complex64)
y
>> array([[ 1.+2.j, 0.+0.j, 0.+0.j],
>> [ 0.+0.j, 0.+0.j, 0.+0.j],
>> [ 0.+0.j, 0.+0.j, 0.+0.j]], dtype=complex64)
This last conversion approach was inspired by an answer to "What's the fastest way to convert an interleaved NumPy integer array to complex64?"
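Since the question also asks about wrapping an existing interleaved buffer without a copy, here is a minimal Python-side sketch using the same dtype; np.frombuffer plays the role that PyArray_SimpleNewFromData would play in C, and the sample buffer is made up:
import numpy as np

dtype = np.dtype([('re', np.int16), ('im', np.int16)])

# Pretend this interleaved int16 buffer came from existing C code
buf = np.array([1, 2, 3, 4, 5, 6], dtype=np.int16).tobytes()

x = np.frombuffer(buf, dtype=dtype)   # no copy; shares the underlying memory
print(x['re'])                        # [1 3 5]
print(x['im'])                        # [2 4 6]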
Consider using matrices of the form [[a,-b],[b,a]] as a stand-in for the complex numbers.
Ordinary addition and multiplication of these matrices corresponds to addition and multiplication of complex numbers (this subring of the collection of 2x2 matrices is isomorphic to the complex numbers).
I think Python can handle integer matrix algebra.
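A minimal sketch of this representation (the helper name is just for illustration):
import numpy as np

# The complex integer a + bi is stored as the 2x2 integer matrix [[a, -b], [b, a]].
# Matrix addition and multiplication then mirror complex addition and multiplication.
def to_matrix(a, b):
    return np.array([[a, -b], [b, a]], dtype=np.int16)

x = to_matrix(1, 2)   # 1 + 2i
y = to_matrix(3, -1)  # 3 - 1i

prod = x @ y          # (1 + 2i)(3 - 1i) = 5 + 5i
print(prod)           # [[ 5 -5]
                      #  [ 5  5]]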
Python, and hence NumPy, does support complex numbers. If you want complex integers, just use np.round or ignore the decimal part.
For example,
import numpy as np
# Create 100 complex numbers in a 1D array
a = 100 * np.random.sample(100) + 100 * np.random.sample(100) * 1j
# Round the real and imaginary parts (np.round returns a new array)
a = np.round(a)
# Reshape to a 2D array
a = a.reshape(10, 10)
# Get the real and imaginary parts of a couple of points as integers
print(int(a[0, 1].real))
print(int(a[0, 3].imag))
I have an array of Cartesian coordinates
xy = np.array([[0,0], [2,3], [3,4], [2,5], [5,2]])
which I want to convert into an array of complex numbers representing the same:
c = np.array([0, 2+3j, 3+4j, 2+5j, 5+2j])
My current solution is this:
c = np.sum(xy * [1,1j], axis=1)
This works but seems crude to me, and probably there is a nicer version with some built-in magic using np.complex() or similar, but the only way I found to use this was
c = np.array(list(map(lambda c: np.complex(*c), xy)))
This doesn't look like an improvement.
Can anybody point me to a better solution, maybe using one of the many numpy functions I don't know by heart (is there a numpy.cartesian_to_complex() working on arrays I haven't found yet?), or maybe using some implicit conversion when applying a clever combination of operators?
Recognize that complex128 is just a pair of floats. You can then do this using a "view" which is free, after converting the dtype from int to float (which I'm guessing your real code might already do):
xy.astype(float).view(np.complex128)
The astype() converts the integers to floats, which requires construction of a new array, but once that's done the view() is "free" in terms of runtime.
The above gives you shape=(n,1); you can np.squeeze() it to remove the extra dimension. This is also just a view operation, so takes basically no time.
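A quick end-to-end sketch of that approach, using the example array from the question:
import numpy as np

xy = np.array([[0, 0], [2, 3], [3, 4], [2, 5], [5, 2]])
c = xy.astype(float).view(np.complex128).squeeze()
print(c)  # [0.+0.j 2.+3.j 3.+4.j 2.+5.j 5.+2.j]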
How about
c = xy[:, 0] + 1j * xy[:, 1]
xy[:, 0] gives an array of all elements in the 0th column of xy, and xy[:, 1] gives that of the 1st column.
Multiplying xy[:, 1] by 1j makes it imaginary, and adding it to xy[:, 0] gives the complex result.
I have a dataset on which I'm trying to apply some arithmetical method.
The thing is, it gives me relatively large numbers, and when I do it with numpy, they're stored as 0.
The weird thing is that when I compute the numbers separately, they have an int value; they only become zeros when I compute them using numpy.
x = np.array([18,30,31,31,15])
10*150**x[0]/x[0]
Out[1]: 36298069767006890
vector = 10*150**x/x
vector
Out[2]: array([0, 0, 0, 0, 0])
I have of course checked their types:
type(10*150**x[0]/x[0]) == type(vector[0])
Out[3]:True
How can I compute these large numbers using numpy without seeing them turned into zeros?
Note that if we remove the factor 10 at the beginning, the problem changes slightly (but I think it has a similar cause):
x = np.array([18,30,31,31,15])
150**x[0]/x[0]
Out[4]: 311075541538526549
vector = 150**x/x
vector
Out[5]: array([-329406144173384851, -230584300921369396, 224960293581823801,
-224960293581823801, -368934881474191033])
The negative numbers indicate that the largest value of the int64 type has been exceeded, don't they?
As Nils Werner already mentioned, numpy's native fixed-size integer types cannot hold numbers that large, but Python itself can, since its int objects use an arbitrary-precision implementation.
So what you can do is tell numpy not to convert the numbers to its native types but to use the Python objects instead. This will be slower, but it will work.
In [14]: x = np.array([18,30,31,31,15], dtype=object)
In [15]: 150**x
Out[15]:
array([1477891880035400390625000000000000000000L,
191751059232884086668491363525390625000000000000000000000000000000L,
28762658884932613000273704528808593750000000000000000000000000000000L,
28762658884932613000273704528808593750000000000000000000000000000000L,
437893890380859375000000000000000L], dtype=object)
In this case the numpy array will not store the numbers themselves but references to the corresponding int objects. When you perform arithmetic operations they won't be performed on the numpy array but on the objects behind the references.
I think you're still able to use most of the numpy functions with this workaround but they will definitely be a lot slower than usual.
But that's what you get when you're dealing with numbers that large :D
Maybe somewhere out there is a library that can deal with this issue a little better.
Just for completeness, if precision is not an issue, you can also use floats:
In [19]: x = np.array([18,30,31,31,15], dtype=np.float64)
In [20]: 150**x
Out[20]:
array([ 1.47789188e+39, 1.91751059e+65, 2.87626589e+67,
2.87626589e+67, 4.37893890e+32])
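Applied to the original expression, a quick sketch; note the floor division, which keeps the results as exact Python ints, whereas true division would fall back to floats:
import numpy as np

x = np.array([18, 30, 31, 31, 15], dtype=object)
# The elements are Python ints, so 150**x cannot overflow; floor division
# keeps everything as exact integers.
vector = 10 * 150**x // x
print(vector[0])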
150 ** 28 is way beyond what an int64 variable can represent (it's in the ballpark of 8e60 while the maximum possible value of an unsigned int64 is roughly 18e18).
Python may be using an arbitrary length integer implementation, but NumPy doesn't.
As you deduced correctly, negative numbers are a symptom of an int overflow.
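For reference, these limits can be checked directly with np.iinfo:
import numpy as np

print(np.iinfo(np.int64).max)   # 9223372036854775807  (about 9.2e18)
print(np.iinfo(np.uint64).max)  # 18446744073709551615 (about 1.8e19)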
So I have always created numpy arrays like that:
>>> u = np.zeros( 10, int )
>>> v = np.zeros( 10, float )
I had always been oblivious to maximum permitted values until now. I had been assuming that it would simply work, and that if it didn't, I would get an OverflowError and then find some workaround, like taking the logarithm.
But recently I started to use the other dtypes:
>>> v8 = np.zeros( 10, np.uint8 )
>>> v8[0] = 2 ** 8 - 1
>>> v8[1] = 2 ** 8
>>> v8
array([255, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)
Ok so I don't get any warning when I assign a value bigger than 255. That's a bit scary.
So my questions are:
when I used arrays with types int and float, is it possible that I set a value that was too big (resulting in completely wrong calculations) without knowing it?
if I want to use uint8, do I have to manually check all assigned values are in [ 0, 255 ]?
numpy works very close to the machine level. Range checks are time-consuming, so checking is left to the developer. Python is much more high-level: many checks are done automatically and, in the case of ints, values can be arbitrarily large. Everywhere you have to decide between speed and safety; numpy sits further toward the speed side.
In situations where it is necessary to test the range of values, you have to check it yourself.
The clip method may help you:
>>> u = np.array([124,-130, 213])
>>> u.astype('b')
array([124, 126, -43], dtype=int8)
>>> u.clip(-128,127).astype('b')
array([ 124, -128, 127], dtype=int8)
As explained in the other answers, too large values get 'wrapped around', so you need to clip them by hand to the minimum and maximum allowed values before converting. For integers, these limits can be obtained using np.iinfo. You could write your own utility function to do this conversion in a safe way for a given dtype:
import numpy as np

def safe_convert(x, new_dtype):
    info = np.iinfo(new_dtype)
    return x.clip(info.min, info.max).astype(new_dtype)
Quick test:
In [31]: safe_convert(np.array([-1,0,1,254,255,256]), np.uint8)
Out[31]: array([ 0, 0, 1, 254, 255, 255], dtype=uint8)
In [32]: safe_convert(np.array([-129,-128,-127,126,127,128]), np.int8)
Out[32]: array([-128, -128, -127, 126, 127, 127], dtype=int8)
Yes, uint8 will mask your values (take the 8 least significant bits), so you need to check them manually:
>>> a = numpy.uint8(256)
>>> a
0
And yes, overflow can occur without you realizing it. It's a common source of error in many programming languages. However, long integers in python behave in an uncommon way: They have no explicitly defined limit.
I've written about it in this answer.
As already explained, numpy wraps around to avoid doing checks.
If clipping is not acceptable, before you cast you can use numpy.min_scalar_type to get the minimum dtype that will hold your data without losing any of it.
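For instance, a minimal sketch (the sample values are made up):
import numpy as np

# Inspect the largest value before choosing a dtype (non-negative data assumed)
values = [0, 200, 70000]
print(np.min_scalar_type(max(values)))  # uint32 -- uint8/uint16 cannot hold 70000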
Also note that practically the only reason to use uint8 is to save memory in very big arrays, as the computation speed is usually about the same (some operations will even cast upwards internally). If your arrays are not so big that memory is a concern, you are safer using uint16 or even uint32 for intermediate computations. If memory is your problem, you should consider moving to out-of-core storage, like PyTables; and if you are already about to fill the memory, then with a bigger dataset perhaps not even uint8 will be enough.
Can anyone explain the following? I'm using Python 2.5
Consider 1*3*5*7*9*11 ... *49. If you type all that into the IPython(x,y) interactive console, you'll get 58435841445947272053455474390625L, which is correct. (Why odd numbers: just the way I did it originally.)
NumPy's multiply.reduce() or prod() should yield the same result for the equivalent range. And it does, up to a certain point. Here, it is already wrong:
: k = range(1, 50, 2)
: multiply.reduce(k)
: -108792223
Using prod(k) will also generate -108792223 as the result. Other incorrect results start to appear for equivalent ranges of length 12 (that is, k = range(1,24,2)).
I'm not sure why. Can anyone help?
This is because numpy.multiply.reduce() converts the range list to an array of type numpy.int32, and the reduce operation overflows what can be stored in 32 bits at some point:
>>> type(numpy.multiply.reduce(range(1, 50, 2)))
<type 'numpy.int32'>
As Mike Graham says, you can use the dtype parameter to use Python integers instead of the default:
>>> res = numpy.multiply.reduce(range(1, 50, 2), dtype=object)
>>> res
58435841445947272053455474390625L
>>> type(res)
<type 'long'>
But using numpy to work with Python objects is pointless in this case; the best solution is KennyTM's:
>>> import functools, operator
>>> functools.reduce(operator.mul, range(1, 50, 2))
58435841445947272053455474390625L
The CPU doesn't multiply arbitrarily large numbers; it only performs specific operations defined on fixed-width binary representations.
Python's '*' handles large integers perfectly, through a proper representation and special code that goes beyond the CPU or FPU multiply instructions.
This is actually unusual as languages go.
In most other languages, a number is represented as a fixed array of bits. For example, in C or SQL you could choose an 8-bit integer that represents 0 to 255 (or -128 to +127), or a 16-bit integer that represents values up to 2^16 - 1, which is 65535. When only a limited range of numbers can be represented, going past the limit with an operation like * or + can have an undesirable effect, such as getting a negative number. You may have encountered such a problem when using an external library that is natively C rather than Python.
I wish to implement a 2d bit map class in Python. The class would have the following requirements:
Allow the creation of arbitrarily sized 2d bitmaps, i.e. to create an 8 x 8 bitmap (8 bytes), something like:
bitmap = Bitmap(8,8)
Provide an API to access the bits in this 2d map as boolean or even integer values, i.e.:
if bitmap[1, 2] or bitmap.get(0, 1)
Be able to retrieve the data as packed binary data: essentially each row of the bit map concatenated and returned as binary data, possibly padded to the nearest byte or similar.
bitmap.data()
Be able to create new maps from the binary data retrieved:
new_bitmap = Bitmap(8, 8, bitmap.data())
I know Python is able to perform binary operations, but I'd like some suggestions as how best to use them to implement this class.
Bit-packing with numpy arrays does what you are looking for.
packbits packs a boolean (0/1) array into 8-bit bytes along a given axis, and unpackbits unpacks uint8 arrays back into 0/1 values that you can use in computations. The example below packs four 3-bit groups into four 8-bit bytes.
>>> a = np.array([[[1,0,1],
... [0,1,0]],
... [[1,1,0],
... [0,0,1]]])
>>> b = np.packbits(a,axis=-1)
>>> b
array([[[160],[64]],[[192],[32]]], dtype=uint8)
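Building on that, here is a minimal sketch of the requested Bitmap class backed by np.packbits / np.unpackbits; the class name and methods mirror the question and are otherwise hypothetical:
import numpy as np

class Bitmap:
    def __init__(self, width, height, data=None):
        self.width = width
        self.height = height
        if data is None:
            self.bits = np.zeros((height, width), dtype=np.uint8)
        else:
            # Each row is padded to a whole number of bytes, matching data()
            row_bytes = (width + 7) // 8
            raw = np.frombuffer(data, dtype=np.uint8).reshape(height, row_bytes)
            self.bits = np.unpackbits(raw, axis=1)[:, :width]

    def __getitem__(self, xy):
        x, y = xy
        return bool(self.bits[y, x])

    def __setitem__(self, xy, value):
        x, y = xy
        self.bits[y, x] = 1 if value else 0

    def get(self, x, y):
        return int(self.bits[y, x])

    def data(self):
        # Pack each row separately, padded to the nearest byte
        return np.packbits(self.bits, axis=1).tobytes()

bitmap = Bitmap(8, 8)
bitmap[1, 2] = True
print(bitmap.get(1, 2))               # 1
new_bitmap = Bitmap(8, 8, bitmap.data())
print(new_bitmap[1, 2])               # True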
If you need 1-bit pixel images, PIL is the place to look.
No need to create this yourself.
Use the very good Python Imaging Library (PIL).
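A minimal sketch with PIL's 1-bit mode (the sizes and pixel coordinates are just illustrative):
from PIL import Image

# Mode "1" stores one bit per pixel; rows are padded to whole bytes on export.
img = Image.new("1", (8, 8))          # 8 x 8, all zeros
img.putpixel((1, 2), 255)             # set a single bit

packed = img.tobytes()                # packed binary data, row by row
print(len(packed))                    # 8 bytes for an 8 x 8 bitmap

# Rebuild an image from the packed data
img2 = Image.frombytes("1", (8, 8), packed)
print(img2.getpixel((1, 2)))          # 255 (set) in mode "1"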