Type casting in Python 2.7 - python

How do I cast a float into long in Python 2.7?
I'm doing the same in Python 2.3 like this:
from array import*
data = array('L',[12.34])
print data
which prints out:
array('L',[12L])
How do I do the same in Python 2.7?

Maybe like this?
>>> long(12.34)
12L

These days, numpy arrays are much more common than arrays from the array module. However, your array can be constructed as:
>>> from array import *
>>> array('L',map(long,[12.34]))
array('L', [12L])
With numpy, it could be done as:
>>> import numpy as np
>>> np.array([12.34],dtype=long)
However, this doesn't create Python longs; it creates an array of np.int64 integers (8-byte ints, not arbitrary precision like Python's long).

lst = [1.1,2.2]
data = map(long,lst)
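For readers on Python 3, where long was merged into int and integer typecodes in the array module do not accept floats, the same conversion might be sketched as below. The helper name floats_to_ulong_array is hypothetical; int() truncates toward zero just as long() does.

```python
from array import array

def floats_to_ulong_array(values):
    """Truncate each float toward zero and store the results as unsigned longs."""
    # int() plays the role that long() played in Python 2.
    return array('L', [int(v) for v in values])

data = floats_to_ulong_array([12.34, 7.99])
print(data.tolist())
```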

Related

How to encode numpy arrays in a string of minimal length?

I have around ten variables (variables / arrays / symmetric matrices) which I want to pass through a URL. Because I will use a REST API, there is a limit on the size of the URL, so I need to encode them in a string of minimal length and encrypt it. Any ideas? I've always supposed that's how Google and other websites transmit information when the address is downright unintelligible.
My original idea was to encode all numbers in scientific notation and use separators (2.4e14__3.1e12_2.5e10_ for example to pass a number 2.4e14 and an array [3.1e12_2.5e10]) and encode this string. Possibly use another base (digits plus letters) for further concatenation, but I'm not sure I can save much string space that way.
Maybe there's an existing library or technique? I didn't find one.
Pickle and base64 will do the job nicely. Your floating point numbers remain as binary, not converted through ascii.
>>> import numpy as np
>>> a = np.array([0,1,2])
>>> import pickle
>>> import base64
>>> b64 = base64.b64encode(pickle.dumps(a))
At the other end
>>> n = pickle.loads(base64.b64decode(b64))
>>> print(n)
[0 1 2]
However, this won't be the shortest possible representation, because enough information to fully reconstruct the object is transmitted. If it is short enough, it is the most easily extended and modified option.
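If the receiving end already knows the type and layout of the values, a shorter string can be had by sending only the raw 8-byte doubles in the URL-safe base64 alphabet. The sketch below uses the standard struct module rather than numpy, and the function names are hypothetical; it is one possible approach, not necessarily the minimum length.

```python
import base64
import struct

def encode_doubles(values):
    """Pack floats as little-endian doubles and return URL-safe base64 text."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_doubles(text):
    """Inverse of encode_doubles: recover the list of floats."""
    raw = base64.urlsafe_b64decode(text.encode("ascii"))
    # Each double occupies 8 bytes, so the count follows from the length.
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))

token = encode_doubles([2.4e14, 3.1e12, 2.5e10])
print(len(token))
print(decode_doubles(token))
```

Three doubles occupy 24 bytes, which base64 turns into 32 characters, and the values come back bit for bit.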
You can convert a numpy array to a Python list, then convert the list to a JSON string.
>>> import numpy as np
>>> import json
>>> a = np.array([0,1,2])
>>> b = a.tolist()
>>> c = json.dumps(b)
Similarly, you can convert the JSON string back to a numpy array: JSON string -> list -> numpy
>>> d = np.array(json.loads(c))
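When length matters, note that json.dumps puts a space after every comma by default. A small sketch using its separators argument removes them; the plain nested list here is only a stand-in for the result of a.tolist().

```python
import json

values = [[0, 1, 2], [3, 4, 5]]  # stand-in for a.tolist()

default_text = json.dumps(values)
compact_text = json.dumps(values, separators=(",", ":"))

print(default_text)
print(compact_text)
```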

Copying internal formats float64 uint64

I'm using Numpy and Python. I need to copy data, WITHOUT numeric conversion between np.uint64 and np.float64, e.g. 1.5 <-> 0x3ff8000000000000.
I'm aware of float.hex, but the output format is a long way from uint64:
In [30]: a=1.5
In [31]: float.hex(a)
Out[31]: '0x1.8000000000000p+0'
I'm also aware of various string input routines for the other way.
Can anybody suggest more direct methods? After all, it's just a simple copy and type change, but Python/numpy seem really rigid about converting the data on the way.
Use an intermediate array and the frombuffer method to "cast" one array type into the other:
>>> v = 1.5
>>> fa = np.array([v], dtype='float64')
>>> ua = np.frombuffer(fa, dtype='uint64')
>>> ua[0]
4609434218613702656 # 0x3ff8000000000000
Since frombuffer creates a view into the original buffer, this is efficient even for reinterpreting data in large arrays.
So, what you need is to see the 8 bytes that represent the float64 in memory as an integer number. (Representing this integer as a hexadecimal string is another thing; it is just its representation.)
The Structure and Union functionality that comes bundled with the stdlib's ctypes may be nice for you, with no need for numpy. It has a Union type that works quite like C language unions and allows you to do this:
>>> import ctypes
>>> class Conv(ctypes.Union):
... _fields_ = [ ("float", ctypes.c_double), ("int", ctypes.c_uint64)]
...
>>> c = Conv()
>>> c.float = 1.5
>>> print hex(c.int)
0x3ff8000000000000L
The built-in "hex" function is a way to get the hexadecimal representation of the number.
You can use the struct module as well: pack the number to a string as a double, and unpack it as int. I think it is both less readable and less efficient than using ctypes Union:
>>> import struct
>>> hex(struct.unpack("<Q", struct.pack("<d", 1.5))[0])
'0x3ff8000000000000'
Since you are using numpy, however, you can simply change the array type "on the fly" and manipulate the whole array as integers with zero copying:
>>> import numpy
>>> x = numpy.array((1.5,), dtype=numpy.double)
>>> x[0]
1.5
>>> x.dtype=numpy.dtype("uint64")
>>> x[0]
4609434218613702656
>>> hex(x[0])
'0x3ff8000000000000L'
This is by far the most efficient way of doing it, whatever your purpose in getting the raw bytes of the float64 numbers.
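The question also asks for the opposite direction. A minimal sketch with the struct module, reinterpreting a float64 as a uint64 bit pattern and back again (the function names are hypothetical):

```python
import struct

def float_to_bits(value):
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

def bits_to_float(bits):
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

print(hex(float_to_bits(1.5)))
print(bits_to_float(0x3ff8000000000000))
```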

summing over a list of int overflow(?) python

Let's consider a list of large integers, for example one given by:
import numpy as np

def primesfrom2to(n):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    """ Input n>=6, Returns a array of primes, 2 <= p < n """
    sieve = np.ones(n/3 + (n%6==2), dtype=np.bool)
    sieve[0] = False
    for i in xrange(int(n**0.5)/3+1):
        if sieve[i]:
            k=3*i+1|1
            sieve[ ((k*k)/3) ::2*k] = False
            sieve[(k*k+4*k-2*k*(i&1))/3::2*k] = False
    return np.r_[2,3,((3*np.nonzero(sieve)[0]+1)|1)]

primesfrom2to(2000000)
I want to calculate the sum of that, and the expected result is 142913828922.
But if I do:
sum(primesfrom2to(2000000))
I get 1179908154, which is clearly wrong. The problem is that I have an int overflow, but I don't understand why. Let me explain. Consider this testing code:
a=primesfrom2to(2000000)
b=[float(i) for i in a]
c=[long(i) for i in a]
sumI=0
sumF=0
sumL=0
m=0
for i,j,k in zip(a,b,c):
    m=m+1
    sumI=sumI+i
    sumF=sumF+j
    sumL=sumL+k
    print sumI,sumF,sumL
    if sumI<0:
        print i,m
        break
I found out that the first integer overflow is happening at a[i=20444]=225289
If I do:
>>> sum(a[:20043])+225289
-2147310677
But if I do:
>>> sum(a[:20043])
2147431330
>>> 2147431330+225289
2147656619L
What's happening? Why such a different behaviour? Why can't sum switch automatically to long type and give the correct result?
Look at the types of your results. You are summing a numpy array, which is using numpy datatypes, which can overflow. When you do sum(a[:20043]), you get a numpy object back (some sort of int32 or the like), which overflows when added to another number. When you manually type in the same number, you're creating a Python builtin int, which can auto-promote to long. Numpy arrays cannot autopromote like Python builtin types, because the array type (and its memory layout) have to be fixed when the array is created. This makes operations much faster at the expense of type flexibility.
You may be able to get around the problem by using a different datatype (like np.int64) instead of np.bool. However, it depends how big your numbers are. A simple example:
# Python types ok
>>> 2**62
4611686018427387904L
>>> 2**63
9223372036854775808L
# numpy types overflow
>>> np.int64(2)**62
4611686018427387904
>>> np.int64(2)**63
-9223372036854775808
Your example works correctly for me on 64-bit Python, so I guess you're using 32-bit Python. If you can use 64-bit types you will be able to get past the limit you found, but as my example shows you will eventually overflow 64-bit ints too if your numbers get super huge.
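The wraparound can be reproduced without numpy. The sketch below uses ctypes fixed-width integers as a stand-in for numpy's int32 and int64 scalars (an assumption made purely for illustration) and feeds in the numbers from the question.

```python
import ctypes

partial_sum = 2147431330   # sum(a[:20043]) reported in the question
next_prime = 225289

# Python integers grow as needed, so this sum is exact.
python_total = partial_sum + next_prime

# Forcing the same value into 32 bits wraps it around to a negative number.
wrapped_total = ctypes.c_int32(python_total).value

print(python_total)
print(wrapped_total)

# A 64-bit integer only moves the limit further out.
print(ctypes.c_int64(2 ** 63).value)
```

The wrapped value equals the exact total minus 2**32, which is the negative number the question reports.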

Why is numpy array's .tolist() creating long doubles?

I have some math operations that produce a numpy array of results with about 8 significant figures. When I use tolist() on my array y_axis, it creates what I assume are 32-bit numbers.
However, I wonder if this is just garbage. I assume it is garbage, but it seems intelligent enough to change the last number so that rounding makes sense.
print "y_axis:",y_axis
y_axis = y_axis.tolist()
print "y_axis:",y_axis
y_axis: [-0.99636686 0.08357361 -0.01638707]
y_axis: [-0.9963668578012771, 0.08357361233570479, -0.01638706796138937]
So my question is: if this is not garbage, does using tolist actually help in accuracy for my calculations, or is Python always using the entire number, but just not displaying it?
When you call print y_axis on a numpy array, you are getting a truncated version of the numbers that numpy is actually storing internally. The way in which it is truncated depends on how numpy's printing options are set.
>>> arr = np.array([22/7., 1/13.]) # init array (float division)
>>> arr # np.array default printing
array([ 3.14285714, 0.07692308])
>>> arr[0] # float default printing
3.1428571428571428
>>> np.set_printoptions(precision=24) # increase np.array print "precision"
>>> arr # np.array high-"precision" print
array([ 3.142857142857142793701541, 0.076923076923076927347012])
>>> float.hex(arr[0]) # actual underlying representation
'0x1.9249249249249p+1'
The reason it looks like you're "gaining accuracy" when you print out the .tolist()ed form of y_axis is that by default, more digits are printed when you call print on a list than when you call print on a numpy array.
In actuality, the numbers stored internally by either a list or a numpy array should be identical (and should correspond to the last line above, generated with float.hex(arr[0])), since numpy uses numpy.float64 by default, and python float objects are also 64 bits by default.
My understanding is that numpy is not showing you the full precision to make the matrices lay out consistently. The list shouldn't have any more precision than its numpy.array counterpart:
>>> v = -0.9963668578012771
>>> a = numpy.array([v])
>>> a
array([-0.99636686])
>>> a.tolist()
[-0.9963668578012771]
>>> a[0] == v
True
>>> a.tolist()[0] == v
True
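The same point can be checked without numpy. In this sketch the standard array module's 'd' typecode (a C double) stands in for a float64 array, which is an assumption made for illustration; formatting to eight decimals changes only the text that is shown, not the stored value.

```python
from array import array

v = -0.9963668578012771
stored = array('d', [v])      # 'd' holds C doubles, like float64

shown = "%.8f" % stored[0]    # shortened text, similar to numpy's default display
print(shown)
print(repr(stored[0]))        # the full stored value
```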

Show an array in format of scientific notation

I would like to show my results in scientific notation (e.g., 1.2e3). My data is in array format. Is there a function like tolist() that can convert the array to float so I can use %E to format the output?
Here is my code:
import numpy as np
a=np.zeros(shape=(5,5), dtype=float)
b=a.tolist()
print a, type(a), b, type(b)
print '''%s''' % b
# what I want is
print '''%E''' % function_to_float(a or b)
If your version of Numpy is 1.7 or greater, you should be able to use the formatter option to numpy.set_printoptions. 1.6 should definitely work -- 1.5.1 may work as well.
import numpy as np
a = np.zeros(shape=(5, 5), dtype=float)
np.set_printoptions(formatter={'float': lambda x: format(x, '6.3E')})
print a
Alternatively, if you don't have formatter, you can create a new array whose values are formatted strings in the format you want. This will create an entirely new array as big as your original array, so it's not the most memory-efficient way of doing this, but it may work if you can't upgrade numpy. (I tested this and it works on numpy 1.3.0.)
To use this strategy to get something similar to above:
import numpy as np
a = np.zeros(shape=(5, 5), dtype=float)
formatting_function = np.vectorize(lambda f: format(f, '6.3E'))
print formatting_function(a)
'6.3E' is the format you want each value printed as. You can consult Python's format specification documentation for more options.
In this case, 6 is the minimum width of the printed number and 3 is the number of digits displayed after the decimal point.
You can format each element of an array in scientific notation and then display them as you'd like. A list cannot be converted to a float; it can only contain floats.
import numpy as np
a = np.zeros(shape=(5, 5), dtype=float)
for e in a.flat:
    print "%E" % e
or
print ["%E" % e for e in a.flat]
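Since tolist() on a 2-D array returns a nested list, the same format specification can be applied row by row. A sketch with a small hand-written nested list standing in for a.tolist() (the values are made up for illustration):

```python
rows = [[0.0, 1200.0], [0.00025, 3.0e10]]  # stand-in for a.tolist()

# Format every element with three digits after the decimal point.
formatted = [[format(x, ".3E") for x in row] for row in rows]

for row in formatted:
    print("  ".join(row))
```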
