In the Python documentation and on mailing lists I see that values are sometimes "cast", and sometimes "coerced".
Cast is explicit. Coerce is implicit.
The examples in Python would be:
cast(2, POINTER(c_float)) #cast
1.0 + 2 #coerce
1.0 + float(2) #conversion
Cast really only comes up in the C FFI. What is typically called casting in C or Java is referred to as conversion in Python, though it often gets called casting because of its similarities to those other languages. In pretty much every language that I have experience with (including Python), coercion is implicit type changing.
I think "casting" shouldn't be used for Python; there are only type conversion, but no casts (in the C sense). A type conversion is done e.g. through int(o) where the object o is converted into an integer (actually, an integer object is constructed out of o). Coercion happens in the case of binary operations: if you do x+y, and x and y have different types, they are coerced into a single type before performing the operation. In 2.x, a special method __coerce__ allows object to control their coercion.
So in our lecture slide on assembly we had:
High-level language data types of C, A, and B determine the correct
circuit from among several choices (integer, floating point) to use to
perform “+” operation
Now in languages like Python, I do not specify the type of the variable. I was wondering how the language compiles (or rather interprets, I think that's what it does) down into assembly and chooses the right circuit?
Thank you
At the interpreter level it's fairly easy to tell the difference between an integer (34), a floating point number (34.24), and a string ("Thirty-Four"). The full list of types can be seen at https://docs.python.org/3/library/stdtypes.html .
Once the type is known, it's easy to tell what operation is needed. A separate function (__add__) is defined for each class, and the interpreter (written in C for standard Python) will do the arithmetic. C is typed, and it's (comparatively) easy for the compiler to translate it to machine code.
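For example, each numeric type carries its own __add__, and the interpreter dispatches on the operand types (falling back to the right operand's __radd__ when the left one can't handle it):
print((34).__add__(8))       # int.__add__ -> 42
print((34.24).__add__(1.0))  # float.__add__ -> 35.24
print((34).__add__(1.0))     # NotImplemented: int.__add__ only handles ints
print(34 + 1.0)              # 35.0, because float.__radd__ takes over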
Every Python variable is a reference to an object. That object includes the type information of the variable. For instance, just walk through a few of the possibilities as we repeatedly reassign the value and type of x "on the fly":
for x in [1, 1.0, "1", [1]]:
    print(x, type(x))
Output:
1 <class 'int'>
1.0 <class 'float'>
1 <class 'str'>
[1] <class 'list'>
If you're wondering how Python can tell that 1 is an int and 1.0 is a float, that's obvious from the input string. A language processor typically contains a tokenizer that can discriminate language tokens, and another module that interprets those tokens within the language syntax. int and float objects have different token formats ... as do strings, punctuation, identifiers, and any other language elements.
If you want to learn more about that level of detail, research how to parse a computer language: most of the techniques are applicable to most languages.
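The standard library exposes part of that machinery, so you can watch literal text turn into differently typed objects without ever executing the code (a small sketch using ast.literal_eval):
import ast
for text in ["1", "1.0", '"1"', "[1]"]:
    value = ast.literal_eval(text)  # parses the literal text and builds the object
    print(text, "->", type(value))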
As n.m. commented below your post, variables do not have a type in Python. Values do.
As far as how integer vs float is determined when you type the following:
x = 1.5
y = 2
This is determined during the parsing stage. Compiled and interpreted languages actually start off in the same manner.
The general flow when code is sent to an interpreter/compiler is as follows:
[source code] --> lexical analyzer --> [tokens] --> parser --> [abstract syntax tree] --> interpreter/compiler
The parser step examines tokens like 'x' '=' '1.5' and looks for patterns which indicate different types of literals like ints, floats, and strings. By the time the actual interpreter/compiler gets the abstract syntax tree (tree representation of your program), it already knows that the value stored in x (1.5) is a float.
So just to be clear, this part of the process is conceptually the same for interpreters and compilers.
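You can see the lexical-analysis step directly with the standard tokenize module (a quick sketch; the exact token stream may vary slightly between Python versions):
import io, tokenize
for tok in tokenize.generate_tokens(io.StringIO("x = 1.5").readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# NAME 'x', OP '=', NUMBER '1.5', ... -- the parser then turns NUMBER '1.5' into a float constant in the AST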
When defining a variable type that will hold a string in Cython + Python 3, I can use (at least):
cdef char* mystring = "foo"
cdef str mystring = "foo"
cdef bytes mystring = "foo"
The documentation page on strings is unclear on this -- it mostly gives examples using char* and bytes, and frankly I'm having a lot of difficulty understanding it.
In my case the strings will be coming from a Python3 program and are assumed to be unicode. They will be used as dict keys and function arguments, but I will do no further manipulation on them. Needless to say I am trying to maximize speed.
This question suggests that under Python2.7 and without Unicode, typing as str makes string manipulation code run SLOWER than with no typing at all. (But that's not necessarily relevant here since I won't be doing much string manipulation.)
What are the advantages and disadvantages of each of these options?
If there is no further processing done on the strings, it would be best and fastest to not type them at all, which means they are treated as a general purpose PyObject *.
The str type is a special case: it means bytes on Python 2 and unicode on Python 3. So code that types a string as str and handles it as unicode will break on Python 2, where str means bytes.
Strings only need to be typed if they are to be converted to C char* or C++ std::string. There, you would use str to handle py2/py3 compatibility, along with helper functions to convert to/from bytes and unicode in order to be able to convert to either char* or std::string.
Typing of strings is for interoperability with C/C++, not for speed as such. Cython will auto-convert, without copying, a bytes string to a char* for example when it sees something like cdef char* c_string = b_string[:b_len] where b_string is a bytes type.
OTOH, if strings are typed without that type being used, Cython will do a conversion from object to bytes/unicode when it does not need to, which leads to overhead.
This can be seen in the generated C code as calls to __Pyx_PyObject_AsString, __Pyx_PyUnicode_FromString et al.
This is also true in general - the rule of thumb is if a specific type is not needed for further processing/conversion, best not to type it at all. Everything in python is an object so typing will convert from the general purpose PyObject* to something more specific.
Some quick testing revealed that for this particular case, only the str declaration worked -- all other options produced errors. Since the string is generated elsewhere in Python3, evidently the str type declaration is needed.
Whether it is faster not to make any declaration at all remains an open question.
Does it always cast the result to float if there is one (or more) float in the calculation?
for example:
1*1.0 # float
from fractions import Fraction
Fraction(1) * 1.0 # float
I'd say it does always cast to float if one of the arguments is a float, because float is the most general number type here. So if you do some operation (multiplication in your case) with a float and another number, the interpreter can be sure it's possible to convert that other number to float (no matter whether it is an int, a Fraction or even a bool) without losing any information, but it can't do it the other way around.
Here is a link: https://docs.python.org/2.4/lib/typesnumeric.html
It points out an obvious thing I forgot: complex numbers are more general than floats. So if you try to multiply a float by a complex number, you will get a complex number.
"Python fully supports mixed arithmetic: when a binary arithmetic operator has operands of different numeric types, the operand with the ``narrower'' type is widened to that of the other, where plain integer is narrower than long integer is narrower than floating point is narrower than complex. Comparisons between numbers of mixed type use the same rule."
No, it works because the Fraction class implements a __mul__ (and probably __rmul__) method used to define the result of a multiplication with another object (a float in your case).
That's called operator overloading.
EDIT: I don't understand why I got downvoted. According to your example, the answer is no: you don't always get a float, it depends on the other object.
For example, you can see in the Python Fraction source code the implementation of __mul__ and __rmul__ : https://hg.python.org/cpython/file/tip/Lib/fractions.py#l421 and the function which does the computing : https://hg.python.org/cpython/file/tip/Lib/fractions.py#l377 .
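To see that the result type really is up to the other operand, here is a contrived sketch of the same overloading mechanism (the class name is made up):
class AlwaysInt:
    def __mul__(self, other):
        return int(other)   # this class, not float, decides the result type
    __rmul__ = __mul__
print(type(1.5 * AlwaysInt()))  # <class 'int'>, not float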
This question is a follow-up to this one. In Sun's math library (in C), the expression
*(1+(int*)&x)
is used to retrieve the high word of the floating point number x. Here, the OS is assumed 64-bit, with little-endian representation.
I am thinking how to translate the C expression above into Python? The difficulty here is how to translate the '&', and '*' in the expression. Btw, maybe Python has some built-in function that retrieves the high word of a floating point number?
You can do this more easily with struct; for example, as a small helper function (the name is just illustrative):
import struct
def float_high_word(x):
    high_word = struct.pack('<d', x)[4:8]
    return struct.unpack('<i', high_word)[0]
Here, high_word is a bytes object (or a str in 2.x) consisting of the four most significant bytes of x in little endian order (using IEEE 64-bit floating point format). We then unpack it back into a 32-bit integer (which is returned in a singleton tuple, hence the [0]).
This always uses little-endian for everything, regardless of your platform's underlying endianness. If you need to use native endianness, replace the < with = (and use > or ! to force big endian). It also guarantees 64-bit doubles and 32-bit ints, which C does not. You can remove that guarantee as well, but there is no good reason to do so since it makes your question nonsensical.
While this could be done with pointer arithmetic, it would involve messing around with ctypes and the conversion from Python float to C float would still be relatively expensive. The struct code is much easier to read.
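For comparison, roughly what the ctypes route would look like (a sketch, assuming a little-endian machine and 64-bit doubles, like the C code does):
import ctypes
d = ctypes.c_double(3.14159)  # conversion from Python float to C double
words = ctypes.cast(ctypes.pointer(d), ctypes.POINTER(ctypes.c_int32 * 2)).contents
high_word = words[1]          # the equivalent of *(1 + (int*)&x)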
I'm wondering whether it is possible to use Python's C-API to convert e.g. Float -> String, or Bool -> Int, or Int -> Bool etc
Not every conversion makes sense, for example Dict -> List -- it would be hard to imagine what this would look like.
However, most conversions DO make sense.
Float -> Int would just round to the nearest integer
Float -> Bool would evaluate to True if non-zero
Float -> String would give a string representation (but to what precision?)
Float -> Dict probably doesn't make sense
So my question is, is there any way to force these type conversions using the C-API?
I understand that Python's preferred way of doing things is to handle the type internally. So you do:
x = 1 + 2.5
And it is smart enough to know that Long + Float -> Float
However, I'm writing a Python/C++ bridge where I have Long, Float, Complex, Dict, etc. types, and I want to be able to do:
// initialise a Float object with an object of unknown type
Float myFloat = somePyObjectPointer;
So initialising it with an integer would work, a string "3.14" would work, "foo" wouldn't, a Dict wouldn't, etc.
I would like to use Python's machinery for "what plugs into what" rather than build a large amount of C++ machinery to do it manually. Rationale is (1) less C++, (2) if CPython changes functionality I don't want to be out of sync.
Float -> Int
PyNumber_Int()
Float -> Bool
No direct conversion, but PyObject_IsTrue() and PyBool_FromLong()
Float -> String
PyObject_Repr() or PyObject_Str()
Dict -> List
PyObject_CallMethod(..., "keys"/"values"/"items", NULL)
x = 1 + 2.5
PyNumber_Add()
Float -> Dict
Yeah... no.
Anything you could care about, it's in there... somewhere.
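For orientation, those C-API calls map roughly onto the conversions you would write at the Python level (this is just the Python-side picture; in the bridge you would call the C functions listed above):
x = 1.5
print(int(x))             # PyNumber_Int (Py2) / PyNumber_Long (Py3)
print(bool(x))            # PyObject_IsTrue + PyBool_FromLong
print(repr(x), str(x))    # PyObject_Repr / PyObject_Str
d = {"a": 1}
print(list(d.keys()), list(d.values()), list(d.items()))  # PyObject_CallMethod(d, "keys"/"values"/"items", NULL)
print(1 + 2.5)            # PyNumber_Add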