Just starting out with Python, so this is probably my mistake, but...
I'm trying out Python. I like to use it as a calculator, and I'm slowly working through some tutorials.
I ran into something weird today. I wanted to find out 2013*2013, but I wrote the wrong thing and wrote 2013*013, and got this:
>>> 2013*013
22143
I checked with my calculator, and 22143 is the wrong answer! 2013 * 13 is supposed to be 26169.
Why is Python giving me a wrong answer? My old Casio calculator doesn't do this...
Because of octal arithmetic, 013 is actually the integer 11.
>>> 013
11
With a leading zero, 013 is interpreted as a base-8 number, and 1*8¹ + 3*8⁰ = 11.
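The positional expansion above can be checked with a small sketch. It is written in Python 3 syntax, where the octal prefix is 0o, and the helper name octal_to_decimal is made up for illustration:

```python
def octal_to_decimal(digits):
    """Convert a string of octal digits to an int by positional expansion."""
    value = 0
    for ch in digits:
        if ch not in "01234567":
            raise ValueError("not an octal digit: %r" % ch)
        # Shift the running value one octal place left, then add the digit.
        value = value * 8 + int(ch)
    return value

print(octal_to_decimal("13"))       # 1*8**1 + 3*8**0
print(int("13", 8))                 # the built-in does the same conversion
print(2013 * 0o13 == 2013 * 11)     # the product from the question
```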
Note: this behaviour was changed in Python 3. Here is a particularly appropriate quote from PEP 3127:
The default octal representation of integers is silently confusing to
people unfamiliar with C-like languages. It is extremely easy to
inadvertently create an integer object with the wrong value, because
'013' means 'decimal 11', not 'decimal 13', to the Python language
itself, which is not the meaning that most humans would assign to this
literal.
013 is an octal integer literal (equivalent to the decimal integer literal 11), due to the leading 0.
>>> 2013*013
22143
>>> 2013*11
22143
>>> 2013*13
26169
It is very common (certainly in most of the languages I'm familiar with) to have octal integer literals start with 0 and hexadecimal integer literals start with 0x. Due to the exact confusion you experienced, Python 3 raises a SyntaxError:
>>> 2013*013
File "<stdin>", line 1
2013*013
^
SyntaxError: invalid token
and requires either 0o or 0O instead:
>>> 2013*0o13
22143
>>> 2013*0O13
22143
Python's 'leading zero' syntax for octal literals is a common gotcha:
Python 2.7.3
>>> 010
8
The syntax was changed in Python 3.x http://docs.python.org/3.0/whatsnew/3.0.html#integers
This is mostly just expanding on @Wim's answer a bit, but Python indicates the base of integer literals using certain prefixes. Without a prefix, integers are interpreted as base 10. With a "0x" prefix, the integer is interpreted as hexadecimal. The full grammar specification is here, though it's a bit tricky to understand if you're not familiar with formal grammars: http://docs.python.org/2/reference/lexical_analysis.html#integers
The grammar essentially says (for Python 2) that if you want a long value (i.e. one that exceeds the capacity of a normal int), write the number followed by the letter "L" or "l"; if you want your number to be interpreted in decimal, write the number normally (with no leading 0); if you want it interpreted in octal, prefix it with "0", "0o", or "0O"; if you want it in hex, prefix it with "0x" or "0X"; and if you want it in binary, prefix it with "0b" or "0B".
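As a rough illustration of those prefixes, here is a sketch in Python 3 syntax. It assumes Python 3, so the bare leading-zero octal form and the L suffix are left out because Python 3 no longer accepts them:

```python
# The same small numbers written with each prefix.
values = {
    "decimal 13": 13,
    "octal 0o13": 0o13,
    "hex 0x13": 0x13,
    "binary 0b1101": 0b1101,
}
for label, value in values.items():
    print(label, "->", value)

# The built-ins oct(), hex() and bin() go the other way and include the prefix.
print(oct(11), hex(19), bin(13))
```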
I am confused about the leading 0 in 0b10101000:
It does not seem to be a sign symbol.
In [1]: bin(168)
Out[1]: '0b10101000'
In [2]: int(bin(168), 2)
Out[2]: 168
I assume it should be sufficient, and it would certainly be more succinct, to say b10101000.
Why is the leading 0 needed?
It's to not confuse binary literals with variables.
You can express numbers as literals in whatever base (0b -> binary, 0x -> hexadecimal for instance):
>>> 0b100
4
>>> 0x100
256
The problem arises when there isn't a leading 0. Python's rule for variable names is that they must start with a letter or an underscore, never a digit. With the leading 0, the interpreter can tell whether it's looking at a literal or a variable.
It would be more succinct, but Python would interpret b10101000 as a variable name if you used it in code whereas it would interpret 0b10101000 as a binary number.
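A short sketch of that distinction, assuming Python 3 (str.isidentifier does not exist in Python 2). The variable names are only for illustration:

```python
name = "b10101000"
print(name.isidentifier())             # a legal variable name
print("0b10101000".isidentifier())     # not a name: it starts with a digit

b10101000 = "just a variable"          # so this assignment is allowed
print(b10101000)
print(0b10101000)                      # while this is the integer literal

# If the prefix is unwanted in output, format() can omit it.
bits = format(168, "b")
print(bits, int(bits, 2))
```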
It would be confusing (to you, the programmer) if Python presented the value to you differently from the way it would expect you to present the value to it in code that you write.
Why do you have to use def instead of df or class instead of cls? That is what the language grammar dictates. It's enforced by language design.
When I type int("1.7"), Python raises an error (specifically, ValueError). I know that I can convert it to an integer with int(float("1.7")). I would like to know why the first method raises an error.
From the documentation:
If x is not a number or if base is given, then x must be a string or Unicode object representing an integer literal in radix base ...
Obviously, "1.7" does not represent an integer literal in radix base.
If you want to know why the Python devs decided to limit themselves to integer literals in radix base, there are possibly infinitely many reasons, and you'd have to ask Guido et al. to know for sure. One guess would be ease of implementation plus efficiency. You might think it would be easy for them to implement it as:
Interpret number as a float
truncate to an integer
Unfortunately, that doesn't work in Python, as integers can have arbitrary precision and floats cannot. Special-casing big numbers could lead to inefficiency for the common case¹.
Additionally, forcing you to do int(float(...)) has the additional benefit of clarity: it makes it more obvious what the input string probably looks like, which can help in debugging elsewhere. In fact, I might argue that even if int would accept strings like "1.7", it'd be better to write int(float("1.7")) anyway for the increased code clarity.
¹ Assuming some validation. Other languages skip this, e.g. Ruby will evaluate '1e6'.to_i and give you 1, since it stops parsing at the first non-integral character. Seems like that could lead to fun bugs to track down ...
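The precision point can be seen with a small sketch. The long digit string is an arbitrary example, and the decimal.Decimal line is only one possible alternative, not something the answer proposes:

```python
from decimal import Decimal

big = "12345678901234567890123"

exact = int(big)               # Python ints have arbitrary precision
via_float = int(float(big))    # a float keeps only about 53 bits of the value
print(exact == via_float)      # the round trip through float loses digits

# For ordinary inputs the explicit two-step conversion truncates as expected.
truncated = int(float("1.7"))
print(truncated)

# Decimal parses the string exactly, so truncation keeps every integer digit.
via_decimal = int(Decimal(big + ".7"))
print(via_decimal == exact)
```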
We have a good, obvious idea of what "make an int out of this float" means because we think of a float as two parts and we can throw one of them away.
It's not so obvious when we have a string. "Make this string into a float" implies all kinds of subtle things about the contents of the string, and that is not the kind of thing a sane person wants to see in code where the value is not obvious.
So the short answer is: Python likes obvious things and discourages magic.
Here is a good description of why you cannot do this found in the python documentation.
https://docs.python.org/2/library/functions.html#int
If x is not a number or if base is given, then x must be a string or Unicode object representing an integer literal in radix base. Optionally, the literal can be preceded by + or - (with no space in between) and surrounded by whitespace. A base-n literal consists of the digits 0 to n-1, with a to z (or A to Z) having values 10 to 35. The default base is 10. The allowed values are 0 and 2-36. Base-2, -8, and -16 literals can be optionally prefixed with 0b/0B, 0o/0O/0, or 0x/0X, as with integer literals in code. Base 0 means to interpret the string exactly as an integer literal, so that the actual base is 2, 8, 10, or 16.
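The quoted rules for base 0 can be tried out as follows. This is a sketch assuming Python 3, which differs from the Python 2 text quoted above in one respect: with base 0 it rejects a bare leading zero such as "013" instead of reading it as octal. The sample strings are arbitrary:

```python
samples = ["0b101", "0o17", "0x1f", "42", "  -0x10  "]

# With base 0 the prefix inside the string decides the base.
parsed = {text: int(text, 0) for text in samples}
for text, value in parsed.items():
    print(repr(text), "->", value)

# Python 3 refuses the ambiguous old-style octal form when base is 0 ...
try:
    int("013", 0)
    rejected = False
except ValueError:
    rejected = True
print("013 rejected with base 0:", rejected)

# ... but with the default base 10, leading zeros are simply ignored.
print(int("013"))
```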
Basically to typecast to an integer from a string, the string must not contain a "."
Breaks backwards-compatibility. It is certainly possible, however this would be a terrible idea since it would break backwards-compatibility with the very old and well-established Python idiom of relying on a try...except ladder ("Easier to ask forgiveness than permission") to determine the type of the string's contents. This idiom has been around and used since at least Python 1.5, AFAIK; here are two citations: [1] [2]
s = "foo12.7"
#s = "-12.7"
#s = -12
try:
    n = int(s) # or else throw an exception if non-integer...
    print "Do integer stuff with", n
except ValueError:
    try:
        f = float(s) # or else throw an exception if non-float...
        print "Do float stuff with", f
    except ValueError:
        print "Handle case for when s is neither float nor integer"
        raise # if you want to reraise the exception
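For readers on Python 3, the same try...except ladder might look like the sketch below. The function name classify and its tuple return values are invented for this example, and it returns a marker instead of re-raising:

```python
def classify(s):
    """Return ('int', n), ('float', f) or ('neither', s) for a string s."""
    try:
        return ("int", int(s))       # raises ValueError if s is not an integer
    except ValueError:
        pass
    try:
        return ("float", float(s))   # raises ValueError if s is not a float
    except ValueError:
        return ("neither", s)

for sample in ["12", "-12.7", "foo12.7"]:
    print(sample, "->", classify(sample))
```

The ladder only works because int("12.7") raises ValueError, which is the backwards-compatibility point being made.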
And another minor thing: it's not just about whether the number contains '.' Scientific notation, or arbitrary letters, could also break the int-ness of the string.
Examples: int("6e7") is not an integer (base-10). However, int("6e7",16) = 1767 is an integer in base-16 (or any base>=15). But int("6e-7") is never an int.
(And if you expand the base to base-36, any legal alphanumeric string (or Unicode) can be interpreted as representing an integer, but doing that by default would generally be a terrible behavior, since "dog" or "cat" are unlikely to be references to integers).
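These cases are easy to verify. The following is a minimal sketch, and the helper is_base10_int is a made-up name:

```python
def is_base10_int(text):
    """True if int() accepts text as a base-10 integer string."""
    try:
        int(text)
    except ValueError:
        return False
    return True

print(int("6e7", 16))            # 6*16**2 + 14*16 + 7
print(int("6e7", 15))            # 'e' is digit 14, so base 15 is the smallest that works
print(int("dog", 36))            # any alphanumeric string is a number in base 36
print(is_base10_int("6e7"))      # scientific notation is not an int
print(is_base10_int("6e-7"))     # and neither is this, in any base
```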
So I am pretty sure this is a dumb question, but I am trying to get a deeper understanding of the python chr() function.
Also, I am wondering if it is possible to always have the integer argument three digits long, or just a fixed length for all ascii values?
chr(20) ## '\x14'
chr(020) ## '\x10'
Why is it giving me different answers? Does it think '020' is hex or something?
Also, I am running Python 2.7 on Windows!
-Thanks!
This has nothing to do with chr. It is all about numeric literals, and the convention is shared by many languages: a leading 0 indicates octal and a leading 0x indicates hex.
print 010 # 8
print 0x10 # 16
It makes sense to explain chr and ord together.
You are obviously using Python2 (because of the octal problem, Python3 requires 0o as the prefix), but I'll explain both.
In Python 2, chr is a function that takes any integer from 0 to 255 and returns a string containing just that extended-ASCII character. unichr is the same but returns a unicode character, for code points up to 0x10FFFF. ord is the inverse function, which takes a single-character string (of either type) and returns an integer.
In Python3, chr returns a single-character unicode string. The equivalent for byte strings is bytes([v]). ord still does both.
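A sketch of the Python 3 behaviour, using the values from the question. The last part is one possible reading of the "three digits long" request: pad the number as a string for display and convert it back with int(), which ignores leading zeros in base 10:

```python
print(repr(chr(20)))        # decimal 20
print(repr(chr(0o20)))      # octal 20, i.e. decimal 16
print(ord("\x10"))          # ord reverses chr
print(bytes([0o20]))        # the byte-string equivalent

padded = format(20, "03d")  # fixed-width text, not an integer literal
print(padded, int(padded))
```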
I'm relatively new to Python but typically find it fairly easy to work out. I've just encountered something, though, which has thrown me a little.
I know that type-checking is not very Pythonic but I'm dealing with user-input and it seems useful here. I expected the following code (in Python 2.7.6) to change a non-relevant input to an empty string, but while trying it out in an interactive interpreter, it returned an unexpected int. Could anybody tell me if this is a special value in Python, or explain why this happens.
I thought that perhaps "code" may be the name of a reserved variable ie. one used internally, but changing the name seemed to have no result.
>>> code = 0134
>>> if type(code) is not int: code =""
...
>>> code
92
I'm sure I can find an alternative way to do what I'm trying to do here, so that's not so much the focus. I'd simply like to work out what's happening with the unexpected int.
Thanks,
Rob
>>> code = 0134
In Python 2.7.6 this defines the octal number 134 because of the leading 0. This is equal to the decimal 92.
>>> if type(code) is not int: code =""
If it's not an int then you clear it.
>>> code = 0134
>>> type(code)
<type 'int'>
As you can see you do have an int. When you print it out you get the base-10 representation which is 92.
This particular cause of confusion led to the following PEP http://legacy.python.org/dev/peps/pep-3127/
When a number begins with a 0 in Python, it is interpreted as an octal number. 0134 in octal is 92 in decimal.
I'm not sure why you think the type of that value will not be int. It is an integer.
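Since the question mentions user input, here is a hedged sketch of how that might be validated instead. It assumes the input arrives as a string (as it would from a prompt or a form) and is written for Python 3. The function name parse_code is invented for this example:

```python
def parse_code(raw):
    """Return raw as an int if it is a plain decimal string, else ''."""
    try:
        # An explicit base of 10 means leading zeros are ignored, not octal.
        return int(raw, 10)
    except (TypeError, ValueError):
        # TypeError covers non-string input, ValueError covers bad strings.
        return ""

print(parse_code("0134"))          # parsed as decimal
print(repr(parse_code("abc")))     # not a number, so it becomes ''
print(0o134)                       # the literal from the question, in Python 3 spelling
```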
I need to unpack information in python from a C Structure,
doing it by the following code:
struct.unpack_from('>I', file.read(4))[0]
and afterwards, writing changed values back:
new_value = struct.pack('>I', 008200)
file.write(new_value)
a few examples:
008200 raises a SyntaxError: invalid token.
000010 is written into: 8
000017 is written into: 15
000017 returns a syntaxerror.
I have no idea what kind of conversion that is.
Any kind of help would be great.
This is invalid Python code and is not related to the struct module. In Python 2, numbers starting with a zero are octal (base 8). So Python tries to decode 008200 as octal, but '8' isn't a valid octal digit (only 0 to 7 are). Assuming you wanted decimal, use 8200. If you wanted hex, use 0x8200.
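A minimal round trip with the decimal value, assuming the leading zeros in the question were only meant as padding. If a zero-padded form is needed for display, it can be produced as a string afterwards:

```python
import struct

packed = struct.pack(">I", 8200)             # big-endian unsigned 32-bit int
print(packed.hex())

value = struct.unpack_from(">I", packed)[0]  # read it back
print(value)

display = format(value, "06d")               # zero padding belongs in the text form
print(display)
```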