How to use hex() without 0x in Python? - python

The hex() function in python, puts the leading characters 0x in front of the number. Is there anyway to tell it NOT to put them? So 0xfa230 will be fa230.
The code is
import fileinput
f = open('hexa', 'w')
for line in fileinput.input(['pattern0.txt']):
f.write(hex(int(line)))
f.write('\n')

(Recommended)
Python 3 f-strings: Answered by #GringoSuave
>>> i = 3735928559
>>> f'{i:x}'
'deadbeef'
Alternatives:
format builtin function (good for single values only)
>>> format(3735928559, 'x')
'deadbeef'
And sometimes we still may need to use str.format formatting in certain situations #Eumiro
(Though I would still recommend f-strings in most situations)
>>> '{:x}'.format(3735928559)
'deadbeef'
(Legacy) f-strings should solve all of your needs, but printf-style formatting is what we used to do #msvalkon
>>> '%x' % 3735928559
'deadbeef'
Without string formatting #jsbueno
>>> i = 3735928559
>>> i.to_bytes(4, "big").hex()
'deadbeef'
Hacky Answers (avoid)
hex(i)[2:] #GuillaumeLemaître
>>> i = 3735928559
>>> hex(i)[2:]
'deadbeef'
This relies on string slicing instead of using a function / method made specifically for formatting as hex. This is why it may give unexpected output for negative numbers:
>>> i = -3735928559
>>> hex(i)[2:]
'xdeadbeef'
>>> f'{i:x}'
'-deadbeef'

Use this code:
'{:x}'.format(int(line))
it allows you to specify a number of digits too:
'{:06x}'.format(123)
# '00007b'
For Python 2.6 use
'{0:x}'.format(int(line))
or
'{0:06x}'.format(int(line))

You can simply write
hex(x)[2:]
to get the first two characters removed.

Python 3.6+:
>>> i = 240
>>> f'{i:x}' # 02x to pad with zeros
'f0'

Old style string formatting:
In [3]: "%x" % 127
Out[3]: '7f'
New style
In [7]: '{:x}'.format(127)
Out[7]: '7f'
Using capital letters as format characters yields uppercase hexadecimal
In [8]: '{:X}'.format(127)
Out[8]: '7F'
Docs are here.

'x' - Outputs the number in base 16, using lower-case letters for the digits above 9.
>>> format(3735928559, 'x')
'deadbeef'
'X' - Outputs the number in base 16, using upper-case letters for the digits above 9.
>>> format(3735928559, 'X')
'DEADBEEF'
You can find more information about that in Python's documentation:
Format Specification Mini-Language
format()

F-strings
Python 3's formatted literal strings (f-strings) support the Format Specification Mini-Language, which designates x for hexadecimal numbers. The output doesn't include 0x.
So you can do this:
>>> f"{3735928559:x}"
'deadbeef'
See the spec for other bases like binary, octal, etc.
Edit: str.removeprefix
Since Python 3.9, there is now a str.removeprefix method, which allows you to write the following more obvious code:
>>> hexadecimal = hex(3735928559)
>>> hexadecimal.removeprefix('0x')
'deadbeef'
Not that this does NOT work for negative numbers ❌:
>>> negadecimal = hex(-3735928559)
>>> negadecimal.removeprefix('0x')
'-0xdeadbeef'

Besides going through string formatting, it is interesting to have in mind that when working with numbers and their hexadecimal representation we usually are dealing with byte-content, and interested in how bytes relate.
The bytes class in Python 3 had been enriched with methods over the 3.x series, and int.to_bytes combined with the bytes.hex() provide full control of the hex-digits output, while preserving the semantics of the transform (not to mention, holding the intermediate "bytes" object ready to be used in any binary protocol that requires the number):
In [8]: i = 3735928559
In [9]: i.to_bytes(4, "big").hex()
Out[9]: 'deadbeef'
Besides that, bytes.hex() allow some control over the output, such as specifying a separator for the hex digits:
In [10]: bytes.hex?
Docstring:
Create a string of hexadecimal numbers from a bytes object.
sep
An optional single character or byte to separate hex bytes.
bytes_per_sep
How many bytes between separators. Positive values count from the
right, negative values count from the left.
Example:
>>> value = b'\xb9\x01\xef'
>>> value.hex()
'b901ef'
>>> value.hex(':')
'b9:01:ef'
>>> value.hex(':', 2)
'b9:01ef'
>>> value.hex(':', -2)
'b901:ef'
(That said, in most scenarios just a quick print is wanted, I'd probably just go through f-string formatting, as in the accepted answer: f"{mynumber:04x}" - for the simple reason of "less things to remember")

While all of the previous answers will work, a lot of them have caveats like not being able to handle both positive and negative numbers or only work in Python 2 or 3. The version below works in both Python 2 and 3 and for positive and negative numbers:
Since Python returns a string hexadecimal value from hex() we can use string.replace to remove the 0x characters regardless of their position in the string (which is important since this differs for positive and negative numbers).
hexValue = hexValue.replace('0x','')
EDIT: wjandrea made a good point in that the above implementation doesn't handle values that contain 0X instead of 0x, which can occur in int literals. With this use case in mind, you can use the following case-insensitive implementation for Python 2 and 3:
import re
hexValue = re.sub('0x', '', hexValue, flags=re.IGNORECASE)

Decimal to Hexadecimal,
it worked
hex(number).lstrip("0x").rstrip("L")

Related

encoding unicode using UTF-8

In Python, if I type
euro = u'\u20AC'
euroUTF8 = euro.encode('utf-8')
print(euroUTF8, type(euroUTF8), len(euroUTF8))
the output is
('\xe2\x82\xac', <type 'str'>, 3)
I have two questions:
1. it looks like euroUTF8 is encoded over 3 bytes, but how do I get its binary representation to see how many bits it contain?
2. what does 'x' in '\xe2\x82\xac' mean? I don't think 'x' is a hex number. And why there are three '\'?
In Python 2, print is a statement, not a function. You are printing a tuple here. Print the individual elements by removing the (..):
>>> euro = u'\u20AC'
>>> euroUTF8 = euro.encode('utf-8')
>>> print euroUTF8, type(euroUTF8), len(euroUTF8)
€ <type 'str'> 3
Now you get the 3 individual objects written as strings to stdout; my terminal just happens to be configured to interpret anything written to it as UTF-8, so the bytes correctly result in the € Euro symbol being displayed.
The \x<hh> sequences are Python string literal escape sequences (see the reference documentation); they are the default output for the repr() applied to a string with non-ASCII, non-printable bytes in them. You'll see the same thing when echoing the value in an interactive interpreter:
>>> euroUTF8
'\xe2\x82\xac'
>>> euroUTF8[0]
'\xe2'
>>> euroUTF8[1]
'\x82'
>>> euroUTF8[2]
'\xac'
They provide you with ASCII-safe debugging output. The contents of all Python standard library containers use this format; including lists, tuples and dictionaries.
If you want to format to see the bits that make up these values, convert each byte to an integer by using the ord() function, then format the integer as binary:
>>> ' '.join([format(ord(b), '08b') for b in euroUTF8])
'11100010 10000010 10101100'
Each letter in each encoding are represented using different number of bits. UTF-8 is a 8 bit encoding, so there is no need to get a binary representation to know each bit count of each character. (If you still want to present bits, refer to Martijn's answer.)
\x means that the following value is a byte. So x is not something like a hex number that you should convert or read. It identifies the following value, which is you are interested in. \'s are used to escape that x's because they are not a part of the value.

How can I slice a substring from a unicode string with Python?

I have a unicode string as a result : u'splunk>\xae\uf001'
How can I get the substring 'uf001'
as a simple string in python?
The characters uf001 are not actually present in the string, so you can't just slice them off. You can do
repr(s)[-6:-1]
or
'u' + hex(ord(s[-1]))[2:]
Since you want the actual string (as seen from comments) , just get the last character [-1] index , Example -
>>> a = u'splunk>\xae\uf001'
>>> print(a)
splunk>®ï€
>>> a[-1]
'\uf001'
>>> print(a[-1])
ï€
If you want the unicode representation (\uf001) , then take repr(a[-1]) , Example -
>>> repr(a[-1])
"'\\uf001'"
\uf001 is a single unicode character (not multiple strings) , so you can directly get that character as above.
You see \uf001 because you are checking the results of repr() on the string, if you print it, or use it somewhere else (like for files, etc) it will be the correct \uf001 character.
u'' it is how a Unicode string is represented in Python source code. REPL uses this representation by default to display unicode objects:
>>> u'splunk>\xae\uf001'
u'splunk>\xae\uf001'
>>> print(u'splunk>\xae\uf001')
splunk>®
>>> print(u'splunk>\xae\uf001'[-1])

If your terminal is not configured to display Unicode or if you are on a narrow build (e.g., it is likely for Python 2 on Windows) then the result may be different.
Unicode string is an immutable sequence of Unicode codepoints in Python. len(u'\uf001') == 1: it does not contain uf001 (5 characters) in it. You could write it as u'' (it is necessary to declare the character encoding of your source file on Python 2 if you use non-ascii characters):
>>> u'\uf001' == u''
True
It is just a different way to represent exactly the same Unicode character (a single codepoint in this case).
Note: some user-perceived characters may span several Unicode codepoints e.g.:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'ё')
u'\u0435\u0308'
>>> print(unicodedata.normalize('NFKD', u'ё'))
ё

Print confusion

I am new to python when i try to print "\20%" that is
>>>"\20%"
why is the shell printing '\x10%' that is, it is showing
'\x10%'
the same is happening with join also when is do
>>>l = ['test','case']
>>>"\20%".join(l)
it shows
'test\x10%case'
I am using python 2.7.3
'\20' is an octal literal, and the same as chr(2 * 8 + 0) == chr(16).
What the Python shell displays by default is not the output of print, but the representation of the given value, which is the hexadecimal '\x10'.
If you want the string \20%, you have to either escape the backaslash ('\\20%') or use a raw string literal (r'\20%'). Both will be displayed as
>>> r'\20%'
'\\20%'
\20 is an escape sequence that refers to the DLE ASCII character whose decimal value is 16 (20 in octal, 10 in hexadecimal). Such a character is printed as the \x10 hex escape by the repr function of strings.
To specify a literal \20, either double the backslash ("\\20") or use a raw string (r"\20").
Two print "\20%"
what if you print directly:
>>> print '\20%'
% # some symbol not correctly display on this page
and do using r
>>> print r'\20%'
\20%
>>> r'\20%' # what r do.
'\\20%'
>>> print '\\20%'
\20%
>>>
Some time back I had same doubt about string and I asked a question, you may find helpful

How can I format an integer to a two digit hex?

Does anyone know how to get a chr to hex conversion where the output is always two digits?
for example, if my conversion yields 0x1, I need to convert that to 0x01, since I am concatenating a long hex string.
The code that I am using is:
hexStr += hex(ord(byteStr[i]))[2:]
You can use string formatting for this purpose:
>>> "0x{:02x}".format(13)
'0x0d'
>>> "0x{:02x}".format(131)
'0x83'
Edit: Your code suggests that you are trying to convert a string to a hexstring representation. There is a much easier way to do this (Python2.x):
>>> "abcd".encode("hex")
'61626364'
An alternative (that also works in Python 3.x) is the function binascii.hexlify().
You can use the format function:
>>> format(10, '02x')
'0a'
You won't need to remove the 0x part with that (like you did with the [2:])
If you're using python 3.6 or higher you can also use fstrings:
v = 10
s = f"0x{v:02x}"
print(s)
output:
0x0a
The syntax for the braces part is identical to string.format(), except you use the variable's name. See https://www.python.org/dev/peps/pep-0498/ for more.
htmlColor = "#%02X%02X%02X" % (red, green, blue)
The standard module binascii may also be the answer, namely when you need to convert a longer string:
>>> import binascii
>>> binascii.hexlify('abc\n')
'6162630a'
Use format instead of using the hex function:
>>> mychar = ord('a')
>>> hexstring = '%.2X' % mychar
You can also change the number "2" to the number of digits you want, and the "X" to "x" to choose between upper and lowercase representation of the hex alphanumeric digits.
By many, this is considered the old %-style formatting in Python, but I like it because the format string syntax is the same used by other languages, like C and Java.
The simpliest way (I think) is:
your_str = '0x%02X' % 10
print(your_str)
will print:
0x0A
The number after the % will be converted to hex inside the string, I think it's clear this way and from people that came from a C background (like me) feels more like home

Python strip() unicode string?

How can you use string methods like strip() on a unicode string? and can't you access characters of a unicode string like with oridnary strings? (ex: mystring[0:4] )
It's working as usual, as long as they are actually unicode, not str (note: every string literal must be preceded by u, like in this example):
>>> a = u"coțofană"
>>> a
u'co\u021bofan\u0103'
>>> a[-1]
u'\u0103'
>>> a[2]
u'\u021b'
>>> a[3]
u'o'
>>> a.strip(u'ă')
u'co\u021bofan'
Maybe it's a bit late to answer to this, but if you are looking for the library function and not the instance method, you can use that as well.
Just use:
yourunicodestring = u' a unicode string with spaces all around '
unicode.strip(yourunicodestring)
In some cases it's easier to use this one, for example inside a map function like:
unicodelist=[u'a',u' a ',u' foo is just...foo ']
map (unicode.strip,unicodelist)
You can do every string operation, actually in Python 3, all str's are unicode.
>>> my_unicode_string = u"abcşiüğ"
>>> my_unicode_string[4]
u'i'
>>> my_unicode_string[3]
u'\u015f'
>>> print(my_unicode_string[3])
ş
>>> my_unicode_string[3:]
u'\u015fi\xfc\u011f'
>>> print(my_unicode_string[3:])
şiüğ
>>> print(my_unicode_string.strip(u"ğ"))
abcşiü
See the Python docs on Unicode strings and the following section on string methods. Unicode strings support all of the usual methods and operations as normal ASCII strings.

Categories