I want to create a 5-byte string that is going to be sent through a socket. I want to send 255 packets, incrementing the third byte each time. How can I do that?
Something like this code:
i=0
while True:
    a="\x3f\x4f"+hex(i)+"\x0D\x0A"
    socket.send(a)
    i=i+1
The problem is that this code is introducing 0x0 (30 78 30) instead of 00 in the first loop for example.
Thank you
I think you're a bit confused here.
\x3f is a single character (the same character as ?).
If i is, say, 63 (hex 3F), you don't want to add the separate characters \, x, 3, and f to the string, you want to add the single character \x3f. Likewise, if it's 0 (hex 00), you don't want to add the separate characters \, x, 0, and 0 to the string, you want to add the single character \x00.
That's exactly what the chr function is for:
Return a string of one character whose ASCII code is the integer i. For example, chr(97) returns the string 'a'…
By contrast, the function hex will:
[c]onvert an integer number (of any size) to a lowercase hexadecimal string prefixed with “0x”…
So, hex(97) returns the four character string '0x61'.
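Putting this together for the packet in the question, here is a minimal sketch. It assumes Python 3, where socket data is bytes and a single byte is built from an integer with bytes([...]) rather than chr. The helper name build_packet is made up for illustration, and the actual socket.send call is left out.

```python
def build_packet(counter):
    """Return the 5-byte packet 3F 4F <counter> 0D 0A."""
    if not 0 <= counter <= 255:
        raise ValueError("counter must fit in one byte")
    return bytes([0x3F, 0x4F, counter, 0x0D, 0x0A])


# One packet per possible value of the third byte (0x00 .. 0xFF).
packets = [build_packet(i) for i in range(256)]
print(packets[0].hex())    # 3f4f000d0a
print(packets[255].hex())  # 3f4fff0d0a
```

On Python 2, the equivalent would be "\x3f\x4f" + chr(i) + "\x0D\x0A", as described above.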
I have a long string which can consist of several sub-strings (not always: sometimes it's one string, sometimes there are 4 sub-strings stuck together). Each one starts with a length byte, for example 4D or 4E. Below is an example big string which consists of 4 sub-strings:
4D44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E4E44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB54E44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B9694E44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB
After splitting by pattern, the output SHOULD BE:
4D44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E
4E44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB5
4E44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B969
4E44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB
Each long string has an ID, in this case 44B909, and each line has this ID after the length byte. My original code took the first 6 characters (4D44B9) and split the string on them. It works in 95% of cases, where EACH line has the same length, for example 4D. The problem is that the lines don't always have the same length, as in the string above. Look at my code below:
def repeat():
    string = input('Please paste string below:'+'\n')
    code = string[:6]
    print('\n')
    print('SPLITTED:')
    string = string.replace(code, '\n'+'\n'+code)
    print(string)
while True:
    repeat()
When you paste this long string, it won't be split, because the first line has 4D and the rest have 4E. I'd like it to "ignore" (for a moment) the first 2 characters (4D or 4E) and use the next six characters as the split pattern. The output should be the 4 lines above! I changed the code a bit, but I was getting some strange results, like below:
44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E
44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB54E
44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B9694E
44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB
How can I make it work??
If the first two characters encode the string's length in hex, why do you not use that to decide how much of the string to consume?
However, the length bytes in your example don't seem to match the data: 4D is decimal 77 and 4E is decimal 78, but the corresponding lines are 80 and 82 hex characters long (40 and 41 bytes).
For the question about how to split on a slightly variable pattern, a regular expression seems like a good solution.
import re
splitted = re.split(r'4[DE](?=44B909)', string)
In so many words, this says "use 4D or 4E as the delimiter to split on, but only if it's immediately followed by 44B909".
(There will be an empty string before the first value, but that's easy to shift off; or change the regex to r'(?<!^)4[DE](?=44B909)'.)
If you don't want to discard anything, include everything in the lookahead:
splitted = re.split(r'(?<!^)(?=4[DE]44B909)', string)
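As a runnable check of the lookahead-only version against the sample data from the question, here is a small sketch. The function name split_records and its record_id parameter are illustrative, not part of the original answer; it also assumes Python 3.7 or later, where re.split can split on zero-width matches.

```python
import re


def split_records(data, record_id):
    """Split before every 4D/4E length byte that is followed by record_id."""
    pattern = r"(?<!^)(?=4[DE]" + re.escape(record_id) + ")"
    return re.split(pattern, data)


# The four expected lines from the question.
expected = [
    "4D44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E",
    "4E44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB5",
    "4E44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B969",
    "4E44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB",
]

# Glue them together the way the input arrives, then split again.
parts = split_records("".join(expected), "44B909")
for part in parts:
    print(part)
```

Because nothing is consumed by the pattern, every piece keeps its 4D or 4E prefix, and the 4E that happens to end the first line is left alone since it is not followed by the ID.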
For storage in a given Oracle table (whose field lengths are defined in bytes) I need to cut strings beforehand in Python 3 to a maximum length in bytes, although the strings can contain multi-byte UTF-8 characters.
My solution is to concatenate the result string character by character from the original string and check when the result string exceeds the length limit:
def cut_str_to_bytes(s, max_bytes):
    """
    Ensure that a string has not more than max_bytes bytes
    :param s: The string (utf-8 encoded)
    :param max_bytes: Maximal number of bytes
    :return: The cut string
    """
    def len_as_bytes(s):
        return len(s.encode(errors='replace'))
    if len_as_bytes(s) <= max_bytes:
        return s
    res = ""
    for c in s:
        old = res
        res += c
        if len_as_bytes(res) > max_bytes:
            res = old
            break
    return res
This is obviously rather slow. What is an efficient way to do this?
PS: I saw "Truncate a string to a specific number of bytes in Python", but the solution there, sys.getsizeof(), does not give the number of bytes of the string's characters but rather the size of the whole string object (Python needs some bytes to manage the string object), so that does not really help.
It is valid to cut a UTF-8 string anywhere except in the middle of a multibyte character. So, if you want the longest UTF-8 string within a maximum byte length, what you need is to first take the max bytes and then reduce it as long as it has an unfinished character at the end.
Compared to your solution, which is at least O(n) because it goes character by character, this one just removes up to 3 bytes from the end (because a UTF-8 character is never longer than 4 bytes).
RFC 3629 specifies these as valid UTF-8 byte sequences:
Char. number range | UTF-8 octet sequence
(hexadecimal) | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
So, the simplest way to go with a valid UTF-8 stream:
if the last byte is 0xxxxxxx, all is fine
otherwise, find the location of the last 11xxxxxx byte within the last 4 bytes and check, based on the table above, whether the character it starts is complete
Therefore, this should work:
def cut_str_to_bytes(s, max_bytes):
    # cut it twice to avoid encoding potentially GBs of `s` just to get e.g. 10 bytes?
    b = s[:max_bytes].encode('utf-8')[:max_bytes]
    if b[-1] & 0b10000000:
        last_11xxxxxx_index = [i for i in range(-1, -5, -1)
                               if b[i] & 0b11000000 == 0b11000000][0]
        # note that last_11xxxxxx_index is negative
        last_11xxxxxx = b[last_11xxxxxx_index]
        if not last_11xxxxxx & 0b00100000:
            last_char_length = 2
        elif not last_11xxxxxx & 0b00010000:
            last_char_length = 3
        elif not last_11xxxxxx & 0b00001000:
            last_char_length = 4
        if last_char_length > -last_11xxxxxx_index:
            # remove the incomplete character
            b = b[:last_11xxxxxx_index]
    return b.decode('utf-8')
Alternatively, you may try decoding the last bytes, rather than doing the low-level stuff, but I'm not sure the code would be simpler that way...
Note: The function shown here works for strings which are longer than two characters. A version which also covers the edge cases of shorter strings can be found on GitHub.
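The decode-based alternative mentioned above can be sketched as follows. This is not the answer's own code: it swaps the bit inspection for the standard errors='ignore' handler of bytes.decode, which drops the incomplete trailing sequence left by the byte slice. The name truncate_utf8 is made up for illustration, and the sketch assumes the input string encodes cleanly to UTF-8.

```python
def truncate_utf8(s, max_bytes):
    """Return the longest prefix of s that fits in max_bytes UTF-8 bytes."""
    # Every character takes at least one byte, so slicing characters first
    # avoids encoding far more of the string than can possibly fit.
    encoded = s[:max_bytes].encode("utf-8")[:max_bytes]
    # A cut in the middle of a multibyte character leaves an invalid
    # trailing fragment; errors="ignore" silently discards it.
    return encoded.decode("utf-8", errors="ignore")


print(truncate_utf8("h\u00e9llo", 2))  # h
print(truncate_utf8("h\u00e9llo", 3))  # hé
```

Since the bytes before the cut come from a valid encoding, the only thing the 'ignore' handler can drop is the partial character at the very end. This version also copes with empty and very short strings.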
I tried to print the escape sequence characters or the ASCII representation of numbers in Python in a for loop.
Like:
for i in range(100, 150):
    b = "\%d" %i
    print(b)
I expected output like:
A
B
C
Or something.
But I got:
\100
\101
How do I print the ASCII representation of the numbers?
Python has two builtin functions for this, ord and chr.
ord returns the ASCII value of a character, for example:
print(ord('h'))
The output of the above is 104
ord only accepts a string of length one.
chr is the inverse of ord:
print(chr(104))
The output of the above is h
chr only accepts an integer; floats, strings, and bytes are not supported.
chr and ord are really useful if you want to translate (encode or decode) a text file.
You can use the ord() function to print the ASCII value of a character.
print(ord('b'))
> 98
Likewise, you can use the chr() function to print the ASCII character represented by a number.
print(chr(98))
> b
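Applied to the loop from the question, a short sketch could look like this. The helper name ascii_rows is made up for illustration. Note that the codes 100 to 149 from the question start at 'd', not 'A'; the uppercase letters begin at 65, so that range is used here.

```python
def ascii_rows(start, stop):
    """Return (code, character) pairs for the codes in range(start, stop)."""
    return [(code, chr(code)) for code in range(start, stop)]


# Prints "65 A", "66 B", ... one pair per line.
for code, char in ascii_rows(65, 70):
    print(code, char)
```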
I've got a 4-digit string corresponding to the code point of a Unicode character.
I need to dynamically convert it to its Unicode character, to be stored inside a variable.
For example, my program will spit out during its loop a variable a = '0590'. (https://www.compart.com/en/unicode/U+0590)
How do I get the variable b = '\u0590'?
I've tried string concatenation '\u' + a but obviously it's not the way.
chr will take a code point as an integer and convert it to the corresponding character. You need to have an integer though, of course.
a = '0590'
result = chr(int(a))  # note: int(a) reads '0590' as decimal 590, not as U+0590
print(result)
On Python 2, the function is called unichr, not chr. And if you want to interpret the string as a hex number, you can pass an explicit radix to int.
a = '0590'
result = unichr(int(a, 16))
print(result)
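For Python 3, which the first snippet targets, the hex variant would look roughly like this. The function name char_from_hex_code_point is only illustrative.

```python
def char_from_hex_code_point(hex_digits):
    """Turn a string of hex digits such as '0590' into the character U+0590."""
    return chr(int(hex_digits, 16))


a = '0590'
b = char_from_hex_code_point(a)
print(b == '\u0590')                     # True
print(char_from_hex_code_point('0041'))  # A
```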
I need to port code from perl that packs byte string. In perl it looks like the following:
pack 'B*', '0100001000111110010100101101000010010001'
I don't see an analog of the B* format in Python's struct module. Is there a ready-made solution, so that I don't have to reinvent the wheel?
Honestly, the description is not clear to me, so I can't even work out how to implement it myself:
Likewise, the b and B formats pack a string that's that many bits
long. Each such format generates 1 bit of the result. These are
typically followed by a repeat count like B8 or B64.
Each result bit
is based on the least-significant bit of the corresponding input
character, i.e., on ord($char)%2. In particular, characters "0" and
"1" generate bits 0 and 1, as do characters "\000" and "\001".
Starting from the beginning of the input string, each 8-tuple of
characters is converted to 1 character of output.
With format b, the
first character of the 8-tuple determines the least-significant bit of
a character; with format B, it determines the most-significant bit of
a character.
If the length of the input string is not evenly divisible
by 8, the remainder is packed as if the input string were padded by
null characters at the end. Similarly during unpacking, "extra" bits
are ignored.
If the input string is longer than needed, remaining
characters are ignored.
A * for the repeat count uses all characters
of the input field. On unpacking, bits are converted to a string of
0s and 1s.
So the string is divided into chunks of 8 symbols. If the last chunk has fewer than 8 symbols, it is padded with null characters at the end to make 8 symbols. Then each chunk becomes a byte.
But I can't understand what the resulting bits are. What do B8 and B64 mean here?
The int object has a to_bytes method:
binary = '0100001000111110010100101101000010010001'
number = int(binary, 2)
print(number.to_bytes((number.bit_length()+7)//8, 'big'))
# b'B>R\xd0\x91'
I'm not sure of the exact Perl semantics, but here's my guess at them:
import struct

def pack_bit_string(bs):
    ret = b''
    while bs:
        chunk, bs = bs[:8], bs[8:]
        # convert to an integer so we can pack it
        i = int(chunk, 2)
        # Handle trailing chunks that are not 8 bits
        # Note this is an augmented assignment, perhaps also read as
        # i = i << (8 - len(chunk))
        i <<= 8 - len(chunk)
        ret += struct.pack('B', i)
    return ret
Comments are inline. If you know things like "the input is less than 64 bits" you can avoid the loop and use Q for struct.pack
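The two approaches can also be combined: pad the bit string on the right to a whole number of bytes, as the quoted Perl documentation describes for B, and let int.to_bytes do the packing. This is only a sketch with a made-up function name, pack_bits_msb_first, and it assumes the input contains nothing but the characters '0' and '1' (Perl itself looks at the low bit of every character).

```python
def pack_bits_msb_first(bits):
    """Pack a string of '0'/'1' characters; the first character is the most significant bit."""
    if not bits:
        return b""
    # Pad on the right with zeros up to a whole number of bytes.
    n_bytes = (len(bits) + 7) // 8
    padded = bits.ljust(n_bytes * 8, "0")
    # Sizing by the padded length (not int.bit_length) keeps leading zero bytes.
    return int(padded, 2).to_bytes(n_bytes, "big")


print(pack_bits_msb_first("0100001000111110010100101101000010010001"))  # b'B>R\xd0\x91'
print(pack_bits_msb_first("1"))  # b'\x80'
```

In these terms, B8 consumes 8 input characters and yields one byte, B64 consumes 64 characters and yields eight bytes, and B* consumes the whole string.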