python: read first 1000 bytes from a unicode string [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have a long Unicode string in Python, and I want to read its first 1000 bytes.
Use case: I'm trying to send the body of an email to a mobile number as a text message using the Plivo API. A text message can hold at most 1000 bytes.
So I need to truncate the email body to its first 1000 bytes.
How can this be done?

If you need the first 1000 bytes then you need to encode the Unicode value first, as the number of bytes varies with the encoding picked.
Then just slice the first 1000 bytes:
encoded = unicodevalue.encode('utf8')
sliced = encoded[:1000]
As it happens, the Plivo Send Message API requires exactly that: 1000 bytes of UTF-8 encoded data. You probably want to truncate the data further so that you don't cut a multi-byte UTF-8 character in half:
encoded = unicodevalue.encode('utf8')
sliced = encoded[:1000]
while True:
    try:
        sliced.decode('utf8')
    except UnicodeDecodeError:
        sliced = sliced[:-1]  # remove one invalid byte
    else:
        break
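A shorter variant of the same idea, as a sketch: the encoded data is valid UTF-8 before slicing, so the only bytes that can fail to decode afterwards belong to a character split at the cut. Decoding with errors='ignore' should therefore drop just that fragment. The helper name truncate_utf8 is made up for illustration, and the code assumes Python 3, where str is the Unicode type.

```python
def truncate_utf8(text, max_bytes=1000):
    """Return the longest prefix of text that fits in max_bytes of UTF-8."""
    encoded = text.encode('utf-8')[:max_bytes]
    # A character split by the slice fails to decode and is dropped.
    return encoded.decode('utf-8', errors='ignore')


body = 'a' + 'é' * 600  # 'é' takes two bytes in UTF-8
short = truncate_utf8(body)
print(len(short.encode('utf-8')))
```

The result is a str again; encode it once more if the API call needs bytes.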

Related

In every instance where I receive a message from the other end of a socket, why is there a 'b' before every message? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 7 months ago.
For instance:
import socket
s.connect('123.456.78.9', '8080')
word = s.recv(1024)
word.decode()
print(word)
The output will be:
b'this is the string inside the word variable after receiving it from the other end'
Is there any way to get rid of the b at the start, and the quote marks around it?
The b prefix stands for bytes, and you are seeing it because the data you receive from the socket is raw bytes. If your message happens to be bytes that are valid text, say ASCII or UTF-8 text, then you will see what looks like the string you expected, but with the b prefix. Strings differ from bytes in that some representations of strings require multiple bytes (sometimes a fixed number, sometimes a variable number) per character (technically referred to as a "code point"). To convert between bytes and strings, use:
bytes to str
my_string = b"hello I am a bytes literal".decode()
str to bytes
my_bytes = "hello I am a string literal".encode()
The decode and encode methods also take an optional encoding parameter, which defaults to 'utf-8'; that default is what you should use if you do not know what encodings are. If you get an error saying something about a decode error, you may need to specify the encoding.
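Applied to the question's code, the catch is that decode() returns a new str rather than changing word in place, so its result has to be kept. A minimal sketch, assuming Python 3; the bytes literal below is only a stand-in for whatever s.recv(1024) returns:

```python
# Stand-in for `word = s.recv(1024)`; recv always returns bytes.
word = b'this is the string inside the word variable'

# decode() does not modify `word`; it returns a new str object.
text = word.decode('utf-8')

print(word)  # still bytes, so the b'...' representation is shown
print(text)  # plain text, without the b prefix or the quotes
```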

How does Python handle non-printable characters? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
I'm making a program that encrypts a file using keys.
It can encrypt only numbers, letters, spaces, some symbols.
Etc.
This is text >>> h5D#I2%%&12s
My program can encrypt a file, too. (At least I'm working on it)
What if file contains characters like this ? - uún‰3«°Ø and also NULL, CAN or SOH characters.
I have an idea: I want to leave these and all other non-ascii characters unencrypted. But I don't know if Python can work with them.
P.S. Here is link to the project: (And It's unfinished, not working)
https://www.dropbox.com/sh/lq8j4vmci5c2vmh/AADeSTPVYeV13z5HRHp-NlWPa?dl=0
Python byte strings (type str in Python 2, bytes in Python 3) are just opaque sequences of bytes, where each byte has an integer value between 0 and 255.
How you treat those bytes is up to you. You could treat them as text: printing the text, splitting on whitespace, changing case, etc. Or you can just treat them as binary data; your choice. If you choose to treat the contents as text, then yes, some bytes are 'unprintable', because the ASCII codec doesn't map those code points to a printable glyph. Python, however, doesn't care.
Open your files in binary mode ('rb', 'wb', etc.) to make sure that line separators (\n, or \r or \r\n characters) are not translated from and to the platform native form.
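To illustrate the idea from the question (change only some bytes and pass everything else through), here is a small sketch assuming Python 3, where iterating over bytes yields integers from 0 to 255. The function transform_printable and its shift are hypothetical placeholders, not a real cipher and not the asker's algorithm; in practice the data would come from a file opened with 'rb'.

```python
def transform_printable(data, shift=3):
    """Rotate printable ASCII bytes (32..126); copy all other bytes as-is."""
    out = bytearray()
    for value in data:  # each value is an int in range(256)
        if 32 <= value <= 126:
            # 95 printable ASCII characters, rotated within their own range.
            out.append(32 + (value - 32 + shift) % 95)
        else:
            # NUL, SOH, CAN, and bytes above 126 are left untouched.
            out.append(value)
    return bytes(out)


sample = b'h5D#\x00\x01\x18\xfa'
scrambled = transform_printable(sample)
restored = transform_printable(scrambled, shift=-3)
print(scrambled, restored)
```

The control bytes are handled like any other value, which is the sense in which Python "doesn't care" about printability.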

Converting Hex To dec in python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I am trying to convert this RFID tag number, which I get from this code:
import serial
ser = serial.Serial()
ser.port = "COM1"
ser.baudrate = 9600
ser.timeout = 3
ser.open()
if ser.open is True:
    print "Port Not open"
while ser.isOpen():
    #ser.timeout = 7
    response = ser.read(17)
    response = response.encode('hex')
    print response
I am getting this 0000000000000000000213780510015dff which is a hexadecimal number, but I want to convert it to decimal or string. When I try to do that, I am getting a token error. How can I fix that?
You say you want to "convert… to string".
You can use unhexlify to do that, or decode('hex').
However, in your case, the only reason you have hex in the first place is that you called encode('hex'), so just… don't do that.
If you want to decode it to an int or a Decimal or something, you can do that by using the appropriate constructor, as Maxime's answer shows. However, rather than converting to hex just to decode as an int, you might want to just decode it directly. Or maybe you want to decode the hex string into a decimal string? Or maybe this is some UUID-style structure, and you want to use struct.unpack to decode it into pieces? Or…? Without knowing exactly what you're trying to do, it's hard to give an exact answer…
You can use int to convert a hexadecimal number into an integer.
>>> int("0000000000000000000213780510015dff", 16)
149595175772052991
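For comparison, a sketch of the conversions mentioned above in Python 3, where the 'hex' codec on strings no longer exists. The four-byte value is a hypothetical sample standing in for ser.read(17), and big-endian byte order is an assumption about the tag format:

```python
import binascii

# Hypothetical sample standing in for the bytes returned by ser.read(17).
raw = b'\x00\x01\x5d\xff'

hex_text = raw.hex()                       # bytes to hex text
as_int = int.from_bytes(raw, 'big')        # bytes to int, no hex round trip
from_hex = int(hex_text, 16)               # hex text to int, as shown above
round_trip = binascii.unhexlify(hex_text)  # hex text back to bytes

print(hex_text, as_int)
```

Decoding the raw bytes directly with int.from_bytes gives the same integer as parsing the hex text, so the intermediate hex string is only needed for display.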

Change an integer to a string (numbers to words) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
How can I convert a number such as 24 into the two words "two", "four"?
Here's a quick way I thought of.
First you need a way to loop through the digits of the integer. You can try some awkward dividing by 10 and taking the modulus... or just convert it to a string.
Then you can iterate through each 'number' in the string and use a simple lookup table to print out each number.
# Map each digit character to its word.
numberconverterlookup = {
    '1': 'one',
    '2': 'two',
    '3': 'three',
    '4': 'four',
    '5': 'five',
    '6': 'six',
    '7': 'seven',
    '8': 'eight',
    '9': 'nine',
    '0': 'zero',
}
number = 24
stringnumber = str(number)
for eachdigit in stringnumber:
    print(numberconverterlookup[eachdigit])
Note that this spells out the number one digit at a time; it can't easily produce words for whole numbers such as "twenty-four". For that you'd have to write out each number in the lookup table by hand, which is very cumbersome.
Some key concepts are illustrated here:
Dictionary: This maps a 'key' to a 'value'. I.e. '1' maps to 'one'
For loop: This allows us to go through each digit in the number. In the case of 24, it will loop twice, first with eachdigit set to '2' and then with eachdigit set to '4'. We can't loop through an integer because it is itself a single entity.
Typecasting: This converts the integer 24 into the string '24'. A string is basically a sequence of individual characters grouped together, whereas an integer is a single entity.
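The other route mentioned above, dividing by 10 and taking the remainder, can be sketched like this. The names DIGIT_WORDS and digits_to_words are made up for the example, and negative numbers are simply rejected:

```python
DIGIT_WORDS = ['zero', 'one', 'two', 'three', 'four',
               'five', 'six', 'seven', 'eight', 'nine']


def digits_to_words(number):
    """Return the words for each decimal digit of a non-negative integer."""
    if number < 0:
        raise ValueError('only non-negative integers are handled')
    if number == 0:
        return [DIGIT_WORDS[0]]
    words = []
    while number > 0:
        # divmod peels off the last digit: 24 -> (2, 4), then 2 -> (0, 2).
        number, digit = divmod(number, 10)
        words.append(DIGIT_WORDS[digit])
    words.reverse()  # digits were collected from right to left
    return words


print(digits_to_words(24))
```

Converting to a string is shorter; the arithmetic version mainly shows why the digits come out in reverse order and need to be flipped.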

Remove 64 Bytes every 2048 bytes in Binary [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
I'm at a loose end here, it seems like such a simple problem so I'm hoping there is a simple answer!
I have a binary (approx 35m) which has 64 bytes of padded data every 2048 bytes starting at offset 1536 - I just want to remove this padding.
The first occurrence is at 1536, then 3648, 5760, 7872, etc.
(2112 bytes - 64 bytes of dummy data = 2048)
I've tried bvi,bbe,hexdump+sed+xxd and I'm clearly missing something.
Thanks in advance,
You didn't show any code, so I presume you need help wrapping your head around the algorithm. It's actually quite simple:
While you haven't reached the EOF of STDIN:
    Read 2112 bytes from STDIN.
    From the bytes read, remove the 64 bytes starting at position 1536.
    Print the remaining 2048 bytes to STDOUT.
In Perl,
binmode(STDIN);
binmode(STDOUT);
while (1) {
    my $rv = read(STDIN, my $rec, 2112);
    die $! if !defined($rv);
    last if !$rv;
    substr($rec, 1536, 64, '');
    print($rec)
        or die $!;
}
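The same record-by-record loop can be sketched in Python 3. The constant and function names are invented for the example, and the in-memory streams only stand in for real files; for actual use the streams could be sys.stdin.buffer and sys.stdout.buffer, or files opened in 'rb' and 'wb' mode.

```python
import io

RECORD_SIZE = 2112  # 2048 bytes of data plus 64 bytes of padding
PAD_OFFSET = 1536   # where the padding starts inside each record
PAD_SIZE = 64


def strip_padding(src, dst):
    """Copy src to dst, dropping the padding from every record."""
    while True:
        record = src.read(RECORD_SIZE)
        if not record:
            break  # end of input
        # Slicing also copes with a short final record.
        dst.write(record[:PAD_OFFSET] + record[PAD_OFFSET + PAD_SIZE:])


# Two synthetic records: data 'A', padding 'P', data 'B'.
one_record = b'A' * 1536 + b'P' * 64 + b'B' * 512
source = io.BytesIO(one_record * 2)
target = io.BytesIO()
strip_padding(source, target)
print(len(target.getvalue()))
```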
If you want to use Perl:
Open the file with the :raw layer. We don't want :utf8 or :crlf translation.
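Then, we can seek to the positions we are interested in, and read a few bytes:
my $size = -s $filename;
open my $fh, "<:raw", $filename or die "Can't open $filename: $!";
# The padding blocks are 2112 bytes apart (1536, 3648, 5760, ...), so after
# reading 64 bytes we skip ahead by 2112 - 64 bytes to reach the next one.
for (seek($fh, 1536, 0) ; tell($fh) + 64 <= $size ; seek($fh, 2112 - 64, 1)) {
    read $fh, my $buffer, 64;
    ...;
}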
Read
perldoc -f tell
perldoc -f seek
perldoc -f read
for further information
