How does Python handle non-printable characters? [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
I'm making a program that encrypts a file using keys.
It can encrypt only numbers, letters, spaces, some symbols, etc.
This is text >>> h5D#I2%%&12s
My program can encrypt a file, too. (At least I'm working on it)
What if the file contains characters like these: uún‰3«°Ø, and also NULL, CAN or SOH characters?
I have an idea: I want to leave these and all other non-ASCII characters unencrypted. But I don't know if Python can work with them.
P.S. Here is link to the project: (And It's unfinished, not working)
https://www.dropbox.com/sh/lq8j4vmci5c2vmh/AADeSTPVYeV13z5HRHp-NlWPa?dl=0

Python byte strings (type str in Python 2, bytes in Python 3) are just opaque sequences of bytes, where each byte has an integer value between 0 and 255.
How you treat those bytes is up to you. You could treat them as text: printing the text, splitting on whitespace, changing case, etc. Or you can just treat them as binary data, your choice. If you choose to treat the contents as text, then yes, some bytes are 'unprintable' because ASCII defines those code points as control characters with no printable glyph. Python, however, doesn't care.
Open your files in binary mode ('rb', 'wb', etc.) to make sure that line separators (\n, or \r or \r\n characters) are not translated from and to the platform native form.
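As a minimal sketch of the idea in the question (transform only the bytes you know how to handle and copy everything else through untouched), something like the following could work in Python 3. The names `ENCRYPTABLE`, `toy_transform`, `transform_bytes` and `transform_file` are hypothetical, and the rotation is only a stand-in for a real cipher, not the asker's actual scheme.

```python
import string

# Bytes treated as "encryptable": printable ASCII 32..126.
# This set is an assumption for illustration, not the asker's real alphabet.
ENCRYPTABLE = set(
    (string.ascii_letters + string.digits + string.punctuation + " ").encode("ascii")
)


def toy_transform(byte_value):
    """Placeholder for a real cipher: rotate by 13 within the range 32..126."""
    return 32 + (byte_value - 32 + 13) % 95


def transform_bytes(data):
    """Transform encryptable bytes; copy NUL, CAN, SOH and non-ASCII bytes as-is."""
    return bytes(toy_transform(b) if b in ENCRYPTABLE else b for b in data)


def transform_file(src_path, dst_path):
    """Read and write in binary mode so no newline translation takes place."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(transform_bytes(src.read()))
```

Because the data never leaves `bytes`, control characters and non-ASCII bytes need no special handling; they are simply integers outside the chosen set.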

Related

Convert a big table of value into string [closed]

Closed 10 months ago.
I have a table of values that I want to insert in my Python code as a string.
I tried this, but it is not working:
str(3.354219E-03 3.584506E-03 3.830603E-03 4.093597E-03 4.374646E-03
4.674992E-03 4.995957E-03 5.338959E-03 5.705510E-03 6.097227E-03
6.515837E-03 6.963188E-03 7.441252E-03 7.952137E-03 8.498098E-03
9.081543E-03 9.705044E-03 1.037135E-02 1.108341E-02 1.184435E-02
1.265753E-02 1.352655E-02 1.445522E-02 1.544766E-02 1.650823E-02
1.764162E-02 1.885282E-02 2.014718E-02 2.153040E-02 2.300859E-02
2.458826E-02 2.627639E-02 2.808042E-02 3.000831E-02 3.206855E-02
3.427025E-02 3.662310E-02 3.913749E-02 4.182451E-02 4.469601E-02
4.776465E-02 5.104398E-02 5.454845E-02 5.829352E-02 6.229571E-02
6.657268E-02 7.114329E-02 7.602769E-02 8.124744E-02 8.682555E-02)
Another way would be to put quotation marks at the beginning and end of each line, but that takes too much time. If there are some options in Vim or Notepad, I would be glad to hear about them.
One way would of course be to do it programmatically and immediately split your string so that you retrieve the desired table. If you read a multi-line string, you should use three quotation marks """.
If you are looking for a way to do it in a text editor, here are two solutions that work in VS Code (and many more editors)
Column-selection mode. Decently fast, because your inputs are all the same length. You can highlight all rows and then just add quotes accordingly.
Reg-ex replacement. You can use the following regex, see explanations here. Note that I highlighted the reg-ex mode (next to "1 of 200").
Regardless of the method, by using three quotation marks here as well, Python should be able to read your string including the "inner" quotation marks.
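The programmatic route could be sketched as follows: paste the table between triple quotes and split on whitespace. Only the first two rows of the table are reproduced here, and the names `table_text` and `values` are just for illustration.

```python
# A triple-quoted string may span several lines, so the table can be pasted as-is.
table_text = """
3.354219E-03 3.584506E-03 3.830603E-03 4.093597E-03 4.374646E-03
4.674992E-03 4.995957E-03 5.338959E-03 5.705510E-03 6.097227E-03
"""

# split() with no arguments splits on any run of whitespace, including newlines.
values = [float(token) for token in table_text.split()]
```

If the string itself is what you need rather than the numbers, `table_text` can be used directly without any per-line quoting.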

Is it possible to convert characters from one language to another language's character using unicode matching? [closed]

Closed 7 years ago.
I want to translate English to language x. For that, I want to:
first, convert English characters to the equivalent English Unicode,
then convert English Unicode to x Unicode,
then convert x Unicode to x characters. So, I want to convert one language's Unicode to the equivalent Unicode of another language, using C or any other language.
For example, I want to convert "Linux" (example word) from English to Tamil "லினக்ஸ்"
Unicode for "Linux" (example word) : 004c 0069 006e 0075 0078
Is there a possibility to convert this English Unicode to the equivalent Tamil Unicode?
You can't do the step "convert English Unicode to x language Unicode". Unicode is a single character set in which every character from every script has its own unique code point, so there is no such thing as "English Unicode" or "x language Unicode". For example, the letter "i" is U+0069 (the 0069 in your list), and U+0069 always means "i", independent of language. Turning "Linux" into "லினக்ஸ்" is transliteration, which needs a mapping between the sounds of the two scripts rather than a conversion between code points.
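To see this concretely, here is a small Python 3 sketch that lists the code points of both strings. The helper name `code_points` is hypothetical; `ord` is the built-in that returns a character's code point.

```python
def code_points(text):
    """Return the code point of each character as a 4-digit hex string."""
    return ["{:04x}".format(ord(ch)) for ch in text]


english = code_points("Linux")    # the Latin letters from the question
tamil = code_points("லினக்ஸ்")     # the Tamil spelling from the question

print(english)
print(tamil)
```

The Latin letters give the values quoted in the question, while every character of the Tamil word lies in the Tamil block (U+0B80 to U+0BFF). There is no arithmetic relationship between the two lists, which is why a code-point-level conversion cannot work.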

remove hex value from string [closed]

Closed 8 years ago.
How do I remove the hex value from a string in Python 2.7? Here is the string,
\xffDSI\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x01\x00\x08\x00\x00\x00\x00\x00\x00\x00#\x00\x00\x00#\x00\x00\x00\x01\x04\xb3\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x01\x00\x04\x00\x02\x00\x01\x00\x01\x00\x02\x00\x06\x00\x06"\x00\x00\x00\x00c\x01,\x00\x00\x06&\x00\x00\x00\x01\x01,\x00\x00\x06\'\x00\x00\x11\x98\x00\x19\x00\x00\x00(\x00\x00\x00\x00\x00\x0011_w\x00\x00\x00\x00\x00\x006\x00A\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01,\x00\x00\x17\xbf\x00\x00\x11\x98\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
#b4
,sudd5ly1
!
toddl]
0
able
to
use
a
comput]
to
play
games4
\x00$\x00\x00\x018\x00\x00\x00\x00\x00\x01\x00\x19\x00\x02\x00\x03\x00\x04\x00\x05\x00\x1d\x00\x1f\x00\x06\x00\x07\x00\x08\x00\t\x00'],
['\x00\x0b\x00\x0c\x00\r\x00\x1b\x00\x0e\x00\x0f\x00\x10\x00\x13\x00\x11\x00\x14\x00\x1c\x00\x18\x00
I want to display only #b4 to games4. All the hex values should be removed. Thank you.
What I am trying to do is read in a file of type *.dxb, which displays braille font. I was able to read the file, but the output showed me all those \xffDSI\x00... and then #b4 ,sudd5ly1
The #b4 ,sudd5ly1 is only the part that I want the output to show so that I can do a comparison with other file.
Thank you again.
You can use something like this:
import string
s = '\xffDSI....'
cleaned = ''.join(c for c in s if c in string.printable)
This uses printable as the definition of "not a hex value", though it does include \x0b and \x0c (both printable whitespace characters).
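For Python 3, where file contents read in binary mode are `bytes`, the same filter could be sketched like this. The names `PRINTABLE_BYTES`, `keep_printable` and `sample` are hypothetical, and `sample` is a shortened stand-in for the data in the question.

```python
import string

# Byte values of every character in string.printable (all of them are ASCII).
PRINTABLE_BYTES = set(string.printable.encode("ascii"))


def keep_printable(data):
    """Drop every byte that is not printable ASCII and return the rest as text."""
    return bytes(b for b in data if b in PRINTABLE_BYTES).decode("ascii")


# Shortened stand-in for the .dxb contents shown in the question.
sample = b"\xffDSI\x00\x02\x00#b4\n,sudd5ly1\n\x00$\x00"
print(repr(keep_printable(sample)))  # 'DSI#b4\n,sudd5ly1\n$'
```

Note that header bytes which happen to be printable, such as `DSI` and `$` here, survive the filter, so some extra trimming may still be needed before comparing files.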

python : read first 1000 bytes from a unicode string [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have this long unicode string in python. Of this unicode string I want to read first 1000 bytes.
Use case: I'm trying to send the email body content on a mobile number using the plivo API as a text message. This text message take maximum of 1000 bytes.
So I need to truncate first 1000 bytes from the email body content.
How can this be done ?
If you need the first 1000 bytes then you need to encode the Unicode value first, as the number of bytes varies with the encoding picked.
Then just slice the first 1000 bytes:
encoded = unicodevalue.encode('utf8')
sliced = encoded[:1000]
As it happens, the Plivo Send Message API requires exactly that; 1000 bytes of UTF-8 encoded data. You probably want to truncate the data further to not cut off multi-byte UTF-8 characters:
encoded = unicodevalue.encode('utf8')
sliced = encoded[:1000]
while True:
    try:
        sliced.decode('utf8')
    except UnicodeDecodeError:
        sliced = sliced[:-1]  # remove one invalid byte
    else:
        break
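A more compact variant of the same truncation, sketched for Python 3, lets the decoder discard the incomplete trailing sequence. This is an alternative to the loop, not what the answer itself uses; the name `truncate_utf8` is hypothetical. It relies on the input being valid text, so the only bytes that can fail to decode after slicing are the remains of a character cut in half at the end.

```python
def truncate_utf8(text, max_bytes=1000):
    """Encode to UTF-8 and cut to at most max_bytes without splitting a character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops the partial multi-byte sequence left at the end, if any.
    return encoded.decode("utf-8", errors="ignore").encode("utf-8")


body = "é" * 600                       # 1200 bytes once encoded
print(len(truncate_utf8(body, 999)))   # 998: the dangling half of a character is dropped
```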

Unable to get this to loop. Need this to go line by line [closed]

Closed 8 years ago.
import re
ftplist = open('C:\Documents and Settings\jasong\My Documents\GooleDrive\lookup.txt','r')
txt = ftplist.read()
re1='([a-z]:\\\\(?:[-\\w\\.\\d]+\\\\)*(?:[-\\w\\.\\d]+)?)'
rg = re.compile(re1,re.IGNORECASE|re.DOTALL)
m = rg.search(txt)
if m:
    winpath1=m.group(1)
    print "("+winpath1+")"+"\n"
Loop directly over the file object:
with open(r'C:\Documents and Settings\jasong\My Documents\GooleDrive\lookup.txt','r') as ftplist:
    for line in ftplist:
        match = rg.search(line)
This will read the file efficiently, without having to load everything into memory first.
Note: I also made your path a raw string (by adding r in front of it) to prevent Python from trying to interpret escape sequences starting with a \ backslash; \n, \r, \t and \b all have special meaning in a normal string. It is generally a good idea to use raw strings for Windows file paths, although you can also use forward slashes or double backslash characters as well.
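Putting the pieces together, a Python 3 sketch of the whole loop might look like this. The names `WIN_PATH` and `find_paths` are hypothetical, and the pattern is the question's regex rewritten as a raw string (with the redundant `\d` dropped, since `\w` already covers digits).

```python
import re

# The question's pattern as a raw string: drive letter, ":\", then path components.
WIN_PATH = re.compile(r'([a-z]:\\(?:[-\w.]+\\)*(?:[-\w.]+)?)', re.IGNORECASE)


def find_paths(lines):
    """Return the first Windows-style path found on each line that has one."""
    found = []
    for line in lines:
        match = WIN_PATH.search(line)
        if match:
            found.append(match.group(1))
    return found


sample_lines = ["get C:\\temp\\file.txt now", "no path here", "d:\\data\\"]
for path in find_paths(sample_lines):
    print("(" + path + ")")
```

An open file object can be passed in place of `sample_lines`, since iterating over it yields one line at a time. The pattern stops at the first space, so a path such as the one under `Documents and Settings` would only be matched up to that space.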
