Encoding and decoding a string in Python 3.6 doesn't work?

I have this code:
import base64
words = ('word1',"word2")
for word in words: #for loop
    str_encoded = base64.b64encode(word.encode()) # encoding it
    print(str_encoded) #print encoded
    str_decoded = str_encoded.decode('utf-8')
    print(str_decoded)
    back = base64.standard_b64decode(str_decoded) # testing if it worked
    print(word, "," ,"{{" , str_decoded , "}}" , "," , str_decoded, back) #print test
When I print the test line, I see that the b'...' prefix wasn't removed.
How can I remove it? Thanks!

You tried to decode your data in the wrong order. Decoding has to undo the encoding steps in reverse:
import base64
words = ('word€',"word2") # Added some non-ascii characters for testing
for word in words:
    # Encoding
    print("Word:", word)
    utf8_encoded = word.encode('utf8') # Encoding in utf8, gives a bytes object
    print('utf8 encoded:', utf8_encoded)
    str_encoded = base64.b64encode(utf8_encoded) # Encoding it in B64
    print("Base64 encoded:", str_encoded)
    # Decoding
    b64_decoded = base64.standard_b64decode(str_encoded) # Decoding from B64, we get a bytes object
    print("Decoded from base64:", b64_decoded)
    str_decoded = b64_decoded.decode('utf-8') # and decode it (as utf8) to get a string
    print("Decoded string:", str_decoded, '\n')
Output:
Word: word€
utf8 encoded: b'word\xe2\x82\xac'
Base64 encoded: b'd29yZOKCrA=='
Decoded from base64: b'word\xe2\x82\xac'
Decoded string: word€
Word: word2
utf8 encoded: b'word2'
Base64 encoded: b'd29yZDI='
Decoded from base64: b'word2'
Decoded string: word2

You have to decode the "back" variable (which in your case is a bytes object) with:
back.decode("utf-8")
so that the test print becomes:
print(word, ",", "{{", str_decoded, "}}", ",", str_decoded, back.decode("utf-8"))
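A minimal sketch of the full round trip, assuming all you want is the Base64 text without the b'...' prefix. The helper names to_b64_text and from_b64_text are hypothetical, introduced here only for illustration.

```python
import base64


def to_b64_text(text: str) -> str:
    """Encode text as UTF-8, then as Base64, and return the result as a str."""
    b64_bytes = base64.b64encode(text.encode("utf-8"))
    # Base64 output is pure ASCII, so decoding it as ASCII is always safe.
    return b64_bytes.decode("ascii")


def from_b64_text(b64_text: str) -> str:
    """Reverse to_b64_text: Base64-decode first, then decode the UTF-8 bytes."""
    return base64.b64decode(b64_text).decode("utf-8")


for word in ("word€", "word2"):
    encoded = to_b64_text(word)
    # Print the Base64 text and whether the round trip restored the input.
    print(encoded, from_b64_text(encoded) == word)
```

The order matters: text is encoded to bytes before Base64, so bytes must be Base64-decoded before they are decoded back to text.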

Related

ValueError: Data must be aligned to block boundary in ECB mode (or additional backslashes from encoding of encrypted text)

I have this code:
from Crypto.Cipher import DES
# Encryption part
key = b'abcdefgh'
def pad(text):
    while len(text) % 8 != 0:
        text += b' '
    return text
des = DES.new(key, DES.MODE_ECB)
text = b'Secret text'
padded_text = pad(text)
encrypted_text = des.encrypt(padded_text)
print(encrypted_text) # FIRST
# Decryption part
that_encrypted_text = input().encode('utf8')
# This print shows the problem---------------
print(that_encrypted_text) # SECOND
# This print shows the problem --------------
data = des.decrypt(that_encrypted_text)
print(data)
From the FIRST print we can see: b'.\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0'
Fill in the input(): .\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0
From the SECOND print we can see: b'.\\x12\\x7f\\xcf\\xad+\\xa9\\x0c\\xc4\\xde\\x05\\x15\\xef\\x7f\\x16\\xa0'
And after this (because of additional backslashes) an error appears:
ValueError: Data must be aligned to block boundary in ECB mode
Why do the additional backslashes appear after encoding, and how can I get rid of them so that the message can be decrypted?
I want both parts of the program, encryption and decryption, to work separately. That's why there is an input() for the encrypted text.
Typing .\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0 at the input() prompt
is equivalent to the raw string r'.\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0': every backslash arrives as a literal character, and that is the origin of the doubled backslashes in your SECOND print.
Use
that_encrypted_text = (input().encode( 'raw_unicode_escape')
.decode( 'unicode_escape')
.encode( 'latin1'))
See how the Python-specific text encodings raw_unicode_escape and unicode_escape handle backslashes (and note the role of the latin1 encoding there).
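A small sketch of that conversion chain, using the sample value from the question so it runs without the Crypto package. The hex alternative at the end is an added suggestion, not part of the answer above: exchanging ciphertext as hex avoids escape handling altogether.

```python
# Text as it arrives from input(): every backslash is a literal character.
typed = r'.\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0'

# 1. raw_unicode_escape: str -> bytes, leaving the backslash sequences as they are
# 2. unicode_escape: interpret each \xNN sequence, giving code points 0..255
# 3. latin1: map each code point 0..255 to the byte with the same value
recovered = (typed.encode('raw_unicode_escape')
                  .decode('unicode_escape')
                  .encode('latin1'))

# The typed text is much longer than the 16 bytes it describes;
# 16 is a multiple of the 8-byte DES block size.
print(len(typed), len(recovered))

# Alternative: print the ciphertext as hex and read it back with fromhex.
as_hex = recovered.hex()
print(as_hex)
print(bytes.fromhex(as_hex) == recovered)
```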

Convert utf-8 string to base64

I decoded the base64 string str1= eyJlbXBsb3llciI6IntcIm5hbWVcIjpcInVzZXJzX2VtcGxveWVyXCIsXCJhcmdzXCI6XCIxMDQ5NTgxNjI4MzdcIn0ifQ
to str2={"employer":"{\"name\":\"users_employer\",\"args\":\"104958162837\"}"}
with the help of http://www.online-decoder.com/ru
I want to convert str2 back to str1 using Python. My code:
import base64
data = """{"employer":"{"name":"users_employer","args":"104958162837"}"}"""
encoded_bytes = base64.b64encode(data.encode("utf-8"))
encoded_str = str(encoded_bytes, "utf-8")
print(encoded_str)
The code prints str3=
eyJlbXBsb3llciI6InsibmFtZSI6InVzZXJzX2VtcGxveWVyIiwiYXJncyI6IjEwNDk1ODE2MjgzNyJ9In0=
What should I change in the code to print str1 instead of str3?
I tried
{"employer":"{\"name\":\"users_employer\",\"args\":\"104958162837\"}"}
and
{"employer":"{"name":"users_employer","args":"104958162837"}"}
but the result is the same.
The problem is that \" is a Python escape sequence for the single character ", so the backslashes never make it into the string. The fix is to keep the backslashes and use a raw string (note the "r" at the front).
data = r"""{"employer":"{\"name\":\"users_employer\",\"args\":\"104958162837\"}"}"""
Encoding this gives the original string, except that Python pads base64 output to a multiple of 4 characters with "=" signs. Strip the padding and you have the original.
import base64
str1 = "eyJlbXBsb3llciI6IntcIm5hbWVcIjpcInVzZXJzX2VtcGxveWVyXCIsXCJhcmdzXCI6XCIxMDQ5NTgxNjI4MzdcIn0ifQ"
str1_decoded = base64.b64decode(str1 + "=" * (-len(str1) % 4)).decode("ascii")  # re-add the stripped padding
print("str1", str1_decoded)
data = r"""{"employer":"{\"name\":\"users_employer\",\"args\":\"104958162837\"}"}"""
encoded_bytes = base64.b64encode(data.encode("utf-8"))
encoded_str = str(encoded_bytes.rstrip(b"="), "utf-8")
print("my encoded", encoded_str)
print("same?", str1 == encoded_str)
The results you want are just base64.b64decode(str1 + "=="). See this post for more information on the padding.
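The padding arithmetic can be sketched as a small helper, assuming standard (not URL-safe) Base64. The name b64decode_unpadded is hypothetical and only used for illustration.

```python
import base64


def b64decode_unpadded(b64_text: str) -> bytes:
    """Decode Base64 text whose trailing '=' padding may have been stripped."""
    # Padded Base64 length is always a multiple of 4; valid input needs 0 to 2 '='.
    missing = -len(b64_text) % 4
    return base64.b64decode(b64_text + "=" * missing)


str1 = "eyJlbXBsb3llciI6IntcIm5hbWVcIjpcInVzZXJzX2VtcGxveWVyXCIsXCJhcmdzXCI6XCIxMDQ5NTgxNjI4MzdcIn0ifQ"
data = r"""{"employer":"{\"name\":\"users_employer\",\"args\":\"104958162837\"}"}"""

print(b64decode_unpadded(str1).decode("utf-8"))
print(b64decode_unpadded(str1) == data.encode("utf-8"))
```

Computing the padding from the length keeps the helper correct for inputs that are already padded or that need only one "=".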

Python adds extra to crypt result

I'm trying to create an API with a token to communicate between a Raspberry Pi and a web server. Right now I'm trying to generate a token with Python.
from Crypto.Cipher import AES
import base64
import os
import time
import datetime
import requests
BLOCK_SIZE = 32
BLOCK_SZ = 14
#!/usr/bin/python
salt = "123456789123" # Make sure the salt always has the same length! (12 chars)
iv = "1234567891234567" # Make sure the IV always has the same length! (16 chars)
currentDate = time.strftime("%d%m%Y")
currentTime = time.strftime("%H%M")
PADDING = '{'
pad = lambda s: s + (BLOCK_SIZE - len(s) % BLOCK_SIZE) * PADDING
EncodeAES = lambda c, s: base64.b64encode(c.encrypt(pad(s)))
DecodeAES = lambda c, e: c.decrypt(base64.b64decode(e)).rstrip(PADDING)
secret = salt + currentTime
cipher=AES.new(key=secret,mode=AES.MODE_CBC,IV=iv)
encode = currentDate
encoded = EncodeAES(cipher, encode)
print (encoded)
The problem is that the script's output has an extra b' at the start of every encoded string, and a ' at the end.
C:\Python36-32>python.exe encrypt.py
b'Qge6lbC+SulFgTk/7TZ0TKHUP0SFS8G+nd5un4iv9iI='
C:\Python36-32>python.exe encrypt.py
b'DTcotcaU98QkRxCzRR01hh4yqqyC92u4oAuf0bSrQZQ='
Hopefully someone can explain what went wrong.
FIXED!
I was able to fix it by decoding the result as UTF-8:
sendtoken = encoded.decode('utf-8')
You are running Python 3.6, where str is Unicode text and bytes is a separate type. base64.b64encode(), which EncodeAES() calls, returns a bytes object, and print() shows a bytes object as a literal with the b prefix and quotes.
You could strip the b out of the output post-Python, or you could print(str(encoded)), which should give you the same characters, since ASCII is valid UTF-8.
EDIT:
What you need to do is decode the bytestring, as mentioned in the answer and in a comment above. I was wrong about str() doing the conversion for you; you need to call decode('UTF-8') on the bytestring you wish to print. That converts the bytes into a str, which then prints without the b'...' wrapper.
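A short sketch of the difference between str() and decode() on a bytes object, using a placeholder input instead of real AES output so that it runs with the standard library alone.

```python
import base64

encoded = base64.b64encode(b"example")  # b64encode always returns bytes

as_repr = str(encoded)              # the repr of the bytes object, b'...' included
as_text = encoded.decode("ascii")   # the Base64 characters as a plain str

print(type(encoded).__name__)
print(as_repr)
print(as_text)
```

Base64 output contains only ASCII characters, so decoding with "ascii" or "utf-8" gives the same result.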

Type conversion error: the result of urllib.request.urlopen won't work as a character string

In Python 3.x, after reading from a URL with urllib.request.urlopen:
sock = urllib.request.urlopen(url)
code = sock.read (100)
code = code.replace( '\n' , 'enter' )
code = code.replace( '\t' , 'tab' )
I cannot treat 'code' as a string of characters; these lines raise a type conversion error:
code = code.replace( '\n' , 'enter' )
code = code.replace( '\t' , 'tab' )
Reading from the response returned by urllib.request.urlopen() gives you bytes values.
Either decode to a string, or use byte literals when replacing:
code = code.replace(b'\n', b'enter')
code = code.replace(b'\t', b'tab')
Decoding requires that you know what codec was used for the textual content. You can see if a content character set was returned:
codec = sock.info().get_param('charset')
If that value is not None you can decode with that codec:
code = code.decode(codec)
The default codec for text/ mimetype responses is ISO-8859-1 (Latin 1), but HTML responses often set the desired codec in a <meta> tag in the header. Leave decoding that to a competent HTML parser, like BeautifulSoup.
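A hedged sketch of charset-aware decoding. The helpers decode_body and fetch_text are hypothetical names, and falling back to UTF-8 when the server declares no charset is an assumption made for this example, not a rule. fetch_text is defined but not called here, so the block runs without network access.

```python
import urllib.request


def decode_body(raw: bytes, charset=None, fallback="utf-8") -> str:
    """Decode response bytes with the declared charset, else the fallback."""
    # errors="replace" substitutes undecodable bytes instead of raising,
    # which also covers a multi-byte character cut off by a partial read.
    return raw.decode(charset or fallback, errors="replace")


def fetch_text(url: str, limit: int = 100) -> str:
    """Read up to `limit` bytes from `url` and return them as text."""
    with urllib.request.urlopen(url) as sock:
        raw = sock.read(limit)
        charset = sock.headers.get_content_charset()  # None if not declared
    return decode_body(raw, charset)


# Once the bytes are decoded, str.replace works with ordinary string arguments.
sample = decode_body(b"a\tb\n", "ascii")
print(sample.replace("\n", "enter").replace("\t", "tab"))
```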
Before writing:
code = code.replace('\n', 'enter')
code = code.replace('\t', 'tab')
Write this:
code = code.decode('utf-8')
Finally, your code becomes:
code = code.decode('utf-8')
code = code.replace('\n', 'enter')
code = code.replace('\t', 'tab')
urllib.request.urlopen returns bytes data
Note: Because you have f.sock, if you have a judgment urllib.request.urlopen?

How could list decode to 'UTF-8'

I have a list [0x97, 0x52], not a unicode object. It holds the Unicode code point of the character '青' (u'\u9752'). How can I convert this list to a unicode object first, and then encode it as UTF-8?
bytes = [0x97, 0x52]
code = bytes[0] * 256 + bytes[1] # build the 16-bit code
char = unichr(code) # convert code to unicode
utf8 = char.encode('utf-8') # encode unicode as utf-8
print utf8 # prints '青'
Not sure if this is the most elegant way, but it works for this particular example.
>>> ''.join([chr(x) for x in [0x97, 0x52]]).decode('utf-16be')
u'\u9752'
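Both answers above are Python 2 code. A Python 3 equivalent might look like the sketch below, assuming the two list items are the high and low byte of a single UTF-16 big-endian code unit, as in the question.

```python
values = [0x97, 0x52]

raw = bytes(values)               # the two bytes 0x97 0x52
char = raw.decode("utf-16-be")    # interpret them as one big-endian 16-bit code unit
utf8 = char.encode("utf-8")       # re-encode the character as UTF-8

print(hex(ord(char)))
print(utf8)
```

In Python 3 the str type is already Unicode, so there is no separate unicode object to build first.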
