Base58Check encoding for Bitcoin addresses too long - python

I'm trying to create a Bitcoin address with Python. I got the hashing part right, but I have some trouble with the Base58Check encoding. I use this package:
https://pypi.python.org/pypi/base58
Here is an example:
import base58
unencoded_string = "00010966776006953D5567439E5E39F86A0D273BEED61967F6"
encoded_string = base58.b58encode(unencoded_string)
print(encoded_string)
The output is:
bSLesHPiFV9jKNeNbUiMyZGJm45zVSB8bSdogLWCmvs88wxHjEQituLz5daEGCrHE7R7
According to the technical background for creating Bitcoin addresses, the address for the RIPEMD-160 hash above should be "16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM". So my output is wrong and obviously too long. Does anyone know what I did wrong?
EDIT:
I added a decoding to hex (.decode("hex")):
import base58
unencoded_string = "00010966776006953D5567439E5E39F86A0D273BEED61967F6"
encoded_string = base58.b58encode(unencoded_string.decode("hex"))
print(encoded_string)
The output looks better now:
1csU3KSAQMEYLPudM8UWJVxFfptcZSDvaYY477
Yet, it is still wrong. Does it have to be a byte encoding? How do you do that in Python?
EDIT2:
Fixed it now (thanks to Arpegius). Added str(bytearray.fromhex( hexstring )) to my code (in Python 2.7):
import base58
hexstring= "00010966776006953D5567439E5E39F86A0D273BEED61967F6"
unencoded_string = str(bytearray.fromhex( hexstring ))
encoded_string= base58.b58encode(unencoded_string)
print(encoded_string)
Output:
16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM

base58.b58encode needs a bytes object (a str in Python 2), not a hex string. You need to decode the hex first:
In [1]: import base58
In [2]: hexstring= "00010966776006953D5567439E5E39F86A0D273BEED61967F6"
In [3]: unencoded_string = bytes.fromhex(hexstring)
In [4]: encoded_string= base58.b58encode(unencoded_string)
In [5]: print(encoded_string)
16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM
In python 2.7 you can use str(bytearray.fromhex( hexstring )).
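To make the bytes-versus-hex distinction concrete, here is a minimal sketch of Base58Check using only the standard library. It assumes the usual Bitcoin scheme (version byte, payload, then the first four bytes of a double SHA-256 as checksum); the helper names b58encode_sketch and base58check_sketch are made up for illustration and are not part of the base58 package.

```python
import hashlib

# Bitcoin's Base58 alphabet (no 0, O, I, l)
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode_sketch(data: bytes) -> str:
    """Encode raw bytes as Base58 (hypothetical helper, not the base58 package)."""
    num = int.from_bytes(data, "big")
    digits = []
    while num > 0:
        num, rem = divmod(num, 58)
        digits.append(ALPHABET[rem])
    # Each leading zero byte is represented by a leading '1'
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def base58check_sketch(version: bytes, payload: bytes) -> str:
    """Append a 4-byte double-SHA-256 checksum, then Base58-encode."""
    body = version + payload
    checksum = hashlib.sha256(hashlib.sha256(body).digest()).digest()[:4]
    return b58encode_sketch(body + checksum)


# The 25 bytes from the question: version + hash + checksum, given as hex
raw = bytes.fromhex("00010966776006953D5567439E5E39F86A0D273BEED61967F6")
address = b58encode_sketch(raw)
print(address)
```

Passing the 50-character hex text itself would encode 50 ASCII bytes instead of 25 raw bytes, which is why the first attempt in the question came out too long.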


How to convert utf-8 characters to "normal" characters in string in python3.10?

I have raw data that looks like this:
25023,Zwerg+M%C3%BCtze,0,1,986,3780
25871,red+earth,0,1,38,8349
25931,K4m%21k4z3,90,1,1539,2530
It is saved as a .txt file: https://de205.die-staemme.de/map/player.txt
The "characters" starting with % are unicode, as far as I can tell.
I found the following table about it: https://www.i18nqa.com/debug/utf8-debug.html
Here is my code so far:
import urllib.request
urllib.request.urlretrieve(url, pfad + "player.txt")
f = open(pfad + "player.txt", "r", encoding="utf-8")
raw = f.read().split("\n")
f.close()
Python does not convert the %-characters. They are read as if they were separate characters.
Is there a way to convert these characters without calling .replace like 200 times?
Thank you very much in advance for help and/or useful hints!
The %s are URL-encoding; use urllib.parse.unquote to decode the string.
>>> raw = """25023,Zwerg+M%C3%BCtze,0,1,986,3780
... 25871,red+earth,0,1,38,8349
... 25931,K4m%21k4z3,90,1,1539,2530"""
>>> import urllib.parse
>>> print(urllib.parse.unquote(raw))
25023,Zwerg+Mütze,0,1,986,3780
25871,red+earth,0,1,38,8349
25931,K4m!k4z3,90,1,1539,2530
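Note that unquote leaves the + signs untouched. If the + stands for a space in this data (an assumption about the file format, which the question does not confirm), urllib.parse.unquote_plus handles both in one step. A small sketch on one line of the sample data:

```python
from urllib.parse import unquote, unquote_plus

line = "25023,Zwerg+M%C3%BCtze,0,1,986,3780"

# Split first, then decode each field, so a decoded comma (%2C)
# inside a name cannot break the column layout.
fields = line.split(",")
name_raw = fields[1]

name_unquoted = unquote(name_raw)    # percent escapes decoded, '+' kept
name_plus = unquote_plus(name_raw)   # percent escapes decoded, '+' becomes a space

print(name_unquoted)
print(name_plus)
```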

How to decode a string representation of a bytes object?

I have a string which includes encoded bytes inside it:
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
I want to decode it, but I can't since it has become a string. Therefore I want to ask whether there is any way I can convert it into
str2 = b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
Here str2 is a bytes object which I can decode easily using
str2.decode('utf-8')
to get the final result:
'Output file 문항분석.xlsx Created'
You could use ast.literal_eval:
>>> print(str1)
b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
>>> type(str1)
<class 'str'>
>>> from ast import literal_eval
>>> literal_eval(str1).decode('utf-8')
'Output file 문항분석.xlsx Created'
Based on the SyntaxError mentioned in your comments, you may have a testing issue when printing, because stdout in your console is set to ASCII (and your console may not support some of the characters you are trying to print). You can try something like the following to set sys.stdout to UTF-8 and see what your console prints. It uses a string slice and encode to get the bytes rather than the ast.literal_eval approach that has already been suggested:
import codecs
import sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)
s = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
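# encode as Latin-1 to recover the original byte values, then decode those bytes as UTF-8
b = s[2:-1].encode('latin1').decode('utf-8')
print(b)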
A simple way is to assume that all the characters of the initial string are in the [0,256) range and each stands for the byte with the same value, which means that the string can be encoded as Latin-1.
The conversion is then trivial:
str1[2:-1].encode('Latin1').decode('utf8')
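Which of the approaches above applies depends on what the string really contains. The sketch below contrasts the two cases on a short made-up value (the text "café" is an illustration, not the data from the question): a string holding literal backslash escapes, and a string whose escapes were already interpreted into code points below 256.

```python
import ast

# Case 1: the text contains literal backslashes, e.g. repr(bytes) written to a file.
escaped = r"b'caf\xc3\xa9'"
from_escapes = ast.literal_eval(escaped).decode("utf-8")

# Case 2: the escapes were already interpreted, so each character is a
# code point below 256 standing in for one byte.
interpreted = "b'caf\xc3\xa9'"
from_code_points = interpreted[2:-1].encode("latin1").decode("utf-8")

print(from_escapes, from_code_points)
```

Checking len() of the string is a quick way to tell the cases apart: the escaped form is longer because every byte takes four characters.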
I finally found an answer that uses a function to cast a string to bytes without encoding it. Given the string
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
I take only the actual encoded text inside it
str1[2:-1]
and pass it to a function that converts the string to bytes without encoding its values:
import struct
def rawbytes(s):
    """Convert a string to raw bytes without encoding"""
    outlist = []
    for cp in s:
        num = ord(cp)
        if num < 256:
            # code point fits in a single byte
            outlist.append(struct.pack('B', num))
        elif num < 65536:
            # code point fits in two bytes
            outlist.append(struct.pack('>H', num))
        else:
            b = (num & 0xFF0000) >> 16
            H = num & 0xFFFF
            outlist.append(struct.pack('>bH', b, H))
    return b''.join(outlist)
Calling the function converts the text to bytes, which can then be decoded:
rawbytes(str1[2:-1]).decode('utf-8')
This gives the correct output:
'Output file 문항분석.xlsx Created'

Python adds extra to crypt result

I'm trying to create an API with a token to communicate between a Raspberry Pi and a web server. Right now I'm trying to generate a token with Python.
from Crypto.Cipher import AES
import base64
import os
import time
import datetime
import requests
BLOCK_SIZE = 32
BLOCK_SZ = 14
#!/usr/bin/python
salt = "123456789123" # Make sure the salt always has the same length! (12 chars)
iv = "1234567891234567" # Make sure the IV always has the same length! (16 chars)
currentDate = time.strftime("%d%m%Y")
currentTime = time.strftime("%H%M")
PADDING = '{'
pad = lambda s: s + (BLOCK_SIZE - len(s) % BLOCK_SIZE) * PADDING
EncodeAES = lambda c, s: base64.b64encode(c.encrypt(pad(s)))
DecodeAES = lambda c, e: c.decrypt(base64.b64decode(e)).rstrip(PADDING)
secret = salt + currentTime
cipher=AES.new(key=secret,mode=AES.MODE_CBC,IV=iv)
encode = currentDate
encoded = EncodeAES(cipher, encode)
print (encoded)
The problem is that the script's output has an extra b' in front of every encoded string, and a ' at the end:
C:\Python36-32>python.exe encrypt.py
b'Qge6lbC+SulFgTk/7TZ0TKHUP0SFS8G+nd5un4iv9iI='
C:\Python36-32>python.exe encrypt.py
b'DTcotcaU98QkRxCzRR01hh4yqqyC92u4oAuf0bSrQZQ='
Hopefully someone can explain what went wrong.
FIXED!
I was able to fix it by decoding the result as UTF-8.
sendtoken = encoded.decode('utf-8')
You are running Python 3.6, where str is Unicode text. I expect that EncodeAES() returns a bytes object holding ASCII characters (base64.b64encode returns bytes in Python 3), and Python marks it as a bytestring rather than a Unicode string by prepending b to the literal it prints.
You could strip the b out of the output post-Python, or you could print(str(encoded)), which should give you the same characters, since ASCII is valid UTF-8.
EDIT:
What you need to do is decode the bytestring, as mentioned in the answer and in a comment above. I was wrong about str() doing the conversion for you; you need to call decode('UTF-8') on the bytestring you wish to print. That converts the bytes into a str, which then prints without the b'...' wrapper.
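A short sketch of the difference, using a made-up payload rather than the AES output from the question:

```python
import base64

token = base64.b64encode(b"example payload")  # bytes in Python 3

as_repr = str(token)            # str() on bytes gives the repr, b'...' included
as_text = token.decode("ascii") # decode() gives the plain text

print(as_repr)
print(as_text)
```

Base64 output only ever contains ASCII characters, so decoding with "ascii" or "utf-8" gives the same result here.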

How to decode base64 in python3

I have a base64-encoded string, and I can't decode it in Python 3.5:
import base64
code = "YWRtaW46MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA" # The decoded value should contain 202cb962ac59075b964b07152d234b70
base64.b64decode(code)
Result:
binascii.Error: Incorrect padding
But a website (base64decode) can decode the same string.
Can anybody tell me why, and how to decode it with Python 3.5?
Thanks
Base64 needs a string whose length is a multiple of 4. If the string is too short, it is padded with one or two = characters.
import base64
code = "YWRtaW46MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA="
base64.b64decode(code)
# b'admin:202cb962ac59075b964b07152d234b70'
According to this answer, you can just add the required padding.
code = "YWRtaW46MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA"
b64_string = code
b64_string += "=" * ((4 - len(b64_string) % 4) % 4)
base64.b64decode(b64_string) # b'admin:202cb962ac59075b964b07152d234b70'
I tried the other way around. If you know what the unencoded value is:
>>> import base64
>>> unencoded = b'202cb962ac59075b964b07152d234b70'
>>> encoded = base64.b64encode(unencoded)
>>> print(encoded)
b'MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA='
>>> decoded = base64.b64decode(encoded)
>>> print(decoded)
b'202cb962ac59075b964b07152d234b70'
Now you see the correct padding: b'MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA='
It actually seems that code is simply incorrectly padded (code is incomplete):
import base64
code = "YWRtaW46MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA"
base64.b64decode(code+"=")
returns b'admin:202cb962ac59075b964b07152d234b70'
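The padding fix from these answers can be wrapped in a small helper. This is only a sketch: the name b64decode_unpadded is hypothetical, and it assumes the only defect in the input is missing trailing = characters.

```python
import base64


def b64decode_unpadded(data: str) -> bytes:
    """Decode Base64 text whose trailing '=' padding may have been stripped."""
    # -len(data) % 4 is the number of characters missing to reach a multiple of 4
    missing = -len(data) % 4
    return base64.b64decode(data + "=" * missing)


code = "YWRtaW46MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA"
decoded = b64decode_unpadded(code)
print(decoded)
```

The string from the question has 51 characters, so exactly one = is missing. Input that is already padded passes through unchanged because missing is then 0.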

bz2 decompress with Python 3.4 - TypeError: 'str' does not support the buffer interface

There are similar errors but I could not find a solution for bz2.
The following program fails on the decompress:
import bz2
un = 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
pw = 'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'
decoded_un = bz2.decompress(un)
decoded_pw = bz2.decompress(pw)
print(decoded_un)
print(decoded_pw)
I tried using bytes(un, 'UTF-8') but that would not work. I think I did not have this problem in Python 3.3.
EDIT: This was for the Python Challenge. I have two bits of code which work, thanks to Martijn:
import bz2
un_saved = 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
pw_saved = 'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'
print(bz2.decompress(un_saved.encode('latin1')))
print(bz2.decompress(pw_saved.encode('latin1')))
This one works from the webpage:
# http://www.pythonchallenge.com/pc/def/integrity.html
import urllib.request
import re
import os.path
import bz2
fname = "008.html"
if not os.path.isfile(fname):
    url = 'http://www.pythonchallenge.com/pc/def/integrity.html'
    response = urllib.request.urlopen(url)
    webpage = response.read().decode("utf-8")
    with open(fname, "w") as fh:
        fh.write(webpage)
with open(fname, "r") as fh:
    webpage = fh.read()
re_un = '\\nun: \'(.*)\'\\n'
m = re.search(re_un, webpage)
un = m.group(1)
print(un)
pw_un = '\\npw: \'(.*)\'\\n'
m = re.search(pw_un, webpage)
pw = m.group(1)
print(pw)
unde = un.encode('latin-1').decode('unicode_escape').encode('latin1')
pwde = pw.encode('latin-1').decode('unicode_escape').encode('latin1')
decoded_un = bz2.decompress(unde)
decoded_pw = bz2.decompress(pwde)
print(decoded_un)
print(decoded_pw)
The bz2 library deals with bytes objects, not strings:
un = b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
pw = b'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'
In other words, using bytes() works just fine, as long as you use the correct encoding. UTF-8 is not that encoding; if you have bytes masquerading as string character codepoints, use Latin-1 to encode instead. Latin-1 maps characters one-to-one to bytes:
un = un.encode('latin1')
or
un = bytes(un, 'latin1')
Also see the Python Unicode HOWTO:
Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code points 0–255 are identical to the Latin-1 values, so converting to this encoding simply requires converting code points to byte values; if a code point larger than 255 is encountered, the string can’t be encoded into Latin-1.
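The one-to-one mapping can be checked directly. The sketch below uses all 256 byte values as test data (an illustration, not the challenge data) and shows that Latin-1 round-trips them while UTF-8 does not:

```python
data = bytes(range(256))        # every possible byte value once

text = data.decode("latin1")    # one character per byte, code points 0..255
via_latin1 = text.encode("latin1")
via_utf8 = text.encode("utf-8")

# Latin-1 restores the original bytes; UTF-8 expands every code point
# above 127 into two bytes, so the result no longer matches.
print(via_latin1 == data, len(via_utf8))
```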
I'll leave the decoding to you. Have fun with the Python Challenge!
Note that if you loaded these characters as they are from a webpage, they will not be ready-made bytes! You'll have the characters '\', 'x', '8' and '2' rather than a codepoint with hex value 82. You'd need to interpret those sequences as a Python string literal first:
>>> un = r'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
>>> un
'BZh91AY&SYA\\xaf\\x82\\r\\x00\\x00\\x01\\x01\\x80\\x02\\xc0\\x02\\x00 \\x00!\\x9ah3M\\x07<]\\xc9\\x14\\xe1BA\\x06\\xbe\\x084'
>>> un.encode('latin-1').decode('unicode_escape')
'BZh91AY&SYA¯\x82\r\x00\x00\x01\x01\x80\x02À\x02\x00 \x00!\x9ah3M\x07<]É\x14áBA\x06¾\x084'
>>> un.encode('latin-1').decode('unicode_escape').encode('latin1')
b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
Note the double backslashes in the representation of un. Only the last result, a bytes object, can then be decompressed!
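The whole chain can be tried without the challenge page. In this sketch the payload b"secret" and the variable names are made up; repr() is used to simulate the escaped text as it would appear in a page source.

```python
import bz2

payload = bz2.compress(b"secret")

# Simulate what a webpage shows: the bytes literal as text with backslash escapes,
# minus the leading b' and the trailing quote.
page_text = repr(payload)[2:-1]

# Interpret the escapes, then map the resulting code points back to bytes.
recovered = (
    page_text.encode("latin-1")
    .decode("unicode_escape")
    .encode("latin-1")
)

print(bz2.decompress(recovered))
```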
