Filename has weird encoding when saving image - python
I'm webscraping, and when I'm saving the image, it seems like the encoding is changed when saving the file. For instance, in the filename, 'é' becomes '%c3%a9'. I'm capable of catching all these changes with this function:
def unify_filename(string):
    return string.lower().strip().replace(',', '%2c').replace('%', '%25').replace('ô', '%b3%b4'). \
replace('é', '%c3%a9').replace('++', '+').replace('è', '%c3%a8').replace('î', '%c3%ae'). \
replace('#', '%23').replace(';', '%3b').replace('%2b%2b', '%2b'). \
replace('&', '%26').replace('+', '%2b').replace(' ', '+').replace('[', '%5b'). \
replace(']', '%5d').replace('%2b', '+').replace('%40', '#').replace('®', '%c2%ae'). \
replace('%7e', '').replace('~', '').replace('%27', '').replace('©', '%c2%a9'). \
replace("'", '').replace('ô', '%b3%b4').replace('+', ' ').replace('^', '%5e'). \
replace('$', '%24').replace(' ', ' ').replace('`', '%60').replace('’', '%e2%80%99')
Is there an easier way? Is this some encoding I don't know?
These are URL percent-encoded characters (the UTF-8 bytes of each character written as %XX). You can decode them with the unquote() function from urllib.parse.
Use the following piece of code:
import urllib.parse
print(urllib.parse.unquote('%c3%a9', encoding='utf-8'))
Output:
é
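As a slightly fuller sketch of the same idea, the helpers below replace a hand-written chain of replace() calls with the standard library functions. The helper names and the sample filename are made up for illustration; whether '+' should be treated as a space depends on how the site encodes its filenames, so that part is an assumption.

```python
from urllib.parse import quote, unquote, unquote_plus


def decode_filename(name, plus_is_space=False):
    """Turn percent-escapes such as '%c3%a9' back into characters.

    plus_is_space is a hypothetical switch: enable it only if the
    site encodes spaces as '+' (form-style encoding).
    """
    if plus_is_space:
        return unquote_plus(name, encoding="utf-8")
    return unquote(name, encoding="utf-8")


def encode_filename(name):
    """Percent-encode every reserved or non-ASCII character."""
    return quote(name, safe="", encoding="utf-8")


# Hypothetical filename, used only to show the round trip.
sample = "caf%c3%a9+menu%2c+2020.jpg"
print(decode_filename(sample))                      # café+menu,+2020.jpg
print(decode_filename(sample, plus_is_space=True))  # café menu, 2020.jpg
print(encode_filename("é"))                         # %C3%A9
```

Note that quote() emits uppercase hex digits, so compare names case-insensitively (or decode both sides) if the scraped names use lowercase escapes.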
VIM command to insert multiline text with argument
I'm a new Vim user, and I'm trying to make creating Python properties easier for my class definitions. What I would like is that, say, I type
:pyp x
and Vim autofills, where my cursor is,
@property
def x(self):
    return self.x
@property.setter
def x(self,val):
    self._x = val
or, more abstractly, I type :pyp <property_name> and Vim fills in
@property
def <property_name>(self):
    return self.<property_name>
@property.setter
def <property_name>(self,val):
    self._<property_name> = val
I've looked at a few posts and the wikis on functions and macros, but I'm very unsure of how to go about it or what to even look up, as I am a brand new Vim user, less than a week old. I tried using [this][1] as an example in my .vimrc, but I couldn't even get that to work.
Edit: The code I am currently trying is
function! PythonProperty(prop_name)
    let cur_line = line('.')
    let num_spaces = indent('.')
    let spaces = repeat(' ',num_spaces)
    let lines = [ spaces."@property",
                \ spaces."def ".prop_name."(self):",
                \ spaces."    return self.".property,
                \ spaces."@property.setter",
                \ spaces."def".prop_name."(self,val)",
                \ spaces."    self._".prop_name." = val" ]
    call append(cur_line,lines)
endfunction
and I am getting the error
E121: Undefined variable: prop_name
I am typing `:call PythonProperty("x")`
[1]: https://vi.stackexchange.com/questions/9644/how-to-use-a-variable-in-the-expression-of-a-normal-command
E121: Undefined variable: prop_name
In VimScript, variables have scopes. The scope for function arguments is a:, while the default inside a function is l: (local variable). So the error means that l:prop_name was never defined; the argument has to be referenced as a:prop_name. Here is how I would do this:
function! s:insert_pyp(property)
    let l:indent = repeat(' ', indent('.'))
    let l:text = [
        \ '@property',
        \ 'def <TMPL>(self):',
        \ '    return self.<TMPL>',
        \ '@property.setter',
        \ 'def <TMPL>(self,val):',
        \ '    self._<TMPL> = val'
        \ ]
    call map(l:text, {k, v -> l:indent . substitute(v, '\C<TMPL>', a:property, 'g')})
    call append('.', l:text)
endfunction
command! -nargs=1 Pyp :call <SID>insert_pyp(<q-args>)
Alternatively, we can simulate actual key presses (note that we don't need to put indents in the template anymore; hopefully, the current buffer has set ft=python):
function! s:insert_pyp2(property)
    let l:text = [
        \ '@property',
        \ 'def <TMPL>(self):',
        \ 'return self.<TMPL>',
        \ '@property.setter',
        \ 'def <TMPL>(self,val):',
        \ 'self._<TMPL> = val'
        \ ]
    execute "normal! o" . substitute(join(l:text, "\n"), '\C<TMPL>', a:property, 'g') . "\<Esc>"
endfunction
command! -nargs=1 Pyp2 :call <SID>insert_pyp2(<q-args>)
It's very difficult, if not impossible, to get plugins. I suggest you watch this video on YouTube. In fact, many Vim plugins are just overkill.
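The template-substitution idea used in this answer (indent each template line and replace a placeholder with the property name) can be sketched in Python as well. This is only an illustration of the logic, not VimScript; the names TEMPLATE and render_property are invented here. One deliberate difference, stated as an assumption about the intended result: the sketch emits @<name>.setter and stores the value in self._<name>, because that is the form Python itself accepts for a property setter.

```python
# Placeholder-based template, one entry per generated line.
TEMPLATE = [
    "@property",
    "def <TMPL>(self):",
    "    return self._<TMPL>",
    "@<TMPL>.setter",
    "def <TMPL>(self, val):",
    "    self._<TMPL> = val",
]


def render_property(name, indent=0):
    """Return the template lines for `name`, each prefixed by `indent` spaces."""
    pad = " " * indent
    return [pad + line.replace("<TMPL>", name) for line in TEMPLATE]


# Show what would be inserted for a property called "x" inside a class body.
print("\n".join(render_property("x", indent=4)))
```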
How to use a string for icon bitmap?
Is there a way to use a string for the iconbitmap in the Tkinter (Python 2.7.9) module? I know that you can provide a file path (even though I haven't understood what the difference between default and bitmap as parameters is). The reason I am asking is that I want to create an .exe from a Python script with py2exe (which works), but I would then need to create an icon file to be able to use an icon. Any workaround or other method is appreciated.
(Note to folks using Python 3, see my supplemental answer for an alternative that only works in that version.) I don't know of any way to pass iconbitmap() anything other than a file path in Python 2.x, so here's a workaround that creates a temporary file from a string representation of icon file's contents to pass it. It also shows a way to ensure the temporary file gets deleted. import atexit import binascii import os import tempfile try: import Tkinter as tk except ModuleNotFoundError: # Python 3 import tkinter as tk iconhexdata = '00000100010010100000010018006803000016000000280000001000000020' \ '000000010018000000000040030000130b0000130b00000000000000000000' \ 'ffffff6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c' \ '6c6d6c6c6d6c6c6d6c6c6d6c6c6dffffffffffff6c6c6d6c6c6d6c6c6d6c6c' \ '6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d' \ 'ffffffffffff6c6c6d6c6c6dffffffffffffffffffffffffffffffffffffff' \ 'ffffffffffffffffffffff6c6c6d6c6c6dffffffffffff6c6c6d6c6c6dffff' \ 'ff6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6d6c6c6dffffff6c6c6d' \ '6c6c6dffffffffffff6c6c6d6c6c6dffffff6c6c6d6c6c6d6c6c6d6c6c6d6c' \ '6c6d6c6c6d6c6c6d6c6c6dffffff6c6c6d6c6c6dffffffffffff6c6c6d6c6c' \ '6dfffffffffffffffffffffffffffffff2f4f7d6dfe9b8cadb95b2cfedf2f6' \ '6c6c6d6c6c6dfffffffffffffffffffffffff4f7fac0d4e69bb9d6739dc657' \ '89ba3e78b03f78af4177ad4276abd2deeaffffffffffffffffffffffffffff' \ 'ffffffffdfe9f24178ad4178ad4178ad5081b17398be9db8d3bed4e6bbd7ec' \ 'add7f3fffffffffffffffffffffffffffffffffffff8fafcaac2dac4d3e4df' \ 'e8f1f9fbfdfffffff4fafd91cff520a3f10297eee4f4feffffffffffffffff' \ 'ffffffffffffffffffffffffffffffffffffffe7f4fd7fcaf6159def0595ec' \ '179fec82c7f4bad6f7fdfefffffffffffffffffffffffffffffffffdfeffdb' \ 'f0fd7bc8f6119bed0695eb1a9ded7ecaf5f0f8febfd3f73165e495b1f1ffff' \ 'fffffffffffffffffffffffffff6fbfe2fa6ee0695eb1b9eed86ccf5e8f6fd' \ 'ffffffd2dff93468e5326ae5c7d6f8ffffffffffffffffffffffffffffffff' \ 
'ffff96d2f784cbf5eaf6fdffffffffffffe3eafb4275e72c66e4b6caf6ffff' \ 'ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff' \ 'f3f6fd5784ea2c66e499b5f2ffffffffffffffffffffffffffffffffffffff' \ 'fffffffffffffffffffffffffffffdfeff7097ed2c66e47a9eeeffffffffff' \ 'fffffffffffffffffffffffffffffffffffffffffffffffffffffffffdfeff' \ '85a7ef2c66e4608cebf9fbfeffffffffffffffffffffffff00000000000000' \ '00000000000000000000000000000000000000000000000000000000000000' \ '0000000000000000000000000000000000000000000000000000' def on_closing(iconfile): try: os.remove(iconfile.name) except Exception: pass with tempfile.NamedTemporaryFile(delete=False) as iconfile: iconfile.write(binascii.a2b_hex(iconhexdata)) # Register a clean-up function. atexit.register(lambda file=iconfile: on_closing(file)) root = tk.Tk() root.title('stackoverflow!') root.iconbitmap(iconfile.name) tk.Label(root, text='Note the custom icon').pack() tk.Button(root, text='OK', bg='lightgreen', command=root.quit).pack() root.mainloop() The window displayed will have a custom icon as shown below: You didn't ask how to do it, but here's the code I used to convert the original .ico file into the Python string variable used in my example: from __future__ import print_function import binascii try: from itertools import izip_longest as zip_longest except ImportError: from itertools import zip_longest iconfile = 'stackoverflow.ico' # Path to icon file. VAR_NAME = 'iconhexdata' VAR_SUFFIX = ' = ' INDENTATION = ' ' * len(VAR_NAME+VAR_SUFFIX) MAX_LINE_LENGTH = 80 EXTRA_CHARS = '"" \\' # That get added to each group of hex digits. LINE_LENGTH = MAX_LINE_LENGTH - len(INDENTATION) - len(EXTRA_CHARS) def grouper(chunk_size, iterable): """ Collect data into fixed-length chunks or blocks. s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ... 
""" return zip_longest(*[iter(iterable)]*chunk_size, fillvalue='') with open(iconfile, 'rb') as imgfile: hexstr = [chr(x) for x in bytearray(binascii.b2a_hex(imgfile.read()))] hexlines = (''.join(str(x) for x in group) for group in grouper(LINE_LENGTH, hexstr)) print(VAR_NAME + VAR_SUFFIX, end='') print((' \\\n' + INDENTATION).join((repr(line) for line in hexlines)))
This doesn't answer your Python 2.7 question, but may be of interest to others nowadays who are using Python 3. In the later version of Python the tkinter module has an alternative to the iconbitmap() function—named iconphoto()—that can be used to set the title-bar icon of any tkinter/toplevel window. However, unlike the former method, the image should be the instance of the tkinter.PhotoImage class, not only a file path string — and there are ways to create a PhotoImage object from a byte string in the program so there would be no need to use a file at all, even a temporary one as was the case in my original answer. #!/usr/bin/env python3 iconimgdata = b'iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAAAGXRFWHRTb2Z0d' \ b'2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAXpJREFUeNpi+I8V/Pv7fnnrkz' \ b'Sd1z0J/37/RJZhYsAKGJkEwis4DV1+3jrzYXEjsgwODRA98c1sctrfTmz6vG0' \ b'WQhzNLd8vTft6uuXvt1cQ7u9Xj5+XOgHd9u3UNogIioZ/v7+9W6b/eirb23nS' \ b'X8+0/f32Aij48/6lpxkmT7OMflw7ju4HRmZ2fq813MalDH+/fTvZ8GG50bfT9' \ b'aySckLZ3f9///95+xxIDcgWDPDv64uf12b8vDzt748PDFxCHBrZzPwWHBrOQI' \ b'8hNPz9/fPeiU1cglK8otI8wlJMLGz/fn/9cXXunyv9f788Eoh9xMgtDVTGAjf' \ b'12/vnl7dNh7BZOPl5xZVFFbSEZXTZTGazM3yCqEZx0u8fX9/cPfPh6e0PT258' \ b'efMEqP8/A+O//0z//jPaZ0wQVdRFaMjLzQWyJk2ejOyNH18/f3r95NPrR19e3' \ b'FV3iCivqoeoYUFWBNGJCSb5ChER0zgAig1oriKgAZd70ADJTgIIMACVtvtL6F' \ b'X2cAAAAABJRU5ErkJggg==' import base64 import tkinter as tk # Python 3 root = tk.Tk() root.title('stackoverflow!') root.geometry('225x50') img = base64.b64decode(iconimgdata) photo = tk.PhotoImage(data=img) root.iconphoto(False, photo) tk.Label(root, text='Note the custom icon').pack() tk.Button(root, text='OK', bg='lightgreen', command=root.quit).pack() root.mainloop() Here's what the demo looks like running: Here again is the code I used to convert the 16x16 pixel .png image file I had into the Python bytestring variable used in the code in my example above. Here's a link to the small stackoverflow.png image file being used. 
#!/usr/bin/env python3
import base64
from itertools import zip_longest

imgfilepath = 'stackoverflow.png'  # Path to an image file.
VAR_NAME = 'iconimgdata'
VAR_SUFFIX = ' = '
INDENTATION = ' ' * len(VAR_NAME+VAR_SUFFIX)
MAX_LINE_LENGTH = 79
EXTRA_CHARS = '"" \\'  # That get added to each group of hex digits.
LINE_LENGTH = MAX_LINE_LENGTH - len(INDENTATION) - len(EXTRA_CHARS)

def grouper(chunk_size, iterable):
    """ Collect data into fixed-length chunks or blocks.
        s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ...
    """
    return zip_longest(*[iter(iterable)]*chunk_size, fillvalue='')

with open(imgfilepath, 'rb') as file:
    hexstr = [chr(x) for x in base64.b64encode(file.read())]
    hexlines = (''.join(str(x) for x in group) for group in grouper(LINE_LENGTH, hexstr))

print(VAR_NAME + VAR_SUFFIX + 'b', end='')
print((' \\\n' + INDENTATION + 'b').join((repr(line) for line in hexlines)))
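For comparison, here is a shorter sketch of the same conversion step that slices the Base64 text into fixed-width chunks and emits a parenthesized bytes literal instead of backslash continuations. The function name bytes_literal_source, its parameters, and the default width are assumptions made for this example, not part of the answer above.

```python
import base64


def bytes_literal_source(data, var_name="iconimgdata", width=60):
    """Return Python source that assigns the Base64 of `data` to `var_name`."""
    encoded = base64.b64encode(data).decode("ascii")
    # Fall back to one empty chunk so empty input still yields b'' and not ().
    chunks = [encoded[i:i + width] for i in range(0, len(encoded), width)] or [""]
    body = "\n".join("    b'{}'".format(chunk) for chunk in chunks)
    return "{} = (\n{}\n)\n".format(var_name, body)


# Demonstration with stand-in bytes; a real run would read the image file instead.
print(bytes_literal_source(bytes(range(64)), var_name="demo", width=40))
```

The generated variable can then be passed through base64.b64decode() before handing it to tk.PhotoImage(data=...), as in the demo above.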
How do I encode properly a possibly chinese encoding in python?
I am scraping the following link: http://www.footballcornersta.com/en/league.php?select=all&league=%E8%8B%B1%E8%B6%85&year=2014&month=1&Submit=Submit and the following string contains all the available options in a menu relevant to league: ls_main = [['E','ENG PR','英超'],['E','ENG FAC','英足总杯'],['E','ENG Champ','英冠'],['E','ENG D1','英甲'],['I','ITA D1','意甲'],['I','ITA D2','意乙'],['S','SPA D1','西甲'],['S','SPA D2','西乙'],['G','GER D1','德甲'],['G','GER D2','德乙'],['F','FRA D1','法甲'],['F','FRA D2','法乙'],['S','SCO PR','苏超'],['R','RUS PR','俄超'],['T','TUR PR','土超'],['B','BRA D1','巴西甲'],['U','USA MLS','美职联'],['A','ARG D1','阿根甲'],['J','JP D1','日职业'],['J','JP D2','日职乙'],['A','AUS D1','澳A联'],['K','KOR D1','韩K联'],['C','CHN PR','中超'],['E','EURO Cup','欧洲杯'],['I','Italy Supe','意超杯'],['K','KOR K3','K3联'],['C','CHN D1','中甲'],['D','DEN D2-E','丹乙东'],['D','DEN D2-W','丹乙西'],['D','DEN D1','丹甲'],['D','DEN PR','丹超'],['U','UKR U21','乌克兰U21'],['U','UD2','乌克甲'],['U','UKR D1','乌克超'],['U','Uzber D1','乌兹超'],['U','URU D1','乌拉甲'],['U','UZB D2','乌茲甲'],['I','ISR D2','以色列乙'],['I','ISR D1','以色列甲'],['I','ISR PR','以色列超'],['I','Iraq L','伊拉联'],['I','Ira D1','伊朗甲'],['I','IRA P','伊朗联'],['R','RUS D2C','俄乙中'],['R','RUS D2U','俄乙乌'],['R','RUS D2S','俄乙南'],['R','RUS D2W','俄乙西'],['R','RUS RL','俄后赛'],['R','RUS D1','俄甲'],['R','RUS PR','俄超'],['B','BUL D1','保甲'],['C','CRO D1','克甲'],['I','ICE PR','冰岛超'],['G','GHA PL','加纳超'],['H','Hun U19','匈U19'],['H','HUN D2E','匈乙东'],['H','HUN D2W','匈乙西'],['H','HUN D1','匈甲'],['N','NIR IFAC','北爱冠'],['N','NIRE PR','北爱超'],['S','SAfrica D1','南非甲'],['S','SAfrica NSLP','南非超'],['L','LUX D1','卢森甲'],['I','IDN PR','印尼超'],['I','IND D1','印度甲'],['G','GUAT D1','危地甲'],['E','ECU D1','厄甲'],['F','Friendly','友谊赛'],['K','KAZ D1','哈萨超'],['C','COL D2','哥伦乙'],['C','COL C','哥伦杯'],['C','COL D1','哥伦甲'],['C','COS D1','哥斯甲'],['T','TUR U23','土A2青'],['T','TUR D3L1','土丙1'],['T','TUR D3L2','土丙2'],['T','TUR D3L3','土丙3'],['T','TUR2BK','土乙白'],['T','TUR2BB','土乙红'],['T','TUR D1','土甲'],['E','EGY PR','埃及超'],['S','Serbia 
D2','塞尔乙'],['S','Serbia 1','塞尔联'],['C','CYP D2','塞浦乙'],['C','CYP D1','塞浦甲'],['M','MEX U20','墨西U20'],['M','Mex D2','墨西乙'],['M','MEX D1','墨西联'],['A','AUT D3E','奥丙东'],['A','AUT D3C','奥丙中'],['A','AUT D3W','奥丙西'],['A','AUT D2','奥乙'],['A','AUT D1','奥甲'],['V','VEN D1','委超'],['W','WAL D2','威甲'],['W','WAL D2CA','威联盟'],['W','WAL D1','威超'],['A','Ang D1','安哥甲'],['N','NIG P','尼日超'],['P','PAR D1','巴拉甲'],['B','BRA D2','巴西乙'],['B','BRA CP','巴锦赛'],['G','GRE D3N','希丙北'],['G','GRE D3S','希丙南'],['G','GRE D2','希乙'],['G','GRE D1','希甲'],['G','GER U17','德U17'],['G','GER U19','德U19'],['G','GER D3','德丙'],['G','GER RN','德北联'],['G','GER RS','德南联'],['G','GER RW','德西联'],['I','ITA D3A','意丙A'],['I','ITA D3B','意丙B'],['I','ITA D3C1','意丙C1'],['I','ITA D3C2','意丙C2'],['I','ITA CP U20','意青U20'],['E','EST D3','愛沙丙'],['N','NOR D2-A','挪乙A'],['N','NOR D2-B','挪乙B'],['N','NOR D2-C','挪乙C'],['N','NOR D2-D','挪乙D'],['N','NORC','挪威杯'],['N','NOR D1','挪甲'],['N','NOR PR','挪超'],['C','CZE D3','捷丙'],['C','CZE MSFL','捷丙M'],['C','CZE D2','捷乙'],['C','CZE U19','捷克U19'],['C','CZE D1','捷克甲'],['M','Mol D2','摩尔乙'],['M','MOL D1','摩尔甲'],['M','MOR D2','摩洛哥乙'],['M','MOR D1','摩洛超'],['S','Slovakia D3E','斯丙東'],['S','Slovakia D3W','斯丙西'],['S','Slovakia D2','斯伐乙'],['S','Slovakia D1','斯伐甲'],['S','Slovenia D1','斯洛甲'],['S','SIN D1','新加联'],['J','JL3','日丙联'],['C','CHI D2','智乙'],['C','CHI D1','智甲'],['G','Geo','格鲁甲'],['G','GEO PR','格鲁超'],['U','UEFA CL','欧冠杯'],['U','UEFA SC','欧霸杯'],['B','BEL D3A','比丙A'],['B','BEL D3B','比丙B'],['B','BEL D2','比乙'],['B','BEL W1','比女甲'],['B','BEL C','比杯'],['B','BEL D1','比甲'],['S','SAU D2','沙地甲'],['S','SAU D1','沙地联'],['F','FRA D4A','法丁A'],['F','FRA D4B','法丁B'],['F','FRA D4C','法丁C'],['F','FRA D4D','法丁D'],['F','FRA D3','法丙'],['F','FRA U19','法国U19'],['F','FRA C','法国杯'],['P','POL D2E','波乙東'],['P','POL D2W','波乙西'],['P','POL D2','波兰乙'],['P','POL D1','波兰甲'],['B','BOS D1','波斯甲'],['P','POL YL','波青联'],['T','THA D1','泰甲'],['T','THA PL','泰超'],['H','HON D1','洪都甲'],['A','Aus BP','澳布超'],['E','EST D1','爱沙甲'],['I','IRE 
D1','爱甲'],['I','IRE PR','爱超'],['B','BOL D1','玻利甲'],['F','Friendly','球会赛'],['S','SWI D1','瑞士甲'],['S','SWI PR','瑞士超'],['S','SWE D2','瑞甲'],['S','SWE D1','瑞超'],['B','BLR D2','白俄甲'],['B','BLR D1','白俄超'],['P','Peru D1','秘鲁甲'],['T','TUN D2','突尼乙'],['T','Tun D1','突尼甲'],['R','ROM D2G1','罗乙1'],['R','ROM D2G2','罗乙2'],['R','ROM D1','罗甲'],['L','LIBERT C','自由杯'],['F','FIN D2','芬甲'],['F','FIN D1','芬超'],['S','SCO D3','苏丙'],['S','SUD PL','苏丹超'],['S','SCO D2','苏乙'],['S','SCO D1','苏甲'],['S','SCO HL','苏高联'],['E','ENG D2','英乙'],['E','ENG RyPR','英依超'],['E','ENG UP','英北超'],['E','ENG SP','英南超'],['E','ENG Trophy','英挑杯'],['E','ENG Con','英非'],['E','ENG CN','英非北'],['H','HOL D2','荷乙'],['H','HOL Yl','荷青甲'],['S','SV D1','萨尔超'],['P','POR U19','葡U19'],['P','POR D1','葡甲'],['P','POR PR','葡超'],['S','SPA D3B1','西丙1'],['S','SPA D3B2','西丙2'],['S','SPA D3B3','西丙3'],['S','SPA D3B4','西丙4'],['S','SPA Futsal','西內足'],['S','SPA W1','西女超'],['B','BRA CC','里州赛'],['A','Arg D2M1','阿乙M1'],['A','Arg D2M2','阿乙M2'],['A','Arg D2M3','阿乙M3'],['A','ALG D2','阿及乙'],['A','ALG D1','阿及甲'],['A','AZE D1','阿塞甲'],['A','ALB D1','阿巴超'],['A','ARG D2','阿根乙'],['U','UAE D2','阿联乙'],['K','KOR NL','韩联盟'],['F','FYRM D2','马其乙'],['M','MacedoniaFyr','马其甲'],['M','MAS D1','马来超'],['M','MON D2','黑山乙'],['M','MON D1','黑山甲'],['F','FCWC','世冠杯'],['W','World Cup','世界杯'],['F','FIFAWYC','世青杯'],['C','CWPL','中女超'],['C','CFC','中足协杯'],['D','DEN C','丹麦杯'],['A','Asia CL','亚冠杯'],['A','AFC','亚洲杯'],['R','Rus Cup','俄罗斯杯'],['H','HUN C','匈杯'],['N','NIR C','北爱杯'],['T','TUR C','土杯'],['T','Tenno Hai','天皇杯'],['W','WWC','女世杯'],['I','ITA Cup','意杯'],['G','GER C','德国杯'],['J','JPN LC','日联杯'],['S','SCO FAC','苏足总杯'],['E','ENG JPT','英锦赛'],['E','ENG FAC','足总杯'],['C','CAF NC','非洲杯'],['K','K-LC','韩联杯'],['H','HK D1','香港甲']]; The link of the page I am scraping contains the third character, but when I copy it becomes the link above. I am not sure about the encoding. 
import re
html = 'source of page'
matches = re.findall('ls_main = \[\[.*?;', html)[0]
matches = matches.decode('unknown encoding').encode('utf-8')
How can I put the original character in the string of the link? I use Python 2.7.
%XX encoding can be done using urllib.quote:
>>> import urllib
>>> urllib.quote('英冠')
'%E8%8B%B1%E5%86%A0'
>>> urllib.quote(u'英冠'.encode('utf-8'))  # with explicit utf-8 encoding.
'%E8%8B%B1%E5%86%A0'
To get back the original string, use urllib.unquote:
>>> urllib.unquote('%E8%8B%B1%E5%86%A0')
'\xe8\x8b\xb1\xe5\x86\xa0'
>>> print(urllib.unquote('%E8%8B%B1%E5%86%A0'))
英冠
In Python 3.x, use urllib.parse.quote and urllib.parse.unquote:
>>> import urllib.parse
>>> urllib.parse.quote('英冠', encoding='utf-8')
'%E8%8B%B1%E5%86%A0'
>>> urllib.parse.unquote('%E8%8B%B1%E5%86%A0', encoding='utf-8')
'英冠'
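To tie this back to the link in the question, the Python 3 sketch below rebuilds the query string from a league name taken from ls_main. It assumes the site expects UTF-8 percent-encoding, which is what the %E8%8B%B1%E8%B6%85 in the posted link suggests; the params dictionary simply mirrors the parameters visible in that link.

```python
from urllib.parse import quote, unquote, urlencode

league = "英超"  # third element of the 'ENG PR' entry in ls_main

# Encode and decode a single value.
encoded = quote(league, encoding="utf-8")
print(encoded)           # %E8%8B%B1%E8%B6%85
print(unquote(encoded))  # 英超

# Build the whole query string; urlencode() percent-encodes each value as UTF-8.
params = {
    "select": "all",
    "league": league,
    "year": 2014,
    "month": 1,
    "Submit": "Submit",
}
url = "http://www.footballcornersta.com/en/league.php?" + urlencode(params)
print(url)
```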
decoding base64 guid in python
I am trying to convert a base64 string back to a GUID-style hex number in Python and am having issues. The Base64-encoded string is:
bNVDIrkNbEySjZ90ypCLew==
And I need to get it back to:
2243d56c-0db9-4c6c-928d-9f74ca908b7b
I can do it with the following PHP code but can't work out how to do it in Python:
function Base64ToGUID($guid_b64) {
    $guid_bin = base64_decode($guid_b64);
    return join('-', array(
        bin2hex(strrev(substr($guid_bin, 0, 4))),
        bin2hex(strrev(substr($guid_bin, 4, 2))),
        bin2hex(strrev(substr($guid_bin, 6, 2))),
        bin2hex(substr($guid_bin, 8, 2)),
        bin2hex(substr($guid_bin, 10, 6))
    ));
}
Here is the GUIDToBase64 version:
function GUIDToBase64($guid) {
    $guid_b64 = '';
    $guid_parts = explode('-', $guid);
    foreach ($guid_parts as $k => $part) {
        if ($k < 3)
            $part = join('', array_reverse(str_split($part, 2)));
        $guid_b64 .= pack('H*', $part);
    }
    return base64_encode($guid_b64);
}
Here are some of the results using some of the obvious and not so obvious options:
>>> import base64
>>> import binascii
>>> base64.b64decode("bNVDIrkNbEySjZ90ypCLew==")
'l\xd5C"\xb9\rlL\x92\x8d\x9ft\xca\x90\x8b{'
>>> binascii.hexlify(base64.b64decode("bNVDIrkNbEySjZ90ypCLew=="))
'6cd54322b90d6c4c928d9f74ca908b7b'
Python port of the existing function (bitstring required):
import bitstring, base64

def base64ToGUID(b64str):
    s = bitstring.BitArray(bytes=base64.b64decode(b64str)).hex
    def rev2(s_):
        def chunks(n):
            for i in xrange(0, len(s_), n):
                yield s_[i:i+n]
        return "".join(list(chunks(2))[::-1])
    return "-".join([rev2(s[:8]),rev2(s[8:][:4]),rev2(s[12:][:4]),s[16:][:4],s[20:]])

assert base64ToGUID("bNVDIrkNbEySjZ90ypCLew==") == "2243d56c-0db9-4c6c-928d-9f74ca908b7b"
First off, the b64 string and the resultant GUID don't match if we decode the bytes in the order uuid.UUID(bytes=...) expects (big-endian):
>>> import uuid
>>> import base64
>>> u = uuid.UUID("2243d56c-0db9-4c6c-928d-9f74ca908b7b")
>>> u
UUID('2243d56c-0db9-4c6c-928d-9f74ca908b7b')
>>> u.bytes
'"C\xd5l\r\xb9Ll\x92\x8d\x9ft\xca\x90\x8b{'
>>> base64.b64encode(u.bytes)
'IkPVbA25TGySjZ90ypCLew=='
>>> b = base64.b64decode('bNVDIrkNbEySjZ90ypCLew==')
>>> u2 = uuid.UUID(bytes=b)
>>> print u2
6cd54322-b90d-6c4c-928d-9f74ca908b7b
Under that interpretation, the Base64 string you posted does not encode the GUID you posted; the first three fields come out byte-reversed, which is exactly what the strrev() calls in your PHP code undo. I'm not sure I understand the way you're encoding the GUID in the first place.
Python has in its arsenal all the tools required for you to be able to answer this problem. However, here's the rough scratching I did in a Python terminal:
import uuid
import base64

base64_guid = "bNVDIrkNbEySjZ90ypCLew=="
bin_guid = base64.b64decode(base64_guid)
guid = uuid.UUID(bytes=bin_guid)
print guid
This code should give you enough of a hint to build your own functions. Don't forget, the Python shell gives you a powerful tool to test and play with code and ideas. I would investigate using something like IPython notebooks.
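The difference between the two hex strings shown here is byte order: the PHP function reverses the first three fields, which is the little-endian layout that the standard uuid module exposes as bytes_le. A minimal Python 3 sketch under that assumption follows; the function names are chosen for this example only.

```python
import base64
import uuid


def base64_to_guid(b64str):
    """Decode Base64 whose first three GUID fields are stored little-endian."""
    return str(uuid.UUID(bytes_le=base64.b64decode(b64str)))


def guid_to_base64(guid):
    """Inverse of base64_to_guid: emit the little-endian byte layout as Base64."""
    return base64.b64encode(uuid.UUID(guid).bytes_le).decode("ascii")


print(base64_to_guid("bNVDIrkNbEySjZ90ypCLew=="))
# 2243d56c-0db9-4c6c-928d-9f74ca908b7b
print(guid_to_base64("2243d56c-0db9-4c6c-928d-9f74ca908b7b"))
# bNVDIrkNbEySjZ90ypCLew==
```

Using uuid.UUID(bytes=...) instead would read the same 16 bytes in big-endian order, which gives the 6cd54322-... value shown above.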
I needed to do this to decode a Base64 UUID that had been dumped from MongoDB. Originally the field had been created by Mongoose. The code I used, based on the code by @tpatja, is here:
import base64
import bitstring

def base64ToGUID(b64str):
    try:
        raw = base64.urlsafe_b64decode(b64str)
    except Exception as e:
        # Without a decoded value there is nothing to format, so stop here.
        print("Can't decode base64 ", e)
        return None
    s = bitstring.BitArray(bytes=raw).hex
    return "-".join([s[:8],s[8:][:4],s[12:][:4],s[16:][:4],s[20:]])
Based on the good answers above, I wrote a version that does not require the bitstring package and includes validations and support for more input options.
import base64
import regex
import uuid
from typing import Optional

def to_uuid(obj) -> Optional[uuid.UUID]:
    if obj is None:
        return None
    elif isinstance(obj, uuid.UUID):
        return obj
    elif isinstance(obj, str):
        if regex.match(r'[0-9a-fA-F]{8}[-]{0,1}[0-9a-fA-F]{4}[-]{0,1}[0-9a-fA-F]{4}[-]{0,1}[0-9a-fA-F]{4}[-]{0,1}[0-9a-fA-F]{12}', obj):
            return uuid.UUID(hex=obj)
        elif regex.match(r'[0-9a-zA-Z\+\/]{22}[\=]{2}', obj):
            b64_str = base64.b64decode(obj).hex()
            uid_str = '-'.join([b64_str[:8], b64_str[8:][:4], b64_str[12:][:4], b64_str[16:][:4], b64_str[20:]])
            return uuid.UUID(hex=uid_str)
        raise ValueError(f'{obj} is not a valid uuid/guid')
    else:
        raise ValueError(f'{obj} is not a valid uuid/guid')
IndentationError: expected an indented block
I have just started with Python. I was executing a simple program given in 'Dive into Python' by Mark Pilgrim on Ubuntu. The program is as follows:
def buildConnectionString(params):
"""Build a connection string from a dictionary of parameters.
Returns string."""
return ";".join(["%s=%s" % (k, v) for k, v in params.items()])
if __name__ == "__main__":
    myParams = {"server":"mpilgrim", \
                "database":"master", \
                "uid":"sa", \
                "pwd":"secret" \
                }
    print buildConnectionString(myParams)
But it shows the following error:
  File "./1.py", line 3
    Returns string."""
                      ^
IndentationError: expected an indented block
I have tried a few things, like putting a space in front of return on line 3, then using a tab instead of a space. Can anybody help me find out what the error is about and why it occurs, and also point me to some easy tutorials with which I can go ahead? Thanks in advance.
Try it like this:
def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.
    Returns string."""
    return ";".join(["%s=%s" % (k, v) for k, v in params.items()])

if __name__ == "__main__":
    myParams = {"server":"mpilgrim", \
                "database":"master", \
                "uid":"sa", \
                "pwd":"secret" \
                }
    print buildConnectionString(myParams)
BTW: Do you understand the structure? Function, if __name__=="__main__": block etc.?
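For readers on Python 3, here is a small sketch of the same program, plus a way to reproduce the error on purpose. Everything in the body of a def, including its docstring, has to be indented relative to the def line. The snake_case names and the BROKEN_SOURCE string are made up for this illustration, and the exact wording of the error message differs between Python versions.

```python
def build_connection_string(params):
    """Build a connection string from a dictionary of parameters.

    Returns string."""
    return ";".join("%s=%s" % (k, v) for k, v in params.items())


# The same kind of function with its body NOT indented, kept as text so this file still runs.
BROKEN_SOURCE = 'def f(params):\n"""Docstring."""\nreturn 1\n'


def indentation_error_message(source):
    """Compile `source` and return the IndentationError message, or None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None


if __name__ == "__main__":
    my_params = {"server": "mpilgrim", "database": "master",
                 "uid": "sa", "pwd": "secret"}
    print(build_connection_string(my_params))
    print(indentation_error_message(BROKEN_SOURCE))
```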
Why not read the Python documentation? It might help. ;) http://docs.python.org/2/reference/lexical_analysis.html#indentation