String decoding of format \x123 in Python - python

Does anybody know what type of string this is?
And how can I convert it to a readable format in Python?
This data comes from the log file of a mobile app (it might be in Russian):
"title":"\x{41E}\x{442}\x{441}\x{440}\x{43E}\x{447}\x{43A}\x{430} \x{43F}\x{43E} \x{43A}\x{440}\x{435}\x{434}\x{438}\x{442}\x{443}"
Thanks in advance!

To me it looks like hex codes of characters. I would extract the codes, treat them as base-16 integers and convert them to characters. That is:
title = r"\x{41E}\x{442}\x{441}\x{440}\x{43E}\x{447}\x{43A}\x{430} \x{43F}\x{43E} \x{43A}\x{440}\x{435}\x{434}\x{438}\x{442}\x{443}"
codes = [code.strip('{} ') for code in title.split(r"\x") if code]
characters = [chr(int(code, 16)) for code in codes]
output = ''.join(characters)
print(output)
Output (the spaces between the words are lost, because strip('{} ') also removes them):
Отсрочкапокредиту
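If the spaces matter, one possible variation is to substitute each escape in place instead of splitting the string. This is only a sketch: the helper name decode_braced_hex is made up for illustration, and it assumes every escape has the form \x{HEX}.

```python
import re

def decode_braced_hex(text):
    """Replace every \\x{HEX} escape with the character it names.

    Everything else (spaces, punctuation) is left untouched.
    """
    return re.sub(
        r"\\x\{([0-9A-Fa-f]+)\}",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

title = r"\x{41E}\x{442}\x{441}\x{440}\x{43E}\x{447}\x{43A}\x{430} \x{43F}\x{43E} \x{43A}\x{440}\x{435}\x{434}\x{438}\x{442}\x{443}"
print(decode_braced_hex(title))  # Отсрочка по кредиту
```

Because the hex digits are parsed with int(..., 16), this should also cope with codes that are shorter or longer than three digits.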

import json
data = r'"\x{41E}\x{442}\x{441}\x{440}\x{43E}\x{447}\x{43A}\x{430} \x{43F}\x{43E} \x{43A}\x{440}\x{435}\x{434}\x{438}\x{442}\x{443} "'
print(json.loads(data.replace('{','').replace('}','').replace('x', 'u0')))
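…and the output is Отсрочка по кредиту. Note that this only works because every code here has exactly three hex digits, so replacing x with u0 yields a valid four-digit \uXXXX escape; it would also corrupt any literal x in the text.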

Related

How to convert utf-8 characters to "normal" characters in string in python3.10?

I have raw data that looks like this:
25023,Zwerg+M%C3%BCtze,0,1,986,3780
25871,red+earth,0,1,38,8349
25931,K4m%21k4z3,90,1,1539,2530
It is saved as a .txt file: https://de205.die-staemme.de/map/player.txt
The "characters" starting with % are unicode, as far as I can tell.
I found the following table about it: https://www.i18nqa.com/debug/utf8-debug.html
Here is my code so far:
import urllib.request
urllib.request.urlretrieve(url, pfad + "player.txt")
with open(pfad + "player.txt", "r", encoding="utf-8") as f:
    raw = f.read().split("\n")
Python does not convert the %-sequences. They are read as if they were separate characters.
Is there a way to convert these characters without calling .replace some 200 times?
Thank you very much in advance for help and/or useful hints!
The %s are URL-encoding; use urllib.parse.unquote to decode the string.
>>> raw = """25023,Zwerg+M%C3%BCtze,0,1,986,3780
... 25871,red+earth,0,1,38,8349
... 25931,K4m%21k4z3,90,1,1539,2530"""
>>> import urllib.parse
>>> print(urllib.parse.unquote(raw))
25023,Zwerg+Mütze,0,1,986,3780
25871,red+earth,0,1,38,8349
25931,K4m!k4z3,90,1,1539,2530
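Note that unquote leaves the + signs alone. If they stand for spaces (an assumption about this data, which looks form-encoded), urllib.parse.unquote_plus handles both. A rough sketch that decodes only the name column after splitting, so that an encoded comma could not break the CSV parsing:

```python
import csv
import io
from urllib.parse import unquote_plus

raw = """25023,Zwerg+M%C3%BCtze,0,1,986,3780
25871,red+earth,0,1,38,8349
25931,K4m%21k4z3,90,1,1539,2530"""

players = []
for row in csv.reader(io.StringIO(raw)):
    # Assumption: the second column is the player name and "+" means a space.
    row[1] = unquote_plus(row[1])
    players.append(row)

names = [row[1] for row in players]
print(names)  # ['Zwerg Mütze', 'red earth', 'K4m!k4z3']
```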

How to find floating point numbers in binary file with Python?

I have a binary file mixed with ASCII in which there are some floating point numbers I want to find. The file contains some lines like this:
1,1,'11.2','11.3';1,1,'100.4';
In my favorite regex tester I found that the correct regex should be ([0-9]+\.{1}[0-9]+).
Here's the code:
import re
data = open('C:\\Users\\Me\\file.bin', 'rb')
pat = re.compile(b'([0-9]+\.{1}[0-9]+)')
print(pat.match(data.read()))
I do not get a single match, why is that? I'm on Python 3.5.1.
You can try it like this:
import re
with open('C:\\Users\\Me\\file.bin', 'rb') as f:
    data = f.read()
re.findall("\d+\.\d+", data)
Output:
['11.2', '11.3', '100.4']
re.findall returns a list of strings. If you want to convert them to floats, you can do it like this:
>>> list(map(float, re.findall("\d+\.\d+", data)))
[11.2, 11.3, 100.4]
float_re = br"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
for m in generate_tokens(r'C:\Users\Me\file.bin', float_re):
    print(float(m.group()))
where float_re is from this answer and generate_tokens() is defined here.
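The original definition of generate_tokens() is not shown here, so the following is only a hypothetical stand-in with the same call shape. It assumes the file fits in memory and simply yields the match objects of a bytes pattern; the demo file content is made up from the sample line in the question.

```python
import os
import re
import tempfile

float_re = br"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

def generate_tokens(filename, pattern):
    """Yield a match object for each occurrence of a bytes pattern in a file.

    Hypothetical stand-in: reads the whole file into memory at once.
    """
    with open(filename, "rb") as f:
        data = f.read()
    yield from re.finditer(pattern, data)

# Demo on a temporary file that mixes binary bytes with the sample line.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\xff1,1,'11.2','11.3';1,1,'100.4';\x00")

values = [float(m.group()) for m in generate_tokens(tmp.name, float_re)]
os.remove(tmp.name)
print(values)
```

Since float_re also accepts plain integers, the 1s in the sample line are reported as well; use a stricter pattern if only numbers with a decimal point are wanted.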
pat.match() only tries to match at the very start of the input, and your data does not start with a float; that is why you "do not get a single match".
re.findall("\d+\.\d+", data) raises TypeError because the pattern is Unicode (str) but data is a bytes object in your case. Pass the pattern as bytes:
re.findall(rb"\d+\.\d+", data)
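A small sketch of the difference on the sample line from the question, assuming Python 3 (the bytes literal stands in for the file contents):

```python
import re

data = b"1,1,'11.2','11.3';1,1,'100.4';"
pat = re.compile(rb"\d+\.\d+")

print(pat.match(data))            # None: the data starts with "1," and not a float
print(pat.search(data).group())   # first float found anywhere in the data

found = pat.findall(data)         # list of bytes objects
values = [float(x.decode("ascii")) for x in found]
print(values)
```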

Python: converting hex values, stored as string, to hex data

(Answer found. Close the topic)
I'm trying to convert hex values, stored as a string, into hex data.
I have:
data_input = 'AB688FB2509AA9D85C239B5DE16DD557D6477DEC23AF86F2AABD6D3B3E278FF9'
I need:
data_output = '\xAB\x68\x8F\xB2\x50\x9A\xA9\xD8\x5C\x23\x9B\x5D\xE1\x6D\xD5\x57\xD6\x47\x7D\xEC\x23\xAF\x86\xF2\xAA\xBD\x6D\x3B\x3E\x27\x8F\xF9'
I tried data_input.decode('hex') and binascii.unhexlify(data_input), but all they return is:
"\xabh\x8f\xb2P\x9a\xa9\xd8\\#\x9b]\xe1m\xd5W\xd6G}\xec#\xaf\x86\xf2\xaa\xbdm;>'\x8f\xf9"
What should I write to get all bytes in the '\xFF' form?
Update:
I need the representation in the '\xFF' form to write this data to a file (I'm opening the file with 'wb') as:
«hЏІPљ©Ш\#›]бmХWЦG}м#Ї†тЄЅm;>'Џщ
Update 2:
Sorry for the bother. The answer was under my nose all along:
data_output = data_input.decode('hex')
write_file(filename, data_output) # just opens the file in 'wb' mode and writes the data to it
This gives exactly the result I need.
I like chopping strings into fixed-width chunks using re.findall:
import re
print '\\x' + '\\x'.join(re.findall('.{2}', data_input))
If you want to actually convert the string into a list of ints, you can do that like this:
data = [int(x, 16) for x in re.findall('.{2}', data_input)]
It's an inefficient solution, but there's always:
flag = True
data_output = ''
for char in data_input:
    if flag:
        buffer = char
        flag = False
    else:
        data_output = data_output + '\\x' + buffer + char
        flag = True
EDIT HOPEFULLY THE LAST: Who knew I could mess up in so many different ways on that simple a loop? Should actually run now...
>>> int('0x10AFCC', 16)
1093580
>>> hex(1093580)
'0x10afcc'
So prepend your string with '0x' then do the above
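For readers on Python 3, where str.decode('hex') no longer exists, a rough equivalent is bytes.fromhex. The sketch below also shows why the repr looked "wrong": printable bytes such as 0x68 are simply displayed as their ASCII character (h), while the underlying data is the same. The variable names raw and escaped are chosen for illustration only.

```python
data_input = 'AB688FB2509AA9D85C239B5DE16DD557D6477DEC23AF86F2AABD6D3B3E278FF9'

# The actual binary data; this is what should be written to a file opened with 'wb'.
raw = bytes.fromhex(data_input)
print(raw[:3])          # b'\xabh\x8f' -- 'h' is just how byte 0x68 is displayed

# A purely textual '\xAB\x68...' rendering, useful only for display.
escaped = ''.join('\\x{:02X}'.format(b) for b in raw)
print(escaped[:12])     # \xAB\x68\x8F
```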

Python get packet data TCP

Hello I have python sniffer
import sys
from socket import timeout

def receiveData(s):
    data = ''
    try:
        # recvfrom returns a (data, address) pair; keep only the data
        data = s.recvfrom(65565)[0]
    except timeout:
        data = ''
    except:
        print "An Error Occurred."
        print sys.exc_info()
    return data

data = receiveData(s)
s is a socket. I'm getting data, but it contains symbols. Can somebody please help me convert it into plain text?
I'm a newbie in Python, so sorry if this is a very silly question :)
This is a data example: 'E\x00\x00)\x1a*#\x00\x80\x06L\xfd\xc0\xa8\x01\x84\xad\xc2#\xb9M\xb8\x00P\xed\xb3\x19\xd9\xedY\xc1\xfbP\x10\x01\x04\x16=\x00\x00\x00'
You can't really convert it to "plain text". Characters such as the NUL (ASCII 0, shown as \x00) can't be displayed so python shows them in their hex representation.
What most sniffing/hexdump tools do is to replace unprintable characters with e.g. a dot. You could do it like this:
import string
printable = set(string.printable)
print ''.join(x if x in printable else '.' for x in data)
Example:
>>> data = 'E\x00\x00)\x1a*#\x00\x80\x06L\xfd\xc0\xa8\x01\x84\xad\xc2#\xb9M\xb8\x00P\xed\xb3\x19\xd9\xedY\xc1\xfbP\x10\x01\x04\x16=\x00\x00\x00'
>>> print ''.join(x if x in printable else '.' for x in data)
E..).*#...L.......#.M..P.....Y..P....=...
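On Python 3 the socket returns bytes, and iterating over bytes yields integers, so the same idea needs a small adjustment. A sketch, with the helper name dotted invented for illustration:

```python
import string

# Byte values of all characters that string.printable considers printable.
PRINTABLE = set(string.printable.encode("ascii"))

def dotted(data):
    """Render bytes as text, replacing every unprintable byte with a dot."""
    return ''.join(chr(b) if b in PRINTABLE else '.' for b in data)

data = b'E\x00\x00)\x1a*#\x00\x80\x06L\xfd\xc0\xa8\x01\x84\xad\xc2#\xb9M\xb8\x00P\xed\xb3\x19\xd9\xedY\xc1\xfbP\x10\x01\x04\x16=\x00\x00\x00'
print(dotted(data))
```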
The conversion to "plain text" depends on what your data mean.
Do you have compressed text? Then uncompress it.
Do you have encoded numbers? Then decode it and display the numbers.
Without knowing the semantics of the data, no one can tell you.
Of course, you can just display the raw data with print data.encode("hex"), but I am not sure if that is what you want.
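As an illustration of "knowing what the data mean": the sample starts with byte 0x45, which is what a raw IPv4 packet without options typically begins with. Assuming that is what the sniffer captured (an assumption, since the question does not say how the socket was created), the header fields can be unpacked with struct. This sketch uses Python 3 bytes and only reads the IPv4 header and the two TCP port numbers.

```python
import socket
import struct

data = b'E\x00\x00)\x1a*#\x00\x80\x06L\xfd\xc0\xa8\x01\x84\xad\xc2#\xb9M\xb8\x00P\xed\xb3\x19\xd9\xedY\xc1\xfbP\x10\x01\x04\x16=\x00\x00\x00'

# Fixed 20-byte part of an IPv4 header, in network byte order.
(ver_ihl, tos, total_len, ident, flags_frag,
 ttl, proto, checksum, src, dst) = struct.unpack('!BBHHHBBH4s4s', data[:20])

version = ver_ihl >> 4                 # upper 4 bits
header_len = (ver_ihl & 0x0F) * 4      # lower 4 bits, counted in 32-bit words
src_ip = socket.inet_ntoa(src)
dst_ip = socket.inet_ntoa(dst)

# Protocol 6 is TCP; its header starts with source and destination port.
src_port, dst_port = struct.unpack('!HH', data[header_len:header_len + 4])

print(version, header_len, total_len, ttl, proto)
print(src_ip, src_port, '->', dst_ip, dst_port)
```

Under that assumption the sample decodes as a 41-byte TCP packet from 192.168.1.132 port 19896 to 173.194.35.185 port 80.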

python 2.7 encoding decoding

I have a problem involving encoding/decoding.
I read text from a file and compare it with text from a database (Postgres).
The comparison is done between two lists.
From the file I get "jo\x9a" for "još" and from the database I get "jo\xc5\xa1" for the same value:
common = [a for a in codes_from_file if a in kode_prfoksov]
# Items in one but not the other
only1 = [a for a in codes_from_file if not a in kode_prfoksov]
#Items only in another
only2 = [a for a in kode_prfoksov if not a in codes_from_file ]
How can I solve this? Which encoding should be set when comparing these two strings?
Thank you.
The first one seems to be windows-1250, and the second is utf-8.
>>> print 'jo\x9a'.decode('windows-1250')
još
>>> print 'jo\xc5\xa1'.decode('utf-8')
još
>>> 'jo\x9a'.decode('windows-1250') == 'jo\xc5\xa1'.decode('utf-8')
True
Your file strings seem to be Windows-1250 encoded. Your database seems to contain UTF-8 strings.
So you can either first convert all strings to unicode:
codes_from_file = [a.decode("windows-1250") for a in codes_from_file]
kode_prfoksov = [a.decode("utf-8") for a in kode_prfoksov]
or, if you do not want unicode strings, just convert the file strings to UTF-8:
codes_from_file = [a.decode("windows-1250").encode("utf-8") for a in codes_from_file]
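The same idea in Python 3 terms, as a sketch only: the sample lists below are invented, and the lists are turned into sets, which assumes that duplicates and ordering do not matter for the comparison.

```python
# Hypothetical sample data: raw bytes as read from the file and the database.
codes_from_file = [b"jo\x9a", b"abc"]
kode_prfoksov = [b"jo\xc5\xa1", b"xyz"]

# Decode each source with its own encoding so both sides are real text.
file_text = {a.decode("windows-1250") for a in codes_from_file}
db_text = {a.decode("utf-8") for a in kode_prfoksov}

common = file_text & db_text   # items in both
only1 = file_text - db_text    # items only in the file
only2 = db_text - file_text    # items only in the database
print(common, only1, only2)
```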
