Python string replace method on a file read as binary

I opened an image file in read-binary ("rb") mode and stored the data in a variable. Now I want to replace some values in the binary data with my own values, but the usual string replace method is not working:
f=open("a.jpg","rb")
a=f.read()
''' first line is '\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xe1\x00*Exif\x00\x00II*\x00\x08\x00\x00\x00\x0 '''
a=a.replace("ff","z")
print a
#but there's no change in a
Can anyone tell me where I am going wrong? I also tried
a=a.replace(b'ff',b'z')
but the output was still unchanged.
What am I supposed to do to perform the replacement?

I don't know which version of Python you're using (this kind of operation differs between 2 and 3), but try a = str(a) before calling the replace method.
EDIT: For Python 2.7, the only reasonable way I've found to do what you want is the built-in function repr. Example:
>>> picture = open("some_picture.jpg", 'rb')
>>> first_line = picture.readline()
>>> first_line
'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xe1\x00*Exif\x00\x00II*\x00\x08\x00\x00\x00\x01\x001\x01\x02\x00\x07\x00\x00\x00\x1a\x00\x00\x00\x00\x00\x00\x00Google\x00\x00\xff\xdb\x00\x84\x00\x03\x02\x02\x03\x02\x02\x03\x03\x03\x03\x04\x03\x03\x04\x05\x08\x05\x05\x04\x04\x05\n'
>>> repr(first_line)
"'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x00\\x00\\x01\\x00\\x01\\x00\\x00\\xff\\xe1\\x00*Exif\\x00\\x00II*\\x00\\x08\\x00\\x00\\x00\\x01\\x001\\x01\\x02\\x00\\x07\\x00\\x00\\x00\\x1a\\x00\\x00\\x00\\x00\\x00\\x00\\x00Google\\x00\\x00\\xff\\xdb\\x00\\x84\\x00\\x03\\x02\\x02\\x03\\x02\\x02\\x03\\x03\\x03\\x03\\x04\\x03\\x03\\x04\\x05\\x08\\x05\\x05\\x04\\x04\\x05\\n'"
>>> repr(first_line).replace('ff', 'SOME_OTHER_STRING')
"'\\xSOME_OTHER_STRING\\xd8\\xSOME_OTHER_STRING\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x00\\x00\\x01\\x00\\x01\\x00\\x00\\xSOME_OTHER_STRING\\xe1\\x00*Exif\\x00\\x00II*\\x00\\x08\\x00\\x00\\x00\\x01\\x001\\x01\\x02\\x00\\x07\\x00\\x00\\x00\\x1a\\x00\\x00\\x00\\x00\\x00\\x00\\x00Google\\x00\\x00\\xSOME_OTHER_STRING\\xdb\\x00\\x84\\x00\\x03\\x02\\x02\\x03\\x02\\x02\\x03\\x03\\x03\\x03\\x04\\x03\\x03\\x04\\x05\\x08\\x05\\x05\\x04\\x04\\x05\\n'"

When you display a string at the Python console, the string is encoded so that you can see all of the characters, even the ones that aren't printable. Whenever you see something like \xff, that's not 4 characters, it's a single character in hex notation. To replace it, you also need to specify the same single character.
a = a.replace("\xff", "z")
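For readers on Python 3, where reading a file in "rb" mode returns a bytes object, the same point can be sketched as follows. The sample bytes are the start of the header quoted in the question; the variable names are only illustrative, and the sketch says nothing about whether the edited data is still a valid image.

```python
# Python 3 sketch: data read in "rb" mode is bytes, so replace() needs bytes arguments.
# The sample is the start of the JPEG header quoted in the question.
header = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# b"ff" is the two ASCII letters "f" "f" (0x66 0x66), which do not occur here,
# so this call returns the data unchanged.
unchanged = header.replace(b"ff", b"z")

# b"\xff" is the single byte 0xFF, which is what the console displays as \xff.
replaced = header.replace(b"\xff", b"z")

print(unchanged == header)
print(replaced)
```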

Related

Python - How can I convert a special character to the unicode representation?

In a dictionary, I have the following value containing equals signs:
{"appVersion":"o0u5jeWA6TwlJacNFnjiTA=="}
To be explicit, I need to replace each = with its Unicode escape '\u003d' (basically the reverse of json.loads()). How can I store that value in a variable without ending up with two backslashes (\\u003d)?
I've tried different approaches, including encode/decode, repr(), unichr(61), etc., and even after a lot of searching couldn't find anything that does this. Every approach gives me the following final result (or the original value):
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
Thanks in advance for your attention.
EDIT
When I debug the code, the variable's value is shown with two backslashes. The program takes this value and uses it in the following steps, extra backslash included. I'm using this code to build JSON with json.dumps(), and the result returned is a unicode string with two backslashes.
Below is a printout of the final result after the JSON construction. I need to find a way to store the value in the variable with just one backslash.
I don't know if it makes a difference, but I'm doing this for a custom BURP plugin that manipulates some selected requests.
Here is an image of my POC, getting the value of the variable.
The extra backslash is not actually added. When the interpreter echoes a value it uses repr(), which shows a literal backslash doubled so that it cannot be confused with an escape such as \t or \n. Printing the string shows its real contents.
I hope this helps:
>>> t['appVersion'] = t["appVersion"].replace('=', '\u003d')
>>> t['appVersion']
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
>>> print(t['appVersion'])
o0u5jeWA6TwlJacNFnjiTA\u003d\u003d
>>> t['appVersion'] == 'o0u5jeWA6TwlJacNFnjiTA\u003d\u003d'
True
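The transcript above relies on Python 2 byte strings, where '\u003d' is not treated as an escape. A rough Python 3 sketch of the same check follows. In Python 3 a plain '\u003d' literal is simply the "=" character, so a raw string is used here. The variable names are illustrative, and the json.dumps step is included only because the question mentions it.

```python
import json

value = "o0u5jeWA6TwlJacNFnjiTA=="

# A raw string keeps the backslash literal: r"\u003d" is six characters.
escaped = value.replace("=", r"\u003d")

print(repr(escaped))        # repr doubles each backslash for display only
print(escaped)              # the actual contents have one backslash per escape
print(escaped.count("\\"))  # number of real backslash characters in the string

# json.dumps escapes each real backslash in the JSON text it produces,
# and json.loads restores the original string from that text.
dumped = json.dumps(escaped)
restored = json.loads(dumped)
print(restored == escaped)
```

If the receiving side parses the JSON, it gets back the single-backslash string. The doubled backslash exists only in the serialized text.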

Convert Unicode string to UTF-8, and then to JSON

I want to encode a string in UTF-8 and view the corresponding UTF-8 bytes individually. In the Python REPL the following seems to work fine:
>>> unicode('©', 'utf-8').encode('utf-8')
'\xc2\xa9'
Note that I’m using U+00A9 COPYRIGHT SIGN as an example here. The '\xC2\xA9' looks close to what I want — a string consisting of two separate code points: U+00C2 and U+00A9. (When UTF-8-decoded, it gives back the original string, '\xA9'.)
Then, I want the UTF-8-encoded string to be converted to a JSON-compatible string. However, the following doesn’t seem to do what I want:
>>> import json; json.dumps('\xc2\xa9')
'"\\u00a9"'
Note that it generates a string containing U+00A9 (the original symbol). Instead, I need the UTF-8-encoded string, which would look like "\u00C2\u00A9" in valid JSON.
TL;DR How can I turn '©' into "\u00C2\u00A9" in Python? I feel like I’m missing something obvious — is there no built-in way to do this?
If you really want "\u00c2\u00a9" as the output, give json a Unicode string as input.
>>> print json.dumps(u'\xc2\xa9')
"\u00c2\u00a9"
You can generate this Unicode string from the raw bytes:
s = unicode('©', 'utf-8').encode('utf-8')
s2 = u''.join(unichr(ord(c)) for c in s)
I think what you really want is "\xc2\xa9" as the output, but I'm not sure how to generate that yet.
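On Python 3, where unicode() and unichr() no longer exist, one possible equivalent of the join above is to decode the UTF-8 bytes as Latin-1, since Latin-1 maps each byte value to the code point with the same number. This is a sketch with illustrative variable names, not part of the original answer.

```python
import json

text = "\u00a9"  # U+00A9 COPYRIGHT SIGN

# Encode to UTF-8: two bytes, 0xC2 and 0xA9.
utf8_bytes = text.encode("utf-8")

# Latin-1 maps every byte value to the code point with the same number,
# so the two bytes become the two characters U+00C2 and U+00A9.
as_code_points = utf8_bytes.decode("latin-1")

# With the default ensure_ascii=True, json.dumps writes them as \u escapes.
result = json.dumps(as_code_points)
print(result)
```

The escapes come out in lowercase hex, which JSON treats the same as the uppercase form used in the question.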

Greek encoding in PYTHON

I'm trying to store a string and then tokenize it with nltk in Python, but I can't understand why, after tokenizing it (which creates a list), I can't see the strings in the list.
Can anyone help me, please?
Here is the code:
>>> a = "Γεια σου"
>>> b = nltk.word_tokenize(a)
>>> b
['\xc3\xe5\xe9\xe1', '\xf3\xef\xf5']
I just want to be able to see the contents of the list as normal text.
Thanks in advance.
You are using Python 2, where unprefixed quotes denote a byte string as opposed to a character string (if you're not sure about the difference, read this). Either switch to Python 3, where this has been fixed, or prefix all character strings with u and print the strings (as opposed to showing their repr, which escapes non-ASCII characters in Python 2.x):
>>> import nltk
>>> a = u'Γεια σου'
>>> b = nltk.word_tokenize(a)
>>> print(u'\n'.join(b))
Γεια
σου
You can see the strings. The characters are represented by escape sequences because of your terminal encoding settings. Configure your terminal to accept input, and present output, in UTF-8.
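The difference between a list's repr and the printed text can also be sketched in Python 3, using the byte strings from the question's output. The ISO-8859-7 encoding here is an assumption inferred from the byte values. The question does not say which encoding the terminal used.

```python
# Byte strings copied from the output shown in the question.
tokens = [b"\xc3\xe5\xe9\xe1", b"\xf3\xef\xf5"]

# Assumption: the bytes are ISO-8859-7 (Greek); the question does not state this.
words = [t.decode("iso-8859-7") for t in tokens]

print(tokens)            # displaying the list uses repr, so the bytes appear escaped
print("\n".join(words))  # printing the decoded text shows the characters themselves
```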

use raw_input generated string safely (Python)

I'm wondering how to use a string from raw_input safely so that I can create a function to replace it for a script that is meant to be used easily and securely.
The reason is that I am trying to make a character sheet generating application using python and need to be able to get a character's full name to pass as a string using a name for easy access (Charname_NLB)
However, as I'm looking to use this for more than that application, I need this to be usable for any string entered as raw input, using this alternate command.
I already have a similar piece made for input of integers and would like to integrate it into the same class, for simplicity's sake. I'll post it here, with thanks to: Mgilson and BlueKitties (from here and www.python-forum.org respectively)
def safeinput(get_num):
    num = float(raw_input(get_num))
    return num
However, if this would not safely return the same result as the base input command, could I please get a working copy? I currently have only one proof of concept to work with, and it wouldn't be accurate with truncated numbers.
Edit: By "any string", I mean specifically that the result will be stored as a string, not used as a command.
Not sure if this is what you are asking for. literal_eval is safe, but only works for literals. It's very difficult to use eval() safely if you have to sanitise the input.
>>> from ast import literal_eval
>>> def safeinput(s):
...     try:
...         return literal_eval(s)
...     except Exception:
...         return s
...
>>> repr(safeinput("1"))
'1' # converted to an int
>>> repr(safeinput("1.1"))
'1.1' # converted to a float
>>> repr(safeinput("'some string in quotes'"))
"'some string in quotes'" # converted to a string
>>> repr(safeinput("some string without quotes"))
"'some string without quotes'" # no conversion necessary

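As a rough Python 3 sketch of the literal_eval answer above (raw_input is called input there), the prompt and the parsing can be combined in one function. The `read` parameter is a hypothetical hook that is not in the original answer. It only exists so the function can be exercised without a keyboard.

```python
from ast import literal_eval


def safeinput(prompt, read=input):
    """Prompt once; return a Python literal if the text parses as one,
    otherwise return the raw text as a string.

    `read` is a hypothetical hook (not in the original answer) that lets
    the function be tried out without real keyboard input.
    """
    text = read(prompt)
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a literal (for example a plain name): keep it as a string.
        return text


# Usage with canned answers instead of real keyboard input.
answers = iter(["42", "1.5", "Sir Lancelot"])
results = [safeinput("> ", read=lambda prompt: next(answers)) for _ in range(3)]
print(results)
```

Because the text is never passed to eval(), input that looks like code is returned as a plain string, not executed.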
Garbage value when trying to store an MD5 hash in a file in Python

>>> import md5
>>> m = md5.new()
>>> a = 10111011
>>> m.update(str(a))
>>> k=m.digest()
>>> k
'\xec\x9d1\x89e\x08\xa1\xc2Y\xf6\xbf6\xfe\xe4\xe2M'
>>> f.write(str(k))
>>> f.flush()
The file f ends up filled with garbage that I can't read back for further use of the hash. Why does it give garbage when the Python terminal shows proper output? And the worst part is that the file becomes corrupt.
If you want further clues where your "garbage" (your digest!) is coming from, try print k versus print repr(k)!
You have a raw byte string. I think you want to insert a hexdigest instead? Either use k = m.hexdigest() or k = repr(m.digest()) and write that to your file.
Basically, you can choose your representation, choose what you write to your file. Which of these do you want to see?
>>> print k
�1���Y�6���M
>>> print repr(k)
'\xec\x9d1\x89e\x08\xa1\xc2Y\xf6\xbf6\xfe\xe4\xe2M'
>>> print k.encode("hex")
ec9d31896508a1c259f6bf36fee4e24d
Pass exactly the same to f.write(..) as you do to print. As you can see, in the original version you used 'k' ('str(k)' is the same as just 'k')
One possibility is that you're on Windows and haven't properly opened the file in binary mode, i.e., as 'wb'. We can't tell, since you don't show us how you opened f.
Another possibility might be that you're on Python 3 (where str means unicode), but I think in that case you'd see a leading b when you show k (and there's no md5 module in Python 3's standard library).
Opening the file the right way, with Python 2.6.4 on a Mac, I see the digest as
'\x82s\xf9\xa4\x83\x04\x87\xd0\xfdg\xee\xfa\x1f\x05B>'
both as k and as the file's contents. I don't know why you're seeing something that different, by the way; I get the same result with Python 2.4 and 2.5.
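For Python 3, where the md5 module has been replaced by hashlib, the hexdigest approach suggested above might look like the following sketch. The temporary file path and variable names are illustrative assumptions. No expected digest value is shown, since the answers above report different values.

```python
import hashlib
import os
import tempfile

a = 10111011
m = hashlib.md5(str(a).encode("ascii"))

raw = m.digest()       # 16 raw bytes, not printable text
text = m.hexdigest()   # 32 hexadecimal characters, safe to store in a text file

# Hypothetical file name inside a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "digest.txt")
with open(path, "w") as f:
    f.write(text)

with open(path) as f:
    stored = f.read()

# The stored hex text converts back to the same raw bytes.
print(bytes.fromhex(stored) == raw)
```

Writing the raw digest would also work if the file is opened in "wb" mode and read back in "rb" mode. The hex form is simply easier to inspect.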
