Reading raw RAM data in Python - python

I took a dump of RAM data using a freeware tool called DumpIt (http://www.downloadcrew.com/article/23854-dumpit). The software saved the RAM data as a raw file, which can be read using a hex editor (http://www.downloadcrew.com/article/10814-hxd).
How do I get the string data visible in the hex editor (see image) in Python?
For example: I want to read the raw file generated by DumpIt in Python and extract the string "http://www.downloadcrew.com/article/23854-dumpit" shown in the red box in the image.
EDIT
I tried using this code, but it just stalls and nothing happens:
#!/usr/bin/python
import binascii

filename = "LEMARC-20140401-181003.raw"
data = ""
with open(filename, "rb") as f:
    for line in f:
        data += line
data = binascii.unhexlify(data)
with open("out", "w") as g:
    g.write(data)

In Python 2:
>>> "437c2123".decode('hex')
'C|!#'
In Python 3 (this also works in Python 2; before 2.6 you can't use the b prefix on the string):
>>> import binascii
>>> binascii.unhexlify(b"437c2123")
b'C|!#'
So in your case, decode the entire hex string to get the ASCII, and then you can extract the URL with a regex or your own parsing function.
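Note that a DumpIt .raw file on disk holds raw bytes rather than hex text, so another option is to skip unhexlify entirely and scan the bytes directly for printable runs, similar to the Unix strings utility. A minimal sketch (the function name and the sample bytes are my own, not from DumpIt output):

```python
import re

def extract_strings(data, min_len=4):
    """Return all runs of printable ASCII characters of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

# Made-up sample; for a real dump, read the .raw file with open(filename, "rb")
sample = b"\x00\x01http://www.downloadcrew.com/article/23854-dumpit\x00\xffjunk"
print(extract_strings(sample))
```

On a multi-gigabyte dump you would read and scan the file in chunks rather than loading it all at once.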

Related

Why is my hex string not behaving like a string?

My purpose here is to turn a small file into a QR code. So I used binascii.hexlify() to get the hexadecimal representation of the file.
With Pillow I will then build the QR code; this QR code will be read by another script that will turn it back into a file.
import binascii

with open(r"D:\test.png", 'rb') as f:
    content = f.read()
hexstr = str(binascii.hexlify(content))
#print(hexstr)
print(hexstr[:5])
The weird thing here is that hexstr is 64eded8d8d8d6ad3bcefd7a616864b4aea169786434393975eecb73a1b896cae4e80da592d7dcf2... but hexstr[:5] is b'895 (I was expecting 64ede).
Why is that?
Thanks.
PS: I'm using Python 3.6 x64 on a Windows 10 machine.
I'm not sure why you're getting b'895, but when you run hexstr = str(binascii.hexlify(content)) it gives hexstr the value "b'64ede...'". The string representation of the bytes sequence includes the b' prefix. I think what you want is hexstr = binascii.hexlify(content).decode(). This will decode the binary string into the corresponding ASCII.
import binascii

with open(r"D:\test.png", 'rb') as f:
    content = f.read()
hexstr = binascii.hexlify(content).decode()
#print(hexstr)
print(hexstr[:5])
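To see why the slice starts with b', compare the two approaches on a tiny byte string (the sample bytes here are arbitrary):

```python
import binascii

raw = b"\x64\xed"
as_str = str(binascii.hexlify(raw))       # "b'64ed'" - the b' prefix is part of the string
decoded = binascii.hexlify(raw).decode()  # "64ed"   - plain text

print(as_str[:3])   # b'6
print(decoded[:3])  # 64e
```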

Cannot properly decode Base64 string from Power Apps to audio file

I am trying to properly decode a Base64 string from Power Apps to an audio file. The point is: I do decode it and I can play it. But as soon as I try to convert it using ffmpeg or any website, all kinds of errors are thrown. I have also tried changing the format (aac, weba, m4a, wav, mp3, ogg, 3gp, caf), but none of them could be converted to another format.
PS: If I decode the string (which is too big to post here) directly using a website, then the audio file can finally be converted, indicating that the issue is in the code or even in the Python library.
=============== CODE ===============
import os
import base64

mainDir = os.path.dirname(__file__)
audioFileOGG = os.path.join(mainDir, "myAudio.ogg")
audioFile3GP = os.path.join(mainDir, "myAudio.3gp")
audioFileAAC = os.path.join(mainDir, "myAudio.aac")
binaryFileTXT = os.path.join(mainDir, 'binaryData.txt')

with open(binaryFileTXT, 'rb') as f:
    audioData = f.readlines()
audioData = audioData[0]

with open(audioFileAAC, "wb") as f:
    f.write(base64.b64decode(audioData))
Result: the audio file is playable, but it cannot be converted to any other format (I need *.wav).
What am I missing here?
I found the issue myself: in order to decode the Base64 string, one must first remove the header (e.g. "data:audio/webm;base64,"). Then it works!
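A minimal sketch of that fix (the helper name and the sample payload are my own, standing in for the real Power Apps string):

```python
import base64

def decode_data_uri(payload):
    """Strip an optional data-URI header such as 'data:audio/webm;base64,' and decode."""
    if payload.startswith("data:"):
        payload = payload.split(",", 1)[1]  # keep only the part after the comma
    return base64.b64decode(payload)

# made-up payload standing in for the real Base64 string
sample = "data:audio/webm;base64," + base64.b64encode(b"fake audio bytes").decode()
print(decode_data_uri(sample))
```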

Any way to get a correct conversion of Unicode text format data to CSV in Python?

I am accessing a dataset that lives on an FTP server. After I downloaded the data, I used pandas to read it as CSV, but I got an encoding error. The file has a .csv extension, but when I opened it with MS Excel the data was in Unicode Text format. I want to convert those datasets stored in Unicode Text format. How can I make this happen? Any idea how to get this done?
my attempt:
from ftplib import FTP
import os

def mydef():
    defaultIP = ''
    username = 'cat'
    password = 'cat'
    ftp = FTP(defaultIP, user=username, passwd=password)
    ftp.dir()
    filenames = ftp.nlst()
    for filename in filenames:
        local_filename = os.path.join('C:\\Users\\me', filename)
        file = open(local_filename, 'wb')
        ftp.retrbinary('RETR ' + filename, file.write)
        file.close()
    ftp.quit()
Then I tried this to get the correct encoding:
mydef.encode('utf-8').splitlines()
but this one is not working for me. I used this solution.
Here is an output snippet of the above code:
b'\xff\xfeF\x00L\x00O\x00W\x00\t\x00C\x00T\x00Y\x00_\x00R\x00P\x00T\x00\t\x00R\x00E\x00P\x00O\x00R\x00T\x00E\x00R\x00\t\x00C\x00T\x00Y\x00_\x00P\x00T\x00N\x00\t\x00P\x00A\x00R\x00T\x00N\x00E\x00R\x00\t\x00C\x00O\x00M\x00M\x00O\x00D\x00I\x00T\x00Y\x00\t\x00D\x00E\x00S\x00C\x00R\x00I\x00P\x00T\x00I\x00O\x00N\x00\t'
Expected output
The expected output of this dataset should be normal CSV data, such as common trade data, but the encoding doesn't work for me. I tried different encodings to get a correct conversion of the CSV data, but none of them worked. How can I make this work? Any idea how to get this done? Thanks.
EDIT: I have changed it - I now remove 2 bytes at the beginning (the BOM) and one byte at the end, because the data is incomplete (every char needs 2 bytes).
It seems it is not UTF-8 but UTF-16 with a BOM.
If I remove the first two bytes (the BOM - Byte Order Mark) and the last byte (because it is incomplete; every char needs two bytes) and use decode('utf-16-le'):
b'F\x00L\x00O\x00W\x00\t\x00C\x00T\x00Y\x00_\x00R\x00P\x00T\x00\t\x00R\x00E\x00P\x00O\x00R\x00T\x00E\x00R\x00\t\x00C\x00T\x00Y\x00_\x00P\x00T\x00N\x00\t\x00P\x00A\x00R\x00T\x00N\x00E\x00R\x00\t\x00C\x00O\x00M\x00M\x00O\x00D\x00I\x00T\x00Y\x00\t\x00D\x00E\x00S\x00C\x00R\x00I\x00P\x00T\x00I\x00O\x00N\x00'.decode('utf-16-le')
then I get
'FLOW\tCTY_RPT\tREPORTER\tCTY_PTN\tPARTNER\tCOMMODITY\tDESCRIPTION'
EDIT: meanwhile I also found Python - Decode UTF-16 file with BOM
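Instead of slicing the BOM off by hand, the 'utf-16' codec (without the -le/-be suffix) detects and strips the BOM automatically. A small sketch with made-up sample bytes:

```python
# The 'utf-16' codec reads the leading BOM, picks the byte order, and strips it,
# so there is no need to remove bytes manually. Sample data below is made up.
raw = "\ufeffFLOW\tCTY_RPT\tREPORTER".encode("utf-16-le")  # bytes as found on disk
text = raw.decode("utf-16")
print(text)  # FLOW	CTY_RPT	REPORTER

# For a whole file, open("data.csv", encoding="utf-16") works the same way
# (the filename here is just an example).
```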

Python bz2 - text vs. interactive console (data stream)

I was using bz2 earlier to try to decompress an input. The input I wanted to decompress was already in compressed format, so I decided to paste it into the interactive Python console:
>>> import bz2
>>> bz2.decompress(input)
This worked just fine without any errors. However, I got different results when I tried to extract the text from a html file and then decompress it:
file = open("example.html", "r")
contents = file.read()
# Insert code to pull out the text, which is of type 'str'
result = bz2.decompress(parsedString)
I've checked the string I parsed against the original one, and it looks identical. Furthermore, when I copy and paste the string I wish to decompress into my .py file (basically enclosing it in double quotes ""), it works fine. I have also tried opening with "rb" in hopes that it'll treat the .html file as binary, though that failed to work as well.
My questions are: what is the difference between these two strings? They are both of type 'str', so I'm assuming there is an underlying difference I am missing. Furthermore, how would I go about retrieving the bz2 content from the .html in such a way that it will not be treated as an incorrect datastream? Any help is appreciated. Thanks!
My guess is that the html file has the text representation of the data instead of the actual binary data in the file itself.
For instance take a look at the following code:
>>> t = '\x80'
>>> t
'\x80'
But say I create a text file with the contents \x80 and do:
with open('file') as f:
    t = f.read()
print t
I would get back:
'\\x80'
If this is the case, you could use eval to get the desired result:
result = bz2.decompress(eval('"' + parsedString + '"'))
Just make sure that you only do this for trusted data.
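On Python 3 (and as a safer variant of the eval trick on Python 2.6+), ast.literal_eval only evaluates literals, so arbitrary code embedded in the file cannot run. A sketch with a made-up escape string standing in for the parsed html contents:

```python
import ast

# Text escapes as they might appear in the html file (made-up stand-in data;
# assumes the string contains no unescaped double quotes)
parsed_string = r"\x42\x5a\x68"
raw = ast.literal_eval('b"' + parsed_string + '"')
print(raw)  # b'BZh' - incidentally, the magic bytes that start a bz2 stream
```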

Encoding issue when writing to text file, with Python

I'm writing a short Python script to 'manually' rearrange a CSV file into proper JSON syntax. From the input file I use readlines() to turn the file into a list of rows, which I manipulate and concatenate into a single string that is then output to a separate .txt file. The output, however, contains gibberish instead of the Hebrew characters that were present in the input file, and the output is double-spaced horizontally (a whitespace character is added between each character). As far as I can tell, the problem has to do with the encoding, but I haven't been able to figure out what. When I check the encoding of the input and output files (using the .encoding attribute), they both return None, which means they use the system default. Technical details: Python 2.7, Windows 7.
While there are a number of questions out there on this topic, I didn't find a direct answer to my problem.
Detecting the system defaults won't help me in this case, because I need the program to be portable.
Here's the code:
def txt_to_JSON(csv_list):
    ...some manipulation of the list...
    return JSON_string

file_name = "input_file.txt"
my_file = open(file_name)

# make each line of the input file a value in a list
lines = my_file.readlines()

# break up each line into a list such that each 'column' is a value in that list
for i in range(0, len(lines)):
    lines[i] = lines[i].split("\t")

J_string = txt_to_JSON(lines)

json_file = open("output_file.txt", "w+")
json_file.write(J_string)
json_file.close()
All data needs to be encoded to be stored on disk. If you don't know the encoding, the best you can do is guess. There's a library for that: https://pypi.python.org/pypi/chardet
I highly recommend Ned Batchelder's presentation
http://nedbatchelder.com/text/unipain.html
for details.
There's an explanation about the use of "unicode" as an encoding on windows: What's the difference between Unicode and UTF-8?
TLDR:
Microsoft uses UTF-16 as the encoding for Unicode strings, but decided to call it "unicode", as they also use it internally.
Even if Python2 is a bit lenient as to string/unicode conversions, you should get used to always decode on input and encode on output.
In your case
filename = 'where your data lives'
with open(filename, 'rb') as f:
    encoded_data = f.read()
decoded_data = encoded_data.decode("utf-16")

# do stuff, resulting in result (all on unicode strings)
result = text_to_json(decoded_data)
encoded_result = result.encode("utf-16")  # really, just using UTF-8 for everything makes things a lot easier

outfile = 'where your data goes'
with open(outfile, 'wb') as f:
    f.write(encoded_result)
You need to tell Python to use the Unicode character encoding to decode the Hebrew characters.
Here's a link to how you can read Unicode characters in Python: Character reading from file in Python
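Tying this together, io.open accepts an encoding argument on both Python 2 and 3, so you can decode on read and encode on write without manual .decode()/.encode() calls. A minimal round-trip sketch (the filename and the UTF-8 choice are my own assumptions, not from the question):

```python
import io

# Round-trip: encode on output, decode on input; Hebrew survives intact.
text = u"\u05e9\u05dc\u05d5\u05dd"  # Hebrew "shalom"

with io.open("output_file.txt", "w", encoding="utf-8") as f:
    f.write(text)

with io.open("output_file.txt", "r", encoding="utf-8") as f:
    roundtrip = f.read()

print(roundtrip == text)  # True
```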