pefile How do I nullify the first 8 bytes of a file? - python

How do I nullify the first 8 bytes of a file?
this example does not work:
import pefile
pe = pefile.pe(In)
pe.set_dword_at_rva(0,0)
pe.set_dword_at_rva(0,4)
pe.write(Out)
pe.close()
How can I rename imported functions in the file?
this example does not work:
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    print(entry.dll)
    for imp in entry.imports:
        imp.name = 'NewIMports'
pe.write(Out)
sorry for my english

I'd propose the standard way of doing this (i.e. not using pefile). Open the file for in-place binary update so that it is not truncated:
with open('filename.bla', 'r+b') as f:
    f.write(b'\0' * 8)
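A minimal sketch of the same idea as a reusable function, using only the standard library. The name zero_prefix and the throwaway demo file are assumptions made for illustration; note that if the file is shorter than the requested count, the write extends it.

```python
import os
import tempfile


def zero_prefix(path, count=8):
    """Overwrite the first `count` bytes of an existing file with zero bytes."""
    # 'r+b' opens for reading and writing in binary mode without truncating.
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(b"\x00" * count)


# Small self-check on a throwaway 16-byte file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"MZ" + b"\xff" * 14)
    demo_path = tmp.name

zero_prefix(demo_path)
with open(demo_path, "rb") as f:
    demo_bytes = f.read()
os.remove(demo_path)
```

Only the first 8 bytes change; the rest of the file, including its length, is left as it was.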

You've got multiple problems with your code, but I'm not sure which ones are causing whatever problems you're experiencing (since you haven't explained the problems).
First, you want to set the first 8 bytes to 0, but you're using set_dword_at_rva rather than set_dword_at_offset. An RVA ("Relative Virtual Address") is an offset from the image base once the file is loaded into memory (whatever that runtime address ends up being). An offset is the position on disk from the start of the file. Since the latter is what you want, use the offset functions. (While you're at it, the qword functions are the way to set 8 bytes at a time. Using dword instead can't cause any problems except for endianness, which can't possibly matter for just setting to 0… but still, why make it harder on yourself?)
So:
pe = pefile.PE(In)
pe.set_qword_at_offset(0, 0)
pe.write(Out)
Meanwhile, if you're modifying arbitrary strings within the file, while pefile can make room for them, it does not adjust any other headers that need to be adjusted to compensate. See "Notes about the write support" in the docs for details. So, imp.name = 'NewIMports' may appear to work but generate a broken PE if the old name was shorter.
On top of that, renaming all of the imports to have the same name will definitely generate a broken PE.

Related

How do I reverse the byte order of a list of constants in Python?

I have been looking for a way to extract constants from C source files and reverse their byte order in one automated process (no manual input). So far, I've managed to utilize pycparser to do most of the heavy lifting for me and created a script that will print out all of the constants of a C file to the console. The format it prints is like this:
Constant: int, 0x243F6A88
My question is does anyone know of an intuitive way to automate this conversion process in Python? I know how to reverse the byte order with join() but I am struggling to think of a way to do this in which I can minimize the amount of manual input. Ideally, my script would print out the constants (done already) and then use some sort of regex(maybe?) to convert any constant that starts with a 0x (there are a lot of random numbers that get printed that I don't want). I hope this makes sense, thanks!
what I have so far:
import sys
from pycparser import c_ast, parse_file

class ConstantVisitor(c_ast.NodeVisitor):
    """Collects the value of every constant node in the AST."""
    def __init__(self):
        self.values = []
    def visit_Constant(self, node):
        self.values.append(node.value)
        node.show(showcoord=True)

def show_tree(filename):
    # Note that cpp is used. Provide a path to your own cpp or
    # make sure one exists in PATH.
    ast = parse_file(filename, use_cpp=True,
                     cpp_args=['-E', r'-Iutils/fake_libc_include'])
    cv = ConstantVisitor()
    cv.visit(ast)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        filename = sys.argv[1]
    else:
        filename = 'xmrig-master/src/crypto/c_blake256.c'
    show_tree(filename)
You seem to have 3 steps in the task:
1. Parse the code with pycparser - you have that
2. Find all constants (just integer constants? how about floats?) and reverse their byte order
3. Do something with the results
For (2) you can use something like the suggestions in this answer, but adjust it to the actual types you need.
For (3) it's not clear what you're trying to do; are you trying to write the constants back to the original C file? pycparser is not the best tool for that, then. You may want to use the Python bindings to Clang instead, because Clang tools are designed to modify existing code in place.
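For step (2), a minimal sketch of one way to do the conversion. It assumes the constants are 32-bit integers written as C hex literals, as in the printed example; swap_bytes and HEX_CONSTANT are illustrative names, not part of pycparser. Anything that does not start with 0x is skipped by returning None, and a value too large for the chosen width raises OverflowError.

```python
import re

# A C hex literal, optionally followed by integer suffixes such as U or UL.
HEX_CONSTANT = re.compile(r"^0[xX]([0-9a-fA-F]+)[uUlL]*$")


def swap_bytes(literal, width=4):
    """Return the hex literal with its bytes reversed, or None if it is not hex."""
    match = HEX_CONSTANT.match(literal)
    if match is None:
        return None
    value = int(match.group(1), 16)
    # Serialise as big-endian, then reinterpret the same bytes as little-endian.
    swapped = int.from_bytes(value.to_bytes(width, "big"), "little")
    return "0x{:0{}X}".format(swapped, width * 2)


print(swap_bytes("0x243F6A88"))  # prints 0x886A3F24
print(swap_bytes("42"))          # prints None (decimal constants are ignored)
```

Applied to the visitor from the question, this could be run over each entry in self.values, keeping only the results that are not None.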

PyYAML and unusual tags

I am working on a project that uses the Unity3D game engine. For some of the pipeline requirements, it is best to be able to update some files from external tools using Python. Unity's meta and anim files are in YAML, so I thought this would be straightforward enough using PyYAML.
The problem is that Unity's format uses custom attributes and I am not sure how to work with them as all the examples show more common tags used by Python and Ruby.
Here is what the top lines of a file look like:
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!74 &7400000
AnimationClip:
m_ObjectHideFlags: 0
m_PrefabParentObject: {fileID: 0}
...
When I try to read the file I get this error:
could not determine a constructor for the tag 'tag:unity3d.com,2011:74'
Now after looking at all the other questions asked, this tag scheme does not seem to resemble those questions and answers. For example this file uses "!u!" which I was unable to figure out what it means or how something similar would behave (my wild uneducated guess says it looks like an alias or namespace).
I can do a hack way and strip the tags out but that is not the ideal way to try to do this. I am looking for help on a solution that will properly handle the tags and allow me to parse & encode the data in a way that preserves the proper format.
Thanks,
-R
I also had this problem, and the internet was not very helpful. After bashing my head against this problem for 3 days, I was able to sort it out...or at least get a working solution. If anyone wants to add more info, please do. But here's what I got.
1) The documentation on Unity's YAML file format(they call it a "textual scene file" because it contains text that is human readable) - http://docs.unity3d.com/Manual/TextualSceneFormat.html
It is a YAML 1.1 compliant format. So you should be able to use PyYAML or any other Python YAML library to load up a YAML object.
Okay, great. But it doesn't work. Every YAML library has issues with this file.
2) The file is not correctly formed. It turns out, the Unity file has some syntactical issues that make YAML parsers error out on it. Specifically:
2a) At the top, it uses a %TAG directive to create an alias for the string "tag:unity3d.com,2011:". It looks like:
%TAG !u! tag:unity3d.com,2011:
What this means is anywhere you see "!u!", replace it with "tag:unity3d.com,2011:".
2b) Then it goes on to use "!u!" all over the place before each object stream. But the problem is that - to be YAML 1.1 compliant - it should actually declare the tag alias again for each stream (what the YAML spec calls a document, i.e. any time a new object starts with "--- "). Declaring it once at the top and never again is only valid for the first stream, and the next stream knows nothing about "!u!", so it errors out.
Also, this tag is useless. It basically appends "tag:unity3d.com,2011" to each entry in the stream. Which we don't care about. We already know it's a Unity YAML file. Why clutter the data?
3) The object types are given by Unity's Class ID. Here is the documentation on that:
http://docs.unity3d.com/Manual/ClassIDReference.html
Basically, each stream is defined as a new class of object...corresponding to the IDs in that link. So a "GameObject" is "1", etc. The line looks like this:
--- !u!1 &100000
So the "--- " defines a new stream. The "!u!" is an alias for "tag:unity3d.com,2011" and the "&100000" is the file ID for this object (inside this file, if something references this object, it uses this ID....remember YAML is a node-based representation, so that ID is used to denote a node connection).
The next line is the root of the YAML object, which happens to be the name of the Unity Class...example "GameObject". So it turns out we don't actually need to translate from Class ID to Human Readable node type. It's right there. If you ever need to use it, just take the root node. And if you need to construct a YAML object for Unity, just keep a dictionary around based on that documentation link to translate "GameObject" to "1", etc.
The other problem is that most YAML parsers (PyYAML is the one I tested) only support 3 types of YAML objects out of the box:
Scalar
Sequence
Mapping
You can define/extend custom nodes. But this amounts to hand writing your own YAML parser because you have to define EXPLICITLY how each YAML constructor is created, and outputs. Why would I use a Library like PyYAML, then go ahead and write my own parser to read these custom nodes? The whole point of using a library is to leverage previous work and get all that functionality from day one. I spent 2 days trying to make a new constructor for each class ID in unity. It never worked, and I got into the weeds trying to build the constructors correctly.
THE GOOD NEWS/SOLUTION:
Turns out, all the Unity nodes I've ever run into so far are basic "Mapping" nodes in YAML. So you can throw away the custom node mapping and just let PyYAML auto-detect the node type. From there, everything works great!
In PyYAML, you can pass a file object, or a string. So, my solution was to write a simple 5 line pre-parser to strip out the bits that confuse PyYAML(the bits that Unity incorrectly syntaxed) and feed this new string to PyYAML.
1) Remove line 2 entirely, or just ignore it:
%TAG !u! tag:unity3d.com,2011:
We don't care. We know it's a unity file. And the tag does nothing for us.
2) For each stream declaration, remove the tag alias ("!u!") and remove the class ID. Leave the fileID. Let PyYAML auto-detect the node as a Mapping node.
--- !u!1 &100000
becomes...
--- &100000
3) The rest, output as is.
The code for the pre-parser looks like this:
def removeUnityTagAlias(filepath):
    """
    Name: removeUnityTagAlias()
    Description: Loads a Unity textual scene file, which is in a pseudo YAML style, and strips the
        parts that PyYAML cannot handle. Then returns a string as a stream, which can be passed to PyYAML.
        Essentially removes the "!u!" tag and the class ID from each header while keeping the "&" file ID.
        PyYAML seems to handle the rest just fine after that.
    Returns: String (YAML stream as string)
    """
    result = str()
    with open(filepath, 'r') as sourceFile:
        for line in sourceFile:
            if line.startswith('--- !u!'):
                # Remove the tag and class ID, but keep the file ID.
                result += '--- ' + line.split(' ')[2].rstrip() + '\n'
            else:
                # Just copy the contents...
                result += line
    return result
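If the class IDs are needed later (for example to write the "--- !u!1 &100000" headers back out), a variant of the pre-parser above can record them while stripping. This is only a sketch: strip_unity_tags and UNITY_HEADER are hypothetical names, it works on a string rather than a file, and it assumes a header looks like "--- !u!<classID> &<fileID>", optionally followed by extra text, which is dropped.

```python
import re

# Matches document headers such as "--- !u!1 &100000"; trailing text is ignored.
UNITY_HEADER = re.compile(r"^--- !u!(\d+) &(-?\d+).*$")


def strip_unity_tags(text):
    """Return (clean_text, headers); headers holds (class_id, file_id) per document."""
    clean_lines = []
    headers = []
    for line in text.splitlines():
        match = UNITY_HEADER.match(line)
        if match:
            class_id, file_id = int(match.group(1)), int(match.group(2))
            headers.append((class_id, file_id))
            # Keep only the anchor so the parser sees a plain mapping.
            clean_lines.append("--- &%d" % file_id)
        else:
            clean_lines.append(line)
    return "\n".join(clean_lines) + "\n", headers


sample = (
    "%YAML 1.1\n"
    "%TAG !u! tag:unity3d.com,2011:\n"
    "--- !u!1 &100000\n"
    "GameObject:\n"
    "  m_Name: Sword\n"
    "--- !u!4 &400000\n"
    "Transform:\n"
    "  m_LocalScale: {x: 1, y: 1, z: 1}\n"
)
clean_text, headers = strip_unity_tags(sample)
print(headers)  # prints [(1, 100000), (4, 400000)]
```

The headers list lines up with the documents in order, so the class ID for the n-th parsed object is headers[n][0].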
To create a PyYAML object from a Unity textual scene file, call your pre-parser function on the file:
import yaml
# This fixes Unity's YAML %TAG alias issue.
fileToLoad = '/Users/vlad.dumitrascu/<SOME_PROJECT>/Client/Assets/Gear/MeleeWeapons/SomeAsset_test.prefab'
UnityStreamNoTags = removeUnityTagAlias(fileToLoad)
# safe_load_all is used because newer PyYAML versions require an explicit loader.
ListOfNodes = list(yaml.safe_load_all(UnityStreamNoTags))
# Example, print each object's name and type
for node in ListOfNodes:
    nodeType = next(iter(node))  # the single root key is the Unity class name
    if 'm_Name' in node[nodeType]:
        print('Name: ' + str(node[nodeType]['m_Name']) + ' NodeType: ' + nodeType)
    else:
        print('Name: ' + 'No Name Attribute' + ' NodeType: ' + nodeType)
Hope that helps!
-Vlad
PS. To Answer the next issue in making this usable:
You also need to walk the entire project directory and parse all ".meta" files for the "GUID", which is Unity's inter-file reference. So, when you see a reference in a Unity YAML file for something like:
m_Materials:
- {fileID: 2100000, guid: 4b191c3a6f88640689fc5ea3ec5bf3a3, type: 2}
That file is somewhere else, and you can recursively open it to find any dependencies.
I just ripped through the game project and saved a dictionary of GUID:Filepath Key:Value pairs which I can match against.
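A rough sketch of that GUID dictionary, assuming each ".meta" file contains a line of the form "guid: <32 hex digits>" and sits next to the asset it describes (so "Red.mat.meta" belongs to "Red.mat"). build_guid_map and the demo file names are made up for illustration.

```python
import os
import re
import tempfile

GUID_LINE = re.compile(r"^guid:\s*([0-9a-fA-F]{32})\s*$", re.MULTILINE)


def build_guid_map(project_root):
    """Map each GUID found in a .meta file to the asset path it describes."""
    guid_to_path = {}
    for dirpath, _dirnames, filenames in os.walk(project_root):
        for filename in filenames:
            if not filename.endswith(".meta"):
                continue
            meta_path = os.path.join(dirpath, filename)
            with open(meta_path, "r", encoding="utf-8", errors="replace") as f:
                match = GUID_LINE.search(f.read())
            if match:
                # Drop the ".meta" suffix to get the asset's own path.
                guid_to_path[match.group(1)] = meta_path[:-len(".meta")]
    return guid_to_path


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    assets = os.path.join(root, "Assets")
    os.makedirs(assets)
    with open(os.path.join(assets, "Red.mat.meta"), "w", encoding="utf-8") as f:
        f.write("fileFormatVersion: 2\nguid: 4b191c3a6f88640689fc5ea3ec5bf3a3\n")
    guid_map = build_guid_map(root)
    relative = os.path.relpath(guid_map["4b191c3a6f88640689fc5ea3ec5bf3a3"], root)
```

A reference such as {fileID: 2100000, guid: 4b191c3a6f88640689fc5ea3ec5bf3a3, type: 2} can then be resolved with a plain dictionary lookup on the guid value.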

Passing a record over a socket

I have basic socket communication set up between Python and Delphi code (text only). Now I would like to send/receive a record of data on both sides. I have a "C compatible" record and would like to pass records back and forth and have them in a usable format in Python.
I use conn.send("text") in python to send the text but how do I send/receive a buffer with python and access the record items sent in python?
Record
TPacketData = record
pID : Integer;
dataType : Integer;
size : Integer;
value : Double;
end;
I don't know much about Python, but I have done a lot of this between Delphi, C++, C# and Java, even with COBOL. Anyway, to send a record from Delphi to C, first you need to pack the record at both ends,
in Delphi
MyRecord = packed record
in C++
#pragma pack(1)
I don't know the Python equivalent, but I guess there must be a similar one. Make sure that sizeof(MyRecord) is the same at both sides. Also, before sending the records, take care of byte ordering (you know, little-endian vs big-endian): use socket.htonl() and socket.ntohl() in Python and the equivalents in Delphi, which are in the WinSock unit. Also, a Double in Delphi might not be the same as in Python; in Delphi it is 8 bytes, so check this as well, and change it to Single (4 bytes) or Extended (10 bytes), whichever matches.
If all of that matches, then you can send/receive binary records in one shot; otherwise, I'm afraid, you have to send the individual fields one by one.
I know this answer is a bit late to the game, but may at least prove useful to other people finding this question in their search-results. Because you say the Delphi code sends and receives "C compatible data" it seems that for the sake of the answer about Python's handling it is irrelevant whether it is Delphi (or any other language) on the other end...
The python struct and socket modules have all the functionality for the basic usage you describe. To send the example record you would do something like the below. For simplicity and sanity I have presumed signed integers and doubles, and packed the data in "network order" (bigendian). This can easily be a one-liner but I have split it up for verbosity and reusability's sake:
import struct
t_packet_struc = '>iiid'
t_packet_data = struct.pack(t_packet_struc, pid, data_type, size, value)
mysocket.sendall(t_packet_data)
Of course the mentioned "presumptions" don't need to be made, given tweaks to the format string, data preparation, etc. See the struct inline help for a description of the possible format strings - which can even process things like Pascal-strings... By the way, the socket module allows packing and unpacking a couple of network-specific things which struct doesn't, like IP-address strings (to their bigendian int-blob form), and allows explicit functions for converting data bigendian-to-native and vice-versa. For completeness, here is how to unpack the data packed above, on the Python end:
t_packet_size = struct.calcsize(t_packet_struc)
t_packet_data = mysocket.recv(t_packet_size)
(pid, data_type, size, value) = struct.unpack(t_packet_struc,
t_packet_data)
I know this works in Python version 2.x, and suspect it should work without changes in Python version 3.x too. Beware of one big gotcha (because it is easy to not think about, and hard to troubleshoot after the fact): aside from endianness, the prefix of the format string also selects between "standard size and alignment" (portable, no padding; prefixes '=', '<', '>' and '!') and "native size and alignment" (the platform's C layout, including padding; prefix '@' or no prefix at all). Mixing these up can yield wildly different results than you intended, without giving you a clue as to why... (there be dragons).
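One more practical point about the receiving side: a single recv call may return fewer bytes than requested, so a record can arrive in pieces. The sketch below loops until the whole record is in, and tests itself over a local socket pair. recv_exact, send_packet and recv_packet are illustrative names; the '>iiid' layout is the one assumed in the answer above. If the Delphi side writes a packed record straight from memory on x86, '<iiid' would be the matching format instead.

```python
import socket
import struct

PACKET = struct.Struct(">iiid")  # pID, dataType, size, value in network order


def recv_exact(sock, size):
    """Read exactly `size` bytes, looping because recv may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_packet(sock, pid, data_type, size, value):
    sock.sendall(PACKET.pack(pid, data_type, size, value))


def recv_packet(sock):
    return PACKET.unpack(recv_exact(sock, PACKET.size))


# Loopback demonstration with a connected socket pair.
left, right = socket.socketpair()
send_packet(left, 7, 2, 8, 3.5)
received = recv_packet(right)
left.close()
right.close()
print(received)  # prints (7, 2, 8, 3.5)
```

With a standard-size prefix the record is 20 bytes (three 4-byte integers and one 8-byte double), which is the size a packed record on the Delphi side should also have.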

Reading a Delphi binary file in Python

I have a file that was written with the following Delphi declaration ...
Type
Tfulldata = Record
dpoints, dloops : integer;
dtime, bT, sT, hI, LI : real;
tm : real;
data : array[1..armax] Of Real;
End;
...
Var:
fh: File Of Tfulldata;
I want to analyse the data in the files (many MB in size) using Python if possible - is there an easy way to read in the data and cast the data into Python objects similar in form to the Delphi records? Does anyone know of a library perhaps that does this?
This is compiled on Delphi 7 with the following options which may (or may not) be pertinent,
Record Field Alignment: 8
Pentium Safe FDIV: False
Stack Frames: False
Optimization: True
Here is the full solution, thanks to hints from KillianDS and Ritsaert Hornstra:
import struct
fh = open('my_file.dat', 'rb')
s = fh.read(40256)
vals = struct.unpack('iidddddd5025d', s)
dpoints, dloops, dtime, bT, sT, hI, LI, tm = vals[:8]
data = vals[8:]
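Since the file holds many records, the same unpacking can be repeated until the file runs out. The following is a sketch under the assumptions implied by the numbers above: Integer is 4 bytes, Real is an 8-byte double, and there is no padding, which gives 40256 bytes per record when the array has 5025 elements. read_records is a hypothetical helper name, and the demo uses a tiny 3-element array so that it stays self-contained.

```python
import os
import struct
import tempfile


def read_records(path, array_len):
    """Yield (header_fields, data) for each fixed-size record in the file."""
    # Two 32-bit ints, six doubles, then the data array; '<' = little-endian, no padding.
    record = struct.Struct("<ii6d%dd" % array_len)
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(record.size)
            if len(chunk) < record.size:
                break  # end of file (or a truncated trailing record)
            vals = record.unpack(chunk)
            yield vals[:8], vals[8:]


# Demonstration: write two small records, then read them back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo = struct.Struct("<ii6d3d")
    tmp.write(demo.pack(1, 2, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0))
    tmp.write(demo.pack(3, 4, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 40.0, 50.0, 60.0))
    demo_path = tmp.name
records = list(read_records(demo_path, array_len=3))
os.remove(demo_path)
```

For the real file the call would be read_records('my_file.dat', 5025); reading one record at a time keeps memory use flat even when the file is many MB.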
I do not know how Delphi internally stores data, but if it is simple byte-wise data (so not serialized and mangled), use struct. This way you can treat a string read from a Python file as binary data. Also, open the file in binary mode: open(path, 'rb').
Please note that when you define a record in Delphi (like a struct in C), the fields are laid out in order and in binary given the current alignment (e.g. Bytes are aligned on 1-byte boundaries, Words on 2-byte, Integers on 4-byte, etc.), but this may vary with the compiler settings.
When you say serialized to a file, you probably mean that this record is written in binary to the file and the next record is written after the first one, starting at position sizeof(structure), and so on. Delphi does not specify how things should be serialized to/from a file, so the information you give leaves us guessing.
If you want to make sure the layout is always the same without interference from any compiler settings, use a packed record.
Real can have multiple meanings (it is a 48-bit float type in older Delphi versions and later on a 64-bit float (IEEE double)).
If you cannot access the Delphi code or compile it yourself, just try to check the data with a hex editor; you should see the boundaries of the records clearly, since they start with Integers and only floats follow.
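The effect of alignment can be seen from the Python side too. This small sketch compares the size of "one integer followed by one double" under struct's native layout and under its packed, standard-size layout; the variable names are only for illustration, and the native figure depends on the platform.

```python
import struct

# One int followed by one double, under different prefix characters.
native_size = struct.calcsize("@id")    # native sizes plus native alignment padding
standard_size = struct.calcsize("<id")  # standard sizes, no padding (like a packed record)
print(native_size, standard_size)
```

On most 64-bit platforms the native layout pads the integer so that the double starts on an 8-byte boundary, which makes it larger than the 12 bytes of the packed layout. A mismatch of this kind between the writer and the reader is a common reason for values coming out as garbage.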

Trying to write to binary plist format from Python (w/PyObjC) to be fetch and read in by Cocoa Touch

I'm trying to serve a property list of search results to my iPhone app. The server is a prototype, written in Python.
First I found Python's built-in plistlib, which is awesome. I want to give search-as-you-type a shot, so I need it to be as small as possible, and xml was too big. The binary plist format seems like a good choice. Unfortunately plistlib doesn't do binary files, so step right up PyObjC.
(Segue: I'm very open to any other thoughts on how to accomplish live search. I already pared down the data as much as possible, including only displaying enough results to fill the window with the iPhone keyboard up, which is 5.)
Unfortunately, although I know Python and am getting pretty decent with Cocoa, I still don't get PyObjC.
This is the Cocoa equivalent of what I want to do:
NSArray *plist = [NSArray arrayWithContentsOfFile:read_path];
NSError *err;
NSData *data = [NSPropertyListSerialization dataWithPropertyList:plist
format:NSPropertyListBinaryFormat_v1_0
options:0 // docs say this must be 0, go figure
error:&err];
[data writeToFile:write_path atomically:YES];
I thought I should be able to do something like this, but dataWithPropertyList isn't in the NSPropertyListSerialization object's dir() listing. I should also probably convert the list to NSArray. I tried the PyObjC docs, but it's so tangential to my real work that I thought I'd try an SO SOS, too.
from Cocoa import NSArray, NSData, NSPropertyListSerialization, NSPropertyListBinaryFormat_v1_0
plist = [dict(key1='val', key2='val2'), dict(key1='val', key2='val2')]
NSPropertyListSerialization.dataWithPropertyList_format_options_error(plist,
NSPropertyListBinaryFormat_v1_0,
?,
?)
This is how I'm reading in the plist on the iPhone side.
NSData *data = [NSData dataWithContentsOfURL:url];
NSPropertyListFormat format;
NSString *err;
id it = [NSPropertyListSerialization
propertyListFromData:data
mutabilityOption:0
format:&format
errorDescription:&err];
Happy to clarify if any of this doesn't make sense.
I believe the correct function name is
NSPropertyListSerialization.dataWithPropertyList_format_options_error_
because of the ending :.
(BTW, if the object is always an array or dictionary, -writeToFile:atomically: will write the plist (as XML format) already.)
As KennyTM said, you're missing the trailing underscore in the method name. In PyObjC you need to take the Objective-C selector name (dataWithPropertyList:format:options:error:) and replace all of the colons with underscores (don't forget the last colon, too!). That gives you dataWithPropertyList_format_options_error_ (note the trailing underscore). Also, for the error parameter, you can just use None. That makes your code look like this:
bplist = NSPropertyListSerialization.dataWithPropertyList_format_options_error_(
plist,
NSPropertyListBinaryFormat_v1_0,
0,
None)
# bplist is an NSData object that you can operate on directly or
# write to a file...
bplist.writeToFile_atomically_(pathToFile, True)
If you test the resulting file, you'll see that it's a Binary PList file, as desired:
Jagaroth:~/Desktop $ file test.plist
test.plist: Apple binary property list
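For a server that is not running on a Mac, or where PyObjC is not available, newer Python versions offer another route: since Python 3.4 the standard plistlib module can write the binary format itself. A minimal sketch using the sample data from the question:

```python
import plistlib

plist = [dict(key1="val", key2="val2"), dict(key1="val", key2="val2")]

# FMT_BINARY selects the binary property list format (Python 3.4+).
bplist = plistlib.dumps(plist, fmt=plistlib.FMT_BINARY)

# Reading it back detects the format automatically.
round_tripped = plistlib.loads(bplist)
print(bplist[:8])  # prints b'bplist00'
```

The resulting bytes can be sent as the HTTP response body directly, or written to a file opened in 'wb' mode.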
