Inconsistent python mmap behaviour with /dev/mem

I've been working on a project in PHP which requires mmap'ing /dev/mem to gain access to the hardware peripheral registers. As PHP has no native mmap, the simplest way I could think of to achieve this was to spawn a Python subprocess, which communicates with the PHP app via stdin/stdout.
I have run into a strange issue which only occurs while reading addresses, not writing them. The subprocess functions correctly (for reading) with the following:
mem.write(sys.stdin.read(length))
So, I expected that I could conversely write memory segments back to the parent using the following:
sys.stdout.write(mem.read(length))
If I mmap a standard file, both commands work as expected (regardless of the length of the read/write). If I map the /dev/mem "file," I get nonsense back during the read. It's worth noting that the area I'm mapping is outside the physical memory address space and is used to access the peripheral registers.
The work-around I have in place is the following:
for x in range(0, length / 4):
    sys.stdout.write(struct.pack('L', struct.unpack_from('L', mem, mem.tell())[0]))
    mem.seek(4, os.SEEK_CUR)
This makes the reads behave as expected.
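A slightly tidier version of the same word-by-word workaround (a sketch assuming little-endian 32-bit registers and a length that is a multiple of 4; the helper name is made up):

```python
import struct

def read_words(mem, length):
    # Read `length` bytes from an mmap'd region as aligned 32-bit
    # words; many peripheral register blocks only behave correctly
    # under whole-word access, not byte-wise mem.read().
    out = []
    base = mem.tell()
    for off in range(0, length, 4):
        # unpack_from reads 4 bytes at base+off without moving the
        # mmap's position; '<I' = little-endian unsigned 32-bit.
        word, = struct.unpack_from('<I', mem, base + off)
        out.append(struct.pack('<I', word))
    mem.seek(base + length)  # leave the position where mem.read() would
    return b''.join(out)
```

On the Python 2 used in the question the b'' prefixes are harmless; on Python 3 you would write the result to sys.stdout.buffer rather than sys.stdout.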
What I can't understand is why reading from the address using unpack_from should see anything different to reading it directly. The same (non-working) thing occurs if I try to just assign a read to a variable.
In case additional context is helpful, I'm running this on a Raspberry Pi/Debian 8. The file that contains the above issue is here. The project that uses it is here.

Related

Does python "file write()" method guarantee datas have been correctly written?

I'm new to Python and I'm writing a script to patch a file with something like:
def getPatchDatas(file):
    f = open(file, "rb")
    datas = f.read()
    f.close()
    return datas

f = open("myfile.bin", "r+b")
f.seek(0xC020)
f.write(getPatchDatas("mypatch.bin"))
f.close()
I would like to be sure the patch has been applied correctly.
So, if no error / exception is raised, does it mean I'm 100% sure the patch has been correctly written?
Or is it better to double check with something like:
f = open("myfile.bin","rb")
f.seek(0xC020)
if not f.read(0x20) == getPatchDatas("mypatch.bin"):
print "Patch not applied correctly!"
f.close()
??
Thanks.
No, it doesn't, but roughly speaking it does. It depends how much it matters.
Anything could go wrong - it could be a consumer hard disk which lies to the operating system about when it has finished writing data to disk. It could be corrupted in memory and that corrupt version gets written to disk, or it could be corrupted inside the disk during writing by electrical or physical problems.
It could be intercepted by kernel modules on Linux, filter drivers on Windows or a FUSE filesystem provider which doesn't actually support writing but pretends it does, meaning nothing was written.
It could be screwed up by a corrupted Python install where exceptions don't work or were deliberately hacked out of it, or file objects monkeypatched, or accidentally run in an uncommon implementation of Python which fakes supporting files but is otherwise identical.
These kinds of reasons are why servers have server-class hardware with higher tolerances to temperature and electrical variation, error-checking and correcting memory (ECC), battery-backed RAID controllers, checksumming filesystems such as ZFS, uninterruptible power supplies, and so on.
But, as far as normal people and low risk things go - if it's written without error, it's as good as written. Double-checking makes sense - especially as it's that easy. It's nice to know if something has failed.
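If you want stronger assurance than a bare write() gives, one common belt-and-braces pattern is flush + os.fsync + read-back (a sketch; the patch_file helper and its arguments are made up for illustration):

```python
import os

def patch_file(path, offset, data):
    # Write data at offset, then flush Python's user-space buffer and
    # ask the kernel to push the data toward the device before closing.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
        f.flush()             # empty Python's buffer into the OS
        os.fsync(f.fileno())  # ask the OS to write to the device

    # Read back and compare: an inexpensive final sanity check.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(len(data)) == data
```

Even fsync can be defeated by hardware that lies about completed writes, as noted above, but this closes the most common gaps.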
Within a single process, it is.
Across multiple processes (e.g. one process writing while another reads; even if you ensure the read only happens after write() is called, the write needs some time to finish), you may need a file lock.

Byte limit when transferring Python objects between Processes using a Pipe?

I have a custom simulator (for biology) running on a 64-bit Linux (kernel
version 2.6.28.4) machine using a 64-bit Python 3.3.0 CPython interpreter.
Because the simulator depends on many independent experiments for valid results,
I built in parallel processing for running experiments. Communication between
the processes primarily occurs under a producer-consumer pattern with managed
multiprocessing Queues
(doc).
The rundown of the architecture is as follows:
a master process that handles spawning and managing Processes and the various Queues
N worker processes that do simulations
1 result consumer process that consumes the results of a simulation and sorts and analyzes the results
The master process and the worker processes communicate via an input Queue.
Similarly, the worker processes place their results in an output Queue which
the result consumer process consumes items from. The final ResultConsumer
object is passed via a multiprocessing Pipe
(doc)
back to the master process.
Everything works fine until it tries to pass the ResultConsumer object back to
the master process via the Pipe:
Traceback (most recent call last):
File "/home/cmccorma/.local/lib/python3.3/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/cmccorma/.local/lib/python3.3/multiprocessing/process.py", line 95, in run
self._target(*self._args, **self._kwargs)
File "DomainArchitectureGenerator.py", line 93, in ResultsConsumerHandler
pipeConn.send(resCon)
File "/home/cmccorma/.local/lib/python3.3/multiprocessing/connection.py", line 207, in send
self._send_bytes(buf.getbuffer())
File "/home/cmccorma/.local/lib/python3.3/multiprocessing/connection.py", line 394, in _send_bytes
self._send(struct.pack("!i", n))
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
I understand the first two traces (unhandled exits in the Process library),
and the third is my line of code for sending the ResultConsumer object down the
Pipe to the master process. The last two traces are where it gets
interesting. A Pipe pickles any object that is sent to it and passes the
resulting bytes to the other end (matching connection) where it is unpickled
upon running recv(). self._send_bytes(buf.getbuffer()) is attempting to
send the bytes of the pickled object. self._send(struct.pack("!i", n)) is
attempting to pack a struct with an integer (network/big-endian) of length n,
where n is the length of the buffer passed in as a parameter (the struct
library handles conversions between Python values and C structs represented as
Python strings, see the doc).
This error only occurs when attempting a lot of experiments: e.g. 10 experiments
will not cause it, but 1000 will consistently (all other parameters being constant). My best
hypothesis so far as to why struct.error is thrown is that the number of bytes
trying to be pushed down the pipe exceeds 2^32-1 (2147483647), or ~2 GB.
So my question is two-fold:
I'm getting stuck with my investigations as struct.py essentially just
imports from _struct and I have no idea where that is.
The byte limit seems arbitrary given that the underlying architecture is all
64-bit. So, why can't I pass anything larger than that? Additionally, if I
can't change this, are there any good (read: easy) workarounds to this issue?
Note: I don't think that using a Queue in place of a Pipe will solve the issue,
as I suspect that Queues use a similar pickling intermediate step. EDIT: This note is entirely incorrect, as pointed out in abarnert's answer.
I'm getting stuck with my investigations as struct.py essentially just imports from _struct and I have no idea where that is.
In CPython, _struct is a C extension module built from _struct.c in the Modules directory in the source tree. You can find the code online here.
Whenever foo.py does an import _foo, that's almost always a C extension module, usually built from _foo.c. And if you can't find a foo.py at all, it's probably a C extension module, built from _foomodule.c.
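You can confirm the relationship from the interpreter (in CPython; a quick check, not something the fix depends on):

```python
import struct
import _struct

# struct.py does `from _struct import *`, so the public functions are
# the C extension's functions re-exported under the friendlier name:
print(struct.pack is _struct.pack)  # True
print(_struct)  # shows where the extension module was loaded from
```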
It's also often worth looking at the equivalent PyPy source, even if you're not using PyPy. They reimplement almost all extension modules in pure Python—and for the remainder (including this case), the underlying "extension language" is RPython, not C.
However, in this case, you don't need to know anything about how struct is working beyond what's in the docs.
The byte limit seems arbitrary given that the underlying architecture is all 64-bit.
Look at the code it's calling:
self._send(struct.pack("!i", n))
If you look at the documentation, the 'i' format character explicitly means "4-byte C integer", not "whatever ssize_t is". For that, you'd have to use 'n'. Or you might want to explicitly use a long long, with 'q'.
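You can reproduce the limit in isolation; this just demonstrates the two format codes, independent of multiprocessing:

```python
import struct

n = 1 << 31  # 2147483648: one past the maximum signed 32-bit value

# '!i' is a 4-byte signed big-endian int, so this overflows:
try:
    struct.pack("!i", n)
except struct.error as e:
    print("'!i' failed:", e)

# '!q' is an 8-byte signed int and packs lengths up to 2**63 - 1:
assert len(struct.pack("!q", n)) == 8
```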
You can monkeypatch multiprocessing to use struct.pack('!q', n), or encode the length in some way other than struct. This will, of course, break compatibility with non-patched multiprocessing, which could be a problem if you're trying to do distributed processing across multiple computers or something. But it should be pretty simple:
def _send_bytes(self, buf):
    # For wire compatibility with 3.2 and lower
    n = len(buf)
    self._send(struct.pack("!q", n))  # was !i
    # The condition is necessary to avoid "broken pipe" errors
    # when sending a 0-length buffer if the other end closed the pipe.
    if n > 0:
        self._send(buf)

def _recv_bytes(self, maxsize=None):
    buf = self._recv(8)  # was 4
    size, = struct.unpack("!q", buf.getvalue())  # was !i
    if maxsize is not None and size > maxsize:
        return None
    return self._recv(size)
Of course there's no guarantee that this change is sufficient; you'll want to read through the rest of the surrounding code and test the hell out of it.
Note: I suspect that using a Queue in place of a Pipe will not solve the issue, as I suspect that Queues use a similar pickling intermediate step.
Well, the problem has nothing to do with pickling. Pipe isn't using pickle to send the length, it's using struct. You can verify that pickle wouldn't have this problem: pickle.loads(pickle.dumps(1<<100)) == 1<<100 will return True.
(In earlier versions, pickle also had problems with huge objects—e.g., a list of 2G elements—which could have caused problems at a scale about 8x as high as the one you're currently hitting. But that's been fixed by 3.3.)
Meanwhile… wouldn't it be faster to just try it and see, instead of digging through the source to try to figure out whether it would work?
Also, are you sure you really want to pass around a 2GB data structure by implicit pickling?
If I were doing something that slow and memory-hungry, I'd prefer to make that explicit—e.g., pickle to a tempfile and send the path or fd. (If you're using numpy or pandas or something, use its binary file format instead of pickle, but same idea.)
Or, even better, share the data. Yes, mutable shared state is bad… but sharing immutable objects is fine. Whatever you've got 2GB of, can you put it in a multiprocessing.Array, or put it in a ctypes array or struct (of arrays or structs of …) that you can share via multiprocessing.sharedctypes, or ctypes it out of a file that you mmap on both sides, or…? There's a bit of extra code to define and pick apart the structures, but when the benefits are likely to be this big, it's worth trying.
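For instance, a minimal sketch of the shared-array idea (the names and sizes here are made up for illustration; the point is that the payload itself never gets pickled or sent down a pipe):

```python
import multiprocessing as mp

def summer(shared, q):
    # The child reads the shared buffer directly; only the tiny
    # result travels through the Queue.
    q.put(sum(shared))

if __name__ == "__main__":
    # 'd' = C double. lock=False is fine for read-only sharing.
    data = mp.Array('d', [0.5] * 1000, lock=False)
    q = mp.Queue()
    p = mp.Process(target=summer, args=(data, q))
    p.start()
    result = q.get()
    p.join()
    print(result)  # 500.0
```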
Finally, when you think you've found a bug/obvious missing feature/unreasonable limitation in Python, it's worth looking at the bug tracker. It looks like issue 17560: problem using multiprocessing with really big objects? is exactly your problem, and has lots of information, including suggested workarounds.

Using GDB in a virtual machine to debug Python ctypes segfaults

I have been working on a project that has been working fine on a dedicated Linux CentOS system.
The general idea is that there is a Python workflow manager that calls shared libraries written in C using ctypes. It works fine.
However, the need for me to have a local instance of the project for development purposes has come up. I set up a Linux Mint virtual machine with VMWare under Windows 7. For the most part, everything works fine.
The problem is one module is crashing with a segmentation fault upon calling a function in one of the shared libraries. Normally this is OK, and on the dedicated Linux machine, using something like "gdb python corename" lets me see exactly where it crashed and work the problem out.
However, with the local setup I am having problems. The thing I notice most is that GDB does not report correct memory addresses. It's a huge project so I can't post all the code, but I'll give a rundown:
Python module creates a "file_path" variable, a string. It first passes this to a certain shared library to load the file. If I execute the command, in python
hex(id(file_path))
it will return something like '46cb4ec'. In the first shared library, which is C, I start it out with
printf("file_pathaddress = %x\n", &file_path[0]);
and it outputs 'file_path address = 46cb4ec', which is the same as I get through Python's 'id()' function. I guess this is expected...?
Anyway, I send this same variable to another shared library, but it crashes immediately on this call. If I analyze the core file, it shows the crash on the function call itself, not on a line within the function. The strange thing, though, is that it outputs something like:
Program terminated with signal 11, Segmentation fault.
#0 0x00007f124448c9fc in seam_processor (irm_filename=<error reading variable: Cannot access memory at address 0x7fff5fab51b8>,
seam_data_path=<error reading variable: Cannot access memory at address 0x7fff5fab51b0>,
full_data_path=<error reading variable: Cannot access memory at address 0x7fff5fab51a8>, ranges=<error reading variable: Cannot access memory at address 0x7fff5fab51a0>,
job_id=<error reading variable: Cannot access memory at address 0x7fff5fab519c>, job_owner=<error reading variable: Cannot access memory at address 0x7fff5fab5198>,
y_tile_in=1, x_tile_in=1, fc_int=10650000000, atmos_props=..., surf_props=..., extra_props=<error reading variable: Cannot access memory at address 0x7fff5fab5190>,
gascalc_ftype=513, len_gas_sectrum=16, vect_spec_fq=<error reading variable: Cannot access memory at address 0x7fff5fab5188>, surfscat_theta_inc_vector_size=6,
surfscat_theta_inc_vector=<error reading variable: Cannot access memory at address 0x7fff5fab5180>, surfscat_phi_inc_vector_size=6,
surfscat_phi_inc_vector=<error reading variable: Cannot access memory at address 0x7fff5fab5178>, surfscat_theta_scat_vector_size=6,
surfscat_theta_scat_vector=<error reading variable: Cannot access memory at address 0x7fff5fab5170>, surfscat_phi_scat_vector_size=6,
surfscat_phi_scat_vector=<error reading variable: Cannot access memory at address 0x7fff5fab5168>) at src/seam_processor.c:47
So, what I can't figure out is why GDB is reporting these memory addresses as such. In this case, the 'irm_filename' variable is what the Python passed in as 'file_path', so its address should be what the other library and the id() function report, 0x46CB4EC. Why is it different? However, the strange thing is that some of the variables are just fine, like 'y_tile_in'. If I do in gdb:
(gdb) print &y_tile_in
$1 = (int *) 0x7fff60543f80
So, although it can read this memory address, this is not the same as what Python's id() would report, or what a similar C printf() of the address would report in a library that doesn't crash. Also, these memory addresses are really big numbers, larger than the amount of memory I have by far... What do they really mean?
My question, then, is what exactly is going on here? Is the fact that this is being run in a virtual machine doing this? Is there some mapping going on? I haven't been able to find anything online about something different I'd have to do if using gdb in a virtual machine, so I'm at a loss...
Anyone know what's going on?
Thanks.
EDIT
So, the problem has gotten stranger. Basically, I commented out all the code from the library that does anything and left the function call the same. When I do this and run it in gdb with a breakpoint, all of the memory addresses that it prints in the function call are normal, match the Python id() function and match printf() on the addresses.
I started un-commenting out code to see what the problem could be. The problem is with a declaration:
double nrcs_h_d[MAX_NINC_S*MAX_SCAT_S];
double nrcs_v_d[MAX_NINC_S*MAX_SCAT_S];
If I comment out both lines, there is no crash. If I comment out only the second line, there is no crash. If no lines are commented out, though, it crashes.
The strange thing is that MAX_NINC_S and MAX_SCAT_S both equal 500. So, these arrays are only a couple of megabytes in size... Elsewhere in the code arrays of several hundred megabytes are allocated just fine.
Also, if I replace the above lines with:
double *nrcs_h_d, *nrcs_v_d;
nrcs_h_d = (double *)malloc(MAX_NINC_S*MAX_SCAT_S*sizeof(double));
nrcs_v_d = (double *)malloc(MAX_NINC_S*MAX_SCAT_S*sizeof(double));
It seems to work fine... So apparently the problem was related to trying to allocate too much on the stack.
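To put rough numbers on that hypothesis (a sketch; the resource module is Unix-only, and threads created inside the process may get a much smaller stack than the main thread's limit):

```python
import resource

# Size of the two local arrays from the question:
MAX_NINC_S = MAX_SCAT_S = 500                   # values from the question
frame_bytes = 2 * MAX_NINC_S * MAX_SCAT_S * 8   # two arrays of C doubles
print(frame_bytes)  # 4000000 bytes, roughly 3.8 MiB

# The soft/hard limits on the main thread's stack; -1 means unlimited.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print(soft, hard)
```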
So, the questions become:
Why does gdb not show this is the line of the code where the segmentation fault happens, but instead says it is the function call?
Why do the memory addresses of the core dump file seem to get all screwed up if that allocation is made?
Thanks.
Recall that stack space grows downward, while heap space grows upward; addresses close to the top of your virtual memory space (e.g. 0x7fff5fab51b0) are stack-allocated variables, while addresses closer to the bottom (e.g. 0x46cb4ec) are heap-allocated variables. Note that virtual memory space is usually much, much larger than physical memory; it looks like your operating system and architecture support a 47-bit user address space, i.e. up to 128 TiB of virtual memory.
Since Python depends heavily on dynamic memory allocation, it will end up putting its objects on the heap, which is why id() tends to return addresses on the low side. If C code copies any values to stack-allocated variables and subsequently tries to call functions using those local copies, you'll see addresses on the high side again.
There is a line number provided: src/seam_processor.c:47. Is there anything interesting there? The various memory addresses that GDB is complaining about are various memory addresses on the stack, all of which are sequential, and almost all of which themselves appear to be pointers (since they are all 8 bytes wide).
(This is the best answer I can give with the information at hand; feel free to suggest revisions or provide additional information.)
One possible explanation is that the allocation of data for local variables in a function occurs before any code in a particular function is executed.
The compiler works out the amount of required stack space and allocates it before getting to the statements in the function. So the actual execution of your function goes a bit like this:
Allocate the amount of storage for all local function variables,
Start executing the statements in the function
In that way, when you call, say, function2 from function1, the local variables of function1 will not be disturbed by the local variables of function2.
Thus, in your case, the space required for your local variables is more than the stack has available, and the exception is raised before the function's code starts executing.

How to tell whether sys.stdout has been flushed in Python

I'm trying to debug some code I wrote, which involves a lot of parallel processes, and I'm seeing some unwanted behaviour involving output to sys.stdout: certain messages are printed twice. For debugging purposes it would be very useful to know whether, at a certain point, sys.stdout has been flushed or not. I wonder if this is possible and, if so, how?
P.S. I don't know if it matters, but I'm using OS X (at least some sys commands depend on the operating system).
The answer is: you cannot tell (not without serious ugliness, an external C module, or similar).
The reason is that Python's file implementation is based on the C (stdio) implementation of FILE *. So an underlying Python file object basically just holds a reference to the opened FILE. When writing data, the C implementation writes it, and when you call flush(), Python just forwards the flush call. So Python does not hold the information. Even for the underlying C layer, a quick search suggests there is no documented API allowing you to access this information; however, it is probably somewhere in the FILE object, so in theory it could be read out if it were that desperately needed.
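What you can do instead is track flushing yourself, by wrapping the stream before the parallel work starts (a sketch; FlushTracker is a made-up name, not part of the stdlib):

```python
import sys

class FlushTracker:
    """Wraps a stream and records whether anything has been
    written since the last flush()."""
    def __init__(self, stream):
        self._stream = stream
        self.dirty = False  # True if unflushed data may exist

    def write(self, data):
        self.dirty = True
        return self._stream.write(data)

    def flush(self):
        self._stream.flush()
        self.dirty = False

    def __getattr__(self, name):
        # Forward everything else (encoding, fileno, ...) unchanged.
        return getattr(self._stream, name)

sys.stdout = FlushTracker(sys.stdout)  # install the wrapper
```

Note this only sees flushes that go through Python; C code writing to fd 1 directly bypasses it.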

How can I access Ring 0 with Python?

This answer, stating that the naming of classes in Python is not done because of special privileges, confuses me.
How can I access lower rings in Python?
Is the low-level io for accessing lower level rings?
If it is, which rings I can access with that?
Is the statement "This function is intended for low-level I/O." referring to lower level rings or to something else?
C tends to be a prominent language in OS programming. Since there is the OS class in Python, does that mean I can access C code through that class?
Suppose I am playing with bizarre machine-language code and I want to somehow understand what it means. Are there some tools in Python which I can use to analyze such things? If there is not, is there some way that I could still use Python to control some tool which controls the bizarre machine language? [ctypes suggested in comments]
If Python has nothing to do with the low-level privileged stuff, does it still offer some wrappers to control the privileged operations?
Windows and Linux both use ring 0 for kernel code and ring 3 for user processes. The advantage of this is that user processes can be isolated from one another, so the system continues to run even if a process crashes. By contrast, a bug in ring 0 code can potentially crash the entire machine.
One of the reasons ring 0 code is so critical is that it can access hardware directly. By contrast, when a user-mode (ring 3) process needs to read some data from a disk:
the process executes a special instruction telling the CPU it wants to make a system call
CPU switches to ring 0 and starts executing kernel code
kernel checks that the process is allowed to perform the operation
if permitted, the operation is carried out
kernel tells the CPU it has finished
CPU switches back to ring 3 and returns control to the process
Processes belonging to "privileged" users (e.g. root/Administrator) run in ring 3 just like any other user-mode code; the only difference is that the check at step 3 always succeeds. This is a good thing because:
root-owned processes can crash without taking the entire system down
many user-mode features are unavailable in the kernel, e.g. swappable memory, private address space
As for running Python code in lower rings - kernel-mode is a very different environment, and the Python interpreter simply isn't designed to run in it, e.g. the procedure for allocating memory is completely different.
In the other question you reference, both os.open() and open() end up making the open() system call, which checks whether the process is allowed to open the corresponding file and performs the actual operation.
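Both paths can be seen side by side from Python (a small sketch using a throwaway temporary file):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Low-level: os.open returns a raw file descriptor (an int) and maps
# almost directly onto the open() system call; the kernel's permission
# check (step 3 in the list above) happens here.
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"hello\n")
os.close(fd)

# High-level: the built-in open() wraps a descriptor in a buffered
# file object, but reaches the same system call underneath.
with open(path, "rb") as f:
    assert f.read() == b"hello\n"
```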
I think SimonJ's answer is very good, but I'm going to post my own because from your comments it appears you're not quite understanding things.
Firstly, when you boot an operating system, what you're doing is loading the kernel into memory and saying "start executing at address X". The kernel, that code, is essentially just a program, but of course nothing else is loaded, so if it wants to do anything it has to know the exact commands for the specific hardware it has attached to it.
You don't have to run a kernel. If you know how to control all the attached hardware, you don't need one, in fact. However, it was rapidly realised way back when that there are many types of hardware one might face and having an identical interface across systems to program against would make code portable and generally help get things done faster.
So the function of the kernel, then, is to control all the hardware attached to the system and present it in a common interface, called an API (application programming interface). Code for programs that run on the system don't talk directly to hardware. They talk to the kernel. So user land programs don't need to know how to ask a specific hard disk to read sector 0x213E or whatever, but the kernel does.
Now, the description of ring 3 provided in SimonJ's answer is how userland is implemented - with isolated, unprivileged processes with virtual private address spaces that cannot interfere with each other, for the benefits he describes.
There's also another level of complexity in here, namely the concept of permissions. Most operating systems have some form of access control, whereby "administrators" have total control of the system and "users" have a restricted subset of options. So a kernel request by an ordinary user to open a file belonging to an administrator should fail under this sort of approach. The user who runs the program forms part of the program's context, if you like, and what the program can do is constrained by what that user can do.
Most of what you could ever want to achieve (unless your intention is to write a kernel) can be done in userland as the root/administrator user, where the kernel does not deny any API requests made to it. It's still a userland program. It's still a ring 3 program. But for most (nearly all) uses it is sufficient. A lot can be achieved as a non-root/administrative user.
That applies to the python interpreter and by extension all python code running on that interpreter.
Let's deal with some uncertainties:
The naming of os and sys I think is because these are "systems" tasks (as opposed to say urllib2). They give you ways to manipulate and open files, for example. However, these go through the python interpreter which in turn makes a call to the kernel.
I do not know of any kernel-mode python implementations. Therefore to my knowledge there is no way to write code in python that will run in the kernel (linux/windows).
There are two types of privileged: privileged in terms of hardware access and privileged in terms of the access control system provided by the kernel. Python can be run as root/an administrator (indeed on Linux many of the administration gui tools are written in python), so in a sense it can access privileged code.
Writing a C extension for Python, or controlling a C application from Python, would ostensibly mean you are either using code added to the interpreter (userland) or controlling another userland application. However, if you wrote a kernel module in C (Linux) or a driver in C (Windows), it would be possible to load it and interact with it via the kernel APIs from Python. An example might be creating a /proc entry in C and then having your Python application pass messages via read/write on that /proc entry (which the kernel module would handle via read/write handlers). Essentially, you write the code you want to run in kernel space and extend the kernel API in one of many ways so that your program can interact with that code.
"Low-level" IO means having more control over the type of IO that takes place and how you get that data from the operating system. It is low level compared to higher level functions still in Python that give you easier ways to read files (convenience at the cost of control). It is comparable to the difference between read() calls and fread() or fscanf() in C.
Health warning: Writing kernel modules, if you get it wrong, will at best result in that module not being properly loaded; at worst your system will panic/bluescreen and you'll have to reboot.
The final point about machine instructions I cannot answer here. It's a totally separate question and it depends. There are many tools capable of analysing code like that I'm sure, but I'm not a reverse engineer. However, I do know that many of these tools (gdb, valgrind) e.g. tools that hook into binary code do not need kernel modules to do their work.
You can use inpout library http://logix4u.net/parallel-port/index.php
import ctypes

# Example of strobing data out with nStrobe pin (note - inverted)
# Get 50kbaud without the read, 30kbaud with
read = []
for n in range(4):
    ctypes.windll.inpout32.Out32(0x37a, 1)
    ctypes.windll.inpout32.Out32(0x378, n)
    read.append(ctypes.windll.inpout32.Inp32(0x378))  # Dummy read to see what is going on
    ctypes.windll.inpout32.Out32(0x37a, 0)
print read
[note: I was wrong. usermode code can no longer access ring 0 on modern unix systems. -- jc 2019-01-17]
I've forgotten what little I ever knew about Windows privileges. In all Unix systems with which I'm familiar, the root user can access all ring-0 privileges. But I can't think of any mapping of Python modules to privilege rings.
That is, the 'os' and 'sys' modules don't give you any special privileges. You have them, or not, due to your login credentials.
How can I access lower rings in Python?
ctypes
Is the low-level io for accessing lower level rings?
No.
Is the statement "This function is intended for low-level I/O." referring to lower level rings or to something else?
Something else.
C tends to be a prominent language in OS programming. Since there is the OS class in Python, does that mean I can access C code through that class?
All of CPython is implemented in C.
The os module (it's not a class, it's a module) is for accessing OS APIs. C has nothing to do with access to OS APIs. Python accesses the APIs "directly".
Suppose I am playing with bizarre machine-language code and I want to somehow understand what it means. Are there some tools in Python which I can use to analyze such things?
"playing with"?
"understand what it means"? is your problem. You read the code, you understand it. Whether or not Python can help is impossible to say. What don't you understand?
If there is not, is there some way that I could still use Python to control some tool which controls the bizarre machine language? [ctypes suggested in comments]
ctypes
If Python has nothing to do with the low-level privileged stuff, does it still offer some wrappers to control the privileged operations?
You don't "wrap" things to control privileges.
Most OS's work like this.
You grant privileges to a user account.
The OS APIs check the privileges granted to the user making the request.
If the user has the privileges, the API call works.
If the user lacks the privileges, the API call raises an exception.
That's all there is to it.
