Prototyping a filesystem

What are some best practises for prototyping a filesystem?
I've had an attempt in Python using fusepy, and now I'm curious:
In the long run, should any respectable filesystem implementation be in C? Will not being in C hamper portability, or eventually cause performance issues?
Are there other implementations like FUSE?
Evidently core filesystem technology moves slowly (fat32, ext3, ntfs, everything else is small fish), what debugging techniques are employed?
What is the general course filesystem development takes in arriving at a highly optimized, fully supported implementation in major OSs?

A filesystem that lives in userspace (be that in FUSE or the Mac version thereof) is a very handy thing indeed, but it will not have the same performance as a traditional one that lives in kernel space (and thus must be in C). You could say that's the reason microkernel systems (where filesystems and other things live in userspace) never really "left monolithic kernels in the dust", as A. Tanenbaum so assuredly stated when he attacked Linux in a famous posting on the comp.os.minix newsgroup almost twenty years ago. As a CS professor, he said he'd fail Linus for choosing a monolithic architecture for his OS. Linus of course responded spiritedly, and the whole exchange is now pretty famous and can be found in many spots on the web ;-)
Portability's not really a problem, unless perhaps you're targeting "embedded" devices with very limited amounts of memory. With the exception of such devices, you can run Python where you can run C (if anything it's the availability of FUSE that will limit you, not that of a Python runtime). But performance could definitely be a problem.

In the long run, should any respectable filesystem implementation be in C? Will not being in C hamper portability, or eventually cause performance issues?
Not necessarily; there are plenty of performant languages other than C (OCaml and C++ are the first that come to mind). In fact, I expect NTFS to be written in C++. The thing is, you seem to come from a Linux background, and as the Linux kernel is written in C, any filesystem that hopes to be merged into the kernel has to be written in C as well.
Are there other implementations like FUSE?
There are a couple for Windows, for example http://code.google.com/p/winflux/ and http://dokan-dev.net/en/, at various levels of maturity.
Evidently core filesystem technology moves slowly (fat32, ext3, ntfs, everything else is small fish), what debugging techniques are employed?
Again, that is mostly true on Windows; on Solaris you have ZFS, and on Linux ext4 and btrfs exist. Debugging techniques usually involve turning machines off in the middle of various operations and seeing what state the data is left in, and storing huge amounts of data and measuring performance.
What is the general course filesystem development takes in arriving at a highly optimized, fully supported implementation in major OSs?
Again, this depends on the OS, but it does involve a fair amount of testing, especially making sure that failures do not lose data.

I recommend you create a mock object for the kernel block device API layer. The mock layer should use a mmap'd file as a backing store for the file system. There are a lot of benefits to doing this:
Extremely fast FS performance for running unit test cases.
Ability to insert debug code/break points into the mock layer to check for failure conditions.
Easy to save multiple copies of the file system state for study or running test cases.
Ability to deterministically introduce block device errors or other system events that the file system will have to handle.
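
To make the idea concrete, here is a minimal sketch of such a mock layer in Python, which is what the question prototypes in. The class name MockBlockDevice, the 512-byte block size and the fail_on_write set are all assumptions made up for illustration; a real kernel block device API looks quite different, so treat this as a shape to adapt rather than a reference design.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size for this sketch


class MockBlockDevice:
    """Hypothetical stand-in for a block device, backed by an mmap'd file."""

    def __init__(self, path, num_blocks):
        self.num_blocks = num_blocks
        self.fail_on_write = set()  # block numbers that should raise an I/O error
        self._file = open(path, "w+b")
        self._file.truncate(num_blocks * BLOCK_SIZE)
        self._map = mmap.mmap(self._file.fileno(), num_blocks * BLOCK_SIZE)

    def _check(self, n):
        if not 0 <= n < self.num_blocks:
            raise IndexError(f"block {n} out of range")

    def read_block(self, n):
        self._check(n)
        start = n * BLOCK_SIZE
        return bytes(self._map[start:start + BLOCK_SIZE])

    def write_block(self, n, data):
        self._check(n)
        if n in self.fail_on_write:
            # Deterministic fault injection: the test decides which block fails.
            raise OSError(f"injected write failure on block {n}")
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        start = n * BLOCK_SIZE
        self._map[start:start + BLOCK_SIZE] = data

    def snapshot(self):
        """Return a copy of the whole device image for later comparison."""
        return bytes(self._map[:])

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    """Write a block, inject a failure, and confirm the image is unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        dev = MockBlockDevice(os.path.join(tmp, "disk.img"), num_blocks=8)
        dev.write_block(3, b"A" * BLOCK_SIZE)
        before = dev.snapshot()
        dev.fail_on_write.add(3)
        try:
            dev.write_block(3, b"B" * BLOCK_SIZE)
        except OSError:
            pass  # the filesystem under test would handle this error
        unchanged = dev.snapshot() == before
        dev.close()
        return unchanged
```

The snapshot copies are what make it easy to keep several filesystem states around for test cases, and the fail_on_write set is one simple way to get repeatable device errors.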

Respectable filesystems will be fast and efficient. For Linux, that will basically mean writing in C, because you won't be taken seriously if you're not distributed with the kernel.
As for other tools like FUSE, there's MacFUSE, which will allow you to use the same code on Macs as well as Linux.

Related

Fastest way to exchange data between C++ and Python?

I'm working on a project that is written in C++ and Python. The communication between the 2 sides is made through TCP sockets. Both processes run on the same machine.
The problem is that it is too slow for the current needs. What is the fastest way to exchange information between C++ and Python?
I heard about ZeroMQ, but would it be noticeably faster than plain TCP sockets?
Edit: OS is Linux. The data that should be transferred consists of multiple floats (let's say around 100 numbers) every 0.02s, both ways. So 50 times per second, the Python code sends 100 float numbers to C++, and the C++ code then responds with 100 float numbers.

In case performance is the only metric you care for, shared memory is going to be the fastest way to share data between two processes running on the same machine. You can use a semaphore in shared memory for synchronization.
TCP sockets will work as well, and are probably fast enough. Since you are using Linux, I would just use pipes: it is the simplest approach, and they will outperform TCP sockets. This should get you started: http://man7.org/linux/man-pages/man2/pipe.2.html
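
As a rough sketch of the Python side of the pipe approach, the snippet below packs 100 floats into one fixed-size frame and moves it through a pipe. The helper names and the choice of little-endian 32-bit floats are assumptions for illustration. Both ends live in one process here only so the example runs on its own; in practice the C++ program would hold the other end (for example through an inherited descriptor or a named FIFO) and would have to agree on the same frame layout.

```python
import os
import struct

N_FLOATS = 100
FRAME = struct.Struct(f"<{N_FLOATS}f")  # 100 little-endian 32-bit floats, 400 bytes


def send_frame(fd, values):
    """Write one fixed-size frame of floats to a pipe file descriptor."""
    data = FRAME.pack(*values)
    while data:
        written = os.write(fd, data)  # write() may accept only part of the data
        data = data[written:]


def recv_frame(fd):
    """Read exactly one frame, looping because read() may return fewer bytes."""
    buf = bytearray()
    while len(buf) < FRAME.size:
        chunk = os.read(fd, FRAME.size - len(buf))
        if not chunk:
            raise EOFError("pipe closed mid-frame")
        buf.extend(chunk)
    return FRAME.unpack(bytes(buf))


def roundtrip():
    """Send one frame through a pipe and read it back in the same process."""
    read_fd, write_fd = os.pipe()
    try:
        send_frame(write_fd, [i * 0.5 for i in range(N_FLOATS)])
        return recv_frame(read_fd)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

A fixed frame size keeps the reader simple: it always knows how many bytes to wait for, so no length header is needed.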
For more background information, I recommend Advanced Programming in the UNIX Environment.

If you're on the same machine, use named shared memory; it's a very fast option. In Python you have multiprocessing.shared_memory, and in C++ you can use POSIX shared memory, since you're on Linux.
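
A minimal sketch of the Python half, assuming Python 3.8 or newer (where multiprocessing.shared_memory is available). The helper names and the 100-float frame layout are made up for illustration, and synchronization (a semaphore or similar) is deliberately left out. Writer and reader run in one process here just so the example is self-contained; the C++ side would attach to the same named segment and must use the same layout.

```python
import struct
from multiprocessing import shared_memory

N_FLOATS = 100
FRAME = struct.Struct(f"<{N_FLOATS}f")  # assumed layout: 100 little-endian floats


def write_floats(name, values):
    """Attach to an existing named segment and write one frame into it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        FRAME.pack_into(shm.buf, 0, *values)
    finally:
        shm.close()


def read_floats(name):
    """Attach to an existing named segment and read one frame from it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return FRAME.unpack_from(shm.buf, 0)
    finally:
        shm.close()


def demo():
    # The creating side owns the segment and is responsible for unlinking it.
    owner = shared_memory.SharedMemory(create=True, size=FRAME.size)
    try:
        write_floats(owner.name, [float(i) for i in range(N_FLOATS)])
        return read_floats(owner.name)
    finally:
        owner.close()
        owner.unlink()
```

Without some form of synchronization a reader can observe a half-written frame, so a real setup needs a lock, a semaphore or a sequence counter on top of this.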

Short answer: no, but ZeroMQ can have other advantages. Let's go straight to it: if you are on Linux and want fast data transfer, go with shared memory. But it will not be as easy as with ZeroMQ.
That is because ZeroMQ is a message queue. It solves (and solves well) different problems. It is able to use IPC between C++ and Python, which can be noticeably faster than using sockets (for the same usage), and it leaves the door open to network features in your future developments. It is reliable and quite easy to use, with the same API in Python and C++. It is often used with Protobuf to serialize and send data, even for high throughput.
The first problem with IPC on ZeroMQ is that it lacks Windows support, because Windows is not a POSIX-compliant system. But the biggest problem may lie elsewhere: ZeroMQ is slower because it wraps your message. You can enjoy the benefits of that, but it can impede performance. The best way to check this is, as always, to test it yourself with IPC-BENCH, as I am not sure the benchmark I provided in the previous link was using IPC. The average gain of IPC over local TCP is not fantastic, though.
As I said, I am quite sure shared memory will be the fastest, with one exception: wrapping your C++ code so that Python can call it directly. I would bet that is the fastest solution, but it will require a bit of engineering to multithread if needed, because both C++ and Python will run in the same process. And of course you need to adjust the current C++ code if it has already been started.
And as usual, remember that optimization always happens in a context. If data transfer is only a fraction of the running time compared with the processing you do afterwards, or if you can afford to wait the 0.00001 s that shared memory would save you, it might be worth going directly to ZeroMQ, because it will be simpler, more scalable and more reliable.

Sharing information between a python code and c++ code (IPC)

I have 2 code bases, one in python, one in c++. I want to share real time data between them. I am trying to evaluate which option will work best for my specific use case:
many small data updates from the C++ program to the python program
they both run on the same machine
reliability is important
low latency is nice to have
I can see a few options:
One process writes to a flat file, the other process reads it. It is non scalable, slow and I/O error prone.
One process writes to a database, the other process reads it. This makes it more scalable, slightly less error prone, but still very slow.
Embed my python program into the C++ one or the other way round. I rejected that solution because both code bases are reasonably complex, and I preferred to keep them separated for maintainability reasons.
I use some sockets in both programs, and send messages directly. This seems to be a reasonable approach, but does not leverage the fact that they are on the same machine (it will be optimized slightly by using local host as destination, but still feels cumbersome).
Use shared memory. So far I think this is the most satisfying solution I have found, but has the drawback of being slightly more complex to implement.
Are there other solutions I should consider?

First of all, this question is highly opinion-based!
The cleanest way would be to use them in the same process and have them communicate directly. The only complexity is implementing a proper API and C++ -> Python calls. The drawbacks are maintainability, as you noted, potentially lower robustness (both crash together, which is not a problem in most cases) and lower flexibility (are you sure you'll never need to run them on different machines?). Extensibility is the best, as it's very simple to add more communication or to change what exists. You might reconsider the maintainability point: can your Python app be used w/o its C++ counterpart? If not, I wouldn't worry about maintainability so much.
Shared memory is the next choice, with better maintainability but the same other drawbacks. Extensibility is a little bit worse but still not so bad. It can be complicated; I don't know what support Python has for shared memory operations, but for C++ you can have a look at Boost.Interprocess. The main question I'd check first is synchronisation between processes.
Then there is network communication. Lots of choices here, from the simplest possible binary protocol implemented at the socket level to the higher-level options mentioned in the comments. It depends on how complex your C++ <-> Python communication is and may become in the future. This approach can be more complicated to implement and can require 3rd-party libraries, but once done it's extensible and flexible. Usually 3rd-party libraries are based on code generation (Thrift, Protobuf), which doesn't simplify your build process.
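
For the "simplest possible binary protocol" end of that range, one common shape is a length prefix followed by the payload. The sketch below shows the Python side only; the 4-byte big-endian header and the helper names are assumptions, and socket.socketpair() stands in for the real connection to the C++ process so the example runs by itself.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian payload length


def send_msg(sock, payload):
    """Send one message as <length><payload>."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() alone may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Read one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo():
    a, b = socket.socketpair()  # stand-in for the C++ <-> Python connection
    try:
        send_msg(a, b"price=101.5")
        send_msg(a, b"")  # empty messages survive the framing too
        return recv_msg(b), recv_msg(b)
    finally:
        a.close()
        b.close()
```

The length prefix is what lets the receiver split a byte stream back into the "many small data updates" from the question without relying on one send matching one recv.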
I wouldn't seriously consider file system or database for this case.

Tricks to improve performance of python backend

I am using Python programs for nearly everything:
deploy scripts
nagios routines
website backend (web2py)
The reason why I am doing this is because I can reuse the code to provide different kind of services.
Since a while ago I have noticed that those scripts are putting a high CPU load on my servers. I have taken several steps to mitigate this:
late initialization, using cached_property (see here and here), so that only those objects needed are indeed initialized (including import of the related modules)
turning some of my scripts into http services (with a simple web.py implementation, wrapping-up my classes). The services are then triggered (by nagios for example), with simple curl calls.
This has reduced the load dramatically, going from over 20 CPU load to well under 1. It seems python startup is very resource intensive, for complex programs with lots of inter-dependencies.
I would like to know what other strategies are people here implementing to improve the performance of python software.
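
For readers unfamiliar with the late-initialization step mentioned above, here is a small sketch of the pattern. It uses functools.cached_property from the standard library (Python 3.8+) as a stand-in, since the exact cached_property helper used in the question is not shown; the NagiosCheck class and its attributes are hypothetical.

```python
from functools import cached_property


class NagiosCheck:
    """Hypothetical check object: expensive parts are built on first use only."""

    builds = 0  # counts how often the expensive setup actually runs

    @cached_property
    def parser(self):
        # The import happens here, not at module load time, so scripts that
        # never touch `parser` never pay for it.
        import json
        return json

    @cached_property
    def thresholds(self):
        NagiosCheck.builds += 1
        return self.parser.loads('{"warn": 80, "crit": 95}')


def demo():
    NagiosCheck.builds = 0
    check = NagiosCheck()
    untouched = "thresholds" not in vars(check)  # nothing built yet
    first = check.thresholds["crit"]
    second = check.thresholds["crit"]            # served from the cache
    return untouched, first, second, NagiosCheck.builds
```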

An easy one-off improvement is to use PyPy instead of the standard CPython for long-lived scripts and daemons (for short-lived scripts it's unlikely to help and may actually have longer startup times). Other than that, it sounds like you've already hit upon one of the biggest improvements for short-lived system scripts, which is to avoid the overhead of starting the Python interpreter for frequently-invoked scripts.
For example, if you invoke one script from another and they're both in Python you should definitely consider importing the other script as a module and calling its functions directly, as opposed to using subprocess or similar.
I appreciate that it's not always possible to do this, since some use-cases rely on external scripts being invoked - Nagios checks, for example, are going to be tricky to keep resident at all times. Your approach of making the actual check script a simple HTTP request seems reasonable enough, but the approach I took was to use passive checks and run an external service to periodically update the status. This allows the service generating check results to be resident as a daemon rather than requiring Nagios to invoke a script for each check.
Also, watch your system to see whether the slowness really is CPU overload or IO issues. You can use utilities like vmstat to watch your IO usage. If you're IO bound then optimising your code won't necessarily help a lot. In this case, if you're doing something like processing lots of text files (e.g. log files) then you can store them gzipped and access them directly using Python's gzip module. This increases CPU load but reduces IO load because you only need to transfer the compressed data from disk. You can also write output files directly in gzipped format using the same approach.
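
A short sketch of reading and writing gzipped text with the gzip module; the file name and log lines are invented for the example.

```python
import gzip
import os
import tempfile


def write_log(path, lines):
    """Write text lines straight into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")


def count_matching(path, needle):
    """Stream a gzipped log line by line without unpacking it to disk first."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return sum(1 for line in fh if needle in line)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log.gz")
        write_log(path, ["INFO start", "ERROR disk full", "INFO stop", "ERROR timeout"])
        return count_matching(path, "ERROR")
```

Opening with mode "rt" or "wt" gives an ordinary text file object, so existing line-processing code usually needs no other change.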
I'm afraid I'm not particularly familiar with web2py specifically, but you can investigate whether it's easy to put a caching layer in front if the freshness of the data isn't totally critical. Try and make sure both your server and clients use conditional requests correctly, which will reduce request processing time. If they're using a back-end database, you could investigate whether something like memcached will help. These measures are only likely to give you real benefit if you're experiencing a reasonably high volume of requests or if each request is expensive to handle.
I should also add that generally reducing system load in other ways can occasionally give surprising benefits. I used to have a relatively small server running Apache and I found moving to nginx helped a surprising amount - I believe it was partly more efficient request handling, but primarily it freed up some memory that the filesystem cache could then use to further boost IO-bound operations.
Finally, if overhead is still a problem then carefully profile your most expensive scripts and optimise the hotspots. This could be improving your Python code, or it could mean pushing code out to C extensions if that's an option for you. I've had some great performance by pushing data-path code out into C extensions for large-scale log processing and similar tasks (talking about hundreds of GB of logs at a time). However, this is a heavy-duty and time-consuming approach and should be reserved for the few places where you really need the speed boost. It also depends whether you have someone available who's familiar enough with C to do it.
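
As a starting point for the profiling step, here is a sketch using cProfile and pstats from the standard library. The slow_sum function is just a placeholder for whatever your expensive code path is, and profile_call is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Placeholder workload standing in for an expensive script function."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, out.getvalue()
```

Printing the report returned by profile_call(slow_sum, 100000) shows where the time goes; for a whole script, running python -m cProfile -s cumulative yourscript.py gives a similar listing without code changes.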

Saving the stack?

I'm just curious, is it possible to dump all the variables and current state of the program to a file, and then restore it on a different computer?!
Let's say that I have a little program in Python or Ruby; given a certain condition, it would dump all the current variables and current state to a file.
Later, I could load it again on a different machine and return to it.
Something like a VM snapshot function.
I've seen a question like this here, but Java related: saving the current JVM and running it again in a different JVM. Most people said that there was nothing like that; only Terracotta had something, and even that was not perfect.
Thank you.
To clarify what I am trying to achieve:
Given 2 or more Raspberry Pis, I'm trying to run my software on Pi nº1, but then, when I need to do something different with it, I need to move the software to Pi nº2 without data loss, with only a minor break.
And so on, to an unlimited number of machines.

It seems that I was trying to reinvent the wheel.
Check these links:
http://en.wikipedia.org/wiki/Application_checkpointing#DMTCP
http://www.linuxscrew.com/2007/10/17/cryopid-freeze-and-unfreeze-processes-in-linux/

Good question.
In Smalltalk, yes.
Actually, in Smalltalk, dumping the whole program and restarting is the only way to store and share programs. There are no source files and there is no way of starting a program from square zero. So in Smalltalk you would get your feature for free.
The Smalltalk VM offers a hook where each object can register to restore its external resources after a restart, like reopening files and internet connections. Integer arrays, for example, are also registered with that hook to change the endianness of their values in case the dump has been moved to a machine with a different endianness.
This might give a hint of how difficult (or not) it might turn out to be to achieve this in a language which does not support resumable dumps by design.
All other languages are, alas, much less live. Except for some Lisp implementations, I do not know of any language which supports resuming from a memory dump.
Which is a missed opportunity.

I've seen Mariano demonstrate that using Fuel (object serialization) in Pharo Smalltalk at a recent ESUG conference. You could continue debugging and running as long as you don't hit objects that were not serialized. Squeak Smalltalk runs on the Pi, and if saving an image is good enough for you, this is trivial. We're still waiting for the faster JITting VM for ARM though (part of the Google Summer of Code programme).

How would an irc bot written in tcl stack up against a python/node.js clone?

I believe eggdrop is the most active/popular bot and it's written in tcl (according to the wiki the core is C, but I haven't confirmed that).
I'm wondering if there would be any performance benefit to recoding its functionality in node.js or Python, in addition to making it more accessible, since Python and JS are arguably more popular languages and not many people are familiar with tcl.
So, how would they stack up vs tcl in general, performance-wise?

As you suspected, eggdrop is not written in tcl, it is written in C, however it does use tcl as its scripting/extension language.
I would expect that in the case of an eggdrop, the performance difference between using tcl as a scripting language, and using Python, Lua, JS, or virtually anything else would be negligible, as eggdrops generally aren't performing high load tasks.
In the event it really was an issue, your question would need more specifics. Performance for what task under what conditions? Memory use? CPU efficiency? Latency? And the answer would probably be "measure and find out". Given the typical use of an eggdrop, it doesn't take particularly efficient code to respond to the occasional IRC trigger command once every few minutes or hours.
As a more general case, I'm sure you could find benchmark comparisons of specific algorithms or tasks performed by various scripting languages on particular operating systems or environments, at which point it wouldn't really have anything to do with IRC or eggdrop.

If you're not doing much other than waiting on a quiet channel for something to happen, performance is pretty much irrelevant. You could probably write that in BF (well, with network connectivity primitives added) and have it perform OK.
If you're running on lots of busy channels with many things being watched for, that's different. Tcl's very good at event-driven IO, which is ideal for this sort of situation. (Python can do that, but needs external libraries, as does Lua. I don't know JS enough to comment there.)
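
To illustrate what event-driven IO looks like in practice, here is a small Python sketch. Newer Python versions ship the selectors module (3.4+) and asyncio in the standard library, so no external dependency is needed for this. The channel names are invented, and socket pairs stand in for real IRC connections so the example can run without a network.

```python
import selectors
import socket


def handle_ready(sel, timeout=1.0):
    """Run one iteration of the event loop; return the (channel, line) pairs seen."""
    seen = []
    for key, _events in sel.select(timeout):
        data = key.fileobj.recv(4096)
        seen.append((key.data, data.decode("utf-8").strip()))
    return seen


def demo():
    sel = selectors.DefaultSelector()
    # One socket pair per pretend channel: index 0 plays the server side.
    pairs = {name: socket.socketpair() for name in ("#python", "#tcl")}
    try:
        for name, (_server, client) in pairs.items():
            client.setblocking(False)
            sel.register(client, selectors.EVENT_READ, data=name)
        # Only one "channel" has traffic; the selector reports just that one.
        pairs["#tcl"][0].sendall(b"!ping\r\n")
        return handle_ready(sel)
    finally:
        sel.close()
        for server, client in pairs.values():
            server.close()
            client.close()
```

The bot sleeps inside select() until some connection has data, which is why watching many mostly idle channels costs very little CPU in this style.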
If you're needing to do significant non-IO-bound processing for some message responses, you're into needing threads. I know that both Tcl and Python support threads, but with utterly different threading models (Python has a shared-memory model which makes it easier to pass some types of task around, especially when the data is large, and Tcl has an apartment model which greatly reduces the amount of locking required in the implementation for a good performance boost in CPU-bound code).
How is that relevant for IRC bots? Well, it all depends on what you're doing in the bot.
