What is the difference between the various ZODB blobstorage layouts?

The ZODB blobstorage directory contains a .layout file with the string 'lawn' or 'bushy'.
What is the difference between the various blob storage directory formats?

It is explained here: https://github.com/zopefoundation/ZODB/blob/master/src/ZODB/tests/blob_layout.txt
FTA:
======================
Blob directory layouts
The internal structure of the blob directories is governed by so called
layouts. The current default layout is called bushy.
The original blob implementation used a layout that we now call lawn and
which is still available for backwards compatibility.
Layouts implement two methods: one for computing a relative path for an
OID and one for turning a relative path back into an OID.
Our terminology is roughly the same as used in DirectoryStorage.
It also explains the formats in detail.
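If you just want to see the two layouts side by side, you can use the layout classes directly. This is a minimal sketch; it assumes a recent ZODB release where BushyLayout and LawnLayout live in ZODB.blob and expose the oid_to_path method described in the document quoted above:

from ZODB.blob import BushyLayout, LawnLayout
from ZODB.utils import p64

oid = p64(12345)                       # an 8-byte OID
print(BushyLayout().oid_to_path(oid))  # bushy: one directory level per OID byte, e.g. 0x00/0x00/.../0x39
print(LawnLayout().oid_to_path(oid))   # lawn: a single flat directory named after the OID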

You generally don't need to worry about the layout; lawn is there only for backwards compatibility.
If you do have a lawn layout blobstorage (you'll get a warning in the log if you do) and want to migrate to a bushy layout, use the migrateblobs script; here is a buildout part to create the script:
[migrateblobs]
recipe = zc.recipe.egg
eggs = ZODB3
entry-points = migrateblobs=ZODB.scripts.migrateblobs:main
Shut down any instances and ZEO servers, back up your blob storage and run the script on your blobstorage directory:
$ mv var/blobstorage var/blobstorage-lawn
$ bin/migrateblobs var/blobstorage-lawn/ var/blobstorage
var/blobstorage will then contain the migrated blobs using the bushy layout.

Related

Treating a directory like a file in Python

We have a tool which is designed to allow vendors to deliver files to a company and update their database. These files (generally of predetermined types) are sent through our web-based transport system; a new record is created in the DB for each one, and the files are moved into a new structure when delivered.
We have a new request from a client to use this tool to pass through entire directories without parsing every record. Imagine the client makes digital cars: this tool allows the delivery of the digital nuts and bolts and tracks each part, but they also want to deliver a directory with all of the assets that went into creating a digital bolt, without adding each asset as a new record.
The issue is that the original code doesn't have a nice way to handle these passthrough folders, and would require a lot of rewriting to make it work. We'd obviously need to create a new function around the time of the directory walk, which takes out each folder matching this passthrough and handles it separately. The problem is that the tools which do the transport, DB entry, and delivery all expect files, not folders.
My thinking: what if we could treat that entire folder as a file? That way the current file-level tools don't need to be modified; we'd just need to add the "conversion" step. After generating the manifest, what if we used a library to turn it into a "file", send that, and then turn it back into a "folder" after ingest? The most obvious way to do that is ZIP files - and the current delivery tool does handle ZIPs - but that is slow, and some of these deliveries are very large, which means that if something goes wrong during transport the entire ZIP delivery fails.
Is there a method we can use which doesn't necessarily compress the files, but somehow otherwise treats a directory and all its contents like a file, so the rest of the code doesn't need to be rewritten? Or is there something else I'm missing entirely?
Thanks!
You could use tar files. Python has great support for them, and it is customary in *nix environments to use them as backup files. For compression you could use gzip (also supported by the standard library and great for streaming).
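For example, a minimal sketch using only the standard library; the directory and archive names here are made up for illustration:

import tarfile

# Pack the directory into a single archive file ("w:gz" adds gzip compression,
# plain "w" skips compression entirely if speed matters more than size)
with tarfile.open("delivery.tar.gz", "w:gz") as tar:
    tar.add("digital_bolt_assets", arcname="digital_bolt_assets")

# ... ship delivery.tar.gz through the existing file-based tooling ...

# Turn it back into a folder after ingest
with tarfile.open("delivery.tar.gz", "r:gz") as tar:
    tar.extractall(path="ingest_area")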

Can multiple processes write to the same folder?

I run several processes in Python (using multiprocessing.Process) on an Ubuntu machine.
Each of the processes writes various temporary files. Each process writes different files, but all files are in the same folder.
Is there any potential risk of error here?
The reason I think there might be a problem is that, AFAIK, a folder in Unix is just a file. So it's just like several processes writing to the same file at the same time, which might cause a loss of information.
Is this really a potential risk here? If so, how to solve it?
This has absolutely nothing to do with Python: file operations in Python use OS-level system calls (and unless run as root, your Python program would not have permission to do raw device writes anyway; doing them as root would be incredibly stupid).
A little bit of file system theory if anyone cares to read:
Yes, if you study file system architecture and how data is actually stored on drives, there are similarities between files and directories - but only at the data storage level. The reason is that there is no need to separate the two. For example, the ext4 file system stores information about a file (its metadata) in small units called inodes, separately from the file contents themselves. The inode contains a pointer to the actual disk space where the file data can be found.
File systems are generally rather agnostic about directories. A file system is basically just this: it contains information about free disk space, information about files with pointers to their data, and the actual data. Part of the metadata is the directory where the file resides. In modern file systems (the ancient FAT, still in use, is the exception) data storage on disk is not related to directories. Directories exist to let both humans and the file system implementation locate files and folders quickly, instead of walking sequentially through the list of inodes until the correct file is found.
You may have read that directories are just files. Yes, they are "files" that contain a list of the files within them (actually a tree - but please do not confuse this with a directory tree; it is just a mechanism for storing information about large directories so that files do not need to be searched sequentially within the directory entry). The reason this is a file is that it is the mechanism by which file systems store data; there is no need for a separate storage mechanism, as a directory only contains a list of file names and pointers to their inodes. You could think of it as a database or, even simpler, a text file. But in the end it is just a file that contains pointers, not something allocated on the disk surface to contain the actual files stored in the directory.
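You can actually peek at this side of things from Python: os.stat exposes the inode number for both files and directories (the file name below is just an example):

import os

st = os.stat("some_file.txt")  # hypothetical file; st_ino is its inode number
print(st.st_ino, st.st_size)   # metadata lives in the inode, not in the directory
print(os.stat(".").st_ino)     # a directory has an inode of its own, too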
That was the background.
The file system implementation on your computer is just a piece of software that knows how to deal with all this. When you open a file in a certain directory for writing, something like this usually happens:
1. A free inode is located and an entry created there
2. The free clusters/blocks database is queried to find storage space for the file contents
3. File data is stored and blocks/clusters are marked "in use" in that database
4. The inode is updated to contain the file metadata and a pointer to this disk space
5. The "file" containing the directory data of the target directory is located
6. This file is modified so that one record is added; this record has a pointer to the inode just created, and the file name as well
7. The inode of the file is updated to contain a link to the directory, too
It is the job of the operating system and the file system driver within it to ensure all this happens consistently. In practice this means the file system driver queues operations. Writing several files into the same directory simultaneously is a routine operation - for example, web browser cache directories get updated this way while you browse the internet. Under the hood the file system driver queues these operations and completes steps 1-7 for each new file before it starts processing the next operation.
To make it a bit more complex, there is a journal acting as an intermediate buffer: your transactions are written to the journal, and when the file system is idle the driver commits the journal transactions to the actual storage space. The theory remains the same; the journal exists for performance and reliability.
You do not need to worry about this on application level, as it is the job of the operating system to do all that.
In contrast, if you create a lot of randomly named files in the same directory, in theory there could be a conflict at some point if your random name generator produced two identical file names. There are ways to mitigate this, and this would be the part you need to worry about in your application. But anything deeper than that is the task of the operating system.
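A sketch of the usual way to sidestep that name-collision worry entirely: let the standard library pick a unique name per process (tempfile creates files with O_CREAT|O_EXCL under the hood, so two processes can never end up with the same file; the directory name below is just an example):

import os
import tempfile
from multiprocessing import Process

def worker():
    # dir is shared by all workers; delete=False keeps the file around after close
    with tempfile.NamedTemporaryFile(mode="w", dir="shared_scratch", delete=False) as f:
        f.write("data from pid %d\n" % os.getpid())

if __name__ == "__main__":
    os.makedirs("shared_scratch", exist_ok=True)
    workers = [Process(target=worker) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()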
On Linux, opening a file (with or without the O_CREAT flag set) is an atomic operation (see for example this list). In a nutshell, as long as your processes use different files, you should have no trouble at all.
Just for your information, appending to a file (up to a certain byte limit) is atomic as well. This article is interesting in that regard.
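A tiny illustration of that open() atomicity: Python's "x" mode maps to O_CREAT|O_EXCL, so claiming a per-process file name either succeeds or fails cleanly, never half-way (the naming scheme below is made up):

import os

os.makedirs("work", exist_ok=True)
path = "work/%d.tmp" % os.getpid()  # hypothetical per-process naming scheme
with open(path, "x") as f:          # "x" = create exclusively; raises FileExistsError if the name is taken
    f.write("scratch data\n")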
Writing to different files in the same folder won't cause a problem. Sure, a folder is a file in Linux, but you open the file for writing, not the folder.
On the other hand, writing to the same file from multiple processes can cause issues, depending on your log size. See this question for more details: Does python logging support multiprocessing?

How to extract from FAT16 filesystem image in Python

I have an image which is FAT (16-bit). I want to parse the image so that I can get at the files inside it.
As far as reading a FAT32 filesystem image in Python goes, the Wikipedia page has all the detail you need to write a read-only implementation.
Construct may be of some use. Looks like they have an example for FAT16 (https://github.com/construct/construct/blob/master/construct/examples/formats/filesystem/fat16.py) which you could try extending.
Actually, I was in a similar situation, where I needed FAT12/16/32 support in Python. Searching the web you can find various implementations (such as maxpat78/FATtools, em-/grasso or hisahi/PyFAT12).
None of those libraries were available via PyPI at the time, or they were lacking features I needed, so (full disclosure) I decided to write my own. I'll try to sum it up as objectively as possible:
pyfatfs supports FAT12, FAT16 as well as FAT32 including VFAT (long file names) and can be installed via pip as a pure Python package (no native dependencies such as mtools needed and/or included). It implements functionality for PyFilesystem2, a framework for basic file operations across different filesystem implementations (SSH, AWS S3, OSFS host directory pass-through, …). Aside from that pyfatfs can also be used standalone (without PyFilesystem2) in case you need to make more low-level operations (manipulating directory/file entries, changing disk attributes, formatting disks/images, manipulating FAT, etc.).
For instance, to copy files from a diskette image to your host via PyFilesystem2:
import fs
import fs.copy  # pull in the copy helpers explicitly

fat_fs = fs.open_fs("fat://my_diskette.img")  # Open disk image
host_fs = fs.open_fs("osfs:///tmp")           # Open '/tmp' directory on host
fs.copy.copy_dir(fat_fs, "/", host_fs, "/")   # Copy all files from the disk image to host_fs (the /tmp directory on the host)

How do I make files downloadable for a particular role in Plone?

I wish to make the contents of a folder in Plone downloadable only for certain roles. Can this be done easily? At present anybody who clicks the hyperlink for a file name in the folder contents can download the file easily. I know about the site-wide option of overriding the at_download code using the ZMI.
The codeless way to do this is to make use of Plone's workflow system.
Out-of-the-box, Plone's file and image content types do not have their own workflow. That means that files and images will simply inherit the publication state of their parent folder. This is easy and sensible, but it doesn't meet the need you're describing.
To change the situation, you may use the "types" configuration panel to turn on independent workflow for files and images. Then, their publication status may be set separately from their containing folders. Typically, you'd choose the same workflow that you're using for documents. Then, you may publish a folder and list its contents while having the files within be private -- thus requiring login for viewing.
If you need this to work differently in different places, you may turn on "placeful" workflow (turn it on by adding it in the add-ons panel; it's pre-installed, but not active). This allows different workflows in different parts of a site. It increases complexity, but is often an ideal solution to this kind of puzzle.
This is probably not so simple, and you need to add some lines of code in a small Plone product (no way to do it TTW). The code snippets below are not tested.
Plone files are implemented using the Archetypes framework (this will probably change in Plone 5). What you need to change is the read_permission of the file field (see the Archetypes field reference).
from Products.ATContentTypes.content.file import ATFile
ATFile.schema['file'].read_permission = 'Your new permission'
Then you simply need to assign your new permission to a role.
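For example, from a setup handler or a debug prompt on the site root (manage_permission is the standard Zope role-manager API; the permission and role names below are placeholders):

# 'portal' stands for your Plone site object
portal.manage_permission('Your new permission',
                         roles=['Manager', 'Reviewer'],
                         acquire=False)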
This may not be enough (step 1 above is probably not useful nowadays). You need to perform the same operation for the plone.app.blob extension:
from plone.app.blob.subtypes import SchemaExtender
SchemaExtender.fields[0].read_permission = 'Your new permission'
One last thing: you probably need to customize the file_view template, or an "Unauthorized" error will be raised when a user without the permission visits the file view.

Can I use Plone workflow to manage AutoCAD-related drawings?

How can I use Plone 4.1.4 to manage AutoCAD drawings with different roles like architect, senior architect, project manager, and accounts manager (who manages the user accounts)? First of all, I would like to know whether Plone can be used to create a workflow for uploaded AutoCAD drawing files, or for uploaded files in general. My doubt arises from certain Plone documentation which says that, by default, the content types Image and File have no workflow.
I wish to track the comments and changes made by the different user roles to the drawing files, as well as provide a lock, i.e. iterate through working copies of the drawing files that have been uploaded. Can anyone suggest the best approach to this project using Plone?
You can change the workflow used for File objects, or indeed copy the File type in portal_types to a new Drawing type and change the workflow for that new type, if you want to treat them differently from standard files in your CMS.
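For instance, a small setup-handler sketch that points the stock File type at Plone's standard publication workflow (setChainForPortalTypes is the regular portal_workflow API; swap in your own workflow id or the new Drawing type as needed):

from Products.CMFCore.utils import getToolByName

# 'portal' stands for your Plone site object
wf_tool = getToolByName(portal, 'portal_workflow')
wf_tool.setChainForPortalTypes(('File',), ('simple_publication_workflow',))
wf_tool.updateRoleMappings()  # re-apply security settings to existing objects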
