Our product has a file that was not versioned and was deleted from the server (FTP crash). Thing is, the cloud processes are still running, and I can actually submit Python jobs to them (we have a process management framework).
Is there any way to get the code from an in-memory module? If so, I can run that code and recover the file.
Is this even possible?
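One hedged avenue (the module name, output paths and the decompiler step are all assumptions, not something provided by our framework): if the lost module is still loaded in one of the running processes, a submitted job can at least pull out its bytecode, which a decompiler such as uncompyle6 or decompyle3 can later turn back into approximate source offline.

    # Sketch of a job to submit to a running process that still has the
    # lost module imported. "lost_module" and the /tmp output path are
    # placeholders. This recovers bytecode, not the literal source text.
    import marshal
    import sys
    import types

    MODULE_NAME = "lost_module"        # hypothetical name of the deleted module
    mod = sys.modules[MODULE_NAME]     # assumes it is already imported in this process

    # 1) The compiled .pyc may still exist on disk even though the .py is gone.
    print("cached bytecode:", getattr(mod, "__cached__", None))

    # 2) Dump each top-level function's code object for offline decompilation.
    for name, obj in vars(mod).items():
        if isinstance(obj, types.FunctionType):
            with open(f"/tmp/{name}.marshalled", "wb") as fh:
                marshal.dump(obj.__code__, fh)
            print("wrote", name)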
Ok, I know this is strange, but after a day of searching, I couldn't find any answer to this problem.
I've had this system running for two years with Django under Apache, with a classical mod_wsgi installation. An exact mirror of the web site is used for development and testing.
In order to speed up a query, I used the built-in Django cache with a file backend. In development (built-in Django server) everything works fine and a file is created under /var/tmp/django_cache. Everything also works in production, but no file is created.
I was surprised, so I started experimenting and inserted a bunch of prints in the django.core.cache modules and followed the execution of the cache code. At a certain point I got to an os.makedirs, which doesn't create anything. I inserted an open(), created a file (absolute path), and nothing appeared. Then I tried to read back from the nonexistent file and the content was there.
I'm really puzzled. It seems that somehow there is a sort of "virtual" filesystem, which works correctly but in parallel with the real thing. I'm using Django 1.11.11.
Who is doing the magic? Django, Apache, mod_wsgi? Something else?
Ok, @DanielRoseman was right: "More likely the file is being created in another location". The reason it can affect any filesystem operation is that it's a feature of systemd called PrivateTmp. From the documentation:
sets up a new file system namespace for the executed processes and mounts private /tmp and /var/tmp directories inside it that is not shared by processes outside of the namespace
In fact there are a bunch of folders in both /tmp and /var/tmp called something like systemd-private-273bc022d82337529673d61c0673a579-apache2.service-oKiLBu.
Somehow my find command never reached those folders. All the created files are there, in a perfectly regular filesystem. Now I also understand why an Apache restart clears the Django cache: systemd deletes the process's private tmp and creates a new one for the new process.
I found the answer here: https://unix.stackexchange.com/a/303327/329567
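For anyone who needs to reach those files from outside the service, here is a small sketch (the apache2.service name and the django_cache directory come from my setup above; the private-folder layout is how systemd usually arranges it, and you'll need root since the folders are root-owned):

    import glob
    import os

    # The service's private /tmp and /var/tmp live on the host under
    # systemd-private-<id>-<service>-<suffix>/tmp. Look for the Django
    # cache directory inside them.
    for base in ("/tmp", "/var/tmp"):
        pattern = os.path.join(base, "systemd-private-*-apache2.service-*",
                               "tmp", "django_cache")
        for cache_dir in glob.glob(pattern):
            print(cache_dir, "->", len(os.listdir(cache_dir)), "cache files")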
Apologies upfront. I am an extreme newbie and this is probably a very easy question. After much trial and error, I set up an app on Heroku that runs a Python script that scrapes data off of a website and stores it in a text file. (I may switch the output to a .csv file.) The script runs on the Heroku Scheduler, so the scraping takes place on a schedule and the data automatically gets written to a file on the Heroku platform. I simply want to download that output file occasionally so that I can look at it. (Part of the scraped data is tweeted by a Twitter bot that is part of the script.)
(Not sure that this is relevant but I uploaded everything through Git.)
Many thanks in advance.
You can run this command heroku run cat path/to/file.txt, but keep in mind that Heroku uses ephemeral storage, so you don't have any guarantee that your file will be there.
For example, Heroku restarts your dynos every 24 hours or so. After that you won't have that file anymore. The general practice is to store files on some external storage provider like Amazon S3.
Not just ephemeral: nothing you write to the dyno's file system persists across restarts or deploys. You'll have to put the file in something like S3, or just put the data into a database like Postgres.
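A hedged sketch of the external-storage approach (the bucket name, key and the boto3 dependency are assumptions; credentials would come from the usual AWS config vars set on the dyno):

    import boto3

    def save_output(text):
        # Push the scrape output to S3 instead of the dyno's ephemeral disk.
        s3 = boto3.client("s3")
        s3.put_object(
            Bucket="my-scraper-bucket",   # hypothetical bucket name
            Key="scrapes/latest.txt",
            Body=text.encode("utf-8"),
        )

    # e.g. at the end of the scheduled scrape job:
    # save_output(scraped_text)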
I have previously written a script using python that monitors a windows directory and uploads any new files to a remote server offsite. The intent is to run it at all times and allow users to dump their files there to sync with the cloud directory.
When an added file is large enough that it is not transferred to the local drive all at once, Watchdog "sees" it while it is still being copied and tries to upload the partial file, which fails. How can I ensure that these files are complete before they are uploaded? Again, I am on Windows and cannot use anything but Windows to complete this task, or I would have used inotify. Is it even possible to check the "state" of a file in this way on Windows?
It looks like there is no easy way to do this. I think you can put something in place that checks the stats when the event triggers and only acts once the size hasn't changed for a given amount of time (see the sketch after the link below):
https://github.com/gorakhargosh/watchdog/issues/184
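Along those lines, here is a rough sketch of that idea, polling the new file's size rather than the whole folder's (the retry count and interval are arbitrary; this is not part of the watchdog API, just a helper you could call from your on_created handler before uploading):

    import os
    import time

    def wait_until_stable(path, checks=3, interval=2.0):
        """Return True once the file size is unchanged for `checks` consecutive polls."""
        last_size = -1
        stable = 0
        while stable < checks:
            try:
                size = os.path.getsize(path)
            except OSError:        # file vanished or is still locked
                return False
            if size == last_size:
                stable += 1
            else:
                stable = 0
                last_size = size
            time.sleep(interval)
        return True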
As a side note, I would check out Apache NiFi. I have used it with a lot of success, and it was pretty easy to get up and running:
https://nifi.apache.org/
In Django (1.9) I'm trying to load .py files (modules) dynamically (via importlib). The dynamic reload is working like a charm, but every time I reload a module, the dev server restarts and has to reload everything else.
I'm pulling in a lot of outside data (xml) for testing purposes, and every time the environment restarts, it has to reload all of this external xml data. I want to be able to reload a module only, and keep that already loaded xml data intact, so that it doesn't have to go through that process every time I change some py-code.
Is there a flag I can set/toggle (or any other method) to keep the server from restarting the whole process for this single module reload?
Any help very appreciated.
If you run the development server with the --noreload parameter, it will not auto-reload on code changes:
python manage.py runserver --noreload
Disables the auto-reloader. This means any Python code changes you make while the server is running will not take effect if the particular Python modules have already been loaded into memory.
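With the auto-reloader off, the manual reload side might look like this sketch (my_rules_module is just a placeholder for whichever module you keep editing; the XML data already loaded elsewhere in the process stays in memory):

    import importlib

    import my_rules_module   # hypothetical module whose .py keeps changing

    def refresh():
        # Re-executes the module's source in place, without restarting the
        # dev server process, so everything else already loaded is kept.
        importlib.reload(my_rules_module)
        return my_rules_module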
Would it be possible for a Python application running on a server to run another Python application and intercept all HDD reads and writes made by the child application, then send them over a web socket to a client application so that the operation can be executed on the client rather than the server?
Intercepting real hard disk access is impossible without OS-specific changes.
An easier approach would be intercepting file access.
If you're importing the Python module that does the writes, this can be done through simple monkey patching: just replace the file objects with instances of a custom class you create. You can even replace open, if you really want to.
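A minimal sketch of that monkey-patching idea (the forwarding hook is just a print here; the web-socket side is up to you, and none of this is a specific library's API):

    import builtins

    _real_open = builtins.open

    class InterceptedFile:
        def __init__(self, f):
            self._f = f
        def write(self, data):
            print(f"intercepted write to {self._f.name!r}: {len(data)} bytes")
            return self._f.write(data)     # or forward to the client instead
        def __getattr__(self, name):       # delegate everything else to the real file
            return getattr(self._f, name)
        def __enter__(self):
            return self
        def __exit__(self, *exc):
            return self._f.__exit__(*exc)

    def intercepting_open(file, mode="r", *args, **kwargs):
        f = _real_open(file, mode, *args, **kwargs)
        return InterceptedFile(f) if any(c in mode for c in "wa+") else f

    builtins.open = intercepting_open      # code imported afterwards sees the wrapper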
If you're launching a separate process (such as with subprocess) and want to keep doing so, I suspect this will be impossible in pure Python (without modifying the called program).
Some possible system-level solutions on Linux:
Use LD_PRELOAD to intercept library calls.
Write a FUSE program to intercept the filesystem access.