How do I stop hg child process when using hglib - python

I have a Python application in Mercurial. In the application I found the need to display from which commit it is currently running. The best solution I have found so far is to make use of hglib. I have a module which looks like this:
def _get_version():
    import hglib
    repo = hglib.open()
    [p] = repo.parents()
    return p[1]

version = _get_version()
This uses hglib to find the used version and stores the result in a variable, which I can use for the entire time the service remains running.
My problem now is that this leaves a hg child process running, which is useless to me, since as soon as this module is done initializing, I don't need to use hglib anymore.
I would have expected the child process to be shut down during garbage collection once my reference to the repository instance goes out of scope. But apparently that is not how it works.
When reading the hglib documentation I didn't find anything on how to get the child process shut down.
What is the preferred method to get the hg child process shut down once I am done with it?

You need to treat repo sort of like a file: either call repo.close() when you're done, or use it inside a with block:
def _get_version():
    import hglib
    with hglib.open() as repo:
        [p] = repo.parents()
        return p[1]

version = _get_version()
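If your version of hglib is too old for the client to support the with statement, a rough equivalent using an explicit close() call (a sketch of the same idea, not a different API):

def _get_version():
    import hglib
    repo = hglib.open()
    try:
        [p] = repo.parents()
        return p[1]
    finally:
        repo.close()  # shuts down the hg command-server child process

version = _get_version()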

Related

Prefect still treats files as cached after I've deleted them

The rest of my team uses Prefect for "pipelining stuff", so I'm trying to do a thing in Prefect, but for this thing, I need behavior sort of like GNU make. Specifically, I want to specify a filename at runtime, and
If the file doesn't exist, I want Prefect to run a specific task.
If the file exists, I want Prefect to skip that task.
I read through
Prefect caching through a file target
and got that system mostly working: behavior 2 works, and if I run it twice, then the second time is faster because the task is skipped. But behavior 1 doesn't work. If I run the flow, delete the file, and run the flow again, I want it to run the task, but it doesn't, and I still don't have my file at the end. How do I get it to run the task in this situation? Here's a little example.
import os
os.environ["PREFECT__FLOWS__CHECKPOINTING"] = "true"
from prefect.engine.results import LocalResult
from prefect import task, Flow, Parameter
import subprocess

@task(result=LocalResult(), target="{myfilename}")
def make_my_file(myfilename):
    subprocess.call(["touch", myfilename])
    subprocess.call(["sleep", "1"])
    return True

with Flow("makemyfile") as flow:
    myfilename = Parameter("myfilename", default="foo.txt")
    is_my_file_done = make_my_file(myfilename)

flow.run(myfilename="bar.txt")
To see the behavior:
python demo_flow.py # makes bar.txt
python demo_flow.py # skips the task
rm bar.txt
python demo_flow.py # still skips the task! Rawr!
For Prefect 2 I guess the answer should be:
https://docs.prefect.io/concepts/tasks/#refreshing-the-cache
But somehow in the current version (2.7.9) there is no refresh_cache on the task.
The only way I found to selectively delete the cache for a given flow is to:
in the table task_run_state, update the data of all task_runs for the given flow_run to null;
in the table task_run_state_cache, delete all the updated task_run_state rows.
I have also found somewhere on the internet that some people write their own cache_key_fn (a sketch of that idea follows below). Unfortunately I can't find where.
I have found that it should be easier to manage after the following fix is implemented:
https://github.com/PrefectHQ/prefect/issues/8239
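For what it's worth, here is a rough sketch of the custom cache_key_fn idea for Prefect 2. The file-existence check, the task body and all the names are my own guesses at the make-like behavior the question asks for, not code taken from the issue above:

import os
from prefect import flow, task
from prefect.tasks import task_input_hash

def file_based_cache_key(context, parameters):
    # if the target file is gone, return no key so the task runs again
    if not os.path.exists(parameters["myfilename"]):
        return None
    return task_input_hash(context, parameters)

@task(cache_key_fn=file_based_cache_key)
def make_my_file(myfilename):
    with open(myfilename, "w"):
        pass
    return True

@flow
def makemyfile(myfilename: str = "foo.txt"):
    make_my_file(myfilename)

if __name__ == "__main__":
    makemyfile("bar.txt")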

UUID stayed the same for different processes on CentOS, but works fine on Windows (UUID-per-process flow)

I have two source files that I am running in Python 3.9. (The files are big...)
File one (fileOne.py)
# ...
sessionID = uuid.uuid4().hex
# ...
File two (fileTwo.py)
# ...
from fileOne import sessionID
# ...
File two is executed using module multiprocessing.
When I run on my local machine and print the UUID in file two, it is always unique.
When I run the script on CentOS, the UUID somehow remains the same.
If I restart the service, the UUID will change once.
My question: Why does this work locally (Windows OS) as expected, but not on a CentOS VM?
UPDATE 1.0:
To make it clear:
For each separate process, I need the UUID to be the same across fileOne and fileTwo. Which means:
processOne = UUID in file one and in file two will be 1q2w3e
processTwo = UUID in file one and in file two will be r4t5y6 (a different one)
Your riddle is likely caused by the way multiprocessing works on different operating systems. You don't mention it, but your "run locally" is certainly Windows or macOS, not Linux or another Unix flavor.
The thing is that multiprocessing on Linux (and, until a while ago, on macOS, but that changed in Python 3.8) uses a system fork call: the current process is duplicated "as is" with all its defined variables and classes. Since your sessionID is defined at import time, it stays the same in all subprocesses.
Windows lacks the fork call, and multiprocessing resorts to starting a new Python interpreter which re-imports all modules from the current process (this leads to another, more common cause of confusion, where any code not guarded by an if __name__ == "__main__": block in the entry Python file is re-executed). In your case the value of sessionID is regenerated.
Check the docs at: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
So, if you want the variable to behave reliably and have the same value across all processes when running multiprocessing, you should either pass it as a parameter to the target functions in the other processes, or use a proper structure meant to share values across processes, as documented here:
https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes
(you can also check this recent question about the same topic: why is a string printing 3 times instead of 1 when using time.sleep with multiprocessing imported?)
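A minimal sketch of the first option (passing the value as a parameter); the function and variable names are just for illustration:

import multiprocessing
import uuid

def worker(session_id):
    # every child sees the value that was generated once in the parent
    print("worker sees", session_id)

if __name__ == "__main__":
    session_id = uuid.uuid4().hex
    procs = [multiprocessing.Process(target=worker, args=(session_id,))
             for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()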
If you need a unique ID across files for each different process:
(As is more clear from the edit and comments)
Have a global (plain) dictionary which will work as a per-process registry for the IDs, and use a function to retrieve the ID - the function can use os.getpid() as a key to the registry.
file 1:
import os
import uuid
...
_id_registry = {}
def get_session_id():
    return _id_registry.setdefault(os.getpid(), uuid.uuid4())
file2:
from file1 import get_session_id
sessionID = get_session_id()
(the setdefault dict method takes care of providing a new ID value if none was set)
NB.: the registry set up in this way will keep at most the master process ID (if multiprocessing is using fork mode) and its own - no data on the siblings, as each process will hold its own copy of the registry. If you need a working inter-process dictionary (which could hold a live registry for all processes, for example) you will probably be better off using Redis for it (https://redis.io - certainly one of the Python bindings has a transparent Python-mapping-over-Redis, so you don't have to worry about its semantics).
When you run your script it generates a new value for the UUID, but when you run it inside some service your code behaves the same as:
sessionID = 123 # simple constant
so to fix the issue you can try wrapping the code in a function, for example:
def get_uuid():
    return uuid.uuid4().hex
in your second file:
from fileOne import get_uuid

sessionID = get_uuid()

Is it possible to display file size in a directory served using http.server in python?

I've served a directory using
python -m http.server
It works well, but it only shows file names. Is it possible to show created/modified dates and file size, like you see in ftp servers?
I looked through the documentation for the module but couldn't find anything related to it.
Thanks!
http.server is meant for dead-simple use cases, and to serve as sample code. [1] That's why the docs link right to the source.
That means that, by design, it doesn't have a lot of configuration settings; instead, you configure it by reading the source and choosing what methods you want to override, then building a subclass that does that.
In this case, what you want to override is list_directory. You can see how the base-class version works, and write your own version that does other stuff—either use scandir instead of listdir, or just call stat on each file, and then work out how you want to cram the results into the custom-built HTML.
Since there's little point in doing this except as a learning exercise, I won't give you complete code, but here's a skeleton:
class StattyServer(http.server.SimpleHTTPRequestHandler):
    def list_directory(self, path):
        try:
            dirents = os.scandir(path)
        except OSError:
            # blah blah blah
        # etc. up to the end of the header-creating bit
        for dirent in dirents:
            fullname = dirent.path
            displayname = linkname = dirent.name
            st = dirent.stat()
            # pull stuff out of st
            # build a table row to append to r
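If you want something closer to runnable, here is a hedged sketch of that skeleton filled in (Python 3.7+ for ThreadingHTTPServer; the handler name, HTML layout and port are arbitrary choices for illustration, not the only way to do it):

import html
import io
import os
import sys
import time
import urllib.parse
from http import HTTPStatus
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

class StattyHandler(SimpleHTTPRequestHandler):
    def list_directory(self, path):
        try:
            entries = sorted(os.scandir(path), key=lambda e: e.name.lower())
        except OSError:
            self.send_error(HTTPStatus.NOT_FOUND, "No permission to list directory")
            return None
        rows = []
        for entry in entries:
            st = entry.stat()
            name = entry.name + ("/" if entry.is_dir() else "")
            mtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
            rows.append('<tr><td><a href="%s">%s</a></td><td>%d</td><td>%s</td></tr>'
                        % (urllib.parse.quote(name), html.escape(name), st.st_size, mtime))
        body = ("<html><body><table>"
                "<tr><th>Name</th><th>Size (bytes)</th><th>Modified</th></tr>"
                + "".join(rows) + "</table></body></html>")
        enc = sys.getfilesystemencoding()
        encoded = body.encode(enc, "surrogateescape")
        # list_directory must send the headers itself and return a readable file object
        f = io.BytesIO(encoded)
        self.send_response(HTTPStatus.OK)
        self.send_header("Content-type", "text/html; charset=%s" % enc)
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        return f

if __name__ == "__main__":
    with ThreadingHTTPServer(("", 8000), StattyHandler) as httpd:
        httpd.serve_forever()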
[1] Although really, it's sample code for an obsolete and clunky way of building servers, so maybe that should be "to serve as sample code to understand legacy code that you probably won't ever need to look at but just in case…".

Is there a way to reload the "current" file in Python?

I am working on a small IRC bot to do a few simple things, and I find it annoying that every time I want to test any changes I have to find the process, kill it, start the bot again and wait for it to connect. I tried to make a command for it that reloaded the Python file so any changes would be saved and I could edit it more easily that way, but when using this code to try and reload:
def reload(self, *args):
    reload(pybot)
    return "Reloaded!"
I get this error:
TypeError: reload() argument must be module
The only files this bot uses are its own, pybot.py, the iblib module and a few other Python libraries.
My question is: is there any way to make Python reload the file it is currently using, and not a module?
According to the error, "pybot" doesn't refer to a module. If the name of the module you want to reload is in fact "pybot", your code will work if at some point prior you successfully did "import pybot".
In the following example, assume "pybot.py" is a module that defines the variable version:
>>> import pybot
>>> print pybot.version
1.0
>>> # edit pybot.py to change version to 1.1
...
>>> reload(pybot)
<module 'pybot' from 'pybot.py'>
>>> print pybot.version
1.1
Instead of reloading the module, you can start a new process (replacing the old) with os.execl or os.execv:
os.execl("/path/to/pybot.py", "pybot.py")
But I think you'd be better off leaving this out of pybot. Just have your program save its PID (available via os.getpid()) into a file; then write a separate script to read the PID, kill and relaunch your program whenever you want. On a unix system it could be as simple as this:
#!/bin/sh
kill -9 `cat pybot.pid`
python pybot.py &
The builtin function reload(module) reloads a MODULE that must have been successfully imported before.
So here is what I can suggest:
Create a bot_core.py module with the core functions
Create another, minimal bot_main.py module (which you won't want to change) that loads bot_core and then uses its functions
Every time you want to reload bot_core, use reload (see the sketch below)
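A small sketch of that layout; bot_core and its connect()/handle_line() helpers are hypothetical names, not part of the original bot:

# bot_main.py
import importlib
import bot_core

def handle_command(line):
    if line.strip() == "!reload":
        importlib.reload(bot_core)  # on Python 2, the builtin reload() does the same
        return "Reloaded!"
    return bot_core.handle_line(line)

def main():
    for line in bot_core.connect():  # hypothetical generator yielding IRC lines
        handle_command(line)

if __name__ == "__main__":
    main()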

In Python, how do I make a temp file that persists until the next run?

I need to create a folder that I use only once, but need to have it exist until the next run. It seems like I should be using the tempfile module in the standard library, but I'm not sure how to get the behavior that I want.
Currently, I'm doing the following to create the directory:
randName = "temp" + str(random.randint(1000, 9999))
os.makedirs(randName)
And when I want to delete the directory, I just look for a directory with "temp" in it.
This seems like a dirty hack, but I'm not sure of a better way at the moment.
Incidentally, the reason that I need the folder around is that I start a process that uses the folder with the following:
subprocess.Popen([command], shell=True).pid
and then quit my script to let the other process finish the work.
Creating the folder with a 4-digit random number is insecure, and you also need to worry about collisions with other instances of your program.
A much better way is to create the folder using tempfile.mkdtemp, which does exactly what you want (i.e. the folder is not deleted when your script exits). You would then pass the folder name to the second Popen'ed script as an argument, and it would be responsible for deleting it.
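A short sketch of that hand-off; "worker.py" is a placeholder for the second script, which would be responsible for removing the folder when it is done:

import subprocess
import sys
import tempfile

work_dir = tempfile.mkdtemp(prefix="myjob-")
subprocess.Popen([sys.executable, "worker.py", work_dir])
# this script can now exit; worker.py deletes work_dir when it finishes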
What you've suggested is dangerous. You may have race conditions if anyone else is trying to create those directories -- including other instances of your application. Also, deleting anything containing "temp" may result in deleting more than you intended. As others have mentioned, tempfile.mkdtemp is probably the safest way to go. Here is an example of what you've described, including launching a subprocess to use the new directory.
import tempfile
import shutil
import subprocess

d = tempfile.mkdtemp(prefix='tmp')
try:
    subprocess.check_call(['/bin/echo', 'Directory:', d])
finally:
    shutil.rmtree(d)
"I need to create a folder that I use only once, but need to have it exist until the next run."
"Incidentally, the reason that I need the folder around is that I start a process ..."
Not incidental, at all. Crucial.
It appears you have the following design pattern.
mkdir someDirectory
proc1 -o someDirectory # Write to the directory
proc2 -i someDirectory # Read from the directory
if [ $? -eq 0 ]
then
    rm -rf someDirectory
fi
Is that the kind of thing you'd write at the shell level?
If so, consider breaking your Python application into several parts:
The parts that do the real work ("proc1" and "proc2")
A Shell which manages the resources and processes; essentially a Python replacement for a bash script.
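A rough Python version of that shell-level pattern (proc1.py and proc2.py stand in for the parts that do the real work):

import shutil
import subprocess
import sys
import tempfile

work_dir = tempfile.mkdtemp(prefix="job-")
subprocess.check_call([sys.executable, "proc1.py", "-o", work_dir])    # write to the directory
status = subprocess.call([sys.executable, "proc2.py", "-i", work_dir])  # read from the directory
if status == 0:
    shutil.rmtree(work_dir)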
A temporary file is something that lasts for a single program run.
What you need is not, therefore, a temporary file.
Also, beware of multiple users on a single machine - just deleting anything with the 'temp' pattern could be anti-social, doubly so if the directory is not located securely out of the way.
Also, remember that on some machines, the /tmp file system is rebuilt when the machine reboots.
You can also automatically register a function to completely remove the temporary directory on any exit (with or without error) by doing:
import atexit
import shutil
import subprocess
import tempfile

# create your temporary directory
d = tempfile.mkdtemp()

# remove it when Python exits
atexit.register(lambda: shutil.rmtree(d))

# do your stuff...
subprocess.Popen([command], shell=True).pid
tempfile is just fine, but to be on the safe side you'd need to save the directory name somewhere until the next run, for example by pickling it; then read it in the next run and delete the directory. And you are not required to have /tmp as the root: tempfile.mkdtemp has an optional dir parameter for that. By and large, though, it won't be much different from what you're doing at the moment.
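For illustration, a minimal sketch of "remember the directory until the next run" using a plain state file instead of pickle (the state-file location is an arbitrary choice):

import os
import shutil
import tempfile

STATE_FILE = os.path.expanduser("~/.myapp_last_tmpdir")

def cleanup_previous_run():
    # delete the directory left behind by the previous run, if any
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            old_dir = f.read().strip()
        if old_dir and os.path.isdir(old_dir):
            shutil.rmtree(old_dir)

def make_run_dir():
    d = tempfile.mkdtemp(prefix="myapp-")
    with open(STATE_FILE, "w") as f:
        f.write(d)
    return d

cleanup_previous_run()
work_dir = make_run_dir()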
The best way of creating the temporary file name is using tempfile.TemporaryFile(mode='w+b', suffix='.tmp', prefix='someRandomNumber', dir=None),
or you can use the mktemp() function.
The mktemp() function will not actually create any file, but will provide a unique filename (which actually does not contain the PID).
