Skyfield.api loader behaves differently in docker container - python

I wish to specify a download directory to Skyfield, as documented here:
http://rhodesmill.org/skyfield/files.html
Here is my script:
from skyfield.api import Loader
load = Loader('~/data/skyfield')
# Next line downloads deltat.data, deltat.preds, Leap_Second.dat in ~/data/skyfield
ts = load.timescale()
t = ts.utc(2017,9,13,0,0,0)
stations_url = 'http://celestrak.com/NORAD/elements/stations.txt'
# Next line downloads stations.txt in ~/data/skyfield AND deltat.data, deltat.preds, Leap_Second.dat in $PWD !!!
satellites = load.tle(stations_url)
satellite = satellites['ISS (ZARYA)']
Expected behaviour (works fine outside docker)
The 3 deltat files (deltat.data, deltat.preds and Leap_Second.dat) are downloaded to ~/data/skyfield by load.timescale(), and stations.txt is downloaded to the same place by load.tle(stations_url).
Behaviour when run in a container
The 3 deltat files get downloaded twice:
once in the specified folder, at the load.timescale() call
again in the current directory, at the load.tle(stations_url) call
This is frustrating because they already exist at this point and they pollute the current directory. Note that stations.txt ends up in the right place (~/data/skyfield).
If the container is run interactively, then calling exec(open("script.py").read()) in a Python shell gives the normal behaviour again. Can anyone reproduce this issue? It is hard to tell whether it comes from Python, Docker or Skyfield.
The Dockerfile is just these 2 lines:
FROM continuumio/anaconda3:latest
RUN conda install -c astropy astroquery && conda install -c anaconda ephem=3.7.6.0 && pip install skyfield
Then (assuming the built image is tagged astro) I run it with:
docker run --rm -w /tmp/working -v $PWD:/tmp/working astro:latest python script.py
And here is the output (provided the folders are empty before the run):
[#################################] 100% deltat.data
[#################################] 100% deltat.preds
[#################################] 100% Leap_Second.dat
[#################################] 100% stations.txt
[#################################] 100% deltat.data
[#################################] 100% deltat.preds
[#################################] 100% Leap_Second.dat
EDIT
Adding -t to docker run did not solve the issue, but it helped illustrate it even better. I think it may come from Skyfield, because some recent issues on GitHub seem quite similar, although not exactly the same.

The simple solution here is to add -t to your docker run command to allocate a pseudo TTY:
docker run --rm -t -w /tmp/working -v $PWD:/tmp/working astro:latest python script.py
What you are seeing is caused by the way the progress lines are printed, combined with the buffering of non-TTY stdout. The percentage up to 100% is likely printed on a line without a newline, and after reaching 100% it is printed again with a newline; with buffering, that makes the line appear twice.
When you run the same command with a TTY, there is no block buffering: the lines are printed in real time and the newlines work as intended.
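To see the buffering effect in isolation (outside of Skyfield), here is a minimal sketch of a progress bar drawn with carriage returns; the file name is just a label, not Skyfield's actual code:
# Minimal sketch: on a TTY each write shows up immediately; when stdout is a
# pipe (docker run without -t), the writes sit in a block buffer and only
# appear when flushed, so the output looks quite different.
import sys
import time

def progress(label, steps=33):
    for i in range(steps + 1):
        bar = '#' * i
        sys.stdout.write('\r[%-33s] %3d%% %s' % (bar, 100 * i // steps, label))
        if sys.stdout.isatty():
            sys.stdout.flush()   # interactive terminal: redraw the line in place
        time.sleep(0.01)
    sys.stdout.write('\n')       # final line, now with a newline

progress('deltat.data')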
The code path isn't actually running twice :)
See "Docker run with pseudoTTY (-t) gives instant stdout, buffering happens without it" for another explanation (possibly better than mine).

Related

copy large data to GCS

I have 10001920 images, and their names are train_0, train_1, ....
I tried to copy them like this:
!gsutil -m cp -r /content/train/* gs://{my_bucket_name}/data
And it failed because it was too long, so I decided to use a wildcard like this:
!gsutil -m cp -r /content/train/train_1????.png gs://{my_bucket_name}/data
I wanted to upload them iteratively, so I used a for statement to generate the command lines:
import os
for script in script_list:
    os.system(script)
And it returns
31512
I just want to know how I can upload this huge set of files to GCS.
Please give me some ideas.
I don't think * should be used. It's not used that way in the documentation. I'd just try:
!gsutil -m cp -r ./content/train gs://{my_bucket_name}/data
This explains the failure number:
Also, although most commands normally fail upon encountering an error when the -m flag is disabled, all commands continue to try all operations when -m is enabled with multiple threads or processes, and the number of failed operations (if any) are reported as an exception at the end of the command's execution.
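If you do still want to drive batches from Python, note that gsutil expands its own wildcards, so passing the pattern straight to gsutil (instead of letting the shell expand it) keeps the command line short. A rough sketch, with the bucket name and patterns as placeholders, and subprocess.run used so the per-batch exit code is explicit:
# Sketch: one gsutil invocation per wildcard batch; gsutil expands the ?
# wildcards itself, so the shell never builds a huge argument list.
import subprocess

bucket = "my_bucket_name"  # placeholder
patterns = [f"/content/train/train_{i}????.png" for i in range(10)]

for pattern in patterns:
    result = subprocess.run(["gsutil", "-m", "cp", pattern, f"gs://{bucket}/data"])
    if result.returncode != 0:
        print(f"batch {pattern} exited with {result.returncode}")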

How to get the location of installed Python package into the shell

I want my users to be able to reference a file in my python package (specifically a docker-compose.yml file) directly from the shell.
I couldn't find a way to get only the location from pip show (and grep-ing out "location" from its output feels ugly), so my current (somewhat verbose) solution is:
docker compose -f $(python3 -c "import locust_plugins; print(locust_plugins.__path__[0])")/timescale/docker-compose.yml up
Is there a better way?
Edit: I solved it by installing a wrapper command I call locust-compose as part of the package. Not perfect, but it gets the job done:
#!/bin/bash
module_location=$(python3 -c "import locust_plugins; print(locust_plugins.__path__[0])")
set -x
docker compose -f "$module_location/timescale/docker-compose.yml" "$@"
Most of the support you need for this is in the core setuptools suite.
First of all, you need to make sure the data file is included in your package. In a setup.cfg file you can write:
[options.package_data]
timescale = docker-compose.yml
Now if you pip install . or pip wheel, that will include the Compose file as part of the Python package.
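If you want to confirm the file actually made it into the distribution, a wheel is just a zip archive, so a quick check might look like this (the wheel filename below is a placeholder for whatever pip wheel produced):
# Sanity-check sketch: list a built wheel and confirm docker-compose.yml is inside.
import zipfile

wheel_path = "dist/timescale-0.1-py3-none-any.whl"  # placeholder filename
with zipfile.ZipFile(wheel_path) as whl:
    for name in whl.namelist():
        if name.endswith("docker-compose.yml"):
            print("found:", name)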
Next, you can retrieve this in Python code using the ResourceManager API:
#!/usr/bin/env python3
# timescale/compose_path.py
import pkg_resources

def main():
    print(pkg_resources.resource_filename('timescale', 'docker-compose.yml'))

if __name__ == '__main__':
    main()
And finally, you can take that script and make it a setuptools entry point script (as distinct from the similarly-named Docker concept), so that you can just run it as a single command.
[options.entry_points]
console_scripts =
    timescale_compose_path = timescale.compose_path:main
Again, if you pip install . into a virtual environment, you should be able to run timescale_compose_path and get the path name out.
Having done all of those steps, you can finally run a simpler
docker-compose -f $(timescale_compose_path) up

snakemake cluster script ImportError snakemake.utils

I have a strange issue that comes and goes randomly and I really can't figure out when and why.
I am running a snakemake pipeline like this:
conda activate $myEnv
snakemake -s $snakefile --configfile test.conf.yml --cluster "python $qsub_script" --latency-wait 60 --use-conda -p -j 10 --jobscript "$job_script"
I installed snakemake 5.9.1 (also tried downgrading to 5.5.4) within a conda environment.
This works fine if I just run this command, but when I qsub this command to the PBS cluster I'm using, I get an error. My qsub script looks like this:
#PBS stuff...
source ~/.bashrc
hostname
conda activate PGC_de_novo
cd $workDir
snakefile="..."
qsub_script="pbs_qsub_snakemake_wrapper.py"
job_script="..."
snakemake -s $snakefile --configfile test.conf.yml --cluster "python $qsub_script" --latency-wait 60 --use-conda -p -j 10 --jobscript "$job_script" >out 2>err
And the error message I get is:
...
Traceback (most recent call last):
File "/path/to/pbs_qsub_snakemake_wrapper.py", line 6, in <module>
from snakemake.utils import read_job_properties
ImportError: No module named snakemake.utils
Error submitting jobscript (exit code 1):
...
So it looks like for some reason my cluster script doesn't find snakemake, although snakemake is clearly installed. As I said, this problem keeps coming and going: it stays for a few hours, then goes away for no apparent reason. I guess this indicates an environment problem, but I really can't figure out what it is, and I've run out of ideas. I've tried:
different conda versions
different snakemake versions
different nodes on the cluster
ssh to the node it just failed on and try to reproduce the error
but nothing. Any ideas where to look? Thanks!
Following @Manavalan Gajapathy's advice, I added print(sys.version) commands both to the snakefile and to the cluster script, and in both cases got a Python version (2.7.5) different from the one indicated in the activated environment (3.7.5).
To cut a long story short: for some reason, when I activate the environment within a PBS job, the environment path is added to $PATH only after /usr/bin, which results in /usr/bin/python being used (which does not have the snakemake package). When the env is activated locally, the env path is added to the beginning of $PATH, so the right python is used.
I still don't understand this behavior, but at least I could work around it by changing $PATH. I guess this is not a very elegant solution, but it works for me.
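For anyone hitting something similar, that diagnostic boils down to a few lines you can drop into both the Snakefile and the cluster script (a sketch; it only prints information):
# Diagnostic sketch: show which interpreter is actually running and what PATH
# it sees; /usr/bin/python showing up here explains the missing snakemake.utils.
import os
import sys

print(sys.version)                 # e.g. 2.7.5 instead of the env's 3.7.5
print(sys.executable)              # full path of the interpreter in use
print(os.environ.get("PATH", ""))  # check where the conda env sits in PATH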
A possibility could be that some cluster nodes don't find the path to the snakemake package so when a job is submitted to those nodes you get the error.
I don't know if/how that could happen but if that is the case you could find the incriminated nodes with something like:
for node in pbsnodes
do
    echo $node
    ssh $node 'python -c "from snakemake.utils import read_job_properties"'
done
(for node in pbsnodes iterates through the available nodes - I don't have the exact syntax right now, but hopefully you get the idea). This at least would narrow down the problem a bit...

Python package, "Updating the INI File"

I am working with a Python package that I installed called bacpypes, for communicating with building automation equipment. Right at the very beginning, while going through the pip install & git clone of the repository, the readthedocs instructions call for:
Updating the INI File
Now that you know what these values are going to be, you can configure the BACnet portion of your workstation. Change into the samples directory that you checked out earlier, make a copy of the sample configuration file, and edit it for your site:
$ cd bacpypes/samples
$ cp BACpypes~.ini BACpypes.ini
The problem that I have (which is not enough knowledge) is that there isn't a sample configuration file that I can see in the bacpypes/samples directory. There are only .py files, nothing with an .ini extension or the name BACpypes.ini.
If I open up the samples directory in a terminal and run cp BACpypes~.ini BACpypes.ini, I get an error: cp: cannot stat 'BACpypes~.ini': No such file or directory
Any tips would help, thank you...
There's a sample .ini in the documentation, a couple of paragraphs after the commands you copied. It looks like this
[BACpypes]
objectName: Betelgeuse
address: 192.168.1.2/24
objectIdentifier: 599
maxApduLengthAccepted: 1024
segmentationSupported: segmentedBoth
maxSegmentsAccepted: 1024
vendorIdentifier: 15
foreignPort: 0
foreignBBMD: 128.253.109.254
foreignTTL: 30
I'm not sure why you couldn't copy BACpypes~.ini. I know the tilde can be expanded by your shell, so you could try to escape it with
cp BACpypes\~.ini BACpypes.ini
Though I assume it isn't needed now that you have a default configuration file.
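If it helps, the same file can also be generated from Python with the standard library configparser, using the sample values shown above (a sketch; adjust address, objectIdentifier, etc. for your site):
# Sketch: write a BACpypes.ini with the sample values from the documentation.
from configparser import ConfigParser

config = ConfigParser()
config["BACpypes"] = {
    "objectName": "Betelgeuse",
    "address": "192.168.1.2/24",
    "objectIdentifier": "599",
    "maxApduLengthAccepted": "1024",
    "segmentationSupported": "segmentedBoth",
    "maxSegmentsAccepted": "1024",
    "vendorIdentifier": "15",
    "foreignPort": "0",
    "foreignBBMD": "128.253.109.254",
    "foreignTTL": "30",
}

with open("BACpypes.ini", "w") as f:
    config.write(f)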

python doesn't extract zipfile when run through crontab

#!/usr/bin/python
import requests, zipfile, StringIO, sys
extractDir = "myfolder"
zip_file_url = "download url"
response = requests.get(zip_file_url)
zipDocument = zipfile.ZipFile(StringIO.StringIO(response.content))
zipinfos = zipDocument.infolist()
for zipinfo in zipinfos:
    extrat = zipDocument.extract(zipinfo, path=extractDir)
System configuration
Ubuntu OS 16.04
Python 2.7.12
$ python extract.py
When I run the code in a terminal with the above command, it works properly: it creates the folder and extracts the files into it.
However, when I create a cron job using sudo rights, the code executes but doesn't create any folder or extract the files.
crontab command:
40 10 * * * /usr/bin/sudo /usr/bin/python /home/ubuntu/demo/directory.py > /home/ubuntu/demo/logmyshit.log 2>&1
also tried
40 10 * * * /usr/bin/python /home/ubuntu/demo/directory.py > /home/ubuntu/demo/logmyshit.log 2>&1
Notes:
I checked the syslog; it says the cron job runs successfully
The above code gives no errors
I also made the Python program executable with chmod +x filename.py
Please help; where am I going wrong?
Oops, there is nothing really wrong with running a Python script from crontab, but many bad things can happen because the environment is not the one you are used to.
When you type python directory.py in an interactive shell, PATH and all the required Python environment variables have been set as part of login and interactive shell initialization, and the current directory is wherever you happen to be (your home directory by default).
When the same command is run from crontab, the current directory is not specified (and may not be what you expect), PATH is only /bin:/usr/bin, and the Python environment variables are not set. That means you will have to tweak the environment variables in the crontab file until you get a correct Python environment, and set the current directory explicitly.
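As a concrete illustration, making the script independent of cron's working directory and PATH might look something like this (a sketch; the paths are placeholders for your own layout):
# Sketch: use absolute paths and log failures so the script does not depend on
# cron's working directory or PATH. BASE_DIR is a placeholder.
import logging
import os

BASE_DIR = "/home/ubuntu/demo"
EXTRACT_DIR = os.path.join(BASE_DIR, "myfolder")

logging.basicConfig(filename=os.path.join(BASE_DIR, "extract.log"), level=logging.INFO)
try:
    os.chdir(BASE_DIR)  # do not rely on whatever directory cron starts in
    logging.info("cwd=%s PATH=%s", os.getcwd(), os.environ.get("PATH"))
    # ... download and extract here as in the script above, into EXTRACT_DIR ...
except Exception:
    logging.exception("extraction failed")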
I had a very similar problem, and it turned out cron didn't like importing matplotlib; I ended up having to specify the Agg backend. I figured it out by putting log statements after each line to see how far the program got before it died. Of course, my log was empty, which tipped me off that it crashed on the imports.
TLDR: log each line inside the script
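(For reference, forcing the non-interactive Agg backend looks like this; the call has to come before pyplot is imported:)
# Select a backend that needs no display, e.g. when running under cron.
import matplotlib
matplotlib.use("Agg")            # must come before importing pyplot
import matplotlib.pyplot as plt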
