I'm trying to install the awsglue package locally for development purposes on my machine (Windows + Git Bash):
https://github.com/awslabs/aws-glue-libs/tree/glue-1.0
https://support.wharton.upenn.edu/help/glue-debugging
The Spark directory and the py4j archive mentioned in the error below do exist, but I still get the error.
The directory from which I run the script is:
user@machine xxxx64 ~/Desktop/lm_aws_glue/aws-glue-libs-glue-1.0/bin
$ ./glue-setup.sh
ls: cannot access 'C:\Spark\spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip': No such file or directory
rm: cannot remove 'PyGlue.zip': No such file or directory
./glue-setup.sh: line 14: zip: command not found
ls result:
$ ls -l
total 7
-rwxr-xr-x 1 n1543781 1049089 135 May 5 2020 gluepyspark*
-rwxr-xr-x 1 n1543781 1049089 114 May 5 2020 gluepytest*
-rwxr-xr-x 1 n1543781 1049089 953 Mar 5 11:10 glue-setup.sh*
-rwxr-xr-x 1 n1543781 1049089 170 May 5 2020 gluesparksubmit*
The original install script needs a few tweaks and then works OK. A workaround is still needed for the missing zip command.
#!/usr/bin/env bash
#original code
#ROOT_DIR="$(cd $(dirname "$0")/..; pwd)"
#cd $ROOT_DIR
#re-written
ROOT_DIR="$(cd /c/aws-glue-libs; pwd)"
cd $ROOT_DIR
SPARK_CONF_DIR=$ROOT_DIR/conf
GLUE_JARS_DIR=$ROOT_DIR/jarsv1
#original code
#PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
#PYTHONPATH=`ls $SPARK_HOME/python/lib/py4j-*-src.zip`:"$PYTHONPATH"
#re-written
PYTHONPATH="/c/Spark/spark-3.1.1-bin-hadoop2.7/python/:$PYTHONPATH"
PYTHONPATH=`ls /c/Spark/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip`:"$PYTHONPATH"
# Generate the zip archive for glue python modules
rm PyGlue.zip
zip -r PyGlue.zip awsglue
GLUE_PY_FILES="$ROOT_DIR/PyGlue.zip"
export PYTHONPATH="$GLUE_PY_FILES:$PYTHONPATH"
# Run mvn copy-dependencies target to get the Glue dependencies locally
#mvn -f $ROOT_DIR/pom.xml -DoutputDirectory=$ROOT_DIR/jarsv1 dependency:copy-dependencies
export SPARK_CONF_DIR=${ROOT_DIR}/conf
mkdir $SPARK_CONF_DIR
rm $SPARK_CONF_DIR/spark-defaults.conf
# Generate spark-defaults.conf
echo "spark.driver.extraClassPath $GLUE_JARS_DIR/*" >> $SPARK_CONF_DIR/spark-defaults.conf
echo "spark.executor.extraClassPath $GLUE_JARS_DIR/*" >> $SPARK_CONF_DIR/spark-defaults.conf
# Restore present working directory
cd -
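One workaround for the missing zip command under Git Bash is to build the archive with Python's standard-library zipfile module instead. This is only a sketch, assuming a Python 3 interpreter is available on PATH as python:
# Build PyGlue.zip without the external zip utility, using the stdlib zipfile CLI
cd /c/aws-glue-libs
python -m zipfile -c PyGlue.zip awsglue/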
I want to process a set of HDR files with an executable (e.g., falsecolor2.exe) on Google Colab.
The source file is here: https://github.com/mostaphaRoudsari/honeybee/blob/master/resources/falsecolor2.exe?raw=true
sample HDR files:
http://www.anyhere.com/gward/hdrenc/pages/originals.html
The executable takes an HDR file with some arguments and generates a new HDR file.
On my local machine and drive, the following code works OK:
import os
os.system(r'D:\falsecolor2.exe -i D:\test.hdr -s 250.0 -n 10 -mask 0.1 -l lux -m 179 -lp EN -z > D:\test#fc.hdr')
I am not sure how to create a similar process in Colab; after mounting Google Drive, the following code generates a 0-byte, non-working HDR file in my Drive and returns error code 32256.
import os
os.system('/content/drive/My\ Drive/falsecolor2.exe -i /content/drive/My\ Drive/MOSELEY\ IMAGES/test.hdr -s 250.0 -n 10 -mask 0.1 -l lux -m 179 -lp EN -z > /content/drive/My\ Drive/test#fc.hdr')
I read some threads on shell scripts and Linux executables but could not replicate any of them successfully.
You can install Radiance in Google Colab like this:
# Download the Linux compiled version of radiance from Github (e.g. 5.3, latest official release at the moment):
!wget -O radiance.zip https://github.com/LBNL-ETA/Radiance/releases/download/012cb178/Radiance_012cb178_Linux.zip
# Unzip it
!unzip radiance.zip
# Extract the tar.gz to /usr/local/radiance
!tar -xvf radiance-5.3.012cb17835-Linux.tar.gz --strip-components=1 -C /
# Add /usr/local/radiance/bin to the PATH environment variable
path = %env PATH
%env PATH=/usr/local/radiance/bin:$path
# Set the RAYPATH environment variable to /usr/local/radiance/lib
%env RAYPATH=/usr/local/radiance/lib
I ran !lsb_release -a to find out which Linux distribution Google Colab uses, and it said Ubuntu 18.04. Unfortunately, Radiance does not seem to be packaged for that version, only for 16.04. That is why getting it from GitHub seems to be the next simplest solution. See radiance in Ubuntu Packages:
Exact hits
Package radiance
xenial (16.04LTS) (graphics): Lighting Simulation and Rendering System [universe]
4R1+20120125-1.1: amd64 arm64 armhf i386 powerpc ppc64el s390x
Then I tried to run your falsecolor command using one of the sample images you linked and found that the -lp and -z options are not available:
# Get a sample HDR file
!wget -O input.hdr http://www.anyhere.com/gward/hdrenc/pages/img/Apartment_float_o15C.hdr
# Try original command
!falsecolor -i input.hdr -s 250.0 -n 10 -mask 0.1 -l lux -m 179 -lp EN -z > output.hdr
# Output:
# bad option "-lp"
# Remove option -lp
!falsecolor -i input.hdr -s 250.0 -n 10 -mask 0.1 -l lux -m 179 -z > output.hdr
# Output:
# bad option "-z"
If you remove those options the command runs successfully:
# Remove option -z
!falsecolor -i input.hdr -s 250.0 -n 10 -mask 0.1 -l lux -m 179 > output.hdr
# List output file
!ls -lh output.hdr
# Output:
# -rw-r--r-- 1 root root 4.8M Mar 31 02:57 output.hdr
See a demo in this Colab notebook.
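With Radiance installed this way, the conversion from the question should be runnable directly against the files on the mounted Drive. The following is only a sketch using the paths from the question (spaces quoted, and the unsupported -lp/-z options dropped):
# Run falsecolor on an HDR file stored in Google Drive and write the result back to Drive
!falsecolor -i "/content/drive/My Drive/MOSELEY IMAGES/test.hdr" -s 250.0 -n 10 -mask 0.1 -l lux -m 179 > "/content/drive/My Drive/test#fc.hdr"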
chd.sh
#! /bin/bash
cd django/hellodjango
exec bash
python manage.py runserver
chd.py
# a=`python chd.py`;cd $a
import os
new_dir = "django/hellodjango"
os.chdir(new_dir)
are the two ways I have tried.
Also, in the terminal I have tried:
. chd.sh
./chd.sh
. ./chd.sh
I have also tried assigning it to a variable and then running that in the terminal, but with no success.
I've spent over 4 hours trying multiple methods given on stackoverflow.com, but no success yet.
The only thing that has worked so far is:
alias mycd='cd django/hellodjango'
But I will have to copy-paste it every time.
alias myrun = `cd django/hellodjango && python manage.py runserver`
And,
alias myrun = `cd django/hellodjango; python manage.py runserver`
doesn't work.
This is just a sample; there are many more Django commands that I have to use repeatedly. I appreciate it if you have read all this way.
If you know a link where this is discussed, please attach it, as I was not able to find one after hours of searching.
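For context on the alias attempts above: a bash alias definition must have no spaces around the = and normally uses quotes rather than backticks (backticks run the command immediately, at definition time). A sketch of a working form, assuming the project sits under the directory the shell starts in:
# The quoted command string is stored and only runs when 'myrun' is typed
alias myrun='cd django/hellodjango && python manage.py runserver'
Putting such an alias in ~/.bashrc makes it available in every new shell, so it does not need to be re-pasted.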
Edit:
/storage/emulated/0 $
This is what the prompt appears like.
/storage/emulated/0/django/hellodjango
This is the path.
/storage/emulated/0 $ cd django/hellodjango
/storage/emulated/0/django/hellodjango $ python manage.py runserver
Watching for file changes with StatReloader
Performing system checks...
System check identified no issues (0 silenced).
July 25, 2020 - 19:08:42
Django version 3.0.7, using settings 'hellodjango.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Individually works fine.
Edit:
/storage/emulated/0 $ chmod u+x chd.sh
/storage/emulated/0 $ chmod u+x rn.sh
/storage/emulated/0 $ ./chd.sh
./chd.sh: cd: line 2: can't cd t: No such file or directory
/storage/emulated/0 $ chmod u+x chd.py
/storage/emulated/0 $ a=python chd.py;cd $a
~/data/ru.iiec.pydroid3/app_HOME $
Edit:
/data/user/0/tech.ula/files/support/dbclient: Caution, skipping hostkey check for localhost
subham@localhost's password:
subham@localhost:~$ ls
subham@localhost:~$ cd
subham@localhost:~$ pwd
/home/subham
subham@localhost:~$ pkg install miniconda
-bash: pkg: command not found
subham@localhost:~$ apt install miniconda
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package miniconda
subham@localhost:~$
subham@localhost:~$ cd ..
subham@localhost:/home$ cd ..
subham@localhost:/$ ls
bin dev host-rootfs mnt root srv
sys var boot etc lib opt run storage tmp data home
media proc sbin support usr
subham@localhost:/$ cd ..
subham@localhost:/$ cd sys
subham@localhost:/sys$ ls
ls: cannot open directory '.': Permission denied
subham@localhost:/sys$ cd..
-bash: cd..: command not found
subham@localhost:/sys$ cd ..
subham@localhost:/$ cd storage
subham@localhost:/storage$ ls internal
subham@localhost:/storage$ cd internal
subham@localhost:/storage/internal$ ls
subham@localhost:/storage/internal$ ls -l
total 0
subham@localhost:/storage/internal$ cd 0
-bash: cd: 0: No such file or directory
subham@localhost:/storage/internal$
subham@localhost:/$ chmod -R 777 /host-rootfs
chmod: changing permissions of '/host-rootfs': Read-only file system
chmod: cannot read directory '/host-rootfs': Permission denied
subham@localhost:/$
https://github.com/CypherpunkArmory/UserLAnd/issues/46
A simple Dockerfile that executes a shell script as its entrypoint, like this one:
FROM python:3
WORKDIR /app
COPY . .
RUN chmod +x entrypoint.sh
CMD ["python", "/app/src/api.py"]
ENTRYPOINT ["./entrypoint.sh"]
works: entrypoint.sh is called, which in turn executes python /app/src/api.py, on a Raspberry Pi 3.
entrypoint.sh
#!/bin/bash
echo starting entrypoint
set -x
exec "$#"
Since I don't need anything Debian/Raspbian specific, I'd like to use the alpine image to reduce image size. So I changed FROM python:3 to FROM python:3-alpine without any further changes.
But now the container doesn't start:
api_1 | standard_init_linux.go:211: exec user process caused "no such file or directory"
test_api_1 exited with code 1
Why doesn't this work on Alpine? I don't see any problem since /app/entrypoint.sh exists and it's also executable:
Step 5/7 : RUN ls -lh
---> Running in d517a83c5b9b
total 12K
-rwxr-xr-x 1 root root 54 Jul 11 18:35 entrypoint.sh
drwxr-xr-x 2 root root 4.0K Jul 11 18:48 src
In a similar question, the image was built on a non-ARM system. That is not the case for me; I'm building directly on the RPi.
I'm not sure whether this is Raspbian- or Alpine-related. I also tried using the absolute path ENTRYPOINT ["/app/entrypoint.sh"]: it still works on the python:3 image, but is broken on python:3-alpine.
The problem was that my entrypoint script uses bash in its shebang, which I hadn't given any thought:
#!/bin/bash
echo starting entrypoint
set -x
exec "$#"
But Alpine doesn't include GNU Bash, which is installed by default on Debian/Raspbian. The "no such file or directory" refers to the interpreter named in the shebang, not to the entrypoint script itself.
Since my script isn't intended to do anything that requires bash features, I just changed #!/bin/bash to #!/bin/sh, and it now also starts on the Alpine image.
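Alternatively, if the script really did need bash features, GNU Bash could be installed into the Alpine-based image with an extra step in the Dockerfile (a sketch, at the cost of a slightly larger image):
# Install bash in the Alpine image so a #!/bin/bash shebang can be resolved
RUN apk add --no-cache bash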
I'm setting up a multi-user Airflow cluster for a team of data scientists, with various usages for the DAGs (ETL, NLP, ML, NN...), some of them having specific Python dependencies.
I can't simply add all the DAGs' dependencies at system level. Of course, I can set up a baseline for common uses, but for specific needs it's going to be very helpful to rely on the zipped-DAG feature.
So, in order to address that multi-context problem, I'm testing the packaged DAG feature of Airflow 1.9.0 (on Ubuntu 16.04).
I'm following the example to test it with an arbitrary PyPI package.
I randomly picked a Python module (python-crontab). (Prior to this I tried with beefier modules, but they took longer to reproduce the tests with.)
Test scenario: being able to import that module and print its version from a zipped DAG.
here's the way I did it:
$ virtualenv venv --python=python3
$ source venv/bin/activate
(venv) $ mkdir contents && cd contents
$ pip install --install-option="--install-lib=$PWD" python-crontab
$ cp ../my_dag.py .
$ zip -r ../test_zip_2.zip *
$ cp ../test_zip_2.zip /path/to/dags
$ journalctl -f -u airflow-scheduler.service
(...)
WARNING - No viable dags retrieved from /path/to/dags/test_zip_2.zip
contents of my DAG:
import crontab
import airflow.utils.dates as a_dates
from airflow.operators.python_operator import PythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.models import DAG
from pprint import pprint
args = {
    'owner': 'airflow',
    'start_date': a_dates.days_ago(1)
}

def print_context(ds, **kwargs):
    pprint(kwargs)
    print(ds)
    print(crontab.__version__)
    return 'Whatever you return gets printed in the logs'

with DAG(dag_id='test_zip', default_args=args, schedule_interval=None) as dag:
    (
        PythonOperator(
            task_id='print_the_context',
            provide_context=True,
            python_callable=print_context,
        )
        >> DummyOperator(
            task_id='do_nothing'
        )
    )
After checking the code, it appears that the logic that parses the ZIP file exits immediately if it finds a .py file that doesn't contain the words "DAG" and "airflow".
The problem is that the method I described above actually generates other .py files at the root of the archive.
$ ll
total 100
drwxr-xr-x 1 vagrant vagrant 442 Jun 1 14:48 ./
drwxr-xr-x 1 vagrant vagrant 306 Jun 1 15:30 ../
-rw-rw-r-- 1 vagrant vagrant 3904 Dec 30 2015 cronlog.py
-rw-rw-r-- 1 vagrant vagrant 44651 May 25 16:44 crontab.py
-rw-rw-r-- 1 vagrant vagrant 4438 Dec 28 2015 crontabs.py
drwxr-xr-x 1 vagrant vagrant 476 Jun 1 14:26 dateutil/
-rw-r--r-- 1 vagrant vagrant 6148 Jun 1 14:24 .DS_Store
drwxr-xr-x 1 vagrant vagrant 204 Jun 1 14:26 __pycache__/
drwxr-xr-x 1 vagrant vagrant 272 Jun 1 14:26 python_crontab-2.3.3-py3.5.egg-info/
drwxr-xr-x 1 vagrant vagrant 306 Jun 1 14:26 python_dateutil-2.7.3-py3.5.egg-info/
drwxr-xr-x 1 vagrant vagrant 238 Jun 1 14:26 six-1.11.0-py3.5.egg-info/
-rw-rw-r-- 1 vagrant vagrant 30888 Sep 17 2017 six.py
-rw-r--r-- 1 vagrant vagrant 832 Jun 1 14:48 my_dag.py
Many of the well-known packages I tested generate these top-level .py files, though. E.g. installing scrapy, numpy, pandas, etc. generated the same mess.
So, what could be my options (without forking airflow ^_^)?
Do I correctly understand this feature?
Thanks for your help!
Those who reach this question now should follow the updated instructions below, as of v1.10.3.
Note
When searching for DAGs, Airflow only considers python files that contain the strings "airflow" and "DAG" by default. To consider all python files instead, disable the DAG_DISCOVERY_SAFE_MODE configuration flag.
https://github.com/apache/airflow/blob/1.10.3/docs/concepts.rst#dags
https://github.com/apache/airflow/blob/1.10.3/UPDATING.md#new-dag_discovery_safe_mode-config-option-1
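As a sketch of what that looks like in practice (option and variable names as in the linked docs; verify against your Airflow version), safe mode can be disabled either in airflow.cfg or through Airflow's AIRFLOW__SECTION__KEY environment-variable convention:
# In airflow.cfg, under [core]:
#   dag_discovery_safe_mode = False
# or as an environment variable for the scheduler/webserver processes:
export AIRFLOW__CORE__DAG_DISCOVERY_SAFE_MODE=False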
EDIT: The fix has been merged into the 1.10-stable branch, so this should no longer happen.
Unfortunately, it looks like what you want is not possible as of the current state of the code.
I've made a pull request on this issue in Apache Airflow's GitHub, if you're interested in following along.
Someone messed up the symbolic links on a Red Hat server and I can't use Python now; I've tried a lot of things. When I try to run python I get python: command not found.
I have this information:
alternatives --display python
python - status is auto.
link currently points to /usr/bin/python2.4
/usr/bin/python2.4 - priority 1
Current `best' version is /usr/bin/python2.4.
When I try ln -sf /usr/bin/python /usr/bin/python2.4 I get the following:
ln: accessing `/usr/bin/python2.4': Too many levels of symbolic links
Removing /usr/bin/python doesn't help either.
I also checked it with these commands:
readlink /usr/bin/python
/etc/alternatives/python
readlink /usr/bin/python2.4
/usr/bin/python
readlink python
/usr/bin/python
It looks like everything should work fine. Any suggestions?
I think you have the ln command backwards. The reason you are getting this error is that you created a link python2.4 that points to python, while python (via /etc/alternatives/python) is a link that points back to python2.4, so the links form a loop. Reverse the source and destination in the ln command and it should work.
I've never used alternatives, but you probably shouldn't be manually editing these symlinks.
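For illustration of the argument order only (a sketch; as the edit below explains, this by itself won't bring back a binary that has been overwritten):
# ln -s TARGET LINK_NAME -- the existing file comes first, the name of the link second
ln -sf /usr/bin/python2.4 /usr/bin/python   # makes /usr/bin/python point at python2.4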
Edit:
As I mentioned in my comment, you may have overwritten the original python binary. On a RHEL 5 system I have access to, here's what the /usr/bin directory looks like:
$ ls -l /usr/bin/python*
-rwxr-xr-x 2 root root 8304 Oct 23 2012 /usr/bin/python
lrwxrwxrwx 1 root root 6 Jan 11 2013 /usr/bin/python2 -> python
-rwxr-xr-x 2 root root 8304 Oct 23 2012 /usr/bin/python2.4
And if you look at the inodes of the two non-symlink files you'll see that they are the same file:
$ stat -c %i /usr/bin/python
3290164
$ stat -c %i /usr/bin/python2.4
3290164
So you need to find the original python binary, and then we can figure out how to link them back the original way. And again, I've never used alternatives, so maybe it does some magic of moving the binaries around, but I doubt it.