I have a bash script that extracts a tar file:
tar --no-same-owner -xzf "$FILE" -C "$FOLDER"
--no-same-owner is needed because this script runs as root in Docker, and I want the files to be owned by root rather than by the original uid/gid that created the tar.
I have changed the script to a Python script and need to add the --no-same-owner behaviour, but I can't see an option in the docs to do so:
with tarfile.open(file_path, "r:gz") as tar:
    tar.extractall(extraction_folder)
Is this possible? Or do I need to run the bash command as a subprocess?
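One workaround (a sketch based on your snippet, not tested in your exact image) is to rewrite each member's owner fields before extracting, so that extraction as root leaves everything owned by root:
import os
import tarfile

with tarfile.open(file_path, "r:gz") as tar:
    for member in tar.getmembers():
        # Overwrite the archived owner with the current user (root in
        # your container), mimicking tar's --no-same-owner behaviour
        member.uid = os.getuid()
        member.gid = os.getgid()
        member.uname = ""  # empty names make tarfile fall back to the numeric ids
        member.gname = ""
        tar.extract(member, extraction_folder)
On Python 3.12+, tar.extractall(extraction_folder, filter="data") should also work, since the "data" filter deliberately avoids restoring ownership from the archive.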
I'm trying to solve the exact same problem illustrated here:
How to commit executable shell scripts with Git on Windows
"If you develop software involving shell scripts on Windows, which should also run on UNIX, you have a problem.
Windows filesystems like NTFS do not support UNIX permission bits.
Whenever you create new shell scripts on Windows, or rename existing ones (which may have been executable at the time of check-out), these won’t be executable. When you push the code, these scripts won’t run on a UNIX-based machine."
The pre-commit hook script proposed as a solution to this problem is written in Python:
#!/usr/bin/env python
import subprocess

if __name__ == '__main__':
    # Note: no shell=True here; with a list argument, shell=True would
    # pass only the first element of the list to the shell
    output = subprocess.check_output(["git", "ls-files", "-s", "--", "*.sh"]).decode("utf-8")  # type: str
    files_to_fix = []
    for line in output.splitlines():
        # Example "line": '100644 82f6a7d558e1b38c8b47ec5084fe20f970f09981 0 test-update.sh'
        entry = line.replace('\t', ' ').split(" ", maxsplit=3)
        mode = entry[0][3:]  # strip the first 3 chars ("100"), which we don't care about
        filename = entry[3]
        if mode == "644":
            files_to_fix.append(filename)
    for file_path in files_to_fix:
        # git update-index --chmod=+x script.sh
        subprocess.check_call(["git", "update-index", "--chmod=+x", file_path])
I'm not proficient enough in bash to rewrite it. Is it possible to achieve this in bash at all?
With a bash script hook:
#!/usr/bin/env bash
files_to_fix=()
while read -r -d '' mode _ _ file_path; do
[[ $mode == *644 ]] && files_to_fix+=("$file_path")
done < <(git ls-files --stage -z '*.sh')
git update-index --chmod=+x -- "${files_to_fix[@]}"
Or with a POSIX shell:
#!/usr/bin/env sh
git ls-files --stage '*.sh' | while read -r mode _ _ file_path; do
case $mode in
*644) git update-index --chmod=+x -- "$file_path" ;;
*) ;;
esac
done
This one-liner uses find to identify files that don't have the execute bit set, instead of looking for permission 644, so it also works with unusual modes like 640 or even 200 (write-only!):
find . ! -perm -u+x -type f -name "*.sh" -exec git update-index --chmod=+x -- {} \;
Save it in .git/hooks/pre-commit (and make your hook executable!)
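For example, from the repository root:
chmod +x .git/hooks/pre-commit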
This is my code to download and unzip files from Google Drive:
fileId = drive.CreateFile({'id': '1tQq-ihnTbRj6GlObBrm17Ob6j1XHHJL2'})
print(fileId['title'])
fileId.GetContentFile('tweets_research.zip')
!unzip tweets_research.zip -d ./
But some of the files already exist and I want to replace them. unzip shows me its replace/rename prompt, but no matter what I press on my keyboard, nothing happens.
Use the -o option to overwrite files, e.g.,
!unzip -o tweets_research.zip -d ./
You can also echo your choice as input to the command by using a pipe. In the case of rename:
!echo "r"| unzip tweets_research.zip -d ./
Could you please show me how to implement a git hook?
Before committing, the hook should run a python script. Something like this:
cd c:\my_framework & run_tests.py --project Proxy-Tests\Aeries \
--client Aeries --suite <Commit_file_Name> --dryrun
If the dry run fails then commit should be stopped.
You need to tell us in what way the dry run will fail. Will there be an output .txt file with errors? Will an error be displayed on the terminal?
In any case, you must name the script pre-commit and save it in the .git/hooks/ directory.
Since your dry run script seems to live in a different path than the pre-commit script, here's an example that finds and runs it.
I assume from the backslashes in your paths that you are on a Windows machine, and I also assume that your dry-run script is contained in the same project where you have git installed, in a folder called tools (of course you can change this to your actual folder):
#!/bin/sh
# Path of your python script, relative to the project root
FILE_PATH=tools
# Get the path of the root directory of the project
rdir=$(git rev-parse --git-dir)
rel_path=$(dirname "$rdir")
# cd to that path and run the file
cd "$rel_path/$FILE_PATH" || exit 1
echo "Running dryrun script..."
python run_tests.py
# From this point on you need to handle the dry run error(s).
# For demonstration purposes I'll assume that an output.txt file
# holding the result is produced.
# Extract the result from the output file: take the last non-empty
# line and keep it only if it mentions 'error'
final_res=$(tac output.txt | grep -m 1 . | grep 'error')
printf -- "--------Dry run result---------\n%s\n" "$final_res"
# If a warning and/or error exists, abort the commit
if [ -n "$final_res" ]; then
    printf "Dry run failed.\nAborting commit...\n"
    exit 1
fi
Now every time you run git commit, the pre-commit script will run the dry run file and abort the commit if any errors have occurred, keeping your files in the staging area.
I have implemented this in my hook. Here is the code snippet.
#!/bin/sh
# Name of the python script and path of the framework directory
RUN_TESTS="run_tests.py"
FRAMEWORK_DIR="/my-framework/"
CUR_DIR=${PWD##*/}
# Get the full path of the root directory of the project
rDIR=$(git rev-parse --show-toplevel)
OneStepBack=/../
CD_FRAMEWORK_DIR="$rDIR$OneStepBack$FRAMEWORK_DIR"
# Find the list of modified files to be committed
LIST_OF_FILES=$(git status --porcelain | awk '{print $2}' | grep ".txt")
for FILE in $LIST_OF_FILES; do
    cd "$CD_FRAMEWORK_DIR" || exit 1
    python "$RUN_TESTS" --dryrun --project "$CUR_DIR/$FILE"
    OUT=$?
    if [ $OUT -eq 0 ]; then
        continue
    else
        exit 1
    fi
done
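As a side note, git status --porcelain also picks up unstaged and untracked entries; if you only want the files actually staged for this commit, a variant along these lines (a suggestion, adjust the filter to your needs) may be more accurate:
LIST_OF_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep ".txt")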
I have a directory in HDFS that contains roughly 10,000 .xml files. I have a python script "processxml.py" that takes a file and does some processing on it. Is it possible to run the script on all of the files in the hdfs directory, or do I need to copy them to local first in order to do so?
For example, when I run the script on files in a local directory I have:
cd /path/to/files
for file in *.xml
do
    python /path/processxml.py $file > /path2/$file
done
So basically, how would I go about doing the same, but this time the files are in hdfs?
You basically have two options:
1) Use Hadoop Streaming to create a MapReduce job (here you will only need the map part). Use this command from the shell or inside a shell script:
hadoop jar <the location of the streamlib> \
-D mapred.job.name=<name for the job> \
-input /hdfs/input/dir \
-output /hdfs/output/dir \
-file your_script.py \
-mapper python your_script.py \
-numReduceTasks 0
2) Create a Pig script and ship your Python code. Here is a basic example:
input_data = LOAD '/hdfs/input/dir';
DEFINE mycommand `python your_script.py` ship('/path/to/your/script.py');
updated_data = STREAM input_data THROUGH mycommand PARALLEL 20;
STORE updated_data INTO '/hdfs/output/dir';
If you also need to process the data in your files, or move/copy/delete them around the filesystem, then PySpark (Spark with a Python interface) would be one of the best options (for speed and memory).
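For instance, here is a minimal PySpark sketch, assuming Spark is configured to talk to your HDFS cluster; process_xml is a hypothetical stand-in for the logic inside processxml.py:
from pyspark.sql import SparkSession

def process_xml(content):
    # placeholder: whatever processxml.py does with one file's contents
    return content.upper()

spark = SparkSession.builder.appName("processxml").getOrCreate()

# wholeTextFiles yields one (path, contents) pair per file
files = spark.sparkContext.wholeTextFiles("hdfs:///hdfs/input/dir/*.xml")
results = files.map(lambda pair: process_xml(pair[1]))
results.saveAsTextFile("hdfs:///hdfs/output/dir")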
I want to use Fabric to chown all the files in a directory, including hidden files. Since Fabric uses the sh shell and not bash, and sh doesn't know shopt, I can't do:
local('shopt -s dotglob')
local('sudo chown -R name dir')
I don't think there is a way to use the bash shell in Fabric. Is there another way to do this?
How about using another strategy to recursively chown everything in the directory, including hidden files and directories:
local(r'sudo find dir -exec chown name {} \;')
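If you do want the original shopt approach, another possibility (untested, but invoking bash -c itself is standard) is to call bash explicitly for that one command:
local("sudo bash -c 'shopt -s dotglob && chown -R name dir/*'")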
Hope that helps.