I have a Python script that does AES decryption of an encrypted zip archive, 'myzip.enc'.
I'm trying to take the output of that decryption and use it as stdin for the unzip command.
Here is my code:
decrypt = subprocess.Popen(['openssl', 'enc', '-d', '-aes-256-cbc', '-md', 'sha256', '-in', '{}'.format(inputFile), '-pass', 'pass:{}'.format(passw_hash)], stdout=subprocess.PIPE)
decompress = subprocess.Popen(['unzip', '-j', '-d', path_dict], stdin=decrypt.stdout)
inputFile is my encrypted archive 'myzip.enc',
passw_hash is the AES password, and
path_dict is the folder path where the decrypted zip should be extracted.
I'm getting this in my terminal:
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
-p extract files to pipe, no messages -l list files (short format)
-f freshen existing files, create none -t test compressed archive data
-u update files, create if necessary -z display archive comment only
-v list verbosely/show version info -T timestamp archive to latest
-x exclude files that follow (in xlist) -d extract files into exdir
modifiers:
-n never overwrite existing files -q quiet mode (-qq => quieter)
-o overwrite files WITHOUT prompting -a auto-convert any text files
-j junk paths (do not make directories) -aa treat ALL files as text
-U use escapes for all non-ASCII Unicode -UU ignore any Unicode fields
-C match filenames case-insensitively -L make (some) names lowercase
-X restore UID/GID info -V retain VMS version numbers
-K keep setuid/setgid/tacky permissions -M pipe through "more" pager
See "unzip -hh" or unzip.txt for more help. Examples:
unzip data1 -x joe => extract all files except joe from zipfile data1.zip
unzip -p foo | more => send contents of foo.zip via pipe into program more
unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
Is there something wrong with my unzip command?
Thanks.
Edit: It seems from Here that it is impossible to pipe the output of a zip archive into the unzip command, because unzip needs to read some information (the central directory, which sits at the end of the archive) from a physical file.
My workaround ended up being this code, which works:
output = open('{}.zip'.format(inputFile), "wb")
decrypt = subprocess.Popen(['openssl', 'enc', '-d', '-aes-256-cbc', '-md', 'sha256', '-in', '{}'.format(inputFile), '-pass', 'pass:{}'.format(passw_hash)], stdout=output)
decrypt.wait()  # let openssl finish writing the zip before unzip reads it
output.close()
decompress = subprocess.Popen(['unzip', '{}.zip'.format(inputFile), '-d', path_dict[0]])
Is there a way to unzip and delete the zip archive at the same time, or to add an rm to the decompress line?
Thanks.
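A minimal sketch of one option, building on the workaround above: wait for unzip to finish, then remove the intermediate archive only if extraction succeeded.
import os
import subprocess

decompress = subprocess.Popen(['unzip', '{}.zip'.format(inputFile), '-d', path_dict[0]])
decompress.wait()  # block until unzip has finished
if decompress.returncode == 0:
    os.remove('{}.zip'.format(inputFile))  # delete the intermediate zip only on success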
I am trying to download files listed in an input file (a.txt) that contains URLs, using the following command:
wget -i a.txt
The URLs look like:
https://domian.com/abc?api=123&xyz=323&title=newFile12
https://domian.com/abc?api=1243&xyz=3223&title=newFile13
I want to name each downloaded file after the title parameter in its URL (for example, the first URL above should be saved as newFile12), but I can't find a way to do it.
To get this done I would have to write a Python script (similar to this answer: https://stackoverflow.com/a/28313383/10549469) and download the files one by one. Is there any other way around this?
You can create a script on the fly and pipe it to bash. It is a bit slower than wget -i, but it preserves the file names:
sed "s/\(.*title=\(.*\)\)/wget -O '\2' '\1'/" a.txt
When you are satisfied with the results, you can pipe it to bash:
sed "s/\(.*title=\(.*\)\)/wget -O '\2' '\1'/" a.txt | bash
Have a look at wget --content-disposition, or a for loop with wget -O <output filename> <url>.
The following command downloads the file with the filename provided by the server (vim-readonly-1.1.tar.gz) instead of download_script.php?src_id=27233.
wget --content-disposition https://www.vim.org/scripts/download_script.php?src_id=27233
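Alternatively, since a Python script was already on the table: a minimal sketch that pulls the title parameter out of each URL and downloads to that name, assuming every URL in a.txt carries a title= query parameter:
import urllib.parse
import urllib.request

with open('a.txt') as urls:
    for line in urls:
        url = line.strip()
        if not url:
            continue
        # use the value of the title= query parameter as the output filename
        query = urllib.parse.urlparse(url).query
        title = urllib.parse.parse_qs(query)['title'][0]
        urllib.request.urlretrieve(url, title)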
I'm writing a simple shell script that uses the dos2unix command to convert Windows-format files to Unix format as they arrive in my folder.
I used to use iconv in the script and automate it to convert one encoding to the other. But now I need to use dos2unix instead of iconv.
I don't want the original file to be overwritten (it must be archived in the archive folder). This was straightforward with iconv; how can I do the same with dos2unix?
This is my script:
cd /myfolder/storage
filearrival_dir=/myfolder/storage
filearchive_dir=/myfolder/storage/archive
cd $filearrival_dir
echo " $filearrival_dir"
for file in File_October*.txt
do
    iconv -f UTF16 -t UTF8 -o "$file.new" "$file" &&
    mv -f "$file.new" "$file".`date +"%C%y%m%d"`.txt_conv &&
    mv $file $filearchive_dir/$file
done
The above looks for files matching File_October*.txt, converts them to the desired encoding, and renames them with a timestamp and _conv at the end. The script also moves the original file to the archive.
How can I replace iconv with dos2unix in the above script, keeping the archiving and everything else the same?
You can "emulate" dos2unix using tr.
tr -d '\015' < infile > outfile
If this is just about stopping dos2unix from overwriting the original file, use its -n (new file) mode:
dos2unix -n infile outfile
My recollection is that dos2unix writes UTF-8 by default, so you probably don't have to take any special action so far as encoding is concerned.
Folks,
I have a requirement to unzip a file and copy the contents of its subdirectories to different locations.
For Example:
Filename: temp.zip
unzip temp.zip
We have a folder structure like this under temp:
temp/usr/data/keanu/*.pdf's
temp/usr/data/reaves/*.pdf's
My requirement is to go into the unzipped folders and copy
the *.pdf files under keanu/ to /desti1/
and
the *.pdf files under reaves/ to /dest2/.
I have tried the below:
unzip.sh <filename>
filename=$1
unzip $filename
# I am stuck here: I need to go into the unzipped folder, find the path, and copy those files to the different destinations
UPDATE: my script below unzips the file and recursively copies the wanted type of files to the destination folder while preserving the directory structure.
Filename: unzip.sh
#! /bin/bash
shopt -s nullglob globstar   # globstar enables the recursive ** globs used below
filename="$1"
var1=$(sed 's/.\{4\}$//' <<< "$filename")   # strip the .zip extension
echo "$var1"
unzip "$filename"
cd "$(dirname "$filename")"/"$var1"/**/includes
#pwd
# use -udm in cpio to overwrite
find . -name '*.pdf' | cpio -pdm /tmp/test/includes
cd -
cd "$(dirname "$filename")"/"$var1"/**/global
#pwd
find . -name '*.pdf' | cpio -pdm /tmp/test/global
In case the zip is always structured the same:
#! /bin/bash
shopt -s nullglob
filename="$1"
unzip "$filename"
cp "$(dirname "$filename")"/temp/usr/data/keanu/*.pdf /desti1/
cp "$(dirname "$filename")"/temp/usr/data/reaves/*.pdf /desti2/
In case the structure changes and you only know that there are directories keanu/ and reaves/ somewhere:
#! /bin/bash
shopt -s nullglob globstar
filename="$1"
unzip "$filename"
cp "$(dirname "$filename")"/**/keanu/*.pdf /desti1/
cp "$(dirname "$filename")"/**/reaves/*.pdf /desti2/
Both scripts do what you specified, but not more than that. The unzipped files are copied over, that is, the original unzipped files will still lie around after the script terminates.
Python solution:
import zipfile, os, shutil

zf = zipfile.ZipFile("temp.zip")
for f in zf.namelist():
    if not f.endswith("/"):
        # files whose parent directory is "keanu" go to dest1, everything else to dest2
        dest = "dest1" if os.path.basename(os.path.dirname(f)) == "keanu" else "dest2"
        # copy the member straight to <dest>/<basename>, dropping its internal path
        with zf.open(f) as src, open(os.path.join(dest, os.path.basename(f)), "wb") as out:
            shutil.copyfileobj(src, out)
iterate on the contents of the archive
filter out directories (end with "/")
if last dirname is "keanu", select destination 1 else the other
copy the file's contents directly into the selected destination (dropping its internal path)
I'm looking for a way to compute a SHA-256 value for every file contained in a tar file. The problem is that the tar file is 300GB and holds over 200,000 files.
It would be possible to do this in bash a couple of different ways.
Extract and then use find
tmp=`mktemp --directory extract_XXX`
cd "$tmp"
tar -xf "$tarfile"
find "$tmp" -type f -exec shasum -ba 256 {} +
cd ..
rm -rf "$tmp"
This method is bad because it requires 300GB of free space to work, and it is slow because it has to copy all the data before computing the sums.
List the tar file and compute the individual sums
tar -tf "$tarfile" | awk '/\/$/ {next} {print $0}' | while read file ; do
    sum=`tar -xOf "$tarfile" "$file" | shasum -ba 256`
    echo "${sum%-}${file}"
done
This requires less disk space but is much slower, since the archive has to be re-read from the start for every file.
How can I do this in a single pass of the tar file without extracting it to a temp directory?
I've tagged this as bash and python... The current code is bash, but I'm flexible about the language.
The tar utility knows its way:
tar xvf "$tarfile" --to-command 'shasum -ba 256'
The -v flag is important because tar sends each file's contents to the standard input of the command; with -v, the file name is printed on one line and the SHA sum on the next, which you can process further very easily.
EDIT: here is the complete shell-only code to output the SHA-256s in one single pass over the tar file:
shopt -s extglob
tar xvf "$tarfile" --to-command 'shasum -ba 256' | \
while read L; do
    [[ $L == *" *-" ]] && echo $SHAFILE ${L:0:64} || SHAFILE=$L
done
For the glibc source archive, the output would look like:
glibc-2.24/.gitattributes c3f8f279e7e7b0020028d06de61274b00b6cb84cfd005a8f380c014ef89ddf48
glibc-2.24/.gitignore 35bcd2a1d99fbb76087dc077b3e754d657118f353c3d76058f6c35c8c7f7abae
glibc-2.24/BUGS 9b2d4b25c8600508e1d148feeaed5da04a13daf988d5854012aebcc37fd84ef6
glibc-2.24/CONFORMANCE 66b6e97c93a2381711f84f34134e8910ef4ee4a8dc55a049a355f3a7582807ec
Edit by OP:
As a one-liner this can be done as:
tar xf "$tarfile" --to-command 'bash -c "sum=`shasum -ba 256`; echo \"\${sum%-}$TAR_FILENAME\""'
or (on Ubuntu 20.04 and higher):
tar xf "$tarfile" --to-command 'bash -c "sum=`shasum -ba 256 | cut -d \" \" -f 1`; echo \"\${sum%-}$TAR_FILENAME\""'
Manual Page here: https://www.gnu.org/software/tar/manual/tar.html#SEC87
I don't know how fast it will be, but in Python it can be done the following way:
import tarfile
import hashlib

def sha256(flo):
    # hash a file-like object in 4 KB chunks so large files never sit in memory
    hash_sha256 = hashlib.sha256()
    for chunk in iter(lambda: flo.read(4096), b''):
        hash_sha256.update(chunk)
    return hash_sha256.hexdigest()

with tarfile.open('/path/to/tar/file') as mytar:
    for member in mytar.getmembers():
        if not member.isfile():
            continue  # extractfile() returns None for directories and special entries
        with mytar.extractfile(member) as _file:
            print('{} {}'.format(sha256(_file), member.name))
I have to use cURL on Windows from a Python script. My goal: using the Python script, get all files from a remote directory ... preferably into a local directory. After that I will compare each file with files stored locally. I am able to get one file at a time, but I need to get all of the files in the remote directory.
Could someone please advise how to get multiple files?
I use this command:
curl.exe -o file1.txt sftp:///dir1/file1.txt -k -u user:password
Thanks.
I haven't tested this, but I think you could just try launching each shell command as a separate process to run them simultaneously. Obviously, this might be a bad idea if you have a large set of files, so you might need to manage that more carefully. Here's some untested code, and you'd need to edit the 'cmd' variable in the get_file function, of course.
from multiprocessing import Process
import subprocess

def get_file(filename):
    cmd = '''curl.exe -o {} sftp:///dir1/{} -k -u user:password'''.format(filename, filename)
    subprocess.check_output(cmd, shell=True, stderr=subprocess.STDOUT)  # run the shell command

if __name__ == '__main__':  # required on Windows so spawned children don't re-run this block
    files = ['file1.txt', 'file2.txt', 'file3.txt']
    for filename in files:
        p = Process(target=get_file, args=(filename,))  # create a process which passes filename to get_file()
        p.start()
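If the set of files is large, one way to manage it more carefully is a bounded worker pool, so only a few curl processes run at once. A minimal sketch reusing the get_file function above (the pool size of 4 is an arbitrary choice):
from multiprocessing import Pool

if __name__ == '__main__':
    files = ['file1.txt', 'file2.txt', 'file3.txt']
    with Pool(processes=4) as pool:  # at most 4 downloads run concurrently
        pool.map(get_file, files)   # blocks until every download has finished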