unzip operation taking several hours - python

I am using the following shell script to loop over 90 zip files and unarchive them on a Linux box hosted with Hostinger (shared web hosting):
#!/bin/bash
SOURCE_DIR="<path_to_archives>"
cd "${SOURCE_DIR}"
for f in *.zip
do
    # unzip -oqq "$f" -d "${f%.zip}" &
    python3 scripts/extract_archives.py "${f}" &
done
wait
The Python script called by the shell script above is below:
import shutil
import sys

source_path = "<path to source dir>"

def extract_files(in_file):
    shutil.unpack_archive(source_path + in_file, source_path + in_file.split('.')[0])
    print('Extracted : ', in_file)

extract_files(sys.argv[1].strip())
Irrespective of whether I use the built-in unzip command or Python, it takes about 2.5 hours to unzip all the files. Unarchiving all the zip files results in 90 folders with about 170,000 files overall. I would have thought anywhere between 15 and 20 minutes was a reasonably acceptable timeframe.
I've tried a few different variations. I tried tarring the folders instead of zipping them, thinking un-tarring might be faster than unzipping. I've also used the tar command on the source server to transfer the files over ssh and untar them on the fly, something like this:
time tar zcf - . | ssh -p <port> user@host "tar xzf - -C <dest dir>"
Nothing is helping. I am open to using any other programming language, such as Perl or Go, if necessary to speed things up.
Can someone please help me solve this performance problem?
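In case it's relevant: one variation I haven't shown above is bounding the parallelism instead of backgrounding all 90 unzips at once. A minimal sketch of that idea, assuming the same placeholder path and an arbitrary worker count of 4:
import shutil
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

SOURCE_DIR = Path("<path_to_archives>")  # placeholder, same as in the shell script

def extract(archive):
    # unpack <name>.zip into a sibling folder <name>
    shutil.unpack_archive(archive, archive.with_suffix(""))
    return archive.name

if __name__ == "__main__":
    archives = sorted(SOURCE_DIR.glob("*.zip"))
    # cap concurrency instead of launching all 90 unzips at once
    with ProcessPoolExecutor(max_workers=4) as pool:
        for name in pool.map(extract, archives):
            print("Extracted:", name)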

Thank you everyone for your answers. As you indicated, this was down to throttling on the servers in the hosted environment.

Related

Run bash script that makes a folder based on python scriptname

I have a question about running a bash script from a Python file. First, let me explain the situation. I have multiple Python files: for each city in Mali, I am able to create a weather forecast in Python (therefore I have created multiple Python files, e.g. gao.py, bamba.py, etc.). One of the steps in each Python file is to run a bash script that creates audio files in .wav format and places them in a folder converted/. Now, my question is the following:
How do I change the bash script in such a way that when gao.py is running, a folder converted/gao will be created, and when, for example, bamba.py is running, a folder converted/bamba is created?
This is the current bash script:
#!/bin/bash
if [ ! -d converted/gao ]
then
    mkdir converted/gao
fi
However, when I run the script above, each city places its files in converted/gao, which is not what I want. I hope somebody knows how to fix this issue.
Use __file__ to access the filename of the script in Python, and then pass it as an argument to the bash script. After that, access the argument in the bash script using $1.
For example:
Python Script hello.py
import os

script_filename = __file__
# execute the shell script, passing this script's filename as an argument
os.system("./bashfile.sh {}".format(script_filename))
Sample Bash Script bashfile.sh
#!/bin/bash
script_name=$1
# now use the script_name as you like in your script.
echo $script_name
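To connect this back to the question: rather than hard-coding gao, the city name can be derived from __file__ and the folder created directly, or passed to the existing bashfile.sh as $1. A minimal sketch of both options (subprocess.run is a swap-in for os.system here):
import os
import subprocess

# derive the city name from this script's own filename, e.g. gao.py -> gao
city = os.path.splitext(os.path.basename(__file__))[0]

# option 1: create converted/<city> directly in Python
os.makedirs(os.path.join("converted", city), exist_ok=True)

# option 2: pass the city name to the existing bash script as $1
subprocess.run(["./bashfile.sh", city], check=True)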

Execute bash commands that are within a list from python

I've got this list:
commands = ['cd var','cd www','cd html','sudo rm -r folder']
I'm trying to execute all the elements inside it, one by one, as a bash script, with no success. Do I need a for loop here? How do I achieve that? Thanks, all!
import os

for command in commands:
    os.system(command)
is one way you could do it, although just cd'ing into a bunch of directories isn't going to have much impact.
NOTE: this will run each command in its own subshell, so they would not remember their state (i.e. any directory changes or environment variables).
If you need to run them all in one subshell, then you need to chain them together with "&&":
os.system(" && ".join(commands)) # would run all of the commands in a single subshell
As noted in the comments, it is generally preferred to use the subprocess module with check_call or one of the other variants. In this specific instance, though, I personally think it's six of one, half a dozen of the other, and os.system was less typing (and it exists whether you are using Python 3.7 or Python 2.5). In general, use subprocess; exactly which call depends on the version of Python you are using. There is a great description of why you should use subprocess instead in the post linked in the comments by @triplee.
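For completeness, a minimal sketch of the subprocess variant mentioned above (shell=True is needed here because the "&&" chaining is done by the shell):
import subprocess

commands = ['cd var', 'cd www', 'cd html', 'sudo rm -r folder']

# one shell for the whole chain, so the cd's carry over;
# raises CalledProcessError if any command in the chain fails
subprocess.check_call(" && ".join(commands), shell=True)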
Really, you should reformat your commands to simply:
commands = ["sudo rm -rf var/www/html/folder"]
Note that you will probably need to add your Python file to your sudoers file.
Also, I'm not sure exactly what you are trying to accomplish here, but I suspect this might not be the ideal way to go about it (although it should work).
This is just a suggestion, but if you're just wanting to change directories and delete folders, you could use os.chdir() and shutil.rmtree():
from os import chdir
from os import getcwd
from shutil import rmtree

directories = ['var', 'www', 'html', 'folder']

print(getcwd())
# current working directory: $PWD

for directory in directories[:-1]:
    chdir(directory)

print(getcwd())
# current working directory: $PWD/var/www/html

rmtree(directories[-1])
Which will cd three directories deep into html, and delete folder. The current working directory changes when you call chdir(), as seen when you call os.getcwd().
declare -a command=("cd var" "cd www" "cd html" "sudo rm -r folder")

## now loop through the above array
for i in "${command[@]}"
do
    echo "$i"
    # or do whatever with individual element of the array
done

# You can access them using echo "${command[0]}", "${command[1]}" also

Python package, "Updating the INI File"

I am working with a Python package called bacpypes for communicating with building automation equipment. Right at the beginning, going through the pip install and git clone of the repository, the readthedocs instructions say:
Updating the INI File
Now that you know what these values are going to be, you can configure the BACnet portion of your workstation. Change into the samples directory that you checked out earlier, make a copy of the sample configuration file, and edit it for your site:
$ cd bacpypes/samples
$ cp BACpypes~.ini BACpypes.ini
The problem that I have (really a lack of knowledge) is that there isn't a sample configuration file that I can see in the bacpypes/samples directory. There are only .py files, nothing with an .ini extension or named BACpypes.ini.
If I open up the samples directory in a terminal and run cp BACpypes~.ini BACpypes.ini, I get an error: cp: cannot stat 'BACpypes~.ini': No such file or directory
Any tips would help, thank you.
There's a sample .ini in the documentation, a couple of paragraphs after the commands you copied. It looks like this:
[BACpypes]
objectName: Betelgeuse
address: 192.168.1.2/24
objectIdentifier: 599
maxApduLengthAccepted: 1024
segmentationSupported: segmentedBoth
maxSegmentsAccepted: 1024
vendorIdentifier: 15
foreignPort: 0
foreignBBMD: 128.253.109.254
foreignTTL: 30
I'm not sure why you couldn't copy BACpypes~.ini. The tilde could be expanded by your shell, so you could try escaping it with:
cp BACpypes\~.ini BACpypes.ini
Though I assume it isn't needed now that you have a default configuration file.
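If you want to double-check that the file you created actually parses before running the samples, here is a quick standard-library sanity check (this is plain configparser, not part of bacpypes itself):
from configparser import ConfigParser

parser = ConfigParser()
# read() returns the list of files it managed to parse
if not parser.read("BACpypes.ini"):
    raise SystemExit("BACpypes.ini not found or unreadable")

# print the values the samples will pick up
for key, value in parser.items("BACpypes"):
    print(key, "=", value)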

How to search for a list of file names on the server using the Linux bash shell

I am looking to see if the server has the file names I'm looking for. I have a list of file names in a text file and want to see whether these files exist on the server. Is there a way I can do that, either in Python or with a Linux bash shell tool?
Assuming the file holds the names one per line, the following should work:
find / -type f | grep -f filenames
Please allow for a lengthy period of time to complete, if your machine is slow or has lots of files.
If locate (with updatedb up and running) is available, the following may be quicker:
locate -r . | grep -f filenames
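A Python equivalent of the find approach, in case you want it in a script; it walks the tree once and matches basenames against the list (change the root "/" to narrow the search):
import os

# one filename per line, as assumed by the grep -f answers above
with open("filenames") as f:
    wanted = {line.strip() for line in f if line.strip()}

found = set()
# walk the whole tree once, matching basenames against the list
for root, dirs, files in os.walk("/"):
    for name in files:
        if name in wanted:
            print(os.path.join(root, name))
            found.add(name)

for name in sorted(wanted - found):
    print("not found:", name)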

Batch file downloading using Perl or any other language

I have pretty good knowledge of JS, HTML, CSS, C, C++ and C#. There is this website which offers question papers for us school students, but to download them we have to visit every page, and that's too hard for us. There are about 150 files. So... ;)
The download links always look like this:
http://www.example.com/content/download_content.php?content_id=#
where # is a number.
So I thought JavaScript, Perl, Python or any other language could download the files and save them locally automatically. Currently I don't need much, just the basic code. I'll learn the language and then develop it myself. So please help me out, pals.
That's how I usually do such things in bash:
for i in `seq 1 1000` ; do wget "http://www.example.com/content/download_content.php?content_id=$i" -O $i.html ; done
UPDATE Since the URLs point to more than one file type, you could use the file command to identify the type of a downloaded file, and adjust the extension accordingly:
for i in `seq 1 1000`
do
    wget "http://www.example.com/content/download_content.php?content_id=$i" -O $i.out
    mime=`file --brief --mime-type $i.out`
    if [ "$mime" == "application/pdf" ]
    then
        mv $i.out $i.pdf
    elif [ "$mime" == "application/vnd.ms-office" ]
    then
        mv $i.out $i.doc
    fi
done
This will do it in shell script using the wget program, dumping them all into the current directory:
#!/bin/sh
i=1
while [ $i -le 150 ]; do
    wget -O $i.out "http://www.example.com/content/download_content.php?content_id=$i"
    i=$((i + 1))
done
How about using curl instead:
curl "http://www.example.com/content/download_content.php?content_id=[1-150]" -o "#1.out"
It should work on most Linux distros, and if it's not there you can download curl from http://curl.haxx.se/ or install it with 'apt-get install curl'.
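And since the question says any language is fine, here is a minimal Python sketch of the same loop using only the standard library (no MIME sniffing; adjust the range and URL as needed):
import urllib.request

BASE = "http://www.example.com/content/download_content.php?content_id="

for i in range(1, 151):
    # fetch each id and save it under a numeric name, like the wget loops above
    with urllib.request.urlopen(BASE + str(i)) as resp:
        with open("{}.out".format(i), "wb") as out:
            out.write(resp.read())
    print("downloaded", i)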
