wget set file name in batch download - python

I am trying to download files listed in an input file (a.txt) of URLs, using the following command:
wget -i a.txt
The URLs look like this:
https://domian.com/abc?api=123&xyz=323&title=newFile12
https://domian.com/abc?api=1243&xyz=3223&title=newFile13
I want to name each downloaded file after the title parameter in its URL (for the first URL above, the file should be saved as newFile12), but I can't find a way to do this with wget.
It seems I would have to write a Python script (similar to this answer: https://stackoverflow.com/a/28313383/10549469) and download the files one by one. Is there any other way around this?

You can create a script on the fly and pipe it to bash. It is a bit slower than wget -i, but it preserves the file names:
sed "s/\(.*title=\(.*\)\)/wget -O '\2' '\1'/" a.txt
When you are satisfied with the results, you can pipe it to bash:
sed "s/\(.*title=\(.*\)\)/wget -O '\2' '\1'/" a.txt | bash

Have a look at wget --content-disposition, or use a for loop with wget -O <output filename> <url>.
The following command downloads the file under the filename provided by the server (vim-readonly-1.1.tar.gz) instead of download_script.php?src_id=27233:
wget --content-disposition https://www.vim.org/scripts/download_script.php?src_id=27233
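If you do end up scripting it in Python, here is a minimal sketch of the one-by-one approach the question mentions, assuming every URL in a.txt carries a title=... query parameter:

from urllib.parse import urlparse, parse_qs
from urllib.request import urlretrieve

with open("a.txt") as urls:
    for url in urls:
        url = url.strip()
        if not url:
            continue
        # Assumes a title=... parameter is always present in the URL.
        title = parse_qs(urlparse(url).query)["title"][0]
        # Download the URL and save it under the title value.
        urlretrieve(url, title)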

Related

wget -O file/directory download/file Error: is no directory

Every solution I found didn't help me. I am using a Python script; the line where wget is called is:
check_call(['wget', '-O', '/home', 'download/file'], stdout=open(os.devnull,'wb'))
What I want to do is download a file and put it in a directory.
You didn't specify a URL. On Linux, wget works like this:
wget [OPTION]... [URL]...
It should work if you specify args as:
['wget', '-O', '/home/download/file', 'https://example.com']
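For reference, a minimal self-contained version of that call (the URL and output path are placeholders) would be:

import os
from subprocess import check_call

# -O expects the full path of the output file, not just a directory.
check_call(
    ['wget', '-O', '/home/download/file', 'https://example.com'],
    stdout=open(os.devnull, 'wb'),
)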

Have Python script read a text file instead of single input

I am working with this python script: https://github.com/sarchar/addressgen/blob/master/genaddress.py
Currently it works via command line like so:
python3 genaddress.py -p passphrase
How would I alter the script to accept a text file of passphrases?
I know this might not directly answer the question (how to alter the script itself), but you can achieve a similar result in bash with the following command, assuming each passphrase has its own unique output:
cat passphrases.txt | xargs -I phrase python3 genaddress.py -p phrase
This iterates through each line in passphrases.txt and passes it to your script, one line at a time.
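If you would rather keep the looping in Python, a rough sketch (assuming one passphrase per line and that genaddress.py sits in the current directory) is:

import subprocess

with open("passphrases.txt") as f:
    for line in f:
        passphrase = line.strip()
        if not passphrase:
            continue
        # Call the existing, unmodified script once per passphrase.
        subprocess.run(["python3", "genaddress.py", "-p", passphrase], check=True)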

shell script to convert windows file to unix using dos2unix

I'm writing a simple shell script that uses the dos2unix command to convert Windows-format files to Unix format as and when they arrive in my folder.
I used to use iconv in the script and automate it to get one encoding converted to the other. But now I need to use dos2unix instead of iconv.
I don't want the original file to be overwritten (it must be archived in the archive folder). This was straightforward with iconv; how can I do the same with dos2unix?
This is my script:
cd /myfolder/storage
filearrival_dir=/myfolder/storage
filearchive_dir=/myfolder/storage/archive
cd "$filearrival_dir"
echo "$filearrival_dir"
for file in File_October*.txt
do
iconv -f UTF16 -t UTF8 -o "$file.new" "$file" &&
mv -f "$file.new" "$file".`date +"%C%y%m%d"`.txt_conv &&
mv "$file" "$filearchive_dir/$file"
done
The above looks for files matching File_October*.txt, converts them to the desired encoding, and renames each with a timestamp and _conv at the end. It also moves the original file to the archive.
How can I replace iconv in the above script with dos2unix and have the files archived and do the rest just like I did here?
You can "emulate" dos2unix using tr.
tr -d '\015' infile > outfile
If this is just about using dos2unix so it doesn't over-write the original file, just use
-n infile outfile
My recollection is that dos2unix writes UTF-8 by default, so you probably don't have to take any special action so far as encoding is concerned.

Simplest way to run Sphinx on one python file

We have a Sphinx configuration that'll generate a slew of HTML documents for our whole codebase. Sometimes I'm working on one file and would just like to see the HTML output from that file, to make sure I got the syntax right without running the whole suite.
I looked for the simplest command I could run in a terminal to run Sphinx on this one file. I'm sure the info is out there, but I didn't find it.
Sphinx processes reST files (not Python files directly). Those files may contain references to Python modules (when you use autodoc). My experience is that if only a single Python module has been modified since the last complete output build, Sphinx does not regenerate everything; only the reST file that "pulls in" that particular Python module is processed. There is a message saying updating environment: 0 added, 1 changed, 0 removed.
To explicitly process a single reST file, specify it as an argument to sphinx-build:
sphinx-build -b html -d _build/doctrees . _build/html your_filename.rst
This is done in two steps:
Generate rst file from the python module with sphinx-apidoc.
Generate html from rst file with sphinx-build.
This script does the work. Call it while standing in the same directory as the module and provide it with the file name of the module:
#!/bin/bash
# Generate HTML documentation for a single Python module.
PACKAGE=${PWD##*/}
MODULE="$1"
MODULE_NAME=${MODULE%.py}
mkdir -p .tmpdocs
rm -rf .tmpdocs/*
# Exclude all subdirectories, and all modules other than the target one
# (apidoc crashes if __init__.py is excluded).
sphinx-apidoc \
    -f -e --module-first --no-toc -o .tmpdocs "$PWD" \
    $(find "$PWD" -maxdepth 1 -mindepth 1 -type d) \
    $(find "$PWD" -maxdepth 1 -regextype posix-egrep \
        ! -regex ".*/$MODULE|.*/__init__.py" -type f)
rm .tmpdocs/"$PACKAGE".rst
# sphinx-build crashes if index.rst does not exist.
touch .tmpdocs/index.rst
sphinx-build -b html -c /path/to/your/conf.py/ \
    -d .tmpdocs .tmpdocs .tmpdocs .tmpdocs/*.rst
echo "**** HTML documentation for $MODULE is available in .tmpdocs/$PACKAGE.$MODULE_NAME.html"

Batch file downloading using Perl or any other language

I have pretty good knowledge of JS, HTML, CSS, C, C++ and C#. There is a website that offers question papers for us school students, but to download them we have to visit every page, which is too tedious. There are about 150 files. So... ;)
The download links always look like this:
http://www.example.com/content/download_content.php?content_id=#
where # is a number.
So I was wondering whether JavaScript, Perl, Python or any other language could download the files and save them locally automatically. I don't need much right now, just basic code; I'll learn the language and build on it myself. So please help me out.
That's how I usually do such things in bash:
for i in `seq 1 1000` ; do wget "http://www.example.com/content/download_content.php?content_id=$i" -O $i.html ; done
UPDATE: Since the URLs point to more than one file type, you can use the file command to identify the type of each downloaded file and adjust the extension accordingly:
for i in `seq 1 1000`
do
wget "http://www.example.com/content/download_content.php?content_id=$i" -O $i.out
mime=`file --brief --mime-type $i.out`
if [ "$mime" == "application/pdf" ]
then
mv $i.out $i.pdf
elif [ "$mime" == "application/vnd.ms-office" ]
then
mv $i.out $i.doc
fi
done
This will do it in a shell script using the wget program, dumping them all into the current directory:
#!/bin/sh
i=1
while [ $i -le 150 ]; do
wget -O $i.out "http://www.example.com/content/download_content.php?content_id=$i"
i=$((i + 1))
done
How about using curl instead:
curl -O "http://www.example.com/content/download_content.php?content_id=[1-150]"
This should work on most Linux distros, and if curl is not installed you can get it from http://curl.haxx.se/ or install it with apt-get install curl.
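Since the question also mentions Python, here is a rough equivalent sketch (the URL pattern and the 1-150 range come from the question; the .out extension mirrors the shell answers above):

from urllib.request import urlretrieve

for i in range(1, 151):
    url = f"http://www.example.com/content/download_content.php?content_id={i}"
    # Save each file under its id; rename by type afterwards if needed.
    urlretrieve(url, f"{i}.out")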
