Unable to download file using os.system - python

I am trying to download a file using os.system in python and it never completely downloads the file
Here is the code
import os
url = 'wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=0BzQ6rtO2VN95cmNuc2xwUS1wdEE" -O- | sed -rn "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p")&id=0BzQ6rtO2VN95cmNuc2xwUS1wdEE" -O cnn_stories_tokenized.zip && rm -rf /tmp/cookies.txt'
os.system(url)
Running the same command directly in the terminal works just fine. Are there any escape characters that I should be handling?

are there any escape characters that I should be handling?
Short answer: Yes.
The string contains \1 and \n, and Python interprets them as escape sequences before the shell ever sees them: \1 becomes the character with code 1 and \n becomes a newline.
You can either escape them manually by doubling each backslash, or turn the literal into a raw string.
To make a raw string, add r just before the opening quote ' (making it r'wget...'). "Raw" means Python uses the text as-is and does not interpret things that look like escape codes (e.g. r'\n' == '\\n'). Anywhere you have a file path or a regex, a raw string lets you paste what you wrote somewhere else without escaping the backslashes yourself.
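A minimal sketch of the difference, using only the sed expression from the question (the variable names are illustrative, not part of the original code):

```python
# In a normal literal, '\1' is an octal escape (the character with code 1)
# and '\n' is a newline, so sed would never see the backslashes.
normal = '\1\n'
raw = r'\1\n'

print(len(normal))  # 2: chr(1) and a newline
print(len(raw))     # 4: backslash, '1', backslash, 'n'

# Doubling each backslash by hand produces the same string as the raw literal.
doubled = '\\1\\n'

# The sed expression from the question, kept intact by the r prefix.
sed_expr = r's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p'
print(sed_expr)
```

With the r prefix the printed expression is identical to what was typed in the terminal, which is what the shell needs to receive.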

Here is one way to run this command; you may already know it.
Save the Linux command as a shell script:
e.g.: vi downloader.sh
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=0BzQ6rtO2VN95cmNuc2xwUS1wdEE" -O- | sed -rn "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p")&id=0BzQ6rtO2VN95cmNuc2xwUS1wdEE" -O cnn_stories_tokenized.zip && rm -rf /tmp/cookies.txt
Save the file, then call it from Python.
from subprocess import call
call(["bash", "downloader.sh"])
This is one way to solve your problem. Alternatives using Python libraries, such as the requests package, are also possible.

Related

Line wrap long `python -c` command to be < 80-chars

I have a Python two-liner to set an environment variable, which I run in bash:
ENV_VAR=$(python -c "from some_package import some_long_command; print(some_long_command())")
In bash, one can use \ to line wrap long commands. I am looking for an equivalent within python -c. Is there some way to line wrap this command so it's not so long?
I would like it to fit within 80-char width. Also, I don't want to make this a Python script, I prefer the python -c route.
Use newlines instead of semicolons.
ENV_VAR=$(python -c "
from some_package import some_long_command
print(some_long_command())
")
I added a couple of extra newlines so the python code stands out.
Or, a here-doc (without extra whitespace to show it can get a bit cramped)
ENV_VAR=$(python3 <<'_END_PYTHON'
from some_package import some_long_command
print(some_long_command())
_END_PYTHON
)
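Both forms work because the interpreter accepts newlines inside the source it is given. A small sketch that checks this from Python itself; json.dumps stands in for the question's some_long_command, which is an assumption made only so the example runs:

```python
import subprocess
import sys

# Multi-line source passed as ONE argument to `python -c`.
# json.dumps is a stand-in for some_package.some_long_command (assumption).
source = """
from json import dumps
print(dumps({"ok": True}))
"""

result = subprocess.run(
    [sys.executable, "-c", source],
    capture_output=True,
    text=True,
    check=True,
)

# Equivalent of ENV_VAR=$(python -c "..."): capture stdout, drop the newline.
env_var = result.stdout.strip()
print(env_var)
```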

determine file type of a file without extension

I want to use pygmentize to highlight some script files (python/bash/...) without extension. But pygmentize requires me to specify the lexer using -l. It does not automatically identify the file type from the content.
I have the following options at hand, but none of them work now:
Use file -b --mime-type. But this command outputs x-python and x-shellscript instead of python and bash, and I don't know the rules.
Use vim -e -c 'echo &ft|q' the_file. Vim has a mechanism to guess the file type of any file, with or without an extension. But this doesn't work, since the output goes to the vim window and disappears after q.
What can I do?
@Samborski's method works fine in the normal case, but it does not work from Python's subprocess.check_output, since no pts is allocated. If you use nvim, you can use this more straightforward way:
HOME=_ nvim --headless -es file <<EOF
call writefile([&ft], "/dev/stdout")
EOF
You can use vim this way:
vim -c ':silent execute ":!echo " . &ft . " > /dev/stdout"' -c ':q!' the_file
It simply builds the shell command to run by string concatenation.
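For the file -b --mime-type route mentioned in the question, a small lookup table can translate the reported MIME type into a lexer name. This is only a sketch: the table entries and the helper names lexer_for_mime and guess_lexer_name are assumptions, and the exact MIME strings depend on the installed version of file.

```python
import subprocess

# Hypothetical lookup table: MIME types as reported by `file` on the left,
# pygmentize lexer names on the right. Extend it for other file types.
MIME_TO_LEXER = {
    "text/x-python": "python",
    "text/x-script.python": "python",
    "text/x-shellscript": "bash",
}


def lexer_for_mime(mime_type, default="text"):
    """Translate a MIME type into a lexer name, falling back to plain text."""
    return MIME_TO_LEXER.get(mime_type.strip(), default)


def guess_lexer_name(path):
    """Ask `file` for the MIME type of path and map it to a lexer name."""
    mime = subprocess.check_output(
        ["file", "-b", "--mime-type", path], text=True
    )
    return lexer_for_mime(mime)
```

The result could then be passed to pygmentize with -l, for example pygmentize -l "$(python guess.py the_file)" the_file if the helper were wrapped in a script (guess.py is a hypothetical name).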

shell script to convert windows file to unix using dos2unix

I'm writing a simple shell script that uses the dos2unix command to convert Windows-format files to Unix format as and when they arrive in my folder.
I used to use iconv in the script and automate it to get one encoding converted to the other. But now I need to use dos2unix instead of iconv.
I don't want the original file to be overwritten (it must be archived in the archive folder). This was straightforward with iconv; how can I do the same with dos2unix?
This is my script:
cd /myfolder/storage
filearrival_dir=/myfolder/storage
filearchive_dir=/myfolder/storage/archive
cd $filearrival_dir
echo " $filearrival_dir"
for file in File_October*.txt
do
iconv -f UTF16 -t UTF8 -o "$file.new" "$file" &&
mv -f "$file.new" "$file".`date +"%C%y%m%d"`.txt_conv &&
mv $file $filearchive_dir/$file
done
The above looks for files matching File_October*.txt, converts each one to the desired encoding, and renames the result with the timestamp and _conv at the end. The script also moves the original file to the archive.
How can I replace iconv in the above script with dos2unix and have the files archived and do the rest just like I did here?
You can "emulate" dos2unix using tr.
tr -d '\015' infile > outfile
If this is just about using dos2unix without overwriting the original file, use its new-file mode:
dos2unix -n infile outfile
My recollection is that dos2unix writes UTF-8 by default, so you probably don't have to take any special action so far as encoding is concerned.
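The same "convert into a new file, leave the original alone" idea can be sketched in Python. The helper name dos2unix_copy is hypothetical, and the sketch assumes a single-byte or UTF-8 encoded file; a UTF-16 file like the one in the original iconv script would need decoding first. Unlike tr -d '\015', it only rewrites CRLF pairs and leaves any lone carriage returns in place.

```python
from pathlib import Path


def dos2unix_copy(src, dst):
    """Write a copy of src to dst with CRLF line endings replaced by LF.

    The source file is only read, never modified, so it can be moved
    to an archive folder afterwards.
    """
    data = Path(src).read_bytes()
    Path(dst).write_bytes(data.replace(b"\r\n", b"\n"))
```

A call such as dos2unix_copy("File_October1.txt", "File_October1.txt.new") (hypothetical file names) would take the place of the iconv line in the loop, with the two mv steps left as they are.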

how can I parse json with a single line python command?

I would like to use python to parse JSON in batch scripts, for example:
HOSTNAME=$(curl -s "$HOST" | python ?)
Where the JSON output from curl looks like:
'{"hostname":"test","domainname":"example.com"}'
How can I do this with a single line python command?
Based on the JSON below being returned from the curl command ...
'{"hostname":"test","domainname":"example.com"}'
You can then use python to extract the hostname using the python json module:
HOSTNAME=$(curl -s "$HOST" |
python -c \
'import json,sys;print(json.load(sys.stdin)["hostname"])')
Note that I have split the line using a \ to make it more readable on stackoverflow. I've also simplified the command based on chepner's comment.
Original source: Parsing JSON with Unix tools
See also: https://wiki.python.org/moin/Powerful%20Python%20One-Liners
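For reference, the one-liner does the same thing as the following small script. The function name extract_hostname is illustrative, and the sample string is the JSON shown in the question:

```python
import json


def extract_hostname(raw_json):
    """Parse a JSON document and return the value of its "hostname" key."""
    return json.loads(raw_json)["hostname"]


sample = '{"hostname":"test","domainname":"example.com"}'
print(extract_hostname(sample))  # prints: test
```

The one-liner uses json.load(sys.stdin) instead of json.loads because the JSON arrives through the pipe from curl rather than as a string.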
echo '{"hostname":"test","domainname":"example.com"}' | python -m json.tool
Since Python is multiplatform, it is important to note the differences between Linux and Windows, especially because their shells treat double quotes and single quotes differently.
Second, some previous answers are a little obsolete: in Python 2, print without parentheses was permitted. In Python 3, print is a function and its arguments must be in parentheses.
Linux (bash)
The outer quotes can be either single or double, as long as the key inside the Python code uses the other kind: "hostname" or 'hostname'.
HOSTNAME=$(curl -s "$HOST" |
python3 -c 'import json,sys;print(json.load(sys.stdin)["hostname"])')
This also works (note the swapped single/double quotes around the key):
HOSTNAME=$(curl -s "$HOST" |
python3 -c "import json,sys;print(json.load(sys.stdin)['hostname'])")
Windows (powershell)
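The key in the Python code MUST be in single quotes, because the command itself is wrapped in double quotes. Only the following syntax is accepted.
Invoke-RestMethod parses the response into an object, so ConvertTo-Json is used to turn it back into JSON text that Python can read.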
$HOSTNAME=(Invoke-RestMethod $HOST | `
ConvertTo-Json | `
python3 -c "import json,sys; print(json.load(sys.stdin)['hostname'])")

Unable to understand grep command : grep -n -o -a --text \"audio uri=\\".*\\""

This command is used in a tool written in Python to grep the string 'audio uri="a852' from the line <audio uri="a852"/> in a text file.
But I am unable to understand how the \\ are being used here.
The command works normally in Linux if we remove the "\" before "audio uri.
My understanding is that this "\" is needed because the command is embedded in the Python tool I am using.
\ is used to cancel the special meaning of ". Here, the " has to be treated as literal text rather than as Bash quoting, so that the pattern matches uri=".*".
For example, you can use \ like this:
mista ~> myname='Mistalis'
mista ~> echo $myname
Mistalis
mista ~> echo \$myname
$myname
I think you get a double \\ because your Python code also needs to escape the " character with a \.
If you want to run it on your console, try:
grep -n -o -a --text "audio uri=\".*\""
Here I just removed one of the double \\.
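The two layers of escaping can be checked from Python. In this sketch the command literal is a reconstruction of what the tool's source might contain (an assumption, as is the file name file.txt); shlex.split stands in for the shell's quote handling and re.search for grep's matching.

```python
import re
import shlex

# Hypothetical Python source: \" escapes a quote for Python,
# \\ produces one backslash that is meant for the shell.
command = "grep -n -o -a --text \"audio uri=\\\".*\\\"\" file.txt"

# Layer 1: what Python hands to the shell.
print(command)

# Layer 2: what the shell hands to grep after removing its own quoting.
args = shlex.split(command)
pattern = args[-2]
print(pattern)

# The resulting pattern matches the line from the question.
match = re.search(pattern, '<audio uri="a852"/>')
print(match.group())
```

The first print shows the console form given above, and the second shows the pattern grep finally receives, with no backslashes left.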
