I'm writing a simple shell script that uses the dos2unix command to convert Windows-format files to Unix format as and when they arrive in my folder.
Previously I used iconv in the script to convert automatically from one encoding to the other, but now I need to use dos2unix instead of iconv.
I don't want the original file to be overwritten (it must be archived in the archive folder). This was straightforward with iconv; how can I do the same with dos2unix?
This is my script:
filearrival_dir=/myfolder/storage
filearchive_dir=/myfolder/storage/archive
cd "$filearrival_dir"
echo "$filearrival_dir"
for file in File_October*.txt
do
iconv -f UTF16 -t UTF8 -o "$file.new" "$file" &&
mv -f "$file.new" "$file".`date +"%C%y%m%d"`.txt_conv &&
mv "$file" "$filearchive_dir/$file"
done
The above looks for files matching File_October*.txt, converts each one to the desired encoding, renames the result with a timestamp and _conv at the end, and moves the original file to the archive.
How can I replace iconv in the above script with dos2unix, keep the originals archived, and do the rest just as I did here?
You can "emulate" dos2unix using tr.
tr -d '\015' infile > outfile
If this is just about keeping dos2unix from overwriting the original file, use its new-file mode:
dos2unix -n infile outfile
My recollection is that dos2unix writes UTF-8 by default, so you probably don't have to take any special action so far as encoding is concerned.
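For reference, the same convert-then-archive flow can also be sketched from Python, calling dos2unix -n through subprocess; the directory names and filename pattern below are copied from the question, and the date suffix mirrors the original script:

import glob
import os
import shutil
import subprocess
from datetime import date

filearrival_dir = "/myfolder/storage"          # from the question
filearchive_dir = "/myfolder/storage/archive"  # from the question
stamp = date.today().strftime("%Y%m%d")

for path in glob.glob(os.path.join(filearrival_dir, "File_October*.txt")):
    converted = "{}.{}.txt_conv".format(path, stamp)
    # dos2unix -n writes the converted copy to a new file and leaves the original untouched
    subprocess.run(["dos2unix", "-n", path, converted], check=True)
    # archive the untouched original
    shutil.move(path, os.path.join(filearchive_dir, os.path.basename(path)))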
I have a Python script that does AES decryption of an encrypted zip archive, 'myzip.enc'.
I'm trying to take the output of that decryption and feed it to the unzip command as stdin.
Here is my code:
decrypt = subprocess.Popen(['openssl', 'enc', '-d', '-aes-256-cbc', '-md', 'sha256', '-in', '{}'.format(inputFile), '-pass', 'pass:{}'.format(passw_hash)], stdout=subprocess.PIPE)
decompress = subprocess.Popen(['unzip', '-j', '-d', path_dict], stdin=decrypt.stdout)
inputFile is my encrypted archive 'myzip.enc'
passw_hash is the AES password
path_dict is a folder path where to extract the decrypted zip
I'm getting this in my terminal:
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
-p extract files to pipe, no messages -l list files (short format)
-f freshen existing files, create none -t test compressed archive data
-u update files, create if necessary -z display archive comment only
-v list verbosely/show version info -T timestamp archive to latest
-x exclude files that follow (in xlist) -d extract files into exdir
modifiers:
-n never overwrite existing files -q quiet mode (-qq => quieter)
-o overwrite files WITHOUT prompting -a auto-convert any text files
-j junk paths (do not make directories) -aa treat ALL files as text
-U use escapes for all non-ASCII Unicode -UU ignore any Unicode fields
-C match filenames case-insensitively -L make (some) names lowercase
-X restore UID/GID info -V retain VMS version numbers
-K keep setuid/setgid/tacky permissions -M pipe through "more" pager
See "unzip -hh" or unzip.txt for more help. Examples:
unzip data1 -x joe => extract all files except joe from zipfile data1.zip
unzip -p foo | more => send contents of foo.zip via pipe into program more
unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
Is there something wrong in my unzip command?
Thanks.
Edit: It seems (from here) that it is impossible to pipe a zip archive into the unzip command, because unzip needs to read some information from the physical file.
My workaround ended up being this code which works:
output = open('{}.zip'.format(inputFile), "wb")
decrypt = subprocess.Popen(['openssl', 'enc', '-d', '-aes-256-cbc', '-md', 'sha256', '-in', '{}'.format(inputFile), '-pass', 'pass:{}'.format(passw_hash)], stdout=output)
decrypt.wait()   # make sure openssl has finished writing the decrypted zip
output.close()
decompress = subprocess.Popen(['unzip', '{}.zip'.format(inputFile), '-d', path_dict[0]])
Is there a way to unzip and delete the zip archive at the same time, or to add an rm to the decompress line?
Thanks.
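One possible approach (a sketch building on the workaround above and reusing its variable names): wait for unzip to finish, then remove the temporary archive from Python rather than via rm:

import os
import subprocess

# ... decrypt and write '{}.zip'.format(inputFile) as in the workaround above ...

decompress = subprocess.Popen(['unzip', '{}.zip'.format(inputFile), '-d', path_dict[0]])
decompress.wait()                  # block until unzip has finished
if decompress.returncode == 0:     # delete only if extraction succeeded
    os.remove('{}.zip'.format(inputFile))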
I have hundreds of XML files and I would like to parse them into CSV files. I have already written the program that does the conversion.
To execute the python program I use this command (on VScode MS):
python ConvertXMLtoCSV.py -i Alarm120.xml -o Alarm120.csv
My question is: how can I change this so that a for loop executes the program for each XML file?
UPDATE
If my files and folders are organized like in the picture:
I tried the following and executed it as a .bat file on Windows 10, but it does nothing:
#!/bin/bash
for xml_file in XML_Files/*.xml
do
csv_file=${xml_file/.xml/.csv}
python ConvertXMLtoCSV.py -i XML_Files/$xml_file -o CSV_Files/$csv_file
done
Ideally the for loop would be included inside your ConvertXMLtoCSV.py itself. You can use this to find all xml files in a given directory:
import os

for file in os.listdir(directory_path):
    if file.endswith(".xml"):
        # And here you can do your conversion
You could change the script's arguments to be the path of the directory containing the xml files and the path of an output folder for the .csv files. For naming, you can keep each file's name and just give it the .csv extension, i.e.
csv_name = file.replace(".xml", ".csv")
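Pulling those pieces together, here is a sketch of what the looping script could look like; it assumes the conversion logic can be imported as a function, here called convert(input_path, output_path), which is only a placeholder for whatever ConvertXMLtoCSV.py actually exposes:

import os
import sys

# Hypothetical import: assumes ConvertXMLtoCSV.py exposes its conversion as a function.
from ConvertXMLtoCSV import convert

directory_path = sys.argv[1]   # folder containing the .xml files
output_path = sys.argv[2]      # folder for the generated .csv files

for file in os.listdir(directory_path):
    if file.endswith(".xml"):
        csv_name = file.replace(".xml", ".csv")
        convert(os.path.join(directory_path, file),
                os.path.join(output_path, csv_name))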
If you want to keep your Python script as-is (process one file), and add the looping externally in bash, you could do:
#!/bin/bash
for xml_file in *.xml
do
csv_file=${xml_file/.xml/.csv}
python ConvertXMLtoCSV.py -i $xml_file -o $csv_file
done
After discussion, it appears that you want to use an external script so that the original ConvertXMLtoCSV.py stays unmodified (as required by other projects), and that, although you tagged bash in the question, you were not actually able to invoke python from bash in your setup.
This being the case, Rolv Apneseth's answer can be adapted so that the looping is still done in Python, but in a separate script (let's call it convert_all.py) that runs the unmodified ConvertXMLtoCSV.py as an external process. That way, ConvertXMLtoCSV.py still processes only one file each time it is run.
To call an external process, you could either use os.system or subprocess.Popen, so here are two options.
Using os.system:
import os
import sys

directory_path = sys.argv[1]

for file in os.listdir(directory_path):
    if file.endswith(".xml"):
        xml_path = os.path.join(directory_path, file)
        csv_path = os.path.join(directory_path, file.replace(".xml", ".csv"))
        os.system(f'python ConvertXMLtoCSV.py -i {xml_path} -o {csv_path}')
Note: for versions of Python too old to support f-strings, that last line could be changed to
os.system('python ConvertXMLtoCSV.py -i {} -o {}'.format(xml_path, csv_path))
Using subprocess.Popen:
import os
import subprocess
import sys

directory_path = sys.argv[1]

for file in os.listdir(directory_path):
    if file.endswith(".xml"):
        xml_path = os.path.join(directory_path, file)
        csv_path = os.path.join(directory_path, file.replace(".xml", ".csv"))
        p = subprocess.Popen(['python', 'ConvertXMLtoCSV.py',
                              '-i', xml_path,
                              '-o', csv_path])
        p.wait()
You could then run it using some command such as:
python convert_all.py C:/Users/myuser/Desktop/myfolder
or whatever the folder is where you have the XML files.
I am trying to download files using an input file (a.txt) that contains URLs, with the following command:
wget -i a.txt
URLs are like
https://domian.com/abc?api=123&xyz=323&title=newFile12
https://domian.com/abc?api=1243&xyz=3223&title=newFile13
I want to set the name of each downloaded file from the title parameter in the URL (for example, the first URL above should be saved as newFile12), but I can't find a way to do it.
To get this done, would I have to write a Python script (similar to this answer: https://stackoverflow.com/a/28313383/10549469) and run it on the URLs one by one, or is there another way?
You can create a script on the fly and pipe it to bash. It is a bit slower than wget -i but preserves the file names:
sed "s/\(.*title=\(.*\)\)/wget -O '\2' '\1'/" a.txt
When you are satisfied with the results, you can pipe it to bash:
sed "s/\(.*title=\(.*\)\)/wget -O '\2' '\1'/" a.txt | bash
Have a look at wget --content-disposition, or use a for loop with wget -O <output filename> <url>.
The following command downloads the file under the filename provided by the server (vim-readonly-1.1.tar.gz) instead of download_script.php?src_id=27233.
wget --content-disposition https://www.vim.org/scripts/download_script.php?src_id=27233
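Alternatively, since the question already mentions falling back to a Python script: a small driver can pull the name out of the title= parameter and hand each URL to wget. A sketch, assuming every non-empty line of a.txt is a URL that carries a title= field:

import subprocess
from urllib.parse import urlparse, parse_qs

with open("a.txt") as url_list:
    for url in (line.strip() for line in url_list):
        if not url:
            continue
        # Use the value of the title= query parameter as the output filename.
        title = parse_qs(urlparse(url).query)["title"][0]
        subprocess.run(["wget", "-O", title, url], check=True)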
I am trying to download a file using os.system in python and it never completely downloads the file
Here is the code
import os
url = 'wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=0BzQ6rtO2VN95cmNuc2xwUS1wdEE" -O- | sed -rn "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p")&id=0BzQ6rtO2VN95cmNuc2xwUS1wdEE" -O cnn_stories_tokenized.zip && rm -rf /tmp/cookies.txt'
os.system(url)
Running the same command directly in the terminal works just fine. Are there any escape characters that I should be handling?
are there any escape characters that I should be handling?
Short answer: Yes.
There are \1 and \n in the string, and Python tries to interpret them as normal escape sequences.
You can either escape them manually by doubling each backslash or turn the string into a raw string.
To make a raw string, add r just before the opening quote ' (making it r'wget...). "Raw" means Python uses the string as-is and does not try to interpret things that look like escape codes (e.g. r'\n' == '\\n'). Anywhere you have a file path or a regex, use raw strings so you don't have to escape backslashes yourself and can simply paste what you wrote somewhere else!
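A quick, standalone way to see the difference (the sed expression is borrowed from the command above):

# In the plain string, \1 becomes the control character chr(1) and \n becomes a newline;
# in the raw string, both stay as a backslash followed by a digit or letter.
plain = "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p"
raw = r"s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p"

print(repr(plain))           # shows '\x01\n' inside the string
print(repr(raw))             # shows '\\1\\n' inside the string
print(len(plain), len(raw))  # the raw string is two characters longer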
There is one more way you can run this command; you might already know the answer.
Save the Linux command as a shell script:
e.g.: vi downloader.sh
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=0BzQ6rtO2VN95cmNuc2xwUS1wdEE" -O- | sed -rn "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p")&id=0BzQ6rtO2VN95cmNuc2xwUS1wdEE" -O cnn_stories_tokenized.zip && rm -rf /tmp/cookies.txt
Save the file, then call it from Python:
from subprocess import call
call(["bash", "downloader.sh"])
This is one way to solve your problem; alternatives using Python libraries, such as the requests package, are also possible.
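For example, for a plain, directly downloadable URL, a requests-based version might look like the sketch below (the Google Drive cookie/confirm handling from the wget command above is deliberately omitted, and the URL is a placeholder):

import requests

url = "https://example.com/cnn_stories_tokenized.zip"  # placeholder URL
output_path = "cnn_stories_tokenized.zip"

# Stream the response so large files are not loaded into memory all at once.
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open(output_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)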
I want to use pygmentize to highlight some script files (python/bash/...) without extension. But pygmentize requires me to specify the lexer using -l. It does not automatically identify the file type from the content.
I have the following options at hand, but none of them work now:
use file -b --mime-type. But this command outputs x-python and x-shellscript instead of python and bash, and I don't know the mapping rules.
use vim -e -c 'echo &ft|q' the_file. Vim has a mechanism to guess the file type of any file, with or without an extension. But this doesn't work either, since the output goes to the vim window and disappears after q.
What can I do?
@Samborski's method works fine in the normal case, but it does not work under Python's subprocess.check_output since no pts is allocated. If you use nvim, you can use this more straightforward approach:
HOME=_ nvim --headless -es file <<EOF
call writefile([&ft], "/dev/stdout")
EOF
You can use vim this way:
vim -c ':silent execute ":!echo " . &ft . " > /dev/stdout"' -c ':q!' the_file
It simply constructs the shell command to run by string concatenation.
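To connect this back to pygmentize, the detected filetype can be captured and passed to -l. Here is a sketch driving the nvim approach above from Python; it assumes nvim is installed and that the reported filetype (e.g. python or sh) happens to match a pygments lexer name, which is not guaranteed for every filetype:

import os
import subprocess
import sys

the_file = sys.argv[1]

# Ask nvim for its guessed filetype, exactly as in the answer above.
ft = subprocess.run(
    ["nvim", "--headless", "-es", the_file],
    input='call writefile([&ft], "/dev/stdout")\n',
    capture_output=True, text=True,
    env={**os.environ, "HOME": "_"},
).stdout.strip()

# Hand the result to pygmentize as the lexer name.
subprocess.run(["pygmentize", "-l", ft, the_file], check=True)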