System not able to find file with space while calculating md5/sha1 - python

I have written the code below to generate a hash code for every mp3 file in a directory, but it throws an error for files that have a space in the name.
Directory: d:\song
Files in the directory: AB CD.mp3, Abc.mp3, GB.mp3
import os

dirname = 'd:\song'

def walk(dirname):
    names = []
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            names.append(path)
        else:
            names.extend(walk(path))
    return names

def chk_dup(f):
    for i in f:
        cmd = 'fciv -md5 %s' % i.replace(' ','')
        fp = os.popen(cmd)
        res = fp.read()
        print(res)
        fp.close()

chk_dup(walk(dirname))
Output is
//
// File Checksum Integrity Verifier version 2.05.
//
d:\song\abcd.mp3\*
Error msg : The system cannot find the path specified.
Error code : 3
//
// File Checksum Integrity Verifier version 2.05.
//
1a65b4c63d64f0634c1411d37629be3b d:\song\abc.mp3
//
// File Checksum Integrity Verifier version 2.05.
//
bbf47eb1cb3625eea648f0b6e0784fd3 d:\song\gb.mp3

You can probably fix your immediate problem by enclosing each file path argument in double quotes in case it contains spaces. That way it is treated as a single argument rather than as two (or more), which is what happens otherwise.
for i in f:
    cmd = 'fciv -md5 "%s"' % i
    ...
However, rather than just doing that, I would suggest you stop using os.popen() altogether, because it has been deprecated since Python 2.6, and use the recommended subprocess module instead. Among other advantages, it will automatically handle the quoting of arguments that contain spaces for you.
In addition, it would be useful to take advantage of the built-in os.walk() function to simplify your own walk() function.
Incorporating both of these changes results in code looking something like the following:
import os
import subprocess

directory = r'd:\song'

def walk(dirname):
    for root, dirs, files in os.walk(dirname):
        for name in files:
            path = os.path.join(root, name)
            yield path

def chk_dup(files):
    for file in files:
        args = ['fciv', '-md5', file]  # command as a sequence of arguments
        p = subprocess.Popen(args, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)
        res = p.communicate()[0]  # communicate() returns (stdoutdata, stderrdata)
        print(res)

chk_dup(walk(directory))
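If you are on Python 3.5 or newer, subprocess.run() wraps the same pattern a little more concisely; a minimal sketch along those lines, still assuming fciv is on the PATH:
import subprocess

def chk_dup(files):
    for file in files:
        # a list of arguments means paths with spaces need no manual quoting
        result = subprocess.run(['fciv', '-md5', file],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        print(result.stdout.decode(errors='replace'))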

Your file is "AB CD.mp3", not "ABCD.mp3", so the file "ABCD.mp3" cannot be found.
Try using ' to quote the path in the command:
cmd = "fciv -md5 '%s'" % i

Related

Python: Unicode characters in file or folder names

We process a lot of files where path can contain an extended character set like this:
F:\Site Section\Cieślik
My Python scripts fail to open such files or chdir to such folders whatever I try.
Here is an extract from my code:
import zipfile36 as zipfile
import os
from pathlib import Path

outfile = open("F:/zip_pdf3.log", "w", encoding="utf-8")
with open('F:/zip_pdf.txt') as f:  # Input file list - note the forward slashes!
    for line in f:
        print(line)
        path, filename = os.path.split(line)
        file_no_ext = os.path.splitext(os.path.basename(line))[0]
        try:
            os.chdir(path)  # Go to the file path
        except Exception as exception:
            print(exception, file=outfile)  # 3.7
            print(exception)
            continue
I tried the following:
Converting the path to a raw string:
raw_string = r"{}".format(path)
try:
    os.chdir(raw_string)
Converting the string to a Path:
Ppath = Path(path)
try:
    os.chdir(Ppath.decode("utf8"))
Out of ideas... Does anyone know how to work with Unicode file and folder names? I'm using Python 3.7 or higher on Windows.
Could be as simple as that - thanks @SergeBallesta:
with open('F:/pdf_err.txt', encoding="utf-8") as f:
I may post updates after more runs with different input.
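In other words, a minimal sketch of reading a UTF-8 path list, reusing the list file from the question; rstrip() removes the newline that line iteration keeps on each entry:
import os

with open('F:/zip_pdf.txt', encoding="utf-8") as f:
    for line in f:
        entry = line.rstrip('\n')            # drop the trailing newline
        path, filename = os.path.split(entry)
        print(path, filename)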
This, however, leads to a slightly different question: if, instead of reading from the file, I walk over folders and files with extended character set - how do I deal with those, i.e.
for subdir, dirs, files in os.walk(rootdir): ?
At present I'm getting either a "The filename, directory name, or volume label syntax is incorrect" or "Can't open the file".
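For the follow-up: on Python 3, os.walk() and os.path.join() already work with str (Unicode) paths, so the walk itself should need no special handling; a minimal sketch, with rootdir as a hypothetical folder containing accented names:
import os

rootdir = r'F:\Site Section'   # hypothetical root with non-ASCII subfolders
for subdir, dirs, files in os.walk(rootdir):
    for name in files:
        full_path = os.path.join(subdir, name)   # str path, Unicode-safe on Python 3
        with open(full_path, 'rb') as fh:        # open via the joined path, not the bare name
            pass                                 # process the file here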

How to pass a command line within a function?

I am trying to unzip fasta.gz files in order to work with them. I have created a script using cmd, based on something I have done before, but now I cannot get the newly created function to work. See below:
import glob
import sys
import os
import argparse
import subprocess
import gzip
#import gunzip

def decompressed_files():
    print('starting decompressed_files')
    # files where the data is stored
    input_folder = ('/home/me/me_files/PB_assemblies_for_me')
    # where I want my data to be
    output_folder = input_folder + '/fasta_files'
    if os.path.exists(output_folder):
        print('folder already exists')
    else:
        os.makedirs(output_folder)
        print('folder has been created')
    for f in input_folder:
        fasta = glob.glob(input_folder + '/*.fasta.gz')
        #print (fasta[0])
        #sys.exit()
        cmd = ['gunzip', '-k', fasta, output_folder]
        my_file = subprocess.Popen(cmd)
        my_file.wait

decompressed_files()
print('The programme has finished doing its job')
But this gives the following error:
TypeError: execv() arg 2 must contain only strings
If I write fasta as a plain string instead, the programme looks for a file and the error becomes:
fasta.gz: No such file or directory
If I go to the directory where the files are and type gunzip name_file_fasta_gz, it does the job beautifully, but there are several files in the folder and I would like to loop over them. I have used 'cmd' before, as you can see in the code below, and I didn't have any problem with it. Code from the past where I was able to mix strings and non-strings:
cmd = ['velveth', output, '59', '-fastq.gz', '-shortPaired', fastqs[0], fastqs[1]]
#print cmd
my_file = subprocess.Popen(cmd)  # I got this from the documentation.
my_file.wait()
I will be happy to learn other ways to run Linux commands from within a Python function. The code is for Python 2.7; I know it is old, but it is the version installed on the server at work.
fasta is a list returned by glob.glob().
Hence cmd = ['gunzip', '-k', fasta, output_folder] generates a nested list:
['gunzip', '-k', ['foo.fasta.gz', 'bar.fasta.gz'], output_folder]
but execv() expects a flat list:
['gunzip', '-k', 'foo.fasta.gz', 'bar.fasta.gz', output_folder]
You can use the list concatenation operator + to create a flat list:
cmd = ['gunzip', '-k'] + fasta + [output_folder]
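Putting it together, a minimal sketch of the corrected loop, reusing the paths from the question (note that wait is a method and needs parentheses to actually be called):
import glob
import subprocess

input_folder = '/home/me/me_files/PB_assemblies_for_me'
output_folder = input_folder + '/fasta_files'
fasta = glob.glob(input_folder + '/*.fasta.gz')   # list of matching archives
cmd = ['gunzip', '-k'] + fasta + [output_folder]  # flat list: every element is a str
# note: gunzip skips directory arguments with a warning, so output_folder has
# no real effect here; -k keeps the originals next to the decompressed files
proc = subprocess.Popen(cmd)
proc.wait()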
I haven't tested this, but it might solve your unzip problem using the command. Since gunzip -k keeps both the compressed and the decompressed file, what is the purpose of the output directory?
import os
import subprocess
import gzip

def decompressed_files():
    print('starting decompressed_files')
    # files where the data is stored
    input_folder = ('input')
    # where I want my data to be
    output_folder = input_folder + '/output'
    if os.path.exists(output_folder):
        print('folder already exists')
    else:
        os.makedirs(output_folder)
        print('folder has been created')
    for f in os.listdir(input_folder):
        if f and f.endswith('.gz'):
            cmd = ['gunzip', '-k', os.path.join(input_folder, f), output_folder]
            my_file = subprocess.Popen(cmd)
            my_file.wait()
print(cmd) will look as shown below
['gunzip', '-k', 'input/sample.gz', 'input/output']
I have a few files in the folder and I would like to create the loop
From the above quote, your actual problem seems to be unzipping multiple *.gz files under a path;
in that case the code below should solve your problem.
import os
import shutil
import fnmatch
import gzip

def gunzip(file_path, output_path):
    with gzip.open(file_path, "rb") as f_in, open(output_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

def make_sure_path_exists(path):
    try:
        os.makedirs(path)
    except OSError:
        if not os.path.isdir(path):
            raise

def recurse_and_gunzip(input_path):
    walker = os.walk(input_path)
    output_path = 'files/output'
    make_sure_path_exists(output_path)
    for root, dirs, files in walker:
        for f in files:
            if fnmatch.fnmatch(f, "*.gz"):
                gunzip(root + '/' + f, output_path + '/' + f.replace(".gz", ""))

recurse_and_gunzip('files')
source
EDIT:
Using command line arguments -
subprocess.Popen(base_cmd + args) :
Execute a child program in a new process. On Unix, the class uses os.execvp()-like behavior to execute the child program
fasta.gz: No such file or directory
So any extra element in the cmd list is treated as an argument, and gunzip will look for argument.gz, hence the error that the fasta.gz file was not found.
ref and some useful examples
Now, if you want to pass the gz files as command-line arguments, you can still do that with the code below (you might need to polish it a little to suit your needs):
import argparse
import subprocess
import os

def write_to_desired_location(stdout_data, output_path):
    print("Going to write to path", output_path)
    with open(output_path, "wb") as f_out:
        f_out.write(stdout_data)

def decompress_files(gz_files):
    base_path = ('files')  # my base path
    output_path = base_path + '/output'  # output path
    if os.path.exists(output_path):
        print('folder already exists')
    else:
        os.makedirs(output_path)
        print('folder has been created')
    for f in gz_files:
        if f and f.endswith('.gz'):
            print('starting decompressed_files', f)
            proc = subprocess.Popen(['gunzip', '-dc', f], stdout=subprocess.PIPE)  # -d: decompress, -c: write to stdout
            write_to_desired_location(proc.stdout.read(), output_path + '/' + f.replace(".gz", ""))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-gzfilelist",
        required=True,
        nargs="+",  # 1 or more arguments
        type=str,
        help='Provide gz files as arguments separated by spaces, e.g. -gzfilelist test1.txt.tar.gz test2.txt.tar.gz'
    )
    args = parser.parse_args()
    my_list = [str(item) for item in args.gzfilelist]  # argparse already returns a list; str() just makes the element type explicit
    decompress_files(gz_files=my_list)
execution:
python unzip_file.py -gzfilelist test.txt.tar.gz
output
folder already exists
('starting decompressed_files', 'test.txt.tar.gz')
('Going to write to path', 'files/output/test.txt.tar')
You can pass multiple gz files as well, for example:
python unzip_file.py -gzfilelist test1.txt.tar.gz test2.txt.tar.gz test3.txt.tar.gz

Cannot find the file specified when batch renaming files in a single directory

I've created a simple script to rename my media files, which have lots of stray periods and other junk in their names, so that I can organize them further. My script kind of works, and I will keep editing it to clean up the filenames further, but my os.rename line throws this error:
[Windows Error: Error 2: The system cannot find the file specified.]
import os

for filename in os.listdir(directory):
    fcount = filename.count('.') - 1  # to keep the period for the file extension
    newname = filename.replace('.', ' ', fcount)
    os.rename(filename, newname)
Does anyone know why this might be? I have a feeling that it doesn't like me trying to rename the file without including the file path?
try
os.rename(filename, directory + '/' + newname)
Triton Man has already answered your question. If his answer doesn't work I would try using absolute paths instead of relative paths.
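For instance, a minimal sketch of the loop from the question rebuilt on absolute paths with os.path.join:
import os

for filename in os.listdir(directory):
    fcount = filename.count('.') - 1  # keep the period before the extension
    newname = filename.replace('.', ' ', fcount)
    # join both the old and the new name with the directory so os.rename
    # does not depend on the current working directory
    os.rename(os.path.join(directory, filename), os.path.join(directory, newname))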
I've done something similar before, but in order to keep any name clashes from happening I temporarily moved all the files to a subfolder. The entire process happened so fast that in Windows Explorer I never saw the subfolder get created.
Anyhow, if you're interested in looking at my script, it's shown below. You run it from the command line, passing the directory of the .jpg files you want renamed as a command-line argument.
Here's a script I used to rename .jpg files to multiples of 10. It might be useful to look at.
'''renames pictures to multiples of ten'''
import sys, os

debug = False

try:
    path = sys.argv[1]
except IndexError:
    path = os.getcwd()

def toint(string):
    '''changes a string to a numerical representation
    string must only contain characters with an ordinal value between 0 and 899'''
    string = str(string)
    ret = ''
    for i in string:
        ret += str(ord(i) + 100)  # we add 100 to make all the numbers 3 digits, making it easy to separate the numbers back out when we need to undo this operation
    assert len(ret) == 3 * len(string), 'received an invalid character. Characters must have an ordinal value between 0-899'
    return int(ret)

def compare_key(file):
    file = file.lower().replace('.jpg', '').replace('dscf', '')
    try:
        return int(file)
    except ValueError:
        return toint(file)

# files are temporarily placed in a folder
# to prevent clashing filenames
i = 0
files = os.listdir(path)
files = (f for f in files if f.lower().endswith('.jpg'))
files = sorted(files, key=compare_key)
for file in files:
    i += 10
    if debug: print('renaming %s to %s.jpg' % (file, i))
    os.renames(file, 'renaming/%s.jpg' % i)
for root, __, files in os.walk(path + '/renaming'):
    for file in files:
        if debug: print('moving %s to %s' % (root+'/'+file, path+'/'+file))
        os.renames(root+'/'+file, path+'/'+file)
Edit: I got rid of all the jpg fluff. You could use this code to rename your files; just change the rename_file function to get rid of the extra dots. I haven't tested this code, so there is a possibility that it might not work.
import sys, os

path = sys.argv[1]

def rename_file(file):
    return file

# files are temporarily placed in a folder
# to prevent clashing filenames
files = os.listdir(path)
for file in files:
    os.renames(file, 'renaming/' + rename_file(file))
for root, __, files in os.walk(path + '/renaming'):
    for file in files:
        os.renames(root+'/'+file, path+'/'+file)
Looks like I just needed to set the default directory and it worked just fine.
folder = r"blah\blah\blah"
os.chdir(folder)
for filename in os.listdir(folder):
fcount = filename.count('.') - 1
newname = filename.replace('.', ' ', fcount)
os.rename(filename, newname)

Navigating Python modules with ctags in Vim?

I'm using Vim with ctags for Python and that works very well for classes, fields, etc., but what it doesn't seem to include are Python file names, a.k.a. module names. Is this possible? I'd far prefer to type :ta <module> to jump to a module rather than navigate level by level with a file browser like NERDTree, and I'm very accustomed to doing this in Java, where it works out since class names are file names.
If you generate your tags file using Exuberant Ctags (is there any other way?), then try adding the --extra=+f option. See the man page at http://ctags.sourceforge.net/ctags.html#OPTIONS for details.
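For example, a typical invocation with Exuberant Ctags might be:
ctags -R --extra=+f .
After regenerating the tags file, :ta my_module.py should then jump to the file itself.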
Exuberant Ctags (with --extra=+f) generates tags for Python file names (e.g. my_module.py), but not for the module name (e.g. my_module). I ended up creating a modified version of the ptags script. Save the following to a file named ptags somewhere in your path and make it executable:
#! /usr/bin/env python

# ptags
#
# Create a tags file for Python programs, usable with vi.
# Tagged are:
#     - functions (even inside other defs or classes)
#     - classes
#     - filenames
# Warns about files it cannot open.
# No warnings about duplicate tags.

import sys, re, os
import argparse

tags = []  # Modified global variable!

def main():
    for root, folders, files in os.walk(args.folder_to_index):
        for filename in files:
            if filename.endswith('.py'):
                full_path = os.path.join(root, filename)
                treat_file(full_path)
        if not args.recursive:
            break
    if tags:
        fp = open(args.ctags_filename, 'w')
        tags.sort()
        for s in tags: fp.write(s)

expr = '^[ \t]*(def|class)[ \t]+([a-zA-Z0-9_]+)[ \t]*[:\(]'
matcher = re.compile(expr)

def treat_file(filename):
    try:
        fp = open(filename, 'r')
    except:
        sys.stderr.write('Cannot open %s\n' % filename)
        return
    base = os.path.basename(filename)
    if base[-3:] == '.py':
        base = base[:-3]
    s = base + '\t' + filename + '\t' + '1\n'
    tags.append(s)
    while 1:
        line = fp.readline()
        if not line:
            break
        m = matcher.match(line)
        if m:
            content = m.group(0)
            name = m.group(2)
            s = name + '\t' + filename + '\t/^' + content + '/\n'
            tags.append(s)

if __name__ == '__main__':
    p = argparse.ArgumentParser()
    p.add_argument('-f', '--ctags-filename', type=str, default='tags')
    p.add_argument('-R', '--recursive', action='store_true')
    p.add_argument('folder_to_index', type=str, default='.')
    args = p.parse_args()
    main()
Now run the following to generate a tags file by recursively processing a directory:
ptags -R -f tags_file_to_create /path/to/index
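In Vim you can then point the 'tags' option at the generated file and jump by module name, e.g.:
:set tags+=tags_file_to_create
:tag my_module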

I need to append standard header contents to the top of all of my python files

So, I have a bunch of Python files (hundreds, actually) that need a comment header at the top containing the product name, a license reference notice, copyright information and other things. What is the best way to do this in a batch-like way? In other words, is there a tool where I can specify what the header will be and which directory to apply it to, along with a *.py filter or something along those lines? By the way, the header info is identical for every file.
Bash batch syntax:
for i in `find {DIRECTORY} -name "*.py"`; do
    cat - $i > /tmp/f.py <<EOF
{HEADER_BLOCK}
EOF
    mv /tmp/f.py $i
done
If instead of the batch approach you would rather use python itself, a very simplified version could be written as:
import os, sys

def main():
    HEADER = '''# Author: Rob
# Company: MadeupOne
# Copyright Owner: Rob
'''
    filelist = []
    for path, dir, files in os.walk(sys.argv[1]):
        for file in files:
            if file.endswith('.py'):
                filelist.append(path + os.sep + file)
    for filename in filelist:
        try:
            inbuffer = open(filename, 'U').readlines()
            outbuffer = [HEADER] + inbuffer
            open(filename, 'wb').writelines(outbuffer)
        except IOError:
            print 'Please check the files, there was an error when trying to open %s...' % filename
        except:
            print 'Unexpected error occurred while processing files...'

if __name__ == '__main__': main()
Just pass the directory containing the files you want to alter and it will recursively prepend HEADER to all .py files on the path.
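For example, if the script above were saved as add_headers.py (a hypothetical name), you would run:
python add_headers.py /path/to/project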
You can actually use Python itself to do this in a Pythonic way.
To prepend or append some text to a file, use:
with open('filename.py', 'rb') as f:
    text = f.read()
text = prependText + text
text = text + postText
# ... whatever you want to do with the code text
with open('filename.py', 'wb') as f:
    f.write(text)
Since Python modules usually form a tree structure, you can always use the walk function (os.path.walk) to navigate to any level of depth and apply custom logic according to the path and/or filename.
Updated the above script to work with Python 3 and to ignore hidden folders:
import os, sys

def main():
    HEADER = '''#!/usr/bin/python3
# Copyright : 2021 European Commission
# License : 3-Clause BSD
'''
    filelist = []
    for path, dir, files in os.walk(sys.argv[1]):
        if '/.' not in path:
            for file in files:
                if file.endswith('.py'):
                    filelist.append(path + os.sep + file)
    for filename in filelist:
        try:
            inbuffer = open(filename, 'r').readlines()
            outbuffer = [HEADER] + inbuffer
            open(filename, 'w').writelines(outbuffer)
            print(f"Header is added to the file: '{filename}'.")
        except IOError:
            print('Please check the files, there was an error when trying to open %s...' % filename)
        except:
            print('Unexpected error occurred while processing files...')

if __name__ == '__main__': main()
