Change file names to Unicode (Hindi) characters in Python

I am trying to change the filenames to Unicode names which I am reading from a file, line by line. When I try to rename the files, I get the error below. Here is the code:
import codecs
import os

arrayname = []
arrayfile = []
f = codecs.open('E:\\songs.txt', encoding='utf-8', mode='r+')
for line in f:
    arrayname.append(line)
for filename in os.listdir("F:\\songs"):
    if filename.endswith(".mp3"):
        arrayfile.append(filename)
for num in range(0, len(arrayname)):
    print "F:\\songs\\" + arrayfile[num]
    os.rename("F:\\songs\\" + arrayfile[num], "F:\\songs\\" + (arrayname[num]))
I am getting this error
Traceback (most recent call last):
File "C:\read.py", line 25, in <module>
os.rename("F:\\songs\\" + arrayfile[num], "F:\\songs\\" + (arrayname[num]))
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect
How can I change the names of the files?

You are forgetting to remove the newline character from the end of your lines. Remove it with str.rstrip():
for line in f:
    arrayname.append(line.rstrip('\n'))
You can simplify your code somewhat, and use best practices to ensure the file is closed. I'd use the newer (and better engineered) io.open() rather than codecs.open(). If you use Unicode literals for paths, Python will ensure you get Unicode filenames when listing:
import io
import os
import glob

directory = u"F:\\songs"
songs = glob.glob(os.path.join(directory, u"*.mp3"))
with io.open('E:\\songs.txt', encoding='utf-8') as newnames:
    for old, new in zip(songs, newnames):
        oldpath = os.path.join(directory, old)
        newpath = os.path.join(directory, new.rstrip('\n'))
        print oldpath
        os.rename(oldpath, newpath)
I used the glob module to filter out matching filenames.

Related

Python: Unicode characters in file or folder names

We process a lot of files whose paths can contain an extended character set, like this:
F:\Site Section\Cieślik
My Python scripts fail to open such files or chdir to such folders whatever I try.
Here is an extract from my code:
import zipfile36 as zipfile
import os
from pathlib import Path

outfile = open("F:/zip_pdf3.log", "w", encoding="utf-8")
with open('F:/zip_pdf.txt') as f:  # Input file list - note the forward slashes!
    for line in f:
        print(line)
        path, filename = os.path.split(line)
        file_no_ext = os.path.splitext(os.path.basename(line))[0]
        try:
            os.chdir(path)  # Go to the file path
        except Exception as exception:
            print(exception, file=outfile)  # 3.7
            print(exception)
            continue
I tried the following:

Converting path to a raw string:

raw_string = r"{}".format(path)
try:
    os.chdir(raw_string)

Converting the string to a Path:

Ppath = Path(path)
try:
    os.chdir(Ppath.decode("utf8"))
Out of ideas... Does anyone know how to work with Unicode file and folder names? I am using Python 3.7 or higher on Windows.
It could be as simple as this - thanks @SergeBallesta:
with open('F:/pdf_err.txt', encoding="utf-8") as f:
I may post updates after more runs with different input.
This, however, leads to a slightly different question: if, instead of reading paths from a file, I walk over folders and files with an extended character set, how do I deal with those, i.e.
for subdir, dirs, files in os.walk(rootdir): ?
At present I'm getting either a "The filename, directory name, or volume label syntax is incorrect" or "Can't open the file".
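For what it's worth, on Python 3 os.walk yields ordinary str (Unicode) paths when you pass it a str root, so the names it returns can normally be handed straight to open() or os.chdir(). A minimal sketch, with a hypothetical root folder:

import os

rootdir = "F:\\Site Section"                    # hypothetical root containing names like Cieślik

for subdir, dirs, files in os.walk(rootdir):
    for name in files:
        fullpath = os.path.join(subdir, name)   # already a Unicode str on Python 3
        try:
            with open(fullpath, "rb") as fh:
                fh.read(16)
            print("OK:", fullpath)
        except OSError as exc:
            print("Could not open:", fullpath, exc)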

Concatenate the last 100 files into a single one

Beginner in Python here; I need a bit of help. I am using Python 2.7.
I want to make a program that concatenates the last 100 files I have in a folder. The folder has lots of files, but I only want the concatenation of the last 100. I am able to concatenate all of them (if I don't specify a number and change the for loop), but I am not able to select just the last 100 files. The files are saved in binary by the software, in the folder specified below. I would also like to remove those 100 files once they are concatenated into the new one. The program I have written is the following:
#!/usr/bin/python
import os
import glob

os.chdir("C:\AFM_test\jpk_files")
rout = ""
filename = glob.glob("*-*-*.*.*-*.*.*.jpk-force")
filename.sort(key=os.path.getmtime)
for filename in range(0, 99):
    filename = open(filename, "rb")
    tout = filename.read() + "\r\n"
    rout = rout + tout
    os.remove(filename)
    filename.close()
fout = open("output.jpk-force", "wb+")
fout.write(rout)
fout.close()
It doesn't do anything and the error is the following:
Traceback (most recent call last):
File "C:\AFM_test\jpk_files\AFM_test.py", line 12, in <module>
filename = open(filename,"rb")
TypeError: coercing to Unicode: need string or buffer, int found
[Finished in 0.1s]
I guess the problem is the loop and its range(0,99) structure, because when I concatenated all the files contained in the folder like this:
#!/usr/bin/python
import os
import glob

os.chdir("C:\AFM_test\jpk_files")
rout = ""
files = glob.glob("*-*-*.*.*-*.*.*.jpk-force")
for filename in files:
    filename = open(filename, "rb")
    tout = filename.read() + "\r\n"
    rout = rout + tout
    os.remove(filename)
    filename.close()
fout = open("output.jpk-force", "wb+")
fout.write(rout)
fout.close()
it worked okay except for the os.remove call, which showed this error:
Traceback (most recent call last):
File "C:\try\AFM_test_2.py", line 17, in <module>
os.remove(filename)
must be string, not file
Any ideas on how I can achieve my goal?
I hope I have explained myself properly. Maybe I have missed something important, sorry, I am just a beginner in this field.
Thank you.
TypeError: coercing to Unicode: need string or buffer, int found
That is because filename is an integer here (it comes from range(0, 99)), and open() expects a string path, not an int.
os.remove(filename)
must be string, not file
That is because you are re-assigning the variable filename (which was a string path) to a file handle/object. os.remove(..) expects the path string from the for loop, not the result of open(..). It's generally good practice to give variables meaningful names, such as filepath and infile.
A better approach would be:
def processFile(filepath):
    with open(filepath) as f:
        content = f.read()
    os.remove(filepath)
    return content

def main():
    paths = glob.glob("..*..*..")
    last100paths = paths[-100:]
    with open(outFilePath, "w") as f:
        f.write("\r\n".join(processFile(path) for path in last100paths))
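One caveat worth adding (an assumption about intent, since glob.glob does not guarantee any ordering): for paths[-100:] to really be the 100 most recent files, sort by modification time first, as another answer below does. A minimal sketch with a simplified pattern:

import glob
import os

paths = glob.glob("*.jpk-force")      # simplified pattern, for illustration only
paths.sort(key=os.path.getmtime)      # oldest first, so paths[-100:] are the newest 100
last100paths = paths[-100:]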
You need to change:
filename=open(filename,"rb")
...to something like:
inf = open(filename, "rb")
...
inf.close()
Then, when you call os.remove(filename), filename will still be the name from the original loop, not the file object your code is currently reassigning to that variable.
Note: rather than doing this explicit opening and closing of files, though, try using the with statement (see this helpful guide).
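A minimal sketch of that suggestion applied to this question (the pattern is simplified to *.jpk-force for illustration):

import glob
import os

rout = ""
filenames = sorted(glob.glob("*.jpk-force"), key=os.path.getmtime)
for filename in filenames[-100:]:          # the 100 most recently modified files
    with open(filename, "rb") as inf:      # the file is closed automatically, even if read() fails
        rout = rout + inf.read() + "\r\n"
    os.remove(filename)                    # filename is still the path string here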
Check whether glob matched anything for the pattern:
pattern = r"*-*-*.*.*-*.*.*.jpk-force"
filenames = glob.glob(pattern)
if not filenames:
    print 'no files matched ', pattern
    sys.exit(1)
Get an mtime-sorted file list by building a list of tuples, each containing a file name and its mtime:
filenames = [ (filename,os.stat(filename)[8]) for filename in filenames ]
Sort the list by mtime in descending order:
filenames.sort(key=lambda x:x[1],reverse=True)
The above two lines can be simplified to:
filenames = [ filename for filename in sorted(filenames,key=os.path.getmtime,reverse=True) ]
That line can be refactored further, because we can sort in place:
filenames.sort(key=os.path.getmtime,reverse=True)
#!/usr/bin/python
import os
import sys
import glob

os.chdir("C:\AFM_test\jpk_files")
rout = ""
pattern = r"*-*-*.*.*-*.*.*.jpk-force"
filenames = glob.glob(pattern)
if not filenames:
    print 'no files matched ', pattern
    sys.exit(1)
filenames.sort(key=os.path.getmtime, reverse=True)
for filename in filenames[:100]:
    filecontent = open(filename, "rb")
    tout = filecontent.read() + "\r\n"
    filecontent.close()
    rout = rout + tout
    os.remove(filename)
fout = open("output.jpk-force", "wb+")
fout.write(rout)
fout.close()
Note that this, like the original, still doesn't check for exceptions around the file operations.
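If you want that safety, a hedged sketch of how the per-file step might be wrapped (the pattern is simplified for illustration):

import glob
import os
import sys

rout = ""
filenames = glob.glob("*.jpk-force")            # simplified pattern for illustration
if not filenames:
    print 'no files matched'
    sys.exit(1)
filenames.sort(key=os.path.getmtime, reverse=True)
for filename in filenames[:100]:
    try:
        filecontent = open(filename, "rb")
        tout = filecontent.read() + "\r\n"
        filecontent.close()
        os.remove(filename)
    except (IOError, OSError) as exc:           # e.g. file locked or already deleted
        print 'skipping %s: %s' % (filename, exc)
        continue
    rout = rout + tout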

Write Folder Contents to CSV with Regex

I am trying to use the Python script here for my own purposes. I'm no Python bloke, so hopefully someone can see what I have wrong.
The script below doesn't error out, but my CSV is created with no values. Do I have a join problem? I'm expecting data to be written to the CSV.
# import the standard libraries you'll need
import os  # https://docs.python.org/2/library/os.html
import re  # https://docs.python.org/2/library/re.html

# this function will walk your directories and output a list of file paths
def getFilePaths(directory):
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)
    return file_paths

audio_file_paths = getFilePaths("Z:\Dropbox\Apps\DirScan\files")
output_to_csv = [];
for audio_file in audio_file_paths:
    base_path, fname = os.path.split(audio_file)
    reg_ex = re.compile("^(.*) - (.*) - (.*).mp3$");
    # now apply the compiled regex to each path
    name_components = reg_ex.match(fname);
    output_to_csv.append("{0},{1}".format(",".join(name_components), base_path));

# create the file, making sure the location is writeable
csv_doc = open("database.csv", "w");
# now join all the rows with line breaks and write the compiled text to the file
csv_doc.write('\n'.join(output_to_csv));
# close your new database
csv_doc.close()
When I run your code I get this error:
Traceback (most recent call last):
File "x.py", line 29, in <module>
output_to_csv.append("{0},{1}".format(",".join(name_components), base_path));
TypeError
That's because name_components is a regex match object, which doesn't work as an argument to join. You need to replace:
",".join(name_components)
With:
",".join(name_components.groups())
After making that change I can see the CSV file gets written to correctly.
One other minor point: you don't need a semicolon at the end of a line in Python.
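One more thing worth guarding against (an assumption about the folder contents, not something the traceback above shows): re.match() returns None for filenames that don't fit the "Artist - Album - Title.mp3" pattern, and calling .groups() on None raises AttributeError. A small sketch of a guard, with illustrative names:

import re

reg_ex = re.compile(r"^(.*) - (.*) - (.*)\.mp3$")

for fname in ["Artist - Album - Song.mp3", "cover.jpg"]:   # illustrative names only
    name_components = reg_ex.match(fname)
    if name_components is None:
        continue                         # skip files that don't fit the pattern
    print ",".join(name_components.groups())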

python - open all plain text files in a directory with ".dta" extension and write lines to csv

I have a number of plain-text config files (.dta) that are spread through 27 sub-directories. I am trying to parse some information from all of them into a common document that is easier to work with.
Thus far I have:
import linecache
import csv
import os

csvout = csv.writer(open("dtaCompile.csv", "wb"))
directory = os.path.join("c:\\", "DirectKey")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".DTA"):
            f = open(file, 'r')
            lines = f.readlines()
            description = lines[1]
            articleCode = lines[2]
            OS = lines[25]
            SMBIOS = lines[32]
            pnpID = lines[34]
            cmdLine = lines[28]
            csvout.writerow([SMBIOS, description, articleCode, pnpID, OS, cmdLine])
            f.close()
I'm getting the following error:
Traceback (most recent call last):
File "test.py", line 11, in <module>
f=open(file,'r')
IOError: [Errno 2] No such file or directory: '000003APP.DTA'
Instead of
f=open(file,'r')
you probably need:
f=open(os.path.join(directory, root, file),'r')
file is just the name of the file and doesn't say anything about the path to it. You have to use os.path.join with the various components to create the full path:
if file.endswith(".DTA"):
    file = os.path.join(directory, root, file)
Instead of:
f=open(file,'r')
Try:
f = open(os.path.join(directory, file), "r")
My guess is that the directory your program is executing in is not the same as the directory you're walking.
Try printing:
os.getcwd()
to see.

Python File Concatenation

I have a data folder, with subfolders for each subject that ran through a program. So, for example, in the data folder, there are folders for Bob, Fred, and Tom. Each one of those folders contains a variety of files and subfolders. However, I am only interested in the 'summary.log' file contained in each subject's folder.
I want to concatenate the 'summary.log' file from Bob, Fred, and Tom into a single log file in the data folder. In addition, I want to add a column to each log file that will list the subject number.
Is this possible to do in Python? Or is there an easier way to do it? I have tried a number of different pieces of code, but none of them get the job done. For example,
#!/usr/bin/python
import sys, string, glob, os

fls = glob.glob(r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*');
outfile = open('summary.log','w');
for x in fls:
    file=open(x,'r');
    data=file.read();
    file.close();
    outfile.write(data);
outfile.close();
This gives me the error:
Traceback (most recent call last):
File "fileconcat.py", line 8, in <module>
file=open(x,'r');
IOError: [Errno 21] Is a directory
I think this has to do with the fact that the data folder contains subfolders, but I don't know how to work around it. I also tried this, but to no avail:
from glob import iglob
import shutil
import os

PATH = r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*'
destination = open('summary.log', 'wb')
for filename in iglob(os.path.join(PATH, '*.log'))
    shutil.copyfileobj(open(filename, 'rb'), destination)
destination.close()
This gives me an "invalid syntax" error at the "for filename" line, but I'm not sure what to change.
The syntax error is not related to the use of glob. You forgot the ":" at the end of the for statement:
for filename in iglob(os.path.join(PATH, '*.log')):
                                                  ^--- missing
But the following pattern works:
PATH = r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*/*.log'
destination = open('summary.log', 'wb')
for filename in iglob(PATH):
    shutil.copyfileobj(open(filename, 'rb'), destination)
destination.close()
The colon (:) is missing in the for line. Besides, you should use with because it handles closing the file (your code is not exception-safe).
from glob import iglob
import shutil
import os

PATH = r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*'
with open('summary.log', 'wb') as destination:
    for filename in iglob(os.path.join(PATH, '*.log')):
        with open(filename, 'rb') as in_:
            shutil.copyfileobj(in_, destination)
In your first example:
import sys, string, glob, os
you are not using sys, string or os, so there is no need to import those.
fls = glob.glob(r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*');
here, you are selecting the subject folders. Since you are interested in summary.log files within these folders, you may change the pattern as follows:
fls = glob.glob('/Users/slevclab/Desktop/Acceptability Judgement Task/data/*/summary.log')
In Python, there is no need to end lines with semicolons.
outfile = open('summary.log', 'w')
for x in fls:
    file = open(x, 'r')
    data = file.read()
    file.close()
    outfile.write(data)
outfile.close()
As VGE's answer shows, your second solution works once you've fixed the syntax error. But note that a more general solution is to use os.walk:
>>> import os
>>> for i in os.walk('foo'):
...     print i
...
('foo', ['bar', 'baz'], ['oof.txt'])
('foo/bar', [], ['rab.txt'])
('foo/baz', [], ['zab.txt'])
This goes through all the directories in the tree under the start directory and maintains a nice separation between directories and files.
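A hedged sketch of how that could apply here (the data path is the one from the question; taking the subject name from the folder name is an assumption about your layout, as is the output name summary_all.log):

import os

rootdir = '/Users/slevclab/Desktop/Acceptability Judgement Task/data'

with open(os.path.join(rootdir, 'summary_all.log'), 'w') as outfile:
    for subdir, dirs, files in os.walk(rootdir):
        if 'summary.log' in files:
            subject = os.path.basename(subdir)          # e.g. 'Bob', 'Fred', 'Tom'
            with open(os.path.join(subdir, 'summary.log'), 'r') as infile:
                for line in infile:
                    # append a subject column to each line of the per-subject log
                    outfile.write(line.rstrip('\n') + '\t' + subject + '\n')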
