Handling UTF filenames in Windows


Given the following files:
E:/Media/Foo/info.nfo
E:/Media/Bar/FXG™.nfo
I can "find" them with the following:
BASE = r'E:/Media/'
for dirpath, _, files in os.walk(BASE):
    for f in fnmatch.filter(files, '*.nfo'):
        nfopath = os.path.join(dirpath, f)
        print(nfopath)
This snippet would then print the above paths.
However, if I make sure that each path created by os.path.join() is indeed a regular file -- for example with something like:
for dirpath, _, files in os.walk(BASE):
    for f in fnmatch.filter(files, '*.nfo'):
        nfopath = os.path.join(dirpath, f)
        print(nfopath)
        assert os.path.isfile(nfopath) # <------
The assertion fails for the second filename, but not for the first.
I checked the folder in explorer, and the script indeed found a regular file and printed the name and path correctly, so I'm not clear on why the assertion failed.
I've tried specifying the BASE string as a unicode string (ur'E:/Media/') as well as explicitly encoding nfopath inside the isfile() call (assert os.path.isfile(nfopath.encode('utf-8'))).
Neither seemed to work.
Of course, I could keep track of and manually go through and delete the failing files, but I'm interested in how one would handle this correctly.
Thanks in advance.
(Python 2.7, Windows 7)

According to this SO question, Windows stores file names as UTF-16 when using the NTFS filesystem. Retry your encoding step with UTF-16.
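For comparison, here is a minimal sketch of the same walk under Python 3, which is an assumption on my part since the question targets Python 2.7. In Python 3, str paths are Unicode throughout, so the names returned by os.walk can be passed straight to os.path.isfile without any encoding step. The helper name find_nfo_files and the temporary directory standing in for E:/Media/ are hypothetical.

```python
import fnmatch
import os
import tempfile

def find_nfo_files(base):
    """Return the paths of all regular *.nfo files below base."""
    found = []
    for dirpath, _, files in os.walk(base):
        for name in fnmatch.filter(files, '*.nfo'):
            path = os.path.join(dirpath, name)
            # No encode()/decode() step: the path is already a Unicode str
            if os.path.isfile(path):
                found.append(path)
    return found

# Demo in a throwaway directory standing in for E:/Media/
with tempfile.TemporaryDirectory() as base:
    for sub, name in (('Foo', 'info.nfo'), ('Bar', 'FXG\u2122.nfo')):
        os.makedirs(os.path.join(base, sub))
        with open(os.path.join(base, sub, name), 'w', encoding='utf-8') as f:
            f.write('demo')
    names = sorted(os.path.basename(p) for p in find_nfo_files(base))
    print(names)
```

In Python 2.7 the usual equivalent is to pass a unicode base path to os.walk so that it yields unicode names; whether that alone resolves the failure described above is not something this sketch can confirm.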

Related

How to add a password and output directory using Zipfile module in Python?

I got the code below from the internet. I am trying to add a password, and I want to change the output directory to "C:\#SFTPDWN" (the final zip file should be in this folder).
I try to change it like below, it did not work.
with ZipFile('CC-Data.zip', 'w', 'pass word') as zip:
Can anybody please tell how to change this code to add password and change result folder?
One last thing, currently it will zip #SFTPDWN folder, I just want to zip everything inside (Right now it will create two folders (CC-Data.zip and inside it #SFTPDWN )). Can anybody please tell me how to zip everything inside #SFTPDWN folder?
Code
from zipfile import ZipFile
import os
def get_all_file_paths(directory):
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)
    return file_paths
def main():
    # path to folder which needs to be zipped
    directory = 'C:\#SFTPDWN'
    file_paths = get_all_file_paths(directory)
    print('Following files will be zipped:')
    for file_name in file_paths:
        print(file_name)
    with ZipFile('CC-Data.zip', 'w') as zip:
        # writing each file one by one
        for file in file_paths:
            zip.write(file)
    print('Zipped successfully!')
if __name__ == "__main__":
    main()

For the password question: from the documentation:
This module [...] supports decryption of encrypted files in ZIP archives, but it currently cannot create an encrypted file. Decryption is extremely slow as it is implemented in native Python rather than C.
https://docs.python.org/3/library/zipfile.html
You would need to use a 3rd party library to create an encrypted zip, or encrypt the archive some other way.
For the second part, in ZipFile.write the documentation also mentions:
ZipFile.write(filename, arcname=None, compress_type=None, compresslevel=None)
Write the file named filename to the archive, giving it the archive name arcname (by default, this will be the same as filename, but without a drive letter and with leading path separators removed). [...]
Note: Archive names should be relative to the archive root, that is, they should not start with a path separator.
https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.write
So you would need to strip off whatever prefix of your file variable and pass that as the arcname parameter. Using os.path.relpath may help, e.g. (I'm on Linux, but should work with Windows paths under Windows):
>>> os.path.relpath("/folder/subpath/myfile.txt", "/folder/")
'subpath/myfile.txt'
Sidebar: a string like "C:\Something" contains the invalid escape sequence \S. Python tolerates this and keeps the backslash literally, but it is deprecated: since 3.6 it triggers a DeprecationWarning, and since 3.12 a SyntaxWarning. Use "C:\\Something", r"C:\Something", or "C:/Something" instead. Something like "C:\Users" is an outright syntax error in Python 3, because \U starts a Unicode escape, and "C:\nothing" silently embeds a newline.
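Putting the arcname advice together, a minimal sketch might look like the following. The function name zip_folder_contents and the temporary folder standing in for C:\#SFTPDWN are my own assumptions, not part of the original code. It passes a full path to ZipFile so the archive lands in the chosen folder, stores each file under a name relative to the zipped folder, and skips the archive itself in case it is written into the folder being zipped.

```python
import os
import tempfile
from zipfile import ZipFile

def zip_folder_contents(directory, zip_path):
    """Zip everything inside directory, storing names relative to it."""
    with ZipFile(zip_path, 'w') as archive:
        for root, _, files in os.walk(directory):
            for filename in files:
                filepath = os.path.join(root, filename)
                # Skip the archive itself if it lives inside directory
                if os.path.abspath(filepath) == os.path.abspath(zip_path):
                    continue
                # Strip the folder prefix so the zip has no top-level folder
                arcname = os.path.relpath(filepath, directory)
                archive.write(filepath, arcname=arcname)

# Demo: a temporary folder stands in for the folder to be zipped
with tempfile.TemporaryDirectory() as src:
    os.makedirs(os.path.join(src, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(src, rel), 'w') as f:
            f.write('demo')
    zip_path = os.path.join(src, 'CC-Data.zip')
    zip_folder_contents(src, zip_path)
    with ZipFile(zip_path) as archive:
        archived = sorted(archive.namelist())
    print(archived)
```

This does not address the password part; as noted above, the standard zipfile module cannot create encrypted archives.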

Python: How do I interact with unicode filenames on Windows? (Python 2.7)

My problem:
Start with US Windows 10 install
Create a Japanese filename in Windows explorer
Open the Python shell, and os.listdir('.')
The listed filename is full of question marks.
os.path.exists() unsurprisingly reports file not found.
NTFS stores the filename as Unicode. I'm sure if I used the win32api CreateFile() series of functions I will get my Unicode filename back, however those APIs are too cumbersome (and not portable). I'd prefer that I get utf-8 encoded filenames, or the Unicode bytes from the FS directory structure, but in default mode this doesn't seem to happen.
I have tried playing around with setlocale() but I haven't stumbled upon the correct arguments to make my program work. I do not want to (and cannot) install additional code pages onto the Windows machine. This needs to work with a stock install of Windows.
Please note this has nothing to do with the console. A repr() shows that the ? chars that end up in the filename listed by os.listdir('.') are real question marks and not some display artifact. I assume they have been added by the API that listdir() uses under the hood.

You may be getting ?s while displaying that filename in the console using os.listdir() but you can access that filename without any problems as internally everything is stored in binary. If you are trying to copy the filename and paste it directly in python, it will be interpreted as mere question marks...
If you want to open that file and perform any operations, then, have a look at this...
files = os.listdir(".")
# Possible output:
# ["a.txt", "file.py", ..., "??.html"]
filename = files[-1] # The last file in this case
f = open(filename, 'r')
# Sample file operation
lines = f.readlines()
print(lines)
f.close()
EDIT:
In Python 2, you need to pass current path as Unicode which could be done using: os.listdir(u'.'), where the . means current path. This will return the list of filenames in Unicode...
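As a point of comparison, here is a small sketch of how this behaves in Python 3, which is an assumption since the question is about Python 2.7. There, os.listdir returns str names when given a str path and bytes names when given a bytes path, so a Japanese filename survives the round trip. The temporary folder and the filename are made up for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    name = '\u65e5\u672c\u8a9e.txt'  # a Japanese filename, written with escapes
    with open(os.path.join(folder, name), 'w', encoding='utf-8') as f:
        f.write('demo')
    # A str argument gives str (Unicode) names back
    listed = os.listdir(folder)
    found = os.path.exists(os.path.join(folder, listed[0]))
    # A bytes argument gives bytes names back
    listed_bytes = os.listdir(os.fsencode(folder))

print(listed == [name], found)
```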

taking data from files which are in folder

How do I get the data from multiple txt files that are placed in a specific folder? I started with the code below but could not fix it. It gives an error like "No such file or directory: '.idea'" (??)
(Let's say I have an A folder and in that, there are x.txt, y.txt, z.txt and so on. I am trying to get and print the information from all the files x,y,z)
def find_get(folder):
    for file in os.listdir(folder):
        f = open(file, 'r')
        for data in open(file, 'r'):
            print data
find_get('filex')
Thanks.

If you just want to print each line:
import glob
import os
def find_get(path):
    # glob.glob already returns each match with path prepended
    for f in glob.glob(os.path.join(path, "*.txt")):
        with open(f) as data:
            for line in data:
                print(line)
glob will find only your .txt files in the specified path, and it returns them with the directory already joined on, so they can be opened directly.
Your error comes from not joining the path to the filename: unless the file is in the directory you run the code from, Python cannot find it without the full path. Another issue is that you seem to have a directory named .idea, which would also give you an error when you try to open it as a file. This also presumes you actually have permission to read the files in the directory.
If your files were larger I would avoid reading all into memory and/or storing the full content.

First of all make sure you add the folder name to the file name, so you can find the file relative to where the script is executed.
To do so you want to use os.path.join, which, as its name suggests, joins paths. So, using a generator:
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield f.read()
# this consumes the generator to a list
files_data = list(find_get('filex'))
See what we got in the list that consumed the generator:
print files_data
It may be more convenient to produce tuples which can be used to construct a dict:
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield (relative_file_path, f.read(), )
# this consumes the generator to a dict
files_data = dict(find_get('filex'))
You will now have a mapping from each file's path to its content.
Also, take a look at the answer by @Padraic Cunningham. He brought up the glob module, which is suitable in this case.

The error you're facing is simple: listdir returns filenames, not full pathnames. To turn them into pathnames you can access from your current working directory, you have to join them to the directory path:
for filename in os.listdir(directory):
    pathname = os.path.join(directory, filename)
    with open(pathname) as f:
        # do stuff
So, in your case, there's a file named .idea in the folder directory, but you're trying to open a file named .idea in the current working directory, and there is no such file.
There are at least four other potential problems with your code that you also need to think about and possibly fix after this one:
You don't handle errors. There are many very common reasons you may not be able to open and read a file--it may be a directory, you may not have read access, it may be exclusively locked, it may have been moved since your listdir, etc. And those aren't logic errors in your code or user errors in specifying the wrong directory, they're part of the normal flow of events, so your code should handle them, not just die. Which means you need a try statement.
You don't do anything with the files but print out every line. Basically, this is like running cat folder/* from the shell. Is that what you want? If not, you have to figure out what you want and write the corresponding code.
You open the same file twice in a row, without closing in between. At best this is wasteful, at worst it will mean your code doesn't run on any system where opens are exclusive by default. (Are there such systems? Unless you know the answer to that is "no", you should assume there are.)
You don't close your files. Sure, the garbage collector will get to them eventually--and if you're using CPython and know how it works, you can even prove the maximum number of open file handles that your code can accumulate is fixed and pretty small. But why rely on that? Just use a with statement, or call close.
However, none of those problems are related to your current error. So, while you have to fix them too, don't expect fixing one of them to make the first problem go away.
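To illustrate the first and last points, here is a minimal sketch that joins the directory onto each name, wraps the open in a try statement, and relies on a with statement to close the file. The function name read_text_files and the demo folder layout are hypothetical, and simply skipping unreadable entries is only one of several reasonable policies.

```python
import os
import tempfile

def read_text_files(directory):
    """Map each readable file name in directory to its contents."""
    contents = {}
    for filename in os.listdir(directory):
        pathname = os.path.join(directory, filename)
        try:
            with open(pathname) as f:
                contents[filename] = f.read()
        except (OSError, UnicodeDecodeError) as exc:
            # Directories, permission problems, vanished files, bad encodings
            print('skipping %s: %s' % (pathname, exc))
    return contents

# Demo: a folder with two text files and a .idea subdirectory
with tempfile.TemporaryDirectory() as folder:
    os.mkdir(os.path.join(folder, '.idea'))
    for name, text in (('x.txt', 'one'), ('y.txt', 'two')):
        with open(os.path.join(folder, name), 'w') as f:
            f.write(text)
    data = read_text_files(folder)

print(sorted(data.items()))
```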

Full variant:
import os
def find_get(path):
    files = {}
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path,file)):
            with open(os.path.join(path,file), "r") as data:
                files[file] = data.read()
    return files
print(find_get("filex"))
Output:
{'1.txt': 'dsad', '2.txt': 'fsdfs'}
After that you could generate one file from that content, etc.
Key points:
os.listdir returns a list of names without the full path, so you need to join the initial path with each found item before operating on it.
A dict is a natural fit for the result :)
os.listdir returns both files and folders, so you need to check that each list item really is a file.

You should check that each entry is actually a file and not a folder, since you can't open folders for reading. Also, you can't open the bare name returned by listdir, since the file lives under the folder, so you should build the correct path with os.path.join and use it for both the check and the open. Check below:
import os
def find_get(folder):
    for file in os.listdir(folder):
        path = os.path.join(folder, file)
        if not os.path.isfile(path):
            continue  # skip directories
        with open(path, 'r') as f:
            for line in f:
                print(line)

Python script errors out

I have this script, which I have no doubt is flawed:
import fnmatch, os, sys
def findit (rootdir, find, pattern):
    for folder, dirs, files in os.walk(rootdir):
        print (folder)
    for filename in fnmatch.filter(files,pattern):
        with open(filename) as f:
            s = f.read()
            f.close()
            if find in s :
                print(filename)
findit(sys.argv[1], sys.argv[2], sys.argv[3])
When I run it I get [Errno 2] No such file or directory, but the file exists. For instance, if I run findit.py c:\python "folder" *.py, it works just fine, listing all the *.py files which contain the word "folder". But if I run findit.py c:\php\projects1 "include" *.php, I get [Errno 2] No such file or directory: 'About.php' (for example), even though About.php exists. I don't understand what it's doing, or what I'm doing wrong.

If you look at any of the examples for os.walk, you'll see that they all do os.path.join(root, name). You need to do that too.
Why? Quoting from the docs:
filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
If you just use the filename as a path, it's going to look for a file of the same name in the current working directory. If there's no such file, you'll get a FileNotFoundError. If there is such a file, you'll open and read the wrong file. Only if you happen to be looking inside the current working directory will it work.
There's also another major problem in your code: os.walk walks a directory tree recursively, finding all files in the given top directory, or any subdirectory of top, or any subdirectory of… and so on, yielding once for each directory. But you're not doing anything useful with that (except printing out the folders). Instead, you wait until it finishes, and then use the files from whichever directory it happened to reach last.
If you just want to get a flat listing of the files directly in a directory, use os.listdir, not os.walk. (Or maybe use glob.glob instead of explicitly listing everything then filtering with fnmatch.)
On the other hand, if you want to walk the tree, you have to move your second for loop inside the first one.
You've also got a minor problem: You call f.close() inside a with open(…) as f:, which leads to f being closed twice. This is guaranteed to be completely harmless (at least in 2.5+, including 3.x), but it's still a bad idea.
Putting it together, here's a working version of your code:
import fnmatch
import os
def findit (rootdir, find, pattern):
    for folder, dirs, files in os.walk(rootdir):
        print (folder)
        for filename in fnmatch.filter(files,pattern):
            # join the directory onto the bare name before opening
            pathname = os.path.join(folder, filename)
            with open(pathname) as f:
                s = f.read()
            if find in s:
                print(pathname)

You are using a bare filename, which is resolved relative to your current directory. But your current directory does not contain the file, and you don't want to search there anyway. Use os.path.join(folder, filename) to build the full path to the file.

Python functions to move all files and folders to a destination folder

I have a txt file in which each line holds the path of a file or folder that I want to gather into one place.
The list is something like this in my list.txt file.
Each entry starts off on a new line.
C:\xxx\xxy
C:\abc\def\ghi.pdf
and my destination folder is c:\users\mr_a\dest
I want to :
1. move the directory xxy and all its files and subfolders to dest
2. move ghi.pdf file to dest.
Do the same for other entries in the list.txt file.
So that my dest directory would look like:
dest\xxy
dest\ghi.pdf
I looked into shutil but am not still sure which function to use.
It says that the destination directory shouldn't already exist, but in my case it does. I'm confused about which methods to use.
Please also mention if the methods you mention are safe (I don't want any nasty cut-n-paste where bits of my files may go missing etc)
What I'm asking is: What methods to use to accomplish what I need to do here?
Edit: And I use Windows, not Linux or any Unix system

import shutil
with open('list.txt') as f:
    for line in f:
        # strip the trailing newline; dest is the destination folder
        shutil.move(line.strip(), dest)

Check out os and os.path. You will find some useful functions like:
os.path.exists - checks whether a path exist (like your destination path)
os.makedirs - creates a directory (including any missing parent directories)
os.path.isdir, os.path.isfile - checks whether the path contains a directory or a file.
os.path.basename - cuts the filename out of a path
os.path.join - joins paths (or a path with a filename)
Here is a code example (I didn't try it); dest is your destination folder:
import os
import shutil
if not os.path.exists(dest):
    os.makedirs(dest)
with open('list.txt', 'r') as f:
    for line in f:
        filepath = line.strip()
        filename = os.path.basename(filepath)
        shutil.move(filepath, os.path.join(dest, filename))
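On the safety question, a cautious variant could refuse to overwrite anything that already exists in the destination and report what it skipped. The sketch below is one way to do that; the helper name move_into and the temporary folders standing in for the real paths are assumptions for the demo. shutil.move renames when source and destination are on the same filesystem and otherwise copies and then removes the source.

```python
import os
import shutil
import tempfile

def move_into(paths, dest):
    """Move each existing path into dest; return the paths that were skipped."""
    os.makedirs(dest, exist_ok=True)
    skipped = []
    for path in paths:
        target = os.path.join(dest, os.path.basename(path))
        # Skip missing sources and never overwrite an existing target
        if not os.path.exists(path) or os.path.exists(target):
            skipped.append(path)
            continue
        shutil.move(path, target)
    return skipped

# Demo: one folder, one file, and one entry that does not exist
with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, 'xxy')
    os.makedirs(os.path.join(folder, 'sub'))
    pdf = os.path.join(root, 'ghi.pdf')
    with open(pdf, 'w') as f:
        f.write('demo')
    dest = os.path.join(root, 'dest')
    skipped = move_into([folder, pdf, os.path.join(root, 'missing')], dest)
    moved = sorted(os.listdir(dest))
    kept_subfolder = os.path.isdir(os.path.join(dest, 'xxy', 'sub'))

print(moved, len(skipped))
```

To drive it from list.txt, the paths would be the stripped, non-empty lines of that file.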
