Issue Replacing Already Existing Strings with ConfigParser - python

I am using ConfigParser to save simple settings to a .ini file, and one of these settings is a directory. Whenever I replace a directory string such as D:/Documents/Data with a shorter one such as D:/, the remaining characters are placed two lines under the option, so the .ini file now looks like this:
[Settings]
directory = D:/
Documents/Data
What am I doing wrong? Here is my code:
import ConfigParser

class Settings():
    def __init__(self):
        self.config = ConfigParser.ConfigParser()

    def SetDirectory(self, dir):  # dir is the directory string
        self.config.readfp(open('settings.ini'))
        self.config.set('Settings', 'directory', dir)
        with open('settings.ini', 'r+') as configfile:
            self.config.write(configfile)

The r+ option (in the open in the with) tells Python to keep the file's previous contents, overwriting only the specific bytes that are written to it and leaving all others alone. Use w to open a file for complete overwriting, which is what you should be doing here. Overwriting just selected bytes inside an existing file is very rarely what you want, particularly for text files, which you're more likely to want to see as a sequence of lines of text rather than bunches of bytes! (It can be useful in very specialized cases, mostly involving large binary files, where the by-byte view may make some sense.)
The "by-line organization" with which we like to view text files is not reflected in the underlying filesystem (on any OS that is currently popular, at least -- back in the dark past some file organizations were meant to mimic packs of punched cards, for example, so each line had to be exactly 80 bytes, no more, no less... but that's a far-off ancient memory, at most, for the vast majority of computer programmers and users today;-).
So, "overwriting part of a file in-place" (where the file contains text lines of different lengths) becomes quite a problem. Should you ever need to do that, btw, consider the fileinput module of the standard Python library, which mimics this often-desired but-running-against-the-filesystem's-grain operation quite competently. But, it wouldn't help you much in this case, where simple total overwriting seems to be exactly right;-).

How do I access a jpg file's tags (from "Properties") in Python?

How do I access the tags attribute here in the Windows File Properties panel?
Are there any modules I can use? Most Google searches yield properties related to media files or file access times, but not much related to metadata properties like Tags, Description, etc.
The exif module was able to access many more properties than most of what I've been able to find, but it still wasn't able to read the 'Tags' property.
The Description -> Tags property is what I want to read from and write to a file.
There's an entire module dedicated to exactly what I wanted: IPTCInfo3.
import os
import random
import string
import sys

import iptcinfo3

# Random string generator
rnd = lambda length=3: ''.join(random.choices(string.ascii_letters, k=length))

# Path to the file; open an IPTCInfo object on it
path = os.path.join(sys.path[0], 'DSC_7960.jpg')
info = iptcinfo3.IPTCInfo(path)

# Show the keywords
print(info['keywords'])

# Add a keyword and save
info['keywords'] = [rnd()]
info.save()

# Remove the weird ghost file created after saving
os.remove(path + '~')
I'm not particularly sure what the ghost file is or does; it looks to be an exact copy of the original, since the file size stays the same. Regardless, I remove it, since it's completely useless for the metadata reading and writing I need.
I've noticed some weird behaviour while setting the keywords: some get swallowed up into the file (the file size changes, so I know they're there, but Windows doesn't acknowledge them), and only after manually deleting the keywords do they suddenly reappear. Very strange.

Most efficient way to check if a string contains any file format?

I have a .txt with hundreds of thousands of paths and I simply have to check whether each line is a folder or a file. The hard drive is not with me, so I can't use the os module's os.path.isdir() function. I've tried the code below, but it is just not reliable, since some folder names contain a . at the end.
for row in files:
    if row[-6:].find(".") < 0:
        folders_count += 1
It is just not worth testing whether the end of the string contains a known file format (.zip, .pdf, .doc, ...), since there are dozens of different file formats on this HD. When my code reads the .txt, it stores each line as a string inside an array, so my code should work with the string format.
An example of a folder path:
'path1/path2/truckMV.34'
An example of a file path:
'path1/path2/certificates.pdf'
It's impossible for us to judge whether it's a file or a folder just from the string, since an extension is just an arbitrary, agreed-upon string that programs choose to decode in a certain way.
Having said that, if I had the same problem I would do my best to estimate with the following pseudo code:
Create a hash map (or a dictionary, as you are in Python).
For every line of the file, read the last path component and see if it contains a ".".
If it does, create a key for that "possible extension" in the hash map, with a counter of how many times you have encountered it.
After you go through the whole list you will have a collection of possible extensions and how many times each was encountered. Assume the ones with only 1 occurrence (or any other low, arbitrary number) to be a path and not an extension, as in the sketch below.
The basis of this heuristic is that it's unlikely for a person to have a lot of unique extensions on their desktop - but that's just an assumption I came up with.
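A rough Python sketch of that heuristic (the function name and the threshold default are illustrative, not from the original post):

from collections import Counter

def count_folders(paths, min_occurrences=2):
    # First pass: count how often each candidate "extension" appears.
    ext_counts = Counter()
    for path in paths:
        name = path.rstrip('/').rsplit('/', 1)[-1]
        if '.' in name:
            ext_counts[name.rsplit('.', 1)[-1].lower()] += 1

    # Second pass: no dot means folder; a rare "extension" is assumed to be
    # a folder name that merely contains a dot (e.g. truckMV.34).
    folders = 0
    for path in paths:
        name = path.rstrip('/').rsplit('/', 1)[-1]
        if '.' not in name:
            folders += 1
        elif ext_counts[name.rsplit('.', 1)[-1].lower()] < min_occurrences:
            folders += 1
    return folders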

How to have multiple programs access the same file without manually giving them all the file path?

I'm writing several related Python programs that need to access the same file; however, this file will be updated/replaced intermittently, and I need them all to access the new file. My current idea is to have a specific folder where the latest file is placed whenever it needs to be replaced, and I was curious how I could have Python select whatever text file is in that folder.
Or would I be better off creating a program with a class entirely dedicated to holding the information about the file, and having each program reference the file through that class? I could have the class use tkinter.filedialog to select a new file whenever necessary, and perhaps keep a text file holding the path or name of the file I need to access, and have the other programs reference that.
Edit: I don't need to write to the file at all, just read from it. However, I would like it so that I do not need to manually update the file path every time I run the program or the file is replaced.
Edit2: Changed title to suit the question more
If the requirement is to get the most recently modified file in a specific directory:
import os

mypath = r'C:\path\to\wherever'
myfiles = [(f, os.stat(os.path.join(mypath, f)).st_mtime) for f in os.listdir(mypath)]
mysortedfiles = sorted(myfiles, key=lambda x: x[1], reverse=True)
print('Most recently updated: %s' % mysortedfiles[0][0])
Basically, get a list of files in the directory, together with their modified time as a list of tuples, sort on modified date, then get the one you want.
It sounds like you're looking for a singleton pattern, which is a neat way of hiding a lot of logic into an 'only one instance' object.
This means the logic for identifying, retrieving, and delivering the file is all in one place, and your programs interact with it by saying 'give me the one instance of that thing'. If you need to alter how it identifies, retrieves, or delivers what that one thing is, you can keep that hidden.
It's worth noting that the singleton pattern can be considered an antipattern, since it is a form of global state; whether that is a deal breaker depends on the context of the program.
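As an illustration only (the class name and folder are made up, not part of any of the programs described), a minimal singleton that each program can import to find the current file:

import glob
import os

class SharedFile(object):
    _instance = None

    def __new__(cls):
        # Always hand back the one shared instance.
        if cls._instance is None:
            cls._instance = super(SharedFile, cls).__new__(cls)
        return cls._instance

    def current_path(self, folder='shared_data'):
        # The most recently modified file in the folder is "the" file.
        return max(glob.glob(os.path.join(folder, '*')), key=os.path.getmtime)

Every program then calls open(SharedFile().current_path()).read(), and the logic for identifying and retrieving the file stays in one place.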
To "have python select whatever text file is in the folder", you could use the glob library to get a list of file(s) in the directory, see: https://docs.python.org/2/library/glob.html
You can also use os.listdir() to list all of the files in a directory, without matching pattern names.
Then, open() and read() whatever file or files you find in that directory.
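For instance (the folder name here is just a placeholder):

import glob

for name in glob.glob('watched_folder/*.txt'):
    with open(name) as f:
        data = f.read()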

Python/Linux - unzip file while reading it

I have hundreds of CSV files zipped. This is great because they take very little space, but when it is time to use them, I have to make some space on my HD and unzip them before I can process them. I was wondering whether it is possible with Python (or the Linux command line) to unzip a file while reading it. In other words, I would like to open a zip file, start to decompress the file and, as we go, process it.
So there would be no need for extra space on my drive. Any ideas or suggestions?
Python, since version 1.6, provides the zipfile module to handle exactly this kind of situation. An example usage:
import csv
import zipfile

with zipfile.ZipFile('myarchive.zip') as archive:
    with archive.open('the_zipped_file.csv') as fin:
        reader = csv.reader(fin, ...)
        for record in reader:
            pass  # process record
Note that in Python 3 things get a bit more complicated, because the file-like object returned by archive.open yields bytes, while csv.reader wants strings. You can write a simple class that does the conversion from bytes to strings using a given encoding:
class EncodingConverter:
    def __init__(self, fobj, encoding):
        self._iter_fobj = iter(fobj)
        self._encoding = encoding

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter_fobj).decode(self._encoding)
and use it like:
import csv
import zipfile

with zipfile.ZipFile('myarchive.zip') as archive:
    with archive.open('the_zipped_file.csv') as fin:
        reader = csv.reader(EncodingConverter(fin, 'utf-8'), ...)
        for record in reader:
            pass  # process record
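In Python 3 the standard library's io.TextIOWrapper does the same bytes-to-text decoding, so the custom class can be skipped:

import csv
import io
import zipfile

with zipfile.ZipFile('myarchive.zip') as archive:
    with archive.open('the_zipped_file.csv') as fin:
        reader = csv.reader(io.TextIOWrapper(fin, encoding='utf-8'))
        for record in reader:
            pass  # process record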
While it's very possible to open ZIP files in Python, it is also possible to handle this operation transparently with a filesystem extension. Whether that is preferable depends on various factors, including system access and solution portability.
See Fuse-Zip:
With fuse-zip you really can work with ZIP archives as real directories. Unlike KIO or Gnome VFS, it can be used in any application without modifications.
Or AVFS: A Virtual File System:
AVFS is a system, which enables all programs to look inside gzip, tar, zip, etc. files or view remote (ftp, http, dav, etc.) files, without recompiling the programs.
Note that these solutions are system-specific and rely on FUSE. There might be similar transparent solutions for Windows - but that would require another investigation for the specific system.

Check if an open file has been deleted after open in python

Is it possible to check whether a file has been deleted or recreated in Python?
For example, if you do open("file") in the script and then, while that file is still open, run rm file; touch file;, the script will still hold a reference to the old file even though it has already been deleted.
You should fstat the file descriptor for the opened file.
>>> import os
>>> f = open("testdv.py")
>>> os.fstat(f.fileno())
posix.stat_result(st_mode=33188, st_ino=1508053, st_dev=65027L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=1107, st_atime=1349180541, st_mtime=1349180540, st_ctime=1349180540)
>>> os.fstat(f.fileno()).st_nlink
1
Ok, this file has one link, so one name in the filesystem. Now remove it:
>>> os.unlink("testdv.py")
>>> os.fstat(f.fileno()).st_nlink
0
No more links, so we have an "anonymous file" that's only kept alive as long as we have it open. Creating a new file with the same name has no effect on the old file:
>>> g = open("testdv.py", "w")
>>> os.fstat(g.fileno()).st_nlink
1
>>> os.fstat(f.fileno()).st_nlink
0
Of course, st_nlink can sometimes be >1 initially, so checking that for zero is not entirely reliable (though in a controlled setting, it might be good enough). Instead, you can verify whether the file at the path you initially opened is the same one that you have a file descriptor for by comparing stat results:
>>> os.stat("testdv.py") == os.fstat(f.fileno())
False
>>> os.stat("testdv.py") == os.fstat(g.fileno())
True
(And if you want this to be 100% correct, then you should compare only the st_dev and st_ino fields of the stat results, since the other fields, st_atime in particular, might change between the calls.)
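Put together, a small helper along those lines (the function name is illustrative):

import os

def same_file(f, path):
    # True if `path` still names the file that `f` has open.
    try:
        st = os.stat(path)
    except OSError:  # the path no longer exists at all
        return False
    fst = os.fstat(f.fileno())
    return (st.st_dev, st.st_ino) == (fst.st_dev, fst.st_ino)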
Yes. Call os.stat() on the file's path: if it raises an OSError ("File not found"), then someone deleted the file. (Checking whether the length is zero is not a reliable test on its own, since an existing file can legitimately be empty.)
Alternatively, you can open+write+close the file each time you need to write something into it. The drawback is that opening a file is a pretty slow operation, so this is out of the question if you need to write a lot of data.
Why? Because the new file isn't the file that you're holding open. In a nutshell, Unix filesystems have two levels. One is the directory entry (the file name, which points at an inode), and the second level is the inode with the file data itself (size, modification time, the contents).
When you open a file, Unix uses the name to find the file data. After that, it operates only on the second level - changes to the directory entry have no effect on any open "file handles". Which is exactly why you can delete the directory entry: Your program isn't using it.
When you use os.stat(), you don't look at the file data but at the directory entry again.
On the positive side, this allows you to create files which no one can see but your program: Open the file, delete it and then use it. Since there is no directory entry for the file, no other program can access the data.
On the negative side, you can't easily solve problems like the one you have.
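For example, a sketch of that "invisible file" trick (the file name is a placeholder):

import os

f = open('scratch.tmp', 'w+')
os.unlink('scratch.tmp')  # remove the directory entry; the data survives
f.write('only this program can see this')
f.seek(0)
print(f.read())
f.close()  # the disk space is reclaimed here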
Yes -- you can use the inotify facility to check for file changes and more. There is also a Python binding for it. Using inotify you can watch files or directories for filesystem activity. From the manual, the following events can be detected:
IN_ACCESS File was accessed (read) (*).
IN_ATTRIB Metadata changed, e.g., permissions, timestamps, extended attributes, link count (since Linux 2.6.25), UID, GID, etc. (*).
IN_CLOSE_WRITE File opened for writing was closed (*).
IN_CLOSE_NOWRITE File not opened for writing was closed (*).
IN_CREATE File/directory created in watched directory (*).
IN_DELETE File/directory deleted from watched directory (*).
IN_DELETE_SELF Watched file/directory was itself deleted.
IN_MODIFY File was modified (*).
IN_MOVE_SELF Watched file/directory was itself moved.
IN_MOVED_FROM File moved out of watched directory (*).
IN_MOVED_TO File moved into watched directory (*).
IN_OPEN File was opened (*).
From here you can search out a solution yourself, but I think you get the overall idea. Of course, this may only work on Linux, but from your question (the references to rm and touch) I assume that is what you are using.
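One such binding is pyinotify; here is a brief sketch of watching a file for deletion and modification (assuming pip install pyinotify, with 'file' as a placeholder for the watched path):

import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_DELETE_SELF(self, event):
        print('watched file was deleted:', event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch('file', pyinotify.IN_DELETE_SELF | pyinotify.IN_MODIFY)
pyinotify.Notifier(wm, Handler()).loop()  # blocks, dispatching events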
