I'm having problems memory-mapping a file and was hoping to resolve this issue. I've detailed the problem and shown my code below.
What I'm importing:
import os
import mmap
Now, for the code:
file = r'otest'  # our file name
if os.path.isfile(file):  # test to see if the file exists
    os.remove(file)  # if it does, delete it
f = open(file, 'wb')  # creates our empty file to write to
print(os.getcwd())
Here is where I encounter the problem with my code (I've included both, and have one commented out each time I run the program):
mfile = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
#mfile = mmap.mmap(f.fileno(), 10**7, access=mmap.ACCESS_WRITE)
I'm encountering an error with either of the mfile lines.
For the mmap.mmap line with the 0 argument I get this error: ValueError: cannot mmap an empty file. If I instead use the 10**7 argument I get this error instead: PermissionError: [WinError 5] Access is denied
And to end it:
"""
Other stuff goes here
"""
f.close() # close out the file
The 'other stuff goes here' is just a placeholder for where I'm going to put more code to do things.
Just to add, I've found this thread, which I thought might help, but neither ftruncate nor os.truncate seemed to fix the issue.
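A small sketch reproducing both symptoms, assuming Python 3: a zero-length file cannot be mapped, and once the file is given a size through a handle that is open for reading and writing, the mapping succeeds. tempfile.TemporaryFile() (which opens in "w+b" mode) stands in for the question's otest file, and the 1024-byte size is arbitrary. A likely reason truncating did not help in the question is that the file was opened with "wb", which gives a write-only handle.

```python
import mmap
import tempfile

# TemporaryFile() defaults to mode "w+b": readable AND writable
with tempfile.TemporaryFile() as f:
    try:
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
    except ValueError as exc:
        empty_error = str(exc)  # a zero-length file cannot be mapped
    f.truncate(1024)  # extend the file to 1024 bytes first
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
    mapped_size = len(m)  # length 0 means "map the whole file"
    m.close()
print(empty_error, mapped_size)
```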
As the thread you linked shows, mmap cannot map an empty file: you have to create the file with a non-zero size first and then modify it through the mapping.
So first, create an empty file doing something like:
f = open(FILENAME, "wb")
f.write(FILESIZE*b'\0')
f.close()
Then you will be able to open the file and map it using:
f = open(FILENAME, "r+b")
mapf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
Notice that the file is reopened in "r+b" (read/write) mode rather than "wb". Remember that you can flush your buffer with (more details in "Usage of sys.stdout.flush() method"):
import sys
sys.stdout.flush()
Let me know if you want me to go into detail about any of these points.
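Putting the answer's steps together, here is a minimal end-to-end sketch assuming Python 3. The path demo_mmap.bin in the temp directory and the 16-byte size are hypothetical stand-ins for FILENAME and FILESIZE. It uses the mapping's own flush() method, which writes modified pages back to the file (sys.stdout.flush() only affects standard output).

```python
import mmap
import os
import tempfile

FILENAME = os.path.join(tempfile.gettempdir(), "demo_mmap.bin")  # hypothetical path
FILESIZE = 16  # hypothetical size in bytes

# Step 1: create the file with a non-zero size (mmap cannot map an empty file)
with open(FILENAME, "wb") as f:
    f.write(FILESIZE * b"\0")

# Step 2: reopen for reading and writing, then map the whole file (length 0)
with open(FILENAME, "r+b") as f:
    mapf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
    mapf[0:5] = b"hello"  # slice assignment must keep the same length
    mapf.flush()          # write the modified pages back to the file
    mapf.close()

with open(FILENAME, "rb") as f:
    data = f.read()
os.remove(FILENAME)
print(data)
```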
I found this solution on Stack Overflow, but will I have to add this piece of code every time I write code that performs read/write operations?
Or is there a long-term solution available for this?
import os
path = r"E:\Python\learn"  # raw string so the backslashes are kept literally
os.chdir(path)
f = open('text.txt')
data = f.read()
print(data)
f.close()
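One long-term alternative to calling os.chdir() in every script is to build the full path and open that directly. A minimal sketch using pathlib, assuming Python 3; the helper name read_text_in and the temporary directory standing in for E:\Python\learn are hypothetical. In a script, Path(__file__).resolve().parent is a common choice of base directory so that files next to the script are found regardless of the current working directory.

```python
import tempfile
from pathlib import Path

def read_text_in(base, name):
    """Read the file `name` inside directory `base`; os.chdir() is never called."""
    return (Path(base) / name).read_text()

# A throwaway directory stands in for E:\Python\learn in this demo
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "text.txt").write_text("hello\n")
    data = read_text_in(tmp, "text.txt")
print(data)
```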
I think you need a way to read the same file multiple times in your Python code. Use file.seek() to jump to a specific position in a file. However, think about whether it is really necessary to go through the file again. An example of file.seek():
with open(name, 'r+') as file:
    for line in file:
        pass  # Do something with line
    file.seek(0)  # rewind to the beginning of the file
    for line in file:
        pass  # Do something with line, second time
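Without file.seek(0), the second loop would do nothing, because the file iterator is already exhausted. In case you want to iterate over the contents a second time without rewinding or reopening the file, read the lines into a list first:
with open(name, 'r+') as file:
    lines = file.readlines()
for line in lines:
    pass  # Do something with line
for line in lines:
    pass  # Do something with line, second time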
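A runnable check of the rewinding behaviour, assuming Python 3; the three-line temporary file is made up for the demo. A second pass without seek(0) reads nothing because the file iterator is already at the end of the file.

```python
import os
import tempfile

# Create a small three-line file for the demo
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("a\nb\nc\n")

with open(path, "r") as fh:
    first = sum(1 for _ in fh)      # first pass reads all lines
    exhausted = sum(1 for _ in fh)  # iterator is already at EOF
    fh.seek(0)                      # rewind to the start of the file
    second = sum(1 for _ in fh)     # all lines are visible again
os.remove(path)
print(first, exhausted, second)  # 3 0 3
```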
I'm trying to make a script work under Windows, but I seem to be running in circles. It works on POSIX filesystems. In the code below, I'm using the tempfile library in a with statement to create a temporary file and copy the source into it. Then I want to use the function strip_whitespaces_from_file() on this temporary file, but instead I get the permission error below.
import tempfile
import shutil
import fileinput
source_file_name = r"C:\Users\morph\Stuff\test.txt"
def strip_whitespaces_from_file(input_file_path: str):
    with fileinput.FileInput(files=(input_file_path,), inplace=True) as fp:
        for line in fp:
            print(line.strip())
with tempfile.NamedTemporaryFile(delete=False) as dest_file:
    shutil.copy(source_file_name, dest_file.name)
    strip_whitespaces_from_file(dest_file.name)
Instead I get this output:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process:
'C:\\Users\\morph\\AppData\\Local\\Temp\\tmpbvywadw7' -> 'C:\\Users\\morph\\AppData\\Local\\Temp\\tmpbvywadw7.bak'
The error message seems quite straightforward, but I can't find a way around it. There are a couple of answers, e.g. this one, but they would mean I have to close the file before I work on it. Apparently the temporary file is held open by the same process that wants to write to it? Isn't that the whole point? I'm confused.
Edit: Below is my clunky workaround, but it illustrates the need for delete=False. If I remove it, I get a FileNotFoundError. To me that looks like the file is removed immediately after it is created in the first line of the code below. The manual closing and os.unlink also shouldn't be needed.
import os
import tempfile
with tempfile.NamedTemporaryFile(delete=False) as temp:
    with open(source_file_name, 'r') as src_file:
        for line in src_file:
            stripped_line = line.strip() + "\n"
            temp.write(stripped_line.encode('utf-8'))
    src_file.close()
    temp.close()
    # here would be an external converter using the temp file as argument
    os.unlink(temp.name)
print(os.path.exists(temp.name))
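One way around WinError 32, as a sketch assuming Python 3: close the NamedTemporaryFile handle right away (keeping the file on disk with delete=False), then copy, edit in place, and unlink it yourself. As the error message shows, fileinput's inplace mode renames the file to a .bak backup, and on Windows a file that is still open generally cannot be renamed. The helper stripped_copy and the demo file contents are hypothetical. On Python 3.12 and later, NamedTemporaryFile(delete_on_close=False) is another option: the file survives close() and is removed when the with block exits.

```python
import fileinput
import os
import shutil
import tempfile

def strip_whitespaces_from_file(input_file_path):
    # inplace=True redirects print() output into the file being edited
    with fileinput.FileInput(files=(input_file_path,), inplace=True) as fp:
        for line in fp:
            print(line.strip())

def stripped_copy(source_file_name):
    """Hypothetical helper: copy the source to a temp file, strip it, return its path."""
    # delete=False keeps the file on disk after its handle is closed
    with tempfile.NamedTemporaryFile(delete=False) as dest_file:
        dest_name = dest_file.name
    # No handle in this process holds the file open any more,
    # so fileinput is free to rename it to a .bak backup
    shutil.copy(source_file_name, dest_name)
    strip_whitespaces_from_file(dest_name)
    return dest_name

# Demo: a made-up source file with stray whitespace
fd, src = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("  a  \n\tb\n")
tmp_name = stripped_copy(src)
try:
    with open(tmp_name) as fh:
        result = fh.read()
finally:
    os.unlink(tmp_name)  # with delete=False, removing the file is our job
    os.unlink(src)
print(repr(result))  # 'a\nb\n'
```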
I am trying to loop over the lines of a text file which is verifiably non-empty and I am running into problems with my script. In my attempt to debug what I wrote, I figured I would make sure my script is properly reading from the file, so I am currently trying to print every line in it.
At first I tried the usual way of doing this in Python, i.e.:
with open('file.txt') as fo:
    for line in fo:
        print(line)
but my script is not printing anything. I then tried storing all of the lines in a list like so:
with open('file.txt') as fo:
    flines = fo.readlines()
print(flines)
and yet my program still outputs an empty list (i.e. []). I have also tried making sure that my file pointer is pointing to the beginning of the file using fo.seek(0) before attempting to read from it, yet that also does not work.
I have spent some time reading solutions to similar questions posted on here, but so far nothing I have tried has worked. I do not know how such an elementary I/O operation is giving me so much trouble, but I must be missing something basic and would thus really appreciate any help/suggestions.
EDIT: Here is the part of my script which is causing the problem:
import subprocess as sbp
with open('conf_15000.xyz', 'w') as fo:
    p1 = sbp.Popen(['head', '-n', '300000', 'nnp-pos-1.xyz'], stdout=sbp.PIPE)
    p2 = sbp.Popen(['tail', '-n', '198'], stdin=p1.stdout, stdout=fo)
with open('conf_15000.xyz', 'r') as fp:
    fp.seek(0)
    flines = fp.readlines()
    print(flines)
And here is an excerpt from the nnp-pos-1.xyz file (all lines have the same format and there are 370642 of them in total):
Ti 32.9136715924 28.5387609200 24.6554922872
O 39.9997000300 35.1489480846 22.8396092714
O 33.7314699265 30.3398473499 23.8866085372
Ti 27.7756767925 31.3455930970 25.9779887743
O 31.1520937719 29.0752315770 25.4786577758
O 26.1870965535 32.4876155555 26.3346205619
Ti 38.4478275543 25.5734609650 22.0654953429
O 24.1328940232 31.3858060129 28.8575469919
O 38.6506317714 27.3779871011 22.6552032123
Ti 40.5617501289 27.5095900385 22.8436684314
O 38.2400600469 29.1828342919 20.7853056680
O 38.8481088254 27.2704154737 26.9590081202
When running the script, the file being read from (conf_15000.xyz) gets written to properly, however I cannot seem to be able to read from it at runtime.
EDIT-2: Following sudonym's recommendation I am using the absolute file path and am checking whether or not the file is empty before reading from it by adding the following unindented lines between the two with statements I wrote in my previous edit:
print(os.path.isfile(r'full/path/to/file'))
print(os.stat(r'full/path/to/file').st_size != 0)
The first boolean evaluates to True (meaning the file exists) while the second evaluates to False (meaning the file is empty). This is very strange because both of these lines are added after I close the file pointer fo which writes to the file and also because the file being written to (and subsequently read from with fp) is not empty after I execute the script (in fact, it contains all the lines it is supposed to).
EDIT-3: It turns out my script saw the file it needed to read as empty because it did not wait for the subprocess that writes to it (p2 in the example above) to finish. The lines after my first with statement therefore ran while p2 was still writing, i.e. before the file was done being written to. The fix was to add the statement p2.wait() at the end of the first with statement like so:
import subprocess as sbp
with open('conf_15000.xyz', 'w') as fo:
    p1 = sbp.Popen(['head', '-n', '300000', 'nnp-pos-1.xyz'], stdout=sbp.PIPE)
    p2 = sbp.Popen(['tail', '-n', '198'], stdin=p1.stdout, stdout=fo)
    p2.wait()  # block until tail has finished writing to the file
with open('conf_15000.xyz', 'r') as fp:
    fp.seek(0)
    flines = fp.readlines()
    print(flines)
Now everything works the way it is supposed to.
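A portable sketch of the same race and its fix, assuming Python 3. Instead of head and tail, the child here is a hypothetical one-liner run with sys.executable that sleeps briefly and then prints two lines into the file; without proc.wait(), the parent would often read the file before the child has written anything.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child: pause briefly, then write two lines to its stdout
child_code = "import time; time.sleep(0.2); print('line 1'); print('line 2')"

fd, path = tempfile.mkstemp(text=True)
os.close(fd)
with open(path, "w") as fo:
    proc = subprocess.Popen([sys.executable, "-c", child_code], stdout=fo)
    proc.wait()  # block until the child has finished writing
with open(path, "r") as fp:
    flines = fp.readlines()
os.remove(path)
print(flines)  # ['line 1\n', 'line 2\n']
```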
You probably need to flush() the buffers first (and maybe call os.fsync() too), after writing and before reading.
See file.flush() and this post.
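A minimal sketch of the flush-then-read pattern, assuming Python 3 and a made-up temporary file. It applies when the writer is a Python file object in your own process; flushing your own file object does not wait for a separate child process to finish writing.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(text=True)
os.close(fd)
out = open(path, "w")
out.write("data\n")
out.flush()             # move Python's buffered text into the OS
os.fsync(out.fileno())  # optionally ask the OS to commit it to disk
with open(path, "r") as reader:
    seen = reader.read()  # a second handle now sees the written data
out.close()
os.remove(path)
print(repr(seen))  # 'data\n'
```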
First, use the absolute path.
Second, check if the file actually exists and is not empty:
import os
FILEPATH = r'path\to\file.txt'  # full path as raw string
if os.path.isfile(FILEPATH) and os.stat(FILEPATH).st_size != 0:
    with open(FILEPATH) as fo:
        flines = fo.readlines()
    print(flines)
else:
    print("{} doesn't exist or is empty".format(FILEPATH))
I have a few hundred big files (big in terms of number of lines).
I am trying to write code that uses a loop.
First, the loop reads a big file in the folder;
second, it makes a folder with the same name as the file it is reading;
and lastly, it slices the file into pieces inside that newly created folder.
This loop should iterate over all the big files present in the folder.
My code is as follows:
import glob
import os
os.chdir("/test code/")
lines_per_file = 106
sf = None
for file in glob.glob("*.TAB"):
    with open(file) as bigfile:
        for lineno, line in enumerate(bigfile):
            if lineno % lines_per_file == 0:
                if sf:
                    sf.close()
                    sf_filename = '/test code/201511_sst/sf_{}.txt'.format(lineno + lines_per_file)
                    sf = open(sf_filename, "w")
                    sf.write(line)
if sf:
    sf.close()
I am getting the output as follow:
In [35]: runfile('/test code/file_loop_16Jan.py', wdir='/test code')
In [36]:
I need a little guidance on looping over the files so that I can achieve this. I think getting no error means I am missing something!
Can anyone please help me out?
sf is set to None at the start, so you never enter the if sf: block, and no output file is ever written anywhere.
Besides, after closing a file, set sf back to None; otherwise sf still refers to the closed file object, and writing to it raises "ValueError: I/O operation on closed file".
But that alone won't do what you want. You want to split the file, so do this:
for lineno, line in enumerate(bigfile):
    if lineno % lines_per_file == 0:
        # new file: close the previous file, if any
        if sf:
            sf.close()
        # open the new file
        sf_filename = '/test code/201511_sst/sf_{}.txt'.format(lineno + lines_per_file)
        sf = open(sf_filename, "w")
    # write the line to the current file handle
    sf.write(line)
The first if is encountered at the start: good. Since sf is None, it doesn't call close (which is what we want).
It then opens the file with the new filename.
Now the line is written to the new file handle (you have to write one line on each iteration, not only when the modulo matches).
On the next iterations, when the modulo matches, the previous file is closed and a new handle is created with a new filename.
Don't forget to close the last file handle when exiting the loop:
if sf:
sf.close()
I haven't tested it, but the logic is there. Comment if you have further issues and I'll edit my post.
Aside: another problem is that if there is more than one big *.TAB file, the split files will be overwritten. To avoid that, I would, for instance, add the input file's basename to the output filename (lineno is reset for each input file):
sf_filename = '/test code/201511_sst/{}_sf_{}.txt'.format(os.path.splitext(os.path.basename(file))[0], lineno + lines_per_file)
Alternatively, you can keep the numbering going across files by storing the last lineno and computing a line offset. It's up to you.
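Putting the answer's fixes together with the original goal (one output folder per input file, named after that file), here is a self-contained sketch assuming Python 3. The helper names split_file and split_all, the {basename}_sf_{n}.txt naming, and placing each folder next to its input file are assumptions for illustration.

```python
import glob
import os
import tempfile

def split_file(path, out_dir, lines_per_file):
    """Write consecutive chunks of `lines_per_file` lines from `path` into `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    written = []
    sf = None
    try:
        with open(path) as bigfile:
            for lineno, line in enumerate(bigfile):
                if lineno % lines_per_file == 0:
                    if sf:
                        sf.close()  # finish the previous chunk
                    name = "{}_sf_{}.txt".format(base, lineno + lines_per_file)
                    sf = open(os.path.join(out_dir, name), "w")
                    written.append(name)
                sf.write(line)  # every line goes to the current chunk
    finally:
        if sf:
            sf.close()  # close the last chunk even if an error occurred
    return written

def split_all(src_dir, pattern, lines_per_file):
    """Split every file matching `pattern` into a folder named after that file."""
    results = {}
    for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        folder = os.path.join(src_dir, os.path.splitext(os.path.basename(path))[0])
        results[os.path.basename(path)] = split_file(path, folder, lines_per_file)
    return results

# Demo: one made-up 5-line .TAB file split into chunks of 2 lines
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "A.TAB"), "w") as f:
        f.writelines("row {}\n".format(i) for i in range(5))
    results = split_all(tmp, "*.TAB", 2)
    with open(os.path.join(tmp, "A", "A_sf_6.txt")) as f:
        last_chunk = f.read()
print(results, repr(last_chunk))
```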
Since you're already using a with statement for reading files, you can use the same for writing to files, so you don't need to close the file object explicitly. See these links:
https://docs.python.org/2/reference/compound_stmts.html#with
https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
You can simply do this:
with open(file, "w") as sf:
    pass  # read/write file content and do your stuff here
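A sketch of the splitting task done with one with block per output file, so no handle ever needs an explicit close(), assuming Python 3. It reads fixed-size chunks with itertools.islice; the helper name split_with_with and the out_{n}.txt naming are hypothetical.

```python
import os
import tempfile
from itertools import islice

def split_with_with(path, out_dir, lines_per_file):
    """Split `path` into files of `lines_per_file` lines; return how many were written."""
    count = 0
    with open(path) as bigfile:
        while True:
            chunk = list(islice(bigfile, lines_per_file))  # next block of lines
            if not chunk:
                break  # end of the input file
            name = os.path.join(out_dir, "out_{}.txt".format(count))
            with open(name, "w") as sf:  # closed automatically at block exit
                sf.writelines(chunk)
            count += 1
    return count

# Demo: a made-up 5-line input split into chunks of 2 lines
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "big.txt")
    with open(src, "w") as f:
        f.writelines("row {}\n".format(i) for i in range(5))
    n_files = split_with_with(src, tmp, 2)
    with open(os.path.join(tmp, "out_2.txt")) as f:
        last_chunk = f.read()
print(n_files, repr(last_chunk))  # 3 'row 4\n'
```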
While doing some Python tutorials, the author of the book asked for the following (opening a file and reading its content):
#we could do this in one line, how?
in_file = open(from_file)
indata = in_file.read()
How can I do this in one line?
You can get all the contents of the file in one line by simply calling read() on the file object that open() returns for your path.
indata = open(from_file).read()
However, it's more obvious what's happening and easier to extend if you use a with block.
with open(from_file) as fh:  # start a file handler that will close itself!
    indata = fh.read()  # get all of the contents of the file in one go as a string
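On Python 3.5 and later, pathlib also offers a one-liner that closes the file for you: Path.read_text() opens, reads, and closes in a single call. The temporary file below is a made-up stand-in for the tutorial's from_file.

```python
import os
import tempfile
from pathlib import Path

# Made-up input file standing in for the tutorial's from_file
fd, from_file = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("some text\n")

indata = Path(from_file).read_text()  # open, read, and close in one line
os.remove(from_file)
print(repr(indata))  # 'some text\n'
```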
Beyond that, you should protect your file opening and closing from IOErrors if (for example) your file path does not exist.
Finally, files are opened read-only by default and raise an error if you (or someone else later) attempt to write to them, which safeguards the data. You can change the mode from 'r' to a variety of other options depending on your needs.
Here is a fairly complete example with the above concepts.
def get_file_contents(input_path):
    try:
        with open(input_path, 'r') as fh:
            return fh.read().strip()
    except IOError:
        return None