Python 3+ How to edit a line in a text file - python

This has been asked, and it has been answered. Though, the answers do not favor my situation or what I'm trying to achieve.
I have 100's of text files I need to update. I need to update the same one line in each file. I want to open up a text file and only modify a single line. I do not want to re-write the text file using write() or writelines(). I also do not want to use fileinput.input() because that too is re-writing the text file using print statements.
The files contain thousands of lines of critical data and I cannot trust that Python will recreate everything correctly in the files (I understand this will probably never happen). There are several lines that pertain to footer data which is non-critical and that's what I am updating.
How can one update a single line in a text file, without recreating the file. Appending one line in a text must be possible.
Thanks in advance

I don't think this is possible in any programming language at the file level. Files simply don't work that way -- especially text-based files which you are likely replacing with different-length data in the middle. You would need raw disk level access (making this a stupidly difficult problem).
If you really want to pursue it, check the raw disk question here:
Is it possible to get writing access to raw devices using python with windows?
EVEN THEN: I'm pretty sure at some level AT LEAST an entire block of data will be read from the drive and re-written (this is physically how drives work if I recall).

Related

How to open a text file which has more than 500k lines, without using any iteration?

I am working with text files and looping over them, python works well with files of 10k to 20k lines, most of them are of that length, few text files are over 100k lines, where the code just stops or just keeps buffering, how can we improve the speed or open the text file directly, even if there has to be any iteration, it should be pretty quick, and I want my text file opened in a string format, so no readlines.
I'm confused as to your "without iteration" parameter. You're already looping over the files and using a simple open or some other method, so what is it that you're wanting to change? As you didn't post your code at all there's nothing to work from to understand what might be happening or to suggest changes to.
Also, it sounds like you're hitting system limits, not a limit of python itself. For a question like this it would be worthwhile to give your system parameters alongside the code so that someone can get a full picture when responding.
Typically I just do something similar to the good ol' standby that won't destroy your memory:
fhand = open('file.extension')
for line in fhand:
# do the thing you need to do with each line
You can see a more detailed explanation here or in Dr Chuck's handy free textbook under the "Files" section.

How can I control when (or if) file writes are committed to disk?

I have a file that's a big binary blob, and I'm writing an editor for that file. I want to make changes, but don't want them written to disk right away; I want the option of throwing them away without saving them.
What's a good way to do this?
Some things I thought of:
Slurp the whole file into memory and edit it there. (What if the file is huge though?)
Make a temp file and edit that. (Same problem)
Make a file-like object that writes to an internal data structure that sits "in front of" the file.
#3 sounds like it would work. It also sounds like something someone has probably already done, and I just don't know the right search terms to plug into google.
I'm working in Python 3.

How do I exit without saving a file in Python

I am new to python and have until now written only a few programs to help with my job (I'm a sysadmin). I am writing this script now which will write the output of a MySQL query to a file. While in-between looping, I want to check for an extra condition and if the condition does not match, I want to close the file that I am writing to without saving what it has already written to the file. Like 'exit without saving'. I wrote this simple code to see if not closing the file with a close() will exit without saving, but it is creating the file with the content after I run and exit this code. So, is there a legal way in Python to exit a file without saving?
#/usr/bin/python
fo=open('tempfile.txt','a')
fo.write('This content\n')
P.S:- Python version is 2.4.3 (sorry, cannot upgrade)
There is no such concept in programming.
For the vast majority of the programming languages out there, the write command will attempt to put data directly in the file. This may or may not occur instantly for various reasons so many languages also introduce the concept of flush which will guarantee that your data is written to the file
What you want to do instead is to write all your data to a huge buffer (a string) then conditionally write or skip writing to the file.
Use the tempfile module to create your temporary, then if you need to save it you can do so explicitly using shutil.copyfileobj.
See Quick way to save a python TempFile?
Note that this is only if you absolutely need a temporary file (large amounts of data, etc.); if your contents are small then just using a stringbuffer and only writing it if you need to is a better approach.
Check for the condition before opening the file:
#/usr/bin/python
if condition():
fo=open('tempfile.txt','a')
fo.write('This content\n')
The safest way to do this is not to write to, or even open, the file until you know you want to save it. You could, for example, save what you might eventually write to a string, or list of same, so you can write them once you've decided to.

create pdf from python

I'm looking to generate PDF's from a Python application.
They start relatively simple but some may become more complex (Essentially letter like documents but will include watermarks for example later)
I've worked in raw postscript before and providing I can generate the correct headers etc and file at the end of it I want to avoid use of complex libs that may not do entirely what I want. Some seem to have got bitrot and no longer supported (pypdf and pypdf2) Especially when I know PDF/Postscript can do exactly what I need. PDF content really isn't that complex.
I can generate EPS (Encapsulated postscript) fine by just writing the appropriate text headers to file and my postscript code. But Inspecting PDF's there is a lil binary header I'm not sure how to generate.
I could generate an EPS and convert it. I'm not overly happy with this as the production environment is a Windows 2008 server (Dev is Ubuntu 12.04) and making something and converting it seems very silly.
Has anyone done this before?
Am I being pedantic by not wanting to use a library?
borrowed from ask.yahoo
A PDF file starts with "%PDF-1.1" if it is a version 1.1 type of PDF file. You can read PDF files ok when they don't have binary data objects stored in them, and you could even make one using Notepad if you didn't need to store a binary object like a Paint bitmap in it.
But after seeing the "%PDF-1.1" you ignore what's after that (Adobe Reader does, too) and go straight to the end of the file to where there is a line that says "%%EOF". That's always the last thing in the file; and if that's there you know that just a few characters before that place in the file there's the word "startxref" followed by a number. This number tells a reader program where to look in the file to find the start of the list of items describing the structure of the file. These items in the list can be page objects, dictionary objects, or stream objects (like the binary data of a bitmap), and each one has "obj" and "endobj" marking out where its description starts and ends.
For fairly simple PDF files, you might be able to type the text in just like you did with Notepad to make a working PDF file that Adobe Reader and other PDF viewer programs could read and display correctly.
Doing something like this is a challenge, even for a simple file, and you'd really have to know what you're doing to get any binary data into the file where it's supposed to go; but for character data, you'd just be able to type it in. And all of the commands used in the PDF are in the form of strings that you could type in. The hardest part is calculating those numbers that give the file offsets for items in the file (such as the number following "startxref").
If the way the file format is laid out intrigues you, go ahead and read the PDF manual, which tells the whole story.
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
but really you should probably just use a library
Thanks to #LukasGraf for providing this link http://www.gnupdf.org/Introduction_to_PDF that shows how to create a simple hello world pdf from scratch
As long as you're working in Python 2.7, Reportlab seems to be the best solution out there at the moment. It's quite full-featured, and can be a little complex to work with, depending on exactly what you're doing with it, but since you seem to be familiar with PDF internals in general hopefully the learning curve won't be too steep.
I recommend you to use a library. I spent a lot of time creating pdfme and learned a lot of things along the way, but it's not something you would do for a single project. If you want to use my library check the docs here.

Need suggestions for designing a content organizer

I am trying to write a python program which could take content and categorize it based on the tags. I am using Nepomuk to tag files and PyQt for GUI. The problem is, I am unable to decide how to save the content. Right now, I am saving each entry individually to a text file in a folder. When I need to read the contents, I am telling the program to get all the files in that foder and then perform read operation on each file. Since the number of files is less now (less than 20), this approach is decent enough. But I am worried that when the file count increase, this method would become inefficient. Is there any other method to save content efficiently?
Thanks in advance.
You could use sqlite3 module from stdlib. Data will be stored in a single file. The code might be even simpler than the one used for reading all adhoc text files by hand.
You could always export the data in a format suitable for sharing in your case.

Categories