As part of a bigger project, I would simply like to make sure that a file can be opened and Python can read and use it. So after I opened up the txt file, I said:
data = txtfile.read()
first_line = data.split('\n',1)[2]
print(first_line)
I also tried
print(f1.readline())
where f1 is the txt file. This, again, did nothing.
I am using the spyder IDE, and it just says running file, and doesn't print anything. Is it because my file is too large? It is 4.6 gigs.
Does anyone have any idea what's going on?
and it just says running file, and doesn't print anything. Is it
because my file is too large? It is 4.6 gigs.
Yes.
data = txtfile.read()
This function is going to read the entire file. Since you stated that the file is 4.6GB, it is going to take time to load the entire file and then split the by newline character.
See this: Read large text files in Python
I don't know your context of use, so, if you can process line by line, it would be simpler. Or even chunks would make it simpler than reading the entire file.
first_line = open('myfile.txt', 'r').readline()
Related
If I run
file = open("BAL.txt","w")
I = '200'
file.write(I)
file.close
from a script, it outputs nothing in the file. (It literally overwrites the file with nothing)
Furthermore, running cat BAL.txt just goes to the next line like nothing is in the file.
But if I run it line by line in a python console it works perfectly fine.
Why does this happen. ( I am a begginner learning python the mistake may be super obvious. I have thrown about 2 hours into trying to figure this out)
Thanks in advance
You aren't closing your file properly. To close it you are missing the () at the end of file.close so it should look like this:
file = open("BAL.txt", "w")
file.write("This has been written to a file")
file.close()
This site has the same example and may be of some use to you.
Another way, especially useful when you are appending multiple values into a single file is to use something like with open("BAL.txt","w") as file:. Here is your script rewritten to include this example:
I = '200'
with open("BAL.txt","w") as file:
file.write(I)
This opens our file with the value file and allows us to write values to it. Also note that file.close() is not needed here and when appending text w+ needs to be used.
to write to a file you do this:
file = open("file.txt","w")
file.write("something")
file.close()
when you use file.write() it deletes all of the contents of the file, if you want to write to the end of the file do this:
file = open("file.text","w+")
file.write(file.read()+"something")
file.close()
There are other ways to do this but this one is the most intuitive (not the most efficient), also the other way tends to be buggy so there is no reason to post it because this is reliable.
Firstly, you're missing the parentheses when you're closing the file. Secondly, writing to a file should be done like this:
file = open("BAL.txt", "w")
file.write("This has been written to a file")
file.close()
Let me know if you have any questions.
I am working on a project that requires me to read a file with a .dif extension. Dif stands for data information exchange. The file opens nicely in Open Office Calc. Then you can easily save as a csv file, however when I open in Python all I get are random characters that don't make sense. Here is the last code that I tried just to see if I could read.
txt = open('C:\myfile.dif', 'rb').read()
print txt
I would even be open to programatically converting the file to csv first. before opening if someone knows how to do that. As always, any help is much appreciated. Below is a partial screenshot of what I get when I run the code.
Hadn't heard of this file format. Went and got a sample here.
I tested your method and it works fine:
>>> content = open(r"E:\sample.dif", 'rb').read()
>>> print (content)
b'TABLE\r\n0,1\r\n"EXCEL"\r\nVECTORS\r\n0,8\r\n""\r\nTUPLES\r\n0,3\r\n""\r\nDATA\r\n0,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"Welcome to File Extension FYI Center!"\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n""\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"ID"\r\n1,0\r\n"Type"\r\n1,0\r\n"Description"\r\n-1,0\r\nBOT\r\n0,1\r\nV\r\n1,0\r\n"ASP"\r\n1,0\r\n"Active Server Pages"\r\n-1,0\r\nBOT\r\n0,2\r\nV\r\n1,0\r\n"JSP"\r\n1,0\r\n"JavaServer Pages"\r\n-1,0\r\nBOT\r\n0,3\r\nV\r\n1,0\r\n"PNG"\r\n1,0\r\n"Portable Network Graphics"\r\n-1,0\r\nBOT\r\n0,4\r\nV\r\n1,0\r\n"GIF"\r\n1,0\r\n"Graphics Interchange Format"\r\n-1,0\r\nBOT\r\n0,5\r\nV\r\n1,0\r\n"WMV"\r\n1,0\r\n"Windows Media Video"\r\n-1,0\r\nEOD\r\n'
>>>
The question is what is in the file and how do you want to handle it. Personally I liked:
with open(r"E:\sample.dif", 'rb') as f:
for line in f:
print (line)
In the first code block, that long line that has a b'' (for bytes!) in front of it can be iterated on \r\n:
b'TABLE\r\n'
b'0,1\r\n'
b'"EXCEL"\r\n'
b'VECTORS\r\n'
b'0,8\r\n'
b'""\r\n'
b'TUPLES\r\n'
b'0,3\r\n'
b'""\r\n'
b'DATA\r\n'
b'0,0\r\n'
.
.
.
b'"Windows Media Video"\r\n'
b'-1,0\r\n'
b'EOD\r\n'
I have a problem in reading a huge txt file with python. I should read all the ~500M lines of a 33 GB .txt file, one by one, but for some obscure reason, my script stops at the 7446633rd line, and gives no error..
The script is the following easy one:
file = open ("file.txt","r")
i = 0
for line in file:
i = i + 1
print i
file.close()
I tried the script on more than one machine, and with both 32 and 64-bit versions of python, but no luck..
Anyone knows what could be the problem??
Try using the "with" statement.
with open("file.txt") as input_file:
for line in input_file:
process_line(line)
Also you could probably think about processing the lines in parallel using celery or something similar.
Later edit: if that doesn't work try to open the files and then use a range to read lines (read in batches).
Say I have a 10GB HDD Ubuntu VPS in the USA (and I live in some where else), and I have a 9GB text file on the hard drive. I have 512MB of RAM, and about the same amount of swap.
Given the fact that I cannot add more HDD space and cannot move the file to somewhere else to process, is there an efficient method to remove some lines from the file using Python (preferably, but any other language will be acceptable)?
How about this? It edits the file in place. I've tested it on some small text files (in Python 2.6.1), but I'm not sure how well it will perform on massive files because of all the jumping around, but still...
I've used a indefinite while loop with a manual EOF check, because for line in f: didn't work correctly (presumably all the jumping around messes up the normal iteration). There may be a better way to check this, but I'm relatively new to Python, so someone please let me know if there is.
Also, you'll need to define the function isRequired(line).
writeLoc = 0
readLoc = 0
with open( "filename" , "r+" ) as f:
while True:
line = f.readline()
#manual EOF check; not sure of the correct
#Python way to do this manually...
if line == "":
break
#save how far we've read
readLoc = f.tell()
#if we need this line write it and
#update the write location
if isRequired(line):
f.seek( writeLoc )
f.write( line )
writeLoc = f.tell()
f.seek( readLoc )
#finally, chop off the rest of file that's no longer needed
f.truncate( writeLoc )
Try this:
currentReadPos = 0
removedLinesLength = 0
for line in file:
currentReadPos = file.tell()
if remove(line):
removedLinesLength += len(line)
else:
file.seek(file.tell() - removedLinesLength)
file.write(line + "\n")
file.flush()
file.seek(currentReadPos)
I have not run this, but the idea is to modify the file in place by overwriting the lines you want to remove with lines you want to keep. I am not sure how the seeking and modifying interacts with the iterating over the file.
Update:
I have tried fileinput with inplace by creating a 1GB file. What I expected was different from what happened. I read the documentation properly this time.
Optional in-place filtering: if the
keyword argument inplace=1 is passed
to fileinput.input() or to the
FileInput constructor, the file is
moved to a backup file and standard
output is directed to the input file
(if a file of the same name as the
backup file already exists, it will be
replaced silently).
from docs/fileinput
So, this doesn't seem to be an option now for you. Please check other answers.
Before Edit:
If you are looking for editing the file inplace, then check out Python's fileinput module - Docs.
I am really not sure about its efficiency when used with a 10gb file. But, to me, this seemed to be the only option you have using Python.
Just sequentially read and write to the files.
f.readlines() returns a list
containing all the lines of data in
the file. If given an optional
parameter sizehint, it reads that many
bytes from the file and enough more to
complete a line, and returns the lines
from that. This is often used to allow
efficient reading of a large file by
lines, but without having to load the
entire file in memory. Only complete
lines will be returned.
Source
Process the file getting 10/20 or more MB of chunks.
This would be the fastest way.
Other way of doing this is to stream this file and filter it using AWK for example.
example pseudo code:
file = open(rw)
linesCnt=50
newReadOffset=0
tmpWrtOffset=0
rule=1
processFile()
{
while(rule)
{
(lines,newoffset)=getLines(file, newReadOffset)
if lines:
[x for line in lines if line==cool: line]
tmpWrtOffset = writeBackToFile(file, x, tmpWrtOffset) #should return new offset to write for the next time
else:
rule=0
}
}
To resize file at the end use truncate(size=None)
I'm attempting to do a "find and replace" in a file on a Mac OS X computer. Although it appears to work correctly. It seems that the file is somehow altered. The text editor that I use (Text Wrangler) is unable to even open the file once this is completed.
Here is the code as I have it:
import fileinput
for line in fileinput.FileInput("testfile.txt",inplace=1):
line = line.replace("newhost",host)
print line,
When I view the file from the terminal, it does say "testfile" may be a binary file. See it anyway? Is there a chance that this replace is corrupting the file? Do I have another option for this to work? I really appreciate the help.
Thank you,
Aaron
UPDATE: the actual file is NOT a .txt file it is a .plist file which is preference file in Mac OS X if that makes any difference
LINK to plist file:
http://www.queencitytech.com/plist.zip
Your code worked for me fine. However, I would suggest a different approach: don't try overwriting the file directly. I never like changing the file directly because if you have a bug or something like that the file is lost. Generate a new file then copy it over manually (or within python, if you really want to).
PATH = 'testfile.txt'
FILE = open(PATH)
OUT_FILE = open('out_' + PATH, 'w')
for line in FILE.readlines():
print >> OUT_FILE, line.replace('newhost', host),
Try using sys.stdout.write instead of print. readlines() retains the new line characters at the end of the read line. The print statement adds an additional new line character, so it's likely double spacing the file.