Read files in Python

I'm a Python learner, so please excuse me.
I am following: http://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
I want to read a file; here is my file:
# cat test
line1 word1
line2 word2
line3 word3
line4 word4
And here is my code:
>>> f = open ('test')
>>> for line in f:
...     print f
...
<open file 'test', mode 'r' at 0xb7729180>
<open file 'test', mode 'r' at 0xb7729180>
<open file 'test', mode 'r' at 0xb7729180>
<open file 'test', mode 'r' at 0xb7729180>
How and why am I getting the above output? I was expecting it to print the file line by line.
What am I missing here? Looking at the link mentioned above, my syntax seems to be OK, but the output is not what I expected.
Thanks.

During the iteration, you are printing f instead of the line variable.
>>> f = open ('test')
>>> for line in f:
...     print line

You are printing the file handle; replace print f with print line:
f = open ('test')
for line in f:
    print line

for line in open('test'):
    print line

You can also read the lines with the readlines() method, which returns a list containing each line as a string; you can then loop over that list with your for statement.

for line in f:
This means "iterate through the contents of f, assigning each item to the variable named line", not the converse. f won't change; it's what you are getting the data from, not the variable you are assigning to.
That said, the other answer I saw (from someone whose name starts with 'D') is correct: you want print(line), not print(f). I write it with parentheses because it's a good habit to get into: it makes your code compatible with both Python 2.x and Python 3.x.
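A minimal sketch of the corrected loop in Python 3 syntax, assuming the test file from the question (the function name print_lines is only for illustration). Each line read from a file keeps its trailing newline, so it is stripped before printing; otherwise you get an empty line between the lines, which is what print line does in Python 2.

```python
def print_lines(path):
    """Print every line of the file at `path`, one per output line."""
    with open(path) as f:             # the file is closed automatically afterwards
        for line in f:                # `line` changes on each pass; `f` stays the file object
            print(line.rstrip("\n"))  # drop the newline the line already carries
```

Calling print_lines('test') should print line1 word1 through line4 word4, one per line.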

Related

Modify a string in a text file

In a file I have the names of planets:
sun moon jupiter saturn uranus neptune venus
I would like to say "replace saturn with sun". I have tried to write it as a list. I've tried different modes (write, append, etc.).
I think I am struggling to understand the concept of iteration, especially when it comes to iterating over a list, dict, or str in a file. I know it can be done using the csv, json, or even pickle module, but my objective is to get a grasp of iteration using a for loop to modify a .txt file, and I want to do that using a .txt file only.
with open('planets.txt', 'r+') as myfile:
    for line in myfile.readlines():
        if 'saturn' in line:
            a = line.replace('saturn', 'sun')
            myfile.write(str(a))
        else:
            print(line.strip())
Try this, but keep in mind that str.replace also matches inside other words: for example, it turns testsaturntest into testsuntest. If that matters, use a regex with word boundaries instead.
In [1]: cat planets.txt
saturn
In [2]: s = open("planets.txt").read()
In [3]: s = s.replace('saturn', 'sun')
In [4]: f = open("planets.txt", 'w')
In [5]: f.write(s)
In [6]: f.close()
In [7]: cat planets.txt
sun
This replaces the data in the file with the replacement you want and prints the values out:
with open('planets.txt', 'r+') as myfile:
    lines = myfile.readlines()
    modified_lines = map(lambda line: line.replace('saturn', 'sun'), lines)
    with open('planets.txt', 'w') as f:
        for line in modified_lines:
            f.write(line)
            print(line.strip())
Replacing the lines inside the file in place is quite tricky, so instead I read the file, replaced the lines, and wrote them back to the file.
If you just want to replace the word in the file, you can do it like this:
import re
lines = open('planets.txt', 'r').readlines()
newlines = [re.sub(r'\bsaturn\b', 'sun', l) for l in lines]
open('planets.txt', 'w').writelines(newlines)
f = open("planets.txt", "r+")
lines = f.readlines()  # read all lines
f.seek(0, 0)  # go back to the first character
for line in lines:  # take one line at a time
    f.write(line.replace("saturn", "sun"))  # replace and write
f.truncate()  # drop leftover bytes, since "sun" is shorter than "saturn"
f.close()
I think it's a clear guide :) You can find everything you need in it.
I have not tested your code, but the issue with r+ is that you need to keep track of where you are in the file, so that you can reset the file position and replace the current line instead of writing the replacement afterwards. I suggest creating a variable to keep track of your position in the file so that you can call myfile.seek().
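A minimal sketch of that position-tracking idea, using tell() and seek() in binary mode. It only works when the replacement has exactly the same length as the text it replaces, because overwriting in place cannot shift the rest of the file; 'saturn' to 'sun' does not qualify, which is why the other answers rewrite the whole file. The helper name and the equal-length restriction are assumptions of this sketch.

```python
def replace_same_length_in_place(path, old, new):
    """Overwrite every occurrence of `old` with `new` directly inside the file.

    Hypothetical helper: `old` and `new` must be bytes of equal length,
    because overwriting in place cannot shift the rest of the file.
    """
    if len(old) != len(new):
        raise ValueError("in-place overwrite needs old and new of equal length")
    with open(path, "rb+") as f:           # read/write without truncating
        while True:
            start = f.tell()               # where the current line begins
            line = f.readline()
            if not line:                   # b"" means end of file
                break
            if old in line:
                f.seek(start)              # jump back to the start of this line
                f.write(line.replace(old, new))
                f.seek(start + len(line))  # resume right after the rewritten line
```

For example, replace_same_length_in_place('planets.txt', b'saturn', b'SATURN') rewrites each matching line in place without touching anything else.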

Insert a line into the middle of a text file in Python

I want to insert a line into the middle of a text file in Python, so I tried
with open(erroredFilepath, 'r+t') as erroredFile:
fileContents = erroredFile.read()
if 'insert_here' in fileContents:
insertString.join(fileContents.rsplit('insert_here'))
erroredFile.truncate()
erroredFile.write(insertString)
However, insertString got written at the end of the file. Why?
As an aside, I tried to simplify things by just using strings instead of files.
'123456789'.join('qwertyuiop'.split('y'))
gives
'qwert123456789uiop'
What happened to the 'y'?
If you want to write in the middle of the file, use the fileinput module.
import fileinput
for line in fileinput.input(erroredFilepath, inplace=True):
    print("something", end="")
From the docs:
if the keyword argument inplace=True is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently).
Whatever you print will go into the file, so you have to read and print every line, modifying the ones you want to change. Also, when printing existing lines, use end="" to prevent print from adding an extra newline.
Although OS-level details of files vary, in general, when you have a file open in r+ mode and do some read or write operation, the "current position" is left after the last read or write.
When you did:
fileContents = erroredFile.read()
the stream erroredFile was read to the end, so the current position is now "at the end".
The truncate function defaults to using the current position as the size to which to truncate. Assume the file is 100 bytes long, so that the current position "at the end" is byte 100. Then:
erroredFile.truncate()
means "make the file 100 bytes long"—which it already is.
The current position remains at the end of the file, so the subsequent write appends.
Presumably you wanted to seek back to the beginning of the file, and/or use truncate(0) (note that just truncate(0) will, at least on Unix-like systems, leave the seek position at the end of the file so that the next write leaves a hole where the original data used to be). You could also be slightly more clever: if you're inserting, just overwrite-and-extend in place (no truncate is required at all).
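Putting that advice together for the code in the question: a minimal sketch that reads the whole file, replaces the marker, seeks back to the start, truncates, and writes the new contents. The helper name replace_marker is hypothetical, and it replaces every occurrence of the marker, as the question's rsplit/join was meant to.

```python
def replace_marker(path, marker, insert_string):
    """Replace every `marker` in the file with `insert_string`; return True if found."""
    with open(path, "r+") as f:
        contents = f.read()        # the position is now at the end of the file
        if marker not in contents:
            return False
        new_contents = contents.replace(marker, insert_string)
        f.seek(0)                  # go back to the beginning...
        f.truncate()               # ...and cut the file to 0 bytes from there
        f.write(new_contents)      # write everything back from the start
        return True
```

Seeking before truncating avoids the "hole" problem described above, because the write then starts at position 0.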
(Joel Hinz already answered the second question, I see.)
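For the second question: str.split() removes the separator it splits on, and str.join() places the joining string between the pieces, so the 'y' is simply gone and the result is the same as replacing 'y' with '123456789':

```python
parts = 'qwertyuiop'.split('y')     # the separator 'y' itself is dropped
print(parts)                        # ['qwert', 'uiop']
joined = '123456789'.join(parts)    # the joiner goes between the pieces
print(joined)                       # qwert123456789uiop
```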
test.txt
a
b
c
d
e
1. Read into a list then overwrite
def match_then_insert(filename, match, content):
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))
match_then_insert('test.txt', match='c', content='123')
Result
a
b
123
c
d
e
2. FileInput
from fileinput import FileInput
def match_then_insert(filename, match, content):
    for line in FileInput(filename, inplace=True):
        if match in line:
            line = content + '\n' + line
        print(line, end='')  # Redirect to the original file
match_then_insert('test.txt', match='c', content='123')
3. seek
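def match_then_insert(filename, match, content):
    with open(filename, mode='rb+') as f:
        while True:
            line = f.readline()
            if not line:  # end of file reached without finding the match
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)  # back to the start of the matching line
                rest = f.read()  # the matching line and everything after it
                f.seek(-len(rest), 1)  # back to the start of the matching line again
                f.truncate()  # cut the file off here
                content = content + '\n'
                f.write(content.encode())  # write the new line...
                f.write(rest)  # ...then put the original tail back
                break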
match_then_insert('test.txt', match='c', content='123')
Compare (timeit, number=1000):
Method                            Time/s
Read into a list then overwrite   54.42
FileInput                         121.59
seek                              3.53
from timeit import timeit
from fileinput import FileInput
def init_txt():
    open('test.txt', mode='w').write('\n'.join(['a', 'b', 'c', 'd', 'e']))
def f1(filename='test.txt', match='c', content='123'):
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))
def f2(filename='test.txt', match='c', content='123'):
    for line in FileInput(filename, inplace=True):
        if match in line:
            line = content + '\n' + line
        print(line, end='')
def f3(filename='test.txt', match='c', content='123'):
    with open(filename, mode='rb+') as f:
        while True:
            line = f.readline()
            if not line:  # end of file reached without finding the match
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)
                rest = f.read()
                f.seek(-len(rest), 1)
                f.truncate()
                content = content + '\n'
                f.write(content.encode())
                f.write(rest)
                break
init_txt()
print(timeit(f1, number=1000))
init_txt()
print(timeit(f2, number=1000))
init_txt()
print(timeit(f3, number=1000))
Not a Python answer, but it may widen your horizons. Use sed:
$ cat input.txt
foo
bar
baz
INSERT HERE
qux
quux
$ sed '/INSERT HERE/anew stuff' < input.txt
foo
bar
baz
INSERT HERE
new stuff
qux
quux
The a command appends the text on a new line after the match. If you want to insert the text before the match, use the i command:
$ sed '/INSERT HERE/inew stuff' < input.txt
foo
bar
baz
new stuff
INSERT HERE
qux
quux
Why not try a two-step solution? First, read the file and fix the string; second, rewrite the file. It's probably not the most efficient approach, but it works.
with open(erroredFilepath, 'r') as erroredFile:
    fileContents = erroredFile.read()
fileContents = fileContents.replace('insert_here', insertString)  # replace() returns a new string
with open(erroredFilepath, 'w') as fixingFile:
    fixingFile.write(fileContents)

Python Insert text before a specific line

I want to insert the text 'Hello Everyone' right before the line starting with 'Number'.
My code:
import re
result = []
with open("text2.txt", "r+") as f:
    a = [x.rstrip() for x in f]  # stores all lines from f into a list and removes "\n"
    # Find the first occurrence of "Number" and store its index
    for item in a:
        if item.startswith("Number"):  # same as your re check
            break
    ind = a.index(item)  # here it produces the index/line number
    result.extend(a[:ind])
    f.write('Hello Everyone')
Text file:
QWEW
RW
...
Number hey
Number ho
Expected output:
QWEW
RW
...
Hello Everyone
Number hey
Number ho
Please help me fix my code: I don't get anything inserted into my text file!
Answers will be appreciated!
The problem
When you do open("text2.txt", "r"), you open your file for reading, not for writing. Therefore, nothing appears in your file.
The fix
Using r+ instead of r allows you to also write to the file (this was also pointed out in the comments). However, it overwrites, so be careful (this is an OS limitation, as described e.g. here). The following should do what you want: it inserts "Hello everyone" into the list of lines and then overwrites the file with the updated lines.
with open("text2.txt", "r+") as f:
    a = [x.rstrip() for x in f]
    index = 0
    for item in a:
        if item.startswith("Number"):
            a.insert(index, "Hello everyone")  # Inserts "Hello everyone" into `a`
            break
        index += 1
    # Go to start of file and clear it
    f.seek(0)
    f.truncate()
    # Write each line back
    for line in a:
        f.write(line + "\n")
The correct answer to your problem is hlt's, but also consider using the fileinput module:
import fileinput
found = False
for line in fileinput.input('DATA', inplace=True):
    if not found and line.startswith('Number'):
        print 'Hello everyone'
        found = True
    print line,
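The fileinput answer above uses Python 2 print statements; a Python 3 sketch of the same idea could look like this. The function name and its parameters are illustrative; FileInput is used as a context manager so that standard output is restored when the loop ends.

```python
import fileinput

def insert_before_first(path, prefix, new_line):
    """Insert `new_line` before the first line that starts with `prefix`.

    With inplace=True, whatever print() writes inside the loop goes into the file.
    """
    found = False
    with fileinput.FileInput(path, inplace=True) as lines:
        for line in lines:
            if not found and line.startswith(prefix):
                print(new_line)      # print() supplies the newline for the new line
                found = True
            print(line, end="")      # existing lines already end with "\n"
```

For the question's file, insert_before_first("text2.txt", "Number", "Hello Everyone") should produce the expected output.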
This is basically the same question as here: they propose doing it in three steps: read everything, insert, rewrite everything.
with open("/tmp/text2.txt", "r") as f:
    lines = f.readlines()
for index, line in enumerate(lines):
    if line.startswith("Number"):
        break
lines.insert(index, "Hello everyone !\n")
with open("/tmp/text2.txt", "w") as f:
    f.writelines(lines)

Replace a character by another in a file

I'd like to modify some characters of a file in place, without having to copy the entire content of the file into another one or overwrite the existing one. However, it doesn't seem possible to just replace one character with another:
>>> f = open("foo", "a+") # file does not exist
>>> f.write("a")
1
>>> f.seek(0)
0
>>> f.write("b")
1
>>> f.seek(0)
0
>>> f.read()
'ab'
Here I'd have expected "a" to be replaced by "b", so that the content of the file would be just "b", but this is not the case. Is there a way to do this?
That's because of the mode you're using: in append mode, the file pointer is moved to the end of the file before every write. You should open your file in w+ mode instead:
f = open("foo", "w+") # file does not exist
f.write("samething")
f.seek(1)
f.write("o")
f.seek(0)
print(f.read())  # prints "something"
If you want to do that on an existing file without truncating it, you should open it in r+ mode for reading and writing.
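A minimal sketch of the r+ approach on an existing file, using the samething/something example from above. It opens the file in binary mode (rb+) because, in text mode, Python only guarantees seek() to offsets returned by tell() or to 0. The helper name overwrite_at is hypothetical.

```python
def overwrite_at(path, offset, data):
    """Overwrite bytes at `offset` in an existing file, keeping everything else."""
    # r+ neither truncates the file (like w+) nor forces writes to the end (like a+)
    with open(path, "rb+") as f:
        f.seek(offset)
        f.write(data)
```

For a file containing samething, overwrite_at("foo", 1, b"o") leaves something.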
Truncate the file using file.truncate first:
>>> f = open("foo", "a+")
>>> f.write('a')
>>> f.truncate(0) #truncates the file to 0 bytes
>>> f.write('b')
>>> f.seek(0)
>>> f.read()
'b'
Otherwise, open the file in w+ mode as suggested by @Guillaume.
import fileinput
for line in fileinput.input('abc', inplace=True):
line = line.replace('t', 'ed')
    print line,
This doesn't replace character by character; instead, it scans through each line, replaces the required character, and writes the modified line back.
For example:
file 'abc' contains:
i want
to replace
character
After executing, output would be:
i waned
edo replace
characeder
Hope this helps.
I believe that you may be able to modify the example from this answer.
https://stackoverflow.com/a/290494/1669208
import fileinput
for line in fileinput.input("test.txt", inplace=True):
    print line.replace(char1, char2),
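The two fileinput answers above use Python 2 print statements. A Python 3 sketch of the same in-place replacement, with char1/char2 turned into parameters (the function name replace_in_file is illustrative):

```python
import fileinput

def replace_in_file(path, old, new):
    """Replace every `old` with `new` on each line, rewriting the file in place."""
    with fileinput.FileInput(path, inplace=True) as lines:
        for line in lines:
            # print() output goes into the file; end="" because line keeps its "\n"
            print(line.replace(old, new), end="")
```

With the 'abc' example above, replace_in_file('abc', 't', 'ed') should produce the same "i waned / edo replace / characeder" result.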

open() and codecs.open() in Python 2.7 behave strangely differently

I have a text file whose first line contains Unicode characters and whose other lines are all ASCII.
I try to read the first line as one variable, and all other lines as another. However, when I use the following code:
# -*- coding: utf-8 -*-
import codecs
import os
filename = '1.txt'
f = codecs.open(filename, 'r3', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()
print 'And now for something completely differerent:'
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()
I get the following output:
<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely differerent:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77
If I don't use readlines(), the whole file is read, not only the first 7 lines, with both codecs.open() and open().
Why does such thing happen?
And why does codecs.open() open the file in binary mode, even though I passed an 'r' mode?
Update: this is the original file: http://www1.datafilehost.com/d/0792d687
Because you used .readline() first, the codecs.open() file has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.
If you call .readlines() again, the rest of the lines are returned:
>>> f = codecs.open(filename, 'r3', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71
The work-around is to not mix .readline() and .readlines():
f = codecs.open(filename, 'r3', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ') # take the first line.
This behaviour is really a bug; the Python devs are aware of it, see issue 8260.
The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function and is a lot more robust and versatile than the codecs module.
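A minimal sketch of the io.open() alternative, assuming the file is UTF-8 throughout and that the first line is space-separated as in the question (the function name is hypothetical). It mixes readline() and readlines(), the combination that misbehaves with codecs.open(). As for the binary mode: the codecs.open() documentation states that the underlying encoded file is always opened in binary mode, even if no binary mode was specified.

```python
import io

def read_header_and_data(path, encoding="utf-8"):
    """Return the first line split on spaces and the list of remaining lines."""
    with io.open(path, "r", encoding=encoding) as f:
        names = f.readline().split(" ")
        data = f.readlines()  # continues right after the first line
    return names, data
```

On Python 3, io.open is the same object as the built-in open, so this also runs unchanged there.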
