readlines() cannot read lines after using readline() - python

The following simple code reads a CSV file and returns the number of lines of the file. As you can see in the output, the file has 501 lines.
>>> import codecs
>>> f = codecs.open("tmp.csv", "r", "utf_8")
>>> print len(f.readlines())
501
But if I insert a readline() call before readlines(), the latter does not reach the end of the file.
>>> import codecs
>>> f = codecs.open("tmp.csv", "r", "utf_8")
>>> f.readline()
>>> print len(f.readlines())
1
Is there any basic mistake in my code? How can I mix readline() and readlines()? (actually I don't need to mix these two functions in my real program, but I am just curious...)
You can download the file at
https://dl.dropboxusercontent.com/u/16653989/tmp/tmp.csv

This appears to be related to the codecs module, because the same sequence with the built-in open() function works as expected:
>>> f = open('tmp.csv')
>>> f.readline()
>>> print len(f.readlines())
500

Related

Utf-8 decoding with Python

I have a csv with some data, and in one row there is a text that was added after encoding it in utf-8.
This is the text:
"b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\xa5\xbf\xe8\xb7\xaf255\xe5\xbc\x84660\xe5\x8f\xb7\xe5\x92\x8c665\xe5\x8f\xb7 \xe4\xb8\xad\xe5\x9b\xbd\xe4\xb8\x8a\xe6\xb5\xb7\xe6\xb5\xa6\xe4\xb8\x9c\xe6\x96\xb0\xe5\x8c\xba 201205'"
I'm trying to recover the original characters from this text with the decode function, but it seems impossible.
Does anyone know the correct way to do this?
Assuming that the line in your file is exactly like this:
b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\xa5\xbf\xe8\xb7\xaf255\xe5\xbc\x84660\xe5\x8f\xb7\xe5\x92\x8c665\xe5\x8f\xb7 \xe4\xb8\xad\xe5\x9b\xbd\xe4\xb8\x8a\xe6\xb5\xb7\xe6\xb5\xa6\xe4\xb8\x9c\xe6\x96\xb0\xe5\x8c\xba 201205'
And reading the line from the file gives the output:
>>> line
"b'\\xe7\\x94\\xb3\\xe8\\xbf\\xaa\\xe8\\xa5\\xbf\\xe8\\xb7\\xaf255\\xe5\\xbc\\x84660\\xe5\\x8f\\xb7\\xe5\\x92\\x8c665\\xe5\\x8f\\xb7 \\xe4\\xb8\\xad\\xe5\\x9b\\xbd\\xe4\\xb8\\x8a\\xe6\\xb5\\xb7\\xe6\\xb5\\xa6\\xe4\\xb8\\x9c\\xe6\\x96\\xb0\\xe5\\x8c\\xba 201205'"
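You can use the eval() function to turn that text back into a bytes object and then decode it (only on input you trust, since eval() executes arbitrary code):
with open(r"your_csv.csv", "r") as csvfile:
    for line in csvfile:
        # when you reach the desired line
        b = eval(line).decode('utf-8')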
Output:
>>> print(b)
申迪西路255弄660号和665号 中国上海浦东新区 201205
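As a possible alternative to eval(), here is a minimal sketch using ast.literal_eval, which only accepts Python literals. It assumes Python 3 and that the line contains nothing but a bytes literal; decode_bytes_literal is a hypothetical helper name and the sample line is a shortened piece of the text from the question.

```python
import ast


def decode_bytes_literal(text, encoding="utf-8"):
    """Turn the text of a bytes literal (b'...') back into a str."""
    # literal_eval parses literals only, so it cannot run arbitrary code.
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal")
    return value.decode(encoding)


# Shortened sample of a line as read from the file (the backslashes are literal).
line = r"b'\xe4\xb8\xad\xe5\x9b\xbd 201205'" + "\n"
decoded = decode_bytes_literal(line)
print(decoded)
```

If the line holds anything other than a single literal, literal_eval raises an exception instead of executing it, which is usually what you want for data read from a file.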
Try this:
a = b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\xa5\xbf\xe8\xb7\xaf255\xe5\xbc\x84660\xe5\x8f\xb7\xe5\x92\x8c665\xe5\x8f\xb7 \xe4\xb8\xad\xe5\x9b\xbd\xe4\xb8\x8a\xe6\xb5\xb7\xe6\xb5\xa6\xe4\xb8\x9c\xe6\x96\xb0\xe5\x8c\xba 201205'
print(a.decode('utf-8')) #your decoded output
Since you say you are reading from a file, you can also pass the encoding when opening it:
import codecs
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
    print(repr(line))

Python Add a New Line without \n

I have this
f = open(os.path.join(r"/path/to/file/{}.txt").format(userid), "w")
f.write(str(points))
f.write(str(level))
f.write(str(prevtime))
f.close()
I know about with open(blah) as f: and prefer it, but with this code the values are not written on separate lines unless I add +"\n", even if I write the file first and then switch to append mode. The \n is a problem because when I read the data back using
f = open(os.path.join(r"blah\{}.txt").format(userid), "r")
lines = f.readlines()
points = float(lines[0])
I'll get an error telling me it can't interpret (for example: 500\n) as a float because it reads the \n. Any idea what I can do?
EDIT
I ended up fixing it by just not making it a float, but now that is giving me a ValueError Unconverted Data Remains. These issues are only happening due to the line in the txt file that should contain a date in the format of %H:%M
EDIT 2019
I see lots of people searching for the same question. My problem actually came down to my ignoring, and clearly not understanding, types in Python.
To answer the question that I imagine many people are searching for when they view this, \n is Python's newline character. An alternative (if using print()) would be to call print() with no arguments, which outputs a blank line.
So, you have something like this
>>> f = open("test.txt", "w")
>>> f.write(str(4))
>>> f.write(str(20))
>>> f.close()
>>> f = open("test.txt")
>>> f.readlines()
['420']
But, you need to write newlines, so just do so
>>> f = open("test.txt", "w")
>>> f.write("{}\n".format(4))
>>> f.write("{}\n".format(20))
>>> f.close()
>>> f = open("test.txt")
>>> f.readlines()
['4\n', '20\n']
>>> f.close()
If you need no newline characters, try read().splitlines()
>>> f = open("test.txt")
>>> f.read().splitlines()
['4', '20']
EDIT
As far as the time value is concerned, here's an example.
>>> from datetime import datetime
>>> time_str = datetime.now().strftime("%H:%M")
>>> time_str
'18:26'
>>> datetime.strptime(time_str, "%H:%M")
datetime.datetime(1900, 1, 1, 18, 26)
To print without a trailing newline, use sys.stdout.write.
Without newlines, though, you may need to add a separator such as a space between your values:
>>> import sys
>>> sys.stdout.write('hello world')
hello world>>>
If the newlines remain in the file, you can use rstrip() to strip them off when reading:
lines = f.readlines()
points = float(lines[0].rstrip())
Alternatively, I prefer the more Pythonic way below:
lines = f.read().splitlines()
points = float(lines[0])
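Putting the pieces together, here is a rough sketch of a full round trip, assuming Python 3. The sample values and the temporary file path are made up for illustration; only the variable names points, level and prevtime come from the question.

```python
import os
import tempfile
from datetime import datetime

# Sample values (assumptions, not taken from the question).
points, level, prevtime = 500.0, 3, "18:26"

path = os.path.join(tempfile.mkdtemp(), "user.txt")

# Write one value per line; the "\n" is what separates the records.
with open(path, "w") as f:
    for value in (points, level, prevtime):
        f.write("{}\n".format(value))

# splitlines() drops the line endings, so each item is ready to convert.
with open(path) as f:
    lines = f.read().splitlines()

read_points = float(lines[0])
read_level = int(lines[1])
read_time = datetime.strptime(lines[2], "%H:%M")
print(read_points, read_level, read_time.strftime("%H:%M"))
```

Note that float() tolerates surrounding whitespace, while strptime() does not: a trailing "\n" on the time string is what produces "unconverted data remains", so stripping the line endings matters most for the %H:%M line.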

Replace a character by another in a file

I'd like to modify some characters of a file in-place, without having to copy the entire content of the file in another, or overwrite the existing one. However, it doesn't seem possible to just replace a character by another:
>>> f = open("foo", "a+") # file does not exist
>>> f.write("a")
1
>>> f.seek(0)
0
>>> f.write("b")
1
>>> f.seek(0)
0
>>> f.read()
'ab'
Here I'd have expected "a" to be replaced by "b", so that the content of the file would be just "b", but this is not the case. Is there a way to do this?
That's because of the mode you're using: in append mode, the file pointer is moved to the end of the file before every write. Open the file in w+ mode instead:
f = open("foo", "w+") # file does not exist
f.write("samething")
f.seek(1)
f.write("o")
f.seek(0)
print f.read() # prints "something"
If you want to do that on an existing file without truncating it, you should open it in r+ mode for reading and writing.
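A minimal sketch of the r+ case, assuming Python 3. The file is created in a temporary directory purely for illustration, and it is opened in binary mode ("r+b") so that seek() offsets are plain byte positions.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo")

# Create an existing file to modify.
with open(path, "wb") as f:
    f.write(b"samething")

# "r+b" opens for reading and writing without truncating the file
# and without forcing writes to the end, unlike "a+".
with open(path, "r+b") as f:
    f.seek(1)        # position of the second byte
    f.write(b"o")    # overwrites exactly one byte in place
    f.seek(0)
    content = f.read().decode("ascii")

print(content)
```

This only works cleanly when the replacement has the same length in bytes as the original; inserting or deleting bytes still requires rewriting the rest of the file.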
Truncate the file using file.truncate first:
>>> f = open("foo", "a+")
>>> f.write('a')
>>> f.truncate(0) #truncates the file to 0 bytes
>>> f.write('b')
>>> f.seek(0)
>>> f.read()
'b'
Otherwise open the file in w+ mode as suggested by @Guillaume.
import fileinput
for line in fileinput.input('abc', inplace=True):
    line = line.replace('t', 'ed')
    print line,
This doesn't replace character by character. Instead, it scans each line, replaces the required character, and writes the modified line back.
For example:
file 'abc' contains:
i want
to replace
character
After executing, output would be:
i waned
edo replace
characeder
Will it help you? Hope so..
I believe that you may be able to modify the example from this answer.
https://stackoverflow.com/a/290494/1669208
import fileinput
for line in fileinput.input("test.txt", inplace=True):
    print line.replace(char1, char2),

Storing a random byte string in Python

For my project, I need to be able to store random byte strings in a file and read the byte string again later. For example, I want to store randomByteString from the following code:
>>> from os import urandom
>>> randomByteString=urandom(8)
>>> randomByteString
b'zOZ\x84\xfb\xceM~'
What would be the proper way to do this?
Edit: Forgot to mention that I also want to store 'normal' string alongside the byte strings.
Code like:
>>> fh = open("e:\\test","wb")
>>> fh.write(randomByteString)
8
>>> fh.close()
Open the file in binary mode. You could also do it more cleanly with a with block if the file operations are close together (thanks to @Blender):
>>> with open("e:\\test","wb") as fh:
...     fh.write(randomByteString)
Update: if you want to store normal strings, you could encode them and then write them like this:
>>> "test".encode()
b'test'
>>> fh.write("test".encode())
Here the fh means the same file handle opened previously.
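If the byte strings and the normal strings need to be told apart again when reading, one possible approach (not the one used above) is to store each item on its own line of a text file and Base64-encode the bytes. This is only a sketch, assuming Python 3; the label text and the temporary path are made up.

```python
import base64
import os
import tempfile

random_bytes = os.urandom(8)
label = "session key"  # hypothetical 'normal' string

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Base64 output is plain ASCII with no newlines, so it is safe as one line.
with open(path, "w", encoding="utf-8") as fh:
    fh.write(label + "\n")
    fh.write(base64.b64encode(random_bytes).decode("ascii") + "\n")

with open(path, encoding="utf-8") as fh:
    stored_label, encoded = fh.read().splitlines()

restored = base64.b64decode(encoded)
print(stored_label, restored == random_bytes)
```

Writing raw bytes in binary mode is more compact, but random data may contain newline bytes, which makes line-based reading unreliable; the encoding step trades some space for that safety.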
Works just fine. You can't expect the output to make much sense though.
>>> import os
>>> with open("foo.txt", "wb") as fh:
...     fh.write(os.urandom(8))
...
>>> fh.close()
>>> with open("foo.txt", "r") as fh:
...     for line in fh.read():
...         print line
...
^J^JM-/
^O
R
M-9
J
~G

Open() and codecs.open() in Python 2.7 behave strangely different

I have a text file with first line of unicode characters and all other lines in ASCII.
I try to read the first line as one variable, and all other lines as another. However, when I use the following code:
# -*- coding: utf-8 -*-
import codecs
import os
filename = '1.txt'
f = codecs.open(filename, 'r3', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()
print 'And now for something completely differerent:'
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()
I get the following output:
<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely differerent:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77
If I don't use readlines(), the whole file is read, not only the first 7 lines, with both codecs.open() and open().
Why does this happen?
And why does codecs.open() read the file in binary mode even though the 'r' parameter is passed?
Upd: This is original file: http://www1.datafilehost.com/d/0792d687
Because you used .readline() first, the codecs.open() file has filled a linebuffer; the subsequent call to .readlines() returns only the buffered lines.
If you call .readlines() again, the rest of the lines are returned:
>>> f = codecs.open(filename, 'r3', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71
The work-around is to not mix .readline() and .readlines():
f = codecs.open(filename, 'r3', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ') # take the first line.
This behaviour is really a bug; the Python devs are aware of it, see issue 8260.
The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function and is a lot more robust and versatile than the codecs module.
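A short sketch of the io.open() variant. The stand-in file below (one non-ASCII header line followed by 77 ASCII lines) is invented to mirror the shape of the file in the question, and it lives in a temporary directory.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "1.txt")

# Build a stand-in file: one non-ASCII header line plus 77 ASCII data lines.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"nom pr\u00e9nom ville\n")
    for i in range(77):
        f.write(u"row %d\n" % i)

# With io.open(), readline() followed by readlines() returns the rest of the file.
with io.open(path, "r", encoding="utf-8") as f:
    names = f.readline().split(u" ")
    data = f.readlines()

print(len(names), len(data))
```

On Python 3 the built-in open() is the same function as io.open(), so the explicit import is only needed when the code also has to run on Python 2.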
