Python writing to file [duplicate] - python

How do I write a line to a file in modern Python? I heard that this is deprecated:
print >>f, "hi there"
Also, does "\n" work on all platforms, or should I use "\r\n" on Windows?

This should be as simple as:
with open('somefile.txt', 'a') as the_file:
the_file.write('Hello\n')
From The Documentation:
Do not use os.linesep as a line terminator when writing files opened in text mode (the default); use a single '\n' instead, on all platforms.
Some useful reading:
The with statement
open()
'a' is for append, or use
'w' to write with truncation
os (particularly os.linesep)

You should use the print() function which is available since Python 2.6+
from __future__ import print_function # Only needed for Python 2
print("hi there", file=f)
For Python 3 you don't need the import, since the print() function is the default.
The alternative in Python 3 would be to use:
with open('myfile', 'w') as f:
f.write('hi there\n') # python will convert \n to os.linesep
Quoting from Python documentation regarding newlines:
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
See also: Reading and Writing Files - The Python Tutorial

The python docs recommend this way:
with open('file_to_write', 'w') as f:
f.write('file contents\n')
So this is the way I usually do it :)
Statement from docs.python.org:
It is good practice to use the 'with' keyword when dealing with file
objects. This has the advantage that the file is properly closed after
its suite finishes, even if an exception is raised on the way. It is
also much shorter than writing equivalent try-finally blocks.

Regarding os.linesep:
Here is an exact unedited Python 2.7.1 interpreter session on Windows:
Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.linesep
'\r\n'
>>> f = open('myfile','w')
>>> f.write('hi there\n')
>>> f.write('hi there' + os.linesep) # same result as previous line ?????????
>>> f.close()
>>> open('myfile', 'rb').read()
'hi there\r\nhi there\r\r\n'
>>>
On Windows:
As expected, os.linesep does NOT produce the same outcome as '\n'. There is no way that it could produce the same outcome. 'hi there' + os.linesep is equivalent to 'hi there\r\n', which is NOT equivalent to 'hi there\n'.
It's this simple: use \n which will be translated automatically to os.linesep. And it's been that simple ever since the first port of Python to Windows.
There is no point in using os.linesep on non-Windows systems, and it produces wrong results on Windows.
DO NOT USE os.linesep!

I do not think there is a "correct" way.
I would use:
with open('myfile', 'a') as f:
f.write('hi there\n')
In memoriam Tim Toady.

In Python 3 it is a function, but in Python 2 you can add this to the top of the source file:
from __future__ import print_function
Then you do
print("hi there", file=f)

If you are writing a lot of data and speed is a concern you should probably go with f.write(...). I did a quick speed comparison and it was considerably faster than print(..., file=f) when performing a large number of writes.
import time
start = start = time.time()
with open("test.txt", 'w') as f:
for i in range(10000000):
# print('This is a speed test', file=f)
# f.write('This is a speed test\n')
end = time.time()
print(end - start)
On average write finished in 2.45s on my machine, whereas print took about 4 times as long (9.76s). That being said, in most real-world scenarios this will not be an issue.
If you choose to go with print(..., file=f) you will probably find that you'll want to suppress the newline from time to time, or replace it with something else. This can be done by setting the optional end parameter, e.g.;
with open("test", 'w') as f:
print('Foo1,', file=f, end='')
print('Foo2,', file=f, end='')
print('Foo3', file=f)
Whichever way you choose I'd suggest using with since it makes the code much easier to read.
Update: This difference in performance is explained by the fact that write is highly buffered and returns before any writes to disk actually take place (see this answer), whereas print (probably) uses line buffering. A simple test for this would be to check performance for long writes as well, where the disadvantages (in terms of speed) for line buffering would be less pronounced.
start = start = time.time()
long_line = 'This is a speed test' * 100
with open("test.txt", 'w') as f:
for i in range(1000000):
# print(long_line, file=f)
# f.write(long_line + '\n')
end = time.time()
print(end - start, "s")
The performance difference now becomes much less pronounced, with an average time of 2.20s for write and 3.10s for print. If you need to concatenate a bunch of strings to get this loooong line performance will suffer, so use-cases where print would be more efficient are a bit rare.

Since 3.5 you can also use the pathlib for that purpose:
Path.write_text(data, encoding=None, errors=None)
Open the file pointed to in text mode, write data to it, and close the file:
import pathlib
pathlib.Path('textfile.txt').write_text('content')

When you said Line it means some serialized characters which are ended to '\n' characters. Line should be last at some point so we should consider '\n' at the end of each line. Here is solution:
with open('YOURFILE.txt', 'a') as the_file:
the_file.write("Hello")
in append mode after each write the cursor move to new line, if you want to use w mode you should add \n characters at the end of the write() function:
the_file.write("Hello\n")

If you want to avoid using write() or writelines() and joining the strings with a newline yourself, you can pass all of your lines to print(), and the newline delimiter and your file handle as keyword arguments. This snippet assumes your strings do not have trailing newlines.
print(line1, line2, sep="\n", file=f)
You don't need to put a special newline character is needed at the end, because print() does that for you.
If you have an arbitrary number of lines in a list, you can use list expansion to pass them all to print().
lines = ["The Quick Brown Fox", "Lorem Ipsum"]
print(*lines, sep="\n", file=f)
It is OK to use "\n" as the separator on Windows, because print() will also automatically convert it to a Windows CRLF newline ("\r\n").

One can also use the io module as in:
import io
my_string = "hi there"
with io.open("output_file.txt", mode='w', encoding='utf-8') as f:
f.write(my_string)

If you want to insert items in a list with a format per line, a way to start could be:
with open('somefile.txt', 'a') as the_file:
for item in items:
the_file.write(f"{item}\n")

When I need to write new lines a lot, I define a lambda that uses a print function:
out = open(file_name, 'w')
fwl = lambda *x, **y: print(*x, **y, file=out) # FileWriteLine
fwl('Hi')
This approach has the benefit that it can utilize all the features that are available with the print function.
Update: As is mentioned by Georgy in the comment section, it is possible to improve this idea further with the partial function:
from functools import partial
fwl = partial(print, file=out)
IMHO, this is a more functional and less cryptic approach.

To write text in a file in the flask can be used:
filehandle = open("text.txt", "w")
filebuffer = ["hi","welcome","yes yes welcome"]
filehandle.writelines(filebuffer)
filehandle.close()

You can also try filewriter
pip install filewriter
from filewriter import Writer
Writer(filename='my_file', ext='txt') << ["row 1 hi there", "row 2"]
Writes into my_file.txt
Takes an iterable or an object with __str__ support.

with open('sample.txt', 'a') as f:
f.write('Hello')
f.write('\n')
Insert f.write('\n') at the end

since others have answered how to do it, I'll answer how it happens line by line.
with FileOpenerCM('file.txt') as fp: # is equal to "with open('file.txt') as fp:"
fp.write('dummy text')
this is a so-called context manager, anything that comes with a with block is a context manager. so let's see how this happens under the hood.
class FileOpenerCM:
def __init__(self, file, mode='w'):
self.file = open(file, mode)
def __enter__(self):
return self.file
def __exit__(self, exc_type, exc_value, exc_traceback):
self.file.close()
the first method __init__ is (as you all know) the initialization method of an object. whenever an object is created obj.__init__ is definitely called. and that's the place where you put your all the init kinda code.
the second method __enter__ is a bit interesting. some of you might not have seen it because it is a specific method for context managers. what it returns is the value to be assigned to the variable after the as keyword. in our case, fp.
the last method is the method to run after an error is captured or if the code exits the with block. exc_type, exc_value, exc_traceback variables are the variables that hold the values of the errors that occurred inside with block. for example,
exc_type: TypeError
exc_value: unsupported operand type(s) for +: 'int' and 'str
exc_traceback: <traceback object at 0x6af8ee10bc4d>
from the first two variables, you can get info enough info about the error. honestly, I don't know the use of the third variable, but for me, the first two are enough. if you want to do more research on context managers surely you can do it and note that writing classes are not the only way to write context managers. with contextlib you can write context managers through functions(actually generators) as well. it's totally up to you to have a look at it. you can surely try
generator functions with contextlib but as I see classes are much cleaner.

Related

how do I write the output of this code in a different file in python [duplicate]

How do I write a line to a file in modern Python? I heard that this is deprecated:
print >>f, "hi there"
Also, does "\n" work on all platforms, or should I use "\r\n" on Windows?
This should be as simple as:
with open('somefile.txt', 'a') as the_file:
the_file.write('Hello\n')
From The Documentation:
Do not use os.linesep as a line terminator when writing files opened in text mode (the default); use a single '\n' instead, on all platforms.
Some useful reading:
The with statement
open()
'a' is for append, or use
'w' to write with truncation
os (particularly os.linesep)
You should use the print() function which is available since Python 2.6+
from __future__ import print_function # Only needed for Python 2
print("hi there", file=f)
For Python 3 you don't need the import, since the print() function is the default.
The alternative in Python 3 would be to use:
with open('myfile', 'w') as f:
f.write('hi there\n') # python will convert \n to os.linesep
Quoting from Python documentation regarding newlines:
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
See also: Reading and Writing Files - The Python Tutorial
The python docs recommend this way:
with open('file_to_write', 'w') as f:
f.write('file contents\n')
So this is the way I usually do it :)
Statement from docs.python.org:
It is good practice to use the 'with' keyword when dealing with file
objects. This has the advantage that the file is properly closed after
its suite finishes, even if an exception is raised on the way. It is
also much shorter than writing equivalent try-finally blocks.
Regarding os.linesep:
Here is an exact unedited Python 2.7.1 interpreter session on Windows:
Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.linesep
'\r\n'
>>> f = open('myfile','w')
>>> f.write('hi there\n')
>>> f.write('hi there' + os.linesep) # same result as previous line ?????????
>>> f.close()
>>> open('myfile', 'rb').read()
'hi there\r\nhi there\r\r\n'
>>>
On Windows:
As expected, os.linesep does NOT produce the same outcome as '\n'. There is no way that it could produce the same outcome. 'hi there' + os.linesep is equivalent to 'hi there\r\n', which is NOT equivalent to 'hi there\n'.
It's this simple: use \n which will be translated automatically to os.linesep. And it's been that simple ever since the first port of Python to Windows.
There is no point in using os.linesep on non-Windows systems, and it produces wrong results on Windows.
DO NOT USE os.linesep!
I do not think there is a "correct" way.
I would use:
with open('myfile', 'a') as f:
f.write('hi there\n')
In memoriam Tim Toady.
In Python 3 it is a function, but in Python 2 you can add this to the top of the source file:
from __future__ import print_function
Then you do
print("hi there", file=f)
If you are writing a lot of data and speed is a concern you should probably go with f.write(...). I did a quick speed comparison and it was considerably faster than print(..., file=f) when performing a large number of writes.
import time
start = start = time.time()
with open("test.txt", 'w') as f:
for i in range(10000000):
# print('This is a speed test', file=f)
# f.write('This is a speed test\n')
end = time.time()
print(end - start)
On average write finished in 2.45s on my machine, whereas print took about 4 times as long (9.76s). That being said, in most real-world scenarios this will not be an issue.
If you choose to go with print(..., file=f) you will probably find that you'll want to suppress the newline from time to time, or replace it with something else. This can be done by setting the optional end parameter, e.g.;
with open("test", 'w') as f:
print('Foo1,', file=f, end='')
print('Foo2,', file=f, end='')
print('Foo3', file=f)
Whichever way you choose I'd suggest using with since it makes the code much easier to read.
Update: This difference in performance is explained by the fact that write is highly buffered and returns before any writes to disk actually take place (see this answer), whereas print (probably) uses line buffering. A simple test for this would be to check performance for long writes as well, where the disadvantages (in terms of speed) for line buffering would be less pronounced.
start = start = time.time()
long_line = 'This is a speed test' * 100
with open("test.txt", 'w') as f:
for i in range(1000000):
# print(long_line, file=f)
# f.write(long_line + '\n')
end = time.time()
print(end - start, "s")
The performance difference now becomes much less pronounced, with an average time of 2.20s for write and 3.10s for print. If you need to concatenate a bunch of strings to get this loooong line performance will suffer, so use-cases where print would be more efficient are a bit rare.
Since 3.5 you can also use the pathlib for that purpose:
Path.write_text(data, encoding=None, errors=None)
Open the file pointed to in text mode, write data to it, and close the file:
import pathlib
pathlib.Path('textfile.txt').write_text('content')
When you said Line it means some serialized characters which are ended to '\n' characters. Line should be last at some point so we should consider '\n' at the end of each line. Here is solution:
with open('YOURFILE.txt', 'a') as the_file:
the_file.write("Hello")
in append mode after each write the cursor move to new line, if you want to use w mode you should add \n characters at the end of the write() function:
the_file.write("Hello\n")
If you want to avoid using write() or writelines() and joining the strings with a newline yourself, you can pass all of your lines to print(), and the newline delimiter and your file handle as keyword arguments. This snippet assumes your strings do not have trailing newlines.
print(line1, line2, sep="\n", file=f)
You don't need to put a special newline character is needed at the end, because print() does that for you.
If you have an arbitrary number of lines in a list, you can use list expansion to pass them all to print().
lines = ["The Quick Brown Fox", "Lorem Ipsum"]
print(*lines, sep="\n", file=f)
It is OK to use "\n" as the separator on Windows, because print() will also automatically convert it to a Windows CRLF newline ("\r\n").
One can also use the io module as in:
import io
my_string = "hi there"
with io.open("output_file.txt", mode='w', encoding='utf-8') as f:
f.write(my_string)
If you want to insert items in a list with a format per line, a way to start could be:
with open('somefile.txt', 'a') as the_file:
for item in items:
the_file.write(f"{item}\n")
When I need to write new lines a lot, I define a lambda that uses a print function:
out = open(file_name, 'w')
fwl = lambda *x, **y: print(*x, **y, file=out) # FileWriteLine
fwl('Hi')
This approach has the benefit that it can utilize all the features that are available with the print function.
Update: As is mentioned by Georgy in the comment section, it is possible to improve this idea further with the partial function:
from functools import partial
fwl = partial(print, file=out)
IMHO, this is a more functional and less cryptic approach.
To write text in a file in the flask can be used:
filehandle = open("text.txt", "w")
filebuffer = ["hi","welcome","yes yes welcome"]
filehandle.writelines(filebuffer)
filehandle.close()
You can also try filewriter
pip install filewriter
from filewriter import Writer
Writer(filename='my_file', ext='txt') << ["row 1 hi there", "row 2"]
Writes into my_file.txt
Takes an iterable or an object with __str__ support.
with open('sample.txt', 'a') as f:
f.write('Hello')
f.write('\n')
Insert f.write('\n') at the end
since others have answered how to do it, I'll answer how it happens line by line.
with FileOpenerCM('file.txt') as fp: # is equal to "with open('file.txt') as fp:"
fp.write('dummy text')
this is a so-called context manager, anything that comes with a with block is a context manager. so let's see how this happens under the hood.
class FileOpenerCM:
def __init__(self, file, mode='w'):
self.file = open(file, mode)
def __enter__(self):
return self.file
def __exit__(self, exc_type, exc_value, exc_traceback):
self.file.close()
the first method __init__ is (as you all know) the initialization method of an object. whenever an object is created obj.__init__ is definitely called. and that's the place where you put your all the init kinda code.
the second method __enter__ is a bit interesting. some of you might not have seen it because it is a specific method for context managers. what it returns is the value to be assigned to the variable after the as keyword. in our case, fp.
the last method is the method to run after an error is captured or if the code exits the with block. exc_type, exc_value, exc_traceback variables are the variables that hold the values of the errors that occurred inside with block. for example,
exc_type: TypeError
exc_value: unsupported operand type(s) for +: 'int' and 'str
exc_traceback: <traceback object at 0x6af8ee10bc4d>
from the first two variables, you can get info enough info about the error. honestly, I don't know the use of the third variable, but for me, the first two are enough. if you want to do more research on context managers surely you can do it and note that writing classes are not the only way to write context managers. with contextlib you can write context managers through functions(actually generators) as well. it's totally up to you to have a look at it. you can surely try
generator functions with contextlib but as I see classes are much cleaner.

Notepad ++ Python code to add a value to numbers matching a pattern

I have a notepad++ file with contents like below
{s:11:"wpseo_title";s:42:"Web Designing training institutes in Kochi";}i:357;a:1:
{s:11:"wpseo_title";s:32:"CSS training institutes in Kochi";}i:358;a:1:
{s:11:"wpseo_title";s:34:"HTML5 training institutes in Kochi";}i:359;a:1:
{s:11:"wpseo_title";s:39:"JavaScript training institutes in Kochi";}i:360;a:1:
{s:11:"wpseo_title";s:32:"XML training institutes in Kochi";}}}
I need a way to search for the phrase ";s:42:" and increment the number part of the phrase by 1. In this case, 42 will become 43.
I just need to do it. Dont care if it is through a python script like this
Notepad++ Regular Expression add up numbers
or any other method.
Please help me. I am new to python/ any such language.
Perl one-liner version:
perl -ne 's/(?<=;s:)(\d+)(?=:)/$1+1/ge; print' data.txt
Using your link as an example, the regex match should be:
def calculate(linenumber, match):
editor.pyreplace(match.group(0), ';s:%d="%d"' % (match.group(1), str(int(match.group(2))+1)), 0, 0, linenumber, linenumber)
editor.pysearch(r';s:([0-9])="([0-9]+)"', calculate)
I think. I've never actually done this before!
target_file = "some_file.txt"
with open("tmp_out.txt","w") as f_out:
with open(target_file) as f_in:
for line in f_in:
f_out.write(re.sub(";s:(\d+):",lambda match:";s:%d:"%(int(match.groups(0)[0])+1,line) + "\n")
shutil.move("tmp_out.txt",target_file)
something like that I think
or even better
import fileinput
for line in fileinput.input(target_file, inplace=True):
print re.sub(";s:(\d+):",lambda match:";s:%d:"%(int(match.groups(0)[0])+1,line)
Though, one liner in Python is difficult to achieve, but you can use something similar using the callback feature of re.sub
repl = lambda e: e.group(1) + str(int(e.group(2))+1)
with open("in.txt") as fin:
with open("out.txt","w") as fout:
fout.write(re.sub(r"(;s:)(\d+):",repl, fin.read()))

File reading and regex - Python

I read a file which has a line : Fixes: Saurabh Likes python
I want to remove the Fixes: part of above line. I am employing regex for that
but the snippet below returns output like
Saurabh Likes python\r
I am wondering where \r is coming from. I tried all strip options for removing it like rstrip(), lstrip(), etc. But nothing worked. Could anybody suggest me the way to get rid of \r.
patternFixes ='\s*'+'Fixes'+':'+'\s*'
matchFixes= re.search(patternFixes,line, re.IGNORECASE)
if matchFixes:
patternCompiled = re.compile(patternFixes)
line=patternCompiled.sub("", line)
#line=line.lstrip()
relevantInfo = relevantInfo+line
continue
Thanks in advance!
-Saurabh
Suggestion to get rid of \r:
I suppose you have opened your file using open(filename). Following the manual of open:
If mode is omitted, it defaults to 'r'. ... In addition to the
standard fopen() values mode may be 'U' or 'rU'. Python is usually
built with universal newlines support; supplying 'U' opens the file as
a text file, but lines may be terminated by any of the following: the
Unix end-of-line convention '\n', the Macintosh convention '\r', or
the Windows convention '\r\n'. All of these external representations
are seen as '\n' by the Python program.
So, in short, please try to open your file using 'rU' and see if the \r vanishes:
with open(filename, "rU") as f:
# do your stuff here.
...
Does the \r vanish in your output?
Of course your code looks rather clunky, but other have already commented on this part.
You probably opened the file in binary mode (open(filename, "rb") or something like that). Don't do this if you're working with text files.
Use open(filename) instead. Now Python will automatically normalize all newlines to \n, regardless of the current platform.
Also, why not simply patternFixes = r'\s*Fixes:\s*'? Why all the +es?
Then, you're doing a lot of unnecessary stuff like recompiling a regex over and over.
So, my suggestion (which does the same thing as your code (plus the file handling):
r = re.compile(r'\s*Fixes:\s*')
with open(filename) as infile:
relevantInfo = "".join(r.sub("", line) for line in infile if "Fixes:" in line)
>>> import re
>>> re.sub('Fixes:\s*', '', 'Fixes: Saurabh Likes python')
'Saurabh Likes python'
No '\r'
>>> re.sub('\s*'+'Fixes'+':'+'\s*', '', 'Fixes: Saurabh Likes python')
'Saurabh Likes python'
No '\r' again
can you provide more details on how to reproduce?
EDIt cannot reproduce with your code neither
>>> line = 'Fixes: Saurabh Likes python'
>>> patternFixes ='\s*'+'Fixes'+':'+'\s*'
>>> matchFixes= re.search(patternFixes,line, re.IGNORECASE)
>>> if matchFixes:
... patternCompiled = re.compile(patternFixes)
... line=patternCompiled.sub("", line)
... print line
... line=line.lstrip()
... print line
...
Saurabh Likes python
Saurabh Likes python
>>>
The '\r' is a carriage return -- http://en.wikipedia.org/wiki/Carriage_return, and it's being picked up from your file.
I will note that if all the lines you need to 'fix' actually DO start with "Fixes: " and that's all you want to change, you could just do something like:
line = line[line.find('Fixes: ')+7:-1]
Saves you all the regex stuff. Not sure on performance, though. And this SHOULD kill your '\r's at the same time.

Replace part of string using python regular expression

I have the following lines (many, many):
...
gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567
..
What I'd like to do is to find line 'particular' (whatever number is after ':')
and replace this number with '111222333'. How can I do that using python regular expressions ?
for line in input:
key, val = line.split(':')
if key == 'particular':
val = '111222333'
I'm not sure regex would be of any value in this specific case. My guess is they'd be slower. That said, it can be done. Here's one way:
for line in input:
re.sub('^particular : .*', 'particular : 111222333')
There are subtleties involved in this, and this is almost certainly not what you'd want in production code. You need to check all of the re module constants to make sure the regex is acting the way you expect, etc. You might be surprised at the flexibility you find in dealing with problems like this in Python if you try not to use re (of course, this isn't to say re isn't useful) ;-)
Sure you need a regular expression?
other_number = '111222333'
some_text, some_number = line.split(': ')
new_line = ': '.join(some_text, other_number)
#!/usr/bin/env python
import re
text = '''gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567'''
print(re.sub('[0-9]+', '111222333', text))
input = """gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567"""
entries = re.split("\n+", input)
for entry in entries:
if entry.startswith("particular"):
entry = re.sub(r'[0-9]+', r'111222333', entry)
or with sed:
sed -e 's/^particular: [0-9].*$/particular: 111222333/g' file
An important point here is that if you have a lot of lines, you want to process them one by one. That is, instead of reading all the lines in replacing them, and writing them out again, you should read in a line at a time and write out a line at a time. (This would be inefficient if you were actually reading a line at a time from the disk; however, Python's IO is competent and will buffer the file for you.)
with open(...) as infile, open(...) as outfile:
for line in infile:
if line.startswith("particular"):
outfile.write("particular: 111222333")
else:
outfile.write(line)
This will be speed- and memory-efficient.
Your sed example forces me to say neat!
python -c "import re, sys; print ''.join(re.sub(r'^(particular:) \d+', r'\1 111222333', l) for l in open(sys.argv[1]))" file

How can I detect DOS line breaks in a file?

I have a bunch of files. Some are Unix line endings, many are DOS. I'd like to test each file to see if if is dos formatted, before I switch the line endings.
How would I do this? Is there a flag I can test for? Something similar?
Python can automatically detect what newline convention is used in a file, thanks to the "universal newline mode" (U), and you can access Python's guess through the newlines attribute of file objects:
f = open('myfile.txt', 'U')
f.readline() # Reads a line
# The following now contains the newline ending of the first line:
# It can be "\r\n" (Windows), "\n" (Unix), "\r" (Mac OS pre-OS X).
# If no newline is found, it contains None.
print repr(f.newlines)
This gives the newline ending of the first line (Unix, DOS, etc.), if any.
As John M. pointed out, if by any chance you have a pathological file that uses more than one newline coding, f.newlines is a tuple with all the newline codings found so far, after reading many lines.
Reference: http://docs.python.org/2/library/functions.html#open
If you just want to convert a file, you can simply do:
with open('myfile.txt', 'U') as infile:
text = infile.read() # Automatic ("Universal read") conversion of newlines to "\n"
with open('myfile.txt', 'w') as outfile:
outfile.write(text) # Writes newlines for the platform running the program
You could search the string for \r\n. That's DOS style line ending.
EDIT: Take a look at this
(Python 2 only:) If you just want to read text files, either DOS or Unix-formatted, this works:
print open('myfile.txt', 'U').read()
That is, Python's "universal" file reader will automatically use all the different end of line markers, translating them to "\n".
http://docs.python.org/library/functions.html#open
(Thanks handle!)
As a complete Python newbie & just for fun, I tried to find some minimalistic way of checking this for one file. This seems to work:
if "\r\n" in open("/path/file.txt","rb").read():
print "DOS line endings found"
Edit: simplified as per John Machin's comment (no need to use regular expressions).
dos linebreaks are \r\n, unix only \n. So just search for \r\n.
Using grep & bash:
grep -c -m 1 $'\r$' file
echo $'\r\n\r\n' | grep -c $'\r$' # test
echo $'\r\n\r\n' | grep -c -m 1 $'\r$'
You can use the following function (which should work in Python 2 and Python 3) to get the newline representation used in an existing text file. All three possible kinds are recognized. The function reads the file only up to the first newline to decide. This is faster and less memory consuming when you have larger text files, but it does not detect mixed newline endings.
In Python 3, you can then pass the output of this function to the newline parameter of the open function when writing the file. This way you can alter the context of a text file without changing its newline representation.
def get_newline(filename):
with open(filename, "rb") as f:
while True:
c = f.read(1)
if not c or c == b'\n':
break
if c == b'\r':
if f.read(1) == b'\n':
return '\r\n'
return '\r'
return '\n'

Categories