Convert \r text to \n so readlines() works as intended

In Python, you can read a file and load its lines into a list like this:
f = open('file.txt','r')
lines = f.readlines()
Each line is delimited by \n, but if a line contains \r, that \r is not treated as a line break. I need to convert every \r to \n and get the correct list of lines.
If I call .split('\r') on each line, I get lists inside the list.
I thought about opening the file, replacing every \r with \n, closing it, reading it in again and then calling readlines(), but that seems wasteful.
How should I implement this?

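f = open('file.txt','rU')
This opens the file with Python 2's universal newline support, so \r is treated as an end-of-line. In Python 3 the 'U' flag is deprecated and was removed in Python 3.11; plain text mode (open('file.txt', 'r')) already uses universal newlines, because newline=None is the default.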
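A quick way to see universal newlines at work in Python 3: the sketch below writes a small temporary file (hypothetical sample data) that mixes \r, \r\n and \n endings, then reads it back in ordinary text mode with readlines().

```python
import os
import tempfile

# Hypothetical sample file mixing all three line-ending styles
data = b"mac line\rwindows line\r\nunix line\n"
path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(path, "wb") as f:
    f.write(data)

# Text mode defaults to newline=None (universal newlines):
# \r, \r\n and \n are all translated to \n while reading.
with open(path, "r") as f:
    lines = f.readlines()

print(lines)  # ['mac line\n', 'windows line\n', 'unix line\n']
```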

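If relying on universal newline mode is a concern, open the file in binary mode and convert the line endings yourself:
from __future__ import with_statement  # only required on Python 2.5; harmless later
with open(filename, "rb") as f:
    # Binary mode returns bytes in Python 3, so the literals need a b prefix
    s = f.read().replace(b'\r\n', b'\n').replace(b'\r', b'\n')
lines = s.split(b'\n')  # list of bytes; decode them if you need str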
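If you only need the list of lines, bytes.splitlines() (and str.splitlines()) already break at \r, \r\n and \n, and unlike split(b'\n') they don't leave an empty last element when the data ends with a newline. A small comparison on in-memory sample data:

```python
# Sample data standing in for f.read() on a file with mixed line endings
s = b"first\rsecond\r\nthird\n"

# The replace/split approach from the answer above
via_replace = s.replace(b'\r\n', b'\n').replace(b'\r', b'\n').split(b'\n')

# bytes.splitlines() treats \r, \r\n and \n as line boundaries
via_splitlines = s.splitlines()

print(via_replace)     # [b'first', b'second', b'third', b'']
print(via_splitlines)  # [b'first', b'second', b'third']
```

Note that str.splitlines() also splits on a few other characters, such as \x0c and \u2028, which may matter for unusual data.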

Related

replace ^M(control M character) in a text file in python

The file is like this:
This line has control character ^M this is bad
I will try it
I want to remove control M characters in the file, and create a new file like this using Python
This line has control character this is bad
I will try it
I tried the methods I found on Stack Overflow, like this:
line.replace("\r", "r")
and
line.replace("\r\n", "r")
Here is part of the code snippet:
with open(file_path, "r") as input_file:
    lines = input_file.readlines()
new_lines = []
for line in lines:
    new_line = line.replace("\r", "")
    new_lines.append(new_line)
new_file_name = "replace_control_char.dat"
new_file_path = os.path.join(here, data_dir, new_file_name)
with open(new_file_path, "w") as output_file:
    for line in new_lines:
        output_file.write(line)
However, the new file I got is:
This line has control character
this is bad
I will try it
"This line has control character" and " this is bad" are not on the same line. I expected that removing the control-M character would put these two phrases on the same line.
Can someone help me solve this issue?
Thanks,
Arthur

You cannot rely on text mode in that case.
Python's universal newlines mode, which text mode uses by default, treats \r, \r\n and \n all as line terminators (\r\n is the Windows convention, and \r alone was the classic Mac OS one) and translates them to \n. So the stray \r in the middle of your line becomes a line break, which destroys the information you need.
Because universal newlines are the default, this code fails on Unix/Linux as well; Python behaves the same on all platforms. As the open() documentation puts it:
"Python doesn't depend on the underlying operating system's notion of text files; all the processing is done by Python itself, and is therefore platform-independent."
If you want to remove those characters, you have to use binary mode:
with open(file_path, "rb") as input_file:
    contents = input_file.read().replace(b"\r", b"")
with open(file_path, "wb") as output_file:
    output_file.write(contents)
That code removes every \r character, including the ones in \r\n line terminators. That works, but if your aim is only to remove stray \r characters and preserve line endings, another method is required.
One way to do it is with a regular expression, which works on binary data (bytes) as well:
import re
contents = re.sub(rb"\r([^\n])", rb"\1", contents)
That regular expression removes a \r only when it is followed by a character other than \n, effectively preserving Windows CR+LF end-of-line sequences.
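A variant worth considering (an alternative, not the regex from the answer): the negative lookahead \r(?!\n) does not consume the character after \r, so it also removes a \r at the very end of the data and every \r in a run like \r\r, both of which the \r([^\n]) pattern can miss. A minimal sketch on sample bytes modeled on the question's example:

```python
import re

def strip_stray_cr(contents: bytes) -> bytes:
    # Remove every \r that is NOT immediately followed by \n.
    # The lookahead (?!\n) checks the next byte without consuming it,
    # so b"\r\r" and a trailing b"\r" are handled too.
    return re.sub(rb"\r(?!\n)", b"", contents)

sample = b"This line has control character \r this is bad\r\nI will try it\r\n"
print(strip_stray_cr(sample))
# b'This line has control character  this is bad\r\nI will try it\r\n'
```

The result has two spaces where the \r was, because the question's text has a space on each side of the ^M.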

to change a text file containing multiline strings

I have a text file consisting of multiline strings (hundreds of lines, actually). Each string starts with an '&' sign. I want to change the file so that only the first 300 characters of each string remain in the new file. How can I do this using Python?

You can read the file and loop over its lines to do what you want. Strings are easily sliceable in Python, so you can take the first 300 characters of each line and write them to another file.
with open(path, "r") as file, open(newPath, "w") as newFile:
    for line in file:
        newLine = line[:300]  # first 300 characters of the line
        newFile.write(newLine)
Hope this is what you meant

You could do something like this:
# Open output file in append mode
with open('output.txt', 'a') as out_file:
    # Open input file in read mode
    with open("input.txt", "r") as in_file:
        for line in in_file:
            # Take the first 300 characters from the line
            # (slicing also works when the line is shorter than 300 characters)
            new_line = line[0:300]
            # Write the new line to the output
            # (if the line was longer than 300 characters, its '\n' was cut off,
            # so you may need to add '\n' back)
            out_file.write(new_line)
            print(new_line)
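If you want to keep the line structure intact when a line is longer than 300 characters, one option is to slice only the content and put the newline back afterwards. A small helper sketch (the name first_n_chars is hypothetical):

```python
def first_n_chars(line: str, n: int = 300) -> str:
    """Return at most n characters of the line's content, keeping its newline."""
    had_newline = line.endswith("\n")
    body = line.rstrip("\n")[:n]  # slice the content only, not the newline
    return body + "\n" if had_newline else body

print(repr(first_n_chars("abcdef\n", n=3)))  # 'abc\n'
print(repr(first_n_chars("ab\n", n=3)))      # 'ab\n'
```

Inside the loop above, out_file.write(first_n_chars(line)) would then replace the plain slice.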

You can use the string method split to split the file contents on '&', then use slices to keep only the first 300 characters of each piece.
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    # [1:] skips the empty piece before the first '&' (the file starts with '&')
    for line in old_file.read().split("&")[1:]:
        new_file.write("&{}\n".format(line[:300]))
This version preserves the line endings (\n) within your strings.
If you want to remove the line endings inside each string, you can use replace:
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    for line in old_file.read().split("&")[1:]:
        new_file.write("&{}\n".format(line.replace("\n", "")[:300]))
Note that your new file will end with an empty line.
Also, depending on the size of your file, you may prefer a generator-based version: split() loads the whole file content into memory as a list of strings.
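A generator-based version could look like the sketch below. It assumes (hypothetically) that every string begins with '&' at the start of a line, so the file can be streamed line by line and only one string is held in memory at a time; the function names are illustrative.

```python
import io
from typing import Iterable, Iterator, TextIO

def records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one multiline string at a time.

    Assumes every string begins with '&' at the start of a line;
    any text before the first '&' is yielded as its own record.
    """
    current = []
    for line in lines:
        if line.startswith("&") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)

def truncate_records(src: TextIO, dst: TextIO, limit: int = 300) -> None:
    for record in records(src):
        # Drop the string's internal newlines, keep its first `limit` characters
        dst.write(record.replace("\n", "")[:limit] + "\n")

# Usage with in-memory files; real code would pass open(...) file objects
src = io.StringIO("&first\nstring\n&second\n")
dst = io.StringIO()
truncate_records(src, dst, limit=5)
print(repr(dst.getvalue()))  # '&firs\n&seco\n'
```

Here the limit counts the leading '&' as part of the string; adjust the slice if it should not.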

Can't open csv file in python without opening it in excel

I have a .csv file generated by a program. When I try to open it with the following code, the output makes no sense, even though the same code works fine on a CSV that wasn't generated by the program.
g = 'datos/1.81/IR20211103_2275.csv'
f = open(g, "r", newline = "")
f = f.readlines()
print(f)
The output of the code looks like this
['ÿþA\x00l\x00l\x00 \x00t\x00e\x00m\x00p\x00e\x00r\x00a\x00t\x00u\x00r\x00e\x00s\x00 \x00i\x00n\x00 \x00°\x00F\x00.\x00\r',
'\x00\n',
'\x00\r',
'\x00\n',
'\x00D\x00:\x00\\\x00O\x00n\x00e\x00D\x00r\x00i\x00v\x00e\x00\\\x00M\x00A\x00E\x00S\x00T\x00R\x00I\x00A\x00 \x00I\x00M\x00E\x00C\x00\\\x00T\x00e\x00s\x00i\x00s\x00\\\x00d\x00a\x00t\x00o\x00s\x00\\\x001\x00.\x008\x001\x00\\\x00I\x00R\x002\x000\x002\x001\x001\x001\x000\x003\x00_\x002\x002\x007\x005\x00.\x00i\x00s\x002\x00\r',
However, when I first open the file with excel and save it as a .csv (replacing the original with the .csv from excel), the output is as expected, like this:
['All temperatures in °F.\r\n',
'\r\n',
'D:\\OneDrive\\MAESTRIA IMEC\\Tesis\\datos\\1.81\\IR20211103_2275.is2\r\n',
'\r\n',
'",1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,"\r\n',
I have also tried csv.reader(), and it doesn't work either.
Does anyone know what's going on and how I can solve it? How can I open my .csv without first opening and saving it in Excel? The source program is SmartView from Fluke, which reads thermal image files (.is2) and converts them into .csv files.
Thank you very much

Your file is encoded in UTF-16 (little-endian byte order; the ÿþ at the start is the byte order mark FF FE decoded with a single-byte encoding). You can specify the file encoding with the encoding argument of open(); the list of standard encodings and their names is in the codecs module documentation.
Also, I'd recommend not using .readlines(), as it keeps the trailing newline characters. You can read the whole file content into a string (using .read()) and apply str.splitlines() to ... split the string into a list of lines. Alternatively, you can consume the file line by line and call str.rstrip() to cut the trailing newline characters.
Final code:
filename = "datos/1.81/IR20211103_2275.csv"
with open(filename, encoding="utf16") as f:
    lines = f.read().splitlines()
# OR, reading line by line (reopen the file: after .read() it is exhausted)
with open(filename, encoding="utf16") as f:
    lines = [line.rstrip() for line in f]

g = 'datos/1.81/IR20211103_2275.csv'
f = open(g, "r", newline="", encoding="utf-16")
f = f.readlines()
print(f)
Try this; it may help.
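Since the question also mentions csv.reader(), here is a sketch of the same fix with the csv module. The file below is a hypothetical stand-in for the SmartView export: Python's "utf-16" codec writes a byte order mark and, when reading, uses the BOM to pick the byte order. newline="" is the setting the csv module documentation recommends for files passed to csv.reader.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the exported file, written as UTF-16 with a BOM
path = os.path.join(tempfile.mkdtemp(), "export.csv")
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write("All temperatures in °F.\r\n\r\n1,2,3\r\n")

# Decode with the right encoding; csv.reader handles the \r\n endings
with open(path, encoding="utf-16", newline="") as f:
    rows = list(csv.reader(f))

print(rows)  # [['All temperatures in °F.'], [], ['1', '2', '3']]
```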

Replacing \n while keeping \r\n intact

I have a huge CSV file (196,244 lines) that contains \n characters in places other than line endings. I want to remove those \n characters but keep \r\n intact.
I've tried line.replace, but it seems not to recognize \r\n, so next I tried a regex:
with open(filetoread, "r") as inf:
    with open(filetowrite, "w") as fixed:
        for line in inf:
            line = re.sub("(?<!\r)\n", " ", line)
            fixed.write(line)
But it doesn't keep \r\n; it removes every line break. I can't do it in Notepad++ either, because it crashes on this file.

You are not exposing the line breaks to the regex engine: when you use open in r mode, the line breaks are "normalized" to LF, so the \r never reaches re.sub. To keep them all in the input, read the file in binary mode using b. Then you need to remember to also use the b prefix with the regex pattern and the replacement.
You can use
import re
with open(filetoread, "rb") as inf:
    with open(filetowrite, "wb") as fixed:
        fixed.write(re.sub(b"(?<!\r)\n", b" ", inf.read()))
Now the whole file is read into a single bytes object (with inf.read()), so the line breaks are matched and, where needed, replaced.
Pay attention to:
- "rb" when reading the file in
- "wb" when writing the file out
- re.sub(b"(?<!\r)\n", b" ", inf.read()) uses the b prefix on the string literals, and inf.read() reads the file contents into a single value.
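To see what the lookbehind does on its own, here is the same substitution wrapped in a small function (the name join_stray_lf is hypothetical) and applied to sample bytes rather than a file:

```python
import re

def join_stray_lf(data: bytes) -> bytes:
    # Replace every \n that is not preceded by \r with a space;
    # \r\n pairs survive because of the (?<!\r) lookbehind.
    return re.sub(rb"(?<!\r)\n", b" ", data)

print(join_stray_lf(b"foo\nbar\r\nbaz\r\n"))  # b'foo bar\r\nbaz\r\n'
```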

When you open a file with a plain open() call, the TextIOWrapper it returns translates all the different newline styles (\r\n, \r and \n) to plain \n.
Explicitly setting newline="\r\n" should let you read and write the newlines the way you expect:
with open(path_src, newline="\r\n") as fh_src:
    with open(path_dest, "w", newline="\r\n") as fh_dest:
        for line in fh_src:  # file-likes are iterable by lines
            if line.endswith("\r\n"):
                line = line[:-2]  # drop the real line terminator
            fh_dest.write(line.replace("\n", " "))
            fh_dest.write("\n")  # newline="\r\n" turns this into \r\n on write
Example contents:
>>> with open("test.data", "wb") as fh:
...     fh.write(b"""foo\nbar\r\nbaz\r\n""")
...
14
>>> with open("test.data", newline="\r\n") as fh:
...     for line in fh:
...         print(repr(line))
...
'foo\nbar\r\n'
'baz\r\n'
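One subtlety when newline="\r\n" is used on an output file: per the open() documentation, every "\n" written is translated to "\r\n". That includes the "\n" inside a literal "\r\n", so writing "\r\n" yourself ends up as \r\r\n on disk. A short check with a temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.data")

with open(path, "w", newline="\r\n") as fh:
    fh.write("a\r\n")  # the "\n" in "\r\n" is translated as well
    fh.write("b\n")    # a plain "\n" becomes exactly one "\r\n"

with open(path, "rb") as fh:
    data = fh.read()

print(data)  # b'a\r\r\nb\r\n'
```

Writing a plain "\n", or opening the output with newline="" and writing "\r\n" explicitly, avoids the doubled \r.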

How to write to a file with newline characters and avoid empty lines

I'm trying to write encoded data to a file and separate each run with a newline character. However, when I do this there is an empty line between runs, as shown below.
Using .rstrip()/.strip() only really works when reading the file -- and obviously this cannot be used directly when writing to the file as it would write all the data to a single line.
cFile = open('compFile', 'w')
for i in range(num_lines):
    line = validLine()
    cFile.write(line + "\n")
cFile.close()
cFile = open('compFile', 'r')
for line in cFile:
    print(line)
# Empty space output:
023

034

045

# Desired output:
023
034
045

I think you've already done what you want; have a look at your text file.
Be aware that Python also reads the \n at the end of each line, and that print() adds a newline after the printed text.
In your case that means your file should look like
023\n
034\n
045\n
When printing, you first read 023\n, and then print() appends its own \n to the line.
That gives you the 023\n\n you see in your console. But the file contains what you want.
If you just want to print without a line break, you can use
import sys
sys.stdout.write('.')
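To check what is actually in the file, rather than what print() shows, you can read it back and look at its repr(). A sketch with hypothetical stand-in values in place of validLine():

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "compFile")
values = ["023", "034", "045"]  # stand-ins for the hypothetical validLine() results

with open(path, "w") as cFile:
    for value in values:
        cFile.write(value + "\n")

with open(path) as cFile:
    contents = cFile.read()

print(repr(contents))  # '023\n034\n045\n' (one newline per line, no blank lines)
```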

You could use
for i in range(num_lines):
    line = validLine()
    cFile.write(line.strip() + "\n")
    # strip() removes any existing newline (and surrounding whitespace) before "\n" is added
cFile.close()
Off-topic, but consider using a with statement as well.

Using .rstrip()/.strip() only really works when reading the file -- and obviously this cannot be used directly when writing to the file as it would write all the data to a single line.
This is a misconception. Using .rstrip() is exactly the correct tool if you need to write a series of strings, some of which may have a newline character attached:
with open('compFile', 'w') as cFile:
    for i in range(num_lines):
        line = validLine().rstrip("\n")  # remove possible newline
        cFile.write(line + "\n")
Note that if all your lines already have a newline attached, you don't have to add more newlines. Just write the string directly to the file, no stripping needed:
with open('compFile', 'w') as cFile:
    for i in range(num_lines):
        line = validLine()  # line with "\n" newline already present
        cFile.write(line)  # no need to add a newline anymore
Next, you are reading lines with newlines from your file and then printing them with print(). By default, print() adds another newline, so you end up with double-spaced lines: your file contains 023\n034\n045\n, but printing each line ('023\n', then '034\n', then '045\n') adds a newline after each one, so 023\n\n034\n\n045\n\n is written to stdout.
Either strip that newline when printing, or tell print() not to add a newline of its own by giving it an empty end parameter:
with open('compFile', 'r') as cFile:
    for line in cFile:
        print(line, end='')

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
