How to replace a particular character in a CSV file


I have a folder of about 100 CSV files, and every file contains an unknown character that looks like this: �. This character is supposed to be a double quote ("). Because of it, I cannot run my CSV-to-XLSX converter on the files.
I tried csv.reader(), but it returns a reader object, which has no replace method. How can I replace that character and write the corrected contents back to the CSV so that I can run my converter?
Example:
Current file contents:
"hello�
Output after conversion:
"hello"

Try this:
import fileinput
with fileinput.FileInput("file.csv", inplace=True) as file:
    for line in file:
        print(line.replace('�', '"'), end='')

The sed command is designed for this kind of work: it finds and replaces text in a file.
Run this in a terminal:
sed -i 's/old-word/new-word/g' filename.csv
Here old-word is the unknown character and new-word is the double quote.

I use this little function to deal with such problems.
The code is fairly self-explanatory: it opens the file, reads all of it (so it may not work for files larger than your RAM), then rewrites it with the patched version.
def patch_file(file, original, patch):
    # Read the whole file into memory first
    with open(file, 'r') as f:
        lines = f.readlines()
    # Rewrite the file, replacing every occurrence of `original`
    with open(file, 'w') as f:
        for line in lines:
            f.write(line.replace(original, patch))

patch_file(file='yourCSVfile.txt', original='�', patch='"')
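Since the question mentions a folder of about 100 files, the same idea can be looped over a directory. The following is only a minimal sketch: the function name patch_folder, the *.csv glob pattern, and the UTF-8 default are assumptions. It also assumes the stray character is a byte that is invalid in the chosen encoding, which Python decodes as U+FFFD (�) when errors="replace" is used.

```python
import tempfile
from pathlib import Path

def patch_folder(folder, original="\ufffd", patch='"', encoding="utf-8"):
    """Replace `original` with `patch` in every *.csv file directly inside `folder`.

    Hypothetical helper; returns the number of files that were rewritten.
    """
    changed = 0
    for path in sorted(Path(folder).glob("*.csv")):
        # errors="replace" maps undecodable bytes to U+FFFD instead of raising;
        # newline="" leaves the original line endings untouched.
        with open(path, "r", encoding=encoding, errors="replace", newline="") as f:
            text = f.read()
        patched = text.replace(original, patch)
        if patched != text:
            with open(path, "w", encoding=encoding, newline="") as f:
                f.write(patched)
            changed += 1
    return changed

# Demo on a throwaway directory with one made-up file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(b'"hello\x93\r\n')  # the lone byte 0x93 is not valid UTF-8
    n_changed = patch_folder(tmp)
    result = sample.read_bytes()
print(n_changed, result)
```

If the files are actually in a legacy encoding such as cp1252, passing that as the encoding would decode the byte to a real character instead, and original would need to be adjusted accordingly.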

Related

Can't open csv file in python without opening it in excel

I have a .csv file generated by a program. When I try to open it with the following code, the output makes no sense, even though the same code works fine with CSV files that were not generated by a program.
g = 'datos/1.81/IR20211103_2275.csv'
f = open(g, "r", newline = "")
f = f.readlines()
print(f)
The output of the code looks like this
['ÿþA\x00l\x00l\x00 \x00t\x00e\x00m\x00p\x00e\x00r\x00a\x00t\x00u\x00r\x00e\x00s\x00 \x00i\x00n\x00 \x00°\x00F\x00.\x00\r',
'\x00\n',
'\x00\r',
'\x00\n',
'\x00D\x00:\x00\\\x00O\x00n\x00e\x00D\x00r\x00i\x00v\x00e\x00\\\x00M\x00A\x00E\x00S\x00T\x00R\x00I\x00A\x00 \x00I\x00M\x00E\x00C\x00\\\x00T\x00e\x00s\x00i\x00s\x00\\\x00d\x00a\x00t\x00o\x00s\x00\\\x001\x00.\x008\x001\x00\\\x00I\x00R\x002\x000\x002\x001\x001\x001\x000\x003\x00_\x002\x002\x007\x005\x00.\x00i\x00s\x002\x00\r',
However, when I first open the file with excel and save it as a .csv (replacing the original with the .csv from excel), the output is as expected, like this:
['All temperatures in °F.\r\n',
'\r\n',
'D:\\OneDrive\\MAESTRIA IMEC\\Tesis\\datos\\1.81\\IR20211103_2275.is2\r\n',
'\r\n',
'",1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,"\r\n',
I have also tried csv.reader(), and it doesn't work either.
Does anyone know what's going on and how I can solve it? How can I open my .csv without opening and saving it in Excel first? The source program is SmartView from Fluke, which reads a thermal image file (.is2) and converts it into a .csv file.
Thank you very much.

Your file is encoded in UTF-16 (little-endian byte order). You can specify the file encoding with the encoding argument of open() (the Python documentation has a list of standard encodings and their names).
I'd also recommend not using .readlines(), as it keeps the trailing newline characters. You can read the whole file into a string (using .read()) and apply str.splitlines() to split it into a list of lines. Alternatively, you can consume the file line by line and call str.rstrip() to strip the trailing newline characters.
Final code:
filename = "datos/1.81/IR20211103_2275.csv"
with open(filename, encoding="utf16") as f:
    lines = f.read().splitlines()
    # OR (use one or the other, not both)
    # lines = [line.rstrip() for line in f]
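The ÿþ at the start of the garbled output is the UTF-16 little-endian byte order mark (bytes FF FE) as it appears when the file is read with a single-byte encoding. If files may arrive in different encodings, one option is to look at the BOM before opening. This is a minimal sketch that assumes the files carry a BOM; sniff_bom is a hypothetical helper, not part of the standard library, and the demo file is made up.

```python
import codecs
import os
import tempfile

def sniff_bom(path, default="utf-8"):
    """Guess a text encoding from the file's byte order mark, if it has one."""
    with open(path, "rb") as f:
        head = f.read(4)
    # UTF-32 is checked before UTF-16 because its LE BOM also starts with FF FE.
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, name in candidates:
        if head.startswith(bom):
            return name
    return default

# Demo: write a small UTF-16 LE file with a BOM, then detect and read it
fd, demo_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + "All temperatures in °F.\r\n".encode("utf-16-le"))
detected = sniff_bom(demo_path)
with open(demo_path, encoding=detected, newline="") as f:
    demo_lines = f.read().splitlines()
os.remove(demo_path)
print(detected, demo_lines)
```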

g = 'datos/1.81/IR20211103_2275.csv'
f = open(g, "r", newline = "",encoding="utf-16")
f = f.readlines()
print(f)
Try this; it may help.

How do I read and append to a text file in one pass?

I want to check if a string is inside a text file and then append that string if it's not there.
I know I can probably do that with two separate with statements, one for reading and another for appending, but is it possible to read and append inside the same with statement?
The closest I came up with is this:
with open("file.txt","r+") as file:
    content=file.read()
    print("aaa" in content)
    file.seek(len(content))
    file.write("\nccccc")
My file.txt:
aaaaa
bbbbb
When I run the code for the first time, I get this:
aaaaa
bbbbb
ccccc
but if I run it again, this comes up:
aaaaa
bbbbb
ccc
ccccc
I would expect the third line to be ccccc.
Can anyone explain why the last two characters are truncated in the second run? Also, how do I read and append text to a file?

Don't use seek on text files. The length of the contents is not always the length of the file. Either read the file in binary mode or use two separate with statements:
with open("file.txt","r") as file:
    content=file.read()
    print("aaa" in content)
with open("file.txt","a") as file:
    file.write("\nccccc")
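To see why the length of the decoded text can differ from the size of the file, here is a small sketch. It uses a non-ASCII character so the result is the same on every platform; on Windows, the translation of \n to \r\n in text mode produces a similar mismatch. The file contents are made up for the demonstration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# newline="" disables newline translation so the demo is platform-independent
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("20 °C\n")

with open(path, "r", encoding="utf-8", newline="") as f:
    n_chars = len(f.read())       # number of characters after decoding
n_bytes = os.path.getsize(path)   # number of bytes on disk

os.remove(path)
print(n_chars, n_bytes)  # the degree sign takes two bytes in UTF-8
```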

Use this:
with open("file.txt","r+") as file:
    content=file.read()
    file.seek(0,2)
    file.write("\nccccc")
Here we use fileObject.seek(offset[, whence]) with offset 0 and whence 2, that is, seek to 0 characters from the end, and then write to the file.
Or, alternatively, using SEEK_END:
import os
with open("file.txt", 'rb+') as file:
    file.seek(0, os.SEEK_END)
    file.write(b"\nccccc\n")
Here we seek to the end of the file (SEEK_END from the os module) and then write to it. The file is opened in binary mode, so the data written must be bytes.

Why not do this? Note that the first with statement only creates the file; it is not used for reading or appending. So this solution uses a single with to read and append.
string="aaaaa\nbbbbb"
with open("myfile", "w") as f:
    f.write(string)
with open("myfile", "r+") as f:
    if "ccccc" not in f.read():
        f.write("\nccccc")

seek() is based on bytes, and your text file may not be encoded with precisely one byte per character. Since read() reads the whole file and leaves the pointer at the end, there's no need to seek anyway. Just remove that line from your code and it will work fine.
with open("file.txt","r+") as file:
    content=file.read()
    print("aaa" in content)
    file.write("\nccccc")
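Putting the answers together for the original goal (append a string only if it is not already present), one possible sketch uses "a+" mode with a single with statement. After read() the position is at the end of the file, so no further seek is needed before writing. The helper name append_if_missing is hypothetical.

```python
import os
import tempfile

def append_if_missing(path, text):
    """Append `text` on a new line unless it already occurs in the file.

    Returns True if something was written.
    """
    with open(path, "a+") as f:
        f.seek(0)           # "a+" opens positioned at the end; rewind to read
        content = f.read()  # reading everything leaves the position at the end
        if text in content:
            return False
        f.write("\n" + text)
        return True

# Demo using the file contents from the question
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as f:
    f.write("aaaaa\nbbbbb")

first = append_if_missing(path, "ccccc")
second = append_if_missing(path, "ccccc")
with open(path) as f:
    final = f.read()
os.remove(path)
print(first, second, repr(final))
```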

How to read multiline command line arguments into a python2 CSV reader

I'm testing a Python 2 script that takes a CSV file's text as a command line argument.
The CSV is a standard Excel CSV file with ',' delimiters for items and presumably '\r\n' for line endings.
The problem is that I need to pass the CSV text as a single line to the script in order for it to recognise the string as a single argument. To do this, I open the CSV in Notepad and replace all of the new lines with '\r\n', which lets me read the text into my script successfully. However, when I try to create a csv.reader object from the string, the csv.reader only sees a single line where I want it to see multiple lines.
Given the following CSV string example:
The,quick,brown,fox\r\njumps,over,the,lazy,dog
I would expect the following lines:
The,quick,brown,fox
jumps,over,the,lazy,dog
but instead I just end up with a single line:
The,quick,brown,fox\r\njumps,over,the,lazy,dog
Once I capture the string from the command line, I use the following to load it into a csv.reader:
input_str = self.GetCsvStringFromInput()
input_reader = csv.reader(StringIO.StringIO(input_str))
I'm using Windows, so I presumed that \r\n would be correct, but I don't seem to be using the correct method.
Any ideas?
Thanks in advance!

Why don't you just read the lines directly from the CSV file? For example:
import csv
with open('D:/file-1.csv','r') as csvfile:
    creader=csv.reader(csvfile, delimiter=',')
    for line in creader:
        print line
Just replace 'print line' with your own function. If you do need to copy each line from the CSV file manually, you can split the text on '\r\n' before passing it to the reader:
for line in 'The,quick,brown,fox\r\njumps,over,the,lazy,dog'.split('\r\n'):
    print line
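One likely cause of the single-line result is that the argument contains the two-character sequences backslash-r and backslash-n literally, rather than real carriage-return and line-feed characters. Below is a minimal Python 3 sketch under that assumption (in Python 2, StringIO.StringIO would take the place of io.StringIO); the function name is hypothetical.

```python
import csv
import io

def rows_from_escaped_arg(arg):
    """Parse CSV text in which line breaks arrive as the literal text \\r\\n."""
    # Turn the literal backslash sequences into real newlines first
    text = arg.replace("\\r\\n", "\n")
    return list(csv.reader(io.StringIO(text, newline="")))

# The raw string keeps the backslashes literal, as they would arrive from the shell
rows = rows_from_escaped_arg(r"The,quick,brown,fox\r\njumps,over,the,lazy,dog")
print(rows)
```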

Multiple lines in a single Excel cell

What is the easiest way to write multiple lines into a single cell in Excel using Python? I've tried the csv module without success.
import csv
with open('xyz.csv', 'wb') as outfile:
    w = csv.writer(outfile)
    w.writerow(['stringa','string_multiline',])
Also, each of the multiline strings contains a number of characters that have special meaning in CSV files, i.e. commas.
Any help would be really appreciated.

To figure this out, I created a file in Excel with a single multiline cell.
Then I saved it as CSV and opened it up in a text editor:
"a^Mb"
It looks like Excel interprets Ctrl-M characters as newlines.
Let’s try that with Python:
#!/usr/bin/env python2.7
import csv
with open('xyz.csv', 'wb') as outfile:
    w = csv.writer(outfile)
    w.writerow(['stringa','multiline\015string',])
Yup, that worked!

You need to add extra double quotes (") at the start and end of the string. Separate the lines of the cell with the newline character (\n),
e.g. "line1\nline2\nline3":
f=open("filename.csv","w")
f.write("\"line1\nline2\nline3\"")
f.close()
The code creates a CSV file with a single quoted field that contains three lines.
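For comparison, here is a minimal Python 3 sketch that lets the csv module do the quoting. With the default QUOTE_MINIMAL setting, the writer quotes any field containing a newline, comma, or quote character, so the multiline text stays in one field. Whether Excel then shows it as a single multiline cell is an assumption not verified here; the field contents are made up.

```python
import csv
import os
import tempfile

row = ["stringa", "line1\nline2, with a comma"]

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

# newline="" lets the csv module control the line endings itself
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(row)

with open(path, "rb") as f:
    raw = f.read()                    # what actually landed on disk
with open(path, newline="") as f:
    round_trip = next(csv.reader(f))  # read the record back

os.remove(path)
print(raw)
print(round_trip)
```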

How to create a text file in python without csv.writer?

I need a comma-separated text file with a .txt extension.
"a,b,c"
I used csv.writer to create a CSV file and changed the extension. Another program would not process the data. I tried both "wb" and "w".
F = open(Fn, 'w')
w = csv.writer(F)
w.writerow(sym)
F.close()
Opened with Notepad; these are the complete files.
Their file (created using their GUI, with three symbols):
PDCO,ICUI,DVA
My file (created using Python):
PDCO,ICUI,DVA
Tested: opening their file worked; opening my file failed.
After simply opening my file in Notepad and saving it, opening it worked.
Works= 'PDCO,ICUI,DVA'
Fails= 'PDCO,ICUI,DVA\r\r\n'
Edit: writing the txt file without csv.writer:
sym = ['MHS','MRK','AIG']
with open(r'C:\filename.txt', 'w') as F:
    for s in sym[:-1]:    # separate all but the last
        F.write(s + ',')  # symbols with commas
    F.write(sym[-1])      # end with the last symbol

To me, it looks like you don't know exactly what input format your third-party application expects. If a .csv file isn't recognized, the application might be expecting something else.
Did you try changing the delimiter from ';' to ','?
import csv
spamWriter = csv.writer(open('eggs.csv', 'wb'), delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
Take a look at the Python csv module documentation.

I think the problem is your file write mode; see the question "CSV file written with Python has blank lines between each row".
If you create your CSV file like this:
csv.writer(open('myfile.csv', 'w'))
csv.writer ends its lines in '\r\n', and Python's text file handling (on Windows machines) then converts '\n' to '\r\n', resulting in lines ending in '\r\r\n'. Many programs will choke on this; Notepad recognizes it as a problem and strips the extra '\r' out.
If you use
csv.writer(open('myfile.csv', 'wb'))
it produces the expected '\r\n' line ending, which should work as desired.
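In Python 3 the csv module expects a text-mode file opened with newline="" rather than 'wb'. The sketch below reproduces both line endings on any platform; newline="\r\n" is used here only to imitate Windows text-mode translation, and the helper name written_bytes is hypothetical.

```python
import csv
import os
import tempfile

def written_bytes(newline):
    """Write one CSV row with the given newline mode and return the raw bytes."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    with open(path, "w", newline=newline) as f:
        csv.writer(f).writerow(["PDCO", "ICUI", "DVA"])
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data

bad = written_bytes("\r\n")  # every "\n" written is translated to "\r\n"
good = written_bytes("")     # no translation; csv's own "\r\n" is kept as is
print(repr(bad))
print(repr(good))
```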
Edit: @senderle has a good point; try the following:
goodf = open('file_that_works.txt', 'rb')
print repr(goodf.read(100))
badf = open('file_that_fails.txt', 'rb')
print repr(badf.read(100))
Paste the results of that here so we can see how the two compare byte for byte.

Try this:
with open('file_that_works.csv', 'rb') as testfile:
    # file is automatically closed at the end of the with block
    d = csv.Sniffer().sniff(testfile.read(1024))
with open(Fn, 'wb') as F:  # also try 'w'
    w = csv.writer(F, dialect=d)
    w.writerow(sym)
To explain further: this looks at a sample of a working .csv file and deduces its format. Then it uses that format to write a new .csv file that, hopefully, will not have to be resaved in Notepad.
Edit: if the program you're using doesn't accept multi-line input (?!) then don't use csv. Just do something like this:
syms = ['JAGHS','GJKDGJ','GJDFAJ']
with open('filename.txt', 'wb') as F:
    for s in syms[:-1]:   # separate all but the last
        F.write(s + ',')  # symbols with commas
    F.write(syms[-1])     # end with the last symbol
Or more tersely:
with open('filename.txt', 'wb') as F:
    F.write(','.join(syms))
Also, check different file extensions (i.e. .txt, .csv, etc.) to make sure that's not the problem. If this program chokes on a newline, then anything is possible.
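The snippets in this answer are Python 2, where a str could be written to a file opened with 'wb'. A rough Python 3 equivalent of the terse version, as a sketch: open the file in text mode with newline="" and write the joined string, which leaves no trailing line ending. The temporary file is only for the demonstration.

```python
import os
import tempfile

syms = ['JAGHS', 'GJKDGJ', 'GJDFAJ']

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Text mode with newline="" writes exactly the characters given
with open(path, "w", newline="") as F:
    F.write(",".join(syms))

# Inspect the bytes on disk, as suggested elsewhere in this thread
with open(path, "rb") as F:
    contents = F.read()
os.remove(path)
print(repr(contents))
```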

So, I save as text file.
Now, create my own txt file with python.
What are the exact differences between their file and your file? Exact.

I suspect that @Hugh's comment is correct that it's an encoding issue.
When you do a Save As in Notepad, what's selected in the Encoding dropdown? If you select different encodings, do some or all of those fail to be opened by the third-party program?
