Can't open csv file in python without opening it in excel
I have a .csv file generated by a program. When I try to open it with the following code, the output makes no sense, even though I have tried the same code with a CSV that was not generated by that program and it works fine.
g = 'datos/1.81/IR20211103_2275.csv'
f = open(g, "r", newline = "")
f = f.readlines()
print(f)
The output of the code looks like this:
['ÿþA\x00l\x00l\x00 \x00t\x00e\x00m\x00p\x00e\x00r\x00a\x00t\x00u\x00r\x00e\x00s\x00 \x00i\x00n\x00 \x00°\x00F\x00.\x00\r',
'\x00\n',
'\x00\r',
'\x00\n',
'\x00D\x00:\x00\\\x00O\x00n\x00e\x00D\x00r\x00i\x00v\x00e\x00\\\x00M\x00A\x00E\x00S\x00T\x00R\x00I\x00A\x00 \x00I\x00M\x00E\x00C\x00\\\x00T\x00e\x00s\x00i\x00s\x00\\\x00d\x00a\x00t\x00o\x00s\x00\\\x001\x00.\x008\x001\x00\\\x00I\x00R\x002\x000\x002\x001\x001\x001\x000\x003\x00_\x002\x002\x007\x005\x00.\x00i\x00s\x002\x00\r',
However, when I first open the file with Excel and save it as a .csv (replacing the original with the .csv from Excel), the output is as expected:
['All temperatures in °F.\r\n',
'\r\n',
'D:\\OneDrive\\MAESTRIA IMEC\\Tesis\\datos\\1.81\\IR20211103_2275.is2\r\n',
'\r\n',
'",1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,"\r\n',
I have also tried csv.reader() and it doesn't work either.
Does anyone know what's going on and how I can solve it? How can I open my .csv without opening and saving it from Excel first? The source program is SmartView from Fluke, which reads a thermal image file (.is2) and converts it into a .csv file.
Thank you very much
Your file is encoded with UTF-16 (little-endian byte order). You can specify the file encoding using the encoding argument of the open() function (a list of standard encodings and their names can be found in the codecs documentation).
I'd also recommend not using .readlines(), as it keeps the trailing newline characters. You can read the whole file content into a string (using .read()) and apply str.splitlines() to split it into a list of lines. Alternatively, you can consume the file line by line and call str.rstrip() to cut off the trailing newline characters.
Final code:
filename = "datos/1.81/IR20211103_2275.csv"
with open(filename, encoding="utf16") as f:
    lines = f.read().splitlines()
    # OR
    lines = [line.rstrip() for line in f]
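If you're unsure of a file's encoding up front, one quick check (a minimal sketch, reusing the path from the question) is to look at the first two bytes for a UTF-16 byte order mark; the ÿþ at the start of the garbled output is exactly the little-endian BOM bytes 0xFF 0xFE rendered as characters:

# Peek at the first two bytes to check for a UTF-16 BOM
with open("datos/1.81/IR20211103_2275.csv", "rb") as f:
    bom = f.read(2)

if bom == b"\xff\xfe":
    print("UTF-16 little-endian BOM found")
elif bom == b"\xfe\xff":
    print("UTF-16 big-endian BOM found")
else:
    print("No UTF-16 BOM; try UTF-8 or another encoding")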
g = 'datos/1.81/IR20211103_2275.csv'
f = open(g, "r", newline="", encoding="utf-16")
lines = f.readlines()
print(lines)
f.close()

Try this one, it may help.
Related
Python 3.8.5 alternative to .replace with csv.reader and UTF-8 mystery encodings
I have spent 5 hours throughout the dark recesses of SO, so I am posting this question as a last resort, and I am genuinely hoping someone can point me in the right direction here.

Scenario: I have some .csv files (UTF-8 CSVs, verified with the file -I command) from Google surveys that are in multiple languages. Output: download.csv: application/csv; charset=utf-8. I have a "dictionary" file that has the translations for the questions and answers (one column is the $language and the other is English). There are LOTS of special characters (umlauts, French accented letters, etc.) in the data from Google, because of the French, German and Dutch. The dictionary file I built reads fine as UTF-8, including special characters, and creates the find/replace keys accurately (verified with print commands).

The issue is that the Google files only read correctly (maintain proper characters) using csv.reader in Python. However, what it yields does not have a .replace, and so I can do one or the other:

- read in the source file, make no replacements, and get a perfect copy (not what I need)
- convert the csv files/rows to a fileinput/string (UTF-8 still, mind) and get an utterly thrashed output file with missing replacements, because the data "loses" the encoding between the csv read and the string somehow?

The code below comes closest to working, except there is no .replace method on what csv.reader yields:

import csv

# set source, output
source = 'fr_to_trans.csv'
output = 'fr_translated.csv'
dictionary = 'frtrans.csv'

find = []
replace = []

# build the dictionary itself:
with open(dictionary, encoding='utf-8') as dict_file:
    for line in dict_file:
        temp_split = line.split(',')
        if "!!" in temp_split[0]:
            temp_split[0] = temp_split[0].replace("!!", ",")
        find.append(temp_split[0])
        if "!!" in temp_split[1]:
            temp_split[1] = temp_split[1].replace("!!", ",")
        replace.append(temp_split[1])

# set loop counters
check_each = len(find)

# read in the file to parse
with open(source, 'r', encoding='utf-8') as s_file, open(output, 'w', encoding='utf-8') as t_file:
    output_writer = csv.writer(t_file)
    for row in csv.reader(s_file):
        the_row = row
        print(the_row)  # THIS RETURNS THE CORRECT, FORMATTED, UTF-8 DATA
        i = 0
        # find and replace everything in the find array with its value in the replace array
        while i < check_each:
            print(find[i])
            print(replace[i])
            # THIS LINE DOES NOT WORK:
            the_row = the_row.replace(find[i], replace[i])
            i = i + 1
        output_writer.writerow(the_row)

I have to assume that even though the Google files say they are UTF-8, they are a special "Google branded UTF-8" or some such nonsense. The fact that the file opens correctly with csv.reader, but then you can do nothing to it, is infuriating beyond measure.

Just to clarify what I have tried:

- Treat files as text and let Python sort out the encoding (fails)
- Treat files as UTF-8 text (fails)
- Open file as UTF-8, replace strings, and write out using the csv.writer (fails)
- Convert the_row to a string, then replace, then write out with csv.writer (fails)
- Quick edit: tried utf-8-sig with strings - better, but the output is still totally mangled because it isn't read as a csv, but as strings

I have not tried:

- "cell by cell" comparison instead of the whole row (working on that while this percolates on SO)
- Different encoding of the file (I can only get UTF-8 CSVs, so I would need some sort of utility?)
If these were ASCII text I would have been done ages ago, but this whole "UTF-8 that isn't but is" thing is driving me mad. Anyone got any ideas on this?
Each row yielded by csv.reader is a list of cell values like

['42', 'spam', 'eggs']

Thus the line

# THIS LINE DOES NOT WORK:
the_row = the_row.replace(find[i], replace[i])

cannot possibly work, because lists don't have a replace method.

What might work is to iterate over the row list and find/replace on each cell value (I'm assuming they are all strings):

the_row = [cell.replace(find[i], replace[i]) for cell in the_row]

However, if all you want to do is replace all instances of some characters in the file with some other characters, then it's simpler to open the file as a text file and replace without invoking any csv machinery:

with open(source, 'r', encoding='utf-8') as s_file, open(output, 'w', encoding='utf-8') as t_file:
    text = s_file.read()
    for old, new in zip(find, replace):
        text = text.replace(old, new)
    t_file.write(text)

If the find/replace mapping is the same for all files, you can use str.translate to avoid the for loop:

# Make a reusable translation table
trans_table = str.maketrans(dict(zip(find, replace)))

with open(source, 'r', encoding='utf-8') as s_file, open(output, 'w', encoding='utf-8') as t_file:
    text = s_file.read()
    text = text.translate(trans_table)
    t_file.write(text)

For clarity: CSVs are text files, only formatted so that their contents can be interpreted as rows and columns. If you want to manipulate their contents as pure text, it's fine to edit them as normal text files: as long as you don't change any of the characters used as delimiters or quote marks, they will still be usable as CSVs when you want to use them as such.
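A minimal, self-contained demonstration of the cell-by-cell idea (the row and the find/replace pairs are made up for illustration):

row = ['1', 'Qualité', 'Größe']
find = ['Qualité', 'Größe']
replace = ['Quality', 'Size']

# apply every find/replace pair to every cell in the row
for old, new in zip(find, replace):
    row = [cell.replace(old, new) for cell in row]

print(row)  # ['1', 'Quality', 'Size']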
How to replace a particular character in a CSV file
I have a folder of CSV files (~100) and every file has an unknown character that looks like this: �. This unknown character is supposed to be a double quote ("). Because of this unknown character, I am not able to run my CSV-to-XLSX converter to convert my files to XLSX format. I tried using csv.reader() but it does not work with the replace function, as csv.reader() returns a reader object and replace does not work with that. How can I replace that character and write the replaced contents back to CSV so that I can run my CSV-to-XLSX converter?

Example, current file contents:

"hello�

Output after conversion:

"hello"
Try this:

import fileinput

with fileinput.FileInput("file.csv", inplace=True) as file:
    for line in file:
        print(line.replace('�', '"'), end='')
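Since the question mentions a folder of ~100 files, a sketch of extending this over every CSV in a directory (the folder name csv_folder is an assumption):

import fileinput
import glob

# rewrite every .csv file in the folder in place,
# replacing the stray character with a double quote
for path in glob.glob("csv_folder/*.csv"):
    with fileinput.FileInput(path, inplace=True) as file:
        for line in file:
            print(line.replace('�', '"'), end='')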
The sed command is designed for this kind of work. It finds and replaces characters in a file. Use this in a terminal:

sed -i 's/old-word/new-word/g' filename.csv

Your old-word should be the unknown character and new-word the double quote.
I use this little function to deal with such problems. The code is quite self-explanatory: it opens a file, reads it all (this may not work for files larger than your RAM), then rewrites it with a patched version.

def patch_file(file, original, patch):
    with open(file, 'r') as f:
        lines = f.readlines()
    with open(file, 'w') as f:
        for line in lines:
            f.write(line.replace(original, patch))

patch_file(file='yourCSVfile.txt', original='�', patch='"')
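To run it over the whole folder from the question, a possible wrapper (pathlib and the directory name are assumptions):

from pathlib import Path

# patch every CSV file in the folder
for csv_path in Path("csv_folder").glob("*.csv"):
    patch_file(file=str(csv_path), original='�', patch='"')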
Find and replace texts in all files from the text file input using Python in Notepad++
I'm using Notepad++ to do a find and replace operation. Currently I have a huge number of text files, and I need to do a replacement of different strings in different files. I want to do it in batch. For example, I have a folder with a huge number of text files, and another text file that has the strings to find and replace, in order:

Text1 Text1-corrected
Text2 Text2-corrected

I have a small script that does this replacement, but only for the files opened in Notepad++. To achieve this I'm using a Python script in Notepad++. The code is as follows:

with open('C:/replace.txt') as f:
    for l in f:
        s = l.split()
        editor.replace(s[0], s[1])

In simple words, the find and replace function should fetch its input from a file. Thanks in advance.
with open('replace.txt') as f:
    replacements = [tuple(line.split()) for line in f]

for filename in filenames:
    with open(filename) as f:
        contents = f.read()
    for old, new in replacements:
        contents = contents.replace(old, new)
    with open(filename, 'w') as f:
        f.write(contents)

Read the replacements into a list of tuples, then go through each file: read its contents into memory, do the replacements, and write the result back. Each file has to be read before it is reopened with 'w' (which truncates it), so the read and the write happen in separate with blocks. I think the files get overwritten properly, but you might want to double check.
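A sketch of how the filenames list could be built for the folder of text files the question describes (the folder path and the .txt pattern are assumptions):

from pathlib import Path

# collect every text file in the folder to process
filenames = [str(p) for p in Path(r'C:\my_text_files').glob('*.txt')]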
Raw string for variables in python?
I have seen several similar posts on this but nothing has solved my problem. I am reading a list of numbers with backslashes and writing them to a .csv. Obviously the backslashes are causing problems.

from csv import writer

addr = "6253\342\200\2236387"
with open("output.csv", 'a') as w:
    write = writer(w)
    write.writerow([addr])

I found that using r"6253\342\200\2236387" gave me exactly what I want for the output, but since I am reading my input from a file I can't use a raw string. I tried .encode('string-escape') but that gave me 6253\xe2\x80\x936387 as output, which is definitely not what I want. unicode-escape gave me an error. Any thoughts?
The r in front of a string is only for defining a string. If you're reading data from a file, it's already 'raw'. You shouldn't have to do anything special when reading in your data. Note that if your data is not plain ASCII, you may need to decode it or read it in binary. For example, if the data is UTF-8, you can open the file like this before reading:

import codecs
f = codecs.open("test", "r", "utf-8")
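Two asides that may help here, as a sketch assuming Python 3: the built-in open() now takes encoding directly, and the question's backslash sequences are octal escapes for the UTF-8 bytes of an en dash:

# Python 3: no codecs module needed
f = open("test", "r", encoding="utf-8")

# \342\200\223 are octal escapes for the bytes 0xE2 0x80 0x93,
# which is the UTF-8 encoding of an en dash
print(b"6253\342\200\2236387".decode("utf-8"))  # prints: 6253–6387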
Text file contains:

1234\4567\7890
41\5432\345\6789

Code:

import csv

# textfilepath: path to the input text file
with open('c:/tmp/numbers.csv', 'ab') as w:
    f = open(textfilepath)
    wr = csv.writer(w)
    for line in f:
        line = line.strip()
        wr.writerow([line])
    f.close()

This produced a csv with whole lines in a column. Maybe use 'ab' rather than 'a' as your file open type. I was getting extra blank records in my csv when using just 'a'.
I created this awhile back. This helps you write to a csv file.

import csv

def write2csv(fileName, theData):
    theFile = open(fileName + '.csv', 'a')
    wr = csv.writer(theFile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    wr.writerow(theData)
    theFile.close()  # close so the row is flushed to disk
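Hypothetical usage of the helper above (the file name and symbols are made up):

# appends one comma-separated row to symbols.csv
write2csv('symbols', ['MHS', 'MRK', 'AIG'])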
How to create a text file in python without csv.writer?
I need a comma separated txt file with a txt extension.

"a,b,c"

I used csv.writer to create a csv file and changed the extension. Another program would not use/process the data. I tried "wb" and "w".

F = open(Fn, 'w')
w = csv.writer(F)
w.writerow(sym)
F.close()

Opened with notepad --- these are the complete files.

Their file (created using their GUI, using three symbols):

PDCO,ICUI,DVA

My file (created using Python):

PDCO,ICUI,DVA

Tested: opened their file - worked; opened my file - failed. After a simple open and close with save in notepad, opening my file worked.

Works = 'PDCO,ICUI,DVA'
Fails = 'PDCO,ICUI,DVA\r\r\n'

Edit: writing the txt file without csv.writer:

sym = ['MHS', 'MRK', 'AIG']
with open(r'C:\filename.txt', 'w') as F:
    for s in sym[:-1]:      # separate all but the last
        F.write(s + ',')    # symbols with commas
    F.write(sym[-1])        # end with the last symbol
To me, it looks like you don't exactly know your third-party application's input format. If a .csv isn't recognized, it might be something else. Did you try to change the delimiter from ';' to ','?

import csv

spamWriter = csv.writer(open('eggs.csv', 'wb'), delimiter=',',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

Take a look at the csv module in the Python API.
I think the problem is your file write mode, as per "CSV file written with Python has blank lines between each row". If you create your csv file like

csv.writer(open('myfile.csv', 'w'))

csv.writer ends its lines in '\r\n', and Python's text file handling (on Windows machines) then converts '\n' to '\r\n', resulting in lines ending in '\r\r\n'. Many programs will choke on this; Notepad recognizes it as a problem and strips the extra '\r' out. If you use

csv.writer(open('myfile.csv', 'wb'))

it produces the expected '\r\n' line ending, which should work as desired.

Edit: @senderle has a good point; try the following:

goodf = open('file_that_works.txt', 'rb')
print repr(goodf.read(100))
badf = open('file_that_fails.txt', 'rb')
print repr(badf.read(100))

Paste the results of that here, so we can see how the two compare byte-for-byte.
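For what it's worth, the answer above is written for Python 2; in Python 3 the csv docs spell the same fix with newline='' instead of binary mode (a small sketch, reusing the file name from the answer):

import csv

# newline='' stops Python 3's text layer from turning '\r\n' into '\r\r\n'
with open('myfile.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['PDCO', 'ICUI', 'DVA'])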
Try this:

with open('file_that_works.csv', 'rb') as testfile:   # file is automatically closed
    d = csv.Sniffer().sniff(testfile.read(1024))      # at end of with block

with open(Fn, 'wb') as F:   # also try 'w'
    w = csv.writer(F, dialect=d)
    w.writerow(sym)

To explain further: this looks at a sample of a working .csv file and deduces its format. Then it uses that format to write a new .csv file that, hopefully, will not have to be resaved in notepad.

Edit: if the program you're using doesn't accept multi-line input (?!) then don't use csv. Just do something like this:

syms = ['JAGHS', 'GJKDGJ', 'GJDFAJ']

with open('filename.txt', 'wb') as F:
    for s in syms[:-1]:     # separate all but the last
        F.write(s + ',')    # symbols with commas
    F.write(syms[-1])       # end with the last symbol

Or more tersely:

with open('filename.txt', 'wb') as F:
    F.write(','.join(syms))

Also, check different file extensions (i.e. .txt, .csv, etc.) to make sure that's not the problem. If this program chokes on a newline, then anything is possible.
So, saving as a text file works. Now create your own txt file with Python. What are the exact differences between their file and your file? Exact.
I suspect that @Hugh's comment is correct that it's an encoding issue. When you do a Save As in notepad, what's selected in the Encoding dropdown? If you select different encodings, do some or all of those fail to be opened by the 3rd party program?