Specific codes for a dat file that contains bytes (Python)

Specific codes for a dat file that contains bytes (Python) - python

I have another doubt related to reading the dat file.
The file format is DAT file (.dat)
The content inside the file is in bytes.
When I tried the run open file code, the program built and ran successfully. However, the python shell has no output (I can't see the contents from the file).
Since the content inside the file is in bytes, should I modify the code ? What is the code to use for bytes?
Thank you.

There is no "DAT" file format and, as you say, the file contains bytes - as do all files.
It's possible that the file contains binary data for which it's best to open the file in binary mode. You do that by specifying b as part of the mode parameter to open(), like this:
f = open('file.dat', 'rb')
data = f.read() # read the entire file into data
print(data)
f.close()
Note that the full mode parameter is set to rb which means open the file in binary mode for reading.
A better way is to use with:
with open('file.dat', 'rb') as f:
data = f.read()
print(data)
No need to explicitly close the file.
If you know that the file contains text, possibly encoded in some specific encoding, e.g. UTF8, then you can specify the encoding when you open the file (Python 3):
with open('file.dat', encoding='UTF8') as f:
for line in f:
print(line)
In Python 2 you can use io.open().

Related

Can't open csv file in python without opening it in excel

I have a .csv file generated by a program. When I try to open it with the following code the output makes no sense, even though I have tried the same code with not program generated csv and it works fine.
g = 'datos/1.81/IR20211103_2275.csv'
f = open(g, "r", newline = "")
f = f.readlines()
print(f)
The output of the code looks like this
['ÿþA\x00l\x00l\x00 \x00t\x00e\x00m\x00p\x00e\x00r\x00a\x00t\x00u\x00r\x00e\x00s\x00 \x00i\x00n\x00 \x00°\x00F\x00.\x00\r',
'\x00\n',
'\x00\r',
'\x00\n',
'\x00D\x00:\x00\\\x00O\x00n\x00e\x00D\x00r\x00i\x00v\x00e\x00\\\x00M\x00A\x00E\x00S\x00T\x00R\x00I\x00A\x00 \x00I\x00M\x00E\x00C\x00\\\x00T\x00e\x00s\x00i\x00s\x00\\\x00d\x00a\x00t\x00o\x00s\x00\\\x001\x00.\x008\x001\x00\\\x00I\x00R\x002\x000\x002\x001\x001\x001\x000\x003\x00_\x002\x002\x007\x005\x00.\x00i\x00s\x002\x00\r',
However, when I first open the file with excel and save it as a .csv (replacing the original with the .csv from excel), the output is as expected, like this:
['All temperatures in °F.\r\n',
'\r\n',
'D:\\OneDrive\\MAESTRIA IMEC\\Tesis\\datos\\1.81\\IR20211103_2275.is2\r\n',
'\r\n',
'",1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,"\r\n',
I have also tried csv.reader() and doesn't work either.
Does anyone know what's going on and how can I solve it? How can I open my .csv without opening and saving it from excel first? The source program is SmartView from Fluke which reads thermal image file .is2 and converts it into a .csv file
Thank you very much

Your file is encoded with UTF-16 (Little Endian byte order). You can specify file encoding using encoding argument of open() function (list of standard encodings and their names you can find here).
Also I'd recommend to not use .readlines() as it will keep trailing newline chars. You can read all file content into as string (using .read()) and apply str.splitlines() to ... split string into a list of lines. Alternatively you can also consume file line by line and call str.rstrip() to cut trailing newline chars.
Final code:
filename = "datos/1.81/IR20211103_2275.csv"
with open(filename, encoding="utf16") as f:
lines = f.read().splitlines()
# OR
lines = [line.rstrip() for line in f]

g = 'datos/1.81/IR20211103_2275.csv'
f = open(g, "r", newline = "",encoding="utf-16")
f = f.readlines()
print(f)
try this one it may help

Python problem reading CSV files that contain the word NUL [duplicate]

I'm working with some CSV files, with the following code:
reader = csv.reader(open(filepath, "rU"))
try:
for row in reader:
print 'Row read successfully!', row
except csv.Error, e:
sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
And one file is throwing this error:
file my.csv, line 1: line contains NULL byte
What can I do? Google seems to suggest that it may be an Excel file that's been saved as a .csv improperly. Is there any way I can get round this problem in Python?
== UPDATE ==
Following #JohnMachin's comment below, I tried adding these lines to my script:
print repr(open(filepath, 'rb').read(200)) # dump 1st 200 bytes of file
data = open(filepath, 'rb').read()
print data.find('\x00')
print data.count('\x00')
And this is the output I got:
'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\ .... <snip>
8
13834
So the file does indeed contain NUL bytes.

As #S.Lott says, you should be opening your files in 'rb' mode, not 'rU' mode. However that may NOT be causing your current problem. As far as I know, using 'rU' mode would mess you up if there are embedded \r in the data, but not cause any other dramas. I also note that you have several files (all opened with 'rU' ??) but only one causing a problem.
If the csv module says that you have a "NULL" (silly message, should be "NUL") byte in your file, then you need to check out what is in your file. I would suggest that you do this even if using 'rb' makes the problem go away.
repr() is (or wants to be) your debugging friend. It will show unambiguously what you've got, in a platform independant fashion (which is helpful to helpers who are unaware what od is or does). Do this:
print repr(open('my.csv', 'rb').read(200)) # dump 1st 200 bytes of file
and carefully copy/paste (don't retype) the result into an edit of your question (not into a comment).
Also note that if the file is really dodgy e.g. no \r or \n within reasonable distance from the start of the file, the line number reported by reader.line_num will be (unhelpfully) 1. Find where the first \x00 is (if any) by doing
data = open('my.csv', 'rb').read()
print data.find('\x00')
and make sure that you dump at least that many bytes with repr or od.
What does data.count('\x00') tell you? If there are many, you may want to do something like
for i, c in enumerate(data):
if c == '\x00':
print i, repr(data[i-30:i]) + ' *NUL* ' + repr(data[i+1:i+31])
so that you can see the NUL bytes in context.
If you can see \x00 in the output (or \0 in your od -c output), then you definitely have NUL byte(s) in the file, and you will need to do something like this:
fi = open('my.csv', 'rb')
data = fi.read()
fi.close()
fo = open('mynew.csv', 'wb')
fo.write(data.replace('\x00', ''))
fo.close()
By the way, have you looked at the file (including the last few lines) with a text editor? Does it actually look like a reasonable CSV file like the other (no "NULL byte" exception) files?

data_initial = open("staff.csv", "rb")
data = csv.reader((line.replace('\0','') for line in data_initial), delimiter=",")
This works for me.

Reading it as UTF-16 was also my problem.
Here's my code that ended up working:
f=codecs.open(location,"rb","utf-16")
csvread=csv.reader(f,delimiter='\t')
csvread.next()
for row in csvread:
print row
Where location is the directory of your csv file.

You could just inline a generator to filter out the null values if you want to pretend they don't exist. Of course this is assuming the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.
with open(filepath, "rb") as f:
reader = csv.reader( (line.replace('\0','') for line in f) )
try:
for row in reader:
print 'Row read successfully!', row
except csv.Error, e:
sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

I bumped into this problem as well. Using the Python csv module, I was trying to read an XLS file created in MS Excel and running into the NULL byte error you were getting. I looked around and found the xlrd Python module for reading and formatting data from MS Excel spreadsheet files. With the xlrd module, I am not only able to read the file properly, but I can also access many different parts of the file in a way I couldn't before.
I thought it might help you.

Converting the encoding of the source file from UTF-16 to UTF-8 solve my problem.
How to convert a file to utf-8 in Python?
import codecs
BLOCKSIZE = 1048576 # or some other, desired size in bytes
with codecs.open(sourceFileName, "r", "utf-16") as sourceFile:
with codecs.open(targetFileName, "w", "utf-8") as targetFile:
while True:
contents = sourceFile.read(BLOCKSIZE)
if not contents:
break
targetFile.write(contents)

Why are you doing this?
reader = csv.reader(open(filepath, "rU"))
The docs are pretty clear that you must do this:
with open(filepath, "rb") as src:
reader= csv.reader( src )
The mode must be "rb" to read.
http://docs.python.org/library/csv.html#csv.reader
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.

appparently it's a XLS file and not a CSV file as http://www.garykessler.net/library/file_sigs.html confirm

Instead of csv reader I use read file and split function for string:
lines = open(input_file,'rb')
for line_all in lines:
line=line_all.replace('\x00', '').split(";")

I got the same error. Saved the file in UTF-8 and it worked.

This happened to me when I created a CSV file with OpenOffice Calc. It didn't happen when I created the CSV file in my text editor, even if I later edited it with Calc.
I solved my problem by copy-pasting in my text editor the data from my Calc-created file to a new editor-created file.

I had the same problem opening a CSV produced from a webservice which inserted NULL bytes in empty headers. I did the following to clean the file:
with codecs.open ('my.csv', 'rb', 'utf-8') as myfile:
data = myfile.read()
# clean file first if dirty
if data.count( '\x00' ):
print 'Cleaning...'
with codecs.open('my.csv.tmp', 'w', 'utf-8') as of:
for line in data:
of.write(line.replace('\x00', ''))
shutil.move( 'my.csv.tmp', 'my.csv' )
with codecs.open ('my.csv', 'rb', 'utf-8') as myfile:
myreader = csv.reader(myfile, delimiter=',')
# Continue with your business logic here...
Disclaimer:
Be aware that this overwrites your original data. Make sure you have a backup copy of it. You have been warned!

I opened and saved the original csv file as a .csv file through Excel's "Save As" and the NULL byte disappeared.
I think the original encoding for the file I received was double byte unicode (it had a null character every other character) so saving it through excel fixed the encoding.

For all those 'rU' filemode haters: I just tried opening a CSV file from a Windows machine on a Mac with the 'rb' filemode and I got this error from the csv module:
Error: new-line character seen in unquoted field - do you need to
open the file in universal-newline mode?
Opening the file in 'rU' mode works fine. I love universal-newline mode -- it saves me so much hassle.

I encountered this when using scrapy and fetching a zipped csvfile without having a correct middleware to unzip the response body before handing it to the csvreader. Hence the file was not really a csv file and threw the line contains NULL byte error accordingly.

Have you tried using gzip.open?
with gzip.open('my.csv', 'rb') as data_file:
I was trying to open a file that had been compressed but had the extension '.csv' instead of 'csv.gz'. This error kept showing up until I used gzip.open

One case is that - If the CSV file contains empty rows this error may show up. Check for row is necessary before we proceed to write or read.
for row in csvreader:
if (row):
do something
I solved my issue by adding this check in the code.

In Python, how to use a file for writing bytes in it and reading as text

I would like to save bytes to a file and then read that file as a text. Can I do it with one with? What should I use, wb, r or wbr?
myBytesVar = b'line1\nline2'
with open('myFile.txt', 'wb') as fw:
fw.write(myBytesVar)
with open('myFile.txt', 'r') as fr:
myVar = fr.read()
print(myVar)

You don't need to re-read the file if you already have its contents stored in myBytesVar:
myBytesVar = b'line1\nline2'
with open('myFile.txt', 'wb') as fw:
fw.write(myBytesVar)
myVar = myBytesVar.decode('utf-8')
The encoding Python assumes when reading files as text without an explicit encoding is platform-dependent, so I'm just assuming UTF-8 will work.

Here is some information on what mode we should use :
The default mode is 'r' (open for reading text, synonym of 'rt'). For
binary read-write access, the mode 'w+b' opens and truncates the file
to 0 bytes. 'r+b' opens the file without truncation.
Read more here.
https://docs.python.org/3/library/functions.html#open

If you want to do it with one "with": When you write it "wb" is good.
when you read the file try it
myvar = open('MyVar.txt', 'r').read()
print(myvar)

Reading Binary File (.out) in Python and disassemble with Capstone

I have some trouble reading the .text section of a binary file.
The binary is compiled by gcc.
readelf -S binary_file
This command shows that
.text PROGBITS 0000831C 00031C 000340
The address if the .text section is 0000831c, offset = 00031c and size = 000340
I have tried
file = open('binary_file')
content = file.readlines()
And the Capstone could not recognize.
If the .text content looks like
f102 030e 0000 a0e3
how to read it as
content = b'\xf1\x02\x03\x0e\x00\x00\xa0\xe3'

By default, open() opens a file in text mode. To open a file in binary mode, you need to supply the appropriate mode: 'rb' - which means open for reading in binary mode.
readlines() is designed to read a line of text from a file, so it does not make sense to use it for reading from a binary file.
You want something like:
file = open('binary_file', 'rb')
content = file.read()

Python 2.7.3 . . . Write .jpg/.png image file?

So I have a .jpg/.png and I opened it up in Text Edit which I provided below:
Is there anyway I can save these exotic symbols to a string in Python to later write that to a file to produce an image?
I tried to import a string that had the beta symbol in it and I got an error that send Non-ASCII so I am assuming the same would happen for this.
Is there anyway to get around this problem?
Thanks
Portion of Image.png in Text Edit:

What you are looking at in your text edit is a binary file, trying to represent it all in human readable characters.
Just open the file as binary in python:
with open('picture.png', 'rb') as f:
data = f.read()
with open('picture_out.png', 'wb') as f:
f.write(data)

You can read to file in binary format by providing the rb flag to open and then just save what ever comes out of the file into a text file. I don't know what the point of this would be but there you go
# read in image data
fh = open('test.png','rb')
data = fh.read()
fh.close()
# write gobbledigoock to text file
fh = open('test.txt','w')
fh.write(data)
fh.close
fh.close()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Specific codes for a dat file that contains bytes (Python) - python

Related

Can't open csv file in python without opening it in excel

Python problem reading CSV files that contain the word NUL [duplicate]

In Python, how to use a file for writing bytes in it and reading as text

Reading Binary File (.out) in Python and disassemble with Capstone

Python 2.7.3 . . . Write .jpg/.png image file?

Categories

Resources