Python write to text file only if ASCII value - python

I am trying to write a program that will let me compare SQL files to each other, and I have started by writing the full SQL file out to a text file. The text file is generated successfully, but with block characters at the end of each line, as in the example below:
SET ANSI_NULLS ON਍ഀ
GO਍ഀ
SET QUOTED_IDENTIFIER ON਍ഀ
GO਍ഀ
CREATE TABLE [dbo].[CDR](਍ഀ
Below is the code that generates the text file:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os

r = open('master_lines.txt', 'w')

directory = "E:\\"  # file directory, anonymous omission
master = directory + "master"
databases = ["\\1", "\\2", "\\3", "\\4"]
file_types = ["\\StoredProcedure", "\\Table", "\\UserDefinedFunction", "\\View"]
servers = []
server_number = []
master_lines = []

for file in os.listdir("E:\\"):  # adds server paths to an array
    servers.append(file)

for num in range(0, len(servers)):
    for file in os.listdir(directory + servers[num]):  # adds all the servers and paths to an array
        server_number.append(servers[num] + "\\" + file)

master = directory + server_number[server_number.index("master")]
master_var = master + databases[0]
tmp = master_var + file_types[1]

for file in os.listdir(tmp):
    with open(file) as tmp_file:
        line = tmp_file.readlines()
        for num in range(0, len(line)):
            r.write(line[num])

r.close()
I have already tried changing the encoding to both latin1 and utf-8; the current text file is the most successful, as ascii and latin1 produced Chinese and Arabic characters respectively.
Below is the SQL file in text format:
/****** Object: Table [dbo].[CDR] Script Date: 2017-01-12 02:30:49 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[CDR](
[calldate] [datetime] NOT NULL,
[clid] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[src] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dst] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dcontext] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[channel] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dstchannel] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[lastapp] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[lastdata] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[duration] [int] NOT NULL,
[billsec] [int] NOT NULL,
[disposition] [varchar](45) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[amaflags] [int] NOT NULL,
[accountcode] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[userfield] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[uniqueid] [varchar](64) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[cdr_id] [int] NOT NULL,
[cost] [real] NOT NULL,
[cdr_tag] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[importID] [bigint] IDENTITY(-9223372036854775807,1) NOT NULL,
CONSTRAINT [PK_CDR_1] PRIMARY KEY CLUSTERED
(
[uniqueid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [ReadPartition]
) ON [ReadPartition]
GO
SET ANSI_PADDING ON
GO
/****** Object: Index [Idx_Dst_incl_uniqueId] Script Date: 2017-01-12 02:30:50 PM ******/
CREATE NONCLUSTERED INDEX [Idx_Dst_incl_uniqueId] ON [dbo].[CDR]
(
[dst] ASC
)
INCLUDE ( [uniqueid]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [ReadPartition]
GO
Hex dump of the start of the SQL file, to understand what happens (not part of the question above):
ff fe 2f 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00
20 00 4f 00 62 00 6a 00 65 00 63 00 74 00 3a 00
20 00 20 00 54 00 61 00 62 00 6c 00 65 00 20 00
5b 00 64 00 62 00 6f 00 5d 00 2e 00 5b 00 43 00
44 00 52 00 5d 00 20 00 20 00 20 00 20 00 53 00
63 00 72 00 69 00 70 00 74 00 20 00 44 00 61 00
74 00 65 00 3a 00 20 00 32 00 30 00 31 00 37 00
2d 00 30 00 31 00 2d 00 31 00 32 00 20 00 30 00
32 00 3a 00 33 00 30 00 3a 00 34 00 39 00 20 00
50 00 4d 00 20 00 2a 00 2a 00 2a 00 2a 00 2a 00
2a 00 2f 00 0d 00 0a 00 53 00 45 00 54 00 20 00
41 00 4e 00 53 00 49 00 5f 00 4e 00 55 00 4c 00
4c 00 53 00 20 00 4f 00 4e 00 0d 00 0a 00 47 00
4f 00 0d 00 0a 00 53 00 45 00 54 00 20 00 51 00
55 00 4f 00 54 00 45 00 44 00 5f 00 49 00 44 00
Result of hexdump:
../.*.*.*.*.*.*.
.O.b.j.e.c.t.:.
. .T.a.b.l.e. .
[.d.b.o.]...[.C.
D.R.]. . . . .S.
c.r.i.p.t. .D.a.
t.e.:. .2.0.1.7.
-.0.1.-.1.2. .0.
2.:.3.0.:.4.9. .
P.M. .*.*.*.*.*.
*./.....S.E.T. .
A.N.S.I._.N.U.L.
L.S. .O.N.....G.
O.....S.E.T. .Q.
U.O.T.E.D._.I.D.

Your problem is that the original files are encoded in UTF-16 with an initial Byte Order Mark. That is normally transparent on Windows, because almost all file editors detect the encoding automatically thanks to the initial BOM.
But the conversion is not automatic for a Python script: every character is read as the character itself followed by a null byte. That is almost harmless, because the nulls are simply written back out and still form normal UTF-16 characters. The exception is the line endings: the \n is no longer preceded by a raw \r but by a null, and because you write in text mode, Python replaces it with a \r\n pair, which is no longer a valid UTF-16 sequence. That is what causes the block display.
This is trivial to fix: just declare the UTF-16 encoding when reading the files:
for file in os.listdir(tmp):
    with open(file, encoding='utf_16_le') as tmp_file:
Optionally, if you want to preserve the UTF-16 encoding, you could open the output (master) file with that encoding as well; otherwise Python writes it with its default text encoding. But my advice would be to stick with an 8-bit encoding for the output file, to avoid further problems if you later want to process it.
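Putting the fix together with the question's copy loop, a minimal sketch might look like this (os.path.join and the explicit utf-8 output encoding are my additions; the directory path is a placeholder standing in for master_var + file_types[1]):

import os

tmp = "E:\\master\\1\\Table"  # hypothetical path, stands in for the one built in the question

with open('master_lines.txt', 'w', encoding='utf-8') as r:
    for name in os.listdir(tmp):
        # The source scripts are UTF-16 LE with a BOM; decode them explicitly.
        # encoding='utf-16' would also work and strips the leading BOM automatically.
        with open(os.path.join(tmp, name), encoding='utf_16_le') as tmp_file:
            for line in tmp_file:
                r.write(line)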

Related

0D0D turns into 0D on binascii.hexlify(file.read())

I'm trying to read a file's hex code using file.read() and binascii.hexlify(), and in place of 0D 0D in the original file, Python reads/prints only one 0D.
ex:
original file: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00
python: print(binascii.hexlify(f.read(16))) output: 6d6f64652e0d0a24000000000000001c
Any ideas as to why this is happening?
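This looks like newline translation: if the file is opened in text mode on Windows, each \r\n pair is collapsed to \n on read, so the \r\r\n in the file comes back as \r\n. A minimal check (the file name is a placeholder) is to open the file in binary mode, which leaves the bytes untouched:

import binascii

# 'rb' = binary mode: no newline translation, bytes come back exactly as stored
with open('original.bin', 'rb') as f:
    print(binascii.hexlify(f.read(16)))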

python os.read() not reading correct number of bytes

I'm trying to read blocks from a binary file (an Oracle redo log), but I'm having an issue where, when I try to read a 512 byte block using os.read(fd, 512), I am returned fewer than 512 bytes (the amount differs depending on the block).
The documentation states "at most n Bytes", so it makes sense that I'm getting less than expected. How can I force it to keep reading until I get the correct number of bytes back?
I've attempted to adapt the method described here: Python f.read not reading the correct number of bytes. But I still have the problem.
def read_exactly(fd, size):
    data = b''
    remaining = size
    while remaining > 0:  # or simply "while remaining", if you'd like
        newdata = read(fd, remaining)
        if len(newdata) == 0:  # problem
            raise IOError("Failed to read enough data")
        data += newdata
        remaining -= len(newdata)
    return data
def get_one_block(fd, start, blocksize):
    lseek(fd, start, 0)
    blocksize = blocksize
    print('Blocksize: ' + str(blocksize))
    block = read_exactly(fd, blocksize)
    print('Actual Blocksize: ' + str(block.__sizeof__()))
    return block
which then returns the error: OSError: Failed to read enough data
My code:
from os import open, close, O_RDONLY, lseek, read, write, O_BINARY, O_CREAT, O_RDWR

def get_one_block(fd, start, blocksize):
    lseek(fd, start, 0)
    blocksize = blocksize
    print('Blocksize: ' + str(blocksize))
    block = read(fd, blocksize)
    print('Actual Blocksize: ' + str(block.__sizeof__()))
    return block

def main():
    filename = "redo_logs/redo03.log"
    fd = open(filename, O_RDONLY, O_BINARY)
    b = get_one_block(fd, 512, 512)
Output
Blocksize: 512
Actual Blocksize: 502
In this instance the last byte read is 0xB3, which is followed by 0x1A, which I believe is the problem.
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
EF 42 B8 5A DC D1 63 1B A3 31 C7 5E 9F 4A B7 F4
4E 04 6B E8 B3<<-- stops here -->>1A 4F 3C BF C9 3C F6 9F C3 08 02
05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Any help would be greatly appreciated :)
You need to read inside a while loop and check the actual number of bytes you got.
If you got fewer, read again for the remaining delta.
The while loop exits when you have what you expected or you reach EOF.
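As a concrete sketch of that advice (essentially the read_exactly from the question, but returning whatever was read when EOF is hit instead of raising); note also that len(block), not block.__sizeof__(), tells you how many bytes were actually read:

import os

def read_until_complete(fd, size):
    """Call os.read() until `size` bytes are collected or EOF is reached."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:              # empty result means end of file
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

# Usage sketch (path taken from the question):
# fd = os.open("redo_logs/redo03.log", os.O_RDONLY | os.O_BINARY)  # flags are OR'ed; O_BINARY exists only on Windows
# block = read_until_complete(fd, 512)
# print(len(block))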

ADODB unable to store DATETIME value with sub-second precision

According to the Microsoft documentation for the DATETIME column type, values of that type can store "accuracy rounded to increments of .000, .003, or .007 seconds." According to their documentation for the data types used by ADODB, the adDBTimeStamp (code 135), which ADODB uses for DATETIME column parameters, "indicates a date/time stamp (yyyymmddhhmmss plus a fraction in billionths)." However, all attempts (tested using multiple versions of SQL Server, and both the SQLOLEDB provider and the newer SQLNCLI11 provider) fail when a parameter is passed with sub-second precision. Here's a repro case demonstrating the failure:
import win32com.client
# Connect to the database
conn_string = "Provider=...." # sensitive information redacted
conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open(conn_string)
# Create the temporary test table
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "CREATE TABLE #t (dt DATETIME NOT NULL)"
cmd.CommandType = 1 # adCmdText
cmd.Execute()
# Insert a row into the table (with whole second precision)
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "INSERT INTO #t VALUES (?)"
cmd.CommandType = 1 # adCmdText
params = cmd.Parameters
param = params.Item(0)
print("param type is {:d}".format(param.Type)) # 135 (adDBTimeStamp)
param.Value = "2018-01-01 12:34:56"
cmd.Execute() # this invocation succeeds
# Show the result
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "SELECT * FROM #t"
cmd.CommandType = 1 # adCmdText
rs, rowcount = cmd.Execute()
data = rs.GetRows(1)
print(data[0][0]) # displays the datetime value stored above
# Insert a second row into the table (with sub-second precision)
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "INSERT INTO #t VALUES (?)"
cmd.CommandType = 1 # adCmdText
params = cmd.Parameters
param = params.Item(0)
print("param type is {:d}".format(param.Type)) # 135 (adDBTimeStamp)
param.Value = "2018-01-01 12:34:56.003" # <- blows up here
cmd.Execute()
# Show the result
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "SELECT * FROM #t"
cmd.CommandType = 1 # adCmdText
rs, rowcount = cmd.Execute()
data = rs.GetRows(2)
print(data[0][1])
This code throws an exception on the line indicated above, with the error message "Application uses a value of the wrong type for the current operation." Is this a known bug in ADODB? If so, I haven't found any discussion of it. (Perhaps there was discussion earlier which disappeared when Microsoft killed the KB pages.) How can the value be of the wrong type if it matches the documentation?
This is a well-known bug in the SQL Server OLEDB drivers going back more than 20 years, which means it is never going to be fixed.
It's also not a bug in ADO. The ActiveX Data Objects (ADO) API is a thin wrapper around the underlying OLEDB API; the bug is in Microsoft's SQL Server OLEDB drivers themselves (all of them). And they will never, never, never fix it now, because they don't want to touch existing code for fear of breaking existing applications.
So the bug has been carried forward for decades:
SQLOLEDB (1999) → SQLNCLI (2005) → SQLNCLI10 (2008) → SQLNCLI11 (2010) → MSOLEDB (2012)
The only solution is, rather than parameterizing your datetime as a timestamp:
adDBTimeStamp (aka DBTYPE_DBTIMESTAMP, 135)
to parameterize it as an "ODBC 24-hour format" yyyy-mm-dd hh:mm:ss.zzz string:
adChar (aka DBTYPE_STR, 129): 2021-03-21 17:51:22.619
or with the ADO-specific string type:
adVarChar (200): 2021-03-21 17:51:22.619
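Applied to the failing block in the repro above, a minimal sketch of that workaround could look like this (it reuses the open conn and the #t table from the question; the parameter name "@dt" and the size of 23 characters are illustrative assumptions, not part of the original code):

import win32com.client

adCmdText = 1
adVarChar = 200
adParamInput = 1

# `conn` is the ADODB.Connection opened earlier, with #t already created
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "INSERT INTO #t VALUES (?)"
cmd.CommandType = adCmdText

# Supply the parameter ourselves as a string instead of letting ADO derive
# an adDBTimeStamp parameter from the server
param = cmd.CreateParameter("@dt", adVarChar, adParamInput, 23,
                            "2018-01-01 12:34:56.003")
cmd.Parameters.Append(param)
cmd.Execute()  # SQL Server converts the string to DATETIME; the milliseconds survive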
What about other DBTYPE_xxx's?
You might think that the adDate (aka DBTYPE_DATE, 7) looks promising:
Indicates a date value (DBTYPE_DATE). A date is stored as a double, the whole part of which is the number of days since December 30, 1899, and the fractional part of which is the fraction of a day.
But unfortunately not, as it also parameterizes the value to the server without milliseconds:
exec sp_executesql N'SELECT @P1 AS Sample',N'@P1 datetime','2021-03-21 06:40:24'
You also cannot use adFileTime, which also looks promising:
Indicates a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (DBTYPE_FILETIME).
Meaning it could support a resolution of 0.0000001 seconds.
Unfortunately by the rules of VARIANTs, you are not allowed to store a FILETIME in a VARIANT. And since ADO uses variants for all values, it throws up when it encounters variant type 64 (VT_FILETIME).
Decoding TDS to confirm our suspicions
We can confirm that the SQL Server OLEDB driver is not supplying a datetime with the available precision by decoding the packet sent to the server.
We can issue the batch:
SELECT ? AS Sample
And specify parameter 1: adDBTimestamp - 3/21/2021 6:40:23.693
Now we can capture that packet:
0000 03 01 00 7b 00 00 01 00 ff ff 0a 00 00 00 00 00 ...{............
0010 63 28 00 00 00 09 04 00 01 32 28 00 00 00 53 00 c(.......2(...S.
0020 45 00 4c 00 45 00 43 00 54 00 20 00 40 00 50 00 E.L.E.C.T. .@.P.
0030 31 00 20 00 41 00 53 00 20 00 53 00 61 00 6d 00 1. .A.S. .S.a.m.
0040 70 00 6c 00 65 00 00 00 63 18 00 00 00 09 04 00 p.l.e...c.......
0050 01 32 18 00 00 00 40 00 50 00 31 00 20 00 64 00 .2....@.P.1. .d.
0060 61 00 74 00 65 00 74 00 69 00 6d 00 65 00 00 00 a.t.e.t.i.m.e...
0070 6f 08 08 f2 ac 00 00 20 f9 6d 00 o...... .m.
And decode it:
03 ; Packet type. 0x03 = 3 ==> RPC
01 ; Status
00 7b ; Length. 0x07B ==> 123 bytes
00 00 ; SPID
01 ; Packet ID
00 ; Window
ff ff ; ProcName 0xFFFF => Stored procedure number. UInt16 number to follow
0a 00 ; PROCID 0x000A ==> stored procedure ID 10 (10=sp_executesql)
00 00 ; Option flags (16 bits)
00 00 63 28 00 00 00 09 ; blah blah blah
04 00 01 32 28 00 00 00 ;
53 00 45 00 4c 00 45 00 ; \
43 00 54 00 20 00 40 00 ; |
50 00 31 00 20 00 41 00 ; |- "SELECT @P1 AS Sample"
53 00 20 00 53 00 61 00 ; |
6d 00 70 00 6c 00 65 00 ; /
00 00 63 18 00 00 00 09 ; blah blah blah
04 00 01 32 18 00 00 00 ;
40 00 50 00 31 00 20 00 ; \
64 00 61 00 74 00 65 00 ; |- "@P1 datetime"
74 00 69 00 6d 00 65 00 ; /
00 00 6f 08 08 ; blah blah blah
f2 ac 00 00 ; 0x0000ACF2 = 44,274 ==> 1/1/1900 + 44,274 days = 3/21/2021
20 f9 6d 00 ; 0x006DF920 = 7,207,200 ==> 7,207,200 / 300 seconds after midnight = 24,024.000 seconds = 6h 40m 24.000s = 6:40:24.000 AM
The short version is that a datetime is specified on-the-wire as:
datetime is represented in the following sequence:
One 4-byte signed integer that represents the number of days since January 1, 1900. Negative numbers are allowed to represent dates since January 1, 1753.
One 4-byte unsigned integer that represents the number of one three-hundredths of a second (300 counts per second) elapsed since 12 AM that day.
Which means we can read the datetime supplied by the driver as:
Date portion: 0x0000acf2 = 44,274 = January 1, 1900 + 44,274 days = 3/21/2021
Time portion: 0x006df920 = 7,207,200 = 7,207,200 / 300 = 24,024 seconds after midnight = 6:40:24 AM
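As a quick sanity check of that arithmetic, the two integers can be decoded in a few lines of Python (the constants are the values captured above):

from datetime import datetime, timedelta

date_part = 0x0000ACF2   # 44,274 days
time_part = 0x006DF920   # 7,207,200 ticks of 1/300 second

value = datetime(1900, 1, 1) + timedelta(days=date_part, seconds=time_part / 300)
print(value)  # 2021-03-21 06:40:24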
So the driver cut off the precision of our datetime:
Supplied date: 2021-03-21 06:40:23.693
Date in TDS: 2021-03-21 06:40:24
In other words:
OLE Automation uses Double to represent datetime.
The Double has a resolution of ~0.0000003 seconds.
The driver has the option to encode the time down to 1/300th of a second:
6:40:24.693 → 7,207,407 → 0x006DF9EF
But it chose not to. Bug: Driver.
Resources to help decoding TDS
2.2.6.6 RPC Request
4.8 RPC Client Request (actual hex example)
2.2.5.5.1.8 Date/Times

Python: read a little-endian hex value, do math with it, and overwrite the result at the old position

I'm new to programming and I need help.
I have a hex file like this:
43 52 53 00 00 00 00 00 00 00 01 01 30 00 00 00
10 87 01 00 13 00 00 00 10 00 00 00 00 00 00 00
40 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
I need Python code that lets me read the little-endian "10 87 01", do math on it, overwrite the result at the exact offset, and save it,
like 10 87 01 + 40 01 = 50 88 01
43 52 53 00 00 00 00 00 00 00 01 01 30 00 00 00
50 88 01 00 13 00 00 00 10 00 00 00 00 00 00 00
40 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Hope it's clear.
You can use the struct library to handle little-endian values, as shown in the doc at https://docs.python.org/3/library/struct.html#struct.pack_into
For your specific task, I don't know if I understood correctly, because you didn't specify what kind of data you have in your binary file... let's assume we have signed 32-bit integers; my code would be something like this:
import struct

# we are assuming you have 32 bit integers on your file
block_size = 4
filename = "prova.bin"

# function to do "some math... :)"
def do_some_math(my_hex_value):
    return my_hex_value + 1

# open and read the whole file
with open(filename, "r+b") as f:
    my_byte = f.read(block_size)
    while len(my_byte) == block_size:
        # unpack the 4 bytes value read from file
        # "<" stands for "little endian"
        # "i" stands for "integer"
        # more info on the struct library in the official doc
        my_hex_value = struct.unpack_from("<i", my_byte)[0]
        print("Before math = " + str(my_hex_value))
        # let's do some math
        my_hex_value = do_some_math(my_hex_value)
        print("After math = " + str(my_hex_value))
        # let's repack the hex back
        my_byte = struct.pack("<i", my_hex_value)
        # let's reposition the file pointer so as to overwrite
        # the bytes we have previously read
        f.seek(f.tell() - block_size)
        # let's override the old bytes
        f.write(my_byte)
        # let's read another chunk to repeat till the eof
        my_byte = f.read(block_size)
Hope this helps
All the best
Dave
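For the exact bytes shown in the question, here is a minimal sketch that targets just the one field, using int.from_bytes instead of struct; the offsets 16 and 32 and the 3-byte width are read off the dump above and are assumptions about the real file layout:

VALUE_OFFSET = 16    # where "10 87 01" starts
ADDEND_OFFSET = 32   # where "40 01" starts
WIDTH = 3            # the field is 3 bytes, little endian

with open('file.bin', 'r+b') as f:  # hypothetical file name
    f.seek(VALUE_OFFSET)
    value = int.from_bytes(f.read(WIDTH), 'little')       # 0x018710
    f.seek(ADDEND_OFFSET)
    addend = int.from_bytes(f.read(2), 'little')          # 0x0140
    f.seek(VALUE_OFFSET)
    f.write((value + addend).to_bytes(WIDTH, 'little'))   # writes 50 88 01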
Assuming you have your math function working to calculate the new pattern, you could use a function like this:
def replace_pattern(old_pattern, new_pattern, occurrence):
    with open('input_file.txt', 'r') as myfile:
        myline = ""
        for line in myfile:
            myline += line

    if occurrence == 0:  # assume 0 is used to indicate all occurrences must be replaced
        if myline.find(old_pattern) == -1:
            print('pattern not found, exit')
            return
        else:
            newline = myline.replace(old_pattern, new_pattern)
    else:  # a particular occurrence has to be updated
        idx = 0
        offset = 0
        nbmatch = 0
        while idx != -1:
            idx = myline.find(old_pattern, offset)
            if idx != -1:
                offset = idx + 1
                nbmatch += 1
                if nbmatch == occurrence:
                    # the index of the target occurrence has been reached
                    break
        if nbmatch == 0:
            print('problem, at least one occurrence expected')
            return
        elif nbmatch == 1:
            print('problem, more than one occurrence expected, replace anyway')
            newline = myline.replace(old_pattern, new_pattern)
        else:
            # further processing on a part of the line
            sameline = myline[:idx]
            diffline = myline[idx:]
            # work on diffline substring
            diffline = diffline.replace(old_pattern, new_pattern, 1)
            # rebuild line
            newline = sameline + diffline

    with open('input_file.txt', 'w') as myfile:
        myfile.write(newline)
It may not be optimized, but it should work as expected.

numpy.genfromtxt csv file with null characters

I'm working on a scientific graphing script, designed to create graphs from csv files output by Agilent's Chemstation software.
I got the script working perfectly when the files come from one version of Chemstation (The version for liquid chromatography).
Now I'm trying to port it to work on our GC (Gas Chromatography). For some reason, this version of Chemstation inserts nulls between each character in any text file it outputs.
I'm trying to use numpy.genfromtxt to get the x,y data into python in order to create the graphs (using matplotlib).
I originally used:
data = genfromtxt(directory+signal, delimiter = ',')
to load the data in. When I do this with a csv file generated by our GC, I get an array of all 'nan' values. If I set the dtype to None, I get byte strings that look like this:
b'\x00 \x008\x008\x005\x00.\x002\x005\x002\x001\x007\x001\x00\r'
What I need is a float, for the above string it would be 885.252171.
Anyone have any idea how I can get where I need to go?
And just to be clear, I couldn't find any setting in Chemstation that would change its output so that it simply doesn't create files with nulls.
Thanks
Jeff
Given that your file is encoded as utf-16-le with a BOM, and all the actual unicode codepoints (except the BOM) are less than 128, you should be able to use an instance of codecs.EncodedFile to transcode the file from utf-16 to ascii. The following example works for me.
Here's my test file:
$ cat utf_16_le_with_bom.csv
??2.0,19
1.5,17
2.5,23
1.0,10
3.0,5
The first two bytes, ff and fe are the BOM U+FEFF:
$ hexdump utf_16_le_with_bom.csv
0000000 ff fe 32 00 2e 00 30 00 2c 00 31 00 39 00 0a 00
0000010 31 00 2e 00 35 00 2c 00 31 00 37 00 0a 00 32 00
0000020 2e 00 35 00 2c 00 32 00 33 00 0a 00 31 00 2e 00
0000030 30 00 2c 00 31 00 30 00 0a 00 33 00 2e 00 30 00
0000040 2c 00 35 00 0a 00
0000046
Here's the python script genfromtxt_utf16.py (updated for Python 3):
import codecs
import numpy as np
fh = open('utf_16_le_with_bom.csv', 'rb')
efh = codecs.EncodedFile(fh, data_encoding='ascii', file_encoding='utf-16')
a = np.genfromtxt(efh, delimiter=',')
fh.close()
print("a:")
print(a)
With python 3.4.1 and numpy 1.8.1, the script works:
$ python3.4 genfromtxt_utf16.py
a:
[[ 2. 19. ]
[ 1.5 17. ]
[ 2.5 23. ]
[ 1. 10. ]
[ 3. 5. ]]
Be sure that you don't specify the encoding as file_encoding='utf-16-le'. If the endian suffix is included, the BOM is not stripped, and it can't be transcoded to ascii.
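As a possible alternative: newer numpy versions (1.14 and later) added an encoding parameter to genfromtxt, so you may be able to skip the codecs wrapper entirely. I have not tested this against the Chemstation files, so treat it as a sketch:

import numpy as np

# 'utf-16' (without the -le suffix) detects and strips the BOM during decoding
a = np.genfromtxt('utf_16_le_with_bom.csv', delimiter=',', encoding='utf-16')
print(a)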
