According to the Microsoft documentation for the DATETIME column type, values of that type have "accuracy rounded to increments of .000, .003, or .007 seconds." According to the documentation for the data types used by ADODB, adDBTimeStamp (code 135), which ADODB uses for DATETIME column parameters, "indicates a date/time stamp (yyyymmddhhmmss plus a fraction in billionths)." However, every attempt to pass a parameter with sub-second precision fails (tested using multiple versions of SQL Server, and both the SQLOLEDB provider and the newer SQLNCLI11 provider). Here's a repro case demonstrating the failure:
import win32com.client
# Connect to the database
conn_string = "Provider=...." # sensitive information redacted
conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open(conn_string)
# Create the temporary test table
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "CREATE TABLE #t (dt DATETIME NOT NULL)"
cmd.CommandType = 1 # adCmdText
cmd.Execute()
# Insert a row into the table (with whole second precision)
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "INSERT INTO #t VALUES (?)"
cmd.CommandType = 1 # adCmdText
params = cmd.Parameters
param = params.Item(0)
print("param type is {:d}".format(param.Type)) # 135 (adDBTimeStamp)
param.Value = "2018-01-01 12:34:56"
cmd.Execute() # this invocation succeeds
# Show the result
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "SELECT * FROM #t"
cmd.CommandType = 1 # adCmdText
rs, rowcount = cmd.Execute()
data = rs.GetRows(1)
print(data[0][0]) # displays the datetime value stored above
# Insert a second row into the table (with sub-second precision)
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "INSERT INTO #t VALUES (?)"
cmd.CommandType = 1 # adCmdText
params = cmd.Parameters
param = params.Item(0)
print("param type is {:d}".format(param.Type)) # 135 (adDBTimeStamp)
param.Value = "2018-01-01 12:34:56.003" # <- blows up here
cmd.Execute()
# Show the result
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "SELECT * FROM #t"
cmd.CommandType = 1 # adCmdText
rs, rowcount = cmd.Execute()
data = rs.GetRows(2)
print(data[0][1])
This code throws an exception on the line indicated above, with the error message "Application uses a value of the wrong type for the current operation." Is this a known bug in ADODB? If so, I haven't found any discussion of it. (Perhaps there was discussion earlier which disappeared when Microsoft killed the KB pages.) How can the value be of the wrong type if it matches the documentation?
This is a well-known bug in the SQL Server OLEDB drivers, going back more than 20 years, which means it is never going to be fixed.
It's also not a bug in ADO. The ActiveX Data Objects (ADO) API is a thin wrapper around the underlying OLEDB API. The bug is in Microsoft's SQL Server OLEDB drivers themselves (all of them). And they will never fix it now: they don't want to touch existing driver code for fear of breaking existing applications.
So the bug has been carried forward for decades:
SQLOLEDB (1999) → SQLNCLI (2005) → SQLNCLI10 (2008) → SQLNCLI11 (2010) → MSOLEDBSQL (2018)
The only workaround is, rather than parameterizing your datetime as a timestamp:
adDBTimeStamp (aka DBTYPE_DBTIMESTAMP, 135)
to parameterize it as an "ODBC 24-hour format" yyyy-mm-dd hh:mm:ss.zzz string:
adChar (aka DBTYPE_STR, 129): 2021-03-21 17:51:22.619
or even as the ADO-specific variable-length string type:
adVarChar (200): 2021-03-21 17:51:22.619
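A minimal sketch of the workaround in the win32com style of the repro above (the explicit CreateParameter call, the parameter name "dt", and the 23-character width are my own choices, not anything ADO requires; SQL Server's own string-to-datetime conversion preserves the milliseconds):
cmd = win32com.client.Dispatch("ADODB.Command")
cmd.ActiveConnection = conn
cmd.CommandText = "INSERT INTO #t VALUES (?)"
cmd.CommandType = 1  # adCmdText
# 200 = adVarChar, 1 = adParamInput, 23 = len('yyyy-mm-dd hh:mm:ss.zzz')
param = cmd.CreateParameter("dt", 200, 1, 23, "2021-03-21 17:51:22.619")
cmd.Parameters.Append(param)
cmd.Execute()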
What about other DBTYPE_xxx's?
You might think that the adDate (aka DBTYPE_DATE, 7) looks promising:
Indicates a date value (DBTYPE_DATE). A date is stored as a double, the whole part of which is the number of days since December 30, 1899, and the fractional part of which is the fraction of a day.
But unfortunately not, as it also parameterizes the value to the server without milliseconds:
exec sp_executesql N'SELECT @P1 AS Sample',N'@P1 datetime','2021-03-21 06:40:24'
You cannot use adFileTime either, though it too looks promising:
Indicates a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (DBTYPE_FILETIME).
Meaning it could support a resolution of 0.0000001 seconds.
Unfortunately by the rules of VARIANTs, you are not allowed to store a FILETIME in a VARIANT. And since ADO uses variants for all values, it throws up when it encounters variant type 64 (VT_FILETIME).
Decoding TDS to confirm our suspicions
We can confirm that the SQL Server OLEDB driver is not supplying a datetime with the available precision by decoding the packet sent to the server.
We can issue the batch:
SELECT ? AS Sample
And specify parameter 1: adDBTimestamp - 3/21/2021 6:40:23.693
Now we can capture that packet:
0000 03 01 00 7b 00 00 01 00 ff ff 0a 00 00 00 00 00 ...{............
0010 63 28 00 00 00 09 04 00 01 32 28 00 00 00 53 00 c(.......2(...S.
45 00 4c 00 45 00 43 00 54 00 20 00 40 00 50 00 E.L.E.C.T. .@.P.
0030 31 00 20 00 41 00 53 00 20 00 53 00 61 00 6d 00 1. .A.S. .S.a.m.
0040 70 00 6c 00 65 00 00 00 63 18 00 00 00 09 04 00 p.l.e...c.......
0050 01 32 18 00 00 00 40 00 50 00 31 00 20 00 64 00 .2....#.P.1. .d.
0060 61 00 74 00 65 00 74 00 69 00 6d 00 65 00 00 00 a.t.e.t.i.m.e...
0070 6f 08 08 f2 ac 00 00 20 f9 6d 00 o...... .m.
And decode it:
03 ; Packet type. 0x03 = 3 ==> RPC
01 ; Status
00 7b ; Length. 0x07B ==> 123 bytes
00 00 ; SPID
01 ; Packet ID
00 ; Window
ff ff ; ProcName 0xFFFF => Stored procedure number. UInt16 number to follow
0a 00 ; PROCID 0x000A ==> stored procedure ID 10 (10=sp_executesql)
00 00 ; Option flags (16 bits)
00 00 63 28 00 00 00 09 ; blah blah blah
04 00 01 32 28 00 00 00 ;
53 00 45 00 4c 00 45 00 ; \
43 00 54 00 20 00 40 00 ; |
50 00 31 00 20 00 41 00 ; |- "SELECT @P1 AS Sample"
53 00 20 00 53 00 61 00 ; |
6d 00 70 00 6c 00 65 00 ; /
00 00 63 18 00 00 00 09 ; blah blah blah
04 00 01 32 18 00 00 00 ;
40 00 50 00 31 00 20 00 ; \
64 00 61 00 74 00 65 00 ; |- "@P1 datetime"
74 00 69 00 6d 00 65 00 ; /
00 00 6f 08 08 ; blah blah blah
f2 ac 00 00 ; 0x0000ACF2 = 44,274 ==> 1/1/1900 + 44,274 days = 3/21/2021
20 f9 6d 00 ; 0x006DF920 = 7,207,200 ==> 7,207,200 / 300 = 24,024.000 seconds after midnight = 6h 40m 24.000s = 6:40:24.000 AM
The short version is that a datetime is specified on-the-wire as:
datetime is represented in the following sequence:
One 4-byte signed integer that represents the number of days since January 1, 1900. Negative numbers are allowed, to represent dates back to January 1, 1753.
One 4-byte unsigned integer that represents the number of one three-hundredths of a second (300 counts per second) elapsed since 12 AM that day.
Which means we can read the datetime supplied by the driver as:
Date portion: 0x0000acf2 = 44,274 = January 1, 1900 + 44,274 days = 3/21/2021
Time portion: 0x006df920 = 7,207,200 = 7,207,200 / 300 = 24,024 seconds after midnight = 6:40:24 AM
So the driver threw away the sub-second portion of our datetime, rounding it to the whole second:
Supplied date: 2021-03-21 06:40:23.693
Date in TDS: 2021-03-21 06:40:24
In other words:
OLE Automation uses a Double to represent datetimes.
That Double has a resolution of roughly 0.0000003 seconds.
The driver has the option to encode the time down to 1/300th of a second:
6:40:23.693 → 24,023.693 s × 300 ≈ 7,207,108 → 0x006DF8C4
But it chose not to. Bug: Driver.
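For reference, a small Python sketch of the wire arithmetic described above (a hypothetical helper: days since 1900-01-01, plus 1/300ths of a second since midnight):
from datetime import datetime

def tds_datetime(dt):
    # days since 1900-01-01, and 1/300ths of a second since midnight,
    # per the legacy DATETIME wire layout described above
    days = (dt.date() - datetime(1900, 1, 1).date()).days
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6
    return days, round(seconds * 300)

print(tds_datetime(datetime(2021, 3, 21, 6, 40, 24)))
# (44274, 7207200) -> 0x0000ACF2 and 0x006DF920, matching the capture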
Resources to help decoding TDS
2.2.6.6 RPC Request
4.8 RPC Client Request (actual hex example)
2.2.5.5.1.8 Date/Times
Related
Here is my Python code:
import socket
data = bytes.fromhex("47 A2 62 19 20 00 00 00 00")
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
ADDRESS = ("192.168.0.1",9000)
s.connect(ADDRESS)
s.send(data)
I want to append 4 random bytes after "47 A2 62 19 20 00 00 00 00", for example "47 A2 62 19 20 00 00 00 00 20 1E 4A 72".
What should I do? Or is there a better way?
random.randbytes can generate random bytes for you. Note that each space-separated hexadecimal pair is one byte, not two, so the four extra bytes in your example call for four random bytes:
import random
data = bytes.fromhex("47 A2 62 19 20 00 00 00 00") + random.randbytes(4)
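random.randbytes was added in Python 3.9; on older versions, os.urandom is a drop-in substitute here:
import os
data = bytes.fromhex("47 A2 62 19 20 00 00 00 00") + os.urandom(4)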
Good day,
I am not sure where I should place this question. I am learning about DNS and how it works, and as I understand it, I send a request out to a server on UDP port 53 and the host should respond to me on that port, correct?
Here is a script that I am working with. It works, accurately describes the DNS query message format, and even gets a DNS answer back for me.
How is this possible if it cannot listen on port 53 without having root on the system?
DNS PACKET DETAILS
;DNS HEADER;
; AA AA - ID
; 01 00 - Query parameters
; 00 01 - Number of questions
; 00 00 - Number of answers
; 00 00 - Number of authority records
; 00 00 - Number of additional records
; DNS QUESTION --
; 07 - 'example' has length 7; change this to be the length of your domain label. Keep in mind there are no '.' characters in the question section.
; 65 - e
; 78 - x
; 61 - a
; 6D - m
; 70 - p
; 6C - l
; 65 - e
; 03 - the TLD 'com' has length 3; change this to be the length of your TLD.
; 63 - c
; 6F - o
; 6D - m
CODE:
import binascii
import socket
def send_udp_message(message, address, port):
"""send_udp_message sends a message to UDP server
message should be a hexadecimal encoded string
"""
message = message.replace(" ", "").replace("\n", "")
server_address = (address, port)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
sock.sendto(binascii.unhexlify(message), server_address)
data, _ = sock.recvfrom(4096)
finally:
sock.close()
return binascii.hexlify(data).decode("utf-8")
def format_hex(hex):
"""format_hex returns a pretty version of a hex string"""
octets = [hex[i:i+2] for i in range(0, len(hex), 2)]
pairs = [" ".join(octets[i:i+2]) for i in range(0, len(octets), 2)]
return "\n".join(pairs)
message = "AA AA 01 00 00 01 00 00 00 00 00 00 " \
"07 65 78 61 6d 70 6c 65 03 63 6f 6d 00 00 01 00 01"
response = send_udp_message(message, "8.8.8.8", 53)
print(format_hex(response))
RESPONSE:
aa aa
81 80
00 01
00 01
00 00
00 00
07 65
78 61
6d 70
6c 65
03 63
6f 6d
00 00
01 00
01 c0
0c 00
01 00
01 00
00 32
98 00
04 5d
b8 d8
22
If you look at the last four bytes, 5d b8 d8 22, you'll see that this is the IP address for example.com in hex.
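For reference, the conversion in Python (socket.inet_ntoa is in the standard library):
import socket
print(socket.inet_ntoa(bytes.fromhex("5db8d822")))  # 93.184.216.34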
No, your source port is not port 53. User processes are allocated outbound port numbers above 1023, which are unprivileged.
A simple synchronous Python DNS client will basically block and hold the same port open until the server responds. The IP packet you send contains the information that the server needs in order to know where to reply (this is in the headers of the IP packet itself, before the DNS query payload).
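A quick way to see this for yourself is to ask the socket which local address and port the OS assigned. A minimal sketch (connect on a UDP socket sends no packets; it just fixes the destination and allocates a local port):
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect(("8.8.8.8", 53))
print(sock.getsockname())  # e.g. ('192.168.0.5', 54012) -- an ephemeral port above 1023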
Summary: when I use Thrift to serialize a map in C++ to disk and then de-serialize it using Python, I do not get back the same object.
A minimal example to reproduce the problem is in the GitHub repo https://github.com/brunorijsman/reproduce-thrift-crash
Clone this repo on Ubuntu (tested on 16.04) and follow the instructions at the top of the file reproduce.sh
I have the following Thrift model file, which (as you can see) contains a map indexed by a struct:
struct Coordinate {
1: required i32 x;
2: required i32 y;
}
struct Terrain {
1: required map<Coordinate, i32> altitude_samples;
}
I use the following C++ code to create an object with 3 coordinates in the map (see the repo for complete code for all snippets below):
Terrain terrain;
add_sample_to_terrain(terrain, 10, 10, 100);
add_sample_to_terrain(terrain, 20, 20, 200);
add_sample_to_terrain(terrain, 30, 30, 300);
where:
void add_sample_to_terrain(Terrain& terrain, int32_t x, int32_t y, int32_t altitude)
{
Coordinate coordinate;
coordinate.x = x;
coordinate.y = y;
std::pair<Coordinate, int32_t> sample(coordinate, altitude);
terrain.altitude_samples.insert(sample);
}
I use the following C++ code to serialize an object to disk:
shared_ptr<TFileTransport> transport(new TFileTransport("terrain.dat"));
shared_ptr<TBinaryProtocol> protocol(new TBinaryProtocol(transport));
terrain.write(protocol.get());
Important note: for this to work correctly, I had to implement the function Coordinate::operator<. Thrift generates the declaration of Coordinate::operator< but not its implementation. The reason is that Thrift does not understand the semantics of the struct and hence cannot guess the correct implementation of the comparison operator. This is discussed at http://mail-archives.apache.org/mod_mbox/thrift-user/201007.mbox/%3C4C4E08BD.8030407@facebook.com%3E
// Thrift generates the declaration but not the implementation of operator< because it has no way
// of knowing what the criteria for the comparison are. So, provide the implementation here.
bool Coordinate::operator<(const Coordinate& other) const
{
if (x < other.x) {
return true;
} else if (x > other.x) {
return false;
} else if (y < other.y) {
return true;
} else {
return false;
}
}
Then, finally, I use the following Python code to de-serialize the same object from disk:
file = open("terrain.dat", "rb")
transport = thrift.transport.TTransport.TFileObjectTransport(file)
protocol = thrift.protocol.TBinaryProtocol.TBinaryProtocol(transport)
terrain = Terrain()
terrain.read(protocol)
print(terrain)
This Python program outputs:
Terrain(altitude_samples=None)
In other words, the de-serialized Terrain has altitude_samples=None instead of the expected dictionary with 3 coordinates.
I am 100% sure that the file terrain.dat contains valid data: I also de-serialized the same data using C++ and in that case, I do get the expected results (see repo for details)
I suspect that this has something to do with the comparison operator.
My gut feeling is that I should have done something similar in Python with respect to the comparison operator as I did in C++. But I don't know what that missing something would be.
Additional information added on 19-Sep-2018:
Here is a hexdump of the encoding produced by the C++ encoding program:
Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000: 01 00 00 00 0D 02 00 00 00 00 01 01 00 00 00 0C ................
00000010: 01 00 00 00 08 04 00 00 00 00 00 00 03 01 00 00 ................
00000020: 00 08 02 00 00 00 00 01 04 00 00 00 00 00 00 0A ................
00000030: 01 00 00 00 08 02 00 00 00 00 02 04 00 00 00 00 ................
00000040: 00 00 0A 01 00 00 00 00 04 00 00 00 00 00 00 64 ...............d
00000050: 01 00 00 00 08 02 00 00 00 00 01 04 00 00 00 00 ................
00000060: 00 00 14 01 00 00 00 08 02 00 00 00 00 02 04 00 ................
00000070: 00 00 00 00 00 14 01 00 00 00 00 04 00 00 00 00 ................
00000080: 00 00 C8 01 00 00 00 08 02 00 00 00 00 01 04 00 ..H.............
00000090: 00 00 00 00 00 1E 01 00 00 00 08 02 00 00 00 00 ................
000000a0: 02 04 00 00 00 00 00 00 1E 01 00 00 00 00 04 00 ................
000000b0: 00 00 00 00 01 2C 01 00 00 00 00 .....,.....
The first 4 bytes are 01 00 00 00
Using a debugger to step through the Python decoding function reveals that:
This is being decoded as a struct (which is expected)
The first byte 01 is interpreted as the field type. 01 means field type VOID.
The next two bytes are interpreted as the field id. 00 00 means field ID 0.
For field type VOID, nothing else is read and we continue to the next field.
The next byte is interpreted as the field type. 00 means STOP.
We stop reading data for the struct.
The final result is an empty struct.
All of the above is consistent with the information at https://github.com/apache/thrift/blob/master/doc/specs/thrift-binary-protocol.md which describes the Thrift binary encoding format.
My conclusion thus far is that the C++ encoder appears to produce an "incorrect" binary encoding (I put incorrect in quotes because certainly something as blatant as that would have been discovered by lots of other people, so I am sure that I am still missing something).
Additional information added on 19-Sep-2018:
It appears that the C++ implementation of TFileTransport has the concept of "events" when writing to disk.
The output which is written to disk is divided into a sequence of "events" where each "event" is preceded by a 4-byte length field of the event, followed by the contents of the event.
Looking at the hexdump above, the first couple of events are:
01 00 00 00 0d : Event length 1 (4-byte little-endian), event payload 0d
02 00 00 00 00 01 : Event length 2, event payload 00 01
Etc.
The Python implementation of TFileTransport does not understand this concept of events when parsing the file.
It appears that the problem is one of the following two:
1) Either the C++ code should not be inserting these event lengths into the encoded file,
2) Or the Python code should understand these event lengths when decoding the file.
Note that all these event lengths make the C++-encoded file much larger than the Python-encoded file.
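A quick way to test interpretation 2) is to strip the event framing and hand the remaining bytes to a TMemoryBuffer; a hypothetical sketch, assuming the 4-byte little-endian length framing observed in the hex dump above:
import struct
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

def strip_event_framing(raw):
    # remove TFileTransport's 4-byte little-endian event-length headers,
    # concatenating the event payloads
    out, pos = bytearray(), 0
    while pos + 4 <= len(raw):
        (length,) = struct.unpack_from("<I", raw, pos)
        pos += 4
        out += raw[pos:pos + length]
        pos += length
    return bytes(out)

with open("terrain.dat", "rb") as f:
    payload = strip_event_framing(f.read())
protocol = TBinaryProtocol.TBinaryProtocol(TTransport.TMemoryBuffer(payload))
terrain = Terrain()
terrain.read(protocol)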
Sadly, the C++ TFileTransport is not totally portable and will not work with Python's TFileObjectTransport. If you switch to the C++ TSimpleFileTransport, it will work as expected with both Python's TFileObjectTransport and Java's TSimpleFileTransport.
Take a look at the examples here:
https://github.com/RandyAbernethy/ThriftBook/tree/master/part2/types/complex
They do pretty much exactly what you are attempting in Java and Python and you can find examples with C++, Java and Python here (though they add a zip compression layer):
https://github.com/RandyAbernethy/ThriftBook/tree/master/part2/types/zip
Another caution, however, would be against the use of complex key types. Complex key types require comparators (as you discovered) and flat out will not work in some languages. I might suggest, for example:
map<x, map<y, alt>> (i.e., map<i32, map<i32, i32>>)
giving the same utility but eliminating a whole class of possible problems (and no need for comparators).
I'm new to programming and I need help.
I have a hex file like this:
43 52 53 00 00 00 00 00 00 00 01 01 30 00 00 00
10 87 01 00 13 00 00 00 10 00 00 00 00 00 00 00
40 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
I need Python code that reads the little-endian value "10 87 01", does some math on it, overwrites the result at the same offset, and saves the file. For example, 10 87 01 + 40 01 = 50 88 01:
43 52 53 00 00 00 00 00 00 00 01 01 30 00 00 00
50 88 01 00 13 00 00 00 10 00 00 00 00 00 00 00
40 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Hope it's clear.
You can use the struct library to handle little-endian values, as shown in the docs at https://docs.python.org/3/library/struct.html#struct.pack_into
For your specific task, I don't know if I understood correctly, because you didn't specify what kind of data you have in your binary file... let's assume we have signed 32-bit integers; my code would be something like this:
import struct
# we are assuming you have 32 bit integers on your file
block_size = 4
filename = "prova.bin"
# function to do "some math... :)"
def do_some_math(my_hex_value):
return my_hex_value + 1
# open the file in read/write binary mode and process it block by block
with open(filename, "r+b") as f:
my_byte = f.read(block_size)
while len(my_byte) == block_size:
# unpack the 4 bytes value read from file
# "<" stands for "little endian"
# "i" stands for "integer"
# more info on the struct library in the official doc
my_hex_value = struct.unpack_from("<i", my_byte)[0]
print("Before math = " + str(my_hex_value))
# let's do some math
my_hex_value = do_some_math(my_hex_value)
print("After math = " + str(my_hex_value))
# let's repack the hex back
my_byte = struct.pack("<i", my_hex_value)
# let's reposition the file pointer so as to overwrite
# the bytes we have previously read
f.seek(f.tell() - block_size)
# let's override the old bytes
f.write(my_byte)
# let's read another chunk to repeat till the eof
my_byte = f.read(block_size)
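For the exact example in the question, a more direct sketch (assuming, from the dump, that the 3-byte value sits at offset 0x10 and the 2-byte addend at offset 0x20, both little-endian):
with open("prova.bin", "r+b") as f:
    f.seek(0x10)
    value = int.from_bytes(f.read(3), "little")      # 10 87 01 -> 0x018710
    f.seek(0x20)
    addend = int.from_bytes(f.read(2), "little")     # 40 01 -> 0x0140
    f.seek(0x10)
    f.write((value + addend).to_bytes(3, "little"))  # 50 88 01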
Hope this helps
All the best
Dave
Assuming you have your math function working to calculate the new pattern, you could use a function like this:
def replace_pattern(old_pattern, new_pattern, occurrence):
with open('input_file.txt','r') as myfile:
myline=""
for line in myfile:
myline += line
if occurrence == 0: # assume 0 is used to indicate all occurrences must be replaced
if myline.find(old_pattern) == -1:
print('pattern not found, exit')
return
else:
newline = myline.replace(old_pattern, new_pattern)
else: #a particular occurrence has to be updated
idx = 0
offset=0
nbmatch = 0
while idx != -1:
idx = myline.find(old_pattern, offset)
if idx != -1:
offset = idx+1
nbmatch += 1
if nbmatch == occurrence:
# the index of the target occurrence has been reached
break
        if nbmatch == 0:
            print('pattern not found, exit')
            return
        elif nbmatch < occurrence:
            # fewer matches than the requested occurrence number
            print('problem, occurrence %d requested but only %d found' % (occurrence, nbmatch))
            return
        else:
            # replace only the target occurrence: split the text at the
            # match index and replace the first match of the tail
            sameline = myline[:idx]
            diffline = myline[idx:]
            diffline = diffline.replace(old_pattern, new_pattern, 1)
            # rebuild the full text
            newline = sameline + diffline
with open('input_file.txt','w') as myfile:
myfile.write(newline)
It may not be optimized, but it should work as expected.
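Usage for the example in the question might then look like this (occurrence 1 replaces only the first match):
replace_pattern("10 87 01", "50 88 01", 1)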
I am trying to write a program which will allow me to compare SQL files to each other, and have started by writing the full SQL file to a text file. The text file generates successfully, but with block characters at the end of each line, as in the example below:
SET ANSI_NULLS ONഀ
GOഀ
SET QUOTED_IDENTIFIER ONഀ
GOഀ
CREATE TABLE [dbo].[CDR](ഀ
Below is the code that generates the text file:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os
#imports packages
r= open('master_lines.txt', 'w')
directory= "E:\\" #file directory, anonymous omission
master= directory + "master"
databases= ["\\1", "\\2", "\\3", "\\4"]
file_types= ["\\StoredProcedure", "\\Table", "\\UserDefinedFunction", "\\View"]
servers= []
server_number= []
master_lines= []
for file in os.listdir("E:\\"): #adds server paths to an array
servers.append(file)
for num in range(0, len(servers)):
for file in os.listdir(directory + servers[num]): #adds all the servers and paths to an array
server_number.append(servers[num] + "\\" + file)
master= directory + server_number[server_number.index("master")]
master_var= master + databases[0]
tmp= master_var + file_types[1]
for file in os.listdir(tmp):
with open(file) as tmp_file:
line= tmp_file.readlines()
for num in range(0, len(line)):
r.write(line[num])
r.close()
I have already tried changing the encoding to both latin1 and utf-8; the current text file is the most successful, as ascii and latin1 produced Chinese and Arabic characters respectively.
Below is the SQL file in text format:
/****** Object: Table [dbo].[CDR] Script Date: 2017-01-12 02:30:49 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[CDR](
[calldate] [datetime] NOT NULL,
[clid] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[src] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dst] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dcontext] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[channel] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dstchannel] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[lastapp] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[lastdata] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[duration] [int] NOT NULL,
[billsec] [int] NOT NULL,
[disposition] [varchar](45) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[amaflags] [int] NOT NULL,
[accountcode] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[userfield] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[uniqueid] [varchar](64) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[cdr_id] [int] NOT NULL,
[cost] [real] NOT NULL,
[cdr_tag] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[importID] [bigint] IDENTITY(-9223372036854775807,1) NOT NULL,
CONSTRAINT [PK_CDR_1] PRIMARY KEY CLUSTERED
(
[uniqueid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [ReadPartition]
) ON [ReadPartition]
GO
SET ANSI_PADDING ON
GO
/****** Object: Index [Idx_Dst_incl_uniqueId] Script Date: 2017-01-12 02:30:50 PM ******/
CREATE NONCLUSTERED INDEX [Idx_Dst_incl_uniqueId] ON [dbo].[CDR]
(
[dst] ASC
)
INCLUDE ( [uniqueid]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [ReadPartition]
GO
Hex dump of the generated file, to understand what happens (not part of the question above):
ff fe 2f 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00
20 00 4f 00 62 00 6a 00 65 00 63 00 74 00 3a 00
20 00 20 00 54 00 61 00 62 00 6c 00 65 00 20 00
5b 00 64 00 62 00 6f 00 5d 00 2e 00 5b 00 43 00
44 00 52 00 5d 00 20 00 20 00 20 00 20 00 53 00
63 00 72 00 69 00 70 00 74 00 20 00 44 00 61 00
74 00 65 00 3a 00 20 00 32 00 30 00 31 00 37 00
2d 00 30 00 31 00 2d 00 31 00 32 00 20 00 30 00
32 00 3a 00 33 00 30 00 3a 00 34 00 39 00 20 00
50 00 4d 00 20 00 2a 00 2a 00 2a 00 2a 00 2a 00
2a 00 2f 00 0d 00 0a 00 53 00 45 00 54 00 20 00
41 00 4e 00 53 00 49 00 5f 00 4e 00 55 00 4c 00
4c 00 53 00 20 00 4f 00 4e 00 0d 00 0a 00 47 00
4f 00 0d 00 0a 00 53 00 45 00 54 00 20 00 51 00
55 00 4f 00 54 00 45 00 44 00 5f 00 49 00 44 00
Result of hexdump:
../.*.*.*.*.*.*.
.O.b.j.e.c.t.:.
. .T.a.b.l.e. .
[.d.b.o.]...[.C.
D.R.]. . . . .S.
c.r.i.p.t. .D.a.
t.e.:. .2.0.1.7.
-.0.1.-.1.2. .0.
2.:.3.0.:.4.9. .
P.M. .*.*.*.*.*.
*./.....S.E.T. .
A.N.S.I._.N.U.L.
L.S. .O.N.....G.
O.....S.E.T. .Q.
U.O.T.E.D._.I.D.
Your problem is that the original files are encoded in UTF-16 with an initial Byte Order Mark. This is normally transparent on Windows, because almost all file editors detect it automatically thanks to the initial BOM.
But the conversion is not automatic for Python scripts! That means every character is read as the character itself followed by a null byte. This is almost transparent, because the nulls are simply written back out and re-form normal UTF-16 characters. The exception is at end of lines: the \n is no longer preceded by a raw \r but by a null byte, and since you write in text mode, Python replaces the \n with the pair \r\n, which no longer lines up as a valid UTF-16 character. This is what causes the block display.
This is trivial to fix: just declare the UTF-16 encoding when reading the files:
for file in os.listdir(tmp):
with open(file, encoding='utf_16_le') as tmp_file:
Optionally, if you want to preserve the UTF-16 encoding, you could also open the master file with it; by default, Python will encode the output as utf-8. But my advice would be to stick with an 8-bit encoding for the output file, to avoid further problems if you later want to process it.
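If you did want the output file to stay UTF-16, the corresponding one-line change would be (the utf_16 codec writes the BOM for you):
r = open('master_lines.txt', 'w', encoding='utf_16')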