I have two variables from an XML file:
edit: *I'm sorry, I pasted the wrong value.*
x="00 25 9E B8 B9 19 "
y="F0 00 00 25 9E B8 B9 19 "
When I use the statement if x in y: nothing happens,
but if I use if "00 25 9E B8 B9 19 " in y: I get results.
Any idea?
I am adding my full code:
import xml.etree.ElementTree as ET

tree = ET.parse('c:/sw_xml_test_4a.xml')
root = tree.getroot()
for sw in root.findall('switch'):
    for switch in root.findall('switch'):
        if sw[6].text.rstrip() in switch.find('GE01').text:
            print switch[0].text
        if sw[6].text.strip() in switch.find('GE02').text.strip():
            print switch[0].text
        if sw[6].text.strip() in switch.find('GE03').text.strip():
            print switch[0].text
        if sw[6].text.strip() in switch.find('GE04').text.strip():
            print switch[0].text
XML file detail:
<switch>
<ci_adi>"aaa_bbb_ccc"</ci_adi>
<ip_adress>10.10.10.10</ip_adress>
<GE01>"F0 00 00 25 9E 2C BC 98 "</GE01>
<GE02>"80 00 80 FB 06 C6 A1 2B "</GE02>
<GE03>"F0 00 00 25 9E B8 BB AA "</GE03>
<GE04>"F0 00 00 25 9E B8 BB AA "</GE04>
<bridge_id>"00 25 9E B8 BB AA "</bridge_id>
</switch>
>>> x = "00 25 9E 2C BC 8B"
>>> y = "F0 00 00 25 9E B8 B9 19"
>>> x in y
False
>>> "00 25 9E 2C BC 8B " in y
False
How exactly are you getting results?
Let me explain what in is checking.
in checks whether the entire value of x is contained anywhere within the value of y. As you can see, the entire value of x is NOT contained, in its entirety, in y.
However, some elements of x are; maybe what you are trying to do is:
>>> x = ["00", "25", "9E", "2C", "BC", "8B"]
>>> y = "F0 00 00 25 9E B8 B9 19"
>>> for item in x:
...     if item in y:
...         print item + " is in " + y
00 is in F0 00 00 25 9E B8 B9 19
25 is in F0 00 00 25 9E B8 B9 19
9E is in F0 00 00 25 9E B8 B9 19
The operators in and not in test for collection membership. x in s evaluates to True if x is a member of the collection s, and False otherwise. For strings, this translates to: return True if the entire string x is a substring of y, else return False.
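As a quick illustration (the value of y is copied from the question; the newline case is my own addition to show the usual cause of a silent mismatch):

```python
y = "F0 00 00 25 9E B8 B9 19 "

# True: the whole of x appears contiguously inside y
print("00 25 9E B8 B9 19" in y)

# False: invisible characters (a trailing newline here) break the match,
# which is why values read from a file often need .strip() first
print("00 25 9E B8 B9 19\n" in y)

# False: membership is substring containment, not "all pieces present"
print("19 B9" in y)
```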
Other than a mix-up of values in your question, this seems to work the way you want:
sxml="""\
<switch>
<ci_adi>"aaa_bbb_ccc"</ci_adi>
<ip_adress>10.10.10.10</ip_adress>
<GE01>"F0 00 00 25 9E 2C BC 98 "</GE01>
<GE02>"80 00 80 FB 06 C6 A1 2B "</GE02>
<GE03>"F0 00 00 25 9E B8 BB AA "</GE03>
<GE04>"F0 00 00 25 9E B8 BB AA "</GE04>
<bridge_id>"00 25 9E B8 BB AA "</bridge_id>
</switch>"""
import xml.etree.ElementTree as et

tree = et.fromstring(sxml)
x = "80 00 80 FB 06 C6 A1 2B"  # Note: I used a value of x I could see in the data;
                               # your value of x="00 25 9E B8 B9 19 " is not present...
for el in tree:
    print '{}: {}'.format(el.tag, el.text)
    if x in el.text:
        print 'I found "{}" by gosh at {}!!!\n'.format(x, el.tag)
With Python, I wanted to format a string of hex characters:
spaces between each byte (easy enough): 2f2f -> 2f 2f
line breaks at a specified max byte width (not hard): 2f 2f 2f 2f 2f 2f 2f 2f\n
address ranges for each line (doable): 0x7f8-0x808: 2f 2f 2f 2f 2f 2f 2f 2f\n
replace large runs of sequential 00 bytes with: ... trimmed 35 x 00 bytes [0x7 - 0x2a] ...
... it was at this point that I knew I was doing some bad coding. The function got bloated and hard to follow, with too many features piled up in a non-intuitive way.
Example output:
0x0-0x10: 5a b6 f7 6e 7c 65 45 a0 bc 6a e5 f5 77 2b 92 48
0x10-0x20: 47 d7 33 ea 40 15 44 ac 6b a4 50 78 6e f2 10 d4
0x20-0x30: 9c 7c c1 f7 5a bf ec 9f b0 2b b7 29 97 ee 56 31
0x30-0x40: ff 23 d9 1a 0b 4e fd 65 50 92 42 eb b2 77 7a 55
0x40-0x50:
I'm pretty sure the address ranges aren't correct anymore in certain cases (particularly when the 00 replacement occurs), the function just looks disgusting, and I'm embarrassed to even show it.
def pretty_print_hex(hex_str, byte_width=16, line_start=False, addr=0):
    out = ''
    condense_min = 12
    total_bytes = int(len(hex_str) / 2)
    line_width = False
    if byte_width is not False:
        line_width = byte_width * 2
    if line_start is not False:
        out += line_start
        end = addr + byte_width
        if (end > addr + total_bytes):
            end = addr + total_bytes
        out += f"{hex(addr)}-{hex(end)}:\t"
        addr += byte_width
    i = 0
    if len(hex_str) == 1:
        print('Cannot pretty print < 1 byte', hex_str)
        return
    condensing = False
    cond_start_addr = 0
    cond_end_addr = 0
    condense_cache = []
    while i < len(hex_str):
        byte = hex_str[i] + hex_str[i + 1]
        i += 2
        if byte == '00':
            condensing = True
            cond_start_addr = (addr - byte_width) + ((i + 1) % byte_width)
            condense_cache.append(byte)
        else:
            if condensing is True:
                condensed_count = len(condense_cache)
                if condensed_count >= condense_min:
                    cond_end_addr = cond_start_addr + condensed_count
                    out += f"... trimmed {condensed_count} x 00 bytes [{hex(cond_start_addr)} - {hex(cond_end_addr)}] ..."
                else:
                    for byte in condense_cache:
                        out += f"{byte} "
                condense_cache = []
                condensing = False
        if condensing is False:
            out += byte + ' '
        if (line_width is not False) and (i) % line_width == 0:
            out += '\n'
            if line_start is not False:
                out += line_start
                end = addr + byte_width
                if end > addr + total_bytes:
                    end = addr + total_bytes
                if (addr - end) != 0:
                    out += f"{hex(addr)}-{hex(end)}:\t"
                addr += byte_width
    if condensing is True:
        condensed_count = len(condense_cache)
        if condensed_count >= condense_min:
            cond_end_addr = cond_start_addr + condensed_count
            out += f"... trimmed {condensed_count} x 00 bytes [{hex(cond_start_addr)} - {hex(cond_end_addr)}] ..."
        else:
            for byte in condense_cache:
                out += f"{byte} "
    return out.rstrip()
Example input / output:
hex_str = 'c8d8fb631cc7d072b62aaf9cd47bc270d4341e35f23b7a94acf24f33397a6cb4145b6eacfd56653d79bea10d2842023155e5b14bec3b5851a0a58cb3a523c476b126486e1392bdd2e3bcb6cbc333b23de387ae8624123009'
byte_width=16
line_start='\t'
addr=0
print(pretty_print_hex(hex_str , byte_width=16, line_start='\t', addr=0))
0x0-0x10: c8 d8 fb 63 1c c7 d0 72 b6 2a af 9c d4 7b c2 70
0x10-0x20: d4 34 1e 35 f2 3b 7a 94 ac f2 4f 33 39 7a 6c b4
0x20-0x30: 14 5b 6e ac fd 56 65 3d 79 be a1 0d 28 42 02 31
0x30-0x40: 55 e5 b1 4b ec 3b 58 51 a0 a5 8c b3 a5 23 c4 76
0x40-0x50: b1 26 48 6e 13 92 bd d2 e3 bc b6 cb c3 33 b2 3d
0x50-0x60: e3 87 ae 86 24 12 30 09
It gets much worse when you involve some 00 replacement, here's an example of that:
hex_str = 'c8000000000000000000000000000aaf9cd47bc270d4341e35f23b7a94acf24f33397a6cb4145b6eacfd56653d79bea10d2842023155e5b14bec3b5851a0a58cb3a523c476b126486e1392bdd2e3bcb6cbc333b23de387ae8624123009'
byte_width=16
line_start='\t'
addr=0
print(pretty_print_hex(hex_str, byte_width=16, line_start='\t', addr=0))
0x0-0x10: c8 ... trimmed 13 x 00 bytes [0xd - 0x1a] ...0a af
0x10-0x20: 9c d4 7b c2 70 d4 34 1e 35 f2 3b 7a 94 ac f2 4f
0x20-0x30: 33 39 7a 6c b4 14 5b 6e ac fd 56 65 3d 79 be a1
0x30-0x40: 0d 28 42 02 31 55 e5 b1 4b ec 3b 58 51 a0 a5 8c
0x40-0x50: b3 a5 23 c4 76 b1 26 48 6e 13 92 bd d2 e3 bc b6
0x50-0x60: cb c3 33 b2 3d e3 87 ae 86 24 12 30 09
It would also make more sense for the address range (`0x0-0x10`) to portray the true range, including the trimmed bytes on that line, but I couldn't even begin to think of how to add that in.
Rather than patch this bad looking function, I thought I might ask for a better approach entirely, if one exists.
I would suggest not starting a "trimmed 00 bytes" series in the middle of an output line, but only applying this compacting when it covers complete output lines containing only zeroes.
This means that you will still see non-compacted zeroes in a line that also contains non-zeroes, but in my opinion this results in a cleaner output format. For instance, if a line would end with just two 00 bytes, it really does not help to replace that last part of the line with the longer "trimmed 2 x 00 bytes" message. By only replacing complete 00-lines with this message, and compress multiple such lines with one message, the output format seems cleaner.
To produce that output format, I would use the power of regular expressions:
to identify a block of bytes to be output on one line: either a line with at least one non-zero, or a range of zero bytes which either runs to the end of the input, or else is a multiple of the "byte width" argument.
to insert spaces in a line of bytes
All this can be done through iterations in one expression:
import re

def pretty_print_hex(hex_str, byte_width=16, line_start='\t', addr=0):
    return "\n".join(f"{hex(start)}-{hex(last)}:{line_start}{line}"
        for start, last, line in (
            (match.start() // 2, match.end() // 2 - 1,
             f"...trimmed {(match.end() - match.start()) // 2} x 00 bytes..." if match[1]
                 else re.sub("(..)(?!$)", r"\1 ", match[0])
            )
            for match in re.finditer(
                f"(0+$|(?:(?:00){{{byte_width}}})+)|(?:..){{1,{byte_width}}}",
                hex_str
            )
        )
    )
If you want to use it rather than write it (not sure - tell me to delete if required), you can use the excellent (I am not associated with it) hexdump:
https://pypi.org/project/hexdump
python -m hexdump binary.dat
It is super cool - I guess you could also inspect the source for ideas.
It doesn't, however, look like it is still maintained...
I liked the challenge in this function, and this is what I could come up with this evening. It is somewhat shorter than your original one, but not as short as trincot's answer.
def hexpprint(
    hexstring: str,
    width: int = 16,
    hexsep: str = " ",
    addr: bool = False,
    addrstart: int = 0,
    linestart: str = "",
    compress: bool = False,
):
    # if address get hex address length size
    if addr:
        addrlen = len(f"{addrstart+len(hexstring):x}")
    # compression buffer just count hex 0 chars
    cbuf = 0
    for i in range(0, len(hexstring), width):
        j = i + width
        row = hexstring[i:j]
        # if using compression and compressable
        if compress and row.count("0") == len(row):
            cbuf += len(row)
            continue
        # if not compressable and has cbuf, flush it
        if cbuf:
            line = linestart
            if addr:
                beg = f"0x{addrstart+i-cbuf:0{addrlen}x}"
                end = f"0x{addrstart+i:0{addrlen}x}"
                line += f"{beg}-{end} "
            line += f"compressed {cbuf//2} NULL bytes"
            print(line)
            cbuf = 0
        # print formatted hex row
        line = linestart
        if addr:
            beg = f"0x{addrstart+i:0{addrlen}x}"
            end = f"0x{addrstart+i+len(row):0{addrlen}x}"
            line += f"{beg}-{end} "
        line += hexsep.join(row[i : i + 2] for i in range(0, width, 2))
        print(line)
    # flush cbuf if necessary
    if cbuf:
        line = linestart
        if addr:
            beg = f"0x{addrstart+i-cbuf:0{addrlen}x}"
            end = f"0x{addrstart+len(hexstring):0{addrlen}x}"
            line += f"{beg}-{end} "
        line += f"compressed {cbuf//2} NULL bytes"
        print(line)
PS: I don't really like the code repetition to print things, so I might come back and edit later.
I have a column in a pandas data frame that is formatted like
f1 d3 a4 0a d0 6a 4b 4a 83 d4 4f c9 1f 15 11 17
and I want to convert it to look like:
f1d3a40a-d06a-4b4a-83d4-4fc91f151117
I know I can use replace(" ", "") to take the whitespace out, but I am not sure how to insert the hyphens in the exact spots that I need them.
I am also not sure how to apply it to a pandas series object.
Any help would be appreciated!
This looks like a UUID, so I'd just use that module
>>> import uuid
>>> s = 'f1 d3 a4 0a d0 6a 4b 4a 83 d4 4f c9 1f 15 11 17'
>>> uuid.UUID(''.join(s.split()))
UUID('f1d3a40a-d06a-4b4a-83d4-4fc91f151117')
>>> str(uuid.UUID(''.join(s.split())))
'f1d3a40a-d06a-4b4a-83d4-4fc91f151117'
EDIT:
import pandas as pd

df = pd.DataFrame({'col': ['f1 d3 a4 0a d0 6a 4b 4a 83 d4 4f c9 1f 15 11 17',
                           'f1 d3 a4 0a d0 6a 4b 4a 83 d4 4f c9 1f 15 11 17']})
df['col'] = df['col'].str.split().str.join('').apply(uuid.UUID)
print(df)
col
0 f1d3a40a-d06a-4b4a-83d4-4fc91f151117
1 f1d3a40a-d06a-4b4a-83d4-4fc91f151117
a = "f1 d3 a4 0a d0 6a 4b 4a 83 d4 4f c9 1f 15 11 17"
c = "f1d3a40a-d06a-4b4a-83d4-4fc91f151117"
b = [4, 2, 2, 2, 6]

def space_2_hyphens(s, num_list, hyphens="-"):
    sarr = s.split(" ")
    if len(sarr) != sum(num_list):
        raise Exception("str split num must equal sum(num_list)")
    out = []
    k = 0
    for n in num_list:
        out.append("".join(sarr[k:k + n]))
        k += n
    return hyphens.join(out)

print(a)
print(space_2_hyphens(a, b))
print(c)
I have the following contents in a data.log file. I wish to extract the ts value and part of the payload (the bytes after deadbeef in the payload: third row, starting at the second-to-last byte; please refer to the expected output).
data.log
print 1: file offset 0x0
ts=0x584819041ff529e0 2016-12-07 14:13:24.124834649 UTC
type: ERF Ethernet
dserror=0 rxerror=0 trunc=0 vlen=0 iface=1 rlen=96 lctr=0 wlen=68
pad=0x00 offset=0x00
dst=aa:bb:cc:dd:ee:ff src=ca:fe:ba:be:ca:fe
etype=0x0800
45 00 00 32 00 00 40 00 40 11 50 ff c0 a8 34 35 E..2..#.#.P...45
c0 a8 34 36 80 01 00 00 00 1e 00 00 08 08 08 08 ..46............
08 08 50 e6 61 c3 85 21 01 00 de ad be ef 85 d7 ..P.a..!........
91 21 6f 9a 32 94 fd 07 01 00 de ad be ef 85 d7 .!o.2...........
print 2: file offset 0x60
ts=0x584819041ff52b00 2016-12-07 14:13:24.124834716 UTC
type: ERF Ethernet
dserror=0 rxerror=0 trunc=0 vlen=0 iface=1 rlen=96 lctr=0 wlen=68
pad=0x00 offset=0x00
dst=aa:bb:cc:dd:ee:ff src=ca:fe:ba:be:ca:fe
etype=0x0800
45 00 00 32 00 00 40 00 40 11 50 ff c0 a8 34 35 E..2..#.#.P...45
c0 a8 34 36 80 01 00 00 00 1e 00 00 08 08 08 08 ..46............
08 08 68 e7 61 c3 85 21 01 00 de ad be ef 86 d7 ..h.a..!........
91 21 c5 34 77 bd fd 07 01 00 de ad be ef 86 d7 .!.4w...........
print 3806: file offset 0x592e0
ts=0x584819042006b840 2016-12-07 14:13:24.125102535 UTC
type: ERF Ethernet
dserror=0 rxerror=0 trunc=0 vlen=0 iface=1 rlen=96 lctr=0 wlen=68
pad=0x00 offset=0x00
dst=aa:bb:cc:dd:ee:ff src=ca:fe:ba:be:ca:fe
etype=0x0800
45 00 00 32 00 00 40 00 40 11 50 ff c0 a8 34 35 E..2..#.#.P...45
c0 a8 34 36 80 01 00 00 00 1e 00 00 08 08 08 08 ..46............
08 08 50 74 73 c3 85 21 01 00 de ad be ef 62 e6 ..Pts..!......b.
91 21 ed 4a 8c df fd 07 01 00 de ad be ef 62 e6 .!.J..........b.
My expected output
0x584819041ff529e0,85d79121
0x584819041ff52b00,86d79121
0x584819042006b840,62e69121
What I have tried so far
I am able to extract the ts value. I used
awk -v ORS="" '$NF == "UTC"{print sep$1; sep=","} END{print "\n"}' data.log
>> ts=0x584819041ff529e0,ts=0x584819041ff52b00
But I didn't succeed in extracting the payload contents.
Any help is much appreciated.
Here's one way to get it done:
awk -F '=| ' '/^ts=/{printf $2","} /de ad be ef/{if(!a){printf $15$16;a=1}else{print $1$2;a=0}}' data.log
Output:
0x584819041ff529e0,85d79121
0x584819041ff52b00,86d79121
Explanation:
-F '=| ' : set the field separator to both '=' and space
/^ts=/{printf $2","} : if the pattern 'ts=' is found at the beginning of a line, print the second field followed by a comma
/de ad be ef/{something} : if the pattern 'de ad be ef' is found, do 'something'
Initially, variable a is equal to 0. When the pattern de ad be ef is found for the first time, if(!a) succeeds, so the 15th and 16th fields are printed and a is set to 1. When de ad be ef is matched on the next line, the if(!a) check fails, so the 1st and 2nd fields are printed and a is reset to 0. The same process continues for the rest of the file.
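For comparison, here is a rough Python sketch of the same two-state scan (the function and variable names are mine, not part of the awk answer):

```python
def extract(lines):
    """Pair each ts= value with the four bytes that straddle the first
    two 'de ad be ef' lines below it, mimicking the awk toggle above."""
    results, ts, tail, first = [], None, "", True
    for line in lines:
        fields = line.replace("=", " ").split()   # same effect as awk -F '=| '
        if line.startswith("ts="):
            ts = fields[1]
            first = True
        elif "de ad be ef" in line:
            if first:                  # first match: 15th and 16th fields
                tail = fields[14] + fields[15]
                first = False
            else:                      # second match: 1st and 2nd fields
                results.append(ts + "," + tail + fields[0] + fields[1])
    return results
```

Feeding it the lines of data.log should yield entries like 0x584819041ff529e0,85d79121.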
If you want sed:
sed -n -e '/^ts/ {s/^ts=\([^ ]*\) \(.*\)/\1/; H;};' \
-e '/de ad be ef/ {N; s/\(.*\)de ad be ef \([0-9a-f]\+\) \([0-9a-f]\+\) \(.*\) \([0-9a-f]\+\) \([0-9a-f]\+\) \(.*\)/,\2\3\5\6/; H;};' \
-e '$ {x; s/\n,/,/g p;}' file
If you are interested in further infos, just ask.
An awk variant using deadbeef as a switch:
awk -F '[= ]' '/^ts/{s=$2",";a=15} /de ad be ef/{s=s $a $(a+1);if(a==1)print s;a=1}' data.log
and a sed variant
sed -n -e '/^ts=/{h;b^J}' -e "/de ad be ef/,//{H;g;s/ts=\([^ ]*\).*\n*de ad be ef \(..\) \(..\).*\n\(..\) \(..\).*/\1,\2\3\4\4/p;}" data.log
info: "^J" is a CTRL+J (a newline character) in the POSIX version and a ";" in the GNU version
With GNU awk for gensub():
$ awk -v RS= '{
gsub(/( |\t)+[^\n]*(\n|$)/," ")
print gensub(/.*\nts=(\S+).*de ad be ef (..) (..) (..) (..).*/,"\\1,\\2\\3\\4\\5\\6",1)
}' data.log
0x584819041ff529e0,85d79121
0x584819041ff52b00,86d79121
0x584819042006b840,62e69121
The above will work even if deadbeef is split across lines.
I am trying to write a simple program in Python to use the Telegram API (not the bot API, the main messaging API). I have written this code:
#!/usr/bin/env python
import socket
import random
import time
import struct
import requests
def swap32(i):
    return struct.unpack("<L", struct.pack(">L", i))[0]
MESSAGE = '0000000000000000'+format(swap32(int(time.time()*1000%1000)<<21|random.randint(0,1048575)<<3|4),'x')+format(swap32(int(time.time())),'x')+'140000007897466068edeaecd1372139bbb0394b6fd775d3'
res = requests.post(url='http://149.154.167.40',
                    data=bytes.fromhex(MESSAGE),
                    headers={'connection': 'keep-alive'})
print("received data:", res)
For the payload of the POST data I used the source code of Telegram Web. The 0 auth_key_id and the message_id are generated using the algorithm in Telegram Web; next is the length (14000000), just like in the source and the main docs, and then the method and so on.
When I run this code I get received data: <Response [404]>. I have tried both the TCP and HTTP transports with this, and the TCP one gives me nothing as an answer from the server. I don't know where I'm wrong in my code.
I would be glad if someone could show me the error in my code.
BTW, here is a hex dump of my generated request:
0000 34 08 04 17 7a ec 48 5d 60 84 ba ed 08 00 45 00
0010 00 50 c6 07 40 00 40 06 76 28 c0 a8 01 0d 95 9a
0020 a7 28 c9 62 00 50 0d 1a 3b df 41 5a 40 7f 50 18
0030 72 10 ca 39 00 00 00 00 00 00 00 00 00 00 6c 28
0040 22 4a 94 a9 c9 56 14 00 00 00 78 97 46 60 68 ed
0050 ea ec d1 37 21 39 bb b0 39 4b 6f d7 75 d3
I have already read this and this and many other docs but can't find my problem.
Thanks in advance.
Update
I used this code as suggested:
TCP_IP = '149.154.167.40'
TCP_PORT = 80
MESSAGE = 'ef0000000000000000'+"{0:0{1}x}".format(int(time.time()*4294.967296*1000),16)+'140000007897466068edeaecd1372139bbb0394b6fd775d3'
BUFFER_SIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))
s.send(bytes.fromhex(MESSAGE))
data = s.recv(BUFFER_SIZE)
s.close()
and I still get no response.
Hex dump of my request:
0000 34 08 04 17 7a ec 48 5d 60 84 ba ed 08 00 45 00
0010 00 51 e1 44 40 00 40 06 5a ea c0 a8 01 0d 95 9a
0020 a7 28 df 8c 00 50 e4 0d 12 46 e2 98 bf a3 50 18
0030 72 10 af 66 00 00 ef 00 00 00 00 00 00 00 00 00
0040 16 37 dc e1 28 39 23 14 00 00 00 78 97 46 60 68
0050 ed ea ec d1 37 21 39 bb b0 39 4b 6f d7 75 d3
Fixed code
I finally got it working with this code:
import socket
import random
import time
import struct
import requests
def swap32(i):
return struct.unpack("<L", struct.pack(">L", i))[0]
TCP_IP = '149.154.167.40'
TCP_PORT = 80
z = int(time.time()*4294.967296*1000000)
z = format(z,'x')
q = bytearray.fromhex(z)
e = q[::-1].hex()
MESSAGE = 'ef0a0000000000000000'+e+'140000007897466068edeaecd1372139bbb0394b6fd775d3'
BUFFER_SIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))
s.send(bytes.fromhex(MESSAGE))
data = s.recv(BUFFER_SIZE)
s.close()
print(data)
here is sample data from a simple TCP handshake with Telegram Servers:
Connect:Success:0
Connected to 149.154.167.40:443
raw_data: 000000000000000000F011DB3B2AA9561400000078974660A9729A4F5B51F18F7943F9C0D61B1315
auth_key_id: 0000000000000000 0
message_id: 56A92A3BDB11F000 6244568794892726272
data_length: 00000014 20
message_data: 78974660A9729A4F5B51F18F7943F9C0D61B1315
message_type: 60469778
>> EF0A000000000000000000F011DB3B2AA9561400000078974660A9729A4F5B51F18F7943F9C0D61B1315
Send:Success:42
Receive:Success:85
<< 15000000000000000001CC0CC93D2AA9564000000063241605A9729A4F5B51F18F7943F9C0D61B1315B4445B94718B3C6DD4136466FAC62DCD082311272BE9FF8F9700000015C4B51C01000000216BE86C022BB4C3
raw_data: 000000000000000001CC0CC93D2AA9564000000063241605A9729A4F5B51F18F7943F9C0D61B1315B4445B94718B3C6DD4136466FAC62DCD082311272BE9FF8F9700000015C4B51C01000000216BE86C022BB4C3
auth_key_id: 0000000000000000 0
message_id: 56A92A3DC90CCC01 6244568803180334081
data_length: 00000040 64
message_data: 63241605A9729A4F5B51F18F7943F9C0D61B1315B4445B94718B3C6DD4136466FAC62DCD082311272BE9FF8F9700000015C4B51C01000000216BE86C022BB4C3
message_type: 05162463
classid: resPQ#05162463
nonce: A9729A4F5B51F18F7943F9C0D61B1315
server_nonce: B4445B94718B3C6DD4136466FAC62DCD
pq: 2311272BE9FF8F97 2526843935494475671
count: 00000001 1
fingerprints: C3B42B026CE86B21 14101943622620965665
Let's break it down:
We are using the TCP abridged version, so we start off with 0xEF.
The format for plain-text Telegram messages is auth_key_id + msg_id + msg_len + msg.
auth_key_id is always 0 for plain-text messages, hence we always start with 0000000000000000.
msg_id must approximately equal unixtime*2^32 (see here). I have also seen that this variant works quite well for msg_id in any language on any platform: whole_part_of(current_micro_second_time_stamp * 4294.967296).
The first message you start with for auth_key generation is reqPQ, which is defined as: reqPQ#0x60469778 {:nonce, :int128}. So it is simply a TL-header + a 128-bit random integer; the total length will always be 4 + 16 = 20, which encoded as little-endian gives msg_len = 14000000.
Say we have the 128-bit random integer 55555555555555555555555555555555; then our reqPQ message would be 7897466055555555555555555555555555555555, which is simply TL-type 60469778 (78974660 in little-endian) followed by your randomly chosen 128-bit nonce.
Before you send out the packet, recall that TCP-abridged mode requires you to include the total packet length in front of the other bytes, just after the initial 0xEF. This packet length is computed as follows:
let len = total_length / 4
a) If len < 127 then len_header = len as byte
b) If len >= 127 then len_header = 0x7f + to_3_byte_little_endian(len)
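A minimal Python sketch of the two plain-text header pieces described above (the function names are mine; this is not code from any Telegram library):

```python
import time

def msg_id() -> int:
    # msg_id must approximately equal unixtime * 2^32
    return int(time.time() * 2**32)

def abridged_len_header(total_length: int) -> bytes:
    # TCP-abridged: length is counted in 4-byte words; one byte for short
    # packets, 0x7f plus a 3-byte little-endian count for long ones
    words = total_length // 4
    if words < 127:
        return bytes([words])
    return b"\x7f" + words.to_bytes(3, "little")

# 8 (auth_key_id) + 8 (msg_id) + 4 (msg_len) + 20 (reqPQ) = 40 bytes
print(abridged_len_header(40).hex())   # -> '0a', the 0A right after EF above
```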
finally we have:
EF0A000000000000000000F011DB3B2AA956140000007897466055555555555555555555555555555555
or
EF0A
0000000000000000
00F011DB3B2AA956
14000000
78974660
55555555555555555555555555555555
compared to yours:
0000000000000000
6C28224A94A9C956
14000000
78974660
68EDEAECD1372139BBB0394B6FD775D3
I would say: try using TCP-abridged mode by including the 0xEF starting byte, and re-check your msg_id computation.
Cheers.
I've got two binary files. They look something like this, but the data is more random:
File A:
FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF ...
File B:
41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37 ...
What I'd like is to call something like:
>>> someDiffLib.diff(file_a_data, file_b_data)
And receive something like:
[Match(pos=4, length=4)]
Indicating that in both files the bytes at position 4 are the same for 4 bytes. The sequence 44 43 42 41 would not match because it is not at the same position in each file.
Is there a library that will do the diff for me? Or should I just write the loops to do the comparison?
You can use itertools.groupby() for this, here is an example:
from itertools import groupby
# this just sets up some byte strings to use, Python 2.x version is below
# instead of this you would use f1 = open('some_file', 'rb').read()
f1 = bytes(int(b, 16) for b in 'FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF'.split())
f2 = bytes(int(b, 16) for b in '41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37'.split())
matches = []

for k, g in groupby(range(min(len(f1), len(f2))), key=lambda i: f1[i] == f2[i]):
    if k:
        pos = next(g)
        length = len(list(g)) + 1
        matches.append((pos, length))
Or the same thing as above using a list comprehension:
matches = [(next(g), len(list(g)) + 1)
           for k, g in groupby(range(min(len(f1), len(f2))), key=lambda i: f1[i] == f2[i])
           if k]
Here is the setup for the example if you are using Python 2.x:
f1 = ''.join(chr(int(b, 16)) for b in 'FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF'.split())
f2 = ''.join(chr(int(b, 16)) for b in '41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37'.split())
The provided itertools.groupby solution works fine, but it's pretty slow.
I wrote a fairly naive attempt using numpy and tested it against the other solution on a particular 16MB file I happened to have; it was about 42x faster on my machine. Someone familiar with numpy could likely improve this significantly.
import numpy as np

def compare(path1, path2):
    x, y = np.fromfile(path1, np.int8), np.fromfile(path2, np.int8)
    length = min(x.size, y.size)
    x, y = x[:length], y[:length]
    z = np.where(x == y)[0]
    if z.size == 0:
        return z
    borders = np.append(np.insert(np.where(np.diff(z) != 1)[0] + 1, 0, 0), len(z))
    lengths = borders[1:] - borders[:-1]
    starts = z[borders[:-1]]
    return np.array([starts, lengths]).T
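To sanity-check it against the question's sample bytes, one can write them to temporary files first (compare() is repeated below only so the snippet runs standalone; the file names are temporary and arbitrary):

```python
import os
import tempfile
import numpy as np

def compare(path1, path2):
    x, y = np.fromfile(path1, np.int8), np.fromfile(path2, np.int8)
    length = min(x.size, y.size)
    x, y = x[:length], y[:length]
    z = np.where(x == y)[0]
    if z.size == 0:
        return z
    borders = np.append(np.insert(np.where(np.diff(z) != 1)[0] + 1, 0, 0), len(z))
    lengths = borders[1:] - borders[:-1]
    starts = z[borders[:-1]]
    return np.array([starts, lengths]).T

a = bytes.fromhex('FFFFFFFF00000000FFFF44434241FFFF')  # File A from the question
b = bytes.fromhex('41424344000000004443424140393837')  # File B from the question
with tempfile.TemporaryDirectory() as d:
    p1, p2 = os.path.join(d, 'a.bin'), os.path.join(d, 'b.bin')
    with open(p1, 'wb') as f:
        f.write(a)
    with open(p2, 'wb') as f:
        f.write(b)
    result = compare(p1, p2)
print(result)   # only the zero run at position 4, length 4, matches in place
```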