I'm far from a Python 3 expert, but I'm focused on learning some new skills with it, so any help would be appreciated. While working on a personal project that I want to put on GitHub later, I ran into a command that outputs the following Python dictionary:
{'masscan': {'command_line': 'masscan -oX - 192.168.0.131/24 -p 22,80 --max-rate=1000', 'scanstats': {'timestr': '2022-03-26 10:00:07', 'elapsed': '12', 'uphosts': '2', 'downhosts': '0', 'totalhosts': '2'}}, 'scan': {'192.168.0.254': {'tcp': {80: {'state': 'open', 'reason': 'syn-ack', 'reason_ttl': '64', 'endtime': '1648285195', 'services': []}, 22: {'state': 'open', 'reason': 'syn-ack', 'reason_ttl': '64', 'endtime': '1648285195', 'services': []}}}}}
I then want to parse that to the following JSON format:
{
    "data": [
        {
            "{#PORT}": 80,
            "{#STATE}": "OPEN",
            "{#ENDTIME}": "1648285195"
        },
        {
            "{#PORT}": 22,
            "{#STATE}": "OPEN",
            "{#ENDTIME}": "1648285195"
        }
    ]
}
What would be the most efficient way to parse it? I don't want it to end up in a file; I'd prefer to keep it within my code. Keep in mind that there might be more ports than just 22 and 80. The dictionary might be a lot longer, but it follows the same format.
Thanks!
This function should return exactly what you want (I suppose):
def parse_data(scan_result):
    data = []
    for ip in scan_result['scan']:
        for protocol in scan_result['scan'][ip]:
            for port, info in scan_result['scan'][ip][protocol].items():
                # Upper-case the state to match the requested output.
                port_data = {"{#PORT}": port, "{#STATE}": info['state'].upper(), "{#ENDTIME}": info['endtime']}
                data.append(port_data)
    return {'data': data}
The function returns (output):
{
    "data":[
        {
            "{#PORT}":80,
            "{#STATE}":"OPEN",
            "{#ENDTIME}":"1648285195"
        },
        {
            "{#PORT}":22,
            "{#STATE}":"OPEN",
            "{#ENDTIME}":"1648285195"
        }
    ]
}
I don't know where the 'Interface #2' in the 'state' of port 22 came from (in your desired result).
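Since the question asks for JSON rather than a Python dict, the result can be serialised in memory with the standard json module, so nothing has to be written to a file. This is a minimal sketch, assuming the scan dict has the shape shown in the question (each host entry contains only protocol tables); the name build_discovery and the shortened sample data are for illustration only:

```python
import json

# Sample result, shortened from the question.
scan_result = {
    'scan': {
        '192.168.0.254': {
            'tcp': {
                80: {'state': 'open', 'endtime': '1648285195'},
                22: {'state': 'open', 'endtime': '1648285195'},
            }
        }
    }
}


def build_discovery(result):
    """Collect one entry per scanned port (hypothetical helper)."""
    data = []
    for protocols in result['scan'].values():
        for ports in protocols.values():
            for port, info in ports.items():
                data.append({
                    "{#PORT}": port,
                    "{#STATE}": info['state'].upper(),
                    "{#ENDTIME}": info['endtime'],
                })
    return {"data": data}


# json.dumps returns a string, so nothing is written to disk.
json_text = json.dumps(build_discovery(scan_result), indent=4)
print(json_text)
```

The string in json_text can then be passed on to whatever consumes the JSON.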
A possible solution is the following:
log_data = {'masscan': {'command_line': 'masscan -oX - 192.168.0.131/24 -p 22,80 --max-rate=1000', 'scanstats': {'timestr': '2022-03-26 10:00:07', 'elapsed': '12', 'uphosts': '2', 'downhosts': '0', 'totalhosts': '2'}}, 'scan': {'192.168.0.254': {'tcp': {80: {'state': 'open', 'reason': 'syn-ack', 'reason_ttl': '64', 'endtime': '1648285195', 'services': []}, 22: {'state': 'open', 'reason': 'syn-ack', 'reason_ttl': '64', 'endtime': '1648285195', 'services': []}}}}}
result = {"data": []}
# Walk host -> protocol -> port; log_data is the dictionary defined above.
for k, v in log_data['scan'].items():
    for tcp, tcp_data in v.items():
        for port, port_data in tcp_data.items():
            data = {"{#PORT}": port, "{#STATE}": port_data['state'], "{#ENDTIME}": port_data['endtime']}
            result["data"].append(data)
print(result)
Prints
{'data': [
{'{#PORT}': 80, '{#STATE}': 'open', '{#ENDTIME}': '1648285195'},
{'{#PORT}': 22, '{#STATE}': 'open', '{#ENDTIME}': '1648285195'}]}
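Note that the output above keeps the state in lowercase, whereas the requested format uses OPEN. A hedged variant as a comprehension, which upper-cases the state and only looks at protocol tables; the PROTOCOLS tuple is an assumption about which keys hold port tables, and the sample data is shortened from the question:

```python
log_data = {
    'scan': {
        '192.168.0.254': {
            'tcp': {
                80: {'state': 'open', 'endtime': '1648285195'},
                22: {'state': 'open', 'endtime': '1648285195'},
            }
        }
    }
}

# Assumption: port tables live only under these protocol keys.
PROTOCOLS = ('tcp', 'udp')

result = {
    "data": [
        {
            "{#PORT}": port,
            "{#STATE}": port_data['state'].upper(),
            "{#ENDTIME}": port_data['endtime'],
        }
        for host_data in log_data['scan'].values()
        for protocol in PROTOCOLS
        # .get() skips hosts that have no table for this protocol.
        for port, port_data in host_data.get(protocol, {}).items()
    ]
}
print(result)
```

Restricting the lookup to known protocol keys keeps the loop from tripping over other per-host entries if the dictionary ever contains them.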
You could do a recursive search for the 'tcp' key and go from there. Something like this:
mydict = {'masscan': {'command_line': 'masscan -oX - 192.168.0.131/24 -p 22,80 --max-rate=1000', 'scanstats': {'timestr': '2022-03-26 10:00:07', 'elapsed': '12', 'uphosts': '2', 'downhosts': '0', 'totalhosts': '2'}},
'scan': {'192.168.0.254': {'tcp': {80: {'state': 'open', 'reason': 'syn-ack', 'reason_ttl': '64', 'endtime': '1648285195', 'services': []}, 22: {'state': 'open', 'reason': 'syn-ack', 'reason_ttl': '64', 'endtime': '1648285195', 'services': []}}}}}
def findkey(d, k):
    # Return the value of the first key k found at any depth, or None.
    if k in d:
        return d[k]
    for v in d.values():
        if isinstance(v, dict):
            if r := findkey(v, k):
                return r
rdict = {'data': []}
for k, v in findkey(mydict, 'tcp').items():
    rdict['data'].append(
        {'{#PORT}': k, '{#STATE}': v['state'].upper(), '{#ENDTIME}': v['endtime']})
print(rdict)
Output:
{'data': [{'{#PORT}': 80, '{#STATE}': 'OPEN', '{#ENDTIME}': '1648285195'}, {'{#PORT}': 22, '{#STATE}': 'OPEN', '{#ENDTIME}': '1648285195'}]}
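One caveat: findkey stops at the first 'tcp' key it meets, so a scan with several hosts would only report the first one. A possible generator-based variant that yields every match is sketched below; the name find_all and the second host in the sample are made up for illustration:

```python
def find_all(d, key):
    """Yield every value stored under `key`, at any depth of nested dicts."""
    for k, v in d.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find_all(v, key)


# Two hosts; the second one is hypothetical sample data.
scan = {
    'scan': {
        '192.168.0.254': {'tcp': {80: {'state': 'open', 'endtime': '1648285195'}}},
        '192.168.0.1': {'tcp': {22: {'state': 'open', 'endtime': '1648285196'}}},
    }
}

rdict = {'data': []}
for ports in find_all(scan, 'tcp'):
    for port, info in ports.items():
        rdict['data'].append(
            {'{#PORT}': port, '{#STATE}': info['state'].upper(), '{#ENDTIME}': info['endtime']})
print(rdict)
```

If the dictionary has another 'tcp' key outside the per-host data, it is probably safer to start the search from the 'scan' sub-dictionary.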
How can I use NSE (the Nmap Scripting Engine) in Python? For example:
nmap -p80 google.com --script=http-enum
Output:
PORT STATE SERVICE
80/tcp open http
| http-enum:
|_ /partners/: Potentially interesting folder
Python:
nm.scan('google.com', arguments='-p80 --script=http-enum')
Output:
{'142.250.186.142': {'nmap': {'command_line': 'nmap -oX - -p80 --script=http-enum 142.250.186.142', 'scaninfo': {'tcp': {'method': 'connect', 'services': '80'}}, 'scanstats': {'timestr': 'Tue Jul 12 07:38:13 2022', 'elapsed': '11.03', 'uphosts': '1', 'downhosts': '0', 'totalhosts': '1'}}, 'scan': {'142.250.186.142': {'hostnames': [{'name': 'fra24s07-in-f14.1e100.net', 'type': 'PTR'}], 'addresses': {'ipv4': '142.250.186.142'}, 'vendor': {}, 'status': {'state': 'up', 'reason': 'syn-ack'}, 'tcp': {80: {'state': 'open', 'reason': 'syn-ack', 'name': 'http', 'product': '', 'version': '', 'extrainfo': '', 'conf': '3', 'cpe': ''}}}}}}
When called from Python, it looks like it returns a dictionary with all the info. To get specific info, you can access the individual entries as shown in the docs. If you instead want to run the same command line from a Python script, you can use subprocess:
import subprocess
command = "nmap -p80 google.com --script=http-enum"
subprocess.Popen(command.split(' '))
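Popen as used above starts the process but does not collect its output. If the text report is needed inside the script, subprocess.run with capture_output=True is one option. A minimal sketch follows; run_and_capture is a hypothetical helper, and the demonstration runs the Python interpreter instead of nmap so that it works without nmap installed:

```python
import shlex
import subprocess
import sys


def run_and_capture(args):
    """Run a command given as an argument list and return its stdout as text."""
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# shlex.split handles quoting more robustly than str.split(' ').
nmap_args = shlex.split("nmap -p80 google.com --script=http-enum")
# run_and_capture(nmap_args) would return nmap's report (requires nmap).

# Stand-in command so the sketch runs anywhere Python does.
output = run_and_capture([sys.executable, "-c", "print(42)"])
print(output)
```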
I use Flower to monitor Celery tasks, but if the result string is long, Flower doesn't display all of it.
When I query Flower from Python, the result is the same: it is still incomplete.
{'Name': {21: {'state': 'open', 'reason': 'syn-ack', 'name': 'ftp', 'product': 'vsftpd', 'version': '2.3.4', 'extrainfo': '', 'conf': '10', 'cpe': 'cpe:/a:vsftpd:vsftpd:2.3.4'}, 22: {'state': 'open', 'reason': 'syn-ack', 'name': 'ssh', 'product': 'OpenSSH', 'version': '4.7p1 Debian 8ubuntu1', 'extrainfo': 'protocol 2.0', 'conf': '10', 'cpe': 'cpe:/o:linux:linux_kernel'}, 23: {'state': 'open', 'reason': 'syn-ack', 'name': 'telnet', 'product': 'Linux telnetd', 'version': '', 'extrainfo': '', 'conf': '10', 'cpe': 'cpe:/o:linux:linux_kernel'}, 25: {'state': 'open', 'reason': 'syn-ack', 'name': 'smtp', 'product': 'Postfix smtpd', 'version': '', 'extrainfo': '', 'conf': '10', 'cpe': 'cpe:/a:postfix:postfix'}, 53: {'state': 'open', 'reason': 'syn-ack', 'name': 'domain', 'product': 'ISC BIND', 'version': '9.4.2', 'extrainfo': '', 'conf': '10', 'cpe': 'cpe:/a:isc:bind:9.4.2', 'script': {...}}, 80: {'state': 'open', 'reason': 'syn-ack', 'name': 'http', 'product': 'Apache httpd', 'version': '2.2.8', 'extrainfo': '(Ubuntu...', ...}}}
Update:
I did what you said, @sp1rs: I set resultrepr_maxsize to a very high number, but the JSON that I get still omits some parts, such as 'script'. It still shows {...}. I can't copy-paste it here because it is too long, but I can share a screenshot. You can see that the 'script' key doesn't have its result (third line): ibb.co/G0YShMK
In addition, if I fetch the task result with get() in the Python shell, the 'script' keys and values come through intact, but Flower doesn't show them. Any idea?
Flower is just the dashboard and displays what Celery gives to it. For performance reasons, Celery limits the length of the task result representation.
https://docs.celeryproject.org/en/latest/reference/celery.app.task.html#celery.app.task.Task.resultrepr_maxsize
By default, resultrepr_maxsize = 1024.
Increase the resultrepr_maxsize value to lengthen the displayed result.
For some reason when I call hostname(), nothing happens and it returns nothing. Here is a snippet of the code where I used it:
print("save output as txt?")
m = input("y/n: ")
for c in m:
    if m == "y":
        write = True
    elif m == "n":
        write = False
nm = nmap.PortScanner(nmap_search_path=('nmap', '/usr/bin/nmap', '/usr/local/bin/nmap', '/sw/bin/nmap', '/opt/local/bin/nmap', 'C:/Program Files(x86)/Nmap'))
nm.scan(hosts='192.168.1.0/24', arguments='-n -sP')
hosts_list = [(x, nm[x]['status']['state']) for x in nm.all_hosts()]
for host, status in hosts_list:
    print(host + ' ' + status )
    print(nm[host].hostname()) # < my problem
    if write == True:
        with open('log.txt', 'a') as f:
            f.write('\n' + host + ' ' + status)
Everything works except for line 14, where I call hostname(). Could anyone explain what I'm doing wrong? Thank you.
This is because the IP address whose hostname you are requesting doesn't have a name. I recommend printing the scan result and checking whether the 'hostnames' entry contains a non-empty 'name'.
I have checked it, and it works as expected:
>>> nm.scan('8.8.8.8', '22')
{'nmap': {'command_line': 'nmap -oX - -p 22 -sV 8.8.8.8', 'scaninfo': {'tcp': {'method': 'connect', 'services': '22'}}, 'scanstats': {'downhosts': '0', 'uphosts': '1', 'timestr': 'Fri Feb 23 11:15:33 2018', 'elapsed': '0.51', 'totalhosts': '1'}}, 'scan': {'8.8.8.8': {'vendor': {}, 'status': {'state': 'up', 'reason': 'syn-ack'}, 'addresses': {'ipv4': '8.8.8.8'}, 'hostnames': [{'name': 'google-public-dns-a.google.com', 'type': 'PTR'}], 'tcp': {22: {'extrainfo': '', 'state': 'filtered', 'reason': 'no-response', 'version': '', 'name': 'ssh', 'product': '', 'cpe': '', 'conf': '3'}}}}}
>>> nm['8.8.8.8'].hostname()
'google-public-dns-a.google.com'
>>> nm.scan('192.168.2.100', '22')
{'nmap': {'command_line': 'nmap -oX - -p 22 -sV 192.168.2.100', 'scaninfo': {'tcp': {'method': 'connect', 'services': '22'}}, 'scanstats': {'downhosts': '0', 'uphosts': '1', 'timestr': 'Fri Feb 23 11:15:46 2018', 'elapsed': '0.39', 'totalhosts': '1'}}, 'scan': {'192.168.2.100': {'vendor': {}, 'status': {'state': 'up', 'reason': 'syn-ack'}, 'addresses': {'ipv4': '192.168.2.100'}, 'hostnames': [{'name': '', 'type': ''}], 'tcp': {22: {'extrainfo': 'protocol 2.0', 'state': 'open', 'reason': 'syn-ack', 'version': '7.6', 'name': 'ssh', 'product': 'OpenSSH', 'cpe': 'cpe:/a:openbsd:openssh:7.6', 'conf': '10'}}}}}
>>> nm['192.168.2.100'].hostname()
''
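As the two sessions show, an empty 'name' in the 'hostnames' list leads to an empty string. It may also be worth noting that the scan in the question passes -n, which tells nmap not to do DNS resolution, so no names would be looked up there in the first place. A small sketch of a fallback for hosts without a name, working on the plain result dictionary; first_hostname and its default text are hypothetical, and the sample data is shortened from the output above:

```python
def first_hostname(scan_result, host, default='(no hostname)'):
    """Return the first non-empty name in the host's 'hostnames' list."""
    for entry in scan_result['scan'][host].get('hostnames', []):
        if entry.get('name'):
            return entry['name']
    return default


result = {
    'scan': {
        '8.8.8.8': {'hostnames': [{'name': 'google-public-dns-a.google.com', 'type': 'PTR'}]},
        '192.168.2.100': {'hostnames': [{'name': '', 'type': ''}]},
    }
}

for host in result['scan']:
    print(host, first_hostname(result, host))
```

With the PortScanner object itself, a shorter form of the same idea would be nm[host].hostname() or '(no hostname)'.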
This is the value of my dictionary, but I want to get only details like the product and version for ports 443 and 80.
Is there a way or a command that can get this info?
Here is my dictionary value:
{'nmap': {'scanstats': {'timestr': 'Fri Apr 17 05:08:18 2015', 'uphosts': '1', 'downhosts': '0', 'totalhosts': '1', 'elapsed': '14.91'}, 'scaninfo': {'tcp': {'services': '80,443', 'method': 'connect'}}, 'command_line': 'nmap -oX - -p 80,443 -sV xxxx'}, 'scan': {'x.x.x.x': {'status': {'state': 'up', 'reason': 'syn-ack'}, 'hostname': 'xxxx', 'vendor': {}, 'addresses': {'ipv4': '0x.x.x'}, 'tcp': {'443': {'product': 'Apache Tomcat/Coyote JSP engine', 'name': 'http', 'extrainfo': '', 'reason': 'syn-ack', 'cpe': '', 'state': 'open', 'version': '1.1', 'conf': '10'}, '80': {'product': 'Apache Tomcat/Coyote JSP engine', 'name': 'http', 'extrainfo': '', 'reason': 'syn-ack', 'cpe': '', 'state': 'open', 'version': '1.1', 'conf': '0'}}}}}
So I ran this command:
scan=[v for k,v in x.iteritems() if 'scan' in k]
It gives me result below:
[{
    'x.x.x.x': {
        'status': {
            'state': 'up',
            'reason': 'syn-ack'
        },
        'hostname': 'xxxx',
        'vendor': {},
        'addresses': {
            'ipv4': 'x.x.x.x'
        },
        'tcp': {
            '443': {
                'product': 'Apache Tomcat/Coyote JSP engine',
                'name': 'http',
                'extrainfo': '',
                'reason': 'syn-ack',
                'cpe': '',
                'state': 'open',
                'version': '1.1',
                'conf': '10'
            },
            '80': {
                'product': '',
                'name': 'http',
                'extrainfo': '',
                'reason': 'conn-refused',
                'cpe': '',
                'state': 'closed',
                'version': '',
                'conf': '3'
            }
        }
    }
}]
You can try the following:
>>> data = [{'x.x.x.x': {'status': {'state': 'up', 'reason': 'syn-ack'}, 'hostname': 'xxxx', 'vendor': {}, 'addresses': {'ipv4': 'x.x.x.x'}, 'tcp': {'443': {'product': 'Apache Tomcat/Coyote JSP engine', 'name': 'http', 'extrainfo': '', 'reason': 'syn-ack', 'cpe': '', 'state': 'open', 'version': '1.1', 'conf': '10'}, '80': {'product': '', 'name': 'http', 'extrainfo': '', 'reason': 'conn-refused', 'cpe': '', 'state': 'closed', 'version': '', 'conf': '3'}}}}]
>>> for i in data[0]['x.x.x.x']['tcp']:
...     print(i, data[0]['x.x.x.x']['tcp'][i]['product'], data[0]['x.x.x.x']['tcp'][i]['version'])
...
443 Apache Tomcat/Coyote JSP engine 1.1
80
>>>
You could use the items method (iteritems in Python 2) to extract both the port number and the associated information:
In [4]: for port, info in data[0]['x.x.x.x']['tcp'].items():
   ...:     print(port, info['product'], info['version'])
   ...:
443 Apache Tomcat/Coyote JSP engine 1.1
80
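If only certain ports and fields are of interest, the same lookup can be narrowed with a dict comprehension. This is just a sketch: wanted_ports and wanted_fields are illustrative names, the sample is cut down from the data above, and the port keys are strings here because that is how they appear in this particular dictionary:

```python
tcp = {
    '443': {'product': 'Apache Tomcat/Coyote JSP engine', 'version': '1.1', 'state': 'open'},
    '80': {'product': '', 'version': '', 'state': 'closed'},
}

wanted_ports = ('443', '80')
wanted_fields = ('product', 'version')

summary = {
    port: {field: tcp[port].get(field, '') for field in wanted_fields}
    for port in wanted_ports
    if port in tcp  # skip ports that were not scanned
}
print(summary)
```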
You can always use d.keys() to see which keys a dictionary has while you traverse it.
d = your dictionary
d['x.x.x.x']['tcp']['443']['product']
Out[109]: 'Apache Tomcat/Coyote JSP engine'
d['x.x.x.x']['tcp']['443']['version']
Out[110]: '1.1'
d['x.x.x.x']['tcp']['80']['product']
Out[109]: ''
d['x.x.x.x']['tcp']['80']['version']
Out[110]: ''
Your data is basically a tree, so it can be traversed recursively with a function like this:
def parse_data(output, keys_wanted, values_wanted, data):
    # Use data.iteritems() instead of data.items() on Python 2.
    for key, val in data.items():
        if key in keys_wanted:
            output.update({key: {k: val[k] for k in values_wanted}})
        if isinstance(val, dict):
            parse_data(output, keys_wanted, values_wanted, val)
Use:
data = <your dict>
keys = ['443', '80']
vals = ['product', 'version']
out = {}
parse_data(out, keys, vals, data)
Output:
>>> print(out)
{'443': {'product': 'Apache Tomcat/Coyote JSP engine', 'version': '1.1'}, '80': {'product': '', 'version': ''}}
A benefit to this function is that it's general purpose -- if you want different keys and values just pass different lists in the parameters.
BTW, in your sample input, the dict is inside a list, but there's just one item, so I stripped off the list brackets for simplicity's sake. If your actual data is in a list with many other items, you'd of course want to call this function in an iteration loop.
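For the list case mentioned above, the loop could look roughly like this. It is a sketch in Python 3 syntax (items instead of iteritems), and the two list items with their host addresses are invented sample data:

```python
def parse_data(output, keys_wanted, values_wanted, data):
    """Recursively collect the wanted values for every wanted key."""
    for key, val in data.items():
        if key in keys_wanted:
            output.update({key: {k: val[k] for k in values_wanted}})
        if isinstance(val, dict):
            parse_data(output, keys_wanted, values_wanted, val)


# Hypothetical list with one scan dictionary per item.
scans = [
    {'10.0.0.1': {'tcp': {'443': {'product': 'Apache Tomcat/Coyote JSP engine',
                                  'version': '1.1', 'state': 'open'}}}},
    {'10.0.0.2': {'tcp': {'80': {'product': '', 'version': '', 'state': 'closed'}}}},
]

results = []
for item in scans:
    out = {}  # fresh dict per item so results do not overwrite each other
    parse_data(out, ['443', '80'], ['product', 'version'], item)
    results.append(out)
print(results)
```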
Given the following information in a string:
[:T102684-1 coord="107,20,885,18":]27.[:/T102684-1:] [:T102684-2
coord="140,16,885,18":]A.[:/T102684-2:] [:T102684-3
coord="162,57,885,18":]Francke[:/T102684-3:][:T102684-4
coord="228,5,885,18":]:[:/T102684-4:] [:T102684-5
coord="240,27,885,18":]Die[:/T102684-5:] [:T102684-6
coord="274,42,885,18":]alpine[:/T102684-6:] [:T102684-7
coord="325,64,885,18":]Literatur[:/T102684-7:] [:T102684-8
coord="398,25,885,18":]des[:/T102684-8:] [:T102684-9
coord="427,46,885,18":]Jahres[:/T102684-9:] [:T102684-10
coord="480,33,885,18":]1888[:/T102684-10:] [:T102684-11
coord="527,29,885,18":]475[:/T102684-11:]
How can I extract the tab ID (here: T102684), the token ID (the number after the "-"), the coordinates (107,20,885,18) and the token itself ("27.")?
I tried simple find methods, but that doesn't work:
for tok in ele.text.split():
    print tok.find("[:T")
    print tok.rfind(":]")
    print tok[(tok.find("[:T")+2):tok.rfind("-")]
Thanks for any help!
You can use regex for this:
>>> import re
>>> s = '[:T102684-1 coord="107,20,885,18":]27.[:/T102684-1:] [:T102684-2 coord="140,16,885,18":]A.[:/T102684-2:] [:T102684-3 coord="162,57,885,18":]Francke[:/T102684-3:][:T102684-4 coord="228,5,885,18":]:[:/T102684-4:] [:T102684-5 coord="240,27,885,18":]Die[:/T102684-5:] [:T102684-6 coord="274,42,885,18":]alpine[:/T102684-6:] [:T102684-7 coord="325,64,885,18":]Literatur[:/T102684-7:] [:T102684-8 coord="398,25,885,18":]des[:/T102684-8:] [:T102684-9 coord="427,46,885,18":]Jahres[:/T102684-9:] [:T102684-10 coord="480,33,885,18":]1888[:/T102684-10:] [:T102684-11 coord="527,29,885,18":]475[:/T102684-11:]'
>>> r = re.compile(r'''\[:/?T(?P<token_id>\d+)-(?P<id>\d+)\s+coord="
(?P<coord>(\d+,\d+,\d+,\d+))":\](?P<token>\w+)''', flags=re.VERBOSE)
>>> for m in r.finditer(s):
...     print m.groupdict()
...
{'token_id': '102684', 'token': '27', 'id': '1', 'coord': '107,20,885,18'}
{'token_id': '102684', 'token': 'A', 'id': '2', 'coord': '140,16,885,18'}
{'token_id': '102684', 'token': 'Francke', 'id': '3', 'coord': '162,57,885,18'}
{'token_id': '102684', 'token': 'Die', 'id': '5', 'coord': '240,27,885,18'}
{'token_id': '102684', 'token': 'alpine', 'id': '6', 'coord': '274,42,885,18'}
{'token_id': '102684', 'token': 'Literatur', 'id': '7', 'coord': '325,64,885,18'}
{'token_id': '102684', 'token': 'des', 'id': '8', 'coord': '398,25,885,18'}
{'token_id': '102684', 'token': 'Jahres', 'id': '9', 'coord': '427,46,885,18'}
{'token_id': '102684', 'token': '1888', 'id': '10', 'coord': '480,33,885,18'}
{'token_id': '102684', 'token': '475', 'id': '11', 'coord': '527,29,885,18'}
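Note that in the output above the token with ID 4 (":") is missing and "27." comes out as "27", because \w+ only matches word characters. A possible variant that captures everything up to the matching closing tag is sketched below in Python 3 syntax; the group names tab_id and token_id are my own choice, and the sample string is shortened from the question:

```python
import re

s = ('[:T102684-1 coord="107,20,885,18":]27.[:/T102684-1:] '
     '[:T102684-3 coord="162,57,885,18":]Francke[:/T102684-3:]'
     '[:T102684-4 coord="228,5,885,18":]:[:/T102684-4:]')

pattern = re.compile(r'''
    \[:T(?P<tab_id>\d+)-(?P<token_id>\d+)     # opening tag: tab ID and token ID
    \s+coord="(?P<coord>\d+,\d+,\d+,\d+)":\]  # four comma-separated coordinates
    (?P<token>.*?)                            # token text, punctuation included
    \[:/T(?P=tab_id)-(?P=token_id):\]         # closing tag with the same IDs
''', flags=re.VERBOSE | re.DOTALL)

matches = [m.groupdict() for m in pattern.finditer(s)]
for match in matches:
    print(match)
```

The \s+ before coord also covers the line breaks that appear between the tag name and coord in the original string.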