Python dictionary with a tuple and tuple count as the value - python

I have a .csv file containing packet header data from a Wireshark capture, which I am iterating through line by line with a for loop. The file contains around 100,000 rows, many of which are repeated. I am trying to find how many times each destination IP address is accessed using the TCP protocol (6) on each port from 1 to 1024. Essentially, I am trying to create something that looks like this:
{ip address: {(protocol, port): count}}
That way I will know how many times each protocol/port combination used the IP address as a destination. So far I've tried this:
from collections import defaultdict

dst = defaultdict(list)
for pkt in csvfile:
    if pkt.tcpdport > 0 and pkt.tcpdport < 1025:
        tup = (pkt.proto, pkt.tcpdport)
        dst[pkt.ipdst].append(tup)
When I print this out, I get a list of IP addresses with the same (protocol, port) tuple listed multiple times per IP address. How can I instead show each tuple once, followed by a count of how many times it occurs for that IP address?

Currently, the line dst[pkt.ipdst].append(tup) tells Python: get the value associated with the IP address, then append the tuple to it. Because dst is a defaultdict(list), that value is a list, so every occurrence of a tuple is stored again. This is why you see the same tuples listed multiple times per IP address.
To fix this, change the line to dst[pkt.ipdst][tup] += 1. This tells Python to get the dictionary associated with the IP address, get the count associated with the tuple in that dictionary, and add 1 to it. When printed, this gives the counts you want.
For that to work, define dst as defaultdict(lambda: defaultdict(int)). The outer defaultdict creates an inner dictionary for each new IP address, and the inner defaultdict(int) starts each new (protocol, port) key at 0, so neither level raises a KeyError. (With defaultdict(dict) as the inner level, a missing count would default to {}, and {} += 1 raises a TypeError.)
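A minimal, self-contained sketch of the counting approach, assuming each parsed row exposes ipdst, proto and tcpdport attributes as in the question and that proto is stored as the integer 6 for TCP. The Packet namedtuple, the count_tcp_ports helper and the sample rows are hypothetical stand-ins for the parsed CSV. It uses defaultdict(Counter), which behaves like defaultdict(lambda: defaultdict(int)) for this purpose.

```python
from collections import Counter, defaultdict, namedtuple

# Hypothetical stand-in for one parsed CSV row (only the fields used here).
Packet = namedtuple("Packet", ["ipdst", "proto", "tcpdport"])

def count_tcp_ports(packets):
    """Return {ip: {(proto, port): count}} for TCP (protocol 6) ports 1-1024."""
    # The outer defaultdict creates a Counter for each new IP; a Counter
    # treats unseen (proto, port) keys as 0, so += 1 never raises KeyError.
    dst = defaultdict(Counter)
    for pkt in packets:
        # Assumption: proto is an int; convert first if the CSV yields strings.
        if pkt.proto == 6 and 0 < pkt.tcpdport < 1025:
            dst[pkt.ipdst][(pkt.proto, pkt.tcpdport)] += 1
    return dst

packets = [
    Packet("10.0.0.1", 6, 80),
    Packet("10.0.0.1", 6, 80),
    Packet("10.0.0.1", 6, 443),
    Packet("10.0.0.2", 6, 22),
    Packet("10.0.0.2", 17, 53),    # UDP, skipped
    Packet("10.0.0.2", 6, 8080),   # port above 1024, skipped
]

result = count_tcp_ports(packets)
for ip, ports in result.items():
    print(ip, dict(ports))
```

Converting each inner Counter with dict() only affects how it prints; the Counter itself also offers most_common() if you want the busiest ports per IP first.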

Related

How to use Python to grep for IP address in Shorewall rules file

If I use python to make a file with the list of IPs to remove and then give that file to grep as a list of regex rules I can get the result I want, but not with python on its own.
The script downloads Microsoft's JSON feed of rule changes for Office365.
It then acts only on the "remove" changes.
I've tried using re and a simple string comparison; neither has yielded any results or errors.
ips = changes['ips'] if 'ips' in changes else []
ip4s = [ip for ip in ips if '.' in ip]
for ip in ip4s:
    ip_rule = 'net:' + ip
    with open('/etc/shorewall/rules', 'r') as rules_file:
        with open('/tmp/rules', 'w') as tmp_rules_file:
            for line in rules_file:
                if not ip_rule in line:
                    tmp_rules_file.write(line)
The actual script has 3 sections for regex URLs, domains, and IPs.
The first two work, but the IP section does not: it produces no errors and no changes.
What's supposed to happen is that it creates a temporary file without the Shorewall rules that should be deleted.
Then, when I vimdiff the old rules file against the temporary one, I can see what needs to be deleted.
The actual result is that both files are exactly the same.
Further testing outside of Python shows that there are 211 lines that should be deleted.
I'm new to Python, so I assume I've tripped over something and just can't see it.
Instead of using files, let's try this with a small example using lists. You can write for line in file or for line in list, so in the long run you could write a function that accepts any iterable. That means you can write unit tests and show a minimal example when asking for help.
So, let's have our ips
ip4s = ["1.1.1.1", "2.2.2.2"]
and input and output "files":
rules_file = ['net:1.1.1.1', 'net:3.3.3.3']
tmp_rules_file = []
(This is just an example; presumably you can use your regex to get the lines into the format you need.)
Right. So, when we do our loops:
for ip in ip4s:
    ip_rule = 'net:' + ip
    for line in rules_file:
        if not ip_rule in line:
            tmp_rules_file.append(line)
So, for each IP in ip4s, we look at each line of the old rules file, one at a time.
On the first pass, ip_rule is 'net:1.1.1.1', and every line that does not contain it gets written.
In this example, the first line, net:1.1.1.1, does contain the first IP, so it is not written to tmp_rules_file.
However, net:3.3.3.3 doesn't match, so it gets written to tmp_rules_file.
The next IP is 2.2.2.2, which matches neither line, so when you loop over the whole file again, both lines of the rules file get written to the temp file, ending up with:
>>> tmp_rules_file
['net:3.3.3.3', 'net:1.1.1.1', 'net:3.3.3.3']
This shows what's going wrong.
To find things in one list that are not in the other, you can use a list comprehension:
[ip for ip in ip4s if 'net:'+ip not in rules_file]
This just gives ['2.2.2.2'] in this case.
For larger data you might want to use a set and try the set difference operations.
The main part of your issue is that you check one IP at a time against every line of the file: a line that matches one IP won't match the others, so it gets written back out on those other passes.
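A minimal sketch of a single-pass fix, keeping the list-based setup from this answer: each line is checked against all IPs at once with any(), so it is written at most one time. The filter_rules name is hypothetical. Like the original check, it uses a substring test, so 'net:1.1.1.1' would also match a line containing 'net:1.1.1.10'; tighten the test if your rules can contain such addresses.

```python
def filter_rules(lines, ips):
    """Yield only the lines that contain none of the 'net:<ip>' rules."""
    ip_rules = ["net:" + ip for ip in ips]
    for line in lines:
        # Compare the line with every rule before deciding, instead of
        # looping over the whole file once per IP.
        if not any(rule in line for rule in ip_rules):
            yield line

ip4s = ["1.1.1.1", "2.2.2.2"]
rules_file = ["net:1.1.1.1", "net:3.3.3.3"]
tmp_rules_file = list(filter_rules(rules_file, ip4s))
print(tmp_rules_file)  # ['net:3.3.3.3']

# Because filter_rules accepts any iterable, the same function works on files:
# with open('/etc/shorewall/rules') as rules_in, open('/tmp/rules', 'w') as rules_out:
#     rules_out.writelines(filter_rules(rules_in, ip4s))
```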
You can find all IPv4 and IPv6 addresses with the following regexes.
Read the JSON file as a plain string.
Converting the results to a set removes duplicate elements from your list.
import re
json_as_str = """
"version": "2019042900",
"impact": "AddedUrl",
"add": {
....
....
....
]
"""
ip_four = list(set(re.findall(r"(?:\d{1,3}\.){3}\d{1,3}(?:/\d\d?)?", json_as_str)))
ip_six = list(set([x[0] for x in re.findall(r"(([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4})", json_as_str)]))
print("IPv4: %r" % ip_four)
print("IPv6: %r" % ip_six)
print("Number of Ipv4: %s" % len(ip_four))
print("Number of Ipv6: %s" % len(ip_six))
Output:
$ python test.py
IPv4: [... many many items ...]
IPv6: [... many many items ...]
Number of Ipv4: 324
Number of Ipv6: 379
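Note that the IPv6 pattern above also matches a bare ":" (for example the colon in "version": ...), so its count can include non-addresses. An alternative sketch, not part of the original answer, parses the feed with json and validates candidate strings with the standard-library ipaddress module. It walks any nesting of dicts and lists, so it does not assume a particular feed layout; the collect_networks name and the sample document are hypothetical.

```python
import ipaddress
import json

def collect_networks(node, found=None):
    """Recursively collect every string value in parsed JSON that is an IP address or CIDR block."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for value in node.values():
            collect_networks(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_networks(item, found)
    elif isinstance(node, str):
        try:
            # strict=False also accepts plain host addresses (as /32 or /128).
            found.add(ipaddress.ip_network(node, strict=False))
        except ValueError:
            pass  # not an address, e.g. a version string or a URL
    return found

# Hypothetical sample; the real feed's layout may differ.
sample = json.loads("""
{"version": "2019042900",
 "remove": {"ips": ["13.107.6.152/31", "2603:1006::/40", "not-an-ip"]}}
""")

networks = collect_networks(sample)  # the set already removes duplicates
ip_four = sorted(str(n) for n in networks if n.version == 4)
ip_six = sorted(str(n) for n in networks if n.version == 6)
print("IPv4: %r" % ip_four)
print("IPv6: %r" % ip_six)
```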

KeyError when appending to a dictionary of lists

I want to use a dictionary to record data relating to IP addresses. Essentially, an IP address can have a number of groups associated with it, and I need to capture info about the groups relating to that IP address (these are controller groups on a wireless system, so the data all relates to the configuration of access points). I want something like:
{<ip_addr>: [{group_name: my_aps, total_aps: 22, total_active_aps: 12},
{group_name: my-other_aps, total_aps: 15, total_active_aps:14},
{...}
]
}
My script loops through a list of groups (there are 300+) and pulls the info off the wireless controller. With each loop I obtain the details of the new group, but I can't work out how to then add the group dictionary to the list. I am trying this (where group_details is the group dictionary and lms_ip is the address that I want to list it against):
lms_groups[lms_ip].append(group_details)
But I get:
KeyError: 'xxx.xxx.xxx.xxx'
(IP address hidden fwiw)
The script seems to work up to that point; I think the dictionaries are being created OK.
Option 1
dict.setdefault
lms_groups.setdefault(lms_ip, []).append(group_details)
Option 2
collections.defaultdict
from collections import defaultdict
lms_groups = defaultdict(list)
...
lms_groups[lms_ip].append(group_details)
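A small sketch of why the KeyError happens and how both options avoid it, using hypothetical controller IPs and group dictionaries in place of the real data: with a plain dict, lms_groups[lms_ip] is looked up before any list has been stored under that key.

```python
from collections import defaultdict

# Hypothetical sample data: (controller IP, group details) pairs.
rows = [
    ("10.1.1.1", {"group_name": "my_aps", "total_aps": 22, "total_active_aps": 12}),
    ("10.1.1.1", {"group_name": "my-other_aps", "total_aps": 15, "total_active_aps": 14}),
    ("10.1.1.2", {"group_name": "lab_aps", "total_aps": 3, "total_active_aps": 3}),
]

# A plain dict raises KeyError on the first append for a new IP.
plain = {}
try:
    plain["10.1.1.1"].append(rows[0][1])
except KeyError as exc:
    print("KeyError:", exc)

# Option 1: setdefault stores an empty list the first time a key is seen.
lms_groups = {}
for lms_ip, group_details in rows:
    lms_groups.setdefault(lms_ip, []).append(group_details)

# Option 2: defaultdict(list) creates the empty list on any missing-key lookup.
lms_groups_dd = defaultdict(list)
for lms_ip, group_details in rows:
    lms_groups_dd[lms_ip].append(group_details)

print(lms_groups == lms_groups_dd)  # True
```

setdefault builds a new empty list on every call even when the key already exists, which is harmless here; defaultdict avoids that but will also create entries when you merely read a missing key.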
This also avoids the error, and it is a safer way to access the dict: get returns an empty list when lms_ip is not yet a key, and the updated list is stored back under that key.
ip_list = lms_groups.get(lms_ip, [])
ip_list.append(group_details)
lms_groups[lms_ip] = ip_list

How to check an IP address is within a predefined list in python

Suppose I have this list, which contains a number of IP addresses:
IpAddresses = ["192.168.0.1","192.168.0.2","192.168.0.3","192.168.0.4"]
Then, after receiving a packet, I want to check whether its source address is in the predefined list IpAddresses:
data, address = rxsocket.recvfrom(4096)
I have tried two alternatives, but neither worked:
First:
if (address in IpAddresses):
    pass  # do something
Then, I tried to convert address into a string before making the comparison:
str_address = str(address)
if (str_address in IpAddresses):
    pass  # do something
I am not familiar with python syntax, so please could you show me how to do this.
if address[0] in IpAddresses:
For an IPv4 socket, recvfrom() returns the sender's address as a (host, port) tuple. Neither the tuple nor its str() form is in your list of strings; only the host string at index 0 is, so check for that. (Also, you can usually omit the parentheses around an if condition, unless omitting them makes the statement less readable.)
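A minimal sketch of the check, with the (host, port) tuple simulated so it runs without a socket; in real code it would come from data, address = rxsocket.recvfrom(4096). The is_allowed helper is hypothetical, and storing the addresses in a set (rather than a list) is an optional change that makes each membership test constant-time on average.

```python
# The same addresses as the question's IpAddresses, stored in a set.
ALLOWED = {"192.168.0.1", "192.168.0.2", "192.168.0.3", "192.168.0.4"}

def is_allowed(address):
    """address is the tuple returned by recvfrom(); index 0 is the host string."""
    host = address[0]
    return host in ALLOWED

print(is_allowed(("192.168.0.2", 50000)))  # True
print(is_allowed(("10.0.0.7", 50000)))     # False
```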

Referencing range of IP addresses

I am trying to specify a range of addresses that will be used every time an API is called. In the example below, when api is referenced, I would like it to cover every host in the range, stored in a list, and not just one host as it currently does.
api = xmlrpclib.ServerProxy("http://user:pass@192.168.0.1:8442/")
Generating the addresses seems straightforward enough, but I am unsure how to store them so that when api is referenced, it sends to every host, e.g. 192.168.0.1 - 192.168.0.100, and not just one.
for i in range(100):
    ip = "192.168.0.%d" % (i)
    print ip
I would also like to be able to specify the range, e.g. 192.168.0.5 - 192.168.0.50, rather than incrementing from zero.
Update: The API does not handle a list very well, so the solution needs to be able to parse the list. Might this simply require a second for statement?
If you want a different range:
for i in range(5, 51):
    ip = "192.168.0.%d" % (i)
    print ip
I'm not sure what you mean by setting multiple addresses; that for loop already generates each one for you. If you want to keep a reference to an API proxy for each host, you can put those into a list:
import xmlrpclib

api = []
for i in xrange(5, 51):
    ip = "192.168.0.%d" % (i)
    api.append(xmlrpclib.ServerProxy("http://user:pass@" + ip + ":8442/"))
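For Python 3, where xmlrpclib became xmlrpc.client, here is a sketch under the question's assumptions (placeholder user:pass credentials, port 8442). It builds one proxy per host in an inclusive range using the standard ipaddress module, and answers the update with a second loop that calls the same method on every proxy. The helper names and the method name passed to call_all are hypothetical; no timeout is set, so unreachable hosts may block for a while.

```python
import ipaddress
import xmlrpc.client

def build_urls(first, last, port=8442, credentials="user:pass"):
    """Return one XML-RPC URL per host from first to last, inclusive."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    count = int(end) - int(start) + 1
    return ["http://%s@%s:%d/" % (credentials, start + i, port) for i in range(count)]

def build_proxies(first, last, **kwargs):
    # Creating a ServerProxy does not connect; the connection happens on the first call.
    return [xmlrpc.client.ServerProxy(url) for url in build_urls(first, last, **kwargs)]

def call_all(proxies, method_name, *args):
    """Call the same remote method on every proxy; collect results or errors."""
    results = []
    for proxy in proxies:
        try:
            results.append(getattr(proxy, method_name)(*args))
        except (OSError, xmlrpc.client.Error) as exc:
            results.append(exc)
    return results

urls = build_urls("192.168.0.5", "192.168.0.50")
print(len(urls), urls[0], urls[-1])
```

Keeping one proxy per host (rather than one proxy for a list of hosts) matches how ServerProxy works: each instance talks to a single URL.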

How to split the return of socket.gethostbyaddr and write to file?

I have a script that reads addresses from a file and looks up each one's hostname with socket.gethostbyaddr; however, the return value of this function is messy and doesn't look right.
The line where it writes to the destination file reads:
destfile.write(str(socket.gethostbyaddr(ip)))
The results come out like this when it reads 8.8.8.8:
('google-public-dns-a.google.com', [], ['8.8.8.8'])
However, I only need that first output, google-public-dns-a.google.com. I hope to have it write to the file and look like this:
8.8.8.8 resolves to google-public-dns-a.google.com
Anyone know how to split this? Can provide more code if needed.
Well, the first step is to split the one-liner up into multiple lines:
host = socket.gethostbyaddr(ip)
Now, you can do whatever you want to that. If you don't know what you want to do, try printing out host and type(host). You'll find that it's a tuple of 3 elements (although in this case, you could have guessed that from the string written to the file), and you want the first. So:
hostname = host[0]
Or:
hostname, _, addrlist = host
Now, you can write that to the output:
destfile.write('{} resolves to {}'.format(ip, hostname))
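Putting these steps together for a whole list of addresses, here is a sketch that unpacks the triple, writes one line per address (with a trailing newline so lines don't run together), and handles addresses without a reverse DNS entry, which raise socket.herror. The write_hostnames and fake_lookup names are hypothetical; the lookup parameter exists only so the example can run offline with a fake table instead of real DNS.

```python
import io
import socket

def write_hostnames(ips, destfile, lookup=socket.gethostbyaddr):
    """Write one 'IP resolves to HOST' line per address."""
    for ip in ips:
        try:
            hostname, _aliases, _addresses = lookup(ip)
        except (socket.herror, socket.gaierror):
            # No reverse DNS entry, or the address could not be parsed.
            destfile.write("{} does not resolve\n".format(ip))
        else:
            destfile.write("{} resolves to {}\n".format(ip, hostname))

# Hypothetical fake lookup returning the same triple shape as gethostbyaddr.
def fake_lookup(ip):
    table = {"8.8.8.8": ("google-public-dns-a.google.com", [], ["8.8.8.8"])}
    if ip not in table:
        raise socket.herror(1, "Unknown host")
    return table[ip]

out = io.StringIO()
write_hostnames(["8.8.8.8", "192.0.2.1"], out, lookup=fake_lookup)
print(out.getvalue(), end="")
```

With a real file, pass the open file object as destfile and omit lookup to use socket.gethostbyaddr.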
Another way to discover the same information would be to look at the documentation, which says:
Return a triple (hostname, aliaslist, ipaddrlist) where hostname is the primary host name responding to the given ip_address, aliaslist is a (possibly empty) list of alternative host names for the same address, and ipaddrlist is a list of IPv4/v6 addresses for the same interface on the same host (most likely containing only a single address).
Or to use the built-in help in the interpreter:
>>> help(socket.gethostbyaddr)
gethostbyaddr(host) -> (name, aliaslist, addresslist)
Return the true host name, a list of aliases, and a list of IP addresses,
for a host. The host argument is a string giving a host name or IP number.
What you want to do is unpack the tuple holding the information you want. There are multiple ways to do this, but this is what I would do:
(name, _, ip_address_list) = socket.gethostbyaddr(ip)
ip_address = ip_address_list[0]
destfile.write(ip_address + " resolves to " + name)
