Appending to the dictionary dynamically

I am reading from a text file which has the format below:
0.000 ff:dd ff:ff 4 126 48000
0.001 sd:fg er:sd 5 125 67000
0.002 qw:er ff:dd 5 127 90000
0.003 xc:sd ff:dd 5 127 90000
0.004 io:uv gh:ij 4 126 56000
In the fourth column, 4 indicates a request and 5 indicates a response. I need to build a dictionary keyed by the second column for every row that represents a request.
If the fourth column value is 5, the row is a response. In that case, look at the third column of the response; if that value is already a dictionary key, add the response's second column as a value under that key.
In the above example, the desired result is:
{'ff:dd': 1, 2, 48000, qw:er, xc:sd}, {'io:uv': 1, 0, 56000}
For ff:dd, 1 indicates that there is only 1 request from the ff:dd; 2 indicates there are 2 responses to ff:dd and 48000 is the 6th column value of the request corresponding to ff:dd. I hope you understood the question. Please ask for any clarifications.
For io:uv, since there are no responses, 1 indicates number of requests, 0 indicates number of responses and 56000 is the 6th column value for this request.
I am doing all this to analyze network traffic.
I don't know how to add the values dynamically. If there were a fixed number of values I could manage, but this is a tricky situation. I am using Python 2.6. Help is much appreciated. Thanks in advance!

Use a dictionary with keys being strings and the values being a tuple containing (requests, responses, 6th column, [list of response keys])
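A minimal sketch of that idea, under a few assumptions that are mine and not the asker's: the function name summarize is made up, the per-key record is a list rather than a tuple (tuples are immutable, so the counters could not be updated in place), and the 6th column is kept as a list because an address may send more than one request.

```python
def summarize(rows):
    """Map each requester to [requests, responses, 6th-column values, responders]."""
    table = {}
    for _time, src, dst, kind, _col5, col6 in rows:
        if kind == '4':  # request: key on the sender (2nd column)
            entry = table.setdefault(src, [0, 0, [], []])
            entry[0] += 1
            entry[2].append(col6)
        elif kind == '5' and dst in table:  # response to a known requester
            table[dst][1] += 1
            table[dst][3].append(src)
    return table


sample = """\
0.000 ff:dd ff:ff 4 126 48000
0.001 sd:fg er:sd 5 125 67000
0.002 qw:er ff:dd 5 127 90000
0.003 xc:sd ff:dd 5 127 90000
0.004 io:uv gh:ij 4 126 56000
"""
result = summarize(line.split() for line in sample.splitlines())
```

Responses addressed to a key that never sent a request (the sd:fg row here) are silently skipped; whether that is the right behaviour depends on the traffic being analyzed.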

Let's make sure first that we are clear about what a dictionary is and what it can be used for (and hope I don't put my foot in my mouth - I am fairly new to Python myself).
About Dictionaries
In Python, a dict maps single keys to single values. You can read the documentation if you want, but one issue here is that you seem to want to map a single key to multiple values in your desired result:
{'ff:dd': 1, 2, 48000, qw:er, xc:sd}, {'io:uv': 1, 0, 56000}
This shows two dictionaries. Looking at the first dictionary, {'ff:dd': 1, maps the key 'ff:dd' to the value 1 only; the comma says "move on to the next key:value pair". So the rest is interpreted as keys 2 and 48000 mapped to no values (which throws SyntaxError), and undefined names qw and xc mapped to undefined names er and sd (which would throw NameError if you got that far). You probably meant for qw:er and xc:sd to be strings, in which case they would be seen as keys that are not mapped to any value, like the numbers 2 and 48000 before them. You can test this out in the shell:
>>> {'key':'value'} # A dict with one key, that has a value
{'key': 'value'}
>>> {'key'} # Curly braces can make a dict or a set
set(['key'])
>>> {'key1':'val1', 'key2'} # A dict needs to have values for each key
SyntaxError: invalid syntax
>>> {'key1':'val1', 'key2':} # Even an "empty" value has to be explicit
SyntaxError: invalid syntax
If you did for some reason want to define your keys before you know their values, it could be workable to use a zero or empty string, but the "correct" way to do it is probably to use dict.fromkeys() :
>>> {'key1':None, 'key2':'', 'key3':0} # One of these might do the job...
{'key3': 0, 'key2': '', 'key1': None}
>>> dict.fromkeys(['key1','key2', 'key3']) # ...but this is probably better.
{'key3': None, 'key2': None, 'key1': None}
(As a side note, you probably don't want to talk about "appending" to a dictionary. That's something you would do with an ordered list, e.g. by mylist.append(myvalue). There is no append() method for a dictionary; what you do is set a key, and if the key doesn't exist, it is created. "Append" means adding to the end, but dictionaries are unordered so they have no "end" as such.)
Storing Your Data
Now I'm going to make an assumption about what you're trying to do with qw:er, xc:sd, because it's not totally clear from your question. My assumption is that you simply want to have a list of the responses that were sent to 'ff:dd'. You could do other things if you wanted, but with that assumption I'll try to shed some light on how to do something like what you want to do. It looks like your desired result is something like:
traffic = { 'ff:dd': { 'reqfrom':1, 'respto':2, 'reqvals':[48000],
'responders':['qw:er', 'xc:sd'] },
'io:uv': { 'reqfrom':1, 'respto':0, 'reqvals':[56000],
'responders':[] }
}
At the top level of the traffic dictionary, there are two keys: 'ff:dd' and 'io:uv'. The value of each key is another dictionary, so that you can access the number of requests from a key, responses to a key, and other values associated with that particular address, as follows:
>>> traffic['io:uv']['reqfrom'] # How many requests from 'io:uv'?
1
>>> traffic['ff:dd']['responders'] # What are the responses to 'ff:dd'?
['qw:er', 'xc:sd']
So, how do you dynamically store these values? Normally, you would simply assign a value to a key, like mydict['key'] = 'value'. The value of the key will be updated if the key already exists; if not, the key-value pair will be added to the dictionary. But since the values of your first keys are themselves dictionaries, it's a bit trickier.
Try this...
Here's an example of one possible approach, using the above assumed structure. I won't go into too much detail because more experienced Python users can probably show you better ways to do the same thing. Try this code on for size - run it, read it, break it, etc. You should be able to figure out how to adapt it for your purposes.
traffic = {}
packets = (('0.000', 'ff:dd', 'ff:ff', '4', '126', '48000'),
           ('0.001', 'sd:fg', 'er:sd', '5', '125', '67000'),
           ('0.002', 'qw:er', 'ff:dd', '5', '127', '90000'),
           ('0.003', 'xc:sd', 'ff:dd', '5', '127', '90000'),
           ('0.004', 'io:uv', 'gh:ij', '4', '126', '56000'))

def record_packet(packet):
    if packet[3] == '4': # Request
        # Set up the key-value if it doesn't exist
        if packet[1] not in traffic:
            traffic[packet[1]] = {'reqfrom':0,
                                  'respto':0,
                                  'reqvals':[],
                                  'responders':[]
                                  }
        traffic[packet[1]]['reqfrom'] += 1
        traffic[packet[1]]['reqvals'].append(packet[5])
    elif packet[3] == '5': # Response
        # Record the response IFF there has been a request
        if packet[2] in traffic:
            traffic[packet[2]]['respto'] += 1
            traffic[packet[2]]['responders'].append(packet[1])
    else:
        # Handle weirdness here
        pass

for packet in packets:
    record_packet(packet)

for key in traffic:
    for item in traffic[key]:
        print("traffic['{0}']['{1}'] has value: {2}".format(key, item, traffic[key][item]))

Let's try a data structure which makes things easier.
If I understand you correctly, you'd like these addresses to be the keys of your dictionary, and you'd like the data stored under these keys to be:
The number of requests sent from this address
The number of responses received at this address
A list of addresses which sent responses to this address
This would probably best be formatted like so:
{address:[sent, recv, [address, address, ...]], ...}
So, you may read the text file like so:
with open('myfile.txt', 'r') as myfile:
    dct = {}
    for line in myfile:
        splt = line.split()
Now you are iterating over every line in the file and splitting it by the columns. Next, we'd like to determine if it was a request or a response:
if splt[3] == '4': # request
    # we're about to fill this in
elif splt[3] == '5': # response
    # we're about to fill this in
This checks the 4th column for either '4' or '5'. You will run into problems if you have another value there (i.e. you have poorly formatted data).
Now, if we are handling a request, we'd do the following:
if splt[1] not in dct:
    dct[splt[1]] = [1, 0, []]
else:
    dct[splt[1]][0] += 1
This increments the number of requests sent by 1; if the address is not in the dictionary yet, then we add it. We use splt[1] here because we are talking about the sender's address, not the receiver's.
If we are handling a response, we'd act differently, of course. I am assuming here that you will never send a response without a request first being made... but just in case, I've put that case in there, with pass to just ignore it. It's up to you to figure out what you want to do with that...
if splt[2] not in dct:
    pass
else:
    dct[splt[2]][1] += 1
    if splt[1] not in dct[splt[2]][2]:
        dct[splt[2]][2].append(splt[1])
Here, we increment the number of responses received by 1, and add the responder's address to our list. We use splt[2] as the key instead because we are talking about the receiver's address, not the sender's.
This ought to be the crux of what you are trying to accomplish, although I don't understand what the fifth and sixth columns do, so I've omitted them.
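Putting the fragments above together, the whole loop might look roughly like this. It is only a sketch: the function name tally is made up, it takes any iterable of lines instead of opening 'myfile.txt' so it can be tried without a file, and skipping short lines is an added assumption about how to treat malformed input.

```python
def tally(lines):
    """Build {address: [sent, recv, [responders]]} from whitespace-separated rows."""
    dct = {}
    for line in lines:
        splt = line.split()
        if len(splt) < 4:
            continue  # assumption: ignore blank or malformed lines
        if splt[3] == '4':  # request, keyed on the sender
            if splt[1] not in dct:
                dct[splt[1]] = [1, 0, []]
            else:
                dct[splt[1]][0] += 1
        elif splt[3] == '5':  # response, keyed on the receiver
            if splt[2] in dct:
                dct[splt[2]][1] += 1
                if splt[1] not in dct[splt[2]][2]:
                    dct[splt[2]][2].append(splt[1])
    return dct


sample_lines = [
    "0.000 ff:dd ff:ff 4 126 48000",
    "0.001 sd:fg er:sd 5 125 67000",
    "0.002 qw:er ff:dd 5 127 90000",
    "0.003 xc:sd ff:dd 5 127 90000",
    "0.004 io:uv gh:ij 4 126 56000",
]
counts = tally(sample_lines)
```

With a real file, passing the open file object (for example tally(open('myfile.txt'))) should behave the same way, since a file iterates line by line.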

Related

Trouble converting "for key in dict" to == for exact matching

Good morning,
I am having trouble pulling the correct value from my dictionary because there are similar keys. I believe I need to use == instead of in. However, when I change if key in c_item_number_one: to if key == c_item_number_one:, it just falls through to my if not_found: print("Specify Size One") branch, even though I know 12" is in the dictionary.
c_item_number_one = ('12", Pipe,, SA-106 GR. B,, SCH 40, WALL smls'.upper())
print(c_item_number_one)
My formula is as follows:
def item_one_size_one():
    not_found = True
    for key in size_one_dict:
        if key in c_item_number_one:
            item_number_one_size = size_one_dict[key]
            print(item_number_one_size)
            not_found = False
            break
    if not_found:
        print("Specify Size One")

item_one_size_one()
The current result is:
12", PIPE,, SA-106 GR. B,, SCH 40, WALL SMLS
Specify Size One

To split the user input into fields, use re.split
>>> userin
'12", PIPE,, SA-106 GR. B,, SCH 40, WALL SMLS'
>>> import re
>>> fields = re.split(r'[ ,]+', userin)
>>> fields
['12"', 'PIPE', 'SA-106', 'GR.', 'B', 'SCH', '40', 'WALL', 'SMLS']
Then compare the key to the first field, or to all fields:
if key == fields[0]:
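As a rough sketch of how that comparison could replace the loop entirely: once the first field is isolated, a plain dictionary lookup is an exact match by definition. The function name lookup_size and the contents of size_one_dict below are invented for illustration; the real dictionary was not shown in the question.

```python
import re


def lookup_size(description, size_table):
    """Return the entry whose key exactly equals the first field, or None."""
    fields = re.split(r'[ ,]+', description.strip())
    return size_table.get(fields[0])


# Hypothetical table with deliberately similar keys.
size_one_dict = {'2"': 'two inch', '12"': 'twelve inch'}
found = lookup_size('12", PIPE,, SA-106 GR. B,, SCH 40, WALL SMLS', size_one_dict)
missing = lookup_size('3", PIPE', size_one_dict)
```

A result of None can then trigger the "Specify Size One" message.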

There are two usages of the word in here - the first is in the context of a for loop, and the second entirely distinct one is in the context of a comparison.
In the construction of a for loop, the in keyword connects the variable that will be used to hold the values extracted from the loop to the object containing values to be looped over.
e.g.
for x in list:
Meanwhile, the entirely distinct usage of the in keyword can be used to tell python to perform a collection test where the left-hand side item is tested to see whether it exists in the rhs-object's collection.
e.g.
if key in c_item_number_one:
So the meaning of the in keyword is somewhat contextual.
If your code is giving unexpected results then you should be able to replace the if-statement to use an == test, while keeping everything else the same.
e.g.
if key == c_item_number_one:
However, since the contents of c_item_number_one is a tuple, you might only want to test equality for the first item in that tuple - the number 12 for example. You should do this by indexing the element in the tuple for which you want to do the comparison:
if key == c_item_number_one[0]:
Here the [0] is telling python to extract only the first element from the tuple to perform the == test.
[edit] Sorry, your c_item_number_one isn't a tuple, it's a long string. What you need is a way of clearly identifying each item to be looked up, using a unique code or value that the user can enter that will uniquely identify each thing. Doing a string-match like this is always going to throw up problems.
There's potential then for a bit of added nuance, the 1st key in your example tuple is a string of '12'. If the key in your == test is a numeric value of 12 (i.e. an integer) then the test 12 == '12' will return false and you won't extract the value you're after. That your existing in test succeeds currently suggests though that this isn't a problem here, but might be something to be aware of later.
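To make the difference concrete, here is a small sketch using the string from the question. The candidate keys are made up. On strings, in is a substring test, which is why similar keys collide, while == against the whole description never matches a short key.

```python
description = '12", PIPE,, SA-106 GR. B,, SCH 40, WALL SMLS'
candidate_keys = ('2"', '12"', '40')  # hypothetical keys

# Substring test: every key that appears anywhere in the text matches.
substring_hits = [key for key in candidate_keys if key in description]

# Equality test against the full description: nothing matches.
equality_hits = [key for key in candidate_keys if key == description]

# Equality test against just the first comma-separated field: exact match only.
first_field = description.split(',')[0]
first_field_hits = [key for key in candidate_keys if key == first_field]
```

This is why switching to == without first isolating the size field ends up in the not_found branch.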

Getting a dict by name within a list, using a key held in a variable

data = {
"items" : [{"potion" : 1}, {"potion2" : 1}]
}
print(data["items"][0]["potion"])
So, here's the gist. I want to get potion2 without providing an index like [0]. I can't hard-code the index because some variables have 5 items in the list while others might have 3, so a fixed number might not give me what I need. Is there a way to get potion2 without providing that number before it?

I'm assuming you don't want to provide the index because hard coding it will not work in all circumstances.
You can just pull out any items in the list which have that key, building a new list from them. It might ordinarily be just one, but the container itself does not enforce that only one entry can have that key.
After that you can either iterate over the list, or check that the returned list is not empty and just take the first element.
>>> data = {'items': [{'potion': 1}, {'potion2': 1}]}
>>> e = [i for i in data['items'] if 'potion' in i]
>>> for i in e:
... print(i['potion'])
...
1
Or to pull out only the first element. I realize you said no indices, but this index is applied to the filtered list and we check that it's not empty first, so it's a valid thing to do.
>>> if e:
... print(e[0]['potion'])
...
1
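If only the first match is ever needed, a generator expression with next() avoids building the intermediate list. This is a sketch only; the helper name first_with_key is invented here.

```python
def first_with_key(items, key, default=None):
    """Return the first dict in items that contains key, or default if none does."""
    return next((item for item in items if key in item), default)


data = {"items": [{"potion": 1}, {"potion2": 1}]}
match = first_with_key(data["items"], "potion2")
no_match = first_with_key(data["items"], "elixir")
```

Here match is the whole inner dict, so match["potion2"] gives the stored value without any positional index.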

From an (ID, number) pair keep only those pairs that contain the largest number

I am new to Python and I would like some help with a small problem. I have a file in which each line has an ID plus an associated number. More than one number can be associated with the same ID. How can I get only each ID plus the largest number associated with it in Python?
Example:
Input: ID_file.txt
ENSG00000133246 2013
ENSG00000133246 540
ENSG00000133246 2010
ENSG00000253626 465
ENSG00000211829 464
ENSG00000158458 2577
ENSG00000158458 2553
What I want is the following:
ENSG00000133246 2013
ENSG00000253626 465
ENSG00000211829 464
ENSG00000158458 2577
Thanks in advance for any help!

There are probably many ways to do this, but I would use a dictionary:
id_value_dict = {}
for line in open('idfile.txt'):
    id, value = line.strip().split()
    if id not in id_value_dict:
        # first time we see this ID
        id_value_dict[id] = int(value)
    else:
        # keep the larger of the stored and the new value
        if id_value_dict[id] < int(value):
            id_value_dict[id] = int(value)
Next step is to get the dictionary written out
out_ref = open('outputfile.txt', 'w')
for key, value in id_value_dict.items():
    out_ref.write(key + '\t' + str(value) + '\n')
out_ref.close()
There are slicker ways to do this. I think the dictionary could be built in a one-liner using a lambda or a list comprehension, but I like to start simple.
In case you need the results sorted, there are lots of ways to do it. I think it is critical to understand working with lists and dictionaries in Python; I have found that thinking about the right data container is usually the key to solving many of my problems, though I am still new to this. Anyway, if you need sorted results, a straightforward way is to loop over sorted(id_value_dict), which returns a new list of the dictionary's keys in sorted order:
out_ref = open('outputfile.txt', 'w')
for key in sorted(id_value_dict):
    out_ref.write(key + '\t' + str(id_value_dict[key]) + '\n')
out_ref.close()
It is really tricky, because you might want (I know I always want) to write
my_sorted_list = id_value_dict.keys().sort()
However, you will find that my_sorted_list is None, because list.sort() sorts the list in place and returns None. (That is the Python 2 behaviour; in Python 3, keys() returns a view that has no sort() method at all.)

Given that your input consists of nothing but contiguous runs for each ID—that is, as soon as you see another ID, you never see the previous ID again—you can just do this:
import itertools
import operator
with open('ID_file.txt') as idfile, open('max_ID_file.txt', 'w') as maxidfile:
    keyvalpairs = (line.strip().split(None, 1) for line in idfile)
    for key, group in itertools.groupby(keyvalpairs, operator.itemgetter(0)):
        maxval = max(int(keyval[1]) for keyval in group)
        maxidfile.write('{} {}\n'.format(key, maxval))
To see what this does, let's go over it line by line.
A file is just an iterable full of lines, so for line in idfile means exactly what you'd expect. For each line, we're calling strip to get rid of extraneous whitespace, then split(None, 1) to split it on the first space, so we end up with an iterable full of pairs of strings.
Next, we use groupby to change that into an iterable full of (key, group) pairs. Try printing out list(keyvalpairs) to see what it looks like.
Then we iterate over that, and just use max to get the largest value in each group.
And finally, we write the key and the max value for the group to the output file.
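To watch the grouping happen without touching any files, the same steps can be run on an in-memory list. This is just a sketch: the lines list below copies the sample data from the question, and the name maxima is made up.

```python
import itertools
import operator

lines = [
    "ENSG00000133246 2013",
    "ENSG00000133246 540",
    "ENSG00000133246 2010",
    "ENSG00000253626 465",
    "ENSG00000211829 464",
    "ENSG00000158458 2577",
    "ENSG00000158458 2553",
]

# Split each line into an [id, number] pair, lazily.
pairs = (line.split(None, 1) for line in lines)

# groupby yields one (key, group) per contiguous run of equal IDs.
maxima = [
    (key, max(int(value) for _id, value in group))
    for key, group in itertools.groupby(pairs, operator.itemgetter(0))
]
```

Note that groupby only merges adjacent rows, so if the same ID could reappear later in the file, the input would need sorting first or a dictionary-based approach instead.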

get size of populated dictionary

I would like to get the size of a populated dictionary in python. I tried this:
>>> dict = {'1':1}
>>> import sys
>>> print dict
{'1': 1}
>>> sys.getsizeof(dict)
140
but this apparently wouldn't do it. The return value I'd expect is 2 (Bytes). I'll have a dictionary with contents like:
{'L\xa3\x93': '\x15\x015\x02\x00\x00\x00\x01\x02\x02\x04\x1f\x01=\x00\x9d\x00^\x00e\x04\x00\x0b', '\\\xe7\xe6': '\x15\x01=\x02\x00\x00\x00\x01\x02\x02\x04\x1f\x01B\x00\xa1\x00_\x00c\x04\x02\x17', '\\\xe8"': '\x15\xff\x1d\x02\x00\x00\x00\x01\x02\x02\x04\x1f\x01:\x00\x98\x00Z\x00_\x04\x02\x0b', '\\\xe6#': '\x15\x014\x02\x00\x00\x00\x01\x02\x02\x04\x1f\x01#\x00\x9c\x00\\\x00b\x04\x00\x0b'}
and I want to know how many bytes of data I need to send. My index is 6 bytes, but how long is the content? I know here it's 46 bytes per index, so I'd like to find out that I need to transmit 4*(6+46) bytes. What is the best way to do this?
Thanks,
Ron

So, does only this give me the real length when I need to transmit the content Byte by Byte?
from binascii import hexlify
# non_mem_macs is my dictionary
non_mem_macs_len = 0
for idx in non_mem_macs:
    non_mem_macs_len += len(hexlify(idx))
    non_mem_macs_len += len(hexlify(non_mem_macs[idx]))
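One thing worth checking is which length is actually wanted. hexlify doubles the size, because each byte becomes two hex characters, so the loop above counts the hex-encoded size and not the raw byte count. The sketch below shows both; it assumes Python 3 bytes objects, and the function name payload_sizes and the sample dictionary are made up.

```python
from binascii import hexlify


def payload_sizes(table):
    """Return (raw byte count, hex-encoded character count) for all keys and values."""
    raw = sum(len(key) + len(value) for key, value in table.items())
    hexed = sum(len(hexlify(key)) + len(hexlify(value)) for key, value in table.items())
    return raw, hexed


# Hypothetical entries: a 3-byte key and a 23-byte value each.
sample = {
    b'L\xa3\x93': b'\x15' * 23,
    b'\\\xe7\xe6': b'\x01' * 23,
}
raw_len, hex_len = payload_sizes(sample)
```

With 3-byte keys and 23-byte values, the hex-encoded figures come out as 6 and 46 per entry, which matches the numbers quoted in the question.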

using Python to import a CSV (lookup table) and add GPS coordinates to another output CSV

So I have already imported one XML-ish file with 3000 elements and parsed them into a CSV for output. But I also need to import a second CSV file with 'keyword','latitude','longitude' as columns and use it to add the GPS coordinates to additional columns on the first file.
Reading the python tutorial, it seems like {dictionary} is what I need, although I've read on here that tuples might be better. I don't know.
But either way - I start with:
floc = open(r'c:\python\kenya_location_lookup.csv','r')
l = csv.DictReader(floc)
for row in l: print row.keys()
The output look like:
{'LATITUDE': '-1.311467078', 'LONGITUDE': '36.77352011', 'KEYWORD': 'Kianda'}
{'LATITUDE': '-1.315288401', 'LONGITUDE': '36.77614331', 'KEYWORD': 'Soweto'}
{'LATITUDE': '-1.315446430425027', 'LONGITUDE': '36.78170621395111', 'KEYWORD': 'Gatwekera'}
{'LATITUDE': '-1.3136151425171327', 'LONGITUDE': '36.785863637924194', 'KEYWORD': 'Kisumu Ndogo'}
I'm a newbie (and not a programmer). Question is how do I use the keys to pluck out the corresponding row data and match it against words in the body of the element in the other set?

Reading the python tutorial, it seems
like {dictionary} is what I need,
although I've read on here that tuples
might be better. I don't know.
They're both fine choices for this task.
print row.keys() The output look
like:
{'LATITUDE': '-1.311467078',
No it doesn't! This is the output from print row, most definitely NOT print row.keys(). Please don't supply disinformation in your questions, it makes them really hard to answer effectively (being a newbie makes no difference: surely you can check that the output you provide actually comes from the code you also provide!).
I'm a newbie (and not a programmer).
Question is how do I use the keys to
pluck out the corresponding row data
and match it against words in the body
of the element in the other set?
Since you give us absolutely zero information on the structure of "the other set", you make it of course impossible to answer this question. Guessing wildly, if for example the entries in "the other set" are also dicts each with a key of KEYWORD, you want to build an auxiliary dict first, then merge (some of) its entries in the "other set":
l = csv.DictReader(floc)
dloc = dict((d['KEYWORD'], d) for d in l)
for d in otherset:
    d.update(dloc.get(d['KEYWORD'], ()))
This will leave the location missing from the other set when not present in a corresponding keyword entry in the CSV -- if that's a problem you may want to use a "fake location" dictionary as the default for missing entries instead of that () in the last statement I've shown. But, this is all wild speculation anyway, due to the dearth of info in your Q.
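For what it's worth, here is a self-contained sketch of that merge with a "fake location" default. Everything about otherset is guessed, as said above: the title field, the NO_LOCATION name, and the in-memory CSV (two rows copied from the question) are all stand-ins so the snippet runs without the real files.

```python
import csv
import io

# Stand-in for the lookup file, using two rows from the question.
lookup_csv = io.StringIO(
    "KEYWORD,LATITUDE,LONGITUDE\n"
    "Kianda,-1.311467078,36.77352011\n"
    "Soweto,-1.315288401,36.77614331\n"
)
dloc = {row['KEYWORD']: dict(row) for row in csv.DictReader(lookup_csv)}

# Hypothetical default used when a keyword has no entry in the lookup table.
NO_LOCATION = {'LATITUDE': '', 'LONGITUDE': ''}

# Hypothetical "other set": dicts that each carry a KEYWORD.
otherset = [
    {'KEYWORD': 'Soweto', 'title': 'first item'},
    {'KEYWORD': 'Nowhere', 'title': 'second item'},
]
for d in otherset:
    d.update(dloc.get(d['KEYWORD'], NO_LOCATION))
```

With the default in place every entry ends up with LATITUDE and LONGITUDE keys, which keeps the output CSV columns aligned.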

If you dump the DictReader into a list (data = [row for row in csv.DictReader(file)]), and you have unique keywords for each row, convert that list of dictionaries into a dictionary of dictionaries, using that keyword as the key.
>>> data = [row for row in csv.DictReader(open('C:\\my.csv'),
... ('num','time','time2'))]
>>> len(data) # lots of old data :P
1410
>>> data[1].keys()
['time2', 'num', 'time']
>>> keyeddata = {}
>>> for row in data[2:]: # I have some junk rows
... keyeddata[row['num']] = row
...
>>> keyeddata['32']
{'num': '32', 'time2': '8', 'time': '13269'}
Once you have the keyword pulled out, you can iterate through your other list, grab the keyword from it, and use it as the index for the lat/long list. Pull out the lat/long from that index and add it to the other list.

Thanks -
Alex: My code for the other set is working, and the only relevant part is that I have a string that may or may not contain the 'keyword' that is in this dictionary.
Structurally, this is how I organized it:
def main():
    f = open('c:\python\ggce.sms', 'r')
    sensetree = etree.parse(f)
    senses = sensetree.getiterator('SenseMakingItem')
    bodies = sensetree.getiterator('Body')
    stories = []
    for body in bodies:
        fix_body(body)
        storybyte = unicode(body.text)
        storybit = storybyte.encode('ascii','ignore')
        stories.append(storybit)
    rows = [ids,titles,locations,stories]
    out = map(None, *rows)
    print out[120:121]
    write_data(out,'c:\python\output_test.csv')
(I omitted the code for getting ids, titles, locations because they work and will not be used to get the real locations from the data within stories)
Hope this helps.
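Given that each story is a plain string that may or may not contain one of the keywords, one possible approach is to scan the story text for every keyword in the lookup dictionary. The sketch below is an assumption-heavy illustration: the function name locate is invented, matching is a case-insensitive substring test (which may be too loose for short keywords), and the first keyword found wins.

```python
def locate(story, dloc):
    """Return (keyword, latitude, longitude) for the first keyword found in story."""
    text = story.lower()
    for keyword, row in dloc.items():
        if keyword.lower() in text:
            return keyword, row['LATITUDE'], row['LONGITUDE']
    return None, '', ''  # no keyword found: leave the coordinates blank


# Lookup dictionary keyed by KEYWORD, using two rows from the question.
dloc = {
    'Kianda': {'LATITUDE': '-1.311467078', 'LONGITUDE': '36.77352011'},
    'Soweto': {'LATITUDE': '-1.315288401', 'LONGITUDE': '36.77614331'},
}
hit = locate('Flooding reported near Soweto market', dloc)
miss = locate('No place name in this story', dloc)
```

The returned latitude and longitude could then be added as two more lists alongside stories before the rows are written out.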
