Python: cycling/scanning through fields in an object


I have a JSON file named MyFile.json that contains this structure:
[{u'randomName1': {u'A': 16,u'B': 20,u'C': 71},u'randomName2': {u'A': 12,u'B': 17,u'C': 47}},...]
I can open the file and load it like this:
import json
with open('MyFile.json') as data_file:
    data = json.load(data_file)
And I can access the values in the first element like this:
data[0]["randomName1"]["A"]
data[0]["randomName1"]["B"]
data[0]["randomName1"]["C"]
data[0]["randomName2"]["A"]
data[0]["randomName2"]["B"]
data[0]["randomName2"]["C"]
The A, B and C keys are always named A, B and C (and there are always exactly 3 of them), so that's no problem.
The problem is:
1) I don't know how many elements are in the list, and
2) I don't know how many "randomName" keys are in each element, and
3) I don't know the names of the randomName keys.
How do I scan/cycle through the entire file, getting all the elements, and getting all the key names and associated key values for each element?
I don't have the knowledge or desire to write a complicated parsing script of my own. I was expecting that there's a way for the json library to provide this information.
For example (and this is not a perfect analogy I realize) if I am given an array X in AWK, I can scan all the index/name pairs by using
for (index in X) { print index, X[index] }
Is there something like this in Python?
---------------- New info below this line -------------
Thank you Padraic and E.Gordon. That goes a long way toward solving the problem.
In an attempt to make my initial post as concise as possible, I simplified my JSON data example too much.
My JSON data actually looks like this:
data=[
{u'X': {u'randomName1': {u'A': 11,u'B': 12,u'C': 13}, u'randomName2': {u'A': 21,u'B': 22,u'C': 23}, ... }, u'Y': 101, u'Z': 102 },
.
.
.
]
The ellipses represent arbitrary repetition, as described in the original post. The X Y Z keys are always named X Y Z (and there are always exactly 3 of them).
Using your posts as a starting point, I've been working on this for a couple of hours, but being new to Python I'm stumped. I cannot figure out how to add the extra loop to work with that data. I would like the output stream to look something like this:
Z,102,Y,101,randomName1,A,11,B,12,C,13,randomName2,A,21,B,22,C,23,...
.
.
.
Thanks for your help.
----------------- 3/23/16 update below --------------
Again, thanks for the help. Here's what I finally came up with. It does what I need:
import json
with open('MyFile.json') as data_file:
    data = json.load(data_file)
for record in data:
    print record['Z'], record['Y']
    for randomName in record['X']:
        inner = record['X'][randomName]  # randomName is a key string; look up its dict
        print randomName, inner['A'], inner['B'], inner['C']
...
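For the nested X/Y/Z layout described above, the comma-separated output line can be built by walking record['X'] for the randomName entries and reading Y and Z directly. A minimal Python 3 sketch, assuming every record has exactly that shape (flatten_record is a hypothetical helper name, and the sample record mirrors the example data above):

```python
def flatten_record(record):
    """Turn {'X': {name: {'A': .., 'B': .., 'C': ..}, ...}, 'Y': .., 'Z': ..}
    into one comma-separated line: Z,<z>,Y,<y>,name,A,<a>,B,<b>,C,<c>,..."""
    fields = ['Z', str(record['Z']), 'Y', str(record['Y'])]
    for name, inner in record['X'].items():
        fields.append(name)
        for key in ('A', 'B', 'C'):  # the question says these three are always present
            fields.extend([key, str(inner[key])])
    return ','.join(fields)

record = {'X': {'randomName1': {'A': 11, 'B': 12, 'C': 13},
                'randomName2': {'A': 21, 'B': 22, 'C': 23}},
          'Y': 101, 'Z': 102}
print(flatten_record(record))
```

To process the whole file, loop over it: for record in data: print(flatten_record(record)). On Python 3.7+ the randomName entries come out in file order, because json.load builds ordinary dicts, which preserve insertion order.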

You can print the items in the dicts:
js = [{u'randomName1': {u'A': 16,u'B': 20,u'C': 71},u'randomName2': {u'A': 12,u'B': 17,u'C': 47}}]
for dct in js:
    for k, v in dct.items():
        print(k, v)
Which gives you the key/inner dict pairings:
randomName1 {'B': 20, 'A': 16, 'C': 71}
randomName2 {'B': 17, 'A': 12, 'C': 47}
If you want the values from the inner dicts you can add another loop
for dct in js:
    for k1, d in dct.items():
        print(k1)
        for k2, v in d.items():
            print(k2, v)
Which will give you:
randomName1
A 16
B 20
C 71
randomName2
A 12
B 17
C 47
If you have arbitrary levels of nesting we will have to do it recursively.
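A minimal recursive sketch of that idea (Python 3; walk is a hypothetical name). It yields every leaf value together with the chain of dict keys and list indexes that leads to it, whatever the nesting depth:

```python
def walk(obj, path=()):
    """Yield (path, value) for every non-container value in nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk(value, path + (key,))    # descend into dict values
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from walk(value, path + (index,))  # descend into list items
    else:
        yield path, obj                              # a leaf: report it

js = [{'randomName1': {'A': 16, 'B': 20, 'C': 71},
       'randomName2': {'A': 12, 'B': 17, 'C': 47}}]
for path, value in walk(js):
    print(path, value)   # first line: (0, 'randomName1', 'A') 16
```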

You can use the for element in list construct to loop over all the elements in a list, without having to know its length.
The iteritems() dictionary method (Python 2; in Python 3 use items()) provides a convenient way to get the key-value pairs from a dictionary, again without needing to know how many there are or what the keys are called.
For example:
import json
with open('MyFile.json') as data_file:
    data = json.load(data_file)
for element in data:
    for name, values in element.iteritems():
        print("%s has A=%d, B=%d and C=%d" % (name,
                                              values["A"],
                                              values["B"],
                                              values["C"]))

Related

Python - Make a Dictionary with a variable number of keys/values

Okay, the title is a bit confusing, but let me elaborate.
Some methods in Java have a useful thing called varargs that allow for varying amounts of arguments in methods. It looks something like this:
void method(String... args) {
    for (String arg : args) {
        // TODO
    }
}
I am trying to learn Python through a course, and one of the assignments asks me to take a CSV file whose header row contains a varying number of strings, each representing a repeating sequence of DNA in a strand. Here's an example:
name,AGATC,AATG,TATC
Alice,2,8,3
However, they also offer other CSV files that contain different numbers of DNA sequences to check for, like the example below:
name,AGATC,TTTTTTCT,AATG,TCTAG,GATA,TATC,GAAA,TCTG
Jason,15,49,38,5,14,44,14,12
(the numbers equate to how many of the above DNA sequences are repeated in their strand. So Jason has 15 AGATC repetitions in this strand)
I want to use a Dictionary variable to store the name and all their repetitions in it. However, since I don't know in advance how many DNA sequences I'll have to check for, the Dictionary has to be programmed with any number of those sequences in mind. Is there a way to use something similar to Java's varargs in a Python Dictionary?
The output format I want is to convert the group of people and their repetitions inside the DNA database into a List that contains a Dictionary that equates to each person. Because the CSV file can contain a variable number of DNA sequences (as shown above), I want to have each person's Dictionary have their name as their first key, then an additional amount of keys for each DNA Sequence in the CSV file. Here's an example that adheres to the snippet of the CSV file above:
{"name": "Jason", "seq1": 15, "seq2": 49, "seq3": 38, "seq4": 5, "seq5": 14, "seq6": 44, "seq7": 14, "seq8": 12}
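A minimal sketch of building exactly that structure with the standard csv module, assuming the first column is always the name and every other column holds an integer count (read_profiles is a hypothetical helper). It also returns the header's sequence names, since keys like seq1, seq2, ... would otherwise lose track of which sequence each count belongs to:

```python
import csv
import io

def read_profiles(f):
    """Return (people, sequences): one {"name": ..., "seq1": ..., ...} dict per
    data row, plus the header's sequence names so seqN can be mapped back."""
    reader = csv.reader(f)
    header = next(reader)          # e.g. ['name', 'AGATC', 'AATG', 'TATC']
    sequences = header[1:]         # however many sequence columns there are
    people = []
    for row in reader:
        person = {"name": row[0]}
        # enumerate(..., start=1) numbers the count columns seq1, seq2, ... seqN
        for i, count in enumerate(row[1:], start=1):
            person["seq%d" % i] = int(count)
        people.append(person)
    return people, sequences

# Usage with the second sample file held in memory (a real file object works too).
sample = io.StringIO("name,AGATC,TTTTTTCT,AATG,TCTAG,GATA,TATC,GAAA,TCTG\n"
                     "Jason,15,49,38,5,14,44,14,12\n")
people, sequences = read_profiles(sample)
print(people[0])
```

When reading a real file, open it with newline='' as the csv module documentation recommends, and pass that file object instead of sample.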

You can use *args to get a tuple containing all the positional arguments:
def my_seq(*args):
    for arg in args:
        print(arg)

my_seq('a', 'b', 'c', 'd')
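The dictionary counterpart of *args is **kwargs: any keyword arguments not matched by a named parameter are collected into a dict. A small Python 3.5+ sketch (make_person and the counts are illustrative only):

```python
def make_person(name, **counts):
    """Return {"name": name, <sequence>: <count>, ...} for any number of sequences."""
    # **counts gathers every extra keyword argument into a plain dict.
    return {"name": name, **counts}

alice = make_person("Alice", AGATC=2, AATG=8, TATC=3)
print(alice)  # -> {'name': 'Alice', 'AGATC': 2, 'AATG': 8, 'TATC': 3}
```

Because a csv.DictReader row is already a dict with a 'name' key, make_person(**row) also works, though at that point the row itself is the dictionary you were after.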

All Python dictionaries have a variable number of items, since they're mutable, so this is a bit of an XY problem, but to get what you want, you can use a csv.DictReader (as Thierry Lathuille commented).
Let's call your first example example1.csv:
name,AGATC,AATG,TATC
Alice,2,8,3
To read it, you can do something like this:
import csv
with open('example1.csv') as f:
    rows = list(csv.DictReader(f))
print(rows)
# -> [{'name': 'Alice', 'AGATC': '2', 'AATG': '8', 'TATC': '3'}]
The numbers aren't automatically converted to ints, but you could use a dict comprehension:
rows = [
{k: v if k == 'name' else int(v) for k, v in row.items()}
for row in rows
]
print(rows)
# -> [{'name': 'Alice', 'AGATC': 2, 'AATG': 8, 'TATC': 3}]
Note that the DNA sequences themselves will probably be more useful as keys than 'seq1', 'seq2', etc. For example if you read in your other CSV as rows2, you can then do set-like operations on the keys:
>>> alice = rows[0]
>>> jason = rows2[0]
>>> len(alice.keys() - jason.keys()) # How many keys are unique to Alice?
0
>>> jason.keys() - alice.keys() # What keys does Jason have that Alice doesn't?
{'TCTAG', 'GATA', 'TCTG', 'TTTTTTCT', 'GAAA'}
If you want to get really advanced, you can use a Pandas DataFrame. Here's just a short example, because I'm not very familiar with it myself :)
import pandas as pd
files = 'example2.csv', 'example1.csv' # Note the order
dfs = [pd.read_csv(f, index_col="name") for f in files]
df = pd.concat(dfs, sort=False)
df = df.astype('Int64') # allow ints and NaN in the same column
print(df)
Output:
       AGATC  TTTTTTCT  AATG  TCTAG  GATA  TATC  GAAA  TCTG
name
Jason     15        49    38      5    14    44    14    12
Alice      2       NaN     8    NaN   NaN     3   NaN   NaN

Python 3.6 merge dictionaries fails

I am trying to merge two dictionaries. After searching Stack Overflow for a similar question, I found this solution:
mergeDicts = {**dict1, **dict2}
but that doesn't work. I know my counting code is right, because I get correct results for each single dictionary, but once I merge them I don't get the right results:
def readFiles(path1):
    # count words in the .txt files under path1 and return a {word: count} dict
    ...

if __name__ == '__main__':
    a = readFiles('C:/University/learnPy/dir')
    b = readFiles('C:/Users/user/Anaconda3/dir')
    bigdict = {**a, **b}
    print(a['wee'])
    print(b['wee'])
    print(bigdict['wee'])
The directory for a contains one .txt file in which 'wee' appears twice.
The directory for b contains one .txt file in which 'wee' appears once.
So I'd expect bigdict['wee'] to be 3, but what I observe is that bigdict just gets the number from the first dict ({**dict1 (THIS ONE), **dict2}), so the merge is not working.
Question: what went wrong? Why is this failing on Python 3.6 when the answers stated it should work?

{**x, **y} is doing what it's supposed to do: it creates bigdict by copying the first dict and then overwriting the value of every shared key with the value from the second. It never adds values together, so you will need to sum them yourself.
You can use a Counter
from collections import Counter
a = {'wee':1, 'woo':2 }
b = {'wee':10, 'woo': 20 }
bigdict = dict(Counter(a)+Counter(b))
print(bigdict)  # -> {'wee': 11, 'woo': 22}
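If you would rather not depend on Counter, the sum can be written out by hand. One difference worth knowing: Counter's + operator keeps only positive counts, whereas the loop below keeps every key. A minimal sketch (merge_counts is a hypothetical name) that also shows why {**a, **b} alone gives the wrong total:

```python
def merge_counts(*dicts):
    """Sum {word: count} dicts key by key; a key missing from a dict counts as 0."""
    total = {}
    for d in dicts:
        for word, count in d.items():
            total[word] = total.get(word, 0) + count
    return total

a = {'wee': 2}              # counts from the first directory, as in the question
b = {'wee': 1}              # counts from the second directory
print({**a, **b})           # -> {'wee': 1}  (b's value replaces a's)
print(merge_counts(a, b))   # -> {'wee': 3}
```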

Shortest path algorithm using dictionaries [Python]

This is my first question, and actually my first time trying this, but I read the rules for questions and I hope my question complies with all of them.
I have a project for my algorithms course: to design a GUI for Dijkstra's shortest path algorithm. I chose Python because it is a language I would like to master. I have been trying for more than a week and have run into trouble all the way, but this is good fun :)!
I chose to represent my directed graph as a dictionary in this way :
g = {'A': {"B": 20, 'D': 80, 'G': 90},  # A can direct to B, D and G
     'B': {'F': 10},
     'F': {'C': 10, 'D': 40},
     'C': {'D': 10, 'H': 20, 'F': 50},
     'D': {'G': 20},
     'G': {'A': 20},
     'E': {'G': 30, 'B': 50},
     'H': None}  # H is not directed to anything, but can be accessed through C
So the key is the vertex, and the value holds the linked vertices and their weights. This is an example graph, but I was planning to ask the user to input their own graph details and examine the shortest path between each two nodes [start -> end]. The problem, however, is that I don't even know how to access the inner dictionary so I can work on the inner parameters. I tried many ways, like these two:
for i in g:
    counter = 0
    print g[i[counter]]  # One
    print g.get(i[counter])  # Two
but both give me the same output, which is the following (note that I can't really access and play with the inner parameters):
{"B": 20, 'D': 80, 'G' :90}
{'F' : 10}
{'C':10,'D':40}
{'D':10,'H':20,'F':50}
{'G':20}
{'A':20}
{'G':30,'B':50}
None
So my question is, could you please help me with how to access the inner dictionaries so I can start working on the algorithm itself. Thanks a lot in advance and thanks for reading.

This is actually not so hard, and should make complete sense once you see it. Let's take your g. We want to get the weight of the 'B' connection from the 'A' node:
>>> d = g['A']
>>> d
{"B": 20, 'D': 80, 'G' :90}
>>> d['B']
20
>>> g['A']['B']
20
Using g['A'] gets us the value of the key in dictionary g. We can act directly on this value by referring to the 'B' key.

Using a for loop will iterate over the keys of a dictionary, and by using the key, you can fetch the value that is associated to the key. If the value itself is a dictionary, you can use another loop.
for fromNode in g:
    neighbors = g[fromNode]
    for toNode in neighbors:
        distance = neighbors[toNode]
        print("%s -> %s (%d)" % (fromNode, toNode, distance))
Note that for this to work, you should use an empty dictionary {} instead of None when there are no neighbors.
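With the inner dictionaries accessible, the algorithm itself is short. A minimal Python 3 sketch, using heapq from the standard library as the priority queue; it assumes non-negative weights (which Dijkstra's algorithm requires) and replaces None with {} for 'H' as suggested above. dijkstra is a hypothetical helper name, and carrying the whole path in each heap entry is a simplicity choice rather than the most memory-efficient one:

```python
import heapq

# The question's graph, with 'H': {} instead of None.
g = {'A': {'B': 20, 'D': 80, 'G': 90},
     'B': {'F': 10},
     'F': {'C': 10, 'D': 40},
     'C': {'D': 10, 'H': 20, 'F': 50},
     'D': {'G': 20},
     'G': {'A': 20},
     'E': {'G': 30, 'B': 50},
     'H': {}}

def dijkstra(graph, start, end):
    """Return (distance, path) of a cheapest start -> end route,
    or (float('inf'), []) if end cannot be reached."""
    queue = [(0, start, [start])]   # heap of (distance so far, node, path)
    visited = set()                 # nodes whose shortest distance is final
    while queue:
        dist, node, path = heapq.heappop(queue)   # closest unsettled entry
        if node in visited:
            continue                # stale entry: node was already settled more cheaply
        visited.add(node)
        if node == end:
            return dist, path
        # graph.get(..., {}) tolerates nodes that only appear as neighbors
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return float('inf'), []

print(dijkstra(g, 'A', 'D'))  # -> (50, ['A', 'B', 'F', 'C', 'D'])
```

For a GUI you would call this once per [start -> end] pair the user picks; keeping a predecessor map instead of per-entry paths is the usual refinement for large graphs.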

I guess these give you some ideas:
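for node, neighbors in g.items():
    print node, (neighbors or {}).get("B", "")
for node, neighbors in g.items():
    print node, (neighbors or {}).keys()  # or .values()
for node, neighbors in g.items():
    if neighbors and "B" in neighbors:
        print node, neighbors["B"]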

Appending to the dictionary dynamically

I am reading from a text file which has the format below:
0.000 ff:dd ff:ff 4 126 48000
0.001 sd:fg er:sd 5 125 67000
0.002 qw:er ff:dd 5 127 90000
0.003 xc:sd ff:dd 5 127 90000
0.004 io:uv gh:ij 4 126 56000
In the fourth column, 4 indicates a request and 5 indicates a response. I should form a dictionary with the second column as the key if that row represents a request.
If the fourth column value is 5, the row corresponds to a response. In this case, look at the 3rd column of that response, and if that value is already a dictionary key, add the 2nd column as a value of that key.
In the above example, the desired result is:
{'ff:dd': 1, 2, 48000, qw:er, xc:sd}, {'io:uv': 1, 0, 56000}
For ff:dd, 1 indicates that there is only 1 request from the ff:dd; 2 indicates there are 2 responses to ff:dd and 48000 is the 6th column value of the request corresponding to ff:dd. I hope you understood the question. Please ask for any clarifications.
For io:uv, since there are no responses, 1 indicates number of requests, 0 indicates number of responses and 56000 is the 6th column value for this request.
I am doing all this to analyze network traffic.
I don't know how to add the values dynamically. If there were a fixed number of values I could manage, but this is a tricky situation. I am using Python 2.6. Help is much appreciated. Thanks in advance!

Use a dictionary with keys being strings and the values holding (requests, responses, 6th column, [list of response keys]). Store them in a list rather than a tuple if you need to update the counts in place, since tuples are immutable.

Let's make sure first that we are clear about what a dictionary is and what it can be used for (and hope I don't put my foot in my mouth - I am fairly new to Python myself).
About Dictionaries
In Python, a dict maps single keys to single values. You can read the documentation if you want, but one issue here is that you seem to want to map a single key to multiple values in your desired result:
{'ff:dd': 1, 2, 48000, qw:er, xc:sd}, {'io:uv': 1, 0, 56000}
This shows two dictionaries. Looking at the first dictionary, {'ff:dd': 1, maps the key 'ff:dd' to the value 1 only; the comma says, move on to the next key:value pair. So the rest is interpreted as keys 2 and 48000 mapped to no values (throws SyntaxError), and undefined names qw and xc mapped to undefined names er and sd (would throw NameError if you got that far). You probably meant for qw:er and xc:sd to be strings, in which case they would be seen as keys that are not mapped to any value, like the numbers 2 and 48000 before them. You can test this out in the shell:
>>> {'key':'value'} # A dict with one key, that has a value
{'key': 'value'}
>>> {'key'} # Curly braces can make a dict or a set
set(['key'])
>>> {'key1':'val1', 'key2'} # A dict needs to have values for each key
SyntaxError: invalid syntax
>>> {'key1':'val1', 'key2':} # Even an "empty" value has to be explicit
SyntaxError: invalid syntax
If you did for some reason want to define your keys before you know their values, it could be workable to use a zero or empty string, but the "correct" way to do it is probably to use dict.fromkeys() :
>>> {'key1':None, 'key2':'', 'key3':0} # One of these might do the job...
{'key3': 0, 'key2': '', 'key1': None}
>>> dict.fromkeys(['key1','key2', 'key3']) # ...but this is probably better.
{'key3': None, 'key2': None, 'key1': None}
(As a side note, you probably don't want to talk about "appending" to a dictionary. That's something you would do with an ordered list, e.g. by mylist.append(myvalue). There is no append() method for a dictionary; what you do is set a key, and if the key doesn't exist, it is created. "Append" means adding to the end, but dictionaries in Python 2 are unordered, so they have no "end" as such. Since Python 3.7 dicts do preserve insertion order, but there is still no append().)
Storing Your Data
Now I'm going to make an assumption about what you're trying to do with qw:er, xc:sd, because it's not totally clear from your question. My assumption is that you simply want to have a list of the responses that were sent to 'ff:dd'. If you wanted, you could do other things, but with that assumption I'll try to shed some light on how to do something like what you want to do. It looks like your desired result is something like:
traffic = { 'ff:dd': { 'reqfrom':1, 'respto':2, 'reqvals':[48000],
'responders':['qw:er', 'xc:sd'] },
'io:uv': { 'reqfrom':1, 'respto':0, 'reqvals':[56000],
'responders':[] }
}
At the top level of the traffic dictionary, there are two keys: 'ff:dd' and 'io:uv'. The value of each key is another dictionary, so that you can access the number of requests from a key, responses to a key, and other values associated with that particular address, as follows:
>>> traffic['io:uv']['reqfrom'] # How many requests from 'io:uv'?
1
>>> traffic['ff:dd']['responders'] # What are the responses to 'ff:dd'?
['qw:er', 'xc:sd']
So, how do you dynamically store these values? Normally, you would simply assign a value to a key, like mydict['key'] = 'value'. The value of the key will be updated if the key already exists; if not, the key-value pair will be added to the dictionary. But since the values of your first keys are themselves dictionaries, it's a bit trickier.
Try this...
Here's an example of one possible approach, using the above assumed structure. I won't go into too much detail because more experienced Python users can probably show you better ways to do the same thing. Try this code on for size - run it, read it, break it, etc. You should be able to figure out how to adapt it for your purposes.
traffic = {}
packets = (('0.000', 'ff:dd', 'ff:ff', '4', '126', '48000'),
           ('0.001', 'sd:fg', 'er:sd', '5', '125', '67000'),
           ('0.002', 'qw:er', 'ff:dd', '5', '127', '90000'),
           ('0.003', 'xc:sd', 'ff:dd', '5', '127', '90000'),
           ('0.004', 'io:uv', 'gh:ij', '4', '126', '56000'))

def record_packet(packet):
    if packet[3] == '4':  # Request
        # Set up the key-value if it doesn't exist
        if packet[1] not in traffic:
            traffic[packet[1]] = {'reqfrom': 0,
                                  'respto': 0,
                                  'reqvals': [],
                                  'responders': []
                                  }
        traffic[packet[1]]['reqfrom'] += 1
        traffic[packet[1]]['reqvals'].append(packet[5])
    elif packet[3] == '5':  # Response
        # Record the response IFF there has been a request
        if packet[2] in traffic:
            traffic[packet[2]]['respto'] += 1
            traffic[packet[2]]['responders'].append(packet[1])
    else:
        # Handle weirdness here
        pass

for packet in packets:
    record_packet(packet)
for key in traffic:
    for item in traffic[key]:
        print "traffic['{0}']['{1}'] has value: {2}".format(key, item, traffic[key][item])
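The "set up the key-value if it doesn't exist" step can also be collapsed with dict.setdefault, which returns the existing value for a key, or inserts the supplied default and returns that. A small sketch of just the request branch, assuming the same traffic layout (record_request is a hypothetical name); it contains nothing newer than Python 2.6:

```python
traffic = {}

def record_request(address, value):
    # setdefault returns traffic[address], first inserting a fresh
    # per-address dict if this address has not been seen yet.
    entry = traffic.setdefault(address, {'reqfrom': 0, 'respto': 0,
                                         'reqvals': [], 'responders': []})
    entry['reqfrom'] += 1
    entry['reqvals'].append(value)

for address, value in [('ff:dd', '48000'), ('io:uv', '56000')]:
    record_request(address, value)
```

The default dict is built on every call even when the key already exists; that is harmless here, and collections.defaultdict is the usual alternative when it matters.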

Let's try a data structure which makes things easier.
If I understand you correctly, you'd like these addresses to be the keys of your dictionary, and you'd like the data stored under these keys to be:
The number of requests sent from this address
The number of responses received at this address
A list of addresses which sent responses to this address
This would probably best be formatted like so:
{address:[sent, recv, [address, address, ...]], ...}
So, you may read the text file like so:
with open('myfile.txt', 'r') as myfile:
    dct = {}
    for line in myfile:
        splt = line.split()
Now you are iterating over every line in the file and splitting it by the columns. Next, we'd like to determine if it was a request or a response:
if splt[3] == '4':  # request
    pass  # we're about to fill this in
elif splt[3] == '5':  # response
    pass  # we're about to fill this in
This checks the 4th column for either '4' or '5'. You will run into problems if you have another value there (i.e. you have poorly formatted data).
Now, if we are handling a request, we'd do the following:
if splt[1] not in dct:
    dct[splt[1]] = [1, 0, []]
else:
    dct[splt[1]][0] += 1
This increments the number of requests sent by 1; if the address is not in the dictionary yet, then we add it. We use splt[1] here because we are talking about the sender's address, not the receiver's.
If we are handling a response, we'd act differently, of course. I am assuming here that you will never send a response without a request first being made... but just in case, I've put that case in there, with pass to just ignore it. It's up to you to figure out what you want to do with that...
if splt[2] not in dct:
    pass
else:
    dct[splt[2]][1] += 1
    if splt[1] not in dct[splt[2]][2]:
        dct[splt[2]][2].append(splt[1])
Here, we increment the number of responses received by 1, and add the responder's address to our list. We use splt[2] as the key instead because we are talking about the receiver's address, not the sender's.
This ought to be the crux of what you are trying to accomplish, although I don't understand what the fifth and sixth columns do, so I've omitted them.
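Put together, the fragments above form a single loop. A minimal self-contained sketch (summarize is a hypothetical name) that accepts any iterable of lines, such as an open file. As an addition to the structure above, it also records each request's 6th-column value in a fourth list element, since the question asked for that value:

```python
def summarize(lines):
    """Return {address: [requests_sent, responses_received, [responders], [col6 values]]}."""
    dct = {}
    for line in lines:
        splt = line.split()
        if len(splt) < 6:
            continue                       # skip blank or truncated lines
        if splt[3] == '4':                 # request: key on the sender (2nd column)
            if splt[1] not in dct:
                dct[splt[1]] = [0, 0, [], []]
            dct[splt[1]][0] += 1
            dct[splt[1]][3].append(int(splt[5]))
        elif splt[3] == '5':               # response: key on the receiver (3rd column)
            if splt[2] in dct:             # ignore responses with no recorded request
                dct[splt[2]][1] += 1
                if splt[1] not in dct[splt[2]][2]:
                    dct[splt[2]][2].append(splt[1])
    return dct

sample = """0.000 ff:dd ff:ff 4 126 48000
0.001 sd:fg er:sd 5 125 67000
0.002 qw:er ff:dd 5 127 90000
0.003 xc:sd ff:dd 5 127 90000
0.004 io:uv gh:ij 4 126 56000""".splitlines()
print(summarize(sample))
# -> {'ff:dd': [1, 2, ['qw:er', 'xc:sd'], [48000]], 'io:uv': [1, 0, [], [56000]]}
```

With a real file: with open('myfile.txt') as f: result = summarize(f).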

check dictionary for doubled keys [duplicate]

Closed 10 years ago.
Possible Duplicate:
How to raise error if duplicates keys in dictionary
I was recently generating huge dictionaries with hundreds of thousands of keys (such that noticing a bug by looking at them wasn't feasible). They were syntactically correct, yet there was a bug somewhere. It boiled down to "duplicate keys":
{'a':1, ..., 'a':2}
This code compiles fine, and I could not figure out why a key had the value 2 when I expected 1. The problem is obvious now.
The question is how I can prevent that in the future. I think this is impossible within python. I used
grep "'.*'[ ]*:" myfile.py | sort | uniq -c | grep -v 1
which is not bulletproof. Any other ideas (within python, this grep is just to illustrate what I'd tried)?
EDIT: I don't want duplicate keys, just need to spot that this occurs and edit data manually
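One within-Python alternative to the grep above is to parse the generated file with the standard ast module and inspect every dict literal, which catches duplicates regardless of spacing or quoting. A minimal sketch, assuming Python 3.8+ (where literal keys parse as ast.Constant) and that only literal keys such as strings and numbers need checking; find_duplicate_keys is a hypothetical name:

```python
import ast
from collections import Counter

def find_duplicate_keys(source):
    """Return (line number, key) for each literal key repeated inside one dict display."""
    dups = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            # Keys that are not plain literals (e.g. None for **unpacking,
            # or variable names) cannot be compared statically, so skip them.
            keys = [k.value for k in node.keys if isinstance(k, ast.Constant)]
            for key, count in Counter(keys).items():
                if count > 1:
                    dups.append((node.lineno, key))
    return dups

print(find_duplicate_keys("d = {'a': 1, 'b': 2, 'a': 3}"))  # -> [(1, 'a')]
```

For the case in the question: with open('myfile.py') as f: print(find_duplicate_keys(f.read())). The reported line number is where the dict display starts.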

A dict cannot contain double keys. So all you need to do is execute the code and then dump the repr() of the dict.
Another option is creating the dict items as (key, value) tuples. By storing them in a list you can easily create a dict from them and then check if the len()s of the dict/list differ.
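A minimal sketch of that length check (duplicate_keys is a hypothetical name), which also reports which keys collided:

```python
from collections import Counter

def duplicate_keys(pairs):
    """Return the keys that occur more than once in a list of (key, value) pairs."""
    d = dict(pairs)              # later pairs silently overwrite earlier ones
    if len(d) == len(pairs):
        return []                # nothing was overwritten, so no duplicates
    counts = Counter(key for key, _ in pairs)
    return [key for key, n in counts.items() if n > 1]

pairs = [('a', 1), ('b', 2), ('a', 2)]
print(duplicate_keys(pairs))  # -> ['a']
```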

If you need to have multiple values per key you can store the values in a list using defaultdict.
>>> from collections import defaultdict
>>> data_dict = defaultdict(list)
>>> data_dict['key'].append('value')
>>> data_dict
defaultdict(<type 'list'>, {'key': ['value']})
>>> data_dict['key'].append('second_value')
>>> data_dict
defaultdict(<type 'list'>, {'key': ['value', 'second_value']})

Are you generating a Python file containing a giant dictionary? Something like:
print "{"
for line in file:
    key, _, value = line.rstrip("\n").partition(" ")
    print " '%s': '%s'," % (key, value)
print "}"
If so, there's not much you can do to prevent this, as you cannot easily override the construction of the builtin dict.
Instead I'd suggest you validate the data while constructing the dictionary string. You could also generate different syntax:
dict(a = '1', a = '2')
..which will generate a SyntaxError if the key is duplicated. However, these are not exactly equivalent, as dictionary keys are a lot more flexible than keyword arguments (e.g. {123: '...'} is valid, but dict(123 = '...') is an error).
You could generate a function call like:
uniq_dict([('a', '...'), ('a', '...')])
Then include the function definition:
def uniq_dict(values):
    thedict = {}
    for k, v in values:
        if k in thedict:
            raise ValueError("Duplicate key %s" % k)
        thedict[k] = v
    return thedict

You don't say or show exactly how you're generating the dictionary display you have where the duplicate keys are appearing. But that is where the problem lies.
Instead of using something like {'a':1, ..., 'a':2} to construct the dictionary, I suggest that you use this form: dict([['a', 1], ..., ['a', 2]]) which will create one from a supplied list of [key, value] pairs. This approach will allow you to check the list of pairs for duplicates before passing it to dict() to do the actual construction of the dictionary.
Here's an example of one way to check the list of pairs for duplicates:
sample = [['a', 1], ['b', 2], ['c', 3], ['a', 2]]
def validate(pairs):
    # check for duplicate key names and raise an exception if any are found
    dups = []
    seen = set()
    for key_name, val in pairs:
        if key_name in seen:
            dups.append(key_name)
        else:
            seen.add(key_name)
    if dups:
        raise ValueError('Duplicate key names encountered: %r' % sorted(dups))
    else:
        return pairs

my_dict = dict(validate(sample))
