Multiple FOR loops in iterating over dictionary in Python - python

This is a simplified example of a dictionary created by json.load that I have to deal with:
{
    "name": "USGS REST Services Query",
    "queryInfo": {
        "timePeriod": "PT4H",
        "format": "json",
        "data": {
            "sites": [{
                "id": "03198000",
                "params": "[00060, 00065]"
            },
            {
                "id": "03195000",
                "params": "[00060, 00065]"
            }]
        }
    }
}
Sometimes there may be 15-100 sites with unknown sets of parameters at each site. My goal is to either create two lists (one storing "site" IDs and the other storing "params") or a much simplified dictionary from this original dictionary. Is there a way to do this using nested for loops with key, value pairs using the iteritems() method?
What I have tried so far is this:
queryDict = {}
for key,value in WS_Req_dict.iteritems():
    if key == "queryInfo":
        if value == "data":
            for key, value in WS_Req_dict[key][value].iteritems():
                if key == "sites":
                    siteVal = key
                    if value == "params":
                        paramList = [value]
queryDict["sites"] = siteVal
queryDict["sites"]["params"] = paramList
I run into trouble getting the second FOR loop to work. I haven't looked into pulling out lists yet.
I think this may be an overall stupid way of doing it, but I can't see around it yet.

I think you can make your code much simpler by just indexing, when feasible, rather than looping over iteritems.
for site in WS_Req_dict['queryInfo']['data']['sites']:
    queryDict[site['id']] = site['params']
If some of the keys might be missing, dict's get method is your friend:
for site in WS_Req_dict.get('queryInfo',{}).get('data',{}).get('sites',[]):
would let you quietly ignore missing keys. But, this is much less readable, so, if I needed it, I'd encapsulate it into a function -- and often you may not need this level of precaution! (Another good alternative is a try/except KeyError encapsulation to ignore missing keys, if they are indeed possible in your specific use case).
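As a rough sketch of the "encapsulate it into a function" idea, the chained get calls could live in one small helper. The name iter_sites and the sample data are made up for illustration, and the code is written for Python 3:

```python
def iter_sites(request):
    """Return the list of site dicts, or an empty list if any level is missing."""
    # Hypothetical helper: each .get() falls back to an empty container.
    return request.get("queryInfo", {}).get("data", {}).get("sites", [])


# Sample data shaped like the dictionary in the question.
ws_req_dict = {
    "name": "USGS REST Services Query",
    "queryInfo": {
        "data": {
            "sites": [
                {"id": "03198000", "params": "[00060, 00065]"},
                {"id": "03195000", "params": "[00060, 00065]"},
            ]
        }
    },
}

# Simplified dictionary: site id -> params.
query_dict = {site["id"]: site["params"] for site in iter_sites(ws_req_dict)}

# The two-list variant mentioned in the question.
site_ids = [site["id"] for site in iter_sites(ws_req_dict)]
params = [site["params"] for site in iter_sites(ws_req_dict)]

# A document without "queryInfo" simply yields no sites.
empty = list(iter_sites({"name": "no query info"}))
```

Keeping the fallback logic in one place means the loops that use it stay as readable as the plain indexing version.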

Related

Count unique values in a JSON

I have a json called thefile.json which looks like this:
{
    "domain": "Something",
    "domain": "Thingie",
    "name": "Another",
    "description": "Thing"
}
I am trying to write a python script which would make a set of the values in domain. In this example it would return
{'Something', 'Thingie'}
Here is what I tried:
import json
with open("thefile.json") as my_file:
    data = json.load(my_file)
    ids = set(item["domain"] for item in data.values())
print(ids)
I get the error message
unique_ids.add(item["domain"])
TypeError: string indices must be integers
Having looked up answers on stack exchange, I'm stumped. Why can't I have a string as an index, seeing as I am using a json whose data type is a dictionary (I think!)? How do I get it so that I can get the values for "domain"?
So, to start, you can read more about JSON formats here: https://www.w3schools.com/python/python_json.asp
Second, dictionaries must have unique keys. Therefore, having two keys named domain is incorrect. You can read more about python dictionaries here: https://www.w3schools.com/python/python_dictionaries.asp
Now, I recommend the following two designs that should do what you need:
Multiple Names, Multiple Domains: In this design, you can access websites and check the domain of each of its values like ids = set(item["domain"] for item in data["websites"])
{
    "websites": [
        {
            "domain": "Something.com",
            "name": "Something",
            "description": "A thing!"
        },
        {
            "domain": "Thingie.com",
            "name": "Thingie",
            "description": "A thingie!"
        }
    ]
}
One Name, Multiple Domains: In this design, each website has multiple domains that can be accessed using JVM_Domains = set(data["domains"])
{
    "domains": ["Something.com","Thingie.com","Stuff.com"],
    "name": "Me Domains",
    "description": "A list of domains belonging to Me"
}
I hope this helps. Let me know if I missed any details.
You have a problem in your JSON: duplicate keys. I am not sure whether they are forbidden, but they are certainly bad form.
Besides that, they are going to cause you a lot of problems.
A dictionary cannot have duplicate keys; what would a lookup on a duplicated key return?
So, fix your JSON, something like this:
{
    "domain": ["Something", "Thingie"],
    "name": "Another",
    "description": "Thing"
}
Guess what, a good format almost solves your problem (you can still have duplicates in the list) :)
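If the file cannot be changed, one possible workaround is the object_pairs_hook parameter of json.loads / json.load, which receives every key/value pair before duplicates are collapsed. The sketch below is only an illustration: the helper name collect_duplicates is made up, and the JSON is inlined as a string instead of being read from thefile.json. It also shows why the original attempt fails: data.values() yields plain strings, and indexing a string with "domain" raises the TypeError.

```python
import json
from collections import defaultdict

raw = """
{
    "domain": "Something",
    "domain": "Thingie",
    "name": "Another",
    "description": "Thing"
}
"""

# Default behaviour: only the last value for a repeated key survives.
last_only = json.loads(raw)["domain"]


def collect_duplicates(pairs):
    """Hypothetical hook: gather the values of each key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


data = json.loads(raw, object_pairs_hook=collect_duplicates)
domains = set(data["domain"])
print(domains)
```

Note that with this hook every value becomes a list, including single ones such as data["name"].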

How to collect specific values in a deeply nested structure with Python

I'm trying to get a list of instance IDs that I get from the describe_instances call using the boto3 API in my Python script. For those of you who aren't familiar with AWS, I can post more detailed code after removing the specifics if you need it. I'm trying to access an item from a structure like this
u'Reservations':[
    {
        u'Instances':[
            {
                u'InstanceId':'i-0000ffffdd'
            },
            { }, ### each of these dicts contains an id like above
            { },
            { },
            { }
        ]
    },
    {
        u'Instances':[
            { },
            { },
            { },
            { },
            { }
        ]
    },
    {
        u'Instances':[
            { }
        ]
    }
]
I'm currently accessing it like
instanceLdict = []
instanceList = []
instances = []
for r in reservations:
    instanceList.append(r['Instances'])
for ilist in instanceList:
    for i in ilist:
        instanceLdict.append(i)
for i in instanceLdict:
    instances.append(i['InstanceId']) ####i need them in a list
print instances
fyi: my reservations variable contains the whole list of u'Reservations':
I feel this is inefficient and since I'm a python newbie I really think there must be some better way to do this rather than the multiple for and if. Is there a better way to do this? Kindly point to the structure/method etc., that might be useful in my scenario
Your solution is not actually that inefficient, except that you don't really have to create all those top-level lists just to save the instance IDs in the end. What you could do is use a nested loop and keep only what you need:
instances = list()
for r in reservations:
    # r['Instances'] is already a list of instance dicts
    for i in r['Instances']:
        instances.append(i['InstanceId'])  # That's what you are looping for
Yes, there are ways to do this with shorter code, but explicit is better than implicit, so stick to what you can read best. Python is quite good with iterations, and remember: maintainability first, performance second. Also, this part is hardly the bottleneck of what you are doing after all those API calls, DB lookups, etc.
But if you really insist on a fancy one-liner, have a look at the itertools helpers; chain.from_iterable() is what you need:
from itertools import chain
instances = [i['InstanceId'] for i in chain.from_iterable(r['Instances'] for r in reservations)]
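To try both variants without calling AWS, here is a small self-contained sketch. The reservations list and its instance IDs are invented stand-ins for the real describe_instances response:

```python
from itertools import chain

# Made-up stand-in for the 'Reservations' list.
reservations = [
    {"Instances": [{"InstanceId": "i-aaa"}, {"InstanceId": "i-bbb"}]},
    {"Instances": [{"InstanceId": "i-ccc"}]},
    {"Instances": []},
]

# Nested comprehension: same order as the explicit nested loop.
nested = [i["InstanceId"] for r in reservations for i in r["Instances"]]

# chain.from_iterable flattens the per-reservation lists lazily.
chained = [
    i["InstanceId"]
    for i in chain.from_iterable(r["Instances"] for r in reservations)
]

print(nested)
```

Both forms visit the instances in the same order, so the choice is mostly a matter of readability.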

MongoDB Update with Array Filters [duplicate]

I am trying to update a value in the nested array but can't get it to work.
My object is like this
{
    "_id": {
        "$oid": "1"
    },
    "array1": [
        {
            "_id": "12",
            "array2": [
                {
                    "_id": "123",
                    "answeredBy": [], // need to push "success"
                },
                {
                    "_id": "124",
                    "answeredBy": [],
                }
            ],
        }
    ]
}
I need to push a value to "answeredBy" array.
In the below example, I tried pushing "success" string to the "answeredBy" array of the "123 _id" object but it does not work.
callback = function(err,value){
    if(err){
        res.send(err);
    }else{
        res.send(value);
    }
};
conditions = {
    "_id": 1,
    "array1._id": 12,
    "array2._id": 123
};
updates = {
    $push: {
        "array2.$.answeredBy": "success"
    }
};
options = {
    upsert: true
};
Model.update(conditions, updates, options, callback);
I found this link, but its answer only says I should use an object-like structure instead of arrays. This cannot be applied in my situation; I really need my objects to be nested in arrays.
It would be great if you could help me out here. I've spent hours trying to figure this out.
Thank you in advance!
General Scope and Explanation
There are a few things wrong with what you are doing here. Firstly, your query conditions: you are referring to several _id values where you should not need to, and at least one of them is not on the top level.
In order to get into a "nested" value, and presuming that the _id value is unique and would not appear in any other document, your query form should be like this:
Model.update(
    { "array1.array2._id": "123" },
    { "$push": { "array1.0.array2.$.answeredBy": "success" } },
    function(err,numAffected) {
        // something with the result in here
    }
);
Now that would actually work, but really it is only a fluke that it does as there are very good reasons why it should not work for you.
The important reading is in the official documentation for the positional $ operator under the subject of "Nested Arrays". What this says is:
The positional $ operator cannot be used for queries which traverse more than one array, such as queries that traverse arrays nested within other arrays, because the replacement for the $ placeholder is a single value
Specifically what that means is the element that will be matched and returned in the positional placeholder is the value of the index from the first matching array. This means in your case the matching index on the "top" level array.
So if you look at the query notation as shown, we have "hardcoded" the first ( or 0 index ) position in the top level array, and it just so happens that the matching element within "array2" is also the zero index entry.
To demonstrate this you can change the matching _id value to "124" and the result will $push a new entry onto the element with _id "123", as they are both in the zero index entry of "array1" and that is the value returned to the placeholder.
So that is the general problem with nesting arrays. You could remove one of the levels and you would still be able to $push to the correct element in your "top" array, but there would still be multiple levels.
Try to avoid nesting arrays as you will run into update problems as is shown.
The general approach is to "flatten" the things you "think" are "levels" and actually make these "attributes" on the final detail items. For example, the "flattened" form of the structure in the question should be something like:
{
    "answers": [
        { "by": "success", "type2": "123", "type1": "12" }
    ]
}
Or even, when accepting that the inner array is $push only and never updated:
{
    "array": [
        { "type1": "12", "type2": "123", "answeredBy": ["success"] },
        { "type1": "12", "type2": "124", "answeredBy": [] }
    ]
}
Both of which lend themselves to atomic updates within the scope of the positional $ operator.
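To make the "flattening" idea concrete, here is a minimal plain-Python sketch (no MongoDB involved) that turns the nested array1/array2 shape into the second flattened form. The function name flatten and the type1/type2 field names follow the example and are illustrative assumptions, not a prescribed schema:

```python
def flatten(doc):
    """Turn array1 -> array2 nesting into a single list of attribute rows."""
    rows = []
    for outer in doc["array1"]:
        for inner in outer["array2"]:
            rows.append({
                "type1": outer["_id"],   # former outer level, now an attribute
                "type2": inner["_id"],   # former inner level, now an attribute
                "answeredBy": list(inner["answeredBy"]),
            })
    return {"array": rows}


# Sample document shaped like the one in the question.
nested = {
    "array1": [
        {
            "_id": "12",
            "array2": [
                {"_id": "123", "answeredBy": ["success"]},
                {"_id": "124", "answeredBy": []},
            ],
        }
    ]
}

flat = flatten(nested)
print(flat)
```

With a single array level, each row can be addressed by matching on its attributes, which is the situation the positional $ operator is designed for.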
MongoDB 3.6 and Above
From MongoDB 3.6 there are new features available to work with nested arrays. This uses the positional filtered $[<identifier>] syntax in order to match the specific elements and apply different conditions through arrayFilters in the update statement:
Model.update(
    {
        "_id": 1,
        "array1": {
            "$elemMatch": {
                "_id": "12", "array2._id": "123"
            }
        }
    },
    {
        "$push": { "array1.$[outer].array2.$[inner].answeredBy": "success" }
    },
    {
        "arrayFilters": [{ "outer._id": "12" }, { "inner._id": "123" }]
    }
)
The "arrayFilters" option, as passed to .update() or even the .updateOne(), .updateMany(), .findOneAndUpdate() or .bulkWrite() methods, specifies the conditions to match on the identifiers given in the update statement. Any elements that match the given conditions will be updated.
Because the structure is "nested", we actually use "multiple filters" as is specified with an "array" of filter definitions as shown. The marked "identifier" is used in matching against the positional filtered $[<identifier>] syntax actually used in the update block of the statement. In this case inner and outer are the identifiers used for each condition as specified with the nested chain.
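Purely as an illustration of which elements the two filters select, the following plain-Python sketch emulates the effect on an in-memory dictionary. It is not MongoDB or driver code, and push_filtered is a made-up name:

```python
def push_filtered(doc, outer_id, inner_id, value):
    """Append value to answeredBy of every array2 element picked by both filters."""
    for outer in doc["array1"]:
        if outer["_id"] != outer_id:      # plays the role of { "outer._id": ... }
            continue
        for inner in outer["array2"]:
            if inner["_id"] == inner_id:  # plays the role of { "inner._id": ... }
                inner["answeredBy"].append(value)
    return doc


# Sample document shaped like the one in the question.
doc = {
    "array1": [
        {
            "_id": "12",
            "array2": [
                {"_id": "123", "answeredBy": []},
                {"_id": "124", "answeredBy": []},
            ],
        }
    ]
}

push_filtered(doc, "12", "123", "success")
print(doc)
```

The point of the emulation is that every element matching both conditions is updated, not just the first match.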
This new expansion makes the update of nested array content possible, but it does not really help with the practicality of "querying" such data, so the same caveats apply as explained earlier.
You typically really "mean" to express these as "attributes", even if your brain initially thinks "nesting"; that is usually just a reaction to how you believe the "previous relational parts" come together. In reality you need more denormalization.
Also see How to Update Multiple Array Elements in mongodb, since these new update operators actually match and update "multiple array elements" rather than just the first, which has been the previous action of positional updates.
NOTE Somewhat ironically, since this is specified in the "options" argument for .update() and like methods, the syntax is generally compatible with all recent release driver versions.
However this is not true of the mongo shell, since the way the method is implemented there ( "ironically for backward compatibility" ) the arrayFilters argument is not recognized and removed by an internal method that parses the options in order to deliver "backward compatibility" with prior MongoDB server versions and a "legacy" .update() API call syntax.
So if you want to use the command in the mongo shell or other "shell based" products (notably Robo 3T), you need the latest version from either the development branch or a production release as of 3.6 or greater.
See also the positional all $[] operator, which also updates "multiple array elements" but without applying any specified conditions: it applies to all elements in the array, where that is the desired action.
I know this is a very old question, but I just struggled with this problem myself, and found, what I believe to be, a better answer.
A way to solve this problem is to use sub-documents. This is done by nesting schemas within your schemas:
// Define the innermost schema first so each one exists before it is referenced
Array2Schema = new mongoose.Schema({
    answeredBy: [...]
})
Array1Schema = new mongoose.Schema({
    array2: [Array2Schema]
})
MainSchema = new mongoose.Schema({
    array1: [Array1Schema]
})
This way the object will look like the one you show, but now each array is filled with sub-documents. This makes it possible to dot your way into the sub-document you want. Instead of using .update, you then use .find or .findOne to get the document you want to update.
Main.findOne({
    _id: 1
})
.exec(function(err, result){
    result.array1.id(12).array2.id(123).answeredBy.push('success')
    result.save(function(err){
        console.log(result)
    });
})
I haven't used the .push() function this way myself, so the syntax might not be right, but I have used both .set() and .remove(), and both work perfectly fine.

Parsing JSON with Python from URL

So I'm trying to get JSON from a URL. The request works and I get the JSON, but I'm not able to print specific things from it.
request_url = 'http://api.tumblr.com/v2/user/following?limit=1'
r = requests.get(request_url, auth=oauth).json()
r["updated"]
I'm very new to Python. I'm guessing I need to get the JSON into an array, but I have no idea where to even begin.
According to the tumblr api I should be able to get something like this.
{
    "meta": {
        "status": 200,
        "msg": "OK"
    },
    "response": {
        "total_blogs": 4965,
        "blogs": [
            {
                "name": "xxxxx",
                "title": "xxxxxx",
                "description": "",
                "url": "http://xxxxxx.tumblr.com/",
                "updated": 1439795949
            }
        ]
    }
}
I only need the name, url, and updated; I just have no idea how to separate them out.
Just access the levels one by one.
for i in r["response"]["blogs"]:
    print(i["name"], i["url"], i["updated"])
This code prints those fields for every object inside the blogs list.
To explain how this works:
JSON objects are decoded into dictionaries in Python. Dictionaries are simple key-value pairs. In your example,
r is a dictionary with the following keys:
meta, response
You access the value of a key using r["meta"].
Now meta itself is a dictionary. The keys associated are:
status,msg
So, r["meta"]["status"] gives the status value returned by the request.
You should be able to print values as though they were nested arrays:
r["response"]["blogs"][0]["updated"] should get you the updated bit, but don't go straight to it; work your way down. Note that blogs is an array, so in a normal case you may actually want to work towards r["response"]["blogs"], then loop through it and, for each of those items, grab the ["updated"].
Similarly, r["meta"]["msg"] will get you the meta message.
The JSON data is converted to a dict, which is assigned to r in your code.
To access the value associated with the updated key, you first need to access the levels above it.
You should first access r["response"], which contains the actual response of the API. From that level, you should next access r["response"]["blogs"] and then loop through that to find the value of the updated key.
If it is a single blog, you can do something like r["response"]["blogs"][0]["updated"].
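Putting the answers together, here is a small sketch that keeps only the three wanted fields for each blog. The dictionary r below is a hard-coded copy of the sample response from the question, standing in for the real result of requests.get(...).json():

```python
# Stand-in for the dict returned by requests.get(...).json().
r = {
    "meta": {"status": 200, "msg": "OK"},
    "response": {
        "total_blogs": 4965,
        "blogs": [
            {
                "name": "xxxxx",
                "title": "xxxxxx",
                "description": "",
                "url": "http://xxxxxx.tumblr.com/",
                "updated": 1439795949,
            }
        ],
    },
}

wanted = ("name", "url", "updated")

# One small dict per blog, holding only the wanted keys.
blogs = [{key: blog[key] for key in wanted} for blog in r["response"]["blogs"]]

for blog in blogs:
    print(blog["name"], blog["url"], blog["updated"])
```

This also shows why r["updated"] fails in the original attempt: the top-level dictionary only has the keys meta and response, so the lookup raises a KeyError.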

Access a variable within a dictionary with unknown nested location

I have a JSON file and I want to query it using Python. However, I do not know the nested location of a variable beforehand. E.g. to query the JSON object below, loaded into Python and called 'data', I could do the following:
data['experiments'][0]['initial_ns']['icdat']
However, this assumes that I know that the icdat variable is located below initial_ns, which is located under experiments. Unfortunately I do not have this information, and the JSON structure could also change in the future. Is there a simpler way to access variables within a JSON string without explicitly specifying the entire structure?
thanks!!!
{
    "experiments": [
        {
            "management": {
                "events": [
                    {
                        "date": "19122",
                        "timp": "TI3",
                        "eve": "tage"
                    }
                ]
            },
            "initial_ns": {
                "icpcr": "MZ",
                "icdat": "1922"
            },
            "observed": {
                "mdat": "19403",
                "time_series": [
                    {
                        "date": "198423",
                        "etac": "0"
                    }
                ],
                "adat": "190218"
            },
            "local_name": "lhi",
            "exname": "SE",
            "exp_dur": "1"
        }
    ]
}
Have a look at the jsonpath module. http://goessner.net/articles/JsonPath/. I think the search string $..icdat will match your needs.
"...without explicitly specifying the entire structure?"
Yes, there are many ways. Unfortunately you have not specified which answer you are looking for.
To be "unique in terms of the schema" (my terminology) means the following: if you have, for example, multiple Foo dictionaries with the key Foo.bar, then that is still unique. What is not unique is if you have Foo objects with Foo.bar and Baz objects with Baz.bar: searching for {... bar:...} will return different kinds of objects.
If the key is unique in terms of the schema, you can search the entire tree. You can make this go faster by caching all key-value pairs in a dictionary for later use (each lookup then has O(1) "instant" amortized cost, since you needed to go through the entire data structure anyway to parse it!). This even works if you would like to return sets of objects: use a cache = collections.defaultdict(set) and, when you preprocess items to cache, do cache[key].add(value).
If the key is not unique in terms of the schema, you will want to make a reasonable guess about the path and provide some partial information, per Hans Then's answer utilizing JsonPath: https://stackoverflow.com/a/12291240/711085 (alternatively, change the schema).
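The caching idea could be sketched roughly as follows. The function name build_cache is invented for illustration, the sample data is a shortened copy of the JSON in the question, and only scalar values are cached, because dicts and lists cannot be stored in a set:

```python
from collections import defaultdict


def build_cache(node, cache=None):
    """Walk the whole tree once, recording every scalar value under its key."""
    if cache is None:
        cache = defaultdict(set)
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                build_cache(value, cache)   # descend into containers
            else:
                cache[key].add(value)       # scalars are hashable
    elif isinstance(node, list):
        for item in node:
            build_cache(item, cache)
    return cache


data = {
    "experiments": [
        {
            "management": {
                "events": [{"date": "19122", "timp": "TI3", "eve": "tage"}]
            },
            "initial_ns": {"icpcr": "MZ", "icdat": "1922"},
            "observed": {
                "mdat": "19403",
                "time_series": [{"date": "198423", "etac": "0"}],
                "adat": "190218",
            },
            "local_name": "lhi",
        }
    ]
}

cache = build_cache(data)
print(cache["icdat"])
```

Here the key date appears in two places, so cache["date"] holds both values; that is the "not unique in terms of the schema" case the answer warns about.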
No. You need to know the format, or you'll have to manually loop over everything in it.
You can write a function to recursively search nested containers for a given key, similar to getElementById() in an XML DOM parser.
def find_key(json, key):
    # Yield the value here if this dict holds the key
    if isinstance(json, dict):
        if key in json:
            yield json[key]
    # Then descend into any nested dicts or lists
    if isinstance(json, (dict, list)):
        for value in (json.values() if isinstance(json, dict) else json):
            if isinstance(value, (dict, list)):
                for item in find_key(value, key):
                    yield item
>>> next(find_key(data, "icdat"))
'1922'
'1922'
Since the same key may be found in multiple places in the document, this is actually written as a generator. You can iterate over the results to get all the values or, if you just want the first one (or know it's the only one), use next() around it as I've shown above. You could also convert it to a list() if desired.
