Pymongo document query for conversion to INI file - python

{
"name":"food",
"core":{
"group":{
"carbs":{
"USA":{
"breakfast":"potatoes",
"dinner":"pasta"
},
"europe":{
"breakfast":"something",
"dinner":"something"
}
},
"dessert":{
"USA":{
"breakfast":"potatoes",
"dinner":"pasta"
},
"europe":{
"breakfast":"something",
"dinner":"something"
}
},
"veggies":{
"USA":{
"breakfast":"potatoes",
"dinner":"pasta"
},
"europe":{
"breakfast":"something",
"dinner":"something"
}
}
}
}
}
I am trying to find a way to just return the core groups (carbs, dessert, and veggies).
i have the following python code but it returns all the values in the document, when i am only interested in the core.group
data = collection.foodie.find({"name":"food"},{"name":False, '_id':False, "core.groups.carbs.europe":False, "core.groups.dessert.europe":False, "core.groups.veggies.europe":False
The end result is an INI file with the type of group as the section name for USA only like below, which is why I am looking for a way to just list the groups so i can dynamically look for the actual values stored for when I do not know the groups (have them hardcoded like core.group.dessert.europe)
[carbs]
breakfast=potatoes
dinner=pasta
[dessert]
breakfast=potatoes
dinner=pasta
[veggies]
breakfast=potatoes
dinner=pasta

Related

Flatten json in Python from XHR-response

Updated: The XHR response was not correct earlier
I'm failing with flatten my json in a correct way from a XHR-response.
I have just expanded one item below, to make it more readable.
I am using python and I have tried, with incorrect outcome.
u = "URL"
SE_units = requests.get(u,headers=h).json()
dp = pd.json_normalize(SE_units,[SE_units,"Items"])
SE_dp_list.append(dp)
From the XHR-Response below I would like to have the Items-information into a CSV but when i do export.to_CSV I see that it haven't been flattened correctly
{"Content":{
"PaginationCount":12,"FilterValues":null,"Items":
[{
"Id":258370,
"OriginalType":"BostadObjectPage",
"PublishDate":null,
"Title":"02 Skogsvagen",
"Image":
{
"description":null,
"alt":null,
"externalUrl":"/abc.jpg"
},
"StaticMapImage":null,
"Url":"/abcd/",
"HideReadMore":false,
"ProjectData":null,
"ObjectData":
{
"BuildingTypeLabel":"Rad-/Kedje-/Parhus",
"ObjectStatus":"SalesInProgress",
"ObjectStatusLabel":"Till salu",
"ObjectNumber":"02",
"City":"staden",
"RoomInterval":"2-3",
"LivingArea":"101",
"SalesPrice":"2 150 000",
"MonthlyFee":null,
"Elevator":false,
"Balcony":false,
"Terrace":true
},
"FastighetProjectData":null,
"FastighetObjectData":null,
"OfficeData":null
},
{
"Id":258372,
"OriginalType":"BostadObjectPage",
"PublishDate":null,
....."same structure as above"
"OfficeData":null
}],
"NoResultsMessage":null,
"SimplifiedBuildingType":null,
"NextIndex":-1,
"TotalCount":12,
"Heading":null,
"ShowMoreLabel":null,
"DataColumns":null,
"Error":null},
"ObjectSearchData":
{
"BuildingVariantId":"Houses",
"BuildingsFoundLabel":" {count}",
"BuildingTypeIds":[400],
"BuildingsAvailableForSale":12,
"BuildingNoResultsLabel":""
}
}
Expected output format after writing to CSV

Unable to replicate post_filter query in elasticsearch-dsl

The query I would like to replicate in DSL is as below:
GET /_search
{
"query":{
"bool":{
"must":[
{
"term":{
"destination":"singapore"
}
},
{
"terms":{
"tag_ids":[
"tag_luxury"
]
}
}
]
}
},
"aggs":{
"max_price":{
"max":{
"field":"price_range_from.SGD"
}
},
"min_price":{
"min":{
"field":"price_range_from.SGD"
}
}
},
"post_filter":{
"range":{
"price_range_from.SGD":{
"gte":0.0,
"lte":100.0
}
}
}
}
The above query
Matches terms - destination and tags_ids
Aggregates to result to find the max price from field price_range_from.SGD
Applies another post_filter to subset the result set within price limits
It works perfectly well in the Elastic/Kibana console.
I replicated the above query in elasticsearch-dsl as below:
es_query = []
es_query.append(Q("term", destination="singapore"))
es_query.append(Q("terms", tag_ids=["tag_luxury"]))
final_query = Q("bool", must=es_query)
es_conn = ElasticSearch.instance().get_client()
dsl_client = DSLSearch(using=es_conn, index=index).get_dsl_client()
dsl_client.query = final_query
dsl_client.aggs.metric("min_price", "min", field="price_range_from.SGD")
dsl_client.aggs.metric("max_price", "max", field="price_range_from.SGD")
q = Q("range", **{"price_range_from.SGD":{"gte": 0.0, "lte": 100.0}})
dsl_client.post_filter(q)
print(dsl_client.to_dict())
response = dsl_client.execute()
print(response.to_dict().get("hits", {}))
Although the aggregations are correct, products beyond the price range are also being returned. There is no error returned but it seems like the post_filter query is not applied.
I dived in the dsl_client object to see whether my query is being captured correctly. I see only the query and aggs but don't see the post_filter part in the object. The query when converted to a dictionary using dsl_client.to_dict() is as below -
{
"query":{
"bool":{
"must":[
{
"term":{
"destination":"singapore"
}
},
{
"terms":{
"tag_ids":[
"tag_luxury"
]
}
}
]
}
},
"aggs":{
"min_price":{
"min":{
"field":"price_range_from.SGD"
}
},
"max_price":{
"max":{
"field":"price_range_from.SGD"
}
}
}
}
Please help. Thanks!
You have to re-assign the dsl_client like:
dsl_client = dsl_client.post_filter(q)

How to scrape attributes from json values

I am trying to scrape some values through a json that looks like:
{
"attributes":{
"531":{
"id":"531",
"code":"taille",
"label":"taille",
"options":[
{
"id":"30",
"label":"40",
"is_in":"0"
},
{
"id":"31",
"label":"41",
"is_in":"1"
}
]
}
},
"template":"Helloworld"
}
My issue is that the number 531 is different in each json file that I am trying to scrape and what I am trying to grab through this json is the label and is_in value
What I have done so far is that I tried to do something like this but I am stuck and dont know how to do if the 531 is changing to something else
getOption = '{
"attributes":{
"531":{
"id":"531",
"code":"taille",
"label":"taille",
"options":[
{
"id":"30",
"label":"40",
"is_in":"0"
},
{
"id":"31",
"label":"41",
"is_in":"1"
}
]
}
},
"template":"Helloworld"
}'
for att, values in getOption.items():
print(values)
So how do I possible scrape the value label and is_in?
I'm not sure if you can have several 531 keys but you can loop through them.
getOption = {
"attributes":{
"531":{
"id":"531",
"code":"taille",
"label":"taille",
"options":[
{
"id":"30",
"label":"40",
"is_in":"0"
},
{
"id":"31",
"label":"41",
"is_in":"1"
}
]
}
},
"template":"Helloworld"
}
attributes = getOption['attributes']
for key in attributes.keys():
for item in attributes[key]['options']:
print(item['label'], item['is_in'])

Pymongo: Unable to find record from mongodb

I have a collection containing country records, I need to find particular country with uid and it's countryId
Below is the sample collection data:
{
"uid": 15024,
"countries": [{
"countryId": 123,
"popullation": 45000000
},
{
"countryId": 456,
"poppulation": 9000000000
}
]
},
{
"uid": 15025,
"countries": [{
"countryId": 987,
"popullation": 560000000
},
{
"countryId": 456,
"poppulation": 8900000000
}
]
}
I have tried with below query in in python but unable to find any result:
foundRecord = collection.find_one({"uid" : 15024, "countries.countryId": 456})
but it return None.
Please help and suggest.
I think following will work better :
foundRecord = collection.find_one({"uid" : 15024,
"countries" : {"$elemMatch" : { "countryId" : 456 }})
Are you sure you're using the same Database / Collection source?
Seems that you're saving results on another collection.
I've tried to reproduce your problem and it works on my mongodb ( note that I'm using v4)
EDIT: Would be nice to have the piece of code where you're defining "collection"

Error while parsing json from IBM watson using python

I am trying to parse out a JSON download using python and here is the download that I have:
{
"document_tone":{
"tone_categories":[
{
"tones":[
{
"score":0.044115,
"tone_id":"anger",
"tone_name":"Anger"
},
{
"score":0.005631,
"tone_id":"disgust",
"tone_name":"Disgust"
},
{
"score":0.013157,
"tone_id":"fear",
"tone_name":"Fear"
},
{
"score":1.0,
"tone_id":"joy",
"tone_name":"Joy"
},
{
"score":0.058781,
"tone_id":"sadness",
"tone_name":"Sadness"
}
],
"category_id":"emotion_tone",
"category_name":"Emotion Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"analytical",
"tone_name":"Analytical"
},
{
"score":0.0,
"tone_id":"confident",
"tone_name":"Confident"
},
{
"score":0.0,
"tone_id":"tentative",
"tone_name":"Tentative"
}
],
"category_id":"language_tone",
"category_name":"Language Tone"
},
{
"tones":[
{
"score":0.0,
"tone_id":"openness_big5",
"tone_name":"Openness"
},
{
"score":0.571,
"tone_id":"conscientiousness_big5",
"tone_name":"Conscientiousness"
},
{
"score":0.936,
"tone_id":"extraversion_big5",
"tone_name":"Extraversion"
},
{
"score":0.978,
"tone_id":"agreeableness_big5",
"tone_name":"Agreeableness"
},
{
"score":0.975,
"tone_id":"emotional_range_big5",
"tone_name":"Emotional Range"
}
],
"category_id":"social_tone",
"category_name":"Social Tone"
}
]
}
}
I am trying to parse out 'tone_name' and 'score' from the above file and I am using following code:
import urllib
import json
url = urllib.urlopen('https://watson-api-explorer.mybluemix.net/tone-analyzer/api/v3/tone?version=2016-05-19&text=I%20am%20happy')
data = json.load(url)
for item in data['document_tone']:
print item["tone_name"]
I keep running into error that tone_name not defined.
As jonrsharpe said in a comment:
data['document_tone'] is a dictionary, but 'tone_name' is a key in dictionaries much further down the structure.
You need to access the dictionary that tone_name is in. If I am understanding the JSON correctly, tone_name is a key within tones, within tone_categories, within document_tone. You would then want to change your code to go to that level, like so:
for item in data['document_tone']['tone_categories']:
# item is an anonymous dictionary
for thing in item[tones]:
print(thing['tone_name'])
The reason more than one for is needed is because of the mix of lists and dictionaries in the file. 'tone_categories is a list of dictionaries, so it accesses each one of those. Then, it iterates through the list tones, which is in each one and full of more dictionaries. Those dictionaries are the ones that contain 'tone_name', so it prints the value of 'tone_name'.
If this does not work, let me know. I was unable to test it since I could not get the rest of the code to work on my computer.
You are incorrectly walking the structure. The root node has a single document_tone key, the value of which only has the tone_categories key. Each of the categories has a list of tones and it's name. Here is how you would print it out (adjust as needed):
for cat in data['document_tone']['tone_categories']:
print('Category:', cat['category_name'])
for tone in cat['tones']:
print('-', tone['tone_name'])
The result of this is:
Category: Emotion Tone
- Anger
- Disgust
- Fear
- Joy
- Sadness
Category: Language Tone
- Analytical
- Confident
- Tentative
Category: Social Tone
- Openness
- Conscientiousness
- Extraversion
- Agreeableness
- Emotional Range

Categories