How do I insert another nested document using pymongo? - python

Ahoy,
I have a document that looks like this:
{
    "_id": "123abc456def",
    "name": "John Smith",
    "address": [
        {"street": "First St.", "date": "yesterday", "last_updated": "two days ago"}
    ],
    "age": 123
}
When I try to add another street document using $push, it errors out with:
pymongo.errors.WriteError: The field 'address' must be an array but is of type object in document {_id: ObjectId('6049e88657e43d8801197c72')}
Code I'm using:
mydb3 = myclient["catalogue"]
mycolALL = mydb3["locations"]
query = {"charID": 0}
newvalue = {"$push": {"address": {"street": "test123", "date": "test123", "last_updated": "now123"}}}
mycolALL.update_one(query, newvalue)
I'm not making an address book or anything; I just edited the data so it makes a bit more sense to anyone without context.
My desired output would be that the document would look like this:
{
    "_id": "123abc456def",
    "name": "John Smith",
    "address": [
        {"street": "First St.", "date": "yesterday", "last_updated": "two days ago"},
        {"street": "test123", "date": "test123", "last_updated": "now123"}
    ],
    "age": 123
}
Normally I can google my way to an answer that makes the coin drop and JACKPOT! but this time I'm outta luck.
$set = it just replaces the existing value outright, which is not what I want.
$addToSet = for arrays only, error message: "pymongo.errors.WriteError: Cannot apply $addToSet to non-array field. Field named 'address' has non-array type object"
Anyone that can help?

Just a guess, but are you sure you're looking at the right data / database?
Based on the data you posted, your update_one() won't update that record, because it doesn't match your filter {"charID": 0}.
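Separately, the WriteError itself explains why $push fails even on a document the filter does match: in that document, address is stored as a single embedded object rather than as an array. A minimal sketch of the fix, simulated on a plain Python dict (in MongoDB the one-time migration would be an update_one with something like {"$set": {"address": [old_value]}}):

```python
# Simulating the document involved; field values taken from the question.
doc = {
    "_id": "123abc456def",
    "name": "John Smith",
    # Stored as an object, not an array -- this is what the WriteError is about.
    "address": {"street": "First St.", "date": "yesterday",
                "last_updated": "two days ago"},
    "age": 123,
}

# Step 1: normalize "address" to an array (a one-time fix).
if not isinstance(doc["address"], list):
    doc["address"] = [doc["address"]]

# Step 2: the $push equivalent is now a plain append; once the stored field
# is an array, the original $push update also works against MongoDB.
doc["address"].append({"street": "test123", "date": "test123",
                       "last_updated": "now123"})
```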

Related

Python: Iterate JSON and remove items with specific criteria

I am trying to filter out data from an API JSON response with Python, and I get weird results. I would be glad if somebody could guide me on how to deal with the situation.
The main idea is to remove irrelevant data in the JSON and keep only the data that is associated with particular people which I hold in a list.
Here is a snip of the JSON file:
{
    "result": [
        {
            "number": "Number1",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-30 11:51:24",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "John Doe",
                "link": "https://some_link.com"
            }
        },
        {
            "number": "Number2",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-10 11:07:13",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "Tyrell Greenley",
                "link": "https://some_link.com"
            }
        },
        {
            "number": "Number3",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-20 10:23:35",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "Delmar Vachon",
                "link": "https://some_link.com"
            }
        },
        {
            "number": "Number4",
            "short_description": "Some Description",
            "assignment_group": {
                "display_value": "Some value",
                "link": "https://some_link.com"
            },
            "incident_state": "Closed",
            "sys_created_on": "2020-03-30 11:51:24",
            "priority": "4 - Low",
            "assigned_to": {
                "display_value": "Samual Isham",
                "link": "https://some_link.com"
            }
        }
    ]
}
Here is the Python code:
import json

users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']

# Load JSON file
with open('extract.json', 'r') as input_file:
    input_data = json.load(input_file)

# Create a function to clear the data
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in data:
        print(elem['assigned_to']['display_value'] not in users)
        if elem['assigned_to']['display_value'] not in users:
            print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
            data.remove(elem)
        else:
            print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))
    return data

cd = clear_data(input_data['result'], users_test)
And here is the output, which seems to iterate through only 2 of the items in the file:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
Process finished with exit code 0
It seems that the problem is more or less related to the .remove() method however I don't find any other suitable solution to delete these particular items that I do not need.
Here is the output of the iteration without applying the remove() method:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.
Process finished with exit code 0
Note: I have left the check for the name visible on purpose.
I would appreciate any ideas to sort out the situation.
If you don't need to log info about the people you are removing, you could simply try:
filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
solution = []
for user in users_test:
    print(user)
    for value in data['result']:
        if user == value['assigned_to']['display_value']:
            solution.append(value)
print(solution)
For more efficient code, as asked for by @NomadMonad:
solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))
You are modifying a list while at the same time iterating through it. Check out this blog post which describes this behavior.
A safer way to do this is to make a copy of your data to iterate over, and to delete from the original:
import copy

def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users list"""
    for elem in copy.deepcopy(data):  # deepcopy handles nested dicts
        if elem['assigned_to']['display_value'] not in users:
            data.remove(elem)  # safe: we are iterating over the copy
    return data
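To see the skipping in action, here is a minimal, self-contained reproduction of the behavior from the question, with the data reduced to the one field that matters:

```python
# Minimal demonstration of why removing while iterating skips elements.
records = [{"assigned_to": "John Doe"},
           {"assigned_to": "Tyrell Greenley"},
           {"assigned_to": "Delmar Vachon"},
           {"assigned_to": "Samual Isham"}]
keep = ["Samual Isham"]

for rec in records:          # iterating and mutating the same list
    if rec["assigned_to"] not in keep:
        records.remove(rec)  # shifts the remaining items left, so the
                             # iterator skips the element after each removal
```

After the loop, "Tyrell Greenley" is still in the list even though he is not in `keep`, matching the question's output where only John Doe and Delmar Vachon were removed.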

Group_by and filter Django

I'm trying to make a query in Django, but I can't get the output I want. I want to use group by and filter in a Django query. I tried using annotate by looking at some answers on Stack Overflow and some other sites, but couldn't make it work. Here's my response after using filter:
[
    {
        "id": 11667,
        "rate_id": "FIT-PIT2",
        "name": "FIT-PIT111",
        "pms_room": null,
        "description": null,
        "checkin": "",
        "checkout": "",
        "connected_room": null
    },
    {
        "id": 11698,
        "rate_id": "343",
        "name": "dfggffd",
        "pms_room": "5BZ",
        "description": null,
        "checkin": null,
        "checkout": null,
        "connected_room": null
    },
    {
        "id": 11699,
        "rate_id": "343",
        "name": "dfggffd",
        "pms_room": "6BZ",
        "description": null,
        "checkin": null,
        "checkout": null,
        "connected_room": null
    }
]
What I want to do is group all those pms_rooms which have the same rate_id, roughly something like this:
{'343': ['5BZ', '6BZ'], 'FIT-PIT2': [null]}
I can do it using a dictionary or list.
But I want to do it directly from the query, like table.objects.filter(condition).group_by('rate_id'), something SQL-equivalent to SELECT *, GROUP_CONCAT(name) FROM table_name WHERE pms = hotel.pms GROUP BY rate_id. Can somebody please help me out? Thanks.
table.objects.filter(condition).values('rate_id') — check out the docs: https://docs.djangoproject.com/en/3.0/ref/models/querysets/
Since your example mentions GROUP_CONCAT, I'll assume that you are using MySQL. Django does not support GROUP_CONCAT natively, but you can try django-mysql, which provides an equivalent database function, GroupConcat. Then you can make a query like this:
table.objects.values('rate_id').annotate(grouped_rooms=GroupConcat('pms_room'))
The result may be like this:
[
    {
        'rate_id': '343',
        'grouped_rooms': '5BZ,6BZ',
    },
    {
        'rate_id': 'FIT-PIT2',
        'grouped_rooms': '',
    },
    ...
]
This doesn't exactly match the format you mentioned in the OP, but you can do some post-processing of this result in native Python to make it meet what you expected.
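For that post-processing step, here is a sketch in plain Python, assuming the annotated rows have the shape shown above:

```python
# Rows as returned by the GroupConcat annotation (assumed shape).
rows = [
    {"rate_id": "343", "grouped_rooms": "5BZ,6BZ"},
    {"rate_id": "FIT-PIT2", "grouped_rooms": ""},
]

# Split each comma-joined string into the {rate_id: [rooms]} mapping the
# question asked for; an empty GroupConcat string maps to [None], mirroring
# the null pms_room in the original response.
grouped = {
    row["rate_id"]: (row["grouped_rooms"].split(",")
                     if row["grouped_rooms"] else [None])
    for row in rows
}
# grouped == {"343": ["5BZ", "6BZ"], "FIT-PIT2": [None]}
```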

Accessing nested objects with python

I have a response that I receive from Foursquare in the form of JSON. I have tried to access certain parts of the object but have had no success. How would I access, say, the address of the object? Here is the code that I have tried:
url = 'https://api.foursquare.com/v2/venues/explore'
params = dict(client_id=foursquare_client_id,
              client_secret=foursquare_client_secret,
              v='20170801', ll=''+lat+','+long+'',
              query=mealType, limit=100)
resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
msg = '{} {}'.format("Restaurant Address: ",
                     data['response']['groups'][0]['items'][0]['venue']['location']['address'])
print(msg)
Here is an example of json response:
"items": [
    {
        "reasons": {
            "count": 0,
            "items": [
                {
                    "summary": "This spot is popular",
                    "type": "general",
                    "reasonName": "globalInteractionReason"
                }
            ]
        },
        "venue": {
            "id": "412d2800f964a520df0c1fe3",
            "name": "Central Park",
            "contact": {
                "phone": "2123106600",
                "formattedPhone": "(212) 310-6600",
                "twitter": "centralparknyc",
                "instagram": "centralparknyc",
                "facebook": "37965424481",
                "facebookUsername": "centralparknyc",
                "facebookName": "Central Park"
            },
            "location": {
                "address": "59th St to 110th St",
                "crossStreet": "5th Ave to Central Park West",
                "lat": 40.78408342593807,
                "lng": -73.96485328674316,
                "labeledLatLngs": [
                    {
                        "label": "display",
                        "lat": 40.78408342593807,
                        "lng": -73.96485328674316
                    }
                ],
the full response can be found here
Like so
addrs=data['items'][2]['location']['address']
Your code (at least as far as loading and accessing the object) looks correct to me. I loaded the JSON from a file (since I don't have your Foursquare ID) and it worked fine. You are correctly using object/dictionary keys and array positions to navigate to what you want. However, you misspelled "address" in the line where you drill down to the data. Adding the missing 'a' made it work. I'm also correcting the typo in the URL you posted.
I answered this assuming that the example JSON you linked to is what is stored in data. If that isn't the case, a relatively easy way to see exactly what Python has stored in data is to import pprint and use it like so: pprint.pprint(data).
You could also start an interactive python shell by running the program with the -i switch and examine the variable yourself.
data["items"][2]["location"]["address"]
This will access the address for you.
You can go to any level of nesting by using an integer index in the case of an array and a string index in the case of a dict.
Like in your case, items is an array:
# items[int index]
items[0]
Now items[0] is a dictionary, so we access it by string indexes:
items[0]['location']
Now again it's an object, so we use a string index:
items[0]['location']['address']
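If some of these levels can be missing at runtime, the chain of indexes can be wrapped in a small helper that returns a default instead of raising; `deep_get` is a made-up name for illustration:

```python
def deep_get(obj, *path, default=None):
    """Walk a nested dict/list structure; return `default` on any miss."""
    for key in path:
        try:
            obj = obj[key]  # int index for lists, string key for dicts
        except (KeyError, IndexError, TypeError):
            return default
    return obj

# Tiny slice of the Foursquare-style response shown above.
venue = {"items": [{"venue": {"location": {"address": "59th St to 110th St"}}}]}

deep_get(venue, "items", 0, "venue", "location", "address")
# -> "59th St to 110th St"
deep_get(venue, "items", 5, "venue")  # -> None instead of an IndexError
```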

Counting Items in Python from a JSON file

I'm trying to search a data file, for example Yelp.json. It has businesses in LA, Boston, and DC.
I wrote this:
# Python 2
# read json
with open('updated_data.json') as facts_data:
    data = json.load(facts_data)

# return every unique locality along with how often it occurs
locality = []
unique_locality = []

# Load items into lists
for item in data:
    locality.append(data["payload"]["locality"])
    if data["payload"]["locality"] not in unique_locality:
        print unique_locality.append(data["payload"]["locality"])

# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])
But I get an answer of "Portsmouth 1", which means it is not providing all the cities, and might not even be providing all the counts. My goal for this section is to search that JSON file and have it say "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses." Each payload is a grouping of info about a single business, and "locality" is just the city. So I want it to find how many unique cities there are, and then how many businesses are in each city. One payload could be a Starbucks in LA, another payload could be a Starbucks in DC, another could be a Chipotle in LA.
Example of JSON file (JSONlite.com says its valid):
"payload": {
    "existence_full": 1,
    "geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
    "latitude": "56.945972",
    "locality": "Stonehaven",
    "_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
    "address": "The Lodge, Dunottar",
    "email": "dunnottarcastle@btconnect.com",
    "existence_ml": 0.5694238217658721,
    "domain_aggregate": "",
    "name": "Dunnottar Castle",
    "search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
    "admin_region": "Scotland",
    "existence": 1,
    "category_labels": [
        ["Landmarks", "Buildings and Structures"]
    ],
    "post_town": "Stonehaven",
    "region": "Kincardineshire",
    "review_count": "719",
    "geocode_level": "within_50m",
    "tel": "01569 762173",
    "placerank": 65,
    "longitude": "-2.197123",
    "placerank_ml": 37.27916073464469,
    "fax": "01330 860325",
    "category_ids_text_search": "",
    "website": "http://www.dunnottarcastle.co.uk",
    "status": "1",
    "geocode_confidence": "20",
    "postcode": "AB39 2TL",
    "category_ids": [108],
    "country": "gb",
    "_geocode_quality": "4",
    "uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
},
"payload": {
    "existence_full": 1,
    "geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
    "latitude": "56.237480",
    "locality": "Inveraray",
    "_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
    "address": "Cherry Park",
    "email": "enquiries@inveraray-castle.com",
    "longitude": "-5.073578",
    "domain_aggregate": "",
    "name": "Inveraray Castle",
    "admin_region": "Scotland",
    "search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
    "existence": 1,
    "category_labels": [
        ["Social", "Food and Dining", "Restaurants"]
    ],
    "region": "Argyll",
    "review_count": "532",
    "geocode_level": "within_50m",
    "tel": "01499 302203",
    "placerank": 67,
    "post_town": "Inveraray",
    "placerank_ml": 41.19978087352266,
    "fax": "01499 302421",
    "category_ids_text_search": "",
    "website": "http://www.inveraray-castle.com",
    "status": "1",
    "geocode_confidence": "20",
    "postcode": "PA32 8XE",
    "category_ids": [347],
    "country": "gb",
    "_geocode_quality": "4",
    "existence_ml": 0.7914881102847783,
    "uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
},
If your "json" string is something like
{"payload": { CONTENT_A }, "payload": { CONTENT_B }, ..., "payload": { CONTENT_LAST }}
it is a valid json string, but after you json.loads the string, the duplicate keys collapse and it will be evaluated as
{"payload": { CONTENT_LAST }}
And that is why you end up with one city and one business count.
You can verify this behaviour on this online json parser http://json.parser.online.fr/ by checking the JS eval field.
In this case, one way to preprocess your json string is to get rid of the dummy "payload" keys and wrap the content dictionaries directly in a list. You will have a json string in the following format:
[{ CONTENT_A }, { CONTENT_B }, ..., { CONTENT_LAST }]
Assume your json string is now a list of payload dictionaries, and you have loaded it with data = json.loads(json_str).
As you iterate through the json payloads, build a lookup table along the way.
This will handle duplicated cities for you automatically, since businesses in the same city will be hashed to the same list.
city_business_map = {}
for payload in data:
    city = payload['locality']
    business = payload['name']
    if city not in city_business_map:
        city_business_map[city] = []
    city_business_map[city].append(business)
Then later on, you can easily present the solution by:
for city, business_list in city_business_map.items():
    print city, len(business_list)
If you want to count the unique businesses in each city, initialize the value to a set instead of a list.
If this is overkill, instead of initializing to a list or set, just associate a counter with each key.
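Once the data is a plain list of payload dicts, the per-city counting can also be done directly with collections.Counter; a sketch with made-up sample data standing in for the real file:

```python
from collections import Counter

# Stand-in for json.loads(json_str) after the preprocessing described above.
data = [
    {"locality": "LA", "name": "Starbucks"},
    {"locality": "DC", "name": "Starbucks"},
    {"locality": "LA", "name": "Chipotle"},
]

# Count how many businesses fall in each city.
counts = Counter(payload["locality"] for payload in data)
# counts == Counter({"LA": 2, "DC": 1})
```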

Getting json file fields issue

OK, I am new to Python, but what I am trying to do is access specific fields from a JSON text file.
My JSON text file looks like this:
{
    "paging": {
        "next": "https://graph.facebook.com/search?limit=5000&offset=5000&type=page&q=%26&locale=ar_AR&access_token=CAACEdEose0cBAD7z1vK0aO2Mlb1QZBOb9OwjYZCZBZB56P0MrYnt54WJYZCZBy4ZBv4zaYG0mj9ZCZAMkZBmlP83E885ykZAafog7QbcWwEtvRXfjtVa12DBnW8omWsnC8N6lsmNK7yktI89kBDdrTH9TOIdATHdsX5OewWhzGTpXDelSjE8HAbtcn08zSWsweDc4UZD&__after_id=139433456868"
    },
    "data": [
        {
            "category": "\u0627\u0644\u062a\u0639\u0644\u064a\u0645",
            "name": "The London School of Economics and Political Science - LSE",
            "category_list": [
                {
                    "id": "108051929285833",
                    "name": "\u0627\u0644\u0643\u0644\u064a\u0629 \u0648\u0627\u0644\u062c\u0627\u0645\u0639\u0629"
                },
                {
                    "id": "187751327923426",
                    "name": "\u0645\u0646\u0638\u0645\u0629 \u062a\u0639\u0644\u064a\u0645\u064a\u0629"
                }
            ],
            "id": "6127898346"
        },
The field that I want to access is the 'category_list' field, in order to get the 'id' field.
I have tried something like this:
import json
idvalue = []
jsonFile = open('samples0.txt', 'r')
values = json.load(jsonFile)
jsonFile.close()
idValue = values['data'][0]['category_list'][0]['id']
print idvalue
but it keeps telling me that there is a key error.
What am I missing here?
What am I doing wrong?
Any help, please.
Edit:
My code returns null, and I still cannot understand why.
values['data'][0]['category_list'] is a list, so something like values['data'][0]['category_list'][0]['id'] should work.
No need to declare idValue. Just use it as
idValue = values['data'][0]['category_list'][0]['id']
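For what it's worth, that lookup does work against the snippet posted in the question once the JSON is closed off into a valid document. Note also that the question's code assigns to idValue but prints idvalue (the empty list declared at the top), which would explain a null-looking output even when the lookup succeeds:

```python
# The question's data, closed off into a valid structure (names trimmed).
values = {
    "data": [
        {
            "name": "The London School of Economics and Political Science - LSE",
            "category_list": [
                {"id": "108051929285833"},
                {"id": "187751327923426"},
            ],
            "id": "6127898346",
        }
    ]
}

id_value = values["data"][0]["category_list"][0]["id"]
# id_value == "108051929285833"
```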
