I have a collection with documents like:
{'state': 'NY', 'DOB': '2000-01-02'},
{'state': 'NY', 'DOB': '2002/03/04'},
{'state': 'NY', 'DOB': '00-00-00'},
{'state': 'NY', 'DOB': 'male'},
...
I want outputs like:
{'state': 'NY', 'DOB': '2000-01-02', 'Age': 21},
{'state': 'NY', 'DOB': '2002/03/04', 'Age': 19},
{'state': 'NY', 'DOB': '00-00-00', 'Age': None}, # or Mongo None equivalent
{'state': 'NY', 'DOB': 'male', 'Age': None}, # or Mongo None equivalent
...
I'm constructing aggregation queries in PyMongo, and I'm wondering if there's an aggregate way to try to convert a field to Mongo Date object and then extract Age from it, else (if a date cannot be extracted), return None. Some condition in the shell below?
def map_age(state, city):
    db.aggregate([
        {'$match': {
            'state': state,
            'DOB': {"$exists": True},
            'Age': {"$exists": False}
        }},
        {...}
    ])
You can try:
$let to create a variable that holds the converted DOB and to operate on it
$dateFromString to convert the string to a date; if the string is not a valid date, onError replaces it with "None"
$subtract to subtract the converted date from the current date, $$NOW (you can use new Date() as well)
$divide to divide that difference by 31536000000, i.e. 365*24*60*60*1000, the number of milliseconds in a 365-day year
$round to round the age to a whole number
db.aggregate([
  {
    $set: {
      Age: {
        $let: {
          vars: {
            dob: {
              $dateFromString: {
                dateString: "$DOB",
                onError: "None"
              }
            }
          },
          in: {
            $cond: [
              { $eq: ["$$dob", "None"] },
              "None",
              {
                $round: {
                  $divide: [
                    { $subtract: ["$$NOW", "$$dob"] },
                    31536000000 // 365*24*60*60*1000
                  ]
                }
              }
            ]
          }
        }
      }
    }
  }
])
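Since the question builds its pipelines in PyMongo, the same stages can be written as plain Python dictionaries. The following is only a sketch under a few assumptions: build_age_pipeline is a hypothetical helper name, Python None (stored as null) is used instead of the string "None", and $round is swapped for $floor on the assumption that an age should count completed years only. It just builds the pipeline, which would then be passed to a collection's aggregate() call.

```python
# Milliseconds in a 365-day year, the same constant used in the answer above.
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000


def build_age_pipeline(state):
    """Build the aggregation pipeline as plain Python data (hypothetical helper)."""
    # Try to parse DOB; unparseable or missing values become None (null).
    dob = {
        "$dateFromString": {
            "dateString": "$DOB",
            "onError": None,
            "onNull": None,
        }
    }
    # Elapsed time since DOB, converted from milliseconds to years.
    years = {"$divide": [{"$subtract": ["$$NOW", "$$dob"]}, MS_PER_YEAR]}
    return [
        {"$match": {
            "state": state,
            "DOB": {"$exists": True},
            "Age": {"$exists": False},
        }},
        {"$set": {
            "Age": {
                "$let": {
                    "vars": {"dob": dob},
                    "in": {
                        "$cond": [
                            {"$eq": ["$$dob", None]},
                            None,
                            # Assumption: $floor instead of $round, to count completed years.
                            {"$floor": years},
                        ]
                    },
                }
            }
        }},
    ]


pipeline = build_age_pipeline("NY")
```

Whether a string such as '2002/03/04' parses depends on what $dateFromString accepts by default, so it is worth checking a few sample documents before relying on the result.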
As suggested by @prasad_, you have to use the $dateFromString operator in either a $project or an $addFields stage.
db.collection.aggregate([
  {
    "$project": {
      "age": {
        "$dateFromString": {
          dateString: "$DOB",
          onError: null,
          onNull: null,
        }
      }
    }
  }
])
I have a Mongo collection that I'm trying to update (in PyMongo). I have documents like {'state': 'ny', 'city': 'nyc', 'race': 'b', 'ethnicity': 'h'} and I'm trying to bulk operate on these documents by matching certain criteria and created a concatenated 'race_ethnicity' field.
A pseudo example might be:
filter = {
    'state': 'ny',
    'city': 'nyc',
    'race': {"$exists": True},
    'ethnicity': {"$exists": True},
    'race_ethnicity': {"$exists": False}
}
action = {'$addFields': {'race_ethnicity': {'$concat': ['$race', '$ethnicity']}}}
Using the above document, the updated document would be: {'state': 'ny', 'city': 'nyc', 'race': 'b', 'ethnicity': 'h', 'race_ethnicity': 'bh'}.
I want to bulk match, and bulk update a collection like this — how do I go about this without getting a BulkWriteError?
*** What I tried:
updates = []
updates.append(UpdateMany({
    'state': 'ny',
    'city': 'nyc',
    "race": {"$exists": True},
    "ethnicity": {"$exists": True},
    "race_ethnicity": {"$exists": False}
},
{'$addFields': {'race_ethnicity': {'$concat': ['$race', '$ethnicity']}}}))
self.db.bulk_write(updates)
This produced the following error:
pymongo.errors.BulkWriteError: batch op errors occurred
Your bulk write payload is not valid.
$addFields is an aggregation pipeline stage, so you cannot use it as an operator in a regular update document. To resolve this, you can use an update with an aggregation pipeline, available starting from MongoDB 4.2, and use $set inside it:
from pymongo import UpdateMany

updates = []
# PyMongo's bulk_write expects operation objects such as UpdateMany, not plain dicts.
updates.append(UpdateMany(
    {
        'state': 'ny',
        'city': 'nyc',
        'race': {'$exists': True},
        'ethnicity': {'$exists': True},
        'race_ethnicity': {'$exists': False}
    },
    # The update is a list of stages, i.e. an aggregation pipeline.
    [
        {
            '$set': {'race_ethnicity': {'$concat': ['$race', '$ethnicity']}}
        }
    ]
))
self.db.bulk_write(updates)
I need to extract 2 values from this list of dictionary and store it as a key-value pair.
Here I attached sample data..Where I need to extract "Name" and "Service" from this input and store it as a dictionary. Where "Name" is Key and corresponding "Service" is its value.
Input:
response = {
'Roles': [
{
'Path': '/',
'Name': 'Heera',
'Age': '25',
'Policy': 'Policy1',
'Start_Month': 'January',
'PolicyDocument':
{
'Date': '2012-10-17',
'Statement': [
{
'id': '',
'RoleStatus': 'New_Joinee',
'RoleType': {
'Service': 'Service1'
},
'Action': ''
}
]
},
'Duration': 3600
},
{
'Path': '/',
'Name': 'Prem',
'Age': '40',
'Policy': 'Policy2',
'Start_Month': 'April',
'PolicyDocument':
{
'Date': '2018-11-27',
'Statement': [
{
'id': '',
'RoleStatus': 'Senior',
'RoleType': {
'Service': ''
},
'Action': ''
}
]
},
'Duration': 2600
},
]
}
From this input, I need output as a dictionary type.
Output Format: { Name : Service }
Output:
{ "Heera":"Service1","Prem" : " "}
My try:
Role_name = []
response = {#INPUT WHICH I SPECIFIED ABOVE#}
roles = response['Roles']
for role in roles:
    Role_name.append(role['Name'])
print(Role_name)
I need to pair the name with its corresponding service. Any help would be really appreciable.
Thanks in advance.
You just have to write one long expression that reaches all the way down to the 'Service' key.
Also, you have a syntax error on the lines 'Start_Month': 'January') and 'Start_Month': 'April'): each has a closing parenthesis with no matching opening one.
Fix that and run the following.
This is the code:
output_dict = {}
for r in response['Roles']:
    output_dict[r["Name"]] = r['PolicyDocument']['Statement'][0]['RoleType']['Service']
print(output_dict)
Output:
{'Heera': 'Service1', 'Prem': ''}
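The same loop can also be written as a dictionary comprehension. This is a minimal sketch on a trimmed-down copy of the input (only the keys the lookup needs are kept); like the loop above, it assumes every role has at least one entry in 'Statement'.

```python
# Trimmed-down version of the input from the question.
response = {
    'Roles': [
        {
            'Name': 'Heera',
            'PolicyDocument': {'Statement': [{'RoleType': {'Service': 'Service1'}}]},
        },
        {
            'Name': 'Prem',
            'PolicyDocument': {'Statement': [{'RoleType': {'Service': ''}}]},
        },
    ]
}

# Map each role name to the service of its first statement.
output_dict = {
    role['Name']: role['PolicyDocument']['Statement'][0]['RoleType']['Service']
    for role in response['Roles']
}
print(output_dict)
```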
You can do it like this:
liste = []
for role in response['Roles']:
    liste.append(
        {
            role['Name']: role['PolicyDocument']['Statement'][0]['RoleType']['Service'],
        }
    )
print(liste)
It seems your input data is structured kind of strangely, and I am not sure what the ) characters are doing next to the months, since they make the input invalid. Here is a working script, assuming you removed the parentheses from your input.
response = {
'Roles': [
{
'Path': '/',
'Name': 'Heera',
'Age': '25',
'Policy': 'Policy1',
'Start_Month': 'January',
'PolicyDocument':
{
'Date': '2012-10-17',
'Statement': [
{
'id': '',
'RoleStatus': 'New_Joinee',
'RoleType': {
'Service': 'Service1'
},
'Action': ''
}
]
},
'Duration': 3600
},
{
'Path': '/',
'Name': 'Prem',
'Age': '40',
'Policy': 'Policy2',
'Start_Month': 'April',
'PolicyDocument':
{
'Date': '2018-11-27',
'Statement': [
{
'id': '',
'RoleStatus': 'Senior',
'RoleType': {
'Service': ''
},
'Action': ''
}
]
},
'Duration': 2600
},
]
}
output = {}
for i in response['Roles']:
    output[i['Name']] = i['PolicyDocument']['Statement'][0]['RoleType']['Service']
print(output)
This should give you what you want in a variable called role_services:
role_services = {}
for role in response['Roles']:
    for st in role['PolicyDocument']['Statement']:
        role_services[role['Name']] = st['RoleType']['Service']
It will ensure you go through all of the statements within that data structure, but be aware that you'll overwrite key-value pairs as you traverse the response if a name has more than one entry!
A reference on for loops, which might be helpful, illustrates using if statements within them; that can help you extend this to check whether items already exist!
Hope that helps
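If overwriting is a concern, one option is to keep every service per name in a list rather than a single value. The sketch below is only an illustration: collect_services is a hypothetical function name, and the sample data (including the second statement with 'Service2') is made up to show the effect.

```python
from collections import defaultdict


def collect_services(response):
    """Return {name: [service, ...]} without overwriting earlier entries."""
    services = defaultdict(list)
    for role in response['Roles']:
        for st in role['PolicyDocument']['Statement']:
            services[role['Name']].append(st['RoleType']['Service'])
    return dict(services)


# Made-up sample: 'Heera' has two statements, 'Prem' has one.
sample = {
    'Roles': [
        {
            'Name': 'Heera',
            'PolicyDocument': {'Statement': [
                {'RoleType': {'Service': 'Service1'}},
                {'RoleType': {'Service': 'Service2'}},
            ]},
        },
        {
            'Name': 'Prem',
            'PolicyDocument': {'Statement': [{'RoleType': {'Service': ''}}]},
        },
    ]
}

role_services = collect_services(sample)
print(role_services)
```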
I want to retrieve max IOPS utilized by EBS volume in the last 2 weeks. I am using cloudwatch get_metric_data function to obtain data about metric VolumeReadOps and VolumeWriteOps. I am using following code to get VolumeReadOps and VolumeWriteOps and then trying to calculate MaxIOPS:
This is the function to get metric values:
def cloudwatch_metric_value(CWsession,NameSpace,ResourceIdentifier,vStat,vUnit,vMetricName,vPeriod):
"""
Function that returns metric value of cloudwatch for a given resource and metric Name
"""
if NameSpace=='EBS':
responseCW = CWsession.get_metric_data(
MetricDataQueries=[
{
'Id': 'string',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/EBS',
'MetricName': vMetricName,
'Dimensions': [
{
'Name': 'VolumeId',
'Value': ResourceIdentifier
},
]
},
'Period': vPeriod,
'Stat': vStat,
'Unit': vUnit
},
'ReturnData': True
},
],
StartTime=vStartTime,
EndTime=vEndTime,
)
vValue=responseCW['MetricDataResults'][0]['Values']
vTimeStamps=responseCW['MetricDataResults'][0]['Timestamps']
index, value = max(enumerate(vValue), key=operator.itemgetter(1))
metric_value=value
metric_time=vTimeStamps[index]
return metric_time,metric_value
From main, it is called like following:
metric_time,metric_value = cloudwatch_metric_value(cloudwatch,'EBS',v['VolumeId'],'Sum','Count','VolumeReadOps',300)
vReadIOPS=metric_value
metric_time,metric_value = cloudwatch_metric_value(cloudwatch,'EBS',v['VolumeId'],'Sum','Count','VolumeWriteOps',300)
vWriteIOPS=metric_value
vTotalIOPS=round((vReadIOPS+vWriteIOPS)/300)
I understand that IOPS is calculated by dividing the read/write operation count by the period duration in seconds. The MaxIOPS values I get from this code for a given volume don't match the values I see for the same volume in the CloudWatch console. Am I doing this the right way?
Thanks.
OK, I was able to fix it, and here is the working function:
def cloudwatch_metric_value(CWsession,NameSpace,ResourceIdentifier,vStat,vUnit,vPeriod):
"""
Function that returns metric value of cloudwatch for a given resource and metric Name
"""
if NameSpace=='EBS':
responseCW = CWsession.get_metric_data(
MetricDataQueries=[
{
'Id': 'string1',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/EBS',
'MetricName': 'VolumeReadOps',
'Dimensions': [
{
'Name': 'VolumeId',
'Value': ResourceIdentifier
},
]
},
'Period': vPeriod,
'Stat': vStat,
'Unit': vUnit
},
'ReturnData': True
},
{
'Id': 'string2',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/EBS',
'MetricName': 'VolumeWriteOps',
'Dimensions': [
{
'Name': 'VolumeId',
'Value': ResourceIdentifier
},
]
},
'Period': vPeriod,
'Stat': vStat,
'Unit': vUnit
},
'ReturnData': True
},
],
StartTime=vStartTime,
EndTime=vEndTime,
)
vReadValue=responseCW['MetricDataResults'][0]['Values']
vReadTimeStamps=responseCW['MetricDataResults'][0]['Timestamps']
vWriteValue=responseCW['MetricDataResults'][1]['Values']
vWriteTimeStamps=responseCW['MetricDataResults'][1]['Timestamps']
vReadWriteValue = [vReadValue[i]+vWriteValue[i] for i in range(len(vWriteValue))]
if vReadWriteValue:
metric_value = max(vReadWriteValue)
metric_time = vReadTimeStamps[vReadWriteValue.index(metric_value)]
metric_value = metric_value / vPeriod  # operations per period -> operations per second
else:
metric_value=1
metric_time=date_t
return metric_time,metric_value
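The key change is that reads and writes are added per period before taking the maximum, instead of adding the two separate maxima. The sketch below isolates that arithmetic in plain Python, with no AWS calls. It is an illustration under assumptions: max_iops is a hypothetical function name, the sample numbers are invented, and the values are paired by timestamp rather than by list position, in case the two result lists do not line up one to one.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def max_iops(read_ts, read_vals, write_ts, write_vals, period_seconds):
    """Return (timestamp, iops) for the period with the most read+write ops."""
    totals = defaultdict(float)
    # Sum read and write operation counts that share the same timestamp.
    for ts, value in zip(read_ts, read_vals):
        totals[ts] += value
    for ts, value in zip(write_ts, write_vals):
        totals[ts] += value
    if not totals:
        return None, 0.0
    peak_ts = max(totals, key=totals.get)
    # Operations per period divided by the period length gives ops per second.
    return peak_ts, totals[peak_ts] / period_seconds


# Invented sample data: three 5-minute periods.
t0 = datetime(2021, 1, 1, 0, 0)
t1 = t0 + timedelta(minutes=5)
t2 = t0 + timedelta(minutes=10)

peak_time, peak_iops = max_iops(
    [t0, t1], [3000, 6000],      # VolumeReadOps sums per period
    [t1, t2], [3000, 12000],     # VolumeWriteOps sums per period
    period_seconds=300,
)
print(peak_time, peak_iops)
```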
I have an array where I am trying to group the subarrays of the objects together if the key value pair is equal to userID.
Leaving me with one object, per userID with all the sub-arrays of that userID.
I can't seem to figure out how to do this, even after trawling through SO.
How do I group the subarrays where the userID's are the same?
(the data changes so I need to use a for loop)
Thanks for the help.
The array looks like this:
[
{
'name':'James',
'lastname':'Bond',
'userID': 1001,
'subarray':[
{
'color':'blue',
'animal':'dog'
}
]
},
{
'name':'James',
'lastname':'Bond',
'userID': 1001,
'subarray':[
{
'color':'red',
'animal':'cat'
}
]
},
{
'name':'Billy',
'lastname':'King',
'userID': 1004,
'subarray':[
{
'color':'green',
'animal':'fish'
}
]
}
]
I want to make the array like this:
[
{
'name':'James',
'lastname':'Bond',
'userID': 1001,
'subarray':[
{
'color':'blue',
'animal':'dog'
},
{
'color':'red',
'animal':'cat'
}
]
},
{
'name':'Billy',
'lastname':'King',
'userID': 1004,
'subarray':[
{
'color':'green',
'animal':'fish'
}
]
}
]
Using a simple iteration.
Ex:
result = {}
for item in data:
    if item["userID"] not in result:
        result[item["userID"]] = {'name':item["name"], 'lastname':item["lastname"],'userID': item["userID"],'subarray':[]}
    result[item["userID"]]['subarray'].append(item["subarray"])
print(list(result.values()))
Output:
[{'lastname': 'Bond',
'name': 'James',
'subarray': [[{'animal': 'dog', 'color': 'blue'}],
[{'animal': 'cat', 'color': 'red'}]],
'userID': 1001},
{'lastname': 'King',
'name': 'Billy',
'subarray': [[{'animal': 'fish', 'color': 'green'}]],
'userID': 1004}]
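Note that append adds each 'subarray' list as a nested list, which is why the output above contains lists inside lists. If the flat structure from the question is wanted, extend can be used instead. A minimal sketch, where group_by_user is a hypothetical function name:

```python
def group_by_user(data):
    """Merge entries that share a userID, concatenating their subarrays."""
    result = {}
    for item in data:
        user_id = item["userID"]
        if user_id not in result:
            # Copy every field except 'subarray', then start an empty list.
            result[user_id] = {k: v for k, v in item.items() if k != "subarray"}
            result[user_id]["subarray"] = []
        # extend adds the individual elements, so the list stays flat.
        result[user_id]["subarray"].extend(item["subarray"])
    return list(result.values())


data = [
    {'name': 'James', 'lastname': 'Bond', 'userID': 1001,
     'subarray': [{'color': 'blue', 'animal': 'dog'}]},
    {'name': 'James', 'lastname': 'Bond', 'userID': 1001,
     'subarray': [{'color': 'red', 'animal': 'cat'}]},
    {'name': 'Billy', 'lastname': 'King', 'userID': 1004,
     'subarray': [{'color': 'green', 'animal': 'fish'}]},
]

grouped = group_by_user(data)
print(grouped)
```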
I have a JSON with following structure:
{
'count': 93,
'apps' : [
{
'last_modified_at': '2016-10-21T12:20:26Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': False,
'app_store_id': 'bbb',
'connection_type': 'certificate',
'sdk_api_secret': '--'
},
'organization_id': '--',
'name': '---',
'app_id': 27,
'control_group_percentage': 0,
'created_by': {
'user_id': 'abc',
'user_name': 'def'
},
'created_at': '2016-09-28T11:41:24Z',
'web': {}
}, {
'last_modified_at': '2016-10-12T08:58:57Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': True,
'app_store_id': '386304604',
'connection_type': 'certificate',
'sdk_api_secret': '---',
'push_expiry': '2018-01-14T08:24:09Z'
},
'organization_id': '---',
'name': '---',
'app_id': 87,
'control_group_percentage': 0,
'created_by': {
'user_id': '----',
'user_name': '---'
},
'created_at': '2016-10-12T08:58:57Z',
'web': {}
}
]
}
It's a JSON object with two key-value pairs. The second pair's value is a list of more JSON objects.
For me it is too much information and I want to have a JSON like this:
{
'apps' : [
{
'name': 'Appname',
'app_id' : 1234,
'organization_id' : 'Blablabla'
},
{
'name': 'Appname2',
'app_id' : 5678,
'organization_id' : 'Some other Organization'
}
]
}
I want to have a JSON object that only contains one key ("apps") and its value, which would be a list of more JSON objects that only have three key-value pairs.
I am thankful for any advice.
Thank you for your help!
@bishakh-ghosh I don't think you need to treat the input JSON as a string. It can be used directly as a dictionary (thus avoiding ast).
One more concise way :
# your original json
input_ = { 'count': 93, ... }
And here are the steps :
Define what keys you want to keep
slice_keys = ['name', 'app_id', 'organization_id']
Define the new dictionary as a slice on the slice_keys
dict(apps=[{key:value for key,value in d.items() if key in slice_keys} for d in input_['apps']])
And that's it.
That should yield the JSON formatted as you want, e.g
{
'apps':
[
{'app_id': 27, 'name': '---', 'organization_id': '--'},
{'app_id': 87, 'name': '---', 'organization_id': '---'}
]
}
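Put together, the steps above can be sketched as one small runnable example. slice_apps is a hypothetical function name and the input is a trimmed-down stand-in for the original data; json.dumps is added only to show the result as an actual JSON string.

```python
import json


def slice_apps(data, keys=('name', 'app_id', 'organization_id')):
    """Keep only the given keys for every entry in data['apps']."""
    return {
        'apps': [
            {key: app[key] for key in keys if key in app}
            for app in data['apps']
        ]
    }


# Trimmed-down stand-in for the input from the question.
input_ = {
    'count': 2,
    'apps': [
        {'name': '---', 'app_id': 27, 'organization_id': '--', 'web': {}},
        {'name': '---', 'app_id': 87, 'organization_id': '---', 'web': {}},
    ],
}

sliced = slice_apps(input_)
print(json.dumps(sliced, indent=2))
```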
This might be what you are looking for:
import ast
import json
json_str = """{
'count': 93,
'apps' : [
{
'last_modified_at': '2016-10-21T12:20:26Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': False,
'app_store_id': 'bbb',
'connection_type': 'certificate',
'sdk_api_secret': '--'
},
'organization_id': '--',
'name': '---',
'app_id': 27,
'control_group_percentage': 0,
'created_by': {
'user_id': 'abc',
'user_name': 'def'
},
'created_at': '2016-09-28T11:41:24Z',
'web': {}
}, {
'last_modified_at': '2016-10-12T08:58:57Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': True,
'app_store_id': '386304604',
'connection_type': 'certificate',
'sdk_api_secret': '---',
'push_expiry': '2018-01-14T08:24:09Z'
},
'organization_id': '---',
'name': '---',
'app_id': 87,
'control_group_percentage': 0,
'created_by': {
'user_id': '----',
'user_name': '---'
},
'created_at': '2016-10-12T08:58:57Z',
'web': {}
}
]
}"""
json_dict = ast.literal_eval(json_str)
new_dict = {}
app_list = []
for appdata in json_dict['apps']:
    appdata_dict = {}
    appdata_dict['name'] = appdata['name']
    appdata_dict['app_id'] = appdata['app_id']
    appdata_dict['organization_id'] = appdata['organization_id']
    app_list.append(appdata_dict)
new_dict['apps'] = app_list
new_json_str = json.dumps(new_dict)
print(new_json_str) # This is your resulting json string
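A likely reason for using ast.literal_eval here rather than json.loads is that the text is a Python literal (single quotes, True/False), not strict JSON. The short sketch below illustrates the difference on a made-up string; the variable names are only for illustration.

```python
import ast
import json

# A Python-literal string: single quotes and True are not valid JSON.
text = "{'count': 1, 'apps': [{'name': 'demo', 'enabled': True}]}"

try:
    json.loads(text)
    parsed_as_json = True
except json.JSONDecodeError:
    # json.loads rejects the single-quoted keys.
    parsed_as_json = False

# ast.literal_eval safely evaluates Python literals such as dicts and lists.
data = ast.literal_eval(text)

# json.dumps then produces strict JSON (double quotes, lowercase true).
as_json = json.dumps(data)
print(parsed_as_json, as_json)
```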