Mongo messy DOB string field to Age - python

I have a collection with documents like:
{'state': 'NY', 'DOB': '2000-01-02'},
{'state': 'NY', 'DOB': '2002/03/04'},
{'state': 'NY', 'DOB': '00-00-00'},
{'state': 'NY', 'DOB': 'male'},
...
I want outputs like:
{'state': 'NY', 'DOB': '2000-01-02', 'Age': 21},
{'state': 'NY', 'DOB': '2002/03/04', 'Age': 19},
{'state': 'NY', 'DOB': '00-00-00', 'Age': None}, # or Mongo None equivalent
{'state': 'NY', 'DOB': 'male', 'Age': None}, # or Mongo None equivalent
...
I'm constructing aggregation queries in PyMongo, and I'm wondering if there's an aggregate way to try to convert a field to Mongo Date object and then extract Age from it, else (if a date cannot be extracted), return None. Some condition in the shell below?
def map_age(state, city)
db.aggregate([
{'$match': {
'state': state,
'DOB': {"$exists": True},
'Age': {"$exists": False}
}},
{...}
])

You can try,
$let to create variable for dob convert and do operation
$dateFromString to convert in to date from string if its in valid then replace with "None"
$subtract minus converted date from current date $$NOW you can use new Date() as well
$divide above subtract date by "31536000000" means "3652460601000"
$round to round age number
db.aggregate([
{
$set: {
Age: {
$let: {
vars: {
dob: {
$dateFromString: {
dateString: "$DOB",
onError: "None"
}
}
},
in: {
$cond: [
{ $eq: ["$$dob", "None"] },
"None",
{
$round: {
$divide: [
{ $subtract: ["$$NOW", "$$dob"] },
31536000000 // 365*24*60*60*1000
]
}
}
]
}
}
}
}
}
])
Playground

As suggested by #prasad_, you have to make use of the $dateFromString operator in either the $project or $addFields stage.
db.collection.aggregate([
{
"$project": {
"age": {
"$dateFromString": {
dateString: "$DOB",
onError: null,
onNull: null,
}
}
}
}
])

Related

Mongo bulk filter and add concatenated field

I have a Mongo collection that I'm trying to update (in PyMongo). I have documents like {'state': 'ny', 'city': 'nyc', 'race': 'b', 'ethnicity': 'h'} and I'm trying to bulk operate on these documents by matching certain criteria and created a concatenated 'race_ethnicity' field.
A pseudo example might be:
filter = {
'state': 'ny',
'city': 'nyc',
'race': {"$exists": True},
'ethnicity': {"$exists": True},
'race_ethnicity': {"$exists": False}
}
action = {'$addFields': {'race_ethnicity': {'$concat': ['$race', '$ethnicity']}}}))
Using the above document, the updated document would be: {'state': 'ny', 'city': 'nyc', 'race': 'b', 'ethnicity': 'h', 'race_ethnicity': 'bh'}.
I want to bulk match, and bulk update a collection like this — how do I go about this without getting a BulkWriteError?
*** What I tried:
updates = []
updates.append(UpdateMany({
'state': 'ny',
'city': 'nyc',
"race": {"$exists": True},
"ethnicity": {"$exists": True},
"race_ethnicity": {"$exists": False}
},
{'$addFields': {'race_ethnicity': {'$concat': ['$race', '$ethnicity']}}}))
self.db.bulk_write(updates)
This produced the following error:
pymongo.errors.BulkWriteError: batch op errors occurred
Your bulk write payload is not correct as per bulkWrite() syntax,
$addFields is an aggregation pipeline stage you can not use in regular update queries, so to resolve this You can use update with aggregation pipeline starting from MongoDB 4.2,
updates = []
updates.append({
'updateMany': {
'filter': {
'state': 'ny',
'city': 'nyc',
'race': { '$exists': True},
'ethnicity': { '$exists': True},
'race_ethnicity': { '$exists': False}
},
'update': [
{
'$set': { 'race_ethnicity': { '$concat': ['$race', '$ethnicity'] } }
}
]
}
})
self.db.bulk_write(updates)
Playground

How to extract values from list and store it as dictionary(key-value pair)?

I need to extract 2 values from this list of dictionary and store it as a key-value pair.
Here I attached sample data..Where I need to extract "Name" and "Service" from this input and store it as a dictionary. Where "Name" is Key and corresponding "Service" is its value.
Input:
response = {
'Roles': [
{
'Path': '/',
'Name': 'Heera',
'Age': '25',
'Policy': 'Policy1',
'Start_Month': 'January',
'PolicyDocument':
{
'Date': '2012-10-17',
'Statement': [
{
'id': '',
'RoleStatus': 'New_Joinee',
'RoleType': {
'Service': 'Service1'
},
'Action': ''
}
]
},
'Duration': 3600
},
{
'Path': '/',
'Name': 'Prem',
'Age': '40',
'Policy': 'Policy2',
'Start_Month': 'April',
'PolicyDocument':
{
'Date': '2018-11-27',
'Statement': [
{
'id': '',
'RoleStatus': 'Senior',
'RoleType': {
'Service': ''
},
'Action': ''
}
]
},
'Duration': 2600
},
]
}
From this input, I need output as a dictionary type.
Output Format: { Name : Service }
Output:
{ "Heera":"Service1","Prem" : " "}
My try:
Role_name =[]
response = {#INPUT WHICH I SPECIFIED ABOVE#}
roles = response['Roles']
for role in roles:
Role_name.append(role['Name'])
print(Role_name)
I need to pair the name with its corresponding service. Any help would be really appreciable.
Thanks in advance.
You just have to write a long line which can reach till the key 'Service'.
And you a syntax error in line Start_Month': 'January') and 'Start_Month': 'April'). You can't have one unclosed brackets.
Fix it and run the following.
This is the code:
output_dict = {}
for r in response['Roles']:
output_dict[r["Name"]] = r['PolicyDocument']['Statement'][0]['RoleType']['Service']
print(output_dict)
Output:
{'Heera': 'Service1', 'Prem': ''}
You just have to do like this:
liste = []
for role in response['Roles']:
liste.append(
{
role['Name']:role['PolicyDocument']['Statement'][0]['RoleType']['Service'],
}
)
print(liste)
It seems your input data is structured kind of strange and I am not sure what the ) are doing next to the months since they make things invalid but here is a working script assuming you removed the parenthesis from your input.
response = {
'Roles': [
{
'Path': '/',
'Name': 'Heera',
'Age': '25',
'Policy': 'Policy1',
'Start_Month': 'January',
'PolicyDocument':
{
'Date': '2012-10-17',
'Statement': [
{
'id': '',
'RoleStatus': 'New_Joinee',
'RoleType': {
'Service': 'Service1'
},
'Action': ''
}
]
},
'Duration': 3600
},
{
'Path': '/',
'Name': 'Prem',
'Age': '40',
'Policy': 'Policy2',
'Start_Month': 'April',
'PolicyDocument':
{
'Date': '2018-11-27',
'Statement': [
{
'id': '',
'RoleStatus': 'Senior',
'RoleType': {
'Service': ''
},
'Action': ''
}
]
},
'Duration': 2600
},
]
}
output = {}
for i in response['Roles']:
output[i['Name']] = i['PolicyDocument']['Statement'][0]['RoleType']['Service']
print(output)
This should give you what you want in a variable called role_services:
role_services = {}
for role in response['Roles']:
for st in role['PolicyDocument']['Statement']:
role_services[role['Name']] = st['RoleType']['Service']
It will ensure you'll go through all of the statements within that data structure but be aware you'll overwrite key-value pairs as you traverse the response, if they exist in more than a single entry!
A reference on for loops which might be helpful, illustrates using if statements within them which can help you to extend this to check if items already exist!
Hope that helps

Max IOPS of EBS Volume from Last 2 Weeks

I want to retrieve max IOPS utilized by EBS volume in the last 2 weeks. I am using cloudwatch get_metric_data function to obtain data about metric VolumeReadOps and VolumeWriteOps. I am using following code to get VolumeReadOps and VolumeWriteOps and then trying to calculate MaxIOPS:
This is the function to get metric values:
def cloudwatch_metric_value(CWsession,NameSpace,ResourceIdentifier,vStat,vUnit,vMetricName,vPeriod):
"""
Function that returns metric value of cloudwatch for a given resource and metric Name
"""
if NameSpace=='EBS':
responseCW = CWsession.get_metric_data(
MetricDataQueries=[
{
'Id': 'string',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/EBS',
'MetricName': vMetricName,
'Dimensions': [
{
'Name': 'VolumeId',
'Value': ResourceIdentifier
},
]
},
'Period': vPeriod,
'Stat': vStat,
'Unit': vUnit
},
'ReturnData': True
},
],
StartTime=vStartTime,
EndTime=vEndTime,
)
vValue=responseCW['MetricDataResults'][0]['Values']
vTimeStamps=responseCW['MetricDataResults'][0]['Timestamps']
index, value = max(enumerate(vValue), key=operator.itemgetter(1))
metric_value=value
metric_time=vTimeStamps[index]
return metric_time,metric_value
From main, it is called like following:
metric_time,metric_value = cloudwatch_metric_value(cloudwatch,'EBS',v['VolumeId'],'Sum','Count','VolumeReadOps',300)
vReadIOPS=metric_value
metric_time,metric_value = cloudwatch_metric_value(cloudwatch,'EBS',v['VolumeId'],'Sum','Count','VolumeWriteOps',300)
vWriteIOPS=metric_value
vTotalIOPS=round((vReadIOPS+vWriteIOPS)/300)
I understand that IOPS are calculated by diving the ReadOps/Write with duration. The values I get from this code for MaxIOPS for a given volume doesn't match with the values I see for same in cloudwatch console. Please advise if I am doing this in right way?
Thanks.
Ok, I was able to fix and here is the working function:
def cloudwatch_metric_value(CWsession,NameSpace,ResourceIdentifier,vStat,vUnit,vPeriod):
"""
Function that returns metric value of cloudwatch for a given resource and metric Name
"""
if NameSpace=='EBS':
responseCW = CWsession.get_metric_data(
MetricDataQueries=[
{
'Id': 'string1',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/EBS',
'MetricName': 'VolumeReadOps',
'Dimensions': [
{
'Name': 'VolumeId',
'Value': ResourceIdentifier
},
]
},
'Period': vPeriod,
'Stat': vStat,
'Unit': vUnit
},
'ReturnData': True
},
{
'Id': 'string2',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/EBS',
'MetricName': 'VolumeWriteOps',
'Dimensions': [
{
'Name': 'VolumeId',
'Value': ResourceIdentifier
},
]
},
'Period': vPeriod,
'Stat': vStat,
'Unit': vUnit
},
'ReturnData': True
},
],
StartTime=vStartTime,
EndTime=vEndTime,
)
vReadValue=responseCW['MetricDataResults'][0]['Values']
vReadTimeStamps=responseCW['MetricDataResults'][0]['Timestamps']
vWriteValue=responseCW['MetricDataResults'][1]['Values']
vWriteTimeStamps=responseCW['MetricDataResults'][1]['Timestamps']
vReadWriteValue = [vReadValue[i]+vWriteValue[i] for i in range(len(vWriteValue))]
if vReadWriteValue:
metric_value = max(vReadWriteValue)
metric_time = vReadTimeStamps[vReadWriteValue.index(metric_value)]
metric_value = metric_value / 300
else:
metric_value=1
metric_time=date_t
return metric_time,metric_value

Grouping sub array's of objects in an array where attribute equals - Python

I have an array where I am trying to group the subarrays of the objects together if the key value pair is equal to userID.
Leaving me with one object, per userID with all the sub-arrays of that userID.
I can't seem to figure out how to do this, even after trawling through SO.
How do I group the subarrays where the userID's are the same?
(the data changes so I need to use a for loop)
Thanks for the help.
The array looks like this:
[
{
'name':'James',
'lastname':'Bond',
'userID': 1001,
'subarray':[
{
'color':'blue',
'animal':'dog'
}
]
},
{
'name':'James',
'lastname':'Bond',
'userID': 1001,
'subarray':[
{
'color':'red',
'animal':'cat'
}
]
},
{
'name':'Billy',
'lastname':'King',
'userID': 1004,
'subarray':[
{
'color':'green',
'animal':'fish'
}
]
}
]
I want to make the array like this:
[
{
'name':'James',
'lastname':'Bond',
'userID': 1001,
'subarray':[
{
'color':'blue',
'animal':'dog'
},
{
'color':'red',
'animal':'cat'
}
]
},
{
'name':'Billy',
'lastname':'King',
'userID': 1004,
'subarray':[
{
'color':'green',
'animal':'fish'
}
]
}
]
Using a simple iteration.
Ex:
result = {}
for item in data:
if item["userID"] not in result:
result[item["userID"]] = {'name':item["name"], 'lastname':item["lastname"],'userID': item["userID"],'subarray':[]}
result[item["userID"]]['subarray'].append(item["subarray"])
print(list(result.values()))
Output:
[{'lastname': 'Bond',
'name': 'James',
'subarray': [[{'animal': 'dog', 'color': 'blue'}],
[{'animal': 'cat', 'color': 'red'}]],
'userID': 1001},
{'lastname': 'King',
'name': 'Billy',
'subarray': [[{'animal': 'fish', 'color': 'green'}]],
'userID': 1004}]

Loop through multidimensional JSON (Python)

I have a JSON with following structure:
{
'count': 93,
'apps' : [
{
'last_modified_at': '2016-10-21T12:20:26Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': False,
'app_store_id': 'bbb',
'connection_type': 'certificate',
'sdk_api_secret': '--'
},
'organization_id': '--',
'name': '---',
'app_id': 27,
'control_group_percentage': 0,
'created_by': {
'user_id': 'abc',
'user_name': 'def'
},
'created_at': '2016-09-28T11:41:24Z',
'web': {}
}, {
'last_modified_at': '2016-10-12T08:58:57Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': True,
'app_store_id': '386304604',
'connection_type': 'certificate',
'sdk_api_secret': '---',
'push_expiry': '2018-01-14T08:24:09Z'
},
'organization_id': '---',
'name': '---',
'app_id': 87,
'control_group_percentage': 0,
'created_by': {
'user_id': '----',
'user_name': '---'
},
'created_at': '2016-10-12T08:58:57Z',
'web': {}
}
]
}
It's a JSON with two key-value-pairs. The second pair's value is a List of more JSON's.
For me it is too much information and I want to have a JSON like this:
{
'apps' : [
{
'name': 'Appname',
'app_id' : 1234,
'organization_id' : 'Blablabla'
},
{
'name': 'Appname2',
'app_id' : 5678,
'organization_id' : 'Some other Organization'
}
]
}
I want to have a JSON that only contains one key ("apps") and its value, which would be a List of more JSONs that only have three key-value-pairs..
I am thankful for any advice.
Thank you for your help!
#bishakh-ghosh I don't think you need to use the input json as string. It can be used straight as a dictionary. (thus avoid ast)
One more concise way :
# your original json
input_ = { 'count': 93, ... }
And here are the steps :
Define what keys you want to keep
slice_keys = ['name', 'app_id', 'organization_id']
Define the new dictionary as a slice on the slice_keys
dict(apps=[{key:value for key,value in d.items() if key in slice_keys} for d in input_['apps']])
And that's it.
That should yield the JSON formatted as you want, e.g
{
'apps':
[
{'app_id': 27, 'name': '---', 'organization_id': '--'},
{'app_id': 87, 'name': '---', 'organization_id': '---'}
]
}
This might be what you are looking for:
import ast
import json
json_str = """{
'count': 93,
'apps' : [
{
'last_modified_at': '2016-10-21T12:20:26Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': False,
'app_store_id': 'bbb',
'connection_type': 'certificate',
'sdk_api_secret': '--'
},
'organization_id': '--',
'name': '---',
'app_id': 27,
'control_group_percentage': 0,
'created_by': {
'user_id': 'abc',
'user_name': 'def'
},
'created_at': '2016-09-28T11:41:24Z',
'web': {}
}, {
'last_modified_at': '2016-10-12T08:58:57Z',
'frequency_caps': [],
'ios': {
'enabled': True,
'push_enabled': True,
'app_store_id': '386304604',
'connection_type': 'certificate',
'sdk_api_secret': '---',
'push_expiry': '2018-01-14T08:24:09Z'
},
'organization_id': '---',
'name': '---',
'app_id': 87,
'control_group_percentage': 0,
'created_by': {
'user_id': '----',
'user_name': '---'
},
'created_at': '2016-10-12T08:58:57Z',
'web': {}
}
]
}"""
json_dict = ast.literal_eval(json_str)
new_dict = {}
app_list = []
for appdata in json_dict['apps']:
appdata_dict = {}
appdata_dict['name'] = appdata['name']
appdata_dict['app_id'] = appdata['app_id']
appdata_dict['organization_id'] = appdata['organization_id']
app_list.append(appdata_dict)
new_dict['apps'] = app_list
new_json_str = json.dumps(new_dict)
print(new_json_str) # This is your resulting json string

Categories