PyMongo - Aggregation pipeline to get user - mentioned user network

PyMongo - Aggregation pipeline to get user - mentioned user network - python

I have uploaded some tweets in a Mongo DB collection and I would like to extract the following information with PyMongo:
user.screen_name
entities.user_mentions.screen_name
count
i.e. I would like to know who has mentioned whom and how many times, in order to create some kind of network.
I used the following pipeline to get the most mentioned users but I'm not able to introduce also the user.screen_name:
tweets.aggregate([
{'$project': {'mentions': '$entities.user_mentions.screen_name', '_id': 0}},
{'$unwind': '$mentions'},
{'$group': {'_id': '$mentions', 'count': {'$sum': 1}}}
])
Here an example of document (tweet), where I removed some of the fields I'm not interested in:
{'_id': ObjectId('604c805b289d1ef5947e1845'),
'created_at': 'Fri Mar 12 04:36:10 +0000 2021',
'display_text_range': [0, 140],
'entities': {'hashtags': [{'indices': [124, 136], 'text': 'mytag'}],
'symbols': [],
'urls': [],
'user_mentions': [{'id': 123,
'id_str': '123',
'indices': [3, 14],
'name': 'user_name',
'screen_name': 'user_screen_name'}]},
'user': {'id': 456,
'id_str': '456',
'name': 'Author Name',
'screen_name': 'Author Screen Name'}}
{'_id': ObjectId('604c805b289d1ef5947e184x'),
'created_at': 'Fri Mar 12 04:36:10 +0000 2021',
'display_text_range': [0, 140],
'entities': {'hashtags': [{'indices': [124, 136], 'text': 'mytag'}],
'symbols': [],
'urls': [],
'user_mentions': [{'id': 126,
'id_str': '126',
'indices': [3, 14],
'name': 'user_name',
'screen_name': 'user_screen_name'}]},
'user': {'id': 4567,
'id_str': '4567',
'name': 'Other Author Name',
'screen_name': 'Other Author Screen Name'}}
In this example I would expect something like:
{'mentioned': 'user_screen_name',
'author': 'Author Screen Name',
'count': '1'},
{'mentioned': 'user_screen_name',
'author': 'Other Author Screen Name',
'count': '1'},
Can someone help me?
Thank you in advance for your help!
Francesca

db.collection.aggregate([
{
"$project": {
"mentions": "$entities.user_mentions.screen_name",
"author": "$user.screen_name"
}
},
{ "$unwind": "$mentions" },
{
"$group": {
"_id": { aut: "$author", ment: "$mentions" },
"count": { "$sum": 1 },
author: { "$first": "$author" },
mentions: { "$first": "$mentions" }
}
},
{
"$project": { _id: 0 }
}
])
Working Mongo playground

Related

How to monitor AWS Fsx ONTAP Filesystem Usage using lambda function and cloudwatch metrics

I am first time trying to work on the aws lambda function to get the monitoring for FSxN IE FSx for ONTAP Storage in AWS.
Here I want to get the StorageCapacity and StorageTier in order to achieve total filesystemUsed percentage for monitoring.
I have tried the code below after a lot of search and trial but got an error.
Code tried:
import json
import boto3
from datetime import datetime
def lambda_handler(event, context):
fsx = boto3.client('fsx')
filesystems = fsx.describe_file_systems()
table = []
for filesystem in filesystems.get('FileSystems'):
status = filesystem.get('Lifecycle')
filesystem_id = filesystem.get('FileSystemId')
table.append(filesystem_id)
cloudwatch = boto3.client('cloudwatch')
result = []
for filesystem_id in table:
current_time = datetime.utcnow().isoformat()
response = cloudwatch.get_metric_data(
MetricDataQueries=[
{
'Id': 'm1',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/FSx',
'MetricName': 'StorageCapacity',
'Dimensions': [
{
'Name': 'FileSystemId',
'Value': filesystem_id
},
{
'Name': 'StorageTier',
'Value': 'SSD'
},
{
'Name': 'DataType',
'Value': 'All'
}
]
},
'Period': 60,
'Stat': 'Sum'
},
'ReturnData': True
},
{
'Id': 'm2',
'MetricStat': {
'Metric': {
'Namespace': 'AWS/FSx',
'MetricName': 'StorageUsed',
'Dimensions': [
{
'Name': 'FileSystemId',
'Value': filesystem_id
},
{
'Name': 'StorageTier',
'Value': 'SSD'
},
{
'Name': 'DataType',
'Value': 'All'
}
]
},
'Period': 60,
'Stat': 'Sum'
},
'ReturnData': True
}
],
StartTime='2023-01-20T00:01:00Z',
EndTime='2023-01-20T00:02:00Z'
)
storage_capacity = response['MetricDataResults'][0]['Values'][0]
storage_used = response['MetricDataResults'][1]['Values'][0]
result.append({'filesystem_id': filesystem_id,'storage_capacity': storage_capacity, 'storage_used': storage_used})
return result
Error after execution
Response
{
"errorMessage": "list index out of range",
"errorType": "IndexError",
"requestId": "a09573f2-87ea-4464-afc0-8b196f669415",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 75, in lambda_handler\n storage_capacity = response['MetricDataResults'][0]['Values'][0]\n"
]
}
MetricDataResults for metric(m1) and metric(m2)
{'MetricDataResults': [
{'Id': 'm1', 'Label': 'StorageCapacity', 'Timestamps': [datetime.datetime(2023, 1, 20, 0, 1, tzinfo=tzlocal())], 'Values': [925308932096.0], 'StatusCode': 'Complete'},
{'Id': 'm2', 'Label': 'StorageUsed', 'Timestamps': [datetime.datetime(2023, 1, 20, 0, 1, tzinfo=tzlocal())], 'Values': [2439143424.0], 'StatusCode': 'Complete'}], 'Messages': [], 'ResponseMetadata': {'RequestId': '479a53b2-b5f5-46c0-b79d-278d803df94b', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '479a53b2-b5f5-46c0-b79d-278d803df94b', 'content-type': 'text/xml', 'content-length': '923', 'date': 'Sat, 21 Jan 2023 18:06:35 GMT'}, 'RetryAttempts': 0}}
{'MetricDataResults': [
{'Id': 'm1', 'Label': 'StorageCapacity', 'Timestamps': [datetime.datetime(2023, 1, 20, 0, 1, tzinfo=tzlocal())], 'Values': [925308932096.0], 'StatusCode': 'Complete'},
{'Id': 'm2', 'Label': 'StorageUsed', 'Timestamps': [datetime.datetime(2023, 1, 20, 0, 1, tzinfo=tzlocal())], 'Values': [2593112064.0], 'StatusCode': 'Complete'}], 'Messages': [], 'ResponseMetadata': {'RequestId': 'db9ad0a4-0a24-4f1d-be60-55bde63fd49b', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'db9ad0a4-0a24-4f1d-be60-55bde63fd49b', 'content-type': 'text/xml', 'content-length': '923', 'date': 'Sat, 21 Jan 2023 18:06:35 GMT'}, 'RetryAttempts': 0}}
Please help/hint something to get into the solution.

How do I get data from a nested dict?

Hello I'm trying to get specific data out an API call from a website. This is the data I'm receiving
This is the data I'm recieving
{'type': 'NonStockItem', 'attributes': [], 'id': '1', 'description': 'Ikke lagerførte varer høy sats'}
{'type': 'NonStockItem', 'attributes': [], 'id': '2', 'description': 'Ikke lagerførte varer middels sats'}
{'type': 'NonStockItem', 'attributes': [], 'id': '3', 'description': 'Ikke lagerførte varer lav sats'}
{'type': 'NonStockItem', 'attributes': [], 'id': '4', 'description': 'Ikke lagerførte varer avgiftsfri'}
{'type': 'FinishedGoodItem', 'attributes': [{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': False, 'attributeType': 'Text', 'details': []}], 'id': '5', 'description': 'Lagerførte varer høy sats'}
{'type': 'FinishedGoodItem', 'attributes': [], 'id': '6', 'description': 'Lagerførte varer middels sats'}
{'type': 'FinishedGoodItem', 'attributes': [], 'id': '7', 'description': 'Lagerførte varer avgiftsfri'}
{'type': 'LaborItem', 'attributes': [], 'id': '8', 'description': 'Tjenester (prosjekt)'}
{'type': 'ExpenseItem', 'attributes': [], 'id': '9', 'description': 'Utgifter (Reise)'}
{'type': 'FinishedGoodItem', 'attributes': [{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': True, 'attributeType': 'Text', 'details': []}], 'id': 'ONLINE', 'description': 'Online'}
{'type': 'FinishedGoodItem', 'attributes': [{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': False, 'attributeType': 'Text', 'details': []}, {'attributeId': 'WEB2', 'description': 'tilgjengelighet i nettbutikk', 'required': True, 'attributeType': 'Combo', 'details': [{'id': 'Ikke Inne', 'description': 'Produktet er utsolgt.'}, {'id': 'Inne', 'description': 'tilgjengelig i nettbutikk'}]}], 'id': 'WEB', 'description': 'Tilgjengelig på nettbutikk.'}
This is the object fields
[
{
"type": "NonStockItem",
"attributes": [
{
"attributeId": "string",
"description": "string",
"sortOrder": 0,
"required": true,
"attributeType": "Text"
}
]
this is my code
if response.status_code == 200:
itemClass = json.loads(response.text)
for item in itemClass:
print(item["type"])
print(item["description"])
print(item["attributes"])
What I'm trying to do is to get only the attributes with an existing attributeId. I'm a bit stuck because the data inside the attributes array is a dict, how can I get the key values?
Current output:
NonStockItem
Ikke lagerførte varer høy sats
[]
NonStockItem
Ikke lagerførte varer middels sats
[]
NonStockItem
Ikke lagerførte varer lav sats
[]
NonStockItem
Ikke lagerførte varer avgiftsfri
[]
FinishedGoodItem
Lagerførte varer høy sats
[{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': False, 'attributeType': 'Text', 'details': []}]
FinishedGoodItem
Lagerførte varer middels sats
[]
FinishedGoodItem
Lagerførte varer avgiftsfri
[]
LaborItem
Tjenester (prosjekt)
[]
ExpenseItem
Utgifter (Reise)
[]
FinishedGoodItem
Online
[{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': True, 'attributeType': 'Text', 'details': []}]
FinishedGoodItem
Tilgjengelig på nettbutikk.
[{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': False, 'attributeType': 'Text', 'details': []}, {'attributeId': 'WEB2', 'description': 'tilgjengelighet i nettbutikk', 'required': True, 'attributeType': 'Combo', 'details': [{'id': 'Ikke Inne', 'description': 'Produktet er utsolgt.'}, {'id': 'Inne', 'description': 'tilgjengelig i nettbutikk'}]}]
I only want the types which contain an attributeId

I am assuming the list you are working on is accessible using lst[0]['attributes'].
Try the following, which uses list comprehension:
lst = [
{
"type": "NonStockItem",
"attributes": [
{
"attributeId": "string",
"description": "string",
"sortOrder": 0,
"required": True,
"attributeType": "Text"
},
{
# Note that it does not have attributeId
"description": "string",
"sortOrder": 0,
"required": True,
"attributeType": "Text"
}
]
}
]
attrs = lst[0]['attributes']
output = [d for d in attrs if 'attributeId' in d]
print(output)
Output:
[{'attributeId': 'string', 'description': 'string', 'sortOrder': 0, 'required': True, 'attributeType': 'Text'}]
Note that the output has only one element; in the input example I gave, the second dict does not have attributeId.

Pandas json_normalize could be used for this as well:
import json
import pandas as pd
response = '''[
{
"type": "NonStockItem",
"attributes": [
{
"attributeId": "string1",
"description": "string",
"sortOrder": 0,
"required": true,
"attributeType": "Text"
},
{
"attributeId": "string2",
"description": "string",
"sortOrder": 0,
"required": true,
"attributeType": "Text"
}]
},
{
"type": "NonStockItem",
"attributes":[]
},
{
"type": "NonStockItem",
"attributes": [
{
"attributeId": "string3",
"description": "string",
"sortOrder": 0,
"required": true,
"attributeType": "Text"
},
{
"attributeId": "string4",
"description": "string",
"sortOrder": 0,
"required": true,
"attributeType": "Text"
}]
}
]
'''
itemClass = json.loads(response)
print(pd.concat([pd.json_normalize(x["attributes"]) for x in itemClass],
ignore_index=True))
attributeId description sortOrder required attributeType
0 string1 string 0 True Text
1 string2 string 0 True Text
2 string3 string 0 True Text
3 string4 string 0 True Text

The best solution that I could think considering your data sample and output is to verify if item["attributes"] has values inside or no:
Code:
itemclass = [{'type': 'NonStockItem', 'attributes': [], 'id': '1', 'description': 'Ikke lagerførte varer høy sats'},
{'type': 'NonStockItem', 'attributes': [], 'id': '2', 'description': 'Ikke lagerførte varer middels sats'},
{'type': 'NonStockItem', 'attributes': [], 'id': '3', 'description': 'Ikke lagerførte varer lav sats'},
{'type': 'NonStockItem', 'attributes': [], 'id': '4', 'description': 'Ikke lagerførte varer avgiftsfri'},
{'type': 'FinishedGoodItem', 'attributes': [{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': False, 'attributeType': 'Text', 'details': []}], 'id': '5', 'description': 'Lagerførte varer høy sats'},
{'type': 'FinishedGoodItem', 'attributes': [], 'id': '6', 'description': 'Lagerførte varer middels sats'},
{'type': 'FinishedGoodItem', 'attributes': [], 'id': '7', 'description': 'Lagerførte varer avgiftsfri'},
{'type': 'LaborItem', 'attributes': [], 'id': '8', 'description': 'Tjenester (prosjekt)'},
{'type': 'ExpenseItem', 'attributes': [], 'id': '9', 'description': 'Utgifter (Reise)'},
{'type': 'FinishedGoodItem', 'attributes': [{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': True, 'attributeType': 'Text', 'details': []}], 'id': 'ONLINE', 'description': 'Online'},
{'type': 'FinishedGoodItem', 'attributes': [{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': False, 'attributeType': 'Text', 'details': []}, {'attributeId': 'WEB2', 'description': 'tilgjengelighet i nettbutikk', 'required': True, 'attributeType': 'Combo', 'details': [{'id': 'Ikke Inne', 'description': 'Produktet er utsolgt.'}, {'id': 'Inne', 'description': 'tilgjengelig i nettbutikk'}]}], 'id': 'WEB', 'description': 'Tilgjengelig på nettbutikk.'}]
for item in itemclass:
if item["attributes"]:
print(item["type"])
print(item["description"])
print(item["attributes"])
Output:
FinishedGoodItem
Lagerførte varer høy sats
[{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': False, 'attributeType': 'Text', 'details': []}]
FinishedGoodItem
Online
[{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': True, 'attributeType': 'Text', 'details': []}]
FinishedGoodItem
Tilgjengelig på nettbutikk.
[{'attributeId': 'NETTBUTIKK', 'description': 'WEB', 'required': False, 'attributeType': 'Text', 'details': []}, {'attributeId': 'WEB2', 'description': 'tilgjengelighet i nettbutikk', 'required': True, 'attributeType': 'Combo', 'details': [{'id': 'Ikke Inne', 'description': 'Produktet er utsolgt.'}, {'id': 'Inne', 'description': 'tilgjengelig i nettbutikk'}]}]

Tweets Analysis with PyMongo - Lower case before counting hashtags

I have uploaded some tweets in a Mongo DB collection and I would like to extract the following information with PyMongo:
lowercase(entities.hashtags.text)
count
i.e. I would like to know how many time an hashtag has been used. However, since hashtags are case sensitive, I would like to consider them as lowercase (so that myTag and MyTag are considered and counted together).
I used the following pipeline to get the most used hashtags but I'm not able to apply the lowercase function:
tweets.aggregate([
{'$project': {'tags': '$entities.hashtags.text', '_id': 0}},
{'$unwind': '$tags'},
{'$group': {'_id': '$tags', 'count': {'$sum': 1}}}
])
Here an example of document (tweet), where I removed some of the fields I'm not interested in:
{'_id': ObjectId('604c805b289d1ef5947e1845'),
'created_at': 'Fri Mar 12 04:36:10 +0000 2021',
'display_text_range': [0, 140],
'entities': {'hashtags': [{'indices': [124, 136], 'text': 'MyTag'}],
'symbols': [],
'urls': [],
'user_mentions': [{'id': 123,
'id_str': '123',
'indices': [3, 14],
'name': 'user_name',
'screen_name': 'user_screen_name'}]},
'user': {'id': 456,
'id_str': '456',
'name': 'Author Name',
'screen_name': 'Author Screen Name'}},
{'_id': ObjectId('604c805b289d1ef5947e1845'),
'created_at': 'Fri Mar 12 04:36:10 +0000 2021',
'display_text_range': [0, 140],
'entities': {'hashtags': [{'indices': [124, 136], 'text': 'MyTAG'}],
'symbols': [],
'urls': [],
'user_mentions': [{'id': 123,
'id_str': '123',
'indices': [3, 14],
'name': 'user_name',
'screen_name': 'user_screen_name'}]},
'user': {'id': 456,
'id_str': '456',
'name': 'Author Name',
'screen_name': 'Author Screen Name'}}
In this example I would expect something like:
{'_id': 'mytag',
'count': '2'}
Can someone help me?
Thank you in advance for your help!
Francesca

You can use $toLower
db.collection.aggregate([
{
"$project": {
"tags": "$entities.hashtags.text",
"_id": 0
}
},
{
"$unwind": "$tags"
},
{
"$group": {
"_id": {
$toLower: "$tags"
},
"count": {
"$sum": 1
}
}
}
])
Working Mongo playground

I want to make csv dataset for my ML Algo for medical data

I want to make a csv of age,gender,symptoms,disease,medications,testdata,test% from my medical data.I have used AWS comprehend
Suppose My input is:-
60 yrs old male known case of T2DM for 6 yrs was on OHA,Hypertensive for 1 yr,IHD and Euthyroid.Non-Analgesic abuse.was on Homeopathy medicines for p/r Haemorhoids.H/O fever and Peripheral neuropathy 1 month back-->resolved with Pregabalin.Recently detected raise in baseline cr to 1.3(11-11-16).Now base line cr maintained 0.89 (previous 1.2).USG-Mild Hepatomegaly with fatty changes,cholelithiasis,BHP. X-ray(19.2.18) Hairline fracture lower end of radius>>taking homeopathic medicines for this problem.. Now Cr.:0.82(prev:0.98). Now c/o constipation, cold intolerance, cough with expectoration, sneezing.
This is what I got as output:
{
'Entities': [
{
'Id': 7,
'BeginOffset': 1,
'EndOffset': 3,
'Score': 0.9946974515914917,
'Text': '60',
'Category': 'PROTECTED_HEALTH_INFORMATION',
'Type': 'AGE',
'Traits': [
]
},
{
'Id': 10,
'BeginOffset': 31,
'EndOffset': 35,
'Score': 0.9719576835632324,
'Text': 'T2DM',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.9450973868370056
}
]
},
{
'Id': 11,
'BeginOffset': 53,
'EndOffset': 56,
'Score': 0.7314696907997131,
'Text': 'OHA',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.5971980690956116
}
]
},
{
'Id': 12,
'BeginOffset': 57,
'EndOffset': 69,
'Score': 0.9934116005897522,
'Text': 'Hypertensive',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.9471467733383179
}
]
},
{
'Id': 13,
'BeginOffset': 79,
'EndOffset': 82,
'Score': 0.8657081127166748,
'Text': 'IHD',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.8929517269134521
}
]
},
{
'Id': 14,
'BeginOffset': 87,
'EndOffset': 96,
'Score': 0.9757838845252991,
'Text': 'Euthyroid',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.9470660090446472
}
]
},
{
'Id': 0,
'BeginOffset': 124,
'EndOffset': 144,
'Score': 0.6679201126098633,
'Text': 'Homeopathy medicines',
'Category': 'TEST_TREATMENT_PROCEDURE',
'Type': 'TREATMENT_NAME',
'Traits': [
]
},
{
'Id': 15,
'BeginOffset': 153,
'EndOffset': 164,
'Score': 0.5985288619995117,
'Text': 'Haemorhoids',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.6647800803184509
}
]
},
{
'Id': 16,
'BeginOffset': 169,
'EndOffset': 174,
'Score': 0.9405372738838196,
'Text': 'fever',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.7457793354988098
}
]
},
{
'Id': 26,
'BeginOffset': 179,
'EndOffset': 189,
'Score': 0.37402620911598206,
'Text': 'Peripheral',
'Category': 'ANATOMY',
'Type': 'DIRECTION',
'Traits': [
]
},
{
'Id': 17,
'BeginOffset': 179,
'EndOffset': 200,
'Score': 0.8868133425712585,
'Text': 'Peripheral neuropathy',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.9166043400764465
}
]
},
{
'Id': 27,
'BeginOffset': 209,
'EndOffset': 213,
'Score': 0.6287723779678345,
'Text': 'back',
'Category': 'ANATOMY',
'Type': 'SYSTEM_ORGAN_SITE',
'Traits': [
]
},
{
'Id': 9,
'BeginOffset': 230,
'EndOffset': 240,
'Score': 0.9881155490875244,
'Text': 'Pregabalin',
'Category': 'MEDICATION',
'Type': 'GENERIC_NAME',
'Traits': [
]
},
{
'Id': 8,
'BeginOffset': 283,
'EndOffset': 300,
'Score': 0.9324891567230225,
'Text': '1.3(11-11-16).Now',
'Category': 'PROTECTED_HEALTH_INFORMATION',
'Type': 'DATE',
'Traits': [
]
},
{
'Id': 1,
'BeginOffset': 311,
'EndOffset': 313,
'Score': 0.42533111572265625,
'Text': 'cr',
'Category': 'TEST_TREATMENT_PROCEDURE',
'Type': 'TEST_NAME',
'Traits': [
],
'Attributes': [
{
'Type': 'TEST_VALUE',
'Score': 0.9705457091331482,
'RelationshipScore': 1.0,
'Id': 2,
'BeginOffset': 325,
'EndOffset': 329,
'Text': '0.89',
'Traits': [
]
}
]
},
{
'Id': 18,
'BeginOffset': 354,
'EndOffset': 366,
'Score': 0.5006829500198364,
'Text': 'Hepatomegaly',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
]
},
{
'Id': 19,
'BeginOffset': 372,
'EndOffset': 385,
'Score': 0.6968429684638977,
'Text': 'fatty changes',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'SIGN',
'Score': 0.44842731952667236
},
{
'Name': 'DIAGNOSIS',
'Score': 0.628720223903656
}
]
},
{
'Id': 20,
'BeginOffset': 386,
'EndOffset': 400,
'Score': 0.8746378421783447,
'Text': 'cholelithiasis',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.7521570324897766
}
]
},
{
'Id': 28,
'BeginOffset': 422,
'EndOffset': 430,
'Score': 0.47990521788597107,
'Text': 'Hairline',
'Category': 'ANATOMY',
'Type': 'SYSTEM_ORGAN_SITE',
'Traits': [
]
},
{
'Id': 21,
'BeginOffset': 422,
'EndOffset': 439,
'Score': 0.584293782711029,
'Text': 'Hairline fracture',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'DIAGNOSIS',
'Score': 0.9427288770675659
}
]
},
{
'Id': 29,
'BeginOffset': 440,
'EndOffset': 445,
'Score': 0.686669647693634,
'Text': 'lower',
'Category': 'ANATOMY',
'Type': 'DIRECTION',
'Traits': [
]
},
{
'Id': 30,
'BeginOffset': 453,
'EndOffset': 459,
'Score': 0.6198492646217346,
'Text': 'radius',
'Category': 'ANATOMY',
'Type': 'SYSTEM_ORGAN_SITE',
'Traits': [
]
},
{
'Id': 3,
'BeginOffset': 513,
'EndOffset': 516,
'Score': 0.9929847717285156,
'Text': 'Cr.',
'Category': 'TEST_TREATMENT_PROCEDURE',
'Type': 'TEST_NAME',
'Traits': [
],
'Attributes': [
{
'Type': 'TEST_VALUE',
'Score': 0.98384028673172,
'RelationshipScore': 0.9999998807907104,
'Id': 4,
'BeginOffset': 517,
'EndOffset': 521,
'Text': '0.82',
'Traits': [
]
}
]
},
{
'Id': 5,
'BeginOffset': 522,
'EndOffset': 526,
'Score': 0.7047827839851379,
'Text': 'prev',
'Category': 'TEST_TREATMENT_PROCEDURE',
'Type': 'TEST_NAME',
'Traits': [
],
'Attributes': [
{
'Type': 'TEST_VALUE',
'Score': 0.9839150905609131,
'RelationshipScore': 0.9999982118606567,
'Id': 6,
'BeginOffset': 527,
'EndOffset': 531,
'Text': '0.98',
'Traits': [
]
}
]
},
{
'Id': 22,
'BeginOffset': 542,
'EndOffset': 554,
'Score': 0.9988911747932434,
'Text': 'constipation',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'SYMPTOM',
'Score': 0.7921050786972046
}
]
},
{
'Id': 23,
'BeginOffset': 556,
'EndOffset': 572,
'Score': 0.9752655625343323,
'Text': 'cold intolerance',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'SYMPTOM',
'Score': 0.9271456599235535
}
]
},
{
'Id': 24,
'BeginOffset': 574,
'EndOffset': 579,
'Score': 0.8357191681861877,
'Text': 'cough',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'SYMPTOM',
'Score': 0.6976621747016907
}
]
},
{
'Id': 25,
'BeginOffset': 600,
'EndOffset': 608,
'Score': 0.9620472192764282,
'Text': 'sneezing',
'Category': 'MEDICAL_CONDITION',
'Type': 'DX_NAME',
'Traits': [
{
'Name': 'SYMPTOM',
'Score': 0.7259325385093689
}
]
}
],
'UnmappedAttributes': [
],
'ResponseMetadata': {
'RequestId': '75b3311e-d8de-41e6-9335-907034f63c30',
'HTTPStatusCode': 200,
'HTTPHeaders': {
'content-type': 'application/x-amz-json-1.1',
'date': 'Thu, 15 Aug 2019 04:47:21 GMT',
'x-amzn-requestid': '75b3311e-d8de-41e6-9335-907034f63c30',
'content-length': '5516',
'connection': 'keep-alive'
},
'RetryAttempts': 0
}
}

How to extract Json data scraped from website

I used Beautiful soup to extract data from a Website. Content is in JSON and I need to extract all the display_name values. I have no clue how to naviagate and print the values I need to save in my CSV.
I tried using some array examples like this one
for productoslvl in soup2.findAll('script',{'id' :'searchResult'}):
element = jsons[0]['display_name']
print (element)
but I keep getting KeyError
This is the JSON data:
{
'page_size': -1,
'refinements': [{
'display_name': 'Brand',
'values': [{
'display_name': 'Acqua Di Parma',
'status': 4,
'value': 900096
}],
'type': 'checkboxes'
}, {
'display_name': 'Bristle Type',
'values': [{
'display_name': 'Addictive',
'status': 1,
'value': 14578019
}, {
'display_name': 'Casual',
'status': 1,
'value': 14578020
}, {
'display_name': 'Chic',
'status': 1,
'value': 14301148
}, {
'display_name': 'Polished',
'status': 1,
'value': 14578022
}],
'type': 'checkboxes'
}, {
'display_name': 'Coverage',
'values': [{
'display_name': 'Balanced',
'status': 1,
'value': 14301025
}, {
'display_name': 'Light',
'status': 1,
'value': 14577894
}, {
'display_name': 'Rich',
'status': 1,
'value': 14577895
}],
'type': 'checkboxes'
}, {
'display_name': 'Formulation',
'values': [{
'display_name': 'Cream',
'status': 1,
'value': 100069
}, {
'display_name': 'Spray',
'status': 1,
'value': 100072
}],
'type': 'checkboxes'
}

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

PyMongo - Aggregation pipeline to get user - mentioned user network - python

Related

How to monitor AWS Fsx ONTAP Filesystem Usage using lambda function and cloudwatch metrics

How do I get data from a nested dict?

Tweets Analysis with PyMongo - Lower case before counting hashtags

I want to make csv dataset for my ML Algo for medical data

How to extract Json data scraped from website

Categories

Resources