Query for a specific item in firebase from python - python

I am getting this error
Index not defined, add ".indexOn": "Number", for path "/public_access", to the rules
and have no idea what the issue is. All I'd like to do is check whether an entry exists in my db, to see if the tag has already been used.
My rules look like this
{
"rules": {
"public_access": {
"all_users": {
".indexOn": "Numbers"
},
".read": "auth.uid === 'regular_user'",
".write": true,
},
"connection_testing": {
".read": true,
".write": "auth.uid === 'admin_user'"
},
"login_data": {
".read": true,
".write": true
},
}
}
And the python code I am trying to use is this
ref = db.reference('/public_access/all_users')
snapshot = ref.order_by_child('Number').equal_to('AAAAAA').get()
But it just errors, I've tried copying so many different examples and I've been stuck on it for hours. I know the solution will be something stupidly simple but I just can't apply it to what I want to do.
I don't need the number value at all, it was just an attempt to get it to find 'AAAAAA'.
The end goal is to see if AAAAAA exists by searching for it, and I have not been successful at achieving this

You don't need order_by_child or database rules here. If you want to search for "AAAAAA", just access it by the child method:
ref = db.reference('/public_access/all_users')
data = ref.child('AAAAAA').get()
print(data['Date_of_join']) # '12/34/34'
print(data['Number']) # 2
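Since the goal is only an existence check, the result of get() can be compared against None, which is what the Admin SDK returns when nothing is stored at a path. The sketch below wraps that in a helper; tag_exists is a hypothetical name, and FakeRef is an in-memory stand-in (not part of firebase_admin) included only so the snippet runs without credentials.

```python
class FakeRef:
    """Hypothetical in-memory stand-in for a firebase_admin db.Reference."""

    def __init__(self, data):
        self._data = data

    def child(self, key):
        # Missing children resolve to a reference that holds None.
        value = self._data.get(key) if isinstance(self._data, dict) else None
        return FakeRef(value)

    def get(self):
        return self._data


def tag_exists(ref, tag):
    """Return True if a child named `tag` exists under `ref`."""
    return ref.child(tag).get() is not None


# With the real SDK this would be: users = db.reference('/public_access/all_users')
users = FakeRef({"AAAAAA": {"Date_of_join": "12/34/34", "Number": 2}})
print(tag_exists(users, "AAAAAA"))  # True
print(tag_exists(users, "BBBBBB"))  # False
```

The helper only relies on child() and get(), so it should work unchanged with a real reference.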

Parsing JSON in AWS Lambda Python

For a personal project I'm trying to write an AWS Lambda in Python3.9 that will delete a newly created user, if the creator is not myself. For this, the logs in CloudWatch Logs will trigger (via CloudTrail and EventBridge) my Lambda. Therefore, I will receive the JSON request as my event in :
def lambdaHandler(event, context):
But I am having trouble parsing it.
If I print the event, I get this:
{'version': '1.0', 'invokingEvent': '{
"configurationItemDiff": {
"changedProperties": {},
"changeType": "CREATE"
},
"configurationItem": {
"relatedEvents": [],
"relationships": [],
"configuration": {
"path": "/",
"userName": "newUser",
"userId": "xxx",
"arn": "xxx",
"createDate": "2022-11-23T09:02:49.000Z",
"userPolicyList": [],
"groupList": [],
"attachedManagedPolicies": [],
"permissionsBoundary": null,
"tags": []
},
"supplementaryConfiguration": {},
"tags": {},
"configurationItemVersion": "1.3",
"configurationItemCaptureTime": "2022-11-23T09:04:40.659Z",
"configurationStateId": 1669194280659,
"awsAccountId": "141372946428",
"configurationItemStatus": "ResourceDiscovered",
"resourceType": "AWS::IAM::User",
"resourceId": "xxx",
"resourceName": "newUser",
"ARN": "arn:aws:iam::xxx:user/newUser",
"awsRegion": "global",
"availabilityZone": "Not Applicable",
"configurationStateMd5Hash": "",
"resourceCreationTime": "2022-11-23T09:02:49.000Z"
},
"notificationCreationTime": "2022-11-23T09:04:41.317Z",
"messageType": "ConfigurationItemChangeNotification",
"recordVersion": "1.3"
}', 'ruleParameters': '{
"badUser": "arn:aws:iam::xxx:user/badUser"
}', 'resultToken': 'xxx=', 'eventLeftScope': False, 'executionRoleArn': 'arn:aws:iam: : xxx:role/aws-service-role/config.amazonaws.com/AWSServiceRoleForConfig', 'configRuleArn': 'arn:aws:config:eu-west-1: xxx:config-rule/config-rule-q3nmvt', 'configRuleName': 'UserCreatedRule', 'configRuleId': 'config-rule-q3nmvt', 'accountId': 'xxx'
}
And for my purpose, I'd like to get the "changeType": "CREATE" value to say that if it is CREATE, I check the creator and if it is not myself, I delete newUser.
So the weird thing is that I copy/paste that event into VSCode and format it in a .json document and it says that there are errors (line 1 : version and invokingEvent should be double quote for example, but well).
For now I only try to reach and print the
"changeType": "CREATE"
by doing :
import json
import boto3
import logging
iam = boto3.client('iam')
def lambda_handler(event, context):
    """
    Triggered if a user is created
    Check the creator - if not myself :
    - delete new user and remove from groups if necessary
    """
    try:
        print(event['invokingEvent']["configurationItemDiff"]["changeType"])
    except Exception as e:
        print("Error because :")
        print(e)
And get the error string indices must be integers - it happens for ["configurationItemDiff"].
I understand the error already (I'm new to Python though, so maybe not completely) and tried many things, like:
print(event['invokingEvent']['configurationItemDiff']) : swapping double quotes for single quotes, but that doesn't change anything
print(event['invokingEvent'][0]) : but it gives me the character { and [2] gives me the c, not the whole value.
At this point I'm stuck and need help because I can't find any solution to this. I don't use SNS; maybe I should? I saw that with it the JSON document would not be the same and could be accessed through ["Records"][...]. I don't know, please help.
What you are printing is a Python dict. It looks a bit like JSON but is not JSON; it is the repr of a Python dict. That means it will have True / False instead of true / false, ' instead of ", and so on.
You could do print(json.dumps(event)) instead.
Anyway, the actual problem is that invokingEvent is itself JSON, but in string form, so you need to json.loads that nested JSON string. You can tell because the value after invokingEvent is wrapped in another set of '...', so it is a string, not an already parsed dict.
invoking_event = json.loads(event['invokingEvent'])
change_type = invoking_event["configurationItemDiff"]["changeType"]
ruleParameters would be another nested JSON which needs parsing first if you wanted to use it.
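Putting both pieces together, a minimal sketch of a helper that decodes the two nested JSON strings could look like the following. The function name parse_config_event and the trimmed sample event are assumptions for illustration; only the keys shown in the question are used.

```python
import json


def parse_config_event(event):
    """Decode the JSON strings nested inside an AWS Config rule event."""
    invoking_event = json.loads(event["invokingEvent"])
    # ruleParameters is also a JSON string; treat a missing value as empty.
    rule_parameters = json.loads(event.get("ruleParameters") or "{}")

    diff = invoking_event.get("configurationItemDiff") or {}
    item = invoking_event.get("configurationItem") or {}
    return {
        "change_type": diff.get("changeType"),
        "resource_name": item.get("resourceName"),
        "rule_parameters": rule_parameters,
    }


# Trimmed-down sample shaped like the event in the question.
sample_event = {
    "invokingEvent": json.dumps({
        "configurationItemDiff": {"changedProperties": {}, "changeType": "CREATE"},
        "configurationItem": {"resourceType": "AWS::IAM::User", "resourceName": "newUser"},
    }),
    "ruleParameters": json.dumps({"badUser": "arn:aws:iam::xxx:user/badUser"}),
}

parsed = parse_config_event(sample_event)
print(parsed["change_type"])    # CREATE
print(parsed["resource_name"])  # newUser
```

The `or {}` fallbacks are there so that a null configurationItemDiff in the event does not raise.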

Elasticsearch prevent indexing of Markdown hyperlinks

I am building a Markdown file content search using Elasticsearch. Currently the whole content inside the MD file is indexed in Elasticsearch. But the problem is it shows results like this [Mylink](https://link-url-here.org), [Mylink2](another_page.md)
in the search results.
I would like to prevent indexing of hyperlinks and references to other pages. When someone searches for "Mylink" it should only return the text without the URL. It would be great if someone could help me with the right solution for this.
You need to render the Markdown in your indexing application, then strip the HTML tags and save the resulting plain text alongside the Markdown source.
I think you have two main solutions for this problem.
First: clean the data in your own code before indexing it into Elasticsearch.
Second: let Elasticsearch clean the data for you.
The first solution is the easier one, but if you need to do this inside Elasticsearch you have to create an ingest pipeline.
Then you can use the script processor to clean the data with a Painless script that finds the pattern with a regex and removes it.
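For the first option (cleaning before indexing), a minimal Python sketch using only the standard library might look like this. The function name strip_markdown_links is hypothetical, and the regex is a simplification: it handles inline links only, not reference-style links, images, or URLs that contain parentheses.

```python
import re

# Matches inline Markdown links of the form [text](target).
INLINE_LINK = re.compile(r"\[([^\[\]]+)\]\([^()]*\)")


def strip_markdown_links(content):
    """Replace each inline link with its visible text."""
    return INLINE_LINK.sub(r"\1", content)


print(strip_markdown_links("[Mylink](https://link-url-here.org) [anotherLink](http://dot.com)"))
# Mylink anotherLink
print(strip_markdown_links("[Mylink2](another_page.md)"))
# Mylink2
```

The cleaned string can then be indexed in its own field next to the original content.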
You could use an ingest pipeline with a script processor to extract the link text:
1. Set up the pipeline
PUT _ingest/pipeline/clean_links
{
"description": "...",
"processors": [
{
"script": {
"source": """
if (ctx["content"] == null) {
// nothing to do here
return
}
def content = ctx["content"];
Pattern pattern = /\[([^\]\[]+)\](\(((?:[^\()]+)+)\))/;
Matcher matcher = pattern.matcher(content);
def purged_content = matcher.replaceAll("$1");
ctx["purged_content"] = purged_content;
"""
}
}
]
}
The regex can be tested here and is inspired by this.
2. Include the pipeline when ingesting the docs
POST my-index/_doc?pipeline=clean_links
{
"content": "[Mylink](https://link-url-here.org) [anotherLink](http://dot.com)"
}
POST my-index/_doc?pipeline=clean_links
{
"content": "[Mylink2](another_page.md)"
}
The python docs are here.
3. Verify
GET my-index/_search?filter_path=hits.hits._source
should yield
{
"hits" : {
"hits" : [
{
"_source" : {
"purged_content" : "Mylink anotherLink",
"content" : "[Mylink](https://link-url-here.org) [anotherLink](http://dot.com)"
}
},
{
"_source" : {
"purged_content" : "Mylink2",
"content" : "[Mylink2](another_page.md)"
}
}
]
}
}
You could instead overwrite the original content field if you want to drop the links from your _source entirely.
In contrast, you could go a step further in the other direction and store the text + link pairs in a nested field of the form:
{
"content": "...",
"links": [
{
"text": "Mylink",
"href": "https://link-url-here.org"
},
...
]
}
so that when you later decide to make them searchable, you'll be able to do so with precision.
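If the documents are prepared client-side, the text + href pairs could be collected before indexing. This is only a sketch: to_document is a hypothetical helper, and the regex covers simple inline links without spaces or parentheses in the target.

```python
import re

# Captures the link text and the link target of inline Markdown links.
LINK_PAIR = re.compile(r"\[([^\[\]]+)\]\(([^()\s]+)\)")


def to_document(content):
    """Build a document body with the original content and its link pairs."""
    links = [{"text": text, "href": href} for text, href in LINK_PAIR.findall(content)]
    return {"content": content, "links": links}


doc = to_document("[Mylink](https://link-url-here.org) [Mylink2](another_page.md)")
print(doc["links"])
```

For the pairs to be queried independently, links would need to be mapped as a nested field in the index.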
Shameless plug: you can find other hands-on ingestion guides in my Elasticsearch Handbook.

Navigating Event in AWS Lambda Python

So I'm fairly new to both AWS and Python. I'm on a uni assignment and have hit a road block.
I'm uploading data to AWS S3, this information is being sent to an SQS Queue and passed into AWS Lambda. I know, it would be much easier to just go straight from S3 to Lambda...but apparently "that's not the brief".
So I've got my event accurately coming into AWS Lambda, but no matter how deep I dig, I can't reach the information I need. In AWS Lambda, I run the following code.
def lambda_handler(event, context):
    print(event)
Via CloudWatch, I get the output
{'Records': [{'messageId': '1d8e0a1d-d7e0-42e0-9ff7-c06610fccae0', 'receiptHandle': 'AQEBr64h6lBEzLk0Xj8RXBAexNukQhyqbzYIQDiMjJoLLtWkMYKQp5m0ENKGm3Icka+sX0HHb8gJoPmjdTRNBJryxCBsiHLa4nf8atpzfyCcKDjfB9RTpjdTZUCve7nZhpP5Fn7JLVCNeZd1vdsGIhkJojJ86kbS3B/2oBJiCR6ZfuS3dqZXURgu6gFg9Yxqb6TBrAxVTgBTA/Pr35acEZEv0Dy/vO6D6b61w2orabSnGvkzggPle0zcViR/shLbehROF5L6WZ5U+RuRd8tLLO5mLFf5U+nuGdVn3/N8b7+FWdzlmLOWsI/jFhKoN4rLiBkcuL8UoyccTMJ/QTWZvh5CB2mwBRHectqpjqT4TA3Z9+m8KNd/h/CIZet+0zDSgs5u', 'body': '{"Records":[{"eventVersion":"2.1","eventSource":"aws:s3","awsRegion":"eu-west-2","eventTime":"2021-03-26T01:03:53.611Z","eventName":"ObjectCreated:Put","userIdentity":{"principalId":"MY_ID"},"requestParameters":{"sourceIPAddress":"MY_IP_ADD"},"responseElements":{"x-amz-request-id":"BQBY06S20RYNH1XJ","x-amz-id-2":"Cdo0RvX+tqz6SZL/Xw9RiBLMCS3Rv2VOsu2kVRa7PXw9TsIcZeul6bzbAS6z4HF6+ZKf/2MwnWgzWYz+7jKe07060bxxPhsY"},"s3":{"s3SchemaVersion":"1.0","configurationId":"test","bucket":{"name":"MY_BUCKET","ownerIdentity":{"principalId":"MY_ID"},"arn":"arn:aws:s3:::MY_BUCKET"},"object":{"key":"test.jpg","size":246895,"eTag":"c542637a515f6df01cbc7ee7f6e317be","sequencer":"00605D33019AD8E4E5"}}}]}', 'attributes': {'ApproximateReceiveCount': '1', 'SentTimestamp': '1616720643174', 'SenderId': 'AIDAIKZTX7KCMT7EP3TLW', 'ApproximateFirstReceiveTimestamp': '1616720648174'}, 'messageAttributes': {}, 'md5OfBody': '1ab703704eb79fbbb58497ccc3f2c555', 'eventSource': 'aws:sqs', 'eventSourceARN': 'arn:aws:sqs:eu-west-2:ARN', 'awsRegion': 'eu-west-2'}]}
[Disclaimer, I've tried to edit out any identifying information but if there's any sensitive data I'm not understanding or missed, please let me know]
Anyways, just for a sample, I want to get the Object Key, which is test.jpg. I tried to drill down as much as I can, finally getting to: -
def lambda_handler(event, context):
    print(event['Records'][0]['body'])
This returned the following (which was nice to see fully stylized): -
{
"Records": [
{
"eventVersion": "2.1",
"eventSource": "aws:s3",
"awsRegion": "eu-west-2",
"eventTime": "2021-03-26T01:08:16.823Z",
"eventName": "ObjectCreated:Put",
"userIdentity": {
"principalId": "MY_ID"
},
"requestParameters": {
"sourceIPAddress": "MY_IP"
},
"responseElements": {
"x-amz-request-id": "ZNKHRDY8GER4F6Q5",
"x-amz-id-2": "i1Cazudsd+V57LViNWyDNA9K+uRbSQQwufMC6vf50zQfzPaH7EECsvw9SFM3l3LD+TsYEmnjXn1rfP9GQz5G5F7Fa0XZAkbe"
},
"s3": {
"s3SchemaVersion": "1.0",
"configurationId": "test",
"bucket": {
"name": "MY_BUCKET",
"ownerIdentity": {
"principalId": "MY_ID"
},
"arn": "arn:aws:s3:::MY_BUCKET"
},
"object": {
"key": "test.jpg",
"size": 254276,
"eTag": "b0052ab9ba4b9395e74082cfd51a8f09",
"sequencer": "00605D3407594DE184"
}
}
}
]
}
However, from this stage on, if I try to write print(event['Records'][0]['body']['Records']) or print(event['Records'][0]['s3']), I'm told I need an integer, not a string. If I try to write print(event['Records'][0]['body'][0]), I get a single character every time (in this case the first { bracket).
I'm not sure if this has something to do with tuples, or if at this stage it's all saved as one large string, but at least in the output view it doesn't appear to be saved that way.
Does anyone have any idea what I'd do from this stage to access the further information? In the full release after I'm done testing, I'll be wanting to save an audio file and the file name as opposed to a picture.
Thanks.
You are having this problem because the content of the body is JSON, but in string form. You have to parse it before you can access it like a normal dictionary, like so:
import json
def handler(event: dict, context: object):
    body = event['Records'][0]['body']
    body = json.loads(body)
    # use the body as a normal dictionary
You are getting only a single character when using integer indexes because it is a string, so [n] returns the nth character.
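To get from there to the object key, the parsed body can be walked like any other dict. The sketch below loops over every SQS record rather than assuming there is exactly one; object_keys is a hypothetical helper name and the sample event is a trimmed version of the one in the question. S3 URL-encodes keys in its notifications, hence the unquote_plus call.

```python
import json
from urllib.parse import unquote_plus


def object_keys(event):
    """Collect the S3 object keys from an SQS event that wraps S3 notifications."""
    keys = []
    for sqs_record in event.get("Records", []):
        # Each SQS body is a JSON string containing the S3 notification.
        body = json.loads(sqs_record["body"])
        for s3_record in body.get("Records", []):
            keys.append(unquote_plus(s3_record["s3"]["object"]["key"]))
    return keys


sample_event = {
    "Records": [
        {
            "body": json.dumps({
                "Records": [
                    {"s3": {"bucket": {"name": "MY_BUCKET"}, "object": {"key": "test.jpg"}}}
                ]
            })
        }
    ]
}

print(object_keys(sample_event))  # ['test.jpg']
```

Using .get("Records", []) on the body also keeps the function from failing on messages that carry no S3 records.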
It's because you're getting stringified JSON data. You need to load it back into its Python dict form.
There is a useful package called lambda_decorators. You can install it with pip install lambda_decorators
so you can do this:
from lambda_decorators import load_json_body
@load_json_body
def lambda_handler(event, context):
    print(event['Records'][0]['body'])
    # Now you can access the items in the body using their indexes and keys.
This will extract the JSON for you.

pysaml2 - AuthnContextClassRef, PasswordProtectedTransport

I am struggling to understand how to configure pysaml2 and add the AuthnContext in my request.
I have a SP and I would need to add the following request when the client performs the login request:
<samlp:RequestedAuthnContext>
<saml:AuthnContextClassRef>
urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport
</saml:AuthnContextClassRef>
</samlp:RequestedAuthnContext>
I am struggling because I tried everything I could and I believe that it is possible to add that in my requests because in here https://github.com/IdentityPython/pysaml2/blob/master/src/saml2/samlp.py
I can see:
AUTHN_PASSWORD = "urn:oasis:names:tc:SAML:2.0:ac:classes:Password"
AUTHN_PASSWORD_PROTECTED = \
"urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport"
I just do not know how to reference that.. I have a simple configuration like this:
"service": {
"sp": {
"name": "BLABLA",
"allow_unsolicited": true,
"want_response_signed": false,
"logout_requests_signed": true,
"endpoints": {
"assertion_consumer_service": ["https://mywebste..."],
"single_logout_service": [["https://mywebste...", "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"]]
},
"requestedAuthnContext" : true
}
}
Anyone know how to add the above config?
I struggle to understand how to build the config dictionary, even by reading their docs. Any ideas?
I am happy to add the "PasswordProtectedTransport" directly in the code if the config does not allow that.. But I am not sure how to do it.
Thanks,
R
At some point your client calls create_authn_request(...)
(or prepare_for_authenticate(...),
or prepare_for_negotiated_authenticate(...)).
You should pass the extra arg requested_authn_context.
The requested_authn_context is an object of type saml2.samlp.RequestedAuthnContext that contains the wanted AuthnContextClassRef.
...
from saml2.saml import AUTHN_PASSWORD_PROTECTED
from saml2.saml import AuthnContextClassRef
from saml2.samlp import RequestedAuthnContext
requested_authn_context = RequestedAuthnContext(
    authn_context_class_ref=[
        AuthnContextClassRef(AUTHN_PASSWORD_PROTECTED),
    ],
    comparison="exact",
)
req_id, request = create_authn_request(
    ...,
    requested_authn_context=requested_authn_context,
)

pymongo include javascript in aggregate query

I'm currently tasked with researching databases and am trying various queries using the pymongo library to investigate suitability for given projects.
My timestamps are saved in millisecond integer format and I'd like to do a simple sales by day aggregated query. I understand from here (answer by Alexandre Russel) that as the timestamps weren't uploaded in BSON format I can't use date and time functions to create bins, but can manipulate timestamps using embedded javascript.
As such I've written the following query:
[{
"$project": {
"year": {
"$year": {
"$add": ["new Date(0)", "$data.horaContacto"]
}
},
"month": {
"$month": {
"$add": ["new Date(0)", "$data.horaContacto"]
}
}
}
}, {
"$group": {
"_id": {
"year": "$year",
"month": "$month"
},
"sales": {
"$sum": {
"$cond": ["$data.estadoVenta", 1, 0]
}
}
}
}]
But get this error:
pymongo.errors.OperationFailure: exception: $add only supports numeric or date types, not String
I think what's happening is that the JS "new Date(0)" is being interpreted by the mongo driver as a string, not applied as JS. If I remove the enclosing double quotes then Python tries to interpret this code and errors accordingly. This is just one example and I'd like to include more JS in queries in future tests, but I can't see a way to get it to play nicely with Python (having said this, I'm fairly new to Python too).
Does anybody know:
Whether I'm correct in assuming the error occurs because mongo interprets the JS as a string and tries to sum it directly?
Whether I can indicate to mongo that this is JS from Python without Python trying to interpret the code?
So far I've tried searching via Google and various combinations of single and double quotes.
Pasted below are a few rows of randomly generated test data if required:
Thanks,
James
{'_id': 0,'data': {'edad': '74','estadoVenta': True,'visits': [{'visitLength': 1819.349246663518,'visitNo': 1,'visitTime': 1480244647948.0}],'apellido2': 'Aguilar','apellido1': 'Garcia','horaContacto': 1464869545373.0,'preNombre': 'Agustin','_id': 0,'telefono': 630331272,'location': {'province': 'Aragón','city': 'Zaragoza','type': 'Point','coordinates': [-0.900203, 41.747726],'country': 'Spain'}}},
{'_id': 1,'data': {'edad': '87','estadoVenta': False,'visits': [{'visitLength': 2413.9938072105024,'visitNo': 1,'visitTime': 1465417353597.0}],'apellido2': 'Torres','apellido1': 'Acosta','horaContacto': 1473404147769.0,'preNombre': 'Sara','_id': 1,'telefono': 665968746,'location': {'province': 'Galicia','city': 'Cualedro','type': 'Point','coordinates': [-7.659321, 41.925328],'country': 'Spain'}}},
{'_id': 2,'data': {'edad': '48','estadoVenta': True,'visits': [{'visitLength': 2413.9938072105024,'visitNo': 1,'visitTime': 1465415138597.0}],'apellido2': 'Perez','apellido1': 'Sanchez','horaContacto': 1473404923569.0,'preNombre': 'Sara','_id': 2,'telefono': 665967346,'location': {'province': 'Galicia','city': 'Barcelona','type': 'Point','coordinates': [-7.659321, 41.925328],'country': 'Spain'}}}
The MongoDB aggregation framework cannot use any Javascript. You must specify all the data in your aggregation pipeline using BSON. PyMongo can translate a standard Python datetime to BSON, and you can send it as part of the aggregation pipeline, like so:
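import datetime
# PyMongo treats naive datetimes as UTC, so build the epoch explicitly in UTC;
# fromtimestamp(0) would return the epoch shifted to the local timezone.
epoch = datetime.datetime(1970, 1, 1)
pipeline = [{
    "$project": {
        "year": {
            "$year": {
                "$add": [epoch, "$data.horaContacto"]
            }
        },
        # the rest of your pipeline here ....
    }
}]
cursor = db.collection.aggregate(pipeline)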
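To sanity-check what the pipeline should return, the same arithmetic (epoch plus a millisecond offset, then grouping by year and month) can be mirrored in plain Python on a few of the sample documents. This is only a client-side sketch for verification, not a replacement for the aggregation; sales_by_month is a hypothetical helper and the documents are trimmed to the two fields the pipeline uses.

```python
import datetime
from collections import Counter

EPOCH = datetime.datetime(1970, 1, 1)  # naive datetime, read as UTC


def sales_by_month(docs):
    """Count completed sales per (year, month), like the $project/$group stages."""
    sales = Counter()
    for doc in docs:
        data = doc["data"]
        # Equivalent of {"$add": [epoch, "$data.horaContacto"]} with milliseconds.
        when = EPOCH + datetime.timedelta(milliseconds=data["horaContacto"])
        # Equivalent of {"$sum": {"$cond": ["$data.estadoVenta", 1, 0]}}.
        sales[(when.year, when.month)] += 1 if data["estadoVenta"] else 0
    return dict(sales)


docs = [
    {"data": {"estadoVenta": True, "horaContacto": 1464869545373.0}},
    {"data": {"estadoVenta": False, "horaContacto": 1473404147769.0}},
    {"data": {"estadoVenta": True, "horaContacto": 1473404923569.0}},
]

print(sales_by_month(docs))  # {(2016, 6): 1, (2016, 9): 1}
```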
