Celery Result type for ElasticSearch - python

I'm currently exploring Celery for my work and I'm trying to set up the Elasticsearch backend. Is there any way to send the resulting value as a dictionary/JSON rather than as text, so that results are shown correctly in Elasticsearch and the nested type can be used?
Automatic mapping created by Celery:
{
  "celery" : {
    "mappings" : {
      "backend" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "result" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}
I've tried to create my own mapping with a nested field, but it resulted in elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'object mapping for [result] tried to parse field [result] as object, but found a concrete value')
UPDATE
The result is already encoded as JSON, and inside the Elasticsearch backend wrapper the JSON string is saved inside a dictionary. Adding json.loads(result) as a quick fix actually helps.
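For reference, a rough sketch of what that quick fix could look like as a custom backend subclass; it assumes the Celery 4.x ElasticsearchBackend passes the serialized result string through set() (internal details may differ between versions, so treat this as a hypothetical hack rather than a supported API):

import json

from celery.backends.elasticsearch import ElasticsearchBackend


class JSONResultElasticsearchBackend(ElasticsearchBackend):
    """Hypothetical backend that stores the result as a real JSON object."""

    def set(self, key, value):
        # value normally arrives as a JSON-encoded string; decoding it here
        # lets Elasticsearch map the result as an object instead of text.
        try:
            value = json.loads(value)
        except (TypeError, ValueError):
            pass  # leave non-JSON payloads untouched
        return super().set(key, value)

Reading results back through AsyncResult may then need the inverse conversion in get(), which is exactly why this feels like a hack.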
After the quick fix, a new mapping appeared:
{
  "celery" : {
    "mappings" : {
      "backend" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "result" : {
            "properties" : {
              "date_done" : {
                "type" : "date"
              },
              "result" : {
                "type" : "long"
              },
              "status" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "task_id" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Updated Kibana view:
Is there any way to disable serialization of results in Celery?
I could add a pull request that unpacks the JSON, just for Elasticsearch, but that looks like a hack.

Since v4.0 the default result_serializer is json, so you should have results in JSON format anyway. Maybe your configuration uses something else? In that case I suggest you remove that setting (if you use Celery >= 4.0) and you should get results in JSON format. I prefer msgpack, but then again I do not use Elasticsearch for Celery results...
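For example, a minimal configuration sketch (broker URL is a placeholder; the index/doc-type in the backend URL match the mapping shown above):

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')  # placeholder broker URL

# JSON has been the default result serializer since Celery 4.0;
# setting it explicitly just makes the intent obvious.
app.conf.result_serializer = 'json'
app.conf.result_backend = 'elasticsearch://localhost:9200/celery/backend'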

Mongodb find nested dict element

{
  "_id" : ObjectId("63920f965d15e98e3d7c450c"),
  "first_name" : "mymy",
  "last_activity" : 1669278303.4341061,
  "username" : null,
  "dates" : {
    "29.11.2022" : {
    },
    "30.11.2022" : {
    }
  },
  "user_id" : "1085116517"
}
How can I find all documents that contain 29.11.2022 as a key in dates? I tried many things, but in all of them the dot character is interpreted as something else (a path separator).
Use $getField in $expr.
db.collection.find({
  $expr: {
    $eq: [
      {},
      {
        "$getField": {
          "field": "29.11.2022",
          "input": "$dates"
        }
      }
    ]
  }
})
Mongo Playground
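If you are querying from Python, the same filter can be passed to PyMongo unchanged ($getField requires MongoDB 5.0+; the connection string, database, and collection names below are placeholders):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
coll = client["mydb"]["users"]

query = {
    "$expr": {
        "$eq": [
            {},  # the per-date value in the sample documents is an empty object
            {"$getField": {"field": "29.11.2022", "input": "$dates"}},
        ]
    }
}

for doc in coll.find(query):
    print(doc["_id"])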

Why does this CloudFormation Template script not work?

I'm trying to create a stack on AWS CloudFormation, with an EC2 instance and 2 S3 buckets. My script attempts to assign a policy to the EC2 instance that allows access to the storage bucket, but no matter what I do the rights are not assigned. Additionally, the user data is not executed at all.
I tested thoroughly whether the EC2 instance really does not have the rights: the CLI confirmed that it does not. I replaced the user data with a simple script that creates a text file; it really is not created. AWS Designer gives no complaints and shows the correct template structure. The stack creation runs and completes with no errors, except that the S3 storage bucket access and the user data don't work (no warnings).
After a LOT of manual editing and checking very carefully against the documentation, I realised I should have done this in a higher-level language. So I tried to import the script into a simple Python Troposphere script using the TemplateGenerator. This leads to the following error (no other errors are produced anywhere so far, everything just silently goes wrong, and JSON syntax validators have no complaints either):
TypeError: <class 'troposphere.iam.PolicyType'>: MickStorageS3BucketsPolicy.PolicyDocument is <class 'list'>, expected (<class 'dict'>,)
However, my PolicyDocument is clearly of type dictionary, and I don't understand how it can be interpreted as a list. I have stared at this for many hours now; I may have become blind to the problem, but I would really, really appreciate any help at this point!
The security group and inbound traffic settings do work properly, and my dockerized Flask app runs fine on the EC2 instance but just can't access the bucket (though I have to start it manually through SSH because the user data won't execute; I also tried doing that via the cfn-init section in the EC2 metadata, under commands, but nothing executes, even if I run cfn-init manually after connecting by SSH).
This is the CloudFormation template I wrote:
{
  "AWSTemplateFormatVersion" : "2010-09-09",
  "Description" : "Attach IAM Role to an EC2",
  "Parameters" : {
    "KeyName" : {
      "Description" : "EC2 Instance SSH Key",
      "Type" : "AWS::EC2::KeyPair::KeyName",
      "Default" : "MickFirstSSHKeyPair"
    },
    "InstanceType" : {
      "Description" : "EC2 instance specs configuration",
      "Type" : "String",
      "Default" : "t2.micro",
      "AllowedValues" : ["t2.micro", "t2.small", "t2.medium"]
    }
  },
  "Mappings" : {
    "AMIs" : {
      "us-east-1" : {
        "Name" : "ami-8c1be5f6"
      },
      "us-east-2" : {
        "Name" : "ami-c5062ba0"
      },
      "eu-west-1" : {
        "Name" : "ami-acd005d5"
      },
      "eu-west-3" : {
        "Name" : "ami-05b93cd5a1b552734"
      },
      "us-west-2" : {
        "Name" : "ami-0f2176987ee50226e"
      },
      "ap-southeast-2" : {
        "Name" : "ami-8536d6e7"
      }
    }
  },
  "Resources" : {
    "mickmys3storageinstance" : {
      "Type" : "AWS::S3::Bucket",
      "Properties" : {
      }
    },
    "mickmys3processedinstance" : {
      "Type" : "AWS::S3::Bucket",
      "Properties" : {
      }
    },
    "MickMainEC2" : {
      "Type" : "AWS::EC2::Instance",
      "Metadata" : {
        "AWS::CloudFormation::Init" : {
          "config" : {
            "files" : {
            },
            "commands" : {
            }
          }
        }
      },
      "Properties" : {
        "UserData": {
          "Fn::Base64" : "echo 'Heelo ww' > ~/hello.txt"
        },
        "InstanceType" : {
          "Ref" : "InstanceType"
        },
        "ImageId" : {
          "Fn::FindInMap" : [
            "AMIs",
            {
              "Ref" : "AWS::Region"
            },
            "Name"
          ]
        },
        "KeyName" : {
          "Ref" : "KeyName"
        },
        "IamInstanceProfile" : {
          "Ref" : "ListS3BucketsInstanceProfile"
        },
        "SecurityGroupIds" : [
          {
            "Ref" : "SSHAccessSG"
          },
          {
            "Ref" : "PublicAccessSG"
          }
        ],
        "Tags" : [
          {
            "Key" : "Name",
            "Value" : "MickMainEC2"
          }
        ]
      }
    },
    "SSHAccessSG" : {
      "Type" : "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Allow SSH access from anywhere",
        "SecurityGroupIngress" : [
          {
            "FromPort" : "22",
            "ToPort" : "22",
            "IpProtocol" : "tcp",
            "CidrIp" : "0.0.0.0/0"
          }
        ],
        "Tags" : [
          {
            "Key" : "Name",
            "Value" : "SSHAccessSG"
          }
        ]
      }
    },
    "PublicAccessSG" : {
      "Type" : "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Allow HTML requests from anywhere",
        "SecurityGroupIngress" : [
          {
            "FromPort" : "80",
            "ToPort" : "80",
            "IpProtocol" : "tcp",
            "CidrIp" : "0.0.0.0/0"
          }
        ],
        "Tags" : [
          {
            "Key" : "Name",
            "Value" : "PublicAccessSG"
          }
        ]
      }
    },
    "ListS3BucketsInstanceProfile" : {
      "Type" : "AWS::IAM::InstanceProfile",
      "Properties" : {
        "Path" : "/",
        "Roles" : [
          {
            "Ref" : "MickListS3BucketsRole"
          }
        ]
      }
    },
    "MickStorageS3BucketsPolicy" : {
      "Type" : "AWS::IAM::Policy",
      "Properties" : {
        "PolicyName" : "MickStorageS3BucketsPolicy",
        "PolicyDocument" : {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "ListObjectsInBucket",
              "Effect": "Allow",
              "Action": [
                "s3:ListBucket"
              ],
              "Resource": [
                "arn:aws:s3:::mickmys3storageinstance", "arn:aws:s3:::mickmys3storageinstance/*"
              ]
            },
            {
              "Sid": "AllObjectActions",
              "Effect": "Allow",
              "Action": ["s3:*Object"],
              "Resource": [
                "arn:aws:s3:::mickmys3storageinstance", "arn:aws:s3:::mickmys3storageinstance/*"
              ]
            }
          ]
        },
        "Roles" : [
          {
            "Ref" : "MickListS3BucketsRole"
          }
        ]
      }
    },
    "MickListS3BucketsRole" : {
      "Type" : "AWS::IAM::Role",
      "Properties" : {
        "AssumeRolePolicyDocument": {
          "Version" : "2012-10-17",
          "Statement" : [
            {
              "Effect" : "Allow",
              "Principal" : {
                "Service" : ["ec2.amazonaws.com"]
              },
              "Action" : [
                "sts:AssumeRole"
              ]
            }
          ]
        },
        "Path" : "/"
      }
    }
  },
  "Outputs" : {
    "EC2" : {
      "Description" : "EC2 IP address",
      "Value" : {
        "Fn::Join" : [
          "",
          [
            "ssh ec2-user@",
            {
              "Fn::GetAtt" : [
                "MickMainEC2",
                "PublicIp"
              ]
            },
            " -i ",
            {
              "Ref" : "KeyName"
            },
            ".pem"
          ]
        ]
      }
    }
  }
}
Here is my troposphere script generating the error on importing the above:
from troposphere import Ref, Template
import troposphere.ec2 as ec2
from troposphere.template_generator import TemplateGenerator
import json

with open("myStackFile.JSON") as f:
    json_template = json.load(f)

template = TemplateGenerator(json_template)
template.to_json()
print(template.to_yaml())
I expected the roles to be assigned correctly, as well as the user data to be executed. I expected troposphere to import the JSON, as it has the correct syntax and also the correct class typing according to the documentation, as far as I can see. I have double-checked everything by hand for many, many hours and I am not sure how to proceed in finding the issue with this CloudFormation script. In the future (and I would advise anyone to do the same) I will not edit JSON (or worse, YAML) files by hand any more, and will use higher-level tools exclusively.
Thank you for ANY help/pointers!
Kind regards
Your user data isn't executed because you forgot #!/bin/bash. From the documentation:
User data shell scripts must start with the #! characters and the path to the interpreter you want to read the script (commonly /bin/bash). For a great introduction on shell scripting, see the BASH Programming HOW-TO at the Linux Documentation Project (tldp.org).
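For example, the UserData section could look like this (a sketch that keeps your original echo command and just adds the interpreter line):
"UserData": {
  "Fn::Base64": {
    "Fn::Join": ["\n", [
      "#!/bin/bash",
      "echo 'Heelo ww' > ~/hello.txt"
    ]]
  }
},
Note that user data runs as root, so ~ resolves to /root.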
For the bucket permissions, I believe the issue is that you specify the CloudFormation resource name in the policy instead of the actual bucket name. If you want the bucket to actually be named mickmys3storageinstance, you need:
"mickmys3storageinstance" : {
"Type" : "AWS::S3::Bucket",
"Properties" : {
"BucketName": "mickmys3storageinstance"
}
},
Otherwise you should use Ref or Fn::Sub in the policy to get the actual bucket name.
{
  "Sid": "ListObjectsInBucket",
  "Effect": "Allow",
  "Action": [
    "s3:ListBucket"
  ],
  "Resource": [
    {"Fn::Sub": "${mickmys3storageinstance.Arn}"},
    {"Fn::Sub": "${mickmys3storageinstance.Arn}/*"}
  ]
},

Elastic Search and AWS python

I am working on AWS Elasticsearch using Python. I have a JSON file with 3 fields
("cat1", "Cat2", "cat3"), and each row is separated with \n.
Example: cat1: food, cat2: wine, cat3: lunch, etc.
from requests_aws4auth import AWS4Auth
import boto3
import requests
payload = {
    "settings": {
        "number_of_shards": 10,
        "number_of_replicas": 5
    },
    "mappings": {
        "Categoryall": {
            "properties": {
                "cat1": {
                    "type": "string"
                },
                "Cat2": {
                    "type": "string"
                },
                "cat3": {
                    "type": "string"
                }
            }
        }
    }
}
r = requests.put(url, auth=awsauth, json=payload)
I created the schema/mapping for the index as shown above, but I don't know how to populate the index.
I am thinking of looping over the JSON file and calling a POST request for each row to insert it into the index, but I don't have a clear idea how to proceed.
I want to create the index and bulk upload this file into it. Any suggestion would be appreciated.
Take a look at the Elasticsearch Bulk API.
Basically, you need to create a bulk request body and post it to your "https://{elastic-endpoint}/_bulk" URL.
The following example shows a bulk request that inserts 3 JSON records into your index called "my_index":
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "1" } }
{ "cat1" : "food 1", "cat2": "wine 1", "cat3": "lunch 1" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "2" } }
{ "cat1" : "food 2", "cat2": "wine 2", "cat3": "lunch 2" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "3" } }
{ "cat1" : "food 3", "cat2": "wine 3", "cat3": "lunch 3" }
where each JSON record is represented by 2 JSON objects: an action line followed by the document source line.
So if you write your bulk request body into a file called post-data.txt, you can post it from Python with something like this:
with open('post-data.txt', 'rb') as payload:
    r = requests.post('https://your-elastic-endpoint/_bulk', auth=awsauth,
                      data=payload, ... add more params)
Alternatively, you can try Python elasticsearch bulk helpers.
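As a rough sketch with the bulk helper from the elasticsearch Python client (the endpoint, index name, and the rows iterable are placeholders; awsauth is the AWS4Auth object from your snippet):

from elasticsearch import Elasticsearch, RequestsHttpConnection
from elasticsearch.helpers import bulk

es = Elasticsearch(
    hosts=[{"host": "your-elastic-endpoint", "port": 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# rows is assumed to be an iterable of dicts such as
# {"cat1": "food", "Cat2": "wine", "cat3": "lunch"}
actions = (
    {"_index": "my_index", "_type": "_doc", "_source": row}  # _type can be omitted on ES 7+
    for row in rows
)
bulk(es, actions)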

Unhashable type 'dict' when trying to send an Elasticsearch geo query

I'm trying to get some geodata out of ES via the following snippet:
result = es.search(
    index="loc",
    body={
        {
            "filtered" : {
                "query" : {
                    "field" : { "text" : "restaurant" }
                },
                "filter" : {
                    "geo_distance" : {
                        "distance" : "12km",
                        "location" : {
                            "lat" : 40,
                            "lon" : -70
                        }
                    }
                }
            }
        }
    }
)
The query however doesn't succeed due to the following error:
"lon" : -70
TypeError: unhashable type: 'dict'
The location field is correctly mapped to the geo_point type and the query is taken from the official examples. Is there something wrong with the way I wrote the query?
You are nesting a dict inside a set. Remove the outer curly braces to resolve the issue. The error stems from the fact that set elements (like dict keys) must be hashable, and a dict is not hashable (thanks @Matthias).
body={
    "filtered" : {
        "query" : {
            "field" : { "text" : "restaurant" }
        },
        "filter" : {
            "geo_distance" : {
                "distance" : "12km",
                "location" : {
                    "lat" : 40,
                    "lon" : -70
                }
            }
        }
    }
}
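You can reproduce the underlying error in isolation: the extra braces make Python build a set literal whose single element is a dict, and set elements must be hashable:

>>> {{"filtered": {}}}   # outer braces build a set containing a dict
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'dict'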

how to aggregate on each item in collection in mongoDB

MongoDB noob here...
When I do db.students.find().pretty() in the shell, I get a long list from my collection... like so:
{
  "_id" : 19,
  "name" : "Gisela Levin",
  "scores" : [
    {
      "type" : "exam",
      "score" : 44.51211101958831
    },
    {
      "type" : "quiz",
      "score" : 0.6578497966368002
    },
    {
      "type" : "homework",
      "score" : 93.36341655949683
    },
    {
      "type" : "homework",
      "score" : 49.43132782777443
    }
  ]
}
Now I've got over 100 of these... I need to run the following on each of them:
lowest_hw_score = db.students.aggregate(
    // Initial document match (uses index, if a suitable one is available)
    { $match: {
        _id : 0
    }},
    // Expand the scores array into a stream of documents
    { $unwind: '$scores' },
    // Filter to 'homework' scores
    { $match: {
        'scores.type': 'homework'
    }},
    // Sort in ascending order (lowest score first)
    { $sort: {
        'scores.score': 1
    }},
    { $limit: 1 }
)
So I can run something like this on each result:
for item in lowest_hw_score:
    print(item)
Right now "lowest_hw_score" works on only one item; I want to run this on all items in the collection... how do I do this?
> db.students.aggregate(
    { $match : { 'scores.type': 'homework' } },
    { $unwind: "$scores" },
    { $match: { "scores.type": "homework" } },
    { $group: {
        _id : "$_id",
        maxScore : { $max : "$scores.score" },
        minScore : { $min : "$scores.score" }
    }
});
You don't really need the first $match, but if "scores.type" is indexed, it means the index would be used before unwinding the scores. (I don't believe Mongo would be able to use the index after the $unwind.)
Result:
{
  "result" : [
    {
      "_id" : 19,
      "maxScore" : 93.36341655949683,
      "minScore" : 49.43132782777443
    }
  ],
  "ok" : 1
}
Edit: tested and updated in mongo shell
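If you are running this from Python rather than the mongo shell, a PyMongo equivalent could look like this (connection string and database name are placeholders):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["school"]  # placeholder connection/database

pipeline = [
    {"$match": {"scores.type": "homework"}},   # optional pre-filter, lets an index be used
    {"$unwind": "$scores"},
    {"$match": {"scores.type": "homework"}},
    {"$group": {
        "_id": "$_id",
        "maxScore": {"$max": "$scores.score"},
        "minScore": {"$min": "$scores.score"},
    }},
]

for doc in db.students.aggregate(pipeline):
    print(doc)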
