How to write a Query DSL query to find unique error messages from syslog data? (Python)

Is there a way to configure the Elasticsearch analyzer so that it is possible to get unique error messages in the following scenarios?
1. Messages that are identical apart from the timestamp:
"...July 2020 23:00:00.674Z... same message....."
2. Messages with slight changes in the string:
message1: "....message_details.. (unknown error 20004)"
message2: "....message_details.. (unknown error 278945)"
or
message1: "....a::::: message_details ...."
message2: "....a:f23ed:fff:ff:: message_details ...."
The two messages in each pair are the same apart from a small character difference.
Here is the query:
GET log_stash_2020.06.16/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "message": "Error"
          }
        },
        {
          "match_phrase": {
            "type": "lab_id"
          }
        }
      ]
    }
  },
  "aggs": {
    "log_message": {
      "significant_text": {
        "field": "message",
        "filter_duplicate_text": "true"
      }
    }
  },
  "size": 1000
}
I have added two sample log documents below.
{
"_index" : "logstash_2020.06.16",
"_type" : "doc",
"_id" : "################",
"_score" : 1.0,
"_source" : {
"logsource" : "router_id",
"timestamp" : "Jun 15 20:00:00",
"program" : "some_program",
"host" : "#############",
"priority" : "27",
"#timestamp" : "2020-06-16T00:00:01.020Z",
"type" : "lab_id",
"pid" : "####",
"message" : ": ############### send failed with error: ENOENT -- Item not found (No error: 0)",
"#version" : "1"
}
}
{
"_index" : "logstash_2020.06.16",
"_type" : "doc",
"_id" : "################",
"_score" : 1.0,
"_source" : {
"host" : "################",
"#timestamp" : "2020-06-16T00:00:02.274Z",
"type" : "####",
"tags" : [
"_grokparsefailure"
],
"message" : "################:Jun 15 20:00:18.908 EDT: mediasvr[2546]: %MEDIASVR-MEDIASVR-4-PARTITION_USAGE_ALERT : High disk usage alert : host ##### exceeded 100% \n",
"#version" : "1"
}
}
Is there a way to do this in Python (if Elasticsearch does not have the above-mentioned functionality)?

You can use the Elasticsearch Python client like so:
from elasticsearch import Elasticsearch
es = Elasticsearch(...)
resp = es.search(index="log_stash_2020.06.16", body={<dsl query>})
print(resp)
where <dsl query> is whatever query you want to run, like the one you gave in the question.
<disclosure: I'm the maintainer of the Elasticsearch client and employed by Elastic>
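To expand on that a little: the buckets returned by the significant_text aggregation can be printed directly, and if the aggregation alone does not collapse messages that differ only in timestamps or IDs, you can finish the job with some Python-side normalisation. Below is a rough sketch, not a definitive solution; the endpoint, the field names and the regular expressions are assumptions based on the samples in the question:

import re

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

# The same request body as in the question: bool query plus significant_text aggregation.
query = {
    "query": {"bool": {"must": [
        {"match_phrase": {"message": "Error"}},
        {"match_phrase": {"type": "lab_id"}},
    ]}},
    "aggs": {"log_message": {"significant_text": {"field": "message", "filter_duplicate_text": True}}},
    "size": 1000,
}

resp = es.search(index="log_stash_2020.06.16", body=query)

# Distinctive message fragments suggested by the significant_text aggregation.
for bucket in resp["aggregations"]["log_message"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])

# Fallback: normalise the raw messages in Python and deduplicate with a set.
def normalize(message):
    # Collapse hex/colon runs (addresses) and digit runs (timestamps, error codes),
    # so messages that differ only in those parts compare equal.
    message = re.sub(r"[0-9a-fA-F]*:[0-9a-fA-F:]+", ":", message)
    message = re.sub(r"\d+", "N", message)
    return message

unique_messages = {normalize(hit["_source"]["message"]) for hit in resp["hits"]["hits"]}
for message in sorted(unique_messages):
    print(message)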

Related

Logstash codec and character encoding problem

I send logs from a desktop Python application (Python 3.6) to Logstash (7.5.0). When I log an error message, for example with the text ">>>>>>>> ERROR <<<<<<<", I see the following entry in the Logstash log file:
[2020-01-22T13:25:02,330][WARN ][logstash.codecs.line ][main] Received an event that has a different character encoding than you configured. {:text=>"\u0000\u0000\u0000MainThreadq\u001AX\v\u0000\u0000\u0000processNameq\eX\v\u0000\u0000\u0000MainProcessq\u001CX\a\u0000\u0000\u0000processq\u001DM\u001D\xEDu.\u0000\u0000\u0002\u001D}q\u0000(X\u0004\u0000\u0000\u0000nameq\u0001X\b\u0000\u0000\u0000__main__q\u0002X\u0003\u0000\u0000\u0000msgq\u0003X\u0018\u0000\u0000\u0000>>>>>>>> ERROR <<<<<<"UTF-8"}
And in Kibana, when I query the received messages, I see that several (in this case, 6) separate documents have been indexed per log message that I sent (in this case, ">>>>>>>> ERROR <<<<<<<"), as follows:
{
"_index" : "logstash-2020.01.23",
"_type" : "doc",
"_id" : "lNXhz28BzTlrr0WBIjwA",
"_score" : 1.0,
"_source" : {
"host" : "localhost",
"port" : 50197,
"message" : """\u0000\u0000\u0000stack_infoq\u0011NX\u0006\u0000\u0000\u0000linenoq\u0012K'X\b\u0000\u0000\u0000funcNameq\u0013X\b\u0000\u0000\u0000<module>q\u0014X\a\u0000\u0000\u0000createdq\u0015GA\u05CA;vϯWX\u0005\u0000\u0000\u0000msecsq\u0016G#n\xA2u\xEC\u0000\u0000\u0000X\u000F\u0000\u0000\u0000relativeCreatedq\u0017G#E\u001DM\xD0\u0000\u0000\u0000X\u0006\u0000\u0000\u0000threadq\u0018L4437804480L""",
"#version" : "1",
"#timestamp" : "2020-01-23T00:50:35.362Z"
}
},
{
"_index" : "logstash-2020.01.23",
"_type" : "doc",
"_id" : "k9Xhz28BzTlrr0WBITyc",
"_score" : 1.0,
"_source" : {
"host" : "localhost",
"port" : 50197,
"message" : """threadNameqX""",
"#version" : "1",
"#timestamp" : "2020-01-23T00:50:35.362Z"
}
},
{
"_index" : "logstash-2020.01.23",
"_type" : "doc",
"_id" : "kdXhz28BzTlrr0WBITyc",
"_score" : 1.0,
"_source" : {
"host" : "localhost",
"port" : 50197,
"message" : """MainThreadqXprocessNameqXMainProcessqXprocessqMC0u.""",
"#version" : "1",
"#timestamp" : "2020-01-23T00:50:35.369Z"
}
},
{
"_index" : "logstash-2020.01.23",
"_type" : "doc",
"_id" : "ktXhz28BzTlrr0WBITyc",
"_score" : 1.0,
"_source" : {
"host" : "localhost",
"port" : 50197,
"message" : "X",
"#version" : "1",
"#timestamp" : "2020-01-23T00:50:35.362Z"
}
},
{
"_index" : "logstash-2020.01.23",
"_type" : "doc",
"_id" : "j9Xhz28BzTlrr0WBITyc",
"_score" : 1.0,
"_source" : {
"host" : "localhost",
"port" : 50197,
"message" : """XfilenameqXtest2.pyqXmoduleq
Xtest2qXexc_infoqNXexc_textqNX""",
"#version" : "1",
"#timestamp" : "2020-01-23T00:50:35.345Z"
}
},
{
"_index" : "logstash-2020.01.23",
"_type" : "doc",
"_id" : "kNXhz28BzTlrr0WBITyc",
"_score" : 1.0,
"_source" : {
"host" : "localhost",
"port" : 50197,
"message" : """}q(XnameqX__main__qXmsgqX>>>>>>>> ERROR <<<<<<<qXargsqNX levelnameqXERRORqXlevelnoqK2Xpathnameq X1/Users/e0h014b/PycharmProjects/logstash2/test2.pyq""",
"#version" : "1",
"#timestamp" : "2020-01-23T00:50:35.331Z"
}
}
The Logstash config file I'm using is the following:
input {
  tcp {
    port => 5959
    codec => plain {
      charset => "UTF-8"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
What should I do to have a normal format of logging in Logstash? Which codec and character encoding are proper in this application?
Thanks,
Elahe
If your log messages just contain simple lines, you should go with the default codec, namely line.
I would always start with the default codec, test, verify the indexed data, and then fine-tune or change the codec if necessary.
Refer to this documentation about all available codecs.
I hope I could help you.
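For what it's worth, the garbled payloads in the question look like pickled log records from Python's logging.handlers.SocketHandler (that is an assumption on my part, not something the question confirms). The line codec can only split events correctly if the sender emits plain, newline-terminated text, so one option is a small hand-rolled handler on the Python side. A minimal, hypothetical sketch, matching the tcp input on port 5959 from the config above:

import logging
import socket

class PlainLineTcpHandler(logging.Handler):
    """Send each log record as one plain UTF-8 line, so Logstash's line codec can split it."""

    def __init__(self, host, port):
        super().__init__()
        self.sock = socket.create_connection((host, port))

    def emit(self, record):
        line = self.format(record) + "\n"
        self.sock.sendall(line.encode("utf-8"))

logger = logging.getLogger(__name__)
handler = PlainLineTcpHandler("localhost", 5959)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
logger.addHandler(handler)
logger.error(">>>>>>>> ERROR <<<<<<<")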

Why does this CloudFormation Template script not work?

I'm trying to create a stack on AWS CloudFormation, with an EC2 instance and 2 S3 buckets. My script is attempting to assign a Policy to the EC2 instance that allows access to the Storage bucket, but no matter what I do the rights are not assigned. Additionally, the userdata is not executed at all.
I tested thoroughly whether the EC2 instance really does not have the rights: the CLI confirmed that it does not. I replaced the user data with a simple script that creates a text file, and the file really is not created. AWS Designer gives no complaints and shows the correct template structure. The stack runs and executes with no errors, except that the S3 storage bucket access and the user data don't work (no warnings).
After a LOT of manual editing and checking very carefully against the documentation, I realised I should have done this in a higher-level language. Therefore I tried to import the template into a simple Python Troposphere script using the TemplateGenerator. This leads to the following error (no other errors are reported anywhere so far, everything just silently goes wrong, and JSON syntax validators also have no complaints):
TypeError: <class 'troposphere.iam.PolicyType'>: MickStorageS3BucketsPolicy.PolicyDocument is <class 'list'>, expected (<class 'dict'>,)
However, my PolicyDocument is clearly of type dictionary, and I don't understand how it can be interpreted as a list. I have stared at this for many hours now; I may have become blind to the problem, but I would really appreciate any help at this point!
The security group and inbound traffic settings do work properly, and my dockerized Flask app runs fine on the EC2 instance, but it can't access the bucket. (I have to start the app manually over SSH because the user data won't execute; I also tried running it through the cfn-init commands in the EC2 metadata, but nothing executes, even if I run cfn-init manually after connecting by SSH.)
This is the cloudformation template I wrote:
{
"AWSTemplateFormatVersion" : "2010-09-09",
"Description" : "Attach IAM Role to an EC2",
"Parameters" : {
"KeyName" : {
"Description" : "EC2 Instance SSH Key",
"Type" : "AWS::EC2::KeyPair::KeyName",
"Default" : "MickFirstSSHKeyPair"
},
"InstanceType" : {
"Description" : "EC2 instance specs configuration",
"Type" : "String",
"Default" : "t2.micro",
"AllowedValues" : ["t2.micro", "t2.small", "t2.medium"]
}
},
"Mappings" : {
"AMIs" : {
"us-east-1" : {
"Name" : "ami-8c1be5f6"
},
"us-east-2" : {
"Name" : "ami-c5062ba0"
},
"eu-west-1" : {
"Name" : "ami-acd005d5"
},
"eu-west-3" : {
"Name" : "ami-05b93cd5a1b552734"
},
"us-west-2" : {
"Name" : "ami-0f2176987ee50226e"
},
"ap-southeast-2" : {
"Name" : "ami-8536d6e7"
}
}
},
"Resources" : {
"mickmys3storageinstance" : {
"Type" : "AWS::S3::Bucket",
"Properties" : {
}
},
"mickmys3processedinstance" : {
"Type" : "AWS::S3::Bucket",
"Properties" : {
}
},
"MickMainEC2" : {
"Type" : "AWS::EC2::Instance",
"Metadata" : {
"AWS::CloudFormation::Init" : {
"config" : {
"files" : {
},
"commands" : {
}
}
}
},
"Properties" : {
"UserData": {
"Fn::Base64" : "echo 'Heelo ww' > ~/hello.txt"
},
"InstanceType" : {
"Ref" : "InstanceType"
},
"ImageId" : {
"Fn::FindInMap" : [
"AMIs",
{
"Ref" : "AWS::Region"
},
"Name"
]
},
"KeyName" : {
"Ref" : "KeyName"
},
"IamInstanceProfile" : {
"Ref" : "ListS3BucketsInstanceProfile"
},
"SecurityGroupIds" : [
{
"Ref" : "SSHAccessSG"
},
{
"Ref" : "PublicAccessSG"
}
],
"Tags" : [
{
"Key" : "Name",
"Value" : "MickMainEC2"
}
]
}
},
"SSHAccessSG" : {
"Type" : "AWS::EC2::SecurityGroup",
"Properties" : {
"GroupDescription" : "Allow SSH access from anywhere",
"SecurityGroupIngress" : [
{
"FromPort" : "22",
"ToPort" : "22",
"IpProtocol" : "tcp",
"CidrIp" : "0.0.0.0/0"
}
],
"Tags" : [
{
"Key" : "Name",
"Value" : "SSHAccessSG"
}
]
}
},
"PublicAccessSG" : {
"Type" : "AWS::EC2::SecurityGroup",
"Properties" : {
"GroupDescription" : "Allow HTML requests from anywhere",
"SecurityGroupIngress" : [
{
"FromPort" : "80",
"ToPort" : "80",
"IpProtocol" : "tcp",
"CidrIp" : "0.0.0.0/0"
}
],
"Tags" : [
{
"Key" : "Name",
"Value" : "PublicAccessSG"
}
]
}
},
"ListS3BucketsInstanceProfile" : {
"Type" : "AWS::IAM::InstanceProfile",
"Properties" : {
"Path" : "/",
"Roles" : [
{
"Ref" : "MickListS3BucketsRole"
}
]
}
},
"MickStorageS3BucketsPolicy" : {
"Type" : "AWS::IAM::Policy",
"Properties" : {
"PolicyName" : "MickStorageS3BucketsPolicy",
"PolicyDocument" : {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ListObjectsInBucket",
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::mickmys3storageinstance", "arn:aws:s3:::mickmys3storageinstance/*"
]
},
{
"Sid": "AllObjectActions",
"Effect": "Allow",
"Action": ["s3:*Object"],
"Resource": [
"arn:aws:s3:::mickmys3storageinstance", "arn:aws:s3:::mickmys3storageinstance/*"
]
}
]
},
"Roles" : [
{
"Ref" : "MickListS3BucketsRole"
}
]
}
},
"MickListS3BucketsRole" : {
"Type" : "AWS::IAM::Role",
"Properties" : {
"AssumeRolePolicyDocument": {
"Version" : "2012-10-17",
"Statement" : [
{
"Effect" : "Allow",
"Principal" : {
"Service" : ["ec2.amazonaws.com"]
},
"Action" : [
"sts:AssumeRole"
]
}
]
},
"Path" : "/"
}
}
},
"Outputs" : {
"EC2" : {
"Description" : "EC2 IP address",
"Value" : {
"Fn::Join" : [
"",
[
"ssh ec2-user#",
{
"Fn::GetAtt" : [
"MickMainEC2",
"PublicIp"
]
},
" -i ",
{
"Ref" : "KeyName"
},
".pem"
]
]
}
}
}
}
Here is my troposphere script generating the error on importing the above:
from troposphere import Ref, Template
import troposphere.ec2 as ec2
from troposphere.template_generator import TemplateGenerator
import json
with open("myStackFile.JSON") as f:
json_template = json.load(f)
template = TemplateGenerator(json_template)
template.to_json()
print(template.to_yaml())
I expected the roles to be assigned correctly and the user data to be executed. I expected Troposphere to import the JSON, since it has correct syntax and, as far as I can see, the correct class types according to the documentation. I have double-checked everything by hand for many hours and I am not sure how to proceed in finding the issue with this CloudFormation script. In the future (and I would advise anyone to do the same) I will not edit JSON (or worse, YAML) files by hand any more, and will use higher-level tools exclusively.
Thank you for ANY help/pointers!
Kind regards
Your user data isn't executed because you forgot #!/bin/bash. From the documentation:
User data shell scripts must start with the #! characters and the path to the interpreter you want to read the script (commonly /bin/bash). For a great introduction on shell scripting, see the BASH Programming HOW-TO at the Linux Documentation Project (tldp.org).
For the bucket permissions, I believe the issue is you specify the CloudFormation resource name in the policy instead of the actual bucket name. If you want the bucket to actually be named mickmys3storageinstance, you need:
"mickmys3storageinstance" : {
"Type" : "AWS::S3::Bucket",
"Properties" : {
"BucketName": "mickmys3storageinstance"
}
},
Otherwise you should use Ref or Fn::Sub in the policy to get the actual bucket name.
{
  "Sid": "ListObjectsInBucket",
  "Effect": "Allow",
  "Action": [
    "s3:ListBucket"
  ],
  "Resource": [
    {"Fn::Sub": "${mickmys3storageinstance.Arn}"},
    {"Fn::Sub": "${mickmys3storageinstance.Arn}/*"}
  ]
},
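Since you were already moving to Troposphere, here is a rough, illustrative sketch of the relevant pieces in Python rather than hand-edited JSON. It is not a drop-in replacement for your template: the AMI ID and the user-data command are placeholders, and it only shows the two fixes discussed here (a shebang in the user data, and GetAtt/Sub to reference the real bucket ARN):

from troposphere import Base64, GetAtt, Ref, Sub, Template
import troposphere.ec2 as ec2
import troposphere.iam as iam
import troposphere.s3 as s3

t = Template()

bucket = t.add_resource(s3.Bucket("mickmys3storageinstance"))

role = t.add_resource(iam.Role(
    "MickListS3BucketsRole",
    Path="/",
    AssumeRolePolicyDocument={
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": ["ec2.amazonaws.com"]},
            "Action": ["sts:AssumeRole"],
        }],
    },
))

t.add_resource(iam.PolicyType(
    "MickStorageS3BucketsPolicy",
    PolicyName="MickStorageS3BucketsPolicy",
    PolicyDocument={
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListObjectsInBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [GetAtt(bucket, "Arn")],
            },
            {
                "Sid": "AllObjectActions",
                "Effect": "Allow",
                "Action": ["s3:*Object"],
                "Resource": [Sub("${mickmys3storageinstance.Arn}/*")],
            },
        ],
    },
    Roles=[Ref(role)],
))

t.add_resource(ec2.Instance(
    "MickMainEC2",
    ImageId="ami-0f2176987ee50226e",  # placeholder; pick the AMI for your region
    InstanceType="t2.micro",
    # User data must start with a shebang, as explained above.
    UserData=Base64("#!/bin/bash\necho 'Heelo ww' > /root/hello.txt"),
))

print(t.to_json())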

Matching / Mapping lists with elasticsearch

There is a list in MongoDB, e.g.:
db_name = "Test"
collection_name = "Map"
db.Map.findOne()
{
"_id" : ObjectId(...),
"Id" : "576",
"FirstName" : "xyz",
"LastName" : "abc",
"skills" : [
"C++",
"Java",
"Python",
"MongoDB",
]
}
There is a list in an Elasticsearch index (I am using Kibana to execute the queries):
GET /user/_search
{
"took" : 31,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "xyz abc"
"Age" : 21,
"skills" : [
"C++",
"Java",
"Python",
"MongoDB",
]
}
},
]
}
}
Can anyone help with the Elasticsearch query that will match both records based on skills?
I am using Python to write the code.
If a match is found, I am trying to get the first name and last name of that user:
First name : "xyz"
Last name : "abc"
Assuming you are indexing all of these documents in Elasticsearch, and you want to match documents whose skills contain both java and mongodb, the query will be:
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "skills": "mongodb"
          }
        },
        {
          "term": {
            "skills": "java"
          }
        }
      ]
    }
  }
}
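To run this from Python and pull the names out of the hits, here is a minimal sketch with the official client. The endpoint is a placeholder, the index name customer is taken from the sample response, and splitting the first/last name out of the single name field is an assumption about how the data was indexed:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"skills": "mongodb"}},
                {"term": {"skills": "java"}},
            ]
        }
    }
}

resp = es.search(index="customer", body=query)
for hit in resp["hits"]["hits"]:
    # The indexed document only has a combined "name" field, so split it.
    first_name, _, last_name = hit["_source"]["name"].partition(" ")
    print("First name :", first_name)
    print("Last name :", last_name)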

Elasticsearch and AWS (Python)

I am working on AWS Elasticsearch using Python. I have a JSON file with 3 fields
("cat1", "Cat2", "cat3"); each row is separated by \n.
Example: cat1: food, cat2: wine, cat3: lunch, etc.
from requests_aws4auth import AWS4Auth
import boto3
import requests
payload = {
    "settings": {
        "number_of_shards": 10,
        "number_of_replicas": 5
    },
    "mappings": {
        "Categoryall": {
            "properties": {
                "cat1": {
                    "type": "string"
                },
                "Cat2": {
                    "type": "string"
                },
                "cat3": {
                    "type": "string"
                }
            }
        }
    }
}
r = requests.put(url, auth=awsauth, json=payload)
I created the schema/mapping for the index as shown above, but I don't know how to populate the index.
I am thinking of looping over the JSON file and calling a POST request for each record, but I don't know how to proceed.
I want to create the index and bulk upload this file into it. Any suggestion would be appreciated.
Take a look at Elasticsearch Bulk API.
Basically, you need to create a bulk request body and post it to your "https://{elastic-endpoint}/_bulk" url.
The following example is showing a bulk request to insert 3 json records into your index called "my_index":
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "1" } }
{ "cat1" : "food 1", "cat2": "wine 1", "cat3": "lunch 1" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "2" } }
{ "cat1" : "food 2", "cat2": "wine 2", "cat3": "lunch 2" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "3" } }
{ "cat1" : "food 3", "cat2": "wine 3", "cat3": "lunch 3" }
where each json record is represented by 2 json objects.
So if you write your bulk request body into a file called post-data.txt, then you can post it using Python something like this:
with open('post-data.txt', 'rb') as payload:
    r = requests.post('https://your-elastic-endpoint/_bulk', auth=awsauth,
                      data=payload, ... add more params)
Alternatively, you can try Python elasticsearch bulk helpers.
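For example, here is a rough sketch using elasticsearch.helpers.bulk, assuming the file contains one JSON object per line with your three category fields; the endpoint, region, credentials, index name and file name are all placeholders:

import json

from elasticsearch import Elasticsearch, RequestsHttpConnection
from elasticsearch.helpers import bulk
from requests_aws4auth import AWS4Auth

awsauth = AWS4Auth("ACCESS_KEY", "SECRET_KEY", "us-east-1", "es")  # placeholder credentials

es = Elasticsearch(
    hosts=[{"host": "your-elastic-endpoint", "port": 443}],  # placeholder endpoint
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

def actions(path):
    # Yield one bulk action per newline-delimited JSON record in the file.
    with open(path) as f:
        for line in f:
            if line.strip():
                yield {"_index": "my_index", "_type": "_doc", "_source": json.loads(line)}

bulk(es, actions("categories.json"))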

post request with \n-delimited JSON in python

I'm trying to use the bulk API from Elasticsearch, and I see that this can be done using the following request, which is special because what is given as "data" is not proper JSON, but JSON objects delimited by \n.
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d '
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
'
My question is: how can I perform such a request within Python? The authors of Elasticsearch suggest not pretty-printing the JSON, but I'm not sure what that means (see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
I know that this is a valid python request
import requests
import json
data = json.dumps({"field":"value"})
r = requests.post("localhost:9200/_bulk?pretty", data=data)
But what do I do if the JSON is \n-delimited?
What this really is is a set of individual JSON documents, joined together with newlines. So you could do something like this:
data = [
    { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } },
    { "field1" : "value1" },
    { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } },
    { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } },
    { "field1" : "value3" },
    { "update" : { "_id" : "1", "_type" : "type1", "_index" : "test" } },
    { "doc" : { "field2" : "value2" } }
]
data_to_post = '\n'.join(json.dumps(d) for d in data)
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post)
However, as pointed out in the comments, the Elasticsearch Python client is likely to be more useful.
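If you do go through the official client, a rough sketch of the same bulk call, reusing the data list defined above (a local cluster is assumed; as far as I know the client accepts a list of action/source dicts and serializes it to newline-delimited JSON itself, and it also accepts the pre-joined data_to_post string):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# The client is expected to join the items with newlines and add the trailing newline.
resp = es.bulk(body=data)
print(resp["errors"])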
As a follow-up to Daniel's answer above, I had to add an additional '\n' to the end of the data_to_post, and add a {Content-Type: application/x-ndjson} header to get it to work in Elasticsearch 6.3.
data_to_post = '\n'.join(json.dumps(d) for d in data) + "\n"
headers = {"Content-Type": "application/x-ndjson"}
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post, headers=headers)
Otherwise, I will receive the error:
"The bulk request must be terminated by a newline [\\n]"
You can use the Python ndjson library to do it.
https://pypi.org/project/ndjson/
It contains JSONEncoder and JSONDecoder classes for easy use with other libraries, such as requests:
import ndjson
import requests
response = requests.get('https://example.com/api/data')
items = response.json(cls=ndjson.Decoder)
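For the encoding direction (building a bulk request body rather than parsing a response), the same library's dumps joins a list of dicts with newlines. A small, illustrative sketch combining it with the Content-Type header and trailing newline mentioned in the answer above (the endpoint and sample data are placeholders):

import ndjson
import requests

data = [
    {"index": {"_index": "test", "_type": "type1", "_id": "1"}},
    {"field1": "value1"},
]

body = ndjson.dumps(data) + "\n"  # the bulk body must end with a newline
headers = {"Content-Type": "application/x-ndjson"}
r = requests.post("http://localhost:9200/_bulk?pretty", data=body, headers=headers)
print(r.json())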
