Grouping and summing CloudWatch Logs Insights query results - Python

I have about 10k logs from Logs Insights in the below format (I cannot post actual logs due to privacy rules). I am using boto3 to query the logs.
Log insights query:
filter @message like /ERROR/
Output Logs format:
timestamp:ERROR <some details>Apache error....<error details>
timestamp:ERROR <some details>Connection error.... <error details>
timestamp:ERROR <some details>Database error....<error details>
What I need is to group the errors that share a similar substring (e.g. group by Connection error, Apache error, Database error, or any other similar errors) and get a count of each.
Expected output:
Apache error 130
Database error 2253
Connection error 3120
Is there some regex or any other way I can use to pull out similar substrings, group them, and get the counts? Either in Python or in Logs Insights.

It's impossible to say without seeing the source of your data, but you can extract values from logs with a Logs Insights query like:
filter @logStream like 'SOMEHOST'
| parse @message /<EventID.*?>(?<event_id>\d+)<\/EventID>/
| stats count() by event_id
In this case I was parsing Windows event logs to count how many of each event type occurred:
| event_id | count() |
|----------|---------|
| 7036     | 80      |
| 7001     | 4       |
| 7002     | 4       |
| 6013     | 1       |
| 7039     | 1       |
| 7009     | 1       |
| 7000     | 1       |
| 7040     | 2       |
| 7045     | 1       |
This query just looked for the EventID XML element. In your case you would need to look at your data to see how best to identify and extract the error. If the error is in a format that has a field, you can extract it directly; even if there is no field, you can still use a regex as long as there is a pattern to the data.

If you had the logs inside a list, you could accomplish this with a regex like this:
import re

logs = [(errors here)]

error_counts = {}
for log in logs:
    # Captures everything between "ERROR" and the trailing "...";
    # tighten the pattern if the details before the error name vary
    match = re.search(r'ERROR\s+(.*?)\s*\.\.\.', log)
    if match:
        error_type = match.group(1)
        error_counts[error_type] = error_counts.get(error_type, 0) + 1

for error_type, count in error_counts.items():
    print(error_type.ljust(20), count)
If you're using the AWS SDK for Python (boto3), you can submit a Logs Insights query with the client's start_query method, putting the ERROR filter and the stats count() aggregation into the query string, and then fetch the grouped counts with get_query_results.
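If you go the boto3 route, note that get_query_results returns each row as a list of {"field": ..., "value": ...} cells rather than plain values. A small helper can reshape them into the expected output; this is a sketch, and the error_type field name assumes your query ended with stats count() by error_type:

```python
def rows_to_counts(results):
    # Each row is a list of {"field": ..., "value": ...} cells,
    # which we flatten into an {error_type: count} dict
    counts = {}
    for row in results:
        cells = {cell["field"]: cell["value"] for cell in row}
        counts[cells["error_type"]] = int(cells["count()"])
    return counts

# Sample payload shaped like the "results" list of get_query_results
sample = [
    [{"field": "error_type", "value": "Apache error"},
     {"field": "count()", "value": "130"}],
    [{"field": "error_type", "value": "Database error"},
     {"field": "count()", "value": "2253"}],
]
print(rows_to_counts(sample))
```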


django list model entry with multiple references

I have the following models which represent songs and the plays of each song:
from django.db import models

class Play(models.Model):
    play_day = models.PositiveIntegerField()
    source = models.CharField(
        'source',
        max_length=20,
        choices=(('radio', 'Radio'), ('streaming', 'Streaming')),
    )
    # Referencing Song by name avoids the forward reference to the class below
    song = models.ForeignKey('Song', verbose_name='song')

class Song(models.Model):
    name = models.CharField('Name', max_length=200)
Imagine I have the following entries:
Songs:
| ID | name                |
|----|---------------------|
| 1  | Stairway to Heaven  |
| 2  | Riders on the Storm |
Plays:
| ID | play_day | source    | song_id |
|----|----------|-----------|---------|
| 1  | 2081030  | radio     | 1       |
| 1  | 2081030  | streaming | 1       |
| 2  | 2081030  | streaming | 2       |
I would like to list all the tracks as follows:
| Name | Day | Sources |
|---------------------|------------|------------------|
| Stairway to Heaven | 2018-10-30 | Radio, Streaming |
| Riders on the Storm | 2018-10-30 | Streaming |
I am using Django==1.9.2, django_tables2==1.1.6 and django-filter==0.13.0 with PostgreSQL.
Problem:
I'm using Song as the model of the table and the filter, so the queryset starts with a select FROM song. However, when joining the Play table, I get two entries in the case of "Stairway to Heaven" (I know, even one is too much: https://www.youtube.com/watch?v=RD1KqbDdmuE).
What I tried:
I tried putting a distinct on the Song, though this yields the problem that I cannot sort on columns other than Song.id (supposing I distinct on that column).
Aggregate: this yields a final state (actually, a dictionary), which cannot be used with django_tables2.
I found this solution for PostgreSQL, Selecting rows ordered by some column and distinct on another, though I don't know how to do this with Django.
Question:
What would be the right approach to show one track per line "aggregating" information from references using Django's ORM?
I think that the proper way to do it is to use the array_agg PostgreSQL function (http://postgresql.org/docs/9.5/static/functions-aggregate.html and http://lorenstewart.me/2017/12/03/postgresqls-array_agg-function).
Django actually seems to support this (in v2.1 at least, via ArrayAgg: http://docs.djangoproject.com/en/2.1/ref/contrib/postgres/aggregates/), so that seems like the way to go.
Unfortunately I don't have time to test it right now, so I can't provide a thorough answer; however, try something like: Song.objects.all().annotate(ArrayAgg(...))
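For what it's worth, something like Song.objects.annotate(sources=ArrayAgg('play__source')) should be the shape of the call ('play' being Django's default related query name here, so treat that as an assumption). What array_agg does inside PostgreSQL is collapse the joined Play rows into one list per song; in plain Python over the question's sample rows, the aggregation behaves like:

```python
from itertools import groupby

# Sample (name, day, source) rows joined from the question's tables;
# groupby needs the rows already sorted/grouped by the key
plays = [
    ("Stairway to Heaven", "2018-10-30", "Radio"),
    ("Stairway to Heaven", "2018-10-30", "Streaming"),
    ("Riders on the Storm", "2018-10-30", "Streaming"),
]

# Collapse consecutive rows with the same (name, day) into one row,
# joining the sources, mimicking what array_agg does in SQL
table = []
for (name, day), group in groupby(plays, key=lambda p: (p[0], p[1])):
    table.append((name, day, ", ".join(p[2] for p in group)))
print(table)
```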

Create a search function using Dynamodb and boto3

I'm trying to understand how to create a search function using DynamoDB. This answer helped me understand the use of Global Secondary Indexes better, but I still have some questions. Suppose we have a structure like this and a GSI called last_name_index:
+------+-----------+----------+---------------+
| User | FirstName | LastName | Email         |
+------+-----------+----------+---------------+
| 1001 | Test      | Test     | test@mail.com |
| 1002 | Jonh      | Doe      | jdoe@mail.com |
| 1003 | Another   | Test     | mail@mail.com |
+------+-----------+----------+---------------+
Using boto3 I can search now for a user if I know the last name:
table.query(
    IndexName="last_name_index",
    KeyConditionExpression=Key('LastName').eq(name)
)
But what if I want to search for users and I only know part of the last name? I know there is a contains condition in boto3, but that only works on non-key attributes. Do I need to change the GSI? Or is there something I'm missing? I want to be able to do something like:
table.query(
    IndexName="last_name_index",
    KeyConditionExpression=Key('LastName').contains(name)  # part of the name
)

Flask SQLAlchemy sum function comparison

I have a table that holds some data about users. There are two fields there, like and smile. I need to get data from the table, grouped by user_id, that shows whether a user has likes or smiles. The query I would write in SQL looks like:
select sum(smile) > 0 as has_smile,
       sum(like) > 0 as has_like,
       user_id
from ratings
group by user_id
This would provide output like:
| has_smile | has_like | user_id |
|-----------|----------|---------|
| 1         | 0        | 1       |
| 1         | 1        | 2       |
Is there any chance this query can be translated to SQLAlchemy (Flask-SQLAlchemy, to be precise)? I know there is db.func.sum, but I don't know how to add the comparison there and give it a label. What I have for now is:
cls.query.with_entities("user_id").group_by(user_id).\
    add_columns(db.func.sum(cls.smile).label("has_smile"),
                db.func.sum(cls.like).label("has_like")).all()
but that returns the exact number of smiles/likes instead of just 1/0 depending on whether there is a smile/like.
Thanks to operator overloading, you do the comparison the way you're used to doing it in Python in general:
db.func.sum(cls.smile) > 0
which produces an SQL expression object that you can then give a label to:
(db.func.sum(cls.smile) > 0).label('has_smile')
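The labelled expression compiles to the same SQL as in the question. As a sanity check of the semantics, here's a stdlib sqlite3 sketch over invented rows (note that like is a reserved word in SQL, so the column name has to be quoted in raw statements):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "like" is an SQL keyword, so the column name must be quoted
conn.execute('create table ratings (user_id integer, smile integer, "like" integer)')
conn.executemany("insert into ratings values (?, ?, ?)",
                 [(1, 1, 0), (1, 1, 0), (2, 1, 1), (2, 0, 1)])

# sum(...) > 0 collapses the per-user sums to 1/0 flags
rows = conn.execute(
    'select sum(smile) > 0 as has_smile, sum("like") > 0 as has_like, user_id '
    'from ratings group by user_id order by user_id').fetchall()
print(rows)  # [(1, 0, 1), (1, 1, 2)]
```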

Load XML to MySQL

I need some input on how best to load the below XML file into MySQL.
I have an XML file which contains info like below:
<Start><Account>0001</Account><Asset>ABC</Asset><Value>500</Value><Asset>DEF</Asset><Value>600</Value></Start>
<Start>.......
When I use
LOAD XML LOCAL INFILE 'file.xml' INTO TABLE my_tablename ROWS IDENTIFIED BY '<Start>';
the file loads successfully but the account column is all NULL.
I.e., select * from my_tablename;
Account | Asset | Value
NULL    | ABC   | 500
NULL    | DEF   | 600
as opposed to
I.e., select * from my_tablename;
Account | Asset | Value
0001    | ABC   | 500
0001    | DEF   | 600
What's the best way to handle this? Re-format the file in Python first? Another SQL query?
Thank you.
To get the result you need, your XML should look like this:
<Start><account>0001</account><asset>ABC</asset><value>500</value></Start>
<Start><account>0001</account><asset>DEF</asset><value>600</value></Start>
One account, asset and value tag per Start tag.
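Since the question mentions re-formatting in Python first, here is a stdlib xml.etree sketch that flattens the data into one row per asset with the account repeated; it assumes each Start element holds one Account followed by repeated Asset/Value pairs, as in the sample:

```python
import xml.etree.ElementTree as ET

# One <Start> element in the question's shape
src = ("<Start><Account>0001</Account>"
       "<Asset>ABC</Asset><Value>500</Value>"
       "<Asset>DEF</Asset><Value>600</Value></Start>")

start = ET.fromstring(src)
account = start.findtext("Account")

# Walk the children in order, pairing each Asset with the Value after it
rows = []
asset = None
for child in start:
    if child.tag == "Asset":
        asset = child.text
    elif child.tag == "Value":
        rows.append((account, asset, child.text))
print(rows)  # [('0001', 'ABC', '500'), ('0001', 'DEF', '600')]
```

From here you can either write the tuples back out as one Start tag per row, as the answer suggests, or insert them directly with a MySQL client.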

Django multipart ORM query including JOINs

I believe that I am simply failing to search correctly, so please redirect me to the appropriate question if this is the case.
I have a list of orders for an ecommerce platform, and two tables called checkout_orderproduct and catalog_product, structured as:
|______________checkout_orderproduct_____________|
| id | order_id | product_id | qty | total_price |
--------------------------------------------------
|_____catalog_product_____|
| id | name | description |
---------------------------
I am trying to get all of the products associated with an order. My thought is something along the lines of:
for order in orders:
    OrderProduct.objects.filter(order_id=order.id, IM_STUCK_HERE)
What should the second part of the query be so that I get back a list of products such as
["Fruit", "Bagels", "Coffee"]
products = (OrderProduct.objects
            .filter(order_id=order.id)
            .values('product_id'))

Product.objects.filter(id__in=products)
Or id__in=list(products); see the "Performance considerations" note at the link.
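The two-step ORM query above boils down to an IN subquery. An sqlite3 sketch over invented order data (order 42; product names taken from the question's expected output) shows the SQL it effectively runs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table catalog_product (id integer primary key, name text, description text);
create table checkout_orderproduct (
    id integer primary key, order_id integer, product_id integer,
    qty integer, total_price real);
""")
conn.executemany("insert into catalog_product values (?, ?, ?)",
                 [(1, "Fruit", ""), (2, "Bagels", ""), (3, "Coffee", ""), (4, "Tea", "")])
conn.executemany("insert into checkout_orderproduct values (?, ?, ?, ?, ?)",
                 [(1, 42, 1, 2, 4.0), (2, 42, 2, 1, 3.5),
                  (3, 42, 3, 1, 2.0), (4, 99, 4, 1, 1.5)])

# The IN subquery mirrors Product.objects.filter(id__in=products)
names = [r[0] for r in conn.execute(
    "select name from catalog_product where id in "
    "(select product_id from checkout_orderproduct where order_id = ?) "
    "order by id", (42,))]
print(names)  # ['Fruit', 'Bagels', 'Coffee']
```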
