grep logfile for a specific timeframe - python

I need to filter messages out of a log file which has the following format:
2013-03-22T11:43:21.817078+01:00 INFO log msg 1...
...
2013-03-22T11:44:32.817114+01:00 WARNING log msg 2...
...
2013-03-22T11:45:45.817777+01:00 INFO log msg 3...
...
2013-03-22T11:46:59.547325+01:00 INFO log msg 4...
...
(where ... means "more messages")
The filtering must be done based on a timeframe.
This is part of a bash script, and at this point in the code the timeframe is stored as $start_time and $end_time. For example:
start_time="2013-03-22T11:45:20"
end_time="2013-03-22T11:45:50"
Note that the exact value of $start_time or $end_time may never appear in the log file; yet there will be several messages within the timeframe [$start_time, $end_time], which are the ones I'm looking for.
Now, I'm almost convinced I'll need a Python script to do the filtering, but I'd rather use grep (or awk, or any other tool) since it should run much faster (the log files are big).
Any suggestions?

Based on the log content in your question, I think an awk one-liner may help:
awk -F'.' -vs="$start_time" -ve="$end_time" '$1>s && $1<e' logfile
Note: this filters content excluding the exact start and end times (the comparisons are strict).

$ start_time="2013-03-22T11:45:20"
$ end_time="2013-03-22T11:45:50"
$ awk -F'.' '$1>s&&$1<e' s=$start_time e=$end_time file
2013-03-22T11:45:45.817777+01:00 INFO log msg 3...
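If the filtering does end up in Python after all (for example to reuse it elsewhere in the script), a minimal sketch could look like the one below; it relies on the same trick as the awk answers, namely that ISO-8601 timestamps compare correctly as plain strings. The logfile path is a placeholder and the times are the ones from the question.

import sys

start_time = "2013-03-22T11:45:20"
end_time = "2013-03-22T11:45:50"

with open("logfile") as f:                     # placeholder path
    for line in f:
        # compare only the part before the fractional seconds, as awk -F'.' does
        timestamp = line.split(".", 1)[0]
        if start_time < timestamp < end_time:
            sys.stdout.write(line)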

Related

How to write a python pre-push to check the commit message?

I know I can use commit-msg hook to check the message (I did it) but I wish to check the message again prior to push with different criteria.
The example given under .git/hooks is written as a shell script, but I wish to write a Python script because the string operations I have in mind are more complex.
I know I can change the first line to #!/usr/bin/env python. My problem is, I don't know how to get the latest commit's message string. I experimented with git rev-list (on that example's line 44) but it doesn't give me the message string and it needs the commit hash id. Note that the message string might be multiline (because the first line is restricted to be 50 characters at most).
EDIT: other questions ask how to write a pre-push script in Python, but they don't involve checking the message string.
The pre-push hook is passed a list of local/remote refs and commit IDs (SHA-1) on stdin, so you have to read them line by line, split each line, and take the local commit ID. See how it is done in the sample shell script.
Having a commit ID you can extract full commit message with the command
git show --format='%B' -s $SHA1
In Python it is something like this:
import subprocess
import sys

for line in sys.stdin:
    # each input line is "<local ref> <local sha1> <remote ref> <remote sha1>"
    local_ref, local_sha1, remote_ref, remote_sha1 = line.strip().split()
    # full commit message of the commit about to be pushed
    message = subprocess.check_output(
        ['git', 'show', '--format=%B', '-s', local_sha1])
    if not check(message):          # check() is your own validation function
        sys.exit(1)
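The answer leaves check() undefined; purely as an illustration, a minimal version enforcing the 50-character first-line limit mentioned in the question could look like this (hypothetical helper, adapt it to your own criteria):

def check(message):
    # hypothetical check: first line of the commit message at most 50 characters
    first_line = message.decode().splitlines()[0] if message else ""
    return len(first_line) <= 50

Save the script as .git/hooks/pre-push with #!/usr/bin/env python on the first line and make it executable.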

Get latest file from samba share using smbclient

My directory contains a file and its different versions, out of which I want to pick the latest one. The versions can be sorted either by date or by the revision number at the end of the file name, something like
Myfile2001.txt
Where 2001 is the revision number.
How can I get the latest file from a samba-share directory using smbclient? I thought of using a mask to list all the names, piping that output to some search logic to find the largest number (latest revision), and then using smbclient get to fetch the file, but that does not seem optimal and it's too tedious. I wonder if there is any other way to do it?
EDIT: I figured out an alternate way in Python (just for ease): capture the output of smbclient's ls in a text file or on STDOUT and then use Python to find the latest file's name. Now I cannot figure out how I can redirect the output of that command to a text file or STDOUT in order to process it according to my logic.
Is there any way to do it? As smbclient does not allow I/O redirection, I am still stuck at the same point with the newer approach. I have gone through pysmb but cannot rely on it, as it is an experimental library; however, any solution with pysmb is also acceptable as a stopgap.
I've solved this issue using awk in a bash script. Goal: download the most recent CSV file.
${SmbCmd} "ls <mask>" 2>/dev/null\
| awk '{ if ($1 ~ "csv$") print $1 }' | sort | tail -1)
Where ${SmbCmd} holds everything needed to talk to the SMB server: the path to smbclient, the authentication method, the SMB server name, the SMB directory, ... and it ends with "--command" (in long form).
Of course, my CSV file names contain the creation date: "name_yyyy-mm-dd.csv".
You can try something like this:
${SmbCmd} "ls <mask>-*" | awk '{ if ($1 ~ "csv$") print $8$5$6";"$1 }'
But note that the month isn't numeric in that output.
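Since the EDIT mentions capturing smbclient's output from Python, a rough sketch of that route is below. The server, share, credentials and the Myfile<revision>.txt naming are assumptions taken from the question, so adjust them to your setup.

import re
import subprocess

# placeholders: server, share and credentials
smb_target = ["smbclient", "//server/share", "-U", "user%password"]

# list the matching files on the share
listing = subprocess.check_output(smb_target + ["-c", "ls Myfile*.txt"]).decode()

# extract (revision, name) pairs from the listing lines
versions = []
for line in listing.splitlines():
    match = re.search(r"\b(Myfile(\d+)\.txt)\b", line)
    if match:
        versions.append((int(match.group(2)), match.group(1)))

if versions:
    latest = max(versions)[1]          # name with the highest revision number
    subprocess.check_call(smb_target + ["-c", "get " + latest])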

categorize errors & warnings in logs

I wanted to categorize the errors in my log files. I have many folders (~100) and each of them has a log file. I want to be able to parse all the log files and categorize the different errors with their frequency. The logs have the following format:
2014-10-22 07:55:02,997 ERROR log_message [optional_stack_trace]
One approach is to first parse all the log statements having ERROR and putting them in a single file. Ideally the resultant file will have just the log_messages without the date & ERROR strings. I guess I can just group similar strings after that. What do you guys think? Any cleaner and better approach?
You're going to want something like this (using GNU awk for true 2-d arrays):
$ awk '{cnt[$3][$4]++} END{for (err in cnt) for (msg in cnt[err]) print err, msg, cnt[err][msg]}' file1 file2 ...
but since you didn't post any sample input and expected output, it's a guess.
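If GNU awk isn't available, or you'd rather stay in Python, a rough equivalent that keys error lines on the first word of the message (similar to what the awk one-liner does, but restricted to ERROR) might look like this; the glob pattern for the ~100 folders is a placeholder.

import glob
from collections import Counter

counts = Counter()
for path in glob.glob("*/logfile.log"):        # placeholder: one log file per folder
    with open(path) as f:
        for line in f:
            fields = line.split()
            # fields: date, time, level, first word of the message, ...
            if len(fields) >= 4 and fields[2] == "ERROR":
                counts[fields[3]] += 1

for msg, freq in counts.most_common():
    print("ERROR", msg, freq)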

Get lines from stdout after timestamp

There is a huge log of errors/warnings/infos printed out on stdout. I am only interested in the lines logged after I start a specific action.
Other information: I am using Python to telnet to a shell environment. I execute the commands on the shell and store the time the action is started. I then call a command to view the log, which spits it out on stdout. I expect to read the grepped lines after that timestamp back into Python. I also store the current time but am not sure how to use it (maybe grep on a date range?).
I can redirect to a file and use find but the log is huge and I'd rather not read all of it.
I can grep -n to get the line number and then read everything after it, but I'm not sure how to do that.
Concept regex to egrep on is something like: {a-timestamp}*
Any suggestions would be appreciated!
awk '/the-timestamp-I-have/,0' the-log-file
This will print the lines from the-log-file, starting at the first line that matches the-timestamp-I-have and continuing through the last line.
Ref:
http://www.catonmat.net/blog/awk-one-liners-explained-part-three/
http://www.catonmat.net/blog/ten-awk-tips-tricks-and-pitfalls/#awk_ranges
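Because the log text is being read back into Python anyway, the same "from the first matching timestamp onwards" idea can also be expressed directly in Python. This is only a sketch; log_output and start_timestamp are hypothetical names for the captured text and the recorded start time.

def lines_after(log_output, start_timestamp):
    # yield every line from the first one starting with start_timestamp onwards,
    # mirroring awk '/timestamp/,0'
    emitting = False
    for line in log_output.splitlines():
        if not emitting and line.startswith(start_timestamp):
            emitting = True
        if emitting:
            yield line

# usage: interesting = list(lines_after(log_output, start_timestamp))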

Asynchronous tasks in Plone to query Python Package Index

I want to periodically (every hour?) query the Python Package Index API from Plone. Something equivalent to:
$ for i in `yolk -L 24 | awk '{print $1}'`  # get releases made in last 24 hours
do
    # search for plone classifier
    results=`yolk -M $i -f classifiers | grep -i plone`
    if [ -n "$results" ]; then
        echo $i
    fi
done
Results:
collective.sendaspdf
gocept.selenium
Products.EnhancedNewsItemImage
adi.workingcopyflag
Products.SimpleCalendarPortlet
Products.SimpleCalendar
Then I want to display this information in a template. I would love to, at least initially, avoid having to persist the results.
How do I display the results in a template without having to wait for the query to finish? I know there are some async packages available e.g.:
plone.app.async
But I'm not sure what the general approach should be (assuming I can schedule an async task, I may need to store the results somewhere; if I have to store the results, I'd prefer something "lightweight", e.g. annotations).
How about the low, low tech version?
Use a cron-job to run the query, put this in a temp file, then move the file into a known location, with a timestamp in the filename.
Then, when someone requests the page in question (showing new packages), simply read the newest file in that location:
filename = sorted(os.listdir(location))[-1]
data = open(os.path.join(location, filename)).read()
By using a move, you guarantee that the newest file in the designated location is always a complete file, avoiding a partial result being read.
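The write side of that scheme, sketched under the assumption of a hypothetical query_new_plone_packages() helper and a known location directory, could look roughly like this; the key point is writing to a temp file in the same directory and renaming it, since the rename is atomic there.

import os
import tempfile
import time

def publish_results(location, results):
    # the ".tmp-" prefix sorts before the final names, so the reader's
    # sorted(os.listdir(location))[-1] never picks up an unfinished file
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp-", dir=location)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(results))
    final_path = os.path.join(location, time.strftime("packages-%Y%m%d-%H%M%S.txt"))
    os.rename(tmp_path, final_path)    # atomic within the same filesystem

# e.g. from the hourly cron job:
# publish_results("/var/tmp/plone-packages", query_new_plone_packages())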
