Python Log file count per Hour per IP

This script displays how many attacks occur per hour per day. I want it to also count by IP address, so that it shows the IP addresses that were attacked per hour, per day.
from itertools import groupby

# Open auth.log for reading
myAuthlog = open('auth.log', 'r')
# Lazily keep only the lines that contain 'Failed password for'
myAuthlog = (line for line in myAuthlog if "Failed password for" in line)
# Group consecutive lines that share the same month, day and hour (the first 9 characters)
for key, group in groupby(myAuthlog, key=lambda x: x[:9]):
    month, day, hour = key[0:3], key[4:6], key[7:9]
    # Print the date, the hour and the number of attacks
    print("On%s-%s at %s:00 There was %d attacks" % (day, month, hour, len(list(group))))
The log file looks like this:
Feb  3 13:34:05 j4-be02 sshd[676]: Failed password for root from 85.17.188.70 port 48495 ssh2
Feb  3 21:45:18 j4-be02 sshd[746]: Failed password for invalid user test from 62.45.87.113 port 50636 ssh2
Feb  4 08:39:46 j4-be02 sshd[1078]: Failed password for root from 1.234.51.243 port 60740 ssh2
An example of the output my code produces:
On 3-Feb at 21:00 There was 1 attacks
On 4-Feb at 08:00 There was 15 attacks
On 4-Feb at 10:00 There was 60 attacks

from itertools import groupby
import re

myAuthlog = open('dict.txt', 'r')
myAuthlog = (line for line in myAuthlog if "Failed password for" in line)
# The grouping key is the first 9 characters (month, day, hour) plus the source IP
for key, group in groupby(myAuthlog, key=lambda x: x[:9] + re.search('from(.+?) port', x).group(1)):
    month, day, hour, ip = key[0:3], key[4:6], key[7:9], key[10:]
    print("On%s-%s at %s:00 There was %d attacks FROM IP %s" % (day, month, hour, len(list(group)), ip))
Log file:
Feb  3 13:34:05 j4-be02 sshd[676]: Failed password for root from 85.17.188.70 port 48495 ssh2
Feb  3 21:45:18 j4-be02 sshd[746]: Failed password for invalid user test from 62.45.87.113 port 50636 ssh2
Feb  4 08:39:46 j4-be02 sshd[1078]: Failed password for root from 1.234.51.243 port 60740 ssh2
Feb  4 08:53:46 j4-be02 sshd[1078]: Failed password for root from 1.234.51.243 port 60740 ssh2
Output:
On 3-Feb at 13:00 There was 1 attacks FROM IP 85.17.188.70
On 3-Feb at 21:00 There was 1 attacks FROM IP 62.45.87.113
On 4-Feb at 08:00 There was 2 attacks FROM IP 1.234.51.243

Since you already know how to get the log lines per hour per day, use the following to count the IPs per hour per day. This is not a complete solution.
from collections import defaultdict
import re

ip_count = defaultdict(int)
with open('logfile') as data:
    for line in data:
        # Skip lines that do not contain "from <ip> port"
        match = re.search(r'from (\S+) port', line)
        if match:
            ip_count[match.group(1)] += 1
for ip, count in ip_count.items():
    print("%s %d" % (ip, count))
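
One way to combine both steps is sketched below, assuming every relevant line follows the sample format shown in the question. The names `FAILED` and `count_attacks` are made up for this illustration. Unlike `groupby`, which only merges consecutive lines, a `Counter` keyed on (month, day, hour, IP) also copes with attacks from different IPs that are interleaved within the same hour.

```python
import re
from collections import Counter

# Hypothetical pattern: month, day, hour, then the source IP of a failed login
FAILED = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s.*"
    r"Failed password for .* from (\S+) port"
)

def count_attacks(lines):
    """Count failed logins per (month, day, hour, ip)."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            month, day, hour, ip = match.groups()
            counts[(month, int(day), hour, ip)] += 1
    return counts

# Sample lines taken from the log excerpt in the question
sample = [
    "Feb 3 13:34:05 j4-be02 sshd[676]: Failed password for root from 85.17.188.70 port 48495 ssh2",
    "Feb 3 21:45:18 j4-be02 sshd[746]: Failed password for invalid user test from 62.45.87.113 port 50636 ssh2",
    "Feb 4 08:39:46 j4-be02 sshd[1078]: Failed password for root from 1.234.51.243 port 60740 ssh2",
    "Feb 4 08:53:46 j4-be02 sshd[1078]: Failed password for root from 1.234.51.243 port 60740 ssh2",
]

counts = count_attacks(sample)
for (month, day, hour, ip), n in sorted(counts.items()):
    print("On %d-%s at %s:00 there were %d attacks from IP %s" % (day, month, hour, n, ip))
```

To run it against a real file, pass the open file object instead of the list, for example `count_attacks(open('auth.log'))`.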

Related

Take specifics date in log file and process it

I want to process a log file that contains event logs, but only the entries from today.
The log file looks like this:
Aug 23 07:23:05 iZk1a211s8hkb4hkecu7w1Z sshd[19569]: Invalid user test from 10.148.0.13 port 48382
...
Sep 20 07:23:06 iZk1a211s8hkb4hkecu7w1Z sshd[19569]: Failed password for invalid user test from 10.148.0.13 port 48382 ssh2
...
Aug 23 07:23:07 iZk1a211s8hkb4hkecu7w1Z sshd[19564]: Failed password for invalid user sysadm from 10.148.0.13 port 48380 ssh2
...
Oct 15 07:23:09 iZk1a211s8hkb4hkecu7w1Z sshd[19573]: Invalid user sinusbot from 10.148.0.13 port 48384
...
Sep 08 07:23:11 iZk1a211s8hkb4hkecu7w1Z sshd[19573]: Failed password for invalid user sinusbot from 10.148.0.13 port 48384 ssh2
...
Nov 01 07:23:16 iZk1a211s8hkb4hkecu7w1Z sshd[19587]: Invalid user smkim from 10.148.0.13 port 48386
...
Nov 12 07:23:18 iZk1a211s8hkb4hkecu7w1Z sshd[19587]: Failed password for invalid user smkim from 10.148.0.13 port 48386 ssh2
How can I grab only today's lines from the log?
I've tried this and got stuck on finding the pattern:
from datetime import date

today = date.today()
today = today.strftime("%B %d")
with open('file.log', 'r') as f:
    for line in f:
        date = line.find("*idk I'm stuck at this point*")
        if date == today:
            `*run my process script*`
Does anyone have any suggestions?

You need to extract the part of the string containing the date, parse it as a datetime and convert it to a date:
from datetime import date, datetime

today = date.today()
with open('file.log', 'r') as f:
    for line in f:
        # The first 15 characters hold the timestamp, e.g. "Aug 23 07:23:05".
        # The log has no year, so the current year is assumed.
        line_date = datetime.strptime(line[:15], "%b %d %H:%M:%S").date().replace(year=today.year)
        if line_date == today:
            ...  # run your processing script here
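
A slightly more defensive variant is sketched below. It assumes an English locale for `%b` and that every log entry starts with a 15-character syslog timestamp. The function name `lines_for_day` is hypothetical. Putting the year in front of the timestamp before parsing avoids the implicit default year 1900, which cannot represent Feb 29, and lines that do not start with a timestamp (such as the `...` placeholders) are skipped instead of raising.

```python
from datetime import date, datetime

def lines_for_day(lines, day):
    """Yield the lines whose leading syslog timestamp falls on `day`."""
    for line in lines:
        try:
            # Prepend the year so the parser never has to guess one
            stamp = datetime.strptime(
                "%d %s" % (day.year, line[:15]), "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped line, or a date invalid in that year
        if stamp.date() == day:
            yield line

# Sample lines modelled on the log excerpt in the question
sample = [
    "Aug 23 07:23:05 host sshd[19569]: Invalid user test from 10.148.0.13 port 48382",
    "...",
    "Sep 20 07:23:06 host sshd[19569]: Failed password for invalid user test from 10.148.0.13 port 48382 ssh2",
    "Aug 23 07:23:07 host sshd[19564]: Failed password for invalid user sysadm from 10.148.0.13 port 48380 ssh2",
]

matches = list(lines_for_day(sample, date(2023, 8, 23)))
for line in matches:
    print(line)
```

For today's entries, call it as `lines_for_day(open('file.log'), date.today())`.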

There is redundant "last login info" in the stdout of fabric.operations.sudo

When I run fabric.operations.sudo to get information from a remote VM (kernel 4.14.35, EL7.6), for example with "date +%s", the expected result is "1549853543", but in my test it is "Last login: Mon Feb 11 02:53:18 UTC 2019 on pts/0\r\n1549853543".
I have also run the command "ssh user@vm 'date +%s'" directly, and the result is normal (only the number).
Does anyone know the reason? I have already set "PrintLastLog" to "no" in /etc/ssh/sshd_config.
result = sudo('date +%s').stdout.strip()
run_time = int(result)  # => exception occurs
Expected: 1549853543
Actual: invalid literal for int() with base 10: 'Last login: Mon Feb 11 02:53:18 UTC 2019 on pts/0\r\n1549853543'

After fixing these 2 places, the last login info seems to disappear:
1: /etc/pam.d/system-auth:
session required pam_lastlog.so silent showfailed
2: /etc/ssh/sshd_config:
# Per CCE-CCE-80225-6: Set PrintLastLog yes in /etc/ssh/sshd_config
PrintLastLog no
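
If the server configuration cannot be changed, a client-side workaround is to parse only the last non-empty line of the captured output. This is a minimal sketch that does not call fabric at all; the helper name `last_line_as_int` is made up, and it assumes the command's own output is always the final line.

```python
def last_line_as_int(output):
    """Return the final non-empty line of `output` as an int."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    return int(lines[-1])

# The exact string reported in the question
captured = "Last login: Mon Feb 11 02:53:18 UTC 2019 on pts/0\r\n1549853543"
run_time = last_line_as_int(captured)
print(run_time)
```

This hides the banner rather than removing it, so the configuration change above remains the cleaner fix.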

How to parse multiple line catalina log in python - regex

I have a catalina log:
oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated
WARNING: HTTP Session created without LoggedInSessionBean
oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
at ais.api.rest.rdss.Resource.lookAT(Resource.java:22)
at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
I am trying to parse it in Python. My problem is that I don't know how many lines each log entry has; the minimum is 2. I read from the file, and when a line starts with j, f, m, a, s, o, n or d, I treat it as the first line of an entry, because those are the first letters of the month names. But I don't know how to continue. When should I stop reading lines? When the next line starts with one of these letters? And how do I do that?
import datetime
import re

import MySQLdb

SPACE = r'\s'
TIME = r'(?P<time>.*?M)'
PATH = r'(?P<path>.*?\S)'
METHOD = r'(?P<method>.*?\S)'
REQUEST = r'(?P<request>.*)'
TYPE = r'(?P<type>.*?\:)'
REGEX = TIME+SPACE+PATH+SPACE+METHOD+SPACE+TYPE+SPACE+REQUEST

def parser(log_line):
    match = re.search(REGEX, log_line)
    return (match.group('time'),
            match.group('path'),
            match.group('method'),
            match.group('type'),
            match.group('request'))

db = MySQLdb.connect(host="localhost", user="myuser", passwd="mypsswd", db="Database")
with db:
    cursor = db.cursor()
    with open("Mylog.log", "r") as f:
        for line in f:
            # Month names start with one of these letters
            if line.startswith(('j', 'f', 'm', 'a', 's', 'o', 'n', 'd')):
                logLine = line
                result = parser(logLine)
                sql = ("INSERT INTO ..... ")
                data = (result[0],)
                cursor.execute(sql, data)
db.close()
The best idea I have is to read just two lines at a time, but that means discarding all the other data. There must be a better way.
I want to read the lines like this:
1.line - oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
2.line - oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException SEVERE: Mapped exception to response: 500 (Internal Server Error) javax.ws.rs.WebApplicationException at ais.api.rest.rdss.Resource.lookAT(Resource.java:22) at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl java:43)
3.line - oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
So I want to start reading when a line starts with a datetime (this is no problem). The problem is that I want to stop reading when the next line starts with a datetime.

This may be what you want.
I read lines from the log inside a generator so that I can tag each one as either a datetime line or some other line. Importantly, the generator also flags when end-of-file has been reached in the log file.
In the main loop of the program I start accumulating lines in a list when I get a datetime line. Each time I see a new datetime line, I print the accumulated entry if it is not empty and start a new one. Since the program will still hold a complete entry when end-of-file occurs, I arrange to print the accumulated entry at that point too.
import re

a_date, other, EOF = 0, 1, 2

def One_line():
    with open('caroline.txt') as caroline:
        for line in caroline:
            line = line.strip()
            m = re.match(r'[a-z]{3}\s+[0-9]{1,2},\s+[0-9]{4}\s+[0-9]{1,2}:[0-9]{2}:[0-9]{2}\s+[AP]M', line, re.I)
            if m:
                yield a_date, line
            else:
                yield other, line
    yield EOF, ''

complete_line = []
for kind, content in One_line():
    if kind in [a_date, EOF]:
        if complete_line:
            print(' '.join(complete_line))
        complete_line = [content]
    else:
        complete_line.append(content)
Output:
oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException SEVERE: Mapped exception to response: 500 (Internal Server Error) javax.ws.rs.WebApplicationException at ais.api.rest.rdss.Resource.lookAT(Resource.java:22) at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
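
The same idea can be packed into a single generator that yields one joined entry at a time, which may be easier to feed into a database insert. This is only a sketch: `TIMESTAMP` and `group_records` are names chosen for illustration, and the sample lines are shortened versions of the log in the question.

```python
import re

# A line that starts a new entry, e.g. "oct 21, 2016 3:03:20 AM ..."
TIMESTAMP = re.compile(
    r'[a-z]{3}\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+[AP]M', re.I)

def group_records(lines):
    """Yield each multi-line log entry joined into a single string."""
    record = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # A new timestamp closes the entry collected so far
        if TIMESTAMP.match(line) and record:
            yield ' '.join(record)
            record = []
        record.append(line)
    # Flush the last entry at end of input
    if record:
        yield ' '.join(record)

sample = [
    "oct 21, 2016 12:32:13 AM a.b.Listener sessionCreated",
    "WARNING: HTTP Session created without LoggedInSessionBean",
    "oct 21, 2016 3:03:20 AM a.b.Response logException",
    "SEVERE: Mapped exception to response: 500 (Internal Server Error)",
    "at a.b.Resource.lookAT(Resource.java:22)",
]

records = list(group_records(sample))
for entry in records:
    print(entry)
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example `group_records(open('Mylog.log'))`.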

python - Raw load as an answer

I'm trying to send an NTP query using scapy and sockets, but when I receive the data I get it in raw form.
from scapy.all import *
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
addr = ("192.114.62.250", 123)
ntp = NTP()
s.sendto(bytes(ntp), addr)
data, ip = s.recvfrom(1024)
The answer should be in data, but all I get is:
'\x1c\x02\n\xeb\x00\x00\x01b\x00\x00\r\x8c\xc0s\xd12\xdcH\xa5\xda}-\x1b/\xdcH\xa9T\x95\x81\x08\x00\xdcH\xa9_\xd2\xc2n\xe1\xdcH\xa9_\xd2\xc6\xca\x1c'
and what I want is:
Peer Clock Stratum: secondary reference (2)
Peer Polling Interval: 10 (1024 sec)
Peer Clock Precision: 0.000000 sec
Root Delay: 0.0054 sec
Root Dispersion: 0.0529 sec
Reference ID: 192.115.209.50
Reference Timestamp: Feb 10, 2017 20:49:30.488969000 UTC
Origin Timestamp: Feb 10, 2017 21:04:20.584000000 UTC
Receive Timestamp: Feb 10, 2017 21:04:31.823279000 UTC
Transmit Timestamp: Feb 10, 2017 21:04:31.823345000 UTC

Try doing the following:
data = data.replace("\\", "\\\\")
data.decode('string-escape')
print data

Turns out you can just turn data into an NTP packet:
data = NTP(data)
and I got what I wanted.
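
To see what is inside those raw bytes without scapy, the 48-byte NTP header can also be unpacked by hand with the standard `struct` module. This is a rough sketch, not what scapy does internally: `parse_ntp` and `NTP_EPOCH_OFFSET` are names invented here, and interpreting the reference ID as an IPv4 address is an assumption that only holds for stratum 2 and above.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01)
NTP_EPOCH_OFFSET = 2208988800

def parse_ntp(packet):
    """Decode the fixed 48-byte NTP header into a dict."""
    # flags, stratum, poll, precision, root delay, root dispersion,
    # reference id, then four timestamps as (seconds, fraction) pairs
    fields = struct.unpack("!BBbbII4s8I", packet[:48])
    flags, stratum, poll, precision, delay, dispersion, ref_id = fields[:7]
    transmit_secs = fields[13]
    return {
        "version": (flags >> 3) & 0x7,
        "mode": flags & 0x7,
        "stratum": stratum,
        "poll": poll,
        "precision": precision,
        "root_delay": delay / 65536.0,            # 16.16 fixed point
        "root_dispersion": dispersion / 65536.0,  # 16.16 fixed point
        "reference_id": socket.inet_ntoa(ref_id),
        "transmit_unix": transmit_secs - NTP_EPOCH_OFFSET,
    }

# The raw response quoted in the question
RAW = (b'\x1c\x02\n\xeb\x00\x00\x01b\x00\x00\r\x8c\xc0s\xd12'
       b'\xdcH\xa5\xda}-\x1b/\xdcH\xa9T\x95\x81\x08\x00'
       b'\xdcH\xa9_\xd2\xc2n\xe1\xdcH\xa9_\xd2\xc6\xca\x1c')

info = parse_ntp(RAW)
for name, value in info.items():
    print(name, value)
```

The decoded stratum, polling interval, root delay, root dispersion and reference ID line up with the values listed in the question.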

Parsing email headers with regular expressions in python

I'm a Python beginner trying to extract data from email headers. I have thousands of email messages in a single text file, and from each message I want to extract the sender's address, the recipients' addresses, and the date, and write them to a single, semicolon-delimited line in a new file.
This is ugly, but it's what I've come up with:
import re

emails = open("demo_text.txt", "r")  # opens the file to analyze
results = open("results.txt", "w")  # creates new file for search results
resultsList = []
for line in emails:
    if "From - " in line:  # recognizes the beginning of an email message and adds a linebreak
        newMessage = re.findall(r'\w\w\w\s\w\w\w.*', line)
        if newMessage:
            resultsList.append("\n")
    if "From: " in line:
        address = re.findall(r'[\w.-]+@[\w.-]+', line)
        if address:
            resultsList.append(address)
            resultsList.append(";")
    if "To: " in line:
        if "Delivered-To:" not in line:  # avoids confusion with 'Delivered-To:' tag
            address = re.findall(r'[\w.-]+@[\w.-]+', line)
            if address:
                for person in address:
                    resultsList.append(person)
                    resultsList.append(";")
    if "Date: " in line:
        date = re.findall(r'\w\w\w\,.*', line)
        resultsList.append(date)
        resultsList.append(";")
for result in resultsList:
    results.writelines(result)
emails.close()
results.close()
and here's my 'demo_text.txt':
From - Sun Jan 06 19:08:49 2013
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Delivered-To: somebody_1@hotmail.com
Received: by 10.48.48.3 with SMTP id v3cs417003nfv;
        Mon, 15 Jan 2007 10:14:19 -0800 (PST)
Received: by 10.65.211.13 with SMTP id n13mr5741660qbq.1168884841872;
        Mon, 15 Jan 2007 10:14:01 -0800 (PST)
Return-Path: <nobody@hotmail.com>
Received: from bay0-omc3-s21.bay0.hotmail.com (bay0-omc3-s21.bay0.hotmail.com [65.54.246.221])
        by mx.google.com with ESMTP id e13si6347910qbe.2007.01.15.10.13.58;
        Mon, 15 Jan 2007 10:14:01 -0800 (PST)
Received-SPF: pass (google.com: domain of nobody@hotmail.com designates 65.54.246.221 as permitted sender)
Received: from hotmail.com ([65.54.250.22]) by bay0-omc3-s21.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.2668);
        Mon, 15 Jan 2007 10:13:48 -0800
Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC;
        Mon, 15 Jan 2007 10:13:47 -0800
Message-ID: <BAY115-F12E4E575FF2272CF577605A1B50@phx.gbl>
Received: from 65.54.250.200 by by115fd.bay115.hotmail.msn.com with HTTP;
        Mon, 15 Jan 2007 18:13:43 GMT
X-Originating-IP: [200.122.47.165]
X-Originating-Email: [nobody@hotmail.com]
X-Sender: nobody@hotmail.com
From: =?iso-8859-1?B?UGF1bGEgTWFy7WEgTGlkaWEgRmxvcmVuemE=?=
        <nobody@hotmail.com>
To: somebody_1@hotmail.com, somebody_2@gmail.com, 3_nobodies@yahoo.com.ar
Bcc:
Subject: fotos
Date: Mon, 15 Jan 2007 18:13:43 +0000
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_d98_1c4f_3aa9"
X-OriginalArrivalTime: 15 Jan 2007 18:13:47.0572 (UTC) FILETIME=[E68D4740:01C738D0]
Return-Path: nobody@hotmail.com
The output is:
somebody_1@hotmail.com;somebody_2@gmail.com;3_nobodies@yahoo.com.ar;Mon, 15 Jan 2007 18:13:43 +0000;
This output would be fine except there's a line break in the 'From:' field in my demo_text.txt (line 24), and so I miss 'nobody@hotmail.com'.
I'm not sure how to tell my code to skip the line break and still find the email address in the From: tag.
More generally, I'm sure there are many more sensible ways to go about this task. If anyone could point me in the right direction, I'd sure appreciate it.

Your demo text is practically in the mbox format, which can be processed with the appropriate object in the mailbox module:
from mailbox import mbox
import re

PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+@[0-9A-Za-z._-]+")
mymbox = mbox("demo.txt")
for email in mymbox.values():
    from_address = PAT_EMAIL.findall(email["from"])
    to_address = PAT_EMAIL.findall(email["to"])
    date = [email["date"]]
    print(";".join(from_address + to_address + date))
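
For a single message, the standard `email` package can do the address extraction as well, since it unfolds headers that continue on the next line. The sketch below uses a made-up message with placeholder example.com addresses, and `summarize` is a hypothetical helper name; it relies on `email.utils.getaddresses` instead of a hand-written regular expression.

```python
from email import message_from_string
from email.utils import getaddresses

# Made-up message whose From: header is folded onto a second line
RAW_MESSAGE = (
    "From: =?iso-8859-1?B?UGF1bGE=?=\n"
    "        <sender@example.com>\n"
    "To: first@example.com, second@example.org\n"
    "Date: Mon, 15 Jan 2007 18:13:43 +0000\n"
    "\n"
    "body\n"
)

def summarize(raw_message):
    """Return 'sender;recipient;...;date' for one raw message."""
    msg = message_from_string(raw_message)
    # getaddresses returns (display name, address) pairs
    senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    return ";".join(senders + recipients + [msg.get("Date", "")])

summary = summarize(RAW_MESSAGE)
print(summary)
```

Using `get_all` with an empty-list default keeps the function from failing on messages that lack a From: or To: header.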

To skip the line breaks, you can't read the file line by line. You can try loading the whole file and using your keywords (From, To, etc.) as boundaries. So when you search for 'From -', you use the rest of your keywords as boundaries so they won't be included in the extracted portion.
Also, mentioning this because you said you're a beginner:
The "Pythonic" way of naming your non-class variables is with underscores. So resultsList should be results_list.
