Trying to separate the date time in snort log (.csv) - python

I have set Snort up to output alerts directly into a CSV file (excel.csv) with the information I need.
I am using Python to load the values from excel.csv into a database. This part works, no issues here.
However, one of the values in the file is the Snort timestamp (MONTH/DAY-HOUR:MIN:SEC.MICROSEC).
I want to separate the date and time into 2 separate columns so that I can easily insert them into my SQL database.
That is, I am trying to split the datetime into Date (MM/DD) and Time (HOUR:MIN:SEC).
Current format in the excel: 04/11-10:47:30.789142
What I would like:
Column 1: 04/11
Column 2: 10:47:30
Current script:
import pandas as pd
import sys
import csv
import datetime
#import my csv
#working, able to read all data
data = pd.read_csv(r'C:\Users\devon\Desktop\testSnort.csv')
print (data)
#Select only the "DateTime" column from the CSV
#Able to print out the Date+time only
#(variable is named datetime_col so it does not shadow the imported datetime module)
DateTimeList = ["DateTime"]
datetime_col = pd.read_csv(r'C:\Users\devon\Desktop\testSnort.csv', usecols=DateTimeList)
print (datetime_col)
I am able to output the current data and to filter out the DateTime values.
However, I can't seem to split the two apart into different columns.
Could someone advise me on whether this is possible?
Thank you!

There are a couple of ways that you can go about it, but if you are reading the timestamp as a string, probably the easiest way to do it is to split on the -.
value1, value2 = timestamp.split("-")
That will give you the month and day in value1 and the time in value2.
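If the timestamps are already in a pandas DataFrame, the same split can be applied to the whole column at once. This is only a sketch: the sample rows are made up, and "DateTime" is assumed to be the column name, as in the question's usecols list. The fractional seconds are cut off by splitting a second time on the dot.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Snort CSV.
data = pd.DataFrame(
    {"DateTime": ["04/11-10:47:30.789142", "04/12-23:05:01.000321"]}
)

# Split once on "-" and spread the two pieces into new columns.
data[["Date", "Time"]] = data["DateTime"].str.split("-", n=1, expand=True)

# Keep only the part of the time before the "." (drops the fraction).
data["Time"] = data["Time"].str.split(".", n=1).str[0]

print(data[["Date", "Time"]])
```

Both new columns stay plain strings, which is usually enough for inserting them into a database as text.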

date='01/02-12:12:00.0000'
print(date.split('-'))
#['01/02', '12:12:00.0000']
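If you would rather validate the timestamp than just cut the string, strptime can parse it and strftime can write the two parts back out. A minimal sketch, assuming the value always looks like MONTH/DAY-HOUR:MIN:SEC followed by up to six fractional digits:

```python
from datetime import datetime

raw = "04/11-10:47:30.789142"

# Parse the whole Snort timestamp; a malformed value raises ValueError.
parsed = datetime.strptime(raw, "%m/%d-%H:%M:%S.%f")

# Format the two parts back out, dropping the fractional seconds.
date_part = parsed.strftime("%m/%d")
time_part = parsed.strftime("%H:%M:%S")

print(date_part, time_part)  # 04/11 10:47:30
```

Because the timestamp carries no year, strptime fills in 1900, so a 02/29 timestamp would be rejected; the plain split does not have that problem.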

Related

Parsing dates using Pandas

I'm trying to read a column with dates and times from a CSV file and want to plot the number of entries per day.
I don't actually know how to read them though.
You'll need to convert the column to datetime first.
df['created'] = pd.to_datetime(df['created'])
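Once the column is a datetime, counting rows per day is a short groupby. A sketch with made-up timestamps; only the column name "created" comes from the line above:

```python
import pandas as pd

# Hypothetical rows; in practice df would come from pd.read_csv.
df = pd.DataFrame(
    {"created": ["2019-05-02 07:50:58", "2019-05-02 09:10:00", "2019-05-03 12:00:00"]}
)
df["created"] = pd.to_datetime(df["created"])

# Group on the calendar date of each timestamp and count the rows.
per_day = df.groupby(df["created"].dt.date).size()

print(per_day)
```

Calling per_day.plot(kind="bar") would then draw the counts, assuming matplotlib is installed.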

Remove date from datetime in csv

For a project in Python we need to use a CSV file with several columns to create an ML model. My problem is that one column is a datetime, and the date is useless for the predictions, but I don't know how to remove it, as it is in the same column as the time (so I can't just drop the column):
26.03.2018 00:00:00
Can you help me remove the date somehow? I tried different methods for handling 'datetime', but none have worked so far.
data = pd.read_csv("TotalTrafo.csv")
dir(data)
type(data.Trafo1)
pandas.core.series.Series
Just do:
df['DateTime column']=df['DateTime column'].dt.time
to get only the time.
For a single datetime object foo, you can call foo.time() to get only the time (foo.date() for the date, and so on).
If your pandas Series does not contain datetime objects, you can convert it to datetime by doing something like this:
data['Trafo1'] = pd.to_datetime(data['Trafo1'])
#or
data.Trafo1 = pd.to_datetime(data.Trafo1)
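For the 26.03.2018 00:00:00 layout in the question, it is probably safest to tell to_datetime the format explicitly, since the day comes first. A small sketch with made-up rows, assuming every value follows that day.month.year layout:

```python
import datetime as dt

import pandas as pd

# Hypothetical values in the same layout as the question's example.
data = pd.DataFrame({"Trafo1": ["26.03.2018 00:00:00", "26.03.2018 13:45:10"]})

# Parse with an explicit day-first format, then keep only the time part.
parsed = pd.to_datetime(data["Trafo1"], format="%d.%m.%Y %H:%M:%S")
data["Trafo1"] = parsed.dt.time

print(data)
```

The column then holds datetime.time objects; for a model you may prefer numeric features such as parsed.dt.hour instead.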

How to group data by date and get the mode at the same time using python

I'm cleaning weather data and I have several fields with categorical values. In the data set, one date can have several values, so I need to group them by date and, at the same time, get the mode for each date.
temp_df2 = temp_df2.groupby(['Time']).apply(pd.DataFrame.mode)
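What about this?:
temp_df2 = temp_df2.groupby(['Time']).apply(lambda g: g.mode())
Where mode() is called on each group through a lambda? (Writing pd.DataFrame.mode() on its own would raise a TypeError, because there is no DataFrame to call it on.)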
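If the goal is one most-frequent value per date, one option is to aggregate each group with Series.mode() and keep the first result. This is a sketch on made-up data: the "Weather" column is hypothetical, and taking the first mode on ties is an assumption about what you want.

```python
import pandas as pd

# Hypothetical weather rows; "Time" holds the date, as in the question.
temp_df2 = pd.DataFrame(
    {
        "Time": ["2019-01-01", "2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"],
        "Weather": ["Rain", "Rain", "Sunny", "Cloudy", "Cloudy"],
    }
)

# mode() can return several values on a tie; iloc[0] keeps the first.
daily_mode = temp_df2.groupby("Time")["Weather"].agg(lambda s: s.mode().iloc[0])

print(daily_mode)
```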

How can I convert my date column to datetime?

I have imported some data, but the date column is in this format: 50:58.0, 23:11.0, etc. When I click on the cell in Excel, however, it shows 02/05/2019 07:50:58 (for the first one, 50:54.0). So when I import it into Python as a pandas table, it still retains the 50:54.0 format, although I do not know why.
I tried changing the column to datetime with:
df['EventTS'] = pd.to_datetime(df['EventTS'], format='%d%b%Y:%H:%M:%S.%f')
but it doesn't work; the error is: time data '07:27.0' does not match format '%d%b%Y:%H:%M:%S.%f' (match)
Without changing the format in Excel, how do I correct this issue in Python?

How to deal with multiple date string formats in a python series

I have a csv file which I am trying to complete operations on. I have created a dataframe with one column titled "start_date" which has the date of warranty start. The problem I have encountered is that the format of the date is not consistent. I would like to know the number of days passed from today's calendar date and the date warranty started for this product.
Two examples of the entries in this start_date series:
9/11/15
9/11/15 0:00
How can I identify each of these formats and treat them accordingly?
Unfortunately, you just have to try each format it might be. If you give strptime a format string, it will attempt to parse the date with it and raise a ValueError when the string does not match.
The code will end up looking like:
from datetime import datetime
POSSIBLE_DATE_FORMATS = ['%m/%d/%Y', '%Y/%m/%d']  # etc... all the formats the date might be in
parsed_date = None  # stays None if no format matches
# raw_string_date is the date string you want to parse
for date_format in POSSIBLE_DATE_FORMATS:
    try:
        parsed_date = datetime.strptime(raw_string_date, date_format)  # try to get the date
        break  # if correct format, don't test any other formats
    except ValueError:
        pass  # if incorrect format, keep trying other formats
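Applied to the two sample entries from the question, the loop could look like the sketch below. The two format strings are an assumption: they read 9/11/15 as month/day/two-digit year, which the question does not confirm. The helper name parse_start_date is made up for illustration.

```python
from datetime import datetime

# Assumed formats for "9/11/15" and "9/11/15 0:00" (month first).
POSSIBLE_DATE_FORMATS = ["%m/%d/%y", "%m/%d/%y %H:%M"]


def parse_start_date(raw_string_date):
    """Return the first successful parse, or raise if nothing matches."""
    for date_format in POSSIBLE_DATE_FORMATS:
        try:
            return datetime.strptime(raw_string_date, date_format)
        except ValueError:
            continue  # wrong format, try the next one
    raise ValueError(f"no known format matches {raw_string_date!r}")


for raw in ["9/11/15", "9/11/15 0:00"]:
    start = parse_start_date(raw)
    days_passed = (datetime.now() - start).days
    print(raw, "->", start.date(), "days since start:", days_passed)
```

Raising on an unknown format, rather than silently skipping it, makes it obvious when a new date layout shows up in the file.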
You have a few options really. I'm not entirely sure what happens when you try to load the file directly with pd.read_csv, but as suggested above, you can define a set of format strings and try each one when parsing the data.
One other option would be to read the date column in as a string and then parse it yourself. If you want the column to look like 'YYYY-MM-DD', slice the string down to just that part and then save it back, something like:
import pandas as prandas
import datetime
df = prandas.read_csv('supa_kewl_data.dis_fmt_rox', dtype={'start_date': str})
print(df.head())
# we are interested in start_date
date_strs = df['start_date'].values
#YYYY-MM-DD
#012345678910
filter_date_strs = [x[0:10] for x in date_strs]
df['filter_date_strs'] = filter_date_strs
# sometimes i've gotten complained at by pandas for doing this
# try doing df.loc[:,'filter_date_strs'] = filter_date_strs
# if you get some warning thing
# if you want you can convert back to date time using a
dobjs = [datetime.datetime.strptime(x,'%Y-%m-%d') for x in filter_date_strs]
df['dobj_start_date'] = dobjs
df.to_csv('even_better_data.csv', index=False)
Hopefully this helps! Pandas documentation is sketchy sometimes, looking at the doc in 0.16.2 for read_csv() is intimidating... http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
The library itself is stellar!
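The "set of format strings" idea can also be done inside pandas instead of in a Python loop over rows. A sketch, assuming the same two month-first formats as the question's sample entries (an assumption, not something the question confirms): each pass parses the whole column with one format, turns mismatches into NaT, and fills only the gaps left by earlier passes.

```python
import pandas as pd

# Hypothetical column with both layouts plus one unparseable value.
df = pd.DataFrame({"start_date": ["9/11/15", "9/11/15 0:00", "garbage"]})

# Start with all-missing timestamps, then fill in whatever each format parses.
parsed = pd.Series(pd.NaT, index=df.index, dtype="datetime64[ns]")
for fmt in ["%m/%d/%y", "%m/%d/%y %H:%M"]:
    attempt = pd.to_datetime(df["start_date"], format=fmt, errors="coerce")
    parsed = parsed.fillna(attempt)

df["start_date_parsed"] = parsed
print(df)
```

Rows that match none of the formats stay as NaT, so they are easy to find afterwards with df["start_date_parsed"].isna().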
Not sure if this will help, but this is what I do when I'm working with Pandas on excel files and want the date format to be 'mm/dd/yyyy' or some other.
writer = pd.ExcelWriter(filename, engine='xlsxwriter', datetime_format='mm/dd/yyyy')
df.to_excel(writer, sheet_name=sheetname)
writer.close()  # writes the workbook to disk
Maybe it'll work with:
df.to_csv
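For CSV output, to_csv has a date_format parameter that takes a strftime-style string rather than an Excel-style one. A minimal sketch with made-up data, assuming the column already holds real datetimes (plain strings are written unchanged):

```python
import pandas as pd

# Hypothetical frame with a proper datetime column.
df = pd.DataFrame(
    {"start_date": pd.to_datetime(["2015-09-11", "2015-12-01"]), "id": [1, 2]}
)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False, date_format="%m/%d/%Y")

print(csv_text)
```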
