I have some text below that needs to be handled. Each line lists a timestamp and then a value. The timestamp format is yyyymmdd, and I want to be able to change it to yyyy-mm-dd or some other variation (yyyy/mm/dd, etc.). I can't find a string method that inserts characters into strings, so I'm unsure of the best way to go about this. I'm looking for efficiency here, and for general advice on slicing and dicing text in Python. Thanks in advance!
19800101,0.76
19800102,0.00
19800103,0.51
19800104,0.00
19800105,1.52
19800106,2.54
19800107,0.00
19800108,0.00
19800109,0.00
19800110,0.76
19800111,0.25
19800112,0.00
19800113,6.10
19800114,0.00
19800115,0.00
19800116,2.03
19800117,0.00
19800118,0.00
19800119,0.25
19800120,0.25
19800121,0.00
19800122,0.00
19800123,0.00
19800124,0.00
19800125,0.00
19800126,0.00
19800127,0.00
19800128,0.00
19800129,0.00
19800130,7.11
19800131,0.25
19800201,.510
19800202,0.00
19800203,0.00
19800204,0.00
I'd do something like this:
#!/usr/bin/env python
from datetime import datetime

with open("stuff.txt", "r") as f:
    for line in f:
        # Remove leading or trailing whitespace (like line endings)
        line = line.strip()
        # Split the timestamp and value
        raw_timestamp, value = line.split(",")
        # Make the timestamp an actual datetime object
        timestamp = datetime.strptime(raw_timestamp, "%Y%m%d")
        # Print the timestamp separated by -'s. Replace - with / or whatever.
        print("%s,%s" % (timestamp.strftime("%Y-%m-%d"), value))
This lets you parse or print the timestamp using any format supported by strptime/strftime.
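A sketch of the same idea wrapped in a function, so the output layout is just a parameter (`convert` is a made-up name and the three formats are arbitrary examples). A side benefit over plain slicing is that strptime rejects impossible dates such as 19800231 with a ValueError:

```python
from datetime import datetime

def convert(raw_timestamp, out_format="%Y-%m-%d"):
    """Parse a yyyymmdd string and re-emit it using out_format."""
    timestamp = datetime.strptime(raw_timestamp, "%Y%m%d")
    return timestamp.strftime(out_format)

# The same date rendered in three different layouts
for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y"):
    print(convert("19800131", fmt))  # 1980-01-31, 1980/01/31, 31.01.1980
```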
For general advice on slicing and dicing text in Python: use the slice operator.
line = '19800101,0.76'
# line[:4] is the year, line[4:6] the month, line[6:] the day plus ",value"
print('{0}-{1}-{2}'.format(line[:4], line[4:6], line[6:]))
Read the Python tutorial sections on strings (look for the part on slicing) and on string formatting.
Strings are immutable, so you can't insert characters into an existing string; build a new one instead. Try this:
date = '19800131'
print('-'.join([date[:4], date[4:6], date[6:]]))
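The slicing approach can be applied to a whole "yyyymmdd,value" line with the separator as a parameter. This is a minimal sketch (`reformat_line` is a hypothetical name); unlike the datetime version, it does not check that the date is valid:

```python
def reformat_line(line, sep="-"):
    """Turn 'yyyymmdd,value' into 'yyyy<sep>mm<sep>dd,value' using slices only."""
    # Drop the line ending and split off the value at the first comma
    date, value = line.rstrip("\r\n").split(",", 1)
    # Strings are immutable, so build a new string from three slices
    new_date = sep.join((date[:4], date[4:6], date[6:8]))
    return new_date + "," + value

print(reformat_line("19800101,0.76"))       # 1980-01-01,0.76
print(reformat_line("19800201,.510", "/"))  # 1980/02/01,.510
```

To convert a whole file, loop over it as in the first answer and call reformat_line on each line.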
Related
I have a script that reads through a log file containing hundreds of these logs and looks for the ones that have an "On", "Off", or "Switch" type. Then I output each log into its own list. I'm trying to find a way to extract the Out and In times into a separate list/array and then subtract the two times to find the duration of each log. This is what the outputted logs look like:
['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']
This is my current code:
logfile = '/path/to/my/logfile'
with open(logfile, 'r') as f:
    text = f.read()
    words = ["On", "Off", "Switch"]
    text2 = text.split('\n')
    for l in text.split('\n'):
        if (words[0] in l or words[1] in l or words[2] in l):
            log = l.split(',')[0:3]
I'm stuck on how to target only the Out and In time values from the logs and put them in an array and convert to a time value to find duration.
The initial log line before the script runs (everything after the "In" time is irrelevant for what I'm looking for, so I only output the first three comma-separated fields):
2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a","Type":"Switch,"In":"2020-01-31T00:30:20.140Z","Path":"interface","message":"interface changed status from unknown to normal","severity":"INFORMATIONAL","display":true,"json_map":"{\"severity\":null,\"eventId\":\"65e-64d9-45-ab62-8ef98ac5e60d\",\"componentPath\":\"interface_css\",\"displayToGui\":false,\"originalState\":\"unknown\",\"closed\":false,\"eventType\":\"InterfaceStateChange\",\"time\":\"2019-04-18T07:04:32.747Z\",\"json_map\":null,\"message\":\"interface_css changed status from unknown to normal\",\"newState\":\"normal\",\"info\":\"Event created with current status\"}","closed":false,"info":"Event created with current status","originalState":"unknown","newState":"normal"}
Below is a possible solution. The wordmatch line is a bit of a hack until I find something clearer: it's a one-liner that creates either an empty set or a one-element set containing True, depending on whether any of the words match.
(Untested)
import re
logfile = '/path/to/my/logfile'
words = ["On", "Off", "Switch"]
dateformat = r'\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[Zz]?'
pattern = fr'Out:\s*\[(?P<out>{dateformat})\].*In":\s*\"(?P<in>{dateformat})\"'
regex = re.compile(pattern)
with open(logfile, 'r') as f:
    for line in f:
        wordmatch = set(filter(None, (word in line for word in words)))
        if wordmatch:
            match = regex.search(line)
            if match:
                intime = match.group('in')
                outtime = match.group('out')
                # whatever to store these strings, e.g., append to list or insert in a dict.
As noted, your log example is very awkward, so this works for the example line, but may not work for every line. Adjust as necessary.
I have also not included a conversion to a datetime.datetime object, should you want one. For that, read through the datetime module documentation, in particular datetime.strptime. (Alternatively, you may want to store your results in a Pandas table; in that case, read the Pandas documentation on converting strings to datetime objects.)
You also don't need to read and split on newlines yourself: for line in f will do that for you (provided f is indeed a file handle).
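If you do want durations, here is a minimal sketch with datetime.strptime, assuming every time has the same layout as the example (fractional seconds plus a literal trailing Z, treated as UTC). duration_seconds is a hypothetical helper; whether to compute In minus Out or the reverse depends on what the two fields mean in your logs (in the example line, In is 10 ms earlier than Out, so the result is negative):

```python
from datetime import datetime

# Assumed layout of every time in the log, e.g. 2020-01-31T00:30:20.150Z;
# the trailing Z is matched literally, so the result is a naive datetime.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

def duration_seconds(outtime, intime):
    """Return (In - Out) in seconds; negative if In is earlier than Out."""
    out_dt = datetime.strptime(outtime, LOG_TIME_FORMAT)
    in_dt = datetime.strptime(intime, LOG_TIME_FORMAT)
    return (in_dt - out_dt).total_seconds()

# Times taken from the example log line in the question
print(duration_seconds("2020-01-31T00:30:20.150Z", "2020-01-31T00:30:20.140Z"))  # -0.01
```

Inside the loop above, you could append duration_seconds(outtime, intime) to a list instead of keeping the raw strings.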
Regex is probably the way to go (speed, efficiency, etc.), but...
You could take a very simplistic (if very inefficient) approach of cleaning your data:
join all of it into a string
replace things that hinder easy parsing
split wisely and filter the split
like so:
data = ['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']
all_text = " ".join(data)
# this is inefficient and will create throwaway intermediate strings - if you are
# in a hurry or operate on 100s of MB of data, this is NOT the way to go, unless
# you have time
# iterate pairs of ("bad thing", "what to replace it with") (or list of bad things)
for thing in [ (": ",":"), (list('[]{}"'),"") ]:
    whatt = thing[0]
    withh = thing[1]
    # if list, do so for each bad thing
    if isinstance(whatt, list):
        for p in whatt:
            # replace it
            all_text = all_text.replace(p,withh)
    else:
        all_text = all_text.replace(whatt,withh)
# format is now far better suited to splitting/filtering
cleaned = [a for a in all_text.split(" ")
           if any(a.startswith(prefix) or "Switch" in a
                  for prefix in {"In:","Switch:","Out:"})]
print(cleaned)
Outputs:
['Out:2020-01-31T00:30:20.150Z', 'Type:Switch', 'In:2020-01-31T00:30:20.140Z']
After cleaning, your data looks like this:
2020-01-31T12:04:57.976Z 1234 Out:2020-01-31T00:30:20.150Z Id:Id:4-f-4-9-6a Type:Switch In:2020-01-31T00:30:20.140Z
You can transform the clean list into a dictionary for ease of lookup:
d = dict( part.split(":",1) for part in cleaned)
print(d)
will produce (dicts keep insertion order in Python 3.7+):
{'Out': '2020-01-31T00:30:20.150Z', 'Type': 'Switch', 'In': '2020-01-31T00:30:20.140Z'}
You can use the datetime module to parse the times from your values, as shown in the other answer.
I need to convert the form data below to a slightly different format to be able to submit correctly.
I have this form data.
PaReq:eJxdUt1ugjAYvfcpyB6AlvpTMLUJG1vmEp2Z7mKXpHRIVMBSBvr0a9FatAlJz/lO6en5PrLZCs6j
NWe14HTgOGTBqypOuZMls6cydrGHgwn2UOA/6bISrMIvfrzsFfrjosqKnHoudBEBBpryggu2jXNp
CEXF7Pg8X9JRgAIICbhCWz9wMY+oj/EYDyfwugi40FaWxwdOPyJnXRZCVgR02JZZUedSnKiPJgQY
YMu12NOtlOUUgKZp3N+ikGUsRbF3WeHWO0CAVphXgMdnkFWtiap/Y5sldBGFjf1Yuzzv0PL8evrc
pDMCtMLqk1hyiqCHoT/0HIimCE/HmICO78V10OapNxy5QaDiukBbL7WT8CbSmj7VS6QWgufMRGKQ
FfC2LHKuzqg+3vY9v7xidBg5VTcryqfGt4QeAyEv73c9Z1J1LwxZ+takbbhOfr6h9sjC65rpSehE
d4Yy1TXkQb9zlNkWEmD+r642A6n71A0vHRBwP9j/7TDLBQ==
TermUrl:https://www.footpatrol.co.uk/checkout/3d
MD:
Wanted format:
PaReq=eJxdUt1ugjAYvfcpyB6AlvpTMLUJG1vmEp2Z7mKXpHRIVMBSBvr0a9FatAlJz%2FlO6en5PrLZCs6j%0D%0ANWe14HTgOGTBqypOuZMls6cydrGHgwn2UOA%2F6bISrMIvfrzsFfrjosqKnHoudBEBBpryggu2jXNp%0D%0ACEXF7Pg8X9JRgAIICbhCWz9wMY%2Boj%2FEYDyfwugi40FaWxwdOPyJnXRZCVgR02JZZUedSnKiPJgQY%0D%0AYMu12NOtlOUUgKZp3N%2BikGUsRbF3WeHWO0CAVphXgMdnkFWtiap%2FY5sldBGFjf1Yuzzv0PL8evrc%0D%0ApDMCtMLqk1hyiqCHoT%2F0HIimCE%2FHmICO78V10OapNxy5QaDiukBbL7WT8CbSmj7VS6QWgufMRGKQ%0D%0AFfC2LHKuzqg%2B3vY9v7xidBg5VTcryqfGt4QeAyEv73c9Z1J1LwxZ%2BtakbbhOfr6h9sjC65rpSehE%0D%0Ad4Yy1TXkQb9zlNkWEmD%2Br642A6n71A0vHRBwP9j%2F7TDLBQ%3D%3D%0D%0A&TermUrl=https%3A%2F%2Fwww.footpatrol.co.uk%2Fcheckout%2F3d&MD=
I have tried the following, but it produces a different format from the one I need to submit.
Code:
import urllib.parse
print(urllib.parse.quote_plus('''PaReq:eJxdUt1ugjAYvfcpyB6AlvpTMLUJG1vmEp2Z7mKXpHRIVMBSBvr0a9FatAlJz/lO6en5PrLZCs6j
NWe14HTgOGTBqypOuZMls6cydrGHgwn2UOA/6bISrMIvfrzsFfrjosqKnHoudBEBBpryggu2jXNp
CEXF7Pg8X9JRgAIICbhCWz9wMY+oj/EYDyfwugi40FaWxwdOPyJnXRZCVgR02JZZUedSnKiPJgQY
YMu12NOtlOUUgKZp3N+ikGUsRbF3WeHWO0CAVphXgMdnkFWtiap/Y5sldBGFjf1Yuzzv0PL8evrc
pDMCtMLqk1hyiqCHoT/0HIimCE/HmICO78V10OapNxy5QaDiukBbL7WT8CbSmj7VS6QWgufMRGKQ
FfC2LHKuzqg+3vY9v7xidBg5VTcryqfGt4QeAyEv73c9Z1J1LwxZ+takbbhOfr6h9sjC65rpSehE
d4Yy1TXkQb9zlNkWEmD+r642A6n71A0vHRBwP9j/7TDLBQ==
TermUrl:https://www.footpatrol.co.uk/checkout/3d
MD:'''))
Is this possible with Python? And what do I need to do to get the wanted result?
If your parameters are separated by newlines, you can use the splitlines method to get a list of parameters, and use re.split on each item to get a [name, value] list.
Then apply quote_plus to each name and value, '='.join them, and '&'.join all the parameters.
import urllib.parse
import re
data = '''PaReq:eJxdUt1ugjAYvfcpyB6AlvpTMLUJG1vmEp2Z7mKXpHRIVMBSBvr0a9FatAlJz/lO6en5PrLZCs6jNWe14HTgOGTBqypOuZMls6cydrGHgwn2UOA/6bISrMIvfrzsFfrjosqKnHoudBEBBpryggu2jXNpCEXF7Pg8X9JRgAIICbhCWz9wMY+oj/EYDyfwugi40FaWxwdOPyJnXRZCVgR02JZZUedSnKiPJgQYYMu12NOtlOUUgKZp3N+ikGUsRbF3WeHWO0CAVphXgMdnkFWtiap/Y5sldBGFjf1Yuzzv0PL8evrcpDMCtMLqk1hyiqCHoT/0HIimCE/HmICO78V10OapNxy5QaDiukBbL7WT8CbSmj7VS6QWgufMRGKQFfC2LHKuzqg+3vY9v7xidBg5VTcryqfGt4QeAyEv73c9Z1J1LwxZ+takbbhOfr6h9sjC65rpSehEd4Yy1TXkQb9zlNkWEmD+r642A6n71A0vHRBwP9j/7TDLBQ==
TermUrl:https://www.footpatrol.co.uk/checkout/3d
MD:'''
data = [re.split(':(?!//)', line) for line in data.splitlines()]
data = '&'.join('='.join(urllib.parse.quote_plus(i) for i in l) for l in data)
If your data is split by newlines arbitrarily, you could join the lines and split by name. Then zip names and values, quote and join.
data = ''.join(data.splitlines())
data = zip(['PaReq', 'TermUrl', 'MD'], re.split('PaReq:|TermUrl:|MD:', data)[1:])
data = '&'.join('='.join(urllib.parse.quote_plus(i) for i in l) for l in data)
If you want to keep the newline characters, use only the last two lines of the second code snippet.
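Note that the wanted format ends every PaReq line with a CRLF (%0D%0A) and has none after TermUrl, whereas quote_plus turns a bare '\n' into just %0A. Below is a sketch that reproduces that layout, assuming the field names are known in advance and that only PaReq is line-wrapped (both read off the example; encode_form is a hypothetical name). It uses urllib.parse.urlencode, which quotes names and values with quote_plus by default:

```python
from urllib.parse import urlencode

FIELD_NAMES = ("PaReq", "TermUrl", "MD")

def encode_form(raw, crlf_fields=("PaReq",)):
    """Turn 'Name:value' lines (values may wrap onto following lines)
    into an application/x-www-form-urlencoded string."""
    fields = []  # list of [name, list_of_value_lines]
    for line in raw.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in FIELD_NAMES:
            # A known field name followed by ':' starts a new field
            fields.append([name, [rest] if rest else []])
        elif fields:
            # Any other line continues the value of the previous field
            fields[-1][1].append(line)
    pairs = []
    for name, lines in fields:
        if name in crlf_fields:
            # Assumption: every wrapped line ends with CRLF, as in the wanted format
            value = "".join(l + "\r\n" for l in lines)
        else:
            value = "".join(lines)
        pairs.append((name, value))
    return urlencode(pairs)
```

Passing the triple-quoted string from the question (without the quote_plus call) should give the PaReq=...&TermUrl=...&MD= layout shown under "Wanted format".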
I want to convert a string into a dictionary. I saved this dictionary previously in a text file.
The problem is that I am not sure what the keys look like. The values are generated with Counter(dictionaryName), and the dictionary is so large that I cannot check every key by hand.
The keys can contain single quotes ('), double quotes ("), commas, and maybe other characters. Is there any way to convert the string back into a dictionary?
For example this is stored in the file:
Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23,...})
I found previous solutions with for example json, but I have problems with the double quotes and I cannot simply split for the commas.
If you trust the source, import Counter (from collections import Counter) and eval() the string.
How about something like:
>>> from collections import Counter
>>> line = '''Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23})'''
>>> D = eval(line)
>>> D
Counter({'element0': 512, "'4,5'element1": 50, '4:55foobar': 23})
You could remove the Counter( and ) parts, then parse the rest with ast.literal_eval as long as it only involves basic Python data types:
import ast
from collections import Counter

def parse_Counter_string(s):
    s = s.strip()
    if not (s.startswith('Counter(') and s.endswith(')')):
        raise ValueError('String does not match expected format')
    # Counter( is 8 characters
    # 12345678
    s = s[8:-1]
    return Counter(ast.literal_eval(s))
In the future, I recommend picking a different way to serialize your data.
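For example, since a Counter is a dict subclass, the json module can write it as-is and escapes quotes and commas inside keys for you. A minimal sketch, assuming all keys are strings (JSON object keys must be: json.dump turns int keys into strings and rejects tuple keys); save_counter and load_counter are made-up names:

```python
import json
from collections import Counter

def save_counter(counter, path):
    # Counter is a dict subclass, so json serializes it as a JSON object;
    # quotes and commas inside keys are escaped automatically.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(counter, f)

def load_counter(path):
    # json.load returns a plain dict; wrap it back into a Counter
    with open(path, encoding="utf-8") as f:
        return Counter(json.load(f))
```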
You can use the demjson library for this. If you have the text directly in your program:
import demjson
counter = demjson.decode("enter your text here")
If it is in a file, you can do the following:
import demjson
from os.path import dirname, join, realpath
WD = dirname(realpath(__file__))
with open(join(WD, "filename"), "r") as file:
    counter = demjson.decode(file.read())
I'm trying to count the number of times each date occurs within a chat log. For example, the file I'm reading from may look like this:
*username* (mm/dd/yyyy hh:mm:ss): *message here*
However, I need to split the date from the time, as I currently treat them as one. I'm struggling to solve this, so any help is appreciated. Below is some sample code I'm using to try to get the date count working. I'm currently using a Counter, but I'm wondering if there are other ways to count dates.
filename = tkFileDialog.askopenfile(filetypes=(("Text files", "*.txt") ,))
mtxtr = filename.read()
date = []
number = []
occurences = Counter(date)
mtxtformat = mtxtr.split("\r\n")
print 'The Dates in the chat are as follows'
print "--------------------------------------------"
for mtxtf in mtxtformat:
    participant = mtxtf.split("(")[0]
    date = mtxtf.split("(")[-1]
    message = date.split(")")[0]
    date.append(date1.strip())
for item in date:
    if item not in number:
        number.append(item)
for item in number:
    occurences = date.count(item)
    print("Date Occurences " + " is: " + str(occurences))
The easiest way would be to use a regex and count the matches of your date pattern in the log file. It is likely faster, too.
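A sketch of that idea combined with collections.Counter, assuming every relevant line looks like *username* (mm/dd/yyyy hh:mm:ss): *message* (count_dates is a hypothetical name). splitlines() also copes with the \r\n line endings the question splits on, and lines without a timestamp are simply skipped:

```python
import re
from collections import Counter

# Date in group 1, time matched but not captured: (mm/dd/yyyy hh:mm:ss)
DATE_RE = re.compile(r"\((\d{2}/\d{2}/\d{4}) \d{2}:\d{2}:\d{2}\)")

def count_dates(text):
    counts = Counter()
    for line in text.splitlines():
        match = DATE_RE.search(line)
        if match:  # skip lines that carry no timestamp
            counts[match.group(1)] += 1
    return counts

sample = ("alice (01/02/2020 10:00:00): hi\r\n"
          "bob (01/02/2020 11:00:00): yo\r\n"
          "alice (01/03/2020 09:00:00): hey")
print(count_dates(sample))  # Counter({'01/02/2020': 2, '01/03/2020': 1})
```

To print one line per date, iterate over sorted(counts.items()).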
If you know the date and time are always enclosed in the first pair of parentheses on the line (i.e. no other (...) appears before the one containing the date and time):
*username* (mm/dd/yyyy hh:mm:ss): *message here*
Then you can extract based on the parens:
import re
...
parens = re.compile(r'\((.+)\)')
for mtxtf in mtxtformat:
    match = parens.search(mtxtf)
    date.append(match.group(1).split(' ')[0])
...
Note: If the message itself contains parens, this may match more than just the needed (mm/dd/yyyy hh:mm:ss). Doing match.group(1).split(' ')[0] would still give you the information you are looking for assuming there is no information enclosed in parens before your date-time information (for the current line).
Note2: Ideally enclose this in a try-except to continue on to the next line if the current line doesn't contain useful information.
I have a log file which has text that looks like this.
Jul 1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <]^Iat com/avc/abc/magr/service/find.something(abc/1235/locator/abc;Ljava/lang/String;)Labc/abc/abcd/abcd;(bytecode:7)
There are two time formats in the file. I need to sort this log file based on the date and time enclosed in [].
This is the regex I am trying to use, but it does not return anything:
t_pat = re.compile(r".*\[\d+/\D+/.*\]")
I want to go over each line in file, be able to apply this pattern and sort the lines based on the date & time.
Can someone help me on this? Thanks!
There is a space after the opening [ that your regular expression needs to account for:
import re
text = "Jul 1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <]^Iat com/avc/abc/magr/service/find.something(abc/1235/locator/abc;Ljava/lang/String;)Labc/abc/abcd/abcd;(bytecode:7)"
matches = re.findall(r"\[\s*(\d+/\D+/.*?)\]", text)
print(matches)
['1/Jul/2013 03:27:12.818']
Next, parse the time using time.strptime:
http://docs.python.org/2/library/time.html#time.strptime
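Finally, use the parsed time as a key into a dict, with the line as the value, and sort the entries by key. (Lines with identical timestamps would overwrite each other in a dict, so a list of (time, line) pairs is safer.)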
You are not matching the initial space; you also want to group the date for easy extraction, and limit the \D and .* patterns to non-greedy:
t_pat = re.compile(r".*\[\s?(\d+/\D+?/.*?)\]")
Demo:
>>> re.compile(r".*\[\s?(\d+/\D+?/.*?)\]").search(line).group(1)
'1/Jul/2013 03:27:12.818'
You can narrow down the pattern some more; you only need to match 3 letters for the month for example:
t_pat = re.compile(r".*\[\s?(\d{1,2}/[A-Z][a-z]{2}/\d{4} \d{2}:\d{2}:[\d.]{2,})\]")
Read all the lines of the file, then sort them with a key function that parses out the date:
import re
import datetime

t_pat = re.compile(r".*\[\s?(\d+/\D+?/.*?)\]")
date_format = '%d/%b/%Y %H:%M:%S.%f'

def parse_date_from_log_line(line):
    # Extract the bracketed date/time and turn it into a datetime for sorting
    date_string = t_pat.search(line).group(1)
    return datetime.datetime.strptime(date_string, date_format)

log_path = 'mylog.txt'
with open(log_path) as log_file:
    lines = log_file.readlines()
lines.sort(key=parse_date_from_log_line)
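The key function above raises AttributeError on any line without a bracketed timestamp (search returns None), and the question mentions two time formats. A sketch that instead sorts such lines to the front (an arbitrary choice), assuming the month names are English abbreviations that %b understands in your locale; sort_key and sort_log_lines are hypothetical names:

```python
import re
from datetime import datetime

# Same bracketed pattern as above, without the unneeded leading .*
BRACKETED_TIME = re.compile(r"\[\s?(\d+/\D+?/.*?)\]")
TIME_FORMAT = "%d/%b/%Y %H:%M:%S.%f"

def sort_key(line):
    """Parsed [d/Mon/yyyy hh:mm:ss.fff] time, or datetime.min if the line has none."""
    match = BRACKETED_TIME.search(line)
    if match is None:
        return datetime.min  # assumption: lines without a bracketed time sort first
    return datetime.strptime(match.group(1), TIME_FORMAT)

def sort_log_lines(lines):
    # sorted() is stable, so lines with equal keys keep their original order
    return sorted(lines, key=sort_key)
```

Usage is the same as before: sort_log_lines(log_file.readlines()).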