This function takes an email body as input and returns the values that follow "Application name", "Source" and "Message" respectively. It works fine:
def parse_subject(line):
    info = {}
    segments = line.split(' ')
    info['time'] = segments[0]+' '+segments[1]
    for i in range(2, len(segments)):
        key = ''
        if segments[i] == 'Application name:':
            key = 'appname'
        elif segments[i] == 'Source:':
            key = 'source'
        elif segments[i] == 'Message:':
            key = 'message'
        if key != '':
            i += 1
            info[key] = segments[i]
    return info
For another email body format I need to read more segments, because I have to search more lines of the message body. So I changed info['time'], and as soon as I extend it beyond two segments I get out-of-range errors:
info['time'] = segments[0]+' '+segments[1]+' '+segments[2]+' '+segments[3]+' '+segments[4]+' '+segments[5]......up to segments[17]
Maybe I'll need to extend it further.
The function above then fails with "list index out of range".
I changed the code but got the same error.
I also tried changing the start of the range to match the number of segments, with the same result:
for i in range(<number of segments>, len(segments)):
Example of segments: the length will vary because the string after Message has a different value each time; sometimes it is a URL string.
Question
When I fix the number of segments, let's say up to segments[17],
what do I need to change in the function so that it does not throw an index-out-of-range error?
def parse_subject(line):
    info = {}
    segments = line.split(' ')
    info['time'] = (segments[0] + ' ' + segments[1] + ' ' + segments[2] + ' ' + segments[3] + ' '
                    + segments[4] + ' ' + segments[5] + ' ' + segments[6] + ' ' + segments[7] + ' '
                    + segments[8] + ' ' + segments[9] + ' ' + segments[10] + ' ' + segments[11] + ' '
                    + segments[12] + ' ' + segments[13] + ' ' + segments[14] + ' '
                    + segments[15] + ' ' + segments[16] + ' ' + segments[17])
    for i in range(16, len(segments)):
        key = ''
        if segments[i] == 'name:':
            key = 'appname'
        elif segments[i] == 'Source:':
            key = 'source'
        elif segments[i] == 'Message:':
            key = 'message'
        if key != '':
            i += 1
            info[key] = segments[i]
    return info
if mail["Subject"].find("PA1") > 0 or mail["Subject"].find("PA2") > 0:
    body = get_autosys_body(mail)
    # print(body)
    for line in body.splitlines():
        if 'Application Name' in line:
            job_info = parse_subject(line)
            break
    print(job_info)
I need to pass the line variable (content below)
name:Contoso.Service
Source: host15
Timestamp: 2019-01-22T00:00:43.901Z
Message:null
to the parse_subject(line) function and, from the output above, get:
Contoso.Service as the value of job_info['appname']
host15 as the value of job_info['source']
null as the value of job_info['message']
You need to debug your code. The error is telling you exactly what is wrong.
def old_parse_subject(line):
    info = {}
    segments = line.split(' ')
    if len(segments) < 18:
        raise ValueError("segments[17] won't work if segments is not that long")
You could have done a print(len(segments)) or just print(segments) right before the line where you know the error occurs.
For reading an email header, if you know it has multiple lines, you get those with split('\n') and then for each line if you know it is "name: value" you get that with split(':', 1).
The second argument to split says only split on 1 colon, because any additional colons are allowed to be part of the data. For example, timestamps have colons.
def parse_subject(headers):
    info = {}
    # split the full header into separate lines
    for line in headers.split('\n'):
        # split on colon, but only once
        key, value = line.split(':', 1)
        # store info
        info[key] = value
    return info
data = """name:Contoso.Service
Source: host15
Timestamp: 2019-01-22T00:00:43.901Z
Message:null"""
print (parse_subject(data))
{'name': 'Contoso.Service', 'Source': ' host15', 'Timestamp': ' 2019-01-22T00:00:43.901Z', 'Message': 'null'}
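The output above still carries the leading spaces that follow each colon, and the loop would raise a ValueError on any line that has no colon at all. A minimal sketch of a more tolerant variant follows; the name parse_fields, the lower-cased keys and the extra trailing line in the sample are assumptions made for illustration, not part of the original answer.

```python
def parse_fields(text):
    """Parse 'Key: value' lines into a dict, ignoring lines without a colon."""
    info = {}
    for line in text.splitlines():
        if ':' not in line:
            # blank lines and free text are skipped instead of raising
            continue
        key, value = line.split(':', 1)  # split once; values may contain colons
        info[key.strip().lower()] = value.strip()
    return info


# Sample in the same shape as the question's data, plus a line without a colon.
sample = """name:Contoso.Service
Source: host15
Timestamp: 2019-01-22T00:00:43.901Z
Message:null

For instructions please see the wiki"""

job_info = parse_fields(sample)
print(job_info['name'], job_info['source'], job_info['message'])
```

Because the split is limited to one colon, the timestamp keeps its own colons intact.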
Related
I am trying to get this code to split one at a time, but it is not functioning as expected:
for line in text_line:
    one_line = line.split(' ',1)
    if len(one_line) > 1:
        acro = one_line[0].strip()
        meaning = one_line[1].strip()
        if acro in acronyms_dict:
            acronyms_dict[acro] = acronyms_dict[acro] + ', ' + meaning
        else:
            acronyms_dict[acro] = meaning
Remove the ' ' from the str.split. The file is using tabs to delimit the acronyms:
import requests
data_site = requests.get(
"https://raw.githubusercontent.com/priscian/nlp/master/OpenNLP/models/coref/acronyms.txt"
)
text_line = data_site.text.split("\n")
acronyms_dict = {}
for line in text_line:
    one_line = line.split(maxsplit=1)  # <-- remove the ' '
    if len(one_line) > 1:
        acro = one_line[0].strip()
        meaning = one_line[1].strip()
        if acro in acronyms_dict:
            acronyms_dict[acro] = acronyms_dict[acro] + ", " + meaning
        else:
            acronyms_dict[acro] = meaning
print(acronyms_dict)
Prints:
{
'24KHGE': '24 Karat Heavy Gold Electroplate',
'2B1Q': '2 Binary 1 Quaternary',
'2D': '2-Dimensional',
...
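To see why dropping the ' ' matters, here is a small sketch comparing the two calls on a single tab-delimited line. The sample line is an assumption about the file's layout, based on the answer's statement that tabs are the delimiter.

```python
# One line in the assumed "ACRONYM<TAB>meaning" layout.
line = "2B1Q\t2 Binary 1 Quaternary"

# Splitting on a literal space cuts at the first space, which sits inside the meaning.
by_space = line.split(' ', 1)

# With no separator given, split() cuts at the first run of any whitespace, here the tab.
by_whitespace = line.split(maxsplit=1)

print(by_space)
print(by_whitespace)
```

With the literal space, the acronym ends up glued to the first word of its meaning, which is the behaviour the question describes.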
I'm writing a program that saves music albums into files that you can search for. For that I need a string in the list whose value is only known after the list is complete. Can you go back into that list and replace a blank string with a new value?
I searched online and found something called words.replace, but it doesn't work; I get an AttributeError.
def create_album():
    global idnumber, current_information
    file_information = []
    if current_information[0] != 'N/A':
        save()
    file_information.append(idnumber)
    idnumber += 1
    print('Type c at any point to abort creation')
    for i in creation_list:
        value = input('\t' + i)
        if value.upper() == 'C':
            menu()
        else:
            file_information.append('')  # -1
            file_information.append(value)
    file_information.append('Album created - ' + file_information[2] +'\nSongs:')
    file_information = [w.replace(file_information[1], str(file_information[0]) + '-' + file_information[2]) for w in file_information]  # -2
    current_information = file_information
    save_name = open(save_path + str(file_information[0]) + '-' + str(file_information[2]) + '.txt', 'w')
    for i in file_information:
        save_name.write(str(i) + '\n')
    current_files_ = open(information_file + 'files.txt', 'w')
    filenames.append(file_information[0])
    for i in filenames:
        current_files_.write(str(i) + '\n')
    id_file = open(information_file + 'albumid.txt', 'w')
    id_file.write(str(idnumber))
-1 is where I have set aside a blank row.
-2 is where I try to replace row 1 in the list with the values of row 0 and row 2.
The error message I receive is: 'int' object has no attribute 'replace'
Did you try this?
-2file_information = [w.replace(str(file_information[1]), str(file_information[0]) + '-' + file_information[2]) for w in file_information]
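Note that the comprehension calls .replace on every element of the list, including the integer id at index 0, which is what raises the AttributeError. Replacing an empty string is also risky, because str.replace('', x) inserts x between every character. Since lists are mutable, a simpler route is to assign to the placeholder by index. A minimal sketch, with made-up values standing in for the real input:

```python
# Made-up list in the layout described in the question:
# index 0 = id number, index 1 = blank placeholder, index 2 = album name.
file_information = [7, '', 'Abbey Road', 'The Beatles']

# Overwrite the placeholder in place; no need to rebuild the whole list.
file_information[1] = str(file_information[0]) + '-' + file_information[2]

print(file_information)
```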
I have two loops, one to retrieve the key (Email address) and store it into a variable and another loop to retrieve the value of the keys (List containing data). However, after the script runs, the final key is repeated.
I have attempted shifting the for loops around and attempted creating a new block of code that puts the keys into a list and just retrieve them one by one, but it didn't work.
Code
import re
import os
userInfoDict = dict()
newDict = eval(open("/home/ad.ilstu.edu/keschm1/IT170/UserLog//myData.txt").read())
list = []
#print(newDict)
email = ' '
firstName = ' '
lastName = ' '
IP = ' '
for keys in newDict.keys():
    email = keys
for values in newDict.values():
    firstName = values[0]
    lastName = values[1]
    IP = values[2]
    #print('The name is: ' + firstName + ' ' +lastName +' the IP is: ' + IP +'\n')
    list.append(email + ": " + firstName + " " + lastName +":" + IP)
file = open("userInfo.txt", "w")
for items in list:
    file.write(items + '\n')
    #print(items)
file.close()
This is the result when printing to a txt document:
hosack#comcast.com.:Glen Wilson:172.39.112.76
hosack#comcast.com.:Cindy Tady:123.18.19.20
hosack#comcast.com.:Emma Lim:11.11.11.11
hosack#comcast.com.:Anna Smith:172.28.19.15
hosack#comcast.com.:Jack Hosack:196.88.45.23
for keys in newDict.keys():
    email = keys
What do you think this does? It assigns each key in newDict to the same email variable, one after another, so email ends up set to the last key processed from the newDict dictionary. Probably not what you want.
Since this is the only place where you set email, it will always hold that same value when used later on in your code.
What you probably want is something like this:
for email, values in newDict.items():
    firstName = values[0]
    lastName = values[1]
    IP = values[2]
    #print('The name is: ' + firstName + ' ' +lastName +' the IP is: ' + IP +'\n')
    list.append(email + ": " + firstName + " " + lastName +":" + IP)
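For a self-contained illustration of iterating keys and values together, here is a small sketch. The dictionary contents and the names new_dict and rows are made up for the example; rows is used instead of list so that the built-in list is not shadowed.

```python
# Made-up data in the same shape as the question: email -> [first, last, ip].
new_dict = {
    "alice@example.com": ["Alice", "Ng", "10.0.0.1"],
    "bob@example.com": ["Bob", "Ray", "10.0.0.2"],
}

rows = []
# items() yields each key together with its own value, so they stay paired.
for email, (first_name, last_name, ip) in new_dict.items():
    rows.append(f"{email}: {first_name} {last_name}:{ip}")

for row in rows:
    print(row)
```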
I got a dictionary from the code below, following this SO link,
and need to get the application name, source and message for every key in the dictionary, so I tried to convert it to JSON:
if mail["Subject"].find("example error alert") > 0 :
    body = get_email_body(mail)
    info = {}
    segments = body.split(' ')
    for line in body.splitlines():
        if 'Application name' and 'null' in line:
            info['test'] = segments[0] + ' ' + segments[1] + ' ' + segments[2] + ' ' + segments[3] + ' ' + segments[4]
        elif 'Application name' in line:
            info['test'] = segments[0] + ' ' + segments[1] + ' ' + segments[2] + ' ' + segments[3] + ' ' + segments[4] + ' ' + segments[5] + segments[6] + ' ' + segments[7] + ' ' + segments[8] + ' ' + segments[9]
    r = json.dumps(info['test'])
    loaded_r = json.loads(r)
    print(str(r['Source']))
I have this dictionary:
print(info['test'])
Application name: example.service
Source: example_host_1|exampleHost1
Timestamp: 2019-01-22T00:00:43.901Z
Message:
Application name: example.api
Source: example_host_2|exampleHost2
Timestamp: 2019-01-23T07:42:12.649Z
Message: HTTP"GET" "/api/endpoint/groups" responded 500
I converted it to JSON without error:
r = json.dumps(info['test'])
loaded_r = json.loads(r)
and when I try to extract the application name from it:
loaded_r['Application name']
or the source:
loaded_r['Source']
I'm getting TypeError: string indices must be integers
As suggested by the duplicate link, I also tried print(loaded_r['Source'][0]) and print(str(r['Source'])), with the same result.
Message body example (I used segments to keep only the first few lines and remove duplicates):
Source: example_host_1
Timestamp: 2019-01-22T00:00:43.901Z
Message: null
For instructions please see: wiki_link
Application name: example.api
Source: example_host_2
Timestamp: 2019-01-23T07:42:12.649Z
Message: HTTP "GET" "/api/endpoint/groups" responded 500 in 7795.6441 ms
Application name: service.API
Source: example_host_2
Timestamp: 2019-01-23T07:42:12.646Z
Message: Unhandled exception
For instructions please see: example_wiki_link
Dictionary stored in the info variable:
{'test': '\r\nApplication name: app.service\r\nSource: example_host_1\r\nTimestamp: 2019-01-22T00:00:43.901Z\r\nMessage:'}
{'test': '\r\nApplication name: app.API\r\nSource: adc266f53205\r\nTimestamp: 2019-01-23T07:42:12.649Z\r\nMessage: HTTP"GET" "/api/endpoint/groups" responded 500'}
I think loaded_r is a string and not a dictionary.
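A short sketch of why that is, and of one way to get a real dictionary instead. The text value is modelled on the info['test'] string shown in the question (with "null" as the message); splitting each line on the first colon is an assumption about the format, not the asker's final solution.

```python
import json

# A string shaped like info['test'] in the question.
text = ("\r\nApplication name: app.service\r\nSource: example_host_1\r\n"
        "Timestamp: 2019-01-22T00:00:43.901Z\r\nMessage: null")

# Round-tripping a str through JSON gives back a str, so loaded['Source'] fails.
loaded = json.loads(json.dumps(text))
print(type(loaded).__name__)

# Build a dict by splitting each "Key: value" line on its first colon.
record = {}
for line in text.splitlines():
    if ':' in line:
        key, value = line.split(':', 1)
        record[key.strip()] = value.strip()

print(record['Application name'], record['Source'], record['Message'])
```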
I think I'm good now. I made some "mumbo-jumbo" but it works: I converted the dictionary to a string and then used regex. Thanks everyone.
import re
res = ','.join([','.join(i) for i in info.items()])
x = res.replace('test,', '')
regex1 = r'Application name:\s*(.+?)\s+Source'
regex2 = r'Source:\s*(.+?)\s+Timestamp:'
regex3 = r'(?<!^)Message:\s*.*'
a = re.findall(regex1, x)
b = re.findall(regex2, x)
c = re.findall(regex3, x)
print(a, b, c)
I don't normally use Python, but I have a script written in Python.
Part of the script:
elif line.find("CONECT") > -1:
    con = line.split()
    line_value = line_value + 1
    #print line_value
    #print con[2]
    try:
        line_j = "e" + ', ' + str(line_value) + ', ' + con[2] + "\n"
        output_file.write(line_j)
        print(line_j)
        line_i = "e" + ', ' + str(line_value) + ', ' + con[3] + "\n"
        output_file.write(line_i)
        print(line_i)
        line_k = "e"+ ', ' + str(line_value) + ', ' + con[4] + "\n"
        print(line_k)
        output_file.write(line_k)
    except IndexError:
        continue
which gives .txt output in the format
e, 1, 2
e, 1, 3
e, 1, 4
e, 2, 1
e, 2, 3
etc.
I need to remove lines that contain the same numbers, regardless of the order of those numbers,
e.g. the line e, 2, 1.
Is it possible?
Of course, it is better to modify your code so that those lines are dropped BEFORE you write them to the file. You can use a list to store the values already saved and, on each iteration, check whether the values you want to add already exist in that list. The code below is neither tested nor optimized, but it illustrates the idea:
# 'added = []' should be placed somewhere before 'if'
added = []
# your part of the code
elif line.find("CONECT") > -1:
    con = line.split()
    line_value = line_value + 1
    try:
        line_j = "e, %s, %s\n" % (str(line_value),con[2])
        tmp = sorted((str(line_value),con[2]))
        if tmp not in added:
            added.append(tmp)
            output_file.write(line_j)
            print(line_j)
        line_i = "e, %s, %s\n" % (str(line_value),con[3])
        tmp = sorted((str(line_value),con[3]))
        if tmp not in added:
            added.append(tmp)
            output_file.write(line_i)
            print(line_i)
        line_k = "e, %s, %s\n" % (str(line_value),con[4])
        tmp = sorted((str(line_value),con[4]))
        if tmp not in added:
            added.append(tmp)
            print(line_k)
            output_file.write(line_k)
    except IndexError:
        continue
Here is a comparison method for two lines of your file:
from collections import Counter
def compare(line1, line2):
    els1 = line1.strip().split(', ')
    els2 = line2.strip().split(', ')
    return Counter(els1) == Counter(els2)
See the documentation for the Counter class.
If the count of elements doesn't matter, you can use set instead of the Counter class.
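If the file has already been written, the same idea can be applied as a post-processing pass. The sketch below is one possible approach rather than the answerer's code: the helper name dedupe_lines is hypothetical, and it keys each line on its sorted elements so that order is ignored while repeated numbers are still counted.

```python
def dedupe_lines(lines):
    """Keep the first line for each combination of elements, ignoring order."""
    seen = set()
    kept = []
    for line in lines:
        # Sorting the elements makes "e, 1, 2" and "e, 2, 1" produce the same key.
        key = tuple(sorted(line.strip().split(', ')))
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# The sample output from the question.
lines = ["e, 1, 2", "e, 1, 3", "e, 1, 4", "e, 2, 1", "e, 2, 3"]
result = dedupe_lines(lines)
print(result)
```

A set lookup is used here instead of comparing every pair of lines, which keeps the pass linear in the number of lines.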
The following approach should work. First add the following line further up in your code:
seen = set()
Then replace everything inside the try with the following code:
for con_value in con[2:5]:
    # compare as strings so that (1, '2') and (2, '1') produce the same entry
    entry = frozenset((str(line_value), con_value))
    if entry not in seen:
        seen.add(entry)
        line_j = "e" + ', ' + str(line_value) + ', ' + con_value + "\n"
        output_file.write(line_j)
        print(line_j)
Make sure this code is indented to the same level as the code it replaces.