I have 100+ files with the extension *.log. Each file contains the results from an ansible-playbook run. I would like to parse the data using python, so that I can import it into an excel spreadsheet. I need some help to automate the process.
The file contents are:
ok: [wrt02.test1] => {
"msg": "nxos"
}
TASK [checklist : OUTPUT IOS_XR] *******************************************************************************************************************************************************************************************************
skipping: [leaf1J0101.test2]
skipping: [leaf1J0102.test2]
ok: [spine01.test1] => {
"msg": [
"Bundle-Ether1.100 192.168.245.65 Up Up default ",
"Bundle-Ether10.151 192.168.203.3 Up Up default ",
"Loopback0 192.168.255.7 Up Up default ",
"MgmtEth0/RSP0/CPU0/0 192.168.224.15 Up Up MANAGEMENT",
"TenGigE0/0/0/2 192.168.114.114 Up Up default ",
"TenGigE0/0/0/3 192.168.82.170 Up Up default"
]
}
RESULTS:
spine01.test1,Bundle-Ether1.100,192.168.245.65,
spine01.test1,Bundle-Ether10.151,192.168.203.3,
spine01.test1,Loopback0,192.168.255.7,
spine01.test1,MgmtEth0/RSP0/CPU0/0,192.168.224.15,
spine01.test1,TenGigE0/0/0/2,192.168.114.114,
spine01.test1,TenGigE0/0/0/3,192.168.82.170
CODE:
def findIOS(output):
    """Extract the per-host "msg" payloads from the OUTPUTIOS task section.

    Parameters:
        output: full text of an ansible-playbook run log.

    Returns:
        A list of dicts, one per ``ok: [host] => {...}`` line found between
        the "TASK [checklist : OUTPUTIOS]" banner and the next TASK banner.
        Returns an empty list when the section is absent.
    """
    import ast

    # Match the task header by name only: the run of trailing '*' padding
    # varies with terminal width, so an exact full-banner match is fragile.
    header = "TASK [checklist : OUTPUTIOS]"
    header_pos = output.find(header)
    if header_pos == -1:
        # Section not present in this log -- nothing to parse.
        return []
    # Skip past the rest of the banner line itself.
    start_index = output.find("\n", header_pos) + 1
    # The section ends at the next TASK banner (or at end-of-file).
    end_index = output.find("TASK", start_index)
    if end_index == -1:
        end_index = len(output)

    results = []
    for line in output[start_index:end_index].split('\n'):
        if not line:
            continue
        # Locate the host between the first '[' and last ']', and the
        # payload between the braces.
        hstart = line.find('[')
        start = line.find('{')
        end = line.rfind('}') + 1
        if hstart == -1 or start == -1:
            # e.g. "skipping: [host]" lines carry no {...} payload; the
            # original code crashed here trying to eval an empty string.
            continue
        hostname = line[hstart + 1:line.rfind(']')]
        # literal_eval parses only Python literals -- unlike eval() it
        # cannot execute arbitrary code embedded in the log file.
        results.append(ast.literal_eval(line[start:end]))
    return results
def main():
    """Read the raw ansible log; parsing is performed by findIOS elsewhere."""
    # Binary mode mirrors the original snippet; decode before text parsing.
    with open("../showint.log", "rb") as log_file:
        output = log_file.read()

if __name__ == '__main__':
    main()
How can I fetch the data into the above format? Thanks for the help.
I'll leave you the pleasure of creating the code around that.
import re
# Marks the start of a "msg" list in the ansible output, e.g. '    "msg": ['.
find_start = r"^\s+\"msg\":\s\["
# Captures (interface-name, IPv4 address) from a detail line such as
#   "Bundle-Ether1.100 192.168.245.65 Up Up default",
# available afterwards via match.groups().
find_line = r"^\s+\"([^\s]+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
Related
I have a script for a video game modification that has to be repeated 3544 times.
country_event = {
id = 5893X
title = "Aquire $PROVINCENAME$"
desc = "$COUNTRY$ is surrendering a province to us."
picture = "administration"
trigger = {
has_country_flag = won_acquire_prov
NOT = { has_country_flag = prov_treaty }
any_country = {
limit = {
truce_with = THIS
NOT = {
X = {
owned_by = THIS
}
}
}
}
}
option = {
name = "Take $PROVINCENAME$"
X = {
secede_province = THIS
}
}
option = {
name = "We don't want $PROVINCENAME$"
}
option = {
name = "End Province arbitration"
set_country_flag = prov_treaty
}
}
The goal is to replace "X" with a number ranging from 1 to 3544 for each number there will be another "country_event". But I am not sure how to do this.
I have tried using a python script:
# Expands the country_event template once per province id read from
# provids.txt, writing every generated event to 0_provs_event.txt.
# NOTE(review): `tag` in the template line below is never defined anywhere
# in this snippet -- the first province reaching that line raises NameError,
# which explains why the output file is created but stays empty.
infile = open("provids.txt", "r")
outfile = open("0_provs_event.txt","w")
for line in infile:
    # Lines shorter than 6 characters cannot carry an "xxxxx=" definition.
    if len(line) > 5:
        if line[5] == "=":
            # Province id: text between the '/' and the '.' (e.g. ".../123.txt" -> "123").
            prov = line.split("/")[1].split(".")[0]
            if prov != "REB":
                #^^^ Code does not change file if above is removed
                newline = "country_event = { id = 5893" + prov + "title = Aquire $PROVINCENAME$ desc = $COUNTRY$ is surrendering a province to us. picture = administration trigger = { has_country_flag = won_aquire_prov NOT = { has_country_flag = prov_treaty } any_country = { limit = { truce_with = THIS NOT = {" + tag + "= { owned_by = THIS } } } } } option = { name = Take $PROVINCENAME$ " + prov + " = { secede_province = THIS } } option = { name = We don't want $PROVINCENAME$ } option = { name = End Province arbitration set_country_flag = prov_treaty } } \n"
                outfile.write(newline)
infile.close()
outfile.close()
I haven't been able to get it to work. It just makes the file and does nothing else.
The main issue is how you're reading in infile
When you call open, Python will open the file as an '_io.TextIOWrapper', which is a class that contains metadata about the file, among other things. If you want the text of the file as a list of strings, where each string is a line of the file (which appears to be what you want), you then need to read the file. This can be done using infile.read() if you want the whole thing as a single string, or infile.readlines() if you want each line to be an element of a list.
My preferred way of reading in files in python is as follows:
# Skeleton pattern: read all lines, transform them, write the result back out.
# Setup file paths, and something to store your output which will be written to outfile_path
infile_path = "path_to_infile"    # replace with the real input path
outfile_path = "path_to_outfile"  # replace with the real output path
outfile = []                      # collected output lines (list of str)
with open(infile_path, 'r') as f: # Open infile_path into f as a readable object
    infile = f.readlines() # Read through f and place each line as an element in a list
# NOTE(review): as posted, the loop body below is only a comment, which is a
# SyntaxError -- real processing code must replace it before this runs.
for line in infile:
    # do stuff, append lines to outfile
outfile = "\n".join(outfile) # Turn the list outfile into a string with character returns between each line
with open(outfile_path, 'w') as f: # Open outfile_path into f as a writable object
    f.write(outfile) # Write out the string outfile to f
The advantage of the with open version is you don't need to remember to close the file afterwards, that automatically happens after the with block (after the unindent in the code)
I have a table that contains a few categories, two of which are MAC address and device name. I had the list of my MAC addresses hardcoded in my code with their corresponding device names (i.e. deviceDict['00:00:00:00:00:00'] = name).
Now, I passed those mac addresses and device names to a text file to be read from that same Python code and parse it onto my table. The code currently recognizes the text file but it is not parsing that information onto the table.
Here is the code:
# File: WapLogParser.py
# Desc: Parses a WAP log file and pulls out information relating to connected clients
# Usage: python WapLogParser.py [file glob]
import re
import sys
import glob
import os
# Maps a MAC address (or its 8-character manufacturer prefix, per
# GetDeviceName) to a device name; populated by InitDeviceDict().
deviceDict = dict()
# Base table for storing client info
# All names must match what is in the Wap Log file
# Exceptions: Date, Wap Name, Device Name - which are provided outside of the result parsing
table = [["Ssid", "Vlan", "Mac Address", "Connected Time", "Ip Address", "Rssi", "Date", "Wap Name", "Device Name"]]
def ParseResult(result, date, wapName):
    """Parse one command's output into a row and append it to the global table.

    result  -- the raw text captured between two issued commands
    date    -- value of the "Current date:" line from the log
    wapName -- name parsed from the log's first line by ParseFile
    """
    lines = result.split('\n')
    lines = list(filter(None, lines))
    # Any useful info will be at least 2 lines long
    if len(lines) == 1:
        return
    # create empty row
    data = [""] * len(table[0])
    # for each item in the result place it in the correct spot in the row
    for line in lines:
        if line != "":
            # Parse the key/value pair, e.g. "Ssid: .... guest-net"
            m = re.match(r"(.*):\s\.*\s?(.*)", line)
            if m is not None:
                for idx in range(len(table[0])):
                    if table[0][idx].lower() == m[1].lower():
                        data[idx] = m[2]
            # NOTE(review): indentation was lost in the paste -- this
            # else/break is reconstructed as "stop at the first line that is
            # not a key:value pair"; confirm against the original file.
            else:
                break
    # Remove the '(dBm)' from the RSSI value
    # NOTE(review): raises IndexError if the Rssi field was absent/empty.
    data[5] = data[5].split()[0]
    # Append WAP specific items to row
    data[6] = date
    data[7] = wapName
    data[8] = GetDeviceName(data[2].upper())
    # Add row to table
    table.append(data)
def ParseFile(path):
    """Split one WAP log file into per-command result blocks and hand each
    "show client ... stats" result to ParseResult with the date and WAP name."""
    with open(path) as f:
        lines = f.readlines()
    result = ""
    command = ""
    date = ""
    # WAP name is always on the first line 16 characters in with 4
    # unnecessary characters trailing
    wapName = lines[0].strip()[16:-4]
    for line in lines:
        line = line.strip()
        # Is an issued command?
        if line.startswith("/#"):
            if command != "":
                # The previous tracked command just ended; everything
                # accumulated since then is its output.
                ParseResult(result, date, wapName)
                command = ""
            # reset the result for the new command
            result = ""
            m = re.match(r"^/#.*show\sclient.*stats$", line)
            if m is not None:
                command = line
        # Anything that is not a command add to the result
        else:
            result += line + "\n"
            # Do we have the date?
            if line.startswith("Current date:"):
                date = line.replace("Current date: ", "")
    # NOTE(review): output of the final command is never flushed -- if the
    # file ends without another "/#" line, the last result is dropped.
# Print output to stderr
def eprint(*args, **kwargs):
    """Like print(), but writes to stderr so progress/error messages do not
    mix into the CSV that PrintAsCsv emits on stdout."""
    print(*args, file=sys.stderr, **kwargs)
# Print a 2d array in a csv format
def PrintAsCsv(table):
    """Emit each row of the 2-D string array as one comma-joined stdout line."""
    for record in table:
        joined = ",".join(record)
        print(joined)
def Main():
    """Entry point: load the MAC->name map, parse every file matched by the
    glob given as the last command-line argument, then dump the CSV table."""
    InitDeviceDict()
    # The file glob is always the last command-line argument.
    for filename in glob.iglob(sys.argv[-1], recursive=True):
        # Globs get directories too
        if os.path.isfile(filename):
            eprint("Parsing " + filename)
            try:
                ParseFile(filename)
            except Exception as e:  # Mainly for if we see a binary file
                # BUG FIX: '"Bad file: " + e' raised TypeError (cannot
                # concatenate str and Exception), masking the real error.
                eprint("Bad file: " + str(e))
    # Print in a format we can use
    PrintAsCsv(table)
def GetDeviceName(macAddress):
    """Resolve a device name: exact MAC first, then its 8-char manufacturer
    prefix, falling back to 'Unknown Device'."""
    for key in (macAddress, macAddress[:8]):
        if key in deviceDict:
            return deviceDict[key]
    return 'Unknown Device'
def InitDeviceDict():
    """Populate the module-level deviceDict from try.txt.

    try.txt format: one "mac_address,device_name" pair per line.
    """
    # BUG FIX: the original rebound a *local* deviceDict (and reset it to {}
    # on every line), so the global table stayed empty and every lookup in
    # GetDeviceName returned 'Unknown Device'.  Declare the global instead
    # and never re-initialise it inside the loop.
    global deviceDict
    with open('try.txt', 'r') as fo:
        for line in fo:
            parts = line.split(',')
            macAddress = parts[0].strip()
            name = parts[1].strip()
            # Store the name as a plain string so GetDeviceName returns text;
            # on duplicate MACs the last entry wins.
            deviceDict[macAddress] = name
    # Debug aid: show the loaded map once, not once per line.
    print(deviceDict)
# Entry point -- runs only when executed as a script, not when imported.
# script arguments:
#   WapLogParser.py [file glob]
if __name__ == "__main__":
    Main()
The issue is on the functions GetDeviceName and InitDeviceDict. When I run the code and then a batch file to display my info on excel, I keep getting "unknown device" (as if it is not recognizing the mac address I entered to produce the device name)
Any way I can correct this? Thank you
The deviceDict that is populated in InitDeviceDict is not the global deviceDict. You are only modifying a function-local dictionary (and resetting it every line as well). Remove deviceDict = {} from that function and, at the top of the function use global deviceDict to declare that you are modifying the global.
def InitDeviceDict():
    """Load MAC->name entries from try.txt into the module-level deviceDict."""
    # Must declare the global, otherwise assignment creates a local dict.
    global deviceDict
    with open('try.txt','r') as fo:
        for raw in fo:
            fields = raw.split(',')
            mac = fields[0].strip()
            name = fields[1].strip()
            if mac in deviceDict:
                # Repeated MAC: collect every name seen for it.
                deviceDict[mac].append(name)
            else:
                deviceDict[mac] = [name]
I am new to python and stuck with a log file in text format, where it has following repetitive structure and I am required to extract the data from rows and change it into column depending upon the data. e.g.
First 50 line are trash like below(in first six lines):
-------------------------------------------------------------
Logging to file xyz.
Char
1,
3
r
=
----------------------------------------------
Pid 0
Name SAB=1, XYZ=3
----------------------------------------------
a 1
b 2
c 3
----------------------------------------------
Pid 0
Name SAB=1, XYZ=3, P_NO=546467
----------------------------------------------
Test_data_1 00001
Test_data_2 FOXABC
Test_data_3 SHEEP123
Country US
----------------------------------------------
Pid 0
Name SAB=1
----------------------------------------------
Sno 893489423
Log FileFormat
------------Continues for another million lines.
Now the required output is like below:
Required output format
PID, Name, a,b,c
0, "SAB=1, XYZ=3", 1,2,3
PID, Name , Test_data_1, Test_data_2, Test_data_3, Country
0, "SAB=1, XYZ=3, P_NO=546467", 00001, FOXABC, SHEEP123, US
Pid, Name, Sno
0, SAB=1, 893489423
I tried to write a code but failed to get the desired results: My attempt was as below:
'''
# NOTE(review): `file_name` is never defined in the snippet as posted, and
# the file handle is never closed.
fn=open(file_name,'r')
for i,line in enumerate(fn ):
    # Skip the first 50 junk lines, then look for rows carrying a Name field.
    if i >= 50 and "Name " in line: # for first 50 line deletion/or starting point
        last_tag=line.split(",")[-1]          # text after the final comma
        last_element=last_tag.split("=")[0]   # key name before the '='
        print(last_element)
'''
Any help would be really appreciated.
Newly Discovered Structure
RBY Structure
The solution I came up with is a bit messy but it works, check it out below:
# Python 2 script (uses the py2 StringIO module).
# Usage: python script.py <logfile> -- writes <logfile>_formatted.csv.
import sys
import re
import StringIO

ifile = open(sys.argv[1],'r') #Input log file as command-line argument
ofile = open(sys.argv[1][:-4]+"_formatted.csv",'w') #output formatted log txt
stringOut = ""
i = 0           # line counter, used only to skip the leading garbage lines
flagReturn = True  # True while OUTSIDE a vars section (between hyphen rows)
j = 0           # hyphen-row counter; 2 means one full record was collected
reVal = re.compile("Pid[\s]+(.*)\nName[\s]+(.*)\n[-]+\<br\>(.*)\<br\>") #Regex pattern for separating the Pid & Name from the variables
reVar = re.compile("(.*)[ ]+(.*)") #Regex pattern for getting vars and their values
reVarStr = re.compile(">>> [0-9]+.(.*)=(.*)") #Regex Pattern for Struct
# NOTE(review): "(.*)+" is a pathological regex (nested quantifier) -- it can
# backtrack catastrophically on near-miss lines.
reVarStrMatch = re.compile("Struct(.*)+has(.*)+members:") #Regex pattern for Struct check
for lines in ifile.readlines():
    if(i>8): #Omitting the first 9 lines of Garbage values
        if(lines.strip()=="----------------------------------------------"): #Checking for separation between PID & Name group and the Var group
            j+=1 #variable keeping track of whether we are inside the vars section or not (between two rows of hyphens)
            flagReturn = not flagReturn #To print the variables in single line to easily separate them with regex pattern reVal
        if(not flagReturn):
            stringTmp = lines.strip()+"<br>" #adding break to the end of each vars line in order for easier separation
        else:
            stringTmp = lines #if not vars then save each line as is
        stringOut += stringTmp #concatenating each lines to form the searchable string
    i+=1 #incrementing for omitting lines (useless after i=8)
    if(j==2): #Once a complete set of PIDs, Names and Vars have been collected
        j=0 #Reset j
        matchObj = reVal.match(stringOut) #Match for PID, Name & Vars
        line1 = "Pid,Name,"   # CSV header row for this record
        line2 = matchObj.group(1).strip()+",\""+matchObj.group(2)+"\","  # CSV value row
        buf = StringIO.StringIO(matchObj.group(3).replace("<br>","\n"))
        structFlag = False    # True while consuming a "Struct ... members:" section
        for line in buf.readlines(): #Separate each vars and add to the respective strings for writing to file
            if(not (reVarStrMatch.match(line) is None)):
                structFlag = True
            elif(structFlag and (not (reVarStr.match(line) is None))):
                matchObjVars = reVarStr.match(line)
                line1 += matchObjVars.group(1).strip()+","
                line2 += matchObjVars.group(2).strip()+","
            else:
                structFlag = False
                matchObjVars = reVar.match(line)
                # A non-matching line leaves matchObjVars as None; the bare
                # except below falls back to dumping the raw line.
                try:
                    line1 += matchObjVars.group(1).strip()+","
                    line2 += matchObjVars.group(2).strip()+","
                except:
                    line1 += line.strip()+","
                    line2 += " ,"
        ofile.writelines(line1[:-1]+"\n")
        ofile.writelines(line2[:-1]+"\n")
        ofile.writelines("\n")
        stringOut = "" #Reseting the string
ofile.close()
ifile.close()
EDIT
This is what I came up with to include the new pattern as well.
I suggest you do the following:
Run the parser script on a copy of the log file and see where it fails next.
Identify and write down the new pattern that broke the parser.
Delete all data in the newly identified pattern.
Repeat from Step 1 till all patterns have been identified.
Create individual regular expressions pattern for each type of pattern and call them in separate functions to write to the string.
EDIT 2
# Variable-section parser loop (EDIT 2): like the original loop, but repeated
# "RBY" struct members are joined with "**" into a single CSV cell.
# Relies on buf, line1, line2, reVar, reVarStr and reVarStrMatch defined by
# the surrounding script.
structFlag = False
# BUG FIX: originally spelled "RBYflag" here but read as "RBYFlag" below,
# which raised NameError on the first RBY member.
RBYFlag = False
for line in buf.readlines(): #Separate each vars and add to the respective strings for writing to file
    if(not (reVarStrMatch.match(line) is None)):
        # Entering a "Struct ... has ... members:" section.
        structFlag = True
    elif(structFlag and (not (reVarStr.match(line) is None))):
        matchObjVars = reVarStr.match(line)
        if(matchObjVars.group(1).strip()=="RBY" and not RBYFlag):
            # First RBY member: emit the header once, start the joined cell.
            line1 += matchObjVars.group(1).strip()+","
            line2 += matchObjVars.group(2).strip()+"**"
            RBYFlag = True
        elif(matchObjVars.group(1).strip()=="RBY"):
            # Subsequent RBY members: append value only.
            line2 += matchObjVars.group(2).strip()+"**"
        else:
            if(RBYFlag):
                line2 = line2[:-2]  # drop the trailing "**" separator
                RBYFlag = False
            line1 += matchObjVars.group(1).strip()+","
            line2 += matchObjVars.group(2).strip()+","
    else:
        structFlag = False
        if(RBYFlag):
            line2 = line2[:-2]  # drop the trailing "**" separator
            RBYFlag = False
        matchObjVars = reVar.match(line)
        # BUG FIX: narrowed the bare except (which hid real errors) to the
        # AttributeError raised when matchObjVars is None, and removed the
        # stray trailing backtick that made the original a SyntaxError.
        try:
            line1 += matchObjVars.group(1).strip()+","
            line2 += matchObjVars.group(2).strip()+","
        except AttributeError:
            line1 += line.strip()+","
            line2 += " ,"
NOTE
This loop has become very bloated and it is better to create a separate function to identify the type of data and return some value accordingly.
I have a file config and the contents are separated by space " "
cat config
/home/user1 *.log,*.txt 30
/home/user2 *.trm,*.doc,*.jpeg 10
I want to read this file,parse each line and print each field from the each line.
Ex:-
Dir = /home/user1
Fileext = *.log,*.txt
days=30
I couldn't go further than the below..
# Python 2 snippet (print statement).  NOTE(review): `dir` shadows the
# built-in dir() and `file` shadows the Python 2 built-in file type.
def dir():
    file = open('config','r+')  # NOTE(review): handle is never closed
    cont = file.readlines()
    print "file contents are %s" % cont
    for i in range(len(cont)):
        # Fields per line: [directory, comma-separated extensions, days]
        j = cont[i].split(' ')
dir()
Any pointers how to move further?
Your code is fine, you are just missing the last step processing each element of the splitted string, try this:
def dir():
file = open('config','r+')
cont = file.readlines()
print "file contents are %s" % cont + '\n'
elements = []
for i in range(len(cont)):
rowElems = cont[i].split(' ')
elements.append({ 'dir' : rowElems[0], 'ext' : rowElems[1], 'days' : rowElems[2] })
for e in elements:
print "Dir = " + e['dir']
print "Fileext = " + e['ext']
print "days = " + e['days']
dir()
At the end of this code, you will have all the rows processed and stored in an array of dictionaries you can easily access later.
You can write a custom function to parse each line, and then use the map function to apply that function against each line in file.readlines():
def parseLine(line):
    """Render one whitespace-separated config line as Dir/Fileext/Days text."""
    directory, file_ext, days = line.split(' ')[:3]
    template = 'Dir = {}\nFileext = {}\nDays = {}'
    return template.format(directory, file_ext, days)
def dir():
with open('config','r+') as file:
print 'file contents are\n' + '\n'.join(map(parseLine, file.readlines()))
Results:
>>> dir()
file contents are
Dir = /home/user1
Fileext = *.log,*.txt
Days = 30
Dir = /home/user2
Fileext = *.trm,*.doc,*.jpeg
Days = 10
! # $ % & ( ) * , - 0 / : < = > ? # [ \ ] ^
This is the header of my CSV file. After the `:` you can see one blank space — my CSV file header also contains one column whose header is blank. How can I remove it by updating the following code?
# Python 2 script: builds a sorted unigram feature header row and writes it
# as the first line of the feature-vector CSV.
# NOTE(review): the trailing FVT_unigram() call suggests this body originally
# lived inside "def FVT_unigram():"; as posted, the names below are globals.
# Depends on project modules: dataPreprocessing, Node (a BST), csv/os/re.
feature_list = ""
root_flag = 'false'   # string flag: 'false' until the BST root is created
fvt_length = 0        # unused in this snippet
output_file="/home/user/Project/Dataset/unigram_FVT.csv"
feature_vector_file1 = "/home/user/Project/Dataset/BST_unigram.txt"
d = os.path.dirname(output_file)
if not os.path.exists(d):
    os.makedirs(d)
with open(output_file, "w" ) as fout:
    fp_feature = csv.writer(fout)
    fileread=open(feature_vector_file1,"r")  # NOTE(review): never closed
    read_list=fileread.read()
    read_list=dataPreprocessing.remove_words_less_than_3(read_list)
    read_list = read_list.replace('\n','')
    read_list = re.sub( '\s+', ' ',read_list).strip()
    read_list = dataPreprocessing.remove_digits(read_list)
    # set() de-duplicates the unigrams, but it may still contain '' (e.g.
    # when read_list is empty) -- presumably that empty string is the blank
    # header column being asked about; filtering falsy entries out of the
    # list before sorting would remove it.  TODO confirm against the data.
    unigram_list=list(set(read_list.split(" ")))
    for i in range(0,len(unigram_list)):
        # lstrip().rstrip() trims whitespace but does NOT drop entries that
        # become empty strings.
        unigram_list[i]=unigram_list[i].lstrip().rstrip()
        if root_flag == 'false' :
            root = Node(unigram_list[i])   # first token becomes the BST root
            root_flag = 'true'
        else :
            root.insert(unigram_list[i])
        feature_list = feature_list + "\n"+unigram_list[i]
    feature_list1 = feature_list.strip()
    line = feature_list1.split('\n')
    line.sort()
    line.append("Gender")   # class-label column goes last
    root.print_tree()
    print len(line)         # Python 2 print statement
    fp_feature.writerow(line)
FVT_unigram()
Can anybody can help me? Sometimes my file content contains some spaces but I have added this unigram_list[i]=unigram_list[i].lstrip().rstrip() but still my header contains spaces.
I ran into a similar problem with my program the other day, and I realized the easiest thing to do is to write a simple if statement and just create a new string/list:
# Build a copy of aStr with every space character removed.
aStr = "Hello World this is a test!"
# str.replace does this in one idiomatic, linear-time call -- the original
# character-by-character "+=" loop is quadratic and far noisier.
newStr = aStr.replace(" ", "")
And when I print the newStr:
HelloWorldthisisatest!