we are getting output in this format
2020-11-19 12:00:01,414 - INFO - clusterDC -Backup started
2020-11-19 12:00:01,415 - Debug - executing command: /opt/couchbase/bin/cbbackupmgr backup --archive /backup/clusterDC --repo clusterDC_date --cluster nodedc --username user --threads 16
2020-11-19 12:00:01,414 - INFO - clusterDC - Backup Succeeded. Backup successfully completed.
But now we want it in the JSON format below:
"backup":[
{
"server_name":"nodedc",
"status":"Success/Failed",
"backup_start_time":"yyyy-mm-dd hh:mm:ss.mmmuu",
"cluster_name":"clusterDC",
"location":"/backup/clusterDc/clusterDC_date",
"backup_end_time":"yyyy-mm-dd hh:mm:ss:mmmuu"
}
There are multiple ways you could do this. One would be to parse your output into a list, then build a dict and serialize it with the json module. For parsing, just use string manipulation (maybe regex if needed) to populate your list.
If the output follows the same format each time:
# output = raw output (the log text shown above, as one string)
text = [line.split() for line in output.splitlines()]

backup = {}
backup["server_name"] = text[1][14]
backup["status"] = text[2][8]
backup["backup_start_time"] = f"{text[0][0]} {text[0][1].replace(',', ':')}"
backup["cluster_name"] = text[0][5]
backup["location"] = f"{text[1][10]}/{text[1][12]}"
backup["backup_end_time"] = f"{text[2][0]} {text[2][1].replace(',', ':')}"

backup = {'backup': [backup]}
print(backup)
Output:
{'backup': [{'server_name': 'nodedc', 'status': 'Succeeded.', 'backup_start_time': '2020-11-19 12:00:01:414', 'cluster_name': 'clusterDC', 'location': '/backup/clusterDC/clusterDC_date', 'backup_end_time': '2020-11-19 12:00:01:414'}]}
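To emit real JSON rather than a Python dict repr, you can serialize the result with the standard json module:

import json

print(json.dumps(backup, indent=4))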
I want to add data inside the 'tags' key in this YAML file:
# Generated by Chef, local modifications will be overwritten
---
env: nonprod
api_key: 5d9k8h43124g40j9ocmnb619h762d458
hostname: ''
bind_host: localhost
additional_endpoints: {}
tags:
- application_name:testin123
- cloud_supportteam:eagles
- technical_applicationid:0000
- application:default
- lifecycle:default
- function:default-api-key
dogstatsd_non_local_traffic: false
histogram_aggregates:
- max
- median
- avg
- count
which should be like this,
tags:
- application_name:testing123
- cloud_supportteam:eagles
- technical_applicationid:0000
- application:default
- lifecycle:default
- function:default-api-key
- managed_by:Teams
So far I have created this script, but it just appends the data at the end of the file, which is not the solution:
import yaml

data = {
    'tags': {
        '- managed_by': 'Teams'
    }
}

with open('test.yml', 'a') as outfile:
    yaml.dump(data, outfile, indent=2)
I figured it out like this, and it is working:
import yaml
from yaml.loader import SafeLoader

with open('test.yaml', 'r') as f:
    data = yaml.load(f, Loader=SafeLoader)

data['tags'].append('managed_by:teams')
print(data['tags'])

with open('test.yaml', 'w') as f:
    yaml.dump(data, f, sort_keys=False, default_flow_style=False)
and the output was like this,
['application_name:testin123', 'cloud_supportteam:eagles', 'technical_applicationid:0000', 'application:default', 'lifecycle:default', 'function:default-api-key', 'managed_by:teams']
and the test.yaml file was updated,
tags:
- application_name:testing123
- cloud_supportteam:eagles
- technical_applicationid:0000
- application:default
- lifecycle:default
- function:default-api-key
- managed_by:teams
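One caveat with this approach: PyYAML's load/dump round trip drops comments, so the "# Generated by Chef" header at the top of the file is lost on rewrite. If that matters, ruamel.yaml can do a comment-preserving round trip; a minimal sketch, assuming the same test.yaml:

from ruamel.yaml import YAML

yaml = YAML()  # round-trip mode: keeps comments and key order
with open('test.yaml') as f:
    data = yaml.load(f)

data['tags'].append('managed_by:teams')

with open('test.yaml', 'w') as f:
    yaml.dump(data, f)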
Example Format of my Log file
-------------------------------------------------------------Start of Log file Format----------------------------------------
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1047 mili sec
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1099 mili sec
2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webservices.helper.THelper Method:lambda$0] in 20 - Parsing Response from server Failed
org.json.JSONException: No value for dataModel
at org.json.JSONObject.get(JSONObject.java:355) ~[android-json-0.0.20131108.vaadin1.jar:0.0.20131108.vaadin1]
-----------------------------------------------------------------------End of Log file format ------------------------------
I am currently handling the lines that start with a date (the first 3 lines of the log above), though with bad coding practice.
The problem starts from the 4th line onward, where exceptions appear in my log file with no date. The previous line has the date, and the exception lines are its continuation.
I don't know how to handle those lines, since the format changes. I want each exception line to get the date of the previous line.
Either I keep the previous line's date in a temporary variable and use it when the format changes, or there may be some other way.
The date is mandatory, since I push into ELK and it takes the timestamp for the log.
Also, if there is a suggestion for cleaner code, please make it. I need the Lambda to run faster.
Currently this is my logic in Python (I am a complete beginner in Python):
import boto3
import botocore
import re
import json
import requests
from datetime import datetime                 # used by strptime below
from botocore.exceptions import ClientError   # used by the except clause below
...
...

def handler(event, context):
    print("Starting handler function")
    for record in event['Records']:
        # Get the bucket name and key for the new file
        bucket = record['s3']['bucket']['name']
        print("Bucket name - " + bucket)
        print(date_time)
        key = record['s3']['object']['key']
        print("key name - " + key)
        # Get, read, and split the file into lines
        obj = s3.get_object(Bucket=bucket, Key=key)
        body = obj['Body'].read()
        lines = body.splitlines()
        # Match the regular expressions to each line and index the JSON
        for line in lines:
            document = {}
            try:
                if line[0] == '2':
                    listOfData = line.split(" - ")
                    date = datetime.strptime(listOfData[0], "%Y-%m-%d %H:%M:%S.%f")
                    timestamp = date.isoformat()
                    level = listOfData[1]
                    classname = listOfData[2]
                    message = listOfData[-1]
                    document = {"timestamps": timestamp, "level": level, "classnames": classname, "messages": message}
                    print(document)
                else:
                    document = {"messages": line}
            except ClientError as e:
                raise e
            r = requests.post(url, auth=awsauth, json=document, headers=headers)
            print(r)
Additional info:
As suggested in the answer by Ben, when I print this_line I get all lines in the log file printed properly as separate entries, but there is a problem with the exception lines in between. Below is what gets printed.
{'line content': ['2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webevics.helpr.Helpr Method:lamda] in 2 - Parsng Respns from servr Failed', 'org.jsn.JSONExcepion: No vlue for dataModel'], 'date string': '2020-08-14 05:35:47.655'}
{'line content': ['2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webserics.helpr.Helpr Method:lambda] in 2 - Parsing Respnse from servr Faied', 'org.jsn.JSONException: No vlue for dataModel', 'at org.json.JSONObject.get(JSONObject.java:355) ~[android-json-0.0.20131108.vaadin1.jar:0.0.20131108.vaadin1]'], 'date string': '2020-08-14 05:35:47.655'}
Here I am getting 2 entries printed, where the first is useless and the second is better. Can we make it so the 1st entry does not appear and only the 2nd is present in this_line?
I started by placing your sample data in a file,
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1047 mili sec
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1099 mili sec
2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webservices.helper.THelper Method:lambda$0] in 20 - Parsing Response from server Failed
org.json.JSONException: No value for dataModel
at org.json.JSONObject.get(JSONObject.java:355) ~[android-json-0.0.20131108.vaadin1.jar:0.0.20131108.vaadin1]
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1099 mili sec
I've added a fourth entry just to verify my approach works. The Python below detects the start of an entry using a regular expression tailored to the date format:
import re

with open('example.txt', 'r') as file_handle:
    file_content = file_handle.read().split("\n")

list_of_lines = []
this_line = {}
for index, line in enumerate(file_content):
    if len(line.strip()) > 0:
        reslt = re.match(r'\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{3}', line)
        if reslt:  # line starts with a date, so a new entry begins
            if len(this_line.keys()) > 0:
                list_of_lines.append(this_line)
            date_string = reslt.group(0)
            this_line = {'date string': date_string, 'line content': []}
            this_line['line content'].append(line)
        else:  # continuation (e.g. exception) line: attach to the current entry
            this_line['line content'].append(line)
if len(this_line.keys()) > 0:  # don't drop the final entry
    list_of_lines.append(this_line)
The data structure produced, list_of_lines, is a list of dictionaries. Each dictionary is a log entry which may span one or more lines. The two keys in each dictionary are date string and line content.
To review the output data structure, try running
for line in list_of_lines[2]['line content']:
    print(line)
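An entry is only complete once the loop has moved past it, which is why printing this_line inside the loop shows the half-built duplicates mentioned in the question; iterate over list_of_lines after the loop instead. From there, each entry can be flattened into one ELK document. A minimal sketch, reusing the question's field names:

documents = []
for entry in list_of_lines:
    parts = entry['line content'][0].split(" - ")
    documents.append({
        "timestamps": entry['date string'].replace(' ', 'T'),  # ISO-like timestamp
        "level": parts[1] if len(parts) > 1 else "",
        "messages": "\n".join(entry['line content']),  # exception lines ride along
    })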
ConfigParser also reads comments. Why? Shouldn't ignoring inline comments be the default behavior?
I reproduce my problem with the following script:
import configparser

config = configparser.ConfigParser()
config.read("C:\\_SVN\\BMO\\Source\\Server\\PythonExecutor\\Resources\\visionapplication.ini")
for section in config.sections():
    for item in config.items(section):
        print("{}={}".format(section, item))
The ini file looks as follows:
[LPI]
reference_size_mm_width = 30 ;mm
reference_size_mm_height = 40 ;mm
print_pixel_pitch_mm = 0.03525 ; mm
eye_cascade = "TBD\haarcascade_eye.xml" #
The output:
C:\_Temp>python read.py
LPI=('reference_size_mm_width', '30 ;mm')
LPI=('reference_size_mm_height', '40 ;mm')
LPI=('print_pixel_pitch_mm', '0.03525 ; mm')
LPI=('eye_cascade', '"TBD\\haarcascade_eye.xml" #')
I don't want to read 30 ;mm but I want to read just the number '30'.
What am I doing wrong?
PS: Python3.7
Hi, use inline_comment_prefixes when creating the ConfigParser object. Check the example below:
config = configparser.ConfigParser(inline_comment_prefixes = (";",))
Here is detailed documentation.
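inline_comment_prefixes takes a tuple, so you can list several prefixes; adding "#" as well would cover the trailing # on the eye_cascade line. A minimal sketch against the ini file shown above:

import configparser

config = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
config.read("visionapplication.ini")
print(config["LPI"]["reference_size_mm_width"])  # prints: 30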
I basically want to convert the tab-delimited text file http://www.linux-usb.org/usb.ids into a CSV file.
I tried importing it using Excel, but the result is not optimal; it turns out like:
8087 Intel Corp.
0020 Integrated Rate Matching Hub
0024 Integrated Rate Matching Hub
What I want, for easy searching, is:
8087 Intel Corp. 0020 Integrated Rate Matching Hub
8087 Intel Corp. 0024 Integrated Rate Matching Hub
Is there any way I can do this in Python?
$ListDirectory = "C:\USB_List.csv"
Invoke-WebRequest 'http://www.linux-usb.org/usb.ids' -OutFile $ListDirectory
$pageContents = Get-Content $ListDirectory | Select-Object -Skip 22
"vendor`tvendor_name`tproduct`tproduct_name`r" > $ListDirectory

#Variables and Flags
$currentVid
$currentVName
$currentPid
$currentPName
$vendorDone = $TRUE
$interfaceFlag = $FALSE
$nextline
$tab = "`t"

foreach ($line in $pageContents) {
    if ($line.StartsWith("`#")) {
        continue
    }
    elseif ($line.length -eq 0) {
        exit
    }
    if (!($line.StartsWith($tab)) -and ($vendorDone -eq $TRUE)) {
        $vendorDone = $FALSE
    }
    if (!($line.StartsWith($tab)) -and ($vendorDone -eq $FALSE)) {
        $pos = $line.IndexOf(" ")
        $currentVid = $line.Substring(0, $pos)
        $currentVName = $line.Substring($pos + 2)
        "$currentVid`t$currentVName`t`t`r" >> $ListDirectory
        $vendorDone = $TRUE
    }
    elseif ($line.StartsWith($tab)) {
        if ($interfaceFlag -eq $TRUE) {
            $interfaceFlag = $FALSE
        }
        $nextline = $line.TrimStart()
        if ($nextline.StartsWith($tab)) {
            $interfaceFlag = $TRUE
        }
        if ($interfaceFlag -eq $FALSE) {
            $pos = $nextline.IndexOf(" ")
            $currentPid = $nextline.Substring(0, $pos)
            $currentPName = $nextline.Substring($pos + 2)
            "$currentVid`t$currentVName`t$currentPid`t$currentPName`r" >> $ListDirectory
            Write-Host "$currentVid`t$currentVName`t$currentPid`t$currentPName`r"
            $interfaceFlag = $FALSE
        }
    }
}
I know the ask is for Python, but I built this PowerShell script to do the job. It takes no parameters. Just run it as admin from the directory where you want to store the script. The script collects everything from the http://www.linux-usb.org/usb.ids page, parses the data, and writes it to a tab-delimited file. You can then open the file in Excel as a tab-delimited file. Ensure the columns are read as "text" and not "general" and you're good to go. :)
Parsing this page is tricky because the script has to be contextually aware of every VID-Vendor line preceding a series of PID-Product lines. I also forced the script to ignore the commented description section, the interface-interface_name lines, the random comments inserted throughout the USB list (sigh), and everything after and including "#List of known device classes, subclasses and protocols", which is out of scope for this request.
I hope this helps!
You just need to write a little program that scans the data a line at a time. It should check whether the first character is a tab ('\t'). If not, that value should be stored. If it does start with a tab, print out the previously stored value followed by the current line. The result will be the list in the format you want.
Something like this would work:
import csv

lines = []
with open("usb.ids.txt") as f:
    reader = csv.reader(f, delimiter="\t")
    device = ""
    for line in reader:
        # Ignore empty lines and comments
        if len(line) == 0 or (len(line[0]) > 0 and line[0][0] == "#"):
            continue
        if line[0] != "":
            device = line[0]
        elif line[1] != "":
            lines.append((device, line[1]))
print(lines)
You basically need to loop through each line, and if it's a device line, remember it for the following lines. This will only work for two columns; you would then need to write them all to a CSV file, but that's easy enough.
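For that final write, csv.writer handles the quoting; a minimal sketch (usb.csv is just an example filename), assuming the lines list built above:

import csv

# lines holds (vendor, product) tuples from the snippet above
with open("usb.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["vendor", "product"])
    writer.writerows(lines)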
I'd like to parse the status.dat file for nagios3 and output it as XML with a Python script.
The XML part is the easy one, but how do I go about parsing the file? Use a multi-line regex?
It's possible the file will be large, as many hosts and services are monitored. Would loading the whole file into memory be wise?
I only need to extract the services that are in a critical state, and the host they belong to.
Any help and pointing in the right direction will be highly appreciated.
Later edit: here's how the file looks:
########################################
# NAGIOS STATUS FILE
#
# THIS FILE IS AUTOMATICALLY GENERATED
# BY NAGIOS. DO NOT MODIFY THIS FILE!
########################################
info {
created=1233491098
version=2.11
}
program {
modified_host_attributes=0
modified_service_attributes=0
nagios_pid=15015
daemon_mode=1
program_start=1233490393
last_command_check=0
last_log_rotation=0
enable_notifications=1
active_service_checks_enabled=1
passive_service_checks_enabled=1
active_host_checks_enabled=1
passive_host_checks_enabled=1
enable_event_handlers=1
obsess_over_services=0
obsess_over_hosts=0
check_service_freshness=1
check_host_freshness=0
enable_flap_detection=0
enable_failure_prediction=1
process_performance_data=0
global_host_event_handler=
global_service_event_handler=
total_external_command_buffer_slots=4096
used_external_command_buffer_slots=0
high_external_command_buffer_slots=0
total_check_result_buffer_slots=4096
used_check_result_buffer_slots=0
high_check_result_buffer_slots=2
}
host {
host_name=localhost
modified_attributes=0
check_command=check-host-alive
event_handler=
has_been_checked=1
should_be_scheduled=0
check_execution_time=0.019
check_latency=0.000
check_type=0
current_state=0
last_hard_state=0
plugin_output=PING OK - Packet loss = 0%, RTA = 3.57 ms
performance_data=
last_check=1233490883
next_check=0
current_attempt=1
max_attempts=10
state_type=1
last_state_change=1233489475
last_hard_state_change=1233489475
last_time_up=1233490883
last_time_down=0
last_time_unreachable=0
last_notification=0
next_notification=0
no_more_notifications=0
current_notification_number=0
notifications_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
active_checks_enabled=1
passive_checks_enabled=1
event_handler_enabled=1
flap_detection_enabled=1
failure_prediction_enabled=1
process_performance_data=1
obsess_over_host=1
last_update=1233491098
is_flapping=0
percent_state_change=0.00
scheduled_downtime_depth=0
}
service {
host_name=gateway
service_description=PING
modified_attributes=0
check_command=check_ping!100.0,20%!500.0,60%
event_handler=
has_been_checked=1
should_be_scheduled=1
check_execution_time=4.017
check_latency=0.210
check_type=0
current_state=0
last_hard_state=0
current_attempt=1
max_attempts=4
state_type=1
last_state_change=1233489432
last_hard_state_change=1233489432
last_time_ok=1233491078
last_time_warning=0
last_time_unknown=0
last_time_critical=0
plugin_output=PING OK - Packet loss = 0%, RTA = 2.98 ms
performance_data=
last_check=1233491078
next_check=1233491378
current_notification_number=0
last_notification=0
next_notification=0
no_more_notifications=0
notifications_enabled=1
active_checks_enabled=1
passive_checks_enabled=1
event_handler_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
flap_detection_enabled=1
failure_prediction_enabled=1
process_performance_data=1
obsess_over_service=1
last_update=1233491098
is_flapping=0
percent_state_change=0.00
scheduled_downtime_depth=0
}
It can have any number of hosts and a host can have any number of services.
Pfft, get yerself mk_livestatus. http://mathias-kettner.de/checkmk_livestatus.html
Nagiosity does exactly what you want:
http://code.google.com/p/nagiosity/
Having shamelessly stolen from the above examples, here's a version built for Python 2.4 that returns a dict containing arrays of nagios sections.
import re

def parseConf(source):
    conf = {}
    patID = re.compile(r"(?:\s*define)?\s*(\w+)\s+{")
    patAttr = re.compile(r"\s*(\w+)(?:=|\s+)(.*)")
    patEndID = re.compile(r"\s*}")
    for line in source.splitlines():
        line = line.strip()
        matchID = patID.match(line)
        matchAttr = patAttr.match(line)
        matchEndID = patEndID.match(line)
        if len(line) == 0 or line[0] == '#':
            pass
        elif matchID:
            identifier = matchID.group(1)
            cur = [identifier, {}]
        elif matchAttr:
            attribute = matchAttr.group(1)
            value = matchAttr.group(2).strip()
            cur[1][attribute] = value
        elif matchEndID and cur:
            conf.setdefault(cur[0], []).append(cur[1])
            del cur
    return conf
To get the names of all hosts whose contact_groups begin with 'devops':
nagcfg = parseConf(stringcontainingcompleteconfig)
hostlist = [host['host_name'] for host in nagcfg['host']
            if host['contact_groups'].startswith('devops')]
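The same structure makes the original goal (critical services and the hosts they belong to) a short comprehension. A sketch, assuming status_dat_content holds the file's text; Nagios encodes CRITICAL as current_state=2, which parseConf keeps as the string '2':

status = parseConf(status_dat_content)
critical = [(svc['host_name'], svc['service_description'])
            for svc in status.get('service', [])
            if svc.get('current_state') == '2']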
Don't know nagios and its config file, but the structure seems pretty simple:
# comment
identifier {
    attribute=
    attribute=value
}
which can simply be translated to
<identifier>
    <attribute name="attribute-name">attribute-value</attribute>
</identifier>
all contained inside a root-level <nagios> tag.
I don't see line breaks in the values. Does nagios have multi-line values?
You need to take care of equal signs within attribute values, so set your regex to non-greedy.
You can do something like this:
import re

def parseConf(filename):
    conf = []
    with open(filename, 'r') as f:
        for i in f.readlines():
            if i[0] == '#': continue
            matchID = re.search(r"([\w]+) {", i)
            matchAttr = re.search(r"[ ]*([\w]+)=([\w\d]*)", i)
            matchEndID = re.search(r"[ ]*}", i)
            if matchID:
                identifier = matchID.group(1)
                cur = [identifier, {}]
            elif matchAttr:
                attribute = matchAttr.group(1)
                value = matchAttr.group(2)
                cur[1][attribute] = value
            elif matchEndID:
                conf.append(cur)
    return conf

def conf2xml(filename):
    conf = parseConf(filename)
    xml = ''
    for ID in conf:
        xml += '<%s>\n' % ID[0]
        for attr in ID[1]:
            xml += '\t<attribute name="%s">%s</attribute>\n' % \
                   (attr, ID[1][attr])
        xml += '</%s>\n' % ID[0]
    return xml
Then try to do:
print(conf2xml('conf.dat'))
If you slightly tweak Andrea's solution, you can use that code to parse both the status.dat as well as the objects.cache:
import re

def parseConf(source):
    conf = []
    for line in source.splitlines():
        line = line.strip()
        matchID = re.match(r"(?:\s*define)?\s*(\w+)\s+{", line)
        matchAttr = re.match(r"\s*(\w+)(?:=|\s+)(.*)", line)
        matchEndID = re.match(r"\s*}", line)
        if len(line) == 0 or line[0] == '#':
            pass
        elif matchID:
            identifier = matchID.group(1)
            cur = [identifier, {}]
        elif matchAttr:
            attribute = matchAttr.group(1)
            value = matchAttr.group(2).strip()
            cur[1][attribute] = value
        elif matchEndID and cur:
            conf.append(cur)
            del cur
    return conf
It is a little puzzling why Nagios chose to use two different formats for these files, but once you've parsed them both into some usable Python objects you can do quite a bit of magic through the external command file.
If anybody has a solution for getting this into a real XML DOM, that'd be awesome.
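For the XML DOM wish, the standard library's xml.etree.ElementTree can wrap this parseConf output directly; a minimal sketch using the <nagios> root suggested earlier (conf2dom is a hypothetical helper name):

import xml.etree.ElementTree as ET

def conf2dom(conf):
    # conf: list of [identifier, {attribute: value}] pairs from parseConf
    root = ET.Element('nagios')
    for identifier, attrs in conf:
        section = ET.SubElement(root, identifier)
        for name, value in attrs.items():
            attr = ET.SubElement(section, 'attribute', name=name)
            attr.text = value
    return ET.ElementTree(root)

# e.g. conf2dom(parseConf(source)).write('status.xml')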
Over the last several months I've written and released a tool that parses the Nagios status.dat and objects.cache and builds a model that allows for some really useful manipulation of Nagios data. We use it to drive an internal operations dashboard that is a simplified 'mini' Nagios. It's under continual development and I've neglected testing and documentation, but the code isn't too crazy and I think it's fairly easy to follow.
Let me know what you think...
https://github.com/zebpalmer/NagParser