Converting cloud-init logs to json using a script - python

I am trying to convert the cloud-init logs to JSON, so that Filebeat can pick them up and send them to Kibana. I want to do this with a shell script or a Python script. Is there any script that converts such logs to JSON?
My Python script is below:
import json
import subprocess

filename = "/home/umesh/Downloads/scripts/cloud-init.log"

def convert_to_json_log(line):
    """ convert each line to json format """
    log = {}
    log['msg'] = line
    log['logger-name'] = 'cloud-init'
    log['ServiceName'] = 'Contentprocessing'
    return json.dumps(log)

def log_as_json(filename):
    f = subprocess.Popen(['cat','-F',filename],
                         stdout=subprocess.PIPE,stderr=subprocess.PIPE)
    while True:
        line = f.stdout.readline()
        log = convert_to_json_log(line)
        print log
        with open("/home/umesh/Downloads/outputs/cloud-init-json.log", 'a') as new:
            new.write(log + '\n')

log_as_json(filename)
The script produces a file in JSON format, but the msg field is an empty string. I want each line of the log converted into the message string.

Firstly, try reading the raw log file using Python's built-in functions rather than running OS commands through subprocess, because:
It will be more portable (it works across OSes)
It is faster and less prone to errors
Re-writing your log_as_json function as follows worked for me:
inputfile = "cloud-init.log"
outputfile = "cloud-init-json.log"

def log_as_json(filename):
    # Open cloud-init log file for reading
    with open(inputfile, 'r') as log:
        # Open the output file to append json entries
        with open(outputfile, 'a') as jsonlog:
            # Read line by line
            for line in log.readlines():
                # Convert to json and write to file
                jsonlog.write(convert_to_json_log(line) + "\n")
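Note that the two open() calls can also be combined into a single with statement, e.g. with open(inputfile) as log, open(outputfile, 'a') as jsonlog:, which keeps the nesting flatter; the behaviour is the same.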

After spending some time preparing a customised script, I finally came up with the script below. It might be helpful to others.
import json

def convert_to_json_log(line):
    """ convert each line to json format """
    log = {}
    log['msg'] = json.dumps(line)
    log['logger-name'] = 'cloud-init'
    log['serviceName'] = 'content-processing'
    return json.dumps(log)

# Open the file with read only permit
f = open('/var/log/cloud-init.log', "r")
# use readlines to read all lines in the file
# The variable "lines" is a list containing all lines in the file
lines = f.readlines()
# close the file after reading the lines.
f.close()

jsonData = ''
for line in lines:
    jsonLine = convert_to_json_log(line)
    jsonData = jsonData + "\n" + jsonLine

with open("/var/log/cloud-init/cloud-init-json.log", 'w') as new:
    new.write(jsonData)
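For larger log files, a streaming variant (just a sketch, reusing the same paths and the same convert_to_json_log function) avoids building the whole output string in memory by writing each converted line as it is read:

import json

def convert_to_json_log(line):
    """ convert each line to json format """
    log = {}
    log['msg'] = json.dumps(line)
    log['logger-name'] = 'cloud-init'
    log['serviceName'] = 'content-processing'
    return json.dumps(log)

# Read the source log line by line and append the converted lines directly
with open('/var/log/cloud-init.log', 'r') as src, \
        open('/var/log/cloud-init/cloud-init-json.log', 'a') as dst:
    for line in src:
        dst.write(convert_to_json_log(line) + '\n')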


I need help creating a simple python script that stores an attribute value from a custom json file

JSON file looks like this:
{"Clear":"Pass","Email":"noname#email.com","ID":1234}
There are hundreds of json files with different email values, which is why I need a script to run against all files.
I need to extract the value associated with the Email attribute, which is noname#email.com.
I tried using import json but I'm getting a decoder error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Script looks like this:
import json
json_data = json.loads("file.json")
print(json_data["Email"])
Thanks!
According to the docs, json.loads() takes a str, bytes or bytearray as argument. So if you want to load a json file this way, you should pass the content of the file instead of its path.
import json
file = open("file.json", "r") # Opens file.json in read mode
file_data = file.read()
json_data = json.loads(file_data)
file.close() # Remember to close the file after using it
You can also use json.load(), which takes a file object as its argument:
import json
file = open("file.json", "r")
json_data = json.load(file)
file.close()
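Either way, wrapping the open in a with statement (a small sketch of the same idea) closes the file automatically:

import json

with open("file.json", "r") as file:   # closed automatically when the block exits
    json_data = json.load(file)

print(json_data["Email"])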
Your script needs to open the file to get a file handle; then we can read the JSON.
This sample contains code that can read the JSON file. To simulate this, it uses a string that is identical to the data coming from the file.
import json

# this is to read from the real json file
# file_name = 'email.json'
# with open(file_name, 'r') as f_obj:
#     json_data = json.load(f_obj)

# this is a string that equals the result from reading the json file
json_data = '{"Clear":"Pass","Email":"noname#email.com","ID":1234}'
json_data = json.loads(json_data)
print(json_data["Email"])
result: noname#email.com
import json

with open("file.json", 'r') as f:
    file_content = f.read()
    # convert json to python dict
    tmp = json.loads(file_content)
    email = tmp["Email"]
As already pointed out in previous comments, json.loads() takes the contents of a file rather than a file path.
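Since the question mentions hundreds of such files, here is a small sketch (directory and filename pattern are assumptions) that collects the Email value from every JSON file in a folder:

import glob
import json

emails = []
for path in glob.glob("*.json"):          # adjust the pattern/directory as needed
    with open(path, "r") as f:
        emails.append(json.load(f)["Email"])

print(emails)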

GZip and output file

I'm having difficulty with the following code (which is simplified from a larger application I'm working on in Python).
from io import StringIO
import gzip

jsonString = 'JSON encoded string here created by a previous process in the application'

out = StringIO()
with gzip.GzipFile(fileobj=out, mode="w") as f:
    f.write(str.encode(jsonString))

# Write the file once finished rather than streaming it - uncomment the next line to see file locally.
with open("out_" + currenttimestamp + ".json.gz", "a", encoding="utf-8") as f:
    f.write(out.getvalue())
When this runs I get the following error:
File "d:\Development\AWS\TwitterCompetitionsStreaming.py", line 61, in on_status
with gzip.GzipFile(fileobj=out, mode="w") as f:
File "C:\Python38\lib\gzip.py", line 204, in __init__
self._write_gzip_header(compresslevel)
File "C:\Python38\lib\gzip.py", line 232, in _write_gzip_header
self.fileobj.write(b'\037\213') # magic header
TypeError: string argument expected, got 'bytes'
PS ignore the rubbish indenting here...I know it doesn't look right.
What I'm wanting to do is to create a json file and gzip it in place in memory before saving the gzipped file to the filesystem (windows). I know I've gone about this the wrong way and could do with a pointer. Many thanks in advance.
You have to use bytes everywhere when working with gzip, instead of strings and text. First, use BytesIO instead of StringIO. Second, the mode should be 'wb' for bytes instead of 'w' (the latter is for text), and similarly 'ab' instead of 'a' when appending; the 'b' character means "bytes". Full corrected code below:
from io import BytesIO
import gzip

jsonString = 'JSON encoded string here created by a previous process in the application'

out = BytesIO()
with gzip.GzipFile(fileobj=out, mode='wb') as f:
    f.write(str.encode(jsonString))

currenttimestamp = '2021-01-29'

# Write the file once finished rather than streaming it - uncomment the next line to see file locally.
with open("out_" + currenttimestamp + ".json.gz", "wb") as f:
    f.write(out.getvalue())
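For a single in-memory compression like this, gzip.compress() is a shorter alternative (a sketch, reusing the same jsonString and a hypothetical timestamp) that skips the file-like object entirely:

import gzip

jsonString = 'JSON encoded string here created by a previous process in the application'
currenttimestamp = '2021-01-29'

# Compress the encoded string in one call and write the resulting bytes out
compressed = gzip.compress(jsonString.encode('utf-8'))
with open("out_" + currenttimestamp + ".json.gz", "wb") as f:
    f.write(compressed)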

Read JSON file correctly

I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence.
This is how I try to do it:
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")
My code is failing on items = json.loads(line). It looks like the data is not formatted as the code expects it to be, but how can I change it?
Thanks in advance for your time!
Best,
Julia
With json.load() you don't need to read each line, you can do either of these:
import json

def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)

data = open_json('./1.0alpha7.dev.json')
Or, even cooler, you can GET request the json from GitHub
import json
import requests
url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()
These will both give the same output. data variable will be a list of dictionaries that you can iterate over in a for loop and do your further processing.
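For example (a short sketch; the "text" key is taken from the question's own code):

for item in data:
    print(item["text"])    # each item is one sentence object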
Your code is reading one line at a time and parsing each line individually as JSON. Unless the creator of the file wrote it in that line-delimited format (which, given it has a .json extension, is unlikely), that won't work, as JSON does not use line breaks to indicate the end of an object.
Load the whole file content as JSON instead, then process the resulting items in the array.
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
        for item in data:
            text = item["text"]
label appears to be buried in item["interaction"]

Read lines and remove them after read complete

I am new to the Python language and am trying to develop a script that reads a file with emails in it, splits good emails from bad emails, and then removes those lines from the source file.
I got this far, but I have no idea how to remove the lines that have already been read.
Any help?
import os

with open('/home/klevin/Desktop/python_test/email.txt', 'rw+') as f:
    for line in f.readlines():
        #print line
        domain = line.split("#")[1]
        #print(domain)
        response = os.system("ping -c 1 " + domain)
        if response == 0:
            print(response)
            file1 = open("good_emails.txt","a")
            file1.write( line )
        else:
            print(response)
            file = open("bad_emails.txt","a")
            file.write( line )
In general I would prefer not to both read and write to a file at the same time. So here is what I would do:
open the file for reading
loop over the emails and do your thing. In the comments below you've clarified you want to test only the first 100 mails, so that is what the code below now does.
close the file
reopen the file but this time in write mode, truncating it (throwing away its contents)
write all the remaining (untested) emails to the file
This effectively removes all mails that have been tested.
The code might look like this:
import os

emails = []

# Opening the file for reading
with open('email.txt', 'r') as f, open("good_emails.txt", "w") as good, open("bad_emails.txt", "w") as bad:
    emails = f.readlines()
    # Only loop over the first 100 mails
    for line in emails[:100]:
        domain = line.split("#")[1]
        response = os.system("ping -c 1 " + domain)
        if response == 0:
            print(response)
            good.write( line )
        else:
            print(response)
            bad.write( line )

# Now re-open the file and overwrite it with the correct emails
with open('email.txt', 'w') as f:
    # Write the remaining emails to the original file
    for e in emails[100:]:
        f.write(e)
You can't. That's simply not how files work; you cannot just remove a couple of lines from the middle of a file. To achieve what you want, you have to overwrite or replace the file.
So in your code you'd remove the original file and copy good_email.txt over it:
import shutil
import subprocess

with open('email.txt', 'r') as original, open("good_emails.txt", "w") as good, open("bad_emails.txt", "w") as bad:
    for line in original:  # no need to readlines()
        domain = line.split("#")[1].strip()  # strip the trailing newline
        response = subprocess.call(['ping', '-c', '1', domain])
        if response == 0:
            good.write(line)
        else:
            bad.write(line)

shutil.copyfile('good_emails.txt', 'email.txt')
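A variation on the same idea (a sketch; it assumes the same email.txt layout with # as the separator) writes the surviving emails to a temporary file and then swaps it over the original in one step:

import os
import subprocess
import tempfile

with open('email.txt', 'r') as original, \
        tempfile.NamedTemporaryFile('w', delete=False, dir='.') as keep, \
        open('bad_emails.txt', 'a') as bad:
    for line in original:
        domain = line.split("#")[1].strip()   # strip the trailing newline
        if subprocess.call(['ping', '-c', '1', domain]) == 0:
            keep.write(line)                  # domain responded, keep the email
        else:
            bad.write(line)                   # collect the bad ones separately

os.replace(keep.name, 'email.txt')            # atomically replace the original file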

using readline() in python to read a txt file but the second line is not read in aws comprehend api

I am reading a text file and passing it to the API, but I am getting the result only for the first line in the file; the subsequent lines are not being read.
Code below:
filename = 'c:\myfile.txt'

with open(filename) as f:
    plain_text = f.readline()

response = client_comprehend.detect_entities(
    Text=plain_text,
    LanguageCode='en'
)
entites = list(set([x['Type'] for x in response['Entities']]))
print response
print entites
When you use f.readline() it only reads the first line of the file. So if you want to go through each line of the file, you have to loop over it. Otherwise, if you want to read the entire file (not meant for big files), you can use f.read():
filename = 'c:\myfile.txt'

with open(filename) as f:
    for plain_text in f:
        response = client_comprehend.detect_entities(
            Text=plain_text,
            LanguageCode='en'
        )
        entites = list(set([x['Type'] for x in response['Entities']]))
        print response
        print entites
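And here is the f.read() variant mentioned above, for when the whole file is small enough to send as one request (a sketch; client_comprehend is the Comprehend client assumed from the question):

filename = 'c:\myfile.txt'

with open(filename) as f:
    plain_text = f.read()          # the entire file as one string

response = client_comprehend.detect_entities(
    Text=plain_text,
    LanguageCode='en'
)
entites = list(set([x['Type'] for x in response['Entities']]))
print(response)
print(entites)

Note that Comprehend limits how much text a single detect_entities call accepts, so this only works for reasonably short files.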
As csblo has pointed out in the comments, your readline is only reading the first line of the file because it is only called once. As your program is written, readline runs once, the actions are performed for the single line that was read, and then the program exits without doing anything else.
Conveniently, file objects can be iterated over in a for loop like you would a list. Iterating over a file will return one line per iteration, as though you had called readline and assigned it to a value. Using this, your code will work when rewritten as such:
filename = 'c:\myfile.txt'

with open(filename) as f:
    for plain_text_line in f:
        response = client_comprehend.detect_entities(
            Text=plain_text_line,
            LanguageCode='en'
        )
        entites = list(set([x['Type'] for x in response['Entities']]))
        print response
        print entites
This should iterate over all lines of the file in turn.
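If the file can contain blank lines, it is probably worth skipping them (for example, if not plain_text_line.strip(): continue) before calling the API, since empty text is not useful input for detect_entities.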
