How to implement JSON format logs in Python

I have the below piece of code for Python logging and want to convert the logs into JSON format for better accessibility of the information. How can I convert them into JSON format?
import os
import logging

log_fmt = "%(asctime)s %(levelname)s %(message)s"
logging.basicConfig(format=log_fmt)
logger = logging.getLogger()
logger.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))

logger.info("this is a test")
And the output looks like "2022-04-20 17:40:31,332 INFO this is a test"
How can I format this into a JSON object so I can access it by keys?
Desired output:
{
    "time": "2022-04-20 17:40:31,332",
    "level": "INFO",
    "message": "this is a test"
}

You could use the Python JSON Logger
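For example, a minimal sketch with that package (pip install python-json-logger; the import path can vary between versions, and the formatter also takes care of escaping quotes inside the message):
import logging
from pythonjsonlogger import jsonlogger

logger = logging.getLogger()
handler = logging.StreamHandler()
# Field names listed in the format string become keys in the JSON output.
handler.setFormatter(jsonlogger.JsonFormatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel("INFO")

logger.info("this is a test")
# {"asctime": "2022-04-20 17:40:31,332", "levelname": "INFO", "message": "this is a test"}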
But if you don't want to, or can't do that, then your log format string should be...
log_fmt = ("{\"time\": \"%(asctime)s\", \"level\": \"%(levelname)s\", \"message\": \"%(message)s\"},")
You'll end up with an extra comma at the end of the log file that you can programmatically remove later. Or, you can do this if you want the comma at the top of the file...
log_fmt = (",{\"time\": \"%(asctime)s\", \"level\": \"%(levelname)s\", \"message\": \"%(message)s\"}")
But the json will look better in an editor with the comma at the end of every line.
If you provide a mechanism for users to download, or otherwise access log files, then you can do the trailing comma cleanup there, before you send the log file to the user.
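A rough sketch of that cleanup step, assuming the format string above and a hypothetical log file name app.log:
import json

# Hypothetical log file written with the JSON-per-line format string above.
with open("app.log") as fp:
    raw = fp.read().strip()

# Drop the trailing comma and wrap everything in brackets to get a JSON array.
records = json.loads("[" + raw.rstrip(",") + "]")
print(records[0]["level"])  # e.g. "INFO"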

Related

PDF long text extraction to JSON in Python

I'm trying to create a Python script that extracts text from a PDF and then converts it to a correctly formatted JSON file (see below).
The text extraction is not a problem. I'm using PyPDF2 to extract the text from a user-inputted PDF, which will often result in a LONG text string. I would like to add this text as a 'value' to a JSON 'key' (see 2nd example below).
My code:
# Writing all data to JSON file
# Data to be written
dictionary = {
    "company": str(company),
    "document": str(document),
    "text": str(text)  # This is what would be a LONG string of text
}

# Serializing json
json_object = json.dumps(dictionary, indent=4)
print(json_object)

with open('company_document.json', 'w') as fp:
    json.dump(json_object, fp)
The ideal output would be a JSON file that is structured like this:
[
    {
        "company": 1,
        "document-name": "Orlando",
        "text": " **LONG_TEXT_HERE** "
    }
]
I'm not getting the right json structure as an output. Also, the long text string most likely contains some punctuation or special characters that can affect the json - such as closing the string too early. I could take this out before, but is there a way to keep it in for the json file so I can address it in the next step (in Neo4j) ?
This is my output at the moment:
"{\n \"company\": \"Stack\",\n \"document\": \"Overflow Report\",\n \"text\": \"Long text 2020\\nSharing relevant and accountable information about our development "quotes and things...
Does anyone have an idea on how this can be achieved?
Like many people, you are confusing the CONTENT of your data with the REPRESENTATION of your data. The code you have works just fine. Notice:
import json

# Data to be written
dictionary = {
    "company": 1,
    "document": "Orlando",
    "text": """Long text 2020
Sharing relevant and accountable information about our development.
This is a complicated text string with "quotes and things".
"""
}

# Serializing json
json_object = json.dumps([dictionary], indent=4)
print(json_object)

with open('company_document.json', 'w') as fp:
    json.dump([dictionary], fp)
When executed, this produces the following on stdout:
[
    {
        "company": 1,
        "document": "Orlando",
        "text": "Long text 2020\nSharing relevant and accountable information about our development.\nThis is a complicated text string with \"quotes and things\".\n"
    }
]
Notice that the embedded quotes are escaped. That's what the standard requires. The file does not have the indentation, because you didn't ask for it, but it's still quite valid JSON.
[{"company": 1, "document": "Orlando", "text": "Long text 2020\nSharing relevant and accountable information about our development.\nThis is a complicated text string with \"quotes and things\".\n"}]
FOLLOWUP
This version reads in whatever was in the file before, adds a new record to the list, and saves the whole thing out.
import os
import json

# Read existing data.
MASTER = "company_document.json"
if os.path.exists(MASTER):
    database = json.load(open(MASTER, 'r'))
else:
    database = []

# Data to be written
dictionary = {
    "company": 1,
    "document": "Orlando",
    "text": """Long text 2020
Sharing relevant and accountable information about our development.
This is a complicated text string with "quotes and things".
"""
}

# Serializing json
json_object = json.dumps([dictionary], indent=4)
print(json_object)

database.append(dictionary)
with open(MASTER, 'w') as fp:
    json.dump(database, fp)

How to add a timestamp and loglevel to each log in Python's structlog?

How do I configure structlog so that it automatically adds the log level and a timestamp (and maybe other fields) to each log message by default, so I do not have to add them to every message explicitly?
I am displaying my messages as JSON (for further processing with Fluentd, Elasticsearch and Kibana). The log level is (for some reason) not included in the output JSON log.
This is how I configure structlog:
structlog.configure(
    processors=[structlog.processors.JSONRenderer()],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
)
I am logging:
log.info("Artist saved", spotify_id=id)
Logs I am seeing (note: no timestamp and no log level):
{"logger": "get_artists.py", "spotify_id": "4Y6z2aIww27vnxZz9xfG3S", "event": "Artist saved"}
I found my answer here: Python add extra fields to structlog-based formatters within logging
There are processors that are doing exactly what I needed:
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", key="ts"),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
)
Adding both add_log_level and TimeStamper resulted, as expected, in the extra fields in the log: ..., "level": "info", "ts": "2022-04-17T19:21:56.426093Z"}.
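For completeness, a minimal end-to-end sketch of that configuration (the spotify_id value is just a placeholder, and key order in the output may vary):
import logging
import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", key="ts"),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
)

log = structlog.get_logger()
log.info("Artist saved", spotify_id="4Y6z2aIww27vnxZz9xfG3S")
# {"spotify_id": "4Y6z2aIww27vnxZz9xfG3S", "event": "Artist saved", "level": "info", "ts": "2022-04-17T19:21:56.426093Z"}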

Reading/Writing to JSON adds an extra unnecessary curly bracket }

I am writing a program in Python where a JSON local file needs to be updated with the last processed item in a database so that the process kicks off again from that point.
The problem I am having is that sometimes an extra curly bracket "}" gets added to the end of the file, causing the JSON to become invalid. This then breaks the scheduled process until the JSON file is updated.
I know that I could first read the file into an object, close the file, and then open it again to write to it, but that doesn't feel as clean given that the file is constantly written to so that the tracking of the last processed item is not lost.
import json

with open(_SETTINGS, 'r+') as settings:
    _last_processed = log['#timestamp']
    settings_data[env]['last_processed'] = _last_processed
    settings.seek(0)
    # settings.truncate()
    json.dumps(settings_data, settings, indent=2)
The JSON file, _SETTINGS, looks like as follows:
{
    "UAT": {
        "last_processed": "2019-10-10T00:00:00.0000Z"
    },
    "DEV": {
        "last_processed": "2019-10-10T00:00:00.0000Z"
    }
}
Annoyingly, what sometimes gets written is the above JSON but with an extra closing curly bracket "}", as below.
{
    "UAT": {
        "last_processed": "2019-10-10T00:00:00.0000Z"
    },
    "DEV": {
        "last_processed": "2019-10-10T00:00:00.0000Z"
    }
}}
Can anyone shed some light on this?
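The stray brace is the classic symptom of rewriting a file in place via seek(0) without truncating: if the new JSON is even one byte shorter than what was there before, the tail of the old content (here a "}") is left behind. A minimal sketch of the usual fix, assuming log and env are defined as in the code above (note it also uses json.dump rather than json.dumps, so the data is actually written to the file):
import json

with open(_SETTINGS, 'r+') as settings:
    settings_data = json.load(settings)                 # read the current contents
    settings_data[env]['last_processed'] = log['#timestamp']
    settings.seek(0)
    json.dump(settings_data, settings, indent=2)        # write the updated object back
    settings.truncate()                                 # discard any leftover bytes from the old content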

Use a json file stored on fs/disk as output for an Ansible module

I am struggling with an Ansible module I needed to create. Everything is done; the module gets a JSON file delivered from a third party onto the filesystem. This JSON file is expected to be the (only) output, so that I can register it and access its content - or at least make the output somehow properly accessible.
The output file is a proper JSON file, and I have tried various things to reach my goal.
Including:
Simply printing out the JSON file using print or os.stdout.write, because according to the documentation, Ansible simply takes the stdout.
Loading the JSON and dumping it using json.dumps(data), or like this:
with open('path-to-file', 'r') as tmpfile:
    data = json.load(tmpfile)
module.exit_json(changed=True, message="API call to %s successfull" % endpoint, meta=data)
This ended up having the JSON in the output, but in an escaped form, and Ansible refuses to access the escaped part.
What would be the correct way to make the json data accessible for further usage?
Edit:
The json looks like this (well, it’s a huge json, this is simply a part of it):
{
    "total_results": 51,
    "total_pages": 2,
    "prev_url": null,
    "next_url": "/v2/apps?order-direction=asc&page=2&results-per-page=50",
After register, the debug output looks like this and I cannot access output.meta.total_results for example.
ok: [localhost] => {
    "output": {
        "changed": true,
        "message": "API call filtering /v2/apps with name and yes-no was successfull",
        "meta": "{\"total_results\": 51, \"next_url\": \"/v2/apps?order-direction=asc&page=2&results-per-page=50\", \"total_pages\": 2, \"prev_url\": null, (...)
The ansible output when trying to access the var:
ok: [localhost] => {
    "output.meta.total_results": "VARIABLE IS NOT DEFINED!"
}
Interesting. My tests using os.stdout.write somehow failed, but using print json.dumps(data) works.
This is solved.
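For reference, a bare-bones sketch of that working approach as an old-style module: parse the file with json.load and print a single JSON document on stdout, which Ansible reads as the module result, so output.meta.total_results becomes addressable after register (the file path and message below are placeholders):
#!/usr/bin/env python
import json

# Placeholder path to the JSON file delivered by the third party.
with open('path-to-file', 'r') as tmpfile:
    data = json.load(tmpfile)  # parse into a dict; do not pass a dumped string around

result = {
    "changed": True,
    "message": "API call successful",
    "meta": data,              # a real dict, so Ansible exposes output.meta.total_results
}
print(json.dumps(result))      # Ansible parses the module's stdout as JSON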

Rename log file in Python while file keeps writing any other logs

I am using the Python logging mechanism for keeping a record of my logs. I have two types of logs:
one is a rotating log (log1, log2, log3...) and the other is a non-rotating log called json.log (which, as the name suggests, has JSON logs in it).
The log files are created when the server is started and closed when the app is closed.
What I am trying to do in general is: When I press the import button on my page, to have all json logs saved on the sqlite db.
The problem I am facing is:
When I try to rename the json.log file like this:
source_file = "./logs/json.log"
snapshot_file = "./logs/json.snapshot.log"
try:
os.rename(source_file, snapshot_file)
I get WindowsError: [Error 32] The process cannot access the file because it is being used by another process,
and this is because the file is being used by the logger continuously. Therefore, I need to "close" the file somehow so I can do my I/O operation successfully.
The thing is that this is not desirable because logs might be lost until the file is closed, then renamed and then "re-created".
I was wondering if anyone has come across such a scenario before and whether any practical solution was found.
I have tried something which works, but it does not seem convenient, and I am not sure it is safe enough that no logs are lost.
My code is this:
source_file = "./logs/json.log"
snapshot_file = "./logs/json.snapshot.log"
try:
logger = get_logger()
# some hackish way to remove the handler for json.log
if len(logger.handlers) > 2:
logger.removeHandler(logger.handlers[2])
if not os.path.exists(snapshot_file):
os.rename(source_file, snapshot_file)
try:
if type(logger.handlers[2]) == RequestLoggerHandler:
del logger.handlers[2]
except IndexError:
pass
# re-adding the logs file handler so it continues writing the logs
json_file_name = configuration["brew.log_file_dir"] + os.sep + "json.log"
json_log_level = logging.DEBUG
json_file_handler = logging.FileHandler(json_file_name)
json_file_handler.setLevel(json_log_level)
json_file_handler.addFilter(JSONLoggerFiltering())
json_file_handler.setFormatter(JSONFormatter())
logger.addHandler(json_file_handler)
... the code then continues to write the logs to the db and deletes the json.snapshot.log file, until the next time the import button is pressed; then the snapshot is created again, only for writing the logs to the db.
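A somewhat less hackish sketch of the same handler swap, reusing get_logger, JSONLoggerFiltering and JSONFormatter from the snippet above, and assuming a single logging.FileHandler writes json.log: find it by its baseFilename instead of by index, close it before renaming, then attach a fresh handler.
import logging
import os

source_file = "./logs/json.log"
snapshot_file = "./logs/json.snapshot.log"

logger = get_logger()
# Find the handler that writes json.log by its baseFilename instead of by index.
for handler in [h for h in logger.handlers
                if isinstance(h, logging.FileHandler)
                and os.path.basename(h.baseFilename) == "json.log"]:
    logger.removeHandler(handler)
    handler.close()  # release the file so Windows allows the rename

if not os.path.exists(snapshot_file):
    os.rename(source_file, snapshot_file)

# Re-attach a fresh handler so logging continues into a new json.log.
json_file_handler = logging.FileHandler(source_file)
json_file_handler.setLevel(logging.DEBUG)
json_file_handler.addFilter(JSONLoggerFiltering())
json_file_handler.setFormatter(JSONFormatter())
logger.addHandler(json_file_handler)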
Also for reference my log file has this format:
{'status': 200, 'actual_user': 1, 'resource_name': '/core/logs/process', 'log_level': 'INFO', 'request_body': None, ... }
Thanks in advance :)
