Watch logs in a folder in real time with Python

I'm trying to make a custom log watcher for a log folder using Python. The objective is simple: find a regex in the logs and write a line to a text file if it is found.
The problem is that the script must run constantly against a folder that may contain multiple log files with unknown names, not a single one, and it should detect new log files created inside the folder on the fly.
I made a kind of tail -f in Python (partly copied from elsewhere) that constantly reads one specific log file and writes a line to a txt file if the regex is found in it, but I don't know how to do this with a folder instead of a single log file, or how the script can detect new log files created inside the folder and read them on the fly.
#!/usr/bin/env python
import time, os, re
from datetime import datetime
# Regex used to match relevant loglines
error_regex = re.compile(r"ERROR:")
start_regex = re.compile(r"INFO: Service started:")
# Output file, where the matched loglines will be copied to
output_filename = os.path.normpath("log/script-log.txt")
# Function that will work as tail -f for python
def follow(thefile):
    thefile.seek(0, 2)
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line
logfile = open("log/service.log")
loglines = follow(logfile)
counter = 0
for line in loglines:
    if error_regex.search(line):
        counter += 1
        sttime = datetime.now().strftime('%Y%m%d_%H:%M:%S - ')
        out_file = open(output_filename, "a")
        out_file.write(sttime + line)
        out_file.close()
    if start_regex.search(line):
        sttime = datetime.now().strftime('%Y%m%d_%H:%M:%S - ')
        out_file = open(output_filename, "a")
        out_file.write(sttime + "SERVICE STARTED\n" + sttime + "Number of errors detected during the startup = {}\n".format(counter))
        counter = 0
        out_file.close()

You can use watchgod for this purpose. This may be better as a comment; I'm not sure it deserves to be an answer.
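watchgod is event-driven; as a dependency-free alternative, here is a minimal polling sketch using only the standard library (this swaps polling in for watchgod's approach, it is not how watchgod works). It keeps a byte offset per file, so a log file that appears after start-up is picked up on the next poll and read from its beginning, while files already present at start-up are skipped to their end like tail -f. The helper names read_new_lines and watch, the '*.log' pattern and the 0.5 s interval are assumptions. Keep the output file outside the pattern (the question's script-log.txt is fine) so the watcher never reads its own output.

```python
import glob
import os
import time


def read_new_lines(folder, offsets, pattern='*.log'):
    """Return (path, line) pairs appended to matching files since the last call.

    `offsets` maps each file path to the number of bytes already consumed.
    A path missing from `offsets` (e.g. a file created after the watcher
    started) is read from the beginning.
    """
    new_lines = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        pos = offsets.get(path, 0)
        try:
            if os.path.getsize(path) < pos:
                pos = 0  # file shrank (truncated or rotated): start again
            with open(path, 'rb') as f:
                f.seek(pos)
                data = f.read()
        except FileNotFoundError:
            continue  # file vanished between glob() and open()
        # Only consume complete lines; a partial last line is re-read next time.
        end = data.rfind(b'\n') + 1
        offsets[path] = pos + end
        for raw in data[:end].splitlines(keepends=True):
            new_lines.append((path, raw.decode('utf-8', errors='replace')))
    return new_lines


def watch(folder, regex, output_filename, pattern='*.log', interval=0.5):
    """Poll `folder` forever, appending lines that match `regex` to `output_filename`."""
    # Like tail -f, skip whatever the existing files already contain.
    offsets = {path: os.path.getsize(path)
               for path in glob.glob(os.path.join(folder, pattern))}
    while True:
        for path, line in read_new_lines(folder, offsets, pattern):
            if regex.search(line):
                with open(output_filename, 'a') as out:
                    out.write(os.path.basename(path) + ': ' + line)
        time.sleep(interval)
```

Usage would be something like watch('log', re.compile(r'ERROR:'), 'script-log.txt'); the question's error counting and "Service started" handling can replace the single regex.search call inside the loop.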

Related

How can I create a new text file in python

I made this program, which takes user input and outputs it to a new text file:
output = input('Insert your text')
f = open("text.txt", "a")
f.write(output)
This code takes the user's input and writes it to a new text file. But if the file already exists in the path, the code just appends to it. I want the code to create a new file in the path every time the program is run: the first time it should be text.txt, the second time a new file called text(1).txt, and so on.
Start by checking whether text.txt exists. If it does, use a loop to check text(n).txt, with n being a positive integer starting at 1.
from os.path import isfile
output = input('Insert your text')
newFileName = "text.txt"
i = 1
while isfile(newFileName):
    newFileName = "text({}).txt".format(i)
    i += 1
f = open(newFileName, "w")
f.write(output)
f.close()
Eventually the loop reaches some n for which text(n).txt doesn't exist, and the file is saved under that name.
Check if the file you are trying to create already exists. If yes, then change the file name, else write text to the file.
import os
output = input('Insert your text ')
filename = 'text.txt'
i = 1
while os.path.exists(filename):
    filename = 'text(' + str(i) + ').txt'
    i += 1
# "with" closes the file when the block ends
with open(filename, "a") as f:
    f.write(output)
Check if the file already exists:
import os.path
os.path.exists('filename-here.txt')
If it does, create the file under a different name instead (e.g. append the date and time, or a number, to the filename).
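For the date-and-time variant, a small sketch; timestamped_filename is a hypothetical helper and the _YYYYMMDD-HHMMSS suffix is an arbitrary format choice. Two runs within the same second would produce the same name, so open the result with mode 'x' to fail loudly rather than overwrite.

```python
import os
from datetime import datetime


def timestamped_filename(filename, when=None):
    """Insert a timestamp before the extension, e.g. text.txt -> text_20190822-153000.txt."""
    if when is None:
        when = datetime.now()
    name, ext = os.path.splitext(filename)
    return f"{name}_{when:%Y%m%d-%H%M%S}{ext}"
```

Usage: with open(timestamped_filename('text.txt'), 'x') as f: f.write(output)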
A problem with checking for existence is that there can be a race condition if two processes try to create the same file:
process 1: does file exist? (no)
process 2: does file exist? (no)
process 2: create file for writing ('w', which truncates if it exists)
process 2: write file.
process 2: close file.
process 1: create same file for writing ('w', which truncates process 2's file).
A way around this is mode 'x' (open for exclusive creation, failing if the file already exists), but in the scenario above that would just make process 1 get an error instead of truncating process 2's file.
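A minimal sketch of the exclusive-creation behaviour described above, assuming a throwaway temporary directory: the second open(..., 'x') raises FileExistsError and leaves the first writer's content untouched, where 'w' would have truncated it.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'text.txt')

# First exclusive open creates the file.
with open(path, 'x') as f:
    f.write('written by process 2')

# A second exclusive open of the same name fails instead of truncating.
try:
    with open(path, 'x') as f:
        f.write('written by process 1')
except FileExistsError:
    print('text.txt already exists; leaving it alone')

with open(path) as f:
    print(f.read())  # the first writer's content is still there
```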
To open the file with an incrementing filename as the OP described, this can be used:
import os
def unique_open(filename):
    # "name" contains everything up to the extension.
    # "ext" contains the last dot(.) and extension, if any
    name, ext = os.path.splitext(filename)
    n = 0
    while True:
        try:
            return open(filename, 'x')
        except FileExistsError:
            n += 1
            # build new filename with incrementing number
            filename = f'{name}({n}){ext}'
file = unique_open('test.txt')
file.write('content')
file.close()
To make the function work as a context manager ("with" statement), it can be decorated with contextlib.contextmanager, which also closes the file automatically:
import os
import contextlib
@contextlib.contextmanager
def unique_open(filename):
    n = 0
    name, ext = os.path.splitext(filename)
    while True:
        try:
            file = open(filename, 'x')
        except FileExistsError:
            n += 1
            filename = f'{name}({n}){ext}'
        else:
            break  # open succeeded, so exit while
    print(f'opened {filename}')  # for debugging
    try:
        yield file  # value of with's "as".
    finally:
        file.close()  # cleanup when with block exits
with unique_open('test.txt') as f:
    f.write('content')
Demo:
C:\>test.py
opened test.txt
C:\>test
opened test(1).txt
C:\>test
opened test(2).txt

How to create and write into a file correctly in Python

I am trying to create a file in a certain directory and name it with today's date.
The file is created, but the title line that I want to write into it does not appear.
from datetime import datetime
today = datetime.now().date().strftime('%Y-%m-%d')
g = open(path_prefix+today+'.csv', 'w+')
if os.stat(path_prefix+today+'.csv').st_size == 0: # this checks if file is empty
    g = open(path_prefix+today+'.csv', 'w+')
    g.write('Title\r\n')
path_prefix is just the path to the directory I am saving in: /Users/name/Documents/folder/subfolder/
I am expecting a file 2019-08-22.csv to be saved in the directory given by path_prefix with a title as specified in the last line of the code above.
What I am getting is an empty file, and if I run the code again then the title is appended into the file.
As mentioned by @sampie777, I was not closing the file after writing to it, which is why the changes were not saved when I opened the file. Adding close() on an extra line solves the issue I was having:
from datetime import datetime
today = datetime.now().date().strftime('%Y-%m-%d')
g = open(path_prefix+today+'.csv', 'w+')
if os.stat(path_prefix+today+'.csv').st_size == 0: #this checks if file is empty
    g = open(path_prefix+today+'.csv', 'w+')
    g.write('Title\r\n')
g.close()
I am sure there are plenty of other ways to do this.
Writes are buffered, so the content is not guaranteed to reach the file until you close (or flush) it. So call g.close().
I suggest using:
with open(path_prefix+today+'.csv', 'w+') as g:
    g.write('...')
This closes the file for you automatically, even if an exception occurs.
Also, why are you opening the file two times?
Tip: you use path_prefix+today+'.csv' a lot. Store it in a variable, and your code will be a lot easier to maintain.
Suggested refactor of the last lines:
output_file_name = path_prefix + today + '.csv' # I prefer "{}{}.csv".format(path_prefix, today) or "%s%s.csv" % (path_prefix, today)
# os.stat() raises FileNotFoundError if the file does not exist yet, so check for existence first
is_output_file_empty = not os.path.exists(output_file_name) or os.stat(output_file_name).st_size == 0
with open(output_file_name, 'a') as output_file:
    if is_output_file_empty:
        output_file.write('Title\r\n')
For more information, see this question: Correct way to write line to file?
and maybe also: How to check whether a file is empty or not?
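Combining the points above (os.path.join instead of string concatenation, "with" for closing, an existence check before the size check), a sketch using the csv module. append_daily_csv, the single Title column and the tuple row format are assumptions for illustration.

```python
import csv
import os
from datetime import date


def append_daily_csv(directory, rows, header=('Title',), day=None):
    """Append `rows` to <directory>/<YYYY-MM-DD>.csv, writing `header` only for a new or empty file."""
    if day is None:
        day = date.today()
    path = os.path.join(directory, day.strftime('%Y-%m-%d') + '.csv')
    # A missing file and an empty file both still need the header line;
    # checking existence first avoids calling getsize() on a file that isn't there.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline='' lets the csv module write its own line endings (\r\n by default).
    with open(path, 'a', newline='') as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(header)
        writer.writerows(rows)
    return path
```

Because os.path.join inserts the separator, the directory argument works with or without a trailing slash.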
I haven't used Python in a while, but by doing a quick bit of research, this seems like it could work:
# - Load imports
import os
from datetime import datetime
# - Get the date
dateToday = datetime.now().date()
# - Set the savePath / path_prefix
savePath = 'C:/Users/name/Documents/folder/subfolder/'
fileName = dateToday.strftime("%Y-%m-%d") # - Convert 'dateToday' to string
# - Join path and file name
completeName = os.path.join(savePath, fileName + ".csv")
# - Check for file
if not os.path.exists(completeName):
    # - If it doesn't exist, create it and write the title; "with" closes it
    with open(completeName, 'w') as file:
        file.write('Title\r\n')
else:
    print("File already exists")

Making a Corpus out of Wiki DumpFile using Python in NLTK

I am trying to create a corpus out of a Wikipedia dump file.
I've downloaded the enwiki-latest-pages-articles.xml.bz2 file, but when I run the script it gives me some errors.
I am relatively new to this, and I do not understand where the Python script and the wiki file should be placed (the same folder? which folder?).
I've run this command: make_wiki_corpus enwiki-latest-pages-articles.xml.bz2 wiki_en.txt
make_wiki_corpus - being my python script
enwiki-latest-pages-articles.xml.bz2 - is the wikipedia database
wiki_en.txt - the textfile I want to write into.
import sys
from gensim.corpora import WikiCorpus
def make_corpus(in_f, out_f):
    """Convert Wikipedia xml dump file to text corpus"""
    output = open(out_f, 'w')
    wiki = WikiCorpus(in_f)
    i = 0
    for text in wiki.get_texts():
        output.write(bytes(' '.join(text), 'utf-8').decode('utf-8') + '\n')
        i = i + 1
        if (i % 10000 == 0):
            print('Processed ' + str(i) + ' articles')
    output.close()
    print('Processing complete!')
if __name__ == '__main__':
    if len(sys.argv) != 3:
        print('Usage: python make_wiki_corpus.py <wikipedia_dump_file> <processed_text_file>')
        sys.exit(1)
    in_f = sys.argv[1]
    out_f = sys.argv[2]
    make_corpus(in_f, out_f)
I ran the command with this script saved in the same folder as the enwiki-latest-pages-articles.xml.bz2 file, but at the command prompt I get error messages like:
line 636, in __init__
line 92, in __init__
FileNotFoundError: [Errno 21] No such file or directory: "enwiki-latest-pages-articles.xml.bz2"
Maybe some of these ideas will be useful for you:
suggestion #1: if I'm not mistaken, the command should be python make_wiki_corpus.py {your bz2 file} {your txt file};
suggestion #2: try passing the full directory path to the file you need;
suggestion #3: you could also run the code from within the development environment itself (to avoid possible complications).
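For suggestion #2, a small sketch: a relative name like enwiki-latest-pages-articles.xml.bz2 is resolved against the directory the command is run from (the current working directory), not the folder that contains the script. Resolving it to an absolute path up front and checking that it exists makes the error say exactly where Python looked. resolve_dump_path is a hypothetical helper, not part of gensim.

```python
import os
import sys


def resolve_dump_path(name):
    """Return the absolute path of `name`, or exit with a message showing where Python looked."""
    # Relative names are resolved against the current working directory,
    # i.e. the folder the command was run from, not the script's folder.
    path = os.path.abspath(os.path.expanduser(name))
    if not os.path.isfile(path):
        sys.exit(f"Input file not found: {path} (current directory: {os.getcwd()})")
    return path
```

In the script's main block this would replace the plain assignment, e.g. in_f = resolve_dump_path(sys.argv[1]).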

How to remove row line numbers from several .doc/.docx files on Linux?

I need to remove row line numbers from a large collection of Word .doc/.docx files as part of a (Python) data processing pipeline.
I am aware of solutions to do this in C# using Word.Interop (e.g. Is it possible to use Microsoft.Office.Interop.Word to programatically remove line numbering from a Word document?) but it would be great to achieve this e.g. using LibreOffice in --headless mode (before evaluating MS Word + wine solutions).
For a single file, with the UI, one can follow https://help.libreoffice.org/Writer/Line_Numbering, but I need to do this for a lot of files, so a macro/script/command line solution that can
1) cycle through a set of files
2) remove row numbers and save the result to file
and be triggered with e.g. a Python subprocess call, or even through the Python API (https://help.libreoffice.org/Common/Scripting), would be great.
To remove the line numbering from every file in the working directory (and save the results as PDFs), start LibreOffice from a Linux command line:
soffice --headless --accept="socket,host=localhost,port=2002;urp;StarOffice.ServiceManager"
and then run the following in a Python interpreter that can import LibreOffice's uno module:
import os
import uno
from com.sun.star.beans import PropertyValue
# connect once to the running LibreOffice instance (port from --accept above)
localContext = uno.getComponentContext()
resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
ctx = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
smgr = ctx.ServiceManager
desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop", ctx)
# PDF export filter, reused for every file
p = PropertyValue()
p.Name = 'FilterName'
p.Value = 'writer_pdf_Export'
# list .doc/.docx files in working dir
files = [x for x in os.listdir('.') if x.lower().endswith((".doc", ".docx"))]
# iterate on files
for file in files:
    path = os.path.realpath(file)
    # open file
    model = desktop.loadComponentFromURL(uno.systemPathToFileUrl(path), "_blank", 0, ())
    # remove line numbers
    model.getLineNumberingProperties().IsOn = False
    # create pdf next to the source file (name.docx -> name.docx.pdf)
    model.storeToURL(uno.systemPathToFileUrl(path + ".pdf"), (p,))
    # close the document so LibreOffice does not keep every file open
    model.close(True)
This should create pdf files with no line numbering in your working directory.
Useful links:
Add line numbers and export to pdf via macro on OpenOffice forums
LineNumberingProperties documentation
Info on running a macro from the command line

How to redirect stdout to only the console when within fileinput loop

Currently I have this piece of code for Python 2.7:
h = 0
for line in fileinput.input('HISTORY', inplace=1):
    if line[0:2] == x:
        h = h + 1
        if h in AU:
            line = line.replace(x, 'AU')
    if 'timestep' in line:
        h = 0
        sys.stdout.write(('\r%s%% ') % format(((os.stat('HISTORY').st_size / os.stat('HISTORY.bak').st_size)*100),'.1f'))
    sys.stdout.write(line)
What I am having trouble with is the following line:
sys.stdout.write(('\r%s%% ') % format(((os.stat('HISTORY').st_size / os.stat('HISTORY.bak').st_size)*100),'.1f'))
I need this information to be output to the console ONLY and not into the HISTORY file.
This code creates a temporary copy of the input file, then scans it and rewrites the original file. It handles errors while processing the file so that the original data isn't lost during the rewrite. It demonstrates how to write some data to stdout occasionally and other data back to the original file.
The temporary file creation was taken from this SO answer.
from __future__ import print_function  # print() works the same on Python 2.7 and 3
import fileinput
import os, shutil, tempfile
# create a copy of the source file in a system-specified
# temporary directory. You could just put this in the original
# folder, if you wanted
def create_temp_copy(src_filename):
    temp_dir = tempfile.gettempdir()
    temp_path = os.path.join(temp_dir, 'temp-history.txt')
    shutil.copy2(src_filename, temp_path)
    return temp_path
# create a temporary copy of the input file
temp = create_temp_copy('HISTORY.txt')
# open up the input file for writing
dst = open('HISTORY.txt', 'w+')
for line in fileinput.input(temp):
    # Added a try/except to handle errors during processing.
    # If this isn't present, any exceptions that are raised
    # during processing could cause unrecoverable loss of
    # the HISTORY file
    try:
        # some sort of replacement
        if line.startswith('e'):
            line = line.strip() + '#\n'  # notice the newline here
        # occasional status updates to stdout
        if '0' in line:
            print('info:', line.strip())  # notice the removal of the newline
    except Exception:
        # when a problem occurs, just output a message
        print('Error processing input file')
    finally:
        # re-write the original input file
        # even if there are exceptions
        dst.write(line)
# close fileinput's handle on the copy and the original file
fileinput.close()
dst.close()
# delete the temporary file
os.remove(temp)
If you only want the information to go to the console, could you just use print instead?
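One caveat for the in-place approach: while fileinput with inplace=True is iterating, it rebinds sys.stdout to the file being rewritten, so a plain print (or sys.stdout.write) ends up in HISTORY as well. The interpreter keeps the original console stream in sys.__stdout__ (sys.stderr is another option). A Python 3 sketch; rewrite_in_place and its simple replace() are hypothetical stand-ins for the question's loop.

```python
import fileinput
import sys


def rewrite_in_place(path, old, new):
    """Replace `old` with `new` in every line of `path`, showing progress on the console."""
    # Count lines first so progress can be shown as a percentage.
    with open(path) as f:
        total = sum(1 for _ in f) or 1
    console = sys.__stdout__  # the real console, saved by the interpreter at startup
    with fileinput.FileInput(path, inplace=True) as lines:
        for n, line in enumerate(lines, start=1):
            # While iterating, sys.stdout is the rewritten file: this goes into the file.
            sys.stdout.write(line.replace(old, new))
            # This bypasses the redirection and goes to the console only.
            console.write('\r%.1f%% ' % (100.0 * n / total))
            console.flush()
    console.write('\n')
```

Using fileinput.FileInput directly (rather than fileinput.input) avoids the module-level global state, and the with block restores sys.stdout when it ends.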
