I need a way to compare two files that have the same hostname in them. I have written a function that parses out the hostnames and saves them in a list. Once I have that, I need to be able to compare the files.
Each file is in a different directory.
Step One: Retrieve "hostname" from each file.
Step Two: Run compare on files with same "hostname" from two directories.
Retrieve hostname Code:
import os
import re

def hostname_parse(directory):
    results = []
    try:
        for filename in os.listdir(directory):
            if filename.endswith(('.cfg', '.startup', '.confg')):
                file_name = os.path.join(directory, filename)
                with open(file_name, "r") as in_file:
                    for line in in_file:
                        match = re.search(r'hostname\s(\S+)', line)
                        if match:
                            results.append(match.group(1))
                            #print "Match Found"
        return results
    except IOError as (errno, strerror):
        print "I/O error({0}): {1}".format(errno, strerror)
        print "Error in hostname_parse function"
Sample Data:
Test File:
19-30#
!
version 12.3
service timestamps debug datetime msec
service timestamps log datetime msec
service password-encryption
!
hostname 19-30
!
boot-start-marker
boot-end-marker
!
ntp clock-period 17179738
ntp source Loopback0
!
end
19-30#
In this case the hostname is 19-30. For ease of testing I just used copies of the same file, modified so that they either match or differ.
As stated above, I can extract the hostname, but I am now looking for a way to compare the files based on the hostname found.
At its core this is a file comparison, but being able to look at specific fields is what I would ultimately like to accomplish. For starters I just want to verify that the files are identical. Case sensitivity shouldn't matter, as these are Cisco-generated files with the same formatting. The contents of the files are what matter, since I'm looking for configuration changes.
Here is some code to meet your requirements. I had no way to test it, so it may have a few rough edges. I used hashlib to calculate a hash of each file's contents, as a way to detect changes.
import hashlib
import os
import re

HOSTNAME_RE = re.compile(r'hostname +(\S+)')

def get_file_info_from_lines(filename, file_lines):
    hostname = None
    a_hash = hashlib.sha1()
    for line in file_lines:
        a_hash.update(line.encode('utf-8'))
        match = HOSTNAME_RE.match(line)
        if match:
            hostname = match.group(1)
    return hostname, filename, a_hash.hexdigest()

def get_file_info(filename):
    if filename.endswith(('.cfg', '.startup', '.confg')):
        with open(filename, "r") as in_file:
            return get_file_info_from_lines(filename, in_file.readlines())

def hostname_parse(directory):
    results = {}
    for filename in os.listdir(directory):
        # Join with the directory so files outside the working directory open correctly.
        info = get_file_info(os.path.join(directory, filename))
        if info is not None:
            results[info[0]] = info
    return results
results1 = hostname_parse('dir1')
results2 = hostname_parse('dir2')

for hostname, filename, filehash in results1.values():
    if hostname in results2:
        _, filename2, filehash2 = results2[hostname]
        if filehash != filehash2:
            print("%s has a change (%s, %s)" % (
                hostname, filehash, filehash2))
            print(filename)
            print(filename2)
            print()
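Since the question also mentions wanting to see the actual configuration changes, a unified diff of two matched files is a natural follow-up once the hashes differ. A minimal sketch (the helper name show_config_diff is mine, not part of the code above):

import difflib

def show_config_diff(path1, path2):
    """Print a unified diff of two config files that share a hostname."""
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    for line in difflib.unified_diff(lines1, lines2, fromfile=path1, tofile=path2):
        print(line, end='')

It could be called with the filename/filename2 pair inside the "if filehash != filehash2" branch.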
Related
I am using a script to retrieve information from an email, such as attached files.
I no longer want to save the full file, only its SHA256 hash. However, I can't figure out how to read the temporary file, which is of type _io.BufferedRandom.
I tried to use hashlib, but nothing works with the temporary file.
Here is the relevant part of the code (the full script is too long):
# Extract MIME parts
filename = part.get_filename()
print(filename)
mimetype = part.get_content_type()
print(mimetype)

if filename and mimetype:
    if mimetype in config['caseFiles'] or not config['caseFiles']:
        log.info("Found attachment: %s (%s)" % (filename, mimetype))
        # Decode the attachment and save it in a temporary file
        charset = part.get_content_charset()
        if charset is None:
            charset = chardet.detect(bytes(part))['encoding']
        # Get filename extension to not break TheHive analysers (see Github #11)
        fname, fextension = os.path.splitext(filename)
        fd, path = tempfile.mkstemp(prefix=slugify(fname) + "_", suffix=fextension)
        try:
            with os.fdopen(fd, 'w+b') as tmp:
                tmp.write(part.get_payload(decode=1))
                print(tmp)
                for line in tmp:
                    m = hashlib.sha256(line)
                    print("test :", m.hexdigest())
            attachments.append(path)
        except OSError as e:
            log.error("Cannot dump attachment to %s: %s" % (path, e.errno))
            return False
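A minimal sketch of one way around this, assuming part, fd, and path from the snippet above: hash the decoded payload bytes directly, since they are already in hand before being written out; if the temporary file must be re-read instead, the pointer has to be rewound with seek(0) first and the hash built up incrementally rather than re-created per line.

import hashlib
import os

payload = part.get_payload(decode=1)          # raw attachment bytes
print("sha256:", hashlib.sha256(payload).hexdigest())

# Alternatively, writing first and re-reading the temporary file:
with os.fdopen(fd, 'w+b') as tmp:
    tmp.write(payload)
    tmp.seek(0)                               # rewind; writing left the pointer at EOF
    m = hashlib.sha256()
    for chunk in iter(lambda: tmp.read(8192), b''):
        m.update(chunk)                       # one running hash over the whole file
    print("sha256:", m.hexdigest())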
I've got a problem with validating a text file. I need to check whether the parameters I set were saved correctly. The file name is the current date and time, and I need to check whether the parameters that were sent appear in this text (log) file. Below you can find my code.
Arguments are passed with argparse, e.g.:
parser.add_argument("freq", type=int)
from ftplib import FTP
import time

# `inst`, `timestr` and `args` are defined earlier in the script
print('Saving Measurement...')
print(inst.write(':MMEMory:STORe:TRACe 0, "%s"' % timestr))  # Saving file on the inst
time.sleep(1)  # Wait for file to save
print('Downloading file from device...')
ftp = FTP('XX.XX.XXX.XXX')
ftp.login()
# Download the saved file into the directory where the script runs
ftp.retrbinary('RETR %s' % timestr + '.spa', open(timestr + '.spa', 'wb').write)
print('Done, saved as: ' + timestr)
time.sleep(1)

with open(timestr + '.spa') as f:
    if str(args.freq) in f.read():
        print("saved correctly")

ftp.delete(timestr + '.spa')  # Delete file from inst
ftp.quit()
I'm not sure whether this check is reliable. Thank you for your help.
You could use the re module to help you find a date pattern inside your file. Here is a small example that searches for dates in the pattern dd-mm-yyyy:
import re

filepath = 'your-file-path.spa'
regex = r'\d\d-\d\d-\d\d\d\d'

with open(filepath, 'r') as f:
    file = f.read()
    dates_found = re.findall(regex, file)
    # dates_found will be a list of all the dates found in the file
    print(dates_found)
You could use any regex you want as the first argument of re.findall(regex, file)
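Applied to the original question, the same idea can verify the freq argument directly. A minimal sketch, assuming args.freq and timestr from the snippet above:

import re

with open(timestr + '.spa') as f:
    contents = f.read()

# Word boundaries avoid matching e.g. 100 inside 2100
if re.search(r'\b%d\b' % args.freq, contents):
    print("saved correctly")
else:
    print("parameter %d not found in %s.spa" % (args.freq, timestr))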
I have a (large) set of XML files that I want to search for a set of strings, all of which must be present in a file for it to count as a hit. I am trying to use the following Python code to do this:
import collections

thestrings = []
with open('Strings.txt') as f:
    for line in f:
        text = line.strip()
        thestrings.append(text)

print('Searching for:')
print(thestrings)
print('Results:')

try:
    from os import scandir
except ImportError:
    from scandir import scandir

def scantree(path):
    """Recursively yield DirEntry objects for given directory."""
    for entry in scandir(path):
        if entry.is_dir(follow_symlinks=False) and (not entry.name.startswith('.')):
            yield from scantree(entry.path)
        else:
            yield entry

if __name__ == '__main__':
    for entry in scantree('//path/to/folder'):
        if ('.xml' in entry.name) and ('.zip' not in entry.name):
            with open(entry.path) as f:
                data = f.readline()
                if (thestrings[0] in data):
                    print('')
                    print('****** Schema found in: ', entry.name)
                    print('')
                    data = f.read()
                    if (thestrings[1] in data) and (thestrings[2] in data) and (thestrings[3] in data):
                        print('Hit at:', entry.path)
    print("Done!")
Where Strings.txt is a file with the strings I am interested to find, and the first line is the schema URI.
This seems to run OK at first, but after some seconds gives me a:
FileNotFoundError: [WinError 3] The system cannot find the path specified: //some/path
This confuses me, since the path is being built at runtime from directory entries that were just listed, so how can it not be found?
Note that if I instrument the code, changing:

with open(entry.path) as f:
    data = f.readline()
    if (thestrings[0] in data):

to:

with open(entry.path) as f:
    print(entry.name)
    data = f.readline()
    if (thestrings[0] in data):
then I see a number of candidate files being found before the error occurs.
I realised that my script was finding some very long UNC path names, which are apparently too long for Windows, so I am now also checking the path length before attempting to open the file:
if name.endswith('.xml'):
    fullpath = os.path.join(root, name)
    if len(fullpath) > 255:  ## Too long for Windows!
        print('File-extension-based candidate: ', fullpath)
    else:
        if os.path.isfile(fullpath):
            with open(fullpath) as f:
                data = f.readline()
                if (thestrings[0] in data):
                    print('Schema-based candidate: ', fullpath)
Note that I also decided to check that the path really is a file, and I altered my code to use os.walk, as suggested above, along with simplifying the check for a .xml file extension by using .endswith().
Everything now seems to work OK...
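For completeness, a sketch of how that fragment might sit inside an os.walk loop; the folder path and the 255-character limit are taken from the snippets above, and root/dirs/files are the names os.walk itself yields:

import os

for root, dirs, files in os.walk('//path/to/folder'):
    # Skip hidden directories, mirroring the scantree() version above
    dirs[:] = [d for d in dirs if not d.startswith('.')]
    for name in files:
        if name.endswith('.xml'):
            fullpath = os.path.join(root, name)
            if len(fullpath) > 255:  # too long for Windows to open
                print('File-extension-based candidate: ', fullpath)
            elif os.path.isfile(fullpath):
                with open(fullpath) as f:
                    if thestrings[0] in f.readline():
                        print('Schema-based candidate: ', fullpath)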
I want to delete some specific lines in a file. The code below doesn't seem to work: no errors are thrown, but it doesn't delete the lines that are meant to be deleted.
#!/usr/bin/python
import argparse
import re
import string

p = argparse.ArgumentParser()
p.add_argument("input", help="input the data in format ip:port:name", nargs='*')
args = p.parse_args()
kkk_list = args.input  # ['1.1.1.1:443:myname', '2.2.2.2:443:yourname']

def getStringInFormat(ip, port, name):
    formattedText = "HOST Address:{ip}:PORT:{port}\n"\
                    " server tcp\n"\
                    " server {ip}:{port} name {name}\n\n".format(ip=ip,
                                                                 port=port,
                                                                 name=name)
    return formattedText

with open("file.txt", "r+") as f:
    fileContent = f.read()
    # below two lines delete old content of file
    f.seek(0)
    f.truncate()
    for kkk in kkk_list:
        ip, port, name = re.split(":|,", kkk)
        stringNeedsToBeDeleted = getStringInFormat(ip, port, name)
        fileContent = fileContent.replace(stringNeedsToBeDeleted, "")
    f.write(fileContent)
The content of the file I'm trying to delete from looks like the following. Please note the space before the 2nd and 3rd lines:
------------do not delete this line------
HOST Address:1.1.1.1:PORT:443\n"\
 server tcp\n"\
 server 1.1.1.1:443 name myname1
--------------- do not delete this line either
If the script is successful, the file should look like the following, with only one new line in between:
------------do not delete this line------
------------do not delete this line either ----
Any insights?
You are doing everything correctly in your file-editing loop, which means that if nothing is actually being replaced, it's because the string you are looking for doesn't exist in the file. Indeed, when you tell us that you are looking at this content:
------------do not delete this line------
HOST Address:1.1.1.1:PORT:443\n"\
 server tcp\n"\
 server 1.1.1.1:443 name myname1
--------------- do not delete this line either
It doesn't appear to match up with the string you are trying to match it against:

formattedText = "HOST Address:{ip}:PORT:{port}\n"\
                " server tcp\n"\
                " server {ip}:{port} name {name}\n\n"

Keep in mind that in order to replace this string with your current code, the strings have to match exactly. In this case I don't see the \n between "HOST Address..." and " server tcp\n"\, or the trailing \ characters, in your actual file, but I suspect those were just formatting errors on your part.
If you really want to get to the root of this problem, I suggest you find a string you know for certain you are trying to delete, and test your code with that to make sure the strings are the same. Here is an example. If you want to find:
HOST Address:1.1.1.1:PORT:443
server tcp
server 1.1.1.1:443 name myname1
Then compare with your search string via:
# The string posted above; you should probably read it from the file
# itself for consistency.
test_string = ("HOST Address:1.1.1.1:PORT:443\n"
               " server tcp\n"
               " server 1.1.1.1:443 name myname1\n\n")

kkk = '1.1.1.1:443:myname'
ip, port, name = re.split(":|,", kkk)
assert ip == '1.1.1.1'
assert port == '443'
assert name == 'myname'

stringNeedsToBeDeleted = getStringInFormat(ip, port, name)
assert stringNeedsToBeDeleted == test_string, "Error, strings are not equal!"
This should give you a clue to the actual problem: myname1, which I grabbed directly from your example file, doesn't match up with the myname in your search string.
You're opening the file in read mode 'r+'. You need to open it in write mode 'w' to write to it. Or just don't specify a mode.
You can copy the contents of your file to a list, write over the old file, mutate your list, and then write the list to the file.
import argparse
import re
import string

p = argparse.ArgumentParser()
p.add_argument("input", help="input the data in format ip:port:name", nargs='*')
args = p.parse_args()
kkk_list = args.input  # ['1.1.1.1:443:myname', '2.2.2.2:443:yourname']

def getStringInFormat(ip, port, name):
    formattedText = "HOST Address:{ip}:PORT:{port}\n"\
                    " server tcp\n"\
                    " server {ip}:{port} name {name}\n\n".format(ip=ip,
                                                                 port=port,
                                                                 name=name)
    return formattedText

with open("file.txt", "r+") as f:
    fileContent = f.read()
    f.seek(0)        # rewind and clear the old contents before rewriting
    f.truncate()
    for kkk in kkk_list:
        ip, port, name = re.split(":|,", kkk)
        stringNeedsToBeDeleted = getStringInFormat(ip, port, name)
        fileContent = fileContent.replace(stringNeedsToBeDeleted, "")
    f.write(fileContent)  # now the file contains only the lines to keep

f = open("file.txt").readlines()
contents = [i.strip('\n').split() for i in f]

new_file = open('file.txt', 'w')
new_file.write(' '.join(contents[0]))                  # first "do not delete" line
new_file.write('\n\n\n')
new_file.write(' '.join(contents[len(contents) - 1]))  # last "do not delete" line
new_file.close()
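For comparison, here is a compact sketch of the same read-filter-rewrite idea as a single helper; remove_blocks is a hypothetical name, and getStringInFormat is the function from the question:

import re

def remove_blocks(path, kkk_list):
    with open(path) as f:
        content = f.read()
    for kkk in kkk_list:
        ip, port, name = re.split(":|,", kkk)
        # Drop the whole three-line block plus its trailing blank line
        content = content.replace(getStringInFormat(ip, port, name), "")
    with open(path, "w") as f:  # 'w' truncates the file for us
        f.write(content)

remove_blocks("file.txt", ['1.1.1.1:443:myname'])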
Problem Statement:
I have multiple (1000+) *.gz files on a remote server. I have to read these files and check them for certain strings; if the strings match, I have to return the file name. I have tried the following code. The program works, but it does not seem efficient, as there is a huge amount of I/O involved. Can you please suggest a more efficient way to do this?
My Code:
import gzip
import os
import time
import paramiko
import multiprocessing
from bisect import insort

synchObj = multiprocessing.Manager()
hostname = '192.168.1.2'
port = 22
username = 'may'
password = 'Apa$sW0rd'

def miniAnalyze():
    ifile_list = synchObj.list([])  # A synchronized list to store the file names containing the matched string.

    def analyze_the_file(file_single):
        strings = ("error 72", "error 81",)  # Hard-coded strings that need to be searched.
        try:
            ssh = paramiko.SSHClient()
            # Code to FTP the file to the local system from the remote machine.
            # (The elided code is assumed to set up `ftp` and derive `filename`
            # from `file_single`.)
            .....
            ........
            path_f = '/home/user/may/' + filename
            # Read the gzip file on the local system after the FTP is done.
            with gzip.open(path_f, 'rb') as f:
                contents = f.read()
                if any(s in contents for s in strings):
                    print "File " + str(path_f) + " is a hit."
                    insort(ifile_list, filename)  # Push the file name into the list if there is a match.
            os.remove(path_f)
        except Exception, ae:
            print "Error while analyzing file " + str(ae)
        finally:
            if ifile_list:
                print "The error is at " + str(ifile_list)
            ftp.close()
            ssh.close()

    def assign_to_proc():
        # Glob files matching a pattern and pass each to analyze_the_file via multiprocessing.
        apath = '/home/remotemachine/log/'
        apattern = '"*.gz"'
        first_command = 'find {path} -name {pattern}'
        command = first_command.format(path=apath, pattern=apattern)
        try:
            ssh = paramiko.SSHClient()
            ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            ssh.connect(hostname, username=username, password=password)
            stdin, stdout, stderr = ssh.exec_command(command)
            while not stdout.channel.exit_status_ready():
                time.sleep(2)
            filelist = stdout.read().splitlines()
            jobs = []
            for ifle in filelist:
                p = multiprocessing.Process(target=analyze_the_file, args=(ifle,))
                jobs.append(p)
                p.start()
            for job in jobs:
                job.join()
        except Exception, fe:
            print "Error while getting file names " + str(fe)
        finally:
            ssh.close()

    # Assumed entry point: start the search (this call is not shown in the
    # original snippet, but without it nothing runs).
    assign_to_proc()

if __name__ == '__main__':
    miniAnalyze()
The above code is slow: there is a lot of I/O while transferring the .gz files to the local system. Kindly help me find a better way to do it.
Execute a remote OS command such as zgrep, and process the command results locally. This way, you won't have to transfer the whole file contents to your local machine.
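A minimal sketch of that idea, reusing the connection details from the question. The zgrep -l flag (list matching file names only) and the find -exec form are standard; only one search pattern is shown, but an alternation such as 'error 72\|error 81' would cover both strings:

import paramiko

def remote_gz_search(host, user, pwd, pattern, path):
    # Run zgrep on the remote host and return the names of matching .gz
    # files; the file contents never leave the server.
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=user, password=pwd)
    try:
        command = 'find %s -name "*.gz" -exec zgrep -l "%s" {} +' % (path, pattern)
        stdin, stdout, stderr = ssh.exec_command(command)
        return stdout.read().splitlines()
    finally:
        ssh.close()

hits = remote_gz_search('192.168.1.2', 'may', 'Apa$sW0rd', 'error 72', '/home/remotemachine/log/')
print(hits)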