Currently my /etc/hosts file is missing the short hostname (the last column). Is there a way to take the FQDN value in the file, remove '.pdp.wdf.ltd', and add the resulting hostname as the last column?
To get this far I wrote a small Python script that writes the output to a file, but I am unable to get the short hostname added.
#!/usr/bin/env python
import re,subprocess,os,socket
a=subprocess.Popen('ifconfig -a', stdout=subprocess.PIPE, shell=True)
_a, err= a.communicate()
_ou=dict(re.findall(r'^(\S+).*?inet addr:(\S+)', _a, re.S | re.M))
_ou=_ou.values()
_ou.remove('127.0.0.1')
y=[]
for i in _ou:
    _z = '{0} ' .format (i), socket.getfqdn(i)
    y.append(_z)
_y=dict(y)
_z=(' \n'.join('{0} \t {1}'.format(key, val)for (key,val) in _y.iteritems()))
cat /etc/hosts
#IP-Address Full-Qualified-Hostname Short-Hostname
10.68.80.28 dewdfgld00035.pdp.wdf.ltd
10.68.80.45 lddbrdb.pdp.wdf.ltd
10.68.80.46 ldcirdb.pdp.wdf.ltd
10.72.176.28 dewdfgfd00035b.pdp.wdf.ltd
Output needed in the /etc/hosts file
##IP-Address Full-Qualified-Hostname Short-Hostname
10.68.80.28 dewdfgld00035.pdp.wdf.ltd dewdfgld00035
10.68.80.45 lddbrdb.pdp.wdf.ltd lddbrdb
10.68.80.46 ldcirdb.pdp.wdf.ltd ldcirdb
10.72.176.28 dewdfgfd00035b.pdp.wdf.ltd dewdfgfd00035b
You can use the following to match (with global and multiline flags) :
(^[^\s#]+\s+([^.\n]+).*)
And replace with the following:
\1 \2
See RegEX DEMO
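As a rough sketch of how that substitution could be applied from Python: the sample text and the helper name add_short_names are made up for illustration, and a space is placed between the two groups in the replacement so the new column stays separated from the FQDN.

```python
import re

# Sample content shaped like the /etc/hosts excerpt in the question.
HOSTS = """\
#IP-Address Full-Qualified-Hostname Short-Hostname
10.68.80.28 dewdfgld00035.pdp.wdf.ltd
10.68.80.45 lddbrdb.pdp.wdf.ltd
"""

# Group 1 is the whole line, group 2 is the hostname up to the first dot.
# Lines starting with '#' are skipped because the first token may not contain '#'.
PATTERN = re.compile(r'(^[^\s#]+\s+([^.\n]+).*)', re.MULTILINE)

def add_short_names(text):
    """Append the short hostname (hypothetical helper, not from the thread)."""
    return PATTERN.sub(r'\1 \2', text)

print(add_short_names(HOSTS))
```

Note that running this twice would append the short name a second time, and that a line such as "::1 localhost loopback" would also get a name appended, so it is probably best applied only to the generated entries.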
Okies I got it but had to tweak around a bit.
#!/usr/bin/env python
import re,subprocess,os,socket,shutil
header= """#DO NOT EDIT MANUALLY ## File controlled by SaltStack#
# IP-Address Full-Qualified-Hostname Short-Hostname
#
::1 localhost loopback
127.0.0.1 localhost
"""
a=subprocess.Popen('ifconfig -a', stdout=subprocess.PIPE, shell=True)
_a, err= a.communicate()
_ou=dict(re.findall(r'^(\S+).*?inet addr:(\S+)', _a, re.S | re.M))
_ou=_ou.values()
_ou.remove('127.0.0.1')
y=[]
for i in _ou:
    n = socket.getfqdn(i) +'\t'+ (socket.getfqdn(i).split("."))[0]
    _z = '{0} ' .format (i), n
    y.append(_z)
_y=dict(y)
_z=(' \n'.join('{0} \t {1}'.format(key, val)for (key,val) in _y.iteritems()))
_z = header + _z
def make_version_path(path, version):
    if version == 0:
        return path
    else:
        return path + "." + str(version)
def rotate(path,version=0):
    old_path = make_version_path(path, version)
    if not os.path.exists(old_path):
        raise IOError, "'%s' doesn't exist" % path
    new_path = make_version_path(path, version + 1)
    if os.path.exists(new_path):
        rotate(path, version + 1)
    shutil.move(old_path, new_path)
_hosts_path = '/etc/hosts'
shutil.copy (_hosts_path, _hosts_path+'_salt_bak')
rotate(_hosts_path+'_salt_bak')
f = open(_hosts_path, "w")
f.write(_z);
f.close()
The change was made in this part of the code:
y=[]
for i in _ou:
    n = socket.getfqdn(i) +'\t'+ (socket.getfqdn(i).split("."))[0]
    _z = '{0} ' .format (i), n
    y.append(_z)
_y=dict(y)
And it worked as expected.
Is there a more elegant way of comparing these two files?
Right now I am getting the following error message: syntax error near unexpected token (... diff <( tr -d ' '.
result = Popen("diff <( tr -d ' \n' <" + file1 + ") <( tr -d ' \n' <"
               + file2 + ") | wc -l", shell=True, stdout=PIPE).stdout.read()
Python seems to read "\n" as a literal character.
The constructs you are using are interpreted by bash and do not form a standalone statement that you can pass to system() or exec().
<( ${CMD} )
< ${FILE}
${CMD1} | ${CMD2}
As such, you will need to wire up the redirection and pipelines yourself, or call on bash to interpret the line for you (as @wizzwizz4 suggests).
A better solution would be to use something like difflib that will perform this internally to your process rather than calling on system() / fork() / exec().
Using difflib.unified_diff will give you a similar result:
import difflib
def read_file_no_blanks(filename):
    # Yield every line of the file except completely empty ones.
    with open(filename, 'r') as f:
        lines = f.readlines()
    for line in lines:
        if line == '\n':
            continue
        yield line

def count_differences(diff_lines):
    # Count added/removed lines, skipping the '---' / '+++' file headers.
    diff_count = 0
    for line in diff_lines:
        if line[0] not in [ '-', '+' ]:
            continue
        if line[0:3] in [ '---', '+++' ]:
            continue
        diff_count += 1
    return diff_count

a_lines = list(read_file_no_blanks('a'))
b_lines = list(read_file_no_blanks('b'))
diff_lines = difflib.unified_diff(a_lines, b_lines)
diff_count = count_differences(diff_lines)
print('differences: %d' % ( diff_count ))
Even once the syntax error is fixed, this will fail: with shell=True, Popen passes the command to /bin/sh, which is not necessarily bash and may not understand bash-only syntax such as <( ... ).
If you wish to do this in this way, either write a shell script or use the following:
result = Popen(['bash', '-c',
                "diff <( tr -d ' \n' <" + file1 + ") <( tr -d ' \n' <"
                + file2 + ") | wc -l"], stdout=PIPE).stdout.read()
This is not an elegant solution, however, since it is relying on the GNU coreutils and bash. A more elegant solution would be pure Python. You could do this with the difflib module and the re module.
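A minimal pure-Python sketch of that idea, using only the re module: it strips every space and newline from both files (roughly what tr -d ' \n' does in the shell version) and reports whether anything differs. The function names are illustrative, not from the thread, and it returns a yes/no answer rather than a line count.

```python
import re

def strip_blanks(text):
    # Remove every space and newline character from the text.
    return re.sub(r'[ \n]', '', text)

def differs_ignoring_blanks(path_a, path_b):
    # True when the two files differ after spaces and newlines are removed.
    with open(path_a) as fa, open(path_b) as fb:
        return strip_blanks(fa.read()) != strip_blanks(fb.read())
```

Because both files collapse to a single string, a boolean is about as much information as the original diff | wc -l pipeline could give; for a per-line report, difflib would be the better fit.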
I ran into a curious problem while parsing json objects in large text files, and the solution I found doesn't really make much sense. I was working with the following script. It copies bz2 files, unzips them, then parses each line as a json object.
import os, sys, json
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# USER INPUT
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
args = sys.argv
extractDir = outputDir = ""
if (len(args) >= 2):
    extractDir = args[1]
else:
    extractDir = raw_input('Directory to extract from: ')
if (len(args) >= 3):
    outputDir = args[2]
else:
    outputDir = raw_input('Directory to output to: ')
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# RETRIEVE FILE
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
tweetModel = [u'id', u'text', u'lang', u'created_at', u'retweeted', u'retweet_count', u'in_reply_to_user_id', u'coordinates', u'place', u'hashtags', u'in_reply_to_status_id']
filenames = next(os.walk(extractDir))[2]
for file in filenames:
    if file[-4:] != ".bz2":
        continue
    os.system("cp " + extractDir + '/' + file + ' ' + outputDir)
    os.system("bunzip2 " + outputDir + '/' + file)
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # PARSE DATA
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    input = open (outputDir + '/' + file[:-4], 'r')
    output = open (outputDir + '/p_' + file[:-4], 'w+')
    for line in input.readlines():
        try:
            tweet = json.loads(line)
            for field in enumerate(tweetModel):
                if tweet.has_key(field[1]) and tweet[field[1]] != None:
                    if field[0] != 0:
                        output.write('\t')
                    fieldData = tweet[field[1]]
                    if not isinstance(fieldData, unicode):
                        fieldData = unicode(str(fieldData), "utf-8")
                    output.write(fieldData.encode('utf8'))
                else:
                    output.write('\t')
        except ValueError as e:
            print ("Parse Error: " + str(e))
            print line
            line = input.readline()
            quit()
            continue
        print "Success! " + str(len(line))
        input.flush()
        output.write('\n')
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # REMOVE OLD FILE
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    os.system("rm " + outputDir + '/' + file[:-4])
While reading in certain lines in the for line in input.readlines(): loop, the lines would occasionally be truncated at inconsistent locations. Since the newline character was truncated as well, it would keep reading until it found the newline character at the end of the next json object. The result was an incomplete json object followed by a complete json object, all considered one line by the parser. I could not find the reason for this issue, but I did find that changing the loop to
filedata = input.read()
for line in filedata.splitlines():
worked. Does anyone know what is going on here?
After looking at the source code for file.readlines and string.splitlines, I think I see what's up. Note: this is Python 2.7 source code, so if you're using another version this answer may or may not apply.
readlines uses the function Py_UniversalNewlineFread to test for a newline, whereas splitlines uses a constant STRINGLIB_ISLINEBREAK that just tests for \n or \r. I suspect Py_UniversalNewlineFread is picking up some character in the file stream as a line break when it is not intended as one; it could come from the encoding, I don't know. But when you dump the same data into a string, splitlines checks each character against \r and \n, finds no match, and moves on until the real line break is encountered, so you get your intended line.
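The general point, that file-style line reading and splitlines need not agree on what a line break is, can be seen in a small Python 3 experiment. This is only an illustration under Python 3 semantics (where str.splitlines recognises extra boundaries such as form feed); it does not confirm the Python 2.7 explanation above, and the sample string is invented.

```python
import io

# A form feed (\x0c) sits inside the first logical line.
data = 'first\x0cstill first\nsecond\n'

# File-like reading splits only on the newline character here.
by_file = io.StringIO(data).readlines()

# str.splitlines also treats the form feed as a line boundary.
by_split = data.splitlines()

print(by_file)
print(by_split)
```

When line-oriented parsing of the same data gives different results through two APIs, comparing their output on a small sample like this is a quick way to find the character responsible.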
So far the program takes all the text files in a directory and writes output to a file with the same name but a .out extension. If a file contains an IP address range such as 192.168.1.1-192.168.1.7, I want it to write all of the IPs in that range to the new file.
#!/usr/bin/env python
import sys
import re
import os
try:
    if file.endswith (".txt"):
        f=open(file, 'r')
    try:
        file = open(f, "r")
        ips = []
        for text in file.readlines():
            regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
            if regex is not None and regex not in ips:
                ips.append(regex)
        for ip in ips:
            #name of the file
            name = os.path.splitext() [0]
            name = name +".out"
            outfile = open(name, 'w')
            spider = "".join(ip)
            if spider is not '':
                outfile.write(spider)
                outfile.write("\n")
    finally:
        file.close()
        outfile.close()
except IOError, (errno, strerror):
    print "I/O Error(%s) : %s" % (errno, strerror)
If you're using Python 3.3+, use the ipaddress module:
>>> from ipaddress import IPv4Network
>>> for addr in IPv4Network('192.0.2.0/28'):
...     addr
...
IPv4Address('192.0.2.0')
IPv4Address('192.0.2.1')
IPv4Address('192.0.2.2')
IPv4Address('192.0.2.3')
IPv4Address('192.0.2.4')
IPv4Address('192.0.2.5')
IPv4Address('192.0.2.6')
...
IPv4Address('192.0.2.15')
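The example above iterates over a CIDR network, while the question uses a start-end range. One possible way to expand such a range with the same module is sketched below; expand_range is a hypothetical helper name, and the input is assumed to be exactly two valid IPv4 addresses separated by a hyphen.

```python
from ipaddress import IPv4Address

def expand_range(text):
    """Return every address from 'start-end' inclusive, as dotted strings."""
    start_text, end_text = text.split('-')
    # IPv4Address converts to and from plain integers, so range() does the counting.
    start = int(IPv4Address(start_text.strip()))
    end = int(IPv4Address(end_text.strip()))
    return [str(IPv4Address(n)) for n in range(start, end + 1)]

for address in expand_range('192.168.1.1-192.168.1.7'):
    print(address)
```

IPv4Address raises a ValueError for malformed input, which gives some basic validation for free.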
Probably the easiest way is to parse the ip into an integer - if you want to roll your own, that is.
def ip_to_int(ipstr):
    # Pack the four octets into one 32-bit integer.
    b = [ int(p) for p in ipstr.split('.') ]
    return (b[0] << 24) + (b[1] << 16) + (b[2] << 8) + b[3]

def int_to_ip(i):
    # Peel off the octets from least to most significant, then reverse them.
    octets = []
    for n in range(4):
        octets.append(i & 0xff)
        i >>= 8
    return ".".join([ str(octet) for octet in octets[::-1]])

def iprange(ipstring):
    # Expand 'start-end' into every address in between, inclusive.
    ipstart, ipend = ipstring.split('-')
    return [ int_to_ip(ipint) for ipint in range(ip_to_int(ipstart), ip_to_int(ipend)+1) ]
Note that this does not understand anything about valid addresses (such as network addresses), nor does it do any error checking.
I created a Python script to parse mail (exim) log files and do pattern matching in order to get a top-100 list of the domains most often sent to on my SMTP servers.
However, every time I execute the script I get a different count.
These are stale log files, and I cannot find a functional flaw in my code.
Example output:
1:
70353 gmail.com
68337 hotmail.com
53657 yahoo.com
2:
70020 gmail.com
67741 hotmail.com
54397 yahoo.com
3:
70191 gmail.com
67917 hotmail.com
54438 yahoo.com
Code:
#!/usr/bin/env python
import os
import datetime
import re
from collections import defaultdict
class DomainCounter(object):
    def __init__(self):
        self.base_path = '/opt/mail_log'
        self.tmp = []
        self.date = datetime.date.today() - datetime.timedelta(days=14)
        self.file_out = '/var/tmp/parsed_exim_files-' + str(self.date.strftime('%Y%m%d')) + '.decompressed'
    def parse_log_files(self):
        sub_dir = os.listdir(self.base_path)
        for directory in sub_dir:
            if re.search('smtp\d+', directory):
                fileInput = self.base_path + '/' + directory + '/maillog-' + str(self.date.strftime('%Y%m%d')) + '.bz2'
                if not os.path.isfile(self.file_out):
                    os.popen('touch ' + self.file_out)
                proccessFiles = os.popen('/bin/bunzip2 -cd ' + fileInput + ' > ' + self.file_out)
                accessFileHandle = open(self.file_out, 'r')
                readFileHandle = accessFileHandle.readlines()
                print "Proccessing %s." % fileInput
                for line in readFileHandle:
                    if '<=' in line and ' for ' in line and '<>' not in line:
                        distinctLine = line.split(' for ')
                        recipientAddresses = distinctLine[1].strip()
                        recipientAddressList = recipientAddresses.strip().split(' ')
                        if len(recipientAddressList) > 1:
                            for emailaddress in recipientAddressList:
                                # Since syslog messages are transmitted over UDP some messages are dropped and needs to be filtered out.
                                if '@' in emailaddress:
                                    (login, domein) = emailaddress.split("@")
                                    self.tmp.append(domein)
                                    continue
                        else:
                            try:
                                (login, domein) = recipientAddressList[0].split("@")
                                self.tmp.append(domein)
                            except Exception as e:
                                print e, '<<No valid email address found, skipping line>>'
                accessFileHandle.close()
                os.unlink(self.file_out)
        return self.tmp
if __name__ == '__main__':
    domainCounter = DomainCounter()
    result = domainCounter.parse_log_files()
    domainCounts = defaultdict(int)
    top = 100
    for domain in result:
        domainCounts[domain] += 1
    sortedDict = dict(sorted(domainCounts.items(), key=lambda x: x[1], reverse=True)[:int(top)])
    for w in sorted(sortedDict, key=sortedDict.get, reverse=True):
        print '%-3s %s' % (sortedDict[w], w)
proccessFiles = os.popen('/bin/bunzip2 -cd ' + fileInput + ' > ' + self.file_out)
This line is non-blocking: it starts the command and returns immediately, so the next few lines are already reading the file while bunzip2 may still be writing it. This is basically a concurrency issue (a race condition), which is why the counts change between runs. Try to wait for the command to complete before reading the file.
Also see "Python popen command. Wait until the command is finished", since os.popen has been deprecated since Python 2.6 (depending on which version you are using).
Sidenote - The same happens to the line below. The file may, or may not, exist after executing the following line:
os.popen('touch ' + self.file_out)
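One way to make the decompression step blocking is to run it through subprocess and hand it an open file for its output, as in this sketch. The helper name run_to_file is hypothetical; subprocess.check_call returns only after the child process has exited and raises CalledProcessError on a non-zero exit status.

```python
import subprocess

def run_to_file(cmd, out_path):
    """Run cmd (a list of arguments), writing its stdout to out_path.

    Returns only after the command has finished, so the file is complete
    by the time the caller opens it for reading.
    """
    with open(out_path, 'wb') as out:
        subprocess.check_call(cmd, stdout=out)

# Possible use in the script above (paths as in the question):
# run_to_file(['/bin/bunzip2', '-cd', fileInput], self.file_out)
```

Opening the output file in Python also creates it, so the separate touch call would no longer be needed with this approach.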
I am trying to generate transparent background images with a Python script run from the command line, but I am having a hard time passing all the arguments to subprocess.Popen so that ImageMagick's convert doesn't throw errors.
Here is my code:
# Import modules
import os
import subprocess as sp
# Define useful variables
fileList = os.listdir('.')
fileList.remove(currentScriptName)
# Interpret return code
def interpretReturnCode(returnCode) :
    return 'OK' if returnCode == 0 else 'ERROR, check the script'
# Create background images
def createDirectoryAndBackgroundImage() :
    # Ask if numbers-height or numbers-width before creating the directory
    numbersDirectoryType = raw_input('Numbers directory: type "h" for "numbers-height" or "w" for "numbers-width": ')
    if numbersDirectoryType == 'h' :
        # Create 'numbers-height' directory
        numbersDirectoryName = 'numbers-height'
        numbersDirectory = interpretReturnCode(sp.call(['mkdir', numbersDirectoryName]))
        print '%s%s' % ('Create "numbers-height" directory...', numbersDirectory)
        # Create background images
        startNumber = int(raw_input('First number for the background images: '))
        endNumber = (startNumber + len(fileList) + 1)
        for x in range(startNumber, endNumber) :
            createNum = []
            print 'createNum just after reset and before adding things to it: ', createNum, '\n'
            print 'start' , x, '\n'
            createNum = 'convert -size 143x263 xc:transparent -font "FreeSans-Bold" -pointsize 22 -fill \'#242325\' "text 105,258'.split()
            createNum.append('\'' + str(x) + '\'"')
            createNum.append('-draw')
            createNum.append('./' + numbersDirectoryName + '/' + str(x) + '.png')
            print 'createNum set up, createNum submittet to subprocess.Popen: ', createNum
            createNumImage = sp.Popen(createNum, stdout=sp.PIPE)
            createNumImage.wait()
            creationNumReturnCode = interpretReturnCode(createNumImage.returncode)
            print '%s%s%s' % ('\tCreate numbers image...', creationNumReturnCode, '\n')
    elif numbersDirectoryType == 'w' :
        numbersDirectoryName = 'numbers-width'
        numbersDirectory = interpretReturnCode(sp.call(['mkdir', numbersDirectoryName]))
        print '%s%s' % ('Create "numbers-width" directory...', numbersDirectory)
        # Create background images
        startNumber = int(raw_input('First number for the background images: '))
        endNumber = (startNumber + len(fileList) + 1)
        for x in range(startNumber, endNumber) :
            createNum = []
            print 'createNum just after reset and before adding things to it: ', createNum, '\n'
            print 'start' , x, '\n'
            createNum = 'convert -size 224x122 xc:transparent -font "FreeSans-Bold" -pointsize 22-fill \'#242325\' "text 105,258'.split()
            createNum.append('\'' + str(x) + '\'"')
            createNum.append('-draw')
            createNum.append('./' + numbersDirectoryName + '/' + str(x) + '.png')
            print 'createNum set up, createNum submittet to subprocess.Popen: ', createNum
            createNumImage = sp.Popen(createNum, stdout=sp.PIPE)
            createNumImage.wait()
            creationNumReturnCode = interpretReturnCode(createNumImage.returncode)
            print '%s%s%s' % ('\tCreate numbers image...', creationNumReturnCode, '\n')
    else :
        print 'No such directory type, please start again'
        numbersDirectoryType = raw_input('Numbers directory: type "h" for "numbers-height" or "w" for "numbers-width": ')
For this I get the following errors, for each picture:
convert.im6: unable to open image `'#242325'': No such file or directory @ error/blob.c/OpenBlob/2638.
convert.im6: no decode delegate for this image format `'#242325'' @ error/constitute.c/ReadImage/544.
convert.im6: unable to open image `"text': No such file or directory @ error/blob.c/OpenBlob/2638.
convert.im6: no decode delegate for this image format `"text' @ error/constitute.c/ReadImage/544.
convert.im6: unable to open image `105,258': No such file or directory @ error/blob.c/OpenBlob/2638.
convert.im6: no decode delegate for this image format `105,258' @ error/constitute.c/ReadImage/544.
convert.im6: unable to open image `'152'"': No such file or directory @ error/blob.c/OpenBlob/2638.
convert.im6: no decode delegate for this image format `'152'"' @ error/constitute.c/ReadImage/544.
convert.im6: option requires an argument `-draw' @ error/convert.c/ConvertImageCommand/1294.
I tried to change the order of the arguments without success, and to use shell=True in Popen, but then the function interpretReturnCode returns OK while no image is created (the numbers-height folder is empty).
I would strongly recommend following this process:
1. Pick a single file and directory.
2. Change the above so that sp.Popen is replaced by a print statement.
3. Run the modified script from the command line.
4. Try using the printed command output from the command line.
5. Modify the command line until it works.
6. Modify the script until it produces exactly the same command line.
7. Change the print back to sp.Popen.
Then, if you still have a problem, try modifying your command string to start with echo convert so that you can see what, if anything, is happening to the parameters during the processing by sp.Popen.
There is also this handy hint from the Python documentation:
>>> import shlex, subprocess
>>> command_line = raw_input()
/bin/vikings -input eggs.txt -output "spam spam.txt" -cmd "echo '$MONEY'"
>>> args = shlex.split(command_line)
>>> print args
['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"]
>>> p = subprocess.Popen(args) # Success!
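Applied to the convert call from the question, the same idea might look like the sketch below. It assumes the intended command is the one in the string (with -draw placed before its text argument); build_convert_args is a hypothetical helper, and nothing here actually runs ImageMagick.

```python
import shlex

def build_convert_args(number, directory, size='143x263'):
    # Each list item reaches convert as exactly one argument, so no shell
    # quoting is needed around the colour or the -draw primitive.
    return [
        'convert', '-size', size, 'xc:transparent',
        '-font', 'FreeSans-Bold', '-pointsize', '22',
        '-fill', '#242325',
        '-draw', "text 105,258 '%s'" % number,
        './%s/%s.png' % (directory, number),
    ]

# The same command written the way it would be typed in a shell.
command_line = ("convert -size 143x263 xc:transparent -font \"FreeSans-Bold\" "
                "-pointsize 22 -fill '#242325' "
                "-draw \"text 105,258 '152'\" ./numbers-height/152.png")

# shlex.split honours the quotes, unlike str.split, and yields the same list.
args = shlex.split(command_line)
print(args == build_convert_args(152, 'numbers-height'))
```

Either list can then be passed to sp.Popen without shell=True. The plain str.split in the question is what turned '#242325' and "text into separate, still-quoted arguments that convert tried to open as images.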