How to parse a concatenated zone file - python

I have a large zone file that I want to move to another provider. My issue is that the export is one large concatenated zone file, whereas my new registrar only accepts single standard zone files.
For example allzone.txt contains:
somedomain.com
=========
Record data...
...
------------
anotherdomain.com
=========
Record data...
...
------------
evenmoredomain.com
=========
Record data...
...
------------
What I'd like to happen is that it takes the one file above and creates 3 files.
somedomain.com.txt
anotherdomain.com.txt
evenmoredomain.com.txt
Inside each of the files, the delimiters:
anydomain.com
=========
and
------------
are removed, leaving only
"Record data"
in between.
So a file should be named domainA.com.txt and contain just the corresponding record data.
I'm not sure what the best way to do this is. I can split on a delimiter, but I'm not sure how to write that content to a new file whose name is what comes before the delimiter (anydomain.com).
Thanks!

More or less
current_file = None
with open('allzone.txt') as f:
    # read line by line
    for line in f:
        # open next file and close previous
        if line.startswith('domain'):
            # close previous file
            if current_file:
                current_file.close()
            # open new file
            current_file = open(line.strip() + '.txt', 'w')
        # write to current file
        if current_file:
            if not (line.startswith('domain') or line.startswith('---') or line.startswith('===')):
                current_file.write(line)
# close last file
if current_file:
    current_file.close()
EDIT: new version for any domain
current_file = None
with open('allzone.txt') as f:
    # read line by line
    for line in f:
        # open next file
        if not current_file:
            # open new file
            current_file = open(line.strip() + '.txt', 'w')
            # skip next line
            next(f)
        else:
            # close previous file
            if line.startswith('---'):
                current_file.close()
                current_file = None
            # write records
            #elif not line.startswith('==='): # use it if you don't use `next(f)`
            else:
                current_file.write(line)
# close last file
if current_file:
    current_file.close()

Maybe something like this would work? It might still need some tweaking
def main():
    with open('allzone.txt', 'r+') as f:
        data = ''
        first_line = ''
        for line in f:
            if first_line == '':
                first_line = line
            elif line == '------------\n':
                new_file = open('%s.txt' % first_line.rstrip(), 'w+')
                new_file.write(data)
                new_file.close()
                first_line = ''
                data = ''
            elif line == '=========\n' or line == '...\n' or line == '------------\n':
                pass
            else:
                data += line
if __name__ == '__main__':
    main()
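The same idea can also be written as a small function that keeps the parsing separate from the file writing, which makes it easier to check on a short sample first. This is only a sketch: the names split_zones and write_zone_files are made up for illustration, and it assumes every block is laid out exactly as in the question (a domain line, a line made only of '=', the record lines, then a line made only of '-').

```python
from pathlib import Path


def split_zones(text):
    """Return a dict mapping each domain name to its list of record lines.

    Assumes the layout from the question: a domain line, a line of '=',
    the record lines, then a line of '-' closing the block.
    """
    zones = {}
    domain = None
    for line in text.splitlines():
        stripped = line.strip()
        if domain is None:
            # Outside a block: the next non-empty line is a domain name.
            if stripped:
                domain = stripped
                zones[domain] = []
        elif set(stripped) == {"="}:
            # Underline below the domain name, not part of the records.
            continue
        elif set(stripped) == {"-"}:
            # End of the current block.
            domain = None
        else:
            zones[domain].append(line)
    return zones


def write_zone_files(text, out_dir="."):
    """Write one '<domain>.txt' file per block into out_dir (hypothetical helper)."""
    for domain, records in split_zones(text).items():
        content = "".join(record + "\n" for record in records)
        Path(out_dir, domain + ".txt").write_text(content)
```

To use it on the real export, something like write_zone_files(open('allzone.txt').read()) should do, provided the file fits in memory.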

Related

Opening file vs reading file in python

I am attempting to print the file, split by line using two methods: one is using the method read on files and the second is using a for loop and splitting the files into lines. I am getting a Traceback error on the last line stating that "words" is not defined. I cannot see why this is the case.
fname = input('enter file name')
try:
    fhandle = open(fname, 'r')
except:
    print('file does not exist')
    exit()
#store entire file in a variable called data
data = fhandle.read()
print(data)
#iterate through each line in a file handle
for line in fhandle:
    line = line.strip()
    words = line.split()
print(words)
When reading a file, Python keeps track of a cursor within the file. Data is read from the position of the cursor onwards, and reading moves the cursor forward to the end of the data that was read. This is so that, e.g., calling f.readline() twice will return the next line each time, rather than the first line both times.
When you call f.read(), the whole file is read, so the cursor is moved to the end of the file. Then, when you iterate through fhandle, Python only considers the lines ahead of the cursor — of which there are none. Since the object being iterated through is empty, the body of the for loop is never executed, so words is never assigned to.
You can fix this by calling fhandle.seek(0) directly before the for loop to return the cursor to the start of the file.
There is also a logical error in your program. If you want to print every line, not just the last, in your for loop, you need to indent print(words) so that it's in the for loop.
As a best practice, you should also call fhandle.close() when you're finished using the file.
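To see the cursor behaviour without needing a file on disk, here is a minimal sketch. The function name read_twice is made up, and io.StringIO stands in for the handle returned by open(fname, 'r'); a real file handle behaves the same way for read(), seek(0) and iteration.

```python
import io


def read_twice(fhandle):
    """Read the whole file, rewind, then split each line into words."""
    data = fhandle.read()   # the cursor is now at the end of the file
    fhandle.seek(0)         # move the cursor back to the start
    words_per_line = []
    for line in fhandle:    # without seek(0) this loop would never run
        words_per_line.append(line.strip().split())
    return data, words_per_line


# In-memory stand-in for a two-line text file.
demo = io.StringIO("Line 1\nLine 2\n")
data, words_per_line = read_twice(demo)
```

With a real file you would wrap the call in a with statement, for example with open(fname, 'r') as fhandle: ..., so the file is closed automatically.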
words is not defined because of read(): once it has run, the for loop has nothing left to iterate over.
Python file method read() reads at most size bytes from the file. If
the read hits EOF before obtaining size bytes, then it reads only
available bytes.
When print(words) is indented inside the for loop, it prints nothing either. But if read() is removed while print(words) isn't indented, it prints a list built from the last line:
fname = input('enter file name')
try:
    fhandle = open(fname, 'r')
except:
    print('file does not exist')
    exit()
# store entire file in a variable called data
# data = fhandle.read()
# print(data)
# iterate through each line in a file handle
for line in fhandle:
    line = line.strip()
    words = line.split()
print(words)
# ['Line', '4']
And if print(words) is indented while read() is removed, it prints this:
fname = input('enter file name')
try:
    fhandle = open(fname, 'r')
except:
    print('file does not exist')
    exit()
# store entire file in a variable called data
# data = fhandle.read()
# print(data)
# iterate through each line in a file handle
for line in fhandle:
    line = line.strip()
    words = line.split()
    print(words)
# ['Line', '1']
# ['Line', '2']
# ['Line', '3']
# ['Line', '4']
I'm not sure what your intent is in using split(), but if you just want to print line by line using read(), your code already does that.
When using a for loop, just comment out or remove read() and then print each line:
fname = input('enter file name')
try:
    fhandle = open(fname, 'r')
except:
    print('file does not exist')
    exit()
# store entire file in a variable called data
# data = fhandle.read()
# print(data)
# iterate through each line in a file handle
for line in fhandle:
    print(line.strip())
# Line 1
# Line 2
# Line 3
# Line 4
But if you intend to make a list consisting of each line, you can use splitlines():
fname = input('enter file name')
try:
    fhandle = open(fname, 'r')
except:
    print('file does not exist')
    exit()
#store entire file in a variable called data
data = fhandle.read().splitlines()
print(data)
# ['Line 1', 'Line 2', 'Line 3', 'Line 4']
Hope this helps.

Copy only the newly added lines of a file to another file on every iteration

I have a file that is opened in append mode, so its number of lines grows over time. After every iteration I want to copy only the newly added lines to other files. This code copies the new lines together with the old ones:
def get_qatcher_log_file(source_file):
    """Get Qatcher Log File"""
    lastLine = None
    with open(source_file,'r') as f:
        file_name=source_file.rsplit('.')[0]
        filename=file_name +"_" + datetime.datetime.now().strftime("%Y-%m-%d_%Hh%Mm%S.%fs") + ".log"
        while True:
            line = f.readline()
            if not line:
                break
            lastLine = line
    while True:
        with open(source_file,'r') as f:
            lines = f.readlines()
        if lines[-1] != lastLine:
            data=lines[len(lastLine):]
        else:
            if lines[-1] == lastLine:
                data=lines
        print("line",data)
        with open(filename,"a") as f_destination:
            f_destination.writelines(data)
            f_destination.close()
        print("line",data)
        print("lastLine",lastLine)
        return filename
I have modified my code to this new version, but it still copies the whole file, not only the lines appended at the end. Does anyone have an idea how I can copy only the new lines added at the end of the same file? Thank you in advance.
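One possible approach, offered as a sketch rather than a drop-in fix: instead of comparing lines, remember the byte offset where the previous read stopped and start the next read from there. The function name copy_new_lines and its parameters are made up for illustration. It assumes the source file is only ever appended to (never truncated or rotated) and that the writer adds whole lines at a time.

```python
def copy_new_lines(source_file, dest_file, offset=0):
    """Append everything in source_file past byte `offset` to dest_file.

    Returns the new offset, which should be passed in on the next call.
    """
    # Binary mode keeps offsets in plain bytes, so seek() and tell() agree.
    with open(source_file, "rb") as src:
        src.seek(offset)
        new_data = src.read()
        new_offset = src.tell()
    if new_data:
        with open(dest_file, "ab") as dst:
            dst.write(new_data)
    return new_offset
```

In a polling loop you would keep the returned value between iterations, e.g. offset = copy_new_lines(source_file, filename, offset), with a time.sleep() between calls. If each iteration should go to a different destination file, pass a new dest_file name each time while keeping the same offset.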

How to change a value in a line of a config file in Python

file = open("my_config", "r")
for line in file:
    change line to "new_line"
How can I change the value of a parameter in a line?
Just to be clear, you open a config file (may we have an example of its structure? If it is a JSON file or similar, it could be easier), loop through all of its lines, and want to change one line?
The simplest way is to read the file, build the new contents in a string, and then write that string back.
# Read everything first: opening with "w" right away would empty the file
with open("my_config", "r") as file:
    lines = file.read().splitlines()
str_file = ""
for line in lines:
    # Change the line here
    str_file += line + '\n'
str_file = str_file.strip()  # To remove the last \n
with open("my_config", "w") as file:
    file.write(str_file)
EDIT: based on your comment, I'd go with:
# Read everything first: opening with "w" right away would empty the file
with open("my_config", "r") as file:
    lines = file.read().splitlines()
str_file = ""
for line in lines:
    if line.split(':')[0] == 'SECURITY_LEVEL':
        line = 'SECURITY_LEVEL:' + VALUE  # your new value here (a string)
    str_file += line + '\n'
str_file = str_file.strip()  # To remove the last \n
with open("my_config", "w") as file:
    file.write(str_file)
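If this needs to be done for more than one parameter, the read-then-rewrite idea can be wrapped in a function. This is a sketch under assumptions: the name set_config_value is made up, and it assumes the config consists of KEY:VALUE lines like the SECURITY_LEVEL example. Unlike the snippet above, it leaves a trailing newline at the end of the file.

```python
def set_config_value(path, key, value, sep=":"):
    """Replace the value on every line whose key matches `key`.

    Returns True if at least one line was changed.
    """
    # Read the whole file before opening it for writing,
    # because mode "w" truncates the file immediately.
    with open(path, "r") as f:
        lines = f.read().splitlines()
    changed = False
    for i, line in enumerate(lines):
        if line.split(sep, 1)[0].strip() == key:
            lines[i] = key + sep + str(value)
            changed = True
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return changed
```

For example, set_config_value("my_config", "SECURITY_LEVEL", "high") would rewrite only the SECURITY_LEVEL line and leave the others as they were.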

hadoop filesystem open file and skip first line

I'm reading files in my HDFS using Python.
Each file has a header, and I'm trying to merge the files. However, the header of each file also gets merged.
Is there a way to skip the header from the second file onward?
hadoop = sc._jvm.org.apache.hadoop
conf = hadoop.conf.Configuration()
fs = hadoop.fs.FileSystem.get(conf)
src_dir = "/mnt/test/"
out_stream = fs.create(hadoop.fs.Path(dst_file), overwrite)
files = []
for f in fs.listStatus(hadoop.fs.Path(src_dir)):
    if f.isFile():
        files.append(f.getPath())
for file in files:
    in_stream = fs.open(file)
    hadoop.io.IOUtils.copyBytes(in_stream, out_stream, conf, False)
Currently I have solved the problem with the logic below, but I would like to know if there is a better and more efficient solution. I appreciate your help.
for idx,file in enumerate(files):
    if debug:
        print("Appending file {} into {}".format(file, dst_file))
    # remove header from the second file
    if idx>0:
        file_str = ""
        with open('/'+str(file).replace(':',''),'r+') as f:
            for idx,line in enumerate(f):
                if idx>0:
                    file_str = file_str + line
        with open('/'+str(file).replace(':',''), "w+") as f:
            f.write(file_str)
    in_stream = fs.open(file) # InputStream object and copy the stream
    try:
        hadoop.io.IOUtils.copyBytes(in_stream, out_stream, conf, False) # False means don't close out_stream
    finally:
        in_stream.close()
What you are doing now is appending repeatedly to a string. This is a fairly slow process. Why not write directly to the output file as you are reading?
for file_idx, file in enumerate(files):
    with open(...) as f_out, open(...) as f_in:
        for line_num, line in enumerate(f_in):
            if file_idx == 0 or line_num > 0:
                f_out.write(line)
If you can load the file all at once, you can also skip the first line by using readline followed by readlines:
for file_idx, file in enumerate(files):
    with open(...) as f_out, open(...) as f_in:
        if file_idx != 0:
            f_in.readline()
        f_out.writelines(f_in.readlines())
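A variation that neither builds a string nor loads a whole file is to drop the first line with itertools.islice. The sketch below assumes ordinary Python text file objects (for example files reachable through a local mount), not the Hadoop stream objects from the question, and assumes every file ends with a newline. The function name merge_skipping_headers is made up for illustration.

```python
import io
from itertools import islice


def merge_skipping_headers(in_streams, out_stream):
    """Copy every stream to out_stream, keeping only the first stream's header."""
    for idx, in_stream in enumerate(in_streams):
        # islice(stream, 1, None) yields all lines except the first,
        # reading lazily instead of loading the file into memory.
        lines = in_stream if idx == 0 else islice(in_stream, 1, None)
        out_stream.writelines(lines)


# In-memory stand-ins for two files that share the header "id,name".
merged = io.StringIO()
merge_skipping_headers(
    [io.StringIO("id,name\n1,a\n"), io.StringIO("id,name\n2,b\n")],
    merged,
)
```

With real files, the streams would come from open() calls and the output stream would be opened once, outside the loop, so it is not reopened for every input file.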

Using python to read txt files and answer questions

a01:01-24-2011:s1
a03:01-24-2011:s2
a02:01-24-2011:s2
a03:02-02-2011:s2
a03:03-02-2011:s1
a02:04-19-2011:s2
a01:05-14-2011:s2
a02:06-11-2011:s2
a03:07-12-2011:s1
a01:08-19-2011:s1
a03:09-19-2011:s1
a03:10-19-2011:s2
a03:11-19-2011:s1
a03:12-19-2011:s2
I have this list of data as a txt file, where each line has the format animal name : date : location.
I have to read this txt file to answer questions.
So far I have:
text_file=open("animal data.txt", "r") #opens the text file for reading
I know how to read one line, but since there are multiple lines here, I'm not sure how I can read every line in the txt file.
Use a for loop.
text_file = open("animal data.txt","r")
for line in text_file:
    line = line.split(":")
    #Code for what you want to do with each element in the line
text_file.close()
Since you know the format of this file, you can shorten it even more over the other answers:
with open('animal data.txt', 'r') as f:
    for line in f:
        animal_name, date, location = line.strip().split(':')
        # You now have three variables (animal_name, date, and location)
        # This loop will happen once for each line of the file
        # For example, the first time through will have data like:
        #     animal_name == 'a01'
        #     date == '01-24-2011'
        #     location == 's1'
Or, if you want to keep a database of the information you get from the file to answer your questions, you can do something like this:
animal_names, dates, locations = [], [], []
with open('animal data.txt', 'r') as f:
    for line in f:
        animal_name, date, location = line.strip().split(':')
        animal_names.append(animal_name)
        dates.append(date)
        locations.append(location)
# Here, you have access to the three lists of data from the file
# For example:
#     animal_names[0] == 'a01'
#     dates[0] == '01-24-2011'
#     locations[0] == 's1'
You can use a with statement to open the file, so that it is closed properly even if an error occurs while reading.
>>> with open('data.txt', 'r') as f_in:
...     for line in f_in:
...         line = line.strip() # remove all whitespace at start and end
...         field = line.split(':')
...         # field[0] = animal name
...         # field[1] = date
...         # field[2] = location
You are not closing the file. It is better to use the with statement to ensure the file gets closed.
with open("animal data.txt","r") as file:
    for line in file:
        line = line.split(":")
        # Code for what you want to do with each element in the line
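Once every line is split into its three fields, answering questions is mostly a matter of counting or filtering. The question does not say which questions need answering, so "how many records does each animal have" is used below purely as an assumed example. The function names parse_records and visits_per_animal are made up for illustration.

```python
from collections import Counter


def parse_records(lines):
    """Turn 'animal:date:location' lines into (animal, date, location) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            animal_name, date, location = line.split(":")
            records.append((animal_name, date, location))
    return records


def visits_per_animal(records):
    """Count how many records each animal has."""
    return Counter(animal for animal, _date, _location in records)
```

A file object can be passed straight in, for example with open('animal data.txt') as f: records = parse_records(f), and visits_per_animal(records)['a03'] then gives the number of lines recorded for a03.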
