I have the following example text file (in the format shown below). I want to extract everything between the lines "Generating configuration...." and "`show accounting log all`"; these mark the beginning and end of the section I am interested in.
some lines
some more line
Generating configuration....
interested config
interested config
interested config
`show accounting log all`
some lines
some more line
I wrote the following code, but it does not stop appending lines to textfile after it has found `show accounting log all`.
config_found = False
with open(filename, 'rb') as f:
    textfile_temp = f.readlines()
for line in textfile_temp:
    if re.match("Generating configuration....", line):
        config_found = True
    if re.match("`show accounting log all`", line):
        config_found = False
    if config_found:
        i = line.rstrip()
        textfile.append(i)
What am I doing wrong with my statements?
Instead of single quotes, you have to use back quotes in your comparison, and you can use if and elif to extract the in-between strings. I have modified it as below and it's working:
with open('file.txt', 'rb') as f:
    textfile_temp = f.readlines()

config_found = False
textfile = []
for line in textfile_temp:
    if re.match("`show accounting log all`", line):
        config_found = False
    elif config_found:
        i = line.rstrip()
        textfile.append(i)
    elif re.match("Generating configuration....", line):
        config_found = True

print textfile
Output:
['interested config', 'interested config', 'interested config']
Instead, you can use split as below:
with open('file.txt', 'rb') as f:
    textfile_temp = f.read()

print textfile_temp.split('Generating configuration....')[1].split("`show accounting log all`")[0]
Output:
interested config
interested config
interested config
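For what it's worth, a single regular expression can do the same extraction as the split approach; note that the dots in "Generating configuration...." have to be escaped, since in a regex a bare "." matches any character. A minimal sketch, with the sample text inlined:

```python
import re

text = """some lines
some more line
Generating configuration....
interested config
interested config
interested config
`show accounting log all`
some lines
"""

# re.DOTALL lets ".*?" span newlines; the literal dots in the
# start marker are escaped so they only match real dots.
match = re.search(
    r"Generating configuration\.\.\.\.\n(.*?)`show accounting log all`",
    text,
    re.DOTALL,
)
if match:
    config = match.group(1).strip()
    print(config)
```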
config_found appears never to be given a value before the loop runs.
Put config_found = False before the loop and it should work fine.
Related
While reading a log file, I am looking for a way to check whether the timestamp of the current line falls within a given time range.
Currently I have:
variables: datetime_ed and datetime_sd as the end date and start date, each parsed with fd_fmt = '%Y-%m-%d-%H-%M'
The existing code block looks like this:
with open(f, "r") as fi:  #, open(output_file, "w") as fo:
    for line in fi:
        #for pattern in REGEXES:
        if REGEXES[0].search(line):  # search by ID instead of the wordslist at first
            match = REGEXES[1].search(line)
            if match:
                line_id = match.group(1)
                # doing stuff here
So I would do something like:
with open(f, "r") as fi:  #, open(output_file, "w") as fo:
    for line in fi:
        #for pattern in REGEXES:
        if REGEXES[0].search(line):  # search by ID instead of the wordslist at first
            match = REGEXES[1].search(line)
            # ONLY IF LINE CONTAINS A DATE BETWEEN datetime_sd & datetime_ed
            if match:
                line_id = match.group(1)
                # doing stuff here
How can I do that?
I didn't find a function for this in the datetime documentation.
As a temporary solution I made this:
with open(f, "r") as fi:  #, open(output_file, "w") as fo:
    for line in fi:
        try:
            ts = datetime.datetime.strptime(line[:16], fd_fmt)  # try/except to avoid issues with non-dated lines
        except ValueError:
            pass
        #print("ts ", ts)
        if datetime_sd < ts < datetime_ed:
            #for pattern in REGEXES:
            if REGEXES[0].search(line):  # search by ID instead of the wordslist at first
                match = REGEXES[1].search(line)
                if match:
                    #etc...
Is this good / correct ?
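One thing to check in the temporary solution above: when strptime raises ValueError, the pass leaves ts holding the timestamp of an earlier line (and on the very first line ts is not defined at all), so undated lines are filtered with a stale value. Skipping them with continue avoids that. A minimal sketch, with made-up log lines and date bounds standing in for the real file and variables:

```python
import datetime

fd_fmt = "%Y-%m-%d-%H-%M"  # timestamp format from the question

# Made-up stand-ins for datetime_sd / datetime_ed
datetime_sd = datetime.datetime(2021, 1, 1, 0, 0)
datetime_ed = datetime.datetime(2021, 12, 31, 23, 59)

# Made-up stand-ins for the log file's lines
lines = [
    "2021-06-15-12-30 id=42 something happened",
    "no timestamp on this line",
    "2022-02-01-08-00 id=43 out of range",
]

selected = []
for line in lines:
    try:
        ts = datetime.datetime.strptime(line[:16], fd_fmt)
    except ValueError:
        continue  # no leading timestamp: skip instead of reusing a stale ts
    if datetime_sd < ts < datetime_ed:
        selected.append(line)  # the REGEXES checks would go here
```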
I am processing JSON files with Python. I want to compare the data from the JSON file with the lines in file.txt and produce output according to the result.
What should I replace filex[0] with in the code?
filename = 'paf.json'
with open(filename, 'r') as f:
    for line in f:
        if line.strip():
            tweet = json.loads(line)
            file1 = open("file.txt", "r")
            filex = file1.readlines()
            for linex in filex:
                lines = linex
                for char in tweet:
                    if str(tweet['entities']['urls'][0]['expanded_url']) == filex[0]:
                        print(str(tweet['created_at']))
                        break
It's hard to tell exactly what you're asking, but I suspect you want to loop through the lines in both files in parallel.
with open("paf.json", "r") as json_file, open("file.txt", "r") as text_file:
    for json_line, text_line in zip(json_file, text_file):
        tweet = json.loads(json_line)
        # strip the trailing newline before comparing
        if tweet['entities']['urls'][0]['expanded_url'] == text_line.strip():
            print(tweet['created_at'])
This will tell you if line N in the text file matches the URL in line N in the JSON file.
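If the two files are not guaranteed to line up row-for-row, another option is to read all the URLs from file.txt into a set first and test membership. A sketch with the file contents inlined as lists (the JSON shape is taken from the question):

```python
import json

# Stand-ins for the lines of paf.json and file.txt
json_lines = [
    '{"created_at": "Mon Jan 01", "entities": {"urls": [{"expanded_url": "http://a.example"}]}}',
    '{"created_at": "Tue Jan 02", "entities": {"urls": [{"expanded_url": "http://b.example"}]}}',
]
text_lines = ["http://b.example\n", "http://c.example\n"]

# Strip newlines so comparisons are not defeated by trailing "\n"
known_urls = {line.strip() for line in text_lines}

matches = []
for json_line in json_lines:
    tweet = json.loads(json_line)
    url = tweet['entities']['urls'][0]['expanded_url']
    if url in known_urls:
        matches.append(tweet['created_at'])
```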
What I am doing is removing all parts of speech except nouns from a text.
I have written a function for that. It may not be the best or most optimized code, because I have just started coding in Python. I am sure the bug must be very basic, but I am just not able to figure it out.
My function takes two parameters: the location of the input text on the hard drive, and the location of the file where we want the output.
Following is the code.
def extract_nouns(i_location, o_location):
    import nltk
    with open(i_location, "r") as myfile:
        data = myfile.read().replace('\n', '')
    tokens = nltk.word_tokenize(data)
    tagged = nltk.pos_tag(tokens)
    length = len(tagged)
    a = list()
    for i in range(0, length):
        print(i)
        log = (tagged[i][1][0] == 'N')
        if log == False:
            a.append(tagged[i][0])
    fin = open(i_location, 'r')
    fout = open(o_location, "w+")
    for line in fin:
        for word in a:
            line = line.replace(word, "")
        fout.write(line)
    with open(o_location, "r") as myfile_new:
        data_out = myfile_new.read().replace('\n', '')
    return data_out
When I call this function it is working just fine. I am getting the output on the hard disk as I had intended, but it does not return the output on the interface; or should I say, it returns a blank string instead of the actual output string.
This is how I am calling it.
t = extract_nouns("input.txt","output.txt")
If you want to try it, take following as the content of input file
"At eight o'clock on
Thursday film morning word line test
best beautiful Ram Aaron design"
This is the output I am getting in the output file (output.txt) when I call the function, but the function returns a blank string on the interface instead. It does not even print the output.
"
Thursday film morning word line test
Ram Aar design"
You need to close the file first:
for line in fin:
    for word in a:
        line = line.replace(word, "")
    fout.write(line)
fout.close()
Using with is usually the best way to open files, as it automatically closes them; file.seek() lets you go back to the start of the file to read it:
def extract_nouns(i_location, o_location):
    import nltk
    with open(i_location, "r") as myfile:
        data = myfile.read().replace('\n', '')
    tokens = nltk.word_tokenize(data)
    tagged = nltk.pos_tag(tokens)
    length = len(tagged)
    a = []
    for i in range(0, length):
        print(i)
        log = (tagged[i][1][0] == 'N')
        if not log:
            a.append(tagged[i][0])
    with open(i_location, 'r') as fin, open(o_location, "w+") as fout:
        for line in fin:
            for word in a:
                line = line.replace(word, "")
            fout.write(line)
        fout.seek(0)  # go back to start of file
        data_out = fout.read().replace('\n', '')
    return data_out
The last statement in the function should be the return. If instead you end with print data_out, you return the return value of print, which is None.
E.g:
In []: def test():
  ..:     print 'Hello!'
  ..:
In []: res = test()
Hello!
In []: res is None
Out[]: True
I created a notepad text document called "connections.txt". I need it to start with some initial information: several lines of just URLs, each URL on its own line. I put that in manually. Then in my program I have a function that checks if a URL is in the file:
def checkfile(string):
    datafile = file(f)
    for line in datafile:
        if string in line:
            return True
    return False
where f is declared at the beginning of the program:
f = "D:\connections.txt"
Then I tried to write to the document like this:
file = open(f, "w")
if checkfile(user) == False:
    usernames.append(user)
    file.write("\n")
    file.write(user)
file.close()
but it hasn't really been working correctly. I'm not sure what's wrong; am I doing it wrong?
I want the information in the notepad document to stay there ACROSS runs of the program. I want it to build up.
Thanks.
EDIT: I found something wrong... It needs to be file = f, not datafile = file(f).
But the problem is that it clears the text document every time I rerun the program.
f = "D:\connections.txt"
usernames = []

def checkfile(string):
    file = f
    for line in file:
        if string in line:
            return True
            print "True"
    return False
    print "False"

file = open(f, "w")
user = "aasdf"
if checkfile(user) == False:
    usernames.append(user)
    file.write("\n")
    file.write(user)
file.close()
I was working with the file command incorrectly. Here is the code that works:
f = "D:\connections.txt"
usernames = []

def checkfile(string):
    datafile = file(f)
    for line in datafile:
        if string in line:
            print "True"
            return True
    print "False"
    return False

user = "asdf"
if checkfile(user) == False:
    usernames.append(user)
    with open(f, "a") as myfile:
        myfile.write("\n")
        myfile.write(user)
The code that checks for a specific URL is OK!
If the problem is everything being erased:
Opening the file with mode "w" truncates it, which is why the document gets cleared on every run. To write without erasing everything, either open in append mode ("a"), or open in "r+" and move the cursor to the end with the .seek() method:

file = open("D:\connections.txt", "r+")
# seek(offset, [whence]) where:
# whence = 2 means the offset is relative to the end of the file,
# so seek(0, 2) jumps to the end
# read more here: http://docs.python.org/2/library/stdtypes.html?highlight=seek#file.seek
file.seek(0, 2)
file.write("*The URL you want to write*")

Implemented on your code it will be something like:

def checkfile(URL):
    # your own function as it is...

if checkfile(URL) == False:
    file = open("D:\connections.txt", "r+")
    file.seek(0, 2)
    file.write(URL)
    file.close()
a01:01-24-2011:s1
a03:01-24-2011:s2
a02:01-24-2011:s2
a03:02-02-2011:s2
a03:03-02-2011:s1
a02:04-19-2011:s2
a01:05-14-2011:s2
a02:06-11-2011:s2
a03:07-12-2011:s1
a01:08-19-2011:s1
a03:09-19-2011:s1
a03:10-19-2011:s2
a03:11-19-2011:s1
a03:12-19-2011:s2
So I have this data in a txt file, where each line is animal name : date : location.
I have to read this txt file to answer questions.
So far I have:

text_file = open("animal data.txt", "r")  # open the text file and read it

I know how to read one line, but since there are multiple lines I'm not sure how I can read every line in the txt.
Use a for loop.
text_file = open("animal data.txt", "r")
for line in text_file:
    line = line.split(":")
    # Code for what you want to do with each element in the line
text_file.close()
Since you know the format of this file, you can shorten it even more over the other answers:
with open('animal data.txt', 'r') as f:
    for line in f:
        animal_name, date, location = line.strip().split(':')
        # You now have three variables (animal_name, date, and location)
        # This loop will happen once for each line of the file
        # For example, the first time through will have data like:
        # animal_name == 'a01'
        # date == '01-24-2011'
        # location == 's1'
Or, if you want to keep a database of the information you get from the file to answer your questions, you can do something like this:
animal_names, dates, locations = [], [], []
with open('animal data.txt', 'r') as f:
    for line in f:
        animal_name, date, location = line.strip().split(':')
        animal_names.append(animal_name)
        dates.append(date)
        locations.append(location)

# Here, you have access to the three lists of data from the file
# For example:
# animal_names[0] == 'a01'
# dates[0] == '01-24-2011'
# locations[0] == 's1'
You can use a with statement to open the file, so that it is closed properly even if something goes wrong while reading.
with open('data.txt', 'r') as f_in:
    for line in f_in:
        line = line.strip()  # remove all whitespace at start and end
        field = line.split(':')
        # field[0] = animal name
        # field[1] = date
        # field[2] = location
You are missing closing the file. Better to use the with statement, which ensures the file gets closed:
with open("animal data.txt", "r") as file:
    for line in file:
        line = line.split(":")
        # Code for what you want to do with each element in the line
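Once each line is split into its three fields, answering questions usually means grouping the records somehow. For example, a dict keyed by animal name collects every (date, location) pair per animal; a sketch with a few of the sample lines inlined:

```python
# A few lines from the sample data, inlined for illustration
lines = [
    "a01:01-24-2011:s1",
    "a03:01-24-2011:s2",
    "a03:02-02-2011:s2",
]

sightings = {}
for line in lines:
    animal_name, date, location = line.strip().split(":")
    # setdefault creates the list the first time an animal appears
    sightings.setdefault(animal_name, []).append((date, location))
```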