How do I compare 2 lines in a string in Python?

I have the console output stored in a string in Python.
It looks like:
output ="Status of xyz
Process is running
Status of abc
Process is stopped"
I want to get last word of each line and compare with last word of next line.
How can I do this in Python?.

First you need to separate the string into a list of lines:
lines = output.split('\n')  # split into lines
Then you need to loop over the lines and split each line into words:
# go through all lines except the last, so each line can be checked against the next
for lineIndex in range(len(lines) - 1):
    WordsLine1 = lines[lineIndex].split()      # split current line into words
    WordsLine2 = lines[lineIndex + 1].split()  # split next line into words
    # check whether the last word of this line equals the last word of the next line
    if WordsLine1[-1] == WordsLine2[-1]:
        # they are equal, do your stuff here
        pass

Here's the data
data = """\
Status of xyz Process is running
Status of abc Process is stopped
"""
Split into lines in a cross-platform manner:
lines = data.splitlines()
Loop over the lines pairwise, so you have the current line and the previous line at the same time (using zip):
for previous, current in zip(lines, lines[1:]):
    lastword = previous.split()[-1]
    if lastword == current.split()[-1]:
        print('Both lines end with the same word: {word}'.format(word=lastword))
Alternatively, if you don't like how zip looks, we can loop over the lines pairwise by repeatedly setting a variable to store the last line:
last = None
for line in lines:
    if last is not None and line.split()[-1] == last.split()[-1]:
        print('both lines have the same last word')
    last = line

Related

Why does my for loop run indefinitely and not stop when the if condition is met?

I'm trying to read text from a file and using a loop to find a specific text from the file. The data in the file is listed vertically word by word.
When I run the script, after it prints the last word in the file it repeats itself from the beginning indefinitely.
with open('dictionary.txt','r') as file:
    dictionary = file.read().strip()
for i in dictionary:
    print(dictionary)
    if i == "ffff":
        break
First split the text into lines on "\n", then print(i), not print(dictionary):
with open('dictionary.txt', 'r') as file:
    dictionary = file.read().strip().split("\n")
for i in dictionary:
    print(i)
    if i == "ffff":
        break
You need to split first because otherwise the loop iterates over the string character by character and checks whether each single character equals "ffff", which can never be True.
Since i would be a single character, you should first do:
dictionary = dictionary.split("\n")
BUT if your "ffff" is not a whole line and the words are separated by spaces, you can split on spaces instead (see the sketch below):
dictionary = dictionary.split(" ")

When iterating through lines in txt file, how can I capture multiple subsequent lines after a regex triggers?

I have a txt file:
This is the first line of block 1. It is always identifiable
Random
Stuff
This is the first line of block 2. It is always identifiable
Is
Always
This is the first line of block 3. It is always identifiable
In
Here!
I want to iterate through each line, and when the following pattern matches, capture a fixed number of lines after it:
for line in lines:
    match = re.compile(r'(.*)block 2.(.*)').search(line)
    if match:
        # capture current line and the following 2 lines
After parsing the txt file, I want to return:
This is the first line of block 2
Is
Always
In my particular example, the first line of my block is always identifiable. There is a consistent row count per block. The contents of lines >= 2 will always change and cannot reliably be returned when using regex.
You can call the next() function to get the next element in the iterator.
import re

def get_block2(lines):
    for line in lines:
        match = re.compile(r'(.*)block 2\.').search(line)  # "block 2" followed by a period, as in the sample
        if match:
            line2 = next(lines)
            line3 = next(lines)
            return line, line2, line3
Assuming lines is an iterator, you can just grab the next items from it:
import re

block2 = re.compile(r'(.*)block 2\.')
for l in lines:
    if block2.search(l):
        res = [l, next(lines), next(lines)]
        break
print(res)
If lines isn't an iterator, you just have to add lines = iter(lines) before the loop.
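For instance, a minimal sketch of that, reading the lines from a hypothetical file blocks.txt first:
import re

# Sketch: wrap the list of lines in an iterator so next() can advance past the matched line.
with open('blocks.txt') as f:      # hypothetical file name
    lines = iter(f.readlines())

block2 = re.compile(r'(.*)block 2\.')
res = None
for l in lines:
    if block2.search(l):
        res = [l, next(lines), next(lines)]  # the matched line plus the following two
        break
print(res)
Note that next() raises StopIteration if the match falls within the last two lines of the file; next(lines, '') would return an empty string in that case instead.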

How to remove lines that start with the same characters (but are random) in python?

I am trying to remove lines in a file that start with the same 5 characters; however, those first 5 characters are random (I don't know what they will be).
I have code that reads the last 5 characters of the first line of a file and matches them to the FIRST 5 characters of another line in the file that has the same 5 characters. The problem is that when two or more lines have the same first 5 characters, the code messes up. I need something that reads all the lines in the file and removes one of the two lines that have the same first 5 characters.
Example (issue):
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
***GTTAT***ATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT
What I need as result after one is taken out of file:
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
(no third line)
I will greatly appreciate it if you could explain how I could go about this with words as well.
You can do this for example like so:
FILE_NAME = "data.txt"        # the name of the file to read in
NR_MATCHING_CHARS = 5         # the number of characters that need to match

lines = set()  # the beginnings of lines that have already been output
with open(FILE_NAME, "r") as inF:             # open the file
    for line in inF:                          # for every line
        line = line.strip()
        if line == "": continue               # skip empty lines
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if not (beginOfSequence in lines):    # the beginning of this line was not printed yet
            print(line)                       # print the line
            lines.add(beginOfSequence)        # remember the beginning of the line

Python 3.4 - Capture block of text based on single string

I have searched far and wide and I hope someone can either point me to the link I missed or help me out with this logic.
We have a script that goes out and collects logs from various devices and places them in text files. Within these text files there is a time stamp, and we need to collect the few lines of text before and after this time stamp.
I already have a script that matches the time stamps and removes them for certain reports (included below) but I cannot figure out how to match the time stamp and then capture the surrounding lines.
regex_time_stamp = re.compile(r'\d{2}:\d{2}:\d{2}|\d{1,2}y\d{1,2}w\d{1,2}d|\d{1,2}w\d{1,2}d|\d{1,2}d\d{1,2}h')
with open(filename, 'r') as f:
    h = f.readlines()
for line in h:
    if regex_time_stamp.search(line) is not None:
        new_line = re.sub(regex_time_stamp, '', line)
        pre_list.append(new_line)
    else:
        pre_list.append(line)
Any assistance would be greatly appreciated! Thanks for taking the time to read this.
The basic algorithm is to remember the three most recently read lines. When you match a header, read the next two lines and then combine them with the header and the lines you've saved.
Alternatively, since you're saving all of the lines in a list, simply keep track of which element is the current one; when you find a header you can go back and get the previous two and the next two elements.
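As a rough sketch of that second, index-based idea (assuming the timestamp regex from the question, two lines of context on each side, and filename defined as in the question; the other names are illustrative):
import re

regex_time_stamp = re.compile(r'\d{2}:\d{2}:\d{2}|\d{1,2}y\d{1,2}w\d{1,2}d|\d{1,2}w\d{1,2}d|\d{1,2}d\d{1,2}h')
with open(filename, 'r') as f:   # filename as in the question
    h = f.readlines()

captured = []
for i, line in enumerate(h):
    if regex_time_stamp.search(line) is not None:
        start = max(i - 2, 0)        # two lines before, clamped at the start of the file
        end = min(i + 3, len(h))     # the matched line plus two lines after
        captured.extend(h[start:end])
Note that this naive version can duplicate context lines when two matches sit close together, which is exactly the catch the next answer addresses.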
Catch with duplicated lines
Agreed with the basic algorithm by @Bryan-Oakley and @TigerhawkT3; however, there's a catch:
What if several lines match consecutively?
You could end up duplicating "context" lines by printing the last 2 lines of the first match, and then the last 2 lines of the second match... that would actually also contain the previous matched line.
The solution is to keep track of which line number was last printed, in order to print just enough lines before the current matched line.
Flexible context parameter
What if you also want to print 3 lines before and after instead of 2? Then you need to keep track of more lines.
What if you want only 1?
Then the number of lines to print needs to be a parameter, and the algorithm needs to use it.
Sample input and output
Here's a file sample that contains the word MATCH instead of your timestamp, for clarity. The other lines contain NOT + line number
==
NOT 0
NOT 1
NOT 2
NOT 3
NOT 4
MATCH LINE 5
NOT 6
NOT 7
NOT 8
NOT 9
MATCH LINE 10
MATCH LINE 11
NOT 12
MATCH LINE 13
NOT 14
==
The output should be:
==
NOT 3
NOT 4
LINE 5
NOT 6
NOT 8
NOT 9
LINE 10
LINE 11
NOT 12
LINE 13
NOT 14
==
Solution
This solution iterates on the file and keeps track of:
what is the last line that was printed? This will take care of not duplicating "context" lines if matched lines come in sequence.
what is the last line that was matched? This will tell the program to print the current line if it is "close" to the last matched line. How close? This is determined by your "number of lines to print" parameter. Then we also set the last_line_printed variable to the current line index.
Here's a simplified algorithm in English:
When matching a line we will:
print the last N lines, from the last_line_printed variable to the current index
print the current line after stripping the timestamp
set the last_line_printed = last_line_matched = current line index
continue
When not matching a line we will:
print the current line if current_index < last_line_matched + number_of_lines_to_print
Of course, we take care of being close to the beginning of the file by checking limits.
Not print but return an array
This solution doesn't print directly but returns an array with all the lines to print. That's just a bit classier.
I like to name my "return" variable result, but that's just me. It makes it obvious throughout the algorithm which variable holds the result.
Code
You can try this code with the input above, it'll print the same output.
def search_timestamps_context(filename, number_of_lines_to_print=2):
    import re
    result = []
    regex_time_stamp = re.compile(r'\d{2}:\d{2}:\d{2}|\d{1,2}y\d{1,2}w\d{1,2}d|\d{1,2}w\d{1,2}d|\d{1,2}d\d{1,2}h')
    # for my test
    regex_time_stamp = re.compile(r'MATCH')
    with open(filename, 'r') as f:
        h = f.readlines()
    # Remember which line was last printed and last matched
    last_line_printed = -1
    last_line_matched = -1
    for idx, line in enumerate(h):
        if regex_time_stamp.search(line) is not None:
            # We want to print the last "number_of_lines_to_print" lines
            # ...unless they were already printed
            # We want to return the lines from idx - number_of_lines_to_print
            # print('** Matched', line, idx, last_line_matched, last_line_printed)
            if last_line_printed == -1:
                lines_to_print = max(idx - number_of_lines_to_print, 0)
            else:
                # unless we've already printed those lines because of a previous match;
                # then we continue from the line after the last one printed
                lines_to_print = max(idx - number_of_lines_to_print, last_line_printed + 1)
            for l in h[lines_to_print:idx]:
                result.append(l)
            # Now add the stripped line
            new_line = re.sub(regex_time_stamp, '', line)
            result.append(new_line)
            # Update the last line printed and matched
            last_line_printed = last_line_matched = idx
        else:
            # If not a match, we still need to print the current line if we had a match N lines before
            if last_line_matched != -1 and idx < last_line_matched + number_of_lines_to_print:
                result.append(line)
                last_line_printed = idx
    return result

filename = 'test_match.txt'
lines = search_timestamps_context(filename, number_of_lines_to_print=2)
print(''.join(lines))
Improvements
The usage of readlines() is inefficient: we are reading the whole file before starting.
It would be more efficient to just iterate, but then we need to remember the last lines in case we need to print them. To achieve that, we would maintain a list of the last N lines, and not more.
That's an exercise left to the reader :)
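For reference, here's a rough sketch of that streaming idea, keeping only the last N unprinted lines in a collections.deque; the function name and details are mine, not part of the original answer, and it only aims to roughly reproduce the behaviour of the code above:
import re
from collections import deque

def search_timestamps_streaming(filename, context=2):
    # Sketch: same idea as above, but reading the file line by line instead of readlines().
    regex_time_stamp = re.compile(r'MATCH')   # the test pattern used above
    result = []
    recent = deque(maxlen=context)  # only ever holds the last `context` unprinted lines
    lines_after = 0                 # how many more lines to emit after a match
    with open(filename, 'r') as f:
        for line in f:
            if regex_time_stamp.search(line) is not None:
                result.extend(recent)            # unprinted context before the match
                recent.clear()
                result.append(regex_time_stamp.sub('', line))
                lines_after = context - 1        # mirrors the strict < comparison above
            elif lines_after > 0:
                result.append(line)
                lines_after -= 1
            else:
                recent.append(line)              # remember, in case a later line matches
    return result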

Python for loop not iterating

I'm trying to loop through a list of strings and add them to a dictionary if their length equals a length input by the user. When the last loop runs, it only runs one time. I know this because the first word in the dictionary is 8 characters long, and when the user input is 8, it prints just that word, and not the other 8 character words. If the input is 3, an empty dictionary is printed. Why is my loop not iterating through all of the words in the list linelist?
wordLength = raw_input("Enter a word length ")
word_dict = {}
infile = open("dictionary.txt")
for line in infile:
    line = line.strip()
    linelist = line.split(" ")
for word in linelist:
    if len(word) == int(wordLength):
        if len(word) in word_dict:
            word_dict[len(word)] = word_dict[len(word)].append(word)
        else:
            word_dict[len(word)] = word
print word_dict
Each time your first loop runs, it sets linelist to a new value, overwriting any old value. After that first loop runs, linelist will contain only the split result from the last line of the file. Every time you process one line of the file, you are throwing away whatever you did with the previous line.
If you want to build a list of all words in the dictionary file, you need to make a list and append to it on each iteration of your for line in infile loop.
Also, it doesn't make much sense to use split on each line if each line is just one word, since there will be no splitting to be done.
for line in infile:
    line = line.strip()
    linelist = line.split(" ")
Every time you do linelist = line.split(" "), that replaces the old linelist with words from just the last line. The list ends up only holding words from the last line. If you want words from the entire file, create a single linelist and extend it with new words:
linelist = []
for line in infile:
    # split with no argument splits on any run of whitespace, trimming
    # leading and trailing whitespace
    linelist += line.split()
    # ^ this means add the elements of line.split() to linelist
Since apparently every word is on its own line, though, you shouldn't even be using split:
words = [line.strip() for line in infile]
Your second loop is not indented, so you run it only on the last value of linelist.
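As a rough sketch of that fix in the question's Python 2 style (nesting the word loop inside the line loop; collecting the matching words in a list via setdefault is my adjustment, since assigning the return value of .append() would store None):
wordLength = raw_input("Enter a word length ")
word_dict = {}
infile = open("dictionary.txt")
for line in infile:
    line = line.strip()
    linelist = line.split(" ")
    for word in linelist:                  # now runs for every line, not just the last one
        if len(word) == int(wordLength):
            # collect every matching word under its length
            word_dict.setdefault(len(word), []).append(word)
print word_dict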
