Python 3: find line and skip the first match

Python 3: find line and skip the first match - python

I have a pretty long file in which I have to look for a particular value. The problem is that this file has two lines that starts the same way but I need to print the second one.
The file is something like:
... random text
Total = 910 K. #Don't need it
... more random lines
Total = 1000 K #The one I need it
I'm using:
for i,line in enumerate(lines):
if line.find('Total =') != -1:
Total = line.split()[4]
break
But this is only giving me the first match.
How can I skip the fist match and just use the second one?

Probably not the best solution but you could use a flag to check if you've already found the first occurance
is_second_occurance = False
for i,line in enumerate(lines):
if line.find('Total =') != -1:
if is_second_occurance:
total = line.split()[4]
break
else:
is_second_occurance = True
A better solution is probably to break it into a function that returns a generator
def get_total(lines):
for line in lines:
if line.startswith("Total = "):
yield line.split()[4]
total = get_total(lines)
total = get_total(lines)
I think that should give you the second occurance

Or may be regex
import re
string='''Total = 910 K. #Don't need it
... more random lines
Total = 1000 K #The one I need it'''
re.findall('(Total[\s\=\w]*K)',string)[1] #will find everything related to Total

Related

Working with printing a certain amount of line in a file

I am trying to achieve:
User input word, and it outputs how many lines contain that word also sees it up to the first ten such lines. If no lines has the words, then your program must output Not found.
My code so far:
sentences = []
with open("txt.txt") as file:
for line in file:
words = line.split()
words_count += len(words)
if len(words) > len(maxlines.split()):
maxlines = line
sentences.append(line)
word = input("Enter word: ")
count = 0
for line in sentences:
if word in line:
print(line)
count += 1
print(count, "lines contain", word)
if count == 0:
print("Not found.")
How would I only print first 10 line regardless the amount of lines
Thank you!

If you want to iterate 10 times (old style, not pythonic at all)
index = 0
for line in file:
if index >= 10:
break
# do stuff 10 times
index += 1
Without using break, just put the stuff inside the condition. Notice that the loop will continue iterating, so this is not really a smart solution.
index = 0
for line in file:
if index < 10:
# do stuff 10 times
index += 1
However this is not pythonic at all. the way you should do it in python is using range.
for _ in range(10):
# do stuff 10 times
_ means you don't care about the index and just want to repeat 10 times.
If you want to iterate over file and keeping the index (line number) you can use enumerate
for lineNumber, line in enumerate(file):
if lineNumber >= 10:
break
# do stuff 10 times
Finally as#jrd1 suggested, you can actually read all the file and then only slice the part that you want, in your case
sentences[:10] # will give you the 10 first sentences (lines)

just change your code like this, it should help:
for line in sentences:
if word in line:
if count < 10: print(line) # <--------
count += 1

How to extract all the numbers from a text file using re.findall() and compute the sum using a for-loop?

The basic outline of this problem is to read the file, look for integers using the re.findall(), looking for regular expression of [0-9]+ and then converting the extracted strings to integers and summing up the integers. I'm having different outcome it supposed to end with (209). Also, how can I simplify my code? Thanks (here is the txt file http://py4e-data.dr-chuck.net/regex_sum_167791.txt)
import re
hand = open("regex_sum_167791.txt")
total = 0
count = 0
for line in hand:
count = count+1
line = line.rstrip()
x = re.findall("[0-9]+", line)
if len(x)!= 1 : continue
num = int(x[0])
total = num + total
print(total)

Assuming that you need to sum all the numbers in your txt:
total = 0
with open("regex_sum_167791.txt") as f:
for line in f:
total += sum(map(int, re.findall("\d+", line)))
print(total)
# 417209
Logics
To start with, try using with when you do open so that once any job is done, open is closed.
Following lines are removed as they seemed redundant:
count = count+1: Not used.
line = line.rstrip(): re.findall takes care of extraction, so you don't have to worry about stripping lines.
if len(x)!= 1 : continue: Seems like you wanted to skip the line with no digits. But since sum(map(int, re.findall("\d+", line))) returns zero in such case, this is also unnecessary.
num = int(x[0]): Finally, this effectively grabs only one digit from the line. In case of two or more digits found in a single line, this won't serve the original purpose. And since int cannot be directly applied to iterables, I used map(int, ...).

You were almost there:
import re
hand = open("regex_sum_167791.txt")
total = 0
for line in hand:
count = count+1
line = line.rstrip()
x = re.findall("[0-9]+", line)
for i in x:
total += int(i)
print(total)
Answer: 417209

Why is this not correct? (codeeval challenge)PYTHON

This is what I have to do https://www.codeeval.com/open_challenges/140/
I've been on this challenge for three days, please help. It it is 85-90 partially solved. But not 100% solved... why?
This is my code:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
saver=[]
text=""
textList=[]
positionList=[]
num=0
exists=int()
counter=0
for l in test.strip().split(";"):
saver.append(l)
for i in saver[0].split(" "):
textList.append(i)
for j in saver[1].split(" "):
positionList.append(j)
for i in range(0,len(positionList)):
positionList[i]=int(positionList[i])
accomodator=[None]*len(textList)
for n in range(1,len(textList)):
if n not in positionList:
accomodator[n]=textList[len(textList)-1]
exists=n
for item in positionList:
accomodator[item-1]=textList[counter]
counter+=1
if counter>item:
accomodator[exists-1]=textList[counter]
for word in accomodator:
text+=str(word) + " "
print text
test_cases.close()

This code works for me:
import sys
def main(name_file):
_file = open(name_file, 'r')
text = ""
while True:
try:
line = _file.next()
disordered_line, numbers_string = line.split(';')
numbers_list = map(int, numbers_string.strip().split(' '))
missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
if missing_number == 0:
missing_number = len(disordered_line)
numbers_list.append(missing_number)
disordered_list = disordered_line.split(' ')
string_position = zip(disordered_list, numbers_list)
ordered = sorted(string_position, key = lambda x: x[1])
text += " ".join([x[0] for x in ordered])
text += "\n"
except StopIteration:
break
_file.close()
print text.strip()
if __name__ == '__main__':
main(sys.argv[1])
I'll try to explain my code step by step so maybe you can see the difference between your code and mine one:
while True
A loop that breaks when there are no more lines.
try:
I put the code inside a try and catch the StopIteracion exception, because this is raised when there are no more items in a generator.
line = _file.next()
Use a generator, so that way you do not put all the lines in memory from once.
disordered_line, numbers_string = line.split(';')
Get the unordered phrase and the numbers of every string's position.
numbers_list = map(int, numbers_string.strip().split(' '))
Convert every number from string to int
missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
Get the missing number from the serial of numbers, so that missing number is the position of the last string in the phrase.
if missing_number == 0:
missing_number = len(unorder_line)
Check if the missing number is equal to 0 if so then the really missing number is equal to the number of the strings that make the phrase.
numbers_list.append(missing_number)
Append the missing number to the list of numbers.
disordered_list = disordered_line.split(' ')
Conver the disordered phrase into a list.
string_position = zip(disordered_list, numbers_list)
Combine every string with its respective position.
ordered = sorted(string_position, key = lambda x: x[1])
Order the combined list by the position of the string.
text += " ".join([x[0] for x in ordered])
Concatenate the ordered phrase, and the reamining code it's easy to understand.
UPDATE
By looking at your code here is my opinion tha might solve your problem.
split already returns a list so you do not have to loop over the splitted content to add that content to another list.
So these six lines:
for l in test.strip().split(";"):
saver.append(l)
for i in saver[0].split(" "):
textList.append(i)
for j in saver[1].split(" "):
positionList.append(j)
can be converted into three:
splitted_test = test.strip().split(';')
textList = splitted_test[0].split(" ")
positionList = map(int, splitted_test[1].split(" "))
In this line positionList = map(int, splitted_test[0].split(" ")) You already convert numbers into int, so you save these two lines:
for i in range(0,len(positionList)):
positionList[i]=int(positionList[i])
The next lines:
accomodator=[None]*len(textList)
for n in range(1,len(textList)):
if n not in positionList:
accomodator[n]=textList[len(textList)-1]
exists=n
can be converted into the next four:
missing_number = sum(xrange(sorted(positionList)[0],sorted(positionList)[-1]+1)) - sum(positionList)
if missing_number == 0:
missing_number = len(textList)
positionList.append(missing_number)
Basically what these lines do is calculate the missing number in the serie of numbers so the len of the serie is the same as textList.
The next lines:
for item in positionList:
accomodator[item-1]=textList[counter]
counter+=1
if counter>item:
accomodator[exists-1]=textList[counter]
for word in accomodator:
text+=str(word) + " "
Can be replaced by these ones:
string_position = zip(textList, positionList)
ordered = sorted(string_position, key = lambda x: x[1])
text += " ".join([x[0] for x in ordered])
text += "\n"
From this way you can save, lines and memory, also use xrange instead of range.
Maybe the factors that make your code pass partially could be:
Number of lines of the script
Number of time your script takes.
Number of memory your script uses.
What you could do is:
Use Generators. #You save memory
Reduce for's, this way you save lines of code and time.
If you think something could be made it easier, do it.
Do not redo the wheel, if something has been already made it, use it.

Python textwrap and Reset count after everyline

Hello everyone i have an issue with this problem, the problem is i need to reset the count after every line in the file, i put a comment so you can see where i want to reset the count.
The program is suppose to cut each line after every specified lineLength.
def insert_newlines(string, afterEvery_char):
lines = []
for i in range(0, len(string), afterEvery_char):
lines.append(string[i:i+afterEvery_char])
string[:afterEvery_char] #i want to reset here to the beginning of every line to start count over
print('\n'.join(lines))
def main():
filename = input("Please enter the name of the file to be used: ")
openFile = open(filename, 'r')
file = openFile.read()
lineLength = int(input("enter a number between 10 & 20: "))
while (lineLength < 10) or (lineLength > 20) :
print("Invalid input, please try again...")
lineLength = int(input("enter a number between 10 & 20: "))
print("\nYour file contains the following text: \n" + file + "\n\n") # Prints original File to screen
print("Here is your output formated to a max of", lineLength, "characters per line: ")
insert_newlines(file, lineLength)
main()
Ex. If a file has 3 lines like this with each line having 20 chars
andhsytghfydhtbcndhg
andhsytghfydhtbcndhg
andhsytghfydhtbcndhg
after the lines are cut it should look like this
andhsytghfydhtb
cndhg
andhsytghfydhtb
cndhg
andhsytghfydhtb
cndhg
i want to RESET the count after every line in the file.

I'm not sure I understand your problem, but from your comments it appears you simply want to cut the input string (file) to lines lineLength long. That is already done in your insert_newlines(), no need for the line with comment there.
However, if you want to output lines meaning strings ending with newline char that should be no more than lineLength long, then you could simply read the file like this:
lines = []
while True:
line = openFile.readline(lineLength)
if not line:
break
if line[-1] != '\n':
line += '\n'
lines.append(line)
print(''.join(lines))
or alternatively:
lines = []
while True:
line = openFile.readline(lineLength)
if not line:
break
lines.append(line.rstrip('\n'))
print('\n'.join(lines))

I don't understand the issue here, the code seems to work just fine:
def insert_newlines(string, afterEvery_char):
lines = []
# if len(string) is 100 and afterEvery_char is 10
# then i will be equal to 0, 10, 20, ... 90
# in lines we'll have [string[0:10], ..., string[90:100]] (ie the entire string)
for i in range(0, len(string), afterEvery_char):
lines.append(string[i:i+afterEvery_char])
# resetting i here won't have any effect whatsoever
print('\n'.join(lines))
>>> insert_newlines('Beautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\n..', 10)
Beautiful
is better
than ugly.
Explicit
is better
than impli
cit.
Simpl
e is bette
r than com
plex.
..
isn't that what you want?

Increase index of split in Python

Well, I have a problem in a Python script, I need to do is that the index of the split function, increases automatically with every iteration of the loop. I do this:
tag = "\'"
while loop<=302:
for line in f1.readlines():
if tag in line:
word = line.split(tag)[num] #num is the index I need to increase
text = "Word: "+word+"."
f.write(text)
num = num + 1
loop = loop + 1
But...the "num" variable on index doesn't change...it simply stays the same. The num index indicates the word I need to take. So this is why "num = num + 1" would have to increase...
What is the problem in the loop?
Thanks!

Your question is confusing. But I think you want to move num = num + 1 into the for loop and if statement.
tag = "\'"
while loop<=302:
for line in f1.readlines():
if tag in line:
word = line.split(tag)[num] #num is the index I need to increase
num = num + 1
text = "Word: "+word+"."
f.write(text)
loop = loop + 1

Based on Benyi's comment in the question - do you just want this for the individual sentences? You might not need to index.
>>> mystring = 'hello i am a string'
>>> for word in mystring.split():
print 'Word: ',word
Word: hello
Word: i
Word: am
Word: a
Word: string

There seems to be a lot of things wrong with this.
First
while loop <= 302:
for line in f1.readlines():
f1.readlines() is going be [] for every iteration past the first
Second
for line in f1.readline():
word = line.split(tag)[num]
...
text = "Word: "+word+"."
Even if you made the for loop work, text will always be using the last iteration of the word. Maybe this is desired behavior, but it seems strange.
Third
while loop<=302:
...
loop = loop += 1
Seems like it would be better written as
for _ in xrange(302):
Since loop isn't used at all inside that scope. This is assuming loop starts at 0, if it doesn't then you just adjust 302 to however many iterations you wanted.
Lastly
num = num + 1
This is outside your inner loop, so num will always be the same for the first iteration, then won't matter latter because of the empty f1.readlines() as stated before.

I have a different approach to your problem as mentioned by you in the comment. Consider input.txt has the following entry:
this is a an input file.
then the Following code will give you the desired output
lines = []
with open (r'C:\temp\input.txt' , 'r') as fh:
lines = fh.read()
with open (r'C:\temp\outputfile.txt' , 'w') as fh1:
for words in lines.split():
fh1.write("Words:"+ words+ "\n" )

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python 3: find line and skip the first match - python

Or may be regex import re string='''Total = 910 K. #Don't need it ... more random lines Total = 1000 K #The one I need it''' re.findall('(Total[\s\=\w]*K)',string)[1] #will find everything related to Total

Related

Working with printing a certain amount of line in a file

How to extract all the numbers from a text file using re.findall() and compute the sum using a for-loop?

Why is this not correct? (codeeval challenge)PYTHON

Python textwrap and Reset count after everyline

Increase index of split in Python

Categories

Resources