I have a file called 'datafile', which contains data like this:
tag
12
22
33
tag
234
1
23
43
tag
8
tag
0
12
The number of numbers between the "tag"s varies.
What I need to do is access the first (or second) number after each "tag", if it exists.
My incomplete Python code:
f = open('datafile', 'r')
for line in f.readlines():
    line = line.strip()
    if line == 'tag':
        print('tag found!')
        # how can I access next number here?
How can I proceed to next line inside the for loop?
It's easier to just reset a counter every time you encounter "tag":
k = 1 # or 2, or whatever
with open('datafile', 'r') as f:
    since_tag = 0
    for line in f:
        line = line.strip()
        if line == 'tag':
            print('tag found!')
            since_tag = 0
        if since_tag == k:
            print("kth number found: ", line)
        since_tag += 1
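The same counter idea can be wrapped in a function that returns the matches instead of printing them. This is only a sketch: the name kth_after_tag and the in-memory sample list are assumptions made for illustration, and lines that appear before the first tag are ignored.

```python
def kth_after_tag(lines, k, tag="tag"):
    """Return the k-th entry (1-based) after each tag line, where one exists."""
    found = []
    since_tag = None  # stays None until the first tag has been seen
    for line in lines:
        line = line.strip()
        if line == tag:
            since_tag = 0  # restart the count at every tag
            continue
        if since_tag is not None:
            since_tag += 1
            if since_tag == k:
                found.append(line)
    return found


# Hypothetical sample mirroring the data in the question
sample = ["tag", "12", "22", "33", "tag", "234", "1", "23", "43",
          "tag", "8", "tag", "0", "12"]
print(kth_after_tag(sample, 1))  # ['12', '234', '8', '0']
print(kth_after_tag(sample, 2))  # ['22', '1', '12']
```

Because the function accepts any iterable of strings, an open file object can be passed in place of the list.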
Use an iterator fed into a generator.
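def getFirstSecond(browser):
    for line in browser:
        line = line.strip()
        while line == 'tag':
            # next() advances the same iterator the for loop is using; the ''
            # default avoids a StopIteration at the end of the file.
            first = next(browser, '').strip()
            if first in ('', 'tag'):
                line = first  # no number after this tag
                continue
            second = next(browser, '').strip()
            # If the group has only one number, `second` is already the next
            # tag: report None and let the while loop handle that tag.
            yield first, (None if second in ('', 'tag') else second)
            line = second
with open('datafile', 'r') as f:
    browser = iter(f)
    pairs = list(getFirstSecond(browser))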
To answer your question, you proceed to the next line using the next function.
Note the use of the with statement; this is how you should be opening a file (it ensures the file is closed when you are done with it).
Related
I want to delete a line of text from a .txt file given an integer corresponding to the txt file's line number. For example, given the integer 2, delete line 2 of the text file.
I'm sort of lost on what to put into my program.
f = open('text1.txt','r+')
g = open('text2.txt',"w")
line_num = 0
search_phrase = "Test"
for line in f.readlines():
    line_num += 1
    if line.find(search_phrase) >= 0:
        print("text found at line" , line_num)
        decision = input("enter letter corresponding to a decision: (d = delete lines, s = save to new txt) \n")
        if decision == 'd':
            pass  # delete the current line
        if decision == 's':
            pass  # save the current line to a new file
Any help is appreciated! Thanks :)
This way:
with open('text1.txt','r') as f, open('text2.txt',"w") as g:
    to_delete=[2,4]
    for line_number, line in enumerate(f.readlines(), 1):
        if line_number not in to_delete:
            g.write(line)
        else:
            print(f'line {line_number}, "{line.rstrip()}" deleted')
Here it goes.
with open('data/test.txt', 'r') as f:
    text = f.readlines() # all lines are read into a list and you can access it as a list
deleted_line = text.pop(1) # the second line (index 1) is removed from the list and stored in the variable
print(text)
print(deleted_line)
with open('data/test.txt', 'w') as f:
    f.writelines(text) # reopening in 'w' mode empties the file, then the remaining lines are written back
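For the original request (delete a line given its number), the read, pop, and rewrite steps can be collected into one helper. This is a sketch under assumptions: the name delete_line is made up for illustration, the file is assumed small enough to fit in memory, and the demo runs on a throwaway temporary file rather than on text1.txt.

```python
import os
import tempfile


def delete_line(path, line_number):
    """Remove the 1-based line `line_number` from the text file at `path`."""
    with open(path, "r") as f:
        lines = f.readlines()
    if not 1 <= line_number <= len(lines):
        raise IndexError("line %d does not exist" % line_number)
    deleted = lines.pop(line_number - 1)  # list indices are 0-based
    with open(path, "w") as f:            # 'w' truncates before writing
        f.writelines(lines)
    return deleted


# Demo on a temporary file so nothing real is modified
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w") as f:
        f.write("first\nsecond\nthird\n")
    removed = delete_line(demo_path, 2)
    with open(demo_path) as f:
        remaining = f.read()

print(repr(removed))    # 'second\n'
print(repr(remaining))  # 'first\nthird\n'
```

Rewriting the whole file is the simple choice here; for very large files, streaming the kept lines into a second file (as in the first answer) avoids holding everything in memory.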
In the text file I'm working on, there are multiple lines containing the word "TOP"; however, I only want the first occurrence that comes after each line containing the word "IPT".
My second question is whether it would be better to use the pandas library, since this is a CSV (comma-separated values) file.
Here's my code, but it gets all of the lines containing the word "TOP":
temp = { } # Keys will be the line number, and values will be the lines that contains "IPT" with newline character removed
with open("myfile.txt", 'r') as myfile:
    fileNum = 0
    for line in myfile.readlines():
        fileNum +=1
        if line[12:17] == "IPT":
            temp[fileNum] = line.replace('\n', '')
            continue
        if line[12:15] == "TOP":
            print(line)
Example of my text file:
....
....
...SAT...
...
...TOP # I don't want to get this line
...
...
**...IPT...
...
...
...TOP... # I want to get this line**
...
...
...SAT...
...
...TOP... # I don't want to get this line.
**...IPT...
...TOP... # I want to get this line.**
You have two actions to write:
1. When you haven't seen IPT yet and IPT is in the line: save the line and start looking for TOP.
2. When you see TOP and IPT has been seen: print the line and stop looking for TOP.
Also, just test for plain string inclusion ("TOP" in line) rather than looking at a specific index; you don't need to be that specific here.
temp = {}
with open("myfile.txt", 'r') as myfile:
    search_mode = False
    for idx, line in enumerate(myfile): # enumerate() returns tuples: index + content
        if not search_mode and "IPT" in line: # action 1
            temp[idx] = line.rstrip()
            search_mode = True
        elif search_mode and "TOP" in line: # action 2
            print(line)
            search_mode = False
Gives:
print(json.dumps(temp, indent=4))
# >>>
...TOP... # I want get this line**
...TOP... # I want get this line.**
{
"7": "**...IPT...",
"16": "**...IPT..."
}
A pandas DataFrame is meant for collections of labeled data (imagine the contents of a CSV file), which is not what you have here.
To fix your code, just add a variable that marks whether IPT has already been found.
temp = { } # Keys will be the line number, and values will be the lines that contains "IPT" with newline character removed
found_ipt=False
with open("myfile.txt", 'r') as myfile:
    fileNum = 0
    for line in myfile.readlines():
        fileNum +=1
        if "IPT" in line[12:17]: # a 5-character slice can never equal "IPT", so test containment
            temp[fileNum] = line.replace('\n', '')
            found_ipt=True
        elif line[12:15] == "TOP" and found_ipt:
            print(line)
            found_ipt=False
Keep track of whether you have found IPT yet in a variable "found". Then only look for TOP if found == True. The first time you find TOP after found == True is what you are looking for and you can stop looking.
temp = { } # Keys will be the line number, and values will be the lines that contains "IPT" with newline character removed
with open("myfile.txt", 'r') as myfile:
    fileNum = 0
    found = False
    for line in myfile.readlines():
        fileNum +=1
        if "IPT" in line[12:17]: # a 5-character slice can never equal "IPT", so test containment
            temp[fileNum] = line.replace('\n', '')
            found = True
        if found and line[12:15] == "TOP":
            print(line)
            break
lines = myfile.readlines()
for i, line in enumerate(lines):
    ...
    if "IPT" in line[12:17]: # a 5-character slice can never equal "IPT", so test containment
        temp[fileNum] = line.replace('\n', '')
        for j, line2 in enumerate(lines[i:]):
            if line2[12:15] == "TOP":
                print(line2)
                break
When it finds an IPT line, it runs a second loop over the slice of lines starting at that IPT line and stops at the first TOP.
result = {}
with open("myfile.txt", 'r') as f:
    ipt_found = False
    for index, line in enumerate(f):
        # For every line number and line in the file
        if 'IPT' in line:
            # If we find IPT in the line then we set ipt_found to True
            ipt_found = True
        elif 'TOP' in line and ipt_found:
            # If we find TOP in the line and ipt_found is True then we add the line
            result[index] = line
            # Set ipt_found to False so we don't append any more lines with TOP in
            # until we find another line with IPT in
            ipt_found = False
print(result)
That should do it.
temp = { } # Keys will be the line number, and values will be the first "TOP" line after each "IPT", with the newline character removed
with open("myfile.txt", 'r') as myfile:
    # This variable shows if an "IPT" has been found
    string_found = False
    # enumerate yields tuples: the first value is the index (starting at 0), the second the line content
    for line_num, line in enumerate(myfile.readlines()):
        # if the string "IPT" is in our line and we haven't already found a previous IPT,
        # we set string_found to True to signal that we can now get the next "TOP"
        if "IPT" in line and not string_found:
            string_found = True
        # If there is a "TOP" in our line and we have already found an IPT previously, save the line
        elif "TOP" in line and string_found:
            temp[line_num] = line.replace("\n", "")
            string_found = False
print(temp)
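The flag-based approach shared by these answers can also be written as a generator that works on any iterable of lines, which makes it easy to try without a file. This is a sketch: the name first_top_after_ipt and the sample list are assumptions made for illustration, and it uses plain substring tests rather than fixed columns.

```python
def first_top_after_ipt(lines):
    """Yield (line_number, line) for the first "TOP" line after each "IPT" line."""
    waiting_for_top = False
    for number, line in enumerate(lines, 1):  # 1-based line numbers
        if "IPT" in line:
            waiting_for_top = True
        elif waiting_for_top and "TOP" in line:
            yield number, line.rstrip("\n")
            waiting_for_top = False  # ignore further TOP lines until the next IPT


# Hypothetical sample shaped like the example in the question
sample = ["..SAT..\n", "..TOP..\n", "..IPT..\n", "....\n",
          "..TOP..\n", "..TOP..\n", "..IPT..\n", "..TOP..\n"]
print(list(first_top_after_ipt(sample)))  # [(5, '..TOP..'), (8, '..TOP..')]
```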
Say customPassFile.txt has two lines in it. First line is "123testing" and the second line is "testing321". If passwordCracking = "123testing", then the output would be that "123testing" was not found in the file (or something similar). If passwordCracking = "testing321", then the output would be that "testing321" was found in the file. I think that the for loop I have is only reading the last line of the text file. Any solutions to fix this?
import time
import linecache
def solution_one(passwordCracking):
    print("Running Solution #1 # " + time.strftime("%Y-%m-%d %H:%M:%S",time.localtime()))
    startingTimeSeconds = time.time()
    currentLine = 1
    attempt = 1
    passwordFound = False
    wordListFile = open("customPassFile.txt", encoding="utf8")
    num_lines = sum(1 for line in open('customPassFile.txt'))
    while(passwordFound == False):
        for i, line in enumerate(wordListFile):
            if(i == currentLine):
                line = line
        passwordChecking = line
        if(passwordChecking == passwordCracking):
            passwordFound = True
            endingTimeSeconds = time.time()
            overallTimeSeconds = endingTimeSeconds - startingTimeSeconds
            print("~~~~~~~~~~~~~~~~~")
            print("Password Found: {}".format(passwordChecking))
            print("ATTEMPTS: {}".format(attempt))
            print("TIME TO FIND: {} seconds".format(overallTimeSeconds))
            wordListFile.close()
            break
        elif(currentLine == num_lines):
            print("~~~~~~~~~~~~~~~~~")
            print("Stopping Solution #1 # " + time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
            print("REASON: Password could not be cracked")
            print("ATTEMPTS: {}".format(attempt))
            break
        else:
            attempt = attempt + 1
            currentLine = currentLine + 1
            continue
The main problem with your code is that you open the file and you read it multiple times. The first time the file object position goes to the end and stays there. Next time you read the file nothing happens, since you are already at the end of the file.
Example
Sometimes an example is worth more than lots of words.
Take the file test_file.txt with the following lines:
line1
line2
Now open the file and read it twice:
f = open('./test_file.txt')
f.tell()
>>> 0
for l in f:
    print(l, end='')
else:
    print('nothing')
>>> line1
>>> line2
>>> nothing
f.tell()
>>> 12
for l in f:
    print(l, end='')
else:
    print('nothing')
>>> nothing
f.close()
The second time nothing happens, as the file object is already at the end.
Solution
Here you have two options:
1. Read the file only once, save all the lines in a list, and then use the list in your code. It should be enough to replace
wordListFile = open("customPassFile.txt", encoding="utf8")
num_lines = sum(1 for line in open('customPassFile.txt'))
with
with open("customPassFile.txt", encoding="utf8") as f:
    wordListFile = f.readlines()
num_lines = len(wordListFile)
2. Reset the file object position with seek after you have read the file. It would be something along these lines:
for i, line in enumerate(wordListFile):
    if(i == currentLine):
        line = line
wordListFile.seek(0)
I would go with option 1, unless you have memory constraints (e.g. the file is bigger than memory).
Notes
I have a few extra notes:
1. Python starts counting at 0 (like C/C++), not at 1 (like Fortran). So you probably want to set:
currentLine = 0
2. When you read a file, the newline character \n is not stripped, so you have to strip it yourself (with strip) or account for it when comparing strings (e.g. using startswith). For example:
passwordChecking == passwordCracking
will likely always return False, as passwordChecking contains \n and passwordCracking very likely doesn't.
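Putting these notes together, a minimal sketch of the search might look like this. The function name find_password and the in-memory word list are assumptions made for illustration; in the real program the list would come from f.readlines() as in option 1.

```python
def find_password(lines, target):
    """Return the 1-based attempt number at which `target` matches, or None."""
    for attempt, line in enumerate(lines, 1):
        # Drop the trailing newline before comparing
        if line.rstrip("\n") == target:
            return attempt
    return None


# Hypothetical word list, shaped like the result of f.readlines()
word_list = ["123testing\n", "testing321\n"]
print(find_password(word_list, "123testing"))  # 1
print(find_password(word_list, "testing321"))  # 2
print(find_password(word_list, "missing"))     # None
```

A single pass over the list replaces the while loop and the line counter, so the file position never needs to be reset.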
Disclaimer
I haven't tried the code, nor my suggestions, so there might be other bugs lurking around.
I will delete this answer once the OP understands the indentation problem, or once I understand the intention of the code.
for i, line in enumerate(wordListFile):
    if(i == currentLine):
        line = line
passwordChecking = line
#rest of the code.
Here your code is outside of the for loop, so only the last line is cached. Indent it so that it runs inside the loop:
for i, line in enumerate(wordListFile):
    if(i == currentLine):
        line = line
    passwordChecking = line
    #rest of the code.
I'm trying to create a function that accepts a file as input and prints the number of lines that are full-line comments (i.e. the line begins with # followed by some comment text).
For example a file that contains say the following lines should print the result 2:
abc
#some random comment
cde
fgh
#another random comment
So far I have tried something along these lines, but it is just not picking up the hash symbol:
infile = open("code.py", "r")
line = infile.readline()
def countHashedLines(filename) :
    while line != "" :
        hashes = '#'
        value = line
        print(value) #here you will get all
        #if(value == hashes): tried this but just wasn't working
        #    print("hi")
        for line in value:
            line = line.split('#', 1)[1]
            line = line.rstrip()
            print(value)
        line = infile.readline()
    return()
Thanks in advance,
Jemma
I re-worded a few statements for ease of use (subjective) but this will give you the desired output.
def countHashedLines(lines):
    tally = 0
    for line in lines:
        if line.startswith('#'):
            tally += 1
    return tally
infile = open('code.py', 'r')
all_lines = infile.readlines()
num_hash_nums = countHashedLines(all_lines) # <- 2
infile.close()
...or if you want a compact and clean version of the function...
def countHashedLines(lines):
    return len([line for line in lines if line.startswith('#')])
I would pass the file through standard input
import sys
count = 0
for line in sys.stdin:  # Note: you could also open the file and iterate through it
    if line[0] == '#':  # Every time a line begins with #
        count += 1  # Increment
print(count)
Here is another solution that uses regular expressions and will detect comments that have white space in front.
import re
def countFullLineComments(infile) :
    count = 0
    p = re.compile(r"^\s*#.*$")
    for line in infile.readlines():
        m = p.match(line)
        if m:
            count += 1
            print(m.group(0))
    return count
infile = open("code.py", "r")
print(countFullLineComments(infile))
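If a regular expression feels heavy for this, leading whitespace can also be handled with lstrip. The sketch below assumes a hypothetical function name count_full_line_comments and uses io.StringIO as a stand-in for an open file.

```python
import io


def count_full_line_comments(lines):
    """Count lines whose first non-blank character is '#'."""
    return sum(1 for line in lines if line.lstrip().startswith("#"))


# io.StringIO behaves like an open text file for iteration purposes
sample = io.StringIO(
    "abc\n"
    "#some random comment\n"
    "cde\n"
    "    # indented comment\n"
    "fgh\n"
    "#another random comment\n"
)
print(count_full_line_comments(sample))  # 3
```

The generator expression avoids building an intermediate list, so it also works on large files read line by line.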
I am still struggling with copying and replacing lines in Python (see my earlier question). Basically, I want to count the occurrences of a pattern within a section and update that count in a line. I think I have found the problem: I call a sub-function that iterates over the same file as the main function, and the two iterations interfere with each other. I am pretty new to programming, and I don't know another way to do this copy-count-replace-copy thing. Any suggestions or hints are welcome.
Here is the part of the code I have so far:
# sum number of keyframes
def sumKeys (sceneObj, objName):
    sceneObj.seek(0)
    block = []
    Keys = ""
    for line in sceneObj:
        if line.find("ObjectAlias " + objName + "\n") != -1:
            for line in sceneObj:
                if line.find("BeginKeyframe") != -1:
                    for line in sceneObj:
                        if line.find("default") != -1:
                            block.append(line.rstrip())
                            Keys = len(block)
                        elif line.find("EndKeyframe") != -1:
                            break
                    break
            break
    return (Keys)
# renew number of keyframes
def renewKeys (sceneObj, objName):
    sceneObj.seek(0)
    newscene = ""
    item = []
    for line in sceneObj:
        newscene += line
        for obj in objName:
            if line.find("ObjectAlias " + obj + "\n") != -1:
                for line in sceneObj:
                    if line.find("EndKeyframe") != -1:
                        newscene += line
                        break
                    if line.find("BeginKeyframe") != -1:
                        item = line.split()
                        newscene += item[0] + " " + str(sumKeys(sceneObj, obj)) + " " + item[-1] + "\n"
                        continue
                    else:
                        newscene += line
    return (newscene)
Original:
lines
BeginObjects
lines
ObjectAlias xxx
lines
BeginKeyframe 34 12 ----> 34 is what I want to replace
lines
EndObject
BeginAnotherObjects
...
Goal:
lines
BeginObjects
lines
ObjectAlias xxx
lines
BeginKeyframe INT 12 ---->INT comes from sumKeys function
lines
EndObject
BeginAnotherObjects
...
You can use tell and seek to move inside a file, so to do what you want to do, you could use something like this, which I hacked together:
import re
# so, we're looking for the object 'HeyThere'
objectname = 'HeyThere'
with open('input.txt', 'r+') as f:
    # remember where each line starts before reading it
    pos = f.tell()
    line = f.readline()
    found = False
    while line:
        # we only want to alter the part with the
        # right ObjectAlias, so we use the 'found' flag
        if 'ObjectAlias ' + objectname in line:
            found = True
        if 'EndObject' in line:
            found = False
        if found and 'BeginKeyframe' in line:
            # we found the Keyframe part, so we read all lines
            # until EndKeyframe and count each line with 'default'
            # (the sub_line check also stops at the end of the file)
            sub_line = f.readline()
            frames = 0
            while sub_line and 'EndKeyframe' not in sub_line:
                if 'default' in sub_line:
                    frames += 1
                sub_line = f.readline()
            # since we want to override the 'BeginKeyframe', we
            # have to move back in the file to before this line
            f.seek(pos)
            # now we read the rest of the file, but we skip the
            # old 'BeginKeyframe' line we want to replace
            f.readline()
            rest = f.read()
            # we jump back to the right position again
            f.seek(pos)
            # and we write our new 'BeginKeyframe' line
            f.write(re.sub(r'\d+', str(frames), line, count=1))
            # and write the rest of the file
            f.write(rest)
            f.truncate()
            # nothing to do here anymore, just quit the loop
            break
        # before reading a new line, we keep track
        # of our current position in the file
        pos = f.tell()
        line = f.readline()
The comments pretty much explain what's going on.
Given an input file like
foo
bar
BeginObject
something
something
ObjectAlias NotMe
lines
more lines
BeginKeyframe 22 12
foo
bar default
foo default
bar default
EndKeyframe
EndObject
foo
bar
BeginObject
something
something
ObjectAlias HeyThere
lines
more lines
BeginKeyframe 43243 12
foo
bar default
foo default
bar default
foo default
bar default
foo default
bar
EndKeyframe
EndObject
it will replace the line
BeginKeyframe 43243 12
with
BeginKeyframe 6 12
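If the scene file fits in memory, the same replacement can be done on a list of lines, without tell and seek. This is a swapped-in alternative rather than the method above, sketched under assumptions: the name renew_keyframe_count is hypothetical, and the file layout is assumed to match the sample input (ObjectAlias, BeginKeyframe, EndKeyframe, EndObject markers).

```python
import re


def renew_keyframe_count(lines, objectname):
    """Return a copy of `lines` in which the first number on the BeginKeyframe
    line of object `objectname` is replaced by its count of 'default' lines."""
    out = list(lines)
    in_object = False
    begin_index = None  # index of the BeginKeyframe line currently being counted
    frames = 0
    for i, line in enumerate(out):
        if line.strip() == "ObjectAlias " + objectname:
            in_object = True
        elif "EndObject" in line:
            in_object = False
            begin_index = None
        elif in_object and "BeginKeyframe" in line:
            begin_index, frames = i, 0
        elif begin_index is not None and "EndKeyframe" in line:
            # Replace only the first run of digits on the BeginKeyframe line
            out[begin_index] = re.sub(r"\d+", str(frames), out[begin_index], count=1)
            begin_index = None
        elif begin_index is not None and "default" in line:
            frames += 1
    return out


# Hypothetical, shortened version of the sample input
scene = [
    "ObjectAlias NotMe\n", "BeginKeyframe 22 12\n", "bar default\n",
    "EndKeyframe\n", "EndObject\n",
    "ObjectAlias HeyThere\n", "BeginKeyframe 43243 12\n", "foo\n",
    "bar default\n", "foo default\n", "EndKeyframe\n", "EndObject\n",
]
updated = renew_keyframe_count(scene, "HeyThere")
print(updated[6], end="")  # BeginKeyframe 2 12
```

To apply it to a file, read the lines with readlines(), call the function, and write the result back with writelines() after reopening the file in 'w' mode.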