Python - How to stop the loop - python

I have a script that reads files called source1.html, source2.html, source3.html and so on, but when it can't find the next file (because it doesn't exist) it gives me an error. There can be any number of sourceX.html files, so I need something that says: if the next sourceX.html file cannot be found, stop the loop.
Traceback (most recent call last):
  File "main.py", line 14, in <module>
    file = open(filename, "r")
IOError: [Errno 2] No such file or directory: 'source4.html'
How can I stop the script from looking for the next source file?
from bs4 import BeautifulSoup
import re
import os.path
n = 1
filename = "source" + str(n) + ".html"
savefile = open('OUTPUT.csv', 'w')
while os.path.isfile(filename):
    strjpgs = "Extracted Layers: \n \n"
    filename = "source" + str(n) + ".html"
    n = n + 1
    file = open(filename, "r")
    soup = BeautifulSoup(file, "html.parser")
    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)
    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)
    print(filename)
    print(strjpgs)
savefile.close()
print "done"

Use a try / except and break:
while os.path.isfile(filename):
    try:  # try to do this
        # <your code>
    except IOError:  # raised when the file is missing (FileNotFoundError is its Python 3 subclass)
        break  # exit the loop
The reason your code doesn't currently work is that the while condition checks whether the previous file exists, not the next one. Hence you could also do:
while True:
    strjpgs = "Extracted Layers: \n \n"
    filename = "source" + str(n) + ".html"
    if not os.path.isfile(filename):
        break
    # <rest of your code>
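Putting the second approach together, a minimal sketch might look like the following. The helper name `existing_sources` and its `directory` parameter are made up for illustration, and the parsing step from the question is left out.

```python
import os.path


def existing_sources(directory="."):
    """Return source1.html, source2.html, ... up to the first gap.

    Hypothetical helper: the name and the `directory` parameter are
    assumptions, not part of the original script.
    """
    found = []
    n = 1
    while True:
        filename = os.path.join(directory, "source" + str(n) + ".html")
        # Check the file we are about to use, not the previous one.
        if not os.path.isfile(filename):
            break
        found.append(filename)
        n = n + 1
    return found
```

Each returned path has already been checked, so the rest of the loop body (open, parse, write) can run on it without hitting the missing-file error.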

You can try opening the file, and break out of the while loop once you catch an IOError exception.
from bs4 import BeautifulSoup
import re
import os.path
n = 1
filename = "source" + str(n) + ".html"
savefile = open('OUTPUT.csv', 'w')
while os.path.isfile(filename):
    try:
        strjpgs = "Extracted Layers: \n \n"
        filename = "source" + str(n) + ".html"
        n = n + 1
        file = open(filename, "r")
    except IOError:
        print("file not found! breaking out of loop.")
        break
    soup = BeautifulSoup(file, "html.parser")
    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)
    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)
    print(filename)
    print(strjpgs)
savefile.close()
print "done"

I suggest using both os.path.exists() and os.path.isfile(); each returns True or False.
Use a with statement to open the file. It is the Pythonic way to open files,
and it is the form most experienced Python programmers prefer.
These are the contents of my current working directory.
H:\RishikeshAgrawani\Projects\Stk\ReadHtmlFiles>dir
 Volume in drive H is New Volume
 Volume Serial Number is C867-828E
 Directory of H:\RishikeshAgrawani\Projects\Stk\ReadHtmlFiles
11/05/2018  16:12    <DIR>          .
11/05/2018  16:12    <DIR>          ..
11/05/2018  15:54               106 source1.html
11/05/2018  15:54               106 source2.html
11/05/2018  15:54               106 source3.html
11/05/2018  16:12                 0 stopReadingIfNot.md
11/05/2018  16:11               521 stopReadingIfNot.py
               5 File(s)            839 bytes
               2 Dir(s)  196,260,925,440 bytes free
The Python code below shows how to read the files source1.html, source2.html, source3.html and stop when there are no more files of the form sourceX.html (where X is 1, 2, 3, 4, ... etc.).
Sample code:
import os
n = 1
html_file_name = 'source%d.html'
# Check that sourceX.html exists and is a regular file (not a directory).
# If it is, perform the operation (read/write etc.) on the file.
while os.path.isfile(html_file_name % (n)) and os.path.exists(html_file_name % (n)):
    print "Reading ", html_file_name % (n)
    # The best way (Pythonic way) to open file
    # You don't need to bother about closing the file
    # It will be taken care by with statement
    with open(html_file_name % (n), "r") as file:
        # Make sure it works
        print html_file_name % (n), " exists\n"
    n += 1
Output:
H:\RishikeshAgrawani\Projects\Stk\ReadHtmlFiles>python stopReadingIfNot.py
Reading source1.html
source1.html exists
Reading source2.html
source2.html exists
Reading source3.html
source3.html exists
Based on the above logic, you can modify your code so that it stops at the first missing file.
Thanks.

This appears to be a sequence error. Let's look at a small fragment of your code, specifically lines dealing with filename:
filename = "source" + str(n) + ".html"
while os.path.isfile(filename):
    filename = "source" + str(n) + ".html"
    n = n + 1
    file = open(filename, "r")
You're generating the next filename before you open the file (or really, checking the old filename then opening a new one). It's a little hard to see because you're really updating n while filename holds the previous number, but if we look at them in sequence it pops out:
n = 1
filename = "source1.html" # before loop
while os.path.isfile(filename):
    filename = "source1.html" # first time inside loop
    n = 2
    open(filename)
while os.path.isfile(filename): # second time in loop - still source1
    filename = "source2.html"
    n = 3
    open(filename) # We haven't checked if this file exists!
We can fix this a few ways. One is to move the whole update (n first, then filename) to the end of the loop. Another is to let the loop mechanism update n, which is a good deal easier (the real fix here is that we only use one filename value in each iteration of the loop):
import itertools
for n in itertools.count(1):
    filename = "source{}.html".format(n)
    if not os.path.isfile(filename):
        break
    file = open(filename, "r")
    #...
At the risk of looking rather obscure, we can also express the steps functionally (I'm using six here to avoid a difference between Python 2 and 3; Python 2's map wouldn't finish):
import os.path
from six.moves import map
from itertools import count, takewhile
numbers = count(1)
filenames = map('source{}.html'.format, numbers)
existingfiles = takewhile(os.path.isfile, filenames)
for filename in existingfiles:
    file = open(filename, "r")
    #...
Other options include iterating over the numbers alone and using break when isfile returns False, or simply catching the exception when open fails (eliminating the need for isfile entirely).
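The last option, catching the exception from open instead of calling isfile, could be sketched roughly like this. The function name `read_sources` and its `directory` parameter are hypothetical, and the sketch simply collects the file contents rather than parsing them.

```python
import itertools
import os.path


def read_sources(directory="."):
    """Read source1.html, source2.html, ... until open() fails.

    Hypothetical helper; returns a list of (filename, text) pairs.
    """
    results = []
    for n in itertools.count(1):
        filename = os.path.join(directory, "source{}.html".format(n))
        try:
            # IOError is an alias of OSError on Python 3, so this
            # also catches FileNotFoundError there.
            with open(filename, "r") as handle:
                results.append((filename, handle.read()))
        except IOError:
            break  # the first file that cannot be opened ends the run
    return results
```

Note that this stops on any IOError, not only a missing file, so a permissions problem would also end the loop quietly.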

Related

Splitting a big HTML file into multiple smaller files

I extracted a full Discord dm as an HTML file.
It's too big to open as it is, so I wanted to split it into multiple files. I found another post here that had a solution, but I can't figure out what I'm doing wrong: the script opens for a second and then closes with no results.
Here is the code I'm using.
from __future__ import print_function
from lxml import etree, html
from io import StringIO
from pathlib import Path
parser = html.HTMLParser()
header= "<html><body>\n"
footer = "</body></html>\n"
i = 1
fi = 1
messagesPerFile = 3
file = "DMPim.html"
buffer = ""
try:
    tree = html.parse(StringIO(Path(file).read_text()), parser)
    try:
        # target and print all <div class="collectionDiv"> elements and subelements
        for element in tree.xpath('//div[@class="chatlog__message-group"]'):
            buffer += etree.tostring(element, pretty_print=True).decode("utf-8")
            if i % messagesPerFile == 0 and i > 0:
                f = open("chat" + str(fi) + ".html", "w+")
                f.write(header + buffer + footer)
                f.close()
                fi+=1
                buffer = ""
            i+=1
        # if remaining elements are still in the buffer, write them out
        if buffer != "":
            f = open("chat" + str(fi) + ".html", "w+")
            f.write(header + buffer + footer)
            f.close()
    except etree.XPathEvalError as details:
        print ('ERROR: XPath expression', details.error_log)
except etree.XMLSyntaxError as details:
    print ('ERROR: parser', details.error_log)
I'm pretty new in all this and wanted this as sort of start up project, so excuse me if I'm asking obvious things.

Python function within a loop iterating over text files only works on the first file

I'm writing a simple script which loops over some text files and uses a function that should replace some strings by looking them up in a .csv file (every row has the word to replace and the word I want there instead).
Here is my simple code:
import os
import re
import csv
def substitute_tips(table, tree_content):
    count = 0
    for l in table:
        print("element of the table", l[1])
        reg_tree = re.search(l[1],tree_content)
        if reg_tree is not None:
            #print("match in the tree: ",reg_tree.group())
            tree_content = tree_content.replace(reg_tree.group(), l[0])
            count = count + 1
        else:
            print("Not found: ",l[1])
            tree_content = tree_content
    print("Substitutions done: ",count)
    return(tree_content)
path=os.getcwd()
table_name = "162_table.csv"
table = open(table_name)
csv_table = csv.reader(table, delimiter='\t')
for root, dirs, files in os.walk(path, topdown=True):
    for name in files:
        if name.endswith(".tree"):
            print(Fore.GREEN + "Working on treefile", name)
            my_tree = open(name, "r")
            my_tree_content = my_tree.read()
            output_tree = substitute_tips(csv_table, my_tree_content)
            output_file = open(name.rstrip("tree") + "SPECIES_NAME.tre", "w")
            output_file.write(output_tree)
            output_file.close()
        else:
            print(Fore.YELLOW + name ,Fore.RED + "doesn't end in .tree")
It's probably very easy, but I'm a newbie.
Thanks!
The files list returned by os.walk contains only the file names rather than the full path names. You should join root with the file names instead to be able to open them:
Change:
    my_tree = open(name, "r")
    ...
    output_file = open(name.rstrip("tree") + "SPECIES_NAME.tre", "w")
to:
    my_tree = open(os.path.join(root, name), "r")
    ...
    output_file = open(os.path.join(root, name.rstrip("tree") + "SPECIES_NAME.tre"), "w")
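A small sketch of the path fix follows, together with one further point that may explain why only the first file gets substitutions: a csv.reader object is a one-shot iterator, so after the first full pass it yields no more rows. Reading the rows into a list once lets every file reuse them. The helper names `load_rows` and `tree_files` are invented for this example.

```python
import csv
import os


def load_rows(table_path):
    # csv.reader is consumed after one pass, so keep the rows in a
    # list that can be iterated again for every tree file.
    with open(table_path, newline="") as table:
        return list(csv.reader(table, delimiter="\t"))


def tree_files(path):
    # Join root and name so files in subdirectories open correctly.
    for root, dirs, files in os.walk(path, topdown=True):
        for name in sorted(files):
            if name.endswith(".tree"):
                yield os.path.join(root, name)
```

With these, the main loop would call `substitute_tips(rows, content)` for each path from `tree_files(path)`, where `rows = load_rows(table_name)` is computed once before the loop.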

How to send the data written to a file (.csv) to a folder on my desktop (directory)?

Essentially the data is temperatures from 4 different states over the course of 12 months, so there is 48 files to be populated into my folder on my desktop directory. But I am not sure how to take the data being pulled from the web and then take the files being saved in my program to be sent to the directory of my desktop. That's what I am confused about, how to take the files being created on in my program and send them to a folder on my desktop.
I am copying the data from the web, cleaning it up, then saving it into a file, then taking that file and wanting to save it to a folder on my desktop.
Here is the code:
import urllib
def accessData(Id, Month):
    url = "https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=" + str(Id) + "&year=2017&month=" + str(Month) + "&graphspan=month&format=1"
    infile = urllib.urlopen(url)
    readLineByLine = infile.readlines()
    infile.close()
    return readLineByLine
f = open('stations.csv', 'r')
for line in f.readlines():
    vals = line.split(',')
    for j in range(1,13): # accessing months here from 1 to 12, b/c 13 exclusive
        data = accessData(line, j)
        filename = "{}-0{}-2017.csv".format(vals[0], j)
        print(str(filename))
        row_count = len(data)
        for i in range(2, row_count):
            if(data[i] != '<br>\n' and data[i] != '\n'):
                writeFile = open(filename, 'w')
                writeFile.write(data[i])
                openfile = open(Desktop, writeFile , 'r')
                file.close()
Have you tried running the script from your desktop? It looks like you haven't specified a directory, so the files are saved to the current working directory; running the script from your desktop should put the results there.
Alternatively, you could use the built-in os library.
import os
os.getcwd() # to get the current working directory
os.chdir(pathname) # change your working directory to the path specified.
This would change your working directory to the place you want to save your files.
Also, in regards to the last four lines of your code. file is not open, so you cannot close this. Also, I do not believe you need the openfile statement.
writeFile = open(filename, 'w')
writeFile.write(data[i])
openfile = open(Desktop, writeFile , 'r')
file.close()
Try this instead.
with open(filename, 'w') as writeFile:
    for i in range(2, row_count):
        if(data[i] != '<br>\n' and data[i] != '\n'):
            writeFile.write(data[i])
Using this approach you shouldn't need to close the file. 'w' is to write as if a new file, change this to 'a' if you need to append to the file.
You just need to provide open() with the path to your destination file, rather than just the filename (which will otherwise be saved into your current working directory).
Try something like:
f = open('stations.csv', 'r')
target_dir = "/path/to/your/Desktop/folder/"
for line in f.readlines():
    ...
    # We can open the file outside your inner "row" loop
    # using the combination of the path to your Desktop
    # and your filename
    with open(target_dir+filename, 'w') as writeFile:
        for i in range(2, row_count):
            if(data[i] != '<br>\n' and data[i] != '\n'):
                writeFile.write(data[i])
    # The "writeFile" object will close automatically outside the
    # "with ... " block
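As a rough, self-contained sketch of the same idea, the writing step can be pulled into a function that builds the full path with os.path.join and creates the folder if it is missing. The name `write_rows` and its parameters are assumptions for illustration; the row filter is copied from the question.

```python
import os


def write_rows(target_dir, filename, data):
    """Write the cleaned rows of `data` to target_dir/filename.

    Hypothetical helper: skips the first two rows and the '<br>' and
    blank lines, as the loop in the question does.
    """
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    full_path = os.path.join(target_dir, filename)
    with open(full_path, "w") as out:
        for row in data[2:]:
            if row != "<br>\n" and row != "\n":
                out.write(row)
    return full_path
```

For a Desktop folder, `target_dir` could be something like `os.path.join(os.path.expanduser("~"), "Desktop", "weather")`, assuming the Desktop lives in the home directory; the folder name "weather" is only an example.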
As others have mentioned, you could approach this two different ways:
1) Run the script directly from the directory to which you would like to save the files. Then you would just need to specify the full path to the .csv file you are reading.
2) You could provide the full path to where you would like to save the files when you write them, however this seems more intensive and unnecessary.
On another note, when opening files for the purpose of reading/writing them, use with to simply open the file for as long as you need it, then when you exit the with statement, the file will automatically be closed.
Here is an example of Option 1 with some clean-up:
import urllib
def accessData(Id, Month):
    url = "https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=" + str(Id) + "&year=2017&month=" + str(Month) + "&graphspan=month&format=1"
    infile = urllib.urlopen(url)
    readLineByLine = infile.readlines()
    infile.close()
    return readLineByLine
with open('Path to File' + 'stations.csv', 'r') as f:
    for line in f.readlines():
        vals = line.split(',')
        for j in range(1,13):
            data = accessData(line, j)
            filename = "{}-0{}-2017.csv".format(vals[0], j)
            with open(filename, 'w') as myfile:
                for i in range(2, len(data)):
                    if data[i]!='<br>\n' and data[i]!='\n':
                        myfile.write(data[i])
            print(filename + ' - Completed')

Python Writing to txt error

I'm trying to write different things to a text file in a while loop, but it only writes once. I want to write something to unmigrated.txt
import urllib.request
import json
Txtfile = input("Name of the TXT file: ")
fw = open(Txtfile + ".txt", "r")
red = fw.read()
blue = red.split("\n")
i=0
while i<len(blue):
    try:
        url = "https://api.mojang.com/users/profiles/minecraft/" + blue[i]
        rawdata = urllib.request.urlopen(url)
        newrawdata = rawdata.read()
        jsondata = json.loads(newrawdata.decode('utf-8'))
        results = jsondata['id']
        url_uuid = "https://sessionserver.mojang.com/session/minecraft/profile/" + results
        rawdata_uuid = urllib.request.urlopen(url_uuid)
        newrawdata_uuid = rawdata_uuid.read()
        jsondata_uuid = json.loads(newrawdata_uuid.decode('utf-8'))
        try:
            results = jsondata_uuid['legacy']
            print (blue[i] + " is " + "Unmigrated")
            wf = open("unmigrated.txt", "w")
            wring = wf.write(blue[i] + " is " + "Unmigrated\n")
        except:
            print(blue[i] + " is " + "Migrated")
    except:
        print(blue[i] + " is " + "Not-Premium")
    i+=1
You keep reopening the file with w inside the loop, which overwrites it each time, so you only see the last data that was written. Either open the file once outside the loop or open it with a to append. Opening once is the simplest approach. You can also use range instead of your while or, better still, just iterate over the list:
with open("unmigrated.txt", "w") as f: # with closes your file automatically
    for ele in blue:
        .....
Also, wring = wf.write(blue[i] + " is " + "Unmigrated\n") sets wring to whatever write returns (in Python 3, the number of characters written), so it is probably not of any real use.
Lastly, using a bare except is rarely a good idea; catch the specific exceptions you expect and log or at least print when you get an error.
Using the requests library, I would break up your code doing something like:
import requests
def get_json(url):
    try:
        rawdata = requests.get(url)
        return rawdata.json()
    except requests.exceptions.RequestException as e:
        print(e)
    except ValueError as e:
        print(e)
    return {}
txt_file = input("Name of the TXT file: ")
with open(txt_file + ".txt") as fw, open("unmigrated.txt", "w") as f: # with closes your file automatically
    for line in map(str.rstrip, fw): # remove newlines
        url = "https://api.mojang.com/users/profiles/minecraft/{}".format(line)
        results = get_json(url).get("id")
        if not results:
            continue
        url_uuid = "https://sessionserver.mojang.com/session/minecraft/profile/{}".format(results)
        results = get_json(url_uuid).get('legacy')
        print("{} is Unmigrated".format(line))
        f.write("{} is Unmigrated\n".format(line))
I am not sure where 'legacy' fits into the code, that logic I will leave to you. You can also iterate directly over the file object so you can forget about splitting the lines into blue.
Try:
with open("filename", "w") as f:
    f.write("your content")
But that will overwrite all contents of the file.
Instead, if you want to append to the file use:
with open("filename", "a") as f:
    f.write("your content")
If you choose not to use the with syntax, remember to close the file.
Read more here:
https://docs.python.org/2/library/functions.html#open
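To make the difference concrete, here is a small sketch comparing the two patterns. Both function names are made up for this example; the second one reproduces the behaviour described in the question.

```python
def write_once(path, lines):
    # Open a single time, before the loop, so earlier lines survive.
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")


def write_reopening(path, lines):
    # Reopening with "w" truncates the file on every iteration,
    # so only the last line is left at the end.
    for line in lines:
        with open(path, "w") as handle:
            handle.write(line + "\n")
```

Changing the mode in `write_reopening` from "w" to "a" would keep every line, at the cost of also keeping whatever was in the file from earlier runs.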

How to use Python to find a string in a line and change the text n lines after the string

I need to find every instance of "translate" in a text file and replace a value 4 lines after finding the text:
"(many lines)
}
}
translateX xtran
{
keys
{
k 0 0.5678
}
}
(many lines)"
The value 0.5678 needs to be 0. It will always be 4 lines below the "translate" string
The file has up to about 10,000 lines.
example text file name: 01F.pz2.
I'd also like to cycle through the folder and repeat the process for every file with the pz2 extension (up to 40).
Any help would be appreciated!
Thanks.
I'm not quite sure about the logic for replacing 0.5678 in your file, therefore I use a function for that - change it to whatever you need, or explain more in details what you want. Last number in line? only floating-point number?
Try:
import os
dirname = "14432826"
lines_distance= 4
def replace_whatever(line):
    # Put your logic for replacing here
    return line.replace("0.5678", "0")
for filename in filter(lambda x:x.endswith(".pz2") and not x.startswith("m_"), os.listdir(dirname)):
    print filename
    with open(os.path.join(dirname, filename), "r") as f_in, open(os.path.join(dirname,"m_%s" % filename), "w") as f_out:
        replace_tasks = []
        for line in f_in:
            # search marker in line
            if line.strip().startswith("translate"):
                print "Found marker in", line,
                replace_tasks.append(lines_distance)
            # replace if necessary
            if len(replace_tasks)>0 and replace_tasks[0] == 0:
                del replace_tasks[0]
                print "line to change is", line,
                line_to_write = replace_whatever(line)
            else:
                line_to_write = line
            # Write to output
            f_out.write(line_to_write)
            # decrease counters
            for i, task in enumerate(replace_tasks):
                replace_tasks[i] -= 1
The comments within the code should help understanding. The main concept is the list replace_tasks that keeps record of when the next line to modify will come.
Remarks: Your code sample suggests that the data in your file is structured. It would definitely be safer to read this structure and work on it than to use a search-and-replace approach on a plain text file.
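The same bookkeeping can also be sketched as a function over a list of lines, which makes it easy to try out without touching any files. This is a variation on the answer, not the answer's own code: the name `zero_after_marker` and all of its parameters are hypothetical, and it replaces the whole target line with "k 0 0" on the assumption that every target line has the form "k 0 <value>".

```python
def zero_after_marker(lines, marker="translate", distance=4, new_text="k 0 0\n"):
    """Replace the line `distance` lines after each line starting with `marker`."""
    targets = set()
    result = []
    for index, line in enumerate(lines):
        if line.strip().startswith(marker):
            # Remember which later line has to be rewritten.
            targets.add(index + distance)
        if index in targets:
            # Keep the original leading spaces or tabs of the line.
            indent = line[: len(line) - len(line.lstrip(" \t"))]
            result.append(indent + new_text)
        else:
            result.append(line)
    return result
```

Reading a file with `readlines()`, passing the list through this function and writing the result back with `writelines()` would apply it to one .pz2 file.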
Thorsten, I renamed my original files to have the .old extension and the following code works:
import os
target_dir = "."
# cycle through files
for path, dirs, files in os.walk(target_dir):
    # file is the file counter
    for file in files:
        # get the filename and extension
        filename, ext = os.path.splitext(file)
        # see if the file is a pz2
        if ext.endswith('.old') :
            # rename the file to "old"
            oldfilename = filename + ".old"
            newfilename = filename + ".pz2"
            old_filepath = os.path.join(path, oldfilename)
            new_filepath = os.path.join(path, newfilename)
            # open the old file for reading
            oldpz2 = open (old_filepath,"r")
            # open the new file for writing
            newpz2 = open (new_filepath,"w")
            # reset changeline
            changeline = 0
            currentline = 0
            # cycle through old lines
            for line in oldpz2 :
                currentline = currentline + 1
                if line.strip().startswith("translate"):
                    changeline = currentline + 4
                if currentline == changeline :
                    print >>newpz2," k 0 0"
                else :
                    print >>newpz2,line
