I keep getting the following error whenever there is a function call to xml(productline), but if I replace the function call with file = open('config\\' + productLine + '.xml','r'), it seems to work. Why?
def xml(productLine):
    with open('config\\' + productLine + '.xml','r') as f:
        return f.read()

def getsanityresults(productline):
    xmlfile=xml(productline) # replace with file = open('config\\' + productLine + '.xml','r')
    dom = minidom.parse(xmlfile)
    data=dom.getElementsByTagName('Sanity_Results')
    #print "DATA"
    #print data
    textnode = data[0].childNodes[0]
    testresults=textnode.data
    #print testresults
    for line in testresults.splitlines():
        #print line
        line = line.strip('\r,\n')
        #print line
        line = re.sub(r'(http://[^\s]+|//[^\s]+|\\\\[^\s]+)', r'\1', line)
        print line
        #print line
        resultslis.append(line)
    print resultslis
    return resultslis
Error:
Traceback (most recent call last):
  File "C:\Dropbox\scripts\announce_build_wcn\wcnbuild_release.py", line 910, in <module>
    main()
  File "C:\Dropbox\scripts\announce_build_wcn\wcnbuild_release.py", line 858, in main
    testresults=getsanityresults(pL)
  File "C:\Dropbox\scripts\announce_build_wcn\wcnbuild_release.py", line 733, in getsanityresults
    dom = minidom.parse(xmlfile)
  File "C:\python2.7.3\lib\xml\dom\minidom.py", line 1920, in parse
    return expatbuilder.parse(file)
  File "C:\python2.7.3\lib\xml\dom\expatbuilder.py", line 922, in parse
    fp = open(file, 'rb')
IOError: [Errno 2] No such file or directory: '<root>\n <PL name = "MSM8930.LA.2.0-PMIC-8917">\n
minidom.parse() expects either a filename or a file object as its argument, but you are passing the content of the file. Try this:
import os
from xml.dom import minidom
doc = minidom.parse(os.path.join('config', productline + '.xml'))
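A quick way to see the difference: minidom.parse() takes a filename or an open file object, while minidom.parseString() takes the XML text itself, so the original xml() helper would also work if its result were passed to parseString() instead. This is a minimal sketch for Python 3; the helper names and the sample document are made up for illustration (only the Sanity_Results tag comes from the question).

```python
import os
import tempfile
from xml.dom import minidom

# Hypothetical sample document; only the Sanity_Results tag comes from the question
SAMPLE = "<root><Sanity_Results>line one\nline two</Sanity_Results></root>"

def sanity_text_from_path(path_or_file):
    # parse() accepts a filename or an open file object
    dom = minidom.parse(path_or_file)
    return dom.getElementsByTagName('Sanity_Results')[0].childNodes[0].data

def sanity_text_from_string(xml_text):
    # parseString() accepts the XML content itself (what xml() returns)
    dom = minidom.parseString(xml_text)
    return dom.getElementsByTagName('Sanity_Results')[0].childNodes[0].data

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'sample.xml')
        with open(path, 'w') as f:
            f.write(SAMPLE)
        print(sanity_text_from_path(path).splitlines())      # ['line one', 'line two']
        print(sanity_text_from_string(SAMPLE).splitlines())  # ['line one', 'line two']
```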
Unless you have specific requirements that favor minidom, use xml.etree.cElementTree to work with XML in Python. It is more Pythonic, and lxml, which you might need in more complex cases, supports the same API, so you don't need to learn two.
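A minimal ElementTree sketch of the same extraction, assuming the results are the text of the first <Sanity_Results> element. It uses plain xml.etree.ElementTree, which on Python 3.3+ uses the C accelerator automatically (xml.etree.cElementTree was deprecated in 3.3 and removed in 3.9). The function name and the sample XML are hypothetical; for a file on disk, ET.parse(path).getroot() gives the same kind of root element as ET.fromstring().

```python
import xml.etree.ElementTree as ET

def sanity_results(xml_text):
    """Return the non-empty, stripped lines inside the first <Sanity_Results> element."""
    root = ET.fromstring(xml_text)
    node = root.find('.//Sanity_Results')  # first match anywhere below the root
    if node is None or node.text is None:
        return []
    return [line.strip() for line in node.text.splitlines() if line.strip()]

# Hypothetical sample input
sample = """<root>
  <PL name="example">
    <Sanity_Results>
      test_a PASS
      test_b FAIL
    </Sanity_Results>
  </PL>
</root>"""

print(sanity_results(sample))  # ['test_a PASS', 'test_b FAIL']
```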
I replace the function call with file = open('config\\' + productLine + '.xml','r'), it seems to work, why?
You've got two variables, with subtly different names:
xmlfile=xml(productline) // replace with file = open('config\\' + productLine + '.xml','r')
There's productline (lowercase l) and productLine (uppercase L).
If you use the same variable in both cases, you'll likely see more consistent results.
Related
I don't really know Python, so I'm asking experts for help. I have a simple script and I need to add this construct to it:
try:
except:
This is necessary so that the script ignores a missing 'file.txt' file instead of displaying an error.
If the file "file.txt" is missing, the script.py script displays the following error:
Version 1.2.1.
Traceback (most recent call last):
  File "script.py", line 10, in <module>
    with open("file.txt") as myfile, open("save.txt", 'a') as save_file:
FileNotFoundError: [Errno 2] No such file or directory: 'file.txt'
How can I make the script ignore the missing 'file.txt' and not print this "Traceback (most recent call last)" error?
Script code:
import sys

if __name__ == '__main__':
    if '-v' in sys.argv:
        print(f'Version 1.2.1.')
    h = format(0x101101, 'x')[2:]
    with open("file.txt") as myfile, open("save.txt", 'a') as save_file:
        for line in myfile:
            if h in line:
                save_file.write("Number = " + line + "")
                print("Number = " + line + "")
How do I add a try/except construct to it?
Thanks in advance for your help!
Put try: and except: around the code, and use pass in the except: block to ignore the error. Catching FileNotFoundError specifically, rather than using a bare except:, means other errors are still reported:
try:
    with open("file.txt") as myfile, open("save.txt", 'a') as save_file:
        for line in myfile:
            if h in line:
                save_file.write("Number = " + line + "")
                print("Number = " + line + "")
except FileNotFoundError:
    pass
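The same idea wrapped in a function, as a minimal sketch for Python 3 (FileNotFoundError does not exist in Python 2). The function and parameter names are hypothetical; needle plays the role of h from the question. Because file.txt is opened before save.txt in the with statement, a missing input file means save.txt is never created.

```python
import os
import tempfile

def copy_matching_lines(src, dst, needle):
    """Append every line of src that contains needle to dst and return the count.

    If src is missing, FileNotFoundError is swallowed and 0 is returned.
    """
    count = 0
    try:
        # src is opened first, so dst is never created when src is missing
        with open(src) as myfile, open(dst, 'a') as save_file:
            for line in myfile:
                if needle in line:
                    save_file.write("Number = " + line)
                    print("Number = " + line, end="")
                    count += 1
    except FileNotFoundError:
        pass  # ignore the missing input file, as the question asks
    return count

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, "file.txt")
        print(copy_matching_lines(missing, os.path.join(tmp, "save.txt"), "1101"))  # 0
```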
Start with try:, followed by the indented block of code you want to try. If the error is raised, execution jumps to the except: block, where you can do whatever you want.
try:
    with open("file.txt") as myfile, open("save.txt", 'a') as save_file:
        for line in myfile:
            if h in line:
                save_file.write("Number = " + line + "")
                print("Number = " + line + "")
except FileNotFoundError:
    print("The error was found!")
    # or run whatever other code you want here, maybe nothing (so pass),
    # maybe let the user know somehow, maybe do something else.
try:
    with open("file.txt") as myfile, open("save.txt", 'a') as save_file:
        for line in myfile:
            if h in line:
                save_file.write("Number = " + line + "")
                print("Number = " + line + "")
except FileNotFoundError:  # a missing file raises FileNotFoundError, not NameError
    print("file doesn't exist")
finally:
    print("regardless of the result of the try- and except blocks, this block will be executed")
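To see which blocks run when, here is a small sketch (Python 3; read_or_default is a hypothetical name) that records the clauses as they execute. It also shows the else: clause, which runs only when the try: block raised nothing.

```python
import os
import tempfile

def read_or_default(path, default=""):
    """Return (contents, events) where events records which clauses ran."""
    events = []
    try:
        with open(path) as f:
            data = f.read()
    except FileNotFoundError:  # raised when the file does not exist
        events.append("except")
        data = default
    else:
        events.append("else")  # runs only if the try block raised nothing
    finally:
        events.append("finally")  # runs in every case
    return data, events

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.txt")
        print(read_or_default(path))  # ('', ['except', 'finally'])
        with open(path, "w") as f:
            f.write("hello")
        print(read_or_default(path))  # ('hello', ['else', 'finally'])
```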
I downloaded the zip file https://clinicaltrials.gov/AllPublicXML.zip, which contains over 200k XML files (most are under 10 kB), to a directory I created on Ubuntu 16.04 (on DigitalOcean); see 'dirpath_zip' in the CODE. I'm trying to load all of these files into MongoDB, which is installed on the same machine as the zip file.
I ran the CODE below twice, and both times it failed when processing the 15,988th file.
I've googled around and read other posts about this error, but couldn't find a way to solve this particular issue. Honestly, I'm not sure what the problem really is... any help is much appreciated!
CODE:
import re
import sys  # needed by sys.exit() in timestamper()
import json
import zipfile
import pymongo
import datetime
import xmltodict
from bs4 import BeautifulSoup
from pprint import pprint as ppt

def timestamper(stamp_type="regular"):
    if stamp_type == "regular":
        timestamp = str(datetime.datetime.now())
    elif stamp_type == "filename":
        timestamp = str(datetime.datetime.now()).replace("-", "").replace(":", "").replace(" ", "_")[:15]
    else:
        sys.exit("ERROR [timestamper()]: unexpected 'stamp_type' (parameter) encountered")
    return timestamp

client = pymongo.MongoClient()
db = client['ctgov']
coll_name = "ts_"+timestamper(stamp_type="filename")
coll = db[coll_name]

dirpath_zip = '/glbdat/ctgov/all/alltrials_20180402.zip'
z = zipfile.ZipFile(dirpath_zip, 'r')

i = 0
for xmlfile in z.namelist():
    print(i, 'parsing:', xmlfile)
    if xmlfile == 'Contents.txt':
        print(xmlfile, '==> entering "continue"')
        continue
    else:
        soup = BeautifulSoup(z.read(xmlfile), 'lxml')
        json_study = json.loads(re.sub(r'\s', ' ', json.dumps(xmltodict.parse(str(soup.find('clinical_study'))))).strip())
        coll.insert_one(json_study)
        i+=1
ERROR MESSAGE:
Traceback (most recent call last):
  File "zip_to_mongo_alltrials.py", line 38, in <module>
    soup = BeautifulSoup(z.read(xmlfile), 'lxml')
  File "/usr/local/lib/python3.5/dist-packages/bs4/__init__.py", line 225, in __init__
    markup, from_encoding, exclude_encodings=exclude_encodings)):
  File "/usr/local/lib/python3.5/dist-packages/bs4/builder/_lxml.py", line 118, in prepare_markup
    for encoding in detector.encodings:
  File "/usr/local/lib/python3.5/dist-packages/bs4/dammit.py", line 264, in encodings
    self.chardet_encoding = chardet_dammit(self.markup)
  File "/usr/local/lib/python3.5/dist-packages/bs4/dammit.py", line 34, in chardet_dammit
    return chardet.detect(s)['encoding']
  File "/usr/lib/python3/dist-packages/chardet/__init__.py", line 30, in detect
    u.feed(aBuf)
  File "/usr/lib/python3/dist-packages/chardet/universaldetector.py", line 128, in feed
    if prober.feed(aBuf) == constants.eFoundIt:
  File "/usr/lib/python3/dist-packages/chardet/charsetgroupprober.py", line 64, in feed
    st = prober.feed(aBuf)
  File "/usr/lib/python3/dist-packages/chardet/hebrewprober.py", line 224, in feed
    aBuf = self.filter_high_bit_only(aBuf)
  File "/usr/lib/python3/dist-packages/chardet/charsetprober.py", line 53, in filter_high_bit_only
    aBuf = re.sub(b'([\x00-\x7F])+', b' ', aBuf)
  File "/usr/lib/python3.5/re.py", line 182, in sub
    return _compile(pattern, flags).sub(repl, string, count)
MemoryError
Try moving the reading of each file and the insert into the database into a separate function, so that the per-file objects go out of scope after each call.
Also call gc.collect() to force garbage collection.
import gc

def read_xml_insert(xmlfile):
    soup = BeautifulSoup(z.read(xmlfile), 'lxml')
    json_study = json.loads(re.sub(r'\s', ' ', json.dumps(xmltodict.parse(str(soup.find('clinical_study'))))).strip())
    coll.insert_one(json_study)

for xmlfile in z.namelist():
    print(i, 'parsing:', xmlfile)
    if xmlfile == 'Contents.txt':
        print(xmlfile, '==> entering "continue"')
        continue
    else:
        read_xml_insert(xmlfile)
        i += 1
        gc.collect()
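The traceback ends inside chardet, which BeautifulSoup calls to guess each file's encoding, so one option worth trying is to skip BeautifulSoup and parse each member straight from the archive. The sketch below swaps in the standard library's xml.etree.ElementTree for BeautifulSoup, assumes every member other than Contents.txt is well-formed XML, and is not guaranteed to avoid the MemoryError; iter_studies is a hypothetical name. Each yielded root would still need converting to a dict before coll.insert_one().

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET

def iter_studies(zip_path, skip=("Contents.txt",)):
    """Yield (member name, parsed root element) for each XML member, one at a time."""
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            if name in skip or not name.endswith(".xml"):
                continue
            with z.open(name) as member:
                # ET.parse reads the raw bytes and honours the XML encoding
                # declaration, so no chardet-style encoding guessing happens
                yield name, ET.parse(member).getroot()

if __name__ == '__main__':
    # Build a tiny hypothetical archive to show the generator in use
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.zip")
        with zipfile.ZipFile(path, "w") as z:
            z.writestr("Contents.txt", "index")
            z.writestr("NCT1.xml", "<clinical_study><id>1</id></clinical_study>")
        for name, root in iter_studies(path):
            print(name, root.tag)  # NCT1.xml clinical_study
```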
I'm just starting to learn Python and have a text file that looks like this:
Hello
World
Hello
World
And I want to add the numbers '55' to the beginning and end of every line that starts with 'Hello',
the numbers '66' to the beginning and end of every line that starts with 'World',
and so on.
So my final file should look like this:
55Hello55
66World66
55Hello55
66World66
I'm reading the file in all at once, storing it in a string, and then trying to add the numbers accordingly:
fp = open("test.txt","r")
strHolder = fp.read()
print(strHolder)
if 'Hello' in strHolder:
    strHolder = '55' + strHolder + '55'
if 'World' in strHolder:
    strHolder = '66' + strHolder + '66'
print(strHolder)
fp.close()
However, '55' and '66' are always added to the front and end of the whole file, not to the front and end of the matching lines. This is the output I get:
6655Hello
World
Hello
World
5566
Any help would be much appreciated.
You are reading the whole file at once with .read().
You can read it line by line in a for loop.
new_file = []
fp = open("test.txt", "r")
for line in fp:
    line = line.rstrip("\n")  # The string ends in a newline
                              # str.rstrip("\n") removes newlines at the end
    if "Hello" in line:
        line = "55" + line + "55"
    if "World" in line:
        line = "66" + line + "66"
    new_file.append(line)
fp.close()
new_file = "\n".join(new_file)
print(new_file)
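You could also do it all at once by reading the whole file and splitting it on "\n" (newline):
new_file = []
fp = open("test.txt")
fp_read = fp.read()
fp.close()
for line in fp_read.split("\n"):
    if "Hello" in line:
        line = "55" + line + "55"
    if "World" in line:
        line = "66" + line + "66"
    new_file.append(line)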
But this loads the whole file into memory at once, while the for loop reads it line by line (so this may not work well for very large files).
With this code, any line containing "Hello" gets "55" before and after it (even a line like " sieohfoiHellosdf "), and likewise "66" for "World". A line containing both (e.g. "Hello, World!" or "asdifhoasdfhHellosdjfhsodWorldosadh") gets "6655" before it and "5566" after it.
As a side note: you should use with to open a file, as it makes sure the file is closed afterwards.
new_file = []
with open("test.txt") as fp:  # "r" mode is the default
    for line in fp:
        line = line.rstrip("\n")
        if "Hello" in line:
            line = "55" + line + "55"
        if "World" in line:
            line = "66" + line + "66"
        new_file.append(line)
new_file = "\n".join(new_file)
print(new_file)
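The question asks for lines that start with "Hello" or "World", whereas the answers above test with in. A minimal sketch using str.startswith, with a dict so that the "and so on" cases are just more entries; MARKERS and wrap_lines are hypothetical names. To use it on the file, pass it the contents, e.g. with open("test.txt") as fp: print(wrap_lines(fp.read())).

```python
# Hypothetical mapping: line prefix -> marker to put around the line
MARKERS = {"Hello": "55", "World": "66"}

def wrap_lines(text, markers=MARKERS):
    """Surround each line that starts with a known prefix with its marker."""
    out = []
    for line in text.splitlines():
        for prefix, marker in markers.items():
            if line.startswith(prefix):
                line = marker + line + marker
                break  # apply only the first matching prefix
        out.append(line)
    return "\n".join(out)

print(wrap_lines("Hello\nWorld\nHello\nWorld"))
# 55Hello55
# 66World66
# 55Hello55
# 66World66
```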
You need to iterate over each line of the file to get the desired result. In your code you are using .read(); use .readlines() instead to get a list of all lines.
Below is the sample code:
lines = []
with open("test.txt", "r") as f:
    for line in f.readlines():  # <-- iterate over each line
        line = line.rstrip("\n")  # readlines() keeps the newline; remove it before wrapping
        if line.startswith("Hello"):  # <-- check if line starts with "Hello"
            line = "55{}55".format(line)
        elif line.startswith("World"):
            line = "66{}66".format(line)
        lines.append(line)
print("\n".join(lines))
Why use with? From the Python docs:
The 'with' statement clarifies code that previously would use try...finally blocks to ensure that clean-up code is executed. In this section, I’ll discuss the statement as it will commonly be used. In the next section, I’ll examine the implementation details and show how to write objects for use with this statement.
The 'with' statement is a control-flow structure whose basic structure is:
with expression [as variable]:
    with-block
The expression is evaluated, and it should result in an object that supports the context management protocol (that is, has __enter__() and __exit__() methods).
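A minimal sketch of that protocol (Tracked is a hypothetical class): __enter__() runs when the with block starts and its return value is bound to the as variable, and __exit__() runs when the block ends, even if it raised. File objects implement __exit__() to close the file, which is why with guarantees the clean-up.

```python
class Tracked:
    """Tiny context manager that records when __enter__ and __exit__ run."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is what gets bound by "as variable"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")  # clean-up runs even if the with-block raised
        return False  # returning False lets the exception propagate

t = Tracked()
try:
    with t as variable:
        variable.events.append("body")
        raise ValueError("boom")
except ValueError:
    pass
print(t.events)  # ['enter', 'body', 'exit']
```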
Once you have read the file:
read_file = read_file.replace('Hello', '55Hello55')
It'll replace every Hello with 55Hello55.
And use with open("test.txt", "r") as file_handler: to open the file.
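Note that str.replace() is case-sensitive and replaces every occurrence, including ones in the middle of a line. If the replacement should only apply to lines that start with the word, one option (swapping in the re module; wrap_starting_lines is a hypothetical name) is a pattern anchored at the start of each line:

```python
import re

def wrap_starting_lines(text, word, marker):
    """Put marker before and after every line of text that starts with word."""
    # ^ and $ match at each line boundary because of re.MULTILINE, and '.' does
    # not cross newlines, so each match covers exactly one line
    pattern = re.compile(r"^(" + re.escape(word) + r".*)$", re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in the marker
    return pattern.sub(lambda m: marker + m.group(1) + marker, text)

text = "Hello\nWorld\nsay Hello\nWorld"
text = wrap_starting_lines(text, "Hello", "55")
text = wrap_starting_lines(text, "World", "66")
print(text)
# 55Hello55
# 66World66
# say Hello
# 66World66
```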
To read a text file, I recommend the following approach, which is compatible with Python 2 and 3:
import io
with io.open("test", mode="r", encoding="utf8") as fd:
...
Here I assume that your file uses UTF-8 encoding.
Using a with statement makes sure the file is closed at the end of reading, even if an error (an exception) occurs. To learn more about context managers, take a look at the contextlib library.
There are several ways to read a text file:
read the whole file with fd.read(), or
read line by line with a loop: for line in fd.
If you read the whole file, you'll need to split the lines (see str.splitlines). Here are the two solutions:
with io.open("test", mode="r", encoding="utf8") as fd:
    content = fd.read()
    for line in content.splitlines():
        if "Hello" in line:
            print("55" + line + "55")
        if "World" in line:
            print("66" + line + "66")
Or
with io.open("test", mode="r", encoding="utf8") as fd:
    for line in fd:
        line = line.rstrip("\n")  # drop the trailing newline (the last line may not have one)
        if "Hello" in line:
            print("55" + line + "55")
        if "World" in line:
            print("66" + line + "66")
If you need to write the result to another file, you can open the output file in write mode and use print(thing, file=out) as follows:
with io.open("test", mode="r", encoding="utf8") as fd:
    # Use a different output name: opening "test" itself in "w" mode would truncate the input
    with io.open("test_out", mode="w", encoding="utf8") as out:
        for line in fd:
            line = line.rstrip("\n")
            if "Hello" in line:
                print("55" + line + "55", file=out)
            if "World" in line:
                print("66" + line + "66", file=out)
If you use Python 2, you'll need the following directive to use the print function:
from __future__ import print_function
OK, well, I have another question. I implemented the error checking, but for some reason it still isn't working. I still get a Python error instead of the error I just wrote in the program.
Traceback (most recent call last):
  File "E:/python/copyfile.py", line 31, in <module>
    copyFile()
  File "E:/python/copyfile.py", line 8, in copyFile
    file1 = open(source,"r")
IOError: [Errno 2] No such file or directory: 'C:/Users/Public/asdf.txt'
Check out the shutil module in the standard library:
shutil.copyfile(src, dst)
http://docs.python.org/2/library/shutil.html#shutil.copyfile
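On the error-checking question: the traceback appears whenever the failing open() call is not inside a try: block, or the except: clause names a different exception type. The code in the question isn't shown, so that's only the usual suspect. A minimal sketch combining shutil.copyfile() with a try/except (copy_file is a hypothetical name; it returns an error message instead of raising, and None on success):

```python
import os
import shutil
import tempfile

def copy_file(source, destination):
    """Copy source to destination; return an error message instead of raising."""
    try:
        shutil.copyfile(source, destination)
    except (IOError, OSError) as err:  # IOError is an alias of OSError on Python 3
        return "Could not copy {!r}: {}".format(source, err.strerror)
    return None

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, "asdf.txt")
        print(copy_file(missing, os.path.join(tmp, "copy.txt")))
```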
Alternatively, I would suggest writing your own:
import os
import hashlib

def md5ChkSum(_file):  # Calculates the MD5 checksum of a file
    with open(_file, 'rb') as fp:
        hash_obj = hashlib.md5()
        line = fp.readline()
        while line:
            hash_obj.update(line)
            line = fp.readline()
    return hash_obj.hexdigest()

def copier(_src, _dst):
    if not os.path.exists(_src):
        return False
    # Binary mode so non-text files (such as .jpg) are copied byte for byte
    with open(_src, "rb") as _src_fp, open(_dst, "wb") as _dst_fp:
        line = _src_fp.readline()
        while line:
            _dst_fp.write(line)
            line = _src_fp.readline()
    if md5ChkSum(_src) == md5ChkSum(_dst):
        return "Copy: SUCCESSFUL"
    return "Copy: FAILED"

res = copier(r"/home/cnsiva/6.jpg", r"/home/cnsiva/6_copied.jpg")
if not res:
    print("File does not exist!")
else:
    print(res)
OUTPUT:
Copy: SUCCESSFUL
I'm trying to write multiple files to a directory with very little changed between each file (e.g. incremental ID numbers). When I try to run my program, it fails after writing about five files. But when I try again and re-select the source file, it works. Here's my code:
if not os.path.isdir(self.fDirectory + "/AutoGen" + strftime("%Y-%m-%d %H:%M:%S", gmtime())):
    os.mkdir(self.fDirectory + "/AutoGen" + strftime("%Y-%m-%d_%H.%M.%S", gmtime()))
anum = 0
for x in range(len(self.csvdata)-1):
    for y in range(len(self.csvdata[x+1])):
        self.myRoot.find(self.csvdata[0][y]).text = self.csvdata[x][y]
    anum+=1
    myTree.write(self.fDirectory + "/AutoGen" + strftime("%Y-%m-%d_%H.%M.%S", gmtime()) + "/" + self.filename + "_" + str(anum) + ".xml")
And here's the error I'm getting:
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Python32\lib\tkinter\__init__.py", line 1399, in __call__
    return self.func(*args)
  File "C:\Users\CNash\Documents\XML Generator\XMLGen.py", line 148, in doIt
    myTree.write(self.fDirectory + "/AutoGen" + strftime("%Y-%m-%d_%H.%M.%S", gmtime()) + "/" + self.filename + "_" + str(anum) + ".xml")
  File "C:\Python32\lib\xml\etree\ElementTree.py", line 836, in write
    file = open(file_or_filename, "wb")
IOError: [Errno 2] No such file or directory: 'C:/Users/CNash/Documents/XML Generator/AutoGen2012-07-31_20.23.52/EXuTest_DOCD00140_6.xml'
Any ideas much appreciated!
For one thing, use os.path.join; it will make your life easier.
Also, it looks to me like the strftime calls happen at different times, so each one can produce a different timestamp. On top of that, the format in your os.path.isdir() check differs from the other two (a space instead of an underscore, and colons instead of dots). The script can't find the directory because it doesn't exist. I bet one named with a timestamp a few seconds earlier does, suspiciously enough.
Try replacing your first if-statement with
dirname = os.path.join(self.fDirectory, strftime("AutoGen%Y-%m-%d_%H.%M.%S", gmtime()))
if not os.path.isdir(dirname):
    os.mkdir(dirname)
and the last line with:
myTree.write(os.path.join(dirname, self.filename + "_" + str(anum) + ".xml"))
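Putting those pieces together, a minimal sketch (Python 3.2+ for exist_ok; write_numbered_copies is a hypothetical name) that computes the timestamp once, creates the directory with os.makedirs(), and writes every numbered file into it. It writes the same tree each time; the per-row updates from self.csvdata would go inside the loop.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from time import gmtime, strftime

def write_numbered_copies(base_dir, filename, tree, count):
    """Write `count` numbered copies of `tree` into ONE timestamped directory."""
    # Compute the timestamp once, so every file goes into the same directory
    dirname = os.path.join(base_dir, strftime("AutoGen%Y-%m-%d_%H.%M.%S", gmtime()))
    os.makedirs(dirname, exist_ok=True)
    paths = []
    for anum in range(1, count + 1):
        path = os.path.join(dirname, "{}_{}.xml".format(filename, anum))
        tree.write(path)
        paths.append(path)
    return paths

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        tree = ET.ElementTree(ET.fromstring("<root><id>0</id></root>"))
        for p in write_numbered_copies(tmp, "EXuTest", tree, 3):
            print(p)
```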