I've found similar questions to this but can't find an exact answer and I'm having real difficulty getting this to work, so any help would be hugely appreciated.
I need to find an XML file in a folder structure that changes every time I run some automated tests.
This piece of code finds the file absolutely fine:
import xml.etree.ElementTree as ET
import glob
report = glob.glob('./Reports/Firefox/**/*.xml', recursive=True)
print(report)
I get a path returned. I then want to use that path, held in the variable "report", to look for text within the XML file.
The following code finds the text fine IF the Python file is in the same directory as the XML file. However, I need the Python file to reside in the parent folder and to pass the "report" variable into the first line of the code below.
tree = ET.parse("JUnit_Report.xml")
root = tree.getroot()
for testcase in root.iter('testcase'):
    testname = testcase.get('name')
    teststatus = testcase.get('status')
    print(testname, teststatus)
I'm a real beginner at Python, is this even possible?
glob.glob returns a list of matches, so pick one entry and build the absolute path to your report file:
import os
report = glob.glob('./Reports/Firefox/**/*.xml', recursive=True)
abs_path_to_report = os.path.abspath(report[0])
Pass that variable to whatever you want:
tree = ET.parse(abs_path_to_report)
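Putting the search and the parsing together, here is a minimal sketch, assuming the first match returned by glob is the report you want. The helper name read_test_statuses and its reports_dir parameter are made up for illustration and are not part of the original code.

```python
import glob
import os
import xml.etree.ElementTree as ET


def read_test_statuses(reports_dir):
    """Return (name, status) pairs from the first XML report under reports_dir."""
    pattern = os.path.join(reports_dir, '**', '*.xml')
    matches = glob.glob(pattern, recursive=True)
    if not matches:
        raise FileNotFoundError(f'No XML report found under {reports_dir}')
    # Assumption: the first match is the report of interest.
    report_path = os.path.abspath(matches[0])
    root = ET.parse(report_path).getroot()
    return [(tc.get('name'), tc.get('status')) for tc in root.iter('testcase')]
```

Calling read_test_statuses('./Reports/Firefox') from the parent folder should then give the same name/status pairs that the loop in the question prints.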
I have a script that goes through a directory with many XML files and extracts or adds information to these files. I use XPath to identify the elements of interest.
The relevant piece of code is this:
import lxml.etree as et
import lxml.sax
# deleted non relevant code
for root, dirs, files in os.walk(ROOT):
    # iterate all files
    for file in files:
        if file.endswith('.xml'):
            # join the directory currently being walked (not ROOT) and the file name
            file_path = os.path.join(root, file)
            # load root element from file
            file_root = et.parse(file_path).getroot()
            # This is a function that I define elsewhere in which I use XPath to identify relevant
            # elements and extract, change or add some information
            xml_dosomething(file_root)
            # init tree object from file_root
            tree = et.ElementTree(file_root)
            # save the modified tree under a new name so that the original file is kept
            tree.write(file_path.replace('.xml', '-clean.xml'), encoding='utf-8', doctype='<!DOCTYPE document SYSTEM "estcorpus.dtd">', xml_declaration=True)
I have seen in various places that people recommend using Sax(on) to speed up the processing of large files. After checking the documentation of the lxml.sax module (https://lxml.de/sax.html), I'm at a loss as to how to modify my code so that I can leverage it. I can see the following in the documentation:
handler = lxml.sax.ElementTreeContentHandler()
Then there is a list of statements like (handler.startElementNS((None, 'a'), 'a', {})) that would populate the 'handler' "document" (?) with what would be the elements of the XML document. After that I see:
tree = handler.etree
lxml.etree.tostring(tree.getroot())
I think I understand what handler.etree does but my problem is that I want 'handler' to be the files in the directory that I'm working with rather than a string that I create by using 'handler.startElementNS' and the like. What do I need to change in my code to get the Sax module to do the work that needs to be done with the files as input?
I'll try to describe the problem in a simple way.
I have a .txt file located under a constant path, but I cannot know its full name in advance.
[For example: the full name is Hello_stack.txt, but I can only give the function the part 'Hello_'.]
The input is: Path_to_file/ + 'Hello_'
The expected output is: Path_to_file/Hello_stack.txt
How can I do that?
I tried to give a path and check recursively whether part of my file name exists and, if so, return its path.
This is my implementation [of course I'd be glad to see another way if it works]:
from pathlib import Path

def get_CS_R2M_METRO_CALLBACK_FILE_PATH():
    directory = 'path_of_file'
    file_name = directory + 'part_of_file_name'
    const_path = Path(file_name)
    for path in [p for p in const_path.rglob("*")]:
        if path.is_file():
            return path
Thanks for help.
You might retrieve the file list in your path and then select from the list based upon your partial file name. Here is a snippet of code to perform that type of function on a Linux machine.
import os

directory = '/home/craig/Python_Programs/GetFile'
files = os.listdir(directory)
print('Files-->', files)
prefix = 'Hello_'
for name in files:
    # Compare against the whole prefix, not just its first four characters
    if name.startswith(prefix):
        print('File(s) like "Hello_"-->', name)
When I ran this simple program over a directory containing various files, this was the output in the terminal.
Una:~/Python_Programs/GetFile$ python3 GetFile.py
Files--> ['Hello_Stack.txt', 'Okay.txt', 'Hi_Stack.txt', 'GetFile.py', 'Hello_Stack.bak']
File(s) like "Hello_"--> Hello_Stack.txt
File(s) like "Hello_"--> Hello_Stack.bak
The literal value for your path would be different on a Windows machine. I hope this might provide you with a method to achieve your goal.
Regards.
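As an alternative to scanning the listing by hand, the same lookup can be sketched with pathlib, which works the same way on Linux and Windows. This is only a sketch: the function name find_by_prefix is invented for illustration, and it assumes the known part of the name contains no glob wildcard characters such as * or [.

```python
from pathlib import Path


def find_by_prefix(directory, prefix):
    """Return the files in directory whose names start with prefix, sorted."""
    # The "*" wildcard stands for the unknown remainder of the file name.
    return sorted(p for p in Path(directory).glob(prefix + '*') if p.is_file())
```

With the directory shown above, find_by_prefix(directory, 'Hello_') would return the paths of both Hello_Stack files; taking the first element, or filtering on the .txt suffix, narrows that down to a single path.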
I wrote a Python script to generate XML log files but every time I run it, it is saved/written in the same folder/path as the script itself.
Here is a simplified version of it:
# xml.etree.cElementTree was removed in Python 3.9; ElementTree is the supported module
import xml.etree.ElementTree as ET
root = ET.Element("LOG")
Child_1 = ET.SubElement(root, "CHILD1")
Child_1.text = "I am child 1"
tree = ET.ElementTree(root)
tree.write("log_file.xml")
However, I want to save/write it in a specific folder. What is the simplest way to do it? Thanks.
You can build the full path to the file like this:
import os
path = os.path.join("C:\\", "Users", os.getlogin(), "Desktop","log_file.xml")
Then you can just do this:
tree.write(path)
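If the target folder might not exist yet, it has to be created before writing, since tree.write does not create missing directories. A minimal sketch follows; the function name write_log and its folder parameter are hypothetical, and the tree is the simplified one from the question.

```python
import os
import xml.etree.ElementTree as ET


def write_log(folder, filename='log_file.xml'):
    """Build the small LOG tree from the question and write it into folder."""
    root = ET.Element('LOG')
    child = ET.SubElement(root, 'CHILD1')
    child.text = 'I am child 1'
    # Create the target folder (and any missing parents) before writing.
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    ET.ElementTree(root).write(path)
    return path
```

The function returns the path it wrote to, so the caller can log or reuse it.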
I have a text file called testConfigFile with the following contents:
inputCsvFile = BIN+"/testing.csv"
description = "testing"
Here BIN is the parent directory of the folder (already declared using os.getcwd in my Python script).
The problem I'm facing now is how to read and extract BIN+"/testing.csv" from testConfigFile.txt.
Since the name testing.csv might change, it has to be treated as a variable. I'm planning to have the script first read the keyword "inputCsvFile = " and then automatically extract whatever follows it, which here is BIN+"/testing.csv".
f = open("testConfigFile","r")
line = f.readlines(f)
if line.startswith("inputCsvFile = ")
inputfile = ...
This is my failed partial code, which I have no idea how to fix. Is anyone willing to help me?
Reading a config from an unstructured txt file is not the best idea. Python is able to parse config files that are structured in a certain way, so I have restructured your txt file to make it easier to work with. The config file extension does not really matter; I have changed it to .ini in this case.
app.ini:
[csvfilepath]
inputCsvFile = BIN+"/testing.csv"
description = "testing"
Code:
from configparser import ConfigParser # Available by default, no install needed.
config = ConfigParser() # Create a ConfigParser instance.
config.read('app.ini') # You can input the full path to the config file.
file_path = config.get('csvfilepath', 'inputCsvFile')
file_description = config.get('csvfilepath', 'description')
print(f"CSV File Path: {file_path}\nCSV File Description: {file_description}")
Output:
CSV File Path: BIN+"/testing.csv"
CSV File Description: "testing"
To read more about configparser, see its entry in the Python standard library documentation.
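Note that configparser returns the value as plain text, quotes and the literal BIN+ included, so it still has to be turned into a real path. One way to do that is sketched below, assuming the value always has the form BIN+"/name.csv". The function name resolve_csv_path, its bin_dir parameter, and the inlined SAMPLE_INI text are illustrative only.

```python
import os
from configparser import ConfigParser

# Assumption: same layout as app.ini, inlined so the sketch is self-contained.
SAMPLE_INI = '''
[csvfilepath]
inputCsvFile = BIN+"/testing.csv"
description = "testing"
'''


def resolve_csv_path(ini_text, bin_dir):
    """Turn the raw 'BIN+"/name.csv"' value into a path below bin_dir."""
    config = ConfigParser()
    config.read_string(ini_text)
    raw = config.get('csvfilepath', 'inputCsvFile')
    # Drop the literal BIN+ prefix, then the quotes and the leading slash.
    if raw.startswith('BIN+'):
        raw = raw[len('BIN+'):]
    relative = raw.strip('"').lstrip('/')
    return os.path.join(bin_dir, relative)
```

In the script from the question this would be called as resolve_csv_path(SAMPLE_INI, BIN), with BIN coming from os.getcwd(). A simpler design would be to store only the file name in the .ini file and join it with BIN in code.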
I am learning Natural Language Processing with Python's nltk. I want to create a corpus from an XML file I have in my directory, so I used the following code.
>> from nltk.corpus import XMLCorpusReader
>> corpus_root = "/Desktop/my_dir/corpus/"
>> wiki = XMLCorpusReader(corpus_root ,'output.xml')
>> wiki.fileids()
>>
This code block is supposed to output the fileid 'output.xml', but it doesn't return anything and the cursor just moves to the next ">>" prompt.
I have my output.xml in the exact directory as specified in corpus_root.
I have all the permissions needed to read and write the file 'output.xml'.
I have nltk and all its data installed, with all the specified paths set up.
What should I do to make it work?
Let's walk through your code:
from nltk.corpus import XMLCorpusReader
corpus_root = "/Desktop/my_dir/corpus/"
I'm a bit skeptical of this path name (see this answer: https://stackoverflow.com/a/6617625/583834). It probably should be something like /Users/my_username/Desktop/my_dir/corpus on macOS or /home/my_username/Desktop/my_dir/corpus on Linux. Make sure that your path is correct by opening a terminal window, navigating to your directory and executing pwd to get your absolute path. Then copy it into corpus_root above.
wiki = XMLCorpusReader(corpus_root ,'output.xml')
XMLCorpusReader reads a directory as well as a list of filenames already existing in that directory. The second argument here is your input file name, not your output name. (Note the third "how to do it" section here for a sample call of the related WordListCorpusReader: reader = WordListCorpusReader('.', ['wordlist']))
wiki.fileids()
It's likely that you're not getting anything from this last line because the previous two lines are not used correctly.
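Before handing the path to nltk, it can help to confirm from Python itself that the directory and file really exist. The following is a small standard-library sketch; the function name check_corpus_location is made up for illustration and is not part of nltk.

```python
import os


def check_corpus_location(corpus_root, filename):
    """Return the absolute corpus directory, or raise if the directory or file is missing."""
    # expanduser turns a leading "~" into the current user's home directory.
    root = os.path.abspath(os.path.expanduser(corpus_root))
    if not os.path.isdir(root):
        raise NotADirectoryError(f'Corpus root does not exist: {root}')
    if not os.path.isfile(os.path.join(root, filename)):
        raise FileNotFoundError(f'{filename} not found in {root}')
    return root
```

If check_corpus_location('~/Desktop/my_dir/corpus', 'output.xml') returns without raising, the returned directory can be passed as the first argument to XMLCorpusReader, with ['output.xml'] as the list of file names.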