For testing purposes, I need to fill an XML file with, for example, 1000 lines (P0 through P999), each with between 1 and 20 steps, and then add the random steps.
How can I do that? I can't find any (good) example with a lot of for loops.
I hope to do this in Python.
The XML needs to look something like this:
<root>
<P>P0<NPS>20</NPS><STEPS>5,19,22,12,0,3,22,4,11,0,2,7,20,19,16,24,9,2,15,6,</STEPS></P>
<P>P1<NPS>2</NPS><STEPS>12,21,</STEPS></P>
<P>P2<NPS>15</NPS><STEPS>21,23,10,18,23,22,17,4,17,15,17,18,18,14,22,</STEPS></P>
<P>P3<NPS>4</NPS><STEPS>15,24,12,10,</STEPS></P>
...
</root>
Something like this:
import random
NUM_OF_LINES = 10
MAX_NUM_OF_STEPS = 7
STEP_RANGE = 20
TEMPLATE = '<P>P{}<NPS>{}</NPS><STEPS>{}</STEPS></P>'
# start at 0 so the first line is P0, as in the sample above
for i in range(NUM_OF_LINES):
    steps = random.randint(1, MAX_NUM_OF_STEPS)
    step_values = [str(random.randint(0, STEP_RANGE)) for _ in range(steps)]
    line = TEMPLATE.format(i, steps, ','.join(step_values))
    print(line)
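To get an actual file rather than printed lines, the same idea can be wrapped in two small functions. This is only a sketch: build_lines and write_xml are hypothetical helper names, the step range of 0 to 24 is guessed from the sample in the question, and the trailing comma after each step is kept only because the sample shows it.

```python
import random

def build_lines(num_lines, max_steps, step_range, seed=None):
    """Return the <P> lines as a list of strings (hypothetical helper)."""
    rng = random.Random(seed)  # a seed makes the test data reproducible
    lines = []
    for i in range(num_lines):
        n_steps = rng.randint(1, max_steps)
        # trailing comma after every step, matching the sample in the question
        steps = ''.join(f'{rng.randint(0, step_range)},' for _ in range(n_steps))
        lines.append(f'<P>P{i}<NPS>{n_steps}</NPS><STEPS>{steps}</STEPS></P>')
    return lines

def write_xml(path, lines):
    """Wrap the lines in <root>...</root> and write them to path."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write('<root>\n')
        f.write('\n'.join(lines))
        f.write('\n</root>\n')

# step_range=24 is an assumption based on the largest value in the sample
lines = build_lines(num_lines=1000, max_steps=20, step_range=24, seed=42)
```

Calling write_xml('steps_test.xml', lines) would then write the file; the file name is just a placeholder.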
Related
So I have this program that reads lines from a file and inserts them into a list line by line (not pictured in code). My objective is to find a specific start index, indicated by a number surrounded by XML tags, and a specific end index, indicated by a "/Invoice". I am able to successfully find these indexes using the start_indexes and end_indexes code I created below.
I was informed of (and experienced firsthand) the dangers of using del on a list inside a loop; otherwise that solution would have been perfect. Instead, I was advised to add everything I wanted to delete to a new list, which I would then somehow use to remove those items from my original list.
With that context being given, my question is as follows:
What is the best way to accomplish what I am trying to do with the def deletion_list()?
I am aware that the "lines" in lst_file are strings, and I am attempting to compare them to indexes. That's where I am stumped; I don't know a way to convert the temp variable that is a string and make it into an index so the function works as I expect, or if there is a better way to do it.
start_indexes = []
for i in str_lst:
    invoice_index_start = lst_file.index('<InvoiceNumber>' + i + '</InvoiceNumber>\n')
    start_indexes.append(invoice_index_start)
end_indexes = []
constant = '</Invoice>\n'
for i in range(0,len(start_indexes)):
    invoice_index_end = lst_file.index(constant, start_indexes[i])
    end_indexes.append(invoice_index_end + 1)
result = []
def deletion_list():
    for lines in lst_file:
        if lst_file[] > lst_file[invoice_index_start] and lst_file[] < lst_file[invoice_index_end]
            result.append(lines)
    return lst_file
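One possible way to do what deletion_list() is aiming for, without comparing strings to indexes, is to collect the positions to drop in a set and then build a new list from everything else. The sketch below assumes the same lst_file, str_lst, start_indexes and end_indexes as in the question; remove_ranges is a hypothetical name and the sample lines are made up.

```python
def remove_ranges(lines, starts, ends):
    """Return a new list without the lines in each [start, end) range."""
    to_delete = set()
    for start, end in zip(starts, ends):
        to_delete.update(range(start, end))
    # enumerate gives the position of each line, so no string/index mix-up
    return [line for i, line in enumerate(lines) if i not in to_delete]

# made-up sample data standing in for the file contents
lst_file = [
    '<Invoice>\n', '<InvoiceNumber>1</InvoiceNumber>\n', '</Invoice>\n',
    '<Invoice>\n', '<InvoiceNumber>2</InvoiceNumber>\n', '</Invoice>\n',
]
str_lst = ['2']
start_indexes = [lst_file.index('<InvoiceNumber>' + n + '</InvoiceNumber>\n')
                 for n in str_lst]
end_indexes = [lst_file.index('</Invoice>\n', s) + 1 for s in start_indexes]

kept = remove_ranges(lst_file, start_indexes, end_indexes)
```

Note that each range starts at the InvoiceNumber line, as in the question, so a tag that opens the invoice on an earlier line is left in place.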
I assume your input file, Invoice_1.xml, looks similar to the one below, and that you want to remove InvoiceNumber 2 and 4.
Input:
<?xml version="1.0" encoding="utf-8"?>
<root>
<Invoices>
<InvoiceNumber>1</InvoiceNumber>
<InvoiceNumber>2</InvoiceNumber>
<InvoiceNumber>3</InvoiceNumber>
<InvoiceNumber>4</InvoiceNumber>
</Invoices>
</root>
You can parse the input XML file and write the changed XML to Invoice_2.xml:
import xml.etree.ElementTree as ET
tree = ET.parse('Invoice_1.xml')
root = tree.getroot()
print("Original file:")
ET.dump(root)
rem_list = ['2', '4']
# collect (child, parent) pairs first, so removing does not disturb the iteration
child_parent_pairs = [(c, p) for p in root.iter() for c in p]
for c, p in child_parent_pairs:
    if c.text in rem_list:
        p.remove(c)
tree.write('Invoice_2.xml', encoding='utf-8', xml_declaration=True)
tree1 = ET.parse('Invoice_2.xml')
root1 = tree1.getroot()
print("Changed file:")
ET.dump(root1)
Output:
<?xml version='1.0' encoding='utf-8'?>
<root>
<Invoices>
<InvoiceNumber>1</InvoiceNumber>
<InvoiceNumber>3</InvoiceNumber>
</Invoices>
</root>
If you want to delete items from a list, the best way is to loop over a copy of the original list while deleting from the original.
a: list = [1,2,3,4,5,6]
for item in a[:]:
    if item % 2 == 0:
        a.remove(item)
You can simplify your problem by using XML parsing. See: XML parser - Real Python
I am pretty new to Python. I'm just finishing up a college class on it and am also working through a book called "Head First Python", so I have a good basic understanding of the language, but I need some help with a task that is a bit over my head. I have an XML file that my company's CAD software reads in to assign the correct material to a 3D solid model, in our case assigning a material name and density. However, this XML has its units in lbf.s^2/in^4, but one of our customers has requested that they be in lbm/in^3.
Can I import the XML into Python and iterate through it to change the values?
The first would be the material unit name itself, I would need to iterate through the XML and replace every instance of this:
<PropertyData property="MassDensity_0">
With this:
<PropertyData property="MassDensity_4">
Then, for every instance of MassDensity_0 found, I would need to multiply the density value by a conversion factor to get the correct density value in the new units. For example, the data value shown below would need to be multiplied by that factor.
<PropertyData property="MassDensity_0">
<Data format="exponential">
7.278130e-04</Data>
</PropertyData>
Does it make sense to attempt this in Python? There are over a hundred materials in this file, and editing them manually would be very tedious and time-consuming. I'm hoping Python can do the heavy lifting here.
I appreciate any assistance you can provide and thank you in advance!!!
This looks like a task for the built-in module xml.etree.ElementTree. After loading XML from a file or string, you can alter it and save the changed version to a new file. It supports a subset of XPath, which should allow you to select the elements to change. Consider the following simple example:
import xml.etree.ElementTree as ET
xml_string = '<?xml version="1.0"?><catalog><product id="p1"><price>100</price></product><product id="p2"><price>120</price></product><product id="p3"><price>150</price></product></catalog>'
root = ET.fromstring(xml_string) # now root is <catalog>
price3 = root.find('product[@id="p3"]/price') # get price of product with id p3
price3.text = str(int(price3.text) + 50) # .text is str: convert to int, add 50, then convert back to str
output = ET.tostring(root)
print(output)
Output:
b'<catalog><product id="p1"><price>100</price></product><product id="p2"><price>120</price></product><product id="p3"><price>200</price></product></catalog>'
Note that output is bytes and as such can be written to a file opened in binary mode. Consult the docs for more information.
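Applied to the density question, the same module could be used roughly as follows. This is a sketch under assumptions: the <Materials> wrapper and the Other_0 property are made up to give the snippet a root element, and the factor 386.088 (standard gravity in in/s^2) is my assumption for going from lbf.s^2/in^4 to lbm/in^3, so please verify it against your own unit definitions.

```python
import xml.etree.ElementTree as ET

# Assumed factor for lbf.s^2/in^4 -> lbm/in^3; verify before relying on it.
CONVERSION_FACTOR = 386.088

# <Materials> and Other_0 are hypothetical, only here to make the sample parse
xml_string = '''<Materials>
<PropertyData property="MassDensity_0">
<Data format="exponential">
7.278130e-04</Data>
</PropertyData>
<PropertyData property="Other_0">
<Data format="exponential">
1.000000e+00</Data>
</PropertyData>
</Materials>'''

root = ET.fromstring(xml_string)
# iter() walks the whole tree, so the nesting depth does not matter
for prop in root.iter('PropertyData'):
    if prop.get('property') != 'MassDensity_0':
        continue
    prop.set('property', 'MassDensity_4')      # rename the unit property
    data = prop.find('Data')
    value = float(data.text.strip()) * CONVERSION_FACTOR
    data.text = f'{value:e}'                    # keep exponential notation

result = ET.tostring(root, encoding='unicode')
print(result)
```

For a real file you would use ET.parse on the path instead of ET.fromstring, and tree.write to save the result to a new file.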
I'm sure it can, and probably should, be done with a dedicated XML module. But for educational purposes, here is a straightforward, verbose Python solution:
import re
xml_in = \
'''<PropertyData property="MassDensity_0">
<Data format="exponential">
7.278130e-04
</Data>
</PropertyData>
<PropertyData property="MassDensity_0">
<Data format="exponential">
7.278130e-04
</Data>
</PropertyData>
'''
# remove whitespace after ">" and before "<" (raw strings avoid invalid-escape warnings)
xml_in = re.sub(r">\s*", ">", xml_in)
xml_in = re.sub(r"\s*<", "<", xml_in)
# split the xml by the mask
mask = "<PropertyData property=\"MassDensity_0\"><Data format=\"exponential\">"
chunks = xml_in.split(mask)
# change numbers
result = []
for chunk in chunks:
    try:
        splitted_chunk = chunk.split("<") # split the chunk by "<"
        num = float(splitted_chunk[0]) # get the number
        num *= 2 # <--- change the number
        num = f"{num:e}" # get e-notation
        new_chunk = "<".join([num] + splitted_chunk[1:]) # make the new chunk
        result.append(new_chunk) # add the new chunk to the result list
    except ValueError:
        result.append(chunk) # if there is no number add the chunk as is
# assemble the xml back from the chunks and the mask
xml_out = mask.join(result)
# output
print(">\n<".join(xml_out.split("><")))
Output:
<PropertyData property="MassDensity_0">
<Data format="exponential">1.455626e-03</Data>
</PropertyData>
<PropertyData property="MassDensity_0">
<Data format="exponential">1.455626e-03</Data>
</PropertyData>
I'm using xml.etree.ElementTree module with Python 3.6 to create an XML file with dozens of subelements. What I'm aiming for should look like this:
<shots>
<shot id="0">
<Audio_Channels>2</Audio_Channels>
<Audio_File>testhq12.mov</Audio_File>
<Audio_Fps>Unspecified</Audio_Fps>
...
<Type>C</Type>
<Width>4096</Width>
</shot>
<shot id="1">
....
</shots>
So far I've been using the following code to create this structure, but it gets very ugly when there are a lot of 'sub-fields' to add:
_audio_channels = Element('Audio_Channels')
shot.append(_audio_channels)
_audio_channels.text = str(audio_channels_data)
_audio_file = Element('Audio_File')
shot.append(_audio_file)
_audio_file.text = str(audio_file_data)
.
.
.
And so I've tried to simplify it with a loop looking somewhat like this:
fields = ['Audio_Channels', 'Audio_File', 'Audio_Fps', ...]
for k in fields:
    prop = Element(k)
    shot.append(prop)
But I have no idea how to assign any text to them later on, using only the elements of the fields list as a sort of key.
I tried this, but it's not working:
shot.insert(str(audio_file_data), 'Audio_File')
If I understand correctly what you are after, try something like this:
import xml.etree.ElementTree as ET
fields = ['Audio_Channels', 'Audio_File', 'Audio_Fps']
dats = [ 2,'testhq12.mov', 'Unspecified']
shots = ET.Element('shots')
shot = ET.SubElement(shots, 'shot')
for f, d in zip(fields, dats):
    elem = ET.Element(f)
    elem.text = str(d)
    shot.append(elem)
The output should look something like:
<shots>
<shot>
<Audio_Channels>2</Audio_Channels>
<Audio_File>testhq12.mov</Audio_File>
<Audio_Fps>Unspecified</Audio_Fps>
</shot>
</shots>
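If the empty elements have to be created first and filled in later, as in the question, one option is to look each child up by its tag name with find and set its text. A minimal sketch, where the values dict is hypothetical data:

```python
import xml.etree.ElementTree as ET

fields = ['Audio_Channels', 'Audio_File', 'Audio_Fps']
shots = ET.Element('shots')
shot = ET.SubElement(shots, 'shot', id='0')
# create the empty children first
for name in fields:
    ET.SubElement(shot, name)

# hypothetical values, keyed by the same names as in fields
values = {'Audio_Channels': 2, 'Audio_File': 'testhq12.mov', 'Audio_Fps': 'Unspecified'}
for name, value in values.items():
    # find returns the first direct child with that tag
    shot.find(name).text = str(value)

xml_text = ET.tostring(shots, encoding='unicode')
print(xml_text)
```

Keeping the data in a dict keyed by tag name means the field list and the values cannot drift out of order, which can happen with two parallel lists.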
So I have a couple of documents, each of which has an x and y coordinate (among other stuff). I wrote some code which is able to filter out said x and y coordinates and store them in float variables.
Ideally, I'd like to find a way to run the same code on all the documents I have (the number is not fixed, but let's say 3 for now), extract the x and y coordinates of each document, and calculate the average of these 3 x-values and 3 y-values.
How would I approach this? I've never done this before.
I successfully created the code to extract the relevant data from 1 file.
Also note: In reality each file has more than just 1 set of x and y coordinates but this does not matter for the problem discussed at hand.
I'm just saying that so that the code does not confuse you.
with open('TestData.txt', 'r' ) as f:
    full_array = f.readlines()
del full_array[1:31]
del full_array[len(full_array)-4:len(full_array)]
single_line = full_array[1].split(", ")
x_coord = float(single_line[0].replace("1 Location: ",""))
y_coord = float(single_line[1])
size = float(single_line[3].replace("Size: ",""))
# Remove unnecessary stuff
category= single_line[6].replace(" Type: Class: 1D Descr: None","")
In the end, I'd like to avoid writing the same code again for each file, especially since the number of files may vary. Right now I have 3 files, which means 3 sets of coordinates, but on another day I might have 5, for example.
Use os.walk to find the files that you want. Then, for each file, do your calculation.
https://docs.python.org/2/library/os.html#os.walk
First of all, create a function that reads a file given its file name and does the parsing your way. Then iterate through the directory; I assume the files are all in the same directory.
Here is the basic code:
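import os
FOLDER = 'C:\\Users\\UserName\\Documents'
def readFile(path):
    try:
        with open(path, 'r') as file:
            return file.read()
    except (OSError, UnicodeDecodeError):
        # unreadable entries (e.g. subdirectories, binary files) yield ""
        return ""
for filename in os.listdir(FOLDER):
    # os.listdir returns bare names, so join them with the folder
    data = readFile(os.path.join(FOLDER, filename))
    print(data)
    # parse here
    # do the calculation here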
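For the averaging step itself, something along these lines might work. It is a sketch only: parse_location_line and average_coords are hypothetical names, the line shape is assumed from the parsing code in the question, and the sample lines are invented stand-ins for the relevant line of each document.

```python
from statistics import mean

def parse_location_line(line):
    """Extract (x, y) from a line shaped like the one in the question.

    Assumed shape: '1 Location: <x>, <y>, ...' with fields separated by ', '.
    """
    fields = line.split(", ")
    x = float(fields[0].replace("1 Location: ", ""))
    y = float(fields[1])
    return x, y

def average_coords(coords):
    """Average a list of (x, y) tuples, one tuple per document."""
    xs, ys = zip(*coords)
    return mean(xs), mean(ys)

# invented sample lines standing in for three documents
sample_lines = [
    "1 Location: 1.0, 2.0, 0.0, Size: 3.5",
    "1 Location: 2.0, 4.0, 0.0, Size: 1.5",
    "1 Location: 3.0, 6.0, 0.0, Size: 2.5",
]
avg_x, avg_y = average_coords([parse_location_line(s) for s in sample_lines])
print(avg_x, avg_y)
```

In the real script, the list passed to average_coords would be built inside the directory loop, one (x, y) tuple per file, so the number of files no longer matters.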
I've figured out how to get data from a single XML file into a row on a CSV. I'd like to iterate this across a number of files in a directory so that the data from each XML file is extracted to a new row on the CSV. I've done some searching and I get the gist of having to create a loop (perhaps using the OS module) but the specifics are lost on me.
This script does the extraction for a single XML file.
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("[PATH/FILE.xml]")
root = tree.getroot()
test_file = open('PATH','w',newline='')
csvwriter = csv.writer(test_file)
header = []
count = 0
for trial in root.iter('[XML_ROOT]'):
    item_info = []
    if count == 0:
        # write the header row once, using the tag names
        item_ID = trial.find('itemid').tag
        header.append(item_ID)
        data_1 = trial.find('data1').tag
        header.append(data_1)
        csvwriter.writerow(header)
        count = count + 1
    item_ID = trial.find('itemid').text
    item_info.append(item_ID)
    data_1 = trial.find('data1').text
    item_info.append(data_1)
    csvwriter.writerow(item_info)
test_file.close()
Now I need to figure out what to do to it to iterate.
Edit:
Here is an example of an XML file I'm using. Just for testing, I'm pulling out actrnumber as item_id and stage as data_1. Eventually I'll need to figure out the most sensible way to create arrays for the nested data. For instance, in the outcomes node, I would nest the data, probably in an array for primaryOutcome and all secondaryOutcome instances.
<?xml-stylesheet type='text/xsl' href='anzctrTransform.xsl'?>
<ANZCTR_Trial requestNumber="1">
<stage>Registered</stage>
<submitdate>6/07/2005</submitdate>
<approvaldate>7/07/2005</approvaldate>
<actrnumber>ACTRN12605000001695</actrnumber>
<trial_identification>
<studytitle>A phase II trial of gemcitabine in a fixed dose rate infusion combined with cisplatin in patients with operable biliary tract carcinomas</studytitle>
<scientifictitle>A phase II trial of gemcitabine in a fixed dose rate infusion combined with cisplatin in patients with operable biliary tract carcinomas with the primary objective tumour response</scientifictitle>
<utrn />
<trialacronym>ABC trial</trialacronym>
<secondaryid>National Clinical Trials Registry: NCTR570</secondaryid>
</trial_identification>
<conditions>
<healthcondition>Adenocarcinoma of the gallbladder or intra/extrahepatic bile ducts</healthcondition>
<conditioncode>
<conditioncode1>Cancer</conditioncode1>
<conditioncode2>Biliary tree (gall bladder and bile duct)</conditioncode2>
</conditioncode>
</conditions>
<interventions>
<interventions>Gemcitabine delivered as fixed dose-rate infusion with cisplatin</interventions>
<comparator>Single arm trial</comparator>
<control>Uncontrolled</control>
<interventioncode>Treatment: drugs</interventioncode>
</interventions>
<outcomes>
<primaryOutcome>
<outcome>Objective tumour response.</outcome>
<timepoint>Measured every 6 weeks during study treatment, and post treatment.</timepoint>
</primaryOutcome>
<secondaryOutcome>
<outcome>Tolerability and safety of treatment</outcome>
<timepoint>Prior to each cycle of treatment, and at end of treatment</timepoint>
</secondaryOutcome>
<secondaryOutcome>
<outcome>Duration of response</outcome>
<timepoint>Prior to starting every second treatment cycle, then 6 monthly for 12 months, then as clinically indicated</timepoint>
</secondaryOutcome>
<secondaryOutcome>
<outcome>Time to treatment failure</outcome>
<timepoint>Assessed at end of treatment</timepoint>
</secondaryOutcome>
...
</ANZCTR_Trial>
Simply generalize your process into a function and iterate across files with os.listdir, assuming all XML files reside in the same folder. And be sure to use a context manager (with) to better manage opening and closing the file.
Also, your header parsing is redundant, since you name the very tags that you extract: itemid and data1. Node names likely stay the same, so they can be hard-coded, while text values differ and require parsing. The code below uses list comprehensions for a more streamlined collection of data within and across XML files. It also separates the XML parsing from the CSV writing.
import csv
import os
import xml.etree.ElementTree as ET
# GENERALIZED METHOD
def proc_xml(xml_path):
    full_path = os.path.join('/path/to/xml/folder', xml_path)
    print(full_path)
    tree = ET.parse(full_path)
    root = tree.getroot()
    # first matching record only, as [itemid, data1]
    item_info = [[trial.find('itemid').text, trial.find('data1').text]
                 for trial in root.iter('[XML_ROOT]')][0]
    return item_info
# NESTED LIST OF XML DATA PER FILE
xml_data_lst = [proc_xml(f) for f in os.listdir('/path/to/xml/folder')
                if f.endswith('.xml')]
# WRITE TO CSV FILE
with open('/path/to/final.csv', 'w', newline='') as test_file:
    csvwriter = csv.writer(test_file)
    # HEADERS
    csvwriter.writerow(['itemid', 'data1'])
    # DATA ROWS
    for i in xml_data_lst:
        csvwriter.writerow(i)
While .find gets you the first match, .findall returns a list of all of them. So you could do something like this:
extracted_IDs = []
item_IDs = trial.findall('itemid')
for id_tag in item_IDs:
    extracted_IDs.append(id_tag.text)
Or, to do the same thing in one line:
extracted_IDs = [item.text for item in trial.findall('itemid')]
Likewise, try:
extracted_data = [item.text for item in trial.findall('data1')]
If you have an equal number of both, and if the row you want to write each time is in the form of [<itemid>,<data1>] paired sets, then you can just make a combined list like this:
combined_pairs = [(extracted_IDs[i], extracted_data[i]) for i in range(len(extracted_IDs))]
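The same pairing can also be written with the built-in zip, which avoids indexing altogether. A small self-contained sketch, using an invented <trial> record with repeated children:

```python
import xml.etree.ElementTree as ET

# invented record with repeated <itemid>/<data1> children
trial = ET.fromstring(
    '<trial>'
    '<itemid>A1</itemid><data1>Registered</data1>'
    '<itemid>A2</itemid><data1>Pending</data1>'
    '</trial>'
)
extracted_IDs = [item.text for item in trial.findall('itemid')]
extracted_data = [item.text for item in trial.findall('data1')]

# zip pairs the two lists element by element and stops at the shorter one
combined_pairs = list(zip(extracted_IDs, extracted_data))
print(combined_pairs)
```

Because zip stops at the shorter list, a missing data1 silently drops a row instead of raising an IndexError, so it may be worth comparing the two lengths first.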