Python: appending new data to XML is overwriting existing data

I want to add an entire tag to an XML file. Below is my XML format.
<?xml version="1.0" encoding="UTF-8"?>
<ca st="true" name="XMLConfig">
<app>
<!-- I want to add the entire commented tag to the XML.
<ar ty="co" name="st">
<ly ty="pt">
<pt>value</pt>
</Layout>
</ar> -->
<roll name="roll" fN="file.log" fP="logs.gz">
<ly type="ptl">
<pt>value</pt>
</ly>
<po>
<!-- Comment /> -->
<si size="100 MB" />
<!-- Comment /> -->
</po>
<de fI="max" max="10"/>
</roll>
</app>
As shown in the file above, I want to add this tag to the file:
<ar ty="co" name="st">
<ly ty="pt">
<pt>value</pt>
</Layout>
</ar>
This is what I have so far:
for appenders in tree.xpath('//Appenders'):
    if appenders.getchildren():
        appenders.remove(appenders.getchildren()[0])
    appenders.insert(0, appenders.getparent().append(etree.fromstring('<ar ty="co" name="st"> <ly ty="pt"><pt>value</pt></Layout></ar>')))
This removes all the other content that comes after the new content.
Any help will be appreciated!

In my opinion the first way you did it is much better. You just made some mistakes in your insert line. It should be this:
appenders.insert(0, etree.fromstring('<ar ty="co" name="st"> <ly ty="pt"><pt>value</pt></ly></ar>'))
I'm surprised it didn't throw an error for you, because append() returns None, so your insert line is basically this:
appenders.insert(0, None)
I also noticed that you do two things in all of your questions:
You leave out some line(s) of your XML file. (Why?)
You shorten the tag names in your XML but keep the long versions in the code, which is annoying because anyone who wants to answer has to change the code again to check whether it works.
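The corrected insert can be tried on a cut-down copy of the file. This is only a sketch: it assumes the standard-library xml.etree.ElementTree in place of lxml (so it runs without extra packages), uses a shortened SAMPLE document that is not the asker's full file, and selects the short tag name app rather than Appenders.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the asker's file (assumption: only the parts needed here).
SAMPLE = """<ca st="true" name="XMLConfig">
  <app>
    <roll name="roll" fN="file.log" fP="logs.gz"/>
  </app>
</ca>"""

# The tag to add, with the closing tag corrected to </ly>.
NEW_TAG = '<ar ty="co" name="st"><ly ty="pt"><pt>value</pt></ly></ar>'

root = ET.fromstring(SAMPLE)
for app in root.iter("app"):
    # Insert the parsed element itself as the first child; nothing is removed.
    app.insert(0, ET.fromstring(NEW_TAG))

child_tags = [child.tag for child in root.find("app")]
print(child_tags)
```

Because nothing is removed before the insert, the existing roll element stays in place after the new ar element.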

I got it working!
for appenders in tree.xpath('//app'):
    if appenders.tag == 'app':
        # SubElement appends the new element; insert(0, ...) then moves it to the front
        appenders.insert(0, etree.SubElement(appenders, 'ar', ty="Co", name="st"))
for appender in tree.xpath('//ar'):
    appender.insert(0, etree.SubElement(appender, 'ly', ty="pt"))
for layout in tree.xpath('//ly'):
    layout.insert(0, etree.SubElement(layout, 'pt'))
for pattern in tree.xpath('//pt'):
    pattern.text = 'value'
tree.write(r'C:\value.xml', xml_declaration=True, encoding='UTF-8')
If anyone has a better way to do this, please let me know so I can improve on it.

Related

Tika python does not preserve the order of texts in pdf

I am using tika-python to extract text from a PDF. When a page contains multiple tables, the order of the text is not preserved. In my case, the table at the top of the page comes out at the end of the text extracted by Tika.
I tried the following custom config file, but it is not working. I have tried placing the statement <property name="sortByPosition" value="True"/> at various positions, but nothing has worked. I based the file on a reference config.xml.
<?xml version="1.0" encoding="UTF-8"?>
<properties>
<parsers>
<!-- Default Parser for most things, except for 2 mime types, and never
use the Executable Parser -->
<parser class="org.apache.tika.parser.DefaultParser">
<mime-exclude>image/jpeg</mime-exclude>
<mime-exclude>application/pdf</mime-exclude>
<parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
<!-- property name="sortByPosition" value="True" -->
</parser>
<parser class="org.apache.tika.parser.EmptyParser">
<mime>application/pdf</mime>
<!-- here? -->
<property name="sortByPosition" value="True"/> # this statement is for preserving the order
</parser>
</parsers>
</properties>
and the following command to read the text:
from tika import parser
data = parser.from_file(file_path, xmlContent=True,
                        config_path='/path/to/tika_config.xml')
What am I doing wrong? Is there a way to change the config so that the order is preserved, or is that not possible?

Python / XML closing tag error - unable to parse

The following addon.xml gives me the Python error 'failed to parse':
(I've used an online checker and it says "error on line 33 at column 15: Opening and ending tag mismatch: description line 0 and extension", which points at the /extension end tag at the very end of the document.)
Any advice would be appreciated. This worked yesterday and I have no idea why it has stopped working.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<addon id="plugin.audio.criminalpodcast" name="Criminal Podcast" version="1.1.0" provider-name="leopheard">
<requires>
<import addon="xbmc.python" version="2.1.0"/>
<import addon="script.module.xbmcswift2" version="2.4.0"/>
<import addon="script.module.beautifulsoup4" version="4.3.1"/>
<import addon="script.module.requests" version="1.1.0"/>
<import addon="script.module.routing" version="0.2.0"/> </requires>
<provides>audio</provides> </extension>
<extension point="xbmc.addon.metadata">
<platform>all</platform>
<language></language>
<summary lang="en"></summary>
<description lang="en">description </description>
<license>The MIT License (MIT)</license>
<forum>https://forum.kodi.tv/showthread.php?tid=344790</forum>
<email>leopheard#gmail.com</email>
<source>https://github.com/leopheard/criminalpodcast</source>
<website>http://www.thisiscriminal.com</website>
<audio_guide></audio_guide>
<assets>
<icon>icon.png</icon>
<fanart>fanart.jpg</fanart>
<screenshot>resources/media/Criminal_SocialShare_2.png</screenshot>
<screenshot>resources/media/Criminal_SocialShare_3.png</screenshot>
<screenshot>resources/media/Radiotopia-logo.png</screenshot>
</assets>
Your "XML" file is not well-formed, so it cannot be parsed. Find out how it was created, correct the process so the problem does not occur again, and then regenerate the file.
Files that are vaguely XML-like but not well-formed are pretty much useless. Repair is sometimes possible if the errors are very systematic, but that doesn't appear to be the case here.
Most of the time a "failed to parse" error message is caused by the XML file itself.
Check your XML file for correct formatting.
I once forgot the root tag and got the same error message.
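A quick local check for well-formedness can be sketched with the standard library. The helper name check_well_formed and the two tiny sample strings below are hypothetical; they only imitate the kind of stray end tag seen in the addon.xml above and are not the real file.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return None if xml_text parses, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return line, column, str(err)
    return None

# Hypothetical miniature documents: one with a stray </extension>, one without.
broken = "<addon><requires></requires></extension></addon>"
fixed = "<addon><requires></requires></addon>"

print(check_well_formed(broken))
print(check_well_formed(fixed))
```

The reported line and column point at the place where the parser first noticed the problem, which is often after the actual mistake, so it helps to read upwards from that position.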

How to replace xml lines using 'if statements' in python?

Hi, I'm new to XML files in general, but I am trying to replace specific lines in an XML file using 'if statements' in Python 3.6. I've been looking at suggestions to use ElementTree, but none of the posts online quite fit my problem, so here I am.
My file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
-<StructureDefinition xmlns="http://hl7.org/fhir">
<url value="http://example.org/fhir/StructureDefinition/MyObservation"/>
<name value="MyObservation"/>
<status value="draft"/>
<fhirVersion value="3.0.1"/>
<kind value="resource"/>
<abstract value="false"/>
<type value="Observation"/>
<baseDefinition value="http://hl7.org/fhir/StructureDefinition/Observation"/>
<derivation value="constraint"/>
</StructureDefinition>
I want to replace
url value="http://example.org/fhir/StructureDefinition/MyObservation"/
with something like
url value="http://example.org/fhir/StructureDefinition/NewObservation"/
by using conditional statements, because these lines are repeated multiple times in other files.
I have tried looping through the XML file to find the exact string match (which worked), but I wasn't able to delete or replace the line (probably because this isn't a .txt file).
Any help is greatly appreciated!
Your sample file contains a stray "-" in front of the <StructureDefinition> tag that is easy to overlook when copying and pasting. Remove it before parsing.
Input file
<?xml version="1.0" encoding="UTF-8"?>
<StructureDefinition xmlns="http://hl7.org/fhir">
<url value="http://example.org/fhir/StructureDefinition/MyObservation"/>
<name value="MyObservation"/>
<status value="draft"/>
<fhirVersion value="3.0.1"/>
<kind value="resource"/>
<abstract value="false"/>
<type value="Observation"/>
<baseDefinition value="http://hl7.org/fhir/StructureDefinition/Observation"/>
<derivation value="constraint"/>
</StructureDefinition>
Script
from xml.dom.minidom import parse  # use minidom for this task
dom = parse('june.xml')  # read in your file
search = "http://example.org/fhir/StructureDefinition/MyObservation"  # set search value
replace = "http://example.org/fhir/StructureDefinition/NewObservation"  # set replace value
res = dom.getElementsByTagName('url')  # collect all url tags
for element in res:  # iterate over url tags
    if element.getAttribute('value') == search:  # in case of match
        element.setAttribute('value', replace)  # replace
with open('june_updated.xml', 'w') as f:
    f.write(dom.toxml())  # serialize the updated DOM and save it as a new XML file
Output file
<?xml version="1.0" ?><StructureDefinition xmlns="http://hl7.org/fhir">
<url value="http://example.org/fhir/StructureDefinition/NewObservation"/>
<name value="MyObservation"/>
<status value="draft"/>
<fhirVersion value="3.0.1"/>
<kind value="resource"/>
<abstract value="false"/>
<type value="Observation"/>
<baseDefinition value="http://hl7.org/fhir/StructureDefinition/Observation"/>
<derivation value="constraint"/>
</StructureDefinition>
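Since the question mentions ElementTree, the same conditional replacement can be sketched with xml.etree.ElementTree as an alternative to the minidom script above. The function name replace_url is hypothetical, and the sample string is a shortened copy of the input file. Registering the FHIR namespace as the default is an assumption made so that the output keeps the plain xmlns="..." form.

```python
import xml.etree.ElementTree as ET

FHIR_NS = "http://hl7.org/fhir"
# Serialize the FHIR namespace as the default one instead of an ns0: prefix.
ET.register_namespace("", FHIR_NS)

def replace_url(xml_text, search, new):
    """Replace the value attribute of every <url> whose value equals search."""
    root = ET.fromstring(xml_text)
    for url in root.iter("{%s}url" % FHIR_NS):
        if url.get("value") == search:  # the 'if statement' from the question
            url.set("value", new)
    return ET.tostring(root, encoding="unicode")

sample = (
    '<StructureDefinition xmlns="http://hl7.org/fhir">'
    '<url value="http://example.org/fhir/StructureDefinition/MyObservation"/>'
    '<name value="MyObservation"/>'
    '</StructureDefinition>'
)
updated = replace_url(
    sample,
    "http://example.org/fhir/StructureDefinition/MyObservation",
    "http://example.org/fhir/StructureDefinition/NewObservation",
)
print(updated)
```

Only the url element is touched. The name element keeps its MyObservation value because the comparison is made on the whole attribute value of url tags, not on raw text lines.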

Blank XML Namespace processing With Python

I am trying to parse an XML file using Python. Example snippet of the XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<raml xmlns="raml21.xsd" version="2.1">
<series xmlns="" scope="USA" name="Arizona">
<header>
<log action="created"/>
</header>
<x_ns color="Blue">
<p name="timeZone">(GMT-10)</p>
</x_ns>
<x_ns color="Red">
<p name="AvgHeight">175</p>
</x_ns>
<x_ns color="black">
<p name="AvgWeight">235</p>
</x_ns>
The problem is that the namespace keeps changing, so as an alternative I tried to read the xmlns string first and then build a dictionary of namespaces using the code below:
root = raw_xml.getroot()
namespace_temp1 = root.tag.split("}")
namespace_temp2 = namespace_temp1[0].strip('{')
namespaces_auto = {}
tag_name = ["x", "y", "z", "w", "v"]
ns_name = [namespace_temp2, namespace_temp2, namespace_temp2, namespace_temp2, namespace_temp2]
namespace_temp3 = zip(tag_name, ns_name)
for tag, ns in namespace_temp3:
    namespaces_auto[tag] = ns
namespaces = namespaces_auto
To access a particular tag with a namespace, I use code like this:
for data in raw_xml.findall('x:x_ns', namespaces):
This pretty much solves the problem, but it gets stuck when a child node has a blank xmlns, as seen in the series tag (xmlns=""). I am not sure how to handle this condition in the code.
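One possible way to cope with the blank namespace is sketched below, assuming the standard-library xml.etree.ElementTree. An xmlns="" declaration puts the element and its descendants in no namespace, so their tags carry no {...} prefix. The helper split_tag and the shortened SAMPLE document are hypothetical and only illustrate matching on the local name.

```python
import xml.etree.ElementTree as ET

# Shortened, closed version of the snippet from the question (assumption).
SAMPLE = """<raml xmlns="raml21.xsd" version="2.1">
  <series xmlns="" scope="USA" name="Arizona">
    <x_ns color="Blue"><p name="timeZone">(GMT-10)</p></x_ns>
    <x_ns color="Red"><p name="AvgHeight">175</p></x_ns>
  </series>
</raml>"""

def split_tag(tag):
    """Split an ElementTree tag into (namespace, local_name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag  # no namespace, e.g. below xmlns=""

root = ET.fromstring(SAMPLE)
root_ns, root_name = split_tag(root.tag)

# Match on the local name so the namespace (or its absence) does not matter.
colors = [el.get("color") for el in root.iter() if split_tag(el.tag)[1] == "x_ns"]
print(root_ns, root_name, colors)
```

With this approach the prefix dictionary is not needed for elements under xmlns="", and a plain root.find("series") also works because series has no namespace.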

xmlrpclib error when publishing to a Wordpress blog

Some weeks ago I wrote a Python script that posts content to a Wordpress blog. Last week it stopped working (I haven't changed anything), and now when I run the script I get this error:
File "C:\Python27\lib\xmlrpclib.py", line 557, in feed
self._parser.Parse(data, 0)
ExpatError: junk after document element: line 2, column 0
The call I use to post the content to Wordpress is this:
post_id = server.wp.newPost(blog_id, user, passw, content)
It used to work until it started crashing for (apparently) no reason.
Do you know what could be causing this error? Might my Wordpress have been infected (I've checked it)?
Thanks. If you need more code to check something I'll post it, and sorry for my poor English.
Important Edit:
I didn't mention this before, but the script works perfectly with other Wordpress blogs. It only crashes when I try to post to the WP blog I wrote the script for (that's why I think the site may be infected).
When the code works, the variable data used in self._parser.Parse(data, 0) has this content:
<?xml version="1.0"?>
<methodResponse>
<params>
<param>
<value>
<string>90</string>
</value>
</param>
</params>
</methodResponse>
<?xml version="1.0"?>
<methodResponse>
<params>
<param>
<value>
<int>90</int>
</value>
</param>
</params>
</methodResponse>
Edit:
The data variable is used by the library. I don't know what it should contain, but while debugging I found that when the script crashes it has this content:
<br />
<b>Warning</b>: strpos() [<a href='function.strpos'>function.strpos</a>]: Empty delimiter in <b>/PATH/wp-includes/class-wp-xmlrpc-server.php</b> on line <b>3954</b><br />
<br />
<b>Warning</b>: Cannot modify header information - headers already sent by (output started at /PATH/wp-includes/class-wp-xmlrpc-server.php:3954) in <b>/PATH/wp-includes/class-IXR.php</b> on line <b>471</b><br />
<br />
<b>Warning</b>: Cannot modify header information - headers already sent by (output started at /PATH/wp-includes/class-wp-xmlrpc-server.php:3954) in <b>/PATH/wp-includes/class-IXR.php</b> on line <b>472</b><br />
<br />
<b>Warning</b>: Cannot modify header information - headers already sent by (output started at /PATH/wp-includes/class-wp-xmlrpc-server.php:3954) in <b>/ANOTHER_PATH/public_ht
As I said, I don't know what 'data' should contain. When the code worked I never checked what its content was.
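One way to read the dump above: the PHP warnings are sent before the XML-RPC response, so the parser sees <br /> as a complete document and treats everything after it as junk. The sketch below reproduces that with the standard-library expat parser in Python 3 (the question uses Python 2.7, where xmlrpclib wraps the same parser). The helper try_parse and the shortened strings are hypothetical.

```python
from xml.parsers.expat import ParserCreate, ExpatError

def try_parse(data):
    """Return None if data is one well-formed XML document, else the error text."""
    parser = ParserCreate()
    try:
        parser.Parse(data, True)
    except ExpatError as err:
        return str(err)
    return None

# A clean response, and the same response with warning markup printed first.
good = '<?xml version="1.0"?>\n<methodResponse><params/></methodResponse>'
bad = "<br />\n<b>Warning</b>: strpos(): Empty delimiter<br />\n" + good

print(try_parse(good))
print(try_parse(bad))
```

If this reading is right, the script itself is fine and the fix belongs on the server side, for example by finding what triggers the strpos() warning or by stopping PHP from printing warnings into the response.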
