ParseError while parsing AndroidManifest.xml in Python

I'm trying to parse an AndroidManifest.xml file to extract some information, and I get this error when I load the file:
xml.etree.ElementTree.ParseError: not well-formed (invalid token):
line 1, column 0
Here is my code :
import xml.etree.cElementTree as ET
tree = ET.ElementTree(file='AndroidManifest.xml')
root = tree.getroot()
My XML file seems well formed :
<?xml version="1.0" encoding="utf-8"?>
<manifest
xmlns:android="http://schemas.android.com/apk/res/android"
android:versionCode="132074037"
android:versionName="193.0.0.21.98"
android:installLocation="0"
package="com.facebook.orca">
How can I fix this and parse my XML to get the 'android:versionName' attribute?

Solved
I was trying to parse an AndroidManifest.xml that I had extracted by unzipping an APK. Inside an APK the manifest is stored in Android's compiled binary XML format, not as plain text, so a regular XML parser cannot read it. I could only view it in Android Studio, which decodes the manifest automatically.
To read the manifest data of an APK, the simplest way I found is to run the aapt command-line tool against the APK itself:
/Users/{Path_to_your_sdk}/sdk/build-tools/28.0.3/aapt dump badging com.squareup.cash.apk | sed -n "s/.*versionName='\([^']*\).*/\1/p"
This prints the versionName of the app. Hope it helps.
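The same extraction can be done from Python instead of sed. The following is a minimal sketch, assuming aapt is on the PATH (or its full path is passed in); the function names extract_version_name and dump_badging are hypothetical helpers, not part of any library.

```python
import re
import subprocess


def extract_version_name(badging_output):
    """Return the versionName found in `aapt dump badging` output, or None."""
    match = re.search(r"versionName='([^']*)'", badging_output)
    return match.group(1) if match else None


def dump_badging(apk_path, aapt="aapt"):
    """Run `aapt dump badging <apk>` and return its text output.

    `aapt` is assumed to be on the PATH; pass the full build-tools path otherwise.
    """
    result = subprocess.run(
        [aapt, "dump", "badging", apk_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hand-written line in the format that aapt prints for the package entry.
sample = ("package: name='com.facebook.orca' versionCode='132074037' "
          "versionName='193.0.0.21.98'")
print(extract_version_name(sample))  # prints 193.0.0.21.98
```

With a real APK, the call would be extract_version_name(dump_badging("com.squareup.cash.apk")).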

Related

How to create XML,XSL files using python

My workflow is this: I get some data from a Python backend, and I have to render it as an HTML page or as a PDF, depending on what the user wants.
So I wrote a Python function that generates the XML and saves it automatically on the backend.
Here is the .py file with the code that generates the XML.
import xml.etree.ElementTree as xml
def GenerateXML(filename):
    root = xml.Element("customers")
    c1 = xml.Element("customer")
    root.append(c1)
    type1 = xml.SubElement(c1, "place")
    type1.text = "UK"
    Amount1 = xml.SubElement(c1, "Amount")
    Amount1.text = "4500"
    tree = xml.ElementTree(root)
    with open(filename, "wb") as f:
        tree.write(f)
if __name__ == "__main__":
    GenerateXML("fast.xml")
Running this code creates a file named fast.xml on the backend, which contains:
#fast.xml
<customers>
<customer>
<place>UK</place>
<Amount>4500</Amount>
</customer>
</customers>
Creating the XML file works, but attaching an XSL file to the XML is the issue.
Can we do that with Python, the same way we created the .xml file?
For example,
I have another XML file that references an XSL file:
XML file
<?xml-stylesheet type = "text/xsl" href = "demo1.xsl"?>
<class>
<student>
<firstname>Graham</firstname>
<lastname>Bell</lastname>
<nickname>Garry</nickname>
</student>
<student>
<firstname>Albert</firstname>
<lastname>Einstein</lastname>
<nickname>Ally</nickname>
</student>
<student>
<firstname>Thomas</firstname>
<lastname>Edison</lastname>
<nickname>Eddy</nickname>
</student>
</class>
In the browser it is rendered as a table with background colors.
But how can I do this for the automatically generated XML file?
Can anyone provide a solution?
Thanks in advance.
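One possible approach, sketched here with the standard library only: the xml-stylesheet line is a processing instruction that sits before the root element, so it can be written out by hand ahead of the serialized tree. The helper names build_xml_with_stylesheet and generate_xml are hypothetical, and the sketch assumes the stylesheet name is a plain file name that needs no escaping.

```python
import xml.etree.ElementTree as ET


def build_xml_with_stylesheet(root, xsl_href):
    """Serialize `root` with an XML declaration and an xml-stylesheet instruction."""
    body = ET.tostring(root, encoding="unicode")  # str, without a declaration
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>\n'
        f"{body}\n"
    )


def generate_xml(filename, xsl_href="demo1.xsl"):
    """Build the customers document from the question and save it with the PI."""
    root = ET.Element("customers")
    customer = ET.SubElement(root, "customer")
    ET.SubElement(customer, "place").text = "UK"
    ET.SubElement(customer, "Amount").text = "4500"
    with open(filename, "w", encoding="utf-8") as f:
        f.write(build_xml_with_stylesheet(root, xsl_href))


demo_root = ET.Element("customers")
print(build_xml_with_stylesheet(demo_root, "demo1.xsl"))
```

A browser that opens the resulting file applies demo1.xsl on its own. Producing the HTML on the server side would need an XSLT processor, which the standard library does not include; lxml is one library that provides one.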

How to write an xml file using libxml2 in python?

I am attempting to write an XML file in Python 3 using libxml2. I cannot find any relevant documentation about writing files with the libxml2 Python bindings. When I attempt to write an XML document parsed with libxml2, I get the error:
xmlDoc has no attribute write
Has anyone here done this before? I can get it to work in ElementTree just fine, but ElementTree will not respect the attribute order that I need.
You can use saveFile() or saveFileEnc(). Example:
import libxml2
XML = """
<root a="1" b="2">XYZ</root>
"""
doc = libxml2.parseDoc(XML)
doc.saveFile("test.xml")
doc.saveFileEnc("test2.xml", "UTF-8")
I could not find any good documentation for the Python API. Here is the corresponding C documentation: http://xmlsoft.org/html/libxml-tree.html#xmlSaveFile.
import libxml2
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<verse>
<attribution>Christopher Okibgo</attribution>
<line>For he was a shrub among the poplars,</line>
<line>Needing more roots</line>
<line>More sap to grow to sunlight,</line>
<line>Thirsting for sunlight</line>
</verse>
"""
doc = libxml2.parseDoc(DOC)
root = doc.children
print(root)
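On the attribute-order concern raised in the question: since Python 3.8, the standard library's xml.etree.ElementTree serializes attributes in the order in which they were parsed or set, instead of sorting them by name. A minimal check, assuming Python 3.8 or later:

```python
import xml.etree.ElementTree as ET

# Attributes are deliberately out of alphabetical order.
root = ET.fromstring('<root b="2" a="1">XYZ</root>')
root.set("c", "3")  # a new attribute is appended after the parsed ones

serialized = ET.tostring(root, encoding="unicode")
print(serialized)  # prints <root b="2" a="1" c="3">XYZ</root>
```

On older Python versions the attributes are written sorted by name, which matches the behavior described in the question.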

How can I parse an XML file in Python

I have an XML file (a Jenkins slave config.xml) from which I need to extract certain values.
So I tried parsing the XML file using ElementTree, something like this:
import xml.etree.ElementTree as ET
tree = ET.parse("config.xml")
root = tree.getroot()
print(root)
for item in root.findall('slave'):
    print(item)  # never runs here: root is already the <slave> element, and findall only searches its children
and then I save the parsed XML in a text file. Now I want to get the value inside the <label> tag.
I can do it in bash, but I want to know how to do this in Python.
Here is the bash code:
cat test.xml | sed -n 's:.*<label>\(.*\)</label>.*:\1:p'
Here is a sample jenkins slave config.xml file
<slave>
<name>some_name</name>
<description/>
<remoteFS>some_value</remoteFS>
<numExecutors>xx</numExecutors>
<mode>EXCLUSIVE</mode>
<retentionStrategy class="xxxx"/>
<launcher class="xxxxx" plugin="xxxxx">
<host>xxx.x.x.xx</host>
<port>xx</port>
<credentialsId>xxxxxxx-xxx-xxxx-xxxx-xxxxxxxxxxxx</credentialsId>
<maxNumRetries>0</maxNumRetries>
<retryWaitTime>0</retryWaitTime>
<sshHostKeyVerificationStrategy class="hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy"/>
</launcher>
<label>some_label</label>
</slave>
As with the label, I need other values as well, such as the host name, port, etc.
You can iterate recursively with .iter() to find elements. See the official documentation.
Here is an example that prints the label and host text from the slave node.
Update: code.py now also prints the class attribute of the launcher tag. It uses element.attrib to get a tag's attributes. More can be found in the official documentation on parsing XML.
test.xml:
<slave>
<name>some_name</name>
<description/>
<remoteFS>some_value</remoteFS>
<numExecutors>xx</numExecutors>
<mode>xxx</mode>
<retentionStrategy class="xxxx"/>
<launcher class="xxxxx" plugin="xxxxx">
<host>xxx.x.x.xx</host>
<port>xx</port>
<credentialsId>xxxxxxxx</credentialsId>
<maxNumRetries>x</maxNumRetries>
<retryWaitTime>x</retryWaitTime>
<sshHostKeyVerificationStrategy class="hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy"/>
</launcher>
<label>somelabel</label>
</slave>
code.py:
import xml.etree.ElementTree as ET
tree = ET.parse("test.xml")
root = tree.getroot()
# iter() also yields the starting element, so this matches the <slave> root itself
for item in root.iter('slave'):
    for label in item.iter("label"):
        print(label.text)
    for host in item.iter("host"):
        print(host.text)
    for launcher in item.iter("launcher"):
        print(launcher.attrib["class"])
Output:
somelabel
xxx.x.x.xx
xxxxx
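When each value occurs only once, as in this config, the loops can be replaced by direct lookups with findtext() and find(), which take a path relative to the element. A small sketch on a trimmed copy of the sample (the CONFIG string is an abbreviated stand-in for test.xml):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample config; for a file use ET.parse("test.xml").getroot()
CONFIG = """<slave>
  <name>some_name</name>
  <launcher class="xxxxx" plugin="xxxxx">
    <host>xxx.x.x.xx</host>
    <port>xx</port>
  </launcher>
  <label>somelabel</label>
</slave>"""

root = ET.fromstring(CONFIG)

label = root.findtext("label")                       # direct child of <slave>
host = root.findtext("launcher/host")                # nested under <launcher>
port = root.findtext("launcher/port")
launcher_class = root.find("launcher").get("class")  # attribute value
missing = root.findtext("no_such_tag", default="")   # default instead of None

print(label, host, port, launcher_class)
```

findtext() returns None (or the given default) when the path does not match, so a missing element does not raise an exception.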

Parse an XML file when an element contains something specific, with Python

I would like to parse an XML file and write some parts of it into a CSV file, using Python. I am pretty new to programming and XML. I have read a lot, but I couldn't find a useful example for my problem.
My XML file looks like this:
<Host name="1.1.1.1">
<Properties>
<tag name="id">1</tag>
<tag name="os">windows</tag>
<tag name="ip">1.11.111.1</tag>
</Properties>
<Report id="123">
<output>
Host is configured to get updates from another server.
Update status:
last detected: 2015-12-02 18:48:28
last downloaded: 2015-11-17 12:34:22
last installed: 2015-11-23 01:05:32
Automatic settings:.....
</output>
</Report>
<Report id="123">
<output>
Host is configured to get updates from another server.
Environment Options:
Automatic settings:.....
</output>
</Report>
</Host>
My XML file contains 500 of these entries! I only want to parse the XML blocks where the output contains Update status, because I want to write the three dates (last detected, last downloaded and last installed) to my CSV file. I would also like to add the id, os and ip.
I tried the ElementTree library, but I am not able to filter on element.text where the output contains Update status. At the moment I can extract all text and attributes from the whole file, but I cannot filter the blocks whose output contains Update status, last detected, last downloaded or last installed.
Can anyone give some advice on how to achieve this?
desired output:
id:1
os:windows
ip:1.11.111.1
last detected: 2015-12-02 18:48:28
last downloaded: 2015-11-17 12:34:22
last installed:2015-11-23 01:05:32
All of this information should be written to a .csv file.
At the moment my code looks like this:
#!/usr/bin/env python
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("file.xml")
root = tree.getroot()
# open csv file for writing
data = open('test.csv', 'w')
# create csv writer object
csvwriter = csv.writer(data)
# filter xml file
for tag in root.findall("./Host/Properties/tag[@name='ip']"):
    print(tag.text)  # gives all IPs from the whole XML
for output in root.iter('output'):
    print(output.text)  # gives all outputs from the whole XML
data.close()
Best regards
It's relatively straightforward when you start at the <Host> element and work your way down.
Iterate all the nodes, but only output something when the substring "Update status:" occurs in the value of <output>:
for host in tree.iter("Host"):
    host_id = host.find('./Properties/tag[@name="id"]')
    host_os = host.find('./Properties/tag[@name="os"]')
    host_ip = host.find('./Properties/tag[@name="ip"]')
    for output in host.iter("output"):
        if output.text is not None and "Update status:" in output.text:
            print("id:" + host_id.text)
            print("os:" + host_os.text)
            print("ip:" + host_ip.text)
            for line in output.text.splitlines():
                if ("last detected:" in line or
                        "last downloaded" in line or
                        "last installed" in line):
                    print(line.strip())
This outputs the following for your sample XML:
id:1
os:windows
ip:1.11.111.1
last detected: 2015-12-02 18:48:28
last downloaded: 2015-11-17 12:34:22
last installed: 2015-11-23 01:05:32
Minor point: That's not really CSV, so writing that to a *.csv file as-is wouldn't be very clean.
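To get an actual CSV file, one row per matching report with one column per field is a common layout. The following is a sketch under that assumption; the column names, the SAMPLE string and the helpers extract_rows and write_csv are illustrative choices, not something the question prescribes.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["id", "os", "ip", "last detected", "last downloaded", "last installed"]

SAMPLE = """<Host name="1.1.1.1">
  <Properties>
    <tag name="id">1</tag>
    <tag name="os">windows</tag>
    <tag name="ip">1.11.111.1</tag>
  </Properties>
  <Report id="123">
    <output>
Update status:
last detected: 2015-12-02 18:48:28
last downloaded: 2015-11-17 12:34:22
last installed: 2015-11-23 01:05:32
    </output>
  </Report>
  <Report id="123">
    <output>
Environment Options:
    </output>
  </Report>
</Host>"""


def extract_rows(root):
    """Return one dict per <output> that contains 'Update status:'."""
    rows = []
    for host in root.iter("Host"):
        props = {tag.get("name"): (tag.text or "")
                 for tag in host.findall("./Properties/tag")}
        for output in host.iter("output"):
            text = output.text or ""
            if "Update status:" not in text:
                continue
            row = {key: props.get(key, "") for key in ("id", "os", "ip")}
            for line in text.splitlines():
                # Split at the first colon only, so the time part stays intact.
                key, sep, value = line.strip().partition(":")
                if sep and key in FIELDS[3:]:
                    row[key] = value.strip()
            rows.append(row)
    return rows


def write_csv(rows, fileobj):
    """Write the rows with a header line to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = extract_rows(ET.fromstring(SAMPLE))
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

For a real file, open it with open("test.csv", "w", newline="") and pass that object to write_csv instead of the StringIO buffer.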

How to remove all " \n" in xml payload by using lxml library

I'm trying to change a text value in an XML file, and I need to return the updated XML content using the lxml library. I can update the value successfully, but the updated XML contains "\n" (newline) characters, as shown below.
Output:
<?xml version='1.0' encoding='ASCII'?>\n<Order>\n <content>\n <sID>123</sID>\n <spNumber>UserTemp</spNumber>\n <client>ARRCHANA</client>\n <orderType>Dashboard</orderType>\n </content>\n
<content>\n <sID>111</sID>\n <spNumber>UserTemp</spNumber>\n <client>ARRCHANA</client>\n <orderType>Dashboard</orderType>\n </content>\n
</Order>
Note: I didn't format the XML output above; I posted it exactly as it appears in the output console.
Input:
<Order>
<content>
<sID>123</sID>
<spNumber>UserTemp</spNumber>
<client>WALLMART</client>
<orderType>Dashboard</orderType>
</content>
<content>
<sID>111</sID>
<spNumber>UserTemp</spNumber>
<client>D&amp;B</client>
<orderType>Dashboard</orderType>
</content>
</Order>
I also tried to remove the \n characters from the output XML by using
getValue = getValue.replace('\n','')
but no luck.
Below is the code I used to update the XML (the <client> tag) and return the updated XML content.
Python Code:
from lxml import etree
from io import StringIO
import six
import numpy
def getListOfNodes(location):
    f = open(location)
    xml = f.read()
    f.close()
    #print(xml)
    getXml = etree.parse(location)
    for elm in getXml.xpath('.//Order//content/client'):
        index='ARRCHANA'
        elm.text=index
    #with open('C:\\New folder\\temp.xml','w',newline='\r\n') as writeFile:
    #writeFile.write(str(etree.tostring(getXml,pretty_print=True, xml_declaration=True)))
    getValue=str((etree.tostring(getXml,pretty_print=True, xml_declaration=True)))
    #getValue = getValue.replace('\n','')
    #getValue=getValue.replace("\n","<br/>")
    print(getValue)
    return getValue
When I try to open the response payload in Firefox, it shows the error message below:
XML Parsing Error: no element found Location:
file:///C:/New%20folder/Confidential.xml
Line Number 1, Column 1:
It reports "no element found" at line 1, column 1 of the XML file when the file contains the "\n" characters.
Can somebody suggest a better way to update the text value and return the XML without any additional characters?
I fixed it myself with the script below:
code = root.xpath('.//Order//content/client')
if code:
    code[0].text = 'ARRCHANA'
# raw string, so that \t in the path is not read as a tab character
etree.ElementTree(root).write(r'D:\test.xml', pretty_print=True)
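A likely cause of the literal "\n" text in the question's output is that tostring() returns bytes, and calling str() on bytes produces their repr (b'...' with escaped newlines) rather than decoded text. The sketch below shows this with the standard library's ElementTree to stay dependency-free; lxml's etree.tostring also returns bytes by default and also accepts encoding="unicode". The SAMPLE string is a shortened stand-in for the input file.

```python
import xml.etree.ElementTree as ET

SAMPLE = "<Order>\n  <content>\n    <client>WALLMART</client>\n  </content>\n</Order>"

root = ET.fromstring(SAMPLE)
for client in root.iter("client"):
    client.text = "ARRCHANA"

raw = ET.tostring(root)                              # bytes
wrong = str(raw)                                     # repr: "b'<Order>\\n  <content>..."
right = raw.decode("utf-8")                          # real text with real newlines
also_right = ET.tostring(root, encoding="unicode")   # str directly, no decode needed

print(wrong)
print(right)
```

Returning right (or also_right) instead of str(raw) gives a payload that contains real line breaks and no b'...' wrapper.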
