Python lxml and xslt issue

I have a problem with lxml and Python.
This is my code:
import lxml.etree as ET
xml_dom = ET.parse(xml_path)
xslt_dom = ET.parse(xslt_path)
print('transforming...')
transform = ET.XSLT(xslt_dom)
print('transformed: ', transform)
parsed_xml = transform(xml_dom)
print('all good!')
In my local environment everything works (Python 3.6.5 in a virtualenv with lxml 3.6.0).
The problem is on a CentOS 7 server with the exact same versions (Python 3.6.5 and lxml 3.6.0). If I run the code from the command line, it works. When I put it inside a Django 2.0 project, it freezes on this line:
transform = ET.XSLT(xslt_dom)
No exceptions, no errors, nothing. The print below that line never executes.
I changed the files' group to apache and set read permissions, but nothing works.
The weird thing is that it works from the console but not from Apache + Django.
Any suggestions?
Thanks.
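Since the call hangs without raising anything, one way to see where the process is stuck is the standard library's faulthandler module (Python 3.3+). The sketch below is only a diagnostic aid, not a fix. The helper names watch_for_hang and clear_watchdog and the log path in the usage note are hypothetical, and it assumes the web server process can write to that log file.

```python
import faulthandler
import sys


def watch_for_hang(timeout_seconds, log_file=None):
    """Dump every thread's stack to log_file if still running after the timeout."""
    if log_file is None:
        # Assumption: sys.stderr is a real file; under a web server it may not be
        log_file = sys.stderr
    faulthandler.dump_traceback_later(timeout_seconds, exit=False, file=log_file)


def clear_watchdog():
    """Cancel the pending dump once the suspect call has returned."""
    faulthandler.cancel_dump_traceback_later()
```

In the view, this could be used as watch_for_hang(10, open('/tmp/xslt_hang.log', 'w')) just before transform = ET.XSLT(xslt_dom), followed by clear_watchdog() after it. If the call freezes, the log should show the Python frame that was executing.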

Related

Why does this Python file (which parses XML and validates) work on Windows but not on Mac OS?

I've been helping some students with a Python class this fall, and I've noticed that several students using Mac OS to complete a particular assignment involving validating XML against XSDs are running into OS errors ("Error reading file: failed to load external entity [the xml file]"). Their code works perfectly fine on my end (Windows), but on Mac OS it fails.
Here is the code that is causing the problem:
from lxml import etree
xmlschema_doc = etree.parse("the_xsd.xsd")
xmlschema = etree.XMLSchema(xmlschema_doc)
doc = etree.parse("the_xml.xml")
print(xmlschema.validate(doc))
In particular, the line doc = etree.parse("the_xml.xml") is where the error occurs.
So far I've 1) made sure the students have all their files (XML, Python, XSD) in the same folder, 2) suggested they use full file paths, and 3) found this bit of code and suggested they try it (to no avail):
import os
prog_dir = os.path.abspath(os.path.dirname(__file__))
os.chdir(prog_dir)
Again: the XML validates against the XSD on Windows just fine, but on their Macs they get the error.
Any insights would be much appreciated.
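lxml generally reports a file it cannot open as "failed to load external entity", so a first step is to confirm that the path being parsed really exists. The sketch below builds the path from the script's own location instead of the current working directory and lists the folder when the file is missing. The helper name path_beside is hypothetical, and this only narrows the problem down; it does not assume what the actual cause on the students' machines is.

```python
import os


def path_beside(anchor_file, name):
    """Return the absolute path of `name` in the same folder as `anchor_file`."""
    folder = os.path.dirname(os.path.abspath(anchor_file))
    candidate = os.path.join(folder, name)
    if not os.path.isfile(candidate):
        # Show what is actually in the folder so a name mismatch is visible
        raise FileNotFoundError(
            "%s not found; folder contains: %s"
            % (candidate, sorted(os.listdir(folder)))
        )
    return candidate
```

The script would then call etree.parse(path_beside(__file__, "the_xml.xml")), which fails early with a readable message if the file is not where the script expects it.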

Memory error when using androguard module in Yara Rules

I tried installing Yara 3.8.1 with the androguard module. During the installation I faced this issue, so I applied the patch given by @reox to the androguard.c file, and it solved the problem. After that I tried a simple Yara rule with import "androguard" from the command line and it worked perfectly. Then I tried to use Yara rules inside my Python app, so I installed yara-python and used it this way:
import yara
dex_path = './classes.dex'
my_rule = './rule.yar'
json_data = load_json_data()
rule = yara.compile(my_rule)
matches = rule.match(filepath=dex_path, modules_data={'androguard': json_data})
print(matches)
The match function works fine with Yara rules that do not import the "androguard" module, but when I apply a rule that imports androguard, the match function gives an error:
yara.Error: could not map file "./classes.dex" into memory
I'm applying a simple rule to a small file, on the order of kilobytes. I think the problem is with the androguard module, since it works correctly when I remove the import "androguard". Any ideas?

I had the same error with androguard. I solved the problem by installing yara-python version 3.8.0:
https://github.com/VirusTotal/yara-python/releases/tag/v3.8.0

Running python cgi script Interpreter results differ to browser

I was having difficulty converting a program I made into a CGI script. I suspected it had to do with os.walk, so I made a smaller script to test this.
(I noticed the single \ before the D in the variable loc and tried changing it to a double \, but nothing changed.)
It produces no errors, and I can't tell why it doesn't run the for loop over os.walk in the browser.
I tried adding some data to s and printing its contents in a for loop, and that worked fine, but I can't get it to work with os.walk. I can't find anything relating to the issue on Google or Stack Overflow.
Below is the code:
import cgi,cgitb,os
loc = "C:\\Users\\wen\Desktop\\sample data\\old py stuff\\"
cgitb.enable(display=1,logdir=loc)
s = []
print("Content-type:text/html\r\n\r\n")
print("<html>")
print("<body>")
print("<p>"+loc+"</p>")
for r,ds,fs in os.walk(loc):
    print("<p>omgwtf</p>")
    for f in fs:
        s.append(f)
for i in s:
    print("<p>"+i+"</p>")
print("</body>")
print("</html>")
I took a screenshot, with the output in the interpreter on the left and the browser on the right:
i.imgur.com/136y1Yq.jpg
The web server is IIS 7.

I'm pretty sure I've solved the problem: I needed to give 'Authenticated Users' permissions on the folders.
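The lack of any error message fits how os.walk behaves: by default it ignores errors raised while listing a directory, so a folder the web server account cannot read simply yields nothing. Passing an onerror callback makes those failures visible. The following is a minimal sketch; the function name walk_reporting_errors is hypothetical.

```python
import os


def walk_reporting_errors(top):
    """Collect file names under `top`, plus the errors os.walk would otherwise hide."""
    errors = []
    names = []
    # Without onerror, os.walk silently skips directories it cannot list
    for root, dirs, files in os.walk(top, onerror=errors.append):
        names.extend(files)
    return names, errors
```

In the CGI script, printing each entry of errors inside a <p> tag would have shown the permission problem directly in the browser.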

Pointing to source file from IDLE editor in python

I'm working from the book called "Building Machine Learning Systems with Python". I've downloaded some data from MLComp to use as a training set. The file I downloaded (20news-18828) is currently in /Users/jones/testingdocuments/379
The book instructs code as follows:
import sklearn.datasets
MLCOMP_DIR = r"D:\data"
data = sklearn.datasets.load_mlcomp("20news-18828", mlcomp_root=MLCOMP_DIR)
print(data.filenames)
I've tried changing MLCOMP_DIR = /Users/jones/testingdocuments/379 and various combinations thereof, but cannot seem to get to the file.
What am I doing wrong?

MLCOMP_DIR = r"D:\data"
implies that you are on Windows, while
MLCOMP_DIR = /Users/jones/testingdocuments/379
will not work as written. With quotes added, it implies that you are not on Windows, unless you left off 'C:'. If you run the program from a console (or start IDLE from a console with python -m idlelib), you should see an error message.

ElementTree SyntaxError: expected path separator ([)

I've searched extensively for the past few days and can't seem to find what I'm looking for. I've written a script using Python 2.7.3 and ElementTree to parse an XML file and edit an attribute buried deep within the XML file. The script works fine. I had a meeting late last week with the customer, who informed me the target platform will be CentOS. I thought, no problem. To test on the anticipated platform I created a CentOS VMware client, and much to my surprise my script crapped the bed, giving me the error message "SyntaxError: expected path separator ([)". While researching this error message I learned that CentOS 6.4 ships Python 2.6.6, which contains an older version of ElementTree that does not support the [@attribute] syntax for searching by attribute.
This customer won't upgrade Python on the platform, nor will they install additional libraries, so lxml is not an option for me. My question is: can I somehow still access the buried attribute and edit it without ElementTree's support for the [@attribute] syntax?
Here's an example of the kind of XML I'm dealing with:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<my-gui>
    <vehicles>
        <car vendor="Ford"/>
    </vehicles>
    <options>
        <line transmission='manual'/>
    </options>
    <title>Dealership</title>
    <choice id='manual' title="Dealership">
        <pkg-deal id='manual' auth='manager'>.</pkg-deal>
    </choice>
    <choice id='manual' title='Dealership'/>
    <choice id='manual' DealerLocation='Dealer_Loc'/>
    <choices-outline color='color_choice'>
        <line choice='blue'/>
    </choices-outline>
    <choice id='cars' GroupID='convertables'>
        <pkg-deal id='model.Taurus' version="SEL" arguments='LeatherInterior' enabled='XMRadio'>Taurus</pkg-deal>
        <pkg-deal id='model.Mustang' version="GT" enabled='SIRIUSRadio'>Mustang</pkg-deal>
        <pkg-deal id='model.Focus' version="SE" enabled='XMRadio'>Focus</pkg-deal>
        <pkg-deal id='model.Fairlane'>Fairlane</pkg-deal>
        <pkg-deal id='model.Fusion' version="SE" arguments='ClothInerior'>Fusion</pkg-deal>
        <pkg-deal id='model.Fiesta' version="S Hatch" enabled="SIRIUSRadio">Fiesta</pkg-deal>
    </choice>
</my-gui>
Here's a snippet of the successful Python 2.7.3 code that breaks under Python 2.6.6:
if self.root.iterfind('pkg-deal'):
    self.deal = self.root.find('.//pkg-deal[@id="model.Fusion"]')
    self.arg = str(self.deal.get('arguments'))
    if self.arg.find('with Scotchguard=') > 0:
        QtGui.QMessageBox.information(self, 'DealerAssist', 'The selected car is already updated. Nothing to do.')
        self.leave()
    self.deal.set('arguments', self.arg + ' with Scotchguard')
    ...
    ...
Is there a way I can modify the first line of this 'if' statement block that will allow me to edit the 'arguments' attribute of the Fusion element? Or am I relegated to implementing libxml2, which promises to be a real pain?...
Thanks.

This may be side-stepping the question, but you could just try copying-and-pasting the version of ElementTree from Python 2.7, renaming it to avoid conflicting with the standard library, and importing and using that.
However, since ElementTree isn't meant to be used as a standalone file, what you need to do is navigate to C:\Python27\Lib\xml and copy the entire etree folder and import ElementTree by doing import etree.ElementTree inside your script.
To avoid accidentally importing or using the version of ElementTree from Python 2.6, you should probably rename the etree folder and its contents, delete the .pyc files, and fix the imports inside the files to reference the Python 2.7 version.

This same problem was solved by another user here.
This user filtered the attribute manually in Python 2.6. I'm posting their code example here even though the example pertains specifically to the asker's code:
def final_xml(self, username):
    users = self.root.findall("user")
    for user in users:
        if user.attrib.get('username') == 'user1':
            break
    else:
        raise ValueError('No such user')
    # `user` is now set to the correct element
    self.root.remove(user)
    print user
    tree = ET.ElementTree(self.root)
    tree.write("msl.xml")
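The same manual filtering can be applied to the asker's pkg-deal elements. The sketch below uses a trimmed copy of the sample XML and a hypothetical helper, find_by_attribute. It relies only on a plain './/tag' path with no predicate, which the ElementTree bundled with Python 2.6 is expected to accept; that has not been verified on CentOS 6.4 here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML from the question
SAMPLE = """<my-gui>
  <choice id='cars' GroupID='convertables'>
    <pkg-deal id='model.Focus' version="SE" enabled='XMRadio'>Focus</pkg-deal>
    <pkg-deal id='model.Fusion' version="SE" arguments='ClothInerior'>Fusion</pkg-deal>
  </choice>
</my-gui>"""


def find_by_attribute(root, tag, name, value):
    """Return the first descendant `tag` whose attribute `name` equals `value`, else None."""
    # Search by tag only, then compare the attribute in plain Python
    for element in root.findall('.//' + tag):
        if element.get(name) == value:
            return element
    return None


root = ET.fromstring(SAMPLE)
deal = find_by_attribute(root, 'pkg-deal', 'id', 'model.Fusion')
if deal is not None:
    arguments = deal.get('arguments', '')
    if 'with Scotchguard' not in arguments:
        deal.set('arguments', arguments + ' with Scotchguard')
```

In the asker's class, the call to self.root.find with the [@id=...] predicate would be replaced by a call to this helper, and the rest of the block could stay as it is.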
