Right way to use lxml to generate xml in python

I'm trying to create XML that mirrors the structure of a directory (with its subdirectories and the files inside them). When I run the example from "Best way to generate xml?", instead of the example's output, which is:
<root>
  <child/>
  <child>some text</child>
</root>
I get:
b'<root>\n  <child/>\n  <child>some text</child>\n</root>\n'
Why is that?
I'm using the PyCharm IDE, if it matters.

This is a change in Python 3. etree.tostring() returns a bytes object, and its repr is displayed as a bytes literal of the form b'string value'. The \n you see inside that literal is a newline character; it shows up as a real line break once you print the decoded string or write the bytes to a file.
You can turn the result into a regular string using s = s.decode('utf-8').
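A minimal sketch of the bytes-versus-str distinction described above. To stay runnable without third-party packages, it uses the standard library's xml.etree.ElementTree as a stand-in (an assumption, not the lxml code from the question). lxml's etree.tostring() likewise returns bytes by default and accepts encoding='unicode' to return a str directly.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

# Build the same small tree as in the question.
root = ET.Element("root")
ET.SubElement(root, "child")
ET.SubElement(root, "child").text = "some text"

raw = ET.tostring(root)                         # bytes in Python 3
text = raw.decode("utf-8")                      # bytes -> str
direct = ET.tostring(root, encoding="unicode")  # str without a decode step

print(type(raw).__name__, type(text).__name__, type(direct).__name__)
print(text)  # print() shows the text itself, not the b'...' repr
```

Printing raw shows the b'...' repr, while printing text shows the markup itself.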

'\n' is a line feed (LF) character.
When you, for instance, save your output to a file and open it in some editor, you'll get what you expect.
Your output is fine.

Related

how to use python-gitlab to upload file with newline?

I'm trying to use python-gitlab's projects.files.create to upload string content to GitLab.
The string contains '\n', which I'd like to become a real newline character in the GitLab file. Instead, '\n' is written to the file literally, so after uploading, the file contains just one line.
I'm not sure how, or at what point, I should fix this. I'd like the file content to look the way the string does when I print it with print() in Python.
Thanks for your help.
EDIT---
Sorry, I'm using python 3.7 and the string is actually a csv content, so it's basically like:
',col1,col2\n1,data1,data2\n'
So when I upload it to GitLab, I want the file to be:
,col1,col2
1,data1,data2

I figured it out by saving the string to a file and reading it back; this way, the \n in the string is translated to the actual newline character.
I'm not sure if there's another way of doing this, but I'm posting it for anyone who encounters a similar situation.
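One possible cause, which is an assumption because the question does not show how the string was built, is that the string holds the two characters backslash and n rather than a real newline. Under that assumption, a sketch like the following converts the literal sequences in memory, without a round trip through a file. The helper name unescape_newlines is hypothetical, and the python-gitlab upload call itself is left out.

```python
def unescape_newlines(text: str) -> str:
    """Replace literal backslash-n (and backslash-r backslash-n) with real newlines."""
    return text.replace("\\r\\n", "\n").replace("\\n", "\n")


# A string holding the two characters '\' and 'n' (assumed cause of the problem).
raw_csv = ",col1,col2\\n1,data1,data2\\n"
fixed = unescape_newlines(raw_csv)

print(repr(raw_csv))
print(fixed)
```

If the string already contains real newline characters, the helper leaves it unchanged, and the cause lies elsewhere.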

Write to an HTML file with Python

I have a couple of graphs that I need to display in my browser offline. MPLD3 outputs the HTML as a string, and I need to create an HTML file containing that string. What I'm doing right now is:
tohtml = mpld3.fig_to_html(fig, mpld3_url='/home/pi/webpage/mpld3.js',
                           d3_url='/home/pi/webpage/d3.js')
print(tohtml)
Html_file = open("graph.html", "w")
Html_file.write(tohtml)
Html_file.close()
tohtml is the variable where the HTML string is stored. I've printed this string to the terminal and then pasted it into an empty HTML file and I get my desired result. However, when I run my code, I get an empty file named graph.html

It seems like you may be reinventing the wheel here. Have you tried something like,
mpld3_url = '/home/pi/webpage/mpld3.js'
d3_url = '/home/pi/webpage/d3.js'
with open('graph.html', 'w') as fileobj:
    mpld3.save_html(fig, fileobj, d3_url=d3_url, mpld3_url=mpld3_url)
Note: this is untested; it is based on the mpld3.save_html documentation and general knowledge of Python I/O streams.
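If you prefer to keep the string from fig_to_html, a with block ensures the file is flushed and closed even if an error occurs part way through. The sketch below is generic: the html value is a placeholder standing in for the mpld3 output, the helper name write_html is hypothetical, and a temporary directory is used so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def write_html(html: str, path: Path) -> int:
    """Write html to path and return the resulting file size in bytes."""
    with open(path, "w", encoding="utf-8") as fileobj:
        fileobj.write(html)
    # The file is closed (and flushed) here, so the size is final.
    return path.stat().st_size


# Placeholder for the string returned by mpld3.fig_to_html(...).
html = "<html><body><p>placeholder for the mpld3 output</p></body></html>"
out_path = Path(tempfile.mkdtemp()) / "graph.html"
size = write_html(html, out_path)

print(out_path, size)
```

Checking the reported size and the printed path is a quick way to confirm that the file you open in the browser is the one the script actually wrote.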

Is it possible to call xsl Apache FOP without providing an input file but instead passing a string

I am trying to generate a PDF using FOP. To do this, I take a template file, fill in its values with Jinja2, and then pass it to FOP with a system call.
Is it possible to do a subprocess call to FOP without passing through an input file but instead a string containing the XML directly? And if so how would I go about doing so?
I was hoping for something like this
fop -fo "XML here" -pdf output.pdf

Yes, it turned out to be possible.
Using Python, I loaded the XML from the file into lxml.etree:
tree = etree.parse('FOP_PARENT.fo.xml')
Then I used the tree to process the XInclude tags:
tree.xinclude()
After that, it was a simple matter of converting the XML back into a unicode string:
xml = etree.tounicode(tree)
This is how I got the templates to work. Hopefully this helps someone who has the same issue!
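The answer stops at producing the XML string. One way to hand such a string to an external program is through its standard input, sketched below with subprocess.run. The FOP command line shown is an assumption: it relies on FOP accepting "-" as the input file name to mean "read from stdin", which should be checked against the usage text of your FOP version. So that the sketch runs without FOP installed, it pipes the string to a small Python stand-in process instead; run_with_stdin is a hypothetical helper name.

```python
import subprocess
import sys
from typing import Sequence


def run_with_stdin(command: Sequence[str], xml_text: str) -> subprocess.CompletedProcess:
    """Run command, feeding xml_text (UTF-8 encoded) to its standard input."""
    return subprocess.run(
        list(command),
        input=xml_text.encode("utf-8"),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )


# Assumed FOP invocation ("-" as input file = read stdin); not executed here.
fop_command = ["fop", "-fo", "-", "-pdf", "output.pdf"]

# Stand-in command that echoes stdin, so the sketch is runnable anywhere.
echo_command = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
result = run_with_stdin(echo_command, "<fo:root/>")

print(result.stdout.decode("utf-8"))
```

With FOP available, run_with_stdin(fop_command, xml) would be the corresponding call, where xml is the string produced above.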

Cannot read in files

I have a small problem with reading in my file. My code:
import csv
import numpy
with open("train_data.csv", "rb") as training:
    csv_file_object = csv.reader(training)
    header = csv_file_object.next()
    data = []
    for row in csv_file_object:
        data.append(row)
data = numpy.array(data)
I get the error no such file "train_data.csv", so I know the problem lies with the location. But whenever I specify the path like this: open("C:\Desktop...etc) it doesn't work either. What am I doing wrong?

If you give the full file path, your script should work. Since it does not, the backslashes in your path are most likely being interpreted as escape sequences. To fix this, use a raw string to specify the file path:
# Put an 'r' at the start of the string to make it a raw-string.
with open(r"C:\path\to\file\train_data.csv","rb") as training:
Raw strings do not process escape sequences.
Also, as a technical note, a relative path is resolved against the current working directory, which is usually the directory the script is launched from. If the file is not there, an error is raised.
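To see what escape processing does to a path, compare a plain string with a raw string. The path below is made up for illustration; it was chosen because \t and \n are valid escape sequences, so the plain version silently turns into something else.

```python
# Plain string: \t becomes a tab and \n becomes a newline.
plain = "C:\temp\new_data.csv"

# Raw string: backslashes are kept exactly as typed.
raw = r"C:\temp\new_data.csv"

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))
```

The plain version is two characters shorter because each two-character escape sequence collapsed into a single control character, which is why open() cannot find the file.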

When you use open() on Windows, you need to deal with the backslashes properly.
Option 1.) Use a raw string, which is a string prefixed with an r.
open(r'C:\Users\Me\Desktop\train_data.csv')
Option 2.) Escape the backslashes
open('C:\\Users\\Me\\Desktop\\train_data.csv')
Option 3.) Use forward slashes
open('C:/Users/Me/Desktop/train_data.csv')
As for finding the file, a plain open('train_data.csv') looks in the current working directory, which is normally the directory you run the Python script from. So, if you run it from C:\Users\Me\Desktop\, your train_data.csv needs to be on the desktop as well.
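The three options can be checked against each other without touching the file system. In this sketch, pathlib.PureWindowsPath is used only to compare the spellings (it works on any operating system), and os.getcwd() shows which directory a bare file name is resolved against.

```python
import os
from pathlib import PureWindowsPath

raw = r"C:\Users\Me\Desktop\train_data.csv"         # Option 1: raw string
escaped = "C:\\Users\\Me\\Desktop\\train_data.csv"  # Option 2: escaped backslashes
forward = "C:/Users/Me/Desktop/train_data.csv"      # Option 3: forward slashes

# Options 1 and 2 produce the very same string.
same_string = raw == escaped

# Option 3 is a different string, but Windows path rules treat it as the same path.
same_path = PureWindowsPath(forward) == PureWindowsPath(raw)

# A bare file name is looked up relative to the current working directory.
resolved = os.path.join(os.getcwd(), "train_data.csv")

print(same_string, same_path)
print(resolved)
```

Printing os.getcwd() at the top of a script is a quick way to find out where open('train_data.csv') is actually looking, which matters in IDEs that set their own working directory.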

How can I say a file is SVG without using a magic number?

An SVG file is basically an XML file, so I could use the string <?xml (or its hex representation: '3c 3f 78 6d 6c') as a magic number, but there are reasons not to: for example, extra whitespace at the start of the file would break the check.
The other images I need to check are all binary and have magic numbers. How can I quickly check, in Python, whether a file is SVG without relying on its extension?

XML is not required to start with the <?xml preamble, so testing for that prefix is not a good detection technique, not to mention that it would identify every XML file as SVG. A decent detection, and one that is really easy to implement, is to use a real XML parser to test that the file is well-formed XML whose top-level element is svg:
import xml.etree.ElementTree as et
def is_svg(filename):
    tag = None
    # Open in binary mode so non-text files raise ParseError, not a decoding error.
    with open(filename, "rb") as f:
        try:
            # Stop at the first 'start' event: only the root element matters.
            for event, el in et.iterparse(f, ('start',)):
                tag = el.tag
                break
        except et.ParseError:
            pass
    return tag == '{http://www.w3.org/2000/svg}svg'
On Python 3.3 and later, ElementTree uses its C accelerator automatically (the separate cElementTree module was removed in Python 3.9), so the detection is efficient through the use of expat; timeit shows that an SVG file was detected as such in ~200μs, and a non-SVG in 35μs. The iterparse API enables the parser to forego creating the whole element tree (module name notwithstanding) and only read the initial portion of the document, regardless of total file size.

You could try reading the beginning of the file as binary - if you can't find any magic numbers, you read it as a text file and match to any textual patterns you wish. Or vice-versa.
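A rough sketch of that two-step idea: check a few well-known binary signatures first, then fall back to a textual pattern. The signature table covers only PNG, JPEG and GIF, and the function name sniff_image_type and the 4096-byte probe size are arbitrary choices for illustration. The textual fallback is a heuristic and, unlike a real XML parser, can be fooled.

```python
import re
import tempfile
from pathlib import Path

# Leading bytes of a few common binary image formats.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

# Textual fallback: an opening <svg tag somewhere in the probed bytes.
SVG_PATTERN = re.compile(rb"<svg[\s>]")


def sniff_image_type(path, probe_size=4096):
    """Return 'png', 'jpeg', 'gif', 'svg', or None if nothing matches."""
    with open(path, "rb") as f:
        head = f.read(probe_size)
    for magic, kind in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return kind
    if SVG_PATTERN.search(head):
        return "svg"
    return None


# Small demonstration with throwaway files.
workdir = Path(tempfile.mkdtemp())
(workdir / "a.png").write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
(workdir / "b.svg").write_bytes(
    b'  <?xml version="1.0"?>\n<svg xmlns="http://www.w3.org/2000/svg"/>'
)
(workdir / "c.txt").write_bytes(b"just some text")

kinds = [sniff_image_type(workdir / name) for name in ("a.png", "b.svg", "c.txt")]
print(kinds)
```

Because the SVG check searches rather than matching at a fixed offset, leading whitespace does not break it.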

This is from man file, for the unix file command:
The magic tests are used to check for files with data in particular fixed formats. The canonical example of this is a binary executable ... These files have a “magic number” stored in a particular place near the beginning of the file that tells the UNIX operating system that the file is a binary executable, and which of several types thereof. The concept of a “magic” has been applied by extension to data files. Any file with some invariant identifier at a small fixed offset into the file can usually be described in this way. ...
And here's one example of the "magic" that the file command uses to identify an svg file (see source for more):
...
0       string          \<?xml\ version=
>14     regex           ['"\ \t]*[0-9.]+['"\ \t]*
>>19    search/4096     \<svg           SVG Scalable Vector Graphics image
...
0       string          \<svg           SVG Scalable Vector Graphics image
...
As described by man magic, each line follows the format <offset> <type> <test> <message>.
If I understand correctly, the code above looks for the literal "<?xml version=". If that is found, it looks for a version number, as described by the regular expression. If that is found, it searches the next 4096 bytes until it finds the literal "<svg". If any of this fails, it looks for the literal "<svg" at the start of the file, and so on.
Something similar could be implemented in Python.
Note there's also python-magic, which provides an interface to libmagic, as used by the unix file command.
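As a loose approximation of the two rules quoted above, here is what such a check might look like in Python. It follows the description given in this answer rather than libmagic's exact matching semantics, and the function name looks_like_svg is made up for the example.

```python
import re

XML_PREFIX = b"<?xml version="  # 14 bytes long, hence the offset below

# Version number, optionally wrapped in quotes or whitespace (from the magic rule).
VERSION_RE = re.compile(rb"""['" \t]*[0-9.]+['" \t]*""")


def looks_like_svg(head: bytes) -> bool:
    """Apply the two quoted magic rules to the first bytes of a file."""
    # Second rule: the file starts directly with "<svg".
    if head.startswith(b"<svg"):
        return True
    # First rule: XML declaration, then a version, then "<svg" nearby.
    if not head.startswith(XML_PREFIX):
        return False
    if not VERSION_RE.match(head, len(XML_PREFIX)):
        return False
    return b"<svg" in head[19:19 + 4096]


samples = [
    b'<?xml version="1.0" encoding="UTF-8"?>\n<svg xmlns="http://www.w3.org/2000/svg"/>',
    b'<svg xmlns="http://www.w3.org/2000/svg"></svg>',
    b'<?xml version="1.0"?><html/>',
    b"\x89PNG\r\n\x1a\n",
]
results = [looks_like_svg(sample) for sample in samples]
print(results)
```

Like the original rules, this relies on fixed offsets, so leading whitespace or a byte order mark before the declaration defeats it.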
