Setting Font Attributes Using Python-Docx - python

I am creating a word document programmatically using the Python-docx module.
I want to be able to center my headers, turn certain words to bold in a table I create, and do other basic mark up.
Unfortunately, reading over the source code in the module doesn't give me much of a lead on doing this.
I'm guessing it has something to do with the lxml/etree module that the docx code is based upon, but I don't have much familiarity with that library. Any ideas?

The link above points to the legacy repository for python-docx. The new one (v0.3.0 and later) is a complete rewrite and is located here: https://github.com/python-openxml/python-docx
All the features listed above are available in the current version.
The documentation is here: https://python-docx.readthedocs.org/en/latest/
Only bug fixes are being done on the legacy version, to support projects that still use it.
The python-docx SO tag is monitored, and questions tagged with it now usually get answered the same day.

Related

rule of thumb to group/split your own functions/classes into modules

Context
I am writing my own library for data analysis. It includes classes to import data from the server and procedures to clean, analyze and display results. It also includes functions to compare results.
Concerns
I put all of these in a single module and import it whenever I start a new project.
I put any newly developed classes/functions in the same file.
I am concerned that my module is becoming longer and harder to browse and explain.
Questions
I started Python six months ago and want to know common practices:
How do you group your function/classes and put them into separated files?
By Purpose? By project? By class/function?
Or do you not do it at all?
In general, how many lines of code should a single module have?
What's the best way to track the dependencies among your own libraries?
Feel free to suggest any thoughts.
I believe the best way to answer this question is to look at what the leaders in this field are doing. There is a very healthy ecosystem of modules available on PyPI whose authors have wrestled with this question. Take a look at some of the modules you use frequently and thus already have installed on your system. Or better yet, many of those modules have their development versions hosted on GitHub (the PyPI page usually has a pointer). Go there and look around.
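One layout you will see in many of those projects is grouping by purpose: a single package with one submodule per concern, and an `__init__.py` that re-exports the public functions. The sketch below builds such a package in a temporary directory so it runs as a single script; the package name `analysis_lib`, its submodules and functions are all hypothetical stand-ins for the import/clean/compare pieces described in the question.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical package: one submodule per purpose, with __init__.py
# re-exporting the public functions so callers need only one import.
MODULES = {
    "__init__.py": """
        from .loading import load_rows
        from .cleaning import drop_missing
        from .comparing import diff_totals
    """,
    "loading.py": """
        def load_rows():
            # Stand-in for a server query; returns in-memory sample data.
            return [{"id": 1, "value": 10}, {"id": 2, "value": None}, {"id": 3, "value": 5}]
    """,
    "cleaning.py": """
        def drop_missing(rows):
            return [row for row in rows if row["value"] is not None]
    """,
    "comparing.py": """
        def diff_totals(a, b):
            return sum(row["value"] for row in a) - sum(row["value"] for row in b)
    """,
}


def build_package(root):
    """Write each source string to root/analysis_lib/<filename>."""
    package_dir = root / "analysis_lib"
    package_dir.mkdir()
    for filename, source in MODULES.items():
        (package_dir / filename).write_text(textwrap.dedent(source))
    return package_dir


root = Path(tempfile.mkdtemp())
build_package(root)
sys.path.insert(0, str(root))
importlib.invalidate_caches()
analysis_lib = importlib.import_module("analysis_lib")

rows = analysis_lib.load_rows()
clean = analysis_lib.drop_missing(rows)
```

Because each submodule only imports what it needs, the dependencies between the pieces stay visible in their `import` lines, and a growing module can later be split further without changing the `from analysis_lib import ...` lines in your projects.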

reading coreproperties keywords from docx file with python-docx

From the script here I see how to set document keywords with the coreproperties function of python-docx. I want to look at the keywords already in a document written by someone else. Is there a getcoreproperties function or a keywords attribute or something similar?
I've grepped in folder C:\Python27\Lib\site-packages\python_docx-0.5.0-py2.7.egg\docx and none of the .py files there have the string "core" in them, and I've called doc() on a few things but without finding anything promising. Where/how should I look for clues to this kind of thing?
The python-docx library doesn't have support for core properties as of v0.5.0. But as it happens, that should be relatively easy to remedy.
The python-pptx sister project has support for core properties, as explained here:
http://python-pptx.readthedocs.org/en/latest/api/presentation.html#coreproperties-objects
Since the two projects are based on the same architecture, that code should be reusable essentially as-is. It turns out the core-properties bits are part of the Open Packaging Conventions, which are the same for all three of the MS Office XML file formats.
If you'll add an issue on the GitHub issue tracker I'll see how soon we can get to it.
https://github.com/python-openxml/python-docx/issues
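Later python-docx releases did add this, exposed as `Document(path).core_properties.keywords`. For an older install, or just to see where the data lives, here is a minimal standard-library sketch: a .docx file is a zip package, and the core properties are an XML part whose `cp:keywords` element holds the keywords. It assumes the part is at the conventional location `docProps/core.xml`; strictly, the location is declared in the package relationships (`_rels/.rels`), which this sketch does not resolve.

```python
import xml.etree.ElementTree as ET
import zipfile

CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_core_properties(path_or_stream):
    """Return keywords/title/creator from docProps/core.xml, or {} if that part is absent."""
    with zipfile.ZipFile(path_or_stream) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}
    root = ET.fromstring(xml_bytes)

    def text(namespace, tag):
        element = root.find("{%s}%s" % (namespace, tag))
        return element.text if element is not None else None

    return {
        "keywords": text(CP_NS, "keywords"),
        "title": text(DC_NS, "title"),
        "creator": text(DC_NS, "creator"),
    }
```

Calling `read_core_properties("someone_elses.docx")["keywords"]` returns the keyword string exactly as Word stored it (often comma- or semicolon-separated), or `None` if the document has no keywords.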

Adding text to a 'paragraph' containing an image using python docx

I'm using python-docx, whose documentation says:
'Often, a picture is placed in a paragraph by itself, but this is not required. It can have text before and after it in the paragraph in which it’s placed.'
But I can't work out how to do this. Could someone explain (ideally with a basic example) how to get text before the image within the same paragraph, so that the line of text ends with an image?
I've not found any answers to this but have seen people asking the same elsewhere with no solution.
Thanks
(Note: I'm not a hugely experienced programmer, and other than this awkward part the rest of my code will be very basic.)
At the time of this writing, python-docx doesn't have the features to support what you're trying to do.
The feature that would support it would be Run.add_picture(). If you add a feature request to the python-docx issue tracker, I'll see how soon we can get to it.
In the meantime, if you wanted to dig in and see what you could hack up, I'd recommend starting here, at Document.add_picture, as the structure would be analogous and use mostly the same calls.
If you just want to write docx files with Python, you can use another module:
https://github.com/rafaels88/py2docx

Can Python recognize a formula from Excel cell?

Can Python recognize a formula in an Excel cell and skip processing the cells that contain formulas?
Yes, but instead of reinventing the wheel, I would use one of these libraries.
They seem to provide what you need, and a tutorial is also available.
According to the answer provided by the author of xlrd in Jan 2011, xlrd does not currently provide access to Excel formulas. As I'm currently trying to do this myself, I'm inclined to believe that this is still the case: I'm working with version 0.9.2, which according to GitHub is the latest. I've just noticed there is xlrd1 on PyPI, but the two limitations listed for it are
There is no support for files in the Microsoft Excel 2007/2010 format
One cannot extract formulas from the input file
so this offers no joy either.
Although my search is hardly exhaustive (I'm in a hurry), I'm of the opinion that the only sure way of accessing formulas in Python is to access the underlying COM object via Mark Hammond's pywin32. Fairly obviously, you will need Excel installed, which limits this solution to Windows. I'm currently using the Python Excel website for a bit of inspiration. I'm afraid I don't have any reliable or coherent code as yet; my answer is posted mainly to warn that xlrd is sadly not yet the answer to grabbing Excel formulas via Python.
===
Tue 18.Mar.2014
BTW, as I am moderately new to Stack Overflow and currently lack the reputation to comment or vote, I would add that this answer was posted specifically because the previous answer was inadequate for the question "can Python recognize a formula from an Excel cell?", which xlrd, for all its merits, cannot do. The main reason I posted an incompletely researched answer was to warn other users of a false positive, which is what xlrd is in this instance.
I'm currently engaged in many tasks, one of which involves this question. If I find an approach which "does" rather than one which "might" answer this question, other than the approach I have given, I will amend my answer.
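For .xlsx files specifically there is also a cross-platform option the answers above do not mention: the openpyxl library (a different library from xlrd, swapped in here) keeps formulas when a workbook is loaded with its default `data_only=False`, and marks such cells with `cell.data_type == "f"`. A minimal sketch, assuming openpyxl is installed; `demo_workbook()` only builds a throwaway sample in memory so the example is self-contained. Note that this reads the formula text, not Excel's computed result, and it does not handle legacy .xls files.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def values_without_formulas(source):
    """Return (coordinate, value) for non-empty cells that do not hold a formula."""
    workbook = load_workbook(source)  # data_only=False (the default) keeps formulas
    sheet = workbook.active
    result = []
    for row in sheet.iter_rows():
        for cell in row:
            if cell.value is None or cell.data_type == "f":
                continue  # skip empty cells and formula cells
            result.append((cell.coordinate, cell.value))
    return result


def demo_workbook():
    """Build a small in-memory workbook: two numbers and one formula."""
    workbook = Workbook()
    sheet = workbook.active
    sheet["A1"] = 1
    sheet["A2"] = 2
    sheet["A3"] = "=SUM(A1:A2)"
    buffer = BytesIO()
    workbook.save(buffer)
    buffer.seek(0)
    return buffer
```

With the sample above, `values_without_formulas(demo_workbook())` keeps A1 and A2 and skips the `=SUM(A1:A2)` cell; pass a filename instead to process a real workbook.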

Sublime Text editor plug-in, scan div IDs and classes

I'm aware I'm supposed to show some starting code to give you a clue as to what I'm trying to do, but I'm really at a basic level and I can't find any resources that show me what I'm after. Basically, I'm trying to write a plug-in for the Sublime Text editor that selects all div IDs and then outputs them to a file. What's the best approach? It seems like it should be easy, but I'm not too sure.
Thanks in advance for your help,
Ewan
This looks like a good place to start: http://www.sublimetext.com/docs/plugin-basics
Look at http://www.sublimetext.com/docs/2/api_reference.html, though be advised that Sublime Text 3 is currently in beta. It introduces changes to the plugin API and a requirement to support Python 3. See http://www.sublimetext.com/docs/3/porting_guide.html
Assuming you have some familiarity with Python, I would start with this tutorial on writing plugins (Link). The author of that tutorial wrote, among other things, Package Control. Granted, it is for ST2, but for what you are trying to do, I don't foresee any major issues with writing a plugin that is compatible with both ST2 and ST3.
How you go about writing your particular plugin is up to you. One approach may be to leverage the view.find_all() method, which takes a regular expression and returns a list of regions. From these regions you can grab the text, and from that the IDs of the divs. There may be a better way, but that might work as an initial attempt. Writing to a file can be done through the usual Python means.
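A rough sketch of the find_all() approach, assuming a regex is good enough for your HTML (it is not a real HTML parser). The command name `ExportDivIdsCommand` (run as `export_div_ids`) and the output path `~/div_ids.txt` are hypothetical choices. The text handling is kept in plain functions so it can be tried outside the editor, the Sublime-specific part is only defined when the `sublime` modules are importable, and f-strings and `str.format` are avoided so the same file can load under ST2's older Python as well as ST3.

```python
import io
import os
import re

# Opening <div ...> tags; this is the regex handed to view.find_all().
DIV_TAG_PATTERN = r"<div\b[^>]*>"
# An id="..." or id='...' attribute inside one tag. The leading \s stops
# attributes such as data-id from matching.
ID_PATTERN = re.compile(r"""\sid\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def id_from_tag(tag_text):
    """Return the id of a single <div ...> tag, or None if it has none."""
    match = ID_PATTERN.search(tag_text)
    return match.group(1) if match else None


def extract_div_ids(text):
    """Plain-Python equivalent of the plugin logic, handy for testing."""
    ids = []
    for tag in re.finditer(DIV_TAG_PATTERN, text):
        div_id = id_from_tag(tag.group(0))
        if div_id:
            ids.append(div_id)
    return ids


def write_ids(ids, path):
    """Write one ID per line using the usual Python file handling."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(u"\n".join(ids))


try:
    import sublime
    import sublime_plugin
except ImportError:  # running outside Sublime Text
    sublime_plugin = None

if sublime_plugin is not None:

    class ExportDivIdsCommand(sublime_plugin.TextCommand):
        """Hypothetical command: view.run_command("export_div_ids")."""

        def run(self, edit):
            ids = []
            # find_all() returns the regions matching the regex;
            # view.substr() gives the text of each region.
            for region in self.view.find_all(DIV_TAG_PATTERN):
                div_id = id_from_tag(self.view.substr(region))
                if div_id:
                    ids.append(div_id)
            output_path = os.path.join(os.path.expanduser("~"), "div_ids.txt")
            write_ids(ids, output_path)
            sublime.status_message("Wrote %d div IDs to %s" % (len(ids), output_path))
```

Collecting class names would work the same way with a second attribute pattern for `class="..."`, splitting the value on whitespace.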
