Maybe this is a dumb question, but I don't get it so appologize :)
I have an RTF document, and I want to change it. E.g. there is a table, I want to duplicate a row and change the text in the second row in my code in an object-oriented way.
I think pyparsing should be the way to go, but I'm fiddling around for hours and don't get it. I'm providing no example code because it's all nonsense I think :/
Am I on the right path or is there a better approach?
Anyone did something like that before?
RTFs are text documents with special "symbols" to create the formatting. (see - http://search.cpan.org/~sburke/RTF-Writer/lib/RTF/Cookbook.pod#RTF_Document_Structure It seems that perl has a good RTF library though), so yes, PyParsing is a good way to go. You have to learn the structure and then parse (there are perl code examples in the page i mentioned. If you are lucky you can translate them in python with some effort)
There is a basic RTF module available for python. Check - http://pyrtf.sourceforge.net/
Hope that helps you a little.
Related
I'm using python docx which claims in the documentation that:
'Often, a picture is placed in a paragraph by itself, but this is not required. It can have text before and after it in the paragraph in which it’s placed.'
But I cant find out how to do this, could someone explain (idealy with a basic example) how I get text before the image while in the same paragraph please. So the line of text ends with an image.
I've not found any answers to this but have seen people asking the same elsewhere with no solution.
Thanks
(note: I'm not a hugely experiance programmer and other than this awkward part the rest of my code will very basic)
At the time of this writing, python-docx doesn't have the features to support what you're trying to do.
The feature that would support it would be Run.add_picture(). If you add a feature request to the python-docx issue tracker, I'll see how soon we can get to it.
In the meantime, if you wanted to dig in and see what you could hack up, I'd recommend starting here, at Document.add_picture, as the structure would be analogous and use mostly the same calls.
If you just want to write docx files with Python, you can use another module:
https://github.com/rafaels88/py2docx
I'm aware I'm supposed to show some starting code to give you a clue as to what I'm trying to do, but I'm really at a basic level and I can't find any resources to show me what I'm after. Basically, I'm trying to write a plug-in for Sublime Text editor, which selects all div ID's then outputs them into a file. What's the best approach? It seems like it should be easy, but I'm not too sure.
Thanks in advance for your help,
Ewan
This looks like a good place to start: http://www.sublimetext.com/docs/plugin-basics
Look at http://www.sublimetext.com/docs/2/api_reference.html, though be advised that Sublime Text 3 is currently in beta. It introduces changes to the plugin api, and a requirement to support Python 3. See http://www.sublimetext.com/docs/3/porting_guide.html
Assuming you have some familiarity with python, I would start with this tutorial on for writing plugins (Link). The author of that tutorial wrote, among other things, package control. Granted, it is for ST2, but for what you are trying to do, I don't for see any major issues with writing a plugin that is compatible with both ST2 and ST3.
How you go about writing your particular plugin is up to you. One approach may be leveraging the view.find_all() method. This takes a regular expression and returns a set of regions. From these regions, you can grab the text, and subsequently the IDs for the divs. There may be a better way, but that might work as an initial attempt. Writing to a file can be done through the usual python means.
My python env is 2.7
I know this is an old question, but I've lost my mind while I was searching and reading other people's questions and answers. Some of them is really out of date. Like the code below:
import lxml #wrong
import xml #correct
So, since I'm a newbie to python and know nothing whatsoever in the great python history, I wanna make things more clear to me. Such as, what is the so-called standard xml-parser module in python now? what can I do when I need parse some HTML by using the xpath syntax. If I have a mal-formed HTML source code, how can handle it by not using BeautifulSoup or something else like. If u can brief me with something, I'll be much appreciated.
OK, all in all, I just got one question. How can I parse mal-formed html code by using standard python module with python2.7?
Read the python library documentation if you need to stick to the standard library.
If you don't, definitely look at lxml, which does much more.
I have a need to identify comments in different kinds of source files in a given directory. ( For example java,XML, JavaScript, bash). I have decided to do this using Python (as an attempt to learn Python). The questions I have are
1) What should I know about python to get this done? ( I have an idea that Regular Expressions will be useful but are there alternatives/other modules that will be useful? Libraries that I can use to get this done?)
2) Is Python a good choice for such a task? Will some other language make this easier to accomplish?
Your problem seems to be more related to programming language parsing. I believe with regular expressions you will be able to find comments in most of the languages. The good thing is that you have regular expressions almost everywhere: Perl, Python, Ruby, AWK, Sed, etc.
But, as the other answer said, you'd better use some parsing machinery. And, if not a full blown parser, a lexer. For Python, check out the Pygments library, which has lexers for many languages already implemented.
1) What you need to know about is parsing, not regex. Additionally you will need the os module and some knowledge about pythons file handling. DiveIntoPython (http://www.diveintopython.net/) is a good start here. I'd recommend chapter 6. (And maybe 1-5 as well :) )
2) Python is a good start. Another language is not going to make it easier, but different. Python allready is pretty simple to start with.
I would recommend not to use regex for your task, as it is as simple as searching for comment signs and linefeeds.
The pyparsing module directly supports several styles of comments. E.g.,
from pyparsing import javaStyleComment
for match in javaStyleComment.scanString(text):
<do stuff>
So if your goal is just getting the job done, look into this since the comment parsers are likely to be more robust than anything you'd throw together. If you're more interested in learning to do it yourself, this might be too much processed food for your taste.
I've read through some documentation on the re module that comes with built-in python, but I just can't seem to get a grasp on it. In fact, I'm not exactly sure that is what I'm looking for, so let me explain:
I have a huge dictionary. What I want is to be able to type in a search criteria, let's say for example hello, and then have it search through the dictionary and give me a list like this:
hello, hell, hello world, hello123. Basically anything resembling the search criteria. Would I use regex for this or something else?
Since you are using Python, you should look at Xapian, it had great Python bindings.
What you are asking for is way more sophisticated that what regular expressions are for.
You need full text search, with stemming and other tricks to do the fuzzy matching.
You might want to look at something that can compute a Levenshtein (edit) distance. There's an excellent article here on how to build something like you are talking about from scratch (in Python! well and it has been ported to lots of other languages).
You might not want to go the "from-scratch" route, but the article will give you lots of interesting background that should help you decide which tool has the right level of sophistication for you. Xapian, as suggested above, Lucene, and other full-text search engines will provide this kind of capability, and it can be very sophisticated, but then again you might not need all that.
There is a new regexp module in PyPI repository (which will possibly replace the current Python re module sometimes).
It allows fuzzy matching.