Python Reportlab combine paragraph - python

I hope you can help me combine two paragraphs into one. My style is called "cursiva" and works perfectly; I have other styles too, and the result is the same whichever one I use. The issue is that with the code below the two pieces are rendered with a line break between them, and I need them to appear together.
I need the output to read "one, one" on a single line while using two styles. I'm using Arial Narrow, so for italic or bold I have to use a separate style for each, because the typeface does not allow me to use "<i>italic text</i>". The two styles work fine separately.
How can I achieve this?
cursiva = ParagraphStyle('cursiva')
cursiva.fontSize = 8
cursiva.fontName= "Arialni"
incertidumbre=[]
incertidumbre.extend([Paragraph("one", cursiva), Paragraph("one", cursiva)])
Thank you guys

The problem you describe is actually caused by a workaround for a different problem: the font family has not been registered in Reportlab, and that registration is what makes <i> and <b> work.
You have probably already managed to add a custom font, so the first part should look familiar; the final line is probably the missing link. It registers the combination of these fonts as a family.
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.pdfmetrics import registerFontFamily
from reportlab.pdfbase.ttfonts import TTFont
pdfmetrics.registerFont(TTFont('Arialn', 'Arialn.ttf'))
pdfmetrics.registerFont(TTFont('Arialnb', 'Arialnb.ttf'))
pdfmetrics.registerFont(TTFont('Arialni', 'Arialni.ttf'))
pdfmetrics.registerFont(TTFont('Arialnbi', 'Arialnbi.ttf'))
registerFontFamily('Arialn',normal='Arialn',bold='Arialnb',italic='Arialni',boldItalic='Arialnbi')

Related

auto formatting python code to put parameters on same line

I really hate spread-out code. I am looking at a lot of long code whose parameters and arguments take up way too much space.
def __init__(self,
             network,
             value_coef,
             entropy_coef,
             lr=None,
             eps=None,
             max_grad_norm=None,
             conv=False):
It seems whoever wrote it enforced a 50-character line limit, which I wholeheartedly disagree with. I would much rather it looked like this.
def __init__(self, network, value_coef, entropy_coef, lr=None, eps=None, max_grad_norm=None, conv=False):
There is also more nonsense like this which I would like to get rid of.
if self.conv:
    grid_obs = rollouts.grid_obs[:-1]\
        .view(-1, *rollouts.grid_obs.size()[2:])
    dest_obs = rollouts.dest_obs[:-1]\
        .view(-1, *rollouts.dest_obs.size()[2:])
    obs = (grid_obs, dest_obs)
I am using VS Code for Python. I am an ex-IntelliJ user and miss all the built-in code formatting tools. Has anyone got any tips? I have been looking at autopep8, but it seems to be missing that functionality.
First, that's not 50 chars limit but 79 (as per pep8 conventions) and the way you would like to have it wouldn't be pep8 compliant as it's over 100 columns.
So, for the first snippet you can have it the way you don't like it (which is the correct way) or let your formatter know that you want the line-length to be over 79 columns.
For the second snippet you can remove the escape character \ and let the formatter do its job. I don't think it's 'nonsense' as you call it, but feel free to format it differently.
Autopep8 or Black both work very well and they are not missing any functionality.
Provided you installed one or the other, you have to add the proper key/value pair to your settings.json:
"python.formatting.provider": "autopep8" // (or "black")
If you use autopep8, for example, you can specify the line length you want (150 in your case) by adding this to your settings.json file. Note that autopep8 calls the option --max-line-length:
"python.formatting.autopep8Args": [
    "--max-line-length=150"
]
The same goes for black. In that case the value would be:
"python.formatting.blackArgs": [
"--line-length=150"
]
Formatting with that parameter will wrap your code to that amount.
You can format code with alt+shift+f (on a Mac) or right click on the editor and "Format Document".

Can python-docx preserve font color and styles when importing documents?

Essentially what I need to do is write a program that takes in many .docx files and puts them all in one, ordered in a certain way. I have importing working via:
import docx, os, glob
finaldocname = 'Midterm-All-Questions.docx'
finaldoc=docx.Document()
docstoworkon = glob.glob('*.docx')
if finaldocname in docstoworkon:
    docstoworkon.remove(finaldocname) #dont process final doc if it exists
for f in docstoworkon:
    doc=docx.Document(f)
    fullText=[]
    for para in doc.paragraphs:
        fullText.append(para.text) #generates a long text list
    # finaldoc.styles = doc.styles
    for l in fullText:
        # if l=='u\'\\n\'':
        if '#' in l:
            print('We got here!')
            if '#1 ' not in l: #check last two characters to see if this is the first question
                finaldoc.add_section() #only add a page break between questions
        finaldoc.add_paragraph(l)
        # finaldoc.add_page_break
    # finaldoc.add_page_break
finaldoc.save(finaldocname)
But I need to preserve text styles, like font colors, sizes, italics, etc., and they aren't in this method since it just gets the raw text and dumps it. I can't find anything on the python-docx documentation about preserving text styles or importing in something other than raw text. Does anyone know how to go about this?
Styles are a bit difficult to work with in python-docx but it can be done.
See this explanation first to understand some of the problems with styles and Word.
The Long Way
When you read in a file as a Document() it will bring in all of the paragraphs and within each of these are the runs. These runs are chunks of text with the same style attached to them.
You can find out how many paragraphs or runs there are by doing len() on the object or you can iterate through them like you did in your example with paragraphs.
You can inspect the style of any given paragraph but runs may have different styles than the paragraph as a whole, so I would skip to the run itself and inspect the style there using paragraphs[0].runs[0].style which will give you a style object. You can inspect the font object beyond that which will tell you a number of attributes like size, italic, bold, etc.
Now to the long solution:
First create a new blank paragraph, then add_run() the text from your original one run at a time. For each run you can set a style attribute, but it has to be a named style as described in the first link. You cannot apply a style object directly, as it won't copy the attributes over. There is a way around that: check the attributes you care about copying to the output, then make sure your new run applies the same attributes.
doc_out = docx.Document()
for para in doc.paragraphs:
    p = doc_out.add_paragraph()
    for run in para.runs:
        r = p.add_run(run.text)
        if run.bold:
            r.bold = True
        if run.italic:
            r.italic = True
        # etc
Obviously this is inefficient and not a great solution, but it will work to ensure you have copied the style appropriately.
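The "# etc" part of the loop above could be collected into one helper. This is only a sketch under assumptions: it copies a handful of attributes that python-docx runs expose (bold, italic, underline, font.name, font.size, font.color.rgb), and both copy_run_format and fake_run are hypothetical names. The fake_run stand-ins only mimic the attribute layout so the sketch runs without python-docx installed.

```python
from types import SimpleNamespace


def copy_run_format(src, dst):
    """Copy a few character-formatting attributes from one run to another."""
    # These flags are tri-state: True, False, or None (inherit from style).
    for name in ("bold", "italic", "underline"):
        setattr(dst, name, getattr(src, name))
    dst.font.name = src.font.name
    dst.font.size = src.font.size
    # color.rgb is None when the run has no explicit RGB color.
    if src.font.color.rgb is not None:
        dst.font.color.rgb = src.font.color.rgb


def fake_run(bold=None, italic=None, underline=None,
             name=None, size=None, rgb=None):
    # Stand-in object with the same attribute layout as a run (assumption:
    # used here only for demonstration, not a python-docx class).
    font = SimpleNamespace(name=name, size=size,
                           color=SimpleNamespace(rgb=rgb))
    return SimpleNamespace(bold=bold, italic=italic,
                           underline=underline, font=font)


source = fake_run(bold=True, name="Arial", size=12, rgb="FF0000")
target = fake_run()
copy_run_format(source, target)
```

With real documents the call inside the inner loop would be copy_run_format(run, p.add_run(run.text)). Formatting that comes from a named style rather than from the run itself is not covered by this sketch.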
Add New Styles
There is a way to add styles by name but because it isn't likely that the Word document you are getting the text and styles from is using named styles (rather than just applying bold, etc. to the words that you want), it is probably going to be a long road to adding a lot of slightly different styles or sometimes even the same ones.
Unfortunately that is the best answer I have for you on how to do this. Working with Word, Outlook, and Excel documents is not great in Python, especially for what you are trying to do.

Basics of connecting python to the web and validating user input

I'm relatively new, and I'm just at a loss as to where to start. I don't expect detailed step-by-step responses (though, of course, those are more than welcome), but any nudges in the right direction would be greatly appreciated.
I want to use the Gutenberg python library to select a text based on a user's input.
Right now I have the code:
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
text = strip_headers(load_etext(11)).strip()
where the number represents the text (in this case 11 = Alice in Wonderland).
Then I have a bunch of code about what to do with the text, but I don't think that's relevant here. (If it is let me know and I can add it).
Basically, instead of just selecting a text myself, I want to let the user do that. I want to ask the user for their choice of author and, if Project Gutenberg (PG) has pieces by that author, have them select from the list of book titles. If PG doesn't have anything by that author, return some response along the lines of "sorry, don't have anything by $author_name, pick someone else." Then, once the user has decided on a book, have the number corresponding to that book be entered into the code.
I just have no idea where to start in this process. I know how to handle user input, but I don't know how to take that input and search for something online using it.
Ideally, I'd be able to handle things like spelling mistakes too, but that may be down the line.
I really appreciate any help anyone has the time to give. Thanks!
The gutenberg module includes facilities for searching for a text by metadata, such as author. The example from the docs is:
from gutenberg.query import get_etexts
from gutenberg.query import get_metadata
print(get_metadata('title', 2701)) # prints frozenset([u'Moby Dick; Or, The Whale'])
print(get_metadata('author', 2701)) # prints frozenset([u'Melville, Hermann'])
print(get_etexts('title', 'Moby Dick; Or, The Whale')) # prints frozenset([2701, ...])
print(get_etexts('author', 'Melville, Hermann')) # prints frozenset([2701, ...])
It sounds as if you already know how to read a value from the user into a variable, and replacing the literal author in the above would be as simple as doing something like:
author_name = my_get_input_from_user_function()
texts = get_etexts('author', author_name)
Note the following caveat from the same section:
Before you use one of the gutenberg.query functions you must populate the local metadata cache. This one-off process will take quite a while to complete (18 hours on my machine) but once it is done, any subsequent calls to get_etexts or get_metadata will be very fast. If you fail to populate the cache, the calls will raise an exception.
With that in mind, I haven't tried the code I've presented in this answer because I'm still waiting for my local cache to populate.
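For the spelling mistakes mentioned in the question, the standard library's difflib can pick the closest known author name before the lookup. This is a rough sketch, not part of the gutenberg module: CATALOG is a hypothetical in-memory stand-in for the metadata cache, the "Carroll, Lewis" author string is an assumption, and find_author and titles_for are made-up helper names. Only the text numbers 11 and 2701 come from the examples above.

```python
import difflib

# Hypothetical stand-in for the metadata cache: author -> {title: number}.
# A real program would query get_etexts / get_metadata instead.
CATALOG = {
    "Carroll, Lewis": {"Alice in Wonderland": 11},
    "Melville, Hermann": {"Moby Dick; Or, The Whale": 2701},
}


def find_author(name, catalog=CATALOG):
    """Return the closest known author name, or None if nothing is close."""
    # get_close_matches is case-sensitive and ranks by similarity ratio.
    matches = difflib.get_close_matches(name, list(catalog), n=1, cutoff=0.6)
    return matches[0] if matches else None


def titles_for(name, catalog=CATALOG):
    """Return {title: number} for the best-matching author, or {} if none."""
    author = find_author(name, catalog)
    return catalog[author] if author else {}
```

An empty result from titles_for is the cue to print the "sorry, don't have anything by that author" message and ask again; otherwise the chosen title's number can be passed to load_etext.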

Split Text into paragraphs NLTK - usage of nltk.tokenize.texttiling?

I was looking at methods to split documents into paragraphs and I came across texttiling as one possible way to do this.
Here is my attempt to use it. However, I don't understand how to work with the output. I'd appreciate your help.
t = unidecode(doclist[0].decode('utf-8','ignore'))
nltk.tokenize.texttiling.TextTilingTokenizer(t)
output:
<nltk.tokenize.texttiling.TextTilingTokenizer at 0x11e9c6350>
I'm messing around with this one myself just now for the same reason you are and had the same question you did so don't be too upset if this is wrong. I figured best to pass on what little I know... :)
I'm not sure yet but I found in this bug report an example of using the TextTilingTokenizer:
import nltk
alice=nltk.corpus.gutenberg.raw('carroll-alice.txt')
ttt = nltk.tokenize.TextTilingTokenizer()
tiles = ttt.tokenize(alice[140309 : ])
It appears that you want to feed your text to the tokenize method of the TextTilingTokenizer, rather than passing it to the constructor.

How to transform hyperlink codes into normal URL strings?

I'm trying to build a blog system. So I need to do things like transforming '\n' into <br /> and transforming http://example.com into <a href='http://example.com'>http://example.com</a>
The former thing is easy - just using string replace() method
The latter thing is more difficult, but I found solution here: Find Hyperlinks in Text using Python (twitter related)
But now I need to implement "Edit Article" function, so I have to do the reverse action on this.
So, how can I transform <a href='http://example.com'>http://example.com</a> into http://example.com?
Thanks! And I'm sorry for my poor English.
Sounds like the wrong approach. Making round-trips work correctly is always challenging. Instead, store the source text only, and only format it as HTML when you need to display it. That way, alternate output formats / views (RSS, summaries, etc) are easier to create, too.
Separately, we wonder whether this particular wheel needs to be reinvented again ...
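The "store the source, format on display" idea could look roughly like this. It is a sketch only: render_html and URL_RE are hypothetical names, the URL pattern is deliberately simple, and it is not the linkifier from the other question.

```python
import html
import re

# Simplified URL pattern (assumption): a scheme followed by any run of
# characters that are not whitespace or '<'.
URL_RE = re.compile(r"(https?://[^\s<]+)")


def render_html(source):
    """Turn the stored plain text into HTML at display time."""
    # Escape first so user-typed markup is shown literally, not interpreted.
    escaped = html.escape(source)
    # Wrap each URL in an anchor, then convert newlines to line breaks.
    linked = URL_RE.sub(r"<a href='\1'>\1</a>", escaped)
    return linked.replace("\n", "<br />")
```

With this arrangement the "Edit Article" form simply loads the stored source text, so no reverse transformation is needed.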
Since you are using the answer from that other question, your links will always be in the same format, so this should be pretty easy with a regex. I don't know Python well, but going by the answer to that question:
import re
myString = "This is my tweet check it out <a href='http://tinyurl.com/blah'>http://tinyurl.com/blah</a>"
# Capture the href value and replace the whole anchor with it
r = re.compile(r"<a href='(http://[^']+)'>[^<]*</a>")
print(r.sub(r'\1', myString))
Should work.
