I'm a beginner in Python, and one of the first programs I've written is an RPG, so a lot of text is printed from strings. Before I learned how to word-wrap, I used to test every string and put a "\n" in the right places, so the story would be easier to read in the console.
But now I don't need those "\n" anymore, and replacing each one of them with the Replace dialog of Python IDLE has been really laborious. One of the problems is that I want to ignore double newlines ("\n\n"), because they do make the text more presentable.
So if I just search for "\n", it finds every one of them, but I want to skip all the "\n\n".
I tried the "Regular expression" option and researched regex, but with no success, since I'm completely new to this area. I tried things like "^\n$" because, if I understood it right, the ^ and the $ restrict the search to what's between them.
I think it's clear what I need, but I'll write an example anyway:
print("Here's the narrator telling some things to the player. Of course I could do some things but\nnow it's time to ask for help!\n\nProbably it's a simple thing, but it's been lots of time in research and no\nsuccess...")
I want to find those two single "\n" and replace each with a space (" "), leaving the "\n\n" alone.
Can you guys help? Thanks in advance.
You need
re.sub(r'(?<!\n)\n(?!\n)', ' ', text)
Details
(?<!\n) - no newline allowed immediately on the left
\n - a newline
(?!\n) - no newline allowed immediately on the right
Python demo:
import re
text = "Here's the narrator telling some things to the player. Of course I could do some things but\nnow it's time to ask for help!\n\nProbably it's a simple thing, but it's been lots of time in research and no\nsuccess..."
print(re.sub(r'(?<!\n)\n(?!\n)', ' ', text))
Output:
Here's the narrator telling some things to the player. Of course I could do some things but now it's time to ask for help!
Probably it's a simple thing, but it's been lots of time in research and no success...
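One caveat for the original use case: inside the .py file, each "\n" is two characters (a backslash followed by n), not an actual newline. The pattern above therefore only works on the string at run time. Below is a sketch for rewriting the source text instead. It assumes you have read the file's contents into a string, and the sample line is made up. The same escaped pattern should also work in IDLE's Replace dialog with "Regular expression" ticked, since IDLE searches with Python's re module, but I haven't verified that.

```python
import re

# Matches the two-character escape sequence backslash + "n" as written in source code,
# but not when it is part of a doubled "\n\n" escape.
SINGLE_ESCAPED_NEWLINE = re.compile(r'(?<!\\n)\\n(?!\\n)')

# Hypothetical line of source code; the raw string keeps the backslashes literal.
source_line = r'print("some things but\nnow ask for help!\n\nProbably no\nsuccess...")'

fixed = SINGLE_ESCAPED_NEWLINE.sub(' ', source_line)
print(fixed)
```

This rewrites every single "\n" escape in the text, including any you may want to keep, so review the result before saving.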
Related
I am writing a piece of code to get lyrics from genius.com.
I have managed to extract the text from the website, but it comes out in a format where all the text is on one line.
I have used regex to add a space, but I cannot figure out how to add a newline. Here is my code so far:
text_container = re.sub(r"(\w)([A-Z])", r"\1 \2", text_container.text)
This adds a space before each capital letter, but I cannot figure out how to add a newline.
It returns: [Verse 1]Leaves are fallin' down on the beautiful ground I heard a story from the man in red He said, "The leaves are fallin' down
I would like to add a newline before "He" in the output.
Any help would be greatly appreciated.
Thanks :)
If genius.com doesn't somehow provide a separator, it will be very hard to find a way to know what to look for.
In your example, I made a regex searching for " [A-Z]", which will find " He...". But it will also find every place where a sentence starts with " I...". New lines do sometimes start with "I...", but the regex may also create new lines where there shouldn't be one.
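Here is a minimal sketch of that idea. It assumes the scraped lyrics are already in a plain string (the sample is taken from the question) and replaces a space with a newline whenever the next character is a capital letter:

```python
import re

lyrics = ("Leaves are fallin' down on the beautiful ground "
          "I heard a story from the man in red He said")

# Replace a space with a newline when the next character is an uppercase ASCII letter.
# Caveat: this also fires before names or "I" in the middle of a line.
with_breaks = re.sub(r' (?=[A-Z])', '\n', lyrics)
print(with_breaks)
```

The lookahead `(?=[A-Z])` leaves the capital letter in place and consumes only the space.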
TL;DR - genius.com needs to provide some sort of separator so we know when there should be a new line.
Disclaimer: Unless I missed something in your description/example
A quick skim of the view-source for a genius lyrics page suggests that you're stripping all the HTML markup which would otherwise contain the info about linebreaks etc.
You're probably better off posting that code (likely as a separate question) and asking how to correctly extract not just the text nodes, but also enough of the <span> structure to format it as necessary.
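Below is a rough sketch of what keeping the line breaks could look like, using only the standard library's html.parser. It assumes the lyrics markup separates lines with <br> tags, which you should check against the actual page source. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LineBreakTextExtractor(HTMLParser):
    """Collect text nodes, turning each <br> tag into a newline (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Both <br> and <br/> end up here: by default HTMLParser routes
        # self-closing tags to handle_starttag as well.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = LineBreakTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


sample = ("<div>Leaves are fallin' down<br>on the beautiful ground<br/>"
          "I heard a story from the man in red<br>He said</div>")
print(html_to_text(sample))
```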
Looking around, I found a Python library, lyricsgenius, for pulling lyrics from Genius.com. Here's the link to its documentation:
https://lyricsgenius.readthedocs.io/en/master/
Just follow the instructions and it should have what you need. With more info on the problem, I could provide a more detailed response.
I'm not sure about using regex. Try this method:
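text = lyrics  # lyrics: the one-line string scraped from the page
new_text = ''
for i, letter in enumerate(text):
    # start a new line before every uppercase letter except the very first character
    if i and letter.isupper():
        new_text += '\n'
    new_text += letter
print(new_text)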
However, as oscillate123 has explained, it will create a new line for every capital letter regardless of the context.
I am working on customer comments, some of which do not follow grammatical rules. For example, in the following text, "(Such as s and b.)", which gives more explanation for the previous sentence, is surrounded by two dots.
text = "I was initially scared of ANY drug after my experience. But about a year later I tried. (Such as s and b.). I had a very bad reaction to this."
First, I want to find ". (Such as s and b.)." and then replace the dot before "(Such as s and b.)" with a space. This is my code, but it does not work.
text = re.sub (r'(\.)(\s+?\(.+\)\s*\.)', r' \2 ', text )
Output should be:
"I was initially scared of ANY drug after my experience. But about a year later I tried (Such as s and b.). I had a very bad reaction to this."
I am using python.
The sample provided does not make much sense because the only change is that the ` character is moved one position to the left.
However, this might do the trick (to keep the dot inside the parenthesis):
text = re.sub(r'\.\s*\)\s*\.', '.)', text)
Or this to have it outside:
text = re.sub(r'\.\s*\)\s*\.', ').', text)
Edit: Or maybe you're looking for this, to remove the dot before the opening parenthesis?
text = re.sub(r'\.(?=\s*\(.*?\)\.)', '', text)
I would suggest this to remove a dot before parentheses when there is another one following them:
text = re.sub(r'\.(\s*?\([^)]*\)\s*\.)', r'\1', text)
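Running that on the sample text from the question reproduces the expected output. The `[^)]*` part stops the match from running past the first closing parenthesis:

```python
import re

text = ("I was initially scared of ANY drug after my experience. "
        "But about a year later I tried. (Such as s and b.). "
        "I had a very bad reaction to this.")

# Drop a dot that is followed by a parenthesised aside which itself ends in a dot;
# group 1 (the aside plus its trailing dot) is put back unchanged.
fixed = re.sub(r'\.(\s*?\([^)]*\)\s*\.)', r'\1', text)
print(fixed)
```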
I'm trying to use docstrings with triple quotes in my Jupyter notebooks, using Python 2.7.
I can disable the autoclose brackets/quotes feature, but I'm quite keen on it; it's a major boost to my workflow.
Does anyone know how to do triple quotes without over-quoting while keeping the autoclose feature?
If I press the " key 3x, I get """""";
If I press it 3x and delete once, I get """"; and
If I press it 3x and delete twice, I get "".
Annoying, right? How can I have the best of both worlds (autoclose and docstrings)?
This is a pretty low-level question, but I haven't seen an easy fix anywhere so the answer should be useful for the community. If you downvote, can you explain why this is a poor question please?
Nothing is wrong. When you type three " your cursor is at the middle of the resulting six. Thus, anything you type is within the string and has been auto-closed.
Type exactly this sequence of characters, without clicking or otherwise moving the cursor: """This is working. The result will be a correctly formatted string, because the closing quotes were already auto-inserted. Therefore you have both docstrings and autoclose.
I'm scraping a set of originally pdf files, using Python. Having gotten them to text, I had a lot of trouble getting the line endings out. I couldn't figure out what the line separator was. The trouble is, I still don't know.
It's not a '\n', or, I don't think, '\r\n'. However, I've managed to isolate one of these special characters. I literally have it in memory, and by doing a call to my_str.replace(eol, ''), I can remove all of these characters from one of my files.
So my question is open-ended. I'm a bit lost when it comes to Unicode and such. How can I identify this character in my files without resorting to something ridiculous, like serializing it and then reading it back in? Is there a way I can refer to it as a code, perhaps? I can't get Python to tell me what it actually IS. All I ever see, whether I print it or call unicode(special_eol), is the character acting as a newline.
Please help! Thanks, and sorry if I'm missing something obvious.
To determine which specific character that is, you can use str.encode('unicode_escape') or repr() to get (in Python 2) an ASCII-printable representation of the character:
>>> print u'☃'.encode('unicode_escape')
\u2603
>>> print repr(u'☃')
u'\u2603'
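In Python 3, here is a sketch of the same idea that also returns the character's official name, using the standard unicodedata module. The character below is only a stand-in (U+2028 LINE SEPARATOR); substitute the one you isolated:

```python
import unicodedata


def describe_char(ch):
    """Return (code point, ASCII-escaped repr, Unicode name) for one character."""
    # Control characters such as form feed have no Unicode name, hence the default.
    return f"U+{ord(ch):04X}", ascii(ch), unicodedata.name(ch, "<no name>")


special_eol = "\u2028"  # stand-in: replace with the character you isolated
print(describe_char(special_eol))
```

Once you know the code point, you can refer to it directly in code, e.g. my_str.replace('\u2028', '').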
I can't get Python to print a Word document. What I am trying to do is open the Word document, print it and close it. I can open Word and the Word document:
import win32com.client
msword = win32com.client.Dispatch("Word.Application")
msword.Documents.Open("X:\Backoffice\Adam\checklist.docx")
msword.visible= True
Next, I tried to print it with:
msword.activedocument.printout("X:\Backoffice\Adam\checklist.docx")
I get the error "print out not valid".
Could someone shed some light on how I can print this file from Python? I think it might be as simple as changing the word "printout". Thanks, I'm new to Python.
msword.ActiveDocument gives you the current active document. The PrintOut method prints that document: it doesn't take a document filename as a parameter.
From http://msdn.microsoft.com/en-us/library/aa220363(v=office.11).aspx:
expression.PrintOut(Background, Append, Range, OutputFileName, From, To, Item,
Copies, Pages, PageType, PrintToFile, Collate, FileName, ActivePrinterMacGX,
ManualDuplexPrint, PrintZoomColumn, PrintZoomRow, PrintZoomPaperWidth,
PrintZoomPaperHeight)
Specifically, Word is trying to use your filename as the Boolean Background argument, which can be set to True to print in the background.
Edit:
Case matters, and the error is a bit bizarre. msword.ActiveDocument.PrintOut() should print it, while msword.ActiveDocument.printout() throws an error complaining that 'PrintOut' is not a property.
I think what happens internally is that Python tries to compensate when you don't match the case on properties but it doesn't get it quite right for methods. Or something like that anyway. ActiveDocument and activedocument are interchangeable but PrintOut and printout aren't.
You probably have to escape the backslash character \ with \\:
msword.Documents.Open("X:\\Backoffice\\Adam\\checklist.docx")
EDIT: Explanation
The backslash is usually used to declare special characters. For example \n is the special character for a new-line. If you want a literal \ you have to escape it.
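Here is a small illustration of the same point using the path from the question. The question's path happens to come through intact because \B, \A and \c are not recognised escape sequences, but a folder or file name starting with n, t, etc. would not. A raw string is an alternative to doubling every backslash, and the file name in the last example is hypothetical:

```python
from pathlib import PureWindowsPath

escaped = "X:\\Backoffice\\Adam\\checklist.docx"  # every backslash doubled
raw = r"X:\Backoffice\Adam\checklist.docx"        # raw string: backslashes kept literally
print(escaped == raw)  # True

# PureWindowsPath turns forward slashes into backslashes when converted to str
normalised = str(PureWindowsPath("X:/Backoffice/Adam/checklist.docx"))
print(normalised == raw)  # True

# Pitfall: an unescaped "\n" inside a path silently becomes a newline character
broken = "X:\notes.docx"  # hypothetical file name starting with "n"
print(len(broken), "\n" in broken)  # 12 True
```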