replace('\n','') not working for series of \n\n\n - python

For a pre-downloaded HTML file, I want to replace multiple \n with a new line.
I have tried
html = html.replace('\n', ''), but it doesn't work.
Is there any other workaround?

The line break may also be represented by \r, so remove both characters:
print(html.replace("\n", "").replace("\r", ""))
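If the goal is to collapse each run of line breaks into a single new line rather than remove them all, a regular expression may help. This is a minimal sketch; the function name collapse_line_breaks and the sample string are made up for illustration.

```python
import re

def collapse_line_breaks(text):
    # Replace any run of \r\n, \r or \n with exactly one \n.
    return re.sub(r"(?:\r\n|\r|\n)+", "\n", text)

sample = "<p>a</p>\n\n\n<p>b</p>\r\n\r\n<p>c</p>"
print(repr(collapse_line_breaks(sample)))
```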

Related

Python \n newline through Variable Posts \n instead of a new line

My code gets the content of the message like so (works fine):
try:
    s, message = nachricht.split("'content': '")
    message, s = message.split("', 'channel_id'")
except:
    message = "."
In the result we get a variable like this:
Hello i\nam\nDeployd
or
Hello i
am
Deployd.
This is the second part; it uses the variable to post the message to Discord:
payload = json.dumps({
    "content": f"{message}\n{attachments}",
    "username": f"{user_name}#{user_discrm}, {user_id}{namehookidentifiyer}",
    "avatar_url": f"https://cdn.discordapp.com/avatars/{user_id}/{avatar_code}.png?size=4096",
})
req = requests.post(f"{webhook}", data=payload.encode(), headers=headers)
while req.status_code == 429:
    time.sleep(1)
    req = requests.post(f"{webhook}", data=payload.encode(), headers=headers)
We only care about the content variable here. As shown, when the content is
f"{message}\n{attachments}"
the \n used in the middle works just as it should.
But the \n inside the message variable behaves as if it were \\n rather than \n, so the outcome is:
message \n variable
attachments variable
I want that \n to become a new line. I hope this is understandable; please help.
-------------------------------------------
I tried not using {message} and instead setting content=message directly;
that gave the same outcome (with the \n being displayed).
As noted above, if the \n is written directly in the content string ("\n"), without a variable, it works fine, so it is not as if \n cannot work.
I also printed the variables after they were created and after sending to Discord;
it was always \n, not \\n, so technically it should have worked.
The attachments variable also contains \n newlines, and those work even though it is a variable used just like the content one. I think that is because the \n inside the attachments variable is defined in a string in the code, not read from outside:
attachments=f"{attachments}\n{value}"
The \n sequences in the content come from text that is split out of a file:
datei = open(f'./data/{guildname}/{kategorie}/{channelname}.txt', 'r')
nachrichten = datei.read()
for nachricht in nachrichten:
    if len(nachricht) >= 10:
        try:
            s, message = nachricht.split("'content': '")
            message, s = message.split("', 'channel_id'")
        except:
            message = "."
Fixed it by adding this code:
s, message1 = nachricht.split("'content': '")
message1, s = message1.split("', 'channel_id'")
message = ""
for value in message1.split('\\n'):
    message = f"{message}\n{value}"
I changed the message variable to message1, then split it on every literal \n and joined the parts back together with a \n that is written in the code itself (message = f"{message}\n{value}"), so that the newline no longer comes from an outside variable. I figured this out by looking at the attachments, because they worked fine; that gave me the idea of moving the \n from the variable's content into the code.
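The fix works because the text read from the file contains the two characters backslash and n, not an actual newline character. A minimal sketch of the same idea using str.replace follows; the helper name unescape_newlines is hypothetical and not part of the original code. Unlike the loop, it does not add a leading newline to the message.

```python
def unescape_newlines(message):
    # '\\n' is a backslash followed by the letter n (two characters);
    # '\n' is a single newline character.
    return message.replace("\\n", "\n")

raw = "Hello i\\nam\\nDeployd"   # what the file actually contains
fixed = unescape_newlines(raw)

print(len("\\n"), len("\n"))
print(fixed)
```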

Unnecessary Indentations in BeautifulSoup

I'm trying to parse a webpage. However, I want to only focus on text within the div tag with "class='body conbody'". I want my program to look inside of this tag and output the text exactly like how they appear on the webpage.
Here is my code so far:
pres_file = directory + "\\" + pres_number + ".html"
with open(pres_file) as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')
desiredText = soup.find('div', class_='body conbody')
for para in desiredText.find_all('p'):
    print(para.get_text())
The problem with my current code is that whenever I try to print the paragraphs, (a), (1), (2), (b), and (c) are always formatted with a lot of unnecessary newlines and additional spaces after it. However, I would like for it to output text that is equivalent to how it looks on the webpage. How can I change my code to accomplish this?
I want my program to look inside of this tag and output the text exactly like how they appear on the webpage.
The browser does a lot of processing to display a web page. This includes removing extra spaces. Additionally, the browser developer tools show a parsed version of the HTML as well as potential additions from dynamic JavaScript code.
On the other hand, you are opening a raw text file and get the text as it is, including any formatting such as indentation and line breaks. You will need to process this yourself to format it the way you want when you output it.
There are at least two things to look for:
Is the indentation made of tab or space characters? print() writes a tab as a single tab character, and most terminals render it by advancing to the next tab stop, typically every 8 columns. You can either replace the tabs with spaces (for example with str.expandtabs()) to reduce the indentation, or use another output method that lets you configure how tabs are shown.
The strings themselves will include a newline character. But then print() also adds a line break. So either remove the newline character from each string or do print(para.get_text(), end='') to disable print adding another newline.
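One way to approximate the browser's whitespace collapsing is to split the paragraph text on any whitespace and join the pieces with single spaces. This is only a sketch; collapse_whitespace and the sample text are made up for illustration, and in the question's code the input would be para.get_text().

```python
def collapse_whitespace(text):
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and ignores leading/trailing whitespace.
    return " ".join(text.split())

raw_paragraph = "(a)\n        \t  First item of the\n   paragraph.  "
print(collapse_whitespace(raw_paragraph))
```

Because the result contains no newline characters, a plain print() then emits exactly one line per paragraph.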
You can use strip() on strings, like para.get_text().strip(). This will remove any whitespace before and after the string.
You can use either lstrip() or rstrip() to remove only the excess whitespace from the left or right side of the string.
s = " \t \n\n something \t \n "
print(s.strip()) # 'something'
print(s.lstrip()) # 'something \t \n '
print(s.rstrip()) # ' \t \n\n something'
Would something like this work:
Strip left and right of the p
Indent the paragraph with 1em (so 1 times the font size)
Newline each paragraph
font_size = 16 # get the font size
for para in desiredText.find_all('p'):
    print(font_size * " " + para.get_text().strip(' \t\n\r') + "\n")

How can I convert String (with linebreaks) to HTML?

When I print the string (in Python) coming from a website I scraped it from, it looks like this:
"His this
is
a sample
String"
It does not show the \n breaks. this is what I see in a Python interpreter.
And I want to convert it to HTML that will add in the line breaks. I was looking around and didn't see any libraries that do this out of the box.
I was thinking BeautifulSoup, but wasn't quite sure.
If you have a string that you have read from a file, you can just replace \n with <br>, which is a line break in HTML, by doing:
my_string.replace('\n', '<br>')
You can use Python's str.replace(...) method to replace all line breaks with the HTML version <br> and possibly surround the string in a paragraph tag <p>...</p>. Let's say the name of the variable with the text is text:
html = "<p>" + text.replace("\n", "<br>") + "</p>"
While searching for this answer I found the following, which is likely better because it also escapes HTML special characters, at least in Python 3:
Python – Convert HTML Characters To Strings
# import html
import html
# Create text
text = 'Γeeks for Γeeks'
# Convert HTML entities in the text back to plain characters
print(html.unescape(text))
# Convert &, <, > and quotes in the text to HTML entities
print(html.escape(text))
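The two ideas above can be combined: escape the text first, then turn line breaks into <br>. This is a minimal sketch, assuming the text uses \n as its line break; the function name text_to_html and the sample string are hypothetical.

```python
import html

def text_to_html(text):
    # Escape first, so that the <br> tags added afterwards are not escaped.
    escaped = html.escape(text)
    return "<p>" + escaped.replace("\n", "<br>") + "</p>"

sample = "His this\nis\na sample <String> & more"
print(text_to_html(sample))
```

Escaping before inserting the tags matters: doing it in the other order would turn each <br> into &lt;br&gt;.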
I believe this will work
# Strings are immutable, so build a new string instead of calling replace() in the loop
result = ""
for char in text:
    if char == "\n":
        result += "<br>"
    else:
        result += char

Removed new line character appears in write to file but not in print to screen

One very stupid thing happens when I modify a string that contained newline characters within it.
After modifying the string variable, I print it. It successfully shows that the new line character has been removed.
When I write the string variable to a file, it prints the new line character there.
I spent hours figuring this out!
import os
import csv
s = "I want this \n new line removed"
s = s.replace("\n", "")
print(s)
file = open('my_file.tsv', 'w')
file.write(s)
file.close()
The above is sample code, and it runs as written. The string in my real project is text dynamically fetched from a MySQL database, which is then modified. It contains one or more \n characters.
If I paste the text obtained from the database into the above code as a hardcoded string and run it, I get an error saying "EOL while scanning string literal".
Can you please help me clean this text into something consumable?
Removal of '\r\n' worked!! Thanks a lot @abarnert for the suggestion.
The text wasn't visible to me in code form. It was raw text fetched from the database. The raw text just looked like a paragraph with multiple newlines. Hence, I wasn't able to provide the real text.
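When a line break seems to survive replace("\n", ""), printing repr() of the string shows which characters are really there. This is a small sketch, assuming the database text uses Windows-style \r\n line endings; the sample value is made up.

```python
raw = "I want this \r\n new line removed"   # hypothetical database value

# Removing only "\n" leaves the carriage return behind.
partly_cleaned = raw.replace("\n", "")
print(repr(partly_cleaned))

# Removing "\r\n", "\r" and "\n" covers all three common line endings.
cleaned = raw.replace("\r\n", "").replace("\r", "").replace("\n", "")
print(repr(cleaned))
```

A leftover \r is easy to miss on screen, because many terminals only move the cursor back to the start of the line, while an editor opening the written file may display it as a line break.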

Transform textarea input to paragraphed HTML

I'd like to transform what the user inputs into a textarea on an HTML page into <p>-tagged output, where each new line starts a new <p>.
I'm trying with regular expressions but I can't get it to work. Will someone correct my expression?
String = "Hey, this is paragraph 1 \n and this is paragraph 2 \n and this will be paragraph 3"
Regex = r'(.+?)$'
It just results in Hey, this is paragraph 1 \n and this is paragraph 2 \n<p>and this will be paragraph 3</p>
I wouldn't use regular expressions for this, simply because you do not need it. Check this out:
text = "Hey, this is paragraph 1 \n and this is paragraph 2 \n and this will be paragraph 3"
html = ''
for line in text.split('\n'):
    html += '<p>' + line + '</p>'
print(html)
To make it one line, because shorter is better, and clearer:
html = ''.join('<p>'+L+'</p>' for L in text.split('\n'))
I would do it this way:
s = "Hey, this is paragraph 1 \n and this is paragraph 2 \n and this will be paragraph 3"
"".join("<p>{0}</p>".format(row) for row in s.split('\n'))
You basically split your string into a list of lines. Then wrap each line with paragraph tags. In the end just join your lines.
Above answers relying on identifying '\n' do not work reliably. You need to use .splitlines(). I don't have enough rep to comment on the chosen answer, and when I edited the wiki, someone just reverted it. So can someone with more rep please fix it.
Text from a textarea may use '\r\n' as the newline sequence.
>>> "1\r\n2".split('\n')
['1\r', '2']
'\r' alone is invalid inside a webpage, so using any of the above solutions produces ill-formed web pages.
Luckily Python provides a method to solve this. The answer that works reliably is:
html = ''.join('<p>'+L+'</p>' for L in text.splitlines())
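Since the text comes from user input, it may also be worth escaping it and skipping empty lines. This is a sketch under those assumptions, not part of the original answer; the function name text_to_paragraphs and the sample input are hypothetical.

```python
import html

def text_to_paragraphs(text):
    paragraphs = []
    # splitlines() handles \n, \r\n and \r (and a few other line separators).
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines instead of emitting empty <p></p>
            paragraphs.append("<p>" + html.escape(line) + "</p>")
    return "".join(paragraphs)

sample = "Hey, this is paragraph 1 \r\n and this is <b>2</b>\n\nthird"
print(text_to_paragraphs(sample))
```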
You need to get rid of the anchor, $. Your regex is trying to match one or more of any non-newline characters, followed by the end of the string. You could use MULTILINE mode to make the anchors match at line boundaries, like so:
s1 = re.sub(r'(?m)^.+$', r'<p>\g<0></p>', s0)
...but this works just as well:
s1 = re.sub(r'.+', r'<p>\g<0></p>', s0)
The reluctant quantifier ( .+? ) wasn't doing anything useful either, but it didn't mess up the output like the anchor did.
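The difference between the three patterns can be checked directly. This is a small sketch using a sample string s0 that is made up for illustration.

```python
import re

s0 = "line one\nline two\nline three"  # sample input, assumed for illustration

# Without MULTILINE, $ matches only at the end of the string (or just
# before a final newline), so only the last line is wrapped.
anchored = re.sub(r'(.+?)$', r'<p>\1</p>', s0)

# With (?m), ^ and $ match at every line boundary.
multiline = re.sub(r'(?m)^.+$', r'<p>\g<0></p>', s0)

# Dropping the anchors gives the same result, because . never matches \n.
unanchored = re.sub(r'.+', r'<p>\g<0></p>', s0)

print(anchored)
print(multiline)
print(unanchored)
```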
Pretty easy >>
html='<p>'+s.replace("\n",'</p><p>')+'</p>'
