I'm building my first website using Flask and HTML. Some of the data I want to migrate to this website is in Markdown format. I am trying to convert the Markdown into HTML using this library, but I cannot get my head around it:
https://github.com/Python-Markdown/markdown
I import it into my *.py file, but I'm not sure what the next steps are. This is what I have so far:
from markdown import markdown
html = markdown.markdown(text)
I'm not sure what should be put into the "text" variable. Also, my Markdown data resides in an HTML file; how do I reference that from here? I have read through the installation guide, but it's not very clear to me.
Thank you.
According to the docs located at https://python-markdown.github.io/reference/#using-markdown-as-a-python-library
text is supposed to contain your Markdown text. In the example below, taken from the docs, some_file.txt is the file containing your Markdown.
import codecs
import markdown
input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8")
text = input_file.read()
html = markdown.markdown(text)
To get your text, you would need to parse it out of the HTML. There are several ways of doing this, but we would need more information about the file to proceed. Is your HTML file stored locally? Where in the file is the Markdown? A minimal reproducible example (MRE) would be helpful.
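As a minimal sketch of the approach above, the snippet below reads a Markdown file and converts it to HTML. It assumes the Python-Markdown package is installed; the file name notes.md and the helper markdown_file_to_html are hypothetical names chosen for illustration. Note that after `import markdown` the call is `markdown.markdown(text)`, whereas after `from markdown import markdown` the function is called as `markdown(text)`.

```python
import os
import tempfile

import markdown  # Python-Markdown package (pip install markdown)


def markdown_file_to_html(path):
    """Read a UTF-8 Markdown file and return its content converted to HTML."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return markdown.markdown(text)


# Demo with a throwaway file; "notes.md" is a hypothetical name.
demo_path = os.path.join(tempfile.mkdtemp(), "notes.md")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("# Hello\n\nSome *emphasised* text.\n")

html = markdown_file_to_html(demo_path)
print(html)
```

In a Flask view, the resulting string could then be passed to a template; Jinja escapes HTML by default, so it would likely need the `safe` filter to render as markup.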
Related
I'm making an EPUB using the EbookLib library and I'm following its documentation. I am trying to set the content of a chapter to be the content of an HTML file. The only way I got it to work was by passing plain HTML when setting the content.
from ebooklib import epub
c1 = epub.EpubHtml(title='Chapter one', file_name='ch1.xhtml', lang='en')
c1.set_content(u'<html><body><h1>Introduction</h1><p>Introduction paragraph.</p></body></html>')
Is it possible to give an HTML file as the content of the chapter?
I've tried things like c1.set_content(file_name='ch1.xhtml'), but that didn't work; it only accepts plain HTML.
I figured it out! I'm opening the file, reading it into a variable and then passing that variable to the set_content function. Posting this in case it is of use to someone in the future.
file = open('ch1.xhtml', 'r')
lines = file.read()
c2.set_content(lines)
file.close()
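The same idea can be sketched with a with-block, which closes the file automatically. The helper name read_html, the UTF-8 encoding and the demo file contents are assumptions made for illustration; the EbookLib call is left as a comment so the snippet runs without the library.

```python
import os
import tempfile


def read_html(path):
    """Return the full text of an HTML/XHTML file as a string."""
    # The with-block closes the file even if reading raises an error.
    # UTF-8 is an assumption; use the encoding your file was saved with.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Demo file standing in for ch1.xhtml (contents are made up).
demo_path = os.path.join(tempfile.mkdtemp(), "ch1.xhtml")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("<html><body><h1>Introduction</h1></body></html>")

content = read_html(demo_path)
# With EbookLib, the string would then be passed on, e.g.:
# c1.set_content(content)
```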
I have been trying to use the code made available here to edit HTML files using Python:
https://www.geeksforgeeks.org/how-to-modify-html-using-beautifulsoup/
# Python program to modify HTML
# with the help of Beautiful Soup
# Import the libraries
from bs4 import BeautifulSoup as bs
import os
import re
# Remove the last segment of the path
base = os.path.dirname(os.path.abspath(__file__))
# Open the HTML in which you want to make changes
html = open(os.path.join(base, 'gfg.html'))
# Parse HTML file in Beautiful Soup
soup = bs(html, 'html.parser')
# Give location where text is
# stored which you wish to alter
old_text = soup.find("p", {"id": "para"})
# Replace the already stored text with
# the new text which you wish to assign
new_text = old_text.find(text=re.compile(
'Geeks For Geeks')).replace_with('Vinayak Rai')
# Alter HTML file to see the changes done
with open("gfg.html", "wb") as f_output:
    f_output.write(soup.prettify("utf-8"))
But nothing really happens. I tried changing the way the file is opened and changing the HTML file type, but neither made a difference.
I'm not very experienced when it comes to programming, so I don't know how well I will be able to answer any questions, but I will try my best to give any relevant information.
Thank you for your time.
The code works fine when both files sit next to each other in a single directory.
It works when "Geeks For Geeks" is directly inside a p tag with id "para":
<p id="para">Geeks For Geeks</p>
It also works when other tags are nested inside the enclosing p tag with id "para":
<p id="para"><strong>Geeks For Geeks</strong></p>
If you are using a code editor (such as Atom or Sublime), you should be able to see the changes. With plain text editors, the changes may not show up until you manually reopen the file (make sure you have not saved the file from the editor after running the Python script).
So my suggestion is:
Keep them both in the same directory.
Close the HTML file before running the Python script.
After the script has been executed through cmd/bash (or a built-in IDE console), reload the web page.
Feel free to reach out if the issue persists.
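One further thing worth checking, as an observation about the script in the question rather than something confirmed by the asker: it reads os.path.join(base, 'gfg.html') but writes to "gfg.html" relative to the current working directory, so when the script is started from another directory the modified copy may land somewhere else. The sketch below reads and writes the same path. It assumes beautifulsoup4 is installed and uses a temporary stand-in file instead of the real gfg.html.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Stand-in for gfg.html, written to a temporary directory for the demo.
html_path = os.path.join(tempfile.mkdtemp(), "gfg.html")
with open(html_path, "w", encoding="utf-8") as f:
    f.write('<html><body><p id="para">Geeks For Geeks</p></body></html>')

# Parse the file; the with-block closes it before we write again.
with open(html_path, encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Replace the text of the paragraph with id "para".
para = soup.find("p", {"id": "para"})
para.string = "Vinayak Rai"

# Write back to the SAME path that was read.
with open(html_path, "w", encoding="utf-8") as f:
    f.write(soup.prettify())

# Read the file once more to confirm the change reached the disk.
with open(html_path, encoding="utf-8") as f:
    result = f.read()
```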
Thanks.
import pdftotext
# Load your PDF
with open("docs/doc1.pdf", "rb") as f:
    docs = pdftotext.PDF(f)
print(docs[0])
This code prints a blank result for this specific file; if I change the file, it gives me a result.
I even tried Apache Tika. Tika also returns None. How can I solve this problem?
One thing I would like to mention is that the PDF is made up of multiple images.
Here is the file
This is a sample PDF, not the original one, but I want to extract text from a PDF like this.
Below is a simple example that writes an XML file and reads it back. The writing works OK, but I am not sure how to read the file back. Below is some sample code. How do I get these values from the XML file?
file1 = 'result1.xml'
fs = cv2.FileStorage(file1, cv2.FILE_STORAGE_WRITE)
fs.write('var1', 1)
fs.write('var2', 2)
fs = cv2.FileStorage(file1,cv2.FILE_STORAGE_READ)
fn = fs.real
Python ships with its own library for parsing XML data; the details vary between versions.
Here is where you can find the documentation: XML Library
You have to be careful when using it: as the warning on that page says, these modules are not secure against maliciously constructed XML data.
Here is another useful website: How to parse XML files using Python?
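A minimal sketch of that suggestion with the standard library's xml.etree.ElementTree follows. The XML text is an assumption about what result1.xml contains (an opencv_storage root with one element per variable); the real file may differ. For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring.

```python
import xml.etree.ElementTree as ET

# Assumed contents of result1.xml; check the real file before relying on this.
xml_text = """<opencv_storage>
<var1>1</var1>
<var2>2</var2>
</opencv_storage>"""

root = ET.fromstring(xml_text)

# Map each child element's tag to its integer value.
values = {child.tag: int(child.text) for child in root}
print(values)
```

Staying within OpenCV is also an option: calling fs.release() after writing and then reading with fs.getNode('var1').real() should return the stored number.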
Hello, I am trying to write a Python function that saves a list of URLs to a .txt file.
Example: visit http://forum.domain.com/ and save every URL containing viewtopic.php?t= to a .txt file
http://forum.domain.com/viewtopic.php?t=1333
http://forum.domain.com/viewtopic.php?t=2333
I use this function, but it does not save anything.
I am very new to Python; can someone help me create this?
web_obj = opener.open('http://forum.domain.com/')
data = web_obj.read()
fl_url_list = open('urllist.txt', 'r')
url_arr = fl_url_list.readlines()
fl_url_list.close()
This is far from trivial and can have quite a few corner cases (I suppose the page you're referring to is a web page).
To give you a few pointers, you need to:
download the web page: you're already doing it (in data)
extract the URLs: this is the hard part. Most probably you'll want to use an HTML parser, extract the <a> tags, fetch the href attribute of each and put them into a list. Then filter that list to keep only the URLs formatted the way you want (say, containing viewtopic). Let's say you got them into urlList
then open a file for writing text (thus wt, not r)
write the content: f.write('\n'.join(urlList))
close the file
I advise you to follow these steps and ask specific questions when you're stuck on a particular issue.
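The steps above can be sketched with the standard library only. The names LinkCollector and extract_topic_urls and the sample HTML are hypothetical; in real use, html_text would be the decoded page you downloaded (data in the question), and a temporary directory stands in for the location of urllist.txt.

```python
import os
import tempfile
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_topic_urls(html_text, base_url, marker="viewtopic.php?t="):
    """Return absolute URLs of all links whose href contains marker."""
    parser = LinkCollector()
    parser.feed(html_text)
    return [urljoin(base_url, link) for link in parser.links if marker in link]


# Made-up page content standing in for the downloaded forum page.
sample_html = """
<a href="viewtopic.php?t=1333">Topic one</a>
<a href="/viewtopic.php?t=2333">Topic two</a>
<a href="index.php">Home</a>
"""
url_list = extract_topic_urls(sample_html, "http://forum.domain.com/")

# Open for writing text ("wt"), write one URL per line; the with-block closes the file.
out_path = os.path.join(tempfile.mkdtemp(), "urllist.txt")
with open(out_path, "wt", encoding="utf-8") as f:
    f.write("\n".join(url_list))
```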