I am trying to create a corpus of data from a set of .html pages I have stored in a directory.
These HTML pages have lots of info I don't need.
This info is all stored before the line
<div class="channel">
How can I programmatically remove all of the text before
<div class="channel">
in every HTML file in a folder?
Bonus question for a 50-point bounty:
How do I programmatically remove everything AFTER, for example,
<div class="footer">
?
So if my index.html was previously:
<head>
<title>This is bad HTML</title>
</head>
<body>
<h1> Remove me</h1>
<div class="channel">
<h1> This is the good data, keep me</h1>
<p> Keep this text </p>
</div>
<div class="footer">
<h1> Remove me, I am pointless</h1>
</div>
</body>
After my script runs, I want it to be:
<div class="channel">
<h1> This is the good data, keep me</h1>
<p> Keep this text </p>
</div>
This is a bit heavy on memory usage, but it works. Basically, you list the directory, collect all the ".html" files, read each one into a variable, find the split point, keep the part before or after it, and then overwrite the file.
There are probably better ways to do this, but it works.
import os

# Collect every .html file in the current directory
files = []
for name in os.listdir("."):
    if name.endswith('.html'):
        files.append(name)

for fileName in files:
    with open(fileName) as file:
        content = file.read()
    loc = content.find('<div class="channel">')
    if loc == -1:
        continue  # marker not found: leave this file untouched
    newContent = content[loc:]
    with open(fileName, 'w') as file:
        file.write(newContent)
If you wanted to keep only the text up to that point instead:
newContent = content[:loc] # the slice end is exclusive, so no -1 is needed
Note that the strings you're searching for should be kept in variables rather than hardcoded.
Also, this won't work recursively for nested folder structures, but it is easy to modify it to do that.
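The two notes above (markers in variables, recursion into subfolders) could be sketched roughly as follows. This is only an illustration: the names trim, trim_tree, START_MARKER and END_MARKER are made up here, and the files are assumed to be UTF-8.

```python
from pathlib import Path

# Hypothetical marker names; change them to whatever delimits the wanted section.
START_MARKER = '<div class="channel">'   # keep from here ...
END_MARKER = '<div class="footer">'      # ... up to, but not including, this


def trim(content, start=START_MARKER, end=END_MARKER):
    """Return the part of content from start up to (not including) end."""
    begin = content.find(start)
    if begin == -1:
        return content            # start marker missing: leave the text untouched
    stop = content.find(end, begin)
    if stop == -1:
        stop = len(content)       # no end marker: keep everything after the start
    return content[begin:stop]


def trim_tree(root):
    """Rewrite every .html file under root (recursively) in place."""
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        trimmed = trim(original)
        if trimmed != original:
            path.write_text(trimmed, encoding="utf-8")
```

Calling trim_tree(".") would process the current directory and everything below it, so it is worth trying on a copy of the files first.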
To remove everything above and everything below
means the only thing left should be this section:
<div class="channel">
<h1> This is the good data, keep me</h1>
<p> Keep this text </p>
</div>
Rather than thinking about removing the unwanted parts, it would be easier to just extract the wanted part.
You can easily extract the channel div using an XML/HTML parser, for example one that exposes the DOM.
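As a rough sketch of the "extract the wanted part" idea, here is one way to do it with only the standard library's html.parser. The class DivExtractor and the function extract_div are names invented for this example, and comments inside the div are dropped; a full DOM library would be more robust on messy markup.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Collect the markup of the first <div> whose class list contains class_name."""

    def __init__(self, class_name):
        super().__init__(convert_charrefs=False)
        self.class_name = class_name
        self.depth = 0      # nesting depth of <div> tags inside the target div
        self.parts = []     # pieces of markup collected so far
        self.done = False   # set once the target div has been closed

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and self.class_name in classes:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim.
        if self.depth and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or self.done:
            return
        self.parts.append(f"</{tag}>")
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&#{name};")


def extract_div(html_text, class_name="channel"):
    """Return the markup of the first matching div, or '' if there is none."""
    parser = DivExtractor(class_name)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Unlike plain string splitting, this keeps track of nested div tags, so the closing tag of the wanted section is found even when other divs appear inside it.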
You've not mentioned a language in the question - the post is tagged with python so this answer might still be out of context, but I'll give a php solution that could likely easily be rewritten in another language.
$html='....'; // your page
$search='<div class="channel">';
$components = explode($search,$html); // [0 => before the string, 1 => after the string]
$result = $search.$components[1];
return $result;
To do the reverse is fairly easy too; simply take the value of $components[0] after altering $search to your <div class="footer"> value.
If you happen to have the $search string cropping up multiple times:
$html='....'; // your page
$search='<div class="channel">';
$components = explode($search,$html); // [0 => before the string, 1 => after the string]
unset($components[0]);
$result = $search.implode($search,$components);
return $result;
Someone who knows Python better than I do, feel free to rewrite this and take the answer!
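For what it's worth, a Python version of the same split-on-a-marker idea might look like the sketch below. It uses str.partition, which splits on the first occurrence only; keep_from and keep_before are hypothetical helper names, not part of any library.

```python
def keep_from(html, marker):
    """Drop everything before the first occurrence of marker."""
    before, found, after = html.partition(marker)
    # partition returns an empty separator when the marker is absent
    return found + after if found else html


def keep_before(html, marker):
    """Drop the marker and everything after its first occurrence."""
    before, found, after = html.partition(marker)
    return before


page = 'junk<div class="channel">good</div><div class="footer">bad</div>'
channel_on = keep_from(page, '<div class="channel">')
result = keep_before(channel_on, '<div class="footer">')
```

Because partition stops at the first match, a marker that crops up several times does not need the unset/implode step from the PHP version.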
I have a Python variable whose value is a string of text and would like to edit that value via Javascript.
I have no idea how to go about doing this.
Attempts:
function changeValue(val) {
val = 'new text';
}
<textarea placeholder="some text">{{ changeValue({{ result }}) }}</textarea>
<textarea placeholder="some text">
{{ result }}
</textarea>
What I want: I have some text (result) being added and would like to check whether it is empty. If so, I want to show the placeholder text.
The issue: although I can check whether the value is empty, when I try to print the result it reads None.
Thanks to all!
You do not need to call the JavaScript function from the HTML file. There are several approaches you can take:
1. Store the variable in HTML metadata:
<meta id="result_var" data-result="{{ result }}">
And then get the data in JavaScript (a data-* attribute is read through dataset, not value):
let result = document.getElementById("result_var").dataset.result;
2. Keep the variable in the tag where it's supposed to be and get it from there in JavaScript:
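<textarea placeholder="some text" id="result-var">{{ result }}</textarea>
And then get it in JavaScript (avoid spaces around {{ result }}, otherwise the textarea is never empty and the placeholder will not show):
let result = document.getElementById("result-var").value;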
3. Query it from your API: You can create a route in your Flask app that returns JSON data with the variable you need and then get that data to your JavaScript file by sending a request to your API.
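4. Jinja format: I've seen solutions that use the variable directly in JavaScript as a Jinja expression, like this: let result = JSON.parse('{{ result | tojson }}');. I haven't been able to get this working properly. Note that it can only work in a script that Jinja itself renders (inline in the template), not in a separate static .js file, and that the unquoted form let result = {{ result | tojson }}; avoids the extra layer of quoting.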
I hope this helps!
I am attempting to create a file in a simple website, read its contents into a variable inside a Django view function, and pass the variable to the template to be displayed on the web page.
However, when I print the variable, it appears on cmd exactly as it is in the original text file, but the output on the web page has no formatting and appears as a single string.
I've been stuck on it for two days.
I'm also relatively new to Django and teaching it to myself.
file1 = open(r'status.txt','w',encoding='UTF-8')
file1.seek(0)
for i in range(0,len(data.split())):
    file1.write(data.split()[i] + " ")
    if i%5==0 and i!=0 and i!=5:
        file1.write("\n")
file1.close()
file1 = open(r'status.txt',"r+",encoding='UTF-8')
d = file1.read()
print(d) # prints on cmd in the same formatting as in the text file
return render(request,'status.html',{'dat':d}) # the HTML displays it only as a single text string
<body>
{% block content %}
{{dat}}
{% endblock %}
</body>
Use the linebreaks filter in your template. It will render \n as <br/>.
Use it like this:
{{ dat | linebreaks }}
From the docs:
Replaces line breaks in plain text with appropriate HTML; a single
newline becomes an HTML line break (<br>) and a new line followed by a
blank line becomes a paragraph break (</p>).
You can use linebreaksbr if you don't want the <p> tags.
That's because a line break in HTML is <br>, while in Python it is \n. You should convert it before rendering:
mytext = "<br />".join(mytext.split("\n"))
Depending on your needs and the file format you want to print, you may also want to check the <pre> HTML tag.
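If you do the conversion by hand in the view, it is safer to escape the text first so that any < or & in the file is not interpreted as markup. A minimal sketch, where text_to_html is a name made up for this example:

```python
import html


def text_to_html(text):
    """Escape text for HTML and turn each line break into a <br> tag."""
    return "<br>\n".join(html.escape(line) for line in text.splitlines())


converted = text_to_html("a < b\nsecond line")
```

Keep in mind that Django templates auto-escape variables, so a string built this way would have to be marked as safe (for example with the safe filter) before the <br> tags are rendered as tags; the linebreaks and linebreaksbr filters handle both steps for you.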
So I first create an array of all folders in a specific directory, I then pass that to my html file.
def test_yt_vid():
    mylist = os.listdir(WD+r"static/"+YOUTUBE_FOLDER)
    full_path = (WD+YOUTUBE_FOLDER)
    return dict(mylist=mylist, full_path=full_path)
Next I look through that array to find what file has been selected.
<select name=list id="videolist" method="GET" action="/">
{% for mylist in mylist %}
<option value= "{{mylist}}" SELECTED>{{mylist}}</option>"
{% endfor %}
</select>
Next I use JS to get the specific value into a variable
$('#videolist').change(function () {
    //console.log($("#videolist").val());
    var fileInput = $("#videolist").val();
});
So The problem is here, I'm not sure how I would go about passing that value into the following jinja code
<video id="videotesting1" class="video" width="300" height="240" autoplay loop controls="true">
<source src="{{url_for('static',filename='videoTest/' + 'testVid.mp4')}}" type="video/mp4">
</video >
I'm trying to replace 'testVid.mp4' with the variable fileInput from the JS, I tried using $("#videotesting1").attr("src","{{url_for('static',filename='videoTest/'" + fileInput +")}}");'
But no luck so far.
This is different to "How do you change video src using jQuery?" because I am trying to pass a jinja variable to HTML using js.
Some of your quotes are closed in the wrong place. Look at filename, where you concatenate 'videoTest/' with a variable (call it x): the result is 'videoTest/'x. Notice the problem? The single quote that closes after videoTest/ should come after the variable fileInput instead. The quoting would then be:
$("#videotesting1").attr("src","{{url_for('static',filename='videoTest/" + fileInput + "')}}");
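If inspecting the element shows that src has changed but the desired video still isn't playing, try reloading the media element. load() here is the DOM method of the <video> element, hence the [0]:
$("#videotesting1")[0].load();
Take a look at what load() does on a media element: it resets the element and reloads its source.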
Figured out the problem: the file name has to go outside the Jinja expression. Jinja renders the template on the server before the page is sent, so it never sees the value of the JavaScript variable when the change event fires.
$("#videotesting1").attr("src","{{url_for('static',filename='videoTest/')}}" + fileInput);
When I type the code below, it gives me a blank HTML page. Even though I put a <h1> and a <a href> tag. Only the <title> tag is executed. Does anyone know why and how to fix it?
Code:
my_variable = '''
<html>
<head>
<title>My HTML File</title>
</head>
<body>
<h1>Hello world!</h1>
Click me
</body>
</html>'''
my_html_file = open(r"\Users\hp\Desktop\Code\Python testing\CH\my_html_file.html", "w")
my_html_file.write(my_variable)
Thanks in advance!
As @Bill Bell said, it's probably because you haven't closed your file (so it hasn't flushed its buffer).
So, in your case:
my_html_file = open(r"\Users\hp\Desktop\Code\Python testing\CH\my_html_file.html", "w")
my_html_file.write(my_variable)
my_html_file.close()
But this is not the best way to do it. If an error occurs on the second line, for example, the file will never get closed. You can use the with statement to make sure that it always is (just as @Rawing said):
with open('my-file.txt', 'w') as my_file:
    my_file.write('hello world!')
This is equivalent to:
my_file = open('my-file.txt', 'w')
try:
    my_file.write('hello world!')
finally:
    # this part is always executed, whatever happens in the try block
    # (a return, an exception)
    my_file.close()
How do I insert a variable into an HTML email I'm sending with python? The variable I'm trying to send is code. Below is what I have so far.
text = "We Says Thanks!"
html = """\
<html>
<head></head>
<body>
<p>Thank you for being a loyal customer.<br>
Here is your unique code to unlock exclusive content:<br>
<br><br><h1><% print code %></h1><br>
<img src="http://example.com/footer.jpg">
</p>
</body>
</html>
"""
Use "formatstring".format:
code = "We Says Thanks!"
html = """\
<html>
<head></head>
<body>
<p>Thank you for being a loyal customer.<br>
Here is your unique code to unlock exclusive content:<br>
<br><br><h1>{code}</h1><br>
<img src="http://example.com/footer.jpg">
</p>
</body>
</html>
""".format(code=code)
If you find yourself substituting a large number of variables, you can use
.format(**locals())
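One caveat when formatting values into HTML: the value is inserted verbatim, so characters such as < or & in it would be read as markup. A small sketch of escaping the value first, with a shortened template; TEMPLATE and render are names invented for this example:

```python
import html

# Shortened stand-in for the email body above.
TEMPLATE = """\
<html>
  <body>
    <h1>{code}</h1>
  </body>
</html>
"""


def render(code):
    """Insert code into the template, escaping HTML special characters."""
    return TEMPLATE.format(code=html.escape(str(code)))


body = render("A&B <1>")
```

Also note that str.format treats every { and } in the template as a placeholder delimiter, so literal braces (inline CSS, for instance) have to be doubled as {{ and }}.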
Another way is to use Templates:
>>> from string import Template
>>> html = '''\
<html>
<head></head>
<body>
<p>Thank you for being a loyal customer.<br>
Here is your unique code to unlock exclusive content:<br>
<br><br><h1>$code</h1><br>
<img src="http://example.com/footer.jpg">
</p>
</body>
</html>
'''
>>> s = Template(html).safe_substitute(code="We Says Thanks!")
>>> print(s)
<html>
<head></head>
<body>
<p>Thank you for being a loyal customer.<br>
Here is your unique code to unlock exclusive content:<br>
<br><br><h1>We Says Thanks!</h1><br>
<img src="http://example.com/footer.jpg">
</p>
</body>
</html>
Note that I used safe_substitute rather than substitute: if the template contains a placeholder that is missing from the values provided, substitute raises KeyError, whereas safe_substitute leaves the placeholder in place. str.format has the same problem and also raises KeyError for a missing name.
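The difference between the two methods can be seen in a short sketch; the template text and the placeholder name missing are made up for the illustration:

```python
from string import Template

template = Template("<h1>$code</h1><p>$missing</p>")

# safe_substitute leaves placeholders without a value untouched
partial = template.safe_substitute(code="ABC123")

# substitute raises KeyError for a placeholder without a value
missing_name = None
try:
    template.substitute(code="ABC123")
except KeyError as exc:
    missing_name = exc.args[0]
```

Which one is preferable depends on whether a silently unfilled $placeholder in the sent email is worse than an exception before sending.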
Use Python's string formatting:
http://docs.python.org/2/library/stdtypes.html#string-formatting
Generally the % operator is used to put a variable into a string: %i for integers, %s for strings and %f for floats.
NB: there is another formatting style (.format), also described at the link above, that lets you pass in a dict or list a little more elegantly than what I show below. It may be the better choice in the long run, since the % operator gets confusing if you have 100 variables to put into a string, though using dicts (my last example) largely avoids that problem.
code_str = "super duper heading"
html = "<h1>%s</h1>" % code_str
# <h1>super duper heading</h1>
code_nr = 42
html = "<h1>%i</h1>" % code_nr
# <h1>42</h1>
html = "<h1>%s %i</h1>" % (code_str, code_nr)
# <h1>super duper heading 42</h1>
html = "<h1>%(my_str)s %(my_nr)d</h1>" % {"my_str": code_str, "my_nr": code_nr}
# <h1>super duper heading 42</h1>
This is very basic and only works with primitive types. If you want to store dicts, lists and possibly objects, I suggest you convert them to JSON; http://docs.python.org/2/library/json.html and https://stackoverflow.com/questions/4759634/python-json-tutorial are good sources of inspiration.
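As a rough illustration of the JSON suggestion, a dict can be serialised with json.dumps and then placed into the HTML with the same % operator. The settings dict and the <pre> wrapper are assumptions made for this example only:

```python
import html
import json

# Hypothetical structured value to embed in the page.
settings = {"name": "super duper heading", "tags": ["a", "b"], "count": 42}

payload = json.dumps(settings)                     # dict -> JSON text
snippet = "<pre>%s</pre>" % html.escape(payload)   # escape quotes etc. for HTML
```

Escaping the JSON text keeps its quotes and any angle brackets from being read as markup when the page is displayed.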
Hope this helps