I have solved a lot of my issues with Stack Overflow.
Today, I decided to ask my own question.
I'm new to Python and am looking at how to scrape data from the web.
I came across a website where products may have different variants in terms of size or colour.
I can't figure out how to 'follow' the link to reach the page of the variant. I can see that there is a call to a function but I don't know how to have access to this function/link.
See below urls as examples:
variant of colour and size
variant of colour:
Here is the code I use to get all the available variants, but it's not working as I want, and I still don't know how to get the links:
# Define the part of the page I'm interested in:
article = soup.find('header', class_='pdp-header')
for variant in article.find_all('p', class_='pdp-size-variants__title'):
    print(variant)
    if "Colour" in variant:
        for colours in article.find('div', class_='swatches__item'):
            print(colours)
    if "Size" in variant:
        for sizes in article.find('button', class_='btn-supportive'):
            print(sizes)
The result I get is:
<p class="pdp-size-variants__title small--xs mb-1"><strong>Colour</strong></p>
<p class="pdp-size-variants__title small--xs mb-1"><strong>Size</strong></p>
If you can point me in the right direction, that would be great.
Thanks a lot.
David
I can see that there is a call to a function but I don't know how to have access to this function/link.
The first step is to look at that function. Start by looking at the swatch:
<div class="swatches__item"><span title="Black" onclick="getProductVariantFunc(12145,267721)" class="swatch active" style="background-image:url('https://ccshop.sirv.com/ccs/images/swatches/Black.png');background-size:100%"> </span></div>
Here we see there is an onclick handler named getProductVariantFunc. Now you should open up the page source and find the code for that function:
<script>function getProductVariantFunc(n, t) {
    $("#product-variants").addClass("disabled-div");
    t !== 0 ? $.ajax({
        cache: !1,
        url: "/VariantAttributes/GetProductHtmlByAttributes",
        type: "POST",
        data: {attributeId: t, productId: n},
        success: function (n) {
            $("#product-info").html(n);
            $("#product-variants").removeClass("disabled-div");
            initProductDetails();
            initPdp();
            renderRecommendedProductList()
        }
    }) : $.ajax({
        cache: !1,
        url: "/VariantAttributes/GetProductHtml",
        type: "POST",
        data: {productId: n},
        success: function (n) {
            $("#product-info").html(n);
            $("#product-variants").removeClass("disabled-div");
            initProductDetails();
            initPdp();
            renderRecommendedProductList()
        }
    })
}</script>
We can see here that the page makes an ajax() call to the URL "/VariantAttributes/GetProductHtmlByAttributes". So all you have to do is request that same URL to see if it has the data you need.
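As a rough sketch of what "request that same URL" could look like from Python, the snippet below pulls the two arguments out of the onclick attribute and builds the same POST the page would make. It uses only the standard library. BASE_URL is a placeholder, since the site is not named here, and variant_request is a hypothetical helper name. The request is built but not sent.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder host, the real site is not named in the question.
BASE_URL = "https://www.example.com"

# Matches the call seen in the swatch markup: getProductVariantFunc(productId, attributeId)
ONCLICK_RE = re.compile(r"getProductVariantFunc\((\d+)\s*,\s*(\d+)\)")


def variant_request(onclick, base_url=BASE_URL):
    """Build (without sending) the POST request that the onclick handler triggers."""
    match = ONCLICK_RE.search(onclick)
    if match is None:
        raise ValueError("no getProductVariantFunc call found")
    product_id, attribute_id = (int(group) for group in match.groups())
    # Mirror the branch in the JavaScript: attribute id 0 means "no attribute".
    if attribute_id != 0:
        path = "/VariantAttributes/GetProductHtmlByAttributes"
        fields = {"attributeId": attribute_id, "productId": product_id}
    else:
        path = "/VariantAttributes/GetProductHtml"
        fields = {"productId": product_id}
    body = urlencode(fields).encode("ascii")
    return Request(base_url + path, data=body, method="POST")


req = variant_request("getProductVariantFunc(12145,267721)")
print(req.get_method(), req.full_url)
print(req.data)
```

Passing req to urllib.request.urlopen should then return the HTML fragment that the page inserts into #product-info, which can be parsed with BeautifulSoup like any other page.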
I have been working on an accounting project using Django 2.0.6 and Python 3.6.
I want to implement shortcut keys in my project using jQuery.
I have tried a library called django-keyboard-shortcuts, but it does not support Python 3.
So I want to do it using jQuery or any other option (if there is one).
For example:
If I press Ctrl + R or any other key combination, it should redirect me to the URL assigned to that combination.
Update
I have tried the following:
$(document).ready(function() {
    $(document).keypress(function(event) {
        if (event.which === 99) { window.location = '{% url 'accounting_double_entry:groupcreate' pk=company_details.pk pk3=selectdatefield_details.pk %}'; }
    });
});
But I have one problem: when I try to use a combination of keys like Ctrl+Q, it does not work.
Any idea how to do it?
Thank you
I want to show my sound sensor readings on a Django site (original code as posted in the link). Following situation 2 of the accepted answer, I want to write a JavaScript function that repeatedly calls the ajax_data function from the views.
But it seems that no repeated calls are being made, and the reading is not updated either.
My Django template so far:
<!doctype html>
<html>
<head>
<title>Noise measurement site</title>
<script src="http://code.jquery.com/jquery-3.0.0.js"
integrity="sha256-jrPLZ+8vDxt2FnE1zvZXCkCcebI/C8Dt5xyaQBjxQIo="
crossorigin="anonymous"></script>
<script language="javascript" type="text/javascript">
function updateValue() {
$.ajax({
url:"D:/Python programs/django_projects/Noise_Measurement/noise_m/views.py/ajax_data/",
//success: updateValue(), for experimenting with a recursive call...
});
}
$(document).ready(function(){
var v = setInterval(updateValue,2000);
});
</script>
</head>
<body>
<p>Hello there</p>
<p>Present noise level : {{noise_level}} dB</p>
</body>
</html>
(The rest of the code is in my previously asked question. I have read some of the answers on the platform, but without much success.)
Update
Sorry, I made a mistake. I had made slight changes to the code but posted the output from before those changes. I have now made exactly the changes shown above, but the output is still not fixed. (Thanks to the comment by CumminUp07.)
** Own answer on own question **
Oh sorry, it was actually a misunderstanding of the syntax on my side.
I first had to create a method in views.py that sends the reading from the module that takes it. Then I had to assign a URL to that method using path(), like this:
path('read', views.data_update1, name='readings'),
Then the AJAX request had to be made to the read URL:
$.ajax({
url: "read",
dataType: "json",
contentType: "application/json",
success: function(r) { ... }
});
This request is then made repeatedly using the setInterval method.
Finally, the {{ }} template tag didn't help, because it is rendered only once on the server. So the div where the value is displayed was given an id, and its contents were updated on each call.
I created a JSBin that demonstrates the problem: http://jsbin.com/kukehoj/1/edit?html,js,console,output
I'm creating my first REST-powered website. The backend is in Python (Django REST Framework), and seems to be working fine. I'm trying to make the front-end get the comments for the posts, but it's not working.
HTML Imports
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.1/knockout-min.js"></script>
scripts.js
function Comment(data) {
    this.body = ko.observable(data.responseText)
}
function Post(data) {
    this.title = ko.observable(data.title)
    this.body = ko.observable(data.body)
    var self = this;
    self.comments = ko.observableArray([])
    self.comments(($.map(data.comments, function(link) { // Map the data from
        return $.getJSON(link, function(data) { return new Comment(data)}) //These requests
    })))
}
function PostViewModel() {
    // Data
    var self = this;
    self.posts = ko.observableArray([])
    // Get the posts and map them to a mappedData array.
    $.getJSON("/router/post/?format=json", function(allData) {
        var mappedData = $.map(allData, function(data) { return new Post(data)})
        self.posts(mappedData)
    })
}
ko.applyBindings(new PostViewModel());
Server data:
[{ "title":"-->Title here<--",
"body":"-->Body here<--",
"comments":[
"http://127.0.0.1:8000/router/comment/6/?format=json",
"http://127.0.0.1:8000/router/comment/7/?format=json",
"http://127.0.0.1:8000/router/comment/8/?format=json",
"http://127.0.0.1:8000/router/comment/9/?format=json"]
}]
where each of the links leads to:
{"body":"-->Body here<--"}
index.html
<div class="col-lg-7" data-bind="foreach: { data: posts, as: 'posts' }">
<h3 data-bind="text: title"></h3>
<p data-bind="text: body"> </p>
<span data-bind="foreach: { data: comments(), as: 'comments' }">
<p data-bind="text: comments.body"></p>
</span>
</div>
(There is a lot more HTML, but I removed the irrelevant parts)
Everything is working fine, except that the comments seem to be in the wrong format.
The chrome console shows JSON "responseText" bound to each of the comment object values.
I'm sorry if this is a stupid question, but I have tried everything - but it doesn't work. (I'm a noob)
There is nothing wrong with the sample code you provided, except that you have this.body = ko.observable(data.responseText) while the commentData object in your sample does not contain a responseText property. If you replace the commentData object with var commentData = {"responseText":"-->Body here<--"}, it works.
Note: this part
<span data-bind="foreach: { data: comments(), as: 'comments' }">
<p data-bind="text: comments.body"></p> // comments.body => body
</span>
in your question is wrong, but you have it correct in your sample code. It should be
<span data-bind="foreach: { data: comments(), as: 'comments' }">
<p data-bind="text: body"></p>
</span>
Here is a working version of your sample: https://jsfiddle.net/rnhkv840/26/
I assume you are using Django Rest Framework, so the JSON structure you get for your posts is done automatically by your serializer based on your model fields.
Back to the frontend, I have not used knockout js before, but what you require is to load the comments using another controller. Either you do it one by one using the links provided by your main resource (this can result in lots of queries sometimes), or you create a filter on your comments endpoint which will allow you to retrieve comments for a specific post.
Ever considered using the Django REST framework? It can help you serialize all your models with a simple viewset. Check out the docs.
So I found the actual problem. The way the JavaScript read the data from the server meant that, since there was only one value for each comment, the data property of a comment was itself the variable storing the body of the comment, not data.body.
I am from a non-coding background, so Python and web2py are very new to me.
My app needs to export textarea content (using the Redactor RTE) to PDF. I get HTML content from the textarea (Redactor). Can you please advise me on how to use pyfpdf to generate a PDF file on button click?
I don't know how to get the HTML content (images and text) on button click in the view to generate a PDF using appreport.
I was able to use app-report to generate a PDF (using PISA; PYPDF does not work) from an existing HTML file without CSS. If the HTML file has CSS, it throws an error:
***<class 'sx.w3c.cssParser.CSSParseError'> Terminal function expression expected closing ')':: (u'Alpha(Opacity', u'=0); }\n\n\n\n.ui-state-')***
This might be due to a mistake in the controller code:
def myreport():
    html = response.render('myreport.html', dict())
    return plugin_appreport.REPORTPISA(html = html)
Another thing I tried was passing the html from my view to the controller using ajax post (in Javascript). Redactor is the textarea RTE I am using and alert gives me the desired html result.
View:
function getContent() {
    var t= jQuery('#redactor_content').getCode();
    alert(t);
    jQuery.ajax({
        type: "POST",
        url: "http://127.0.0.1:8000/Test50/default/myreport2",
        data: "{g : 'jQuery('#redactor_content').getCode()'}"
    });
}
Controller:
def myreport2():
    g = request.get_vars
    html = response.render(g)
    return plugin_appreport.REPORTPISA(html = html)
Due to my limited coding knowledge, I am not able to find and correct my mistake. I would be thankful if anybody could help me with this problem.
Regards,
Akash
Could it be this post request:
jQuery.ajax({
    type: "POST",
    url: "http://127.0.0.1:8000/Test50/default/myreport2",
    data: "{g : 'jQuery('#redactor_content').getCode()'}"
});
I think you should have the 'data' parameter be a literal dictionary, not a string. Change this line like this (remove all but one set of quotes):
data: {g : jQuery('#redactor_content').getCode() }
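This should send the request properly. According to the jQuery documentation, an object passed as data is converted into key-value pairs, whereas a string is sent unchanged, so the original call posts the quoted text itself rather than the editor content.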
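To make the difference concrete, here is a small Python sketch (standard library only) of the two request bodies. editor_html is a made-up stand-in for whatever getCode() returns, and parse_qs only approximates how a server decodes a form body, so treat the result as illustrative rather than as exactly what web2py does.

```python
from urllib.parse import parse_qs, urlencode

# Assumption: stand-in for the HTML that Redactor's getCode() would return.
editor_html = "<p>Hello &amp; welcome</p>"

# Body posted by the original call: the quoted text itself, getCode() is never run.
literal_body = "{g : 'jQuery('#redactor_content').getCode()'}"

# Body produced from a key-value mapping: one form-encoded field named g.
encoded_body = urlencode({"g": editor_html})

# Decode both bodies the way a form parser would.
print(parse_qs(literal_body))   # no g field can be recovered from the literal string
print(parse_qs(encoded_body))   # g comes back with the original HTML
```

With the key-value form, the field g arrives intact on the server side. With the literal string there is no name=value pair to decode at all.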
There is a music website I regularly read, and it has a section where users post their own fictional music-related stories. There is a 91-part series (written over a length of time, uploaded part by part) that always follows the convention of:
http://www.ultimate-guitar.com/columns/fiction/riot_band_blues_part_#.html.
I would like to be able to get just the formatted text from every part and put it into one html file.
Conveniently, there is a link to a print version, correctly formatted for my purposes. All I would have to do is write a script to download all of the parts and then dump them into a file. Not hard.
Unfortunately, the url for a print version is as follows:
www.ultimate-guitar.com/print.php?what=article&id=95932
The only way to know what article corresponds to what ID field is to look at the value attribute of a certain input tag in the original article.
What I want to do is this:
Go to each page, incrementing through the varying numbers.
Find the <input> tag with attribute 'name="rowid"' and get the number in its 'value=' attribute.
Go to www.ultimate-guitar.com/print.php?what=article&id=<value>.
Append everything (minus <html>, <head> and <body>) to an HTML file.
Rinse and repeat.
Is this possible? And is Python the right language? Also, what DOM/HTML/XML library should I use?
Thanks for any help.
With lxml and urllib2:
import lxml.html
import urllib2

# Implement the logic to download each page, with HTML strings in a sequence named pages
url = "http://www.ultimate-guitar.com/print.php?what=article&id=%s"
for page in pages:
    html = lxml.html.fromstring(page)
    # Attribute tests in the path expression use @, as in @name
    ID = html.find(".//input[@name='rowid']").value
    article = urllib2.urlopen(url % ID).read()
    article_html = lxml.html.fromstring(article)
    with open(ID + ".html", "w") as html_file:
        html_file.write(article_html.find(".//body").text_content())
edit: Upon running this, it seems there may be some Unicode characters in the page. One way to get around this is to do article = article.encode("ascii", "ignore") or to put the encode method after .read(), to force ASCII and ignore Unicode, though this is a lazy fix.
This is assuming you just want the text content of everything inside the body tag. This will save files with the format of storyID.html (so "95932.html") in the local directory of the Python file. Change the save semantics if you like.
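If lxml is not available, the ID lookup step can also be sketched in Python 3 with only the standard library. RowIdParser and print_url_for are hypothetical names, and sample_page is invented markup that stands in for a downloaded article page, so the real page may need adjustments.

```python
from html.parser import HTMLParser

PRINT_URL = "http://www.ultimate-guitar.com/print.php?what=article&id=%s"


class RowIdParser(HTMLParser):
    """Remember the value attribute of the first <input name="rowid"> tag."""

    def __init__(self):
        super().__init__()
        self.rowid = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "rowid" and self.rowid is None:
            self.rowid = attributes.get("value")


def print_url_for(page_html):
    """Return the print-version URL for one downloaded article page."""
    parser = RowIdParser()
    parser.feed(page_html)
    if parser.rowid is None:
        raise ValueError("no <input name='rowid'> found")
    return PRINT_URL % parser.rowid


# Assumption: invented markup standing in for a real article page.
sample_page = (
    '<html><body><form>'
    '<input type="hidden" name="rowid" value="95932">'
    '</form></body></html>'
)
print(print_url_for(sample_page))
```

Each article page would be downloaded first (for example with urllib.request.urlopen) and its decoded text passed to print_url_for. The resulting URL is then fetched the same way.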
You could actually do this in javascript/jquery without too much trouble. javascripty-pseudocode, appending to an empty document:
for (var pageNum = 1; pageNum <= 91; pageNum++) {
    $.ajax({
        url: url + pageNum,
        async: false,
        success: function(page) {
            // Look for the rowid input in the fetched page, not in the current document
            var printId = $(page).find('input[name="rowid"]').val();
            $.ajax({
                url: printUrl + printId,
                async: false,
                success: function(data) {
                    $('body').append($(data).find('body').contents());
                }
            });
        }
    });
}
After the loading completes you could save the resultant HTML to a file.