How can I make my navigation links work from any page?

I have a Django template containing this menu:
<ul class="menu">
<li class="nav-current" role="presentation"><a href="index">index</a></li>
<li role="presentation"><a href="cpuinfo">cpu info</a></li>
<li role="presentation"><a href="about">about</a></li>
</ul>
When I click "cpu info" from the home page, my browser goes to /cpuinfo. This works.
But when I am on other pages like /post/, that link takes me to /post/cpuinfo, which isn't correct.
How can I make my link work from any page?

You need the {% url %} tag in the template, for example:
<li role="presentation"><a href="{% url 'cpuinfo' %}">cpu info</a></li>
<!-- Change 'cpuinfo' ^^^ to the real URL name -->
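For {% url %} to resolve, the URL pattern needs a matching name in urls.py. A minimal sketch, assuming views named index, cpuinfo, and about (the names here are illustrative):
# urls.py
from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name='index'),
    path('cpuinfo', views.cpuinfo, name='cpuinfo'),
    path('about', views.about, name='about'),
]
{% url 'cpuinfo' %} then reverses to the absolute path /cpuinfo, so the link works from any page.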

Related

Web Scraping with Selenium - Detecting a dropdown menu

Selenium version 3.141.0
I'm writing a web scraping script that should select a certain option from a dropdown menu with Selenium WebDriver. The problem is, I cannot seem to detect this dropdown menu element. I tried detecting it by class and by CSS selector, but it's still undetectable.
The dropdown menu is a status menu; it contains:
Draft
Submitted
Reviewed
Released
Rejected
Obsolete
This is the HTML code of the part of the page where the dropdown menu is:
<div class="controls col-md-5 angular2-multiselect" id="status-field">
<ctf-angular2-multiselect class="defaultSettings ng-valid ng-touched ng-dirty">
<div class="cuppa-dropdown" qa-name="dropdown-multiselect">
<div class="selected-list" tabindex="0">
<div class="c-btn" qa-name="toggle-dropdown-statusField">
<!----><!----><!---->
<span>
<!----><span qa-name="item-0">Draft</span>
</span>
<!----><!----><!---->
<div class="dropdown-caret"></div>
</div>
</div>
<div class="dropdown-container" qa-name="dropdown" hidden="">
<div class="dropdown-list">
<div class="list-area" qa-name="list-area">
<!----><!----><!----><!----><!---->
<ul class="lazyContainer">
<!----><!---->
<span>
<!---->
<li class="pure-checkbox single-select-label-selected">
<!----><label qa-name="item-0" title="Draft" class="single-select-label">Draft</label>
</li>
<li class="pure-checkbox">
<!----><label qa-name="item-1" title="Submitted" class="single-select-label">Submitted</label>
</li>
<li class="pure-checkbox">
<!----><label qa-name="item-2" title="Reviewed" class="single-select-label">Reviewed</label>
</li>
<li class="pure-checkbox">
<!----><label qa-name="item-3" title="Released" class="single-select-label">Released</label>
</li>
<li class="pure-checkbox">
<!----><label qa-name="item-4" title="Rejected" class="single-select-label">Rejected</label>
</li>
<li class="pure-checkbox">
<!----><label qa-name="item-5" title="Obsolete" class="single-select-label">Obsolete</label>
</li>
</span>
<!---->
</ul>
<!----><!----><!----><!---->
</div>
</div>
</div>
</div>
</ctf-angular2-multiselect>
</div>
Apparently I'm not that good with HTML, so I was depending on IDs to detect elements in the previous scripts I wrote. This code doesn't have any.
This is what the GUI looks like:
[Picture of GUI]
I tried using classes to detect the dropdown menu like this:
Select(driver.find_element(By.CSS_SELECTOR, 'ctf-angular2-multiselect')).select_by_value("Released")
But it doesn't work.
Trying to detect it by ID like this:
Select(driver.find_element_by_id('status-field')).select_by_value("Released")
doesn't work either.
You have at least two challenges to overcome:
This isn't a normal HTML <select>. You need to click on things through Selenium.
The 'dropdown-container' is hidden. You need to click on whatever opens it first.
Should we assume the div with the 'c-btn' class opens the dropdown? I'd normally check in Chrome dev tools whether that element has event listeners, or use Chrome to save the element as a variable and run a JS click on it, to verify it's the right element to click before adding it in Selenium.
Once .dropdown-container is no longer hidden, I think your life would be easier using XPath to select the list elements. To select the 'Released' option, the XPath would just be //label[@title='Released']
You can also use Chrome dev tools to verify XPath selections before adding them to your Selenium script.
With Python Selenium, you can then click on your XPath selection like this:
driver.find_element(By.XPATH, "//label[@title='Released']").click()
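Putting the two steps together, here's a minimal sketch, assuming the 'c-btn' div inside #status-field really is the toggle (verify that in dev tools first; the 10-second wait is arbitrary):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Open the dropdown; assumes the 'c-btn' div is the element that toggles it
driver.find_element(By.CSS_SELECTOR, '#status-field .c-btn').click()

# Wait until the option list is actually clickable, then pick 'Released'
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//label[@title='Released']"))
).click()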

How do I link a title with a url in Python and BeautifulSoup?

So I'm working on a script to download videos for me automatically; however, it would seem I have stumbled upon a problem (insufficient experience).
How do I link a category title with a bunch of URLs?
Expected output:
->CATEGORY 1
->https://example.com/part/of/link/x
->https://example.com/part/of/link/x
->https://example.com/part/of/link/x
.....
->CATEGORY 2
->https://example.com/part/of/link/x
for category_title in soup.findAll(class_='section-title'):
    title = category_title.get_text().lstrip()
    print(title)

for onelink in soup.findAll('a'):
    link = onelink.get('href')
    print(f'https://example.com{link}')
What this does is:
Lists all the titles (10 titles)
Lists all the links (100 links)
<div class="section-title">
CATEGORY 1
<li class="section-item next-random">
<a class="item" href="/part/of/link/x">
<div class="title-container">
<div class="btn-primary btn-sm pull-right">
BUTTON
</div>
<span class="random-name">
URL TITLE
</span>
</div>
</a>
</li>
</ul>
Depending on how the HTML structure of your page is made, you could take each "section-title" element's parent, and then list all the links of that particular section, giving you all the links for each category.
Here's some help:
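A minimal sketch of that approach, assuming each "section-title" is followed by a <ul> of links at the same level (the markup below is a toy stand-in for the real page):
from bs4 import BeautifulSoup

# Toy markup mirroring the assumed structure
html = '''
<div class="section-title">CATEGORY 1</div>
<ul>
<li><a class="item" href="/part/of/link/a">A</a></li>
<li><a class="item" href="/part/of/link/b">B</a></li>
</ul>
<div class="section-title">CATEGORY 2</div>
<ul>
<li><a class="item" href="/part/of/link/c">C</a></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')
for title in soup.find_all(class_='section-title'):
    print('->' + title.get_text(strip=True))
    # Only walk the links inside the list that follows this particular title
    for a in title.find_next_sibling('ul').find_all('a'):
        print(f"->https://example.com{a['href']}")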

Add my resume as a PDF file to a Pelican static blog site

I have a personal blog which I build with Pelican, using the Elegant theme. Here are the source and site repos: source and site.
Now I want to add my resume or CV as a separate category or page.
The expected setup should be: "Home", "Categories", "Tags", "Archives", and "Resume".
The expected output should be: when a visitor clicks the Resume page, it should open as a PDF file showing my resume or CV.
I've tried a lot but without any success. Can anyone help me?
Just add a PDF here: content/images/resume.pdf
Then add a link to it in your header (in pelican-themes/elegant/templates/base.html). Underneath the following block:
<div class="nav-collapse collapse">
<ul class="nav pull-right top-menu">
You'll see a bunch of <li> tags, e.g.:
<li {% if page_name == 'archives' %} class="active"{% endif %}><a href="{{ SITEURL }}/archives.html">Archives</a></li>
Add <li><a href="{{ SITEURL }}/images/resume.pdf">Resume</a></li> below the Archives link referenced above and rebuild your site.
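Pelican only copies files it knows about into the output; 'images' is in the default STATIC_PATHS, so content/images/resume.pdf is copied automatically. If you keep the PDF elsewhere, add its folder in pelicanconf.py (the 'pdfs' entry here is a hypothetical example):
# pelicanconf.py
STATIC_PATHS = ['images', 'pdfs']  # 'pdfs' is a hypothetical extra folder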

How to follow pagination with scrapy

I have this markup at the target URL:
<nav>
<ul class="pagination pagination-lg">
<li class="active" itemprop="pageStart">
1</li>
<li itemprop="pageEnd">
2</li>
<li>
<a href="moto-2.html" aria-label="Next" class="xh-highlight">
<span aria-hidden="true">»</span></a>
</li>
</ul>
</nav>
but I can't select the next-page link. I tried:
next_page_url = response.xpath('./div/div/div[1]/nav/ul/li[3]/a').extract_first()
also with
response.css('[class="xh-highlight"]').extract()
I only get [] as the result in the shell.
One other point: I set the user agent to Google Chrome because I read here about another user having problems with accent marks, but that didn't fix my problem.
I want to warn you that Scrapy cannot scrape websites rendered with JavaScript. Consider using a web driver like Selenium with Scrapy if the page is rendered in JavaScript.
I would recommend you go to the Scrapy shell and type view(response). If you see a blank page, then the page is rendered with JavaScript.
This is how you get URLs from XPath, though I doubt it will make a difference since you see no object:
next_page_url = response.xpath('//nav/ul/li[3]/a/@href').extract_first()
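A minimal sketch of following that link inside a spider, assuming the markup really is present in the raw (non-JavaScript) response; the spider name and start URL are hypothetical:
import scrapy

class MotoSpider(scrapy.Spider):
    name = 'moto'
    start_urls = ['https://example.com/moto.html']  # hypothetical

    def parse(self, response):
        # ... extract items from the current page here ...

        # The "Next" link is the <a> carrying aria-label="Next" in the pagination list
        next_page = response.xpath(
            '//ul[contains(@class, "pagination")]//a[@aria-label="Next"]/@href'
        ).extract_first()
        if next_page:
            # response.follow resolves relative URLs like "moto-2.html" for us
            yield response.follow(next_page, callback=self.parse)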

Web Scraping with Python Request/lxml: Getting data from ul/li

So I'm pretty new to this, and I haven't been able to find anything on Google about this question.
I'm using requests and lxml with Python. I've seen that there are a lot of different modules for web scraping, but is there any reason to choose one over the other? Can you do the same stuff with requests/lxml as you can with, for example, BeautifulSoup?
Anyway, here's my actual question:
This is my code:
import requests
from lxml import html

# Login data
inputUrl = 'http://forum.mytestsite.com/login'
usr = 'myusername'
pwd = 'mypassword'
payload = dict(login=usr, password=pwd)

# Open session
with requests.Session() as s:
    # Login
    s.post(inputUrl, data=payload)
    # Get page data
    pageResult = s.get('http://forum.mytestsite.com/icons/', allow_redirects=False)
    pageResult = html.fromstring(pageResult.content)
    pageIcons = pageResult.xpath('//script[@id="table-icons"]/text()')
    print(pageIcons[0])
The result when printing pageIcons[0]:
<ul id="icons">
{{#each icons}}
<li data-handle="{{handle}}">
<img src="{{image_path}}" alt="{{desc_or_name this}}" title="{{desc_or_name this}}">
</li>
{{/each}}
</ul>
This is the website/js code that generates the icons:
<script id="table-icons" type="text/x-handlebars-template">
<ul id="icons">
{{#each icons}}
<li data-handle="{{handle}}">
<img src="{{image_path}}" alt="{{desc_or_name this}}" title="{{desc_or_name this}}">
</li>
{{/each}}
</ul>
</script>
And here's the result on the page:
<ul id="icons">
<li data-handle="558FSTBI" class="">
<img src="http://testsite.com/icons/558FSTBI.1.png" alt="Icon 1" title="Icon 1">
</li>
<li data-handle="310AYTZI">
<img src="http://testsite.com/icons/310AYTZI.1.png" alt="Icon 2" title="Icon 2">
</li>
<li data-handle="669PQXBI" class="">
<img src="http://testsite.com/icons/669PQXBI.1.png" alt="Icon 3" title="Icon 3">
</li>
</ul>
My goal:
What I would like to do is retrieve all of the li data-handles, but I haven't been able to figure out how to get at this data. So my goal is to retrieve all of the icon paths and their titles. Could anyone help me out here? I'd really appreciate any help :)
You aren't parsing the li or ul.
Start with this:
//ul[@id='icons']/li/img
And from those elements, you can extract the individual information
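A minimal sketch of that extraction with lxml, run here against the rendered markup shown above (inlined as a string, since requests alone only ever sees the Handlebars template):
from lxml import html

# Rendered markup from above, inlined for illustration only; a real run
# would need the JavaScript-rendered page source.
rendered = '''
<ul id="icons">
<li data-handle="558FSTBI">
<img src="http://testsite.com/icons/558FSTBI.1.png" alt="Icon 1" title="Icon 1">
</li>
<li data-handle="310AYTZI">
<img src="http://testsite.com/icons/310AYTZI.1.png" alt="Icon 2" title="Icon 2">
</li>
</ul>
'''

tree = html.fromstring(rendered)
for img in tree.xpath("//ul[@id='icons']/li/img"):
    # data-handle lives on the parent <li>; src and title on the <img> itself
    print(img.getparent().get('data-handle'), img.get('src'), img.get('title'))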
Regarding the first question: BeautifulSoup optionally uses lxml as its parser. If you don't think you need it and are comfortable with XPath, don't worry about it.
However, since it's JavaScript generating the page, you need a headless browser rather than the requests library.
Get page generated with Javascript in Python
Reading dynamically generated web pages using python
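A minimal sketch of the headless-browser route, feeding Selenium's rendered page source into the same lxml XPath (the URL is from the question; logging in first may be required, and the headless flag assumes a reasonably recent Chrome):
from lxml import html
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

driver.get('http://forum.mytestsite.com/icons/')
# page_source contains the DOM after JavaScript has run
tree = html.fromstring(driver.page_source)

for img in tree.xpath("//ul[@id='icons']/li/img"):
    print(img.getparent().get('data-handle'), img.get('src'), img.get('title'))

driver.quit()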
