I don't know exactly where to start here, and I have to admit my knowledge of Python and websites is limited. In the past I've made some requests to an API and accessed a file or two from a website, but I had examples to build off of. In this case I have no written example to guide me through the process, so I don't really know where to begin, or whether "requests" is even the way to go.
What I have is a distributor's website that has a file with product information.
If I were to download this file manually I would have to log in and navigate to the download section of the website. At that point a popup appears where I select the brand I want to download; there are options for the data I would like to gather, a text box to name the file, and a download button that has no URL.
I'm sure all this seems pretty vague, since I don't know what info would be helpful at this point.
A nudge in the right direction would be great!
Thanks
Screen shot of popup
It sounds like there may not be an API; in instances like this, a web automation tool such as Selenium can get you the desired result.
For your case, it sounds like you will need to find the button elements and then click them.
From Selenium's basic example:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
Based on your example HTML, after you load the page you could use the following to find the button and click it:
elem = driver.find_element_by_id("downloadBtn")
elem.click()
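Going further, the whole popup from the question (brand selector, data options, filename box, download button) could be driven the same way. A rough sketch -- every element name and id below is hypothetical, so inspect the popup's HTML to find the real ones:

```python
import re

def safe_filename(name):
    """Replace characters that are awkward in file names (plain helper, not Selenium-specific)."""
    return re.sub(r"[^\w.-]", "_", name)

def fill_export_popup(driver, brand, filename):
    # All element names/ids here are made up -- check the real page with
    # your browser's developer tools.
    from selenium.webdriver.support.ui import Select
    Select(driver.find_element_by_name("brand")).select_by_visible_text(brand)
    driver.find_element_by_name("filename").send_keys(safe_filename(filename))
    driver.find_element_by_id("downloadBtn").click()
```

The Select wrapper handles <select> dropdowns; plain checkboxes would just be located and clicked like the download button.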
You can use an HTTP library like Requests to download this, but you will need to supply the username and password; you can learn how from its examples.
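In that spirit, here is a rough sketch of how a Requests-based download might look. Everything specific -- the URLs, the form-field names, the export parameters -- is a placeholder; use your browser's developer tools (Network tab) to capture the real requests while downloading manually:

```python
def build_export_params(brand, fields, filename):
    # Placeholder parameter names for the export request -- the real site
    # almost certainly uses different ones.
    return {"brand": brand, "fields": ",".join(fields), "filename": filename}

def download_product_file(username, password, brand, fields, filename):
    import requests  # pip install requests
    with requests.Session() as session:
        # A Session keeps the login cookie for the later export request.
        session.post("https://distributor.example.com/login",
                     data={"username": username, "password": password})
        resp = session.get("https://distributor.example.com/export",
                           params=build_export_params(brand, fields, filename))
        resp.raise_for_status()
        with open(filename + ".csv", "wb") as f:
            f.write(resp.content)
```

If the login or the download button involves JavaScript, this approach won't work and Selenium is the better fit.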
If the site you wish to download from has no JavaScript, you only need to parse the HTML to navigate to the file you want; consider using RoboBrowser. Selenium may be overkill for this.
Here is a basic example:
from robobrowser import RoboBrowser

robo = RoboBrowser(history=True, parser="html.parser")
robo.open("http://www.python.org")
search = robo.get_form(action="/search/")
search["q"].value = "Really awesome search query"
robo.submit_form(search)
I have dabbled with bits of simple code over the years. I am now interested in automating some repetitive steps in a web-based CRM used at work. I tried a few automation tools: I was not able to get AutoIt to work with the Chrome webdriver, and I then tried WinTask but did not make meaningful progress. I started exploring Python and Selenium last week.
I have now automated the first few steps of my project by Googling each step I wanted to achieve and learning from pages on Stack Overflow and other sites. Where I need help is that most of the links in the CRM are some sort of JavaScript links. Most of the text links or images have links that are formatted like this...
javascript:window.location = 'Reports/ResponseTimes.aspx?from=1%2f14%2f2021&to=1%2f14%2f2021&target=gn';
It looks like the many find_element_by functions in Selenium do not interact with JavaScript links. Tonight I found a page that directed me to use driver.execute_script(javaScript), and eventually I found an example that made it clear I should pass the JavaScript link to that function. This works...
driver.execute_script("window.location = 'Reports/ResponseTimes.aspx?from=1%2f14%2f2021&to=1%2f14%2f2021&target=gn';")
My issue is that I now see the JavaScript links are generated dynamically. In the code above, the link gets updated with dates based on the current date, so I can't reuse the driver.execute_script() call above; the dates have to be updated.
My hope is to find a way to locate the JavaScript links I need based on some part of the link that does not change. The link above always ends with "target=gn", and that is unique enough that if I could find the current version of the link, pull it into a variable, and run it in driver.execute_script(), I believe that would solve my current issue.
I expect a solution could then be used in the next step I need to perform, where there is a list of new leads that all need to be updated in a manner that tells the system a human has reviewed the lead and "stopped the clock". To view each lead, there are more JavaScript links. Each link is unique, since it includes a value that is the record number for the lead. Here are the first two...
javascript:top.viewItem(971244899);
javascript:top.viewItem(971312602);
I imagine the approach needed is to search the page for some or all of... javascript:top.viewItem( ...in order to store the full current link such as... javascript:top.viewItem(971244899); ...in a variable, so that it can be passed to... driver.execute_script()
Thanks for any suggestions. I have made many searches on this site and Google for phrases that might teach me more about working with javascript links. I am asking for guidance since I have not been able to move forward on my own. Here's my current code...
PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so the backslashes stay literal
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Chrome(PATH)
driver.get("https://apps.vinmanager.com/cardashboard/login.aspx")
# log in
time.sleep(1)
search = driver.find_element_by_name("username")
search.send_keys("xxx")
search.send_keys(Keys.RETURN)
time.sleep(2)
search = driver.find_element_by_name("password")
search.send_keys("xxx")
search.send_keys(Keys.RETURN)
time.sleep(1)
# close news pop-up
driver.find_element_by_link_text("Close").click()
time.sleep(2)
# Nav to left pane
driver.switch_to.frame('leftpaneframe')
# Leads at No Contact link
driver.execute_script("window.location = 'Reports/ResponseTimes.aspx?from=1%2f14%2f2021&to=1%2f14%2f2021&target=gn';")
Eventually I found enough info online to recognize that I needed to replace the "//a" tag in the XPath find method with the proper tag, which was "//area" in my case, and then extract the href so that I could execute it...
## click no contact link ##
print('click no contact link...')
cncl = driver.find_element_by_xpath("//area[contains(@href, 'target=gn')]").get_attribute('href')
time.sleep(2)
driver.execute_script(cncl)
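Building on that, the second batch of links could be handled the same way: collect every href containing top.viewItem(, then execute each one. A sketch under the assumption that those links are ordinary <a> elements (swap in //area if the CRM uses image maps there too):

```python
import re

def extract_view_item_ids(hrefs):
    """Pull the record numbers out of javascript:top.viewItem(...) hrefs."""
    ids = []
    for href in hrefs:
        match = re.search(r"top\.viewItem\((\d+)\)", href)
        if match:
            ids.append(match.group(1))
    return ids

def open_each_lead(driver):
    # Collect the hrefs first: executing one link navigates the page and
    # would leave the remaining element references stale.
    elements = driver.find_elements_by_xpath("//a[contains(@href, 'top.viewItem(')]")
    hrefs = [el.get_attribute("href") for el in elements]
    for href in hrefs:
        # Strip the javascript: prefix before handing the code to the browser.
        driver.execute_script(href.replace("javascript:", "", 1))
        # ... review this lead, then navigate back to the list ...
```

The plural find_elements_by_xpath returns every match, which is what makes the loop over all leads possible.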
The problem: I want to write a Python script that takes a screenshot of a website I have open in a browser each time it loads.
The thing is that I have a website with around 300 exam questions that I can work through; I try each one and get the correction when I submit my answer. I will not have access to this questionnaire after a certain date, but I want to keep the questions (which I could write down, but laziness is strong in me, and I want to learn Python).
The "attempt": I thought of doing a simple Python script with imgkit to take the screenshots. I'm open to other suggestions, as imgkit was the first thing I saw while looking for this, and the code looks plain and simple to me:
import imgkit
imgkit.from_url('http://webpage.com', 'out.jpg')
But I would have to provide the URL for each page, and that would be more tedious than taking a screenshot with OS features, so I want to automate it.
The questions:
Is there a way to make Python monitor a browser tab and take a screenshot each time it reloads (that will be when a new question appears)?
Or maybe get the tab's URL to pass it to imgkit and take the screenshot.
Another thing I saw is that imgkit can generate a "screenshot" from an HTML file. Can Python download the HTML code from a tab I have open in my browser?
Selenium is your friend here. It is a framework designed for testing but it will make what you want really easy.
Selenium allows you to spin up a web browser and control it. You can instruct it to go to the web address you want and then do things; normally you would instruct it to click here, write in a form, etc.
In your case you only want it to open a certain address, take a screenshot, go to the next address, and repeat.
Here you have a tutorial on how to do exactly what you want.
The specific code is:
from selenium import webdriver
#1. Get the driver to manage the web-browser you choose
driver = webdriver.Chrome()
#2. Go to the web address you want
driver.get('https://python.org')
#3. Take a screenshot
driver.save_screenshot("screenshot.png")
driver.close()
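To cover several questions in one run, the same three steps can be wrapped in a loop. A small sketch -- the URL list and the file-naming scheme are assumptions to adapt to the real questionnaire:

```python
def screenshot_name(index):
    # Zero-padded so the files sort in question order.
    return "question_%03d.png" % index

def capture_pages(urls):
    """Visit each URL in turn and save a numbered screenshot."""
    from selenium import webdriver  # deferred so screenshot_name works without Selenium installed
    driver = webdriver.Chrome()
    try:
        for i, url in enumerate(urls, start=1):
            driver.get(url)
            driver.save_screenshot(screenshot_name(i))
    finally:
        driver.quit()
```

If the question URLs follow a pattern (e.g. a question number in the query string), the list can be generated instead of typed out.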
PS: For the tutorial to run, you will need to have installed the web driver for Selenium to be able to spin up and run Chrome. Here are the instructions for that.
I'm using Python + Selenium to create a web crawler/scraper to notify me when new homework is posted. I managed to log into the main website, but then you need to click a link to select your course.
After searching through the HTML manually, I found this information about the link I usually click (The blue box is the link).
However, no button seems clickable. So I searched the page for the link I knew it should redirect me to, and I found this:
It looks like a card, which is a new data structure/object for me. How can I use an automated web crawler to click this link?
Try the following:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import ui
from selenium.webdriver.support import expected_conditions as EC

ui.WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, ".title.ellipsis")))
driver.find_element_by_css_selector(".title.ellipsis").click()
Hope it helps you!
I've just started to learn coding this month and started with Python. I would like to automate a simple task (my first project) - visit a company's career website, retrieve all the jobs posted for the day and store them in a file. So this is what I would like to do, in sequence:
Go to http://www.nov.com/careers/jobsearch.aspx
Select the option - 25 Jobs per page
Select the date option - Today
Click on Search for Jobs
Store results in a file (just the job titles)
I looked around and found that Selenium is the best way to go about handling .aspx pages.
I have done steps 1-4 using Selenium. However, there are two issues:
I do not want the browser opening up. I just need the output saved to a file.
Even if I am OK with the browser popping up, using the Python code (exported from Selenium as WebDriver) in IDLE (I have Windows) results in errors. When I run the Python code, the browser opens and the link loads, but none of the form selections happen and I get the following error message (link below) before the browser closes. So what does the error message mean?
http://i.stack.imgur.com/lmcDz.png
Any help/guidance will be appreciated...Thanks!
First, about the error you got: the exception name NoSuchElementException and the message Unable to locate element mean that the selector you provided is wrong, so the web driver can't find the element.
Since you did not post your code and I can't open the link to the website you mentioned, I can only give you a sample with as much detail as I can manage.
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("url")

# The IDs below are placeholders -- replace them with the real ones from the page.
number_option = driver.find_element_by_id("id_for_25_option_indicator")
number_option.click()
date_option = driver.find_element_by_id("id_for_today_option_indicator")
date_option.click()
search_button = driver.find_element_by_id("id_for_search_button")
search_button.click()

all_results = driver.find_elements_by_xpath("some_xpath_that_is_common_between_all_job_results")
result_file = open("result_file.txt", "w")
for result in all_results:
    result_file.write(result.text + "\n")
driver.close()
result_file.close()
Since you said you just started to learn coding recently, I think I should give some explanations:
I recommend you use driver.find_element_by_id in all cases where an element has an ID property; it's more robust.
Instead of result.text, you can use result.get_attribute("value") or result.get_attribute("innerHTML").
That's all that came to mind for now, but it would be better if you posted your code so we can see what's wrong with it. Additionally, it would be great if you gave me a working link to the website so I can add more details to the code; your current link is broken.
Concerning the first issue, you can simply use a headless browser. This is possible with Chrome as well as Firefox.
Check Grey Li's answer here for example: Python - Firefox Headless
from selenium import webdriver
options = webdriver.FirefoxOptions()
options.add_argument('-headless')  # note the leading dash for Firefox
driver = webdriver.Firefox(options=options)
Alright, I'm confused. So I want to scrape a page using Selenium Webdriver and Python. I've recorded a test case in the Selenium IDE. It has stuff like
Command    Target
click      link=14
But I don't see how to run that in Python. The desired end result is that I have the source of the final page.
Is there a run_test_case command? Or do I have to write individual command lines? I'm rather missing the link between the test case and the actual automation. Every site tells me how to load the initial page and how to get stuff from that page, but how do I enter values and click on stuff and get the source?
I've seen:
submitButton = driver.find_element_by_xpath("....")
submitButton.click()
OK. And how do I enter values? And get the source once I've submitted a page? I'm sorry this is so general, but I really have looked around and haven't found a good tutorial that actually shows me how to do what I thought was the whole point of Selenium WebDriver.
I've never used the IDE. I just write my tests or site automation by hand.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://www.google.com")
print(browser.page_source)
You could put that in a script and just run python wd_script.py, or you could open a Python shell, type it in by hand, and watch the browser open and get driven by each line. For this to work you will obviously need Firefox installed as well. Not all versions of Firefox work with all versions of Selenium, but the current latest versions of each (Firefox 19, Selenium 2.31) do.
An example showing logging into a form might look like this:
username_field = browser.find_element_by_css_selector("input[type=text]")
username_field.send_keys("my_username")
password_field = browser.find_element_by_css_selector("input[type=password]")
password_field.send_keys("sekretz")
browser.find_element_by_css_selector("input[type=submit]").click()
print(browser.page_source)
This kind of stuff is much easier to write if you know CSS well. Weird errors can be caused by trying to find elements that are generated in JavaScript; you might be looking for them before they exist, for instance. It's easy enough to tell if this is the case by putting in a time.sleep for a little while and seeing if that fixes the problem. More elegantly, you can abstract some kind of general wait-for-element function.
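Such a general wait helper needs nothing Selenium-specific; a minimal sketch might look like this:

```python
import time

def wait_for(condition, timeout=10, poll=0.5):
    """Call condition() repeatedly until it returns something truthy,
    or raise TimeoutError after timeout seconds."""
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)
```

Used with the browser above it could look like `field = wait_for(lambda: browser.find_elements_by_css_selector("input[type=submit]") or None)`, since find_elements returns an empty (falsy) list while the element doesn't exist yet. Selenium also ships its own version of this idea as WebDriverWait.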
If you want to run WebDriver sessions as part of a suite of integration tests, I would suggest using Python's unittest to create them. You drive the browser to the site under test and make assertions that the actions you take leave the page in the state you expect. I can share some examples of how that might work as well if you are interested.