My goal for this Python code is to collect job information and save it into a folder. The first step is failing: when I run the code, I want the URL to print https://www.indeed.com/, but instead the code returns https://secure.indeed.com/account/login. I am open to using urllib or cookielib to resolve this issue.
import requests
import urllib
data = {
'action':'Login',
'__email':'email#gmail.com',
'__password':'password',
'remember':'1',
'hl':'en',
'continue':'/account/view?hl=en',
}
response = requests.get('https://secure.indeed.com/account/login',data=data)
print(response.url)
If you're trying to scrape information from Indeed, you should use the Selenium library for Python.
https://pypi.python.org/pypi/selenium
You can then write your program within the context of a real user browsing the site normally.
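Separately from the tool choice, one detail in the question's snippet is worth flagging: the form is sent with requests.get, but login forms are normally submitted with POST, with the fields in the request body. A minimal sketch below builds the request without sending anything; the endpoint and field names are copied from the question's snippet, not a documented Indeed API:

```python
import requests

# Form fields copied from the question; the endpoint and field names are
# assumptions taken from that snippet, not a documented Indeed API.
data = {
    'action': 'Login',
    '__email': 'email@example.com',
    '__password': 'password',
    'remember': '1',
    'hl': 'en',
    'continue': '/account/view?hl=en',
}

# Build the request without sending it: with POST the form fields travel
# in the request body, which is what a login form expects; with GET they
# would end up in the query string instead.
prepared = requests.Request(
    'POST', 'https://secure.indeed.com/account/login', data=data
).prepare()

print(prepared.method)                   # POST
print('action=Login' in prepared.body)   # True
```

Even with the right method, sites with bot protection and redirects often defeat a plain requests login, which is why the Selenium suggestion above is the more robust route.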
I'm trying to download a CSV file with Python from this site: https://gats.pjm-eis.com/gats2/PublicReports/GATSGenerators
There's a CSV button in the top right corner, and I want to automatically load that file into a data warehouse. I've gone through a few tutorials (I'm new to Python) but haven't been successful yet. Any recommendations?
Use the requests library:
import requests
You need it to make the request for the CSV resource.
There's also a library for screen-scraping called bs4 (Beautiful Soup):
import bs4
You will need both to construct what you want. Look for a course on web scraping with Python and bs4.
There's also a library called csv:
import csv
You can use it to easily parse the csv file once you get it.
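The two pieces fit together simply: requests fetches the text, and csv parses it. A small sketch on an inline sample; the column names here are made up for illustration and are not the real GATS schema (with a real download, the sample string would be requests.get(url).text):

```python
import csv
import io

# Made-up two-row sample standing in for the downloaded GATS CSV; with a
# real download this would be io.StringIO(requests.get(url).text).
sample = (
    "UnitName,State,FuelType\n"
    "Example Wind Farm,IL,Wind\n"
    "Example Solar Park,NJ,Solar\n"
)

# DictReader maps each data row to a dict keyed by the header line.
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
print(rows[0]['UnitName'])   # Example Wind Farm
print(len(rows))             # 2
```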
Check this example, or search for others:
https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3
Here's another course on the LinkedIn Learning platform:
https://www.linkedin.com/learning/scripting-for-testers/welcome
Selenium did the trick for me:
from selenium import webdriver
browser = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
url = 'https://gats.pjm-eis.com/gats2/PublicReports/GATSGenerators'
browser.get(url)
browser.find_element_by_xpath('//*[@id="CSV"]').click()
browser.close()
In my Python script, I would like to get the content of the page that I launch using this script:
import webbrowser
url = 'https://yahoo.com'
webbrowser.get('Chrome').open_new(url)
I know how to get it using Selenium WebDriver, but I have never gotten it through the script itself.
Any ideas would be helpful.
Thanks
I am trying to write code that will download all the data from a server holding .rar files about imaginary cadastral parcels for student projects. What I have for now is a query to the server that only needs the number of a specific parcel as input; accessing it as a URL downloads the .rar file.
url = 'http://www.pg.geof.unizg.hr/geoserver/wfs?request=getfeature&version=1.0.0&service=wfs&&propertyname=broj,naziv_ko,kc_geom&outputformat=SHAPE-ZIP&typename=gf:katastarska_cestica&filter=<Filter+xmlns="http://www.opengis.net/ogc"><And><PropertyIsEqualTo><PropertyName>broj</PropertyName><Literal>1900/1</Literal></PropertyIsEqualTo><PropertyIsEqualTo><PropertyName>naziv_ko</PropertyName><Literal>Suma Striborova Stara (9997)</Literal></PropertyIsEqualTo></And></Filter>'
This is the "url" I want to open with the web browser module for a particle "1900/1" but this way I get an error:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
When I manually input this URL, it downloads the file without a problem.
How can I make this Python script work?
I tried webbrowser.open_new(url), which does not work.
You're using the wrong tool. webbrowser is for controlling a native web browser. If you just want to download a file, use the requests module (or urllib.request if you can't install Requests).
import requests
r = requests.get('http://www.pg.geof.unizg.hr/geoserver/wfs', params={
'request': 'getfeature',
...
'filter': '<Filter xmlns=...>'
})
print(r.content) # or write it to a file, or whatever
Note that requests will handle encoding GET parameters for you -- you don't need to worry about escaping the request yourself.
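Filling in the ellipsis above from the URL given in the question, the complete parameter dict would look roughly like this. The sketch uses .prepare() so nothing is actually sent; to really download, swap in requests.get with the same arguments and write r.content to a file:

```python
import requests

# Parameters reconstructed from the URL in the question; requests will
# percent-encode them, so the XML filter can be written verbatim.
params = {
    'request': 'getfeature',
    'version': '1.0.0',
    'service': 'wfs',
    'propertyname': 'broj,naziv_ko,kc_geom',
    'outputformat': 'SHAPE-ZIP',
    'typename': 'gf:katastarska_cestica',
    'filter': ('<Filter xmlns="http://www.opengis.net/ogc"><And>'
               '<PropertyIsEqualTo><PropertyName>broj</PropertyName>'
               '<Literal>1900/1</Literal></PropertyIsEqualTo>'
               '<PropertyIsEqualTo><PropertyName>naziv_ko</PropertyName>'
               '<Literal>Suma Striborova Stara (9997)</Literal>'
               '</PropertyIsEqualTo></And></Filter>'),
}

# Build the request without sending it, to inspect the encoded URL.
prepared = requests.Request(
    'GET', 'http://www.pg.geof.unizg.hr/geoserver/wfs', params=params
).prepare()
print(prepared.url)

# To actually download:
#   r = requests.get('http://www.pg.geof.unizg.hr/geoserver/wfs', params=params)
#   with open('parcel.zip', 'wb') as f:
#       f.write(r.content)
```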
I want to write a Python script to automate uploading an image to https://cloud.google.com/vision/ and collecting the information from the JSON tab there. I need to know how to do it.
So far, I'm only able to open the website in Chrome using the following code:
import webbrowser
url = 'https://cloud.google.com/vision/'
webbrowser.open_new_tab(url + 'doc/')
I tried using urllib2 but couldn't get anywhere.
Please help me out.
You have to use the google-cloud-vision library.
There is sample code in the docs:
https://cloud.google.com/vision/docs/reference/libraries#client-libraries-install-python
You can start from there.
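The client library wraps the Vision REST API (images:annotate), which can also be called by hand. The sketch below only constructs the JSON request body; the image bytes are a placeholder standing in for open(path, 'rb').read(), and actually POSTing to https://vision.googleapis.com/v1/images:annotate requires an API key:

```python
import base64
import json

# Placeholder bytes standing in for a real image file read with
# open(path, 'rb').read().
image_bytes = b'\x89PNG...'

# Request body in the shape the Vision REST API (images:annotate)
# expects: base64-encoded image content plus a list of features.
payload = {
    'requests': [{
        'image': {'content': base64.b64encode(image_bytes).decode('ascii')},
        'features': [{'type': 'LABEL_DETECTION', 'maxResults': 5}],
    }]
}

body = json.dumps(payload)
print('LABEL_DETECTION' in body)  # True
# POST this body to https://vision.googleapis.com/v1/images:annotate?key=API_KEY;
# the JSON response is what the demo page's JSON tab displays.
```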
I am trying to fetch some information from Workflowy using the Python Requests library. Basically, I am trying to programmatically get the content under this URL: https://workflowy.com/s/XCL9FCaH1b
The problem is that Workflowy goes through a 'loading' phase before the actual content is displayed when I visit this website, so I end up getting the content of the loading page when I make the request. Basically, I need a way to defer getting the content so I can bypass the loading phase.
It seemed like the Requests library addresses this problem here: http://www.python-requests.org/en/latest/user/advanced/#body-content-workflow but I couldn't get that example to work for my purposes.
Here is the super simple block of code that ends up getting the 'loading page':
import requests
path = "https://workflowy.com/s/XCL9FCaH1b"
r = requests.get(path, stream=True)
print(r.content)
Note that I don't have to use Requests; I just picked it because it looked like it might offer a solution to my problem. Also, I'm currently using Python 2.7.
Thanks a lot for your time!