I want to read the HTML contents of a site on Google's Play Store developer backend from Python.
The URL is
https://play.google.com/apps/publish/?dev_acc=1234567890#AppListPlace
The site is of course only accessible if you're logged in.
I naively tried:
response = requests.get(url, auth=HTTPBasicAuth('username@gmail.com', 'mypassword'))
which yielded only the default 'you need to be logged in to view this page' HTML content.
Any way to do this?
Trying to read the HTML contents of the page is not the way to go.
Basic HTTP authentication is not something you will see very often these days. It's the kind that pops up a browser dialog asking you for your username and password. Google, like most other websites, uses its own, more sophisticated system. That system is not designed to be accessed by anything but humans. Not to mention that storing your Google account password in your source code is a terrible idea.
Instead, you should look into the Google Play Developer API, which is designed to be accessed by machines, and uses OAuth2 authentication.
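As a rough illustration of that route, here is a minimal sketch using google-api-python-client with a service-account key; the key file path and package name are placeholders, and it assumes the service account has been granted access in the Play Console.
# Minimal sketch (assumptions: a service-account JSON key created in the Google
# Cloud console and granted access in the Play Console; "service-account.json"
# and "com.example.myapp" are placeholders).
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/androidpublisher"]
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES)

# Build an androidpublisher client and, for example, list the app's release tracks.
service = build("androidpublisher", "v3", credentials=credentials)
edit = service.edits().insert(packageName="com.example.myapp", body={}).execute()
tracks = service.edits().tracks().list(
    packageName="com.example.myapp", editId=edit["id"]).execute()
print(tracks)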
Related
Okay, so I've got a website. It's a site where I can check my university timetable. So I've decided to write an application in Python which logs in, scrapes the timetable (classrooms, hours, lecturers, etc.), and displays it, let's say for the beginning, in a .txt file. So I've done some basic authentication with HTTP requests; it looked just like this:
import requests
from requests.auth import HTTPBasicAuth

# httpbin's /basic-auth/user/passwd endpoint accepts the credentials user/passwd.
url = "http://httpbin.org"
authURL = "http://httpbin.org/basic-auth/user/passwd"
r = requests.get(authURL, auth=HTTPBasicAuth("user", "passwd"))
print(r.content)
It's a freely hosted service, just to practise on. Okay, but there are many other types of authentication. And here's my question: how can I actually determine which one this website is using, and then use that information in my application?
I am trying to crawl a website for the first time. I am using urllib2 in Python.
I am currently trying to log into the Foursquare social networking site using Python's urllib2 and BeautifulSoup. To view a particular page, I need to provide a username and password.
So, I followed the Basic Authentication described on the documentation page.
I guess everything worked well, but the site throws up a security check asking me to type some text (a captcha) before sending me the requested page. It obviously looks like the site is detecting that the page is being requested not by a human, but by a crawler.
So, what is the way to avoid being detected? How can I make urllib2 get the desired page without having to stop at the security check? Please help.
You probably want to use the Foursquare API instead.
You have to use the Foursquare API. I guess there is no other way; APIs are designed for such purposes.
Crawlers that depend solely on the HTML format of the page will fail in the future when the HTML changes.
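As a rough illustration, a minimal sketch of calling the Foursquare API v2 with requests instead of scraping; the OAuth token is a placeholder you would obtain by registering an app on developer.foursquare.com.
# Minimal sketch (the token and version date are placeholders).
import requests

params = {
    "oauth_token": "YOUR_OAUTH_TOKEN",
    "v": "20140101",  # versioning date required by the v2 API
}
# Fetch the authenticated user's profile as JSON rather than parsing HTML.
response = requests.get("https://api.foursquare.com/v2/users/self", params=params)
response.raise_for_status()
print(response.json()["response"]["user"]["firstName"])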
I've been reading about BeautifulSoup, HTTP headers, authentication, cookies, and something about mechanize.
I'm trying to scrape my favorite art websites with Python, like DeviantArt, which I found a scraper for. Right now I'm trying to log in, but the basic authentication code examples I try don't work.
So, the question: how do I find out what type of authentication a site uses, so that I know I'm trying to log in the correct way? Including things like valid user agents when they try to block bots.
Bear with my ignorance as I'm new to HTTP, python, and scraping.
It's very unlikely that any of the sites you are interested in use basic auth. You will need a library like mechanize that manages cookies and you will need to submit the login information to the site's login page.
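A minimal sketch of that idea with mechanize; the login URL and the form field names ("username", "password") are placeholders you would check against the real login form first.
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)                       # many sites disallow bots in robots.txt
br.addheaders = [("User-agent", "Mozilla/5.0")]   # present a normal browser user agent
br.open("https://www.example.com/login")
br.select_form(nr=0)                              # pick the first form on the page
br["username"] = "me"
br["password"] = "secret"
response = br.submit()                            # cookies are kept for later requests
print(response.read()[:200])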
It's my first question here.
Today I've made a little application using wxPython: a simple Megaupload downloader, but it doesn't yet support premium accounts.
Now I would like to know how to download from MU with a login (free or premium user).
I'm very new to Python, so please don't be too technical and "professional".
I used to download files with urlretrieve, but is there a way to pass "arguments" or something to be able to log in as a premium user?
Thank you. :D
EDIT: new help needed xD
After trying with pycurl, httplib2, and mechanize, I've done the login with urllib2 and cookiejar (the returned HTML shows my username).
But when I start downloading a file, apparently the server doesn't keep my login; in fact the downloaded file seems corrupted (I changed the wait time from 45 to 25 seconds).
How can I download a file from Megaupload while keeping my previous login? Thanks for your patience :D
Questions like this are usually frowned upon; they are very broad, and there is already an abundance of answers if you just search on Google.
You can use urllib, mechanize, or any library that can make an HTTP POST request.
Megaupload looks to have these form fields:
login:1
redir:1
username:
password:
Just POST those values to http://megaupload.com/?c=login
All you should have to do is set the username and password to the correct values!
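For illustration, a minimal sketch that POSTs those fields with a requests.Session so the login cookies are reused for the download; the file URL and filename are placeholders.
import requests

session = requests.Session()
payload = {"login": "1", "redir": "1",
           "username": "your_username", "password": "your_password"}
# Log in; the Session object keeps whatever cookies the server sets.
session.post("http://megaupload.com/?c=login", data=payload)
# A later download through the same session stays logged in.
with session.get("http://megaupload.com/?d=SOMEFILEID", stream=True) as r:
    with open("download.bin", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)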
To log in using Python, follow these steps.
1. Find the list of parameters to be sent in the POST request, and the URL the request has to be made to, by viewing the source of the login form. You can use a browser with an "Inspect Element" feature to find them easily (parameter name examples: userid, password). Just check the tags' name attributes.
2. Most sites set a cookie on logging in, and that cookie has to be sent along with subsequent requests. To handle this, download httplib2 (http://code.google.com/p/httplib2/) and read the wiki page linked there; it shows how to log in with examples.
3. Now you can make subsequent requests for files; just forward the cookie captured from the login response with them (httplib2 itself does not store cookies for you).
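A minimal sketch of that pattern, following the httplib2 wiki's form-login example; the URL and field names are placeholders.
from urllib.parse import urlencode
import httplib2

http = httplib2.Http()
body = urlencode({"userid": "me", "password": "secret"})
headers = {"Content-type": "application/x-www-form-urlencoded"}
# Step 1: POST the login form and capture the session cookie.
response, content = http.request("http://example.com/login", "POST",
                                 headers=headers, body=body)
cookie = response.get("set-cookie", "")
# Step 2: send the cookie back with subsequent requests.
response, content = http.request("http://example.com/private/page", "GET",
                                 headers={"Cookie": cookie})
print(content[:200])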
I do a lot of web stuff with Python; I prefer using pycurl, which you can get here.
It is very simple to POST data and log in with curl; I've used it across many languages such as PHP, Python, and C++. Hope this helps.
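Here is a minimal pycurl login sketch; the URLs and form field names are placeholders.
from io import BytesIO
from urllib.parse import urlencode
import pycurl

buf = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://example.com/login")
c.setopt(c.POSTFIELDS, urlencode({"username": "me", "password": "secret"}))
c.setopt(c.COOKIEFILE, "")            # empty string enables the in-memory cookie engine
c.setopt(c.FOLLOWLOCATION, True)
c.setopt(c.WRITEDATA, buf)
c.perform()

# Reuse the same handle (and its cookies) for a follow-up GET.
buf2 = BytesIO()
c.setopt(c.HTTPGET, True)
c.setopt(c.URL, "https://example.com/account")
c.setopt(c.WRITEDATA, buf2)
c.perform()
c.close()
print(buf2.getvalue()[:200])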
You can use urllib; this is a good example.
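For example, a minimal standard-library-only sketch with Python 3's urllib and http.cookiejar (the Python 2 equivalents are urllib2 and cookielib); the URL and field names are placeholders.
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

cookies = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookies))
data = urllib.parse.urlencode({"username": "me", "password": "secret"}).encode()
opener.open("https://example.com/login", data)     # POST the login form
# The opener now carries the session cookie for further requests.
html = opener.open("https://example.com/account").read()
print(html[:200])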
Google provides APIs for a number of their services and bindings for several languages. However, not everything is supported. So this question comes from my incomplete understanding of things like wget, curl, and the various web programming libraries.
How can I authenticate programmatically to Google?
Is it possible to leverage the existing APIs to gain access to the unsupported parts of Google?
Once I have authenticated, how do I use that to access my restricted pages? It seems like the API could be used to do the login and get a token, but I don't understand what I'm supposed to do next to fetch a restricted webpage.
Specifically, I am playing around with Android and want to write a script to grab my app usage stats from the Android Market once or twice a day so I can make pretty charts. My most likely target is Python, but code in any language illustrating non-API use of Google's services would be helpful. Thanks, folks.
You can get the auth tokens by authenticating a particular service against https://www.google.com/accounts/ClientLogin
E.g.
curl -d "Email=youremail" -d "Passwd=yourpassword" -d "service=blogger" "https://www.google.com/accounts/ClientLogin"
Then you can just pass the auth tokens and cookies along when accessing the service. You can use Firebug or the Tamper Data Firefox plugin to find out the parameter names, etc.
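The same ClientLogin POST done from Python with requests, as a rough sketch (note the later comment: ClientLogin is deprecated, so treat this as historical); the service name and the follow-up URL are placeholders.
import requests

resp = requests.post("https://www.google.com/accounts/ClientLogin",
                     data={"Email": "youremail", "Passwd": "yourpassword",
                           "service": "blogger"})
# The response body is key=value lines; the "Auth" line holds the token.
tokens = dict(line.split("=", 1) for line in resp.text.splitlines() if "=" in line)

# Pass the token in the Authorization header on subsequent requests to that service.
headers = {"Authorization": "GoogleLogin auth=" + tokens.get("Auth", "")}
service_resp = requests.get("https://www.blogger.com/feeds/default/blogs", headers=headers)
print(service_resp.status_code)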
You can use something like mechanize, or even urllib, to achieve this sort of thing. As a tutorial, you can check out my article here about programmatically submitting a form.
Once you authenticate, you can use the cookie to access restricted pages.
ClientLogin is now deprecated: https://developers.google.com/accounts/docs/AuthForInstalledApps
How can we authenticate programmatically to Google with OAuth2?
I can't find an example of a request with user and password parameters as in ClientLogin :(
Is there a solution?