I have the following details about a POST request
1) The URL: "http://kuexams.org/get_results"
2) The request body: "htno=001111505&ecode=kuBA3_Supply_Dec_2013".
I got these by analyzing HTTP traffic. I then found a site where you specify these values and it returns the response: https://requestable.pieterhordijk.com/sSd7o
I need to know how to make something similar to what that site does: just POST the request body to the URL and return the data for parsing.
P.S.: I've tried multiple methods of POSTing to this site: http://kuexams.org/results/3GKZ-D_QBHLWXrg7lZ2IGoKBI7lGfpSK37GNoykJ8k5UerNGYn21FN6w_R5XZ8IQVUHRb8ZYVwq-zN4BhIjusQ,,/ugresults/ug
Here is what I tried. It uses a different approach but fails miserably.
import mechanize
import cookielib
import re

# Browser with a cookie jar and permissive handlers
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=3)
br.addheaders = [('User-agent', 'Firefox')]

# Fetch the results page and scrape the captcha code out of the HTML
r = br.open('http://kuexams.org/results/06JZ4SYmTV97s4oROGuLYglPFH3XxJKAunIilJkDBV0gBxSU6YVJ_kXRL0UZb3cIjz9aFdnkYaE-T_S3ubaXPg,,/ugresults/ug')
scrape = r.read()
pattern = re.compile('=5&code=(.{5})\'')
img = pattern.findall(scrape)
print(img)
imgstr = img[0]
print(imgstr)

# Fill in and submit the results form
br.select_form("appForm")
br.form["htno"] = '001111441'
br.form["entered_captcha"] = imgstr
response = br.submit()
print response.read()
br.back()
Looking at that POST request, it doesn't use any cookie or captcha details; the captcha is implemented only client side. So there's no need to store a cookie or fetch the captcha, and this code will work for you. Use requests: it's simple and easy to use.
import requests

url = 'http://kuexams.org/get_results'
payload = {'htno': '001111441', 'ecode': 'kuBA2_Supply_Dec_2013'}
headers = {"User-Agent": "Some Cool Thing"}
r = requests.post(url, headers=headers, data=payload)
print r.content
Note that the server does not accept the default User-Agent sent by requests, so I added a custom one; without it the server won't accept the POST request and may return a Forbidden error.
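To then "return the data for parsing", here is a minimal sketch building on the snippet above; it assumes the response body is HTML (inspect r.content first and adjust the parsing to the actual markup):

import requests
from bs4 import BeautifulSoup

url = 'http://kuexams.org/get_results'
payload = {'htno': '001111505', 'ecode': 'kuBA3_Supply_Dec_2013'}
headers = {"User-Agent": "Some Cool Thing"}

r = requests.post(url, headers=headers, data=payload)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page

# Parse whatever came back; the exact markup depends on the site,
# so check r.content and adjust the selectors accordingly
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.get_text())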
I need to download the content of a web page using Python.
What I need is the TLE of a specific satellite from the Space-Track.org website.
An example of the URL I need to scrape is the following:
https://www.space-track.org/basicspacedata/query/class/gp/NORAD_CAT_ID/44235/format/tle/emptyresult/show
Below is the unsuccessful code I wrote/copied:
import requests
from bs4 import BeautifulSoup

url = 'https://www.space-track.org/basicspacedata/query/class/gp/NORAD_CAT_ID/44235/format/tle/emptyresult/show'
res = requests.post(url)
html_page = res.content

soup = BeautifulSoup(html_page, 'html.parser')
text = soup.find_all(text=True)
print(text)
requests.post(url) returns <Response [204]> and I can't access the content of the webpage.
Could this happen because of the required login?
I must admit that I am not experienced with Python and don't have the knowledge to do this myself.
What I can do is manipulate text files, and from the DevTools page I can get the HTML and extract the text from it; but how can I do this programmatically?
To access the URL you mentioned, you need username and password authorization. To do this (customize it to your needs):
import mechanize
import cookielib  # http.cookiejar in Python 3

# Browser with a cookie jar so the login session persists
cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)

br.open("https://id.arduino.cc/auth/login/")
br.select_form(nr=0)  # the login form is the first form on the page
br.form['username'] = 'username'
br.form['password'] = 'password'
br.submit()
print br.response().read()
I don't have access to this API, so take my advice with a grain of salt, but you should also try using requests.get instead of requests.post.
Why? Because requests.post POSTs data to the server, while requests.get GETs data from the server. GET and POST are known as HTTP methods; to learn more about them, see https://www.tutorialspoint.com/http/http_methods.htm. Since web browsers use GET to fetch pages, you should give that a try.
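If the login requirement is the issue, the two suggestions combine naturally: authenticate once with a Session, then GET the query URL. A minimal sketch, assuming Space-Track's login endpoint and field names (https://www.space-track.org/ajaxauth/login with identity and password; verify these against their API docs):

import requests

LOGIN_URL = 'https://www.space-track.org/ajaxauth/login'  # assumed endpoint
QUERY_URL = ('https://www.space-track.org/basicspacedata/query/'
             'class/gp/NORAD_CAT_ID/44235/format/tle/emptyresult/show')

# A Session keeps the login cookie across requests
with requests.Session() as session:
    resp = session.post(LOGIN_URL, data={'identity': 'your_username',
                                         'password': 'your_password'})
    resp.raise_for_status()

    # Now GET the query URL; the TLE comes back as plain text,
    # so there is nothing for BeautifulSoup to do here
    tle = session.get(QUERY_URL)
    tle.raise_for_status()
    print(tle.text)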
I'm very new to Python, and I'm trying to scrape a webpage that requires a login, using BeautifulSoup.
So far I have
import mechanize
import cookielib
import requests
from bs4 import BeautifulSoup
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.open('URL')
#login form
br.select_form(nr=2)
br['email'] = 'EMAIL'
br['pass'] = 'PASS'
br.submit()
soup = BeautifulSoup(br.response().read(), "lxml")
with open("output1.html", "w") as file:
    file.write(str(soup))
(With "URL" "EMAIL" and "PASS" being the website, my email and password.)
Still the page I get in output1.html is the logged out page, rather than what you would see after logging in?
How can I make it so it logs in with the details and returns what's on the page after log in?
Cheers for any help!
Let me suggest another way to obtain the desired page.
It may be a little easier to troubleshoot.
First, log in manually with any browser while the Developer Tools' Network tab is open. After you send the login credentials, you will see a line with a POST request. Open that request, and on the right side you will find the "form data" information.
Use this code to send login data and get response:
from bs4 import BeautifulSoup
import requests

session = requests.Session()
url = "your url"

# First GET the login page itself
req = session.get(url)
soup = BeautifulSoup(req.text, "lxml")
# You can collect some useful data here (like a csrf code or some token)

# Fill in the form data here, using the field names you saw in DevTools
params = {'login': 'your login',
          'password': 'your password'}
req = session.post(url, data=params)  # send the form data with the POST
I hope this code will be helpful.
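If the form does include a hidden CSRF token, you have to submit it along with the credentials. A hedged sketch of that step (the field name csrf_token is hypothetical; check the form's HTML for the real one):

from bs4 import BeautifulSoup
import requests

session = requests.Session()
url = "your url"

req = session.get(url)
soup = BeautifulSoup(req.text, "lxml")

# Pull the hidden token out of the login form; the input name varies by site
token_input = soup.find("input", {"name": "csrf_token"})  # hypothetical name
params = {'login': 'your login',
          'password': 'your password'}
if token_input is not None:
    params['csrf_token'] = token_input['value']

req = session.post(url, data=params)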
I know there are a lot of questions about this topic, and I have tried most of them.
My goal is to get the article from this page and use it in GAE (Google App Engine).
If I try to log in, it redirects to a long URL; after I log in there, it redirects back to the article.
First I tried urllib2, as mentioned in "how to login to a website with python and mechanize", and it didn't work.
Then I took the SelectLoginForm and login functions from https://github.com/cdhigh/KindleEar/blob/master/books/base.py, and that didn't work either.
Selenium won't work because I'm going to run this on GAE, which I believe can't support Selenium.
I started looking into the mechanize module. My current code is:
# -*- coding: cp1254 -*-
import cookielib
import urllib2
import mechanize
b=mechanize.Browser()
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize.HTTPRefreshProcessor(),max_time=1)
b.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]
b.open('https://hurpass.com/iframe/login?appkey=52da7ef64037f9497f0acb091390051062215&secret=52da7f0c4037f9497f0acb0b1390051084754&domain=sosyal.hurriyet.com.tr&callback_url=http://sosyal.hurriyet.com.tr/Account/AutoLogin?returnUrl=http://sosyal.hurriyet.com.tr/yazar/ahmet-hakan_131/baskanlik-diktatorluk-getirir-diyenleri-girtlaklamak-istiyorum_28116073&referer=http://sosyal.hurriyet.com.tr&user_page=http://sosyal.hurriyet.com.tr/Account/AutoLogin?returnUrl=http://sosyal.hurriyet.com.tr/yazar/ahmet-hakan_131/baskanlik-diktatorluk-getirir-diyenleri-girtlaklamak-istiyorum_28116073&is_mobile=0&session_timeout=0&is_vative=0&email=')
b.select_form(name='frm_login')
b["email"]="tasklak#hotmail.com"
b["password"]="123456"
b.submit(type="submit")
url='http://sosyal.hurriyet.com.tr/yazar/ahmet-hakan_131/baskanlik-diktatorluk-getirir-diyenleri-girtlaklamak-istiyorum_28116073'
last_response = b.response()
http_header_dict = last_response.info().dict
html_string_list = last_response.readlines()
html_data = "".join(html_string_list)
page = br.open(url)
print page.read().decode("UTF-8")
ha = open("test.html", 'w')
ha.write(html_data)
ha.close()
Again, I can't get this working, but if I open the HTML file it creates, it redirects to the logged-in article page. Could this be a mechanize redirection problem, or is it impossible to log in to this page?
Edit, after mihail's answer:
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
user = 'tasklak#hotmail.com'
password = '123456'
xor_password = ''.join(chr(12 ^ ord(c)) for c in password)
auth_url = 'http://auth.hurriyet.com.tr/api/loginuser/{}/?{}'.format(user, xor_password)
url = 'http://www.hurriyet.com.tr/anasayfa/'

# Crude parse of the session id out of the auth response
sessionidd = urllib2.urlopen(auth_url).read().split(',')[1].split('\"')[3]
print sessionidd
opener.open(url + ';ASPSESSIONID=' + sessionidd)
print cj
Edit 2:

sessionidd = urllib2.urlopen(auth_url).read().split(',')[1].split('\"')[3]
print sessionidd
opener.open(url)

# Overwrite the first two cookies' values with the session id
k = 0
for a in cj:
    if k < 2:
        a.value = sessionidd
        k += 1
print cj
First of all, you should know that if there isn't a publicly available API to do all this without scraping, then it's very likely that what you are doing is unwelcome to the website owners, against their terms of service, and could even be illegal and punishable by law, depending on where you live.
Unless mechanize can interpret JavaScript (which I doubt, although I might be wrong), it's not going to be very helpful here. However, skimming through the links you provided with Chrome's DevTools, it looks like you could implement what you want with a few plain urllib2 requests.
For example, when you log in for the first time, you'll see a GET request to the URL http://auth.hurriyet.com.tr/api/loginuser/tasklak#hotmail.com/?%3D%3E%3F89%3A, which includes your username and encoded password and returns some session IDs. The reason mechanize doesn't work is that the password is encoded by JavaScript that is not interpreted when you submit the form from your script.
Looking into the source code of the login form, you'll see that clicking the "Submit" button calls a loginUser() function, in which the password is XOR'ed by the following code:
for (i = 0; i < password.length; ++i) {
    encoded_password += String.fromCharCode(12 ^ password.charCodeAt(i));
}
which you would have to rewrite in Python. So, to receive the initial session IDs, you'd have something like:
import urllib2
user = 'tasklak#hotmail.com'
password = '123456'
xor_password = ''.join(chr(12 ^ ord(c)) for c in password)
auth_url = 'http://auth.hurriyet.com.tr/api/loginuser/{}/?{}'.format(user, xor_password)
print(urllib2.urlopen(auth_url).read())
It looks like you're then going to need to validate the session IDs you received and retrieve session cookies, which you can then use to fetch the full articles, but I will leave that to you.
I have the following code:
import requests
import sys
import urllib2
import re
import mechanize
import cookielib
#import json
#import imp
#print(imp.find_module("requests"))
#print(requests.__file__)
EMAIL = "******"
PASSWORD = "*******"
URL = 'https://www.imleagues.com/Login.aspx'
address = "http://www.imleagues.com/School/Team/Home.aspx?Team=27d6c31187314397b00293fb0cfbc79a"
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.add_password(URL, EMAIL, PASSWORD)
br.open(URL)
#br.open(URL)
#br.select_form(name="aspnetForm")
#br.form["ctl00$ContentPlaceHolder1$inUserName"] = EMAIL
#br.form["ctl00$ContentPlaceHolder1$inPassword"] = PASSWORD
#response = br.submit()
#br= mechanize.Browser()
site = br.open(address)
# Start a session so we can have persistant cookies
#session = requests.Session()
# This is the form data that the page sends when logging in
#login_data = {
# 'ctl00$ContentPlaceHolder1$inUserName': EMAIL,
# 'ctl00$ContentPlaceHolder1$inPassword': PASSWORD,
# 'aspnetFrom': 'http://www.imleagues.com/Members/Home.aspx',
#}
#URL_post = 'http://www.imleagues.com/Members/Home.aspx'
# Authenticate
#r = session.post(URL, data=login_data)
# Try accessing a page that requires you to be logged in
#r = session.get('http://www.imleagues.com/School/Team/Home.aspx?Team=27d6c31187314397b00293fb0cfbc79a')
website = site.read()
f = open('crypt.txt', 'wb')
f.write(website)
#print(website_html)
I am trying to log into this site to monitor game times and make sure they aren't changed on me (again). I've tried various ways to do this, most commented out above, but all of them redirect me back to the login page. Any ideas? Thanks.
As far as I can see, the login button on the given website is not a submit input; login is a JavaScript function:
<a ... href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$btnLogin','')"> ... </a>
and mechanize cannot handle JavaScript. I faced a very similar problem and came up with the solution of using Spynner.
It is a headless web browser, so you can accomplish the same tasks as with mechanize, and it has JavaScript support.
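Alternatively, if you'd rather avoid a browser entirely, an ASP.NET __doPostBack can usually be reproduced by posting the form's hidden state fields yourself. A hedged sketch, reusing the field names from the question's commented-out code (the exact set of hidden fields the site expects is an assumption; copy them from DevTools):

import requests
from bs4 import BeautifulSoup

LOGIN_URL = 'https://www.imleagues.com/Login.aspx'

session = requests.Session()

# GET the login page and harvest the ASP.NET hidden state fields
page = session.get(LOGIN_URL)
soup = BeautifulSoup(page.text, 'html.parser')

def hidden(name):
    tag = soup.find('input', {'name': name})
    return tag['value'] if tag else ''

payload = {
    # __doPostBack('ctl00$ContentPlaceHolder1$btnLogin','') sets these two
    '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$btnLogin',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': hidden('__VIEWSTATE'),
    '__EVENTVALIDATION': hidden('__EVENTVALIDATION'),
    'ctl00$ContentPlaceHolder1$inUserName': 'EMAIL',
    'ctl00$ContentPlaceHolder1$inPassword': 'PASSWORD',
}

r = session.post(LOGIN_URL, data=payload)
print(r.status_code)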
How could I make Python's module mechanize (specifically mechanize.Browser()) to save its current cookies to a human-readable file? Also, how would I go about uploading that cookie to a web page with it?
Thanks
Deusdies, I just figured out a way, with reference to Mykola Kharechko's post:
# To save the cookies as a single human-readable string
cookiefile = open('cookie', 'w')
cookiestr = ''
for c in br._ua_handlers['_cookies'].cookiejar:
    cookiestr += c.name + '=' + c.value + ';'
cookiefile.write(cookiestr)
cookiefile.close()

# Binding these cookies to another Browser (br1), trimming one
# 'name=value;' chunk from the string per pass
while len(cookiestr) != 0:
    br1.set_cookie(cookiestr)
    cookiestr = cookiestr[cookiestr.find(';') + 1:]
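A small aside: since the jar is the one you passed to set_cookiejar(), you can keep your own reference to it instead of reaching into br._ua_handlers. A minimal sketch (example.net is a placeholder):

import mechanize
import cookielib

# Keep your own reference to the jar you pass in, instead of reaching
# into mechanize internals via br._ua_handlers
cj = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.open('http://example.net/')  # placeholder URL

cookiestr = ';'.join('{}={}'.format(c.name, c.value) for c in cj)
print(cookiestr)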
If you want to use the cookie for a web request such as a GET or POST (which mechanize.Browser does not support), you can use the requests library and the cookies as follows
import mechanize, requests

br = mechanize.Browser()
br.open(url)
# assuming the first form is a login form
br.select_form(nr=0)
br.form['login'] = login
br.form['password'] = password
br.submit()

# if successful we have some cookies now
cookies = br._ua_handlers['_cookies'].cookiejar

# convert the cookies into a dict usable by requests
cookie_dict = {}
for c in cookies:
    cookie_dict[c.name] = c.value

# make a request
r = requests.get(anotherUrl, cookies=cookie_dict)
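As an aside, requests also accepts a cookielib CookieJar directly for its cookies argument, so, continuing from the snippet above, the dict conversion can be skipped:

# requests can take the CookieJar itself; no dict conversion needed
r = requests.get(anotherUrl, cookies=cookies)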
The CookieJar class has several subclasses that can save cookies to a file. For browser compatibility use MozillaCookieJar; for a simple human-readable format go with LWPCookieJar, like this (an authentication via HTTP POST):
import urllib
import mechanize

params = {'login': 'mylogin', 'passwd': 'mypasswd'}
data = urllib.urlencode(params)  # form-encode the credentials for the POST body

br = mechanize.Browser()
cj = mechanize.LWPCookieJar("cookies.txt")
br.set_cookiejar(cj)

response = br.open("http://example.net/login", data)
cj.save()  # writes the cookies to cookies.txt in LWP format
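To reuse the saved cookies in a later run, load the file back into a fresh jar before opening any URLs. A minimal sketch (example.net is the same placeholder as above):

import mechanize

# Load the cookies saved by cj.save() above; note that session cookies are
# skipped unless ignore_discard=True is passed to both save() and load()
cj = mechanize.LWPCookieJar("cookies.txt")
cj.load()  # reads cookies.txt, the filename given to the constructor

br = mechanize.Browser()
br.set_cookiejar(cj)
response = br.open("http://example.net/")  # requests now send the saved cookies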