log into stubborn webpage via Python

I am not very experienced with this kind of thing, and I cannot seem to log into this webpage via Python: https://ravenpack.com/discovery/login/
I have tried solutions from other StackOverflow posts, but nothing seems to work. It could be that it is not possible, or that I just do not know what I am doing; either is plausible.
I have tried:
import requests
LOGIN_URL = 'https://ravenpack.com/discovery/login/'
DATA_URL = 'https://ravenpack.com/discovery/news_analytics_story/FFF4BFD4F4D4FF803852899BD1F02077/'
payload = {
    'username': 'uname',
    'password': 'pword'
}
with requests.Session() as s:
    s.post(LOGIN_URL, data=payload)
    r = s.get(DATA_URL)
    print(r.text)
this:
from twill.commands import *
go('https://ravenpack.com/discovery/login/')
fv("2", "username", "uname")
fv("2", "password", "pword")
submit('1')
this:
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)
br.open("https://ravenpack.com/discovery/login/") #Url that contains signin form
br.select_form()
br['username'] = "uname" #see what is the name of txt input in form
br['password'] = 'pword'
result = br.submit().read()
f=file('s.html', 'w')
f.write(result)
f.close()
and this:
from robobrowser import RoboBrowser
browser = RoboBrowser(history=True,user_agent='Mozilla/5.0')
login_url = 'https://ravenpack.com/discovery/login/'
browser.open(login_url)
form = browser.get_form(id='login_form')
form['username'].value = 'uname'
form['password'].value = 'pword'
browser.submit_form(form)
Any help is appreciated.

import requests
from requests.auth import HTTPBasicAuth
LOGIN_URL = 'https://ravenpack.com/discovery/login/'
DATA_URL = 'https://ravenpack.com/discovery/news_analytics_story/FFF4BFD4F4D4FF803852899BD1F02077/'
username = 'user'
password = 'password'
with requests.Session() as s:
    # send the credentials as HTTP Basic auth instead of form fields
    s.post(LOGIN_URL, auth=HTTPBasicAuth(username, password))
    r = s.get(DATA_URL)
    print(r.text)

Related

PYTHON: requests and response 401

I have a small problem with authentication. I am writing a script that gets a login and password from the user (keyboard input) and then fetches some data from a website (http, not https), but every time I run the script the response is 401. I read some similar posts on Stack Overflow and tried these solutions:
Solution 1
from base64 import b64encode
from http.client import HTTPConnection
c = HTTPConnection("somewebsite")
userAndPass = b64encode(b"username:password").decode("ascii")
headers = { 'Authorization' : 'Basic %s' % userAndPass }
c.request('GET', '/', headers=headers)
res = c.getresponse()
data = res.read()
Solution 2
with requests.Session() as c:
    url = 'somewebsite'
    USERNAME = 'username'
    PASSWORD = 'password'
    c.get(url)
    login_data = dict(username = USERNAME, password = PASSWORD)
    c.post(url,data = login_data)
    page = c.get('somewebsite', headers = {"Referer": "somewebsite"})
    print(page)
Solution 3
www = 'somewebsite'
value ={'filter':'somefilter'}
data = urllib.parse.urlencode(value)
data=data.encode('utf-8')
req = urllib.request.Request(www,data)
resp = urllib.request.urlopen(req)
respData = resp.read()
print(respData)
x = urllib.request.urlopen(www,"username","password")
print(x.read())
I don't know how to solve this problem. Can somebody give me a link or a tip?

Have you tried the Basic Authentication example from requests?
>>> from requests.auth import HTTPBasicAuth
>>> requests.get('https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass'))
<Response [200]>

What type of authentication does the website use?
This is an official Basic Auth example (http://docs.python-requests.org/en/master/user/advanced/#http-verbs):
from requests.auth import HTTPBasicAuth
auth = HTTPBasicAuth('fake@example.com', 'not_a_real_password')
r = requests.post(url=url, data=body, auth=auth)
print(r.status_code)
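For anyone wondering what the Basic Auth answers above actually put on the wire, here is a minimal sketch using only the standard library. The helper name `basic_auth_header` and the URL `http://example.com/` are placeholders of mine, not something from the question; the request is built but never sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Build (but do not send) a request that carries the header.
# "http://example.com/" is a placeholder URL.
req = urllib.request.Request("http://example.com/")
req.add_header("Authorization", basic_auth_header("user", "pass"))
print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNz
```

If the server answers 401 even with this header, it may not be using Basic auth at all (for example a form login), which is probably worth checking first.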

To use an API that requires authentication, we need a token_id or app_id that grants access for our request. Below is an example of how to build the URL and read the response:
import requests
city = input()
api_call = "http://api.openweathermap.org/data/2.5/weather?"
app_id = "892d5406f4811786e2b80a823c78f466"
req_url = api_call + "q=" + city + "&appid=" + app_id
response = requests.get(req_url)
data = response.json()
if data["cod"] == 200:
    hum = data["main"]["humidity"]
    print("Humidity is % d " % (hum))
else:
    print("Error occurred : ", data["cod"], data["message"])
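One caveat with the string concatenation above: a city name containing spaces or other special characters ends up unescaped in the URL. A small sketch of building the same query string with `urllib.parse.urlencode` follows; the function name `build_weather_url` and the `YOUR_APP_ID` value are placeholders I made up for illustration.

```python
from urllib.parse import urlencode

API_CALL = "http://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, app_id):
    """Build the request URL with properly escaped query parameters."""
    return API_CALL + "?" + urlencode({"q": city, "appid": app_id})


print(build_weather_url("New York", "YOUR_APP_ID"))
# http://api.openweathermap.org/data/2.5/weather?q=New+York&appid=YOUR_APP_ID
```

With requests you can get the same effect by passing the dictionary as `params=` to `requests.get` instead of building the URL by hand.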

Authentication results in 404 code

There is a website I need to scrape, but first I need to log in.
There seem to be three things I need to get in: the username, the password, and the authenticity token. I know the username and password, but I am not sure how to access the token.
This is what I have tried:
import requests
from lxml import html
login_url = "https://urs.earthdata.nasa.gov/home"
session_requests = requests.session()
result = session_requests.get(login_url)
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='authenticity_token']/@value")))[0]
payload = {"username": "my_name",
"password": "my_password",
"authenticity_token": authenticity_token}
result = session_requests.post(
login_url,
data = payload,
headers = dict(referer=login_url)
)
print (result)
This results in :
<Response [404]>
My name and password are entered correctly, so it must be the token that is going wrong. I think the problem is this line:
authenticity_token = list(set(tree.xpath("//input[@name='authenticity_token']/@value")))[0]
or this line:
payload = {"username": "my_name",
"password": "my_password",
"authenticity_token": authenticity_token}
Looking at the page source, I noticed there is an authenticity_token, a csrf-token, and a csrf-param. So it is possible these are in the wrong order, but I tried all the combinations.
EDIT:
Here is a Beautiful Soup approach that results in a 404 again.
s = requests.session()
response = s.get(login_url)
soup = BeautifulSoup(response.text, "lxml")
for n in soup('input'):
    if n['name'] == 'authenticity_token':
        token = n['value']
    if n['name'] == 'utf8':
        utf8 = n['value']
        break
auth = {
    'username': 'my_username',
    'password': 'my_password',
    'authenticity_token': token,
    'utf8': utf8
}
s.post(login_url, data=auth)

If you inspect the page, you'll notice that the form's action value is '/login', so you have to submit your data to 'https://urs.earthdata.nasa.gov/login'.
import requests
from bs4 import BeautifulSoup
login_url = "https://urs.earthdata.nasa.gov/login"
home_url = "https://urs.earthdata.nasa.gov/home"
s = requests.session()
soup = BeautifulSoup(s.get(home_url).text, "lxml")
# collect every input (including hidden tokens), then fill in the credentials
data = {i['name']:i.get('value', '') for i in soup.find_all('input')}
data['username'] = 'my_username'
data['password'] = 'my_password'
result = s.post(login_url, data=data)
print(result)
<Response [200]>
A quick example with selenium:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
url = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
driver.get(url)
# recent Selenium releases dropped the find_element_by_* helpers
driver.find_element(By.NAME, 'username').send_keys('my_username')
driver.find_element(By.NAME, 'password').send_keys('my_password')
driver.find_element(By.ID, 'login').submit()
html = driver.page_source
driver.quit()
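The point about the form's action can also be checked offline. Below is a rough sketch, standard library only, that reads the first form on a page, resolves its action against the page URL, and collects the input fields. The names `FormParser`, `login_target`, and `SAMPLE` are mine, and the sample HTML is a made-up stand-in for the real login page, not its actual markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormParser(HTMLParser):
    """Collect the first form's action and the name/value of its inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def login_target(page_url, html):
    """Return (absolute URL to POST to, dict of form fields)."""
    parser = FormParser()
    parser.feed(html)
    return urljoin(page_url, parser.action or ""), parser.fields


# Made-up markup standing in for the real page.
SAMPLE = """
<form action="/login" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""
url, fields = login_target("https://urs.earthdata.nasa.gov/home", SAMPLE)
print(url)     # https://urs.earthdata.nasa.gov/login
print(fields)  # {'authenticity_token': 'abc123', 'username': '', 'password': ''}
```

Posting to the resolved URL rather than the page you fetched is the difference between the 404 in the question and the 200 in the answer.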

Python - Login to site and post

I want to log in to a site and then post to shoutbox.php. I tried this:
import urllib
import urllib2
login_data=urllib.urlencode({'username':'daniixxl','password':'steaua','submit':'Login'}) # replace username and password with filed name
op = urllib.urlopen('http://myxz.org/takelogin.php',login_data)
print op.read(100)
url = 'http://myxz.org/shoutbox.php'
data = urllib.urlencode({'shbox_text' : 'joe',
'sent' : 'yes'})
req2 = urllib2.Request(url, login_data)
print data
The problem: nothing gets posted to shoutbox.php.

from requests import session
mesaj_postat = {'yupii iar'}
logare = { 'username':'daniixxl',
           'password':'steaua'
}
with session() as sesiune:
    resp = sesiune.post('http://myxz.org/takelogin.php',data=logare)
    if "statusdetails" in resp.text:
        print("Login successful")
    else:
        print("Login failed")
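The snippet above only logs in. A possible way to do the second step is sketched below: post the message with the same session, so the login cookies are sent along. The function `login_and_post` is a hypothetical helper of mine, and I am assuming the shoutbox accepts the `shbox_text` and `sent` fields from the question; I have not tested this against the site.

```python
def login_and_post(session, login_url, post_url, credentials, message_fields):
    """Log in, then post with the same session so its cookies are reused."""
    login_resp = session.post(login_url, data=credentials)
    login_resp.raise_for_status()
    post_resp = session.post(post_url, data=message_fields)
    post_resp.raise_for_status()
    return post_resp


# Possible usage with requests (placeholder credentials, not run here):
#
#   import requests
#   with requests.Session() as s:
#       resp = login_and_post(
#           s,
#           'http://myxz.org/takelogin.php',
#           'http://myxz.org/shoutbox.php',
#           {'username': 'my_username', 'password': 'my_password'},
#           {'shbox_text': 'joe', 'sent': 'yes'},
#       )
#       print(resp.status_code)
```

Note that a 200 status alone does not prove the login worked, so checking the response text as in the snippet above is still a good idea.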

login to phpBB programmatically

I am trying to log in to a phpBB forum. However, I cannot figure out what is wrong with the code.
import requests
forum = "https://adblockplus.org/forum/"
headers = {'User-Agent': 'Mozilla/5.0'}
payload = {'username': 'username', 'password': 'password'}
session = requests.Session()
r = session.post(forum + "ucp.php?mode=login", headers=headers, data=payload)
sidStart = r.text.find("sid")+4
sid = r.text[sidStart:sidStart+32]
parameters = {'mode': 'login', 'sid': sid}
r = session.post(forum, params=parameters)
if "Logout" in r.text:
    print("We are in")
else:
    print(r.text)
    print(r)
It just always ends up not logged in.

import requests
forum = "https://adblockplus.org/forum/"
headers = {'User-Agent': 'Mozilla/5.0'}
payload = {'username': 'username', 'password': 'password', 'redirect':'index.php', 'sid':'', 'login':'Login'}
session = requests.Session()
r = session.post(forum + "ucp.php?mode=login", headers=headers, data=payload)
print(r.text)
I made a few small changes, adding redirect, sid, and login to the payload, and it seems to work. I am not sure which one helped; I'll leave figuring that out to you.

how to pass search key and get result through bs4

import bs4
import requests
from urllib.parse import urljoin

def get_main_page_url(urlparameter, strDestPath, strMD5):
    # urlparameter is the search page, e.g. "https://malwr.com/analysis/search/"
    base_url = 'https://malwr.com/'
    url = 'https://malwr.com/account/login/'
    username = 'myname'
    password = 'pswd'
    session = requests.Session()
    # getting csrf value
    response = session.get(url)
    soup = bs4.BeautifulSoup(response.content)
    form = soup.form
    csrf = form.find('input', attrs={'name': 'csrfmiddlewaretoken'}).get('value')
    ## csrf1 = form.find('input', attrs ={'name': 'search'}).get('value')
    # logging in
    data = {
        'username': username,
        'password': password,
        'csrfmiddlewaretoken': csrf
    }
    session.post(url, data=data)
    # getting analysis data
    response = session.get(urlparameter)
    soup = bs4.BeautifulSoup(response.content)
    form = soup.form
    csrf = form.find('input', attrs={'name': 'csrfmiddlewaretoken'}).get('value')
    ## csrf1 = form.find('input', attrs ={'name': 'search'}).get('value')
    data = {
        'search': strMD5,
        'csrfmiddlewaretoken': csrf
    }
    session.post(urlparameter, data = data)
    response = session.get(urlparameter)
    soup = bs4.BeautifulSoup(response.content)
    print(soup)
    if(None != soup.find('section', id='file').find('table')('tr')[-1].a):
        link = soup.find('section', id='file').find('table')('tr')[-1].a.get('href')
        link = urljoin(base_url, link)
        webFile = session.get(link)
        filename =link.split('/')[-2]
        # 'arg' is not defined in this snippet (possibly strDestPath was meant)
        filename = arg + filename
        localFile = open(filename, 'wb')
        localFile.write(webFile.content)
        webFile.close()
        localFile.close()
I am able to log in by fetching the CSRF token. I then try to send an MD5 to the search on malwr.com, but I cannot get the page that shows the search results for that MD5.
I want to search for the MD5 that is passed along with the CSRF token.
Please let me know what is wrong with the code.

You've done almost everything correctly, except that you need to pass the result of the POST request to BeautifulSoup. Replace:
session.post(urlparameter, data = data)
response = session.get(urlparameter)
with:
response = session.post(urlparameter, data=data)
Worked for me (I had an account at malwr).
