I have the same use case as here. I would like to access a GitLab page to get its HTML content (private repo), but it always redirects me to the sign-in page even though I have already passed authentication, following the approach described here.
Below is my code:
import requests

LOGIN_URL = 'https://gitlab.devtools.com/users/auth/ldapmain/callback'

session = requests.Session()
data = {'username': username,         # credentials defined elsewhere
        'password': password,
        'authenticity_token': token}  # token scraped from the sign-in page
r = session.post(LOGIN_URL, data=data)
print(r.status_code)

url = "https://gitlab.devtools.com/Sandbox/testing/merge_requests/2"
html = session.get(url)
print(html.url)
Any idea on this? Am I missing anything?
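One common cause of this behaviour is a missing or stale CSRF token: GitLab's sign-in form carries a hidden authenticity_token field that has to be scraped from the sign-in page within the same session before the credentials are posted. A minimal sketch of that extraction step (the HTML below is a stand-in for the real sign-in page; only the field name is taken from the code above):

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML returned by GET /users/sign_in in the same session
login_page = '''
<form action="/users/auth/ldapmain/callback" method="post">
  <input type="hidden" name="authenticity_token" value="abc123xyz" />
  <input type="text" name="username" />
  <input type="password" name="password" />
</form>
'''

soup = BeautifulSoup(login_page, 'html.parser')
# Pull the hidden CSRF token out of the form
token = soup.find('input', {'name': 'authenticity_token'})['value']
print(token)  # abc123xyz
```

The token must come from a GET made with the same Session object that later sends the POST, otherwise the session cookie and token will not match.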
I'm trying to scrape the site's data, but I'm running into an issue while logging in: when I log in with my username and password, the login does not succeed.
I think there is a problem with the token; every time I try to log in, a new token is generated (visible in the request headers in the browser console).
import requests
from bs4 import BeautifulSoup

url = "http://indiatechnoborate.tymra.com"

with requests.Session() as s:
    first = s.get(url)
    start_soup = BeautifulSoup(first.content, 'lxml')
    retVal = start_soup.find("input", {"name": "return"}).get('value')
    print(retVal)
    formdata = start_soup.find("form", {"id": "form-login"})
    dynval = formdata.find_all('input', {"type": "hidden"})[1].get('name')
    print(dynval)
    dictdata = {"username": "username", "password": "password",
                "return": retVal, dynval: "1"}
    pr = {"task": "user.login"}
    sec = s.post("http://indiatechnoborate.tymra.com/component/users/",
                 data=dictdata, params=pr)
    print("------------------------------------------")
    print(sec.status_code, sec.url)
    print(sec.text)
I want to log in to the site and retrieve the data once login succeeds.
Try replacing this line:
dictdata={"username":"username", "password":"password","return":retVal,dynval:"1"}
with this one:
dictdata={"username":"username", "password":"password","return":retVal + "==",dynval:"1"}
The appended "==" restores the Base64 padding that gets stripped from the scraped return value. Hope this helps.
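For context, a Base64-encoded value must have a length that is a multiple of 4, and decoding fails when the trailing "=" padding has been stripped; appending "==" compensates for that. A quick stdlib illustration:

```python
import base64

value = 'aGVsbG8'  # 'hello' base64-encoded, with its trailing '=' padding stripped
# Re-append padding so the length is a multiple of 4
padded = value + '=' * (-len(value) % 4)
print(base64.b64decode(padded).decode())  # hello
```

Computing the padding as shown is safer than hard-coding "==", since the number of missing "=" characters depends on the length of the scraped value.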
Try using HTTP Basic authentication instead of passing the credentials in the payload:
import requests
from requests.auth import HTTPBasicAuth

USERNAME = "<USERNAME>"
PASSWORD = "<PASSWORD>"

BASIC_AUTH = HTTPBasicAuth(USERNAME, PASSWORD)
LOGIN_URL = "http://indiatechnoborate.tymra.com"

response = requests.get(LOGIN_URL, auth=BASIC_AUTH)
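Note that this only helps if the site actually uses HTTP Basic auth rather than a form-based login: all HTTPBasicAuth does is attach an Authorization header built from base64("user:pass"). A small sketch of the header it produces, using placeholder credentials:

```python
import base64

USERNAME = 'user'
PASSWORD = 'pass'
# HTTPBasicAuth attaches exactly this header to the outgoing request
auth_header = 'Basic ' + base64.b64encode(f'{USERNAME}:{PASSWORD}'.encode()).decode()
print(auth_header)  # Basic dXNlcjpwYXNz
```

If the server ignores this header and still serves the login form, the site expects a form POST and Basic auth will not work.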
I'm trying to write code to collect resumes from the "indeed.com" website.
In order to download resumes from "indeed.com", you have to log in with your account.
My problem is that after posting the data I get response [200], which indicates a successful POST, yet the login still fails.
Here is my code :
import requests
from bs4 import BeautifulSoup

page = requests.get('https://secure.indeed.com/account/login')
soup = BeautifulSoup(page.content, 'html.parser')
row_text = soup.text

# Pull the hidden tokens out of the embedded JSON by string slicing
surftok = row_text[row_text.find('"surftok":') + 11:row_text.find('","tmpl":')]
formtok = row_text[row_text.find('"tk":') + 6:row_text.find('","variation":')]
logintok = row_text[row_text.find('"loginTk":') + 11:row_text.find('","debugBarLink":')]
cfb = int(row_text[row_text.find('"cfb":') + 6:row_text.find(',"pvr":')])
pvr = int(row_text[row_text.find('"pvr":') + 6:row_text.find(',"obo":')])
hl = row_text[row_text.find('"hl":') + 6:row_text.find('","co":')]

data = {
    'action': 'login',
    '__email': 'myEmail',
    '__password': 'myPassword',
    'remember': '1',
    'hl': hl,
    'cfb': cfb,
    'pvr': pvr,
    'form_tk': formtok,
    'surftok': surftok,
    'login_tk': logintok
}

response = requests.post("https://secure.indeed.com/", data=data)
print(response)
print('myEmail' in response.text)
It shows me response [200], but when I search for my email in the response page to make sure the login succeeded, I don't find it. It seems the login failed for a reason I don't know.
Send headers along with your POST request as well; copy them from the request headers shown in your browser's developer tools.
headers = {'user-agent': 'Chrome'}
response = requests.post("https://secure.indeed.com/", headers=headers, data=data)
Some websites use JavaScript redirection, and "indeed.com" is one of them. Unfortunately, Python requests does not execute JavaScript, so it cannot follow such redirects. In those situations we may use Selenium.
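One way to confirm this is to inspect the body of the 200 response: with a JavaScript redirect it contains a small script that rewrites window.location instead of the expected page content. A rough check along these lines (the response body here is a made-up example, not what indeed.com actually returns):

```python
import re

# Made-up example of a 200 response body that redirects via JavaScript
body = '<html><script>window.location.replace("https://secure.indeed.com/account/login");</script></html>'

# Look for assignments to window.location or calls to window.location.replace(...)
match = re.search(r'window\.location(?:\.replace)?\s*[(=]\s*["\']([^"\']+)', body)
if match:
    print('JavaScript redirect to:', match.group(1))
```

If a pattern like this matches, requests alone will not get you past the login, and a browser-driving tool such as Selenium is the usual fallback.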
I've tried everything to log in to this site with sessions and Python requests, but it doesn't seem to work; it keeps redirecting me to the login page when I try to access the protected URL (status_code = 302).
import requests
from lxml import html

url = "https://beatyourcourse.com/school_required#"
protected_url = "https://beatyourcourse.com/flyering"

session = requests.Session()

response = session.get(url)
tree = html.fromstring(response.text)
# XPath attribute names take an @ prefix
token = tree.xpath("//input[@name='authenticity_token']/@value")[0]

payload = {
    'user[email]': '****',
    'user[password]': '****',
    'authenticity_token': token
}

response = session.post(url, data=payload)  # logging in
response = session.get(protected_url)  # visiting protected url
print(response.url)  # prints "https://beatyourcourse.com/school_required#" (redirected to login page)
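It is worth sanity-checking the token extraction in isolation; in XPath, attribute names take an @ prefix. A self-contained check against stand-in HTML for the sign-in form (the field name is the one the site uses, the value is invented):

```python
from lxml import html

# Stand-in for the sign-in page HTML
page = '''
<form action="/users/sign_in" method="post">
  <input type="hidden" name="authenticity_token" value="tok-42" />
</form>
'''

tree = html.fromstring(page)
# Attribute tests and attribute selection both use @, e.g. [@name=...] and /@value
token = tree.xpath("//input[@name='authenticity_token']/@value")[0]
print(token)  # tok-42
```

If the XPath is wrong the result list is empty, indexing it raises IndexError, and any POST made without a valid token will bounce back to the login page.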
I'm very new to Python, and I'm trying to scrape a webpage using BeautifulSoup; the page requires a login.
So far I have
import mechanize
import cookielib
from bs4 import BeautifulSoup

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

br.open('URL')

# login form
br.select_form(nr=2)
br['email'] = 'EMAIL'
br['pass'] = 'PASS'
br.submit()

soup = BeautifulSoup(br.response().read(), "lxml")

with open("output1.html", "w") as f:
    f.write(str(soup))
(With "URL" "EMAIL" and "PASS" being the website, my email and password.)
Still, the page I get in output1.html is the logged-out page, rather than what you would see after logging in.
How can I make it so it logs in with the details and returns what's on the page after log in?
Cheers for any help!
Let me suggest another way to obtain the desired page; it may be a little easier to troubleshoot.
First, log in manually with any browser while the developer tools' Network tab is open. After you submit your login credentials, you will see a line with a POST request. Open that request, and on the right-hand side you will find the "Form Data" section.
Use this code to send the login data and get the response:
from bs4 import BeautifulSoup
import requests

session = requests.Session()
url = "your url"

req = session.get(url)
soup = BeautifulSoup(req.text, "lxml")
# You can collect some useful data here (like a csrf code or some token)

# fill in the form data here
params = {'login': 'your login',
          'password': 'your password'}
req = session.post(url, data=params)
I hope this code will be helpful.
I am trying to write a Python (2.7) script that goes to a form on a website (named "form1") and fills in the first input field in that form with the word hello, the second input field with the word Ronald, and the third field with ronaldG54#gmail.com.
Can anyone help me code or give me any tips or pointers on how to do this ?
Aside from Mechanize and Selenium, which David has mentioned, this can also be achieved with Requests and BeautifulSoup.
To be clear: use Requests to send requests to and retrieve responses from the server, and use BeautifulSoup to parse the response HTML to work out what parameters to send to the server.
Here is an example script I wrote that uses both Requests and BeautifulSoup to submit a username and password to log in to Wikipedia:
import requests
from bs4 import BeautifulSoup as bs

def get_login_token(raw_resp):
    soup = bs(raw_resp.text, 'lxml')
    token = [n['value'] for n in soup.find_all('input')
             if n['name'] == 'wpLoginToken']
    return token[0]

payload = {
    'wpName': 'my_username',
    'wpPassword': 'my_password',
    'wpLoginAttempt': 'Log in',
    # 'wpLoginToken': '',  # filled in below
}

with requests.session() as s:
    resp = s.get('http://en.wikipedia.org/w/index.php?title=Special:UserLogin')
    payload['wpLoginToken'] = get_login_token(resp)
    response_post = s.post('http://en.wikipedia.org/w/index.php?title=Special:UserLogin&action=submitlogin&type=login',
                           data=payload)
    response = s.get('http://en.wikipedia.org/wiki/Special:Watchlist')
Update:
For your specific case, here is the working code:
import requests
from bs4 import BeautifulSoup as bs

def get_session_id(raw_resp):
    soup = bs(raw_resp.text, 'lxml')
    token = soup.find_all('input', {'name': 'survey_session_id'})[0]['value']
    return token

payload = {
    'f213054909': 'o213118718',  # 21st checkbox
    'f213054910': 'Ronald',  # first input-field
    'f213054911': 'ronaldG54#gmail.com',
}

url = r'https://app.e2ma.net/app2/survey/39047/213008231/f2e46b57c8/?v=a'

with requests.session() as s:
    resp = s.get(url)
    payload['survey_session_id'] = get_session_id(resp)
    response_post = s.post(url, data=payload)
    print(response_post.text)
Take a look at Mechanize and Selenium. Both are excellent pieces of software that would allow you to automate filling and submitting a form, among other browser tasks.