I have a script that needs to find elements in a page's HTML, but when it accesses the main page, this page shows up: https://gyazo.com/84d0e5b7a73c97db5b780f18d0ba3f89
My questions are these:
How can I bypass it?
How can I get cookies via cfscrape.create_scraper() or requests.session()?
My script:
import datetime
import bs4
import cfscrape
# headers was never defined in the original script; a browser-like User-Agent is assumed here
headers = {"User-Agent": "Mozilla/5.0"}
s = cfscrape.create_scraper()
url = input("[" + str(datetime.datetime.now()) + "] [INPUT] > URL # ")
product = s.get(url, headers=headers, allow_redirects=True)
soup = bs4.BeautifulSoup(product.text, "html.parser")
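On the cookie question: both requests.session() and cfscrape.create_scraper() give you a session object (cfscrape's scraper is built on requests.Session), and its cookies attribute holds every cookie the server set during earlier requests. A minimal sketch of reading them back; the helper names are hypothetical, and cfscrape is no longer maintained, so whether it still passes the current Cloudflare check is not guaranteed:

```python
import requests


def cookies_as_dict(session):
    """Return the session's current cookies as a plain {name: value} dict."""
    return session.cookies.get_dict()


def cookie_header(session):
    """Format the session's cookies as one Cookie header string."""
    return "; ".join("%s=%s" % (k, v) for k, v in cookies_as_dict(session).items())


if __name__ == "__main__":
    # Works the same way with s = cfscrape.create_scraper().
    s = requests.Session()
    s.get("https://example.com/")  # cookies set by the server land in s.cookies
    print(cookies_as_dict(s))
    print(cookie_header(s))
```

If you reuse these cookies outside the session, cfscrape's documentation says to send the same User-Agent that obtained them.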
Related
I need to download the content of a web page using Python.
What I need is the TLE of a specific satellite from the Space-Track.org website.
An example of the URL I need to scrape is the following:
https://www.space-track.org/basicspacedata/query/class/gp/NORAD_CAT_ID/44235/format/tle/emptyresult/show
Below is the unsuccessful code I wrote/copied:
import requests
url = 'https://www.space-track.org/basicspacedata/query/class/gp/NORAD_CAT_ID/44235/format/tle/emptyresult/show'
res = requests.post(url)
html_page = res.content
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_page, 'html.parser')
text = soup.find_all(text=True)
print(text)
requests.post(url) returns <Response [204]> (204 No Content), so I can't access the content of the web page.
Could this happen because of the required login?
I must admit that I am not experienced with Python and I don't have the knowledge to do this myself.
What I can do is manipulate text files: from the DevTools page I can get the HTML and extract the text, but how can I do this programmatically?
To access the URL you mentioned, you need to authenticate with a username and password.
To do this (customize to your needs):
import http.cookiejar  # cookielib in Python 2
import mechanize
cj = http.cookiejar.CookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.open("https://id.arduino.cc/auth/login/")  # example login page; replace with the one you need
br.select_form(nr=0)  # the first form on the page
br.form['username'] = 'username'
br.form['password'] = 'password'
br.submit()
print(br.response().read())
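For Space-Track specifically, a requests-only sketch is possible. It assumes the login endpoint https://www.space-track.org/ajaxauth/login with form fields identity and password, as described in Space-Track's API documentation; check the current docs before relying on it. The point is that the login and the query share one Session, so the login cookie is sent with the GET:

```python
import requests

# Assumed endpoint and field names, based on Space-Track's API documentation.
LOGIN_URL = "https://www.space-track.org/ajaxauth/login"
QUERY_TEMPLATE = ("https://www.space-track.org/basicspacedata/query/class/gp/"
                  "NORAD_CAT_ID/{norad_id}/format/tle/emptyresult/show")


def tle_url(norad_id):
    """Build the TLE query URL for one satellite (same form as in the question)."""
    return QUERY_TEMPLATE.format(norad_id=int(norad_id))


def fetch_tle(username, password, norad_id):
    """Log in, then GET the TLE text for norad_id within the same session."""
    with requests.Session() as session:
        login = session.post(LOGIN_URL, data={"identity": username, "password": password})
        login.raise_for_status()  # only catches HTTP errors; inspect login.text too
        # The session keeps the login cookie, so this GET is authenticated.
        resp = session.get(tle_url(norad_id))
        resp.raise_for_status()
        return resp.text  # format/tle is plain text, so no HTML parsing is needed


if __name__ == "__main__":
    print(fetch_tle("USERNAME", "PASSWORD", 44235))
```

Since the TLE format is plain text, resp.text.splitlines() should give you the individual lines without BeautifulSoup.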
I don't have access to this API, so take my advice with a grain of salt, but you should also try using requests.get instead of requests.post.
Why? Because requests.post sends (POSTs) data to the server, while requests.get retrieves (GETs) data from it. GET and POST are HTTP methods; to learn more about them, see https://www.tutorialspoint.com/http/http_methods.htm. Since a web browser uses GET when you open a URL, you should give that a try.
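To see the difference without sending anything, requests can build a request and let you inspect it before it goes out. A small sketch; the URL and the q parameter are placeholders:

```python
import requests

url = "https://example.com/search"

# Build (but do not send) one request of each method to compare them.
get_req = requests.Request("GET", url, params={"q": "tle"}).prepare()
post_req = requests.Request("POST", url, data={"q": "tle"}).prepare()

# GET: parameters travel in the URL's query string and there is no body.
print(get_req.method, get_req.url, get_req.body)
# GET https://example.com/search?q=tle None

# POST: the same parameters travel in a form-encoded request body instead.
print(post_req.method, post_req.url, post_req.body)
# POST https://example.com/search q=tle
print(post_req.headers["Content-Type"])
# application/x-www-form-urlencoded
```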
I am very new to Python and I am trying to extract data from a site.
I am stuck on the very first step: logging into the site.
This is what I have tried:
# Importing libs
import http.cookiejar
import urllib3
from bs4 import BeautifulSoup
import requests
jar = http.cookiejar.CookieJar(policy=None)
pool = urllib3.PoolManager()  # renamed so it no longer shadows the http module
# Setting account details
acc_pwd = {'user_username': 'userABC',
           'user_password': 'ABC123'}
# Enter URL (it needs a scheme such as https://)
quote_page = 'https://example.com'
response = pool.request('GET', quote_page)
soup = BeautifulSoup(response.data, 'html.parser')
print("Data %s" % soup)
r = requests.get(quote_page, cookies=jar)
r = requests.post(quote_page, cookies=jar, data=acc_pwd)
print("##############")
print("RData %s" % r.text)
It only takes me back to the login page.
I'm not sure whether I am entering the details properly.
This generally works for me:
import requests
from requests_ntlm import HttpNtlmAuth
base_url = ''  # URL of the NTLM-protected page
r = requests.get(base_url, auth=HttpNtlmAuth('domain\\username', 'password'))
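NTLM authentication only helps when the server itself asks for Windows (NTLM) credentials. For an HTML login form like the one in the question, two things usually matter: the cookies= argument only sends the jar's cookies and does not fill it with the ones the server returns, so one requests.Session should carry every request; and the post should include every field the form contains, hidden tokens included. A minimal sketch, assuming the form posts back to the same URL (check its action attribute) and using the field names from the question; the helper names are hypothetical:

```python
from html.parser import HTMLParser

import requests


class _InputCollector(HTMLParser):
    """Collect name -> value for every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if attrs.get("name"):
                self.fields[attrs["name"]] = attrs.get("value") or ""


def login_payload(login_page_html, credentials):
    """Start from the form's own fields (hidden tokens included), then add ours."""
    parser = _InputCollector()
    parser.feed(login_page_html)
    payload = dict(parser.fields)
    payload.update(credentials)
    return payload


def login(login_url, credentials):
    """Return a logged-in Session; keep using this same object afterwards."""
    session = requests.Session()
    page = session.get(login_url)  # stores any pre-login cookies on the session
    session.post(login_url, data=login_payload(page.text, credentials))
    return session


if __name__ == "__main__":
    s = login("https://example.com/login",
              {"user_username": "userABC", "user_password": "ABC123"})
    print(s.get("https://example.com/").text)
```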
This question has been addressed in various shapes and flavors, but I have not been able to apply any of the solutions I found online.
I would like to use Python to log into the site: https://app.ninchanese.com/login
and then reach the page: https://app.ninchanese.com/leaderboard/global/1
I have tried various stuff but without success...
Using the POST method:
import requests
oURL = 'https://app.ninchanese.com/login'
oCredentials = dict(email='myemail#hotmail.com', password='mypassword')
oSession = requests.session()
oResponse = oSession.post(oURL, data=oCredentials)
oResponse2 = oSession.get('https://app.ninchanese.com/leaderboard/global/1')
Using the auth parameter of the requests package (HTTP Basic authentication):
import requests
oSession = requests.session()
oResponse = oSession.get('https://app.ninchanese.com/login', auth=('myemail#hotmail.com', 'mypassword'))
oResponse2 = oSession.get('https://app.ninchanese.com/leaderboard/global/1')
Whenever I print oResponse2, I can see that I'm always on the login page, so I am guessing the authentication did not work.
Could you please advise how to achieve this?
You have to send the csrf_token along with your login request:
import requests
import bs4
URL = 'https://app.ninchanese.com/login'
credentials = dict(email='myemail#hotmail.com', password='mypassword')
session = requests.session()
response = session.get(URL)  # load the login form first to get a fresh token
html = bs4.BeautifulSoup(response.text, 'html.parser')
credentials['csrf_token'] = html.find('input', {'name':'csrf_token'})['value']
response = session.post(URL, data=credentials)
response2 = session.get('https://app.ninchanese.com/leaderboard/global/1')
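To check whether the login actually worked instead of printing whole pages, a rough test is whether the response still contains a password field. This is a heuristic assumption (a hypothetical helper, not something the site guarantees), but it matches the symptom in the question of always landing back on the login form:

```python
import bs4


def looks_like_login_page(html):
    """Heuristic: treat any page that contains a password input as the login form."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    return soup.find("input", {"type": "password"}) is not None
```

After the answer's session.post(...), calling looks_like_login_page(response2.text) should return False if the session is authenticated; response2.url is also worth printing, since a redirect back to /login is a common sign of a failed login.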
I am trying to fetch a sample page in Python:
import mechanize

def viewpage(url):
    browser = mechanize.Browser()
    page = browser.open(url)
    source_code = page.read()
    print(source_code)

viewpage('https://sama.com/index.php?req=1')
However, every time it gets redirected to index2.php (by a Location header from the web server), so the code prints the response from index2.php rather than index.php. Is there any way to avoid that?
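You can use urllib2 (urllib.request in Python 3) or requests for more complex stuff.
from urllib.request import urlopen  # urllib2.urlopen in Python 2
response = urlopen("http://google.com")
page_source = response.read()
urllib2 (urllib.request in Python 3) is part of the standard library, while requests is a third-party package.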
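With requests, allow_redirects=False stops the client from following the Location header, so you get index.php's own status, headers and body instead of index2.php's. Because the real site is not assumed reachable, this sketch runs against a small local server standing in for sama.com; the handler, paths and demo strings are hypothetical:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


def fetch_without_redirect(url):
    """Return (status, Location header, body) for url without following redirects."""
    resp = requests.get(url, allow_redirects=False)
    return resp.status_code, resp.headers.get("Location"), resp.text


class _RedirectingHandler(BaseHTTPRequestHandler):
    # Local stand-in: /index.php answers 302 -> /index2.php but still sends a body.
    def do_GET(self):
        if self.path.startswith("/index.php"):
            body = b"content of index.php"
            self.send_response(302)
            self.send_header("Location", "/index2.php")
        else:
            body = b"content of index2.php"
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass


def start_demo_server():
    """Serve _RedirectingHandler on a free local port and return its base URL."""
    server = HTTPServer(("127.0.0.1", 0), _RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return "http://127.0.0.1:%d" % server.server_port


if __name__ == "__main__":
    base = start_demo_server()
    print(fetch_without_redirect(base + "/index.php?req=1"))
    # (302, '/index2.php', 'content of index.php')
    # If you do follow the redirect, the first hop is still kept in .history:
    followed = requests.get(base + "/index.php?req=1")
    print(followed.url, followed.history[0].status_code)
```

Against the real site you would call fetch_without_redirect('https://sama.com/index.php?req=1') directly.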
I want to write a Python (2.7) script that goes to a form on a website (named "form1") and fills in the first input field in that form with the word hello, the second input field with the word Ronald, and the third field with ronaldG54#gmail.com.
Can anyone help me code this, or give me any tips or pointers on how to do it?
Aside from Mechanize and Selenium, which David has mentioned, this can also be achieved with Requests and BeautifulSoup.
To be more precise: use Requests to send requests to the server and retrieve its responses, and use BeautifulSoup to parse the response HTML to find out which parameters to send back to the server.
Here is an example script I wrote that uses both Requests and BeautifulSoup to submit a username and password to log in to Wikipedia:
import requests
from bs4 import BeautifulSoup as bs
def get_login_token(raw_resp):
    soup = bs(raw_resp.text, 'lxml')
    # .get() avoids a KeyError on <input> elements that have no name attribute
    token = [n['value'] for n in soup.find_all('input')
             if n.get('name') == 'wpLoginToken']
    return token[0]

payload = {
    'wpName': 'my_username',
    'wpPassword': 'my_password',
    'wpLoginAttempt': 'Log in',
    #'wpLoginToken': '',
}

with requests.session() as s:
    resp = s.get('http://en.wikipedia.org/w/index.php?title=Special:UserLogin')
    payload['wpLoginToken'] = get_login_token(resp)
    response_post = s.post('http://en.wikipedia.org/w/index.php?title=Special:UserLogin&action=submitlogin&type=login',
                           data=payload)
    response = s.get('http://en.wikipedia.org/wiki/Special:Watchlist')
Update:
For your specific case, here is the working code:
import requests
from bs4 import BeautifulSoup as bs
def get_session_id(raw_resp):
    soup = bs(raw_resp.text, 'lxml')
    token = soup.find_all('input', {'name': 'survey_session_id'})[0]['value']
    return token

payload = {
    'f213054909': 'o213118718',  # 21st checkbox
    'f213054910': 'Ronald',  # first input-field
    'f213054911': 'ronaldG54#gmail.com',
}
url = r'https://app.e2ma.net/app2/survey/39047/213008231/f2e46b57c8/?v=a'

with requests.session() as s:
    resp = s.get(url)
    payload['survey_session_id'] = get_session_id(resp)
    response_post = s.post(url, data=payload)
    print(response_post.text)
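The working code above hard-codes field names read off that particular survey. For the general case in the question (a form named "form1" whose first three text inputs get hello, Ronald and the e-mail address), the names can be read from the page itself. A minimal standard-library sketch in Python 3, assuming the fields are plain <input> elements of type text or email (not <textarea>), that "first" means document order, and that hidden inputs such as session tokens must be posted back unchanged; the class and function names are hypothetical:

```python
from html.parser import HTMLParser


class FormFields(HTMLParser):
    """Record the action, hidden fields and ordered text-input names of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.inside = False
        self.action = None
        self.hidden = {}
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.inside = True
            self.action = attrs.get("action")
        elif tag == "input" and self.inside and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            if kind == "hidden":
                self.hidden[attrs["name"]] = attrs.get("value") or ""
            elif kind in ("text", "email"):
                self.names.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside = False


def fill_form(html, form_name, values):
    """Return (action, payload) with values assigned to the form's first text inputs."""
    parser = FormFields(form_name)
    parser.feed(html)
    if len(parser.names) < len(values):
        raise ValueError("form %r has only %d text inputs" % (form_name, len(parser.names)))
    payload = dict(parser.hidden)
    payload.update(zip(parser.names, values))
    return parser.action, payload
```

You would then resolve the action against the page URL with urllib.parse.urljoin and send the payload with session.post(...) from the same requests session that fetched the page.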
Take a look at Mechanize and Selenium. Both are excellent pieces of software that would allow you to automate filling and submitting a form, among other browser tasks.