I am using Python requests to get information from the mobile website of the German railway company (https://mobile.bahn.de/bin/mobil/query.exe/dox).
For instance:
import requests
query = {'S':'Stuttgart Hbf', 'Z':'München Hbf'}
rsp = requests.get('https://mobile.bahn.de/bin/mobil/query.exe/dox', params=query)
which in this case gives the correct page.
However, using the following query:
query = {'S':'Cottbus', 'Z':'München Hbf'}
it gives a different response, in which the user is required to choose one of several options (the server is unsure about the starting station, since many station names begin with 'Cottbus').
Now, my question is: given this response, how can I choose one of the given options and then repeat the request without getting this error?
I tried looking at the cookies and using a session instead of a simple GET request, but nothing has worked so far.
I hope you can help me.
Thanks.
You can use BeautifulSoup to parse the response and get the options if there is a <select> element in the response:
import requests
from bs4 import BeautifulSoup
query = {'S': u'Cottbus', 'Z': u'München Hbf'}
rsp = requests.get('https://mobile.bahn.de/bin/mobil/query.exe/dox', params=query)
soup = BeautifulSoup(rsp.content, 'lxml')
# check if the page has a choice dropdown
if soup.find('select'):
    # Get a list of (value, text) tuples that you will need to use
    # in the next request
    options_value = [(option['value'], option.text) for option in soup.find_all('option')]
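To then repeat the request with one of those options, one approach is the following sketch. It assumes the <option> values carry the disambiguated station identifiers that the server accepts back in the S parameter; verify this against a real response before relying on it:

```python
import requests
from bs4 import BeautifulSoup

URL = 'https://mobile.bahn.de/bin/mobil/query.exe/dox'

def extract_options(html):
    """Return (value, text) pairs from the first <select> in the page, or []."""
    soup = BeautifulSoup(html, 'html.parser')
    select = soup.find('select')
    if select is None:
        return []
    return [(opt.get('value'), opt.get_text(strip=True))
            for opt in select.find_all('option')]

def resolve_and_query(query):
    """GET the page; if a disambiguation <select> comes back, retry with
    the first option's value as the start station (assumption: the server
    accepts the option value in the S parameter)."""
    rsp = requests.get(URL, params=query)
    options = extract_options(rsp.content)
    if options:
        rsp = requests.get(URL, params=dict(query, S=options[0][0]))
    return rsp
```

In practice you would present the options to the user (or pick one by matching the text) instead of blindly taking the first.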
I have written a simple script, as practice, to find who has bought the same tracks as me on Bandcamp, ideally to find accounts with similar tastes and thus more of the same music.
The problem is that the fan list on an album/track page is lazy-loaded. Using Python's requests and bs4 I am only getting 60 results out of a potential 700.
I am trying to figure out how to send the request that loads more of the list, e.g. on https://pitp.bandcamp.com/album/fragments-distancing. After finding which request is sent when I click "more" in the inspector, I tried to replicate it with requests, but without any result:
res = requests.get(track_link)
open_more = {"tralbum_type":"a","tralbum_id":3542956135,"token":"1:1609185066:1714678:0:1:0","count":100}
for i in range(0,3):
    requests.post(track_link, json=open_more)
Will appreciate any help!
I think that just using a ridiculously large number for count will do. I also automated your script a bit, in case you want to get data on other albums:
from urllib.parse import urlsplit
import json

import requests
from bs4 import BeautifulSoup

# build the post link
get_link = "https://pitp.bandcamp.com/album/fragments-distancing"
link = urlsplit(get_link)
base_link = f'{link.scheme}://{link.netloc}'
post_link = f"{base_link}/api/tralbumcollectors/2/thumbs"

with requests.session() as s:
    res = s.get(get_link)
    soup = BeautifulSoup(res.text, 'lxml')
    # the data for tralbum_type and tralbum_id
    # are stored in a script attribute
    key = "data-band-follow-info"
    data = soup.select_one(f'script[{key}]')[key]
    data = json.loads(data)
    open_more = {
        "tralbum_type": data["tralbum_type"],
        "tralbum_id": data["tralbum_id"],
        "count": 1000}
    r = s.post(post_link, json=open_more).json()
    print(r['more_available'])  # if not false, use a bigger count
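Once more_available is false, the same payload should contain the fan entries themselves. Here is a sketch of pulling them out; the 'results', 'fan_id' and 'name' keys are assumptions about the response shape, so inspect a real response to confirm them:

```python
def extract_fans(payload):
    """Pull (fan_id, name) pairs out of the thumbs response.

    NOTE: the 'results', 'fan_id' and 'name' keys are assumed, not
    verified -- check them against a real /api/tralbumcollectors
    response before relying on this.
    """
    return [(item.get('fan_id'), item.get('name'))
            for item in payload.get('results', [])]

# Synthetic payload of the assumed shape:
sample = {'more_available': False,
          'results': [{'fan_id': 1, 'name': 'alice'},
                      {'fan_id': 2, 'name': 'bob'}]}
print(extract_fans(sample))  # [(1, 'alice'), (2, 'bob')]
```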
Recently I've been trying to get some marks from a results website (http://tnresults.nic.in/rgnfs.htm) for my school results. A friend challenged me to get his marks, for which I only know his DOB and not his register number. How do I write a Python program that tries register numbers from a predefined range (I know his DOB)?
I tried using requests, but it doesn't let me enter the register number and DOB.
The site creates a POST request in the following format when you push the Submit button:
https://dge3.tn.nic.in/plusone/plusoneapi/marks/{registration number}/{DOB}
Sample (with 112231 as the registration number and 01-01-2000 as the DOB):
https://dge3.tn.nic.in/plusone/plusoneapi/marks/112231/01-01-2000
You can then iterate over different registration numbers with a predefined array.
Note: it has to be a POST request, not a regular GET request.
You probably have to do something like the following:
import requests
from bs4 import BeautifulSoup

DOB = '01-01-2000'
REGISTRATION_NUMBERS = ['1','2']

for reg_number in REGISTRATION_NUMBERS:
    result = requests.post(f"https://dge3.tn.nic.in/plusone/plusoneapi/marks/{reg_number}/{DOB}")
    content = result.content
    print(content)
    ## BeautifulSoup logic
I don't know if that request provides the information you need; I don't have valid registration numbers combined with the correct date of birth, so I cannot really test it.
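If you want the register numbers to come from a predefined range rather than a hand-written list, generating them is a one-liner (the bounds here are made up for illustration):

```python
# Hypothetical bounds -- adjust to the actual register-number range.
START, END = 5709000, 5709010
registration_numbers = [str(n) for n in range(START, END)]
print(registration_numbers[:3])  # ['5709000', '5709001', '5709002']
```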
Update 2019-07-09:
Since you said the page is not working anymore and the website changed, I took a look.
It seems that some things have changed: you now have to make a POST request to http://tnresults.nic.in/rgnfs.asp. The fields 'regno', 'dob' and 'B1' (optional?) should be sent as x-www-form-urlencoded.
Since that returns an 'Access Denied', you should set the 'Referer' header to 'http://tnresults.nic.in/rgnfs.htm'. So:
import requests
from bs4 import BeautifulSoup
DOB = '23-10-2002'
REGISTRATION_NUMBERS = ['5709360']
headers = requests.utils.default_headers()
headers.update({'Referer': 'http://tnresults.nic.in/rgnfs.htm'})
for reg_number in REGISTRATION_NUMBERS:
    post_data = {'regno': reg_number, 'dob': DOB}
    result = requests.post("http://tnresults.nic.in/rgnfs.asp", data=post_data, headers=headers)
    content = result.content
    print(content)
    ## BeautifulSoup logic
Tested it myself successfully, now that you've provided a valid DOB and registration number.
I am working on a project, and one of the steps is getting a random word that I will use later. When I try to grab the random word, it gives me '<span id="result"></span>', but as you can see, there is no word inside.
Code:
import urllib2
from bs4 import BeautifulSoup
quote_page = 'http://watchout4snakes.com/wo4snakes/Random/RandomWord'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
name_box = soup.find("span", {"id": "result"})
print name_box
name = name_box.text.strip()
print name
I am thinking that maybe it might need to wait for a word to appear, but I'm not sure how to do that.
This word is added to the page using JavaScript. We can verify this by looking at the actual HTML returned in the request and comparing it with what we see in the browser's DOM inspector. There are two options:
1. Use a library capable of executing JavaScript and giving you the resulting HTML
2. Try a different approach that doesn't require JavaScript support
For 1, we can use something like requests_html. This would look like:
from requests_html import HTMLSession
url = 'http://watchout4snakes.com/wo4snakes/Random/RandomWord'
session = HTMLSession()
r = session.get(url)
# Some sleep required since the default of 0.2 isn't long enough.
r.html.render(sleep=0.5)
print(r.html.find('#result', first=True).text)
For 2, if we look at the network requests that the page is making, we can see that it retrieves random words by making a POST request to http://watchout4snakes.com/wo4snakes/Random/RandomWord. Making a direct request with a library like requests (recommended by the standard library documentation) looks like:
import requests
url = 'http://watchout4snakes.com/wo4snakes/Random/RandomWord'
print(requests.post(url).text)
So the way the site works is that it sends you the page with no word in the span box and fills it in later through JavaScript; that's why you get a span box with nothing inside.
However, since you're trying to get the word, I'd suggest a different method: rather than scraping the word off the page, you can simply send a POST request to http://watchout4snakes.com/wo4snakes/Random/RandomWord with no body and receive the word in the response.
You're using Python 2, but here it is in Python 3 (just to show that it works):
>>> import requests
>>> r = requests.post('http://watchout4snakes.com/wo4snakes/Random/RandomWord')
>>> print(r.text)
doom
You can do something similar using urllib in Python 2 as well.
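Since Python 2 is end-of-life, here is the same empty-body POST sketched with Python 3's urllib.request (the successor of urllib2); passing a bytes body is what switches the method from GET to POST:

```python
from urllib.request import Request, urlopen

url = 'http://watchout4snakes.com/wo4snakes/Random/RandomWord'

# A non-None data argument makes urllib issue a POST instead of a GET.
req = Request(url, data=b'')
print(req.get_method())  # POST

# Uncomment to hit the live endpoint:
# word = urlopen(req).read().decode()
```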
Background:
Typically, if I want to see what type of requests a website is getting, I would open up chrome developer tools (F12), go to the Network tab and filter the requests I want to see.
Once I have the request URL, I can simply parse the URL for the query string parameters I want.
This is a very manual task and I thought I could write a script that does this for any URL I provide. I thought Python would be great for this.
Task:
I have found a library called requests that I use to validate the URL before opening.
import requests
from urllib import urlopen  # Python 2

testPage = "http://www.google.com"
validatedRequest = str(requests.get(testPage, verify=False).url)
page = urlopen(validatedRequest)
However, I am unsure how to get the requests that the page I enter makes. Is this possible in Python? A pointer in the right direction would be great. Once I know how to access these request headers, I can easily parse through them.
Thank you.
You can use the urlparse function to fetch the query params.
Demo:
import requests
import urllib
from urlparse import urlparse
testPage = "http://www.google.com"
validatedRequest = str(requests.get(testPage, verify=False).url)
page = urllib.urlopen(validatedRequest)
print urlparse(page.url).query
Result:
gfe_rd=cr&dcr=0&ei=ISdiWuOLJ86dX8j3vPgI
Tested in Python 2.7.
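In Python 3 the urlparse module moved to urllib.parse; the same extraction, run here against the example query string above, looks like:

```python
from urllib.parse import urlsplit, parse_qs

# The final URL after redirects, as returned by requests.get(...).url
final_url = 'http://www.google.com/?gfe_rd=cr&dcr=0&ei=ISdiWuOLJ86dX8j3vPgI'

params = parse_qs(urlsplit(final_url).query)
print(params)  # {'gfe_rd': ['cr'], 'dcr': ['0'], 'ei': ['ISdiWuOLJ86dX8j3vPgI']}
```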
I am trying to write a script that searches for an InChIKey (e.g. OBSSCZVQJAGPOE-KMKNQKDISA-N) to get a chemical structure from this website:
http://www.chemspider.com/inchi-resolver/Resolver.aspx
From the documentation my code looks like it should work, but instead it just returns the original search page.
Thanks for the help,
import urllib
inchi = 'OBSSCZVQJAGPOE-KMKNQKDISA-N'
url = 'http://www.chemspider.com/inchi-resolver/Resolver.aspx'
data = urllib.urlencode({'"ctl00$ContentPlaceHolder1$TextBox1"':inchi})
response = urllib.urlopen(url, data)
print response.read()
Your code does send a POST request (urlopen issues a POST whenever a data argument is supplied), but the form field name in your dict contains an extra pair of quotes ('"ctl00$ContentPlaceHolder1$TextBox1"' instead of 'ctl00$ContentPlaceHolder1$TextBox1'). Apart from that: the form contains various hidden fields with some strange values which might be necessary for the processing as well.
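A sketch of the usual workaround for ASP.NET forms: GET the page first, copy every hidden <input> into the POST payload, then add your own field. The helper below is self-contained; the live page's field names may differ, so treat them as assumptions:

```python
import requests
from bs4 import BeautifulSoup

URL = 'http://www.chemspider.com/inchi-resolver/Resolver.aspx'

def build_payload(html, extra):
    """Collect all hidden <input> fields from an ASP.NET form
    (__VIEWSTATE etc.) and merge in the user-supplied fields."""
    soup = BeautifulSoup(html, 'html.parser')
    payload = {inp.get('name'): inp.get('value', '')
               for inp in soup.find_all('input', type='hidden')
               if inp.get('name')}
    payload.update(extra)
    return payload

# Usage against the live page (field name is an unverified assumption --
# inspect the form to confirm it):
# page = requests.get(URL)
# data = build_payload(page.text,
#                      {'ctl00$ContentPlaceHolder1$TextBox1': inchi})
# response = requests.post(URL, data=data)
```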