How to get data from a URL? - python

I want to use Python to get the data from "https://jobs.51job.com/jinan-gxq/139057256.html?s=gsxq_zwlb_gsxqlb&t=17"
I expected to get the page data, but I get a script instead.
Here is my code:
import requests
url = 'https://jobs.51job.com/jinan-gxq/139057256.html?s=gsxq_zwlb_gsxqlb&t=17'
res = requests.get(url)
print(res.text)
The output is:
'<html><script>\nvar arg1=\'6C00B31D29AE4156FC48C6823C7476185B28DB15\';\nvar _0x4818=[\'\\x63\\x73\\x4b\\x48\\x77\\x71\\x4d\\x49\',\'\\x5a\\x73\\x4b\\x4a\\x77\\x72\\x38\\x56\\x65\\x41\\x73\\x79\',\'\\x55\\x63\\x4b\\x69\\x4e\\x38\\x4f\\x2f\\x77\\x70\\x6c\\x77\\x4d\\x41\\x3d\\x3d\',\'\\x4a\\x52\\x38\\x43\\x54\\x67\\x3d\\x3d\',\'\\x59\\x73\\x4f\\x6e\\x62\\x53\\x45\\x51\\x77\\x37\\x6f\\x7a\\x77\\x71\\x5a\\x4b\\x65\\x73\\x4b\\x55\\x77\\x37\\x6b\\x77\\x58\\x38\\x4f\\x52\\x49\\x51\\x3d\\x3d\',\'\\x77\\x37\\x6f\\x56\
How can I get the data?

Related

405 Error when I try to access data from an api link

I am trying to get data from an API, but when I request it with the following code I get the message "405 Error: The method is not allowed for the requested URL."
response = requests.get(url)
data = response.text
print(data)
You can try converting the JSON to a dictionary by rewriting the code as:
response = requests.get(url)
data = response.json()  # parse the JSON body into Python objects
print(data)
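A 405 status means the server rejected the HTTP method itself (here GET), not the way the body was parsed. Servers that follow the HTTP specification list the accepted methods in the Allow response header. Below is a minimal sketch of reading that header. The helper name allowed_methods and the sample header values are hypothetical; with requests, the mapping would come from response.headers.

```python
def allowed_methods(headers):
    """Return the HTTP methods listed in an Allow header as an upper-case list.

    `headers` is any mapping of header names to values (hypothetical input;
    with requests this would be response.headers).
    """
    # Header names are case-insensitive, so normalise the keys before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("allow", "")
    return [method.strip().upper() for method in raw.split(",") if method.strip()]


# Made-up headers of a 405 response from an endpoint that only accepts POST.
sample_headers = {"Content-Type": "text/html", "Allow": "POST, OPTIONS"}
print(allowed_methods(sample_headers))
```

If the list contains POST but not GET, the request probably needs requests.post(url, ...) rather than requests.get(url).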

Getting specific data from a website with python

code:
import requests
from bs4 import BeautifulSoup
url="https://covid19.saglik.gov.tr/"
R=requests.get(url)
print(R.text)
Question: Hello friends, I need to read the specific values below from the website above. These values change daily. When the program runs, it should be able to print the value of a given key from the website, for example print(data["tarih"]), print(data["gunluk_test"]), print(data["gunluk_vaka"]), etc. The values sit inside a script in the HTML. How can I do that?
Output of the code:
var sondurumjson = [{"tarih":"13.05.2021","gunluk_test":"201.295","gunluk_vaka":"11.534","gunluk_hasta":"1.217","gunluk_vefat":"238","gunluk_iyilesen":"55.472","toplam_test":"50.259.943","toplam_hasta":"5.083.996","toplam_vefat":"44.059","toplam_iyilesen":"4.856.763","toplam_yogun_bakim":"","toplam_entube":"","hastalarda_zaturre_oran":"4,0","agir_hasta_sayisi":"2.765","yatak_doluluk_orani":"43,7","eriskin_yogun_bakim_doluluk_orani":"65,0","ventilator_doluluk_orani":"32,4","ortalama_filyasyon_suresi":"","ortalama_temasli_tespit_suresi":"8","filyasyon_orani":"99,9"}];//]]>
It looks like the data is already a Python object, which is why json.loads fails. Did you paste the output of the API, or did you print the dict you are working with?
Anyhow, if that is the result of your API, you can access everything in the dictionary like this:
sondurumjson = [{"tarih":"13.05.2021","gunluk_test":"201.295","gunluk_vaka":"11.534","gunluk_hasta":"1.217","gunluk_vefat":"238","gunluk_iyilesen":"55.472","toplam_test":"50.259.943","toplam_hasta":"5.083.996","toplam_vefat":"44.059","toplam_iyilesen":"4.856.763","toplam_yogun_bakim":"","toplam_entube":"","hastalarda_zaturre_oran":"4,0","agir_hasta_sayisi":"2.765","yatak_doluluk_orani":"43,7","eriskin_yogun_bakim_doluluk_orani":"65,0","ventilator_doluluk_orani":"32,4","ortalama_filyasyon_suresi":"","ortalama_temasli_tespit_suresi":"8","filyasyon_orani":"99,9"}]
new_object = sondurumjson[0]
print(new_object['tarih'])
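I think you should use R.json() instead of R.text. That only works when the response body is pure JSON, though; on an HTML page like this one, the JSON first has to be cut out of the script tag: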
import json
import requests
from bs4 import BeautifulSoup
url = "https://covid19.saglik.gov.tr/"
R = requests.get(url)
soup = BeautifulSoup(R.content, "html.parser")
# The data sat in the script tag at index 18 when this was written; the index may change
script = soup.find_all("script")[18]
# Keep the text after the variable assignment
s = str(script).split('\nvar sondurumjson = ')
# Cut at the end of the line, then at the first ';' to isolate the JSON array
data = s[1].split("\r")[0].partition(';')[0]
data = json.loads(data)
print("Tarih:", data[0]['tarih'])
print("Günlük Test:", data[0]['gunluk_test'])
print("Günlük Vaka:", data[0]['gunluk_vaka'])
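The splitting approach depends on the script's position in the page and on its line endings. As an alternative, a regular expression can pull the array out of the page text directly. This is a minimal sketch, assuming the page still contains a statement of the form var sondurumjson = [...]; and that no value inside the array contains the sequence "];". The function name extract_sondurum and the shortened sample string are illustrative, not part of the original answers.

```python
import json
import re


def extract_sondurum(page_text):
    """Return the list assigned to `var sondurumjson` in the page, or None."""
    # Non-greedy match from the opening '[' to the first ']' followed by ';'.
    match = re.search(r"var\s+sondurumjson\s*=\s*(\[.*?\])\s*;", page_text, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


# Shortened sample of the output shown in the question.
sample = 'var sondurumjson = [{"tarih":"13.05.2021","gunluk_test":"201.295","gunluk_vaka":"11.534"}];//]]>'
data = extract_sondurum(sample)
print(data[0]["tarih"], data[0]["gunluk_test"], data[0]["gunluk_vaka"])
```

With a live page, R.text would be passed in place of the sample string.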

How to use Python requests to fill out a date option as well as download from a radiobutton

I'm trying to make a python script that will scrape data from this website
https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315
and download the CSV from yesterday. As you can see, it's got two menu options for dates, a radio button for CSV and then a submit button.
I thought perhaps I could use the requests library? Not looking for someone to do it for me, but if anyone could point me in the right direction that would be great!
I know this is too simple but here is what I have so far:
import requests
print('Download Starting...')
url = 'https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315'
r = requests.get(url)
filename = url.split('/')[-1]  # keep only the last path segment of the URL
with open(filename, 'wb') as output_file:
    output_file.write(r.content)
print('done')
You first need to use requests.Session() so that cookies are stored and re-sent in subsequent requests. The process is as follows:
1. GET the original URL to obtain the cookies (session id).
2. POST to /reports/ci_report/server/request.php with parameters that include the date and the output format. The result is JSON containing an id, like this:
{'jrId': 'jr_13879611'}
3. GET /reports/ci_report/server/streamReport.php?jrId=jr_13879611, which returns the CSV data.
The POST request also needs the menuitem query parameter value from your original URL, so we parse the query string with urlparse to get it:
import requests
import time
import urllib.parse as urlparse
from urllib.parse import parse_qs
from datetime import datetime,timedelta
yesterday = datetime.now() - timedelta(1)
yesterday_date = f'{yesterday.strftime("%d")}-{yesterday.strftime("%B")[:3]}-{yesterday.strftime("%Y")}'
original_url = "https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315"
parsed = urlparse.urlparse(original_url)
target_url = "https://noms.wei-pipeline.com/reports/ci_report/server/request.php"
stream_report_url = "https://noms.wei-pipeline.com/reports/ci_report/server/streamReport.php"
s = requests.Session()
# load the cookies
s.get(original_url)
#get id
r = s.post(
    target_url,
    params={
        "request.preventCache": int(round(time.time() * 1000)),
    },
    data={
        "ReportProc": "CIPR_DAILY_BULLETIN",
        "p_ci_id": parse_qs(parsed.query)['menuitem'][0],
        "p_opun": "PL",
        "p_gas_day_from": yesterday_date,
        "p_gas_day_to": yesterday_date,
        "p_output_option": "CSV",
    },
)
r = s.get(stream_report_url, params = r.json())
print(r.text)
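The date string and form fields can also be built by one small function, which makes them easy to check without contacting the server. This is a sketch under the same assumptions as the answer: the field names are copied from it, and the function name build_report_payload is hypothetical. %b gives the abbreviated month name, which matches strftime("%B")[:3] in an English locale.

```python
from datetime import date, timedelta
from urllib.parse import parse_qs, urlparse


def build_report_payload(launch_url, gas_day):
    """Build the POST form data for one gas day, using the fields from the answer."""
    # e.g. 12-May-2021; %b is locale-dependent, like the %B slice it replaces.
    day = gas_day.strftime("%d-%b-%Y")
    menuitem = parse_qs(urlparse(launch_url).query)["menuitem"][0]
    return {
        "ReportProc": "CIPR_DAILY_BULLETIN",
        "p_ci_id": menuitem,
        "p_opun": "PL",
        "p_gas_day_from": day,
        "p_gas_day_to": day,
        "p_output_option": "CSV",
    }


launch_url = "https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315"
yesterday = date.today() - timedelta(days=1)
payload = build_report_payload(launch_url, yesterday)
print(payload["p_ci_id"], payload["p_gas_day_from"])
```

The returned dict can be passed as the data argument of the session's post call.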

urllib request for json does not match the json in browser

This is my code thus far.
import json
import urllib  # Python 2 standard library
url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = urllib.urlopen(url)
print response
data = json.load(response)
print data
The problem is that when I look at the json in the browser it is long and contains more features than I see when printing it.
To be more exact, I'm looking for the 'points' part which should be
data['points']['points']
however
data['points']
has only 2 attributes and doesn't contain the second 'points' that I do see in the url in the browser.
Could it be that I can only load 1 "layer" deep and not 2?
You need to add a user-agent to your request.
Using requests (which the urllib documentation recommends over using urllib directly), you can do:
import requests
url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = requests.get(url, headers={'user-agent': 'Mozilla 5.0'})
print(response.json())
# long output....
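If installing requests is not an option, the same header can be sent with the Python 3 standard library. The sketch below only builds the request object and defines, but does not call, a helper that would send it. Whether the endpoint still responds is not something this example can confirm, and the helper name fetch_json is hypothetical.

```python
import json
import urllib.request

url = "https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682"

# Attach the User-Agent header when the request object is created.
request = urllib.request.Request(url, headers={"User-Agent": "Mozilla 5.0"})


def fetch_json(req):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(req) as response:
        return json.load(response)


# urllib stores header names capitalised as "User-agent".
print(request.get_header("User-agent"))
```

Calling fetch_json(request) would perform the actual download and return the parsed JSON.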

Using Urllib2 with 511 api that requires token

So, all I want to do is send a request to the 511 API and get back the train times for a train station. I can do that with the full URL, but I would like to set the values separately instead of pasting a string together and sending that. I want the API to return train times for different stations. I see other requests that use headers, but I don't know how to use headers with a request, and the documentation confuses me.
This works...
import urllib2
request = urllib2.Request("http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?token=xxxx-xxx&stopcode=70142")
response = urllib2.urlopen(request)
the_page = response.read()
I want to be able to set values like this...
token = "xxx-xxxx"
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"
... and then put them together like this...
urllib2.Request(url,token, stopcode)
and get the same result.
The string formatting documentation would be a good place to start to learn more about different ways to plug in values.
val1 = 'test'
val2 = 'test2'
url = "https://www.example.com/{0}/blah/{1}".format(val1, val2)
urllib2.Request(url)
The missing piece is that "urllib" needs to be used along with "urllib2". Specifically, the function urllib.urlencode() returns the URL-encoded form of the values.
From the urllib documentation:
import urllib
query_args = { 'q':'query string', 'foo':'bar' }
encoded_args = urllib.urlencode(query_args)
print 'Encoded:', encoded_args
url = 'http://localhost:8080/?' + encoded_args
print urllib.urlopen(url).read()
So the corrected code is as follows:
import urllib
import urllib2
token = "xxx-xxxx"  # placeholder; use your own token
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"
query_args = {"token": token, "stopcode": stopcode}
encoded_args = urllib.urlencode(query_args)
request = urllib2.Request(url + encoded_args)
response = urllib2.urlopen(request)
print(response.read())
Actually, it is much easier to use the requests package than urllib and urllib2. All of the code above can be replaced with this:
import requests
token = "xxx-xxxx"  # placeholder; use your own token
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx"
query_args = {"token": token, "stopcode": stopcode}
r = requests.get(url, params=query_args)
print(r.text)
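Note that urllib2 and urllib.urlencode exist only in Python 2. In Python 3 the same encoding step lives in urllib.parse. Below is a minimal sketch that uses the placeholder token from the question; the function name build_departures_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx"


def build_departures_url(token, stopcode):
    """Return the request URL with the query string safely encoded."""
    query_args = {"token": token, "stopcode": stopcode}
    return BASE_URL + "?" + urlencode(query_args)


# "xxxx-xxx" is the placeholder token from the question, not a real key.
print(build_departures_url("xxxx-xxx", 70142))
```

Because urlencode escapes special characters, a different token or stop code can be passed in without assembling the query string by hand.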
