Getting specific data from a website with python - python

code:
import requests
from bs4 import BeautifulSoup
url="https://covid19.saglik.gov.tr/"
R=requests.get(url)
print(R.text)
Question: Hello friends, I need to read the specific values below from the website above. These values change daily. When the program runs, it should be able to print the value for a given key from the site, for example print(data["tarih"]), print(data["gunluk_test"]), print(data["gunluk_vaka"]), and so on, taken from the script in the HTML. How can I do that?
CODE's OUTPUT RESULT:
var sondurumjson = [{"tarih":"13.05.2021","gunluk_test":"201.295","gunluk_vaka":"11.534","gunluk_hasta":"1.217","gunluk_vefat":"238","gunluk_iyilesen":"55.472","toplam_test":"50.259.943","toplam_hasta":"5.083.996","toplam_vefat":"44.059","toplam_iyilesen":"4.856.763","toplam_yogun_bakim":"","toplam_entube":"","hastalarda_zaturre_oran":"4,0","agir_hasta_sayisi":"2.765","yatak_doluluk_orani":"43,7","eriskin_yogun_bakim_doluluk_orani":"65,0","ventilator_doluluk_orani":"32,4","ortalama_filyasyon_suresi":"","ortalama_temasli_tespit_suresi":"8","filyasyon_orani":"99,9"}];//]]>

It looks like the dict is already recognised by Python, which would explain why json.loads fails. Did you paste the output of the API, or did you print the dict that you are working on?
Anyhow, if that is the result of your API, you can access everything in the dictionary like this:
sondurumjson = [{"tarih":"13.05.2021","gunluk_test":"201.295","gunluk_vaka":"11.534","gunluk_hasta":"1.217","gunluk_vefat":"238","gunluk_iyilesen":"55.472","toplam_test":"50.259.943","toplam_hasta":"5.083.996","toplam_vefat":"44.059","toplam_iyilesen":"4.856.763","toplam_yogun_bakim":"","toplam_entube":"","hastalarda_zaturre_oran":"4,0","agir_hasta_sayisi":"2.765","yatak_doluluk_orani":"43,7","eriskin_yogun_bakim_doluluk_orani":"65,0","ventilator_doluluk_orani":"32,4","ortalama_filyasyon_suresi":"","ortalama_temasli_tespit_suresi":"8","filyasyon_orani":"99,9"}]
new_object = sondurumjson[0]
print(new_object['tarih'])

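I think you should use R.json() instead of R.text. Note that R.json() only works when the response body itself is JSON; the output above shows the JSON embedded in a script inside the page, so it has to be extracted from the HTML first.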

import requests
from bs4 import BeautifulSoup
import json
url="https://covid19.saglik.gov.tr/"
R=requests.get(url)
soup=BeautifulSoup(R.content,"html.parser" )
script=soup.find_all("script")[18]
s=str(script).split('\nvar sondurumjson = ')
#print(s[1])
a=str(s[1])
b=a.split("\r")
#print(b[0])
c=b[0].partition(';')
data=c[0]
#print(data)
data = json.loads(data)
print( "Tarih:", data[0]['tarih'])
print("Günlük Test:", data[0]['gunluk_test'])
print( "Günlük Vaka:", data[0]['gunluk_vaka'])
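The index-based lookup above (soup.find_all("script")[18]) stops working as soon as the page adds or removes a script tag. A minimal sketch of a more tolerant alternative, assuming the page still embeds the data as `var sondurumjson = [...];`, is to search the raw HTML with a regular expression. The helper name extract_sondurum and the sample HTML are illustrative and not part of the original answer:

```python
import json
import re

# Captures the JSON array assigned to `sondurumjson`, up to the first "];".
# Assumes no string value inside the array contains the sequence "];".
PATTERN = re.compile(r"var\s+sondurumjson\s*=\s*(\[.*?\]);", re.DOTALL)


def extract_sondurum(html):
    """Return the first record of the embedded `sondurumjson` array as a dict."""
    match = PATTERN.search(html)
    if match is None:
        raise ValueError("sondurumjson not found in page source")
    return json.loads(match.group(1))[0]


# Stand-in for R.text; the real page would be fetched with requests.get(url).
sample_html = (
    '<script>//<![CDATA[\n'
    'var sondurumjson = [{"tarih":"13.05.2021","gunluk_test":"201.295",'
    '"gunluk_vaka":"11.534"}];//]]>\n</script>'
)

data = extract_sondurum(sample_html)
print("Tarih:", data["tarih"])
print("Günlük Test:", data["gunluk_test"])
print("Günlük Vaka:", data["gunluk_vaka"])
```

With the live site, passing R.text to extract_sondurum should give the same dictionary, provided the variable name on the page has not changed.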

Related

How to get data from an url?

I want to use python to get the data from "https://jobs.51job.com/jinan-gxq/139057256.html?s=gsxq_zwlb_gsxqlb&t=17"
I expect to get data like below but I get a script.
Here is my code:
import requests
url = 'https://jobs.51job.com/jinan-gxq/139057256.html?s=gsxq_zwlb_gsxqlb&t=17'
res = requests.get(url)
print(res.text)
The output is:
'<html><script>\nvar arg1=\'6C00B31D29AE4156FC48C6823C7476185B28DB15\';\nvar _0x4818=[\'\\x63\\x73\\x4b\\x48\\x77\\x71\\x4d\\x49\',\'\\x5a\\x73\\x4b\\x4a\\x77\\x72\\x38\\x56\\x65\\x41\\x73\\x79\',\'\\x55\\x63\\x4b\\x69\\x4e\\x38\\x4f\\x2f\\x77\\x70\\x6c\\x77\\x4d\\x41\\x3d\\x3d\',\'\\x4a\\x52\\x38\\x43\\x54\\x67\\x3d\\x3d\',\'\\x59\\x73\\x4f\\x6e\\x62\\x53\\x45\\x51\\x77\\x37\\x6f\\x7a\\x77\\x71\\x5a\\x4b\\x65\\x73\\x4b\\x55\\x77\\x37\\x6b\\x77\\x58\\x38\\x4f\\x52\\x49\\x51\\x3d\\x3d\',\'\\x77\\x37\\x6f\\x56\
How can I get the data?

When I take html from a website using urllib2, the inner html is empty. Anyone know why?

I am working on a project and one of the steps includes getting a random word which I will use later. When I try to grab the random word, it gives me '<span id="result"></span>' but as you can see, there is no word inside.
Code:
import urllib2
from bs4 import BeautifulSoup
quote_page = 'http://watchout4snakes.com/wo4snakes/Random/RandomWord'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
name_box = soup.find("span", {"id": "result"})
print name_box
name = name_box.text.strip()
print name
I am thinking that maybe it might need to wait for a word to appear, but I'm not sure how to do that.
This word is added to the page using JavaScript. We can verify this by looking at the actual HTML that is returned in the request and comparing it with what we see in the web browser DOM inspector. There are two options:
1. Use a library capable of executing JavaScript and giving you the resulting HTML
2. Try a different approach that doesn't require JavaScript support
For 1, we can use something like requests_html. This would look like:
from requests_html import HTMLSession
url = 'http://watchout4snakes.com/wo4snakes/Random/RandomWord'
session = HTMLSession()
r = session.get(url)
# Some sleep required since the default of 0.2 isn't long enough.
r.html.render(sleep=0.5)
print(r.html.find('#result', first=True).text)
For 2, if we look at the network requests that the page makes, we can see that it retrieves random words by making a POST request to http://watchout4snakes.com/wo4snakes/Random/RandomWord. Making that request directly with a library like requests (which the standard library documentation recommends) looks like:
import requests
url = 'http://watchout4snakes.com/wo4snakes/Random/RandomWord'
print(requests.post(url).text)
So the way that the site works is that it sends you the site with no word in the span box, and edits it in later through JavaScript; that's why you get a span box with nothing inside.
However, since all you need is the word, I'd suggest a different method than scraping it off the page: simply send a POST request to http://watchout4snakes.com/wo4snakes/Random/RandomWord with no body and you will receive the word in the response.
You're using Python 2, but in Python 3 (which I'm using to show that this works) you can do:
>>> import requests
>>> r = requests.post('http://watchout4snakes.com/wo4snakes/Random/RandomWord')
>>> print(r.text)
doom
You can do something similar using urllib in Python 2 as well.
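For reference, a rough sketch of the same empty-body POST using only the standard library, written for Python 3 (urllib.request) rather than Python 2's urllib2. The helper names build_request and fetch_random_word are hypothetical, and the network call is only defined here, not executed:

```python
import urllib.request

URL = "http://watchout4snakes.com/wo4snakes/Random/RandomWord"


def build_request(url=URL):
    # method="POST" sets the verb explicitly; data=b"" sends an empty body.
    return urllib.request.Request(url, data=b"", method="POST")


def fetch_random_word(url=URL):
    # Performs the actual network call and decodes the body as UTF-8 text.
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read().decode("utf-8").strip()


request = build_request()
print(request.get_method())  # POST
```

Calling fetch_random_word() would then return the word, assuming the site still answers this request the way the answers above describe.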

PUT method using python3 and urllib - headers

So I am trying to receive the data from this JSON service. I am able to use POST and GET on any link except the one I am currently trying to read, which needs PUT. So I wanted to know whether I am calling this URL correctly via urllib, or am I missing something?
Request
{"DataType":"Word","Params":["1234"], "ID":"22"}
Response {
JSON DATA IN HERE
}
I feel like I am doing the PUT method call wrong since it is wrapped around Request{}.
import urllib.request, json
from pprint import pprint
header = {"DataType":"Word","Params":["1234"], "ID":"22"}
req = urllib.request.Request(url = "website/api/json.service", headers = header, method = 'PUT')
with urllib.request.urlopen(req) as url:
    data = json.loads(url.read().decode())
    pprint(data)
I am able to print JSON data as long as the method is anything but PUT. As soon as I hit a site that requires PUT with this JSON template, I get an Internal Error 500. So I assumed the problem was my header.
Thank you in advance!
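One possible cause of the 500, offered as an assumption since the API is not documented here: the JSON shown under "Request" is probably meant to be the request body, not a set of HTTP headers. A minimal sketch that sends it as a JSON body with PUT follows. The URL is a placeholder, the helper names are hypothetical, and the Content-Type header is a guess at what the service expects:

```python
import json
import urllib.request
from pprint import pprint

# Placeholder endpoint; replace with the real service URL.
API_URL = "http://example.com/api/json.service"

payload = {"DataType": "Word", "Params": ["1234"], "ID": "22"}


def build_put_request(url, body):
    # Serialize the dict to JSON bytes and attach it as the request body.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="PUT",
    )


def put_json(url, body):
    # Performs the network call and decodes the JSON response.
    with urllib.request.urlopen(build_put_request(url, body)) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_put_request(API_URL, payload)
print(req.get_method())
pprint(json.loads(req.data))
```

Only build_put_request runs here; put_json(API_URL, payload) would make the real call once the placeholder URL is replaced.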

How to add dictionary in url as query param python request

I am calling a 3rd-party API, and the requirement is to add a JSON dictionary to the URL as it is. Refer to the example URL below:
https://pguat.paytm.com/oltp/HANDLER_INTERNAL/getTxnStatus?JsonData={"MID":"MID117185435","ORDERID":"ORDR4o22310421111",
"CHECKSUMHASH":
"NU9wPEWxmbOTFL2%2FUKr3lk6fScfnLy8wORc3YRylsyEsr2MLRPn%2F3DRePtFEK55ZcfdTj7mY9vS2qh%2Bsm7oTRx%2Fx4BDlvZBj%2F8Sxw6s%2F9io%3D"}
The query param name is "JsonData" and the data should be in {} brackets.
import requests
import json
from urllib.parse import urlencode, quote_plus
import urllib.request
import urllib
data = '{"MID":"MID117185435","ORDERID":"ORDR4o22310421111","CHECKSUMHASH":"omcrIRuqDP0v%2Fa2DXTlVI4XtzvmuIW56jlXtGEp3S%2B2b1h9nU9cfJx5ZO2Hp%2FAN%2F%2FyF%2F01DxmoV1VHJk%2B0ZKHrYxqvDMJa9IOcldrfZY1VI%3D"}'
jsonData = data
uri = 'https://pguat.paytm.com/oltp/HANDLER_INTERNAL/getTxnStatus?jsonData='+str(quote_plus(data))
r = requests.get(uri)
print(r.url)
print(r.json)
print(r.json())
print(r.url) output on console
https://pguat.paytm.com/oltp/HANDLER_INTERNAL/getTxnStatus?jsonData=%7B%22MID%22%3A%22MEDTPA37902117185435%22%2C%22ORDERID%22%3A%22medipta1521537969o72718962111%22%2C%22CHECKSUMHASH%22%3A%22omcrIRuqDP0v%252Fa2DXTlVI4XtzvmuIW56jlXtGEp3S%252B2b1h9nU9cfJx5ZO2Hp%252FAN%252F%252FyF%252F01DxmoV1VHJk%252B0ZKHrYxqvDMJa9IOcldrfZY1VI%253D%22%7D
It converts { to %7B, and I want {} left as it is.
Please help.
You need to undo quote_plus by importing and using unquote_plus.
I didn’t test against your url, just against your string.
When I print your uri string I get this as my output:
https://pguat.paytm.com/oltp/HANDLER_INTERNAL/getTxnStatus?jsonData=%7B%22MID%22%3A%22MID117185435%22%2C%22ORDERID%22%3A%22ORDR4o22310421111%22%2C%22CHECKSUMHASH%22%3A%22omcrIRuqDP0v%252Fa2DXTlVI4XtzvmuIW56jlXtGEp3S%252B2b1h9nU9cfJx5ZO2Hp%252FAN%252F%252FyF%252F01DxmoV1VHJk%252B0ZKHrYxqvDMJa9IOcldrfZY1VI%253D%22%7D
If I surround it like this:
print(str(unquote_plus(uri)))
I get this as output:
https://pguat.paytm.com/oltp/HANDLER_INTERNAL/getTxnStatus?jsonData={"MID":"MID117185435","ORDERID":"ORDR4o22310421111","CHECKSUMHASH":"omcrIRuqDP0v%2Fa2DXTlVI4XtzvmuIW56jlXtGEp3S%2B2b1h9nU9cfJx5ZO2Hp%2FAN%2F%2FyF%2F01DxmoV1VHJk%2B0ZKHrYxqvDMJa9IOcldrfZY1VI%3D"}
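Note that the CHECKSUMHASH value is already percent-encoded, so passing the whole string through quote_plus encodes it a second time (%2F becomes %252F). A small sketch of an alternative, assuming the server really does accept raw braces and quotes in the query string: use urllib.parse.quote with a safe set that leaves those characters and the existing % escapes alone. The shortened checksum value is made up for illustration.

```python
from urllib.parse import quote, quote_plus

BASE = "https://pguat.paytm.com/oltp/HANDLER_INTERNAL/getTxnStatus?JsonData="

# Shortened, made-up checksum; its "%2F" and "%3D" escapes are already encoded.
data = '{"MID":"MID117185435","ORDERID":"ORDR4o22310421111","CHECKSUMHASH":"abc%2Fdef%3D"}'

# quote_plus escapes everything, including the "%" of existing escapes.
double_encoded = quote_plus(data)

# Listing these characters as safe leaves the JSON text untouched.
raw = quote(data, safe='{}":,%')

print(BASE + raw)
print("%252F" in double_encoded)  # True
print(raw == data)                # True
```

HTTP clients such as requests may still re-encode the URL when sending it, so it is worth checking r.url after the call.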

Python Parse JSON Response from URL

I want to get information about my Hue lights using a Python program. I am OK with sorting the information once I get it, but I am struggling to load the JSON info. It is sent as a JSON response. My code is as follows:
import requests
import json
response= requests.get('http://192.168.1.102/api/F5La7UpN6XueJZUts1QdyBBbIU8dEvaT1EZs1Ut0/lights')
data = json.load(response)
print(data)
When this is run, all I get is the error:
in load return loads(fp.read(),
AttributeError: 'Response' object has no attribute 'read'
The problem is you are passing in the actual response which consists of more than just the content. You need to pull the content out of the response:
import requests
r = requests.get('https://github.com/timeline.json')
print(r.text)
# The Requests library also comes with a built-in JSON decoder,
# just in case you have to deal with JSON data
r = requests.get('https://github.com/timeline.json')
print(r.json())
http://www.pythonforbeginners.com/requests/using-requests-in-python
Looks like it will parse the JSON for you already...
Use response.content to access the response content, and the json.loads method instead of json.load:
data = json.loads(response.content)
print(data)
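The difference between json.load and json.loads can be shown without a network call. In this sketch, io.BytesIO stands in for a file-like object and the body bytes stand in for response.content; the sample payload is invented:

```python
import io
import json

# Invented payload standing in for response.content from the bridge.
body = b'{"1": {"name": "Kitchen", "state": {"on": true}}}'

# json.loads parses a str or bytes value directly.
from_bytes = json.loads(body)

# json.load expects an object with a .read() method, such as an open file.
from_file = json.load(io.BytesIO(body))

print(from_bytes == from_file)   # True
print(from_bytes["1"]["name"])   # Kitchen
```

A requests Response has no .read() method, which is why json.load(response) fails; response.json() does the decoding in one step.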
