api with python to give dictionary - python

from urllib.request import urlopen
from bs4 import BeautifulSoup
apikey = '*****d2deb67f650f022ae13d07*****'
first = 'http://api.ipstack.com/'
ip = '134.201.250.155'
third = '?access_key='
print(first + ip + third + apikey)
html = urlopen(first + ip + third + apikey)  # must stay uncommented: soup needs `html` defined
soup = BeautifulSoup(html, "html.parser")
print(soup)
I had to hide the first and last 5 characters of my API key. Anyway, this gives:
{"ip":"134.201.250.155","type":"ipv4","continent_code":"NA","continent_name":"North America","country_code":"US","country_name":"United States","region_code":"CA","region_name":"California","city":"La Jolla","zip":"92037","latitude":32.8455,"longitude":-117.2521,"location":{"geoname_id":5363943,"capital":"Washington D.C.","languages":[{"code":"en","name":"English","native":"English"}],"country_flag":"http:\/\/assets.ipstack.com\/flags\/us.svg","country_flag_emoji":"\ud83c\uddfa\ud83c\uddf8","country_flag_emoji_unicode":"U+1F1FA U+1F1F8","calling_code":"1","is_eu":false}}
This gives me a soup object. What do I need to add to get the country_name, geoname_id, and ip into a list so I can write them to a .json file later?

This looks like a JSON response, so you need to parse it with the json library:
import json
parsed_json = json.loads(str(soup))
geoname_id = parsed_json['location']['geoname_id']
country_name = parsed_json['country_name']
ip = parsed_json['ip']
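To then write those values to a .json file, as the question asks, a minimal sketch could look like this (the sample dict below mirrors the shape shown above, with most fields omitted):

```python
import json

# sample parsed response in the shape shown above (most fields omitted)
parsed_json = {
    "ip": "134.201.250.155",
    "country_name": "United States",
    "location": {"geoname_id": 5363943},
}

# collect the three values the question asks for
record = {
    "ip": parsed_json["ip"],
    "country_name": parsed_json["country_name"],
    "geoname_id": parsed_json["location"]["geoname_id"],
}

# write them out to a .json file
with open("result.json", "w") as f:
    json.dump(record, f, indent=2)
```

A list of such records works the same way: append each record to a list and json.dump() the whole list at the end.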
A better solution when dealing with REST APIs that return JSON responses would be:
import requests
apikey='*****d2deb67f650f022ae13d07*****'
first='http://api.ipstack.com/'
ip='134.201.250.155'
query_string = {'access_key': apikey}
res = requests.get(first + ip, params=query_string)
res.raise_for_status()
ip = res.json()['ip']
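If requests isn't available, the same query string can be built with the standard library's urllib.parse; a small sketch (YOUR_API_KEY is a placeholder):

```python
from urllib.parse import urlencode

first = 'http://api.ipstack.com/'
ip = '134.201.250.155'

# urlencode percent-escapes values for us, much like requests' params= does
query = urlencode({'access_key': 'YOUR_API_KEY'})
url = first + ip + '?' + query
print(url)
```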

The documentation is helpful here, but note that BeautifulSoup parses HTML, not JSON, so soup.ip looks for an <ip> tag and returns None. Parse the soup's text as JSON instead:
import json
parsed = json.loads(str(soup))
print(parsed['ip'])
>>> 134.201.250.155
Let me know if you need further help!

Related

How to get the url from extracted information from a website

So basically I am stuck on the problem where I don't know how to get the url from the extracted data from a website.
Here is my code:
import requests
from bs4 import BeautifulSoup
req = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1')
soup = BeautifulSoup(req.content, "html.parser")
print(soup.prettify())
I get a lot of information on output, but the only thing I need is the url, I hope someone can help me.
P.S:
It gives me this information:
{"response":{"items":[{"url":"https:\/\/2ch.hk\/b\/src\/262671212\/16440825183970.webm","type":"video\/webm","filesize":"20259","width":1280,"height":720,"name":"1521967932778.webm","board":"b","thread":"262671212"},{"url":"https:\/\/2ch.hk\/b\/src\/261549765\/16424501976450.webm","type":"video\/webm","filesize":"12055","width":1280,"height":720,"name":"1526793203110.webm","board":"b","thread":"261549765"}...
But i only need this part out of all the things
https:\/\/2ch.hk\/b\/src\/261549765\/16424501976450.webm (Not exactly this url, but just as an example)
You can do it this way, but parse the text as JSON first; the soup object itself is not subscriptable:
import json
data = json.loads(str(soup))
url_array = []
for item in data['response']['items']:
    url_array.append(item['url'])
I guess if the API returns JSON data then it should be better to just parse it directly.
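That direct parse, sketched on a canned response in the shape shown above so it runs without hitting the network (with a live request you would simply call req.json()):

```python
import json

# canned response in the shape the API returns (truncated to two items)
raw = '{"response":{"items":[{"url":"https:\\/\\/2ch.hk\\/b\\/src\\/1.webm"},{"url":"https:\\/\\/2ch.hk\\/b\\/src\\/2.webm"}]}}'

data = json.loads(raw)  # json turns the escaped \/ back into plain /
url_array = [item['url'] for item in data['response']['items']]
print(url_array)
```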
The url produces JSON data, and BeautifulSoup can't parse JSON. To grab the JSON directly, you can follow the next example.
import requests

data = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1').json()
url = data['response']['items'][0]['url']
if url:
    url = url.replace('.webm', '.mp4')
print(url)
Output:
https://2ch.hk/b/src/263361969/16451225633240.mp4
The problem is you are telling BeautifulSoup to parse JSON data as HTML. You can get the URL you need more directly with the following code
import json
import requests
from bs4 import BeautifulSoup
req = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1')
data = json.loads(req.content)
my_url = data['response']['items'][0]['url']

beautifulsoup with bscscan api in python

I am trying to use BeautifulSoup with the bscscan API, but I don't know how to separate the data it gives me back. Could someone guide me?
from bs4 import BeautifulSoup
import requests
import pandas as pd
url_base = 'https://api.bscscan.com/api?module=stats&action=tokensupply&contractaddress='
contract = '0x6053b8FC837Dc98C54F7692606d632AC5e760488'
url_fin = '&apikey=YourApiKeyToken'
url = url_base+contract+url_fin
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')
totalsupply = soup.find('p').text
print(totalsupply)
The first part of the solution is almost identical to what you already have. Since you didn't need pandas, I've removed it. I've also changed the parser from lxml to html.parser.
from bs4 import BeautifulSoup
import requests
url_base = 'https://api.bscscan.com/api?module=stats&action=tokensupply&contractaddress='
contract = '0x6053b8FC837Dc98C54F7692606d632AC5e760488'
url_fin = '&apikey=YourApiKeyToken'
url = url_base+contract+url_fin
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
By now, if you print soup you will see something like this:
{"status":"1","message":"OK","result":"1289436"}
You may think that's a Python dictionary, but that's only its printed representation (__repr__ or __str__). You still can't extract keys and values as you would with a normal dictionary, because soup is an instance of bs4.BeautifulSoup. So parse that text as JSON and then save each of the three items as its own variable:
from operator import itemgetter
import json
d = json.loads(soup.get_text())
status, message, result = itemgetter('status', 'message', 'result')(d)
Now you will have status, message, and result each as a variable.
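As an aside, requests responses have a .json() method, so page.json() would give you this dict without any BeautifulSoup step. To make the dict-vs-soup distinction concrete, here is the exact payload shown above parsed into a real dict:

```python
import json

# the payload shown above, as bscscan returns it
payload = '{"status":"1","message":"OK","result":"1289436"}'

d = json.loads(payload)   # now a real dict, unlike the soup object
print(d['result'])
```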
What's noteworthy here is that if totalsupply is already a valid dictionary, you can simply skip the json.loads() step above:
from operator import itemgetter
status, message, result = itemgetter('status', 'message', 'result')(totalsupply)
# alternatively if you have guarantees on the order:
status, message, result = totalsupply.values()
One caveat: plain dicts are only guaranteed to preserve insertion order from Python 3.7 onward (CPython 3.6 as an implementation detail), so on older versions unpacking d.values() may give you the values in the wrong order. If you can't rule that out, I would suggest sticking to the itemgetter solution.
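A quick demonstration of the two unpacking styles on the same payload (values as shown above):

```python
from operator import itemgetter

d = {'status': '1', 'message': 'OK', 'result': '1289436'}

# itemgetter selects by key name, so it never depends on dict order
status, message, result = itemgetter('status', 'message', 'result')(d)

# .values() depends on insertion order (guaranteed from Python 3.7 on)
s2, m2, r2 = d.values()

print(status, result)
```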

Urllib2 get specific element

I have a web page and I want to get the <div class="password"> element using urllib2 in Python without using Beautiful Soup.
My code so far:
import urllib.request as urllib2
link = "http://www.chiquitooenterprise.com/password"
response = urllib2.urlopen('http://www.chiquitooenterprise.com/')
contents = response.read('password')
It gives an error.
You need to decode() the response with utf-8 as it states in the Network tab:
Hence:
import urllib.request as urllib2
link = "http://www.chiquitooenterprise.com/password"
response = urllib2.urlopen('http://www.chiquitooenterprise.com/')
output = response.read().decode('utf-8')
print(output)
OUTPUT:
YOIYEDGXPU
You don't want bs4 you say but you could use requests
import requests
r = requests.get('http://www.chiquitooenterprise.com/password')
print(r.text)

bs4 object of type 'Response' has no len()

I've been trying to get this to work, but keep getting the same TypeError: object of type 'Response' has no len(). The BeautifulSoup documentation hasn't been any help. This seems to work in every tutorial I watch and read, but not for me. What am I doing wrong?
import requests
from bs4 import BeautifulSoup
http = requests.get("https://www.imdb.com/title/tt0366627/?ref_=nv_sr_1")
print(http)
This returns Response [200], but if I try to add soup... I get the len error:
import requests
from bs4 import BeautifulSoup
http = requests.get("https://www.imdb.com/title/tt0366627/?ref_=nv_sr_1")
soup = BeautifulSoup(http, 'lxml')
print(soup)
As the docs say:
To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:
A Response object is neither a string nor an open filehandle.
The simplest way to get one of the two, as shown in the first example in the requests docs, is the .text attribute. So:
http = requests.get("https://www.imdb.com/title/tt6738136/?ref_=inth_ov_tt")
soup = BeautifulSoup(http.text, 'lxml')
For other options see Response Content—e.g., you can get the bytes with .content to let BeautifulSoup guess at the encoding instead of reading it from the headers, or get the socket (which is an open filehandle) with .raw.
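The practical difference between .text and .content is just str versus bytes; a tiny stdlib-only illustration (no live Response involved):

```python
# what a Response holds, sketched with plain data
content = b'<title>Hulk (2003)</title>'   # r.content -> raw bytes
text = content.decode('utf-8')            # r.text -> those bytes decoded per the headers

# BeautifulSoup accepts either: a str is used as-is,
# while bytes make it guess the encoding itself
print(text)
```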
My final code. It just prints out the title, year and summary, which was all I wanted. Thank all of you for your help.
import requests
import lxml
from bs4 import BeautifulSoup
http = requests.get("https://www.imdb.com/title/tt0366627/?ref_=nv_sr_1")
soup = BeautifulSoup(http.content, 'lxml')
title = soup.find("div", class_="title_wrapper").find("h1")
summary = soup.find(class_="summary_text")
print(title.text)
print(summary.text)
The Response [200] that you are getting from the following code:
import requests
from bs4 import BeautifulSoup
http = requests.get("https://www.imdb.com/title/tt6738136/?ref_=inth_ov_tt")
print(http)
shows that your request succeeded and returned a response. In order to parse the HTML code there are two ways:
Directly print the text/string format
import requests
from bs4 import BeautifulSoup
http = requests.get("https://www.imdb.com/title/tt6738136/?ref_=inth_ov_tt")
print(http.text)
Use an HTML parser
import requests
from bs4 import BeautifulSoup
http = requests.get("https://www.imdb.com/title/tt6738136/?ref_=inth_ov_tt")
soup = BeautifulSoup(http.text, 'lxml')
print(soup)
It is better to use BeautifulSoup, as it will allow you to extract the required data from the HTML in case you need it.

python3: Read json file from url

In python3, I want to load this_file, which is a json format.
Basically, I want to do something like [pseudocode]:
>>> read_from_url = urllib.some_method_open(this_file)
>>> my_dict = json.load(read_from_url)
>>> print(my_dict['some_key'])
some value
You were close:
import requests
import json
response = json.loads(requests.get("your_url").text)
Just use json and requests modules:
import requests, json
content = requests.get("http://example.com")
data = json.loads(content.content)  # don't name this variable `json`, or it shadows the module
Or using the standard library:
from urllib.request import urlopen
import json
data = json.loads(urlopen(url).read().decode("utf-8"))
So you want to be able to reference specific values by their keys? If I understand what you want to do, this should help you get started. You will need the libraries json and bs4 (just pip install beautifulsoup4); on Python 3, urllib2 has been replaced by urllib.request.
import json
import urllib.request
from bs4 import BeautifulSoup
url = urllib.request.urlopen("https://www.govtrack.us/data/congress/113/votes/2013/s11/data.json")
content = url.read()
soup = BeautifulSoup(content, "html.parser")
newDictionary = json.loads(str(soup))
I used a commonly used url to practice with.
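BeautifulSoup is actually unnecessary here: the object urlopen() returns is file-like, so json.load() can read from it directly. Sketched on an in-memory file-like object so it runs offline (the field names below are made up):

```python
import io
import json

# stand-in for what urlopen(url) returns: a file-like object yielding bytes
fake_response = io.BytesIO(b'{"vote_id": "s11-113.2013", "category": "passage"}')

# json.load reads straight from any file-like object; no soup step needed
data = json.load(fake_response)
print(data['vote_id'])
```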
