python3: Read json file from url

In Python 3, I want to load this_file, which is in JSON format.
Basically, I want to do something like this [pseudocode]:
>>> read_from_url = urllib.some_method_open(this_file)
>>> my_dict = json.load(read_from_url)
>>> print(my_dict['some_key'])
some value

You were close:
import requests
import json
response = json.loads(requests.get("your_url").text)

Just use the json and requests modules:
import requests, json
content = requests.get("http://example.com")
# Do not name the result "json": that would shadow the json module
data = json.loads(content.content)

Or using the standard library:
from urllib.request import urlopen
import json
data = json.loads(urlopen(url).read().decode("utf-8"))
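
The standard-library approach can be wrapped in a small helper. This is only a sketch: the function name load_json_from_url and the 10-second timeout are assumptions made for illustration, and it falls back to UTF-8 when the server does not declare a charset.

```python
import json
from urllib.request import urlopen


def load_json_from_url(url, timeout=10):
    """Fetch `url` and return the parsed JSON document (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Prefer the charset declared in the Content-Type header, else UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

Calling load_json_from_url(this_file) should then return a regular dict, so my_dict['some_key'] works as in the pseudocode from the question.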

So you want to look up specific values by key? If I understand what you want to do, this should help you get started. You will need urllib.request and json from the standard library, plus bs4, which you can install with pip.
from urllib.request import urlopen
import json
from bs4 import BeautifulSoup
url = urlopen("https://www.govtrack.us/data/congress/113/votes/2013/s11/data.json")
content = url.read()
soup = BeautifulSoup(content, "html.parser")
newDictionary = json.loads(str(soup))
I used a commonly used URL to practice with.

Related

How to get the url from extracted information from a website

So basically I am stuck: I don't know how to get the URL out of the data extracted from a website.
Here is my code:
import requests
from bs4 import BeautifulSoup
req = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1')
soup = BeautifulSoup(req.content, "html.parser")
print(soup.prettify())
I get a lot of information in the output, but the only thing I need is the URL. I hope someone can help me.
P.S:
It gives me this information:
{"response":{"items":[{"url":"https:\/\/2ch.hk\/b\/src\/262671212\/16440825183970.webm","type":"video\/webm","filesize":"20259","width":1280,"height":720,"name":"1521967932778.webm","board":"b","thread":"262671212"},{"url":"https:\/\/2ch.hk\/b\/src\/261549765\/16424501976450.webm","type":"video\/webm","filesize":"12055","width":1280,"height":720,"name":"1526793203110.webm","board":"b","thread":"261549765"}...
But I only need this part out of everything:
https:\/\/2ch.hk\/b\/src\/261549765\/16424501976450.webm (Not exactly this url, but just as an example)

You can do it this way:
url_array = []
# The body is JSON, so index the parsed JSON rather than the soup object
for item in req.json()['response']['items']:
    url_array.append(item['url'])
If the API returns JSON data, it is better to parse it directly.

The URL returns JSON data. BeautifulSoup is an HTML parser and cannot extract values from JSON; to read the JSON, you can follow this example.
import requests
data = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1').json()
url = data['response']['items'][0]['url']
if url:
    url = url.replace('.webm', '.mp4')
print(url)
Output:
https://2ch.hk/b/src/263361969/16451225633240.mp4

The problem is that you are telling BeautifulSoup to parse JSON data as HTML. You can get the URL you need more directly with the following code:
import json
import requests
req = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1')
data = json.loads(req.content)
my_url = data['response']['items'][0]['url']
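
To collect every URL rather than only the first one, something like the following might work. It is a minimal sketch: extract_urls is a hypothetical helper name, and SAMPLE is a trimmed copy of the response excerpt from the question, not a live API call.

```python
import json


def extract_urls(payload):
    """Return the 'url' of every item under payload['response']['items']."""
    items = payload.get("response", {}).get("items", [])
    return [item["url"] for item in items if "url" in item]


# Trimmed copy of the response shown in the question (raw string keeps the \/ escapes)
SAMPLE = r'''{"response":{"items":[
  {"url":"https:\/\/2ch.hk\/b\/src\/262671212\/16440825183970.webm","name":"1521967932778.webm"},
  {"url":"https:\/\/2ch.hk\/b\/src\/261549765\/16424501976450.webm","name":"1526793203110.webm"}
]}}'''

for url in extract_urls(json.loads(SAMPLE)):
    print(url)
```

Note that the \/ sequences in the raw response are just JSON escapes; once the text is parsed, the strings contain plain forward slashes.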

Urllib2 get specific element

I have a web page and I want to get the <div class="password"> element using urllib2 in Python, without using Beautiful Soup.
My code so far:
import urllib.request as urllib2
link = "http://www.chiquitooenterprise.com/password"
response = urllib2.urlopen('http://www.chiquitooenterprise.com/')
contents = response.read('password')
It gives an error.

You need to decode() the response as UTF-8, the encoding shown for the response in the browser's Network tab. Also note that read() takes an optional byte count, not a string, and that you should open the link you defined.
Hence:
import urllib.request as urllib2
link = "http://www.chiquitooenterprise.com/password"
response = urllib2.urlopen(link)
output = response.read().decode('utf-8')
print(output)
OUTPUT:
YOIYEDGXPU

You say you don't want bs4, but you could use requests:
import requests
r = requests.get('http://www.chiquitooenterprise.com/password')
print(r.text)
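
If the page wraps the value in a <div class="password"> element, as the question describes, the standard library's html.parser can pull it out without Beautiful Soup. This is a rough sketch under that assumption; DivTextExtractor and extract_div_text are names made up for illustration, and the HTML string in the usage note is invented.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text found inside <div> elements with a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0    # greater than 0 while inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside the target
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_div_text(html, css_class):
    parser = DivTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()
```

For example, extract_div_text('<div class="password">secret</div>', "password") returns "secret"; the decoded output of response.read() could be passed in the same way.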

api with python to give dictionary

from urllib.request import urlopen
from bs4 import BeautifulSoup
apikey='*****d2deb67f650f022ae13d07*****'
first='http://api.ipstack.com/'
ip='134.201.250.155'
third='?access_key='
print(first+ip+third+apikey)
html=urlopen(first+ip+third+apikey)
soup=BeautifulSoup(html,"html.parser")
print(soup)
I had to hide the first and last 5 digits of my API key. Anyway, this gives:
{"ip":"134.201.250.155","type":"ipv4","continent_code":"NA","continent_name":"North America","country_code":"US","country_name":"United States","region_code":"CA","region_name":"California","city":"La Jolla","zip":"92037","latitude":32.8455,"longitude":-117.2521,"location":{"geoname_id":5363943,"capital":"Washington D.C.","languages":[{"code":"en","name":"English","native":"English"}],"country_flag":"http:\/\/assets.ipstack.com\/flags\/us.svg","country_flag_emoji":"\ud83c\uddfa\ud83c\uddf8","country_flag_emoji_unicode":"U+1F1FA U+1F1F8","calling_code":"1","is_eu":false}}
This gives me a soup object. What do I need to add to get country_name, geoname_id, and ip into a list so I can write them to a .json file later?

This looks like a JSON response.
You need to parse it with the json library:
import json
parsed_json = json.loads(str(soup))
geoname_id = parsed_json['location']['geoname_id']
country_name = parsed_json['country_name']
ip = parsed_json['ip']
A better solution when dealing with REST APIs that return JSON responses would be:
import requests
apikey='*****d2deb67f650f022ae13d07*****'
first='http://api.ipstack.com/'
ip='134.201.250.155'
# Let requests build the query string; no need to append '?access_key=' by hand
query_string = {'access_key': apikey}
res = requests.get(first + ip, params=query_string)
res.raise_for_status()
ip = res.json()['ip']
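
To write the three requested values to a .json file afterwards, a sketch along these lines may help. The helper names select_fields and save_as_json, and the output file name in the usage note, are assumptions for illustration only.

```python
import json


def select_fields(record):
    """Keep only the fields the question asks for (hypothetical helper)."""
    return {
        "ip": record["ip"],
        "country_name": record["country_name"],
        # geoname_id is nested under "location" in the response shown above
        "geoname_id": record["location"]["geoname_id"],
    }


def save_as_json(data, path):
    """Write `data` to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
```

With the parsed response in hand, save_as_json(select_fields(res.json()), "ip_info.json") would store just those three keys.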

The response is JSON rather than HTML, so the soup object has no ip tag to read; parse the text it holds as JSON instead:
import json
soup = BeautifulSoup(html, "html.parser")
print(json.loads(soup.get_text())["ip"])
>>> 134.201.250.155
Let me know if you need further help!

python beautiful soup import urls

I am trying to import a list of URLs and grab pn2 and main1 from each page. It runs fine when I hard-code a URL instead of importing the file, so I know the scraping works, but I have no idea how to handle the import. Here is my most recent attempt, and below it is a small portion of the URLs. Thanks in advance.
import urllib
import urllib.request
import csv
from bs4 import BeautifulSoup
csvfile = open("ecco1.csv")
csvfilelist = csvfile.read()
theurl="csvfilelist"
soup = BeautifulSoup(theurl,"html.parser")
for row in csvfilelist:
    for pn in soup.findAll('td',{"class":"productText"}):
        pn2.append(pn.text)
    for main in soup.find_all('div',{"class":"breadcrumb"}):
        main1 = main.text
    print (main1)
    print ('\n'.join(pn2))
Urls:
http://www.eccolink.com/products/productresults.aspx?catId=2458
http://www.eccolink.com/products/productresults.aspx?catId=2464
http://www.eccolink.com/products/productresults.aspx?catId=2435
http://www.eccolink.com/products/productresults.aspx?catId=2446
http://www.eccolink.com/products/productresults.aspx?catId=2463

From what I see, you are opening a CSV file and passing its contents to BeautifulSoup.
That is not the way to do it.
BeautifulSoup parses HTML documents, not CSV.
Your code looks correct if you were passing HTML to bs4.
from bs4 import BeautifulSoup
import requests
links = []
html = requests.get('http://www.example.com')
# Pass the response text, not the Response object, to the parser
soup = BeautifulSoup(html.text, 'html.parser')
# Open the file for writing; "with" closes it automatically
with open('links.txt', 'w') as file:
    for x in soup.find_all('a', {"class": "abc"}):
        links.append(x)
        file.write(str(x) + '\n')
Above is a very basic implementation of how to find a target element in the HTML and write it to a file or append it to a list. Use Requests rather than urllib; it is a more modern library with a simpler API.
If you want to read your input from a CSV file, your best option is the reader from the csv module.
Hope that helps.
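
Reading the URL list with the csv module could look like this. It is a sketch that assumes one URL per row in the first column of the file; read_urls is a hypothetical helper name.

```python
import csv


def read_urls(csv_path):
    """Return the first column of every non-empty row as a list of URLs."""
    urls = []
    with open(csv_path, newline="") as fh:
        for row in csv.reader(fh):
            # Skip blank lines and rows whose first cell is empty
            if row and row[0].strip():
                urls.append(row[0].strip())
    return urls
```

Each URL returned by read_urls("ecco1.csv") can then be fetched inside a loop, with the response text passed to BeautifulSoup, so the productText and breadcrumb lookups run once per page rather than once on the CSV text.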

BeautifulSoup HTMLParseError

New to Python, have a simple, situational question:
Trying to use BeautifulSoup to parse a series of pages.
from bs4 import BeautifulSoup
import urllib.request
BeautifulSoup(urllib.request.urlopen('http://bit.ly/'))
Traceback ...
html.parser.HTMLParseError: expected name token at '<!=KN\x01...
Working on Windows 7 64-bit with Python 3.2.
Do I need Mechanize? (which would entail Python 2.X)

If that URL is correct, you're asking why an HTML parser throws an error parsing an MP3 file. I believe the answer to this to be self-evident...
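
One way to avoid handing a binary file to an HTML parser is to check the Content-Type header first. The following is a sketch using only the standard library; fetch_if_html, HTML_TYPES, and the 10-second timeout are assumptions for illustration, and it trusts whatever type the server reports.

```python
from urllib.request import urlopen

# Media types treated as parseable HTML (an assumption; extend as needed)
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def fetch_if_html(url, timeout=10):
    """Return the decoded body if the server reports HTML, otherwise None."""
    with urlopen(url, timeout=timeout) as response:
        if response.headers.get_content_type() not in HTML_TYPES:
            return None  # e.g. audio/mpeg for an MP3 file
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Only a non-None result would then be passed on to BeautifulSoup.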

If you were trying to download that MP3, you could do something like this:
import urllib.request
BLOCK_SIZE = 16 * 1024
req = urllib.request.urlopen("http://bit.ly/xg7enD")
# Make sure to write as a binary file
fp = open("someMP3.mp3", 'wb')
try:
    while True:
        data = req.read(BLOCK_SIZE)
        if not data:
            break
        fp.write(data)
finally:
    fp.close()

If you want to download a file in Python, you can also use this:
import urllib.request
urllib.request.urlretrieve("http://bit.ly/xg7enD", "myfile.mp3")
It will save the file in the current working directory under the name "myfile.mp3".
I am able to download all types of files with it.
Hope it helps!

Instead of urllib.request, I suggest using requests and its get() function:
from requests import get
from bs4 import BeautifulSoup
soup = BeautifulSoup(
    get(url="http://www.google.com").content,
    'html.parser'
)
