I'm trying to get data from FlightRadar24 using the script below, based on this answer to handle cookies. When I type that URL into a browser, I get a nice long JSON dictionary including a list of lat/long/alt updates. But when I try the code below, I get the error message shown after it.
What do I need to do to successfully read the json into python?
NOTE: that link may stop working in a week or two - they don't make the data available forever.
import urllib2
import cookielib
jar = cookielib.FileCookieJar("cookies")
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
url = "http://lhr.data.fr24.com/_external/planedata_json.1.3.php?f=72c5ef5"
response = opener.open(url)
print response.headers
print "Got page"
print "Currently have %d cookies" % len(jar)
print jar
Traceback (most recent call last):
  File "[mypath]/test v00.py", line 8, in <module>
    response = opener.open(link)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
I am not sure what you need the cookies for, but the issue is that the webserver is blocking the default user agent that urllib2 sends in the request header (something like 'Python-urllib/2.7').
You should add a valid browser User-Agent to the headers to get the correct data. Example -
import urllib2
url = "http://lhr.data.fr24.com/_external/planedata_json.1.3.php?f=72c5ef5"
req = urllib2.Request(url, headers={"Connection":"keep-alive", "User-Agent":"Mozilla/5.0"})
response = urllib2.urlopen(req)
jsondata = response.read()
The first answer, by @AnandSKumar, is the accepted answer, but here are a few more lines that are helpful, since jsondata = response.read() returns a string.
NOTE: that link may stop working in a week or two - they don't make the data available forever.
import urllib2
import json
import numpy as np
import matplotlib.pyplot as plt
# FROM this question: https://stackoverflow.com/a/32163003
# and THIS ANSWER: https://stackoverflow.com/a/32163003/3904031
# and a little from here: https://stackoverflow.com/a/6826511
url = "http://lhr.data.fr24.com/_external/planedata_json.1.3.php?f=72c5ef5"
req = urllib2.Request(url, headers={"Connection":"keep-alive", "User-Agent":"Mozilla/5.0"})
response = urllib2.urlopen(req)
the_dict = json.loads(response.read())
trail = the_dict['trail']
trailarray = np.array(trail)
s0, s1 = len(trailarray) // 3, 3   # integer division, so this also works in Python 3
lat, lon, alt = trailarray[:s0*s1].reshape(s0,s1).T
alt *= 10. # they drop the last zero
# plot raw data of the trail. Note there are gaps - no time information here
plt.figure()
plt.subplot(2,2,1)
plt.plot(lat)
# (matplotlib holds the current axes by default, so no plt.hold() call is needed)
plt.plot(lon)
plt.title('raw lat lon')
plt.subplot(2,2,3)
plt.plot(alt)
plt.title('raw alt')
plt.subplot(1,2,2)
plt.plot(lon, lat)
plt.title('raw lat vs lon')
plt.text(-40, 46, "this segment is")
plt.text(-40, 45.5, "transatlantic")
plt.text(-40, 45, "gap in data")
plt.savefig('raw lat lon alt')
plt.show()
To convert the time and date info to human-readable form:

import datetime

def humanize(seconds_since_epoch):
    """ from https://stackoverflow.com/a/15953715/3904031 """
    return datetime.datetime.fromtimestamp(seconds_since_epoch).strftime('%Y-%m-%d %H:%M:%S')

humanize(the_dict['arrival'])
returns
'2015-08-20 17:43:50'
In a Python 2.7 shell I ran the following:

>>> from googlefinance import getQuotes
>>> import json
>>> from urllib2 import urlopen
>>> print json.dumps(getQuotes('AAPL'), indent=2)
I got an error message on the 4th command, as follows:
Traceback (most recent call last):
  Python Shell, prompt 3, line 1
  File "C:\Users\mlashkar\_development\python\v2.7\Lib\site-packages\googlefinance\__init__.py", line 70, in getQuotes
    content = json.loads(request(symbols))
  File "C:\Users\mlashkar\_development\python\v2.7\Lib\site-packages\googlefinance\__init__.py", line 33, in request
    resp = urlopen(req)
  File "C:\Users\mlashkar\_development\python\v2.7\Lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\mlashkar\_development\python\v2.7\Lib\urllib2.py", line 435, in open
    response = meth(req, response)
  File "C:\Users\mlashkar\_development\python\v2.7\Lib\urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\mlashkar\_development\python\v2.7\Lib\urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "C:\Users\mlashkar\_development\python\v2.7\Lib\urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "C:\Users\mlashkar\_development\python\v2.7\Lib\urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
Not sure what's going on.
It seems like Google Finance modified their URLs/endpoints and the googlefinance package has not been updated to reflect the change.
Since most of these changes are rather opaque to end-users (and the library you're using hasn't been updated in 2 years), you might have better luck dealing with the raw Google Finance response yourself.
The Google Finance Endpoint
You can retrieve information about a particular ticker symbol via the following URL:
https://finance.google.com/finance?output=json&q=TICKER_SYMBOL
The Response
Google Finance returns JSON results in this format:
\n// [\n{\n"symbol" : "AAPL",\n"exchange" : "NASDAQ",\n"id": "22144",\n"t"
: "AAPL",\n"e" : "NASDAQ",\n"name" : "Apple Inc."\n, "f_reuters_url" :
"http:\\x2F\\x2Fstocks.us.reuters.com\\x2Fstocks\\x2Fratios.asp?rpc=66\\x26symbol=AAPL.O",\n"f_recent_quarter_date" : "Q3 (Jul \\x2717)",\n"f_annual_date" : "2016",\n"f_ttm_date" : "2015",\n"financials" :
... a lot more stuff ...
[\n]\n}]\n'
It can't be loaded by Python's JSON parser as-is, because it has a leading // and wraps everything inside [ ]. It also has Unicode-escaped characters in various strings that need to be decoded.
Complete code and parsing
I'm going to use the requests module for this, but if you want an example with the built-in urllib module, I can show that as well.
import json
import requests

rsp = requests.get('https://finance.google.com/finance?q=AAPL&output=json')
if rsp.status_code in (200,):
    # This magic here is to cut out various leading characters from the JSON
    # response, as well as trailing stuff (a terminating ']\n' sequence), and then
    # we decode the escape sequences in the response.
    # This then allows you to load the resulting string with the JSON module.
    fin_data = json.loads(rsp.content[6:-2].decode('unicode_escape'))

    # print out some quote data
    print('Opening Price: {}'.format(fin_data['op']))
    print('Price/Earnings Ratio: {}'.format(fin_data['pe']))
    print('52-week high: {}'.format(fin_data['hi52']))
    print('52-week low: {}'.format(fin_data['lo52']))
This would output:
Opening Price: 162.71
Price/Earnings Ratio: 18.43
52-week high: 164.94
52-week low: 102.53
There is a lot more data that's included in a full ticker JSON than what I'm outputting, so it's up to you to decide how you want to use any of it.
Alternatives
Alternatively, you could use the yahoo-finance module, which is probably less likely to have issues like this, as Yahoo still provides a real finance API; a short sketch follows.
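For illustration, a minimal sketch with the yahoo-finance package (assuming its Share interface still works; that package can itself lag behind upstream API changes):

from yahoo_finance import Share

apple = Share('AAPL')
print(apple.get_price())  # last trade price
print(apple.get_open())   # opening price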
If you are using Python 3.6 or 2.7, try using:
Quandl https://www.quandl.com/
Use the WIKI database; it seems to be stable.
Example:

import quandl
Apple = quandl.get('WIKI/AAPL', start_date="2016-12-31", end_date="")
Time series docs:
https://docs.quandl.com/docs/time-series-2
If you make more than 50 requests, Quandl requires a key (free to use); setting it is sketched below.
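A minimal sketch of setting the key, assuming the quandl package's ApiConfig interface (the key string is a placeholder):

import quandl

quandl.ApiConfig.api_key = "YOUR_API_KEY"  # placeholder; use your own free key
Apple = quandl.get('WIKI/AAPL', start_date="2016-12-31")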
Multiple stock details work on this endpoint with the Google stock ID:
https://finance.google.com/finance/data?dp=mra&output=json&catid=all&cid=13564339,5904015
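A sketch of fetching that endpoint with requests, assuming it needs the same leading-comment cleanup as the single-quote endpoint above (the exact stripping may need adjusting for this payload):

import json
import requests

url = ('https://finance.google.com/finance/data'
       '?dp=mra&output=json&catid=all&cid=13564339,5904015')
rsp = requests.get(url)
raw = rsp.content.decode('unicode_escape')
# strip any leading newline/comment characters before handing it to json
data = json.loads(raw.lstrip('\n/ '))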
All,
I have a script in place which fetches JSON off of a webserver. It's as simple as the following:
url = "foo.com/json"
response = requests.get(url).content
data = json.loads(response)
but I noticed that sometimes, instead of returning the JSON object, it returns what looks like a response dump. See here: https://pastebin.com/fUy5YMuY
What confuses me is how to continue on.
Right now I took the above Python and wrapped it:
import sys

try:
    url = "foo.com/json"
    response = requests.get(url).content
    data = json.loads(response)
except Exception as ex:
    with open("test.txt", "w") as t:
        t.write(response)
    print("Error", sys.exc_info())
Is there a way to catch this? Right now I get a ValueError... and then reparse it? I was thinking of doing something like:

except Exception as ex:
    response = reparse(response)

but I'm still confused as to why it will sometimes return the JSON and other times the header info plus content.

def reparse(response):
    """
    Catch the ValueError and attempt to reparse the response for the JSON content.
    """

Can I feed something like the pastebin dump into some sort of requests.Response class or similar?
Edit: Here is the full stack trace I am getting.
File "scrape_people_by_fcc_docket.py", line 82, in main
json_data = get_page(limit, page*limit)
File "scrape_people_by_fcc_docket.py", line 13, in get_page
data = json.loads(response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 16 column 367717 (char 3 - 368222)
In the above code, the response variable is defined by:
response = requests.get(url).content
which is odd because most of the time response will return a JSON object that is completely parsable.
Ideally, I have been trying to find a way to detect when the content isn't JSON, somehow parse it for the actual content, and then continue on.
Instead of using .text or .content, you can use the response method .json(), which so far seems to resolve my issues. I am doing continual testing, watching for errors, and will update this as needed, but it seems that the json() method returns the data I need without headers, and it already calls json.loads (or similar) to parse the information.
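For reference, a minimal sketch of that approach (the URL is a placeholder; raise_for_status() just surfaces HTTP errors early):

import requests

url = "http://foo.com/json"  # placeholder endpoint
resp = requests.get(url)
resp.raise_for_status()  # raise for 4xx/5xx instead of trying to parse an error page
data = resp.json()       # parses the body, raising ValueError if it isn't JSON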
I am taking a Udacity course on Python where we are supposed to check for profane words in a document. I am using the website http://www.wdylike.appspot.com/?q= (text_to_be_checked_for_profanity). The text to be checked can be passed as a query string in the above URL, and the website returns true or false after checking for profane words. Below is my code.
import urllib.request

# Read the content from a document
def read_content():
    quotes = open("movie_quotes.txt")
    content = quotes.read()
    quotes.close()
    check_profanity(content)

def check_profanity(text_to_read):
    connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q=" + text_to_read)
    result = connection.read()
    print(result)
    connection.close()

read_content()
It gives me the following error
Traceback (most recent call last):
  File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 21, in <module>
    read_content()
  File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 11, in read_content
    check_profanity(content)
  File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 16, in check_profanity
    connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
The document that I am trying to read contains the string "Hello world". However, if I change the string to "Hello+world", the same code works and returns the desired result. Can someone explain why this is happening, and what is a workaround?
urllib accepts it, but the server doesn't. And well it should not, because a space is not a valid URL character.
Escape your query string properly with urllib.parse.quote_plus(); it'll ensure your string is valid for use in query parameters. Or better still, use the urllib.parse.urlencode() function to encode all key-value pairs:

import urllib.request
from urllib.parse import urlencode

params = urlencode({'q': text_to_read})
connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?" + params)
The response below is for Python 3.x.
A 400 Bad Request occurs when there is a space in your input text.
To avoid this, use parse, so import it:

from urllib import request, parse

If you are sending any text along with the URL, then parse the text:

url = "http://www.wdylike.appspot.com/?q="
url = url + parse.quote(input_to_check)
Check the explanation here - https://discussions.udacity.com/t/problem-in-profanity-with-python-3-solved/227328
The Udacity profanity checker program -

from urllib import request, parse

def read_file():
    fhand = open(r"E:\Python_Programming\Udacity\movie_quotes.txt")
    file_content = fhand.read()
    # print(file_content)
    fhand.close()
    profanity_check(file_content)

def profanity_check(input_to_check):
    url = "http://www.wdylike.appspot.com/?q="
    url = url + parse.quote(input_to_check)
    req = request.urlopen(url)
    answer = req.read()
    # print(answer)
    req.close()
    if b"true" in answer:
        print("Profanity Alert!!!")
    else:
        print("Nothing to worry about")

read_file()
I think this code is closer to what the lesson was aiming at, illustrating the difference between built-in functions, classes, and functions inside classes:
from urllib import request, parse

def read_text():
    quotes = open('C:/Users/Alejandro/Desktop/movie_quotes.txt', 'r+')
    contents_of_file = quotes.read()
    print(contents_of_file)
    check_profanity(contents_of_file)
    quotes.close()

def check_profanity(text_to_check):
    connection = request.urlopen('http://www.wdylike.appspot.com/?q=' + parse.quote(text_to_check))
    output = connection.read()
    # print(output)
    connection.close()
    if b"true" in output:
        print("Profanity Alert!!!")
    elif b"false" in output:
        print("This document has no curse words!")
    else:
        print("Could not scan the document properly")

read_text()
I'm working on the same project, also using Python 3 like most people here.
While looking for a solution in Python 3, I found this HowTo and decided to give it a try.
It seems that on some websites, including Google, connections made from code (for example, via the urllib module) sometimes do not work properly. Apparently this has to do with the User-Agent, which the website receives when the connection is built.
I did some further research and came up with the following solution:
First I imported URLopener from urllib.request and created a class called ForceOpen as a subclass of URLopener.
Now I could fake a "regular" user agent by setting the variable version inside the ForceOpen class. Then I just created an instance of it and used its open method in place of urlopen to open the URL.
(It works fine, but I'd still appreciate comments, suggestions or any feedback, also because I'm not absolutely sure if this is a good alternative - many thanks)
from urllib.request import URLopener

class ForceOpen(URLopener):  # create a subclass of URLopener
    version = "Mozilla/5.0 (cmp; Konqueror ...)(Kubuntu)"

force_open = ForceOpen()  # create an instance of it

def read_text():
    quotes = open(
        "/.../profanity_editor/data/quotes.txt"
    )
    contents_of_file = quotes.read()
    print(contents_of_file)
    quotes.close()
    check_profanity(contents_of_file)

def check_profanity(text_to_check):
    # now use the open method to open the URL
    # (note: text_to_check should still be percent-encoded with
    # urllib.parse.quote if it can contain spaces)
    connection = force_open.open(
        "http://www.wdylike.appspot.com/?q=" + text_to_check
    )
    output = connection.read()
    connection.close()
    if b"true" in output:
        print("Attention! Curse word(s) have been detected.")
    elif b"false" in output:
        print("No curse word(s) found.")
    else:
        print("Error! Unable to scan document.")

read_text()
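For what it's worth, URLopener is deprecated in Python 3. A minimal sketch of the same user-agent idea using urllib.request.Request instead (the header value here is just an example string):

from urllib.request import Request, urlopen
from urllib.parse import quote

def check_profanity(text_to_check):
    # set a browser-like User-Agent on this one request
    req = Request(
        "http://www.wdylike.appspot.com/?q=" + quote(text_to_check),
        headers={"User-Agent": "Mozilla/5.0"},  # example value
    )
    with urlopen(req) as connection:
        return b"true" in connection.read()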
This error is hard to describe because I can't figure out how the loop is even affecting the readline() and readlines() methods. When I use the former, I get the unexpected Traceback errors below. When I use the latter, my code runs and nothing happens. I have determined that the bug is located in the first eight lines. The first few lines of the Topics.txt file are posted below.
Code
import requests
from html.parser import HTMLParser
from bs4 import BeautifulSoup

Url = "https://ritetag.com/best-hashtags-for/"
Topicfilename = "Topics.txt"
Topicfile = open(Topicfilename, 'r')
Line = Topicfile.readlines()
Linenumber = 0

for Line in Topicfile:
    Linenumber += 1
    print("Reading line", Linenumber)
    Topic = Line
    Newtopic = Topic.strip("\n").replace(' ', '').replace(',', '')
    print(Newtopic)
    Link = Url.join(Newtopic)
    print(Link)
    Sourcecode = requests.get(Link)
When I run this bit here, it prints the URL preceded by each character of the line. For example, it prints 2https://ritetag.com/best-hashtags-for/4https://ritetag.com/best-hashtags-for/Hhttps://ritetag.com/best-hashtags-for/ and so on for 24 Hour Fitness.
Topics.txt
21st Century Fox
24 Hour Fitness
2K Games
3M
Full Error
Reading line 1
24HourFitness
2https://ritetag.com/best-hashtags-for/4https://ritetag.com/best-hashtags-for/Hhttps://ritetag.com/best-hashtags-for/ohttps://ritetag.com/best-hashtags-for/uhttps://ritetag.com/best-hashtags-for/rhttps://ritetag.com/best-hashtags-for/Fhttps://ritetag.com/best-hashtags-for/ihttps://ritetag.com/best-hashtags-for/thttps://ritetag.com/best-hashtags-for/nhttps://ritetag.com/best-hashtags-for/ehttps://ritetag.com/best-hashtags-for/shttps://ritetag.com/best-hashtags-for/s
Traceback (most recent call last):
  File "C:\Users\Caden\Desktop\Programs\LususStudios\AutoDealBot\HashtagScanner.py", line 17, in <module>
    Sourcecode = requests.get(Link)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\api.py", line 71, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\api.py", line 57, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\sessions.py", line 579, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Python34\lib\site-packages\requests-2.10.0-py3.4.egg\requests\sessions.py", line 653, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '2https://ritetag.com/best-hashtags-for/4https://ritetag.com/best-hashtags-for/Hhttps://ritetag.com/best-hashtags-for/ohttps://ritetag.com/best-hashtags-for/uhttps://ritetag.com/best-hashtags-for/rhttps://ritetag.com/best-hashtags-for/Fhttps://ritetag.com/best-hashtags-for/ihttps://ritetag.com/best-hashtags-for/thttps://ritetag.com/best-hashtags-for/nhttps://ritetag.com/best-hashtags-for/ehttps://ritetag.com/best-hashtags-for/shttps://ritetag.com/best-hashtags-for/s'
I think there are two issues:
You seem to be iterating over Topicfile instead of the Lines you read with Topicfile.readlines().
Url.join(Newtopic) isn't returning what you think it is. .join takes an iterable (in this case a string, i.e. a sequence of characters) and inserts Url between each one.
Here is code with these problems addressed:
import requests

Url = "https://ritetag.com/best-hashtags-for/"
Topicfilename = "topics.txt"
Topicfile = open(Topicfilename, 'r')
Lines = Topicfile.readlines()
Linenumber = 0

for Line in Lines:
    Linenumber += 1
    print("Reading line", Linenumber)
    Topic = Line
    Newtopic = Topic.strip("\n").replace(' ', '').replace(',', '')
    print(Newtopic)
    Link = '{}{}'.format(Url, Newtopic)
    print(Link)
    Sourcecode = requests.get(Link)
As an aside, I also recommend using lowercase variable names, since CamelCase is generally reserved for class names in Python :)
Firstly, Python convention is to lowercase all variable names.
Secondly, you are exhausting the file pointer when you read all the lines at first and then continue to loop over the file.
Try simply opening the file, then looping over it:
linenumber = 0
with open("Topics.txt") as topicfile:
    for line in topicfile:
        # do work
        linenumber += 1
Then, for the issue in the traceback: if you look closely, you are building up a really long URL string, and that's definitely not a URL, so requests throws an error:
InvalidSchema: No connection adapters were found for '2https://ritetag.com/best-hashtags-for/4https://ritetag.com/...
And you can debug to see that Url.join(Newtopic) is "interleaving" the Url string between each character of the Newtopic string, which is what str.join does; see the short demo below.
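A minimal demonstration of that interleaving behavior (the base URL here is a hypothetical stand-in):

url = "https://example.com/"
print(url.join("24"))
# -> 2https://example.com/4   (url is inserted between '2' and '4')
print(url + "24")
# -> https://example.com/24   (plain concatenation, which is what was intended)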
I'm trying to test some requests code by providing a mocked-out TestAdapter to my session object. Basically, my code looks like this:
import json

import requests
import urllib3

URL = 'http://blahblahblah'

class TestAdapter(requests.adapters.HTTPAdapter):
    def __init__(self, response):
        self._response = response
        super(TestAdapter, self).__init__()

    def send(self, request, *args, **kwargs):
        return self.build_response(request, self._response)

results = []  # placeholder for the payload under test
resp = urllib3\
    .HTTPResponse(body=json.dumps({'results': results}), status=200,
                  headers={'content-type': 'application/json'})
adapter = TestAdapter(resp)
session = requests.Session()
session.mount(URL, adapter)
response = session.post(URL)
response.json()
However, this use raises an error from the depths of URLLib3:
Error
Traceback (most recent call last):
  File "tests/test_web_cache_client.py", line 40, in test_valid_reponse
    response.json()
  File "/home/wilner/.virtualenvs/hub/local/lib/python2.7/site-packages/requests/models.py", line 778, in json
    if not self.encoding and len(self.content) > 3:
  File "/home/wilner/.virtualenvs/hub/local/lib/python2.7/site-packages/requests/models.py", line 724, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/home/wilner/.virtualenvs/hub/local/lib/python2.7/site-packages/requests/models.py", line 653, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/home/wilner/.virtualenvs/hub/local/lib/python2.7/site-packages/urllib3/response.py", line 255, in stream
    while not is_fp_closed(self._fp):
  File "/home/wilner/.virtualenvs/hub/local/lib/python2.7/site-packages/urllib3/util/response.py", line 22, in is_fp_closed
    raise ValueError("Unable to determine whether fp is closed.")
ValueError: Unable to determine whether fp is closed.
I could definitely find another way of testing this stuff, but this seems like it should work. What am I doing wrong?
Betamax is a library that does something very similar to what you want. The way I do it in betamax can be found here (and is reproduced below for posterity):
def add_urllib3_response(serialized, response):
    if 'base64_string' in serialized['body']:
        body = io.BytesIO(
            base64.b64decode(serialized['body']['base64_string'].encode())
        )
    else:
        body = body_io(**serialized['body'])

    h = HTTPResponse(
        body,
        status=response.status_code,
        headers=response.headers,
        preload_content=False,
        original_response=MockHTTPResponse(response.headers)
    )
    response.raw = h
The important thing here is that body (the first parameter) to HTTPResponse must be an io.BytesIO object in every case. Also ensure you're using bytes: e.g., json.dumps({'results': results}).encode('utf-8') is what needs to be passed to io.BytesIO.
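Applied to the code in the question, a minimal sketch of the fix (keeping the TestAdapter from the question unchanged):

import io
import json

import urllib3

results = []  # placeholder payload, as in the question
resp = urllib3.HTTPResponse(
    body=io.BytesIO(json.dumps({'results': results}).encode('utf-8')),
    status=200,
    headers={'content-type': 'application/json'},
    preload_content=False,  # let requests stream the body from the BytesIO
)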
I've worked around this by setting requests.Response._content = requests.adapters.HTTPAdapter._body to avoid the erroneous fp check. I think this is a bug in the library.