Obtain elevation from latitude/longitude coordinates with a simple Python script

I have a Python script, taken from this question, that pulls data from the USGS Elevation Point Query Service. However, it keeps timing out and kicking me out after a seemingly random amount of time, before my query finishes. I need another method to pull elevation data given lat/lon coordinates.
Here is my current query:
import time
import urllib.parse
import requests
import pandas as pd

# ========= pull elev from usgs server ======
# USGS POINT QUERY SERVICE ==================
url = r'https://nationalmap.gov/epqs/pqs.php?'
# ===========================================

# coordinates with known elevation
lat = [48.633, 48.733, 45.1947, 45.1962]
lon = [-93.9667, -94.6167, -93.3257, -93.2755]

# create df
df = pd.DataFrame({
    'lat': lat,
    'lon': lon
})

def elevation_function(df, lat_column, long_column):
    elevations = []
    counter = 0
    start = time.time()
    for lat, lon in zip(df[lat_column], df[long_column]):
        # define rest query params
        params = {
            'output': 'json',
            'x': lon,
            'y': lat,
            'units': 'Meters'
        }
        # format query string and return query value
        result = requests.get((url + urllib.parse.urlencode(params)))
        elevations.append(result.json()['USGS_Elevation_Point_Query_Service']['Elevation_Query']['Elevation'])
        counter += 1
        print('Proportion of job complete: {}'.format(round(counter / df.shape[0], 3)))
        end = time.time()
        print(str(round(end - start)) + " seconds into job\n")
    df['elev'] = elevations
    return elevations

start = time.time()
count = 0
for i in range(100):
    count += 1
    elevations = elevation_function(df, lat_column='lat', long_column='lon')
end = time.time()
print(str(round(end - start)))

Streamline the function and add error handling:
- elevation_function needs to be written to work with pandas.DataFrame.apply
- Using apply, with axis=1, automatically iterates through each row of coordinates

New functions:
- make_remote_request will continue to make the request until it gets a response.
- Change the exception to fit the exception returned by the server (e.g. except (OSError, urllib3.exceptions.ProtocolError) as error)
- Optionally, import time and add time.sleep(5) before continue in the exception handler, to play nice with the remote server.
def make_remote_request(url: str, params: dict) -> requests.Response:
    """
    Makes the remote request
    Continues making attempts until it succeeds
    """
    count = 1
    while True:
        try:
            response = requests.get((url + urllib.parse.urlencode(params)))
        except (OSError, urllib3.exceptions.ProtocolError) as error:
            print('\n')
            print('*' * 20, 'Error Occurred', '*' * 20)
            print(f'Number of tries: {count}')
            print(f'URL: {url}')
            print(error)
            print('\n')
            count += 1
            continue
        break
    return response
def elevation_function(x):
    url = 'https://nationalmap.gov/epqs/pqs.php?'
    params = {'x': x[1],
              'y': x[0],
              'units': 'Meters',
              'output': 'json'}
    result = make_remote_request(url, params)
    return result.json()['USGS_Elevation_Point_Query_Service']['Elevation_Query']['Elevation']
Implement the function
import requests
import urllib.parse
import urllib3
import pandas as pd

# coordinates with known elevation
lat = [48.633, 48.733, 45.1947, 45.1962]
lon = [-93.9667, -94.6167, -93.3257, -93.2755]

# create df
df = pd.DataFrame({'lat': lat, 'lon': lon})

       lat      lon
0  48.6330 -93.9667
1  48.7330 -94.6167
2  45.1947 -93.3257
3  45.1962 -93.2755
# apply the function
df['elevations'] = df.apply(elevation_function, axis=1)

       lat      lon  elevations
0  48.6330 -93.9667      341.14
1  48.7330 -94.6167      328.80
2  45.1947 -93.3257      262.68
3  45.1962 -93.2755      272.64

You can also pass the params dict to requests directly instead of URL-encoding it yourself:
PARAMS = {'x': x[1], 'y': x[0], 'units': 'Feet', 'output': 'json'}
r = requests.get(url = URL, params = PARAMS)
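If the service keeps dropping connections, another option is to let requests handle the retries instead of a hand-rolled while loop. The sketch below uses requests.Session with urllib3's Retry via HTTPAdapter; the retry count, back-off factor, and timeout are assumptions to tune for your own runs, and the EPQS URL is the same one used above.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries: int = 5, backoff: float = 1.0) -> requests.Session:
    """Build a session that retries transient failures automatically (sketch)."""
    retry = Retry(total=retries,
                  backoff_factor=backoff,                 # wait 1s, 2s, 4s, ... between attempts
                  status_forcelist=[500, 502, 503, 504])  # also retry on these HTTP statuses
    session = requests.Session()
    session.mount('https://', HTTPAdapter(max_retries=retry))
    return session

session = make_session()

def get_elevation(lat: float, lon: float) -> float:
    params = {'x': lon, 'y': lat, 'units': 'Meters', 'output': 'json'}
    response = session.get('https://nationalmap.gov/epqs/pqs.php', params=params, timeout=30)
    response.raise_for_status()
    return response.json()['USGS_Elevation_Point_Query_Service']['Elevation_Query']['Elevation']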

Related

I need help with Python code to get the KDJ indicator values from the Kucoin API

I have a Python script which works. It connects to the Kucoin API, gets kline data, then performs calculations to return the values of K, D and J for the trading pair at the specified time interval. I then need to compare the values of K, D and J and see which one is higher, then open or close a position accordingly.
The problem with the code is that even though it does seem to get the data from Kucoin, the values of K, D and J never match the actual KDJ indicator values you can see in the Kucoin app. So I wanted to ask for help fixing up the code so that the values of K, D and J it returns are correct and similar to what we see in the app for the trading pair (e.g. BTC-USDT).
Here is the function called to get KDJ values:
def get_kdj_indicator(pair, tf):
    tf = str(tf) + 'min'
    try:
        now = datetime.datetime.utcnow()
        start_time = int((now - datetime.timedelta(hours=12)).timestamp())
        end_time = int(now.timestamp())
        response = client.get_kline_data(pair, tf, start_time, end_time)
        #print(response)
        df = pd.DataFrame(response, columns=['time', 'open', 'high', 'low', 'close', 'volume', 'amount'])
        # Convert columns to pandas float
        for column in ['open', 'high', 'low', 'close', 'volume', 'amount']:
            df[column] = df[column].astype(float)
        # Calculate KDJ indicator
        highs = df['high'].to_numpy()
        lows = df['low'].to_numpy()
        closes = df['close'].to_numpy()
        # Initialize KDJ parameters
        k_values = []
        d_values = []
        rsv_values = []
        k = 50
        d = 50
        # Calculate RSV values
        rsv_values = []
        for i in range(8, len(highs)):
            high_period = max(highs[i-8:i+1])
            low_period = min(lows[i-8:i+1])
            close = closes[i]
            if high_period == low_period:
                rsv = 0
            else:
                rsv = (close - low_period) / (high_period - low_period) * 100
            rsv_values.append(rsv)
        # Calculate K and D values
        for i in range(len(rsv_values)):
            if i == 0:
                k_values.append(k)
                d_values.append(d)
            else:
                k = 2/3 * k + 1/3 * rsv_values[i]
                d = 2/3 * d + 1/3 * k
                k_values.append(k)
                d_values.append(d)
        # Calculate J values
        j_values = [3*k - 2*d for k, d in zip(k_values, d_values)]
        j = j_values[-1]
        print("DEBUG get_kdj_indicator")
        # Return the KDJ values as a tuple
        return (k_values, d_values, j_values)
    except Exception as e:
        print(f"Error while getting KDJ indicator: {e}")
        return None
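For comparison, the same 2/3 and 1/3 smoothing can be expressed with pandas rolling windows and ewm, which is easier to check against charting apps. This is only a sketch of the conventional KDJ(9,3,3) calculation, not Kucoin's own implementation, and it assumes the candles in df are in chronological order (oldest first); if the API returns them newest-first, they need to be reversed before calculating.

def kdj_from_df(df, n=9):
    """Conventional KDJ(9,3,3) from a chronological OHLC DataFrame (sketch)."""
    low_n = df['low'].rolling(n, min_periods=1).min()
    high_n = df['high'].rolling(n, min_periods=1).max()
    rsv = (df['close'] - low_n) / (high_n - low_n).replace(0, float('nan')) * 100
    # ewm with com=2, adjust=False applies k = 2/3 * k_prev + 1/3 * rsv,
    # but it seeds from the first RSV instead of 50, so early values differ slightly
    k = rsv.ewm(com=2, adjust=False).mean()
    d = k.ewm(com=2, adjust=False).mean()
    j = 3 * k - 2 * d
    return k, d, j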
Here is how I connect to the Kucoin API:
from kucoin.client import Client
import time, csv, datetime
import numpy as np
import pandas as pd

trading_pair = 'BTC-USDT'
trade_value = 10
time_frame = 15

print("ENTERED KUCOIN SANDBOX API MODE")
api_key = '63eeXXXXXXXXXbfe4'
api_secret = '4166643-XXXXXX-XXXXXXX-3caABCDE17d'
api_passphrase = 'passerword'
try:
    client = Client(api_key, api_secret, api_passphrase, sandbox=True)
except Exception as e:
    print(f"Error while connecting to Kucoin client: {e}")
    exit()
Here is a sample of how we use the KDJ indicator function to open a position:
kdj = get_kdj_indicator(trading_pair, time_frame)
if kdj is not None:
    k, d, j = kdj[0], kdj[1], kdj[2]
    print(k[-1], d[-1], j[-1])
    if j[-1] >= k[-1] and k[-1] >= d[-1]:
        open_position(trading_pair, trade_value, stop_loss)
I previously tried to use the TA-Lib library but was unable to install it using pip, so I stopped using it. I thought that we could just send a query/request using the Kucoin API which would simply return the values of the K, D and J indicator at the time of the request for the specified trading pair, but I don't think they allow that.
I have tried hiring someone, but they were unable to calculate KDJ.
I am using the documentation and python-kucoin from this GitHub repository:
https://github.com/sammchardy/python-kucoin/blob/develop/kucoin/client.py

I have an error in my python code and I don't know how to debug it. Can you help me debug my code?

I am trying to write a script in order to:
- download posts from Twitter in a certain time period, in a certain city
- the posts have to be in CSV format
- the CSV has to have the columns: text, latitude, longitude, date, time, user name

Nonetheless, I get an error. Can you help me solve it?
This is the error:
runfile('C:/Users/Dottorandi01/Desktop/pynuovo/prove.py', wdir='C:/Users/Dottorandi01/Desktop/pynuovo')
https://api.twitter.com/2/tweets/search/all?max_results=500&tweet.fields=geo&expansions=attachments.media_keys&query=has:geo point_radius:[-76.522224 3.420556 1km]&start_time=2014-01-01T00:00:00.00Z&end_time=2015-01-12T00:00:00.00Z
Traceback (most recent call last):
  File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3621 in get_loc
    return self._engine.get_loc(casted_key)
  File pandas\_libs\index.pyx:136 in pandas._libs.index.IndexEngine.get_loc
  File pandas\_libs\index.pyx:163 in pandas._libs.index.IndexEngine.get_loc
  File pandas\_libs\hashtable_class_helper.pxi:5198 in pandas._libs.hashtable.PyObjectHashTable.get_item
  File pandas\_libs\hashtable_class_helper.pxi:5206 in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'author_id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ~\Desktop\pynuovo\prove.py:134 in <module>
    getTwitterPost(lat, lon, radius, start_time, end_time)
  File ~\Desktop\pynuovo\prove.py:51 in getTwitterPost
    df['username'] = df['author_id'].apply(lambda x: x['username'])
  File ~\anaconda3\lib\site-packages\pandas\core\frame.py:3505 in __getitem__
    indexer = self.columns.get_loc(key)
  File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3623 in get_loc
    raise KeyError(key) from err
KeyError: 'author_id'
This is the code:
#insert here academic token and query info
Token = "xxx" #insert here academic token (I have hidden my personal token here)
[lat, lon, radius, start_time, end_time] = ['3.420556', '-76.522224', '1', '2014-01-01T00:00:00.00Z', '2015-01-12T00:00:00.00Z']

#import modules
import requests
import pandas as pd
import time
import configparser

#define functions
def getTwitterPost(lat, lon, radius, start_time, end_time):
    query = 'point_radius:[' + lon + ' ' + lat + ' ' + radius + 'km]'
    baseUrl = "https://api.twitter.com/2/tweets/search/all?max_results=500&tweet.fields=geo&expansions=attachments.media_keys&query=has:geo "+query+"&start_time="+start_time+"&end_time="+end_time
    headers = {"Authorization": "Bearer "+Token}
    tweets = []
    print(baseUrl)
    # we do the first call to get the first next_token
    resp = requests.get(baseUrl, headers=headers).json()
    data = []
    if('data' in resp):
        data = resp['data']
        tweets = _mergeTweets(tweets, data)
        #print(data)
    # we iterate until no next_token (end of pagination)
    if 'meta' in resp and 'next_token' in resp['meta']:
        next_token = resp['meta']['next_token']
        while True:
            time.sleep(3) #api rate limit 300 per 15 min
            resp = requests.get(baseUrl+"&next_token="+next_token, headers=headers).json()
            if('data' in resp):
                data = resp['data']
                tweets = _mergeTweets(tweets, data)
            if('meta' in resp and 'next_token' in resp['meta']):
                next_token = resp['meta']['next_token']
            else:
                break
    df = pd.DataFrame(tweets)
    df['created_at'] = pd.to_datetime(df['created_at'])
    df['date'] = df['created_at'].dt.date
    df['time'] = df['created_at'].dt.time
    df['username'] = df['author_id'].apply(lambda x: x['username'])
    df['gender'] = df['author_id'].apply(lambda x: x['gender'] if 'gender' in x else None)
    df.to_csv('2014calitweets.csv', index=False)
    # split data in two dataframes: one with lat and lon and one without
    print('print df')
    print(df.columns)
    print(df)
    dfNOLatLon = df[df['longitude']=='']
    print('\n\nprint df no lat no lon')
    print(dfNOLatLon)
    dfLatLon = df[df['longitude']!='']
    print('\n\nprint df si lat si lon')
    print(dfLatLon)
    # create a distinct list of place_ids, then use the 1.1 api to get lat and lon
    placesToFind = list(set(dfNOLatLon['place_id'].to_list()))
    print('placesToFind')
    print(placesToFind)
    dfPlace = findLatLon(placesToFind, headers)
    print('print df Places')
    print(dfPlace)
    # drop empty columns and join the dataframe without lat and lon with the place information
    dfNOLatLon.drop(columns=['longitude', 'latitude'], inplace=True)
    print(dfNOLatLon.columns)
    print(dfNOLatLon)
    dfNOLatLon = dfNOLatLon.merge(dfPlace, on='place_id', how='left')
    print(dfNOLatLon.columns)
    print(dfNOLatLon)
    # concat and export to csv file
    dfFinal = pd.concat([dfLatLon, dfNOLatLon])
    print(dfFinal)
    dfFinal.to_csv('2014calitweets.csv')
# extract data from response and concat with stored tweets
def _mergeTweets(tweets, data):
    for el in data:
        tweet = {}
        try:
            tweet['created_at'] = el['created_at']
        except:
            tweet['created_at'] = ''
        try:
            tweet['text'] = el['text']
        except:
            tweet['text'] = ''
        # sometimes has:geo returns posts with only a place_id; we assign an empty string to filter in a second step
        if('geo' in el and 'coordinates' in el['geo'] and el['geo']['coordinates']['type'] == 'Point' and 'place_id' in el['geo']):
            tweet['longitude'] = str(el['geo']['coordinates']['coordinates'][0])
            tweet['latitude'] = str(el['geo']['coordinates']['coordinates'][1])
            tweet['place_id'] = el['geo']['place_id']
        elif('geo' in el and 'place_id' in el['geo']):
            tweet['place_id'] = el['geo']['place_id']
            tweet['longitude'] = ''
            tweet['latitude'] = ''
        tweets.append(tweet)
    return tweets
# create dataframe with place_id, latitude and longitude using twitter 1.1 api from list of place_id
def findLatLon(places, headers):
    placeList = []
    baseUrl = "https://api.twitter.com/1.1/geo/id/"
    print(places)
    step = int(70)
    print(len(places))
    for i in range(0, len(places), step):
        print(i)
        placesInfo = [requests.get(baseUrl+id+".json", headers=headers).json() for id in places[i:i+step]]
        for place in placesInfo:
            if('geometry' in place and 'type' in place['geometry'] and place['geometry']['type'] == "Point"):
                placeList.append({'place_id': place['id'], 'longitude': place['geometry']['coordinates'][0], 'latitude': place['geometry']['coordinates'][1]})
            else:
                # sometimes it returns a polygon, so we use the centroid coordinates
                placeList.append(
                    {'place_id': place['id'], 'longitude': place['centroid'][0], 'latitude': place['centroid'][1]})
        if len(places) > i + step:
            time.sleep(900) #api rate limit 70 per 15 mins
    return pd.DataFrame(placeList)

getTwitterPost(lat, lon, radius, start_time, end_time)
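For what it's worth, the KeyError arises because the request only asks for tweet.fields=geo, so the returned tweet objects never contain an author_id field and the df['author_id'] column does not exist. Below is a hedged sketch of how the request parameters would need to change to make author_id (and the username) available; the field names follow the Twitter API v2 conventions, but the mapping code is illustrative and not part of the original script.

# sketch: request author_id and expand it into user objects (illustrative, not the original code)
params = {
    'query': 'has:geo point_radius:[' + lon + ' ' + lat + ' ' + radius + 'km]',
    'max_results': 500,
    'tweet.fields': 'geo,created_at,author_id',
    'expansions': 'author_id,geo.place_id',
    'user.fields': 'username',
    'start_time': start_time,
    'end_time': end_time,
}
resp = requests.get('https://api.twitter.com/2/tweets/search/all', headers=headers, params=params).json()

# author_id is a plain id string, so the username has to come from the expanded includes
users = {u['id']: u['username'] for u in resp.get('includes', {}).get('users', [])}
for tweet in resp.get('data', []):
    tweet['username'] = users.get(tweet.get('author_id'), '')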

How to save the results of a function as a new CSV?

The code is required to take addresses from a CSV file and then use a function to compute the corresponding latitudes and longitudes. I get the correct latitudes and longitudes, but I am unable to save them to a new CSV file.
import requests
import urllib.parse
import pandas as pd

#function to get the Coordinates:
def lat_long(add):
    url = 'https://nominatim.openstreetmap.org/search/'+urllib.parse.quote(add)+'?format=json'
    response = requests.get(url).json()
    print(response[0]["lat"], response[0]["lon"])
    return

#function is called to get the 5 Address Values from the CSV File and pass on to the function
df = pd.read_csv('C:\\Users\\Umer Abbas\\Desktop\\lat_long.csv')
i = 0
print("Latitude","","Longitude")
for i in range (0,5):
    add = df._get_value(i, 'Address')
    lat_long(add)
Output is:
Latitude Longitude
34.0096961 71.8990106
34.0123846 71.5787458
33.6038766 73.048136
33.6938118 73.0651511
24.8546842 67.0207055
I want to save this output into a new file, but I am unable to get the results.
Just a small modification might help:
def lat_long(add):
    url = 'https://nominatim.openstreetmap.org/search/'+urllib.parse.quote(add)+'?format=json'
    response = requests.get(url).json()
    print(response[0]["lat"], response[0]["lon"])
    Lat = response[0]["lat"]
    Long = response[0]["lon"]
    return Lat, Long

Lat_List = []
Long_List = []
df = pd.read_csv('C:\\Users\\Umer Abbas\\Desktop\\lat_long.csv')
i = 0
print("Latitude","","Longitude")
for i in range (0,5):
    add = df._get_value(i, 'Address')
    Lat, Long = lat_long(add)   # call once per address instead of twice
    Lat_List.append(Lat)
    Long_List.append(Long)

df1 = pd.DataFrame(columns=['Latitude', 'Longitude'])
df1['Latitude'] = Lat_List
df1['Longitude'] = Long_List
df1.to_csv("LatLong.csv")
#one line of change here
def lat_long(add):
    url = 'https://nominatim.openstreetmap.org/search/'+urllib.parse.quote(add)+'?format=json'
    response = requests.get(url).json()
    print(response[0]["lat"], response[0]["lon"])
    return response[0]["lat"], response[0]["lon"] # return the lat and long

# three lines added here
df = pd.read_csv('C:\\Users\\Umer Abbas\\Desktop\\lat_long.csv')
i = 0
l = [] # define empty list
print("Latitude","","Longitude")
for i in range (0,5):
    add = df._get_value(i, 'Address')
    l.append(lat_long(add)) # append the (lat, lon) tuple to the empty list

# create a dataframe and output as csv
pd.DataFrame(l, columns=['Latitude', 'Longitude']).to_csv('test.csv', sep=' ')
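A slightly more pandas-idiomatic variant of the same idea is to apply the geocoding function to the Address column and write the result straight out with to_csv. This is only a sketch built on the lat_long function above; the column name 'Address' and the CSV paths come from the question, and the User-Agent header reflects Nominatim's usage policy, which generally expects a descriptive client identifier and throttled requests.

import requests
import urllib.parse
import pandas as pd

def lat_long(add):
    url = 'https://nominatim.openstreetmap.org/search/' + urllib.parse.quote(add) + '?format=json'
    # a descriptive User-Agent is assumed here to comply with Nominatim's usage policy
    response = requests.get(url, headers={'User-Agent': 'lat-long-demo-script'}).json()
    return float(response[0]['lat']), float(response[0]['lon'])

df = pd.read_csv('C:\\Users\\Umer Abbas\\Desktop\\lat_long.csv')
# apply returns (lat, lon) tuples; zip(*...) splits them into two columns
df['Latitude'], df['Longitude'] = zip(*df['Address'].apply(lat_long))
df.to_csv('LatLong.csv', index=False)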

Retrieving data from the Air Quality Index (AQI) website through the API and only receiving a small number of stations

I'm working on a personal project and I'm trying to retrieve air quality data from the https://aqicn.org website using their API.
I've used this code, which I've copied and adapted for the city of Bucharest as follows:
import pandas as pd
import folium
import requests

# GET data from AQI website through the API
base_url = "https://api.waqi.info"
path_to_file = "~/path"

# Got token from:- https://aqicn.org/data-platform/token/#/
with open(path_to_file) as f:
    contents = f.readlines()
    key = contents[0]

# (lat, long)-> bottom left, (lat, lon)-> top right
latlngbox = "44.300264,25.920181,44.566991,26.297836" # For Bucharest
trail_url = f"/map/bounds/?token={key}&latlng={latlngbox}" #
my_data = pd.read_json(base_url + trail_url) # Joined parts of URL
print('columns->', my_data.columns) # 2 cols 'status' and 'data' JSON

### Built a dataframe from the json file
all_rows = []
for each_row in my_data['data']:
    all_rows.append([each_row['station']['name'],
                     each_row['lat'],
                     each_row['lon'],
                     each_row['aqi']])
df = pd.DataFrame(all_rows, columns=['station_name', 'lat', 'lon', 'aqi'])

# Cleaned the DataFrame
df['aqi'] = pd.to_numeric(df.aqi, errors='coerce') # Invalid parsing to NaN
# Remove NaN entries in col
df1 = df.dropna(subset=['aqi'])
Unfortunately, it only retrieves 4 stations, whereas there are many more available on the actual site. In the API documentation the only limitation I saw was "1,000 (one thousand) requests per second", so why can't I get more of them?
Also, I've tried to modify the lat/long values and managed to get more stations, but they were outside the city I was interested in.
Here is a view of the actual perimeter I've used in the embedded code.
If you have any suggestions as to how I can solve this issue, I'd be very happy to read your thoughts. Thank you!
Try using waqi through aqicn... not exactly a clean API, but I found it to work quite well.
import pandas as pd
import folium
from folium.plugins import HeatMap

url1 = 'https://api.waqi.info'
# Get token from:- https://aqicn.org/data-platform/token/#/
token = 'XXX'
box = '113.805332,22.148942,114.434299,22.561716' # polygon around HongKong via bboxfinder.com
url2 = f'/map/bounds/?latlng={box}&token={token}'
my_data = pd.read_json(url1 + url2)

all_rows = []
for each_row in my_data['data']:
    all_rows.append([each_row['station']['name'], each_row['lat'], each_row['lon'], each_row['aqi']])
df = pd.DataFrame(all_rows, columns=['station_name', 'lat', 'lon', 'aqi'])
From there it's easy to plot:
df['aqi'] = pd.to_numeric(df.aqi, errors='coerce')
print('with NaN->', df.shape)
df1 = df.dropna(subset=['aqi'])
df2 = df1[['lat', 'lon', 'aqi']]

init_loc = [22.396428, 114.109497]
max_aqi = int(df1['aqi'].max())
print('max_aqi->', max_aqi)

m = folium.Map(location=init_loc, zoom_start=5)
heat_aqi = HeatMap(df2, min_opacity=0.1, max_val=max_aqi,
                   radius=60, blur=20, max_zoom=2)
m.add_child(heat_aqi)
m
Or as such:
centre_point = [22.396428, 114.109497]
m2 = folium.Map(location=centre_point, tiles='Stamen Terrain', zoom_start=6)
for idx, row in df1.iterrows():
    lat = row['lat']
    lon = row['lon']
    station = row['station_name'] + ' AQI=' + str(row['aqi'])
    station_aqi = row['aqi']
    if station_aqi > 300:
        pop_color = 'red'
    elif station_aqi > 200:
        pop_color = 'orange'
    else:
        pop_color = 'green'
    folium.Marker(location=[lat, lon],
                  popup=station,
                  icon=folium.Icon(color=pop_color)).add_to(m2)
m2
Checking for stations within HK returns 19:
df[df['station_name'].str.contains('HongKong')]
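If the bounds query itself keeps returning only a handful of stations, one pragmatic workaround hinted at in the question is to request a deliberately larger box and then filter the resulting DataFrame back down to the city's perimeter in pandas. The sketch below reuses the Bucharest coordinates and the token variable from above; the amount of padding is an arbitrary assumption, not a documented behaviour of the API.

# Bucharest perimeter from the question: lat1, lon1 (bottom-left), lat2, lon2 (top-right)
lat1, lon1, lat2, lon2 = 44.300264, 25.920181, 44.566991, 26.297836

# query a wider area (~0.5 degrees of padding, an arbitrary assumption), then clip back to the city box
pad = 0.5
wide_box = f"{lat1 - pad},{lon1 - pad},{lat2 + pad},{lon2 + pad}"
wide_data = pd.read_json(f"https://api.waqi.info/map/bounds/?token={token}&latlng={wide_box}")

rows = [[r['station']['name'], r['lat'], r['lon'], r['aqi']] for r in wide_data['data']]
df_wide = pd.DataFrame(rows, columns=['station_name', 'lat', 'lon', 'aqi'])
df_city = df_wide[df_wide['lat'].between(lat1, lat2) & df_wide['lon'].between(lon1, lon2)]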

Read data from the OECD API into Python (and pandas)

I'm trying to download data from the OECD API (https://data.oecd.org/api/sdmx-json-documentation/) into Python.
I managed to download data in the SDMX-JSON format (and transform it to JSON) so far:
import requests as rq

OECD_ROOT_URL = "http://stats.oecd.org/SDMX-JSON/data"

def make_OECD_request(dsname, dimensions, params = None, root_dir = OECD_ROOT_URL):
    """Make URL for the OECD API and return a response.
    4 dimensions: location, subject, measure, frequency"""
    if not params:
        params = {}
    dim_args = ['+'.join(d) for d in dimensions]
    dim_str = '.'.join(dim_args)
    url = root_dir + '/' + dsname + '/' + dim_str + '/all'
    print('Requesting URL ' + url)
    return rq.get(url = url, params = params)

response = make_OECD_request('MEI'
    , [['USA', 'CZE'], [], [], ['M']]
    , {'startTime': '2009-Q1', 'endTime': '2010-Q1'})

if (response.status_code == 200):
    json = response.json()
How can I transform the data set into pandas.DataFrame? I tried pandas.read_json() and pandasdmx library, but I was not able to solve this.
The documentation the original question points to does not (yet?) mention that the API accepts the parameter contentType, which may be set to csv. That makes it trivial to use with Pandas.
import pandas as pd

def get_from_oecd(sdmx_query):
    return pd.read_csv(
        f"https://stats.oecd.org/SDMX-JSON/data/{sdmx_query}?contentType=csv"
    )

print(get_from_oecd("MEI_FIN/IRLT.AUS.M/OECD").head())
Update:
The function to automatically download the data from the OECD API is now available in my Python library CIF (abbreviation for the Composite Indicators Framework, installable via pip):
from cif import cif

data, subjects, measures = cif.createDataFrameFromOECD(countries = ['USA'], dsname = 'MEI', frequency = 'M')
Original answer:
If you need your data in a Pandas DataFrame, it is IMHO better to send your request to OECD with the additional parameter 'dimensionAtObservation': 'AllDimensions', which results in a more comprehensive JSON file.
Use the following functions to download the data:
import requests as rq
import pandas as pd
import re

OECD_ROOT_URL = "http://stats.oecd.org/SDMX-JSON/data"

def make_OECD_request(dsname, dimensions, params = None, root_dir = OECD_ROOT_URL):
    # Make URL for the OECD API and return a response
    # 4 dimensions: location, subject, measure, frequency
    # OECD API: https://data.oecd.org/api/sdmx-json-documentation/#d.en.330346
    if not params:
        params = {}
    dim_args = ['+'.join(d) for d in dimensions]
    dim_str = '.'.join(dim_args)
    url = root_dir + '/' + dsname + '/' + dim_str + '/all'
    print('Requesting URL ' + url)
    return rq.get(url = url, params = params)

def create_DataFrame_from_OECD(country = 'CZE', subject = [], measure = [], frequency = 'M', startDate = None, endDate = None):
    # Request data from OECD API and return pandas DataFrame
    # country: country code (max 1)
    # subject: list of subjects, empty list for all
    # measure: list of measures, empty list for all
    # frequency: 'M' for monthly and 'Q' for quarterly time series
    # startDate: date in YYYY-MM (2000-01) or YYYY-QQ (2000-Q1) format, None for all observations
    # endDate: date in YYYY-MM (2000-01) or YYYY-QQ (2000-Q1) format, None for all observations

    # Data download
    response = make_OECD_request('MEI'
        , [[country], subject, measure, [frequency]]
        , {'startTime': startDate, 'endTime': endDate, 'dimensionAtObservation': 'AllDimensions'})

    # Data transformation
    if (response.status_code == 200):
        responseJson = response.json()
        obsList = responseJson.get('dataSets')[0].get('observations')
        if (len(obsList) > 0):
            print('Data downloaded from %s' % response.url)
            timeList = [item for item in responseJson.get('structure').get('dimensions').get('observation') if item['id'] == 'TIME_PERIOD'][0]['values']
            subjectList = [item for item in responseJson.get('structure').get('dimensions').get('observation') if item['id'] == 'SUBJECT'][0]['values']
            measureList = [item for item in responseJson.get('structure').get('dimensions').get('observation') if item['id'] == 'MEASURE'][0]['values']
            obs = pd.DataFrame(obsList).transpose()
            obs.rename(columns = {0: 'series'}, inplace = True)
            obs['id'] = obs.index
            obs = obs[['id', 'series']]
            obs['dimensions'] = obs.apply(lambda x: re.findall(r'\d+', x['id']), axis = 1)
            obs['subject'] = obs.apply(lambda x: subjectList[int(x['dimensions'][1])]['id'], axis = 1)
            obs['measure'] = obs.apply(lambda x: measureList[int(x['dimensions'][2])]['id'], axis = 1)
            obs['time'] = obs.apply(lambda x: timeList[int(x['dimensions'][4])]['id'], axis = 1)
            obs['names'] = obs['subject'] + '_' + obs['measure']
            data = obs.pivot_table(index = 'time', columns = ['names'], values = 'series')
            return(data)
        else:
            print('Error: No available records, please change parameters')
    else:
        print('Error: %s' % response.status_code)
You can create requests like these:
data = create_DataFrame_from_OECD(country = 'CZE', subject = ['LOCOPCNO'])
data = create_DataFrame_from_OECD(country = 'USA', frequency = 'Q', startDate = '2009-Q1', endDate = '2010-Q1')
data = create_DataFrame_from_OECD(country = 'USA', frequency = 'M', startDate = '2009-01', endDate = '2010-12')
data = create_DataFrame_from_OECD(country = 'USA', frequency = 'M', subject = ['B6DBSI01'])
data = create_DataFrame_from_OECD(country = 'USA', frequency = 'Q', subject = ['B6DBSI01'])
You can recover the data from the source using code like this.
from urllib.request import urlopen
import json
URL = 'http://stats.oecd.org/SDMX-JSON/data/MEI/USA+CZE...M/all'
response = urlopen(URL).read()
responseDict = json.loads(str(response)[2:-1])
print (responseDict.keys())
print (len(responseDict['dataSets']))
Here is the output from this code.
dict_keys(['header', 'structure', 'dataSets'])
1
If you are curious about the appearance of the [2:-1] (I would be) it's because for some reason unknown to me the str function leaves some extraneous characters at the beginning and end of the string when it converts the byte array passed to it. json.loads is documented to require a string as input.
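(For what it's worth, those extraneous characters are just the b'...' of the bytes repr that str() produces; decoding the bytes avoids the slicing entirely. A one-line sketch, assuming the response body is UTF-8 encoded JSON:)
responseDict = json.loads(response.decode('utf-8'))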
This is the code I used to get to this point.
>>> from urllib.request import urlopen
>>> import json
>>> URL = 'http://stats.oecd.org/SDMX-JSON/data/MEI/USA+CZE...M/all'
>>> response = urlopen(URL).read()
>>> len(response)
9886387
>>> response[:50]
b'{"header":{"id":"1975590b-346a-47ee-8d99-6562ccc11'
>>> str(response[:50])
'b\'{"header":{"id":"1975590b-346a-47ee-8d99-6562ccc11\''
>>> str(response[-50:])
'b\'"uri":"http://www.oecd.org/contact/","text":""}]}}\''
I understand that this is not a complete solution, as you must still crack into the dataSets structure for the data to put into pandas. It's a list, but you could explore it starting with a sketch like the one below.
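A minimal sketch of that exploration, assuming the SDMX-JSON layout used in the earlier answer (observations keyed by colon-separated dimension indices); the exact keys depend on the query parameters, so treat this as a starting point rather than a fixed recipe:

dataset = responseDict['dataSets'][0]
print(dataset.keys())   # typically 'series' or 'observations', depending on the query parameters

# with 'dimensionAtObservation': 'AllDimensions' (as in the answer above) it is a flat dict
obs = dataset.get('observations', {})
for key, value in list(obs.items())[:3]:
    # key: colon-separated indices into structure['dimensions']['observation']
    # value: list whose first element is the observation value
    print(key, value[0])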
The latest release of pandasdmx (pandasdmx.readthedocs.io) fixes previous issues accessing OECD data in sdmx-json.
