I am having a go at using the sentinelsat Python API to download satellite imagery. However, I am receiving error messages when I try to convert the results to a pandas DataFrame. This code works and downloads my requested Sentinel satellite images:
from sentinelsat import SentinelAPI, read_geojson, geojson_to_wkt
from datetime import date
api = SentinelAPI('*****', '*****', 'https://scihub.copernicus.eu/dhus')
footprint = geojson_to_wkt(read_geojson('testAPIpoly.geojson'))
products = api.query(footprint, cloudcoverpercentage = (0,10))
#this works
api.download_all(products)
However, if I instead attempt to convert to a pandas DataFrame:
#api.download_all(products)
#this does not work
products_df = api.to_dataframe(products)
api.download_all(products_df)
I receive an extensive error message that includes
"sentinelsat.sentinel.SentinelAPIError: HTTP status 500 Internal Server Error: InvalidKeyException : Invalid key (processed) to access Products
"
(where processed is also replaced with title, platformname, processingbaseline, etc.). I've tried a few different ways to convert to a dataframe and filter/sort results and have received an error message every time (note: I have pandas/geopandas installed). How can I convert to a dataframe and filter/sort with the sentinelsat API?
Instead of
api.download_all(products_df)
try
api.download_all(products_df.index)
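The reason this works is that download_all() expects an iterable of product IDs (the keys of the products dictionary), and to_dataframe() puts those IDs into the DataFrame index rather than into a column. That also means you can filter and sort the DataFrame first and then pass the filtered index back to download_all(). A minimal sketch, assuming the query result has columns such as cloudcoverpercentage and beginposition (the available columns depend on the product type returned by your query):
products_df = api.to_dataframe(products)
# e.g. keep the five least cloudy scenes, newest first
products_df_sorted = products_df.sort_values(
    ['cloudcoverpercentage', 'beginposition'], ascending=[True, False])
products_df_filtered = products_df_sorted.head(5)
# download_all() wants product IDs, which to_dataframe() stores in the index
api.download_all(products_df_filtered.index)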
In Azure ML Studio I prepared a model with AutoML for time series forecasting. The data has occasional gaps in all of the data sets.
I am using the following code to call a deployed Azure AutoML model as a web service:
import requests
import json
import pandas as pd
# URL for the web service
scoring_uri = 'http://xxxxxx-xxxxxx-xxxxx-xxxx.xxxxx.azurecontainer.io/score'
# Two sets of data to score, so we get two results back
new_data = pd.DataFrame([
    ['2020-10-04 19:30:00', 1.29281, 1.29334, 1.29334, 1.29334, 1],
    ['2020-10-04 19:45:00', 1.29334, 1.29294, 1.29294, 1.29294, 1],
    ['2020-10-04 21:00:00', 1.29294, 1.29217, 1.29334, 1.29163, 34],
    ['2020-10-04 21:15:00', 1.29217, 1.29257, 1.29301, 1.29115, 195]],
    columns=['1', '2', '3', '4', '5', '6'])
# Convert to JSON string
input_data = json.dumps({'data': new_data.to_dict(orient='records')})
# Set the content type
headers = {'Content-Type': 'application/json'}
# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.text)
I am getting an error:
{\"error\": \"DataException:\\n\\tMessage: No y values were provided. We expected non-null target values as prediction context because there is a gap between train and test and the forecaster depends on previous values of target. If it is expected, please run forecast() with ignore_data_errors=True. In this case the values in the gap will be imputed.\\n\\tInnerException: None\\n\\tErrorResponse \\n{\\n
I tried adding "ignore_data_errors=True" to different parts of the code without success, only to get another error:
TypeError: __init__() got an unexpected keyword argument 'ignore_data_errors'
I would very much appreciate any help, as I am stuck on this.
To avoid getting this error in time series forecasting, you should enable Autodetect for the Forecast Horizon (in the forecasting settings in Azure ML Studio). A manually set forecast horizon only works for ideal, gap-free time series data, which is rarely the case with real-world data.
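If you would rather keep a manually set horizon and bridge the gap instead, note that ignore_data_errors is not a keyword of the web-service request or of the model constructor (hence the TypeError); the error message refers to the forecast() method of the fitted forecasting model inside the scoring script. A rough sketch of a score.py run() function under that assumption (the column handling and helper names here are illustrative, not the deployed script):
import json
import numpy as np
import pandas as pd

def run(raw_data):
    # 'model' is the fitted AutoML forecasting model loaded globally in init()
    data = pd.DataFrame(json.loads(raw_data)['data'])
    # No actual target values are known for the gap, so pass NaNs and let them be imputed
    y_query = np.full(len(data), np.nan)
    y_pred, _ = model.forecast(data, y_query, ignore_data_errors=True)
    return y_pred.tolist()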
I might be misunderstanding something about asammdf in general, but I'm not really sure how to go about decoding a CAN frame.
Supposing I have a dbc with relevant messages defined, I get that I can use mdf.extract_bus_logging() to parse the signals, but I'm not really sure what to do from there.
I'm sending a raw payload + can frame ID, and I think I could get away with using cantools to parse the raw data (and then passing that data into an asammdf signal), but it feels like there's some degree of support here within asammdf.
I have an example here along the lines of what I want to do (using this motohawk.dbc file from the cantools examples here https://cantools.readthedocs.io/en/latest/)
from asammdf import MDF, Signal
import cantools

filename = "~/mdf_dbc_play/tmp.dbc"
db = cantools.database.load_file(filename)

msg_bytes = bytes([0, 0, 0xff, 0x01, 0x02, 0x03, 0x04, 0x05])
# Temporary bogus timestamp
timestamps = [0.1]

# Use cantools to decode the raw bytes; decode() returns a dict of signal name -> physical value
msg = db.get_message_by_name("EEC1")
unpacked_signals = msg.decode(msg_bytes)

# Using version 3.10 just in case we have some versioning issues with old tooling.
with MDF(version='3.10') as mdf:
    # Build one asammdf Signal per decoded CAN signal
    sigs = [Signal(name=name, samples=[value], timestamps=timestamps)
            for name, value in unpacked_signals.items()]
    mdf.append(sigs)
    mdf.save("~/mdf_dbc_play/output.mdf")
For logging CAN traffic to an MF4 file you can try this code from python-can
https://github.com/hardbyte/python-can/blob/mf4-io/can/io/mf4.py
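If the CAN traffic is already recorded in an MF4/MDF log (for example written by python-can), asammdf can apply the DBC for you via extract_bus_logging(). A minimal sketch, assuming a hypothetical recording.mf4 that contains raw CAN frames and the motohawk.dbc mentioned above; the exact database_files layout differs between asammdf versions (plain paths in older releases, (path, bus_channel) tuples in newer ones):
from asammdf import MDF

mdf = MDF("recording.mf4")

# Decode the raw CAN frames into physical signals using the DBC
decoded = mdf.extract_bus_logging(database_files={"CAN": [("motohawk.dbc", 0)]})

# Decoded signals can then be pulled out by name, e.g. one from ExampleMessage
temperature = decoded.get("Temperature")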
I built a SageMaker endpoint that I am attempting to invoke using Lambda + API Gateway. I'm getting the following error:
"An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message \"unable to evaluate payload provided\"
I know what it's complaining about, but I don't quite understand why it's occurring. I have confirmed that the shape of the input data in my Lambda function is the same as what I trained the model with. The following is my input payload in Lambda:
X = pd.concat([X, rx_norm_dummies, urban_dummies], axis = 1)
payload = X.to_numpy()
response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
ContentType='application/json',
Body=payload)
In the jupyter notebook where I created my endpoint/trained my model, I can also access the model using a numpy ndarray so I'm confused why I'm getting this error.
y = X[0:10]
result = linear_predictor.predict(y)
print(result)
Here is a modification I made to the serialization of the endpoint:
from sagemaker.predictor import csv_serializer, json_deserializer
linear_predictor.content_type = 'text/csv'
linear_predictor.serializer = csv_serializer
linear_predictor.deserializer = json_deserializer
I'm new when it comes to SageMaker/Lambda, so any help would be appreciated, and I can send more code to add context if needed. I've tried various formats and cannot get this to work.
I'm not sure which algorithm you are trying to use in SageMaker, but the documentation has the quote below. If your CSV has a header, or if you don't specify label_size for an unsupervised algorithm, a bad request (HTTP status code 400) might be what you get, because the payload isn't compliant with what AWS expects to receive.
Amazon SageMaker requires that a CSV file does not have a header record and that the target variable is in the first column. To run unsupervised learning algorithms that don't have a target, specify the number of label columns in the content type. For example, in this case 'content_type=text/csv;label_size=0'.
From: https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html#cdf-csv-format
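In practice that means serializing the Lambda payload the same way the predictor was configured in the notebook (text/csv, no header, no index) rather than sending a raw NumPy array with an application/json content type. A sketch of the invoke call under that assumption, reusing ENDPOINT_NAME and the DataFrames from the question:
import io
import boto3
import pandas as pd

runtime = boto3.client('runtime.sagemaker')

X = pd.concat([X, rx_norm_dummies, urban_dummies], axis=1)

# Serialize to CSV with no header and no index, matching the csv_serializer
# that was attached to the predictor in the notebook
csv_buffer = io.StringIO()
X.to_csv(csv_buffer, header=False, index=False)

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=csv_buffer.getvalue())
print(response['Body'].read().decode('utf-8'))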
In your code you're indicating to the SDK that you are passing JSON, but payload is actually a NumPy array.
I would suggest replacing payload = X.to_numpy() with payload = X.to_json(orient='records'), or something similar, to convert it to JSON first.
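For example (a sketch; whether the model container accepts this particular JSON layout depends on the algorithm and its inference handler):
# Hypothetical change inside the Lambda function: send JSON instead of a raw NumPy array
payload = X.to_json(orient='records')

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='application/json',
                                   Body=payload)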
I am working with Azure Functions to create triggers that aggregate data on an hourly basis. The triggers get data from blob storage, and to avoid aggregating the same data twice I want to add a condition that lets me only process blobs that were modified in the last hour.
I am using the SDK, and my code for doing this looks like this:
import json
from datetime import datetime, timedelta

from pytz import utc  # any tzinfo object that represents UTC works here
from azure.storage.blob import BlockBlobService

''' Timestamp variables for t.now and t-1 '''
timestamp = datetime.now(tz=utc)
timestamp_negative1hr = timestamp - timedelta(hours=1)

''' Read data from input environment '''
data = BlockBlobService(account_name='accname', account_key='key')
generator = data.list_blobs('directory')
dataloaded = []
for blob in generator:
    loader = data.get_blob_to_text('collection', blob.name, if_modified_since=timestamp_negative1hr)
    trackerstatusobjects = loader.content.split('\n')
    for trackerstatusobject in trackerstatusobjects:
        dataloaded.append(json.loads(trackerstatusobject))
When I run this, the error I get is azure.common.AzureHttpError: The condition specified using HTTP conditional header(s) is not met. It also says that it is due to a timeout. The blobs are receiving data when I run it, so in any case it is not the correct return message. If I add .strftime("%Y-%m-%d %H:%M:%S:%z") to the end of my timestamp, I get another error: AttributeError: 'str' object has no attribute 'tzinfo'. This must mean that Azure expects a datetime object, but for some reason it is not working for me.
Any ideas on how to solve it? Thanks
Hope you are well. I'm using Python 2.7 and new to it. I'm trying to use the Yahoo Finance API to get information on stocks; here is my code:
from yahoo_finance import Share
yahoo = Share('YHOO')
print yahoo.get_historical('2014-04-25', '2014-04-29')
This code, though, only works about once out of every 4 attempts; the other 3 times it gives me this error:
YQL Query error: Query failed with error: No Definition found for Table yahoo.finance.quote
Is there any way to fix this error so that the code works 100% of the time?
Thanks.
Warmest regards
This is a server-side error. The query.yahooapis.com service appears to be handled by a cluster of machines, and some of those machines appear to be misconfigured. This could be a temporary problem.
I see the same error when accessing the API directly using curl:
$ curl "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quote%20where%20symbol%20%3D%20%22YHOO%22&format=json&env=store%3a//datatables.org/alltableswithkeys"
{"error":{"lang":"en-US","description":"No definition found for Table yahoo.finance.quote"}}
Other than retrying in a loop, there is no way to fix this on the Python side:
import yahoo_finance
from yahoo_finance import Share

data = None
for i in range(10):  # retry 10 times
    try:
        yahoo = Share('YHOO')
        data = yahoo.get_historical('2014-04-25', '2014-04-29')
        break
    except yahoo_finance.YQLQueryError:
        continue

if data is None:
    print 'Failed to retrieve data from the Yahoo service, try again later'