I'm trying to set data validation rules for my current spreadsheet. One thing that would help is being able to view, as JSON, the data validation rules I have already set (whether in the spreadsheet UI or through an API call).
Example.
request = {
    "requests": [
        {
            "setDataValidation": {
                "range": {
                    "sheetId": SHEET_ID,
                    "startRowIndex": 1,
                    "startColumnIndex": 0,
                    "endColumnIndex": 1
                },
                "rule": {
                    "condition": {
                        "type": "BOOLEAN"},
                    "inputMessage": "Value MUST BE BOOLEAN",
                    "strict": True
                }
            }
        }
    ]
}
service.spreadsheets().batchUpdate(spreadsheetId=SPREADSHEET_ID, body=request).execute()
But what API call do I use to see the data validation on this range of cells? This would be useful when I set the data validation rules in the spreadsheet and want to see how Google interprets them. I'm having a lot of trouble setting complex data validations through the API.
Thank you
To obtain only the "Data Validation" components of a given spreadsheet, you simply request the appropriate field in the call to spreadsheets.get:
service = get_authed_sheets_service_somehow()
params = {
    'spreadsheetId': 'your ssid',
    # 'ranges': 'some range',
    'fields': 'sheets(data/rowData/values/dataValidation,properties(sheetId,title))',
}
request = service.spreadsheets().get(**params)
response = request.execute()
# Example print code (not tested :p )
for sheet in response['sheets']:
    for grid in sheet['data']:
        for r, row in enumerate(grid.get('rowData', [])):
            for c, cell in enumerate(row.get('values', [])):
                if 'dataValidation' in cell:
                    # Print "'Sheet1'!R1C1" (1-based) & associated data validation object.
                    # Assumes whole grid was requested (add appropriate indices if not).
                    print(f'\'{sheet["properties"]["title"]}\'!R{r + 1}C{c + 1}', cell['dataValidation'])
By specifying fields, includeGridData is not required to obtain data on a per-cell basis from the range you requested. By not supplying a range, we target the entire file. This particular fields specification requests the rowData.values.dataValidation object and the sheetId and title of the properties object, for every sheet in the spreadsheet.
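When only a range is requested, the row and column positions need an offset. The sketch below assumes the fields mask is widened to also return each grid's `startRow` and `startColumn` (for example `sheets(data(startRow,startColumn,rowData/values/dataValidation),properties(sheetId,title))`); the helper name `collect_validation_rules` is made up for illustration and works on the plain response dict.

```python
def collect_validation_rules(response):
    """Map "'Title'!R<row>C<col>" (1-based) to each cell's dataValidation dict.

    `response` is the dict returned by spreadsheets().get(...).execute().
    """
    rules = {}
    for sheet in response.get('sheets', []):
        title = sheet['properties']['title']
        for grid in sheet.get('data', []):
            # Assumption: startRow/startColumn are zero-based and simply
            # missing from the response when they are 0.
            row0 = grid.get('startRow', 0)
            col0 = grid.get('startColumn', 0)
            for r, row in enumerate(grid.get('rowData', [])):
                for c, cell in enumerate(row.get('values', [])):
                    if 'dataValidation' in cell:
                        key = f"'{title}'!R{row0 + r + 1}C{col0 + c + 1}"
                        rules[key] = cell['dataValidation']
    return rules
```

Rows without a `values` key and cells without a rule are skipped, so sparse sheets should not raise a KeyError.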
You can use the Google APIs Explorer to interactively determine the appropriate valid "fields" specification, and additionally examine the response:
https://developers.google.com/apis-explorer/#p/sheets/v4/sheets.spreadsheets.get
For more about how "fields" specifiers work, read the documentation: https://developers.google.com/sheets/api/guides/concepts#partial_responses
(For certain write requests, field specifications are not optional so it is in your best interest to determine how to use them effectively.)
I think I found the answer: pass includeGridData=True to spreadsheets().get.
from pprint import pprint
response = service.spreadsheets().get(
    spreadsheetId=SPREADSHEETID, fields='*',
    ranges='InputWorking!A2:A', includeGridData=True).execute()
You get a monster data structure back. To look at the validation rule on the very first cell in your range, you could do:
pprint(response['sheets'][0]['data'][0]['rowData'][0]['values'][0]['dataValidation'])
{'condition': {'type': 'BOOLEAN'},
'inputMessage': 'Value MUST BE BOOLEAN',
'strict': True}
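Since the object printed above has the same shape as the "rule" in a setDataValidation request, a rule created in the UI could be read back and then reapplied to another range. A minimal sketch, assuming a hypothetical helper `rule_to_request` that only builds the request body (no API call is made here):

```python
def rule_to_request(rule, sheet_id, start_row, start_col, end_col, end_row=None):
    """Wrap a dataValidation dict read from the API in a batchUpdate body.

    Indices are zero-based and end indices are exclusive. Leaving out
    end_row keeps the range open-ended, as in the question's request.
    """
    grid_range = {
        'sheetId': sheet_id,
        'startRowIndex': start_row,
        'startColumnIndex': start_col,
        'endColumnIndex': end_col,
    }
    if end_row is not None:
        grid_range['endRowIndex'] = end_row
    return {
        'requests': [
            {'setDataValidation': {'range': grid_range, 'rule': rule}}
        ]
    }
```

The result could then be passed as body= to service.spreadsheets().batchUpdate(...), exactly like the request in the question.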
I'm trying to add rows and delete columns through gspread, but I keep getting TypeError: string indices must be integers
Here's the code:
writeTo = client.open_by_key(file['id']).worksheet('Sheet1')
request = {
    "requests": [
        {
            "deleteDimension": {
                "range": {
                    "sheetId": 0,
                    "dimension": "COLUMNS",
                    "startIndex": 8,
                    "endIndex": 26
                }
            }
        }
    ]
}
result = writeTo.batch_update(request)
I'm using sheets API v3
Modification points:
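It seems that the batch_update method you want belongs to the class gspread.spreadsheet.Spreadsheet. In your script, client.open_by_key(file['id']).worksheet('Sheet1') returns an instance of gspread.worksheet.Worksheet. As far as I can tell, Worksheet has its own batch_update that expects a list of value ranges rather than a request body, which would explain the TypeError. I think this is the cause of your issue.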
When I tested your script, I could confirm that the same error of TypeError: string indices must be integers occurs.
When these points are reflected in your script, it becomes as follows.
From:
writeTo = client.open_by_key(file['id']).worksheet('Sheet1')
To:
writeTo = client.open_by_key(file['id'])
Note:
This modification assumes that the value of file['id'] is a valid Spreadsheet ID. Please be careful about this. If an error like "Requested entity was not found." occurs, please check your Spreadsheet ID again.
And, I think that your request body is correct. When I tested your request body using the modified script, I confirmed that no error occurs.
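Since the question also mentions adding rows, both operations could go into one request body. This is only a sketch: `dimension_requests` is a hypothetical helper, and it uses the Sheets API v4 request types deleteDimension and appendDimension.

```python
def dimension_requests(sheet_id, delete_cols=None, append_rows=0):
    """Build a batch_update body that deletes columns and/or appends rows.

    delete_cols is a (start, end) pair of zero-based column indices,
    with end exclusive, as in the question's request.
    """
    requests = []
    if delete_cols is not None:
        start, end = delete_cols
        requests.append({
            'deleteDimension': {
                'range': {
                    'sheetId': sheet_id,
                    'dimension': 'COLUMNS',
                    'startIndex': start,
                    'endIndex': end,
                }
            }
        })
    if append_rows > 0:
        requests.append({
            'appendDimension': {
                'sheetId': sheet_id,
                'dimension': 'ROWS',
                'length': append_rows,
            }
        })
    return {'requests': requests}
```

With the modified script, this would be used as writeTo.batch_update(dimension_requests(0, delete_cols=(8, 26), append_rows=10)).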
Reference:
batch_update
I believe the error is in file['id']
You can't index a string with a string key; strings must be indexed with integers, like file[5] to get the sixth character, or file[:4] to get the first 4 characters of file.
So I'm fairly new to both AWS and Python. I'm on a uni assignment and have hit a road block.
I'm uploading data to AWS S3, this information is being sent to an SQS Queue and passed into AWS Lambda. I know, it would be much easier to just go straight from S3 to Lambda...but apparently "that's not the brief".
So I've got my event accurately coming into AWS Lambda, but no matter how deep I dig, I can't reach the information I need. In AWS Lambda, I run the following code.
def lambda_handler(event, context):
    print(event)
Via CloudWatch, I get the output
{'Records': [{'messageId': '1d8e0a1d-d7e0-42e0-9ff7-c06610fccae0', 'receiptHandle': 'AQEBr64h6lBEzLk0Xj8RXBAexNukQhyqbzYIQDiMjJoLLtWkMYKQp5m0ENKGm3Icka+sX0HHb8gJoPmjdTRNBJryxCBsiHLa4nf8atpzfyCcKDjfB9RTpjdTZUCve7nZhpP5Fn7JLVCNeZd1vdsGIhkJojJ86kbS3B/2oBJiCR6ZfuS3dqZXURgu6gFg9Yxqb6TBrAxVTgBTA/Pr35acEZEv0Dy/vO6D6b61w2orabSnGvkzggPle0zcViR/shLbehROF5L6WZ5U+RuRd8tLLO5mLFf5U+nuGdVn3/N8b7+FWdzlmLOWsI/jFhKoN4rLiBkcuL8UoyccTMJ/QTWZvh5CB2mwBRHectqpjqT4TA3Z9+m8KNd/h/CIZet+0zDSgs5u', 'body': '{"Records":[{"eventVersion":"2.1","eventSource":"aws:s3","awsRegion":"eu-west-2","eventTime":"2021-03-26T01:03:53.611Z","eventName":"ObjectCreated:Put","userIdentity":{"principalId":"MY_ID"},"requestParameters":{"sourceIPAddress":"MY_IP_ADD"},"responseElements":{"x-amz-request-id":"BQBY06S20RYNH1XJ","x-amz-id-2":"Cdo0RvX+tqz6SZL/Xw9RiBLMCS3Rv2VOsu2kVRa7PXw9TsIcZeul6bzbAS6z4HF6+ZKf/2MwnWgzWYz+7jKe07060bxxPhsY"},"s3":{"s3SchemaVersion":"1.0","configurationId":"test","bucket":{"name":"MY_BUCKET","ownerIdentity":{"principalId":"MY_ID"},"arn":"arn:aws:s3:::MY_BUCKET"},"object":{"key":"test.jpg","size":246895,"eTag":"c542637a515f6df01cbc7ee7f6e317be","sequencer":"00605D33019AD8E4E5"}}}]}', 'attributes': {'ApproximateReceiveCount': '1', 'SentTimestamp': '1616720643174', 'SenderId': 'AIDAIKZTX7KCMT7EP3TLW', 'ApproximateFirstReceiveTimestamp': '1616720648174'}, 'messageAttributes': {}, 'md5OfBody': '1ab703704eb79fbbb58497ccc3f2c555', 'eventSource': 'aws:sqs', 'eventSourceARN': 'arn:aws:sqs:eu-west-2:ARN', 'awsRegion': 'eu-west-2'}]}
[Disclaimer, I've tried to edit out any identifying information but if there's any sensitive data I'm not understanding or missed, please let me know]
Anyways, just for a sample, I want to get the Object Key, which is test.jpg. I tried to drill down as much as I can, finally getting to: -
def lambda_handler(event, context):
    print(event['Records'][0]['body'])
This returned the following (which was nice to see fully stylized): -
{
    "Records": [
        {
            "eventVersion": "2.1",
            "eventSource": "aws:s3",
            "awsRegion": "eu-west-2",
            "eventTime": "2021-03-26T01:08:16.823Z",
            "eventName": "ObjectCreated:Put",
            "userIdentity": {
                "principalId": "MY_ID"
            },
            "requestParameters": {
                "sourceIPAddress": "MY_IP"
            },
            "responseElements": {
                "x-amz-request-id": "ZNKHRDY8GER4F6Q5",
                "x-amz-id-2": "i1Cazudsd+V57LViNWyDNA9K+uRbSQQwufMC6vf50zQfzPaH7EECsvw9SFM3l3LD+TsYEmnjXn1rfP9GQz5G5F7Fa0XZAkbe"
            },
            "s3": {
                "s3SchemaVersion": "1.0",
                "configurationId": "test",
                "bucket": {
                    "name": "MY_BUCKET",
                    "ownerIdentity": {
                        "principalId": "MY_ID"
                    },
                    "arn": "arn:aws:s3:::MY_BUCKET"
                },
                "object": {
                    "key": "test.jpg",
                    "size": 254276,
                    "eTag": "b0052ab9ba4b9395e74082cfd51a8f09",
                    "sequencer": "00605D3407594DE184"
                }
            }
        }
    ]
}
However, from this stage on, if I try to write print(event['Records'][0]['body']['Records']) or print(event['Records'][0]['s3']), I'm told an integer is required, not a string. If I try to write print(event['Records'][0]['body'][0]), I'm given a single character every time (in this case the first { bracket).
I'm not sure if this has something to do with tuples, or if at this stage it's all saved as one large string, but at least in the output view it doesn't appear to be saved that way.
Does anyone have any idea what I'd do from this stage to access the further information? In the full release after I'm done testing, I'll be wanting to save an audio file and the file name as opposed to a picture.
Thanks.
You are having this problem because the content of the body is JSON, but in string format. You need to parse it before you can access it like a normal dictionary. Like so:
import json
def handler(event: dict, context: object):
    body = event['Records'][0]['body']
    body = json.loads(body)
    # use the body as a normal dictionary
You get only a single character when using integer indexes because the body is a string, so [n] returns the character at index n.
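To get all the way down to the object key, the parsing step can be applied to every SQS message in the event. A minimal sketch, assuming the event layout printed in the question; the function name `s3_object_keys` is made up for illustration.

```python
import json
from urllib.parse import unquote_plus


def s3_object_keys(sqs_event):
    """Return the S3 object keys carried inside an SQS-wrapped S3 event."""
    keys = []
    for message in sqs_event.get('Records', []):
        # Each SQS record's body is the S3 notification as a JSON string.
        s3_event = json.loads(message['body'])
        for record in s3_event.get('Records', []):
            # S3 URL-encodes keys in notifications (spaces arrive as '+').
            keys.append(unquote_plus(record['s3']['object']['key']))
    return keys


def lambda_handler(event, context):
    for key in s3_object_keys(event):
        print(key)
```

For the event in the question this would print test.jpg. Using .get for the inner "Records" means a body without that key is skipped rather than raising a KeyError.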
It's because you're getting stringified JSON data. You need to load it back into a Python dict.
There is a useful package called lambda_decorators. You can install it with pip install lambda_decorators
so you can do this:
from lambda_decorators import load_json_body
@load_json_body
def lambda_handler(event, context):
    print(event['Records'][0]['body'])
    # Now you can access the items in the body using their indexes and keys.
This will extract the JSON for you.
I use dynamic mapping in elasticsearch to load my json file into elasticsearch, like this:
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
def extract():
    f = open('tmdb.json')
    if f:
        return json.loads(f.read())
movieDict = extract()
def index(movieDict={}):
    for id, body in movieDict.items():
        es.index(index='tmdb', id=id, doc_type='movie', body=body)
index(movieDict)
How can I update the mapping for a single field? I have a field, title, to which I want to add a different analyzer.
title_settings = {"properties" : { "title": {"type" : "text", "analyzer": "english"}}}
es.indices.put_mapping(index='tmdb', body=title_settings)
This fails.
I know that I cannot update an already existing index, but what is the proper way to reindex with a mapping generated from a JSON file? My file has a lot of fields, so creating the mapping/settings manually would be very troublesome.
I am able to specify an analyzer for a query, like this:
query = {"query": {
    "multi_match": {
        "query": userSearch, "analyzer": "english", "fields": ['title^10', 'overview']}}}
How do I specify it for index or field?
I am also able to put analyzer to settings after closing and opening index
analysis = {'settings': {'analysis': {'analyzer': 'english'}}}
es.indices.close(index='tmdb')
es.indices.put_settings(index='tmdb', body=analysis)
es.indices.open(index='tmdb')
Copying the exact settings for the english analyzer doesn't 'activate' it for my data.
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-lang-analyzer.html#english-analyzer
By 'activate' I mean that search results do not come back in the form the english analyzer would produce, i.e. there are still stopwords.
Solved it with massive amount of googling....
You cannot change the analyzer on already indexed data, and closing and reopening the index does not help. You can create a new index with a new mapping and load your data into it (the quickest way).
Specifying an analyzer for the whole index isn't a good solution, as the 'english' analyzer only applies to 'text' fields. It's better to specify the analyzer per field.
If analyzers are specified per field, you also need to specify the field type.
Remember that analyzers can be used at index time, at search time, or both. Reference: Specifying analyzers
Code:
import time
def create_index(movieDict={}, mapping={}):
    es.indices.create(index='test_index', body=mapping)
    start = time.time()
    for id, body in movieDict.items():
        es.index(index='test_index', id=id, doc_type='movie', body=body)
    print("--- %s seconds ---" % (time.time() - start))
Now, I've got the mapping from the dynamic mapping of my JSON file. I just saved it back to a JSON file for ease of processing (editing). That's because I have over 40 fields to map; doing it by hand would be tiresome.
mapping = es.indices.get_mapping(index='tmdb')
This is an example of how the title key should be specified to use the english analyzer
'title': {'type': 'text', 'analyzer': 'english','fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}
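Instead of editing the saved mapping by hand, the analyzer could also be patched in with a small helper. This is a sketch under assumptions: `with_analyzer` is a hypothetical function, and it takes the "properties" dict directly because where that dict sits inside the get_mapping response depends on the Elasticsearch version.

```python
import copy


def with_analyzer(properties, field, analyzer):
    """Return a copy of a mapping's properties with an analyzer set on one field."""
    patched = copy.deepcopy(properties)
    # Analyzers only apply to 'text' fields, so refuse anything else.
    if patched.get(field, {}).get('type') != 'text':
        raise ValueError(f"{field!r} is not a 'text' field in this mapping")
    patched[field]['analyzer'] = analyzer
    return patched
```

The patched properties would then go into the mapping body passed to create_index above. The original dict is left untouched, so the dynamic mapping can be reused for other experiments.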
The pydocumentdb.document_client.DocumentClient object has a CreateCollection() method, defined here.
When creating a collection with this method, one needs to specify the database link (already known), the collection (I don't know how to reference it if it hasn't been made) and options.
Parameters that I would like to control when creating the collection are:
name of collection
type of collection (fixed size vs. partitioned)
partition keys
RU value
Indexing policy (or at least be able to create a default template somewhere and automatically copy it to the newly created one)
Enums for some of these parameters seem to be defined here, but I don't see any potentially useful HTTP headers in http_constants.py, and I don't see where RUs come in to play or where a cohesive "Collection" object would be passed as a parameter.
You could refer to the source sample code from here and the rest api from here.
import pydocumentdb
import pydocumentdb.errors as errors
import pydocumentdb.document_client as document_client
config = {
    'ENDPOINT': 'https://***.documents.azure.com:443/',
    'MASTERKEY': '***'
}
# Initialize the Python DocumentDB client
client = document_client.DocumentClient(config['ENDPOINT'], {'masterKey': config['MASTERKEY']})
databaseLink = "dbs/db"
coll = {
    "id": "testCreate",
    "indexingPolicy": {
        "indexingMode": "lazy",
        "automatic": False
    },
    "partitionKey": {
        "paths": [
            "/AccountNumber"
        ],
        "kind": "Hash"
    }
}
collection_options = {'offerThroughput': 400}
client.CreateCollection(databaseLink, coll, collection_options)
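To make the parameters from the question explicit, the collection definition could be built by a small helper. This is a sketch only: `collection_definition` is a hypothetical function, and it assumes that leaving out "partitionKey" is what yields a fixed-size (single-partition) collection.

```python
def collection_definition(name, partition_key_path=None, indexing_policy=None):
    """Build the dict passed as the second argument of CreateCollection."""
    definition = {'id': name}
    if partition_key_path is not None:
        # Assumption: a partitioned collection is requested by including
        # a Hash partition key, as in the example above.
        definition['partitionKey'] = {'paths': [partition_key_path], 'kind': 'Hash'}
    if indexing_policy is not None:
        # A shared policy template can be passed in and reused per collection.
        definition['indexingPolicy'] = indexing_policy
    return definition
```

The RU value is not part of this dict; as in the example above, it travels separately in the options argument as 'offerThroughput'.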
Hope it helps you.
How can one access the global parameters ("GlobalParameters") sent from a web service in a Python script on Azure ML?
I tried:
if 'GlobalParameters' in globals():
    myparam = GlobalParameters['myparam']
but with no success.
EDIT: Example
In my case, I'm sending a sound file over the web service (as a list of samples). I would also like to send a sample rate and the number of bits per sample. I've successfully configured the web service (I think) to take these parameters, so the GlobalParameters now look like:
"GlobalParameters": {
"sampleRate": "44100",
"bitsPerSample": "16",
}
However, I cannot access these variables from the Python script, neither as GlobalParameters["sampleRate"] nor as sampleRate. Is it possible? Where are they stored?
Based on our understanding of your question, there may be a misconception here: Azure ML web service parameters are not truly "global parameters"; they are parameter substitutions tied to a particular module. In effect, there are no global parameters accessible throughout the experiment. That being the case, we think the experiment below accomplishes what you are asking for:
Please add an “Enter Data” module to the experiment and add data in CSV format. Then, for the Data field, click the parameter to create a web service parameter. Add in the CSV data, which will be substituted with the data passed by the client application.
Please add an “Execute Python” module and hook up the “Enter Data” output to the “Execute Python” input1. Add Python code to take dataframe1 and add it to a Python list. Once you have it in a list, you can use it anywhere in your Python code.
Python code snippet
def azureml_main(dataframe1 = None, dataframe2 = None):
    import pandas as pd
    global_list = []
    for g in dataframe1["Col3"]:
        global_list.append(g)
    df_global = pd.DataFrame(global_list)
    print('Input pandas.DataFrame:\r\n\r\n{0}'.format(df_global))
    return [df_global]
Once you publish your experiment, you can pass new values in the "Data": "" section below; they are substituted for the “Enter Data” values in the experiment.
data = {
    "Inputs": {
        "input1":
        {
            "ColumnNames": ["Col1", "Col2", "Col3"],
            "Values": [ [ "0", "value", "0" ], [ "0", "value", "0" ], ]
        }, },
    "GlobalParameters": {
        "Data": "1,sampleRate,44500\\n2,bitsPerSample,20",
    }
}
Please feel free to let us know if this makes sense.
The GlobalParameters parameter can not be used in a Python script. It is used to override certain parameters in other modules.
If you, for example, take the 'Split Data' module, you'll find an option to turn a parameter into a web service parameter:
Once you click that, a new section appears titled "Web Service Parameters". There you can change the default parameter name to one of your choosing.
If you deploy your project as a web service, you can override that parameter by putting it in the GlobalParameters parameter:
"GlobalParameters": {
"myFraction": 0.7
}
I hope that clears things up a bit.
Although it is not possible to use GlobalParameters in the Python script (see my previous answer), you can however hack/abuse the second input of the Python script to pass in other parameters. In my example I call them metadata parameters.
To start, I added:
a Web service input module with name: "realdata" (for your real data, of course)
a Web service input module with name: "metadata" (we will abuse this one to pass parameters to our Python).
a Web service output module with name: "computedMetadata"
Connect the modules as follows:
As you can see, I also added a real data set (Restaurant ratings) as well as a dummy metadata CSV (the Enter Data Manually module).
In this manual data you will have to predefine your metadata parameters as if they were a CSV with a header and only a single row to hold the data:
In the example both sampleRate and bitsPerSample are set to 0.
My Python script then takes in that fake CSV as metadata, does some dummy calculation with it and returns the result as a column name:
import pandas as pd
def azureml_main(realdata = None, metadata = None):
    theSum = metadata["sampleRate"][0] + metadata["bitsPerSample"][0]
    outputString = "The sum of the sampleRate and the bitsPerSecond is " + str(theSum)
    print(outputString)
    return pd.DataFrame([outputString])
I then published this as a web service and called it using Node.js like this:
httpreq.post('https://ussouthcentral.services.azureml.net/workspaces/xxx/services/xxx', {
    headers: {
        Authorization: 'Bearer xxx'
    },
    json: {
        "Inputs": {
            "realdata": {
                "ColumnNames": [
                    "userID",
                    "placeID",
                    "rating"
                ],
                "Values": [
                    [
                        "100",
                        "101",
                        "102"
                    ],
                    [
                        "200",
                        "201",
                        "202"
                    ]
                ]
            },
            "metadata": {
                "ColumnNames": [
                    "sampleRate",
                    "bitsPerSample"
                ],
                "Values": [
                    [
                        44100,
                        16
                    ]
                ]
            }
        },
        "GlobalParameters": {}
    }
}, (err, res) => {
    if(err) return console.log(err);
    console.log(JSON.parse(res.body));
});
The output was as expected:
{ Results:
{ computedMetadata:
{ type: 'table',
value:
{ ColumnNames: [ '0' ],
ColumnTypes: [ 'String' ],
Values:
[ [ 'The sum of the sampleRate and the bitsPerSecond is 44116' ] ] } } } }
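For clients written in Python rather than Node.js, the same request body could be assembled with a small helper. A minimal sketch, assuming the input names "realdata" and "metadata" from the experiment above; `build_payload` is a hypothetical function and no HTTP call is made here.

```python
import json


def build_payload(columns, rows, metadata):
    """Build the web service request body with a one-row "metadata" input."""
    return {
        'Inputs': {
            'realdata': {
                'ColumnNames': list(columns),
                'Values': [list(row) for row in rows],
            },
            'metadata': {
                # One header per parameter and a single row holding the values.
                'ColumnNames': list(metadata.keys()),
                'Values': [list(metadata.values())],
            },
        },
        'GlobalParameters': {},
    }


payload = build_payload(
    ['userID', 'placeID', 'rating'],
    [['100', '101', '102'], ['200', '201', '202']],
    {'sampleRate': 44100, 'bitsPerSample': 16},
)
body = json.dumps(payload)  # send as the POST body with the Bearer token header
```

Keeping the parameters in a plain dict means adding another one only requires a new column in the dummy metadata CSV and a new key here.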
Good luck!