Cannot create cluster with properties using the dataproc API - python

I'm trying to create a cluster programmatically in python:
import googleapiclient.discovery

dataproc = googleapiclient.discovery.build('dataproc', 'v1')
zone_uri = 'https://www.googleapis.com/compute/v1/projects/{project_id}/zones/{zone}'.format(
    project_id=my_project_id,
    zone=my_zone,
)
cluster_data = {
    'projectId': my_project_id,
    'clusterName': my_cluster_name,
    'config': {
        'gceClusterConfig': {
            'zoneUri': zone_uri
        },
        'softwareConfig': {
            'properties': {'string': {'spark:spark.executor.memory': '10gb'}},
        },
    },
}
result = dataproc \
    .projects() \
    .regions() \
    .clusters() \
    .create(
        projectId=my_project_id,
        region=my_region,
        body=cluster_data,
    ) \
    .execute()
And I keep getting this error: Invalid JSON payload received. Unknown name "spark:spark.executor.memory" at 'cluster.config.software_config.properties[0].value': Cannot find field.
The API doc is here: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#SoftwareConfig
Property keys are specified in prefix:property format, such as core:fs.defaultFS.
And even when I change the properties to {'string' : {'core:fs.defaultFS' : 'hdfs://'}}, I get that same error.

Properties is a key/value mapping:
'properties': {
    'spark:spark.executor.memory': 'foo'
}
The documentation could have had a better example. In general, the best way to find out what the API looks like is to click "Equivalent REST" in the Cloud Console, or to pass --log-http when using gcloud. For example:
$ gcloud dataproc clusters create clustername --properties spark:spark.executor.memory=foo --log-http
=======================
==== request start ====
uri: https://dataproc.googleapis.com/v1/projects/projectid/regions/global/clusters?alt=json
method: POST
== body start ==
{"clusterName": "clustername", "config": {"gceClusterConfig": {"internalIpOnly": false, "zoneUri": "us-east1-d"}, "masterConfig": {"diskConfig": {}}, "softwareConfig": {"properties": {"spark:spark.executor.memory": "foo"}}, "workerConfig": {"diskConfig": {}}}, "projectId": "projectid"}
== body end ==
==== request end ====
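For reference, here is the question's snippet with just the properties shape corrected (a sketch; it reuses the dataproc, zone_uri, my_project_id, my_region, and my_cluster_name values defined in the question):

cluster_data = {
    'projectId': my_project_id,
    'clusterName': my_cluster_name,
    'config': {
        'gceClusterConfig': {
            'zoneUri': zone_uri
        },
        'softwareConfig': {
            # Keys map straight to values; there is no intermediate 'string' wrapper.
            'properties': {'spark:spark.executor.memory': '10gb'},
        },
    },
}

result = dataproc.projects().regions().clusters().create(
    projectId=my_project_id,
    region=my_region,
    body=cluster_data,
).execute()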

Related

Error in creation of work-item in Azure DevOps using Pytest/Python

I am trying to create a work-item using Python and the requests library.
def test_create_work_item(work_items):
    payload = {
        'op': 'add',
        'path': '/fields/System.Title',
        'value': 'Sample bug'
    }
    pl = json.dumps(payload)
    work_item = work_items.create(body=pl, type='bug')
    assert work_item.status_code == 200
I am getting the below error for this:
{"$id":"1","innerException":null,"message":"You must pass a valid patch document in the body of the request.","typeName":"Microsoft.VisualStudio.Services.Common.VssPropertyValidationException,Microsoft.VisualStudio.Services.Common","typeKey":"VssPropertyValidationException","errorCode":0,"eventId":3000}
The same body works okay with Postman. So not sure what more is needed here to get it working.
I'm not familiar with Python... Check this example: Create work item
The API uses an array of new fields:
[
    {
        "op": "add",
        "path": "/fields/System.Title",
        "from": null,
        "value": "Sample task"
    }
]
In your case, you use just one field in the request:
{
    'op': 'add',
    'path': '/fields/System.Title',
    'value': 'Sample bug'
}
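Applied to the Python test above, the fix is presumably just to wrap the single operation in a list before serializing (work_items is the fixture from the question; depending on how it builds the request, the Content-Type header may also need to be application/json-patch+json):

import json

def test_create_work_item(work_items):
    # A JSON Patch document is an array of operations, not a bare object.
    payload = [{
        'op': 'add',
        'path': '/fields/System.Title',
        'value': 'Sample bug'
    }]
    pl = json.dumps(payload)
    work_item = work_items.create(body=pl, type='bug')
    assert work_item.status_code == 200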

Using the Firestore REST API to Update a Document Field

I've been searching for a pretty long time but I can't figure out how to update a field in a document using the Firestore REST API. I've looked at other questions but they haven't helped me, since I'm getting a different error:
{'error': {'code': 400, 'message': 'Request contains an invalid argument.', 'status': 'INVALID_ARGUMENT', 'details': [{'#type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'oil', 'description': "Error expanding 'fields' parameter. Cannot find matching fields for path 'oil'."}]}]}}
I'm getting this error even though I know that the "oil" field exists in the document. I'm writing this in Python.
My request body (field is the field in a document and value is the value to set that field to, both strings received from user input):
{
    "fields": {
        field: {
            "integerValue": value
        }
    }
}
My request (authorizationToken is from a different request, dir is also a string from user input which controls the directory):
requests.patch("https://firestore.googleapis.com/v1beta1/projects/aethia-resource-management/databases/(default)/documents/" + dir + "?updateMask.fieldPaths=" + field, data = body, headers = {"Authorization": "Bearer " + authorizationToken}).json()
Based on the official docs (1, 2, and 3), GitHub, and a nice article, for the example you have provided you should use the following:
requests.patch("https://firestore.googleapis.com/v1beta1/projects{projectId}/databases/{databaseId}/documents/{document_path}?updateMask.fieldPaths=field")
Your request body should be:
{
    "fields": {
        "field": {
            "integerValue": Value
        }
    }
}
Also keep in mind that if you want to update multiple fields and values you should specify each one separately.
Example:
https://firestore.googleapis.com/v1beta1/projects/{projectId}/databases/{databaseId}/documents/{document_path}?updateMask.fieldPaths=[Field1]&updateMask.fieldPaths=[Field2]
and the request body would be:
{
    "fields": {
        "Field1": {
            "integerValue": Value1
        },
        "Field2": {
            "stringValue": "Value2"
        }
    }
}
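As a sketch, that multi-field update with the requests library might look like this (the project, database, document path, and field names are placeholders, and authorizationToken comes from the question; note that integerValue is passed as a string in the REST API):

import requests
import json

url = ("https://firestore.googleapis.com/v1beta1/projects/{projectId}/databases/{databaseId}/"
       "documents/{document_path}"
       "?updateMask.fieldPaths=Field1&updateMask.fieldPaths=Field2")
body = {
    "fields": {
        "Field1": {"integerValue": "42"},   # int64 values are encoded as JSON strings
        "Field2": {"stringValue": "Value2"}
    }
}
res = requests.patch(
    url,
    data=json.dumps(body),
    headers={"Authorization": "Bearer " + authorizationToken}
)
print(res.json())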
EDIT:
Here is a way I have tested which allows you to update some fields of a document without affecting the rest.
This sample code creates a document under collection users with 4 fields, then tries to update 3 out of 4 fields (which leaves the one not mentioned unaffected)
from google.cloud import firestore

db = firestore.Client()

# Creating a sample new document "aturing" under collection "users"
doc_ref = db.collection(u'users').document(u'aturing')
doc_ref.set({
    u'first': u'Alan',
    u'middle': u'Mathison',
    u'last': u'Turing',
    u'born': 1912
})

# Updating 3 out of 4 fields (so the last should remain unaffected)
doc_ref = db.collection(u'users').document(u'aturing')
doc_ref.update({
    u'first': u'Alan',
    u'middle': u'Mathison',
    u'born': 2000
})

# Printing the content of all docs under users
users_ref = db.collection(u'users')
docs = users_ref.stream()
for doc in docs:
    print(u'{} => {}'.format(doc.id, doc.to_dict()))
EDIT: 10/12/2019
PATCH with REST API
I have reproduced your issue, and it seems like you are not converting your request body to JSON properly.
You need to use json.dumps() to convert your request body to a valid JSON string.
A working example is the following:
import requests
import json

endpoint = "https://firestore.googleapis.com/v1/projects/[PROJECT_ID]/databases/(default)/documents/[COLLECTION]/[DOCUMENT_ID]?currentDocument.exists=true&updateMask.fieldPaths=[FIELD_1]"
body = {
    "fields": {
        "[FIELD_1]": {
            "stringValue": "random new value"
        }
    }
}
data = json.dumps(body)
headers = {"Authorization": "Bearer [AUTH_TOKEN]"}

print(requests.patch(endpoint, data=data, headers=headers).json())
I found the official documentation not to be of much use, since there was no example mentioned. This is the API endpoint for your Firestore database:
PATCH https://firestore.googleapis.com/v1beta1/projects/{YOUR_PROJECT_ID}/databases/(default)/documents/{COLLECTION_NAME}/{DOCUMENT_NAME}
The following is the body of your API request:
{
    "fields": {
        "first_name": {
            "stringValue": "Kurt"
        },
        "name": {
            "stringValue": "Cobain"
        },
        "band": {
            "stringValue": "Nirvana"
        }
    }
}
The response you should get upon successful update of the database should look like:
{
    "name": "projects/{YOUR_PROJECT_ID}/databases/(default)/documents/{COLLECTION_ID}/{DOC_ID}",
    "fields": {
        "first_name": {
            "stringValue": "Kurt"
        },
        "name": {
            "stringValue": "Cobain"
        },
        "band": {
            "stringValue": "Nirvana"
        }
    },
    "createTime": "{CREATE_TIME}",
    "updateTime": "{UPDATE_TIME}"
}
Note that performing the above action as-is would replace the entire document, meaning that any fields that existed previously but have NOT been mentioned in the "fields" body will be deleted. In order to update only some fields, add ?updateMask.fieldPaths={FIELD_NAME} to the end of your API call, once for each field you are writing; fields not listed in the mask are left untouched.
For example:
PATCH https://firestore.googleapis.com/v1beta1/projects/{YOUR_PROJECT_ID}/databases/(default)/documents/{COLLECTION_NAME}/{DOCUMENT_NAME}?updateMask.fieldPaths=name&updateMask.fieldPaths=band&updateMask.fieldPaths=age (and so on)

Elasticsearch with python: query specific field

I'm using python's elasticsearch module to connect and search through my elasticsearch cluster.
In the cluster, one of the fields in my index is 'message' - I want to query my elastic, from python, for a specific value in this 'message' field.
Here is my basic search which simply returns all logs of a specific index.
import elasticsearch

es = elasticsearch.Elasticsearch(source_cluster)
doc = {
    'size': 10000,
    'query': {
        'match_all': {}
    }
}
res = es.search(index='test-index', body=doc, scroll='1m')
How should I change this query in order to find all results with the word 'moved' in their 'message' field?
The equivalent query that does it from Kibana is:
_index:test-index && message: moved
Thanks,
Noam
You need to use the match query. Try this:
doc = {
    'size': 10000,
    'query': {
        'match': {
            'message': 'moved'
        }
    }
}
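Putting it together with the setup from the question (source_cluster and the index name come from there), a full sketch:

import elasticsearch

es = elasticsearch.Elasticsearch(source_cluster)
doc = {
    'size': 10000,
    'query': {
        'match': {
            'message': 'moved'
        }
    }
}
res = es.search(index='test-index', body=doc, scroll='1m')
for hit in res['hits']['hits']:
    print(hit['_id'], hit['_source'].get('message'))

A match query runs full-text analysis on the field, which mirrors the Kibana query message: moved; for exact, unanalyzed matching you would use a term query on a keyword field instead.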

Adding a Source to Librato Data When Sending through Segment

I am trying to figure out how to add a source to a metric in Librato when sending the information via Segment. I am using the python library and have tried creating a property for source (below) but it doesn't seem to be working properly.
Here's what I've got:
userID = '12345'
analytics.track(userID, 'event', {
    'value': 1,
    'integrations.Librato.source': userID
})
I've also tried 'source' and 'Librato.source' as properties, which were referenced in Segment's documentation. Any suggestions?
Similarly for ruby, using the segment gem you can specify a source like so:
require 'analytics-ruby'

segment_token = 'asdfasdf' # The secret write key for my project
Analytics.init({
  secret: segment_token,
  # Optional error handler
  on_error: Proc.new { |status, msg| print msg }
})
Analytics.track(
  user_id: 123,
  writeKey: segment_token,
  event: 'segment.librato',
  properties: { value: 42 },
  context: { source: 'my.source.name' }
)
You can't set the source of the Librato metric in the properties when sending from Segment; you need to send it as part of the context metadata. Librato does not accept any properties other than 'value', so nothing else you send as a property will be recorded. To set the source using the Python library, the code needs to be as follows:
userID = '12345'
analytics.track(userID, 'event', {
    'value': 1
}, {
    'Librato': {
        'source': userID
    }
})
If you are using JavaScript, it would be:
var userID = '12345';
analytics.track({
    userId: userID,
    event: 'event',
    properties: {
        value: 1
    },
    context: {
        'Librato': {
            'source': userID
        }
    }
});

How to use "suggest" in elasticsearch pyes?

How to use the "suggest" feature in pyes? Cannot seem to figure it out due to poor documentation. Could someone provide a working example? None of what I tried appears to work. In the docs it's listed under query, but using:
query = Suggest(fields="fieldname")
connectionobject.search(query=query)
Since version 5:
_suggest endpoint has been deprecated in favour of using suggest via _search endpoint. In 5.0, the _search endpoint has been optimized for suggest only search requests.
(from https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-suggesters.html)
A better way to do this is to use the search API with the suggest option:
from elasticsearch import Elasticsearch

es = Elasticsearch()
text = 'ra'
suggest_dictionary = {
    "my-entity-suggest": {
        'text': text,
        "completion": {
            "field": "suggest"
        }
    }
}
query_dictionary = {'suggest': suggest_dictionary}
res = es.search(
    index='auto_sugg',
    doc_type='entity',
    body=query_dictionary)
print(res)
Make sure you have indexed each document with a suggest field:
sample_entity = {
    'id': 'test123',
    'name': 'Ramtin Seraj',
    'title': 'XYZ',
    "suggest": {
        "input": ['Ramtin', 'Seraj', 'XYZ'],
        "output": "Ramtin Seraj",
        "weight": 34  # a prior weight
    }
}
Here is my code which runs perfectly.
from elasticsearch import Elasticsearch

es = Elasticsearch()
text = 'ra'
suggDoc = {
    "entity-suggest": {
        'text': text,
        "completion": {
            "field": "suggest"
        }
    }
}
res = es.suggest(body=suggDoc, index="auto_sugg", params=None)
print(res)
I used the same client mentioned on the elasticsearch site here
I indexed the data in the elasticsearch index by using completion suggester from here
