How to delete a specific document from azure search index?

I need to delete a specific set of documents from an Azure Search index, and I need a Python solution for it.
I have already created the index in Azure Search; the format of its documents is given below.
> {
> "@odata.context": "https://{name}.search.windows.net/indexes({'index name'})/$metadata#docs(*)",
> "value": [
> {
> "@search.score": ,
> "content": "",
> "metadata_storage_name": "",
> "metadata_storage_path": "",
> "metadata_storage_file_extension": "",}]}
metadata_storage_path is the unique key for each document in the Azure Search index.
I tried two approaches, one with the Python requests module and one with the Azure Python SDK, but both raise the errors listed below.
Method 1 (using the Python requests module)
I took the reference from the Azure documentation:
https://learn.microsoft.com/en-us/rest/api/searchservice/addupdate-or-delete-documents
import json
import requests
api_key = "B346FEAB56E6D5*******"
headers = {
'api-key': f'{api_key}',
'Content-Type': 'application/json'
}
doc_idx = "Index name"
doc_url = f"https://{name}.search.windows.net/indexes/{doc_idx}-index/docs/search?api-version=2020-06-30-Preview"
payload = json.dumps({
"@search.action": "delete",
"key_field_name":({"metadata_storage_path": "aHR0cHM6Ly9mc2NvZ******"})
},
)
response = json.loads(requests.request("POST", doc_url, headers=headers, data=payload).text)
print(response)
I am getting the following error.
{'error': {'code': '',
'message': "The request is invalid. Details: parameters : The parameter 'key_field_name' in the request payload is not a valid parameter for the operation 'search'.\r\n"}}
I also tried modifying the code but could not make it work. Please let me know whether I am making a mistake in the code or whether there is an issue between the Python requests module and Azure Search.
Method 2 (using the Azure Python SDK)
I took the reference from the Azure documentation:
https://learn.microsoft.com/en-us/python/api/azure-search-documents/azure.search.documents.searchclient?view=azure-python
I tried to delete one document from the Azure Search index with the Azure Python SDK; the code is given below.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents import SearchClient
key = AzureKeyCredential('B346FEAB******')
doc_idx = "index name"
service_endpoint = f"https://{name}.search.windows.net/indexes/{doc_idx}-index/docs/search?api-version=2020-06-30-Preview"
search_client = SearchClient(service_endpoint, doc_idx , key,)
# result = search_client.delete_documents(documents=[DOCUMENT])
result = search_client.delete_documents(documents=[{"metadata_storage_name": "XYZ.jpg"}])
print("deletion of document succeeded: {}".format(result[0].succeeded))
I am getting the following error.
ResourceNotFoundError Traceback (most recent call last)
<ipython-input-7-88beecc15663> in <module>
13 # result = search_client.upload_documents(documents=[DOCUMENT])
14
---> 15 result = search_client.delete_documents(documents=[{"metadata_storage_name": "XYZ.jpg"}])
ResourceNotFoundError: Operation returned an invalid status 'Not Found'
I also tried using metadata_storage_path instead of metadata_storage_name and got the same error.
Please check the code and let me know where I am making a mistake, and whether there is any other method for deleting a specific document from an Azure Search index.

You have not defined a variable for name.
service_endpoint = f"https://{name}.search.windows.net/indexes/{doc_idx}-index/docs/search?api-version=2020-06-30-Preview"
Becomes
https://.search.windows.net/indexes/Index%20name-index/docs/search?api-version=2020-06-30-Preview
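For reference, the REST API page linked in the question describes document deletion as a POST to the index's docs/index endpoint (not docs/search), with the actions wrapped in a "value" array. Below is a minimal sketch that only builds such a request. The service name, index name, key value and API version are placeholders, and build_delete_request is a hypothetical helper, not part of any SDK.

```python
import json


def build_delete_request(service_name, index_name, key_field, keys,
                         api_version="2020-06-30"):
    """Return (url, body) for a batch delete against an Azure Search index.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    # Documents are modified through docs/index; docs/search is for queries.
    url = (f"https://{service_name}.search.windows.net"
           f"/indexes/{index_name}/docs/index?api-version={api_version}")
    # One action per document, identified by the index's key field.
    actions = [{"@search.action": "delete", key_field: key} for key in keys]
    body = json.dumps({"value": actions})
    return url, body


# Placeholder values, assumed for illustration only.
url, body = build_delete_request(
    "my-service", "my-index", "metadata_storage_path", ["aHR0cHM6Ly9leGFtcGxl"])
print(url)
print(body)
```

The resulting body could then be sent with requests.post(url, headers=headers, data=body), reusing the headers from the question. For the SDK route, as far as I can tell the SearchClient endpoint should be just https://{name}.search.windows.net with the index name passed as a separate argument, and the dict given to delete_documents should contain the key field (metadata_storage_path here).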

Related

Bad Request creating Automation Account in Azure Python SDK

I'm trying to create a new AutomationAccount using Python SDK. There's no problem if I get, list, update or delete any account, but I'm getting a BadRequest error when I try to create a new one.
Documentation is pretty easy: AutomationAccountOperations Class > create_or_update()
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
from azure.identity import AzureCliCredential
from azure.mgmt.automation import AutomationClient
credential = AzureCliCredential()
automation_client = AutomationClient(credential, "xxxxx")
result = automation_client.automation_account.create_or_update("existing_rg", 'my_automation_account', {"location": "westeurope"})
print(f'Automation account {result.name} created')
This tiny script is throwing me this error:
Traceback (most recent call last):
File ".\deploy.py", line 10
result = automation_client.automation_account.create_or_update("*****", 'my_automation_account', {"location": "westeurope"})
File "C:\Users\Dave\.virtualenvs\new-azure-account-EfYek8IT\lib\site-packages\azure\mgmt\automation\operations\_automation_account_operations.py", line 174, in create_or_update
raise HttpResponseError(response=response, model=error, error_format=ARMErrorFormat)
azure.core.exceptions.HttpResponseError: (BadRequest) {"Message":"The request body on Account must be present, and must specify, at a minimum, the required fields set to valid values."}
Code: BadRequest
Message: {"Message":"The request body on Account must be present, and must specify, at a minimum, the required fields set to valid values."}
I've tried the same method (create_or_update) with the same parameters from a different SDK (PowerShell) and it worked.
Any thoughts?

The solution is to set the Azure SKU parameter.
For some reason it is not necessary in PowerShell, but it is in the Python SDK. With it, this snippet creates my Automation Account successfully.
from azure.identity import AzureCliCredential
from azure.mgmt.automation import AutomationClient
credential = AzureCliCredential()
automation_client = AutomationClient(credential, "xxxxx")
# The request body must include a SKU in addition to the location.
params = {"name": "my_automation_account", "location": "westeurope", "tags": {}, "sku": {"name": "free"}}
result = automation_client.automation_account.create_or_update("existing_rg", 'my_automation_account', params)
print(f'Automation account {result.name} created')
Docs about this:
AutomationAccountOperations Class > create_or_update
AutomationAccountCreateOrUpdateParameters Class
Sku Class
Thanks @UpQuark

Trying to Parse a JSON using Python but having issues

I usually use PowerShell and have successfully parsed JSON from HTTP requests with it before. I am now using Python with the 'Requests' library, and I have successfully retrieved the JSON from the API. Here is the format it came through in (I removed some information and other fields):
{'content': [
{
'ContactCompany': 'Star',
'ContactEmail': 'test@company.star',
'ContactPhoneNumber': '123-456-7894',
'assignedGroup': 'TR_Hospital',
'assignedGroupId': 'SGP000000132297',
'serviceClass': None, 'serviceReconId': None
}
]
}
I'm having trouble getting the values inside 'content'. Based on my PowerShell experience, I tried:
tickets_json = requests.get(request_url, headers=api_header).json()
Tickets_Info = tickets_json.content
for tickets in tickets_info:
tickets.assignedGroup
How do I parse the JSON to get the information inside of 'Content' in Python?

tickets_json = requests.get(request_url, headers=api_header).json()
tickets_info = tickets_json['content']
for tickets in tickets_info:
    print(tickets['assignedGroup'])
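The answer above works because .json() returns plain Python dicts and lists, which are read with square-bracket keys rather than PowerShell-style dot access. Here is a small self-contained sketch that uses a hard-coded sample in place of the API response (the field values are taken from the question; the "notAField" key is made up for illustration):

```python
# Sample standing in for requests.get(...).json(); values are from the question.
tickets_json = {
    "content": [
        {
            "assignedGroup": "TR_Hospital",
            "assignedGroupId": "SGP000000132297",
            "serviceClass": None,
        }
    ]
}

# 'content' is a list of dicts, so iterate and index each ticket by key.
assigned_groups = [ticket["assignedGroup"] for ticket in tickets_json["content"]]
print(assigned_groups)

# .get() returns a default instead of raising KeyError when a key is absent.
missing = tickets_json["content"][0].get("notAField", "n/a")
print(missing)
```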

Search for specific information within a json file

I am trying to retrieve data from an API. Getting the JSON data from the API into my Python program works fine, but I am having trouble searching through the JSON for the specific data I want.
Here is a basic idea of what the json file from the api looks like:
{
"data": {
"inventory": {
"object_name": {
"amount": 8,
},
(The closing braces are omitted here.)
I am trying to find the amount of a specific object within the JSON.
Here is the code I have so far; it fails with the error json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I researched the error and it is usually caused by malformed JSON. Since the JSON comes straight from an API, I doubt the data itself is at fault; the problem is more likely in the string conversion I do in my code.
data = requests.get('[api]',
headers={
"[api key name]" : "[api key]"
})
dataJson = data.json()
dataStr = str(dataJson)
amt = json.loads(dataStr)['data'].get('inventory').get('object_name').get('amount')
As stated above, my main issue is extracting the data I need from the JSON; getting the data into the Python script works fine.

data.json() already returns a Python dict, so there is no need to convert it to a string and call json.loads(dataStr). Just use:
data = requests.get('[api]',
    headers={
        "[api key name]" : "[api key]"
    })
dataJson = data.json()
amt = dataJson['data'].get('inventory').get('object_name').get('amount')
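To see why the original code failed, note that str() on a dict produces a Python repr with single quotes, which is not valid JSON. A short sketch, assuming a sample dict shaped like the one in the question:

```python
import json

# Sample shaped like the API response described in the question.
data_json = {"data": {"inventory": {"object_name": {"amount": 8}}}}

# str() produces a Python repr with single quotes, which is not valid JSON.
as_repr = str(data_json)
try:
    json.loads(as_repr)
    repr_parsed = True
except json.JSONDecodeError:
    repr_parsed = False

# json.dumps() produces real JSON text that round-trips through json.loads().
round_tripped = json.loads(json.dumps(data_json))

# The dict can simply be indexed directly, with no string conversion at all.
amount = data_json["data"]["inventory"]["object_name"]["amount"]
print(repr_parsed, round_tripped == data_json, amount)
```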

How to download list data from SharePoint Online to a csv (preferably) or json file?

I have accessed a list in SharePoint Online with Python and want to save the list data to a file (CSV or JSON) so that I can transform it and sort some metadata for a migration.
I have full access to the SharePoint site I am connecting to (client ID, secret, ...).
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.runtime.client_request import ClientRequest
from office365.sharepoint.client_context import ClientContext
I have set my settings:
app_settings = {
'url': 'https://company.sharepoint.com/sites/abc',
'client_id': 'id',
'client_secret': 'secret'
}
Connecting to the site:
context_auth = AuthenticationContext(url=app_settings['url'])
context_auth.acquire_token_for_app(client_id=app_settings['client_id'],
client_secret=app_settings['client_secret'])
ctx = ClientContext(app_settings['url'], context_auth)
Getting the lists and checking the titles:
lists = ctx.web.lists
ctx.load(lists)
ctx.execute_query()
for lista in lists:
print(lista.properties["Title"]) # this gives me the titles of each list and it works.
lists is a ListCollection Object
From the previous code, I see that I want to get the list titled: "Analysis A":
a1 = lists.get_by_title("Analysis A")
ctx.load(a1)
ctx.execute_query() # a1 is a List item - non-iterable
Then I get the data in that list:
a1w = a1.get_items()
ctx.load(a1w)
ctx.execute_query() # a1w is a ListItemCollection - iterable
Idea 1: DataFrame to JSON/CSV
df1 = pd.DataFrame(a1w)  # doesn't work
idea 2:
follow this link: How to save a Sharepoint list as a file?
I get an error while executing the json.loads command:
JSONDecodeError: Extra data: line 1 column 5 (char 4)
Alternatives:
I tried Shareplum, but can't connect with it, like I did with office365-python-rest. My guess is that it doesn't have an authorisation option with client id and client secret (as far as I can see)
How would you do it? Or am I missing something?

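Sample demo for your reference (app_settings is the dict from the question):
import pandas
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
context_auth = AuthenticationContext(url=app_settings['url'])
context_auth.acquire_token_for_app(client_id=app_settings['client_id'],
                                   client_secret=app_settings['client_secret'])
ctx = ClientContext(app_settings['url'], context_auth)
# Avoid naming the variable "list", which would shadow the built-in.
target_list = ctx.web.lists.get_by_title("ListA")
items = target_list.get_items()
ctx.load(items)
ctx.execute_query()
data_list = []
for item in items:
    data_list.append({"Title": item.properties["Title"], "Created": item.properties["Created"]})
    print("Item title: {0}".format(item.properties["Title"]))
# Build the DataFrame directly from the list of dicts and write it to CSV.
pandas.DataFrame(data_list).to_csv("output.csv", index=False)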
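If pandas is not needed for anything else, the same list of dicts can be written with the standard library alone. This is a sketch under the assumption that each item's properties have already been copied into plain dicts, as in the loop above; the sample rows and the rows_to_csv helper are made up for illustration.

```python
import csv
import io

# Stand-in for the item.properties dicts collected from SharePoint (assumed values).
rows = [
    {"Title": "Report A", "Created": "2020-03-15T16:33:00Z"},
    {"Title": "Report B", "Created": "2020-03-16T09:00:00Z"},
]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text using only the standard library."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any keys that are not listed in fieldnames.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, ["Title", "Created"])
print(csv_text)
```

To write straight to disk instead, the same writer can be pointed at a file opened with open("output.csv", "w", newline="").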

Idea 1
It's hard to tell what went wrong without the error trace, but I suspect it has to do with malformed data being passed as the argument. See the documentation to know exactly what's expected.
Do also consider updating your question with the relevant stack traces.
Idea 2
JSONDecodeError: Extra data: line 1 column 5 (char 4)
This error simply means that the JSON string is not in a valid format. You can validate JSON strings with an online validator, which often tells you the point of error so that you can fix the problem manually.
This error can also occur if the object being parsed is a Python object. You can avoid this by parsing each line as you go:
import json
data_list = []
with open('file_name.json', 'r') as f:
    for line in f:
        data_list.append(json.loads(line))
This avoids storing intermediate python objects. Also see this related issue if nothing works.
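One common way to get an "Extra data" error is text that holds several JSON documents back to back, which is the case the line-by-line loop above handles. A minimal sketch with made-up sample data:

```python
import io
import json

# Two JSON documents, one per line (assumed sample data).
raw = '{"Title": "A"}\n{"Title": "B"}\n'

# Parsing the whole text at once fails: the parser finds a second document
# after the first one and reports "Extra data".
try:
    json.loads(raw)
    whole_text_error = None
except json.JSONDecodeError as exc:
    whole_text_error = exc.msg

# Parsing line by line treats each non-empty line as its own document.
records = [json.loads(line) for line in io.StringIO(raw) if line.strip()]
print(whole_text_error)
print(records)
```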

Google Cloud Analyze Sentiment in JupyterLab with Python

I am using Google Cloud / JupyterLab /Python
I'm trying to run a sample sentiment analysis, following the guide here
However, on running the example, I get this error:
AttributeError: 'SpeechClient' object has no attribute 'analyze_sentiment'
Below is the code I'm trying:
def sample_analyze_sentiment (gcs_content_uri):
    gcs_content_uri = 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt'
    client = language_v1.LanguageServiceClient()
    type_ = enums.Document.Type.PLAIN_TEXT
    language = "en"
    document = {
        "gcs_content_uri":'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt',
        "type": 'enums.Document.Type.PLAIN_TEXT', "language": 'en'
    }
    response = client.analyze_sentiment(document,
                                        encoding_type=encoding_type)
I had no problem generating the transcript using Speech to Text, but I have had no success getting a document sentiment analysis.

I had no problem running analyze_sentiment following the documentation example.
I see some issues with your code. I think it should be:
from google.cloud import language_v1
from google.cloud.language import enums
from google.cloud.language import types
def sample_analyze_sentiment(path):
    #path = 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt'
    # if path is sent through the function it does not need to be specified inside it
    # you can always set path = "default-path" when defining the function
    client = language_v1.LanguageServiceClient()
    document = types.Document(
        gcs_content_uri = path,
        type = enums.Document.Type.PLAIN_TEXT,
        language = 'en',
    )
    response = client.analyze_sentiment(document)
    return response
I then tried the code above with my own path to a text file inside a bucket in Google Cloud Storage.
response = sample_analyze_sentiment("<my-path>")
sentiment = response.document_sentiment
print(sentiment.score)
print(sentiment.magnitude)
The run succeeded, with sentiment score -0.5 and magnitude 1.5. I ran it in JupyterLab with Python 3, which I assume is the setup you have.
