Individual User Custom Search Engine with ElasticSearch/Django Haystack - python

We're in the process of writing a Django app that lets users send private messages among themselves, as well as messages to a group, and we're looking to implement per-user customized search so each user can search and view only the messages they have received.
How do we offer a search experience that's customized to each user? Some messages are part of threads sent to thousands of users in a group, others may be private messages between two users, and still others may be "pending" messages held for moderation.
Do we hard-code the filters that determine whether a user can view a message into each query we send to ElasticSearch, or, if a message goes to a group with 1,000 members, do we add 1,000 identical documents to ElasticSearch with the only thing changing being the recipient?
Update
So here's an individual message in its serialized form:
{
    "snippet": "Hi All,Though Marylan...", // Friendly snippet, this will be needed in the result
    "thread_id": 28719, // Unique ID for this thread
    "thread_title": "Great Thread Title Here", // Title for the thread, will be used to display in search results
    "sent_at": "2015-03-19 07:28:15.092030-05:00", // Datetime the message was originally sent
    "text": "Clean Message Test Here", // Text to be queryable
    "pending": false, // If pending, this should only appear in the search results of the sender
    "id": 30580, // Unique ID for this message across the entire system
    "sender": {
        "sender_is_staff": false, // If the sender is a staff member or not (filterable)
        "sender": "Anna M.", // Friendly name (we'll need this to display on the result page)
        "sender_guid": "23234304-eeee-bbbb-1234-bfb19d56ad68" // GUID of sender (necessary to display a link to the user's profile in the result)
    },
    "recipient": {
        "name": "", // Not filled in for group messages
        "recipient_guid": "" // Not filled in for group messages
    },
    "type": "group", // Values for this can be 'direct' or 'group'
    "group_id": 43 // This could be null
}
A user should be able to search:
All the messages that they're the "sender" of
All messages where their GUID is in the "recipient" area (and the "type" is "direct")
All the messages sent to the groups IDs they're a member of that are not pending (they could be a member of 100 groups though, so it could be [10,14,15,18,25,44,50,60,75,80,81,82,83,...])
In SQL that'd be SELECT * FROM messages WHERE text contains 'query here' AND (sender.guid = 'my-guid' OR recipient.guid = 'my-guid' OR (group_id in [10,14,15,18,25,44,50,60,75,80,81,82,83,...] AND pending != True))

I hope I'm understanding your problem correctly.
So you have a messaging system with three types of messages (group, two-user, moderated). Your goal is to allow your users to search through all messages, with the option to apply filters on type, user, date, etc.
Take advantage of the scalable nature of ElasticSearch for storing your searchable data. First, consider the servers your ES nodes are running on. Do they have sufficient resources (memory, CPU, network, disk speed) for your traffic and the size and quantity of your documents? Once you've decided on the server specs, you can simply add more nodes as needed to distribute data and processing.
Next, create your message document structure. I imagine your mapping may look something like this:
"message": {
"properties": {
"id": {
"type": "long"
},
"type": {
"type": "string"
},
"body": {
"type": "string"
},
"from_user": {
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
}
}
},
"to_user": {
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
}
}
},
"group": {
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
}
}
},
"added_on": {
"type": "date"
},
"updated_on": {
"type": "date"
},
"status_id": {
"type": "short"
}
}}
You may want to create custom analyzers for the "body" and "name" fields to customize your search results to fit your expectations. Then it's just a matter of writing queries and using filters/sorts to allow users to search globally or from/to specific users or groups.
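To make that concrete: the per-user visibility rules from the question can be expressed in a single bool query, so there's no need to duplicate a group message once per recipient. Below is a sketch using the elasticsearch-py client, with field names taken from the question's serialized message rather than the mapping above; note the bool/filter syntax shown is for Elasticsearch 2.x and later (the 1.x releases current at the time would use a filtered query instead):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Hypothetical values: the current user's GUID and the IDs of the groups
# they belong to, loaded from your Django models at query time.
my_guid = "23234304-eeee-bbbb-1234-bfb19d56ad68"
my_group_ids = [10, 14, 15, 18, 25, 44, 50, 60, 75, 80]

query = {
    "query": {
        "bool": {
            # The full-text part of the search.
            "must": {"match": {"text": "query here"}},
            # The access-control part, applied as a non-scoring filter.
            "filter": {
                "bool": {
                    "should": [
                        {"term": {"sender.sender_guid": my_guid}},
                        {"term": {"recipient.recipient_guid": my_guid}},
                        {
                            "bool": {
                                "must": [{"terms": {"group_id": my_group_ids}}],
                                "must_not": [{"term": {"pending": True}}],
                            }
                        },
                    ],
                    "minimum_should_match": 1,
                }
            },
        }
    }
}

results = es.search(index="messages", body=query)

The three should clauses mirror the branches of the question's SQL (sender, direct recipient, non-pending group message), and minimum_should_match guarantees at least one of them must hold for a document to be visible.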
After that, you just need to set up a bridge between your database and your ES index for syncing your messages for search. Sync frequency depends on how quickly you want messages to be made available for search.
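As a minimal sketch of such a bridge, assuming a Django Message model and a serialize_message() helper that produces the JSON shown in the question (both names are placeholders), you could index each message as it's saved:

from django.db.models.signals import post_save
from django.dispatch import receiver
from elasticsearch import Elasticsearch

from myapp.models import Message  # hypothetical app/model path

es = Elasticsearch()

@receiver(post_save, sender=Message)
def index_message(sender, instance, **kwargs):
    # serialize_message() is assumed to build the dict shown in the question.
    doc = serialize_message(instance)
    # doc_type was required on the Elasticsearch versions of this era.
    es.index(index="messages", doc_type="message", id=doc["id"], body=doc)

Since the question mentions Django Haystack: its RealtimeSignalProcessor gives you roughly this behavior out of the box, and for bulk backfills elasticsearch.helpers.bulk is the usual route.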
I hope I've understood your question correctly.

Related

AVRO - JSON Encoding of messages in Confluent Python SerializingProducer vs. Confluent Rest Proxy with UNIONS

Attached is an example AVRO schema:
{
    "type": "record",
    "name": "DummySampleAvroValue",
    "namespace": "de.company.dummydomain",
    "fields": [
        {
            "name": "ID",
            "type": "int"
        },
        {
            "name": "NAME",
            "type": [
                "null",
                "string"
            ]
        },
        {
            "name": "STATE",
            "type": "int"
        },
        {
            "name": "TIMESTAMP",
            "type": [
                "null",
                "string"
            ]
        }
    ]
}
According to the "JSON Encoding" section of the official AVRO specs (see https://avro.apache.org/docs/current/spec.html#json_encoding), a JSON message that validates against the above AVRO schema should look like the following, because of the UNION types used:
{
    "ID": 1,
    "NAME": {
        "string": "Kafka"
    },
    "STATE": -1,
    "TIMESTAMP": {
        "string": "2022-04-28T10:57:03.048413"
    }
}
When producing this message via Confluent Rest Proxy (AVRO), everything works fine, the data is accepted, validated and present in Kafka.
When using the "SearializingProducer" from the confluent_kafka Python Package, the example message is not accepted and only "regular" JSON works, e. g.:
{
    "ID": 1,
    "NAME": "Kafka",
    "STATE": -1,
    "TIMESTAMP": "2022-04-28T10:57:03.048413"
}
Is this intended behaviour or am I doing something wrong? Can I tell the SerializingProducer to accept this encoding?
I need to keep both ways of producing messages open, but the sending system can/wants to provide only one of the above payloads. Is there a way to use both with the same payload?
Thanks in advance.
Best regards
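One possible workaround, sketched here as an assumption rather than a documented confluent_kafka feature: normalize the union-encoded payload back to the plain values the AvroSerializer expects before producing. The helper below only handles the single-key wrapper dicts that ["null", "string"]-style unions produce, which is all this schema needs:

def unwrap_unions(record):
    # Avro's JSON encoding wraps non-null union values in a single-key dict,
    # e.g. {"string": "Kafka"}; confluent_kafka's AvroSerializer instead wants
    # the plain value, so unwrap any single-key dict back to its value.
    # Caveat: this would also mangle genuine map/record fields that happen to
    # have exactly one key, which this schema does not contain.
    unwrapped = {}
    for field, value in record.items():
        if isinstance(value, dict) and len(value) == 1:
            unwrapped[field] = next(iter(value.values()))
        else:
            unwrapped[field] = value
    return unwrapped

union_encoded = {
    "ID": 1,
    "NAME": {"string": "Kafka"},
    "STATE": -1,
    "TIMESTAMP": {"string": "2022-04-28T10:57:03.048413"},
}

# The result is the "regular" JSON shape that the SerializingProducer accepts.
assert unwrap_unions(union_encoded) == {
    "ID": 1,
    "NAME": "Kafka",
    "STATE": -1,
    "TIMESTAMP": "2022-04-28T10:57:03.048413",
}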

Override the automatically generated parameters in FastAPI

I am writing a FastAPI program that is just a bunch of @app.get endpoints for querying data. There are many, many different query arguments they could use, all automatically generated from a config file. For example, the @app.get("/ADJUST_COLOR/") endpoint could look something like /ADJUST_COLOR/?RED_darker=10&BLUE_lighter=43&GREEN_inverse=true, where all those parameters are generated from a list of colors and a list of operations to perform on those colors (this is only an example, not what I am actually doing).
The way I am doing that is to take in the request object like this:
#app.get("/ADJUST_COLOR/")
def query_COLORS( request: Request ):
return look_through_parameters(request.query_params)
But the problem is that the automatically generated Swagger UI does not show any useful data. Since I am parsing the request manually, no parameters are generated. But since I have a full list of the parameters I am expecting, I should be able to generate my own documentation and have the UI show it.
I have looked through these two documents: https://fastapi.tiangolo.com/tutorial/path-operation-configuration/
And https://fastapi.tiangolo.com/advanced/path-operation-advanced-configuration/
But I was not able to figure out if it was possible or not
You can define a custom API schema for your route via openapi_extra (this is a recent feature of FastAPI; 0.68 will work, but I'm not sure of the exact earliest version that supports it):
#app.get("/ADJUST_COLOR/", openapi_extra={
"parameters": [
{
"in": "query",
"name": "RED_darker",
"schema": {
"type": "integer"
},
"description": "The level of RED_darker"
},
{
"in": "query",
"name": "BLUE_lighter",
"schema": {
"type": "integer"
},
"description": "The level of BLUE_lighter"
},
{
"in": "query",
"name": "GREEN_inverse",
"schema": {
"type": "boolean"
},
"description": "is GREEN_inverse?"
},
]
})
async def query_COLORS(request: Request):
return look_through_parameters(request.query_params)
The three parameters are then rendered in your API's /docs page.
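Since the question says the parameters come from a config file, the openapi_extra dict can also be built programmatically instead of by hand. A hypothetical sketch (build_query_params, the color list, and the operation-to-type mapping are all illustrative, and look_through_parameters is stubbed in for the question's helper):

from fastapi import FastAPI, Request

app = FastAPI()

def look_through_parameters(params):
    # Stand-in for the question's helper: just echo the query params.
    return dict(params)

def build_query_params(colors, operations):
    # Expand every color/operation pair from the config into an
    # OpenAPI query-parameter description.
    return [
        {
            "in": "query",
            "name": f"{color}_{op}",
            "schema": {"type": type_name},
            "description": f"The level of {color}_{op}",
        }
        for color in colors
        for op, type_name in operations.items()
    ]

openapi_params = build_query_params(
    ["RED", "BLUE", "GREEN"],
    {"darker": "integer", "lighter": "integer", "inverse": "boolean"},
)

@app.get("/ADJUST_COLOR/", openapi_extra={"parameters": openapi_params})
async def query_COLORS(request: Request):
    return look_through_parameters(request.query_params)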

How do I pull new employee information from DocuSign through API?

Is it possible to pull the personal information of new employees who get hired every day from DocuSign through the API? I'm trying to find a way to automate the user account creation process from DocuSign to Active Directory while avoiding CSV files. I'm new to this; any input would be useful.
There are two parts to this endeavor.
The first is not related to DocuSign: you need to get an event fired every time a new contact is added to AD and be able to process this request. I assume you have a way to do that.
The second part is using our REST API to add a new user.
Make a POST request to:
POST /v2.1/accounts/{accountId}/users
You pass this information in the request body:
{
    "newUsers": [
        {
            "userName": "Claire Horace",
            "email": "claire@example.com"
        },
        {
            "userName": "Tal Mason",
            "email": "tal@example.com",
            "userSettings": [
                {
                    "name": "canSendEnvelope",
                    "value": "true"
                },
                {
                    "name": "locale",
                    "value": "fr"
                }
            ]
        }
    ]
}
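A minimal sketch of that call in Python with the requests library. The base URL shown is DocuSign's demo environment (production uses a different host), and the account ID and OAuth access token are placeholders you would supply:

import requests

BASE_URL = "https://demo.docusign.net/restapi"  # demo environment host
ACCOUNT_ID = "your-account-id"      # placeholder: your API account ID
ACCESS_TOKEN = "your-oauth-token"   # placeholder: token from a DocuSign OAuth flow

payload = {
    "newUsers": [
        {"userName": "Claire Horace", "email": "claire@example.com"}
    ]
}

response = requests.post(
    f"{BASE_URL}/v2.1/accounts/{ACCOUNT_ID}/users",
    json=payload,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
response.raise_for_status()
print(response.json())  # echoes the created users on success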

Venmo Webhook to Monitor Payment Data

I am working on a charity project at school in which the top 10 donors will be rewarded. The ultimate goal is to have a live feed of the top-10 list like a scoreboard, either on our website or through periodic tweets. I am a second-year computer science major and know Python.
I don't think I will have any issues parsing the JSON into a Python dictionary or list and then sorting the leaderboard. The problem is that I don't know enough about web technologies to import the data using a webhook. I can see the data using https://requestb.in/ and test transactions, but I need a more permanent solution. I also need to be able to run this all online, not on my computer.
I would really appreciate being pointed in the right direction.
Example Transaction data seen on https://requestb.in/
{
    "date_created": "2013-12-16T16:15:23.514136",
    "type": "payment.created",
    "data": {
        "action": "pay",
        "actor": {
            "about": "No Short Bio",
            "date_joined": "2011-09-09T00:30:51",
            "display_name": "Andrew Kortina",
            "first_name": "Andrew",
            "id": "711020519620608087",
            "last_name": "Kortina",
            "profile_picture_url": "",
            "username": "kortina"
        },
        "amount": null,
        "audience": "public",
        "date_completed": "2013-12-16T16:20:00",
        "date_created": "2013-12-16T16:20:00",
        "id": "1312337325098795713",
        "note": "jejkeljeljke",
        "status": "settled",
        "target": {
            "email": null,
            "phone": null,
            "type": "user",
            "user": {
                "about": "No Short Bio",
                "date_joined": "2011-09-09T00:30:54",
                "display_name": "Shreyans Bhansali",
                "first_name": "Shreyans",
                "id": "711020544786432772",
                "last_name": "Bhansali",
                "profile_picture_url": "",
                "username": "shreyans"
            }
        }
    }
}
I see that your example JSON above is from https://developer.venmo.com/docs/webhooks
A webhook is basically just a URL that knows how to handle POST requests; when they want to notify your site/webapp, they call that URL and pass it the information they want you to receive.
The URL can be unencrypted (http) or encrypted (https); if you are dealing with financial info you definitely want it to be encrypted. Check your web host's instructions on setting up an SSL certificate.
The same page describes how to configure your webhook (log in to your Venmo account, go to the Developer tab, and enter your URL). For confirmation, Venmo will make a GET call (e.g. https://your_site/path/page?venmo_challenge=XYZZY); your page needs to return the challenge value (i.e. XYZZY).
I will suggest Flask as a simple Python framework and Heroku for hosting; there are many other alternatives, but this should get you started.
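A minimal Flask sketch of such an endpoint, assuming the challenge/notification behavior described above (the route path is arbitrary, and the leaderboard logic is left as a stub):

from flask import Flask, request

app = Flask(__name__)

@app.route("/venmo-webhook", methods=["GET", "POST"])
def venmo_webhook():
    if request.method == "GET":
        # Venmo verifies the webhook with ?venmo_challenge=...;
        # echo the challenge value back to confirm ownership.
        return request.args.get("venmo_challenge", "")
    # POST: a notification like the payment.created JSON shown above.
    event = request.get_json(force=True)
    if event and event.get("type") == "payment.created":
        payment = event["data"]
        # Update your stored leaderboard here, e.g. using
        # payment["actor"]["display_name"] and payment["amount"].
        pass
    return "", 200

if __name__ == "__main__":
    app.run()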

How to validate a json structure in Python3

I am using Python 3.5 and Django for a web API. When I refer to input, I mean HTTP request parameters. One parameter is expected to contain JSON data, which I need to validate before processing further.
I have a base json structure that the input has to be in.
Example,
{
    "error": "bool",
    "data": [
        {
            "name": "string",
            "age": "number"
        },
        {
            "name": "string",
            "age": "number"
        },
        ...
    ]
}
The above JSON represents the structure that I want my input to be in. The keys are predefined, and each value represents the datatype I am expecting for that key. I came across a Python library (jsonschema) that does this validation, but I can't find any documentation showing that it works with dynamic data, i.e. where the JSON array 'data' can contain any number of objects. Of course, this is the simplest scenario I could come up with to explain the basic requirement. In cases like these, how can I validate my JSON?
The solution here didn't help because it just checks whether the JSON is proper based on a Django model. My JSON has no relation to a Django model; it's a simple JSON structure, and that answer still doesn't tell me how to validate dynamic objects.
JSON Schema is a specification for validating JSON; jsonschema is just a Python library that implements it. It certainly does allow you to specify that a key can contain any number of elements.
An example of a JSON Schema that validates your code might be:
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "additionalProperties": false,
    "required": [
        "error",
        "data"
    ],
    "properties": {
        "error": {
            "type": "boolean"
        },
        "data": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": false,
                "properties": {
                    "name": {
                        "type": "string"
                    },
                    "age": {
                        "type": "integer"
                    }
                }
            }
        }
    }
}
See https://spacetelescope.github.io/understanding-json-schema/ for a good overview
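For example, validating an input document against that schema with the jsonschema package might look like this (the sample payload is made up):

from jsonschema import ValidationError, validate

# The schema shown above, written as a Python dict.
schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "additionalProperties": False,
    "required": ["error", "data"],
    "properties": {
        "error": {"type": "boolean"},
        "data": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": False,
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
            },
        },
    },
}

# The "data" array may hold any number of objects; the schema's
# "items" clause validates each one.
payload = {"error": False, "data": [{"name": "Alice", "age": 30}]}

try:
    validate(instance=payload, schema=schema)
except ValidationError as exc:
    print("Invalid input:", exc.message)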
Take a look at the documentation of Python's JSON API. I believe json.tool is what you're looking for; there are also a couple of other ways to validate JSON using that API.
