str to dict in Python, but maintain the sequence of JSON attributes

I've tried ast.literal_eval and json.loads, but neither of them maintains the sequence of the JSON attributes when a string is provided. Please see the following example.
String before passing it to json.loads:
{
"type": "array",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
},
"strList": {
"type": "array",
"items": {
"type": "string"
}
},
"strMap": {
"type": "object"
},
"p2": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
},
"p3": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
},
"p4": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
}
}
}
}
}
}
}
},
"p3": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
},
"p4": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
}
}
}
}
}
},
"b": {
"type": "boolean",
"required": true
}
},
"classnames": {
"rootNode": {
"classname": "com.agent.Person"
},
"p2": {
"classname": "com.agent.Person2",
"p3": {
"classname": "com.agent.Person3",
"p4": {
"classname": "com.agent.Person4"
}
}
},
"p3": {
"classname": "com.agent.Person3",
"p4": {
"classname": "com.agent.Person4"
}
}
}
}
Resulting dictionary after passing the string to json.loads:
{
'classnames': {
'p2': {
'classname': 'com.agent.Person2',
'p3': {
'classname': 'com.agent.Person3',
'p4': {
'classname': 'com.agent.Person4'
}
}
},
'p3': {
'classname': 'com.agent.Person3',
'p4': {
'classname': 'com.agent.Person4'
}
},
'rootNode': {
'classname': 'com.agent.Person'
}
},
'properties': {
'b': {
'required': True,
'type': 'boolean'
},
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
},
'p2': {
'items': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
},
'p3': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
},
'p4': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
}
},
'type': 'object'
}
},
'type': 'object'
}
},
'type': 'object'
},
'type': 'array'
},
'p3': {
'items': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
},
'p4': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
}
},
'type': 'object'
}
},
'type': 'object'
},
'type': 'array'
},
'strList': {
'items': {
'type': 'string'
},
'type': 'array'
},
'strMap': {
'type': 'object'
}
},
'type': 'array'
}
Can anyone suggest an alternative in Python that keeps the attributes in their original sequence when converting the string into a Python dictionary?

As tobias_k has said, Python dictionaries are unordered (this holds before Python 3.7; from 3.7 onward a plain dict preserves insertion order), so on older versions you'll lose any order information as soon as you load your data into one.
You can, however, load your JSON string into an OrderedDict:
from collections import OrderedDict
import json
json.loads(your_json_string, object_pairs_hook=OrderedDict)
This method is mentioned in the json module documentation.
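As a minimal sketch of the object_pairs_hook approach, the snippet below loads a shortened stand-in for the schema in the question and checks the key order at two nesting levels. The schema_text value and the variable names are illustrative assumptions, not the asker's actual data.

```python
import json
from collections import OrderedDict

# Hypothetical, shortened stand-in for the schema string in the question
schema_text = (
    '{"type": "array", '
    '"properties": {"name": {"type": "string"}, "i": {"type": "integer"}}, '
    '"classnames": {"rootNode": {"classname": "com.agent.Person"}}}'
)

# object_pairs_hook receives each JSON object's (key, value) pairs in
# document order, so nested objects become OrderedDicts as well
schema = json.loads(schema_text, object_pairs_hook=OrderedDict)

top_level_keys = list(schema.keys())
property_keys = list(schema["properties"].keys())
print(top_level_keys)
print(property_keys)
```

The hook is applied to every JSON object in the document, not only the outermost one, which is why the nested "properties" keys keep their order too.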

Related

Avro python array serialization

It looks like the following data doesn't fit my Avro schema:
[{'page_title': 'Antoine Meillet', 'page_id': 3, 'contributors': [['contribution', {'revisions': 2, 'username': 'Curry'}], ['contribution', {'revisions': 1, 'username': 'script de conversion'}], ['contribution', {'revisions': 1, 'username': 'Francis'}]]}]
Schema:
{
"namespace": "org.wikipedia.fr",
"name": "meta-history",
"type": "record",
"fields": [
{
"name": "page_title",
"type": "string"
},
{
"name": "page_id",
"type": "int"
},
{
"name": "contributors",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "contribution",
"fields": [
{
"name": "revisions",
"type": "int"
},
{
"name": "username",
"type": "string"
}
]
}
}
}
]
}
I got "ValueError: no value and no default for revisions".
I'm not sure what I'm doing wrong here...
OK, it turns out there is no need to include the record name in each array item; every item is just the plain record:
[{'page_title': 'Antoine Meillet', 'page_id': 3, 'contributors': [{'revisions': 2, 'username': 'Curry'}, {'revisions': 1, 'username': 'script de conversion'}, {'revisions': 1, 'username': 'Francis'}]}]
Got it right.
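If the data has already been produced in the name-tagged form, a small conversion step can reshape it before serialization. This is only a sketch: strip_record_names is a hypothetical helper written for this example, not part of the avro library, and it assumes every tagged item is a two-element list whose second element is the record.

```python
def strip_record_names(items):
    """Turn ['contribution', {...}] pairs into the plain {...} records."""
    records = []
    for item in items:
        # Assumed shape of a tagged item: [record_name, record_dict]
        if isinstance(item, (list, tuple)) and len(item) == 2 and isinstance(item[1], dict):
            records.append(item[1])
        else:
            # Anything else is passed through unchanged
            records.append(item)
    return records


page = {
    'page_title': 'Antoine Meillet',
    'page_id': 3,
    'contributors': [
        ['contribution', {'revisions': 2, 'username': 'Curry'}],
        ['contribution', {'revisions': 1, 'username': 'script de conversion'}],
        ['contribution', {'revisions': 1, 'username': 'Francis'}],
    ],
}
page['contributors'] = strip_record_names(page['contributors'])
print(page['contributors'])
```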

Why am I getting a null value instead of the aggregated response?

I'm trying to perform a min aggregation inside a nested aggregation in Elasticsearch, but I'm still getting null values.
GET /my_index/_search
{
"query": {
"match": {
"FirstName": "Cheryl"
}
},
"aggs": {
"art": {
"nested": {
"path": "art"
},
"aggs": {
"min_price": {
"min": {
"field": "art.Income"
}
}
}
}
}
}
Mappings:
{
"mappings": {
"properties": {
"art": {
"type": "nested",
"properties": {
"FirstName": {
"type": "text"
},
"Price": {
"type": "integer"
}
}
}
}
}
}

searching only digits in a mixed field (elasticsearch)

I have a field containing phone numbers in the format XXX-XXX-XXXX or XXXXXXXXXX (it's a merged table).
I want to be able to search for XXXXXXXXXX and get results from both formats.
I tried using the decimal_digit filter, but it didn't work.
Here are the settings I have tried:
mapping = {
'mappings': {
DOC_TYPE: {
'properties': {
'first_name': {
'type': 'text',
'analyzer': 'word_splitter'
},
'last_name': {
'type': 'text',
'analyzer': 'word_splitter'
},
'email': {
'type': 'text',
'analyzer': 'email'
},
'gender': {
'type': 'text'
},
'ip_address': {
'type': 'text'
},
'language': {
'type': 'text'
},
'phone': {
'type': 'text',
'analyzer': 'digits'
},
'id': {
'type': 'long'
}
}
}
},
'settings': {
'analysis': {
'analyzer': {
'my_analyzer': {
'type': 'whitespace'
},
'better': {
'type': 'standard'
},
'word_splitter': {
'type': 'custom',
'tokenizer': 'nGram',
'min_gram': 5,
'max_gram': 5,
'filter': [
'lowercase'
]
},
'email': {
'type': 'custom',
'tokenizer': 'uax_url_email'
},
'digits': {
'type': 'custom',
'tokenizer': 'whitespace',
'filter': [
'decimal_digit'
]
}
}
}
}
}
Any ideas?
Use a char_filter to remove the hyphens before indexing. As a simple example:
Set up the custom analyzer and apply it to the phone field.
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"phone_analyzer": {
"tokenizer": "standard",
"char_filter": [
"phone_char_filter"
]
}
},
"char_filter": {
"phone_char_filter": {
"type": "mapping",
"mappings": [
"- => "
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"phone": {
"type": "text",
"analyzer": "phone_analyzer"
}
}
}
}
}
Add some docs
POST my_index/_doc
{"phone": "123-456-7890"}
POST my_index/_doc
{"phone": "2345678901"}
Search in xxx-xxx-xxxx format
GET my_index/_search
{
"query": {
"match": {
"phone": "123-456-7890"
}
}
}
Search in xxxxxxxxxx format
GET my_index/_search
{
"query": {
"match": {
"phone": "1234567890"
}
}
}
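Both searches find the first document because a match query analyzes the search text with the field's analyzer, so the hyphens are stripped from the query as well as from the indexed value. The following Python sketch only imitates that normalization outside Elasticsearch; normalize_phone is a hypothetical helper that mirrors the "- => " mapping, not an Elasticsearch API.

```python
def normalize_phone(raw):
    # Mirrors the "- => " mapping: every hyphen is replaced with nothing
    return raw.replace("-", "")


# Values as they would look after the char_filter has run at index time
indexed = [normalize_phone(p) for p in ["123-456-7890", "2345678901"]]

# The same normalization is applied to the search text
matches_hyphenated = normalize_phone("123-456-7890") in indexed
matches_plain = normalize_phone("1234567890") in indexed
print(indexed, matches_hyphenated, matches_plain)
```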

Generating a dynamic nested JSON using for loop in python

I am a newbie in Python and am having difficulty generating nested JSON using a for loop. The length of the dictionary is only known at runtime, and I want to generate the nested JSON based on it. For example, here the dictionary has length 4, but the length may vary. Here is my data_dict dictionary:
data_dict = {"PHOTO_1" : {"key1" : "PHOTO_2", "key2" : "PHOTO_3", "key3" : "PHOTO_4"}, "PHOTO_2" : {"key1" : "PHOTO_1", "key2" : "PHOTO_3"},"PHOTO_3" : {"key1" : "PHOTO_2"},"PHOTO_4" : {"key1" : "PHOTO_1", "key2" : "PHOTO_2", "key3" : "PHOTO_3"}}
Expected result :
{
"Requests": [
{
"photo": {
"photoId": {
"id": "PHOTO_1"
},
"connections": {
"target": {
"id": "PHOTO_2"
}
}
},
"updateData": "connections"
},
{
"photo": {
"photoId": {
"id": "PHOTO_1"
},
"connections": {
"target": {
"id": "PHOTO_3"
}
}
},
"updateData": "connections"
},
{
"photo": {
"photoId": {
"id": "PHOTO_1"
},
"connections": {
"target": {
"id": "PHOTO_4"
}
}
},
"updateData": "connections"
},
{
"photo": {
"photoId": {
"id": "PHOTO_2"
},
"connections": {
"target": {
"id": "PHOTO_1"
}
}
},
"updateData": "connections"
},
{
"photo": {
"photoId": {
"id": "PHOTO_2"
},
"connections": {
"target": {
"id": "PHOTO_3"
}
}
},
"updateData": "connections"
},
{
"photo": {
"photoId": {
"id": "PHOTO_3"
},
"connections": {
"target": {
"id": "PHOTO_2"
}
}
},
"updateData": "connections"
},
{
"photo": {
"photoId": {
"id": "PHOTO_4"
},
"connections": {
"target": {
"id": "PHOTO_1"
}
}
},
"updateData": "connections"
},
{
"photo": {
"photoId": {
"id": "PHOTO_4"
},
"connections": {
"target": {
"id": "PHOTO_2"
}
}
},
"updateData": "connections"
},
{
"photo": {
"photoId": {
"id": "PHOTO_4"
},
"connections": {
"target": {
"id": "PHOTO_3"
}
}
},
"updateData": "connections"
}
]
}
Please help. I can't figure out how to solve this. Please don't mark it as a duplicate; I have already checked the existing answers, and my JSON structure is quite different.
A solution using the itertools.permutations() function:
import itertools, json
data_dict = {"first_photo" : "PHOTO_1", "second_photo" : "PHOTO_2", "Thrid" : "PHOTO_3"}
result = {"Requests": []}
# Every ordered pair of distinct photos becomes one request
for pair in sorted(itertools.permutations(data_dict.values(), 2)):
    result["Requests"].append({"photo": {"photoId": {"id": pair[0]},
        "connections": {"target": {"id": pair[1]}}}, "updateData": "connections"})
print(json.dumps(result, indent=4))
An additional approach for the new input dict:
import json
data_dict = {"PHOTO_1" : {"key1" : "PHOTO_2", "key2" : "PHOTO_3", "key3" : "PHOTO_4"}, "PHOTO_2" : {"key1" : "PHOTO_1", "key2" : "PHOTO_3"},"PHOTO_3" : {"key1" : "PHOTO_2"},"PHOTO_4" : {"key1" : "PHOTO_1", "key2" : "PHOTO_2", "key3" : "PHOTO_3"}}
result = {"Requests": []}
# One request per (source photo, target photo) pair, in sorted order
for k, d in sorted(data_dict.items()):
    for v in sorted(d.values()):
        result["Requests"].append({"photo": {"photoId": {"id": k},
            "connections": {"target": {"id": v}}}, "updateData": "connections"})
print(json.dumps(result, indent=4))

ordering json in python mapping object

I am using Elasticsearch, where the query has to be posted as JSON and should be in a standard order, or else the result will be wrong. The problem is that Python is changing the ordering of my JSON. My original JSON query is:
x= {
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*a*"
}
},
"filter": {
"and": {
"filters": [
{
"term": {
"city": "london"
}
},
{
"term": {
"industry.industry_not_analyed": "oil"
}
}
]
}
}
}
},
"facets": {
"industry": {
"terms": {
"field": "industry.industry_not_analyed"
}
},
"city": {
"terms": {
"field": "city.city_not_analyzed"
}
}
}
}
But the resulting Python object is as follows:
{
'query': {
'filtered': {
'filter': {
'and': {
'filters': [
{
'term': {
'city': 'london'
}
},
{
'term': {
'industry.industry_not_analyed': 'oil'
}
}
]
}
},
'query': {
'query_string': {
'query': '*a*'
}
}
}
},
'facets': {
'city': {
'terms': {
'field': 'city.city_not_analyzed'
}
},
'industry': {
'terms': {
'field': 'industry.industry_not_analyed'
}
}
}
}
The result is different from what I need. How do I solve this?
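Use OrderedDict() instead of {}. Note that you can't simply use OrderedDict(query=...) because, before Python 3.6, the keyword arguments are collected into an unordered dict in the background. Use this code instead:
from collections import OrderedDict
x = OrderedDict()
x['query'] = OrderedDict()
...
I suggest implementing a builder for this (Query is a hypothetical class; since and is a reserved word in Python, the method needs a name such as and_):
x = Query().filtered().query_string("*a*").and_()....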
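As a rough sketch of the OrderedDict approach, the "query" part of the question's request can be built from lists of (key, value) tuples, which keeps the order on every Python version. The variable names filtered_part and body are assumptions made for this example, and the "facets" section is left out for brevity.

```python
import json
from collections import OrderedDict

# A list of (key, value) tuples fixes the key order explicitly
filtered_part = OrderedDict([
    ("query", OrderedDict([
        ("query_string", OrderedDict([("query", "*a*")])),
    ])),
    ("filter", OrderedDict([
        ("and", OrderedDict([
            ("filters", [
                OrderedDict([("term", OrderedDict([("city", "london")]))]),
                OrderedDict([("term", OrderedDict([
                    ("industry.industry_not_analyed", "oil"),
                ]))]),
            ]),
        ])),
    ])),
])

x = OrderedDict([("query", OrderedDict([("filtered", filtered_part)]))])

# json.dumps writes keys in the dictionary's iteration order
body = json.dumps(x)
print(body)
```

With this construction "query" is serialized before "filter" inside "filtered", matching the order in the original request.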
