Simplest ElasticSearch query

Simplest ElasticSearch query - python

I have the following document:
obj = {
"ID": 4,
"GUID": 4,
"Type": "Movie",
"Type": "Margin Call",
}
Is there a simple "all-type" query that can be done, for example something like:
>>> es.search(index="avails", term="margin")
Or -
>>> es.search(index="avails", term="Movie")
Or -
>>> es.search(index="avails", term="4")
Or, do I need to use the specialized ElasticSearch syntax differently for each of these searches? Basically, I'm just looking to approximate results and make sure object-creation is working working digging into the query language.

Here is the simplest way I could figure out to do a search in ElasticSearch:
def search(self, index=None, **kwargs):
"""
This will use the & syntax with all the kwargs provided.
For example: es.search(name="margin", id=4) ==> ?name=margin&id=4.
Use as a simplified search to gut check very basic things.
"""
index = index or self.index
qs = urllib.urlencode(kwargs)
try:
res = self.es.search(index=index, q=qs)
except TransportError, e:
print '>>>', e
res = None
return res

Related

Pymongo - Search and leave duplicates fast

I have some data in my MongoDB - a lot of data.
And I want to add new data with pymongo, but dont append data which already exists in DB based on phone_number. I'm using this code, but its only a loop in loop, so its slow as hell...
connected_db = self.main_mongo_terema # Connect to DB
data_to_insert = []
try:
collection = connected_db['terema_data'] # Connect to Collection
except:
self.log("Cant connect to DB")
try:
for dicts_ in data: # list of dicts
for key, val in dicts_.items():
if key == "phone_number":
match = collection.find({}, {"phone_number": val })
for x in match:
if x:
continue
else:
data_to_insert.append(dicts_)
except Exception as e:
self.log(f"Loop problem - {str(e)}")
try:
collection.insert_many(data_to_insert)
# Empty list
except TypeError:
pass
Something like this:
dict_ = [{'phone_number':0123, 'col1':'abc'}, {'phone_number':'456', 'col1':'def'}]
When dic['phone_number']: 'value' from dict_ not in DB add dic to DB, else do nothing

You are using three nested loops, so in simple words time complexity will be O(n^3), which will be like hell, if there is more data.
You can speed up the insert operation with bulk_write(). $setOnInsert operator will be helpful here.
from pymongo import UpdateOne
requests = []
try:
for dict_ in data:
requests.append(UpdateOne({'phone_number': dict_['phone_number']},
{
'$setOnInsert': {
'field1': dict_['field1'],
'field2': dict_['field2'],
# Rest of the fields.
}
},
upsert=True))
db.collection.bulk_write(requests)
except:
self.log(f"Loop problem - {str(e)}")

Create an unique index so that you do not have to check whether the number exists or not.
Create a index by the following command:
collection.createIndex( { "phone_number": 1 }, { unique: true } )
PS: You only need to do this once. After that all the phone_number(s) in the database will always be unique.
After that you can insert the data object directly. You do not need to check if the value is unique or not, mongodb will handle that for you.
collection.insert_many(data)

Python search replace with multiple Json objects

I wasn't sure how to search for this but I am trying to make a script that dynamically launches programs. I will have a couple of JSON files and I want to be able to do a search replace sort of thing.
So I'll setup an example:
config.json
{
"global_vars": {
"BASEDIR": "/app",
"CONFIG_DIR": "{BASEDIR}/config",
"LOG_DIR": "{BASEDIR}/log",
"CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"
}
}
Then process.json
{
"name": "Dummy_Process",
"binary": "java",
"executable": "DummyProcess-0.1.0.jar",
"launch_args": "-Dspring.config.location={CONFIG_DIR}/application.yml -Dlogging.config={CONFIG_DIR}/logback-spring.xml -jar {executable}",
"startup_log": "{LOG_DIR}/startup_{name}.out"
}
Now I want to be able to load both of these JSON objects and be able to use the values there to update. So like "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive" will become CONFIG_ARCHIVE_DIR": "/app/config/archive"
Does anyone know a good way to do this recursively because I'm running into issues when I'm trying to use something like CONFIG_DIR which requires BASEDIR first.
I have this function that loads all the data:
#Recursive function, loops and loads all values into data
def _load_data(data,obj):
for i in obj.keys():
if isinstance(obj[i],str):
data[i]=obj[i]
if isinstance(obj[i],dict):
data=_load_data(data,obj[i])
return data
Then I have this function:
def _update_data(data,data_str=""):
if not data_str:
data_str=json.dumps(data)
for i in data.keys():
if isinstance(data[i],str):
data_str=data_str.replace("{"+i+"}",data[i])
if isinstance(data[i],dict):
data=_update_data(data,data_str)
return json.loads(data_str)
So this works for one level but I don't know if this is the best way to do it. It stops working when I hit a case like the CONFIG_DIR because it would need to loop over the data multiple times. First it needs to update the BASEDIR then once more to update CONFIG_DIR. suggestion welcome.
The end goal of this script is to create a start/stop/status script to manage all of our binaries. They all use different binaries to start and I want one Processes file for multiple servers. Each process will have a servers array to tell the start/stop script what to run on given server. Maybe there's something like this already out there so if there is, please point me in the direction.
I will be running on Linux and prefer to use Python. I want something smart and easy for someone else to pickup and use/modify.

I made something that works with the example files you provided. Note that I didn't handle multiple keys or non-dictionaries in the data. This function accepts a list of the dictionaries obtained after JSON parsing your input files. It uses the fact that re.sub can accept a function for the replacement value and calls that function with each match. I am sure there are plenty of improvements that could be made to this, but it should get you started at least.
def make_config(configs):
replacements = {}
def find_defs(config):
# Find leaf nodes of the dictionary.
defs = {}
for k, v in config.items():
if isinstance(v, dict):
# Nested dictionary so recurse.
defs.update(find_defs(v))
else:
defs[k] = v
return defs
for config in configs:
replacements.update(find_defs(config))
def make_replacement(m):
# Construct the replacement string.
name = m.group(0).strip('{}')
if name in replacements:
# Replace replacement strings in the replacement string.
new = re.sub('\{[^}]+\}', make_replacement, replacements[name])
# Cache result
replacements[name] = new
return new
raise Exception('Replacement string for {} not found'.format(name))
finalconfig = {}
for name, value in replacements.items():
finalconfig[name] = re.sub('\{[^}]+\}', make_replacement, value)
return finalconfig
With this input:
[
{
"global_vars": {
"BASEDIR": "/app",
"CONFIG_DIR": "{BASEDIR}/config",
"LOG_DIR": "{BASEDIR}/log",
"CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"
}
},
{
"name": "Dummy_Process",
"binary": "java",
"executable": "DummyProcess-0.1.0.jar",
"launch_args": "-Dspring.config.location={CONFIG_DIR}/application.yml -Dlogging.config={CONFIG_DIR}/logback-spring.xml -jar {executable}",
"startup_log": "{LOG_DIR}/startup_{name}.out"
}
]
It gives this output:
{
'BASEDIR': '/app',
'CONFIG_ARCHIVE_DIR': '/app/config/archive',
'CONFIG_DIR': '/app/config',
'LOG_DIR': '/app/log',
'binary': 'java',
'executable': 'DummyProcess-0.1.0.jar',
'launch_args': '-Dspring.config.location=/app/config/application.yml -Dlogging.config=/app/config/logback-spring.xml -jar DummyProcess-0.1.0.jar',
'name': 'Dummy_Process',
'startup_log': '/app/log/startup_Dummy_Process.out'
}

As an alternative to the answer by #FamousJameous and if you don't mind changing to ini format, you can also use the python built-in configparser which already has support to expand variables.

I implemented a solution with a class (Config) with a couple of functions:
_load: simply convert from JSON to a Python object;
_extract_params: loop over the document (output of _load) and add them to a class object (self.params);
_loop: loop over the object returned from _extract_params and, if the values contains any {param}, call the _transform method;
_transform: replace the {param} in the values with the correct values, if there is any '{' in the value linked to the param that needs to be replaced, call again the function
I hope I was clear enough, here is the code:
import json
import re
config = """{
"global_vars": {
"BASEDIR": "/app",
"CONFIG_DIR": "{BASEDIR}/config",
"LOG_DIR": "{BASEDIR}/log",
"CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"
}
}"""
process = """{
"name": "Dummy_Process",
"binary": "java",
"executable": "DummyProcess-0.1.0.jar",
"launch_args": "-Dspring.config.location={CONFIG_DIR}/application.yml -Dlogging.config={CONFIG_DIR}/logback-spring.xml -jar {executable}",
"startup_log": "{LOG_DIR}/startup_{name}.out"
}
"""
class Config(object):
def __init__(self, documents):
self.documents = documents
self.params = {}
self.output = {}
# Loads JSON to dictionary
def _load(self, document):
obj = json.loads(document)
return obj
# Extracts the config parameters in a dictionary
def _extract_params(self, document):
for k, v in document.items():
if isinstance(v, dict):
# Recursion for inner dictionaries
self._extract_params(v)
else:
# if not a dict set params[k] as v
self.params[k] = v
return self.params
# Loop on the configs dictionary
def _loop(self, params):
for key, value in params.items():
# if there is any parameter inside the value
if len(re.findall(r'{([^}]*)\}', value)) > 0:
findings = re.findall(r'{([^}]*)\}', value)
# call the transform function
self._transform(params, key, findings)
return self.output
# Replace all the findings with the correct value
def _transform(self, object, key, findings):
# Iterate over the found params
for finding in findings:
# if { -> recursion to set all the needed values right
if '{' in object[finding]:
self._transform(object, finding, re.findall(r'{([^}]*)\}', object[finding]))
# Do de actual replace
object[key] = object[key].replace('{'+finding+'}', object[finding])
self.output = object
return self.output
# Entry point
def process_document(self):
params = {}
# _load the documents and extract the params
for document in self.documents:
params.update(self._extract_params(self._load(document)))
# _loop over the params
return self._loop(params)
# return self.output
if __name__ == '__main__':
config = Config([config, process])
print(config.process_document())
I am sure there are many other better ways to reach your goal, but I still hope this can bu useful to you.

How to decode dataTables Editor form in python flask?

I have a flask application which is receiving a request from dataTables Editor. Upon receipt at the server, request.form looks like (e.g.)
ImmutableMultiDict([('data[59282][gender]', u'M'), ('data[59282][hometown]', u''),
('data[59282][disposition]', u''), ('data[59282][id]', u'59282'),
('data[59282][resultname]', u'Joe Doe'), ('data[59282][confirm]', 'true'),
('data[59282][age]', u'27'), ('data[59282][place]', u'3'), ('action', u'remove'),
('data[59282][runnerid]', u''), ('data[59282][time]', u'29:49'),
('data[59282][club]', u'')])
I am thinking to use something similar to this really ugly code to decode it. Is there a better way?
from collections import defaultdict
# request.form comes in multidict [('data[id][field]',value), ...]
# so we need to exec this string to turn into python data structure
data = defaultdict(lambda: {}) # default is empty dict
# need to define text for each field to be received in data[id][field]
age = 'age'
club = 'club'
confirm = 'confirm'
disposition = 'disposition'
gender = 'gender'
hometown = 'hometown'
id = 'id'
place = 'place'
resultname = 'resultname'
runnerid = 'runnerid'
time = 'time'
# fill in data[id][field] = value
for formkey in request.form.keys():
exec '{} = {}'.format(d,repr(request.form[formkey]))

This question has an accepted answer and is a bit old but since the DataTable module seems being pretty popular among jQuery community still, I believe this approach may be useful for someone else. I've just wrote a simple parsing function based on regular expression and dpath module, though it appears not to be quite reliable module. The snippet may be not very straightforward due to an exception-relied fragment, but it was only one way to prevent dpath from trying to resolve strings as integer indices I found.
import re, dpath.util
rxsKey = r'(?P<key>[^\W\[\]]+)'
rxsEntry = r'(?P<primaryKey>[^\W]+)(?P<secondaryKeys>(\[' \
+ rxsKey \
+ r'\])*)\W*'
rxKey = re.compile(rxsKey)
rxEntry = re.compile(rxsEntry)
def form2dict( frmDct ):
res = {}
for k, v in frmDct.iteritems():
m = rxEntry.match( k )
if not m: continue
mdct = m.groupdict()
if not 'secondaryKeys' in mdct.keys():
res[mdct['primaryKey']] = v
else:
fullPath = [mdct['primaryKey']]
for sk in re.finditer( rxKey, mdct['secondaryKeys'] ):
k = sk.groupdict()['key']
try:
dpath.util.get(res, fullPath)
except KeyError:
dpath.util.new(res, fullPath, [] if k.isdigit() else {})
fullPath.append(int(k) if k.isdigit() else k)
dpath.util.new(res, fullPath, v)
return res
The practical usage is based on native flask request.form.to_dict() method:
# ... somewhere in a view code
pars = form2dict(request.form.to_dict())
The output structure includes both, dictionary and lists, as one could expect. E.g.:
# A little test:
rs = jQDT_form2dict( {
'columns[2][search][regex]' : False,
'columns[2][search][value]' : None,
'columns[2][search][regex]' : False,
} )
generates:
{
"columns": [
null,
null,
{
"search": {
"regex": false,
"value": null
}
}
]
}
Update: to handle lists as dictionaries (in more efficient way) one may simplify this snippet with following block at else part of if clause:
# ...
else:
fullPathStr = mdct['primaryKey']
for sk in re.finditer( rxKey, mdct['secondaryKeys'] ):
fullPathStr += '/' + sk.groupdict()['key']
dpath.util.new(res, fullPathStr, v)

I decided on a way that is more secure than using exec:
from collections import defaultdict
def get_request_data(form):
'''
return dict list with data from request.form
:param form: MultiDict from `request.form`
:rtype: {id1: {field1:val1, ...}, ...} [fieldn and valn are strings]
'''
# request.form comes in multidict [('data[id][field]',value), ...]
# fill in id field automatically
data = defaultdict(lambda: {})
# fill in data[id][field] = value
for formkey in form.keys():
if formkey == 'action': continue
datapart,idpart,fieldpart = formkey.split('[')
if datapart != 'data': raise ParameterError, "invalid input in request: {}".format(formkey)
idvalue = int(idpart[0:-1])
fieldname = fieldpart[0:-1]
data[idvalue][fieldname] = form[formkey]
# return decoded result
return data

Partial updates via PATCH: how to parse JSON data for SQL updates?

I am implementing 'PATCH' on the server-side for partial updates to my resources.
Assuming I do not expose my SQL database schema in JSON requests/responses, i.e. there exists a separate mapping between keys in JSON and columns of a table, how do I best figure out which column(s) to update in SQL given the JSON of a partial update?
For example, suppose my table has 3 columns: col_a, col_b, and col_c, and the mapping between JSON keys to table columns is: a -> col_a, b -> col_b, c -> col_c. Given JSON-PATCH data:
[
{"op": "replace", "path": "/b", "value": "some_new_value"}
]
What is the best way to programmatically apply this partial update to col_b of the table corresponding to my resource?
Of course I can hardcode these mappings in a keys_to_columns dict somewhere, and upon each request with some patch_data, I can do sth like:
mapped_updates = {keys_to_columns[p['path'].split('/')[-1]]: p['value'] for p in patch_data}
then use mapped_updates to construct the SQL statement for DB update. If the above throws a KeyError I know the request data is invalid and can throw it away. And I will need to do this for every table/resource I have.
I wonder if there is a better way.

This is similar to what you're thinking of doing, but instead of creating maps, you can create classes for each table instead. For example:
class Table(object):
"""Parent class of all tables"""
def get_columns(self, **kwargs):
return {getattr(self, k): v for k, v in kwargs.iteritems()}
class MyTable(Table):
"""table MyTable"""
# columns mapping
a = "col_a"
b = "col_b"
tbl = MyTable()
tbl.get_columns(a="value a", b="value b")
# the above returns {"col_a": "value a", "col_b": "value b"}
# similarly:
tbl.get_columns(**{p['path'].split('/')[-1]: p['value'] for p in patch_data})
This is just something basic to get inspired from, these classes can be extended to do much more.

patch_json = [
{"op": "replace", "path": "/b", "value": "some_new_value"},
{"op": "replace", "path": "/a", "value": "some_new_value2"}
]
def fix_key(item):
item['path'] = item['path'].replace('/', 'col_')
return item
print map(fix_key, patch_json)

Clean solution for missing values in python list comprehensions

Is there any way to check every element of a list comprehension in a clean and elegant way?
For example, if I have some db result which may or may not have a 'loc' attribute, is there any way to have the following code run without crashing?
db_objs = SQL("query")
top_scores = [{"name":obj.name, "score":obj.score, "latitude":obj.loc.lat, "longitude":obj.loc.lon} for obj in db_objs]
If there is any way to fill these fields in either as None or the empty string or anything, that would be much very nice. Python tends to be a magical thing, so if any of you have sage advice it would be much appreciated.

Clean and unified solution:
from operator import attrgetter as _attrgetter
def attrgetter(attrname, default=None):
getter = _attrgetter(attrname)
def wrapped(obj):
try:
return getter(obj)
except AttributeError:
return default
return wrapped
GETTER_MAP = {
"name":attrgetter('name'),
"score":attrgetter('score'),
"latitude":attrgetter('loc.lat'),
"longitude":attrgetter('loc.lon'),
}
def getdict(obj):
return dict(((k,v(obj)) for (k,v) in GETTER_MAP.items()))
if __name__ == "__main__":
db_objs = SQL("query")
top_scores = [getdict(obj) for obj in db_objs]
print top_scores

Try this:
top_scores = [{"name":obj.name,
"score":obj.score,
"latitude": obj.loc.lat if hasattr(obj.loc, lat) else 0
"longitude":obj.loc.lon if hasattr(obj.loc, lon) else 0}
for obj in db_objs]
Or, in your query set a default value.

It's not pretty, but getattr() should work:
top_scores = [
{
"name": obj.name,
"score": obj.score,
"latitude": getattr(getattr(obj, "loc", None), "lat", None),
"longitude": getattr(getattr(obj, "loc", None), "lon", None),
}
for obj in db_objs
]
This will set the dict item with key "latitude" to obj.loc.lat (and so on) if it exists; if it doesn't (and even if obj.loc doesn't exist), it'll be set to None.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Simplest ElasticSearch query - python

Related

Pymongo - Search and leave duplicates fast

Python search replace with multiple Json objects

How to decode dataTables Editor form in python flask?

Partial updates via PATCH: how to parse JSON data for SQL updates?

Clean solution for missing values in python list comprehensions

Categories

Resources