Elasticsearch data query to Python object

Problem: I want to pick a field of an index in Elasticsearch and look up all the values stored against it. If I give a key, I should get the value for that key, and if the key exists more than once, each of its values. Even getting just one of the values would work for me.
How I am trying to work through it: I am querying Elasticsearch like this:
r = es.search(index="test", body={'query': {'wildcard': {'task_name': "*"}}})
I thought to load the data into a Python object (a dictionary) to read the key values. However, when I try json.loads(r.json) it gives me an error: AttributeError: 'dict' object has no attribute 'json'.
I even tried json.load(r), but the error remains the same.
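As an aside, es.search() in the Python Elasticsearch client already returns the parsed response as a dict, so no json step is needed; the values live under r['hits']['hits']. A minimal sketch (index and field names taken from the question, and a client version that accepts the body argument is assumed):
from elasticsearch import Elasticsearch

es = Elasticsearch()
r = es.search(index="test", body={'query': {'wildcard': {'task_name': '*'}}})
# r is already a dict: collect every value of task_name across the matching docs
values = [hit['_source']['task_name'] for hit in r['hits']['hits']]
print(values)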

Related

Get data from AsyncIOMotorCursor python

I have inherited code from someone who is no longer at the company, and I want to get the data stored in a Mongo database that it accesses through AsyncIOMotorClient. I have no experience with Motor.
Problem:
I am able to get this:
AsyncIOMotorCursor(<pymongo.cursor.Cursor object at 0x7fbd87b45ca0>)
I know how to iterate over <pymongo.cursor.Cursor object at 0x7fbd87b45ca0>, but when I try to iterate over the above AsyncIOMotorCursor, I get the following error:
TypeError: 'AsyncIOMotorCursor' object is not iterable
Kindly help.
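A minimal sketch of how a Motor cursor is normally consumed (the database and collection names below are made up, since the question doesn't show them): Motor cursors are asynchronous, so they must be iterated with async for inside a coroutine, or drained with await cursor.to_list(...).
import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

async def main():
    client = AsyncIOMotorClient()               # connects to localhost by default
    cursor = client.my_db.my_collection.find()  # hypothetical db/collection names
    async for doc in cursor:                    # a plain `for` raises the TypeError above
        print(doc)
    # alternatively: docs = await cursor.to_list(length=100)

asyncio.run(main())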

ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'dict'

I'm trying to insert a dataframe using:
engine = create_engine('postgresql://pswd:xyz@hostnumb:port/db_name')
dataframe.to_sql('table_name', engine, if_exists='replace')
but one column is a dictionary and I'm unable to insert it; only the column name gets inserted.
I tried changing the type of the column in Postgres from text to a JSON type, but I'm still not able to insert.
I also tried json.dumps(), but I'm still facing the issue, now with the error "dtype: object is not JSON serializable".
Try specifying the dtype. So in your example, you would say:
dataframe.to_sql('table_name', engine, if_exists='replace',
                 dtype={'relevant_column': sqlalchemy.types.JSON})
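A self-contained sketch of the same fix, with made-up column names and placeholder credentials:
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine

# 'payload' stands in for whatever column holds the dicts
df = pd.DataFrame({'name': ['a', 'b'],
                   'payload': [{'x': 1}, {'y': 2}]})
engine = create_engine('postgresql://user:password@host:5432/db_name')
df.to_sql('table_name', engine, if_exists='replace',
          dtype={'payload': sqlalchemy.types.JSON})
Declaring the column as sqlalchemy.types.JSON makes SQLAlchemy serialize each dict to JSON before handing it to the driver, which is exactly what "can't adapt type 'dict'" was complaining about.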

sqlalchemy core integrity error

I'm working on parsing a file and inserting it into a database using SQLAlchemy Core. I had it set up with the ORM originally, but that doesn't meet the speed requirements for the project.
My database has 2 tables: Objects and Attributes. The Objects table has a primary key of obj_id. The primary key for Attributes is composite: attr_name, attr_class, and obj_id, which is also a foreign key from Objects.
The attributes are stored, after parsing the file, in a list of dictionaries, like so:
[
    {'obj_id': obj_id, 'attr_name': name, 'attr_class': attr_class, ...},
    ...
]
The data is inserted by first bulk inserting the objects, then the attributes. The object insert works perfectly. When inserting the attributes, however, I get an integrity error saying I tried to insert a duplicate primary key.
Here is my insert code for attributes:
self.engine.execute(
    Attributes.__table__.insert(),
    [{'obj_id':     attr['obj_id'],
      'attr_name':  attr['attr_name'],
      'attr_class': attr['attr_class'],
      'attr_type':  attr['attr_type'],
      'attr_size':  attr['attr_size']} for attr in attrList])
While trying to work this error out, I printed the id, name, and class of each attribute in the list to a file to find the duplicate key. Nowhere in the list is there actually an identical primary key, so this leads me to believe it is a problem with the structure of my query.
Can anyone figure this out with the info I've given, or give me somewhere to look for more information? I've already checked the documentation pretty thoroughly and couldn't find anything helpful.
Edit:
I also tried executing each insert statement separately, as suggested by someone on sqlalchemy's google group. The results were the same. The code I used:
insert = Attributes.__table__.insert()
for attr in attrList:
    stmt = insert.values({'obj_id': attr['obj_id'], ...})
    self.engine.execute(stmt)
where ... was the rest of the values.
Edit 2:
The IntegrityError is thrown as soon as I try to insert an attribute with the same name/class but a different object id. So for example:
In the format name-class-id:
By iteration 4, I've got:
Attr1-Class1-0
Attr2-Class2-0
Attr3-Class3-0
Attr4-Class4-0
On the next iteration, I try to insert Attr1-Class1-1, which fails.
I found the problem, and it was completely unrelated to the insert code. When storing the data in the list, I was storing an Object instance as obj_id rather than its scalar id, which SQLAlchemy didn't like. Fixing that fixed the insertions.
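A sketch of what that fix looks like (the variable names are illustrative, based on the structure described above):
# Bug: the parsed Object instance itself was stored under 'obj_id'
attr = {'obj_id': obj, 'attr_name': name, 'attr_class': attr_class}
# Fix: store the scalar primary-key value instead
attr = {'obj_id': obj.obj_id, 'attr_name': name, 'attr_class': attr_class}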

Pyorient: how to parse json objects in pyorient?

I have a value in orientdb, which is a JSON object.
Assume the JSON object is:
a = {"abc": 123}
When I send a query using pyorient, it is not able to get this value in the select query, and it hangs.
In the OrientDB console, this JSON object seems to be converted into some other format, like:
a = {abc=123}
I guess it is hanging because of the same issue.
The query from pyorient is:
client.query("select a from <tablename>")
This hangs and doesn't seem to work.
Could you please help with how to parse this JSON object in pyorient?
I used OrientDB's REST API to fetch JSON object fields from the database, since pyorient hangs when asked for a JSON object field.
So fetch the rid of the record you want, and use the REST service to get all the fields; that worked perfectly fine.
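A minimal sketch of that approach, assuming a local OrientDB instance with default credentials (the host, database name, and rid below are placeholders); OrientDB serves single documents at GET /document/<database>/<rid>:
import requests

resp = requests.get('http://localhost:2480/document/mydb/9:1',  # placeholder db and rid
                    auth=('admin', 'admin'))                    # placeholder credentials
record = resp.json()  # the embedded object comes back as real JSON
print(record['a'])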
pyorient gives you output something like:
a = {'abc': '123'}
and json.loads() works with " and not with ', so if what you get back is that string form, you need to do this:
b = str(a)
b = b.replace("'", '"')  # str.replace returns a new string; it does not modify b in place
json_data = json.loads(b)
print(json_data.keys())
I have defined a function to get a vertex, and after you get your vertex you can use a for loop to parse the result. Let's say the vertex "Root" has an attribute "name"; in the for loop after the query executes, we can read the value as res.name to fetch it.
I think recent versions fixed the hanging issue; I am not facing any hanging during query execution.
def get_vertex(vertex):
    result = client.command("select * from " + vertex)
    for res in result:
        print(res.name)

get_vertex("Root")

PyMongo, Graphing

I have several Mongo databases (some populated with collections and documents, some empty), and I am trying to parse through them and build a graph of their contents. I plan to make nodes for each db, each collection, and each key in each collection, and then an edge from each key to its value (so skipping the pages). Here is my code for building the graph:
for db in dbs:
    G.add_node(db)
    for col in c[db].collection_names():
        G.add_node(col)
        G.add_edge(db, col, weight=0.9)
        for page in c[db][col].find():
            if u'_id' in page.viewvalues():
                pprint.pprint(page)
                G.add_node(page[u'_id'])
                G.add_edge(col, page[u'_id'], weight=0.4)
            for key, value in page.items():
                G.add_node(key)
                G.add_edge(col, key, weight=0.1)
                G.add_node(value)
                G.add_edge(key, value)
My problem is that I never pass the if statement if u'_id' in page.viewvalues():. I know I am getting pages: if I print them before the if statement, a few thousand are printed, but the if statement is always false. What have I done wrong in accessing the dictionary returned by the find() query? Thanks.
EDIT:
I should probably also mention that when I do something like
for i in page:
instead of the if statement, it works for a bit and then breaks with TypeError: unhashable type: 'dict'. I figured this happens when it hits an empty page or when find() returns no pages.
This works for me:
import pymongo

c = pymongo.Connection()
dbs = c.database_names()
for db in dbs:
    for col in c[db].collection_names():
        for page in c[db][col].find():
            if '_id' in page:
                for key, value in page.iteritems():
                    print key, value
You always get a dictionary while iterating over a pymongo cursor (which is what find() returns), so you can simply check whether the _id key is in the dictionary; viewvalues() looks at the values, not the keys, which is why your if never matched.
By the way, you can specify which fields to include in the results by passing the fields argument to find().
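For example (pymongo 2.x syntax, to match the answer above; in pymongo 3+ the equivalent is the projection argument, and Connection became MongoClient):
# only pull _id and a hypothetical 'title' field back from the server
for page in c[db][col].find(fields={'_id': 1, 'title': 1}):
    print page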
