PyMongo: is it possible to insert one document with insert_many()? - python

This may sound like a dumb question, but I was working with pymongo and wrote the following function to insert documents. I was wondering if the insert_many method would also work for single-record inserts, so that I wouldn't need a separate function when inserting just one record.
This is my function:
def insert_records(list_of_documents: list, collection):
    i = collection.insert_many(list_of_documents)
    print(len(i.inserted_ids), " documents inserted!")
When I insert one it throws an error:
post1 = {"_id":0, "user_name":"Jack"}
insert_records(list(post1), stackoverflow)
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
I know I can use insert_one() for this purpose, I was just wondering if it was possible to do everything with insert_many(), as the original insert() method is deprecated. Thanks!

Since your post1 is a dict, list(post1) gives you a list of its keys:
>>> list(post1)
['_id', 'user_name']
Use instead:
>>> [post1]
[{'_id': 0, 'user_name': 'Jack'}]
So:
insert_records([post1], stackoverflow)
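To make the difference concrete, here is a minimal sketch (pure Python, no MongoDB connection needed) of what each expression produces:

```python
post1 = {"_id": 0, "user_name": "Jack"}

# list(dict) iterates the dict's KEYS, producing a list of strings --
# insert_many then chokes on the strings, hence the TypeError:
keys = list(post1)   # ['_id', 'user_name']

# Wrapping the dict in a one-element list keeps the document intact,
# which is exactly what insert_many expects:
docs = [post1]       # [{'_id': 0, 'user_name': 'Jack'}]
```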

Related

Why can't I call __dict__ on object in list that happens to be within another object within a list within a dictionary?

Here's my setup: dictD contains a key users paired with value = list of UserObjects. Each UserObject has an attribute username plus two arrays, threads and comments.
I was able to convert dictD's array of user objects into a dictionary style with this call:
dictD["users"] = [user.__dict__ for user in dictD["users"]]
If I dump out dictD, here's the relevant part before I try to do my manipulation:
{
    'users': [
        {
            'username': Redditor(user_name='$$$$$$$$$$'),
            'threads': [
                <__main__.redditThread instance at 0x7f05db28b320>
            ],
            'comments': [
                <__main__.comment instance at 0x7f05db278e60>
            ]
        },
        {
            'username': Redditor(user_name='##########e\ gone'),
            'threads': [
                <__main__.redditThread instance at 0x7f05db2a4a70>
            ],
            'comments': [
                <__main__.comment instance at 0x7f05db298e18>
            ]
        }
    ]
}
As you can see the comments contain comment objects and the threads list contains thread objects. So I'd like to do the same call for them that I did for the users array. But when I try to do this:
for user in dictD["users"]:
    user.threads = [thread.__dict__ for thread in user.threads]
    user.comments = [comment.__dict__ for comment in user.comments]
I run into this error:
AttributeError: 'dict' object has no attribute 'threads'
I also tried
users = dictD["users"]
for user in users...
but this triggers the same error message. How can I turn objects in lists into dictionary form when those objects' lists are themselves held within objects within lists within a dictionary?
Incidentally, I am doing all this so I can insert these objects into MongoDB, so if there is an easier way to serialize a complex object, please let me into the secret. Thank you.
Promoting my comment to an answer since it seems reasonable and nobody else is posting: it looks at a glance like you're confusing Python with JavaScript. A dict with a key 'threads' is not an object you can reference with .threads, only with ["threads"]; i.e. user.threads should be user["threads"]. A dict usually only has the standard mapping attributes (see https://docs.python.org/2/library/stdtypes.html#typesmapping or https://docs.python.org/3/library/stdtypes.html#mapping-types-dict for Python 3). The problem isn't that you're calling __dict__ on an object; it's that, later in that same line of code, you're trying to get an attribute that doesn't exist.
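A runnable sketch of that fix, with tiny stand-in classes (Thread and Comment here are hypothetical placeholders for the question's redditThread and comment classes):

```python
class Thread:            # stand-in for the question's redditThread class
    def __init__(self, title):
        self.title = title

class Comment:           # stand-in for the question's comment class
    def __init__(self, body):
        self.body = body

dictD = {"users": [{"username": "alice",
                    "threads": [Thread("t1")],
                    "comments": [Comment("c1")]}]}

# Each `user` is already a dict, so use subscription, not attributes:
for user in dictD["users"]:
    user["threads"] = [t.__dict__ for t in user["threads"]]
    user["comments"] = [c.__dict__ for c in user["comments"]]
```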
If you want to recreate complex objects from MongoDB rather than just nested dicts and lists, that is basically a process of deserialization. You can either handle it manually, or use some sort of object-mapping library to do it for you (e.g. something like Mongoobject might work, though I've not tested it myself).

NDB key vs get_by_id

Just to check whether I'm mistaken:
get() operations use the NDB cache, so this (Chapter is an ndb.Model subclass):
# Get the entity
chapter_key = ndb.Key('Book', long(bookId), 'Chapter', long(chapterId))
chapter = chapter_key.get()
can use the NDB cache on the second and subsequent reads of the entity.
But what if I do this?
Chapter.get_by_id(long(id), parent=ndb.Key('Book', long(bookId)))
Is this also managed by NDB, or is it a standard db operation that doesn't use the cache?
Model.get_by_id will use the context cache and memcache in exactly the same way as Key.get.
Greg's answer is correct; I just wanted to mention that instead of putting keys together manually, you can use a urlsafe string and pass it between functions.
Assuming you have a key:
page_key = ndb.Key('Book', long(bookId), 'Chapter', long(chapterId), 'Page', long(pageId))
Create urlsafe from it:
page_url_string = page_key.urlsafe()
To retrieve the model, simply use:
page = ndb.Key(urlsafe=page_url_string).get()
Consider this if you're using models with several levels of ancestry. There should be no case where building keys manually is required; code gets messy really quickly when you need to pass extra variables between functions.

SQLAlchemy - build query filter dynamically from dict

So I have a dict passed from a web page. I want to build the query dynamically based on the dict. I know I can do:
session.query(myClass).filter_by(**web_dict)
However, that only works when the values are an exact match. I need to do 'like' filtering. My best attempt using the __dict__ attribute:
for k, v in web_dict.items():
    q = session.query(myClass).filter(myClass.__dict__[k].like('%%%s%%' % v))
Not sure how to build the query from there. Any help would be awesome.
You're on the right track!
First thing you want to do different is access attributes using getattr, not __dict__; getattr will always do the right thing, even when (as may be the case for more convoluted models) a mapped attribute isn't a column property.
The other missing piece is that you can specify filter() more than once, and just replace the old query object with the result of that method call. So basically:
q = session.query(myClass)
for attr, value in web_dict.items():
    q = q.filter(getattr(myClass, attr).like("%%%s%%" % value))
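Put together as a runnable sketch (the User model, its columns, and the sample rows are hypothetical; run here against an in-memory SQLite database):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class User(Base):          # hypothetical model for demonstration
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    city = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([User(name="alice", city="amsterdam"),
                 User(name="bob", city="boston")])
session.commit()

web_dict = {"name": "ali", "city": "dam"}   # e.g. parsed from a request

# Reassign the query with each filter(); the filters are AND-ed together.
q = session.query(User)
for attr, value in web_dict.items():
    q = q.filter(getattr(User, attr).like("%%%s%%" % value))

results = q.all()
```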

SQLAlchemy: selecting which columns of an object in a query

Is it possible to control which columns are queried in the query method of SQLAlchemy, while still returning instances of the object you are querying (albeit partially populated)?
Or is it necessary for SQLAlchemy to perform a SELECT * to map to an object?
(I do know that querying individual columns is available, but it does not map the result to an object, only to a component of a named tuple).
For example, if the User object has the attributes userid, name, password, and bio, but you want the query to only fill in userid and name for the objects it returns:
# hypothetical syntax, of course:
for u in session.query(User.columns[userid, name]).all():
    print u
would print:
<User(1, 'bob', None, None)>
<User(2, 'joe', None, None)>
...
Is this possible; if so, how?
A simple solution that worked for me was:
users = session.query(User.userid, User.name)
for user in users:
    print user
would print:
<User(1, 'bob')>
<User(2, 'joe')>
...
you can query for individual columns, which returns named tuples that do in fact act pretty much like your mapped object if you're just passing off to a template or something:
http://www.sqlalchemy.org/docs/orm/tutorial.html#querying
or you can establish various columns on the mapped class as "deferred", either configurationally or using options:
http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#deferred-column-loading
there's an old ticket in trac for something called "defer_everything_but()"; if someone felt like providing tests and such, there's no reason that couldn't be a feature add. here's a quick version:
from sqlalchemy.orm import class_mapper, defer

def defer_everything_but(entity, cols):
    m = class_mapper(entity)
    return [defer(k) for k in
            set(p.key for p in m.iterate_properties
                if hasattr(p, 'columns')).difference(cols)]

s = Session()
print s.query(A).options(*defer_everything_but(A, ["q", "p"]))
defer() should really accept multiples, added ticket #2250 for that (edit: as noted in the comment this is in 0.9 as load_only())
Latest doc for load_only is here
http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#load-only-cols
If you're looking at a way to control that at model definition level, use deferred
http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#deferred
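A minimal sketch of load_only in action, assuming a hypothetical User model like the one in the question (run against an in-memory SQLite database):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session, load_only

Base = declarative_base()

class User(Base):          # hypothetical model matching the question
    __tablename__ = "users"
    userid = Column(Integer, primary_key=True)
    name = Column(String)
    password = Column(String)
    bio = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add(User(userid=1, name="bob", password="x", bio="y"))
session.commit()
session.expunge_all()   # start from a clean identity map

# Only userid and name are included in the SELECT; the deferred
# columns load lazily if (and only if) they are accessed later.
u = session.query(User).options(load_only(User.userid, User.name)).first()
```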

Splitting tuples in Python - best practice?

I have a method in my Python code that returns a tuple - a row from a SQL query. Let's say it has three fields: (jobId, label, username)
For ease of passing it around between functions, I've been passing the entire tuple as a variable called 'job'. Eventually, however, I want to get at the bits, so I've been using code like this:
(jobId, label, username) = job
I've realised, however, that this is a maintenance nightmare, because now I can never add new fields to the result set without breaking all of my existing code. How should I have written this?
Here are my two best guesses:
(jobId, label, username) = (job[0], job[1], job[2])
...but that doesn't scale nicely when you have 15 to 20 fields
or to convert the results from the SQL query to a dictionary straight away and pass that around (I don't have control over the fact that it starts life as a tuple, that's fixed for me)
#Staale
There is a better way:
job = dict(zip(keys, values))
I'd say that a dictionary is definitely the best way to do it. It's easily extensible, allows you to give each value a sensible name, and Python has a lot of built-in language features for using and manipulating dictionaries. If you need to add more fields later, all you need to change is the code that converts the tuple to a dictionary and the code that actually makes use of the new values.
For example:
job={}
job['jobid'], job['label'], job['username']=<querycode>
This is an old question, but...
I'd suggest using a named tuple in this situation: collections.namedtuple
This is the part, in particular, that you'd find useful:
Subclassing is not useful for adding new, stored fields. Instead, simply create a new named tuple type from the _fields attribute.
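A short sketch of the namedtuple approach (the field values are made up for illustration):

```python
from collections import namedtuple

Job = namedtuple("Job", ["jobId", "label", "username"])
job = Job(1, "backup", "alice")

# Tuple unpacking still works, but named access is robust to growth:
print(job.username)          # alice

# Adding a field later: build a new type from _fields, per the docs.
JobV2 = namedtuple("JobV2", Job._fields + ("priority",))
job2 = JobV2(*job, 5)
```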
Perhaps this is overkill for your case, but I would be tempted to create a "Job" class that takes the tuple as its constructor argument and has respective properties on it. I'd then pass instances of this class around instead.
I would use a dictionary. You can convert the tuple to a dictionary this way:
values = <querycode>
keys = ["jobid", "label", "username"]
job = dict([[keys[i], values[i]] for i in xrange(len(values))])
This will first create an array [["jobid", val1], ["label", val2], ["username", val3]] and then convert that to a dictionary. If the result order or count changes, you just need to change the list of keys to match the new result.
PS: still fresh on Python myself, so there might be better ways of doing this.
An old question, but since no one mentioned it I'll add this from the Python Cookbook:
Recipe 81252: Using dtuple for Flexible Query Result Access
This recipe is specifically designed for dealing with database results, and the dtuple solution allows you to access the results by name OR index number. This avoids having to access everything by subscript which is very difficult to maintain, as noted in your question.
With a tuple it will always be a hassle to add or change fields. You're right that a dictionary will be much better.
If you want something with slightly friendlier syntax, you might want to take a look at the answers to this question about a simple 'struct-like' object. That way you can pass around an object, say job, and access its fields even more easily than with a tuple or dict:
job.jobId, job.username = jobId, username
If you're using the MySQLdb package, you can set up your cursor objects to return dicts instead of tuples.
import MySQLdb, MySQLdb.cursors
conn = MySQLdb.connect(..., cursorclass=MySQLdb.cursors.DictCursor)
cur = conn.cursor() # a DictCursor
cur2 = conn.cursor(cursorclass=MySQLdb.cursors.Cursor) # a "normal" tuple cursor
How about this:
class TypedTuple:
    def __init__(self, fieldlist, items):
        self.fieldlist = fieldlist
        self.items = items

    def __getattr__(self, field):
        return self.items[self.fieldlist.index(field)]
You could then do:
j = TypedTuple(["jobid", "label", "username"], job)
print j.jobid
It should be easy to swap self.fieldlist.index(field) with a dictionary lookup later on... just edit your __init__ method! Something like Staale does.
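For instance, the index() call can be swapped for a precomputed dictionary like this (a sketch; the sample field list and row are hypothetical):

```python
class TypedTuple:
    def __init__(self, fieldlist, items):
        # Precompute a name -> position map instead of calling
        # list.index() on every attribute access (O(1) vs O(n)).
        self._index = {name: i for i, name in enumerate(fieldlist)}
        self.items = items

    def __getattr__(self, field):
        try:
            return self.items[self._index[field]]
        except KeyError:
            # Unknown field names should surface as attribute errors.
            raise AttributeError(field)

j = TypedTuple(["jobid", "label", "username"], (42, "backup", "alice"))
```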
