PYMONGO - How do I use the query $in operator with MongoIDs? - python

So I am trying to use the $in operator in Pymongo where I want to search with a bunch of MongoIDs.
First I have this query to find an array of MongoIDs:
findUsers = db.users.find_one({'_id':user_id},{'_id':0, 'f':1})
If I print the findUsers['f'] it looks like this:
[ObjectId('53b2dc0b24c4310292e6def5'), ObjectId('53b6dbb654a7820416a12767')]
These ObjectIds are user ids, and what I want to do is find all users in the users collection whose _id is in this array. So my thought was this:
foundUsers = db.users.find({'_id':{'$in':findUsers['f']}})
However when I print the foundUsers the outcome is this:
<pymongo.cursor.Cursor object at 0x10d972c50>
which is not what I normally get when I print a query out :(
What am I doing wrong here?
Many thanks.
Also, just for your reference, I have run the query in the mongo shell and it works as expected:
db.users.find({_id: {$in:[ObjectId('53b2dc0b24c4310292e6def5'), ObjectId('53b6dbb654a7820416a12767')]}})

You are running into the difference between find_one() and find() in PyMongo (findOne() and find() in the mongo shell). find_one() returns a single document, whereas find() returns a MongoDB cursor, and you normally have to iterate over the cursor to see the results. Your query appears to work in the mongo shell because the shell automatically iterates a cursor that is not assigned to a variable and prints up to the first 20 documents:
Cursors
In the mongo shell, the primary method for the read operation is the
db.collection.find() method. This method queries a collection and
returns a cursor to the returning documents.
To access the documents, you need to iterate the cursor. However, in
the mongo shell, if the returned cursor is not assigned to a variable
using the var keyword, then the cursor is automatically iterated up to
20 times [1] to print up to the first 20 documents in the results.
http://docs.mongodb.org/manual/core/cursors/
The pymongo manual page on iterating over cursors would probably be a good place to start:
http://api.mongodb.org/python/current/api/pymongo/cursor.html
but here's a piece of code that should illustrate the basics for you. After your call to find() run this:
for doc in foundUsers:
    print(doc)
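To see why printing the result of find() shows a Cursor object rather than documents, here is a minimal stand-in written in plain Python. ToyCursor and toy_find are hypothetical names invented for this sketch and are not part of PyMongo; they only mimic the lazy behaviour described above, with a simplified $in-style match on _id.

```python
class ToyCursor:
    """Hypothetical stand-in for a lazy cursor (not PyMongo's Cursor class)."""

    def __init__(self, docs, wanted_ids):
        self._docs = docs
        self._wanted = set(wanted_ids)
        self.executed = False  # flips to True once iteration actually starts

    def __iter__(self):
        # Matching happens only while the caller iterates.
        self.executed = True
        for doc in self._docs:
            if doc["_id"] in self._wanted:
                yield doc


def toy_find(docs, wanted_ids):
    """Return a cursor without touching the data yet."""
    return ToyCursor(docs, wanted_ids)


users = [
    {"_id": 1, "name": "ann"},
    {"_id": 2, "name": "bob"},
    {"_id": 3, "name": "cy"},
]
cursor = toy_find(users, [1, 3])
ran_before = cursor.executed  # nothing has been matched at this point
found = list(cursor)          # iterating (or calling list()) yields the documents
```

With a real PyMongo cursor the same idea applies: a for loop or list(foundUsers) gives you the documents, while print(foundUsers) only shows the cursor object.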

Related

Return all strings from column in a table. [MySQL; pymysql]

Is there a way to return all strings from a specific column in a MySQL database?
Note: I want to save those strings in a list, I'm also using pymysql.
Those are the steps you need to follow (in words as well since you do not provide any code either):
Establish a connection with the database
Create a cursor
With the cursor you created, execute a query of the form "SELECT column_that_you_want FROM table_you_want"
The cursor will now hold the results.
You can add the results to a list, for example with a loop. Typically each result row will be a one-item tuple.
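The steps above can be sketched roughly as follows. To keep the example runnable without a MySQL server, it uses the standard library's sqlite3 module as a stand-in; with pymysql you would obtain the connection from pymysql.connect(...) using your own host and credentials, and the rest follows the same DB-API pattern. The fetch_column helper and the people table are hypothetical names made up for this illustration.

```python
import sqlite3


def fetch_column(conn, table, column):
    """Return every value of one column as a flat list.

    `table` and `column` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(f"SELECT {column} FROM {table}")
        # Each row comes back as a one-item tuple; keep its single element.
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


# Stand-in database (assumption: sqlite3 instead of a real MySQL server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [("ann",), ("bob",)])
names = fetch_column(conn, "people", "name")
conn.close()
```

The list comprehension is what unwraps the one-item tuples into plain strings.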

How to use SQLalchemy in_ with a list object?

I was trying to query a database based on some preselected items and ran into a weird situation. I started by preselecting some values that I would like to use as a filter in a query on one of the tables in the database:
MX_noaa_numbers = list(Events_df[Events_df['flareclass'].str.contains('M|X')].noaanumber.unique())
Which produces a list such as:
[11583,11611,11771,11777,11778,11865,12253,11967,11968,...,12673]
But when I tried to obtain the results using:
session.query(ActiveRegion).filter(sql.or_(ActiveRegion.noaa_number1.in_(MX_noaa_numbers),
                                           ActiveRegion.noaa_number2.in_(MX_noaa_numbers),
                                           ActiveRegion.noaa_number3.in_(MX_noaa_numbers))).all()
it returns an empty list. However, if I print MX_noaa_numbers and paste the output into the in_() calls in place of the variable name (MX_noaa_numbers), I get the results I expect. Am I missing something, or have I actually run into some weird error?
Thanks!

Get list of query results in Peewee

I'm considering switching from SQLAlchemy to peewee but have a fundamental question, as I'm not able to find an example of this. I want to execute a query that returns a list of the matched objects. What works is get, which returns a single record:
Topping.select().where(Topping.id==jalapenos.id).get()
What I want to get is a list of results for which all examples indicate that I should iterate. Is there a way to get a list of results from:
Topping.select(Topping).where(Topping.stock > 0)
A peewee query is lazily executed. It returns an iterator which must be accessed before the query will execute, either by iterating over the records or calling the execute method directly.
To force the query to execute immediately:
results = Topping.select().execute()
To convert query results to a list:
query = Topping.select().where(Topping.stock > 0)
toppings = list(query)
# OR
toppings = [t for t in query]
Note that you can greatly simplify your query for retrieving a single entity with:
Topping.get(Topping.id==jalapenos.id)

Slow MongoDB/pymongo query

I am submitting a pretty simple query to MongoDB (version 2.6) using the pymongo library for python:
query = {"type": "prime"}
logging.info("Querying the DB")
docs = usaspending.get_records_from_db(query)
logging.info("Done querying. Sorting the results")
docs.sort("timestamp", pymongo.ASCENDING)
logging.info("Done sorting the results, getting count")
count = docs.count(True)
logging.info("Done counting: %s records", count)
pprint(docs[0])
raise Exception("End the script right here")
The get_records_from_db() function is quite simple:
def get_records_from_db(query=None):
    return db.raws.find(query, batch_size=50)
Note that I will actually need to work with all the documents, not just docs[0]. I am just trying to get docs[0] as an example.
When I run this query the output I get is:
2015-01-28 10:11:05,945 Querying the DB
2015-01-28 10:11:05,946 Done querying. Sorting the results
2015-01-28 10:11:05,946 Done sorting the results, getting count
2015-01-28 10:11:06,617 Done counting: 559952 records
However, I never get back docs[0]. I have indexes on {"timestamp": 1} and {"type": 1}, and queries seem to work reasonably well (as the count is returned quite fast), but I am not sure why I never get back the actual document (the docs are quite small [under 50K]).
PyMongo does no actual work on the server when you execute these lines:
query = {"type": "prime"}
docs = usaspending.get_records_from_db(query)
docs.sort("timestamp", pymongo.ASCENDING)
At this point "docs" is just a PyMongo Cursor, but it has not executed the query on the server. If you run "count" on the Cursor, then PyMongo does a "count" command on the server and returns the result, but the Cursor itself still hasn't been executed.
However, when you run this:
docs[0]
Then in order to get the first result, PyMongo runs the query on the server. The query is filtered on "type" and sorted by "timestamp", so try this on the mongo shell to see what's wrong with the query:
> db.collection.find({type: "prime"}).sort({timestamp: 1}).limit(1).explain()
If you see a very large "nscanned" or "nscannedObjects", that's the problem. You probably need a compound index on type and timestamp (order matters):
> db.collection.createIndex({type: 1, timestamp: 1})
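To illustrate why a compound index on type and then timestamp helps here, the toy model below treats an index as a sorted list of key tuples in plain Python. This is only a rough approximation of how MongoDB's indexes behave, and the sample data and variable names are invented for the sketch.

```python
import bisect

# Hypothetical sample documents as (type, timestamp) pairs.
docs = [("sub", 5), ("prime", 9), ("grant", 4), ("prime", 2), ("sub", 1), ("prime", 7)]

# Toy index on {type: 1} alone: entries are grouped by type, but within a
# type they are in no particular timestamp order (sorted() is stable, so
# ties keep their insertion order).
type_only = sorted(docs, key=lambda d: d[0])
matches = [d for d in type_only if d[0] == "prime"]
# Every match has to be collected and sorted before the earliest one is known.
first_via_sort = sorted(matches, key=lambda d: d[1])[0]

# Toy compound index on {type: 1, timestamp: 1}: entries are sorted by type
# and then by timestamp, so the earliest "prime" entry sits at the start of
# the "prime" range and one binary search finds it.
type_then_ts = sorted(docs)
start = bisect.bisect_left(type_then_ts, ("prime",))
first_via_index = type_then_ts[start]
```

Both approaches find the same document, but the second never has to sort the matches. In PyMongo, the shell command above corresponds to db.collection.create_index([("type", pymongo.ASCENDING), ("timestamp", pymongo.ASCENDING)]).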
See my article on compound indexes.
The reason you never get back the actual documents is because Mongo batches together these commands into one query, so you would look at it this way:
find the records then sort the records and then count the records.
You need to build two totally separate queries:
find the records, sort the records, then give me the records
count the records
If you chain them, Mongo will chain them and think that they are one command.

How to use fn Object in Peewee ORM

I'm using Python's Peewee ORM to work with a MySQL database. Peewee supplies an object called "fn" that allows you to make certain types of calls to the database. One of those calls I want to make is the following:
Blocks.select(Blocks, fn.Count(Blocks.height))
Where Blocks is a table in my database, which has a column named height. This syntax is taken straight from Peewee's documentation, namely
User.select(
    User, fn.Count(Tweet.id))
located here http://peewee.readthedocs.org/en/latest/peewee/querying.html. Note that I also have the following lines at the top of my python file
import peewee
from peewee import *
from peewee import fn
Yet when I run this code, it doesn't work, and it spits out this
<class '__main__.Blocks'> SELECT t1.`height`, t1.`hash`, t1.`time`, t1.`confirmations`, t1.`size`, t1.`version`, t1.`merkleRoot`, t1.`numTX`, t1.`nonce`, t1.`bits`, t1.`difficulty`, t1.`chainwork`, t1.`previousBlockHash`, t1.`nextBlockHash`, Count(t1.`height`) FROM `blocks` AS t1 []
So this is really just printing out the column names that are returned by the select query.
What peewee code do I have to write to return the count of the number of rows in a table? I regret using peewee because it makes what should be simple queries impossibly hard to find the right syntax for.
Peewee lazily evaluates queries, so you need to coerce it to a list or iterate through it in order to retrieve results, e.g.
query = User.select(User, fn.Count(Tweet.id).alias('num_tweets'))
for user in query:
    print(user.username, user.num_tweets)
users = list(query)
