PyMongo Collection Object Not Callable - python

I'm trying to create a Reddit scraper that takes the first 100 posts from the Reddit front page and stores them in MongoDB. I keep getting the error:
TypeError: 'Collection' object is not callable. If you meant to call the 'insert_one' method on a 'Collection' object it is failing because no such method exists.
Here is my code:
import os
import sys
import time

import praw
import pymongo

def main():
    fpid = os.fork()
    if fpid != 0:
        # Running as daemon now. PID is fpid
        sys.exit(0)
    user_agent = ("Python Scraper by djames v0.1")
    r = praw.Reddit(user_agent=user_agent)  # Reddit API requires a user agent
    conn = pymongo.MongoClient()
    db = conn.reddit
    threads = db.threads
    while True:  # runs forever; each pass repeats every 30 seconds
        frontpage_pull = r.get_front_page(limit=100)  # get first 100 posts from reddit.com
        for posts in frontpage_pull:  # repeats for each of the 100 posts pulled
            data = {}
            data['title'] = posts.title
            data['text'] = posts.selftext
            threads.insert_one(data)
        time.sleep(30)

if __name__ == "__main__":
    main()

insert_one() was not added to pymongo until version 3.0. If you try calling it on a version before that, you will get the error you are seeing.
To check your version of pymongo, open up a Python interpreter and enter:
import pymongo
pymongo.version
The legacy way of inserting documents with pymongo is just with Collection.insert(). So in your case you can change your insert line to:
threads.insert(data)
For more info, see pymongo 2.8 documentation
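If the script has to run against both driver generations, the insert call can be dispatched on the version tuple. A minimal sketch, using a stand-in collection object instead of a live server; choose_insert and FakeCollection are hypothetical helpers, not part of PyMongo:

```python
def choose_insert(collection, doc, version_tuple):
    # insert_one() exists only from pymongo 3.0 on; older drivers
    # expose the legacy insert() instead.
    if version_tuple >= (3, 0):
        return collection.insert_one(doc)
    return collection.insert(doc)

class FakeCollection:
    """Stand-in that reports which insert method was called."""
    def insert_one(self, doc):
        return "insert_one"
    def insert(self, doc):
        return "insert"

col = FakeCollection()
print(choose_insert(col, {"title": "t"}, (3, 6)))  # insert_one
print(choose_insert(col, {"title": "t"}, (2, 8)))  # insert
```

In real code the installed driver's version is available as pymongo.version_tuple, so the check needs no string parsing.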

Related

API Calls bombs in loop when object not found

My code errors out when the key "#odata.nextLink" is not found in the JSON response. I thought the while loop was supposed to account for this? I apologize if this is rudimentary, but this is my first Python project, so I don't know much yet.
Also, for what it's worth, the API results are quite limited; there is no "total pages" value I can extract.
# Import Python ODBC module
import pyodbc
import requests
import json
import sys

cnxn = pyodbc.connect(driver="{ODBC Driver 17 for SQL Server}", server="theplacewherethedatais", database="oneofthosedbthings", uid="u", pwd="pw")
cursor = cnxn.cursor()
storedProc = "exec GetGroupsAndCompanies"
for irow in cursor.execute(storedProc):
    strObj = str(irow[0])
    strGrp = str(irow[1])
    print(strObj + " " + strGrp)
    response = requests.get(irow[2], timeout=300000, auth=('u', 'pw')).json()
    data = response["value"]
    while response["#odata.nextLink"]:
        response = requests.get(response["#odata.nextLink"], timeout=300000, auth=('u', 'pw')).json()
        data.extend(response["value"])
cnxn.commit()
cnxn.close()
You can use the in keyword to test if a key is present:
while "#odata.nextLink" in response:
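With the membership test in place, the paging logic can be sketched on its own; get_page stands in for the requests.get(...).json() call, and the names below are hypothetical:

```python
def fetch_all_pages(get_page, first_url):
    # OData marks the last page by omitting "#odata.nextLink",
    # so test for the key's presence instead of indexing into it.
    response = get_page(first_url)
    data = list(response["value"])
    while "#odata.nextLink" in response:
        response = get_page(response["#odata.nextLink"])
        data.extend(response["value"])
    return data

# Two fake pages: the first links to the second, the second is last.
pages = {
    "page1": {"value": [1, 2], "#odata.nextLink": "page2"},
    "page2": {"value": [3]},
}
print(fetch_all_pages(pages.get, "page1"))  # [1, 2, 3]
```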

How to delete a collection automatically in MongoDB with PyMongo? (Django doesn't delete MongoDB collection)

I have a Django app which creates collections in MongoDB automatically. But when I tried to integrate delete functionality, the collections it targets are not deleted. Collections that are automatically created are edited successfully. This method is called from another file, with all parameters.
An interesting thing to note: when I manually tried to delete via the Python shell, it worked. I want to delete the collections which are not required anymore.
import pymongo
from .databaseconnection import retrndb  # credentials from another file; all admin rights are given

mydb = retrndb()

class Delete():
    def DeleteData(postid, name):
        PostID = postid
        tbl = name + 'Database'
        liketbl = PostID + 'Likes'
        likecol = mydb[liketbl]
        pcol = mydb[tbl]
        col = mydb['fpost']
        post = {"post_id": PostID}
        ppost = {"PostID": PostID}
        result1 = mydb.commentcol.drop()  # this doesn't work
        result2 = mydb.likecol.drop()  # this doesn't work
        print(result1, '\n', result2)  # returns None for both
        try:
            col.delete_one(post)  # this works
            pcol.delete_one(ppost)  # this works
            return False
        except Exception as e:
            return e
Any solutions? I have been trying to solve this for a week.
Should I change the database engine, since Django doesn't support NoSQL natively? Although I have written whole custom scripts that do CRUD using PyMongo.
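One likely culprit: mydb.commentcol and mydb.likecol are attribute lookups, so PyMongo resolves them to collections literally named "commentcol" and "likecol", not to the collections held in the likecol and pcol variables; and Collection.drop() returns None either way, which matches the printed output. The name-resolution difference can be sketched with a stand-in database object (FakeDatabase is hypothetical; a real pymongo Database resolves names the same way):

```python
class FakeDatabase:
    # pymongo's Database turns both attribute and item access into
    # "the collection with this name"; here we just return the name.
    def __getattr__(self, name):
        return name
    def __getitem__(self, name):
        return name

mydb = FakeDatabase()
liketbl = "42Likes"
print(mydb.likecol)   # likecol  (the literal attribute name)
print(mydb[liketbl])  # 42Likes  (the value held in the variable)
```

So in DeleteData the drops would presumably need to go through the objects already built, i.e. likecol.drop() and pcol.drop().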

TypeError when importing json data with pymongo

I am trying to import JSON data from a link containing valid JSON into MongoDB.
When I run the script I get the following error:
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
What am I missing here or doing wrong?
import pymongo
import urllib.parse
import requests

replay_url = "http://live.ksmobile.net/live/getreplayvideos?"
userid = 769630584166547456
url2 = replay_url + urllib.parse.urlencode({'userid': userid}) + '&page_size=1000'
print(f"Replay url: {url2}")
raw_replay_data = requests.get(url2).json()

uri = 'mongodb://testuser:password@ds245687.mlab.com:45687/liveme'
client = pymongo.MongoClient(uri)
db = client.get_default_database()
replays = db['replays']
replays.insert_many(raw_replay_data)
client.close()
I saw that you are getting the video information data for 22 videos. The documents you want are nested under data.video_info, so you can use:
    replays.insert_many(raw_replay_data['data']['video_info'])
to save them.
You can also make one field the _id for each MongoDB document. Use the following lines before insert_many:
    for i in raw_replay_data['data']['video_info']:
        i['_id'] = i['vid']
This makes the 'vid' field your '_id'. Just make sure that 'vid' is unique for all videos.
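Both steps together, sketched against a fake payload with the same shape as the API response; the field names data, video_info, and vid come from the answer above, while prepare_replays is a hypothetical helper:

```python
def prepare_replays(raw_replay_data):
    # insert_many() needs a list of mappings; the documents sit under
    # data.video_info, and "vid" doubles as a unique _id.
    videos = raw_replay_data['data']['video_info']
    for video in videos:
        video['_id'] = video['vid']
    return videos

fake_payload = {'data': {'video_info': [{'vid': 'a1'}, {'vid': 'b2'}]}}
docs = prepare_replays(fake_payload)
print([d['_id'] for d in docs])  # ['a1', 'b2']
```

The returned list can then be passed straight to replays.insert_many(docs).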

multiprocessing python pymongo

I'm trying to do the following: grab some information off a page and then insert it into MongoDB. There is a list of pages, and I want to use multiprocessing because these pages can take time to load. Once the webdriver returns the result, I want to insert it into the db. The problem I'm facing is that I'm only getting 1/4 of the results I'm expecting in the db, so I imagine the way I'm managing the results and the inserting isn't working. I was hoping someone could show me where I've gone wrong. The following is an example of the code:
from multiprocessing.dummy import Pool
from multiprocessing import cpu_count
from selenium import webdriver
import timeit
from pymongo import MongoClient

def mp_worker(urls):
    driver = webdriver.Chrome(chromedriver, chrome_options=options)
    url = "http://website" + urls
    driver.get(url)
    return what_you_want
    driver.quit()  # do I do this here, close or quit?

def mp_handler():
    urls = ["14360705", "4584061", "13788961", "6877217", "13194596", "13400479", "9868014", "8524704", "16394198", "16315464"]
    client = MongoClient()
    db = client.test
    collection = db['test-collection']
    p = Pool(cpu_count() * 2)
    for result in p.imap(mp_worker, urls):
        db.restaurants.update(result, {"upsert": "True"})

if __name__ == '__main__':
    start = timeit.default_timer()
    mp_handler()
    stop = timeit.default_timer()
    print(stop - start)
This syntax is incorrect:
db.restaurants.update(result,{"upsert":"True"})
You likely want:
db.restaurants.insert(result)
Or:
db.restaurants.update(filter, result, upsert=True)
Where "filter" is a MongoDB query (expressed as a Python dict) that uniquely matches the document you want to update or create.
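With the pymongo 3 API that pattern becomes update_one(). A sketch using a recording stub instead of a live collection, and assuming each scraped result dict carries some unique field, here called "url" (both upsert_result and the field name are hypothetical):

```python
def upsert_result(collection, result):
    # upsert=True inserts the document when the filter matches nothing,
    # otherwise it updates the existing document in place.
    collection.update_one(
        {"url": result["url"]},   # filter that uniquely identifies the doc
        {"$set": result},         # fields to write
        upsert=True,
    )

class FakeCollection:
    """Records the arguments update_one was called with."""
    def update_one(self, filter, update, upsert=False):
        self.last_call = (filter, update, upsert)

col = FakeCollection()
upsert_result(col, {"url": "14360705", "name": "cafe"})
print(col.last_call[2])  # True
```

Running the worker pool against a real collection would use this in place of the update(result, {"upsert": "True"}) line, which passes upsert as part of the update document rather than as a keyword.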

Regarding memory consumption in python with urllib2 module

PS: I have a similar question with Requests HTTP library here.
I am using Python v2.7 on Windows 7. I am using the urllib2 module. I have two code snippets. One file is named myServer.py; the server class in it has two methods, getName(self, code) and getValue(self).
The other script, testServer.py, simply calls the methods from the server class to retrieve the values and prints them. The server class retrieves the values from a server on my local network, so unfortunately I can't give you access for testing the code.
Problem: When I execute my testServer.py file, I can see in Task Manager that the memory consumption keeps increasing. Why is it increasing, and how do I avoid it? If I comment out the following line
    print serverObj.getName(1234)
in testServer.py, then there is no increase in memory consumption.
I am sure that the problem is with getName(self, code) in the server class, but unfortunately I couldn't figure out what the problem is.
Code: Please find the code snippets below:
# This is the myServer.py file
import urllib2
import json
import random

class server():
    def __init__(self):
        url1 = 'https://10.0.0.1/'
        username = 'user'
        password = 'passw0rd'
        passwrdmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        passwrdmgr.add_password(None, url1, username, password)
        authhandler = urllib2.HTTPBasicAuthHandler(passwrdmgr)
        opener = urllib2.build_opener(authhandler)
        urllib2.install_opener(opener)

    def getName(self, code):
        code = str(code)
        url = 'https://10.0.0.1/' + code
        response = urllib2.urlopen(url)
        data = response.read()
        name = str(data).strip()
        return name

    def getValue(self):
        value = random.randrange(0, 11)
        return value
The following is the testServer.py snippet:
from myServer import server
import time

serverObj = server()
while True:
    time.sleep(1)
    print serverObj.getName(1234)
    print serverObj.getValue()
Thank you for your time!
This question is quite similar to my other question, so I think the answer is also quite similar. It can be found here: https://stackoverflow.com/a/23172330/2382792
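For what it's worth, one classic source of steady memory growth with urllib2 is response objects that are never closed after read(). A hedged sketch of the getName idea with contextlib.closing, using a stand-in response object so it runs without the server (FakeResponse, get_name, and the payload are hypothetical):

```python
from contextlib import closing

class FakeResponse:
    """Stand-in for the file-like object urllib2.urlopen() returns."""
    def __init__(self, payload):
        self.payload = payload
        self.closed = False
    def read(self):
        return self.payload
    def close(self):
        self.closed = True

def get_name(open_url, code):
    # closing() guarantees close() runs even if read() raises,
    # releasing the response's socket and buffers promptly.
    with closing(open_url('https://10.0.0.1/' + str(code))) as response:
        return response.read().strip()

resp = FakeResponse('  Alice  ')
print(get_name(lambda url: resp, 1234))  # Alice
print(resp.closed)                       # True
```

In the real class, open_url would simply be urllib2.urlopen.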
