How to make functions for my pymongo/twitter script? - python

I'm working on creating scripts using python, mongodb and the pymongo module to fetch certain aspects of the Twitter API and store them in a mongo database. I've written some scripts to do different things: access the search API, access the user_timeline, and more. However, I have been just getting to know all of the tools that I'm working with and it's time for me to go back and make it more efficient. Thus, right now I'm working on adding functions and classes to my scripts. Here is one of my scripts without functions or classes:
#!/usr/local/bin/python
import twitter
import datetime
from datetime import date, timedelta, datetime
import pymongo
from pymongo import Connection
# Twitter handle that we are scraping mentions for
SCREEN_NAME = '#twitterapi'
# Connect to the database
connection = Connection()
db = connection.test
collection = db.twitterapi_mentions # Change the name of this database
t = twitter.Twitter(domain='search.twitter.com')
# Fetch the information from the API
results = []
for i in range(2):
    i+=1
    response = t.search(q=SCREEN_NAME, result_type='recent', rpp=100, page=i)['results']
    results.extend(response)
# Create a document in the database for each item taken from the API
for tweet in results:
    id_str = tweet['id_str']
    twitter_id = tweet['from_user']
    tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
    created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
    date = created_at.date().strftime("%m/%d/%y")
    time = created_at.time().strftime("%H:%M:%S")
    text = tweet['text']
    identifier = {'id' : id_str}
    entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
    collection.update(identifier, entries, upsert = True)
These scripts have been working well for me, but I have to run the same script for multiple twitter handles. For instance I'll copy the same script and change the following two lines:
SCREEN_NAME = '#cocacola'
collection = db.cocacola_mentions
Thus I'm getting mentions for both #twitterapi and #cocacola. I've thought a lot about how I can make this into a function. The biggest problem that I've run into is finding a way to change the name of the collection. For instance, consider this script:
#!/usr/local/bin/python
import twitter
import datetime
from datetime import date, timedelta, datetime
import pymongo
from pymongo import Connection
def getMentions(screen_name):
    # Connect to the database
    connection = Connection()
    db = connection.test
    collection = db.screen_name # Change the name of this database
    t = twitter.Twitter(domain='search.twitter.com')
    # Fetch the information from the API
    results = []
    for i in range(2):
        i+=1
        response = t.search(q=screen_name, result_type='recent', rpp=100, page=i)['results']
        results.extend(response)
    # Create a document in the database for each item taken from the API
    for tweet in results:
        id_str = tweet['id_str']
        twitter_id = tweet['from_user']
        tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
        created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
        date = created_at.date().strftime("%m/%d/%y")
        time = created_at.time().strftime("%H:%M:%S")
        text = tweet['text']
        identifier = {'id' : id_str}
        entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
        collection.update(identifier, entries, upsert = True)
getMentions("#twitterapi")
getMentions("#cocacola")
If I use the above script then all of the data is stored in the collection "screen_name" but I want it to be stored in the screen name that is passed through. Ideally, I want #twitterapi mentions to be in a "twitterapi_mentions" collection and I want #cocacola mentions to be in a "cocacola_mentions" collection. I believe that using the Collection class of pymongo might be the answer and I've read the documentation but can't seem to get it to work. If you have other suggestions of how I should make this script more efficient they would be incredibly appreciated. Otherwise, please excuse any mistakes I've made, as I said, I'm new to this.

Use getattr to retrieve the attribute by string name:
collection = getattr(db, screen_name)

I'd go with:
collection = db[screen_name]
I think it's more straightforward.
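To get the "twitterapi_mentions" style of name the question asks for, the string passed to db[...] can be built from the search term first. This is only a sketch: the helper name mentions_collection_name and the choice to strip a leading '#' or '@' and lowercase the rest are assumptions, not something from the original script.

```python
import re


def mentions_collection_name(query):
    """Turn a search term such as '#twitterapi' into 'twitterapi_mentions'."""
    # Drop a leading '#' or '@', then keep only letters, digits and
    # underscores so the result is a safe collection name.
    base = re.sub(r"[^0-9A-Za-z_]", "", query.lstrip("#@"))
    if not base:
        raise ValueError("query has no usable characters: %r" % query)
    return base.lower() + "_mentions"


print(mentions_collection_name("#twitterapi"))  # twitterapi_mentions
print(mentions_collection_name("#cocacola"))    # cocacola_mentions
```

Inside getMentions the lookup would then read collection = db[mentions_collection_name(screen_name)].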

Related

Trying to check if document with fields exists and if so edit it in pymongo

I'm trying to work a bit with pymongo, and I currently have a database that I need to look inside, and if the document with a specific field exists, then the document should be updated.
First I created an entry by running this a few times:
import pymongo
client = pymongo.MongoClient()
mydb = client["mydb"]
data = {'name': "john"}
mycol = mydb['something']
mycol.insert_one(data)
Which works the way I want it to.
Now, I need to check whether or not an entry exists where name = "john".
I followed this tutorial, which basically just shows this snippet:
db.student.find({name:{$exists:true}})
I tried to implement this, so it now looks like this:
import pymongo
from pymongo import cursor
client = pymongo.MongoClient()
mydb = client["mydb"]
print(mydb.something.find({"name":{"john"}}))
and this just returns <pymongo.cursor.Cursor object at 0x7fbf266239a0>
which I don't really know what to do with.
I also looked at some similar questions here, and found some suggestions for something like this:
print(mydb.values.find({"name" : "john"}).limit(1).explain())
But this just gives me a long JSON-looking string, which by the way doesn't change if I put other things in for "john".
So how do I check whether a document where "name" = "john" exists? And perhaps also then edit the document?
EDIT
I now tried the following solution:
import pymongo
from pymongo import cursor
client = pymongo.MongoClient()
mydb = client["mydb"]
mycol = mydb['something']
name = "john"
print(mycol.find_one({name:{"$exists":True}}))
But it only prints None
Change find() to find_one(), or if you're expecting more than one result, iterate the cursor using a for loop:
print(db.student.find_one({'name':{'$exists': True}}))
or
for student in db.student.find({'name': {'$exists': True}}):
    print(student)
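A note on the EDIT in the question: with name = "john", the dict {name: {"$exists": True}} is really {"john": {"$exists": True}}, so it asks for documents that have a field called "john", which is why None comes back. To match on the value, filter with {"name": "john"}. The sketch below combines the existence check and the edit using update_one; the helper name update_if_exists is made up for illustration, and collection is assumed to be a pymongo Collection such as mycol.

```python
def update_if_exists(collection, name, changes):
    """Apply `changes` to one document whose "name" equals `name`.

    Returns True if a matching document was found, False otherwise.
    `collection` is expected to be a pymongo Collection.
    """
    # Match on the field's value. {"name": {"$exists": True}} would only
    # test that the field is present, whatever its value.
    result = collection.update_one({"name": name}, {"$set": changes})
    # matched_count is the number of documents the filter matched (0 or 1).
    return result.matched_count > 0
```

For example, update_if_exists(mycol, "john", {"age": 30}) would set a hypothetical age field on one "john" document. If only the check is needed, mycol.find_one({"name": "john"}) is not None does the same job.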

How to use django commands to feed a db with an external API?

I'm learning Django and I want to feed my Django DB from the https://pokeapi.co API so I can build an HTML drop-down list with every Pokémon name, kept up to date.
fetchnames.py
import requests as r
def nameslist():
    payload = {'limit':809}
    listpokemons = []
    response = r.get('https://pokeapi.co/api/v2/pokemon', params=payload)
    pokemons = response.json()
    for line in pokemons['results']:
        listpokemons.append(line['name'])
    return listpokemons
### Function that requests the API and returns a list of pokemon names (['bulbasaur', 'ivysaur', ...])
core_app/management/commands/queryapi.py
from core_app.models import TablePokemonNames
from core_app.fetchnames import nameslist
class FetchApi(BaseCommand):
    help = "Update DB with https://pokeapi.co/"
    def add_model_value(self):
        table = TablePokemonNames()
        table.names = nameslist()
        table.save()
core_app/models.py
class TablePokemonNames(models.Model):
    id = models.AutoField(primary_key=True)
    names = models.CharField(max_length=100)
I'm pretty sure that I'm missing a lot, since I'm still learning Django and I'm still confused about how I should use Django commands. I tried to make a Django command with the nameslist() function and nothing happened in the DB. Is there something wrong with using a list to feed a DB?
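One possible direction, offered as a sketch rather than a definitive fix: manage.py only picks up a command module whose class is named Command (subclassing BaseCommand, which also has to be imported) and it runs that class's handle() method, so a class called FetchApi with add_model_value() is never invoked. Storing one row per name also suits a CharField better than assigning the whole list to it. The helper below assumes a hypothetical model field called name (not the names field shown above) and takes the model's manager as a parameter, so handle() could call save_names(TablePokemonNames.objects, nameslist()).

```python
def save_names(manager, names):
    """Create one row per name, skipping names that are already stored.

    `manager` is assumed to be a Django model manager, for example
    TablePokemonNames.objects, for a model with a `name` field. That
    field name is hypothetical.
    """
    created_count = 0
    for name in names:
        # get_or_create returns (object, created); created is True only
        # when a new row was inserted.
        _, created = manager.get_or_create(name=name)
        if created:
            created_count += 1
    return created_count
```

Returning the number of new rows makes it easy for the command to report what it did, and running it again does not duplicate names.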

TypeError when importing json data with pymongo

I am trying to import json data from a link containing valid json data to MongoDB.
When I run the script I get the following error:
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
What am I missing here or doing wrong?
import pymongo
import urllib.parse
import requests
replay_url = "http://live.ksmobile.net/live/getreplayvideos?"
userid = 769630584166547456
url2 = replay_url + urllib.parse.urlencode({'userid': userid}) + '&page_size=1000'
print(f"Replay url: {url2}")
raw_replay_data = requests.get(url2).json()
uri = 'mongodb://testuser:password@ds245687.mlab.com:45687/liveme'
client = pymongo.MongoClient(uri)
db = client.get_default_database()
replays = db['replays']
replays.insert_many(raw_replay_data)
client.close()
The response is a single dict; the video information for the 22 videos sits under raw_replay_data['data']['video_info']. insert_many expects an iterable of documents (dicts), so pass it that list rather than the whole response:
replays.insert_many(raw_replay_data['data']['video_info'])
You can also use one of the fields as the _id of each MongoDB document. Run the following loop before insert_many:
for i in raw_replay_data['data']['video_info']:
    i['_id'] = i['vid']
This makes the 'vid' field your '_id'. Just make sure that 'vid' is unique across all videos.
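The two steps above can be folded into one small helper. This is a sketch under assumptions: the function name with_vid_as_id is made up, and the sample payload is a trimmed, hypothetical stand-in for the real response (only the data / video_info / vid keys come from the answer; title is invented for illustration).

```python
def with_vid_as_id(payload):
    """Return the video documents from `payload`, each with _id set to its vid."""
    videos = payload["data"]["video_info"]
    # Copy each dict so the original response is left untouched.
    return [dict(video, _id=video["vid"]) for video in videos]


# Hypothetical, heavily trimmed response in the shape described above.
sample = {
    "data": {
        "video_info": [
            {"vid": "a1", "title": "first"},
            {"vid": "b2", "title": "second"},
        ]
    }
}
docs = with_vid_as_id(sample)
print([doc["_id"] for doc in docs])  # ['a1', 'b2']
```

With the real response this would be used as replays.insert_many(with_vid_as_id(raw_replay_data)).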

Adding JSON like documents to a new collection Pymongo

Right now I am querying an existing collection within MongoDB for some documents that all have the tag "_t" : "SkeletonJoints". Once I have these documents, I want to insert them into a NEW collection, named after the username (e.g. username_kinectdata), that is created to hold only documents of this type.
So here is my code:
#Import modules needed
import os, pymongo, json
from datetime import datetime
conn = None
db = None
isConnected = False
#try connecting with mongodb server
try:
    conn = pymongo.MongoClient()
    db = conn.emmersiv #connect to the emmersiv db
    print db.collection_names() #print the collection of files within emmersiv db
    print "Connected to the MongoDB server"
    isConnected = True
except:
    print "Connection Failed..."
#get all collections in a list and then remove non user data
allUsers = db.collection_names()
'''Filtering Kinect Data By Username'''
for users in allUsers:
    coll = pymongo.collection.Collection(db, users.lower())
    print "Currently looking at", users.lower(), " to filter kinect data"
    #find all skeletal data
    #kinectData = coll.find({"_t": "SkeletonJoints"})
    newColl = users.lower() + "_kinectData" #name of the new collection to be made
    #try to create and insert all kinect data into a new collection
    try:
        for line in coll.find({'_t': 'SkeletonJoints'}):
            print line
            jsonObj = json.loads(line) #convert to JSON?
            if jsonObj is not None:
                #create collection
                db.create_collection(newColl)
                #and insert JSON documents
                coll.insert(jsonObj)
                print "Insertion finished for ", newColl
            print "No Insertion for ", newColl
    except pymongo.errors.CollectionInvalid:
        print 'Collection ', newColl, ' already exists'
    except pymongo.errors.OperationFailure:
        print "----> OP insertion failed"
    except pymongo.errors.InvalidName:
        print "----> Invalid insertion Name"
    except:
        print "----> WTF? ", traceback.print_exc()
So my problem is when I try insert, there is nothing being inserted. I don't really understand why this doesn't work. I am trying to iterate through the cursor.....
Thank you for your help!
No need to convert to JSON: PyMongo reads BSON from MongoDB and converts to Python dicts, and when you pass it a Python dict PyMongo converts it to BSON and sends it to MongoDB. JSON is never involved.
No need to call create_collection, MongoDB creates a collection when you insert into it for the first time.
Your statement, for line in coll.find({'_t': 'SkeletonJoints'}), will retrieve each document from the current collection that has a field "_t" with the value "SkeletonJoints", so I hypothesize that no such documents exist? Have you tried the following in the mongo shell?:
> use emmersiv
> db.MY_COLLECTION.find({_t: "SkeletonJoints"})
I expect that if you do this query (replacing "MY_COLLECTION" with the name of an actual collection) you'll get no documents in the Mongo shell, either.
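Once the query does return documents, the copy itself needs neither json.loads nor create_collection, as noted above. A minimal sketch, assuming source and target are pymongo Collection objects (for instance db[users.lower()] and db[users.lower() + "_kinectData"]); the helper name copy_matching is made up for illustration. Note that the question's code calls insert on coll, the source collection, rather than on the new one.

```python
def copy_matching(source, target, query):
    """Copy every document in `source` that matches `query` into `target`.

    Returns the number of documents copied.
    """
    # find() yields plain Python dicts, which can be inserted as they are.
    docs = list(source.find(query))
    if docs:
        # insert_many rejects an empty list, so only call it when there
        # is something to insert. The target collection is created by
        # MongoDB on the first insert.
        target.insert_many(docs)
    return len(docs)
```

Because the copied documents keep their original _id values, running the copy a second time against the same target would fail with duplicate key errors unless the target is emptied first.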

Trying to Parse JSON date to POST to another System (Python)

I am trying to write a script to GET project data from Insightly and post to 10000ft. Essentially, I want to take any newly created project in one system and create that same instance in another system. Both have the concept of a 'Project'
I am extremely new at this, but I only need to GET certain Project parameters in Insightly to pass into the other system (PROJECT_NAME, LINKS:ORGANIZATION_ID, DATE_CREATED_UTC, to name a few).
I plan to add logic to only POST projects with a DATE_CREATED_UTC > yesterday, but I am clueless on how to set up the script to grab the JSON strings and create Python variables (JSON date string to datetime). Here is my current code. I am simply printing out some of the variables I require to get comfortable with the code.
import urllib, urllib2, json, requests, pprint, dateutil
from dateutil import parser
import base64
#Set the 'Project' URL
insightly_url = 'https://api.insight.ly/v2.1/projects'
insightly_key =
api_auth = base64.b64encode(insightly_key)
headers = {
    'GET': insightly_url,
    'Authorization': 'Basic ' + api_auth
}
req = urllib2.Request(insightly_url, None, headers)
response = urllib2.urlopen(req).read()
data = json.loads(response)
for project in data:
    project_date = project['DATE_CREATED_UTC']
    project_name = project['PROJECT_NAME']
    print project_name + " " + project_date
Any help would be appreciated
Edits:
I have updated the previous code with the following:
for project in data:
    project_date = datetime.datetime.strptime(project['DATE_CREATED_UTC'], '%Y-%m-%d %H:%M:%S').date()
    if project_date > (datetime.date.today() - datetime.timedelta(days=1)):
        print project_date
    else:
        print 'No New Project'
This returns every project that was created after yesterday, but now I need to isolate these projects and post them to the other system
Here is an example of returning a datetime object from a parsed string. We will use the datetime.strptime method to accomplish this. Here is a list of the format codes you can use to create a format string.
>>> from datetime import datetime
>>> date_string = '2014-03-04 22:30:55'
>>> format = '%Y-%m-%d %H:%M:%S'
>>> datetime.strptime(date_string, format)
datetime.datetime(2014, 3, 4, 22, 30, 55)
As you can see, the datetime.strptime method returns a datetime object.
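Building on that, isolating the recent projects comes down to parsing each date and comparing it with a cutoff. The following is a sketch in Python 3 under assumptions: the function name projects_created_since is made up, and the sample list is hypothetical data shaped like the Insightly response in the question (only the PROJECT_NAME and DATE_CREATED_UTC keys come from the question).

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def projects_created_since(projects, cutoff):
    """Return the projects whose DATE_CREATED_UTC is later than `cutoff`."""
    recent = []
    for project in projects:
        created = datetime.strptime(project["DATE_CREATED_UTC"], DATE_FORMAT)
        if created > cutoff:
            recent.append(project)
    return recent


# Hypothetical sample data in the same shape as the Insightly response.
sample = [
    {"PROJECT_NAME": "Old project", "DATE_CREATED_UTC": "2014-03-01 09:00:00"},
    {"PROJECT_NAME": "New project", "DATE_CREATED_UTC": "2014-03-04 22:30:55"},
]
# A fixed "today" keeps the example reproducible; the cutoff is one day earlier.
cutoff = datetime(2014, 3, 4) - timedelta(days=1)
recent = projects_created_since(sample, cutoff)
print([p["PROJECT_NAME"] for p in recent])  # ['New project']
```

In the real script the cutoff would be computed from the current UTC time, and the returned list is what would be looped over to build the POST requests for the other system.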
