Hi, I ran into a problem while using PyMongo.
keywords is a list built from user input with list(input().split()).
When I call results = post_collection.find(...),
I tried four different filters inside the find() parentheses:
1. {"Title": keyword} or {"Body": keyword} or {"Tags": keyword}
2. {"title": keyword} or {"body": keyword} or {"tags": keyword}
3. {"Title": {"$in": keywords}} or {"Body": {"$in": keywords}} or {"Tags": {"$in": keywords}}
4. {}
Filters 1, 2 and 3 return nothing: results.count() returns 0 and the following 'for posts in results' loop is never entered; the program just skips it and moves on to the next input prompt.
Filter 4 returns all the posts from the JSON file I loaded at the beginning.
I was wondering why this happens; I have been struggling with it the whole day.
Thank you for your time.
Below is part of my code.
part of code
Part of document
{
"Id": "1",
"PostTypeId": "1",
"AcceptedAnswerId": "9",
"CreationDate": "2010-08-17T19:22:37.890",
"Score": 16,
"ViewCount": 28440,
"Body": "What is the hardware and software differences between Intel and PPC Macs?\n",
"OwnerUserId": "10",
"LastEditorUserId": "15",
"LastEditDate": "2010-09-08T15:12:04.097",
"LastActivityDate": "2017-09-21T12:16:56.790",
"Title": "What is the difference between Intel and PPC?",
"Tags": "",
"AnswerCount": 9,
"CommentCount": 0,
"FavoriteCount": 6,
"ContentLicense": "CC BY-SA 2.5"
},
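The or statement doesn't work like this in MongoDB: Python's or between two dictionaries simply evaluates to the first non-empty one, so only the first condition ever reaches the server. Use the $or operator instead.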
Here is the query
{
    "$or": [
        {
            "Title": keyword
        },
        {
            "Body": keyword
        },
        {
            "Tags": keyword
        }
    ]
}
Here is a working example
The problem in your code is under print(3). The query there sits outside the for loop above it, so it does not run once per keyword (after the loop has finished, keyword only holds the last value). The solution is to move all the code under print(keyword) into the first for loop, like this:
for keyword in keywords:
    print(keyword)
    print(3)
    results = posts_collection.find({"$or": [{"Title": keyword}, {"Body": keyword}, {"Tags": keyword}]})
    print(results)
    print(results.count())  # Cursor.count() was removed in PyMongo 4; use count_documents() there
    print(4)
    for posts in results:
        print(5)
        if posts['PostTypeId'] == "1":  # PostTypeId is stored as a string in the sample document
            print(posts['Title'])  # ...
        else:
            print(1)
            continue
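One more thing that may be worth checking: a filter such as {"Title": keyword} only matches documents whose Title is exactly equal to keyword, so a single word will not match a whole sentence. Below is a minimal sketch of a substring search, assuming that is what the question is after. build_keyword_filter is a hypothetical helper name; $or, $regex and $options are standard MongoDB query operators.

```python
import re


def build_keyword_filter(keywords, fields=("Title", "Body", "Tags")):
    """Build a MongoDB filter matching any keyword as a substring of any field."""
    # Escape each keyword so characters like '?' or '+' are matched literally.
    pattern = "|".join(re.escape(k) for k in keywords)
    # "i" makes the match case-insensitive.
    return {"$or": [{field: {"$regex": pattern, "$options": "i"}} for field in fields]}


query = build_keyword_filter(["intel", "ppc"])
print(query)
# The filter could then be passed to find(), e.g. posts_collection.find(query)
```

Note that an empty keywords list produces an empty pattern, which matches every document, so the caller may want to check for that case first.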
I am trying to filter out data from API JSON response with Python and I get weird results. I would be glad if somebody can guide me how to deal with the situation.
The main idea is to remove irrelevant data in the JSON and keep only the data that is associated with particular people which I hold in a list.
Here is a snip of the JSON file:
{
"result": [
{
"number": "Number1",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "John Doe",
"link": "https://some_link.com"
}
},
{
"number": "Number2",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-10 11:07:13",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Tyrell Greenley",
"link": "https://some_link.com"
}
},
{
"number": "Number3",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-20 10:23:35",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Delmar Vachon",
"link": "https://some_link.com"
}
},
{
"number": "Number4",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Samual Isham",
"link": "https://some_link.com"
}
}
]
}
Here is the Python code:
import json

users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']

# Load JSON file
with open('extract.json', 'r') as input_file:
    input_data = json.load(input_file)

# Create a function to clear the data
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in data:
        print(elem['assigned_to']['display_value'] not in users)
        if elem['assigned_to']['display_value'] not in users:
            print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
            data.remove(elem)
        else:
            print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))
    return data

cd = clear_data(input_data['result'], users_test)
And here is the output, which seems to iterate through only 2 of the items in the file:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
Process finished with exit code 0
It seems that the problem is more or less related to the .remove() method; however, I can't find any other suitable way to delete the particular items that I do not need.
Here is the output of the iteration without applying the remove() method:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.
Process finished with exit code 0
Note: I have left the check for the name visible on purpose.
I would appreciate any ideas to sort out the situation.
If you don't need to log info about people you are removing you could simply try
filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
solution = []
for user in users_test:
    print(user)
    for value in data['result']:
        if user == value['assigned_to']['display_value']:
            solution.append(value)
print(solution)
For more efficient code, as asked by @NomadMonad:
solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))
You are modifying a list while iterating over it at the same time. Each data.remove(elem) shifts the remaining elements one position to the left, so the loop skips the element that follows every removed one; that is why only two records show up in your output.
A safer way to do this is to iterate over a copy of your list and delete from the original:
import copy

def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in copy.deepcopy(data):  # deepcopy handles nested dicts
        if elem['assigned_to']['display_value'] not in users:
            data.remove(elem)  # removes the first record equal to elem
    return data
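If the log messages are still wanted, the same filtering can be done without mutating the list at all, by building a new one. This is a minimal sketch that assumes records shaped like the JSON in the question; filter_records is a hypothetical name and the sample records are trimmed to the fields the function reads.

```python
def filter_records(records, users):
    """Return a new list with only the records assigned to one of `users`."""
    allowed = set(users)  # set membership tests are fast for long name lists
    kept = []
    for record in records:
        name = record['assigned_to']['display_value']
        if name in allowed:
            print('Keeping the record for {} in JSON.'.format(name))
            kept.append(record)
        else:
            print('Removing {} from JSON as not present in list of names.'.format(name))
    return kept


sample = [
    {'number': 'Number1', 'assigned_to': {'display_value': 'John Doe'}},
    {'number': 'Number2', 'assigned_to': {'display_value': 'Tyrell Greenley'}},
    {'number': 'Number4', 'assigned_to': {'display_value': 'Samual Isham'}},
]
result = filter_records(sample, ['Samual Isham', 'Dick Weston'])
print([r['number'] for r in result])
```

Because the input list is never modified, every record is visited exactly once and the original data stays available for other uses.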
I'm having something strange happen with the Robinhood API. Specifically with getting all of the options instruments (just data about the options). The code below is part of my program
def get_options_instruments(self):
    params = {
        "chain_symbol": "AMD",
        "chain_id": "e66ce029-db96-4572-87a0-b144613c08bf",
        "type": "call",
        "state": "active",
        "tradability": "tradable",
        "strike_price": "16.0000",
        "expiration_date": "2018-10-19"
    }
    # API_URLS['option-instrument'] = "https://api.robinhood.com/options/instruments/"
    response = self.login_session.get(API_URLS['option-instrument'], params=params)
    response = response.json()["results"]
    print(json.dumps(response, indent=4, separators=(',', ': ')))
All parameters seem to have an effect when receiving the option instruments EXCEPT the expiration date (which is one of the most important parameters that I need to use). Below is a sample response of an option instrument.
{
"issue_date": "1987-01-12",
"strike_price": "16.0000",
"url": "https://api.robinhood.com/options/instruments/3cb75cca-0987-46d7-bff1-20cadfb74a83/",
"expiration_date": "2018-07-20",
"tradability": "tradable",
"chain_id": "e66ce029-db96-4572-87a0-b144613c08bf",
"updated_at": "2018-06-03T00:16:56.985489Z",
"min_ticks": {
"cutoff_price": "3.00",
"below_tick": "0.01",
"above_tick": "0.05"
},
"state": "active",
"id": "3cb75cca-0987-46d7-bff1-20cadfb74a83",
"chain_symbol": "AMD",
"type": "call",
"created_at": "2017-11-18T04:15:17.795113Z"
}
I'm just wondering if anybody has any idea why something like this might happen? Could it perhaps be something on the API's side and not mine? Thank you.
Use "expiration_dates" instead and it will work.
I encountered this issue recently as well, where specifying parameters for the "expiration_date" field yielded nothing.
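As a fallback in case a server-side parameter is ignored, the results can also be filtered locally after the request. The sketch below assumes the response items look like the sample in the question; filter_by_expiration is a hypothetical helper, the ids "a" and "b" are made up, and the plural "expiration_dates" key is taken from this answer rather than verified against the API.

```python
def filter_by_expiration(instruments, expiration_date):
    """Keep only instruments whose expiration_date equals the given ISO date string."""
    return [i for i in instruments if i.get("expiration_date") == expiration_date]


# Query parameters with the plural key suggested above (unverified assumption).
params = {
    "chain_symbol": "AMD",
    "type": "call",
    "state": "active",
    "expiration_dates": "2018-10-19",
}

# Stand-in for response.json()["results"]; only the relevant fields are shown.
results = [
    {"id": "a", "expiration_date": "2018-07-20"},
    {"id": "b", "expiration_date": "2018-10-19"},
]
print(filter_by_expiration(results, params["expiration_dates"]))
```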
I have a response that I receive from Foursquare in the form of JSON. I have tried to access certain parts of the object but have had no success. How would I access, say, the address of the object? Here is the code that I have tried.
url = 'https://api.foursquare.com/v2/venues/explore'
params = dict(client_id=foursquare_client_id,
              client_secret=foursquare_client_secret,
              v='20170801', ll=''+lat+','+long+'',
              query=mealType, limit=100)
resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
msg = '{} {}'.format("Restaurant Address: ",
                     data['response']['groups'][0]['items'][0]['venue']['location']['address'])
print(msg)
Here is an example of json response:
"items": [
{
"reasons": {
"count": 0,
"items": [
{
"summary": "This spot is popular",
"type": "general",
"reasonName": "globalInteractionReason"
}
]
},
"venue": {
"id": "412d2800f964a520df0c1fe3",
"name": "Central Park",
"contact": {
"phone": "2123106600",
"formattedPhone": "(212) 310-6600",
"twitter": "centralparknyc",
"instagram": "centralparknyc",
"facebook": "37965424481",
"facebookUsername": "centralparknyc",
"facebookName": "Central Park"
},
"location": {
"address": "59th St to 110th St",
"crossStreet": "5th Ave to Central Park West",
"lat": 40.78408342593807,
"lng": -73.96485328674316,
"labeledLatLngs": [
{
"label": "display",
"lat": 40.78408342593807,
"lng": -73.96485328674316
}
],
the full response can be found here
Like so (note that, in the sample response, location is nested under venue):
addrs=data['items'][2]['venue']['location']['address']
Your code (at least as far as loading and accessing the object) looks correct to me. I loaded the JSON from a file (since I don't have your Foursquare id) and it worked fine. You are correctly using object/dictionary keys and array positions to navigate to what you want. However, you misspelled "address" in the line where you drill down to the data. Adding the missing 'a' made it work. I'm also correcting the typo in the URL you posted.
I answered this assuming that the example JSON you linked to is what is stored in data. If that isn't the case, a relatively easy way to see exactly what Python has stored in data is to import pprint and use it like so: pprint.pprint(data).
You could also start an interactive python shell by running the program with the -i switch and examine the variable yourself.
data["items"][2]["venue"]["location"]["address"]
This will access the address for you.
You can reach any level of nesting by using an integer index for an array and a string key for a dict.
In your case items is an array:
#items[int index]
items[0]
Now items[0] is a dictionary, so we access it by string keys:
items[0]['venue']['location']
That is again an object, so we use a string key:
items[0]['venue']['location']['address']
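When a key or index along the way may be missing, the same walk can be wrapped in a small helper that returns a default instead of raising. This is a minimal sketch; dig is a hypothetical name, and the sample data is a trimmed copy of the response shown in the question.

```python
def dig(obj, *path, default=None):
    """Follow `path` through nested dicts and lists; return `default` if a step is missing."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


data = {"items": [{"venue": {"name": "Central Park",
                             "location": {"address": "59th St to 110th St"}}}]}

print(dig(data, "items", 0, "venue", "location", "address"))
# 59th St to 110th St
print(dig(data, "items", 5, "venue", "location", "address", default="unknown"))
# unknown
```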
I have a small API that I'm working on. Everything works OK and all my requests do what they are supposed to, but when I try to filter results through the URL query, it works for id but not for the device field.
def on_get(self, req, resp):
    """Handles GET requests"""
    if req.get_param("id"):
        result = {'location': r.db(PROJECT_DB).table(PROJECT_TABLE).get(req.get_param("id")).run(db_connection)}
    elif req.get_param("device"):
        result = {'location': r.db(PROJECT_DB).table(PROJECT_TABLE).get(req.get_param("device")).run(db_connection)}
    else:
        location = r.db(PROJECT_DB).table(PROJECT_TABLE).run(db_connection)
        result = {'locations': [i for i in location]}
    resp.body = json.dumps(result)
For example, http://localhost:8000/location?id=(some random id) works,
but http://localhost:8000/location?device=(some device) does not; it returns null.
Could anyone tell me what I am doing wrong? Or, better yet, does anyone know a better way to filter using the URL?
Note: I am using rethinkdb
EDIT:
This is what I have normally:
{
"locations": [
{
"id": "4bf4b94f-747a-42db-9d54-a8399d995025",
"location": "gps coords",
"device": "Device 2"
},
{
"id": "b5cce561-37d2-42e7-86e4-a31c008b0af2",
"location": "gps coords",
"device": "Device 1"
},
{
"id": "bebba7cf-710c-4ee8-ad69-2d58174d4e02",
"location": "gps coords",
"device": "Device 1"
},
{
"id": "e928f84b-60ff-40f3-b839-920bc99e5480",
"location": "gps coords",
"device": "Device1"
}
]
}
Filtering by id works ok, but not by device which is weird
I found the answer to this problem. The reason it did not work is that RethinkDB's get query only looks documents up by primary key:
result = {'location': r.db(PROJECT_DB).table(PROJECT_TABLE).get(req.get_param("device")).run(db_connection)}
So what I should have done was filter the results by the field I wanted, like this:
result = {'location': list(r.db(PROJECT_DB).table(PROJECT_TABLE).filter({'device': req.get_param("device")}).run(db_connection))}
Thanks for the help everyone and hope this answer helps.
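The difference between the two lookups can be illustrated without a database. The sketch below is a plain-Python stand-in, not the RethinkDB API: get and filter_rows are hypothetical functions that only mimic a primary-key lookup and a field-based filter over rows taken from the question.

```python
rows = [
    {"id": "4bf4b94f-747a-42db-9d54-a8399d995025", "device": "Device 2"},
    {"id": "b5cce561-37d2-42e7-86e4-a31c008b0af2", "device": "Device 1"},
    {"id": "bebba7cf-710c-4ee8-ad69-2d58174d4e02", "device": "Device 1"},
]


def get(table, key):
    """Primary-key lookup: at most one row, or None when no id matches."""
    return next((row for row in table if row["id"] == key), None)


def filter_rows(table, predicate):
    """Field-based filter: every row whose fields match the predicate dict."""
    return [row for row in table
            if all(row.get(k) == v for k, v in predicate.items())]


print(get(rows, "Device 1"))                            # None: "Device 1" is not an id
print(len(filter_rows(rows, {"device": "Device 1"})))   # 2
```

A lookup by key yields a single document or nothing, which matches the null seen in the question, while a filter yields a list that may hold several documents.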
I have a JSON file with key value pair data. My JSON file looks like this.
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally",
"helpfullness": "3.3",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=111119",
"reviews": [
{
"attendance": "N/A",
"class": "CHEM 1A",
"textbook_use": "It's a must have",
"review_text": "Tests were incredibly difficult (averages in the 40s) and lectures were essentially useless. I attended both lectures every day and still was unable to grasp most concepts on the midterms. Scope out a good GSI to get help and ride the curve."
},
{
"attendance": "N/A",
"class": "CHEMISTRY1A",
"textbook_use": "Essential to passing",
"review_text": "Saykally really isn't as bad as everyone made him out to be. If you go to his lectures he spends about half the time blowing things up, but if you actually read the texts before his lectures and pay attention to what he's writing/saying, you'd do okay. He posts practice tests that were representative of actual tests and curves the class nicely!"
}]
},
{
"first_name": "Laura",
"last_name": "Stoker",
"helpfullness": "4.1",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=536606",
"reviews": [
{
"attendance": "N/A",
"class": "PS3",
"textbook_use": "You need it sometimes",
"review_text": "Stoker is by far the best professor. If you put in the effort, take good notes, and ask questions, you will be fine in the class. As far as her lecture, she does go a bit fast, but her lecture is in the form of an outline. As long as you take good notes, you will have everything you need for exams. She is funny and super nice if you speak with her"
},
{
"attendance": "Mandatory",
"class": "164A",
"textbook_use": "Barely cracked it open",
"review_text": "AMAZING professor. She has a good way of keeping lectures interesting. Yes, she can be a little everywhere and really quick with her lecture, but the GSI's are useful to make sure you understand the material. Oh, and did I mention she's hilarious!"
}]
}]
}
So I'm trying to do multiple things.
I'm trying to find the most mentioned ['class'] value under reviews, then get each class name and the number of times it was mentioned.
Then I'd like to write the output in the format below. The professors array should hold just the professors' info; for instance, for CHEM 1A and CHEMISTRY1A it is Richard Saykally.
{
    courses:[
        {
            "course_name" : # class name
            "course_mentioned_times" : # The number of times the class was mentioned
            professors:[ # The professors who teach this class, taken from the JSON file shown above
                {
                    'first_name' : 'professor name'
                    'last_name' : 'professor last name'
                }
            ]
        }
    ]
}
So I'd like to sort the output from the most mentioned class to the least. So far all I've been able to figure out is:
import json

if __name__ == "__main__":
    open_json = open('result.json')
    load_as_json = json.load(open_json)['professors']
    outer_arr = []
    outer_dict = {}
    for items in load_as_json:
        output_dictionary = {}
        all_classes = items['reviews']
        for classes in all_classes:
            arr_info = []
            output_dictionary['class'] = classes['class']
            output_dictionary['first_name'] = items['first_name']
            output_dictionary['last_name'] = items['last_name']
            #output_dictionary['department'] = items['department']
            output_dictionary['reviews'] = classes['review_text']
    with open('output_info.json', 'w') as outfile:  # text mode: json.dump writes str
        json.dump(output_dictionary, outfile, indent=4)
I think this program does what you want:
import json

with open('result.json') as open_json:
    load_as_json = json.load(open_json)

courses = {}
for professor in load_as_json['professors']:
    for review in professor['reviews']:
        course = courses.setdefault(review['class'], {})
        course.setdefault('course_name', review['class'])
        course.setdefault('course_mentioned_times', 0)
        course['course_mentioned_times'] += 1
        course.setdefault('professors', [])
        prof_name = {
            'first_name': professor['first_name'],
            'last_name': professor['last_name'],
        }
        if prof_name not in course['professors']:
            course['professors'].append(prof_name)

courses = {
    'courses': sorted(courses.values(),
                      key=lambda x: x['course_mentioned_times'],
                      reverse=True)
}
with open('output_info.json', 'w') as outfile:
    json.dump(courses, outfile, indent=4)
Result, using the example input in the question:
{
"courses": [
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "PS3",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "164A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEM 1A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEMISTRY1A",
"course_mentioned_times": 1
}
]
}
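The same counting can also be written with collections.Counter, which keeps the tally and the ordering in one place. This is a sketch of an alternative, not the answer's own method; summarize_courses is a hypothetical name, and the sample input is invented so that one class is mentioned twice.

```python
from collections import Counter, defaultdict


def summarize_courses(professors):
    """Count class mentions across all reviews and list who teaches each class."""
    counts = Counter()
    teachers = defaultdict(list)
    for professor in professors:
        name = {'first_name': professor['first_name'],
                'last_name': professor['last_name']}
        for review in professor['reviews']:
            counts[review['class']] += 1
            if name not in teachers[review['class']]:
                teachers[review['class']].append(name)
    # most_common() yields (class, count) pairs, highest count first.
    return {'courses': [
        {'course_name': course,
         'course_mentioned_times': times,
         'professors': teachers[course]}
        for course, times in counts.most_common()
    ]}


# Invented sample: "CHEM 1A" appears in two reviews, "PS3" in one.
sample = [
    {'first_name': 'Laura', 'last_name': 'Stoker',
     'reviews': [{'class': 'PS3'}]},
    {'first_name': 'Richard', 'last_name': 'Saykally',
     'reviews': [{'class': 'CHEM 1A'}, {'class': 'CHEM 1A'}]},
]
summary = summarize_courses(sample)
print(summary['courses'][0])
```

The returned dictionary has the same shape as the result above, so it could be passed to json.dump in the same way.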