Neatly Print Comments Through Python Reddit API - python

I am trying to do some text analysis with Reddit comments. The script I have currently prints out the body and upvote count of all comments with more than 5 upvotes on a given subreddit's "hot" posts:
import praw
reddit = praw.Reddit(client_id=ID,
                     client_secret=SECRET, password=PWORD,
                     user_agent=UAGENT, username=UNAME)
subreddit = reddit.subreddit('cryptocurrency')
for submission in subreddit.hot(limit=10):
    submission.comments.replace_more(limit=10)
    for comment in submission.comments.list():
        submission.comment_sort = 'top'
        if comment.ups > 5:
            print(comment.body, comment.ups)
However, the outputs look something like this:
(u'Just hodl and let the plebs lose money on scamcoin ICO\'s that don\'t even have a working product. I don\'t understand some of these "traders" and "investors".', 9)
(u"Good idea imho but it's gonna be abused af. Think about it. It will be the sexual go to app real soon. If they will 'ban' nudity on it, then you will simply get the instagram chicks on there with all the horny guys liking their photos and giving them free money. 'if this gets 1000 likes I will post a pic of me in bikini' ", 7)
(u"But but but, I just sold a kidney and bought in at the top, now I can't afford to get the stitches removed!\n\n/s just in case.", 7)
Two questions:
Is there any way to convert the outputs to JSON using python?
If not, how can I get rid of all of the excess characters other than the body and the upvote count?
My ultimate goal is to have this output neatly organized so that I can analyze keywords vs. upvote count (what keywords get the most upvotes, etc).
Thank you!

Answer to question 2: It looks like you are writing in Python 2, but are using Python 3 print syntax. To get rid of the tuple notation in your print call you need
from __future__ import print_function
at the top of your program.

1) Is there any way to convert the outputs to JSON using python?
It's almost as simple as this
output_string = json.dumps(comments)
Except a couple of keys cause the error TypeError: Object of type Foo is not JSON serializable
We can solve this. PRAW objects which are not serializable will work correctly when converted to a string.
def is_serializable(k, v):
    try:
        json.dumps({k: v})
    except TypeError:
        return False
    return True
for comment in comments:
    for k, v in comment.items():
        if is_serializable(k, v):
            comment[k] = v
        else:
            comment[k] = str(v)
Now saving works.
json.dumps(comments)
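For the keyword analysis, it may be simpler to collect only the two fields you need into plain dicts before serializing. The following is a minimal sketch, not PRAW's own API: the `SimpleNamespace` objects are stand-ins for PRAW comments so the example runs without credentials, and `comments_to_json` and `min_ups` are hypothetical names.

```python
import json
from types import SimpleNamespace

# Stand-ins for PRAW comment objects (assumption: real comments would come
# from submission.comments.list() and expose .body and .ups).
fake_comments = [
    SimpleNamespace(body="Just hodl", ups=9),
    SimpleNamespace(body="Good idea imho", ups=7),
    SimpleNamespace(body="meh", ups=2),
]


def comments_to_json(comments, min_ups=5):
    """Keep body and upvote count of comments above the threshold."""
    rows = [
        {"body": c.body, "ups": c.ups}
        for c in comments
        if c.ups > min_ups
    ]
    return json.dumps(rows, indent=2)


output_string = comments_to_json(fake_comments)
print(output_string)
```

Because each row contains only a string and an integer, no serializability check is needed in this variant.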
2) If not, how can I get rid of all of the excess characters other than the body and the upvote count?
I think you're asking how to remove keys you do not want. You can use:
save_keys = ['body', 'ups']
for k in list(comment):
    if k not in save_keys:
        del comment[k]
We use list(dict) to iterate over a copy of the dict's keys. This prevents you from mutating the same thing you are iterating over.
list(dict) is the same as list(dict.keys()).
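As a small illustration of the point about iterating over a copy, here is a sketch on a hand-written dict (the `author` and `id` keys are made up for the example). It also shows a dict comprehension, which builds a trimmed copy instead of deleting in place.

```python
# Hypothetical comment dict; only 'body' and 'ups' should survive.
comment = {"body": "Just hodl", "ups": 9, "author": "someone", "id": "abc123"}
save_keys = {"body", "ups"}

# Option 1: build a new dict and leave the original untouched.
trimmed = {k: v for k, v in comment.items() if k in save_keys}

# Option 2: delete in place, iterating over a snapshot of the keys.
# Iterating over `comment` directly while deleting raises RuntimeError.
for k in list(comment):
    if k not in save_keys:
        del comment[k]

print(trimmed)
print(comment)
```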

Related

Correctly selecting keys and values in dictionary from parsed JSON in Python, using requests

I'm writing a Twitch chat bot in Python with TwitchIO. I decided to add some Pokemon functionality after I found out there's a Pokemon database with a nice API, and ended up using the PokePy wrapper for PokeAPI.
I have been hard-coding all my lookups (not sure that's the correct word).
Like this for example:
@bot.command(name="type", aliases=['t'])
async def types2(ctx, *, msg):
    pokemon = pokepy.V2Client().get_pokemon(msg.lower())
    pokemon_type = ""
    for x in range(len(pokemon[0].types)):
        pokemon_type += pokemon[0].types[x].type.name
        pokemon_type += " "
    await ctx.channel.send(str(pokemon_type))
But when I got to finding out a Pokemon's weaknesses based on its types, I found myself 3 layers deep in nested for loops and ifs, and knew I had to change my approach, because I was hard-coding everything. So I decided to find out how I can fetch the JSON file and parse it into a dict for easier access.
I found out about the requests library. I think I have used it successfully, but I cannot manage to select the keys as needed.
(I have used pprint() here to make reading easier.) I will post only the first bit of the printed content because it's too long. (Full output is here: https://pastebin.com/tkYKRcqY)
@bot.command(name="weak", aliases=['t'])
async def weakness(ctx, *, msg):
    deals = {'4x': [], '2x': [], '1x': [], '0.5x': [], '0.25x': [], '0x': []}
    response = requests.get('https://pokeapi.co/api/v2/type/fire')
    pprint(response)
    fetch = response.json()
    pprint(fetch)
    # test = fetch[damage_relations][double_damage_from]
Output:
{'damage_relations': {'double_damage_from': [{'name': 'ground',
                                              'url': 'https://pokeapi.co/api/v2/type/5/'},
                                             {'name': 'rock',
                                              'url': 'https://pokeapi.co/api/v2/type/6/'},
                                             {'name': 'water',
                                              'url': 'https://pokeapi.co/api/v2/type/11/'}],
So essentially what I want is to grab only the names from damage_relations -> double_damage_from and store each name under 2x in my dict. Same for damage_relations -> half_damage_from -> name, etc. The idea is to return to chat what the Pokemon is weak to, which will be the information stored in the dict. (About 95% of the data this fetches is not needed for this.)
I have also tried using json.loads(fetch) as I was desperate, but it returned an error saying it doesn't expect a dict as an argument.
Essentially my end goal is to put the correct damage values in the correct keys that I've created in the deals dictionary. Also, there are some Pokemon with multiple types, so for example if a Pokemon has fire and ground as types, I want to see if both of them will put the same type in the 2x key in the deals dictionary. If they attempt to do so, I need to take that type out of 2x and place it in the 4x key.
I'm working blindly here, and it's possible that the answer is very simple. I honestly do not understand what's going wrong and am trying to brute-force my way through. If there is anything else that I need to provide as information, please let me know.
Let's clean up types2. Note that it still has a space at the very end. Either manually remove it or look into str.join.
@bot.command(name="type", aliases=['t'])
async def types2(ctx, *, msg):
    pokemon = pokepy.V2Client().get_pokemon(msg.lower())
    pokemon_type = ""
    for p_type in pokemon[0].types:
        pokemon_type += f"{p_type.type.name} "
    await ctx.channel.send(pokemon_type)  # already a string
Just stick with fetch. It's already the dict you want. You could use json.loads(response.text), but that's redundant.
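To show what "just stick with fetch" can look like, here is a minimal sketch that pulls the names out of the nested structure. The `fetch` dict below is hand-copied from the printed output in the question rather than fetched live, so only the `double_damage_from` part is present.

```python
# Hand-copied excerpt of the parsed response shown in the question.
fetch = {
    "damage_relations": {
        "double_damage_from": [
            {"name": "ground", "url": "https://pokeapi.co/api/v2/type/5/"},
            {"name": "rock", "url": "https://pokeapi.co/api/v2/type/6/"},
            {"name": "water", "url": "https://pokeapi.co/api/v2/type/11/"},
        ],
    },
}

deals = {'4x': [], '2x': [], '1x': [], '0.5x': [], '0.25x': [], '0x': []}

# Dict keys are strings, so they need quotes; each list entry is itself
# a dict from which only the 'name' value is kept.
deals['2x'] = [
    entry["name"]
    for entry in fetch["damage_relations"]["double_damage_from"]
]
print(deals['2x'])
```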
Also there are some pokemon with multiple types, so for example if a pokemon has fire, ground as types i want to see if both of them will put the same type in the 2x key in deal dictionary, if they attempt to do so, i need to take out that type from 2x and place it in the 4x key.
I suggest storing a pokemon's type(s) as a list (e.g. [fire], [fire, flying]). Let's say a pokemon's types are flying and rock. It'd have a 2x weakness and a 1/2 resistance against electric attacks. All you need to do is 2 * 1/2 == 1. If the attacking type is not present in the dict, just represent it as 1 (for no type shenanigans). To set the default, I suggest using dict.get.
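The multiplication idea could be sketched roughly as follows. This is only an illustration under assumptions: `SAMPLE_RELATIONS` is hand-written, abridged sample data in the same shape as the `damage_relations` part of the response (it is not fetched and not complete), and `combined_multipliers` and `group_by_multiplier` are hypothetical helper names.

```python
from collections import defaultdict

# Hand-written, abridged sample data (assumption, not real API output).
SAMPLE_RELATIONS = {
    "fire": {
        "double_damage_from": [{"name": "ground"}, {"name": "rock"}, {"name": "water"}],
        "half_damage_from": [{"name": "grass"}],
        "no_damage_from": [],
    },
    "ground": {
        "double_damage_from": [{"name": "water"}, {"name": "grass"}],
        "half_damage_from": [{"name": "rock"}],
        "no_damage_from": [{"name": "electric"}],
    },
}

FACTORS = {"double_damage_from": 2.0, "half_damage_from": 0.5, "no_damage_from": 0.0}


def combined_multipliers(type_names, relations):
    """Multiply the per-type factors; attackers not listed default to 1."""
    multipliers = {}
    for type_name in type_names:
        for key, factor in FACTORS.items():
            for entry in relations[type_name][key]:
                attacker = entry["name"]
                multipliers[attacker] = multipliers.get(attacker, 1.0) * factor
    return multipliers


def group_by_multiplier(multipliers):
    """Turn {'water': 4.0, ...} into {'4x': ['water'], ...}."""
    deals = defaultdict(list)
    for attacker, value in multipliers.items():
        deals[f"{value:g}x"].append(attacker)
    return dict(deals)


result = group_by_multiplier(
    combined_multipliers(["fire", "ground"], SAMPLE_RELATIONS)
)
print(result)
```

With this approach there is no need to move a type from 2x to 4x by hand: two 2x entries multiply to 4, and a 2x combined with a 0.5x cancels out to 1.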
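What about using the json module and json.loads(response.text)? Note that json.loads(response.json()) raises a TypeError, because response.json() already returns a dict.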
You can review Python documentation or stackoverflow
String to Dictionary in Python
https://docs.python.org/3/library/json.html
Here are a few options that should work for getting the JSON the way you want it, starting from response.json():
import json
with open('outfile.json', 'w') as outfile:
    json.dump(response.json(), outfile, indent=4)
Additionally, if you wanted to just print the prettified JSON, you could do so by doing the following:
import json
print(json.dumps(response.json(), indent=4))
The reason json.load/loads doesn't work is that they are for loading JSON from files or strings, so in your code you're essentially trying to parse a dict object that has already been parsed. Hope this was helpful!

Output of python code is one character per line

I'm new to Python and having some trouble with an API scraping I'm attempting. What I want to do is pull a list of book titles using this code:
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    for title in doc["sourceResource"]["title"]:
        print (title)
Which works to pull the titles, but most (not all) titles are outputting as one character per line. I've tried adding .splitlines() but this doesn't fix the problem. Any advice would be appreciated!
The problem is that you have two types of title in the response, some are plain strings "Germain the wizard" and some others are arrays of string ['Joe Strong, the boy wizard : or, The mysteries of magic exposed /']. It seems like in this particular case, all lists have length one, but I guess that will not always be the case. To illustrate what you might need to do I added a join here instead of just taking title[0].
import requests
import json
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    title = doc["sourceResource"]["title"]
    if isinstance(title, list):
        print(" ".join(title))
    else:
        print(title)
In my opinion that should never happen, an API should return predictable types, otherwise it looks messy on the users' side.
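The one-character-per-line output comes from iterating over a string, which yields its characters one by one. A minimal sketch with hand-written sample documents (the two titles are the ones quoted above; `title_as_text` is a hypothetical helper name) shows both the cause and the normalization:

```python
def title_as_text(title):
    """Return the title as one string, whether it is a str or a list of str."""
    if isinstance(title, list):
        return " ".join(title)
    return title


# Hand-written sample in the same shape as the API response.
sample_docs = [
    {"sourceResource": {"title": "Germain the wizard"}},
    {"sourceResource": {"title": [
        "Joe Strong, the boy wizard : or, The mysteries of magic exposed /"
    ]}},
]

# Looping over a plain string gives single characters, hence one per line.
first_chars = [ch for ch in sample_docs[0]["sourceResource"]["title"]][:3]

titles = [title_as_text(doc["sourceResource"]["title"]) for doc in sample_docs]
for title in titles:
    print(title)
```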

Zapier line item manipulation with python

My zap runs a GET from Intercom which places all messages in a convo into a line-item field.
I want to change the entire line-item field into a string without all the commas that Zapier puts in when it joins the values, so I can write the conversation as a text note elsewhere.
Someone at Zapier suggested I should use join in a code step to do this, but of course they aren't allowed to give me the actual code.
Input:
input_data = {
    "values": "<p>Ok, I see your request. Give me just a minute to get it set up </p>,<p>ok</p>,<p>You should see the email shortly. When you get logged in, let me know if you have any other questions, I'm happy to help </p>,<p>cool</p>,<p>More Pipedrive testing</p>"
}
I tried the following code:
L = input_data['values']
return {" ".join(str(x) for x in L)}
But got the following errors:
TypeError(repr(o) + " is not JSON serializable")
TypeError: set(['< p > H i n e w b i e < / p >']) is not JSON serializable
Cool! So your issue is that you're returning a python set from your code step, which Zapier can't turn into json.
That's happening because you've got a string between curlies. See:
>>> {'asdf'}
set(['asdf'])
Your input is a big string with html in it. It seems like you could split on </p>,<p> and join on ' '.
In either case, you need to return your output as a value:
>>> {'result': " ".join(str(x) for x in L.split('</p>,<p>'))}
{'result': "<p>Ok, I see your request. Give me just a minute to get it set up ok You should see the email shortly. When you get logged in, let me know if you have any other questions, I'm happy to help cool More Pipedrive testing</p>"}
You could also pull off the leading and trailing <p> tags if you'd like.
Hope that helps!
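Putting the split, the join, and the tag removal together could look roughly like this. It is a sketch under assumptions: `flatten_line_items` is a hypothetical helper, the sample input is shortened, and in an actual code step the dict would be returned at the top level instead of from a function.

```python
def flatten_line_items(values):
    """Join '<p>..</p>,<p>..</p>' line items into one plain string."""
    text = values.strip()
    # Drop the outermost tags so that only the separators remain.
    if text.startswith("<p>"):
        text = text[len("<p>"):]
    if text.endswith("</p>"):
        text = text[:-len("</p>")]
    parts = [part.strip() for part in text.split("</p>,<p>")]
    # Return a dict (not a set) so the result can be serialized to JSON.
    return {"result": " ".join(parts)}


# Shortened sample input in the same shape as the question's data.
input_data = {"values": "<p>Ok, I see your request. </p>,<p>ok</p>,<p>cool</p>"}
output = flatten_line_items(input_data["values"])
print(output)
```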

Python - Searching a dictionary for strings

Basically, I have a troubleshooting program in which I want the user to enter their problem. I take this input and split it into separate words. After that, I want to create a dictionary from the contents of a .CSV file, with recognisable keywords as the keys and the second column as the solutions. Finally, I want to check whether any of the words from the user's input is a key in the dictionary and, if so, print the solution.
The problem I am facing is that the loop checks every word. If my input was 'My phone is wet', and 'wet' was a recognisable keyword, it would say 'Not recognised', 'Not recognised', 'Not recognised', and only then print the solution. It says not recognised so many times because the words 'My', 'phone' and 'is' are not recognised.
So how do I test whether a user's split input is in my dictionary without it outputting 'Not recognised' for every other word?
Sorry if this was unclear, I'm quite confused by the whole matter.
Code:
import csv, easygui as eg
KeywordsCSV = dict(csv.reader(open('Keywords and Solutions.csv')))
Problem = eg.enterbox('Please enter your problem: ', 'Troubleshooting').lower().split()
for Problems, Solutions in (KeywordsCSV.items()):
    pass
Note, I have the pass there, because this is the part I need help on.
My CSV file consists of:
problemKeyword | solution
For example;
wet Put the phone in a bowl of rice.
Your code reads like some ugly code golf. Let's clean it up before we look at how to solve the problem
import easygui as eg
import csv
# KeywordsCSV = dict(csv.reader(open('Keywords and Solutions.csv')))
# why are you nesting THREE function calls? That's awful. Don't do that.
# KeywordsCSV should be named something different, too. `problems` is probably fine.
with open("Keywords and Solutions.csv") as f:
    reader = csv.reader(f)
    problems = dict(reader)
problem = eg.enterbox('Please enter your problem: ', 'Troubleshooting').lower().split()
# this one's not bad, but I lowercased your `Problem` because capital-case
# words are idiomatically class names. Chaining this many functions together isn't
# ideal, but for this one-shot case it's not awful.
Let's break a second here and notice that I changed something on literally every line of your code. Take time to familiarize yourself with PEP8 when you can! It will drastically improve any code you write in Python.
Anyway, once you've got a problems dict, and a word that should be a KEY in that dict, you can look it up. Note that problem is a list of words because of .split(), and a list can't be used as a dict key, so test one word at a time:
for word in problem:
    if word in problems:
        solution = problems[word]
or even using the default return of dict.get:
solution = problems.get(word)
# if the key is missing, solution is None
If you wanted to loop this, you could do something like:
while True:
    problem = eg.enterbox(...)  # as above
    solutions = [problems[word] for word in problem if word in problems]
    if not solutions:
        # invalid problem, warn the user
        continue
    else:
        # display the solutions? Do whatever it is you're doing with them and...
        break
Just have a boolean and an if after the loop that only runs if none of the words in the sentence were recognized.
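The boolean-flag idea could be sketched like this. The `problems` dict is hand-written from the CSV example in the question instead of being read from the file, and `respond` is a hypothetical function name.

```python
def respond(sentence, problems):
    """Return the solutions for recognised words, or one fallback message."""
    recognised = False
    lines = []
    for word in sentence.lower().split():
        if word in problems:
            recognised = True
            lines.append(problems[word])
    # The fallback is added once, after the loop, and only if nothing matched.
    if not recognised:
        lines.append("Not recognised")
    return lines


# Stand-in for the dict built from 'Keywords and Solutions.csv'.
problems = {"wet": "Put the phone in a bowl of rice."}

hit = respond("My phone is wet", problems)
miss = respond("My phone is slow", problems)
print(hit)
print(miss)
```

Because the "Not recognised" message sits outside the loop, it appears at most once per input, no matter how many words go unmatched.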
I think you might be able to use something like:
for word in Problem:
    if word in KeywordsCSV:
        print(KeywordsCSV[word])
or the list comprehension:
[KeywordsCSV[word] for word in Problem if word in KeywordsCSV]

Use generic keys in dictionary in Python

I am trying to name keys in my dictionary in a generic way, because the names will be based on the data I get from a file. I am new to Python and have not been able to solve it; I hope to get an answer from you.
For example:
from collections import defaultdict
dic = defaultdict(dict)
dic = {}
if cycle = fergurson:
    dic[cycle] = {}
    if loop = mourinho:
        a = 2
        dic[cycle][loop] = {a}
Sorry if there is syntax error or any other mistake.
The variable fergurson and mourinho will be changing due to different files that I will import later on.
So I am expecting to see my output when i type :
dic[fergurson][mourinho]
the result will be:
>>>dic[fergurson][mourinho]
['2']
It will be done by using Python
Naming things, as they say, is one of the two hardest problems in Computer Science. That and cache invalidation and off-by-one errors.
Instead of focusing on what to call it now, think of how you're going to use the variable in your code a few lines down.
If you were to read code that was
for filename in directory_list:
    print(filename)
It would be easy to presume that it is printing out a list of filenames
On the other hand, if the same code had different names
for a in b:
    print(a)
it would be a lot less expressive as to what it is doing other than printing out a list of who knows what.
I know that this doesn't help what to call your 'dic' variable, but I hope that it gets you on the right track to find the right one for you.
I have found a way; if it is wrong, please correct it:
import re
dictionary = {}
dsw = "I am a Geography teacher"
abc = "I am a clever student"
# .group(1) returns the matched word as a string, which is usable as a key
search = re.search(r'(?<=Geography )(\w+)', dsw).group(1)
dictionary[search] = {}
again = re.search(r'(?<=clever )(\w+)', abc).group(1)
number = 56
dictionary[search][again] = number
So when you want to look up the value after running the program:
dictionary["teacher"]["student"]
you will get
56
This is what I meant.
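For keys that are only known at run time, a defaultdict avoids creating each inner dict by hand. This is a minimal sketch: the string values assigned to `cycle` and `loop` are placeholders for whatever would be read from the file.

```python
from collections import defaultdict

# Placeholder key values (assumption: the real ones come from a file).
cycle = "fergurson"
loop = "mourinho"

# Missing outer keys get an empty inner dict automatically.
dic = defaultdict(dict)
dic[cycle][loop] = 2

print(dic[cycle][loop])
print(dic["fergurson"]["mourinho"])
```

The variables hold the key strings, so the same entry can be reached either through the variables or through the literal strings.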
