I am trying to loop through subreddits, but want to ignore the sticky posts at the top. I am able to print the first 5 posts, unfortunately including the stickies. Various pythonic methods of trying to skip these have failed. Two different examples of my code below.
subreddit = reddit.subreddit(sub)
for submission in subreddit.hot(limit=5):
    # If we haven't replied to this post before
    if submission.id not in posts_replied_to:
        ## FOOD
        if subreddit == 'food':
            if 'pLEASE SEE' in submission.title:
                pass
            if "please vote" in submission.title:
                pass
            else:
                print(submission.title)
            if re.search("please vote", submission.title, re.IGNORECASE):
                pass
            else:
                print(submission.title)
I noticed a sticky tag in the documents but not sure exactly how to use it. Any help is appreciated.
Submissions which are stickied have a stickied attribute that evaluates to True. Add the following to your loop, and you should be good to go.
if submission.stickied:
    continue
In general, I recommend checking the available attributes on the objects you are working with to see if there is something usable. See: Determine Available Attributes of an Object
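Put together with your original loop, a minimal sketch might look like this (keeping your posts_replied_to check as is):
subreddit = reddit.subreddit(sub)
for submission in subreddit.hot(limit=5):
    # Skip stickied posts entirely
    if submission.stickied:
        continue
    # If we haven't replied to this post before
    if submission.id not in posts_replied_to:
        print(submission.title)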
It looks like you can get the id of a stickied post based on the docs. So perhaps you could get the id(s) of the stickied post(s) (note that the 'number' parameter of the sticky method lets you ask for the first, second, or third stickied post; use this to your advantage to get all of the stickied posts) and, for each submission you are going to pull, first check its id against the stickied ids.
Example:
# assuming there are no more than three stickies...
stickies = [reddit.subreddit("chicago").sticky(i).id for i in range(1,4)]
and then when you want to make sure a given post isn't stickied, use:
if post.id not in stickies:
    # do something
It looks like, were there fewer than three, this would give you a list with duplicate ids, which won't be a problem.
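A rough sketch of how this could plug into the original loop (the subreddit name is just the example from above, and posts_replied_to is assumed from your code):
# assuming there are no more than three stickies...
stickies = [reddit.subreddit("chicago").sticky(i).id for i in range(1, 4)]

for submission in reddit.subreddit("chicago").hot(limit=5):
    if submission.id not in stickies and submission.id not in posts_replied_to:
        print(submission.title)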
As an addendum to @Al Avery's answer, you can do a complete search for the IDs of all stickies on a given subreddit by doing something like:
import itertools
import prawcore

def get_all_stickies(sub):
    stickies = set()
    for i in itertools.count(1):
        try:
            sid = sub.sticky(i).id
        except prawcore.NotFound:
            break
        if sid in stickies:
            break
        stickies.add(sid)
    return stickies
This function takes into account that the documentation leads one to expect an error if an invalid index is supplied to sticky, while the actual behavior seems to be that a duplicate ID is returned. Using a set instead of a list makes lookup faster if you have a large number of stickies. You would use the function as:
subreddit = reddit.subreddit(sub)
stickies = get_all_stickies(subreddit)
for submission in subreddit.hot(limit=5):
    if submission.id not in posts_replied_to and submission.id not in stickies:
        print(submission.title)
I want to import and insert the words in sequence, NOT RANDOMLY. Each registration attempt uses a single username and stops until the registration is completed. Then log out and begin a new registration with the next username in the list if the registration FAILED, and skip it if the registration SUCCEEDED.
I'm really confused because I have no clue. I've tried this code, but it chooses randomly, and I have no idea how to use the "for loop":
import random
Copy = driver.find_element_by_xpath('XPATH')
Copy.click()
names = [
"Noah" ,"Liam" ,"William" ,"Anthony"
]
idx = random.randint(0, len(names) - 1)
print(f"Picked name: {names[idx]}")
Copy.send_keys(names[idx])
How can I make it choose the next word in sequence and NOT RANDOMLY?
Any help please.
I am going to assume that you are happy with what the code does, with the exception that the names it picks are random. This narrows everything down to one line, namely the one that picks names randomly:
idx = random.randint(0, len(names) - 1)
Simple enough, you want "the next word in sequence and NOT RANDOMLY":
https://docs.python.org/3/tutorial/datastructures.html#more-on-lists
If you take a look at the link I've provided, you can see that lists have a pop() method, which removes and returns an element from the list. We want the first one, so we will pass 0 as the argument to pop().
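For example, with a throwaway list:
names = ["Noah", "Liam", "William"]
first = names.pop(0)
print(first)   # "Noah"
print(names)   # ["Liam", "William"] -- the first element has been removed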
We modify the line to look something like this
name = names.pop(0)
Now you still want a for-loop that repeats all of the actions, including the name picking, so you wrap all of the code in one:
names = [
    "Noah", "Liam", "William", "Anthony"
]

for i in range(len(names)):
    # ...
    Copy = driver.find_element_by_xpath('XPATH')
    Copy.click()
    name = names.pop(0)
    print(f"Picked name: {name}")
    Copy.send_keys(name)
    # ...
You might notice that the names list is not inside the for-loop. That is because we don't want to reassign the list every time we try to use a new name.
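As a side note (an alternative, not part of the code above), you could also iterate over the names directly and drop pop() altogether, which many would consider more idiomatic; the rest of the loop body stays the same:
names = ["Noah", "Liam", "William", "Anthony"]

for name in names:
    Copy = driver.find_element_by_xpath('XPATH')
    Copy.click()
    print(f"Picked name: {name}")
    Copy.send_keys(name)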
If you're completely unsure how for-loops work or how to implement one yourself, you should probably start by reading about how they work.
https://docs.python.org/3/tutorial/controlflow.html?highlight=loop#for-statements
Last but not least, you can see some # ... comments in my example indicating where the logic will probably go for the other part of your question: "Then logout and begin a new registration with the next username in the list if the REGISTRATION FAILED, and skip if the REGISTRATION SUCCEEDED." I don't think I can help you with that, since there is simply not enough context or example code in your question.
Refer to this guide explaining how to ask a well-formulated question so we can help you more next time.
I am able to fetch different tweet parameters from a tweet.
keyword = tweepy.Cursor(api.search, val,tweet_mode='extended',lang='en').items(2)
tweetdone = 0
all_tweet = []
for tweet in keyword:
    tweet_record = {}
    tweet_record['tweet.text'] = tweet.full_text
    tweet_record['tweet.user.name'] = tweet.user.name
    tweet_record['tweet.user.location'] = tweet.user.location
    tweet_record['tweet.user.verified'] = tweet.user.verified
    tweet_record['tweet.lang'] = tweet.lang
    tweet_record['tweet.created_at'] = tweet.created_at
    tweet_record['tweet.user'] = tweet.user
    tweet_record['tweet.retweet_count'] = tweet.retweet_count
    tweet_record['tweet.favorite_count'] = tweet.favorite_count
I want to parse media objects from the tweet, but extended_entities in which media_url is present is not available in all tweets.
so if I try to fetch it like this:
tweet_record['media_url'] = tweet.extended_entities.media_url
It errors out because extended_entities may not be present in some tweets.
How do I deal with this issue and fetch the media content correctly?
You have a couple of options here: you can check whether the key exists, or use a try/except.
Check whether key exists:
You can do this because tweepy returns a Status object, which acts similarly to a JSON document or Python dictionary, so you essentially have key:value pairs. Going by your code above, you should be able to use:
if 'extended_entities' in tweet:
    tweet_record['media_url'] = tweet.extended_entities.media_url
of course, the reverse is also possible
if 'extended_entities' not in tweet:
    # whatever you want to do
This could lead to problems, though: what if extended_entities exists, but for some reason media_url doesn't? And what if you want to get at something even deeper inside it (there isn't anything deeper for a status object, but hey, I'm just trying to future-proof here!)? You would end up with long or deeply nested if statements, which won't look the best:
if 'extended_entities' in tweet:
    if 'media_url' in tweet['extended_entities']:
        # etc
So it might be easier to just throw it in a try/except:
try:
    tweet_record['media_url'] = tweet.extended_entities.media_url
except AttributeError:
    # etc
This means the program won't error out when particular elements aren't found; AttributeError is raised when you access an attribute an object doesn't have. You may of course want to reorder this for readability. Keep in mind, though, that while this approach is pythonic, in my opinion it can become a bit hard to read if used too often.
I referred to this question when looking up things for this answer. Gives some good ideas for this sort of thing if you need further help.
Hope that helps.
Also, a good option is to use hasattr(object, name) within an if statement:
if hasattr(tweet, "extended_entities"):
    # do whatever
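Along the same lines, getattr() with a default value lets you fall back to None in a single step. A minimal sketch, assuming extended_entities is exposed as a plain dict with a 'media' list (as it is in Tweepy 3.x); adjust the keys to whatever your tweets actually contain:
extended = getattr(tweet, "extended_entities", None)
if extended and extended.get("media"):
    # take the first media entry's URL; there may be several
    tweet_record['media_url'] = extended["media"][0].get("media_url")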
I'm new to Python and having some trouble with an API scraping I'm attempting. What I want to do is pull a list of book titles using this code:
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    for title in doc["sourceResource"]["title"]:
        print(title)
Which works to pull the titles, but most (not all) titles are outputting as one character per line. I've tried adding .splitlines() but this doesn't fix the problem. Any advice would be appreciated!
The problem is that you have two types of title in the response, some are plain strings "Germain the wizard" and some others are arrays of string ['Joe Strong, the boy wizard : or, The mysteries of magic exposed /']. It seems like in this particular case, all lists have length one, but I guess that will not always be the case. To illustrate what you might need to do I added a join here instead of just taking title[0].
import requests
import json
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    title = doc["sourceResource"]["title"]
    if isinstance(title, list):
        print(" ".join(title))
    else:
        print(title)
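For comparison, the reason the original code printed one character per line is that iterating over a plain string yields its characters:
for title in "Germain the wizard":
    print(title)   # prints "G", "e", "r", ... one character per line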
In my opinion that should never happen: an API should return predictable types, otherwise it gets messy on the users' side.
I'm writing tests for my Django app using the built-in testing tools. Right now I'm trying to write a test for a page that displays a list of a user's followers. When a user has no followers the page displays a message randomly picked from a list of strings. As an example:
NO_FOLLOWERS_MESSAGES = [
"You don't have any followers.",
"Sargent Dan, you ain't got no followers!"
]
So now I want to write a test that asserts that the response contains one of those strings. If I was only using one string, I could just use self.assertContains(request, "You don't have any followers.") but I'm stuck on how to write the test with multiple possible outcomes. Any help would be appreciated.
Try this:
if not any([x in response.content for x in NO_FOLLOWERS_MESSAGES]):
    raise AssertionError("Did not match any of the messages in the response")
About any(): https://docs.python.org/2/library/functions.html#any
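One caveat (not part of the original answer): under Python 3, response.content is bytes, so comparing str messages against it will raise a TypeError rather than match. A minimal adjustment, assuming a recent Django where response.charset is available:
content = response.content.decode(response.charset)
if not any(msg in content for msg in NO_FOLLOWERS_MESSAGES):
    raise AssertionError("Did not match any of the messages in the response")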
Would something like this work?
found_quip = [quip in response.content for quip in NO_FOLLOWERS_MESSAGES]
self.assertTrue(any(found_quip))
Internally, assertContains() uses the count from _assert_contains().
So if you want to preserve exactly the same behavior as assertContains(), and given that the implementation of _assert_contains() isn't a trivial one, you can take inspiration from the source code above and adapt it to your needs.
Our assertContainsAny(), inspired by assertContains():
def assertContainsAny(self, response, texts, status_code=200,
                      msg_prefix='', html=False):
    total_count = 0
    for text in texts:
        text_repr, real_count, msg_prefix = self._assert_contains(
            response, text, status_code, msg_prefix, html)
        total_count += real_count
    self.assertTrue(total_count != 0, "None of the text options were found in the response")
Use by passing the argument texts as a list, e.g.
self.assertContainsAny(response, NO_FOLLOWERS_MESSAGES)
I have two models like below:-
class Food(db.Model):
    foodname = db.StringProperty()
    cook = db.StringProperty()

class FoodReview(db.Model):
    thereview = db.StringProperty()
    reviews = db.ReferenceProperty(Food, collection_name='thefoodreviews')
I go ahead and create an entity:-
s=Food(foodname='apple',cook='Alice')
s.put()
When someone writes a review, the function which does the below comes into play:
theentitykey=db.Query(Food,keys_only=True).filter('foodname =','apple').get()
r=FoodReview()
r.reviews=theentitykey #this is the key of the entity retrieved above and stored as a ref property here
r.thereview='someones review' #someone writes a review
r.put()
Now the problem is how to retrieve these reviews. If I know the key of the entity, I can just do the below:-
theentityobject = db.get(food_key)  # but then the issue is how to know the key
for elem in theentityobject.thefoodreviews:
    print(elem.thereview)
else I can do something like this:-
theentityobj=db.Query(Food).filter('foodname =','apple').get()
and then iterate as above, but are the above two ways the correct ones?
If, to get the food, you're always doing db.Query(Food).filter('foodname =', 'apple'), then it looks like foodname is effectively your key...
Why not just use it as a key_name?
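For instance, a minimal sketch (using the same old google.appengine.ext.db API as your models) of creating the entity with foodname as its key_name and fetching it back without a query:
s = Food(key_name='apple', foodname='apple', cook='Alice')
s.put()

food = Food.get_by_key_name('apple')   # direct lookup, no query needed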
Then, you can even fetch the reviews without fetching the food itself:
key = db.Key.from_path('Food', 'apple')
reviews = FoodReview.all().filter("reviews =", key)
The second method looks exactly like what the App Engine tutorial advises.
It seems like the right thing to do if you want to find all reviews for a particular foodname.
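Putting it together, a minimal sketch of that second approach with your model names:
food = db.Query(Food).filter('foodname =', 'apple').get()
if food is not None:
    for review in food.thefoodreviews:
        print(review.thereview)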