Collect "controversial" posts of a subbredit - python

I am trying to collect all posts from the "controversial" listing of a subreddit.
I tried the Reddit API, but there is a limit on how many posts you can collect.
Using the Pushshift API I can retrieve a large number of posts, but I can't find a way to collect from the controversial listing.
Is there a way to collect controversial posts?

PRAW gives a way to get controversial listings, and you can specify the time period as well.
From the docs, this method can be used like:
reddit.domain("imgur.com").controversial("week")
reddit.multireddit("samuraisam", "programming").controversial("day")
reddit.redditor("spez").controversial("month")
reddit.redditor("spez").comments.controversial("year")
reddit.redditor("spez").submissions.controversial("all")
reddit.subreddit("all").controversial("hour")
I am not familiar with the Pushshift API, but hopefully this helps.
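For example, a minimal sketch along these lines (the credentials and subreddit name are placeholders you would fill in):
import praw

# Placeholder credentials for a registered script app.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="controversial-collector",
)

# Iterate the controversial listing for a subreddit over a chosen time period.
# Note that listings are still capped at roughly 1000 items by Reddit itself.
for submission in reddit.subreddit("AskReddit").controversial(time_filter="year", limit=None):
    print(submission.id, submission.title)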

Related

How to remove my own Instagram followers with Python?

I want to remove my own Instagram followers, without blocking them, using Python.
I have seen many Instagram Python libraries online that allow you to start or stop following a person, but that is not what I'm looking for; I don't want to change who I am following, I want to remove people who are following me.
I looked into the official documentation of Instagram's HTTP API, trying to build my own solution, but I couldn't find this action documented under any endpoint (I assume it should be under /friends/).
I vaguely remember some library that used to do this, but I cannot find it. Does anyone know of a good way to achieve this, preferably by passing an inclusion/exclusion list of the followers I want to keep?
I found a solution in an old library that does something similar. You can't directly remove followers through most tools, but if you block and then unblock a user, the effect you want is achieved. Example code:
# https://instagram-private-api.readthedocs.io/en/latest/_modules/instagram_private_api/endpoints/friendships.html
from instagram_private_api import Client

# The Client class already exposes the friendship endpoints.
api = Client("your_username", "your_password")

uid = "123456789"  # user id of the follower to remove
api.friendships_block(uid)
api.friendships_unblock(uid)
Here is the API endpoint for removing a follower: https://www.instagram.com/web/friendships/{user_id}/remove_follower/
You can make a POST request to this URL with the appropriate headers, and that will do the job.
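As a rough sketch of that request with the requests library (the header and cookie values below are assumptions; in practice they come from an authenticated browser session):
import requests

user_id = "123456789"  # hypothetical numeric user id
url = f"https://www.instagram.com/web/friendships/{user_id}/remove_follower/"

# These header/cookie names are assumptions based on how Instagram's web client authenticates.
headers = {
    "x-csrftoken": "YOUR_CSRF_TOKEN",
    "referer": "https://www.instagram.com/",
}
cookies = {
    "sessionid": "YOUR_SESSION_ID",
    "csrftoken": "YOUR_CSRF_TOKEN",
}

response = requests.post(url, headers=headers, cookies=cookies)
print(response.status_code, response.text)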

Search every subreddit by keyword with Praw

I'm having trouble understanding whether this is possible in the PRAW API: I'd like to get a list of all posts that have comments mentioning a keyword, say "python". It seems like the search function is always called from a specific subreddit, as in
for submission in reddit.subreddit("all").search("python", sort="comments", limit=None):
    print(submission.title)
But won't this only return posts that have made it to r/all? How can I search all subreddits, without brute force searching one subreddit at a time?
Searching /r/all will search all subreddits. (Or maybe it's all subreddits that have opted into /r/all)
"Made it to /r/all" includes all posts (at least from subreddits that opted into /r/all, which is most of them). The posts might appear in different listings, such as /hot and /new, or they might not be accessible through any listings due to the 1000-item limit, even though theoretically they are still part of the listing, just further down. Regardless, they will all be searchable this way.

How to get all Wikipedia articles from category and sub categories using Python? [duplicate]

I want to get all the articles names under a category and its sub-categories.
Options I'm aware of:
Using the Wikipedia API. Does it have such an option?
Downloading the dump. Which format would be better for my usage?
There is also an option to search Wikipedia with something like incategory:"music", but I didn't see an option to view that as XML.
Please share your thoughts.
The following resource will help you download all pages from the category and all its subcategories:
http://en.wikipedia.org/wiki/Wikipedia:CatScan
There is also an API available here:
https://www.mediawiki.org/wiki/API:Categorymembers
You can do this through the following two API calls:
For the article pages in the category:
YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Music
For the subcategories:
YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtype=subcat&cmtitle=Category:Music
You can find more information in the MediaWiki API documentation.
Note that Wikipedia's categorization system is not a tree, or even an acyclic graph. It is quite possible that by continually following subcategory links you will eventually wind up back where you started.
If you are going to be making many such queries, you would be best served by downloading a database dump. If this will be an infrequent thing and will only be dealing with small categories, you could probably get away with making repeated queries to list=categorymembers.
incategory:"music" does not appear to do subcategory searching.

Get a JSON tree of all comments of a post?

I'm looking to backup a subreddit to disk. So far, it doesn't seem to be easily possible with the way that the Reddit API works. My best bet at getting a single JSON tree with all comments (and nested comments) would seem to be storing them inside of a database and doing a pretty ridiculous recursive query to generate the JSON.
Is there a Reddit API method which will give me a tree containing all comments on a given post in the expected order?
The number of comments you get from the API has a hard limit, for performance reasons; to ensure you're getting all comments, you have to parse through the child nodes and make additional calls as necessary.
Be aware that the subreddit listing will only include the latest 1000 posts, so if your target subreddit has more than that, you probably won't be able to obtain a full backup anyway.
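With PRAW, for example, replace_more() performs those additional calls for you; a minimal sketch (credentials and the post id are placeholders) that turns a post's comments into a nested structure:
import json
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="comment-backup",
)

def comment_to_dict(comment):
    # Recursively convert a comment and its replies into a plain dict.
    return {
        "id": comment.id,
        "author": str(comment.author),
        "body": comment.body,
        "replies": [comment_to_dict(reply) for reply in comment.replies],
    }

submission = reddit.submission(id="abc123")  # placeholder post id
submission.comments.replace_more(limit=None)  # resolve every "load more comments" stub
tree = [comment_to_dict(top_level) for top_level in submission.comments]
print(json.dumps(tree, indent=2))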

Capturing information for customers such as referral URL and conversion

I was hoping to create my own in-house analytics so I can tell my customers how many visits their company page got on my site and which URL they came from. I am coding this in Python (Flask), and I wondered if anyone could tell me the standard, or sensible, approach to this problem.
I think the approach might be to have some sort of Redis queue that is triggered when a visitor arrives, with the information added to the database later so the site doesn't feel slow.
The standard, and sensible, approach is to use Google Analytics. If you must roll your own, you have two options: JavaScript that is executed on every page (like GA) and pulls this kind of info into a DB, or parsing the log files on the server. Awstats is a good bet for the latter.
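If you do roll your own, a minimal sketch of the Redis-queue idea from the question (assuming a local Redis instance and a hypothetical company-page route) might look like this:
import json
import time

import redis
from flask import Flask, request

app = Flask(__name__)
queue = redis.Redis()  # assumes a Redis server on localhost

@app.route("/company/<company_id>")
def company_page(company_id):
    # Push the visit onto a Redis list; a separate worker can later drain it
    # into the database, so the request itself stays fast.
    queue.rpush("page_visits", json.dumps({
        "company": company_id,
        "referrer": request.referrer,
        "timestamp": time.time(),
    }))
    return f"Company page {company_id}"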
