Extract the main article text from a Wikipedia page using Python [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I've been searching for hours on how to extract the main text of a Wikipedia article, without all the links and references. I've tried wikitools, mwlib, BeautifulSoup and more. But I haven't really managed to.
Is there an easy, fast way to take the plain text (the actual article) and put it in a Python variable?
SOLUTION: Omid Raha solved it :)

You can use the wikipedia package, which is a Python wrapper for the Wikipedia API.
Here is a quick start.
First install it:
pip install wikipedia
Example:
import wikipedia
p = wikipedia.page("Python programming language")
print(p.url)
print(p.title)
content = p.content # Content of page.
Output:
http://en.wikipedia.org/wiki/Python_(programming_language)
Python (programming language)
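
If you would rather not install a package, a rough alternative is to call the MediaWiki API yourself and ask for a plain-text extract. This is only a sketch using the standard library: the helper names (build_extract_url, extract_text, fetch_article_text) and the User-Agent string are made up for illustration, and it assumes the default JSON response layout in which pages are keyed by page ID.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title):
    """Build a query URL asking for the plain-text extract of one page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,  # plain text instead of HTML
        "redirects": 1,    # follow redirects to the canonical page
        "format": "json",
        "titles": title,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_text(response):
    """Pull the extract out of a decoded API response (a dict).

    Assumes the default layout: response["query"]["pages"][<page id>]["extract"].
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None


def fetch_article_text(title):
    """Download and return the article text; needs network access."""
    request = urllib.request.Request(
        build_extract_url(title),
        headers={"User-Agent": "article-text-example/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

Calling fetch_article_text("Python (programming language)") should then give you the article body as a single string, with no links or reference markup.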

Related

Is it possible to generate an RSS feed for a website that doesn't provide one? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I want to know if it's possible to use Python or any other way to generate an RSS feed for a website, if the site does not provide RSS feeds.
Are there any examples?

Yes. If I were building something like that, I would design it like this:
Write a Flask server to handle requests.
On every request, download the page from the target website and parse it with bs4.
Transform the extracted data into XML output following the RSS format.
It takes a bit more than a short snippet, but it's nothing very hard.
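
To give an idea of the last step, here is a minimal sketch of turning already-scraped entries into RSS 2.0 XML. It deliberately leaves out the Flask and bs4 parts and uses only the standard library; the function name build_rss and the item dictionaries with "title" and "link" keys are assumptions for illustration, not part of any library.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_rss(channel_title, channel_link, channel_description, items):
    """Return an RSS 2.0 document (as a string) for the given entries.

    `items` is a list of dicts with "title" and "link" keys, for example
    the result of parsing the target page.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    # RSS expects RFC 822 style dates; formatdate produces that format.
    ET.SubElement(channel, "lastBuildDate").text = formatdate(usegmt=True)
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Example feed",
    "https://example.com/",
    "Headlines scraped from example.com",
    [{"title": "First post", "link": "https://example.com/1"}],
)
```

In a Flask view you would return this string with an XML content type such as application/rss+xml.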

Simple and performant way to save public list of IPs into python list [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
What's a simple and performant way to save published online lists of IP addresses, like this one, into a standard Python list? Example:
ip_list = ['109.70.100.20','185.165.168.229','51.79.86.174']
An HTML parsing library like BeautifulSoup seems way too sophisticated for such a simple structure.

It's not that BeautifulSoup is too sophisticated; it's that the content type is plain text, not HTML. There are several libraries for downloading content, and requests is a popular one. If you use its text property, it performs any decoding and decompression needed.
import requests
resp = requests.get("https://www.dan.me.uk/torlist/")
ip_list = resp.text.split()
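
If the list might contain comment lines or other junk, you could filter the tokens with the standard ipaddress module. A small sketch, where parse_ip_list is a hypothetical helper name and the sample text is made up:

```python
import ipaddress


def parse_ip_list(text):
    """Return the valid IP addresses found in whitespace-separated text."""
    ip_list = []
    for token in text.split():
        try:
            ipaddress.ip_address(token)  # raises ValueError if not an IP
        except ValueError:
            continue  # skip comments, headers, or malformed entries
        ip_list.append(token)
    return ip_list


sample = "109.70.100.20\n# comment\n185.165.168.229\n51.79.86.174\n"
ip_list = parse_ip_list(sample)
```

With the requests example above, you would pass resp.text instead of the sample string.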

How to search for links in a given page with Bash, or Python or any other popular scripts [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
Given an HTTP/HTTPS page, I would like to search for links on that page. Does anyone know how to achieve this with Bash, Python, or another popular scripting language?

Try this in Python. It prints every <a> tag that has an href attribute:
import requests
from bs4 import BeautifulSoup as soup
# 'Your link' is a placeholder for the URL of the page to search.
print(soup(requests.get('Your link').content, 'html.parser').find_all('a', href=True))

You should use Beautiful Soup. It's an HTML parser library for Python. You look for <a> tags and grab the value of each tag's href attribute.
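
The same idea of looking for <a> tags can be sketched without any third-party library, using html.parser from the standard library. This is a swapped-in technique rather than Beautiful Soup itself, and the names LinkCollector and find_links are made up for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


page = '<p><a href="https://example.com/a">A</a> <a name="x">no link</a></p>'
links = find_links(page)
```

For a live page you would first download the HTML (for example with requests) and pass the response text to find_links.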

Discussion comments from Coursera [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I would like to do a simple machine learning project where I want to analyse comments from discussion forums on Coursera courses.
However, I am not sure whether it is possible to do so programmatically: that is, given a course page address, user name, and password, retrieve all the discussion forum comments.
Being able to use Python for this would be awesome but I am language agnostic.

You can access web pages with Python using urllib:
https://docs.python.org/2/library/urllib.html
or the higher-level interface, requests:
http://requests.readthedocs.org/en/latest/
You then still have to parse the content of the page and extract the comments.

Django website that uses web crawling [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I wrote a web crawler in Python, and I want to build a Django website to display the crawl results. Can I do that? For example, the crawler visits "https://stackoverflow.com/questions", takes the questions from the first page, and displays them on the website (perhaps in a Django template).

EDIT: I just re-read your question and it looks like you're already doing the scraping? You should implement that functionality into a Django site and then yes of course you can display it.
Yes, you can do it. A great combination for web scraping with Django is the Python requests library (which I'm guessing you're using) and Beautiful Soup. I've used this combination in a few different projects and I've been happy with it.
Another popular option is the Python Scrapy library, which can be combined with Django through an app called django-dynamic-scraper.
Hopefully this points you in the right direction.
