Building comprehensive scraping program/database for real estate websites - python

I have a project I’m exploring where I want to scrape the real estate broker websites in my country (30-40 websites of listings) and keep the information about each property in a database.
I have experimented a bit with scraping in python using both BeautifulSoup and Scrapy.
What I would ideally like to achieve is a daily-updated database that picks up new properties and removes properties once they are sold.
Any pointers as to how to achieve this?
I am relatively new to programming and open to learning different languages and resources if python isn’t suitable.
Sorry if this forum isn’t intended for this kind of vague question :-)

Build a scraper and schedule it to run daily. Scrapy works well for this; each daily run can update the database, adding new listings and flagging the ones that have disappeared (likely sold).
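The database side of that daily run can be sketched with a small upsert routine. Everything below is illustrative: the table name, fields, and the "not seen today means sold" rule are assumptions, and SQLite stands in for whatever database you pick.

```python
import sqlite3

# Hypothetical schema: one row per listing, keyed by (site, listing_id).
# "last_seen" lets us detect listings that vanished (likely sold).
def init_db(conn):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS listings (
            site TEXT, listing_id TEXT, price INTEGER,
            last_seen TEXT,
            PRIMARY KEY (site, listing_id)
        )""")

def upsert_listings(conn, site, scraped, today):
    # scraped: list of (listing_id, price) tuples from today's crawl
    for listing_id, price in scraped:
        conn.execute("""
            INSERT INTO listings (site, listing_id, price, last_seen)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(site, listing_id)
            DO UPDATE SET price = excluded.price,
                          last_seen = excluded.last_seen
        """, (site, listing_id, price, today))
    conn.commit()

def mark_sold(conn, site, today):
    # Anything not seen in today's crawl is assumed sold/removed.
    cur = conn.execute(
        "DELETE FROM listings WHERE site = ? AND last_seen < ?",
        (site, today))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
init_db(conn)
upsert_listings(conn, "brokerA", [("p1", 100000), ("p2", 250000)], "2024-01-01")
upsert_listings(conn, "brokerA", [("p2", 240000)], "2024-01-02")
removed = mark_sold(conn, "brokerA", "2024-01-02")
```

Your Scrapy pipeline would call `upsert_listings` as items come in, and a cron job (or Windows Task Scheduler entry) would kick off the crawl and then `mark_sold` once a day.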

Related

Scan webpage for text changes and content

New to programming. If this is too basic I do apologize.
As a more complex project to hone my budding skills, I am trying to build a price scanner with a reporting feature: it waits for new text on a webpage (such as "sale", "discount", etc.), reads the price, and sends that information back to me.
Can you help me get pointed in the right direction?
I think you need to do some research on scraping first.
Some courses and books you may find useful for going deeper:
Data Scraping and Data Mining from Beginner to Pro with Python
Python Web Scraping Cookbook
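To make the idea concrete, the "watch for a keyword, then read the price" step might look like the sketch below. The keyword list, function name, and the `$xx.xx` price pattern are all assumptions; you would fetch the page yourself (e.g. with Requests) and pass the HTML in.

```python
import re

KEYWORDS = ("sale", "discount", "clearance")  # words we watch for

def check_page(html):
    """Return the first price found if any watch keyword appears, else None.

    The price pattern is an assumption: something like "$19.99".
    """
    text = html.lower()
    if not any(word in text for word in KEYWORDS):
        return None
    match = re.search(r"\$\s*(\d+(?:\.\d{2})?)", html)
    return float(match.group(1)) if match else None

# Usage sketch (URL is a placeholder):
# html = requests.get("https://example.com/product").text
# price = check_page(html)
# if price is not None:
#     send_email(f"Sale spotted at ${price}")   # your reporting feature
```

Run it on a schedule and compare against the last result you saw; only alert when something new appears.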

Automated webscraper for specific words

Let's say I want to make a website that automatically scrapes specific websites in order to find, e.g., the bike model that my customer has typed.
Customer: Wants to find one specific bike model that is really hard to get.
Customer: Finds the website www.EXAMPLE.com, which will notify him when there is an auction on e.g. eBay or Amazon.
Customer: Creates a free account and makes a post.
Website: Runs an automated scrape and keeps looking for this bike on eBay and Amazon.
Website: As soon as the scrape succeeds and finds the bike, the website sends a notification to the customer.
Is it possible to make that in Python? And will I be able to build such a website with little experience, after learning a bit of Python?
Yes, it is possible. You can achieve it with a package such as Requests for the scraping and Flask to build the website, though it does require a bit of knowledge.
Feel free to post a question after diving into those two libraries.
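The core "keep looking for this bike" step boils down to matching scraped listing titles against the customer's saved query. Here is a minimal sketch of just that matching logic; the function names are my own, and the actual scraping (Requests) and the website around it (Flask) are left out.

```python
def matches_query(title, query):
    # Hypothetical matcher: every word of the customer's query must
    # appear somewhere in the listing title (case-insensitive).
    title_words = title.lower().split()
    return all(word.lower() in title_words for word in query.split())

def find_matches(listings, query):
    # listings: titles scraped from e.g. an auction site's results page
    return [title for title in listings if matches_query(title, query)]

# Usage sketch: run this on a schedule for each saved customer query,
# and notify the customer whenever find_matches returns anything new.
```

Real-world matching gets fuzzier than this (typos, model-number variants), but exact word matching is a reasonable first version.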

Web scraping CNN data

I have a question: does CNN permit you to scrape data if it's for your own personal use? For instance, if I wanted to write a quick program that would scrape the price of a certain stock, can I scrape CNN Money?
I've just started learning python so I apologize if this is a stupid question.
Obligatory I am not a lawyer.
In CNN's terms of use page it states that:
"You may not modify, publish, transmit, participate in the transfer or sale, create derivative works, or in any way exploit, any of the content, in whole or in part."
"You may download copyrighted material for your personal use only."
So it looks like if you do it for personal use only and don't share any of the results of the work you would be fine.
However, some sites block scrapers automatically if they issue too many requests, so be sure to rate-limit your scraping, and don't request too many pages.

Scrapy's System Performance Requirement?

I am planning to create a website for price comparison of service provided by several companies.
The main idea is that a visitor enters search criteria on the website and starts a search, and the crawl results are displayed instantly on the website rather than written to a file.
I am new to Python and Scrapy, and not sure whether Scrapy can do this.
It would run daily, possibly many times a day, crawling 30+ websites. I am afraid the searches may overload the server. Can shared web hosting support such crawling? Are there any system performance requirements?

RSS scraping system

I am relatively new to Python (only about two months of learning, mostly by myself) and loving it. I have been trying to design a program that will scrape text RSS feeds from the National Weather Service, but I have no idea where to start. I want something that will scan for severe weather (tornado watches, warnings, etc.) and send alerts to my email. I have already scripted a simple email alert system that can even text my phone. I was wondering if any of you could point me in the right direction on how to build an RSS scraper and incorporate it with the email program to make a functional weather alert system. I am a huge weather nerd, if you can't tell, and this will end up being my senior-year project and something to hopefully impress my meteorology professors next year. I would appreciate any help.
Thanks,
Andrew :D
Don't reinvent the wheel; just use feedparser. It handles all the corner cases and crazy markup better than you ever will.
You will need an RSS feed parser. Once you have parsed the feeds, you will have all the relevant information you need. Take a look at feedparser: http://code.google.com/p/feedparser/
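Once feedparser has turned the feed into entries, the severe-weather filter is a few lines. A sketch of that filtering step, assuming entries shaped like what `feedparser.parse(url).entries` returns (dict-style access to a "title" field); the keyword list is my own guess at what to watch for:

```python
SEVERE_KEYWORDS = ("tornado watch", "tornado warning",
                   "severe thunderstorm", "flash flood")

def severe_entries(entries):
    """Filter parsed feed entries down to severe-weather alerts.

    Each entry is assumed to support dict-style access with a "title"
    key, as feedparser entries do.
    """
    return [entry for entry in entries
            if any(k in entry["title"].lower() for k in SEVERE_KEYWORDS)]

# Usage sketch (feed URL is a placeholder):
# feed = feedparser.parse("https://example.gov/alerts.rss")
# for alert in severe_entries(feed.entries):
#     send_alert_email(alert["title"])   # your existing email script
```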
You can also use Scrapy; it is one of the best crawling tools around.
You can use it to scrape any web content. It's worth learning.
http://doc.scrapy.org/en/0.14/index.html
