Web scraping a page: notification on updated stock prices - Python

Question
I am interested in web scraping and data analysis, and I would like to develop my skills by writing a program in Python 2.7 that will monitor changes in stock prices. My goal is to compare two stocks (for the time being) at certain points throughout the day and save that info in a format easily handled by pandas (which I will learn to use after I get this front end working). In the end I would like to map relationship trends between chosen stocks (when one goes up by x, what effect does that have on the other). This is just a hobby project, so it doesn't matter if the code is production quality.
My Experience
I am a brand-new Python programmer (I have a very basic understanding of Python and no real experience with any modules outside the standard library), but I do have a technical background, so if the answer to my question requires reading and understanding documentation intended for intermediate-level programmers, that should be OK.
For the basics I am working my way through Learning Python: Powerful Object-Oriented Programming by Mark Lutz if this helps any.
What I'm Looking For
I recognize this is a very broad subject and I am not asking for anyone to write any actual code examples or anything. I just want some direction as to where to go to get information more specific to my interests and goals.
This is actually my first post on this forum so please forgive me if this doesn't follow best practices for posting. I did search for other questions like mine and read the posting tips docs prior to writing this.

So, you want to web-scrape? If you're using Python 2.7, you'll want to look into the urllib2, requests, and BeautifulSoup libraries. If you're using Python 3.x, look at urllib.request and, again, BeautifulSoup. Together, these libraries should accomplish everything you're looking to do in terms of web scraping.
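To make that concrete, here's a minimal sketch of the fetch-and-parse pattern (the URL and the CSS selector are placeholders I've made up; any real page needs its own selector):

    # Minimal fetch-and-parse sketch: requests downloads the page,
    # BeautifulSoup parses it. URL and 'span.price' selector are placeholders.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get('http://example.com/quotes')
    response.raise_for_status()  # fail loudly on HTTP errors

    soup = BeautifulSoup(response.text, 'html.parser')
    for tag in soup.select('span.price'):
        print(tag.get_text(strip=True))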
If you're interested in scraping stock data, might I suggest the yahoo_finance package? It's a Python wrapper for the Yahoo Finance API, and whenever I've worked with stock data in the past it was invaluable. There's also googlefinance. It's much easier to use these already-developed wrappers to extract stock info than to scrape hundreds (if not thousands) of web pages for the data you want.
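For instance, a quick sketch with yahoo_finance (the tickers are placeholders; note that get_price() returns a string, and this assumes the package can still reach Yahoo's endpoints):

    # Sketch comparing two stocks via the yahoo_finance wrapper
    # (pip install yahoo_finance). Tickers are placeholders.
    from yahoo_finance import Share

    apple = Share('AAPL')
    google = Share('GOOG')

    print('AAPL: %s' % apple.get_price())
    print('GOOG: %s' % google.get_price())

    # Later in the day, refresh() pulls fresh quotes for comparison.
    apple.refresh()
    print('AAPL now: %s' % apple.get_price())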

Related

Scraping model information from a program using Python

I'm attempting to pull physical property information (dimensions and resistance values, in particular) from an architectural (Autodesk - Revit) model and organize that information to be exported as specific variables.
To expand slightly, for an independent study I want to perform energy balances on Revit Models, starting simple and building from there. The goal is to write code that collects information from a Revit Model and then organizes it into variables such as "Total Wall Area", "Insulation Resistance", "Drywall depth", "Total Window Area", etc. that could be then sent to a model (or simply a spreadsheet) and stored as such.
I hope that makes some sense.
Given that I am a novice coder and would prefer to write in Python, does anyone have any advice or resources concerning an efficient (simple) path to go about importing and organizing specific parameters from a Revit model?
Is it necessary (or realistically necessary, given the humble extent of my knowledge) to use the API for this program (Revit) to accomplish this task?
I imagine this task is similar to web scraping, yet I have no HTML to call and search through, so I'm happily winging my way along, asking folks far more knowledgeable than I whether they have any insight.
A brief background: I have next to no knowledge of Revit or APIs in general, basic knowledge of coding in Python, and I really want to learn more!
Any help you are able to give is absolutely appreciated! I'm also happy to answer any questions that come up.
Thank you for reading and have a terrific day!
Great question - my +1 is definitely for Revit Python Shell (RPS).
Likewise, I had a basic understanding of Python and none of the Revit API, but with RPS I've coded multiple add-ins for our office (including rich user interfaces using WinForms) and have hit no limitations so far from coding in Python. It's true that there is some translating of C# API samples into Python - but the reward is in seeing a few paragraphs of code become a few lines...
The maker of RPS (Daren) is also really helpful, so no questions go unanswered.
Disclaimer: like you, I'm a novice programmer who has simply wanted to use the API to extend Revit. RPS for the win.
Indeed, the most used programming language for Revit is C# (.NET). If you decide to go with IronPython it should work, but there is less material...
Using C#, check out the My First Revit Plugin training. For your specific scenario, download the SDK and check the "Fire Rating" sample.
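For a flavour of what RPS code looks like, here is a minimal sketch (it assumes you run it inside Revit via RevitPythonShell, where __revit__ is predefined; the wall-area parameter is just an illustrative choice):

    # Minimal RevitPythonShell sketch: sum the area of all walls in the model.
    # __revit__ is provided by RPS; areas come back in Revit's internal units
    # (square feet).
    import clr
    clr.AddReference('RevitAPI')
    from Autodesk.Revit.DB import (FilteredElementCollector,
                                   BuiltInCategory, BuiltInParameter)

    doc = __revit__.ActiveUIDocument.Document

    walls = FilteredElementCollector(doc) \
        .OfCategory(BuiltInCategory.OST_Walls) \
        .WhereElementIsNotElementType()

    total_area = 0.0
    for wall in walls:
        param = wall.get_Parameter(BuiltInParameter.HOST_AREA_COMPUTED)
        if param:
            total_area += param.AsDouble()

    print('Total wall area: %.1f sq ft' % total_area)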

Programming to grab data from Bloomberg terminal

I'm pretty new to pulling Bloomberg data through programming. I'm wondering if there is any way I could use a programming language, like Python, to get a large amount of data from the Bloomberg terminal - say, thousands of bonds' rate-change dates during certain periods?
What you want is to use the API. (Screen scraping is not really an option - and why would you, when there is a pretty good API?)
Bloomberg makes it very easy to do this in Excel, and it sounds like this might be sufficient for your needs (i.e. they are localised to a specific problem). You need to install the Bloomberg API plugin. If you contact your Bloomberg representative or the helpdesk, they can help you do this.
If you are convinced that you need to do this programmatically, there are a number of versions of the Bloomberg API written in different languages. To find out more, go to WAPI on your terminal.
However, you should be aware that there are limits to how much data you can get through the API. Bloomberg is pretty hush-hush about this, but there is some information out there.
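For the programmatic route, a minimal sketch with Bloomberg's official Python library, blpapi, might look like this (it assumes a terminal running locally on the default localhost:8194 endpoint; the security and field names are placeholders):

    # Sketch using blpapi (downloadable via WAPI on the terminal). Assumes a
    # local terminal session on the default localhost:8194 endpoint.
    import blpapi

    options = blpapi.SessionOptions()
    options.setServerHost('localhost')
    options.setServerPort(8194)

    session = blpapi.Session(options)
    if not session.start():
        raise RuntimeError('Failed to start session')
    if not session.openService('//blp/refdata'):
        raise RuntimeError('Failed to open //blp/refdata')

    service = session.getService('//blp/refdata')
    request = service.createRequest('ReferenceDataRequest')
    request.getElement('securities').appendValue('IBM US Equity')  # placeholder
    request.getElement('fields').appendValue('PX_LAST')            # placeholder

    session.sendRequest(request)
    while True:
        event = session.nextEvent(500)  # timeout in milliseconds
        for msg in event:
            print(msg)
        if event.eventType() == blpapi.Event.RESPONSE:
            break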

How to search the internet with Python?

I want to write a program that searches through a fairly large website and extracts certain things. I've had a couple of online Python courses, but neither said anything about how to access the internet with Python. I have no idea where I ought to start with this.
First, read about urllib2 in the standard Python library.
Once you are comfortable with the basic ideas behind that lib, try requests, which makes it much easier to interact with the web, especially with APIs. I suggest using it alongside httpie to test queries quick and dirty from the command line.
If you go a little further and build a library or an engine to crawl the web, you will need some sort of asynchronous programming; I recommend starting with gevent.
Finally, if you want to create a crawler/bot, take a look at Scrapy. You should, however, start with the basic libraries before diving into this one, as it can get quite complex.
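To show the shape of it, here is a minimal Scrapy spider sketch (the start URL is a placeholder; you'd run it with the scrapy crawl command):

    # Minimal Scrapy spider sketch: yields every link found on the start page.
    # The start URL is a placeholder.
    import scrapy

    class LinkSpider(scrapy.Spider):
        name = 'links'
        start_urls = ['http://example.com/']

        def parse(self, response):
            for href in response.css('a::attr(href)').extract():
                yield {'link': href}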
It sounds like you want a web crawler/scraper. What sorts of things do you want to pull - images, links? That is just the job for a web crawler/scraper.
Start there; there should be lots of articles on Stack Overflow that will help you implement details such as connecting to the internet (getting a web response).
See this article.
There is much more on the internet than just websites, but I assume you just want to crawl some HTML pages and extract data from them. You have many, many options for solving that problem. Just some starting points (a small sketch of the first two follows the list):
urllib2 from the standard library
https://pypi.python.org/pypi/requests (much easier and more user friendly)
http://scrapy.org/ (a very good crawling framework)
http://www.crummy.com/software/BeautifulSoup/ (library to extract data from html)
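To show how little code the first two options need, a side-by-side sketch (the URL is a placeholder):

    # Fetching a page: stdlib urllib2 (Python 2) versus requests.
    # The URL is a placeholder.
    import urllib2
    import requests

    # Standard library:
    html_stdlib = urllib2.urlopen('http://example.com/').read()

    # requests (third-party, friendlier API):
    html_requests = requests.get('http://example.com/').text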

Parsing HTML in Python 3: re, html.parser, or something else?

I'm trying to get a list of Craigslist states and their associated URLs. Don't worry, I have no intention of spamming; if you're wondering what this is for, see the * below.
What I'm trying to extract begins on the line after 'us states' and is the next 50 <li> elements. I read through html.parser's docs and it seemed too low-level for this - more aimed at building a DOM parser or doing syntax highlighting/formatting in an IDE than at searching - which makes me think my best bet is the re module. I would like to keep myself to what's in the standard library, just for the sake of learning. I'm not asking for help writing a regular expression; I'll figure that out on my own. I'm just making sure there's not a better way to do this before spending the time on it.
*This is my first program beyond simple Python scripts. I'm making a C++ program to manage my posts and remind me when they've expired in case I want to repost them, and a Python script to download a list of all of the US states and cities/areas in order to populate a combobox in the GUI. I really don't need it, but I'm aiming to make this 'production ready'/feature complete, both as a learning exercise and to create a portfolio to possibly get a job. I don't know if I'll make the program publicly available; there's obvious potential for misuse, and it's probably against their ToS anyway.
There is xml.etree, an XML parser, available in the Python standard library itself. You should not use regex for parsing XML or HTML. Navigate to the particular node where the information lives and extract the links from there.
Use lxml.html. It's the best Python HTML parser, and it supports XPath!
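Since the asker wants to stay within the standard library, a minimal html.parser sketch might look like this (the <li>/<a> structure and the URL are assumptions; adapt them to the real page markup):

    # Stdlib-only sketch (Python 3): collect hrefs from <a> tags inside <li>s.
    # The URL and the markup structure are assumptions.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.in_li = False
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == 'li':
                self.in_li = True
            elif tag == 'a' and self.in_li:
                self.links.append(dict(attrs).get('href'))

        def handle_endtag(self, tag):
            if tag == 'li':
                self.in_li = False

    html_text = urlopen('http://example.com/').read().decode('utf-8')
    parser = LinkCollector()
    parser.feed(html_text)
    print(parser.links)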

Best way for a beginner to learn screen scraping with Python [closed]

Closed. This question is opinion-based. It is not currently accepting answers. Closed 1 year ago.
This might be one of those questions that are difficult to answer, but here goes:
I don't consider myself a programmer - but I would like to :-) I've learned R because I was sick and tired of SPSS, and because a friend introduced me to the language - so I am not a complete stranger to programming logic.
Now I would like to learn Python - primarily to do screen scraping and text analysis, but also for writing web apps with Pylons or Django.
So: how should I go about learning to screen scrape with Python? I started going through the Scrapy docs but I feel too much "magic" is going on - after all, I am trying to learn, not just do.
On the other hand, there is no reason to reinvent the wheel, and if Scrapy is to screen scraping what Django is to web pages, then it might after all be worth jumping straight into Scrapy. What do you think?
Oh - BTW, the kind of screen scraping: I want to scrape newspaper sites (i.e. fairly complex and big) for mentions of politicians etc. That means I will need to scrape daily, incrementally and recursively - and I need to log the results into a database of sorts - which leads me to a bonus question: everybody is talking about NoSQL DBs. Should I learn to use e.g. MongoDB right away (I don't think I need strong consistency), or is that foolish for what I want to do?
Thank you for any thoughts - and I apologize if this is too general to be considered a programming question.
I agree that the Scrapy docs give off that impression. But I believe, as I found for myself, that if you are patient with Scrapy, go through the tutorials first, and then bury yourself in the rest of the documentation, you will not only start to understand the different parts of Scrapy better, but you will appreciate why it does what it does the way it does it. It is a framework for writing spiders and screen scrapers in the true sense of a framework. You will still have to learn XPath, but I find that it is best to learn it regardless. After all, you do intend to scrape websites, and an understanding of what XPath is and how it works is only going to make things easier for you.
Once you have, for example, understood the concept of pipelines in Scrapy, you will be able to appreciate how easy it is to do all sorts of stuff with scraped items, including storing them in a database - a rough sketch of such a pipeline follows.
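As a rough illustration (the class name, table, and item fields here are hypothetical), a Scrapy item pipeline that persists scraped items to SQLite might look like:

    # Hypothetical Scrapy item pipeline: persists scraped items to SQLite.
    # Enable it via ITEM_PIPELINES in the project's settings.py.
    import sqlite3

    class SQLitePipeline(object):
        def open_spider(self, spider):
            self.conn = sqlite3.connect('items.db')
            self.conn.execute(
                'CREATE TABLE IF NOT EXISTS mentions (url TEXT, politician TEXT)')

        def close_spider(self, spider):
            self.conn.commit()
            self.conn.close()

        def process_item(self, item, spider):
            self.conn.execute('INSERT INTO mentions VALUES (?, ?)',
                              (item.get('url'), item.get('politician')))
            return item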
BeautifulSoup is a wonderful Python library that can be used to scrape websites. But, in contrast to Scrapy, it is not a framework by any means. For smaller projects, where you don't have to invest time in writing a proper spider or deal with scraping large amounts of data, you can get by with BeautifulSoup. For anything bigger, you will begin to appreciate the sort of things Scrapy provides.
It looks like Scrapy uses XPath for DOM traversal, which is a language in itself and may feel somewhat cryptic for a while. I think BeautifulSoup will give you a faster start. With lxml you'll have to invest more time learning, but it is generally considered (not only by me) a better alternative to BeautifulSoup.
For the database, I would suggest you start with SQLite and use it until you hit a wall and need something more scalable (which may never happen, depending on how far you want to go with this), at which point you'll know what kind of storage you need. MongoDB is definitely overkill at this point, but getting comfortable with SQL is a very useful skill.
Here is a five-line example I gave some time ago to illustrate how BeautifulSoup can be used.
Which is the best programming language to write a web bot?
I really like BeautifulSoup. I'm fairly new to Python but found it fairly easy to start screen scraping. I wrote a brief tutorial on screen scraping with Beautiful Soup. I hope it helps.
As for the database part of the question: use the right tool for the job. Figure out what you want to do, how you want to organize your data, what kind of access you need, etc. THEN decide if a no-sql solution works for your project.
I think no-sql solutions are here to stay for a variety of different applications. We've implemented them, without calling them no-sql, inside SQL databases on various projects I've worked on over the last 20 years, so the use cases exist. It's worth at least getting some background on what they offer and which products are working well to date.
Design your project well, keep the persistence layer separate, and you should be able to change your database solution with only minor heartache if you decide that's necessary.
I recommend starting lower-level while learning - Scrapy is a high-level framework.
Read a good Python book like Dive Into Python, then look at lxml for parsing HTML.
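For instance, a minimal lxml sketch (the URL is a placeholder) that pulls every link from a page with an XPath query:

    # Minimal lxml.html sketch: parse a page and extract links with XPath.
    # The URL is a placeholder.
    import lxml.html

    root = lxml.html.parse('http://example.com/').getroot()
    for href in root.xpath('//a/@href'):
        print(href)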
Before diving into Scrapy, take Udacity's Introduction to Computer Science: https://www.udacity.com/course/cs101
That's a great way to familiarize yourself with Python, and you will actually learn Scrapy a lot faster once you have some basic knowledge of Python.
