I am going to make a simple website that displays data from a script I made with Beautiful Soup. I already have working Python code that scrapes the data I need.
What I do not know how to do is drop this Python code into a website that scrapes data on a daily basis and displays it nicely. I am scraping daily stock prices and I want my site to store each day's price in a database and plot everything to date on a simple line graph and an accompanying table.
What keywords do I use in researching this? I've started to look into Django vs Flask. Am I on the right path?
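Either framework can sit on top of a very small storage layer. Here is a minimal sketch of the database half, assuming SQLite; the table name and schema are my own choices, not anything Django or Flask requires:

```python
import sqlite3

def init_db(conn):
    # One row per day; the primary key keeps re-runs from duplicating a day.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (day TEXT PRIMARY KEY, price REAL)"
    )

def save_price(conn, day, price):
    # INSERT OR REPLACE lets the daily job be re-run safely.
    conn.execute("INSERT OR REPLACE INTO prices VALUES (?, ?)", (day, price))
    conn.commit()

def load_prices(conn):
    # Everything to date, ordered for the line graph and the table.
    return conn.execute("SELECT day, price FROM prices ORDER BY day").fetchall()
```

Your daily scraper calls `save_price(...)` once per run; the site's view calls `load_prices(...)` and hands the rows to whatever charting approach you pick (Chart.js on the page, matplotlib server-side, etc.).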
I want some Python code that uses Selenium or bs4 to check a website hourly for updates, then take that information and put it on a website of mine, formatted however I want. I would like to know the best way to connect these two together.
I would probably set up a cron job (a scheduled task on Windows) to run your script hourly to scrape the website you're getting data from. The logic in your script will depend on the website you're entering data into, but broadly speaking I would scrape the data, process it, and then either send a POST request to that website or update its data source (a database, a file, whatever) directly.
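That flow can be sketched in a few lines. The URLs and the JSON field names below are placeholders for whatever your two sites actually use:

```python
import json
import urllib.request

def fetch(url):
    """Download the raw HTML of the page being watched."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()

def build_payload(title, body):
    """Shape the scraped fields into JSON; the field names are hypothetical."""
    return json.dumps({"title": title, "body": body}).encode()

def post_update(url, payload):
    """Send the processed data to your own site's endpoint."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# A typical hourly run would look like:
#   html = fetch("https://example.com/watched-page")
#   ... parse `html` with Beautiful Soup or Selenium here ...
#   post_update("https://example.com/api/updates",
#               build_payload("headline", "details"))
```

Cron (or Windows Task Scheduler) then just runs this file once an hour.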
I'm trying to get a list of all NBA games and the referees from each game.
This website (https://official.nba.com/referee-assignments/) has the referee information, but you have to click the "Date" button in the top-right and can only look at one date at a time. The data goes all the way from 12-02-2015 through today.
I'm a complete newbie to web scraping. I've put together some Python code (using Selenium) that scrapes the information I want, but I can only figure out how to do it for one day. Is there any way to automate it so it scrapes the info for every date from 12-02-2015 through today?
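The single-day script can be wrapped in a loop over dates. A sketch of the loop itself; the query parameter in the commented lines is an assumption, so check what URL the site's Date button actually produces:

```python
from datetime import date, timedelta

def daily_dates(start, end):
    """Yield every date from start through end, inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)

# In the Selenium script, scrape one day per iteration, e.g.:
# for d in daily_dates(date(2015, 12, 2), date.today()):
#     driver.get(f"https://official.nba.com/referee-assignments/?date={d:%Y-%m-%d}")
#     ... your existing one-day scraping code ...
```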
Right now I'm trying to scrape the dividend yield from a chart using the following code.
import pandas as pd

df = pd.read_html('https://www.macrotrends.net/stocks/charts/BMO/Bank-of-Montreal/dividend-yield-history')
df = df[0].dropna()
But the code won't pick up the chart's data.
Any suggestions on pulling it from the website?
Here is the specific link I'm trying to use: https://www.macrotrends.net/stocks/charts/BMO/Bank-of-Montreal/dividend-yield-history
I've used the same code to pick up the book values, but the objects the site uses for the dividends and the book values must be different.
Maybe I could use Beautiful Soup?
Sadly, that website is rendered dynamically, so there's nothing in the HTML pandas receives for it to scrape: the chart's data is fetched after the page loads, so it simply isn't in the initial response.
You can either find an API that provides the data (best, and quite possible given the content), work out where the page fetches its data from and see if you can hit that source directly (better, if possible), or use something like Selenium to control a real browser, render the page, grab the resulting HTML, and then scrape that.
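The Selenium route looks roughly like this. It's a sketch that assumes Chrome and chromedriver are installed; once the browser has rendered the page, `pd.read_html` works on the resulting HTML just as it would on a static page:

```python
import io

import pandas as pd

def pick_table(tables, column):
    """Return the first scraped table that contains the given column name."""
    return next(t for t in tables if column in t.columns)

# Rendered-browser version (run this part with selenium installed):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://www.macrotrends.net/stocks/charts/BMO/"
#              "Bank-of-Montreal/dividend-yield-history")
#   tables = pd.read_html(io.StringIO(driver.page_source))
#   driver.quit()
#   # The column name below is a guess -- print `tables` to see what came back.
#   print(pick_table(tables, "Dividend Yield"))
```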
I'm building tools for a business I'm starting, and one of the tools I need is a live reader of scrap prices to put into my website, which will also help me work out overhead and profit. How would I take that information and then put it into a live data table?
Very amateur programmer, but better to do it myself.
You can use Flask and Beautiful Soup. Beautiful Soup extracts the information from the site's HTML, and Flask lets you save that data and serve it from your own site.
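A minimal sketch of how the two fit together; the source URL and the `scrap-price` element id are placeholders for whatever the real price page uses:

```python
import urllib.request

from bs4 import BeautifulSoup
from flask import Flask, render_template_string

app = Flask(__name__)
latest = {"price": None}  # in-memory cache; swap for a database later

def parse_price(html):
    """Pull the price out of the page; the element id is hypothetical."""
    tag = BeautifulSoup(html, "html.parser").find(id="scrap-price")
    return tag.get_text(strip=True) if tag else None

def refresh():
    """Re-scrape the source site (call this on a schedule, not per request)."""
    with urllib.request.urlopen("https://example.com/scrap-prices") as resp:
        latest["price"] = parse_price(resp.read().decode())

@app.route("/")
def index():
    return render_template_string(
        "<p>Latest scrap price: {{ p }}</p>", p=latest["price"]
    )
```

Run `refresh()` from a scheduler (cron, APScheduler) rather than inside the view, so visitors never wait on the scrape.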
I'm a beginner to Django and am currently building a site that displays stock prices. To do that, I need to download (or update) the stock prices once a day. I know that I can retrieve stock prices using Pandas. However, I would like my site to do it once every day at a specific time, instead of retrieving data every time a visitor hits the view. I'm a bit stuck here and did a lot of Google searches. Can someone please point me to a link that I can read up on?
EDIT: I'm currently making this site on my own computer so I haven't uploaded my files yet.
If you are using a Linux box (like Debian[0]), and have cron[1] up and running:
Create a shell script that calls a program you will write to fetch the data using Pandas.
Use crontab -e to edit your crontab file and add your script, set to execute at whatever time you need.
[0] https://www.debian.org/
[1] http://linux.die.net/man/1/crontab
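Concretely, the script cron runs could look like this; the crontab entry, file path, and CSV format are all my own assumptions:

```python
#!/usr/bin/env python3
# fetch_prices.py -- run once a day by cron.
# Example crontab entry (add it with `crontab -e`; runs at 18:00 daily):
#   0 18 * * * /usr/bin/python3 /home/you/fetch_prices.py >> /home/you/fetch.log 2>&1
import csv
from datetime import date

def append_row(path, day, price):
    """Append one day's closing price to a CSV file the site reads from."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([day, price])

if __name__ == "__main__":
    # Replace this placeholder with your actual Pandas-based retrieval.
    price = 123.45  # placeholder value, not real data
    append_row("prices.csv", date.today().isoformat(), price)
```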