How to scrape data from website with calendar - python

I'm trying to get a list of all NBA games and the referees from each game.
This website (https://official.nba.com/referee-assignments/) has the referee information, but you have to click the "Date" button in the top-right and can only look at one date at a time. The data goes all the way from 12-02-2015 through today.
I'm a complete newbie to web scraping. I've put together some python code (using selenium) so far that will scrape the information I want, but I can only figure out how to do it for 1 day. Is there any way to automate it so it can scrape the info for all dates from 12-02-2015 through today?
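One possible approach, sketched under assumptions: generate every date from 12-02-2015 (read here as December 2, 2015) through today, and hand each date to the single-day scraping code you already have. `scrape_one_day` below is a hypothetical placeholder for that code. How the date actually gets entered on the page (date picker, form field, or URL parameter) is not covered and would need to be worked out from the site itself.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Yield every date from start through end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def scrape_all(scrape_one_day, start=date(2015, 12, 2), end=None):
    """Call scrape_one_day(day) for each date and collect the results by date.

    scrape_one_day is a hypothetical callable standing in for the existing
    Selenium code that handles a single date.
    """
    end = end or date.today()
    results = {}
    for day in date_range(start, end):
        results[day] = scrape_one_day(day)
    return results
```

If the page expects the date as text, `day.strftime("%m-%d-%Y")` produces the `12-02-2015` style used above.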

Related

How to scrape data from pop-ups (I need to scrape data that is only visible once I click the pop-up, which is not a link)

I'm an absolute beginner in Python.
I need to scrape data from this website, which is a directory of professors.
Some of the data (names, schools, etc.) is visible without clicking.
However, I need to scrape the email and department info as well.
I've been searching the internet all day and I still don't know how to do it.
Could anyone please help?
If you check the network activity, you'll see that the data is loaded dynamically from a Google spreadsheet. You can retrieve the spreadsheet directly without scraping the page.

Converting Python Script to Web Tool

I am going to make a simple website that displays data from a script I made with Beautiful Soup. I already have working Python code that scrapes the data I need.
What I do not know how to do is drop this Python code into a website that scrapes data on a daily basis and displays it nicely. I am scraping daily stock prices and I want my site to store each day's price in a database and plot everything to date on a simple line graph and an accompanying table.
What keywords do I use in researching this? I've started to look into Django vs Flask. Am I on the right path?

Is it possible to write a Python web scraper that plays an mp3 whenever an element's text changes?

I'm trying to figure out how to make Python play mp3s whenever a tag's text changes on an online fantasy draft board (ClickyDraft).
I know how to scrape elements from a website with Python and Beautiful Soup, and how to play mp3s. But how can I have it detect when a certain element changes so it can play the appropriate mp3?
I was thinking of having the program scrape the site every 0.5 seconds to detect the changes,
but I read that this could cause problems. Is there any way of doing this?
The only way is to scrape the site on a regular basis. Every 0.5 seconds is too fast. I don't know how time-sensitive this project is, but scraping every 1, 5, or 10 minutes is probably good enough. If you need it quicker, you could use a proxy (there are plenty of free ones out there) and scrape the site more often.
Just try to respect the site: don't consume too much of its resources by requesting every 0.5 seconds.
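A minimal sketch of that polling idea, assuming you already have a function that fetches the element's current text and another that plays the mp3. Both are passed in as callables here (`fetch_text` and `on_change` are hypothetical names), so the loop itself does not depend on any particular scraping or audio library. The 60-second default interval is only an illustration.

```python
import time


def watch(fetch_text, on_change, interval=60.0, max_polls=None, sleep=time.sleep):
    """Poll fetch_text() and call on_change(old, new) whenever the text differs.

    fetch_text and on_change are hypothetical callables supplied by the caller.
    max_polls=None means poll forever; sleep is injectable for testing.
    """
    previous = fetch_text()
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        current = fetch_text()
        if current != previous:
            on_change(previous, current)
            previous = current
        polls += 1
```

In use, `fetch_text` would wrap the existing Beautiful Soup code and `on_change` would choose and play the appropriate mp3.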

Crawl data from an internal room booking website

Currently I have a small task to crawl data from an internal website, but I still don't know where to start.
It is an internal lab-booking website; you first need to enter a username and password to access it.
On the booking page, let's say that after filtering I get a list of booking information for lab A over 7 days. That means 7 separate tables whose columns are 0, 15, 30, 45 (minutes) and whose rows are 7:00, 8:00, ..., 18:00 (hours). When you click a cell, a new window appears with text boxes containing information about the lab and its status (Free/Reserved). If the status is "Reserved", it shows who booked the slot and until when. If the status is "Free", it shows a form for filling in your booking information, but I guess we don't care much about this.
My goal is that, after crawling the data, I'll have a CSV file whose columns are days and whose rows are times, where each cell says who booked the reserved time slot and until when. A cell can be null if that time slot is free.
This is our company's shared internal booking website, but our site has its own lab rule, so I need to check whether anyone is violating the lab booking rule, starting by collecting the data automatically.
I have written crawlers for some websites in Python, but those sites didn't have this format, so I'm a bit lost.
If you are trying to automate this process, I would suggest Selenium: https://selenium-python.readthedocs.io/
Or, if it is just crawling, you can use packages like urllib (urllib2 in Python 2) or Requests in combination with Beautiful Soup.
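Whichever tool does the crawling, the output step described in the question (days as columns, times as rows, empty cells for free slots) could look roughly like this. It is only a sketch: `bookings_to_csv` and the shape of the `bookings` dictionary are assumptions made for illustration, and the crawling itself is not shown.

```python
import csv
import io


def bookings_to_csv(bookings, days, times):
    """Build CSV text with one column per day and one row per time slot.

    bookings is assumed to map (day, time) -> description of the reservation;
    slots missing from the mapping (free slots) become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["time"] + list(days))
    for slot in times:
        writer.writerow([slot] + [bookings.get((day, slot), "") for day in days])
    return buffer.getvalue()
```

The returned text can be written to a file, or the same writer can be pointed at a file opened with `newline=""`.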

Scrape a website with interactive buttons

I am totally new to scraping websites.
I am trying to download the tables from https://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7.html
To use the website, you select a year from the button and press "Go"; a table for the selected year is then displayed, and I want to save that table.
I guess there should be a way to simulate a human selecting the year: for example, automatically select 1900, press "Go", and then loop 100 times to record the tables from 1900 to 2000. But I don't know how to simulate this human action.
I know how to download the table once it is displayed, but I don't know how to get the table to display.
Thanks!
https://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7_1950.html
https://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7_2030.html
As you can see, the only thing that changes is the year. So to scrape the table for a given year, request "https://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7_" + TheYearIWant + ".html"
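A short sketch of building those URLs in a loop. The URL pattern comes from the two examples above; which years actually have a page is an assumption to verify against the site, which is why `step` is left as a parameter. `table_urls` is a hypothetical helper name.

```python
BASE = "https://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7_{year}.html"


def table_urls(first_year, last_year, step=1):
    """Return one URL per year from first_year through last_year, inclusive."""
    return [BASE.format(year=year) for year in range(first_year, last_year + 1, step)]
```

Each URL can then be passed to the code that already downloads a table once it is displayed.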
