I currently have a small task involving crawling data from an internal website, but I still don't know where to start.
We have an internal lab-booking website; you first need to enter a username and password for access.
On the booking page, after filtering, I get a list of booking information for lab A over 7 days. That means 7 separate tables whose columns are 0, 15, 30, 45 (minutes) and whose rows are 7:00, 8:00, ..., 18:00 (hours). When you click on a cell, a new window appears with text boxes containing information about the lab and its status (Free/Reserved). If the status is "Reserved", it also shows who booked the slot and until when. If the status is "Free", it shows a form for you to fill in your own booking information, but I guess we won't care much about that.
My goal is that after crawling the data, I'll have a CSV file whose columns are days and whose rows are times, where each cell says who booked that slot and until when, for reserved slots. A cell can be null if the slot is free.
This is our company's shared internal booking website, but our site has its own lab-usage rules, so I need to check whether anyone is violating them, and the first step is to collect the data automatically.
I have written crawlers for a few websites in Python, but those sites didn't have this format, so I'm a bit lost.
If you are trying to automate this process, I would suggest Selenium[1]: https://selenium-python.readthedocs.io/
Or if it is just crawling, you can go for packages like urllib (urllib2 on Python 2) or Requests in combination with Beautiful Soup.
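To make that concrete, here is a minimal sketch of the Requests + Beautiful Soup route for the lab-booking case above. Everything site-specific is an assumption: the host, the login endpoint and form field names, the per-slot URL, and the input names in the pop-up all need to be read out of the real pages (the browser's Network tab shows exactly which request each cell click makes).

```python
import csv

import requests
from bs4 import BeautifulSoup

BASE = "http://booking.example.com"  # hypothetical internal host

session = requests.Session()
# Hypothetical login endpoint and field names; copy the real ones from
# the login <form> in the page source.
session.post(f"{BASE}/login", data={"username": "me", "password": "secret"})

hours = range(7, 19)                      # rows: 7:00 .. 18:00
minutes = (0, 15, 30, 45)                 # columns: minutes past the hour
days = [f"day{i}" for i in range(1, 8)]   # placeholder keys for the 7 days

table = {}  # (time, day) -> "who until when", or "" if free
for day in days:
    for hour in hours:
        for minute in minutes:
            # The "new window" for a cell is usually just a GET with the
            # slot in the query string; confirm the real URL pattern in
            # the Network tab when you click a cell.
            detail = session.get(
                f"{BASE}/slot",
                params={"lab": "A", "day": day, "time": f"{hour}:{minute:02d}"},
            )
            soup = BeautifulSoup(detail.text, "html.parser")
            status = soup.find("input", {"name": "status"})  # hypothetical name
            if status and status.get("value") == "Reserved":
                who = soup.find("input", {"name": "bookedBy"})["value"]
                until = soup.find("input", {"name": "until"})["value"]
                table[(f"{hour}:{minute:02d}", day)] = f"{who} until {until}"
            else:
                table[(f"{hour}:{minute:02d}", day)] = ""

# Write the times-by-days grid the question describes.
with open("bookings.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time"] + days)
    for hour in hours:
        for minute in minutes:
            t = f"{hour}:{minute:02d}"
            writer.writerow([t] + [table[(t, day)] for day in days])
```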
Related
I would like to create an excel sheet listing the user review scores of random games, along with the number of reviews overall.
E.g.:
Name                 Review Score   # of Reviews
Random Game          78%            230
Another Random Game  96%            3021
I could have a website give me a game and log the information manually, but if possible, I would like to write some code to grab that data and populate it into a file so that I can quickly accumulate a few hundred or thousand entries.
I've done a bit of googling, and I'm not quite sure where to start. What would be the best method for pulling data from Steam?
You can either call their API on your own, or use the steamreviews library.
steamreviews: https://pypi.org/project/steamreviews/
Manually calling the API: https://partner.steamgames.com/doc/store/getreviews
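As a sketch of the do-it-yourself route: the documentation linked above describes a GET endpoint at store.steampowered.com/appreviews/<appid>?json=1 whose query_summary block already contains the score description and review counts. The appid below (271590, Grand Theft Auto V, matching the example in the next answer) and the num_per_page=0 trick for fetching only the summary are worth double-checking against a live response.

```python
import csv

import requests

def review_summary(appid):
    # One GET per game; query_summary holds the aggregate numbers we want.
    resp = requests.get(
        f"https://store.steampowered.com/appreviews/{appid}",
        params={"json": 1, "num_per_page": 0},  # summary only, no review bodies
    )
    return resp.json()["query_summary"]

with open("reviews.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Review Score", "# of Reviews"])
    for name, appid in [("Grand Theft Auto V", 271590)]:  # extend with your own list
        s = review_summary(appid)
        positive_pct = round(100 * s["total_positive"] / s["total_reviews"])
        writer.writerow([name, f"{positive_pct}%", s["total_reviews"]])
```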
Edit: it seems there is a method to get reviews from the API after all, as shown in the other answer.
From what I could see of the API, there is no specific endpoint for review information. However, you could scrape a game's store page and get the review information, as it is available there.
As an example, Grand Theft Auto 5's store page shows the following.
You could then download the webpage using Requests and process the HTML using BeautifulSoup.
You can use the same method to find the exact number of positive/negative reviews, as that is also available at the bottom of the page, just above the review section, as a filter.
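A rough sketch of that route, with the caveat that the CSS class below is taken from the store page's current markup and is not a stable API, so re-check it in the browser inspector. The birthtime cookie is a common workaround for the age gate on mature-rated pages such as GTA V's.

```python
import requests
from bs4 import BeautifulSoup

url = "https://store.steampowered.com/app/271590/"  # Grand Theft Auto V
# The birthtime cookie skips the age-verification interstitial.
page = requests.get(url, cookies={"birthtime": "0"})
soup = BeautifulSoup(page.text, "html.parser")

# Each review summary row carries a tooltip that reads like:
# "86% of the 1,000,000 user reviews for this game are positive."
for row in soup.select(".user_reviews_summary_row"):
    print(row.get("data-tooltip-html"))
```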
I'm scraping Ali Express products such as this one using Python. It has multiple variations, each with its own price. When one is clicked on, the price is updated to reflect this choice.
In a similar fashion, there are multiple buttons to choose where you want the item to be shipped from, which updates the shipping cost accordingly.
I want to scrape each variation's price as shipped from each country. How can I do that without simulating clicks to change the prices so that I can scrape them? Where is the underlying logic that governs these price changes laid out? I couldn't find it when inspecting elements. Is it easily decipherable?
Or do I just need to give up and simulate clicks? If so, would that be done with Selenium? The reason I would prefer to extract it without clicking is that, for products such as the one I linked to, for example, there are 49 variations and 5 places from which the product is shipped so it would be a lot of clicking and a rather inelegant approach.
Thanks a lot!
Take a look in the browser; all the data is in the DOM.
Type window.runParams.data.skuModule.skuPriceList in your console and you will see it.
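Building on that: the product page embeds the same object in a <script> tag as window.runParams = {...}, so you can often pull it with plain Requests and a regex instead of a browser. The URL is a placeholder, the regex is fragile against markup changes, and the field names inside skuPriceList are read from an inspected response, so treat all of them as assumptions.

```python
import json
import re

import requests

url = "https://www.aliexpress.com/item/1005001234567890.html"  # placeholder
html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text

# Non-greedy match up to the first "};" -- fragile if the object nests a
# "};" sequence. json.loads also assumes the embedded object is strict
# JSON; if it isn't, a JS-aware parser is needed instead.
match = re.search(r"window\.runParams\s*=\s*(\{.*?\});", html, re.DOTALL)
if match:
    data = json.loads(match.group(1))["data"]
    for sku in data["skuModule"]["skuPriceList"]:
        # skuAttr encodes the variation; skuVal holds its price info.
        print(sku["skuAttr"], sku["skuVal"]["skuAmount"]["value"])
```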
I know that e-commerce companies apply this kind of logic in their backend APIs, and to protect those APIs from ordinary users they use tools such as Consul to resolve the IPs received from the front end.
Now, coming to your question, there can be two cases.
First case: the front end receives the data from the backend and applies its own logic. In that case the front end has already received all the data about the variants and their prices, is storing it in some data structure on its side, and only updates the values in the view when you click an item. (You can tell this is the case if the result is shown instantly after clicking, with no delay.) You can inspect the response fetched from the backend; it is bound to contain all the data the front end receives and stores. Check Chrome dev tools -> Network (filter for gql, for example).
Second case: the page fetches data from the backend each time you click. In that case it changes some parameters in the request URL. If you can work out the logic behind how the parameters change between variants, you may be able to fetch the information directly. (There will be a delay in showing results after clicking.)
I think it's a good idea to use Selenium or Cypress. I know it will take time, but it's the best option you've got.
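If you do end up simulating clicks, the loop usually looks something like this in Selenium. The CSS selectors for the ship-from buttons, variation buttons, and price element are placeholders; AliExpress's real class names change often, so read them out of the inspector first.

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.aliexpress.com/item/1005001234567890.html")  # placeholder

# All three selectors are hypothetical -- copy the real ones from the page.
ship_from = driver.find_elements(By.CSS_SELECTOR, ".ship-from-item")
variations = driver.find_elements(By.CSS_SELECTOR, ".sku-property-item")

for origin in ship_from:
    origin.click()
    for variation in variations:
        variation.click()
        time.sleep(1)  # crude wait for the price to re-render; WebDriverWait is cleaner
        price = driver.find_element(By.CSS_SELECTOR, ".product-price-value")
        print(origin.text, variation.text, price.text)

driver.quit()
```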
Is there any way to get the current gas price from EthGasStation for use in Python? I want to regularly check the average transaction confirmation time by gas price before sending a transaction to the blockchain.
I did a quick search and found that I can use web scraping with Python to retrieve the data I want from the website, but I don't know how to get data out of figures, because the confirmation time is presented as figures on the site. Is there any other website that gives the transaction confirmation time by gas price as raw data, so I can use it in my Python application?
Thanks in advance
I am not sure if this is what you're looking for, but here are two possible solutions I have found.
There is a free API that allows 5 requests per second: https://etherscan.io/apis#gastracker
You can use Selenium to load the following page, input your data, and scrape the details accordingly. https://ethgasstation.info/calculatorTxV.php
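A minimal sketch against the Etherscan gas-tracker API from the first option. The gasoracle and gasestimate actions and their result fields follow Etherscan's documentation, but double-check them there, and substitute your own free API key for YOUR_API_KEY.

```python
import requests

API = "https://api.etherscan.io/api"
KEY = "YOUR_API_KEY"  # free key from etherscan.io

# Current safe/proposed/fast gas prices, in gwei.
oracle = requests.get(
    API, params={"module": "gastracker", "action": "gasoracle", "apikey": KEY}
).json()["result"]
print(oracle["SafeGasPrice"], oracle["ProposeGasPrice"], oracle["FastGasPrice"])

# Estimated confirmation time (in seconds) for a given gas price in wei --
# the "confirmation time by gas price" figure the question asks about.
estimate = requests.get(
    API,
    params={
        "module": "gastracker",
        "action": "gasestimate",
        "gasprice": "2000000000",  # 2 gwei, expressed in wei
        "apikey": KEY,
    },
).json()["result"]
print(estimate, "seconds")
```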
I'm looking into web-scraping/crawling possibilities and have been reading up on Scrapy. I was wondering if anyone knows whether it's possible to put instructions into the script so that, once it has visited the URL, it can choose pre-selected dates from a calendar on the website?
The end result is for this to be used for price comparisons on sites such as Trivago. I'm hoping I can get the program to select certain criteria, such as dates, once on the website, just like a human would.
Thanks,
Alex
In theory, for a website like Trivago, you can use the URL to set the dates you want to query, but you will need to research user agents and proxies because otherwise your IP will get blacklisted very fast.
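A sketch of that URL-based idea with Requests. The endpoint and parameter names are hypothetical; capture a real search URL from your browser and copy its query string, and treat the proxy entry as optional scaffolding for when volume grows.

```python
import requests

params = {
    "checkin": "2024-06-01",   # hypothetical parameter names; copy the
    "checkout": "2024-06-03",  # real ones from a captured search URL
}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
proxies = {"https": "http://user:pass@proxyhost:8080"}  # optional, hypothetical

resp = requests.get(
    "https://www.trivago.com/search",  # hypothetical endpoint
    params=params,
    headers=headers,
    proxies=proxies,
)
print(resp.status_code, len(resp.text))
```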
So I want to make a program that scrapes info from the Lifesaving Society website: https://www.lifesavingsociety.com/find-a-member.aspx
I would be coding this in Python, by the way. Basically, I want the program to take in a list of lifesaving IDs and return info about when each person's certifications expire. The problem is that, to scrape this data, I feel the program would need to individually enter each lifesaver's ID and then scrape the result. That feels like something that would take a long time, and I thought maybe there was a better way to do it. Any ideas?
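For what it's worth, the one-request-per-ID loop described above is usually the workable approach, and it can be scripted along these lines. This is a sketch under assumptions: .aspx pages typically require echoing back hidden form fields such as __VIEWSTATE, and the name of the ID input ("memberId" below) is hypothetical, so read the real form field names out of the page source first.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.lifesavingsociety.com/find-a-member.aspx"
session = requests.Session()

def lookup(member_id):
    # Fetch the form, echo back every hidden field the ASP.NET page
    # expects (__VIEWSTATE and friends), then post the ID.
    page = session.get(URL)
    soup = BeautifulSoup(page.text, "html.parser")
    data = {
        inp["name"]: inp.get("value", "")
        for inp in soup.select("input[type=hidden]")
    }
    data["memberId"] = member_id  # hypothetical field name
    result = session.post(URL, data=data)
    return BeautifulSoup(result.text, "html.parser")

for member_id in ["123456", "234567"]:  # your list of IDs
    soup = lookup(member_id)
    # Parse the certification expiry dates out of the result markup here.
```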