How to have python code get data from a website then update it on another website? - python

I want to have some Python code use Selenium or bs4 to check a website hourly for updates, then get that information and put it on a website, formatted however I want. What's the best way to connect these two together?

I would probably set up a cron job (a scheduled task on Windows) that runs your scraping script hourly. The logic in your script will depend on the website you're entering data into, but broadly speaking I would scrape the data, process it, then either send a POST request to the target website or update its data source (a db, a file, whatever) directly.
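The scrape, process, POST flow described above might look roughly like the sketch below. It uses only the standard library so it runs without extra packages; bs4 or Selenium could replace the parsing step. The `<h2>` target markup, the JSON payload shape, and the function names are all assumptions made for illustration, not something either website is known to use.

```python
import json
import urllib.request
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Collects the text of every <h2> element (hypothetical target markup)."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) still belongs to the open <h2>.
        if self._in_h2:
            self.headings[-1] += data


def scrape(html):
    """Step 1: pull the interesting bits out of the source page."""
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headings if h.strip()]


def process(headings):
    """Step 2: reshape the data into whatever the target site expects (assumed shape)."""
    return {"count": len(headings), "items": headings}


def fetch(url):
    """Download the source page as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def publish(url, payload):
    """Step 3: send the processed data to the target site as a JSON POST."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

A script run by the scheduler would then call something like `publish(target_url, process(scrape(fetch(source_url))))`, where both URLs are placeholders. On Linux, a crontab entry beginning with `0 * * * *` runs a command at the top of every hour.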

Related

Can I add my script/APIs to my Django project? If so, how can I do this?

I am building a Django web app, and I want to let users search for their desired cryptocurrency and get its price back. I plan on getting the price from Coinbase or some other site that already presents this information. How would I go about this? I figure I would have to write the script that gets the price in views.py. What would be the best approach? Can I add a web-scraping script that already does this to Django, or would I have to connect, say, Coinbase's API to my Django project? If so, how do I do this?
If you're looking at using an API from a service to get these prices, then Requests is something you can look at.
If you're looking at scraping the data from a page, then you'll probably want to look at BeautifulSoup, or Scrapy, or, one step further, Selenium.
As for where you call it, that's up to you. If it's data that you're always going to need, you could run your script as a task or worker so you always have an up-to-date price. Otherwise you could trigger the script on demand and wait for the response to come back. There are drawbacks to both approaches, and I'm guessing that if the site doesn't provide a managed API endpoint for the info you need, it will probably block your requests if you make too many of them.
But that's a starter for 10.
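One way to soften the "too many requests" problem with the on-demand approach is to cache each price for a short time, so repeated searches for the same coin do not hit the upstream site every time. The sketch below is only an illustration: `CachedPrice`, its parameters, and the injected `fetch` callable are hypothetical names, and the actual fetching (an API call or a scrape) is left to whatever function you pass in.

```python
import time


class CachedPrice:
    """Wraps a price-fetching function and reuses results for a limited time."""

    def __init__(self, fetch, max_age_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch            # callable: symbol -> price (your API call or scraper)
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the behaviour can be tested
        self._cache = {}               # symbol -> (fetched_at, price)

    def get(self, symbol):
        now = self._clock()
        cached = self._cache.get(symbol)
        # Serve the stored value while it is still fresh enough.
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]
        # Otherwise go upstream once and remember the answer.
        price = self._fetch(symbol)
        self._cache[symbol] = (now, price)
        return price
```

A view in views.py could keep one module-level `CachedPrice` instance and call `get(symbol)` for each search. Note that this in-process cache is per worker process; Django's own cache framework is the more usual choice once several workers are involved.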

Can a Python + R file share a webdriver session between the languages?

I am working on a scraper built in RSelenium. A number of tasks are more easily accomplished using Python, so I've set up a .Rmd file with access to R and Python code chunks.
The R-side of the scraper opens a website in Chrome, logs in, and accesses and scrapes various pages behind the login wall. (This is being done with permission of the website owners, who would rather users scrape the data ourselves than put together a downloadable.)
I also need to download files from these pages, a task which I keep trying in RSelenium but repeatedly come back to Python solutions.
I don't want to take the time to rewrite the code in Python, as it's fairly robust, but my attempts to use Python result in opening a new driver, which starts a new session no longer logged in. Is there a way to have Python code chunks access an existing driver / session being driven by RSelenium?
(I will open a separate question with my RSelenium download issues if this solution doesn't pan out.)
As far as I can tell, and with help from user Jortega, Selenium does not support interaction with already open browsers, and Python cannot access an existing session created via R.
My solution has been to rewrite the scraper using Python.

How can I use a session ID in Python for web-scraping data?

I want to scrape a website where I have to log in first. The problem is that there is also "robot protection" (I have to verify that I am not a robot, plus a reCAPTCHA), and my success rate at passing the captcha is about 30%, which is far too low for me.
Is there another possibility: log in with my browser (for example Chrome or Firefox) and then use that session ID in my Python script to scrape the data automatically?
Put more simply: I want to scrape tables from a website, so I have to log in first. A 30% success rate is not good enough, so I hope there is another way: log in manually, then use that session in Python.
After logging in, there is a text box on the page where I enter what I want to search for; it then navigates to the page where I'll find the table and data.
Any ideas? Is this possible?
(Right now I only have a script for which I have to download the HTML of the data page myself and then change some names in the code manually. It is a big waste of time, and I hope I can automate it more.) - Python 2.7

Display Data In Real Time With Django

I have a simulator application that continuously spits out data, formatted in JSON, to a given host name and port number (UDP). I would like to be able to point the simulator output to a Django web application so that I can monitor/process the data as it comes in.
How do I receive and process data in real time using Django? What tools or packages are available to accomplish this? I did come across this answer: How to serve data from UDP stream over HTTP in Python?, but I don't completely understand.
Ex: Similar to this page: http://money.cnn.com/data/markets/
ALSO, I don't need to store any of the streaming data in a database. I just need to perform lookups based on the streaming data. Maybe it's not a Django issue at all?
Use JavaScript.
Create a web page with all the results, then use JavaScript to collect the data from that page and update the display every X seconds.
Have the web page serve the JSON data, and have the JavaScript grab and interpret it.
See: Get HTML code using JavaScript with a URL
Then update the page using JavaScript. W3Schools has great JS tutorials.
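For the server side of that idea, something has to receive the UDP datagrams and hold the most recent one so that a page can serve it as JSON. The following is a minimal sketch under assumptions: `LatestMessage` and `listen` are hypothetical names, port 9999 is a placeholder, and each datagram is assumed to carry one complete JSON document.

```python
import json
import socket
import threading


class LatestMessage:
    """Thread-safe holder for the most recent valid JSON datagram."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def update_from_datagram(self, datagram):
        """Parse one datagram; keep the previous value if it is not valid JSON."""
        try:
            value = json.loads(datagram.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return False
        with self._lock:
            self._value = value
        return True

    def as_json(self):
        """Return the latest value as a JSON string ("null" before any data arrives)."""
        with self._lock:
            return json.dumps(self._value)


def listen(store, host="0.0.0.0", port=9999):
    """Blocking receive loop; intended to run in a background thread."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            datagram, _addr = sock.recvfrom(65535)
            store.update_from_datagram(datagram)
```

The loop could be started with `threading.Thread(target=listen, args=(store,), daemon=True).start()`, and a Django view could return `store.as_json()` with a JSON content type for the JavaScript to poll. Since nothing is stored in a database, only the latest message is kept in memory, and with several worker processes each would need its own listener or a shared store.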

Refresh Webpage using python

I have a web page showing some data. I have a Python script that continuously updates the data (it fetches the data from a database and writes it to the HTML page). It takes about 5 minutes for the script to fetch the data. I have the HTML page set to refresh every 60 seconds using the meta tag. However, I want to change this and have the page refresh as soon as the Python script updates it, so basically I need to add some code to my Python script that refreshes the HTML page as soon as it's done writing to it.
Is this possible?
Without diving into complex modern things like WebSockets, there's no way for the server to 'push' a notice to a web browser. What you can do, however, is make the client check for updates in a way that is not visible to the user.
It will involve writing JavaScript and writing an extra file. When writing your main web page, add, inside the JavaScript, a timestamp (a Unix timestamp will be easiest here). Also write that same timestamp to a file on the web server (let's call it updatetime.txt). Using an AJAX request on the page, you pull in updatetime.txt and check whether the number in the file is bigger than the number stored when the document was generated; if it is, refresh the page. You can alter how 'instantly' the changes get noticed by controlling how often you poll.
I won't go into too much detail on writing the code, but I'd probably just use $.ajax() from jQuery (even though it's sort of overkill for one function) to make the calls. The trick to putting something on a timer in JS is setInterval. You should be able to find plenty of existing documentation on using both of them.
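On the Python side, the script only has to stamp the page and write the same number to updatetime.txt each time it finishes. A minimal sketch, assuming the page is written as index.html in the web server's document directory and that the page's JavaScript reads a variable named `generatedAt` (both are hypothetical names chosen for this example):

```python
import os
import time


def publish_page(directory, body_html, now=None):
    """Write index.html and updatetime.txt carrying the same Unix timestamp."""
    stamp = int(time.time() if now is None else now)
    page = (
        "<!DOCTYPE html>\n<html><head>\n"
        # The polling script compares this value with updatetime.txt.
        f"<script>var generatedAt = {stamp};</script>\n"
        "</head><body>\n"
        f"{body_html}\n"
        "</body></html>\n"
    )
    # Write the page first so a client never sees a new timestamp with an old page.
    _write_atomically(os.path.join(directory, "index.html"), page)
    _write_atomically(os.path.join(directory, "updatetime.txt"), str(stamp))
    return stamp


def _write_atomically(path, text):
    """Write to a temporary file, then swap it in, so readers never see a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(tmp, path)
```

The browser-side polling (the AJAX call on a setInterval timer) still has to be added to the page; it would reload whenever the number in updatetime.txt is larger than `generatedAt`.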
