Scraping data from a website with a search box - python

First of all, I want to apologize if my question is too broad or generic, but it would really save me a lot of wasted time to get an answer that guides me in the right direction for the work I want to do. With that out of the way, here it goes.
I am trying to retrieve some publicly available data from a website, to create a dataset to work with for a Data Science project. My big issue is that the website does not have a friendly way to download it, and, from what I gathered, it also has no API. So, getting the data requires skills that I do not possess. I would love to learn how to scrape the website (the languages I am most comfortable with are Python and R), and it would add some value to my project if I did it, but I also am somewhat pressured by time constraints, and don't know if it is even possible to scrape the website, much less to learn how to do it in a few days.
The website is this one: https://www.rnec.pt/pt_PT/pesquisa-de-estudos-clinicos. It has a search box, and the only option I need to configure is to click the banner that says "Pesquisa Avançada" and then check the box that says "Menor de 18 anos". I then click the "Pesquisar" button in the lower right, and the results that show up are the ones that I want to extract (either that or, if it's simpler, all the results, without checking the "Menor de 18 anos" box). On my computer, 2 results show up per page, and there are 38 pages in total. Each result has some of its details on the page where the results appear but, to get the full data for each entry, one has to click "Detalhes" in the lower right of each result, which opens a display with all the data from that result. If possible, I would love to download all the data from that "Detalhes" page for each result (the data there already contains the fields that show up in the search results page).
Honestly, I am ready to spend a whole day manually transcribing all the data, but it would be much better to do it computationally, even if it takes me two or three days to learn and do it.
I think that, for someone with experience in web scraping, it is probably super simple to check whether it is possible to download the data I described, and what the best way to go about it is (in general terms; I would research and learn it). But I really am lost when it comes to this, and just want to kindly ask for some help showing me the way to go about it (even if the answer is "it is too complicated/impossible, just do it manually"). I know that there are some Python packages for web scraping, like BeautifulSoup and Selenium, but I don't really know if either of them would be appropriate.
I am sorry if my request is not exactly a short and simple coding question, but I have to try to gather any help or guidance I can get. Thank you in advance to everyone who reads my question and a special thank you if you are able to give me some pointers.
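
For reference, a search form like this one, where results only appear after clicking through the page, is the kind of job Selenium handles well (BeautifulSoup alone only parses HTML you already have, so it cannot click "Pesquisar" for you). It is also worth opening the browser's Network tab first: if the search fires a request that returns JSON, plain requests would be much simpler. Below is a rough sketch of the Selenium route; every selector in it is a hypothetical placeholder that would need to be replaced with the real ones found via the developer tools.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Every selector below is a placeholder; inspect the real page (F12 -> Elements)
# to find the actual link texts, XPaths and CSS classes before running this.
driver = webdriver.Chrome()
driver.get("https://www.rnec.pt/pt_PT/pesquisa-de-estudos-clinicos")
wait = WebDriverWait(driver, 15)

# Open the advanced search, tick the checkbox and launch the search.
wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "Pesquisa Avançada"))).click()
driver.find_element(By.XPATH, "//label[contains(., 'Menor de 18 anos')]").click()
driver.find_element(By.XPATH, "//button[contains(., 'Pesquisar')]").click()

records = []
while True:
    # Loop over the results on the current page; re-locate the links on every
    # iteration because navigating away makes previously found elements stale.
    n_results = len(wait.until(
        EC.presence_of_all_elements_located((By.LINK_TEXT, "Detalhes"))))
    for i in range(n_results):
        driver.find_elements(By.LINK_TEXT, "Detalhes")[i].click()
        detail = wait.until(EC.visibility_of_element_located(
            (By.CSS_SELECTOR, ".detalhe-estudo")))  # placeholder selector
        records.append(detail.text)                 # raw text of the detail view
        driver.back()                               # or close the pop-up, if it is one

    # Move to the next page until there is no "next" control left (placeholder).
    next_page = driver.find_elements(By.CSS_SELECTOR, "a.pagination-next")
    if not next_page:
        break
    next_page[0].click()

driver.quit()
print(f"Collected {len(records)} entries")

The general pattern - run the search, loop over pages, open each "Detalhes" view, copy the text, move on - stays the same whatever the exact selectors turn out to be.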

Related

Is it possible to write a program that periodically checks if a pinned comment is present under a YouTube video? (using Python?)

I'm not asking for someone to write out specific code for me, but rather just to point me in the general direction.
Sometimes, after writing and pinning a comment under a YouTube video, it disappears!
It may happen hours later or only a few minutes later, seemingly at random. Then it reappears hours later; I'm not sure why.
Since I am currently learning programming (Python), I might as well try to tackle the problem myself. I want to make a program that regularly checks if the pinned comment is still present, and if not, sends me an alert in whatever way, so I can go write and pin a new comment for the time being (and maybe learn a few things in the process).
So far I've learned some basic stuff like how to get the source code of a webpage, output it to somewhere like a txt file, and search it with regex. However, when I check the source code of a YouTube video, the comment text isn't there. Where do I begin learning about the things needed to make this program work?
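
As a pointer for the research: the comments are loaded by JavaScript after the initial HTML arrives, which is why they never show up in the source code you downloaded. The usual route is the YouTube Data API (commentThreads endpoint) rather than scraping the page. Below is a rough sketch of the kind of periodic check this could become, assuming you create an API key in the Google Cloud console; the API has no explicit "pinned" flag, so this simply checks whether your comment still appears near the top when ordered by relevance (pinned comments normally do). The video ID, key and comment snippet are placeholders.

import time
import requests

API_KEY = "YOUR_API_KEY"                       # placeholder
VIDEO_ID = "YOUR_VIDEO_ID"                     # placeholder
MY_COMMENT_SNIPPET = "distinctive phrase from the pinned comment"  # placeholder

def comment_is_visible():
    # Ask the YouTube Data API for the top comment threads on the video.
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/commentThreads",
        params={
            "part": "snippet",
            "videoId": VIDEO_ID,
            "order": "relevance",              # pinned comments usually surface first
            "maxResults": 20,
            "key": API_KEY,
        },
        timeout=30,
    )
    resp.raise_for_status()
    for item in resp.json().get("items", []):
        text = item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        if MY_COMMENT_SNIPPET in text:
            return True
    return False

while True:
    if not comment_is_visible():
        # Replace this print with an email, Telegram message, desktop
        # notification, or whatever alert works for you.
        print("Pinned comment seems to be gone - go check the video!")
    time.sleep(15 * 60)                        # check every 15 minutes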

How to update a grid on my website using the Twitter API in real time

I was recently inspired by r/place and wanted to do a much smaller version for a community of mine. The large-scale idea is to use the Twitter API to take user input (via responses to a tweet of mine) and turn it into a real-time updated canvas. I've sorted out how to use the Twitter API such that I can pick out tweets with valid colors and coordinates for the grid, but I don't have a way of converting the output of my program into some viewable content online.
My current idea is to update my website every time new valid coordinates and colors are received, and just put a link to my website in the tweet, so that after someone responds to my tweet with the color and coordinates they've picked, they can click the link and be taken to the updated canvas.
The problem is... I don't know how to do ANY of that! I am willing to learn, but I know very little about web development and don't know if any of this is plausible. Is this a good solution? Are there better ones? What are the first steps in converting my program's outputs into viewable (and live-updated) content on the web?
I understand that this problem is kind of vague, but it's mostly because I don't want to rule out viable solutions. Here is the core of the problem:
TLDR:
I need to figure out how to make a live-updated grid/canvas on my website, driven by a Python script,
and
I would like to run the Python script on my website's servers instead of having to run it locally on my computer (that way the program can run even when I can't have my computer on).
Would really appreciate if anyone could outline how these things would be done on a high level or point me in the direction of helpful resources!
Thanks in advance!
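
For reference, the usual shape of a solution like this is a small web application that keeps the grid state on the server and hands it out over HTTP, with the page redrawing itself every few seconds. A minimal sketch with Flask is below; the part that reads tweets is omitted, and the names and sizes are made up for illustration. Hosting the app on a small VPS, PythonAnywhere or a similar service covers the "run it without my computer" part.

from flask import Flask, jsonify

app = Flask(__name__)

WIDTH, HEIGHT = 32, 32
# In-memory grid of hex colours; the Twitter-reading code would update this
# (e.g. from a background thread) whenever a valid tweet comes in.
GRID = [["#ffffff" for _ in range(WIDTH)] for _ in range(HEIGHT)]

PAGE = """
<canvas id="c" width="320" height="320"></canvas>
<script>
async function draw() {
  const grid = await (await fetch("/grid")).json();
  const ctx = document.getElementById("c").getContext("2d");
  grid.forEach((row, y) => row.forEach((col, x) => {
    ctx.fillStyle = col;
    ctx.fillRect(x * 10, y * 10, 10, 10);
  }));
}
draw();
setInterval(draw, 3000);   // poll for a fresh grid every 3 seconds
</script>
"""

@app.route("/")
def index():
    # Serves the page with the canvas and the polling script.
    return PAGE

@app.route("/grid")
def grid():
    # Serves the current grid state as JSON.
    return jsonify(GRID)

if __name__ == "__main__":
    app.run()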

Listen to text in browser

I am wondering if and how it is possible to 'listen to' the text that is in my browser window.
I am specifically NOT looking to scrape websites in the sense that I want to crawl them for information; I am just interested in interacting with an arbitrary page that makes my browser output text.
Example
Suppose I am asking a question on Stack Overflow.
By the time I have typed a title of 'Listen to text in browser', the suggestions appear; one of them contains the plain text 'Listen to browser request'.
As soon as the word 'request' is on my screen, the browser gets shut down and I order a pizza.
What would a good solution look like
I want to be able to do this for practically any website that somehow makes my computer show simple text. Ideally without having knowledge of how the text is generated.
I want this to be somewhat fast; sub-second should be possible.
I do not want to hit the website or its APIs; I just want to use the information that is already on my screen.
I am not too picky about the OS and browser requirements.
I can also imagine there may be corner cases where it is hard (perhaps the text is shown as a picture, or parts of a sentence are actually spread across multiple text boxes that are just displayed next to each other). For now I just wonder how this can be done for a simple page.
Bonus points if it could even capture text from the field in which I am typing myself, so I can scold myself when I am about to say something stupid.
What have I come up with so far
I am in general confident that I can process the text once I get it into a tool, however the main challenge is on how one can listen to the browser.
I tried looking at the source code, but that does not appear to contain this dynamic text
Perhaps there is a streaming API on the browser itself that can stream out changes?
Perhaps there is a way to grab all text from a browser, perhaps 10x per second or so
Using the normal scraping solutions is completely not what I want; I do not want to fire a request at the web server 10x per second.
In the worst case, I suppose we could use screen capture software, followed by text recognition, but I really hope there is something more elegant
I suppose there may be automation/testing software that can do this. That would be an answer but something lightweight (e.g. a python library) would be nicest.
I have tried searching but did not find any solution, or even the question. Presumably I am using the wrong words.
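
One relatively lightweight version of the "grab all text from the browser ~10x per second" idea is to let Selenium drive the browser and poll the rendered text of the live DOM. Because it reads what the page currently shows rather than the original source, dynamically added text such as autocomplete suggestions is included, and no extra requests are sent to the web server. The trade-off is that it watches a browser window Selenium opened, not your everyday browser (getting inside an existing browser usually means an extension with a MutationObserver, which is more setup). A sketch, with the trigger word just as an example:

import time
from selenium import webdriver

driver = webdriver.Firefox()              # or webdriver.Chrome()
driver.get("https://stackoverflow.com")   # then browse to whatever page you want to watch

TARGET = "request"                        # example trigger word
previous_text = ""

while True:
    # Read the text the page is currently rendering, including anything added
    # dynamically by JavaScript (suggestions, live updates, ...).
    text = driver.execute_script("return document.body.innerText;")
    if TARGET in text and TARGET not in previous_text:
        print("Trigger word appeared - time to order that pizza.")
    previous_text = text
    time.sleep(0.1)                       # roughly 10 checks per second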

Trying to build a program to organize safety data and display it in graphs

I'm pretty new to coding; I've done more with HTML and CSS than anything, but now I'm trying to learn a new language because I have an idea for something I want to build. I'm not exactly sure what language would be the best for this. I've been trying to learn a little Python, but I want to make sure I'm on the right road. What I'm trying to build is a program that would give us a form to fill out any time we have an accident at my job; the form would have questions like type of injury, department the associate was working in, time of injury, day of the week, etc. It would also keep track of our Non-Meds, Medicals and OSHA reportables by shift and in total for the week, month, year or whatever time span you want to see, and show it in graphs or pie charts, so the information could be used to determine where our weaknesses are in terms of safety. So the form would basically take the information you input, store it in the correct category and automatically update the charts. I know I'm probably trying to take on something kind of big, given that I don't know a whole lot about coding, but I feel like working on this would give me a good starting point. I've already made a spreadsheet in Excel to get us by for now, but I would like to make something that's more user friendly to update any time an accident occurs. So what language do you guys think would be a good one to learn to tackle this?
Thanks for any help in advance!
I think I would use C# and WPF for creating the input form, because it is pretty easy to build GUIs for Windows with C# and Visual Studio. Then I would store the input information in a SQL database for easy querying of the data. To analyze the data I would use Python with matplotlib to make graphs and visualize the data in an understandable way. Then the C# GUI could display the graphs generated by Python in a classical GUI with dropdowns or tabs or whatever fits the purpose.
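
Whichever language ends up driving the form, the reporting step described above could look roughly like this in Python. The database file, table and column names here are invented for illustration; the only assumption is that each accident ends up as one row in an SQL table.

import sqlite3
import matplotlib.pyplot as plt

# Read incident counts per department from a (hypothetical) SQLite database.
conn = sqlite3.connect("safety.db")
rows = conn.execute(
    """
    SELECT department, COUNT(*) AS incidents
    FROM accidents
    WHERE date BETWEEN '2024-01-01' AND '2024-12-31'
    GROUP BY department
    ORDER BY incidents DESC
    """
).fetchall()
conn.close()

departments = [r[0] for r in rows]
counts = [r[1] for r in rows]

# Plot and save the chart so a GUI (or a report) can display the image.
plt.bar(departments, counts)
plt.title("Accidents per department (2024)")
plt.ylabel("Number of incidents")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.savefig("accidents_by_department.png")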

How to modify the web-browsing history through a Python script?

I want to programmatically modify Chrome's browsing history with Python code.
I already know that many browsers use an SQLite database for the browsing history. I asked Google, and all the answers and questions I found were about importing/exporting the browsing-history data.
However, what I want to do is modify the data in the database to delete specific sites that I've visited, or all of them.
I would like to ask if there are any Python modules that help with doing this task through code.
If that is not possible, then I would need to fall back to making the code take control of the mouse and the screen, open Chrome, go to the browsing history, select the rows I want deleted, and press delete/confirm, which would be nearly impossible for a beginner like me to gather the determination and resources to do.
This may be helpful: How can I delete all web history that matches a specific query in Google Chrome
It uses JavaScript, but you can easily translate it into some simple Python and use something like BeautifulSoup, or just scroll down; there are some SQL-based approaches there that look promising.
And because you'll already have the code in one language, translating it should be pretty simple, even for a beginner, especially into Python. PS: this took me about as long to find as it took to read your question. It's rare that you'll want to do something that someone hasn't already done... just maybe not in Python ;)
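
Along those lines, the direct route in Python is to open Chrome's History file (an SQLite database) with the built-in sqlite3 module and delete the matching rows. A sketch is below, with some caveats: the path differs per OS and profile (the one shown is for Windows), Chrome must be completely closed while you touch the file because it locks the database, it is wise to keep a backup copy, and the urls/visits tables reflect Chrome's usual schema but may differ between versions.

import os
import shutil
import sqlite3

# Windows path shown; on Linux it is typically
# ~/.config/google-chrome/Default/History, on macOS
# ~/Library/Application Support/Google/Chrome/Default/History.
history_path = os.path.expanduser(
    r"~\AppData\Local\Google\Chrome\User Data\Default\History"
)
shutil.copy2(history_path, history_path + ".backup")  # safety net

keyword = "%example.com%"   # SQL LIKE pattern for the sites to delete

conn = sqlite3.connect(history_path)
cur = conn.cursor()

# Remove the visit records first, then the URL rows themselves.
cur.execute(
    "DELETE FROM visits WHERE url IN (SELECT id FROM urls WHERE url LIKE ?)",
    (keyword,),
)
cur.execute("DELETE FROM urls WHERE url LIKE ?", (keyword,))

conn.commit()
conn.close()
print("Deleted matching history entries (restore from the .backup copy if needed).")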
