When you want to insert an image in Excel, you would simply go to the "insert" tab, click "image", and then choose an image from your machine. However, the alternative is to type a URL into the text box, and then Word/Excel/whatever Office document you're using will know to attempt to contact that URL and download the image.
The trouble is: I need to do this programmatically. I need to use Python and I'm using openpyxl. I can't find any functionality for this in openpyxl and definitely not in vanilla Python.
Does anyone have suggestions for how I might accomplish this?
I've been all over the web and this doesn't seem to be a topic that's talked about often. Any ideas are welcome.
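One possible approach, sketched here rather than taken from the openpyxl docs: download the image bytes yourself and hand them to openpyxl as an in-memory file. The helper names fetch_image and insert_image_from_url are made up for illustration, and the sketch assumes that openpyxl's Image class accepts a file-like object (it relies on Pillow to read it). Note that this embeds a copy of the image in the workbook; it does not store the URL.

```python
from io import BytesIO
from urllib.request import urlopen


def fetch_image(url, timeout=10):
    """Download the image at `url` and return it as an in-memory file object."""
    with urlopen(url, timeout=timeout) as response:
        return BytesIO(response.read())


def insert_image_from_url(ws, url, anchor="A1"):
    """Embed the image found at `url` into worksheet `ws` at cell `anchor`.

    Hypothetical helper: assumes openpyxl and Pillow are installed.
    """
    # Imported here so fetch_image stays usable without openpyxl.
    from openpyxl.drawing.image import Image

    img = Image(fetch_image(url))  # assumption: Image accepts a file-like object
    ws.add_image(img, anchor)
    return img
```

Usage would look something like: create a Workbook, take wb.active as ws, call insert_image_from_url(ws, "https://example.com/logo.png", "B2") with your own URL, then wb.save("out.xlsx").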
I have an FPDF2 table created using this script. I used to output it to a blank page and merge it to an existing pdf, which works fine.
But now we need to add the table to an existing page in the pdf and then if it doesn't fit, we insert new pages. And that's the problem.
FPDF doesn't seem to be able to draw to an existing page. I know I can use reportlab canvas can.drawString() to draw to an existing page, but I don't know if reportlab can draw an FPDF object.
Also, if I were to ditch FPDF and use only reportlab to draw a table, I don't know how to detect the end of the page and insert a new page if needed. I'm not starting at the start of a page, I'll be starting somewhere in the middle.
I would prefer to be able to use the FPDF2 script I already have and somehow add the output at a specific x,y position in a page though, if possible. Have you ever had this issue?
I also have PyPDF2 installed and in use in the same project, but I think that only reportlab can do the job. Maybe I need to detect the end of the page via PyPDF2 and write to the page via reportlab?
Since no answer exists as of now and since I need an answer to finish my task, I did the following:
I added the FPDF table to a blank PDF page and pushed it down with set_y(100), which left the top half of the page blank to work with.
Then I took a screenshot of the items that need to be placed above the table and added them to the same page with reportlab's canvas.drawImage().
If there's a better solution, please post an answer and I'll accept it and refactor my code. For now, I'll accept my answer to close this question.
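On the "detect the end of the page" part of the question, the bookkeeping can be done without any PDF library. The sketch below is only an illustration: paginate_rows is a hypothetical helper, it assumes fixed-height rows, FPDF-style coordinates (y measured downward from the top, in mm) and A4 pages, and it only decides where each row goes. The actual drawing would still be done by FPDF or reportlab.

```python
def paginate_rows(rows, start_y, row_height,
                  page_height=297.0, top_margin=10.0, bottom_margin=10.0):
    """Assign each row a (y, row) position, starting mid-page at `start_y`.

    Returns a list of pages; each page is a list of (y, row) tuples.
    A new page is started whenever the next row would cross the bottom margin.
    """
    limit = page_height - bottom_margin  # lowest y a row may reach
    pages = [[]]
    y = start_y
    for row in rows:
        if y + row_height > limit:
            # Row does not fit on the current page: open a new one.
            pages.append([])
            y = top_margin
        pages[-1].append((y, row))
        y += row_height
    return pages
```

The first list holds the rows that fit on the existing page below start_y, and every later list corresponds to a page that has to be inserted. If not even one row fits below start_y, the first list comes back empty.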
If I go to this website:
https://covid.cdc.gov/covid-data-tracker/#ed-visits
and click the "download" button (on the right), a .csv file is downloaded.
I can't find the address of that csv file, so I can't fetch it automatically with pd.read_csv(). I had a snoop around the web inspector, but I don't really know what I'm doing, and nothing jumped out at me as the obvious answer. I've also looked around various other sites for an API that gives me access to this data, but there doesn't appear to be such a thing.
Can anyone help me with that?
Thanks so much!
Open your web inspector, go to the "Network" tab, and reload the page. You will see that no csv is ever actually loaded.
The export button doesn't link to any file either. Instead it has a JavaScript binding that exports the data already held in your client (the browser) as a csv to your filesystem. In other words: there isn't an address for that file. It's being created in your browser.
What you can do is read the JSON that the page itself loads. Find the request that carries the data in the Network tab; I think it might be this one: https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data
So instead of looking for a csv, you could read the JSON directly:
pd.read_json('https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data') and then filter for the data that you need.
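If pd.read_json chokes on a nested response, a standard-library sketch like the following may help to inspect it first. The layout of the JSON returned by that endpoint is not documented in this thread, so find_record_list simply guesses that the data is the first list of objects found anywhere in the payload; all three function names are hypothetical helpers, not part of pandas or of the CDC site.

```python
import csv
import io
import json
from urllib.request import urlopen


def fetch_json(url, timeout=30):
    """Download `url` and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def find_record_list(payload):
    """Return the first non-empty list of dicts found in `payload`, or None.

    Assumption: the tabular data is stored as a list of JSON objects.
    """
    if isinstance(payload, list) and payload and all(
        isinstance(item, dict) for item in payload
    ):
        return payload
    if isinstance(payload, dict):
        for value in payload.values():
            found = find_record_list(value)
            if found is not None:
                return found
    return None


def records_to_csv(records):
    """Render a list of dicts as CSV text, using all keys seen as columns."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The list returned by find_record_list(fetch_json(url)) can also be passed straight to pd.DataFrame(...), which is roughly what the browser's export button does on the client side before writing the csv.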
GOAL
Extract data from a web page automatically.
The data are on this page. Be careful, it's in French.
MY HARD WAY, manually
I choose the data I want by clicking on the desired fields on the left side ('CHOISIR DES INDICATEURS')
Then I select ('Tableau' = Table), to have data table.
Then I click on ('Action'), on the right side, then ('Exporter' = Export)
I choose the format I want (ie CSV) and hit ('Executer'= Execute) to download the file.
WHAT I TRIED
I tried to automate this process, but it feels like an impossible task for me. I inspected the page's network traffic to see whether there is an underlying server endpoint I could query with a simple JSON request.
I mainly work with Python and libraries like BS4 or Scrapy.
I only have a little data to extract, so I can easily do it manually. I'm asking this question purely for my own knowledge, to see whether it is possible to scrape a page like that.
I would appreciate if you could share your skills!
Thank you,
It is possible. The tutorial below walks through scraping a website with Beautiful Soup, using a worked example.
https://realpython.com/beautiful-soup-web-scraper-python/#scraping-the-monster-job-site
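As a rough illustration of the parsing step, here is a sketch that pulls the rows out of an HTML table. It uses the standard library's html.parser instead of Beautiful Soup so that it runs without extra installs, and it assumes the table markup is actually present in the HTML you fetched, which is not guaranteed for a page that builds its table with JavaScript. TableParser and extract_rows are names invented for this example.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in `html` as lists of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

If the rows come back empty, the data is most likely loaded by a separate request, and the Network tab of the browser's inspector is the place to look for it.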
I want to translate a site using Google Websites Translate and then download the result as a PDF or JPG. I tried to use wkhtmltopdf, but Google Websites Translate returns its result inside a frame, so when I capture the translated page (as PDF or JPG) I get an empty PDF.
Converting HTML to PDF may not work here.
Try capturing the web page as a PNG/JPEG snapshot instead.
The FireShot Chrome extension is one option.
I am not sure it will work, but it is worth a try.
I am scripting in Python for some web automation. I know I cannot automate captchas, but here is what I want to do:
I want to automate everything I can up to the captcha. When I open the page (using urllib2) and parse it to find that it contains a captcha, I want to display the captcha using Tkinter. I know that I will have to save the image to my hard drive first and then open it, but there is an issue before that. The captcha image that is on screen is not directly in the source anywhere. There is a variable in the source, inside some JavaScript, that points to another page that has the link to the image, BUT if you load that middle page, the captcha picture for that link changes, so the image associated with that JavaScript variable is no longer valid. It may be impossible to gather the image using this method, so please enlighten me if you have any ideas on this.
Now if I use Firebug to load the page, there is a "GET" that is a direct link to the current captcha image that I am seeing, and I'm wondering if there is any way to make Python or urllib2 see the "GET"s that are going on when a page is loaded, because if that were possible, this would be simple.
Please let me know if you have any suggestions.
Of course the captcha's served by a page which will serve a new one each time (if it was repeated, then once it was solved for one fake userid, a spammer could automatically make a million!). I think you need some "screenshot" functionality to capture the image you want -- there is no cross-platform way to invoke such functionality, but each platform (or desktop manager in the case of Linux, BSD, etc) tends to have one. Or, you could automate the browser (e.g. via SeleniumRC) to "screenshot" (e.g. "print to PDF") things at the right time. (I believe what you're seeing in firebug may be misleading you because it is "showing a snapshot"... just at the html source or DOM level rather than at a screen/bitmap level).
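For the part of the question about the JavaScript variable, here is a small sketch in Python 3 terms (urllib.request and http.cookiejar rather than urllib2). It keeps cookies between requests and extracts a string assigned to a named variable in the page source. Both helper names, and the variable name you pass in, are hypothetical, and nothing here guarantees that the server will hand back the same captcha on a second request; that depends entirely on the site.

```python
import re
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_session():
    """Return an opener that keeps cookies between requests, like a browser tab."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def find_js_string(source, name):
    """Return the string literal assigned to JavaScript variable `name`, or None.

    Only handles the simple form  name = "value"  or  name = 'value'.
    """
    pattern = r"\b" + re.escape(name) + r"""\s*=\s*(["'])(.*?)\1"""
    match = re.search(pattern, source)
    return match.group(2) if match else None
```

A typical flow would be: read the page with session.open(page_url).read().decode(), look up the variable with find_js_string, request the resulting URL through the same session so the cookies match, write the bytes to disk, and show the file in Tkinter.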