The general flow: I scrape a website, collect the data into a pandas DataFrame, and then write it to an "Example Sheet" in an already existing Excel file that contains other sheets. Everything goes smoothly and the data is inserted.
The question is: is it also possible to create a new Excel file? I have an Excel template file that lacks data; I insert that data using my Python script and want to save the result as a new file while keeping the existing template intact.
Is there a way to do it?
I'm using BeautifulSoup, Requests, pandas (DataFrame, ExcelWriter) and openpyxl.
I've been looking through many threads but didn't find an answer to this problem.
Yes, of course.
E.g.:
Read the template: new = pd.read_excel('....')
Insert your scraped data into the DataFrame you just read.
Save the DataFrame under a new name: new.to_excel('newfile.xlsx')
If you share the template, we will explain it better.
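A minimal sketch of that flow (all file names here are placeholders; note that pd.read_excel keeps only cell values, so if the template's formatting must survive, load and save the workbook with openpyxl instead):

import pandas as pd

# Placeholder paths -- substitute your own template and output files.
template_path = 'template.xlsx'
output_path = 'newfile.xlsx'

# Read the template; the file on disk is never modified.
new = pd.read_excel(template_path)

# Hypothetical scraped data, standing in for your real DataFrame.
scraped = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [90, 85]})
new = pd.concat([new, scraped], ignore_index=True)

# Save under a new name; the original template stays intact.
new.to_excel(output_path, index=False)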
I'm creating weekly reports, and the data all come from a Google Sheet with the same format. Instead of entering the data into the Word file manually, I created a Word template and want to import the data from the Google Sheet into it automatically.
In my Word template, the bolded data come from the "New" column and the green/red data come from the "Diff" column.
I know how to get this data from the Google Sheet using pandas, but I want to know how I should place it in the specific areas of my Word template.
I think the best way to go about it would be to go from Google Sheet -> Google Doc and take advantage of the native integration there. From there you can export it as a .docx file, which should open in Word as well. I did this exact thing a while back, so it's definitely doable (if not easier now); here's a place to start.
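If you would rather fill the Word template directly from Python, here is a rough sketch using python-docx, assuming the template marks each slot with placeholder text such as {{NEW}} and {{DIFF}} (both hypothetical). A placeholder that Word has split across runs won't be found this way:

from docx import Document  # pip install python-docx

# Hypothetical placeholder -> value mapping; pull the real values from your sheet.
replacements = {'{{NEW}}': '1,234', '{{DIFF}}': '+56'}

doc = Document('report_template.docx')  # hypothetical template path
for paragraph in doc.paragraphs:
    for key, value in replacements.items():
        if key in paragraph.text:
            # Replace run by run so bold/colour formatting is preserved.
            for run in paragraph.runs:
                if key in run.text:
                    run.text = run.text.replace(key, value)
doc.save('report_filled.docx')  # the template itself is left untouched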
I have a question for you. I'm working on a new Jenkins instance, and as a result of a job I get a CSV file listing any errors that occurred during the test. I would like to generate an HTML report from this CSV file, which would be more convenient than opening Excel and loading the CSV to see the errors. I came across the HTML Publisher plugin, but I don't know whether it supports generating HTML reports from CSV files. Alternatively, this could be done with a Python script, with the resulting HTML file shown in the artifacts. Do you have any ideas?
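As a minimal sketch of the Python route (file names are placeholders; HTML Publisher can then be pointed at the generated file in the workspace):

import pandas as pd

# Placeholder paths -- point these at your job's workspace.
df = pd.read_csv('errors.csv')

# index=False drops the DataFrame's row numbers; the output is a standalone HTML table.
df.to_html('report.html', index=False)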
I have one use case. Let's say there is a PDF report containing data from the testing of some manufacturing components,
and this PDF report is loaded into a database using some internally developed software.
We need to develop a reconciliation program in which the data from the PDF report is compared against the database. We can assume the PDF file has a fixed template.
If the PDF contains several tables plus some raw text, how does MySQL store this PDF data: in one table or in many tables?
Please suggest an approach (preferably in Python) for comparing the data.
Have a look at this example: Finding and extracting specific text from URL PDF files, without downloading or writing (solution). I found it worked efficiently for me. It handles URL-based PDFs, but you could simply change the input source to be your DB. In your case you can remove the two if statements under the if isinstance(obj, pdfminer.layout.LTTextBoxHorizontal): line. You mention your PDFs share a template; if you want to extract text from one specific area of it, use the print statement that has been commented out to find the coordinates of the desired data, then, as is done in the example, use those coordinates in if statements.
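A condensed sketch of that approach using pdfminer.six (the URL and the coordinate window below are placeholders; print the bounding boxes first to find the real ones for your template):

from io import BytesIO
import requests
import pdfminer.high_level
import pdfminer.layout

# Placeholder URL -- or build the BytesIO from the PDF bytes stored in your DB.
pdf_bytes = BytesIO(requests.get('https://example.com/report.pdf').content)

for page_layout in pdfminer.high_level.extract_pages(pdf_bytes):
    for obj in page_layout:
        if isinstance(obj, pdfminer.layout.LTTextBoxHorizontal):
            # Uncomment to discover where each text box sits on the page:
            # print(obj.bbox, repr(obj.get_text()))
            x0, y0, x1, y1 = obj.bbox
            # Hypothetical window for one field of the fixed template.
            if 50 < x0 < 300 and 700 < y0 < 750:
                print(obj.get_text().strip())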
This is not a duplicate, although the issue has been raised on this forum before: in 2011 (Getting a hyperlink URL from an Excel document), in 2013 (Extracting Hyperlinks From Excel (.xlsx) with Python) and in 2014 (Getting the URL from Excel Sheet Hyper links in Python with xlrd); there is still no answer.
After some deep diving into the xlrd module, it seems the Data_sheet.hyperlink_map.get((row, col)) lookup fails because "xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx", per @alecxe at Extracting Hyperlinks From Excel (.xlsx) with Python.
Question: has anyone made progress with extracting URLs from hyperlinks stored in an Excel file? Say, among all the customer data, there is a column of hyperlinks. I was toying with the idea of dumping the Excel sheet as an HTML page and proceeding with the usual scraping (file on local drive), but that's not a production solution. Supplementary: is there any other module that can extract the URL from a .cell(row, col).value call on the hyperlink cell? Is there a solution in mechanize? Many thanks.
I had the same problem trying to get the hyperlinks from the cells of an xlsx file. The workaround I came up with is simply converting the Excel sheet to xls format, from which I could get the hyperlinks without any trouble; once finished editing, I converted it back to the original xlsx format.
I don't know if this will work for your specific needs, or if the change of format has consequences I'm not aware of, but I think it's worth a try.
I was able to read hyperlinks and use them to copy files with openpyxl. Each cell object has a cell_obj.hyperlink attribute, and cell_obj.hyperlink.target gives the link value. I collected the cells that had hyperlinks into a list, then looped through it to move the linked files.
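A minimal sketch of that, assuming the hyperlinks live in column B of a hypothetical customers.xlsx:

from openpyxl import load_workbook

wb = load_workbook('customers.xlsx')  # hypothetical file
ws = wb.active

# Collect the target of every hyperlink in column B.
targets = []
for (cell,) in ws.iter_rows(min_col=2, max_col=2):
    if cell.hyperlink is not None:
        targets.append(cell.hyperlink.target)

for url in targets:
    print(url)  # or shutil.copy(url, dest) if the links point at local files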
I've made and deleted a couple of threads on here in reference to a single issue I'm having, only because I haven't been able to accurately describe what I'm trying to do.
Essentially: I want to automate a search process. This would involve automatically selecting a radio button on this webpage, then feeding each cell of column B:B in this Excel spreadsheet into one of the forms enabled by the radio button. Ideally, the process would also populate the spreadsheet (or a new .csv or .txt file) with the output from each search.
I am utterly at a loss as to how to integrate Excel with Python. I have the following script so far:
import mechanize

URL = 'http://www.adviserinfo.sec.gov/IAPD/Content/Search/iapd_Search.aspx'
br = mechanize.Browser()
br.open(URL)

# List every form on the page to find the search form's name and its fields.
for form in br.forms():
    print("Form name:", form.name)
    print(form)
    form.set_all_readonly(False)
But from there I don't know what to do. I'm not a programmer; teaching myself Python for this project. Any and all help will keep me from manually copying and pasting 1,175 names into a search bar, which is neither a good use of my time nor an opportunity to learn anything new. Thank you!
You'll have to look at an Excel file as text (open it with Notepad or anything you like). That is the "real" file, with its divisions, pretty much like HTML.
For each row you want to write to Excel, instead of print("Form name:", form.name) you'll have to write something like form.name to a file.
Do not forget to create the file before writing to it. Note that you'll be creating an .xls file and writing it in the structure you saw in the text editor.
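Separately, here is a rough sketch of the Excel-to-mechanize loop the question describes: read column B with openpyxl, submit each name through mechanize, and write the results to a CSV. The spreadsheet name, the form index, and the control name are all hypothetical; the form-printing loop above is how you find the real ones:

import csv
import mechanize
from openpyxl import load_workbook

# Hypothetical spreadsheet holding the 1,175 names in column B.
wb = load_workbook('names.xlsx')
ws = wb.active
names = [cell.value for (cell,) in ws.iter_rows(min_col=2, max_col=2) if cell.value]

br = mechanize.Browser()
with open('results.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for name in names:
        br.open('http://www.adviserinfo.sec.gov/IAPD/Content/Search/iapd_Search.aspx')
        br.select_form(nr=0)            # hypothetical: index of the search form
        br['ctl00$SearchTerm'] = name   # hypothetical control name from the printout
        response = br.submit()
        # Store the name plus a snippet of the response for later inspection.
        writer.writerow([name, response.read().decode('utf-8', 'ignore')[:200]])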