I have a few hundred files with data and hyperlinks in them that I was trying to upload and append to a single DataFrame when I realized that Pandas was not reading any of the hyperlinks.
I then tried to use Openpyxl to read the hyperlinks in the input Excel files and write a new column into the excels with the text of the hyperlink that hopefully Pandas can read into my dataframe.
However, I am running into issues with my testing the openpyxl code. It is able to read and write some of the hyperlinks but not the others.
My sample file has three rows and looks like this:
My actual data has hyperlinks in the way that I have it for "Google" in my test data set.
The other two hyperlinks in my text data, I inserted by right clicking on the cell and pasting the link.
Sample Test file here: Text.xlsx
Here is the code I wrote to read the hyperlink and paste it in a new column. It works for the first two rows (India and China) but fails for the third row (Google). It's unfortunate because all of my actual data is of that type. Can someone please help me figure it out?
import openpyxl
wb = openpyxl.load_workbook('test.xlsx')
ws = wb.active
column_indices = [1]
max_col = ws.max_column
ws.cell(row=1,column = max_col+1).value = "Hyperlink Text"
for row in range(2,ws.max_row+1):
for col in column_indices:
print(ws.cell(row, column=1).hyperlink.target)
ws.cell(column=max_col+1,row=row).value = ws.cell(row, column=1).hyperlink.target
wb.save('test.xlsx')
The cells where you are using the HYPERLINK function (like google.com) will not be of type hyperlink. You will need to process the cells with HyperLink function using re so similar function.
The values looks like below,
>>> ws.cell(2,1).value
'China'
>>> ws.cell(3,1).value
'India'
>>> ws.cell(4,1).value
'=HYPERLINK("www.google.com","google")'
Suggested code to handle HYPERLINK :
val = ws.cell(row,column).value
if val.find("=HYPERLINK") >= 0 :
hyplink = ws.cell(4,1).value # Or use re module for more robust check
Note : The second for loop to iterate over columns seems not required since you are always using column=1.
Related
I tried to used docxtpl with Python, but the return is very ugly, neerly unreadable. I tried using Dataframe, list, ... but I doesn't have a clean table in my word. Does any one know how to make it with Python ? Or is it more simple using VBA ?
(And docx doesn't allow me to add INSIDE the Word, the table i want.)
With Dataframe the table is trounce. And with list, the columns doesn't fit....
Thanks a lot
from docxtpl import DocxTemplate
doc = DocxTemplate(fichier_test)
context = {para_multiple[i]: liste_dataframe[i] for i in range(len(para_multiple))}
doc.render(context)
doc.save(file_location)
Where para_multiple is a list with all the tags in the .doc and liste_dataframe, the list of dataframe containing the data i need on the doc.
(This is what I get for now, i can't find out how to display it correctly)
I need to delete the tabulation and the index
https://i.stack.imgur.com/r3CQJ.png
In Windows, if the data you're trying to copy from Excel is in a named range, you could paste it in your Word document using the following bit:
from win32com import client
excel = client.Dispatch("Excel.Application")
word = client.Dispatch("Word.Application")
doc = word.Documents.Open("C:/word_file.docx")
book = excel.Workbooks.Open("C:/excel_file.xlsx")
sheet = book.Worksheets(1) #depends on which sheet your named range is located
sheet.Range(NAMED_RANGE_OF_YOUR_TABLE_IN_EXCEL).Copy()
target = doc.Range()
findtext = "WHERE_I_WANT_TO_PASTE_MY_TABLE_IN_WORD" #use a placeholder in your word document where you want the table to appear
if target.Find.Execute(FindText=findtext):
table_range = doc.Range(Start = target.start, End=target.end)
table_range.PasteExcelTable(False, False, False)
It will keep the formatting from the workbook.
I am trying to create a Hyperlink for each item in a column based on another column.
Here is an image to help you understand better:
Each title should hyperlink to the corresponding URL. (when you click apple, it should go to apple.com, when you click banana it should go to banana.com, so on) Is there a way to do this to a CSV file in python?
(Let's say my data is 2000 rows)
Thanks in advance
You can use (I used) the pandas library to read the data from csv and later write it to excel, and leverage the Excel function HYPERLINK to make the cell a, well, Hyperlink. The Excel function for HYPERLINK requires the url we are going to (with http:// or https:// at the beginning) and then the visible text, or friendly-name.
One thing the Excel file will not have when you open it is blue underlined text for the Hyperlinked cells. If that is needed, you can probably leverage this solution somewhat and also use the XlsxWriter module.
import pandas as pd
df = pd.read_csv("path\test.csv") #put your actual path here
# this will read the file and save it is a 'dataframe', basically a table.
df['title'] = '=HYPERLINK("https://' + df["url"] +'","' + df["title"]+'")'
"""the df that we originally pulled in has the columns 'title' and 'url'. This line re-writes the values in
'title' to use the HYPERLINK formula in Excel, where the link is the value from the URL column with 'https://'
added to the beginning, and then uses the value from 'title' as the text the user will see in Excel"""
df.to_excel("save_location\test2.xlsx",index=False) #this saves the new file as an Excel. index=False removes the index column that was created when you first make any dataframe
If you want your output to just be one column, the one they click, your final line will be slightly different:
df['title'].to_excel("save_location\test2.xlsx",index=False)
Did not work or probably you have included too many 's
import pandas as pd
df = pd.read_csv('test.csv')
df['URL'] = 'https://url/'+df['id'].astype(str)
keep_col = ['URL']
newurl = df[keep_col]
newurl.to_csv("newurl.csv", index=False)
This code is working but the output file does not show a clickable url
I am reading an excel sheet and plucking data from rows containing the given PO.
import pandas as pd
xlsx = pd.ExcelFile('Book2.xlsx')
df = pd.read_excel(xlsx)
PO_arr = ['121121','212121']
for i in PO_arr:
PO = i
PO_DATA = df.loc[df['PONUM'] == PO]
for i in range(1, max(PO_DATA['POLINENUM'].values) +1):
When I take this Excel sheet straight from its source, my code works fine. But when I cut out only the rows I want and paste them to a new spreadsheet with the exact same formatting and read this new spreadsheet, I have to change PO_DATA to look for an integer instead of a string as such:
PO_DATA = df.loc[df['PONUM'] == int(PO)]
If not, I get an error, and calling PO_DATA returns an empty dataframe.
C:\...\pandas\core\ops\array_ops.py:253: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
res_values = method(rvalues)
I checked the cell formatting in Excel and in both cases, they are formatted as 'General' cells.
What is going on that makes it so when I chop up my spreadsheet, I have to look for an integer and not a string? What do I have to do to make it work for sheets I've created and pasted relevant data into instead of only sheets from the source?
Excel can do some funky formatting when copy and paste is used: ctl-c : ctl-v.
I am sure you tried these but...
A) Try copy ctl-c then ctl-alt-v:"v":enter ... on new sheet/file
B) Try using the format painter in Excel : Looks like a paintbrush on the home tab - select the properly formatted cells first - double click format painter - move to your new file/sheet - select cells you want the format to conform to.
C) Select your new file/table you pasted into - select purple eraser icon from the top options in excel - clear all formats
Update: I found an old related thread that didn't necessarily answer the question but solved the problem.
you can force pandas to import values as a certain datatype when reading from excel using the converters argument for read_excel.
df = pd.read_excel(xlsx, converters={'POLINENUM':int,'PONUM':int})
I'm having troubles writing something that I believe should be relatively easy.
I have a template excel file, that has some visualizations on it with a few spreadsheets. I want to write a scripts that loads the template, inserts an existing dataframe rows to specific cells on each sheet, and saves the new excel file as a new file.
The template already have all the cells designed and the visualization, so i will want to insert this data only without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting Pandas DataFrames to Excel sheets (specific rows/cells/columns), it's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very simple and short documentation, and the method you need is insert_frame
So, let's say we have a Pandas DataFrame called df which we have to insert in the Excel file ("MyWorkbook") sheet named "MySheet" from the cell B5, we can just use insert_frame function as follows:
from pyxcelframe import insert_frame
from openpyxl import load_workbook
workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
dataframe=df,
row_range=(5, 0),
col_range=(2, 0))
0 as the value of the second element of row_range or col_range means that there is no ending row or column specified, if you need specific ending row/column you can replace 0 with it.
Sounds like a job for xlwings. You didn't post any test data, but modyfing below to suit your needs should be quite straight-forward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()
I am using openpyxl to manipulate a Microsoft Excel Worksheet.
What I want to do is to add a Conditional Formatting Rule that fills the rows with a given colour if the row number is even, leaves the row blank if not.
In Excel this can be done by selecting all the worksheet, creating a new formatting rule with the text =MOD(ROW();2)=0 or =EVEN(ROW()) = ROW().
I tried to implement this behaviour with the following lines of code (considering for example the first 10 rows):
redFill = PatternFill(start_color='EE1111', end_color='EE1111', fill_type='solid')
ws2.conditional_formatting.add('A1:A10', FormulaRule(formula=['MOD(ROW();2) = 0'], stopIfTrue=False, fill=redFill))
My program runs correctly but when I try to open the output Excel file, it tells me that the file contains unreadable content and it asks me if I want to recover the worksheet content. By clicking yes, the worksheet is what I expect but there is no formatting.
What is the correct way to apply such a formatting in openpyxl (possibly to the entire worksheet)?
Unfortunately, the way formulae are handled in conditional formatting is particularly opaque. The best thing to do is to create a file with the relevant conditional format and inspect the relevant file by unzipping it. The rules are stored in the relevant worksheet files and the formats in the styles file.
However, I suspect that the problem may simply because you are using ";" to separate parameters in the function: you must always use commas for this.
A sample formula from one of my projects:
green_text = Font(color="006100")
green_fill = PatternFill(bgColor="C6EFCE")
dxf2 = DifferentialStyle(font=green_text, fill=green_fill)
r3 = Rule(type="expression", dxf=dxf2)
r3.formula = ["AND(ISNUMBER(C2), C2>=400)"]