I have been using Django and xlsxwriter on a project that I am working on. I want to use data_validation in Sheet1 to pull in the lists that I have printed in Sheet2. I get the lists to print, but am not seeing the data_validation in Sheet1 when I open the file. Any insight on what I am doing incorrectly is much appreciated!
wb = xlsxwriter.Workbook(TestCass)
sh_1 = wb.add_worksheet()
sh_2 = wb.add_worksheet()
col = 15
head_col = 0
for header in headers:
    sh_1.write(0,head_col,header)
    sh_2.write(0,head_col,header)
    list_row = 1
    list = listFunction(headerToModelDic[header])
    for entry in list:
        sh_2.write(list_row,col,entry)
        list_row += 1
    sh_1.data_validation(1,col,50,col,{'validate':'list','source':'=Sheet2!$A2:$A9'})
    col += 1
wb.close()
Note: The reason I am not pulling the list directly from the site is that it is too long (longer than 256 characters). Ultimately, I would like the source range in the data validation to be built from variables pointing at Sheet2. However, I cannot get Sheet1 to show any data validation at all as is, so I figured I would start with absolute values.
It looks like the data ranges in the example are wrong. You are writing the list data to Sheet2 starting in column P (index 15), but the data validation source refers to A2:A9, which holds no list data (only the header is written to column A).
Maybe in your full example there is data in that range, but in the example above there isn't.
I've modified your example slightly to a non-Django example with some sample data. I've also changed the data validation range to match the written data range:
import xlsxwriter
wb = xlsxwriter.Workbook('test.xlsx')
sh_1 = wb.add_worksheet()
sh_2 = wb.add_worksheet()
col = 15
head_col = 0
headers = ['Header 1']
for header in headers:
    sh_1.write(0,head_col,header)
    sh_2.write(0,head_col,header)
    list_row = 1
    list = [1, 2, 3, 4, 5]
    for entry in list:
        sh_2.write(list_row,col,entry)
        list_row += 1
    sh_1.data_validation(1,col,50,col,
                         {'validate':'list','source':'=Sheet2!$P2:$P6'})
    col += 1
wb.close()
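Since the question ultimately wants the source range to come from variables, the reference can be computed from the same zero-based `list_row`/`col` values passed to `write()` instead of being hard-coded. A minimal sketch follows. `col_to_letters()` and `list_source()` are hypothetical helper names, and the sketch assumes a sheet name without spaces (names with spaces need single quotes, e.g. `='My Sheet'!$P$2:$P$6`). If you prefer a library call, XlsxWriter's `xlsxwriter.utility.xl_range_abs()` can build the `$P$2:$P$6` part.

```python
def col_to_letters(col):
    """Convert a zero-based column index to Excel letters (0 -> 'A', 15 -> 'P')."""
    letters = ""
    col += 1  # Excel letters are a 1-based "bijective base 26" system
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def list_source(sheet_name, first_row, last_row, col):
    """Build an absolute data-validation source such as '=Sheet2!$P$2:$P$6'.

    first_row, last_row and col are zero-based, matching worksheet.write(row, col, ...).
    Assumes sheet_name contains no spaces (otherwise it must be quoted).
    """
    letters = col_to_letters(col)
    return f"={sheet_name}!${letters}${first_row + 1}:${letters}${last_row + 1}"


print(list_source("Sheet2", 1, 5, 15))  # =Sheet2!$P$2:$P$6
```

In the loop above, `list_row` has already moved one past the last entry when the inner loop ends, so the call would be `list_source('Sheet2', 1, list_row - 1, col)`.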
I'm trying to write a for loop so that the elements in the two lists get exported to Excel in columns A and B. However, whenever I run the code, it only writes a single number, in column B, row 1 (B1).
The entire code is too long, so I'm attaching just the snippet where I am stuck.
This is what I'm getting in my excel file when I run the code
#Exporting data to Excel
workbook = xlsxwriter.Workbook('efficient_front.xlsx')
worksheet = workbook.add_worksheet()
i = 1
if company == first_company:
    for perc_return in returns:
        worksheet.write('A' + str(i) , perc_return)
        i =+ 1
else:
    for perc_return in returns:
        worksheet.write('B' + str(i), perc_return)
        i =+ 1
workbook.close()
Consider the given lists, prod_codes and ID_codes. The code below writes each list as a column in an Excel sheet. The parameters of worksheet.write() are:
worksheet.write(row_number,column_number,value_to_be_written)
import xlsxwriter
prod_codes = [1001,1002,1003,1004,1005,1006]
ID_codes = [123,345,567,789,908,345]
with xlsxwriter.Workbook('PATH for XLSX to be created') as workbook:
    worksheet = workbook.add_worksheet("NAME_ME")
    # Column A (index 0): one ID code per row, starting at row 0
    for index,value in enumerate(ID_codes):
        worksheet.write(index,0,value)
    # Column B (index 1): one product code per row
    for index,value in enumerate(prod_codes):
        worksheet.write(index,1,value)
Please go through the official documentation, which explains clearly how to do this: https://xlsxwriter.readthedocs.io/working_with_data.html
You have a silent bug in your code: i =+ 1 instead of i += 1. It is valid syntax, but Python reads it as i = +1, which is the same as i = 1, so i never increments and every value is written to the same cell (A1 or B1).
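The difference is easy to reproduce in isolation: `=+` is a plain assignment of the unary-plus expression `+1`, while `+=` is augmented assignment.

```python
i = 1
for _ in range(3):
    i =+ 1  # parsed as i = (+1): resets i to 1 on every pass
print(i)  # 1

j = 1
for _ in range(3):
    j += 1  # augmented assignment: j = j + 1
print(j)  # 4
```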
Here is an alternative way to structure your code with enumerate() and the (row, col) syntax of worksheet.write():
import xlsxwriter
workbook = xlsxwriter.Workbook('efficient_front.xlsx')
worksheet = workbook.add_worksheet()
returns = [1, 2, 3, 4, 5]
company = True
first_company = False
if company == first_company:
    col_num = 0
else:
    col_num = 1
for row_num, perc_return in enumerate(returns):
    worksheet.write(row_num, col_num, perc_return)
workbook.close()
What I intend to do:
I have an Excel file with voltage and current data that I would like to extract from a specific sheet, say 'IV_RAW'. The values start at the 4th row and are in columns D and E.
Let's say the values look like this:
V(voltage) | I(Current)
47         | 1
46         | 2
45         | 3
0          | 4
-0.1       | 5
-10        | 5
Now, I only want to extract the values starting from a voltage (V) of 45, and negative voltages should not be included. The corresponding current (I) values also need to be extracted. This has to be done for multiple Excel files, so starting from a fixed row number won't work; the voltage values have to be the criterion.
What I know:
I only know how to extract the entire set of values using openpyxl:
from openpyxl import load_workbook
loc = ("path")
wb = load_workbook("Data") #thefilename
ws = wb["IV_raw"] #theactiveworksheet
#to extract the voltage and current data:
for row in ws.iter_rows(min_row=1, max_col=3, max_row=2, values_only=True):
    print(row)
I am a novice coder and new to Python, so any help would be really appreciated. A simpler version with pandas would be great too.
Thank you in advance.
The following uses pandas, which you should definitely take a look at. sheet_name selects the sheet, header is the row index of the header (starting at 0, so row 4 -> 3), and usecols selects the columns using Excel letter ranges.
The last line filters the DataFrame. If I understand correctly, you want voltages from 0 to 45 inclusive, which is what the example does; df is your resulting DataFrame.
import pandas as pd
file_loc = "path.xlsx"
df = pd.read_excel(file_loc,
                   sheet_name = 'IV_raw',
                   header = 3,
                   usecols = "D:E")
df = df[(df['V(voltage)'] >= 0) & (df['V(voltage)'] <= 45)]
Building on your example, you can use the following to get what you need:
from openpyxl import load_workbook
wb = load_workbook(filepath,data_only=True) #load the file using its full path
ws = wb["Sheet1"] #theactiveworksheet
#to extract the voltage and current data (columns D and E):
data = ws.iter_rows(min_col=4, max_col=5, min_row=2, max_row=ws.max_row, values_only=True)
# keep numeric voltages from 0 to 45; header text and empty cells are skipped
output = [row for row in data if isinstance(row[0], (int, float)) and 0 <= row[0] <= 45]
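If the criterion is read as "start at the first voltage of 45 or below and stop at the first negative voltage" rather than as a range filter, `itertools.dropwhile` and `takewhile` express it directly without relying on row numbers. This is a sketch under that interpretation, and the helper name `select_iv_rows` is hypothetical. For a monotonically decreasing sweep like the sample, it returns the same rows as filtering on 0 <= V <= 45, but it ignores everything after the first negative voltage, even if values rise again later.

```python
from itertools import dropwhile, takewhile


def select_iv_rows(rows, start_voltage=45):
    """Keep (V, I) rows from the first V <= start_voltage until V turns negative.

    rows: iterable of tuples whose first item is the voltage, e.g. the output of
    ws.iter_rows(min_col=4, max_col=5, values_only=True). Rows whose voltage is
    not a number (header text, empty cells) are dropped first.
    """
    numeric = (r for r in rows if isinstance(r[0], (int, float)))
    started = dropwhile(lambda r: r[0] > start_voltage, numeric)
    return list(takewhile(lambda r: r[0] >= 0, started))


sample = [("V(voltage)", "I(Current)"), (47, 1), (46, 2), (45, 3),
          (0, 4), (-0.1, 5), (-10, 5)]
print(select_iv_rows(sample))  # [(45, 3), (0, 4)]
```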
You can try this:
import openpyxl
tWorkbook = openpyxl.load_workbook("YOUR_FILEPATH")
tDataBase = tWorkbook.active
# Read a single voltage/current pair from row 4
voltageVal= "D4"
currentVal= "E4"
V = tDataBase[voltageVal].value
I = tDataBase[currentVal].value
I'm trying to find a creative way to get the dataframe of several sheets within a spreadsheet that's quite irregular but I can't find the way to do it.
If I try this:
import xlrd
file= 'filename.xlsx'
df = xlrd.open_workbook(file)
print(df)
This is my current output:
  | A           | B           | C
1 | Random text | Empty cell  | Empty cell
2 | Empty cell  |             |
3 | Empty cell  |             |
4 | CODE        | HEADER 2    | HEADER 3
5 | INFORMATION | INFORMATION | INFORMATION
I want my DataFrame to start at the CODE row and column, but pandas takes the "Random text" cell as the first cell.
This is my desired output:
4 | CODE        | HEADER 2    | HEADER 3
5 | INFORMATION | INFORMATION | INFORMATION
How would you make pandas ignore the first rows? It has to be value-based, because in the next sheet CODE starts in row 8, and in the one after that in row 3.
Not sure about xlrd, but pandas' read_excel() has an easy way to specify which row holds your headers (the header argument). That would be an easy fix unless you're intent on using xlrd.
You can try:
import pandas as pd
file= 'filename.xlsx'
df = pd.read_excel(open(file, 'rb'),sheet_name='sheetname', skiprows=[0,1,2])
Alternatively, you can use the header argument mentioned earlier (header=3 here, since CODE is in the 4th row).
In my previous answer I pointed to a static solution; in this one I have added a helper function for dynamic parsing. The get_header_index helper dynamically finds the index of the row containing the header keyword in the first column. You may change the col_index argument if the keyword is in another column, and likewise change the keyword argument as you like. The output dfs is a dictionary of DataFrames whose keys are the sheet names of the given workbook.
import pandas as pd
def get_header_index(sheet, col_index=0, keyword='code'):
    arr = sheet[sheet.columns[int(col_index)]]
    # case=False so that 'code' also matches 'CODE'; na=False skips empty cells
    header_index = arr[arr.str.contains(str(keyword), case=False, na=False)].index[0]
    return header_index
file = 'filename.xlsx'
sheets_dict = pd.read_excel(open(file, 'rb'), sheet_name=None)
dfs = {}
for name, sheet in sheets_dict.items():
    header = get_header_index(sheet, col_index=0, keyword='code') + 1
    df = pd.read_excel(open(file, 'rb'), sheet_name=name, header=header)
    dfs[name] = df
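The same value-based search also works without pandas on plain row tuples, for example `list(ws.iter_rows(values_only=True))` from openpyxl. Here is a sketch with a hypothetical `table_from_keyword()` helper. Unlike `str.contains` above, it requires the first cell to equal the keyword (ignoring case and surrounding spaces), which avoids matching words such as "barcode".

```python
def table_from_keyword(rows, keyword="CODE"):
    """Return (header, data_rows) starting at the first row whose first cell
    equals keyword (case-insensitive); raise ValueError if no row does.

    rows: a list of row tuples, e.g. list(ws.iter_rows(values_only=True)) or
    pd.read_excel(file, sheet_name=name, header=None).values.tolist().
    """
    for index, row in enumerate(rows):
        first = row[0] if row else None
        # Only string cells can match; None/NaN/numbers are skipped
        if isinstance(first, str) and first.strip().upper() == keyword.upper():
            return list(row), [list(r) for r in rows[index + 1:]]
    raise ValueError(f"no row starting with {keyword!r}")


rows = [("Random text", None, None), (None, None, None), (None, None, None),
        ("CODE", "HEADER 2", "HEADER 3"),
        ("INFORMATION", "INFORMATION", "INFORMATION")]
header, data = table_from_keyword(rows)
```

The result can be turned into a DataFrame with `pd.DataFrame(data, columns=header)`.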
This is a form of what I did in mine, adjusted for your use (based on my previous comment):
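import xlrd
def find_code_cell(sheet, keyword="CODE"):
    """Return the (row, col) of the first cell containing keyword, or None."""
    for i in range(sheet.nrows):
        for j in range(sheet.ncols):
            check = sheet.cell_value(i, j)
            # cell_value() may return numbers, so convert to str before searching
            if keyword in str(check):
                return (i, j)
    return None
for file in file_names: # Iterate through all of the individual report files
    book = xlrd.open_workbook(file)
    sheetname = get_sheetname(book) # get_sheetname() and file_names come from your own code
    if sheetname is not None: # Check that sheet name is valid
        sheet = book.sheet_by_name(sheetname)
        position = find_code_cell(sheet)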
I'm trying to determine how much data is missing from a large excel sheet. The following code takes a prohibitive amount of time to complete. I've seen similar questions, but I'm not sure how to translate the answer to this case. Any help would be appreciated!
import openpyxl
wb = openpyxl.load_workbook('C://Users/Alec/Documents/Vertnet master list.xlsx', read_only = True)
sheet = wb.active
lat = 0
loc = 0
ele = 0
a = openpyxl.utils.cell.column_index_from_string('CF')
b = openpyxl.utils.cell.column_index_from_string('BU')
c = openpyxl.utils.cell.column_index_from_string('BX')
print('Workbook loaded')
for x in range(2, sheet.max_row):
    if sheet.cell(row = x, column = a).value:
        lat += 1
    if sheet.cell(row = x, column = b).value:
        loc += 1
    if sheet.cell(row = x, column = c).value:
        ele += 1
    print((x/sheet.max_row) * 100, '%')
print('Latitude: ', lat/sheet.max_row)
print('Location', loc/sheet.max_row)
print('Elevation', ele/sheet.max_row)
If you are simply trying to do the calculation on a table in the sheet rather than the entire sheet, you could make one adjustment to make it faster (shown here in VBA):
Dim row As Long
row = 2 ' skip the header row, as the Python loop does
' Stop at the first empty cell in column A, i.e. the end of the table
Do Until IsEmpty(Range("A" & row).Value)
    If Not IsEmpty(Range("B" & row).Value) Then lat = lat + 1
    If Not IsEmpty(Range("C" & row).Value) Then loc = loc + 1
    If Not IsEmpty(Range("D" & row).Value) Then ele = ele + 1
    row = row + 1
Loop
This would take you to the end of your defined table rather than the end of the whole sheet, which is likely most of the reason it's taking so long.
Hope this helps.
Your problem is that, despite advice in the documentation to the contrary, you're using your own counters to access cells. In read-only mode, each call to ws.cell() forces the worksheet to reparse its XML source. Simply use ws.iter_rows(min_col=b, max_col=a) to get the cells in the columns you're interested in (BU = b is the leftmost and CF = a the rightmost of the three, so min_col must be b, not a).
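One way to follow that advice is to fetch only the BU:CF block once with `iter_rows(values_only=True)` and count in a single pass. The sketch below keeps the counting in a plain function (`count_filled`, a hypothetical name) so it can be tried without a workbook. The column numbers are what `column_index_from_string` returns for the question's columns (BU = 73, BX = 76, CF = 84).

```python
def count_filled(rows, offsets):
    """Count non-empty cells per named column in one pass over the rows.

    rows: tuples such as those yielded by
        sheet.iter_rows(min_row=2, min_col=first_col, max_col=last_col, values_only=True)
    offsets: {name: position of that column inside each tuple}
    Returns (counts, number_of_rows). A cell counts as filled when it is truthy,
    matching the `if ...value:` test in the question (so 0 and "" count as empty).
    """
    counts = dict.fromkeys(offsets, 0)
    total = 0
    for row in rows:
        total += 1
        for name, offset in offsets.items():
            if offset < len(row) and row[offset]:
                counts[name] += 1
    return counts, total


# Usage sketch with openpyxl in read-only mode (BU = 73, BX = 76, CF = 84):
# rows = sheet.iter_rows(min_row=2, min_col=73, max_col=84, values_only=True)
# counts, total = count_filled(rows, {"Latitude": 84 - 73,
#                                     "Location": 73 - 73,
#                                     "Elevation": 76 - 73})
# for name, filled in counts.items():
#     print(name, filled / total)
```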
I'm working on an application that processes huge Excel 2007 files, and I'm using OpenPyXL to do it. OpenPyXL has two different methods of reading an Excel file - one "normal" method where the entire document is loaded into memory at once, and one method where iterators are used to read row-by-row.
The problem is that when I'm using the iterator method, I don't get any document metadata like column widths and row/column count, and I really need this data. I assume this data is stored near the top of the Excel document, so it shouldn't be necessary to load the whole 10 MB file into memory to get it.
So, is there a way to get ahold of the row/column count and column widths without loading the entire document into memory first?
Adding on to what Hubro said, apparently get_highest_row() has been deprecated. Using the max_row and max_column properties returns the row and column count. For example:
from openpyxl import load_workbook
wb = load_workbook(path, read_only=True) # use_iterators=True in older openpyxl versions
sheet = wb.worksheets[0]
row_count = sheet.max_row
column_count = sheet.max_column
The solution suggested in this answer has been deprecated, and might no longer work.
Taking a look at the source code of OpenPyXL (IterableWorksheet) I've figured out how to get the column and row count from an iterator worksheet:
wb = load_workbook(path, use_iterators=True)
sheet = wb.worksheets[0]
row_count = sheet.get_highest_row() - 1
column_count = letter_to_index(sheet.get_highest_column()) + 1
IterableWorksheet.get_highest_column returns a string with the column letter that you can see in Excel, e.g. "A", "B", "C" etc. Therefore I've also written a function to translate the column letter to a zero based index:
def letter_to_index(letter):
    """Converts a column letter, e.g. "A", "B", "AA", "BC" etc. to a zero based
    column index.
    A becomes 0, B becomes 1, Z becomes 25, AA becomes 26 etc.
    Args:
        letter (str): The column index letter.
    Returns:
        The column index as an integer.
    """
    letter = letter.upper()
    result = 0
    for index, char in enumerate(reversed(letter)):
        # Get the ASCII number of the letter and subtract 64 so that A
        # corresponds to 1.
        num = ord(char) - 64
        # Multiply the number with 26 to the power of `index` to get the correct
        # value of the letter based on its index in the string.
        final_num = (26 ** index) * num
        result += final_num
    # Subtract 1 from the result to make it zero-based before returning.
    return result - 1
I still haven't figured out how to get the column sizes though, so I've decided to use a fixed-width font and automatically scaled columns in my application.
Python 3
import openpyxl as xl
wb = xl.load_workbook("Sample.xlsx", read_only=True)
# The 2 lines below do the same thing if the first sheet is named 'sheet'.
sheet = wb['sheet']
sheet = wb.worksheets[0]
row_count = sheet.max_row
column_count = sheet.max_column
# This works for me.
This might be extremely convoluted and I might be missing the obvious, but without OpenPyXL filling in the column_dimensions in Iterable Worksheets (see my comment above), the only way I can see of finding the column size without loading everything is to parse the xml directly:
from xml.etree.ElementTree import iterparse
from openpyxl import load_workbook
# use_iterators was replaced by read_only in later openpyxl versions, and
# _xml_source is a private attribute that may not exist in current versions
wb=load_workbook("/path/to/workbook.xlsx", use_iterators=True)
ws=wb.worksheets[0]
xml = ws._xml_source
xml.seek(0)
for _,x in iterparse(xml):
    name= x.tag.split("}")[-1]
    if name=="col":
        print("Column %(max)s: Width: %(width)s"%x.attrib) # width = x.attrib["width"]
    if name=="cols":
        print("break before reading the rest of the file")
        break
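A variation that avoids openpyxl's private attributes is to open the .xlsx file as a zip archive and stream the worksheet XML with the standard library. In SpreadsheetML, the optional `<dimension>` element (the used range, e.g. `A1:K200`) and the `<cols>` width definitions come before `<sheetData>`, so parsing can stop there. This sketch makes several assumptions. The first worksheet is taken to be stored at `xl/worksheets/sheet1.xml`, which is common but not guaranteed (the authoritative mapping is in `xl/workbook.xml` and its relationships file). `<dimension>` may be missing or stale, depending on the program that wrote the file. Widths are returned in Excel's character-width units as stored. All function names are hypothetical.

```python
import re
import zipfile
from xml.etree.ElementTree import iterparse

# SpreadsheetML main namespace used for worksheet element tags
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def letters_to_number(letters):
    """1-based column number: 'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    number = 0
    for char in letters.upper():
        number = number * 26 + ord(char) - 64
    return number


def ref_size(ref):
    """Rows and columns covered by a range like 'A1:K200' -> (200, 11)."""
    first, _, last = ref.partition(":")
    last = last or first  # a single-cell ref such as 'A1'
    (col1, row1), (col2, row2) = (
        re.fullmatch(r"([A-Za-z]+)(\d+)", cell).groups() for cell in (first, last)
    )
    return (int(row2) - int(row1) + 1,
            letters_to_number(col2) - letters_to_number(col1) + 1)


def sheet_header_info(path, sheet_xml="xl/worksheets/sheet1.xml"):
    """Return (dimension ref or None, {(min_col, max_col): width}).

    Parsing stops at the start of <sheetData>, so the bulk of the cell data
    is never parsed.
    """
    dimension, widths = None, {}
    with zipfile.ZipFile(path) as archive, archive.open(sheet_xml) as source:
        for _, elem in iterparse(source, events=("start",)):
            if elem.tag == NS + "dimension":
                dimension = elem.get("ref")
            elif elem.tag == NS + "col" and elem.get("width") is not None:
                span = (int(elem.get("min")), int(elem.get("max")))
                widths[span] = float(elem.get("width"))
            elif elem.tag == NS + "sheetData":
                break
    return dimension, widths


# Usage (the path is a placeholder):
# ref, widths = sheet_header_info("/path/to/workbook.xlsx")
# if ref:
#     row_count, column_count = ref_size(ref)
```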
https://pythonhosted.org/pyexcel/iapi/pyexcel.sheets.Sheet.html
See row_range(), a utility function to get the row range.
If you use pyexcel, you can call row_range() to get the maximum number of rows.
Tested with Python 3.4.
Options using pandas.
Gets all sheetnames with count of rows and columns.
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
sheetnames = xl.sheet_names
for sheet in sheetnames:
    df = xl.parse(sheet)
    # (rows, columns); the first row is used as the header and is not counted
    dimensions = df.shape
    print(sheet, ' --> ', dimensions)
Single sheet count of rows and columns.
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
sheetnames = xl.sheet_names
df = xl.parse(sheetnames[0]) # [0] get first tab/sheet.
dimensions = df.shape
print(f'sheetname: "{sheetnames[0]}" --> {dimensions}')
Output: sheetname: "Sheet1" --> (row count, column count)