Problems with reading values as formula instead of data - python

When I try to read my Excel file:
from openpyxl import load_workbook
import numpy as np
"read Excel"
wb = load_workbook('Libro1.xlsx')
hoja_1 = wb.get_sheet_by_name('1')
x = np.zeros(hoja_1.max_row)
y = np.zeros(hoja_1.max_row)
for i in range(0, hoja_1.max_row):
    x[i] = hoja_1.cell(row = i + 1, column = 1).value
    y[i] = hoja_1.cell(row = i + 1, column = 2).value
print(x)
print(y)
I get an error in:
x[i] = hoja_1.cell(row = i + 1, column = 1).value
ValueError: could not convert string to float: '=A1+1'

x and y are NumPy float arrays, so assigning the formula string '=A1+1' to one of their elements fails. Convert them to lists before assigning values.
Try this:
x = list(np.zeros(hoja_1.max_row))
y = list(np.zeros(hoja_1.max_row))

One possibility, although a rare one, is that a cell in 'Libro1.xlsx' is formatted as text, which prevents it from recalculating.
When you open the Excel file, do you see the value of =A1+1 recalculated, or is it shown as text?
In any case, apply the simplest numeric formatting to all cells in the Excel file (select the cells and choose a format), and also remove any conditional formatting.
A similar problem is described here: https://bitbucket.org/openpyxl/openpyxl/issues/699/valueerror-could-not-convert-string-to. Make sure your openpyxl version is up to date.
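Another angle worth checking: by default openpyxl returns the formula text of a cell, and load_workbook accepts data_only=True to return the value Excel cached the last time it saved the file. The sketch below builds a small workbook in memory (an assumption made so the example runs without 'Libro1.xlsx') and loads it both ways. The helper to_float is a hypothetical name, not part of openpyxl.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a stand-in workbook in memory; A2 holds a formula like the one in the error.
source = Workbook()
sheet = source.active
sheet['A1'] = 1
sheet['A2'] = '=A1+1'
buffer = BytesIO()
source.save(buffer)

buffer.seek(0)
formulas = load_workbook(buffer)                # default: formulas come back as strings
buffer.seek(0)
values = load_workbook(buffer, data_only=True)  # cached results instead of formulas

formula_text = formulas.active['A2'].value
cached_value = values.active['A2'].value


def to_float(value):
    """Hypothetical helper: map a missing cached value to NaN instead of failing."""
    return float('nan') if value is None else float(value)
```

Because this file was written by openpyxl and never opened in Excel, there is no cached result for A2 and cached_value is None. A file saved by Excel itself would normally carry the computed number, which can then go straight into the NumPy array.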

Related

Change dates format in one column (replace, insert column, append), any way to update? ..it must be simple

I need to change the date format in an Excel column.
I can update a single cell, but I am stuck when it comes to updating the whole column with "proper_date".
wb = load_workbook(...)
ws = wb['Lista']
daty_wystawienia = ws['G']
# This works, but it appends the values to the first column, below the table
for daty in daty_wystawienia:
    date_string = daty.value
    if re.search('[0-9-]', str(date_string)):
        proper_date = datetime.datetime.strptime(date_string, '%d-%m-%Y').strftime('%y.%m.%d')
        for row in range(1):
            ws.append([proper_date])
# I tried changing the last line to daty_wystawienia.append([proper_date]) but got:
# AttributeError: 'tuple' object has no attribute 'append'
wb.save(...)
# I also tried this, and it is the only thing that seems to work: replacing values one by one with correctly formatted ones. But I need this applied to the whole column at once:
wb = load_workbook(...)
ws = wb['Lista polis']
daty_wystawienia = ws['G']
ws['G6'] = "19.05.06"
ws['G7'] = "19.05.06"
ws['G8'] = "19.05.06"
ws['G10'] = "19.05.07"
ws['G11'] = "19.05.07"
# or replace
for i in ws['G']:
    ws['G9'] = ws['G9'].value.replace('06-05-2019', '10000000000')
wb.save(...)
Is there any way to replace, append, or override existing values in Excel using openpyxl? I am stuck on this.
Thanks in advance.
If you just want Excel to change the format of the Cell in order to display the date as you like, this is how I did it for a column:
from openpyxl import load_workbook
book = load_workbook('Example.xlsx')
ws = book['Sheet1']
for x in range(1, 500):
    _cell = ws.cell(x, 1)
    _cell.number_format = '[$-en-GB]dd-mmm-yyyy'
book.save("Dates.xlsx")
Thanks for your effort. It looks beautiful, but for some reason it does not work for me.
I went through it like this:
def date_of_issuance():
    for i in ws.iter_rows():
        for cell in i:
            d_w = 'Date of issuance'
            if cell.value == d_w:
                c = cell.column
                col = column_index_from_string(c)
                r = cell.row
                for daty in ws[c]:
                    date_string = daty.value
                    if re.search('[0-9]', str(date_string)):
                        proper_date = datetime.datetime.strptime(date_string, '%d-%m-%Y').strftime('%y-%m-%d')
                        date = datetime.datetime.strptime(proper_date, '%y-%m-%d').date()
                        for j in range(1):
                            ws.cell(row=r+1, column=col, value=date)
                            r += 1
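For updating the whole column in place, one option is to assign to each cell's value while looping over the column, instead of appending rows. A minimal sketch, assuming the dates are stored as 'dd-mm-YYYY' strings below a header row. The function name convert_date_column, the sample data and the 'yy.mm.dd' display format are illustrative assumptions, not taken from the original workbook.

```python
import datetime

from openpyxl import Workbook


def convert_date_column(ws, column_letter, first_row=2):
    """Replace 'dd-mm-YYYY' strings in one column with real date objects."""
    converted = 0
    for cell in ws[column_letter]:
        if cell.row < first_row or not isinstance(cell.value, str):
            continue
        try:
            parsed = datetime.datetime.strptime(cell.value, '%d-%m-%Y').date()
        except ValueError:
            continue  # leave anything that is not a dd-mm-YYYY string untouched
        cell.value = parsed               # overwrite the cell in place
        cell.number_format = 'yy.mm.dd'   # assumed display format; adjust as needed
        converted += 1
    return converted


# Hypothetical stand-in sheet; with a real file use load_workbook(...) instead.
wb = Workbook()
ws = wb.active
ws.append(['Date of issuance'])
ws.append(['06-05-2019'])
ws.append(['07-05-2019'])
count = convert_date_column(ws, 'A')
```

Storing real date objects and letting number_format control the display keeps the column sortable in Excel, which text such as "19.05.06" would not be.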

Quickly count non empty cells in large excel sheet

I'm trying to determine how much data is missing from a large excel sheet. The following code takes a prohibitive amount of time to complete. I've seen similar questions, but I'm not sure how to translate the answer to this case. Any help would be appreciated!
import openpyxl
wb = openpyxl.load_workbook('C://Users/Alec/Documents/Vertnet master list.xlsx', read_only = True)
sheet = wb.active
lat = 0
loc = 0
ele = 0
a = openpyxl.utils.cell.column_index_from_string('CF')
b = openpyxl.utils.cell.column_index_from_string('BU')
c = openpyxl.utils.cell.column_index_from_string('BX')
print('Workbook loaded')
for x in range(2, sheet.max_row):
    if sheet.cell(row = x, column = a).value:
        lat += 1
    if sheet.cell(row = x, column = b).value:
        loc += 1
    if sheet.cell(row = x, column = c).value:
        ele += 1
    print((x/sheet.max_row) * 100, '%')
print('Latitude: ', lat/sheet.max_row)
print('Location', loc/sheet.max_row)
print('Elevation', ele/sheet.max_row)
If you only need to run the calculation on a table within the sheet rather than on the entire sheet, one adjustment would make it faster (VBA-style pseudocode):
row = 1
Do Until IsEmpty(range("A1").offset(row,1).value)
    if range("B"&row).value: lat += 1
    if range("C"&row).value: loc += 1
    if range("D"&row).value: ele += 1
    row = row + 1
Loop
This would take you to the end of your defined table rather than the end of the whole sheet which is 90% of the reason it's taking you so long.
Hope this helps
Your problem is that, despite advice in the documentation to the contrary, you're using your own counters to access cells. In read-only mode each use of ws.cell() will force the worksheet to reparse the XML source for the worksheet. Simply use ws.iter_rows(min_col=b, max_col=a) to get the cells in the columns you're interested in (BU is the leftmost of the three columns and CF the rightmost).
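The single-pass approach can be sketched as follows. It streams each row once with iter_rows and picks the three columns out of the row tuple. The workbook is built in memory with made-up data so the example runs without the original file; the names COLUMNS, INDEXES and count_filled are assumptions for illustration.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.utils import column_index_from_string

# Column letters from the question; the dictionary keys are illustrative names.
COLUMNS = {'loc': 'BU', 'ele': 'BX', 'lat': 'CF'}
INDEXES = {name: column_index_from_string(letter) for name, letter in COLUMNS.items()}
FIRST, LAST = min(INDEXES.values()), max(INDEXES.values())


def count_filled(sheet):
    """Count non-empty cells per column in a single pass over the rows."""
    counts = dict.fromkeys(INDEXES, 0)
    total_rows = 0
    # min_row=2 skips the header row; values_only yields plain tuples of values.
    for row in sheet.iter_rows(min_row=2, min_col=FIRST, max_col=LAST, values_only=True):
        total_rows += 1
        for name, index in INDEXES.items():
            if row[index - FIRST] is not None:
                counts[name] += 1
    return counts, total_rows


# Hypothetical stand-in data, built in memory instead of opening the real file.
source = Workbook()
ws = source.active
for name, index in INDEXES.items():
    ws.cell(row=1, column=index, value=name)
ws.cell(row=2, column=INDEXES['loc'], value='site A')
ws.cell(row=2, column=INDEXES['ele'], value=120)
ws.cell(row=2, column=INDEXES['lat'], value=35.1)
ws.cell(row=3, column=INDEXES['lat'], value=36.2)
ws.cell(row=4, column=INDEXES['loc'], value='site C')
ws.cell(row=4, column=INDEXES['ele'], value=80)
buffer = BytesIO()
source.save(buffer)
buffer.seek(0)

wb = load_workbook(buffer, read_only=True)
counts, total_rows = count_filled(wb.active)
wb.close()
```

The check uses "is not None" rather than truthiness, so an elevation of 0 still counts as present. That differs from the original loop, where a value of 0 would have been treated as missing.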

Python to excel array broken into characters

When I write the data to Excel, every character ends up in its own cell, so Tuesday looks like T,u,e,s,d,a,y. The goal is for each cell to hold a whole word rather than a single character. There are many for loops, and I am struggling to find where the problem is. Any ideas?
import requests
from pprint import pprint
from xml.dom.minidom import parseString
from openpyxl import Workbook
NMNorth2=[("Farmington"),("Gallup"),("Grants"),("Las_Vegas"),("Raton"),("Santa_Fe"), ("Taos"),("Tijeras"),("Tucumcari")]
NMNorth=[("NM", "Farmington"),("NM", "Gallup"),("NM", "Grants"),("NM", "Las_Vegas"),("NM", "Raton"),("NM", "Santa_Fe"), ("NM", "Taos"),("NM", "Tijeras"),("NM", "Tucumcari")]
wb = Workbook()
dest_filename = 'weather.xlsx'
ws1 = wb.active
ws1.title = "Weather"
for state, city in NMNorth:
    r = requests.get("http://api.wunderground.com/api/id/forecast/q/"+state+"/"+city+".json")
    data = r.json()
    forecast = data['forecast']['txt_forecast']['forecastday']
    for n in forecast:
        day = n['title']
        forecaststm = (n['fcttext'])
        columnVariable = 2
        for x in day:
            ws1.cell(row = 1, column = columnVariable).value = x
            columnVariable +=1
        for y in forecaststm:
            ws1.cell(row = 2, column = columnVariable).value = y
            columnVariable +=1
rowVariable = 2
ws1.cell(row = 1, column = 1).value = "City"
for state in NMNorth2:
    ws1.cell(row = rowVariable, column = 1).value = state
    rowVariable +=1
wb.save(filename = dest_filename)
The issue here is that python treats strings as iterables. In other words, this bites you if you think you're iterating through a list of strings (or similar) and go one level too deep in nested for loops; the easiest way to identify this is to print what you're working with on each loop.
In your case, the below loop is taking each letter (x) in the day of the week (day), writing it to a column and then incrementing the column you're writing to (columnVariable):
for x in day:
    ws1.cell(row = 1, column = columnVariable).value = x
    columnVariable +=1
As an aside, camelCase isn't standard in Python; it's more common to use underscores, e.g. column_variable. See PEP 8.
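One way to avoid the extra level of iteration is to loop over the forecast periods and write each whole string to a single cell. A small sketch, assuming a hard-coded forecast list in place of the API response (the sample entries are invented; the 'title' and 'fcttext' keys come from the question):

```python
from openpyxl import Workbook

# Hypothetical stand-in for data['forecast']['txt_forecast']['forecastday'].
forecast = [
    {'title': 'Tuesday', 'fcttext': 'Sunny.'},
    {'title': 'Tuesday Night', 'fcttext': 'Clear.'},
]

wb = Workbook()
ws1 = wb.active
ws1.title = "Weather"
ws1.cell(row=1, column=1, value="City")
ws1.cell(row=2, column=1, value="Farmington")

# One column per forecast period: the whole string goes into one cell.
for column, period in enumerate(forecast, start=2):
    ws1.cell(row=1, column=column, value=period['title'])
    ws1.cell(row=2, column=column, value=period['fcttext'])
```

Here enumerate replaces the manual column counter, so there is no loop over the characters of day or forecaststm at all.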

Is it possible to get an Excel document's row count without loading the entire document into memory?

I'm working on an application that processes huge Excel 2007 files, and I'm using OpenPyXL to do it. OpenPyXL has two different methods of reading an Excel file - one "normal" method where the entire document is loaded into memory at once, and one method where iterators are used to read row-by-row.
The problem is that when I'm using the iterator method, I don't get any document metadata like column widths and row/column count, and I really need this data. I assume this data is stored in the Excel document close to the top, so it shouldn't be necessary to load the whole 10MB file into memory to get access to it.
So, is there a way to get ahold of the row/column count and column widths without loading the entire document into memory first?
Adding on to what Hubro said, get_highest_row() has been deprecated. The max_row and max_column properties return the row and column count. For example:
wb = load_workbook(path, read_only=True)  # read_only replaced the older use_iterators flag
sheet = wb.worksheets[0]
row_count = sheet.max_row
column_count = sheet.max_column
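For a self-contained check of the same idea, the sketch below writes a three-row workbook to memory and reads the counts back in read-only mode. The sample data is made up. The fallback branch is an assumption for files whose dimension record is missing, in which case max_row and max_column can come back as None. It does not run for this sample.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Hypothetical sample data written to an in-memory file.
source = Workbook()
source.active.append(['id', 'name', 'score'])
source.active.append([1, 'a', 0.5])
source.active.append([2, 'b', 0.7])
buffer = BytesIO()
source.save(buffer)
buffer.seek(0)

wb = load_workbook(buffer, read_only=True)
sheet = wb.worksheets[0]
row_count = sheet.max_row
column_count = sheet.max_column

# Some writers omit the dimension record; then the counts are unknown up front
# and openpyxl has to scan the sheet once to work them out.
if row_count is None or column_count is None:
    sheet.reset_dimensions()
    sheet.calculate_dimension(force=True)
    row_count, column_count = sheet.max_row, sheet.max_column

wb.close()
```

In the normal case the counts come from the sheet's stored dimension record, so no cell data has to be read to get them.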
The solution suggested in this answer has been deprecated, and might no longer work.
Taking a look at the source code of OpenPyXL (IterableWorksheet) I've figured out how to get the column and row count from an iterator worksheet:
wb = load_workbook(path, use_iterators=True)
sheet = wb.worksheets[0]
row_count = sheet.get_highest_row() - 1
column_count = letter_to_index(sheet.get_highest_column()) + 1
IterableWorksheet.get_highest_column returns a string with the column letter that you can see in Excel, e.g. "A", "B", "C" etc. Therefore I've also written a function to translate the column letter to a zero based index:
def letter_to_index(letter):
    """Converts a column letter, e.g. "A", "B", "AA", "BC" etc. to a zero based
    column index.
    A becomes 0, B becomes 1, Z becomes 25, AA becomes 26 etc.
    Args:
        letter (str): The column index letter.
    Returns:
        The column index as an integer.
    """
    letter = letter.upper()
    result = 0
    for index, char in enumerate(reversed(letter)):
        # Get the ASCII number of the letter and subtract 64 so that A
        # corresponds to 1.
        num = ord(char) - 64
        # Multiply the number with 26 to the power of `index` to get the correct
        # value of the letter based on its index in the string.
        final_num = (26 ** index) * num
        result += final_num
    # Subtract 1 from the result to make it zero-based before returning.
    return result - 1
I still haven't figured out how to get the column sizes though, so I've decided to use a fixed-width font and automatically scaled columns in my application.
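As a side note on the letter conversion, openpyxl ships its own helpers in openpyxl.utils, so a hand-written converter may not be needed. A short sketch follows; zero_based_index is a hypothetical wrapper name, and the built-in function returns a 1-based index.

```python
from openpyxl.utils import column_index_from_string, get_column_letter


def zero_based_index(letter):
    """Hypothetical wrapper: openpyxl's helper is 1-based, so subtract 1."""
    return column_index_from_string(letter) - 1


first = zero_based_index('A')
last_single = zero_based_index('Z')
first_double = zero_based_index('AA')
round_trip = get_column_letter(first_double + 1)
```

The results match the examples in the docstring above: A maps to 0, Z to 25 and AA to 26, and get_column_letter converts back in the other direction.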
Python 3:
import openpyxl as xl
wb = xl.load_workbook("Sample.xlsx", read_only=True)
# The two lines below do the same thing.
sheet = wb.get_sheet_by_name('sheet')
sheet = wb.worksheets[0]
row_count = sheet.max_row
column_count = sheet.max_column
# This works for me.
This might be extremely convoluted and I might be missing the obvious, but without OpenPyXL filling in the column_dimensions in Iterable Worksheets (see my comment above), the only way I can see of finding the column size without loading everything is to parse the xml directly:
from xml.etree.ElementTree import iterparse
from openpyxl import load_workbook
wb = load_workbook("/path/to/workbook.xlsx", use_iterators=True)
ws = wb.worksheets[0]
xml = ws._xml_source
xml.seek(0)
for _, x in iterparse(xml):
    name = x.tag.split("}")[-1]
    if name == "col":
        print("Column %(max)s: Width: %(width)s" % x.attrib)  # width = x.attrib["width"]
    if name == "cols":
        print("break before reading the rest of the file")
        break
https://pythonhosted.org/pyexcel/iapi/pyexcel.sheets.Sheet.html
See row_range(), a utility function to get the row range.
If you use pyexcel, you can call row_range() to get the maximum row count.
Tested with Python 3.4.
Options using pandas.
Gets all sheetnames with count of rows and columns.
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
sheetnames = xl.sheet_names
for sheet in sheetnames:
    df = xl.parse(sheet)
    dimensions = df.shape
    print(sheet, ' --> ', dimensions)
Single sheet count of rows and columns.
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
sheetnames = xl.sheet_names
df = xl.parse(sheetnames[0]) # [0] get first tab/sheet.
dimensions = df.shape
print(f'sheetname: "{sheetnames[0]}" - -> {dimensions}')
Output: sheetname: "Sheet1" - -> (row count, column count)

Reading numeric Excel data as text using xlrd in Python

I am trying to read in an Excel file using xlrd, and I am wondering if there is a way to ignore the cell formatting used in Excel file, and just import all data as text?
Here is the code I am using so far:
import xlrd
xls_file = 'xltest.xls'
xls_workbook = xlrd.open_workbook(xls_file)
xls_sheet = xls_workbook.sheet_by_index(0)
raw_data = [['']*xls_sheet.ncols for _ in range(xls_sheet.nrows)]
raw_str = ''
feild_delim = ','
text_delim = '"'
for rnum in range(xls_sheet.nrows):
    for cnum in range(xls_sheet.ncols):
        raw_data[rnum][cnum] = str(xls_sheet.cell(rnum,cnum).value)
for rnum in range(len(raw_data)):
    for cnum in range(len(raw_data[rnum])):
        if (cnum == len(raw_data[rnum]) - 1):
            feild_delim = '\n'
        else:
            feild_delim = ','
        raw_str += text_delim + raw_data[rnum][cnum] + text_delim + feild_delim
final_csv = open('FINAL.csv', 'w')
final_csv.write(raw_str)
final_csv.close()
This code is functional, but there are certain fields, such as a zip code, that are imported as numbers, so they have the decimal zero suffix. For example, if there is a zip code of '79854' in the Excel file, it will be imported as '79854.0'.
I have tried finding a solution in this xlrd spec, but was unsuccessful.
That's because integer values in Excel are imported as floats in Python. Thus, sheet.cell(r,c).value returns a float. Try converting the values to integers but first make sure those values were integers in Excel to begin with:
cell = sheet.cell(r,c)
cell_value = cell.value
if cell.ctype in (2,3) and int(cell_value) == cell_value:
    cell_value = int(cell_value)
It is all in the xlrd spec.
I know this isn't part of the question, but I would get rid of raw_str and write directly to your csv. For a large file (10,000 rows) this will save loads of time.
You can also get rid of raw_data and just use one for loop.
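The two suggestions above (converting whole-number floats and writing straight to the CSV) could be combined roughly like this. It is a sketch that uses only the standard library, with a hard-coded list standing in for the sheet rows. With xlrd, each row could come from xls_sheet.row_values(rnum). The names cell_to_text, write_rows and sample_rows are illustrative assumptions.

```python
import csv
import io


def cell_to_text(value):
    """Render a cell value as text, dropping the '.0' from whole-number floats."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def write_rows(rows, stream):
    """Write rows directly to the stream, one at a time, quoting every field."""
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL, lineterminator='\n')
    for row in rows:
        writer.writerow([cell_to_text(value) for value in row])


# Hypothetical stand-in for the values read from the sheet.
sample_rows = [['zip', 'rate'], [79854.0, 1.5], ['00501', 3.0]]
buffer = io.StringIO()
write_rows(sample_rows, buffer)
csv_text = buffer.getvalue()
```

Note that this strips the ".0" from every whole-number float, including a rate of 3.0, so it only fits columns where the values were integers in Excel to begin with. For a real file, pass an open file object (opened with newline='') instead of the StringIO buffer.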
