OpenPyXL - DataValidation adding list to the cell (issue) - python

This is my code:
dv = DataValidation(type="list", formula1='"11111,22222,33333,44444,55555,66666,77777,88888,99999,111110,122221,133332,144443,155554,166665,177776,188887,199998,211109,222220,233331,244442,255553,266664,277775,288886,299997,311108,322219,333330,344441,355552,366663,377774,388885,399996,411107,422218,433329,444440,455551,466662,477773,488884,499995,511106,522217,533328,544439,555550,566661,577772,588883,599994,611105,622216,633327,644438,655549,666660,677771,688882,699993,711104,722215,733326,744437,755548,766659,777770,788881,799992,811103,822214,833325,844436,855547,866658,877769,888880,899991,911102,922213,933324,944435,955546,966657,977768,988879,999990,1011101,1022212,1033323,1044434,1055545,1066656,1077767,1088878,1099989,1111100,1122211"', allow_blank=False)
sheet.add_data_validation(dv)
dv.add('K5')
But then I have an issue when the file is opened in Excel.
However, if the formula1 list is small, everything works fine.
What is the correct way to add a big list of options that will not cause this issue?

Excel imposes additional limits on what it accepts; in particular, a literal list in formula1 is limited to 255 characters. See https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oi29500/8ebf82e4-4fa4-43a6-9ecd-d2d793a6f4bf. The implementer's notes contain additional information, but I cannot find the exact passage referred to.
Basically, I think it's generally easier to refer to values on a separate sheet.
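A sketch of that separate-sheet approach with openpyxl (the sheet name "Lists" and the three sample values are stand-ins; in practice you would write out the full list of options):

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
sheet = wb.active

# Keep the allowed values on a separate sheet instead of inside formula1.
options = wb.create_sheet("Lists")
values = [11111, 22222, 33333]  # ... the full list of options goes here
for i, v in enumerate(values, start=1):
    options.cell(row=i, column=1, value=v)

# Point the validation at that range; this sidesteps the 255-character
# limit on literal lists.
dv = DataValidation(type="list",
                    formula1=f"Lists!$A$1:$A${len(values)}",
                    allow_blank=False)
sheet.add_data_validation(dv)
dv.add("K5")
wb.save("validated.xlsx")
```

The "Lists" sheet can also be hidden so users only ever see the dropdown.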

Related

Transpose in Excel while encounter "--------"

I have some data in a single column (as in the picture below), split into blocks by separator rows of hyphens (---).
I want to transpose each block of column data into row format, as in the second picture below.
Anyone who could give me a hint would be highly appreciated.
And I don't mind using Python to solve this problem.
In B15 copied across to suit:
=IF($A15="---",OFFSET($A15,COLUMN()-16,),"")
and then all formulae copied down to suit. May be convenient then to Copy, Paste Special Values, filter to select --- and copy as required into another sheet.
--- represents however many hyphens are appropriate.
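Since the asker doesn't mind Python, here is a minimal sketch of the same idea in pure Python (the sample values are made up; in practice the column would come from a spreadsheet reader such as openpyxl or csv):

```python
# Read the single column, split on the hyphen separator rows, and
# emit each block as one output row.
column = ["a1", "a2", "a3", "---", "b1", "b2", "---", "c1", "c2", "c3"]

rows, block = [], []
for value in column:
    if value and set(value) == {"-"}:  # a separator row of any length
        if block:
            rows.append(block)
            block = []
    else:
        block.append(value)
if block:  # don't lose the last block if there is no trailing separator
    rows.append(block)

print(rows)  # [['a1', 'a2', 'a3'], ['b1', 'b2'], ['c1', 'c2', 'c3']]
```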

How to Check the cell from excel which has formula using python

I want to access the cells that contain formulas in an Excel workbook. My Python script works fine for reading data from Excel, but I need to find only the cells that contain a formula and print those cells.
An example would be really appreciated.
Two things that could help you solve the problem:
There's a typo: HasForumla instead of HasFormula
e is a string, not a cell object, so e.HasFormula won't work.
You probably want to use
e = sheet.cell(row,17)
to access the Cell object at that position. I'm not sure where you got your HasFormula attribute from, though - couldn't find it in the docs.
Edit:
I just looked at the README for the current distribution where it's stated that
xlrd will safely and reliably ignore any of these if present in the
file:
[...]
Formulas (results of formula calculations are extracted, of course).
[...]
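xlrd indeed discards formulas, but if the file is .xlsx, openpyxl can report them: when a workbook is loaded without data_only=True, a formula cell's value is the formula text and its data_type is 'f'. A self-contained sketch (the tiny demo workbook is built in place so the example runs as-is):

```python
from openpyxl import Workbook, load_workbook

# Build a tiny workbook so the example is self-contained.
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=SUM(A1:A2)"  # a formula cell
wb.save("demo.xlsx")

# Loaded normally (not data_only), formula cells keep their formula text.
wb = load_workbook("demo.xlsx")
formulas = [(c.coordinate, c.value)
            for row in wb.active.iter_rows()
            for c in row if c.data_type == "f"]
print(formulas)  # [('A3', '=SUM(A1:A2)')]
```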

Python sas7bdat module usage

I have to dump data from SAS datasets. I found a Python module called sas7bdat.py that says it can read SAS .sas7bdat datasets, and I think it would be simpler and more straightforward to do the project in Python rather than SAS due to the other functionality required. However, the help(sas7bdat) in interactive Python is not very useful and the only example I was able to find to dump a dataset is as follows:
import sas7bdat
from sas7bdat import *
# following line is sas dataset to convert
foo = SAS7BDAT('/support/sas/locked_data.sas7bdat')
#following line is txt file to create
foo.convertFile('/support/textfiles/locked_data.txt','\t')
This doesn't do what I want because a) it uses the SAS variable names as column headers and I need it to use the variable labels, and b) it uses "nan" to denote missing numeric values where I'd rather just leave the value blank.
Can anyone point me to some useful documentation on the methods included in sas7bdat.py? I've Googled every permutation of key words that I could think of, with no luck. If not, can someone give me an example or two of using readColumnAttributes(), readColumnLabels(), and/or readColumnNames()?
Thanks, all.
As time passes, solutions become easier. I think this one is easiest if you want to work with pandas:
import pandas as pd
df = pd.read_sas('/support/sas/locked_data.sas7bdat')
Note that it is easy to get a numpy array by using df.values
This is only a partial answer as I've found no [easy to read] concrete documentation.
You can view the source code here
This shows some basic info regarding what arguments the methods require, such as:
readColumnAttributes(self, colattr)
readColumnLabels(self, collabs, coltext, colcount)
readColumnNames(self, colname, coltext)
I think most of what you are after is stored in the "header" class returned when creating an object with SAS7BDAT. If you just print that class you'll get a lot of info, but you can also access class attributes as well. I think most of what you may be looking for would be under foo.header.cols. I suspect you use various header attributes as parameters for the methods you mention.
Maybe something like this will get you closer?
from sas7bdat import SAS7BDAT
foo = SAS7BDAT(inFile)  # your file here...
for i in foo.header.cols:
    print('"Attributes"', i.attr)
    print('"Labels"', i.label)
    print('"Name"', i.name)
edit: Unrelated to this specific question, but the type() and dir() commands come in handy when trying to figure out what is going on in an unfamiliar class/library
I know I'm late with the answer, but in case someone searches for a similar question, the best option is:
from sas7bdat import SAS7BDAT
foo = SAS7BDAT('/support/sas/locked_data.sas7bdat')
# This converts the dataset to a pandas dataframe:
ds = foo.to_data_frame()
Personally I think the better approach would be to export the data using SAS then process the external file as needed using Python.
In SAS, you can do this...
libname datalib "/support/sas";
filename sasdump "/support/textfiles/locked_data.txt";
proc export
    data = datalib.locked_data
    outfile = sasdump
    dbms = tab
    label
    replace;
run;
The downside to this is that while the column labels are used rather than the variable names, the labels are enclosed in double quotes. When processing in Python, you may need to programmatically remove them if they cause a problem. I hope that helps even though it doesn't use Python like you wanted.
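Stripping those quotes in Python is straightforward: the csv module treats a double-quoted field as quoted even in tab-delimited data. A sketch with a hypothetical two-line export (the column labels are made up):

```python
import csv
import io

# Hypothetical PROC EXPORT output: the header labels come out quoted.
exported = '"Patient ID"\t"Visit Date"\n101\t2013-05-01\n'

reader = csv.reader(io.StringIO(exported), delimiter="\t")
rows = list(reader)  # csv strips the surrounding double quotes itself
print(rows[0])  # ['Patient ID', 'Visit Date']
```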

python: xlrd/csv - empty cell treatment when xlrd/csv packages read objects into memory

Is there an option to change the default way the csv and xlrd packages handle empty cells? By default empty cells are assigned an empty string value = ''. This is problematic when one is working with databases because an empty string is not a None value, which many python packages that interface with databases (SQLAlchemy for example) can handle as a Null for database consumption.
For example, if an empty cell occurs in a field that is supposed to be a decimal/integer/float/double, the database will throw an exception because a string was inserted into a field of that numeric type.
I haven't found any examples or documentation that shows how I can do this. My current approach is to inspect the data and do the following:
if item[i] == '':
    item[i] = None
The problem with this is that I don't own the data and have no control over its quality. I imagine this is a common occurrence, since a lot of apps use files/data produced by sources other than themselves.
If there is a way to change the default treatment then that would be a sensible approach in my opinion.
I have the same setup as yourself (sqlalchemy for the ORM, and data that I have little control over, being fed through excel files). I found that I need to curate the data from the xlrd before dumping it in the database. I am not aware of any tweaks that you can apply on the xlrd module.
On a more general note:
It is probably best to try to get as large a sample of example Excel files as you can and see if your application can cope with them. I found that occasionally weird characters make it through Excel (people copy-paste from different languages), which cause crashes further down. I also found that in some cases the file encoding was not UTF-8 but iso-8859 or something else. I ended up using iconv to convert the files.
you may also want to have a look at this stackoverflow article
Overall xlrd has worked for us, but I am less than impressed with the activity around the project. Seems like I am using a library that has little maintenance.
You could use the following code to change the values of all empty cells in the sheet you are reading in to NULL (or None, or whatever you like) before you actually read in the data. It loops through all rows and columns and checks if the cell_type is EMPTY and then changes the value of the respective cell to 'NULL'.
import xlrd

book = xlrd.open_workbook("data.xlsx")
sheet_name = book.sheet_names()[0]  # the name of the first sheet
sheet = book.sheet_by_name(sheet_name)
for r in range(sheet.nrows):  # loop over every row that contains data
    for c in range(sheet.ncols):  # loop over every column that contains data
        if sheet.cell_type(r, c) == xlrd.XL_CELL_EMPTY:
            sheet._cell_values[r][c] = 'NULL'
Then you can read in the data (e.g. from the first column) and you will get NULL as a value if the cell was previously empty:
for r in range(sheet.nrows):
    data_column_1 = sheet.cell(r, 0).value
xlrd will tell you what type of cell you have (empty or blank, text, number, date, error).
This is covered in the xlrd documentation. Look at the Cell class, and these methods of the Sheet class: cell_type, col_types, and row_types.
The csv format has no way of expressing the difference between "no data at all" and "the value is a zero-length string". You will still need to check for '' and act accordingly.
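So for csv the usual approach is a small post-processing pass, much like the check in the question (the field layout here is made up for illustration):

```python
import csv
import io

# Two tab-separated records; the second row has an empty "price" field.
data = "id\tprice\n1\t9.5\n2\t\n"

reader = csv.reader(io.StringIO(data), delimiter="\t")
header = next(reader)
# Map empty strings to None so a database layer sees NULL, not ''.
rows = [[field if field != "" else None for field in row] for row in reader]
print(rows)  # [['1', '9.5'], ['2', None]]
```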

How do you read a cell's value from an OpenOffice Calc .ods file?

I have been able to read an Excel cell value with xlrd using column and row numbers as inputs. Now I need to access the same cell values in some spreadsheets that were saved in .ods format.
So for example, how would I read with Python the value stored in cell E10 in an .ods file?
Hacking your way through the XML shouldn't be too hard ... but there are complications. Just one example: OOo in their wisdom decided not to write the cell address explicitly. There is no cell attribute like address="E10" or column="E"; you need to count rows and columns.
Five consecutive empty cells are represented by
<table:table-cell table:number-columns-repeated="5" />
The number-columns-repeated attribute defaults to "1" and also applies to non-empty cells.
It gets worse when you have merged cells; you get a covered-table-cell tag which is 90% the same as the table-cell tag, and attributes number-columns-spanned and number-rows-spanned need to be figured into column and row counting.
A table:table-row tag may have a number-rows-repeated attribute. This can be used to repeat the contents of a whole non-empty row, but is most often seen when there are more than 1 consecutive empty rows.
So, even if you would be satisfied with a "works on my data" approach, it's not trivial.
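To make the counting issue concrete, here is a minimal sketch with the standard library that expands number-columns-repeated so cell positions line up with spreadsheet columns (the XML fragment is a hand-made stand-in for a row from content.xml, not output from a real file):

```python
import xml.etree.ElementTree as ET

# A hand-made row: four repeated empty cells, then a value in column E.
xml = """<table:table-row
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0">
  <table:table-cell table:number-columns-repeated="4"/>
  <table:table-cell office:value-type="float" office:value="10"/>
</table:table-row>"""

TABLE = "{urn:oasis:names:tc:opendocument:xmlns:table:1.0}"
OFFICE = "{urn:oasis:names:tc:opendocument:xmlns:office:1.0}"

row = ET.fromstring(xml)
cells = []
for cell in row:
    # Expand repeated cells so list index == column index.
    repeat = int(cell.get(TABLE + "number-columns-repeated", "1"))
    cells.extend([cell.get(OFFICE + "value")] * repeat)

print(cells)  # [None, None, None, None, '10'] -- the value sits in column E
```

Covered cells, spanned cells, and repeated rows would all need the same kind of bookkeeping, which is why rolling your own parser is rarely worth it.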
You may like to look at ODFpy. Note the second sentence: """Unlike other more convenient APIs, this one is essentially an abstraction layer just above the XML format.""" There is an ODF-to-HTML script which (if it is written for ODS as well as for ODT) may be hackable to get what you want.
If you prefer a "works on almost everybody's data and is supported and has an interface that you're familiar with" approach, you may need to wait until the functionality is put into xlrd ... but this isn't going to happen soon.
From libraries that I tried ezodf was the one that worked.
from ezodf import opendoc

doc = opendoc('test.ods')
for sheet in doc.sheets:
    print(sheet.name)
    cell = sheet['E10']
    print(cell.value)
    print(cell.value_type)
pyexcel-ods crashed, odfpy crashed, and in addition odfpy's documentation is either missing or horrible.
Given that supposedly working libraries died on the first file that I tested, I would prefer to avoid writing my own processing, as sooner or later it would either crash or, what is worse, fail silently in some weirder situation.
EDIT: It gets worse. ezodf may silently return bogus data.
