I am trying to set the cell format to display 2 decimals, and I need it to show exactly 2 decimals. Somehow, when I open the file in different office applications, I get mixed results. I am new to openpyxl (using the latest, 2.5.1) and have gone through the documentation; setting the format seems pretty straightforward.
What I have done to set the format is:
ws.column_dimensions['F'].number_format = '#,##0.00'
ws.column_dimensions['G'].number_format = '#,##0.00'
Am I missing anything?
In the attached image, Open Office and LibreOffice are able to show the two decimals I need, but WPS and MS Excel are not. A quick check of the cell format (right-click the cell) shows Number format in Open Office and LibreOffice, but General format in WPS and MS Excel.
Note: I have tried setting more decimals, e.g. 3 or more; Open Office and LibreOffice still show the specified decimals without problems, but WPS and MS Excel don't.
I have found a workaround: I need to set the format on every single cell manually, and that seems to work when opening the file in WPS and MS Excel:
for row in range(1, rows):  # rows: one past the last data row, e.g. ws.max_row + 1
    ws["F{}".format(row)].number_format = '#,##0.00'
    ws["G{}".format(row)].number_format = '#,##0.00'
After doing what was suggested, it still did not work for me.
It seems MS Excel still sees the cell value as a string.
So I tried to do a manual conversion of each cell before formatting:
for row in range(1, rows):
    # convert the stored string to a real number first (empty cells would need to be skipped)
    ws["F{}".format(row)].value = float(ws["F{}".format(row)].value)
    ws["F{}".format(row)].number_format = '#,##0.00'
Thankfully, it worked. I'm using Python 2.7.6 and MS Excel version 1808 (64-bit).
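Putting the two fixes from this thread together: the values have to be stored as numbers, and the format has to be set on each cell, since Excel and WPS appear to use the cell's own format rather than the column's. A minimal sketch, assuming Python 3 and an openpyxl Worksheet (where `ws["F"]` returns the cells of column F); `to_number` and `format_two_decimals` are hypothetical helper names, and the columns F and G come from the question.

```python
TWO_DECIMALS = "#,##0.00"


def to_number(value):
    """Return a float for numeric strings such as "12.5"; leave anything else unchanged."""
    if isinstance(value, str):
        try:
            return float(value.strip())
        except ValueError:
            return value  # genuine text, e.g. a header row
    return value


def format_two_decimals(ws, columns=("F", "G"), number_format=TWO_DECIMALS):
    """Convert numeric strings to floats and set the number format cell by cell.

    `ws` is assumed to behave like an openpyxl Worksheet: ws["F"] yields the
    cells of column F, and each cell has .value and .number_format.
    """
    for column in columns:
        for cell in ws[column]:
            cell.value = to_number(cell.value)
            # format only real numbers (bool is a subclass of int, so exclude it)
            if isinstance(cell.value, (int, float)) and not isinstance(cell.value, bool):
                cell.number_format = number_format
```

Call it as `format_two_decimals(wb.active)` just before `wb.save(...)`. Text that doesn't parse as a number, such as a header, is left alone and keeps its original format.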
You could use Python's round() function to round the figures before writing them to their cells in Excel. I would recommend reading the columns from Excel into lists and then rounding every element:
values = [round(i, 2) for i in values]  # round() returns a new number, so keep the result; two decimal places
Hope this helps
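One thing to keep in mind: round() changes the stored value, not how it is displayed, and it does not add trailing zeros (round(1.5, 2) is still 1.5). Excel will therefore still show 1.5 unless a two-decimal number format is applied; if you need text with exactly two decimals, format() does that. A small standard-library illustration with made-up values:

```python
values = [3.14159, 1.5, 2.0]

rounded = [round(v, 2) for v in values]          # stored values; trailing zeros are not kept
displayed = [format(v, ".2f") for v in rounded]  # text with exactly two decimals

print(rounded)    # [3.14, 1.5, 2.0]
print(displayed)  # ['3.14', '1.50', '2.00']
```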
Related
I convert an XML file to CSV using some code and specifications that I have added.
As a result I get a CSV file, but when I open it I see some weird numbers that look like this:
1,25151E+21
Is there any way to eliminate this and show the whole numbers? The code that parses XML to CSV is working fine, so I'm assuming it is an Excel thing.
I don't want to have to do something manually every time I generate a new CSV file.
Additional
The entire code can be found HERE, and the long numbers are only in Qualify:
for qu in sn.findall('.//Qualify'):
    repeated_values['qualify'] = qu.text
CSV doesn't pass any cell formatting to Excel, so when you open a CSV containing very large numbers, Excel applies its default formatting, which will likely show them in scientific notation. You can try changing the cell format to Number; if that shows the entire number like you want, consider using XlsxWriter to apply cell formatting while writing to .xlsx instead of CSV.
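A sketch of that XlsxWriter route, assuming the values arrive as digit strings (as `qu.text` does in the question). One caveat: Excel stores numbers as doubles with 15 significant digits, so a 22-digit value like the one in the question cannot survive as a number however it is formatted; the sketch therefore writes anything longer than 15 digits (or with leading zeros) as text. `write_long_numbers` and `fits_as_number` are hypothetical helpers; they only rely on XlsxWriter's `write_number` and `write_string` worksheet methods.

```python
EXCEL_MAX_DIGITS = 15  # Excel keeps only 15 significant digits for numeric cells


def fits_as_number(text):
    """True for plain digit strings Excel can store exactly (no leading zeros to lose)."""
    return (
        text.isdecimal()
        and len(text) <= EXCEL_MAX_DIGITS
        and (text == "0" or not text.startswith("0"))
    )


def write_long_numbers(worksheet, column, values, number_format, first_row=0):
    """Write digit strings into one column of an XlsxWriter worksheet.

    Short values become real numbers shown with `number_format` (for example
    workbook.add_format({"num_format": "0"})); everything else is written as
    text so Excel neither rounds it nor switches to scientific notation.
    """
    for offset, value in enumerate(values):
        row = first_row + offset
        text = str(value).strip()
        if fits_as_number(text):
            worksheet.write_number(row, column, int(text), number_format)
        else:
            # long IDs, leading zeros, signs or other text: keep every character
            worksheet.write_string(row, column, text)
```

Usage would look like `workbook = xlsxwriter.Workbook("out.xlsx")`, `sheet = workbook.add_worksheet()`, `fmt = workbook.add_format({"num_format": "0"})`, `write_long_numbers(sheet, 0, qualify_values, fmt)` and finally `workbook.close()`, where `qualify_values` is a hypothetical list of the collected `qu.text` strings.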
I often end up running a lambda on dataframes with this issue when I bring in csv, fwf, etc, for ETL and back out to XLSX. In my case they are all account numbers, so it's pretty bad when Excel helpfully overrides it to scientific notation.
If you don't mind the long number being a string, you can do this:
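import numpy as np

# First I force it to be an int column, as I import everything as objects for unrelated reasons
# (np.int64 holds at most 19 digits, so longer numbers won't fit)
df.thatlongnumber = df.thatlongnumber.astype(np.int64)
# Then I convert that to a string and assign it back (apply() returns a new Series)
df.thatlongnumber = df.thatlongnumber.apply(lambda x: '{:d}'.format(x))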
Let me know if this is useful at all.
Scientific notation is a pain. What I've used before to handle situations like this is to cast the value to a float and then use a format specifier; something like this should work:
a = "1,25151E+21"
print(f"{float(a.replace(',', '.')):.0f}")
# prints 1251510000000000065536
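The trailing 65536 in that output comes from binary floating point, not from the data: 1.25151E+21 is not exactly representable as a float. A variant using the standard `decimal` module avoids that artifact. It still cannot restore digits that were never there, since the text only carries the six significant digits 125151. `expand_scientific` is a hypothetical helper name, and the comma-to-dot replacement assumes a comma decimal separator as in the question.

```python
from decimal import Decimal


def expand_scientific(text, decimal_comma=True):
    """'1,25151E+21' -> '1251510000000000000000', without going through float.

    Assumes a comma decimal separator unless decimal_comma=False.
    int() drops any fractional part, which is fine for whole-number IDs.
    """
    if decimal_comma:
        text = text.replace(",", ".")
    return str(int(Decimal(text)))


print(expand_scientific("1,25151E+21"))  # 1251510000000000000000
```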
I have started using xlwings to create an Excel tool which calls Python code. I think it is super useful and at the same time user-friendly, as everybody is used to having Excel as a GUI.
Now to my problem: the tool works well. However, I am left with some formatting. Currently I am able to do some formatting (range(XX).number_format = XX), but I have not managed to create my desired format.
I want to have comma separated numbers without decimals.
sht = xw.Book.caller().sheets[0]
sht.range('C:D').number_format = '0.00'     # (1)
sht.range('C:D').number_format = 'General'  # (2)
sht.range('C:D').number_format = '#’##0'    # (3)
(1): This works. However, numbers are not comma separated (as expected)
(2): Does not work. Python runs and runs and nothing happens (same for 'Number').
(3): Produces the desired results on my machine/Excel version. However, on my friend's Excel it looks different and no longer produces the desired results.
Thanks a lot in advance for your help.
To display numbers in thousands separated by commas, I tried this:
number_format = "#,##0"
and it works perfectly in my case.
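A possible explanation, hedged: as far as I know, xlwings' `number_format` is passed to Excel's `NumberFormat` property, which expects format codes in US-English syntax ("," for thousands, "." for decimals), and Excel substitutes the user's regional separators only when displaying. The hard-coded `’` in (3) is not that US-syntax separator, which may be why it behaves differently across machines, while "#,##0" should adapt on each one. A minimal sketch; `apply_thousands_format` and `preview` are hypothetical helpers, `sheet` is assumed to be an xlwings Sheet such as `xw.Book.caller().sheets[0]`, and `preview` only approximates an en-US display using Python's format mini-language.

```python
# Excel format code in US-English syntax: "," = thousands separator, no decimals
THOUSANDS_NO_DECIMALS = "#,##0"


def apply_thousands_format(sheet, address="C:D"):
    """Apply a thousands-separated, no-decimals format to `address` on an xlwings sheet."""
    sheet.range(address).number_format = THOUSANDS_NO_DECIMALS


def preview(value):
    """Rough Python preview of what "#,##0" shows in an en-US Excel."""
    return format(value, ",.0f")


print(preview(1234567.89))  # 1,234,568
```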
I'm importing data coming from excel files that come from another office.
In one of the columns, for each cell, I have lists of numbers used as tags. These were manually inserted by different people and (my guess) on computers with different thousands-separator settings, so the result is very heterogeneous.
As an example I have:
tags= ['205', '306.3', '3,206,302','7.205.206']
If this was a CSV file (I tried converting one single file to check), using
pd.read_csv(my_file,sep=';')
would give me exactly the above mentioned list.
Unfortunately as said, we're talking about excel files (plural) and I have to deal with it, and using
pd.read_excel(my_file,sheetname=my_sheet,encoding='utf-16',converters={'my_column':str})
what I get instead is:
tags= ['205', '306.3', '3,206,302','7205206']
As you see, whenever the number can be expressed logically in thousands (so, not the second number in my list) the dot is recognised as a thousands separator and I get a single number, instead of three.
I tried reading documentation, and searching on stackoverflow and google, but the keywords to describe this problem are too vague and I didn't find a viable solution, yet.
How can I get the right list using excel files?
Thanks.
This problem is likely happening because pandas is running its number parser before its date parser.
One possible fix is to add a thousands separator. For example, if you are actually using ',' as your thousands separator, you could add thousands=',' in your excel reader:
pd.read_excel(my_file,sheetname=my_sheet,encoding='utf-16',thousands=',',converters={'my_column':str})
You could also pick an arbitrary thousands separator that doesn't exist in your data, so the output stays the same, if thousands=None (which should be the default according to the documentation) doesn't already deal with your problem. You should also make sure that you are converting the fields to str (in which case using thousands is somewhat redundant, as it isn't applied to strings either way).
EDIT:
I tried using the following dummy data ('test.xlsx'):
a b c d
205 306.3 3,206,302 7.205.206
and with
dataf = pandas.read_excel('test.xlsx', header=0, converters={'a':str, 'b':str,'c':str,'d':str})
print(dataf.to_string())
I got the following output:
Columns: [205, 306.3, 3,206,302, 7.205.206]
Which is exactly what you were looking for. Are you sure you have the latest version of pandas and that you are in fact not using converters = {'col':int} or float in your converters keyword?
As it stands, it sounds like you are either converting your fields to numeric (int or float), or there is a problem elsewhere in your code. pandas read_excel seems to work as described, and I can get the results you specified with the code above. In other words: your code should work; if it doesn't, it might be due to an outdated pandas version, other parts of your code, or even problems with the source data. With the information provided, it's not possible to answer your question further.
I'm just trying out csvkit for converting Excel to csv. However, it's not taking into account formatting on dates and times, and producing different results from Excel's own save-as-csv. For example, this is a row of a spreadsheet:
And this is what Excel's save-as produces:
22/04/1959,Bar,F,01:32.00,01:23.00,00:59.00,00:47.23
The date has no special formatting, and the time is formatted as [mm]:ss.00. However, this is in2csv's version of the CSV:
1959-04-22,Bar,F,0.00106481481481,0.000960648148148,0.00068287037037,0.000546643518519
which is of course of no use at all. Any ideas? There don't seem to be any command-line options for this; --no-inference doesn't help. Thanks.
EDIT
Both csvkit and xlrd do seem to take formatting into account, but they're not smart about it. A date of 21/02/1066 is passed through as the text string '21/02/1066' in both cases, but a date of '22/04/1959' is turned into '21662.0' by xlrd and 1959-04-22 by csvkit. Both of them just give up on small elapsed times and pass through the float representation. This is OK if you know that the cell should contain an elapsed time, because you can just multiply by 24*60*60 to get the right answer in seconds.
I don't think xlrd would be much help here since its date tuple functions only handle seconds, and not centiseconds.
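Since both values are plain numbers underneath (a date serial, and an elapsed time as a fraction of a day), they can also be converted without xlrd's date helpers, keeping the centiseconds. A minimal sketch using only the standard library; `serial_to_date` and `day_fraction_to_elapsed` are hypothetical helpers, and it assumes Excel's default 1900 date system.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 24 * 60 * 60
# 1900 date system; this base is correct for serials from 61 (1900-03-01) onwards
EXCEL_EPOCH = datetime(1899, 12, 30)


def serial_to_date(serial):
    """Excel date serial (e.g. xlrd's 21662.0) -> datetime.date."""
    return (EXCEL_EPOCH + timedelta(days=serial)).date()


def day_fraction_to_elapsed(fraction, decimals=2):
    """Elapsed time stored as a fraction of a day -> 'mm:ss.cc', like [mm]:ss.00."""
    total = round(fraction * SECONDS_PER_DAY, decimals)  # round first so 59.999 -> 60
    minutes, seconds = divmod(total, 60)
    width = 2 + (decimals + 1 if decimals else 0)  # two digits, plus '.' and decimals
    return f"{int(minutes):02d}:{seconds:0{width}.{decimals}f}"


print(serial_to_date(21662.0))                     # 1959-04-22
print(day_fraction_to_elapsed(0.00106481481481))   # 01:32.00
print(day_fraction_to_elapsed(0.000546643518519))  # 00:47.23
```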
EDIT 2
Found out something interesting. I started with a base spreadsheet containing times. In one copy I formatted the times as [m]:ss.00, and in the other as [mm]:ss.00. I then saved each as .xls and .xlsx, giving a total of 4 spreadsheets. Excel could convert all 4 to CSV, and all the times in the CSVs appeared as originally written (i.e. 0:21.0, for example, for 0m 21.0s).
in2csv can't handle the two .xls versions at all; this time appears as 00:00:21. It also can't handle the [m]:ss.00 version of the .xlsx; conversion gives the catch-all 'index out of range' error. The only one of the 4 spreadsheets that in2csv can handle is the .xlsx with [mm]:ss.00 formatting.
The optional -I argument should avoid this issue. When testing your sample data, I get what Excel's save-as produces.
Command:
in2csv sample.csv -I > sample-output-i.csv
Output:
22/04/1959,Bar,F,01:32.00,01:23.00,00:59.00,00:47.23
-I, --no-inference Disable type inference when parsing CSV input.
https://csvkit.readthedocs.io/en/latest/scripts/in2csv.html
I have been able to read an Excel cell value with xlrd using column and row numbers as inputs. Now I need to access the same cell values in some spreadsheets that were saved in .ods format.
So for example, how would I read with Python the value stored in cell E10 in an .ods file?
Hacking your way through the XML shouldn't be too hard ... but there are complications. Just one example: OOo in their wisdom decided not to write the cell address explicitly. There is no cell attribute like address="E10" or column="E"; you need to count rows and columns.
Five consecutive empty cells are represented by
<table:table-cell table:number-columns-repeated="5" />
The number-columns-repeated attribute defaults to "1" and also applies to non-empty cells.
It gets worse when you have merged cells; you get a covered-table-cell tag which is 90% the same as the table-cell tag, and attributes number-columns-spanned and number-rows-spanned need to be figured into column and row counting.
A table:table-row tag may have a number-rows-repeated attribute. This can be used to repeat the contents of a whole non-empty row, but is most often seen when there is more than one consecutive empty row.
So, even if you would be satisfied with a "works on my data" approach, it's not trivial.
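To make the counting concrete, here is a rough "works on my data" sketch using only the standard library: it opens the .ods as a zip, parses content.xml, and walks rows and cells while expanding number-rows-repeated and number-columns-repeated until it reaches the requested address. Assumptions: covered-table-cell elements occupy one column position each (a merged cell's value lives in the spanning cell, so a covered address returns None), nested sub-tables are not handled, and dates, booleans and the like come back as their displayed text. All function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces used in content.xml
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
NUMERIC_TYPES = {"float", "percentage", "currency"}


def qname(namespace, local):
    """ElementTree's {namespace}local notation."""
    return f"{{{namespace}}}{local}"


def split_address(address):
    """'E10' -> (9, 4): zero-based (row, column)."""
    letters = "".join(ch for ch in address if ch.isalpha()).upper()
    digits = "".join(ch for ch in address if ch.isdigit())
    column = 0
    for ch in letters:
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, column - 1


def find_cell(table, address):
    """Walk rows and cells, expanding the *-repeated attributes, to reach `address`."""
    target_row, target_col = split_address(address)
    cell_tags = {qname(TABLE, "table-cell"), qname(TABLE, "covered-table-cell")}
    row_index = 0
    # iter() also descends into table:table-header-rows / table:table-row-group
    for row in table.iter(qname(TABLE, "table-row")):
        row_repeat = int(row.get(qname(TABLE, "number-rows-repeated"), "1"))
        if row_index + row_repeat > target_row:
            col_index = 0
            for cell in row:
                if cell.tag not in cell_tags:
                    continue
                col_repeat = int(cell.get(qname(TABLE, "number-columns-repeated"), "1"))
                if col_index + col_repeat > target_col:
                    return cell
                col_index += col_repeat
            return None  # beyond the last stored cell of that row: empty
        row_index += row_repeat
    return None  # beyond the last stored row: empty


def cell_value(cell):
    """Float for numeric cells, joined paragraph text otherwise, None if empty."""
    if cell is None:
        return None
    if cell.get(qname(OFFICE, "value-type")) in NUMERIC_TYPES:
        return float(cell.get(qname(OFFICE, "value")))
    # text:s (repeated spaces) elements carry no text and are not expanded here
    text = "\n".join("".join(p.itertext()) for p in cell.iter(qname(TEXT, "p")))
    return text or None


def read_ods_cell(path, address, sheet_index=0):
    """Read one cell, e.g. read_ods_cell("test.ods", "E10"), from an .ods file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    tables = list(root.iter(qname(TABLE, "table")))
    return cell_value(find_cell(tables[sheet_index], address))
```

This covers only the cases described above; the merged-cell and repeated-row edge cases are exactly where such hand-rolled parsers tend to break.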
You may like to look at ODFpy. Note the second sentence: """Unlike other more convenient APIs, this one is essentially an abstraction layer just above the XML format.""" There is an ODF-to-HTML script which (if it is written for ODS as well as for ODT) may be hackable to get what you want.
If you prefer a "works on almost everybody's data and is supported and has an interface that you're familiar with" approach, you may need to wait until the functionality is put into xlrd ... but this isn't going to happen soon.
Of the libraries I tried, ezodf was the one that worked.
from ezodf import opendoc

doc = opendoc('test.ods')
for sheet in doc.sheets:
    print(sheet.name)
    cell = sheet['E10']
    print(cell.value)
    print(cell.value_type)
pyexcel-ods crashed, odfpy crashed, and in addition its documentation is either missing or horrible.
Given that supposedly working libraries died on the first file I tested, I would prefer to avoid writing my own processing, as sooner or later it would either crash or, worse, fail silently in some weirder situation.
EDIT: It gets worse. ezodf may silently return bogus data.