I would like to use the value calculated in the second "for i in range" loop to calculate a new value in the fourth "for i in range" loop; however, I receive the error: "could not convert string to float: 'E2*37.5'"
How do I get the numerical value calculated by sheet['F{}'.format(i)] = 'E{}*37.5'.format(i) instead of the formula string?
import openpyxl
wb = openpyxl.load_workbook('camdatatest.xlsx', read_only= False, data_only = True)
# Assuming you are working with Sheet1
sheet = wb['Sheet1']
for i in range(2,80):
    sheet['D{}'.format(i)] = '=C{}/3'.format(i)
for i in range(2,80):
    sheet['F{}'.format(i)] = '=E{}*37.5'.format(i)
for i in range(2,80):
    sheet['H{}'.format(i)] = '=D{}*G2*50'.format(i)
for i in range(2,80):
    sheet['I{}'.format(i)].value = float(sheet['F{}'.format(i)].value)/float(sheet['H{}'.format(i)].value)
wb.save('camdatatestoutput.xlsx' , data_only= True)
Unfortunately that's not quite possible, because openpyxl never evaluates formulas.
There are other libraries that may do so. However your problem can be overcome by recognizing that you can use yet another cell reference instead of the calculated value here.
for i in range(2,80):
    sheet['I{}'.format(i)] = '=E{}*37.5/H{}'.format(i,i)
Note that you can't set the value for the Ix cell because you don't actually have a value for the Ex or Hx cells. (well you might have but it's not clear from your question if you do)
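If the number is needed inside Python rather than in Excel, one possible workaround is to do the arithmetic in Python from the raw input cells and write plain numbers. The sketch below is only an illustration: it uses an in-memory workbook with made-up values in C2, E2 and G2 (the question does not show the real data) and mirrors the question's formulas for a single row.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active

# Made-up inputs standing in for the real spreadsheet data (assumption).
sheet["C2"] = 9.0
sheet["E2"] = 4.0
sheet["G2"] = 2.0

# Assigning a formula stores its text; openpyxl does not compute a result.
sheet["F2"] = "=E2*37.5"
stored = sheet["F2"].value  # the string '=E2*37.5', not a number

# Repeat the same arithmetic in Python from the raw inputs instead.
d = sheet["C2"].value / 3
f = sheet["E2"].value * 37.5
h = d * sheet["G2"].value * 50
sheet["I2"] = f / h  # a plain number that Python can keep using
```

This only works when the input cells (C, E and G2 here) hold real values rather than formulas themselves.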
If I have the following saved Excel document with 26007.930562 in every cell, where the column names represent the Excel formatting I am using for the given cell:
and I run the following Python code:
from openpyxl.utils import get_column_letter, column_index_from_string
import win32com.client # https://www.youtube.com/watch?v=rh039flfMto
ExcelApp = win32com.client.GetActiveObject('Excel.Application')
wb = ExcelApp.Workbooks('test.xlsx')
ws = wb.Worksheets(1)
excelRange = ws.Range(ws.Cells(1, 1), ws.Cells(2, 4))
listVals = [[*row] for row in excelRange.Value]
print(listVals)
I get the following output:
[['general', 'currency', 'accounting', 'number'], [26007.930562, Decimal('26007.9306'), Decimal('26007.9306'), 26007.930562]]
Notice how there is a loss of precision for the "currency" and "accounting" formats. They get turned into some decimal that rounds off several of the later decimal places.
Is it possible to read in currency and accounting formatted cells while still keeping full precision? If so, how?
This is what I mean when I say "currency formatting":
EDIT:
BigBen's solution works in this example. But if you have dates, Value2 doesn't treat them like dates which causes errors in Python where you intend to treat them like dates. I ended up having to write this instead:
import pywintypes

listVals = [] # https://stackoverflow.com/a/71375004
for rowvalue, rowvalue2 in zip([[*row] for row in excelRange.Value], [[*row] for row in excelRange.Value2]):
    rowlist = []
    for value, value2 in zip(rowvalue, rowvalue2):
        if type(value) == pywintypes.TimeType:
            rowlist.append(value)
        else:
            rowlist.append(value2)
    listVals.append(rowlist)
I'm sure there's a faster / more efficient way to do it than that but I don't know what it is.
Use .Value2 instead of .Value:
listVals = [[*row] for row in excelRange.Value2]
Result:
[['general', 'currency', 'accounting', 'number'], [26007.93056, 26007.93056, 26007.93056, 26007.93056]]
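For the date case raised in the question's edit, the merge of .Value and .Value2 can be written more compactly as a helper. This is a sketch that runs without Excel: the two input grids are made-up stand-ins for excelRange.Value and excelRange.Value2, and it assumes that the date objects returned by pywin32 are datetime.datetime subclasses. If that does not hold in your environment, keep the pywintypes.TimeType check instead.

```python
import datetime
from decimal import Decimal


def merge_values(values, values2):
    """Keep date cells from .Value and take every other cell from .Value2."""
    return [
        [v if isinstance(v, datetime.datetime) else v2 for v, v2 in zip(row, row2)]
        for row, row2 in zip(values, values2)
    ]


# Stand-in data shaped like excelRange.Value and excelRange.Value2 (made up).
value_rows = [[Decimal("26007.9306"), datetime.datetime(2022, 3, 7)]]
value2_rows = [[26007.930562, 44627.0]]
merged = merge_values(value_rows, value2_rows)
```

The currency cell comes back at full precision from the .Value2 grid, while the date cell keeps the date object from the .Value grid.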
I would like to get the mode value of a given column from a CSV file.
The code I've tried:
def mode_LVL(self):
    data = pd.read_csv('highscore.csv', sep=',')
    mode_lvl = data["LVL"].mode()
    return mode_lvl
Results in:
The mode value of LVL: 0 6
dtype: int64
I would like the mode value only, not wanting the 0 and dtype.
I attempted to resolve this with the following, but it failed:
mode_lvl = data.mode(axis = 'LVL', numeric_only=True )
Sorry I know that this issue may be simple to solve, but I've had issues searching for the right solution.
You need to select the first value of the mode, because mode() can return multiple values when several top categories share the same count:
mode_lvl = data["LVL"].mode().iat[0]
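A small sketch of why the indexing step is needed, using a made-up Series in place of the "LVL" column from highscore.csv: mode() returns a Series even when there is a single mode, and it can hold several values when counts are tied.

```python
import pandas as pd

levels = pd.Series([6, 3, 6, 9, 6], name="LVL")  # made-up data (assumption)
modes = levels.mode()          # a Series, even with a single mode
mode_lvl = int(modes.iat[0])   # first entry as a plain Python int

# When two values share the top count, mode() returns both of them.
tied = pd.Series([3, 3, 6, 6]).mode()
tied_values = tied.tolist()
```

Converting with int() is optional; it just avoids printing a NumPy integer type.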
When I try to add a new column to an existing DataFrame, the new column contains only empty values. However, when I print "result" before assigning it to the DataFrame, it looks fine! As a result I get this strange error from max():
ValueError: max() arg is an empty sequence
I'm using mplfinance to plot the data
strategy.py
def moving_average(self, df, i):
    signal = df['sma20'][i]*1.10
    if (df['sma20'][i] > df['sma50'][i]) & (signal >df['Close'][i]):
        return df['Close'][i]
    else:
        return None
trading.py
for i in range(0, len(df['Close'])-1):
    result = strategy.moving_average(df , i)
    print(result)
    df['buy']= result
df.to_csv('test.csv', encoding='utf-8')
apd = mpf.make_addplot(df['buy'],scatter=True,marker='^')
mpf.plot(df, type='candle', addplot=apd)
Based on the very small amount of information here, and on your comment
"because df['buy'] column has nan values only."
I'm going to guess that your problem is that strategy.moving_average() is returning None instead of nan when there is no signal.
There is a big difference between None and nan. (The main issue is that nan supports math, whereas None does not; and as a general rule plotting packages always do math).
I suggest you import numpy as np and then, in strategy.moving_average(), change return None to return np.nan.
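The difference is easy to see in plain Python. This sketch uses math.nan from the standard library, which is the same kind of float value as np.nan, so no extra packages are needed to try it.

```python
import math

nan_value = math.nan        # a float, like numpy's np.nan
scaled = nan_value * 1.10   # arithmetic works; the result is still nan

# None does not support arithmetic at all.
try:
    None * 1.10
    none_supports_math = True
except TypeError:
    none_supports_math = False
```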
ALSO just saw another problem.
You are only assigning a single value to df['buy'].
You need to take it out of the loop.
I suggest initializing result as an empty list before the loop, then appending to it:
result = []
for i in range(len(df['Close'])):  # cover every row so result matches the DataFrame's length
    result.append(strategy.moving_average(df , i))
print(result)
df['buy']= result
df.to_csv('test.csv', encoding='utf-8')
apd = mpf.make_addplot(df['buy'],scatter=True,marker='^')
mpf.plot(df, type='candle', addplot=apd)
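As an alternative to the loop, the same signal can probably be built in one step with Series.where, which keeps values where a condition holds and fills the rest with NaN. This is a sketch with a made-up three-row DataFrame; the column names follow the question, the numbers do not come from it.

```python
import pandas as pd

# Made-up prices and moving averages (assumption).
df = pd.DataFrame({
    "Close": [10.0, 12.0, 11.0],
    "sma20": [11.0, 10.0, 12.0],
    "sma50": [10.0, 11.0, 11.5],
})

signal = df["sma20"] * 1.10
condition = (df["sma20"] > df["sma50"]) & (signal > df["Close"])

# Close where the condition holds, NaN everywhere else.
df["buy"] = df["Close"].where(condition)
```

This gives a column with one entry per row, so it lines up with the DataFrame index without any manual bookkeeping.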
I am using openpyxl to manipulate a spreadsheet from Python.
I am trying to create a drop-down validation in a workbook tab called organisation. Is it possible to use a Python list to populate the elements in the drop down selection?
When I hardcode the drop-down options into the DataValidation line like so:
dv = DataValidation(type="list", formula1='"The,earth,revolves,around,sun"', allow_blank=True)
The drop down is created in the spreadsheet tab and populated with the options as expected.
However, when I try to build the drop-down options from a Python list and then pass it to the DataValidation line like so:
valid = ['"The,earth,revolves,around,sun"']
dv = DataValidation(type="list", formula1=valid, allow_blank=True)
the drop down list is not created.
For extra information please see the full script:
def addValidationDropDowns(path):
    valid = ['"The,earth,revolves,around,sun"']
    wb = openpyxl.load_workbook(path)
    ws = wb['organisation']
    dv = DataValidation(type="list", formula1=valid, allow_blank=True)
    ws.add_data_validation(dv)
    for x in range(0, 3):
        dv.add(ws["A"+str(x+10)])
    wb.save(path)
    return
I struggled with this the first time I did it. It is curious: when the 'type' parameter of DataValidation is "list", you think "OK, let's use a list!" But no, it expects a string. I think your example will work if you remove the square brackets from the 'valid' variable.
valid = '"The,earth,revolves,around,sun"'
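To start from a real Python list, the string can be assembled with join. The helper name list_formula below is made up for this sketch; it rejects items containing a comma or a double quote, since those would clash with the separator and the surrounding quotes.

```python
def list_formula(options):
    """Join options into the quoted, comma-separated string used for formula1."""
    for option in options:
        if "," in option or '"' in option:
            raise ValueError(f"option cannot contain a comma or quote: {option!r}")
    return '"' + ",".join(options) + '"'


words = ["The", "earth", "revolves", "around", "sun"]
valid = list_formula(words)
```

The result can then be passed as before: DataValidation(type="list", formula1=valid, allow_blank=True).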
I'm trying to set conditional formatting in openpyxl to emulate highlighting duplicate values. With this simple code, I should be able to highlight consecutive duplicates (but not the first value in a duplicate sequence).
from pandas import *
data = DataFrame({'a':'a a a b b b c b c a f'.split()})
wb = ExcelWriter('test.xlsx')
data.to_excel(wb)
ws = wb.sheets['Sheet1']
from openpyxl.style import Color, Fill
# Create fill
redFill = Fill()
redFill.start_color.index = 'FFEE1111'
redFill.end_color.index = 'FFEE1111'
redFill.fill_type = Fill.FILL_SOLID
ws.conditional_formatting.addCellIs("B1:B1048576", 'equal', "=R[1]C", True, wb.book, None, None, redFill)
wb.save()
However, when I open it in Excel I get an error related to conditional formatting, and the data is not highlighted as expected. Is openpyxl able to handle R1C1 style referencing?
In regards to highlighting to find duplicates of sequential values, the formula you want is
=AND(B1<>"",B2=B1)
With a range starting from B2 (aka, B2:B1048576)
Note - this appears to be broken in the current 1.8.3 branch of openpyxl, but will be fixed shortly in the 1.9 branch.
from openpyxl import Workbook
from openpyxl.style import Color, Fill
wb = Workbook()
ws = wb.active
ws['B1'] = 1
ws['B2'] = 2
ws['B3'] = 3
ws['B4'] = 3
ws['B5'] = 7
ws['B6'] = 4
ws['B7'] = 7
# Create fill
redFill = Fill()
redFill.start_color.index = 'FFEE1111'
redFill.end_color.index = 'FFEE1111'
redFill.fill_type = Fill.FILL_SOLID
dxfId = ws.conditional_formatting.addDxfStyle(wb, None, None, redFill)
ws.conditional_formatting.addCustomRule('B2:B1048576',
    {'type': 'expression', 'dxfId': dxfId, 'formula': ['AND(B1<>"",B2=B1)']})
wb.save('test.xlsx')
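The openpyxl.style module and the addDxfStyle/addCustomRule calls above belong to the 1.x series. In more recent openpyxl releases the same rule can, as far as I know, be written with FormulaRule and PatternFill. The sketch below reuses the sample data and formula from the code above and keeps the workbook in memory.

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import FormulaRule
from openpyxl.styles import PatternFill

wb = Workbook()
ws = wb.active

# Same sample values as above, written to B1:B7.
for row, value in enumerate([1, 2, 3, 3, 7, 4, 7], start=1):
    ws.cell(row=row, column=2, value=value)

# Solid red fill applied when a cell equals the cell directly above it.
red_fill = PatternFill(start_color="FFEE1111", end_color="FFEE1111", fill_type="solid")
rule = FormulaRule(formula=['AND(B1<>"",B2=B1)'], fill=red_fill)
ws.conditional_formatting.add("B2:B1048576", rule)

# wb.save("test.xlsx")  # uncomment to write the file
```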
As a further reference:
If you want to highlight all duplicates:
COUNTIF(B:B,B1)>1
If you want to highlight all duplicates except for the first occurrence:
COUNTIF($B$2:$B2,B2)>1
If you want to highlight sequential duplicates, except for the last one:
COUNTIF(B1:B2,B2)>1
Regarding RC notation: while openpyxl doesn't support Excel RC notation, conditional formatting will write the formula as provided. Unfortunately, Excel enables R1C1 notation only superficially as a flag, and converts all formulas back to their A1 equivalent when saving, meaning you'd need a function to convert all R1C1 references to their A1 equivalents for this to work.
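A conversion function of that kind might look like the following sketch. It handles a single reference (not a whole formula) relative to a base cell; the names column_letter and r1c1_to_a1 are made up here and are not part of openpyxl.

```python
import re

# R or C alone is relative with offset 0, R[n]/C[n] is a relative offset,
# and Rn/Cn is an absolute position.
_R1C1 = re.compile(r"^R(\[(-?\d+)\]|(\d+))?C(\[(-?\d+)\]|(\d+))?$")


def column_letter(index):
    """Convert a 1-based column index to letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def r1c1_to_a1(ref, base_row, base_col):
    """Translate one R1C1 reference into A1 notation, relative to a base cell."""
    match = _R1C1.match(ref)
    if match is None:
        raise ValueError(f"not an R1C1 reference: {ref!r}")
    _, row_offset, row_abs, _, col_offset, col_abs = match.groups()
    if row_abs is not None:
        row, row_prefix = int(row_abs), "$"
    else:
        row, row_prefix = base_row + int(row_offset or 0), ""
    if col_abs is not None:
        col, col_prefix = int(col_abs), "$"
    else:
        col, col_prefix = base_col + int(col_offset or 0), ""
    if row < 1 or col < 1:
        raise ValueError(f"{ref!r} points outside the sheet from this base cell")
    return f"{col_prefix}{column_letter(col)}{row_prefix}{row}"


below_b1 = r1c1_to_a1("R[1]C", base_row=1, base_col=2)
```

Here R[1]C evaluated from B1 is the cell one row down in the same column. Converting whole formulas would additionally require finding each reference inside the formula text, which this sketch does not attempt.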
Openpyxl doesn't support Excel RC notation.
You could use A1 notation instead which would mean that the equivalent formula is =B2 (I think).
However, you should verify that it actually works in Excel first.
My feeling is that it won't. In general conditional formatting uses absolute cell references $B$2 instead of relative cell references B1.
If it does work then convert your formula to A1 notation and that should work in Openpyxl.
You can't use R1C1 notation directly, and this answer would be a terrible way to format a range of cells, but OpenPyXL does allow you to use row and column numbers.
cell = ws.cell(r, c)
returns the worksheet cell at row r and column c, creating one if needed. Unlike the old xlrd/xlwt modules, row and column indices begin at 1, so you can read r and c directly off of a spreadsheet using the R1C1 reference style. For most purposes, you want to access .value, for example:
ws.cell(2, 3).value = 3
...
v = ws.cell(4, 5).value
It's not nearly as pretty as ws['R2C3'] = 3 or v = ws['R4C5'], but it helps with simple tasks.
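A quick in-memory check of how the numeric indices line up with A1 coordinates (a sketch; the cell positions are the ones used in the example above):

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

ws.cell(row=2, column=3).value = 3                # R2C3, i.e. cell C2
same_cell = ws["C2"].value                        # read back via A1 notation
coordinate = ws.cell(row=4, column=5).coordinate  # R4C5 in A1 notation
```

Passing row and column as keywords keeps the order explicit.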