Openpyxl: We found a problem with some content - python

I am getting the error message 'We found a problem with some content' when opening a file I generated with openpyxl. The file is generated by concatenating several xlsx files and writing additional formulas into further cells.
The problem is caused by a formula with an IF condition that I am writing into a cell (the second for loop is what triggers the Excel error message).
That's the code:
import openpyxl as op
import glob
# Search for all xlsx files in directory and assign them to variable allfiles
allfiles = glob.glob('*.xlsx')
print('Following files are going to be included into the inventory: ' + str(allfiles))
# Create a workbook with a sheet called 'Input'
risk_inventory = op.load_workbook('./Report/Risikoinventar.xlsx', data_only = False)
input_sheet = risk_inventory['Input']
risk_inventory.remove(input_sheet)
input_sheet = risk_inventory.create_sheet()
input_sheet.title = 'Input'
r_maxrow = input_sheet.max_row + 1
# There is more code here which is not related to the problem
for i in range(2, r_maxrow):
    if input_sheet.cell(row = i, column = 2).value == 'Top-Down':
        input_sheet.cell(row = i, column = 20).value = '=IF(ISTEXT(H{}),0,IF(H{}<=1000000,1,IF(H{}<=2000000,2,IF(H{}<=4000000,3,IF(H{}<=8000000,4,IF(H{}>8000000,5,0))))))'.format(i,i,i,i,i,i)
    elif input_sheet.cell(row = i, column = 2).value == 'Bottom-Up':
        input_sheet.cell(row = i, column = 20).value = '=IF(ISTEXT(H{}),0,IF(H{}<=1000000,1,IF(H{}<=2000000,2,IF(H{}<=4000000,3,IF(H{}<=8000000,4,IF(H{}>8000000,5,0))))))'.format(i,i,i,i,i,i)
for i in range(2, r_maxrow):
    if input_sheet.cell(row = i, column = 2).value == 'Top-Down':
        input_sheet.cell(row = i, column = 21).value = '=IF(K{}="Sehr gering",1,IF(K{}="Gering",2,IF(K{}="Mittel",3,IF(K{}="Hoc",3,IF(K{}="Sehr hoch",3,0))))))'.format(i,i,i,i,i,i)
    elif input_sheet.cell(row = i, column = 2).value == 'Bottom-Up':
        input_sheet.cell(row = i, column = 21).value = '=IF(K{}="Sehr gering",1,IF(K{}="Gering",2,IF(K{}="Mittel",3,IF(K{}="Hoc",3,IF(K{}="Sehr hoch",3,0))))))'.format(i,i,i,i,i,i)
So depending on what information is in cell(row = i, column = 2), I want a specific formula in cell(row = i, column = 21). The first for loop works perfectly; the second for loop causes the error message in Excel, and the formulas are not written to the cells.
As you can probably see, I have been trying to code with Python for a week and have never tried coding before…
Many thanks in advance!

I've been having the same issue, and it was due to an incorrectly written formula. I found what was wrong by clicking "View" instead of "Delete" when opening the file.
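Counting the parentheses in the question's second formula points to a likely culprit: it opens five IF( calls but ends with six closing parentheses, apparently carried over from the six-level formula in the first loop. Below is a minimal sketch of a balance check that could be run before writing a formula to a cell. The helper name paren_balance is hypothetical and not part of openpyxl, and the "fixed" template changes only the number of closing parentheses.

```python
def paren_balance(formula: str) -> int:
    """Return (opening - closing) parentheses, skipping text inside double quotes."""
    balance = 0
    in_string = False
    for ch in formula:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                balance += 1
            elif ch == ")":
                balance -= 1
    return balance


# Formula template from the second loop of the question (one ")" too many).
broken = ('=IF(K{0}="Sehr gering",1,IF(K{0}="Gering",2,IF(K{0}="Mittel",3,'
          'IF(K{0}="Hoc",3,IF(K{0}="Sehr hoch",3,0))))))')
# Same template with five closing parentheses to match the five IF( calls.
fixed = ('=IF(K{0}="Sehr gering",1,IF(K{0}="Gering",2,IF(K{0}="Mittel",3,'
         'IF(K{0}="Hoc",3,IF(K{0}="Sehr hoch",3,0)))))')

print(paren_balance(broken.format(2)))  # -1
print(paren_balance(fixed.format(2)))   # 0
```

Using {0} in the template means a single .format(i) fills every placeholder, so the number of arguments cannot drift away from the number of placeholders.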

Related

Outputting data into an xlsx sheet's rows in sequence with openpyxl

I'm trying to write the value of h.mediacount to a column (C) in Sheet1.
I can't figure out how to move on to the next cell for the next output, i.e. writing h.mediacount to cell C2, looping, writing the next output to cell C3, and so on.
Here is my code as it stands.
book = load_workbook(path)
sheet = book['Sheet1']
column_name = 'username'
for column_cell in sheet.iter_cols(1, sheet.max_column):
    if column_cell[0].value == column_name:
        B = 0
        for data in column_cell[1:]:
            htag = data.value
            print(htag)
            h = Hashtag.from_name(l.context, htag)
            print(h.mediacount)
Please note the print(htag) and print(h.mediacount) are only there to demonstrate that the code works up to that point.
Update:
I've written this code out, however, it runs indefinitely without any errors, but also without any changes to the sheet. I am unable to see where it's going wrong as there are no errors.
column_name = 'username'
column_name2 = 'hashtags'
for column_cell in sheet.iter_cols(1, sheet.max_column):
    if column_cell[0].value == column_name:
        B = 0
        for data in column_cell[1:]:
            htag = data.value
            h = Hashtag.from_name(l.context, htag)
            if column_cell[0].value == column_name2:
                C = 0
                for cell in column_cell[1:]:
                    cell.value = h.mediacount
book.save('alpha list test.xlsx')
Update 2:
Tried adding print(h.mediacount) before
if column_cell[0].value == column_name2:
and it loops through that flawlessly, so the issue must be in the code underneath that writes to the workbook.
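One way to pair each computed value with its own row is to walk the sheet row by row rather than column by column, reading the hashtag from one cell and writing the count into another cell of the same row. The following is a minimal sketch of that idea under stated assumptions: fill_counts and fake_lookup are hypothetical names, fake_lookup stands in for Hashtag.from_name(l.context, htag).mediacount, and the SimpleNamespace objects stand in for openpyxl cells.

```python
from types import SimpleNamespace


def fill_counts(rows, source_idx, target_idx, lookup):
    """For each row, read row[source_idx].value and store lookup(...) in row[target_idx].value."""
    written = 0
    for row in rows:
        tag = row[source_idx].value
        if tag is None:  # skip empty cells
            continue
        row[target_idx].value = lookup(tag)
        written += 1
    return written


def fake_lookup(tag):
    # Hypothetical stand-in for the real network lookup of a media count.
    return len(tag) * 100


# Stand-in rows with columns A, B, C: column A holds the hashtag, C receives the count.
rows = [tuple(SimpleNamespace(value=v) for v in (tag, None, None))
        for tag in ("python", "excel", None)]

count = fill_counts(rows, source_idx=0, target_idx=2, lookup=fake_lookup)
print(count, [row[2].value for row in rows])
```

With a real worksheet, sheet.iter_rows(min_row=2) yields one tuple of cells per row below the header, so it could be passed as rows, followed by book.save(...) once the loop has finished.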

Export Pandas Dataframe to well-formed CSV

I have a loop in which, on every iteration, I export a pandas DataFrame to a CSV file. The problem is that I get output like the one in the first picture, but I need something like the second one.
I also tried different encodings, such as utf-8 and utf-16, but nothing changed.
The only difference between my solution and the ones found online is that my dataframe is built from a pickle file, but I don't think this is the problem.
for pickle_file in files:
    key = pickle_file.split('/')[5].split('\\')[1] + '_' + pickle_file.split('/')[5].split('\\')[4]
    with lz4.frame.open(pickle_file, "rb") as f:
        while True:
            try:
                diz[key].append(pickle.load(f))
            except EOFError:
                break
for key in diz.keys():
    a = diz[key]
    for j in range(len(a)):
        t = a[j]
        for index,row in t.iterrows():
            if row['MODE'] != 'biflow':
                w = row['W']
                feature = row['FEATURE']
                mean = row['G-MEAN']
                rmse = row['RMSE']
                df.loc[-1] = [w] + [feature] + [rmse] + [mean] + [key]
                df.index = df.index + 1
    df = df.sort_values(by = ['W'])
    df.to_csv(path + key + '.csv', index = False)
    df = df[0:0]
The data is correctly formed. What you need to do is split each row into columns. In MS Excel it's Data > Text to Columns and then follow the function wizard.
If you are using a different application for opening the data, just google how to split text row data into columns for that application.
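If the goal is a file that opens already split into columns, one option is to write it with the delimiter the spreadsheet application expects. The sketch below assumes an Excel locale whose list separator is a semicolon, which is an assumption about the asker's setup. It uses the standard csv module with made-up rows and a hypothetical helper to_delimited; pandas offers the same choice through the sep parameter of DataFrame.to_csv.

```python
import csv
import io

# Made-up rows standing in for the DataFrame contents (header first).
rows = [
    ["W", "FEATURE", "RMSE", "G-MEAN", "KEY"],
    [10, "f1", 0.12, 0.93, "run_a"],
    [20, "f2", 0.15, 0.91, "run_a"],
]


def to_delimited(rows, delimiter=";"):
    """Serialize rows to CSV text using the given delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


text = to_delimited(rows)
print(text)
```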

Compare two excel files in python

import xlrd
wb_1 = xlrd.open_workbook('Book1.xls', on_demand=True)
ws_1 = wb_1.sheet_by_name('Sheet3')
wb_2 = xlrd.open_workbook('Book2.xls', on_demand=True)
ws_2 = wb_2.sheet_by_name('Sheet3')
for i in range(ws_1.ncols):
    col_value1 = ws_1.cell_value(0, i)
    for cell in range(ws_1.nrows):
        cell_value1 = ws_1.cell(cell, i)
        for j in range(ws_2.ncols):
            col_value2 = ws_2.cell_value(0, i)
            for cell in range(ws_2.nrows):
                cell_value2 = ws_2.cell(cell, i)
                if cell_value2 == cell_value1:
                    print('same')
I'm trying to compare two Excel worksheets, and I'm not sure whether I'm going about it the right way. How can I find the changed values?
Try the below code for extracting row and columns differences.
import xlrd
wb_1 = xlrd.open_workbook('Book1.xlsx', on_demand=True)
ws_1 = wb_1.sheet_by_name('Sheet3')
rw,cl,rw2,cl2=[[] for i in range(4)]
for i in range(0,ws_1.ncols):
    col_value1 = ws_1.cell(0, i).value
    cl.append(col_value1)
    for cell in range(0,ws_1.nrows):
        row_value1 = ws_1.cell(cell, i).value
        rw.append(row_value1)
wb_2 = xlrd.open_workbook('Book2.xlsx', on_demand=True)
ws_2 = wb_2.sheet_by_name('Sheet3')
for i in range(0,ws_2.ncols):
    col_value2 = ws_2.cell(0, i).value
    cl2.append(col_value2)
    for cell in range(0,ws_2.nrows):
        row_value2 = ws_2.cell(cell, i).value
        rw2.append(row_value2)
for i in range(len(cl)):
    for j in range(len(cl2)):
        if cl[i]!=cl2[j]:
            print("column difference",i,j)
for i in range(len(rw)):
    for j in range(len(rw2)):
        if rw[i]!=rw2[j]:
            print("row difference",i,j)
Try converting the Excel files to CSV, which separates your values with commas. Python's standard library has a module for this called csv.
Just "import csv".
Then open each file using "with" and read the rows together with the column names; you will get lists or dictionaries (depending on the approach).
After that you only have to compare the lists index by index, which is the easiest way.
Read article:
https://realpython.com/python-csv/
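Whichever library reads the files, the comparison itself reduces to walking two grids of values position by position. Here is a minimal sketch on plain lists of rows; diff_grids is a hypothetical helper and the sample data is made up. With xlrd, each grid could be built as [ws.row_values(r) for r in range(ws.nrows)].

```python
from itertools import zip_longest


def diff_grids(grid_a, grid_b):
    """Return (row, col, value_a, value_b) for every position where the grids differ."""
    missing = object()  # marks cells that exist in only one grid
    changes = []
    for r, (row_a, row_b) in enumerate(zip_longest(grid_a, grid_b, fillvalue=())):
        for c, (a, b) in enumerate(zip_longest(row_a, row_b, fillvalue=missing)):
            if a != b:
                changes.append((r, c,
                                None if a is missing else a,
                                None if b is missing else b))
    return changes


book1 = [["id", "name"], [1, "Ann"], [2, "Bob"]]
book2 = [["id", "name"], [1, "Ann"], [2, "Rob"], [3, "Cy"]]

for change in diff_grids(book1, book2):
    print(change)
```

Comparing by position reports only cells that actually changed, whereas comparing every value against every other value prints a "difference" for almost every pair.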

Excel parser stuck on one row

I was making a quick script to loop through a set of sheets in an Excel file (22, to be exact). What I wanted to do was the following:
Open the workbook and the sheet named "All", which contains a list of names, then loop through each name and do the following:
Loop through the other 22 sheets in the same workbook and search each one for the name, which I knew was in column 'B'.
If the name is found, take all the columns in that row that contain the data for that name; these are columns A-H.
Then copy and paste them next to the original name (same row) in the 'All' sheet, leaving a bit of space between the original name and the copied data.
I wanted to do this for all 22 sheets and for the 200+ names listed in the 'All' sheet. My code is as follows:
import openpyxl, pprint
columns = ['A','B','C','D','E','F','G','H']
k = 10
x = 0
def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string
print("Opening Workbook...")
wb = openpyxl.load_workbook('FileName.xlsx')
sheet_complete = wb.get_sheet_by_name("All")
row_count_all = sheet_complete.max_row
for row in range(4, row_count_all+1):
    k = 10
    cell = 'B' + str(row)
    print(cell)
    name = sheet_complete[cell].value
    for i in range(2, 23):
        sheet = wb.get_sheet_by_name(str(1995 + i))
        row_count = sheet.max_row
        for row2 in range(2, row_count+1):
            cell2 = 'B' + str(row2)
            name2 = sheet[cell].value
            if name == name2:
                x = x + 1
                for z in range(0,len(columns)):
                    k = k + 1
                    cell_data = sheet[columns[z] + str(row2)].value
                    cell_target = colnum_string(k) + str(row)
                    sheet_complete[cell_target] = cell_data
                wb.save('Scimago Country Ranking.xlsx')
                print("Completed " + str(x) + " Task(s)")
                break
The problem is that it keeps looping with the first name only, so it goes through all the names but when it comes to copying and pasting the data, it just redoes the first name so in the end, I end up with all the names in the 'All' sheet and next to each one is the data for the first name repeated over and over. I can't see what's wrong with my code but forgive me if it's a silly mistake as I'm kind of a beginner in these excel parsing scripts. print statements were for testing reasons.
P.S I know I'm using a deprecated function and I will change that, I was just too lazy to do it since it seems to still work fine and if that's the problem then please let me know.
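One detail in the inner loop may contribute to the symptom: cell2 is built from row2, but the next line reads sheet[cell], so the comparison never actually moves through the rows of the year sheet. Below is a minimal sketch of the intended lookup on plain tuples, where find_row is a hypothetical helper and the sample rows are made up. With openpyxl, sheet.iter_rows(min_row=2, max_col=8, values_only=True) yields such tuples for columns A to H.

```python
def find_row(rows, name, name_idx=1):
    """Return the first row whose cell at name_idx equals name, or None."""
    for row in rows:
        if row[name_idx] == name:
            return row
    return None


# Made-up year sheets: column B (index 1) holds the name.
year_sheets = {
    "1997": [(1, "Alpha", 10), (2, "Beta", 20)],
    "1998": [(1, "Beta", 30), (2, "Alpha", 40)],
}

collected = {}
for name in ("Alpha", "Beta"):
    collected[name] = []
    for year, rows in year_sheets.items():
        match = find_row(rows, name)
        if match is not None:
            # Append the matching row's values after those of earlier years.
            collected[name].extend(match)

print(collected)
```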

imported csv to dataframe objects not recognized

I have imported multiple csv files from a folder. First I created a list of all the csv files in the folder and then I provide the length of the list to my function.
The csv files have rows with different column lengths so that is why I think I have to use readlines.
The problem is that when I try to filter the DataFrame the values are not recognized.
I saved it to a SQLite table and pulled it into R, and a value that looks like "H"
appears like this in R: "\"H\""
How can I prevent those extra characters from being added to my object "H"?
Or do I have another problem?
x = []
count = 0
while (count < len(filelist) ):
    for file in filelist:
        filename = open(filelist[count])
        count = count + 1
        for line in filename.readlines():
            x.append(line.split(','))
df = pd.DataFrame(x)
For example I am just trying to create a mask. But I am getting all False. The DataFrame appears to contain "H"?
data['V1'] == "H"
Try this
df_list = []
file_list = []  # fill with the names of the CSV files to load
path = 'file_path'
for file in file_list:
    # read each file into its own DataFrame and collect the DataFrames themselves
    df_list.append(pd.read_csv(path + file))
new_df = pd.concat(df_list)
Answer: This code fixed the problem by removing the quotes throughout. Now the mask works.
for i, col in enumerate(df.columns):
    df.iloc[:, i] = df.iloc[:, i].str.replace('"', '')
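The stray quotes come from splitting each line on commas by hand, which keeps the CSV quoting characters as part of the value. The standard csv module parses quoted fields and also copes with rows of different lengths, so the quotes never reach the DataFrame. A minimal sketch with made-up data:

```python
import csv
import io

# Made-up file content: quoted fields and rows of different lengths.
raw = '"H","a,b",1\n"X",2\n'

# Manual splitting keeps the quotes and breaks the field that contains a comma.
naive = [line.rstrip("\n").split(",") for line in io.StringIO(raw)]
# csv.reader removes the quoting and keeps "a,b" as one field.
parsed = list(csv.reader(io.StringIO(raw)))

print(naive[0])
print(parsed[0])
print(parsed[0][0] == "H")
```

The resulting list of rows can be passed to pd.DataFrame in the same way as the list built with readlines in the question.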
