Please help me find the right solution from a "simple to customize in the future" point of view.
I have an SQLite table and a very large SELECT. The query returns 5 columns and any number of rows.
I want to export this data to a specific Excel file and a specific sheet. Not just export it, though: I also want to add a header row at row 0. For example: header = [('Place', 'Players', 'Score', 'Delta', 'Game')].
For each row from SQLite I need to add an index to the Place column, from 1 to XXX.
The headers should be easy to reconfigure in the future.
I tried to write the data directly from SQLite to Excel, but in that case the header is not added. (Here Players_Last_Day_Stat is the SQL SELECT.)
from xlsxwriter.workbook import Workbook
workbook = Workbook('Total_Stat.xlsx')
conn = create_connection()
c=conn.cursor()
worksheet = workbook.add_worksheet('Last-Day')
mysel=c.execute(Players_Last_Day_Stat)
for i, row in enumerate(mysel):
    for j, value in enumerate(row):
        if isinstance(value, float):
            value = int(value)
        worksheet.write(i, j, value)
But the result looks like this:
This is the result I expect in the end:
Also, how do I make some cells bold from Python?
Thank you.
You're close. To make an index for your table, you can use worksheet.write_column. Here is one way to implement that (based on your code), shifting the table one column to the right and one row down:
from xlsxwriter.workbook import Workbook
workbook = Workbook('Total_Stat.xlsx')
worksheet = workbook.add_worksheet('Last-Day')
conn = create_connection()
c = conn.cursor()
rows = c.execute(Players_Last_Day_Stat).fetchall()  # fetch once so the rows can also be counted
header = ['Place', 'Players', 'Score', 'Delta', 'Game']
for idx, col in enumerate(header):
    worksheet.write(0, idx, col) # <- write each column name once, in row 0
# data goes one row down and one column right, leaving column 0 for the index
for i, row in enumerate(rows):
    for j, value in enumerate(row):
        if isinstance(value, float):
            value = int(value)
        worksheet.write(i+1, j+1, value)
worksheet.write_column(1, 0, list(range(1, len(rows) + 1))) # make an index from 1 to the number of rows
#here, we make both the 1st row and the 1st column bold
bold_fmt = workbook.add_format({'bold': True})
worksheet.set_row(0, None, bold_fmt)
worksheet.set_column(0, 0, None, bold_fmt)
workbook.close() # the file is only written when the workbook is closed
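Since the question asks for headers that are easy to change later, one option is to keep the header list in a single place and build the whole table (header plus numbered rows) before writing anything. The following is only a sketch: the in-memory database, the stat table and the build_table helper are made up for illustration and stand in for create_connection() and Players_Last_Day_Stat.

```python
import sqlite3

# Hypothetical configuration: edit this list to rename or reorder the headers.
HEADER = ['Place', 'Players', 'Score', 'Delta', 'Game']


def build_table(header, rows):
    """Return the header row followed by the data rows, each prefixed with its place (1..N)."""
    table = [list(header)]
    for place, row in enumerate(rows, start=1):
        # Convert floats to ints, as the loop in the answer does.
        values = [int(v) if isinstance(v, float) else v for v in row]
        table.append([place] + values)
    return table


# Stand-in for the real database and query (assumed schema, illustration only).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE stat (player TEXT, score REAL, delta REAL, game INTEGER)')
conn.executemany('INSERT INTO stat VALUES (?, ?, ?, ?)',
                 [('Ann', 30.0, 5.0, 3), ('Bob', 20.0, -2.0, 3)])
rows = conn.execute(
    'SELECT player, score, delta, game FROM stat ORDER BY score DESC').fetchall()

table = build_table(HEADER, rows)
for line in table:
    print(line)
```

Each entry of table could then be written with worksheet.write_row(r, 0, line), so the Excel code never needs to know how many columns there are.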
Related
I am using openpyxl to read a column (A) from an excel spreadsheet. I then iterate through a dictionary to find the matching information and then I want to write this data back to column (C) of the same Excel spreadsheet.
I have tried to figure out how to append data back to the corresponding row but without luck.
CODE
from openpyxl import load_workbook
my_dict = {
    'Agriculture': 'ET_SS_Agriculture',
    'Dance': 'ET_FA_Dance',
    'Music': 'ET_FA_Music'
}
wb = load_workbook("/Users/administrator/Downloads/Book2.xlsx") # Work Book
ws = wb['Sheet1'] # Work Sheet
column = ws['A'] # Column
write_column = ws['C']
column_list = [column[x].value for x in range(len(column))]
for k, v in my_dict.items():
    for l in column_list:
        if k in l:
            print(f'The dict for {l} is {v}')
            # append v to row of cell index of column_list
So, if my excel spreadsheet looks like this:
I would like Column C to look like this after I have matched the data dictionary.
To do this with your method, you need the index (i.e. the row) so that you can assign the values to column C. You can get it with enumerate when looping over your column_list:
for i, l in enumerate(column_list):
    if k in l:
        print(f'The dict for {l} is {v}')
        # append v to row of cell index of column_list
        write_column[i].value = v
After writing all the values, you will need to run
wb.save("/Users/administrator/Downloads/Book2.xlsx")
to save your changes.
That said, you iterate over the spreadsheet data far more often than necessary, and you make things harder for yourself by handling the data in columns rather than rows. Your dict is already keyed by the first part of the values in column A, so you can split each value and look it up directly.
You are adding a value to each row, so in my opinion it makes more sense to loop over rows.
from openpyxl import load_workbook
my_dict = {
    'Agriculture': 'ET_SS_Agriculture',
    'Dance': 'ET_FA_Dance',
    'Music': 'ET_FA_Music'
}
wb = load_workbook("/Users/administrator/Downloads/Book2.xlsx") # Work Book
ws = wb['Sheet1'] # Work Sheet
for row in ws:
    try:
        # row[0] = column A
        v = my_dict[row[0].value.split("-")[0]] # get the value from the dict using column A
    except (KeyError, AttributeError):
        # leave rows which aren't in my_dict (or whose column A is empty) alone
        continue
    # row[2] = column C
    row[2].value = v
wb.save("/Users/administrator/Downloads/Book2.xlsx")
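The lookup step can be tried without a workbook. This is a minimal sketch that assumes, as the split("-") call above suggests, that column A holds text such as "Agriculture-101". The lookup_code helper and the sample values are hypothetical.

```python
my_dict = {
    'Agriculture': 'ET_SS_Agriculture',
    'Dance': 'ET_FA_Dance',
    'Music': 'ET_FA_Music',
}


def lookup_code(cell_value, mapping, sep='-'):
    """Return the mapped code for the text before `sep`, or None if there is no match."""
    if not isinstance(cell_value, str):
        return None  # empty cells (None) and numbers cannot be split
    key = cell_value.split(sep)[0].strip()
    return mapping.get(key)


# Hypothetical contents of column A, including an unknown key and an empty cell.
column_a = ['Agriculture-101', 'Dance-2', 'History-9', None]
column_c = [lookup_code(value, my_dict) for value in column_a]
print(column_c)
```

Using mapping.get() rather than a try/except keeps the "no match" case explicit: rows that return None can simply be skipped when writing column C.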
If a cell in column C contains "external", copy the cell "good" from column D into column E, in the rows where column A contains 003.
Below are two images (before and after) in excel.
Before:
After:
I tried to write a script for this, but it did not work. The places where I put "???" need to be replaced with the right row and column:
import openpyxl
from openpyxl import load_workbook
wb_source = openpyxl.load_workbook("path/file.xlsx")
sheet = wb_source['Sheet1']
x=sheet.max_row
y=sheet.max_column
for r in range(1, x+1) :
    for j in range(1, y+1):
        copy(sheet.cell(row= ???, column=???)
        if str(copy.value)=="external":
            sheet.??
            break
wb_source.save("path/file2.xlsx")
How should they be added (row and column)?
1. Read the entire sheet.
2. Create a dictionary for the external products.
3. Write back to Excel.
Try:
import openpyxl
wb = openpyxl.load_workbook("file1.xlsx")
ws = wb['Sheet1']
# read the entire sheet as a list of rows of values
data = [[cell.value for cell in row] for row in ws.iter_rows()]
# column A -> column D, for the rows whose column C is "external"
mapper = {l[0]: l[3] for l in data if l[2]=="external"}
for r in range(1, ws.max_row + 1):
    key = ws.cell(r, 1).value
    if key in mapper:
        ws.cell(r, 5).value = mapper[key]
wb.save("file2.xlsx")
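The three steps can also be followed on plain lists, which makes the logic easier to check before touching a file. The sample rows below are invented, because the before/after images are not available. Each inner list stands for columns A to E of one sheet row.

```python
# Hypothetical sheet contents: columns A, B, C, D, E.
rows = [
    ['003', 'x', 'external', 'good', None],
    ['003', 'y', 'internal', 'bad', None],
    ['004', 'z', 'internal', 'ok', None],
]

# Step 2: map column A -> column D for the rows whose column C is "external".
mapper = {row[0]: row[3] for row in rows if row[2] == 'external'}

# Step 3: fill column E wherever column A has a mapped value.
for row in rows:
    if row[0] in mapper:
        row[4] = mapper[row[0]]

for row in rows:
    print(row)
```

Both 003 rows receive "good" in column E, while the 004 row is left untouched because no "external" row shares its key.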
I am trying to copy the values of a column in worksheet1 to a specific column in worksheet2, skipping all the null/None values in between. My code works when I print the values from the worksheet1 column, excluding all the null values. However, when I save them to worksheet2, only the last value shows up, duplicated over the whole column (from row 2 to 20).
I don't know why only the last value is written to the new column.
from openpyxl import Workbook
from openpyxl import load_workbook
source_file = (r'XXX(Source file).xlsx')
dest_file = (r'XXX(dest file).xlsx')
wb1=load_workbook(source_file, data_only=True)
wb1.active=0
ws1=wb1.active
wb2=load_workbook(dest_file)
wb2.active=0
ws2=wb2.active
for a in range(9,43):
    cell2 = ws1.cell(row = a, column = 10)
    if cell2.value is None or cell2.value == 0:
        continue
    else:
        print(cell2.value)
        for b in range(2,20):
            ws2.cell(row = b, column=4).value = cell2.value
wb2.save(dest_file)
Your second loop is nested so that it will always overwrite all values in the column of the second sheet with the same value from the first.
I'd do something like this:
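idx = 2
# max_row is inclusive, so 42 matches range(9, 43) in the question
for row in ws1.iter_rows(min_row=9, max_row=42, min_col=10, max_col=10):
    cell = row[0]
    if cell.value:  # skips both None and 0
        ws2.cell(row=idx, column=4, value=cell.value)
        idx += 1
wb2.save(dest_file)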
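The difference between the nested loop and a running index can be seen without a workbook. In this sketch a dict keyed by row number stands in for column D of the destination sheet. The source values are made up.

```python
# Hypothetical values read from the source column, with gaps.
source_column = [5, None, 0, 7, None, 9]

# Nested loop, as in the question: every kept value is written to all rows,
# so only the last one survives.
overwritten = {}
for value in source_column:
    if value is None or value == 0:
        continue
    for b in range(2, 5):
        overwritten[b] = value

# Running index: each kept value goes to the next free row.
compacted = {}
idx = 2
for value in source_column:
    if value is None or value == 0:
        continue
    compacted[idx] = value
    idx += 1

print(overwritten)
print(compacted)
```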
I am using python-2.7 and xlsxwriter for writing in excel sheet.
Following is my code...
workbook = Workbook('D:\S_details.xlsx')
sheet = workbook.add_worksheet()
rownum = 2
colnum = 2
for a in student_result:
    for r, row in enumerate(student_result):
        for c, col in enumerate(row):
            bold = workbook.add_format({'bold': 1})
            sheet.write('A1','Student_ID',bold)
            sheet.write('B1','Student_Link',bold)
            sheet.write('C1','Student_Name',bold)
            sheet.write('D1','Student_Qualification',bold)
            sheet.write('E1','Student_Address',bold)
            sheet.write('F1','Student_City',bold)
            sheet.write('G1','Student_State',bold)
            sheet.write('H1','Student_Country',bold)
            sheet.write('I1','Student_Stream',bold)
            sheet.write('J1','Student_Gender',bold)
            sheet.write(r,c,col)
            rownum = rownum + 1
            colnum = colnum + 1
The code runs, but the very first entry retrieved from the database is overwritten by the column headers.
Only the first entry is overwritten; the rest of the entries are written correctly.
I also print the data before writing it to the Excel sheet, and it shows no errors and no duplicated records.
Can anyone please point out where I am going wrong?
Guidance or help in any form is welcome.
Thank you in advance :)
There are a few issues with the code example:
The headers are re-written on every iteration of the inner loop. This part of the code should be outside the loop.
The for a in student_result loop is unused.
The rownum and colnum variables are incremented but not used.
enumerate() returns 0 as the first row value, so the first data row lands on the same row as the A1, B1, ... headers, and one overwrites the other.
Fixing these issues would give something like this:
import xlsxwriter
workbook = xlsxwriter.Workbook('S_details.xlsx')
sheet = workbook.add_worksheet()
# Generate some sample data.
student_result = []
for num in range(1, 11):
    student_result.append([num] * 10)
# Make the columns wider so that the text is visible.
sheet.set_column('A:J', 20)
# Add some formatted headers.
bold = workbook.add_format({'bold': 1})
sheet.write('A1','Student_ID',bold)
sheet.write('B1','Student_Link',bold)
sheet.write('C1','Student_Name',bold)
sheet.write('D1','Student_Qualification',bold)
sheet.write('E1','Student_Address',bold)
sheet.write('F1','Student_City',bold)
sheet.write('G1','Student_State',bold)
sheet.write('H1','Student_Country',bold)
sheet.write('I1','Student_Stream',bold)
sheet.write('J1','Student_Gender',bold)
# Write the data.
for row_num, row_data in enumerate(student_result):
    for col_num, col_data in enumerate(row_data):
        sheet.write(row_num + 1, col_num, col_data)
workbook.close()
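One way to keep the header and data positions from colliding is to compute every (row, column, value) triple first and only then hand them to sheet.write. The sketch below uses a shortened header list and made-up records. cell_writes is a hypothetical helper, not part of XlsxWriter.

```python
HEADERS = ['Student_ID', 'Student_Name']  # shortened for illustration


def cell_writes(headers, rows):
    """Yield (row, col, value) triples: headers on row 0, data from row 1 down."""
    for col, name in enumerate(headers):
        yield 0, col, name
    # start=1 keeps the first record off the header row
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row):
            yield r, c, value


student_result = [[1, 'Ann'], [2, 'Bob']]  # hypothetical records
writes = list(cell_writes(HEADERS, student_result))
for triple in writes:
    print(triple)
```

With a real worksheet, each triple would be passed on as sheet.write(row, col, value). Because the positions are computed in one place, it is easy to verify that no cell is written twice.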
You preset rownum and colnum, but you're not using them in the write statement. How about:
sheet.write(rownum,colnum,col)
Also you probably don't want to advance rownum in the col for loop, so:
for a in student_result:
    for r, row in enumerate(student_result):
        colnum = 2  # restart at the first data column for every row
        for c, col in enumerate(row):
            bold = workbook.add_format({'bold': 1})
            sheet.write('A1','Student_ID',bold)
            sheet.write('B1','Student_Link',bold)
            sheet.write('C1','Student_Name',bold)
            sheet.write('D1','Student_Qualification',bold)
            sheet.write('E1','Student_Address',bold)
            sheet.write('F1','Student_City',bold)
            sheet.write('G1','Student_State',bold)
            sheet.write('H1','Student_Country',bold)
            sheet.write('I1','Student_Stream',bold)
            sheet.write('J1','Student_Gender',bold)
            sheet.write(rownum,colnum,col)
            colnum = colnum + 1
        rownum += 1
See my code below. It works very well, but I would like to improve two things. First, the if statement with or is much shorter here than in my actual code: I have many columns like this, not all next to each other, and I would like the test to be shorter. Second, sometimes I may not know the exact column letter.
So I want to know whether there is a way to get the column name or header, i.e. the value in the very top row. Then I could test for one of those values and always apply the function to a cell if it is in the specified column.
I can't find an openpyxl function that returns the column name. I'm not sure openpyxl knows that the first row is different from the rest. If it doesn't, maybe I can test the first row myself, but I don't understand how to do this.
So, is there a way to get the column name? Or, if there is not, can someone help me check the first row for a value and then make the change in the row I'm currently in? Does this make sense?
So instead of the code saying:
if cellObj.column == 'H' or ...
it would say:
if cellObj.column_header == 'NameOfField' or ...
Or, if that is not possible, then:
if this cell has column where first row value is 'NameOfField' ...
Please help with the best way to do this. I have looked on Stack Overflow and in books and blogs, but there does not seem to be a way to get the column name (as opposed to the column letter).
for row in sheet.iter_rows():
    for cellObj in row:
        if cellObj.column == 'H' or cellObj.column == 'I' or cellObj.column == 'L' or cellObj.column == 'M':
            print(cellObj.value),
            if cellObj.value.upper() == 'OldValue1':
                cellObj.value = 1
                print(cellObj.value)
            elif cellObj.value.upper() == 'OldValue2':
                cellObj.value = 2
                print(cellObj.value)
EDIT
Assuming these are the header names you are looking for:
colnames = ['Header1', 'Header2', 'Header3']
Find the indices for these columns:
# sheet[1] is the first row; in current openpyxl, sheet.rows is a generator and cannot be indexed
col_indices = {n for n, cell in enumerate(sheet[1]) if cell.value in colnames}
Now iterate over the remaining rows:
for row in sheet.iter_rows(min_row=2):  # skip the header row
    for index, cell in enumerate(row):
        if index in col_indices:
            if cell.value.upper() == 'OldValue1':
                cell.value = 1
                print(cell.value)
            elif cell.value.upper() == 'OldValue2':
                cell.value = 2
                print(cell.value)
Use a dictionary instead of a set to keep the column names around:
col_indices = {n: cell.value for n, cell in enumerate(sheet[1])
               if cell.value in colnames}
for row in sheet.iter_rows(min_row=2):  # skip the header row
    for index, cell in enumerate(row):
        if index in col_indices:
            print('col: {}, row: {}, content: {}'.format(
                col_indices[index], cell.row, cell.value))
            if cell.value.upper() == 'OldValue1':
                cell.value = 1
            elif cell.value.upper() == 'OldValue2':
                cell.value = 2
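The header-based selection can be tried on a plain list of rows first. This sketch swaps the if/elif chain for a replacement dictionary, which is a different technique from the answer's but scales better when there are many old values. The table contents are invented.

```python
# Hypothetical sheet contents; the first row holds the headers.
table = [
    ['Header1', 'Other', 'Header2'],
    ['OldValue1', 'OldValue1', 'OldValue2'],
    ['keep', 'OldValue2', 'OldValue1'],
]
colnames = ['Header1', 'Header2']
replacements = {'OldValue1': 1, 'OldValue2': 2}

header, *body = table
# Positions of the columns whose header is one of the wanted names.
col_indices = [i for i, name in enumerate(header) if name in colnames]

for row in body:
    for i in col_indices:
        # Values without a replacement are left as they are.
        row[i] = replacements.get(row[i], row[i])

for row in table:
    print(row)
```

The 'Other' column keeps its old values even though they match a key in replacements, because its header is not in colnames.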
Old answer
This makes your if statement shorter:
if cellObj.column in 'HILM':
    print(cellObj.value),
For multi letter column coordinates you need to use a list:
if cellObj.column in ['H', 'AA', 'AB', 'AD']:
    print(cellObj.value),
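The reason a list is needed for multi-letter columns is that in on a string tests for substrings, not for whole items. A short illustration follows; the column letters are arbitrary. Note also that in recent openpyxl versions cell.column is a number and the letter is available as cell.column_letter, so the comparison may need adjusting there.

```python
single_letters = 'HILM'
column_list = ['H', 'I', 'L', 'M', 'AA']

checks = {
    'H in string': 'H' in single_letters,    # one letter: works as intended
    'IL in string': 'IL' in single_letters,  # substring match, probably not intended
    'IL in list': 'IL' in column_list,       # the list compares whole items
    'AA in list': 'AA' in column_list,
}
for label, result in checks.items():
    print(label, result)
```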
You can use a dictionary object to store the key-value pairs for your data, where the key will be the header for each column, and the value will be the particular column value. You can then append these dictionary objects to a list and access them using a for loop and normal dictionary syntax.
For example:
Assuming "my_workbook" is an excel workbook with the following column headers and values stored in the first worksheet:
Name    Class  Age
John    1      12
Andrew  1      12
Jane    2      13
Load the workbook and get values only:
from openpyxl import load_workbook
wb = load_workbook('./my_workbook.xlsx')
ws = wb.worksheets[0].values
header = next(ws) #get the header row
my_data = []
Organise the data into a dictionary structure:
for row in ws:
    my_data.append(dict(zip(header, row)))
You can then access the columns of each row using the headers as keys:
for data in my_data:
    print(data['Name'], data['Class'], data['Age'])
This will output:
John 1 12
Andrew 1 12
Jane 2 13
As a final note, using a dictionary structure to store and access your data makes your code more readable, as opposed to using indices, and allows you to re-arrange the columns in the excel file without having to modify your code. Hope this helps. 😊
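The point about re-arranging columns can be checked with two tuple sequences shaped like the output of worksheet.values. This is only an illustration: rows_as_dicts is a hypothetical helper that wraps the same dict(zip(...)) idea.

```python
def rows_as_dicts(values):
    """Turn an iterable of row tuples (header first) into a list of dicts."""
    rows = iter(values)
    header = next(rows)  # the first row supplies the keys
    return [dict(zip(header, row)) for row in rows]


original = [('Name', 'Class', 'Age'), ('John', 1, 12), ('Jane', 2, 13)]
# Same data with the columns in a different order.
reordered = [('Age', 'Name', 'Class'), (12, 'John', 1), (13, 'Jane', 2)]

names = [record['Name'] for record in rows_as_dicts(reordered)]
print(names)
print(rows_as_dicts(original) == rows_as_dicts(reordered))
```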
You can access cells from the first row and column using the sheet.cell(row=#, column=#) syntax. For example:
for row in sheet.iter_rows():
    for j, cellObj in enumerate(row, start=1):  # openpyxl columns are 1-based
        header_cell = sheet.cell(row=1, column=j)  # first-row cell of this column
        if cellObj.column in ['H', 'I', 'L', 'M', 'AA', 'AB']:
            print(cellObj.value),
            if cellObj.value.upper() == 'OldValue1':
                cellObj.value = 1
                print(cellObj.value)
            elif cellObj.value.upper() == 'OldValue2':
                cellObj.value = 2
                print(cellObj.value)
Since sheet.rows returns a generator, you can extract the headers in the first iteration, treat them as you need, and then continue to consume it. Keep the generator in a variable, because each access to sheet.rows starts a new one. For instance:
rows = sheet.iter_rows()
headers = [cell.value for cell in next(rows)]  # consumes the header row
# find indexes of targeted columns (header names taken from the question)
cols = [headers.index(header) for header in ['NameOfField']]
conv = {'OldValue1': 1, 'OldValue2': 2}
for row in rows:
    for col in cols:
        cell = row[col]
        cell.value = conv.get(cell.value, cell.value)  # other values stay unchanged
There are many ways to do this. Here are some approaches that I have used:
1. Brute force
Assuming "sheet" and "workbook" are defined.
header = [cell for cell in sheet['A1:XFD1'][0] if cell.value is not None and cell.value.strip() != ''] #you get all non-null columns
target_values = ['NameOfField', 'NameOfField1', 'NameOfField2'] #filter list
target_header = [cell.column for cell in header if cell.value in target_values] #get column index
data = {'OldValue1': 1, 'OldValue2': 2}
for row in sheet.iter_rows(max_row=sheet.max_row, max_col=sheet.max_column):
    for cell in row:
        if cell.column in target_header and cell.value in data :
            cell.value = data[cell.value]
In this case, the brute force is in "sheet['A1:XFD1']": the first time, we have to check every column, but we get cell references for all of them. After that, we create target_values (our column names) and build a list of column indexes (target_header). Finally, we iterate over the sheet. We check whether the cell's column is in the column index list and whether the cell's value is in data; if so, we change the value.
Downside: if a cell with stray whitespace exists outside the "data area", max_row and max_column will include it, so the loop also iterates over blank cells.
2. Check for boundaries
You can compute your own max row and max column if the data has table form (no empty columns in between, and a column such as "id" that is never null or whitespace).
from openpyxl.utils import get_column_letter
def find_limit_sheet(direction):
    max_limit_value = 1
    while (direction(max_limit_value).value is not None) and (direction(max_limit_value).value.strip() != ''):
        max_limit_value = max_limit_value + 1
    return (max_limit_value - 1) if max_limit_value != 1 else 1
max_qrow = find_limit_sheet(direction=lambda increment: sheet.cell(row=increment, column=1))
max_qcolumn = find_limit_sheet(direction=lambda increment: sheet.cell(column=increment, row=1))
header = [cell for cell in sheet[f'A1:{get_column_letter(max_qcolumn)}1']] #you get all non-null columns
target_values = ['NameOfField', 'NameOfField1', 'NameOfField2'] #filter list
target_header = [cell.column for cell in header[0] if cell.value in target_values] #get column names
data = {'OldValue1': 1, 'OldValue2': 2}
for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
    for cell in row:
        if cell.column in target_header and cell.value in data :
            cell.value = data[cell.value]
In this case we are inside "data area" only.
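The boundary scan itself can be sketched on a plain list of values. Unlike find_limit_sheet, this hypothetical count_filled helper returns 0 for an empty start and only calls strip() on text, so numeric cells (for which value.strip() would raise an AttributeError) are counted as filled.

```python
def count_filled(values):
    """Count the leading entries that are neither None nor blank text."""
    count = 0
    for value in values:
        if value is None or (isinstance(value, str) and value.strip() == ''):
            break  # first empty entry marks the end of the data area
        count += 1
    return count


# Hypothetical first column: a header, two numeric ids, then a whitespace-only cell.
first_column = ['id', 1, 2, '  ', 5]
print(count_filled(first_column))
```

The value after the whitespace-only entry is ignored, which matches the idea of staying inside the contiguous "data area".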
3. Optional: Using Pandas
If you need more complex operations on Excel data (I have to read a lot of Excel files as data sources in my work :( ), I prefer to convert the data to a pandas DataFrame, do the operations there, and save the result.
In this case we use all the data.
from openpyxl.utils import get_column_letter
import pandas as pd
def find_limit_sheet(direction):
    max_limit_value = 1
    while (direction(max_limit_value).value is not None) and (direction(max_limit_value).value.strip() != ''):
        max_limit_value = max_limit_value + 1
    return (max_limit_value - 1) if max_limit_value != 1 else 1
max_qrow = find_limit_sheet(direction=lambda increment: sheet.cell(row=increment, column=1))
max_qcolumn = find_limit_sheet(direction=lambda increment: sheet.cell(column=increment, row=1))
header = [cell.value for cell in sheet[f'A1:{get_column_letter(max_qcolumn)}1'][0]] #you get all non-null columns
raw_data = []
for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
    row_data = [cell.value for cell in row]
    raw_data.append(dict(zip(header, row_data)))
df = pd.DataFrame(raw_data) #the module was imported as pd
df.columns = df.iloc[0]
df = df[1:]
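You can also use a subset of the columns, reusing target_values and target_header from example 2 (header is the list of cell rows from that example).
...
target_header = [cell.column for cell in header[0] if cell.value in target_values] #get column indexes
target_names = [cell.value for cell in header[0] if cell.value in target_values] #matching header names, in the same order
...
raw_data = []
for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
    row_data = [cell.value for cell in row if cell.column in target_header]
    raw_data.append(dict(zip(target_names, row_data))) #pair each kept value with its own header
df = pd.DataFrame(raw_data)
df.columns = df.iloc[0]
df = df[1:]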
...
INFO
openpyxl: 2.6.2
pandas: 0.24.2
python: 3.7.3
Data Structures: List Comprehensions doc
lambda expr: lambda expression