dbf to xls - first non header row not writing - python

I would like to convert a .dbf file to .xls using Python. I've referenced this snippet; however, I cannot get the first non-header row to write using this code:
from xlwt import Workbook, easyxf
import dbfpy.dbf
dbf = dbfpy.dbf.Dbf("C:\\Temp\\Owner.dbf")
book = Workbook()
sheet1 = book.add_sheet('Sheet 1')
header_style = easyxf('font: name Arial, bold True, height 200;')
for (i, name) in enumerate(dbf.fieldNames):
    sheet1.write(0, i, name, header_style)
for row in range(1, len(dbf)):
    for col in range(len(dbf.fieldNames)):
        sheet1.row(row).write(col, dbf[row][col])
book.save("C:\\Temp\\Owner.xls")
How can I get the first non header row to write?
Thanks

You are skipping record 0 of the dbf, which is its first row. In a dbf file the column names are not stored as a row, so the records are numbered from 0. In the Excel file, however, row 0 holds the header, so the two indexes differ by one: loop over every dbf record and add 1 to the row used in the Excel worksheet.
So:
for row in range(len(dbf)):
    for col in range(len(dbf.fieldNames)):
        sheet1.row(row+1).write(col, dbf[row][col])
Note that the snippet you referred to does not add the 1 either.
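The off-by-one can be seen without any spreadsheet library. The sketch below is only an illustration: plan_cells is a hypothetical helper, and the field names and records are made-up sample data standing in for dbf.fieldNames and the dbf records.

```python
def plan_cells(field_names, records):
    """Return (sheet_row, sheet_col, value) triples for a header plus records.

    Hypothetical helper: records are indexed from 0 and the header occupies
    sheet row 0, so record i lands on sheet row i + 1.
    """
    # Header cells all go on sheet row 0.
    cells = [(0, col, name) for col, name in enumerate(field_names)]
    # Every record, including record 0, is shifted down by one row.
    for rec_index, record in enumerate(records):
        for col, value in enumerate(record):
            cells.append((rec_index + 1, col, value))
    return cells


# Made-up sample data, not taken from Owner.dbf.
fields = ["NAME", "CITY"]
rows = [("Ann", "Oslo"), ("Bob", "Rome")]
for cell in plan_cells(fields, rows):
    print(cell)
```

Each triple could then be passed to sheet1.write(row, col, value).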

Related

Format and manipulate data across multiple Excel sheets in Python using openpyxl before converting to Dataframe

I need some help with editing the sheets within my Excel workbook in python, before I stack the data using pd.concat(). Each sheet (~100) within my Excel workbook is structured identically, with the unique identifier for each sheet being a 6-digit code that is found in line 1 of the worksheet.
I've already done the following steps to import the file, unmerge rows 1-4, and insert a new column 'C':
import openpyxl
import pandas as pd
wb = openpyxl.load_workbook('data_sheets.xlsx')
for sheet in wb.worksheets:
    sheet.merged_cells
    for merge in list(sheet.merged_cells):
        sheet.unmerge_cells(range_string=str(merge))
    sheet.insert_cols(3, 1)
    print(sheet)
wb.save('workbook_test.xlsx')
#concat once worksheets have been edited
df = pd.concat(pd.read_excel('workbook_test.xlsx', sheet_name=None), ignore_index=True)
Before stacking the data, however, I would like to make the following additional (sequential) changes to every sheet:
Extract the rightmost 8 characters from row 1 (the Excel equivalent would be =RIGHT(A1, 8)). This pulls the unique code off each sheet, which will look like '(000000)'.
Populate column C from rows 6-282 with the unique code.
Delete rows 1-5.
The end result would make each sheet within the workbook look like this:
Is this possible to do with openpyxl, and if so, how? Any direction or assistance with this would be much appreciated - thank you!
Here is a 100% openpyxl approach to achieve what you're looking for:
from openpyxl import load_workbook
wb = load_workbook("workbook_test.xlsx")
for ws in wb:
    ws.unmerge_cells("A1:O1") #unmerge the first row, columns A to O
    ws_uid = ws.cell(row=1, column=1).value[-8:] #get the sheet's UID
    for num_row in range(6, 283): #rows 6 to 282 inclusive
        ws.cell(row=num_row, column=3).value = '="{}"'.format(ws_uid) #write UID in column C
    ws.delete_rows(1, 5) #delete first 5 rows
wb.save("workbook_test.xlsx")
NB: This assumes there is already an empty column (C).
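The two details that are easiest to get wrong here are the slice and the row range. A small sketch, using a made-up title string in place of the real contents of A1:

```python
# Made-up example of what cell A1 might contain; the real text will differ.
title = "Monthly report for site (123456)"

# Equivalent of Excel's =RIGHT(A1, 8): the last 8 characters of the string.
uid = title[-8:]

# range() excludes its stop value, so rows 6 through 282 need a stop of 283.
target_rows = range(6, 283)

print(uid, target_rows[0], target_rows[-1], len(target_rows))
```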

xlsxwriter having problems when inserting 8-bit values with ' symbol

My aim is to compare a table from an .htm file with a table from an .xlsx file, and I do it all by converting both to DataFrames in Python. This works, and I can display the correct value from the xlsx file. But when I try to copy the data into a new xlsx file, laid out as a table with the name and value as columns, I get an error. print(data[y].values[z,1]) shows the correct value, but worksheet.write_string(row,col,data[y].values[z,1]) raises an error. I tried converting the value to a string first with value=str(data[y].values[z,1]), printing it with print(value), and then writing only that variable with worksheet.write_string(row,col,value), but every value in the output file is nan. The names are written correctly; the values are not. Is it because my value is an 8-bit value such as 8'h0, which contains the ' symbol, so the library cannot handle it? If so, how can I solve this problem?
This is the output file:
This is what I get with print(data[y].values[z,1]):
This is my source code:
import pandas as pd
import numpy as np
htm = pd.read_html('HAS.htm')[5]
xlsx = pd.ExcelFile('MTL_SOCSouth_PCH_and_IOE_Security_Parameters.xlsm')
import xlsxwriter
workbook = xlsxwriter.Workbook('Output01.xlsx')
worksheet = workbook.add_worksheet()
sheets=xlsx.sheet_names
#remove unwanted sheet
sheets.pop(0)
sheets.pop(0)
sheets.pop(0)
sheets.pop(-1)
sheets.pop(-1)
sheets.pop(-1)
sheets.pop(-1)
#create an array to store data for each sheet
data=[0]*(len(sheets))
#insert each sheet into array
for x in range(len(sheets)):
    data[x]=xlsx.parse(sheets[x],header=4,usecols='B,AM')
    data[x]=pd.DataFrame(data[x])
#initialize to first row
row = 0
#loop from first row from htm file to last row
for x in range(len(htm.index)):
    chapter=(htm.values[x,3])
    chapter=chapter[:chapter.find(": ")]
    chapter=chapter.split("Chapter ",maxsplit=1)[-1]
    #if the chapter is equal to 37 then proceed, ignore if not 37
    if(chapter=='37'):
        col = 0
        source=htm.values[x,0]
        source=source[:source.find("[")]
        print(source)
        for y in range((len(sheets))):
            for z in range(len(data[y].index)):
                target=data[y].values[z,0]
                targetname=str(target)
                worksheet.write(row,col,targetname)
                if source==target:
                    col += 1
                    print(sheets[y])
                    worksheet.write(row,col,sheets[y])
                    col += 1
                    print(data[y].values[z,1])
                    worksheet.write_string(row,col,data[y].values[z,1])
                    row += 1
workbook.close()
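A value that turns into nan under str() is most likely a floating-point NaN, which is how pandas represents an empty cell, so the apostrophe in 8'h0 is probably not the cause. Below is a minimal sketch of one way to guard the write, assuming missing values should simply become empty cells. to_cell_text is a hypothetical helper, not part of xlsxwriter or pandas.

```python
import math


def to_cell_text(value):
    """Return text that is safe to write as a string cell.

    Hypothetical helper: None and NaN become an empty string,
    everything else is converted with str().
    """
    if value is None:
        return ""
    # numpy.float64 is a subclass of float, so this also catches pandas NaN.
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value)


# The apostrophe is an ordinary character in a Python string.
print(repr(to_cell_text("8'h0")))
print(repr(to_cell_text(float("nan"))))
```

It could be used as worksheet.write_string(row, col, to_cell_text(data[y].values[z,1])).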

Copy column of cell values from one workbook to another with openpyxl

I am extracting data from one workbook's column and need to copy the data to another existing workbook.
This is how I extract the data (works fine):
from openpyxl import load_workbook
wb2 = load_workbook('C:\\folder\\AllSitesOpen2.xlsx')
ws2 = wb2['report1570826222449']
#Extract column A from Open Sites
DateColumnA = []
for row in ws2.iter_rows(min_row=16, max_row=None, min_col=1, max_col=1):
    for cell in row:
        DateColumnA.append(cell.value)
DateColumnA
The above code successfully outputs the cell values in each row of the first column to DateColumnA
I'd like to paste the values stored in DateColumnA to this existing destination workbook:
#file to be pasted into
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
But I am missing a piece conceptually here. I can't connect the dots. Can someone advise how I can get this data from my source workbook to the new destination workbook?
Let's say you want to write the column starting at cell 'A1' of 'Sheet1' in wb3:
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
for counter in range(len(DateColumnA)):
    cell_id = 'A' + str(counter + 1)
    ws3[cell_id] = DateColumnA[counter]
wb3.save('C:\\folder\\output.xlsx')
I ended up getting this to write the list to another pre-existing spreadsheet:
for x, rows in enumerate(DateColumnA):
    ws3.cell(row=x+1, column=1).value = rows
    #print(rows)
wb3.save('C:\\folder\\output.xlsx')
Works great but now I need to determine how to write the data to output.xlsx starting at row 16 instead of row 1 so I don't overwrite the first 16 existing header rows in output.xlsx. Any ideas appreciated.
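One way to shift the starting row is to let enumerate count from the first destination row. This is a sketch only: rows_from is a hypothetical helper and the sample values are made up in place of DateColumnA.

```python
def rows_from(values, start_row):
    """Pair each value with its destination row, counting up from start_row."""
    return list(enumerate(values, start=start_row))


# Made-up values standing in for DateColumnA.
sample = ["2019-10-01", "2019-10-02", "2019-10-03"]
for row, value in rows_from(sample, start_row=16):
    print(row, value)
```

Each pair could then be written with ws3.cell(row=row, column=1, value=value).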
I figured out a more concise way to write the source data to a different starting row on the destination sheet in a different workbook. I do not need to dump the values into a list as I did above. iter_rows does all the work, and openpyxl nicely passes each value to a different workbook and worksheet:
row_offset=5
for rows in ws2.iter_rows(min_row=2, max_row=None, min_col=1, max_col=1):
    for cell in rows:
        ws3.cell(row=cell.row + row_offset, column=1, value=cell.value)
wb3.save('C:\\folder\\DestFile.xlsx')

How to get Python script to write to existing sheet

I am writing a Python script and am stuck on one of the early steps. I am opening an existing sheet and want to add two columns, so I have used this:
#import the writer
import xlwt
#import the reader
import xlrd
#open the sussex results spreadsheet
book = xlrd.open_workbook('sussex.xlsx')
#open the first sheet
first_sheet = book.sheet_by_index(0)
#print the values in the second column of the first sheet
print(first_sheet.col_values(1))
#in row 0, column 6 write "NIF"
sheet1.write(0, 6, "NIF")
#in row 0, column 6 write "Points scored"
sheet1.write(0, 6, "Points scored")
On line 12 I get an error:
name 'sheet1' is not defined
How do I define sheet 1 within the sheet that I have already opened?
sheet1 is never defined; the sheet you opened is bound to first_sheet. Changing the name removes the NameError, but note that xlrd sheets are read-only, so the write calls still need a writable sheet. Try changing it to:
#import the writer
import xlwt
#import the reader
import xlrd
#open the sussex results spreadsheet
book = xlrd.open_workbook('sussex.xlsx')
#open the first sheet
first_sheet = book.sheet_by_index(0)
#print the values in the second column of the first sheet
print(first_sheet.col_values(1))
#in row 0, column 6 write "NIF"
first_sheet.write(0, 6, "NIF")
#in row 0, column 7 write "Points scored" (column 6 would overwrite "NIF")
first_sheet.write(0, 7, "Points scored")
Edit: You could also use pandas to read and write to Excel:
import pandas as pd
import numpy as np
#open the sussex results spreadsheet, first sheet is used automatically
df = pd.read_excel('sussex.xlsx')
#print the values in the second column of the first sheet
print(df.iloc[:,1])
#Create column 'NIF'
df['NIF'] = np.nan #I don't know what you want to do with this column, so I filled it with NaN's
#Create column 'Points scored'
df['Points scored'] = np.nan #I don't know what you want to do with this column, so I filled it with NaN's
#... Do whatever calculations you want with NIF and Points scored ...
# Write output
df.to_excel('sussex.xlsx')
I guess you need something like
sheet1 = book.sheet_by_index(0), because sheet1 is currently not defined.
Also, the document is opened using xlrd, which is only a reader. To write values you also need a writer such as xlwt.

how to sort xls file column wise and write it to another file with entire row using python?

How can I sort an xls file by a column and write the result to another file, keeping each row whole, using Python? The xls file has to be sorted column-wise, and after sorting it has to be written into another file.
How about:
import csv
column = 0 #The column you want to sort by
#the csv module only reads text, so this works only if the file really holds CSV data
with open('input.xls', newline='') as src:
    reader = list(csv.reader(src))
reader.sort(key=lambda x: x[column])
with open('output.xls', 'w', newline='') as dst:
    csv.writer(dst).writerows(reader)
My bad. Well, you can always export as CSV, I guess. If you want to stick to xls you can use xlrd and xlwt. I haven't worked much with these, but I do have a sample from a task I had to do a while back. Here it is (note that it is not 100% good, because the cell titles for each column will be stored as the first row of data in the output file):
import xlwt
from xlrd import open_workbook
target_column = 0
book = open_workbook('input.xls', formatting_info=True)
sheet = book.sheets()[0]
#read every row of the first sheet into a list of lists
data = [sheet.row_values(i) for i in range(sheet.nrows)]
#the first row holds the column titles and is kept out of the sort
labels = data[0]
data = data[1:]
data.sort(key=lambda x: x[target_column])
wbk = xlwt.Workbook()
sheet = wbk.add_sheet(sheet.name)
for idx, label in enumerate(labels):
    sheet.write(0, idx, label)
for idx_r, row in enumerate(data):
    for idx_c, value in enumerate(row):
        sheet.write(idx_r+1, idx_c, value)
wbk.save('result.xls')
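The sorting step itself does not depend on xlrd or xlwt. A minimal sketch on a plain list of lists, such as the rows that sheet.row_values produces; sort_rows is a hypothetical helper and the table is made-up sample data:

```python
def sort_rows(rows, column, has_header=True):
    """Sort whole rows by one column, leaving an optional header row on top."""
    header = rows[:1] if has_header else []
    body = rows[1:] if has_header else list(rows)
    # sorted() is stable, so rows with equal keys keep their original order.
    return header + sorted(body, key=lambda row: row[column])


# Made-up sample table: first row is the header.
table = [
    ["name", "score"],
    ["Cara", 72.0],
    ["Abel", 91.5],
    ["Bea", 85.0],
]
for row in sort_rows(table, column=1):
    print(row)
```

Because each row is moved as a unit, the values in the other columns stay attached to the sorted key.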
