My aim is to compare a table from an .htm file with a table from an .xlsx file, and I have done this by converting both to DataFrames with pandas. The comparison works and print(data[y].values[z,1]) displays the correct value from the xlsx file, but when I try to copy that value into a new xlsx file (with the name and value as columns), worksheet.write_string(row,col,data[y].values[z,1]) raises an error. I then tried converting the value to a string first with value=str(data[y].values[z,1]); print(value) still shows the right value, and worksheet.write_string(row,col,value) no longer errors, but every value written to the output file is "nan". The name column shows up fine, only the value column does not. Is it because my value is an 8-bit value like 8'h0 that contains the ' symbol, so the library cannot handle it? If so, how can I solve this problem?
This is the output file:
This is what I get with print(data[y].values[z,1]):
This is my source code:
import pandas as pd
import numpy as np
import xlsxwriter

htm = pd.read_html('HAS.htm')[5]
xlsx = pd.ExcelFile('MTL_SOCSouth_PCH_and_IOE_Security_Parameters.xlsm')

workbook = xlsxwriter.Workbook('Output01.xlsx')
worksheet = workbook.add_worksheet()

sheets = xlsx.sheet_names
#remove unwanted sheets
sheets.pop(0)
sheets.pop(0)
sheets.pop(0)
sheets.pop(-1)
sheets.pop(-1)
sheets.pop(-1)
sheets.pop(-1)

#create an array to store data for each sheet
data = [0] * len(sheets)
#insert each sheet into the array
for x in range(len(sheets)):
    data[x] = xlsx.parse(sheets[x], header=4, usecols='B,AM')
    data[x] = pd.DataFrame(data[x])

#initialize to the first row
row = 0
#loop from the first row of the htm file to the last row
for x in range(len(htm.index)):
    chapter = htm.values[x, 3]
    chapter = chapter[:chapter.find(": ")]
    chapter = chapter.split("Chapter ", maxsplit=1)[-1]
    #if the chapter is equal to 37 then proceed, ignore if not 37
    if chapter == '37':
        col = 0
        source = htm.values[x, 0]
        source = source[:source.find("[")]
        print(source)
        for y in range(len(sheets)):
            for z in range(len(data[y].index)):
                target = data[y].values[z, 0]
                targetname = str(target)
                worksheet.write(row, col, targetname)
                if source == target:
                    col += 1
                    print(sheets[y])
                    worksheet.write(row, col, sheets[y])
                    col += 1
                    print(data[y].values[z, 1])
                    worksheet.write_string(row, col, data[y].values[z, 1])
        row += 1

workbook.close()
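For reference, a minimal sketch of the behaviour described above, assuming the failing cells were read in by pandas as NaN floats (which is what the "nan" values in the output file suggest): write_string() only accepts text, so passing the raw float fails, and str() on a NaN produces the literal string "nan". Checking with pd.isna() before writing avoids both problems.

import pandas as pd
import xlsxwriter

# Hypothetical minimal check, not part of the question's code: write_string()
# only accepts a str, so a cell that pandas read in as NaN (a float) errors,
# and wrapping it in str() writes the literal text "nan" instead.
workbook = xlsxwriter.Workbook('nan_check.xlsx')
worksheet = workbook.add_worksheet()

value = float('nan')  # stand-in for data[y].values[z, 1] when the xlsx cell is empty

if pd.isna(value):
    worksheet.write_blank(0, 0, None)         # write an empty cell instead of "nan"
else:
    worksheet.write_string(0, 0, str(value))  # real values such as 8'h0 are written as text

workbook.close()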
I am working on a problem where I have two .tsv files, and one has been arranged wrongly with respect to the other. When I scan the files I notice a pattern that I am unable to express in code. The pattern I observed was:
For every increase of one row in the metadata file, I have to move 8 rows forward in flipped_metadata.tsv to find the same values.
For every increase of one row in the flipped_metadata file, I have to move 12 rows forward in metadata.tsv to find the same values.
For more clarity I have attached the two .tsv files:
Metadata.tsv file and Flipped_metadata.tsv file
The openpyxl library has good functions for dealing with Excel cell locations. These can be used to convert a reference like A1 into proper numeric row and column values.
Read each row in and convert the cell reference to a simple numeric row and column value. Use a dictionary to store each cell found, together with the two values for that cell, e.g. cells[(1,1)] = "123 456".
While reading, keep track of the largest row and column seen.
Create an empty array (a list of lists) large enough for each cell to be assigned into.
Iterate over all of the dictionary items and assign each value into the array.
Finally, save the array to a new CSV file.
For example:
from openpyxl.utils.cell import coordinate_from_string, column_index_from_string
import csv

def flip(input_filename, output_filename):
    cells = {}
    max_row = 0
    max_col = 0

    with open(input_filename) as f_input:
        for cell, v1, v2 in csv.reader(f_input, delimiter='\t'):
            col_letter, row_number = coordinate_from_string(cell)
            col_number = column_index_from_string(col_letter)
            cells[(row_number, col_number)] = f"{v1} {v2}"

            if row_number > max_row:
                max_row = row_number

            if col_number > max_col:
                max_col = col_number

    output = [[''] * max_col for _ in range(max_row)]

    for (row_number, col_number), values in cells.items():
        output[row_number - 1][col_number - 1] = values

    with open(output_filename, 'w', newline='') as f_output:
        csv.writer(f_output).writerows(output)

flip('metadata.tsv', 'output_metadata.csv')
flip('flipped_metadata.tsv', 'output_flipped_metadata.csv')
This would give you:
Note: this approach correctly handles all cell references, e.g. FK42. It would also handle holes in the data: if A2 were deleted, everything would still align correctly, since it is not 100% clear whether cells can be missing from the input.
I have an existing Excel document and want to update column M according to column A. I want to start from the second row so that the header in the first row is preserved.
Here is my code:
import openpyxl

wb = openpyxl.load_workbook('D:\Documents\Desktop\deneme/formula.xlsx')
ws = wb['Sheet1']

for i, cellObj in enumerate(ws['M'], 1):
    cellObj.value = '=_xlfn.ISOWEEKNUM(A2)'.format(i)

wb.save('D:\Documents\Desktop\deneme/formula.xlsx')
When I run that code:
-the first row (the header) changes as well.
-every cell in the column gets "ISOWEEKNUM(A2)", but I want the reference to change with the row number (A3, A4, A5..., i.e. ISOWEEKNUM(A3), ISOWEEKNUM(A4), ISOWEEKNUM(A5), ...).
Edit:
For now I have handled the ISOWEEKNUM issue with the code below, by changing A2 to A2:A5.
import openpyxl

wb = openpyxl.load_workbook('D:\Documents\Desktop\deneme/formula.xlsx')
ws = wb['Sheet1']

for i, cellObj in enumerate(ws['M'], 1):
    cellObj.value = '=_xlfn.ISOWEEKNUM(A2:A5)'.format(i)

wb.save('D:\Documents\Desktop\deneme/formula.xlsx')
But it still starts from the first row.
Here is an answer using pandas.
Let us consider the following spreadsheet:
First import pandas:
import pandas as pd
Then load the third sheet of your excel workbook into a dataframe called df:
df=pd.read_excel('D:\Documents\Desktop\deneme/formula.xlsx', sheet_name='Sheet3')
Update column 'column_to_update' using column 'deneme'. The line below converts the dates in the 'deneme' column from strings to datetime objects and then returns the week of the year associated with each of those dates.
df['Column_to_update'] = pd.to_datetime(df['deneme']).dt.week
You can then save your dataframe to a new excel document:
df.to_excel('./newspreadsheet.xlsx', index=False)
Here is the result:
You can see that the values in 'column_to_update' got updated from 1, 2 and 3 to 12, 12 and 18.
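If you would rather keep the formula-based openpyxl approach from the question, a minimal sketch might look like the following, assuming the header is in M1 and the data starts in row 2 (column M is the 13th column). Note that the original code's .format(i) had no {} placeholder, which is why every cell received the literal A2 reference.

import openpyxl

# Sketch only: the path, sheet name and the _xlfn.ISOWEEKNUM formula are reused
# from the question; adjust them to your own workbook.
wb = openpyxl.load_workbook(r'D:\Documents\Desktop\deneme\formula.xlsx')
ws = wb['Sheet1']

# Start at row 2 so the header in M1 is left untouched; each formula
# references column A of its own row via the {} placeholder.
for row in range(2, ws.max_row + 1):
    ws.cell(row=row, column=13).value = '=_xlfn.ISOWEEKNUM(A{})'.format(row)

wb.save(r'D:\Documents\Desktop\deneme\formula.xlsx')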
I am trying to automate a process that basically reads in values from text files into certain excel cells. I have a template in excel that will read data from various sheets under certain names. For example, the template will read in data from "Video scores". Video scores is a .txt file that I copy and paste into excel. There are 5 different text files used in each project so it gets tedious after a while and when there are a lot of projects to complete.
How can I import or copy and paste these .txt files into Excel, into a specified sheet? I have been using openpyxl for the other parts of this project, but I am open to using another library if it can't be done with openpyxl.
I've also tried opening and reading a file, but I couldn't figure out how to do what I want with that either. I have found a list of all the files I need; it's just a matter of getting them into Excel.
Thanks in advance for anyone who helps.
First, read the TXT file into a list in Python. I'm assuming the TXT file looks like this:
1
2
3
4
....
with open(path_txt, "r") as e:
    list1 = [i.strip() for i in e]  # strip the trailing newline from each line
Then paste the values of the list onto the worksheet you need:
from openpyxl import load_workbook

wb = load_workbook(path_xlsx)
ws = wb[sheet_name]
ws["A1"] = "values"  # just a header

row = 2     # start writing at row 2 of the sheet
column = 1  # column "A" of the sheet

for i in list1:
    ws.cell(row=row, column=column).value = i  # write the list value into the current cell
    row += 1  # move on to the next row

wb.save(path_xlsx)
Hope this works for you.
Pandas would do the trick!
Approach:
Have a sheet containing the path to each file, its separator, and the corresponding target sheet name.
Now read this Excel sheet using pandas, iterate over each row of file details, read that file's data, and write it to a new sheet of the target workbook.
import pandas as pd

file_details_path = r"/Users/path for xl sheet/file details/File2XlDetails.xlsx"
target_sheet_path = r"/Users/path to target xl sheet/File samples/FiletoXl.xlsx"

# create a writer to save the file content in excel
writer = pd.ExcelWriter(target_sheet_path, engine='xlsxwriter')

file_details = pd.read_excel(file_details_path,
                             dtype=str,
                             index_col=False,
                             )

def write_to_excel(file, trg_sheet_name):
    # writes it to excel
    file.to_excel(writer,
                  sheet_name=trg_sheet_name,
                  index=False,
                  )

# loop through each file record
for index, file_dtl in file_details.iterrows():
    # you can print and check the row content for reference
    print(file_dtl['File_path'])
    print(file_dtl['Separator'])
    print(file_dtl['Target_sheet_name'])

    # reads file
    file = pd.read_csv(file_dtl['File_path'],
                       sep=file_dtl['Separator'],
                       dtype=str,
                       index_col=False,
                       )

    write_to_excel(file, file_dtl['Target_sheet_name'])

writer.save()
Hope this helps! Let me know if you run into any issues...
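For reference, the details workbook that the loop reads is assumed to have one row per text file, using the column names referenced above (File_path, Separator, Target_sheet_name). A hypothetical way to create it:

import pandas as pd

# Hypothetical example layout for File2XlDetails.xlsx; the path and sheet name
# below are placeholders, only the column names come from the code above.
details = pd.DataFrame({
    'File_path': ['/Users/example/video_scores.txt'],
    'Separator': ['\t'],
    'Target_sheet_name': ['Video scores'],
})
details.to_excel('File2XlDetails.xlsx', index=False)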
I am new to Python, so I'm only just getting to grips with it and would really appreciate some help, as I can't figure out how to write the values from file A into file B.
I would like to:
filter the values of column D in 'mf_mar_2018.xls' (filtering on 'saxon')
write the found values into a new file called 'saxons.xls'
I am able to get the non-filtered values and print them in Terminal.
My script is below:
#import the writer
import xlwt
#open the spreadsheet
workbook = xlwt.Workbook()
#add a sheet named "Club BFA ranking"
worksheet1 = workbook.add_sheet("Club BFA ranking")
#in cell 0,0 (first cell of the first row) write "Ranking"
worksheet1.write(0, 0, "Ranking")
#in cell 0,1 (second cell of the first row) write "Name"
worksheet1.write(0, 1, "Name")
#save and create the spreadsheet file
workbook.save("saxons.xls")
#import the reader
import xlrd
#open the rankings spreadsheet
book = xlrd.open_workbook('mf_mar_2018.xls')
#open the first sheet
first_sheet = book.sheet_by_index(0)
#print the values in the second column of the first sheet
print(first_sheet.col_values(1))
Try something like this.
name = []
rank = []

for i in range(first_sheet.nrows):
    #print(first_sheet.cell_value(i,3))
    if 'Saxon' in first_sheet.cell_value(i, 3):
        name.append(first_sheet.cell_value(i, 2))
        rank.append(first_sheet.cell_value(i, 8))
        print('a')

for j in range(len(name)):
    worksheet1.write(j + 1, 0, rank[j])
    worksheet1.write(j + 1, 1, name[j])

workbook.save("saxons.xls")
I would like to convert a .dbf file to .xls using Python. I've referenced this snippet, however I cannot get the first non-header row to write using this code:
from xlwt import Workbook, easyxf
import dbfpy.dbf

dbf = dbfpy.dbf.Dbf("C:\\Temp\\Owner.dbf")

book = Workbook()
sheet1 = book.add_sheet('Sheet 1')
header_style = easyxf('font: name Arial, bold True, height 200;')

for (i, name) in enumerate(dbf.fieldNames):
    sheet1.write(0, i, name, header_style)

for row in range(1, len(dbf)):
    for col in range(len(dbf.fieldNames)):
        sheet1.row(row).write(col, dbf[row][col])

book.save("C:\\Temp\\Owner.xls")
How can I get the first non header row to write?
Thanks
You are missing row 0 of the dbf, which is the first data row; in dbf files the column names are not stored as a row. However, row 0 of the Excel sheet is the header, so the row index needs to differ between the dbf and the xls: you need to add 1 to the row used in the Excel worksheet.
So
for row in range(len(dbf)):
    for col in range(len(dbf.fieldNames)):
        sheet1.row(row + 1).write(col, dbf[row][col])
Note that the snippet referred to does not add the 1 to the worksheet row either.