Tablib export corrupting files

I'm writing a simple script to convert CSV back to XLS with Tablib in Python.
As I understand it, Tablib does the conversion for you if you import the CSV.
import tablib
imported_data = tablib.import_set(open('DB.csv',encoding='utf8').read())
f = open('workfile.xls', 'wb')
f.write(imported_data.xls)
f.close()
This code handles a small sample of the database, but fails at a certain size (~600 lines): it runs without errors, but Excel cannot open the resulting file.
I'm not sure how to proceed. Is Tablib failing, or is Excel failing to read the encoded data?

These two functions let you import from CSV and then export to an Excel file.
import csv
from xlsxwriter import Workbook
import operator
# This function imports the rows of a CSV file as a list of dictionaries
def CSV2list_dict(file_name):
    with open(file_name) as f:
        a = [{k: int(v) for k, v in row.items()}
             for row in csv.DictReader(f, skipinitialspace=True)]
    return a
# file_name must end with .xlsx
# The second parameter represents the header row of the data in Excel,
# The type of header is a list of strings,
# The third parameter represents the data as a list of dictionaries
# The last parameter is the key to sort the rows by
def Export2excel(file_name, header_row, list_dict, order_by):
    list_dict.sort(key=operator.itemgetter(order_by))
    wb=Workbook(file_name)
    ws=wb.add_worksheet("New Sheet") # or leave it blank, default name is "Sheet1"
    first_row=0
    for header in header_row:
        col=header_row.index(header) # we are keeping order.
        ws.write(first_row,col,header) # write the first row, which is the header of the worksheet.
    row=1
    for art in list_dict:
        for _key,_value in art.items():
            col=header_row.index(_key)
            ws.write(row,col,_value)
        row+=1 # move to the next row
    wb.close()
csv_data = CSV2list_dict('DB.csv')
header = ['col0','col1','col2']
order = 'col0' # the type of col0 is int
Export2excel('workfile.xlsx', header, csv_data, order)

As an alternative approach, you could just ask Excel to do the conversion as follows:
import win32com.client as win32
import os
excel = win32.gencache.EnsureDispatch('Excel.Application')
src_filename = r"c:\my_folder\my_file.csv"
name, ext = os.path.splitext(src_filename)
target_filename = name + '.xls'
wb = excel.Workbooks.Open(src_filename)
excel.DisplayAlerts = False
wb.DoNotPromptForConvert = True
wb.CheckCompatibility = False
wb.SaveAs(target_filename, FileFormat=56, ConflictResolution=2)
excel.Application.Quit()
Microsoft has a list of File formats that you can use, where 56 is used for xls.

If you are using the new openpyxl 2.5 this will not work. You need to remove 2.5 and instead pip install 2.4.9.
import tablib
Depending on whether it is a Dataset (one sheet) or a Databook (multiple sheets), you need to declare one of the following (changes here):
imported_data = tablib.Dataset()
or
imported_data = tablib.Databook()
Then you can import your data (changes here):
imported_data.csv = open('DB.csv', encoding='utf8').read()
Without the .csv, as in your example, tablib doesn't know the format:
imported_data = tablib.import_set(open('DB.csv',encoding='utf8').read())
then you could print to see the various options you have.
print(imported_data)
print(imported_data.csv)
print(imported_data.xlsx)
print(imported_data.dict)
print(imported_data.db)
etc.
Then write your file.(No changes here)
f = open('workfile.xls', 'wb')
f.write(imported_data.xls) # or .xlsx
f.close()
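If Excel still refuses to open the exported file, it may help to find out which CSV rows are unusual before blaming either tool. The following is only a minimal sketch that uses the standard csv module, not Tablib; the function name find_suspect_rows and the two checks are assumptions made for illustration. It flags rows whose field count differs from the header row and cells longer than 32,767 characters, which is Excel's per-cell text limit.

```python
import csv

EXCEL_CELL_CHAR_LIMIT = 32767  # Excel's maximum number of characters per cell


def find_suspect_rows(csv_path, encoding="utf8"):
    """Return (line_number, reason) tuples for rows that look problematic.

    Hypothetical helper: it only inspects the CSV, it does not convert it.
    """
    problems = []
    with open(csv_path, encoding=encoding, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return problems  # empty file, nothing to check
        for row in reader:
            if len(row) != len(header):
                problems.append(
                    (reader.line_num,
                     "expected %d fields, got %d" % (len(header), len(row))))
            if any(len(cell) > EXCEL_CELL_CHAR_LIMIT for cell in row):
                problems.append(
                    (reader.line_num, "cell exceeds Excel's character limit"))
    return problems


# Example usage (assumes DB.csv from the question exists):
# for line_number, reason in find_suspect_rows("DB.csv"):
#     print(line_number, reason)
```

If this reports nothing around the point where the export starts failing, the problem is more likely in the library versions than in the data.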

Related

How to append result of for loop in excel file?

I have a .txt file with a list of keywords, I read this file and for each keyword generate some kind of string. I would like to append this string generated for each keyword to excel file. I'd like also that each time I re run the script and read .txt file with new keywords, result is always appended to the same excel file instead of overwriting it.
I have tried this, but not sure if openpyxl is a good method, also I get an error:
raise ValueError("{0} is not a valid column name".format(str_col))
ValueError: tapis roulant elettrico is not a valid column name
for line
page.append(some_result)
from openpyxl.workbook import Workbook
from openpyxl import load_workbook
headers = ['data']
workbook_name = 'Example.xlsx'
wb = Workbook()
page = wb.active
page.title = 'data'
page.append(headers)
some_result = {}
val = "some result"
with open("keywords.txt", "r") as file:
    for line in file:
        some_result = {line: val}
        page.append(some_result)
        wb.save(filename=workbook_name)
file.close()

Just my opinion: it's definitely not a good idea to save your file for every entry in your loop. It'll slow things down, and overall it will probably break things and make them more complicated. I think it is likely you are saving a new column name on every iteration. I made a few changes and comments that I didn't really test, but hopefully they help you get moving in the right direction. I'm assuming you just want a single column in your Excel sheet with the keywords you mention, but to give you a complete solution I would need to know whether you're allowing duplicates, why you need it in Excel format at all, and a few other things. If a CSV is acceptable (Excel can read these), then there is a much simpler solution than what you're doing if you use numpy and/or pandas.
from openpyxl.workbook import Workbook
from openpyxl import load_workbook
headers = ['data']
workbook_name = 'Example.xlsx'
wb = Workbook()
page = wb.active
page.title = 'data'
page.append(headers)
some_result = {}
val = "some result"
temp_page_list = []
with open("keywords.txt", "r") as file:
    for line in file:
        some_result = {line: val}
        #print(some_result)
        #don't append to your real excel file here in the loop, doing it in a simple list will be less complicated and faster
        #page.append(some_result)
        temp_page_list.append(some_result)
        #don't save things here
        #wb.save(filename=workbook_name)
file.close()
#print some or all of temp_page_list here
#if it looks right, you can perhaps convert the list directly by iterating and saving the elements
#a better option may be to use a built-in function from openpyxl to add the contents of temp_page_list if such a function exists

I have not worked with openpyxl before but I want to give you a general understanding of the python code that you wrote.
from openpyxl.workbook import Workbook
from openpyxl import load_workbook
headers = ['data']
workbook_name = 'Example.xlsx'
wb = Workbook()
page = wb.active
page.title = 'data'
page.append(headers)
some_result = {}
val = "some result"
with open("keywords.txt", "r") as file:
    for line in file:
        # some_result = {line: val} # this is a dictionary
        some_result = val + str(line).strip() # this is a string
        page.append([some_result]) # append() expects a list, one item per column
wb.save(filename=workbook_name) # the with block has already closed the file
You are trying to append a dictionary whose key is the line variable and whose value is the val variable. When you append a dictionary, openpyxl treats each key as a column (a column letter or index) and puts the value in that column of the new row. Your keyword is not a valid column, which is why you get the "is not a valid column name" error. If you try the above code, I think it would append everything under one column. If you want separate columns, pass a list with one item per column, or a dictionary keyed by column letters.
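To also cover the "append on every run instead of overwriting" part of the question, here is a minimal sketch, assuming openpyxl is installed. The function name append_keyword_results and the two-column layout (keyword, data) are assumptions for illustration and not something the original posts specify. It loads the workbook if it already exists, creates it with a header row otherwise, appends one row per keyword, and saves once at the end.

```python
import os

from openpyxl import Workbook, load_workbook


def append_keyword_results(keywords_path, workbook_name, val="some result"):
    """Append one row per keyword; create the workbook on the first run.

    Hypothetical helper: the (keyword, data) column layout is an assumption.
    Returns the number of rows in the sheet after appending.
    """
    if os.path.exists(workbook_name):
        wb = load_workbook(workbook_name)  # reuse the existing file
        page = wb.active
    else:
        wb = Workbook()
        page = wb.active
        page.title = "data"
        page.append(["keyword", "data"])  # header row, written only once

    with open(keywords_path, "r") as f:
        for line in f:
            keyword = line.strip()
            if keyword:  # skip blank lines
                page.append([keyword, val])  # a list: one item per column

    wb.save(workbook_name)  # save once, after the loop
    return page.max_row


# Example usage (file names taken from the question):
# append_keyword_results("keywords.txt", "Example.xlsx")
```

Running it a second time with a new keywords file adds the new rows below the existing ones, because the workbook is loaded rather than recreated.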

stripping data from two XLXS cells to csv

I'm having an issue where I'm trying to take data from two cells in an Excel spreadsheet and put them into a CSV file. The data is lat and lon coordinates, so they have to be side by side to be read by the program. Here is what I have:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import xlwt
import xlrd
import csv
import os, openpyxl, glob
from openpyxl import Workbook
with open ('test.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for file in glob.glob ("/test"):
        wb = openpyxl.load_workbook('test-data.xlsx')
        ws = wb.active
        def lat():
            for row in ws.iter_rows('Q2:Q65536'):
                for cell in row:
                    lat = cell.value
                    return lat
        def lon():
            for row in ws.iter_rows('R2:R65536'):
                for cell in row:
                    lon = cell.value
                    return lon
        cord=lat()+","+lon()
        print (lat()+","+lon()) #just to see if its working
        #spamwriter.writerow([cord]) uncomment to write to file
However, it only gives me the first row of data, not the rest of the rows (test-data has around 1500 rows). How would I make it go through the whole file?

This may not be the most dynamic way, but I would use pandas for this task. It has built-in pd.read_excel() and DataFrame.to_csv() functions.
import pandas as pd
import string
latColumn = string.ascii_lowercase.index('q') # determine the index that corresponds to the Excel column letter (use lower case)
longColumn = string.ascii_lowercase.index('r') # Does not work for AA, BB, ...
data = pd.read_excel('test-data.xlsx', sheet_name='Sheet1', usecols=[latColumn,longColumn]) # parse_cols in old pandas versions
# Total number of rows being read in 65536 - 2 = 65534
csvOut = "foo.csv"
data[:65534].to_csv(csvOut, index=False, header=False)
If you need to append to the file rather than replace it, change the data[:65534].to_csv(....) to
with open(csvOut, 'a') as f: # append to the .csv file of your liking
    data[:65534].to_csv(f, index=False, header=False)
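The code in the question stops after one row because each helper returns inside its loop on the first cell. If you would rather stay with openpyxl, the following is a minimal sketch that walks columns Q and R together, one row at a time. It assumes a recent openpyxl (the values_only argument of iter_rows) and Python 3; the function name export_lat_lon and its parameters are made up for illustration.

```python
import csv

from openpyxl import load_workbook


def export_lat_lon(xlsx_path, csv_path, lat_col=17, lon_col=18, first_row=2):
    """Write two worksheet columns side by side into a CSV file.

    Hypothetical helper: columns are 1-based, so Q is 17 and R is 18.
    Returns the number of coordinate pairs written.
    """
    wb = load_workbook(xlsx_path)
    ws = wb.active
    written = 0
    with open(csv_path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        # Each iteration yields the values of one row between the two columns
        for row in ws.iter_rows(min_row=first_row, min_col=lat_col,
                                max_col=lon_col, values_only=True):
            lat, lon = row[0], row[-1]
            if lat is None or lon is None:
                continue  # skip rows without a complete coordinate pair
            writer.writerow([lat, lon])
            written += 1
    return written


# Example usage (file names taken from the question):
# export_lat_lon("test-data.xlsx", "test.csv")
```

Because the writer receives both values in one writerow call, latitude and longitude end up in adjacent CSV fields on the same line.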

csv Writer using Datafields returned by Pandas

Hello I'm working on a project that reads an excel worksheet, collects columns of data based on header title, and then writes that data to a much leaner csv file which I'll be using for more fun later.
I'm getting a syntax error while trying to write my new csv file, I think it has something to do with the datafields I'm using to get my columns in pandas.
I'm new to Python so any help you can provide would be great, thanks!
import pandas
import xlrd
import csv
def csv_from_excel():
    wb = xlrd.open_workbook("C:\\Python27\\Work\\spreadsheet.xlsx")
    sh = wb.sheet_by_name('Sheet1')
    spoofingFile = open('spoofing.csv', 'wb')
    wr = csv.writer(spoofingFile, quoting=csv.QUOTE_ALL)
    for rownum in xrange(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    spoofingFile.close()
csv_from_excel()
df = pandas.read_csv('C:\\Python27\\Work\\spoofing.csv')
time = df["InviteTime (Oracle)"]
orignum = df["Orig Number"]
origip = df["Orig IP Address"]
destnum = df["Dest Number"]
sheet0bj = csv.writer(open("complete.csv", "wb")
sheet0bj.writerow([time,orignum,origip,destnum])
The syntax error is thus:
file c:\python27\work\formatsheettest.py, line36
sheet0bj.writerow([time, orignum, origip, destnum])
^
Syntax error: Invalid syntax

You're missing a closing paren on the second to last line.
sheet0bj = csv.writer(open("complete.csv", "wb")
should be
sheet0bj = csv.writer(open("complete.csv", "wb"))
I assume you've figured that out by now, though.
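Once the parenthesis is fixed, note that writerow([time, orignum, origip, destnum]) would, as far as I can tell, write a single row whose four cells are the text form of whole columns rather than one row per record. A minimal sketch of copying only the wanted columns, using just the standard csv module instead of pandas (a deliberate swap to keep the example dependency-free), might look like this. The function name copy_columns is an assumption; Python 3 is assumed as well.

```python
import csv


def copy_columns(src_path, dst_path, columns):
    """Copy only the named columns from one CSV file to another.

    Hypothetical helper: returns the number of data rows written.
    """
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" drops every column not listed in fieldnames
        writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
    return count


# Example usage (column names taken from the question):
# copy_columns("spoofing.csv", "complete.csv",
#              ["InviteTime (Oracle)", "Orig Number",
#               "Orig IP Address", "Dest Number"])
```

With pandas already loaded, selecting the columns from the DataFrame and calling its to_csv method would be the equivalent route.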

How to concatenate three excels files xlsx using python?

Hello I would like to concatenate three excels files xlsx using python.
I have tried using openpyxl, but I don't know which function could help me to append three worksheet into one.
Do you have any ideas how to do that ?
Thanks a lot

Here's a pandas-based approach. (It's using openpyxl behind the scenes.)
import pandas as pd
# filenames
excel_names = ["xlsx1.xlsx", "xlsx2.xlsx", "xlsx3.xlsx"]
# read them in
excels = [pd.ExcelFile(name) for name in excel_names]
# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]
# delete the first row for all frames except the first
# i.e. remove the header row -- assumes it's the first
frames[1:] = [df[1:] for df in frames[1:]]
# concatenate them..
combined = pd.concat(frames)
# write it out
combined.to_excel("c.xlsx", header=False, index=False)

I'd use xlrd and xlwt. Assuming you literally just need to append these files (rather than doing any real work on them), I'd do something like: Open up a file to write to with xlwt, and then for each of your other three files, loop over the data and add each row to the output file. To get you started:
import xlwt
import xlrd
wkbk = xlwt.Workbook()
outsheet = wkbk.add_sheet('Sheet1')
xlsfiles = [r'C:\foo.xlsx', r'C:\bar.xlsx', r'C:\baz.xlsx']
outrow_idx = 0
for f in xlsfiles:
    # This is all untested; essentially just pseudocode for concept!
    insheet = xlrd.open_workbook(f).sheets()[0]
    for row_idx in range(insheet.nrows):
        for col_idx in range(insheet.ncols):
            outsheet.write(outrow_idx, col_idx,
                           insheet.cell_value(row_idx, col_idx))
        outrow_idx += 1
wkbk.save(r'C:\combined.xls')
If your files all have a header line, you probably don't want to repeat that, so you could modify the code above to look more like this:
firstfile = True # Is this the first sheet?
for f in xlsfiles:
    insheet = xlrd.open_workbook(f).sheets()[0]
    for row_idx in range(0 if firstfile else 1, insheet.nrows):
        pass # processing; etc
    firstfile = False # We're done with the first sheet.

When I combine excel files (mydata1.xlsx, mydata2.xlsx, mydata3.xlsx) for data analysis, here is what I do:
import pandas as pd
import numpy as np
import glob
frames = []
for f in glob.glob('myfolder/mydata*.xlsx'):
    frames.append(pd.read_excel(f))
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
all_data = pd.concat(frames, ignore_index=True)
Then, when I want to save it as one file:
with pd.ExcelWriter('mycollected_data.xlsx', engine='xlsxwriter') as writer:
    all_data.to_excel(writer, sheet_name='Sheet1')

Solution with openpyxl only (without a bunch of other dependencies).
This script should take care of merging together an arbitrary number of xlsx documents, whether they have one or multiple sheets. It will preserve the formatting.
There's a function to copy sheets in openpyxl, but it is only from/to the same file. There's also a function insert_rows somewhere, but by itself it won't insert any rows. So I'm afraid we are left to deal (tediously) with one cell at a time.
As much as I dislike using for loops and would rather use something compact and elegant like list comprehension, I don't see how to do that here as this is a side-effect show.
Credit to this answer on copying between workbooks.
#!/usr/bin/env python3
#USAGE
#mergeXLSX.py <a bunch of .xlsx files> ... output.xlsx
#
#where output.xlsx is the unified file
#This works FROM/TO the xlsx format. Libreoffice might help to convert from xls.
#localc --headless --convert-to xlsx somefile.xls
import sys
from copy import copy
from openpyxl import load_workbook,Workbook
def createNewWorkbook(manyWb):
    for wb in manyWb:
        for sheetName in wb.sheetnames:
            o = theOne.create_sheet(sheetName)
            safeTitle = o.title
            copySheet(wb[sheetName],theOne[safeTitle])
def copySheet(sourceSheet,newSheet):
    for row in sourceSheet.rows:
        for cell in row:
            newCell = newSheet.cell(row=cell.row, column=cell.col_idx,
                                    value= cell.value)
            if cell.has_style:
                newCell.font = copy(cell.font)
                newCell.border = copy(cell.border)
                newCell.fill = copy(cell.fill)
                newCell.number_format = copy(cell.number_format)
                newCell.protection = copy(cell.protection)
                newCell.alignment = copy(cell.alignment)
filesInput = sys.argv[1:]
theOneFile = filesInput.pop(-1)
myfriends = [ load_workbook(f) for f in filesInput ]
#try this if you are bored
#myfriends = [ openpyxl.load_workbook(f) for k in range(200) for f in filesInput ]
theOne = Workbook()
del theOne['Sheet'] #We want our new book to be empty. Thanks.
createNewWorkbook(myfriends)
theOne.save(theOneFile)
Tested with openpyxl 2.5.4, python 3.4.

You can simply use pandas and os library to do this.
import pandas as pd
import os
#collect the data of every file here, then combine it once at the end
frames = []
for files in os.listdir():
    #make sure you are only reading excel files
    if files.endswith('.xlsx'):
        data = pd.read_excel(files, index_col=None)
        frames.append(data)
        #move the files to another folder so that they are not processed multiple times
        os.rename(files, 'path to some other folder')
#DataFrame.append was removed in pandas 2.0, so combine with pd.concat
mergedData = pd.concat(frames)
The mergedData DataFrame will hold all the combined data, which you can export to a separate Excel or CSV file. The same code will work with CSV files as well: just change the extension in the if condition and read with pd.read_csv.

Just to add to p_barill's answer, if you have custom column widths that you need to copy, you can add the following to the bottom of copySheet:
    for col in sourceSheet.column_dimensions:
        newSheet.column_dimensions[col] = sourceSheet.column_dimensions[col]
I would just post this in a comment on his or her answer but my reputation isn't high enough.

How can I open an Excel file in Python?

How do I open a file that is an Excel file for reading in Python?
I've opened text files, for example, sometextfile.txt with the reading command. How do I do that for an Excel file?

Edit:
In the newer version of pandas, you can pass the sheet name as a parameter.
file_name = # path to file + file name
sheet = # sheet name or sheet number or list of sheet numbers and names
import pandas as pd
df = pd.read_excel(io=file_name, sheet_name=sheet)
print(df.head(5)) # print first 5 rows of the dataframe
Check the docs for examples on how to pass sheet_name: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
Old version:
you can use pandas package as well....
When you are working with an excel file with multiple sheets, you can use:
import pandas as pd
xl = pd.ExcelFile(path + filename)
xl.sheet_names
>>> [u'Sheet1', u'Sheet2', u'Sheet3']
df = xl.parse("Sheet1")
df.head()
df.head() will print first 5 rows of your Excel file
If you're working with an Excel file with a single sheet, you can simply use:
import pandas as pd
df = pd.read_excel(path + filename)
print(df.head())

Try the xlrd library.
[Edit] - from what I can see from your comment, something like the snippet below might do the trick. I'm assuming here that you're just searching one column for the word 'john', but you could add more or make this into a more generic function.
from xlrd import open_workbook
book = open_workbook('simple.xls',on_demand=True)
for name in book.sheet_names():
    if name.endswith('2'):
        sheet = book.sheet_by_name(name)
        # Attempt to find a matching row (search the first column for 'john')
        rowIndex = -1
        for i, cell in enumerate(sheet.col(0)):
            if 'john' in str(cell.value): # str() because numeric cells hold floats
                rowIndex = i
                break
        # If we found the row, print it
        if rowIndex != -1:
            cells = sheet.row(rowIndex)
            for cell in cells:
                print(cell.value)
        book.unload_sheet(name)

This isn't as straightforward as opening a plain text file and will require some sort of external module since nothing is built-in to do this. Here are some options:
http://www.python-excel.org/
If possible, you may want to consider exporting the excel spreadsheet as a CSV file and then using the built-in python csv module to read it:
http://docs.python.org/library/csv.html
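For the CSV route, a minimal sketch of reading an exported file with the built-in csv module could look like the following. It assumes Python 3 and a file whose first row holds the column names; the function name read_csv_rows is made up for illustration.

```python
import csv


def read_csv_rows(path, encoding="utf-8"):
    """Read a CSV file into a list of dictionaries keyed by the header row.

    Hypothetical helper: every value comes back as a string.
    """
    # newline="" lets the csv module handle line endings inside quoted cells
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


# Example usage (assumes the sheet was exported as exported.csv):
# for row in read_csv_rows("exported.csv"):
#     print(row)
```

The trade-off is that a CSV export keeps only cell values: formatting, formulas and additional sheets are lost.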

There's the openpyxl package:
>>> from openpyxl import load_workbook
>>> wb2 = load_workbook('test.xlsx')
>>> print(wb2.sheetnames)
['Sheet2', 'New Title', 'Sheet1']
>>> worksheet1 = wb2['Sheet1'] # one way to load a worksheet
>>> worksheet2 = wb2.get_sheet_by_name('Sheet2') # another way (older API, deprecated in recent openpyxl)
>>> print(worksheet1['D18'].value)
3
>>> for row in worksheet1.iter_rows():
...     print(row[0].value)

You can use xlpython package that requires xlrd only.
Find it here https://pypi.python.org/pypi/xlpython
and its documentation here https://github.com/morfat/xlpython

This may help:
This creates a node that takes a 2D list (a list of lists) and pushes it into the Excel spreadsheet. Make sure the IN[] inputs are present, or it will throw an exception.
This is a rewrite of the Revit Excel Dynamo node for Excel 2013, as the default prepackaged node kept breaking. I also have a similar read node. The Excel syntax in Python is touchy.
thnx #CodingNinja - updated : )
###Export Excel - intended to replace malfunctioning excel node
import clr
clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c')
##AddReferenceGUID("{00020813-0000-0000-C000-000000000046}") ''Excel C:\Program Files\Microsoft Office\Office15\EXCEL.EXE
##Need to Verify interop for version 2015 is 15 and node attachemnt for it.
from Microsoft.Office.Interop import * ##Excel
################################Initialize FP and Sheet ID
##Same functionality as the excel node
strFileName = IN[0] ##Filename
sheetName = IN[1] ##Sheet
RowOffset= IN[2] ##RowOffset
ColOffset= IN[3] ##COL OFfset
Data=IN[4] ##Data
Overwrite=IN[5] ##Check for auto-overwtite
XLVisible = False #IN[6] ##XL Visible for operation or not?
RowOffset=0
if IN[2]>0:
    RowOffset=IN[2] ##RowOffset
ColOffset=0
if IN[3]>0:
    ColOffset=IN[3] ##COL OFfset
if IN[6]!=False:
    XLVisible = True #IN[6] ##XL Visible for operation or not?
################################Initialize FP and Sheet ID
xlCellTypeLastCell = 11 #####define special sells value constant
################################
xls = Excel.ApplicationClass() ####Connect with application
xls.Visible = XLVisible ##VISIBLE YES/NO
xls.DisplayAlerts = False ### ALerts
import os.path
if os.path.isfile(strFileName):
    wb = xls.Workbooks.Open(strFileName, False) ####Open the file
else:
    wb = xls.Workbooks.Add() ####Create a new workbook
    wb.SaveAs(strFileName)
wb.application.visible = XLVisible ####Show Excel
try:
    ws = wb.Worksheets(sheetName) ####Get the sheet in the WB base
except:
    ws = wb.sheets.add() ####If it doesn't exist- add it. use () for object method
    ws.Name = sheetName
#################################
#lastRow for iterating rows
lastRow=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Row
#lastCol for iterating columns
lastCol=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Column
#######################################################################
out=[] ###MESSAGE GATHERING
c=0
r=0
val=""
if Overwrite == False : ####Look ahead for non-empty cells to throw error
    for r, row in enumerate(Data): ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
        for c, col in enumerate (row): ####BASE 0## Each column in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
            existing = ws.Cells[r+1+RowOffset,c+1+ColOffset].Value2 ####check the target cell, not the incoming data
            if existing is not None and existing != "" :
                OUT= "ERROR- Cannot overwrite"
                raise ValueError("ERROR- Cannot overwrite")
##out.append(Data[0]) ##append message for error
############################################################################
for r, row in enumerate(Data): ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
    for c, col in enumerate (row): ####BASE 0## Each column in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
        ws.Cells[r+1+RowOffset,c+1+ColOffset].Value2 = col.__str__()
##run macro disbled for debugging excel macro
##xls.Application.Run("Align_data_and_Highlight_Issues")

import pandas as pd
import os
files = os.listdir('path/to/files/directory/')
desiredFile = files[i] # i is the index of the file you want
filePath = 'path/to/files/directory/%s'
Ofile = filePath % desiredFile
xls_import = pd.read_excel(Ofile) # use pd.read_csv(Ofile) if the file is a CSV
Now you can use the power of pandas DataFrames!

This code worked for me with Python 3.5.2. It creates a CSV file, which Excel can open. I am currently working on how to save data into the file, but this is the code:
import csv
excel = csv.writer(open("file1.csv", "w", newline=""))
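To actually get data into the file, a minimal sketch for Python 3 follows. The function name write_rows and the sample data in the usage comment are assumptions for illustration. The file is opened in text mode with newline="" (binary mode is the Python 2 convention and makes the csv writer fail on Python 3 as soon as a row is written), and the with block ensures the file is closed.

```python
import csv


def write_rows(path, header, rows):
    """Write a header row followed by data rows to a CSV file.

    Hypothetical helper; Excel can open the resulting .csv directly.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)   # one list is one row
        writer.writerows(rows)    # a list of lists is many rows


# Example usage with made-up data:
# write_rows("file1.csv", ["name", "score"], [["john", 10], ["jane", 12]])
```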
