Reading contents of excel file in python webapp2

Reading contents of excel file in python webapp2 - python

I have two files namely sample.csv and sample.xlsx, all those files are stored in blobstore.I am able to read the records of csv file(which is in the blobstore) using the following code
blobReader = blobstore.BlobReader(blob_key)
inputFile = BlobIterator(blobReader)
if inputFile is None:
values = None
else:
try:
stringReader = csv.reader(inputFile)
data = []
columnHeaders = []
for rowIndex, row in enumerate(stringReader):
if(rowIndex == 0):
columnHeaders = row
else:
data.append(row)
values = {'columnHeaders' : columnHeaders, 'data' : data}
except:
values = None
self.response.write(values)
The output of the above code of a sample.csv file is
{'columnHeaders': ['First Name', 'Last Name', 'Email', 'Mobile'], 'data': [['fx1', 'lx2', 'flx1x2#xxx.com', 'xxx-xxx-xxxx'], ['fy1', 'ly2', 'fly1y2#yyy.com', 'yyy-yyy-yyyy'], ['fz1', 'lz2', 'flz1z2#zzz.com', 'zzz-zzz-zzzz']]}
Using the xlrd package, i am able to read the excel file contents, but in this i have to specify the exact file location
book = xlrd.open_workbook('D:/sample.xlsx')
first_sheet = book.sheet_by_index(0)
self.response.write(first_sheet.row_values(0))
cell = first_sheet.cell(0,0)
self.response.write(cell.value)
Is there any way to read the excel file contents from the blobstore, i have tried it with the following code
blobReader = blobstore.BlobReader(blobKey)
uploadedFile = BlobIterator(blobReader)
book = xlrd.open_workbook(file_contents=uploadedFile)
(or)
book = xlrd.open_workbook(file_contents=blobReader)
But it throws some error TypeError: 'BlobReader' object has no attribute 'getitem'.
Any ideas? Thanks..

Looking into the doc for open_workbook in the xlrd package doc, it seems that when you pass "file_contents", it's expecting a string.
Then you need to look into turning a Blob into a String, which can be done with BlobReader.read(), which gives you a string of the read data.

Related

replace('\n','') does not work for import to .txt format

The place where I put this code belongs to an import page.And here there is data in the data I want to import in .txt format, but this data contains the \n character.
if request.method == "POST":
txt_file = request.FILES['file']
if not txt_file .name.endswith('.txt'):
messages.info(request,'This is not a txt file')
data_set = csv_file.read().decode('latin-1')
io_string = io.StringIO(data_set)
next(io_string)
csv_reader = csv.reader(io_string, delimiter='\t',quotechar="|")
for column in csv_reader:
b = Module_Name(
user= request.user,
a = column[1],
b = column[2],
c = column[3],
d = column[4],
e = column[5],
f = column[6],
g = column[7],
h = column[8],
)
b.save()
messages.success(request,"Successfully Imported...")
return redirect("return:return_import")
This can be called the full version of my code. To explain, there is a \n character in the data that comes here as column[1]. This file is a .txt file from another export. And in this export column[1];
This is
a value
and my django localhost new-line character seen in unquoted field - do you need to open the file in universal-newline mode? gives a warning and aborts the import to the system.

the csv reader iterates over rows, not columns. So if you want to append the data from a given column together, you must iterate over all the rows first. For example:
import csv
from io import StringIO
io_string = "this is , r0 c1\r\na value, r1 c2\r\n"
io_string = StringIO(io_string)
rows = csv.reader(io_string)
column_0_data = []
for row in rows:
column_0_data.append(row[0])
print("".join(column_0_data))
the rest of your code looks iffy to me, but that is off topic.

How can I do automation for excel to xml in python?

My question is that I have assigned one task in that I have to read excel document and store that data into XML file. So I have done one code in python for that. But it giving me error when I am writing an XML file.
#!/usr/bin/python
import xlrd
import xml.etree.ElementTree as ET
workbook = xlrd.open_workbook('anuja.xls')
workbook = xlrd.open_workbook('anuja.xlsx', on_demand = True)
worksheet = workbook.sheet_by_index(0)
first_row = [] # Header
for col in range(worksheet.ncols):
first_row.append( worksheet.cell_value(0,col) )
# tronsform the workbook to a list of dictionnaries
data =[]
for row in range(1, worksheet.nrows):
elm = {}
for col in range(worksheet.ncols):
elm[first_row[col]]=worksheet.cell_value(row,col)
data.append(elm)
for set1 in data :
f = open('data.xml', 'w')
f.write("<Progress>%s</Progress>" % (set1[0]))
f.write("<P>%s</P>" % (set1[1]))
f.write("<Major>%s</Major>" % (set1[2]))
f.write("<pop>%s</pop>" % (set1[3]))
f.write("<Key>%s</Key>" % (set1[4]))
f.write("<Summary>%s</Summary>" % (set1[5]))
Error is
Traceback (most recent call last):
File "./read.py", line 23, in <module>
f.write("<Progress>%s</Progress>" % (set1[0]))
KeyError: 0

So the error message actually tells you that there is no key '0' that you try to write to the XML file.
Some more Tipps:
You open the XML file in every iteration of your loop which will fail
There are easier ways to create XML files, check out this article https://pythonadventures.wordpress.com/2011/04/04/write-xml-to-file/
You should check out a python debugger, it will make it easy for you to investigate e.g. what your data loop looks from the inside. I like ipdb most https://pypi.python.org/pypi/ipdb

Odoo8.0, cannot valid the csv file when try to import it

I have problem when i was try to import .csv file. I was try to convert image to base64 and also i was try to create barcode by name csv file. The image it's success convert into base64 but the problem when i was try to create barcode by csv file name, i was always get error like :
Unknown error during import: <class 'openerp.exceptions.ValidationError'>: ('ValidateError', u'Field(s) `ean13` failed against a constraint: You provided an invalid "EAN13 Barcode" reference. You may use the "Internal Reference" field instead.') at row 2 Resolve other errors first
And this is my code:
files = []
text = ''"
data_text3 = []
header_column2 = ["id","product_variant_ids/ean13_barcode", "product_variant_ids/ean13", "ean13", "image", "ean13_barcode", "default_code", "product_variant_ids/default_code"]
number = 1 for file in os.listdir("gmbr/"):
file_name = os.path.splitext(file)[0]
for n in str(number):
directory_file = "gmbr/"+str(file)
img = open(directory_file, 'rb').read()
img_64 = base64.encodestr
text = str(number)+","+str(name_product)+","+str(file_name)+","+str(file_name)+","+str(img_64+","+" "+","+" "+","+" ")
number += 1
data_text3.append(text)
with open('sample2.csv', 'wb') as f:
writer = csv.writer(f, delimiter='\t', dialect='excel')
writer.writerow(header_column2)
for l in data_text3:
writer.writerow(l.split(','))

Split CSV file using Python shows not all data in Excel

I am trying to dump the values in my Django database to a csv, then write the contents of the csv to an Excel spreadsheet which looks like a table (one value per cell), so that my users can export a spreadsheet of all records in the database from Django admin. Right now when I export the file, I get this (only one random value out of many and not formatted correctly):
What am I doing wrong? Not sure if I am using list comprehensions wrong, reading the file incorrectly, or if there is something wrong with my for loop. Please help!
def dump_table_to_csv(db_table, io):
with connection.cursor() as cursor:
cursor.execute("SELECT * FROM %s" % db_table, [])
row = cursor.fetchall()
writer = csv.writer(io)
writer.writerow([i[0] for i in cursor.description])
writer.writerow(row)
with open('/Users/nicoletorek/emarshal/myfile.csv', 'w') as f:
dump_table_to_csv(Attorney._meta.db_table, f)
with open('/Users/nicoletorek/emarshal/myfile.csv', 'r') as f:
db_list = f.read()
split_db_list = db_list.split(',')
output = BytesIO()
workbook = xlsxwriter.Workbook(output)
worksheet_s = workbook.add_worksheet("Summary")
header = workbook.add_format({
'bg_color': '#F7F7F7',
'color': 'black',
'align': 'center',
'valign': 'top',
'border': 1
})
row = 0
col = 0
for x in split_db_list:
worksheet_s.write(row + 1, col + 1, x, header)

The immediate problem with your sample code, as Jean-Francois points out, is that you aren't incrementing your counters in the loop. Also you may also find it more readable to use xlsxwriter.write_row() instead of xlsxwriter.write(). At the moment a secondary complication is you aren't preserving row information when you read in your data from the CSV.
If your data looks like this:
row_data = [[r1c1, r1c2], [r2c1, r2c2], ... ]
You can then use:
for index, row in enumerate(row_data):
worksheet_s.write_row(index, 0, row)
That said, I assume you are interested in the .xlsx because you want control over formatting. If the goal is to just to generate the .xlsx and there is no need for the intermediate .csv, why not just create the .xlsx file directly? This can be accomplished nicely in a view:
import io
from django.http import HttpResponse
def dump_attorneys_to_xlsx(request):
output = io.BytesIO()
workbook = xlsxwriter.Workbook(output, {'in_memory': True})
worksheet = workbook.add_worksheet('Summary')
attorneys = Attorney.objects.all().values()
# Write header
worksheet.write_row(0, 0, attorneys[0].keys())
# Write data
for row_index, row_dict in enumerate(attorneys, start=1):
worksheet.write_row(row_index, 0, row_dict.values())
workbook.close()
output.seek(0)
response = HttpResponse(output.read(), content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
response['Content-Disposition'] = 'attachment; filename=summary.xlsx'
return response

Your CSV file could be read in and written as follows:
import csv
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet_s = workbook.add_worksheet("Summary")
with open(r'\Users\nicoletorek\emarshal\myfile.csv', 'rb') as f_input:
csv_input = csv.reader(f_input)
for row_index, row_data in enumerate(csv_input):
worksheet_s.write_row(row_index, 0, row_data)
workbook.close()
This uses the csv library to ensure the rows are correctly read in, and the write_row function to allow the whole row to be written using a single call. The enumerate() function is used to provide a running row_index value.

From password-protected Excel file to pandas DataFrame

I can open a password-protected Excel file with this:
import sys
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
print "Excel library version:", xlApp.Version
filename, password = sys.argv[1:3]
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets(1) # counts from 1, not from 0
print xlws.Name
print xlws.Cells(1, 1) # that's A1
I'm not sure though how to transfer the information to a pandas dataframe. Do I need to read cells one by one and all, or is there a convenient method for this to happen?

Simple solution
import io
import pandas as pd
import msoffcrypto
passwd = 'xyz'
decrypted_workbook = io.BytesIO()
with open(i, 'rb') as file:
office_file = msoffcrypto.OfficeFile(file)
office_file.load_key(password=passwd)
office_file.decrypt(decrypted_workbook)
df = pd.read_excel(decrypted_workbook, sheet_name='abc')
pip install --user msoffcrypto-tool
Exporting all sheets of each excel from directories and sub-directories to seperate csv files
from glob import glob
PATH = "Active Cons data"
# Scaning all the excel files from directories and sub-directories
excel_files = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.xlsx'))]
for i in excel_files:
print(str(i))
decrypted_workbook = io.BytesIO()
with open(i, 'rb') as file:
office_file = msoffcrypto.OfficeFile(file)
office_file.load_key(password=passwd)
office_file.decrypt(decrypted_workbook)
df = pd.read_excel(decrypted_workbook, sheet_name=None)
sheets_count = len(df.keys())
sheet_l = list(df.keys()) # list of sheet names
print(sheet_l)
for i in range(sheets_count):
sheet = sheet_l[i]
df = pd.read_excel(decrypted_workbook, sheet_name=sheet)
new_file = f"D:\\all_csv\\{sheet}.csv"
df.to_csv(new_file, index=False)

Assuming the starting cell is given as (StartRow, StartCol) and the ending cell is given as (EndRow, EndCol), I found the following worked for me:
# Get the content in the rectangular selection region
# content is a tuple of tuples
content = xlws.Range(xlws.Cells(StartRow, StartCol), xlws.Cells(EndRow, EndCol)).Value
# Transfer content to pandas dataframe
dataframe = pandas.DataFrame(list(content))
Note: Excel Cell B5 is given as row 5, col 2 in win32com. Also, we need list(...) to convert from tuple of tuples to list of tuples, since there is no pandas.DataFrame constructor for a tuple of tuples.

from David Hamann's site (all credits go to him)
https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/
Use xlwings, opening the file will first launch the Excel application so you can enter the password.
import pandas as pd
import xlwings as xw
PATH = '/Users/me/Desktop/xlwings_sample.xlsx'
wb = xw.Book(PATH)
sheet = wb.sheets['sample']
df = sheet['A1:C4'].options(pd.DataFrame, index=False, header=True).value
df

Assuming that you can save the encrypted file back to disk using the win32com API (which I realize might defeat the purpose) you could then immediately call the top-level pandas function read_excel. You'll need to install some combination of xlrd (for Excel 2003), xlwt (also for 2003), and openpyxl (for Excel 2007) first though. Here is the documentation for reading in Excel files. Currently pandas does not provide support for using the win32com API to read Excel files. You're welcome to open up a GitHub issue if you'd like.

Based on the suggestion provided by #ikeoddy, this should put the pieces together:
How to open a password protected excel file using python?
# Import modules
import pandas as pd
import win32com.client
import os
import getpass
# Name file variables
file_path = r'your_file_path'
file_name = r'your_file_name.extension'
full_name = os.path.join(file_path, file_name)
# print(full_name)
Getting command-line password input in Python
# You are prompted to provide the password to open the file
xl_app = win32com.client.Dispatch('Excel.Application')
pwd = getpass.getpass('Enter file password: ')
Workbooks.Open Method (Excel)
xl_wb = xl_app.Workbooks.Open(full_name, False, True, None, pwd)
xl_app.Visible = False
xl_sh = xl_wb.Worksheets('your_sheet_name')
# Get last_row
row_num = 0
cell_val = ''
while cell_val != None:
row_num += 1
cell_val = xl_sh.Cells(row_num, 1).Value
# print(row_num, '|', cell_val, type(cell_val))
last_row = row_num - 1
# print(last_row)
# Get last_column
col_num = 0
cell_val = ''
while cell_val != None:
col_num += 1
cell_val = xl_sh.Cells(1, col_num).Value
# print(col_num, '|', cell_val, type(cell_val))
last_col = col_num - 1
# print(last_col)
ikeoddy's answer:
content = xl_sh.Range(xl_sh.Cells(1, 1), xl_sh.Cells(last_row, last_col)).Value
# list(content)
df = pd.DataFrame(list(content[1:]), columns=content[0])
df.head()
python win32 COM closing excel workbook
xl_wb.Close(False)

Adding to #Maurice answer to get all the cells in the sheet without having to specify the range
wb = xw.Book(PATH, password='somestring')
sheet = wb.sheets[0] #get first sheet
#sheet.used_range.address returns string of used range
df = sheet[sheet.used_range.address].options(pd.DataFrame, index=False, header=True).value

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Reading contents of excel file in python webapp2 - python

Looking into the doc for open_workbook in the xlrd package doc, it seems that when you pass "file_contents", it's expecting a string. Then you need to look into turning a Blob into a String, which can be done with BlobReader.read(), which gives you a string of the read data.

Related

replace('\n','') does not work for import to .txt format

How can I do automation for excel to xml in python?

Odoo8.0, cannot valid the csv file when try to import it

Split CSV file using Python shows not all data in Excel

From password-protected Excel file to pandas DataFrame

Categories

Resources