Python xlrd: Unicode error

I'm reading data from an Excel file with the Python xlrd module and writing it to the database in Django.
I'm getting the following error:
'ascii' codec can't encode character u'\xc1' in position 6: ordinal not in range(128)
I've tried the following:
1) I was using str(variable). I removed it and now store the value in the DB as it is.
2) Tried wb = open_workbook('static/'+filename, encoding_override="utf_16_le")
3) Tried .encode(error=replace)
But nothing worked. How can I get rid of this error?
Here is my code:
def __init__(self, arm_id, dsp_name, DSP, hubcode, Pincode, pptl, state):
    self.arm_id = arm_id
    self.dsp_name = dsp_name
    self.DSP = DSP.zfill(2)
    self.hubcode = hubcode
    self.Pincode = Pincode
    self.pptl = pptl
    self.state = state
wb = open_workbook('static/'+filename, encoding_override="utf_16_le")
for sheet in wb.sheets():
    number_of_rows = sheet.nrows
    number_of_columns = sheet.ncols
    items = []
    arm_list = []
    pptl_list = []
    pptlcode_list = []
    count = 1
    status = 0
    for row in range(1, number_of_rows):
        values = []
        for col in range(number_of_columns):
            value = (sheet.cell(row,col).value)
            try: value = str(int(value))
            except ValueError: pass
            finally: values.append(value)
        item = Excel(*values)
        count +=1
        arm_id = item.arm_id
        if arm_id not in arm_list:
            description = 'Arm'+arm_id
            arm_obj = Arm(arm_id = arm_id, description = description)
            arm_obj.save()
            arm_list.append(arm_id)
        pptl_id = (item.pptl)
        if pptl_id not in pptl_list:
            try :
                pptl_obj = PPTLconfig.objects.get(pptl_id = pptl_id)
                pptl_obj.arm_id = arm_obj
                pptl_obj.hubcode = hubcode
            except :
                description = 'PPTL'+pptl_id
                pptl_obj = PPTLconfig(pptl_id = pptl_id, description = description , arm_id = arm_obj, hubcode = (item.hubcode))
            finally :
                pptl_obj.save()
                pptl_list.append(pptl_id)
        code = []
        for factors in SORTATION_FACTORS:
            if factors == 'DSP': code.append((item.DSP))
            elif factors == 'Pincode': code.append((item.Pincode))
            elif factors == 'DG': code.append((item.state).zfill(4))
        code = ','.join(code)
        if code not in pptlcode_list :
            try :
                code_obj = PPTLcode.objects.get(code = code)
                code_obj.pconf_id = pptl_obj
            except : code_obj = PPTLcode(code=code, pconf_id=pptl_obj)
            finally :
                code_obj.save()
                pptlcode_list.append(code)
        else :
            error = "Duplicate PPTLcode " + code + " at Row " + str(count)
            status = 1
            delete_data(1)
            return (status,error)
############### Add ArmPrinterMapping ######################
arm_obj_list = Arm.objects.all()
for arm_obj in arm_obj_list:
    printer_name = 'Arm'+str(arm_obj.arm_id)
    ap_mapping = ArmPrinterMapping(arm_id = arm_obj, printer_name = printer_name)
    ap_mapping.save()
return (0,0)

Set the default encoding to UTF-8; it should work then:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
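Changing the default encoding affects the whole process, so an alternative worth trying is to convert each cell to text explicitly and never push it through the ASCII codec. The following is a minimal sketch of that idea, not the asker's code: the helper name cell_to_text is made up for illustration, and it is written for Python 3, where str is already Unicode (on Python 2 the same idea would use unicode in place of str).

```python
def cell_to_text(value):
    """Return an Excel cell value as text without an implicit ASCII encode.

    Hypothetical helper, not part of xlrd. Numeric cells come back from
    xlrd as floats, so whole numbers are rendered without a trailing ".0".
    """
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    if isinstance(value, bytes):
        # Assumption: any byte strings in this data are UTF-8 encoded.
        return value.decode("utf-8")
    return str(value)


print(cell_to_text(12.0))
```

In the loop from the question, this would stand in for the try/except around str(int(value)), so that text cells containing characters such as u'\xc1' pass through unchanged.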

Related

How to speed up this search script?

Hello,
I have written a Python script that fills in an Excel file (wb) based on its first column, which contains about 4000 references. For each reference (read from wb into a list and processed in a for loop), the script searches two other Excel files loaded as DataFrames (df_mbom and df_ebom) and fills specific cells of wb depending on whether the reference is present in each. If the reference is found, the script compares the level of the matching row with that of the following row and fills wb accordingly. The script works and produces the correct result.
The only problem is that it takes more than 6 hours to search and fill wb for 1000 references, so processing all 4000 would take almost 24 hours. Do you have any suggestions for speeding it up?
Here is the code:
import time
import openpyxl
import pandas as pd
from multiprocessing.dummy import Pool
def finding_complete(elt):
    elt = str(elt)
    pos = mylist_ref.index(elt)
    print(pos)
    item = r'^' + elt + '$'
    df_findings = df_mbom[df_mbom['Article'].str.contains(item, case=True, regex=True)]
    if df_findings.shape[0] == 0 :
        active_sheet.cell(row = 4+pos, column = 19).value = "NOK"
        active_sheet.cell(row = 4+pos, column = 18).value = "NOK"
    else :
        active_sheet.cell(row = 4+pos, column = 19).value = "OK"
        boolean_f = df_findings.drop_duplicates(subset = ['Article'],keep = 'first')
        ind = boolean_f.index.to_list()
        idx = ind[0]
        item1 = df_mbom['Niveau'][idx]
        item2 = df_mbom['Niveau'][idx + 1]
        if item2 > item1 :
            active_sheet.cell(row = 4+pos, column = 18).value = "OK"
        else :
            active_sheet.cell(row = 4+pos, column = 18).value = "NOK"
    df_findings2 = df_ebom[df_ebom['Article'].str.contains(item, case=True, regex=True)]
    pos = mylist_ref.index(elt)
    if df_findings2.shape[0] == 0 :
        active_sheet.cell(row = 4+pos, column = 17).value = "NOK"
    else :
        boolean_f = df_findings2.drop_duplicates(subset = ['Article'],keep = 'first')
        ind = boolean_f.index.to_list()
        idx = ind[0]
        item1 = df_ebom['Niveau'][idx]
        item2 = df_ebom['Niveau'][idx + 1]
        if item2 > item1 :
            active_sheet.cell(row = 4+pos, column = 17).value = "OK"
        else :
            active_sheet.cell(row = 4+pos, column = 17).value = "NOK"
if __name__ == '__main__':
    start = time.time()
    path = '100446099_mbom.xlsx'
    df_mbom = pd.read_excel(path, sheet_name=0, header=0)
    path = '100446099_ebom.xlsx'
    df_ebom = pd.read_excel(path, sheet_name=0, header=0)
    location = 'DOC#6TERNORrev0.xlsx'
    wb = openpyxl.load_workbook(filename=location) # , data_only=True
    active_sheet = wb["DOC#6 toutes regions"]
    # Get cell values and put them in a list
    mylist_ref = []
    for row in active_sheet.iter_rows(min_row=4, max_row=active_sheet.max_row, min_col=2, max_col=2):
        for cell in row:
            if cell.value is None :
                pass
            else:
                mylist_ref.append(cell.value)
    print("Number of references :")
    print(len(mylist_ref))
    print(" ")
    with Pool() as pool: # os.cpu_count()
        pool.map(finding_complete,mylist_ref) # equivalent to: for elt in mylist_ref: finding_complete(elt)
    wb.save(location)
    wb.close()
    final = time.time()
    timer = final - start
    print(round(timer, 1))
Thanks in advance for your time.
Convert the Excel file to JSON, process the JSON, then write the result back to Excel.
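Whatever the file format, much of the time probably goes into scanning the whole table with a regular expression once per reference. The following is a minimal sketch of an alternative, using plain lists in place of the 'Article' and 'Niveau' columns: build a dictionary from article to its first row once, then answer each reference with a single lookup. The function names are illustrative, and the status rule (OK when the following row has a higher level) is my reading of the code in the question.

```python
def build_first_index(articles):
    """Map each article (as text) to the index of its first occurrence."""
    first_index = {}
    for position, article in enumerate(articles):
        first_index.setdefault(str(article), position)
    return first_index


def reference_status(reference, first_index, levels):
    """Return (found, has_child) as "OK"/"NOK" strings for one reference."""
    position = first_index.get(str(reference))
    if position is None:
        return ("NOK", "NOK")
    # The reference has a child when the following row sits at a deeper level.
    next_position = position + 1
    has_child = next_position < len(levels) and levels[next_position] > levels[position]
    return ("OK", "OK" if has_child else "NOK")


# Toy data standing in for df_mbom['Article'] and df_mbom['Niveau'].
articles = ["A100", "B200", "A100", "C300"]
levels = [1, 2, 2, 1]
index = build_first_index(articles)
for ref in ["A100", "B200", "C300", "Z999"]:
    print(ref, reference_status(ref, index, levels))
```

With real data the two lists could come from df_mbom['Article'].tolist() and df_mbom['Niveau'].tolist(). Unlike the original, the sketch treats a match on the last row as having no child instead of raising an error.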

How to retrieve column values by column name in Python with cx_Oracle

I'm writing a script that connects to an Oracle database and writes the query results to a log file. I want output like this:
FEC_INCLUSION = 2005-08-31 11:43:48,DEBITO_PENDIENTE = None,CAN_CUOTAS = 1.75e-05,COD_CUENTA = 67084,INT_TOTAL = None,CAN_CUOTAS_ANTERIOR = None,COD_INVERSION = 1,FEC_MODIFICACION = 10/04/2012 09:45:22,SAL_TOT_ANTERIOR = None,CUOTA_COMISION = None,FEC_ULT_CALCULO = None,MODIFICADO_POR = CTAPELA,SAL_TOTAL = 0.15,COD_TIPSALDO = 1,MONTO_COMISION = None,COD_EMPRESA = 1,SAL_INFORMATIVO = None,COD_OBJETIVO = 5,SAL_RESERVA = None,INCLUIDO_POR = PVOROPE,APORTE_PROM = 0.0,COSTO_PROM = None,CREDITO_PENDIENTE = None,SAL_PROM = 0.0,
FEC_INCLUSION = 2005-08-31 11:43:49,DEBITO_PENDIENTE = None,CAN_CUOTAS = 0.0,COD_CUENTA = 67086,INT_TOTAL = None,CAN_CUOTAS_ANTERIOR = None,COD_INVERSION = 9,FEC_MODIFICACION = 25/02/2011 04:38:52,SAL_TOT_ANTERIOR = None,CUOTA_COMISION = None,FEC_ULT_CALCULO = None,MODIFICADO_POR = OPEJAMO,SAL_TOTAL = 0.0,COD_TIPSALDO = 1,MONTO_COMISION = None,COD_EMPRESA = 1,SAL_INFORMATIVO = None,COD_OBJETIVO = 5,SAL_RESERVA = None,INCLUIDO_POR = PVOROPE,APORTE_PROM = 0.0,COSTO_PROM = None,CREDITO_PENDIENTE = None,SAL_PROM = 0.0,
I build a list of dictionaries from the query results:
def DictFactory(description,data):
    column_names = [col[0] for col in description]
    results = []
    for row in data:
        results.append(dict(zip(column_names,row)))
    return results
Then I wrote this function, which finally saves the results to my log:
def WriteLog(log_file,header,data):
file_exist = os.path.isfile(log_file)
log = open(log_file,'a')
if not file_exist:
print "File does not exist, writing new log file"
open(log_file,'w').close()
mydata = DictFactory(header,data)
checkpoint_name = ReadCheckpointName()
string = ''
for m in mydata:
for k,v in m.items():
string = string + k + ' = ' + str(v) + ','
if k == checkpoint_name:
#print "KEY FOUND"
cur_checkpoint = v
cur_checkpoint = str(cur_checkpoint)
#print string
string = string + '\n'
print cur_checkpoint
log.write(string + '\n')
WriteCheckpoint(cur_checkpoint,checkpoint_file)
log.close()
This is the main function:
def GetInfo():
mypool = PoolToDB()
con = mypool.acquire()
cursor = con.cursor()
GetLastCheckpoint()
sql = ReadQuery()
#print sql
cursor.execute(sql)
data = cursor.fetchall()
WriteLog(log_file,cursor.description,data)
#WriteCsvLog(log_file,cursor.description,data)
cursor.close()
It works when the query fetches only a few records, but when I try to fetch many records the script never finishes.
This is the output for a query that returns 5000 records. As you can see, it takes far too long.
[jballesteros@SplunkPorvenir FO_TIPSALDOS_X_CUENTA]$ python db_execution.py
Starting connection: 5636
GetLastCheckpoint function took 0.073 ms
GetLastCheckpoint function took 0.025 ms
ReadQuery function took 0.084 ms
File does not exist, writing new log file
DictFactory function took 23.050 ms
ReadCheckpointName function took 0.079 ms
WriteCheckpoint function took 0.204 ms
WriteLog function took 45112.133 ms
GetInfo function took 46193.033 ms
I'm pretty sure you know a much better way to do what I am trying to do.
This is the complete code:
#!/usr/bin/env python
# encoding: utf-8
import re
import sys
try:
import cx_Oracle
except:
print "Error: Oracle module required to run this plugin."
sys.exit(0)
import datetime
import re
import commands
import os
from optparse import OptionParser
import csv
import time
#################################
#### Database Variables ####
#################################
Config = {
"host" : "",
"user" : "",
"password" : "",
"instance" : "",
"port" : "",
}
Query = {
"sql" : "",
"checkpoint_datetype" : "",
"checkpoint_name" : "",
}
dir = '/home/jballesteros/PENS2000/FO_TIPSALDOS_X_CUENTA/'
connection_dir = '/home/jballesteros/PENS2000/Connection'
checkpoint_file = dir + 'checkpoint.conf'
log_file = '/var/log/Pens2000/FO_TIPSALDOS_X_CUENTA.csv'
internal_log = '/var/log/Pens2000/internal.log'
query = dir + 'query'
sys.path.append(os.path.abspath(connection_dir))
from db_connect_pool import *
def Timing(f):
def wrap(*args):
time1 = time.time()
ret = f(*args)
time2 = time.time()
print "%s function took %0.3f ms" % (f.func_name,(time2- time1)*1000.0)
return ret
return wrap
@Timing
def InternalLogWriter(message):
now = datetime.datetime.now()
log = open(internal_log, 'a')
log.write("%s ==> %s" % (now.strftime("%Y-%m-%d %H:%M:%S"),message))
log.close()
return
@Timing
def GetLastCheckpoint():
global cur_checkpoint
conf = open(checkpoint_file, 'r')
cur_checkpoint = conf.readline()
cur_checkpoint = cur_checkpoint.rstrip('\n')
cur_checkpoint = cur_checkpoint.rstrip('\r')
conf.close()
@Timing
def ReadQuery():
global cur_checkpoint
GetLastCheckpoint()
qr = open(query, 'r')
line = qr.readline()
line = line.rstrip('\n')
line = line.rstrip('\r')
Query["sql"], Query["checkpoint_datetype"],Query["checkpoint_name"] = line.split(";")
sql = Query["sql"]
checkpoint_datetype = Query["checkpoint_datetype"]
checkpoint_name = Query["checkpoint_name"]
if (checkpoint_datetype == "DATETIME"):
sql = sql + " AND " + checkpoint_name + " >= " + "TO_DATE('%s','YYYY-MM-DD HH24:MI:SS') ORDER BY %s" % (cur_checkpoint,checkpoint_name)
if (checkpoint_datetype == "NUMBER"):
sql = sql + " AND " + checkpoint_name + " > " + "%s ORDER BY %s" % (cur_checkpoint,checkpoint_name)
qr.close()
return str(sql)
@Timing
def ReadCheckpointName():
qr = open(query, 'r')
line = qr.readline()
line = line.rstrip('\n')
line = line.rstrip('\r')
Query["sql"], Query["checkpoint_datetype"],Query["checkpoint_name"] = line.split(";")
checkpoint_name = Query["checkpoint_name"]
return str(checkpoint_name)
@Timing
def LocateCheckPoint(description):
description
checkpoint_name = ReadCheckpointName()
#print checkpoint_name
#print description
startcounter = 0
finalcounter = 0
flag = 0
for d in description:
prog = re.compile(checkpoint_name)
result = prog.match(d[0])
startcounter = startcounter + 1
if result:
finalcounter = startcounter - 1
counterstr = str(finalcounter)
print "Checkpoint found in the array position number: " + counterstr
flag = 1
if (flag == 0):
print "Checkpoint did not found"
return finalcounter
@Timing
def DictFactory(description,data):
column_names = [col[0] for col in description]
results = []
for row in data:
results.append(dict(zip(column_names,row)))
return results
@Timing
def WriteCsvLog(log_file,header,data):
checkpoint_index = LocateCheckPoint(header)
file_exists = os.path.isfile(log_file)
with open(log_file,'ab') as csv_file:
headers = [i[0] for i in header]
csv_writer = csv.writer(csv_file,delimiter='|')
if not file_exists:
print "File does not exist, writing new CSV file"
csv_writer.writerow(headers) # Writing headers once
for d in data:
csv_writer.writerow(d)
cur_checkpoint = d[checkpoint_index]
cur_checkpoint = str(cur_checkpoint)
WriteCheckpoint(cur_checkpoint,checkpoint_file)
csv_file.close()
@Timing
def WriteLog(log_file,header,data):
file_exist = os.path.isfile(log_file)
log = open(log_file,'a')
if not file_exist:
print "File does not exist, writing new log file"
open(log_file,'w').close()
mydata = DictFactory(header,data)
checkpoint_name = ReadCheckpointName()
string = ''
for m in mydata:
for k,v in m.items():
string = string + k + ' = ' + str(v) + ','
if k == checkpoint_name:
#print "KEY FOUND"
cur_checkpoint = v
cur_checkpoint = str(cur_checkpoint)
#print string
string = string + '\n'
print cur_checkpoint
log.write(string + '\n')
WriteCheckpoint(cur_checkpoint,checkpoint_file)
log.close()
@Timing
def WriteCheckpoint(cur_checkpoint,conf_file):
conf = open(conf_file,'w')
conf.write(cur_checkpoint)
conf.close()
@Timing
def GetInfo():
mypool = PoolToDB()
con = mypool.acquire()
cursor = con.cursor()
GetLastCheckpoint()
sql = ReadQuery()
#print sql
cursor.execute(sql)
#data = cursor.fetchall()
#WriteLog(log_file,cursor.description,data)
#WriteCsvLog(log_file,cursor.description,data)
cursor.close()
def __main__():
parser = OptionParser()
parser.add_option("-c","--change-password",dest="pass_to_change",help="Change the password for database connection",metavar="1")
(options, args) = parser.parse_args()
if (options.pass_to_change):
UpdatePassword()
else:
GetInfo()
__main__()
This is a query sample:
SELECT COD_EMPRESA, COD_TIPSALDO, COD_INVERSION, COD_CUENTA, COD_OBJETIVO, CAN_CUOTAS, SAL_TOTAL, INT_TOTAL, SAL_RESERVA, APORTE_PROM, SAL_PROM, COSTO_PROM, SAL_TOT_ANTERIOR, FEC_ULT_CALCULO, INCLUIDO_POR, FEC_INCLUSION, MODIFICADO_POR, TO_CHAR(FEC_MODIFICACION,'DD/MM/YYYY HH24:MI:SS') AS FEC_MODIFICACION, CUOTA_COMISION, MONTO_COMISION, SAL_INFORMATIVO, CREDITO_PENDIENTE, DEBITO_PENDIENTE, CAN_CUOTAS_ANTERIOR FROM FO.FO_TIPSALDOS_X_CUENTA WHERE ROWNUM <=100000 AND FEC_INCLUSION >= TO_DATE('2005-08-31 11:43:49','YYYY-MM-DD HH24:MI:SS') ORDER BY FEC_INCLUSION
PS: I have searched Google and this forum for my question but haven't found anything similar.
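A possible explanation, offered with caution because the listing has lost its indentation: in WriteLog the variable string is never reset between rows, so each write may repeat everything written before it, and the amount of data written then grows quadratically with the number of rows. The following is a minimal sketch of writing exactly one line per row and returning the last checkpoint value so that it can be saved once at the end. The names format_row and write_rows are illustrative, the syntax is Python 3, and an in-memory buffer stands in for the log file.

```python
import io


def format_row(column_names, row):
    """Render one row as 'NAME = value,' pairs in column order."""
    return "".join("%s = %s," % (name, value) for name, value in zip(column_names, row))


def write_rows(log, column_names, rows, checkpoint_name):
    """Write one line per row; return the last checkpoint value as text."""
    checkpoint_index = column_names.index(checkpoint_name)
    last_checkpoint = None
    for row in rows:
        log.write(format_row(column_names, row) + "\n")
        last_checkpoint = str(row[checkpoint_index])
    return last_checkpoint


# Toy rows standing in for cursor.fetchall(); the column names would come
# from [col[0] for col in cursor.description].
columns = ["COD_CUENTA", "SAL_TOTAL"]
rows = [(67084, 0.15), (67086, 0.0)]
buffer = io.StringIO()
print(write_rows(buffer, columns, rows, "COD_CUENTA"))
print(buffer.getvalue())
```

Zipping the column names with each row also keeps the fields in query order, which a plain dict on Python 2 does not guarantee.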

How to create a JSON file containing arrays in Python

I want to create a JSON file like this:
{
"a":["12","34","23",...],
"b":["13","14","45",....],
.
.
.
}
The keys should come from the list:
lis = ['a','b',...]
and the values from the SQL query "select id from " + i, where i iterates over the list. This query simply returns the id column.
Here is the sample code:
lis = ['a','b','c']
len_obj = len(lis)
with open("Dataset.json", 'w') as file:
for i in lis:
file.write(i)
obj_query = i + '_query'
obj_query = sf.query("select id from " + i)
jsondata = json.loads(json.dumps(obj_query['records']))
length = len(jsondata)
i = {}
k = 0
for j in range(length):
obj_id = jsondata[j]['Id']
# print("id " + obj_id)
if k == 0:
ids = "\"" + obj_id + "\""
k = 1
else:
ids = ids + ",\"" + obj_id + "\""
if count != len_obj - 1:
file.write(ids)
else:
file.write(ids)
count += 1
file.write("}")
The final output should look like this:
{
"a":["12","23",...],
"b":["234","456",...],
}
This is my first post and also my first program.
Please guide me through this.
Please forgive the indentation of the program; I was not able to format it properly here.
You should be able to condense the whole thing down to just this:
import json
tables = ["a", "b", "c", "d"]
data = {}
for t in tables:
    results = sf.query("select id from %s" % t)["records"]
    data[t] = [r["Id"] for r in results]
with open("Dataset.json", "w") as f:
    json.dump(data, f)
You can simply create a dictionary containing the values you are after and then convert it to JSON using json.dumps:
import json
data = {}
data['a'] = ["12","34","23"]
data['b'] = ["13","14","45"]
json_data = json.dumps(data)
print json_data
@Jaco
lis = ['a','b','c']
with open("Dataset.json", 'w') as file:
for i in lis:
obj_query = i + '_query'
obj_query = sf.query("select id from " + i)
jsondata = json.loads(json.dumps(obj_query['records']))
length = len(jsondata)
# create dict
data1 = {}
k = 0
for j in range(length):
obj_id = jsondata[j]['Id']
# print("id " + obj_id)
if k == 0:
ids = obj_id
k = 1
else:
ids = ids + "," + obj_id
data1[i] = [ids]
json_data = json.dumps(data1)
file.write(json_data)
The response I got is:
{"a":["12,23,34.."]}{"b":["23,45,..."]}{...}
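The output above shows two symptoms: each table is dumped as its own JSON object, and the ids are joined into one comma-separated string before being wrapped in a list. The following is a minimal sketch of one way around both: append each id to a list, collect all tables in a single dictionary, and serialize once. fetch_records is a made-up stand-in for sf.query(...)['records'] so that the example runs without a Salesforce connection.

```python
import json


def fetch_records(table):
    """Hypothetical stand-in for sf.query("select id from " + table)['records']."""
    fake = {"a": ["12", "23", "34"], "b": ["23", "45"]}
    return [{"Id": record_id} for record_id in fake[table]]


def collect_ids(tables):
    """Return {table: [id, id, ...]} with one list entry per record."""
    return {table: [record["Id"] for record in fetch_records(table)] for table in tables}


data = collect_ids(["a", "b"])
print(json.dumps(data))
```

Writing the file is then a single json.dump(data, file) call after the loop, outside it rather than once per table.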

Python - Variable being printed over string

I am using Python 2.7 and have a problem I haven't encountered before: when I print a certain string followed by a variable on the same line, the variable is printed over the string. For example, the script contains print 'IP Rating = ', ipRating and the output in the command prompt is 'IP20ating = '. I have no idea why this is happening. I have the same code for various other variables and strings in the same script, and they all come out as expected. I have tried renaming the variable and changing the string, but it makes no difference. Has anybody encountered this before, or have any idea why it might be happening? I can post the code if requested.
Many thanks :)
EDIT
Here is the code. I know I may have repeated myself a few times and there are unnecessary libraries in there, but I work by importing every library I might need and then removing unnecessary code at the end.
from bs4 import BeautifulSoup as Soup
from bs4 import BeautifulSoup
from urllib import urlopen
import webbrowser
import httplib
import urllib2
import urllib
import string
import mylib
import xlrd
import glob
import xlwt
import bs4
import sys
import os
import re
print '\nStarting Web Search'
found = False
while found == False:
excelFile = "F:\\len\\web sheets completed\\csv formatted\\imported\\re-imported\\Import Corrections\\saxby web spreadsheet.xls"
try:
inFi = xlrd.open_workbook(excelFile)
found = True
except IOError:
print 'File not found.'
inFi = xlrd.open_workbook(excelFile)
inWS = inFi.sheet_by_index(0)
headers = mylib.getHeader(inWS)
supplyHead = mylib.findHeader('Supplier Part Ref', headers)
saxbeginurl = "http://www.saxbylighting.com/index.php?pg=search&ser="
badLink = "index.php?pg=search&ser=10180&next=0"
resLink = "http://www.saxbylighting.com/images/ProductImages/Zoomed/"
overCount = 0
for t in range(524,534):
projection = 0
ipRating = 0
diameter = 0
width = 0
weight = 0
length = 0
height = 0
i = 0
w = 0
l = 0
h = 0
d = 0
p = 0
x = 0
iP = 0
wei = 0
imgStock = str(inWS.cell(t, supplyHead).value.encode('latin-1'))
overCount = overCount + 1
print '\n',imgStock
if imgStock == '3TRAWI':
url = 'http://www.saxbylighting.com/index.php?pg=details&prod=53'
elif imgStock == '10313':
url = 'http://www.saxbylighting.com/index.php?pg=details&prod=204'
else:
url = saxbeginurl + imgStock
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)
img_tags = soup.find_all("img")
the_image_tag = soup.find("img", src='/images/dhl_logo.png')
try:
for dataSheet in soup.find('div',{'class':'panes'}):
#print dataSheet, ' -- ', str(i)
i = i + 1
if i == 4:
reqData = str(dataSheet).split('<img', 1)[0]
first_Data = reqData.replace('<br/>','\n')
second_Data = first_Data.replace('<b>','')
third_Data = second_Data.replace('</b>','')
fourth_Data = third_Data.replace(':',': ')
dataList = fourth_Data.split('\n')
#print dataList
for information in dataList:
if 'Weight' in dataList[wei]:
pre_Weight = dataList[wei]
sec_weight = str(pre_Weight).replace('Weight :','')
weight = sec_weight.replace(' ','')
wei += 1
if 'IP' in dataList[iP]:
ipRating = str(dataList[iP])
iP += 1
for product_Dimensions in dataList:
if 'Product dimensions :' in dataList[x]:
#print dataList[x]
dimensionList = str(dataList[x]).replace('mm','mm:')
#print dimensionList
prelim_Dimensions = dimensionList.replace('Product dimensions :','')
first_Dimensions = prelim_Dimensions.replace('cm','0mm')
sec_Dimensions = first_Dimensions.replace(' ',' ')
third_Dimensions = sec_Dimensions.strip()
dimenList = third_Dimensions.split('mm:')
#print dimenList
for project in dimenList:
if 'Proj' in dimenList[p]:
pre_pro = str(dimenList[p]).replace('Proj','')
sec_pro = pre_pro.replace(':','')
thro_pro = sec_pro.replace(' ','')
projection = thro_pro
elif p == len(dimenList):
print 'Projection not found'
p += 1
for diamet in dimenList:
if 'dia' in dimenList[d]:
pre_dia = str(dimenList[d]).replace('dia','')
sec_dia = pre_dia.replace(':','')
third_dia = sec_dia.replace(' ','')
diameter = third_dia
elif d == len(dimenList):
print 'Diameter not found'
d += 1
for heig in dimenList:
if 'H:' in dimenList[h]:
pre_hei = str(dimenList[h]).replace('H','')
sec_hei = pre_hei.replace(':','')
third_hei = sec_hei.replace(' ','')
height = third_hei
elif h == len(dimenList):
print 'Height not found'
h += 1
for lent in dimenList:
if 'L:' in dimenList[l]:
pre_leng = str(dimenList[l]).replace('L','')
sec_leng = pre_leng.replace(':','')
third_leng = sec_leng.replace(' ','')
length = third_leng
elif l == len(dimenList):
print 'Length not found'
l += 1
for wid in dimenList:
if 'W:' in dimenList[w]:
pre_wid = str(dimenList[w]).replace('W','')
sec_wid = pre_wid.replace(':','')
third_wid = sec_wid.replace(' ','')
width = third_wid
elif w == len(dimenList):
print 'Width not found'
w += 1
x += 1
print 'IP Rating = ', ipRating
print 'Weight = ', weight
print 'Projection = ', projection, 'mm'
print 'Diameter = ',diameter, 'mm'
print 'Length = ',length, 'mm'
print 'Height = ',height, 'mm'
print 'Width = ',width, 'mm'
except TypeError:
print 'Type Error... skipping this product and carrying on.'
Here is an example of the output:
IP44ating =
Weight = .51KGS
Projection = 35 mm
Diameter = 0 mm
Length = 0 mm
Height = 90 mm
Width = 120 mm
I strongly suspect that your data ipRating that you think is IP20 is actually \rIP20. That is, you have a stray carriage return character (0x0D, decimal 13) at the start of the variable. The carriage return moves the print position back to the start of the line, and the variable then overwrites what you printed before.
You can test whether this is the problem by adding the line:
ipRating = ipRating.replace("\r", "")
before your print statement.
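To see the hidden character rather than guess at it, repr() shows control characters as escape sequences. The following is a minimal sketch in Python 3 syntax; clean_field is an illustrative name, and the sample value is only what the scraped text may contain.

```python
def clean_field(text):
    """Drop carriage returns and surrounding whitespace from scraped text."""
    return text.replace("\r", "").strip()


raw = "\rIP20"  # assumed shape of the scraped value
print(repr(raw))
print("IP Rating = " + clean_field(raw))
```

If repr() shows a \r in the real data, applying the same cleanup where dataList is built would cover weight and the dimension fields as well.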
This is the proper way to do what you're doing:
print('IP Rating = %s' % ipRating)
or
print('IP Rating = %d' % ipRating)
That is just one example; the same applies to all the print statements at the end of your code.
Use %s when the variable is a string and %d when it is an integer. If you have any more questions, just ask.

Error using utf-8 filenames in python script

I have a seemingly impossible conundrum and hope you can point me in the right direction. I have been picking up and putting down this project for weeks, and I think it is about time I solved it, hopefully with your help.
I am writing a script that reads a set of .xls Excel files from a directory structure, parses their contents, and loads them into a MySQL database. In the main function, a list of (Croatian) file names is passed to xlrd, and that is where the problem lies.
The environment is an up-to-date FreeBSD 9.1.
I get the following error when executing the script:
mars:~/20130829> python megascript.py
Python version: 2.7.5
Filesstem encoding is: UTF-8
Removing error.log if it exists...
It doesn't.
Done!
Connecting to database...
Done!
MySQL database version: 5.6.13
Loading pilots...
Done!
Loading tehnicians...
Done!
Loading aircraft registrations...
Done!
Loading file list...
Done!
Processing files...
/2006/1_siječanj.xls
Traceback (most recent call last):
File "megascript.py", line 540, in <module>
main()
File "megascript.py", line 491, in main
data = readxlsfile(files, 'UPIS', piloti, tehnicari, helikopteri)
File "megascript.py", line 129, in readxlsfile
workbook = open_workbook(f)
File "/usr/local/lib/python2.7/site-packages/xlrd-0.9.2-py2.7.egg/xlrd/__init__.py", line 394, in open_workbook
f = open(filename, "rb")
IOError: [Errno 2] No such file or directory: u'/2006/1_sije\u010danj.xls'
I have included the complete output to make the code flow easier to follow.
I suppose the problem is that xlrd does not accept a UTF-8 file list. I'm not sure how to get around that without modifying the xlrd code, though. Any ideas?
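One thing worth checking before blaming xlrd, based on the getFiles function in the script: the path is built with i.lstrip('.'), which turns './2006' into '/2006', so the file is looked up at the filesystem root and not under the starting directory. The traceback is consistent with that, since the failing path begins with '/2006'. The following is a minimal sketch of the difference, in Python 3 syntax, with the directory and file name taken from the output above.

```python
import os

directory = "./2006"          # as yielded by os.walk("./")
filename = "1_siječanj.xls"

# lstrip('.') removes the leading dot, leaving a path that starts at "/".
stripped = directory.lstrip(".") + "/" + filename

# os.path.join keeps the path relative to the starting directory.
joined = os.path.join(directory, filename)

print(stripped.startswith("/"))
print(os.path.exists(joined))
```

If os.path.exists returns True for the joined path on the real machine, the non-ASCII file name is probably not the problem.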
Here is the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os, sys, getopt, codecs, csv, MySQLdb, platform
from mmap import mmap,ACCESS_READ
from xlrd import open_workbook, xldate_as_tuple
# Define constants
NALET_OUT = ''
PUTNICI_OUT = ''
DB_HOST = 'localhost'
DB_USER = 'user'
DB_PASS = 'pass'
DB_DATABASE = 'eth'
START_DIR = u'./'
ERROR_FILE = START_DIR + 'mega_error.log'
# Functions
def isNumber(s):
# Check if a string could be a number
try:
float(s)
return True
except ValueError:
return False
def getMonth(f):
# Extract the month from a file name in the format "1_sijecanj.xls"
temp = os.path.basename(f)
temp = temp.split('_')
mjesec = int(temp[0])
return mjesec
def getYear(f):
# Extract the year from the path
f = f.split('/')
godina = f[-2]
return godina
def databaseVersion(cur):
# Print Mysql database version
try:
cur.execute("SELECT VERSION()")
result = cur.fetchone()
except MySQLdb.Error, e:
try:
print "MySQL Error [%d]: %s]" % (e.args[0], e.args[1])
except IndexError:
print "MySQL Error: %s" % str(e)
print "MySQL database version: %s" % result
def getQuery(cur, sql_query):
# Perform passed query on passed database
try:
cur.execute(sql_query)
result = cur.fetchall()
except MySQLdb.Error, e:
try:
print "MySQL Error [%d]: %s]" % (e.args[0], e.args[1])
except IndexError:
print "MySQL Error: %s" % str(e)
return result
def getFiles():
files = []
# Find subdirectories
for i in [x[0] for x in os.walk(START_DIR)]:
if (i != '.' and isNumber(os.path.basename(i))):
# Find files in subdirectories
for j in [y[2] for y in os.walk(i)]:
# For every file in file list
for y in j:
fn, fe = os.path.splitext(y)
is_mj = fn.split("_")
if (fe == '.xls' and '_' in y and isNumber(is_mj[0])):
mj = fn.split('_')
files.append(i.lstrip('.') + "/" + y)
# Sort list chronologically
files.sort(key=lambda x: getMonth(x))
files.sort(key=lambda x: getYear(x))
return files
def errhandle(f, datum, var, vrijednost, ispravka = "NULL"):
# Get error information, print it on screen and write to error.log
f = unicode(str(f), 'utf-8')
datum = unicode(str(datum), 'utf-8')
var = unicode(str(var), 'utf-8')
try:
vrijednost = unicode(str(vrijednost.decode('utf-8')), 'utf-8')
except UnicodeEncodeError:
vrijednost = vrijednost
ispravka = unicode(str(ispravka), 'utf-8')
err_f = codecs.open(ERROR_FILE, 'a+', 'utf-8')
line = f + ": " + datum + " " + var + "='" + vrijednost\
+ "' Ispravka='" + ispravka + "'"
#print "%s" % line
err_f.write(line)
err_f.close()
def readxlsfile(files, sheet, piloti, tehnicari, helikopteri):
# Read xls file and return a list of rows
data = []
nalet = []
putn = []
id_index = 0
# For every file in list
for f in files:
print "%s" % f
temp = f.split('/')
godina = str(temp[-2])
temp = os.path.basename(f).split('_')
mjesec = str(temp[0])
workbook = open_workbook(f)
sheet = workbook.sheet_by_name('UPIS')
# For every row that doesn't contain '' or 'POSADA' or 'dan' etc...
for ri in range(sheet.nrows):
if sheet.cell(ri,1).value!=''\
and sheet.cell(ri,2).value!='POSADA'\
and sheet.cell(ri,1).value!='dan'\
and (sheet.cell(ri,2).value!=''):
temp = sheet.cell(ri, 1).value
temp = temp.split('.')
dan = temp[0]
# Date
datum = "'" + godina + "-" + mjesec + "-" + dan + "'"
# Captain
kapetan = ''
kapi=''
if sheet.cell(ri, 2).value == "":
kapetan = "NULL"
else:
kapetan = sheet.cell(ri, 2).value
if kapetan[-1:] == " ":
errhandle(f, datum, 'kapetan', kapetan, kapetan[-1:])
kapetan = kapetan[:-1]
if(kapetan):
try:
kapi = [x[0] for x in piloti if x[2].lower() == kapetan]
kapi = kapi[0]
except ValueError:
errhandle(f, datum, 'kapetan', kapetan, '')
kapetan = ''
except IndexError:
errhandle(f, datum, 'kapetan', kapetan, '')
kapi = 'NULL'
else:
kapi="NULL"
# Kopilot
kopilot = ''
kopi = ''
if sheet.cell(ri, 3).value == "":
kopi = "NULL"
else:
kopilot = sheet.cell(ri, 3).value
if kopilot[-1:] == " ":
errhandle(f, datum,'kopilot', kopilot,\
kopilot[:-1])
if(kopilot):
try:
kopi = [x[0] for x in piloti if x[2].lower() == kopilot]
kopi = kopi[0]
except ValueError:
errhandle(f, datum,'kopilot', kopilot, '')
except IndexError:
errhandle(f, datum, 'kopilot', kopilot, '')
kopi = 'NULL'
else:
kopi="NULL"
# Teh 1
teh1 = ''
t1i = ''
if sheet.cell(ri, 4).value=='':
t1i = 'NULL'
else:
teh1 = sheet.cell(ri, 4).value
if teh1[-1:] == " ":
errhandle(f, datum,'teh1', teh1, teh1[:-1])
teh1 = 'NULL'
if(teh1):
try:
t1i = [x[0] for x in tehnicari if x[2].lower() == teh1]
t1i = t1i[0]
except ValueError:
errhandle(f, datum,'teh1', teh1, '')
except IndexError:
errhandle(f, datum, 'teh1', teh1, '')
t1i = 'NULL'
else:
t1i="NULL"
# Teh 2
teh2=''
t2i=''
if sheet.cell(ri, 5).value=='':
t2i = "NULL"
else:
teh2 = sheet.cell(ri, 5).value
if teh2[-1:] == " ":
errhandle(f, datum,'teh2', teh2, teh2[-1:])
teh2 = ''
if(teh2):
try:
t2i = [x[0] for x in tehnicari if x[2].lower() == teh2]
t2i = t2i[0]
except ValueError:
errhandle(f, datum,'teh2', teh2, 'NULL')
t2i = 'NULL'
except IndexError:
errhandle(f, datum,'teh2', teh2, 'NULL')
t2i = 'NULL'
else:
t2i="NULL"
# Oznaka
oznaka = ''
heli = ''
if sheet.cell(ri, 6).value=="":
oznaka = errhandle(f, datum, "helikopter", oznaka, "")
else:
oznaka = str(int(sheet.cell(ri, 6).value))
try:
heli = [x[0] for x in helikopteri if x[0] == oznaka]
except ValueError:
errhandle(f, datum, 'helikopter', oznaka, '')
except IndexError:
errhandle(f, datum, 'helikopter', oznaka, '')
heli = ''
# Conditions
uvjeti = sheet.cell(ri, 9).value
# Number of flights
letova_dan = 0
letova_noc = 0
letova_ifr = 0
letova_sim = 0
if sheet.cell(ri, 7).value == "":
errhandle(f, datum, 'letova', letova, '')
else:
letova = str(int(sheet.cell(ri, 7).value))
if uvjeti=="vfr":
letova_dan = letova
elif uvjeti=="ifr":
letova_ifr = letova
elif uvjeti=="sim":
letova_sim = letova
else:
letova_noc = letova
#Block time
bt_dan = "'00:00:00'"
bt_noc = "'00:00:00'"
bt_ifr = "'00:00:00'"
bt_sim = "'00:00:00'"
try:
bt_tpl = xldate_as_tuple(sheet.cell(ri, 8).value, workbook.datemode)
bt_m = bt_tpl[4]
bt_h = bt_tpl[3]
bt = "'" + str(bt_h).zfill(2)+":"+str(bt_m)+":00'"
except (ValueError, IndexError):
errhandle(f, datum, 'bt', sheet.cell(ri,8).value, '')
if uvjeti[:3]=="vfr":
bt_dan = bt
elif uvjeti[:3]=="ifr":
bt_ifr = bt
elif uvjeti[:3]=="sim":
bt_sim = bt
elif uvjeti[:2] == "no":
bt_noc = bt
else:
errhandle(f, datum, 'uvjeti', uvjeti, '')
# Type of flight
vrsta = "'" + sheet.cell(ri, 10).value + "'"
# Exercise
vjezba = 'NULL';
try:
vjezba = sheet.cell(ri, 11).value
if vjezba == '':
# Too many results
#errhandle(f, datum, 'vjezba', vjezba, '')
vjezba = 'NULL'
if vjezba == "?":
errhandle(f, datum, 'vjezba', str(vjezba), '')
vjezba = 'NULL'
if str(vjezba) == 'i':
errhandle(f, datum, 'vjezba', str(vjezba), '')
vjezba = 'NULL'
if str(vjezba)[-1:] == 'i':
errhandle(f, datum, 'vjezba', str(vjezba),\
str(vjezba).rstrip('i'))
vjezba = str(vjezba).rstrip('i')
if str(vjezba).find(' i ') != -1:
errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(' i ')[0])
vjezba = str(vjezba).split(' i ')
vjezba = vjezba[0]
if str(vjezba)[-1:] == 'm':
errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).rstrip('m'))
vjezba = str(vjezba).rstrip('m')
if str(vjezba).find(';') != -1:
errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(';')[0])
temp = str(vjezba).split(';')
vjezba = temp[0]
if str(vjezba).find('/') != -1:
errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split('/')[0])
temp = str(vjezba).split('/')
vjezba = temp[0]
if str(vjezba).find('-') != -1:
errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split('-')[0])
temp = str(vjezba).split('-')
vjezba = temp[0]
if str(vjezba).find(',') != -1:
errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(',')[0])
temp = str(vjezba).split(',')
vjezba = temp[0]
if str(vjezba).find('_') != -1:
errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split('_')[0])
temp = str(vjezba).split('_')
vjezba = temp[0]
if str(vjezba) == 'bo':
errhandle(f, datum, 'vjezba', str(vjezba), '')
vjezba = 'NULL'
if str(vjezba).find(' ') != -1:
if str(vjezba) == 'pp 300':
errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(' ')[1])
temp = str(vjezba).split(' ')
vjezba = temp[1]
else:
errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(' ')[0])
temp = str(vjezba).split(' ')
vjezba = temp[0]
if str(vjezba) == 'pp':
errhandle(f, datum, 'vjezba', str(vjezba), '')
vjezba = 'NULL'  # '' would make int(float(vjezba)) below raise ValueError
except UnicodeEncodeError:
errhandle(f, datum, 'Unicode error! vjezba', vjezba, '')
if vjezba != 'NULL':
vjezba = int(float(vjezba))
# Visinska slijetanja (high-altitude landings)
# Putnici (passengers)
vp1 = str(sheet.cell(ri, 12).value)
bp1 = str(sheet.cell(ri, 13).value)
vp2 = str(sheet.cell(ri, 14).value)
bp2 = str(sheet.cell(ri, 15).value)
# Teret (cargo)
teret = ''
teret = str(sheet.cell(ri, 16).value)
if teret == '':
teret = 0
# Baja
baja = ''
if sheet.cell(ri, 17).value == '':
baja = 0
else:
baja = int(sheet.cell(ri, 17).value) / 2 # dodano /2 da se dobiju tone (added /2 to get tonnes)
# Redosljed csv (CSV column order)
id_index = id_index + 1
row = [id_index, datum, kapi, kopi, t1i, t2i, oznaka,\
letova, letova_dan, letova_noc, letova_ifr,\
letova_sim, bt, bt_dan, bt_noc, bt_ifr,\
bt_sim, vrsta, vjezba, teret, baja]
row = [str(i) for i in row]
nalet.append(row)
putn = []
if bp1 != '':
put = [id_index, vp1, bp1]
putn.append(put)
if bp2 != '':
put = [id_index, vp2, bp2]
putn.append(put)
data.append(nalet)
data.append(putn)
return data
def main():
# Python version
print "\nPython version: %s \n" % platform.python_version()
# Print filesystem encoding
print "Filesystem encoding is: %s" % sys.getfilesystemencoding()
# Remove error file if exists
print "Removing error.log if it exists..."
try:
os.remove(ERROR_FILE)
print "It did."
except OSError:
print "It doesn't."
pass
print "Done!"
# Connect to database
print "Connecting to database..."
db = MySQLdb.connect(DB_HOST, DB_USER, DB_PASS, DB_DATABASE,\
use_unicode=True, charset='utf8')
cur=db.cursor()
print "Done!"
# Database version
databaseVersion(cur)
# Load pilots, technicians and helicopters from db
print "Loading pilots..."
sql_query = "SELECT eth_osobnici.id, eth_osobnici.ime,\
eth_osobnici.prezime FROM eth_osobnici RIGHT JOIN \
eth_letacka_osposobljenja ON eth_osobnici.id=\
eth_letacka_osposobljenja.id_osobnik WHERE \
eth_letacka_osposobljenja.vrsta_osposobljenja='kapetan' \
OR eth_letacka_osposobljenja.vrsta_osposobljenja='kopilot'"
#piloti = []
#piloti = getQuery(cur, sql_query)
piloti=[]
temp = []
temp = getQuery(cur, sql_query)
for row in temp:
piloti.append(row)
print "Done!"
print "Loading technicians..."
sql_query = "SELECT eth_osobnici.id, eth_osobnici.ime,\
eth_osobnici.prezime FROM eth_osobnici RIGHT JOIN \
eth_letacka_osposobljenja ON eth_osobnici.id=\
eth_letacka_osposobljenja.id_osobnik WHERE \
eth_letacka_osposobljenja.vrsta_osposobljenja='tehničar 1' \
OR eth_letacka_osposobljenja.vrsta_osposobljenja='tehničar 2'"
tehnicari=[]
temp = []
temp = getQuery(cur, sql_query)
for row in temp:
tehnicari.append(row)
print "Done!"
print "Loading aircraft registrations..."
sql_query = "SELECT id FROM eth_helikopteri"
helikopteri=[]
temp = []
temp = getQuery(cur, sql_query)
for row in temp:
helikopteri.append(row)
print "Done!"
# Get file names to process
print "Loading file list..."
files = getFiles()
print "Done!"
# Process all files from array
print "Processing files..."
data = readxlsfile(files, 'UPIS', piloti, tehnicari, helikopteri)
print "Done!"
# Enter new information in database
result = 0
print "Resetting database..."
sql_query = "DELETE FROM eth_nalet"
cur.execute(sql_query)
db.commit()
sql_query = "ALTER TABLE eth_nalet AUTO_INCREMENT=0"
cur.execute(sql_query)
db.commit()
print "Done!"
print "Loading data in 'eth_nalet'..."
for row in data[0]:
sql_query = """INSERT INTO eth_nalet (id, datum, kapetan,
kopilot, teh1, teh2, registracija, letova_uk, letova_dan,
letova_noc, letova_ifr, letova_sim, block_time, block_time_dan,
block_time_noc, block_time_ifr, block_time_sim, vrsta_leta,
vjezba, teret, baja) VALUES (%s)""" % (", ".join(row))
cur.execute(sql_query)
db.commit()
print "Done!"
print "Loading data in 'eth_putnici'..."
for row in data[1]:
sql_query = """INSERT INTO eth_putnici (id_leta,
vrsta_putnika, broj_putnika) VALUES (%s)""" % (", ".join(row))
cur.execute(sql_query)
db.commit()
print "Done!"
# Close the database connection
print "Closing database connection..."
if cur:
cur.close()
if db:
db.close()
print "Database closed!"
if __name__ == '__main__':
main()
I apologize for not translating the comments in the code; it was an old project of mine, and I tend to write comments in English now. If something needs explanation, please fire away.
The funny thing is that if I print the file list to the screen, the names display just fine. But when they are passed to xlrd, they don't seem to be in the right format.
Respectfully,
me
I finally managed to find the error! It wasn't an encoding error after all. It was a logic error.
In the function getFiles() I stripped the leading "." from each entry in the file list, when I should have stripped "./". So, naturally, the file names were "/2006/1_siječanj.xls" instead of "2006/1_siječanj.xls" as they should be. The exception was an IOError, not a UnicodeEncodeError: because of my oversight, the script looked for an absolute path instead of a relative one.
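For anyone who wants to see the difference between stripping "." and stripping "./", here is a minimal sketch. It is not the original getFiles(); the helper name strip_start_dir is hypothetical, the slice found[1:] is only an assumption about how the "." was removed, and posixpath is used so the result is the same on any platform.

```python
# -*- coding: utf-8 -*-
import posixpath

START_DIR = "./"

def strip_start_dir(path, start_dir=START_DIR):
    """Drop the start_dir prefix so the result stays a relative path.

    Hypothetical helper for illustration; not the original getFiles().
    """
    if path.startswith(start_dir):
        return path[len(start_dir):]
    return path

# A path as a directory walk starting at "./" would report it.
found = u"./2006/1_sije\u010danj.xls"

buggy = found[1:]               # removes only ".", leaving a leading "/"
fixed = strip_start_dir(found)  # removes "./", keeping the path relative

print(posixpath.isabs(buggy))  # True: resolved from the filesystem root
print(posixpath.isabs(fixed))  # False: resolved from the current directory
```

The buggy variant still looks fine when printed, which matches what I saw: the names displayed correctly but pointed to a location that did not exist.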
Well, this was embarrassing. Thank you guys; I hope this post helps someone else pay more attention to the error types Python throws at us.
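One way to make the error type hard to overlook is to catch the two kinds of failure separately. The sketch below uses the built-in open() rather than xlrd, and classify_open_error is a hypothetical name; it only illustrates the idea under those assumptions.

```python
def classify_open_error(path):
    """Try to open path and report which kind of failure occurred, if any."""
    try:
        with open(path, "rb"):
            return "ok"
    except UnicodeError as exc:
        # Raised when the name cannot be converted to the filesystem encoding.
        return "encoding error: %s" % exc
    except (IOError, OSError) as exc:
        # Raised when the file is missing or cannot be read.
        return "I/O error: %s" % exc

# A path that should not exist on any machine (made up for this example).
print(classify_open_error("/no_such_dir_for_this_example/2006/file.xls"))
```

Logging the two cases under different labels would have pointed at the wrong path straight away.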
It looks like xlrd isn't converting the Unicode filename to the locally encoded form before trying to open the file. Python has guessed that the filesystem encoding is UTF-8 and has correctly converted the č to the right Unicode code point.
There are two ways to fix this:
1) Encode the Unicode filename before asking xlrd to open it:
workbook = open_workbook(f.encode(sys.getfilesystemencoding()))
2) Use raw 8-bit filenames and don't convert filenames to Unicode:
START_DIR = './'
IMHO, option 2 is probably safer in case the filenames weren't written in UTF-8.
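Option 1 can be wrapped in a small helper so that names that are already byte strings pass through untouched. This is a sketch only: to_fs_bytes is a hypothetical name, and the fallback to "utf-8" when no filesystem encoding is reported is my assumption.

```python
import sys

def to_fs_bytes(name, encoding=None):
    """Return name as a byte string in the filesystem encoding.

    Hypothetical helper; byte strings are returned unchanged.
    """
    if isinstance(name, bytes):
        return name
    return name.encode(encoding or sys.getfilesystemencoding() or "utf-8")

# The encoding is passed explicitly here so the result does not depend
# on the machine the example runs on.
encoded = to_fs_bytes(u"2006/1_sije\u010danj.xls", "utf-8")
print(repr(encoded))
```

The resulting byte string is what would be handed to open_workbook in option 1.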
UPD
Note that os.walk returns Unicode strings when the given path is a Unicode string; a normal (byte) string path returns byte strings. This is the same behaviour as os.listdir (http://docs.python.org/2/library/os.html#os.listdir).
Example:
$ ls
€.txt
$ python
>>> import os
>>> os.listdir(".")
['\xe2\x82\xac.txt']
>>> os.listdir(u".")
[u'\u20ac.txt']
(\xe2\x82\xac is the UTF-8 encoding of €, U+20AC)
Remember: on Unix, unlike Windows, filenames carry no encoding hints. They are simply 8-bit strings, so you need to know which encoding they were created with if you want to convert them to a different one.
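To show why the encoding has to be known, here is a short sketch that decodes the raw name from the listing above twice, once with the right encoding and once with a wrong one. The choice of latin-1 as the wrong guess is arbitrary.

```python
# Raw bytes as returned by os.listdir(".") in the example above.
raw = b"\xe2\x82\xac.txt"

as_utf8 = raw.decode("utf-8")      # correct guess: the euro sign plus ".txt"
as_latin1 = raw.decode("latin-1")  # wrong guess: three unrelated characters plus ".txt"

print(as_utf8 == u"\u20ac.txt")       # True
print(len(as_latin1) - len(as_utf8))  # 2
```

Neither call raises an error, so a wrong guess silently produces a different name instead of failing.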
