sqlite3 database won't accept fonted reshaped Arabic data - Python

OS: ubuntu 20.04
IDE: PyCharm Community Build #PC-223.8214.51, built on December 20, 2022
Language: Python 3.10.6
I'm using Kivy to develop a simple app that queries an sqlite3 database for data written in Arabic. Because Kivy doesn't natively support Arabic, I had to use both python-bidi and arabic-reshaper so that the user can type in proper Arabic.
The database is created from a .csv file, which in turn was created with LibreOffice Calc, also on Ubuntu 20.04. I used pandas in a terminal session to create my database as follows:
users = pd.read_csv('numberDataBaseCSV.csv')
users.to_sql('users', conn, if_exists='replace', index = False)
First, I had to override the TextInput.insert_text as follows:
class Ar_text(TextInput):
    max_chars = NumericProperty(20)  # maximum characters allowed
    str = StringProperty()

    def __init__(self, **kwargs):
        super(Ar_text, self).__init__(**kwargs)
        self.text = bidi.algorithm.get_display(arabic_reshaper.reshape("اطبع شيئاً"))

    def insert_text(self, substring, from_undo=False):
        if not from_undo and (len(self.text) + len(substring) > self.max_chars):
            return
        self.str = self.str + substring
        self.text = bidi.algorithm.get_display(arabic_reshaper.reshape(self.str))
        substring = ""
        super(Ar_text, self).insert_text(substring, from_undo)

    def do_backspace(self, from_undo=False, mode='bkspc'):
        self.str = self.str[0:len(self.str) - 1]
        self.text = bidi.algorithm.get_display(arabic_reshaper.reshape(self.str))
Then I built the app as usual:
class TestApp(kivy.app.App):
    def build(self):
        def create_connection(db_file):
            """ create a database connection to the SQLite database
            specified by the db_file
            :param db_file: database file
            :return: Connection object or None
            """
            conn = None
            try:
                conn = sqlite3.connect(db_file)
            except sqlite3.Error as e:
                print(e)
            return conn

        self.dbConnect = create_connection("numberData.db")
        self.number_cursor = self.dbConnect.cursor()
        reshaped_text = arabic_reshaper.reshape("بحث")
        bidi_text = bidi.algorithm.get_display(reshaped_text)
        self.circuit_number_label = kivy.uix.label.Label(text=bidi_text, font_name="janna-lt-bold/Janna LT Bold/Janna LT Bold.ttf")
        self.nameTextField = Ar_text(text=bidi_text, font_name="janna-lt-bold/Janna LT Bold/Janna LT Bold.ttf")
        self.searchButton = kivy.uix.button.Button(text=bidi_text, font_name="janna-lt-bold/Janna LT Bold/Janna LT Bold.ttf")
        self.searchButton.bind(on_press=self.search_number_by_name)
        boxLayout = kivy.uix.boxlayout.BoxLayout(orientation="vertical")
        boxLayout.add_widget(self.circuit_number_label)
        boxLayout.add_widget(self.nameTextField)
        boxLayout.add_widget(self.searchButton)
        return boxLayout
And this is the callback that's triggered when I click the button:
def search_number_by_name(self, event):
    reshaped_text = bidi.algorithm.get_display(self.nameTextField.text)
    print(reshaped_text)
    print("select الرقم,الدارة from numbers_table where الاسم = '" + reshaped_text.strip() + "'")
    self.number_cursor.execute("select الرقم,الدارة from numbers_table where الاسم = '" + reshaped_text.strip() + "'")
    rows = self.number_cursor.fetchall()
    for row in rows:
        print(row)
And then of course running the app:
testApp = TestApp()
testApp.run()
Even though the data is correct and I should get the queried rows from the database, nothing seems to happen. Not even an error message.
When I hard-code the data in the where clause:
queried_data = "خالد"
and then build the query:
self.number_cursor.execute("select الرقم,الدارة from numbers_table where الاسم = '" + queried_data + "'")
I get the expected results from the database; only when I retrieve the data from the TextInput do I get nothing.
I've tried to "undo" the reshaping process with the following snippet:
reshaped_text = bidi.algorithm.get_display(self.nameTextField.text)
But that also didn't work. I'm not sure where the problem is, since I don't get any error messages. I'd appreciate any help.

As it turns out, the Ar_text class defined above already keeps the raw (unshaped) text in its str property, so in order to retrieve the data and use it in a query you need to do the following:
reshaped_text = self.nameTextField.str
Then use it as usual in the built query:
self.number_cursor.execute("select الرقم,الدارة from numbers_table where الاسم = '" + reshaped_text.strip() + "'")
rows = self.number_cursor.fetchall()
for row in rows:
    print(row)
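As an aside, building the SQL string by concatenation leaves the query open to quoting problems and SQL injection. Below is a minimal sketch of the same lookup done with a parameterized query; the in-memory database and the table schema are hypothetical stand-ins for the post's numbers_table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical schema standing in for the numbers_table from the post.
cur.execute("CREATE TABLE numbers_table (الاسم TEXT, الرقم TEXT, الدارة TEXT)")
cur.execute("INSERT INTO numbers_table VALUES ('خالد', '123', '7')")
conn.commit()

name = "خالد"  # the raw, unshaped text, e.g. self.nameTextField.str
# The value is passed as a bound parameter instead of being concatenated in.
cur.execute("select الرقم, الدارة from numbers_table where الاسم = ?", (name.strip(),))
print(cur.fetchall())  # → [('123', '7')]
```

With a bound parameter, sqlite3 handles quoting and encoding of the Arabic value itself, so stray quotes or control characters in the text field cannot break the statement.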

Related

Strange escape character added to record's message in Python logger

I am facing a weird issue.
The following code
logger.info("Setting coordinates to [lat: " + str(self.curr_coord.latitude())
            + ", lng: " + str(self.curr_coord.longitude())
            + "]")
produces a string which contains an escape character after str(self.curr_coord.latitude()). The values returned by the functions mentioned above are plain doubles. In my custom logging handler I connect to a SQLite3 database file; in addition I have the ability to dump the log records into a CSV file.
The code above produces an entry that is exported as follows:
2021-10-26T14:47:39.528605+02:00,0,geointelkit,"Setting coordinates to [lat: 48.9475, lng: 8.4106]"
My CSV writer looks like this:
with open(db_dump_filename, "w", newline="") as csv_dump_file:
    cols = ["Timestamp", "Level", "Source", "Message"]
    db_exporter = csv.writer(csv_dump_file)
    db_exporter.writerow(cols)
    db_exporter.writerows(entries)
All of my other entries do not have quotation marks. At first I thought it might be the length of the message and some weird handling by SQLite. However, after adding an even longer string I didn't get the same problem.
The next thing I did was to customize the CSV writer:
with open(db_dump_filename, "w", newline="") as csv_dump_file:
    cols = ["Timestamp", "Level", "Source", "Message"]
    db_exporter = csv.writer(csv_dump_file,
                             quoting=csv.QUOTE_NONE,
                             delimiter=",",
                             escapechar="#")
    db_exporter.writerow(cols)
    db_exporter.writerows(entries)
The modified writer's output for that entry was
2021-10-26T14:47:39.528605+02:00,0,geointelkit,Setting coordinates to [lat: 48.9475#, lng: 8.4106]
Notice the # after 48.9475. While I do some formatting in my custom handler with regard to the records it handles, it is only related to the timestamp (first column) and the value after it (second column), which simply maps the logger's numeric level values (20, 30, 40...) to the ones I use in my application. The record message is not touched in any way. Here is the formatter I'm using in my custom logger class:
self.formatter = logging.Formatter(fmt="%(asctime)s %(levelno)d %(name)s %(message)s",
                                   datefmt="%Y-%m-%dT%H:%M:%S")
The custom handler and custom logger are shown below. LogLevel is a simple class that includes (besides my different levels of logging) functions for mapping a log level to different representations. I didn't include it since it's not important.
LogEntry = namedtuple("LogEntry", "dtime lvl src msg")

class LogHandlerSQLite(logging.Handler):
    db_init_script = """
    CREATE TABLE IF NOT EXISTS logs(
        TimeStamp TEXT,
        Level INT,
        Source TEXT,
        Message TEXT
    );
    """
    db_clear_script = """
    DROP TABLE IF EXISTS logs;
    """
    db_insert_script = """
    INSERT INTO logs(
        Timestamp,
        Level,
        Source,
        Message
    )
    VALUES (
        '%(tstamp)s',
        %(levelno)d,
        '%(name)s',
        '%(message)s'
    );
    """
    db_get_all = """
    SELECT *
    FROM logs
    """

    def __init__(self, db_path="logs.db"):
        logging.Handler.__init__(self)
        self.db = db_path
        conn = sq3.connect(self.db)
        conn.execute(LogHandlerSQLite.db_init_script)
        conn.commit()
        conn.close()

    def dump(self):
        logger = logging.getLogger("geointelkit")
        logger.info("Dumping log records to CSV file")
        conn = sq3.connect(self.db)
        cur = conn.cursor()
        cur.execute(self.db_get_all)
        entries = cur.fetchall()
        conn.close()
        db_dump_filename = self.rename_db_file(add_extension=False) + ".csv"
        with open(db_dump_filename, "w", newline="") as csv_dump_file:
            cols = ["Timestamp", "Level", "Source", "Message"]
            db_exporter = csv.writer(csv_dump_file,
                                     quoting=csv.QUOTE_NONE,
                                     delimiter=",",
                                     escapechar="#")
            db_exporter.writerow(cols)
            db_exporter.writerows(entries)

    def clear_entries(self):
        conn = sq3.connect(self.db)
        # Drop current logs table
        conn.execute(LogHandlerSQLite.db_clear_script)
        # Shrink size of DB on the filesystem as much as possible
        conn.execute("VACUUM")
        # Create an empty one
        conn.execute(LogHandlerSQLite.db_init_script)
        conn.commit()
        conn.close()

    def format_time(self, record):
        # FIXME Timestamp includes microseconds
        record.tstamp = datetime.datetime.fromtimestamp(record.created, datetime.timezone.utc).astimezone().isoformat()

    def emit(self, record):
        """
        Emits a log entry after processing it. The processing includes
        * formatting the entry's structure
        * formatting the datetime component
        * adding exception information (in case the log entry was emitted due to an exception)
        * mapping of the log level (Python) to a custom level that allows it to be used in the model and log console
        In addition the entry is inserted into a database, which can then be viewed by and processed with external tools
        Args:
            record: Log entry
        Returns:
        """
        # Format log entry
        self.format(record)
        self.format_time(record)
        if record.exc_info:  # for exceptions
            record.exc_text = logging._defaultFormatter.formatException(record.exc_info)
        else:
            record.exc_text = ""
        # Map Python logging module's levels to the custom ones
        if record.levelno == 10:
            record.levelno = LogEntriesModel.LoggingLevel.DEBUG  # 3
        elif record.levelno == 20:
            record.levelno = LogEntriesModel.LoggingLevel.INFO  # 0
        elif record.levelno == 30:
            record.levelno = LogEntriesModel.LoggingLevel.WARN  # 1
        elif record.levelno == 40:
            record.levelno = LogEntriesModel.LoggingLevel.ERROR  # 2
        # Insert the log record
        sql = LogHandlerSQLite.db_insert_script % record.__dict__
        conn = sq3.connect(self.db)
        conn.execute(sql)
        conn.commit()
        conn.close()

    def rename_db_file(self, add_extension=False):
        db_old_name = os.path.splitext(self.db)[0]
        db_new_name = db_old_name + "_" \
            + datetime.datetime.now().replace(microsecond=0).isoformat().replace(":", "-").replace(" ", "_")
        if add_extension:
            db_new_name = db_new_name + os.path.splitext(self.db)[1]
        return db_new_name

    def close(self):
        """
        Clean-up procedure called during shutdown of the logger that uses the handler.
        The previously created log database is renamed by adding the current system
        time stamp. This is done to ensure that the database can be viewed later (e.g.
        bug report) and not overwritten when the logging is set up again
        Returns:
        """
        db_new_name = self.rename_db_file(add_extension=True)
        os.rename(self.db, db_new_name)
        super().close()

class SQLiteLogger(logging.Logger):
    def __init__(self, name="geointelkit", level=logging.DEBUG, db_path="logs.db"):
        logging.Logger.__init__(self, name, level)
        self.formatter = logging.Formatter(fmt="%(asctime)s %(levelno)d %(name)s %(message)s",
                                           datefmt="%Y-%m-%dT%H:%M:%S")
        self.handler = LogHandlerSQLite(db_path=db_path)
        self.handler.setFormatter(self.formatter)
        self.addHandler(self.handler)

    def clear_entries(self):
        self.handler.clear_entries()

    def dump(self):
        self.handler.dump()

def setup_logger():
    logging.setLoggerClass(SQLiteLogger)
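For reference, logging.setLoggerClass only affects loggers created after the call, which is why setup_logger must run before any getLogger for the application's name. A minimal stand-alone demonstration with a trivial Logger subclass (the class and logger name here are made up for illustration):

```python
import logging

class MyLogger(logging.Logger):
    """Trivial subclass standing in for SQLiteLogger above."""
    def __init__(self, name, level=logging.DEBUG):
        super().__init__(name, level)
        self.tagged = True  # marker to show this subclass was actually used

logging.setLoggerClass(MyLogger)
log = logging.getLogger("demo-logger")  # created AFTER setLoggerClass
print(type(log).__name__)  # → MyLogger
```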
Last but not least, changing the escapechar to a blank space " " resulted in double blank spaces in ALL messages, even those that are just a simple short string. Here are some examples, including the problematic log record from above:
2021-10-26T15:12:09.467067+02:00,3,geointelkit,Adding OpenStreetMaps view
2021-10-26T15:12:09.536681+02:00,0,geointelkit,Setting coordinates to [lat: 48.9475 , lng: 8.4106]
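For what it's worth, both behaviours track the comma inside the message itself: csv.writer with the default QUOTE_MINIMAL quotes any field that contains the delimiter, and with QUOTE_NONE it must escape every delimiter occurrence instead, which is why the escapechar shows up immediately before ", lng:". A small self-contained demonstration:

```python
import csv
import io

msg = "Setting coordinates to [lat: 48.9475, lng: 8.4106]"

# Default QUOTE_MINIMAL: the field is quoted because it contains the delimiter.
buf = io.StringIO()
csv.writer(buf).writerow(["2021-10-26T14:47:39", 0, "geointelkit", msg])
print(buf.getvalue())  # ...,"Setting coordinates to [lat: 48.9475, lng: 8.4106]"

# QUOTE_NONE: the delimiter inside the field must be escaped instead.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONE, delimiter=",", escapechar="#").writerow(
    ["2021-10-26T14:47:39", 0, "geointelkit", msg])
print(buf.getvalue())  # ...,Setting coordinates to [lat: 48.9475#, lng: 8.4106]
```

The same logic explains the escapechar=" " experiment: every space in every message is then treated as an escape character and doubled on output.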

Python script writes twice in to the tables

I am trying to automate this scenario. I have 2 .sql files (add1.sql and add2.sql) which have 1 insert script each.
My goal is to write one record to table1 by executing the lines from add1.sql and one record to cm.cl by executing the lines from add2.sql, then wait about 5 minutes so a backend service runs. This service writes from DB1 to DB2. I then connect to DB2 to see if the record from DB1 matches what was written to DB2. Depending on the results, an email is sent.
Below is my code. Everything works just fine except that it writes twice to DB1. So, basically 4 records are inserted instead of 2. Any idea why it writes 4 records?
import pypyodbc as pyodbc
import smtplib

sender = 'abc@abc.com'
receivers = ['abc@abc.com', 'xyz@abc.com']

import unittest
import time

class TestDB1(unittest.TestCase):
    def testing_master(self):
        Master_Conn = 'Driver={SQLServer};Server=server\servername;Database=database;UID=userid;PWD=password'
        Master_db = pyodbc.connect(Master_Conn)
        Master_Cursor = Master_db.cursor()
        try:
            # Open, read and execute add_shell.sql
            file = open('C:\\aaa\\add1.sql', 'r')
            line = file.read()
            lines = line.replace('\n', ' ')
            file1 = open('C:\\aaa\\add2.sql', 'r')
            line1 = file1.read()
            lines1 = line1.replace('\n', ' ')
            Master_Cursor.execute(lines)
            time.sleep(1)
            Master_Cursor.execute(lines1)
            Master_db.commit()
            file.close()
            file1.close()
            # Get python object for latest record inserted in DB1
            Master_CID = Master_Cursor.execute("select col1 from tablename1 order by sequenceid desc").fetchone()
            # Convert tuple to string; [0] gives the first tuple element.
            Master_CID_str = str(Master_CID[0])
            # Get GUID by stripping first 2 chars and last char.
            Master_CID_str = Master_CID_str[2:len(Master_CID_str) - 1]
            Master_CLID = Master_Cursor.execute("select col2 from tablename2 order by sequenceid desc").fetchone()
            Master_CLID_str = str(Master_CLID[0])
            Master_CLID_str = Master_CLID_str[2:len(Master_CLID_str) - 1]
            # Wait for the service that transfers data from one DB to the other to run
            time.sleep(310)
        finally:
            Master_Cursor.close()
            Master_db.close()
        return Master_CID, Master_CID_str, Master_CLID, Master_CLID_str

    def testing_int_instance(self):
        # Unpack the tuple returned from the testing_master() function
        Master_CID, Master_CID_str, Master_CLID, Master_CLID_str = self.testing_master()
        print("printing from testing_int_instance {0}".format(Master_CID))
        Int_Instance_Conn = 'Driver={SQL Server};Server=server2\servername2;Database=database2;UID=uid;PWD=password;'
        Int_db = pyodbc.connect(Int_Instance_Conn)
        Int_Cursor = Int_db.cursor()
        # return Int_db, Int_Cursor
        # Execute select from db where col matches the one inserted in the master db.
        Int_Instance_CID = Int_Cursor.execute("select col1 from table1 where cartridgemodelid = '%s'" % (Master_CID_str)).fetchone()
        print(Int_Instance_CID)
        smtpObj = smtplib.SMTP('22.101.1.333', 25)
        if (Master_CID == Int_Instance_CID):
            print("Matched")
            content = "This email confirms successful data transfer from Master to Instance for col1: \n"
            message = "\r\n".join(["From:" + sender, "To:" + str(receivers[:]), "Subject: Test Result", "", content + Master_CID_str])
            # smtpObj = smtplib.SMTP('22.101.2.222', 25)
            smtpObj.sendmail(sender, receivers, message)
        elif (Master_CID != Int_Instance_CID):
            print("no match")
            content = "This email confirms failure of data transfer from DB1 to DB2 for COL1: \n"
            message = "\r\n".join(["From:" + sender, "To:" + str(receivers[:]), "Subject: Test Result", "", content + Master_CID_str])
            smtpObj.sendmail(sender, receivers, message)
        Int_Instance_CLID = Int_Cursor.execute("select COL2 from table2 where col= '%s'" % (Master_CLID_str)).fetchone()
        print(Int_Instance_CLID)
        if (Master_CLID == Int_Instance_CLID):
            print("Printing int_instance CLID {0}".format(Int_Instance_CLID))
            content = "This email confirms successful data transfer from DB1 to DB2 for COL: \n"
            message = "\r\n".join(
                ["From:" + sender, "To:" + str(receivers[:]), "Subject: Test Result", "", content + Master_CLID_str])
            # smtpObj = smtplib.SMTP('22.101.2.222', 25)
            smtpObj.sendmail(sender, receivers, message)
            print("Ids Matched")
        elif (Master_CLID != Int_Instance_CLID):
            content = "This email confirms failure of data transfer from DB1 to DB2 for COL: \n"
            message = "\r\n".join(
                ["From:" + sender, "To:" + str(receivers[:]), "Subject: Test Result", "", content + Master_CLID_str])
            # smtpObj = smtplib.SMTP('22.101.2.222', 25)
            smtpObj.sendmail(sender, receivers, message)
        smtpObj.quit()
        Int_db.close()

if __name__ == '__main__':
    unittest.main()
add1.sql is:
DECLARE @Name VARCHAR(2000)
DECLARE @PartNumber VARCHAR(2000)
SELECT @Name='test'+convert(varchar,getdate(),108)
SELECT @PartNumber='17_00001_'+convert(varchar,getdate(),108)
DECLARE @XML XML
DECLARE @FileName VARCHAR(1000)
DECLARE @Id UNIQUEIDENTIFIER
SELECT @Id = NEWID()
SELECT @FileName = 'test.xml'
SELECT @XML='<model>
<xml tags go here>
BEGIN
INSERT INTO table1
(ID,Name,Type,Desc,Number,Revision,Model,status,Modifiedby,Modifiedon)
VALUES(@Id,@Name,'xyz','',@partnumber,'01',@XML,'A','453454-4545-4545-4543-345342343',GETUTCDATE())
add2.sql is:
DECLARE @XML XML
DECLARE @CM_Name VARCHAR(2000)
DECLARE @FileName VARCHAR(1000)
DECLARE @PartNumber VARCHAR(2000)
DECLARE @Id UNIQUEIDENTIFIER
SELECT @Id=NEWID()
DECLARE @Name VARCHAR(2000)
DECLARE @CMId VARCHAR(2000)
DECLARE @CM_PartName VARCHAR(2000)
DECLARE @CM_Partnumber VARCHAR(2000)
SELECT @Name='test'+convert(varchar,getdate(),108)
SELECT @PartNumber='test'+convert(varchar,getdate(),108)
DECLARE @RowCount INT
DECLARE @Message VARCHAR(100);
SELECT @FileName = 'test.xml'
SELECT @CMId = CM.CMID,
       @CM_Name = CM.CMName,
       @CM_PN = CM.PN
FROM cm.Model CM
WHERE CM.MName LIKE 'test%'
ORDER BY CM.ModifiedBy DESC
SELECT @XML='<Layout>
other xml tags...
BEGIN
INSERT INTO cm.CL(ID, ModelID, Layout, Description, PN, Revision, CLayout, Status, ModifiedBy, ModifiedOn)
SELECT TOP 1 @Id, @CMId, @Name, '', @PartNumber, '01', @XML, 'A', '453454-345-4534-4534-4534543545', GETUTCDATE()
FROM cm.table1 CM
WHERE CM.Name=@CM_Name
AND CM.Partnumber=@CM_Partnumber
Currently, you are calling testing_master() twice! First as your named test method and then in the second method when you unpack the returned values. Below is a demonstration with the methods defined outside of the class object; if called as-is, testing_master will run twice.
Consider also using a context manager, with(), to read the .sql scripts, which handles the open and close i/o operations as shown below:
# FIRST CALL
def testing_master():
    # ...SAME CODE...
    try:
        with open('C:\\aaa\\add1.sql', 'r') as file:
            lines = file.read().replace('\n', ' ')
            Master_Cursor.execute(lines)
            Master_db.commit()
        time.sleep(1)
        with open('C:\\aaa\\add2.sql', 'r') as file1:
            lines1 = file1.read().replace('\n', ' ')
            Master_Cursor.execute(lines1)
            Master_db.commit()
    # ...SAME CODE...
    return Master_CID, Master_CID_str, Master_CLID, Master_CLID_str

def testing_int_instance():
    # SECOND CALL
    Master_CID, Master_CID_str, Master_CLID, Master_CLID_str = testing_master()
    # ...SAME CODE...

if __name__ == "__main__":
    testing_master()
    testing_int_instance()
Commenting out the time.sleep(310) seems to work, but as you mention, the background Windows service then does not effectively run, which interrupts the database transfer.
To resolve this, consider calling the second method at the end of the first, passing the values as parameters, with no return, and remove the unpacking line. Then, in the main global environment, only run testing_master(). Of course, qualify with self when inside a class definition.
def testing_master():
    # ...SAME CODE...
    testing_int_instance(Master_CID, Master_CID_str, Master_CLID, Master_CLID_str)

def testing_int_instance(Master_CID, Master_CID_str, Master_CLID, Master_CLID_str):
    # ...SKIP UNPACK LINE
    # ...CONTINUE WITH SAME CODE...

if __name__ == "__main__":
    testing_master()
Given your unittest, consider a slight adjustment to the original setup where you qualify every variable with self:
def testing_master(self):
    ...
    self.Master_CID = Master_Cursor.execute("select col1 from tablename1 order by sequenceid desc").fetchone()
    self.Master_CID_str = str(self.Master_CID[0])
    self.Master_CID_str = self.Master_CID_str[2:len(self.Master_CID_str) - 1]
    self.Master_CLID = Master_Cursor.execute("select col2 from tablename2 order by sequenceid desc").fetchone()
    self.Master_CLID_str = str(self.Master_CLID[0])
    self.Master_CLID_str = self.Master_CLID_str[2:len(self.Master_CLID_str) - 1]

def testing_int_instance(self):
    # NO UNPACK LINE
    # ADD self. TO EVERY Master_* VARIABLE
    ...

Redis TypeError: must be string or buffer, not None

I want to store a Django model instance in my Redis data store, and then in another view function I want to get it and reuse it again; but it fails with:
TypeError: must be string or buffer, not None.
This is my code in views.py:
connection = redis.Redis('localhost')

def recharge_account(request):
    cur = recharge_form.cleaned_data['currency']
    amnt = recharge_form.cleaned_data['amount']
    user_profile = models.UserProfile.objects.get(user=models.User.objects.get(id=request.user.id))
    user_b_account, created = models.BankAccount.objects.get_or_create(
        owner=user_profile,
        cur_code=cur,
        method=models.BankAccount.DEBIT,
        name=request.user.username + '_' + cur + '_InterPay-account',
        account_id=make_id()
    )
    # saving the temporary deposit in redis
    deposit_dict = {"account": user_b_account, "amount": amnt, "banker": user_profile,
                    "date": user_b_account.when_opened, "cur_code": cur}
    pickled_deposit_dict = pickle.dumps(deposit_dict)
    cached_deposit_name = str(user_profile.user_id) + "-cachedDeposit"
    connection.set(cached_deposit_name, pickled_deposit_dict)
    ....

def callback_handler(request, amount):
    # getting from redis
    new_deposit = pickle.loads(
        connection.get(str(user_profile.user_id) + "-cachedDeposit"))
    deposit = models.Deposit(account=new_deposit['account'], amount=new_deposit['amount'],
                             banker=new_deposit['banker'],
                             date=new_deposit['date'], cur_code=new_deposit['cur_code'])
    deposit.save()
    .....
My question is: is there any problem with my procedure? Is there any problem with saving a Django model in Redis?
What should I do to get this function to work properly and have the saved data not be None?
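For context, pickle.loads raises that TypeError (on Python 2) when it is handed None, i.e. when connection.get() finds no value under the requested key, for example because user_profile in callback_handler refers to a different user than the one used when setting the key. Below is a minimal sketch of the pickle round-trip with a guard against a missing key; a plain dict stands in for the Redis connection, whose get() likewise returns None for a missing key:

```python
import pickle

# Stand-in for the Redis store; redis-py's get() also returns None
# when the key does not exist.
store = {}

def set_deposit(user_id, deposit_dict):
    store[str(user_id) + "-cachedDeposit"] = pickle.dumps(deposit_dict)

def get_deposit(user_id):
    raw = store.get(str(user_id) + "-cachedDeposit")
    if raw is None:
        # Guard: without this check, pickle.loads(None) raises the TypeError.
        raise KeyError("no cached deposit for user %s" % user_id)
    return pickle.loads(raw)

set_deposit(42, {"amount": 100, "cur_code": "USD"})
print(get_deposit(42))  # → {'amount': 100, 'cur_code': 'USD'}
```

The same None check before pickle.loads in callback_handler would turn the cryptic TypeError into a clear "key not found" condition you can handle.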

Pyuno indexing issue that I would like an explanation for

The following Python LibreOffice Uno macro works, but only with the try..except statement.
The macro allows you to select text in a Writer document and send it to a search engine in your default browser.
The issue is that if you select a single piece of text, oSelected.getByIndex(0) is populated, but if you select multiple pieces of text, oSelected.getByIndex(0) is not populated. In this case the data starts at oSelected.getByIndex(1) and oSelected.getByIndex(0) is left blank.
I have no idea why this should be and would love to know if anyone can explain this strange behaviour.
#!/usr/bin/python
import os
import webbrowser
from configobj import ConfigObj
from com.sun.star.awt.MessageBoxButtons import BUTTONS_OK, BUTTONS_OK_CANCEL, BUTTONS_YES_NO, BUTTONS_YES_NO_CANCEL, BUTTONS_RETRY_CANCEL, BUTTONS_ABORT_IGNORE_RETRY
from com.sun.star.awt.MessageBoxButtons import DEFAULT_BUTTON_OK, DEFAULT_BUTTON_CANCEL, DEFAULT_BUTTON_RETRY, DEFAULT_BUTTON_YES, DEFAULT_BUTTON_NO, DEFAULT_BUTTON_IGNORE
from com.sun.star.awt.MessageBoxType import MESSAGEBOX, INFOBOX, WARNINGBOX, ERRORBOX, QUERYBOX

def fs3Browser(*args):
    # get the doc from the scripting context which is made available to all scripts
    desktop = XSCRIPTCONTEXT.getDesktop()
    model = desktop.getCurrentComponent()
    doc = XSCRIPTCONTEXT.getDocument()
    parentwindow = doc.CurrentController.Frame.ContainerWindow
    oSelected = model.getCurrentSelection()
    oText = ""
    try:
        for i in range(0, 4, 1):
            print("Index No ", str(i))
            try:
                oSel = oSelected.getByIndex(i)
                print(str(i), oSel.getString())
                oText += oSel.getString() + " "
            except:
                break
    except AttributeError:
        mess = "Do not select text from more than one table cell"
        heading = "Processing error"
        MessageBox(parentwindow, mess, heading, INFOBOX, BUTTONS_OK)
        return
    lookup = str(oText)
    special_c = str.maketrans("", "", '!|##"$~%&/()=?+*][}{-;:,.<>')
    lookup = lookup.translate(special_c)
    lookup = lookup.strip()
    configuration_dir = os.environ["HOME"] + "/fs3"
    config_filename = configuration_dir + "/fs3.cfg"
    if os.access(config_filename, os.R_OK):
        cfg = ConfigObj(config_filename)
        # define search engine from the configuration file
        try:
            searchengine = cfg["control"]["ENGINE"]
        except:
            searchengine = "https://duckduckgo.com"
    if 'duck' in searchengine:
        webbrowser.open_new('https://www.duckduckgo.com//?q=' + lookup + '&kj=%23FFD700 &k7=%23C9C4FF &ia=meanings')
    else:
        webbrowser.open_new('https://www.google.com/search?/&q=' + lookup)
    return None

def MessageBox(ParentWindow, MsgText, MsgTitle, MsgType, MsgButtons):
    ctx = XSCRIPTCONTEXT.getComponentContext()
    sm = ctx.ServiceManager
    si = sm.createInstanceWithContext("com.sun.star.awt.Toolkit", ctx)
    mBox = si.createMessageBox(ParentWindow, MsgType, MsgButtons, MsgTitle, MsgText)
    mBox.execute()
Your code is missing something. This works without needing an extra try/except clause:
selected_strings = []
try:
    for i in range(oSelected.getCount()):
        oSel = oSelected.getByIndex(i)
        if oSel.getString():
            selected_strings.append(oSel.getString())
except AttributeError:
    # handle exception...
    return
result = " ".join(selected_strings)
To answer your question about the "strange behaviour," it seems pretty straightforward to me: if the 0th element is empty, then there are multiple selections, which may need to be handled differently.

Error message "MemoryError" in Python

Here's my problem: I'm trying to parse a big text file (about 15,000 KB) and write it to a MySQL database. I'm using Python 2.6, and the script parses about half the file and adds it to the database before freezing up. Sometimes it displays the text:
MemoryError.
Other times it simply freezes. I figured I could avoid this problem by using generators wherever possible, but I was apparently wrong.
What am I doing wrong?
When I press Ctrl + C to keyboard interrupt, it shows this error message:
...
sucessfully added vote # 2281
sucessfully added vote # 2282
sucessfully added vote # 2283
sucessfully added vote # 2284
floorvotes_db.py:35: Warning: Data truncated for column 'vote_value' at row 1
r['bill ID'] , r['last name'], r['vote'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "floorvotes_db.py", line 67, in addAllFiles
addFile(file)
File "floorvotes_db.py", line 61, in addFile
add(record)
File "floorvotes_db.py", line 35, in add
r['bill ID'] , r['last name'], r['vote'])
File "build/bdist.linux-i686/egg/MySQLdb/cursors.py", line 166, in execute
File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 35, in defaulterrorhandler
KeyboardInterrupt
import os, re, datetime, string

# Data
DIR = '/mydir'
tfn = r'C:\Documents and Settings\Owner\Desktop\data.txt'
rgxs = {
    'bill number': {
        'rgx': r'(A|S)[0-9]+-?[A-Za-z]* {50}'}
}
# Compile rgxs for speediness
for rgx in rgxs: rgxs[rgx]['rgx'] = re.compile(rgxs[rgx]['rgx'])
splitter = rgxs['bill number']['rgx']

# Guts
class floor_vote_file:
    def __init__(self, fn):
        self.iterdata = (str for str in
                         splitter.split(open(fn).read())
                         if str and str <> 'A' and str <> 'S')

    def iterVotes(self):
        for record in self.data:
            if record: yield billvote(record)

class billvote(object):
    def __init__(self, section):
        self.data = [line.strip() for line
                     in section.splitlines()]
        self.summary = self.data[1].split()
        self.vtlines = self.data[2:]
        self.date = self.date()
        self.year = self.year()
        self.votes = self.parse_votes()
        self.record = self.record()

    # Parse summary date
    def date(self):
        d = [int(str) for str in self.summary[0].split('/')]
        return datetime.date(d[2], d[0], d[1]).toordinal()

    def year(self):
        return datetime.date.fromordinal(self.date).year

    def session(self):
        """
        arg: 2-digit year int
        returns: 4-digit session
        """
        def odd():
            return divmod(self.year, 2)[1] == 1
        if odd():
            return str(string.zfill(self.year, 2)) + \
                   str(string.zfill(self.year + 1, 2))
        else:
            return str(string.zfill(self.year - 1, 2)) + \
                   str(string.zfill(self.year, 2))

    def house(self):
        if self.summary[2] == 'Assembly': return 1
        if self.summary[2] == 'Senate': return 2

    def splt_v_line(self, line):
        return [string for string in line.split(' ')
                if string <> '']

    def splt_v(self, line):
        return line.split()

    def prse_v(self, item):
        """takes split_vote item"""
        return {
            'vote': unicode(item[0]),
            'last name': unicode(' '.join(item[1:]))
        }

    # Parse votes - main
    def parse_votes(self):
        nested = [[self.prse_v(self.splt_v(vote))
                   for vote in self.splt_v_line(line)]
                  for line in self.vtlines]
        flattened = []
        for lst in nested:
            for dct in lst:
                flattened.append(dct)
        return flattened

    # Useful data objects
    def record(self):
        return {
            'date': unicode(self.date),
            'year': unicode(self.year),
            'session': unicode(self.session()),
            'house': unicode(self.house()),
            'bill ID': unicode(self.summary[1]),
            'ayes': unicode(self.summary[5]),
            'nays': unicode(self.summary[7]),
        }

    def iterRecords(self):
        for vote in self.votes:
            r = self.record.copy()
            r['vote'] = vote['vote']
            r['last name'] = vote['last name']
            yield r

test = floor_vote_file(tfn)
import MySQLdb as dbapi2
import floorvotes_parse as v
import os

# Initial database crap
db = dbapi2.connect(db=r"db",
                    user="user",
                    passwd="XXXXX")
cur = db.cursor()
if db and cur: print "\nConnected to db.\n"

def commit(): db.commit()

def ext():
    cur.close()
    db.close()
    print "\nConnection closed.\n"

# DATA
DIR = '/mydir'
files = [DIR + fn for fn in os.listdir(DIR)
         if fn.startswith('fvote')]

# Add stuff
def add(r):
    """add a record"""
    cur.execute(
        u'''INSERT INTO ny_votes (vote_house, vote_date, vote_year, bill_id,
            member_lastname, vote_value) VALUES
            (%s , %s , %s ,
             %s , %s , %s )''',
        (r['house'], r['date'], r['year'],
         r['bill ID'], r['last name'], r['vote'])
    )
    #print "added", r['year'], r['bill ID']

def crt():
    """create table"""
    SQL = """
    CREATE TABLE ny_votes (openleg_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        vote_house int(1), vote_date int(5), vote_year int(2), bill_id varchar(8),
        member_lastname varchar(50), vote_value varchar(10));
    """
    cur.execute(SQL)
    print "\nCreate ny_votes.\n"

def rst():
    SQL = """DROP TABLE ny_votes"""
    cur.execute(SQL)
    print "\nDropped ny_votes.\n"
    crt()

def addFile(fn):
    """parse and add all records in a file"""
    n = 0
    for votes in v.floor_vote_file(fn).iterVotes():
        for record in votes.iterRecords():
            add(record)
            n += 1
            print 'sucessfully added vote # ' + str(n)

def addAllFiles():
    for file in files:
        addFile(file)

if __name__ == '__main__':
    rst()
    addAllFiles()
Generators are a good idea, but you seem to have missed the biggest problem:
(str for str in splitter.split(open(fn).read()) if str and str <> 'A' and str <> 'S')
You're reading the whole file in at once even though you only need to work with bits of it at a time. Your code is too complicated for me to fix, but you should be able to use the file's iterator for your task:
(line for line in open(fn))
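To make that concrete, here is a minimal sketch (in Python 3, with a hypothetical record-start marker playing the role of the post's 'bill number' regex) of splitting a stream into records line by line, so only one record is held in memory at a time instead of the whole file:

```python
import re
from typing import Iterable, Iterator

# Hypothetical marker standing in for the 'bill number' pattern.
SPLITTER = re.compile(r'^(?:A|S)[0-9]+')

def iter_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one record at a time; a record starts at a line matching SPLITTER."""
    chunk = []
    for line in lines:
        if SPLITTER.match(line) and chunk:
            yield "".join(chunk)  # emit the finished record, then start a new one
            chunk = []
        chunk.append(line)
    if chunk:
        yield "".join(chunk)  # emit the final record

# Works the same on an open file object or any other iterable of lines.
sample = ["A123 header\n", "vote line 1\n", "S456 header\n", "vote line 2\n"]
records = list(iter_records(sample))
print(len(records))  # → 2
```

Passing `open(fn)` instead of `sample` streams the real file without ever calling `.read()` on it.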
I noticed that you use a lot of split() calls. This is memory-consuming, according to http://mail.python.org/pipermail/python-bugs-list/2006-January/031571.html . You can start investigating there.
Try commenting out add(record) to see whether the problem is in your code or on the database side. All the records are added in one transaction (if supported), and maybe this leads to a problem if it gets too many records. If commenting out add(record) helps, you could try calling commit() from time to time.
This isn't a Python memory issue, but it's perhaps worth thinking about. The previous answers make me think you'll sort that issue out quickly.
I wonder about the rollback logs in MySQL. If a single transaction is too large, perhaps you can checkpoint in chunks: commit each chunk separately instead of risking a rollback of a 15 MB file's worth of inserts.
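A sketch of that chunked-commit idea, committing every N inserts rather than once at the end. The batch size is arbitrary, and sqlite3 is used here purely as a self-contained stand-in for the MySQLdb connection:

```python
import sqlite3

COMMIT_EVERY = 500  # hypothetical batch size

db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("CREATE TABLE ny_votes (bill_id TEXT, vote_value TEXT)")

records = [("A%d" % i, "aye") for i in range(1200)]
for n, rec in enumerate(records, start=1):
    cur.execute("INSERT INTO ny_votes (bill_id, vote_value) VALUES (?, ?)", rec)
    if n % COMMIT_EVERY == 0:
        db.commit()  # checkpoint the chunk so the pending transaction stays small
db.commit()  # flush the final partial chunk

cur.execute("SELECT COUNT(*) FROM ny_votes")
print(cur.fetchone()[0])  # → 1200
```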
