Float Values Rounding with SQLAlchemy and MySQL - python
My question is similar to this unanswered question: SQLAlchemy commits makes float to be rounded
I have a text file of data that looks like this:
#file camera date mjd focus error
ibcy02blq UVIS1 08/03/09 55046.196630 0.57857 0.55440
ibcy02bnq UVIS1 08/03/09 55046.198330 -0.15000 0.42111
ibcy03j8q UVIS1 08/11/09 55054.041650 -0.37143 0.40802
ibcy03jaq UVIS1 08/11/09 55054.043350 -0.91857 0.51859
ibcy04m4q UVIS1 08/18/09 55061.154900 -0.32333 0.52327
ibcy04m6q UVIS1 08/18/09 55061.156600 -0.24867 0.66651
ibcy05b7q UVIS1 09/05/09 55079.912670 0.64900 0.58423
ibcy05b9q UVIS1 09/05/09 55079.914370 0.82000 0.50202
ibcy06meq UVIS1 10/02/09 55106.909840 -0.09667 0.24016
But once I read it into my MySQL database it looks like this:
+------+-----------+--------+------------+---------+----------+
| id | filename | camera | date | mjd | focus |
+------+-----------+--------+------------+---------+----------+
| 1026 | ibcy02blq | UVIS1 | 2009-08-03 | 55046.2 | 0.57857 |
| 1027 | ibcy02bnq | UVIS1 | 2009-08-03 | 55046.2 | -0.15 |
| 1028 | ibcy03j8q | UVIS1 | 2009-08-11 | 55054 | -0.37143 |
| 1029 | ibcy03jaq | UVIS1 | 2009-08-11 | 55054 | -0.91857 |
| 1030 | ibcy04m4q | UVIS1 | 2009-08-18 | 55061.2 | -0.32333 |
| 1031 | ibcy04m6q | UVIS1 | 2009-08-18 | 55061.2 | -0.24867 |
| 1032 | ibcy05b7q | UVIS1 | 2009-09-05 | 55079.9 | 0.649 |
| 1033 | ibcy05b9q | UVIS1 | 2009-09-05 | 55079.9 | 0.82 |
| 1034 | ibcy06meq | UVIS1 | 2009-10-02 | 55106.9 | -0.09667 |
| 1035 | ibcy06mgq | UVIS1 | 2009-10-02 | 55106.9 | -0.1425 |
+------+-----------+--------+------------+---------+----------+
The mjd column is being truncated and I'm not sure why. I understand that there are floating point precision errors for something like 1/3 but this looks more like some type of rounding is being implemented.
Here is the code I use to ingest the data into the database:
def make_focus_table_main():
    """The main controller for the make_focus_table
    module."""
    logging.info('Process Starting')
    filename_list = glob.glob('/grp/hst/OTA/focus/source/FocusModel/UVIS*FocusHistory.txt')
    logging.info('Found {} files'.format(len(filename_list)))
    for filename in filename_list:
        logging.info('Reading data from {}'.format(filename))
        output_list = []
        with open(filename, 'r') as f:
            data = f.readlines()
        for line in data[1:]:
            line = line.split()
            output_dict = {}
            output_dict['filename'] = line[0]
            output_dict['camera'] = line[1]
            output_dict['date'] = datetime.strptime(line[2], '%m/%d/%y')
            output_dict['mjd'] = float(line[3])
            output_dict['focus'] = float(line[4])
            output_list.append(output_dict)
        logging.info('Beginning bulk insert of records.')
        engine.execute(Focus.__table__.insert(), output_list)
        logging.info('Database insert complete.')
    logging.info('Process Complete')
I've used pdb to check that the values are not being truncated prior to being passed to the database (i.e. Python/SQLAlchemy is not performing the rounding). I can verify this in the INSERT command SQLAlchemy issues:
2014-04-11 13:08:20,522 INFO sqlalchemy.engine.base.Engine INSERT INTO focus (filename, camera, date, mjd, focus) VALUES (%s, %s, %s, %s, %s)
2014-04-11 13:08:20,602 INFO sqlalchemy.engine.base.Engine (
('ibcy02blq', 'UVIS2', datetime.datetime(2009, 8, 3, 0, 0), 55046.19663, 1.05778),
('ibcy02bnq', 'UVIS2', datetime.datetime(2009, 8, 3, 0, 0), 55046.19833, 1.32333),
('ibcy03j8q', 'UVIS2', datetime.datetime(2009, 8, 11, 0, 0), 55054.04165, 1.57333),
('ibcy03jaq', 'UVIS2', datetime.datetime(2009, 8, 11, 0, 0), 55054.04335, 0.54333),
('ibcy04m4q', 'UVIS2', datetime.datetime(2009, 8, 18, 0, 0), 55061.1549, -1.152),
('ibcy04m6q', 'UVIS2', datetime.datetime(2009, 8, 18, 0, 0), 55061.1566, -1.20733),
('ibcy05b7q', 'UVIS2', datetime.datetime(2009, 9, 5, 0, 0), 55079.91267, 2.35905),
('ibcy05b9q', 'UVIS2', datetime.datetime(2009, 9, 5, 0, 0), 55079.91437, 1.84524)
... displaying 10 of 1025 total bound parameter sets ...
('ichl05qwq', 'UVIS2', datetime.datetime(2014, 4, 2, 0, 0), 56749.05103, -2.98),
('ichl05qxq', 'UVIS2', datetime.datetime(2014, 4, 2, 0, 0), 56749.05177, -3.07))
2014-04-11 13:08:20,959 INFO sqlalchemy.engine.base.Engine COMMIT
Here is how the column is defined in my SQLAlchemy classes:
class Focus(Base):
    """ORM for the table storing the focus measurement information."""
    __tablename__ = 'focus'
    id = Column(Integer(), primary_key=True)
    filename = Column(String(17), index=True, nullable=False)
    camera = Column(String(5), index=True, nullable=False)
    date = Column(Date(), index=True, nullable=False)
    mjd = Column(Float(precision=20, scale=10), index=True, nullable=False)
    focus = Column(Float(15), nullable=False)
    __table_args__ = (UniqueConstraint('filename', 'camera',
                                       name='focus_uniqueness_constraint'),)
Here is the SQL that's logged from SQLAlchemy with echo=True when I create the table:
CREATE TABLE focus (
id INTEGER NOT NULL AUTO_INCREMENT,
filename VARCHAR(17) NOT NULL,
camera VARCHAR(5) NOT NULL,
date DATE NOT NULL,
mjd FLOAT(20) NOT NULL,
focus FLOAT(15) NOT NULL,
PRIMARY KEY (id),
CONSTRAINT focus_uniqueness_constraint UNIQUE (filename, camera)
)
So far, so good. But here's what I see in MySQL with SHOW CREATE TABLE focus;
CREATE TABLE `focus` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`filename` varchar(17) NOT NULL,
`camera` varchar(5) NOT NULL,
`date` date NOT NULL,
`mjd` float NOT NULL,
`focus` float NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `focus_uniqueness_constraint` (`filename`,`camera`),
KEY `ix_focus_filename` (`filename`),
KEY `ix_focus_mjd` (`mjd`),
KEY `ix_focus_date` (`date`),
KEY `ix_focus_camera` (`camera`)
) ENGINE=InnoDB AUTO_INCREMENT=1193 DEFAULT CHARSET=latin1
Somehow the FLOAT definition changed! Is this some type of MySQL configuration setting? I'm just running this on my local host right now, but if this is a configuration setting I'm concerned about the portability of this code onto a production server if I continue to use floats. I could just switch to a decimal column type as I've seen in other SO questions since I need exact values but I would like to understand what's going on here.
Update: Just to expand a little on two-bit-alchemist's answer, here is how it changes my query:
> SELECT ROUND(mjd,10) FROM focus LIMIT 10;
+------------------+
| ROUND(mjd,10) |
+------------------+
| 55046.1953125000 |
| 55046.1992187500 |
| 55054.0429687500 |
| 55054.0429687500 |
| 55061.1562500000 |
| 55061.1562500000 |
| 55079.9140625000 |
| 55079.9140625000 |
| 55106.9101562500 |
| 55106.9101562500 |
+------------------+
10 rows in set (0.00 sec)
Notice that far more decimal places are there than the plain SELECT showed. I had no idea SELECT was rounding values for display, but I guess this makes sense if you think about how a floating point representation works. It uses the full bytes allocated for that number; how many decimals you display is arbitrary up to the full length of the float: https://stackoverflow.com/a/20482699/1216837
The values are still not identical to the input file, though (55046.1953125 versus 55046.19663), because FLOAT(20) is stored as a single-precision float. Specifying the precision only seems to affect whether it's stored as a double or a single: http://dev.mysql.com/doc/refman/5.0/en/floating-point-types.html.
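The single-versus-double distinction can be reproduced without a database. The following is a minimal sketch, assuming MySQL stores this FLOAT column as a 4-byte IEEE 754 single; the helper name to_float32 is hypothetical and only illustrates the round trip with Python's struct module.

```python
import struct


def to_float32(value):
    """Round-trip a Python float (a 64-bit double) through a 4-byte IEEE 754 single."""
    return struct.unpack('<f', struct.pack('<f', value))[0]


# The first two mjd values from the input file.
for mjd in (55046.19663, 55046.19833):
    # Near 55046, adjacent single-precision values are 1/256 apart,
    # so digits beyond roughly the second decimal place cannot survive.
    print('{:.10f} -> {:.10f}'.format(mjd, to_float32(mjd)))
```

The two results are 55046.1953125 and 55046.19921875, the same values as the first two rows of the ROUND(mjd, 10) output above.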
But, what's also interesting/annoying is that I have to worry about this same thing when issuing a SELECT from the SQLAlchemy layer:
query = psf_session.query(Focus).first()
print query.filename, query.mjd, query.focus
Gives me ibcy02blq 55046.2 1.05778, so the values are still being rounded. Again, this makes sense because SQLAlchemy is just issuing SQL commands anyway. All in all, this is motivating me to switch to a DECIMAL column type: http://dev.mysql.com/doc/refman/5.0/en/fixed-point-types.html
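If the column does become DECIMAL, the parsing step in the ingest code would probably want to stop going through float as well. Here is a minimal sketch using only the standard library's decimal module. The helper name parse_row is hypothetical, and whether the database driver passes Decimal values through unchanged to a DECIMAL column is an assumption to verify for your setup.

```python
from datetime import datetime
from decimal import Decimal


def parse_row(line):
    """Parse one whitespace-separated data line, keeping numbers as exact decimals."""
    fields = line.split()
    return {
        'filename': fields[0],
        'camera': fields[1],
        'date': datetime.strptime(fields[2], '%m/%d/%y'),
        # Decimal(str) keeps every digit written in the file, including
        # trailing zeros, instead of converting to binary floating point.
        'mjd': Decimal(fields[3]),
        'focus': Decimal(fields[4]),
    }


row = parse_row('ibcy02blq UVIS1 08/03/09 55046.196630 0.57857 0.55440')
print(row['mjd'], row['focus'])
```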
It looks like all your values were printed with exactly six digits (except where .0 was left off in a couple of places). While I can't find any documentation on this, I suspect this is simply a default MySQL behavior for displaying float values in the context of a SELECT statement.
Based on the CREATE TABLE statement you provided, the internal representation is correct, so you need only add something like ROUND(mjd, 3) to your statement, with the first argument being the field to round and the second being the number of digits to round to (which can be more than what is displayed now).
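The six-digit display can be imitated in Python, purely as an analogy: the %g format also defaults to six significant digits and drops trailing zeros. This sketch assumes the stored values are the single-precision ones shown in the question's ROUND(mjd, 10) output. It does not describe how MySQL formats floats internally.

```python
# Stored single-precision values, copied from the ROUND(mjd, 10) output.
stored = [55046.1953125, 55054.04296875, 55061.15625, 55079.9140625]

# '%g' keeps six significant digits, like the default SELECT display appears to.
displayed = ['%g' % value for value in stored]

for value, shown in zip(stored, displayed):
    # '%.10f' shows what is actually stored next to the shortened form.
    print(shown, '%.10f' % value)
```

The shortened forms come out as 55046.2, 55054, 55061.2 and 55079.9, the same pattern as the mjd column in the question's table.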
Related
Why pywinauto won't detect the control identifiers in a window except the title bar?
I've been trying to automate the Cinebench window with Python using pywinauto, as this is the best library that I came across. I made a few projects that worked well, but with Cinebench I don't get any control identifiers (except for the title and the usual three top buttons). My main objective is to be able to automatically start benchmarks and read the final score.
I didn't come here to bother you all as soon as I hit an issue, so here are all of the things that I tried:
Switching backend="uia" to backend="win32". Result: code stopped working.
Waiting for the window to load, using time.sleep(). Result: no difference was noticed.
Adding a timeout=10 to the .connect() function. Result: no difference was noticed.
Researching if Cinebench had an API. Result: of course it doesn't (as far as I found).
Researching if there was another library to do it. Result: didn't find any.
I really don't want to do this using "click at these coordinates", and even so I wouldn't be able to read from it, so it would be useless.
The code that I used:
app = Application(backend="uia").start(rf"C:/Users/{os.getlogin()}/Desktop/MasterBench/Benchmarks/Cinebench.exe")
app = Application(backend="uia").connect(title=CINEBENCH_WINDOW_NAME, timeout=10)
app.CINEBENCHR23200.print_control_identifiers()
What I got:
Control Identifiers:

Dialog - 'CINEBENCH R23.200' (L-8, T-8, R1928, B1088)
['CINEBENCH R23.200', 'CINEBENCH R23.200Dialog', 'Dialog']
child_window(title="CINEBENCH R23.200", control_type="Window")
   |
   | TitleBar - '' (L16, T-5, R1920, B23)
   | ['TitleBar']
   |    |
   |    | Menu - 'Sistema' (L0, T0, R22, B22)
   |    | ['SistemaMenu', 'Sistema', 'Menu', 'Sistema0', 'Sistema1']
   |    | child_window(title="Sistema", auto_id="MenuBar", control_type="MenuBar")
   |    |    |
   |    |    | MenuItem - 'Sistema' (L0, T0, R22, B22)
   |    |    | ['Sistema2', 'SistemaMenuItem', 'MenuItem']
   |    |    | child_window(title="Sistema", control_type="MenuItem")
   |    |
   |    | Button - 'Riduci a icona' (L1779, T8, R1826, B22)
   |    | ['Button', 'Riduci a iconaButton', 'Riduci a icona', 'Button0', 'Button1']
   |    | child_window(title="Riduci a icona", control_type="Button")
   |    |
   |    | Button - 'Ripristino' (L1826, T8, R1872, B22)
   |    | ['Button2', 'Ripristino', 'RipristinoButton']
   |    | child_window(title="Ripristino", control_type="Button")
   |    |
   |    | Button - 'Chiudi' (L1872, T8, R1928, B22)
   |    | ['Button3', 'Chiudi', 'ChiudiButton']
   |    | child_window(title="Chiudi", control_type="Button")
How to disable text wrap in a columnar column?
|---------|------------------|------------------|-----------|------------------|
|serial no|ggggggg name      |status            |status code|AAAAAAAAAurl      |
|==============================================================================|
|1        |ggggggggggg-kkkkkk|Healthy           |200        |http://aaaaaaaaaaa|
|         |e                 |                  |           |-service.dev.sdddd|
|         |                  |                  |           |1.cccc.cc/health/l|
|         |                  |                  |           |ive               |
|---------|------------------|------------------|-----------|------------------|
|2        |zzzzzzzz-jjjjjj   |Healthy           |200        |http://ddddddddddd|
|         |                  |                  |           |ader.dev.ffffff.cc|
|         |                  |                  |           |cc.cc/health/live |
|---------|------------------|------------------|-----------|------------------|
I am trying to get the entire URL in the last column onto a single row. I am using the following Python library to print this and have tried a few things, but I am unable to get it working. I tried https://pypi.org/project/Columnar/, setting the max column width and min column width as mentioned there, but neither works.
Edit: Headers are simply the names of the columns; you can name them anything you want.
from columnar import columnar
headers = ['serial no', 'service name', 'status', 'status code']
...
tabledata = []
counter = 0
for x in services:
    zzz = requests.get("http://xxx.yyy"+ x)
    counter = counter + 1
    i = counter
    myrowdata = [i, x, zzz.text, zzz.status_code]
    tabledata.append(myrowdata)
table = columnar(tabledata, headers, no_borders=True, max_column_width=None)
print(table)
1.) You missed the column name "url" from headers. You should do as follows:
headers = ['serial no', 'service name', 'status', 'status code', 'url']
2.) You have to add the url to myrowdata:
myrowdata = [i, x, zzz.text, zzz.status_code, "http://xxx.yyy"+ x]
Update: If you did all the fixes above, you have to run it in an external system terminal to get the real result, as some internal IDE consoles constrain the width of the display:
In Spyder:
SERIAL NO  SERVICE NAME  STATUS   STATUS CODE     URL
1          Anyname       Anytext  Anystatus_code  http://aaaaaaaaaaaaaaaaaaa
                                                  aadddddddddddddddddddddddd
                                                  dddddddaaaaaaaaa.com
In external system terminal:
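The console-width point in the update can be checked before blaming the table library. This is a rough sketch using only the standard library, not Columnar itself. The row contents are made-up placeholders, and required_width is a hypothetical helper that assumes two spaces between columns.

```python
import shutil


def required_width(rows, gap=2):
    """Width needed to print every column unwrapped, with `gap` spaces between columns."""
    column_widths = [max(len(str(cell)) for cell in column) for column in zip(*rows)]
    return sum(column_widths) + gap * (len(column_widths) - 1)


headers = ['serial no', 'service name', 'status', 'status code', 'url']
rows = [headers, [1, 'Anyname', 'Healthy', 200, 'http://example.invalid/health/live']]

needed = required_width(rows)
# get_terminal_size falls back to the given size when no terminal is attached.
available = shutil.get_terminal_size(fallback=(80, 24)).columns
print('need', needed, 'columns; console reports', available)
```

If the console reports fewer columns than the table needs, wrapping is to be expected regardless of the column-width settings.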
Formatting a table from a csv file
I'm trying to make a table from data in a CSV file using only the csv module. Could anyone tell me what I should do to display the '|' at the end of every row (just after the last element in the row)? Here's what I have so far:
def display_playlist( filename ):
    if filename.endswith('.csv')==False: #check if it ends with CSV extension
        filename = filename + ('.csv') #adding .csv if given without .csv extension
    max_element_length=0 #aligning columns to the longest elements
    for row in get_datalist_from_csv( filename ):
        for element in row:
            if len(element)>max_element_length:
                max_element_length=len(element)
    # print(max_element_length)
    #return max_element_length
    print('-----------------------------------------------------------------------------')
    for row in get_datalist_from_csv( filename ):
        for element in row:
            print('| ', end='')
            if (len(element)<=4 and element.isdigit==True):
                print(pad_to_length(element,4), end=' |') #trying to get '|' at the end
            else:
                print(pad_to_length(element, max_element_length), end=' ')
        print('\n')
    print('-----------------------------------------------------------------------------')

## Read data from a csv format file
def get_datalist_from_csv( filename ):
    ## Create a 'file object' f, for accessing the file:
    with open( filename ) as f:
        reader = csv.reader(f) # create a 'csv reader' from the file object
        datalist = list( reader ) # create a list from the reader
    return datalist # we have a list of lists

## For aligning table columns
## It adds spaces to the end of a string to make it up to length n.
def pad_to_length( string, n):
    return string + " "* (n-len(string)) ## s*n gives empty string for n<1
The output I get for now is:
| Track | Artist | Album | Time
| Computer Love | Kraftwerk | Computer World | 7:15
| Paranoid Android | Radiohead | OK Computer | 6:27
| Computer Age | Neil Young | Trans | 5:24
| Digital | Joy Division | Still | 2:50
| Silver Machine | Hawkwind | Roadhawks | 4:39
| Start the Simulator | A-Ha | Foot of the Mountain | 5:11
| Internet Connection | M.I.A. | MAYA | 2:56
| Deep Blue | Arcade Fire | The Suburbs | 4:29
| I Will Derive! | MindofMatthew | You Tube | 3:17
| Lobachevsky | Tom Lehrer | You Tube | 3:04
Getting single characters instead of sentences after applying NLTK's sentence tokenizer in Python 3.5.1
import codecs, os
import re
import string
import mysql
import mysql.connector
y_ = ""
'''Searching and reading text files from a folder.'''
for root, dirs, files in os.walk("/Users/ultaman/Documents/PAN dataset/Pan Plagiarism dataset 2010/pan-plagiarism-corpus-2010/source-documents/test1"):
    for file in files:
        if file.endswith(".txt"):
            x_ = codecs.open(os.path.join(root,file),"r", "utf-8-sig")
            for lines in x_.readlines():
                y_ = y_ + lines
'''Tokenizing the senteces of the text file.'''
from nltk.tokenize import sent_tokenize
raw_docs = sent_tokenize(y_)
tokenized_docs = [sent_tokenize(y_) for sent in raw_docs]
'''Removing punctuation marks.'''
regex = re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = ''
for review in tokenized_docs:
    new_review = ''
    for token in review:
        new_token = regex.sub(u'', token)
        if not new_token == u'':
            new_review+= new_token
    tokenized_docs_no_punctuation += (new_review)
print(tokenized_docs_no_punctuation)
'''Connecting and inserting tokenized documents without punctuation in database field.'''
def connect():
    for i in range(len(tokenized_docs_no_punctuation)):
        conn = mysql.connector.connect(user = 'root', password = '', unix_socket = "/tmp/mysql.sock", database = 'test' )
        cursor = conn.cursor()
        cursor.execute("""INSERT INTO splitted_sentences(sentence_id, splitted_sentences) VALUES(%s, %s)""",(cursor.lastrowid,(tokenized_docs_no_punctuation[i])))
        conn.commit()
        conn.close()
if __name__ == '__main__':
    connect()
After writing the above code, the result in the database looks like this:
| 2 | S | N |
| 3 | S | o |
| 4 | S |   |
| 5 | S | d |
| 6 | S | o |
| 7 | S | u |
| 8 | S | b |
| 9 | S | t |
| 10 | S |   |
| 11 | S | m |
| 12 | S | y |
| 13 | S |   |
| 14 | S | d |
It should be like:
1 | S | No doubt, my dear friend.
2 | S | no doubt.
I suggest making the following edits (use whichever you like); this is what I used to get your code running. Your issue is that review, in "for review in tokenized_docs:", is already a string, so each token in "for token in review:" is a single character. To fix this, I tried:
tokenized_docs = ['"No doubt, my dear friend, no doubt; but in the meanwhile suppose we talk of this annuity.',
                  'Shall we say one thousand francs a year."',
                  '"What!"',
                  'asked Bonelle, looking at him very fixedly.',
                  '"My dear friend, I mistook; I meant two thousand francs per annum," hurriedly rejoined Ramin.',
                  'Monsieur Bonelle closed his eyes, and appeared to fall into a gentle slumber.',
                  'The mercer coughed;\nthe sick man never moved.',
                  '"Monsieur Bonelle."']
'''Removing punctuation marks.'''
regex = re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = []
for review in tokenized_docs:
    new_token = regex.sub(u'', review)
    if not new_token == u'':
        tokenized_docs_no_punctuation.append(new_token)
print(tokenized_docs_no_punctuation)
and got this:
['No doubt my dear friend no doubt but in the meanwhile suppose we talk of this annuity',
 'Shall we say one thousand francs a year',
 'What',
 'asked Bonelle looking at him very fixedly',
 'My dear friend I mistook I meant two thousand francs per annum hurriedly rejoined Ramin',
 'Monsieur Bonelle closed his eyes and appeared to fall into a gentle slumber',
 'The mercer coughed\nthe sick man never moved',
 'Monsieur Bonelle']
The final format of the output is up to you. I prefer using lists, but you could concatenate this into a string as well.
nw = []
for review in tokenized_docs[0]:
    new_review = ''
    for token in review:
        new_token = regex.sub(u'', token)
        if not new_token == u'':
            new_review += new_token
    nw.append(new_review)
'''Inserting into database'''
def connect():
    for j in nw:
        conn = mysql.connector.connect(user = 'root', password = '', unix_socket = "/tmp/mysql.sock", database = 'Thesis' )
        cursor = conn.cursor()
        cursor.execute("""INSERT INTO splitted_sentences(sentence_id, splitted_sentences) VALUES(%s, %s)""",(cursor.lastrowid,j))
        conn.commit()
        conn.close()
if __name__ == '__main__':
    connect()
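The root cause named in the answers (iterating over a string yields characters) can be seen in isolation. This is a small sketch using only re and string. The sample sentences are shortened from the answer's list, and NLTK is left out on purpose.

```python
import re
import string

sentences = ['"No doubt, my dear friend, no doubt."', '"What!"']
regex = re.compile('[%s]' % re.escape(string.punctuation))

# Iterating over one sentence walks through its characters, not its words.
first_tokens = [token for token in sentences[0]][:3]

# Applying the regex to each whole sentence keeps one entry per sentence.
cleaned = [regex.sub('', sentence) for sentence in sentences]

print(first_tokens)
print(cleaned)
```

Building a list with one cleaned string per sentence is what lets the later insert loop write one row per sentence instead of one row per character.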
Is there an easy way to add permissions to a user or a group in Django?
I'm currently adding permissions to users and groups like this:
permissions = list(
    Permission.objects.filter(
        Q(codename='add_server', content_type=ContentType.objects.get(app_label='bildverteiler', name='server')) |
        Q(codename='change_server', content_type=ContentType.objects.get(app_label='bildverteiler', name='server')) |
        Q(codename='delete_server', content_type=ContentType.objects.get(app_label='bildverteiler', name='server')) |
        Q(codename='change_group', content_type=ContentType.objects.get(app_label='bildverteiler', name='group')) |
        Q(codename='add_group', content_type=ContentType.objects.get(app_label='bildverteiler', name='group')) |
        Q(codename='delete_group', content_type=ContentType.objects.get(app_label='bildverteiler', name='group')) |
        Q(codename='change_user', content_type=ContentType.objects.get(app_label='auth', name='user')) |
        Q(codename='add_user', content_type=ContentType.objects.get(app_label='auth', name='user')) |
        Q(codename='delete_user', content_type=ContentType.objects.get(app_label='auth', name='user'))
    )
)
some_user_obj.user_permissions = permissions
Is there a better way? Maybe without the query?
This solution has fewer characters:
query = Q()
for app_label, name in [
        ('bildverteiler', 'server'),
        ('bildverteiler', 'group'),
        ('auth', 'user')]:
    # OR together one condition per model, matching all three codenames at once
    query |= Q(content_type__app_label=app_label,
               content_type__name=name,
               codename__in=['%s_%s' % (perm_name, name)
                             for perm_name in ['change', 'add', 'delete']])
Permission.objects.filter(query)
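To see which codenames the comprehension in the answer generates, it can be run on its own without Django. This sketch only builds the (app_label, model name, codenames) tuples. It does not touch Permission or Q, and the variable names targets, actions and expected are illustrative.

```python
targets = [('bildverteiler', 'server'), ('bildverteiler', 'group'), ('auth', 'user')]
actions = ['change', 'add', 'delete']

# One codename per action and model, following the '<action>_<model>' pattern
# used by the codenames in the question.
expected = [
    (app_label, name, ['%s_%s' % (action, name) for action in actions])
    for app_label, name in targets
]

for app_label, name, codenames in expected:
    print(app_label, name, codenames)
```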