I created some mapped objects using the declarative style in SQLAlchemy. I have a mapping called ThermafuserReading which has a composite primary key made up of the Time_stamp column (a DateTime) and the ThermafuserId column (an Integer that also acts as a foreign key to another table called Thermafuser). This is the definition of the class:
class ThermafuserReading(Base):
    """Class to map to the Thermafuser Readings table in the HVAC DB"""

    __tablename__ = 'Thermafuser_Reading'

    _timestamp = Column('Time_stamp', DateTime, primary_key=True)
    _thermafuserId = Column('ThermafuserId', Integer, ForeignKey("Thermafuser.ThermafuserId"), primary_key=True)
    _roomOccupied = Column('RoomOccupied', Boolean)
    _zoneTemperature = Column('ZoneTemperature', Float)
    _supplyAir = Column('SupplyAir', Float, nullable=True)
    _airflowFeedback = Column('AirflowFeedback', Float, nullable=True)
    _CO2Input = Column('CO2Input', Float, nullable=True)
    _maxAirflow = Column('MaxAirflow', Float, nullable=True)
    _minAirflow = Column('MinAirflow', Float, nullable=True)
    _unoccupiedHeatingSetpoint = Column('UnoccupiedHeatingSetpoint', Float, nullable=True)
    _unoccupiedCoolingSetpoint = Column('UnoccupiedCoolingSetpoint', Float, nullable=True)
    _occupiedCoolingSetpoint = Column('OccupiedCoolingSetpoint', Float, nullable=True)
    _occupiedHeatingSetpoint = Column('OccupiedHeatingSetpoint', Float, nullable=True)
    _terminalLoad = Column('TerminalLoad', Float, nullable=True)

    #Relationship between Thermafuser Reading and Thermafuser
    _thermafuser = relationship("Thermafuser", back_populates="_thermafuserReadings", cascade="all, delete-orphan", single_parent=True)
I am creating a session in the following way:
sqlengine = sqlalchemy.create_engine("mysql+mysqldb://user:password@localhost:3306/HVAC")
Session = sessionmaker(bind=sqlengine)
session = Session()
At some point in my code I am creating a list called readings of Thermafuser readings and adding that list to the session via session.add_all(readings).
These are some example elements printed from the list readings:
<ThermafuserReading(thermafuserId = '21', timestamp = '2016-12-31 23:30:00')>
<ThermafuserReading(thermafuserId = '21', timestamp = '2016-12-31 23:35:00')>
<ThermafuserReading(thermafuserId = '21', timestamp = '2016-12-31 23:40:00')>
<ThermafuserReading(thermafuserId = '21', timestamp = '2016-12-31 23:45:00')>
<ThermafuserReading(thermafuserId = '21', timestamp = '2016-12-31 23:50:00')>
<ThermafuserReading(thermafuserId = '21', timestamp = '2016-12-31 23:55:00')>
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-01 00:00:00')>
The problem is that the session is only keeping the last item in this list, even though I did session.add_all(readings). This is what the session has inside:
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-01 00:00:00')>
I know the session keeps track of objects that have the same primary key and thus inserts only one instance of such objects into the session, but in this case the primary key (thermafuserId, timestamp) is different for each instance. I don't know why the session is only adding the last element of my list while discarding the other elements.
Any idea?
EDIT:
I kept doing some tests and found out the reason why only the last element of the list is being added to the session. The problem lies in the identity_key for each of the objects in my list readings. This is the code I used for my tests:
for reading in readings:
    print(reading, mapper.identity_key_from_instance(reading))
and these are some of the results:
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-14 23:15:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-14 23:20:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-14 23:25:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-14 23:30:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-14 23:35:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-14 23:40:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-14 23:45:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-14 23:50:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-14 23:55:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
<ThermafuserReading(thermafuserId = '21', timestamp = '2017-01-15 00:00:00')> (<class 'hvacDBMapping.ThermafuserReading'>, (datetime.datetime(2017, 1, 15, 0, 0), 21))
As you can observe, sqlalchemy.orm.util.identity_key_from_instance() is producing the same identity key for every instance, instead of one based on each object's timestamp.
Can somebody help me clarify why?
EDIT
This is simplified code that illustrates the problem; there is no database connection in this code. The code where the problem first appeared is much more involved, and posting it would only create confusion, but this snippet reproduces the error.
Session = sessionmaker()
session = Session()

mapper = inspect(ThermafuserReading)

#Open the csv file
csvFilePath = "/Users/davidlaredorazo/Box Sync/Data/Zone4/1C1A/1C1A 2016-12-31.csv"

with open(csvFilePath, 'r') as csvfile:
    reader = csv.reader(csvfile)

    componentId = 1
    count = 0

    reading = ThermafuserReading(None, componentId)

    for row in reader:
        #Skip the header row
        if count == 0:
            count += 1
            continue

        #print(row)
        timestamp = parse(row[0], None, ignoretz=True)

        reading.timestamp = timestamp
        new_object = copy.copy(reading)
        new_object.timestamp = timestamp

        print(new_object, mapper.identity_key_from_instance(new_object))
        session.add(new_object)

print("new elements")
for new in session.new:
    print(new, mapper.identity_key_from_instance(new))
As univerio mentioned in the comments, what I was doing wrong was using copy.copy to copy instances of my mapped objects; this was messing with _sa_instance_state. The solution was to write an ad hoc copy function for my instances. Here is the copy function I used, which indeed solved the problem:
def copy_sqla_object(obj, omit_fk=True):
    """Given an SQLAlchemy object, creates a new object (FOR WHICH THE OBJECT
    MUST SUPPORT CREATION USING __init__() WITH NO PARAMETERS), and copies
    across all attributes, omitting PKs, FKs (by default), and relationship
    attributes."""
    cls = type(obj)
    mapper = class_mapper(cls)
    newobj = cls()  # not: cls.__new__(cls)
    pk_keys = set([c.key for c in mapper.primary_key])
    rel_keys = set([c.key for c in mapper.relationships])
    prohibited = pk_keys | rel_keys
    if omit_fk:
        fk_keys = set([c.key for c in mapper.columns if c.foreign_keys])
        prohibited = prohibited | fk_keys
    for k in [p.key for p in mapper.iterate_properties if p.key not in prohibited]:
        try:
            value = getattr(obj, k)
            setattr(newobj, k, value)
        except AttributeError:
            pass
    return newobj
You can see a more detailed discussion of this issue in
https://groups.google.com/forum/#!topic/sqlalchemy/HVSxndh23m0
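To see why copy.copy bites here, consider a stdlib-only sketch of the shallow-copy pitfall. Reading and _state below are stand-ins for the mapped class and SQLAlchemy's _sa_instance_state, not real SQLAlchemy names:

```python
import copy

class Reading:
    """Stand-in for a mapped object whose ORM bookkeeping lives in a mutable attribute."""
    def __init__(self):
        self._state = {"identity": None}  # plays the role of _sa_instance_state

original = Reading()
clone = copy.copy(original)  # shallow copy: attribute values are shared, not duplicated

clone._state["identity"] = ("2017-01-15", 21)

# The mutation made through the clone is visible through the original,
# because both objects reference the very same state dict.
print(original._state is clone._state)  # True
print(original._state["identity"])      # ('2017-01-15', 21)
```

Every copy.copy of the same instance therefore reports the same bookkeeping state, which is consistent with the session collapsing all the readings into one.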
I'm trying to avoid the N+1 issue as described in http://docs.peewee-orm.com/en/latest/peewee/relationships.html#avoiding-the-n-1-problem, but additional queries are still executed.
My model:
from peewee import (SqliteDatabase, Model, BigAutoField, CharField, ForeignKeyField)

db = SqliteDatabase(':memory:')

class TestModel(Model):
    class Meta:
        database = db
        legacy_table_names = False

class TestUser(TestModel):
    id = BigAutoField(primary_key=True)
    name = CharField()

class Book(TestModel):
    id = BigAutoField(primary_key=True)
    name = CharField()
    user = ForeignKeyField(TestUser, backref='books')

class Movie(TestModel):
    id = BigAutoField(primary_key=True)
    name = CharField()
    user = ForeignKeyField(TestUser, backref='movies')

class Tape(TestModel):
    id = BigAutoField(primary_key=True)
    name = CharField()
    user = ForeignKeyField(TestUser, backref='tapes')
And my test:
from peewee import JOIN
from playhouse.shortcuts import model_to_dict
from playhouse.test_utils import count_queries

from test_n_plus_1l import *

def test_should_avoid_n_plus_one_problem():
    db.create_tables([TestUser, Book, Movie, Tape])
    tu = TestUser.create(name='Test')
    Book.create(name='Book1', user_id=tu.id)
    Movie.create(name='Movie1', user_id=tu.id)
    Tape.create(name='Tape1', user_id=tu.id)
    with count_queries() as counter:
        tu = TestUser.select(TestUser, Book, Movie, Tape) \
            .join_from(TestUser, Book, JOIN.LEFT_OUTER) \
            .join_from(TestUser, Movie, JOIN.LEFT_OUTER) \
            .join_from(TestUser, Tape, JOIN.LEFT_OUTER) \
            .where(TestUser.id == tu.id).get()
        model_to_dict(tu, backrefs=True, manytomany=True, max_depth=4)
    assert counter.count == 1
And after running it I'm getting an assertion error:
E assert 4 == 1
Peewee prints the executed SQL, so I can clearly see that the joins were executed, but why does peewee execute additional queries?
('SELECT "t1"."id", "t1"."name", "t2"."id", "t2"."name", "t2"."user_id", "t3"."id", "t3"."name", "t3"."user_id", "t4"."id", "t4"."name", "t4"."user_id" FROM "test_user" AS "t1" LEFT OUTER JOIN "book" AS "t2" ON ("t2"."user_id" = "t1"."id") LEFT OUTER JOIN "movie" AS "t3" ON ("t3"."user_id" = "t1"."id") LEFT OUTER JOIN "tape" AS "t4" ON ("t4"."user_id" = "t1"."id") WHERE ("t1"."id" = ?) LIMIT ? OFFSET ?', [1, 1, 0])
('SELECT "t1"."id", "t1"."name", "t1"."user_id" FROM "book" AS "t1" WHERE ("t1"."user_id" = ?)', [1])
('SELECT "t1"."id", "t1"."name", "t1"."user_id" FROM "movie" AS "t1" WHERE ("t1"."user_id" = ?)', [1])
('SELECT "t1"."id", "t1"."name", "t1"."user_id" FROM "tape" AS "t1" WHERE ("t1"."user_id" = ?)', [1])
The problem is that each TestUser may have any number of associated Books, Movies, or Tapes. So this is an instance where you may benefit from using the prefetch() helper instead, because of the direction of the foreign keys. That said, you're best off profiling, as prefetch() may not be any faster than just doing the queries.
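Since the advice is to profile, a minimal stdlib timing helper could look like the sketch below; run_join_query and run_prefetch_query are hypothetical placeholders for the two strategies, not peewee functions:

```python
import time

def best_of(fn, repeat=5):
    """Return the best wall-clock time of `repeat` runs of fn.
    Crude, but good enough to compare two query strategies."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

# Hypothetical usage against the two strategies discussed here:
# print('joins:   ', best_of(lambda: run_join_query()))
# print('prefetch:', best_of(lambda: run_prefetch_query()))
```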
Here is an example of using prefetch(). Note how each user has multiple Tweets and Notes, and each Tweet has multiple Flags -- but we only execute a total of 4 queries:
class User(Base):
    name = TextField()

class Tweet(Base):
    user = ForeignKeyField(User, backref='tweets')
    content = TextField()

class TweetFlag(Base):
    tweet = ForeignKeyField(Tweet, backref='tweet_flags')
    flag_type = IntegerField()

class Note(Base):
    user = ForeignKeyField(User, backref='notes')
    content = TextField()

db.create_tables([User, Tweet, TweetFlag, Note])

for i in range(10):
    user = User.create(name='user-%s' % i)
    for t in range(10):
        tweet = Tweet.create(user=user, content='%s/tweet-%s' % (user.name, t))
        for f in range(3):
            TweetFlag.create(tweet=tweet, flag_type=f)
    for n in range(5):
        Note.create(user=user, content='%s/note-%s' % (user.name, n))

from playhouse.shortcuts import model_to_dict
from playhouse.test_utils import count_queries

with count_queries() as counter:
    q = User.select().order_by(User.name)
    p = prefetch(q, Tweet, TweetFlag, Note)
    accum = []
    for res in p:
        accum.append(model_to_dict(res, backrefs=True))

print(counter.count)
print(accum[0])
Output:
4
{'id': 1,
'name': 'user-0',
'notes': [{'content': 'user-0/note-0', 'id': 1},
{'content': 'user-0/note-1', 'id': 2},
{'content': 'user-0/note-2', 'id': 3},
{'content': 'user-0/note-3', 'id': 4},
{'content': 'user-0/note-4', 'id': 5}],
'tweets': [{'content': 'user-0/tweet-0',
'id': 1,
'tweet_flags': [{'flag_type': 0, 'id': 1},
{'flag_type': 1, 'id': 2},
{'flag_type': 2, 'id': 3}]},
...
I am creating a Jarvis-style screen and have pulled data from Outlook for upcoming meetings that I wish to present on the screen.
The function pulls data from Outlook and presents it in a list:
event(Start=datetime.datetime(2020, 11, 30, 12, 30), Subject='meeting 1 description', Duration=60)
event(Start=datetime.datetime(2020, 11, 30, 14, 0), Subject='meeting 2 description', Duration=60)
event(Start=datetime.datetime(2020, 12, 1, 8, 30), Subject='meeting 3 description', Duration=60)
event(Start=datetime.datetime(2020, 12, 1, 10, 15), Subject='meeting 4 description', Duration=45)
event(Start=datetime.datetime(2020, 12, 1, 11, 0), Subject='meeting 5 description', Duration=90)
This is great, but what I want to do now is have this presented as:
Start time = 'start time'
Subject = 'Meeting description'
Duration = 'duration of meeting'
Is there a way of slicing up the string in a list item and then pulling that into the code as I want it presented? Basically splitting the item in a list into component parts?
Here is the code that pulls the list:
def get_date(datestr):
    try:  # py3
        adate = datetime.datetime.fromtimestamp(datestr.Start.timestamp())
    except Exception:
        adate = datetime.datetime.fromtimestamp(int(datestr.Start))
    return adate

def getCalendarEntries(days=3, dateformat="%d/%m/%Y"):
    Outlook = win32com.client.Dispatch("Outlook.Application")
    ns = Outlook.GetNamespace("MAPI")
    appointments = ns.GetDefaultFolder(9).Items
    appointments.Sort("[Start]")
    appointments.IncludeRecurrences = "True"
    today = datetime.datetime.today()
    begin = today.date().strftime(dateformat)
    tomorrow = datetime.timedelta(days=days) + today
    end = tomorrow.date().strftime(dateformat)
    appointments = appointments.Restrict(
        "[Start] >= '" + begin + "' AND [END] <= '" + end + "'")
    events = []
    for a in appointments:
        adate = get_date(a)
        events.append(event(adate, a.Subject, a.Duration))
    return events

if __name__ == "__main__":
    events = getCalendarEntries()
Thanks all,
Graeme
This may be a bit hacky, but the syntax for event in your string is the same as how one would define a dictionary. So we can replace 'event' with 'dict' and call eval, which evaluates a string as if it were Python code. For example, if you run this
import datetime
event_str = r"event(Start=datetime.datetime(2020, 11, 30, 12, 30), Subject='meeting 1 description', Duration=60)"
dict_str = event_str.replace('event','dict')
my_dict = eval(dict_str)
print(my_dict)
this will print
{'Start': datetime.datetime(2020, 11, 30, 12, 30), 'Subject': 'meeting 1 description', 'Duration': 60}
So my_dict will be a dictionary you can pull various bits out of; for example, my_dict['Start'] will give you the start (as a datetime), etc.
You would need to apply this construct to each element of your events list; e.g. the following should create a list of dictionaries, one for each event:
all_dicts = [eval(e.replace('event','dict')) for e in events]
Of course, you can save yourself all this trouble if you create dictionaries in the first place, so replace the relevant line in your loop with
events.append(dict(Start=adate, Subject=a.Subject, Duration=a.Duration))
and then use dict functionality to get the fields via events[i]['Start'], etc.
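Once the events are dicts, formatting them the way the question asks for is one line per field. The sketch below uses a hypothetical in-memory event rather than a live Outlook call:

```python
import datetime

# A stand-in event dict, shaped like the ones the loop above builds
events = [
    {'Start': datetime.datetime(2020, 11, 30, 12, 30),
     'Subject': 'meeting 1 description',
     'Duration': 60},
]

for e in events:
    print("Start time =", e['Start'].strftime("%d/%m/%Y %H:%M"))
    print("Subject =", e['Subject'])
    print("Duration =", e['Duration'], "minutes")
```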
So this is what I came up with after your suggestion - worked like a treat - thank you so much :) You are a super star!
import win32com.client
import datetime
from collections import namedtuple

event = namedtuple("event", "Start Subject Duration")

def get_date(datestr):
    try:  # py3
        adate = datetime.datetime.fromtimestamp(datestr.Start.timestamp())
    except Exception:
        adate = datetime.datetime.fromtimestamp(int(datestr.Start))
    return adate

def getCalendarEntries(days=3, dateformat="%d/%m/%Y"):
    Outlook = win32com.client.Dispatch("Outlook.Application")
    ns = Outlook.GetNamespace("MAPI")
    appointments = ns.GetDefaultFolder(9).Items
    appointments.Sort("[Start]")
    appointments.IncludeRecurrences = "True"
    today = datetime.datetime.today()
    begin = today.date().strftime(dateformat)
    tomorrow = datetime.timedelta(days=days) + today
    end = tomorrow.date().strftime(dateformat)
    appointments = appointments.Restrict(
        "[Start] >= '" + begin + "' AND [END] <= '" + end + "'")
    events = []
    for a in appointments:
        adate = get_date(a)
        events.append(dict(Start=adate, Subject=a.Subject, Duration=a.Duration))
    return events

if __name__ == "__main__":
    events = getCalendarEntries()
    print("Time:", events[1]['Start'])
    print("Subject:", events[1]['Subject'])
    print("Duration:", events[1]['Duration'])
I am trying to convert the rows returned in a SQLAlchemy query to dictionaries. When I try to use the ._asdict() method, I am only getting a key-value pair for the first column in my results.
Is there something else I should do to create a key-value pair in the dictionary for all columns in the result row?
class Project(db.Model):
    __tablename__ = 'entries'
    id = db.Column(db.Integer, primary_key=True)
    time_start = db.Column(db.DateTime(timezone=False))
    time_end = db.Column(db.DateTime(timezone=False))
    name = db.Column(db.String(256), nullable=True)
    analyst = db.Column(db.String(256), nullable=True)

    def __init__(self, id, time_start, time_end, name, analyst):
        self.id = id
        self.time_start = time_start
        self.time_end = time_end
        self.name = name
        self.analyst = analyst
latest_projects = db.session.query((func.max(Project.time_end)), Project.analyst).group_by(Project.analyst)

for row in latest_projects.all():
    print(row._asdict())
{'analyst': 'Bob'}
{'analyst': 'Jane'}
{'analyst': 'Fred'}
I was expecting to see results like this...
{'analyst': 'Bob', 'time_end': '(2018, 11, 21, 14, 55)'}
{'analyst': 'Jane', 'time_end': '(2017, 10, 21, 08, 00)'}
{'analyst': 'Fred', 'time_end': '(2016, 09, 06, 01, 35)'}
You haven't named the func.max() column, so there is no name to use as a key in the resulting dictionary. Aggregate function columns are not automatically named, even when aggregating a single column; that you based that column on the time_end column doesn't matter here.
Give that column a label:
latest_projects = db.session.query(
    func.max(Project.time_end).label('time_end'),
    Project.analyst
).group_by(Project.analyst)
Demo:
>>> latest_projects = db.session.query(
...     func.max(Project.time_end).label('time_end'),
...     Project.analyst
... ).group_by(Project.analyst)
>>> for row in latest_projects.all():
...     print(row._asdict())
...
{'time_end': datetime.datetime(2018, 11, 21, 14, 55), 'analyst': 'Bob'}
{'time_end': datetime.datetime(2016, 9, 6, 1, 35), 'analyst': 'Fred'}
{'time_end': datetime.datetime(2017, 10, 21, 8, 0), 'analyst': 'Jane'}
So I have a little program which makes a DB call and then converts the result into a PDF.
I have most of these reports working, but this last one is returning a KeyError on me and I cannot figure out why.
Here is an example of the data being returned by the DB:
((None, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 26, 0, 26), (1, 1, 0, 0, 0, 17, 0, 18), (2, 0, 0, 0, 0, 15, 0, 16))
The Traceback:
Traceback (most recent call last):
File "C:\Users\Ace\AppData\Local\Programs\Python\Python36\lib\tkinter\__init__.py", line 1699, in __call__
return self.func(*args)
File "C:/Users/Ace/Desktop/IPNV/KP_App/FML/firstapp.py", line 232, in hrday_in
hourday_filter(noodle, dest, drange)
File "C:\Users\Ace\Desktop\IPNV\KP_App\FML\dataIN.py", line 187, in hourday_filter
doc.export(pths, drange)
File "C:\Users\Ace\Desktop\IPNV\KP_App\FML\calazan.py", line 58, in export
reverse=reverse_order)
KeyError: 'h'
I'm not even sure where the 'h' comes from.
Here is the function that I run the data through to prepare it for PDF generation:
def hourday_filter(tuna, pth, drange):
    data = []
    for hr, number, local, chicken, alligator, ace, lola, chunk in tuna:
        data.append({'hour': hr,
                     'number': number,
                     'local': local,
                     'long': chicken,
                     'inter': alligator,
                     'income': ace,
                     'tandem': lola,
                     'total': chunk})
    fields = (
        ('hour', 'Hour of Day'),
        ('number', 'Internal Calls '),
        ('local', 'Local Calls'),
        ('long', 'Long Distance Calls'),
        ('inter', 'International Calls '),
        ('income', 'Incoming Calls'),
        ('tandem', 'Tandem Calls'),
        ('total', 'Total Calls'),
    )
    pths = pth + '/HourofDay.pdf'
    doc = DataToPdf(fields, data, sort_by='hr',
                    title='Hour of Day Report')
    doc.export(pths, drange)
And from there the data is passed to this function to actually convert it to PDF.
class DataToPdf:
    """
    Export a list of dictionaries to a table in a PDF file.
    """
    def __init__(self, fields, data, sort_by=None, title=None):
        """
        Arguments:
            fields - A tuple of tuples ((fieldname/key, display_name))
                specifying the fieldname/key and corresponding display
                name for the table header.
            data - The data to insert to the table formatted as a list of
                dictionaries.
            sort_by - A tuple (sort_key, sort_order) specifying which field
                to sort by and the sort order ('ASC', 'DESC').
            title - The title to display at the beginning of the document.
        """
        self.fields = fields
        self.data = data
        self.title = title
        self.sort_by = sort_by

    def export(self, filename, drange, data_align='LEFT', table_halign='LEFT'):
        doc = SimpleDocTemplate(filename, pagesize=letter)
        styles = getSampleStyleSheet()
        styleH = styles['Heading1']
        styleD = styles['Heading4']
        date = time.strftime("%m/%d/%Y")
        date2 = 'Ran on: ' + date
        date3 = ' For the period ' + str(drange[0]) + ' to ' + str(drange[1])  # Edit here to display report date range
        story = []
        if self.title:
            story.append(Paragraph(self.title, styleH))
            story.append(Spacer(1, 0.25 * inch))
            story.append(Paragraph(date2, styleD))
            story.append(Spacer(1, 0.015 * inch))
            story.append(Paragraph(date3, styleD))
        if self.sort_by:
            reverse_order = False
            if str(self.sort_by[1]).upper() == 'DESC':
                reverse_order = True
            self.data = sorted(self.data,
                               key=itemgetter(self.sort_by[0]),
                               reverse=reverse_order)
        converted_data = self.__convert_data()
        table = Table(converted_data, hAlign=table_halign)
        table.setStyle(TableStyle([
            ('FONT', (0, 0), (-1, 0), 'Helvetica-Bold'),
            ('ALIGN', (0, 0), (-1, 0), 'CENTER'),
            ('ALIGN', (0, 0), (0, -1), data_align),
            ('INNERGRID', (0, 0), (-1, -1), 0.50, colors.black),
            ('BOX', (0, 0), (-1, -1), 0.25, colors.black),
        ]))
        data_len = len(converted_data)
        for each in range(data_len):
            if each % 2 == 0:
                bg_color = colors.whitesmoke
            else:
                bg_color = colors.lightgrey
            table.setStyle(TableStyle([('BACKGROUND', (0, each), (-1, each), bg_color)]))
        story.append(table)
        doc.build(story)

    def __convert_data(self):
        """
        Convert the list of dictionaries to a list of lists to create
        the PDF table.
        """
        # Create 2 separate lists in the same order: one for the
        # list of keys and the other for the names to display in the
        # table header.
        keys, names = zip(*[[k, n] for k, n in self.fields])
        new_data = [names]
        for d in self.data:
            new_data.append([d[k] for k in keys])
        return new_data
Is it possible that the first result from the DB (the null one) is causing this? I've made about a dozen of these reports now with no problems; I'm not sure where I am messing up here.
So thanks to FamousJameous I realized that it was indeed the first row killing my filter: the sort_by call did not know how to deal with the NULL value. I managed to fix it by removing that first row from the tuple.
The database returned the results into the variable result
From there:
new_result = result[1:]
This line removes the NULL row and stops the error.
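Slicing with result[1:] assumes the NULL row is always first. A slightly more defensive variant, sketched here against the sample tuple from the question, filters on the key column instead:

```python
# Sample result tuple from the question; the first row has a NULL hour
result = ((None, 0, 0, 0, 0, 0, 0, 0),
          (0, 0, 0, 0, 0, 26, 0, 26),
          (1, 1, 0, 0, 0, 17, 0, 18),
          (2, 0, 0, 0, 0, 15, 0, 16))

# Drop every row whose hour column is NULL, wherever it appears
new_result = tuple(row for row in result if row[0] is not None)

print(len(new_result))  # 3
```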
I'm trying to read from a CSV file and create an object in Django (1.9, Py 3.5), but I'm getting this error no matter what I change the field to:
invalid literal for int() with base 10: ''
And the line is:
other = row['Other']
site = Site.objects.create(
    consolidated_financials=row['Consolidated financials'],
    type=Type.objects.get_or_create(name=row['Type'])[0],
    tier1_business=Business.objects.get_or_create(tier=1, name=row['Tier-1 business'])[0],
    tier2_business=Business.objects.get_or_create(tier=2, name=row['Tier-2 business'])[0],
    tier3_business=Business.objects.get_or_create(tier=3, name=row['Tier-3 business'])[0],
    site_name=row['Site Name'],
    site_id=row['Site ID'],
    region=Region.objects.get_or_create(name=row['Region'])[0],
    country=Country.objects.get_or_create(name=row['Country'], region=Region.objects.get_or_create(name=row['Region'])[0])[0],
    city=City.objects.get_or_create(name=row['City'], country=Country.objects.get_or_create(name=row['Country'], region=Region.objects.get_or_create(name=row['Region'])[0])[0])[0],
    site_type=SiteType.objects.get_or_create(name=row['Type of site?'])[0],
    remote_site=row['Remote site?'],
    finance_manager_name=row['Finance Manager Name'],
    finance_manager_sso=row['Finance Manager SSO'],
    quarter=row['Quarter'],
    revenue=row['Revenue'],
    supply_chain_manager_name=row['Supply Chain Manager Name'],
    supply_chain_manager_sso=row['Supply Chain Manager SSO'],
    product_lines=row['Product Lines'],
    manufacturing_processes=row['Manufacturing Processes'],
    factory_utilization=row['Factory Utilization'],
    fte=row['FTE'],
    hourly=row['Hourly'],
    salaried=row['Salaried'],
    other=row['Other']
)
The Site model:
class Site(models.Model):
    """
    Model for a site entry
    #author: Leonardo Pessoa
    #since: 05/09/2016
    """
    from decimal import Decimal

    consolidated_financials = models.BooleanField()
    type = models.ForeignKey(Type)
    tier1_business = models.ForeignKey(Business, limit_choices_to={'tier': 1}, related_name='%(class)s_tier1')
    tier2_business = models.ForeignKey(Business, limit_choices_to={'tier': 2}, related_name='%(class)s_tier2')
    tier3_business = models.ForeignKey(Business, limit_choices_to={'tier': 3}, related_name='%(class)s_tier3')
    site_name = models.CharField(max_length=150, unique=True)
    site_id = models.IntegerField()
    region = models.ForeignKey(Region)
    country = models.ForeignKey(Country)
    city = models.ForeignKey(City)
    site_type = models.ForeignKey(SiteType)
    remote_site = models.BooleanField()
    finance_manager_name = models.CharField(max_length=50)
    finance_manager_sso = models.IntegerField()
    quarter = models.DecimalField(max_digits=12, decimal_places=2, default=Decimal('0.0'))
    revenue = models.DecimalField(max_digits=12, decimal_places=2, default=Decimal('0.0'))
    supply_chain_manager_name = models.CharField(max_length=50, default='')
    supply_chain_manager_sso = models.IntegerField(default=0)
    product_lines = models.CharField(max_length=100, default='')
    manufacturing_processes = models.TextField(max_length=500, default='')
    factory_utilization = models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0'))
    fte = models.IntegerField()
    hourly = models.IntegerField()
    salaried = models.IntegerField()
    other = models.TextField(max_length=500, default='')
    ges_id = models.CharField(max_length=20)
    latitude = models.DecimalField(max_digits=10, decimal_places=7, default=Decimal('0.0'))
    longitude = models.DecimalField(max_digits=10, decimal_places=7, default=Decimal('0.0'))
The row:
row
{'City': 'xxxxxxx',
'Consolidated financials': 'True',
'Country': 'Argentina (AR)',
'FTE': '',
'Factory Utilization': '',
'Finance Manager Name': '',
'Finance Manager SSO': '',
'Hourly': '',
'Manufacturing Processes': '',
'Other': '',
'Product Lines': '',
'Quarter': '',
'Region': 'Latin America',
'Remote site?': 'True',
'Revenue': '',
'Salaried': '',
'Site ID': '12312',
'Site Name': 'xxxxxxxxx',
'Supply Chain Manager Name': '',
'Supply Chain Manager SSO': '',
'Tier-1 business': 'xxxxxxxxxx',
'Tier-2 business': 'xxxxxxxxxxxxx',
'Tier-3 business': 'Latin America',
'Type': 'xxxxxx xxxxx',
'Type of site?': 'Other'}
I know the code has a lot of room for performance optimization but I just want to prove the functionality first.
Thanks!
The problem is that your Site model is expecting other to be an int (does the model have other = IntegerField or similar?), and you're providing an empty string. The easiest fix is to change row['Other'] to row['Other'] or 0
If you know that you're going to get non-numeric values as a general rule, then you could add basic logic to test for non-digits, or update your IntegerField to something which can accept text. A list of valid Django fields can be found here.
# An example of conditional logic to test for a non-number and use 0 if so
other = row['Other'] if row['Other'] and row['Other'].isdigit() else 0
Edit
Looking at your model, the issue is probably not with the Other field, but there are typing problems nevertheless. For example, Supply Chain Manager SSO is supposed to be an int, but you are definitely passing ''.
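When many integer columns can arrive empty from the CSV, a small coercion helper keeps the import readable. int_or_default below is a hypothetical name, not a Django utility:

```python
def int_or_default(value, default=0):
    """Convert a CSV cell to int, falling back to `default` for
    empty, missing, or non-numeric cells."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

# Sample cells shaped like the row in the question:
print(int_or_default(''))       # 0
print(int_or_default('12312'))  # 12312
```

You would then pass e.g. finance_manager_sso=int_or_default(row['Finance Manager SSO']) for each integer field instead of the raw string.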