Need to create many python objects by iterating over an excel file - python

So I've created a class...
class Dept_member:
    quarterly_budget = 0
    outside_services = 0
    regular_count = 0
    contractor_count = 0
    gds_function = ''
    dept_name = ''
    def __init__(self, quarterly_budget, outside_services, dept_name):
        self.quarterly_budget = quarterly_budget
        self.outside_services = outside_services
        self.dept_name = dept_name
    def regular_cost(self):
        print "%s" % str((self.quarterly_budget - self.outside_services) / self.regular_count)
    def contractor_cost(self):
        print "%s" % str(self.outside_services / self.contractor_count)
Now I want to use variables I collect while iterating over an excel file to create objects for each row using the class detailed above.
for row in range(6,d_sh.get_highest_row()):
    if f_sh.cell(row=row, column=2).value:
        deptno = f_sh.cell(row=row, column=2).value
        q_budget = f_sh.cell(row=row, column=17).value #Q3 Actual
        os_budget = f_sh.cell(row=row, column=14).value
        deptnode = f_sh.cell(row=row, column=1).value
        chop = deptnode.split(" ")
        deptname = " ".join(chop[1:])
        Dept = "gds_"+str(deptno) ### This is what I want my new object to be called!
        Dept = Dept_member(q_budget, os_budget, deptname)
Below is some output from an IDLE interactive session after this runs.
>>>
>>> deptno
u'180024446'
>>> q_budget
59412.00978792818
>>> os_budget
9973.898858075034
>>> deptnode
u'M180024446 GDS Common HW FEP China'
>>> deptname
u'GDS Common HW FEP China'
>>> Dept
<__main__.Dept_member instance at 0x126c32050>
>>> Dept.quarterly_budget
59412.00978792818
What I really wanted was an object named gds_180024446 but instead it mutated the variable.
Is it possible to create a bunch of objects using variables in a loop?

You should probably use a Python dictionary (see the tutorial page describing dictionaries) instead of creating a bunch of variables with the eval function. Initialize Dept = {} before the loop, then store each object under its name:
Dept["gds_"+str(deptno)] = Dept_member(q_budget, os_budget, deptname)
After that, you can fetch your object from dictionary with:
Dept['gds_180024446']
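A minimal sketch of the dictionary approach, written for Python 3. The rows list and the simplified DeptMember class are hypothetical stand-ins for the spreadsheet cells and the class in the question:

```python
class DeptMember:
    """Simplified stand-in for the Dept_member class in the question."""
    def __init__(self, quarterly_budget, outside_services, dept_name):
        self.quarterly_budget = quarterly_budget
        self.outside_services = outside_services
        self.dept_name = dept_name

# Hypothetical rows: (department number, quarterly budget, outside services, name)
rows = [
    ("180024446", 59412.0, 9973.9, "GDS Common HW FEP China"),
    ("180024447", 1000.0, 250.0, "Example Dept"),
]

depts = {}
for deptno, q_budget, os_budget, deptname in rows:
    # The computed string becomes a dictionary key, not a variable name.
    depts["gds_" + str(deptno)] = DeptMember(q_budget, os_budget, deptname)

print(depts["gds_180024446"].dept_name)
```

Because the names live in one dictionary, you can also loop over depts.items() to process every department, which is awkward with separately named variables.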

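I think you need the exec function rather than eval: eval only evaluates expressions, so passing it an assignment raises a SyntaxError. Something like
exec("gds_" + str(deptno) + " = Dept_member(q_budget, os_budget, deptname)")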
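For reference, eval() accepts only expressions, so handing it an assignment raises SyntaxError. The short Python 3 sketch below (the Thing class is hypothetical) shows that behaviour and two ways to bind a computed name without building source code as a string:

```python
class Thing:
    """Hypothetical placeholder for the object being created."""
    def __init__(self, value):
        self.value = value

name = "gds_" + str(180024446)

# eval() compiles its argument as an expression; an assignment is rejected.
try:
    eval(name + " = Thing(1)")
    eval_failed = False
except SyntaxError:
    eval_failed = True

# A computed module-level name can be bound through globals(),
# but a plain dictionary keeps such names out of the namespace.
globals()[name] = Thing(1)
registry = {name: Thing(2)}

print(eval_failed)
print(registry[name].value)
```

The dictionary version is usually preferable: nothing is executed from a string, and the objects stay grouped in one place.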

Related

Passing a list within methods in a class

I have a problem with passing a variable from a method to method within a given class.
The code is this (I am practicing beginner):
class Calendar():
    def __init__(self,link):
        self.link = link
        self.request = requests.get(link)
        self.request.encoding='UTF-8'
        self.soup = BeautifulSoup(self.request.text,'lxml')
    def DaysMonth(self):
        Dates = []
        tds = self.soup.findAll('td', {'class':'action'})
        for td in tds:
            check = (td.findAll('a')[0].text)
            if "Víkendová odstávka" in check:
                date = td.findAll('span')[0].text
                Dates.append(date)
        return Dates
    def PrintCal(self):
        return ['Víkendová odstávka serverů nastane ' + date + '. den v měsíci.' for date in Dates]
    def main(self):
        PrintCal(DaysMonth())
I would like to pass the list Dates from the method DaysMonth to the method PrintCal. When I instantiate the class, i.e. cal = Calendar('link'), and run cal.PrintCal(), I get an error saying that the name Dates is not defined. If I run cal.DaysMonth(), the output is as expected.
What is the issue here? Thank you!
Dates is a local variable in the DaysMonth method, and is therefore not visible anywhere else. Fortunately, DaysMonth does return Dates, so it's easy to get the value you want. Simply add the following line to your PrintCal method (before the return statement):
Dates = self.DaysMonth()
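The scoping rule can be sketched without any scraping. In this Python 3 example the cells list is a hypothetical stand-in for the parsed table cells, and the method names are lower-cased for illustration:

```python
class Calendar:
    def __init__(self, cells):
        # `cells` is a hypothetical list of (label, day) pairs standing in
        # for the parsed <td> elements.
        self.cells = cells

    def days_month(self):
        dates = []  # local to this method; invisible to other methods
        for label, day in self.cells:
            if "Víkendová odstávka" in label:
                dates.append(day)
        return dates

    def print_cal(self):
        # Obtain the list through the method call, not through the local name.
        return ["Outage on day " + day for day in self.days_month()]

cal = Calendar([("Víkendová odstávka serverů", "12"), ("Jiná akce", "20")])
print(cal.print_cal())
```

Note the self. prefix on the call: inside a class, another method is reached through the instance, so a bare days_month() would raise NameError as well.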
You are trying to do too much in the Calendar object, particularly in the __init__ method. You don't want to scrape and parse a website at the moment the object is instantiated. I would use the Calendar object only to store and display the results of the scraping/parsing. If you need everything to be object-oriented, then create a separate Scraper/Parser class that handles that part of the logic.
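import requests
from bs4 import BeautifulSoup
class Calendar():
    def __init__(self, dates):
        self.dates = dates
    def display_dates(self):
        return ['Víkendová odstávka serverů nastane ' + date + '. den v měsíci.'
                for date in self.dates]
# Scrape and parse outside the class; `link` is the page URL from the question.
r = requests.get(link)
r.encoding = 'UTF-8'  # requests.get() takes no encoding argument; set it on the response
soup = BeautifulSoup(r.text,'lxml')
dates = []
for td in soup.findAll('td', {'class':'action'}):
    check = (td.findAll('a')[0].text)
    if "Víkendová odstávka" in check:
        dates.append(td.findAll('span')[0].text)
c = Calendar(dates=dates)
print(c.display_dates())  # call the method; without () this prints the bound method object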

Create multiple dataframes as properties of an instance of a class within if loop

I have a class, myClass, that I wish to add several dataframes to. Creating an instance requires a name and a list of filepaths:
class myClass:
    def __init__(self, name, filepathlist):
        self.name = name
        self.filepathlist = filepathlist
The data that is pulled into the instance is not in the desired format. As such I have created a method of the class to format the data and create a property of the class for each file that is read:
    def formatData(self):
        i = 0
        if i < (len(self.filepathlist) - 1):
            DFRAW = pd.read_csv(self.filepathlist[i], header = 9) #Row 9 is the row that is not blank (all blank auto-skipped)
            DFRAW['DateTime'], DFRAW['dummycol1'] = DFRAW[' ;W;W;W;W'].str.split(';', 1).str
            DFRAW['Col1'], DFRAW['dummycol2'] = DFRAW['dummycol1'].str.split(';', 1).str
            DFRAW['Col2'], DFRAW['dummycol3'] = DFRAW['dummycol2'].str.split(';', 1).str
            DFRAW['Col3'], DFRAW['Col4'] = DFRAW['dummycol3'].str.split(';', 1).str
            DFRAW= DFRAW.drop([' ;W;W;W;W', 'dummycol1', 'dummycol2', 'dummycol3'], axis = 1)
            #There appears to be an issue with these two lines.
            processedfilename = "MYDFNAME" + str(i)
            self.processedfilename = DFRAW
            i = i + 1
I have run the formatting lines of code, those that start with DFRAW, outside of the class and believe these are working correctly.
Somewhere in the script there is an issue with assigning the dataframes as properties of the class; I create a list of filepaths and an instance of the class:
filepathlist = [r"file1.csv",r"file2.csv"]
myINST = myClass("MyInstName", filepathlist )
Then run the formatting method:
myINST.formatData()
Now running the following to check that the instance of the class, myINST, has the properties correctly assigned;
vars(myINST)
But this returns the filepathlist, name and roughly 8000 lines of rows of data from the dataframe. I was expecting the following:
filepathlist, name, MYDFNAME0, MYDFNAME1
What is the error in my code or my approach?
vars returns every attribute of an instance. myClass has three attributes: name, filepathlist and processedfilename (which holds a dataframe), so vars prints all three, including the full contents of the dataframe.
If you only want the filepathlist, you can access it through instance_object.field_name:
myINST.filepathlist returns [r"file1.csv",r"file2.csv"].
Also, these lines probably do not do what you intend:
processedfilename = "MYDFNAME" + str(i)
self.processedfilename = DFRAW
i = i + 1
(1) self.processedfilename creates an attribute literally named processedfilename; the string built on the previous line is never used. (2) Each assignment replaces the previous value rather than appending to it, so after processing only the most recent dataframe from filepathlist would remain.
You should store your dataframe in a better format: list, dictionary, etc.
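If attributes named MYDFNAME0, MYDFNAME1, and so on are really what is wanted, setattr can create them from a string. The sketch below (Python 3) uses placeholder strings instead of DataFrames so that it runs without pandas; MyClass and format_data are illustrative names, not the original code:

```python
class MyClass:
    def __init__(self, name, filepathlist):
        self.name = name
        self.filepathlist = filepathlist

    def format_data(self):
        # A for loop visits every path; an `if` runs its body at most once.
        for i, path in enumerate(self.filepathlist):
            frame = "data from " + path  # placeholder for the formatted DataFrame
            # setattr uses the *string value* as the attribute name.
            setattr(self, "MYDFNAME" + str(i), frame)

inst = MyClass("MyInstName", ["file1.csv", "file2.csv"])
inst.format_data()
print(sorted(vars(inst)))
```

A dictionary attribute is still usually the easier design, since the frames can then be iterated over without guessing attribute names.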
Actually, you can access your dataframe(s) through vars() if you build them in the __init__ method. The code below builds a dictionary of dataframes keyed by the original csv file names.
class myClass:
    def __init__(self, name, filepathlist):
        self.name = name
        self.filepathlist = filepathlist
        self.mydataframedict = self.formatData()
    def formatData(self):
        tmp_dict = {}
        for f in self.filepathlist:
            DFRAW = pd.read_csv(f, header = 9)
            DFRAW['DateTime'], DFRAW['dummycol1'] = DFRAW[' ;W;W;W;W'].str.split(';', 1).str
            DFRAW['Col1'], DFRAW['dummycol2'] = DFRAW['dummycol1'].str.split(';', 1).str
            DFRAW['Col2'], DFRAW['dummycol3'] = DFRAW['dummycol2'].str.split(';', 1).str
            DFRAW['Col3'], DFRAW['Col4'] = DFRAW['dummycol3'].str.split(';', 1).str
            DFRAW = DFRAW.drop([' ;W;W;W;W', 'dummycol1', 'dummycol2', 'dummycol3'], axis = 1)
            tmp_dict[f] = DFRAW
        return tmp_dict
filepathlist = [r"file1.csv", r"file2.csv"]
myINST = myClass("MyInstName", filepathlist )
new_dict = myINST.formatData() # LOCAL VARIABLE (ALSO ACCESSIBLE IN VARS)
print(vars(myINST))
# {'name': 'MyInstName', 'mydataframedict': {'file1.csv': ..., 'file2.csv': ...},
# 'filepathlist': ['file1.csv', 'file2.csv']}

How do I instantiate a group of objects from a text file?

I have some log files that look like many lines of the following:
<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>
<tickSize tickerId=0, field=3, size=25>
<tickSize tickerId=0, field=8, size=534349>
<tickPrice tickerId=0, field=2, price=201.82, canAutoExecute=1>
I need to define a class of type tickPrice or tickSize. I will need to decide which to use before doing the definition.
What would be the Pythonic way to grab these values? In other words, I need an effective way to reverse str() on a class.
The classes are already defined and just contain the presented variables, e.g., tickPrice.tickerId. I'm trying to find a way to extract these values from the text and set the instance attributes to match.
Edit: Answer
This is what I ended up doing-
with open(commandLineOptions.simulationFilename, "r") as simulationFileHandle:
    for simulationFileLine in simulationFileHandle:
        (date, time, msgString) = simulationFileLine.split("\t")
        if ("tickPrice" in msgString):
            msgStringCleaned = msgString.translate(None, ''.join("<>,"))
            msgList = msgStringCleaned.split(" ")
            msg = message.tickPrice()
            msg.tickerId = int(msgList[1][9:])
            msg.field = int(msgList[2][6:])
            msg.price = float(msgList[3][6:])
            msg.canAutoExecute = int(msgList[4][15:])
        elif ("tickSize" in msgString):
            msgStringCleaned = msgString.translate(None, ''.join("<>,"))
            msgList = msgStringCleaned.split(" ")
            msg = message.tickSize()
            msg.tickerId = int(msgList[1][9:])
            msg.field = int(msgList[2][6:])
            msg.size = int(msgList[3][5:])
        else:
            print "Unsupported tick message type"
I'm not sure how you want to dynamically create objects in your namespace, but the following will at least dynamically create objects based on your loglines:
Take your line:
line = '<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>'
Remove chars that aren't interesting to us, then split the line into a list:
line = line.translate(None, ''.join('<>,'))
line = line.split(' ')
Name the potential class attributes for convenience:
line_attrs = line[1:]
Then create your object (name, base tuple, dictionary of attrs):
tickPriceObject = type(line[0], (object,), { key:value for key,value in [at.split('=') for at in line_attrs]})()
Prove it works as we'd expect:
print(tickPriceObject.field)
# 2
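The translate(None, ...) call above is Python 2 only. A rough Python 3 sketch of the same steps follows; parse_tick is a hypothetical helper name, and it assumes every value in the log line is numeric:

```python
def parse_tick(line):
    """Build an object from a line such as '<tickSize tickerId=0, field=3, size=25>'."""
    # str.maketrans with a third argument builds a table that deletes those characters.
    cleaned = line.strip().translate(str.maketrans("", "", "<>,"))
    class_name, *pairs = cleaned.split(" ")
    attrs = {}
    for pair in pairs:
        key, text = pair.split("=")
        # Assumption: values are numeric; a '.' marks a float, anything else an int.
        attrs[key] = float(text) if "." in text else int(text)
    # type(name, bases, namespace) creates a class; the trailing () instantiates it.
    return type(class_name, (object,), attrs)()

tick = parse_tick("<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>")
print(type(tick).__name__, tick.field, tick.price)
```

Unlike the version above, this converts the values to numbers, so tick.field is the integer 2 rather than the string '2'.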
Approaching the problem with regex, but with the same result as tristan's excellent answer (and stealing his use of the type constructor that I will never be able to remember)
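import re
class_instance_re = re.compile(r"""
    <(?P<classname>\w[a-zA-Z0-9]*)[ ]
    (?P<arguments>
        (?:\w[a-zA-Z0-9]*=[0-9.]+[, ]*)+
    )>""", re.X)
objects = []
for line in whatever_file:
    result = class_instance_re.match(line)
    if result is None:
        continue  # skip lines that do not look like <name key=value, ...>
    # group() belongs to the match object, not to the line string
    classname = result.group('classname')
    arguments = result.group('arguments')
    # the trailing () instantiates the newly created class
    new_obj = type(classname, (object,),
                   dict([s.split('=') for s in arguments.split(', ')]))()
    objects.append(new_obj)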

Using Keys as Variables in Python

There is probably a term for what I'm attempting to do, but it escapes me. I'm using peewee to set some values in a class, and want to iterate through a list of keys and values to generate the command to store the values.
Not all 'collections' contain each of the values within the class, so I want to just include the ones that are contained within my data set. This is how far I've made it:
for value in result['response']['docs']:
    for keys in value:
        print keys, value[keys] # keys are 'identifier', 'title', 'language'
#for value in result['response']['docs']:
# collection = Collection(
# identifier = value['identifier'],
# title = value['title'],
# language = value['language'],
# mediatype = value['mediatype'],
# description = value['description'],
# subject = value['subject'],
# collection = value['collection'],
# avg_rating = value['avg_rating'],
# downloads = value['downloads'],
# num_reviews = value['num_reviews'],
# creator = value['creator'],
# format = value['format'],
# licenseurl = value['licenseurl'],
# publisher = value['publisher'],
# uploader = value['uploader'],
# source = value['source'],
# type = value['type'],
# volume = value['volume']
# )
# collection.save()
for value in result['response']['docs']:
    Collection(**value).save()
See this question for an explanation on how **kwargs work.
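As a quick illustration of the unpacking, here is a Python 3 sketch in which a plain class stands in for the peewee model (the Collection class and the docs list below are hypothetical):

```python
class Collection:
    """Hypothetical stand-in for the peewee model; it just stores keyword arguments."""
    def __init__(self, **fields):
        for key, value in fields.items():
            setattr(self, key, value)

docs = [
    {"identifier": "a1", "title": "First", "language": "eng"},
    {"identifier": "b2", "title": "Second"},  # no 'language' key
]

# Collection(**doc) is equivalent to Collection(identifier=..., title=..., ...)
# using only the keys that are present in that particular dict.
items = [Collection(**doc) for doc in docs]
print(items[0].language, hasattr(items[1], "language"))
```

Keys that a document lacks are simply never passed, which is what the question asks for.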
Are you asking how to find out whether a key is in a dict?
>>> somedict = {'firstname': 'Samuel', 'lastname': 'Sample'}
>>> if 'firstname' in somedict:
...     print somedict['firstname']
...
Samuel
>>> print somedict.get('address', 'no address given')
no address given
If there is a different problem you'd like to solve, please clarify your question.
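One detail worth keeping in mind: a membership test with in and a truthiness test on .get() are not interchangeable when a key holds an empty or zero value. A small Python 3 sketch, with a made-up middlename key:

```python
somedict = {"firstname": "Samuel", "lastname": "Sample", "middlename": ""}

# `in` tests for the key itself; .get() returns the value, so an empty
# string (or 0, or None) looks "missing" in a truthiness test.
has_key = "middlename" in somedict
truthy = bool(somedict.get("middlename"))
address = somedict.get("address", "no address given")

print(has_key, truthy, address)
```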

SQLAlchemy session query with INSERT IGNORE

I'm trying to do a bulk insert/update with SQLAlchemy. Here's a snippet:
for od in clist:
    where = and_(Offer.network_id==od['network_id'],
                 Offer.external_id==od['external_id'])
    o = session.query(Offer).filter(where).first()
    if not o:
        o = Offer()
        o.network_id = od['network_id']
        o.external_id = od['external_id']
    o.title = od['title']
    o.updated = datetime.datetime.now()
    payout = od['payout']
    countrylist = od['countries']
    session.add(o)
    session.flush()
    for country in countrylist:
        c = session.query(Country).filter(Country.name==country).first()
        where = and_(OfferPayout.offer_id==o.id,
                     OfferPayout.country_name==country)
        opayout = session.query(OfferPayout).filter(where).first()
        if not opayout:
            opayout = OfferPayout()
            opayout.offer_id = o.id
        opayout.payout = od['payout']
        if c:
            opayout.country_id = c.id
            opayout.country_name = country
        else:
            opayout.country_id = 0
            opayout.country_name = country
        session.add(opayout)
    session.flush()
It looks like my issue was touched on here, http://www.mail-archive.com/sqlalchemy@googlegroups.com/msg05983.html, but I don't know how to use "textual clauses" with session query objects and couldn't find much (though admittedly I haven't had as much time as I'd like to search).
I'm new to SQLAlchemy and I'd imagine there are some issues in the code besides the fact that it throws an exception on a duplicate key. For example, it does a flush after every iteration of clist (but I don't know how else to get the o.id value that is used in the subsequent OfferPayout inserts).
Guidance on any of these issues is very appreciated.
The way you should be doing this is with session.merge().
You should also use your objects' relationship properties. The o above should have an o.offerpayout attribute holding a list of OfferPayout objects, and each OfferPayout should have a country property pointing to the related Country object.
With those in place, the code above would look something like this:
for od in clist:
    o = Offer()
    o.network_id = od['network_id']
    o.external_id = od['external_id']
    o.title = od['title']
    o.updated = datetime.datetime.now()
    payout = od['payout']
    countrylist = od['countries']
    for country in countrylist:
        opayout = OfferPayout()
        opayout.payout = od['payout']
        country_obj = Country()
        country_obj.name = country
        opayout.country = country_obj
        o.offerpayout.append(opayout)
    session.merge(o)
    session.flush()
This should work as long as all the primary keys are correct (i.e., the country table has a primary key of name). Merge essentially checks the primary keys and, if a matching row exists, merges your object with the one in the database (it also cascades down the relationships).
