How do I instantiate a group of objects from a text file? - python

I have some log files that look like many lines of the following:
<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>
<tickSize tickerId=0, field=3, size=25>
<tickSize tickerId=0, field=8, size=534349>
<tickPrice tickerId=0, field=2, price=201.82, canAutoExecute=1>
I need to define a class of type tickPrice or tickSize. I will need to decide which to use before doing the definition.
What would be the Pythonic way to grab these values? In other words, I need an effective way to reverse str() on a class.
The classes are already defined and just contain the presented variables, e.g., tickPrice.tickerId. I'm trying to find a way to extract these values from the text and set the instance attributes to match.
Edit: Answer
This is what I ended up doing-
with open(commandLineOptions.simulationFilename, "r") as simulationFileHandle:
for simulationFileLine in simulationFileHandle:
(date, time, msgString) = simulationFileLine.split("\t")
if ("tickPrice" in msgString):
msgStringCleaned = msgString.translate(None, ''.join("<>,"))
msgList = msgStringCleaned.split(" ")
msg = message.tickPrice()
msg.tickerId = int(msgList[1][9:])
msg.field = int(msgList[2][6:])
msg.price = float(msgList[3][6:])
msg.canAutoExecute = int(msgList[4][15:])
elif ("tickSize" in msgString):
msgStringCleaned = msgString.translate(None, ''.join("<>,"))
msgList = msgStringCleaned.split(" ")
msg = message.tickSize()
msg.tickerId = int(msgList[1][9:])
msg.field = int(msgList[2][6:])
msg.size = int(msgList[3][5:])
else:
print "Unsupported tick message type"

I'm not sure how you want to dynamically create objects in your namespace, but the following will at least dynamically create objects based on your loglines:
Take your line:
line = '<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>'
Remove chars that aren't interesting to us, then split the line into a list:
line = line.translate(None, ''.join('<>,'))
line = line.split(' ')
Name the potential class attributes for convenience:
line_attrs = line[1:]
Then create your object (name, base tuple, dictionary of attrs):
tickPriceObject = type(line[0], (object,), { key:value for key,value in [at.split('=') for at in line_attrs]})()
Prove it works as we'd expect:
print(tickPriceObject.field)
# 2

Approaching the problem with regex, but with the same result as tristan's excellent answer (and stealing his use of the type constructor that I will never be able to remember)
import re
class_instance_re = re.compile(r"""
<(?P<classname>\w[a-zA-Z0-9]*)[ ]
(?P<arguments>
(?:\w[a-zA-Z0-9]*=[0-9.]+[, ]*)+
)>""", re.X)
objects = []
for line in whatever_file:
result = class_instance_re.match(line)
classname = line.group('classname')
arguments = line.group('arguments')
new_obj = type(classname, (object,),
dict([s.split('=') for s in arguments.split(', ')]))
objects.append(new_obj)

Related

Creating namedtuple valid for differents parameters

I'm trying to figure it out a way to create a namedtuple with variable fields depending on the data you receive, in my case, I'm using the data from StatCounter and not on all the periods are the same browsers. I tried this way but it is a bit ugly and I'm sure there is a better way to achieve it.
def namedtuple_fixed(name: str, fields: List[str]) -> namedtuple:
"""Check the fields of the namedtuple and changes the invalid ones."""
fields_fixed: List[str] = []
for field in fields:
field = field.replace(" ", "_")
if field[0].isdigit():
field = f"n{field}"
fields_fixed.append(field)
return namedtuple(name, fields_fixed)
Records: namedtuple = namedtuple("empty_namedtuple", "")
def read_file(file: str) -> List["Records"]:
"""
Read the file with info about the percentage of use of various browsers
"""
global Records
with open(file, encoding="UTF-8") as browsers_file:
reader: Iterator[List[str]] = csv.reader(browsers_file)
field_names: List[str] = next(reader)
Records = namedtuple_fixed("Record", field_names)
result: List[Records] = [
Records(
*[
dt.datetime.strptime(n, "%Y-%m").date()
if record.index(n) == 0
else float(n)
for n in record
]
)
for record in reader
]
return result
The "namedtuple_fixed" function is to fix the names that have invalid identifiers.
Basically, I want to create a named tuple that receives a variable number of parameters, depending on the file you want to analyze. And if it's with type checking incorporated (I mean using NamedTuple from the typing module), much better.
Thanks in advance.
This solves my problem, but just partially
class Record(SimpleNamespace):
def __repr__(self):
items = [f"{key}={value!r}" for key, value in self.__dict__.items()]
return f"Record({', '.join(items)})"
Using the types.SimpleSpace documentation
And it can cause problems, like for example if you initiallize a Record like the following:
foo = Record(**{"a": 1, "3a": 2})
print(foo.a) # Ok
print(foo.3a) # Syntax Error

python object taking full list

I am trying to write my first object-oriented program.
The code I have come up with is:
class Lattice:
'Commomn base for all sublattice'
latc = 0
def __init__(self, conc, name, pot):
self.atcon = atcon
self.name =name
self.pot=[]
Lattice.latc += 1
atcon=[]
with open(inpf, "r") as f:
for line in f:
match = re.search(reall, line)
if match:
if (match.group(1).strip() == "atcon"):
atcon.append(match.group(2).split())
print("CON =>"+str(atcon))
print("CON[0] =>"+str(atcon[0]))
lat0=Lattice(atcon[0],pot[0],name[0])
print("lat0 =>"+str(lat0.atcon))
I was expecting that lat0.atcon will be atcon[0]
But the result of last 3 print statement is:
CON =>[['0.5d0', '0.50d0'], ['0.98d0', '0.02d0'], ['0.98d0', '0.02d0'], ['0.98d0', '0.02d0']]
CON[0] =>['0.5d0', '0.50d0']
lat0 =>[['0.5d0', '0.50d0'], ['0.98d0', '0.02d0'], ['0.98d0', '0.02d0'], ['0.98d0', '0.02d0']]
I don't understand why. I am an absolute beginner and with no formal python training (learning using net and SO), so, please be patience.
Update:
After the accepted reply, this is the code I am currently working with. The problem with this is, I am reading everything in a list and then inserting those list to the lat0 i.e.
#if match found
atcon.append(match.group(2).split())
# after getting all match, in global scope
lat0=Lattice(atcon[0],name[0],pot[0])
So, I think I am either wasting the list or the object lat0. Is it possible the I directly populate lat0 when the match is found?
e.g.
#if match found for <name>
lat0=Lattice(name)
mini.py:
#!/usr/bin/env python3
import sys
import re
class Lattice:
'Commomn base for all sublattice'
latc = 0
def __init__(self, conc, names, pots):
self.atcon = conc
self.name =names
self.pot=pots
Lattice.latc += 1
reall='(.*)\s*=\s*(.*)'
inpf = sys.argv[1]
print(inpf)
with open(inpf, "r") as f:
pot=[]
name=[]
atcon=[]
for line in f:
match = re.search(reall, line)
if match:
if (match.group(1).strip() == "atcon"):
atcon.append(match.group(2).split())
if (match.group(1).strip() == "pot"):
pot.append(match.group(2).split())
if (match.group(1).strip() == "name"):
name.append(match.group(2).split())
lat0=Lattice(atcon[0],name[0],pot[0])
print("POT =>"+str(pot))
print("NAME =>"+str(name))
print("CON =>"+str(atcon))
print("CON[0] =>"+str(atcon[0]))
print("lat0 =>"+str(lat0.pot))
Typical Input
pot=Rh1.pot Rh2.pot Fe1a.pot Fe2a.pot
name=Rh-up Fe-up
atcon=0.5d0 0.50d0
pot=Rh3.pot Rh4.pot Fe3a.pot Fe4a.pot
name=Rh-up Fe-up
atcon=0.98d0 0.02d0
I'm betting that you either wrote or tested this class in IDLE. At which point I'm sure it got really confusing but the error is very simple. When you instantiate your class it's generally advisable to use the values you sent to the __init__ and not reference other ones.
class Lattice:
'Commomn base for all sublattice'
latc = 0
def __init__(self, conc, name, pot):
self.atcon = conc
self.name =name
self.pot=pot
Lattice.latc += 1
What happened is that atcon, pon and name were defined in the global scope and you referenced them as in the example bellow:
atcon=[1, 2, 3, 4, 5, 6]
pot = [7,8,9]
name = ["foo", "bar"]
class globs:
def __init__(self):
self.atcon = atcon
self.pot = pot
self.name = name
Which gave the following output:
>>> g = globs()
>>> g.atcon
[1, 2, 3, 4, 5, 6]
>>> g.pot
[7, 8, 9]
>>> g.name
['foo', 'bar']
EDIT Answer to extended edit to original question.
I think I get it. Two things still confuse me:
If I follow the code, it looks like you only want to get the 1st hit in the file as lat0. But you haven't explained if you only want the 1st hit in the file, or a list of objects of all hits.
You do a split, but according to your sample input that will still return a list i.e. ["Rh1.pot", "Rh2.pot", "Fe1a.pot", "Fe2a.pot"], I could be presumptuous but I've added a [0] after the split to retrieve only the 1st hit. Remove that if I missed the point.
Here is code that will stop the loop after the first hit is found. I declare atcon, pot, and name as lists because .split() will return a list, but I don't append the results not to waste memory. I also return Lattice object to exit the function and avoid wasting time parsing the rest of the lines.
Additionally the final if atcon and pot and name is there to avoid returning in case there is a piece of text that matches, but doesn't contain all important info. In python if of a empty list will be False. You can leave the rest of the code as is (except the print statements).
inpf = sys.argv[1]
print(inpf)
def parse(inpf):
atcon, pot, name = [], [], []
reall='(.*)\s*=\s*(.*)'
with open(inpf, "r") as f:
for line in f:
print(line)
if match:
if (match.group(1).strip() == "atcon"):
atcon = match.group(2).split()[0]
if (match.group(1).strip() == "pot"):
pot = match.group(2).split()[0]
if (match.group(1).strip() == "name"):
name = match.group(2).split()[0]
if atcon and pot and name:
return Lattice(atcon, name, pot)
lat0 = parse("test.txt")
print("lat0 =>"+str(lat0.pot)+" "+str(lat0.name)+" "+str(lat0.atcon))
Tested on
atcon=0.5d0 0.50d0
atcon=0.5d0 0.50d0
atcon=0.5d0 0.50d0
pot=Rh1.pot Rh2.pot Fe1a.pot Fe2a.pot
name=Rh-up Fe-up
atcon=0.5d0 0.50d0
pot=Rh3.pot Rh4.pot Fe3a.pot Fe4a.pot
name=Rh-up Fe-up
atcon=0.98d0 0.02d0
Output:
lat0 =>Rh1.pot Rh-up 0.5d0

How to check many variables in Python if not null?

I'm writing a python scraper code for OpenData and I have one question about : how to check if all values aren't filled in site and if it is null change value to null.
My scraper is here.
Currently I'm working on it to optimalize.
My variables now look like:
evcisloval = soup.find_all('td')[3].text.strip()
prinalezival = soup.find_all('td')[5].text.strip()
popisfaplnenia = soup.find_all('td')[7].text.replace('\"', '')
hodnotafaplnenia = soup.find_all('td')[9].text[:-1].replace(",", ".").replace(" ", "")
datumdfa = soup.find_all('td')[11].text
datumzfa = soup.find_all('td')[13].text
formazaplatenia = soup.find_all('td')[15].text
obchmenonazov = soup.find_all('td')[17].text
sidlofirmy = soup.find_all('td')[19].text
pravnaforma = soup.find_all('td')[21].text
sudregistracie = soup.find_all('td')[23].text
ico = soup.find_all('td')[25].text
dic = soup.find_all('td')[27].text
cislouctu = soup.find_all('td')[29].text
And Output :
scraperwiki.sqlite.save(unique_keys=["invoice_id"],
data={ "invoice_id":number,
"invoice_price":hodnotafaplnenia,
"evidence_no":evcisloval,
"paired_with":prinalezival,
"invoice_desc":popisfaplnenia,
"date_received":datumdfa,
"date_payment":datumzfa,
"pay_form":formazaplatenia,
"trade_name":obchmenonazov,
"trade_form":pravnaforma,
"company_location":sidlofirmy,
"court":sudregistracie,
"ico":ico,
"dic":dic,
"accout_no":cislouctu,
"invoice_attachment":urlfa,
"invoice_url":url})
I googled it but without success.
First, write a configuration dict of your variables in the form:
conf = {'evidence_no': (3, str.strip),
'trade_form': (21, None),
...}
i.e. key is the output key, value is a tuple of id from soup.find_all('td') and of an optional function that has to be applied to the result, None otherwise. You don't need those Slavic variable names that may confuse other SO members.
Then iterate over conf and fill the data dict.
Also, run soup.find_all('td') before the loop.
tds = soup.find_all('td')
data = {}
for name, (num, func) in conf.iteritems():
text = tds[num].text
# replace text with None or "NULL" or whatever if needed
...
if func is None:
data[name] = text
else:
data[name] = func(text)
This will remove a lot of duplicated code. Easier to maintain.
Also, I am not sure the strings "NULL" are the best way to write missing data. Doesn't sqlite support Python's real None objects?
Just read your attached link, and it seems what you want is
evcisloval = soup.find_all('td')[3].text.strip() or "NULL"
But be careful. You should only do this with strings. If the part before or is either empty or False or None, or 0, they will all be replaced with "NULL"

Using Keys as Variables in Python

There is probably a term for what I'm attempting to do, but it escapes me. I'm using peewee to set some values in a class, and want to iterate through a list of keys and values to generate the command to store the values.
Not all 'collections' contain each of the values within the class, so I want to just include the ones that are contained within my data set. This is how far I've made it:
for value in result['response']['docs']:
for keys in value:
print keys, value[keys] # keys are "identifier, title, language'
#for value in result['response']['docs']:
# collection = Collection(
# identifier = value['identifier'],
# title = value['title'],
# language = value['language'],
# mediatype = value['mediatype'],
# description = value['description'],
# subject = value['subject'],
# collection = value['collection'],
# avg_rating = value['avg_rating'],
# downloads = value['downloads'],
# num_reviews = value['num_reviews'],
# creator = value['creator'],
# format = value['format'],
# licenseurl = value['licenseurl'],
# publisher = value['publisher'],
# uploader = value['uploader'],
# source = value['source'],
# type = value['type'],
# volume = value['volume']
# )
# collection.save()
for value in result['response']['docs']:
Collection(**value).save()
See this question for an explanation on how **kwargs work.
Are you talking about how to find out whether a key is in a dict or not?
>>> somedict = {'firstname': 'Samuel', 'lastname': 'Sample'}
>>> if somedict.get('firstname'):
>>> print somedict['firstname']
Samuel
>>> print somedict.get('address', 'no address given'):
no address given
If there is a different problem you'd like to solve, please clarify your question.

append in a list causing value overwrite

I am facing a peculiar problem. I will describe in brief bellow
Suppose i have this piece of code -
class MyClass:
__postBodies = []
.
.
.
for the_file in os.listdir("/dir/path/to/file"):
file_path = os.path.join(folder, the_file)
params = self.__parseFileAsText(str(file_path)) #reads the file and gets some parsed data back
dictData = {'file':str(file_path), 'body':params}
self.__postBodies.append(dictData)
print self.__postBodies
dictData = None
params = None
Problem is, when i print the params and the dictData everytime for different files it has different values (the right thing), but as soon as the append occurs, and I print __postBodies a strange thing happens. If there are thee files, suppose A,B,C, then
first time __postBodies has the content = [{'body':{A dict with some
data related to file A}, 'file':'path/of/A'}]
second time it becomes = [{'body':{A dict with some data relaed to
file B}, 'file':'path/of/A'}, {'body':{A dict with some data relaed to
file B}, 'file':'path/of/B'}]
AND third time = [{'body':{A dict with some data relaed to file C},
'file':'path/of/A'}, {'body':{A dict with some data relaed to file C},
'file':'path/of/B'}, {'body':{A dict with some data relaed to file C},
'file':'path/of/C'}]
So, you see the 'file' key is working very fine. Just strangely the 'body' key is getting overwritten for all the entries with the one last appended.
Am i making any mistake? is there something i have to? Please point me to a direction.
Sorry if I am not very clear.
EDIT ------------------------
The return from self.__parseFileAsText(str(file_path)) call is a dict that I am inserting as 'body' in the dictData.
EDIT2 ----------------------------
as you asked, this is the code, but i have checked that params = self.__parseFileAsText(str(file_path)) call is returning a diff dict everytime.
def __parseFileAsText(self, fileName):
i = 0
tempParam = StaticConfig.PASTE_PARAMS
tempParam[StaticConfig.KEY_PASTE_PARAM_NAME] = ""
tempParam[StaticConfig.KEY_PASTE_PARAM_PASTEFORMAT] = "text"
tempParam[StaticConfig.KEY_PASTE_PARAM_EXPIREDATE] = "N"
tempParam[StaticConfig.KEY_PASTE_PARAM_PRIVATE] = ""
tempParam[StaticConfig.KEY_PASTE_PARAM_USER] = ""
tempParam[StaticConfig.KEY_PASTE_PARAM_DEVKEY] = ""
tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] = ""
for line in fileinput.input([fileName]):
temp = str(line)
temp2 = temp.strip()
if i == 0:
postValues = temp2.split("|||")
if int(postValues[(len(postValues) - 1)]) == 0 or int(postValues[(len(postValues) - 1)]) == 2:
tempParam[StaticConfig.KEY_PASTE_PARAM_NAME] = str(postValues[0])
if str(postValues[1]) == '':
tempParam[StaticConfig.KEY_PASTE_PARAM_PASTEFORMAT] = 'text'
else:
tempParam[StaticConfig.KEY_PASTE_PARAM_PASTEFORMAT] = postValues[1]
if str(postValues[2]) != "N":
tempParam[StaticConfig.KEY_PASTE_PARAM_EXPIREDATE] = str(postValues[2])
tempParam[StaticConfig.KEY_PASTE_PARAM_PRIVATE] = str(postValues[3])
tempParam[StaticConfig.KEY_PASTE_PARAM_USER] = StaticConfig.API_USER_KEY
tempParam[StaticConfig.KEY_PASTE_PARAM_DEVKEY] = StaticConfig.API_KEY
else:
tempParam[StaticConfig.KEY_PASTE_PARAM_USER] = StaticConfig.API_USER_KEY
tempParam[StaticConfig.KEY_PASTE_PARAM_DEVKEY] = StaticConfig.API_KEY
i = i+1
else:
if tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] != "" :
tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] = str(tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE])+"\n"+temp2
else:
tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] = temp2
return tempParam
You are likely returning the same dictionary with every call to MyClass.__parseFileAsText(), a couple of common ways this might be happening:
__parseFileAsText() accepts a mutable default argument (the dict that you eventually return)
You modify an attribute of the class or instance and return that instead of creating a new one each time
Making sure that you are creating a new dictionary on each call to __parseFileAsText() should fix this problem.
Edit: Based on your updated question with the code for __parseFileAsText(), your issue is that you are reusing the same dictionary on each call:
tempParam = StaticConfig.PASTE_PARAMS
...
return tempParam
On each call you are modifying StaticConfig.PASTE_PARAMS, and the end result is that all of the body dictionaries in your list are actually references to StaticConfig.PASTE_PARAMS. Depending on what StaticConfig.PASTE_PARAMS is, you should change that top line to one of the following:
# StaticConfig.PASTE_PARAMS is an empty dict
tempParam = {}
# All values in StaticConfig.PASTE_PARAMS are immutable
tempParam = dict(StaticConfig.PASTE_PARAMS)
If any values in StaticConfig.PASTE_PARAMS are mutable, you could use copy.deepcopy but it would be better to populate tempParam with those default values on your own.
What if __postBodies wasn't a class attribute, as it is defined now, but just an instance attribute?

Categories