Python - Formatting strings


I have the input file :
sun vehicle
one number
two number
reduce command
one speed
five speed
zero speed
speed command
kmh command
I used the following code:
from collections import OrderedDict
output = OrderedDict()
with open('final') as in_file:
    for line in in_file:
        columns = line.split(' ')
        if len(columns) >= 2:
            word,tag = line.strip().split()
            if output.has_key(tag) == False:
                output[tag] = [];
            output[tag].append(word)
        else:
            print ""
            for k, v in output.items():
                print '<{}> {} </{}>'.format(k, ' '.join(v), k)
            output = OrderedDict()
I am getting the output as:
<vehicle> sun </vehicle>
<number> one two </number>
<command> reduce speed kmh </command>
<speed> one five zero </speed>
But my expected output should be:
<vehicle> sun </vehicle>
<number> one two </number>
<command> reduce
<speed> one five zero </speed>
speed kmh </command>
Can someone help me in solving this?

It looks like the output you want to achieve is underspecified!
You presumably want the code to "know in advance" that speed is a part of command, before you get to the line speed command.
To do what you want, you will need a recursive function.
How about
for k, v in output.items():
    print expandElements(k, v,output)
and somewhere you define
def expandElements(k,v, dic):
    out = '<' +k + '>'
    for i in v:
        # check each item of v for matches in dic.
        if i in dic:
            # match: expand using a recursive call of expandElements()
            out = out + ' ' + expandElements(i, dic[i], dic)
        else:
            # no match: append the item itself
            out = out + ' ' + i
    out = out + ' </' +k + '>'
    return out

It looks like you want some kind of tree structure for your output?
You are printing out with print '<{}> {} </{}>'.format(k, ' '.join(v), k) so all of your output is going to have the form of '<{}> {} </{}>'.
If you want to nest things you are going to need a nested structure to represent them.
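One way to read "a nested structure" is sketched below, in Python 3. It assumes that a word which also appears as a tag (here, speed) should be expanded in place wherever it occurs, so the nested block lands between reduce and kmh rather than exactly where the question's expected output puts it. The names SAMPLE, group_by_tag and render are made up for this illustration and are not part of the question's code.

```python
from collections import OrderedDict

SAMPLE = """sun vehicle
one number
two number
reduce command
one speed
five speed
zero speed
speed command
kmh command"""


def group_by_tag(lines):
    """Collect words under their tag, keeping first-seen tag order."""
    groups = OrderedDict()
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            word, tag = parts
            groups.setdefault(tag, []).append(word)
    return groups


def render(tag, groups):
    """Render one tag; a word that is itself a tag is expanded recursively.

    Assumes the tags do not refer to each other in a cycle.
    """
    parts = []
    for word in groups[tag]:
        if word in groups and word != tag:
            parts.append(render(word, groups))
        else:
            parts.append(word)
    return '<{0}> {1} </{0}>'.format(tag, ' '.join(parts))


groups = group_by_tag(SAMPLE.splitlines())
# Tags that occur as a word under some other tag are printed only where nested.
nested = {word for words in groups.values() for word in words if word in groups}
for tag in groups:
    if tag not in nested:
        print(render(tag, groups))
```

Whether a nested tag should also be printed on its own at the top level is a design choice; this sketch prints it only inside its parent.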

To parse the input file recursively, I would make a class representing the tag. Each tag can have children. Each child starts out as a string, added either manually with tag.children.append("value") or by calling tag.add_tag(tag.name, "value").
class Tag:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.has_root = True
        self.parent = parent
    def __str__(self):
        """ compose string for this tag (recursively) """
        if not self.children:
            return self.name
        children_str = ' '.join([str(child) for child in self.children])
        if not self.parent:
            return children_str
        return '<%s>%s</%s>' % (self.name, children_str, self.name)
    @classmethod
    def from_file(cls, file):
        """ create root tag from file """
        obj = cls('root')
        columns = []
        with open(file) as in_file:
            for line in in_file:
                value, tag = line.strip().split(' ')
                obj.add_tag(tag, value)
        return obj
    def search_tag(self, tag):
        """ search for a tag in the children """
        if self.name == tag:
            return self
        for i, c in enumerate(self.children):
            if isinstance(c, Tag) and c.name == tag:
                return c
            elif isinstance(c, str):
                if c.strip() == tag.strip():
                    self.children[i] = Tag(tag, self)
                    return self.children[i]
            else:
                result = c.search_tag(tag)
                if result:
                    return result
    def add_tag(self, tag, value):
        """
        add a value, tag pair to the children
        Firstly this searches if the value is an child. If this is the
        case it moves the children to the new location
        Afterwards it searches the tag in the children. When found
        the value is added to this tag. If not a new tag object
        is created and added to this Tag. The flag has_root
        is set to False so the element can be moved later.
        """
        value_tag = self.search_tag(value)
        if value_tag and not value_tag.has_root:
            print("Found value: %s" % value)
            if value_tag.parent:
                i = value_tag.parent.children.index(value_tag)
                value = value_tag.parent.children.pop(i)
                value.has_root = True
        else:
            print("not %s" % value)
        found = self.search_tag(tag)
        if found:
            found.children.append(value)
        else:
            # no root
            tag_obj = Tag(tag, self)
            self.children.append(tag_obj)
            tag_obj.add_tag(tag, value)
            tag_obj.has_root = False
tags = Tag.from_file('final')
print(tags)
I know in this example the speed-Tag is not added twice. I hope that's ok.
Sorry for the long code.

Related

Efficient partial search of a trie in python

This is a hackerrank exercise, and although the problem itself is solved, my solution is apparently not efficient enough, so on most test cases I'm getting timeouts. Here's the problem:
We're going to make our own Contacts application! The application must perform two types of operations:
add name, where name is a string denoting a contact name. This must store as a new contact in the application.
find partial, where partial is a string denoting a partial name to search the application for. It must count the number of contacts starting with partial and print the count on a new line.
Given n sequential add and find operations, perform each operation in order.
I'm using Tries to make it work, here's the code:
import re
def add_contact(dictionary, contact):
    _end = '_end_'
    current_dict = dictionary
    for letter in contact:
        current_dict = current_dict.setdefault(letter, {})
    current_dict[_end] = _end
    return(dictionary)
def find_contact(dictionary, contact):
    p = re.compile('_end_')
    current_dict = dictionary
    for letter in contact:
        if letter in current_dict:
            current_dict = current_dict[letter]
        else:
            return(0)
    count = int(len(p.findall(str(current_dict))) / 2)
    re.purge()
    return(count)
n = int(input().strip())
contacts = {}
for a0 in range(n):
    op, contact = input().strip().split(' ')
    if op == "add":
        contacts = add_contact(contacts, contact)
    if op == "find":
        print(find_contact(contacts, contact))
Because the problem requires not returning whether partial is a match or not, but instead counting all of the entries that match it, I couldn't find any other way but cast the nested dictionaries to a string and then count all of the _end_s, which I'm using to denote stored strings. This, it would seem, is the culprit, but I cannot find any better way to do the searching. How do I make this work faster? Thanks in advance.
UPD:
I have added a results counter that actually parses the tree, but the code is still too slow for the online checker. Any thoughts?
def find_contact(dictionary, contact):
    current_dict = dictionary
    count = 0
    for letter in contact:
        if letter in current_dict:
            current_dict = current_dict[letter]
        else:
            return(0)
    else:
        return(words_counter(count, current_dict))
def words_counter(count, node):
    live_count = count
    live_node = node
    for value in live_node.values():
        if value == '_end_':
            live_count += 1
        if type(value) == type(dict()):
            live_count = words_counter(live_count, value)
    return(live_count)

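OK, so as it turns out, counting the matches at query time is what does not scale: hackerrank will shove 100k strings into your program, and walking (or stringifying) a whole subtree of nested dicts on every find slows everything to a crawl. So the problem wasn't only in the parsing, it was in what gets stored before the parsing: if every node keeps a running count of the names that pass through it, find only has to follow the prefix and read that count. Eventually I found a blog post whose solution passes the challenge 100%. Here's the code in full: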
class Node:
    def __init__(self):
        self.count = 1
        self.children = {}
trie = Node()
def add(node, name):
    for letter in name:
        sub = node.children.get(letter)
        if sub:
            sub.count += 1
        else:
            sub = node.children[letter] = Node()
        node = sub
def find(node, data):
    for letter in data:
        sub = node.children.get(letter)
        if not sub:
            return 0
        node = sub
    return node.count
if __name__ == '__main__':
    n = int(input().strip())
    for _ in range(n):
        op, param = input().split()
        if op == 'add':
            add(trie, param)
        else:
            print(find(trie, param))
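For comparison, the same idea (keep a running count on every node so that a lookup never has to walk a subtree) can also be written with plain nested dicts. This is only a sketch: the COUNT key and the function bodies are invented here, and it assumes contact names never contain the character used as that key.

```python
COUNT = '#'  # hypothetical marker key; assumes names never contain '#'


def add_contact(trie, name):
    """Insert name, bumping the counter of every prefix node on the way."""
    node = trie
    for letter in name:
        node = node.setdefault(letter, {COUNT: 0})
        node[COUNT] += 1


def find_contact(trie, prefix):
    """Return how many stored names start with prefix."""
    node = trie
    for letter in prefix:
        node = node.get(letter)
        if node is None:
            return 0
    return node.get(COUNT, 0)


contacts = {}
for name in ('hack', 'hackerrank'):
    add_contact(contacts, name)
print(find_contact(contacts, 'hac'))
print(find_contact(contacts, 'hak'))
```

Both operations now cost time proportional to the length of the name or prefix, independent of how many contacts are stored.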

Checking for None when accessing nested attributes

I am currently implementing an ORM that stores data defined in an XSD handled with a DOM generated by PyXB.
Many of the respective elements contain sub-elements and so forth, which each have a minOccurs=0 and thus may resolve to None in the DOM.
Hence when accessing some element hierarchy containing optional elements I now face the problem whether to use:
with suppress(AttributeError):
    wanted_subelement = root.subelement.sub_subelement.wanted_subelement
or rather
if root.subelement is not None:
    if root.subelement.sub_subelement is not None:
        wanted_subelement = root.subelement.sub_subelement.wanted_subelement
While both styles work perfectly fine, which is preferable? (I am not Dutch, btw.)

This also works:
if root.subelement and root.subelement.sub_subelement:
    wanted_subelement = root.subelement.sub_subelement.wanted_subelement
The if statement evaluates None as False and will check from left to right. So if the first element evaluates to false it will not try to access the second one.
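To make the short-circuit behaviour concrete, here is a small Python 3 sketch that uses types.SimpleNamespace objects as stand-ins for the PyXB elements (an assumption; real bindings may behave differently). It also shows one caveat: a truthiness test skips any falsy value, not just None, so an explicit is not None check is the safer of the two when an element can be present but empty. The helper name wanted is made up.

```python
from types import SimpleNamespace

# Stand-in objects; the attribute names mirror the question.
full = SimpleNamespace(
    subelement=SimpleNamespace(
        sub_subelement=SimpleNamespace(wanted_subelement='found')))
partial = SimpleNamespace(subelement=None)


def wanted(root):
    """Return the nested value, or None if any optional level is missing."""
    # `and` stops at the first false operand, so the second test is only
    # evaluated when root.subelement is not None.
    if root.subelement is not None and root.subelement.sub_subelement is not None:
        return root.subelement.sub_subelement.wanted_subelement
    return None


print(wanted(full))
print(wanted(partial))

# Truthiness is broader than "is not None": an empty list is falsy too.
empty_but_present = SimpleNamespace(subelement=[])
print(bool(empty_but_present.subelement))
print(empty_but_present.subelement is not None)
```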

If you have quite a few such lookups to perform, better to wrap this up in a more generic lookup function:
# use a sentinel object distinct from None
# in case None is a valid value for an attribute
notfound = object()
# resolve a python attribute path
# - mostly, a `getattr` that supports
# arbitrary sub-attributes lookups
def resolve(element, path):
    parts = path.split(".")
    while parts:
        next, parts = parts[0], parts[1:]
        element = getattr(element, next, notfound)
        if element is notfound:
            break
    return element
# just to test the whole thing
class Element(object):
    def __init__(self, name, **attribs):
        self.name = name
        for k, v in attribs.items():
            setattr(self, k, v)
e = Element(
    "top",
    sub1=Element("sub1"),
    nested1=Element(
        "nested1",
        nested2=Element(
            "nested2",
            nested3=Element("nested3")
        )
    )
)
tests = [
    "notthere",
    "does.not.exists",
    "sub1",
    "sub1.sub2",
    "nested1",
    "nested1.nested2",
    "nested1.nested2.nested3"
]
for path in tests:
    sub = resolve(e, path)
    if sub is notfound:
        print "%s : not found" % path
    else:
        print "%s : %s" % (path, sub.name)

How to get source corresponding to a Python AST node?

Python AST nodes have lineno and col_offset attributes, which indicate the beginning of respective code range. Is there an easy way to get also the end of the code range? A 3rd party library?

EDIT: Latest code (tested in Python 3.5-3.7) is here: https://bitbucket.org/plas/thonny/src/master/thonny/ast_utils.py
As I didn't find an easy way, here's a hard (and probably not optimal) way. Might crash and/or work incorrectly if there are more lineno/col_offset bugs in Python parser than those mentioned (and worked around) in the code. Tested in Python 3.3:
import ast, _ast, io, keyword, token, tokenize
def mark_code_ranges(node, source):
    """
    Node is an AST, source is corresponding source as string.
    Function adds recursively attributes end_lineno and end_col_offset to each node
    which has attributes lineno and col_offset.
    """
    NON_VALUE_KEYWORDS = set(keyword.kwlist) - {'False', 'True', 'None'}
    def _get_ordered_child_nodes(node):
        if isinstance(node, ast.Dict):
            children = []
            for i in range(len(node.keys)):
                children.append(node.keys[i])
                children.append(node.values[i])
            return children
        elif isinstance(node, ast.Call):
            children = [node.func] + node.args
            for kw in node.keywords:
                children.append(kw.value)
            if node.starargs != None:
                children.append(node.starargs)
            if node.kwargs != None:
                children.append(node.kwargs)
            children.sort(key=lambda x: (x.lineno, x.col_offset))
            return children
        else:
            return ast.iter_child_nodes(node)
    def _fix_triple_quote_positions(root, all_tokens):
        """
        http://bugs.python.org/issue18370
        """
        string_tokens = list(filter(lambda tok: tok.type == token.STRING, all_tokens))
        def _fix_str_nodes(node):
            if isinstance(node, ast.Str):
                tok = string_tokens.pop(0)
                node.lineno, node.col_offset = tok.start
            for child in _get_ordered_child_nodes(node):
                _fix_str_nodes(child)
        _fix_str_nodes(root)
        # fix their erroneous Expr parents
        for node in ast.walk(root):
            if ((isinstance(node, ast.Expr) or isinstance(node, ast.Attribute))
                    and isinstance(node.value, ast.Str)):
                node.lineno, node.col_offset = node.value.lineno, node.value.col_offset
    def _fix_binop_positions(node):
        """
        http://bugs.python.org/issue18374
        """
        for child in ast.iter_child_nodes(node):
            _fix_binop_positions(child)
        if isinstance(node, ast.BinOp):
            node.lineno = node.left.lineno
            node.col_offset = node.left.col_offset
    def _extract_tokens(tokens, lineno, col_offset, end_lineno, end_col_offset):
        return list(filter((lambda tok: tok.start[0] >= lineno
                                   and (tok.start[1] >= col_offset or tok.start[0] > lineno)
                                   and tok.end[0] <= end_lineno
                                   and (tok.end[1] <= end_col_offset or tok.end[0] < end_lineno)
                                   and tok.string != ''),
                           tokens))
    def _mark_code_ranges_rec(node, tokens, prelim_end_lineno, prelim_end_col_offset):
        """
        Returns the earliest starting position found in given tree,
        this is convenient for internal handling of the siblings
        """
        # set end markers to this node
        if "lineno" in node._attributes and "col_offset" in node._attributes:
            tokens = _extract_tokens(tokens, node.lineno, node.col_offset, prelim_end_lineno, prelim_end_col_offset)
            #tokens =
            _set_real_end(node, tokens, prelim_end_lineno, prelim_end_col_offset)
        # mark its children, starting from last one
        # NB! need to sort children because eg. in dict literal all keys come first and then all values
        children = list(_get_ordered_child_nodes(node))
        for child in reversed(children):
            (prelim_end_lineno, prelim_end_col_offset) = \
                _mark_code_ranges_rec(child, tokens, prelim_end_lineno, prelim_end_col_offset)
        if "lineno" in node._attributes and "col_offset" in node._attributes:
            # new "front" is beginning of this node
            prelim_end_lineno = node.lineno
            prelim_end_col_offset = node.col_offset
        return (prelim_end_lineno, prelim_end_col_offset)
    def _strip_trailing_junk_from_expressions(tokens):
        while (tokens[-1].type not in (token.RBRACE, token.RPAR, token.RSQB,
                                       token.NAME, token.NUMBER, token.STRING,
                                       token.ELLIPSIS)
                   and tokens[-1].string not in ")}]"
               or tokens[-1].string in NON_VALUE_KEYWORDS):
            del tokens[-1]
    def _strip_trailing_extra_closers(tokens, remove_naked_comma):
        level = 0
        for i in range(len(tokens)):
            if tokens[i].string in "({[":
                level += 1
            elif tokens[i].string in ")}]":
                level -= 1
            if level == 0 and tokens[i].string == "," and remove_naked_comma:
                tokens[:] = tokens[0:i]
                return
            if level < 0:
                tokens[:] = tokens[0:i]
                return
    def _set_real_end(node, tokens, prelim_end_lineno, prelim_end_col_offset):
        # prelim_end_lineno and prelim_end_col_offset are the start of
        # next positioned node or end of source, ie. the suffix of given
        # range may contain keywords, commas and other stuff not belonging to current node
        # Function returns the list of tokens which cover all its children
        if isinstance(node, _ast.stmt):
            # remove empty trailing lines
            while (tokens[-1].type in (tokenize.NL, tokenize.COMMENT, token.NEWLINE, token.INDENT)
                   or tokens[-1].string in (":", "else", "elif", "finally", "except")):
                del tokens[-1]
        else:
            _strip_trailing_extra_closers(tokens, not isinstance(node, ast.Tuple))
            _strip_trailing_junk_from_expressions(tokens)
        # set the end markers of this node
        node.end_lineno = tokens[-1].end[0]
        node.end_col_offset = tokens[-1].end[1]
        # Try to peel off more tokens to give better estimate for children
        # Empty parens would confuse the children of no argument Call
        if ((isinstance(node, ast.Call))
                and not (node.args or node.keywords or node.starargs or node.kwargs)):
            assert tokens[-1].string == ')'
            del tokens[-1]
            _strip_trailing_junk_from_expressions(tokens)
        # attribute name would confuse the "value" of Attribute
        elif isinstance(node, ast.Attribute):
            if tokens[-1].type == token.NAME:
                del tokens[-1]
                _strip_trailing_junk_from_expressions(tokens)
            else:
                raise AssertionError("Expected token.NAME, got " + str(tokens[-1]))
                #import sys
                #print("Expected token.NAME, got " + str(tokens[-1]), file=sys.stderr)
        return tokens
    all_tokens = list(tokenize.tokenize(io.BytesIO(source.encode('utf-8')).readline))
    _fix_triple_quote_positions(node, all_tokens)
    _fix_binop_positions(node)
    source_lines = source.split("\n")
    prelim_end_lineno = len(source_lines)
    prelim_end_col_offset = len(source_lines[len(source_lines)-1])
    _mark_code_ranges_rec(node, all_tokens, prelim_end_lineno, prelim_end_col_offset)

We had a similar need, and I created the asttokens library for this purpose. It maintains the source in both text and tokenized form, and marks AST nodes with token information, from which text is also readily available.
It works with Python 2 and 3 (tested with 2.7 and 3.5). For example:
import ast, asttokens
st='''
def greet(a):
  say("hello") if a else say("bye")
'''
atok = asttokens.ASTTokens(st, parse=True)
for node in ast.walk(atok.tree):
    if hasattr(node, 'lineno'):
        print atok.get_text_range(node), node.__class__.__name__, atok.get_text(node)
Prints
(1, 50) FunctionDef def greet(a):
  say("hello") if a else say("bye")
(17, 50) Expr say("hello") if a else say("bye")
(11, 12) Name a
(17, 50) IfExp say("hello") if a else say("bye")
(33, 34) Name a
(17, 29) Call say("hello")
(40, 50) Call say("bye")
(17, 20) Name say
(21, 28) Str "hello"
(40, 43) Name say
(44, 49) Str "bye"

ast.get_source_segment was added in Python 3.8:
import ast
code = """
if 1 == 1 and 2 == 2 and 3 == 3:
    test = 1
"""
node = ast.parse(code)
ast.get_source_segment(code, node.body[0])
Produces: if 1 == 1 and 2 == 2 and 3 == 3:\n    test = 1
Thanks to Blane for his answer in https://stackoverflow.com/a/62624882/3800552
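Since Python 3.8 the parser also records the end position directly on the nodes, as end_lineno and end_col_offset, which answers the original question without extra tooling. A minimal sketch (the sample source string is made up for this example):

```python
import ast

source = "x = foo(1,\n        2)\n"
tree = ast.parse(source)
call = tree.body[0].value  # the ast.Call node for foo(1, 2)

# Start and end of the node; lines are 1-based, columns are 0-based
# UTF-8 byte offsets within the line.
print((call.lineno, call.col_offset), (call.end_lineno, call.end_col_offset))

# get_source_segment slices the source using the same four attributes.
print(repr(ast.get_source_segment(source, call)))
```

On interpreters older than 3.8 these attributes are absent, so the approaches in the earlier answers still apply there.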

Hi, I know it's very late, but I think this is what you are looking for.
I am doing the parsing only for function definitions in the module.
We can get the first and last line of the AST node with this method. This way the source lines of a function definition can be obtained by reading only the lines we need from the source file.
This is a very simple example:
st='def foo():\n print "hello" \n\ndef bla():\n a = 1\n b = 2\n c= a+b\n print c'
import ast
tree = ast.parse(st)
for function in tree.body:
    if isinstance(function,ast.FunctionDef):
        # Just in case if there are loops in the definition
        lastBody = function.body[-1]
        while isinstance (lastBody,(ast.For,ast.While,ast.If)):
            lastBody = lastBody.body[-1]
        lastLine = lastBody.lineno
        print "Name of the function is ",function.name
        print "firstLine of the function is ",function.lineno
        print "LastLine of the function is ",lastLine
        print "the source lines are "
        if isinstance(st,str):
            st = st.split("\n")
        for i , line in enumerate(st,1):
            if i in range(function.lineno,lastLine+1):
                print line

Splitting the elements of a list into a list and then splitting them again

This is a sample of the raw text i'm reading:
ID: 00000001
SENT: to do something
to 01573831
do 02017283
something 03517283
ID: 00000002
SENT: just an example
just 06482823
an 01298744
example 01724894
Right now I'm trying to split it into a list of lists of lists.
Topmost level list: By the ID so 2 elements here (done)
Next level: Within each ID, split by newlines
Last level: Within each line split the word and ID, for the lines beginning with ID or SENT, it doesn't matter if they are split or not. Between the word and their ID is an indent (\t)
Current code:
f=open("text.txt","r")
raw=list(f)
text=" ".join(raw)
wordlist=text.split("\n \n ") #split by ID
toplist=wordlist[:2] #just take 2 IDs
Edit:
I was going to cross-reference the words with another text file to add their word classes, which is why I asked for a list of lists of lists.
Steps:
1) Use .append() to add on word classes for each word
2) Use "\t".join() to connect a line together
3) Use "\n".join() to connect different lines in an ID
4) "\n\n".join() to connect all the IDs together into a string
Output:
ID: 00000001
SENT: to do something
to 01573831 prep
do 02017283 verb
something 03517283 noun
ID: 00000002
SENT: just an example
just 06482823 adverb
an 01298744 ind-art
example 01724894 noun
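The four steps can be sketched as follows in Python 3. This assumes that records are separated by a blank line and that each word and its ID are separated by a tab, as described above; the word_classes dictionary stands in for the second text file, and the function names are invented for this example.

```python
def parse(text):
    """Split into records, then lines, then tab-separated fields."""
    return [[line.split("\t") for line in record.split("\n")]
            for record in text.strip().split("\n\n")]


def annotate(records, word_classes):
    """Step 1: append the word class to every word/ID line that has one."""
    for record in records:
        for fields in record:
            if len(fields) == 2 and fields[0] in word_classes:
                fields.append(word_classes[fields[0]])


def render(records):
    """Steps 2-4: join fields with tabs, lines with newlines, records with blank lines."""
    return "\n\n".join(
        "\n".join("\t".join(fields) for fields in record)
        for record in records)


text = ("ID: 00000001\nSENT: to do something\n"
        "to\t01573831\ndo\t02017283\nsomething\t03517283\n\n"
        "ID: 00000002\nSENT: just an example\n"
        "just\t06482823\nan\t01298744\nexample\t01724894\n")
word_classes = {"to": "prep", "do": "verb", "something": "noun",
                "just": "adverb", "an": "ind-art", "example": "noun"}

records = parse(text)
annotate(records, word_classes)
print(render(records))
```

The ID: and SENT: lines contain no tab, so they stay as one-element lists and pass through unchanged.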

A more pythonic version of Thorsten's answer:
from collections import namedtuple
class Element(namedtuple("ElementBase", "id sent words")):
    @classmethod
    def parse(cls, source):
        lines = source.split("\n")
        return cls(
            id=lines[0][4:],
            sent=lines[1][6:],
            words=dict(
                line.split("\t") for line in lines[2:]
            )
        )
text = """ID: 00000001
SENT: to do something
to\t01573831
do\t02017283
something\t03517283

ID: 00000002
SENT: just an example
just\t06482823
an\t01298744
example\t01724894"""
elements = [Element.parse(part) for part in text.split("\n\n")]
for el in elements:
    print el
    print el.id
    print el.sent
    print el.words
    print

I'd regard every part of the topmost split as an "object". Thus, I'd create a class with properties corresponding to each part.
class Element(object):
    def __init__(self, source):
        lines = source.split("\n")
        self._id = lines[0][4:]
        self._sent = lines[1][6:]
        self._words = {}
        for line in lines[2:]:
            word, id_ = line.split("\t")
            self._words[word] = id_
    @property
    def ID(self):
        return self._id
    @property
    def sent(self):
        return self._sent
    @property
    def words(self):
        return self._words
    def __str__(self):
        return "Element %s, containing %i words" % (self._id, len(self._words))
text = """ID: 00000001
SENT: to do something
to\t01573831
do\t02017283
something\t03517283

ID: 00000002
SENT: just an example
just\t06482823
an\t01298744
example\t01724894"""
elements = [Element(part) for part in text.split("\n\n")]
for el in elements:
    print el
    print el.ID
    print el.sent
    print el.words
    print
In the main code (one line, the list comprehension) the text is only split at each double new-line. Then, all logic is deferred into the __init__ method, making it very local.
Using a class also gives you the benefit of __str__, allowing you control over how your objects are printed.
You could also consider rewriting the last three lines of __init__ to:
self._words = dict([line.split("\t") for line in lines[2:]])
but I wrote a plain loop as it seemed to be easier to understand.

I'm not sure exactly what output you need but you can adjust this to fit your needs (This uses the itertools grouper recipe):
>>> from itertools import izip_longest
>>> def grouper(n, iterable, fillvalue=None):
        "Collect data into fixed-length chunks or blocks"
        # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
        args = [iter(iterable)] * n
        return izip_longest(fillvalue=fillvalue, *args)
>>> with open('text.txt') as f:
        print [[x.rstrip().split(None, 1) for x in g if x.rstrip()]
               for g in grouper(6, f, fillvalue='')]
[[['ID:', '00000001'], ['SENT:', 'to do something'], ['to', '01573831'], ['do', '02017283'], ['something', '03517283']],
[['ID:', '00000002'], ['SENT:', 'just an example'], ['just', '06482823'], ['an', '01298744'], ['example', '01724894']]]

Would this work for you?
Top level (which you have done):
def get_parent(text, parent):
    """recursively walk through text, looking for 'ID' tag"""
    # find open_ID and close_ID
    open_ID = text.find('ID')
    close_ID = text.find('ID', open_ID + 1)
    # if there is another instance of 'ID', recursively walk again
    if close_ID != -1:
        parent.append(text[open_ID : close_ID])
        return get_parent(text[close_ID:], parent)
    # base-case
    else:
        parent.append(text[open_ID:])
        return
Second level: split by newlines:
def child_split(parent):
    index = 0
    while index < len(parent):
        parent[index] = parent[index].split('\n')
        index += 1
Third level: split the 'ID' and 'SENT' fields:
def split_field(parent, index):
    if index < len(parent):
        child = 0
        while child < len(parent[index]):
            if ':' in parent[index][child]:
                parent[index][child] = parent[index][child].split(':')
            else:
                parent[index][child] = parent[index][child].split()
            child += 1
        return split_field(parent, index + 1)
    else:
        return
Running it all together:
def main(text):
    parent = []
    get_parent(text, parent)
    child_split(parent)
    split_field(parent, 0)
    # hand the nested list back to the caller
    return parent
The result is quite nested; perhaps it can be cleaned up somewhat? Or perhaps the split_field() function could return a dictionary?

How to Build a dictionary from a text file in Python

I have a text file with entries that look like this :
JohnDoe
Assignment 9
Reading: NO
header: NO
HW: NO
Solutions: 0
show: NO
Journals: NO
free: NO
Finished: NO
Quiz: 0
Done
Assignment 3
E-book: NO
HW: NO
Readings: NO
Show: 0
Journal: NO
Study: NO
Test: NO
Finished: NO
Quiz: 0
Done
This is a small sample. The file has several students in it. Each student has two assignments under their name and they only pass if the line that starts with "Finished" in each assignment reads "Finished: YES". All of the data under each assignment is disorganized, but somewhere under each assignment a line will say "Finished: YES (or NO)" I need a way to read the file and say whether or not any of the students have passed. So far, I have
def get_entries( file ):
    with open( "dicrete.txt.rtf", 'rt') as file:
        for line in file:
            if "Finished" in line:
                finished, answer = line.split(':')
                yield finished, answer
# dict takes a sequence of `(key, value)` pairs and turns it into a dict
print dict(get_entries( file ))
I can only get this code to return a single entry: the first "Finished" it reads as the key and "YES" or "NO" as the value. That is the shape I want, but I want it to return every line in the file that starts with "Finished". So for the sample data I provided, I want a dict with 2 entries: {Finished: "NO", Finished: "NO"}

Dictionaries can only store one mapping per key. So, you can never have a dictionary that has two different entries for the same key.
Consider using a list of two-tuples instead, like [("Finished", "NO"), ("Finished", "NO")].
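A minimal sketch of that suggestion in Python 3, assuming each relevant line has the form Finished: YES or Finished: NO (the function name and the inline sample list are made up for this example):

```python
def finished_entries(lines):
    """Return a ("Finished", answer) tuple for every line starting with "Finished"."""
    entries = []
    for line in lines:
        if line.strip().startswith("Finished"):
            key, _, answer = line.partition(":")
            entries.append((key.strip(), answer.strip()))
    return entries


sample = ["JohnDoe", "Assignment 9", "HW: NO", "Finished: NO", "Done",
          "Assignment 3", "Quiz: 0", "Finished: NO", "Done"]
entries = finished_entries(sample)
print(entries)
# A student passes only if every assignment reads "Finished: YES".
print(all(answer == "YES" for _, answer in entries))
```

With a real file, the open file object can be passed in place of the sample list, since the function only iterates over lines.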

Sounds like you need a better data model! Let's look at that, shall we?
Let's define an Assignment class that we can call with all the lines of text between Assignment: # and Finished: YES/NO.
class Assignment(object):
    def __init__(self, id, *args, **kwargs):
        self.id = id
        for key,val in kwargs.items():
            setattr(self, key.lower(), val)
        finished = getattr(self, 'finished', None)
        if finished is None:
            raise AttributeError("All assignments must have a 'finished' value")
        else:
            self.finished = True if finished.lower() == "yes" else False
    @classmethod
    def from_string(cls, s):
        """Builds an Assignment object from a string
        a = Assignment.from_string('''Assignment 1\nAttributes: Go Here\nFinished: yes''')
        >>> a.id
        1
        >>> a.finished
        True"""
        d = dict()
        id = None
        for line in s.splitlines():
            key,*val = map(str.strip, line.split(":"))
            val = ' '.join(val) or None
            if key.lower().startswith('assignment'):
                # the header line looks like "Assignment 9": the id is its last word
                id = int(key.split()[-1])
                continue
            d[key.lower()] = val
        if id is not None:
            return cls(id, **d)
        else:
            raise ValueError("No 'Assignment' field in string {}".format(s))
Once you have your model, you'll need to parse your input. Luckily this is actually pretty simple.
def splitlineson(s, sentinel):
    """splits an iterable of strings into a newline separated string beginning with each sentinel.
    >>> s = ["Garbage", "lines", "SENT$", "first", "group", "SENT$", "second", "group"]
    >>> splitlineson(s, "SENT$")
    iter("SENT$\nfirst\ngroup",
         "SENT$\nsecond\ngroup")"""
    lines = []
    for line in s:
        if line.lower().strip().startswith(sentinel.lower()):
            if any((sentinel.lower() in line.lower() for line in lines)):
                yield "\n".join(lines)
            lines = [line.strip()]
        else:
            if line:
                lines.append(line.strip())
    yield "\n".join(lines)
with open('path/to/textfile.txt') as inf:
    assignments = splitlineson(inf, "assignment ")
    assignment_list = [Assignment.from_string(a) for a in assignments]
