Iterating through a pyparsing tree yields only unique items? - python

I have a simple parser written in pyparsing below:
import pyparsing as pp
Token = pp.Word(pp.alphas)("Token")
Modifier = pp.Word(pp.nums)("Modifier")
Random = pp.Group(pp.Keyword("?") + pp.OneOrMore(Modifier))("Random")
Phrase = pp.Group(Token + pp.OneOrMore(Modifier))("Phrase")
Collection = pp.Group(pp.delimitedList(Phrase ^ Random, ","))("Collection")
tree = Collection.parseString("hello 12 2, ? 1 2, word 4, ? 3 4, testing 5")
I then tried doing this:
>>> for name, item in tree[0].items():
...     print(name, item)
Phrase ['testing', '5']
Random ['?', '3', '4']
...but for some reason, it returned only the last Phrase and Random items in the tree. How can I get all of them?
(Note: I also tried doing this:
>>> for item in tree[0]:
print item
['hello', '12', '2']
['?', '1', '2']
['word', '4']
['?', '3', '4']
['testing', '5']
...but as you can see, it doesn't return the token name, which I need. I also tried doing item.name, but those always returned empty strings.)
How do I iterate through a pyparsing tree and get every single item, in order, along with the assigned name?

A ParseResults object can report the results name it was defined with through getName():
>>> for f in tree[0]: print(f.getName(), f.asList())
...
Phrase ['hello', '12', '2']
Random ['?', '1', '2']
Phrase ['word', '4']
Random ['?', '3', '4']
Phrase ['testing', '5']
Alternatively, go back to setResultsName and set its listAllMatches argument to True, so that every match is kept under the name instead of only the last one. Since version 1.5.6, the expr("name") shortcut treats a name ending in '*' as equivalent to expr.setResultsName("name", listAllMatches=True). Here is how the output changes with this flag set:
>>> Random = pp.Group(pp.Keyword("?") + pp.OneOrMore(Modifier))("Random*")
>>> Phrase = pp.Group(Token + pp.OneOrMore(Modifier))("Phrase*")
>>> Collection = pp.Group(pp.delimitedList(Phrase ^ Random, ","))("Collection")
>>> tree = Collection.parseString("hello 12 2, ? 1 2, word 4, ? 3 4, testing 5")
>>> print(tree.dump())
[[['hello', '12', '2'], ['?', '1', '2'], ['word', '4'], ['?', '3', '4'], ['testing', '5']]]
- Collection: [['hello', '12', '2'], ['?', '1', '2'], ['word', '4'], ['?', '3', '4'], ['testing', '5']]
  - Phrase: [['hello', '12', '2'], ['word', '4'], ['testing', '5']]
  - Random: [['?', '1', '2'], ['?', '3', '4']]

Related

Reading a text file into lists, based on the spaces in the file

So I have this txt file:
Haiku
5 *
7 *
5 *
Limerick
8 A
8 A
5 B
5 B
8 A
And I want to write a function that returns something like this:
[['Haiku', '5', '*', '7', '*', '5', '*'], ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
I've tried this:
small_pf = open('datasets/poetry_forms_small.txt')
lst = []
for line in small_pf:
    lst.append(line.strip())
small_pf.close()
print(lst)
At the end I end up with this:
['Haiku', '5 *', '7 *', '5 *', '', 'Limerick', '8 A', '8 A', '5 B', '5 B', '8 A']
My problem is that this is one big list rather than one list per form, and some elements are still stuck together, like '5 *' or '8 A'.
I honestly don't know where to start, and that's why I need some guidance on how to solve those two problems.
Any help would be greatly appreciated.
When you reach an empty line, don't add it: save the temporary list you've been filling, start a new one, and continue.
lst = []
with open('test.txt') as small_pf:
    tmp_list = []
    for line in small_pf:
        line = line.rstrip("\n")
        if line == "":
            lst.append(tmp_list)
            tmp_list = []
        else:
            tmp_list.extend(line.split())
    if tmp_list:  # add last one
        lst.append(tmp_list)
print(lst)
# [['Haiku', '5', '*', '7', '*', '5', '*'],
# ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
First split the file into sections on blank lines (\n\n), then split each section on any whitespace (newlines or spaces).
with open('datasets/poetry_forms_small.txt') as small_pf:
    lst = [section.split() for section in small_pf.read().split('\n\n')]
Result:
[['Haiku', '5', '*', '7', '*', '5', '*'],
['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
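The split-on-blank-lines idea can be sketched as a small function that works on text already held in memory. This is only an illustration: the name parse_sections and the sample string are made up for the example, and the pattern \n\s*\n is an assumption that lets several consecutive blank lines count as a single separator.

```python
import re

def parse_sections(text):
    """Split text into blank-line-separated sections, then into whitespace-separated tokens."""
    # One or more blank lines (possibly containing spaces) separate two sections.
    sections = re.split(r"\n\s*\n", text.strip())
    return [section.split() for section in sections if section.strip()]

# Hypothetical sample with two blank lines between the forms.
sample = "Haiku\n5 *\n7 *\n5 *\n\n\nLimerick\n8 A\n8 A\n5 B\n5 B\n8 A\n"
forms = parse_sections(sample)
print(forms)
```

Stripping the text first avoids an empty trailing section when the file ends with a newline.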
Solution without using extra modules
small_pf = small_pf.readlines()
result = []
tempList = []
for line in small_pf:
    if line.strip() == "":
        # a blank line ends the current section
        result.append(tempList)
        tempList = []
    else:
        tempList.extend(line.split())
if tempList:  # the last section has no blank line after it
    result.append(tempList)
print(result)
Solution with module
You can use regex to solve your problem:
import re
small_pf = small_pf.read().strip()  # drop the trailing newline so no empty item is produced
[re.split(r"\s+", x) for x in re.split(r"\n{2,}", small_pf)]
Output
[['Haiku', '5', '*', '7', '*', '5', '*'],
['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
This approach assumes that every non-empty line starts with either a decimal digit or some other character. A line that starts with a non-digit begins a new list, with the whole line (stripped of trailing whitespace) as its first element. A line that starts with a digit is stripped of trailing whitespace and split on spaces, and its parts are appended to the most recently created list.
lst = []
with open("blankpaper.txt") as f:
    for line in f:
        # ignore empty lines
        if line.rstrip() == '':
            continue
        if not line[0].isdecimal():
            new_list = [line.rstrip()]
            lst.append(new_list)
            continue
        new_list.extend(line.rstrip().split(" "))
print(lst)
Output
[['Haiku', '5', '*', '7', '*', '5', '*'], ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
I hope this helps. If there are any questions, please let me know.

How to introduce constraints using Python itertools.product()?

The following script generates 4-character permutations of set s and outputs to file:
import itertools
s = ['1', '2', '3', '4', '!']
l = list(itertools.product(s, repeat=4))
with open('output1.txt', 'w') as f:
    for i in l:
        f.write(''.join([str(v) for v in i]) + '\n')
Output:
...
11!1
11!2
11!3
11!4
11!!
...
How are constraints introduced such as:
No permutation should start with '!'
The 3rd character should be '3'
etc.
The repeat parameter is meant for cases where you want the same set of options at every position in the sequence. Since you don't, just pass one positional argument per position, each giving the options for that position.
For your example, the first letter can be any of ['1', '2', '3', '4'], and the third letter can only be '3':
import itertools as it
s = ['1', '2', '3', '4', '!']
no_exclamation_mark = ['1', '2', '3', '4']
only_3 = ['3']
l = it.product(no_exclamation_mark, s, only_3, s)
@Kelly Bundy wrote the same solution in a comment, but simplified using the fact that strings are sequences of characters, so if your options for each position are just one character each, you don't need to put them in lists:
l = it.product('1234', '1234!', '3', '1234!')
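As a rough sketch of how the per-position form could be turned into the strings the question writes to a file, the options can be kept in a list and unpacked into product. The names options and words are made up for this example.

```python
import itertools

# Per-position options: position 1 excludes '!', position 3 is fixed to '3'.
options = ["1234", "1234!", "3", "1234!"]

# product yields tuples of characters; join each tuple into one string.
words = ["".join(chars) for chars in itertools.product(*options)]

print(len(words))   # 4 * 5 * 1 * 5 combinations
print(words[:3])
```

Because only valid combinations are ever generated, nothing has to be filtered out afterwards.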
Don't convert the result to a list. Instead, filter it using generator expressions:
result = itertools.product(s, repeat=4)
result = (''.join(word) for word in result)
result = (word for word in result if not word.startswith('!'))
result = (word for word in result if word[2] == '3')
The filtering will not be executed until you actually read the elements from result, such as converting it to a list or using a for-loop:
def f1(x):
    print("Filter 1")
    return x.startswith('A')
def f2(x):
    print("Filter 2")
    return x.endswith('B')
words = ['ABC', 'ABB', 'BAA', 'BBB']
result = (word for word in words if f1(word))
result = (word for word in result if f2(word))
print('No output here')
print(list(result))
print('Filtering output here')
This will output
No output here
Filter 1
Filter 2
Filter 1
Filter 2
Filter 1
Filter 1
['ABB']
Filtering output here
The itertools.product function can't handle the kinds of constraints you describe itself. You can probably implement them yourself, though, with extra iteration and changes to how you build your output. For instance, to generate a 4-character string where the third character is always 3, generate a 3-product and use it to fill in the first, second and fourth characters, leaving the third fixed.
Here's a solution for your two suggested constraints. There's not really a generalization to be made here, I'm just interpreting each one and combining them:
import itertools
s = ['1', '2', '3', '4', '!']
for i in s[:-1]:  # skip '!'
    for j, k in itertools.product(s, repeat=2):  # generate two more values from s
        print(f'{i}{j}3{k}')
This approach avoids generating values that will need to be filtered out. This is a lot more efficient than generating all possible four-tuples and filtering the ones that violate the constraints. The filtering approach will often do many times more work, and it gets proportionally much worse the more constraints you have (since more and more of the generated values will be filtered).
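To make the difference in work concrete, here is a small comparison under the two constraints from the question. It is only a sketch: the variable names are invented for the example, and it counts generated candidates rather than measuring time.

```python
import itertools

s = ['1', '2', '3', '4', '!']

# Approach 1: generate every 4-tuple, then filter.
all_words = [''.join(t) for t in itertools.product(s, repeat=4)]
filtered = [w for w in all_words if not w.startswith('!') and w[2] == '3']

# Approach 2: generate only the free positions and keep the third one fixed.
direct = [f'{i}{j}3{k}' for i in s[:-1] for j, k in itertools.product(s, repeat=2)]

print(len(all_words), len(filtered), len(direct))
print(filtered == direct)
```

Both approaches end up with the same words in the same order, but the first one builds every candidate before discarding most of them.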
Itertools' product does not have an integrated filter mechanism. It generates every combination by brute force, and you have to filter its output afterwards (which is not very efficient).
To be more efficient you would need to implement your own (recursive) generator function so that you can short-circuit the generation as soon as one of the constraints is not met (i.e. before getting to a full permutation):
def perm(a, p=None):
    if p is None:  # avoid a mutable default argument
        p = []
    # constraints applied progressively
    if p and p[0] == "!":
        return
    if len(p) >= 3 and p[2] != '3':
        return
    # yield permutation of 4
    if len(p) == 4:
        yield p
        return
    # recursion (product)
    for x in a:
        yield from perm(a, p + [x])
Output:
s = ['1', '2', '3', '4', '!']
for p in perm(s): print(p)
['1', '1', '3', '1']
['1', '1', '3', '2']
['1', '1', '3', '3']
['1', '1', '3', '4']
['1', '1', '3', '!']
['1', '2', '3', '1']
['1', '2', '3', '2']
['1', '2', '3', '3']
...
['4', '4', '3', '3']
['4', '4', '3', '4']
['4', '4', '3', '!']
['4', '!', '3', '1']
['4', '!', '3', '2']
['4', '!', '3', '3']
['4', '!', '3', '4']
['4', '!', '3', '!']

How to merge or delete an element in a list that duplicates the previous item?

I have a list:
output = ['9', '-', '-', '7', '-', '4', '4', '-', '3', '-', '0', '2']
and I'm trying to reduce the '-','-' section to just a single '-', but I haven't had much luck so far.
final = [output[i] for i in range(len(output)) if output[i] != output[i-1]]
final = 9-7-4-3-02
I've tried that above, but it also reduces the '4','4' to only '4'. So any help would be great.
You should check if the item is equal to the previous item and to '-', which can easily be done in Python using a == b == c.
Note that you should also handle the first character differently, since output[0] == output[0-1] will compare the first item with the last item, which might lead to invalid results.
The following code will handle this:
final = [output[0]] + [output[i] for i in range(1, len(output)) if not (output[i] == output[i-1] == '-')]
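The wrap-around problem is easy to see on a list that both starts and ends with '-'. The sketch below uses a hypothetical helper, collapse_dashes, that pairs each item with its real predecessor through zip, and compares it with a comprehension that starts at index 0.

```python
def collapse_dashes(items):
    """Drop a '-' only when the item directly before it is also '-'."""
    if not items:
        return []
    result = [items[0]]  # the first item has no predecessor, so always keep it
    for prev, cur in zip(items, items[1:]):
        if not (prev == cur == '-'):
            result.append(cur)
    return result

output = ['9', '-', '-', '7', '-', '4', '4', '-', '3', '-', '0', '2']
print(collapse_dashes(output))

# Wrap-around pitfall: index -1 is the last item, so starting at i = 0 is unsafe.
edge = ['-', '1', '-']
naive = [edge[i] for i in range(len(edge)) if not (edge[i] == edge[i - 1] == '-')]
print(naive)                  # the leading '-' is lost
print(collapse_dashes(edge))  # the leading '-' is kept
```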
The zip() function is your friend for situations where you need to compare/process elements and their predecessor:
final = [a for a,b in zip(output,['']+output) if (a,b) != ('-','-')]
You can use itertools.groupby:
from itertools import groupby as gb
output = ['9', '-', '-', '7', '-', '4', '4', '-', '3', '-', '0', '2']
r = [j for a, b in gb(output) for j in ([a] if a == '-' else b)]
Output:
['9', '-', '7', '-', '4', '4', '-', '3', '-', '0', '2']

Extend/append python join list

I have an example:
li = [['b', 'b', 'c', '3.2', 'text', '3', '5', '5'], ['a', 'w', '3', '4'], ['a', 'x', '3', '4'],['a','b'],['312','4']]
a = 0
b = []
c = []
count = []
for x in range(len(li)):
    for a in range(len(li[x])):
        if li[x][a].isalpha():
            a += 1
        elif not li[x][a].isalpha() and li[x][a + 1].isalpha():
            a += 1
        else:
            break
    i = (len(li[x]) - a)
    b.extend([' '.join(li[x][0:a])])
    b.extend(li[x][a::])
    count.append(i)
for x in range(len(count)):
    a = count[x] + 1
    z = (sum(count[:x]))
    if x == 0:
        c.append(b[:a])
    else:
        c.append(b[a+1::z])
print(c)
I have various items in the li list and the length of the list itself is not constant.
If any element in the array is a string or if there is some other symbol between the two strings, it combines everything into one element - this join works as I wanted.
I would like to preserve the existing structure. For example, output now looks like this:
[['b b c 3.2 text', '3', '5', '5'], ['a w', 'a x', 'a b', '4'], ['a w', '4'], ['5', '4'], ['a w', '']]
but it should look like this:
[['b b c 3.2 text', '3', '5', '5'],['aw','3','4'],['ax','3','4'],['ab'],['312','4']]
Of course, the code I posted does not work properly. I am still working on a solution, but I have some problems with it: I do not know how to add ranges to the list c. I tried using the lengths of the sub-lists (count) for this, but that doesn't work either. Maybe this is a bad approach? Maybe extending b is not the best solution? Maybe there is no point in using so many 'transformations' and creating new lists?
Please give me some tips.
The definition is a bit unclear to me, but I think this will do it. Code is not very verbose, though. If it does what you intended, I can try to explain / make it simpler.
li = [['b', 'b', 'c', '3.2', 'text', '3', '5', '5'], ['a', 'w', '3', '4'], ['a', 'x', '3', '4'],['a','b'],['312','4']]
def join_to_last_text(lst: list, min_join: int = 1) -> list:
    last_text = max((i for i, s in enumerate(lst) if s.isalpha()), default=min_join - 1)
    return [' '.join(lst[:last_text + 1])] + lst[last_text + 1:]
output = [join_to_last_text(lst) for lst in li]
print(output)
# min_join only takes effect when no item is alphabetic:
# max() then falls back to index min_join - 1, so the first min_join items are joined.
output_min_2 = [join_to_last_text(lst, min_join=2) for lst in li]
print(output_min_2)
@Johan Schiff's code works as expected but leaves one corner case uncovered: when the first element of the list is not text. I have made a small change to his code to take care of that situation:
li = [['b', 'b', 'c', '3.2', 'text', '3', '5', '5'], ['a', 'w', '3', '4'], ['a', 'x', '3', '4'],['a','b'],['312','4']]
def join_to_last_text(lst: list) -> list:
    first_text = min((i for i, s in enumerate(lst) if s.isalpha()), default=0)
    last_text = max((i for i, s in enumerate(lst) if s.isalpha()), default=0)
    return lst[:first_text] + [''.join(lst[first_text:last_text + 1])] + lst[last_text + 1:]
output = [join_to_last_text(lst) for lst in li]
print(output)
Where would this give a different (and correct) output? Check out the following test case:
li = [['4','b', 'b', 'c', '3.2', 'text', '3', '5', '5'], ['a', 'w', '3', '4']]
@Johan's code would output:
[['4 b b c 3.2 text', '3', '5', '5'], ['a w', '3', '4']]
whereas, based on the following phrase in the question,
If any element in the array is a string or if there is some other symbol between the two strings, it combines everything into one element
the output should be:
[['4', 'bbc3.2text', '3', '5', '5'], ['aw', '3', '4']]

How to split a string that includes sign characters

How can I split a string that includes "sign characters" but no spaces? For example:
aString = '1+20*40-3'
I want the output to be:
['1', '+', '20', '*', '40', '-', '3']
I tried this:
aString.split('+' and '*' and '-')
but that didn't work.
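You can use a regular expression to do this in Python. The code will be:
import re
aString = '1+20*40-3'
# the hyphen comes first in the character class so that it is a literal, not a range
print(re.findall(r'[-+*/]|\d+', aString))
Output:
['1', '+', '20', '*', '40', '-', '3']
Refer to the documentation of the re module for details.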
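One detail worth spelling out: inside a character class, a hyphen between two characters denotes a range, so a class written as [+-/*] also matches ',' and '.', which lie between '+' and '/' in ASCII. The following is a minimal sketch of a tokenizer that puts the hyphen first; the names TOKEN and tokenize are made up for the example.

```python
import re

# Hyphen placed first so it is a literal, not a range operator.
TOKEN = re.compile(r"[-+*/]|\d+")

def tokenize(expression):
    """Return the numbers and operators of expression, in order."""
    return TOKEN.findall(expression)

print(tokenize("1+20*40-3"))

# The class [+-/*] contains the range '+' through '/', which also covers ',' and '.'.
print(re.findall(r"[+-/*]|\d+", "1,5"))
```

Note that findall silently skips characters that match neither alternative, so this sketch does not report invalid input.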
