obtaining substring from square bracket in a sentence - python

As a Python beginner, I would like to extract the strings inside square brackets, preferably without importing any modules (though using one is fine if necessary).
For example,
def find_tags(text):
    pass  # do some code
x = find_tags('Hi[Pear]')
print(x)
it will return
1-Pear
If there is more than one pair of brackets, for example,
x = find_tags('[apple]and[orange]and[apple]again!')
print(x)
it will return
1-apple,2-orange,3-apple
I would greatly appreciate it if someone could help me out. Thanks!

I tried solving it myself. Here is my code:
bracket_string = '[apple]and[orange]and[apple]again!'

def find_tags(string1):
    start = False
    data = ''
    data_list = []
    for i in string1:
        if i == '[':
            start = True
        if i != ']' and start == True:
            if i != '[':
                data += i
        else:
            if data != '':
                data_list.append(data)
            data = ''
            start = False
    return(data_list)

x = find_tags(bracket_string)
print(x)
The function will return a list of items that were between brackets of a given string parameter.
Any advice will be appreciated.

If your pattern is consistent like [sometext]sometext[sometext]... you can implement your function like this:
import re

def find_tags(expression):
    # raw string so the backslashes reach the regex engine unchanged
    r = re.findall(r'(\[[a-zA-Z]+\])', expression)
    return ",".join([str(index + 1) + "-" + item.replace("[", "").replace("]", "") for index, item in enumerate(r)])

Btw, you can also use a stack data structure (LIFO) to solve this problem.

You can solve this using a simple for loop over all characters of your text.
You have to remember whether you are inside or outside a tag: if inside, you add the letter to a temporary list; when you encounter the end of a tag, you add the whole temporary list as a word to the return list.
You can solve the numbering using enumerate(iterable, start=1) of the list of words:
def find_tags(text):
    inside_tag = False
    tags = []  # list of all tag-words
    t = []     # list to collect all letters of a single tag
    for c in text:
        if not inside_tag:
            inside_tag = c == "["    # we are inside as soon as we encounter [
        elif c != "]":
            t.append(c)              # happens only if inside a tag and not tag ending
        else:
            tags.append(''.join(t))  # construct tag from t and set inside back to false
            inside_tag = False
            t = []                   # clear temporary list
    if t:
        tags.append(''.join(t))      # in case we have leftover tag characters ( "[tag" )
    return list(enumerate(tags,start=1))  # create enumerated list

x = find_tags('[apple]and[orange]and[apple]again!')
# x is a list of tuples (number, tag):
for nr, tag in x:
    print("{}-{}".format(nr, tag), end = ", ")
The end = ", " argument makes print emit ", " after each item instead of a newline, which gives your output (with a trailing ", " after the last item).
x looks like: [(1, 'apple'), (2, 'orange'), (3, 'apple')]
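If the exact string from the question is wanted (comma-separated, no trailing delimiter), the enumerated tags can be joined instead of printed one by one. The following is only a minimal sketch, assuming well-formed, non-nested brackets; the name format_tags is hypothetical, and it relies on str.find so that no import is needed:

```python
def format_tags(text):
    """Return bracketed substrings as '1-first,2-second,...' (illustrative helper)."""
    tags = []
    pos = 0
    while True:
        start = text.find("[", pos)        # next opening bracket, or -1
        if start == -1:
            break
        end = text.find("]", start + 1)    # next closing bracket, or -1
        if end == -1:
            break                          # unterminated tag: ignore it
        tags.append(text[start + 1:end])
        pos = end + 1
    return ",".join("{}-{}".format(n, tag) for n, tag in enumerate(tags, start=1))

print(format_tags('[apple]and[orange]and[apple]again!'))
```

Building the list first and joining once avoids having to special-case the last item.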

Related

how to recursively create nested list from string input

So, I would like to convert my string input
'f(g,h(a,b),a,b(g,h))'
into the following list
['f',['g','h',['a','b'],'a','b',['g','h']]]
Essentially, I would like to replace all '(' into [ and all ')' into ].
I have unsuccessfully tried to do this recursively. I thought I would iterate through all the characters of my word, and when I hit a '(' I would create a new list and start extending the values into that newest list. If I hit a ')', I would stop extending the values into the newest list and append the newest list to the closest outer list. But I am very new to recursion, so I am struggling to think of how to do it.
word='f(a,f(a))'
empty=[]
def newlist(word):
    listy=[]
    for i, letter in enumerate(word):
        if letter=='(':
            return newlist([word[i+1:]])
        if letter==')':
            listy.append(newlist)
        else:
            listy.extend(letter)
    return empty.append(listy)
Assuming your input is something like this:
a = 'f,(g,h,(a,b),a,b,(g,h))'
We start by splitting it into primitive parts ("tokens"). Since your tokens are always a single symbol, this is rather easy:
tokens = list(a)
Now we need two functions to work with the list of tokens: next_token tells us which token we're about to process and pop_token marks a token as processed and removes it from the list:
def next_token():
    return tokens[0] if tokens else None

def pop_token():
    tokens.pop(0)

Your input consists of "items", separated by commas. Schematically, it can be expressed as
items = item ( ',' item )*
In the Python code, we first read one item and then keep reading further items while the next token is a comma:
def items():
    result = [item()]
    while next_token() == ',':
        pop_token()
        result.append(item())
    return result
An "item" is either a sublist in parentheses or a letter:
def item():
    return sublist() or letter()

To read a sublist, we check if the token is a '(', then use items above to read the content, and finally check for the ')' and panic if it is not there:
def sublist():
    if next_token() == '(':
        pop_token()
        result = items()
        if next_token() == ')':
            pop_token()
            return result
        raise SyntaxError()
letter simply returns the next token. You might want to add some checks here to make sure it's indeed a letter:
def letter():
    result = next_token()
    pop_token()
    return result
You can organize the above code like this: have one function parse that accepts a string and returns a list and put all functions above inside this function:
def parse(input_string):
    def items():
        ...
    def sublist():
        ...
    # ...etc
    tokens = list(input_string)
    return items()
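Put together, the pieces above might look like the following. This is only a sketch of how the fragments could be assembled: the letter check, the error messages, and the trailing-input check are additions that are not in the answer, and the input format with a comma before each '(' is the one assumed at the start of this answer.

```python
def parse(input_string):
    tokens = list(input_string)   # every character is one token

    def next_token():
        return tokens[0] if tokens else None

    def pop_token():
        tokens.pop(0)

    def items():
        # items = item ( ',' item )*
        result = [item()]
        while next_token() == ',':
            pop_token()
            result.append(item())
        return result

    def item():
        # sublist() returns None when the next token is not '('
        return sublist() or letter()

    def sublist():
        if next_token() != '(':
            return None
        pop_token()
        result = items()
        if next_token() != ')':
            raise SyntaxError("expected ')'")
        pop_token()
        return result

    def letter():
        result = next_token()
        # added check (assumption): only alphabetic single-character tokens are letters
        if result is None or not result.isalpha():
            raise SyntaxError("expected a letter, got {!r}".format(result))
        pop_token()
        return result

    result = items()
    if tokens:  # added check (assumption): reject leftover input
        raise SyntaxError("unexpected trailing input")
    return result

print(parse('f,(g,h,(a,b),a,b,(g,h))'))
```

Keeping tokens as a local variable of parse means the helper functions share it as a closure, so nothing leaks into module scope.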
Quite an interesting question, and one I originally misinterpreted. But now this solution works accordingly. Note that I have used list concatenation + operator for this solution (which you usually want to avoid) so feel free to improve upon it however you see fit.
Good luck, and I hope this helps!
# set some global values, I prefer to keep it
# as a set in case you need to add functionality
# eg if you also want {{a},b} or [ab<c>ed] to work
OPEN_PARENTHESIS = set(["("])
CLOSE_PARENTHESIS = set([")"])
SPACER = set([","])

def recursive_solution(input_str, index):
    # base case A: when index exceeds or equals len(input_str)
    if index >= len(input_str):
        return [], index
    char = input_str[index]
    # base case B: when we reach a closed parenthesis stop this level of recursive depth
    if char in CLOSE_PARENTHESIS:
        return [], index
    # do the next recursion, return its value and the index it stops at
    recur_val, recur_stop_i = recursive_solution(input_str, index + 1)
    # with an open parenthesis, we want to continue the recursion after its associated
    # closed parenthesis. and also the recur_val should be within a new dimension of the list
    if char in OPEN_PARENTHESIS:
        continued_recur_val, continued_recur_stop_i = recursive_solution(input_str, recur_stop_i + 1)
        return [recur_val] + continued_recur_val, continued_recur_stop_i
    # for spacers eg "," we just ignore it
    if char in SPACER:
        return recur_val, recur_stop_i
    # and finally with normal characters, we just extend it
    return [char] + recur_val, recur_stop_i
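The idea described in the question (open a new list on '(' and return to the enclosing list on ')') can also be written without recursion by keeping the currently open lists on an explicit stack. This is a different technique from the recursive answers here, shown only as a sketch; the name to_nested and the error handling are assumptions, and every non-punctuation character is treated as a one-letter name:

```python
def to_nested(text):
    root = []
    stack = [root]                 # stack[-1] is the list currently being filled
    for ch in text:
        if ch == '(':
            new = []
            stack[-1].append(new)  # attach the new sublist to its parent
            stack.append(new)      # and make it the current list
        elif ch == ')':
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()            # back to the enclosing list
        elif ch != ',':
            stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root

print(to_nested('f(g,h(a,b),a,b(g,h))'))
```

Because lists are mutable, appending the new sublist to its parent before filling it is enough; nothing has to be copied back when the parenthesis closes.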
You can get the expected answer using the following code, but it's still in string format and not a list. Note that the input is wrapped in one extra pair of parentheses so that the result gets its outer brackets.
import re
a='(f(g,h(a,b),a,b(g,h)))'
ans=[]
sub=''
def rec(i,sub):
    if i>=len(a):
        return sub
    if a[i]=='(':
        if i==0:
            sub=rec(i+1,sub+'[')
        else:
            sub=rec(i+1,sub+',[')
    elif a[i]==')':
        sub=rec(i+1,sub+']')
    else:
        sub=rec(i+1,sub+a[i])
    return sub
b=rec(0,'')
print(b)
b=re.sub(r"([a-z]+)", r"'\1'", b)
print(b,type(b))
Output
[f,[g,h,[a,b],a,b,[g,h]]]
['f',['g','h',['a','b'],'a','b',['g','h']]] <class 'str'>
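To go from such a string to an actual list, one option is ast.literal_eval from the standard library, which evaluates Python literals only (unlike eval). A small sketch, assuming the string is balanced and every name is quoted; the variable names are illustrative:

```python
import ast

# a balanced, fully quoted string of the kind produced above
quoted = "['f',['g','h',['a','b'],'a','b',['g','h']]]"

nested = ast.literal_eval(quoted)   # raises ValueError/SyntaxError on malformed input
print(nested, type(nested))
```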

Parsing a string containing code into a list / tree in python

As the title suggests, I'm trying to parse a piece of code into a tree or a list.
First off I would like to thank for any contribution and time spent on this.
So far my code is doing what I expect, yet I am not sure that this is the optimal / most generic way to do this.
Problem
1. I want a more generic solution, since in the future I am going to need further analysis of this syntax.
2. Right now I am unable to separate operators like '=' or '>=', as you can see in the output I share below.
In the future I might change the content of the list / tree from strings to tuples so I can identify the kind of operator (parameter, comparison like = or >= ....). But this is not a real need right now.
Research
My first attempt was parsing the text character by character, but my code was getting too messy and barely readable, so I assumed that I was doing something wrong there (I don't have that code to share here anymore).
So I started looking around at how people were doing it and found some approaches that didn't necessarily fulfill the requirements of being simple and generic.
I would share the links to the sites but I didn't keep track of them.
The Syntax of the code
The syntax is pretty simple; after all, I'm not interested in types or any further detail, just the functions and parameters.
Strings are defined as 'my string', variables as !variable, and numbers as in any other language.
Here is a sample of code:
db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)
My Output
Here my output is partially correct, since I'm still unable to separate the "= '3'" part (of course I have to separate it, because in this case it's a comparison operator and not part of a string).
[{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, "= '3'", "'4'", "'5'"]}, '6']}]
Desired Output
[{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, "=", "'3'", "'4'", "'5'"]}, '6']}]
My code so far
The parseRecursive method is the entry point.
import re

class FileParser:
    #order is important to avoid miss splits
    COMPARATOR_SIGN = {
        '#='
        ,'#<>'
        ,'<>'
        ,'>='
        ,'<='
        ,'='
        ,'>'
        ,'<'
    }

    def __init__(self):
        pass

    def __charExistsInOccurences(self,current_needle, needles, text):
        """
        check if other needles are present in text
        current_needle : string -> the current needle being evaluated
        needles : list -> list of needles
        text : string/list<string> -> a string or a list of string to evaluate
        """
        #if text is a string convert it to list of strings
        text = text if isinstance(text, list) else [text]
        exists = False
        for t in text:
            #check if needle is inside text value
            for needle in needles:
                #dont check the same key
                if needle != current_needle:
                    regex_search_needle = split_regex = '\s*'+'\s*'.join(needle) + '\s*'
                    #list of 1's and 0's . 1 if another character is found in the string.
                    found = [1 if re.search(regex_search_needle, x) else 0 for x in t]
                    if sum(found) > 0:
                        exists = True
                        break
        return exists

    def findOperator(self, needles, haystack):
        """
        split parameters from operators
        needles : list -> list of operators
        haystack : string
        """
        string_open = haystack.find("'")
        #if no string has been found set the index to 0
        if string_open < 0:
            string_open = 0
        occurences = []
        string_closure = haystack.rfind("'")
        operator = ''
        for needle in needles:
            #regex to ignore the possible spaces between characters of the needle
            split_regex = '\s*'+'\s*'.join(needle) + '\s*'
            #parse parameters before and after the string
            before_string = re.split(split_regex, haystack[0:string_open])
            after_string = re.split(split_regex, haystack[string_closure+1:])
            #check if any other needle exists in the results found
            before_string_exists = self.__charExistsInOccurences(needle, needles, before_string)
            after_string_exists = self.__charExistsInOccurences(needle, needles, after_string)
            #if the operator has been found merge the results with the occurences and assign the operator
            if not before_string_exists and not after_string_exists:
                occurences.extend(before_string)
                occurences.extend([haystack[string_open:string_closure+1]])
                occurences.extend(after_string)
                operator = needle
        #filter blank spaces generated
        occurences = list(filter(lambda x: len(x.strip())>0,occurences))
        result_check = [1 if x==haystack else 0 for x in occurences]
        #if the haystack was originaly a simple string like '1' the occurences list is going to be filled with the same character over and over due to the before string an after string part
        if len(result_check) == sum(result_check):
            occurences= [haystack]
            operator = ''
        return operator, occurences

    def parseRecursive(self,text):
        """
        parse a block of text
        text : string
        """
        assert(len(text) < 1, "text is empty")
        function_open = text.find('(')
        accumulated_params = []
        if function_open > -1:
            #there is another function nested
            text_prev_function = text[0:function_open]
            #find last space coma or equal to retrieve the function name
            last_space = -1
            for j in range(len(text_prev_function)-1, 0 , -1):
                if text_prev_function[j] == ' ' or text_prev_function[j] == ',' or text_prev_function[j] == '=':
                    last_space = j
                    break
            func_name = ''
            if last_space > -1:
                #there is something else behind the function name
                func_name = text_prev_function[last_space+1:]
                #no parentesis before so previous characters from function name are parameters
                text_prev_func_params = list(filter(lambda x: len(x.strip())>0,text_prev_function[:last_space+1].split(',')))
                text_prev_func_params = [x.strip() for x in text_prev_func_params]
                #debug here
                #accumulated_params.extend(text_prev_func_params)
                for itext_prev in text_prev_func_params:
                    operator, text_prev_operator = self.findOperator(self.COMPARATOR_SIGN,itext_prev)
                    if operator == '':
                        accumulated_params.extend(text_prev_operator)
                    else:
                        text_prev_operator.append(operator)
                        accumulated_params.extend(text_prev_operator)
                        #accumulated_params.extend(text_prev_operator)
            else:
                #function name is the start of the string
                func_name = text_prev_function[0:].strip()
            #find the closure of parentesis
            function_close = text.rfind(')')
            #parse the next function and extend the current list of parameters
            next_func = text[function_open+1:function_close]
            func_params = {func_name : self.parseRecursive(next_func)}
            accumulated_params.append(func_params)
            #
            # parameters after the function
            #
            new_text = text[function_close+1:]
            accumulated_params.extend(self.parseRecursive(new_text))
        else:
            #there is no other function nested
            split_text = text.split(',')
            current_func_params = list(filter(lambda x: len(x.strip())>0,split_text))
            current_func_params = [x.strip() for x in current_func_params]
            accumulated_params.extend(current_func_params)
        #accumulated_params = list(filter(lambda x: len(x.strip())>0,accumulated_params))
        return accumulated_params

text = "db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)"
obj = FileParser()
print(obj.parseRecursive(text))
You can use pyparsing to deal with such a case.
* pyparsing can be installed by pip install pyparsing
Code:
import pyparsing as pp
# A parsing pattern
w = pp.Regex(r'(?:![^(),]+)|[^(), ]+') ^ pp.Suppress(',')
pattern = w + pp.nested_expr('(', ')', content=w)
# A recursive function to transform a pyparsing result into your desirable format
def transform(elements):
    stack = []
    for e in elements:
        if isinstance(e, list):
            key = stack.pop()
            stack.append({key: transform(e)})
        else:
            stack.append(e)
    return stack
# A sample
string = "db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)"
# Operations to parse the sample string
elements = pattern.parse_string(string).as_list()
result = transform(elements)
# Assertion
assert result == [{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, '=', "'3'", "'4'", "'5'"]}, '6']}]
# Show the result
print(result)
Output:
[{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, '=', "'3'", "'4'", "'5'"]}, '6']}]
Note:
If there is an unbalanced parenthesis inside () (for example a(b(c), a(b)c), etc), an unexpected result is obtained or an IndexError is raised. So be careful in such cases.
At the moment, only a single sample is available to make a pattern to parse string. So if you encounter a parsing error, provide more examples in your question.
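For comparison, the same split can be sketched with only the standard library: first tokenize with one regular expression (so operators become their own tokens), then build the tree from the token list. This is an illustration based on the single sample above, not a full grammar; the names TOKEN_RE, tokenize, and build are hypothetical, quoted strings are assumed to contain no escaped quotes, and there is no error handling for unbalanced parentheses.

```python
import re

# Alternatives are tried left to right, so longer operators are listed first.
TOKEN_RE = re.compile("|".join([
    r"'[^']*'",                   # quoted string, e.g. 'Dim 1'
    r"![^(),=<>#']+",             # !variable, may contain spaces
    r"#=|#<>|<>|>=|<=|=|>|<",     # comparison operators
    r"[(),]",                     # punctuation
    r"[^\s(),=<>#'!]+",           # function names and numbers
]))

def tokenize(text):
    # strip() removes the trailing space a !variable may pick up
    return [tok.strip() for tok in TOKEN_RE.findall(text)]

def build(tokens, pos=0):
    """Return (items, next_pos); a name followed by '(' becomes {name: args}."""
    out = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == ")":
            return out, pos + 1
        if tok == "(":
            args, pos = build(tokens, pos + 1)
            out.append({out.pop(): args})   # the token before '(' is the function name
            continue
        if tok != ",":
            out.append(tok)
        pos += 1
    return out, pos

sample = "db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)"
tree, _ = build(tokenize(sample))
print(tree)
```

Separating tokenizing from tree building keeps each step small, and tagging tokens with their kind later (as mentioned in the question) would only require changing tokenize.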

Python2 tokenization and add to dictionary

I have some texts that I need to split into tokens on spaces. Furthermore, I need to remove all punctuation, and I also need to remove everything inside double brackets [[...]] (including the double brackets themselves).
I will put each token in a dictionary as a key whose value is a list.
I have tried regex to remove these double-bracket patterns, and if-elses, but I can't find a solution that works. For the moment I have:
tokenDic = dict()
splittedWords = re.findall(r'\[\[\s*([^][]*?)]]', docs[doc], re.IGNORECASE)
tokenStr = splittedWords.split()
for token in tokenStr:
    tokenDic[token].append(value);
Is this what you're looking for?
import re
value_list = []
inp_str = 'blahblah[[blahblah]]thi ng1[[junk]]hmm'
tokenDic = dict()
#remove everything in double brackets
bracket_stuff_removed = re.sub(r'\[\[[^]]*\]\]', '', inp_str)
#function to keep only letters and digits
clean_func = lambda x: 97 <= ord(x.lower()) <= 122 or 48 <= ord(x) <= 57
for token in bracket_stuff_removed.split(' '):
    cleaned_token = ''.join(filter(clean_func, token))
    tokenDic[cleaned_token] = list(value_list)
print(tokenDic)
Output:
{'blahblahthi': [], 'ng1hmm': []}
As for appending to the list, I don't have enough info right now to tell you the best way in your situation.
If you want to set the value when you're adding the key, do this:
tokenDic[cleaned_token] = [val1, val2, val3]
If you want to set the values after the key has been added, do this:
val_to_add = "something"
if tokenDic.get(cleaned_token, -1) == -1:
    print('ERROR', cleaned_token, 'does not exist in dict')
else:
    tokenDic[cleaned_token].append(val_to_add)
If you want to append directly in both cases, you'll need to use defaultdict(list) instead of dict. Then, if the key does not exist in the dict, it will be created with an empty list as its value, and your value will be added to that list.
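A short sketch of the defaultdict(list) variant; the (token, value) pairs are made up purely for illustration:

```python
from collections import defaultdict

token_dic = defaultdict(list)          # missing keys start out as an empty list

# hypothetical (token, value) pairs, for illustration only
pairs = [("blahblahthi", 1), ("ng1hmm", 2), ("blahblahthi", 3)]
for token, value in pairs:
    token_dic[token].append(value)     # no KeyError on the first use of a token

print(dict(token_dic))
```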
To remove everything inside [[]] you can use re.sub, and you already have the correct regex, so just do this:
import re
x = "[[hello]]w&o%r*ld^$"
y = re.sub(r"\[\[\s*([^][]*?)]]", "", x)
z = re.sub(r"[^a-zA-Z\s]", "", y)
print(z)
This prints "world".

How to return an array of characters in cypher program (python3)

I wrote code that returns "h" when the input is, for example, "a". But how can I make it work for a string of several characters, for example returning "hh" for the input "aa"?
def input(s):
    for i in range(len(s)):
        ci = (ord(s[i])-90)%26+97
        s = "".join(chr(ci))
    return s
Never use built-in names such as input for your own functions.
def input_x(s):
    l = []  # local list, so repeated calls start from an empty result
    for i in s:
        i = (ord(i)-90)%26+97
        l.append(chr(i))
    s = ''.join(l)
    return s
You can use strings to do this. My variable finaloutput is a string that I will use to store all the updated characters.
def foo(s):
    finaloutput = ''
    for i in s:
        finaloutput += chr((ord(i)-90)%26+97)
    return finaloutput
This code uses string concatenation to add together a series of characters. Since strings are iterables, you can use the for loop shown above instead of the complex one that you used.
def input_x(s):
    result = ""
    for i in s:
        ci = (ord(i)-90)%26+ 97
        result += chr(ci)
    print(result)
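The same per-character mapping can also be returned in one expression with str.join and a generator. A minimal sketch; the name shift is hypothetical, and the formula is the one from the question, applied unchanged to every character:

```python
def shift(s):
    # map each character with the question's formula and join the results
    return "".join(chr((ord(c) - 90) % 26 + 97) for c in s)

print(shift("aa"))
```

Returning the string instead of printing it lets the caller decide what to do with the result.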

translating a string using key-value from dictionary

This function takes a dictionary as an argument and translates the given string. However, it has become an endless loop. I can't for the life of me figure out how to make it work normally. For example, it is supposed to take the string "hi" and translate it into "[-]1".
def translate(glyphs):
    string = input("Enter string to be translated: ").strip()
    new = ''
    for keys in glyphs:
        ind = string.upper().find(keys)
        while ind != -1: #while there exists a key in the string
            if len(glyphs[string[ind].upper()]) > 1: #if there is more than one value for key
                rand = randint(0, 1) #choose randomly
                transChar = glyphs[keys][rand]
                new = string[:ind] + transChar + string[ind+1:]
                ind = string.upper().find(keys)
                print("hi1")
            else:
                transChar = glyphs[keys][0]
                new = string[:ind] + transChar + string[ind+1:]
                ind = string.upper().find(keys)
                print("hi")
    return new
Any help would be appreciated!
Looks like your dictionary contains lists of possible translations as values, from which you make random choices, and upper-case keys. This list comprehension should work, then:
import random
new = ' '.join(random.choice(glyphs[word]) \
for word in input_string.upper().split())
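As for the endless loop in the question: the while loop searches string but writes the result to new, so string never changes and find keeps returning the same index. If the keys are single characters, as the "hi" to "[-]1" example suggests, a character-by-character version avoids searching altogether. A sketch under that assumption; the glyphs table below is hypothetical, and characters without an entry are passed through unchanged:

```python
import random

def translate(text, glyphs):
    out = []
    for ch in text:
        options = glyphs.get(ch.upper())   # keys are assumed to be upper-case
        out.append(random.choice(options) if options else ch)
    return "".join(out)

# hypothetical glyph table: each key maps to a list of possible translations
glyphs = {"H": ["[-]"], "I": ["1"]}
print(translate("hi", glyphs))
```

Taking the text as a parameter instead of calling input() inside the function also makes it easier to test.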
