Tokenizing blocks of code in Python


I have this string:
[a [a b] [c e f] d]
and I want a list like this
lst[0] = "a"
lst[1] = "a b"
lst[2] = "c e f"
lst[3] = "d"
My current implementation uses two recursive functions (one splitting on '[' and the other on ']'), which I don't think is elegant or Pythonic. I am sure it can be done using list comprehensions or regular expressions, but I can't figure out a sane way to do it.
Any ideas?

Actually, this really isn't a recursive data structure; note that a and d are in separate lists. You're just splitting the string on the bracket characters and getting rid of some whitespace.
I'm sure somebody can find something cleaner, but if you want a one-liner, something like the following should get you close:
import re
parse_str = '[a [a b] [c e f] d]'
# Split on either bracket, then drop the pieces that are empty after stripping.
lst = [s.strip() for s in re.split(r'[\[\]]', parse_str) if s.strip()]
>>> lst
['a', 'a b', 'c e f', 'd']

Well, if it's a recursive data structure you're going to need a recursive function to cleanly navigate it.
But Python does have a tokenizer library which might be useful:
http://docs.python.org/library/tokenize.html
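For what it's worth, here is a rough sketch of what the tokenize module does with the example string. It is written for Python source, so this only works because the example happens to consist of names and brackets; treat it as an illustration rather than a general parser for this format.

```python
import io
import tokenize

text = "[a [a b] [c e f] d]"

# generate_tokens expects a readline callable that returns str lines.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(text).readline)
    # Keep names and operators (the brackets); skip NEWLINE and ENDMARKER.
    if tok.type in (tokenize.NAME, tokenize.OP)
]
print(tokens)
```

The brackets and the names come back as separate tokens, which you can then group however you like.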

If it's a recursive data structure, then recursion is good to traverse it. However, parsing the string to create the structure does not need to be recursive. One alternative way I would do it is iterative:
origString = "[a [a b] [c [x z] d e] f]".split(" ")
stack = []
for element in origString:
    if element[0] == "[":
        # An opening bracket starts a new nesting level.
        newLevel = [element[1:]]
        stack.append(newLevel)
    elif element[-1] == "]":
        # A closing bracket finishes the current level.
        stack[-1].append(element[0:-1])
        finished = stack.pop()
        if len(stack) != 0:
            stack[-1].append(finished)
        else:
            root = finished
    else:
        stack[-1].append(element)
print(root)
Of course, this can probably be improved, and it will create lists of lists of lists of ... of strings, which isn't exactly what your example wanted. However, it does handle arbitrary depth of the tree.
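One possible improvement, sketched under the assumption that tokens are separated by whitespace or brackets: split the string into tokens with a regular expression first, so that inputs such as "[[a b] c]" (two brackets with no space between them) also work. The function name parse_brackets is made up for this example.

```python
import re

def parse_brackets(text):
    """Parse a bracketed string into nested lists of strings."""
    # A token is an opening bracket, a closing bracket, or a run of other characters.
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    root = []        # holds everything found at the top level
    stack = [root]   # stack[-1] is the list currently being filled
    for token in tokens:
        if token == "[":
            new_level = []
            stack[-1].append(new_level)
            stack.append(new_level)
        elif token == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return root

# The outermost bracket pair is the first (and only) top-level item.
print(parse_brackets("[a [a b] [c [x z] d e] f]")[0])
```

Like the loop above, this builds lists of lists of strings and handles arbitrary depth; it additionally reports unbalanced brackets instead of failing with an IndexError.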

Related

how to find the index of an element in a list by giving only a specific part of that element in a list in python

a = ["cat on the wall", "dog on the table", "tea in the cup"]
b = "dog"
for i in a:
    if b in i:
        print(a.index(i))
The output is the index of the element in which "dog" is present.
Can this be done using a built-in function? The variable b contains only part of the element in the list.

Looks like you need enumerate
Ex:
a = ["cat on the wall", "dog on the table", "tea in the cup"]
b = "dog"
for idx, v in enumerate(a):
    if b in v:
        print(idx)  # --> 1

a = ["cat on the wall", "dog on the table", "tea in the cup"]
b = "dog"
for z in a:
    if b in z:
        print(a.index(z))

There's no direct way to do it. Other functions (e.g. sorted) accept a key that they use to compare elements, but index depends on equality. So you have two choices:
override the equality operator. Not a good idea in your case, since you have simple strings.
rewrite your loop, using a regular expression:
import re
for index, word in enumerate(a):
    # re.escape makes b match as literal text rather than as a pattern
    if re.search(re.escape(b), word) is not None:
        print(index)

As blue_note mentioned, there is no direct way to do this. You can simplify it using a list comprehension, like:
a = ["cat on the wall", "dog on the table", "tea in the cup"]
b = "dog"
print([idx for idx, itm in enumerate(a) if b in itm])
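If only the first match is needed, one possible variation (a sketch, not the only way) is to feed a generator expression to the built-in next() with a default, so the search stops at the first hit. The name first_index is made up for this example.

```python
a = ["cat on the wall", "dog on the table", "tea in the cup"]
b = "dog"

# Index of the first element containing b, or None when nothing matches.
first_index = next((idx for idx, item in enumerate(a) if b in item), None)
print(first_index)  # 1
```

The None default avoids a StopIteration when no element contains b.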

Filtering for tuples from another list and extracting values

I am working on handling two lists of tuples and deducing results.
For example:
A = [('Hi','NNG'),('Good','VV'),...n]
B = [('Happy','VA',1.0),('Hi','NNG',0.5)...n]
First, I'd like to match the words between A and B,
like 'Hi'='Happy' or 'Hi'='Hi'.
Second, if the words are the same, then match the word class,
i.e. whether 'NNG'='NNG' or 'NNG'='VV'.
Third, if all these steps match, then extract the number.
For example, if A=[('Hi','NNG')] and B=('Hi','NNG',0.5),
extract 0.5.
Lastly, I want to multiply all the extracted numbers together.
There are more than 1,000 tuples in each of A and B, so a 'for' loop will be necessary to carry out this process.
How can I do this in Python?

Try something like this:
A = [('Hi', 'NNG'), ('Good', 'VV')]
B = [('Happy', 'VA', 1.0), ('Hi', 'NNG', 0.5)]
print(', '.join(repr(j[2]) for i in A for j in B if i[0] == j[0] and i[1] == j[1]))
# 0.5

One way is to use a set and (optionally) a dictionary. The benefit of this method is you also keep the key data to know where your values originated.
A = [('Hi','NNG'),('Good','VV')]
B = [('Happy','VA',1.0),('Hi','NNG',0.5)]
A_set = set(A)
res = {(i[0], i[1]): i[2] for i in B if (i[0], i[1]) in A_set}
res = list(res.values())
# [0.5]
To multiply all results in the list, see How can I multiply all items in a list together with Python?
Explanation
Use a dictionary comprehension with for i in B. This iterates through each element of B, where each element is a tuple.
For example, when iterating the first element, you will find i[0] = 'Happy', i[1] = 'VA', i[2] = 1.0.
Since we loop through the whole list, we construct a dictionary of results with tuple keys from the first 2 elements.
Additionally, we add the criterion (i[0], i[1]) in A_set to filter as per required logic.
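To finish the last step (multiplying the extracted numbers), a minimal sketch assuming Python 3.8 or later, where math.prod is available. The third tuple in B is made up here so that there is more than one value to multiply.

```python
import math

A = [('Hi', 'NNG'), ('Good', 'VV')]
# ('Good', 'VV', 0.4) is a hypothetical extra entry, not from the question.
B = [('Happy', 'VA', 1.0), ('Hi', 'NNG', 0.5), ('Good', 'VV', 0.4)]

A_set = set(A)
# Keep the number from every tuple in B whose (word, word class) is in A.
values = [score for word, tag, score in B if (word, tag) in A_set]
total = math.prod(values)  # returns 1 for an empty list
print(values, total)
```

On older Python versions, functools.reduce with operator.mul does the same job.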

Python is so high level that it feels like English. So, the following working solution can be written very easily with minimum experience:
A = [('Hi','NNG'),('Good','VV')]
B = [('Happy','VA',1.0),('Hi','NNG',0.5)]
tot = 1
for ia in A:
    for ib in B:
        if ia == ib[:2]:
            tot *= ib[2]
            break  # remove this line if multiple successful checks are possible
print(tot)  # -> 0.5

zip() is your friend:
for tupA, tupB in zip(A, B):
    if tupA[:2] == tupB[:2]:
        print(tupB[2])
To use a fancy Pythonic list comprehension:
results = [tupB[2] for tupA, tupB in zip(A, B) if tupA[:2] == tupB[:2]]
But... why do I have a sneaky feeling this isn't what you want to do?
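The sneaky feeling is probably justified: zip() pairs the tuples by position, so it only finds matches that sit at the same index in both lists. A small sketch with the data from the question, next to a dictionary lookup keyed on (word, word class); the names by_position, scores and by_key are made up for this example.

```python
A = [('Hi', 'NNG'), ('Good', 'VV')]
B = [('Happy', 'VA', 1.0), ('Hi', 'NNG', 0.5)]

# zip() compares A[0] with B[0], A[1] with B[1], and so on.
by_position = [tb[2] for ta, tb in zip(A, B) if ta == tb[:2]]

# A dictionary keyed on (word, word class) finds matches at any position.
scores = {(word, tag): score for word, tag, score in B}
by_key = [scores[pair] for pair in A if pair in scores]

print(by_position)  # []
print(by_key)       # [0.5]
```

The dictionary also avoids the nested loop, which matters a little once both lists hold more than 1,000 tuples.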

Python, finding unique words in multiple lists

I have the following code:
a = ['hello','how','are','hello','you']
b = ['hello','how','you','today']
len_b = len(b)
for word in a:
    count = 0
    while count < len_b:
        if word == b[count]:
            a.remove(word)
            break
        else:
            count = count + 1
print(a)
The goal is to output (contents of list a) - (contents of list b),
so the wanted result in this case would be a = ['are','hello'],
but when I run my code I get a = ['how','are','you'].
Can anybody either point out what is wrong with my implementation, or suggest a better way to solve this?

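You can use a set to get all non-duplicate elements,
so you could do set(a) - set(b) for the difference of sets. Note that a set discards duplicates and order, so for the lists above this yields {'are'} rather than ['are', 'hello'].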

This happens because you are mutating the list a while iterating over it.
If you want to solve it correctly, you can try the method below. It uses a dictionary to keep track of how many times each word should appear in the result, and a list comprehension to build the result:
>>> a = ['hello','how','are','hello','you']
>>> b = ['hello','how','you','today']
>>>
>>> cnt_a = {}
>>> for w in a:
...     cnt_a[w] = cnt_a.get(w, 0) + 1
...
>>> for w in b:
...     if w in cnt_a:
...         cnt_a[w] -= 1
...         if cnt_a[w] == 0:
...             del cnt_a[w]
...
>>> [y for k, v in cnt_a.items() for y in [k] * v]
['hello', 'are']
It works well when there are duplicates, even in the resulting list. However, it may not preserve the order, but it can easily be modified to do so if you want.
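An order-preserving variant could look like the sketch below. It assumes that each word in b should cancel one occurrence in a, starting from the front; the function name subtract_words is made up for this example.

```python
from collections import Counter

def subtract_words(a, b):
    """Return the words of a, in order, with one occurrence removed per word in b."""
    remaining = Counter(b)  # how many occurrences of each word are still to be removed
    result = []
    for word in a:
        if remaining[word] > 0:
            remaining[word] -= 1  # cancel this occurrence instead of keeping it
        else:
            result.append(word)
    return result

a = ['hello', 'how', 'are', 'hello', 'you']
b = ['hello', 'how', 'you', 'today']
print(subtract_words(a, b))  # ['are', 'hello']
```

Because it builds a new list instead of calling a.remove() inside the loop, it sidesteps the mutation-while-iterating problem.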

set(a + b) works too, if what you want is the unique elements across both lists. You can use sets to get unique elements.

Python: 'as' keyword in list comprehension?

I know this won't work but you guys get the idea.
c = [m.split('=')[1] as a for m in matches if a != '1' ]
Is there a way to achieve this? If you use a list comprehension like
c = [m.split('=')[1] as a for m in matches if m.split('=')[1] != '1' ]
two lists will be built from the split, right?

You can use a generator expression inside the list comprehension:
c = [a for a in (m.split('=')[1] for m in matches) if a != '1']

It's sorta-possible, but when you find yourself resorting to awful hacks like the following, it's time to use a regular loop:
c = [a for m in matches for a in [m.split('=')[1]] if a != '1']

You can't do it, and there's no real point in using a nested map or nested list comprehension as the other solutions show. If you want to preprocess the list, just do:
whatIwant = (m.split('=')[1] for m in matches)
c = [a for a in whatIwant if a != '1']
Using a nested list comp or map saves nothing, since the entire list is still processed. All it does is reduce readability.

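Something like this, perhaps:
c = [a for a, m in map(lambda x: (x.split('=')[1], x), matches) if a != '1']
In Python 2 you may want to use itertools.imap instead of map, so that no intermediate list is built (in Python 3, map is already lazy). A somewhat cleaner version:
from itertools import imap  # Python 2 only; in Python 3 use the built-in map
def right_eq(x): return (x.split('=')[1], x)
c = [a for a, m in imap(right_eq, matches) if a != '1']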
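On Python 3.8 and later, an assignment expression (the := operator) comes close to the 'as' idea from the question. A minimal sketch; the contents of matches are made up for illustration.

```python
matches = ["a=1", "b=2", "c=3"]  # hypothetical sample data

# The condition runs first and binds the split result to a,
# so the element expression can reuse it without splitting again.
c = [a for m in matches if (a := m.split('=')[1]) != '1']
print(c)  # ['2', '3']
```

Each string is split only once, and no intermediate list is built.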

python removing whitespace from string in a list

I have a list of lists of strings, and I want to remove the leading and trailing spaces from each string. The strip() method returns a copy of the string without leading and trailing spaces, so calling it alone does not change the original. With the implementation below, I am getting an 'array index out of bounds' error. It seems to me that there should be an x for every list within the list (indexes 0 to len(networks) - 1) and a y for every string within those lists (indexes 0 to len(networks[x]) - 1), so i and j should map exactly to legal indexes and not go out of bounds?
i = 0
j = 0
for x in networks:
    for y in x:
        networks[i][j] = y.strip()
        j = j + 1
    i = i + 1

You're forgetting to reset j to zero after iterating through the first list.
Which is one reason why you usually don't use explicit iteration in Python - let Python handle the iterating for you:
>>> networks = [[" kjhk ", "kjhk "], ["kjhkj ", " jkh"]]
>>> result = [[s.strip() for s in inner] for inner in networks]
>>> result
[['kjhk', 'kjhk'], ['kjhkj', 'jkh']]

You don't need to count i and j yourself; just use enumerate. It also looks like you do not increment i inside the loop, and j is not incremented in the innermost loop, which is why you get an error.
for x in networks:
    for i, y in enumerate(x):
        x[i] = y.strip()
Also note that you don't need to go through networks; accessing x and replacing its values works, because x already refers to networks[index].

This generates a new list (in Python 3, map returns an iterator, so wrap it in list()):
>>> x = ['a', 'b ', ' c ']
>>> list(map(str.strip, x))
['a', 'b', 'c']
>>>
Edit: No need to import string when you use the built-in type (str) instead.
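For the list of lists in the question, the same idea can be applied to each inner list. A short sketch, reusing the sample data from the answer above; the name stripped is made up for this example.

```python
networks = [[" kjhk ", "kjhk "], ["kjhkj ", " jkh"]]

# Apply str.strip to every string of every inner list, building new lists.
stripped = [list(map(str.strip, inner)) for inner in networks]
print(stripped)  # [['kjhk', 'kjhk'], ['kjhkj', 'jkh']]
```

The original networks list is left untouched.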

So you have something like [['a ', 'b', ' c'], [' d', 'e ']], and you want to generate [['a', 'b', 'c'], ['d', 'e']]. You could do:
mylist = [['a ', 'b', ' c'], [' d', 'e ']]
mylist = [[x.strip() for x in y] for y in mylist]
The use of indexes with lists is generally not necessary, and changing a list while iterating through it can have multiple bad side effects.

c = []
for i in networks:
    d = []
    for v in i:
        d.append(v.strip())
    c.append(d)

A much cleaner way to clean the list is to use recursion. This lets you handle lists nested to any depth while keeping the code simple.
Side note: This also puts safety checks in place to avoid data type issues with strip, so your list can contain ints, floats, and more.
def clean_list(list_item):
    if isinstance(list_item, list):
        for index in range(len(list_item)):
            # Recurse into nested lists first.
            if isinstance(list_item[index], list):
                list_item[index] = clean_list(list_item[index])
            # Strip anything that is not one of the listed non-string types.
            if not isinstance(list_item[index], (int, tuple, float, list)):
                list_item[index] = list_item[index].strip()
    return list_item
Then just call the function with your list. All of the values inside the list of lists will be cleaned in place.
clean_list(networks)
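A variation on the same idea, sketched under the assumption that only str values should be stripped and everything else passed through: it returns a new structure instead of modifying the list in place. The function name stripped_copy and the sample data are made up for this example.

```python
def stripped_copy(item):
    """Return a copy of item with every string stripped, at any nesting depth."""
    if isinstance(item, list):
        return [stripped_copy(element) for element in item]
    if isinstance(item, str):
        return item.strip()
    return item  # ints, floats, None, tuples, ... are returned unchanged

# Hypothetical sample with mixed types and an extra nesting level.
networks = [[" kjhk ", ["kjhk ", 3]], ["kjhkj ", None]]
print(stripped_copy(networks))  # [['kjhk', ['kjhk', 3]], ['kjhkj', None]]
```

Checking for str directly means values such as None are passed through rather than causing an AttributeError.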
