Python elegant way to map string structure - python

Let's say I know beforehand that the string
"key1:key2[]:key3[]:key4" should map to "newKey1[]:newKey2[]:newKey3"
then given "key1:key2[2]:key3[3]:key4",
my method should return "newKey1[2]:newKey2[3]:newKey3"
(the order of numbers within the square brackets should stay, like in the above example)
My solution looks like this:
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3"}
def transform(parent_key, parent_key_with_index):
indexes_in_parent_key = re.findall(r'\[(.*?)\]', parent_key_with_index)
target_list = predefined_mapping[parent_key].split(":")
t = []
i = 0
for elem in target_list:
try:
sub_result = re.subn(r'\[(.*?)\]', '[{}]'.format(indexes_in_parent_key[i]), elem)
if sub_result[1] > 0:
i += 1
new_elem = sub_result[0]
except IndexError as e:
new_elem = elem
t.append(new_elem)
print ":".join(t)
transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4")
prints newKey1[2]:newKey2[3]:newKey3 as the result.
Can someone suggest a better and elegant solution (around the usage of regex especially)?
Thanks!

You can do it a bit more elegantly by simply splitting the mapped structure on [], then interspersing the indexes from the actual data and, finally, joining everything together:
import itertools
# split the map immediately on [] so that you don't have to split each time on transform
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3".split("[]")}
def transform(key, source):
mapping = predefined_mapping.get(key, None)
if not mapping: # no mapping for this key found, return unaltered
return source
indexes = re.findall(r'\[.*?\]', source) # get individual indexes
return "".join(i for e in itertools.izip_longest(mapping, indexes) for i in e if i)
print(transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4"))
# newKey1[2]:newKey2[3]:newKey3
NOTE: On Python 3 use itertools.zip_longest() instead.
I still think you're over-engineering this and that there is probably a much more elegant and far less error-prone approach to the whole problem. I'd advise stepping back and looking at the bigger picture instead of hammering out this particular solution just because it seems to be addressing the immediate need.

Related

Dynamically look for index() in list from set of options

def _parse_options(productcode_array):
if not self._check_productcode_has_options(productcode_array):
return None
possible_options = {"UV1", "UV2", "Satin", "Linen", "Unco", "Natural"}
option_index = productcode_array.index()
Example value of productcode_array:
["BC", "1.5x3.5", "100lb", "Linen", "Q100"]
My initial thought was to maybe try/except with a list comprehension but I feel there's probably a cleaner way I don't know about.
What I'm trying to achieve is getting the index position within my list productcode_array where any 1 of the possible_options exist. I know there will always only be 1 of the options present. The reason I need this is because the index position within the productcode is dependent on a number of factors.
What would be a clean and effective way to use index() with each of the values of my possible_options set?
>>> next(i for i, code in enumerate(productcode_array) if code in possible_options)
3
or
>>> productcode_array.index(possible_options.intersection(productcode_array).pop())
3
Example with try/except:
for x in possible_options:
try:
option_index = productcode_array.index(x)
except ValueError:
pass
This does work, but it feels dirty, so open to cleaner options.
You could use set.intersection and then assign to option_index (assuming there's only one common value, as stated in the comments):
For example:
possible_options = {"UV1", "UV2", "Satin", "Linen", "Unco", "Natural"}
productcode_array = ["BC", "1.5x3.5", "100lb", "Linen", "Q100"]
for v in possible_options.intersection(productcode_array):
option_index = productcode_array.index(v)
print(option_index)
Prints:
3

Iterating over references to variables in Python

I have an object called Song, which is defined as:
class Song(object):
def __init__(self):
self.title = None
self.songauthor = None
self.textauthor = None
self.categories = None
Inside this class I have a method that parses a run-time property of that object, "metadata", which is basically just a text file with some formatted text that I parse with regular expressions. During this process, I have come up with the following code that I am pretty certain can be simplified to a loop.
re_title = re.compile("^title:(.*)$", re.MULTILINE)
re_textauthor = re.compile("^textauthor:(.*)$", re.MULTILINE)
re_songauthor = re.compile("^songauthor:(.*)$", re.MULTILINE)
re_categories = re.compile("^categories:(.*)$", re.MULTILINE)
#
# it must be possible to simplify the below code to a loop...
#
tmp = re_title.findall(self.metadata)
self.title = tmp[0] if len(tmp) > 0 else None
tmp = re_textauthor.findall(self.metadata)
self.textauthor = tmp[0] if len(tmp) > 0 else None
tmp = re_songauthor.findall(self.metadata)
self.songauthor = tmp[0] if len(tmp) > 0 else None
tmp = re_categories.findall(self.metadata)
self.categories = tmp[0] if len(tmp) > 0 else None
I'm guessing this can be done by encapsulating a reference to the property (e.g. self.title) and the corresponding regular expression (re_title) in a datatype (possibly tuple), and then iterate over a list of these data types.
I have a tried using a tuple as such:
for x in ((self.title, re_title),
(self.textauthor, re_textauthor),
(self.songauthor, re_songauthor),
(self.categories, re_categories)):
data = x[1].findall(self.metadata)
x[0] = data[0] if len(data) > 0 else None
This failed horribly as I cannot modify a tuple in run-time. Can anyone provide a suggestion as to how I can pull this off?
There are two problems with your code.
The big one is that x[0] is not a reference to self.title, it's a reference to the value of self.title. In other words, you're just copying the existing title into a tuple, then replacing that title in the tuple with a different one, which has no effect on the existing title.
The smaller one is that you can't replace elements in a tuple. You could fix that trivially by using a list instead of a tuple, but you're still going to have the big problem.
So, how do you create references to variables in Python? You can't. You need to think of a way to reorganize things. For example, maybe you can access these things by name, instead of by reference. Instead of four separate variables, store a dictionary of four variables in a single dictionary:
res = {
'title': re.compile("^title:(.*)$", re.MULTILINE),
'textauthor': re.compile("^textauthor:(.*)$", re.MULTILINE)
'songauthor': re.compile("^songauthor:(.*)$", re.MULTILINE)
'categories': re.compile("^categories:(.*)$", re.MULTILINE)
}
class Song(object):
def __init__(self):
self.properties = {}
def parsify(self, text):
for thing in ('title', 'textauthor', 'songauthor', 'categories'):
data = res[thing].findall(self.metadata)
self.properties[thing] = data[0] if len(data) > 0 else None
You could also use for thing in res: there, because that will iterate over all the keys (in arbitrary order, but you probably don't care about the order).
If you really need to have self.title, you've run into a common problem. Usually, there's a clear distinction between data—which should be referred to by runtime strings—and attributes—which should not. But sometimes, there isn't. So you have to bridge between them in some way. You can create four #property fields that return self.properties['title'], or you can use setattr(self, thing, …) instead of self.properties[thing], or various other possibilities. Which one is best comes down to whether they're more data-like or more attribute-like.
Instead of assigning to the tuple, update the class members directly:
all_res = {'title':re_title,
'textauthor': re_textauthor,
'songauthor': re_song_author,
'categories': re_categories}
for k, v in all_res.iteritems():
tmp = v.findall(self.metadata)
if tmp:
setattr(self, k, tmp[0])
else:
setattr(self, k, None)
If you only care about the first match, you don't need to use findall.
abarnert's answer has given a good explanation of what is going wrong with your code, but I wanted to offer up an alternative solution. Rather than using a loop to assign each variable, try creating an iterable of the different values from the parsed file, then use a single unpacking-assignment to get them into the various variables.
Here's a two-statement solution using a list comprehension, which is made just a bit tricky by the fact that you need to reference the result of findall twice in if/else expression (thus the nested generator expression):
vals = [x[0] if len(x) > 0 else None for x in (regex.findall(self.metadata) for regex in
[re_title, re_textauthor,
re_songauthor, re_categories])]
self.title, self.textauthor, self.songauthor, self.categories = vals
You can probably simplify things a little bit in the first part of the list comprehension. To start with, you can just test if x rather than if len(x) > 0. Or, if you're not too attached to using findall, you could use search instead, then just use x and x.group(0) instead of the whole if/else bit. The search method returns None if no match was found, so the short-circuiting behavior of the and operator will do exactly what we want.
An example would be to use a dictionary like this:
things = {}
for x in ((self.title, re_title),
(self.textauthor, re_textauthor),
(self.songauthor, re_songauthor),
(self.categories, re_categories)):
if len(x[1].findall(self.metadata):
things[x[0]] = x[1].findall(self.metadata)[1]
else:
things[x[0]] = None
Could this be a possible solution?

Having problems in extracting duplicates

I am stumped with this problem, and no matter how I get around it, it is still giving me the same result.
Basically, supposedly I have 2 groups - GrpA_null and GrpB_null, each having 2 meshes in them and are named exactly the same, brick_geo and bars_geo
- Result: GrpA_null --> brick_geo, bars_geo
But for some reason, in the code below which I presume is the one giving me problems, when it is run, the program states that GrpA_null has the same duplicates as GrpB_null, probably they are referencing the brick_geo and bars_geo. As soon as the code is run, my children geo have a numerical value behind,
- Result: GrpA_null --> brick_geo0, bars_geo0, GrpB_null1 --> brick_geo, bars_geo1
And so, I tried to modify the code such that it will as long as the Parent (GrpA_null and GrpB_null) is different, it shall not 'touch' on the children.
Could someone kindly advice me on it?
def extractDuplicateBoxList(self, inputs):
result = {}
for i in range(0, len(inputs)):
print '<<< i is : %s' %i
for n in range(0, len(inputs)):
print '<<< n is %s' %n
if i != n:
name = inputs[i].getShortName()
# Result: brick_geo
Lname = inputs[i].getLongName()
# Result: |GrpA_null|concrete_geo
if name == inputs[n].getShortName():
# If list already created as result.
if result.has_key(name):
# Make sure its not already in the list and add it.
alreadyAdded = False
for box in result[name]:
if box == inputs[i]:
alreadyAdded = True
if alreadyAdded == False:
result[name].append(inputs[i])
# Otherwise create a new list and add it.
else:
result[name] = []
result[name].append(inputs[i])
return result
There are a couple of things you may want to be aware of. First and foremost, indentation matters in Python. I don't know if the indentation of your code as is is as intended, but your function code should be indented further in than your function def.
Secondly, I find your question a little difficult to understand. But there are several things which would improve your code.
In the collections module, there is (or should be) a type called defaultdict. This type is similar to a dict, except for it having a default value of the type you specify. So a defaultdict(int) will have a default of 0 when you get a key, even if the key wasn't there before. This allows the implementation of counters, such as to find duplicates without sorting.
from collections import defaultdict
counter = defaultdict(int)
for item in items:
counter[item] += 1
This brings me to another point. Python for loops implement a for-each structure. You almost never need to enumerate your items in order to then access them. So, instead of
for i in range(0,len(inputs)):
you want to use
for input in inputs:
and if you really need to enumerate your inputs
for i,input in enumerate(inputs):
Finally, you can iterate and filter through iterable objects using list comprehensions, dict comprehensions, or generator expressions. They are very powerful. See Create a dictionary with list comprehension in Python
Try this code out, play with it. See if it works for you.
from collections import defaultdict
def extractDuplicateBoxList(self, inputs):
counts = defaultdict(int)
for input in inputs:
counts[input.getShortName()] += 1
dup_shns = set([k for k,v in counts.items() if v > 1])
dups = [i for i in inputs if input.getShortName() in dup_shns]
return dups
I was on the point to write the same remarks as bitsplit, he has already done it.
So I just give you for the moment a code that I think is doing exactly the same as yours, based on these remarks and the use of the get dictionary's method:
from collections import defaultdict
def extract_Duplicate_BoxList(self, inputs):
result = defaultdict()
for i,A in enumerate(inputs):
print '<<< i is : %s' %i
name = A.getShortName() # Result: brick_geo
Lname = A.getLongName() # Result: |GrpA_null|concrete_geo
for n in (j for j,B in enumerate(inputs)
if j!=i and B.getShortName()==name):
print '<<< n is %s' %n
if A not in result.get(name,[])):
result[name].append(A)
return result
.
Secondly, as bitsplit said it, I find your question ununderstandable.
Could you give more information on the elements of inputs ?
Your explanations about GrpA_null and GrpB_null and the names and the meshes are unclear.
.
EDIT:
If my reduction/simplification is correct, examining it , I see that What you essentially does is to compare A and B elements of inputs (with A!=B) and you record A in the dictionary result at key shortname (only one time) if A and B have the same shortname shortname;
I think this code can still be reduced to just:
def extract_Duplicate_BoxList(inputs):
result = defaultdict()
for i,A in enumerate(inputs):
print '<<< i is : %s' %i
result[B.getShortName()].append(A)
return result
this may be do what your looking for if I understand it, which seems to be comparing the sub-hierarchies of different nodes to see if they are they have the same names.
import maya.cmds as cmds
def child_nodes(node):
''' returns a set with the relative paths of all <node>'s children'''
root = cmds.ls(node, l=True)[0]
children = cmds.listRelatives(node, ad=True, f=True)
return set( [k[len(root):] for k in children])
child_nodes('group1')
# Result: set([u'|pCube1|pCubeShape1', u'|pSphere1', u'|pSphere1|pSphereShape1', u'|pCube1']) #
# note the returns are NOT valid maya paths, since i've removed the root <node>,
# you'd need to add it back in to actually access a real shape here:
all_kids = child_nodes('group1')
real_children = ['group1' + n for n in all_kids ]
Since the returns are sets, you can test to see if they are equal, see if one is a subset or superset of the other, see what they have in common and so on:
# compare children
child_nodes('group1') == child_nodes('group2')
#one is subset:
child_nodes('group1').issuperset(child_nodes('group2'))
Iterating over a bunch of nodes is easy:
# collect all the child sets of a bunch of nodes:
kids = dict ( (k, child_nodes(k)) for k in ls(*nodes))

Python nested for loop: what am I doing wrong?

I am working with data pulled from a spreadsheet-like file. I am trying to find, for each "ligand", the item with the lowest corresponding "energy". To do this I'm trying to make a list of all the ligands I find in the file, and compare them to one another, using the index value to find the energy of each ligand, keeping the one with the lowest energy. However, the following loop is not working out for me. The program won't finish, it just keeps running until I cancel it manually. I'm assuming this is due to an error in the structure of my loop.
for item in ligandList:
for i in ligandList:
if ligandList.index(item) != ligandList.index(i):
if ( item == i ) :
if float(lineList[ligandList.index(i)][42]) < float(lineList[ligandList.index(item)][42]):
lineList.remove(ligandList.index(item))
else:
lineList.remove(ligandList.index(i))
As you can see, I've created a separate ligandList containing the ligands, and am using the current index of that list to access the energy values in the lineList.
Does anyone know why this isn't working?
It is a bit hard to answer without some actual data to play with, but I hope this works, or at least leads you into the right direction:
for idx1, item1 in enumerate(ligandList):
for idx2, item2 in enumerate(ligandList):
if idx1 == idx2: continue
if item1 != item2: continue
if float(lineList[idx1][42]) < float(lineList[idx2][42]):
del lineList [idx1]
else:
del lineList [idx2]
That’s a really inefficient way of doing things. Lots of index calls. It might just feel infinite because it’s slow.
Zip your related things together:
l = zip(ligandList, lineList)
Sort them by “ligand” and “energy”:
l = sorted(l, key=lambda t: (t[0], t[1][42]))
Grab the first (lowest) “energy” for each:
l = ((lig, lin[1].next()[1]) for lig, lin in itertools.groupby(l, key=lambda t: t[0]))
Yay.
result = ((lig, lin[1].next()[1]) for lig, lin in itertools.groupby(
sorted(zip(ligandList, lineList), key=lambda t: (t[0], t[1][42])),
lambda t: t[0]
))
It would probably look more flattering if you made lineList contain classes of some kind.
Demo
You look like you're trying to find the element in ligandList with the smallest value in index 42. Let's just do that....
min(ligandList, key=lambda x: float(x[42]))
If these "Ligands" are something you use regularly, STRONGLY consider writing a class wrapper for them, something like:
class Ligand(object):
def __init__(self,lst):
self.attr_name = lst[index_of_attr] # for each attribute
... # for each attribute
... # etc etc
self.energy = lst[42]
def __str__(self):
"""This method defines what the class looks like if you call str() on
it, e.g. a call to print(Ligand) will show this function's return value."""
return "A Ligand with energy {}".format(self.energy) # or w/e
def transmogfiscate(self,other):
pass # replace this with whatever Ligands do, if they do things...
In which case you can simply create a list of the Ligands:
ligands = [Ligand(ligand) for ligand in ligandList]
and return the object with the smallest energy:
lil_ligand = min(ligands, key=lambda ligand: ligand.energy)
As a huge aside, PEP 8 encourages the use of the lowercase naming convention for variables, rather than mixedCase as many languages use.

Do I have to cause an ValueError in Python

I have this code:
chars = #some list
try:
indx = chars.index(chars)
except ValueError:
#doSomething
else:
#doSomethingElse
I want to be able to do this because I don't like knowfully causing Exceptions:
chars = #some list
indx = chars.index(chars)
if indx == -1:
#doSomething
else:
#doSomethingElse
Is there a way I can do this?
Note that the latter approach is going against the generally accepted "pythonic" philosophy of EAFP, or "It is Easier to Ask for Forgiveness than Permission.", while the former follows it.
if element in mylist:
index = mylist.index(element)
# ... do something
else:
# ... do something else
For the specific case where your list is a sequence of single-character strings you can get what you want by changing the list to be searched to a string in advance (eg. ''.join(chars)).
You can then use the .find() method, which does work as you want. However, there's no corresponding method for lists or tuples.
Another possible option is to use a dictionary instead. eg.
d = dict((x, loc) for (loc,x) in enumerate(chars))
...
index = d.get(chars_to_find, -1) # Second argument is default if not found.
This may also perform better if you're doing a lot of searches on the list. If it's just a single search on a throwaway list though, its not worth doing.

Categories