I need to compare the items of the following sequences list:
sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
to:
folder = 'sphere_v002'
and then work on the list items containing folder.
I have a working function for this but I want to improve it.
Current code is:
foundSeq = False
for seq in sequences:
    headName = os.path.splitext(seq.head())[0]
    #Check name added exception for when name has a last underscore
    if headName == folder or headName[:-1] == folder:
        foundSeq = True
        sequence = seq
if not foundSeq:
    ...
My improvement looks like this:
if any(folder in os.path.splitext(seq.head())[0] for seq in sequences):
    print seq
But then I get the following error:
local variable seq referenced before the assignment
How can I get the correct output working with the improved solution?
any returns a Boolean value only, it won't store in a variable seq the element within sequences when your condition is satisfied.
What you can do is use a generator that yields only the matching elements:
def get_seq(sequences, folder):
    for seq in sequences:
        if folder in os.path.splitext(seq.head())[0]:
            yield seq

for seq in get_seq(sequences, folder):
    print seq
You can rewrite this, if you wish, as a generator expression:
for seq in (i for i in sequences if folder in os.path.splitext(i.head())[0]):
    print seq
If the condition is never satisfied, the generator or generator expression will not yield any values and the logic within your loop will not be processed.
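If only the first matching element is needed, the same generator expression can be passed to next() with a default. This is a minimal sketch for Python 3 that assumes the items are plain strings (the original calls seq.head() on some sequence object); the helper name first_match is hypothetical.

```python
def first_match(sequences, folder):
    """Return the first item containing folder, or None if nothing matches."""
    # next() pulls one value from the generator; None is the fallback default.
    return next((seq for seq in sequences if folder in seq), None)


sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
seq = first_match(sequences, 'sphere_v002')
if seq is not None:  # explicit check, since an empty string would also be falsy
    print(seq)
```

Unlike any, this keeps hold of the matching element itself, and the None default marks the "nothing found" case.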
As pointed out by jpp, any just returns a boolean, so if any is not a good solution in this particular case.
As suggested by thebjorn, the most efficient code for us so far uses the filter function.
sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
match = filter(lambda x: 'sphere_v002' == x[:-1] or 'sphere_v002' == x, sequences)
print match
['sphere_v002_']
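Note that under Python 3, filter returns a lazy iterator rather than a list, so printing it directly shows a filter object. A list comprehension with the same condition gives a list on both versions; this sketch is written for Python 3 and reuses the names from the question.

```python
sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
folder = 'sphere_v002'

# Keep items equal to folder, or equal to folder plus one trailing character.
matches = [s for s in sequences if s == folder or s[:-1] == folder]
print(matches)  # ['sphere_v002_']
```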
I am a Python beginner and I am learning with Dataquest.
I want to use a self-defined function in a loop to check, for every item in a list, whether it is a color movie or not, and add the results (True, False) to a list. Right now the function returns only False, and far too many times. Any hints on what I did wrong?
wonder_woman = ['Wonder Woman','Patty Jenkins','Color',141,'Gal Gadot','English','USA',2017]
def is_usa(input_lst):
    if input_lst[6] == "USA":
        return True
    else:
        return False

def index_equals_str(input_lst, index, input_str):
    if input_lst[index] == input_str:
        return True
    else:
        return False
wonder_woman_in_color = index_equals_str(input_str="Color", index=2, input_lst=wonder_woman)
# End of dataquest challenge
# My own try to use the function in a loop and add the results to a list
f = open("movie_metadata.csv", "r")
data = f.read()
rows = data.split("\n")
aufbereitet = []
for row in rows:
    einmalig = row.split(",")
    aufbereitet.append(einmalig)
# print(aufbereitet)
finale_liste = []
for item in aufbereitet:
    test = index_equals_str(input_str="Color", index=2, input_lst=aufbereitet)
    finale_liste.append(test)
print(finale_liste)
Also at pastebin: https://pastebin.com/AESjdirL
I appreciate your help!
The problem is in this line
test = index_equals_str(input_str="Color", index=2, input_lst=aufbereitet)
The input_lst argument should be input_lst=item. Right now you are passing the whole list of lists to your function every time.
The .csv file is not provided but I assume the reading is correct and it returns a list like the one you provided in the first line of your code; in particular, that you are trying to pack the data in a list of lists (the einmalig variable is a list obtained by the row of the csv file, then you append each einmalig you find in another list, aufbereitet).
The problem is not in the function itself but in the parameters you give as inputs: when you do
test = index_equals_str(input_str="Color", index=2, input_lst=aufbereitet)
you should see that the third parameter is not the list for a single movie but the whole list of movies. This means that, on every pass through the loop (n passes, where n is the length of aufbereitet), the function effectively evaluates:
if aufbereitet[2] == "Color":
    return True
else:
    return False
Here aufbereitet[2] is itself a list (the third row), so even if the movie is in color, comparing a list with a string returns False, since they are different types.
To correct the issue just change the line
test = index_equals_str(input_str="Color", index=2, input_lst=aufbereitet)
with
test = index_equals_str(input_str="Color", index=2, input_lst=item)
since, when you use the for loop in that way, the variable item changes at each iteration with the elements in aufbereitet.
Note that while you are learning it is perfectly fine to use functions, but you can also write the check inline (concise expressions like this are something Python is known for). Using
finale_liste = [item[2] == "Color" for item in aufbereitet]
you obtain the list without defining a function or writing an explicit for loop. That's called a list comprehension.
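One thing to watch with the list comprehension: splitting the file contents on "\n" leaves an empty last row when the file ends with a newline, and item[2] would then raise an IndexError. A minimal sketch using the csv module, with two made-up rows held in memory standing in for movie_metadata.csv (the second title and the column layout are assumptions for illustration):

```python
import csv
import io

# Hypothetical stand-in for the contents of movie_metadata.csv.
sample = io.StringIO(
    "Wonder Woman,Patty Jenkins,Color,141\n"
    "Some Old Film,Jane Doe,Black and White,95\n"
)

# csv.reader splits each line into fields; 'if row' skips blank lines.
rows = [row for row in csv.reader(sample) if row]

# One boolean per movie: is the third column "Color"?
finale_liste = [row[2] == "Color" for row in rows]
print(finale_liste)  # [True, False]
```

With a real file, the io.StringIO object would be replaced by the open file handle.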
Another thing you can do to make the code more Pythonic - if you want to use the functions anyway - is to do something like
def index_equals_str(input_lst, index, input_str):
    return input_lst[index] == input_str
which gives the same result in fewer lines.
Functional programming is sometimes more readable and adaptable for such tasks:
from functools import partial
def index_equals_str(input_lst, index=1, input_str='Null'):
    return input_lst[index] == input_str

input_data = [['Name1', 'Category1', 'Color', 'Language1'],
              ['Name2', 'Category2', 'BW', 'Language2']]
result = list(map(partial(index_equals_str, input_str='Color', index=2), input_data))
# output
# [True, False]
I am having issues creating a function that takes a list of tuples and returns one string made of the first character of each string in each tuple. Below is my current code, but nothing is happening; I do not get a syntax error. Any help would be appreciated.
lst_of_tups = ([('hello', 'all'), ('music', 'playing'), ('celebration', 'station')])
def build_string(lst_of_tups):
    final_str = ""
    for tup in list_of_tups:
        for item in tup:
            final_str = final_str + item[0]
    return final_str
    print build_string
**** expected output: hampcs****
Those string manipulation functions are error-prone: they define lots of variables, can return within inner loops, have unexpected side-effects...
Once you're used to list comprehensions, you can write such programs easily and with good performance (repeated string concatenation is slow). One way:
def build_string(lst_of_tups):
    return "".join([x[0] for y in lst_of_tups for x in y])
Basically, it's just two loops (flattening the data) within a list comprehension to extract the first character of every string, joined together using str.join to build the result string.
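For reference, a runnable Python 3 version of the same idea, using the sample data from the question. The "if word" guard against empty strings is an addition that is not in the original code.

```python
def build_string(lst_of_tups):
    """Join the first character of every string in every tuple."""
    # The outer loop walks the tuples, the inner loop walks the strings;
    # 'if word' skips empty strings, which have no first character.
    return "".join(word[0] for tup in lst_of_tups for word in tup if word)


lst_of_tups = [('hello', 'all'), ('music', 'playing'), ('celebration', 'station')]
print(build_string(lst_of_tups))  # hampcs
```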
Once you reach a return statement in a function, the function ends for good. The line
print build_string
cannot be reached. (Another problem is that the loop iterates over list_of_tups, which is not defined; the parameter is named lst_of_tups.)
Use your function like this:
result = build_string(lst_of_tups) # calls your function and puts the return value in the result variable
print result # print the result
Of course, the intermediary variable result is not necessary, you could just issue print build_string(lst_of_tups) as well.
def mutations (list_a,string1,name,list_b):
    """ (list of str, str, list of str, list of str) -> NoneType
    """
    dna=list_a
    for i in range(len(list_b)):
        strand=dna[:dna.index(list_b[i])]
        string1=string1[string1.index(list_b[i]):]
        dna[strand+string1]
>>>dna=['TGCAGAATTCGGTT','ACGTCCCGGGTTGC']
>>>mutations(dna,'CCCGGGGAATTCTCGC',['EcoRI','SmaI'],['GAATTC','CCCGGG'])
>>>mutated
>>>['TGCAGAATTCTCGC','ACGTCCCGGGGAATTCTCGC']
It's supposed to modify the first parameter. So basically I'm trying to modify list_a and make it change to ['TGCAGAATTCTCGC','ACGTCCCGGGGAATTCTCGC']; however, I get an error saying
strand=dna[:dna.index(string1[i])].
ValueError: 'GAATTC' is not in list
Also, is there a way to leave the list unmodified if the sequence does not exist?
Well if I understand you correctly, you want to check each element in list_a if it contains its corresponding element from list_b. If so, you want to modify the element from list_a by replacing the rest of the string (including the list_b element) with part of a control string that does also contain the element from list_b, right?!
Ideally you would put this in your question!!
A way of doing this would be as follows:
def mut(list_a, control, list_b):
    check_others = False
    for i in range(len(list_a)): # run through list_a (use xrange in python 2.x)
        if i == len(list_b): # once we are past the end of list_b we will
                             # check all elements of list_b (see below)
            check_others = True
        if not check_others: # this is the normal 1 to 1 match.
            if list_b[i] in list_a[i]: # if the element from list_b is in it
                # correct the element
                list_a[i] = list_a[i][:list_a[i].index(list_b[i])] +\
                            control[control.index(list_b[i]):]
        else: # this happens if we are at the end of list_b
            for j in range(len(list_b)): # run through list_b from the start
                if list_b[j] in list_a[i]:
                    list_a[i] = list_a[i][:list_a[i].index(list_b[j])] +\
                                control[control.index(list_b[j]):]
                    break # only the first match with an element in list_b is used!
As described in the comments, dna is a list, not a string, so finding a substring won't quite work how you want.
dna=list_a is unnecessary, and dna[strand+string1] doesn't modify a list, so not sure what you were trying to accomplish there.
All in all, I know the following code doesn't get the output you are expecting (or maybe it does), but hopefully it sets you on the more correct path.
(I removed name because it was not used)
def mutations (mutated,clean,recognition):
    """ (list of str, str, list of str) -> NoneType
    """
    # Loop over 'mutated' list. We need the index to update the list
    for i,strand in enumerate(mutated):
        # Loop over 'recognition' list
        for rec in recognition:
            # Find the indices in the two strings
            strand_idx = strand.find(rec)
            clean_idx = clean.find(rec)
            # Check that 'rec' existed in both strings
            # (find returns -1 when missing; 0 is a valid match at the start)
            if strand_idx >= 0 and clean_idx >= 0:
                # both are found, so get the substrings
                strand_str = strand[:strand_idx]
                clean_str = clean[clean_idx:]
                # debug these values
                print(rec, (strand_idx, strand_str,), (clean_idx, clean_str, ))
                # updated 'mutated' like this
                mutated[i] = strand_str+clean_str
And, the output. (Both dna elements changed)
dna=['TGCAGAATTCGGTT','ACGTCCCGGGTTGC']
mutations(dna,'CCCGGGGAATTCTCGC',['GAATTC','CCCGGG'])
print(dna) # ['TGCAGAATTCTCGC', 'ACGTCCCGGGGAATTCTCGC']
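A variant that returns a new list instead of modifying its argument, and leaves a strand untouched when no recognition site is present in both strings, might look like the sketch below (Python 3). The helper names splice_at_site and mutated_copy are hypothetical. It relies on str.find returning -1 for a miss and 0 for a match at the very start of the string.

```python
def splice_at_site(strand, clean, site):
    """Replace strand from site onward with clean from site onward."""
    strand_idx = strand.find(site)
    clean_idx = clean.find(site)
    if strand_idx == -1 or clean_idx == -1:
        return strand  # site missing from one of the strings: no change
    return strand[:strand_idx] + clean[clean_idx:]


def mutated_copy(dna, clean, sites):
    """Return a new list; each strand is spliced at its first usable site."""
    result = []
    for strand in dna:
        for site in sites:
            if site in strand and site in clean:
                strand = splice_at_site(strand, clean, site)
                break  # only the first matching site is applied
        result.append(strand)
    return result


dna = ['TGCAGAATTCGGTT', 'ACGTCCCGGGTTGC']
print(mutated_copy(dna, 'CCCGGGGAATTCTCGC', ['GAATTC', 'CCCGGG']))
# ['TGCAGAATTCTCGC', 'ACGTCCCGGGGAATTCTCGC']
```

Because the input list is not changed, the original strands stay available for comparison.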
I have a list with files (the path to them).
I wrote a function like this to remove certain files matching a pattern but it just removes 2 files at most and I don't understand why.
remove_list = ('*.txt',) # Example for removing all .txt files in the list
def removal(list):
    for f in list:
        if any(fnmatch(basename(f.lower()), pattern) for pattern in remove_list):
            list.remove(f)
    return list
//Edit; Ok naming my list "list" in the code was a bad idea. in my code here its called differently. Just wanted to give an abstract idea what I'm dealing with. Should have mentioned that
Modifying a list while you're iterating over it is a bad idea: removing an item shifts the remaining items one position to the left, so the loop skips the item that follows each removal.
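The skipping is easy to reproduce with a small example. This sketch (Python 3, with made-up file names and hypothetical function names) removes every .txt entry, first while iterating over the list itself and then while iterating over a copy.

```python
def remove_in_place(files):
    # Buggy on purpose: each removal shifts the list, so the next item is skipped.
    for f in files:
        if f.endswith(".txt"):
            files.remove(f)
    return files


def remove_over_copy(files):
    # files[:] is a shallow copy, so the loop is not affected by the removals.
    for f in files[:]:
        if f.endswith(".txt"):
            files.remove(f)
    return files


print(remove_in_place(["a.txt", "b.txt", "c.txt", "d.csv"]))   # ['b.txt', 'd.csv']
print(remove_over_copy(["a.txt", "b.txt", "c.txt", "d.csv"]))  # ['d.csv']
```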
The best way to do what you want is to build a new list without the items you don't want:
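from fnmatch import fnmatch
from os.path import basename

remove_list = (r'*.txt',) # Example for removing all .txt files in the list
def removal(l, rm_list):
    for f in l:
        for pattern in rm_list:
            if fnmatch(basename(f.lower()), pattern):
                break # f matches a pattern, so drop it
        else:
            yield f # no pattern matched, so keep f
print(list(removal(list_with_files, remove_list)))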
Here, I'm unrolling your any one-liner that might make your code look smart, but is hard to read, and might give you headaches in six months. It's better (because more readable) to do a simple for and an if instead!
The yield keyword will make the function return what's called a generator in python, so that when you're iterating over the result of the function, it will return the value, to make it available to the calling context, and then get back to the function to return the next item.
This is why in the print statement, I use list() around the function call, whereas if you iterate over it, you don't need to put it in a list:
for elt in removal(list_with_files, remove_list):
    print(elt)
If you don't like using a generator (and the yield statement), then you have to build the list manually, before returning it:
remove_list = (r'*.txt',) # Example for removing all .txt files in the list
def removal(l, rm_list):
    ret_list = []
    for f in l:
        for pattern in rm_list:
            if fnmatch(basename(f.lower()), pattern):
                break # f matches a pattern, so drop it
        else:
            ret_list.append(f) # no pattern matched, so keep f
    return ret_list
HTH
You can use str.endswith if you are removing based on extension, you just need to pass a tuple of extensions:
remove_tup = (".txt",".py") # Example for removing all .txt and .py files in the list
def removal(lst):
    return [f for f in lst if not f.endswith(remove_tup)]
The code you provided is vague.
1. Don't use list as a name; it shadows the built-in list.
2. Don't modify the list while you iterate over it; you can iterate over a copy instead.
My suggestion is:
You can iterate over your original list and check each item against the remove_list, as below:
test.py
list1=["file1.txt", "file2.txt", "other.csv"]
list2=["file1.txt", "file2.txt"] # simulates your remove_list
listX = [x for x in list1 if x not in list2] # creates a new list
print listX
$python test.py
['other.csv']
As was said in the comments, don't modify a list as you iterate over it. Can also use a list comprehension like so:
patterns = ('*.txt', '*.csv')
good = [f for f in all_files if not any(fnmatch(basename(f.lower()), pattern) for pattern in patterns)]
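A self-contained version of this comprehension, with the imports it needs, could look like the following sketch (Python 3). The function name keep_unmatched and the sample paths are made up for illustration.

```python
from fnmatch import fnmatch
from os.path import basename


def keep_unmatched(paths, patterns):
    """Return a new list without the paths whose file name matches a pattern."""
    return [
        p for p in paths
        # Lower-case the file name so '*.txt' also catches 'NOTES.TXT'.
        if not any(fnmatch(basename(p).lower(), pattern) for pattern in patterns)
    ]


all_files = ["/data/notes.TXT", "/data/table.csv", "/data/script.py"]
good = keep_unmatched(all_files, ("*.txt", "*.csv"))
print(good)  # ['/data/script.py']
```

Since a new list is built, the original all_files list is left unchanged and nothing is skipped.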
I am stumped with this problem, and no matter how I get around it, it is still giving me the same result.
Basically, supposedly I have 2 groups - GrpA_null and GrpB_null, each having 2 meshes in them and are named exactly the same, brick_geo and bars_geo
- Result: GrpA_null --> brick_geo, bars_geo
But for some reason, in the code below which I presume is the one giving me problems, when it is run, the program states that GrpA_null has the same duplicates as GrpB_null, probably they are referencing the brick_geo and bars_geo. As soon as the code is run, my children geo have a numerical value behind,
- Result: GrpA_null --> brick_geo0, bars_geo0, GrpB_null1 --> brick_geo, bars_geo1
And so, I tried to modify the code so that, as long as the parents (GrpA_null and GrpB_null) are different, it does not touch the children.
Could someone kindly advise me on it?
def extractDuplicateBoxList(self, inputs):
    result = {}
    for i in range(0, len(inputs)):
        print '<<< i is : %s' %i
        for n in range(0, len(inputs)):
            print '<<< n is %s' %n
            if i != n:
                name = inputs[i].getShortName()
                # Result: brick_geo
                Lname = inputs[i].getLongName()
                # Result: |GrpA_null|concrete_geo
                if name == inputs[n].getShortName():
                    # If list already created as result.
                    if result.has_key(name):
                        # Make sure its not already in the list and add it.
                        alreadyAdded = False
                        for box in result[name]:
                            if box == inputs[i]:
                                alreadyAdded = True
                        if alreadyAdded == False:
                            result[name].append(inputs[i])
                    # Otherwise create a new list and add it.
                    else:
                        result[name] = []
                        result[name].append(inputs[i])
    return result
There are a couple of things you may want to be aware of. First and foremost, indentation matters in Python. I don't know if the indentation of your code as is is as intended, but your function code should be indented further in than your function def.
Secondly, I find your question a little difficult to understand. But there are several things which would improve your code.
In the collections module, there is (or should be) a type called defaultdict. This type is similar to a dict, except for it having a default value of the type you specify. So a defaultdict(int) will have a default of 0 when you get a key, even if the key wasn't there before. This allows the implementation of counters, such as to find duplicates without sorting.
from collections import defaultdict
counter = defaultdict(int)
for item in items:
counter[item] += 1
This brings me to another point. Python for loops implement a for-each structure. You almost never need to enumerate your items in order to then access them. So, instead of
for i in range(0,len(inputs)):
you want to use
for input in inputs:
and if you really need to enumerate your inputs
for i,input in enumerate(inputs):
Finally, you can iterate and filter through iterable objects using list comprehensions, dict comprehensions, or generator expressions. They are very powerful. See Create a dictionary with list comprehension in Python
Try this code out, play with it. See if it works for you.
from collections import defaultdict
def extractDuplicateBoxList(self, inputs):
    counts = defaultdict(int)
    for input in inputs:
        counts[input.getShortName()] += 1
    dup_shns = set([k for k,v in counts.items() if v > 1])
    # use the comprehension variable i here, not the leftover loop variable
    dups = [i for i in inputs if i.getShortName() in dup_shns]
    return dups
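To keep the dictionary shape of the original function (short name mapped to the list of objects sharing it), the items can be grouped with defaultdict(list) and then filtered. This is a sketch for Python 3; the Box class is a hypothetical stand-in for the scene objects, assumed only to provide getShortName() and getLongName() as in the question.

```python
from collections import defaultdict


class Box:
    """Hypothetical stand-in for the objects passed as inputs."""

    def __init__(self, long_name):
        self.long_name = long_name

    def getLongName(self):
        return self.long_name

    def getShortName(self):
        # The short name is the last component of the '|'-separated path.
        return self.long_name.rsplit("|", 1)[-1]


def extract_duplicates(inputs):
    """Map each short name used more than once to the objects sharing it."""
    groups = defaultdict(list)
    for box in inputs:
        groups[box.getShortName()].append(box)
    return {name: boxes for name, boxes in groups.items() if len(boxes) > 1}


boxes = [Box("|GrpA_null|brick_geo"), Box("|GrpA_null|bars_geo"),
         Box("|GrpB_null|brick_geo"), Box("|GrpB_null|only_geo")]
dups = extract_duplicates(boxes)
print({name: [b.getLongName() for b in group] for name, group in dups.items()})
```

This makes a single pass over inputs instead of comparing every pair.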
I was about to write the same remarks as bitsplit, but he has already done it.
So for the moment I'll just give you code that I think does exactly the same as yours, based on these remarks and on the dictionary's get method:
from collections import defaultdict
def extract_Duplicate_BoxList(self, inputs):
    result = defaultdict(list)
    for i,A in enumerate(inputs):
        print '<<< i is : %s' %i
        name = A.getShortName() # Result: brick_geo
        Lname = A.getLongName() # Result: |GrpA_null|concrete_geo
        for n in (j for j,B in enumerate(inputs)
                  if j!=i and B.getShortName()==name):
            print '<<< n is %s' %n
            if A not in result.get(name,[]):
                result[name].append(A)
    return result
Secondly, as bitsplit said, I find your question hard to understand.
Could you give more information on the elements of inputs ?
Your explanations about GrpA_null and GrpB_null and the names and the meshes are unclear.
EDIT:
If my reduction/simplification is correct, what your code essentially does is compare pairs of elements A and B of inputs (with A != B) and record A in the dictionary result under its short name (only once) if A and B have the same short name.
I think this code can still be reduced to just:
def extract_Duplicate_BoxList(inputs):
    result = defaultdict(list)
    for i,A in enumerate(inputs):
        print '<<< i is : %s' %i
        # groups every element by short name; keys holding a single
        # element are not duplicates
        result[A.getShortName()].append(A)
    return result
This may do what you're looking for, if I understand it: it seems to be comparing the sub-hierarchies of different nodes to see if they have the same names.
import maya.cmds as cmds
def child_nodes(node):
    ''' returns a set with the relative paths of all <node>'s children'''
    root = cmds.ls(node, l=True)[0]
    children = cmds.listRelatives(node, ad=True, f=True)
    return set( [k[len(root):] for k in children])

child_nodes('group1')
# Result: set([u'|pCube1|pCubeShape1', u'|pSphere1', u'|pSphere1|pSphereShape1', u'|pCube1']) #
# note the returns are NOT valid maya paths, since i've removed the root <node>,
# you'd need to add it back in to actually access a real shape here:
all_kids = child_nodes('group1')
real_children = ['group1' + n for n in all_kids ]
Since the returns are sets, you can test to see if they are equal, see if one is a subset or superset of the other, see what they have in common and so on:
# compare children
child_nodes('group1') == child_nodes('group2')
# group1 contains everything in group2 (group2 is a subset):
child_nodes('group1').issuperset(child_nodes('group2'))
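The set operations themselves can be tried outside Maya. The sketch below (Python 3) does not call any Maya API: the helper relative_paths is hypothetical and the path strings are made up, standing in for what listRelatives would return with full paths.

```python
def relative_paths(root, full_paths):
    """Strip the root prefix from every path that lives under root."""
    return {p[len(root):] for p in full_paths if p.startswith(root + "|")}


group1 = relative_paths("|group1", ["|group1|pCube1",
                                    "|group1|pCube1|pCubeShape1",
                                    "|group1|pSphere1"])
group2 = relative_paths("|group2", ["|group2|pCube1",
                                    "|group2|pCube1|pCubeShape1"])

same = group1 == group2                # identical child hierarchies?
contains = group1.issuperset(group2)   # does group1 have everything group2 has?
shared = group1 & group2               # relative paths present in both
only_in_group1 = group1 - group2       # relative paths missing from group2
print(same, contains, sorted(shared), sorted(only_in_group1))
```

Here the two groups differ only by the extra sphere under group1, which shows up in the set difference.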
Iterating over a bunch of nodes is easy:
# collect all the child sets of a bunch of nodes:
kids = dict ( (k, child_nodes(k)) for k in cmds.ls(*nodes))