Conditional copying of files in Python

So I'm trying to copy files to another directory if their filename starts with the same 4-digit ID as one of the values in my list.
I'm either getting the wrong data written to the file or nothing at all.
What I have so far:
import shutil
import os
ok_ids = [5252,
8396,
8397,
8397,
8556,
8004,
6545,
6541,
4392,
4392,
6548,
1363,
1363,
1363,
8489,
8652,
1368,
1368]
source = os.listdir("/Users/amm/Desktop/mypath1/")
destination = "/Users/amm/Desktop/mypath2/"
for files in source:
    for x in ok_ids:
        if files[:4] == x:
            shutil.copy(files, destination)
        else:
            print("not working")
A sample of the files I'm trying to copy, i.e. the source directory:
0000_051123_192805.txt
0000_051123_192805.txt
8642_060201_113220.txt
8652_060204_152839.txt
8652_060204_152839.txt
309-_060202_112353.txt
x104_051203_064013.txt
The destination directory stays blank.
A few important things: ok_ids does not contain distinct values, but I'd like the program to treat the list as if it does. For example, 8397 appears in the ok_ids list twice and doesn't need to be iterated over twice in the ok_ids loop (it's a very long list and I don't fancy editing it). source can often contain duplicate IDs too; using the example above these are 0000 and 8652, but the rest of each filename is different.
So in summary: if 0000 is in my ok_ids list and there are filenames beginning with 0000 in my source directory, then I want to copy them into my destination folder.
I've looked at using .startswith, but it's not happy using a list as the argument, even if I cast it to a tuple and then a str. Any help would be amazing.
UPDATE
Could the reason this isn't working be that some of the IDs contain a hyphen, and others start with the character x rather than an int value?
The first 4 characters are the ID; for example, these are still valid:
309-_060202_112353.txt
x104_051203_064013.txt

This should work:
for file in source:
    for x in set(ok_ids):
        if file.startswith(str(x)):
            shutil.copy(file, destination)
set() makes the IDs unique and str() converts each one to a string. Better still, you can preprocess the list into a set of strings once, outside the loops, for better performance.
Or better yet, given your naming constraints:
if int(file.split("_")[0]) in ok_ids:
Why doesn't your code work?
if files[:4] == x:
You're comparing a str with an int, which will always compare as False.
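Putting the two points together (compare strings with strings, and build the set of allowed prefixes only once), here is a minimal sketch using the paths from the question; it assumes ok_ids is the list you already have:
import os
import shutil

src = "/Users/amm/Desktop/mypath1/"      # paths taken from the question
dst = "/Users/amm/Desktop/mypath2/"
ok_prefixes = {str(x) for x in ok_ids}   # deduplicate once, as strings

for name in os.listdir(src):
    if name[:4] in ok_prefixes:          # string-to-string comparison
        shutil.copy(os.path.join(src, name), dst)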

import os
import shutil

for root, dirs, files in os.walk("/Users/amm/Desktop/mypath1/"):
    for file in files:
        try:
            if int(file[:4]) in ok_ids:
                # join with root so the copy works from any working directory
                shutil.copy(os.path.join(root, file), destination)
        except ValueError:
            pass
This worked for me. The only catch is that it also crawls every subfolder of the source directory.
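If you only want the top-level directory and not its subfolders, one option (a sketch, assuming the same ok_ids and destination as above) is to take just the first triple that os.walk yields:
import os
import shutil

# next() gives only the first (root, dirs, files) triple, i.e. the top-level directory.
root, dirs, files = next(os.walk("/Users/amm/Desktop/mypath1/"))
for name in files:
    if name[:4].isdigit() and int(name[:4]) in ok_ids:
        shutil.copy(os.path.join(root, name), destination)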

Your code works for me with the slight modification of str(x) instead of x.
Try using this to see what it is doing with each file:
for files in source:
    for x in ok_ids:
        if files[:4] == str(x):
            print("File '{}' matched".format(files))
            break
    else:
        print("File '{}' not matched".format(files))
Or, alternatively, convert all the items in ok_ids to strings and then see what this produces:
ok_ids = [str(id) for id in ok_ids]
files_matched = [file for file in source if file[:4] in ok_ids]
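If it helps, the matched names can then be copied in one more pass; a small sketch, assuming the source path from the question:
import os
import shutil

src = "/Users/amm/Desktop/mypath1/"   # source path from the question
for name in files_matched:
    shutil.copy(os.path.join(src, name), destination)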

files[:4] == x can never be true because x is an integer and files[:4] is a string. It does not matter if the string representation of x matches:
>>> 123 == '123'
False
I've looked at using .startswith, but it's not happy using a list as the argument, even if I cast it to a tuple and then a str. Any help would be amazing.
This is arguably the best way to solve the problem, but you don't just need a tuple - you need the individual ID values to be strings. There is no possible "cast" (they are not really casts) you can perform on ok_ids that affects the elements.
The simplest way to do that is to make a tuple in the first place, and have the elements of the tuple be strings in the first place:
ok_ids = (
    '5252',
    '8396',
    # ...
    '1368',
)
If you do not control this data, you can use a generator expression passed to tuple to create the tuple:
ok_ids = tuple(str(x) for x in ok_ids)
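With the IDs stored as strings, str.startswith accepts the whole tuple at once, so the inner loop disappears entirely. A minimal sketch using the directories from the question:
import os
import shutil

src = "/Users/amm/Desktop/mypath1/"   # directories from the question
dst = "/Users/amm/Desktop/mypath2/"

for name in os.listdir(src):
    if name.startswith(ok_ids):       # ok_ids is now a tuple of strings
        shutil.copy(os.path.join(src, name), dst)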

Manipulate the output of if any for loop

I need to compare the following sequences list items:
sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
to:
folder = 'sphere_v002'
and then work on the list items containing folder.
I have a working function for this but I want to improve it.
Current code is:
foundSeq = False
for seq in sequences:
    headName = os.path.splitext(seq.head())[0]
    # Check name; added exception for when the name has a trailing underscore
    if headName == folder or headName[:-1] == folder:
        foundSeq = True
        sequence = seq
if not foundSeq:
    ...
My improvement looks like this:
if any(folder in os.path.splitext(seq.head())[0] for seq in sequences):
    print seq
But then I get the following error:
local variable 'seq' referenced before assignment
How can I get the correct output working with the improved solution?
any returns a Boolean value only; it won't store the matching element of sequences in a variable seq when your condition is satisfied.
What you can do is use a generator and utilize the fact that None is "falsy":
def get_seq(sequences, folder):
    for seq in sequences:
        if folder in os.path.splitext(seq.head())[0]:
            yield seq

for seq in get_seq(sequences, folder):
    print seq
You can rewrite this, if you wish, as a generator expression:
for seq in (i for i in sequences if folder in os.path.splitext(i.head())[0]):
    print seq
If the condition is never satisfied, the generator or generator expression will not yield any values and the logic inside your loop will not run.
As pointed out by jpp, any just returns a boolean, so if any is not a good solution in this particular case.
As suggested by thebjorn, the most effective code for us so far uses the filter function:
sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
match = filter(lambda x: 'sphere_v002' == x[:-1] or 'sphere_v002' == x, sequences)
print match
['sphere_v002_']
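A related idiom, shown here as a minimal sketch with plain strings rather than the seq.head() objects from the question: next() with a default returns the first matching element directly, without a helper generator function.
sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
folder = 'sphere_v002'

# First element equal to the folder name, with or without a trailing underscore; None if nothing matches.
match = next((seq for seq in sequences if seq == folder or seq[:-1] == folder), None)
print(match)  # -> 'sphere_v002_'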

Pulling data from files after matching against a regex

I have two files that contain hashes, one of them looks something like this:
user_id,user_bio,user_pass,user_name,user_email,user_banned,user_regdate,user_numposts,user_timezone,user_bio_status,user_lastsession,user_newpassword,user_email_public,user_allowviewonline,user_lasttimereadpost
1,<blank>,a1fba56e72b37d0ba83c2ccer7172ec8eb1fda6d,human,human#place.com,0,1115584099,1,2.0,1,1115647107,<blank>,0,1,1115647107
2,<blank>,b404bac52c91ef1f291ba9c2719aa7d916dc55e5,josh,josh#place.com,0,1115584767,1,2.0,5,1115585298,<blank>,0,1,1115585126
3,<blank>,3a5fb7652e4c4319455769d5462eb2c4ac4cbe79,rich,rich#place.com,0,1167079798,1,2.0,5,1167079798,<blank>,0,1,1167079887
The other one, looks something like this:
This is a random assortment 3a5fb7652e4c4319455769d5462eb2c4ac4cbe79 of characters in order 3a5fb7652e4c4319455769d5462eb2c4ac4cbe79 to see if I can find a file 3a5fb7652e4c4319455769d5462eb2c4ac4cbe79 full of hashes
I'm trying to pull the hashes from these files using a regular expression to match the hash:
def hash_file_generator(self):
    def __fix_re_pattern(regex_string, to_add=r""):
        regex_string = list(regex_string)
        regex_string[0] = to_add
        regex_string[-1] = to_add
        return re.compile(''.join(regex_string))

    matched_hashes = set()
    keys = [k for k in bin.verify_hashes.verify.HASH_TYPE_REGEX.iterkeys()]
    with open(self.words) as wordlist:
        for item in wordlist.readlines():
            for s in item.split("\n"):
                for k in keys:
                    k = __fix_re_pattern(k.pattern)
                    print k.pattern
                    if k.findall(s):
                        matched_hashes.add(s)
    return matched_hashes
The regular expression that matches these hashes, looks like this: [a-fA-F0-9]{40}.
However, when this is run, it pulls entire lines from the first file and saves them into the set, while on the second file it works successfully:
First file:
set(['1<blank>,a1fba56e72b37d0ba83c2ccer7172ec8eb1fda6d,human,human#place.com,0,1115584099,1,2.0,1,1115647107,<blank>,0,1,1115647107','2,<blank>,b404bac52c91ef1f291ba9c2719aa7d916dc55e5,josh,josh#place.com,0,1115584767,1,2.0,5,1115585298,<blank>,0,1,1115585126','3,<blank>,3a5fb7652e4c4319455769d5462eb2c4ac4cbe79,rich,rich#place.com,0,1167079798,1,2.0,5,1167079798,<blank>,0,1,1167079887'])
Second file:
set(['3a5fb7652e4c4319455769d5462eb2c4ac4cbe79'])
How can I pull just the matched data from the first file using the regex as seen here, and why is it pulling everything instead of just the matched data?
Edit for comments
def hash_file_generator(self):
    """
    Parse a given file for anything that matches the hashes in the
    hash type regex dict. Possible that this will pull random bytes
    of data from the files.
    """
    def __fix_re_pattern(regex_string, to_add=r""):
        regex_string = list(regex_string)
        regex_string[0] = to_add
        regex_string[-1] = to_add
        return ''.join(regex_string)

    matched_hashes = []
    keys = [k for k in bin.verify_hashes.verify.HASH_TYPE_REGEX.iterkeys()]
    with open(self.words) as hashes:
        for k in keys:
            k = re.compile(__fix_re_pattern(k.pattern))
            matched_hashes = [
                i for line in hashes
                for i in k.findall(line)
            ]
    return matched_hashes
Output:
[]
If you just want to pull the hashes, this should work:
import re

hash_pattern = re.compile("[a-fA-F0-9]{40}")

with open("hashes.txt", "r") as hashes:
    matched_hashes = [i for line in hashes
                      for i in hash_pattern.findall(line)]

print(matched_hashes)
Note that this doesn't match some of what look like hashes because they contain, for example, an 'r', but it uses your specified regex.
The way this works is by using re.findall, which returns a list of strings, one per match, and a list comprehension to apply it to each line of the file.
When hashes.txt is
user_id,user_bio,user_pass,user_name,user_email,user_banned,user_regdate,user_numposts,user_timezone,user_bio_status,user_lastsession,user_newpassword,user_email_public,user_allowviewonline,user_lasttimereadpost
1,<blank>,a1fba56e72b37d0ba83c2ccer7172ec8eb1fda6d,human,human#place.com,0,1115584099,1,2.0,1,1115647107,<blank>,0,1,1115647107
2,<blank>,b404bac52c91ef1f291ba9c2719aa7d916dc55e5,josh,josh#place.com,0,1115584767,1,2.0,5,1115585298,<blank>,0,1,1115585126
3,<blank>,3a5fb7652e4c4319455769d5462eb2c4ac4cbe79,rich,rich#place.com,0,1167079798,1,2.0,5,1167079798,<blank>,0,1,1167079887
this has the output
['b404bac52c91ef1f291ba9c2719aa7d916dc55e5', '3a5fb7652e4c4319455769d5462eb2c4ac4cbe79']
Having looked at your code as it stands, I can tell you one thing: __fix_re_pattern probably isn't doing what you want it to. It currently removes the first and last character of any regex you pass it, which will ironically and horribly mangle the regex.
def __fix_re_pattern(regex_string, to_add=r""):
    regex_string = list(regex_string)
    regex_string[0] = to_add
    regex_string[-1] = to_add
    return ''.join(regex_string)

print(__fix_re_pattern("[a-fA-F0-9]{40}"))
will output
a-fA-F0-9]{40
I'm still missing a lot of context in your code, and it's not modular enough to run in isolation, so I can't reconstruct it to reproduce the problem and am left troubleshooting by eye. Presumably this is an instance method of an object which has the attribute words, which for some reason contains a file name. I can't really tell what keys is, for example, so I'm still finding it difficult to provide you with an entire 'fix'. I also don't know what the intention behind __fix_re_pattern is, but I think your code would work fine if you just took it out entirely.
Another problem is that for each k in whatever keys is, you overwrite the variable matched_hashes, so you return only the matched hashes for the last key. Worse, the file object hashes is exhausted after the first key's comprehension, so every later comprehension iterates over nothing and produces an empty list, which is why you end up with [].
Also the whole keys thing kind of intrigues me. Is it a call to some kind of globally defined function/module/class which knows about hash regexes?
Now you probably know best what your code wants, but it nevertheless seems a little complicated.. I'd advise you to keep in the back of your mind that my first answer, as it stands, also entirely meets the specification of your question.
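To make that concrete, here is a minimal sketch of the restructured loop. extract_hashes is a hypothetical standalone helper (not your method), and patterns stands in for whatever compiled expressions HASH_TYPE_REGEX provides:
import re

def extract_hashes(path, patterns):
    """Return the set of substrings in the file at `path` matched by any pattern."""
    matched = set()
    with open(path) as handle:
        lines = handle.readlines()                   # read once; a file object is exhausted after a single pass
    for pattern in patterns:
        for line in lines:
            matched.update(pattern.findall(line))    # keep only the matched substrings, not whole lines
    return matched

# Example with the SHA-1-shaped pattern from the question:
print(extract_hashes("hashes.txt", [re.compile(r"[a-fA-F0-9]{40}")]))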

How can I add a mark at every second index in a string

def Change(_text):
    L = len(_text)
    _i = 2
    _text[_i] = "*"
    _i += 2
    print(_text)
How can I add a mark, e.g. *, at every second index in a string?
Why are you using _ in your variable names? If it is for any of these reasons then you are OK; if it is made-up syntax, try not to use it, as it might cause unnecessary confusion.
As for your code, try:
def change_text(text):
    for i in range(len(text)):
        if i % 2 == 0:  # check if i = even (not odd)
            print(text[:i] + "*" + text[i+1:])
When you run change_text("tryout string") the output will look like:
*ryout string
tr*out string
tryo*t string
tryout*string
tryout s*ring
tryout str*ng
tryout strin*
If you meant something else, give an example input and the desired output.
See How to create a Minimal, Complete, and Verifiable example
PS: Please realize that strings are immutable in Python, so you cannot actually change a string, only create new ones from it. If you want to actually change it, you might be better off saving it as a list, for example, like they have done here.
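A minimal sketch of that list-based approach (mark_every_two is a made-up name): convert to a list, overwrite every second index, and join back:
def mark_every_two(text):
    chars = list(text)                  # strings are immutable, so work on a list copy
    for i in range(2, len(chars), 2):   # every second index, starting at 2 like in the question
        chars[i] = "*"
    return "".join(chars)

print(mark_every_two("tryout string"))  # -> tr*o*t*s*r*n*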
Are you trying to separate every two letters with an asterisk?
testtesttest
te*st*te*st*te*st
You could do this using itertools.zip_longest to split the string up, and '*'.join to rebuild it with the markers inserted
from itertools import zip_longest
def add_marker(s):
    return '*'.join([''.join(x) for x in zip_longest(*[iter(s)]*2, fillvalue='')])
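A quick check of the grouping trick (two references to the same iterator make zip_longest pair up consecutive characters):
print(add_marker("testtesttest"))  # -> te*st*te*st*te*st
print(add_marker("tryout"))        # -> tr*yo*ut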

remove certain files in a list matching a pattern

I have a list of files (their paths).
I wrote a function like this to remove certain files matching a pattern but it just removes 2 files at most and I don't understand why.
from fnmatch import fnmatch
from os.path import basename

remove_list = ('*.txt',)  # Example for removing all .txt files in the list

def removal(list):
    for f in list:
        if any(fnmatch(basename(f.lower()), pattern) for pattern in remove_list):
            list.remove(f)
    return list
Edit: OK, naming my list "list" in the code was a bad idea; in my actual code it's called differently. I just wanted to give an abstract idea of what I'm dealing with. Should have mentioned that.
Modifying a list while you're iterating over it is a bad idea, as you can very easily run into edge cases where the behaviour is not what you expect.
The best way to do what you want is to build a new list without the items you don't want:
remove_list = (r'*.txt',)  # Example for removing all .txt files in the list

def removal(l, rm_list):
    for f in l:
        for pattern in rm_list:
            if fnmatch(basename(f.lower()), pattern):
                break          # f matches a pattern: drop it
        else:
            yield f            # no pattern matched: keep it

print(list(removal(list_with_files, remove_list)))
Here, I'm unrolling your any one-liner that might make your code look smart, but is hard to read and might give you headaches in six months. It's more readable to use a plain for with an if (plus a for…else so a file is only kept when no pattern matched it).
The yield keyword makes the function return what's called a generator in Python: when you iterate over the result, each value is handed to the calling context as it is produced, and execution then resumes inside the function to compute the next item.
This is why in the print statement, I use list() around the function call, whereas if you iterate over it, you don't need to put it in a list:
for elt in removal(list_with_files, remove_list):
    print(elt)
If you don't like using a generator (and the yield statement), then you have to build the list manually, before returning it:
remove_list = (r'*.txt',)  # Example for removing all .txt files in the list

def removal(l, rm_list):
    ret_list = []
    for f in l:
        for pattern in rm_list:
            if fnmatch(basename(f.lower()), pattern):
                break                  # f matches a pattern: drop it
        else:
            ret_list.append(f)         # no pattern matched: keep it
    return ret_list
HTH
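For example, with some hypothetical paths and the same fnmatch and basename imports as in your question:
list_with_files = ['/data/report.txt', '/data/image.png', '/data/notes.TXT']  # hypothetical paths
remove_list = (r'*.txt',)
print(list(removal(list_with_files, remove_list)))  # -> ['/data/image.png']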
You can use str.endswith if you are removing based on extension; you just need to pass a tuple of extensions:
remove_tup = (".txt",".py") # Example for removing all .txt files in the list
def removal(lst):
return [f for f in lst if not f.endswith(remove_tup)]
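For example (note that endswith is case-sensitive, unlike the fnmatch-on-lowercased-basename approach above):
files = ["notes.txt", "script.py", "data.csv"]  # hypothetical names
print(removal(files))                           # -> ['data.csv']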
The code you provided is vague.
1. Don't use list as a name; it shadows the built-in list.
2. Don't modify the list while you iterate over it; you can make a copy of it instead.
My suggestion is:
You can iterate your original list and the remove_list as below:
test.py
list1=["file1.txt", "file2.txt", "other.csv"]
list2=["file1.txt", "file2.txt"] # simulates your remove_list
listX = [x for x in list1 if x not in list2] # creates a new list
print listX
$python test.py
['other.csv']
As was said in the comments, don't modify a list as you iterate over it. You can also use a list comprehension, like so:
patterns = ('*.txt', '*.csv')
good = [f for f in all_files if not any(fnmatch(basename(f.lower()), pattern) for pattern in patterns)]

Replacing characters in renaming

I am trying to build a simple Maya renaming UI but I am stuck at one part: replacing the initial characters in the current name with other characters.
For example, take 3 items in the Outliner (regardless of what they are): pCube1, pSphere1, nurbsSphere1.
So far I am able to write up to the point where it can select and rename one or more objects; see the code below.
objects = []
objects = cmds.ls(sl=True)
for obj in objects:
    test = []
    test = cmds.rename(obj, "pSphere")
print objects
# Results: pSphere, pSphere2, pSphere3 #
However, suppose I now select nurbsSphere1 and pSphere1 and just want to replace the word 'Sphere' in them with 'Circle'. Instead of getting nurbsCircle1 and pCircle1 as results, I get an error message: # TypeError: Too many objects or values. #
charReplace = "test"
if charReplace in objects:
newName = []
newName = cmds.rename(objects, "Circle" )
Any advice on this?
As per the documentation, the rename command takes only strings as input parameters; you are passing it the whole objects list while trying to rename the objects.
Moreover, you are searching for the string "test" in the objects list as a whole.
Instead, you should search for the string "test" in each object name in the objects list.
The rename command swaps the old name for the new one wholesale; it does not replace a substring within a name (e.g. "Sphere" in "nurbsSphere"). To achieve that, you should build the new names yourself and then pass them to rename.
You can try this:
charReplace = "test"
for filename in objects:
if charReplace in filename:
newFilename = filename.replace(charReplace, "Circle")
cmds.rename(filename, newFilename)
I do not have Maya installed so code is not tested.
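As a quick sanity check outside Maya, the name construction itself is plain Python string replacement; with "Sphere" as the substring to swap, the example from the question comes out as expected:
for old in ["nurbsSphere1", "pSphere1"]:
    print(old + " -> " + old.replace("Sphere", "Circle"))
# nurbsSphere1 -> nurbsCircle1
# pSphere1 -> pCircle1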
