Trying to exclude a substring within a list in python

I have a list, phplist, containing strings like the following (this is only a snippet; the full list is much longer):
/home/comradec/public_html/moodle/config.php
/home/comradec/public_html/moodle/cache/classes/config.php
/home/comradec/public_html/moodle/theme/sky_high/config.php
/home/comradec/public_html/moodle/theme/brick/config.php
/home/comradec/public_html/moodle/theme/serenity/config.php
/home/comradec/public_html/moodle/theme/binarius/config.php
/home/comradec/public_html/moodle/theme/anomaly/config.php
/home/comradec/public_html/moodle/theme/standard/config.php
What I am trying to do is keep only the subdir/config.php file and exclude all other config.php files (e.g. cache/classes/config.php).
The full code is:
for folder, subs, files in os.walk(path):
    for filename in files:
        if filename.endswith('.php'):
            phplist.append(abspath(join(folder, filename)))
for i in phplist:
    if i.endswith("/config.php"):
        cmsconfig.append(i)
    if i.endswith("/mdeploy.php"):
        cmslist.append(cms1[18])
The intended outcome is that only the top-level /config.php path is added to the list cmsconfig, but instead I am getting every config.php file, as in the example at the top.
I have been using conditions like not i.endswith("/theme/brick/config.php"), but I want a way to exclude the whole theme directory from the list.
I am placing the output into a list because I use that output in another area of the code.

Change your if-condition to if i.endswith("moodle/config.php").
If you want to do this for a different folder, build the ending from the folder name:
path_ending = '%s/config.php' % folder_name
Then change the if-condition to if i.endswith(path_ending).
This keeps only the paths that end with config.php directly inside the folder that you passed.
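If the goal is to exclude whole directories such as theme, one option is to test the directory components of each path instead of its ending. The following is only a sketch: the helper name keep_config_paths and the excluded tuple are assumptions made for illustration, not part of the question.

```python
def keep_config_paths(paths, excluded=("theme", "cache")):
    """Return the config.php paths that do not pass through an excluded directory."""
    kept = []
    for p in paths:
        parts = p.split("/")
        if parts[-1] != "config.php":
            continue
        # Skip the path if any directory component is in the excluded set.
        if any(part in excluded for part in parts[:-1]):
            continue
        kept.append(p)
    return kept

phplist = [
    "/home/comradec/public_html/moodle/config.php",
    "/home/comradec/public_html/moodle/cache/classes/config.php",
    "/home/comradec/public_html/moodle/theme/sky_high/config.php",
    "/home/comradec/public_html/moodle/theme/brick/config.php",
]
cmsconfig = keep_config_paths(phplist)
print(cmsconfig)
```

Comparing whole components avoids accidental matches on names that merely contain the word, such as a folder called mytheme.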

I think this is what you want. You may want to rename the variables, since they are not PEP 8 style.
First I sort all entries so that the shortest comes first, then I remember which directories have already been checked.
url1 = '/home/comradec/public_html/moodle/theme/binarius/config.php'
url2 = '/home/comradec/public_html/moodle/config.php'
url3 = '/home/comradec/public_html/othername/theme/binarius/config.php'
url4 = '/home/comradec/public_html/othername/config.php'
urls = []
urls.append(url1)
urls.append(url2)
urls.append(url3)
urls.append(url4)
moodleUrls = []
checkedDirs = []
# sort by length so that the shortest path in each tree comes first
for i in sorted(urls, key=len):
    if str(i).endswith('config.php'):
        alreadyChecked = False
        for checkedDir in checkedDirs:
            if str(i).startswith(checkedDir):
                alreadyChecked = True
                break
        if not alreadyChecked:
            moodleUrls.append(i)
            checkedDirs.append(str(i).replace('/config.php',''))
print(checkedDirs)
print(moodleUrls)
Output:
['/home/comradec/public_html/moodle', '/home/comradec/public_html/othername']
['/home/comradec/public_html/moodle/config.php', '/home/comradec/public_html/othername/config.php']

This is how I resolved my question; it provides the output I am looking for.
import os
from os.path import abspath, join

path = "/home/comradec"
phplist = []
cmsconfig = []
config = "config.php"
# collect every .php file below path
for folder, subs, files in os.walk(path):
    for filename in files:
        if filename.endswith('.php'):
            phplist.append(abspath(join(folder, filename)))
for i in phplist:
    if i.endswith("/mdeploy.php"):
        # swap the trailing "mdeploy.php" (11 characters) for "config.php"
        newurl = i
        newurl = newurl[:-11]
        newurl = newurl + config
        # keep the config.php that sits next to this mdeploy.php
        for j in phplist:
            if j.endswith("/config.php"):
                confirmurl = j
                if confirmurl == newurl:
                    cmsconfig.append(newurl)
print('\n'.join(cmsconfig))
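The same idea (keep a config.php only when an mdeploy.php sits in the same folder) can be sketched with a set lookup instead of a second loop over the list. The function name configs_next_to_mdeploy and the sample mdeploy.php path are assumptions for illustration; posixpath is used because the paths in the question are Unix-style.

```python
import posixpath

def configs_next_to_mdeploy(paths):
    """Return the config.php paths that share a folder with an mdeploy.php."""
    all_paths = set(paths)
    result = []
    for p in paths:
        folder, name = posixpath.split(p)
        if name == "mdeploy.php":
            candidate = posixpath.join(folder, "config.php")
            # A set membership test replaces the inner loop over the list.
            if candidate in all_paths:
                result.append(candidate)
    return result

sample = [
    "/home/comradec/public_html/moodle/config.php",
    "/home/comradec/public_html/moodle/mdeploy.php",  # hypothetical entry
    "/home/comradec/public_html/moodle/theme/brick/config.php",
]
matches = configs_next_to_mdeploy(sample)
print("\n".join(matches))
```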

Related

python openpyxl.load_workbook(r"mypath")

I want to use openpyxl.load_workbook(r"mypath"), but the only difference is that mypath is a variable path that changes on every iteration of a loop over different folders.
PathsList = []
for folderName, subFolders, fileNames in os.walk
    fileNamesList.append(os.path.basename(fileName))
    PathsList.append(os.path.abspath(fileName))
for i in range(len(fileNamesList)):
    j = 1
    while j < len(fileNamesList):
        if(first3isdigit(fileNamesList[i])) == (first3isdigit(fileNamesList[j])):
            if(in_fileName_DOORS in str(fileNamesList[i]) and in_fileName_TAF in str(fileNamesList[j])):
                mypath = PathsList[i]
                File = openpyxl.load_workbook(r'mypath ')
                wsFile = File.active
mypath is not read as a variable. Is there any solution?
Edit 1: I also thought about
File = openpyxl.load_workbook(exec(r'%s' % (mypath))
but I couldn't, since exec can't be used inside the brackets.

This code
File = openpyxl.load_workbook(r'mypath ')
tries to pass the literal raw string 'mypath ' as the argument to the load_workbook function, not the contents of your variable.
If you want to pass the contents of the mypath variable, remove the quotes and the r prefix:
File = openpyxl.load_workbook(mypath)
This is basic Python syntax. You can read more about it in the documentation.
Please let me know if this is what you needed.
Edit:
If the slashes are a concern you can do the following:
File = openpyxl.load_workbook(mypath.replace('\\','/'))
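The difference between the quoted name and the variable can be seen without openpyxl. This is a small sketch; the path assigned to mypath is a made-up example.

```python
# Hypothetical path; in the question it comes from PathsList[i].
mypath = r"C:\reports\001_DOORS.xlsx"

# The r prefix only affects how a string literal is parsed.
# Quoting the name produces the seven characters m-y-p-a-t-h-space.
quoted = r'mypath '
print(repr(quoted))
print(repr(mypath))

# A variable is passed by writing its name without quotes.
# Replacing backslashes is optional and only changes the separator style.
normalized = mypath.replace('\\', '/')
print(normalized)
```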

Checking if pairs from zip are correct?

I need your advice on this problem.
I have collected what I need in these two lists: simpl2, astik, with this code:
simpl2 = []
astik = []
for path, subdirs, files in os.walk(rootfolder):
    for name in files:
        if 'sim2.shp' == name:
            simpl2.append(os.path.join(path, name))
        elif 'ASTIK.shp' == name:
            astik.append(os.path.join(path, name))
The code above searches a rootfolder that contains the folders v1, v2, v3 and v4.
So using this:
for i,j in zip(simpl2,astik):
    print(i,j)
gives this:
CONTENT
C:\Users\user\Desktop\pl\v1\exported\sim2.shp C:\Users\user\Desktop\pl\v1\ASTIK\ASTIK.shp
C:\Users\user\Desktop\pl\v2\exported\sim2.shp C:\Users\user\Desktop\pl\v4\ASTIK\ASTIK.shp
Question
How can I ensure that each pair comes from the same folder (like the first row, where both come from v1), and that files without a match (like the second row, where one is from v2 and the other from v4) are not paired at all?
This matters because the pairs will be used later and have to be correct. I already have code with an exception for files that don't have a pair, so the problem is only how to fix the pairing described above.
Explanation
The rootfolder is:
C:\Users\user\Desktop\pl
Inside pl there are the folders v1, v2, v3 and v4. Each of these folders has some files that are the same across all 4 folders; the only difference is that some folders will be empty. I just want to check that correct pairs from the same v folder are created in the lists.

Ok, seeing your update maybe you are interested in something more like this:
import os
simpl2 = []
astik = []
rootfolder = r'C:\Users\user\Desktop\pl'
subfolders = [os.path.join(rootfolder, i) for i in ['v1','v2','v3','v4']]
for folder in subfolders:
    temp = {name: os.path.join(path, name)
            for path, subdirs, files in os.walk(folder)
            for name in files
            if name in ['sim2.shp', 'ASTIK.shp']}
    if len(temp) == 2:
        simpl2.append(temp['sim2.shp'])
        astik.append(temp['ASTIK.shp'])
OLD CODE
But... if this is your end goal you could also just store the paths. If both files are in the same directory, it is enough to remember that directory. You can then easily build the full paths with os.path.join() when needed.
paths = []
for path, subdirs, files in os.walk(rootfolder):
    if ('sim2.shp' in files) and ('ASTIK.shp' in files):
        paths.append(path)
Or a more compact format:
lookfor = ['sim2.shp','ASTIK.shp']
paths = [p for p,s,f in os.walk(rootfolder) if all(i in f for i in lookfor)]
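Another way to keep the two lists aligned is to key the results by the top-level folder name, so that a missing file shows up as None instead of shifting the zip. The following is a sketch under assumptions: the function name pair_by_version and the temporary demo tree are illustrative and only mimic the layout described in the question.

```python
import os
import tempfile

def pair_by_version(rootfolder, first="sim2.shp", second="ASTIK.shp"):
    """Map each top-level subfolder name to a (first_path, second_path) tuple."""
    pairs = {}
    for version in sorted(os.listdir(rootfolder)):
        top = os.path.join(rootfolder, version)
        if not os.path.isdir(top):
            continue
        found = {}
        for path, subdirs, files in os.walk(top):
            for name in files:
                if name in (first, second):
                    found[name] = os.path.join(path, name)
        # None marks a file that is missing from this version folder.
        pairs[version] = (found.get(first), found.get(second))
    return pairs

# Demo tree: v1 has both files, v2 and v4 have one each.
with tempfile.TemporaryDirectory() as root:
    for rel in ["v1/exported/sim2.shp", "v1/ASTIK/ASTIK.shp",
                "v2/exported/sim2.shp", "v4/ASTIK/ASTIK.shp"]:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    pairs = pair_by_version(root)
    complete = sorted(v for v, (a, b) in pairs.items() if a and b)
    print(complete)
```

Only the folders listed in complete have a usable pair; the others can be routed to the existing exception handling.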

Finding all subfolders that contain two files that end with certain strings

So I have a folder, say D:\Tree, that contains only subfolders (names may contain spaces). These subfolders contain a few files, and they may contain files of the form "D:\Tree\SubfolderName\SubfolderName_One.txt" and "D:\Tree\SubfolderName\SubfolderName_Two.txt" (in other words, a subfolder may contain both of them, one, or neither). I need to find every occurrence where a subfolder contains both of these files, and write their absolute paths to a text file (in the format shown in the following example). Consider these three subfolders in D:\Tree:
D:\Tree\Grass contains Grass_One.txt and Grass_Two.txt
D:\Tree\Leaf contains Leaf_One.txt
D:\Tree\Branch contains Branch_One.txt and Branch_Two.txt
Given this structure and the problem described above, I'd like to be able to write the following lines to myfile.txt:
D:\Tree\Grass\Grass_One.txt D:\Tree\Grass\Grass_Two.txt
D:\Tree\Branch\Branch_One.txt D:\Tree\Branch\Branch_Two.txt
How might this be done? Thanks in advance for any help!
Note: It is very important that "file_One.txt" comes before "file_Two.txt" in myfile.txt

import os
folderPath = r'Your Folder Path'
for (dirPath, allDirNames, allFileNames) in os.walk(folderPath):
    for fileName in allFileNames:
        if fileName.endswith("One.txt") or fileName.endswith("Two.txt") :
            print (os.path.join(dirPath, fileName))
            # Or do your task as writing in file as per your need
Hope this helps....

Here is a recursive solution
import os
import sys

def findFiles(writable, current_path, ending1, ending2):
    '''
    :param writable: file to write output to
    :param current_path: current path of recursive traversal of sub folders
    :param ending1: ending of the file name that is written first
    :param ending2: ending of the file name that is written second
    :return: None
    '''
    # check if current path is a folder or not
    try:
        flist = os.listdir(current_path)
    except NotADirectoryError:
        return
    # stores files which match given endings
    ending1_files = []
    ending2_files = []
    for dirname in flist:
        if dirname.endswith(ending1):
            ending1_files.append(dirname)
        elif dirname.endswith(ending2):
            ending2_files.append(dirname)
        findFiles(writable, current_path+ '/' + dirname, ending1, ending2)
    # see if exactly 2 files have matching the endings
    if len(ending1_files) == 1 and len(ending2_files) == 1:
        writable.write(current_path+ '/'+ ending1_files[0] + ' ')
        writable.write(current_path + '/'+ ending2_files[0] + '\n')
findFiles(sys.stdout, 'G:/testf', 'one.txt', 'two.txt')
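An os.walk variant that writes a line only when a folder holds both files might look like the sketch below. It assumes the naming scheme from the question (SubfolderName_One.txt and SubfolderName_Two.txt); the function name find_pairs and the temporary demo tree are illustrative.

```python
import os
import tempfile

def find_pairs(root, first_suffix="_One.txt", second_suffix="_Two.txt"):
    """Return one 'path_one path_two' line per folder that holds both files."""
    lines = []
    for dir_path, dir_names, file_names in os.walk(root):
        dir_names.sort()  # visit subfolders in a predictable order
        folder = os.path.basename(dir_path)
        one, two = folder + first_suffix, folder + second_suffix
        if one in file_names and two in file_names:
            # The "_One" path is always written before the "_Two" path.
            lines.append(os.path.join(dir_path, one) + " " + os.path.join(dir_path, two))
    return lines

# Demo tree mirroring the Grass / Leaf / Branch example.
with tempfile.TemporaryDirectory() as base:
    tree = os.path.join(base, "Tree")
    layout = {"Grass": ["Grass_One.txt", "Grass_Two.txt"],
              "Leaf": ["Leaf_One.txt"],
              "Branch": ["Branch_One.txt", "Branch_Two.txt"]}
    for folder, names in layout.items():
        os.makedirs(os.path.join(tree, folder))
        for name in names:
            open(os.path.join(tree, folder, name), "w").close()

    lines = find_pairs(tree)
    with open(os.path.join(base, "myfile.txt"), "w") as out:
        out.write("\n".join(lines) + "\n")
    print(len(lines))
```

Because the folder name is part of the expected file name, a folder such as Leaf that only has Leaf_One.txt produces no line.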

os.listdir analog for a zipped directory

My goal is to list all files contained in a certain sub-directory inside a zip archive.
os.listdir(target_dir) raises a FileNotFoundError, and zfile.namelist() just lists all the files in all directories.
Any ideas?

Try the following:
files = list(filter(lambda f: f.startswith("subdir"), zfile.namelist()))
print(files)
Explanation: filter keeps only those names from zfile.namelist() for which the lambda returns True, i.e. the names that start with "subdir".
In Python 3, filter does not return a list but a lazy filter object (an iterator), so we need to convert it to a list.
You could also use the following line which does the same but uses list comprehension:
files = [f for f in zfile.namelist() if f.startswith("subdir")]
Edit: As pointed out by advance512: "The problem with this solution is that it will also return files in subdirectories inside the subdirectory you're checking.":
files = [f for f in zfile.namelist() if f.startswith("subdir") and f.count("/") == 1]
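This will not return any files in sub-sub directories, as long as subdir sits at the top level of the archive (its direct children then contain exactly one "/").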

You can use the zip_listdir function below, which is a bit quick-n-dirty but should always work in Unix clones.
class MockZipFile(object):
    fake_file_names = [
        "string.pyc",               # Top level name
        "test/__init__.pyc",        # Package directory
        "test/test_support.pyc",    # Module test.test_support
        "test/bogus/__init__.pyc",  # Subpackage directory
        "test/bogus/myfile.pyc"     # Submodule test.bogus.myfile
    ]
    def namelist(self):
        return self.fake_file_names

def zip_listdir(zip_file, target_dir):
    file_names = zip_file.namelist()
    # normalise the directory so that it ends with exactly one "/"
    if not target_dir.endswith("/"):
        target_dir += "/"
    if target_dir == "/":
        target_dir = ""
    # keep names directly inside target_dir (no further "/" after the prefix)
    result = [ file_name
               for file_name in file_names
               if file_name.startswith(target_dir) and
               "/" not in file_name[len(target_dir):]
             ]
    return result

mockZipfile = MockZipFile()
print(zip_listdir(zip_file=mockZipfile, target_dir="test"))
print(zip_listdir(zip_file=mockZipfile, target_dir="test/bogus"))
print(zip_listdir(zip_file=mockZipfile, target_dir="test/"))
print(zip_listdir(zip_file=mockZipfile, target_dir="/"))
print(zip_listdir(zip_file=mockZipfile, target_dir=""))
print(zip_listdir(zip_file=mockZipfile, target_dir="/asd"))
Please note I created a MockZipFile class, and am using it as the input for the zip_listdir function, but a proper zipfile object should work exactly the same.
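On Python 3.8 and later, the standard library also offers zipfile.Path, whose iterdir() method behaves much like os.listdir for a directory inside an archive. The sketch below builds a small in-memory archive with the same names as the mock above; the helper name zip_listdir_files is an assumption for illustration.

```python
import io
import zipfile

# Build a small archive in memory with the same layout as the mock.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ["string.pyc", "test/__init__.pyc", "test/test_support.pyc",
                 "test/bogus/__init__.pyc", "test/bogus/myfile.pyc"]:
        zf.writestr(name, b"")

def zip_listdir_files(zip_file, target_dir):
    """Return the names of the files directly inside target_dir."""
    # zipfile.Path expects directory locations to end with "/".
    if target_dir and not target_dir.endswith("/"):
        target_dir += "/"
    directory = zipfile.Path(zip_file, at=target_dir)
    return sorted(entry.name for entry in directory.iterdir() if entry.is_file())

buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    test_files = zip_listdir_files(zf, "test")
    bogus_files = zip_listdir_files(zf, "test/bogus")
    top_files = zip_listdir_files(zf, "")

print(test_files)
print(bogus_files)
print(top_files)
```

Dropping the is_file() filter would also list sub-directories such as bogus, which is closer to what os.listdir returns.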

Use of regular expression to exclude characters in file rename,Python?

I am trying to rename files so that they contain an ID followed by a -(int). The files generally come to me in this way but sometimes they come as 1234567-1(crop to bottom).jpg.
I have been trying to use the following code, but my regular expression doesn't seem to be having any effect. The reason for the walk is that we have to handle large directory trees with many images.
def fix_length():
    for root, dirs, files in os.walk(path):
        for fn in files:
            path2 = os.path.join(root, fn)
            filename_zero, extension = os.path.splitext(fn)
            re.sub("[^0-9][-]", "", filename_zero)
            os.rename(path2, filename_zero + extension)
fix_length()
I have inserted print statements for filename_zero before and after the re.sub line and I am getting the same result (i.e. 1234567-1(crop to bottom) not what I wanted)
This raises an exception as the rename is trying to create a file that already exists.
I thought perhaps adding the [-] in the regex was the issue but removing it and running again I would then expect 12345671.jpg but this doesn't work either. My regex is failing me or I have failed the regex.
Any insight would be greatly appreciated.
As a follow up, I have taken all the wonderful help and settled on a solution to my specific problem.
import os
import re
import shutil

path = r'C:\Archive'
errors = r'C:\Test\errors'
num_files = []
def best_sol():
    num_files = []
    for root, dirs, files in os.walk(path):
        for fn in files:
            filename_zero, extension = os.path.splitext(fn)
            path2 = os.path.join(root, fn)
            ID = re.match(r'^\d{1,10}', fn).group()
            if len(ID) <= 7:
                if ID not in num_files:
                    num_files = []
                    num_files.append(ID)
                    suffix = str(len(num_files))
                    os.rename(path2, os.path.join(root, ID + '-' + suffix + extension))
                else:
                    num_files.append(ID)
                    suffix = str(len(num_files))
                    os.rename(path2, os.path.join( root, ID + '-' + suffix +extension))
            else:
                shutil.copy(path2, errors)
                os.remove(path2)
This code creates an ID from (up to) the first 10 numeric characters in the filename. I then use a list that stores the instances of this ID and use the length of the list to append a suffix. The first file will have a -1, the second a -2, etc.
I am only interested in IDs with a length of 7 (or at least they should all be that length), but I read up to 10 digits to allow for human error in labelling. All files with an ID longer than 7 are moved to a folder where we can investigate.
Thanks for pointing me in the right direction.

re.sub() returns the altered string, but you ignore the return value.
You want to re-assign the result to filename_zero:
filename_zero = re.sub(r"[^\d-]", "", filename_zero)
I've corrected your regular expression as well; this removes anything that is not a digit or a dash from the base filename:
>>> re.sub(r'[^\d-]', '', '1234567-1(crop to bottom)')
'1234567-1'
Remember, strings are immutable, you cannot alter them in-place.
If all you want is the leading digits, plus optional dash-digit suffix, select the characters to be kept, rather than removing what you don't want:
filename_zero = re.match(r'^\d+(?:-\d)?', filename_zero).group()
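Both approaches can be wrapped in a small helper that also copes with names that have no leading digits. This is a sketch: the function name clean_stem is hypothetical, and unlike the pattern above it accepts more than one digit after the dash, which is an assumption about the naming scheme.

```python
import re

def clean_stem(stem):
    """Return the leading 'digits' or 'digits-digits' part of stem, or None."""
    match = re.match(r'\d+(?:-\d+)?', stem)
    # re.match returns None when the name does not start with a digit,
    # so guard before calling .group().
    return match.group() if match else None

print(clean_stem('1234567-1(crop to bottom)'))
print(clean_stem('1234567-12 copy'))
print(clean_stem('holiday'))

# Removing unwanted characters instead of selecting wanted ones:
stripped = re.sub(r'[^\d-]', '', '1234567-1(crop to bottom)')
print(stripped)
```

Selecting the wanted prefix is usually safer than deleting characters, because stray digits or dashes later in the name are not kept.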

new_filename = re.sub(r'^([0-9]+)-([0-9]+).*$', r'\g<1>-\g<2>', filename_zero)
Try using this regular expression instead. It captures the leading ID and the number after the dash and drops the rest of the name; in Python, group references in the replacement are written \g<1> and \g<2>. You also appear to have forgotten to assign the value returned by the re.sub call to the filename_zero variable.
