How to check the file names are in os.listdir('.')? - python

I use a regex and a string to check whether a file name (or similar names) exists in os.listdir('.'). If it exists, print('Yes'); if not, print('No'). But even when the file name doesn't exist in my listdir('.'), it prints Yes.
How should I check that ?
import os
import re

search = str(args[0])
pattern = re.compile('.*%s.*\.pdf' % search, re.I)
if filter(pattern.search, os.listdir('.')):
    print('Yes ...')
else:
    print('No ...')

filter on Python 3 is lazy: it doesn't return a list, it returns an iterator, which is always "truthy" whether or not it would produce any items (it can't know until it has been run out). If you want to check whether it got any hits, the most efficient way is to try to pull an item from it. On Python 3, you'd use two-arg next to do this lazily (so you stop at the first hit and don't look further):
if next(filter(pattern.search, os.listdir('.')), False):
If you need the complete list a la Py2, you'd just wrap it in the list constructor:
matches = list(filter(pattern.search, os.listdir('.')))
On Python 2, your existing code should work as written.
I'll note, what you're doing would usually be handled much better with the glob module; I'd strongly recommend taking a look at it.
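For instance, a minimal sketch using glob (note that glob's matching is case-sensitive on most platforms, so the case-insensitive part of your regex is handled here by lowercasing both sides; search is assumed to come from args[0] as in your code):

import glob

search = str(args[0]).lower()
matches = [name for name in glob.glob('*.pdf') if search in name.lower()]
print('Yes ...' if matches else 'No ...')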

An alternative to your code (not considering additional requirements you might not have listed):
from pathlib import Path
search = str(args[0]).lower()
file_cnt = sum(search in file.stem.lower() for file in Path('.').glob('*.pdf'))
if file_cnt > 0:
    print('Yes')
else:
    print('No')

python os.path.exists returns false

I am officially lost... I have a trivial piece of Python which checks whether a file exists. The path is stored in a dictionary.
When I execute the next line of code, it returns False:
if not os.path.exists(self.parameters['icm_profile']):
    raise FileDoesNotExistError(PreprocessingErrors.FileNotPresent, "icm profile {} not found".format(self.parameters['icm_profile']))
When I copy THE EXACT STRING and execute the next line, it returns True:
if not os.path.exists("S:\\IAI\\Data\\Recipes\\BGD\\Inkjet\\LPI\\CMY_360x720dpi_2dpd_profilev6.icm"):
    print("aap")
So the path does exist.
There is no difference I can think of... What in god's name am I doing wrong?
Looks to me like you've got an extra pair of quotes in there which could be causing you hassle. Try:
self.parameters['icm_profile'][1:-1]
Another approach mentioned by DeepSpace is to use
self.parameters['icm_profile'].strip('"')
Or if you're really paranoid
self.parameters['icm_profile'].strip('"').strip("'")
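A minimal sketch of the failure mode (the path here is hypothetical; the point is the literal quote characters embedded in the stored string):

import os

path = '"C:\\profiles\\some_profile.icm"'   # the quotes are part of the string itself
print(os.path.exists(path))                  # False: no file has the quotes in its name
print(os.path.exists(path.strip('"')))       # True, provided the file actually exists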

Is it possible to find which condition has failed in a single if/else statement with multiple conditions?

In short: I need to verify whether 3 conditions hold and, if any of them fail, execute something for the failed ones. I know I can go through the 3 conditions with multiple if/else statements, but I was wondering if there is a simpler and more concise way to do it.
In a more generic way:
if condition1 and condition2 and condition3: pass
else: print which condition has failed
For an applied case:
if file_exist("1.txt") and file_exist("2.txt") and file_exist("3.txt"):
    pass
else:
    # find which condition has failed
    # for the file of the failed condition
    create_file(...)
I am not looking to solve the example above! My question is about a way of finding which condition is not verified in a series of conditions on a single if/else statement!
Regards
I know I can iterate through the 3 files with multiple if/else statements
Every time you notice a repetition like this in a programming problem, it's a pretty good sign you can use a loop:
for filename in ("1.txt", "2.txt", "3.txt"):
    if not file_exist(filename):
        create_file(...)
You could also use a list comprehension:
[create_file(filename) for filename in ("1.txt", "2.txt", "3.txt") if not file_exist(filename)]
This is closer to the way you read it in English, but some people will frown upon it, because you're using a list comprehension for its side effects instead of actually building a list you care about.
No, it's not possible. The if statement has only one condition, in this case that condition is condition1 and condition2 and condition3. It doesn't "remember" the results of sub-expressions in that expression, so unless they have side-effects you're out of luck.
Also be aware that if condition1 is false then it doesn't evaluate condition2 at all. So if you wanted to know which conditions (plural) failed, then and would be entirely the wrong tool for the job. You could instead do something like:
results = (condition1, condition2, condition3)
if all(results):
    pass
else:
    # look at the individual values
In practice, though, if you're going to "do something" for each false value when you look at the individual values, then you don't need to special-case them all being true: just run the same per-value code, which simply does nothing when every value is true.
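For instance, a small sketch along those lines, reusing the file_exist/create_file helpers from the question (assumed to exist); note that, unlike the and chain, this evaluates every condition:

filenames = ("1.txt", "2.txt", "3.txt")
results = [file_exist(name) for name in filenames]
if not all(results):
    for name, ok in zip(filenames, results):
        if not ok:
            create_file(name)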
I suppose that just to prove a point, you could do something peculiar to record the first failure:
def countem(result):
    if result:
        countem.count += 1
    return result

countem.count = 0
if countem(condition1) and countem(condition2) and countem(condition3):
    pass
else:
    print countem.count
Or get rid of the if to be a tiny bit more concise:
import itertools

conditions = (lambda: condition1, lambda: condition2, lambda: condition3)
first_failed = sum(1 for _ in itertools.takewhile(lambda f: f(), conditions))
Of course this is not sensible code for your example, but as far as it goes, it handles the general case.
Instead of just using an if/else solution:
for file_name in ('1.txt', '2.txt', '3.txt'):
    try:
        with open(file_name):  # default mode 'r' to read the file
            pass  # do something... or not
    except IOError:
        with open(file_name, 'w') as f:
            pass  # something to do...
# modes can be 'w', 'a', 'w+', 'a+' for writing, appending, write/read, append/read respectively. There are others...
There is also:
import os.path
file_path = '/this path if all files/ have the same path.../'
for file_name in ('1.txt', '2.txt', '3.txt'):
    if os.path.exists(os.path.join(file_path, file_name)):
        continue
    else:
        pass  # create the file
# note that os.path.exists will also return True for directories
I was just going to say this looks like a job for a for loop over the conditions.
Try putting in print statements to see whether each condition is being met... simple but effective.

Python text processing and parsing

I have a file in gran/config.py AND I cannot import this file (not an option).
Inside this config.py, there is the following code
...<more code>
animal = dict(
bear = r'^bear4x',
tiger = r'^.*\tiger\b.*$'
)
...<more code>
I want to be able to parse out r'^bear4x' or r'^.*\tiger\b.*$' based on bear or tiger.
I started out with
try:
    text = open('gran/config.py', 'r')
    tline = filter('not sure', text.readlines())
    text.close()
except IOError, str:
    pass
I was hoping to grab the whole animal dict by
grab = re.compile("^animal\s*=\s*('.*')") or something like that
and maybe change tline to tline = filter(grab.search,text.readlines())
but it only grabs animal = dict( and not the following lines of the dict.
How can I grab multiple lines?
Look for animal, then confirm the first '(', then keep reading until the matching ')'?
Note: the size of the animal dict may change, so any static approach (like grabbing 4 extra lines after animal is found) wouldn't work.
Maybe you should try some AST hacks? With Python it is easy, just:
import ast
config = ast.parse(open('config.py').read())
So now you have your parsed module. You need to extract the assignment to animals and evaluate it. There is a safe ast.literal_eval function, but since the config makes a call to dict it won't work here. The idea is to traverse the whole module tree, keep only the assignments, and run the result locally:
class OnlyAssings(ast.NodeTransformer):
    def generic_visit(self, node):
        return None  # throw other things away
    def visit_Module(self, node):
        # we need to visit Module and pass it through
        return ast.NodeTransformer.generic_visit(self, node)
    def visit_Assign(self, node):
        if node.targets[0].id == 'animals':  # this you may want to change
            return node  # pass it through
        return None  # throw away

config = OnlyAssings().visit(config)
Compile it and run:
exec( compile(config,'config.py','exec') )
print animals
If animals should be in some dictionary, pass it as a local to exec:
data={}
exec( compile(config,'config.py','exec'), globals(), data )
print data['animals']
There is much more you can do with AST hacking, like visiting all If or For statements and so on. Check the documentation.
If the only reason you can't import that file as-is is that some of its imports will fail, you can potentially hack your way around that rather than trying to process a perfectly good Python file as plain text.
For example, if I have a file named busted_import.py with:
import doesnotexist
foo = 'imported!'
And I try to import it, I will get an ImportError. But if I define what the doesnotexist module refers to using sys.modules before trying to import it, the import will succeed:
>>> import sys
>>> sys.modules['doesnotexist'] = ""
>>> import busted_import
>>> busted_import.foo
'imported!'
So if you can isolate the imports that will fail in your Python file and redefine those prior to attempting an import, you can work around the ImportErrors.
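A variation on the same idea: register a dummy module object instead of a string, which behaves more like a real module and lets you attach stub attributes if the broken file expects any (busted_import and doesnotexist are the hypothetical names from the example above):

import sys
import types

stub = types.ModuleType('doesnotexist')  # empty stand-in for the missing dependency
stub.anything = None                     # attach whatever attributes the file expects (hypothetical)
sys.modules['doesnotexist'] = stub
import busted_import
print(busted_import.foo)  # 'imported!'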
I am not quite getting what exactly you are trying to do.
If you want to process each line with a regular expression: you have ^ in re.compile("^animal\s*=\s*('.*')"), so it matches only when animal is at the start of the line, not after some spaces. It also, of course, does not match bear or tiger; use something like re.compile("^\s*([a-z]+)\s*=\s*('.*')").
If you want to process multiple lines with a single regular expression,
read about re.DOTALL and re.MULTILINE and how they affect matching newline characters:
http://docs.python.org/2/library/re.html#re.MULTILINE
Also note that text.readlines() reads lines, so the filter function in filter('not sure', text.readlines()) is run on each line, not on the whole file. You cannot pass a regular expression to such a filter(<re here>, text.readlines()) call and hope it will match multiple lines.
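For illustration, a sketch of a single DOTALL match over the whole file contents (read with .read() rather than readlines()); this assumes the dict body contains no nested parentheses, as in the snippet shown:

import re

text = open('gran/config.py').read()
match = re.search(r"animal\s*=\s*dict\((.*?)\)", text, re.DOTALL)
if match:
    print(match.group(1))  # everything between dict( and the first closing )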
BTW, processing Python files (and HTML, XML, JSON... files) using regular expressions is not wise. For every regular expression you write there are cases where it will not work. Use a parser designed for the given format; for Python source code that's ast. But for your use case ast may be too complex.
Maybe it would be better to use classic config files and configparser. More structured data like lists and dicts can be easily stored in JSON or YAML files.
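For example, a sketch of the same data kept in a plain INI-style file (the animals.ini file name and the section name are invented here for illustration). The file would contain:

[animal]
bear = ^bear4x
tiger = ^.*\tiger\b.*$

and reading it back:

import ConfigParser  # the module is named configparser on Python 3

parser = ConfigParser.RawConfigParser()
parser.read('animals.ini')
print(parser.get('animal', 'bear'))  # ^bear4x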

Python program to search for specific strings in hash values (coding help)

Trying to write code that searches hash values for specific strings (input by the user) and returns the hash if the search query is present in that line.
Doing this partly just to learn Python a bit more, but it could be a real-world application used by an HR department to search a .csv resume database for specific words in each resume.
I'd like this program to look through a .csv file that has three entries per line (id#;applicant name;resume text)
I set it up so that it creates a hash, then created a string for the resume text hash entry, and am trying to use the .find() function to return the entire hash for each instance.
What I'd like is, if the word "gpa" is used as a search query and it is found in s['resumetext'] for three applicants (rows in the .csv file), it prints the id, name, and resume for every row that has it (all three applicants).
As it is right now, my program prints the first row in the .csv file (print resume['id'], resume['name'], resume['resumetext']) no matter what the search query is, whether it's in the resumetext or not.
Lastly, are there better ways of doing this, such as searching Word documents, PDFs and .txt files in a folder for specific words using Python? (I've just started reading about the re module and am wondering if that may be the route, rather than putting everything in a .csv file.)
def find_details(id2find):
    resumes_f = open("resume_data.csv")
    for each_line in resumes_f:
        s = {}
        (s['id'], s['name'], s['resumetext']) = each_line.split(";")
        resumetext = str(s['resumetext'])
        if resumetext.find(id2find):
            return(s)
        else:
            print "No data matches your search query. Please try again"
searchquery = raw_input("please enter your search term")
resume = find_details(searchquery)
if resume:
    print resume['id'], resume['name'], resume['resumetext']
The line
resumetext = str(s['resumetext'])
is redundant, because s['resumetext'] is already a string (since it comes as one of the results from a .split call). So, you can merge this line and the next into
if id2find in s['resumetext']: ...
Your following else is misaligned -- with it placed like that, you'll print the message over and over again. You want to place it after the for loop (and the else isn't needed, though it would work), so I'd suggest:
for each_line in resumes_f:
    s = dict(zip('id name resumetext'.split(), each_line.split(";")))
    if id2find in s['resumetext']:
        return(s)
print "No data matches your search query. Please try again"
I've also shown an alternative way to build dict s, although yours is fine too.
What #Justin Peel said. Also, to be more Pythonic, I would change
if resumetext.find(id2find) != -1: to if id2find in resumetext:
A few more changes: you might want to lower case the comparison and user input so it matches GPA, gpa, Gpa, etc. You can do this by doing searchquery = raw_input("please enter your search term").lower() and resumetext = s['resumetext'].lower(). You'll note I removed the explicit cast around s['resumetext'] as it's not needed.
One change that I recommend for your code is changing
if resumetext.find(id2find):
to
if resumetext.find(id2find) != -1:
because find() returns -1 if id2find wasn't in resumetext. Otherwise, it returns the index where id2find is first found in resumetext, which could be 0. As #Personman commented, this gives you a false positive because -1 is interpreted as True in Python.
I think the problem has something to do with the fact that find_details() only returns the first entry for which the search string is found in resumetext. It might be good to make find_details() into a generator instead; then you could iterate over it and print the found records one by one.
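A rough sketch of that generator idea, reusing the question's field names and assuming every line of the .csv really has the three ';'-separated fields:

def find_details(id2find):
    resumes_f = open("resume_data.csv")
    for each_line in resumes_f:
        s = dict(zip(('id', 'name', 'resumetext'), each_line.split(";")))
        if id2find in s['resumetext']:
            yield s

searchquery = raw_input("please enter your search term")
found = False
for resume in find_details(searchquery):
    found = True
    print resume['id'], resume['name'], resume['resumetext']
if not found:
    print "No data matches your search query. Please try again"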

Manipulating Directory Paths in Python

Basically I've got this current url and this other key that I want to merge into a new url, but there are three different cases.
Suppose the current url is localhost:32401/A/B/foo
If key is bar, then I want to return localhost:32401/A/B/bar.
If key starts with a slash, e.g. /A/bar, then I want to return localhost:32401/A/bar.
Finally, if key is its own independent URL, then I just want to return that key as-is: http://foo.com/bar -> http://foo.com/bar.
I assume there is a way to do at least the first two cases without manipulating the strings manually, but nothing jumped out at me immediately in the os.path module.
Have you checked out the urlparse module?
From the docs,
from urlparse import urljoin
urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
Should help with your first case.
Obviously, you can always do basic string manipulation for the rest.
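In fact, a quick sketch suggests urljoin covers all three of your cases, assuming the base URL carries a scheme (the bare localhost:32401/... from the question would need an http:// prefix for urlparse to treat localhost:32401 as the host):

from urlparse import urljoin  # urllib.parse in Python 3

base = 'http://localhost:32401/A/B/foo'
print(urljoin(base, 'bar'))                 # http://localhost:32401/A/B/bar
print(urljoin(base, '/A/bar'))              # http://localhost:32401/A/bar
print(urljoin(base, 'http://foo.com/bar'))  # http://foo.com/bar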
I assume there is a way to do at least the first two cases without manipulating the strings manually, but nothing jumped out at me immediately in the os.path module.
That's because you want to use urllib.parse (for Python 3.x) or urlparse (for Python 2.x) instead.
I don't have much experience with it, though, so here's a snippet using str.split() and str.join().
urlparts = url.split('/')
if key.startswith('http://'):
    return key
elif key.startswith('/'):
    # keep only the host part and append the absolute key
    return urlparts[0] + key
else:
    urlparts[-1] = key  # replace the last path segment
    return '/'.join(urlparts)
String objects in Python all have startswith and endswith methods that should be able to get you there. Something like this perhaps?
def merge(current, key):
    if key.startswith('http'):
        return key
    if key.startswith('/'):
        parts = current.partition('/')
        return parts[0] + key
    parts = current.rpartition('/')
    return parts[0] + '/' + key
