Manipulating Directory Paths in Python - python

Basically I've got this current url and this other key that I want to merge into a new url, but there are three different cases.
Suppose the current url is localhost:32401/A/B/foo
if key is bar then I want to return localhost:32401/A/B/bar
if key starts with a slash and is /A/bar then I want to return localhost:32401/A/bar
finally if key is its own independent url then I just want to return that key = http://foo.com/bar -> http://foo.com/bar
I assume there is a way to do at least the first two cases without manipulating the strings manually, but nothing jumped out at me immediately in the os.path module.

Have you checked out the urlparse module?
From the docs,
from urlparse import urljoin
urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
Should help with your first case.
Obviously, you can always do basic string manipulation for the rest.
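For what it's worth, urljoin seems to cover all three cases from the question, not just the first, as long as the current URL carries a scheme. A minimal sketch on Python 3, where the function lives in urllib.parse; the http:// prefix on the base URL is an assumption added here, since the question's URL has none:

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# Assumption: the base URL includes a scheme; without one, urljoin cannot
# tell the host apart from the path.
current = 'http://localhost:32401/A/B/foo'

relative = urljoin(current, 'bar')                  # replaces the last segment
rooted = urljoin(current, '/A/bar')                 # restarts from the host
absolute = urljoin(current, 'http://foo.com/bar')   # independent URL wins

print(relative)  # http://localhost:32401/A/B/bar
print(rooted)    # http://localhost:32401/A/bar
print(absolute)  # http://foo.com/bar
```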

I assume there is a way to do at least the first two cases without manipulating the strings manually, but nothing jumped out at me immediately in the os.path module.
That's because you want to use urllib.parse (for Python 3.x) or urlparse (for Python 2.x) instead.
I don't have much experience with it, though, so here's a snippet using str.split() and str.join().
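def merge(url, key):
    urlparts = url.split('/')
    if key.startswith('http://'):
        # key is already a complete URL
        return key
    elif key.startswith('/'):
        # keep only the host (the first part) and append the rooted key
        return '/'.join([urlparts[0], key[1:]])
    else:
        # replace the last path segment with key
        urlparts[-1] = key
        return '/'.join(urlparts)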

String objects in Python all have startswith and endswith methods that should be able to get you there. Something like this perhaps?
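def merge(current, key):
    if key.startswith('http'):
        return key
    if key.startswith('/'):
        # keep everything before the first slash (the host) and append key
        parts = current.partition('/')
        return parts[0] + key
    # otherwise swap the last path segment for key
    parts = current.rpartition('/')
    return '/'.join([parts[0], key])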

Related

How to check the file names are in os.listdir('.')?

I use a regex built from a string to check whether a file name (or one similar to it) exists in os.listdir('.'). If it exists I print 'Yes', and if not I print 'No'. But even when the file name doesn't exist in listdir('.'), it prints Yes.
How should I check that?
search = str(args[0])
pattern = re.compile(r'.*%s.*\.pdf' % search, re.I)
if filter(pattern.search, os.listdir('.')):
    print('Yes ...')
else:
    print('No ...')
filter on Python 3 is lazy: it doesn't return a list, it returns an iterator (a filter object), which is always "truthy" whether or not it would produce any items (it can't know until it has been run). If you want to check whether it got any hits, the most efficient way is to try to pull an item from it. On Python 3, you'd use two-arg next to do this lazily (so you stop at the first hit and don't look further):
if next(filter(pattern.search, os.listdir('.')), False):
If you need the complete list a la Py2, you'd just wrap it in the list constructor:
matches = list(filter(pattern.search, os.listdir('.')))
On Python 2, your existing code should work as written.
I'll note, what you're doing would usually be handled much better with the glob module; I'd strongly recommend taking a look at it.
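To make the pitfall concrete, here is a small sketch under stated assumptions: the helper name has_matching_pdf, the temporary directory and the file names are all invented for the demonstration, and any() is used as one more way to stop at the first hit.

```python
import os
import re
import tempfile

def has_matching_pdf(search, directory='.'):
    """Return True if any *.pdf name in directory contains search (case-insensitive)."""
    # re.escape is an addition here so that characters such as '.' in the
    # search term are matched literally.
    pattern = re.compile(r'.*%s.*\.pdf' % re.escape(search), re.I)
    # any() stops at the first hit and returns False when there are none.
    return any(pattern.search(name) for name in os.listdir(directory))

# Throwaway directory with made-up file names, only for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('Report_2020.PDF', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    lazy = filter(re.compile('nomatch').search, os.listdir(tmp))
    lazy_is_truthy = bool(lazy)   # the pitfall: no hits, yet still truthy
    found = has_matching_pdf('report', tmp)
    missing = has_matching_pdf('invoice', tmp)

print(lazy_is_truthy, found, missing)  # True True False
```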
An alternative to your code (not considering additional requirements you might not have listed):
from pathlib import Path
search = str(args[0]).lower()
file_cnt = sum([search in file.stem.lower() for file in Path('.').glob('*.pdf')])
if file_cnt > 0:
    print('Yes')
else:
    print('No')

Python Environment variable within environment variable

I'm trying to set up an environment variable via Python:
os.environ["myRoot"]="/home/myName"
os.environ["subDir"]="$myRoot/subDir"
I expect the subDir environment variable to hold /home/myName/subDir, however it holds the string '$myRoot/subDir'. How do I get this functionality?
(Bigger picture: I'm reading a json file of environment variables, and the ones lower down reference the ones higher up.)
Use os.environ to fetch the value, and os.path to correctly put slashes in the right places:
os.environ["myRoot"]="/home/myName"
os.environ["subDir"] = os.path.join(os.environ['myRoot'], "subDir")
You can use os.path.expandvars to expand environment variables like so:
>>> import os
>>> print(os.path.expandvars("My home directory is $HOME"))
My home directory is /home/Majaha
>>>
For your example, you might do:
os.environ["myRoot"] = "/home/myName"
os.environ["subDir"] = os.path.expandvars("$myRoot/subDir")
I think @johntellsall's answer is the better one for the specific example you gave; however, I don't doubt you'll find this useful for your json work.
Edit: I would now recommend using @johntellsall's answer, as os.path.expandvars() is designed explicitly for use with paths, so using it for arbitrary strings may work but is kinda hacky.
import json
import re

def fix_text(txt, data):
    '''txt is the string to fix, data is the dictionary with the variable names/values'''
    def fixer(m):  # takes a regex match
        match = m.group(1)  # there is only one group, so that is all we worry about
        # return a replacement, or the original "$name" if it is not in the dictionary
        return data.get(match, "$%s" % match)
    # match a literal "$" followed by one or more letters
    return re.sub(r"\$([a-zA-Z]+)", fixer, txt)

with open("some.json") as f:  # open the json file to read
    file_text = f.read()
data = json.loads(file_text)  # load it into a dictionary
# try to ensure you evaluate them in the order you found them
keys = sorted(data.keys(), key=file_text.index)
# create a new dictionary by mapping our ordered keys above to "fixed" strings that support simple variables
data2 = dict((k, fix_text(data[k], data)) for k in keys)
# sanity check
print(data2)
[edited to fix a typo that would cause it not to work]
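Since the question says later entries reference earlier ones, a variant that substitutes already-resolved values may be closer to what is needed. The following is only a sketch: it swaps the regex for string.Template from the standard library, the function name resolve_variables and the sample JSON are invented, and it assumes every value in the file is a string.

```python
import json
from string import Template

def resolve_variables(json_text):
    """Expand $name references, letting later entries use earlier ones."""
    # json.loads keeps the key order of the file (dicts preserve insertion
    # order on Python 3.7+).
    raw = json.loads(json_text)
    resolved = {}
    for name, value in raw.items():
        # safe_substitute leaves unknown $names untouched instead of raising.
        resolved[name] = Template(value).safe_substitute(resolved)
    return resolved

# Hypothetical file contents, modelled on the question's two variables.
example = '{"myRoot": "/home/myName", "subDir": "$myRoot/subDir", "deep": "$subDir/x"}'
resolved = resolve_variables(example)
print(resolved["subDir"])  # /home/myName/subDir
print(resolved["deep"])    # /home/myName/subDir/x
```

Each resolved value could then be copied into os.environ if that is where it needs to end up.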

Python text processing and parsing

I have a file in gran/config.py AND I cannot import this file (not an option).
Inside this config.py, there is the following code
...<more code>
animal = dict(
    bear = r'^bear4x',
    tiger = r'^.*\tiger\b.*$'
)
...<more code>
I want to be able parse r'^bear4x' or r'^.*\tiger\b.*$' based on bear or tiger.
I started out with
try:
    text = open('gran/config.py','r')
    tline = filter('not sure', text.readlines())
    text.close()
except IOError as err:
    pass
I was hoping to grab the whole animal dict by
grab = re.compile("^animal\s*=\s*('.*')") or something like that
and maybe change tline to tline = filter(grab.search,text.readlines())
but it only grabs animal = dict( and not the following lines of the dict.
How can I grab multiple lines?
Look for animal, then confirm the first '(', then continue to look until ')'?
Note: the size of the animal dict may change, so any static approach (like grabbing 4 extra lines after animal is found) wouldn't work.
Maybe you should try some AST hacks? With Python it is easy:
import ast
with open('config.py') as f:
    config = ast.parse(f.read())
So now you have your parsed module. You need to extract the assignment to animal and evaluate it. There is a safe ast.literal_eval function, but since the value is built with a call to dict, it won't work here. The idea is to traverse the whole module tree, keeping only the assignments you want, and run the result locally:
class OnlyAssigns(ast.NodeTransformer):
    def generic_visit(self, node):
        return None  # throw other things away
    def visit_Module(self, node):
        # We need to visit Module and pass it on
        return ast.NodeTransformer.generic_visit(self, node)
    def visit_Assign(self, node):
        # only plain names have an id; change 'animal' to the name you need
        if getattr(node.targets[0], 'id', None) == 'animal':
            return node  # pass it
        return None  # throw away
config = OnlyAssigns().visit(config)
Compile it and run:
exec(compile(config, 'config.py', 'exec'))
print(animal)
If animal should end up in some dictionary, pass it as the locals to exec:
data = {}
exec(compile(config, 'config.py', 'exec'), globals(), data)
print(data['animal'])
There is much more you can do with ast hacking, like visiting all If and For statements. Check the documentation for details.
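If running extracted code through exec feels too heavy, one alternative sketch is to read the keyword arguments of the dict(...) call straight from the tree and evaluate each one with ast.literal_eval, which only accepts literals. The helper name extract_dict_call and the sample source below are made up for illustration; the sample mimics the question's config.py, including an import that would fail.

```python
import ast

def extract_dict_call(source, name):
    """Return the keyword arguments of `name = dict(...)` as a plain dict, or None."""
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        # Only simple targets such as `animal = ...` carry an id.
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if name not in targets:
            continue
        call = node.value
        if isinstance(call, ast.Call) and getattr(call.func, "id", None) == "dict":
            # literal_eval raises ValueError for anything that is not a literal.
            return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return None

# Sample source standing in for gran/config.py; nothing in it is executed.
source = r"""
import doesnotexist
animal = dict(
    bear = r'^bear4x',
    tiger = r'^.*\tiger\b.*$'
)
"""
animal = extract_dict_call(source, "animal")
print(animal["bear"])   # ^bear4x
print(animal["tiger"])  # ^.*\tiger\b.*$
```

Because the file is only parsed, never run, the failing import in the sample does no harm.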
If the only reason you can't import that file as-is is that some of its imports will fail, you can potentially hack your way around that rather than processing a perfectly good Python file as just text.
For example, if I have a file named busted_import.py with:
import doesnotexist
foo = 'imported!'
And I try to import it, I will get an ImportError. But if I define what the doesnotexist module refers to using sys.modules before trying to import it, the import will succeed:
>>> import sys
>>> sys.modules['doesnotexist'] = ""
>>> import busted_import
>>> busted_import.foo
'imported!'
So if you can isolate the imports that will fail in your Python file and define those modules in sys.modules before attempting the import, you can work around the ImportErrors.
I don't quite get what you are trying to do.
If you want to process each line with a regular expression: you have ^ in the regular expression re.compile("^animal\s*=\s*('.*')"). It matches only when animal is at the start of a line, not after some spaces. Also, of course, it does not match bear or tiger; use something like re.compile("^\s*([a-z]+)\s*=\s*('.*')").
If you want to process multiple lines with a single regular expression, read about re.DOTALL and re.MULTILINE and how they affect matching newline characters:
http://docs.python.org/2/library/re.html#re.MULTILINE
Also note that text.readlines() reads lines, so the filter function in filter('not sure', text.readlines()) is run on each line, not on the whole file. You cannot pass a regular expression to filter(<re here>, text.readlines()) and hope it will match across multiple lines.
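As an illustration of those two flags, here is a rough sketch that runs one regular expression over the whole text instead of line by line. The sample text is a stand-in for the real file, and the patterns assume single-quoted values and a closing parenthesis at the start of its own line, so this is fragile by design.

```python
import re

# Sample text standing in for the contents of gran/config.py.
config_text = r"""
import something
animal = dict(
    bear = r'^bear4x',
    tiger = r'^.*\tiger\b.*$'
)
other = 1
"""

# MULTILINE lets ^ match at the start of every line; DOTALL lets . cross newlines.
block = re.search(r"^animal\s*=\s*dict\((.*?)^\)", config_text, re.DOTALL | re.MULTILINE)
# Pull name = r'...' pairs out of the captured body (single-quoted values only).
entries = dict(re.findall(r"(\w+)\s*=\s*r?'([^']*)'", block.group(1)))
print(entries["bear"])   # ^bear4x
print(entries["tiger"])  # ^.*\tiger\b.*$
```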
BTW, processing Python files (and HTML, XML, JSON... files) using regular expressions is not wise. For every regular expression you write there are cases where it will not work. Use a parser designed for the given format; for Python source code it's ast. But for your use case ast is too complex.
Maybe it would be better to use classic config files and configparser. More structured data like lists and dicts can easily be stored in JSON or YAML files.

how do I modify a url that I pick at random in python

I have an app that will show images from reddit. Some images come like this http://imgur.com/Cuv9oau, when I need to make them look like this http://i.imgur.com/Cuv9oau.jpg. Just add an (i) at the beginning and (.jpg) at the end.
You can use a string replace:
s = "http://imgur.com/Cuv9oau"
s = s.replace("//imgur", "//i.imgur")+(".jpg" if not s.endswith(".jpg") else "")
This sets s to:
'http://i.imgur.com/Cuv9oau.jpg'
This function should do what you need. I expanded on @jh314's response, made the code a little less compact, and checked that the url starts with http://imgur.com, as that code would cause issues with other URLs, like the Google search I included. It also replaces only the first instance, since replacing every occurrence could cause issues.
def fixImgurLinks(url):
    if url.lower().startswith("http://imgur.com"):
        url = url.replace("http://imgur", "http://i.imgur", 1)  # Only replace the first instance.
        if not url.endswith(".jpg"):
            url += ".jpg"
    return url
for u in ["http://imgur.com/Cuv9oau", "http://www.google.com/search?q=http://imgur"]:
    print(fixImgurLinks(u))
Gives:
>>> http://i.imgur.com/Cuv9oau.jpg
>>> http://www.google.com/search?q=http://imgur
You should use Python's regular expressions to place the i. As for the .jpg you can just append it.
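Instead of a regular expression, another option is to split the URL into its components and swap only the host. This sketch uses urllib.parse from the Python 3 standard library rather than re; the function name to_direct_imgur is invented, and it assumes the image is a .jpg, as the question does.

```python
from urllib.parse import urlsplit, urlunsplit

def to_direct_imgur(url):
    """Rewrite http://imgur.com/<id> to http://i.imgur.com/<id>.jpg; leave other URLs alone."""
    parts = urlsplit(url)
    # Compare the host itself, so "imgur" appearing in a query string is ignored.
    if parts.netloc.lower() != "imgur.com":
        return url
    path = parts.path
    if not path.lower().endswith(".jpg"):
        path += ".jpg"
    return urlunsplit((parts.scheme, "i.imgur.com", path, parts.query, parts.fragment))

print(to_direct_imgur("http://imgur.com/Cuv9oau"))
# http://i.imgur.com/Cuv9oau.jpg
print(to_direct_imgur("http://www.google.com/search?q=http://imgur"))
# http://www.google.com/search?q=http://imgur
```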

Find value matching value in a list of dicts

I have a list of dicts that looks like this:
serv=[{'scheme': 'urn:x-esri:specification:ServiceType:DAP',
'url': 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/air.mon.anom.nobs.nc'},
{'scheme': 'urn:x-esri:specification:ServiceType:WMS',
'url': 'http://www.esrl.noaa.gov/psd/thredds/wms/Datasets/air.mon.anom.nobs.nc?service=WMS&version=1.3.0&request=GetCapabilities'},
{'scheme': 'urn:x-esri:specification:ServiceType:WCS',
'url': 'http://ferret.pmel.noaa.gov/geoide/wcs/Datasets/air.mon.anom.nobs.nc?service=WCS&version=1.0.0&request=GetCapabilities'}]
and I want to find the URL corresponding to the ServiceType:WMS which means finding the value of url key in the dictionary from this list where the scheme key has value urn:x-esri:specification:ServiceType:WMS.
So I've got this that works:
for d in serv:
    if d['scheme']=='urn:x-esri:specification:ServiceType:WMS':
        url=d['url']
        print(url)
which produces
http://www.esrl.noaa.gov/psd/thredds/wms/Datasets/air.mon.anom.nobs.nc?service=WMS&version=1.3.0&request=GetCapabilities
but I've just watched Raymond Hettinger's PyCon talk, and at the end he says that if you can say it as a sentence, it should be expressed in one line of Python.
So is there a more beautiful, idiomatic way of achieving the same result, perhaps with one line of Python?
Thanks,
Rich
The serv array you listed looks like a dictionary mapping schemes to URLs, but it's not represented as such. You can easily convert it to a dict using list comprehensions, though, and then use normal dictionary lookups:
url = dict([(d['scheme'],d['url']) for d in serv])['urn:x-esri:specification:ServiceType:WMS']
You can, of course, save the dictionary version for future use (at the cost of using two lines):
servdict = dict([(d['scheme'],d['url']) for d in serv])
url = servdict['urn:x-esri:specification:ServiceType:WMS']
If you're only interested in one URL, then you can build a generator over serv and use next with a default value for the cases where a match isn't found, eg:
url = next((dct['url'] for dct in serv if dct['scheme'] == 'urn:x-esri:specification:ServiceType:WMS'), 'default URL / not found')
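A small runnable sketch of that next-with-default pattern, wrapped in a helper. The function name find_url and the shortened example.com URLs are placeholders invented here, not the real service list.

```python
def find_url(services, scheme, default=None):
    """Return the url of the first entry whose scheme matches, else default."""
    # The generator expression is lazy, so the scan stops at the first match.
    return next((d['url'] for d in services if d['scheme'] == scheme), default)

# Shortened stand-in for the serv list in the question (placeholder URLs).
serv = [
    {'scheme': 'urn:x-esri:specification:ServiceType:DAP', 'url': 'http://example.com/dap'},
    {'scheme': 'urn:x-esri:specification:ServiceType:WMS', 'url': 'http://example.com/wms'},
]

wms = find_url(serv, 'urn:x-esri:specification:ServiceType:WMS')
none_found = find_url(serv, 'urn:x-esri:specification:ServiceType:WCS', 'not found')
print(wms)         # http://example.com/wms
print(none_found)  # not found
```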
I would split this into two lines, to separate the target from the url retrieval. This is because your target may change in time, so this should not be hardwired. The single line of code follows.
I would use in instead of == as we want to search for all schemes that are of this type. This adds more flexibility, and readability, assuming this will not also catch other schemes not wanted. But from the description, this is the functionality desired.
target = "ServiceType:WMS"
url = [d['url'] for d in serv if target in d['scheme']]
Also, note, this returns a list in all cases, in case there is more than one match, so you will have to loop over url in the code that uses this.
How about this?
urls = [d['url'] for d in serv if d['scheme'] == 'urn:x-esri:specification:ServiceType:WMS']
print(urls)  # ['http://www.esrl.noaa.gov/psd/thredds/wms/Datasets/air.mon.anom.nobs.nc?service=WMS&version=1.3.0&request=GetCapabilities']
It's doing the same thing your code is doing, except that each matching d['url'] is appended to the list urls.
A comprehension's filtering if cannot take an else clause, but you can loosen the test, for example by matching only the end of the scheme:
urls = [i['url'] for i in serv if i['scheme'].endswith('WMS')]
I've been trying to work in more functional programming into my own work, so here is a pretty simple functional way:
needle='urn:x-esri:specification:ServiceType:WMS'
url = list(filter( lambda d: d['scheme']==needle, serv ))[0]['url']
filter takes as arguments a function that returns a boolean and a list to be filtered. It yields the elements for which the boolean-returning function (in this case a lambda I defined on the fly) returns True; on Python 2 that is already a list, while on Python 3 it is a lazy iterator, hence the list() call. So, to finally get the url, we have to take the zeroth element of that list. Since that is the dict containing our desired url, we can tag ['url'] on the end of the whole expression to get the corresponding dictionary entry.
