How to get directory names from a path string in Python?

I can get the last directory name like this:
str = "/folderA/folderB/folderC/folderD"
editstr = str.split("/")[-1]
print(editstr)
folderD
How do I get everything before folderD (without the trailing slash)? E.g.:
editstr = ???
print(editstr)
/folderA/folderB/folderC

The os.path module has functions for exactly this:
>>> import os
>>> s = "/folderA/folderB/folderC/folderD"
>>> os.path.basename(s)
'folderD'
>>> os.path.dirname(s)
'/folderA/folderB/folderC'
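On Python 3.4+ the same split is also available through pathlib. A minimal sketch, assuming the path is always POSIX-style (hence PurePosixPath, which only parses the string and never touches the filesystem):

```python
from pathlib import PurePosixPath

s = "/folderA/folderB/folderC/folderD"
p = PurePosixPath(s)  # pure path: string parsing only, always '/'-separated

parent = str(p.parent)  # everything before the last component
last = p.name           # the last component itself

print(parent)  # /folderA/folderB/folderC
print(last)    # folderD
```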

You can use str.rsplit() with maxsplit=1, so the string is split only at the last slash:
>>> editstr = str.rsplit('/', 1)[0]
>>> print(editstr)
/folderA/folderB/folderC
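One thing to watch with both os.path.dirname and rsplit('/', 1) is a trailing slash: for "/folderA/folderB/folderC/folderD/" they return the whole path minus that slash. A rough sketch of a helper that ignores trailing slashes first; parent_of is a hypothetical name, and the bare root "/" is not handled specially:

```python
import posixpath


def parent_of(path: str) -> str:
    """Hypothetical helper: the part of `path` before its last component."""
    trimmed = path.rstrip("/")                  # ignore trailing slashes
    head, sep, _last = trimmed.rpartition("/")  # split at the last '/'
    if not sep:                                 # no slash at all: no parent in the string
        return ""
    return head or "/"                          # "/folderA" -> "/" rather than ""


print(parent_of("/folderA/folderB/folderC/folderD"))   # /folderA/folderB/folderC
print(parent_of("/folderA/folderB/folderC/folderD/"))  # /folderA/folderB/folderC

# For comparison: dirname() treats the trailing slash as an empty last component.
print(posixpath.dirname("/folderA/folderB/folderC/folderD/"))  # /folderA/folderB/folderC/folderD
```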

Related

Add text to a path with Python (NCBI FTP path)

I'm stuck, please help. Is there a way to duplicate the last folder in the path and append "_genomic.fna.gz" to it? For example, how do I change this:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3
to this:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3/GCA_001316945.3_ASM131694v3_genomic.fna.gz
Thanks
urllib.parse.urlparse can split your URL into parts we can work with.
posixpath.join can help us build the full path.
urllib.parse.urlunparse can help us get a complete URL back.
>>> import urllib.parse
>>> import posixpath
>>> url = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3"
>>> parts = urllib.parse.urlparse(url)
>>> parts.path
'/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3'
>>> posixpath.basename(parts.path)
'GCA_001316945.3_ASM131694v3'
>>> suffix = "_genomic.fna.gz"
>>> prefix = posixpath.basename(parts.path)
>>> print(prefix+suffix)
GCA_001316945.3_ASM131694v3_genomic.fna.gz
>>> path = posixpath.join(parts.path, prefix+suffix)
>>> path
'/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3/GCA_001316945.3_ASM131694v3_genomic.fna.gz'
>>> ret = parts._replace(path=path)
>>> print(urllib.parse.urlunparse(ret))
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3/GCA_001316945.3_ASM131694v3_genomic.fna.gz
>>>
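If you need this for many assembly folders, the steps above can be wrapped in a function. A sketch only: add_file_from_folder is a hypothetical name, and it assumes every URL points at a folder whose last component is the desired file prefix, as in the example:

```python
import posixpath
import urllib.parse


def add_file_from_folder(url: str, suffix: str = "_genomic.fna.gz") -> str:
    """Hypothetical helper: append '<last folder><suffix>' as a new path component."""
    parts = urllib.parse.urlparse(url)
    folder = parts.path.rstrip("/")                 # tolerate a trailing slash
    filename = posixpath.basename(folder) + suffix  # e.g. GCA_..._genomic.fna.gz
    new_path = posixpath.join(folder, filename)
    return urllib.parse.urlunparse(parts._replace(path=new_path))


url = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/316/945/GCA_001316945.3_ASM131694v3"
print(add_file_from_folder(url))
```

posixpath is used rather than os.path so the result always uses '/', even on Windows.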

os.path.splitext(file.txt.gz) into (file,.txt.gz)

Currently, I have file paths of the form /path_to_file/file.txt.gz.
I would like to extract the file name (the part before .txt.gz).
x = os.path.basename("/path_to_file/file.txt.gz")
gives me
file.txt.gz
while
os.path.splitext("file.txt.gz")
gives me
('file.txt','.gz')
Is there a function that would separate 'file' from '.txt.gz'?
I suppose I could just use re.sub(), but was wondering if there exists an os.path function.
Thanks.
Surprised that no one has mentioned that str.split takes an optional second argument, the maximum number of splits: e.g., filepath.split('.', 1).
import os
s = "/path_to_file/file.txt.gz"
basename = os.path.basename(s)               # file.txt.gz
filename = basename[:basename.find('.')]     # file (assumes the name contains a '.')
extension = basename[basename.find('.'):]    # .txt.gz
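The same "split once at the first dot" idea can be written with str.partition, which avoids the find() == -1 pitfall when a name has no dot at all. A minimal sketch (split_at_first_dot is a hypothetical name); like the answer above, it works on the base name so dots in directory names don't interfere:

```python
import os.path


def split_at_first_dot(path):
    """Hypothetical helper: split the base name at its first '.'."""
    base = os.path.basename(path)
    stem, dot, rest = base.partition(".")  # ('README', '', '') if there is no dot
    return stem, dot + rest


print(split_at_first_dot("/path_to_file/file.txt.gz"))  # ('file', '.txt.gz')
print(split_at_first_dot("/path_to_file/README"))       # ('README', '')
```

Note that a hidden file such as ".bashrc" comes back as ('', '.bashrc'), so adjust if that matters for your data.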
You can do it very easily. Just try:
import os
filename = os.path.split(path)[1]
filename_wout_ext = filename.split('.')[0]
An example would be:
>>> import os
>>> path = "/path_to_file/file.txt.gz"
>>> filename = os.path.split(path)[1]
>>> filename
'file.txt.gz'
>>> filename_wout_ext = filename.split('.')[0]
>>> filename_wout_ext
'file'
Try this:
>>> import os
>>> ".".join(os.path.basename("/path_to_file/file.txt.gz").split('.')[:1])
'file'
>>> os.path.splitext(os.path.splitext(os.path.basename("/path_to_file/file.txt.gz"))[0])[0]
'file'
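The nested splitext call generalizes to "strip up to n extensions", which, unlike splitting at the first dot, keeps dots that belong to the name itself (e.g. my.data.txt.gz). A sketch; strip_extensions and its n parameter are hypothetical:

```python
import os.path


def strip_extensions(path, n=2):
    """Hypothetical helper: peel off up to `n` extensions with os.path.splitext."""
    name = os.path.basename(path)
    removed = []
    for _ in range(n):
        name, ext = os.path.splitext(name)
        if not ext:  # nothing left to strip
            break
        removed.append(ext)
    # splitext removes the last extension first, so reverse to restore the order
    return name, "".join(reversed(removed))


print(strip_extensions("/path_to_file/file.txt.gz"))     # ('file', '.txt.gz')
print(strip_extensions("/path_to_file/my.data.txt.gz"))  # ('my.data', '.txt.gz')
```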

change urlparse.path of a url

Here is the Python code:
from urllib.parse import urlparse
url = "http://www.phonebook.com.pk/dynamic/search.aspx?searchtype=cat&class_id=4520&page=1"
path = urlparse(url)
print(path)
# ParseResult(scheme='http', netloc='www.phonebook.com.pk', path='/dynamic/search.aspx', params='', query='searchtype=cat&class_id=4520&page=1', fragment='')
print(path.path)
# /dynamic/search.aspx
Now I need to change path.path to suit my needs. If the path is "/dynamic/search.aspx", I only need the part from the first slash to the last slash, inclusive, which is "/dynamic/".
I tried the two attempts below, but the result is not what I expected. My knowledge of urllib.parse is limited, hence this question.
path = path.path[:path.path.index("/")]
print(path)
# prints an empty string
# second attempt, starting again from the ParseResult:
path = path.path[path.path.index("/"):]
print(path)
# /dynamic/search.aspx (unchanged)
In short, whatever path.path is, I only need the directory part. For example, for "dynamic/search/search.aspx" I need "dynamic/search/".
First, the desired part of the path can be obtained using rfind, which returns the index of the last occurrence. The + 1 keeps the trailing slash.
desired_path = path.path[:path.path.rfind("/") + 1]
Second, use the _replace method to replace the path attribute of the urlparse object as follows:
desired_url = urlunparse(path._replace(path=desired_path))
The full working example:
from urllib.parse import urlparse, urlunparse
url = "http://www.phonebook.com.pk/dynamic/search/search.aspx"
path = urlparse(url)
desired_path = path.path[:path.path.rfind("/") + 1]
desired_url = urlunparse(path._replace(path=desired_path))
print(desired_url)  # http://www.phonebook.com.pk/dynamic/search/
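Note that _replace(path=...) only swaps the path, so a query string such as ?searchtype=cat&class_id=4520&page=1 from the question's URL is kept. If you want the bare directory URL, a small wrapper could clear params, query and fragment as well. A sketch; directory_url and keep_query are hypothetical names:

```python
from urllib.parse import urlparse, urlunparse


def directory_url(url: str, keep_query: bool = False) -> str:
    """Hypothetical helper: cut the URL path back to its last '/'."""
    parts = urlparse(url)
    dir_path = parts.path[: parts.path.rfind("/") + 1]  # keep the trailing slash
    if keep_query:
        new = parts._replace(path=dir_path)
    else:
        new = parts._replace(path=dir_path, params="", query="", fragment="")
    return urlunparse(new)


url = "http://www.phonebook.com.pk/dynamic/search.aspx?searchtype=cat&class_id=4520&page=1"
print(directory_url(url))                   # http://www.phonebook.com.pk/dynamic/
print(directory_url(url, keep_query=True))  # ...dynamic/?searchtype=cat&class_id=4520&page=1
```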
I looked through urlparse for a method that would help in your situation but didn't find one (I may have overlooked it). At this level you will probably have to write your own helper, for example with a regular expression:
>>> path.path
'/dynamic/search.aspx'
>>> import re
>>> d = re.search(r'/.*/', path.path)
>>> d.group(0)
'/dynamic/'
This is just one example; you can also use built-in string methods:
>>> i = path.path.index('/', 1)
>>>
>>> path.path[:i+1]
'/dynamic/'
EDIT:
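I didn't notice your last example, so here is another way. URL paths always use '/', so posixpath (or a literal '/') is safer here than os.sep, which is '\' on Windows:
>>> import posixpath
>>> posixpath.dirname(path.path) + '/'
'/dynamic/'
>>> s = 'dynamic/search/search.aspx'
>>> posixpath.dirname(s) + '/'
'dynamic/search/'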
Or with re:
>>> s
'dynamic/search/search.aspx'
>>> d = re.search(r'.*/', s)
>>> d
<_sre.SRE_Match object; span=(0, 15), match='dynamic/search/'>
>>> d.group(0)
'dynamic/search/'
>>>
>>> s = '/dynamic/search.aspx'
>>> d = re.search(r'.*/', s)
>>> d.group(0)
'/dynamic/'

How to look up two directories in Python?

I know that to go up to a parent directory, you should use
parentname = os.path.abspath(os.path.join(yourpath, os.path.pardir))
But what if I want to get the name of a directory a few folders up?
Say I am given /stuff/home/blah/pictures/myaccount/album, and I want to get the names of the last two folders of "myaccount" and "album" (not the paths, just the names) to use in my script. How do I do that?
>>> import os
>>> p = '/stuff/home/blah/pictures/myaccount/album'
>>> os.path.abspath(p).split(os.sep)[-1]
'album'
>>> os.path.abspath(p).split(os.sep)[-2]
'myaccount'
>>> os.path.abspath(p).split(os.sep)[-3]
'pictures'
>>> os.path.abspath(p).split(os.sep)[-4]
'blah'
etc...
There doesn't look to be anything particularly elegant, but this should do the trick:
>>> import os.path
>>> yourpath = "/stuff/home/blah/pictures/myaccount/album"
>>> yourpath = os.path.abspath(yourpath)
>>> npath, d1 = os.path.split(yourpath)
>>> npath, d2 = os.path.split(npath)
>>> print(d1)
album
>>> print(d2)
myaccount
Keep in mind that os.path.split returns an empty string as the second component if the supplied path ends with a slash, so you might want to strip that off first if you don't otherwise validate the format of the supplied path. (The os.path.abspath call above already takes care of this, since it normalizes the path.)
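pathlib sidesteps the trailing-slash issue because it drops trailing slashes when it parses a path. A minimal sketch using PurePosixPath, assuming POSIX-style input; pure paths do not resolve "..", so for relative input call os.path.abspath (or Path.resolve()) first:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/stuff/home/blah/pictures/myaccount/album/")  # note the trailing slash

print(p.parts[-2:])   # ('myaccount', 'album')
print(p.parent.name)  # myaccount
print(p.name)         # album
```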
What about splitting the path to list and get the last two elements?
>>> import os
>>> path_str = '/stuff/home/blah/pictures/myaccount/album'
>>> path_str.split(os.sep)
['', 'stuff', 'home', 'blah', 'pictures', 'myaccount', 'album']
>>> path_str.split(os.sep)[-2:]
['myaccount', 'album']
For relative paths such as . and .., use os.path.abspath() to preprocess the path string first:
>>> import os
>>> path_str = os.path.abspath('.')  # e.g. run from /tmp/foo/bar/foobar
>>> path_str.split(os.sep)
['', 'tmp', 'foo', 'bar', 'foobar']

read string backwards and terminate at first '/'

I want to extract just the file name portion of a path. My code below works, but I'd like to know what the better (pythonic) way of doing this is.
filename = ''
tmppath = '/dir1/dir2/dir3/file.exe'
for i in reversed(tmppath):
    if i != '/':
        filename += str(i)
    else:
        break
a = filename[::-1]
print(a)
Try:
#!/usr/bin/env python3
import os.path
path = '/dir1/dir2/dir3/file.exe'
name = os.path.basename(path)
print(name)
You'd be better off using the standard library for this:
>>> tmppath = '/dir1/dir2/dir3/file.exe'
>>> import os.path
>>> os.path.basename(tmppath)
'file.exe'
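One caveat with os.path.basename: os.path is the module for the current OS, so on Linux or macOS a Windows-style path is not split on backslashes. If you handle paths from another platform, you can import ntpath or posixpath explicitly. A small illustration with made-up example paths:

```python
import ntpath
import posixpath

win_path = r"C:\dir1\dir2\dir3\file.exe"
unix_path = "/dir1/dir2/dir3/file.exe"

print(posixpath.basename(unix_path))  # file.exe
print(posixpath.basename(win_path))   # whole string: '\' is not a separator here
print(ntpath.basename(win_path))      # file.exe
print(ntpath.basename(unix_path))     # file.exe ('/' is accepted by ntpath too)
```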
Use the os.path.basename() function. Alternatively, split the path on os.sep and take the last element:
>>> import os
>>> path = '/dir1/dir2/dir3/file.exe'
>>> path.split(os.sep)
['', 'dir1', 'dir2', 'dir3', 'file.exe']
>>> path.split(os.sep)[-1]
'file.exe'
>>>
The existing answers are correct for your "real underlying question" (path manipulation). For the question in your title (generalizable to other characters of course), what helps there is the rsplit method of strings:
>>> s='some/stuff/with/many/slashes'
>>> s.rsplit('/', 1)
['some/stuff/with/many', 'slashes']
>>> s.rsplit('/', 1)[1]
'slashes'
>>>
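A related string method is str.rpartition, which also searches from the right but always returns three parts, so it doesn't raise when the separator is missing (rsplit('/', 1)[1] raises IndexError on a bare "file.exe"). A sketch; last_component is a hypothetical name:

```python
def last_component(s, sep="/"):
    """Hypothetical helper: everything after the last `sep`, or all of `s` if absent."""
    return s.rpartition(sep)[2]


print(last_component("/dir1/dir2/dir3/file.exe"))      # file.exe
print(last_component("some/stuff/with/many/slashes"))  # slashes
print(last_component("file.exe"))                      # file.exe

try:
    "file.exe".rsplit("/", 1)[1]
except IndexError:
    print("rsplit found no '/', so there is no element at index 1")
```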
