I have several shapefiles in a folder, each suffixed by _LINES or _AREAS.
I would like to get rid of these suffixes:
California_Aqueduct_LINES.shp --> California_Aqueduct.shp
California_Aqueduct_LINES.dbf --> California_Aqueduct.dbf
California_Aqueduct_LINES.prj --> California_Aqueduct.prj
California_Aqueduct_LINES.shx --> California_Aqueduct.shx
Subdivision_AREAS.dbf --> Subdivision.dbf
Subdivision_AREAS.prj --> Subdivision.prj
Subdivision_AREAS.SHP --> Subdivision.SHP
Subdivision_AREAS.shx --> Subdivision.shx
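Can I do it this way:
ls = ['California_Aqueduct_LINES.shp',
      'Subdivision_AREAS.SHP']
truncate = set(['LINES', 'AREAS'])
for p in [x.split('_') for x in ls]:
    pre, suf = p[-1].split('.')
    if pre in truncate:
        # drop the trailing LINES/AREAS piece, keep the extension
        print '_'.join(p[:-1]) + '.' + suf
    else:
        # nothing to strip, print the name unchanged
        print '_'.join(p)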
Here is something I've put together in ArcPy:
import os, sys, arcpy

InFolder = sys.argv[1] # make this a hard set path if you like
arcpy.env.workspace = InFolder
CapitalKeywordsToRemove = ["_AREAS","_LINES"]# MUST BE CAPITALS
DS_List = arcpy.ListFeatureClasses("*.shp","ALL") # Get a list of the feature classes (shapefiles)
for ThisDS in DS_List:
    NewName = ThisDS # set the new name to the old name
    # replace key words, searching is case sensitive but I'm trying not to change the case
    # of the name so the new name is put together from the original name with searching in
    # upper case, use as many key words as you like
    for CapKeyWord in CapitalKeywordsToRemove:
        if CapKeyWord in NewName.upper():
            # remove the instance of CapKeyWord without changing the case
            NewName = NewName[0:NewName.upper().find(CapKeyWord)] + NewName[NewName.upper().find(CapKeyWord) + len(CapKeyWord):]
    if NewName != ThisDS:
        if not arcpy.Exists(NewName):
            arcpy.AddMessage("Renaming " + ThisDS + " to " + NewName)
            arcpy.Rename_management(ThisDS , NewName)
        else:
            arcpy.AddWarning("Cannot rename, " + NewName + " already exists")
    else:
        arcpy.AddMessage("Retaining " + ThisDS)
If you don't have arcpy let me know and I'll change it to just straight python... there's a bit more to it but it's not difficult.
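For readers without arcpy, the same idea can be sketched in plain Python. This is only a rough sketch, not the ArcPy behaviour: `strip_suffix` and `rename_in_folder` are hypothetical helper names, and the code assumes every file that belongs to a shapefile has a single extension (.shp, .dbf, .prj, .shx). A sidecar such as `name_LINES.shp.xml` would be left untouched and would need extra handling.

```python
import os

SUFFIXES = ("_LINES", "_AREAS")  # compared in upper case, like the ArcPy version


def strip_suffix(filename, suffixes=SUFFIXES):
    """Return filename with a trailing suffix removed from its stem, if present."""
    stem, ext = os.path.splitext(filename)
    for suffix in suffixes:
        if stem.upper().endswith(suffix):
            # Slice instead of replace so the case of the rest is preserved.
            return stem[:-len(suffix)] + ext
    return filename


def rename_in_folder(folder, suffixes=SUFFIXES):
    """Rename every file in folder whose stem ends with one of the suffixes."""
    renamed = []
    for old_name in sorted(os.listdir(folder)):
        new_name = strip_suffix(old_name, suffixes)
        if new_name == old_name:
            continue  # nothing to strip
        target = os.path.join(folder, new_name)
        if os.path.exists(target):
            print("Cannot rename, " + new_name + " already exists")
            continue
        os.rename(os.path.join(folder, old_name), target)
        renamed.append((old_name, new_name))
    return renamed
```

Because each file is renamed on its own, the .shp, .dbf, .prj and .shx parts stay together only as long as all of them are present in the folder and none of the targets already exists.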
I have a path like this one:
path = "./corpus_test/corpus_ix_test_FMC.xlsx"
I want to retrieve the name of the file without ".xlsx" and without the other parts of the path.
I know I could slice with fixed indexes, but in some cases the file name is different and the path is not the same, for example:
path2 = "./corpus_ix/corpus_polarity_test_FMC.xlsx"
I am looking for a regular expression or a method that retrieves only the name in both cases. For example, if I read a full directory containing lots of files, a fixed index won't help me.
Is there a way in Python to tell it to start slicing at the last "/"? That way I would only need to find the index of the last "/" and add 1 to get the start position.
Here is what I have tried so far:
path = "./corpus_test/corpus_ix_test_FMC.xlsx"
list_of_index = []
for f in path:
    if f == "/":
        ind = path.index(f)
        list_of_index.append(ind)
        print("the index of the last '/' is:", ind)
ind_to_start_count = max(list_of_index) + 1
name_of_file = path[ind_to_start_count:-5]  # drop the directory part and ".xlsx"
But the print gives me 1 for each "/". Is there a way to get the index of each individual character in the string?
The index of "/" comes out as 1 every time, for both paths.
I also tried to split the string into characters, but got an error:
path ="./corpus_test/corpus_ix_test_FMC.xlsx"
path_string = path.split("")
ValueError Traceback (most recent call last)
<ipython-input-9-b8bdc29c19b1> in <module>
1 path ="./corpus_test/corpus_ix_test_FMC.xlsx"
----> 3 path_string = path.split("")
4 print(path_string)
ValueError: empty separator
import os
fullpath = r"./corpus_test/corpus_ix_test_FMC.xlsx"
full_filename = os.path.basename(fullpath)
filename, ext = os.path.splitext(full_filename)
This gives you the base filename without the extension in filename.
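As a quick check, here is the same approach applied to both paths from the question, next to `pathlib` and a manual slice at the last "/". The helper name `file_stem` is made up for this illustration. The attempt in the question printed 1 every time because `str.index` always returns the position of the first match; `str.rfind` returns the position of the last one.

```python
import os
from pathlib import Path


def file_stem(path):
    """Return the file name without its directory part and without its extension."""
    full_filename = os.path.basename(path)
    stem, _ext = os.path.splitext(full_filename)
    return stem


path = "./corpus_test/corpus_ix_test_FMC.xlsx"
path2 = "./corpus_ix/corpus_polarity_test_FMC.xlsx"

print(file_stem(path))    # corpus_ix_test_FMC
print(file_stem(path2))   # corpus_polarity_test_FMC

# pathlib gives the same result in one step.
print(Path(path).stem)    # corpus_ix_test_FMC

# Manual slicing, as asked in the question: start one past the last "/".
last_slash = path.rfind("/")
print(path[last_slash + 1:-len(".xlsx")])  # corpus_ix_test_FMC
```

The manual slice only works when the extension is known in advance, which is why `os.path.splitext` or `Path.stem` is usually the safer choice when reading a whole directory.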
This is what I've been using:
def get_suffix(word, delimeter):
    """ Returns the part of word after the last instance of 'delimeter' """
    while delimeter in word:
        word = word.partition(delimeter)[2]
    return word

def get_terminal_path(path):
    """
    Returns the last step of a path
    Edge cases:
    -Delimeters: / or \\ or mixed
    -Ends with delimeter or not
    """
    # Convert "\\" to "/"
    while "\\" in path:
        part = path.partition("\\")
        path = part[0] + "/" + part[2]
    # Check if ends with delimeter
    if path[-1] == "/":
        path = path[0:-1]
    # Get terminal path
    out = get_suffix(path, "/")
    return out.partition(".")[0]
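For comparison, the standard library covers the same edge cases (mixed "/" and "\\" delimiters, trailing delimiter) through `pathlib.PureWindowsPath`, which treats both characters as separators on any operating system. The wrapper name `terminal_name` is made up for this sketch. One difference to be aware of: `stem` removes only the last extension, whereas the function above cuts at the first dot.

```python
from pathlib import PureWindowsPath


def terminal_name(path):
    """Return the last step of a path without its final extension.

    PureWindowsPath accepts "/" and "\\" as separators and ignores a
    trailing separator, so no manual normalisation is needed.
    """
    return PureWindowsPath(path).stem


print(terminal_name("C:\\data/maps\\roads.shp"))  # roads
print(terminal_name("data/maps/"))                # maps
print(terminal_name("a\\b\\archive.tar.gz"))      # archive.tar
```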
I have a rather general question I was hoping to get some help with. I put together a Python program that automates workflows at the state level for all the different counties. The program was created for research at school, not actual state work. I have two designs, shown below. The first is the updated version, which takes about 40 minutes to run. The second is the original work: it is not well structured, but it runs the entire program in about five minutes. Could anybody give any insight into why there is such a difference between the two? The updated version is still preferable, as it is much more reusable (it can grab any dataset at the URL) and easier to understand, and 40 minutes to complete about a hundred workflows is still a plus. This is also still a work in progress: a couple of minor issues remain in the code, but it is still a pretty cool program.
Updated Design
import os, sys, urllib2, urllib, zipfile, arcpy
from arcpy import env

path = os.getcwd()

def pickData():
    myCount = 1
    path1 = 'path2URL'
    response = urllib2.urlopen(path1)
    print "Enter the name of the files you need"
    numZips = raw_input()
    numZips2 = numZips.split(",")
    myResponse(myCount, path1, response, numZips2)

def myResponse(myCount, path1, response, numZips2):
    myPath = os.getcwd()
    for each in response:
        eachNew = each.split(" ")
        eachCounty = eachNew[9].strip("\n").strip("\r")
        myCountyDir = os.mkdir(os.path.expanduser(myPath+ "\\counties" + "\\" + eachCounty))
        myRetrieveDir = myPath+"\\counties" + "\\" + eachCounty
        response1 = urllib2.urlopen(path1 + eachNew[9])
        for all1 in response1:
            allNew = all1.split(",")
            allFinal = allNew[0].split(" ")
            allFinal1 = allFinal[len(allFinal)-1].strip(" ").strip("\n").strip("\r")
            numZipsIter = 0
            path8 = path1 + eachNew[9][0:len(eachNew[9])-2] +"/"+ allFinal1
            downZip = eachNew[9][0:len(eachNew[9])-2]+".zip"
            while(numZipsIter <len(numZips2)):
                if (numZips2[numZipsIter][0:3].strip(" ") == "NWI") and ("remap" not in allFinal1):
                    numZips2New = numZips2[numZipsIter].split("_")
                    if (numZips2New[0].strip(" ") in allFinal1 and numZips2New[1] != "remap" and numZips2New[2].strip(" ") in allFinal1) and (allFinal1[-3:]=="ZIP" or allFinal1[-3:]=="zip"):
                        urllib.urlretrieve (path8, allFinal1)
                        zip1 = zipfile.ZipFile(myRetrieveDir +"\\" + allFinal1)
                #maybe just have numzips2 (raw input) as the values before the county number
                #numZips2[numZipsIter][0:-7].strip(" ") in allFinal1 or numZips2[numZipsIter][0:-7].strip(" ").lower() in allFinal1) and (allFinal1[-3:]=="ZIP" or allFinal1[-3:]=="zip"
                elif (numZips2[numZipsIter].strip(" ") in allFinal1 or numZips2[numZipsIter].strip(" ").lower() in allFinal1) and (allFinal1[-3:]=="ZIP" or allFinal1[-3:]=="zip"):
                    urllib.urlretrieve (path8, allFinal1)
                    zip1 = zipfile.ZipFile(myRetrieveDir +"\\" + allFinal1)
                # move on to the next requested name, otherwise the loop never ends
                numZipsIter += 1

#client picks shapefiles to add to map
#section for geoprocessing operations
# get the data frames
#add new data frame, title
#check spaces in ftp crawler
env.workspace = path+ "\\symbology\\"
zp1 = os.listdir(path + "\\counties\\")

def myGeoprocessing(layer1, layer2):
    #the code in this function is used for geoprocessing operations
    #it returns whatever output is generated from the tools used in the map
    arcpy.Clip_analysis(path + "\\symbology\\Stream_order.shp", layer1, path + "\\counties\\" + layer2 + "\\Streams.shp")
    streams = arcpy.mapping.Layer(path + "\\counties\\" + layer2 + "\\Streams.shp")
    arcpy.ApplySymbologyFromLayer_management(streams, path+ '\\symbology\\streams.lyr')
    return streams

def makeMap():
    #original wetlands layers need to be entered as NWI_line or NWI_poly
    print "Enter the layer or layers you wish to include in the map"
    myInput = raw_input()
    counter1 = 1
    for each in zp1:
        print each
        print path
        zp2 = os.listdir(path + "\\counties\\" + each)
        for eachNew in zp2:
            #print eachNew
            if (eachNew[-4:] == ".shp") and ((myInput in eachNew[0:-7] or myInput.lower() in eachNew[0:-7])or((eachNew[8:12] == "poly" or eachNew[8:12]=='line') and eachNew[8:12] in myInput)):
                print eachNew[0:-7]
                try:
                    theMap = arcpy.mapping.MapDocument(path +'\\map.mxd')
                    df1 = arcpy.mapping.ListDataFrames(theMap,"*")[0]
                    #this is where we add our layers
                    layer1 = arcpy.mapping.Layer(path + "\\counties\\" + each + "\\" + eachNew)
                    if(eachNew[7:11] == "poly" or eachNew[7:11] =="line"):
                        arcpy.ApplySymbologyFromLayer_management(layer1, path + '\\symbology\\' +myInput+'.lyr')
                    else:
                        arcpy.ApplySymbologyFromLayer_management(layer1, path + '\\symbology\\' +eachNew[0:-7]+'.lyr')
                    # Assign legend variable for map
                    legend = arcpy.mapping.ListLayoutElements(theMap, "LEGEND_ELEMENT", "Legend")[0]
                    # add wetland layer to map
                    legend.autoAdd = True
                    arcpy.mapping.AddLayer(df1, layer1,"AUTO_ARRANGE")
                    #geoprocessing steps
                    streams = myGeoprocessing(layer1, each)
                    # more geoprocessing options, add the layers to map and assign if they should appear in legend
                    legend.autoAdd = True
                    arcpy.mapping.AddLayer(df1, streams,"TOP")
                    df1.extent = layer1.getExtent(True)
                    arcpy.mapping.ExportToJPEG(theMap, path + "\\counties\\" + each + "\\map.jpg")
                    # Save map document to path
                    theMap.saveACopy(path + "\\counties\\" + each + "\\map.mxd")
                    del theMap
                    print "done with map " + str(counter1)
                    counter1 += 1
                except Exception:
                    print "issue with map or already exists"
Original Design
import os, sys, urllib2, urllib, zipfile, arcpy
from arcpy import env

response = urllib2.urlopen('path2URL')
path1 = 'path2URL'
myCount = 1
for each in response:
    eachNew = each.split(" ")
    response1 = urllib2.urlopen(path1 + eachNew[9])
    for all1 in response1:
        #print all1
        allNew = all1.split(",")
        allFinal = allNew[0].split(" ")
        allFinal1 = allFinal[len(allFinal)-1].strip(" ")
        if allFinal1[-10:-2] == "poly.ZIP":
            response2 = urllib2.urlopen('path2URL')
            zipcontent= response2.readlines()
            path8 = 'path2URL'+ eachNew[9][0:len(eachNew[9])-2] +"/"+ allFinal1[0:len(allFinal1)-2]
            downZip = str(eachNew[9][0:len(eachNew[9])-2])+ ".zip"
            urllib.urlretrieve (path8, downZip)

# Set the path to the directory where your zipped folders reside
zipfilepath = r'F:\Misc\presentation'
# Set the path to where you want the extracted data to reside
extractiondir = r'F:\Misc\presentation\counties'
# List all data in the main directory
zp1 = os.listdir(zipfilepath)
# Creates a loop which gives us each zipped folder automatically
# Concatenates zipped folder to original directory in variable done
for each in zp1:
    print each[-4:]
    if each[-4:] == ".zip":
        done = zipfilepath + "\\" + each
        zip1 = zipfile.ZipFile(done)
        extractiondir1 = extractiondir + "\\" + each[:-4]
        zip1.extractall(extractiondir1)

path = os.getcwd()
counter1 = 1
# get the data frames
# Create new layer for all files to be added to map document
env.workspace = "E:\\Misc\\presentation\\symbology\\"
zp1 = os.listdir(path + "\\counties\\")
for each in zp1:
    zp2 = os.listdir(path + "\\counties\\" + each)
    for eachNew in zp2:
        if eachNew[-4:] == ".shp":
            wetlandMap = arcpy.mapping.MapDocument('E:\\Misc\\presentation\\wetland.mxd')
            df1 = arcpy.mapping.ListDataFrames(wetlandMap,"*")[0]
            #print eachNew[-4:]
            wetland = arcpy.mapping.Layer(path + "\\counties\\" + each + "\\" + eachNew)
            #arcpy.Clip_analysis(path + "\\symbology\\Stream_order.shp", wetland, path + "\\counties\\" + each + "\\Streams.shp")
            streams = arcpy.mapping.Layer(path + "\\symbology\\Stream_order.shp")
            arcpy.ApplySymbologyFromLayer_management(wetland, path + '\\symbology\\wetland.lyr')
            arcpy.ApplySymbologyFromLayer_management(streams, path+ '\\symbology\\streams.lyr')
            # Assign legend variable for map
            legend = arcpy.mapping.ListLayoutElements(wetlandMap, "LEGEND_ELEMENT", "Legend")[0]
            # add the layers to map and assign if they should appear in legend
            legend.autoAdd = True
            arcpy.mapping.AddLayer(df1, streams,"TOP")
            legend.autoAdd = True
            arcpy.mapping.AddLayer(df1, wetland,"AUTO_ARRANGE")
            df1.extent = wetland.getExtent(True)
            # Export the map to a JPEG
            arcpy.mapping.ExportToJPEG(wetlandMap, path + "\\counties\\" + each + "\\wetland.jpg")
            # Save map document to path
            wetlandMap.saveACopy(path + "\\counties\\" + each + "\\wetland.mxd")
            del wetlandMap
            print "done with map " + str(counter1)
            counter1 += 1
Have a look at this guide:
Let me quote:
Function call overhead in Python is relatively high, especially compared with the execution speed of a builtin function. This strongly suggests that where appropriate, functions should handle data aggregates.
In effect, this suggests not factoring something out into a function if it is going to be called hundreds of thousands of times.
In Python, functions are not inlined, and calling them is not cheap. If in doubt, use a profiler to find out how many times each function is called and how long each call takes on average. Then optimize.
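A minimal sketch of that profiling step, using the standard library's `cProfile` and `pstats`. The functions `square`, `sum_squares_with_calls`, `sum_squares_inline` and `profile_call_counts` are toy examples invented here, not part of the program in the question. Profiling the real script the same way would also show whether call overhead is the cost at all: one visible difference between the two designs is that `arcpy.Clip_analysis` runs for every map in the updated version but is commented out in the original.

```python
import cProfile
import pstats


def square(x):
    return x * x


def sum_squares_with_calls(n):
    """Calls square() once per element, so n extra function calls."""
    total = 0
    for i in range(n):
        total += square(i)
    return total


def sum_squares_inline(n):
    """Same arithmetic with the multiplication written inline."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call_counts(func, *args):
    """Run func under cProfile; return its result and a {function name: call count} dict."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (primitive calls, total calls, tottime, cumtime, callers)
    counts = {key[2]: value[1] for key, value in stats.stats.items()}
    return result, counts


result, counts = profile_call_counts(sum_squares_with_calls, 100000)
print(counts["square"])  # 100000
```

For a whole script, `python -m cProfile -s cumulative your_script.py` prints the same information sorted by cumulative time, which is usually the quickest way to see where the 40 minutes go.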
You might also give PyPy a shot, as they have certain optimizations built in. Reducing the function call overhead in some cases seems to be one of them:
Python equivalence to inline functions or macros
My goal is to download full metazoan genome sequences from NCBI. I have a list of unique ID numbers for the genome sequences I need. I planned to use the Bio.Entrez module EFetch to download the data but learned today via the Nov 2, 2011 release notes (http://1.usa.gov/1TA5osg) that EFetch does not support the 'Genome' database. Can anyone suggest an alternative package/module or some other way around this? Thank you in advance!
Here is a script for you, though you may need to tinker with it to make it work. Name the script whatever you prefer, but call it as follows:
python name_of_script[with .py extension] your_email_address.
You need to add your email address to the end of the call or it will not work. If you have a text file of accession numbers (one per line), choose option 2. If you choose option 1, you will be asked for the name of the organism, the strain name, and keywords. Use as many keywords as you like; just be certain to separate them with commas. With the first option, NCBI is searched and returns GI numbers, which are then used to fetch the accession numbers. [NOTE: NCBI is phasing out GI numbers in September 2016, so this script may not work after that point.] Once all the accession numbers are present, a folder is created, with a subfolder for each accession number (named after the accession number). The corresponding FASTA and GenBank files are downloaded into each subfolder and carry the accession number as the file name (e.g. accession_number.fna, accession_number.gb). Edit the script to suit your purposes.
ALSO...Please note the warning (ACHTUNG) portion of the script. Sometimes the rules can be bent...but if you are egregious enough, your IP may be blocked from NCBI. You have been warned.
import os
import os.path
import sys
import re #regular expressions
from Bio import Entrez
import datetime
import time
import glob

arguments = sys.argv
Entrez.email = arguments[1] #email
accession_ids = []
print('Select method for obtaining the accession numbers?\n')
action = input('1 -- Input Search Terms\n2 -- Use text file\n')

if action == '1':
    print('\nYou will be asked to enter an organism name, a strain name, and keywords.')
    print('It is not necessary to provide a value to each item (you may just hit [ENTER]), but you must provide at least one item.\n')
    organism = input('Enter the organism you wish to search for (e.g. Escherichia coli [ENTER])\n')
    strain = input('Enter the strain you wish to search for. (e.g., HUSEC2011 [ENTER])\n')
    keywords = input('Enter the keywords separated by a comma (e.g., complete genome, contigs, partial [ENTER])\n')
    search_phrase = ''
    if ',' in keywords:
        keywords = keywords.split(',')
    ncbi_terms = ['organism', 'strain', 'keyword']
    ncbi_values = [organism, strain, keywords]
    for index, n in enumerate(ncbi_values):
        if index == 0 and n != '':
            search_phrase = '(' + n + '[' + ncbi_terms[index] + '])'
        else:
            if n != '' and index != len(ncbi_values)-1:
                search_phrase = search_phrase + ' AND (' + n + '[' + ncbi_terms[index] + '])'
            if index == len(ncbi_values)-1 and n != '' and type(n) is not list:
                search_phrase = search_phrase + ' AND (' + n + '[' + ncbi_terms[index] + '])'
            if index == len(ncbi_values)-1 and n != '' and type(n) is list:
                for name in n:
                    name = name.lstrip()
                    search_phrase = search_phrase + ' AND (' + name + '[' + ncbi_terms[index] + '])'
    print('Here is the complete search line that will be used: \n\n', search_phrase)
    handle = Entrez.esearch(db='nuccore', term=search_phrase, retmax=1000, rettype='acc', retmode='text')
    result = Entrez.read(handle)
    gi_numbers = result['IdList']
    fetch_handle = Entrez.efetch(db='nucleotide', id=result['IdList'], rettype='acc', retmode='text')
    accession_ids = [id.strip() for id in fetch_handle]

if action == '2': #use this option if you have a file of accession #s
    file_name = input('Enter the name of the file\n')
    with open(file_name, 'r') as input_file:
        lines = input_file.readlines()
        for line in lines:
            line = line.replace('\n', '')
            accession_ids.append(line)
#----------------------------------- Make directory to store files --------------------------------------------
new_path = 'Genbank_Files/'
if not os.path.exists(new_path):
    os.makedirs(new_path)
print('You have ' + str(len(accession_ids)) + ' file(s) to download.') #print(accession_ids)

# Skip accession numbers whose GenBank file is already on disk
ending = '.gb' # assumed value; the original definition of `ending` is not shown
files = []
for dirpath, dirnames, filenames in os.walk(new_path):
    for filename in [f for f in filenames if f.endswith(ending)]: #for zipped files
        files.append(os.path.join(dirpath, filename))
for f in files:
    f = f.rsplit('/')[-1]
    f = f.replace('.gb', '')
    if f in accession_ids:
        ind = accession_ids.index(f)
        accession_ids.pop(ind)
print('You have ' + str(len(accession_ids)) + ' file(s) to download.')
# Call Entrez to download files
# If downloading more than 100 files...
# Run this script only between 9pm-5am Monday - Friday EST
# Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov
# Make no more than 3 requests every 1 second (Biopython takes care of this).
# Use URL parameter email & tool for distributed software
# NCBI's Disclaimer and Copyright notice must be evident to users of your service.
# Use this script at your own risk.
# Neither the script author nor author's employers are responsible for consequences arising from improper usage
# CALL ENTREZ: Call Entrez to download genbank AND fasta (nucleotide) files using accession numbers.
start_day = datetime.date.today().weekday() # 0 is Monday, 6 is Sunday
start_time = datetime.datetime.now().time()
print(str(start_day), str(start_time))
# weekend days are 5 (Saturday) and 6 (Sunday), hence >= 5
if ((start_day < 5 and start_time > datetime.time(hour=21)) or (start_day < 5 and start_time < datetime.time(hour=5)) or start_day >= 5 or len(accession_ids) <= 100 ):
    print('Calling Entrez...')
    for a in accession_ids:
        if ((datetime.date.today().weekday() < 5 and datetime.datetime.now().time() > datetime.time(hour=21)) or
            (datetime.date.today().weekday() < 5 and datetime.datetime.now().time() < datetime.time(hour=5)) or
            (datetime.date.today().weekday() == start_day + 1 and datetime.datetime.now().time() < datetime.time(hour=5)) or
            (datetime.date.today().weekday() >= 5) or len(accession_ids) <= 100 ):
            print('Downloading ' + a)
            new_path = 'Genbank_Files/' + a + '/'
            if not os.path.exists(new_path):
                os.makedirs(new_path)
            # GenBank record
            handle=Entrez.efetch(db='nucleotide', id=a, rettype='gb', retmode='text', seq_start=0)
            FILENAME = new_path + a + '.gb'
            with open(FILENAME, 'w') as local_file:
                local_file.write(handle.read())
            handle.close()
            # FASTA record
            handle=Entrez.efetch(db='nucleotide', id=a, rettype='fasta', retmode='text')
            FILENAME = new_path + a + '.fna'
            with open(FILENAME, 'w') as local_file:
                local_file.write(handle.read())
            handle.close()
else:
    print('You have too many files to download at the time. Try again later.')
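The time-window condition in the script is easier to read and to test when pulled out into a small function. The sketch below is one way to express the rule stated in the script's comments (large jobs only on weekends or between 9pm and 5am on weekdays). The name `download_allowed` and the constant `LARGE_JOB_THRESHOLD` are invented for this illustration, and the function assumes the `now` it receives is already in Eastern time.

```python
import datetime

LARGE_JOB_THRESHOLD = 100  # from the script's "more than 100 files" comment


def download_allowed(now, n_files):
    """Return True if a job of n_files may start at the datetime `now`.

    Assumes `now` is expressed in Eastern time, as in the script's comments.
    """
    if n_files <= LARGE_JOB_THRESHOLD:
        return True                      # small jobs may run at any time
    if now.weekday() >= 5:
        return True                      # Saturday (5) or Sunday (6)
    current = now.time()
    # Weekday: only late evening or early morning.
    return current >= datetime.time(21, 0) or current < datetime.time(5, 0)


# Wednesday 2 March 2016
print(download_allowed(datetime.datetime(2016, 3, 2, 12, 0), 500))  # False
print(download_allowed(datetime.datetime(2016, 3, 2, 22, 0), 500))  # True
print(download_allowed(datetime.datetime(2016, 3, 2, 12, 0), 50))   # True
```

Passing the time in as an argument, rather than calling `datetime.datetime.now()` inside the condition, is what makes the rule checkable without waiting for 9pm.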
I am writing a script in Python to consolidate images from different folders into a single folder. Several image files may have the same name. How can I handle this in Python? I need to rename the duplicates as "image_name_0001", "image_name_0002", and so on.
You can maintain a dict with a count of the names that have been seen so far, and then use os.rename() to rename the file to the new name.
For example:
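dic = {}
list_of_files = ["a","a","b","c","b","d","a"]
for f in list_of_files:
    if f in dic:
        # name seen before: bump its counter and append it, zero-padded
        dic[f] += 1
        new_name = "{0}_{1:04d}".format(f,dic[f])
        print new_name
    else:
        # first time this name is seen: keep it unchanged
        dic[f] = 0
        print f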
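Applied to real files, the counting idea could look roughly like the sketch below. The function name `consolidate` is hypothetical, and the code copies with `shutil.copy2` rather than renaming in place, so the source folders stay untouched. It assumes the counter is kept per full file name and that the suffix goes before the extension. It does not guard against a source file that is itself already named like a generated name (for example `a_0001.jpg`).

```python
import os
import shutil


def consolidate(source_folders, dest_folder):
    """Copy all files from source_folders into dest_folder, numbering duplicates."""
    os.makedirs(dest_folder, exist_ok=True)
    seen = {}      # file name -> number of duplicates met so far
    copied = []
    for folder in source_folders:
        for filename in sorted(os.listdir(folder)):
            source = os.path.join(folder, filename)
            if not os.path.isfile(source):
                continue  # skip sub-directories
            stem, ext = os.path.splitext(filename)
            if filename in seen:
                seen[filename] += 1
                new_name = "{0}_{1:04d}{2}".format(stem, seen[filename], ext)
            else:
                seen[filename] = 0
                new_name = filename
            shutil.copy2(source, os.path.join(dest_folder, new_name))
            copied.append(new_name)
    return copied
```

With two folders that each contain `a.jpg`, the destination ends up with `a.jpg` and `a_0001.jpg`.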
If you have the root filename (i.e. name = 'image_name'), the extension (extension = '.jpg') and the path to the output folder (path), you can do:
*for each file*:
    moved = 0
    num = 0
    if os.path.exists(path + name + extension):
        while moved == 0:
            # try the next number until a free name is found
            num += 1
            modifier = '_{0:04d}'.format(num)
            if not os.path.exists(path + name + modifier + extension):
                *MOVE FILE HERE using (path + name + modifier + extension)*
                moved = 1
    else:
        *MOVE FILE HERE using (path + name + extension)*
There are obviously a couple of bits of pseudocode in there, but you should get the gist.
I am trying to:
Loop through a bunch of files
Make some changes
Copy the old file to a subdirectory. Here's the kicker: I don't want to overwrite the file in the new directory if it already exists (e.g. if "Filename.mxd" already exists, then copy and rename to "Filename_1.mxd"; if "Filename_1.mxd" exists, then copy the file as "Filename_2.mxd", and so on).
Save the file (but do a save, not a save as, so that it overwrites the existing file)
It goes something like this:
for filename in glob.glob(os.path.join(folderPath, "*.mxd")):
    fullpath = os.path.join(folderPath, filename)
    mxd = arcpy.mapping.MapDocument(filename)
    if os.path.isfile(fullpath):
        basename, filename2 = os.path.split(fullpath)
        shortname, extension = os.path.splitext(filename2)
        # Make some changes to my file here
        # Copy the in memory file to a new location. If the file name already exists, then rename the file with the next instance of i (e.g. filename + "_" + i)
        for i in range(50):
            if i > 0:
                print "Test1"
                if arcpy.Exists(draftloc + "\\" + filename2) or arcpy.Exists(draftloc + "\\" + shortname + "_" + str(i) + extension):
                    print "Test2"
                else:
                    print "Test3"
                    arcpy.Copy_management(filename2, draftloc + "\\" + shortname + "_" + str(i) + extension)
I decided to do two things. First, I set the range well beyond the number of files I ever expect to occur (50). I'm sure there's a better way of doing this, by just incrementing to the next number without setting a range.
Second, as you may see, the script saves a copy for every value in the range. I just want to save it once, at the first value of i that does not already exist.
Hope this makes sense.
Use a while loop instead of a for loop. Use the while loop to find the appropriate i, and then save afterwards.
The code/pseudocode would look like:
result_name = original name
i = 0
while arcpy.Exists(result_name):
    i += 1
    result_name = draftloc + "\\" + shortname + "_" + str(i) + extension
save as result_name
This should fix both issues.
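The same loop as a runnable sketch in plain Python, with `os.path.exists` standing in for `arcpy.Exists`. The function name `next_free_path` is made up for this example; `shortname` and `extension` mirror the variable names used in the question.

```python
import os


def next_free_path(folder, shortname, extension):
    """Return the first path of the form shortname[_i]extension not yet present in folder."""
    candidate = os.path.join(folder, shortname + extension)
    i = 0
    while os.path.exists(candidate):
        i += 1
        candidate = os.path.join(folder, "{0}_{1}{2}".format(shortname, i, extension))
    return candidate
```

The loop only searches for a name. The save or copy happens once, after the loop, on whatever path it returns, which is what avoids writing a copy for every value of i.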
Thanks to Maty's suggestion above, I've come up with my answer. For those who are interested, my code is:
result_name = filename2
print result_name
i = 0
# Check if file exists
if arcpy.Exists(draftloc + "\\" + result_name):
    # If it does, increment i by 1
    i += 1
    # While each successive filename (including i) already exists, keep incrementing i
    while arcpy.Exists(draftloc + "\\" + shortname + "_" + str(i) + extension):
        i += 1
    # Save once, under the first numbered name that is free
    mxd.saveACopy(draftloc + "\\" + shortname + "_" + str(i) + extension)
# else if the original file didn't satisfy the if, then save it.
else:
    mxd.saveACopy(draftloc + "\\" + result_name)