Python Sequential Requests: Data Processing Automation within ArcMap

My Python skills are very limited (to none) and I've never created an automated, sequential request for ArcMap. Below are the steps I'd like to code; any advice would be appreciated.
Locate the file folder
Import the “first” file (a csv table); there are over 500 csvs and the naming convention is not sequential
Join the csv to the HUC08 shapefile
Select the data without Null values under the field name “Name”
Save the selected data as a layer file within my FoTX.gdb
Move to the next file within the folder and repeat the same actions until all files are processed.

# Part of the code. The rest depends mostly on your data
import os, arcpy, csv

# Set environment settings
arcpy.env.workspace = 'C:/data'  # whatever it is for you; you can do this or not

mxd = arcpy.mapping.MapDocument("CURRENT")
folderPath = os.path.dirname(mxd.filePath)

# Loop through each csv file
count = 0
for f_name in os.listdir(folderPath):
    fullpath = os.path.join(folderPath, f_name)
    if os.path.isfile(fullpath) and f_name.lower().endswith(".csv"):
        count += 1
        # import csv file and join to shape file code here

        # Set local variables
        in_features = ['SomeNAME.shp', 'SomeOtherNAME.shp']  # if there is more than one
        out_location = 'C:/output/FoTX.gdb'
        # out_location = os.path.basename(gdb.filePath)  # or if the gdb is in the same folder as the csv files

        # Execute FeatureClassToGeodatabase
        arcpy.FeatureClassToGeodatabase_conversion(in_features, out_location)

if count == 0:
    print "No CSV files in this folder"

Related

How can I find the input data location(s) from a Python script?

I want to create a general program for monitoring purposes, to see which input data is being used by the various models in our company.
Therefore, I want to loop through our (production) model folder, find all the .py or .ipynb files, open them, and read them as strings using glob (and os). For now, I have made a loop that looks for all scripts containing "csv" (as a start):
import os
import glob

path = directory  # root folder to search
search_word = 'csv'
# list to store files that contain the matching word
final_files = []
for folder_path, folders, files in os.walk(path):
    # IPYNB files
    ipynb_pattern = os.path.join(folder_path, '*.IPYNB')
    for filepath in glob.glob(ipynb_pattern):
        try:
            with open(filepath) as fp:
                # read the file as a string
                data = fp.read()
                if search_word in data:
                    final_files.append(filepath)
        except OSError:
            print('Exception while reading file', filepath)
print(final_files)
This gives back all IPYNB files containing the word csv in the script, so I'm able to read within the files.
What I want is that, in the part where I'm now searching for 'csv', the program reads the file (as it does right now) and determines which input data (and, in the end, output) is being used.
For example, one file (.IPYNB) contains this script part (input used for a model):
#Dataset 1
df1 = pd.read_csv('Data.csv', sep=';')
#dataset 2
sql_conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=X;DATABASE=X;Trusted_Connection=yes')
query = "SELECT * FROM database.schema.data2"
df2 = pd.read_sql_query(query, sql_conn)
#dataset 3
sql_conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=X;DATABASE=X;Trusted_Connection=yes')
query = "SELECT element1, element2 FROM database.schema.data3"
df3 = pd.read_sql_query(query, sql_conn)
How can I make the program extract the following facts:
Data.csv
database.schema.data2
database.schema.data3
Does anyone have a good idea?
Thanks in advance!
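One way to approach the extraction, given the patterns in the example above (pd.read_csv('...') calls and SELECT ... FROM queries), is a couple of regular expressions. This is only a sketch and the patterns would need tuning for other coding styles:
import re

# Hypothetical helper: pull csv paths and SQL tables out of one script's text
def extract_inputs(data):
    inputs = []
    # file names passed to pd.read_csv('...') or pd.read_csv("...")
    inputs += re.findall(r"read_csv\(\s*['\"]([^'\"]+)['\"]", data)
    # tables referenced after FROM in query strings
    inputs += re.findall(r"FROM\s+([\w.\[\]]+)", data, flags=re.IGNORECASE)
    return inputs

# inside the existing loop, after data = fp.read():
#     if search_word in data:
#         final_files.append(filepath)
#         print(filepath, extract_inputs(data))
For the example script part above this returns ['Data.csv', 'database.schema.data2', 'database.schema.data3'].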

How can I rename a downloaded file in a for loop, when files have the same name?

I am using Selenium in Python to download the same file, but with different inputs, each time. So for example, I download data with country selection, "China." In the next iteration, I download the same data, but for country "Brazil."
I am struggling to find easy-to-understand syntax I can use to rename the downloaded files. The files are currently downloading as "Data.csv" and "Data(1).csv". What I want is to have "China-Data.csv" and "Brazil-Data.csv".
The only relevant code I have constructed for this is:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

ChromeOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome('Users/yu/Downloads/chromedriver')
inputcountry.send_keys('China')
inputcountry.send_keys(Keys.RETURN)
I read through this post, but I don't know how to create a for loop that can adapt this to the issue of files having the same name but with numbers at the end, e.g. Data(1).csv, Data(2).csv, Data(3).csv.
Thanks
Since you know the name of the download file, you can rename as you go. It can be tricky to know when a download completes, so I used a polling method.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import time
import shutil

download_file = os.path.expanduser("~/Downloads/Data.csv")
save_to_template = os.path.expanduser("~/Documents/Data-{}.csv")

# remove stale files
if os.path.isfile(download_file):
    os.remove(download_file)

ChromeOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome('Users/yu/Downloads/chromedriver')

countries = ['China', 'Malaysia', 'Brazil']
for country in countries:
    inputcountry.send_keys(country)
    inputcountry.send_keys(Keys.RETURN)
    # one option is to poll for the file showing up.... assuming the file
    # is renamed when the download is done
    for s in range(60):  # give it a minute
        if os.path.exists(download_file):
            shutil.move(download_file, save_to_template.format(country))
            break
        time.sleep(1)
    else:
        raise TimeoutError("could not download {}".format(country))
If you know the order of your files (i.e. you know that Data(1) should be named China-Data, Data(2) should be named Brazil-Data, etc.), then you just need to use a list and rename all the files according to it.
import os

directory = 'Users/yu/Downloads/chromedriver/'
correct_names = ['China-Data.csv', 'Brazil-Data.csv']

def rename_files(directory: str, correct_names: list) -> None:
    # change the name of each file in the directory
    for i, filename in enumerate(sorted(os.listdir(directory))):
        src = directory + filename
        dst = directory + correct_names[i]
        os.rename(src, dst)
Every time you do inputcountry.send_keys('China'), you can append whatever input you are giving to the correct_names list, e.g. correct_names.append('China-Data.csv').
You can then call rename_files at the end with the correct_names list.
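Putting that together, a rough usage sketch (the inputcountry element and the download folder are assumptions carried over from the question):
# Hypothetical driver loop -- builds correct_names as each country is requested
correct_names = []
for country in ['China', 'Brazil']:
    inputcountry.send_keys(country)
    inputcountry.send_keys(Keys.RETURN)
    correct_names.append('{}-Data.csv'.format(country))

# once all downloads have finished, rename Data.csv, Data(1).csv, ... in order;
# check that sorted() really matches the order the files were downloaded in
rename_files(os.path.expanduser('~/Downloads/'), correct_names)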

How to remove row line numbers from several .doc/.docx files on Linux?

I need to remove row line numbers from a large collection of Word .doc/.docx files as part of a (Python) data processing pipeline.
I am aware of solutions to do this in C# using Word.Interop (e.g. Is it possible to use Microsoft.Office.Interop.Word to programatically remove line numbering from a Word document?) but it would be great to achieve this e.g. using LibreOffice in --headless mode (before evaluating MS Word + wine solutions).
For a single file, with the UI, one can follow https://help.libreoffice.org/Writer/Line_Numbering, but I need to do this for a lot of files, so what would be great is a macro/script/command-line solution that can
1) cycle through a set of files
2) remove row numbers and save the result to file
and that can be triggered with e.g. a Python subprocess call, or even with calls to the Python API (https://help.libreoffice.org/Common/Scripting).
To perform line removal for a list of files in the working directory (and put the resulting output into PDFs), run LibreOffice headless from a Linux command line:
soffice --headless --accept="socket,host=localhost,port=2002;urp;StarOffice.ServiceManager"
and then in the Python interpreter
import uno
import socket
import os
import subprocess
from pythonscript import ScriptContext
from com.sun.star.beans import PropertyValue

# list doc files in the working dir
files = [x for x in os.listdir('.') if x.endswith(".docx")]

# iterate over the files
for file in files:
    localContext = uno.getComponentContext()
    resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
    ctx = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
    smgr = ctx.ServiceManager
    desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop", ctx)
    # open the file
    model = desktop.loadComponentFromURL(uno.systemPathToFileUrl(os.path.realpath(file)), "_blank", 0, ())
    # remove line numbers
    model.getLineNumberingProperties().IsOn = False
    # prepare to save the output to pdf
    XSCRIPTCONTEXT = ScriptContext(ctx, None, None)
    p = PropertyValue()
    p.Name = 'FilterName'
    p.Value = 'writer_pdf_Export'
    oDoc = XSCRIPTCONTEXT.getDocument()
    # create the pdf
    oDoc.storeToURL("file://" + os.getcwd() + "/" + file + ".pdf", tuple([p]))
This should create pdf files with no line numbering in your working directory.
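The subprocess import in the script above is only needed if you also want to launch the headless LibreOffice listener from Python instead of starting it in a separate shell. A rough sketch of that, assuming soffice is on the PATH:
import subprocess
import time

# start a headless LibreOffice listening on the UNO socket used above
soffice = subprocess.Popen([
    "soffice", "--headless",
    '--accept=socket,host=localhost,port=2002;urp;StarOffice.ServiceManager',
])
time.sleep(5)  # crude wait for the listener to come up before resolving the UNO context

# ... run the UNO code above, then shut the office down
# soffice.terminate()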
Useful links:
Add line numbers and export to pdf via macro on OpenOffice forums
LineNumberingProperties documentation
Info on running a macro from the command line

if/else statement - moving files and continuing if file does not exist

I am using Python to move DBF files from one folder to multiple folders. These come to me from an S3 bucket, and I unzip and move them. Sometimes there will be a missing DBF. If that happens, I want the script to skip the missing file and move on to the next one. I figure this would be an if/else statement, but I am having trouble with the else part.
import arcpy, os
from arcpy import env

env.workspace = r"E:\staging\DT_TABLES"

###### Move Clackamas Pro41005.dbf ######
in_data = "Pro41005.dbf"
out_data = "D:/DATATRACE/OREGON/OR_TRI COUNTY/Pro41005.dbf"
data_type = ""

if in_data == "Pro41005.dbf":
    arcpy.Delete_management(out_data)
    arcpy.Copy_management(in_data, out_data, data_type)
    print 'Clackamas Moved'
else:
    ###### Move Multnomah Pro41051.dbf ######
    in_data = "Pro41051.dbf"
    out_data = "D:/DATATRACE/OREGON/OR_TRI COUNTY/Pro41051.dbf"
    data_type = ""
    arcpy.Delete_management(out_data)
    arcpy.Copy_management(in_data, out_data, data_type)
    print 'Multnomah Moved'
In other words, if Pro41005.dbf was not in the zipped file, I'd like the script to continue to Pro41051.dbf. These are two of the eight files that I am moving. In time there will be about 20 files.
Your if statement right now just checks whether the variable holds the same filename you already assigned above, so it will always be true.
It seems that what you need is to check whether the file exists:
import os
...
if os.path.isfile(in_data):
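To scale that check to all eight (and eventually ~20) files, one sketch is to loop over the expected names and skip any that are missing. The file list and output folder here are illustrative, taken from the two files in the question:
# Hypothetical list of DBFs -- extend as more counties are added
dbf_files = ["Pro41005.dbf", "Pro41051.dbf"]
out_folder = "D:/DATATRACE/OREGON/OR_TRI COUNTY/"

for dbf in dbf_files:
    # skip files that did not arrive in this delivery
    if not os.path.isfile(os.path.join(env.workspace, dbf)):
        print dbf + ' not found, skipping'
        continue
    out_data = out_folder + dbf
    if arcpy.Exists(out_data):
        arcpy.Delete_management(out_data)
    arcpy.Copy_management(dbf, out_data)
    print dbf + ' moved'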

Many-record upload to postgres

I have a series of .csv files with some data, and I want a Python script to open them all, do some preprocessing, and upload the processed data to my postgres database.
I have it mostly complete, but my upload step isn't working. I'm sure it's something simple that I'm missing, but I just can't find it. I'd appreciate any help you can provide.
Here's the code:
import psycopg2
import sys
from os import listdir
from os.path import isfile, join
import csv
import re
import io

try:
    con = psycopg2.connect("dbname='[redacted]' user='[redacted]' password='[redacted]' host='[redacted]'")
except:
    print("Can't connect to database.")
    sys.exit(1)

cur = con.cursor()
upload_file = io.StringIO()

file_list = [f for f in listdir(mypath) if isfile(join(mypath, f))]
for file in file_list:
    id_match = re.search(r'.*-(\d+)\.csv', file)
    if id_match:
        id = id_match.group(1)
        file_name = format(id_match.group())
        with open(mypath + file_name) as fh:
            id_reader = csv.reader(fh)
            next(id_reader, None)  # Skip the header row
            for row in id_reader:
                # [stuff goes here to get desired values from file]
                if upload_file.getvalue() != '':
                    upload_file.write('\n')
                upload_file.write('{0}\t{1}\t{2}'.format(id, [val1], [val2]))

print(upload_file.getvalue())  # prints output that looks like I expect it to,
# with thousands of rows that seem to have the right values in the right fields

cur.copy_from(upload_file, '[my_table]', sep='\t', columns=('id', 'col_1', 'col_2'))
con.commit()
if con:
    con.close()
This runs without error, but a select query in psql still shows no records in the table. What am I missing?
Edit: I ended up giving up and writing it to a temporary file, and then uploading the file. This worked without any trouble...I'd obviously rather not have the temporary file though, so I'm happy to have suggestions if someone sees the problem.
When you write to an io.StringIO (or any other file) object, the file pointer remains at the position of the last character written. So, when you do
f = io.StringIO()
f.write('1\t2\t3\n')
s = f.readline()
the file pointer stays at the end of the file and s contains an empty string.
To read (not getvalue) the contents, you must reposition the file pointer to the beginning, e.g. use seek(0)
upload_file.seek(0)
cur.copy_from(upload_file, '[my_table]', columns = ('id', 'col_1', 'col_2'))
This allows copy_from to read from the beginning and import all the lines in your upload_file.
Don't forget that you read and keep all the files in memory, which might work for a single small import, but may become a problem when doing large imports or multiple imports in parallel.
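If memory does become a concern, one option (a sketch only, not tested against the original schema) is to flush the buffer to the database after each csv instead of accumulating every file:
# inside the per-file loop, once the rows of one csv have been written
upload_file.seek(0)
cur.copy_from(upload_file, '[my_table]', sep='\t', columns=('id', 'col_1', 'col_2'))
con.commit()

# empty the buffer so the next csv starts from scratch
upload_file.seek(0)
upload_file.truncate(0)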
