I am using Python to move DBF files from one folder to multiple folders. The files come to me from an S3 bucket, and I unzip and move them. Sometimes a DBF will be missing; if that happens, I want the script to move on to the next file. I figure this calls for an if/else statement, but I am having trouble with the else part.
import arcpy, os
from arcpy import env

env.workspace = "E:\staging\DT_TABLES"

######Move Clackamas Pro41005.dbf######
in_data = "Pro41005.dbf"
out_data = "D:/DATATRACE/OREGON/OR_TRI COUNTY/Pro41005.dbf"
data_type = ""

if in_data == "Pro41005.dbf":
    arcpy.Delete_management(out_data)
    arcpy.Copy_management(in_data, out_data, data_type)
    print 'Clackamas Moved'
else:
    ######Move Multnomah Pro41051.dbf######
    in_data = "Pro41051.dbf"
    out_data = "D:/DATATRACE/OREGON/OR_TRI COUNTY/Pro41051.dbf"
    data_type = ""

    arcpy.Delete_management(out_data)
    arcpy.Copy_management(in_data, out_data, data_type)
    print 'Multnomah Moved'
In other words, if Pro41005.dbf was not in the zipped file, I'd like the script to continue to Pro41051.dbf. These are two of the eight files I am moving; in time there will be about 20 files.
Your if statement right now just checks whether the variable holds the same filename you assigned to it above, so it will always be true.
It seems that what you actually need is to check whether the file exists.
import os
...
if os.path.isfile(in_data):
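For example, here is a minimal sketch (using the paths from the question; the mapping below is a stand-in for the real list of files) that loops over a source-to-destination mapping and only copies the DBFs that actually arrived:

import os
import arcpy

arcpy.env.workspace = "E:/staging/DT_TABLES"

# Source DBF -> destination path; the real script would list all eight
# (eventually ~20) files here.
moves = {
    "Pro41005.dbf": "D:/DATATRACE/OREGON/OR_TRI COUNTY/Pro41005.dbf",
    "Pro41051.dbf": "D:/DATATRACE/OREGON/OR_TRI COUNTY/Pro41051.dbf",
}

for in_data, out_data in moves.items():
    # Skip quietly if this DBF was not in the zip
    if not os.path.isfile(os.path.join(arcpy.env.workspace, in_data)):
        print(in_data + " not found, skipping")
        continue
    if arcpy.Exists(out_data):
        arcpy.Delete_management(out_data)
    arcpy.Copy_management(in_data, out_data)
    print(in_data + " moved")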
I'm writing something in Python that needs to know which specific files/programs are open. I've mapped the list of running processes to find the executable paths of the processes I'm looking for. This works for most things, but all Microsoft Office documents run under generic processes like WINWORD.EXE or EXCEL.EXE. I've also tried getting a list of open windows and their titles to see which file is being edited, but the window titles give relative paths, not absolute paths to the file being edited.
Here's a sample:
import wmi

f = wmi.WMI()
pid_map = {}
PID = 4464  # pid of Microsoft Word
for process in f.Win32_Process():
    if not process.Commandline:
        continue
    pid_map[process.ProcessID] = process.Commandline
pid_map[PID]
Outputs:
'"C:\\Program Files\\Microsoft Office\\root\\Office16\\WINWORD.EXE" '
How do I get the path of the file actually being edited?
I figured it out. Here is a function that will return the files being edited.
import re
import pythoncom

def get_office():  # creates a doctype: docpath dictionary
    context = pythoncom.CreateBindCtx(0)
    files = {}
    dupl = 1
    # Matches file paths; the extension group can be narrowed to only get specific files
    patt2 = re.compile(r'(?i)(\w:)((\\|\/)+([\w\-\.\(\)\{\}\s]+))+' + r'(\.\w+)')
    # Look for paths in the Running Object Table (ROT)
    for moniker in pythoncom.GetRunningObjectTable():
        name = moniker.GetDisplayName(context, None)
        checker = re.search(patt2, name)
        if checker:
            match = checker.group(5)  # extension
            if match in ('.XLAM', '.xlam'):
                continue  # these files aren't useful
            try:
                files[match[1:]]  # check whether this file type was already documented
                match += str(dupl)  # key exists, so disambiguate it with a counter
                dupl += 1
            except KeyError:
                pass
            files[match[1:]] = name  # add doctype: docpath pairing to the dictionary
    return files
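Usage is then just calling the function and iterating the extension-to-path mapping it returns, for example:

if __name__ == '__main__':
    # Print each doctype -> path pair found in the ROT
    for doctype, docpath in get_office().items():
        print(doctype, docpath)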
I'm trying to make a custom logwatcher of a log folder using Python. The objective is simple: find a regex in the logs and write a line to a text file whenever it matches.
The problem is that the script must run constantly against a folder that can contain multiple log files with unknown names, not a single one, and it should detect the creation of new log files inside the folder on the fly.
I made a kind of tail -f in Python (copying part of the code) that constantly reads one specific log file and writes a line to a txt file whenever the regex matches, but I don't know how to do this for a whole folder instead of a single log file, or how the script can detect new log files created inside the folder and start reading them on the fly.
#!/usr/bin/env python
import time, os, re
from datetime import datetime

# Regex used to match relevant loglines
error_regex = re.compile(r"ERROR:")
start_regex = re.compile(r"INFO: Service started:")

# Output file, where the matched loglines will be copied to
output_filename = os.path.normpath("log/script-log.txt")

# Function that will work as tail -f for python
def follow(thefile):
    thefile.seek(0, 2)
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line

logfile = open("log/service.log")
loglines = follow(logfile)
counter = 0
for line in loglines:
    if error_regex.search(line):
        counter += 1
        sttime = datetime.now().strftime('%Y%m%d_%H:%M:%S - ')
        out_file = open(output_filename, "a")
        out_file.write(sttime + line)
        out_file.close()
    if start_regex.search(line):
        sttime = datetime.now().strftime('%Y%m%d_%H:%M:%S - ')
        out_file = open(output_filename, "a")
        out_file.write(sttime + "SERVICE STARTED\n" + sttime + "Number of errors detected during the startup = {}\n".format(counter))
        counter = 0
        out_file.close()
You can use watchgod for this purpose. (This might be better suited as a comment; I'm not sure it deserves to be an answer.)
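A minimal sketch of how that could look, assuming watchgod's watch() generator, which yields batches of (change, path) tuples for the watched folder; the folder name, the .log filter, the offset bookkeeping and the helper function are illustrative, not from the question:

import os
import re
from watchgod import watch  # pip install watchgod

error_regex = re.compile(r"ERROR:")
output_filename = "log/script-log.txt"

# Remember how far each log file has been read (path -> byte offset)
offsets = {}

def scan_new_lines(path):
    # Read anything appended to `path` since the last scan and log regex hits
    with open(path) as f:
        f.seek(offsets.get(path, 0))
        line = f.readline()
        while line:
            if error_regex.search(line):
                with open(output_filename, "a") as out:
                    out.write(line)
            line = f.readline()
        offsets[path] = f.tell()

# watch() blocks and yields a batch of changes whenever files in the folder
# are created, modified or deleted
for changes in watch("log"):
    for _change, path in changes:
        if path.endswith(".log") and os.path.isfile(path):
            scan_new_lines(path)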
My Python skills are very limited (to none), and I've never created an automated, sequential process for ArcMap. Below are the steps I'd like to code; any advice would be appreciated.
Locate File folder
Import the “first” file (a csv table) (there are over 500 CSVs; the naming convention is not sequential)
Join csv to HUC08 shapefile
Select the data without Null values under the field name “Name”
Save selected data as a layer file within my FoTX.gdb
Move to the next file within the folder and complete the same action until all actions are complete.
# Part of the code. The rest depends mostly on your data

import os, arcpy, csv

# Set environment settings
arcpy.env.workspace = 'C:/data'  # whatever it is for you; you can do this or not

mxd = arcpy.mapping.MapDocument("CURRENT")
folderPath = os.path.dirname(mxd.filePath)

# Loop through each csv file
count = 0
for f_name in os.listdir(folderPath):
    fullpath = os.path.join(folderPath, f_name)
    if os.path.isfile(fullpath):
        if f_name.lower().endswith(".csv"):
            count += 1
            # import csv file and join to shape file code here

            # Set local variables
            in_features = ['SomeNAME.shp', 'SomeOtherNAME.shp']  # if there is more than one
            out_location = 'C:/output/FoTX.gdb'
            # out_location = os.path.basename(gdb.filePath)  # or, if the gdb is in the
            # same folder as the csv files

            # Execute FeatureClassToGeodatabase
            arcpy.FeatureClassToGeodatabase_conversion(in_features, out_location)

if count == 0:
    print "No CSV files in this folder"
I have a list that contains the names of some files.
I want to append the content of all the files to the first file, and then copy that file (the first file, with everything appended) to a new path.
This is what I have done till now:
This is the part of the code that does the appending (I have put a reproducible program at the end of my question; please have a look at that):
if (len(appended) == 1):
    shutil.copy(os.path.join(path, appended[0]), out_path_tempappendedfiles)
else:
    with open(appended[0], 'a+') as myappendedfile:
        for file in appended:
            myappendedfile.write(file)
    shutil.copy(os.path.join(path, myappendedfile.name), out_path_tempappendedfiles)
This runs and copies successfully, but it does not append the files; the result just keeps the content of the first file.
I have also tried the approach in this link; it did not raise an error, but it did not append the files either. It is the same code, except that instead of write I used shutil.copyfileobj:
with open(file, 'rb') as fd:
    shutil.copyfileobj(fd, myappendedfile)
The same thing happened.
Update 1
This is the whole code. Even with the update, it still does not append:
import os
import shutil
import pandas as pd

d = {'Clinic Number': [1, 1, 1, 2, 2, 3],
     'date': ['2015-05-05', '2015-05-05', '2015-05-05', '2015-05-05', '2016-05-05', '2017-05-05'],
     'file': ['1a.txt', '1b.txt', '1c.txt', '2.txt', '4.txt', '5.txt']}
df = pd.DataFrame(data=d)
df.sort_values(['Clinic Number', 'date'], inplace=True)
df['row_number'] = (df.date.ne(df.date.shift()) | df['Clinic Number'].ne(df['Clinic Number'].shift())).cumsum()

path = 'C:/Users/sari/Documents/fldr'
out_path_tempappendedfiles = 'C:/Users/sari/Documents/fldr/temp'

for rownumber in df['row_number'].unique():
    appended = df[df['row_number'] == rownumber]['file'].tolist()
    if (len(appended) == 1):
        shutil.copy(os.path.join(path, appended[0]), out_path_tempappendedfiles)
    else:
        with open(appended[0], 'a') as myappendedfile:
            for file in appended:
                fd = open(file, 'r')
                myappendedfile.write('\n' + fd.read())
                fd.close()
        shutil.copy(os.path.join(path, myappendedfile.name), out_path_tempappendedfiles)
Would you please let me know what the problem is?
You can do it like this, and if the files are too large to load at once, you can use readlines as described in Python append multiple files in given order to one big file:
import os, shutil

file_list = ['a.txt', 'a1.txt', 'a2.txt', 'a3.txt']
new_path = ...  # fill in the destination folder

with open(file_list[0], "a") as content_0:
    for file_i in file_list[1:]:
        f_i = open(file_i, 'r')
        content_0.write('\n' + f_i.read())
        f_i.close()
shutil.copy(file_list[0], new_path)
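If the files really are too big to read in one go, a line-by-line variant (a sketch of the idea referenced above, using the same made-up file names) avoids holding a whole file in memory:

import shutil

file_list = ['a.txt', 'a1.txt', 'a2.txt', 'a3.txt']

with open(file_list[0], "a") as content_0:
    for file_i in file_list[1:]:
        content_0.write('\n')
        with open(file_i, 'r') as f_i:
            # Iterating the file object streams it line by line instead of
            # reading the whole file with read()
            for line in f_i:
                content_0.write(line)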
So this is how I resolved it.
It was a very silly mistake: not joining the base path to the file name.
I changed it to use shutil.copyfileobj for performance, but the problem was only resolved by this:
os.path.join(path, file)
Before adding this I was reading from the bare file name in the list, instead of joining the base path to read from the actual file.
for rownumber in df['row_number'].unique():
    appended = df[df['row_number'] == rownumber]['file'].tolist()
    print(appended)
    if (len(appended) == 1):
        shutil.copy(os.path.join(path, appended[0]), new_path)
    else:
        with open(appended[0], "w+") as myappendedfile:
            for file in appended:
                with open(os.path.join(path, file), 'r+') as fd:
                    shutil.copyfileobj(fd, myappendedfile, 1024 * 1024 * 10)
                myappendedfile.write('\n')
        shutil.copy(appended[0], new_path)
I have a series of .csv files with some data, and I want a Python script to open them all, do some preprocessing, and upload the processed data to my postgres database.
I have it mostly complete, but my upload step isn't working. I'm sure it's something simple that I'm missing, but I just can't find it. I'd appreciate any help you can provide.
Here's the code:
import psycopg2
import sys
from os import listdir
from os.path import isfile, join
import csv
import re
import io

try:
    con = db_connect("dbname = '[redacted]' user = '[redacted]' password = '[redacted]' host = '[redacted]'")
except:
    print("Can't connect to database.")
    sys.exit(1)

cur = con.cursor()

upload_file = io.StringIO()

file_list = [f for f in listdir(mypath) if isfile(join(mypath, f))]
for file in file_list:
    id_match = re.search(r'.*-(\d+)\.csv', file)
    if id_match:
        id = id_match.group(1)
        file_name = format(id_match.group())
        with open(mypath + file_name) as fh:
            id_reader = csv.reader(fh)
            next(id_reader, None)  # Skip the header row
            for row in id_reader:
                [stuff goes here to get desired values from file]
                if upload_file.getvalue() != '': upload_file.write('\n')
                upload_file.write('{0}\t{1}\t{2}'.format(id, [val1], [val2]))

print(upload_file.getvalue())  # prints output that looks like I expect it to,
                               # with thousands of rows that seem to have the right values in the right fields

cur.copy_from(upload_file, '[my_table]', sep='\t', columns=('id', 'col_1', 'col_2'))
con.commit()

if con:
    con.close()
This runs without error, but a select query in psql still shows no records in the table. What am I missing?
Edit: I ended up giving up and writing it to a temporary file, and then uploading the file. This worked without any trouble...I'd obviously rather not have the temporary file though, so I'm happy to have suggestions if someone sees the problem.
When you write to an io.StringIO (or any other file) object, the file pointer remains at the position of the last character written. So, when you do
f = io.StringIO()
f.write('1\t2\t3\n')
s = f.readline()
the file pointer stays at the end of the file and s contains an empty string.
To read (as opposed to getvalue) the contents, you must reposition the file pointer to the beginning, e.g. using seek(0):
upload_file.seek(0)
cur.copy_from(upload_file, '[my_table]', columns = ('id', 'col_1', 'col_2'))
This allows copy_from to read from the beginning and import all the lines in your upload_file.
Don't forget that you read and keep all the files in memory. That might be fine for a single small import, but it can become a problem for large imports or multiple imports running in parallel.
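If that becomes an issue, one option (a sketch reusing the names from the question's script, with the same assumed table and columns) is to build, upload, and discard a small buffer per CSV instead of one big buffer for everything:

# Per-file variant: each CSV gets its own small StringIO buffer, which is
# uploaded and then discarded, so memory use stays bounded by one file.
for file in file_list:
    buf = io.StringIO()
    # ... write this file's processed rows into buf, one per line ...
    buf.seek(0)  # rewind before handing the buffer to copy_from
    cur.copy_from(buf, '[my_table]', sep='\t', columns=('id', 'col_1', 'col_2'))
con.commit()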