Automate Script, Path folder name changes - python

I am in the process of making an automated script. Basically I am downloading some shapefiles, unzipping them and then making a few changes each month. Each month I download the same dataset.
An issue I have found is that the dataset's folder name changes each month after I download it. I'm not sure how I can point the script to it if the name changes. I don't really want to have to update the script with the new file path each month.
For example November was
L:\load\Ten\Nov20\NSW\FME_68185551_1604301077137_7108\GSNSWDataset
And Dec is
L:\load\Ten\Dec20\NSW\FME_68185551_1606880934716_1252\GSNSWDataset

You could use glob with a wildcard in the changing number section. Something like:
import glob
from datetime import datetime
d = datetime.today().strftime('%b%y')  # e.g. 'Dec20'
fil = glob.glob("L:/load/Ten/%s/NSW/FME*/GSNSWDataset" % d)[0]
This should get you the correct path to your files and then you just read/manipulate however you need.
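A slightly more defensive sketch of the same idea is below. The function name `find_dataset` and the temporary folder used for the demo are made up for illustration; the folder layout is the one from the question. It raises a clear error when nothing matches instead of failing on `[0]`, and if several `FME_*` folders exist it simply takes the last one in sorted order, which is only an assumption about which one you would want.

```python
import glob
import os
import tempfile
from datetime import datetime


def find_dataset(base, when=None):
    """Return the GSNSWDataset folder for the month of `when` (default: today)."""
    when = when or datetime.today()
    month_tag = when.strftime('%b%y')  # e.g. 'Dec20' in an English/C locale
    pattern = os.path.join(base, month_tag, 'NSW', 'FME_*', 'GSNSWDataset')
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError('no dataset folder matches ' + pattern)
    # Assumption: if there are several downloads, take the last one by name.
    return matches[-1]


# Demo against a throwaway directory tree that mimics the question's layout.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'Dec20', 'NSW',
                          'FME_68185551_1606880934716_1252', 'GSNSWDataset')
    os.makedirs(target)
    found = find_dataset(tmp, datetime(2020, 12, 15))
    print(found)
```

In the real script you would call it as `find_dataset("L:/load/Ten")`.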

Related

How to deal in python with the xml files preceded by the TIME prefix in successive runs of SUMO

I want to get the results of successive SUMO runs in CSV format directly from a Python script (not by using the xml2csv tools and cmd). Because the TIME prefix is added in front of the XML file name, I don't know what file name to use in this part of the code.
Here we want the run to show the results separately by using the time:
sumoCmd = [sumoBinary, "-c", "test4.sumocfg", "--tripinfo-output", "tripinfo.xml", "--output-prefix", " TIME"].
And here is where I must put the proper XML file name which is my question:
tree = ET.parse("myfile.xml")
Any help would be appreciated.
Best, Ali
You can just find the file using glob e.g.:
import glob
tripinfos = glob.glob("*tripinfo.xml")
To get the latest you can use sorted:
latest = sorted(tripinfos)[-1]
tree = ET.parse(latest)
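Since the question also asks for CSV output, here is a rough sketch that goes from the latest file to CSV text using only the standard library. The helper names, the timestamped file names and the two attributes in the demo file are invented for illustration; a real tripinfo file has more attributes, and this assumes every `tripinfo` element carries the same ones.

```python
import csv
import glob
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def latest_tripinfo(directory):
    """Pick the last '*tripinfo.xml' file by name, as in the answer above."""
    candidates = sorted(glob.glob(os.path.join(directory, '*tripinfo.xml')))
    if not candidates:
        raise FileNotFoundError('no tripinfo file in ' + directory)
    return candidates[-1]


def tripinfo_to_csv(xml_path):
    """Turn the attributes of every <tripinfo> element into CSV text."""
    root = ET.parse(xml_path).getroot()
    rows = [dict(trip.attrib) for trip in root.iter('tripinfo')]
    if not rows:
        return ''
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Demo with two made-up output files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for prefix, veh in [('2020-12-11-10-20-30', 'old'), ('2020-12-12-11-10-00', 'veh0')]:
        with open(os.path.join(tmp, prefix + 'tripinfo.xml'), 'w') as f:
            f.write('<tripinfos><tripinfo id="%s" duration="42.00"/></tripinfos>' % veh)
    latest = latest_tripinfo(tmp)
    csv_text = tripinfo_to_csv(latest)
    print(csv_text)
```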

How to get latest folder path from S3 using python

I have multiple s3 file paths which contain the folder name as date. I want to extract the latest path from S3 using python and boto3 based on the date.
For Example- Below are the few paths I have under my root folder(s3:///all/stage/servicenow/service-mgmt/sm_task/raw/)
Sample Paths -
s3://my-bucket/all/stage/pqr/xyz/abc/raw/2020/12/11/10/20/file.parquet
s3://my-bucket/all/stage/pqr/xyz/abc/raw/2020/12/11/11/12/file.parquet
s3://my-bucket/all/stage/pqr/xyz/abc/raw/2020/12/11/12/01/file.parquet
s3://my-bucket/all/stage/pqr/xyz/abc/raw/2020/12/12/11/10/file.parquet
All the above paths are in s3:///all/stage/pqr/xyz/abc/raw/YYYY/MM/DD/HH/mm/file.parquet format.
So I need the path with the latest timestamp under the root path (s3:///all/stage/pqr/xyz/abc/raw/), which is s3:///all/stage/pqr/xyz/abc/raw/2020/12/12/11/10/file.parquet.
How can I achieve this using Python and Boto3?
Any help will be appreciated as I am new to Python.
Please comment if the question is not clear.
from os import path
is one way to check the file, using the function
os.path.splitext(path)
and then just use your own algorithm to check whether or not your file's timestamp is the newest.
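One possible version of "your own algorithm" is sketched below. It works on a plain list of object keys, so it says nothing about how you fetch them; with boto3 they would typically come from listing the bucket under the root prefix (for example with the `list_objects_v2` paginator). The function names are made up, and the keys are the sample paths from the question without the `s3://my-bucket/` part.

```python
from datetime import datetime


def key_timestamp(key, root):
    """Parse the YYYY/MM/DD/HH/mm part that follows `root` in a key."""
    parts = key[len(root):].strip('/').split('/')
    return datetime.strptime('/'.join(parts[:5]), '%Y/%m/%d/%H/%M')


def latest_key(keys, root):
    """Return the key whose embedded timestamp is the most recent."""
    return max(keys, key=lambda k: key_timestamp(k, root))


root = 'all/stage/pqr/xyz/abc/raw/'
keys = [
    root + '2020/12/11/10/20/file.parquet',
    root + '2020/12/11/11/12/file.parquet',
    root + '2020/12/11/12/01/file.parquet',
    root + '2020/12/12/11/10/file.parquet',
]
newest = latest_key(keys, root)
print(newest)
```

Because every component is zero-padded, a plain `max(keys)` gives the same answer here; parsing the date just makes the intent explicit and fails loudly on keys that do not follow the pattern.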

Do I always need to specify my current working directory in Python?

I am working through the renameDates project in Automate the Boring Stuff. It's supposed to match dates formatted the American way and change them to the European format.
The thing I don't understand about this code is, how does it find the correct directory?
I can't find any code that sets the current working directory to the folder I need it to work in, but the script seems written assuming the default current working directory is where it should work.
Is it something simple like, running the script from the folder I want to search with the regular expression will make Python set that folder as the CWD?
#! python3
# renameDates.py - renames filenames with American MM-DD-YYYY date format
# to European DD-MM-YYYY.
import shutil, os, re
# Create a regex that matches files with the American date format.
datePattern = re.compile(r"""^(.*?) # all text before the date
    ((0|1)?\d)-                     # one or two digits for the month
    ((0|1|2|3)?\d)-                 # one or two digits for the day
    ((19|20)\d\d)                   # four digits for the year
    (.*?)$                          # all text after the date
    """, re.VERBOSE)
# Loop over the files in the working directory.
for amerFilename in os.listdir('.'):
    mo = datePattern.search(amerFilename)
    # Skip files without a date.
    if mo is None:
        continue
    # Get the different parts of the filename.
    beforePart = mo.group(1)
    monthPart = mo.group(2)
    dayPart = mo.group(4)
    yearPart = mo.group(6)
    afterPart = mo.group(8)
    # Form the European-style filename.
    euroFilename = beforePart + dayPart + '-' + monthPart + '-' + yearPart + afterPart
    # Get the full, absolute file paths.
    absWorkingDir = os.path.abspath('.')
    amerFilename = os.path.join(absWorkingDir, amerFilename)
    euroFilename = os.path.join(absWorkingDir, euroFilename)
    # Rename the files.
    print('Renaming "%s" to "%s"...' % (amerFilename, euroFilename))
    #shutil.move(amerFilename, euroFilename)  # uncomment after testing
Hey, do you run your code from a terminal or from the interpreter? It uses the current working directory. Normally that's the directory you were in when you started the script or the Python interpreter. You can check your current working directory with this code. Hope this helps you:
import os
print(os.getcwd())
You can change the working directory with:
os.chdir(path)
You can create a txt file with a date as its name, like 11-11-2011, in your working directory, and when you run the program you'll see the name has changed.
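To see that `'.'` really does follow the current working directory, here is a small sketch. The helper name `list_relative` and the temporary folder are made up for the demo; it switches the CWD, lists `'.'` from there, and switches back.

```python
import os
import tempfile


def list_relative(directory):
    """Temporarily make `directory` the CWD and list '.' from there."""
    previous = os.getcwd()
    os.chdir(directory)
    try:
        return sorted(os.listdir('.'))
    finally:
        os.chdir(previous)  # always restore the original CWD


with tempfile.TemporaryDirectory() as tmp:
    # Create an empty file with an American-style date in its name.
    open(os.path.join(tmp, 'spam_11-11-2011.txt'), 'w').close()
    start_dir = os.getcwd()
    names = list_relative(tmp)
    end_dir = os.getcwd()
    print(names)
```

The same `os.listdir('.')` call would return something different if the script were started from another folder, which is why renameDates.py works on whatever folder you run it from.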

How to change modified time for folder?

I am taking a zip file as input which contains multiple files and folders. I extract it and then want to change the last modified time of each item in the zip to a new date and time set by the user.
I am using os.utime() to change the date and time, but the changes are only reflected on the files and not on the folders inside the zip.
import datetime, os
import zipfile as zp
from time import mktime
timeInStr = raw_input("Enter the new time =format: dd-mm-yyyy HH:MM:SS -")
timeInDt = datetime.datetime.strptime(timeInStr, '%d-%m-%Y %H:%M:%S')
timeInTS = mktime(timeInDt.timetuple())
epochTime = (datetime.datetime(timeInDt.year, timeInDt.month, timeInDt.day, timeInDt.hour, timeInDt.minute, timeInDt.second) - datetime.datetime(1970, 1, 1)).total_seconds()
z = zp.ZipFile(inputZipFile, "a", zp.ZIP_DEFLATED)
for files in z.infolist():
    z.extract(files, srcFolderName)
    fileName = files.filename
    new = fileName.replace('/', os.path.sep)
    correctName = srcFolderName + os.path.sep + new
    print correctName
    if(correctName.endswith(os.path.sep)):
        # Directory entry: strip the trailing separator before calling utime.
        correc = correctName[:-1]
        print correc
        os.utime(correc, (timeInTS, timeInTS))
    else:
        os.utime(correctName, (timeInTS, timeInTS))
I am using Python 2.7 as platform
The underlying directory behaviour is covered in this question on SO. A directory's timestamp changes whenever the directory itself changes, for example when you create a new file in it. So to update the timestamp of a folder you can create a temp file in it and then delete it. There should be a better way, but until you find it you can manage using this.
I ran into a similar problem. Here is the code I used to get past the issue.
As user966588 stated, the directory's timestamp is updating as the directory changes.
In the post I linked, I held onto any directory metadata updates until after my directory was fully-populated in order for the timestamp change to stay.
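The idea of holding back the directory updates can be sketched roughly as follows, in Python 3 rather than the 2.7 used in the question. The function name `set_tree_mtime` and the demo tree are invented, and this is not the code from the linked post. The point is to extract everything first and only then stamp the tree, so that no later extraction touches a directory that has already been stamped.

```python
import os
import tempfile


def set_tree_mtime(root, timestamp):
    """Set atime/mtime on every file and directory under `root`.

    Call this only after the archive is fully extracted, so that nothing
    is created inside a directory after its timestamp has been set.
    """
    times = (timestamp, timestamp)
    # Walk bottom-up so each directory is stamped after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.utime(os.path.join(dirpath, name), times)
        os.utime(dirpath, times)


# Demo on a throwaway tree standing in for the extracted zip.
with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, 'sub')
    os.makedirs(sub)
    inner = os.path.join(sub, 'a.txt')
    open(inner, 'w').close()
    set_tree_mtime(tmp, 1000000000)
    dir_mtime = os.path.getmtime(sub)
    file_mtime = os.path.getmtime(inner)
    print(dir_mtime, file_mtime)
```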

Can python figure out the most up to date directory?

I have six directories that follow the format
\home\mydir\myproject\2012-01-23_03-01-34
\home\mydir\myproject\2012-01-11_01-00-57
\home\mydir\myproject\2010-01-11_01-00-57
\home\mydir\myproject\2010-01-11_01-00-54
\home\mydir\myproject\2010-01-08_01-00-54
Note the datetime as the final directory. It is exactly this format and it is meant to indicate the time the directory was created. They all contain a file named myfile.xml. I want to parse the latest and greatest myfile.xml. Does Python have any magic where it can tell the latest (i.e. most up to date) directory from the name format of the directories I am using? If it does not, does it have any magic where it can tell from the file timestamps which is the most up to date? The OS is Windows.
Another way of looking at this is that the most up to date directory will also have the highest number.
Thanks.
If you have those directory names in a list dirs, then max(dirs) will give you the latest.
For getting OS information as to the age of the files see http://docs.python.org/release/2.5.2/lib/module-stat.html. If you really need the "most up to date" file, and there's a chance the files in the directories could be modified and so be more up to date than files in directories with later names, going by what the OS says is more robust. If only the creation time given by the folder name is relevant then @Greg Hewgill has you covered.
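Both approaches can be sketched side by side. The function names and the temporary parent folder are made up for the demo; the directory names are the ones from the question. The modification times are set by hand so that the oldest-named directory holds the most recently modified myfile.xml, which is exactly the case where the two answers disagree.

```python
import os
import tempfile
from datetime import datetime

NAME_FORMAT = '%Y-%m-%d_%H-%M-%S'


def latest_by_name(parent):
    """Return the directory whose name parses to the latest datetime."""
    dated = []
    for name in os.listdir(parent):
        try:
            stamp = datetime.strptime(name, NAME_FORMAT)
        except ValueError:
            continue  # skip anything that does not follow the naming scheme
        if os.path.isdir(os.path.join(parent, name)):
            dated.append((stamp, name))
    if not dated:
        raise FileNotFoundError('no dated directories in ' + parent)
    return os.path.join(parent, max(dated)[1])


def latest_by_mtime(parent, filename='myfile.xml'):
    """Return the copy of `filename` with the newest modification time."""
    candidates = [os.path.join(parent, n, filename) for n in os.listdir(parent)]
    candidates = [p for p in candidates if os.path.isfile(p)]
    return max(candidates, key=os.path.getmtime)


names = ['2012-01-23_03-01-34', '2012-01-11_01-00-57', '2010-01-11_01-00-57',
         '2010-01-11_01-00-54', '2010-01-08_01-00-54']
with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        os.makedirs(os.path.join(tmp, name))
        xml_path = os.path.join(tmp, name, 'myfile.xml')
        open(xml_path, 'w').close()
        os.utime(xml_path, (1000000000, 1000000000))
    # Pretend the file in the oldest-named directory was edited most recently.
    edited = os.path.join(tmp, '2010-01-08_01-00-54', 'myfile.xml')
    os.utime(edited, (2000000000, 2000000000))
    by_name = latest_by_name(tmp)
    by_mtime = latest_by_mtime(tmp)
    print(by_name)
    print(by_mtime)
```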
