File modification times reverting after change with os.utime in python - python

I've come across a problem in Python 2.7.1 (running on Mac OS X 10.7.5) with the os.utime command.
I'm trying to develop a script that downloads files matching certain criteria from an FTP server. If a file already exists in a local directory, I want to compare modification times and download a new copy only if they don't match. To achieve this I get the file's modification time from the FTP server, convert it to a timestamp, and then use os.utime to set the access and modification times of the downloaded file to match the server's.
My problem is that as soon as I leave the subroutine where I change the access and modification times, they revert to the original ones! I don't have anything running in the background, and I also tested the script on a Linux server with the same results.
If you run the code below twice, the debug output shows the problem: the files on the FTP server didn't change, but their timestamps don't match the local ones, which had been set correctly.
Thanks in advance for any help with this problem.
import ftplib
import os
from datetime import datetime
def DownloadAndSetTimestamp(local_file,fi,nt):
    lf=open(local_file,'wb')
    f.retrbinary("RETR " + fi, lf.write, 8*1024)
    lf.close
    print fi + " downloaded!"
    print "-> mtime before change : " + str(os.stat(local_file).st_mtime)
    print "-> atime before change : " + str(os.stat(local_file).st_atime)
    print "-> FTP value : " + str(int(nt))
    #set the modification time the same as server for future comparison
    os.utime(local_file,( int(nt) , int(nt) ))
    print "-> mtime after change : "+ str(os.stat(local_file).st_mtime)
    print "-> atime after change : "+ str(os.stat(local_file).st_atime)
print "Connecting to ftp.ncbi.nih.gov..."
f=ftplib.FTP('ftp.ncbi.nih.gov')
f.login()
f.cwd('/genomes/Bacteria/')
listing=[]
dirs=f.nlst();
print "Connected and Dir list retrieved."
target_bug="Streptococcus_pseudopneumoniae"
print "Searching for :"+ target_bug
ct=0;
Target_dir="test/"
for item in dirs:
    if item.find(target_bug)>-1:
        print item
        #create the dir
        if not os.path.isdir(os.path.join(Target_dir,item)):
            print "Dir not found. Creating it..."
            os.makedirs(os.path.join(Target_dir,item))
        #Get the gbk
        #1) change the dir
        f.cwd(item)
        #2) get *.gbk files in dir
        files=f.nlst('*.gbk')
        for fi in files:
            print "----------------------------------------------"
            local_file = os.path.join(Target_dir,item,fi)
            if os.path.isfile(local_file):
                print "################"
                print "File " + local_file + " already exists."
                #get remote modification time
                mt = f.sendcmd('MDTM '+ fi)
                #converting to timestamp
                nt = datetime.strptime(mt[4:], "%Y%m%d%H%M%S").strftime("%s")
                #print "mtime FTP :" + str(int(mt[4:]))
                #print "FTP M timestamp : " + str(nt)
                #print "Local M timestamp : " + str(os.stat(local_file).st_mtime)
                #print "Local A timestamp : " + str(os.stat(local_file).st_atime)
                if int(nt)==int(os.stat(local_file).st_mtime):
                    print fi +" not modified. Download skipped"
                else:
                    print "New version of "+fi
                    ct+=1
                    DownloadAndSetTimestamp(local_file,fi,nt)
                    print "NV Local M timestamp : " + str(os.stat(local_file).st_mtime)
                    print "NV Local A timestamp : " + str(os.stat(local_file).st_atime)
                print "################"
            else:
                print "################"
                print "New file: "+fi
                ct+=1
                mt = f.sendcmd('MDTM '+ fi)
                #converting to timestamp
                nt = datetime.strptime(mt[4:], "%Y%m%d%H%M%S").strftime("%s")
                DownloadAndSetTimestamp(local_file,fi,nt)
                print "################"
        f.cwd('..')
f.quit()
print "# of "+target_bug+" new files found and downloaded: " + str(ct)

You're missing the parentheses in lf.close; it should be lf.close().
Without the parentheses, you're effectively not closing the file. Instead, the file is closed a bit later by the garbage collector after your call to os.utime. Since closing a file flushes the outstanding IO buffer contents, modification time will be updated as a side effect, clobbering the value you previously set.
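As a minimal sketch of the fix (written to run on Python 3, with a hypothetical helper name write_and_stamp and a temporary file standing in for the FTP download), closing the file through a with block before calling os.utime should keep the timestamp in place:

```python
import os
import tempfile


def write_and_stamp(path, data, timestamp):
    """Write data to path, then set atime and mtime to timestamp.

    write_and_stamp is a hypothetical helper, not part of the original script.
    """
    # The with block closes (and flushes) the file before os.utime runs,
    # so no later flush can overwrite the modification time.
    with open(path, "wb") as local_file:
        local_file.write(data)
    os.utime(path, (timestamp, timestamp))
    return os.stat(path).st_mtime


# Demonstration with a temporary file standing in for the downloaded one.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "example.gbk")
stamped_mtime = write_and_stamp(demo_path, b"LOCUS", 1000000000)
print(int(stamped_mtime))
```

In the original script the same effect is obtained simply by writing lf.close() with parentheses; the with block just makes it harder to forget.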

Related

Entity Draw with Understand Python API

I am trying to get each entity to draw a certain type of graph using the Understand Python API. All the inputs are good and the database is opened, but the single item is not drawn. There is no error and no output file. The code is listed below. The Python code is called from a C# application, which calls upython.exe with the associated arguments.
The Python file receives the Scitools directory and opens the understand database. The entities are also an argument which is loaded into a temp file. The outputPath references the directory where the SVG file will be placed. I can't seem to figure out why the item.draw method isn't working.
import os
import sys
import argparse
import re
import json
#parse arguments
parser = argparse.ArgumentParser(description = 'Creates butterfly or call by graphs from an Understand DB')
parser.add_argument('PathToSci', type = str, help = "Path to Sci Understand libraries")
parser.add_argument('PathToDB', type = str, help = "Path to the Database you want to create graphs from")
parser.add_argument('PathToOutput', type = str, help='Path to where the graphs should be outputted')
parser.add_argument('TypeOfGraph', type = str,
                    help="The type of graph you want to generate. Same names as in Understand GUI. IE 'Butterfly' 'Called By' 'Control Flow' ")
parser.add_argument("entities", help='Path to json list file of Entity long names you wish to create graphs for')
args, unknown = parser.parse_known_args()
# they may have entered a path with a space broken into multiple strings
if len(unknown) > 0:
    print("Unknown argument entered.\n Note: Individual arguments must be passed as a single string.")
    quit()
pathToSci = args.PathToSci
pathToDB = args.PathToDB
graphType = args.TypeOfGraph
entities = json.load(open(args.entities,))
pathToOutput = args.PathToOutput
pathToSci = os.path.join(pathToSci, "Python")
sys.path.append(pathToSci)
import understand
db = understand.open(pathToDB)
count = 0
for name in entities:
    count += 1
    print("Completed: " + str(count) + "/" + str(len(entities)))
    #if it is an empty name don't make a graph
    if len(name) == 0:
        break
    pattern = re.compile((name + '$').replace("\\", "/"))
    print("pattern: " + str(pattern))
    sys.stdout.flush()
    ent = db.lookup(pattern)
    print("ent: " + str(ent))
    sys.stdout.flush()
    print("Type: " + str(type(ent[0])))
    sys.stdout.flush()
    for item in ent:
        try:
            filename = os.path.join(pathToOutput, item.longname() + ".svg")
            print("Graph Type: " + graphType)
            sys.stdout.flush()
            print("filename: " + filename)
            sys.stdout.flush()
            print("Item Kind: " + str(ent[0].kind()))
            sys.stdout.flush()
            item.draw(graphType, filename)
        except understand.UnderstandError:
            print("error creating graph")
            sys.stdout.flush()
        except Exception as e:
            print("Could not create graph for " + item.kind().longname() + ": " + item.longname())
            sys.stdout.flush()
            print(e)
            sys.stdout.flush()
db.close()
The output is below:
Completed: 1/1
pattern: re.compile('CSC03.SIN_COS$')
ent: [#lCSC03.SIN_COS#kacsc03.sin_cos(long_float,csc03.sctype)long_float#f./../../../IOSSP/Source_Files/OGP/OGP_71/csc03/csc03.ada]
Type: <class 'understand.Ent'>
Graph Type: Butterfly
filename: C:\Users\M73720\Documents\DFS\DFS-OGP-25-Aug-2022-11-24\SVGs\Entities\CSC03.SIN_COS.svg
Item Kind: Function
It turned out to be a problem in the Understand API itself; the latest build corrected it. I found this out by talking with the SciTools support group.

PyPDF2 - Unable to Get Past a Large Corrupted File

I am working on checking for corrupted PDFs in a file system. In the test I am running, there are almost 200k PDFs. Smaller corrupted files seem to be reported correctly, but I ran into a large 15 MB file that's corrupted, and the code just hangs indefinitely. I've tried setting strict to False with no luck. It seems like the initial opening is the problem. Rather than using threads and setting a timeout (which I have tried in the past with little success), I'm hoping there's an alternative.
import PyPDF2, os
from time import gmtime,strftime
path = raw_input("Enter folder path of PDF files:")
t = open(r'c:\pdf_check\log.txt','w')
count = 1
for dirpath,dnames,fnames in os.walk(path):
    for file in fnames:
        print count
        count = count + 1
        if file.endswith(".pdf"):
            file = os.path.join(dirpath, file)
            try:
                PyPDF2.PdfFileReader(file,'rb',warndest="c:\test\warning.txt")
            except PyPDF2.utils.PdfReadError:
                curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "fail" + "\n")
            else:
                pass
                #curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                #t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "pass" + "\n")
t.close()
It looks like there is an issue with PyPDF2 itself; I wasn't able to get it to work. However, I switched to pdfrw, which did not hang on this file and ran through all of the couple hundred thousand documents without issue.

Trouble calling EMBOSS program from python

I am having trouble calling an EMBOSS program (which runs via command line) called sixpack through Python.
I run Python via Windows 7, Python version 3.23, Biopython version 1.59, EMBOSS version 6.4.0.4. Sixpack is used to translate a DNA sequence in all six reading frames and creates two files as output; a sequence file identifying ORFs, and a file containing the protein sequences.
There are three required arguments which I can successfully call from command line: (-sequence [input file], -outseq [output sequence file], -outfile [protein sequence file]). I have been using the subprocess module in place of os.system as I have read that it is more powerful and versatile.
The following is my python code, which runs without error but does not produce the desired output files.
from Bio import SeqIO
import re
import os
import subprocess
infile = input('Full path to EXISTING .fasta file would you like to open: ')
outdir = input('NEW Directory to write outfiles to: ')
os.mkdir(outdir)
for record in SeqIO.parse(infile, "fasta"):
    print("Translating (6-Frame): " + record.id)
    ident=re.sub("\|", "-", record.id)
    print (infile)
    print ("Old record ID: " + record.id)
    print ("New record ID: " + ident)
    subprocess.call (['C:\memboss\sixpack.exe', '-sequence ' + infile, '-outseq ' + outdir + ident + '.sixpack', '-outfile ' + outdir + ident + '.format'])
    print ("Translation of: " + infile + "\nWritten to: " + outdir + ident)
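Found the answer. I was using the wrong syntax to call subprocess: each option name and its value must be a separate item in the argument list, not joined into a single string. This is the correct syntax: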
subprocess.call (['C:\memboss\sixpack.exe', '-sequence', infile, '-outseq', outdir + ident + '.sixpack', '-outfile', outdir + ident + '.format'])
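To see why the split matters, here is a small sketch that does not need EMBOSS at all. It assumes only a working Python interpreter, which it launches as the child process in place of sixpack.exe; show_argv and the file name in.fasta are hypothetical names used for illustration. The child simply prints the arguments it received:

```python
import subprocess
import sys


def show_argv(args):
    """Run a child Python process that prints the arguments it received."""
    child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
    return subprocess.check_output(child + args).decode().strip()


# One list item containing a space arrives as ONE argument in the child.
joined = show_argv(["-sequence in.fasta"])
# Two list items arrive as TWO arguments, which is what a CLI parser expects.
split = show_argv(["-sequence", "in.fasta"])
print(joined)
print(split)
```

With the joined form, the program receives a single argument containing a space, which it most likely cannot match against any option name it knows.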

Inputting the time and comparing it with user input

I'm trying to make a function run within a Python script at a certain time given by the user. To do this I'm using the datetime module.
This is part of the code so far:
import os
import subprocess
import shutil
import datetime
import time
def process():
    path = os.getcwd()
    outdir = os.getcwd() + '\Output'
    if not os.path.exists(outdir):
        os.mkdir(outdir, 0777)
    for (root, dirs, files) in os.walk(path):
        filesArr = []
        dirname = os.path.basename(root)
        parent_dir = os.path.basename(path)
        if parent_dir == dirname:
            outfile = os.path.join(outdir, ' ' + dirname + '.pdf')
        else:
            outfile = os.path.join(outdir, parent_dir + ' ' + dirname + '.pdf')
        print " "
        print 'Processing: ' + path
        for filename in files:
            if root == outdir:
                continue
            if filename.endswith('.pdf'):
                full_name = os.path.join(root, filename)
                if full_name != outfile:
                    filesArr.append('"' + full_name + '"')
        if filesArr:
            cmd = 'pdftk ' + ' '.join(filesArr) + ' cat output "' + outfile + '"'
            print " "
            print 'Merging: ' + str(filesArr)
            print " "
            sp = subprocess.Popen(cmd)
            print "Finished merging documents successfully."
            sp.wait()
    return
now = datetime.datetime.now()
hour = str(now.hour)
minute = str(now.minute)
seconds = str(now.second)
time_1 = hour + ":" + minute + ":" + seconds
print "Current time is: " + time_1
while True:
    time_input = raw_input("Please enter the time in HH:MM:SS format: ")
    try:
        selected_time = time.strptime(time_input, "%H:%M:%S")
        print "Time selected: " + str(selected_time)
        while True:
            if (selected_time == time.localtime()):
                print "Beginning merging process..."
                process()
                break
            time.sleep(5)
        break
    except ValueError:
        print "The time you entered is incorrect. Try again."
The problem I'm having is finding a way to compare the time entered by the user with the current time (that is, the current time while the script is running). Also, how do I keep a Python script running and call a function at the given time?
I can see several things to comment on in the code you propose, but the main one is selected_time = selected_hour + ..., because I think you are adding integers with different units. You should start with selected_time = selected_hour * 3600 + ....
The second is your validity check on the inputs: you loop on a condition that cannot change, because the user is never asked to enter another value. This means those loops will never end.
Then, regarding robustness: you should compare the selected time to the current time with something more flexible, e.g. replacing == with >= or allowing some delta.
Finally, you can make the Python script wait with the following command:
import time
time.sleep(some_duration)
where some_duration is a float, meant in seconds.
Could you please check if this works now?
First of all I'd suggest you look at: http://docs.python.org/library/time.html#time.strptime
which might prove to be helpful in your situation when trying to validate the time.
You can do something like this:
import time
while True: #Infinite loop
    time_input = raw_input("Please enter the time in HH:MM:SS format: ")
    try:
        current_date = time.strftime("%Y %m %d")
        my_time = time.strptime("%s %s" % (current_date, time_input),
                                "%Y %m %d %H:%M:%S")
        break #this will stop the loop
    except ValueError:
        print "The time you entered is incorrect. Try again."
Now you can do stuff with my_time like comparing it: my_time == time.localtime()
The simplest way to keep the program running until the chosen time arrives is as follows:
import time
while True:
    if (my_time <= time.localtime()):
        print "Running process"
        process()
        break
    time.sleep(1) #Sleep for 1 second
The above example is by no means the best solution, but in my opinion the easiest to implement.
Also I'd suggest you use http://docs.python.org/library/subprocess.html#subprocess.check_call to execute commands whenever possible.
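A possible alternative to polling, sketched here with the datetime module instead of the time-struct comparison used above: compute how many seconds remain until the requested time and sleep once. The helper name seconds_until, and the choice to return 0 when the time has already passed today, are assumptions made for illustration.

```python
from datetime import datetime


def seconds_until(time_text, now):
    """Return seconds from now until time_text (HH:MM:SS) on the same day.

    Hypothetical helper: returns 0.0 if that time has already passed today.
    Raises ValueError if time_text is not a valid HH:MM:SS time.
    """
    target_time = datetime.strptime(time_text, "%H:%M:%S").time()
    target = datetime.combine(now.date(), target_time)
    return max(0.0, (target - now).total_seconds())


# Fixed "now" so the example is reproducible.
example_now = datetime(2020, 1, 1, 12, 0, 0)
remaining = seconds_until("12:00:30", example_now)
already_passed = seconds_until("11:00:00", example_now)
print(remaining, already_passed)
```

In the script this could be used as time.sleep(seconds_until(time_input, datetime.now())) followed by a call to process(), with the ValueError handler kept around the input step.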

Python: File Writing Adding Unintentional Newlines on Linux Only

I am using Python 2.7.9. I'm working on a program that is supposed to produce the following output in a .csv file per loop:
URL,number
Here's the main loop of the code I'm using:
csvlist = open(listfile,'w')
f = open(list, "r")
def hasQuality(item):
    for quality in qualities:
        if quality in item:
            return True
    return False
for line in f:
    line = line.split('\n')
    line = line[0]
    # print line
    itemname = urllib.unquote(line).decode('utf8')
    # print itemhash
    if hasQuality(itemname):
        try:
            looptime = time.time()
            url = baseUrl + line
            results = json.loads(urlopen(url).read())
            # status = results.status_code
            content = results
            if 'median_price' in content:
                medianstr = str(content['median_price']).replace('$','')
                medianstr = medianstr.replace('.','')
                median = float(medianstr)
                volume = content['volume']
                print url+'\n'+itemname
                print 'Median: $'+medianstr
                print 'Volume: '+str(volume)
                if (median > minprice) and (volume > minvol):
                    csvlist.write(line + ',' + medianstr + '\n')
                    print '+ADDED TO LIST'
            else:
                print 'No median price given for '+itemname+'.\nGiving up on item.'
            print "Finished loop in " + str(round(time.time() - looptime,3)) + " seconds."
        except ValueError:
            print "we blacklisted fool?? cause we skippin beats"
    else:
        print itemname+'is a commodity.\nGiving up on item.'
csvlist.close()
f.close()
print "Finished script in " + str(round(time.time() - runtime, 3)) + " seconds."
It should be generating a list that looks like this:
AWP%20%7C%20Asiimov%20%28Field-Tested%29,3911
M4A1-S%20%7C%20Hyper%20Beast%20%28Field-Tested%29,4202
But it's actually generating a list that looks like this:
AWP%20%7C%20Asiimov%20%28Field-Tested%29
,3911
M4A1-S%20%7C%20Hyper%20Beast%20%28Field-Tested%29
,4202
Whenever it is run on a Windows machine, I have no issue. Whenever I run it on my EC2 instance, however, it adds that extra newline. Any ideas why? Running commands on the file like
awk 'NR%2{printf $0" ";next;}1' output.csv
do not do anything. I have transferred it to my Windows machine and it still reads the same. However, when I paste the output into Steam's chat client it concatenates it in the way that I want.
Thanks in advance!
This is where the problem occurs:
csvlist.write(line + ',' + medianstr + '\n')
It can be fixed by stripping the whitespace from the line before writing it:
csvlist.write(line.strip() + ',' + medianstr + '\n')
The problem comes from reading raw lines from the input file. Every raw line except possibly the last ends with line-ending characters, and line.split('\n')[0] removes only the '\n'. If the input file was written with Windows-style '\r\n' endings (which would explain why this shows up only on Linux, where the '\r' is not translated away when reading), a '\r' is left at the end of line and lands just before the comma.
To check, print repr(line) just before writing and look at the output.
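A short illustration of the difference, assuming the input line ends in '\r\n' (the sample line is taken from the expected output in the question; whether the real input file has such endings is an assumption):

```python
# A line as it would be read on Linux from a file with Windows line endings.
raw_line = "AWP%20%7C%20Asiimov%20%28Field-Tested%29\r\n"

# What the question's code does: only the '\n' is removed, '\r' remains.
after_split = raw_line.split("\n")[0]

# What the suggested fix does: all surrounding whitespace is removed.
after_strip = raw_line.strip()

print(repr(after_split))
print(repr(after_strip))
```

The repr output makes the stray carriage return visible, which a plain print would hide.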
