Python: get most recent file in a directory with certain extension

I'm trying to have the newest file in the upload directory with the .log extension processed by Python. I use an Ubuntu web server, and file upload is done by an HTML script. The uploaded file is processed by a Python script and the results are written to a MySQL database. I used this answer for my code.
import glob
import os  # needed for os.path.getctime

newest = max(glob.iglob('upload/*.log'), key=os.path.getctime)
print newest
f = open(newest, 'r')
But this is not getting the newest file in the directory, instead it gets the oldest one. Why?

The logical inverse of max is min: whatever ordering max picks from, min picks the opposite end. If
newest = max(glob.iglob('upload/*.log'), key=os.path.getctime)
is giving you the oldest file, then for your purposes it should be:
newest = min(glob.iglob('upload/*.log'), key=os.path.getctime)
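Note that on Linux os.path.getctime is the inode change time, not the creation time, so it can order files in surprising ways; os.path.getmtime (last modification time) is usually the key you want. A minimal sketch to check the ordering yourself, assuming the same upload/*.log layout as the question:
import glob
import os

# print every candidate from oldest to newest by modification time,
# alongside both timestamps, to see which key function picks which file
for path in sorted(glob.iglob('upload/*.log'), key=os.path.getmtime):
    print(path, os.path.getmtime(path), os.path.getctime(path))

# the most recently modified file
newest = max(glob.iglob('upload/*.log'), key=os.path.getmtime)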

In newer code, it is often preferable to use pathlib for this very common task:
from pathlib import Path
XLSX_DIR = Path('../../somedir/')
XLSX_PATTERN = r'someprefix*.xlsx'
latest_file = max(XLSX_DIR.glob(XLSX_PATTERN), key=lambda f: f.stat().st_ctime)
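The same pattern applies to the question's upload directory; a minimal sketch, again keying on modification time rather than st_ctime:
from pathlib import Path

LOG_DIR = Path('upload')

# newest .log file by modification time
latest_log = max(LOG_DIR.glob('*.log'), key=lambda f: f.stat().st_mtime)
print(latest_log)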

Related

Open last saved CSV file via python

I have a file that is downloaded in CSV format every day, and its name changes daily. How can I open that file in Excel via Python after it has been downloaded automatically?
I've tried the solution below, but I have to give the full file name, which I can't do because it changes dynamically.
from subprocess import Popen
p = Popen('filename.csv', shell=True)
At the end of the code that downloads the file, you can find the latest file in your download folder and open it:
import glob
import os
from subprocess import Popen
list_of_files = glob.glob('/path_to_download_folder/*.csv')
latest_csv_file = max(list_of_files, key=os.path.getctime)
print(latest_csv_file)
# then do whatever you want with it
os.startfile(latest_csv_file)
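Note that os.startfile only exists on Windows. If the script might run elsewhere, a hedged cross-platform sketch (the download-folder path is a placeholder, as above):
import glob
import os
import subprocess
import sys

list_of_files = glob.glob('/path_to_download_folder/*.csv')
latest_csv_file = max(list_of_files, key=os.path.getctime)

if sys.platform.startswith('win'):
    os.startfile(latest_csv_file)  # Windows: open with the associated app (Excel)
elif sys.platform == 'darwin':
    subprocess.call(['open', latest_csv_file])  # macOS
else:
    subprocess.call(['xdg-open', latest_csv_file])  # Linux desktops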

How to find location of downloaded files on Heroku

I am using YouTube-dl with Python and Flask to download youtube videos and return them using the send_file function.
When running locally, I have been using the following to get the file path:
username = getpass.getuser()
directories = os.listdir(rf'C:\\Users\\{username}')
I then download the video with YouTube-dl:
youtube_dl.YoutubeDL().download([link])
I then search the directory for the file based on the video code:
files = [file for file in directories]
code = link.split('v=')[1]
for file in files:
    if file.endswith('.mp4') is True:
        try:
            code_section = file.split('-')[1].split('.mp4')[0]
            if code in code_section:
                return send_file(rf'C:\\Users\\{username}\\{file}')
        except:
            continue
Finally, I return the file:
return send_file(rf'C:\\Users\\{username}\\{file}')
This works locally to find the downloaded file, but on Heroku it doesn't - the directory simply doesn't exist. How would I find where the file is downloaded? Is there a function I can call? Or is there a set path it would go to?
Or alternatively, is there a way to set the download location with YouTube-dl?
Since Heroku runs Linux, not Windows, you could download your files to your current working directory and then send them from there.
The main tweak is setting up some options in your YoutubeDL app:
import os

opts = {
    # youtube-dl output templates need %(field)s placeholders
    "outtmpl": f"{os.getcwd()}/%(title)s.%(ext)s"
}
youtube_dl.YoutubeDL(opts).download([link])
That will download the file to your current working directory.
Then you can just send it from your working directory using return send_file(file).
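If you also need the exact path that was written, rather than scanning a directory for it, youtube-dl can tell you: extract_info downloads the video and returns its metadata, and prepare_filename renders the output template for that metadata. A sketch under the same assumptions (opts and link as above; download_and_send is a hypothetical helper name):
import os
import youtube_dl
from flask import send_file

def download_and_send(link):
    opts = {"outtmpl": os.path.join(os.getcwd(), "%(title)s.%(ext)s")}
    with youtube_dl.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(link, download=True)  # downloads and returns metadata
        path = ydl.prepare_filename(info)             # renders outtmpl to the real path
    return send_file(path)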

Archiving Files Using Python Apart from Latest File

I am trying to archive the existing files in a folder, apart from the most recently modified one, in Python or FME. I have managed to get Python to pick up the latest modified file, but does anyone have ideas on how I can archive all the files in my folder apart from that one?
Thank You
You can solve your problem using this snippet of code:
import glob
import os
import zipfile
files_dir = r'C:\Users\..\files'  # path to the directory with your files

# find all files located in the specified directory
files = glob.glob(files_dir + r'\*')

# sort by modification time and take every file except the last modified one
files_modify_dt = [os.path.getmtime(file) for file in files]
files_to_zip = [file for _, file in sorted(zip(files_modify_dt, files))][:-1]

# zip the selected files, then remove the originals
with zipfile.ZipFile(os.path.join(files_dir, 'archive.zip'), 'w', zipfile.ZIP_DEFLATED) as zip_obj:
    for file in files_to_zip:
        zip_obj.write(file, os.path.basename(file))
        os.remove(file)
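An equivalent, arguably clearer way to select everything except the newest file is to compute the newest one directly with max and exclude it; a sketch under the same assumptions (files_dir as above):
import glob
import os

files = glob.glob(os.path.join(files_dir, '*'))

# the most recently modified file, which stays out of the archive
newest = max(files, key=os.path.getmtime)
files_to_zip = [f for f in files if f != newest]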

TarFile.extractall() processes fine but doesn't create any new files or directories

Version: Python 2.7
OS: MacOS Mojave
IDE: Pycharm Community 2019.2
I'm having trouble downloading tar.gz files from pypi.org/project and unzipping them. The use case for this is that we can't use any actual package management, so we have to manually put packages where we need them in the local folders. My solution to this is to read a requirements file and pull the tar.gz and .zip files for the given versions and then write to local files.
I've got it working for zip files, and it works exactly how I want it to. Zip files are handled by the 'elif' statement, tar files are handled by the 'if' statement. For some reason, for the tar file links I pass in, no directories are created, no files are extracted.
I did a test by passing the path to my local copy of a tar file directly to this line, tarData = tarfile.open(fileobj=zipData, mode='r:*'), instead of zipData, and it worked. So I think the issue has something to do with how the file is downloaded or how it is held in StringIO, but I'm not getting any exceptions, so I have no way to narrow down what the issue could be.
The below is the slice of code I'm using to unzip these files.
import os
import bs4 as bs
import urllib2
import re
from StringIO import StringIO
import zipfile
import tarfile
import requests
def install_packages_locally(compressed_link, directory):
    file_name = '_'.join(compressed_link.split('/')[-1].split('-')[:-1])
    file_ext = compressed_link.split('-')[-1]
    response = requests.get(compressed_link, stream=True)
    if file_name in os.listdir(directory):
        print('Skipping %s' % file_name)
        return True
    zipData = StringIO()
    if file_ext.endswith('.tar.gz'):
        zipData.write(response.raw.read())
        print('writing %s to %s' % (file_name, directory))
        tarData = tarfile.open(fileobj=zipData, mode='r:*')
        tarData.extractall(path=directory)
        tarData.close()
        zipData.close()
        return True
    elif file_ext.endswith('.zip'):
        zipData.write(response.content)
        print('writing %s to %s' % (file_name, directory))
        unzipData = zipfile.ZipFile(zipData)
        unzipData.extractall(path=directory)
        unzipData.close()
        zipData.close()
        return True
Answered here: https://www.reddit.com/r/learnpython/comments/dix37i/tarfileextractall_processes_fine_but_doesnt/f3zq1j4?utm_source=share&utm_medium=web2x
I found another example that included a call to BytesIO.seek(0) to reset the file pointer to the start. The tarfile documentation (https://docs.python.org/3/library/tarfile.html) explains why:
"If fileobj is specified, it is used as an alternative to a file object opened in binary mode for name. It is supposed to be at position 0."
So you should add a call to zipData.seek(0) before calling tarfile.open.
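Applied to the code above, the fix is a single line in the tar branch (a sketch of just that branch):
zipData.write(response.raw.read())
zipData.seek(0)  # rewind so tarfile reads from position 0, as the docs require
tarData = tarfile.open(fileobj=zipData, mode='r:*')
tarData.extractall(path=directory)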

python download images are not saved to the correct directory

When I use Python 2.7 to download images from a website, with the following code:
pic = requests.get(src[0])
f = open("pic\\"+str(i) + '.jpg', "wb")
f.write(pic.content)
f.close()
i += 1
I want to save the pictures into the pic directory, but I find that the images are saved in the current directory with names like pic\1.jpg. Is this a bug?
On Windows it works, but on Ubuntu it doesn't!
Windows uses backslashes as the path separator, but Ubuntu uses forward slashes, so on Ubuntu the backslash is treated as part of the file name rather than as a directory separator. This is why your save path with a backslash doesn't work there.
You probably want to use os.path.join to make your path OS agnostic:
import os

path = os.path.join('pic', '{}.jpg'.format(i))
f = open(path, 'wb')
...
import os

# the extension must be joined to the file name itself; if '.jpg' were a
# separate list element, os.sep would be inserted before it
f = open(os.sep.join(['pic', str(i) + '.jpg']), 'wb')
Now the line is OS agnostic.
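Either way, note that open will not create the pic directory for you; on Linux it raises IOError if the directory is missing. A short guard (Python 2, matching the question):
import os

# create the target directory if it does not exist yet
if not os.path.isdir('pic'):
    os.makedirs('pic')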
