Rename files and sort them by creation date - python

I have a lot of files in a specific directory, and I want to rename all files with the .txt extension according to their creation date and add a counter prefix. I'm using Python on Windows.
Example:
Let's say I have the files aaa.txt, bbb.txt, and ccc.txt.
aaa.txt is the newest file and ccc.txt is the oldest.
I want to rename the files like this:
999_aaa.txt, 998_bbb.txt, 997_ccc.txt ...
The counter should start at 999 for the newest file (I will never have more than 300 .txt files).
As you can see, I just want to give the newest file the highest number (sorted by creation date).
How would you do this?

Have a look at this untested code:
import os
import glob
import shutil
# get a list of all txt files
fnames = glob.glob("*.txt")
# sort by st_ctime: creation time on Windows, time of last metadata change on Unix
# reverse: newer files first
fnames.sort(key=lambda x: os.stat(x).st_ctime, reverse=True)
# rename files, choose pattern as you like
for i, fname in enumerate(fnames):
    shutil.move(fname, "%03d_%s" % (999-i, fname))
For reference:
http://docs.python.org/3.1/library/glob.html#glob.glob
http://docs.python.org/2/library/os.html#os.stat
http://docs.python.org/2/library/shutil.html#shutil.move
http://docs.python.org/2/library/functions.html#enumerate
http://docs.python.org/2/library/stdtypes.html#mutable-sequence-types
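The same idea can be wrapped in a function that builds the full list of renames first and only then touches the disk, which makes a dry run easy. This is only a sketch, not the answerer's code: the names `plan_renames` and `apply_renames` and the `key` parameter are made up for illustration, and the default assumes that `st_ctime` is the creation time (true on Windows; on most Unix systems it is the time of the last metadata change).

```python
import os
from pathlib import Path


def plan_renames(directory, start=999, key=None):
    """Return (old_path, new_path) pairs, newest file first.

    `key` maps a Path to a sortable timestamp; the default uses st_ctime.
    """
    if key is None:
        key = lambda p: p.stat().st_ctime
    files = sorted(Path(directory).glob("*.txt"), key=key, reverse=True)
    return [(p, p.with_name("%03d_%s" % (start - i, p.name)))
            for i, p in enumerate(files)]


def apply_renames(plan):
    """Rename every file in the plan; the plan is computed before any rename."""
    for old, new in plan:
        os.rename(old, new)
```

Printing the result of `plan_renames(".")` before calling `apply_renames` shows what would happen without changing anything.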

Related

Creating data frames within Pandas from 2 most recent .csv within a directory

I have a directory to which .csv files are regularly added (1 or 2 every 30 minutes).
My pandas script merges and cleans the two latest .csv files in that directory (the two paths are currently entered manually) and then saves a .csv of their differences to a different directory.
To remove this manual step, I would like to obtain the paths of the 2 most recent CSVs and assign them to the left df and right df for the initial merge.
It would be preferable to sort the directory by creation date and then use an index to pick the most recent files (in this case [0] and [1]).
I have tried modifying the snippet below, but it only yields the latest .csv:
from pathlib import Path
left_path = '/home/user/some_folder/csv1'
files = Path(left_path).glob('*.csv')
latest_left = max(files, key=lambda f: f.stat().st_mtime)
right_path = '/home/user/some_folder/csv2'
files = Path(right_path).glob('*.csv')
latest_right = max(files, key=lambda f: f.stat().st_mtime)
Thanks for the help!
You were almost there!
If you make a list of the files in your directory and sort it by modification time (st_mtime, as in your snippet), you can access the last two entries in the list:
files = list(Path(path).glob('*.csv'))
files.sort(key=lambda f: f.stat().st_mtime)
csv1 = files[-1]
csv2 = files[-2]
Try this,
import os
from pathlib import Path
paths = sorted(Path('/home/ryan/Data/plan_import_full').iterdir(), key=os.path.getmtime)
file0 = paths[-1] #last file
file1 = paths[-2] #2nd last file
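Both answers can be folded into one small helper that also checks there really are two files to pick from. A rough sketch, assuming modification time is a good enough stand-in for "most recent"; the function name `two_latest_csv` is hypothetical.

```python
from pathlib import Path


def two_latest_csv(directory):
    """Return the two most recently modified .csv files, newest first."""
    files = sorted(Path(directory).glob("*.csv"),
                   key=lambda f: f.stat().st_mtime,
                   reverse=True)
    if len(files) < 2:
        raise ValueError("need at least two .csv files in %s" % directory)
    return files[0], files[1]
```

The two returned paths can then be passed straight to pandas.read_csv to build the left and right data frames.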

How can I copy and move all folders that are older than 30 days to a _History folder?

Structure: 20170410.1207.te <- date (2017-04-10, 12:07)
There is a company folder that contains several folders. All folders with the above structure that are older than 30 days should be moved to the folder _History (basically archiving them), but at least 5 should be left in place regardless of their timestamp.
The date must be taken from the folder name, converted to a date, and compared to today's date minus 30 days.
I also want to create a log file that records which folders were moved, when, and to which location.
The code below just shows me the filenames. Can somebody help me, please?
import os
import shutil
for subdir, dirs, files in os.walk("C:\Python-Script\Spielwiese"):
    for file in files:
        print(os.path.join(file))
shutil.move("C:\Python-Script\Spielwiese\", "C:\Python-Script\Spielwiese2")
The following code builds a list of all files created within a given timeframe, sorted by creation time (on Windows). Depending on how you want to filter, I can give you more information. You can then work on the resulting list. One more thing: you should use pathlib (or at least raw strings) for Windows file paths, to avoid problems with non-ASCII (for example German) path names and with backslashes being read as escape sequences.
import os
import time
root = r"C:\Python-Script\Spielwiese"  # raw string keeps the backslashes literal
# Put the timeframe here; "the last 30 days" is only an example value
some_time = time.time() - 30 * 24 * 60 * 60
found_files = []
for subdir, dirs, files in os.walk(root):
    for file in files:
        name = os.path.join(subdir, file)  # full path, not just the bare filename
        create_date = os.path.getctime(name)  # creation time on Windows
        if create_date > some_time:
            found_files.append((name, create_date))
found_files.sort(key=lambda tup: tup[1]) # Sort the files according to creation time
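Since the question asks for the date to be read from the folder name rather than from the file system, here is a rough sketch of that variant. It rests on several assumptions that are not in the question: "at least 5 should be left" is read as "the 5 newest dated folders always stay", the log is a plain text file called archive.log inside the company folder, and the function names and parameters are made up for illustration.

```python
import os
import shutil
from datetime import datetime, timedelta


def parse_folder_date(name):
    """Parse names like '20170410.1207.te'; return None if the name does not fit."""
    try:
        return datetime.strptime(name[:13], "%Y%m%d.%H%M")
    except ValueError:
        return None


def archive_old_folders(company_dir, max_age_days=30, keep_min=5, now=None):
    """Move dated folders older than max_age_days into company_dir/_History.

    The keep_min newest dated folders always stay. Returns the moved names.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    history = os.path.join(company_dir, "_History")
    os.makedirs(history, exist_ok=True)

    # Collect (timestamp, name) for every sub-folder whose name parses as a date.
    dated = []
    for name in os.listdir(company_dir):
        stamp = parse_folder_date(name)
        if stamp is not None and os.path.isdir(os.path.join(company_dir, name)):
            dated.append((stamp, name))
    dated.sort(reverse=True)  # newest first

    moved = []
    with open(os.path.join(company_dir, "archive.log"), "a") as log:
        for stamp, name in dated[keep_min:]:  # skip the keep_min newest folders
            if stamp < cutoff:
                src = os.path.join(company_dir, name)
                shutil.move(src, os.path.join(history, name))
                log.write("%s moved %s -> %s\n" % (now.isoformat(), src, history))
                moved.append(name)
    return moved
```

It would be called as `archive_old_folders(r"C:\Python-Script\Spielwiese")`; passing `now` explicitly is only there to make the function easy to try out with a fixed date.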

Errors with Glob while outputting file names

I am combining two questions here because they are related to each other.
Question 1: I am trying to use glob to open all the files in a folder but it is giving me "Syntax Error". I am using Python 3.xx. Has the syntax changed for Python 3.xx?
Error Message:
File "multiple_files.py", line 29
files = glob.glob(/src/xyz/rte/folder/)
SyntaxError: invalid syntax
Code:
import csv
import os
import glob
from pandas import DataFrame, read_csv
#extracting
files = glob.glob(/src/xyz/rte/folder/)
for fle in files:
    with open (fle) as f:
        print("output" + fle)
        f_read.close()
Question 2: I want to read input files, append "output" to the names and print out the names of the files. How can I do that?
Example: Input file name would be - xyz.csv and the code should print output_xyz.csv .
Your help is appreciated.
Your first problem is that strings, including pathnames, need to be in quotes. This:
files = glob.glob(/src/xyz/rte/folder/)
… is trying to divide a bunch of variables together, but the leftmost and rightmost divisions are missing operands, so you've confused the parser. What you want is this:
files = glob.glob('/src/xyz/rte/folder/')
Your next problem is that this glob pattern doesn't have any globs in it, so the only thing it's going to match is the directory itself.
That's perfectly legal, but kind of useless.
And then you try to open each match as a text file. Which you can't do with a directory, hence the IsADirectoryError.
The answer here is less obvious, because it's not clear what you want.
Maybe you just wanted all of the files in that directory? In that case, you don't want glob.glob, you want listdir (or maybe scandir): os.listdir('/src/xyz/rte/folder/').
Maybe you wanted all of the files in that directory or any of its subdirectories? In that case, you could do it with rglob, but os.walk is probably clearer.
Maybe you did want all the files in that directory that match some pattern, so glob.glob is right—but in that case, you need to specify what that pattern is. For example, if you wanted all .csv files, that would be glob.glob('/src/xyz/rte/folder/*.csv').
Finally, you say "I want to read input files, append "output" to the names and print out the names of the files". Why do you want to read the files if you're not doing anything with the contents? You can do that, of course, but it seems pretty wasteful. If you just want to print out the filenames with output appended, that's easy:
for filename in os.listdir('/src/xyz/rte/folder/'):
    print('output_' + filename)
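If the pattern case is the one you want, the two pieces (a real glob pattern plus the renamed output) fit together roughly like this. A small sketch only: the helper name `output_names` and its `pattern` parameter are hypothetical, and it assumes you want the bare file name without the directory part.

```python
import glob
import os


def output_names(folder, pattern="*.csv"):
    """Return 'output_<name>' for every file in folder matching pattern."""
    matches = glob.glob(os.path.join(folder, pattern))
    # Keep regular files only and drop the directory part of each match.
    return sorted("output_" + os.path.basename(m)
                  for m in matches if os.path.isfile(m))
```

For a folder containing xyz.csv, `output_names('/src/xyz/rte/folder/')` would include output_xyz.csv.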
This works in http://pyfiddle.io:
Docs: https://docs.python.org/3/library/glob.html
import csv
import os
import glob
# create some files
for n in ["a","b","c","d"]:
    with open('{}.txt'.format(n),"w") as f:
        f.write(n)
print("\nFiles before")
# get all files
files = glob.glob("./*.*")
for fle in files:
    print(fle) # print file
    path,fileName = os.path.split(fle) # split name from path
    # open file for read and second one for write with modified name
    with open (fle) as f,open('{}{}output_{}'.format(path,os.sep, fileName),"w") as w:
        content = f.read() # read all
        w.write(content.upper()) # write all modified
# check files afterwards
print("\nFiles after")
files = glob.glob("./*.*") # pattern for all files
for fle in files:
    print(fle)
Output:
Files before
./d.txt
./main.py
./c.txt
./b.txt
./a.txt
Files after
./d.txt
./output_c.txt
./output_d.txt
./main.py
./output_main.py
./c.txt
./b.txt
./output_b.txt
./a.txt
./output_a.txt
I am on Windows and would use os.walk (docs) instead.
for d,subdirs,files in os.walk("./"): # unpack the returned current dir, its subdirs, and its files
    print("AktDir:", d)
    print("Subdirs:", subdirs)
    print("Files:", files)
Output:
AktDir: ./
Subdirs: []
Files: ['d.txt', 'output_c.txt', 'output_d.txt', 'main.py', 'output_main.py',
'c.txt', 'b.txt', 'output_b.txt', 'a.txt', 'output_a.txt']
It also recurses into subdirs.

Undo files.split after matching Filename (python 3.x)

Filenames:
File1: new_data_20100101.csv
File2: samples_20100101.csv
The timestamp always has the format %Y%m%d and sits in the filename after a _ and before .csv.
I want to find the dates for which there is both a data file and a samples file, and then do something with those files.
My Code so far:
for all_files in os.listdir():
    if all_files.__contains__("data_"):
        dataList.append(all_files.split('_')[2])
    if all_files.__contains__("samples_"):
        samplesList.append(all_files.split('_')[1])
That gives me the filenames cut down to the timestamp and the .csv extension.
Now I would like to try something like this:
for day in dataList:
    if day in sampleList:
        open day as csv.....
I get a list of days for which both files exist. How can I undo that split so I can go on working with the files? As it stands I would get an error telling me that, for instance, _2010010.csv does not exist, because the file is actually called new_data_2010010.csv.
I'm unsure how to use os.path.basename, so I would appreciate some advice on handling the file names.
Thanks
You could instead use the glob module to get your list. This allows you to filter just your CSV files.
The following script creates two dictionaries with the key for each dictionary being the date portion of your filename and the value holding the whole filename. A list comprehension creates a list of tuples holding each matching pair:
import glob
import os
csv_files = glob.glob('*.csv')
data_files = {file.split('_')[2] : file for file in csv_files if 'data_' in file}
sample_files = {file.split('_')[1] : file for file in csv_files if 'samples_' in file}
matching_pairs = [(sample_files[date], file) for date, file in data_files.items() if date in sample_files]
for sample_file, data_file in sorted(matching_pairs):
    print('{} <-> {}'.format(sample_file, data_file))
For your two file example, this would display the following:
samples_20100101.csv <-> new_data_20100101.csv
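A variation on the same idea, in case the number of underscores in the names is not always the same: pull the date out with a regular expression instead of split('_'), so the dictionary keys are the bare date without the .csv part. This is only a sketch; the helper name `pair_by_date` and the pattern are assumptions based on the two example filenames.

```python
import re

# Eight digits between the last underscore and the .csv extension.
DATE_RE = re.compile(r"_(\d{8})\.csv$")


def pair_by_date(filenames):
    """Return {date: (data_file, samples_file)} for dates that have both files."""
    data, samples = {}, {}
    for name in filenames:
        match = DATE_RE.search(name)
        if not match:
            continue
        date = match.group(1)
        if "data_" in name:
            data[date] = name
        elif name.startswith("samples_"):
            samples[date] = name
    # Only dates present in both dictionaries form a pair.
    return {d: (data[d], samples[d]) for d in sorted(data.keys() & samples.keys())}
```

Calling it as `pair_by_date(os.listdir())` keeps the full filenames as values, so they can be opened directly.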

Concatenating fasta files from different folders

I have a large number of fasta files (these are just text files) in different subfolders. What I need is a way to search through the directories for files that have the same name and concatenate these into a file with the name of the input files. I can't do this manually as I have 10000+ genes that I need to do this for.
So far I have the following Python code that looks through one of the directories and then uses those file names to search through the other directories. This returns a list that has the full path for each file.
import os
from os.path import join, abspath
path = '/directoryforfilelist/' #Directory for source list
listing = os.listdir(path)
for x in listing:
    for root, dirs, files in os.walk('/rootdirectorytosearch/'):
        if x in files:
            pathlist = abspath(join(root,x))
Where I am stuck is how to concatenate the files it returns that have the same name. The results from this script look like this.
/directory1/file1.fasta
/directory2/file1.fasta
/directory3/file1.fasta
/directory1/file2.fasta
/directory2/file2.fasta
/directory3/file2.fasta
In this case I would need the end result to be two files named file1.fasta and file2.fasta that contain the text from each of the same named files.
Any leads on where to go from here would be appreciated. While I did this part in Python, any way that gets the job done is fine with me. This is being run on a Mac, if that matters.
Not tested, but here's roughly what I'd do:
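from itertools import groupby
import os
def conc_by_name(names):
    # groupby only merges adjacent items, so sort by file name first
    names = sorted(names, key=os.path.basename)
    # group on the bare file name (os.path.split would return a (head, tail) tuple)
    for tail, group in groupby(names, key=os.path.basename):
        with open(tail, 'w') as out:
            for name in group:
                with open(name) as f:
                    out.writelines(f)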
This will create the files (file1.fasta and file2.fasta in your example) in the current folder.
For each file in your list, open the target file in append mode, read each line of the source file, and write it to the target file.
This assumes that the target folder is empty to start with and is not inside /rootdirectorytosearch.
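The append-mode approach could look roughly like the sketch below. It walks the whole search root rather than starting from a separate file list, which is a simplification; the function name `concat_same_named`, the `suffix` filter, and the explicit `target_dir` argument are assumptions for illustration. As stated above, the target folder must start out empty and must not lie inside the search root.

```python
import os


def concat_same_named(search_root, target_dir, suffix=".fasta"):
    """Append every file under search_root to target_dir/<same file name>."""
    os.makedirs(target_dir, exist_ok=True)
    written = set()
    for root, dirs, files in os.walk(search_root):
        dirs.sort()  # visit sub-folders in a predictable order
        for name in sorted(files):
            if not name.endswith(suffix):
                continue
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            # Append mode: each same-named file is added to the end of the target.
            with open(source) as src, open(target, "a") as out:
                for line in src:
                    out.write(line)
            written.add(name)
    return sorted(written)
```

The return value is the list of target file names, which is handy for checking that all 10000+ genes were picked up.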
