Concatenate the last 100 files into a single file - python

I am a beginner in Python and need a bit of help. I am using Python 2.7.
I want to write a program that concatenates the last 100 files in a folder. The folder contains many files, but I only want to concatenate the last 100. I can concatenate all of them (if I don't specify a number and change the for loop), but I cannot select just the last 100. The files are saved in binary format by the software, in the folder specified below. I would also like to delete those 100 files once they have been concatenated into the new one. This is the program I have written:
    #!/usr/bin/python
    import os
    import glob
    os.chdir("C:\AFM_test\jpk_files")
    rout=""
    filename=glob.glob("*-*-*.*.*-*.*.*.jpk-force")
    filename.sort(key=os.path.getmtime)
    for filename in range(0,99):
        filename=open(filename,"rb")
        tout=filename.read()+"\r\n"
        rout = rout+tout
        os.remove(filename)
        filename.close()
    fout = open("output.jpk-force","wb+")
    fout.write(rout)
    fout.close()
It doesn't do anything, and I get the following error:
    Traceback (most recent call last):
      File "C:\AFM_test\jpk_files\AFM_test.py", line 12, in <module>
        filename = open(filename,"rb")
    TypeError: coercing to Unicode: need string or buffer, int found
    [Finished in 0.1s]
I guess the problem is the loop and its structure, "range(0,99)", because when I concatenated all the files in the folder with this version:
    #!/usr/bin/python
    import os
    import glob
    os.chdir("C:\AFM_test\jpk_files")
    rout=""
    filename=glob.glob("*-*-*.*.*-*.*.*.jpk-force")
    for filename in files:
        filename=open(filename,"rb")
        tout=filename.read()+"\r\n"
        rout = rout+tout
        os.remove(filename)
        filename.close()
    fout = open("output.jpk-force","wb+")
    fout.write(rout)
    fout.close()
it worked fine except for the remove call, which raised this error:
    Traceback (most recent call last):
      File "C:\try\AFM_test_2.py", line 17, in <module>
        os.remove(filename)
    must be string, not file
Any ideas on how I can achieve my goal?
I hope I have explained myself properly. I may have missed something important; sorry, I am just a beginner in this field.
Thank you.

    TypeError: coercing to Unicode: need string or buffer, int found
That is because the loop for filename in range(0,99) rebinds filename to an integer, and you then pass that integer to open(..), which expects a path string.
    os.remove(filename)
    must be string, not file
That is because you are re-assigning the variable filename (which was a string path) to a file handle/object. os.remove(..) expects the path string from the for-loop, not the result of open(..). It's generally good practice to give variables meaningful names, such as filepath and infile.
A better approach would be:
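    import glob
    import os

    def processFile(filepath):
        # Read the file in binary mode, then delete it.
        with open(filepath, "rb") as f:
            content = f.read()
        os.remove(filepath)
        return content

    def main():
        paths = glob.glob("..*..*..")
        # glob returns names in arbitrary order; sort by modification time first.
        paths.sort(key=os.path.getmtime)
        last100paths = paths[-100:]
        # outFilePath is the output path; it must be defined before calling main().
        with open(outFilePath, "wb") as f:
            f.write("\r\n".join(processFile(path) for path in last100paths))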
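For readers on Python 3, the same idea can be sketched as follows. This is only an illustration under some assumptions: the function name concat_last_n and its parameters are made up for this example, "last" is taken to mean most recently modified, and the source files are deleted only after the output has been written.

```python
import glob
import os


def concat_last_n(pattern, out_path, n=100, sep=b"\r\n"):
    """Join the n most recently modified files matching pattern into out_path.

    Hypothetical helper: the name and parameters are not from the thread.
    Returns the list of source paths that were concatenated and removed.
    """
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself in case it matches the pattern.
    paths = [p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs]
    paths.sort(key=os.path.getmtime)            # oldest first
    selected = paths[-n:] if n > 0 else []      # the n newest, oldest first

    chunks = []
    for path in selected:
        with open(path, "rb") as infile:        # binary mode: bytes in, bytes out
            chunks.append(infile.read())

    with open(out_path, "wb") as outfile:
        outfile.write(sep.join(chunks))

    # Delete the sources only once the output has been written.
    for path in selected:
        os.remove(path)
    return selected
```

With the question's setup, a call might look like concat_last_n("*-*-*.*.*-*.*.*.jpk-force", "output.jpk-force"). Whether a "\r\n" separator between binary files is actually wanted depends on the file format, which the thread does not specify.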

You need to change:
    filename=open(filename,"rb")
...to something like:
    inf = open(filename, "rb")
    ...
    inf.close()
Then, when you call os.remove(filename), filename will still be the name from the original loop, not a file object that your code has reassigned to this variable.
Note: rather than opening and closing files explicitly, try using the with statement, which closes the file for you.

Check whether glob matches any files:
    pattern = r"*-*-*.*.*-*.*.*.jpk-force"
    filenames=glob.glob(pattern)
    if not filenames:
        print 'no files matched ', pattern
        sys.exit(1)
Get a file list sorted by mtime by building a list of tuples, each containing a file name and its mtime:
    filenames = [ (filename,os.stat(filename)[8]) for filename in filenames ]
Sort the list by mtime in descending order:
    filenames.sort(key=lambda x:x[1],reverse=True)
The two lines above can be simplified to:
    filenames = [ filename for filename in sorted(filenames,key=os.path.getmtime,reverse=True) ]
This can be refactored further, because we can sort in place:
    filenames.sort(key=os.path.getmtime,reverse=True)
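As a quick check of the sorting step, the sketch below (Python 3) builds a few temporary files with known modification times and confirms that sorting by os.path.getmtime in descending order puts the newest file first. The file names and timestamps are invented for the illustration.

```python
import os
import tempfile


def newest_first(paths):
    """Return paths sorted by modification time, newest first."""
    return sorted(paths, key=os.path.getmtime, reverse=True)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        # Create three files whose mtimes increase with the index.
        for i, name in enumerate(["a.dat", "b.dat", "c.dat"]):
            path = os.path.join(tmp, name)
            with open(path, "wb") as fh:
                fh.write(b"x")
            os.utime(path, (1000 + i, 1000 + i))  # (atime, mtime) in seconds
            paths.append(path)
        # Slicing the sorted list keeps the most recent entries.
        return [os.path.basename(p) for p in newest_first(paths)[:2]]


print(demo())  # ['c.dat', 'b.dat']
```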
The complete script:
    #!/usr/bin/python
    import os
    import sys
    import glob
    os.chdir(r"C:\AFM_test\jpk_files")
    rout=""
    pattern = r"*-*-*.*.*-*.*.*.jpk-force"
    filenames=glob.glob(pattern)
    if not filenames:
        print 'no files matched ', pattern
        sys.exit(1)
    # newest files first
    filenames.sort(key=os.path.getmtime,reverse=True)
    for filename in filenames[:100]:
        filecontent=open(filename,"rb")
        tout=filecontent.read()+"\r\n"
        filecontent.close()
        rout = rout+tout
        os.remove(filename)
    fout = open("output.jpk-force","wb+")
    fout.write(rout)
    fout.close()
You didn't check for exceptions.

Related

Python: Unicode characters in file or folder names

We process a lot of files where path can contain an extended character set like this:
F:\Site Section\Cieślik
My Python scripts fail to open such files or chdir to such folders whatever I try.
Here is an extract from my code:
    import zipfile36 as zipfile
    import os
    from pathlib import Path
    outfile = open("F:/zip_pdf3.log", "w", encoding="utf-8")
    with open('F:/zip_pdf.txt') as f: # Input file list - note the forward slashes!
        for line in f:
            print (line)
            path, filename = os.path.split(line)
            file_no_ext = os.path.splitext(os.path.basename(line))[0]
            try:
                os.chdir(path) # Go to the file path
            except Exception as exception:
                print (exception, file = outfile) #3.7
                print (exception)
                continue
I tried the following:
Converting path to a raw string:
    raw_string = r"{}".format(path)
    try:
        os.chdir(raw_string)
Converting a string to Path:
    Ppath = Path(path)
    try:
        os.chdir(Ppath.decode("utf8"))
I'm out of ideas. Does anyone know how to work with Unicode file and folder names? I'm using Python 3.7 or higher on Windows.
It could be as simple as this (thanks, @SergeBallesta):
    with open('F:/pdf_err.txt', encoding="utf-8") as f:
I may post updates after more runs with different input.
This, however, leads to a slightly different question: if, instead of reading from the file, I walk over folders and files with an extended character set, how do I deal with those, i.e.
    for subdir, dirs, files in os.walk(rootdir):
At present I'm getting either "The filename, directory name, or volume label syntax is incorrect" or "Can't open the file".
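A minimal Python 3 sketch of the encoding fix mentioned above: read the list file with an explicit encoding and strip the trailing newline from each line before using it as a path. The directory name reuses the example from the question; the helper name read_paths and the temporary layout are assumptions made for the illustration. On Python 3, os.walk returns str names, so non-ASCII names should need no extra decoding.

```python
import os
import tempfile


def read_paths(list_file):
    """Yield one path per non-empty line of a UTF-8 encoded list file."""
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            path = line.rstrip("\r\n")  # a trailing newline would corrupt the path
            if path:
                yield path


def demo():
    with tempfile.TemporaryDirectory() as root:
        folder = os.path.join(root, "Cieślik")
        os.mkdir(folder)
        target = os.path.join(folder, "report.pdf")
        with open(target, "wb") as fh:
            fh.write(b"%PDF")

        list_file = os.path.join(root, "zip_pdf.txt")
        with open(list_file, "w", encoding="utf-8") as fh:
            fh.write(target + "\n")

        found_from_list = [p for p in read_paths(list_file) if os.path.isfile(p)]
        # os.walk yields str directory and file names on Python 3.
        found_by_walk = [
            os.path.join(subdir, name)
            for subdir, dirs, files in os.walk(root)
            for name in files
            if name.endswith(".pdf")
        ]
        return found_from_list == [target], found_by_walk == [target]


print(demo())
```

If the error persists with os.walk alone, the cause is likely elsewhere (for example, a path built from an unstripped line), but that is a guess; the thread does not resolve it.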

Rename files in multiple directories

I have identically named files in multiple directories. I want to rename them so that each name includes the unique id of the directory the file is in.
'*' represents the unique identifier, for example '067'.
The filename is always 'NoAdapter_len25.truncated_sorted.fastq'.
I want the filename in each directory to be '*NoAdapter_len25.truncated_sorted.fastq', where * stands for the unique identifier.
Here is the error I'm getting:
    Traceback (most recent call last):
      File "change_names.py", line 19, in <module>
        rename(name, new_name)
    TypeError: Can't convert '_io.TextIOWrapper' object to str implicitly
Here's the code that produces it:
    from glob import glob
    import re
    from os import rename
    #path = "/home/users/screening/results_Sample_*_hg38_hg19/N*"
    files = glob(path)
    for f in files:
        with open(f) as name:
            sample_id = f.partition('results_')[-1].rpartition('hg38_hg19')[0]
            #print(sample_id)
            back = f[-38:]
            new_name = sample_id + back
            rename(name, new_name)

You have a few problems:
You're opening a file for no apparent reason (it confirms the file exists and is readable at open time, but even with an open handle, the name could be moved or deleted between that and the rename, so you aren't preventing any race conditions)
You're passing the opened file object to os.rename, but os.rename takes str, not file-like objects
You're doing a lot of "magic" manipulations of the path, instead of using appropriate os.path functions
Try this to simplify the code. I included some inline comments when I'm doing what your example does, but it doesn't make a lot of sense (or it's poor form):
    import os

    for path in files: # path, not f; f is usually placeholder for file-like object
        filedir, filename = os.path.split(path)
        # Name of the directory that holds the file, e.g. 'results_Sample_067_hg38_hg19'
        parentdir = os.path.basename(filedir)
        # Strip parentdir name to get 'Sample_*_' per provided code; is this what you wanted?
        # Question text seems like you only wanted the '*' part.
        sample_id = parentdir.replace('results_', '').replace('hg38_hg19', '')
        # Large magic numbers are code smell; if the name is a fixed name,
        # just use it directly as a string literal
        # If the name should be "whatever the file is named", use filename unsliced
        # If you absolutely need a fixed length (to allow reruns or something)
        # you might do keepnamelen = len('NoAdapter_len25.truncated_sorted.fastq')
        # outside the loop, and do filename[-keepnamelen:] inside the loop so it's not
        # just a largish magic number
        back = filename[-38:]
        new_name = sample_id + back
        new_path = os.path.join(filedir, new_name)
        rename(path, new_path)
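To try the approach above without touching real data, here is a small self-contained Python 3 sketch. It assumes the directory layout implied by the question's glob pattern (results_Sample_<id>_hg38_hg19/<file>) and, following the question text, extracts only the id part with a regular expression. The helper name prefix_with_sample_id and the regex are assumptions for this illustration, not part of the answer.

```python
import os
import re
import tempfile

# Assumed directory naming scheme, taken from the glob pattern in the question.
DIR_PATTERN = re.compile(r"^results_Sample_(.+)_hg38_hg19$")


def prefix_with_sample_id(path):
    """Rename path so its file name starts with the id of its directory.

    Returns the new path, or None if the directory name does not match
    or the file already carries the prefix (so the function can be re-run).
    """
    filedir, filename = os.path.split(path)
    match = DIR_PATTERN.match(os.path.basename(filedir))
    if match is None:
        return None
    sample_id = match.group(1)
    if filename.startswith(sample_id):
        return None
    new_path = os.path.join(filedir, sample_id + filename)
    os.rename(path, new_path)  # os.rename takes two path strings
    return new_path


def demo():
    with tempfile.TemporaryDirectory() as root:
        filedir = os.path.join(root, "results_Sample_067_hg38_hg19")
        os.mkdir(filedir)
        old = os.path.join(filedir, "NoAdapter_len25.truncated_sorted.fastq")
        open(old, "w").close()
        new = prefix_with_sample_id(old)
        return os.path.basename(new), os.listdir(filedir)


print(demo())
```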

You are passing rename a file object (name) and a filename, but it needs two filenames. To get the filename from a file object, you can do this:
    old_filename = os.path.abspath(name.name)

python change file names to unicode chars Hindi

I am trying to rename files to Unicode names that I read line by line from a file. When I try to rename the files, I get the error shown below. Here is the code:
    import codecs
    import os
    arrayname = []
    arrayfile = []
    f = codecs.open('E:\\songs.txt', encoding='utf-8', mode='r+')
    for line in f:
        arrayname.append(line)
    for filename in os.listdir("F:\\songs"):
        if filename.endswith(".mp3"):
            arrayfile.append(filename)
    for num in range(0,len(arrayname)):
        print "F:\\songs\\" + arrayfile[num]
        os.rename("F:\\songs\\" + arrayfile[num], "F:\\songs\\" + (arrayname[num]))
I am getting this error:
    Traceback (most recent call last):
      File "C:\read.py", line 25, in <module>
        os.rename("F:\\songs\\" + arrayfile[num], "F:\\songs\\" + (arrayname[num]))
    WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect
How can I change the names of the files?

You are forgetting to remove the newline character from the end of your lines. Remove it with str.rstrip():
    for line in f:
        arrayname.append(line.rstrip('\n'))
You can simplify your code somewhat, and use best practices to ensure the file is closed. I'd use the newer (and better engineered) io.open() rather than codecs.open(). If you use Unicode literals for paths, Python will ensure you get Unicode filenames when listing:
    import io
    import os
    import glob
    directory = u"F:\\songs"
    songs = glob.glob(os.path.join(directory, u"*.mp3"))
    with io.open('E:\\songs.txt', encoding='utf-8') as newnames:
        for old, new in zip(songs, newnames):
            oldpath = os.path.join(directory, old)
            newpath = os.path.join(directory, new.rstrip('\n'))
            print oldpath
            os.rename(oldpath, newpath)
I used the glob module to filter out matching filenames.
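A Python 3 sketch of the same idea, for experimenting in a temporary directory. The helper name rename_from_list is hypothetical, and pairing the sorted .mp3 files with the lines of the list file in order is an assumption; the thread does not say how old and new names correspond.

```python
import glob
import os
import tempfile


def rename_from_list(directory, names_file):
    """Rename the .mp3 files in directory to the names listed in names_file.

    Files are paired with lines in sorted order (an assumption); returns
    the list of new paths.
    """
    songs = sorted(glob.glob(os.path.join(glob.escape(directory), "*.mp3")))
    renamed = []
    with open(names_file, encoding="utf-8") as newnames:
        for old_path, line in zip(songs, newnames):
            new_name = line.rstrip("\r\n")  # drop the newline read from the file
            new_path = os.path.join(directory, new_name)
            os.rename(old_path, new_path)
            renamed.append(new_path)
    return renamed


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        songs_dir = os.path.join(tmp, "songs")
        os.mkdir(songs_dir)
        for name in ("01.mp3", "02.mp3"):
            open(os.path.join(songs_dir, name), "wb").close()
        names_file = os.path.join(tmp, "songs.txt")
        with open(names_file, "w", encoding="utf-8") as fh:
            fh.write("गीत एक.mp3\nगीत दो.mp3\n")
        rename_from_list(songs_dir, names_file)
        return sorted(os.listdir(songs_dir))


print(demo())
```

Without the rstrip call, each new name would end in a newline character, which Windows rejects with the error shown in the question.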

replacing strings in files from list in python

I need to find all files with a certain extension, in this case .txt. Then I must open each of these files and replace a certain string with another string, and this is where I'm stuck.
Here is my code:
    import os, os.path
    find_files=[]
    for root, dirs, files in os.walk("C:\\Users\\Kevin\\Desktop\\python programi"):
        for f in files:
            fullpath = os.path.join(root, f)
            if os.path.splitext(fullpath)[1] == '.txt':
                find_files.append(fullpath)
    for i in find_files:
        file = open(i,'r+')
        contents = file.write()
        replaced_contents = contents.replace('xy', 'djla')
        print i
Error message:
    line 12, in <module>
    contents = file.write()
    TypeError: function takes exactly 1 argument (0 given)
I know that an argument is missing, but which argument should I use?
I think it would be better to change the code from for i in find_files: down.
Any advice?

I think you mean to use file.read() rather than file.write()

Not sure if you're just trying to print out the changes or if you want to actually rewrite them into the file, in which case you could just do this:
    for i in find_files:
        replaced_contents = ""
        contents = ""
        with open(i, "r") as file:
            contents = file.read()
            replaced_contents = contents.replace('xy', 'djla')
        with open(i, "w") as file:
            file.write(replaced_contents)
        print i
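The search and the rewrite can also be combined into one function. The following is a Python 3 sketch rather than a drop-in replacement: the function name replace_in_files is made up, and reading the files as UTF-8 is an assumption, since the thread does not mention their encoding.

```python
import os
import tempfile


def replace_in_files(root, old, new, ext=".txt"):
    """Replace old with new in every file under root whose name ends in ext.

    Returns the paths of the files that were actually changed.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] != ext:
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8") as fh:
                contents = fh.read()
            replaced = contents.replace(old, new)
            if replaced != contents:  # rewrite only when something changed
                with open(path, "w", encoding="utf-8") as fh:
                    fh.write(replaced)
                changed.append(path)
    return changed


def demo():
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "sub")
        os.mkdir(sub)
        samples = {
            os.path.join(root, "a.txt"): "xy and xy",
            os.path.join(sub, "b.txt"): "nothing here",
            os.path.join(sub, "c.log"): "xy",
        }
        for path, text in samples.items():
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text)
        changed = replace_in_files(root, "xy", "djla")
        with open(os.path.join(root, "a.txt"), encoding="utf-8") as fh:
            return [os.path.basename(p) for p in changed], fh.read()


print(demo())  # (['a.txt'], 'djla and djla')
```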

Python File Concatenation

I have a data folder, with subfolders for each subject that ran through a program. So, for example, in the data folder, there are folders for Bob, Fred, and Tom. Each one of those folders contains a variety of files and subfolders. However, I am only interested in the 'summary.log' file contained in each subject's folder.
I want to concatenate the 'summary.log' file from Bob, Fred, and Tom into a single log file in the data folder. In addition, I want to add a column to each log file that will list the subject number.
Is this possible to do in Python? Or is there an easier way to do it? I have tried a number of different batches of code, but none of them get the job done. For example,
    #!/usr/bin/python
    import sys, string, glob, os
    fls = glob.glob(r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*');
    outfile = open('summary.log','w');
    for x in fls:
        file=open(x,'r');
        data=file.read();
        file.close();
        outfile.write(data);
    outfile.close();
Gives me the error,
    Traceback (most recent call last):
      File "fileconcat.py", line 8, in <module>
        file=open(x,'r');
    IOError: [Errno 21] Is a directory
I think this has to do with the fact that the data folder contains subfolders, but I don't know how to work around it. I also tried this, but to no avail:
    from glob import iglob
    import shutil
    import os
    PATH = r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*'
    destination = open('summary.log', 'wb')
    for filename in iglob(os.path.join(PATH, '*.log'))
        shutil.copyfileobj(open(filename, 'rb'), destination)
    destination.close()
This gives me an "invalid syntax" error at the "for filename" line, but I'm not sure what to change.

The syntax error is not related to the use of glob.
You forgot the ":" at the end of the for statement:
    for filename in iglob(os.path.join(PATH, '*.log')):
                                                      ^--- missing
The following pattern also works:
    PATH = r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*/*.log'
    destination = open('summary.log', 'wb')
    for filename in iglob(PATH):
        shutil.copyfileobj(open(filename, 'rb'), destination)
    destination.close()

The colon (:) is missing in the for line.
Besides, you should use with, because it closes the file for you (your code is not exception safe).
    from glob import iglob
    import shutil
    import os
    PATH = r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*'
    with open('summary.log', 'wb') as destination:
        for filename in iglob(os.path.join(PATH, '*.log')):
            with open(filename, 'rb') as in_:
                shutil.copyfileobj(in_, destination)

In your first example:
    import sys, string, glob, os
you are not using sys, string or os, so there is no need to import those.
    fls = glob.glob(r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*');
Here, you are selecting the subject folders. Since you are interested in the summary.log files within these folders, you may change the pattern as follows:
    fls = glob.glob('/Users/slevclab/Desktop/Acceptability Judgement Task/data/*/summary.log')
In Python, there is no need to end lines with semicolons.
    outfile = open('summary.log','w')
    for x in fls:
        file = open(x, 'r')
        data = file.read()
        file.close()
        outfile.write(data)
    outfile.close()
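None of the snippets so far add the subject column the question asks for. One way it might be done in Python 3 is sketched below, under several assumptions that the thread does not confirm: each subject folder sits directly under the data folder, the folder name serves as the subject label, and the log is line-oriented with tab-separated fields. The function name concat_with_subject is made up for the example.

```python
import glob
import os
import tempfile


def concat_with_subject(data_dir, out_name="summary.log", sep="\t"):
    """Merge data_dir/*/summary.log into data_dir/out_name.

    Each output line is prefixed with the subject folder name and sep.
    Returns the path of the combined file.
    """
    pattern = os.path.join(glob.escape(data_dir), "*", "summary.log")
    out_path = os.path.join(data_dir, out_name)
    with open(out_path, "w", encoding="utf-8") as outfile:
        for log_path in sorted(glob.glob(pattern)):
            subject = os.path.basename(os.path.dirname(log_path))
            with open(log_path, encoding="utf-8") as infile:
                for line in infile:
                    # Normalise line endings so every record ends with "\n".
                    outfile.write(subject + sep + line.rstrip("\r\n") + "\n")
    return out_path


def demo():
    with tempfile.TemporaryDirectory() as data_dir:
        for subject, rows in (("Bob", "1\t0.5\n2\t0.7\n"), ("Fred", "1\t0.9\n")):
            os.mkdir(os.path.join(data_dir, subject))
            with open(os.path.join(data_dir, subject, "summary.log"), "w",
                      encoding="utf-8") as fh:
                fh.write(rows)
        with open(concat_with_subject(data_dir), encoding="utf-8") as fh:
            return fh.read()


print(demo())
```

If the logs have a header row, it would need separate handling (for example, writing it once with an extra "subject" heading); the sketch treats every line as data.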

As VGE's answer shows, your second solution works once you've fixed the syntax error. But note that a more general solution is to use os.walk:
    >>> import os
    >>> for i in os.walk('foo'):
    ...     print i
    ...
    ('foo', ['bar', 'baz'], ['oof.txt'])
    ('foo/bar', [], ['rab.txt'])
    ('foo/baz', [], ['zab.txt'])
This visits every directory in the tree below the start directory (including the start directory itself) and maintains a clean separation between directories and files.
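Applied to the question, os.walk can collect every summary.log regardless of how deeply it is nested. The sketch below is Python 3; the helper name find_named and the sample folder layout are invented for the illustration.

```python
import os
import tempfile


def find_named(root, target="summary.log"):
    """Return sorted paths of files called target anywhere under root."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        if target in filenames:
            matches.append(os.path.join(dirpath, target))
    return sorted(matches)


def demo():
    with tempfile.TemporaryDirectory() as root:
        for parts in (("Bob",), ("Fred", "session2"), ("Tom",)):
            folder = os.path.join(root, *parts)
            os.makedirs(folder)
            # Tom has no summary.log, only an unrelated file.
            name = "notes.txt" if parts[0] == "Tom" else "summary.log"
            open(os.path.join(folder, name), "w").close()
        return [os.path.relpath(p, root) for p in find_named(root)]


print(demo())
```

Note that if the combined output is also called summary.log and is written into the root folder, a later run would pick it up as input, so a different output name or location may be safer.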
