Reading a CSV file with a partially variable name - Python

I want to read a csv file into a data frame from a certain folder with pandas. This folder contains several csv files. They contain different information.
df = pd.read_csv(r'C:\User\Username\Desktop\Statistic\12345678_Reference.csv')
The first part of the filename (12345678) is variable. I want to read in the file which ends with '_Reference.csv', but I have no clue how to manage it. I googled, but could not find a solution for the case where there is more than one csv file in the same folder.

If you import os, then you can use functions for navigating the file system.
os.listdir(path) will return a list of all of the file names in a directory.
[f for f in os.listdir(path) if f.endswith("Reference.csv")]
will return a list of all file names ending with "Reference.csv". In your scenario, it sounds like there will be only one item in the list.
So, [f for f in os.listdir(path) if f.endswith("Reference.csv")][0] would return the filename that you're looking for.
Then you can construct a path using the filename, and feed it to pd.read_csv().
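Putting the pieces together, a minimal sketch (assuming the folder path from the question and exactly one matching file):
import os
import pandas as pd

# Folder path taken from the question
path = r'C:\User\Username\Desktop\Statistic'

# Collect every file name that ends with "_Reference.csv"
matches = [f for f in os.listdir(path) if f.endswith('_Reference.csv')]

# Assuming exactly one match, build the full path and read it
df = pd.read_csv(os.path.join(path, matches[0]))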

Related

Get a list of recently added .csv files in a directory using Python

I have an output files folder where all the files get dumped. I need to check the folder every five minutes and pick up the list of recently added files using Python.
One way of doing this is using sets and getting the non-intersecting files. Is there any other, better approach?
A code snippet would be much appreciated.
Thanks
To solve this, you can make use of listdir() from the os module and sleep() from the time module.
import os
from time import sleep
path = "/path/to/folder/with/csv/files"
with open("log.txt", "a+") as log_file:
    while True:
        log_file.seek(0)
        existing = [f.strip() for f in log_file]
        csvs = [f for f in os.listdir(path) if f.endswith(".csv") and f not in existing]
        if len(csvs) > 0:
            print(f"Found {len(csvs)} new file(s):")
            for f in csvs:
                print(f)
            print("\n")
        else:
            print("Found 0 new files.")
        log_file.writelines([f"{f}\n" for f in csvs])
        sleep(300)
We will be storing the existing file names in a .txt file. You could use a .json file or any other file type you like. Firstly, we open the file using with/open (in append/read mode) and get a list of the file names that have previously been stored in the text file. We then get a list of all of the .csv files in that directory that are not in the file:
csvs = [f for f in os.listdir(path) if f.endswith(".csv") and f not in existing]
os.listdir(path) lists all of the files and folders in the given directory.
The following if/else statement is simply for output purposes and is not required. It is only saying: if new csv files were found, print how many and the names of each. If none were found, print that zero were found.
All that's left to do is write the newly discovered file names into the .txt file so that on the next iteration, they will be marked as existing and not new:
log_file.writelines([f"{f}\n" for f in csvs])
The final line, sleep(300), makes the program wait 300 seconds, or 5 minutes, before iterating again.
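As an aside, the set-based approach mentioned in the question also works if you are happy keeping the state in memory rather than in a file; a rough sketch:
import os
from time import sleep

path = "/path/to/folder/with/csv/files"
seen = set()  # file names observed on previous iterations

while True:
    current = {f for f in os.listdir(path) if f.endswith(".csv")}
    new = current - seen  # set difference: files not seen before
    for f in new:
        print(f)
    seen |= new
    sleep(300)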

Read two files from two paths within the same loop - Python

I have two paths; each path contains many files, and each file contains data for one day. I need to read a file from the first path, and the file from the other path that corresponds to the same day. [In the same Python loop I want to read the file for the first day in each path.] The files have the same names and sequence in each path.
I'm trying to use os.listdir(path) in a for loop instead of with open(file) as file, because I want to read each file as a data frame using pandas, and then use pandas to do data aggregation for each file.
I assume you are sure that a file always exists in both directories.
Is this what you are asking?
path1 = "path1"
path2 = "path2"
for fname in os.listdir(path1):
    fname1 = os.path.join(path1, fname)
    fname2 = os.path.join(path2, fname)
    # do your processing here
If it can happen that a file exists in path1 but not in path2, you have to check for the other file's presence with os.path.isfile() before opening, and skip it if absent.
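For example, a sketch that adds the existence check and reads both files with pandas (assuming the files are CSVs, which the question doesn't state explicitly):
import os
import pandas as pd

path1 = "path1"
path2 = "path2"

for fname in os.listdir(path1):
    fname1 = os.path.join(path1, fname)
    fname2 = os.path.join(path2, fname)
    if not os.path.isfile(fname2):
        continue  # skip days that are missing from path2
    df1 = pd.read_csv(fname1)
    df2 = pd.read_csv(fname2)
    # aggregate df1 and df2 here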

Search for multiple files by name and copy to a new folder

I have been trying to write some Python code to get each line from a .txt file and search for a file with that name in a folder and its subfolders. After this I want to copy that file to a preset destination folder.
The thing is, when I test this code I can read all the file names in the .txt and I can display all the files in a directory and its subdirectories. The problem arises when I have to compare the filename I read from the .txt (line by line, as I said) with all the filenames within the directory and then copy the file there.
Any ideas what I am doing wrong?
import os, shutil
def main():
    dst = '/Users/jorjis/Desktop/new'
    f = open('/Users/jorjis/Desktop/articles.txt', 'rb')
    lines = [line[:-1] for line in f]
    for files in os.walk("/Users/jorjis/Desktop/folder/"):
        for line in lines:
            if line == files:
                shutil.copy('/dir/file.ext', '/new/dir')
You are comparing the file names from the text file with a tuple with three elements: the root path of the currently visited folder, a list of all subdirectory names in that path, and a list of all file names in that path. Comparing a string with a tuple will never be true. You have to compare each file name with the set of file names to copy. The data type set comes in handy here.
Opening a file together with the with statement ensures that it is closed when the control flow leaves the with block.
The code might look like this:
import os
import shutil
def main():
    destination = '/Users/jorjis/Desktop/new'
    with open('/Users/jorjis/Desktop/articles.txt', 'r') as lines:
        filenames_to_copy = set(line.rstrip() for line in lines)
    for root, _, filenames in os.walk('/Users/jorjis/Desktop/folder/'):
        for filename in filenames:
            if filename in filenames_to_copy:
                shutil.copy(os.path.join(root, filename), destination)
If I had to guess, I would say that the files in the .txt contain the entire path. You'd need to add a little more to os.walk to match up completely.
for root, _, files in os.walk("/Users/jorjis/Desktop/folder/"):
    for f in files:
        new_path = os.path.join(root, f)
        if new_path in lines:
            shutil.copy(new_path, '/some_new_dir')
Then again, I'm not sure what the .txt file looks like, so it might be that your original way works. If that's the case, take a closer look at the lines = ... line.

Walking subdirectories in Python and saving to the same subdirectory

First of all, thanks for reading this. I am a little stuck with subdirectory walking (then saving) in Python. My code below is able to walk through each subdirectory in turn and process a file to search for certain strings; I then generate an .xlsx file (using xlsxwriter) and post my search data to Excel.
I have two problems...
The first problem I have is that I want to process a text file in each directory, but the text file name varies per subdirectory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt (would I use glob here?).
The second problem is that when I open/create an Excel file I would like to save it to the same subdirectory where the .txt file was found and processed. Currently my Excel file is saved to the Python script's directory, and consequently gets overwritten each time a new subdirectory is opened and processed. Would it be wiser to save the Excel file to the subdirectory at the end, or can it be created with the current subdirectory path from the start?
Here's my partially working code...
for root, subFolders, files in os.walk(dir_path):
    if 'Textfile.txt' in files:
        with open(os.path.join(root, 'Textfile.txt'), 'r') as f:
            #f = open(file, "r")
            searchlines = f.readlines()
        searchstringsFilter1 = ['Filter Used :']
        searchstringsFilter0 = ['Filter Used : 0']
        timestampline = None
        timestamp = None
        f.close()
        # Create a workbook and add a worksheet.
        workbook = xlsxwriter.Workbook('Excel.xlsx', {'strings_to_numbers': True})
        worksheetFilter = workbook.add_worksheet("Filter")
Thanks again for looking at this problem.
MikG
I will not solve your code completely, but here are hints:
the text file name varies per sub directory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt
you can list all the files in the directory, then check the file extension:
for filename in files:
    if filename.endswith('.txt'):
        # do stuff
Also, when creating the workbook, can you enter a path? You have root, right? Why not use it?
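Combining the two hints, a rough sketch (keeping the names from the question; filtering on the extension and using root when saving the workbook):
import os
import xlsxwriter

for root, subFolders, files in os.walk(dir_path):
    for filename in files:
        if filename.endswith('.txt'):
            with open(os.path.join(root, filename), 'r') as f:
                searchlines = f.readlines()
            # save the Excel file next to the .txt file it was built from
            workbook = xlsxwriter.Workbook(os.path.join(root, 'Excel.xlsx'),
                                           {'strings_to_numbers': True})
            worksheetFilter = workbook.add_worksheet("Filter")
            # ... process searchlines and write results here ...
            workbook.close()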
You don't want glob because you already have a list of files in the files variable. So, filter it to find all the text files:
import fnmatch
txt_files = filter(lambda fn: fnmatch.fnmatch(fn, '*.txt'), files)
To save the file in the same subdirectory:
outfile = os.path.join(root, 'someoutfile.txt')

Concatenating fasta files from different folders

I have a large number of fasta files (these are just text files) in different subfolders. What I need is a way to search through the directories for files that have the same name and concatenate them into a file with the name of the input files. I can't do this manually, as I have 10,000+ genes that I need to do this for.
So far I have the following Python code that looks through one of the directories and then uses those file names to search through the other directories. This returns a list that has the full path for each file.
import os
from os.path import join, abspath
path = '/directoryforfilelist/' #Directory for source list
listing = os.listdir(path)
for x in listing:
    for root, dirs, files in os.walk('/rootdirectorytosearch/'):
        if x in files:
            pathlist = abspath(join(root, x))
Where I am stuck is how to concatenate the files it returns that have the same name. The results from this script look like this.
/directory1/file1.fasta
/directory2/file1.fasta
/directory3/file1.fasta
/directory1/file2.fasta
/directory2/file2.fasta
/directory3/file2.fasta
In this case I would need the end result to be two files named file1.fasta and file2.fasta that contain the text from each of the same named files.
Any leads on where to go from here would be appreciated. While I did this part in Python, any way that gets the job done is fine with me. This is being run on a Mac, if that matters.
Not tested, but here's roughly what I'd do:
from itertools import groupby
import os

def conc_by_name(names):
    # group by the file's base name; groupby only groups consecutive
    # items, so sort the paths by that key first
    keyfunc = os.path.basename
    for tail, group in groupby(sorted(names, key=keyfunc), key=keyfunc):
        with open(tail, 'w') as out:
            for name in group:
                with open(name) as f:
                    out.writelines(f)
This will create the files (file1.fasta and file2.fasta in your example) in the current folder.
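A hypothetical driver for conc_by_name, reusing the walk from the question to collect the full paths first:
import os

names = []
for root, dirs, files in os.walk('/rootdirectorytosearch/'):
    for filename in files:
        if filename.endswith('.fasta'):
            names.append(os.path.join(root, filename))

conc_by_name(names)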
For each file of your list, open the target file in append mode, read each line of your source file, and write it to the target file.
Assuming that the target folder is empty to start with, and is not in /rootdirectorytosearch.
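That approach needs no grouping at all; a minimal sketch, assuming names holds the full paths collected above and the script runs from the (initially empty) target folder:
import os

for name in names:
    # the target file carries the shared base name, e.g. file1.fasta
    target = os.path.basename(name)
    with open(target, 'a') as out, open(name) as src:
        out.writelines(src)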
