Filenames:
File1: new_data_20100101.csv
File2: samples_20100101.csv
The timestamp always has the format %Y%m%d and sits in the filename after a _ and before .csv.
I want to find the dates for which there is both a data file and a samples file, and then do something with those files:
My code so far:
for all_files in os.listdir():
    if all_files.__contains__("data_"):
        dataList.append(all_files.split('_')[2])
    if all_files.__contains__("samples_"):
        samplesList.append(all_files.split('_')[1])
That gives me the filenames cut down to the timestamp and the .csv extension.
Now I would like to try something like this:
for day in dataList:
    if day in samplesList:
        open day as csv.....
I get a list of days for which both files exist. How can I undo the split so I can go on working with the files? Right now I would get an error telling me that, for instance, 20100101.csv does not exist, because the file is actually called new_data_20100101.csv.
I'm unsure how to use os.path.basename, so I would appreciate some advice on handling the filenames.
thanks
You could instead use the glob module to get your list. This allows you to filter just your CSV files.
The following script creates two dictionaries with the key for each dictionary being the date portion of your filename and the value holding the whole filename. A list comprehension creates a list of tuples holding each matching pair:
import glob
import os
csv_files = glob.glob('*.csv')
data_files = {file.split('_')[2] : file for file in csv_files if 'data_' in file}
sample_files = {file.split('_')[1] : file for file in csv_files if 'samples_' in file}
matching_pairs = [(sample_files[date], file) for date, file in data_files.items() if date in sample_files]
for sample_file, data_file in sorted(matching_pairs):
    print('{} <-> {}'.format(sample_file, data_file))
For your two file example, this would display the following:
samples_20100101.csv <-> new_data_20100101.csv
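A variation on the same idea, as a minimal sketch: it assumes the date is always the last underscore-separated part of the name, as described in the question. The helper name pair_by_date and the regular expression are illustrative and not part of the answer above.

```python
import re
from pathlib import Path

# Assumption: every relevant name ends in "_YYYYMMDD.csv".
DATE_RE = re.compile(r"_(\d{8})\.csv$")


def pair_by_date(filenames):
    """Return {date: (data_file, samples_file)} for dates present in both groups."""
    data, samples = {}, {}
    for name in filenames:
        match = DATE_RE.search(name)
        if not match:
            continue  # no trailing _YYYYMMDD.csv, so skip this file
        date = match.group(1)
        base = Path(name).name
        if "samples_" in base:
            samples[date] = name
        elif "data_" in base:
            data[date] = name
    # Keep only the dates that occur in both dictionaries.
    return {d: (data[d], samples[d]) for d in sorted(data.keys() & samples.keys())}
```

The keys are the dates and the values keep the full filenames, so each pair can be passed straight to open() or a CSV reader without reversing the split.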
I've got 2 folders, each with a different CSV file inside (both have the same format):
I've written some python code to search within the "C:/Users/Documents" directory for CSV files which begin with the word "File"
import glob, os
inputfile = []
for root, dirs, files in os.walk("C:/Users/Documents/"):
    for datafile in files:
        if datafile.startswith("File") and datafile.endswith(".csv"):
            inputfile.append([os.path.join(root, datafile)])
print(inputfile)
That almost worked as it returns:
[['C:/Users/Documents/Test A\\File 1.csv'], ['C:/Users/Documents/Test B\\File 2.csv']]
Is there any way I can get it to return this instead (no sub list and shows / instead of \):
['C:/Users/Documents/Test A/File 1.csv', 'C:/Users/Documents/Test B/File 2.csv']
The idea is so I can then read both CSV files at once later, but I believe I need to get the list in the format above first.
okay, I will paste an option here.
I used os.path.abspath on the joined path to normalize it.
Have a look and see if it works.
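import os
filelist = []
for folder, subfolders, files in os.walk("C:/Users/Documents/"):
    for datafile in files:
        if datafile.startswith("File") and datafile.endswith(".csv"):
            # abspath normalizes the path; on Windows it uses backslashes,
            # so convert the separators to forward slashes afterwards
            filePath = os.path.abspath(os.path.join(folder, datafile))
            filelist.append(filePath.replace(os.sep, "/"))
print(filelist)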
Result:
['C:/Users/Documents/Test A/File 1.csv','C:/Users/Documents/Test B/File 2.csv']
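Another option, sketched here with pathlib instead of os.walk: Path.as_posix() always renders a path with forward slashes, whatever the platform's native separator is. The function name find_csvs and its prefix parameter are made up for this example.

```python
from pathlib import Path


def find_csvs(top, prefix="File"):
    """Return a flat, sorted list of matching CSV paths using forward slashes."""
    return sorted(
        p.as_posix()  # forward slashes on every platform
        for p in Path(top).rglob("*.csv")  # search all subfolders
        if p.is_file() and p.name.startswith(prefix)
    )
```

Calling find_csvs("C:/Users/Documents") should then give a flat list of strings with no nested lists, ready to be looped over when reading the files.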
I am trying to write a Python 3.6 script that will add key/value pairs from a folder tree dictionary to a csv file. Files in the folder tree are the keys and their paths are the values.
There seems to be an error in how I am iterating through the dictionary because in the csv file I only get the key/value pairs from one of the folders, and not the entire folder tree. I just don't see where my error is. Here is my code:
import os
import csv
root_dir = '.'
for root, dirs, files in os.walk (root_dir, topdown='true'):
    folder_dict = {filename:root for filename in files}
    print (folder_dict)
with open ('test.csv', 'w') as csvfile:
    for key in folder_dict:
        csvfile.write ('%s, %s\n'% (key, folder_dict [key]))
I get the dictionary but in the csv file there are only the key/value pairs for one item.
Because of the line folder_dict = {filename:root for filename in files}, you overwrite the data on each loop, leaving the last dictionary as the only thing for the later write to the CSV.
You don't really need this interim data structure at all. Just write the CSV as you discover files to write. You weren't actually using the CSV module, so I added it to the solution.
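import os
import csv
root_dir = '.'
# newline='' lets the csv module control line endings itself
with open('test.csv', 'w', newline='') as fileobj:
    csvfile = csv.writer(fileobj)
    for root, dirs, files in os.walk(root_dir, topdown=True):
        # write one (filename, folder) row per file as it is discovered
        csvfile.writerows((filename, root) for filename in files)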
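The same approach wrapped in a function, as a rough sketch. Writing rows directly also sidesteps a second problem with a dictionary keyed by filename: two files with the same name in different folders would collapse into one entry. The name write_tree_csv and the returned row count are additions for illustration only.

```python
import csv
import os


def write_tree_csv(root_dir, out_path):
    """Write one (filename, folder) row per file under root_dir; return the row count."""
    count = 0
    with open(out_path, "w", newline="") as fileobj:
        writer = csv.writer(fileobj)
        for folder, _dirs, files in os.walk(root_dir):
            for filename in files:
                writer.writerow((filename, folder))
                count += 1
    return count
```

If out_path lies inside root_dir, the output file itself will show up as a row, so it may be simpler to write it somewhere outside the tree being walked.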
I have about 500 '.csv' files starting with the letter 'T', e.g. 'T50, T51, T52 ..... T550', and there are some other '.csv' files with other random names in the folder. I want to read all csv files starting with "T" and store them in separate dataframes: 't50, t51, t52... etc.'
The code I have written just reads these files into a dataframe
import glob
import pandas as pd
for file in glob.glob("T*.csv"):
    print (file)
I want to have a different name for each dataframe - preferably, their own file names. How can I achieve this within its 'for loop'?
Totally agree with @Comos.
But if you still need individual variable names, I adapted the solution from here!
import pandas as pd
import os
folder = '/path/to/my/inputfolder'
filelist = [file for file in os.listdir(folder) if file.startswith('T')]
for file in filelist:
    exec("%s = pd.read_csv('%s')" % (file.split('.')[0], os.path.join(folder,file)))
In addition to ABotros's answer, to read all files into different dataframes, I would recommend adding the files to a dictionary, which will allow you to save dataframes with different names in a loop:
filelist = [file for file in os.listdir(folder) if file.startswith('T')]
database = {}
for file in filelist:
    # os.listdir returns bare names, so join them with the folder before reading
    database[file] = pd.read_csv(os.path.join(folder, file))
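A small sketch of the dictionary approach that keys each entry by the file's base name, so T50.csv is stored under "T50". The function load_tables and its reader parameter are hypothetical; the reader is passed in so the sketch does not depend on any particular library.

```python
import glob
import os


def load_tables(folder, reader, pattern="T*.csv"):
    """Map each matching file's base name (without extension) to reader(path)."""
    tables = {}
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        name = os.path.splitext(os.path.basename(path))[0]
        tables[name] = reader(path)
    return tables
```

With pandas, load_tables(folder, pd.read_csv)["T50"] would then be the dataframe read from T50.csv, which gives each dataframe a name without resorting to exec.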
I have a folder with .exp files. They're basically .csv files but with a .exp extension (just the format of files exported from the instrument). I know this because after changing .exp to .csv they still open in Excel as csv files. Example here: https://uowmailedu-my.sharepoint.com/personal/tonyd_uow_edu_au/Documents/LAB/MC-ICPMS%20solution/Dump%20data%20here?csf=1
In Python, I want to read the data from each file into data frames (one for each file). I've tried the following code, but it makes the list dfs with all the files and:
(i) I don't know how to access the content of list dfs and turn it into several data frames
(ii) it looks like the columns in the original .exp files were lost.
import os
# change directory (raw string so the backslashes are not treated as escapes)
os.chdir(r'..\LAB\MC-ICPMS solution\Dump data here')
path = os.getcwd()
import glob
import pandas as pd
# get data file names
filenames = glob.glob(path + "/*.csv")
dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))
Do you have any ideas how I could read these files into data frames, so I can easily access the content?
I found this post: Storing csv file's contents into data Frames [Python Pandas] but not too helpful in my case.
thanks
I would recommend you switch to using an absolute path to your folder. Also it is safer to use os.path.join() when combining file parts (better than string concatenation).
To make things easier to understand, I suggest rather than just creating a list of dataframes, that you create a list of tuples containing the filename and the dataframe, that way you will know which is which.
In your code, you are currently searching for csv files not exp files.
The following creates the list of dataframes; each entry also stores the corresponding filename. At the end it cycles through all of the entries and displays the data.
Lastly, it shows you how you would for example display just the first entry.
import pandas as pd
import glob
import os
# change directory (raw string so the backslashes are not treated as escapes)
os.chdir(r'..\LAB\MC-ICPMS solution\Dump data here')
path = os.getcwd()
# get data file names and store each dataframe together with its filename
dfs = []
for filename in glob.glob(os.path.join(path, "*.exp")):
    dfs.append((filename, pd.read_csv(filename)))
print("Found {} exp files".format(len(dfs)))
# display each of your dataframes
for filename, df in dfs:
    print(filename)
    print(df)
# To display just the first entry (index the list dfs, not the loop variable df):
print("Filename:", dfs[0][0])
print(dfs[0][1])
I have a lot of files in a specific directory and I want to rename all files with the extension type .txt after the file creation date and add a counter prefix. By the way, I'm using python on windows.
Example:
Let's say I have the files aaa.txt, bbb.txt, and ccc.txt.
aaa.txt is the newest file and ccc.txt is the oldest file.
I want to rename the files like this:
999_aaa.txt, 998_bbb.txt, 997_ccc.txt ...
The counter should start at 999 for the newest file (I will never have more than 300 txt files).
As you can see, I just want to give the newest file the highest number (sorted by creation date).
How would you do this?
Have a look at this untested code:
import os
import glob
import shutil
# get a list of all txt files
fnames = glob.glob("*.txt")
# sort by st_ctime: creation time on Windows, last metadata change on Unix
# reverse: newer files first
fnames.sort(key=lambda x: os.stat(x).st_ctime, reverse=True)
# rename files, choose pattern as you like
for i, fname in enumerate(fnames):
    shutil.move(fname, "%03d_%s" % (999-i, fname))
For reference:
http://docs.python.org/3.1/library/glob.html#glob.glob
http://docs.python.org/2/library/os.html#os.stat
http://docs.python.org/2/library/shutil.html#shutil.move
http://docs.python.org/2/library/functions.html#enumerate
http://docs.python.org/2/library/stdtypes.html#mutable-sequence-types
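The same renaming logic as a reusable function, sketched with pathlib. The names rename_newest_first, start, and time_key are invented for this example; time_key defaults to st_ctime as in the code above, but can be swapped for another timestamp if that suits the platform better.

```python
from pathlib import Path


def rename_newest_first(folder, start=999, time_key=lambda p: p.stat().st_ctime):
    """Prefix every .txt file in folder with a counter; the newest file gets `start`."""
    # sorted() builds the full list first, so renamed files are not picked up again
    files = sorted(Path(folder).glob("*.txt"), key=time_key, reverse=True)
    renamed = []
    for offset, path in enumerate(files):
        target = path.with_name("%03d_%s" % (start - offset, path.name))
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Running it twice on the same folder would add a second prefix to files that were already renamed, so it is meant to be run once per batch of files.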