Read multiple csv files in a folder - python

I have multiple .csv files that represent a series of measurements taken.
I need to plot them together in order to compare the effect of successive alterations.
I basically want to create a function with which I can read each file into a list, repeat the same data-cleaning steps on each .csv file, and then plot them all together in one comparison graph.
This is a task I need to do to analyze some results. I intend to do it in Python/pandas, as I might need to integrate it into a bigger picture in the future, but for now this is it.
I also have one file that represents background noise, and I want to subtract those values from the other .csv files as well.
import os
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter

PATH = r'C:\Users\UserName\Documents\FSC\Folder_name'
FileNames = os.listdir(PATH)
for file in FileNames:
    df = pd.read_csv(PATH + file, index_col=0)  # note: PATH + file joins with no separator
I expected to read every file and store it into the list, but instead I got this error:
FileNotFoundError: [Errno 2] File b'C:\Users\UserName\Documents\FSC\FolderNameFileName.csv' does not exist: b'C:\Users\UserName\Documents\FSC\FolderNameFileName.csv'

Have you used pathlib from the standard library? It makes working with the file system a breeze.
I recommend reading: https://realpython.com/python-pathlib/
Try:
from pathlib import Path
import pandas as pd

files = Path('/your/path/here/').glob('*.csv')  # get all CSVs in your dir
for file in files:
    df = pd.read_csv(file, index_col=0)
    # your plots
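To also handle the background-noise file from the question, here is a minimal sketch that subtracts it from every measurement and plots the results together. It assumes all files share the same index and column layout, and that the background readings live in a file named background.csv (the folder and file names are hypothetical):

from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

folder = Path('/your/path/here/')  # hypothetical folder
background = pd.read_csv(folder / 'background.csv', index_col=0)  # assumed file name

fig, ax = plt.subplots()
for file in folder.glob('*.csv'):
    if file.name == 'background.csv':
        continue  # skip the background itself
    df = pd.read_csv(file, index_col=0)
    corrected = df - background  # pandas aligns on the index before subtracting
    ax.plot(corrected.index, corrected.iloc[:, 0], label=file.stem)
ax.legend()
plt.show()

The subtraction only works cleanly if the background file has the same shape and labels as the measurement files; otherwise pandas fills the mismatches with NaN.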

Related

How do I save each iteration as my file format without overwriting the previous iteration?

I am new to coding. I basically have a bunch of files in NIfTI format. I wanted to simply load them, apply a thresholding function to them, and then save them. I was able to write the few lines of code to do it to one file (it worked), but I have many, so I created another Python file and tried to make a for loop. I think it does everything fine, but the last step for saving my files just keeps overwriting, so in the end I only get one output file.
import numpy as np
import nibabel as nb
import glob
import os

path = 'subjects'
all_files = glob.glob(path + '/*.nii')
for filename in all_files:
    image = nb.load(filename)
    data = image.get_fdata()
    data[data < 0.1] = 0
    new_image = nb.Nifti1Image(data, affine=image.affine, header=image.header)
    nb.save(new_image, filename + 1)  # problem line: filename is a string, so this does not build a new name
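One way to keep every iteration (a sketch; the _thresholded suffix is an arbitrary choice, not from the original post) is to derive a fresh output name from each input name instead of reusing it:

import glob
import os
import nibabel as nb

for filename in glob.glob('subjects/*.nii'):
    image = nb.load(filename)
    data = image.get_fdata()
    data[data < 0.1] = 0  # same thresholding step as above
    new_image = nb.Nifti1Image(data, affine=image.affine, header=image.header)
    root, ext = os.path.splitext(filename)
    nb.save(new_image, root + '_thresholded' + ext)  # e.g. subject01_thresholded.nii

Because every output name differs from every input name, nothing gets overwritten.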

Apply a written code to all CSV files across different folders using Python

I have a wide range of CSV files that give me the solar energy produced by several systems on a daily basis. Each CSV file corresponds to one day in the year, for one particular site (I have 12 sites).
My goal is to develop a code that reads all CSV files (located across different folders), extracts the daily produced solar energy for every specific day and site, stores the values in a dataframe, and finally exports the dataframe collecting all daily produced solar energy across all sites to a new Excel file.
So far I have written the code to extract the values of all CSV files stored within the same folder, which gives me the solar energy produced for all days for which a CSV file exists in that folder:
import csv
import pandas as pd
import numpy as np
import glob

path = r"C:\Users\XX\Documents\XX\DataLogs\NameofSite\CSV\2020\02\CSV\*.csv"
Monthly_PV = []
for fname in glob.glob(path):
    df = pd.read_csv(fname, header=7, decimal=',')
    kWh_produced = df["kWh"]
    daily_start = kWh_produced.iloc[0]
    daily_end = kWh_produced.iloc[-1]
    DailyPV = daily_end - daily_start
    Monthly_PV.append(DailyPV)
print(Monthly_PV)
MonthlyTotal = sum(Monthly_PV)
Monthly_PV = pd.DataFrame(Monthly_PV)
print(MonthlyTotal)
Monthly_PV.to_excel(r"C:\Users\XXX\Documents\XXX\DataLogs\NameofSite\CSV\2020\02\CSV\Summary.xlsx")
I get the result I want: a list in which each value corresponds to the daily produced solar energy of each CSV file in the one folder I called path. My aim is to extend this code so that it is also applied to CSV files located in folders above this one, or in parallel folders within the same parent folder.
Any tips will be much appreciated.
Thanks!
You can add an extra for loop to handle a list of paths:
import glob
import numpy as np
import pandas as pd

paths = [r"C:\Users\XX\Documents\XX\DataLogs\NameofSite\CSV\2020\02\CSV\*.csv",
         r"C:\Foo\*.csv",
         r"..\..\Bar\*.csv"]
Monthly_PV = []
for path in paths:
    for fname in glob.glob(path):
        df = pd.read_csv(fname, header=7, decimal=',')
        kWh_produced = df["kWh"]
        daily_start = kWh_produced.iloc[0]
        daily_end = kWh_produced.iloc[-1]
        DailyPV = daily_end - daily_start
        Monthly_PV.append(DailyPV)
print(Monthly_PV)
MonthlyTotal = sum(Monthly_PV)
Monthly_PV = pd.DataFrame(Monthly_PV)
print(MonthlyTotal)
Monthly_PV.to_excel(r"C:\Users\XXX\Documents\XXX\DataLogs\NameofSite\CSV\2020\02\CSV\Summary.xlsx")
If you do not want to hardcode a list of directories in your program, maybe try something based on this?
import collections
import os
import typing

def get_input_directories(depth: int, base_directory: str) -> typing.DefaultDict[str, typing.List[DeltaFile]]:
    """
    Build a dict with keys that are directories, and values that are lists of filenames.

    DeltaFile, appropriate_extension() and hidden() are helpers from my own code.
    """
    result: typing.DefaultDict[str, typing.List[DeltaFile]] = collections.defaultdict(list)
    original_directory = os.getcwd()
    os.chdir(base_directory)
    try:
        for root, directories, filenames in os.walk('.'):
            if root.count('/') != depth:
                # We only want to deal with /band/album (e.g., with depth == 2) in root
                continue
            assert not directories, f"root is {root}, directories is {directories}"
            for filename in filenames:
                if appropriate_extension(filename) and not hidden(filename):
                    result[root].append(DeltaFile(filename))
    finally:
        os.chdir(original_directory)  # restore the original working directory, not just '..'
    return result
You can safely remove the type annotations if you don't want them.
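A lighter-weight alternative (a sketch, assuming all the month and site folders sit under one common parent such as the DataLogs directory from the question) is glob's recursive ** pattern:

import glob
import pandas as pd

# recursive=True lets ** match CSV files at any depth below the parent folder
pattern = r"C:\Users\XX\Documents\XX\DataLogs\**\*.csv"
monthly_pv = []
for fname in glob.glob(pattern, recursive=True):
    df = pd.read_csv(fname, header=7, decimal=',')
    kwh = df["kWh"]
    monthly_pv.append(kwh.iloc[-1] - kwh.iloc[0])  # daily production: last minus first reading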

Loading Multiple Data files from same folder in Python

I am trying to load a large number of data files from the same folder in Python. The ultimate goal here is to simply choose which file I would like to use in calculations, rather than opening files individually.
Here is what I have. It seems to work for opening the data in the files, but I am having a hard time choosing a specific file I want to work with (and assigning a value to each column in each file).
import glob
import os
import astropy
import numpy as np
import matplotlib.pyplot as plt

dir = '/S34_east_tfa/'
os.chdir(dir)
for file in glob.glob("*.data"):
    data = np.loadtxt(file)
    print(data)
Time = data[:, 0]
Use a Python dictionary instead of overwriting the results in the data variable inside your loop.
data_dict = dict()
for file in glob.glob("*.data"):
    data_dict[file] = np.loadtxt(file)
Is this what you were looking for?
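A specific file can then be pulled out by name and its columns unpacked, for example (the file name and column layout here are assumptions):

data = data_dict["star_01.data"]  # hypothetical file name
Time = data[:, 0]                 # assuming column 0 holds the time values
Flux = data[:, 1]                 # assuming column 1 holds the measurement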

Python - Pandas Concatenate Multiple Text Files Within Multiple Zip Files

I am having problems getting txt files located inside zipped files to load and concatenate using pandas. There are many examples on here using pd.concat with zip_file.open, but I am still not getting anything to work in my case, since I have more than one zip file and multiple txt files in each.
For example, let's say I have TWO zipped files in a specific folder "Main". Each zipped file contains FIVE txt files. I want to read all of these txt files and pd.concat them all together. In my real-world example I will have dozens of zip files, each containing five txt files.
Can you help please?
Folder and File Structure for Example:
'C:/User/Example/Main'
    TAG_001.zip
        sample001_1.txt
        sample001_2.txt
        sample001_3.txt
        sample001_4.txt
        sample001_5.txt
    TAG_002.zip
        sample002_1.txt
        sample002_2.txt
        sample002_3.txt
        sample002_4.txt
        sample002_5.txt
I started like this but everything after this is throwing errors:
import os
import glob
import pandas as pd
import zipfile
path = 'C:/User/Example/Main'
ziplist = glob.glob(os.path.join(path, "*TAG*.zip"))
This isn't efficient but it should give you some idea of how it might be done.
import os
import zipfile
import pandas as pd

frames = {}
BASE_DIR = 'C:/User/Example/Main'
_, _, zip_filenames = list(os.walk(BASE_DIR))[0]  # the files sitting directly inside BASE_DIR
for zip_filename in zip_filenames:
    with zipfile.ZipFile(os.path.join(BASE_DIR, zip_filename)) as zip_:
        for filename in zip_.namelist():
            with zip_.open(filename) as file_:
                new_frame = pd.read_csv(file_, sep='\t')
                frame = frames.get(filename)
                if frame is not None:
                    frames[filename] = pd.concat([frame, new_frame])  # keep the concatenated result
                else:
                    frames[filename] = new_frame
# once all frames have been concatenated, loop over the dict and write them back out
Depending on how much data there is, you will have to design a solution that balances processing power, memory, and disk space. This solution could potentially use up a lot of memory.
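If the end goal is simply one combined DataFrame across every txt file in every zip, a shorter sketch (still assuming tab-separated files, as above) could be:

import glob
import os
import zipfile
import pandas as pd

path = 'C:/User/Example/Main'
frames = []
for zip_path in glob.glob(os.path.join(path, '*TAG*.zip')):
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            with zf.open(name) as fh:
                frames.append(pd.read_csv(fh, sep='\t'))
combined = pd.concat(frames, ignore_index=True)  # one DataFrame from all txt files in all zips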

Using Pandas read_table with list of files

I am pretty new to Python in general, but am trying to make a script that takes data from certain files in a folder and puts it into an Excel spreadsheet.
The code I have will find the file type that I want in my specified folder, and then make a list with the full file paths.
import os

file_paths = []
for folder, subs, files in os.walk('C://Users/Dir'):
    for filename in files:
        if filename.endswith(".log") or filename.endswith(".txt"):
            file_paths.append(os.path.abspath(os.path.join(folder, filename)))
It will also take a specific file path, pull data from the correct column, and put it into excel in the correct cells.
import pandas as pd
import numpy

for i in range(len(file_paths)):
    fields = ['RDCR']
    data = pd.read_table(file_paths[i], sep=r"\s+", names=fields, usecols=[3])
Where I am having trouble is making read_table iterate through my list of files and put the data into an Excel sheet, moving over one column in the spreadsheet each time a new file is read.
Ideally, the for loop would see how long the file_paths list is and use that as the range. It would then use file_paths[i] to feed the file names into read_table one by one.
What happens is that it finds the length of file_paths, but instead of accumulating the data from the files one by one, it just keeps the data from the last file in the list.
Any help would be much appreciated! Thank you!
Try concatenating all of them at once and writing to Excel one time.
from glob import glob
import pandas as pd

files = glob('C://Users/Dir/*.log') + glob('C://Users/Dir/*.txt')

def read_file(f):
    fields = ['RDCR']
    return pd.read_table(f, sep=r"\s+", names=fields, usecols=[3])

df = pd.concat([read_file(f) for f in files], axis=1)
df.to_excel('out.xlsx')
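With axis=1, each file lands in its own spreadsheet column. To label those columns with the files they came from (a small optional addition, not part of the original answer):

import os

df = pd.concat([read_file(f) for f in files], axis=1)
df.columns = [os.path.basename(f) for f in files]  # one labeled column per input file
df.to_excel('out.xlsx')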
