I am trying to load a large number of data files from the same folder in Python. The ultimate goal here is to simply choose which file I would like to use in calculations, rather than individually opening files.
Here is what I have. It seems to open the data files fine, but I am having a hard time choosing a specific file to work with (and assigning each column in each file to its own variable).
import glob
import os

import astropy
import matplotlib.pyplot as plt
import numpy as np

data_dir = '/S34_east_tfa/'   # renamed from `dir`, which shadows the built-in
os.chdir(data_dir)

for file in glob.glob("*.data"):
    data = np.loadtxt(file)
    print(data)
    Time = data[:, 0]   # first column; overwritten on every iteration
Use a Python dictionary instead of overwriting the results in the data variable inside your loop:
data_dict = dict()
for file in glob.glob("*.data"):
    data_dict[file] = np.loadtxt(file)
Is this what you were looking for?
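To work with one specific file afterwards, index the dictionary by file name and slice out the columns. A minimal sketch; the file name and the column meanings below are assumptions for illustration:

data = data_dict['example.data']   # hypothetical key; use any file name present in data_dict
Time = data[:, 0]                  # first column
Flux = data[:, 1]                  # second column, assuming each file has at least two columns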
I am new to coding. I have a number of files in NIfTI format. I want to load them, apply a thresholding function to each, and then save them. I was able to write the few lines of code that do this to one file (it worked), but since I have many files, I created another Python file and tried to write a for loop. I think it does everything fine, except that the last step keeps overwriting the saved file, so in the end I only get one output file.
import glob
import os

import nibabel as nb
import numpy as np

path = 'subjects'
all_files = glob.glob(path + '/*.nii')

for filename in all_files:
    image = nb.load(filename)
    data = image.get_fdata()
    data[data < 0.1] = 0   # zero out voxels below the threshold
    new_image = nb.Nifti1Image(data, affine=image.affine, header=image.header)
    nb.save(new_image, filename + 1)   # broken: cannot add an int to a str
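A minimal sketch of one way to avoid the overwrite, assuming each output should be saved under a distinct name derived from its input (the `_thresh` suffix is purely illustrative):

import glob
import os

import nibabel as nb

for filename in glob.glob('subjects/*.nii'):
    image = nb.load(filename)
    data = image.get_fdata()
    data[data < 0.1] = 0
    new_image = nb.Nifti1Image(data, affine=image.affine, header=image.header)
    root, ext = os.path.splitext(filename)       # 'subjects/x.nii' -> ('subjects/x', '.nii')
    nb.save(new_image, root + '_thresh' + ext)   # e.g. subjects/x_thresh.nii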
Sorry for the novice question. I'm just starting to learn Python and I don't have any coding background. I already ended up doing this process manually, but I'm curious what the automated process would look like and would like to learn from the example.
So I have a folder of 50 npz files. I need to pull a specific 29x5 array from each npz file and concatenate all of it into a single csv. This is what I did manually:
import os

import numpy as np

os.chdir('D:/Documents/WorkingDir')

data1 = np.load('file1.npz', mmap_mode='r')
array1 = data1.f.array   # the 29x5 array stored under the key 'array'
#data2 = etc.
#array2 = etc.

grandarray = np.concatenate((array1, array2), axis=0)
np.savetxt('grandarray.csv', grandarray, delimiter=",")
I gather you can use glob to get a list of all files in the same folder with the .npz extension, but I can't figure out how to turn my manual process into a script and automate it. I'll gladly take links to tutorial websites that can get me going in this direction as well. Thank you all for your time.
You need to use iteration. A loop would be fine, but I find a list comprehension to be acceptable here.
import glob
import os

import numpy as np

os.chdir('D:/Documents/WorkingDir')

filenames = glob.glob('*.npz')
data_arrays = [np.load(filename, mmap_mode='r').f.array for filename in filenames]
grandarray = np.concatenate(data_arrays, axis=0)
np.savetxt('grandarray.csv', grandarray, delimiter=",")
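One caveat: glob.glob returns names in arbitrary order, so if the row order of grandarray.csv matters, sort the list first:

filenames = sorted(glob.glob('*.npz'))   # deterministic, lexicographic order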
I have multiple .csv files, each representing a series of measurements.
I need to plot them together in order to compare the alterations between measurement runs.
I basically want to create a function with which I can read each file into a list, repeat the same data cleaning on every .csv file, and then plot them all together in one comparison graph.
This is a task I need to do to analyze some results. I intend to write it in Python/pandas, as I might need to integrate it into a bigger picture in the future, but for now this is it.
I also have one file that represents background noise, and I want to subtract those values from the other .csv files as well.
import os

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FormatStrFormatter

PATH = r'C:\Users\UserName\Documents\FSC\Folder_name'
FileNames = os.listdir(PATH)

for file in FileNames:
    df = pd.read_csv(PATH + file, index_col=0)   # PATH and file are joined with no separator
I expected this to read every file and store it in the list, but instead I got this error:
FileNotFoundError: [Errno 2] File b'C:\Users\UserName\Documents\FSC\FolderNameFileName.csv' does not exist: b'C:\Users\UserName\Documents\FSC\FolderNameFileName.csv'
Have you used pathlib from the standard library? It makes working with the file system a breeze.
I recommend reading https://realpython.com/python-pathlib/
Try:

import pandas as pd
from pathlib import Path

files = Path('/your/path/here/').glob('*.csv')   # get all csvs in your dir
for file in files:
    df = pd.read_csv(file, index_col=0)
    # your plots
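This still leaves the background-noise subtraction from the question. A minimal sketch, assuming every file shares the same index and columns; the background file name is a hypothetical placeholder:

import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path

folder = Path('/your/path/here/')
background = pd.read_csv(folder / 'background.csv', index_col=0)   # hypothetical name

for file in folder.glob('*.csv'):
    if file.name == 'background.csv':
        continue                                       # skip the noise file itself
    df = pd.read_csv(file, index_col=0) - background   # element-wise subtraction over the shared index
    df.plot(title=file.stem)

plt.show()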
I want to load multiple xlsx files with varying structures from a directory and assign each its own data frame based on the file name. I have 30+ files with differing structures, but for brevity please consider the following:
3 Excel files: [wild_animals.xlsx, farm_animals.xlsx, domestic_animals.xlsx]
I want to assign each its own data frame: if the file name contains 'wild' it is assigned to wild_df, if 'farm' then farm_df, and if 'domestic' then dom_df. This is just the first step in a process, as the actual files contain a lot of 'noise' that needs to be cleaned depending on file type, etc. The file names will also change on a weekly basis, with only a few key markers staying the same.
My assumption is that the glob module is the best way to begin, but when it comes to taking very specific parts of the file name and using them to assign the file to a specific df, I become a bit lost, so any help is appreciated.
I asked a similar question a while back, but it was part of a wider question, most of which I have now solved.
I would parse them into a dictionary of DataFrames:
import os
import glob
import pandas as pd
files = glob.glob('/path/to/*.xlsx')
dfs = {}
for f in files:
    dfs[os.path.splitext(os.path.basename(f))[0]] = pd.read_excel(f)   # key by bare file name, no extension
Then you can access them as normal dictionary elements:
dfs['wild_animals']
dfs['domestic_animals']
etc.
You need to get all the xlsx files; then, using a dict comprehension, you can access any element:
import pandas as pd
import os
import glob
path = 'Your_path'
extension = 'xlsx'
os.chdir(path)
result = glob.glob('*.{}'.format(extension))
dfs = {elm: pd.ExcelFile(elm) for elm in result}   # keep a reference so the handles can be used later
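Note that pd.ExcelFile is a lazy handle rather than a DataFrame, so a sheet still has to be parsed out of it; for example (first sheet assumed):

df_wild = dfs['wild_animals.xlsx'].parse(0)   # DataFrame of the workbook's first sheet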
For completeness, I wanted to show the solution I ended up using. It is very close to Khelili's suggestion, with a few tweaks to suit my particular code, including not creating a DataFrame at this stage:
import os
import pandas as pd
import openpyxl as excel
import glob
#setting up path
path = 'data_inputs'
extension = 'xlsx'
os.chdir(path)
files = glob.glob('*.{}'.format(extension))
#Grouping files - brings multiple files of same type together in a list
wild_groups = [s for s in files if "wild" in s]
domestic_groups = [s for s in files if "domestic" in s]
#Sets up a dictionary associated with the file groupings to be called in another module
file_names = {"WILD":wild_groups, "DOMESTIC":domestic_groups}
...
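As a hypothetical illustration (not part of the original solution), the grouping dictionary might later be consumed in another module like this:

for name in file_names["WILD"]:
    df = pd.read_excel(name)   # load each 'wild' workbook in turn for its own cleaning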
The code below generates a sum from the "Value" column of a CSV file called 'File1.csv'.
How do I apply this code to every file in a directory and place the sums in a new file called Sum.csv?
import pandas as pd
import numpy as np
df = pd.read_csv("~/File1.csv")
df["Value"].sum()
Many thanks!
There's probably a nice way to do this with a pandas Panel, but this is a basic Python implementation.
import os

import pandas as pd

# Get the home directory (not recommended, work somewhere else)
directory = os.environ["HOME"]

# Read all files in directory, filter out non-csv
files = [os.path.join(directory, f)
         for f in os.listdir(directory) if f.endswith(".csv")]

# Make list of tuples [(filename, sum)]
sums = [(filename, pd.read_csv(filename)["Value"].sum())
        for filename in files]

# Make a dataframe and write it out
df = pd.DataFrame(sums, columns=["filename", "sum"])
df.to_csv(os.path.join(directory, "files_with_sum.csv"))
Note that the built-in Python os.listdir() doesn't understand "~/" the way pandas does, so we pull the home directory out of the environment map instead. Using the home directory isn't really recommended, so this gives any adopter of this code an opportunity to set a different path.
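If portability matters (HOME is often unset on Windows), a common alternative is os.path.expanduser, which resolves '~' the same way pandas does:

import os
directory = os.path.expanduser("~")   # home directory on POSIX and Windows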