How to get path of selected files in a folder using python? - python

I am using glob to retrieve the paths of all files in a folder ("data"):
import glob
import geopandas
for file in glob.glob(r"/home/data/*.txt"):
    sf = geopandas.read_file(file)
However, I only want to retrieve the files whose names are listed in a variable. For instance, I want the paths of just the files aaa.txt, bdf.txt, and hgr.txt in the "data" folder, which are listed in the variable "imp":
imp = ['aaa.txt', 'bdf.txt', 'hgr.txt']

Something like this could do it. Just loop through the files you need.
import geopandas
imp = ['aaa.txt', 'bdf.txt', 'hgr.txt']
for file in imp:
    sf = geopandas.read_file(f'/home/data/{file}')

Like this?
import os
import geopandas
path = "/home/data/"
imp = ['aaa.txt', 'bdf.txt', 'hgr.txt']
for file in imp:
    full_path = os.path.join(path, file)
    if os.path.exists(full_path):  # skip names that are not in the folder
        sf = geopandas.read_file(full_path)
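If you would rather keep the glob call from the question, another option is to glob the folder and keep only the paths whose file name is in imp. This is a minimal sketch that assumes the /home/data folder from the question; select_files is a hypothetical helper name, and names that are not present in the folder are silently skipped, similar to the os.path.exists check above.

```python
import glob
import os


def select_files(folder, wanted):
    """Return the full paths of *.txt files in `folder` whose names appear in `wanted`.

    `select_files` is a hypothetical helper used only for illustration.
    """
    wanted = set(wanted)  # set membership checks stay fast for long lists
    return sorted(
        path
        for path in glob.glob(os.path.join(folder, "*.txt"))
        if os.path.basename(path) in wanted  # compare only the file name part
    )


if __name__ == "__main__":
    imp = ['aaa.txt', 'bdf.txt', 'hgr.txt']
    for path in select_files("/home/data", imp):
        print(path)  # each path could then be passed to geopandas.read_file(path)
```

Converting imp to a set keeps the membership test cheap even when the folder holds many files; the selected paths can then be read one by one.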

Related

how to get the file name from directory in python

I want to get the file name from my directory so that I can use it dynamically in an ML model.
import os
i = os.listdir('./upload')[0]
print(i)
# ...read data from file `i`...
The problem with your approach is that os.listdir returns file names, not file paths, so you need to join the directory name with the file name:
cv2.imread(os.path.join('upload', i))
Or you can use glob, which returns paths:
import glob
import cv2
for image_path in glob.glob('./upload/*.jpg'):
    image = cv2.imread(image_path)
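If you need every file as a full path rather than a bare name, pathlib's Path.iterdir() already returns paths that include the folder. A minimal sketch, assuming the ./upload folder from the question; list_file_paths is a hypothetical helper name, and the result is sorted because iterdir() (like os.listdir) returns entries in arbitrary order.

```python
from pathlib import Path


def list_file_paths(folder):
    """Return the paths of the regular files directly inside `folder`, sorted.

    `list_file_paths` is a hypothetical helper used only for illustration.
    """
    # Path.iterdir() yields Path objects that already include the folder,
    # so there is no need to join the directory and file name by hand.
    return sorted(p for p in Path(folder).iterdir() if p.is_file())


if __name__ == "__main__":
    upload = Path("./upload")
    if upload.is_dir():
        for path in list_file_paths(upload):
            # path.name is the bare file name; str(path) could go to cv2.imread
            print(path.name, "->", path)
```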

How to find a required file and read it in a zip file?

I have zip files, and each zip file contains three subfolders (i.e., ini, log, and output). I want to read a file from the output folder, which contains three CSV files with different names. Suppose the three files are named initial.csv, intermediate.csv, and final.csv, and I only want to read final.csv.
The code I tried for reading the file is:
import glob
import zipfile
import numpy as np
import pandas as pd
zipfiles = glob.glob('/home/data/*.zip')
for i in np.arange(len(zipfiles)):
    zip = zipfile.ZipFile(zipfiles[i])
    f = zip.open(zip.namelist().startswith('final'))
    data = pd.read_csv(f, usecols=[3,7])
The error I get is AttributeError: 'list' object has no attribute 'startswith'.
How can I find the correct file and read it?
Replace
f = zip.open(zip.namelist().startswith('final'))
With
f = zip.open('output/final.csv')
If you need to search for it among the member names instead:
filename = [name for name in zip.namelist() if name.startswith('output/final')][0]
f = zip.open(filename)
To search subdirectories on disk, you can switch to pathlib, whose rglob method matches a glob pattern recursively:
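from pathlib import Path
import zipfile
import pandas as pd
dfs = []
files = Path('/home/data/').rglob('*.zip')  # rglob recursively searches all child dirs
for file in files:
    with zipfile.ZipFile(file) as zf:
        # your stuff, e.g. locate output/final.csv as shown above
        filename = [name for name in zf.namelist() if name.startswith('output/final')][0]
        with zf.open(filename) as f:
            df = pd.read_csv(f, usecols=[3,7])
    dfs.append(df)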
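If the member's folder prefix can vary between archives, one option is to match on the base name instead of a fixed path. The following is a sketch under a few assumptions: the archives sit somewhere below one folder, the wanted file is called final.csv, and its contents are UTF-8 text. find_member and read_member_from_zips are hypothetical helper names.

```python
import zipfile
from pathlib import Path, PurePosixPath


def find_member(zf, filename):
    """Return the first member name in `zf` whose base name equals `filename`, or None."""
    # Member names inside a ZIP always use forward slashes, so PurePosixPath
    # extracts the last component (e.g. 'output/final.csv' -> 'final.csv').
    for name in zf.namelist():
        if PurePosixPath(name).name == filename:
            return name
    return None


def read_member_from_zips(folder, filename):
    """Yield (zip_path, text) for every *.zip below `folder` that contains `filename`.

    Both helpers are hypothetical; the text is assumed to be UTF-8 encoded.
    """
    for zip_path in sorted(Path(folder).rglob("*.zip")):
        with zipfile.ZipFile(zip_path) as zf:
            member = find_member(zf, filename)
            if member is None:
                continue  # this archive has no such file, so skip it
            with zf.open(member) as fh:
                yield zip_path, fh.read().decode("utf-8")
```

Each text could then be parsed with pd.read_csv(io.StringIO(text), usecols=[3, 7]); alternatively, pd.read_csv can read the file object returned by zf.open(member) directly, as in the answers above.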

Get all folder names except for one in python

I need to get all folder names except "Archives" using Path() only, because I need to use glob later in the for loop. I'm on Kali Linux, and the file structure is ./sheets/, which contains the folders Archives and Test (also with nothing inside) and the files creds.json and sheets.py.
# Imports
from pathlib import Path
import pandas as pd
import pygsheets
import glob
import os
# Setup
gc = pygsheets.authorize(service_file='./creds.json')
email = str(input("Enter email to share sheet: "))
folderName = Path("./") # <<<<<<<<<<<<< HERE IS PROBLEM
for file in folderName.glob("*.txt"):
    if not Path("./Archives").exists():
        os.mkdir("./Archives")
    df = pd.DataFrame()
    df['name'] = ['John', 'Steve', 'Sarah', 'YESSSS']
    gc.create(folderName)
    sh = gc.open(file)
    sh.share(email, role='writer', type='user')
    wks = sh[0]
    wks.set_dataframe(df,(1,1))
I expect the variable folderName to hold any folder name except Archives, as a string.
My goal is a script that, when run, uses the folder name in ./sheets/ (Test in this case) as the name of a newly created spreadsheet, uses the file names as headers, puts the contents of each file (separated by newlines) underneath its header, and then shares the sheet with my email address. I'm using pygsheets.
from pathlib import Path
p = Path('./sheets')
# create Archives under p if needed
archives = p / 'Archives'
if not archives.exists():
    archives.mkdir()
# find all directories under p whose path doesn't include 'Archives'
folder_dirs = [x for x in p.glob('**/*') if x.is_dir() and 'Archives' not in x.parts]
# find all *.txt files under p whose path doesn't include 'Archives'
txt_file_dirs = [x for x in p.glob('**/*.txt') if x.is_file() and 'Archives' not in x.parts]
for file in txt_file_dirs:
    file_name = file.stem
When using pathlib, you work with Path objects rather than strings, and the library provides many methods for working with those objects (for example .name, .stem, .parts, and .is_dir()).
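Because the question asks for the folder name as a string, one option is to take .name from each immediate subdirectory. A minimal sketch, assuming the ./sheets layout described in the question; folder_names and its exclude parameter are hypothetical.

```python
from pathlib import Path


def folder_names(base, exclude=("Archives",)):
    """Return the names (strings) of the immediate subfolders of `base`, minus `exclude`.

    `folder_names` is a hypothetical helper used only for illustration.
    """
    # iterdir() lists only direct children, unlike glob('**/*'), which recurses.
    return sorted(
        p.name for p in Path(base).iterdir() if p.is_dir() and p.name not in exclude
    )


if __name__ == "__main__":
    sheets = Path("./sheets")
    if sheets.is_dir():
        for name in folder_names(sheets):
            # name is a plain string such as 'Test'; the folder's text files
            # can still be found with (sheets / name).glob("*.txt")
            print(name)
```

The returned string could serve as the spreadsheet title, while the Path objects are kept for the later glob calls.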

How to read particular text files from multiple subdirectories in Python

I have one folder that contains 5 subfolders.
Each subfolder contains the files 'x.txt', 'y.txt', and 'z.txt', and the same names repeat in every subfolder.
I need to read and print only the 'y.txt' file from each subfolder.
My problem is that I'm unable to read and print the 'y.txt' files. How can I solve this?
Below is the code I wrote to read the y.txt file:
import os, sys
import pandas as pd
file_path = ('/Users/Naga/Desktop/Python/Data')
for root, dirs, files in os.walk(file_path):
    for name in files:
        print(os.path.join(root, name))
pd.read_csv('TextInformation.txt',delimiter=";", names = ['Name', 'Value'])
Error: File TextInformation.txt does not exist: 'TextInformation.txt'
You could also try the following approach to fetch all y.txt files from your subdirectories:
import glob
import pandas as pd
# get all y.txt files from all subdirectories
all_files = glob.glob('/Users/Naga/Desktop/Python/Data/*/y.txt')
for file in all_files:
    data_from_this_file = pd.read_csv(file, sep=";", names = ['Name', 'Value'])  # same separator as in the question
    # do something with the data
Subsequently, you can apply your code to every file in the list all_files. The great thing about glob is that you can use wildcards (*), so you don't need to know the names of the subdirectories (you can even use them within the file name, e.g. *y.txt). Also see the documentation on glob.
The issue is that you forgot to prepend the parent path to the 'y.txt' file name. I suggest the following code; I hope it helps.
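If the y.txt files can be nested more than one level deep, glob also supports a recursive ** wildcard when recursive=True is passed. A minimal sketch, assuming the Data folder from the question; find_named_files is a hypothetical helper name.

```python
import glob
import os


def find_named_files(root, filename):
    """Return sorted paths of every file called `filename` anywhere below `root`.

    `find_named_files` is a hypothetical helper used only for illustration.
    """
    # With recursive=True, '**' matches zero or more directory levels,
    # so deeply nested folders are found too (unlike '*/y.txt').
    pattern = os.path.join(root, "**", filename)
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    for path in find_named_files("/Users/Naga/Desktop/Python/Data", "y.txt"):
        print(path)
```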
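import os
pth = '/Users/Naga/Desktop/Python/Data'
list_sub = os.listdir(pth)
filename = 'TextInformation.txt'
for sub in list_sub:
    full_path = os.path.join(pth, sub, filename)  # <pth>/<subfolder>/<filename>
    if os.path.isfile(full_path):  # skip entries such as stray files that are not subfolders
        with open(full_path, 'r') as fh:
            TextInfo = fh.read()
        print(TextInfo)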
Here is a short piece of code that works for this case; you can adapt it however you like.
import os
path_to_main_folder = '/Users/Naga/Desktop/Python/Data'  # folder that contains the subfolders
for dirPath, foldersInDir, fileName in os.walk(path_to_main_folder):
    for file in fileName:
        if file.endswith('y.txt'):
            loc = os.path.join(dirPath, file)
            with open(loc) as y_txt:
                y = y_txt.read()
            print(y)
Keep in mind that path_to_main_folder must be the path of the folder that contains the subfolders.

Loop over multiple folders from list with glob.glob

How can I loop over a defined list of folders and all of the individual files inside each of those folders?
I'm trying to copy all the month files in each year folder, but when I run it, nothing happens.
import shutil
import glob
P4_destdir = ('Z:/Source P4')
yearlist = ['2014','2015','2016']
for year in yearlist:
    for file in glob.glob(r'{0}/*.csv'.format(yearlist)):
        print (file)
        shutil.copy2(file,P4_destdir)
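I think the problem might be that you need a trailing / in your destination path. Also note that the pattern must be formatted with year, not yearlist: '{0}/*.csv'.format(yearlist) produces "['2014', '2015', '2016']/*.csv", in which glob treats the brackets as a set of single characters, so it never matches the year folders and nothing is copied:
import shutil
import glob
P4_destdir = ('Z:/Source P4/')
yearlist = ['2014','2015','2016'] # assuming these folders are in the same directory as your code.
for year in yearlist:
    for file in glob.glob(r'{0}/*.csv'.format(year)):  # use the current year, not the whole list
        print (file)
        shutil.copy2(file,P4_destdir)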
Another thing that might be a problem is if the destination folder does not exist yet. You can create it using os.mkdir:
import os
dest = os.path.isdir('Z:/Source P4/') # Tests if the directory exists
if not dest:
    os.mkdir('Z:/Source P4/')
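Putting both fixes together, a sketch of the copy loop might look like the following. It assumes the year folders live under one source folder and uses os.makedirs(..., exist_ok=True) instead of the isdir/mkdir pair; copy_year_csvs is a hypothetical helper name.

```python
import glob
import os
import shutil


def copy_year_csvs(source_root, years, dest_dir):
    """Copy every *.csv in <source_root>/<year>/ into dest_dir; return the destination paths.

    `copy_year_csvs` is a hypothetical helper used only for illustration.
    """
    os.makedirs(dest_dir, exist_ok=True)  # create the destination folder if it is missing
    copied = []
    for year in years:
        # Format the pattern with the loop variable `year`, not the whole list.
        pattern = os.path.join(source_root, year, "*.csv")
        for src in sorted(glob.glob(pattern)):
            # copy2 preserves file metadata and returns the new file's path
            copied.append(shutil.copy2(src, dest_dir))
    return copied
```

For example, copy_year_csvs('.', ['2014', '2015', '2016'], 'Z:/Source P4') would copy from year folders next to the script. Files with the same name in different year folders overwrite each other in the destination, so you may want to include the year in the copied file names.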
