I need to get all folder names EXCEPT for "Archives" using Path() only, as I need to use glob later in the for loop. I'm on Kali Linux, and the file structure is ./sheets/ containing the folders Archives and Test (also nothing is inside them), alongside the files creds.json and sheets.py.
# Imports
from pathlib import Path
import pandas as pd
import pygsheets
import glob
import os

# Setup
gc = pygsheets.authorize(service_file='./creds.json')
email = str(input("Enter email to share sheet: "))

folderName = Path("./")  # <<<<<<<<<<<<< HERE IS PROBLEM
for file in folderName.glob("*.txt"):
    if not Path("./Archives").exists():
        os.mkdir("./Archives")
    df = pd.DataFrame()
    df['name'] = ['John', 'Steve', 'Sarah', 'YESSSS']
    gc.create(folderName)
    sh = gc.open(file)
    sh.share(email, role='writer', type='user')
    wks = sh[0]
    wks.set_dataframe(df, (1, 1))
I expect the variable folderName to end up as any folder name except Archives, as a string.
My goal is a script that, when run, takes the folder name in ./sheets/ (Test in this case) as the newly created spreadsheet's name, uses the file names as headers, and puts the contents of each file (separated by newlines) underneath its header (the file name), then shares the sheet with me at my email. I am using pygsheets, by the way.
from pathlib import Path

p = Path('./sheets')

# create Archives under p if needed
archives = p / 'Archives'
if not archives.exists():
    archives.mkdir()

# find all directories under p whose path doesn't include 'Archives'
folder_dirs = [x for x in p.glob('**/*') if x.is_dir() and 'Archives' not in x.parts]

# find all *.txt files under p whose path doesn't include 'Archives'
txt_file_dirs = [x for x in p.glob('**/*.txt') if x.is_file() and 'Archives' not in x.parts]

for file in txt_file_dirs:
    file_name = file.stem
When using pathlib you will be working with objects, not strings, and the library provides many methods for working on those objects.
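For example, a minimal sketch of pulling plain strings back out of those objects (the folder layout is assumed from your question):

from pathlib import Path

p = Path('./sheets')
for d in p.glob('*'):
    if d.is_dir() and d.name != 'Archives':
        print(d.name)   # just the folder name as a string, e.g. 'Test'
        print(str(d))   # the whole path as a string, e.g. 'sheets/Test'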
I have a folder called Contracts, and in that folder I have folders for several companies. In the company folders I have several contracts that we have with those companies. I am trying to get a data frame that has two columns, Folder_Name and Contract.
I tried to follow this question, Python list directory, subdirectory, and files, which got me close, I think, but I could not get a column with the name of the folder that each contract came from.
I thought this would work:
import pathlib, sys, os
import pandas as pd

cwd = os.getcwd()
lst1 = []
lst2 = []
for path, subdir, file in os.walk(cwd):
    for i in subdir:
        for name in file:
            lst1.append(i)
            lst2.append(name)
df = pd.DataFrame(zip(lst1, lst2), columns=['Folder_Name', 'Contract'])
but it only gave me the folder names in one column, and the file names were from the Contracts folder itself instead of from the company folders:
Folder_Name Contract
0 .ipynb_checkpoints Untitled.ipynb
1 AWS Untitled.ipynb
I ran this code:
import pathlib, sys, os
import pandas as pd

cwd = os.getcwd()
lst1 = []
lst2 = []
for path, subdir, file in os.walk(os.path.join(cwd, 'Contracts')):
    print(path, subdir, file)
    for i in subdir:
        for name in file:
            print(i, name)
I ran it in an example folder, and I found your problem.
Here is the console response.
As you can see, when subdir is full, file is empty, and when file is full, subdir is empty.
At each step, os.walk lists in subdir the folders directly under the current path, and in file the files directly under it. With your layout, at any given level there is either one or the other, never both at the same time. That is why one of your nested loops always iterates over an empty list and never prints anything.
I tried to write something that works in the situation you described. It is a little bit longer, but you can try this:
import pathlib, sys, os
import pandas as pd

cwd = os.getcwd()
contracts_path = os.path.join(cwd, 'Contracts')
lst1 = []
lst2 = []
for path, subdir, file in os.walk(contracts_path):
    for company in subdir:
        # walk each company folder separately, so we know which company each file belongs to
        for company_path, company_subdir, company_file in os.walk(os.path.join(contracts_path, company)):
            for name in company_file:
                lst1.append(company)
                lst2.append(name)
df = pd.DataFrame(zip(lst1, lst2), columns=['Folder_Name', 'Contract'])
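For the two-level layout you described, a shorter sketch using pathlib would also work (it assumes the structure is exactly Contracts/<company>/<contract file>):

from pathlib import Path
import pandas as pd

contracts_path = Path.cwd() / 'Contracts'
# each file's parent directory name is the company folder it lives in
rows = [(f.parent.name, f.name) for f in contracts_path.glob('*/*') if f.is_file()]
df = pd.DataFrame(rows, columns=['Folder_Name', 'Contract'])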
I'm trying to save converted Excel files from different paths to the same folder.
How can I pass the path to the function correctly?
What is happening now is that the original path gets attached to the save path I have given to the function.
So my solution was:
import pandas as pd
import glob
import csv, json
import openpyxl
from pathlib import Path
import os, os.path
import errno

destination_path = "C:\\csv_files"
all_paths = [r"C:\PLM\PML.xlsx", r"C:\TMR\TMR.xlsx", r"C:\PLM\PLM.xlsx"]

# Create variable to store the tuple list
all_items = []

# Create tuple list with file path and file name without extension
def getFileName():
    for paths in all_paths:
        all_items.append((paths, paths.split("\\")[-1].split(".")[0]))

# Convert the files by iterating through the tuple list, saving into the destination folder
def convertFiles():
    for item in all_items:
        read_file = pd.read_excel(item[0], 'Relatório - DADOS', index_col=None, engine='openpyxl')
        read_file.to_csv(destination_path + "\\" + item[1] + ".csv", encoding='utf-8', index=False)
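The snippet above only defines the two helpers; to actually run the conversion, you would call them in order:

getFileName()
convertFiles()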
You can ensure the destination folder exists by adding this line before the conversion loop:
Path(destination_path).mkdir(exist_ok=True)
See the pathlib documentation.
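If intermediate folders might be missing as well, mkdir can create them too. A small sketch, reusing the destination_path from the solution above:

from pathlib import Path

destination_path = "C:\\csv_files"
# create the folder and any missing parents; do nothing if it already exists
Path(destination_path).mkdir(parents=True, exist_ok=True)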
I searched a few related discussions, such as
Read most recent excel file from folder PYTHON; however, it does not quite fit my requirement.
Suppose I have a folder with the following .xlsx files
I want to read the files with name "T2xxMhz", i.e., the last 7 files.
I have the following code:
import os
import pandas as pd

folder = r'C:\Users\work'   # <--- find the folder
files = os.listdir(folder)  # <--- find files in the folder 'work'
dfs = {}
for i, file in enumerate(files):
    if file.endswith('.xlsx'):
        # <--- read the specific sheet with the name 'Z=143'
        dfs[i] = pd.read_excel(os.path.join(folder, file), sheet_name='Z=143',
                               header=None, skiprows=[0], usecols="B:M")
        num = i + 1  # <--- number of files
However, with this code I cannot differentiate between the two types of file names, 'PYTEST' and 'T2XXX'.
How can I deal with this problem? Any suggestions and hints, please!
Use the glob package. It lets you filter file names with shell-style wildcard patterns.
import glob

folder = 'path/to/files/'
# match only the files whose names fit the 'T...Mhz' pattern
flist = glob.glob(folder + 'T*Mhz*')
print(flist)
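A sketch of plugging that filter into your read loop (the folder and sheet name are taken from your question; the exact pattern 'T2*Mhz.xlsx' is an assumption you may need to adjust):

import os
import glob
import pandas as pd

folder = r'C:\Users\work'
dfs = {}
# glob returns only the names matching the pattern, so the 'PYTEST' files are skipped
for i, path in enumerate(glob.glob(os.path.join(folder, 'T2*Mhz.xlsx'))):
    dfs[i] = pd.read_excel(path, sheet_name='Z=143', header=None,
                           skiprows=[0], usecols="B:M")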
I have the following example dataset output from pandas.
What I would like to do, efficiently, is use glob to search for each filename only in its associated main folder and sub-folder, rather than looping through all the main folders and sub-folders as my current code does. I need to match each file against a main folder and sub-folder, and if it matches, copy the file. The code below works, but it is very inefficient because it has to go through every folder and sub-folder for each search. At the moment, main_folder and searchdate are lists, and filenames_i_want is the list that I match against. Is there any way to make it go straight to the folder/sub-folder, e.g. if I provided this as CSV input?
import itertools
import glob
import shutil
from pathlib import Path

filenames_i_want = Search_param
main_folder = locosearch
searchfolder = Search_date
TargetFolder = r'C:\ELK\LOGS\XX\DEST'

for directory, folder in itertools.product(main_folder, searchfolder):
    files = glob.glob('Z:/{}/{}/asts_data_logger/*.bz2'.format(directory, folder))
    for f in files:
        current_path = Path(f)
        if current_path.name in filenames_i_want:
            print(f"found target file: {f}")
            shutil.copy2(f, TargetFolder)
I created a column whose fields combine into an absolute path, then used itertuples to go through each row and shutil to copy:
TargetFolder = r'C:\ELK\LOGS\ATH\DEST'
for row in df.itertuples():
    search = row.Search
    try:
        shutil.copy2(search, TargetFolder)
    except Exception as e:
        print(e)
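For reference, a sketch of how such a Search column can be built from per-row fields (the column names Folder, Subfolder, and Filename are assumptions; adjust them to your frame):

# combine the per-row fields into one absolute path per file
df['Search'] = ('Z:/' + df['Folder'].astype(str) + '/' + df['Subfolder'].astype(str)
                + '/asts_data_logger/' + df['Filename'].astype(str))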
I have one folder, and within it are 5 sub-folders.
Each sub-folder contains some 'x.txt', 'y.txt', and 'z.txt' files, and this repeats in every sub-folder.
Now I need to read and print only the 'y.txt' file from all sub-folders.
My problem is that I'm unable to read and print the 'y.txt' files. Can you tell me how to solve this problem?
Below is the code I have written for reading the y.txt files:
import os, sys
import pandas as pd

file_path = '/Users/Naga/Desktop/Python/Data'
for root, dirs, files in os.walk(file_path):
    for name in files:
        print(os.path.join(root, name))
        pd.read_csv('TextInformation.txt', delimiter=";", names=['Name', 'Value'])
Error: File TextInformation.txt does not exist: 'TextInformation.txt'
You could also try the following approach to fetch all y.txt files from your subdirectories:
import glob
import pandas as pd

# get all y.txt files from all subdirectories
all_files = glob.glob('/Users/Naga/Desktop/Python/Data/*/y.txt')
for file in all_files:
    data_from_this_file = pd.read_csv(file, sep=" ", names=['Name', 'Value'])
    # do something with the data
Subsequently, you can apply your code to all the files in the list all_files. The great thing about glob is that you can use wildcards (*). Using them, you don't need the names of the subdirectories (you can even use one within the filename, e.g. *y.txt). Also see the documentation on glob.
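If you prefer staying with pathlib, an equivalent sketch with the same wildcard (same data path assumed):

from pathlib import Path

# Path.glob accepts the same '*' wildcards as the glob module
all_files = list(Path('/Users/Naga/Desktop/Python/Data').glob('*/y.txt'))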
Your issue is that you forgot to add the parent path to the 'y.txt' file. I suggest this code for you; hope it helps.
import os

pth = '/Users/Naga/Desktop/Python/Data'
list_sub = os.listdir(pth)
filename = 'TextInformation.txt'
for sub in list_sub:
    TextInfo = open('{}/{}/{}'.format(pth, sub, filename), 'r').read()
    print(TextInfo)
I got you a little code; you can personalize it any way you like, but it works for this case.
import os

for dirPath, foldersInDir, fileName in os.walk(path_to_main_folder):
    if fileName:  # 'is not []' is always True; a plain truthiness check catches the empty list
        for file in fileName:
            if file.endswith('y.txt'):
                loc = os.sep.join([dirPath, file])
                y_txt = open(loc)
                y = y_txt.read()
                print(y)
But keep in mind that path_to_main_folder is the path that contains the subfolders.