I'm trying to save converted Excel files from different paths to the same folder.
How can I pass the destination path to the function correctly?
Right now the original path is being attached to the save path I give to the function.
So my solution was:
import pandas as pd
from pathlib import Path

destination_path = "C:\\csv_files"
all_paths = [r"C:\PLM\PML.xlsx", r"C:\TMR\TMR.xlsx", r"C:\PLM\PLM.xlsx"]
# Create a list to store (file path, file name) tuples
all_items = []

# Build the tuple list with each file path and its file name without the extension
def getFileName():
    for path in all_paths:
        all_items.append((path, path.split("\\")[-1].split(".")[0]))

# Convert the collected files by iterating through the tuple list, writing to the destination folder
def convertFiles():
    for item in all_items:
        read_file = pd.read_excel(item[0], 'Relatório - DADOS', index_col=None, engine='openpyxl')
        read_file.to_csv(destination_path + "\\" + item[1] + ".csv", encoding='utf-8', index=False)

getFileName()
convertFiles()
You can ensure the destination folder exists by adding this line before calling convertFiles():
Path(destination_path).mkdir(exist_ok=True)
See the Path.mkdir() documentation.
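If the destination might be missing or nested, a slightly more defensive variant (a sketch, assuming the same destination_path as above) also creates intermediate directories:
from pathlib import Path

destination_path = "C:\\csv_files"

# parents=True creates missing intermediate directories;
# exist_ok=True suppresses the FileExistsError if the folder already exists
Path(destination_path).mkdir(parents=True, exist_ok=True)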
I have a script, below, that can download the files listed in a single CSV file. It works well, and all files are downloaded into my 'Python Project' folder, at the root.
But I would like to add two things. First, download from not just one but multiple (20 or more) CSV files, so I don't have to change the name manually in open('name1.csv') every time the script has done its job. Second, each download needs to be placed in a folder with the same name as the CSV file the download came from. Hopefully I'm clear enough :)
Then I could have:
name1.csv -> name1 folder -> download from name1 csv
name2.csv -> name2 folder -> download from name2 csv
name3.csv -> name3 folder -> download from name3 csv
...
Any help or suggestions will be more than appreciated :) Many thanks!
from collections import Counter
import urllib.request
import csv
import os
with open('name1.csv') as csvfile:  # need to add multiple .csv files here.
    reader = csv.DictReader(csvfile)
    title_counts = Counter()
    for row in reader:
        name, ext = os.path.splitext(row['link'])
        title = row['title']
        title_counts[title] += 1
        title_filename = f"{title}_{title_counts[title]}{ext}".replace('/', '-')  # need to create a folder for each CSV file with the download inside.
        urllib.request.urlretrieve(row['link'], title_filename)
You need to add an outer loop that iterates over the files in a specific folder. You can use either os.listdir(), which returns a list of all entries, or glob.iglob() with a *.csv pattern to get only files with the .csv extension.
There are also some minor improvements you can make. You're using Counter in a way that can be replaced with a defaultdict or even a plain dict. And urllib.request.urlretrieve() is part of the legacy interface, which might get deprecated, so you can replace it with a combination of urllib.request.urlopen() and shutil.copyfileobj().
Finally, to create a folder you can use os.mkdir(), but you first need to check whether the folder already exists using os.path.isdir(); this prevents a FileExistsError.
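For comparison, a minimal sketch of the os.listdir() alternative mentioned above (the folder name is a placeholder); note that os.makedirs() with exist_ok=True can replace the isdir() check entirely:
import os

csv_folder = "/some/path"  # placeholder

for name in os.listdir(csv_folder):
    if name.endswith(".csv"):  # keep only CSV files
        file_path = os.path.join(csv_folder, name)
        save_folder = os.path.splitext(file_path)[0]  # folder named after the CSV
        os.makedirs(save_folder, exist_ok=True)  # no isdir() check needed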
Full code:
from os import mkdir
from os.path import join, splitext, isdir
from glob import iglob
from csv import DictReader
from collections import defaultdict
from urllib.request import urlopen
from shutil import copyfileobj
csv_folder = r"/some/path"
glob_pattern = "*.csv"
for file in iglob(join(csv_folder, glob_pattern)):
    with open(file) as csv_file:
        reader = DictReader(csv_file)
        save_folder, _ = splitext(file)
        if not isdir(save_folder):
            mkdir(save_folder)
        title_counter = defaultdict(int)
        for row in reader:
            url = row["link"]
            title = row["title"]
            title_counter[title] += 1
            _, ext = splitext(url)
            save_filename = join(save_folder, f"{title}_{title_counter[title]}{ext}")
            with urlopen(url) as req, open(save_filename, "wb") as save_file:
                copyfileobj(req, save_file)
For 1: Just loop over a list containing the names of your desired files.
The list can be retrieved using os.listdir(path), which returns a list of the entries inside path (the folder containing the CSV files in your case).
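A minimal sketch of that idea (the folder name is a placeholder):
import os

path = "csv_files"  # placeholder folder containing the CSV files

for filename in os.listdir(path):
    if filename.endswith(".csv"):  # skip anything that is not a CSV
        with open(os.path.join(path, filename)) as csvfile:
            ...  # the existing per-file download logic goes here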
I have zip files, and each zip file contains three subfolders (ini, log, and output). I want to read a file from the output folder, which contains three CSV files with different names, say initial.csv, intermediate.csv, and final.csv, and I only want to read the final.csv file.
The code I tried to read the file with is:
import glob
import zipfile
import numpy as np
import pandas as pd

zipfiles = glob.glob('/home/data/*.zip')
for i in np.arange(len(zipfiles)):
    zip = zipfile.ZipFile(zipfiles[i])
    f = zip.open(zip.namelist().startswith('final'))
    data = pd.read_csv(f, usecols=[3,7])
The error I got is 'list' object has no attribute 'startswith'.
How can I find the correct file and read it?
Replace
f = zip.open(zip.namelist().startswith('final'))
with
f = zip.open('output/final.csv')
Or, if you need to search for it:
filename = [name for name in zip.namelist() if name.startswith('output/final')][0]
f = zip.open(filename)
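As a more defensive variant, next() with a generator expression avoids an IndexError when no entry matches, returning a default instead:
# returns None instead of raising if no matching entry exists
filename = next((name for name in zip.namelist()
                 if name.startswith('output/final')), None)
if filename is not None:
    f = zip.open(filename)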
To scan sub-directories as well, let's switch to pathlib, which supports globbing:
from pathlib import Path
import zipfile
import pandas as pd

dfs = []
files = Path('/home/data/').rglob('*.zip')  # rglob recursively trawls all child dirs.
for file in files:
    zip = zipfile.ZipFile(file)
    ....
    # your stuff
    df = pd.read_csv(f, usecols=[3,7])
    dfs.append(df)
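Putting the pieces together, a minimal end-to-end sketch (assuming each archive stores the target file under output/final.csv, as in the question):
from pathlib import Path
import zipfile
import pandas as pd

dfs = []
for file in Path('/home/data/').rglob('*.zip'):
    with zipfile.ZipFile(file) as zf:
        # find the first entry whose name starts with 'output/final'
        name = next((n for n in zf.namelist()
                     if n.startswith('output/final')), None)
        if name is None:
            continue  # this archive has no final.csv
        with zf.open(name) as f:
            dfs.append(pd.read_csv(f, usecols=[3, 7]))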
I selected the data I need like this:
from pathlib import Path
import glob, os

folder = Path('D:/xyz/123/Files')
os.chdir(folder)
for file in glob.glob("*.json"):
    JsonFiles = os.path.join(folder, file)
    print(JsonFiles)
As output I get all the .json files I need:
D:/xyz/123/Files/Data.json
D:/xyz/123/Files/Stuff.json
D:/xyz/123/Files/Random.json
D:/xyz/123/Files/Banana.json
D:/xyz/123/Files/Apple.json
For my further coding I need a variable for the different JSON paths. So my idea was, instead of printing them, to store them in a list. But that's not working:
ListJson = [JsonFiles]
print(ListJson[1])
I get this error:
print(ListJson[1])
IndexError: list index out of range
How would you solve this problem? I just need a way to work with the paths I have already collected.
Solution 1 with append():
If you change
for file in glob.glob("*.json"):
    JsonFiles = os.path.join(folder, file)
    print(JsonFiles)
to
ListJson = []
for file in glob.glob("*.json"):
    JsonFile = os.path.join(folder, file)
    ListJson.append(JsonFile)
you create an empty list and, during each iteration, append one file (the result of os.path.join) to it.
Then you have what you want.
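With that change, indexing works (a quick check, assuming at least two JSON files were found):
print(ListJson[0])  # first collected path
print(ListJson[1])  # second collected path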
Solution 2 with list comprehensions:
If you want to use list comprehensions ( https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions ), the following would do:
ListJson = [os.path.join(folder, file) for file in glob.glob("*.json")]
Solution 3: with pathlib, absolute() and glob():
On the other hand, you don't even have to chdir() into the given directory, and if Path objects are good enough in your context you can directly do:
ListJson = list(folder.absolute().glob('*.json'))
or if you really need strings:
ListJson = [str(path) for path in folder.absolute().glob('*.json')]
You can use the Python list append() method:
from pathlib import Path
import glob, os

folder = Path('D:/xyz/123/Files')
ListJson = []
os.chdir(folder)
for file in glob.glob("*.json"):
    JsonFiles = os.path.join(folder, file)
    ListJson.append(JsonFiles)
print(ListJson)
Hi, I'm working on a simple script that copies files from one directory to another based on a dataframe containing a list of invoices.
Is there any way to do this as a partial match? I want all the files that contain "F11000", "G13000", and so on, continuing the loop until there is no more data in the dataframe.
I tried to figure it out by myself, and I'm pretty sure changing the x in the copy function will do the trick, but I can't see it.
import pandas as pd
import os
import shutil

data = {'Invoice': ['F11000', 'G13000', 'H14000']}
df = pd.DataFrame(data, columns=['Invoice'])

path = 'D:/Pyfilesearch'
dest = 'D:/Dest'

def find(name, path):
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)

def copy():
    for x in df['Invoice']:
        shutil.copy(find(x, path), dest)

copy()
Using pathlib
pathlib is part of the standard library.
It treats paths as objects with methods instead of strings.
See Python 3's pathlib Module: Taming the File System.
This script assumes dest is an existing directory.
.rglob searches subdirectories for files.
from pathlib import Path
import pandas as pd
import shutil

# dataframe from the question
df = pd.DataFrame({'Invoice': ['F11000', 'G13000', 'H14000']})

# convert paths to pathlib objects
path = Path('D:/Pyfilesearch')
dest = Path('D:/Dest')

# find files and copy
for v in df.Invoice.unique():  # iterate through unique column values
    files = list(path.rglob(f'*{v}*'))  # create a list of files for a value
    files = [f for f in files if f.is_file()]  # if not matching on a file extension, verify the item is a file
    for f in files:  # iterate through and copy files
        print(f)
        shutil.copy(f, dest)
Copy to subdirectories for each value:
path = Path('D:/Pyfilesearch')
for v in df.Invoice.unique():
    dest = Path('D:/Dest')
    files = list(path.rglob(f'*{v}*'))
    files = [f for f in files if f.is_file()]
    dest = dest / v  # create the path with the value
    if not dest.exists():  # check if the directory exists
        dest.mkdir(parents=True)  # if not, create it
    for f in files:
        shutil.copy(f, dest)
I need to get all folder names EXCEPT for "Archives", using Path() ONLY, as I need to use glob later in the for loop. I'm on Kali Linux, and the file structure is ./sheets/ containing the folders Archives and Test (both empty) plus the files creds.json and sheets.py.
# Imports
from pathlib import Path
import pandas as pd
import pygsheets
import glob
import os

# Setup
gc = pygsheets.authorize(service_file='./creds.json')
email = str(input("Enter email to share sheet: "))
folderName = Path("./")  # <<<<<<<<<<<<< HERE IS PROBLEM
for file in folderName.glob("*.txt"):
    if not Path("./Archives").exists():
        os.mkdir("./Archives")
    df = pd.DataFrame()
    df['name'] = ['John', 'Steve', 'Sarah', 'YESSSS']
    gc.create(folderName)
    sh = gc.open(file)
    sh.share(email, role='writer', type='user')
    wks = sh[0]
    wks.set_dataframe(df, (1, 1))
I expect the variable folderName to hold any folder name except Archives, as a string.
My goal is a script that, when run, takes the folder name in ./sheets/ (Test in this case) as the newly created spreadsheet's name, uses the file names as headers, and puts the contents of the files (separated by newlines) underneath each header, then shares the sheet with me at my email. I'm using pygsheets, by the way.
from pathlib import Path

p = Path('./sheets')

# create Archives under p if needed
archives = p / 'Archives'
if not archives.exists():
    archives.mkdir()

# find all directories under p that don't include 'Archives'
folder_dirs = [x for x in p.glob('**/*') if x.is_dir() and 'Archives' not in x.parts]

# find all *.txt files under p that don't include 'Archives'
txt_file_dirs = [x for x in p.glob('**/*.txt') if x.is_file() and 'Archives' not in x.parts]

for file in txt_file_dirs:
    file_name = file.stem
When using pathlib you work with objects rather than strings, and the library provides many methods for operating on those objects.
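For illustration, a brief sketch of some of those Path-object methods (the path is hypothetical):
from pathlib import Path

f = Path('./sheets/Test/names.txt')  # hypothetical file

print(f.stem)    # 'names'       - file name without the extension
print(f.suffix)  # '.txt'        - the extension
print(f.name)    # 'names.txt'   - final path component
print(f.parent)  # 'sheets/Test' - containing directory
print(f.parts)   # ('sheets', 'Test', 'names.txt')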