I'm trying to read some .xlsx files from a directory that was created earlier using the current timestamp, which is where the files are stored. Now I want to read those .xlsx files and combine them into a single .xlsx file with multiple sheets. I tried several approaches and none of them worked.
The final file should be Usage-SvnAnalysis.xlsx.
The script I tried:
import pandas as pd
import numpy as np
from timestampdirectory import createdir
import os
dest = createdir()
dfSvnUsers = pd.read_csv(dest, "SvnUsers.xlsx")
dfSvnGroupMembership = pd.read_csv(dest, "SvnGroupMembership.xlsx")
xlwriter = pd.ExcelWriter("Usage-SvnAnalysis.xlsx")
dfSvnUsers.to_excel(xlwriter, sheet_name='SvnUsers', index = False )
dfSvnGroupMembership.to_excel(xlwriter, sheet_name='SvnGroupMembership', index = False )
xlwriter.close()
The folder that is created automatically with the current timestamp and contains the files.
This is one of the files that I want to add as a sheet in the final .xlsx.
This is how I create the directory with the current time and return dest to export the files into.
I changed the script a bit. This is how it looks now, but I still get an error:
File "D:\Py_location_projects\testfi\Usage-SvnAnalysis.py", line 8, in <module>
    with open(file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'SvnGroupMembership.xlsx'
The files exist, but the script cannot resolve the path to that directory, because I create the directory in another script using a timestamp and return its path as dest.
dest = createdir() is the path where the files are. All I need to do is access dest, read the files from there, and export them into a single .xlsx file as its sheets, in this case sheet1 and sheet2, because I tried to read only 2 files from that directory.
import pandas as pd
import numpy as np
from timestampdirectory import createdir
import os
dest = createdir()
files = os.listdir(dest)
for file in files:
    with open(file, 'r') as f:
        dfSvnUsers = open(os.path.join(dest, 'SvnUsers.xlsx'))
        dfSvnGroupMembership = open(os.path.join(dest, 'SvnGroupMembership.xlsx'))
xlwriter = pd.ExcelWriter("Usage-SvnAnalysis.xlsx")
dfSvnUsers.to_excel(xlwriter, sheet_name='SvnUsers', index = False )
dfSvnGroupMembership.to_excel(xlwriter, sheet_name='SvnGroupMembership', index = False )
xlwriter.close()
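I think you should read the Excel files with pd.read_excel instead of pd.read_csv. Note that pd.read_csv(dest, "SvnUsers.xlsx") also passes the file name as a separate argument instead of as part of the path, so join the directory and the file name with os.path.join: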
import os
dfSvnUsers = pd.read_excel(os.path.join(dest, "SvnUsers.xlsx"))
dfSvnGroupMembership = pd.read_excel(os.path.join(dest, "SvnGroupMembership.xlsx"))
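Putting the pieces together, a minimal sketch of the whole merge might look like the following. It assumes the openpyxl engine is installed, and combine_workbooks is a hypothetical helper name, not something from the question. Each sheet is named after its source file, and only the first sheet of each input workbook is copied.

```python
import os

import pandas as pd


def combine_workbooks(src_dir, out_path):
    """Copy the first sheet of every .xlsx file in src_dir into one workbook."""
    names = sorted(n for n in os.listdir(src_dir) if n.endswith(".xlsx"))
    if not names:
        raise FileNotFoundError(f"no .xlsx files found in {src_dir}")
    sheets = []
    # The context manager saves and closes the output file on exit.
    with pd.ExcelWriter(out_path) as writer:
        for name in names:
            df = pd.read_excel(os.path.join(src_dir, name))
            # Sheet name = file name without extension; Excel limits it to 31 characters.
            sheet = os.path.splitext(name)[0][:31]
            df.to_excel(writer, sheet_name=sheet, index=False)
            sheets.append(sheet)
    return sheets
```

With the question's names this would be called as combine_workbooks(dest, "Usage-SvnAnalysis.xlsx"). Writing the output outside dest keeps the combined file from being picked up as an input on the next run.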
Related
I have 720 .nc files in one folder. I am trying to open each file and write all of its data to an Excel sheet. The script works perfectly for a single file. Here is my code for a single file:
import xarray as xr
file_name = 'dcbl.slice.11748.nc'
# Loading NetCDF dataset using xarray
data = xr.open_dataset('/Users/ismot/Downloads/LES_Data/u1.shf400.lhf040.512/' + file_name)
# convert the columns to dataframe using xarray
df = data[['x', 'y', 'time', 'C_sum_column_AVIRIS', 'C_sum_column_HyTES']].to_dataframe()
# write the dataframe to an excel file
df.to_excel(file_name + '.xlsx')
Now I am trying to run the script for all the files in the directory. I have modified the script like this:
# import required module
import os
import xarray as xr
# assign directory
directory = '/Users/ismot/Downloads/LES_Data/u1.shf400.lhf040.512'
# list all files in the directory
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        print(f)
# write a function to open .NC files using xarray and convert them in excel sheet
def file_changer(filename):
    data = xr.open_dataset(str(filename))
    df = data[['x', 'y', 'time', 'C_sum_column_AVIRIS', 'C_sum_column_HyTES']].to_dataframe()
    df.to_excel(filename + '.xlsx')
# Run for multiple files
import glob
for file in glob.glob('*.nc'):
    file_changer(file)
The script runs and gives no error, but it only prints the names of the files in the directory. It doesn't go over the 720 files and save them as Excel sheets. How can I fix it?
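One likely cause, offered as a guess since the working directory is not shown: glob.glob('*.nc') is resolved against the current working directory, not against directory, so the last loop may simply match nothing. The sketch below builds the pattern from the directory path instead. The helper names find_nc_files and convert_all are hypothetical, and xarray is imported inside file_changer only so that the file listing can be tried without it.

```python
import glob
import os


def find_nc_files(directory):
    """Return the .nc files inside directory, as full paths."""
    return sorted(glob.glob(os.path.join(directory, "*.nc")))


def file_changer(path):
    """Convert one NetCDF file to an Excel file written next to it."""
    import xarray as xr  # local import: only needed when converting

    with xr.open_dataset(path) as data:
        df = data[["x", "y", "time", "C_sum_column_AVIRIS", "C_sum_column_HyTES"]].to_dataframe()
    df.to_excel(path + ".xlsx")


def convert_all(directory):
    """Convert every .nc file in directory and return how many were handled."""
    paths = find_nc_files(directory)
    for path in paths:
        file_changer(path)
    return len(paths)
```

If convert_all(directory) returns 0, the pattern matched no files, which would explain a run that finishes without errors and without output.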
I searched a few related discussions, such as "Read most recent excel file from folder PYTHON"; however, they do not fit my requirement well.
Suppose I have a folder with the following .xlsx files.
I want to read the files named "T2xxMhz", i.e., the last 7 files.
I have the following code:
import os
import pandas as pd
folder = r'C:\Users\work' # <--- find the folder
files = os.listdir(folder) # <--- find files in the folder 'work'
dfs ={}
for i, file in enumerate(files):
    if file.endswith('.xlsx'):
        dfs[i] = pd.read_excel(os.path.join(folder,file), sheet_name='Z=143', header = None, skiprows=[0], usecols = "B:M") # <--- read specific sheet with the name 'Z=143'
        num = i + 1 # <--- number of files.
However, with this code I cannot distinguish between the two types of file name, 'PYTEST' and 'T2XXX'.
How can I deal with this problem? Any suggestions and hints are welcome!
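Use the glob package. It matches file names against shell-style wildcard patterns (*, ?, [abc]), which are not full regular expressions but are enough here: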
import glob
dir = 'path/to/files/'
flist = glob.glob(dir + 'T*Mhz*')
print(flist)
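If you would rather keep the os.listdir loop from the question, the same wildcard matching is available through the standard fnmatch module. This is a small sketch; the file names in the list are hypothetical stand-ins for the real folder listing, and select_files is an illustrative helper name.

```python
import fnmatch


def select_files(names, pattern="T2*Mhz*.xlsx"):
    """Keep only the file names that match a shell-style wildcard pattern."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


# Hypothetical folder listing, standing in for os.listdir(folder).
names = ["PYTEST1.xlsx", "PYTEST2.xlsx", "T200Mhz.xlsx", "T250Mhz.xlsx", "notes.txt"]
print(select_files(names))
```

Inside the original loop, the file.endswith('.xlsx') check could then become fnmatch.fnmatchcase(file, 'T2*Mhz*.xlsx').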
I have zip files, and each zip file contains three subfolders (ini, log, and output). I want to read a file from the output folder, which contains three CSV files with different names. Suppose the three files are named initial.csv, intermediate.csv, and final.csv; I only want to read final.csv.
The code that I tried is:
import zipfile
import numpy
import pandas as pd
zipfiles = glob.glob('/home/data/*.zip')
for i in np.arange(len(zipfiles)):
    zip = zipfile.ZipFile(zpfiles[i])
    f = zip.open(zip.namelist().startswith('final'))
    data = pd.read_csv(f, usecols=[3,7])
and the error I got is: 'list' object has no attribute 'startswith'
How can I find the correct file and read it?
Replace
f = zip.open(zip.namelist().startswith('final'))
With
f = zip.open('output/final.csv')
If you need to search the archive for it:
filename = ([name for name in zip.namelist() if name.startswith('output/final')][0])
f = zip.open(filename)
To search subdirectories as well, let's switch to pathlib, which supports glob patterns:
from pathlib import Path
import zipfile
import pandas as pd
dfs = []
files = Path('/home/data/').rglob('*.zip') # rglob recursively searches all child dirs for zip archives
for file in files:
    zip = zipfile.ZipFile(file) # file is already a full path, so pass it directly
    ....
    # your stuff
    df = pd.read_csv(f, usecols=[3,7])
    dfs.append(df)
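For completeness, here is a self-contained sketch of reading the matching member from a single archive. The function name read_final_csv and its parameters are hypothetical; it assumes the member sits under output/ as described in the question, and it raises a clear error instead of an IndexError when nothing matches.

```python
import zipfile

import pandas as pd


def read_final_csv(zip_path, prefix="output/final", usecols=None):
    """Read the first archive member whose name starts with prefix."""
    with zipfile.ZipFile(zip_path) as zf:
        # next() with a default avoids an IndexError when nothing matches.
        name = next((n for n in zf.namelist() if n.startswith(prefix)), None)
        if name is None:
            raise FileNotFoundError(f"no member starting with {prefix!r} in {zip_path}")
        # ZipFile.open returns a binary file object that read_csv can consume.
        with zf.open(name) as f:
            return pd.read_csv(f, usecols=usecols)
```

In the question's loop this would be called as read_final_csv(path, usecols=[3, 7]) for each archive path, appending the result to dfs.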
I would like to read several Excel files, contained in a folder on the Desktop of my MacBook, into pandas.
The folder on the Desktop contains a folder (project dataset) with all the Excel files, and the Jupyter notebook where I am writing the code (draft progetto).
I wrote the following code:
path = os.getcwd()
files = os.listdir(path)
files
Output:
['.DS_Store', 'draft progetto.ipynb', '.ipynb_checkpoints', 'project_dataset']
Then when I run:
files_xls = [f for f in files if f[3:] == 'xlsx']
files_xls
I get an empty list as output!
Why is this?
IIUC, this can be done much more easily with pathlib and its Unix-style glob pattern matching.
from pathlib import Path
import pandas as pd
#one liner
your_path = 'path_to_excel_files'
df = pd.concat([pd.read_excel(f) for f in Path(your_path).rglob('*.xlsx')])
Breaking it down:
# find the excel files
# if you want to change the path do Path('your_path')...
files = [file for file in Path.cwd().rglob('*.xlsx')]
#create a list of dataframes.
dfs_list = [pd.read_excel(file) for file in files]
#concat
df = pd.concat(dfs_list)
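As for why the original list was empty, two things seem to be going on. First, f[3:] keeps everything after the first three characters rather than the last four, so it does not test the extension. Second, the listing shows that the Excel files live inside project_dataset, not in the current directory, which is why a recursive search helps. The short sketch below illustrates the first point; sales.xlsx is a hypothetical file name added to the listing from the question.

```python
from pathlib import Path

files = [".DS_Store", "draft progetto.ipynb", ".ipynb_checkpoints", "project_dataset", "sales.xlsx"]

# f[3:] is the name minus its first three characters, e.g. "es.xlsx" for "sales.xlsx".
by_slice = [f for f in files if f[3:] == "xlsx"]

# Comparing the suffix tests the end of the name instead.
by_suffix = [f for f in files if Path(f).suffix == ".xlsx"]

print(by_slice)
print(by_suffix)
```

The first list comes out empty even though an .xlsx file is present, while the suffix check finds it.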
I have a task to create a script that connects over SSH to a list of 10 Cisco routers weekly, checks for config changes, and sends a notification. I already have a script that logs in, runs the command, and writes the output to a CSV file. I have modified it so that if there are no changes, all I have in the CSV is, for example:
rtr0003#, which is only the router name. If there is a config change, the file will contain more lines, for example:
My question is how to use pandas to open each file and delete the file if it contains only one line/row, or skip it if it contains more lines.
This is how I write the files:
files = glob.glob('*.csv')
for file in files:
    df=pd.read_csv(file)
    df=df.dropna()
    df.to_csv(file,index=False)
    df1=pd.read_csv(file,skiprows = 2)
    #df1=df1.drop(df1.tail(1))
    df1.to_csv(file,index=False)
import os
import glob
import csv
files = glob.glob('*.csv')
for file in files:
    with open(file,"r") as f:
        reader = csv.reader(f,delimiter = ",")
        data = list(reader)
        row_count = len(data)
    if row_count == 1:
        os.remove(file)
Here is a solution using pandas:
import pandas as pd
import glob
import os
csv_files = glob.glob('*.csv')
for file in csv_files:
    df_file = pd.read_csv(file, low_memory = False)
    if len(df_file) == 1:
        os.remove(file)
If you are using Excel files, change
glob.glob('*.csv')
to
glob.glob('*.xlsx')
and
pd.read_csv(file, low_memory = False)
to
pd.read_excel(file)
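One detail worth checking with the pandas approach: by default read_csv treats the first line as a header, so len(df) counts the lines after it. If "one line" means one physical line in the file, passing header=None makes every line count as a row. The sketch below takes that reading; remove_single_line_csvs is a hypothetical helper name, and it assumes the first two lines of each file have the same number of comma-separated fields. A completely empty file would raise pandas.errors.EmptyDataError.

```python
import glob
import os

import pandas as pd


def remove_single_line_csvs(folder):
    """Delete CSV files in folder that hold exactly one line; return their names."""
    removed = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # header=None counts the first line as data; nrows=2 is enough to decide.
        df = pd.read_csv(path, header=None, nrows=2)
        if len(df) == 1:
            os.remove(path)
            removed.append(os.path.basename(path))
    return removed
```

Returning the removed names makes it easy to log what was deleted, or to swap os.remove for a print while trying the function out.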