How can I extract the same Excel sheet from multiple files? - Python

I have a directory of similar Excel files, and I want to extract the first sheet from each file and save it as a .csv file. I currently have code that extracts and saves the sheet from an individual file:
import glob
import os
import pandas as pd
f = glob.glob('filename.xlsx') # assume the path
for excel in f:
    out = os.path.splitext(excel)[0] + '.csv' # splitext strips only the extension, even if the path contains other dots
    df = pd.read_excel(excel) # if only the first sheet is needed.
    df.to_csv(out)

glob.glob already returns a list of all matching files, so you can loop over it directly:
import glob
import os
import pandas as pd
files_to_be_read = glob.glob("*.xlsx") # put the folder path in front of the pattern if the files are stored elsewhere
for i in files_to_be_read:
    df_in = pd.read_excel(i) # pd.read_excel reads the first sheet by default
    out = os.path.splitext(i)[0] + '.csv' # same name as the workbook, but with a .csv extension
    df_in.to_csv(out) # to_csv is a DataFrame method, not a top-level pandas function
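A slightly more complete sketch of the same loop, wrapped in a function so it can be pointed at any folder. It assumes all workbooks sit directly in one folder (source_dir is an illustrative parameter name) and writes each CSV next to its workbook. pathlib's with_suffix swaps only the final extension, and Excel's temporary "~$" lock files are skipped.

```python
from pathlib import Path

import pandas as pd


def first_sheets_to_csv(source_dir):
    """Convert the first sheet of every .xlsx file in source_dir to a .csv file.

    Returns the list of CSV paths that were written.
    """
    written = []
    for xlsx_path in sorted(Path(source_dir).glob("*.xlsx")):
        # Skip Excel's temporary lock files such as "~$report.xlsx"
        if xlsx_path.name.startswith("~$"):
            continue
        # sheet_name=0 is the default; it is spelled out here for clarity
        df = pd.read_excel(xlsx_path, sheet_name=0)
        csv_path = xlsx_path.with_suffix(".csv")
        # index=False keeps pandas' row numbers out of the CSV
        df.to_csv(csv_path, index=False)
        written.append(csv_path)
    return written
```

Usage: first_sheets_to_csv("path/to/folder") converts every workbook in that folder and returns the CSV paths.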

Related

Merge all Excel files into one file with multiple sheets

I would like some help.
I have multiple Excel files, and each file has only one sheet.
I would like to combine all the Excel files into a single file with multiple sheets, one sheet per Excel file, keeping the original sheet names.
This is what I have so far:
import pandas as pd
from glob import glob
import os
excelWriter = pd.ExcelWriter("multiple_sheets.xlsx",engine='xlsxwriter')
for file in glob('*.xlsx'):
    df = pd.read_excel(file)
    df.to_excel(excelWriter,sheet_name=file,index=False)
excelWriter.close() # ExcelWriter.save() was removed in pandas 2.0; close() writes the file
All the Excel files look like this (I could not upload the image here, so here is a link):
https://iili.io/HfiJRHl.png
All the Excel files have exactly the same columns and rows and just one sheet; the only difference is the sheet name.
Thanks in advance
import pandas as pd
import os
output_excel = r'/home/bera/Desktop/all_excels.xlsx'
#List all excel files in folder
excel_folder= r'/home/bera/Desktop/GIStest/excelfiles/'
excel_files = [os.path.join(root, file) for root, folder, files in os.walk(excel_folder) for file in files if file.endswith(".xlsx")]
with pd.ExcelWriter(output_excel) as writer:
    for excel in excel_files: #For each excel
        sheet_name = pd.ExcelFile(excel).sheet_names[0] #Find the sheet name
        df = pd.read_excel(excel) #Create a dataframe
        df.to_excel(writer, sheet_name=sheet_name, index=False) #Write it to a sheet in the output excel
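One thing the answer above does not guard against: if two input files have sheets with the same name, pandas writes both into the same output sheet. Excel also limits sheet names to 31 characters. The sketch below is one way to handle both. The "_2", "_3" suffix scheme is an arbitrary choice, not something pandas does for you, and merge_single_sheet_files is a hypothetical helper name. It also skips the output file if it happens to be in the input list.

```python
from pathlib import Path

import pandas as pd

MAX_SHEET_NAME = 31  # Excel's limit on sheet-name length


def merge_single_sheet_files(input_paths, output_path):
    """Copy the first sheet of each input workbook into one output workbook.

    The original sheet name is kept; duplicates get a numeric suffix.
    Returns the list of sheet names used, in write order.
    """
    output_path = Path(output_path).resolve()
    used = []
    with pd.ExcelWriter(output_path) as writer:
        for path in input_paths:
            # Never read the workbook we are currently writing
            if Path(path).resolve() == output_path:
                continue
            with pd.ExcelFile(path) as xls:
                original = xls.sheet_names[0]
                df = xls.parse(original)
            name = original[:MAX_SHEET_NAME]
            counter = 2
            # Excel treats sheet names case-insensitively, so compare lower-cased
            while name.lower() in (u.lower() for u in used):
                suffix = f"_{counter}"
                name = original[: MAX_SHEET_NAME - len(suffix)] + suffix
                counter += 1
            df.to_excel(writer, sheet_name=name, index=False)
            used.append(name)
    return used
```

Usage: pass the same excel_files list the answer builds, e.g. merge_single_sheet_files(excel_files, output_excel).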

How to use Python to extract Excel Sheet data

I have many folders, and each folder contains one Excel file, named like 1Aug2022, 2Aug2022, ...
I want Python to read through all the folders and open only the Excel file with a particular name, such as 19AUG2022. That file has many sheets, named like IP-1*****, IP-2*****, IP-3*****. I then want to go to the sheets named IP-2***** and extract two columns of data.
How can I do this in Python?
You can use the pandas package: https://pandas.pydata.org/
For example:
import pandas as pd
your_excel_path = "your/path/to/the/excel/file"
data = pd.read_excel(your_excel_path, sheet_name = "19AUG2022") # If you want to read a specific sheet's data
data = pd.read_excel(your_excel_path, sheet_name = None) # If you want to read all sheets' data; this returns a dict of DataFrames keyed by sheet name
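Because sheet_name=None returns a dictionary keyed by sheet name, you can pick out the IP-2***** sheets from the question with an ordinary string test. A minimal sketch, assuming the wanted sheets all start with the literal prefix "IP-2" (the rest of each name is elided in the question); sheets_with_prefix is a hypothetical helper name.

```python
import pandas as pd


def sheets_with_prefix(excel_path, prefix):
    """Return {sheet_name: DataFrame} for every sheet whose name starts with prefix."""
    # sheet_name=None reads every sheet into a dict keyed by sheet name
    all_sheets = pd.read_excel(excel_path, sheet_name=None)
    return {name: df for name, df in all_sheets.items() if name.startswith(prefix)}
```

This reads every sheet before filtering. For large workbooks, looping over pd.ExcelFile(path).sheet_names and parsing only the matching sheets avoids loading the others.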
As Fergus said, use pandas.
The code to search all directories may look like this:
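import fnmatch
import os
import pandas as pd
directory_to_search = "./"
sheet_pattern = "IP-2*" # pandas does not expand wildcards in sheet_name, so match the names with fnmatch
for root, dirs, files in os.walk(directory_to_search):
    for file in files:
        # compare the file name without its extension (e.g. "19AUG2022.xlsx"), ignoring case
        if os.path.splitext(file)[0].upper() == "19AUG2022":
            with pd.ExcelFile(os.path.join(root, file)) as xls:
                for sheet in xls.sheet_names:
                    if fnmatch.fnmatch(sheet, sheet_pattern):
                        df = xls.parse(sheet)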
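To finish the job the question describes (pulling two columns out of every matching sheet), the matches can be stacked into one DataFrame. This is a sketch under several assumptions: the column names passed as columns are hypothetical, since the question does not say which two columns it needs; the file is recognised by its name without extension, compared case-insensitively; only .xlsx and .xlsm files are considered; and sheets are matched by prefix. collect_columns is a hypothetical helper name.

```python
import os

import pandas as pd


def collect_columns(root_dir, file_stem, sheet_prefix, columns):
    """Walk root_dir, open workbooks named file_stem (.xlsx or .xlsm), and
    stack the given columns from every sheet whose name starts with sheet_prefix.

    Adds 'source_file' and 'sheet' columns so each row can be traced back.
    """
    frames = []
    for root, _dirs, files in os.walk(root_dir):
        for file in files:
            stem, ext = os.path.splitext(file)
            if stem.upper() != file_stem.upper() or ext.lower() not in (".xlsx", ".xlsm"):
                continue
            path = os.path.join(root, file)
            with pd.ExcelFile(path) as xls:
                for sheet in xls.sheet_names:
                    if sheet.startswith(sheet_prefix):
                        # usecols with a list of header labels reads only those columns
                        df = xls.parse(sheet, usecols=columns)
                        df["source_file"] = path
                        df["sheet"] = sheet
                        frames.append(df)
    if not frames:
        return pd.DataFrame(columns=[*columns, "source_file", "sheet"])
    return pd.concat(frames, ignore_index=True)
```

Usage, with made-up column names: collect_columns("./", "19AUG2022", "IP-2", ["Date", "Value"]).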

Python: how to make a loop to copy data from different Excel files into a new one in an iterative way with pandas

I need to copy data from different Excel files into a new one. I would like to just tell the program to take all the files in a specific folder and copy two columns from each of them into a new Excel file. I tried a for loop, but it overwrites the data coming from different files, and I get a new Excel file with just one sheet containing the data copied from the last file read by the program. Could you help me, please?
Here is my code:
import os.path
import pandas as pd
folder=r'C:\\Users\\PycharmProjects\\excelfile\\'
for fn in os.listdir(folder):
    fx = pd.read_excel(os.path.join(folder, fn), usecols='H,E')
    with pd.ExcelWriter('Output.xlsx') as writer:
        ws = os.path.splitext(fn)[0]
        fx.to_excel(writer, sheet_name=ws)
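You should open the output file in append mode. Append mode works with the openpyxl engine, and Output.xlsx must already exist, otherwise pandas raises an error:
with pd.ExcelWriter("Output.xlsx", engine='openpyxl', mode='a') as writer: # replaces the with-block inside the loop
    ws = os.path.splitext(fn)[0]
    fx.to_excel(writer, sheet_name=ws)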
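An alternative that avoids append mode altogether is to open the writer once, before the loop, so that every input file lands on its own sheet of the same workbook. A minimal sketch: copy_columns_to_sheets is a hypothetical helper name, the .xlsx-only filter is an assumption, and usecols="E,H" mirrors the column letters from the question. It also skips the output file and Excel lock files in case they sit in the same folder.

```python
import os

import pandas as pd


def copy_columns_to_sheets(folder, output_path, usecols="E,H"):
    """Write the selected columns of every .xlsx file in folder to its own
    sheet of a single output workbook. Returns the sheet names written."""
    output_abs = os.path.abspath(output_path)
    written = []
    # One writer for the whole loop: each to_excel call adds a new sheet
    with pd.ExcelWriter(output_path) as writer:
        for fn in sorted(os.listdir(folder)):
            path = os.path.join(folder, fn)
            if (not fn.lower().endswith(".xlsx") or fn.startswith("~$")
                    or os.path.abspath(path) == output_abs):
                continue
            fx = pd.read_excel(path, usecols=usecols)
            # Excel limits sheet names to 31 characters
            ws = os.path.splitext(fn)[0][:31]
            fx.to_excel(writer, sheet_name=ws, index=False)
            written.append(ws)
    return written
```

Because the writer is created once and closed once, nothing is overwritten between iterations, and the output file does not need to exist beforehand.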

How can I write a python scripts using pandas to iterate over Excel .xlsx files with multiple sheets?

I have some Excel .xlsx files. Each file contains multiple sheets. I have used the following code to read and extract data from the files:
import pandas as pd
file = pd.ExcelFile('my_file.xlsx')
file.sheet_names #Displays the sheet names
df = file.parse('Sheet1') #To parse Sheet1
df.columns #To list columns
I am interested in the email column in each sheet. So far I have been doing this almost manually with the code above. I need code that automatically iterates over the sheets and extracts all the emails. Help!
You can pass over all files and all sheets with a for loop:
import pandas as pd
import os
emails = []
files_dir = "/your_path_to_the_xlsx_files"
for file in os.listdir(files_dir):
    if not file.endswith('.xlsx'): # skip anything that is not an Excel workbook
        continue
    excel = pd.ExcelFile(os.path.join(files_dir,file))
    for sheet in excel.sheet_names:
        df = excel.parse(sheet)
        if 'email' not in df.columns:
            continue
        emails.extend(df['email'].dropna().tolist()) # dropna() skips empty cells
Now you have all the emails in the emails list.
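If the header is not spelled exactly "email" in every sheet (for example "Email" or " E-mail "), the exact match above silently skips those sheets. The sketch below normalises headers before matching and removes blanks and duplicates. The normalisation rule (lower-case, drop spaces and hyphens) is an assumption you may need to adjust, and collect_emails is a hypothetical helper name.

```python
import os

import pandas as pd


def _normalise(header):
    # "E-mail ", "Email" and "email" all become "email"
    return str(header).strip().lower().replace("-", "").replace(" ", "")


def collect_emails(files_dir):
    """Return a sorted list of unique e-mail addresses found in any column
    whose normalised header is 'email', across all sheets of all .xlsx files."""
    emails = set()
    for file in os.listdir(files_dir):
        if not file.lower().endswith(".xlsx") or file.startswith("~$"):
            continue
        # sheet_name=None loads every sheet as {sheet name: DataFrame}
        sheets = pd.read_excel(os.path.join(files_dir, file), sheet_name=None)
        for df in sheets.values():
            for col in df.columns:
                if _normalise(col) == "email":
                    values = df[col].dropna().astype(str).str.strip()
                    emails.update(v for v in values if v)
    return sorted(emails)
```

Using a set means an address that appears in several sheets or files is returned only once.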

Removing entire rows with specific values in cells of an Excel file

Suppose I have 10 Excel files in a directory. I want to iterate over them, remove the rows of each file that meet certain conditions (for example, a cell that contains a null value), save the updated file, and move it into a new directory. I only need to remove rows, not columns.
How can I achieve this with Python?
Thanks in advance
I would suggest having a look at the pandas DataFrame, which makes it easy to import from and export to Excel files.
In your code you would iterate over your files with a for loop, remove the unwanted rows from each DataFrame you read in, and export it to an Excel file again.
Here is a sketch to start from. Store this script in the folder that contains your .xlsx files.
import glob
import os
import shutil
import pandas as pd
# create the output folder if it does not exist yet
os.makedirs("New", exist_ok=True)
# store all .xlsx files in the current folder in a list
filenames = glob.glob("*.xlsx")
# iterate through your files
for file in filenames:
    # create a DataFrame from the first sheet of each file
    df = pd.read_excel(file)
    # insert your own conditions here, e.g. drop every row that has an empty cell:
    df = df.dropna(how="any")
    # or drop rows where a column has a specific value (column name and value are placeholders):
    # df = df[df["some_column"] != "some_value"]
    # drop() and dropna() return a new DataFrame, so always reassign the result
    # write the cleaned data back to the file, without the index column
    df.to_excel(file, index=False)
    # move the updated file into the new folder
    shutil.move(file, os.path.join("New", file))
    print(df)
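A variant that leaves the original workbooks untouched and writes the cleaned copies straight into the new folder, instead of overwriting them and then moving them. The drop_row callback is a hypothetical hook for your own condition (it receives one row and returns True to drop it). By default, any row with an empty cell is dropped, which matches the "null" example in the question. clean_workbooks is a hypothetical helper name, and only the first sheet of each file is processed.

```python
from pathlib import Path

import pandas as pd


def clean_workbooks(source_dir, target_dir, drop_row=None):
    """Read the first sheet of each .xlsx in source_dir, remove rows for which
    drop_row(row) is True, and save the result under the same name in target_dir.

    By default, rows with at least one empty (NaN) cell are removed.
    Returns {file name: number of rows removed}.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    removed = {}
    for path in sorted(Path(source_dir).glob("*.xlsx")):
        df = pd.read_excel(path)
        if drop_row is None:
            cleaned = df.dropna(how="any")
        elif df.empty:
            cleaned = df
        else:
            # Boolean mask: True where the row should be dropped
            mask = df.apply(drop_row, axis=1).astype(bool)
            cleaned = df[~mask]
        cleaned.to_excel(target / path.name, index=False)
        removed[path.name] = len(df) - len(cleaned)
    return removed
```

Usage: clean_workbooks(".", "New") applies the default rule, and clean_workbooks(".", "New", drop_row=lambda r: r["Status"] == "void") drops rows by a custom condition (the "Status" column is a made-up example).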
