I have a CSV file that has multiple sheets in it. I want to read it sheet by sheet, filter some data, and create a CSV file in the same format. How can I do that? Please suggest. I was trying it through pandas.ExcelReader, but it's not working for the CSV file.
You can use the following code; it may help!
import pandas as pd

def read_excel_sheets(xls_path):
    """Read all sheets of an Excel workbook and return a single DataFrame"""
    print(f'Loading {xls_path} into pandas')
    xl = pd.ExcelFile(xls_path)
    frames = []
    columns = None
    for idx, name in enumerate(xl.sheet_names):
        print(f'Reading sheet #{idx}: {name}')
        sheet = xl.parse(name)
        if idx == 0:
            # Save column names from the first sheet so later sheets line up
            columns = sheet.columns
        sheet.columns = columns
        frames.append(sheet)
    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    return pd.concat(frames, ignore_index=True)
The source for this code is the link below: click here
To convert the result back to CSV, you can follow the post linked here: attached here
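One caveat on the original question: a plain CSV file has no notion of sheets, so "the same format" usually ends up as one CSV file per sheet. Here is a minimal sketch of that, assuming the sheets were loaded with pd.read_excel(path, sheet_name=None). The function name filter_sheets_to_csv and the keep_rows callback are made up for illustration and are not part of pandas.

```python
from pathlib import Path

import pandas as pd


def filter_sheets_to_csv(sheets, keep_rows, out_dir):
    """Filter every sheet and write each one to its own CSV file.

    `sheets` is a dict of sheet name -> DataFrame, e.g. the result of
    pd.read_excel(path, sheet_name=None). `keep_rows` is a hypothetical
    callback that takes a DataFrame and returns a boolean mask.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in sheets.items():
        filtered = df[keep_rows(df)]
        target = out_dir / f"{name}.csv"
        # index=False keeps the row index out of the CSV
        filtered.to_csv(target, index=False)
        written.append(target)
    return written
```

You would call it with something like filter_sheets_to_csv(sheets, lambda df: df["x"] > 4, "out"), where the column name "x" is only a placeholder for whatever you filter on.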
Related
I'm using this line of code to get all sheets from an Excel file:
excel_file = pd.read_excel('path_file',skiprows=35,sheet_name=None)
sheet_name=None option gets all the sheets.
How do I get all sheets except one of them?
If all you want to do is exclude one of the sheets, there is not much to change from your base code.
Assume file.xlsx is an Excel file with multiple sheets, and you want to skip 'Sheet1'.
One possible solution is as follows:
import pandas as pd

# Returns a dictionary mapping sheet_name -> DataFrame
xlwb = pd.read_excel('file.xlsx', sheet_name=None)
unwanted_sheet = 'Sheet1'

# generator expression that filters out the unwanted sheet;
# all other sheets are kept in df_generator
df_generator = (df for name, df in xlwb.items()
                if name != unwanted_sheet)

# get to the actual dataframes
for df in df_generator:
    print(df.head())
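If you also need the sheet names afterwards, a dict comprehension keeps them, whereas the generator above yields only the DataFrames. A small sketch, where drop_sheets is a hypothetical helper name and the three tiny DataFrames stand in for a real workbook:

```python
import pandas as pd


def drop_sheets(sheets, unwanted):
    """Return a new dict without the sheets named in `unwanted`."""
    unwanted = set(unwanted)
    return {name: df for name, df in sheets.items() if name not in unwanted}


# Stand-in for pd.read_excel('file.xlsx', sheet_name=None)
sheets = {
    "Sheet1": pd.DataFrame({"a": [1]}),
    "Sheet2": pd.DataFrame({"a": [2]}),
    "Sheet3": pd.DataFrame({"a": [3]}),
}
kept = drop_sheets(sheets, ["Sheet1"])
print(list(kept))  # ['Sheet2', 'Sheet3']
```

Passing a list of names means the same helper works when more than one sheet has to be skipped.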
I need to read data from several sheets in an xlsx file and save each sheet as a dataframe with the same name as the sheet. Here is the code I use. It can read data from the different sheets; however, all dataframes end up named temp. How should I change it? Thanks.
import pandas as pd

sheet_name_list = ['sheet1', 'sheet2', 'sheet3']
for temp in sheet_name_list:
    temp = pd.read_excel("data_spreadsheet.xlsx", sheet_name=temp)
You can use a dictionary keyed by sheet name:
pd_dict = {}
for temp in sheet_name_list:
    pd_dict[temp] = pd.read_excel("data_spreadsheet.xlsx", sheet_name=temp)
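As an alternative sketch, pd.read_excel also accepts a list for sheet_name and then returns the same kind of dictionary in one call, so the file is opened only once. The wrapper name read_named_sheets is just for illustration.

```python
import pandas as pd


def read_named_sheets(path, sheet_names):
    """Read several sheets in one call; returns a dict of name -> DataFrame."""
    return pd.read_excel(path, sheet_name=list(sheet_names))
```

With the names from the question this would be pd_dict = read_named_sheets("data_spreadsheet.xlsx", sheet_name_list), and each frame is then available as pd_dict['sheet1'] and so on.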
I am trying to automate a process that basically reads values from text files into certain Excel cells. I have a template in Excel that reads data from various sheets with certain names. For example, the template reads data from "Video scores". Video scores is a .txt file that I copy and paste into Excel. There are 5 different text files used in each project, so it gets tedious after a while, especially when there are a lot of projects to complete.
How can I import or copy and paste these .txt files into Excel, into a specified sheet? I have been using openpyxl for the other parts of this project, but I am open to using another library if it can't be done with openpyxl.
I've also tried opening and reading a file, but I couldn't figure out how to do what I want with that either. I have found a list of all the files I need; it's just a matter of getting them into Excel.
Thanks in advance for anyone who helps.
First, import the TXT file into a list in Python. I'm assuming the TXT file looks like this:
1
2
3
4
....
with open(path_txt, "r") as e:
    # strip the trailing newline so it is not written into the cell
    list1 = [i.rstrip("\n") for i in e]
Then we paste the values of the list into the worksheet you need:
from openpyxl import load_workbook

wb = load_workbook(path_xlsx)
ws = wb[sheet_name]
ws["A1"] = "values"  # just a header
row = 2  # start at the second row of the sheet
column = 1  # column "A" of the sheet
for i in list1:
    ws.cell(row=row, column=column).value = i  # write the current list value into the current cell
    row += 1  # move on to the next row
wb.save(path_xlsx)
Hope this works for you.
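Since there are five text files per project, the two steps above can be wrapped in one function and called once per file. This is only a sketch: the function name text_file_to_sheet is made up, and creating the sheet when it is missing is my assumption (the code above expects the sheet to exist already).

```python
from openpyxl import load_workbook


def text_file_to_sheet(path_txt, path_xlsx, sheet_name, header="values"):
    """Write each line of a text file into column A of the named sheet."""
    with open(path_txt, "r", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

    wb = load_workbook(path_xlsx)
    # Assumption: create the sheet if the template does not have it yet
    if sheet_name in wb.sheetnames:
        ws = wb[sheet_name]
    else:
        ws = wb.create_sheet(sheet_name)

    ws.cell(row=1, column=1, value=header)
    # Data starts in row 2, directly under the header
    for row, text in enumerate(lines, start=2):
        ws.cell(row=row, column=1, value=text)
    wb.save(path_xlsx)
    return len(lines)
```

Note that the values are written as text. If the template needs numbers, convert each line with int() or float() before writing it.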
Pandas would do the trick!
Approach:
Keep an Excel sheet that lists, for each file, its path, its separator, and the name of the target sheet.
Read this sheet with pandas, iterate over its rows, read the file described by each row, and write its data to a new sheet of the same target workbook.
import pandas as pd

file_details_path = r"/Users/path for xl sheet/file details/File2XlDetails.xlsx"
target_sheet_path = r"/Users/path to target xl sheet/File samples/FiletoXl.xlsx"

# create a writer to save the file content in Excel
writer = pd.ExcelWriter(target_sheet_path, engine='xlsxwriter')

file_details = pd.read_excel(file_details_path,
                             dtype=str,
                             index_col=False)


def write_to_excel(file, trg_sheet_name):
    # writes the DataFrame to its own sheet of the target workbook
    file.to_excel(writer,
                  sheet_name=trg_sheet_name,
                  index=False)


# loop through each file record
for index, file_dtl in file_details.iterrows():
    # you can print and check the row content for reference
    print(file_dtl['File_path'])
    print(file_dtl['Separator'])
    print(file_dtl['Target_sheet_name'])
    # reads file
    file = pd.read_csv(file_dtl['File_path'],
                       sep=file_dtl['Separator'],
                       dtype=str,
                       index_col=False)
    write_to_excel(file, file_dtl['Target_sheet_name'])

# writer.save() was removed in pandas 2.0; close() writes the workbook to disk
writer.close()
Hope this helps! Let me know if you run into any issues...
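For reference, this is roughly what the details sheet from the pandas answer is expected to contain. Only the three column names come from that code; the rows here are invented placeholders.

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the answer above
file_details = pd.DataFrame(
    {
        "File_path": ["video_scores.txt", "audio_scores.txt"],
        "Separator": [",", ";"],
        "Target_sheet_name": ["Video scores", "Audio scores"],
    }
)

# Each row describes one text file and the sheet it should land in
for _, row in file_details.iterrows():
    print(row["File_path"], repr(row["Separator"]), row["Target_sheet_name"])
```

Saving a table like this to File2XlDetails.xlsx with to_excel(index=False) should give the loop the columns it looks up.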
I'll try to explain my problem with an example:
Let's say I have an Excel file test.xlsx which has five tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4 and sheet5. I am interested in reading and modifying data in sheet2.
My sheet2 has some columns whose cells are dropdowns and those dropdown values are defined in sheet4 and sheet5. I don't want to touch sheet4 and sheet5. (I mean sheet4 & sheet5 have some references to cells on Sheet2).
I know that I can read all the sheets in the Excel file using pd.read_excel('test.xlsx', sheet_name=None), which gives all sheets as a dictionary (OrderedDict) of DataFrames.
Now I want to modify my sheet2 and save it without disturbing the others. Is it possible to do this using the Python pandas library?
[UPDATE - 4/1/2019]
I am using pandas read_excel to read whatever sheet I need from my Excel file, validating the data against the data in the database, and updating the status column in the Excel file.
To write the status column back to Excel, I am using openpyxl as shown in the pseudo code below.
import pandas as pd
import openpyxl
df = pd.read_excel(input_file, sheet_name=my_sheet_name)
df = df.where((pd.notnull(df)), None)
write_data = {}
# Doing some validations with the data and building my write_data with key
# as (row_number, column_number) and value as actual value to put in that
# cell.
At the end, my write_data looks something like this:
{(2,1): 'Hi', (2,2): 'Hello'}
Now I have defined a separate class named WriteData for writing data using openpyxl:
# WriteData(input_file, sheet_name, write_data)
book = openpyxl.load_workbook(input_file, data_only=True, keep_vba=True)
sheet = book.get_sheet_by_name(sheet_name)
for k, v in write_data.items():
    row_num, col_num = k
    sheet.cell(row=row_num, column=col_num).value = v
book.save(input_file)
When I do this operation, it removes all the formulas and diagrams. I am using openpyxl 2.6.2.
Please correct me if I am doing anything wrong! Is there a better way to do this?
Any help on this will be greatly appreciated :)
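To modify a single sheet at a time, you can use the pandas Excel writer in append mode:
sheet2 = pd.read_excel("test.xlsx", sheet_name="sheet2")
# modify sheet2 as needed, then save it back.
# mode="a" keeps the other sheets; the default mode="w" would overwrite the whole workbook.
# if_sheet_exists needs pandas 1.3 or newer and the openpyxl engine.
with pd.ExcelWriter("test.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    sheet2.to_excel(writer, sheet_name="sheet2", index=False)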
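On the openpyxl code from the question: a likely cause of the vanished formulas is data_only=True, which makes openpyxl load the cached values instead of the formulas, so saving afterwards writes plain values. Below is a sketch of the same cell-by-cell write without that flag. The function name write_cells is made up, and the (row, column) keys follow the write_data layout shown in the question.

```python
from openpyxl import load_workbook


def write_cells(path, sheet_name, write_data):
    """Write {(row, column): value} pairs into one sheet and save in place."""
    # No data_only=True here, so formula cells are loaded as formulas
    book = load_workbook(path)
    sheet = book[sheet_name]
    for (row_num, col_num), value in write_data.items():
        sheet.cell(row=row_num, column=col_num, value=value)
    book.save(path)
```

For .xlsm files you would still pass keep_vba=True. As far as I know, openpyxl does not read charts and images from an existing file, so those can still be lost on save even with this change.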
I have several workbooks, each with three sheets. I want to loop through each workbook and merge all the data from sheet_1 into a new workbook_1 file, sheet_2 into workbook_2 file & sheet_3 into workbook_3.
As far as I can tell the script below does everything I need, except rather than appending the data, it overwrites the data from the previous iteration.
For the sake of parsimony I've shortened, cleaned & simplified my script, but I'm happy to share the full script if needed.
import pandas as pd
import glob

search_dir = '/Users/PATH/*.xlsx'
sheet_names = ['sheet_1', 'sheet_2', 'sheet_3']

def a_joiner(sheet):
    for loop_x in glob.glob(search_dir):
        try:
            if sheet == 'sheet_1':
                id_file = pd.ExcelFile(loop_x)
                df_1 = id_file.parse(sheet, header=None)
                writer = pd.ExcelWriter('/Users/PATH/%s.xlsx' % sheet, engine='xlsxwriter')
                df_1.to_excel(writer)
                writer.close()
            elif sheet == 'sheet_2':
                pass  # do same as above
            else:
                pass  # and do same as above again
        except Exception as e:
            print('Error:', e)

for sheet in sheet_names:
    a_joiner(sheet)
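You can also easily append data like this:
import os
import pandas as pd

frames = []
for f in ['c:\\file1.xls', 'c:\\file2.xls']:
    # read Sheet1 and drop its last two rows
    data = pd.read_excel(f, 'Sheet1').iloc[:-2]
    # label every row with the name of the file it came from
    data.index = [os.path.basename(f)] * len(data)
    frames.append(data)
df = pd.concat(frames)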
From:
Using pandas Combining/merging 2 different Excel files/sheets
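Applied to the question, the data gets overwritten because a new ExcelWriter is created for every workbook inside the loop. One way around it is to collect the frames first and write each output file once. This is a sketch under assumptions: the helper names concat_by_sheet and merge_workbooks are made up, and out_pattern is a '%s' pattern like the one in the question.

```python
import glob

import pandas as pd


def concat_by_sheet(workbooks):
    """Stack same-named sheets from several workbooks.

    `workbooks` is an iterable of dicts (sheet name -> DataFrame), one per file.
    """
    collected = {}
    for sheets in workbooks:
        for name, df in sheets.items():
            collected.setdefault(name, []).append(df)
    return {name: pd.concat(frames, ignore_index=True)
            for name, frames in collected.items()}


def merge_workbooks(search_glob, out_pattern):
    """Read every workbook once, then write one output file per sheet name."""
    workbooks = [pd.read_excel(path, sheet_name=None, header=None)
                 for path in sorted(glob.glob(search_glob))]
    for name, merged in concat_by_sheet(workbooks).items():
        # One write per sheet name, after all workbooks have been read
        merged.to_excel(out_pattern % name, index=False, header=False)
```

A call would look like merge_workbooks('/Users/PATH/*.xlsx', '/Users/PATH/%s.xlsx'). Writing the output into the folder you search means a second run would pick up the merged files too, so a separate output folder is probably safer.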