How to read excel sheet data in python - python

How do I read an Excel file in Python? I have already tried some programs, but they did not work.
I want to create a folder of separate sets.

I hope this helps you understand how to read an Excel file. Make sure you specify the file path correctly. In my case, ./ means the current directory, which is where my Python file is located, so move your Excel file into the same directory as your Python file.
Install:
pip install pandas openpyxl
Solution 1
import pandas as pd
df = pd.read_excel('./TfidfVectorizer_sklearn.xlsx')
df
Solution 2
import openpyxl
book = openpyxl.load_workbook('./TfidfVectorizer_sklearn.xlsx')
sheet = book.active
cells = sheet['A1': 'D5']
for c1, c2, c3, c4 in cells:
    print(f"{c1.value} {c2.value} {c3.value} {c4.value}")
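If the workbook has several sheets, pd.read_excel can return all of them in one call. The following is a minimal sketch, assuming pandas and openpyxl are installed; the file name demo_sheets.xlsx, the sheet names, and the data are made up for illustration, and the demo file is created in a temporary directory so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_all_sheets(path):
    """Return a dict mapping each sheet name to its DataFrame."""
    # sheet_name=None tells pandas to load every sheet in the workbook.
    return pd.read_excel(path, sheet_name=None)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_sheets.xlsx"  # hypothetical file name
    # Build a small two-sheet workbook so there is something to read.
    with pd.ExcelWriter(demo_path) as writer:
        pd.DataFrame({"word": ["a", "b"], "score": [1, 2]}).to_excel(
            writer, sheet_name="First", index=False
        )
        pd.DataFrame({"word": ["c"], "score": [3]}).to_excel(
            writer, sheet_name="Second", index=False
        )
    sheets = read_all_sheets(demo_path)

for name, frame in sheets.items():
    print(name, frame.shape)
```

Each value in the returned dict is an ordinary DataFrame, so a single sheet can be picked with sheets["First"].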

Related

How to open all excel files within a folder and create a new spreadsheet consisting of specific worksheets within those files

I have a folder consisting of several excel documents.
For each file in the list I want to go to 3 specific sheets (that are present in each of the files) and copy these sheets into a new workbook.
So it looks like this:
Folder:
    File1
        Sheet1
        Sheet2
        ...
        Sheetn
    File2
        Sheet1
        Sheet2
        ...
        Sheetn
    ...
    Filen
        Sheet1
        Sheet2
        ...
        Sheetn
The sheet names are not titled like this but all have a similar naming structure so I want to write something like this:
new_file = excel workbook # create a new workbook (not sure the syntax)
for file in folder:
    open file
    for sheet in file:
        if sheetname like 'foobar1' or sheetname like 'foobar2'....:
            copy sheet into new_file
save new_file
The problem is I don't know what libraries to use or exactly how to write this properly.
I am not well versed with using python to interact with excel documents.
Any ideas?
First, install the libraries you need to read the Excel files from your directories (xlrd reads legacy .xls files; pandas needs openpyxl to read .xlsx files).
pip install xlrd
pip install openpyxl
pip install pandas
pip install xlsxwriter
Then, import these in your code:
import os
import xlrd
import pandas as pd
import xlsxwriter
address = 'E:\\DataFrames\\CSV\\'
List_SubFolders = os.listdir(address)
number = 0
with pd.ExcelWriter('E:/DataFrames/output.xlsx') as writer:
    for folders in List_SubFolders:
        temp_folder = folders
        List_XLS_In_Directory = os.listdir(address + str(temp_folder))
        for xls in List_XLS_In_Directory:
            #print(address+str(temp_folder)+'\\'+str(xls))
            df = pd.read_excel(address + str(temp_folder) + '\\' + str(xls), sheet_name='Sheet1')
            #df = pd.read_excel('E:/DataFrames/CSV/1/EZ Apply GPA Calculator.xlsx', sheet_name='Sheet1')
            number += 1
            df.to_excel(writer, sheet_name='Sheet_name_' + str(number))
Finally, you have a single Excel file with one sheet for each file that was read.
The code reads Sheet1 from every Excel file in each subfolder of this directory.
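The pseudocode in the question, which selects sheets by name pattern, could be sketched roughly as follows. This is one possible approach and not a definitive implementation: it assumes pandas with openpyxl, it copies cell values only (formatting and formulas are not carried over), and the function name collect_sheets, the foobar* pattern, and the demo file names are hypothetical.

```python
import fnmatch
import tempfile
from pathlib import Path

import pandas as pd


def collect_sheets(folder, patterns, output_path):
    """Copy every sheet whose name matches one of `patterns` into one workbook."""
    output_path = Path(output_path)
    selected = {}
    for file in sorted(Path(folder).glob("*.xlsx")):
        if file.resolve() == output_path.resolve():
            continue  # never read the output file back in
        # sheet_name=None loads all sheets of this file as {name: DataFrame}.
        for name, frame in pd.read_excel(file, sheet_name=None).items():
            if any(fnmatch.fnmatchcase(name, p) for p in patterns):
                # Excel limits sheet names to 31 characters.
                selected[f"{file.stem}_{name}"[:31]] = frame
    if selected:  # a workbook needs at least one sheet
        with pd.ExcelWriter(output_path) as writer:
            for target, frame in selected.items():
                frame.to_excel(writer, sheet_name=target, index=False)
    return list(selected)


# Demo with made-up files: two workbooks, three sheets each.
with tempfile.TemporaryDirectory() as tmp:
    for stem in ("File1", "File2"):
        with pd.ExcelWriter(Path(tmp) / f"{stem}.xlsx") as writer:
            for sheet in ("foobar1", "foobar2", "other"):
                pd.DataFrame({"value": [1, 2]}).to_excel(
                    writer, sheet_name=sheet, index=False
                )
    combined = Path(tmp) / "combined.xlsx"
    copied = collect_sheets(tmp, ["foobar*"], combined)
    combined_names = list(pd.read_excel(combined, sheet_name=None))

print(copied)
```

Prefixing each sheet with the file name keeps the names unique when several files contain a sheet with the same title.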

How to read a specific row in excel file using python

I want to read a single row (row number 7) from Sheet1 of Excel_file1 using Python. Any help?
First, install xlrd (version 2.0 and later reads only .xls files):
pip install xlrd
Then open a Python file and run:
import xlrd
# Give the location of the file
loc = "path of file"
# Open the workbook
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
# Row indices are 0-based, so index 6 is row number 7
print(sheet.row_values(6))
The location can be a relative path (resolved against the current working directory) or an absolute path.
To read more about xlrd and its usage, visit https://xlrd.readthedocs.io/en/latest/
Happy coding.
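You can also use pd.read_excel from the pandas library.
You would need to install pandas and openpyxl first (pandas uses openpyxl to read .xlsx files; xlrd is only needed for legacy .xls files):
import pandas as pd
df = pd.read_excel('abc.xlsx', sheet_name='Sheet1')
Now you can select any specific row of your dataframe using iloc:
# This gives the 7th data row. The first sheet row is used as the header by default,
# so pass header=None to read_excel if you want to count from the sheet's first row.
df.iloc[6]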
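For .xlsx files, openpyxl is another option, and its row numbers are 1-based, matching what Excel displays. The sketch below assumes openpyxl is installed; the helper name read_row and the demo workbook are made up for illustration.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook


def read_row(path, sheet_name, row_number):
    """Return the values of one worksheet row; row_number is 1-based, as in Excel."""
    sheet = load_workbook(path)[sheet_name]
    rows = sheet.iter_rows(min_row=row_number, max_row=row_number, values_only=True)
    return list(next(rows))


# Demo: build a ten-row workbook, then read row number 7.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "Excel_file1.xlsx"
    book = Workbook()
    sheet = book.active
    sheet.title = "Sheet1"
    for i in range(1, 11):
        sheet.append([f"row{i}", i])
    book.save(demo_path)
    row7 = read_row(demo_path, "Sheet1", 7)

print(row7)
```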

how to handle excel file (xlsx,xls) with excel formulas(macros) in python

I need to pass inputs from Input_data.xls, one row per iteration, to an existing xls file that has special functions in various cells, using Python 3.6. These functions change the primary data in the existing xls according to the inputs. But when xlrd opens the file, it doesn't import the cell functions, and the file is saved without them. It also writes the object name instead of its value.
Python code:
import xlrd
import xlwt
import xlutils
from xlrd import open_workbook
from xlutils.copy import copy
import os.path
book = xlrd.open_workbook('input_data.xlsx')
sheet0 = book.sheet_by_index(0)
for i in range(1, sheet0.nrows):  # start at 1 to skip the first row
    st1 = sheet0.row_values(i)
    TIP = [st1[0]]
    OOIPAN_IP = [st1[1]]
    NM = [st1[2]]
    book1 = xlrd.open_workbook('primary_data.xls')
    wb = copy(book1)
    w_sheet = wb.get_sheet(0)
    # The quoted arguments write the literal text, not the values of the variables
    w_sheet.write(1, 0, 'TIP')
    w_sheet.write(1, 1, 'OIP')
    w_sheet.write(1, 2, 'NM')
    wb.save('ipsectemp.xls')
This writes the object names in the cells instead of the objects' values:
input 1 input 2 input 3
st1[0] st1[1] st1[2]
Which module can help to open/read/write a workbook with its Excel functions (macros) in Python?
Luckily, I found the code below, which can read the cells that hold the Excel functions; the openpyxl module does a good job with cell values.
from openpyxl import load_workbook
book = load_workbook('primary_data.xlsx')  # open ipsec file with desired inputs
sheet0 = book['Sheet1']
for row in range(2, sheet0.max_row + 1):
    for column in "A":  # Here add or reduce the columns
        cell_name = "{}{}".format(column, row)
        textlt = sheet0[cell_name].value
        print(textlt)
The information was adapted from the answer to "openpyxl - read only one column from excel file in python?".
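To write the input values themselves rather than their names, the variables have to be passed unquoted. A minimal sketch with openpyxl follows, under the assumption that the workbook is an .xlsx file (openpyxl does not read .xls; for a macro-enabled .xlsm file, load_workbook takes keep_vba=True). The helper name write_inputs and the demo data are hypothetical. openpyxl keeps formula cells as formula strings, and Excel recalculates them the next time the file is opened.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook


def write_inputs(path, out_path, values, row=2):
    """Write `values` into columns A, B, C, ... of `row` and save a copy."""
    book = load_workbook(path)  # default mode keeps formulas as strings
    sheet = book.active
    for col, value in enumerate(values, start=1):
        # Pass the value itself, not a quoted name such as 'TIP'.
        sheet.cell(row=row, column=col, value=value)
    book.save(out_path)


# Demo: a workbook with a formula in D2 that depends on the three inputs.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "primary_data.xlsx"
    result = Path(tmp) / "ipsectemp.xlsx"
    book = Workbook()
    sheet = book.active
    sheet.append(["TIP", "OIP", "NM", "total"])
    sheet["D2"] = "=A2+B2+C2"
    book.save(source)

    st1 = [10, 20, 30]  # made-up input row
    write_inputs(source, result, st1)

    check = load_workbook(result).active
    written = [check.cell(row=2, column=c).value for c in (1, 2, 3)]
    formula = check["D2"].value

print(written, formula)
```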

pandas read excel values not formulas

Is there a way to have pandas read in only the values from excel and not the formulas? It reads the formulas in as NaN unless I go in and manually save the excel file before running the code. I am just working with the basic read excel function of pandas,
import pandas as pd
df = pd.read_excel(filename, sheet_name="Sheet1")
This will read the values if I have gone in and saved the file prior to running the code. But after running the code to update a new sheet, if I don't go in and save the file after doing that and try to run this again, it will read the formulas as NaN instead of just the values. Is there a work around that anyone knows of that will just read values from excel with pandas?
That is strange. The normal behaviour of pandas is read values, not formulas. Likely, the problem is in your excel files. Probably your formulas point to other files, or they return a value that pandas sees as nan.
In the first case, the sheet needs to be updated and there is nothing pandas can do about that (but read on).
In the second case, you could solve it by setting explicit nan values in read_excel:
pd.read_excel(path, sheet_name="Sheet1", na_values=your_na_identifiers)  # a list of the strings to treat as NaN
As for the first case, and as a workaround solution to make your work easier, you can automate what you are doing by hand using xlwings:
import pandas as pd
import xlwings as xl
def df_from_excel(path):
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)
df = df_from_excel(path_to_your_file)
If you want to keep those formulas in your excel file just save the file in a different location (book.save(different location)). Then you can get rid of the temporary files with shutil.
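One possible explanation for the NaN values, though not necessarily the cause in every case, is that the file was last written by a library rather than by Excel. An .xlsx file stores each formula together with its last computed result, and a library such as openpyxl writes the formula without a result. pandas reads the stored results, so it finds nothing until Excel has recalculated and saved the file. The sketch below illustrates this with openpyxl alone; the file name is made up.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "formulas.xlsx"  # hypothetical file name
    book = Workbook()
    sheet = book.active
    sheet["A1"] = 2
    sheet["A2"] = 3
    sheet["A3"] = "=A1+A2"  # stored as a formula, with no computed result
    book.save(path)

    # Default mode returns the formula text.
    formula = load_workbook(path).active["A3"].value
    # data_only=True returns the cached result, which does not exist yet
    # because Excel has never opened and saved this file.
    cached = load_workbook(path, data_only=True).active["A3"].value

print(formula)
print(cached)
```

Opening and saving the workbook in Excel, by hand or through xlwings as shown above, fills in the cached results.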
I had this problem and I resolved it by moving a graph below the first row I was reading. It looks like the position of the graphs may cause problems.
You can use xlrd to read the values.
First, refresh your Excel sheet if you are also updating the values automatically with Python. You can use the function below:
import xlrd
import win32com.client
import os
file = "myxl.xls"
def refresh_file(file):
    xlapp = win32com.client.DispatchEx("Excel.Application")
    path = os.path.abspath(file)
    wb = xlapp.Workbooks.Open(path)
    wb.RefreshAll()
    xlapp.CalculateUntilAsyncQueriesDone()
    wb.Save()
    xlapp.Quit()
After the file has been refreshed, you can start reading the content:
workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_index(0)
for rowid in range(worksheet.nrows):
    row = worksheet.row(rowid)
    for colid, cell in enumerate(row):
        print(cell.value)
You can loop through the data however you need and add conditions while reading it, which gives a lot more flexibility.

Write data into existing excel file and making summary table

I have to write some data into an existing xls file. (I should say that I'm working on Unix and can't use Windows.)
I prefer to work with Python and have tried some libraries like xlwt, openpyxl, and xlutils.
It's not working because there is a filter in my xls file. After rewriting the file, the filter disappears. But I still need this filter.
Could someone tell me what options I have?
Help, please!
Example:
from xlutils.copy import copy
from xlrd import open_workbook
from xlwt import easyxf
start_row = 0
rb = open_workbook('file.xls')
r_sheet = rb.sheet_by_index(1)
wb = copy(rb)
w_sheet = wb.get_sheet(1)
for row_index in range(start_row, r_sheet.nrows):
    row = r_sheet.row_values(row_index)
    call_index = 0
    for c_el in row:
        value = r_sheet.cell(row_index, call_index).value
        w_sheet.write(row_index, call_index, value)
        call_index += 1
wb.save('file.out.xls')
I also tried:
import xlrd
from openpyxl import Workbook
import unicodedata
rb = xlrd.open_workbook('file.xls')
sheet = rb.sheet_by_index(0)
wb = Workbook()
ws1 = wb.create_sheet("Results", 0)
for rownum in range(sheet.nrows):
    row = sheet.row_values(rownum)
    arr = []
    for c_el in row:
        arr.append(c_el)
    ws1.append(arr)
ws2 = wb.create_sheet("Common", 1)
sheet = rb.sheet_by_index(1)
for rownum in range(sheet.nrows):
    row = sheet.row_values(rownum)
    arr = []
    for c_el in row:
        arr.append(c_el)
    ws2.append(arr)
ws2.auto_filter.ref = ["A1:A15", "B1:B15"]
#ws['A1']=42
#ws.append([1,2,3])
wb.save('sample.xls')
The problem still exists. OK, I'll try to find a machine running Windows, but I have to admit something else:
There are some rows like this:
(image not included)
I've understood what I was doing wrong, but I still need help.
First of all, I have one sheet that contains some values.
The second sheet contains a summary table!
If I try to copy this worksheet, it is copied incorrectly.
So, the question is: how can I make a summary table from the first sheet?
Suppose your existing Excel file has two columns (date and number).
This is how you can append an additional row using openpyxl:
import openpyxl
import datetime
wb = openpyxl.load_workbook('existing_data_file.xlsx')
sheet = wb['Sheet1']
a = sheet.max_row + 1  # first empty row after the existing data
sheet.cell(row=a, column=1).value = datetime.date.today()  # rows and columns are 1-based
sheet.cell(row=a, column=2).value = 30378
wb.save('existing_data_file.xlsx')
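If the file can be kept as .xlsx, openpyxl can also define the filter range itself. The sketch below is an illustration under that assumption and not a fix for .xls files, which openpyxl does not handle; the helper name save_with_filter and the sample rows are made up. The filter range is a single string such as "A1:B4". openpyxl only records the filter definition, so the filtering itself is applied when the file is opened in Excel.

```python
import datetime
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook


def save_with_filter(path, rows):
    """Write `rows` to a new workbook and put an auto filter over all of them."""
    book = Workbook()
    sheet = book.active
    for row in rows:
        sheet.append(row)
    # sheet.dimensions is the used range as one string, e.g. "A1:B4".
    sheet.auto_filter.ref = sheet.dimensions
    book.save(path)


rows = [
    ("date", "number"),
    (datetime.date(2020, 1, 1), 10),
    (datetime.date(2020, 1, 2), 20),
    (datetime.date(2020, 1, 3), 30),
]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "with_filter.xlsx"
    save_with_filter(out, rows)
    filter_range = load_workbook(out).active.auto_filter.ref

print(filter_range)
```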
If you are on Windows, I would suggest you take a look at using the win32com.client approach. This allows you to interact with your spreadsheet using Excel itself. This will ensure that any existing filters, images, tables, macros etc should be preserved.
The following example opens an XLS file, adds one entry, and saves the whole workbook as a different XLS-formatted file:
import win32com.client as win32
import os
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(r'input.xls')
ws = wb.Worksheets(1)
# Write a value at A1
ws.Range("A1").Value = "Hello World"
excel.DisplayAlerts = False # Allow file overwrite
wb.SaveAs(r'sample.xls', FileFormat=56)
excel.Application.Quit()
Note, make sure you add full paths to your input and output files.
