Accessing Excel columns by names in xlwings - python

In pandas Excel columns can be accessed using names that are assigned in the first row of the sheet. How can this be achieved in xlwings?

You can use Pandas as a converter as of xlwings 0.7.0. For an example workbook like this:
A B C
1 4 7
2 5 8
3 6 9
This code will read the table in and allow you to access the data via column headers. The key is the .options(pd.DataFrame, index=False) bit. That particular call will return a Pandas DataFrame, with a default index.
More information is available in the xlwings converters documentation.
import xlwings as xw
import pandas as pd
def calc():
    # Create a reference to the calling Excel xw.Workbook
    wb = xw.Workbook.caller()
    # Read the table that starts at A1 into a DataFrame with a default index
    table = xw.Range('A1').table.options(pd.DataFrame, index=False).value
    # Access columns as attributes of the Pandas DataFrame
    print(table.A)
    print(table.B)
    # Access columns as column labels of the Pandas DataFrame
    print(table['A'])
    print(table['B'])
if __name__ == '__main__':
    path = "test.xlsm"
    xw.Workbook.set_mock_caller(path)
    calc()
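The column access itself can be tried without Excel. This is a minimal sketch, assuming a DataFrame typed in by hand with the same headers as the example workbook; it does not go through xlwings, so it only illustrates the two access styles.

```python
import pandas as pd

# Stand-in for the example workbook (assumption: typed in by hand,
# not read through xlwings).
table = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Attribute access only works when the header is a valid Python identifier
col_a = table.A.tolist()

# Bracket access works for any header, including ones with spaces
col_b = table["B"].tolist()

print(col_a)
print(col_b)
```

Bracket access is usually the safer choice for real sheets, because headers such as "Unit Price" cannot be written as attributes.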

You can use square brackets to access the columns, as suggested here:
import xlwings as xw
wb = xw.Workbook.active()
xw.Range('TableName[ColumnName]').value

Related

Return several rows in python from Excel Table

I was wondering whether I can return several rows of an Excel sheet where one column contains a particular string, and then export those rows to a CSV file.
I was considering openpyxl but am not getting too far.
If my Excel looks like that:
Sample
I would e.g. search for ID2 and return all rows
ID2,1,ping
ID2,2,pong
from openpyxl import Workbook
import openpyxl
file = "test.xlsx"
wb = openpyxl.load_workbook(file, read_only=True)
ws = wb.active
for row in ws.iter_rows("A"):
    for cell in row:
        if cell.value == "ID2":
            print(ws.cell(row=cell.row, column=1,2,3).value)
Can anyone help me?
Try using pandas: pd.read_excel() to load the sheet and DataFrame.to_csv() to export it. For example:
import pandas as pd
df = pd.read_excel('/file/path/excel.xlsx')
df_filtered = df[df['id_column'] == 'ID1'] # returns df with only rows where 'id_column' is 'ID1'
df_filtered.to_csv('/file/path/output.csv')
This exports a CSV containing only the rows where 'id_column' equals 'ID1'.
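The same filter can be checked end to end without any files. This is a rough sketch, assuming made-up column names ('id_column', 'step', 'message') and hand-typed rows in place of the sheet; it writes the CSV to an in-memory buffer so the result can be inspected directly.

```python
import io

import pandas as pd

# Hypothetical data standing in for the sheet; the column names are assumptions
df = pd.DataFrame({
    "id_column": ["ID1", "ID2", "ID2", "ID3"],
    "step": [1, 1, 2, 1],
    "message": ["start", "ping", "pong", "stop"],
})

# Boolean mask: keep only the rows whose id is 'ID2'
matches = df[df["id_column"] == "ID2"]

# index=False drops the DataFrame index, header=False drops the column names
buffer = io.StringIO()
matches.to_csv(buffer, index=False, header=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

Passing a file path instead of the buffer writes the same rows to disk.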

How to combine value in multiple cell references in Excel by Python?

I am trying to read the final value of a cell whose formula refers to another cell, which may in turn refer to further cells. An example may help illustrate.
Example:
A      B                 C
=B2    ='I am' & C2      'Peter
Example 2, with numbers:
A      B          C     D
=B2    =C2*D2     12    56
So I want to get the concatenated string 'I am Peter', or 672 (from 12*56), when reading cell A2.
Code I tried:
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(filename = 'new.xlsx')
sheet_names = wb.get_sheet_names()
name = sheet_names[0]
sheet_ranges = wb[name]
df = pd.DataFrame(sheet_ranges.values)
print(df)
The formula will become 'NaN'
Any suggestion to achieve it? Thanks!
If you want to have the actual values of the cells, you have to use data_only=True
wb = load_workbook(filename = 'new.xlsx', data_only=True)
Look here: Read Excel cell value and not the formula computing it -openpyxl
Anyway, as you use pandas, it would be way easier to go directly:
import pandas as pd
df = pd.read_excel('new.xlsx')
print(df)
which grabs the first sheet (but could be specified) and gives the values as output.
openpyxl supports either the formula or the value of the formula. You can select which using the data_only parameter when loading a workbook.
You can change your code like below:
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(filename='new.xlsx', data_only=True)
sheet_names = wb.sheetnames
name = sheet_names[0]
sheet_ranges = wb[name]
df = pd.DataFrame(sheet_ranges.values)
print(df)
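One caveat worth knowing: openpyxl does not calculate formulas itself, so data_only=True can only return a value that was cached in the file by an application such as Excel. The sketch below, which assumes openpyxl is installed and uses a small workbook built in memory, shows both modes on a file that was never opened in Excel.

```python
import io

from openpyxl import Workbook, load_workbook

# Build a small workbook in memory (hypothetical data, mirroring Example 2)
wb = Workbook()
ws = wb.active
ws["A2"] = "=C2*D2"
ws["C2"] = 12
ws["D2"] = 56

buffer = io.BytesIO()
wb.save(buffer)
data = buffer.getvalue()

# Default mode returns the formula text
formula = load_workbook(io.BytesIO(data)).active["A2"].value

# data_only=True returns the cached result; there is none here because
# no spreadsheet application has calculated and saved the file
cached = load_workbook(io.BytesIO(data), data_only=True).active["A2"].value

print(formula)
print(cached)
```

If the workbook is opened and saved in Excel first, the second load returns the computed number instead of None.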

Import all excel sheets from a excel file into pandas [duplicate]

I am trying to read an Excel file this way:
newFile = pd.ExcelFile(r"PATH\FileName.xlsx")
ParsedData = pd.io.parsers.ExcelFile.parse(newFile)
This throws an error saying that two arguments are expected, and I don't know what the second argument is. What I am trying to achieve is to convert an Excel file to a DataFrame. Am I doing it the right way, or is there another way to do this using pandas?
Close: first you call ExcelFile, but then you call the .parse method and pass it the sheet name.
>>> xl = pd.ExcelFile("dummydata.xlsx")
>>> xl.sheet_names
[u'Sheet1', u'Sheet2', u'Sheet3']
>>> df = xl.parse("Sheet1")
>>> df.head()
Tid dummy1 dummy2 dummy3 dummy4 dummy5 \
0 2006-09-01 00:00:00 0 5.894611 0.605211 3.842871 8.265307
1 2006-09-01 01:00:00 0 5.712107 0.605211 3.416617 8.301360
2 2006-09-01 02:00:00 0 5.105300 0.605211 3.090865 8.335395
3 2006-09-01 03:00:00 0 4.098209 0.605211 3.198452 8.170187
4 2006-09-01 04:00:00 0 3.338196 0.605211 2.970015 7.765058
dummy6 dummy7 dummy8 dummy9
0 0.623354 0 2.579108 2.681728
1 0.554211 0 7.210000 3.028614
2 0.567841 0 6.940000 3.644147
3 0.581470 0 6.630000 4.016155
4 0.595100 0 6.350000 3.974442
What you're doing is calling the method which lives on the class itself, rather than the instance, which is okay (although not very idiomatic), but if you're doing that you would also need to pass the sheet name:
>>> parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")
>>> parsed.columns
Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)
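The difference between calling a method on the class and on an instance is plain Python and can be shown without pandas. This is an illustrative sketch with a made-up Reader class, not the pandas implementation.

```python
class Reader:
    """Hypothetical stand-in for a class with a parse method."""

    def __init__(self, sheets):
        self.sheets = sheets

    def parse(self, sheet_name):
        return self.sheets[sheet_name]


reader = Reader({"Sheet1": [1, 2, 3]})

# Idiomatic: the instance is passed as `self` automatically
via_instance = reader.parse("Sheet1")

# Equivalent: call through the class and pass the instance explicitly
via_class = Reader.parse(reader, "Sheet1")

# Leaving out the second argument reproduces the kind of error in the question
try:
    Reader.parse(reader)
except TypeError as exc:
    error = str(exc)

print(via_instance, via_class)
print(error)
```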
This is a much simpler and easier way.
import pandas
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname='Sheet 1')
# or using sheet index starting 0
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname=2)
Check out the documentation for full details.
FutureWarning: The sheetname keyword is deprecated in newer pandas versions; use sheet_name instead.
I thought I should add that if you want to access rows or columns to loop through them, you can do this:
import pandas as pd
# open the file
xlsx = pd.ExcelFile(r"PATH\FileName.xlsx")
# get the first sheet as an object
sheet1 = xlsx.parse(0)
# get the first column as a list you can loop through
# change the 0 in the calls below to the row or column number you want
column = sheet1.icol(0).real
# get the first row as a list you can loop through
row = sheet1.irow(0).real
Edit:
The methods icol(i) and irow(i) are deprecated now. You can use sheet1.iloc[:,i] to get the i-th col and sheet1.iloc[i,:] to get the i-th row.
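As a quick check of the iloc forms, here is a small sketch on a hand-built DataFrame; the column names and values are made up and stand in for a parsed sheet.

```python
import pandas as pd

# Hypothetical stand-in for a parsed sheet
sheet1 = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# All rows of column position 0
first_column = sheet1.iloc[:, 0].tolist()

# All columns of row position 0
first_row = sheet1.iloc[0, :].tolist()

print(first_column)
print(first_row)
```

Both selections are by position, so they work regardless of what the headers or index labels are.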
I think this should satisfy your need:
import pandas as pd
# Read the excel sheet to pandas dataframe
df = pd.read_excel(r"PATH\FileName.xlsx", sheet_name=0) # corrected argument name
Here is an updated method with syntax that is more common in python code. It also prevents you from opening the same file multiple times.
import pandas as pd
sheet1, sheet2 = None, None
with pd.ExcelFile(r"PATH\FileName.xlsx") as reader:
    sheet1 = pd.read_excel(reader, sheet_name='Sheet1')
    sheet2 = pd.read_excel(reader, sheet_name='Sheet2')
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
You just need to feed the path to your file to pd.read_excel
import pandas as pd
file_path = "./my_excel.xlsx"
data_frame = pd.read_excel(file_path)
Check out the documentation to explore parameters such as skiprows, which ignores rows when loading the Excel file.
import pandas as pd
data = pd.read_excel(r'**YourPath**.xlsx')
print(data)
Loading an excel file without explicitly naming a sheet but instead giving the number of the sheet order (often one will simply load the first sheet) goes like:
import pandas as pd
myexcel = pd.ExcelFile("C:/filename.xlsx")
myexcel = myexcel.parse(myexcel.sheet_names[0])
Since .sheet_names returns a list of sheet names, it is easy to load one or more sheets by simply calling the list element(s).
All of these work for me:
In [1]: import pandas as pd
In [2]: df = pd.read_excel('FileName.xlsx') # If there is only one sheet in the excel file
In [3]: df = pd.read_excel('FileName.xlsx', sheet_name=0)
In [4]: df = pd.read_excel('FileName.xlsx', sheet_name='Sheet 1')
#load pandas library
import pandas as pd
#set path where the file is
path = "./myfile.xlsx"
#load the file into dataframe df
df = pd.read_excel(path)
#check the first 5 rows
df.head(5)
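Since the question title asks for all sheets, it may help to add that pd.read_excel accepts sheet_name=None, in which case it returns a dict mapping each sheet name to its DataFrame. The sketch below assumes openpyxl is installed and builds a two-sheet workbook in memory; the sheet contents are made up.

```python
import io

import pandas as pd

# Write a small two-sheet workbook to an in-memory buffer (hypothetical data)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3]}).to_excel(writer, sheet_name="Sheet2", index=False)

# sheet_name=None loads every sheet into a dict keyed by sheet name
all_sheets = pd.read_excel(io.BytesIO(buffer.getvalue()), sheet_name=None)
sheet_names = list(all_sheets)

print(sheet_names)
print(all_sheets["Sheet2"])
```

With a real file, pass its path instead of the buffer.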

Trying to write values to an excel sheet using python. Calculated values in that sheet do not update when trying to import the result into a dataframe

I am trying to write to an excel file and then load the result into a DF. However, calculated values are returning N/A in the DF even though when I open the excel sheet they are correctly displayed.
If I open and then save the excel sheet manually after updating using python, loading the dataframe works.
Here is the code:
import datetime
from openpyxl import load_workbook
import pandas as pd
if __name__ == '__main__':
    # getBalances is the asker's own function (not shown in the question)
    portfolio_values = getBalances('USD')
    wb = load_workbook(filename='Client_Portfolio_Tracker.xlsx')
    clients = wb['Clients']
    clients['F2'] = portfolio_values
    clients['G2'] = datetime.datetime.now().strftime("%m/%d/%y %H:%M")
    wb.save('Client_Portfolio_Tracker.xlsx')
    client_df = pd.read_excel('Client_Portfolio_Tracker.xlsx', sheet_name='Clients')
    print(client_df)
Output of dataframe
Thanks in advance!

