XLWINGS: How to select an entire column without headers? - python

How do I go about selecting an entire column from excel without the headers?
For example, when I try the following code, it selects the entire column including the header:
import xlwings as xw
wb = xw.Book.caller()
wb.sheets[0].range('A:A').options(ndim=1).value
How do I select the entire column A without including the header? I basically want to use xlwings to receive values from each cell of a column from the beginning of that column till its last value (not including the header).
Please advise.
Thank you

You can directly slice the Range object (you don't need to declare the dimension as 1d arrays arrive per default als simple lists):
wb = xw.Book.caller()
wb.sheets[0].range('A:A')[1:].value
Alternatively, define an Excel Table Object (Insert > Table):
wb.sheets[0].range('Table1[[#Data]]').value
This will automatically exclude the headers, see e.g. here for the syntax.

Related

Pandas read_excel: How to preserve cell format information for currency and percent

[ 10-07-2022 - For anyone stopping by with the same issue. After much searching, I have yet to find a way, that isn't convoluted and complicated, to accurately pull mixed type data from excel using Pandas/Python. My solution is to convert the files using unoconv on the command line, which preserves the formatting, then read into pandas from there. ]
I have to concatenate 1000s of individual excel workbooks with a single sheet, into one master sheet. I use a for loop to read them into a data frame, then concatenate the data frame to a master data frame. There is one column in each that could represent currency, percentages, or just contain notes. Sometimes it has been filled out with explicit indicators in the cell, Eg., '$' - other times, someone has used cell formatting to indicate currency while leaving just a decimal in the cell. I've been using a formatting routine to catch some of this but have run into some edge cases.
Consider a case like the following:
In the actual spreadsheet, you see: $0.96
When read_excel siphons this in, it will be represented as 0.96. Because of the mixed-type nature of the column, there is no sure way to know whether this is 96% or $0.96
Is there a way to read excel files into a data frame for analysis and record what is visually represented in the cell, regardless of whether cell formatting was used or not?
I've tried using dtype="str", dtype="object" and have tried using both the default and openpyxl engines.
UPDATE
Taking the comments below into consideration, I'm rewriting with openpyxl.
import openpyxl
from openpyxl import load_workbook
def excel_concat(df_source):
df_master = pd.DataFrame()
for index, row in df_source.iterrows():
excel_file = Path(row['Test Path']) / Path(row['Original Filename'])
wb = openpyxl.load_workbook(filename = excel_file)
ws = wb.active
df_data = pd.DataFrame(ws.values)
df_master = pd.concat([df_master, df_data], ignore_index=True)
return df_master
df_master1 = excel_concat(df_excel_files)
This appears to be nothing more than a "longcut" to just calling the openpyxl engine with pandas. What am I missing in order to capture the visible values in the excel files?
looking here,https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html , noticed the following
dtypeType name or dict of column -> type, default None
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32} Use object to preserve data as stored in Excel and not interpret dtype. **If converters are specified, they will be applied INSTEAD of dtype conversion.**
converters dict, default None
Dict of functions for converting values in certain columns. Keys can either be integers or column labels, values are functions that take one input argument, the Excel cell content, and return the transformed content.
Do you think that might work for you?

Way to refer a column within a same name under difference merged cell?

im kinda new to pandas and stuck at how to refer a column within same name under different merged column. here some example which problem im stuck about. i wanna refer a database from worker at company C. but if im define this excel as df and
dfcompanyAworker=df[Worker]
it wont work
is there any specific way to define a database within identifical column like this ?
heres the table
https://i.stack.imgur.com/8Y6gp.png
thanks !
first read the dataset that will be used, then set the shape for example I use excel format
dfcompanyAworker = pd.read_excel('Worker', skiprows=1, header=[1,2], index_col=0, skipfooter=7)
dfcompanyAworker
where:
skiprows=1 to ignore the title row in the data
header=[1, 2] is a list because we have multilevel columns, namely Category (Company) and other data
index_col=0 to make the Date column an ​​index for easier processing and analysis
skipfooter=7 to ignore the footer at the end of the data line
You can follow or try the steps as I made the following

Accessing imported data from Excel in Pandas

I'm new to python and just trying to redo my first project from matlab. I've written a code in vscode to import an excel file using pandas
filename=r'C:\Users\user\Desktop\data.xlsx'
sheet=['data']
with pd.ExcelFile(filename) as xls:
Dateee=pd.read_excel(xls, sheet,index_col=0)
Then I want to access data in a row and column.
I tried to print data using code below:
for key in dateee.keys():
print(dateee.keys())
but this returns nothing.
Is there anyway to access the data (as a list)?
You can iterate on each column, making the contents of each a list:
for c in df:
print(df[c].to_list())
df is what the dataframe was assigned as. (OP had inconsistent syntax & so I didn't use that.)
Look into df.iterrows() or df.itertuples() if you want to iterate by row. Example:
for row in df.itertuples():
print(row)
Look into df.iloc and df.loc for row and column selection of individual values, see Pandas iloc and loc – quickly select rows and columns in DataFrames.
Or df.iat or df.at for getting or setting single values, see here, here, and here.

How to check every cell from a column in an excel sheet?

I am struggling to find a way to solve my problem. I have an excel file which has data.
I need to check the type of data in columns (with every cell).
For example, in this column, I need to check that every cells are strings. But as you can see, there is a cell that is an int.
In this situation, I need to write this line in a new text file.
This is the code I have so far :
from openpyxl import load_workbook
book = load_workbook('export.xlsx')
sheet = book['Data']
for row in sheet.rows:
print (str(row[6].value))
Thanks for any help !
For this special case you may try this:
intList = list()
for row in sheet.rows:
try:
intList.append(int(row[6].value))
except:
pass
It will try to get the int value of the cell, and if it succeeded, it will push it into a list

Python, Pandas: How to automatically skip Excel header cells and add the rest to a dataframe

Greetings I would like to transform an excel document into a dataframe, but unfortunately the excel documents are made by someone else and they will always headers like so:
Excel example
I would like to ignore the "made by stevens" and "made 04/02/21" parts and just read the relevant information like name, age, file.
How would I skip it using pandas
Is there a way to always skip those header information, even if the relevant info (name, age, file) starts at a different line on different documents? (IE in one document age is at row 4 and in another age is at row 7)
Thanks!
The function pandas.read_excel has a parameter called skiprows, if you feed it an integer it will simply skip the n first lines at the start of the file.
In your case just use:
df = pd.read_excel(filepath, skiprows=4)
The second part of your question this is trickier. Depending on your business use cases you might have different solutions. If the columns are always the same (Name, Age, file) you could import the excel file without skipping lines but with fixing the column names, then by dropping rows with empty data and the additional header row you didn't use.
If you want to skip header which is on row = 1, then you can try this
pandas.read_excel(skiprows=1, skipfooter=0)
you can specify the value in integer to skiprows=1 to skip header and skipfooter=1 to skip footer, the number can depending upon how many rows you want to skip.

Categories