How to check every cell from a column in an excel sheet? - python

I am struggling to solve the following problem. I have an Excel file that contains data.
I need to check the data type of every cell in a column.
For example, in this column I need to check that every cell is a string. But as you can see, one cell is an int.
When that happens, I need to write that line to a new text file.
This is the code I have so far:
from openpyxl import load_workbook
book = load_workbook('export.xlsx')
sheet = book['Data']
for row in sheet.rows:
    print(str(row[6].value))
Thanks for any help!

For this particular case you could try this:
intList = list()
for row in sheet.rows:
    try:
        intList.append(int(row[6].value))
    except (TypeError, ValueError):
        # The cell is empty or its value cannot be converted to an int
        pass
This tries to convert the cell's value to an int and, if that succeeds, appends it to a list.
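To report the cells that are not strings, rather than collect the ints, an isinstance check may be closer to what the question asks. The following is only a sketch: the helper names non_string_rows and write_report, the report line layout, and the decision to skip empty cells are assumptions, not part of the original code.

```python
def non_string_rows(rows, col_index):
    """Yield (row_number, row) for rows whose cell at col_index is not a string.

    Assumption: empty cells (None) are skipped rather than reported.
    """
    for row_number, row in enumerate(rows, start=1):
        value = row[col_index]
        if value is not None and not isinstance(value, str):
            yield row_number, row


def write_report(rows, col_index, out_path):
    """Write one line per offending row to a text file and return the line count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as report:
        for row_number, row in non_string_rows(rows, col_index):
            report.write(f"row {row_number}: {row!r}\n")
            count += 1
    return count
```

With openpyxl, the rows could come from sheet.iter_rows(values_only=True), which yields plain tuples of cell values, for example write_report(sheet.iter_rows(values_only=True), 6, 'not_strings.txt') (the output file name here is made up).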

Related

How to get previous row count and column values in xlsx using python list iteration

I need to get the row count and the values of particular columns using Python list iteration. Can anyone help me with this?
For every row in the Excel file I need to get the URL, username and password.
For that I have used the code below.
import xlrd
xl_workbook = xlrd.open_workbook(file)
sheet = xl_workbook.sheet_by_index(0)
# Read every row except the header into a list of lists
vendor = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(1, sheet.nrows)]
for i in vendor:
    user_name = i[1]
    password = i[2]
After the first iteration is completed, I need to check the previous username and password and compare them with the values of the current iteration.
Can anyone help me with this?
Whenever I deal with Excel or any other list-like file input, I create a list before reading the file and append each line to it so I can access the lines later.
If you need to compare against any given username/password, I would use something like the following:
Putting into list
This allows you to get the values for any username/password.
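Once the rows are in a list, comparing each row with the one before it only needs a variable that remembers the previous values. A minimal sketch, assuming the username is in column 1 and the password in column 2 as in the question; the function name and the sample data are made up for illustration:

```python
def rows_differing_from_previous(vendor):
    """Return indexes of rows whose (username, password) differ from the row before."""
    changed = []
    previous = None  # nothing to compare against on the first row
    for index, row in enumerate(vendor):
        current = (row[1], row[2])
        if previous is not None and current != previous:
            changed.append(index)
        previous = current
    return changed


# Hypothetical sample rows: url, username, password
vendor = [
    ["http://a.example", "alice", "pw1"],
    ["http://b.example", "alice", "pw1"],
    ["http://c.example", "bob", "pw2"],
]
print(rows_differing_from_previous(vendor))  # prints [2]
```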

Changing date-format to text in xlsx using openpyxl

I have written a script that reads from excel workbooks and writes new workbooks.
Each row is a separate object, and one of the columns is a date.
I have written the date as a NamedStyle using datetime to get what I think is the correct format:
from openpyxl.styles import NamedStyle
date_style = NamedStyle(name='datetime', number_format='YYYY-MM-DD')
for row in range(2, ws_kont.max_row + 1):
    ws_kont.cell(row=row, column=4).style = date_style
The problem is that I need to import this Excel workbook into an ancient database that for some reason doesn't accept date formatting, only text like this: "yyyy-dd-mm".
I'm having trouble rewriting these cells as text.
I have tried using the =TEXT formula, but that won't work, since a cell can't reference itself to calculate its result unless I duplicate the column for the formula to reference:
name = str(ws_teg.cell(row = row, column = 4).coordinate)
date_f = "yyyy-mm-dd"
ws_kont[name] = '=TEXT(%s,"%s")' % (name, date_f)
I need to do this a bunch of places in a couple of scripts, so I'm wondering if there is a simpler way to do this?
PS. I'm just an archaeologist trying to automate some tasks in my workday by dabbling in some simple code, so please go easy on me if I seem a bit slow.
Found another article that worked out well with minimal code:
import pandas as pd
writer = pd.ExcelWriter('Sample_Master_Data_edited.xlsx', engine='xlsxwriter',
                        date_format='mm/dd/yyyy', datetime_format='mm/dd/yyyy')
Most likely it won't be enough to change the format of your date; you'll have to store the date as a string instead of a datetime object.
Loop over the column and format the dates with datetime.strftime:
for row in range(2, ws_kont.max_row + 1):  # start at 2, assuming row 1 holds the headers
    cell = ws_kont.cell(row=row, column=4)
    if cell.value is not None:  # skip empty cells
        cell.value = cell.value.strftime('%Y-%m-%d')
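Since the question mentions needing this in several places across a couple of scripts, the conversion could be wrapped in a small reusable helper. This is only a sketch: the names date_to_text and column_to_text, and the default of starting at row 2, are assumptions rather than anything from the original scripts.

```python
from datetime import date


def date_to_text(value, fmt="%Y-%m-%d"):
    """Return dates as formatted text; leave any other value unchanged.

    datetime is a subclass of date, so both types are covered.
    """
    if isinstance(value, date):
        return value.strftime(fmt)
    return value


def column_to_text(ws, column, first_row=2, fmt="%Y-%m-%d"):
    """Rewrite every date in one worksheet column as a string.

    Assumes ws behaves like an openpyxl worksheet (max_row and cell()).
    """
    for row in range(first_row, ws.max_row + 1):
        cell = ws.cell(row=row, column=column)
        cell.value = date_to_text(cell.value, fmt)
```

A call such as column_to_text(ws_kont, 4) would then replace the loop. If the database really expects day before month, passing fmt="%Y-%d-%m" should produce that order.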

XLWINGS: How to select an entire column without headers?

How do I go about selecting an entire column from excel without the headers?
For example, when I try the following code, it selects the entire column including the header:
import xlwings as xw
wb = xw.Book.caller()
wb.sheets[0].range('A:A').options(ndim=1).value
How do I select the entire column A without including the header? I basically want to use xlwings to receive values from each cell of a column from the beginning of that column till its last value (not including the header).
Please advise.
Thank you
You can slice the Range object directly (you don't need to declare the dimension, since 1D arrays arrive as simple lists by default):
wb = xw.Book.caller()
wb.sheets[0].range('A:A')[1:].value
Alternatively, define an Excel Table Object (Insert > Table):
wb.sheets[0].range('Table1[[#Data]]').value
This will automatically exclude the headers, see e.g. here for the syntax.

Counting rows in Excel file but not empty rows (Python)

So I have an Excel document that contains a lot of different stuff, and I need some way to count all elements/rows that are NOT blank. The reason I don't just remove the blank rows from the Excel document is some ID trouble when I use it on my website. So whenever I need to delete something in the document, I delete the text in the row, but not the row itself.
Before I put it online, when I just deleted the row entirely, I could use the following code to count how many elements/rows there were:
import openpyxl
from collections import Counter
wb = openpyxl.load_workbook('document.xlsx')
sheet = wb['Sheet1']  # get_sheet_by_name() is deprecated in openpyxl
number_of_rows = sheet.max_row
But as stated, now that it is online I can only delete the text and not the row, so the code above gives me the same count as if the emptied rows were still filled with text. So basically, how do I go about counting ONLY the rows that actually have some data?
You have to iterate over all rows, decide which rows are not empty, and count them with not_empty += 1.
Read Iteration over Worksheets, Rows, Columns
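The counting loop described above could look roughly like this. It is a sketch under the assumption that a row counts as empty when every cell is None or a whitespace-only string; the function names are made up for illustration.

```python
def is_empty(value):
    """Treat None and whitespace-only strings as empty."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def count_non_empty_rows(rows):
    """Count rows that contain at least one non-empty cell value."""
    return sum(1 for row in rows if not all(is_empty(value) for value in row))
```

With openpyxl, the rows could be supplied as count_non_empty_rows(sheet.iter_rows(values_only=True)), which passes each row as a tuple of plain values.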

Smartsheet API Python - extract data from external excel file by cell and update into a smartsheet using API

I am trying to copy all the data from an Excel file with a dynamic number of rows (anywhere from 100 to 500). I want to copy the contents of each cell, iterating by column, and update the rows up to the last row.
My current code updates by row when I specify the column ID. I am storing a primary column and a list of non-primary columns. I am not sure how to iterate an update through the cells of each column in a row first, so that if I lose my internet connection for any reason I know how far the update got.
Yes, this is a slow process.
The second part is that I can open an Excel file with openpyxl, read a cell value and store it in a variable, but I am struggling to pass it to the Smartsheet code:
MySheet = smartsheet.Sheets.get_sheet(SHEET_ID, PrimaryCol)
for MyRow in MySheet.rows:
    for MyCell in MyRow.cells:
        print(MyRow.id, MyCell.value)
        row_a = smartsheet.Sheets.get_row(SHEET_ID, MyRow.id)
        cell_a = row_a.get_column(PrimaryCol)
        cell_a.value = 'new value'
        row_a.set_column(cell_a.column_id, cell_a)
        smartsheet.Sheets.update_rows(SHEET_ID, [row_a])
Any help would be great, thanks.
I think these links (Add Rows, Update Rows) will be helpful in achieving the functionality you're looking for.
Ultimately, when ripping through an Excel or CSV file, you're going to want to generate the entire row update (and the updates for all of the rows) before submitting the update call to Smartsheet.
It appears that your code makes an update call for each cell in your sheet. So at a high level, you might try first getting all of the column IDs for your sheet and then, for each row in your Excel file, generating an update/add object for that row.
Your last step should be a single call to the sheet that contains all of the row updates you're looking for. The last call should look something like:
smartsheet.Sheets.update_rows(SHEET_ID, ROW_UPDATES)
Where ROW_UPDATES is a list of all the row objects you're adding/updating.
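If losing the connection part-way is a concern, a variation on the single call is to send the prepared list in a few batches and keep track of how many rows went through. The sketch below is library-agnostic: send is a hypothetical callable that wraps the real update call (for example lambda batch: smartsheet.Sheets.update_rows(SHEET_ID, batch)), and the batch size of 100 is an arbitrary assumption.

```python
def send_in_batches(row_updates, send, batch_size=100):
    """Send row updates in batches and return how many rows were sent.

    `send` is a hypothetical callable taking one list of row objects.
    If it raises, sending stops and the number of rows already sent is returned.
    """
    sent = 0
    for start in range(0, len(row_updates), batch_size):
        batch = row_updates[start:start + batch_size]
        try:
            send(batch)
        except Exception as exc:
            print(f"stopped after {sent} rows: {exc}")
            return sent
        sent += len(batch)
    return sent
```

The returned count shows where to resume, for example by slicing ROW_UPDATES from that index on the next attempt.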
