Using OpenPyxl to dynamically create dictionaries? - python

I'm using Openpyxl to read an excel file, specifically one column, that looks like:
this excel snapshot
The number of main and sub classes in the source document can change, and my goal is to be able to iterate through and create perhaps a nested dictionary for each main class of the form:
main_Class1 = { 'subClass1': {'data': 'data_1'},
'subClass2': {'data': 'data_2'}}
I'm open to any data type, so long as the info is connected like such.
I've thought of keeping the classes in column B, merging main classes into column A and subclasses into column C, then hiding A and C, so I can separate main classes from subclasses and iterate more easily, like
this
and tried:
mainClassList = []
mainClassDict = defaultdict(list)
activeClassList = []
for row in ws.iter_rows(min_row=2):
    activeClass = ""  # supposed to update this at every
                      # appropriate row
    if row[0].value is not None:
        activeClass = row[0].value
        mainClassList.append(activeClass)
        mainClassDict[activeClass] = []
        activeClassList.append(activeClass)
    # add 2nd column entries to 1st column keys
    # would be better if these were nested dicts
    if row[0].value is None and row[1].value is not None:
        mainClassDict[activeClass].append(row[1].value)
# check to see things are being added and updated as needed
print("main Class List:", mainClassList)
print("active classes;", activeClassList)
for key, value in mainClassDict.items():
    print(key, ' : ', value)

I eventually solved with the following:
mainClassDict = {}  # create an empty dictionary for each level
subClassDict = {}
for row in class_sheet.iter_rows(min_row=2):
    # catch only column 1 values
    if row[0].value is not None:
        main = row[0].value
        mainClassDict[main] = {}
        subClassDict[main] = {}
    # if nothing in column 1, use the column 2 value instead
    if row[0].value is None and row[1].value is not None:
        subclasslist = []
        subclasslist.append(row[1].value)
        # create empty list for data
        attributelist = []
        # populate data list from the cell of choice
        attributelist.append(row[...].value)
        # populate dict from elements in list using nested for
        for key in subclasslist:
            for value in attributelist:
                subClassDict[main][key] = value
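For comparison, here is a minimal sketch of the same grouping written as a single pass that produces the nested form asked for in the question. It assumes each row is a (main, sub, data) tuple with the data in a third column; that layout, the function name build_class_dict and the sample rows are all hypothetical. With openpyxl, such tuples could probably be obtained from ws.iter_rows(min_row=2, max_col=3, values_only=True).

```python
def build_class_dict(rows):
    """Group (main, sub, data) rows into {main: {sub: {'data': data}}}."""
    classes = {}
    current_main = None
    for main, sub, data in rows:
        if main is not None:
            # A value in the first column starts a new main class.
            current_main = main
            classes[current_main] = {}
        elif sub is not None and current_main is not None:
            # Otherwise the row is a subclass of the most recent main class.
            classes[current_main][sub] = {"data": data}
    return classes


# Hypothetical sample rows standing in for the worksheet contents.
sample_rows = [
    ("main_Class1", None, None),
    (None, "subClass1", "data_1"),
    (None, "subClass2", "data_2"),
    ("main_Class2", None, None),
    (None, "subClass3", "data_3"),
]
class_dict = build_class_dict(sample_rows)
```

Because the current main class is remembered outside the loop body, it is not reset on every row, which is the problem the first attempt ran into.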

Related

Wrong indexing while creating a pandas dataframe with a dictionary iteratively

I am trying to create a pandas dataframe with selenium objects 'left' and 'right'.
left = driver.find_elements(by=By.CLASS_NAME, value='lc')
right = driver.find_elements(by=By.CLASS_NAME, value='rc')
These return strings as objects, and the number of values differs from item to item, but left and right have the same number of elements in any one iteration. The strings from 'left' are column names, and the values from 'right' have to be appended under the corresponding column names. I tried the following:
for l, r in zip(left, right):
    # Get the text from the left and right elements
    l_text = l.text
    r_text = r.text
    # Create a dictionary for the row with the left text as the key and the right text as the value
    row = {l_text: r_text}
    # Append the dictionary to the list
    data.append(row)
# Create the dataframe from the list of dictionaries
df = pd.DataFrame(data)
The df created from this has a problem with the index: each value is added at a new index instead of being added to the same row. How do I add all values from an iteration to the same row?
The 'left' values are attributes of brake disks and the 'right' values are the corresponding values. These vary for each item: sometimes there are more and sometimes fewer.
The following should do what you want:
Items are added to the row until it encounters the same header.
Once a duplicate header is discovered, the row variable is appended to data and then cleared for the next round / row.
data = []
row = {}
for l, r in zip(left, right):
    # Get the text from the left and right elements
    l_text = l.text
    r_text = r.text
    if l_text is not None and l_text != "":
        if l_text in row:
            data.append(row)
            row = {}
        row[l_text] = r_text
# This is required to append the last row
if len(row) > 0:
    data.append(row)
# Create the dataframe from the list of dictionaries
df = pd.DataFrame(data)
print(df)
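The duplicate-header rule described above can be sketched without selenium or pandas by working on plain (header, value) string pairs. The function name group_pairs_into_rows and the sample brake disk values are made up for illustration.

```python
def group_pairs_into_rows(pairs):
    """Collect (header, value) pairs into rows; a repeated header starts a new row."""
    rows = []
    row = {}
    for header, value in pairs:
        if not header:
            continue  # skip pairs with no header text
        if header in row:
            # Seeing the same header again means the previous item is complete.
            rows.append(row)
            row = {}
        row[header] = value
    if row:
        rows.append(row)  # do not lose the last item
    return rows


# Hypothetical stand-ins for the l.text / r.text values.
pairs = [
    ("Diameter", "280 mm"),
    ("Thickness", "22 mm"),
    ("Diameter", "300 mm"),
    ("Thickness", "25 mm"),
    ("Holes", "5"),
]
rows = group_pairs_into_rows(pairs)
```

Passing rows to pd.DataFrame gives one dataframe row per item, and attributes an item lacks show up as NaN. Note that this rule relies on every item repeating at least one header of the item before it.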
I made some adjustments to your code: I collect each key and value in one dictionary, then append that dictionary to the dataframe. DataFrame.append was removed in pandas 2.0, so pd.concat is used here instead.
data = pd.DataFrame()
dic = {}
for l, r in zip(left, right):
    # Use the left text as the key and the right text as the value
    dic[l.text] = r.text
# Append the dictionary to the dataframe as a single row
data = pd.concat([data, pd.DataFrame([dic])], ignore_index=True)
# data is your final dataframe
Try doing it this way:
left = driver.find_elements(by=By.CLASS_NAME, value='lc')
right = driver.find_elements(by=By.CLASS_NAME, value='rc')
# Create a dictionary with keys from the left and empty lists as values
data = {}
for element in left:
    if element.text not in data.keys():
        data[element.text] = list()
for l, r in zip(left, right):
    # Add an element to list by key
    data[l.text].append(r.text)
# Create the dataframe from the dictionary
# (this raises a ValueError unless every list has the same length)
df = pd.DataFrame.from_dict(data)
I have not worked with selenium so you may need to tweak the code a little (in terms of getting a text from left list values).

How do you save dictionary data to the matching excel row but different column using openpyxl

I am using openpyxl to read a column (A) from an excel spreadsheet. I then iterate through a dictionary to find the matching information and then I want to write this data back to column (C) of the same Excel spreadsheet.
I have tried to figure out how to append data back to the corresponding row but without luck.
CODE
from openpyxl import load_workbook

my_dict = {
    'Agriculture': 'ET_SS_Agriculture',
    'Dance': 'ET_FA_Dance',
    'Music': 'ET_FA_Music'
}
wb = load_workbook("/Users/administrator/Downloads/Book2.xlsx")  # Work Book
ws = wb['Sheet1']  # Work Sheet
column = ws['A']  # Column
write_column = ws['C']
column_list = [column[x].value for x in range(len(column))]
for k, v in my_dict.items():
    for l in column_list:
        if k in l:
            print(f'The dict for {l} is {v}')
            # append v to row of cell index of column_list
So, if my excel spreadsheet looks like this:
I would like Column C to look like this after I have matched the data dictionary.
To do this with your method you need the index (i.e. the row) to assign the values to column C. You can get it with enumerate when looping over your column_list:
for i, l in enumerate(column_list):
    if k in l:
        print(f'The dict for {l} is {v}')
        # append v to row of cell index of column_list
        write_column[i].value = v
After writing all the values you will need to run
wb.save("/Users/administrator/Downloads/Book2.xlsx")
to save your changes.
That said, you do a lot of unnecessary iterations of the data in the spreadsheet, and also make things a little bit difficult for yourself by dealing with this data in columns rather than rows. You already have your dict with the values in column A, so you can just do direct lookups using split.
You are adding to each row, so it makes sense to loop over rows instead, in my opinion.
from openpyxl import load_workbook

my_dict = {
    'Agriculture': 'ET_SS_Agriculture',
    'Dance': 'ET_FA_Dance',
    'Music': 'ET_FA_Music'
}
wb = load_workbook("/Users/administrator/Downloads/Book2.xlsx")  # Work Book
ws = wb['Sheet1']  # Work Sheet
for row in ws:
    try:
        # row[0] = column A
        v = my_dict[row[0].value.split("-")[0]]  # get the value from the dict using column A
    except (KeyError, AttributeError):
        # leave rows which aren't in my_dict, or whose column A cell is empty or not text, alone
        continue
    # write to column C of the same row; ws.cell() creates the cell if the sheet has fewer than three columns
    ws.cell(row=row[0].row, column=3).value = v
wb.save("/Users/administrator/Downloads/Book2.xlsx")
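The split-based lookup can also be pulled out into a small helper and checked on plain values before touching the workbook. This is only a sketch: the name lookup_code, the "-" separator default and the sample cell values such as "Agriculture-101" are assumptions, since the actual spreadsheet contents are not shown.

```python
def lookup_code(cell_value, mapping, sep="-"):
    """Return the mapped code for the text before `sep`, or None if there is no match."""
    if not isinstance(cell_value, str):
        return None  # empty cells (None) and numeric cells have no prefix to look up
    prefix = cell_value.split(sep)[0].strip()
    return mapping.get(prefix)


my_dict = {
    'Agriculture': 'ET_SS_Agriculture',
    'Dance': 'ET_FA_Dance',
    'Music': 'ET_FA_Music',
}

# Hypothetical column A values.
column_a = ["Agriculture-101", "Dance - Intro", None, "History-7"]
codes = [lookup_code(value, my_dict) for value in column_a]
```

Using mapping.get instead of a try/except keeps the "no match" case explicit: the caller can skip any row for which the helper returns None.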

How can I write to specific Excel columns using openpyxl?

I'm writing a Python script that needs to write collections of data down specific columns in an Excel document.
More specifically, I'm calling an API that returns a list of items. Each item in the list contains multiple fields of data (item name, item version, etc). I would like to iterate through each item in the list, then write selected fields of data down specific columns in Excel.
Currently, I'm iterating through the items list, then appending the fields of data I want as a list into an empty list (creating a list of lists). Once I'm done iterating through the list of items, I iterate through the list of lists and append to each row of the Excel document. This works, but makes writing to a specific column complicated.
Here is roughly the code that I currently have:
import requests
import json
from openpyxl import Workbook

def main():
    wb = Workbook()
    ws = wb.active
    r = requests.get(api_url)  # Can't provide URL
    items_list = r.json()['items']
    filler_list = []
    for item in items_list:
        item_name = item['itemName']
        item_version = item['itemVersion']
        # etc...
        filler_list.append([item_name, item_version])
    for i in filler_list:
        ws.append(i)
    wb.save('output.xlsx')

if __name__ == "__main__":
    main()
The above code will write to the Excel document across row 1, then row 2, etc. for however many lists were appended to the filler list. What I would prefer to do is specify that I want every item name or item version to be added to whatever column letter I want. Is this possible with openpyxl? The main function would look something like this:
def main():
    wb = Workbook()
    ws = wb.active
    r = requests.get(api_url)  # Can't provide URL
    items_list = r.json()['items']
    for item in items_list:
        item_name = item['itemName']
        item_version = item['itemVersion']
        # Add item name to next open cell in column B (any column)
        # Add item version to next open cell in column D (any column)
    wb.save('output.xlsx')
There are two general methods for accessing specific cells in openpyxl.
One is to use the cell() method of the worksheet, specifying the row and column numbers. For example:
ws.cell(row=1, column=1).value = 5
sets the first cell.
The second method is to index the worksheet using Excel-style cell notation. For example:
ws["A1"].value = 5
does the same thing as above.
If you're going to be setting the same column positions in each row, probably the simplest thing to do is map each column letter to the field it should hold, then loop through the items and the columns you want to set. This is a simplified example based on your code above:
columns = {"B": "itemName", "D": "itemVersion"}  # column letter -> field name; add more as needed
for row, item in enumerate(r.json()['items'], start=1):
    for col, field in columns.items():
        ws[f"{col}{row}"].value = item[field]

How do I change the value type (from float to integer) in a list of dictionaries?

I am trying to parse an Excel file into a list of dictionaries. I wish to change the type of one of two columns:
Date: any date format
Account (see pic)
from float to integer (in the Excel file it has no decimal values).
How do I make the change permanent, so that later code referring to this list of dictionaries sees it?
Output of my code is as seen in the picture here:
I tried various options but unsuccessful in making the change and have it displayed as output.
My code:
import xlrd

workbook = xlrd.open_workbook('filelocation')
ws = workbook.sheet_by_index(0)
first_row = []  # The first row with values in column
for col in range(ws.ncols):
    first_row.append(ws.cell_value(0, col))
# creating a list of dictionaries
data = []
for row in range(1, ws.nrows):
    d = {}
    for col in range(ws.ncols):
        d[first_row[col]] = ws.cell_value(row, col)
    data.append(d)
for i in data:
    if i['Account'] in data:
        i['Account'] = int(i['Account'])
        print(int(i['Account']))
print(data)
I added the last part to make changes to the Account column, but it does not save the changes in the output.
Your problem is with the condition if i['Account'] in data:.
data is a list of dicts. i['Account'] is a float. So the above condition is never met, and no value gets converted to int.
From what I understand from your code you can simply remove the condition:
for i in data:
    i['Account'] = int(i['Account'])
If you want to generally change all floats to ints, you could change the part where you read the file to:
for col in range(ws.ncols):
    value = ws.cell_value(row, col)
    try:
        value = int(value)
    except (TypeError, ValueError):
        pass  # not a number (e.g. text): keep the value as it is
    d[first_row[col]] = value
You have the right idea, but the syntax is a little off.
for element in data:  # iterate through list
    for key, value in element.items():  # iterate through each dict
        if isinstance(value, float):
            element[key] = int(value)
This will cast all your floats to ints (int() drops any fractional part).
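If some columns hold genuine decimals, a more cautious variant converts only floats that are whole numbers. The sketch below uses float.is_integer() for that check; the function name floats_to_ints and the sample records are hypothetical.

```python
def floats_to_ints(records):
    """Convert whole-number floats to ints in place; leave other values untouched."""
    for record in records:
        for key, value in record.items():
            # Only existing keys are reassigned, so the dict is safe to iterate.
            if isinstance(value, float) and value.is_integer():
                record[key] = int(value)
    return records


# Hypothetical rows as xlrd might return them (numbers arrive as floats).
data = [
    {"Date": 43101.0, "Account": 1001.0, "Amount": 12.5},
    {"Date": "n/a", "Account": 1002.0, "Amount": 3.0},
]
floats_to_ints(data)
```

The list is modified in place, so any later code that refers to data sees the converted values.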

Python script to manipulate excel sheets by auto-filling spaces in columns

Hi I have an excel file with a similar structure as below:
Location column2 column3
1 South Africa
2
3
4
5 England
6
7
8
9 U.S
10
11
12
I am trying to write a Python script that can fill in the blanks between locations with the name of the preceding location (i.e. rows 2 to 4 get South Africa as the location, rows 6 to 8 get England, etc.).
I would be grateful if someone could point me in the right direction. Thanks
Ok dude, I think the answer is this dumb wrapper I made for xlrd (or, one you write yourself!). The key is that the function reads one row at a time into a list, and that Python lists remember the order in which they were populated. The wrapper produces a dictionary which maps Excel sheet names to a list of rows on that sheet (we're assuming one table per sheet here, you'll have to generalize things otherwise). Each row is a dictionary whose keys are the column names.
For you, I'd read in your data, and then do something like this (not tested):
import see_below as sb

workbook = sb.WorkbookToDict(your_file)
output = []
this_location = None
for row in workbook[relevant_sheet_name]:
    output_row = dict(row)
    if row['Location'] != '':  # the wrapper returns '' for empty cells
        this_location = row['Location']
    else:
        output_row['Location'] = this_location
    output.append(output_row)
There might be something cute you can do with list comprehension, but I've had too much wine to fool with that tonight :)
Here's the wrapper for the reader:
import xlrd

def _isEmpty(_):
    return ''

def _isString(element):
    # drop non-ASCII characters; decode so the result is a str on Python 3
    return element.value.encode('ascii', 'ignore').decode('ascii')

def _isFloat(element):
    return float(element.value)

def _isDate(element):
    import datetime
    rawDate = float(element.value)
    return (datetime.datetime(1899, 12, 30) +
            datetime.timedelta(days=rawDate))

def _isBool(element):
    return element.value == 1

def _isExcelGarbage(element):
    return int(element.value)

# maps xlrd cell type codes to converter functions
_options = {0: _isEmpty,
            1: _isString,
            2: _isFloat,
            3: _isDate,
            4: _isBool,
            5: _isExcelGarbage,
            6: _isEmpty}

def WorkbookToDict(filename):
    '''
    Reads an Excel file into a dictionary (note: xlrd 2.0 and later read only .xls files).
    The keys of the dictionary correspond to sheet names in the Excel workbook.
    The first row of each worksheet is taken to be column names, and each row
    of the worksheet is read into a separate dictionary, whose keys correspond to
    column names. The collection of dictionaries (as a list) forms the value in the
    dictionary. The output maps sheet names (keys) to a collection of dictionaries
    (value).
    '''
    book = xlrd.open_workbook(filename)
    allSheets = {}
    for s in book.sheets():
        thisSheet = []
        headings = [_options[x.ctype](x) for x in s.row(0)]
        for i in range(s.nrows):
            if i == 0:
                continue  # skip the heading row
            thisRow = s.row(i)
            if len(thisRow) != len(headings):
                raise Exception("Mismatch between headings and row length in ExcelReader")
            rowDict = {}
            for h, r in zip(headings, thisRow):
                rowDict[h] = _options[r.ctype](r)
            thisSheet.append(rowDict)
        allSheets[str(s.name)] = thisSheet
    return allSheets
The writer is here:
import xlwt

def write(workbookDict, colMap, filename):
    '''
    workbookDict should be a map of sheet names to a list of dictionaries.
    Each member of the list should be a mapping of column names to contents,
    missing keys are handled with the nullEntry field. colMap should be a
    dictionary whose keys are identical to the sheet names in the workbookDict.
    Each value is a list of column names that are assumed to be in order.
    If a key exists in the workbookDict that does not exist in the colMap, the
    entry in workbookDict will not be written.
    '''
    workbook = xlwt.Workbook()
    for sheet in workbookDict.keys():
        worksheet = workbook.add_sheet(sheet)
        cols = colMap[sheet]
        i = 0
        writeCols = True
        while i <= len(workbookDict[sheet]):
            if writeCols:
                for j in range(len(cols)):
                    worksheet.write(i, j, cols[j])  # write col headings
                writeCols = False
            else:
                for j in range(len(cols)):
                    worksheet.write(i, j, workbookDict[sheet][i - 1][cols[j]])
            i += 1
    workbook.save(filename)
Anyway, I really hope this works for you!
import openpyxl

wb = openpyxl.load_workbook('enter your workbook name')
sheet = wb['enter your sheet name']
max_row = sheet.max_row
# start at row 2, assuming row 1 holds the column headings
for row in range(2, max_row):
    # copy a value down into the blank cell directly below it
    if sheet.cell(row=row, column=1).value is not None and sheet.cell(row=row+1, column=1).value is None:
        sheet.cell(row=row+1, column=1).value = sheet.cell(row=row, column=1).value
wb.save('enter your workbook name')
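Both answers come down to the same fill-down idea: remember the last non-empty location and reuse it for the blanks that follow. Here is a minimal sketch of that idea on a plain list, independent of xlrd or openpyxl; the function name fill_down is hypothetical, and the sample list mirrors the Location column from the question.

```python
def fill_down(values):
    """Replace empty entries (None or '') with the most recent non-empty value."""
    filled = []
    last_seen = None
    for value in values:
        if value not in (None, ""):
            last_seen = value
        filled.append(last_seen)
    return filled


# The Location column from the question, with blanks as None.
locations = ["South Africa", None, None, None,
             "England", None, None, None,
             "U.S", None, None, None]
filled = fill_down(locations)
```

Blanks that come before the first location stay None, since there is nothing to copy down yet.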
