Text to cleaner text to excel spreadsheet project

I have a text file full of PC data, organized as a list of blocks, each of which is one of two types. Either:
*redacted*
My goal was to have Python (3.6.2) open and read the file, clean it up, and compile the data into an Excel spreadsheet as follows:
Column 1: PC name
Column 2: Error Type (0 if none, 1-4 for 4 error types)
Column 3: ID (if no error, no braces containing the ID)
Column 4: Password (if no error, just the password)
Here is my code. I use PyCharm and work in a virtual environment:
import xlsxwriter
workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()
bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})
worksheet.set_column(0, 0, 14)
worksheet.set_column(1, 1, 5)
worksheet.set_column(2, 2, 38)
worksheet.set_column(3, 3, 55)
worksheet.write('A1', 'Name', bold)
worksheet.write('B1', 'Error', bold)
worksheet.write('C1', 'ID', bold)
worksheet.write('D1', 'Password', bold)
def nonblank_lines(f):
    # Yield each non-empty line with surrounding whitespace removed
    for l in f:
        line = l.rstrip()
        if line:
            yield line.lstrip()
with open("C:\\Users\\MyName\\Desktop\\BLRP.txt", "r+") as op:
    gold_lst = []
    nonblank = nonblank_lines(op)
    for line in nonblank:
        if line.startswith("Computer Name"):
            gold_lst.append(str(line))
            gold_lst.append("NO ERROR")
        elif line.startswith("ID"):
            gold_lst.append("IDG: " + str(line))
            gold_lst.append('NO ERROR')
        elif line.startswith("ERROR: An error occurred while"):
            gold_lst.append('1')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("ERROR: No key"):
            gold_lst.append('2')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("ERROR: An error occurred (code 0x80070057)"):
            gold_lst.append('3')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("ERROR: An error occurred (code 0x8004100e)"):
            gold_lst.append('4')
            gold_lst.append(str('ID: {' + line + '}'))
            gold_lst.append(str('Password: '))
        elif line.startswith("Password"):
            gold_lst.append(str('Password: ' + next(nonblank)))
    print(gold_lst)
    op.close()
pc_data = (gold_lst)
row = 1
col = 0
for obj in pc_data:
    if obj.startswith("Computer Name"):
        worksheet.write_string(row, col, obj[15:])
    elif obj.startswith('NO'):
        worksheet.write_number(row, col + 1, 0, left)
    elif obj.startswith('1'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith('2'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith('3'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith('4'):
        worksheet.write_number(row, col + 1, int(obj), left)
    elif obj.startswith("ID: {ERROR"):
        worksheet.write_string(row, col + 2, '')
    elif obj.startswith("IDG: "):
        worksheet.write_string(row, col + 2, obj[10:-1])
    elif obj.startswith("Password"):
        worksheet.write_string(row, col + 3, obj[9:])
        row += 1
workbook.close()
This works perfectly for the file in question, but apart from the code being, I'm sure, terribly suboptimal, there is one thing I can explicitly see that needs to be improved. In this block:
if line.startswith("Computer Name"):
    gold_lst.append(str(line))
    gold_lst.append("NO ERROR")
I only want "NO ERROR" to be appended to my list if my line starts with "Computer Name" AND the next non-blank line does not begin with "ERROR." Naturally, I tried this:
if line.startswith("Computer Name"):
    if next(nonblank).startswith("ERROR"):
        gold_lst.append(str(line))
    elif next(nonblank).startswith("VOLUME"):
        gold_lst.append(str(line))
        gold_lst.append("NO ERROR")
The problem is, this creates a jacked up Excel spreadsheet, and I don't at all know why. Even in the later step of the main code where I print gold_lst (just to check whether the list is correct), the list is terribly inaccurate. I can't even figure out what the list is made up of.
How can I fix this?
As for a second question, if I may ask it in the same topic, more general text files of this type which I am likely to receive in the future may contain computers with more than one ID and password. The block would look like this, if I had to guess:
*redacted*
And there may be even more than 2 such ID/Password combos. How can I modify my code to allow for this? As it stands, my code will not easily account for this. I am quite new to Python, so maybe it could, but I don't see it.

One approach to this problem is as follows:
Read in the whole file, skipping any empty lines.
Use Python's groupby() function to split the list of lines into blocks based on the Computer Name line.
For each block, try and extract both an error and a list of IDs and Passwords. Leave blank if not present.
For each block, write any extracted data to the next row in the spreadsheet.
The script is as follows:
from itertools import groupby
import xlsxwriter
import re
workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()
bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})
cols = [('Name', 14), ('Error', 5), ('ID1', 38), ('Password1', 55), ('ID2', 38), ('Password2', 55), ('ID3', 38), ('Password3', 55)]
for colx, (heading, width) in enumerate(cols):
    worksheet.write_string(0, colx, heading, bold)
    worksheet.set_column(colx, colx, width)
rowy = 1
lines = []
data = []
computer_name = None
with open('BLRP.txt') as f_input:
    # Read every line, dropping the empty ones
    lines = [line.strip() for line in f_input if len(line.strip())]
for k, g in groupby(lines, lambda x: x.startswith("Computer Name:")):
    if k:
        # A "Computer Name:" line starts a new block
        computer_name = re.search(r'Computer Name:\s*(.*)\s*', list(g)[0]).group(1)
    elif computer_name:
        # All the lines belonging to the current computer
        block = list(g)
        error = 'NO ERROR'
        ids = []
        passwords = []
        for line_number, line in enumerate(block):
            re_error = re.match(r'ERROR:\s+"(.*?)"', line)
            if re_error:
                error = re_error.group(1)
            if line.startswith('Numerical Password:'):
                # The ID is on the next line, the password three lines down
                ids.append(re.search(r'\{(.*?)\}', block[line_number+1]).group(1))
                passwords.append(block[line_number+3].strip())
        worksheet.write_string(rowy, 0, computer_name)
        worksheet.write_string(rowy, 1, error)
        for index, (id, pw) in enumerate(zip(ids, passwords)):
            worksheet.write_string(rowy, index * 2 + 2, id)
            worksheet.write_string(rowy, index * 2 + 3, pw)
        rowy += 1 # Advance to the next output row
workbook.close()
Assuming your BLRP.txt is as follows:
Computer Name: "Name Here1"
ERROR: "some type of error"
Blah blah
Blah blah
Blah blah
Computer Name: "Name Here2"
Volume blah blah
Blah Blah
Numerical Password:
ID: {"The ID1 is here; long string of random chars"}
Password:
"Password1 here; also a long string"
Blah Blah
Blah Blah
Numerical Password:
ID: {"The ID2 is here; long string of random chars"}
Password:
"Password2 here; also a long string"
Blah Blah
Blah Blah
Numerical Password:
ID: {"The ID3 is here; long string of random chars"}
Password:
"Password3 here; also a long string"
Blah Blah
Blah Blah
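You would get a spreadsheet with one row per computer, holding the name, the error text (NO ERROR if there was none), and then each ID and password in its own pair of columns.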
How does groupby() work?
Normally when you iterate over a list, it gives you the entries one item at a time. With groupby(), you are able to iterate over this list in "groups", where the number of items in each group is based on a condition. The condition is provided in the form of a function (I have used lambda to avoid writing a separate function).
groupby() will build up the group of items to return until the result from the function changes. In this case, the function is looking for lines that start with Computer Name. So when that is true it will return a group with one item (unless there are two adjacent lines with Computer Name on them). Next it will return a group with all the lines that don't start with Computer Name, and so on.
It returns two things, a key and a group. The key is the result of the function, here the startswith() call, which will be either True or False. The group is an iterable holding all the matching items. list(g) is used to convert it into a normal list; in this case, all the lines up to the next Computer Name line are returned.
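The grouping behaviour described above can be seen in isolation with a small sketch. The sample lines and the helper name is_name_line are made up for illustration and are not taken from the real file:

```python
from itertools import groupby

# Hypothetical sample lines, standing in for the cleaned-up file contents
lines = [
    "Computer Name: PC1",
    "ERROR: something",
    "Blah",
    "Computer Name: PC2",
    "Volume C:",
]

def is_name_line(line):
    # The key function: True for header lines, False for everything else
    return line.startswith("Computer Name:")

# Each group is consumed into a list straight away, because groupby()
# invalidates a group once the iteration moves on to the next one
groups = [(key, list(group)) for key, group in groupby(lines, is_name_line)]
for key, group in groups:
    print(key, group)
```

Each run of adjacent lines with the same key comes back as one group, so the header lines and the lines that follow them alternate.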
To write the entries onto different rows and to convert known error messages into numbers:
from itertools import groupby
import xlsxwriter
import re
workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()
bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})
cols = [('Name', 14), ('Error', 5), ('ID', 38), ('Password', 55)]
for colx, (heading, width) in enumerate(cols):
    worksheet.write_string(0, colx, heading, bold)
    worksheet.set_column(colx, colx, width)
rowy = 1
lines = []
data = []
computer_name = None
error_numbers = {
    'An error occurred while connecting to the BitLocker management interface.' : 1,
    'No key protectors found.' : 2,
    'An error occurred (code 0x80070057):' : 3,
    'An error occurred (code 0x8004100e):' : 4}
with open('BLRP.txt') as f_input:
    # Read every line, dropping the empty ones
    lines = [line.strip() for line in f_input if len(line.strip())]
for k, g in groupby(lines, lambda x: x.startswith("Computer Name:")):
    block = list(g)
    if k:
        computer_name = re.search(r'Computer Name:\s*(.*)\s*', block[0]).group(1)
    elif computer_name:
        error_number = 0 # 0 for NO ERROR
        ids = []
        passwords = []
        for line_number, line in enumerate(block):
            re_error = re.match(r'ERROR:\s+?(.*)\s*?', line)
            if re_error:
                error = re_error.group(1)
                error_number = error_numbers.get(error, -1) # Return -1 for an unknown error
            if line.startswith('Numerical Password:'):
                ids.append(re.search(r'\{(.*?)\}', block[line_number+1]).group(1))
                passwords.append(block[line_number+3].strip())
        worksheet.write_string(rowy, 0, computer_name)
        worksheet.write_number(rowy, 1, error_number)
        # One output row per ID/password pair
        for id, pw in zip(ids, passwords):
            worksheet.write_string(rowy, 0, computer_name)
            worksheet.write_number(rowy, 1, error_number)
            worksheet.write_string(rowy, 2, id)
            worksheet.write_string(rowy, 3, pw)
            rowy += 1 # Advance to the next output row
        if len(ids) == 0:
            rowy += 1 # Advance to the next output row
workbook.close()

Related

Relocate the table index in python 3

query = search_entry.get()
sql = "SELECT * FROM customers WHERE last_name = %s"
data = (query,)
result = my_cursor.execute(sql, data)
result = my_cursor.fetchall()
if not result:
    result = "record not found... "
    query_label = Label(search_customer_window, text=result)
    query_label.place(x=40, y=130)
else:
    for index, x in enumerate(result):
        num = 0
        index += 2
        for y in x:
            query_label = Label(search_customer_window, text=y)
            query_label.grid(row=index, column=num)
            num += 1
I set the value of index to 2 but nothing happens. Thanks for your help.
When I run the program, the label (query_label) is shown at the top left of the window (row 0, column 0). How can I change the location of the label? It is a label on which some data is shown.

Python retrieving data from a block of lines containing specific characters and appending relevant data into separate lines

I am trying to create a program which selects specific information from a bulk paste, extracts the relevant information, and then pastes that information into lines.
Here is some example data;
1. Track1 03:01
VOC:PersonA
LYR：LyrcistA
COM：ComposerA
ARR：ArrangerA
ARR：ArrangerB
2. Track2 04:18
VOC:PersonB
VOC:PersonC
LYR：LyrcistA
LYR：LyrcistC
COM：ComposerA
ARR：ArrangerA
I would like the output to group the relevant data for each track together on a single line, with a semicolon joining entries of the same kind and " - " separating the different kinds.
LyrcistA - ComposerA - ArrangerA; ArrangerB
LyrcistA; LyrcistC - ComposerA - ArrangerA
I have not gotten very far despite my best efforts
while True:
    YodobashiData = input("")
    SplitData = YodobashiData.splitlines()
returns the following
['1. Track1 03:01']
['VOC:PersonA ']
['LYR：LyrcistA']
['COM：ComposerA']
['ARR：ArrangerA']
['ARR：ArrangerB']
[]
['2. Track2 04:18']
['VOC:PersonB']
['VOC:PersonC']
['LYR：LyrcistA']
['LYR：LyrcistC']
['COM：ComposerA']
['ARR：ArrangerA']
Whilst I have all the data now in separate lists, I have no idea how to identify and extract the information from the list I need from the ones I do not.
Also, it seems I need to have the while loop or else it will only return the first list and nothing else.

Here's a script that doesn't use regular expressions.
It assumes that header lines, and only the header lines, will always start with a digit, and that the overall structure of header line then credit lines is consistent. Empty lines are ignored.
Extraction and formatting of the track data are handled separately, so it's easier to change formats, or use the extracted data in other ways.
import collections
import unicodedata
data_from_question = """\
1. Track1 03:01
VOC:PersonA
LYR：LyrcistA
COM：ComposerA
ARR：ArrangerA
ARR：ArrangerB
2. Track2 04:18
VOC:PersonB
VOC:PersonC
LYR：LyrcistA
LYR：LyrcistC
COM：ComposerA
ARR：ArrangerA
"""
def prepare_data(data):
    # The "colons" in the credits lines are actually
    # "full width colons". Replace them (and other such characters)
    # with their normal width equivalents.
    # If full normalisation is undesirable then we could return
    # data.replace('\N{FULLWIDTH COLON}', ':')
    return unicodedata.normalize('NFKC', data)
def is_new_track(line):
    return line[0].isdigit()
def parse_track_header(line):
    id_, title, duration = line.split()
    return {'id': id_.rstrip('.'), 'title': title, 'duration': duration}
def get_credit(line):
    credit, _, name = line.partition(':')
    return credit.strip(), name.strip()
def format_track_heading(track):
    return 'id: {id} title: {title} length: {duration}'.format(**track)
def format_credits(track):
    order = ['ARR', 'COM', 'LYR', 'VOC']
    parts = ['; '.join(track[k]) for k in order]
    return ' - '.join(parts)
def get_data():
    # The data is expected to be a multiline string.
    return data_from_question
def parse_data(data):
    track = None
    for line in filter(None, data.splitlines()):
        if is_new_track(line):
            if track:
                yield track
            track = collections.defaultdict(list)
            header_data = parse_track_header(line)
            track.update(header_data)
        else:
            role, name = get_credit(line)
            track[role].append(name)
    yield track
def report(tracks):
    for track in tracks:
        print(format_track_heading(track))
        print(format_credits(track))
        print()
def main():
    data = get_data()
    prepared_data = prepare_data(data)
    tracks = parse_data(prepared_data)
    report(tracks)
main()
Output:
id: 1 title: Track1 length: 03:01
ArrangerA; ArrangerB - ComposerA - LyrcistA - PersonA
id: 2 title: Track2 length: 04:18
ArrangerA - ComposerA - LyrcistA; LyrcistC - PersonB; PersonC

Here's another take on an answer to your question:
data = """
1. Track1 03:01
VOC:PersonA
LYR：LyrcistA
COM：ComposerA
ARR：ArrangerA
ARR：ArrangerB

2. Track2 04:18
VOC:PersonB
VOC:PersonC
LYR：LyrcistA
LYR：LyrcistC
COM：ComposerA
ARR：ArrangerA"""
import re
import collections
# Regular expression to pull apart the headline of each entry
headlinePattern = re.compile(r"(\d+)\.\s+(.*?)\s+(\d\d:\d\d)")
def main():
    # break the data into lines
    lines = data.strip().split("\n")
    # while we have more lines...
    while lines:
        # The next line should be a title line
        line = lines.pop(0)
        m = headlinePattern.match(line)
        if not m:
            raise Exception("Unexpected data format")
        id = m.group(1)
        title = m.group(2)
        length = m.group(3)
        people = collections.defaultdict(list)
        # Now read person lines until we hit a blank line or the end of the list
        while lines:
            line = lines.pop(0)
            if not line:
                break
            # Break the line into label and name
            label, name = re.split(r"\W+", line, maxsplit=1)
            # Add this entry to a map of lists, where the map's keys are the label and the
            # map's values are all the people who had that label
            people[label].append(name)
        # Now we have everything for one entry in the data. Print everything we got.
        print("id:", id, "title:", title, "length:", length)
        print(" - ".join(["; ".join(person) for person in people.values()]))
        # go on to the next entry...
main()
Result:
id: 1 title: Track1 length: 03:01
PersonA - LyrcistA - ComposerA - ArrangerA; ArrangerB
id: 2 title: Track2 length: 04:18
PersonB; PersonC - LyrcistA; LyrcistC - ComposerA - ArrangerA
You can just comment out the line that prints the headline info if you really just want the line with all of the people on it. Just replace the built in data with data = input("") if you want to read the data from a user prompt.
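One caveat when reading from a prompt: input() returns a single line, so a multi-line paste arrives one line per call, which is why the question needed a while loop. A minimal sketch of an alternative, assuming the paste is ended with an end-of-input signal (Ctrl-D on Unix-like systems, Ctrl-Z then Enter on Windows), reads the whole stream at once. The function name read_pasted_text is hypothetical, and an in-memory stream stands in for the terminal here:

```python
import io
import sys

def read_pasted_text(stream=None):
    """Return all text from the stream up to end-of-input."""
    if stream is None:
        stream = sys.stdin  # the real terminal when no stream is supplied
    return stream.read()

# Demonstration: a StringIO object plays the role of the pasted block
sample = io.StringIO("1. Track1 03:01\nVOC:PersonA\n\n2. Track2 04:18\nVOC:PersonB\n")
pasted = read_pasted_text(sample)
pasted_lines = pasted.splitlines()
print(pasted_lines)
```

In the script above, data = read_pasted_text() could then take the place of the built-in data string.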

Assuming your data is in the format you specified in a file called tracks.txt, the following code should work:
import re
with open('tracks.txt') as fp:
    tracklines = fp.read().splitlines()
def split_tracks(lines):
    # Split the lines into one list per track, using blank lines as separators
    track = []
    all_tracks = []
    while True:
        try:
            if lines[0] != '':
                track.append(lines.pop(0))
            else:
                all_tracks.append(track)
                track = []
                lines.pop(0)
        except IndexError:
            # No lines left: store the last track and finish
            all_tracks.append(track)
            return all_tracks
def gather_attrs(tracks):
    # Build one dict per track, mapping each three-letter label to its names
    track_attrs = []
    for track in tracks:
        attrs = {}
        for line in track:
            match = re.match(r'([A-Z]{3}):', line)
            if match:
                attr = line[:3]
                val = line[4:].strip()
                try:
                    attrs[attr].append(val)
                except KeyError:
                    attrs[attr] = [val]
        track_attrs.append(attrs)
    return track_attrs
if __name__ == '__main__':
    tracks = split_tracks(tracklines)
    attrs = gather_attrs(tracks)
    for track in attrs:
        semicolons = map(lambda va: '; '.join(va), track.values())
        hyphens = ' - '.join(semicolons)
        print(hyphens)
The only thing you may have to change is the colon characters in your data - some of them are ASCII colons : and others are Unicode colons ：, which will break the regex.
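One way to deal with the mixed colons, sketched here under the assumption that the only troublesome character is the full-width colon (U+FF1A), is to convert it before matching. The helper name normalise_colons is made up for this example:

```python
import unicodedata

def normalise_colons(text):
    """Replace full-width colons (U+FF1A) with ASCII colons."""
    return text.replace('\N{FULLWIDTH COLON}', ':')

# '\uff1a' is the full-width colon, written as an escape to make it visible
lines = ['VOC:PersonA', 'LYR\uff1aLyrcistA']
cleaned = [normalise_colons(line) for line in lines]
print(cleaned)

# NFKC normalisation also maps the full-width colon to ':', but it rewrites
# other compatibility characters (such as full-width letters) as well
print(unicodedata.normalize('NFKC', 'COM\uff1aComposerA'))
```

Applying normalise_colons to each line right after reading the file would let the ASCII-only regex match every credit line.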

import re
list_ = data_.split('\n') # here data_ is your data
# Matches credit lines such as "LYR:Name", with an ASCII or a full-width colon
regObj = re.compile(rf'[A-Za-z]+(:|{chr(65306)})[A-Za-z]+')
l = []
pre = ''
for i in list_:
    if regObj.findall(i):
        if i[:3] != 'VOC':
            # Same label as the previous line: join with "; ", otherwise with " - "
            if pre == i[:3]:
                l.append('; ')
            else:
                l.append(' - ')
            l.append(i[4:].strip())
    else:
        # Header or blank line: mark the boundary between tracks
        l.append(' => ')
    pre = i[:3]
track_list = list(map(lambda item: item.strip(' - '), filter(lambda item: item, ''.join(l).split(' => '))))
print(track_list)
Output (the list of results you want):
['LyrcistA - ComposerA - ArrangerA; ArrangerB', 'LyrcistA; LyrcistC - ComposerA - ArrangerA']

Prompting user to enter column names from a csv file (not using pandas framework)

I am trying to get the column names from a csv file with nearly 4000 rows. There are about 14 columns.
I am trying to get each column name, store the names in a list, and then prompt the user to enter at least 5 columns they want to look at.
The user should then be able to type how many results they want to see (they should be the smallest results from that column).
For example, if they choose clothing_brand, "8", the 8 least expensive brands are displayed.
So far, I have been able to use "with" and get a list that contains each column, but I am having trouble prompting the user to pick at least 5 of those columns.

You can use Python's input() to get input from the user. If you want to prompt a number of times, use a for loop to collect the inputs. See the code below:
def get_user_val(no_of_entries=5):
    print('Enter {} inputs'.format(no_of_entries))
    val_list = []
    for i in range(no_of_entries):
        val_list.append(input('Enter Input {}:'.format(i + 1)))
    return val_list
get_user_val()
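The question also asks for at least 5 valid column names and for the smallest results from a column. A rough sketch of both parts follows. The function names choose_columns and smallest_values, the comma-separated input format, and the sample values are all assumptions made for illustration. The ask parameter exists only so that the prompt can be replaced by scripted answers in the demonstration:

```python
import heapq

def choose_columns(available, ask=input, minimum=5):
    """Keep asking until the user names at least `minimum` known columns."""
    while True:
        answer = ask('Enter at least {} column names, separated by commas: '.format(minimum))
        chosen = []
        for name in answer.split(','):
            name = name.strip()
            # Accept only known column names, and each one only once
            if name in available and name not in chosen:
                chosen.append(name)
        if len(chosen) >= minimum:
            return chosen
        print('Only {} valid column(s) recognised, please try again.'.format(len(chosen)))

def smallest_values(values, count):
    """Return the `count` smallest values in ascending order."""
    return heapq.nsmallest(count, values)

# Demonstration with scripted answers instead of a live prompt
columns = ['A', 'B', 'C', 'D', 'E', 'F']
scripted = iter(['A, B', 'A, B, C, D, F'])
picked = choose_columns(columns, ask=lambda prompt: next(scripted))
print(picked)
print(smallest_values([9424, 6352, 5854, 912], 2))
```

The first scripted answer is rejected because it names only two columns, and the second one is accepted.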

I hope I haven't misunderstood what you mean. Is the code below what you want?
You can put the data into a dict and then sort it.
Solution1
from io import StringIO
from collections import defaultdict
import csv
import random
import pprint
def random_price():
    return random.randint(1, 10000)
def create_test_data(n_row=4000, n_col=14, sep=','):
    columns = [chr(65+i) for i in range(n_col)] # A, B ...
    title = sep.join(columns)
    result_list = [title]
    for cur_row in range(n_row):
        result_list.append(sep.join([str(random_price()) for _ in range(n_col)]))
    return '\n'.join(result_list)
def main():
    if 'load CSV':
        test_content = create_test_data(n_row=10, n_col=5)
        dict_brand = defaultdict(list)
        with StringIO(test_content) as f:
            rows = csv.reader(f, delimiter=',')
            for idx, row in enumerate(rows):
                if idx == 0: # title
                    columns = row
                    continue
                for i, value in enumerate(row):
                    dict_brand[columns[i]].append(int(value))
        pprint.pprint(dict_brand, indent=4, compact=True, width=120)
    user_choice = input('input columns (brand)')
    number_of_results = 5 # input('...')
    watch_columns = user_choice.split(' ') # D E F
    for col_name in watch_columns:
        cur_brand_list = dict_brand[col_name]
        print(sorted(cur_brand_list, reverse=True)[:number_of_results])
        # print(f'{col_name} : {sorted(cur_brand_list)}') # ASC
        # print(f'{col_name} : {sorted(cur_brand_list, reverse=True)}') # DESC
if __name__ == '__main__':
    main()
defaultdict(<class 'list'>,
{ 'A': [9424, 6352, 5854, 5870, 912, 9664, 7280, 8306, 9508, 8230],
'B': [1539, 1559, 4461, 8039, 8541, 4540, 9447, 512, 7480, 5289],
'C': [7701, 6686, 1687, 3134, 5723, 6637, 6073, 1925, 4207, 9640],
'D': [4313, 3812, 157, 6674, 8264, 2636, 765, 2514, 9833, 1810],
'E': [139, 4462, 8005, 8560, 5710, 225, 5288, 6961, 6602, 4609]})
input columns (brand)C D
[9640, 7701, 6686, 6637, 6073]
[9833, 8264, 6674, 4313, 3812]
Solution2: Using Pandas
import pandas as pd
# StringIO and defaultdict are imported as in Solution1
def pandas_solution(test_content: str, watch_columns= ['C', 'D'], number_of_results=5):
    with StringIO(test_content) as f:
        df = pd.read_csv(StringIO(f.read()), usecols=watch_columns,
                         na_filter=False) # it can add performance (ignore na)
    dict_result = defaultdict(list)
    for col_name in watch_columns:
        dict_result[col_name].extend(df[col_name].sort_values(ascending=False).head(number_of_results).to_list())
    df = pd.DataFrame.from_dict(dict_result)
    print(df)
C D
0 9640 9833
1 7701 8264
2 6686 6674
3 6637 4313
4 6073 3812

openpyxl read tables from existing data book example?

In the openpyxl documentation there is an example of how to place a table into a workbook but there are no examples of how to find back the tables of a workbook. I have an XLS file that has named tables in it and I want to open the file, find all of the tables and parse them. I cannot find any documentation on how to do this. Can anyone help?
In the meantime I worked it out and wrote the following class to work with openpyxl:
class NamedArray(object):
    ''' Excel Named range object
    Reproduces the named range feature of Microsoft Excel
    Assumes a definition in the form <Worksheet PinList!$A$6:$A$52 provided by openpyxl
    Written for use with, and initialised by the get_names function
    After initialisation named array can be used in the same way as for VBA in excel
    Written for openpyxl version 2.4.1, may not work with earlier versions
    '''
    C_CAPS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    def __init__(self, wb, named_range_raw):
        ''' Initialise a NameArray object from the named_range_raw information in the given workbook
        '''
        self.sheet, cellrange_str = str(named_range_raw).split('!')
        self.sheet = self.sheet.replace("'",'') # remove the single quotes if they exist
        self.loc = wb[self.sheet]
        if ':' in cellrange_str:
            self.has_range = True
            self.has_value = False
            lo, hi = cellrange_str.split(':')
            self.ad_lo = lo.replace('$','')
            self.ad_hi = hi.replace('$','')
        else:
            self.has_range = False
            self.has_value = True
            self.ad_lo = cellrange_str.replace('$','')
            self.ad_hi = self.ad_lo
        self.row = self.get_row(self.ad_lo)
        self.max_row = self.get_row(self.ad_hi)
        self.rows = self.max_row - self.row + 1
        self.min_col = self.col_to_n(self.ad_lo)
        self.max_col = self.col_to_n(self.ad_hi)
        self.cols = self.max_col - self.min_col + 1
    def size_of(self):
        ''' Returns two dimensional size of named space
        '''
        return self.cols, self.rows
    def value(self, row=1, col=1):
        ''' Returns the value at row, col
        '''
        assert row <= self.rows , 'invalid row number given'
        assert col <= self.cols , 'invalid column number given'
        # Look the cell up by its coordinate string, e.g. "B7"
        return self.loc[self.n_to_col(self.min_col + col-1)+str(self.row + row-1)].value
    def __str__(self):
        ''' printed description of named space
        '''
        locs = 's ' + self.ad_lo + ':' + self.ad_hi if self.has_range else ' ' + self.ad_lo
        return('named range'+ str(self.size_of()) + ' in sheet ' + self.sheet + ' # location' + locs)
    def __contains__(self, val):
        rval = False
        for row in range(1,self.rows+1):
            for col in range(1,self.cols+1):
                if self.value(row,col) == val:
                    rval = True
        return rval
    def vlookup(self, key, col):
        ''' excel style vlookup function
        '''
        assert col <= self.cols , 'invalid column number given'
        rval = None
        for row in range(1,self.rows+1):
            if self.value(row,1) == key:
                rval = self.value(row, col)
                break
        return rval
    def hlookup(self, key, row):
        ''' excel style hlookup function
        '''
        assert row <= self.rows , 'invalid row number given'
        rval = None
        for col in range(1,self.cols+1):
            if self.value(1,col) == key:
                rval = self.value(row, col)
                break
        return rval
    @classmethod
    def get_row(cls, ad):
        ''' get row number from cell string
        Cell string is assumed to be in excel format i.e "ABC123" where row is 123
        '''
        row = 0
        for l in ad:
            if l in "1234567890":
                row = row*10 + int(l)
        return row
    @classmethod
    def col_to_n(cls, ad):
        ''' find column number from xl address
        Cell string is assumed to be in excel format i.e "ABC123" where column is ABC
        column number is the base-26 value of the letters with A=1 ... Z=26
        '''
        n = 0
        for l in ad:
            if l in cls.C_CAPS:
                n = n*26 + cls.C_CAPS.find(l)+1
        return n
    @classmethod
    def n_to_col(cls, n):
        ''' make xl column address from column number
        '''
        ad = ''
        while n > 0:
            # Subtract 1 first so that 26 maps to "Z" rather than "AZ"
            n, rem = divmod(n - 1, 26)
            ad = cls.C_CAPS[rem] + ad
        return ad
def get_names(workbook, filt='', debug=False):
    ''' Create a structure containing all of the names in the given workbook
    filt is an optional parameter and used to create a subset of names starting with filt
    useful for IO_ring_spreadsheet as all names start with 'n_'
    if present, filt characters are stripped off the front of the name
    '''
    named_ranges = workbook.defined_names.definedName
    name_list = {}
    for named_range in named_ranges:
        name = named_range.name
        if named_range.attr_text.startswith('#REF'):
            print('WARNING: named range "', name, '" is undefined')
        elif filt == '' or name.startswith(filt):
            name_list[name[len(filt):]] = NamedArray(workbook, named_range.attr_text)
    if debug:
        with open("H:\\names.txt",'w') as log:
            for item in name_list:
                print (item, '=', name_list[item])
                log.write(item.ljust(30) + ' = ' + str(name_list[item])+'\n')
    return name_list

I agree that the documentation does not really help, and the public API also seems to have only add_table() method.
But then I found an openpyxl Issue 844 asking for a better interface, and it shows that worksheet has an _tables property.
This is enough to get a list of all tables in a file, together with some basic properties:
from openpyxl import load_workbook
wb = load_workbook(filename = 'test.xlsx')
for ws in wb.worksheets:
    print("Worksheet %s includes %d tables:" % (ws.title, len(ws._tables)))
    for tbl in ws._tables:
        print(" : " + tbl.displayName)
        print(" - name = " + tbl.name)
        print(" - type = " + (tbl.tableType if isinstance(tbl.tableType, str) else 'n/a'))
        print(" - range = " + tbl.ref)
        print(" - #cols = %d" % len(tbl.tableColumns))
        for col in tbl.tableColumns:
            print(" : " + col.name)
Note that the if/else construct is required for the tableType, since it can return NoneType (for standard tables), which is not convertible to str.

Building on @MichalKaut's answer, I created a simple function that returns a dictionary with all tables in a given workbook. It also puts each table's data into a Pandas DataFrame.
from openpyxl import load_workbook
import pandas as pd
def get_all_tables(filename):
    """ Get all tables from a given workbook. Returns a dictionary of tables.
    Requires a filename, which includes the file path and filename. """
    # Load the workbook, from the filename, setting read_only to False
    wb = load_workbook(filename=filename, read_only=False, keep_vba=False, data_only=True, keep_links=False)
    # Initialize the dictionary of tables
    tables_dict = {}
    # Go through each worksheet in the workbook
    for ws_name in wb.sheetnames:
        print("")
        print(f"worksheet name: {ws_name}")
        ws = wb[ws_name]
        print(f"tables in worksheet: {len(ws.tables)}")
        # Get each table in the worksheet
        for tbl in ws.tables.values():
            print(f"table name: {tbl.name}")
            # First, add some info about the table to the dictionary
            tables_dict[tbl.name] = {
                'table_name': tbl.name,
                'worksheet': ws_name,
                'num_cols': len(tbl.tableColumns),
                'table_range': tbl.ref}
            # Grab the 'data' from the table
            data = ws[tbl.ref]
            # Now convert the table 'data' to a Pandas DataFrame
            # First get a list of all rows, including the first header row
            rows_list = []
            for row in data:
                # Get a list of all columns in each row
                cols = []
                for col in row:
                    cols.append(col.value)
                rows_list.append(cols)
            # Create a pandas dataframe from the rows_list.
            # The first row is the column names
            df = pd.DataFrame(data=rows_list[1:], index=None, columns=rows_list[0])
            # Add the dataframe to the dictionary of tables
            tables_dict[tbl.name]['dataframe'] = df
    return tables_dict
# File location:
file = r"C:\Users\sean\spreadsheets\full_of_tables.xlsx"
# Run the function to return a dictionary of all tables in the Excel workbook
tables_dict = get_all_tables(filename=file)

The answer to this has changed.
ws objects now contain the tables accessor which acts as a dictionary. Updated answer is:
tmp = [ws.tables for ws in wb.worksheets]
tbls = [{v.name:v} for t in tmp for v in t.values()]

I'm not sure what you mean by parsing but read-support for worksheet tables has been possible since version 2.4.4. If you have questions about the details then I suggest you ask your question on the openpyxl mailing list as that is a more suitable place for this kind of discussion.

I don't think this is possible. It seems to work similarly to images; if you read and save a file with a table, the table will get stripped.

Setting row Span in QTableView using Python?

I am trying to set a row span on the second column of my QTableView, but I am missing something in the logic. I am only able to get A and B but not C. I also get the warnings QTableView::setSpan: span cannot overlap and QTableView::setSpan: single cell span won't be added.
My code snippet is:
startspan = 0
for i, tcname in enumerate(tcfilename):
    if tcfilename[i]:
        if i > 0:
            print '#######################'
            print 'startspan = '+str(startspan)+' i = '+str(i)
            if tcname == tcfilename[i-1]:
                #setSpan (row, column, rowSpan, columnSpan)
                print 'if (from_row, till_row) '+str(startspan)+' '+str(i)
                table_view.setSpan(startspan, 1, i, 1);
            elif tcname != tcfilename[i-1]:
                print 'Else no span (from_row, till_row) '+str(startspan)+' '+str(i)
                table_view.setSpan(startspan, 1, i, 1);
                if i == 1:
                    startspan = 0
                else:
                    startspan = i
    else:
        break

I did this with the simple two-line code below:
for toRow, tcname in enumerate(tcfilename):
    table_view.setSpan(tcfilename.index(tcname), 1, tcfilename.count(tcname), 1)
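The index()/count() approach above assumes that equal names always sit on adjacent rows, and it calls setSpan once per row, including for names that occur only once. As a possible alternative, the spans can be worked out first from runs of adjacent equal values. This is only a sketch in plain Python; the function name consecutive_spans and the sample names are made up:

```python
from itertools import groupby

def consecutive_spans(values):
    """Return (start_row, row_count) for each run of equal adjacent values longer than one row."""
    spans = []
    row = 0
    for _, run in groupby(values):
        length = len(list(run))
        if length > 1:
            spans.append((row, length))
        row += length
    return spans

names = ['A', 'A', 'B', 'B', 'B', 'C']
spans = consecutive_spans(names)
print(spans)  # [(0, 2), (2, 3)]
```

Each tuple could then be passed on as table_view.setSpan(start, 1, count, 1), so that single rows are skipped and no span is set twice.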

I made a nifty little function to solve this. It originally used recursion, but I then optimized it to work without recursion. Feed it a table and a data set:
def my_span_checker(self, my_data, table):
    for i in range(len(my_data)):
        my_item_count = 0
        my_label = table.item(i, 0).text()
        # Count how many rows carry the same label in column 0
        for j in range(len(my_data)):
            if table.item(j, 0).text() == my_label:
                my_item_count += 1
        if my_item_count != 1:
            table.setSpan(i, 0, my_item_count, 1)
