I have an .xlsx file like this:
item price
foo 5$
poo 3$
woo 7$
moo 2$
I want to use openpyxl to open the file and add a new column to it like this:
item price owner
foo 5$ Jim owns foo
poo 3$ Jack owns poo
woo 7$ John owns woo
moo 2$ Jay owns moo
Can anyone help me with how to do it?
My code:
import pandas as pd
file_location = 'excel_name.xlsx'
df = pd.read_excel(file_location, engine='openpyxl')
for item in df['item']:
    df['sub'].append(f'bla bla owns {item}')
df.to_excel('excel_me.xlsx', engine='openpyxl')
You don't need pandas for something this simple.
Assuming the 'item' column is column A:
import openpyxl as op
wb = op.load_workbook(r'foo.xlsx')
ws = wb["Sheet1"]
owner_list = ['owner', 'Jim', 'Jack', 'John', 'Jay']
for enum, cell in enumerate(ws['A']):
    row = enum+1
    if enum == 0:
        ws.cell(row=row, column=3).value = owner_list[enum]
    else:
        ws.cell(row=row, column=3).value = owner_list[enum] + " owns " + cell.value
wb.save('foo.xlsx')
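If the owners are kept in their own list, pairing them with the items via zip() and checking the lengths up front avoids the enum/row bookkeeping and catches a missing owner early. A minimal sketch; `owner_column` is a hypothetical helper name, and it assumes the owners list follows the same row order as the sheet:

```python
def owner_column(items, owners, header="owner"):
    """Build the values for a new column: a header cell, then '<owner> owns <item>' per row.

    Assumes owners[i] belongs to items[i], i.e. both lists follow the sheet's row order.
    """
    if len(items) != len(owners):
        raise ValueError(f"got {len(items)} items but {len(owners)} owners")
    return [header] + [f"{owner} owns {item}" for owner, item in zip(owners, items)]


values = owner_column(["foo", "poo", "woo", "moo"], ["Jim", "Jack", "John", "Jay"])
print(values)  # ['owner', 'Jim owns foo', 'Jack owns poo', 'John owns woo', 'Jay owns moo']
```

Inside the openpyxl script above, the items could be read with `[c.value for c in ws['A'][1:]]` (skipping the header cell) and the result written to column C with `for row, value in enumerate(values, start=1): ws.cell(row=row, column=3, value=value)` before saving.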
I need to extract unique names with titles such as Lord, Baroness, Lady or Baron from a text and match them against another list. I'm struggling to get the right result and hope the community can help me. Thanks!
import re
def get_names(text):
    # find noble titles and grab each one with the following name
    match = re.compile(r'(Lord|Baroness|Lady|Baron) ([A-Z][a-z]+) ([A-Z][a-z]+)')
    names = list(set(match.findall(text)))
    # remove duplicates based on the index in tuples
    names_ = list(dict((v[1],v) for v in sorted(names, key=lambda names: names[0])).values())
    names_lst = list(set([' '.join(map(str, name)) for name in names_]))
    return names_lst
text = 'Baroness Firstname Surname and Baroness who is also known as Lady Anothername and Lady Surname or Lady Firstname.'
names_lst = get_names(text)
print(names_lst)
This currently yields: ['Baroness Firstname Surname']
Desired output: ['Baroness Firstname Surname', 'Lady Anothername'], but NOT Lady Surname or Lady Firstname.
Then I need to match the result against this list:
other_names = ['Firstname Surname', 'James', 'Simon Smith']
and drop the element 'Firstname Surname' from it, because it matches the first name and surname of the Baroness in the desired output.
I suggest the following solution:
import re
def get_names(text):
    # find noble titles and grab each one with the following one or two names
    match = re.compile(r'(Lord|Baroness|Lady|Baron) ([A-Z][a-z]+)[ ]?([A-Z][a-z]+)?')
    names = list(match.findall(text))
    # keep only the first name found for each title
    d = {}
    for name in names:
        if name[0] not in d:
            d[name[0]] = ' '.join(name[1:3]).strip()
    return d
text = 'Baroness Firstname Surname and Baroness who is also known as Lady Anothername and Lady Surname or Lady Firstname.'
other_names = ['Firstname Surname', 'James', 'Simon Smith']
names_dict = get_names(text)
print(names_dict)
# {'Baroness': 'Firstname Surname', 'Lady': 'Anothername'}
print([' '.join([k,v]) for k,v in names_dict.items()])
# ['Baroness Firstname Surname', 'Lady Anothername']
other_names_dropped = [name for name in other_names if name not in names_dict.values()]
print(other_names_dropped)
# ['James', 'Simon Smith']
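The solution above keeps one name per title, so a second, unrelated Lady later in the text would also be dropped. If "unique" is meant as in the desired output (Lady Surname and Lady Firstname are excluded because those words already belong to the Baroness), one alternative is to skip any match that reuses a name word already seen. A minimal sketch under that assumption; `get_titled_names` and `drop_matched` are hypothetical helper names:

```python
import re

# Title, then one capitalised word, optionally followed by a second one.
TITLED_NAME = re.compile(r"\b(Lord|Baroness|Lady|Baron) ([A-Z][a-z]+(?: [A-Z][a-z]+)?)")


def get_titled_names(text):
    """Return 'Title Name' strings, skipping matches that reuse an already-seen name word."""
    seen_words = set()
    result = []
    for title, name in TITLED_NAME.findall(text):
        words = set(name.split())
        if words & seen_words:  # e.g. 'Lady Surname' after 'Baroness Firstname Surname'
            continue
        seen_words |= words
        result.append(f"{title} {name}")
    return result


def drop_matched(other_names, titled_names):
    """Remove entries of other_names that equal a titled name without its title."""
    bare = {name.split(" ", 1)[1] for name in titled_names}
    return [n for n in other_names if n not in bare]


text = ('Baroness Firstname Surname and Baroness who is also known as '
        'Lady Anothername and Lady Surname or Lady Firstname.')
found = get_titled_names(text)
print(found)  # ['Baroness Firstname Surname', 'Lady Anothername']
print(drop_matched(['Firstname Surname', 'James', 'Simon Smith'], found))  # ['James', 'Simon Smith']
```

The trade-off is that two different people who share a first name or surname would be merged, so which rule fits depends on the texts being processed.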
I receive an XML document with many child elements from which I need to extract information and then export it to a CSV or text file so I can import it into QuickBooks. The XML tree looks like the following:
<MODocuments>
<MODocument>
<Document>TX1126348</Document>
<DocStatus>P</DocStatus>
<DateIssued>20180510</DateIssued>
<ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
<MOLots>
<MOLot>
<LotID>A</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>15500</TotalPounds>
</MOLot>
<MOLot>
<LotID>B</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>175</TotalPounds>
</MOLot>
<MOLot>
<LotID>C</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>7500</TotalPounds>
</MOLot>
<MOLot>
<LotID>D</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>300</TotalPounds>
</MOLot>
</MOLots>
</MODocument>
<MODocument>
<Document>TX1126349</Document>
<DocStatus>P</DocStatus>
<DateIssued>20180511</DateIssued>
<ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
<MOLots>
<MOLot>
<LotID>A</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>25200</TotalPounds>
</MOLot>
<MOLot>
<LotID>B</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>16800</TotalPounds>
</MOLot>
</MOLots>
</MODocument>
<MODocument>
<Document>TX1126350</Document>
<DateIssued>20180511</DateIssued>
<ApplicantName>COMPANY FRUIT &amp; VEGETABLE</ApplicantName>
<MOLots>
<MOLot>
<LotID>A</LotID>
<ProductVariety>Yellow</ProductVariety>
<TotalPounds>14100</TotalPounds>
</MOLot>
</MOLots>
</MODocument>
</MODocuments>
I need to add up the TotalPounds for each MODocument parent, so the output lists the DOCUMENT number, the APPLICANT NAME, and the TOTAL POUNDS summed over all the MOLots in that document:
TX1126348 COMPANY FRUIT & VEGETABLE 23475
TX1126349 COMPANY FRUIT & VEGETABLE 42000
TX1126350 COMPANY FRUIT & VEGETABLE 14100
Here's the code I'm working with:
import xml.etree.ElementTree as ET
tree = ET.parse('TX_959_20180514131311.xml')
root = tree.getroot()
docCert = []
docComp = []
totalPounds=[]
for MODocuments in root:
    for MODocument in MODocuments:
        docCert.append(MODocument.find('Document').text)
        docComp.append(MODocument.find('ApplicantName').text)
        for MOLots in MODocument:
            for MOLot in MOLots:
                totalPounds.append(int(MOLot.find('TotalPounds').text))
for i in range(len(docCert)):
    print(i, docCert[i],' ', docComp[i], totalPounds[i])
This is my output, and I don't know how to add up the totals for each document. Please help.
0 TX1126348 COMPANY FRUIT & VEGETABLE 15500
1 TX1126349 COMPANY FRUIT & VEGETABLE 175
2 TX1126350 COMPANY FRUIT & VEGETABLE 7500
If you can use lxml, you can have the XPath sum() function sum all of the TotalPounds for you.
Example...
from lxml import etree
import csv
tree = etree.parse("TX_959_20180514131311.xml")
with open("output.csv", "w", newline="") as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    for mo_doc in tree.xpath("/MODocuments/MODocument"):
        # XPath sum() returns a float, hence the int()
        csvwriter.writerow([mo_doc.xpath("Document")[0].text,
                            mo_doc.xpath("ApplicantName")[0].text,
                            int(mo_doc.xpath("sum(MOLots/MOLot/TotalPounds)"))])
Contents of "output.csv":
TX1126348,COMPANY FRUIT & VEGETABLE,23475
TX1126349,COMPANY FRUIT & VEGETABLE,42000
TX1126350,COMPANY FRUIT & VEGETABLE,14100
Writing the output with the csv module also gives you a lot of control over quoting, delimiters, etc.
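For example, producing tab-separated output with every field quoted only changes the writer's arguments. A minimal sketch; `write_rows` is a hypothetical helper, and the sample row is one of the results shown above:

```python
import csv
import io


def write_rows(rows, out, delimiter="\t", quoting=csv.QUOTE_ALL):
    """Write rows to a file-like object using the given delimiter and quoting policy."""
    writer = csv.writer(out, delimiter=delimiter, quoting=quoting)
    writer.writerows(rows)


buf = io.StringIO()
write_rows([["TX1126348", "COMPANY FRUIT & VEGETABLE", 23475]], buf)
print(repr(buf.getvalue()))
```

With QUOTE_ALL even the numeric total is quoted; csv.QUOTE_MINIMAL (as in the answer) only quotes fields that contain the delimiter, a quote character or a line break.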
Your loop appends one entry to totalPounds for every MOLot, but only one entry to docCert and docComp per document, so the lists end up with different lengths and the indexes no longer line up. Sum the lots into a subtotal per document instead:
for MODocuments in root:
    for MODocument in MODocuments:
        docCert.append(MODocument.find('Document').text)
        docComp.append(MODocument.find('ApplicantName').text)
        sub_total = 0
        for MOLots in MODocument:
            for MOLot in MOLots:
                sub_total += int(MOLot.find('TotalPounds').text)
        # one total per document, so the three lists stay aligned
        totalPounds.append(sub_total)
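If lxml is not available, the standard library's ElementTree can do the same summing with findall() and a relative path. A minimal sketch; `document_totals` and `write_totals_csv` are hypothetical names, and root.iter() is used so that the MODocument elements are found whether they sit directly under the root or under an extra wrapper element:

```python
import csv
import xml.etree.ElementTree as ET


def document_totals(root):
    """Yield (document, applicant, total_pounds) for every MODocument below root."""
    for mo_doc in root.iter("MODocument"):
        total = sum(int(tp.text) for tp in mo_doc.findall("MOLots/MOLot/TotalPounds"))
        yield (mo_doc.findtext("Document"), mo_doc.findtext("ApplicantName"), total)


def write_totals_csv(root, out):
    """Write one CSV row per document to an open text file (or file-like object)."""
    csv.writer(out).writerows(document_totals(root))


# Usage (file name taken from the question):
# root = ET.parse("TX_959_20180514131311.xml").getroot()
# with open("output.csv", "w", newline="") as f:
#     write_totals_csv(root, f)
```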
I've been working on a function that updates two dictionaries (similar authors, and awards they've won) from an open text file. The text file looks something like this:
Brabudy, Ray
Hugo Award
Nebula Award
Saturn Award
Ellison, Harlan
Heinlein, Robert
Asimov, Isaac
Clarke, Arthur
Ellison, Harlan
Nebula Award
Hugo Award
Locus Award
Stephenson, Neil
Vonnegut, Kurt
Morgan, Richard
Adams, Douglas
And so on. The first line of each entry is the author's name (last name first, first name last), followed by any awards they have won, and then authors who are similar to them. This is what I've got so far:
def load_author_dicts(text_file, similar_authors, awards_authors):
    name_of_author = True
    awards = False
    similar = False
    for line in text_file:
        if name_of_author:
            author = line.split(', ')
            nameA = author[1].strip() + ' ' + author[0].strip()
            name_of_author = False
            awards = True
            continue
        if awards:
            if ',' in line:
                awards = False
                similar = True
            else:
                if nameA in awards_authors:
                    listawards = awards_authors[nameA]
                    listawards.append(line.strip())
                else:
                    listawards = []
                    listawards.append(line.strip())
                    awards_authors[nameA] = listawards
        if similar:
            if line == '\n':
                similar = False
                name_of_author = True
            else:
                sim_author = line.split(', ')
                nameS = sim_author[1].strip() + ' ' + sim_author[0].strip()
                if nameA in similar_authors:
                    similar_list = similar_authors[nameA]
                    similar_list.append(nameS)
                else:
                    similar_list = []
                    similar_list.append(nameS)
                    similar_authors[nameA] = similar_list
                continue
This works great! However, if the text file contains an entry with just a name (i.e. no awards and no similar authors), it breaks everything and raises IndexError: list index out of range at this line: nameS = sim_author[1].strip() + ' ' + sim_author[0].strip()
How can I fix this? Maybe with a try/except block there?
Also, I wouldn't mind getting rid of those continue statements, but I wasn't sure how else to keep the loop going. I'm still pretty new to this, so any help would be much appreciated! Every change I try breaks another section I didn't want changed, so I figured I'd ask the experts.
How about doing it this way: just get the data in first, then manipulate the dictionary any way you want.
test.txt contains your data:
Brabudy, Ray
Hugo Award
Nebula Award
Saturn Award
Ellison, Harlan
Heinlein, Robert
Asimov, Isaac
Clarke, Arthur
Ellison, Harlan
Nebula Award
Hugo Award
Locus Award
Stephenson, Neil
Vonnegut, Kurt
Morgan, Richard
Adams, Douglas
And here is my code to parse it, award_parse.py:
data = {}
name = ""
awards = []
with open("test.txt") as f:
    for l in f:
        # don't process blank lines
        if not l.strip() == "":
            # if this is a name and we're not already working on an author, set the author;
            # otherwise treat it as a new author and store the current one in the dictionary
            if "," in l and len(name) == 0:
                name = l.strip()
            elif "," in l and len(name) > 0:
                # if the recipient is already in the dictionary, extend the existing list
                if name not in data:
                    data[name] = awards
                else:
                    data[name].extend(awards)
                name = l.strip()
                awards = []
            # any non-blank line without a "," is an award
            else:
                awards.append(l.strip())
# store the last author, which the loop never reaches
if name:
    data.setdefault(name, []).extend(awards)
for k, v in data.items():
    print("%s got the following awards: %s" % (k, v))
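To address the original question directly: one way to avoid both the IndexError and the continue statements is to decide what each line is from its own content rather than from a set of state flags. A minimal sketch, assuming (as the `line == '\n'` check in the question suggests) that entries are separated by blank lines, that award lines never contain a comma, and that similar-author lines always do; `flip_name` is a hypothetical helper:

```python
def flip_name(line):
    """Turn 'Last, First' into 'First Last'; return None if the line has no comma."""
    last, sep, first = line.partition(",")
    if not sep:
        return None
    return f"{first.strip()} {last.strip()}"


def load_author_dicts(lines, similar_authors, awards_authors):
    """Fill both dictionaries from blank-line-separated author entries."""
    author = None  # None means the next non-blank line starts a new entry
    for raw in lines:
        line = raw.strip()
        if not line:
            author = None                       # blank line ends the current entry
        elif author is None:
            author = flip_name(line) or line    # first line of an entry is the author
        elif "," not in line:
            awards_authors.setdefault(author, []).append(line)
        else:
            similar_authors.setdefault(author, []).append(flip_name(line))


# Usage:
# similar_authors, awards_authors = {}, {}
# with open("test.txt") as f:
#     load_author_dicts(f, similar_authors, awards_authors)
```

An entry with only a name simply adds nothing to either dictionary. If such authors should appear with empty lists, setdefault() could be called on both dictionaries when the author line is read.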
I have a dictionary in Python 2.7.9, and I want to present its data in a spreadsheet. How can I accomplish this? Note that the dictionary has more than 15 items.
Dictionary:
{'Leda Doggslife': '$13.99', 'Carson Busses': '$29.95', 'Derri Anne Connecticut': '$19.25', 'Bobbi Soks': '$5.68', 'Ben D. Rules': '$7.50', 'Patty Cakes': '$15.26', 'Ira Pent': '$16.27', 'Moe Tell': '$10.09', 'Ido Hoe': '$14.47', 'Ave Sectomy': '$50.85', 'Phil Meup': '$15.98', 'Al Fresco': '$8.49', 'Moe Dess': '$19.25', 'Sheila Takya': '$15.00', 'Earl E. Byrd': '$8.37', 'Rose Tattoo': '$114.07', 'Gary Shattire': '$14.26', 'Len Lease': '$11.11', 'Howie Kisses': '$15.86', 'Dan Druff': '$31.57'}
Are you trying to write your dictionary to an Excel spreadsheet?
In that case, you could use the win32com library (part of pywin32; it runs on Windows and drives an installed copy of Excel):
import win32com.client
xlApp = win32com.client.DispatchEx('Excel.Application')
xlApp.Visible = 0
xlBook = xlApp.Workbooks.Open(my_filename)
sht = xlBook.Worksheets(my_sheet)
row = 1
# my_dict is your dictionary: one key/value pair per row, in columns A and B
for key, value in my_dict.items():
    sht.Cells(row, 1).Value = key
    sht.Cells(row, 2).Value = value
    row += 1
xlBook.Save()
xlBook.Close()
# shut down the Excel instance started by DispatchEx
xlApp.Quit()
Note that this code works only if the workbook already exists.
Otherwise, create a new one:
import win32com.client
xlApp = win32com.client.DispatchEx('Excel.Application')
xlApp.Visible = 0
xlBook = xlApp.Workbooks.Add()
# a new workbook only has the default sheet(s), so use the first one
sht = xlBook.Worksheets(1)
row = 1
for key, value in my_dict.items():
    sht.Cells(row, 1).Value = key
    sht.Cells(row, 2).Value = value
    row += 1
xlBook.SaveAs(my_filename)
xlBook.Close()
xlApp.Quit()
I hope this answers your question.
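If the goal is only to get the data into a file Excel can open, and automating Excel itself is not required, the standard csv module is an alternative that needs neither win32com nor an Excel installation. A minimal sketch written for Python 3 (on Python 2.7 the file would be opened in binary mode, 'wb', since open() has no newline argument there); `write_dict_csv`, the header names and prices.csv are hypothetical:

```python
import csv
import io


def write_dict_csv(data, out, header=("Name", "Amount")):
    """Write a header row, then one (key, value) row per dictionary entry, sorted by key."""
    writer = csv.writer(out)
    writer.writerow(header)
    for name, amount in sorted(data.items()):
        writer.writerow([name, amount])


# Small in-memory demo; with a real file use:
# with open("prices.csv", "w", newline="") as f:
#     write_dict_csv(my_dict, f)
buf = io.StringIO()
write_dict_csv({"Bobbi Soks": "$5.68", "Al Fresco": "$8.49"}, buf)
print(buf.getvalue())
```

Sorting by key is just a choice to make the output stable; a plain dict in Python 2.7 has no guaranteed order.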
I have the following type of document, where each person might have a couple of names and an associated description of features:
New person
name: ana
name: anna
name: ann
feature: A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.
New person
name: tom
name: thomas
name: thimoty
name: tommy
feature: A 32-year old male that is known to be deaf.
New person
.....
What I would like is to read this file into a Python dictionary where each new person gets an ID.
For example, the person with ID 1 would have the names ['ann','anna','ana']
and the feature ['A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.']
Any suggestions?
Assuming that your input file is lo.txt, it can be loaded like this (the result is a list with one dictionary per person, so the list index serves as the ID):
final_data = []
names = []
with open('lo.txt') as file:
    for line in file:
        line = line.strip()
        if line.startswith("feature:"):
            # split only at the first ":" so features containing ":" stay whole
            feature = line.split(":", 1)[1].strip()
            # a feature line closes the current person's record
            final_data.append({
                'names': names,
                'feature': feature
            })
            names = []
        elif line.startswith("name:"):
            names.append(line.split(":", 1)[1].strip())
print(final_data)
Something like this might work:
result = {}
with open("document.txt") as f:
    contents = f.read()
# each record starts with a "New person" line; drop the empty chunk before the first one
info = [chunk for chunk in contents.split('New person') if chunk.strip()]
for i, chunk in enumerate(info):
    names = []
    features = []
    for line in chunk.split('\n'):
        if ':' not in line:
            continue
        key, value = line.split(':', 1)
        if key.strip() == 'name':
            names.append(value.strip())
        else:
            features.append(value.strip())
    result[i] = {'names': names, 'features': features}
print(result)
This should give you something like:
{0: {'names': ['ana', 'anna', 'ann'], 'features': ['...']}, 1: {...}}
etc.
Here is code that may work for you:
with open("documents.txt") as fh:
    f = [i.strip('\n') for i in fh]
# assumes the file holds a single person, with the feature on the last line
final_condition = f[-1]
f.remove(final_condition)
names = [i.split(":", 1)[1] for i in f if i.startswith("name:")]
the_dict = {}
the_dict["names"] = names
the_dict["features"] = final_condition
print(the_dict)
All it does is split the remaining lines at ":" and keep the part after it as the names; the last line of the file is taken as the feature. Note that it only handles a file containing a single person.
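The question asks for IDs starting at 1, with a list of names and a list of features per person. A minimal sketch along those lines, assuming each record begins with a literal "New person" line and every other relevant line has the form key: value (lines without a colon, such as the "....." placeholder, are ignored); `parse_people` is a hypothetical name:

```python
def parse_people(lines):
    """Group 'name:' and 'feature:' lines into records keyed by a 1-based person ID."""
    people = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line == "New person":
            current = {"names": [], "features": []}
            people[len(people) + 1] = current  # IDs 1, 2, 3, ...
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)  # split once: feature text may contain ':'
            key = key.strip()
            if key == "name":
                current["names"].append(value.strip())
            elif key == "feature":
                current["features"].append(value.strip())
    return people


# Usage:
# with open("lo.txt") as f:
#     people = parse_people(f)
```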