Python Text Widget appending tab characters ("\t")

I have a function in my code that just opens a dialog to pick a text file and extracts some specific information from it, which is just basic numbers. I'm extracting them by column, since they display like this:
1[tab]2.942 2,885 3,013 2170 745 2,91 0,00000 0,000
2[tab]3,065 3,013 3,129 2500 834 3,00 0,00000 V 0,042
3[tab]3,188 3,129 3,261 9813 2449 4,01 0,00000 V 0,084
4[tab]3,307 3,261 3,409 4990 891 5,60 0,00000 V 0,124 [...]
[...]
10[tab]4,731 4,661 4,793 6855 2037 3,37 0,00000 T 0,608
11[tab]4,842 4,793 4,941 2834 829 3,42 0,00000 TV 0,646
I just need the first numbers (the enumerators) to print in the Text widget. The problem is that I'm extracting them by slicing columns [0:2], and the first 9 numbers are single digits only, so the widget appends the tab character after 1, 2, 3, 4, 5, 6, 7, 8, 9... and prints like:
1
whitespace
2
whitespace
3
whitespace
10
11
12
I want to remove these spaces.
Here's the code:
def grab_file(self):
    self.f = tkFileDialog.askopenfile(mode="r", filetypes=(("Description file", "*.desc"), ("All files", "*.*")))
    self.data_list = self.f.readlines()
    self.f.close()
    del self.data_list[141:]
    del self.data_list[0:62]
    for self.line in self.data_list:
        self.lines = self.line.strip().split("\t")
        self.grab_file_irk.append(self.lines[1])
    self.grab_file_irk = [w.replace(',', '.') for w in self.grab_file_irk]
    self.grab_file_irk.reverse()
    for i in self.grab_file_irk:
        self.grab_file_floated.append(float(i))
    for self.column in self.data_list:
        self.entry0.insert("end", self.column[0:2] + "\n")
Thanks for your time.

You could either find the index of the first tab character directly:
self.column[:self.column.find('\t')]
or use regex:
import re
m = re.match(r'(\d+)', self.column)
s_num = m.group(0)
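For example, plugging the first suggestion into the insertion loop from the question might look roughly like this (only a sketch, and it assumes every line in self.data_list contains at least one tab, as in the sample above):

for self.column in self.data_list:
    # take everything up to (but not including) the first tab character
    tab_index = self.column.find("\t")
    self.entry0.insert("end", self.column[:tab_index] + "\n")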

Related

How do I make it stop storing everything in the first element of the list?

I am trying to have each line be stored in a different element of the list. The text file is as follows...
244
Large Cake Pan
7
19.99
576
Assorted Sprinkles
3
12.89
212
Deluxe Icing Set
6
37.97
827
Yellow Cake Mix
3
1.99
194
Cupcake Display Board
2
27.99
285
Bakery Boxes
7
8.59
736
Mixer
5
136.94
I am trying to have 244, 576, etc. go into ID, and "Large Cake Pan", "Assorted Sprinkles", etc. into Name. You get the idea, but it's storing everything in ID, and I don't know how to make it store each piece of information in its corresponding element.
Here is my code so far:
import Inventory

def process_inventory(filename, inventory_dict):
    inventory_dict = {}
    inventory_file = open(filename, "r")
    for line in inventory_file:
        line = line.split('\n')
        ID = line[0]
        Name = line[1]
        Quantity = line[2]
        Price = line[3]
        my_inventory = Inventory.Inventory(ID, Name, Quantity, Price)
        inventory_dict[ID] = my_inventory
    inventory_file.close()
    return inventory_dict

def main():
    inventory1 = {}
    process_inventory("Inventory.txt", inventory1)
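For reference, one way to walk the file in four-line blocks might look roughly like this. It is only a sketch: it assumes the file strictly repeats ID, name, quantity, and price on consecutive lines, and that Inventory.Inventory accepts those four arguments.

import Inventory

def process_inventory(filename):
    inventory_dict = {}
    with open(filename, "r") as inventory_file:
        # strip the trailing newline from every line first
        lines = [line.strip() for line in inventory_file]
    # step through the list four lines at a time: ID, Name, Quantity, Price
    for i in range(0, len(lines), 4):
        ID, Name, Quantity, Price = lines[i:i + 4]
        inventory_dict[ID] = Inventory.Inventory(ID, Name, Quantity, Price)
    return inventory_dict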

Extracting numbers in text file

I have a text file which came from Excel. I don't know how to take five digits after a specific character.
I want to take only the five digits after #ACA in the text file.
My text is like:
ERROR_MESSAGE
(((#ACA16018)|(#ACA16019))&(#AQV71767='')&(#AQV71765='2'))?1:((#AQV71765='4')?1:((#AQV71767$'')?(((#AQV71765='1')|(#AQV71765='3'))?1:'Hasar veya Lehe Hukuk seçebilirsiniz'):'Rücu sıra numarasını yazıp Hasar veya Lehe Hukuk seçebilirsiniz'))
Rücu Oranı Girilmesi Zorunludur...'
#ACA17660
#ACA16560
#ACA15623
#ACA17804
BU ALANI BOŞ GEÇEMEZSİNİZ.EKSPER RAPORU GELMEDEN DY YE GERİ GÖNDEREMEZSİNİZ. PERT İHBARI VARSA PERT ÇALINMA OPERASYONU AKTİVİTESİ OLUŞTURULMALIDIR.
(#TSC[T008UNSMAS;FIRM_CODE=2 AND UNIT_TYPE='SG' AND UNIT_NO=#AQV71830]>0)?1:'Girdiğiniz değer fihristte yoktur'
#ACA17602
#ACA17604
#ACA56169
BU ALANI BOŞ GEÇEMEZSİNİZ
#ACA17606
#ACA17608
(#AQV71835='')?'Boş geçilemez':1
Lütfen Gönderilecek Kişinin Mail Adresini Giriniz ! '
LÜTFEN RED NEDENİNİ GİRİNİZ.
EKSİK BİLGİ / BELGE ALANINA GİRMİŞ OLDUĞUNUZ DEĞER YANLIŞ VEYA GEÇERŞİZDİR!!! LÜTFEN KONTROL EDİP TEKRAR DENEYİNİZ.'
BU ALAN BOŞ GEÇİLEMEZ. ÖDEME YAPILMADAN EK ÖDEME SÜRECİNİ BAŞLATAMAZSINIZ.
ONAYLANDI VE REDDEDİLDİ SEÇENEKLERİNİ KULLANAMAZSINIZ
BU ALAN BOŞ GEÇİLEMEZ.EVRAKLARINIZI , VARSA EKSPER RAPORUNU VE MUALLAĞI KONTROL EDİNİZ.
Muallak Tutarını kontrol ediniz.
'OTO BRANŞINDA REDDEDİLDİ NEDENİ SEÇMELİSİNİZ'
'OTODIŞI BRANŞINDA REDDEDİLDİ NEDENİ SEÇMELİSİNİZ'
(#AQV70003$'')?((#TSC[T001HASIHB;FIRM_CODE=#FP10100 AND COMPANY_CODE=2 AND CLAIM_NO=#AQV70003]$0)?1:'Bu dosya sistemde bulunmamaktadır'):'Bu alan boş geçilemez'
(#AQV70503='')?'Bu alan boş geçilemez.':((#ACA18635=1)?1:'Mağdura ait uygun kriterli ödeme kaydı mevcut değildir.')
(#AQV71809=0)?'Boş geçilemez':1
(#FD101AQV71904_AFDS<0)?'Tarih bugünün tarihinden büyük olamaz
I want to take the 5 digits that come after each #ACA, so:
16018, 16019, 17660, etc...
grep -oP '#ACA\K[0-9]{5}' file.txt
#ACA\K matches #ACA but does not include it in the printed output
[0-9]{5} matches the five digits following #ACA
If a variable number of digits is needed, use
grep -oP '#ACA\K[0-9]+' file.txt
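If you want the same behaviour from Python rather than the shell, a rough equivalent uses a lookbehind, which (like \K) keeps #ACA out of the result; this sketch assumes the data sits in file.txt:

import re

with open("file.txt", encoding="utf-8") as f:
    text = f.read()

# (?<=#ACA) requires #ACA immediately before the match without capturing it;
# \d{5} then grabs the five digits that follow.
print(re.findall(r"(?<=#ACA)\d{5}", text))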
If you don't know or don't like regular expressions, you can do this, although the code is a bit longer:
if __name__ == '__main__':
    pattern = '#ACA'
    filename = 'yourfile.txt'
    res = list()
    with open(filename, 'rb') as f:              # open 'yourfile.txt' in byte-reading mode
        for line in f:                           # for each line in the file
            for s in line.split(pattern)[1:]:    # split the line on '#ACA'
                try:
                    nb = int(s[:5])              # take the first 5 characters after it as an int
                    res.append(nb)               # add it to the list of numbers we found
                except (NameError, ValueError):  # if conversion fails, that wasn't an int
                    pass
    print res                                    # if you want them in the same order as in the file
    print sorted(res)                            # if you want them in ascending order
This should do it
import re
print(re.findall(r"#ACA(\d+)", str_var))
If you have the whole text in the variable str_var
Output:
['16018', '16019', '17660', '16560', '15623', '17804', '17602', '17604', '56169', '17606', '17608', '18635']
re.findall(r'#ACA(\d{5})', str_var)
[x[:5] for x in content.split("#ACA")[1:]]
PowerShell solution:
$content = Get-Content -Raw 'your_file'
$match = [regex]::Matches($content, '#ACA(\d{5})')
$match | ForEach-Object {
    $_.Groups[1].Value
}
Output:
16018
16019
17660
16560
15623
17804
17602
17604
56169
17606
17608
18635

accented characters in a regex with Python

This is my code
# -*- coding: utf-8 -*-
import json
import re

with open("/Users/paul/Desktop/file.json") as json_file:
    file = json.load(json_file)

print file["desc"]
key = "capacità"
result = re.findall("((?:[\S,]+\s+){0,3})" + key + "\s+((?:[\S,]+\s*){0,3})", file["desc"], re.IGNORECASE)
print result
This is the content of the file
{
"desc": "Frigocongelatore, capacit\u00e0 di 215 litri, h 122 cm, classe A+"
}
My result is []
but what I want is result = "capacità"
You need to treat your string as a Unicode string, like this:
str = u"Frigocongelatore, capacit\u00e0 di 215 litri, h 122 cm, classe A+"
And as you can see, if you print str.encode('utf-8') you'll get:
Frigocongelatore, capacità di 215 litri, h 122 cm, classe A+
In the same way, you can make your regex pattern a Unicode or raw string with the u or r prefix, respectively.
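Put together, the fix might look roughly like this (a minimal sketch; in Python 3 string literals are already Unicode, so this mainly matters on Python 2):

# -*- coding: utf-8 -*-
import re

# json.load() already returns Unicode text, so the key and the pattern
# should be Unicode as well (u prefix) to avoid mixing byte and Unicode strings.
desc = u"Frigocongelatore, capacit\u00e0 di 215 litri, h 122 cm, classe A+"
key = u"capacit\u00e0"
pattern = u"((?:[\\S,]+\\s+){0,3})" + key + u"\\s+((?:[\\S,]+\\s*){0,3})"
print(re.findall(pattern, desc, re.IGNORECASE | re.UNICODE))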
You can use the function below to find a word and the words around it. The default encoding in your editor should be UTF-8; check your settings with sys.getdefaultencoding().
def find_context(word_, n_before, n_after, string_):
    # finds the word and n words before and after it
    import re
    b = r'\w+\W+' * n_before
    a = r'\W+\w+' * n_after
    pattern = '(' + b + word_ + a + ')'
    return re.search(pattern, string_).groups(1)[0]
s = "Frigocongelatore, capacità di 215 litri, h 122 cm, classe A+"
# find 0 words before and 3 after the word capacità
print(find_context('capacità',0,3,s) )
capacità di 215 litri
print(find_context(' capacit\u00e0',0,3,s) )
capacità di 215 litri

python extracting string from data

I have the following data, and I need to extract the first string occurrence on each line. It is separated from the rest of the data with \t. I'm trying to use split() and regexes, but the problem is that it takes more than 1 second per line. Is there any way it could be done faster?
Data:
DT 0.00155095460731831934 0.00121897344629313064 0.00000391325536877105 0.09743272975663436197 0.00002271067721789807 0.00614528909266214615 0.00000445295550745487 0.70422975214810612510 0.00000042521183266708 0.00080380970031485965 0.00046229528280753270 0.00019894095277762626 0.00041012830368947716 0.00013156663380611624 0.00000001065986007929 0.00004244196517011733 0.00061444160944146384 0.02101761386512242258 0.00010328516871273944 0.00001128873771536226 0.00279163054567377073 0.00018903663417650421 0.00006490063677390687 0.00002151218889856898 0.00032824534915777535 0.00040349658620449016 0.00042393411014689220 0.00053643791028589382 0.00001032961180051124 0.00025743865541833909 0.00011497457801324625 0.00005359814320647386 0.00010336445810407512 0.00040942464084107332 0.00009098970100047888 0.00000091369931486168 0.00059479547081431436 0.00000009853464391239 0.00020303484015768289 0.00050594563648307127 0.15679657927655424321 0.00034115929559768240 0.00115490132012489345 0.00019823414624750937
PRP 0.00000131203717608417 0.99998368311809904263 0.00000002192874737415 0.00000073240710142655 0.00000000536610432900 0.00000195554704853124 0.00000000012203475361 0.00000017206852489982 0.00000040268728691384 0.00000034167449501884 0.00000077203219019333 0.00000003082351874675 0.00000052849070550174 0.00000319144710228690 0.00000000009512989203 0.00000002016363199180 0.00000005598551431381 0.00000129166108708107 0.00000004127954869435 0.00000099983230311242 0.00000032415702089502 0.00000010477525952469 0.00000000011045642123 0.00000006942075882668 0.00000017433924380308 0.00000028874823360049 0.00000048656924101513 0.00000017722073116061 0.00000037193481161874 0.00000000452174124394 0.00000081986547018432 0.00000001740977711224 0.00000000808377988046 0.00000001418892143074 0.00000045250939471023 0.00000000000050232556 0.00000043504206149021 0.00000011310292804313 0.00000000013241046549 0.00000015302998639348 0.00000002800056509608 0.00000038361859715043 0.00000000099713364069 0.00000001345362455494
VBD 0.00000002905639670475 0.00000000730896486886 0.00000000406530491040 0.00000009048972500851 0.00000000380338117015 0.00000000000390031394 0.00000000169948197867 0.00000000091890304843 0.00000000013856552537 0.00000191013917141413 0.00000002300239228881 0.00000003601993413087 0.00000004266629173115 0.00000000166497478879 0.00000000000079281873 0.00000180895378547175 0.00000000000159251758 0.00000000081310874277 0.00000000334322892919 0.99999591744268101490 0.00000000000454647012 0.00000000060884665646 0.00000000000010515727 0.00000000019245471748 0.00000000308524019147 0.00000001376847404364 0.00000001449670334202 0.00000001434634011983 0.00000000656887521298 0.00000000796791556475 0.00000000578334901413 0.00000000142124935798 0.00000000213053365838 0.00000000487780229311 0.00000001702409705978 0.00000000391793832836 0.00000001292779157438 0.00000000002447935587 0.00000000000435117453 0.00000000408872313468 0.00000000007201124397 0.00000000431736839121 0.00000000002970930698 0.00000000080852330796
RB 0.00000015663242474016 0.00000002464350694082 0.00000000095443410385 0.99998778106321006831 0.00000000021007124986 0.00000006156902517681 0.00000000277279124155 0.00000000301727284928 0.00000000030682776953 0.00000007379165980724 0.00000012399749754355 0.00000494600825959811 0.00000008488215978963 0.00000000897527112360 0.00000000000009257081 0.00000000223574222125 0.00000000371653801739 0.00000548300954899374 0.00000001802212638276 0.00000000022437343140 0.00000001084514551630 0.00000000328207000562 0.00000000672649111321 0.00000003640165688536 0.00000050812474700731 0.00000007422081603379 0.00000018000760320187 0.00000007733588104368 0.00000008890139839523 0.00000001494850369145 0.00000003233439691280 0.00000000299507821025 0.00000000501198681017 0.00000000271863832841 0.00000004782796496077 0.00000000000160157399 0.00000006968900381578 0.00000000003199719817 0.00000001234122837743 0.00000002204081342858 0.00000000038818632144 0.00000002327335651712 0.00000000016015202564 0.00000000435845392228
VBN 0.00222925562857408935 0.00055631931823257885 0.00000032474066230587 0.00333293927262896372 0.12594759350192680225 0.00142014631420757115 0.00008260266473343272 0.00001658664201138300 0.00000444848747905589 0.00025881226046863004 0.00176478222683846956 0.00226268536384150636 0.00120807701719786715 0.00016158429451364274 0.00000000200391980114 0.00012971908549403702 0.41488930515218963579 0.41237674095727266943 0.00025649814681915863 0.00001340291420511781 0.00067983726358035045 0.00001718712609473795 0.00009573412529081616 0.02342065200703593100 0.00010281749829896253 0.00243912549478067552 0.00111221146411718771 0.00110067534479759994 0.00048702441892562549 0.00014537544850052323 0.00046019613393571187 0.00004100416046505168 0.00001820421200359182 0.00013212194667244404 0.00112515351673182361 0.00000022002597310723 0.00099184191436586821 0.00000187809735682276 0.00000214888688830288 0.00031369371619907773 0.00000552482376141306 0.00033123576486582436 0.00000227934800338172 0.00006203126813779618
So, the bottom line is that I need to extract DT, PRP, VBD, ... from the above text really fast.
You can just call split with the maxsplit argument and wrap it in a list comprehension.
result = [line.split('\t', 1)[0] for line in data]
As you can see, passing 1 to the method makes it stop after the first split takes place. I bet this is the fastest solution in Python.
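Applied to a file, the same idea might look like this (a sketch that assumes the data sits in a file called data.txt, one tab-separated record per line):

# keep only the text before the first tab on each line
with open("data.txt") as data:
    tags = [line.split("\t", 1)[0] for line in data]

print(tags)   # e.g. ['DT', 'PRP', 'VBD', 'RB', 'VBN']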
A manual alternative.
def end_of_loop():
    raise StopIteration

def my_split(line):
    return ''.join(end_of_loop() if char == '\t' else char for char in line)

result = [my_split(line) for line in lines]
Provided your data are in a file:
with open(file) as data:
    result = [my_split(line) for line in data]
This will be a lot slower than the first one.
You can use split in a list comprehension:
>>> s="""DT 0.00155095460731831934 0.00121897344629313064 0.00000391325536877105 0.09743272975663436197 0.00002271067721789807 0.00614528909266214615 0.00000445295550745487 0.70422975214810612510 0.00000042521183266708 0.00080380970031485965 0.00046229528280753270 0.00019894095277762626 0.00041012830368947716 0.00013156663380611624 0.00000001065986007929 0.00004244196517011733 0.00061444160944146384 0.02101761386512242258 0.00010328516871273944 0.00001128873771536226 0.00279163054567377073 0.00018903663417650421 0.00006490063677390687 0.00002151218889856898 0.00032824534915777535 0.00040349658620449016 0.00042393411014689220 0.00053643791028589382 0.00001032961180051124 0.00025743865541833909 0.00011497457801324625 0.00005359814320647386 0.00010336445810407512 0.00040942464084107332 0.00009098970100047888 0.00000091369931486168 0.00059479547081431436 0.00000009853464391239 0.00020303484015768289 0.00050594563648307127 0.15679657927655424321 0.00034115929559768240 0.00115490132012489345 0.00019823414624750937
... PRP 0.00000131203717608417 0.99998368311809904263 0.00000002192874737415 0.00000073240710142655 0.00000000536610432900 0.00000195554704853124 0.00000000012203475361 0.00000017206852489982 0.00000040268728691384 0.00000034167449501884 0.00000077203219019333 0.00000003082351874675 0.00000052849070550174 0.00000319144710228690 0.00000000009512989203 0.00000002016363199180 0.00000005598551431381 0.00000129166108708107 0.00000004127954869435 0.00000099983230311242 0.00000032415702089502 0.00000010477525952469 0.00000000011045642123 0.00000006942075882668 0.00000017433924380308 0.00000028874823360049 0.00000048656924101513 0.00000017722073116061 0.00000037193481161874 0.00000000452174124394 0.00000081986547018432 0.00000001740977711224 0.00000000808377988046 0.00000001418892143074 0.00000045250939471023 0.00000000000050232556 0.00000043504206149021 0.00000011310292804313 0.00000000013241046549 0.00000015302998639348 0.00000002800056509608 0.00000038361859715043 0.00000000099713364069 0.00000001345362455494
... VBD 0.00000002905639670475 0.00000000730896486886 0.00000000406530491040 0.00000009048972500851 0.00000000380338117015 0.00000000000390031394 0.00000000169948197867 0.00000000091890304843 0.00000000013856552537 0.00000191013917141413 0.00000002300239228881 0.00000003601993413087 0.00000004266629173115 0.00000000166497478879 0.00000000000079281873 0.00000180895378547175 0.00000000000159251758 0.00000000081310874277 0.00000000334322892919 0.99999591744268101490 0.00000000000454647012 0.00000000060884665646 0.00000000000010515727 0.00000000019245471748 0.00000000308524019147 0.00000001376847404364 0.00000001449670334202 0.00000001434634011983 0.00000000656887521298 0.00000000796791556475 0.00000000578334901413 0.00000000142124935798 0.00000000213053365838 0.00000000487780229311 0.00000001702409705978 0.00000000391793832836 0.00000001292779157438 0.00000000002447935587 0.00000000000435117453 0.00000000408872313468 0.00000000007201124397 0.00000000431736839121 0.00000000002970930698 0.00000000080852330796
... RB 0.00000015663242474016 0.00000002464350694082 0.00000000095443410385 0.99998778106321006831 0.00000000021007124986 0.00000006156902517681 0.00000000277279124155 0.00000000301727284928 0.00000000030682776953 0.00000007379165980724 0.00000012399749754355 0.00000494600825959811 0.00000008488215978963 0.00000000897527112360 0.00000000000009257081 0.00000000223574222125 0.00000000371653801739 0.00000548300954899374 0.00000001802212638276 0.00000000022437343140 0.00000001084514551630 0.00000000328207000562 0.00000000672649111321 0.00000003640165688536 0.00000050812474700731 0.00000007422081603379 0.00000018000760320187 0.00000007733588104368 0.00000008890139839523 0.00000001494850369145 0.00000003233439691280 0.00000000299507821025 0.00000000501198681017 0.00000000271863832841 0.00000004782796496077 0.00000000000160157399 0.00000006968900381578 0.00000000003199719817 0.00000001234122837743 0.00000002204081342858 0.00000000038818632144 0.00000002327335651712 0.00000000016015202564 0.00000000435845392228
... VBN 0.00222925562857408935 0.00055631931823257885 0.00000032474066230587 0.00333293927262896372 0.12594759350192680225 0.00142014631420757115 0.00008260266473343272 0.00001658664201138300 0.00000444848747905589 0.00025881226046863004 0.00176478222683846956 0.00226268536384150636 0.00120807701719786715 0.00016158429451364274 0.00000000200391980114 0.00012971908549403702 0.41488930515218963579 0.41237674095727266943 0.00025649814681915863 0.00001340291420511781 0.00067983726358035045 0.00001718712609473795 0.00009573412529081616 0.02342065200703593100 0.00010281749829896253 0.00243912549478067552 0.00111221146411718771 0.00110067534479759994 0.00048702441892562549 0.00014537544850052323 0.00046019613393571187 0.00004100416046505168 0.00001820421200359182 0.00013212194667244404 0.00112515351673182361 0.00000022002597310723 0.00099184191436586821 0.00000187809735682276 0.00000214888688830288 0.00031369371619907773 0.00000552482376141306 0.00033123576486582436 0.00000227934800338172 0.00006203126813779618"""
>>> [i.split()[0] for i in s.split('\n')]
['DT', 'PRP', 'VBD', 'RB', 'VBN']
import re
p = re.compile(r'^\S+', re.MULTILINE)
re.findall(p, test_str)
You can simply do this to get a list of strings you want.

Reading File and Printing Output

I am reading a file, and the output is supposed to look like the table below. Ignoring the actual table layout, the values for my hours, minutes and seconds are off, as is the money, which is supposed to be calculated by rounding each call up to the minute. I have tried many ways to solve this, and this is my last resort.
+--------------+------------------------------+---+---------+--------+
| Phone number | Name                         | # |Duration |   Due  |
+--------------+------------------------------+---+---------+--------+
|(780) 123 4567|Ameneh Gholipour Shahraki     |384|55h07m53s|$ 876.97|**
|(780) 123 6789|Stuart Johnson                |132|17h53m19s|$ 288.81|
|(780) 321 4567|Md Toukir Imam                |363|49h52m12s|$ 827.48|++
|(780) 432 1098|Hamman Samuel                 |112|16h05m09s|$ 259.66|
|(780) 492 2860|Osmar Zaiane                  |502|69h27m48s|$1160.52|**
|(780) 789 0123|Elham Ahmadi                  |259|35h56m10s|$ 596.94|
|(780) 876 5432|Amir Hossein Faghih Dinevari  |129|17h22m32s|$ 288.56|
|(780) 890 7654|Weifeng Chen                  |245|33h48m46s|$ 539.41|
|(780) 987 6543|Farrukh Ahmed                 |374|52h50m11s|$ 883.72|**
+--------------+------------------------------+---+---------+--------+
| Total dues   |                                          $   5722.07|
+--------------+-----------------------------------------------------+
This is my code, and I am having the most trouble with the time() and due() functions:
from collections import Counter

customers = open('customers.txt', 'r')
calls = open('calls.txt.', 'r')

def main():
    customers = open('customers.txt', 'r')
    calls = open('calls.txt.', 'r')
    print("+--------------+------------------------------+---+---------+--------+")
    print("| Phone number | Name                         | # |Duration |   Due  |")
    print("+--------------+------------------------------+---+---------+--------+")
    phone_sorter()
    number_calls()
    time()
    due()

def phone_sorter():
    sorted_no = {}
    for line in customers:
        rows = line.split(";")
        sorted_no[rows[1]] = rows[0]
    for value in sorted(sorted_no.values()):
        for key in sorted_no.keys():
            if sorted_no[key] == value:
                print(sorted_no[key], key)

def number_calls():
    no_calls = {}
    for line in calls:
        rows = line.split(";")
        if rows[1] not in no_calls:
            no_calls[rows[1]] = 1
        else:
            no_calls[rows[1]] += 1
    s = sorted(no_calls.keys())
    for key in s:
        print(no_calls[key])

def time():
    calls = open('calls.txt.', 'r')
    n_list = []
    d = {}
    for line in calls:
        rows = line.split(";")
        d[rows[1]] = rows[3]
        if rows[1] not in d:
            d[rows[1]] = rows[3]
        else:
            d[rows[1]] += rows[3]
    x = sorted(d.keys())
    for value in x:
        m, s = divmod(int(value), 60)
        h, m = divmod(m, 60)
        print("%d:%02d:%02d" % (h, m, s))

def due():
    calls = open('calls.txt.', 'r')
    d2 = {}
    for line in calls:
        rows = line.split(";")
        d2[rows[1]] = float(rows[3]) * float(rows[4])
        if rows[1] not in d2:
            d2[rows[1]] = float(rows[3]) * float(rows[4])
        else:
            d2[rows[1]] += float(rows[3]) * float(rows[4])
    x = sorted(d2.keys())
    for key in x:
        print(d2[key])
    print(sum(d2.values()))

main()
This is the link to the file I am reading in pastebin: http://pastebin.com/RSMnXDtq
The first column is for the phone number. This number has to be formatted as (999) 999 9999.
The second column is for the name and it has to be 30 characters wide.
The third column is for the number of calls originating from the phone in question. It should be printed with 3 digits.
The fourth column is for the total duration of the calls originating from the phone in question. This duration should be formatted as follows: 99h99m99s for hours, minutes and seconds. The minutes and seconds should have a prefix of 0 if less than 10.
The fifth column is for the amount paid for the calls calculated based on the rates attached to each call. Note that the duration for each call should be rounded up to the minute in order to use the rate per minute. This amount should be printed with 7 positions and only 2 after the decimal point.
Here is a solution using pandas:
import numpy as np
from pandas import read_csv

#
# Load and process call data
#
def get_data(calls_fname, custs_fname):
    # load call data
    calls = read_csv(
        calls_fname,
        sep = ";",
        names = ["session", "phone", "to", "seconds", "rate"],
        header = None
    )
    # calculate cost per call (time rounded up to the next minute)
    calls["cost"] = np.ceil(calls["seconds"] / 60.) * calls["rate"]
    # add a call-count column
    # (I think there is a better way to do this using np.size in
    #  the .groupby, but I haven't been able to figure it out)
    calls["count"] = 1
    # find per-cust totals
    calls = calls.groupby(["phone"]).agg({"seconds": np.sum, "cost": np.sum, "count": np.sum})
    # load customer data
    custs = read_csv(
        custs_fname,
        sep = ";",
        names = ["phone", "name"],
        header = None,
        index_col = 0    # index by phone number
    )
    # join to find customer name
    return calls.join(custs, sort=False).reset_index()

#
# output formatting functions
#
def phone_str(i):
    """
    Convert int 1234567890 to str "(123) 456 7890"
    """
    s = str(i).zfill(10)
    return "({}) {} {}".format(s[0:3], s[3:6], s[6:10])

def time_str(i):
    """
    Convert int 3662 to str " 1h01m02s"
    """
    m, s = divmod(i, 60)
    h, m = divmod(m, 60)
    return "{:>2d}h{:02d}m{:02d}s".format(h, m, s)

def make_table(totals):
    header = (
        "+--------------+------------------------------+---+---------+--------+\n"
        "| Phone number | Name                         | # |Duration |   Due  |\n"
        "+--------------+------------------------------+---+---------+--------+\n"
    )
    rows = [
        "|{}|{:<30}|{:>3d}|{}|${:7.2f}|\n"
        .format(
            phone_str(row["phone"]),
            row["name"],
            row["count"],
            time_str(row["seconds"]),
            row["cost"]
        )
        for i, row in totals.iterrows()
    ]
    total_dues = np.sum(totals["cost"])
    footer = (
        "+--------------+------------------------------+---+---------+--------+\n"
        "| Total dues   |                                          ${:10.2f}|\n"
        "+--------------+-----------------------------------------------------+"
        .format(total_dues)
    )
    return header + "".join(rows) + footer

def main():
    totals = get_data("calls.txt", "customers.txt")
    print(make_table(totals))

if __name__ == "__main__":
    main()
Using the data from your pastebin link as calls.txt, and the following as customers.txt:
7801236789;Stuart Johnson
7804321098;Hamman Samuel
7803214567;Md Toukir Imam
7804922860;Osmar Zaiane
7801234567;Ameneh Gholipour Shahraki
7807890123;Elham Ahmadi
7808765432;Amir Hossein Faghih Dinevari
7808907654;Weifeng Chen
7809876543;Farrukh Ahmed
it produces
+--------------+------------------------------+---+---------+--------+
| Phone number | Name                         | # |Duration |   Due  |
+--------------+------------------------------+---+---------+--------+
|(780) 123 4567|Ameneh Gholipour Shahraki     |384|55h07m53s|$ 876.97|
|(780) 123 6789|Stuart Johnson                |132|17h53m19s|$ 288.81|
|(780) 321 4567|Md Toukir Imam                |363|49h52m12s|$ 827.48|
|(780) 432 1098|Hamman Samuel                 |112|16h05m09s|$ 259.66|
|(780) 492 2860|Osmar Zaiane                  |502|69h27m48s|$1160.52|
|(780) 789 0123|Elham Ahmadi                  |259|35h56m10s|$ 596.94|
|(780) 876 5432|Amir Hossein Faghih Dinevari  |129|17h22m32s|$ 288.56|
|(780) 890 7654|Weifeng Chen                  |245|33h48m46s|$ 539.41|
|(780) 987 6543|Farrukh Ahmed                 |374|52h50m11s|$ 883.72|
+--------------+------------------------------+---+---------+--------+
| Total dues   |                                          $   5722.07|
+--------------+-----------------------------------------------------+
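If you would rather stay with the plain-Python approach from the question, the two pieces it struggles with (rounding each call up to the minute and formatting the duration) might be isolated roughly as in this sketch; the function names are just for illustration:

import math

def format_duration(total_seconds):
    # 99h99m99s style, with minutes and seconds zero-padded
    m, s = divmod(int(total_seconds), 60)
    h, m = divmod(m, 60)
    return "%dh%02dm%02ds" % (h, m, s)

def call_cost(seconds, rate_per_minute):
    # each call is billed on whole minutes, rounded up
    return math.ceil(seconds / 60.0) * rate_per_minute

print(format_duration(3662))            # 1h01m02s
print("$%7.2f" % call_cost(90, 1.50))   # 90 s bills as 2 minutes -> $   3.00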
