how to parse a txt file to csv and modify formatting - python
Is there a way I can use python to take my animals.txt file results and convert it to csv and format it differently?
Currently the animals.txt file looks like this:
ID:- 512
NAME:- GOOSE
PROJECT NAME:- Random
REPORT ID:- 30321
REPORT NAME:- ANIMAL
KEYWORDS:- ['"help,goose,Grease,GB"']
ID:- 566
NAME:- MOOSE
PROJECT NAME:- Random
REPORT ID:- 30213
REPORT NAME:- ANIMAL
KEYWORDS:- ['"Moose, boar, hansel"']
I would like the CSV file to present it as:
ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS
Followed by the results underneath each header
Here is a script I have written:
import re
import csv

with open("animals.txt") as f:
    text = f.read()

data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    data[k] = re.findall(r'%s:- (.*)' % k, text)

csv_file = 'out.csv'
with open(csv_file, 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=keys)
    writer.writeheader()
    for x in data:
        writer.writerow(x)
An easy way to do this is to parse the fields with a regex and store them in a dict just before you write the final csv:
import re

# `text` is your input text
data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    data[k] = re.findall(r'^%s:- (.*)' % k, text, re.M)
And to CSV:
import csv

csv_file = 'out.csv'
with open(csv_file, 'w') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerow(data.keys())
    for i in range(len(data[keys[0]])):
        writer.writerow([data[k][i] for k in keys])
Output in csv:
ID,NAME,PROJECT NAME,REPORT ID,REPORT NAME,KEYWORDS
512,GOOSE,Random,30321,ANIMAL,['\"help\,goose\,Grease\,GB\"']
566,MOOSE,Random,30213,ANIMAL,['\"Moose\, boar\, hansel\"']
Note that I used re.M (multiline mode) with a ^ anchor, since there's a trap in your text: without it, the ID pattern would also match REPORT ID, capturing each record's ID twice. The default row-writing also had to be reworked, because the data is stored per column rather than per row.
The \ is used to escape the quotes and commas inside the KEYWORDS field.
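If you would rather keep the DictWriter from your original script, here is a minimal sketch (assuming the `data` dict and `keys` list built above, with equal-length lists per key) that regroups the per-column lists into one dict per record:

import csv

csv_file = 'out.csv'
with open(csv_file, 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=keys)
    writer.writeheader()
    # zip(*...) turns the per-column lists into per-record tuples
    for values in zip(*(data[k] for k in keys)):
        writer.writerow(dict(zip(keys, values)))

DictWriter's default minimal quoting will wrap the KEYWORDS field in quotes instead of escaping each comma.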
This should work:
fname = 'animals.txt'
with open(fname) as f:
    content = f.readlines()
content = [x.strip() for x in content]

output = 'ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS\n'
line_output = ''
for i in range(0, len(content)):
    if content[i]:
        line_output += content[i].split(':-')[-1].strip() + ','
    elif not content[i] and not content[i - 1]:
        output += line_output.rstrip(',') + '\n'
        line_output = ''
output += line_output.rstrip(',') + '\n'

print(output)
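To save the result to a file instead of just printing it, one more step will do (out.csv is an assumed name here, matching the other answers):

with open('out.csv', 'w') as out_f:
    out_f.write(output)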
Here's the code in AutoIt (www.autoitscript.com):
Global $values_A = StringRegExp(FileRead("json.txt"), '[ID|NAME|KEYWORDS]:-\s(.*)?', 3)
For $i = 0 To UBound($values_A) - 1 Step +6
FileWrite('out.csv', $values_A[$i] & ',' & $values_A[$i + 1] & ',' & $values_A[$i + 2] & ',' & $values_A[$i + 3] & ',' & $values_A[$i + 4] & ',' & $values_A[$i + 5] & #CRLF)
Next
Related
Import CSV File and Doing arithmetic Operations without importing any Library in PYTHON
My CSV file looks like this:

Time_stamp; Mobile_number; Download; Upload; Connection_start_time; Connection_end_time; location
1/2/2020 10:43:55;+917777777777;213455;2343;1/2/2020 10:43:55;1/2/2020 10:47:25;09443
1/3/2020 10:33:10;+919999999999;345656;3568;1/3/2020 10:33:10;1/3/2020 10:37:20;89442
1/4/2020 11:47:57;+919123456654;345789;7651;1/4/2020 11:11:10;1/4/2020 11:40:22;19441
1/5/2020 11:47:57;+919123456543;342467;4157;1/5/2020 11:44:10;1/5/2020 11:59:22;29856
1/6/2020 10:47:57;+917777777777;213455;2343;1/6/2020 10:43:55;1/6/2020 10:47:25;09443

My question is: without importing any library, how can I read a CSV file, have the user enter a mobile number, and show the data usage of that number, i.e. an arithmetic operation (adding uplink and downlink) that gives the total data used by that specific mobile number?

Here is what my code looks like (I don't want to import the pandas library):

import pandas as pd

df = pd.read_csv('test.csv', sep=';')
df.columns = [col.strip() for col in df.columns]
usage = df[['Download', 'Upload']][df.Mobile_number == +917777777777].sum().sum()
print(usage)
I'd use csv.DictReader (after an import csv):

In [30]: with open('x', 'r') as f:
    ...:     r = csv.DictReader(f, delimiter=';')
    ...:     dct = {}
    ...:     for row in r:
    ...:         dct.setdefault(row[' Mobile_number'], []).append(row)
    ...:

In [31]: dct
Out[31]:
{'+917777777777': [OrderedDict([('Time_stamp', '1/2/2020 10:43:55'), (' Mobile_number', '+917777777777'), (' Download', '213455'), (' Upload', '2343'), (' Connection_start_time', '1/2/2020 10:43:55'), (' Connection_end_time', '1/2/2020 10:47:25'), (' location', '09443')]),
                   OrderedDict([('Time_stamp', '1/6/2020 10:47:57'), (' Mobile_number', '+917777777777'), (' Download', '213455'), (' Upload', '2343'), (' Connection_start_time', '1/6/2020 10:43:55'), (' Connection_end_time', '1/6/2020 10:47:25'), (' location', '09443')])],
 '+919999999999': [OrderedDict([('Time_stamp', '1/3/2020 10:33:10'), (' Mobile_number', '+919999999999'), (' Download', '345656'), (' Upload', '3568'), (' Connection_start_time', '1/3/2020 10:33:10'), (' Connection_end_time', '1/3/2020 10:37:20'), (' location', '89442')])],
 '+919123456654': [OrderedDict([('Time_stamp', '1/4/2020 11:47:57'), (' Mobile_number', '+919123456654'), (' Download', '345789'), (' Upload', '7651'), (' Connection_start_time', '1/4/2020 11:11:10'), (' Connection_end_time', '1/4/2020 11:40:22'), (' location', '19441')])],
 '+919123456543': [OrderedDict([('Time_stamp', '1/5/2020 11:47:57'), (' Mobile_number', '+919123456543'), (' Download', '342467'), (' Upload', '4157'), (' Connection_start_time', '1/5/2020 11:44:10'), (' Connection_end_time', '1/5/2020 11:59:22'), (' location', '29856')])]}

You can then process the list of dicts for a given mobile number with something like:

usage = sum(float(_[' Download']) + float(_[' Upload']) for _ in dct['+917777777777'])
Noting that you specifically wanted to avoid importing any libraries (I assume this means you want to avoid importing even from the included modules) - for a trivial file (I named one supermarkets.csv), the content looks like this:

ID,Address,City,State,Country,Name,Employees
1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
3,332 Hill St,San Francisco,California 94114,USA,Super River,25
4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20

Then you can do something like this:

data = []
with open("supermarkets.csv") as f:
    for line in f:
        data.append(line)
print(data)

From here you can manipulate each of the entries in the list using string tools and list comprehensions.
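For example, a short sketch (assuming the supermarkets.csv sample above) that splits each line into fields and aggregates one column with a list comprehension:

with open("supermarkets.csv") as f:
    rows = [line.strip().split(',') for line in f]

header, records = rows[0], rows[1:]
# e.g. sum the Employees column (the last field) across all stores
total_employees = sum(int(r[-1]) for r in records)
print(total_employees)  # 90 for the sample file

Note that a bare split(',') breaks down if fields can themselves contain commas; that is exactly what the csv module handles for you.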
You could try open, which does not require any library, to read your file, and then iterate through it with readlines. Split each line and check your condition according to where in the file your data sits:

usage = 0
with open('test.csv', 'r') as f:
    for line in f.readlines():
        try:
            line_sp = line.split(';')
            if line_sp[1] == '+917777777777':
                usage += int(line_sp[2]) + int(line_sp[3])
        except:
            # print(line)
            pass
print(usage)
Using no imported modules:

# read file and create dict of phone numbers
phone_dict = dict()
with open('test.csv') as f:
    for i, l in enumerate(f.readlines()):
        l = l.strip().split(';')
        if (i != 0):
            mobile = l[1]
            download = int(l[2])
            upload = int(l[3])
            if phone_dict.get(mobile) == None:
                phone_dict[mobile] = {'download': [download], 'upload': [upload]}
            else:
                phone_dict[mobile]['download'].append(download)
                phone_dict[mobile]['upload'].append(upload)

print(phone_dict)
{'+917777777777': {'download': [213455, 213455], 'upload': [2343, 2343]},
 '+919999999999': {'download': [345656], 'upload': [3568]},
 '+919123456654': {'download': [345789], 'upload': [7651]},
 '+919123456543': {'download': [342467], 'upload': [4157]}}

# function to return usage
def return_usage(data: dict, number: str):
    download_usage = sum(data[number]['download'])
    upload_usage = sum(data[number]['upload'])
    return download_usage + upload_usage

# get user input to return usage
number = input('Please input a phone number')
usage = return_usage(phone_dict, number)
print(usage)

>>> Please input a phone number (numbers only) +917777777777
>>> 431596
A combination of csv and defaultdict can fit your use case:

import csv
from collections import defaultdict

d = defaultdict(list)
with open('data.txt', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=';', skipinitialspace=True)
    headers = reader.fieldnames
    for row in reader:
        row['Usage'] = int(row['Upload']) + int(row['Download'])
        d[row.get('Mobile_number')].append(row["Usage"])

print(d)
defaultdict(list,
            {'+917777777777': [215798, 215798],
             '+919999999999': [349224],
             '+919123456654': [353440],
             '+919123456543': [346624]})

# get sum for a specific mobile number:
sum(d.get("+917777777777"))
431596

Additional details:

new_d = {}
for k, v in d.items():
    kb = sum(v)
    mb = kb / 1024
    gb = kb / 1024**2
    usage = F"{kb}KB/{mb:.2f}MB/{gb:.2f}GB"
    new_d[k] = usage

print(new_d)
{'+917777777777': '431596KB/421.48MB/0.41GB',
 '+919999999999': '349224KB/341.04MB/0.33GB',
 '+919123456654': '353440KB/345.16MB/0.34GB',
 '+919123456543': '346624KB/338.50MB/0.33GB'}
Converting Fixed-Width File to .txt then the .txt to .csv
I have a fixed-width file that I have no issues importing and splitting into 31 txt files. The spaces from the fixed-width file are conserved in this process, since writing to the txt simply writes each entry from the fixed-width file as a new line. My issue is that when I use Python's csv function, these spaces are replaced with " (a quotation mark) as a placeholder. I'm looking to see if there is a way to have a csv file produced without these double quotes as placeholders while maintaining the required formatting initially set in the fixed-width file.

Initial line in txt doc:

'PAY90004100095206 9581400086000909 0008141000 5350 3810 C 000021841998051319980513P810406247 FELT, MARTIN & FRAZIER, P.C. FELT, MARTIN & FRAZIER, P.C. 208 NORTH BROADWAY STE 313 BILLINGS MT59101-0 NLance Martin v. Whitman College N00000000NN98004264225 SYS656 19980512+000000378761998041319980421+000000378769581400086000909 000+000000 Lance Martin v. Whitman College 00000000 00010001 +00000000000002184 000000021023.005000000003921.005\n'

.py:

import csv

read_loc = 'c:/Users/location/e0290000005.txt'
e02ext_start = read_loc.find('e02')
e02_ext = read_loc[int(e02ext_start):]

with open(read_loc, 'r') as f:
    contents = f.readlines()

dict_of_record_lists = {}

# takes first 3 characters of each line and if a matching dictionary key is found
# it appends the line to the value-list
for line in contents:
    record_type = (line[:3])
    dict_of_record_lists.setdefault(record_type, []).append(line)

slice_list_CLM = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,47),(47,55),(55,59),(59,109),(109,189),(189,191),(191,193),(193,194),(194,195),(195,203),(203,211),(211,219),(219,227),(227,235),(235,237),(237,239),(239,241),(241,245),(245,249),(249,253),(253,257),(257,261),(261,291),(291,316),(316,331),(331,332),(332,357),(357,377),(377,378),(378,408),(408,438),(438,468),(468,470),(470,485),(485,505),(505,514),(514,517),(517,525),(525,533),(533,535),(535,536),(536,537),(537,545),(545,551),(551,553),(553,568),(568,572),(572,587),(587,602),(602,627),(627,631),(631,638),(638,642),(642,646),(646,654),(654,662),(662,670),(670,672),(672,674),(674,675),(675,676),(676,682),(682,700),(700,708),(708,716),(716,717),(717,725),(725,733),(733,741),(741,749),(749,759),(759,761),(761,762),(762,763),(763,764),(764,765),(765,768),(768,769),(769,770),(770,778),(778,779),(779,783),(783,787),(787,788),(788,805),(805,817),(817,829),(829,833),(833,863),(863,893),(893,896),(896,897),(897,898),(898,928),(928,936),(936,944),(944,945),(945,947),(947,959),(959,971),(971,983),(983,995),(995,1007),(1007,1019),(1019,1031),(1031,1043),(1043,1055),(1055,1067),(1067,1079),(1079,1091),(1091,1103),(1103,1115),(1115,1127),(1127,1139),(1139,1151),(1151,1163),(1163,1175),(1175,1187),(1187,1197),(1197,1202),(1202,1203),(1203,1211),(1211,1214),(1214,1215),(1215,1233),(1233,1241),(1241,1257),(1257,1272),(1272,1273),(1273,1285),(1285,1289),(1289,1293),(1293,1343),(1343,1365),(1365,1685),(1685,1686),(1686,1704),(1704,1708),(1708,1748),(1748,1768),(1768,1770),(1770,1772),(1772,1773),(1773,1782),(1782,1784),(1784,1792),(1792,1793),(1793,1796),(1796,1800)]
slice_list_CTL = [(0,3),(3,7),(7,15),(15,23),(23,31),(31,39),(39,47),(47,55),(55,56),(56,65),(65,74),(74,83),(83,98),(98,113),(113,128),(128,143),(143,158),(158,173),(173,188),(188,203),(203,218),(218,233),(233,248),(248,263),(263,278),(278,293),(293,308),(308,323),(323,338),(338,353),(353,368),(368,383),(383,398),(398,413),(413,428),(428,443),(443,458),(458,473),(473,488),(488,503),(503,518),(518,527),(527,536),(536,545),(545,554),(554,563),(563,572),(572,581),(581,590),(590,599),(599,614),(614,623),(623,638),(638,647),(647,662),(662,671),(671,686),(686,695),(695,710),(710,719),(719,728),(728,737),(737,746),(746,755),(755,764),(764,773),(773,782),(782,791),(791,800),(800,809),(809,818),(818,827),(827,836),(836,845),(845,854),(854,863),(863,872),(872,881),(881,890),(890,899),(899,908)]
slice_list_ADR = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,50),(50,53),(53,62),(62,65),(65,66),(66,91),(91,111),(111,121),(121,151),(151,181),(181,206),(206,208),(208,223),(223,243),(243,261),(261,265),(265,283),(283,287),(287,305),(305,335),(335,375),(375,383),(383,387),(387,437),(437,438),(438,446),(446,454),(454,461),(461,468),(468,484),(484,500)]
slice_list_AGR = [(0,3),(3,7),(7,45),(45,85),(85,93),(93,101),(101,109),(109,117),(117,127),(127,139),(139,151)]
slice_list_ACN = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,65),(65,95),(95,115),(115,145),(145,165),(165,195),(195,215),(215,245),(245,265),(265,295),(295,315),(315,345),(345,365),(365,395),(395,415),(415,445),(445,465),(465,495),(495,515),(515,545),(545,565),(565,595),(595,615),(615,645),(645,665),(665,695),(695,715),(715,745),(745,765),(765,795),(795,815),(815,845),(845,865),(865,895),(895,915),(915,945),(945,965),(965,995),(995,1015),(1015,1045),(1045,1061)]
slice_list_CST = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,59),(59,60),(60,61),(61,62),(62,64),(64,80),(80,82),(82,84),(84,86),(86,88),(88,104)]
slice_list_MCF = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,49),(49,79),(79,94),(94,159),(159,175),(175,191)]
slice_list_DD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,54),(54,62),(62,63),(63,69),(69,75),(75,81),(81,87),(87,93),(93,94),(94,95),(95,103),(103,111),(111,119),(119,126),(126,134),(134,143),(143,154),(154,162),(162,170),(170,178),(178,186),(186,194),(194,202),(202,205),(205,208),(208,210),(210,218),(218,220),(220,228),(228,230),(230,238),(238,240),(240,248),(248,250),(250,258),(258,274)]
slice_list_DES = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,1300),(1300,1316)]
slice_list_IBC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,48),(48,50),(50,54),(54,55),(55,56),(56,81),(81,101),(101,121),(121,124),(124,125),(125,145),(145,146),(146,149),(149,152),(152,154),(154,179),(179,199),(199,219),(219,222),(222,224),(224,227),(227,230),(230,238),(238,249),(249,265),(265,281)]
slice_list_ICD = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,57),(57,63),(63,69),(69,75),(75,81),(81,87),(87,95),(95,103),(103,111),(111,114),(114,122),(122,125),(125,126),(126,142),(142,144),(144,152),(152,154),(154,162),(162,164),(164,172),(172,174),(174,182),(182,184),(184,192),(192,208)]
slice_list_LEG = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,65),(65,73),(73,81),(81,82),(82,90),(90,98),(98,133),(133,148),(148,163),(163,164),(164,172),(172,180),(180,181),(181,216),(216,256),(256,296),(296,326),(326,356),(356,381),(381,383),(383,398),(398,418),(418,438),(438,456),(456,474),(474,509),(509,549),(549,589),(589,619),(619,649),(649,674),(674,676),(676,691),(691,711),(711,731),(731,749),(749,767),(767,782),(782,790),(790,798),(798,806),(806,810),(810,818),(818,826),(826,834),(834,840),(840,849),(849,879),(879,888),(888,918),(918,920),(920,921),(921,923),(923,931),(931,939),(939,943),(943,944),(944,952),(952,960),(960,990),(990,1020),(1020,1050),(1050,1051),(1051,1086),(1086,1095),(1095,1135),(1135,1175),(1175,1205),(1205,1235),(1235,1260),(1260,1262),(1262,1277),(1277,1295),(1295,1304),(1304,1312),(1312,1328)]
slice_list_LD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,65),(65,95),(95,125),(125,150),(150,152),(152,167),(167,187),(187,205),(205,223),(223,227),(227,252),(252,267),(267,279),(279,309),(309,339),(339,359),(359,361),(361,376),(376,396),(396,414),(414,439),(439,440),(440,448),(448,454),(454,456),(456,871),(471,472),(472,492),(492,522),(522,552),(552,572),(572,574),(574,589),(589,609),(609,627),(627,637),(637,645),(645,685),(685,686),(686,706),(706,714),(714,744),(744,774),(774,794),(794,796),(796,811),(811,831),(831,849),(849,879),(879,909),(909,929),(929,931),(931,946),(946,966),(966,984),(984,992),(992,1004),(1004,1024),(1024,1064),(1064,1081),(1081,1098),(1098,1106),(1106,1121),(1121,1122),(1122,1152),(1152,1153),(1153,1162),(1162,1170),(1170,1185),(1185,1190),(1190,1220),(1220,1238),(1238,1253),(1253,1283),(1283,1301),(1301,1302),(1302,1303),(1303,1333),(1333,1363),(1363,1388),(1388,1390),(1390,1405),(1405,1406),(1406,1436),(1436,1442),(1442,1462),(1462,1463),(1463,1478),(1478,1493),(1493,1533),(1533,1535),(1535,1538),(1538,1540),(1540,1556),(1556,1756)]
slice_list_LD2 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,60),(60,78),(78,118),(118,148),(148,178),(178,203),(203,205),(205,220),(220,238),(238,256),(256,260),(260,270),(270,290),(290,300),(300,302),(302,322),(322,352),(352,377),(377,397),(397,398),(398,423),(423,424),(424,454),(454,455),(455,456),(456,458),(458,474)]
slice_list_LD3 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,71),(71,91),(91,92),(92,122),(122,152),(152,177),(177,179),(179,194),(194,197),(197,205),(205,213),(213,221),(221,229),(229,237),(237,297),(297,305),(305,313),(313,321),(321,329),(329,337),(337,345),(345,353),(353,361),(361,421),(421,429),(429,489),(489,497),(497,557),(557,617),(617,633)]
slice_list_NET = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,88),(88,99),(99,105),(105,135),(135,146),(146,152),(152,182),(182,193),(193,199),(199,229),(229,240),(240,246),(246,276),(276,287),(287,293),(293,323),(323,334),(334,340),(340,370),(370,381),(381,387),(387,417),(417,428),(428,434),(434,464),(464,475),(475,481),(481,511),(511,522),(522,528),(528,558),(558,569),(569,575),(575,605),(605,616),(616,622),(622,652),(652,663),(663,669),(669,699),(699,710),(710,716),(716,746),(746,757),(757,763),(763,793),(793,804),(804,810),(810,840),(840,851),(851,857),(857,887),(887,898),(898,904),(904,934),(934,945),(945,951),(951,981),(981,992),(992,998),(998,1028),(1028,1039),(1039,1047),(1047,1055),(1055,1061),(1061,1077),(1077,1087),(1087,1103)]
slice_list_NOT = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,47),(47,55),(55,63),(63,71),(71,77),(77,79),(79,1279),(1279,1295),(1295,1296),(1296,1312)]
slice_list_OFF = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,75),(75,78),(78,93),(93,105),(105,107),(107,115),(115,123),(123,131),(131,132),(132,148)]
slice_list_PAY = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,60),(60,61),(61,65),(65,73),(73,81),(81,89),(89,90),(90,130),(130,165),(165,205),(205,245),(245,275),(275,305),(305,330),(330,332),(332,347),(347,367),(367,368),(368,428),(428,429),(429,437),(437,438),(438,439),(439,450),(450,452),(452,455),(455,458),(458,473),(473,481),(481,493),(493,501),(501,509),(509,521),(521,539),(539,542),(542,549),(549,552),(552,562),(562,567),(567,627),(627,635),(635,643),(643,647),(647,651),(651,653),(653,654),(654,684),(684,692),(692,702),(702,713),(713,1034),(1034,1050),(1050,1066)]
slice_list_PRC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,51),(51,81),(81,84),(84,87),(87,95),(95,103),(103,119),(119,125),(125,131),(131,147)]
slice_list_ACR = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,51),(51,59),(59,71),(71,79),(79,91),(91,103),(103,119),(119,135)]
slice_list_REC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,58),(58,71),(71,84),(84,97),(97,110),(110,123),(123,136),(136,149),(149,162),(162,175),(175,188),(188,201),(201,214),(214,227),(227,240),(240,253),(253,266),(266,279),(279,292),(292,305),(305,318),(318,331),(331,344),(344,357),(357,370),(370,383),(383,396),(396,409),(409,422),(422,435),(435,448),(448,461),(461,474),(474,487),(487,500),(500,513),(513,526),(526,539),(539,552),(552,565),(565,578),(578,591),(591,604),(604,617),(617,630),(630,643),(643,656),(656,669),(669,682),(682,695),(695,708),(708,721),(721,734),(734,747),(747,760),(760,773),(773,786),(786,799),(799,812),(812,825),(825,838),(838,851),(851,864),(864,877),(877,890),(890,903),(903,916),(916,929),(929,942),(942,955),(955,968),(968,981),(981,997)]
slice_list_RED = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,57),(57,69),(69,81),(81,93),(93,105),(105,117),(117,129),(129,141),(141,157)]
slice_list_REI = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,61),(61,67),(67,87),(87,88),(88,100),(100,108),(108,116),(116,176),(176,192),(192,193),(193,199),(199,214),(214,222),(222,230),(230,238),(238,250),(250,251),(251,311),(311,327)]
slice_list_RES = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,54),(54,134),(134,136),(136,148),(148,160),(160,172),(172,184),(184,196),(196,208),(208,220),(220,232),(232,242),(242,252),(252,262),(262,272),(272,282),(282,292),(292,299),(299,309),(309,319),(319,329),(329,339),(339,349),(349,359),(359,369),(369,379),(379,389),(389,399),(399,409),(409,419),(419,429),(429,439),(439,449),(449,465),(465,475),(475,975),(975,991)]
slice_list_RST = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,87),(87,95),(95,125),(125,145),(145,161),(161,177)]
slice_list_SPC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,85),(85,93),(93,101),(101,109),(109,117),(117,125),(125,133),(133,149)]
slice_list_SSN = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,54),(54,62),(62,74),(74,82),(82,94),(94,102),(102,114),(114,122),(122,134),(134,142),(142,143),(143,151),(151,159),(159,160),(160,168),(168,176),(176,177),(177,185),(185,193),(193,194),(194,202),(202,210),(210,211),(211,219),(219,220),(220,228),(228,268),(268,276),(276,277),(277,293)]
slice_list_WRK = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,57),(57,72),(72,73),(73,81),(81,82),(82,90),(90,98),(98,106),(106,114),(114,122),(122,130),(130,131),(131,132),(132,133),(133,153),(153,154),(154,155),(155,159),(159,179),(179,180),(180,240),(240,248),(248,256),(256,264),(264,272),(272,280),(280,284),(284,288),(288,298),(298,314),(314,330)]
slice_list_WD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,54),(54,58),(58,59),(59,60),(60,61),(61,63),(63,73),(73,74),(74,82),(82,83),(83,91),(91,99),(99,107),(107,108),(108,118),(118,120),(120,130),(130,137),(137,139),(139,149),(149,156),(156,158),(158,168),(168,175),(175,177),(177,187),(187,194),(194,196),(196,206),(206,213),(213,223),(223,233),(233,243),(243,253),(253,263),(263,273),(273,283),(283,293),(293,303),(303,311),(311,314),(314,322),(322,332),(332,342),(342,352),(352,353),(353,354),(354,355),(355,365),(365,375),(375,385),(385,395),(395,405),(405,415),(415,425),(425,435),(435,436),(436,437),(437,438),(438,439),(439,440),(440,442),(442,443),(443,444),(444,445),(445,446),(446,448),(448,458),(458,460),(460,470),(470,472),(472,482),(482,484),(484,494),(494,496),(496,506),(506,508),(508,518),(518,528),(528,542),(542,543),(543,551),(551,559),(559,561),(561,565),(565,567),(567,574),(574,582),(582,583),(583,584),(584,585),(585,593),(593,594),(594,595),(595,596),(596,604),(604,605),(605,606),(606,607),(607,615),(615,616),(616,617),(617,618),(618,626),(626,627),(627,628),(628,629),(629,637),(637,645),(645,653),(653,661),(661,669),(669,677),(677,685),(685,693),(693,701),(701,709),(709,717),(717,721),(721,729),(729,732),(732,734),(734,738),(738,746),(746,749),(749,751),(751,755),(755,763),(763,766),(766,774),(774,782),(782,790),(790,798),(798,800),(800,801),(801,802),(802,813),(813,829)]
slice_list_WD3 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,47),(47,48),(48,49),(49,50),(50,51),(51,52),(52,53),(53,54),(54,55),(55,56),(56,57),(57,58),(58,98),(98,138),(138,178),(178,182),(182,183),(183,191),(191,197),(197,213)]

slice_dict = {
    'CLM': slice_list_CLM, 'CTL': slice_list_CTL, 'ADR': slice_list_ADR,
    'AGR': slice_list_AGR, 'ACN': slice_list_ACN, 'CST': slice_list_CST,
    'MCF': slice_list_MCF, 'DD1': slice_list_DD1, 'DES': slice_list_DES,
    'IBC': slice_list_IBC, 'ICD': slice_list_ICD, 'LEG': slice_list_LEG,
    'LD1': slice_list_LD1, 'LD2': slice_list_LD2, 'LD3': slice_list_LD3,
    'NET': slice_list_NET, 'NOT': slice_list_NOT, 'OFF': slice_list_OFF,
    'PAY': slice_list_PAY, 'PRC': slice_list_PRC, 'ACR': slice_list_ACR,
    'REC': slice_list_REC, 'RED': slice_list_RED, 'REI': slice_list_REI,
    'RES': slice_list_RES, 'RST': slice_list_RST, 'SPC': slice_list_SPC,
    'SSN': slice_list_SSN, 'WRK': slice_list_WRK, 'WD1': slice_list_WD1,
    'WD3': slice_list_WD3,
}

def slicer(file, slice_list):
    csv_string = ""
    for i in slice_list:
        csv_string += (file[i[0]:i[1]] + ",")
    return csv_string

overview_loc = 'c:/Users/location/E02_ingestion/' + 'overview_' + e02_ext  # put in file location where you would like to see logs
with open(overview_loc, 'w') as overview_file:
    for key, value in dict_of_record_lists.items():
        overview_file.write((key + ' ' + (str(len(value))) + '\n'))

for key, value in dict_of_record_lists.items():
    for k, v in slice_dict.items():
        if key == k:
            iteration = 0
            for i in value:
                s = slicer(i, v)
                value[iteration] = s
                iteration += 1

e02_ext = read_loc[int(e02ext_start):]
csv_ext = e02_ext[:-3] + 'csv'

# file overview/log that shows how many lines should exist in the other files to ensure everything wrote correctly
overview_loc = 'c:/Users/location/E02_ingestion/' + 'overview_' + e02_ext  # put in file location where you would like to see logs
with open(overview_loc, 'w') as overview_file:
    for key, value in dict_of_record_lists.items():
        overview_file.write((key + ' ' + (str(len(value))) + '\n'))

# if the list isn't empty writes a new file w/prefix matching key and includes the lines
for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + e02_ext
    with open(write_loc, "w", newline='') as parsed_file:
        for line in value:
            line_pre = "%s\n" % line
            parsed_file.write(line_pre[:-1])

for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            writer.writerow(i)

This is a sample of a section of output in both Excel and our SQL table:

P A Y 9 0 0 0 4 1 0 0 0 9 5 2 0 7 " " " " " " " "

Desired output (void of " as placeholders for spaces):

P A Y 9 0 0 0 4 1 0 0 0 9 5 2 0 7

Any help would be greatly appreciated.
Why: The problem you are facing is that you have list entries inside your rows of processed data that contain only the csv delimiter character. The module then quotes them to distinguish your "delimiter-only data" from the delimiters between columns. When writing something like [["PAY", "....", " ", " ", " ", " "]] into a csv using ' ' as divider, you get them output quoted:

import csv

dict_of_record_lists = {"K": [["PAY", "....", " ", " ", " ", " "]]}

for key, value in dict_of_record_lists.items():
    write_loc = 't.txt'
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            writer.writerow(i)

print(open(write_loc).read())
# PAY .... " " " " " " " "

Fix: You can fix that by specifying quoting=csv.QUOTE_NONE and providing an escapechar=..., or by fixing your data. Providing an escapechar would put it into your file, though. The relevant portion of the documentation is csv.QUOTE_NONE.

You can manipulate your data so it does not contain "only" the delimiters as data:

for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            # if an inner item only contains delimiter characters, set it to an empty string
            cleared = [x if x.strip(" ") else "" for x in i]
            writer.writerow(cleared)

HTH

Doku: https://docs.python.org/3/library/csv.html
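For comparison, a minimal sketch of the QUOTE_NONE route with the toy data from above (note that the escape character then lands in the output file, in front of every space that is data rather than delimiter):

import csv

rows = [["PAY", "....", " ", " ", " ", " "]]  # toy data from above
with open('t2.txt', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                        quoting=csv.QUOTE_NONE, escapechar='\\')
    for row in rows:
        writer.writerow(row)

print(open('t2.txt').read())
# the space-only fields survive unquoted, each preceded by a backslash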
Was able to change the initial text-writing portion to:

for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/Steve Barnard/Desktop/Git_Projects/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as parsed_file:
        for line in value:
            line_pre = "%s" % line
            parsed_file.write(line_pre[:-1] + '\n')

All the issues were fixed by avoiding Python's built-in CSV writer. The way my program added a comma following the line slices left one extra comma plus the '\n'; this led the [:-1] slice in the write function to remove the '\n' and not the final ','. By adding the '\n' after the comma removal, the entire problem was fixed and a functioning CSV that retained the spacing was created. A text file can be created by swapping out the extension upon writing.
CSV file not properly filled up with details
import csv

TextFileContent = open('tickets.txt')

with open('example4.csv', 'w') as csvfile:
    fieldnames = ['Author', 'ticket number', 'Revision']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for TextLine in TextFileContent:
        if 'Revision:' in TextLine:
            tmp = TextLine.replace('Revision:', "")
            print(tmp)
            writer.writerow({'Revision': tmp})
        elif 'Author:' in TextLine:
            tmp = TextLine.replace("Author:", "")
            print(tmp)
            writer.writerow({'Author': tmp})
        elif 'Contributes to:' in TextLine:
            tmp = TextLine.replace("Contributes to:", "")
            print(tmp)
            writer.writerow({'ticket number': tmp})

Hi all, I have developed the above Python script to extract "Author", "Ticket" and "Revision" details from a text file and then fill that information into a CSV file. I am able to extract all the information, but the data is not filled in correctly in the CSV file. The text file content is like below:

Revision: 22904
Author: Userx
Contributes to: CF-1159

Revision: 22887
Author: Usery
Contributes to: CF-955

Revision: 22884
Author: UserZ
Contributes to: CPL-7768

And I want the result in the CSV file like below:

Author    ticket number    Revision
Userx     CF-1159          22904
Usery     CF-955           22887
UserZ     CPL-7768         22884
Your code writes a row as soon as it finds any field, instead of waiting until it has read a full set of fields. The following edit waits for a full set and then writes the row:

with open('/tmp/out.csv', 'w') as csvfile:
    fieldnames = ['Author', 'ticket number', 'Revision']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    row = {}
    for TextLine in TextFileContent:
        if 'Revision:' in TextLine:
            row['Revision'] = TextLine.replace('Revision: ', "")
        elif 'Author:' in TextLine:
            row['Author'] = TextLine.replace("Author: ", "")
        elif 'Contributes to:' in TextLine:
            row['ticket number'] = TextLine.replace("Contributes to: ", "")
        if len(row) == len(fieldnames):
            writer.writerow(row)
            row = {}

Note that this will not function correctly unless all records contain all fields.
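If records can be incomplete, here is a hedged variant (assuming each record starts with a 'Revision:' line, as in the sample data) that flushes on record boundaries instead of counting fields, and also strips the trailing newlines that replace() leaves in the values:

import csv

with open('tickets.txt') as TextFileContent, open('example4.csv', 'w', newline='') as csvfile:
    fieldnames = ['Author', 'ticket number', 'Revision']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    row = {}
    for TextLine in TextFileContent:
        if 'Revision:' in TextLine:
            if row:  # a new record begins: flush the previous one
                writer.writerow(row)
            row = {'Revision': TextLine.replace('Revision:', '').strip()}
        elif 'Author:' in TextLine:
            row['Author'] = TextLine.replace('Author:', '').strip()
        elif 'Contributes to:' in TextLine:
            row['ticket number'] = TextLine.replace('Contributes to:', '').strip()
    if row:  # flush the last record
        writer.writerow(row)

DictWriter fills any missing field with an empty string by default (restval), so partial records still produce well-formed rows.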
Row labels to columns in multi-column file
I am new to Python and am using version 2.7.1 as part of Hyperion FDMEE. I have a file in which I need to reorder the columns, plus split one column into 3, as part of the same file.

Source file:

ACCOUNT;UD1;UD2;UD3;PERIOD;PERIOD;AMOUNT
QTY;032074;99953;53;2017.07.31;2017.07.31;40.91
COGS;032074;99953;53;2017.07.31;2017.07.31;-7488.36
TURNOVER;032074;99953;53;2017.07.31;2017.07.31;505.73
QTY;032075;99960;60;2017.07.31;2017.07.31;40.91
COGS;032075;99960;60;2017.07.31;2017.07.31;-7488.36
TURNOVER;032075;99960;60;2017.07.31;2017.07.31;505.73

I have managed to reorder the columns with this script:

import csv

infilename = fdmContext["OUTBOXDIR"]+"/Targit_1707.dat"
outfilename = fdmContext["OUTBOXDIR"]+"/TargitExport.csv"

infile = open(infilename, 'r')
outfile = open(outfilename, 'w+')
for line in infile:
    column = line.split(';')
    outfile.write(column[1] + ";" + column[2] + ";" + column[3] + ";" + column[4] + ";" + column[0] + ";" + str(column[6].strip('\n')) + ";201701" + "\n")
outfile.close()
infile.close()

producing the result:

UD1;UD2;UD3;PERIOD;ACCOUNT;AMOUNT;201701
032074;99953;53;2017.07.31;QTY;40.91;201701
032074;99953;53;2017.07.31;COGS;-7488.36;201701
032074;99953;53;2017.07.31;TURNOVER;505.73;201701
032075;99960;60;2017.07.31;QTY;40.91;201701
032075;99960;60;2017.07.31;COGS;-7488.36;201701
032075;99960;60;2017.07.31;TURNOVER;505.73;201701

but I am struggling to transpose the Account column (QTY, COGS, TURNOVER) into separate columns as in the example below:

UD1;UD2;UD3;PERIOD;QTY;COGS;TURNOVER;201701
032074;99953;53;2017.07.31;40.91;-7488.36;505.73;201701
032075;99960;60;2017.07.31;40.91;-7488.36;505.73;201701

Any suggestions would be very much appreciated.
Use a dict, for instance:

import csv

fieldnames = infile.readline()[:-1]
fieldnames = fieldnames.split(';')[1:5] + ['QTY', 'COGS', 'TURNOVER']
writer = csv.DictWriter(outfile, fieldnames=fieldnames)
writer.writeheader()

record_dict = {}
for i, line in enumerate(infile):
    if not line:
        break
    line = line[:-1].split(';')
    # Assign column data every 1,2,3 lines
    mod_row = (i % 3) + 1
    if mod_row == 1:
        record_dict['QTY'] = line[6]
        record_dict['UD1'] = line[1]
        # ... and so on
    if mod_row == 2:
        record_dict['COGS'] = line[6]
    if mod_row == 3:
        record_dict['TURNOVER'] = line[6]
        writer.writerow(record_dict)
        record_dict = {}

Output:

UD1,UD2,UD3,PERIOD,QTY,COGS,TURNOVER
032074,,,,40.91,-7488.36,505.73
032075,,,,40.91,-7488.36,505.73

Tested with Python 3.4.2.

Read about: Python » 3.6.1 Documentation: csv.DictWriter
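The i % 3 counter assumes the three account rows always arrive in QTY, COGS, TURNOVER order. Here is a sketch of an order-tolerant variant (file names are the ones from the question) that keys each record on the UD1/UD2/UD3/PERIOD columns instead:

records = {}
with open('Targit_1707.dat') as infile:
    next(infile)  # skip the header row
    for line in infile:
        cols = line.strip().split(';')
        key = tuple(cols[1:5])                           # (UD1, UD2, UD3, PERIOD)
        records.setdefault(key, {})[cols[0]] = cols[6]   # ACCOUNT -> AMOUNT

with open('TargitExport.csv', 'w') as outfile:
    outfile.write('UD1;UD2;UD3;PERIOD;QTY;COGS;TURNOVER;201701\n')
    for key, amounts in records.items():
        row = list(key) + [amounts.get(a, '') for a in ('QTY', 'COGS', 'TURNOVER')] + ['201701']
        outfile.write(';'.join(row) + '\n')

On Python 2.7 a plain dict does not preserve insertion order, so the output rows may come out in a different order; collections.OrderedDict fixes that if it matters.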
Group and Check-mark using Python
I have several files, each of which has data like this (filename: data inside, separated by newlines):

Mike: Plane\nCar
Paula: Plane\nTrain\nBoat\nCar
Bill: Boat\nTrain
Scott: Car

How can I create a csv file using python that groups all the different vehicles and then puts an X on the applicable person, like:
Assuming those line numbers aren't in there (easy enough to fix if they are), and with an input file like the following:

Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car

a solution can be found here: https://gist.github.com/999481

import sys
from collections import defaultdict
import csv

# see http://stackoverflow.com/questions/6180609/group-and-check-mark-using-python

def main():
    # files = ["group.txt"]
    files = sys.argv[1:]
    if len(files) < 1:
        print "usage: ./python_checkmark.py file1 [file2 ... filen]"
    name_map = defaultdict(set)
    for f in files:
        file_handle = open(f, "r")
        process_file(file_handle, name_map)
        file_handle.close()
    print_csv(sys.stdout, name_map)

def process_file(input_file, name_map):
    cur_name = ""
    for line in input_file:
        if ":" in line:
            cur_name, item = [x.strip() for x in line.split(":")]
        else:
            item = line.strip()
        name_map[cur_name].add(item)

def print_csv(output_file, name_map):
    names = name_map.keys()
    items = set([])
    for item_set in name_map.values():
        items = items.union(item_set)

    writer = csv.writer(output_file, quoting=csv.QUOTE_MINIMAL)
    writer.writerow([""] + names)
    for item in sorted(items):
        row_contents = map(lambda name: "X" if item in name_map[name] else "", names)
        row = [item] + row_contents
        writer.writerow(row)

if __name__ == '__main__':
    main()

Output:

,Mike,Bill,Scott,Paula
Boat,,X,,X
Car,X,,X,X
Plane,X,,,X
Train,,X,,X

The only thing this script doesn't do is keep the columns in the order the names appear in. It could keep a separate list maintaining the order, since maps/dicts are inherently unordered.
Here is an example of how to parse these kinds of files. Note that the dictionary is unordered here. You can use OrderedDict from the standard library (Python 3.2 / 2.7), find an available implementation / backport for older Python versions, or just keep the order in an additional list :)

data = {}
name = None
with open(file_path) as f:
    for line in f:
        if ':' in line:
            # we have a name here
            name, first_vehicle = line.split(':')
            data[name] = set([first_vehicle, ])  # a set of vehicles per name
        else:
            if name:
                data[name].add(line)

# now a dictionary with names/vehicles is available
# let's convert it to a simple csv-formatted string..

# a set of all available vehicles
vehicles = set(v for vlist in data.values() for v in vlist)

for name in data:
    name_vehicles = data[name]
    csv_vehicles = ''
    for v in vehicles:
        if v in name_vehicles:
            csv_vehicles += v
        csv_vehicles += ','
    csv_line = name + ',' + csv_vehicles
Assuming that the input looks like this:

Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car

this Python script places the vehicles in a dictionary, indexed by person:

#!/usr/bin/python

persons = {}
vehicles = set()

with open('input') as fd:
    for line in fd:
        line = line.strip()
        if ':' in line:
            tmp = line.split(':')
            p = tmp[0].strip()
            v = tmp[1].strip()
            persons[p] = [v]
            vehicles.add(v)
        else:
            persons[p].append(line)
            vehicles.add(line)

for k, v in persons.iteritems():
    print k, v

print 'vehicles', vehicles

Result:

Mike ['Plane', 'Car']
Bill ['Boat', 'Train']
Scott ['Car']
Paula ['Plane', 'Train', 'Boat', 'Car']
vehicles set(['Train', 'Car', 'Plane', 'Boat'])

Now all the data needed are placed in data structures. The csv part is left as an exercise for the reader :-)
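For completeness, a minimal sketch of that csv part (assuming the persons dict and vehicles set built above; it writes one row per person, one column per vehicle):

import csv

with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    ordered = sorted(vehicles)
    writer.writerow([''] + ordered)  # header row: one column per vehicle
    for p, owned in persons.items():
        writer.writerow([p] + ['X' if v in owned else '' for v in ordered])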
The most elegant and simple way would be like so:

import os

vehiclesToPeople = {}
people = []
for root, dirs, files in os.walk('/path/to/folder/with/files'):
    for file in files:
        person = file
        people += [person]
        path = os.path.join(root, file)
        with open(path) as f:
            for vehicle in f:
                # strip the trailing newline so 'Car' and 'Car\n' don't count as different vehicles
                vehiclesToPeople.setdefault(vehicle.strip(), set()).add(person)

people.sort()
table = [[''] + people]
for vehicle, owners in vehiclesToPeople.items():
    table.append([vehicle] + [('X' if p in owners else '') for p in people])

csv = '\n'.join(','.join(row) for row in table)

You can do pprint.pprint(table) as well to look at it.