Converting Fixed-Width File to .txt then the .txt to .csv - python
I have a fixed-width file that I have no issues importing and splitting into 31 txt files. The spaces from the fixed-width file are preserved in this process, since writing to the txt simply writes each record from the fixed-width file as a new line.
My issue is that when I use Python's csv module, these spaces are replaced with " (a quotation mark) as a placeholder.
I'm looking to see if there is a way to produce a csv file without these double quotes as placeholders, while maintaining the formatting originally set in the fixed-width file.
Initial line in txt doc:
'PAY90004100095206 9581400086000909 0008141000 5350 3810 C 000021841998051319980513P810406247 FELT, MARTIN & FRAZIER, P.C. FELT, MARTIN & FRAZIER, P.C. 208 NORTH BROADWAY STE 313 BILLINGS MT59101-0 NLance Martin v. Whitman College N00000000NN98004264225 SYS656 19980512+000000378761998041319980421+000000378769581400086000909 000+000000 Lance Martin v. Whitman College 00000000 00010001 +00000000000002184 000000021023.005000000003921.005\n'
.py:
import csv

read_loc = 'c:/Users/location/e0290000005.txt'
e02ext_start = read_loc.find('e02')
e02_ext = read_loc[e02ext_start:]

with open(read_loc, 'r') as f:
    contents = f.readlines()

dict_of_record_lists = {}

# takes the first 3 characters of each line and, if a matching dictionary key
# is found, appends the line to that key's value-list
for line in contents:
    record_type = line[:3]
    dict_of_record_lists.setdefault(record_type, []).append(line)
slice_list_CLM = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,47),(47,55),(55,59),(59,109),(109,189),(189,191),(191,193),(193,194),(194,195),(195,203),(203,211),(211,219),(219,227),(227,235),(235,237),(237,239),(239,241),(241,245),(245,249),(249,253),(253,257),(257,261),(261,291),(291,316),(316,331),(331,332),(332,357),(357,377),(377,378),(378,408),(408,438),(438,468),(468,470),(470,485),(485,505),(505,514),(514,517),(517,525),(525,533),(533,535),(535,536),(536,537),(537,545),(545,551),(551,553),(553,568),(568,572),(572,587),(587,602),(602,627),(627,631),(631,638),(638,642),(642,646),(646,654),(654,662),(662,670),(670,672),(672,674),(674,675),(675,676),(676,682),(682,700),(700,708),(708,716),(716,717),(717,725),(725,733),(733,741),(741,749),(749,759),(759,761),(761,762),(762,763),(763,764),(764,765),(765,768),(768,769),(769,770),(770,778),(778,779),(779,783),(783,787),(787,788),(788,805),(805,817),(817,829),(829,833),(833,863),(863,893),(893,896),(896,897),(897,898),(898,928),(928,936),(936,944),(944,945),(945,947),(947,959),(959,971),(971,983),(983,995),(995,1007),(1007,1019),(1019,1031),(1031,1043),(1043,1055),(1055,1067),(1067,1079),(1079,1091),(1091,1103),(1103,1115),(1115,1127),(1127,1139),(1139,1151),(1151,1163),(1163,1175),(1175,1187),(1187,1197),(1197,1202),(1202,1203),(1203,1211),(1211,1214),(1214,1215),(1215,1233),(1233,1241),(1241,1257),(1257,1272),(1272,1273),(1273,1285),(1285,1289),(1289,1293),(1293,1343),(1343,1365),(1365,1685),(1685,1686),(1686,1704),(1704,1708),(1708,1748),(1748,1768),(1768,1770),(1770,1772),(1772,1773),(1773,1782),(1782,1784),(1784,1792),(1792,1793),(1793,1796),(1796,1800)]
slice_list_CTL = [(0,3),(3,7),(7,15),(15,23),(23,31),(31,39),(39,47),(47,55),(55,56),(56,65),(65,74),(74,83),(83,98),(98,113),(113,128),(128,143),(143,158),(158,173),(173,188),(188,203),(203,218),(218,233),(233,248),(248,263),(263,278),(278,293),(293,308),(308,323),(323,338),(338,353),(353,368),(368,383),(383,398),(398,413),(413,428),(428,443),(443,458),(458,473),(473,488),(488,503),(503,518),(518,527),(527,536),(536,545),(545,554),(554,563),(563,572),(572,581),(581,590),(590,599),(599,614),(614,623),(623,638),(638,647),(647,662),(662,671),(671,686),(686,695),(695,710),(710,719),(719,728),(728,737),(737,746),(746,755),(755,764),(764,773),(773,782),(782,791),(791,800),(800,809),(809,818),(818,827),(827,836),(836,845),(845,854),(854,863),(863,872),(872,881),(881,890),(890,899),(899,908)]
slice_list_ADR = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,50),(50,53),(53,62),(62,65),(65,66),(66,91),(91,111),(111,121),(121,151),(151,181),(181,206),(206,208),(208,223),(223,243),(243,261),(261,265),(265,283),(283,287),(287,305),(305,335),(335,375),(375,383),(383,387),(387,437),(437,438),(438,446),(446,454),(454,461),(461,468),(468,484),(484,500)]
slice_list_AGR = [(0,3),(3,7),(7,45),(45,85),(85,93),(93,101),(101,109),(109,117),(117,127),(127,139),(139,151)]
slice_list_ACN = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,65),(65,95),(95,115),(115,145),(145,165),(165,195),(195,215),(215,245),(245,265),(265,295),(295,315),(315,345),(345,365),(365,395),(395,415),(415,445),(445,465),(465,495),(495,515),(515,545),(545,565),(565,595),(595,615),(615,645),(645,665),(665,695),(695,715),(715,745),(745,765),(765,795),(795,815),(815,845),(845,865),(865,895),(895,915),(915,945),(945,965),(965,995),(995,1015),(1015,1045),(1045,1061)]
slice_list_CST = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,59),(59,60),(60,61),(61,62),(62,64),(64,80),(80,82),(82,84),(84,86),(86,88),(88,104)]
slice_list_MCF = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,49),(49,79),(79,94),(94,159),(159,175),(175,191)]
slice_list_DD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,54),(54,62),(62,63),(63,69),(69,75),(75,81),(81,87),(87,93),(93,94),(94,95),(95,103),(103,111),(111,119),(119,126),(126,134),(134,143),(143,154),(154,162),(162,170),(170,178),(178,186),(186,194),(194,202),(202,205),(205,208),(208,210),(210,218),(218,220),(220,228),(228,230),(230,238),(238,240),(240,248),(248,250),(250,258),(258,274)]
slice_list_DES = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,1300),(1300,1316)]
slice_list_IBC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,48),(48,50),(50,54),(54,55),(55,56),(56,81),(81,101),(101,121),(121,124),(124,125),(125,145),(145,146),(146,149),(149,152),(152,154),(154,179),(179,199),(199,219),(219,222),(222,224),(224,227),(227,230),(230,238),(238,249),(249,265),(265,281)]
slice_list_ICD = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,57),(57,63),(63,69),(69,75),(75,81),(81,87),(87,95),(95,103),(103,111),(111,114),(114,122),(122,125),(125,126),(126,142),(142,144),(144,152),(152,154),(154,162),(162,164),(164,172),(172,174),(174,182),(182,184),(184,192),(192,208)]
slice_list_LEG = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,65),(65,73),(73,81),(81,82),(82,90),(90,98),(98,133),(133,148),(148,163),(163,164),(164,172),(172,180),(180,181),(181,216),(216,256),(256,296),(296,326),(326,356),(356,381),(381,383),(383,398),(398,418),(418,438),(438,456),(456,474),(474,509),(509,549),(549,589),(589,619),(619,649),(649,674),(674,676),(676,691),(691,711),(711,731),(731,749),(749,767),(767,782),(782,790),(790,798),(798,806),(806,810),(810,818),(818,826),(826,834),(834,840),(840,849),(849,879),(879,888),(888,918),(918,920),(920,921),(921,923),(923,931),(931,939),(939,943),(943,944),(944,952),(952,960),(960,990),(990,1020),(1020,1050),(1050,1051),(1051,1086),(1086,1095),(1095,1135),(1135,1175),(1175,1205),(1205,1235),(1235,1260),(1260,1262),(1262,1277),(1277,1295),(1295,1304),(1304,1312),(1312,1328)]
slice_list_LD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,65),(65,95),(95,125),(125,150),(150,152),(152,167),(167,187),(187,205),(205,223),(223,227),(227,252),(252,267),(267,279),(279,309),(309,339),(339,359),(359,361),(361,376),(376,396),(396,414),(414,439),(439,440),(440,448),(448,454),(454,456),(456,471),(471,472),(472,492),(492,522),(522,552),(552,572),(572,574),(574,589),(589,609),(609,627),(627,637),(637,645),(645,685),(685,686),(686,706),(706,714),(714,744),(744,774),(774,794),(794,796),(796,811),(811,831),(831,849),(849,879),(879,909),(909,929),(929,931),(931,946),(946,966),(966,984),(984,992),(992,1004),(1004,1024),(1024,1064),(1064,1081),(1081,1098),(1098,1106),(1106,1121),(1121,1122),(1122,1152),(1152,1153),(1153,1162),(1162,1170),(1170,1185),(1185,1190),(1190,1220),(1220,1238),(1238,1253),(1253,1283),(1283,1301),(1301,1302),(1302,1303),(1303,1333),(1333,1363),(1363,1388),(1388,1390),(1390,1405),(1405,1406),(1406,1436),(1436,1442),(1442,1462),(1462,1463),(1463,1478),(1478,1493),(1493,1533),(1533,1535),(1535,1538),(1538,1540),(1540,1556),(1556,1756)]
slice_list_LD2 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,60),(60,78),(78,118),(118,148),(148,178),(178,203),(203,205),(205,220),(220,238),(238,256),(256,260),(260,270),(270,290),(290,300),(300,302),(302,322),(322,352),(352,377),(377,397),(397,398),(398,423),(423,424),(424,454),(454,455),(455,456),(456,458),(458,474)]
slice_list_LD3 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,71),(71,91),(91,92),(92,122),(122,152),(152,177),(177,179),(179,194),(194,197),(197,205),(205,213),(213,221),(221,229),(229,237),(237,297),(297,305),(305,313),(313,321),(321,329),(329,337),(337,345),(345,353),(353,361),(361,421),(421,429),(429,489),(489,497),(497,557),(557,617),(617,633)]
slice_list_NET = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,88),(88,99),(99,105),(105,135),(135,146),(146,152),(152,182),(182,193),(193,199),(199,229),(229,240),(240,246),(246,276),(276,287),(287,293),(293,323),(323,334),(334,340),(340,370),(370,381),(381,387),(387,417),(417,428),(428,434),(434,464),(464,475),(475,481),(481,511),(511,522),(522,528),(528,558),(558,569),(569,575),(575,605),(605,616),(616,622),(622,652),(652,663),(663,669),(669,699),(699,710),(710,716),(716,746),(746,757),(757,763),(763,793),(793,804),(804,810),(810,840),(840,851),(851,857),(857,887),(887,898),(898,904),(904,934),(934,945),(945,951),(951,981),(981,992),(992,998),(998,1028),(1028,1039),(1039,1047),(1047,1055),(1055,1061),(1061,1077),(1077,1087),(1087,1103)]
slice_list_NOT = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,47),(47,55),(55,63),(63,71),(71,77),(77,79),(79,1279),(1279,1295),(1295,1296),(1296,1312)]
slice_list_OFF = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,75),(75,78),(78,93),(93,105),(105,107),(107,115),(115,123),(123,131),(131,132),(132,148)]
slice_list_PAY = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,60),(60,61),(61,65),(65,73),(73,81),(81,89),(89,90),(90,130),(130,165),(165,205),(205,245),(245,275),(275,305),(305,330),(330,332),(332,347),(347,367),(367,368),(368,428),(428,429),(429,437),(437,438),(438,439),(439,450),(450,452),(452,455),(455,458),(458,473),(473,481),(481,493),(493,501),(501,509),(509,521),(521,539),(539,542),(542,549),(549,552),(552,562),(562,567),(567,627),(627,635),(635,643),(643,647),(647,651),(651,653),(653,654),(654,684),(684,692),(692,702),(702,713),(713,1034),(1034,1050),(1050,1066)]
slice_list_PRC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,51),(51,81),(81,84),(84,87),(87,95),(95,103),(103,119),(119,125),(125,131),(131,147)]
slice_list_ACR = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,51),(51,59),(59,71),(71,79),(79,91),(91,103),(103,119),(119,135)]
slice_list_REC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,58),(58,71),(71,84),(84,97),(97,110),(110,123),(123,136),(136,149),(149,162),(162,175),(175,188),(188,201),(201,214),(214,227),(227,240),(240,253),(253,266),(266,279),(279,292),(292,305),(305,318),(318,331),(331,344),(344,357),(357,370),(370,383),(383,396),(396,409),(409,422),(422,435),(435,448),(448,461),(461,474),(474,487),(487,500),(500,513),(513,526),(526,539),(539,552),(552,565),(565,578),(578,591),(591,604),(604,617),(617,630),(630,643),(643,656),(656,669),(669,682),(682,695),(695,708),(708,721),(721,734),(734,747),(747,760),(760,773),(773,786),(786,799),(799,812),(812,825),(825,838),(838,851),(851,864),(864,877),(877,890),(890,903),(903,916),(916,929),(929,942),(942,955),(955,968),(968,981),(981,997)]
slice_list_RED = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,57),(57,69),(69,81),(81,93),(93,105),(105,117),(117,129),(129,141),(141,157)]
slice_list_REI = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,61),(61,67),(67,87),(87,88),(88,100),(100,108),(108,116),(116,176),(176,192),(192,193),(193,199),(199,214),(214,222),(222,230),(230,238),(238,250),(250,251),(251,311),(311,327)]
slice_list_RES = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,54),(54,134),(134,136),(136,148),(148,160),(160,172),(172,184),(184,196),(196,208),(208,220),(220,232),(232,242),(242,252),(252,262),(262,272),(272,282),(282,292),(292,299),(299,309),(309,319),(319,329),(329,339),(339,349),(349,359),(359,369),(369,379),(379,389),(389,399),(399,409),(409,419),(419,429),(429,439),(439,449),(449,465),(465,475),(475,975),(975,991)]
slice_list_RST = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,87),(87,95),(95,125),(125,145),(145,161),(161,177)]
slice_list_SPC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,85),(85,93),(93,101),(101,109),(109,117),(117,125),(125,133),(133,149)]
slice_list_SSN = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,54),(54,62),(62,74),(74,82),(82,94),(94,102),(102,114),(114,122),(122,134),(134,142),(142,143),(143,151),(151,159),(159,160),(160,168),(168,176),(176,177),(177,185),(185,193),(193,194),(194,202),(202,210),(210,211),(211,219),(219,220),(220,228),(228,268),(268,276),(276,277),(277,293)]
slice_list_WRK = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,57),(57,72),(72,73),(73,81),(81,82),(82,90),(90,98),(98,106),(106,114),(114,122),(122,130),(130,131),(131,132),(132,133),(133,153),(153,154),(154,155),(155,159),(159,179),(179,180),(180,240),(240,248),(248,256),(256,264),(264,272),(272,280),(280,284),(284,288),(288,298),(298,314),(314,330)]
slice_list_WD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,54),(54,58),(58,59),(59,60),(60,61),(61,63),(63,73),(73,74),(74,82),(82,83),(83,91),(91,99),(99,107),(107,108),(108,118),(118,120),(120,130),(130,137),(137,139),(139,149),(149,156),(156,158),(158,168),(168,175),(175,177),(177,187),(187,194),(194,196),(196,206),(206,213),(213,223),(223,233),(233,243),(243,253),(253,263),(263,273),(273,283),(283,293),(293,303),(303,311),(311,314),(314,322),(322,332),(332,342),(342,352),(352,353),(353,354),(354,355),(355,365),(365,375),(375,385),(385,395),(395,405),(405,415),(415,425),(425,435),(435,436),(436,437),(437,438),(438,439),(439,440),(440,442),(442,443),(443,444),(444,445),(445,446),(446,448),(448,458),(458,460),(460,470),(470,472),(472,482),(482,484),(484,494),(494,496),(496,506),(506,508),(508,518),(518,528),(528,542),(542,543),(543,551),(551,559),(559,561),(561,565),(565,567),(567,574),(574,582),(582,583),(583,584),(584,585),(585,593),(593,594),(594,595),(595,596),(596,604),(604,605),(605,606),(606,607),(607,615),(615,616),(616,617),(617,618),(618,626),(626,627),(627,628),(628,629),(629,637),(637,645),(645,653),(653,661),(661,669),(669,677),(677,685),(685,693),(693,701),(701,709),(709,717),(717,721),(721,729),(729,732),(732,734),(734,738),(738,746),(746,749),(749,751),(751,755),(755,763),(763,766),(766,774),(774,782),(782,790),(790,798),(798,800),(800,801),(801,802),(802,813),(813,829)]
slice_list_WD3 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,47),(47,48),(48,49),(49,50),(50,51),(51,52),(52,53),(53,54),(54,55),(55,56),(56,57),(57,58),(58,98),(98,138),(138,178),(178,182),(182,183),(183,191),(191,197),(197,213)]
slice_dict = {
    'CLM' : slice_list_CLM,
    'CTL' : slice_list_CTL,
    'ADR' : slice_list_ADR,
    'AGR' : slice_list_AGR,
    'ACN' : slice_list_ACN,
    'CST' : slice_list_CST,
    'MCF' : slice_list_MCF,
    'DD1' : slice_list_DD1,
    'DES' : slice_list_DES,
    'IBC' : slice_list_IBC,
    'ICD' : slice_list_ICD,
    'LEG' : slice_list_LEG,
    'LD1' : slice_list_LD1,
    'LD2' : slice_list_LD2,
    'LD3' : slice_list_LD3,
    'NET' : slice_list_NET,
    'NOT' : slice_list_NOT,
    'OFF' : slice_list_OFF,
    'PAY' : slice_list_PAY,
    'PRC' : slice_list_PRC,
    'ACR' : slice_list_ACR,
    'REC' : slice_list_REC,
    'RED' : slice_list_RED,
    'REI' : slice_list_REI,
    'RES' : slice_list_RES,
    'RST' : slice_list_RST,
    'SPC' : slice_list_SPC,
    'SSN' : slice_list_SSN,
    'WRK' : slice_list_WRK,
    'WD1' : slice_list_WD1,
    'WD3' : slice_list_WD3,
}
def slicer(file, slice_list):
    csv_string = ""
    for i in slice_list:
        csv_string += file[i[0]:i[1]] + ","
    return csv_string
for key, value in dict_of_record_lists.items():
    for k, v in slice_dict.items():
        if key == k:
            iteration = 0
            for i in value:
                s = slicer(i, v)
                value[iteration] = s
                iteration += 1
csv_ext = e02_ext[:-3] + 'csv'

# file overview/log that shows how many lines should exist in the other files to ensure everything wrote correctly
overview_loc = 'c:/Users/location/E02_ingestion/' + 'overview_' + e02_ext  # put in the file location where you would like to see the logs
with open(overview_loc, 'w') as overview_file:
    for key, value in dict_of_record_lists.items():
        overview_file.write(key + ' ' + str(len(value)) + '\n')
# if the list isn't empty, writes a new file w/ prefix matching the key and includes the lines
for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + e02_ext
    with open(write_loc, "w", newline='') as parsed_file:
        for line in value:
            line_pre = "%s\n" % line
            parsed_file.write(line_pre[:-1])
for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            writer.writerow(i)
This is a sample of a section of output in both Excel and our SQL table:
P A Y 9 0 0 0 4 1 0 0 0 9 5 2 0 7 " " " " " " " "
Desired output (without " as a placeholder for spaces):
P A Y 9 0 0 0 4 1 0 0 0 9 5 2 0 7
Any help would be greatly appreciated.
Why:
The problem you are facing is that you have list entries inside your rows of processed data that contain only the csv delimiter character. The module then quotes them to distinguish your "delimiter-only data" from the delimiters between columns.
When writing something like [["PAY", "....", " ", " ", " ", " "]] into a csv using ' ' as the delimiter, the space-only fields come out quoted:
import csv

dict_of_record_lists = {"K": [["PAY", "....", " ", " ", " ", " "]]}

for key, value in dict_of_record_lists.items():
    write_loc = 't.txt'
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            writer.writerow(i)

print(open(write_loc).read())  # PAY .... " " " " " " " "
Fix:
You can fix that by specifying quoting=csv.QUOTE_NONE and providing an escapechar=..., or by fixing your data. Providing an escapechar would put that character into your file, though.
Relevant portions of the documentation: csv.QUOTE_NONE.
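A minimal sketch of that route (the file name and the backslash escape character here are illustrative choices, not anything from the question):

import csv

row = ["PAY", "....", " ", " "]

with open('t_noquote.txt', 'w', newline='') as csvfile:
    # QUOTE_NONE disables quoting entirely; escapechar is then required so the
    # writer can escape delimiter characters that occur inside a field
    writer = csv.writer(csvfile, delimiter=' ',
                        quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerow(row)

# the space-only fields are written as '\ ' -- no quotes, but the backslash
# now appears in the file
print(open('t_noquote.txt').read())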
Alternatively, you can manipulate your data so no field consists of only delimiter characters:
for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            # if an inner item contains only delimiter characters, set it to an empty string
            cleared = [x if x.strip(" ") else "" for x in i]
            writer.writerow(cleared)
HTH
Docs:
https://docs.python.org/3/library/csv.html
Was able to change the initial text writing portion to:
for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/Steve Barnard/Desktop/Git_Projects/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as parsed_file:
        for line in value:
            line_pre = "%s" % line
            parsed_file.write(line_pre[:-1] + '\n')
All the issues were fixed by avoiding Python's built-in CSV writer.
The way my program added a comma after each line slice left one extra comma before the '\n'; this led the [:-1] slice in the original write function to remove the '\n' and not the final ','. By appending the '\n' after the comma removal, the entire problem was fixed and a functioning CSV that retained the spacing was created.
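For anyone following along, a small demonstration of that off-by-one (the sample line below is made up, not taken from the real data):

line = "PAY,9000,"             # hypothetical sliced line; slicer() leaves a trailing comma
line_pre = "%s\n" % line       # old version appended '\n' first: 'PAY,9000,\n'
print(repr(line_pre[:-1]))     # 'PAY,9000,'  -- [:-1] strips the newline, the comma survives
print(repr(line[:-1] + '\n'))  # 'PAY,9000\n' -- comma removed first, newline appended after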
A text file can be created by swapping out the extension upon writing.
Related
Printing Values to CSV (Python)
I am trying to send the failed values to a CSV file but it's only giving me the last failed value in the list.

print(("Folder\t" + "Expected\t" + "Actual\t" + "Result").expandtabs(20))
for key in expected:
    expectedCount = str(expected[key])
    actualCount = "0"
    if key in newDictionary:
        actualCount = str(newDictionary[key])
    elif expectedCount == actualCount:
        result = "Pass"
    else:
        result = "Fail"
    with open('XML Output.csv', 'w', encoding='utf-8', newline="") as csvfile:
        header = ['Folder', 'Expected', 'Actual', 'Result']
        my_writer = csv.writer(csvfile)
        my_writer.writerow(header)
        my_writer.writerow([key, expectedCount, actualCount, result])
        csvfile.close()
    print((key + "\t" + expectedCount + "\t" + actualCount + "\t" + result).expandtabs(20))
print("======================== Data Exported to CSV file ========================")

Output:

Folder              Expected            Actual              Result
D                   2                   1                   Fail

Here is what the output should be:

Folder              Expected            Actual              Result
A                   2                   1                   Fail
B                   2                   1                   Fail
C                   2                   1                   Fail
D                   2                   1                   Fail
This is because each iteration of with open using w is overwriting the file, leaving only the last iteration at the end of it. You could use a for append. A better method may be to create a data structure to hold the failures and write them to the file all at once. Give the below a try. I couldn't test it without initial data, but I think you'll get what I was going for.

print(("Folder\t" + "Expected\t" + "Actual\t" + "Result").expandtabs(20))
failures = []
for key in expected:
    expectedCount = str(expected[key])
    actualCount = "0"
    if key in newDictionary:
        actualCount = str(newDictionary[key])
    elif expectedCount == actualCount:
        result = "Pass"
    else:
        result = "Fail"
    csv_row = {
        "Folder": key,
        "Expected": expectedCount,
        "Actual": actualCount,
        "Result": "Fail"
    }
    failures.append(csv_row)
    print((key + "\t" + expectedCount + "\t" + actualCount + "\t" + result).expandtabs(20))

try:
    with open('XML Output.csv', 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=failures[0].keys())
        writer.writeheader()
        for data in failures:
            writer.writerow(data)
except IOError:
    print('I/O Error on CSV export')

print("======================== Data Exported to CSV file ========================")

Edit: Wanted to add a note that if you want to use dictionaries to write to CSV, DictWriter is an apt choice for this: https://docs.python.org/3/library/csv.html#csv.DictWriter
You're recreating the csv file upon every iteration that hits the else branch. You need to move the with statement out of your loop:

import csv

expected = {"a": 1, "b": 2}
newDictionary = {"a": 2, "c": 2}

with open('XML Output2.csv', 'w', encoding='utf-8', newline="") as csvfile:
    header = ['Folder', 'Expected', 'Actual', 'Result']
    my_writer = csv.writer(csvfile)
    my_writer.writerow(header)
    for key in expected:
        expectedCount = str(expected[key])
        actualCount = "0"
        if key in newDictionary:
            actualCount = str(newDictionary[key])
        if expectedCount == actualCount:
            result = "Pass"
        else:
            result = "Fail"
        my_writer.writerow([key, expectedCount, actualCount, result])
        print((key + "\t" + expectedCount + "\t" + actualCount + "\t" + result).expandtabs(20))

print("======================== Data Exported to CSV file ========================")

Also, note that you do not need to invoke close() explicitly on a file that was created using a context manager (with). See this article for more on the matter: https://realpython.com/python-with-statement/
split list with multiple delimiter
I have a list containing strings from lines in a txt file.

import csv
import re
from collections import defaultdict

parameters = ["name", "associated-interface", "type", "subnet", "fqdn",
              "wildcard-fqdn", "start-ip", "end-ip", "comment"]
address_dict = defaultdict(dict)
address_statements = []

with open("***somepaths**\\file.txt", "r") as address:
    in_address = False
    for line in address:
        line = line.strip()
        #print(line)
        if in_address and line != "next":
            if line == "end":
                break
            address_statements.append(line)
        else:
            if line == "config firewall address":
                in_address = True

#print(address_statements)
if address_statements:
    for statement in address_statements:
        op, param, *val = statement.split()
        if op == "edit":
            address_id = param
        elif op == "set" and param in parameters:
            address_dict[address_id][param] = ' '.join(val)

# output to the CSV
with open("***somepaths**\\file.csv", "w", newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=parameters)
    writer.writeheader()
    for key in address_dict:
        address_dict[key]['name'] = key
        writer.writerow(address_dict[key])

The output should be like this: edit "name test" - but it turns out to cut off at the space after the name and be like this: edit "name. How can I include everything in the double quotes?
You are using op, param, *val = statement.split(), which splits at spaces - a line of edit "CTS SVR" will put 'edit' into op, '"CTS' into param, and the remainder of the line (split at spaces, as a list) into val: ['SVR"']. You need a way to split a string by spaces while preserving quoted substrings - needed whenever you have params that are internally separated by spaces and delimited by quotes. Inspired by this answer, the csv module gives you what you need:

t1 = 'edit1 "some text" bla bla'
t2 = 'edit2 "some text"'
t3 = 'edit3 some thing'

import csv

reader = csv.reader([t1, t2, t3], delimiter=" ", skipinitialspace=True)
for row in reader:
    op, param, *remainder = row
    print(op, param, remainder, sep=" --> ")

Output:

edit1 --> some text --> ['bla', 'bla']
edit2 --> some text --> []
edit3 --> some --> ['thing']

You can apply the reader to one line only (reader = csv.reader([line], delimiter=" ")). Probably a duplicate of Split a string by spaces preserving quoted substrings - I close-voted earlier on the question and cannot vote for duplicate anymore - hence the detailed answer.
how to parse a txt file to csv and modify formatting
Is there a way I can use python to take my animals.txt file results, convert it to csv, and format it differently? Currently the animals.txt file looks like this:

ID:- 512
NAME:- GOOSE
PROJECT NAME:- Random
REPORT ID:- 30321
REPORT NAME:- ANIMAL
KEYWORDS:- ['"help,goose,Grease,GB"']

ID:- 566
NAME:- MOOSE
PROJECT NAME:- Random
REPORT ID:- 30213
REPORT NAME:- ANIMAL
KEYWORDS:- ['"Moose, boar, hansel"']

I would like the CSV file to present it as:

ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS

followed by the results underneath each header. Here is a script I have written:

import re
import csv

with open("animals.txt") as f:
    text = f.read()

data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    data[k] = re.findall(r'%s:- (.*)' % k, text)

csv_file = 'out.csv'
with open(csv_file, 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=keys)
    writer.writeheader()
    for x in data:
        writer.writerow(x)
An easy way to do this is to parse with a regex and store the results in a dict, just before you write the final csv:

import re

# `text` is your input text
data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    data[k] = re.findall(r'%s:- (.*)' % k, text)

And to CSV:

import csv

csv_file = 'out.csv'
with open(csv_file, 'w') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerow(data.keys())
    for i in range(len(data[keys[0]])):
        writer.writerow([data[k][i] for k in keys])

Output in csv:

ID,NAME,PROJECT NAME,REPORT ID,REPORT NAME,KEYWORDS
512,GOOSE,Random,30321,ANIMAL,['\"help\,goose\,Grease\,GB\"']
566,MOOSE,Random,30213,ANIMAL,['\"Moose\, boar\, hansel\"']

Note that I used re.M multiline mode since there's a trick in your text preventing ID from matching twice! The default row writing also needed to be twisted around, and \ is used to escape the quotes.
This should work:

fname = 'animals.txt'
with open(fname) as f:
    content = f.readlines()
content = [x.strip() for x in content]

output = 'ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS\n'
line_output = ''
for i in range(0, len(content)):
    if content[i]:
        line_output += content[i].split(':-')[-1].strip() + ','
    elif not content[i] and not content[i - 1]:
        output += line_output.rstrip(',') + '\n'
        line_output = ''
output += line_output.rstrip(',') + '\n'

print(output)
That's the code in AutoIt (www.autoitscript.com):

Global $values_A = StringRegExp(FileRead("json.txt"), '[ID|NAME|KEYWORDS]:-\s(.*)?', 3)
For $i = 0 To UBound($values_A) - 1 Step +6
    FileWrite('out.csv', $values_A[$i] & ',' & $values_A[$i + 1] & ',' & $values_A[$i + 2] & ',' & $values_A[$i + 3] & ',' & $values_A[$i + 4] & ',' & $values_A[$i + 5] & @CRLF)
Next
New column and column values get added to the next line
I want to add a new column and new values to it. I'm just using normal file handling to do it (just adding a delimiter). I actually did try using csv, but the csv file would have one letter per cell after running the code.

#import csv
#import sys
#csv.field_size_limit(sys.maxsize)

inp = open("city2", "r")
inp2 = open("op", "r")
oup = open("op_mod.csv", "a+")

#alldata = []
count = 0
for line in inp2:
    check = 0
    if count == 0:
        count = count + 1
        colline = line + "\t" + "cities"
        oup.write(colline)
        continue
    for city in inp:
        if city in line:
            print(city, line)
            linemod = line + "\t" + city  # adding new value to an existing row
            #alldata.append(linemod)
            oup.write(linemod)  # writing the new value
            check = 1
            break
    if check == 0:
        check = 1
        #oup.write(line)
        #alldata.append(line)
    inp.close()
    inp = open("city2", "r")

#writer.writerows(alldata)
inp.close()
inp2.close()
oup.close()

Expected result: existing fields/values ... new field/value

Actual result: existing fields/values ... then a new line, with the new field/value on the next line
There is a carriage return at the end of line; you can remove it using line.rstrip(), similar to this answer: Deleting carriage returns caused by line reading
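A tiny sketch of the difference (the row and city values below are made up):

line = "existing\tfields\tvalues\n"  # hypothetical row read from the file
city = "BILLINGS\n"                  # hypothetical matching city line

# without stripping, the embedded newline pushes the new column onto the next line
print(repr(line + "\t" + city))

# stripping first keeps the new column on the same row
print(repr(line.rstrip() + "\t" + city.rstrip() + "\n"))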
Python CSV module, add column to the side, not the bottom
I am new to python, and I need some help. I made a python script that takes two columns from a file and copies them into a "new file". However, every now and then I need to add columns to the "new file". I need to add the columns on the side, not the bottom, but my script adds them to the bottom. Someone suggested using CSV, and I read about it, but I can't make it add the new column to the side of the previous columns. Any help is highly appreciated. Here is the code that I wrote:

import sys
import re

filetoread = sys.argv[1]
filetowrite = sys.argv[2]
newfile = str(filetowrite) + ".txt"

openold = open(filetoread, "r")
opennew = open(newfile, "a")

rline = openold.readlines()
number = int(len(rline))

start = 0
for i in range(len(rline)):
    if "2theta" in rline[i]:
        start = i

for line in rline[start + 1 : number]:
    words = line.split()
    word1 = words[1]
    word2 = words[2]
    opennew.write(word1 + " " + word2 + "\n")

openold.close()
opennew.close()

Here is the second code I wrote, using CSV:

import sys
import re
import csv

filetoread = sys.argv[1]
filetowrite = sys.argv[2]
newfile = str(filetowrite) + ".txt"

openold = open(filetoread, "r")
rline = openold.readlines()
number = int(len(rline))

start = 0
for i in range(len(rline)):
    if "2theta" in rline[i]:
        start = i

words1 = []
words2 = []
for line in rline[start + 1 : number]:
    words = line.split()
    word1 = words[1]
    word2 = words[2]
    words1.append([word1])
    words2.append([word2])

with open(newfile, 'wb') as file:
    writer = csv.writer(file, delimiter="\n")
    writer.writerow(words1)
    writer.writerow(words2)

These are some samples of input files:

https://dl.dropbox.com/u/63216126/file5.txt
https://dl.dropbox.com/u/63216126/file6.txt

My first script works "almost" great, except that it writes the new columns at the bottom and I need them at the side of the previous columns.
The proper way to use writerow is to give it a single list that contains the data for all the columns:

words.append(word1)
words.append(word2)
writer.writerow(words)
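Applied to the second script in the question, that means collecting plain values rather than single-item lists and writing one row per pair, for example with zip. A rough sketch under those assumptions (words1 and words2 stand in for the extracted columns; the values and file name are made up):

import csv

words1 = ['1.0', '2.0', '3.0']  # hypothetical first extracted column
words2 = ['10', '20', '30']     # hypothetical second extracted column

with open('new.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # one list per row: the paired values land side by side as columns
    for w1, w2 in zip(words1, words2):
        writer.writerow([w1, w2])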