Printing Values to CSV (Python) - python
I am trying to send the failed values to a CSV file but it's only giving me the last failed value in the list.
print(("Folder\t"+ "Expected\t"+ "Actual\t"+"Result").expandtabs(20))
for key in expected:
expectedCount = str(expected[key])
actualCount = "0"
if key in newDictionary:
actualCount = str(newDictionary[key])
elif expectedCount == actualCount:
result = "Pass"
else:
result = "Fail"
with open('XML Output.csv', 'w',encoding='utf-8', newline="") as csvfile:
header = ['Folder', 'Expected', 'Actual','Result']
my_writer = csv.writer(csvfile)
my_writer.writerow(header)
my_writer.writerow([key, expectedCount, actualCount, result])
csvfile.close()
print((key + "\t"+expectedCount+ "\t"+actualCount+ "\t"+result).expandtabs(20))
print("======================== Data Exported to CSV file ========================")
Output:
Folder Expected Actual Result
D 2 1 Fail
Here is what the output should be:
Folder Expected Actual Result
A 2 1 Fail
B 2 1 Fail
C 2 1 Fail
D 2 1 Fail
This happens because each iteration opens the file with mode 'w', which overwrites it, so only the last row written survives. You could open the file with mode 'a' to append instead.
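For instance, a minimal sketch of that append approach (the sample dictionaries below are hypothetical and the Pass/Fail logic is simplified): write the header once with 'w', then append one row per iteration with 'a'.

import csv

expected = {"A": 2, "B": 2, "C": 2, "D": 2}       # hypothetical sample data
newDictionary = {"A": 1, "B": 1, "C": 1, "D": 1}  # hypothetical sample data

# Write the header once, truncating any old file.
with open('XML Output.csv', 'w', encoding='utf-8', newline="") as csvfile:
    csv.writer(csvfile).writerow(['Folder', 'Expected', 'Actual', 'Result'])

for key in expected:
    expectedCount = str(expected[key])
    actualCount = str(newDictionary.get(key, 0))
    result = "Pass" if expectedCount == actualCount else "Fail"
    # Append one row per iteration instead of rewriting the whole file.
    with open('XML Output.csv', 'a', encoding='utf-8', newline="") as csvfile:
        csv.writer(csvfile).writerow([key, expectedCount, actualCount, result])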
A better approach may be to collect the failures in a data structure during the loop and then write them all to the file in one pass afterwards. Give the code below a try. I couldn't test it without your initial data, but I think you'll get what I was going for.
print(("Folder\t"+ "Expected\t"+ "Actual\t"+"Result").expandtabs(20))
failures = []
for key in expected:
expectedCount = str(expected[key])
actualCount = "0"
if key in newDictionary:
actualCount = str(newDictionary[key])
elif expectedCount == actualCount:
result = "Pass"
else:
result = "Fail"
csv_row = {
"Folder":key,
"Expected":expectedCount,
"Actual":actualCount,
"Result":"Fail"
}
failures.append(csv_row)
print((key + "\t"+expectedCount+ "\t"+actualCount+ "\t"+result).expandtabs(20))
try:
with open('XML Output.csv', 'w') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=failures[0].keys())
writer.writeheader()
for data in failures:
writer.writerow(data)
except IOError:
print('I/O Error on CSV export')
print("======================== Data Exported to CSV file ========================")
Edit:
Wanted to add a note: if you want to write dictionaries out to CSV, DictWriter is an apt choice for this.
https://docs.python.org/3/library/csv.html#csv.DictWriter
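For reference, a minimal standalone DictWriter sketch (the file name and rows below are made up):

import csv

rows = [{"Folder": "A", "Expected": "2", "Actual": "1", "Result": "Fail"}]  # hypothetical rows

with open('example.csv', 'w', newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["Folder", "Expected", "Actual", "Result"])
    writer.writeheader()   # writes the fieldnames as the first row
    writer.writerows(rows) # one CSV row per dictionary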
You're recreating the CSV file on every iteration that hits the else branch. You need to move the with statement out of your loop:
import csv

expected = {"a": 1, "b": 2}
newDictionary = {"a": 2, "c": 2}

with open('XML Output2.csv', 'w', encoding='utf-8', newline="") as csvfile:
    header = ['Folder', 'Expected', 'Actual', 'Result']
    my_writer = csv.writer(csvfile)
    my_writer.writerow(header)

    for key in expected:
        expectedCount = str(expected[key])
        actualCount = "0"
        if key in newDictionary:
            actualCount = str(newDictionary[key])
        if expectedCount == actualCount:
            result = "Pass"
        else:
            result = "Fail"
        my_writer.writerow([key, expectedCount, actualCount, result])
        print((key + "\t" + expectedCount + "\t" + actualCount + "\t" + result).expandtabs(20))

print("======================== Data Exported to CSV file ========================")
Also, note that you do not need to invoke close() explicitly on a file that was created using a context manager (with). See this article for more on the matter: https://realpython.com/python-with-statement/
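For example (a tiny illustration, any file name will do): the file is closed automatically as soon as the with block exits, even if an exception is raised inside it.

# Illustration: the context manager closes the file for you.
with open('example.txt', 'w') as f:
    f.write('hello\n')

print(f.closed)  # True -- no explicit f.close() needed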
Related
Python3 - Nested dict to JSON
I am trying to convert multiple .txt files to "table-like" data (with columns and rows). Each .txt file should be considered as a new column. Consider the below content of the .txt files:

File1.txt
Hi there
How are you doing?
What is your name?

File2.txt
Hi
Great!
Oliver, what's yours?

I have created a simple method that accepts the file and an integer (the file number, from another method):

def txtFileToJson(text_file, column):
    data = defaultdict(list)
    i = int(1)
    with open(text_file) as f:
        data[column].append(column)
        for line in f:
            i = i + 1
            for line in re.split(r'[\n\r]+', line):
                data[column] = line
    with open("output.txt", 'a+') as f:
        f.write(json.dumps(data))

So the above method will run two times (one time for each file) and append the data. This is the output.txt file after I have run my script:

{"1": "What is your name?"}{"2": "Oliver, what's yours?"}

As you can see, I can only get it to create a new entry for each file I have, and then add the entire line. Here is what I would like the output to look like:

[{
    "1": [{
        "1": "Hi there",
        "2": "How are you doing?",
        "3": "\n"
        "4": "What is your name?"
    },
    "2": [{
        "1": "Hi"
        "2": "Great!",
        "3": "\n",
        "4": "Oliver, what's yours?"
    },
}]

Update: OK, so I played around a bit and got a bit closer:

myDict = {str(column): []}
i = int(1)
with open(text_file) as f:
    for line in f:
        # data[column].append(column)
        match = re.split(r'[\n\r]+', line)
        if match:
            myDict[str(column)].append({str(i): line})
            i = i + 1
with open(out_file, 'a+') as f:
    f.write(json.dumps(myDict[str(column)]))

That gives me the below output:

[{"1": "Hi there\n"}, {"2": "How are you doing?\n"}, {"3": "\n"}, {"4": "What is your name?"}]
[{"1": "Hi\n"}, {"2": "Great!\n"}, {"3": "\n"}, {"4": "Oliver, what's yours?"}]

But as you can see, now I have multiple JSON root elements.

Solution: Thanks to jonyfries, I did this:

data = defaultdict(list)
for path in images.values():
    column = column + 1
    data[str(column)] = txtFileToJson(path, column)
saveJsonFile(path, data)

And then added a new method to save the final combined list:

def saveJsonFile(text_file, data):
    basename = os.path.splitext(os.path.basename(text_file))
    dir_name = os.path.dirname(text_file) + "/"
    text_file = dir_name + basename[0] + "1.txt"
    out_file = dir_name + 'table_data.txt'
    with open(out_file, 'a+') as f:
        f.write(json.dumps(data))
You're creating a new dictionary within the function itself, so each time you pass a text file in it will create a new dictionary. The easiest solution seems to be returning the created dictionary and adding it to an existing dictionary:

def txtFileToJson(text_file, column):
    myDict = {str(column): []}
    i = int(1)
    with open(text_file) as f:
        for line in f:
            # data[column].append(column)
            match = re.split(r'[\n\r]+', line)
            if match:
                myDict[str(column)].append({str(i): line})
                i = i + 1
    with open(out_file, 'a+') as f:
        f.write(json.dumps(myDict[str(column)]))
    return myDict

data = defaultdict(list)
data["1"] = txtFileToJson(text_file, column)
data["2"] = txtFileToJson(other_text_file, other_column)
def read(text_file):
    data, i = {}, 0
    with open(text_file) as f:
        for line in f:
            i = i + 1
            data['row_%d' % i] = line.rstrip('\n')
    return data

res = {}
for i, fname in enumerate([r'File1.txt', r'File2.txt']):
    res[i] = read(fname)

with open(out_file, 'w') as f:
    json.dump(res, f)
First, if I understand, you are trying to get a dictionary of dictionaries as output. Let me observe that what I understand to be your desired output seems to enclose the whole thing within a list. Furthermore, you have unbalanced open and close list brackets within the dictionaries, which I will ignore, as I will the enclosing list. I think you need something like:

#!python3
import json
import re

def processTxtFile(text_file, n, data):
    d = {}
    with open(text_file) as f:
        i = 0
        for line in f:
            for line in re.split(r'[\n\r]+', line):
                i = i + 1
                d[str(i)] = line
    data[str(n)] = d

data = dict()
processTxtFile('File1.txt', 1, data)
processTxtFile('File2.txt', 2, data)
with open("output.txt", 'wt') as f:
    f.write(json.dumps(data))

If you really need the nested dictionaries to be enclosed within a list, then replace

data[str(n)] = d

with:

data[str(n)] = [d]
Converting Fixed-Width File to .txt then the .txt to .csv
I have a fixed-width file that I have no issues importing and splitting into 31 txt files. The spaces from the fixed-width file are conserved in this process since the writing to the txt simply writes each entry from the fixed-width file as a new line. My issue is that when I use python's csv function these spaces are replaced with "(a quotation mark) as a place holder. I'm looking to see if there is a way to have a csv file produced without these double quotes as place holders while maintaining the required formatting initially set in the fixed-width file. Initial line in txt doc: 'PAY90004100095206 9581400086000909 0008141000 5350 3810 C 000021841998051319980513P810406247 FELT, MARTIN & FRAZIER, P.C. FELT, MARTIN & FRAZIER, P.C. 208 NORTH BROADWAY STE 313 BILLINGS MT59101-0 NLance Martin v. Whitman College N00000000NN98004264225 SYS656 19980512+000000378761998041319980421+000000378769581400086000909 000+000000 Lance Martin v. Whitman College 00000000 00010001 +00000000000002184 000000021023.005000000003921.005\n' .py: import csv read_loc = 'c:/Users/location/e0290000005.txt' e02ext_start = read_loc.find('e02') e02_ext = read_loc[int(e02ext_start):] with open(read_loc, 'r') as f: contents = f.readlines() dict_of_record_lists = {} # takes first 3 characters of each line and if a matching dictionary key is found # it appends the line to the value-list for line in contents: record_type = (line[:3]) dict_of_record_lists.setdefault(record_type,[]).append(line) slice_list_CLM = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,47),(47,55),(55,59),(59,109),(109,189),(189,191),(191,193),(193,194),(194,195),(195,203),(203,211),(211,219),(219,227),(227,235),(235,237),(237,239),(239,241),(241,245),(245,249),(249,253),(253,257),(257,261),(261,291),(291,316),(316,331),(331,332),(332,357),(357,377),(377,378),(378,408),(408,438),(438,468),(468,470),(470,485),(485,505),(505,514),(514,517),(517,525),(525,533),(533,535),(535,536),(536,537),(537,545),(545,551),(551,553),(553,568),(568,572),(572,587),(587,602),(602,627),(627,631),(631,638),(638,642),(642,646),(646,654),(654,662),(662,670),(670,672),(672,674),(674,675),(675,676),(676,682),(682,700),(700,708),(708,716),(716,717),(717,725),(725,733),(733,741),(741,749),(749,759),(759,761),(761,762),(762,763),(763,764),(764,765),(765,768),(768,769),(769,770),(770,778),(778,779),(779,783),(783,787),(787,788),(788,805),(805,817),(817,829),(829,833),(833,863),(863,893),(893,896),(896,897),(897,898),(898,928),(928,936),(936,944),(944,945),(945,947),(947,959),(959,971),(971,983),(983,995),(995,1007),(1007,1019),(1019,1031),(1031,1043),(1043,1055),(1055,1067),(1067,1079),(1079,1091),(1091,1103),(1103,1115),(1115,1127),(1127,1139),(1139,1151),(1151,1163),(1163,1175),(1175,1187),(1187,1197),(1197,1202),(1202,1203),(1203,1211),(1211,1214),(1214,1215),(1215,1233),(1233,1241),(1241,1257),(1257,1272),(1272,1273),(1273,1285),(1285,1289),(1289,1293),(1293,1343),(1343,1365),(1365,1685),(1685,1686),(1686,1704),(1704,1708),(1708,1748),(1748,1768),(1768,1770),(1770,1772),(1772,1773),(1773,1782),(1782,1784),(1784,1792),(1792,1793),(1793,1796),(1796,1800)] slice_list_CTL = 
[(0,3),(3,7),(7,15),(15,23),(23,31),(31,39),(39,47),(47,55),(55,56),(56,65),(65,74),(74,83),(83,98),(98,113),(113,128),(128,143),(143,158),(158,173),(173,188),(188,203),(203,218),(218,233),(233,248),(248,263),(263,278),(278,293),(293,308),(308,323),(323,338),(338,353),(353,368),(368,383),(383,398),(398,413),(413,428),(428,443),(443,458),(458,473),(473,488),(488,503),(503,518),(518,527),(527,536),(536,545),(545,554),(554,563),(563,572),(572,581),(581,590),(590,599),(599,614),(614,623),(623,638),(638,647),(647,662),(662,671),(671,686),(686,695),(695,710),(710,719),(719,728),(728,737),(737,746),(746,755),(755,764),(764,773),(773,782),(782,791),(791,800),(800,809),(809,818),(818,827),(827,836),(836,845),(845,854),(854,863),(863,872),(872,881),(881,890),(890,899),(899,908)] slice_list_ADR = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,50),(50,53),(53,62),(62,65),(65,66),(66,91),(91,111),(111,121),(121,151),(151,181),(181,206),(206,208),(208,223),(223,243),(243,261),(261,265),(265,283),(283,287),(287,305),(305,335),(335,375),(375,383),(383,387),(387,437),(437,438),(438,446),(446,454),(454,461),(461,468),(468,484),(484,500)] slice_list_AGR = [(0,3),(3,7),(7,45),(45,85),(85,93),(93,101),(101,109),(109,117),(117,127),(127,139),(139,151)] slice_list_ACN = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,65),(65,95),(95,115),(115,145),(145,165),(165,195),(195,215),(215,245),(245,265),(265,295),(295,315),(315,345),(345,365),(365,395),(395,415),(415,445),(445,465),(465,495),(495,515),(515,545),(545,565),(565,595),(595,615),(615,645),(645,665),(665,695),(695,715),(715,745),(745,765),(765,795),(795,815),(815,845),(845,865),(865,895),(895,915),(915,945),(945,965),(965,995),(995,1015),(1015,1045),(1045,1061)] slice_list_CST = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,59),(59,60),(60,61),(61,62),(62,64),(64,80),(80,82),(82,84),(84,86),(86,88),(88,104)] slice_list_MCF = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,49),(49,79),(79,94),(94,159),(159,175),(175,191)] slice_list_DD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,54),(54,62),(62,63),(63,69),(69,75),(75,81),(81,87),(87,93),(93,94),(94,95),(95,103),(103,111),(111,119),(119,126),(126,134),(134,143),(143,154),(154,162),(162,170),(170,178),(178,186),(186,194),(194,202),(202,205),(205,208),(208,210),(210,218),(218,220),(220,228),(228,230),(230,238),(238,240),(240,248),(248,250),(250,258),(258,274)] slice_list_DES = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,1300),(1300,1316)] slice_list_IBC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,48),(48,50),(50,54),(54,55),(55,56),(56,81),(81,101),(101,121),(121,124),(124,125),(125,145),(145,146),(146,149),(149,152),(152,154),(154,179),(179,199),(199,219),(219,222),(222,224),(224,227),(227,230),(230,238),(238,249),(249,265),(265,281)] slice_list_ICD = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,57),(57,63),(63,69),(69,75),(75,81),(81,87),(87,95),(95,103),(103,111),(111,114),(114,122),(122,125),(125,126),(126,142),(142,144),(144,152),(152,154),(154,162),(162,164),(164,172),(172,174),(174,182),(182,184),(184,192),(192,208)] slice_list_LEG = 
[(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,65),(65,73),(73,81),(81,82),(82,90),(90,98),(98,133),(133,148),(148,163),(163,164),(164,172),(172,180),(180,181),(181,216),(216,256),(256,296),(296,326),(326,356),(356,381),(381,383),(383,398),(398,418),(418,438),(438,456),(456,474),(474,509),(509,549),(549,589),(589,619),(619,649),(649,674),(674,676),(676,691),(691,711),(711,731),(731,749),(749,767),(767,782),(782,790),(790,798),(798,806),(806,810),(810,818),(818,826),(826,834),(834,840),(840,849),(849,879),(879,888),(888,918),(918,920),(920,921),(921,923),(923,931),(931,939),(939,943),(943,944),(944,952),(952,960),(960,990),(990,1020),(1020,1050),(1050,1051),(1051,1086),(1086,1095),(1095,1135),(1135,1175),(1175,1205),(1205,1235),(1235,1260),(1260,1262),(1262,1277),(1277,1295),(1295,1304),(1304,1312),(1312,1328)] slice_list_LD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,65),(65,95),(95,125),(125,150),(150,152),(152,167),(167,187),(187,205),(205,223),(223,227),(227,252),(252,267),(267,279),(279,309),(309,339),(339,359),(359,361),(361,376),(376,396),(396,414),(414,439),(439,440),(440,448),(448,454),(454,456),(456,871),(471,472),(472,492),(492,522),(522,552),(552,572),(572,574),(574,589),(589,609),(609,627),(627,637),(637,645),(645,685),(685,686),(686,706),(706,714),(714,744),(744,774),(774,794),(794,796),(796,811),(811,831),(831,849),(849,879),(879,909),(909,929),(929,931),(931,946),(946,966),(966,984),(984,992),(992,1004),(1004,1024),(1024,1064),(1064,1081),(1081,1098),(1098,1106),(1106,1121),(1121,1122),(1122,1152),(1152,1153),(1153,1162),(1162,1170),(1170,1185),(1185,1190),(1190,1220),(1220,1238),(1238,1253),(1253,1283),(1283,1301),(1301,1302),(1302,1303),(1303,1333),(1333,1363),(1363,1388),(1388,1390),(1390,1405),(1405,1406),(1406,1436),(1436,1442),(1442,1462),(1462,1463),(1463,1478),(1478,1493),(1493,1533),(1533,1535),(1535,1538),(1538,1540),(1540,1556),(1556,1756)] slice_list_LD2 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,60),(60,78),(78,118),(118,148),(148,178),(178,203),(203,205),(205,220),(220,238),(238,256),(256,260),(260,270),(270,290),(290,300),(300,302),(302,322),(322,352),(352,377),(377,397),(397,398),(398,423),(423,424),(424,454),(454,455),(455,456),(456,458),(458,474)] slice_list_LD3 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,71),(71,91),(91,92),(92,122),(122,152),(152,177),(177,179),(179,194),(194,197),(197,205),(205,213),(213,221),(221,229),(229,237),(237,297),(297,305),(305,313),(313,321),(321,329),(329,337),(337,345),(345,353),(353,361),(361,421),(421,429),(429,489),(489,497),(497,557),(557,617),(617,633)] slice_list_NET = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,88),(88,99),(99,105),(105,135),(135,146),(146,152),(152,182),(182,193),(193,199),(199,229),(229,240),(240,246),(246,276),(276,287),(287,293),(293,323),(323,334),(334,340),(340,370),(370,381),(381,387),(387,417),(417,428),(428,434),(434,464),(464,475),(475,481),(481,511),(511,522),(522,528),(528,558),(558,569),(569,575),(575,605),(605,616),(616,622),(622,652),(652,663),(663,669),(669,699),(699,710),(710,716),(716,746),(746,757),(757,763),(763,793),(793,804),(804,810),(810,840),(840,851),(851,857),(857,887),(887,898),(898,904),(904,934),(934,945),(945,951),(951,981),(981,992),(992,998),(998,1028),(1028,1039),(1039,1047),(1047,1055),(1055,1061),(1061,1077),(1077,1087),(1087,1103)] slice_list_NOT = 
[(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,47),(47,55),(55,63),(63,71),(71,77),(77,79),(79,1279),(1279,1295),(1295,1296),(1296,1312)] slice_list_OFF = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,75),(75,78),(78,93),(93,105),(105,107),(107,115),(115,123),(123,131),(131,132),(132,148)] slice_list_PAY = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,60),(60,61),(61,65),(65,73),(73,81),(81,89),(89,90),(90,130),(130,165),(165,205),(205,245),(245,275),(275,305),(305,330),(330,332),(332,347),(347,367),(367,368),(368,428),(428,429),(429,437),(437,438),(438,439),(439,450),(450,452),(452,455),(455,458),(458,473),(473,481),(481,493),(493,501),(501,509),(509,521),(521,539),(539,542),(542,549),(549,552),(552,562),(562,567),(567,627),(627,635),(635,643),(643,647),(647,651),(651,653),(653,654),(654,684),(684,692),(692,702),(702,713),(713,1034),(1034,1050),(1050,1066)] slice_list_PRC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,51),(51,81),(81,84),(84,87),(87,95),(95,103),(103,119),(119,125),(125,131),(131,147)] slice_list_ACR = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,51),(51,59),(59,71),(71,79),(79,91),(91,103),(103,119),(119,135)] slice_list_REC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,58),(58,71),(71,84),(84,97),(97,110),(110,123),(123,136),(136,149),(149,162),(162,175),(175,188),(188,201),(201,214),(214,227),(227,240),(240,253),(253,266),(266,279),(279,292),(292,305),(305,318),(318,331),(331,344),(344,357),(357,370),(370,383),(383,396),(396,409),(409,422),(422,435),(435,448),(448,461),(461,474),(474,487),(487,500),(500,513),(513,526),(526,539),(539,552),(552,565),(565,578),(578,591),(591,604),(604,617),(617,630),(630,643),(643,656),(656,669),(669,682),(682,695),(695,708),(708,721),(721,734),(734,747),(747,760),(760,773),(773,786),(786,799),(799,812),(812,825),(825,838),(838,851),(851,864),(864,877),(877,890),(890,903),(903,916),(916,929),(929,942),(942,955),(955,968),(968,981),(981,997)] slice_list_RED = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,57),(57,69),(69,81),(81,93),(93,105),(105,117),(117,129),(129,141),(141,157)] slice_list_REI = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,61),(61,67),(67,87),(87,88),(88,100),(100,108),(108,116),(116,176),(176,192),(192,193),(193,199),(199,214),(214,222),(222,230),(230,238),(238,250),(250,251),(251,311),(311,327)] slice_list_RES = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,54),(54,134),(134,136),(136,148),(148,160),(160,172),(172,184),(184,196),(196,208),(208,220),(220,232),(232,242),(242,252),(252,262),(262,272),(272,282),(282,292),(292,299),(299,309),(309,319),(319,329),(329,339),(339,349),(349,359),(359,369),(369,379),(379,389),(389,399),(399,409),(409,419),(419,429),(429,439),(439,449),(449,465),(465,475),(475,975),(975,991)] slice_list_RST = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,87),(87,95),(95,125),(125,145),(145,161),(161,177)] slice_list_SPC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,85),(85,93),(93,101),(101,109),(109,117),(117,125),(125,133),(133,149)] slice_list_SSN = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,54),(54,62),(62,74),(74,82),(82,94),(94,102),(102,114),(114,122),(122,134),(134,142),(142,143),(143,151),(151,159),(159,160),(160,168),(168,176),(176,177),(177,185),(185,193),(193,194),(194,202),(202,210),(210,211),(211,219),(219,220),(220,228),(228,268),(268,276),(276,277),(277,293)] slice_list_WRK = 
[(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,57),(57,72),(72,73),(73,81),(81,82),(82,90),(90,98),(98,106),(106,114),(114,122),(122,130),(130,131),(131,132),(132,133),(133,153),(153,154),(154,155),(155,159),(159,179),(179,180),(180,240),(240,248),(248,256),(256,264),(264,272),(272,280),(280,284),(284,288),(288,298),(298,314),(314,330)] slice_list_WD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,54),(54,58),(58,59),(59,60),(60,61),(61,63),(63,73),(73,74),(74,82),(82,83),(83,91),(91,99),(99,107),(107,108),(108,118),(118,120),(120,130),(130,137),(137,139),(139,149),(149,156),(156,158),(158,168),(168,175),(175,177),(177,187),(187,194),(194,196),(196,206),(206,213),(213,223),(223,233),(233,243),(243,253),(253,263),(263,273),(273,283),(283,293),(293,303),(303,311),(311,314),(314,322),(322,332),(332,342),(342,352),(352,353),(353,354),(354,355),(355,365),(365,375),(375,385),(385,395),(395,405),(405,415),(415,425),(425,435),(435,436),(436,437),(437,438),(438,439),(439,440),(440,442),(442,443),(443,444),(444,445),(445,446),(446,448),(448,458),(458,460),(460,470),(470,472),(472,482),(482,484),(484,494),(494,496),(496,506),(506,508),(508,518),(518,528),(528,542),(542,543),(543,551),(551,559),(559,561),(561,565),(565,567),(567,574),(574,582),(582,583),(583,584),(584,585),(585,593),(593,594),(594,595),(595,596),(596,604),(604,605),(605,606),(606,607),(607,615),(615,616),(616,617),(617,618),(618,626),(626,627),(627,628),(628,629),(629,637),(637,645),(645,653),(653,661),(661,669),(669,677),(677,685),(685,693),(693,701),(701,709),(709,717),(717,721),(721,729),(729,732),(732,734),(734,738),(738,746),(746,749),(749,751),(751,755),(755,763),(763,766),(766,774),(774,782),(782,790),(790,798),(798,800),(800,801),(801,802),(802,813),(813,829)] slice_list_WD3 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,47),(47,48),(48,49),(49,50),(50,51),(51,52),(52,53),(53,54),(54,55),(55,56),(56,57),(57,58),(58,98),(98,138),(138,178),(178,182),(182,183),(183,191),(191,197),(197,213)] slice_dict = { 'CLM' : slice_list_CLM, 'CTL' : slice_list_CTL, 'ADR' : slice_list_ADR, 'AGR' : slice_list_AGR, 'ACN' : slice_list_ACN, 'CST' : slice_list_CST, 'MCF' : slice_list_MCF, 'DD1' : slice_list_DD1, 'DES' : slice_list_DES, 'IBC' : slice_list_IBC, 'ICD' : slice_list_ICD, 'LEG' : slice_list_LEG, 'LD1' : slice_list_LD1, 'LD2' : slice_list_LD2, 'LD3' : slice_list_LD3, 'NET' : slice_list_NET, 'NOT' : slice_list_NOT, 'OFF' : slice_list_OFF, 'PAY' : slice_list_PAY, 'PRC' : slice_list_PRC, 'ACR' : slice_list_ACR, 'REC' : slice_list_REC, 'RED' : slice_list_RED, 'REI' : slice_list_REI, 'RES' : slice_list_RES, 'RST' : slice_list_RST, 'SPC' : slice_list_SPC, 'SSN' : slice_list_SSN, 'WRK' : slice_list_WRK, 'WD1' : slice_list_WD1, 'WD3' : slice_list_WD3, } def slicer(file,slice_list): csv_string = "" for i in slice_list: csv_string += (file[i[0]:i[1]]+",") return csv_string overview_loc = 'c:/Users/location/E02_ingestion/'+ 'overview_'+e02_ext #put in file location wehre you would like to see logs with open(overview_loc, 'w') as overview_file: for key, value in dict_of_record_lists.items(): overview_file.write((key+' '+(str(len(value)))+'\n')) for key, value in dict_of_record_lists.items(): for k, v in slice_dict.items(): if key == k: iteration = 0 for i in value: s = slicer(i,v) value[iteration] = s iteration+= 1 e02_ext = read_loc[int(e02ext_start):] csv_ext = e02_ext[:-3]+'csv' # file overview/log that shows how many lines should exist in the other files to ensure everything wrote correctly 
overview_loc = 'c:/Users/location/E02_ingestion/'+ 'overview_'+e02_ext #put in file location wehre you would like to see logs with open(overview_loc, 'w') as overview_file: for key, value in dict_of_record_lists.items(): overview_file.write((key+' '+(str(len(value)))+'\n')) # if the list isn't empty writes a new file w/prefix matching key and includes the lines for key, value in dict_of_record_lists.items(): write_loc = 'c:/Users/location/E02_ingestion/'+ key +'_'+e02_ext with open(write_loc, "w", newline='') as parsed_file: for line in value: line_pre = "%s\n" % line parsed_file.write(line_pre[:-1]) for key, value in dict_of_record_lists.items(): write_loc = 'c:/Users/location/E02_ingestion/'+ key +'_'+csv_ext with open(write_loc, "w", newline='') as csvfile: writer = csv.writer(csvfile, delimiter=' ') for i in value: writer.writerow(i) This is a sample of a section of output in both Excel and our SQL table: P A Y 9 0 0 0 4 1 0 0 0 9 5 2 0 7 " " " " " " " " Desired output (void of " as place holders for spaces): P A Y 9 0 0 0 4 1 0 0 0 9 5 2 0 7 Any help would be greatly appreciated.
Why: The problem you are facing is that you have list entries inside your row of processed data that contain only the csv delimiter character. The module then quotes them to distinguish your "delimiter-only" data from the delimiters between columns. When writing something like [["PAY", "....", " ", " ", " ", " "]] into a csv using ' ' as the delimiter, the space-only fields come out quoted:

import csv

dict_of_record_lists = {"K": [["PAY", "....", " ", " ", " ", " "]]}

for key, value in dict_of_record_lists.items():
    write_loc = 't.txt'
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            writer.writerow(i)

print(open(write_loc).read())
# PAY .... " " " " " " " "

Fix: You can fix that by specifying quoting=csv.QUOTE_NONE and providing an escapechar=..., or by fixing your data. Providing an escapechar would put that character into your file, though. The relevant portion of the documentation is csv.QUOTE_NONE. Alternatively, you can manipulate your data so it does not contain fields made up of "only" the delimiter:

for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            # if an inner item only contains delimiter characters, set it to an empty string
            cleared = [x if x.strip(" ") else "" for x in i]
            writer.writerow(cleared)

HTH

Docs: https://docs.python.org/3/library/csv.html
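For completeness, a small sketch of the quoting=csv.QUOTE_NONE route, using the same hypothetical row as above. Note the escape character then shows up in the file, which is the trade-off mentioned:

import csv

row = ["PAY", "....", " ", " "]  # hypothetical row with delimiter-only fields

with open('t_noquote.txt', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                        quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerow(row)

print(open('t_noquote.txt').read())
# no quotation marks, but each space-only field is now preceded by the escape char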
I was able to change the initial text-writing portion to:

for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/Steve Barnard/Desktop/Git_Projects/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as parsed_file:
        for line in value:
            line_pre = "%s" % line
            parsed_file.write(line_pre[:-1] + '\n')

All the issues were fixed by avoiding Python's built-in CSV writer. The way my program added a comma after each line slice left one extra trailing comma plus the '\n'; this led the [:-1] slice in the write call to remove the '\n' rather than the final ','. By stripping that final ',' and then appending the '\n', the entire problem was fixed and a functioning CSV that retained the spacing was created. A plain text file can be produced by swapping out the extension when writing.
python iterating over JSON object and writing to csv
I am trying to iterate over a JSON object and write to a new CSV file, but I am getting an error when trying this code:

def flat_attr(thisAttr):
    if type(thisAttr) is bytes:
        thisAttr = (thisAttr.decode('utf-8'))[:1500]
    else:
        try:
            thisAttr = str(thisAttr)[:1500]
        except:
            thsAttr = thisAttr
    return thisAttr

thisDate = (datetime.today().date())
thisFile = 'sim_' + thisDate.strftime('%Y%m%d') + '.csv'

with open('/tmp/' + thisFile, 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['sim_id', 'data'], delimiter='\t', lineterminator='\n')
    counter = 0
    for issue in results.issues:
        counter += 1
        print('Writer written line ' + str(counter) + ' issue_id: ' + issue.main_id)
        print('Writer written line ' + str(counter) + ' issue_id: ' + issue.labels)
        writer.writerow({
            'sim_id': issue.main_id,
            'data': json.dumps({
                for a in dir(issue):
                    if a in attr_list:
                        a: flat_attr(getattr(issue, a))
                        print(a)
            })
        })

The error is this one:

E       for a in dir(issue):
E       ^
E   SyntaxError: invalid syntax

When I change that writerow() for loop to the following code, it works:

writer.writerow({
    'sim_id': issue.main_id,
    'data': json.dumps({
        a: flat_attr(getattr(issue, a))
        for a in dir(issue) if a in attr_list
    })
})

I want to debug, that's why I am trying to print 'a'. How come the loop works when the for loop and if-clause come after the a: flat_attr(getattr(issue, a)), and it doesn't when the for and if come before that line? How can I print 'a' to debug the code? Thanks!
If you want to inspect what data is being passed at a particular line, use an IDE like PyCharm. In PyCharm you can set breakpoints, and there is an option to debug the application at run time, so you can easily step through your program. Just try it.
OK, do one thing, ZeleIB: append the value of 'a' to a list and return the list for testing purposes. Example:

debug_a = []
for a in dir(issue):
    if a in attr_list:
        a: flat_attr(getattr(issue, a))
        debug_a.append(a)
return {'test': debug_a}
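A hedged sketch of another way to see each key while still producing the same dict as the comprehension: replace it with an explicit loop. (issue, attr_list, flat_attr, and writer are assumed to exist as in the question; this is not runnable on its own.)

# Build the same dict as the comprehension, but with room for print() debugging.
# `issue`, `attr_list`, `flat_attr`, and `writer` come from the question's code.
data = {}
for a in dir(issue):
    if a in attr_list:
        print(a)                               # debug output
        data[a] = flat_attr(getattr(issue, a))

writer.writerow({'sim_id': issue.main_id, 'data': json.dumps(data)})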
My function to extract totals is exhausting my input file for future reading
The client includes 3 rows at the bottom that contain totals for me to reconcile against in my program. Only problem is that my program is exhausting the input file with readlines() before it can do anything else. Is there a way to keep the file from being exhausted during my get_recon_total function call? #!/usr/bin/env python # pre_process.py import csv import sys def main(): infile = sys.argv[1] outfile = sys.argv[2] with open(infile, 'rbU') as in_obj: # Create reader object, get fieldnames for later on reader, fieldnames = open_reader(in_obj) nav_tot_cnt, nav_rec_cnt, nav_erec_cnt = get_recon_totals(in_obj) print nav_tot_cnt, nav_rec_cnt, nav_erec_cnt # This switches the dictionary to a sorted list... necessary?? reader_list = sorted(reader, key=lambda key: (key['PEOPLE_ID'], key['DON_DATE'])) # Create a list to contain section header information header_list = create_header_list(reader_list) # Create dictionary that contains header list as the key, # then all rows that match as a list of dictionaries. master_dict = map_data(header_list, reader_list) # Write data to processed file, create recon counts to compare # to footer record tot_cnt, rec_cnt, erec_cnt = write_data(master_dict, outfile, fieldnames) print tot_cnt, rec_cnt, erec_cnt def open_reader(file_obj): ''' Uses DictReader from the csv module to take the first header line as the fieldnames, then applies them to each element in the file. Returns the DictReader object and the fieldnames being used (used later when data is printed out with DictWriter.) ''' reader = csv.DictReader(file_obj, delimiter=',') return reader, reader.fieldnames def create_header_list(in_obj): p_id_list = [] for row in in_obj: if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list: p_id_list.append((row['PEOPLE_ID'], row['DON_DATE'])) return p_id_list def map_data(header_list, data_obj): master_dict = {} client_section_list = [] for element in header_list: for row in data_obj: if (row['PEOPLE_ID'], row['DON_DATE']) == element: client_section_list.append(row) element = list(element) element_list = [client_section_list[0]['DEDUCT_AMT'], client_section_list[0]['ND_AMT'], client_section_list[0]['DEDUCT_YTD'], client_section_list[0]['NONDEDUCT_YTD'] ] try: element_list.append((float(client_section_list[0]['DEDUCT_YTD']) + float(client_section_list[0]['NONDEDUCT_YTD']) )) except ValueError: pass element.extend(element_list) element = tuple(element) master_dict[element] = client_section_list client_section_list = [] return master_dict def write_data(in_obj, outfile, in_fieldnames): with open(outfile, 'wb') as writer_outfile: writer = csv.writer(writer_outfile, delimiter=',') dict_writer = csv.DictWriter(writer_outfile, fieldnames=in_fieldnames, extrasaction='ignore') tot_cnt = 0 rec_cnt = 0 email_cnt = 0 for k, v in in_obj.iteritems(): writer_outfile.write(' -01- ') writer.writerow(k) rec_cnt += 1 for i, e in enumerate(v): if v[i]['INT_CODE_EX0006'] != '' or v[i]['INT_CODE_EX0028'] != '': email_cnt += 1 writer_outfile.write(' -02- ') dict_writer.writerow(e) tot_cnt += 1 return tot_cnt, rec_cnt, email_cnt def get_recon_totals(in_obj): print in_obj client_tot_cnt = 0 client_rec_cnt = 0 client_erec_cnt = 0 for line in in_obj.readlines(): line = line.split(',') if line[0] == 'T' and line[1] == 'Total Amount': print 'Total Amount found.' client_tot_cnt = line[2] elif line[0] == 'T' and line[1] == 'Receipt Count': print 'Receipt Count found.' client_rec_cnt = line[2] elif line[0] == 'T' and line[1] == 'Email Receipt Count': print 'E-Receipt Count Found.' 
client_erec_cnt = line[2] return client_tot_cnt, client_rec_cnt, client_erec_cnt if __name__ == '__main__': main()
If your file is not very large, you can convert the reader generator to a list of dictionaries by calling list() on reader, and then use that list in your code instead of trying to read from the file directly. Example:

def main():
    infile = sys.argv[1]
    outfile = sys.argv[2]
    with open(infile, 'rbU') as in_obj:
        # Create reader object, get fieldnames for later on
        reader, fieldnames = open_reader(in_obj)
        reader_list = list(reader)
        nav_tot_cnt, nav_rec_cnt, nav_erec_cnt = get_recon_totals(reader_list)
        print nav_tot_cnt, nav_rec_cnt, nav_erec_cnt
        # This switches the dictionary to a sorted list... necessary??
        reader_list = sorted(reader_list, key=lambda key: (key['PEOPLE_ID'], key['DON_DATE']))
        ...

def get_recon_totals(reader_list):
    client_tot_cnt = 0
    client_rec_cnt = 0
    client_erec_cnt = 0
    for line in reader_list:  # line here is a dict
        if line[<fieldname for first column>] == 'T' and line[<fieldname for second column>] == 'Total Amount':
            print 'Total Amount found.'
            client_tot_cnt = line[<fieldname for third column>]
        ...
        # continued like above
    return client_tot_cnt, client_rec_cnt, client_erec_cnt
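A minimal, standalone illustration of the idea (the file name, columns, and sample rows below are hypothetical): once the rows are materialized in a list, they can be scanned any number of times without exhausting a file object.

import csv

# Create a tiny hypothetical file with a trailing total row, just to illustrate.
with open('donations.csv', 'w', newline='') as f:
    f.write('PEOPLE_ID,DON_DATE,AMOUNT\n1,2015-01-01,10\n2,2015-01-02,20\nT,Total Amount,30\n')

with open('donations.csv', newline='') as f:
    rows = list(csv.DictReader(f))   # materialize the reader once

totals = [r for r in rows if r['PEOPLE_ID'] == 'T']   # first pass over the same list
detail = [r for r in rows if r['PEOPLE_ID'] != 'T']   # second pass, no re-reading needed
print(totals, len(detail))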
Rewind the file pointer to the beginning of the previous line
I am doing text processing and using 'readline()' function as follows: ifd = open(...) for line in ifd: while (condition) do something... line = ifd.readline() condition = .... #Here when the condition becomes false I need to rewind the pointer so that the 'for' loop read the same line again. ifd.fseek() followed by readline is giving me a '\n' character. How to rewind the pointer so that the whole line is read again. >>> ifd.seek(-1,1) >>> line = ifd.readline() >>> line '\n' Here is my code labtestnames = sorted(tmp) #Now read each line in the inFile and write into outFile ifd = open(inFile, "r") ofd = open(outFile, "w") #read the header header = ifd.readline() #Do nothing with this line. Skip #Write header into the output file nl = "mrn\tspecimen_id\tlab_number\tlogin_dt\tfluid" offset = len(nl.split("\t")) nl = nl + "\t" + "\t".join(labtestnames) ofd.write(nl+"\n") lenFields = len(nl.split("\t")) print "Reading the input file and converting into modified file for further processing (correlation analysis etc..)" prevTup = (0,0,0) rowComplete = 0 k=0 for line in ifd: k=k+1 if (k==200): break items = line.rstrip("\n").split("\t") if((items[0] =='')): continue newline= list('' for i in range(lenFields)) newline[0],newline[1],newline[3],newline[2],newline[4] = items[0], items[1], items[3], items[2], items[4] ltests = [] ltvals = [] while(cmp(prevTup, (items[0], items[1], items[3])) == 0): # If the same mrn, lab_number and specimen_id then fill the same row. else create a new row. ltests.append(items[6]) ltvals.append(items[7]) pos = ifd.tell() line = ifd.readline() prevTup = (items[0], items[1], items[3]) items = line.rstrip("\n").split("\t") rowComplete = 1 if (rowComplete == 1): #If the row is completed, prepare newline and write into outfile indices = [labtestnames.index(x) for x in ltests] j=0 ifd.seek(pos) for i in indices: newline[i+offset] = ltvals[j] j=j+1 if (rowComplete == 0): # currTup = (items[0], items[1], items[3]) ltests = items[6] ltvals = items[7] pos = ifd.tell() line = ifd.readline() items = line.rstrip("\n").split("\t") newTup = (items[0], items[1], items[3]) if(cmp(currTup, newTup) == 0): prevTup = currTup ifd.seek(pos) continue else: indices = labtestnames.index(ltests) newline[indices+offset] = ltvals ofd.write(newline+"\n")
The problem can be handled more simply using itertools.groupby. groupby can cluster all the contiguous lines that deal with the same mrn, specimen_id, and lab_num. The code that does this is

for key, group in IT.groupby(reader, key=mykey):

where reader iterates over the lines of the input file, and mykey is defined by

def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])

Each row from reader is passed to mykey, and all rows with the same key are clustered together in the same group. While we're at it, we might as well use the csv module to read each line into a dict (which I call row). This frees us from having to deal with low-level string manipulation like line.rstrip("\n").split("\t"), and instead of referring to columns by index numbers (e.g. row[3]) we can write code that speaks in higher-level terms such as row['lab_num'].

import itertools as IT
import csv

inFile = 'curious.dat'
outFile = 'curious.out'

def mykey(row):
    return (row['mrn'], row['specimen_id'], row['lab_num'])

fieldnames = 'mrn specimen_id date lab_num Bilirubin Lipase Calcium Magnesium Phosphate'.split()

with open(inFile, 'rb') as ifd:
    reader = csv.DictReader(ifd, delimiter='\t')
    with open(outFile, 'wb') as ofd:
        writer = csv.DictWriter(ofd, fieldnames, delimiter='\t', lineterminator='\n')
        writer.writeheader()
        for key, group in IT.groupby(reader, key=mykey):
            new = {}
            row = next(group)
            for key in ('mrn', 'specimen_id', 'date', 'lab_num'):
                new[key] = row[key]
            new[row['labtest']] = row['result_val']
            for row in group:
                new[row['labtest']] = row['result_val']
            writer.writerow(new)

yields

mrn specimen_id date lab_num Bilirubin Lipase Calcium Magnesium Phosphate
4419529 1614487 26.2675 5802791G 0.1
3319529 1614487 26.2675 5802791G 0.3 153 8.1 2.1 4
5713871 682571 56.0779 9732266E 4.1
This seems to be a perfect use case for yield expressions. Consider the following example that prints lines from a file, repeating some of them at random:

def buflines(fp):
    r = None
    while True:
        r = yield r or next(fp)
        if r:
            yield None

from random import randint

with open('filename') as fp:
    buf = buflines(fp)
    for line in buf:
        print line
        if randint(1, 100) > 80:
            print 'ONCE AGAIN::'
            buf.send(line)

Basically, if you want to process an item once again, you send it back to the generator. On the next iteration you will be reading the same item once again.
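For comparison, a hedged sketch of a different technique (my own, not from the answer above): keep an explicit one-line pushback buffer inside the generator instead of relying on `yield r or next(fp)`. The 'filename' path is a placeholder.

# Alternative sketch: an explicit pushback buffer. A line sent back into the
# generator is re-yielded on the next iteration before the file advances.
def read_with_pushback(fp):
    pushed = []
    while True:
        line = pushed.pop() if pushed else fp.readline()
        if not line:
            break
        redo = yield line
        if redo is not None:
            pushed.append(redo)   # remember the line to replay it
            yield None            # dummy value returned from send()

from random import randint

with open('filename') as fp:      # 'filename' is a placeholder path
    buf = read_with_pushback(fp)
    for line in buf:
        print(line, end='')
        if randint(1, 100) > 80:
            buf.send(line)        # process this line once again on the next pass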