how to parse a txt file to csv and modify formatting - python
Is there a way I can use python to take my animals.txt file results and convert it to csv and format it differently?
Currently the animals.txt file looks like this:
ID:- 512
NAME:- GOOSE
PROJECT NAME:- Random
REPORT ID:- 30321
REPORT NAME:- ANIMAL
KEYWORDS:- ['"help,goose,Grease,GB"']
ID:- 566
NAME:- MOOSE
PROJECT NAME:- Random
REPORT ID:- 30213
REPORT NAME:- ANIMAL
KEYWORDS:- ['"Moose, boar, hansel"']
I would like the CSV file to present it as:
ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS
Followed by the results underneath each header
Here is a script I have written:
import re
import csv

with open("animals.txt") as f:
    text = f.read()

data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    data[k] = re.findall(r'%s:- (.*)' % k, text)

csv_file = 'out.csv'
with open(csv_file, 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=keys)
    writer.writeheader()
    for x in data:
        writer.writerow(x)
An easy way to do this is to parse the fields with a regex and store them in a dict just before you write the final csv:
import re

# `text` is your input text
data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    data[k] = re.findall(r'^%s:- (.*)' % k, text, re.M)
And to CSV:
import csv

csv_file = 'out.csv'
with open(csv_file, 'w') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerow(data.keys())
    for i in range(len(data[keys[0]])):
        writer.writerow([data[k][i] for k in keys])
Output in csv:
ID,NAME,PROJECT NAME,REPORT ID,REPORT NAME,KEYWORDS
512,GOOSE,Random,30321,ANIMAL,['\"help\,goose\,Grease\,GB\"']
566,MOOSE,Random,30213,ANIMAL,['\"Moose\, boar\, hansel\"']
Note that I used re.M (multiline mode) with a ^ anchor, since there's a trap in your text: without it, the ID pattern would also match REPORT ID, capturing each record's ID twice. The default row-writing also had to be reworked, because the data is stored per column rather than per row.
The \ is used to escape the quotes and commas inside the KEYWORDS field.
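If you would rather keep the DictWriter from your original script, here is a minimal sketch (assuming the `data` dict and `keys` list built above, with equal-length lists per key) that regroups the per-column lists into one dict per record:

import csv

csv_file = 'out.csv'
with open(csv_file, 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=keys)
    writer.writeheader()
    # zip(*...) turns the per-column lists into per-record tuples
    for values in zip(*(data[k] for k in keys)):
        writer.writerow(dict(zip(keys, values)))

DictWriter's default minimal quoting will wrap the KEYWORDS field in quotes instead of escaping each comma.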
This should work:
fname = 'animals.txt'
with open(fname) as f:
    content = f.readlines()
content = [x.strip() for x in content]

output = 'ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS\n'
line_output = ''
for i in range(0, len(content)):
    if content[i]:
        line_output += content[i].split(':-')[-1].strip() + ','
    elif not content[i] and not content[i - 1]:
        output += line_output.rstrip(',') + '\n'
        line_output = ''
output += line_output.rstrip(',') + '\n'

print(output)
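To save the result to a file instead of just printing it, one more step will do (out.csv is an assumed name here, matching the other answers):

with open('out.csv', 'w') as out_f:
    out_f.write(output)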
Here's the code in AutoIt (www.autoitscript.com):
Global $values_A = StringRegExp(FileRead("json.txt"), '[ID|NAME|KEYWORDS]:-\s(.*)?', 3)
For $i = 0 To UBound($values_A) - 1 Step +6
FileWrite('out.csv', $values_A[$i] & ',' & $values_A[$i + 1] & ',' & $values_A[$i + 2] & ',' & $values_A[$i + 3] & ',' & $values_A[$i + 4] & ',' & $values_A[$i + 5] & #CRLF)
Next
Related
Import CSV File and Doing arithmetic Operations without importing any Library in PYTHON
My CSV file looks like this:

Time_stamp; Mobile_number; Download; Upload; Connection_start_time; Connection_end_time; location
1/2/2020 10:43:55;+917777777777;213455;2343;1/2/2020 10:43:55;1/2/2020 10:47:25;09443
1/3/2020 10:33:10;+919999999999;345656;3568;1/3/2020 10:33:10;1/3/2020 10:37:20;89442
1/4/2020 11:47:57;+919123456654;345789;7651;1/4/2020 11:11:10;1/4/2020 11:40:22;19441
1/5/2020 11:47:57;+919123456543;342467;4157;1/5/2020 11:44:10;1/5/2020 11:59:22;29856
1/6/2020 10:47:57;+917777777777;213455;2343;1/6/2020 10:43:55;1/6/2020 10:47:25;09443

My question is: without importing any library, how can I read a CSV file, have the user enter a mobile number, and show the data usage of that number, i.e. an arithmetic operation (adding uplink and downlink) that gives the total data used by that specific mobile number?

Here is what my code looks like (I don't want to import the pandas library):

import pandas as pd

df = pd.read_csv('test.csv', sep=';')
df.columns = [col.strip() for col in df.columns]
usage = df[['Download', 'Upload']][df.Mobile_number == +917777777777].sum().sum()
print(usage)
I'd use csv.DictReader (after an import csv):

In [30]: with open('x', 'r') as f:
    ...:     r = csv.DictReader(f, delimiter=';')
    ...:     dct = {}
    ...:     for row in r:
    ...:         dct.setdefault(row[' Mobile_number'], []).append(row)
    ...:

In [31]: dct
Out[31]:
{'+917777777777': [OrderedDict([('Time_stamp', '1/2/2020 10:43:55'), (' Mobile_number', '+917777777777'), (' Download', '213455'), (' Upload', '2343'), (' Connection_start_time', '1/2/2020 10:43:55'), (' Connection_end_time', '1/2/2020 10:47:25'), (' location', '09443')]),
                   OrderedDict([('Time_stamp', '1/6/2020 10:47:57'), (' Mobile_number', '+917777777777'), (' Download', '213455'), (' Upload', '2343'), (' Connection_start_time', '1/6/2020 10:43:55'), (' Connection_end_time', '1/6/2020 10:47:25'), (' location', '09443')])],
 '+919999999999': [OrderedDict([('Time_stamp', '1/3/2020 10:33:10'), (' Mobile_number', '+919999999999'), (' Download', '345656'), (' Upload', '3568'), (' Connection_start_time', '1/3/2020 10:33:10'), (' Connection_end_time', '1/3/2020 10:37:20'), (' location', '89442')])],
 '+919123456654': [OrderedDict([('Time_stamp', '1/4/2020 11:47:57'), (' Mobile_number', '+919123456654'), (' Download', '345789'), (' Upload', '7651'), (' Connection_start_time', '1/4/2020 11:11:10'), (' Connection_end_time', '1/4/2020 11:40:22'), (' location', '19441')])],
 '+919123456543': [OrderedDict([('Time_stamp', '1/5/2020 11:47:57'), (' Mobile_number', '+919123456543'), (' Download', '342467'), (' Upload', '4157'), (' Connection_start_time', '1/5/2020 11:44:10'), (' Connection_end_time', '1/5/2020 11:59:22'), (' location', '29856')])]}

You can then process the list of dicts for a given mobile number with something like:

usage = sum(float(_[' Download']) + float(_[' Upload']) for _ in dct['+917777777777'])
Noting that you specifically wanted to avoid importing any libraries (I assume this means you want to avoid importing even from the included modules) - for a trivial file (I named one supermarkets.csv), the content looks like this:

ID,Address,City,State,Country,Name,Employees
1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
3,332 Hill St,San Francisco,California 94114,USA,Super River,25
4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20

Then you can do something like this:

data = []
with open("supermarkets.csv") as f:
    for line in f:
        data.append(line)
print(data)

From here you can manipulate each of the entries in the list using string tools and list comprehensions.
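For example, a short sketch (assuming the supermarkets.csv sample above) that splits each line into fields and aggregates one column with a list comprehension:

with open("supermarkets.csv") as f:
    rows = [line.strip().split(',') for line in f]

header, records = rows[0], rows[1:]
# e.g. sum the Employees column (the last field) across all stores
total_employees = sum(int(r[-1]) for r in records)
print(total_employees)  # 90 for the sample file

Note that a bare split(',') breaks down if fields can themselves contain commas; that is exactly what the csv module handles for you.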
You could try open, which does not require any library, to read your file, and then iterate through it with readlines. Split each line and check your condition according to where in the file your data sits:

usage = 0
with open('test.csv', 'r') as f:
    for line in f.readlines():
        try:
            line_sp = line.split(';')
            if line_sp[1] == '+917777777777':
                usage += int(line_sp[2]) + int(line_sp[3])
        except:
            # print(line)
            pass
print(usage)
Using no imported modules:

# read file and create dict of phone numbers
phone_dict = dict()
with open('test.csv') as f:
    for i, l in enumerate(f.readlines()):
        l = l.strip().split(';')
        if (i != 0):
            mobile = l[1]
            download = int(l[2])
            upload = int(l[3])
            if phone_dict.get(mobile) == None:
                phone_dict[mobile] = {'download': [download], 'upload': [upload]}
            else:
                phone_dict[mobile]['download'].append(download)
                phone_dict[mobile]['upload'].append(upload)

print(phone_dict)
{'+917777777777': {'download': [213455, 213455], 'upload': [2343, 2343]},
 '+919999999999': {'download': [345656], 'upload': [3568]},
 '+919123456654': {'download': [345789], 'upload': [7651]},
 '+919123456543': {'download': [342467], 'upload': [4157]}}

# function to return usage
def return_usage(data: dict, number: str):
    download_usage = sum(data[number]['download'])
    upload_usage = sum(data[number]['upload'])
    return download_usage + upload_usage

# get user input to return usage
number = input('Please input a phone number')
usage = return_usage(phone_dict, number)
print(usage)

>>> Please input a phone number (numbers only) +917777777777
>>> 431596
A combination of csv and defaultdict can fit your use case:

import csv
from collections import defaultdict

d = defaultdict(list)
with open('data.txt', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=';', skipinitialspace=True)
    headers = reader.fieldnames
    for row in reader:
        row['Usage'] = int(row['Upload']) + int(row['Download'])
        d[row.get('Mobile_number')].append(row["Usage"])

print(d)
defaultdict(list,
            {'+917777777777': [215798, 215798],
             '+919999999999': [349224],
             '+919123456654': [353440],
             '+919123456543': [346624]})

# get sum for a specific mobile number:
sum(d.get("+917777777777"))
431596

Additional details:

new_d = {}
for k, v in d.items():
    kb = sum(v)
    mb = kb / 1024
    gb = kb / 1024**2
    usage = F"{kb}KB/{mb:.2f}MB/{gb:.2f}GB"
    new_d[k] = usage

print(new_d)
{'+917777777777': '431596KB/421.48MB/0.41GB',
 '+919999999999': '349224KB/341.04MB/0.33GB',
 '+919123456654': '353440KB/345.16MB/0.34GB',
 '+919123456543': '346624KB/338.50MB/0.33GB'}
Converting Fixed-Width File to .txt then the .txt to .csv
I have a fixed-width file that I have no issues importing and splitting into 31 txt files. The spaces from the fixed-width file are conserved in this process, since writing to the txt simply writes each entry from the fixed-width file as a new line. My issue is that when I use Python's csv function, these spaces are replaced with " (a quotation mark) as a placeholder. I'm looking to see if there is a way to have a csv file produced without these double quotes as placeholders while maintaining the required formatting initially set in the fixed-width file.

Initial line in txt doc:

'PAY90004100095206 9581400086000909 0008141000 5350 3810 C 000021841998051319980513P810406247 FELT, MARTIN & FRAZIER, P.C. FELT, MARTIN & FRAZIER, P.C. 208 NORTH BROADWAY STE 313 BILLINGS MT59101-0 NLance Martin v. Whitman College N00000000NN98004264225 SYS656 19980512+000000378761998041319980421+000000378769581400086000909 000+000000 Lance Martin v. Whitman College 00000000 00010001 +00000000000002184 000000021023.005000000003921.005\n'

.py:

import csv

read_loc = 'c:/Users/location/e0290000005.txt'
e02ext_start = read_loc.find('e02')
e02_ext = read_loc[int(e02ext_start):]

with open(read_loc, 'r') as f:
    contents = f.readlines()

dict_of_record_lists = {}

# takes first 3 characters of each line and if a matching dictionary key is found
# it appends the line to the value-list
for line in contents:
    record_type = (line[:3])
    dict_of_record_lists.setdefault(record_type, []).append(line)

slice_list_CLM = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,47),(47,55),(55,59),(59,109),(109,189),(189,191),(191,193),(193,194),(194,195),(195,203),(203,211),(211,219),(219,227),(227,235),(235,237),(237,239),(239,241),(241,245),(245,249),(249,253),(253,257),(257,261),(261,291),(291,316),(316,331),(331,332),(332,357),(357,377),(377,378),(378,408),(408,438),(438,468),(468,470),(470,485),(485,505),(505,514),(514,517),(517,525),(525,533),(533,535),(535,536),(536,537),(537,545),(545,551),(551,553),(553,568),(568,572),(572,587),(587,602),(602,627),(627,631),(631,638),(638,642),(642,646),(646,654),(654,662),(662,670),(670,672),(672,674),(674,675),(675,676),(676,682),(682,700),(700,708),(708,716),(716,717),(717,725),(725,733),(733,741),(741,749),(749,759),(759,761),(761,762),(762,763),(763,764),(764,765),(765,768),(768,769),(769,770),(770,778),(778,779),(779,783),(783,787),(787,788),(788,805),(805,817),(817,829),(829,833),(833,863),(863,893),(893,896),(896,897),(897,898),(898,928),(928,936),(936,944),(944,945),(945,947),(947,959),(959,971),(971,983),(983,995),(995,1007),(1007,1019),(1019,1031),(1031,1043),(1043,1055),(1055,1067),(1067,1079),(1079,1091),(1091,1103),(1103,1115),(1115,1127),(1127,1139),(1139,1151),(1151,1163),(1163,1175),(1175,1187),(1187,1197),(1197,1202),(1202,1203),(1203,1211),(1211,1214),(1214,1215),(1215,1233),(1233,1241),(1241,1257),(1257,1272),(1272,1273),(1273,1285),(1285,1289),(1289,1293),(1293,1343),(1343,1365),(1365,1685),(1685,1686),(1686,1704),(1704,1708),(1708,1748),(1748,1768),(1768,1770),(1770,1772),(1772,1773),(1773,1782),(1782,1784),(1784,1792),(1792,1793),(1793,1796),(1796,1800)]
slice_list_CTL = [(0,3),(3,7),(7,15),(15,23),(23,31),(31,39),(39,47),(47,55),(55,56),(56,65),(65,74),(74,83),(83,98),(98,113),(113,128),(128,143),(143,158),(158,173),(173,188),(188,203),(203,218),(218,233),(233,248),(248,263),(263,278),(278,293),(293,308),(308,323),(323,338),(338,353),(353,368),(368,383),(383,398),(398,413),(413,428),(428,443),(443,458),(458,473),(473,488),(488,503),(503,518),(518,527),(527,536),(536,545),(545,554),(554,563),(563,572),(572,581),(581,590),(590,599),(599,614),(614,623),(623,638),(638,647),(647,662),(662,671),(671,686),(686,695),(695,710),(710,719),(719,728),(728,737),(737,746),(746,755),(755,764),(764,773),(773,782),(782,791),(791,800),(800,809),(809,818),(818,827),(827,836),(836,845),(845,854),(854,863),(863,872),(872,881),(881,890),(890,899),(899,908)]
slice_list_ADR = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,50),(50,53),(53,62),(62,65),(65,66),(66,91),(91,111),(111,121),(121,151),(151,181),(181,206),(206,208),(208,223),(223,243),(243,261),(261,265),(265,283),(283,287),(287,305),(305,335),(335,375),(375,383),(383,387),(387,437),(437,438),(438,446),(446,454),(454,461),(461,468),(468,484),(484,500)]
slice_list_AGR = [(0,3),(3,7),(7,45),(45,85),(85,93),(93,101),(101,109),(109,117),(117,127),(127,139),(139,151)]
slice_list_ACN = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,65),(65,95),(95,115),(115,145),(145,165),(165,195),(195,215),(215,245),(245,265),(265,295),(295,315),(315,345),(345,365),(365,395),(395,415),(415,445),(445,465),(465,495),(495,515),(515,545),(545,565),(565,595),(595,615),(615,645),(645,665),(665,695),(695,715),(715,745),(745,765),(765,795),(795,815),(815,845),(845,865),(865,895),(895,915),(915,945),(945,965),(965,995),(995,1015),(1015,1045),(1045,1061)]
slice_list_CST = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,59),(59,60),(60,61),(61,62),(62,64),(64,80),(80,82),(82,84),(84,86),(86,88),(88,104)]
slice_list_MCF = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,49),(49,79),(79,94),(94,159),(159,175),(175,191)]
slice_list_DD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,54),(54,62),(62,63),(63,69),(69,75),(75,81),(81,87),(87,93),(93,94),(94,95),(95,103),(103,111),(111,119),(119,126),(126,134),(134,143),(143,154),(154,162),(162,170),(170,178),(178,186),(186,194),(194,202),(202,205),(205,208),(208,210),(210,218),(218,220),(220,228),(228,230),(230,238),(238,240),(240,248),(248,250),(250,258),(258,274)]
slice_list_DES = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,1300),(1300,1316)]
slice_list_IBC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,48),(48,50),(50,54),(54,55),(55,56),(56,81),(81,101),(101,121),(121,124),(124,125),(125,145),(145,146),(146,149),(149,152),(152,154),(154,179),(179,199),(199,219),(219,222),(222,224),(224,227),(227,230),(230,238),(238,249),(249,265),(265,281)]
slice_list_ICD = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,57),(57,63),(63,69),(69,75),(75,81),(81,87),(87,95),(95,103),(103,111),(111,114),(114,122),(122,125),(125,126),(126,142),(142,144),(144,152),(152,154),(154,162),(162,164),(164,172),(172,174),(174,182),(182,184),(184,192),(192,208)]
slice_list_LEG = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,65),(65,73),(73,81),(81,82),(82,90),(90,98),(98,133),(133,148),(148,163),(163,164),(164,172),(172,180),(180,181),(181,216),(216,256),(256,296),(296,326),(326,356),(356,381),(381,383),(383,398),(398,418),(418,438),(438,456),(456,474),(474,509),(509,549),(549,589),(589,619),(619,649),(649,674),(674,676),(676,691),(691,711),(711,731),(731,749),(749,767),(767,782),(782,790),(790,798),(798,806),(806,810),(810,818),(818,826),(826,834),(834,840),(840,849),(849,879),(879,888),(888,918),(918,920),(920,921),(921,923),(923,931),(931,939),(939,943),(943,944),(944,952),(952,960),(960,990),(990,1020),(1020,1050),(1050,1051),(1051,1086),(1086,1095),(1095,1135),(1135,1175),(1175,1205),(1205,1235),(1235,1260),(1260,1262),(1262,1277),(1277,1295),(1295,1304),(1304,1312),(1312,1328)]
slice_list_LD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,65),(65,95),(95,125),(125,150),(150,152),(152,167),(167,187),(187,205),(205,223),(223,227),(227,252),(252,267),(267,279),(279,309),(309,339),(339,359),(359,361),(361,376),(376,396),(396,414),(414,439),(439,440),(440,448),(448,454),(454,456),(456,871),(471,472),(472,492),(492,522),(522,552),(552,572),(572,574),(574,589),(589,609),(609,627),(627,637),(637,645),(645,685),(685,686),(686,706),(706,714),(714,744),(744,774),(774,794),(794,796),(796,811),(811,831),(831,849),(849,879),(879,909),(909,929),(929,931),(931,946),(946,966),(966,984),(984,992),(992,1004),(1004,1024),(1024,1064),(1064,1081),(1081,1098),(1098,1106),(1106,1121),(1121,1122),(1122,1152),(1152,1153),(1153,1162),(1162,1170),(1170,1185),(1185,1190),(1190,1220),(1220,1238),(1238,1253),(1253,1283),(1283,1301),(1301,1302),(1302,1303),(1303,1333),(1333,1363),(1363,1388),(1388,1390),(1390,1405),(1405,1406),(1406,1436),(1436,1442),(1442,1462),(1462,1463),(1463,1478),(1478,1493),(1493,1533),(1533,1535),(1535,1538),(1538,1540),(1540,1556),(1556,1756)]
slice_list_LD2 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,60),(60,78),(78,118),(118,148),(148,178),(178,203),(203,205),(205,220),(220,238),(238,256),(256,260),(260,270),(270,290),(290,300),(300,302),(302,322),(322,352),(352,377),(377,397),(397,398),(398,423),(423,424),(424,454),(454,455),(455,456),(456,458),(458,474)]
slice_list_LD3 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,71),(71,91),(91,92),(92,122),(122,152),(152,177),(177,179),(179,194),(194,197),(197,205),(205,213),(213,221),(221,229),(229,237),(237,297),(297,305),(305,313),(313,321),(321,329),(329,337),(337,345),(345,353),(353,361),(361,421),(421,429),(429,489),(489,497),(497,557),(557,617),(617,633)]
slice_list_NET = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,88),(88,99),(99,105),(105,135),(135,146),(146,152),(152,182),(182,193),(193,199),(199,229),(229,240),(240,246),(246,276),(276,287),(287,293),(293,323),(323,334),(334,340),(340,370),(370,381),(381,387),(387,417),(417,428),(428,434),(434,464),(464,475),(475,481),(481,511),(511,522),(522,528),(528,558),(558,569),(569,575),(575,605),(605,616),(616,622),(622,652),(652,663),(663,669),(669,699),(699,710),(710,716),(716,746),(746,757),(757,763),(763,793),(793,804),(804,810),(810,840),(840,851),(851,857),(857,887),(887,898),(898,904),(904,934),(934,945),(945,951),(951,981),(981,992),(992,998),(998,1028),(1028,1039),(1039,1047),(1047,1055),(1055,1061),(1061,1077),(1077,1087),(1087,1103)]
slice_list_NOT = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,47),(47,55),(55,63),(63,71),(71,77),(77,79),(79,1279),(1279,1295),(1295,1296),(1296,1312)]
slice_list_OFF = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,75),(75,78),(78,93),(93,105),(105,107),(107,115),(115,123),(123,131),(131,132),(132,148)]
slice_list_PAY = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,60),(60,61),(61,65),(65,73),(73,81),(81,89),(89,90),(90,130),(130,165),(165,205),(205,245),(245,275),(275,305),(305,330),(330,332),(332,347),(347,367),(367,368),(368,428),(428,429),(429,437),(437,438),(438,439),(439,450),(450,452),(452,455),(455,458),(458,473),(473,481),(481,493),(493,501),(501,509),(509,521),(521,539),(539,542),(542,549),(549,552),(552,562),(562,567),(567,627),(627,635),(635,643),(643,647),(647,651),(651,653),(653,654),(654,684),(684,692),(692,702),(702,713),(713,1034),(1034,1050),(1050,1066)]
slice_list_PRC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,51),(51,81),(81,84),(84,87),(87,95),(95,103),(103,119),(119,125),(125,131),(131,147)]
slice_list_ACR = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,51),(51,59),(59,71),(71,79),(79,91),(91,103),(103,119),(119,135)]
slice_list_REC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,58),(58,71),(71,84),(84,97),(97,110),(110,123),(123,136),(136,149),(149,162),(162,175),(175,188),(188,201),(201,214),(214,227),(227,240),(240,253),(253,266),(266,279),(279,292),(292,305),(305,318),(318,331),(331,344),(344,357),(357,370),(370,383),(383,396),(396,409),(409,422),(422,435),(435,448),(448,461),(461,474),(474,487),(487,500),(500,513),(513,526),(526,539),(539,552),(552,565),(565,578),(578,591),(591,604),(604,617),(617,630),(630,643),(643,656),(656,669),(669,682),(682,695),(695,708),(708,721),(721,734),(734,747),(747,760),(760,773),(773,786),(786,799),(799,812),(812,825),(825,838),(838,851),(851,864),(864,877),(877,890),(890,903),(903,916),(916,929),(929,942),(942,955),(955,968),(968,981),(981,997)]
slice_list_RED = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,57),(57,69),(69,81),(81,93),(93,105),(105,117),(117,129),(129,141),(141,157)]
slice_list_REI = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,61),(61,67),(67,87),(87,88),(88,100),(100,108),(108,116),(116,176),(176,192),(192,193),(193,199),(199,214),(214,222),(222,230),(230,238),(238,250),(250,251),(251,311),(311,327)]
slice_list_RES = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,54),(54,134),(134,136),(136,148),(148,160),(160,172),(172,184),(184,196),(196,208),(208,220),(220,232),(232,242),(242,252),(252,262),(262,272),(272,282),(282,292),(292,299),(299,309),(309,319),(319,329),(329,339),(339,349),(349,359),(359,369),(369,379),(379,389),(389,399),(399,409),(409,419),(419,429),(429,439),(439,449),(449,465),(465,475),(475,975),(975,991)]
slice_list_RST = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,87),(87,95),(95,125),(125,145),(145,161),(161,177)]
slice_list_SPC = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,61),(61,69),(69,77),(77,85),(85,93),(93,101),(101,109),(109,117),(117,125),(125,133),(133,149)]
slice_list_SSN = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,54),(54,62),(62,74),(74,82),(82,94),(94,102),(102,114),(114,122),(122,134),(134,142),(142,143),(143,151),(151,159),(159,160),(160,168),(168,176),(176,177),(177,185),(185,193),(193,194),(194,202),(202,210),(210,211),(211,219),(219,220),(220,228),(228,268),(268,276),(276,277),(277,293)]
slice_list_WRK = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,53),(53,57),(57,72),(72,73),(73,81),(81,82),(82,90),(90,98),(98,106),(106,114),(114,122),(122,130),(130,131),(131,132),(132,133),(133,153),(153,154),(154,155),(155,159),(159,179),(179,180),(180,240),(240,248),(248,256),(256,264),(264,272),(272,280),(280,284),(284,288),(288,298),(298,314),(314,330)]
slice_list_WD1 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,54),(54,58),(58,59),(59,60),(60,61),(61,63),(63,73),(73,74),(74,82),(82,83),(83,91),(91,99),(99,107),(107,108),(108,118),(118,120),(120,130),(130,137),(137,139),(139,149),(149,156),(156,158),(158,168),(168,175),(175,177),(177,187),(187,194),(194,196),(196,206),(206,213),(213,223),(223,233),(233,243),(243,253),(253,263),(263,273),(273,283),(283,293),(293,303),(303,311),(311,314),(314,322),(322,332),(332,342),(342,352),(352,353),(353,354),(354,355),(355,365),(365,375),(375,385),(385,395),(395,405),(405,415),(415,425),(425,435),(435,436),(436,437),(437,438),(438,439),(439,440),(440,442),(442,443),(443,444),(444,445),(445,446),(446,448),(448,458),(458,460),(460,470),(470,472),(472,482),(482,484),(484,494),(494,496),(496,506),(506,508),(508,518),(518,528),(528,542),(542,543),(543,551),(551,559),(559,561),(561,565),(565,567),(567,574),(574,582),(582,583),(583,584),(584,585),(585,593),(593,594),(594,595),(595,596),(596,604),(604,605),(605,606),(606,607),(607,615),(615,616),(616,617),(617,618),(618,626),(626,627),(627,628),(628,629),(629,637),(637,645),(645,653),(653,661),(661,669),(669,677),(677,685),(685,693),(693,701),(701,709),(709,717),(717,721),(721,729),(729,732),(732,734),(734,738),(738,746),(746,749),(749,751),(751,755),(755,763),(763,766),(766,774),(774,782),(782,790),(790,798),(798,800),(800,801),(801,802),(802,813),(813,829)]
slice_list_WD3 = [(0,3),(3,7),(7,15),(15,21),(21,39),(39,42),(42,45),(45,46),(46,47),(47,48),(48,49),(49,50),(50,51),(51,52),(52,53),(53,54),(54,55),(55,56),(56,57),(57,58),(58,98),(98,138),(138,178),(178,182),(182,183),(183,191),(191,197),(197,213)]

slice_dict = {
    'CLM': slice_list_CLM, 'CTL': slice_list_CTL, 'ADR': slice_list_ADR,
    'AGR': slice_list_AGR, 'ACN': slice_list_ACN, 'CST': slice_list_CST,
    'MCF': slice_list_MCF, 'DD1': slice_list_DD1, 'DES': slice_list_DES,
    'IBC': slice_list_IBC, 'ICD': slice_list_ICD, 'LEG': slice_list_LEG,
    'LD1': slice_list_LD1, 'LD2': slice_list_LD2, 'LD3': slice_list_LD3,
    'NET': slice_list_NET, 'NOT': slice_list_NOT, 'OFF': slice_list_OFF,
    'PAY': slice_list_PAY, 'PRC': slice_list_PRC, 'ACR': slice_list_ACR,
    'REC': slice_list_REC, 'RED': slice_list_RED, 'REI': slice_list_REI,
    'RES': slice_list_RES, 'RST': slice_list_RST, 'SPC': slice_list_SPC,
    'SSN': slice_list_SSN, 'WRK': slice_list_WRK, 'WD1': slice_list_WD1,
    'WD3': slice_list_WD3,
}

def slicer(file, slice_list):
    csv_string = ""
    for i in slice_list:
        csv_string += (file[i[0]:i[1]] + ",")
    return csv_string

overview_loc = 'c:/Users/location/E02_ingestion/' + 'overview_' + e02_ext  # put in file location where you would like to see logs
with open(overview_loc, 'w') as overview_file:
    for key, value in dict_of_record_lists.items():
        overview_file.write((key + ' ' + (str(len(value))) + '\n'))

for key, value in dict_of_record_lists.items():
    for k, v in slice_dict.items():
        if key == k:
            iteration = 0
            for i in value:
                s = slicer(i, v)
                value[iteration] = s
                iteration += 1

e02_ext = read_loc[int(e02ext_start):]
csv_ext = e02_ext[:-3] + 'csv'

# file overview/log that shows how many lines should exist in the other files to ensure everything wrote correctly
overview_loc = 'c:/Users/location/E02_ingestion/' + 'overview_' + e02_ext  # put in file location where you would like to see logs
with open(overview_loc, 'w') as overview_file:
    for key, value in dict_of_record_lists.items():
        overview_file.write((key + ' ' + (str(len(value))) + '\n'))

# if the list isn't empty writes a new file w/prefix matching key and includes the lines
for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + e02_ext
    with open(write_loc, "w", newline='') as parsed_file:
        for line in value:
            line_pre = "%s\n" % line
            parsed_file.write(line_pre[:-1])

for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            writer.writerow(i)

This is a sample of a section of output in both Excel and our SQL table:

P A Y 9 0 0 0 4 1 0 0 0 9 5 2 0 7 " " " " " " " "

Desired output (void of " as placeholders for spaces):

P A Y 9 0 0 0 4 1 0 0 0 9 5 2 0 7

Any help would be greatly appreciated.
Why: The problem you are facing is that you have list entries inside your rows of processed data that contain only the csv delimiter character. The module then quotes them to distinguish your "delimiter-only data" from the delimiters between columns. When writing something like [["PAY", "....", " ", " ", " ", " "]] into a csv using ' ' as divider, you get them output quoted:

import csv

dict_of_record_lists = {"K": [["PAY", "....", " ", " ", " ", " "]]}

for key, value in dict_of_record_lists.items():
    write_loc = 't.txt'
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            writer.writerow(i)

print(open(write_loc).read())
# PAY .... " " " " " " " "

Fix: You can fix that by specifying quoting=csv.QUOTE_NONE and providing an escapechar=..., or by fixing your data. Providing an escapechar would put it into your file, though. The relevant portion of the documentation is csv.QUOTE_NONE.

You can manipulate your data so it does not contain "only" the delimiters as data:

for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/location/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        for i in value:
            # if an inner item only contains delimiter characters, set it to an empty string
            cleared = [x if x.strip(" ") else "" for x in i]
            writer.writerow(cleared)

HTH

Doku: https://docs.python.org/3/library/csv.html
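For comparison, a minimal sketch of the QUOTE_NONE route with the toy data from above (note that the escape character then lands in the output file, in front of every space that is data rather than delimiter):

import csv

rows = [["PAY", "....", " ", " ", " ", " "]]  # toy data from above
with open('t2.txt', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                        quoting=csv.QUOTE_NONE, escapechar='\\')
    for row in rows:
        writer.writerow(row)

print(open('t2.txt').read())
# the space-only fields survive unquoted, each preceded by a backslash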
Was able to change the initial text-writing portion to:

for key, value in dict_of_record_lists.items():
    write_loc = 'c:/Users/Steve Barnard/Desktop/Git_Projects/E02_ingestion/' + key + '_' + csv_ext
    with open(write_loc, "w", newline='') as parsed_file:
        for line in value:
            line_pre = "%s" % line
            parsed_file.write(line_pre[:-1] + '\n')

All the issues were fixed by avoiding Python's built-in CSV writer. The way my program added a comma following the line slices left one extra comma plus the '\n'; this led the [:-1] slice in the write function to remove the '\n' and not the final ','. By adding the '\n' after the comma removal, the entire problem was fixed and a functioning CSV that retained the spacing was created. A text file can be created by swapping out the extension upon writing.
CSV file not properly filled up with details
import csv

TextFileContent = open('tickets.txt')

with open('example4.csv', 'w') as csvfile:
    fieldnames = ['Author', 'ticket number', 'Revision']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for TextLine in TextFileContent:
        if 'Revision:' in TextLine:
            tmp = TextLine.replace('Revision:', "")
            print(tmp)
            writer.writerow({'Revision': tmp})
        elif 'Author:' in TextLine:
            tmp = TextLine.replace("Author:", "")
            print(tmp)
            writer.writerow({'Author': tmp})
        elif 'Contributes to:' in TextLine:
            tmp = TextLine.replace("Contributes to:", "")
            print(tmp)
            writer.writerow({'ticket number': tmp})

Hi all, I have developed the above Python script to extract "Author", "Ticket" and "Revision" details from a text file and then fill that information into a CSV file. I am able to extract all the information, but the data is not filled in correctly in the CSV file. The text file content is like below:

Revision: 22904
Author: Userx
Contributes to: CF-1159

Revision: 22887
Author: Usery
Contributes to: CF-955

Revision: 22884
Author: UserZ
Contributes to: CPL-7768

And I want the result in the CSV file like below:

Author    ticket number    Revision
Userx     CF-1159          22904
Usery     CF-955           22887
UserZ     CPL-7768         22884
Your code writes a row as soon as it finds any field, instead of waiting until it has read a full set of fields. The following edit waits for a full set and then writes the row:

with open('/tmp/out.csv', 'w') as csvfile:
    fieldnames = ['Author', 'ticket number', 'Revision']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    row = {}
    for TextLine in TextFileContent:
        if 'Revision:' in TextLine:
            row['Revision'] = TextLine.replace('Revision: ', "")
        elif 'Author:' in TextLine:
            row['Author'] = TextLine.replace("Author: ", "")
        elif 'Contributes to:' in TextLine:
            row['ticket number'] = TextLine.replace("Contributes to: ", "")
        if len(row) == len(fieldnames):
            writer.writerow(row)
            row = {}

Note that this will not function correctly unless all records contain all fields.
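If records can be incomplete, here is a hedged variant (assuming each record starts with a 'Revision:' line, as in the sample data) that flushes on record boundaries instead of counting fields, and also strips the trailing newlines that replace() leaves in the values:

import csv

with open('tickets.txt') as TextFileContent, open('example4.csv', 'w', newline='') as csvfile:
    fieldnames = ['Author', 'ticket number', 'Revision']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    row = {}
    for TextLine in TextFileContent:
        if 'Revision:' in TextLine:
            if row:  # a new record begins: flush the previous one
                writer.writerow(row)
            row = {'Revision': TextLine.replace('Revision:', '').strip()}
        elif 'Author:' in TextLine:
            row['Author'] = TextLine.replace('Author:', '').strip()
        elif 'Contributes to:' in TextLine:
            row['ticket number'] = TextLine.replace('Contributes to:', '').strip()
    if row:  # flush the last record
        writer.writerow(row)

DictWriter fills any missing field with an empty string by default (restval), so partial records still produce well-formed rows.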
Row labels to columns in multi-column file
I am new to Python and am using version 2.7.1 as part of Hyperion FDMEE. I have a file in which I need to reorder the columns, plus split one column into 3, as part of the same file.

Source file:

ACCOUNT;UD1;UD2;UD3;PERIOD;PERIOD;AMOUNT
QTY;032074;99953;53;2017.07.31;2017.07.31;40.91
COGS;032074;99953;53;2017.07.31;2017.07.31;-7488.36
TURNOVER;032074;99953;53;2017.07.31;2017.07.31;505.73
QTY;032075;99960;60;2017.07.31;2017.07.31;40.91
COGS;032075;99960;60;2017.07.31;2017.07.31;-7488.36
TURNOVER;032075;99960;60;2017.07.31;2017.07.31;505.73

I have managed to reorder the columns with this script:

import csv

infilename = fdmContext["OUTBOXDIR"]+"/Targit_1707.dat"
outfilename = fdmContext["OUTBOXDIR"]+"/TargitExport.csv"

infile = open(infilename, 'r')
outfile = open(outfilename, 'w+')
for line in infile:
    column = line.split(';')
    outfile.write(column[1] + ";" + column[2] + ";" + column[3] + ";" + column[4] + ";" + column[0] + ";" + str(column[6].strip('\n')) + ";201701" + "\n")
outfile.close()
infile.close()

producing the result:

UD1;UD2;UD3;PERIOD;ACCOUNT;AMOUNT;201701
032074;99953;53;2017.07.31;QTY;40.91;201701
032074;99953;53;2017.07.31;COGS;-7488.36;201701
032074;99953;53;2017.07.31;TURNOVER;505.73;201701
032075;99960;60;2017.07.31;QTY;40.91;201701
032075;99960;60;2017.07.31;COGS;-7488.36;201701
032075;99960;60;2017.07.31;TURNOVER;505.73;201701

but I am struggling to transpose the Account column (QTY, COGS, TURNOVER) into separate columns as in the example below:

UD1;UD2;UD3;PERIOD;QTY;COGS;TURNOVER;201701
032074;99953;53;2017.07.31;40.91;-7488.36;505.73;201701
032075;99960;60;2017.07.31;40.91;-7488.36;505.73;201701

Any suggestions would be very much appreciated.
Use a dict, for instance:

import csv

fieldnames = infile.readline()[:-1]
fieldnames = fieldnames.split(';')[1:5] + ['QTY', 'COGS', 'TURNOVER']
writer = csv.DictWriter(outfile, fieldnames=fieldnames)
writer.writeheader()

record_dict = {}
for i, line in enumerate(infile):
    if not line:
        break
    line = line[:-1].split(';')
    # Assign column data every 1,2,3 lines
    mod_row = (i % 3) + 1
    if mod_row == 1:
        record_dict['QTY'] = line[6]
        record_dict['UD1'] = line[1]
        # ... and so on
    if mod_row == 2:
        record_dict['COGS'] = line[6]
    if mod_row == 3:
        record_dict['TURNOVER'] = line[6]
        writer.writerow(record_dict)
        record_dict = {}

Output:

UD1,UD2,UD3,PERIOD,QTY,COGS,TURNOVER
032074,,,,40.91,-7488.36,505.73
032075,,,,40.91,-7488.36,505.73

Tested with Python 3.4.2.

Read about: Python » 3.6.1 Documentation: csv.DictWriter
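The i % 3 counter assumes the three account rows always arrive in QTY, COGS, TURNOVER order. Here is a sketch of an order-tolerant variant (file names are the ones from the question) that keys each record on the UD1/UD2/UD3/PERIOD columns instead:

records = {}
with open('Targit_1707.dat') as infile:
    next(infile)  # skip the header row
    for line in infile:
        cols = line.strip().split(';')
        key = tuple(cols[1:5])                           # (UD1, UD2, UD3, PERIOD)
        records.setdefault(key, {})[cols[0]] = cols[6]   # ACCOUNT -> AMOUNT

with open('TargitExport.csv', 'w') as outfile:
    outfile.write('UD1;UD2;UD3;PERIOD;QTY;COGS;TURNOVER;201701\n')
    for key, amounts in records.items():
        row = list(key) + [amounts.get(a, '') for a in ('QTY', 'COGS', 'TURNOVER')] + ['201701']
        outfile.write(';'.join(row) + '\n')

On Python 2.7 a plain dict does not preserve insertion order, so the output rows may come out in a different order; collections.OrderedDict fixes that if it matters.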
Group and Check-mark using Python
I have several files, each of which has data like this (filename: data inside, separated by newlines):

Mike: Plane\nCar
Paula: Plane\nTrain\nBoat\nCar
Bill: Boat\nTrain
Scott: Car

How can I create a csv file using python that groups all the different vehicles and then puts an X on the applicable person, like:
Assuming those line numbers aren't in there (easy enough to fix if they are), and with an input file like the following:

Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car

a solution can be found here: https://gist.github.com/999481

import sys
from collections import defaultdict
import csv

# see http://stackoverflow.com/questions/6180609/group-and-check-mark-using-python

def main():
    # files = ["group.txt"]
    files = sys.argv[1:]
    if len(files) < 1:
        print "usage: ./python_checkmark.py file1 [file2 ... filen]"
    name_map = defaultdict(set)
    for f in files:
        file_handle = open(f, "r")
        process_file(file_handle, name_map)
        file_handle.close()
    print_csv(sys.stdout, name_map)

def process_file(input_file, name_map):
    cur_name = ""
    for line in input_file:
        if ":" in line:
            cur_name, item = [x.strip() for x in line.split(":")]
        else:
            item = line.strip()
        name_map[cur_name].add(item)

def print_csv(output_file, name_map):
    names = name_map.keys()
    items = set([])
    for item_set in name_map.values():
        items = items.union(item_set)

    writer = csv.writer(output_file, quoting=csv.QUOTE_MINIMAL)
    writer.writerow([""] + names)
    for item in sorted(items):
        row_contents = map(lambda name: "X" if item in name_map[name] else "", names)
        row = [item] + row_contents
        writer.writerow(row)

if __name__ == '__main__':
    main()

Output:

,Mike,Bill,Scott,Paula
Boat,,X,,X
Car,X,,X,X
Plane,X,,,X
Train,,X,,X

The only thing this script doesn't do is keep the columns in the order the names appear in. It could keep a separate list maintaining the order, since maps/dicts are inherently unordered.
Here is an example of how to parse these kinds of files. Note that the dictionary is unordered here. You can use OrderedDict from the standard library (Python 3.2 / 2.7), find an available implementation / backport for older Python versions, or just keep the order in an additional list :)

data = {}
name = None
with open(file_path) as f:
    for line in f:
        if ':' in line:
            # we have a name here
            name, first_vehicle = line.split(':')
            data[name] = set([first_vehicle, ])  # a set of vehicles per name
        else:
            if name:
                data[name].add(line)

# now a dictionary with names/vehicles is available
# let's convert it to a simple csv-formatted string..

# a set of all available vehicles
vehicles = set(v for vlist in data.values() for v in vlist)

for name in data:
    name_vehicles = data[name]
    csv_vehicles = ''
    for v in vehicles:
        if v in name_vehicles:
            csv_vehicles += v
        csv_vehicles += ','
    csv_line = name + ',' + csv_vehicles
Assuming that the input looks like this:

Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car

this Python script places the vehicles in a dictionary, indexed by person:

#!/usr/bin/python

persons = {}
vehicles = set()

with open('input') as fd:
    for line in fd:
        line = line.strip()
        if ':' in line:
            tmp = line.split(':')
            p = tmp[0].strip()
            v = tmp[1].strip()
            persons[p] = [v]
            vehicles.add(v)
        else:
            persons[p].append(line)
            vehicles.add(line)

for k, v in persons.iteritems():
    print k, v

print 'vehicles', vehicles

Result:

Mike ['Plane', 'Car']
Bill ['Boat', 'Train']
Scott ['Car']
Paula ['Plane', 'Train', 'Boat', 'Car']
vehicles set(['Train', 'Car', 'Plane', 'Boat'])

Now all the data needed are placed in data structures. The csv part is left as an exercise for the reader :-)
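For completeness, a minimal sketch of that csv part (assuming the persons dict and vehicles set built above; it writes one row per person, one column per vehicle):

import csv

with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    ordered = sorted(vehicles)
    writer.writerow([''] + ordered)  # header row: one column per vehicle
    for p, owned in persons.items():
        writer.writerow([p] + ['X' if v in owned else '' for v in ordered])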
The most elegant and simple way would be like so:

import os

vehiclesToPeople = {}
people = []
for root, dirs, files in os.walk('/path/to/folder/with/files'):
    for file in files:
        person = file
        people += [person]
        path = os.path.join(root, file)
        with open(path) as f:
            for vehicle in f:
                # strip the trailing newline so 'Car' and 'Car\n' don't count as different vehicles
                vehiclesToPeople.setdefault(vehicle.strip(), set()).add(person)

people.sort()
table = [[''] + people]
for vehicle, owners in vehiclesToPeople.items():
    table.append([vehicle] + [('X' if p in owners else '') for p in people])

csv = '\n'.join(','.join(row) for row in table)

You can do pprint.pprint(table) as well to look at it.