Python - txt to csv

I am new to Python. I am trying to transfer data from a text file to a csv file. I have included a short description of the data in my text file and csv file. Can someone point me in the right direction of what to read up on to get this done?
**Input Text file**
01/20/18 12:19:35#
TARGET_CENTER_COLUMN=0
TARGET_CENTER_ROW=0
TARGET_COLUMN=0
BASELINE_AVERAGE=0
#
01/21/18 12:19:35#
TARGET_CENTER_COLUMN=0
TARGET_CENTER_ROW=13
TARGET_COLUMN=13
BASELINE_AVERAGE=26
#
01/23/18 12:19:36#
TARGET_COLUMN=340
TARGET_CENTER_COLUMN=223
TARGET_CENTER_ROW=3608, 3609, 3610
BASELINE_AVERAGE=28
#
01/24/18 12:19:37#
TARGET_CENTER_COLUMN=224
TARGET_CENTER_ROW=388
TARGET_COLUMN=348
BASELINE_AVERAGE=26
#
01/25/18 12:19:37#
TARGET_CENTER_COLUMN=224
TARGET_CENTER_ROW=388
TARGET_COLUMN=348
BASELINE_AVERAGE=26
#
01/27/18 12:19:37#
TARGET_CENTER_COLUMN=223
TARGET_COLUMN=3444
TARGET_CENTER_ROW=354
BASELINE_AVERAGE=25
#
**Output CSV file**
Date,Time,BASELINE_AVERAGE,TARGET_CENTER_COLUMN,TARGET_CENTER_ROW,TARGET_COLUMN
01/20/18,9:37:16 PM,0,0,0,0
01/21/18,9:37:16 PM,26,0,13,13
01/23/18,9:37:16 PM,28,223,3608,340
0,0,3609,0
0,0,3610,0
01/24/18,9:37:16 PM,26,224,388,348
01/25/18,9:37:16 PM,26,224,388,348
01/27/18,9:37:16 PM,25,223,354,344
Reading up online I've been able to implement this.
import csv

txt_file = r"DebugLog15test.txt"
csv_file = r"15test.csv"
mylist = ['Date', 'Time', 'BASELINE_AVERAGE', 'TARGET_CENTER_COLUMN', 'TARGET_CENTER_ROW', 'TARGET_COLUMN']
in_txt = csv.reader(open(txt_file, "r"))
with open(csv_file, 'w') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)
Beyond this, I was planning to start a for loop and read data until each '#', since that would be one row; then split on each '=' and insert the data into the appropriate position in a row list (by comparing the column header with the string before the '='), populating the row accordingly. Do you think this approach is correct?
Thanks for your help!

Check out csv.DictWriter for a nicer approach. You give it a list of headers and then you can give it data dictionaries which it will write for you. The writing portion would look like this:
import csv

csv_file = "15test.csv"
headers = ['Date', 'Time', 'BASELINE_AVERAGE', 'TARGET_CENTER_COLUMN', 'TARGET_CENTER_ROW', 'TARGET_COLUMN']
with open(csv_file, 'w', newline='') as myfile:
    wr = csv.DictWriter(myfile, quoting=csv.QUOTE_ALL, fieldnames=headers)
    wr.writeheader()
    # data_dicts is a list of dictionaries looking like so:
    # {'Date': '01/20/18', 'Time': '12:19:35', 'TARGET_CENTER_COLUMN': '0', ...}
    wr.writerows(data_dicts)
As for reading your input, csv.reader won't be of much help: your input file isn't really anything like a csv file. You'd probably be better off writing your own parsing, although it'll be a bit messy because of the inconsistency of the input format. Here's how I would approach that. First, make a function to interpret each line:
def get_data_from_line(line):
    line = line.strip()
    if line == '#':
        # We're between data sections; None will signal that
        return None
    if '=' in line:
        # this is a "KEY=VALUE" line
        key, value = line.split('=', 1)
        return {key: value}
    if ' ' in line:
        # this is a "Date time" line; these lines end with a trailing '#'
        date, time = line.split(' ', 1)
        return {'Date': date, 'Time': time.rstrip('#')}
    # if we get here, either we've missed something or there's bad data
    raise ValueError("Couldn't parse line: {}".format(line))
Then build the list of data dictionaries from the input file:
data_dicts = []
with open(txt_file) as infh:
    data_dict = {}
    for line in infh:
        update = get_data_from_line(line)
        if update is None:
            # we're between sections; add our current data to the list,
            # if we have data.
            if data_dict:
                data_dicts.append(data_dict)
            data_dict = {}
        else:
            # this line had some data; this incorporates it into data_dict
            data_dict.update(update)
    # finally, if we don't have a section marker at the end,
    # we need to append the last section's data
    if data_dict:
        data_dicts.append(data_dict)
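With data_dicts built this way, you can feed it straight to the DictWriter from the start of this answer. A minimal end-to-end sketch of the writing step (file names taken from the question; restval='' leaves a cell blank when a section is missing a key; note the multi-value TARGET_CENTER_ROW line from 01/23/18 will land in one cell rather than the three rows shown in the expected output, so that case would still need its own handling):
import csv

headers = ['Date', 'Time', 'BASELINE_AVERAGE', 'TARGET_CENTER_COLUMN', 'TARGET_CENTER_ROW', 'TARGET_COLUMN']
with open("15test.csv", 'w', newline='') as myfile:
    wr = csv.DictWriter(myfile, fieldnames=headers, restval='', quoting=csv.QUOTE_ALL)
    wr.writeheader()
    wr.writerows(data_dicts)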

Related

Converting CSV into Array in Python

I have a csv file like the one below (a small csv file, which I have uploaded here).
I am trying to convert the csv values into an array.
My expected output would look like this:
My solution:
import csv

results = []
with open("Solutions10.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # change contents to floats
    for row in reader:  # each row is a list
        results.append(row)
but I am getting a
ValueError: could not convert string to float: ' [1'
There is a problem with your CSV: it's just not CSV (comma-separated values). To handle it you need some cleaning:
import re

# if you expect only integers
pattern = re.compile(r'\d+')
# if you expect floats (uncomment below)
# pattern = re.compile(r'\d+\.*\d*')

result = []
with open(filepath) as csvfile:
    for row in csvfile:
        result.append([
            int(val.group(0))
            # float(val.group(0))
            for val in re.finditer(pattern, row)
        ])
print(result)
You can also solve this with substrings if it's easier for you and you know the format exactly.
Note: I also see there is an "eval" suggestion below. Please be careful with it, as you can get into a lot of trouble if you scan unknown/untrusted files...
You can do this:
with open("Solutions10.csv") as csvfile:
result = [eval(k) for k in csvfile.readlines()]
Edit: Karl is cranky and wants you to do this:
with open("Solutions10.csv") as csvfile:
    result = []
    for line in csvfile.readlines():
        line = line.replace("[", "").replace("]", "")
        result.append([int(k) for k in line.split(",")])
But you're the programmer so you can do what you want. If you trust your input file eval is fine.
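If the lines really are Python-style lists (for example "[1, 2, 3]"), a middle ground is ast.literal_eval, which only parses literals and so avoids the risks eval carries with untrusted files. A minimal sketch under that assumption:
import ast

with open("Solutions10.csv") as csvfile:
    # literal_eval accepts lists, numbers, strings, etc., but refuses arbitrary code
    result = [ast.literal_eval(line.strip()) for line in csvfile if line.strip()]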

Reading csv file to dictionary - only reads first key+value

I have a function that reads a csv file into a dictionary, but the next() iterator does not seem to be working, since it only reads the first key/value pair.
reader = csv.DictReader(open(folder_path+'/information.csv'))
info = next(reader)
My csv file is structured this way:
Test Name
mono1
Date
18/03/2021
Time
18:25
Camera
monochromatic
and the dictionary return is:
{'Test Name': 'mono1'}
Any idea of what's happening? Or a better way to read the file without having to change its structure?
Your file is not a CSV. It would need to be structured as follows:
Test Name,Date,Time,Camera
mono1,18/03/2021,18:25,monochromatic
Which would be read with:
import csv

with open('test.csv', newline='') as f:
    reader = csv.DictReader(f)
    for line in reader:
        print(line)
Output:
{'Test Name': 'mono1', 'Date': '18/03/2021', 'Time': '18:25', 'Camera': 'monochromatic'}
To read the file you have, you could use:
with open('test.txt') as f:
    lines = iter(f)  # An iterator over the lines of the file
    info = {}
    for line in lines:  # gets a line (key)
        # rstrip() removes the newline at the end of each line
        info[line.rstrip()] = next(lines).rstrip()  # next() fetches another line (value)
print(info)
(same output)
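Since the for loop and next() are pulling from the same iterator, the pairing can also be written more compactly with zip over a single generator; a small sketch under the same assumptions about the file layout:
with open('test.txt') as f:
    stripped = (line.rstrip() for line in f)
    # zip draws the key and then the value from the one iterator in turn
    info = dict(zip(stripped, stripped))
print(info)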

Converting CSV data from file to JSON

I have a csv file that contains csv data separated by ','. I am trying to convert it into a json format. For this I am trying to extract the headers first, but I am not able to differentiate between the headers and the next row.
Here is the data in csv file:
Start Date ,Start Time,End Date,End Time,Event Title
9/5/2011,3:00:00 PM,9/5/2011,,Social Studies Dept. Meeting
9/5/2011,6:00:00 PM,9/5/2011,8:00:00 PM,Curriculum Meeting
I have tried csvreader as well but I got stuck at the same issue.
Basically, Event Title and the date on the next line are not being distinguished.
import json
import re

with open(file_path, 'r') as f:
    first_line = re.sub(r'\s+', '', f.read())
    arr = []
    headers = []
    for header in f.readline().split(','):
        headers.append(header)
    for line in f.readlines():
        lineItems = {}
        for i, item in enumerate(line.split(',')):
            lineItems[headers[i]] = item
        arr.append(lineItems)

print(arr)
print(headers)
jsonText = json.dumps(arr)
print(jsonText)
All three print statements give empty result below.
[]
['']
[]
I expect jsonText to be a json of key value pairs.
Use csv.DictReader to get a list of dicts (each row is a dict) then serialize it.
import json
import csv

with open(csvfilepath) as f, open(jsonfilepath, 'w') as out:
    json.dump(list(csv.DictReader(f)), out)
In Python, each file has a marker that keeps track of where you are in the file. Once you call read(), you have read through the entire file, and all future read or readline calls will return nothing.
So, just delete the line involving first_line.
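For reference, a minimal sketch of the loop with that line removed (and the import it needs); each line is also stripped so the last column does not keep its trailing newline:
import json

with open(file_path, 'r') as f:
    headers = f.readline().strip().split(',')
    arr = []
    for line in f:
        values = line.strip().split(',')
        arr.append(dict(zip(headers, values)))

jsonText = json.dumps(arr)
print(jsonText)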

Python read a file replace a string in a word

I am trying to read a file with below data
Et1, Arista2, Ethernet1
Et2, Arista2, Ethernet2
Ma1, Arista2, Management1
I need to read the file and replace Et with Ethernet and Ma with Management, keeping the digit at the end the same. The expected output is as follows:
Ethernet1, Arista2, Ethernet1
Ethernet2, Arista2, Ethernet2
Management1, Arista2, Management1
I tried some code with regular expressions; I am able to get to the point where I can parse all of Et1, Et2 and Ma1, but I am unable to replace them.
import re

with open('test.txt', 'r') as fin:
    for line in fin:
        data = re.findall(r'\A[A-Z][a-z]\Z\d[0-9]*', line)
        print(data)
The output looks like this..
['Et1']
['Et2']
['Ma1']
import re

# compile once to avoid compiling in each iteration
re_et = re.compile(r'^Et(\d+),')
re_ma = re.compile(r'^Ma(\d+),')

with open('test.txt') as fin:
    for line in fin:
        data = re_et.sub(r'Ethernet\g<1>,', line.strip())
        data = re_ma.sub(r'Management\g<1>,', data)
        print(data)
This example follows Joseph Farah's suggestion
import csv

file_name = 'data.csv'
output_file_name = "corrected_data.csv"

data = []
with open(file_name, "rb") as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        data.append(row)

corrected_data = []
for row in data:
    tmp_row = []
    for col in row:
        if 'Et' in col and not "Ethernet" in col:
            col = col.replace("Et", "Ethernet")
        elif 'Ma' in col and not "Management" in col:
            col = col.replace("Ma", "Management")
        tmp_row.append(col)
    corrected_data.append(tmp_row)

with open(output_file_name, "wb") as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for row in corrected_data:
        writer.writerow(row)

print data
Here are the steps you should take:
Read each line in the file
Separate each line into smaller list items using the commas as delimiters
Use str.replace() to replace the characters with the words you want; keep in mind that anything that says "Et" (including the beginning of the word "ethernet") will be replaced, so remember to account for that. Same goes for Ma and Management.
Roll it back into one big list and put it back in the file with file.write(). You may have to overwrite the original file.
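A rough sketch of those steps, with hypothetical file names, that only rewrites the first column so the existing "Ethernet1"/"Management1" values in the other columns are left untouched:
mapping = {"Et": "Ethernet", "Ma": "Management"}

with open("test.txt") as fin, open("test_fixed.txt", "w") as fout:
    for line in fin:
        cols = [c.strip() for c in line.rstrip("\n").split(",")]
        for short, full in mapping.items():
            # expand the abbreviation, but never a name that is already full
            if cols[0].startswith(short) and not cols[0].startswith(full):
                cols[0] = full + cols[0][len(short):]
                break
        fout.write(", ".join(cols) + "\n")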

Use Python to split a CSV file with multiple headers

I have a CSV file that is being constantly appended. It has multiple headers and the only common thing among the headers is that the first column is always "NAME".
How do I split the single CSV file into separate CSV files, one for each header row?
here is a sample file:
"NAME","AGE","SEX","WEIGHT","CITY"
"Bob",20,"M",120,"New York"
"Peter",33,"M",220,"Toronto"
"Mary",43,"F",130,"Miami"
"NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"
"Larry","USA","Football",14,"Baseball",22
"Jenny","UK","Rugby",5,"Field Hockey",11
"Jacques","Canada","Hockey",19,"Volleyball",4
"NAME","DRINK","QTY"
"Jesse","Beer",6
"Wendel","Juice",1
"Angela","Milk",3
If the size of the csv file is not huge -- so it can all be in memory at once -- just use read() to read the file into a string and then use a regex on that string:
import re

with open(ur_csv) as f:
    data = f.read()

chunks = re.finditer(r'(^"NAME".*?)(?=^"NAME"|\Z)', data, re.S | re.M)
for i, chunk in enumerate(chunks, 1):
    with open('/path/{}.csv'.format(i), 'w') as fout:
        fout.write(chunk.group(1))
If the size of the file is a concern, you can use mmap to create something that looks like a big string but is not all in memory at the same time.
Then use the mmap string with a regex to separate the csv chunks like so:
import mmap
import re

with open(ur_csv, 'rb') as f:
    mf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # an mmap behaves like bytes, so the pattern (and the output mode) must be bytes too
    chunks = re.finditer(rb'(^"NAME".*?)(?=^"NAME"|\Z)', mf, re.S | re.M)
    for i, chunk in enumerate(chunks, 1):
        with open('/path/{}.csv'.format(i), 'wb') as fout:
            fout.write(chunk.group(1))
In either case, this will write all the chunks in files named 1.csv, 2.csv etc.
Copy the input to a new output file each time you see a header line. Something like this (not checked for errors):
partNum = 1
outHandle = None
for line in open("yourfile.csv", "r").readlines():
    if line.startswith('"NAME"'):
        if outHandle is not None:
            outHandle.close()
        outHandle = open("part%d.csv" % (partNum,), "w")
        partNum += 1
    outHandle.write(line)
outHandle.close()
The above will break if the input does not begin with a header line or if the input is empty.
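A hedged variant that tolerates both of those cases (anything before the first header is skipped, and an empty file no longer fails on the final close):
partNum = 1
outHandle = None
with open("yourfile.csv", "r") as inHandle:
    for line in inHandle:
        if line.startswith('"NAME"'):
            if outHandle is not None:
                outHandle.close()
            outHandle = open("part%d.csv" % partNum, "w")
            partNum += 1
        if outHandle is not None:  # ignore data that appears before the first header
            outHandle.write(line)
if outHandle is not None:
    outHandle.close()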
You can use the python csv package to read your source file and write multiple csv files based on the rule that if element 0 in your row == "NAME", spawn off a new file. Something like this...
import csv

outfile_name = "out_%d.csv"
out_num = 1
with open('nameslist.csv', 'rb') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    csv_buffer = []
    for row in csvreader:
        if row[0] != "NAME":
            csv_buffer.append(row)
        else:
            with open(outfile_name % out_num, 'wb') as csvout:
                writer = csv.writer(csvout, delimiter=',')
                for b_row in csv_buffer:
                    writer.writerow(b_row)
            out_num += 1
            csv_buffer = [row]
P.S. I haven't actually tested this, but that's the general concept; note that the final buffered section would still need to be written out after the loop.
Given the other answers, the only modification I would suggest is to read the file with csv.DictReader. Pseudocode would look like this, assuming that the first line in the file is the first header.
Note that this assumes there is no blank line or other indicator between the entries, so that a 'NAME' header occurs right after data. If there were a blank line between appended files, you could use that as an indicator to read infile.fieldnames on the next row. If you need to handle the inputs as lists, then the previous answers are better.
import csv

ifile = open(filename, 'rb')
infile = csv.DictReader(ifile)
infields = infile.fieldnames
filenum = 1
ofile = open('outfile' + str(filenum), 'wb')
outfields = infields  # This allows you to change the header field
outfile = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore')
outfile.writerow(dict((fn, fn) for fn in outfields))
for row in infile:
    if row['NAME'] != 'NAME':
        # process this row here and do whatever is needed
        pass
    else:
        ofile.close()
        # build infields again from this row
        infields = [row["NAME"], ...]  # This assumes you know the names & order
        # A dict cannot be pulled as a list and keep the order that you want.
        filenum += 1
        ofile = open('outfile' + str(filenum), 'wb')
        outfields = infields  # This allows you to change the header field
        outfile = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore')
        outfile.writerow(dict((fn, fn) for fn in outfields))
# This is the end of the loop. All data has been read and processed.
ofile.close()
ifile.close()
If the exact order of the new header does not matter except for the name in the first entry, then you can transfer the new list as follows:
infields = [row['NAME']]
for k in row.keys():
    if k != 'NAME':
        infields.append(row[k])
This will create the new header with NAME in entry 0 but the others will not be in any particular order.
