I have a function that reads a CSV file into a dictionary, but the next iterator seems to not be working, since it only reads the first key/value pair.
reader = csv.DictReader(open(folder_path+'/information.csv'))
info = next(reader)
My csv file is structured this way:
Test Name
mono1
Date
18/03/2021
Time
18:25
Camera
monochromatic
and the dictionary return is:
{'Test Name': 'mono1'}
Any idea of what's happening? Or a better way to read the file without having to change its structure?
Your file is not a CSV. It would need to be structured as follows:
Test Name,Date,Time,Camera
mono1,18/03/2021,18:25,monochromatic
Which would be read with:
import csv
with open('test.csv', newline='') as f:
    reader = csv.DictReader(f)
    for line in reader:
        print(line)
Output:
{'Test Name': 'mono1', 'Date': '18/03/2021', 'Time': '18:25', 'Camera': 'monochromatic'}
To read the file you have, you could use:
with open('test.txt') as f:
    lines = iter(f)  # an iterator over the lines of the file
    info = {}
    for line in lines:  # gets a line (key)
        # rstrip() removes the newline at the end of each line
        info[line.rstrip()] = next(lines).rstrip()  # next() fetches another line (value)
print(info)
(same output)
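For the same alternating key/value layout, the pairing can also be done with zip() pulling twice from a single iterator. A small self-contained sketch (the question's file content is inlined as a string here so the example runs standalone):

```python
# Each pass through zip() advances the same iterator twice: once for the
# key line, once for the value line.
data = "Test Name\nmono1\nDate\n18/03/2021\nTime\n18:25\nCamera\nmonochromatic\n"

lines = iter(data.splitlines())
info = dict(zip(lines, lines))
print(info)
```

This produces the same dictionary as the loop above.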
Suppose I have sample data in an Excel document:
header1    header2   header3
some data  testing   123
moar data  hello!    456
I export this data to csv format with Excel, with File > Save as > .csv
This is my data sample.csv:
$ cat sample.csv
header1,header2,header3
some data,testing,123
moar data,hello!,456%
Note that Excel apparently does not add a newline at the end, by default -- this is indicated by % at the end.
Now let's say I want to append a row(s) to the CSV file. I can use csv module to do that:
import csv

def append_to_csv_file(file: str, row: dict, encoding=None) -> None:
    # open file for reading and writing
    with open(file, 'a+', newline='', encoding=encoding) as out_file:
        # retrieve field names (CSV file headers)
        reader = csv.reader(out_file)
        out_file.seek(0)
        field_names = next(reader, None)
        # add new row to the CSV file
        writer = csv.DictWriter(out_file, field_names)
        writer.writerow(row)

row = {'header1': 'new data', 'header2': 'blah', 'header3': 789}
append_to_csv_file('sample.csv', row)
So now a newline is added to end of file, but problem is that the data is added to end of last line, rather than on a separate line:
$ cat sample.csv
header1,header2,header3
some data,testing,123
moar data,hello!,456new data,blah,789
This causes issue when I want to read back the updated data from the file:
with open('sample.csv', newline='') as f:
    print(list(csv.DictReader(f)))
# [{..., 'header3': '456new data', None: ['blah', '789']}]
Question: what is the best way to handle the case where a CSV file might not have a newline at the end, when appending a row (or rows) to the file?
Current attempt
This is my solution to work around the case of appending to a CSV file that may not end with a newline character:
import csv

def append_to_csv_file(file: str, row: dict, encoding=None) -> None:
    with open(file, 'a+', newline='', encoding=encoding) as out_file:
        # get current file position
        pos = out_file.tell()
        print('pos:', pos)
        # seek to one character back
        out_file.seek(pos - 1)
        # read in last character
        c = out_file.read(1)
        print(out_file.tell(), repr(c))
        if c != '\n':
            delta = out_file.write('\n')
            pos += delta
            print('new_pos:', pos)
        # retrieve field names (CSV file headers)
        reader = csv.reader(out_file)
        out_file.seek(0)
        field_names = next(reader, None)
        # add new row to the CSV file
        writer = csv.DictWriter(out_file, field_names)
        # out_file.seek(pos + 1)
        writer.writerow(row)

row = {'header1': 'new data', 'header2': 'blah', 'header3': 789}
append_to_csv_file('sample.csv', row)
This is output from running the script:
pos: 68
68 '6'
new_pos: 69
The contents of CSV file now look as expected:
$ cat sample.csv
header1,header2,header3
some data,testing,123
moar data,hello!,456
new data,blah,789
I am wondering if anyone knows of an easier way to do this; I feel like I might be overthinking it a bit. I basically want to account for cases where the CSV file might need a newline added to the end before a new row is appended.
If it helps, I am running this on a Mac OS environment.
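One possibly simpler variant is to do the last-byte check in binary mode up front, before opening the file for appending, so there is no position arithmetic in text mode. A minimal sketch (append_row is an illustrative name, not a standard recipe):

```python
import csv
import os

def append_row(path, row, encoding=None):
    # Check whether the file ends with a newline; a relative seek from the
    # end is only allowed in binary mode.
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, 'rb') as f:
            f.seek(-1, os.SEEK_END)
            needs_newline = f.read(1) != b'\n'
    with open(path, 'a+', newline='', encoding=encoding) as f:
        if needs_newline:
            f.write('\n')                        # 'a' mode always writes at the end
        f.seek(0)
        field_names = next(csv.reader(f), None)  # headers from the first line
        f.seek(0, os.SEEK_END)
        csv.DictWriter(f, field_names).writerow(row)
```

Because mode 'a' writes always go to the end of the file, no write position needs to be restored after reading the header row.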
I am reading a CSV file called: candidates.csv line by line (row by row) like follows:
import csv

for line in open('candidates.csv'):
    csv_row = line.strip().split(',')
    check_update(csv_row[7])  # check_update is a function that returns an int
How can I append the data that the check_updates function returns at the end of the line (row) that I am reading?
Here is what I have tried:
for line in open('candidates.csv'):
    csv_row = line.strip().split(',')
    data_to_add = check_update(csv_row[7])
    with open('candidates.csv', 'a') as f:
        writer = csv.writer(f)
        writer.writerow(data_to_add)
Getting this error:
_csv.Error: iterable expected, not NoneType
Also not entirely sure that this would have added in the right place at the end of the row that I was reading.
Bottom line, how to best add data at the end of the row that I am currently reading?
Do back up your file before trying this, just in case.
You can write a new temporary file and move that into the place over the old file you read from.
from tempfile import NamedTemporaryFile
import shutil
import csv

filename = 'candidates.csv'
tempfile = NamedTemporaryFile('w', delete=False)

with open(filename, 'r', newline='') as csvFile, tempfile:
    writer = csv.writer(tempfile)
    for line in csvFile:
        csv_row = line.strip().split(',')
        csv_row.append(check_update(csv_row[7]))  # this will add the data to the end of the list
        writer.writerow(csv_row)

shutil.move(tempfile.name, filename)
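A variant of the same temp-file pattern can parse with csv.reader instead of str.split, so quoted fields that contain commas survive the round trip. A sketch under that assumption (append_result and its compute parameter are illustrative names, not from the question):

```python
from tempfile import NamedTemporaryFile
import shutil
import csv

def append_result(filename, compute):
    # Write rows to a temporary file, appending compute(row[7]) to each,
    # then replace the original file with the updated copy.
    tmp = NamedTemporaryFile('w', delete=False, newline='')
    with open(filename, newline='') as src, tmp:
        writer = csv.writer(tmp)
        for row in csv.reader(src):
            row.append(compute(row[7]))  # same column index as in the question
            writer.writerow(row)
    shutil.move(tmp.name, filename)
```

Here compute would be the question's check_update function.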
I have a csv file that contains data separated by ','. I am trying to convert it into a JSON format. For this I am trying to extract the headers first, but I am not able to differentiate between the headers and the next row.
Here is the data in csv file:
Start Date ,Start Time,End Date,End Time,Event Title
9/5/2011,3:00:00 PM,9/5/2011,,Social Studies Dept. Meeting
9/5/2011,6:00:00 PM,9/5/2011,8:00:00 PM,Curriculum Meeting
I have tried csvreader as well but I got stuck at the same issue.
Basically Event Title and the date on the next line is not being distinguished.
with open(file_path, 'r') as f:
    first_line = re.sub(r'\s+', '', f.read())
    arr = []
    headers = []
    for header in f.readline().split(','):
        headers.append(header)
    for line in f.readlines():
        lineItems = {}
        for i, item in enumerate(line.split(',')):
            lineItems[headers[i]] = item
        arr.append(lineItems)

print(arr)
print(headers)
jsonText = json.dumps(arr)
print(jsonText)
All three print statements give the empty results below.
[]
['']
[]
I expect jsonText to be a json of key value pairs.
Use csv.DictReader to get a list of dicts (each row is a dict) then serialize it.
import json
import csv

with open(csvfilepath, newline='') as f, open(jsonfilepath, 'w') as out:
    json.dump(list(csv.DictReader(f)), out)
In Python, each file has a marker that keeps track of where you are in the file. Once you call read(), you have read through the entire file, and all future read or readline calls will return nothing.
So, just delete the line involving first_line.
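A short demonstration of that point: after read() consumes the file, readline() returns an empty string until you seek() back ('demo.txt' is a throwaway file created just for this example):

```python
# Create a small sample file to read from.
with open('demo.txt', 'w') as f:
    f.write('line one\nline two\n')

with open('demo.txt') as f:
    everything = f.read()  # the position marker is now at end of file
    after = f.readline()   # '' -- nothing left to read
    f.seek(0)              # rewind the marker to the start
    first = f.readline()   # 'line one\n'
```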
I am new to Python. I am trying to transfer data from a text file to a csv file. I have included a short description of the data in my text file and csv file. Can someone point me in the right direction of what to read up on to get this done?
**Input Text file**
01/20/18 12:19:35#
TARGET_CENTER_COLUMN=0
TARGET_CENTER_ROW=0
TARGET_COLUMN=0
BASELINE_AVERAGE=0
#
01/21/18 12:19:35#
TARGET_CENTER_COLUMN=0
TARGET_CENTER_ROW=13
TARGET_COLUMN=13
BASELINE_AVERAGE=26
#
01/23/18 12:19:36#
TARGET_COLUMN=340
TARGET_CENTER_COLUMN=223
TARGET_CENTER_ROW=3608, 3609, 3610
BASELINE_AVERAGE=28
#
01/24/18 12:19:37#
TARGET_CENTER_COLUMN=224
TARGET_CENTER_ROW=388
TARGET_COLUMN=348
BASELINE_AVERAGE=26
#
01/25/18 12:19:37#
TARGET_CENTER_COLUMN=224
TARGET_CENTER_ROW=388
TARGET_COLUMN=348
BASELINE_AVERAGE=26
#
01/27/18 12:19:37#
TARGET_CENTER_COLUMN=223
TARGET_COLUMN=3444
TARGET_CENTER_ROW=354
BASELINE_AVERAGE=25
#
**Output CSV file**
Date,Time,BASELINE_AVERAGE,TARGET_CENTER_COLUMN,TARGET_CENTER_ROW,TARGET_COLUMN
01/20/18,9:37:16 PM,0,0,0,0
01/21/18,9:37:16 PM,26,0,13,13
01/23/18,9:37:16 PM,28,223,3608,340
0,0,3609,0
0,0,3610,0
01/24/18,9:37:16 PM,26,224,388,348
01/25/18,9:37:16 PM,26,224,388,348
01/27/18,9:37:16 PM,25,223,354,344
Reading up online I've been able to implement this.
import csv

txt_file = r"DebugLog15test.txt"
csv_file = r"15test.csv"

mylist = ['Date', 'Time', 'BASELINE_AVERAGE', 'TARGET_CENTER_COLUMN', 'TARGET_CENTER_ROW', 'TARGET_COLUMN']

in_txt = csv.reader(open(txt_file, "r"))
with open(csv_file, 'w') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)
Beyond this, I was planning to start a for loop and read data until each '#', since that marks one row; then use the '=' delimiter to split each line and insert the data into the appropriate position in a row list (by comparing the column header with the string before the delimiter), populating the row accordingly. Do you think this approach is correct?
Thanks for your help!
Check out csv.DictWriter for a nicer approach. You give it a list of headers and then you can give it data dictionaries which it will write for you. The writing portion would look like this:
import csv

csv_file = "15test.csv"
headers = ['Date', 'Time', 'BASELINE_AVERAGE', 'TARGET_CENTER_COLUMN', 'TARGET_CENTER_ROW', 'TARGET_COLUMN']

with open(csv_file, 'w', newline='') as myfile:
    wr = csv.DictWriter(myfile, fieldnames=headers, quoting=csv.QUOTE_ALL)
    wr.writeheader()
    # data_dicts is a list of dictionaries looking like so:
    # {'Date': '01/20/18', 'Time': '12:19:35', 'TARGET_CENTER_COLUMN': '0', ...}
    wr.writerows(data_dicts)
As for reading your input, csv.reader won't be of much help: your input file isn't really anything like a csv file. You'd probably be better off writing your own parsing, although it'll be a bit messy because of the inconsistency of the input format. Here's how I would approach that. First, make a function to interpret each line:
def get_data_from_line(line):
    line = line.strip()
    if line == '#':
        # We're between data sections; None will signal that
        return None
    if '=' in line:
        # this is a "KEY=VALUE" line
        key, value = line.split('=', 1)
        return {key: value}
    if ' ' in line:
        # this is a "Date time" line (drop the trailing '#' on the timestamp)
        date, time = line.split(' ', 1)
        return {'Date': date, 'Time': time.rstrip('#')}
    # if we get here, either we've missed something or there's bad data
    raise ValueError("Couldn't parse line: {}".format(line))
Then build the list of data dictionaries from the input file:
data_dicts = []
with open(txt_file) as infh:
    data_dict = {}
    for line in infh:
        update = get_data_from_line(line)
        if update is None:
            # we're between sections; add our current data to the list,
            # if we have data.
            if data_dict:
                data_dicts.append(data_dict)
            data_dict = {}
        else:
            # this line had some data; this incorporates it into data_dict
            data_dict.update(update)
    # finally, if we don't have a section marker at the end,
    # we need to append the last section's data
    if data_dict:
        data_dicts.append(data_dict)
I'm new to Python and I am struggling with this code. I have 2 files: the 1st is a text file containing email addresses (one per line), and the 2nd is a csv file with 5-6 columns. The script should take each search input from file 1 and search for it in file 2; the output should be stored in another csv file (only the first 3 columns), see the example below. I have also copied the script I was working on. If there is a better/more efficient script, then please let me know. Thank you, I appreciate your help.
File1 (output.txt)
rrr#company.com
eee#company.com
ccc#company.com
File2 (final.csv)
Sam,Smith,sss#company.com,admin
Eric,Smith,eee#company.com,finance
Joe,Doe,jjj#company.com,telcom
Chase,Li,ccc#company.com,IT
output (out_name_email.csv)
Eric,Smith,eee#company.com
Chase,Li,ccc#company.com
Here is the script
import csv

outputfile = 'C:\\Python27\\scripts\\out_name_email.csv'
inputfile = 'C:\\Python27\\scripts\\output.txt'
datafile = 'C:\\Python27\\scripts\\final.csv'

names = []
with open(inputfile) as f:
    for line in f:
        names.append(line)

with open(datafile, 'rb') as fd, open(outputfile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fd, delimiter=",")
    headers = next(reader)
    for row in fd:
        for name in names:
            if name in line:
                writer.writerow(row)
Load the emails into a set for O(1) lookup:
with open(inputfile) as fin:
    emails = set(line.strip() for line in fin)
Then loop over the rows once, and check whether each row's email is in emails - no need to loop over every possible match for each row. Note that in your sample data the email address is at index 2, and you only want the first three columns in the output:
# ...
for row in reader:
    if row[2] in emails:
        writer.writerow(row[:3])
If you're not doing anything else, then you can make it:
writer.writerows(row[:3] for row in reader if row[2] in emails)
A couple of notes on your original code: you never use the csv.reader object reader (you loop over fd directly), and you have some naming mix-ups between names, line, and row.
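The pieces above can be combined into one runnable sketch. The two input files are created inline here from the question's sample data, so the example runs standalone (in your script you would keep your full Windows paths instead):

```python
import csv

# Recreate the question's sample inputs.
with open('output.txt', 'w') as f:
    f.write('rrr#company.com\neee#company.com\nccc#company.com\n')
with open('final.csv', 'w', newline='') as f:
    f.write('Sam,Smith,sss#company.com,admin\n'
            'Eric,Smith,eee#company.com,finance\n'
            'Joe,Doe,jjj#company.com,telcom\n'
            'Chase,Li,ccc#company.com,IT\n')

# Build the lookup set, then filter in a single pass, keeping the
# first three columns of each matching row.
with open('output.txt') as fin:
    emails = set(line.strip() for line in fin)

with open('final.csv', newline='') as fd, \
        open('out_name_email.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    writer.writerows(row[:3] for row in csv.reader(fd) if row[2] in emails)
```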