I am trying to write a csv file via Python3.
While parsing another file into this CSV, I don't know the exact number of columns I will get, so can anyone help me write a dynamic header for my CSV file based on the number of inputs I receive after parsing?
Below is an example:
name,numberOfStudents,grade1,grade2,grade3, ... ,gradeN
The number of grades is unknown, which is why I need a generated sequence to serve as the header.
I know I could write something like writerow(['name', 'numberOfStudents', 'grade1', 'grade2', ...]), but that won't work if I receive more than 200 grades (potentially).
P.S. I am using the csv module in Python, specifically its writerow(row) method.
You can read the .csv to a dictionary and then get the keys.
import csv

def read_csv_to_dict(file_path):
    with open(file_path, newline='') as f:
        a = [dict(row) for row in csv.DictReader(f, skipinitialspace=True, delimiter=';')]
    return a

data = read_csv_to_dict('sample.csv')
print(data[0].keys())
Modify the data as you want.
And then use a similar approach to write the data (a list of dictionaries, one line per row in the csv)
def write_dict_to_csv(file_path, my_dict):
    keys = my_dict[0].keys()
    with open(file_path, 'w', newline='') as output_file:
        dict_writer = csv.DictWriter(output_file, keys, delimiter=';')
        dict_writer.writeheader()
        dict_writer.writerows(my_dict)
You need a function like this to generate your header depending on the number of grades you get:
def func(number_of_grades):
    header = ['name', 'numberOfStudents']
    for grade_number in range(1, number_of_grades + 1):  # +1 so gradeN is included
        header.append('grade' + str(grade_number))
    return header
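A quick sketch of how the header function plugs into csv.writer (the file name and the sample data row are made up for illustration):

```python
import csv

def func(number_of_grades):
    header = ['name', 'numberOfStudents']
    for grade_number in range(1, number_of_grades + 1):
        header.append('grade' + str(grade_number))
    return header

# Write the generated header once, then the data rows.
with open('grades.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(func(3))
    writer.writerow(['Math 101', 2, 88, 92, 75])
```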
Most of the samples here show hard-coded columns rather than an iteration. I have 73 columns that I want iterated over and expressed properly in the JSON.
import csv
import json
CSV_yearly = r'C:\path\yearly.csv'
JSON_yearly = r'C:\path\json_yearly.json'
with open(CSV_yearly, 'r') as csv_file:
    reader = csv.DictReader(csv_file)
    with open(JSON_yearly, 'w') as json_file:
        for row in reader:
            json_file.write(json.dumps(row) + ',' + '\n')
print("done")
Though this creates a JSON file, it does so improperly. I saw examples where an argument passed to the reader takes a list of field names, but I don't want to type out 73 columns from the csv. My guess is the missing line of code goes between the start of the with block and reader.
Your code creates each line in the file as a separate JSON object (sometimes called JsonL or json-lines format). Collect the rows in a list and then serialise as JSON:
with open(CSV_yearly, 'r') as csv_file:
    reader = csv.DictReader(csv_file)
    with open(JSON_yearly, 'w') as json_file:
        rows = list(reader)
        json.dump(rows, json_file)
Note that some consumers of JSON expect an object rather than a list as an outer container, in which case your data would have to be
rows = {'data': list(reader)}
Update: questions from comments
Do you know why the result did not order my columns accordingly?
csv.DictReader uses a standard Python dictionary to create rows, so the order of keys is arbitrary in Python versions before 3.7. If key order must be preserved, try using an OrderedDict:
import csv
from collections import OrderedDict

out = []
with open('mycsv.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    headings = next(reader)  # Assumes first row is headings, otherwise supply your own list
    for row in reader:
        od = OrderedDict(zip(headings, row))
        out.append(od)
# dump out to file using the json module
Be aware that while this may output json with the required key order, consumers of the json are not required to respect it.
Do you also know why my values in the json were converted to strings rather than remaining as numbers (i.e. why they are wrapped in quotes)?
All values from a csv are read as strings. If you want different types then you need to perform the necessary conversions after reading from the csv file.
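A minimal sketch of such a conversion, assuming the simple "try int, then float, else keep the string" policy (the helper name and the sample data are made up):

```python
import csv
import io

# Hypothetical helper: try int, then float; fall back to the original string.
def coerce(value):
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

raw = "name,age,score\nAda,36,9.5\n"          # stand-in for a real csv file
rows = list(csv.DictReader(io.StringIO(raw)))
typed = [{k: coerce(v) for k, v in row.items()} for row in rows]
# typed == [{'name': 'Ada', 'age': 36, 'score': 9.5}]
```

Once the values are real numbers, json.dumps will emit them without quotes.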
Background
I have multiple dictionaries of different lengths. I need to write the values of the dictionaries to a single CSV file. I figured I could loop through each dictionary one by one and write the data to the CSV, but I ran into a small formatting issue.
Problem/Solution
I realized that after I loop through the first dictionary, the data of the second write gets written starting at the row where the first dictionary ended, as displayed in the first image. I would ideally want my data to print as shown in the second image.
My Code
import csv
e = {'Jay':10,'Ray':40}
c = {'Google':5000}
def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name', 'age', 'company_name', 'size']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        for name in e:
            writer.writerow({'emp_name': name, 'age': e.get(name)})
        for company in c:
            writer.writerow({'company_name': company, 'size': c.get(company)})

writeData()
PS: I would have more than 2 dictionaries so I am looking for a generic way where I can print data from row under the header for all the dictionaries. I am open to all solutions and suggestions.
If all dictionaries are of equal size, you could use zip to iterate over them in parallel. If they aren't of equal size, and you want the iteration to pad to the longest dict, you could use itertools.zip_longest
For example:
import csv
from itertools import zip_longest
e = {'Jay':10,'Ray':40}
c = {'Google':5000}
def writeData():
    with open('employee_file20.csv', mode='w', newline='') as csv_file:
        fieldnames = ['emp_name', 'age', 'company_name', 'size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in zip_longest(e.items(), c.items()):
            row = list(employee) if employee is not None else ['', '']  # Write empty fields if no employee
            row += list(company) if company is not None else ['', '']   # Write empty fields if no company
            writer.writerow(row)

writeData()
If the dicts are of equal size, it's simpler:
import csv
e = {'Jay':10,'Ray':40}
c = {'Google':5000, 'Yahoo': 3000}
def writeData():
    with open('employee_file20.csv', mode='w', newline='') as csv_file:
        fieldnames = ['emp_name', 'age', 'company_name', 'size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in zip(e.items(), c.items()):
            writer.writerow(employee + company)

writeData()
A little side note: from Python 3.7 dictionaries preserve insertion order (CPython 3.6 already did so as an implementation detail). This isn't the case in Python 2, so there you should use collections.OrderedDict instead of a standard dictionary.
There might be a more pythonic solution, but I'd do something like this:
I haven't used the csv writer module before, so I just built the comma-separated output by hand.
e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000}
dict_list = [e, c]  # add more dicts here.
max_dict_size = max(len(d) for d in dict_list)

output = ""
# Add header information here.
for i in range(max_dict_size):
    for j in range(len(dict_list)):
        # popitem() removes and returns a (key, value) pair; pad with blanks once a dict is empty.
        key, value = dict_list[j].popitem() if len(dict_list[j]) else ("", "")
        output += f"{key},{value},"
    output += "\n"

# Now output should contain the full text of the .csv file.
# Do file manipulation here.
# You could also do it after each row,
# where I currently have the output += "\n"
Edit: A little more thinking and I found something that might polish this a bit. You could first map each dictionary to a list of its keys using the .keys() method and append those lists to an empty list.
The advantage with that is that you'd be able to go "forward" instead of popping the dictionary items off the back. It also wouldn't destroy the dictionary.
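A sketch of that "polish", under the same assumptions as the snippet above (same sample dicts, hand-built comma-separated output): snapshot each dict's keys up front so we can walk them forward without popitem() destroying the dicts.

```python
e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000}
dict_list = [e, c]

# Snapshot each dict's keys so we can iterate forward without mutating the dicts.
key_lists = [list(d.keys()) for d in dict_list]
max_dict_size = max(len(keys) for keys in key_lists)

output = ""
for i in range(max_dict_size):
    for d, keys in zip(dict_list, key_lists):
        if i < len(keys):
            output += f"{keys[i]},{d[keys[i]]},"
        else:
            output += ",,"   # pad when this dict has run out of items
    output += "\n"
# output == "Jay,10,Google,5000,\nRay,40,,,\n" and e, c are untouched
```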
I'm somewhat new to Python and still trying to learn all its tricks and exploitations.
I'm looking to see if it's possible to collect column data from two separate files to create a single dictionary, rather than two distinct dictionaries. The code that I've used to import files before looks like this:
import csv
from collections import defaultdict

columns = defaultdict(list)
with open("myfile.txt") as f:
    reader = csv.DictReader(f, delimiter='\t')
    for row in reader:
        for (header, variable) in row.items():
            columns[header].append(variable)
# No explicit f.close() needed; the with statement closes the file.
This code makes each element of the first line of the file into a header for the columns of data below it. What I'd like to do now is to import a file that only contains one line which I'll use as my header, and import another file that only contains data that I'll match the headers up to. What I've tried so far resembles this:
columns = defaultdict(list)
with open("headerData.txt") as g:
    reader1 = csv.DictReader(g, delimiter='\t')
    for row in reader1:
        for (h, v) in row.items():
            columns[h].append(v)

with open("variableData.txt") as f:
    reader = csv.DictReader(f, delimiter='\t')
    for row in reader:
        for (h, v) in row.items():
            columns[h].append(v)
Is nesting the open statements the right way to attempt this? Honestly I am totally lost on what to do. Any help is greatly appreciated.
You can't use DictReader like that if the headers are not in the file. But you can create a fake file object that would yield the headers and then the data, using itertools.chain:
from itertools import chain

with open('headerData.txt') as h, open('variableData.txt') as data:
    f = chain(h, data)
    reader = csv.DictReader(f, delimiter='\t')
    # proceed with your code from the first snippet
    # no close() calls needed when using open() with "with" statements
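An end-to-end sketch of the chain() trick, with io.StringIO standing in for headerData.txt and variableData.txt so it is self-contained:

```python
import csv
import io
from collections import defaultdict
from itertools import chain

# In-memory stand-ins for the two files: one header line, then data-only lines.
header_file = io.StringIO("a\tb\n")
data_file = io.StringIO("1\t2\n3\t4\n")

columns = defaultdict(list)
reader = csv.DictReader(chain(header_file, data_file), delimiter='\t')
for row in reader:
    for h, v in row.items():
        columns[h].append(v)
# columns == {'a': ['1', '3'], 'b': ['2', '4']}
```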
Another way of course would be to just read the headers into a list and use regular csv.reader on variableData.txt:
with open('headerData.txt') as h:
    names = next(h).rstrip('\n').split('\t')  # strip the newline so the last name is clean

with open('variableData.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        for name, value in zip(names, row):
            columns[name].append(value)
By default, DictReader will take the first line in your csv file and use that as the keys for the dict. However, according to the docs, you can also pass it a fieldnames parameter, which is a sequence containing the names of the keys to use for the dict. So you could do this:
columns = defaultdict(list)
with open("headerData.txt") as f, open("variableData.txt") as data:
    reader = csv.DictReader(data,
                            fieldnames=f.read().rstrip().split('\t'),
                            delimiter='\t')
    for row in reader:
        for (h, v) in row.items():
            columns[h].append(v)
I am trying to "clean up" some data - I'm creating a dictionary of the channels that I need to keep and then I've got an if block to create a second dictionary with the correct rounding.
Dictionary looks like this:
{'time, s': (imported array), 'x temp, C':(imported array),
'x pressure, kPa': (diff. imported array).....etc}
Each imported array is 1-d.
I was looking at this example, but I didn't quite get the way to parse it so that I ended up with what I want.
My desired output is a csv file (do not care if the delimiter is spaces or commas or whatever) with the first row being the keys and the subsequent rows simply being the values.
I feel like what I'm missing is how to use the map function properly.
Also, I'm wondering if I'm using DictWriter when I should be using DictReader.
This is what I originally tried:
with open((filename), 'wb') as outfile:
    write = csv.DictWriter(outfile, Fieldname_order)
    write.writer.writerow(Fieldname_order)
    write.writerows(data)
DictWriter's API doesn't match the data structure you have. DictWriter requires a list of dictionaries; you have a dictionary of lists.
You can use the ordinary csv.writer:
import csv

my_data = {'time, s': [0, 1, 2, 3], 'x temp, C': [0, 10, 20, 30],
           'x pressure, kPa': [0, 100, 200, 300]}

with open('outfile.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(my_data.keys())
    writer.writerows(zip(*my_data.values()))
In Python 3.7+ that will write the columns in insertion order; in older versions the order is arbitrary and may change from run to run. One way to make the order consistent regardless of version is to replace the last two lines with:
writer.writerow(sorted(my_data.keys()))
writer.writerows(zip(*(my_data[k] for k in sorted(my_data.keys()))))
Edit: in this example data is a list of dictionaries. Each row in the csv contains one value for each key.
To write your dictionary with a header row and then data rows:
with open(filename, 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    writer.writerows(data)
To read in data as a dictionary then you do need to use DictReader:
with open(filename, 'r', newline='') as infile:
    reader = csv.DictReader(infile)
    data = [row for row in reader]
I have a list of dictionaries that I want to be able to open in Excel, formatted correctly. This is what I have so far, using csv:
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path = "/docs/outfile.txt"
out_file = open(out_path, 'wb')
writer = csv.writer(out_file, dialect='excel')
for items in list_of_dicts:
    for k, v in items.items():
        writer.writerow([k, v])
Obviously, when I open the output in Excel, it's formatted like this:
key value
key value
What I want is this:
key key key
value value value
I can't figure out how to do this, so help would be appreciated. Also, I want the column names to be the dictionary keys, instead of the default 'A, B, C' etc. Sorry if this is stupid.
Thanks
The csv module has a DictWriter class for this, which is covered quite nicely in another SO answer. The critical point is that you need to know all your column headings when you instantiate the DictWriter. You could construct the list of field names from your list_of_dicts, in which case your code becomes
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path = "/docs/outfile.txt"
out_file = open(out_path, 'w', newline='')
fieldnames = sorted(set(k for d in list_of_dicts for k in d))
writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
writer.writeheader()
for row in list_of_dicts:
    writer.writerow(row)
out_file.close()
The way I've constructed fieldnames scans the entire list_of_dicts, so it will slow down as the size increases. You should instead construct fieldnames directly from the source of your data e.g. if the source of your data is also a csv file you can use a DictReader and use fieldnames = reader.fieldnames.
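If you would rather keep the columns in first-seen order than sort them alphabetically, a small variation on the fieldnames line (relying on Python 3.7+ insertion-ordered dicts) is:

```python
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]

# dict.fromkeys deduplicates while preserving first-seen order.
fieldnames = list(dict.fromkeys(k for d in list_of_dicts for k in d))
# fieldnames == ['hello', 'yes']
```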
You can also replace the for loop with a single call to writer.writerows(list_of_dicts) and use a with block to handle file closure, in which case your code would become
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path = "/docs/outfile.txt"
fieldnames = sorted(set(k for d in list_of_dicts for k in d))
with open(out_path, 'w', newline='') as out_file:
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
    writer.writeheader()
    writer.writerows(list_of_dicts)
You need to write 2 separate rows instead, one with the keys, one with the values:
writer = csv.writer(ofile, dialect='excel')
writer.writerow([k for d in list_of_dicts for k in d])
writer.writerow([v for d in list_of_dicts for v in d.values()])
The two list comprehensions extract first all the keys, then all the values, from the dictionaries in your input list, combining these into one list to write to the CSV file.
I think the most useful approach is to write the data column by column, so each key is a column (good for later data processing, e.g. for ML). I had some trouble figuring it out yesterday, but I came up with a solution I saw on another website. However, from what I can tell it isn't possible to write the whole dictionary in one go; we have to split it into smaller dictionaries, one per row (my csv file had 20k rows at the end: surveyed person, their data and answers). I did it like this:
# writing a dict of lists to csv
# 'cleaned' is the (already opened) output file
# 1 header: the fieldnames are going to be the column names (the dict keys)
# 2 create writer
writer = csv.DictWriter(cleaned, d.keys())
# 3 attach header
writer.writeheader()
# 4 write separate dictionaries, one per row
for i in range(len(list(d.values())[0])):
    writer.writerow({key: d[key][i] for key in d.keys()})
I see my solution has one more for loop, but on the other hand I think it uses less memory (though I'm not sure!).
Hope it helps somebody ;)