Convert list of dictionaries to csv file - python

I have a list of dictionaries of this kind:
[
{'site1':'data1'},
{'site2':'data2'}
]
What would be the proper way to generate a CSV file with the data in this order?
row 1 row2
site1 data1
site2 data2

Loop through the dictionaries and write them to the file.
list_of_dicts = [{'site1':'data1'},{'site2':'data2'}]
with open('sites.csv', 'w') as file:
    file.write('row1\trow2\n')
    for dictionary in list_of_dicts:
        file.write('\t'.join(list(dictionary.items())[0]) + '\n')
output:
row1 row2
site1 data1
site2 data2
Note that this requires each dictionary to have exactly one entry. If a dictionary has more, only the first item returned by items() is written (an arbitrary one on Python versions before 3.7, where dictionary order is not guaranteed) and the others are ignored. There are many different ways to handle there being more than one entry in the dictionaries, so you would need to add the expected behaviour for those cases to the question before they can be accommodated.

This should do the trick :)
data = [ {'site1':'data1'}, {'site2':'data2'} ]
with open('list.csv', 'w') as f:
    for d in data:  # named d to avoid shadowing the built-in dict
        for key, value in d.items():
            text = key + ',' + value + '\n'
            f.write(text)

I like to use a pandas DataFrame to hold my data and write it to a CSV file.
import pandas as pd
a = [{'site1':'data1'},{'site2':'data2'}]
# Get the keys and values from each dictionary in the list
keys = []
vals = []
for a1 in a:
    for k, v in a1.items():
        keys.append(k)
        vals.append(v)
# Make the DataFrame from the keys and values
result = pd.DataFrame({'row1': keys, 'row2': vals})
# Write the data to CSV; index=False omits the row numbers
result.to_csv("mydata.csv", index=False)

You should use a CSV writer to make sure that any embedded metacharacters such as commas and quotes are escaped properly; otherwise, data such as {'site3':'data, data and more data'} will corrupt the file.
import csv
my_list = [{'site1':'data1'}, {'site2':'data2'}]
with open('test.csv', 'w', newline='') as out_fp:
    writer = csv.writer(out_fp)
    for d in my_list:
        writer.writerows(d.items())
You could shorten that up a bit with itertools if you want:
import itertools
with open('test.csv', 'w', newline='') as out_fp:
    csv.writer(out_fp).writerows(itertools.chain.from_iterable(
        d.items() for d in my_list))
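To illustrate why the quoting matters, here is a minimal sketch (Python 3) that compares a naive comma join with csv.writer. The in-memory buffer and the sample rows are only there for illustration; they stand in for a real file and real data.

```python
import csv
import io

rows = [("site1", "data1"), ("site3", "data, data and more data")]

# Naive approach: join the fields with commas, with no escaping at all.
naive = "".join(",".join(row) + "\n" for row in rows)

# csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO(newline="")  # in-memory stand-in for an open file
csv.writer(buffer).writerows(rows)
quoted = buffer.getvalue()

# Reading both versions back shows the difference in column count.
naive_rows = list(csv.reader(io.StringIO(naive)))
quoted_rows = list(csv.reader(io.StringIO(quoted)))
print(naive_rows[1])   # the embedded comma splits the value into extra fields
print(quoted_rows[1])  # the value survives as a single field
```

With the naive version, a reader has no way to tell the embedded comma from a field separator, which is the corruption described above.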

Related

Exporting a complicated dictionary into a csv file

I have a dictionary in which the values are lists. Here is an example:
d = {'20190606_CFMPB576run3_CF33456_12.RCC': [1.0354477611940298, '0.51'],
'20190606_CFMPB576run3_CF33457_05.RCC': [1.0412757973733584, '1.09'],
'20190606_CFMPB576run3_CF33505_06.RCC': [1.0531309297912714, '0.81']}
I am trying to export this dictionary to a CSV file. This is the expected output:
file_name,citeria1,criteria2
20190606_CFMPB576run3_CF33456_12.RCC,1.0354477611940298, 0.51
20190606_CFMPB576run3_CF33457_05.RCC,1.0412757973733584,1.09
20190606_CFMPB576run3_CF33505_06.RCC,1.0531309297912714,0.81
To do so, I wrote the following code:
import csv
with open('mycsvfile.csv', 'w') as f:
    header = ["file_name","citeria1","criteria2"]
    w = csv.DictWriter(f, d.keys())
    w.writeheader()
    w.writerow(d)
But it does not produce what I want. Do you know how to fix it?
Change as follows:
import csv
with open('mycsvfile.csv', 'w', newline='') as f:
    header = ["file_name", "citeria1", "criteria2"]
    w = csv.writer(f)
    w.writerow(header)
    # One row per dictionary entry: the key followed by its list of values
    for key, lst in d.items():
        w.writerow([key] + lst)
A DictWriter is given the field/column names and assumes the rows to be provided as dictionaries with keys corresponding to the given field names. In your case, the data structure is different. You can use a simple csv.writer as your rows are a mixture of keys and values of your given dictionary.
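If you did want to keep DictWriter, each row would first have to be turned into a dictionary keyed by the column names. The following is only a sketch of that idea, assuming Python 3; the in-memory buffer stands in for the real file and the data is shortened to two entries.

```python
import csv
import io

d = {
    "20190606_CFMPB576run3_CF33456_12.RCC": [1.0354477611940298, "0.51"],
    "20190606_CFMPB576run3_CF33457_05.RCC": [1.0412757973733584, "1.09"],
}
header = ["file_name", "citeria1", "criteria2"]

out = io.StringIO(newline="")  # stand-in for open('mycsvfile.csv', 'w', newline='')
w = csv.DictWriter(out, fieldnames=header)
w.writeheader()
for key, lst in d.items():
    # Pair each column name with its value to build one dictionary per row.
    w.writerow(dict(zip(header, [key] + lst)))

lines = out.getvalue().splitlines()
print(lines[0])  # header line
```

For this data shape the plain csv.writer version is shorter, so this mainly shows what DictWriter expects as input.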

writing to a single CSV file from multiple dictionaries

Background
I have multiple dictionaries of different lengths. I need to write the values of the dictionaries to a single CSV file. I figured I can loop through each dictionary one by one and write the data to the CSV. I ran into a small formatting issue.
Problem/Solution
After I loop through the first dictionary, the data from the second dictionary gets written starting at the row where the first dictionary ended, as displayed in the first image. Ideally, I would want my data to be laid out as shown in the second image.
My Code
import csv
e = {'Jay':10,'Ray':40}
c = {'Google':5000}
def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name','age','company_name','size']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        for name in e:
            writer.writerow({'emp_name':name,'age':e.get(name)})
        for company in c:
            writer.writerow({'company_name':company,'size':c.get(company)})
writeData()
PS: I will have more than 2 dictionaries, so I am looking for a generic way to write the data for every dictionary starting from the row directly under the header. I am open to all solutions and suggestions.
If all dictionaries are of equal size, you could use zip to iterate over them in parallel. If they aren't of equal size, and you want the iteration to pad to the longest dict, you could use itertools.zip_longest.
For example:
import csv
from itertools import zip_longest
e = {'Jay':10,'Ray':40}
c = {'Google':5000}
def writeData():
    with open('employee_file20.csv', mode='w', newline='') as csv_file:
        fieldnames = ['emp_name','age','company_name','size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        # fillvalue pads whichever dict is shorter with two empty fields
        for employee, company in zip_longest(e.items(), c.items(), fillvalue=('', '')):
            writer.writerow(employee + company)
writeData()
If the dicts are of equal size, it's simpler:
import csv
e = {'Jay':10,'Ray':40}
c = {'Google':5000, 'Yahoo': 3000}
def writeData():
    with open('employee_file20.csv', mode='w', newline='') as csv_file:
        fieldnames = ['emp_name', 'age', 'company_name', 'size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in zip(e.items(), c.items()):
            writer.writerow(employee + company)
writeData()
A little side note: in Python 3.7 and later, dictionaries preserve insertion order. This isn't guaranteed in earlier versions, including Python 2. So if you use one of those, you should use collections.OrderedDict instead of the standard dictionary.
There might be a more pythonic solution, but I'd do something like this.
I haven't used the csv writer before, so I just built my own comma-separated output.
e = {'Jay':10,'Ray':40}
c = {'Google':5000}
dict_list = [e,c] # add more dicts here.
max_dict_size = max(len(d) for d in dict_list)
output = ""
# Add header information here.
for i in range(max_dict_size):
    for j in range(len(dict_list)):
        # popitem() removes the last item, so this empties the dictionaries
        key, value = dict_list[j].popitem() if len(dict_list[j]) else ("","")
        output += f"{key},{value},"
    output += "\n"
# Now output should contain the full text of the .csv file
# Do file manipulation here.
# You could also do it after each row,
# where I currently have the output += "\n"
Edit: After a little more thinking, I found something that might polish this a bit. You could first map each dictionary to a list of its keys by calling .keys() on each dictionary and appending the results to an empty list.
The advantage is that you'd be able to go "forward" instead of popping the dictionary items off the back. It also wouldn't destroy the dictionaries.
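One possible way to sketch that non-destructive, forward-iterating idea for any number of dictionaries is shown below. It assumes Python 3.7+ (for insertion-ordered dictionaries), swaps in csv.writer and itertools.zip_longest rather than hand-built strings, and writes to an in-memory buffer as a stand-in for the real file.

```python
import csv
import io
from itertools import zip_longest

e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000}
dict_list = [e, c]  # add more dicts here
fieldnames = ['emp_name', 'age', 'company_name', 'size']

out = io.StringIO(newline='')  # stand-in for an open CSV file
writer = csv.writer(out)
writer.writerow(fieldnames)

# Copy the items into lists so the dictionaries themselves are left untouched.
item_lists = [list(d.items()) for d in dict_list]

# Walk all dictionaries in parallel; shorter ones are padded with ('', '').
for pairs in zip_longest(*item_lists, fillvalue=('', '')):
    row = [field for pair in pairs for field in pair]
    writer.writerow(row)

lines = out.getvalue().splitlines()
print(lines)
```

Adding a third dictionary only requires appending it to dict_list and adding its two column names to fieldnames.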

How to sort uneven dictionary by key and create CSV

I have a python dictionary which for each KEY one can have a variable number of VALUES (arranged in a list).
For example:
{'607': [36146], '448': [50890, 44513], '626': [44349, 44436]}
What I'd like to do is generate a CSV of this information with a format like so:
448 , 607 , 626
50890,36146,44349
44513, ,44436
Currently my code can produce a CSV such as this, the only issue being that the columns of the CSV are not sorted according to the ascending numerical order of the KEYs. My code so far is below:
csv_file = 'file.csv'
with open(csv_file, 'wb') as fd:
    writer = csv.writer(fd, delimiter = ',')
    # Format headers for aesthetics
    csv_headers = [' {} '.format(elem) for elem in dictionary.keys()]
    writer.writerow(csv_headers)
    # Format data to create convenient csv format
    csv_data = itertools.izip_longest(*dictionary.values(), fillvalue = ' ')
    writer.writerows(csv_data)
As you can see I split the KEYs from the VALUEs and write them separately but if I want to sort the columns by the KEYs I imagine this is probably not the best way to go about this. Therefore, I was hoping someone could point me in the right (and most pythonic) direction.
You have two options:
1. Sort the keys, then extract the values in the same order rather than relying on dictionary.values().
2. Use a csv.DictWriter() object and produce dictionaries per row.
Option 1 looks like this:
csv_file = 'file.csv'
with open(csv_file, 'wb') as fd:
    writer = csv.writer(fd, delimiter=',')
    keys = sorted(dictionary)
    # Format headers for aesthetics
    headers = [' {} '.format(key) for key in keys]
    writer.writerow(headers)
    # Format data to create convenient csv format
    csv_data = itertools.izip_longest(*(dictionary[key] for key in keys),
                                      fillvalue=' ')
    writer.writerows(csv_data)
Using DictWriter would look like:
csv_file = 'file.csv'
with open(csv_file, 'wb') as fd:
    writer = csv.DictWriter(
        fd, sorted(dictionary), delimiter=',')
    # write formatted headers
    writer.writerow({k: ' {} '.format(k) for k in dictionary})
    csv_data = itertools.izip_longest(*dictionary.values(), fillvalue=' ')
    writer.writerows(dict(zip(dictionary, row)) for row in csv_data)
I went for sorting, ending up with a transposed tuple of keys and an iterable of the lists, then went from there:
import csv
from itertools import izip_longest
d = {'607': [36146], '448': [50890, 44513], '626': [44349, 44436]}
with open('output.csv', 'wb') as fout:
    csvout = csv.writer(fout)
    header, rows = zip(*sorted((k, iter(v)) for k, v in d.iteritems()))
    csvout.writerow(format(el, '^5') for el in header)
    csvout.writerows(izip_longest(*rows, fillvalue=' '))
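The answers above target Python 2 (izip_longest, iteritems, files opened with 'wb'). As a rough Python 3 sketch of the same "sort the keys, then transpose" idea: the keys here are strings, so this version sorts with key=int to get numerical rather than lexicographic order. That choice, the empty-string fill value, and the in-memory buffer are assumptions made for the example.

```python
import csv
import io
from itertools import zip_longest

dictionary = {'607': [36146], '448': [50890, 44513], '626': [44349, 44436]}

# Sort the keys by numeric value; a plain sorted() would compare them as strings.
keys = sorted(dictionary, key=int)

out = io.StringIO(newline='')  # stand-in for open('file.csv', 'w', newline='')
writer = csv.writer(out)
writer.writerow(keys)

# Transpose the per-key lists into rows, padding short columns with ''.
writer.writerows(zip_longest(*(dictionary[k] for k in keys), fillvalue=''))

lines = out.getvalue().splitlines()
print(lines)
```

Because the values are looked up through the sorted keys, every column stays aligned with its header regardless of the dictionary's own ordering.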

Possibility of writing dictionary items in columns

I have a dictionary in which the keys are tuples and the values are lists, like this:
{('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df'): [5.998999999999998,0.0013169999,
4.0000000000000972], ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de'): [7.89899999,
0.15647999999675390, 8.764380000972, 9.200000000]}
I want to write this dictionary to a csv file in the column format like:
('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df') ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de')
5.998999999999998 7.89899999
0.0013169999 0.15647999999675390
4.0000000000000972 8.764380000972
9.200000000
I know how to write the same thing in row format using this code:
writer = csv.writer(open('dict.csv', 'wb'))
for key, value in mydict.items():
    writer.writerow([key, value])
How do I write the same thing in columns? Is it even possible? Thanks in advance.
I referred to the Python docs for csv here: http://docs.python.org/2/library/csv.html. There is no information on column-wise writing.
import csv
mydict = {('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df'):
          [5.998999999999998, 0.0013169999, 4.0000000000000972],
          ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de'):
          [7.89899999, 0.15647999999675390, 8.764380000972, 9.200000000]}
with open('dict.csv', 'wb') as file:
    writer = csv.writer(file, delimiter='\t')
    writer.writerow(mydict.keys())
    for row in zip(*mydict.values()):
        writer.writerow(list(row))
Output file dict.csv:
('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df') ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de')
5.998999999999998 7.89899999
0.0013169999 0.1564799999967539
4.000000000000097 8.764380000972
I am sure you can figure out the formatting:
>>> d.keys() #gives list of keys for first row
[('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df'), ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de')]
>>> for i in zip(*d.values()): #gives rows with tuple structure for columns
...     print i
...
(5.998999999999998, 7.89899999)
(0.0013169999, 0.1564799999967539)
(4.000000000000097, 8.764380000972)
For your code, do this:
writer = csv.writer(open('dict.csv', 'wb'))
writer.writerow(mydict.keys())
for values in zip(*mydict.values()):
    writer.writerow(values)
The ()'s and such will not be added to the csv file.
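Note that zip stops at the shortest list, which is why the fourth value of the second column (9.2) is missing from the output shown above. A minimal Python 3 sketch that keeps it, assuming an empty string is an acceptable filler for the shorter column and using an in-memory buffer in place of dict.csv:

```python
import csv
import io
from itertools import zip_longest

mydict = {
    ('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df'):
        [5.998999999999998, 0.0013169999, 4.0000000000000972],
    ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de'):
        [7.89899999, 0.15647999999675390, 8.764380000972, 9.200000000],
}

out = io.StringIO(newline='')  # stand-in for open('dict.csv', 'w', newline='')
writer = csv.writer(out, delimiter='\t')
writer.writerow(mydict.keys())

# zip_longest pads the shorter column instead of truncating the longer one.
for row in zip_longest(*mydict.values(), fillvalue=''):
    writer.writerow(row)

# Read the result back to inspect the rows.
rows = list(csv.reader(io.StringIO(out.getvalue()), delimiter='\t'))
print(len(rows))  # header plus one row per entry of the longest list
```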

Write Python dictionary to CSV where keys = columns, values = rows

I have a list of dictionaries that I want to be able to open in Excel, formatted correctly. This is what I have so far, using csv:
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path= "/docs/outfile.txt"
out_file = open(out_path, 'wb')
writer = csv.writer(out_file, dialect = 'excel')
for items in list_of_dicts:
    for k,v in items.items():
        writer.writerow([k,v])
Obviously, when I open the output in Excel, it's formatted like this:
key value
key value
What I want is this:
key key key
value value value
I can't figure out how to do this, so help would be appreciated. Also, I want the column names to be the dictionary keys, instead of the default 'A, B, C' etc. Sorry if this is stupid.
Thanks
The csv module has a DictWriter class for this, which is covered quite nicely in another SO answer. The critical point is that you need to know all your column headings when you instantiate the DictWriter. You could construct the list of field names from your list_of_dicts, in which case your code becomes:
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path= "/docs/outfile.txt"
out_file = open(out_path, 'wb')
fieldnames = sorted(list(set(k for d in list_of_dicts for k in d)))
writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
writer.writeheader() # Assumes Python >= 2.7
for row in list_of_dicts:
    writer.writerow(row)
out_file.close()
The way I've constructed fieldnames scans the entire list_of_dicts, so it will slow down as the size increases. You should instead construct fieldnames directly from the source of your data e.g. if the source of your data is also a csv file you can use a DictReader and use fieldnames = reader.fieldnames.
You can also replace the for loop with a single call to writer.writerows(list_of_dicts) and use a with block to handle file closure, in which case your code would become:
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path= "/docs/outfile.txt"
fieldnames = sorted(list(set(k for d in list_of_dicts for k in d)))
with open(out_path, 'wb') as out_file:
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
    writer.writeheader()
    writer.writerows(list_of_dicts)
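The code above is written for Python 2 (the file is opened with 'wb'). A rough Python 3 sketch of the same DictWriter approach follows. It collects the field names in first-seen order with dict.fromkeys instead of sorting them, which is a choice made for this example, and it writes to an in-memory buffer rather than to out_path.

```python
import csv
import io

list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]

# Collect field names in first-seen order (dict.fromkeys keeps insertion order).
fieldnames = list(dict.fromkeys(k for d in list_of_dicts for k in d))

out_file = io.StringIO(newline='')  # stand-in for open(out_path, 'w', newline='')
writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
writer.writeheader()
# Keys missing from a row are filled with restval, which defaults to ''.
writer.writerows(list_of_dicts)

lines = out_file.getvalue().splitlines()
print(lines)
```

Each dictionary still becomes its own row, so with this input every row has one filled column and one empty one.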
You need to write 2 separate rows instead, one with the keys and one with the values:
writer = csv.writer(out_file, dialect = 'excel')
writer.writerow([k for d in list_of_dicts for k in d])
writer.writerow([v for d in list_of_dicts for v in d.itervalues()])
The two list comprehensions extract first all the keys, then all the values, from the dictionaries in your input list, combining these into one list to write to the CSV file.
I think the most useful approach is to write the data column by column, so that each key is a column (good for later data processing, e.g. for ML).
I had some trouble figuring this out yesterday, but I came up with a solution based on one I saw on another website. From what I can see, it is not possible to write the whole dictionary at once, so we have to split it into smaller dictionaries, one per row (my CSV file had 20k rows at the end: surveyed persons, their data and answers). I did it like this:
# writing dict to csv
# 'cleaned' is the output file (an open file object)
# 1 header
# fieldnames are going to be the column names
# 2 create writer
writer = csv.DictWriter(cleaned, d.keys())
# 3 attach header
writer.writeheader()
# write one dictionary per row
for i in range(len(list(d.values())[0])):
    writer.writerow({key: d[key][i] for key in d.keys()})
I see my solution has one more for loop, but on the other hand, I think it takes less memory (though I am not sure!).
Hope it helps somebody ;)
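For a dictionary of equal-length lists like the one described above, an alternative sketch is to transpose the columns with zip and use a plain csv.writer. The sample data and the in-memory buffer below are hypothetical, and this assumes Python 3 and that all lists have the same length.

```python
import csv
import io

# Hypothetical column-oriented data: each key is a column name and
# each list holds that column's values.
d = {'name': ['Ann', 'Bob'], 'age': [31, 45], 'answer': ['yes', 'no']}

cleaned = io.StringIO(newline='')  # stand-in for the open output file
writer = csv.writer(cleaned)
writer.writerow(d.keys())

# zip(*d.values()) yields one tuple per row, so no per-row dictionaries are built.
writer.writerows(zip(*d.values()))

lines = cleaned.getvalue().splitlines()
print(lines)
```

If the lists can differ in length, zip would silently drop the extra values, so itertools.zip_longest would be the safer choice there.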
