How to turn CSV data into dictionary [duplicate] - python

I have a CSV file which I am opening through this code:
open(file,"r")
When I read the file I get the output:
['hello', 'hi', 'bye']
['jelly', 'belly', 'heli']
['red', 'black', 'blue']
I want the output to look something like this:
{'hello': ['jelly', 'red'], 'hi': ['belly', 'black'], 'bye': ['heli', 'blue']}
but I have no idea how to do it.

You can use collections.defaultdict and csv.DictReader:
>>> import csv
>>> from collections import defaultdict
>>> with open('abc.csv') as f:
...     reader = csv.DictReader(f)
...     d = defaultdict(list)
...     for row in reader:
...         for k, v in row.items():
...             d[k].append(v)
...
>>> d
defaultdict(<type 'list'>,
{'hi': ['belly', 'black'],
'bye': ['heli', 'blue'],
'hello': ['jelly', 'red']})
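For anyone who wants to try this without creating a file first, here is a minimal sketch of the same idea in Python 3. The sample text and the function name `columns_from_csv` are assumptions made for illustration, and `io.StringIO` merely stands in for the opened file.

```python
import csv
import io
from collections import defaultdict

# Assumed sample data matching the rows shown in the question.
SAMPLE = "hello,hi,bye\njelly,belly,heli\nred,black,blue\n"


def columns_from_csv(fileobj):
    """Return {header: [column values...]} for a CSV file-like object."""
    columns = defaultdict(list)
    for row in csv.DictReader(fileobj):
        for key, value in row.items():
            columns[key].append(value)
    return dict(columns)  # plain dict, so a missing key raises KeyError again


result = columns_from_csv(io.StringIO(SAMPLE))
print(result)
# {'hello': ['jelly', 'red'], 'hi': ['belly', 'black'], 'bye': ['heli', 'blue']}
```

Converting back to a plain dict at the end is optional; it just avoids silently creating empty lists when a misspelled key is looked up later.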

csv = [
['hello', 'hi', 'bye'],
['jelly', 'belly', 'heli'],
['red', 'black', 'blue'],
]
csv = zip(*csv)
result = {}
for row in csv:
    result[row[0]] = row[1:]
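Note that zip yields tuples, so the values above end up as tuples rather than lists. A minimal sketch of the same transposition that produces lists, assuming the rows are already in memory (the name `rows` is used here only to avoid shadowing the csv module):

```python
rows = [
    ['hello', 'hi', 'bye'],
    ['jelly', 'belly', 'heli'],
    ['red', 'black', 'blue'],
]

# zip(*rows) transposes the table: each tuple is one column, header first.
result = {column[0]: list(column[1:]) for column in zip(*rows)}
print(result)
# {'hello': ['jelly', 'red'], 'hi': ['belly', 'black'], 'bye': ['heli', 'blue']}
```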

yourHash = {}
with open(yourFile, 'r') as inFile:
    for line in inFile:
        line = line.rstrip().split(',')
        yourHash[line[0]] = line[1:]
This assumes that each key is unique to one line. If not, this would have to be modified to:
yourHash = {}
with open(yourFile, 'r') as inFile:
    for line in inFile:
        line = line.rstrip().split(',')
        if line[0] in yourHash:
            yourHash[line[0]] += line[1:]
        else:
            yourHash[line[0]] = line[1:]
Of course, you can use csv, but I figured that someone would definitely post that, so I gave an alternative way to do it. Good luck!
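One reason the csv module is usually the safer choice is that a plain split(',') does not understand quoted fields. A small sketch of the difference, using a made-up line whose second field contains a comma:

```python
import csv
import io

# Assumed sample line: the second field contains a comma inside quotes.
line = 'greeting,"hello, world",bye\n'

naive = line.rstrip().split(',')                  # splits inside the quotes too
parsed = next(csv.reader(io.StringIO(line)))      # honours the quoting

print(naive)   # ['greeting', '"hello', ' world"', 'bye']
print(parsed)  # ['greeting', 'hello, world', 'bye']
```

If the data never contains quoted commas, the split-based version above works just as well.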

You can use csv, read the first line to get the header, create the number of lists corresponding to the header and then create the dict:
import csv
with open(ur_csv) as fin:
    reader=csv.reader(fin, quotechar="'", skipinitialspace=True)
    header=[[head] for head in next(reader)]
    for row in reader:
        for i, e in enumerate(row):
            header[i].append(e)
data={l[0]:l[1:] for l in header}
print(data)
# {'hi': ['belly', 'black'], 'bye': ['heli', 'blue'], 'hello': ['jelly', 'red']}
If you want something more terse, you can use Jon Clements' excellent solution:
with open(ur_csv) as fin:
    csvin = csv.reader(fin, quotechar="'", skipinitialspace=True)
    header = next(csvin, [])
    data=dict(zip(header, zip(*csvin)))
# {'bye': ('heli', 'blue'), 'hello': ('jelly', 'red'), 'hi': ('belly', 'black')}
But that will produce a dictionary of tuples rather than lists, if that matters.
And if your csv file is huge, you may want to rewrite this to generate one dictionary per row (similar to DictReader):
import csv
def key_gen(fn):
    with open(fn) as fin:
        reader=csv.reader(fin, quotechar="'", skipinitialspace=True)
        header=next(reader, [])
        for row in reader:
            yield dict(zip(header, row))
for e in key_gen(ur_csv):
    print(e)
# {'hi': 'belly', 'bye': 'heli', 'hello': 'jelly'}
# {'hi': 'black', 'bye': 'blue', 'hello': 'red'} etc...

Related

How can I read from this file and turn it to a dictionary?

I've tried several methods to read this file and turn it into a dictionary, but I keep running into errors.
The following attempt did not work; it fails with "not enough values to unpack".
d = {}
with open("file.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[int(key)] = val
I want to read and convert it to this:
{123: ['Ahmed Rashed', 'a', '1000.0'], 456: ['Noof Khaled', 'c', '0.0'], 777: ['Ali Mahmood', 'a', '4500.0']}
Split on commas instead.
d = {}
with open("file.txt") as f:
    for line in f:
        parts = line.rstrip('\n').split(',')
        d[int(parts[0])] = parts[1:]
Using csv.reader to read the file and split it into its fields:
import csv
with open("file.txt") as f:
    d = {
        int(num): data
        for num, *data in csv.reader(f)
    }
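The contents of file.txt are not shown in the question, so the following is only a sketch under the assumption that each line is comma-separated and that the file may contain blank lines (one possible cause of the unpacking error). The sample text is reconstructed from the desired output and is not the asker's real file.

```python
import csv
import io

# Assumed file content, reconstructed from the desired output; the blank
# line is included on purpose to show the guard below.
SAMPLE = (
    "123,Ahmed Rashed,a,1000.0\n"
    "\n"
    "456,Noof Khaled,c,0.0\n"
    "777,Ali Mahmood,a,4500.0\n"
)

d = {}
for row in csv.reader(io.StringIO(SAMPLE)):
    if not row:  # csv.reader yields [] for a blank line
        continue
    key, *values = row
    d[int(key)] = values

print(d)
# {123: ['Ahmed Rashed', 'a', '1000.0'], 456: ['Noof Khaled', 'c', '0.0'], 777: ['Ali Mahmood', 'a', '4500.0']}
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by the object returned from `open("file.txt", newline="")`.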

List to dictionary - improvement?

The following code does what I want, but is there a more Pythonic way of doing it?
Given a file in the format:
key1:value1,key2:value2,...
key21:value21,key22:value22,...
.
EOF
and code:
file = open(fileName, 'r')
for lines in file:
    line = lines.split(",")
    my_dict = {}
    for item in line:
        key_value = item.split(":")
        my_dict.update({key_value[0]:key_value[1]})
Thanks
A faster and more Pythonic way is to use the csv module (comma-separated by default) and split the items in a doubly nested generator expression passed to dict, which accepts an iterable of 2-element sequences:
import csv
with open("test.csv",newline="") as f: # replace ,newline="" by ,"rb" in python 2
    cr = csv.reader(f)
    d = dict(x.split(":") for row in cr for x in row)
print(d)
result:
{'key1': 'value1', 'key22': 'value22', 'key21': 'value21', 'key2': 'value2'}
non-csv version (the trailing newline is stripped so it does not end up in the last value of each line):
with open("test.csv") as f:
    d = dict(x.split(":") for line in f for x in line.rstrip("\n").split(","))
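One caveat with x.split(":") is that it breaks as soon as a value itself contains a colon, because dict then receives more than two elements for that item. A minimal sketch of a more forgiving variant, assuming made-up sample lines (the `url` entry is hypothetical and not part of the original data):

```python
# Assumed sample lines; the second one has a colon inside a value.
lines = [
    "key1:value1,key2:value2\n",
    "url:http://example.com,key22:value22\n",
]

d = dict(
    item.split(":", 1)      # split only on the first colon
    for line in lines
    for item in line.strip().split(",")
    if item                 # skip empty items, e.g. from a trailing comma
)
print(d["url"])  # http://example.com
```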
Using split():
list.txt:
key1:value1,key2:value2,key3:value3
key21:value21,key22:value22
Hence:
with open("list.txt") as fileObj:
    content = fileObj.readlines()
# you may also want to remove empty lines
content = [l.strip() for l in content if l.strip()]
for line in content:
    for elem in line.split(","):
        print({elem.split(":")[0] : elem.split(":")[1]})
OUTPUT:
{'key1': 'value1'}
{'key2': 'value2'}
{'key3': 'value3'}
{'key21': 'value21'}
{'key22': 'value22'}
OR
If you want them stored in a dict:
dict_ = {}
for line in content:
    for x in line.split(","):
        dict_.update({x.split(":")[0] : x.split(":")[1]})
print(dict_['key1']) # value1

Python unique values per column in csv file row

I've been crunching on this for a long time. Is there an easy way, using NumPy or pandas or by fixing my code, to get the unique values within each column of a row, where the values are separated by "|"?
For example, given the data:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL"
"2","john","doe","htw","2000","dev"
Output should be:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"
My broken code:
import csv
import pprint
your_list = csv.reader(open('out.csv'))
your_list = list(your_list)
#pprint.pprint(your_list)
string = "|"
cols_no=6
for line in your_list:
    i=0
    for col in line:
        if i==cols_no:
            print "\n"
            i=0
        if string in col:
            values = col.split("|")
            myset = set(values)
            items = list()
            for item in myset:
                items.append(item)
            print items
        else:
            print col+",",
        i=i+1
It outputs:
id, fname, lname, education, gradyear, attributes, 1, john, smith, ['harvard', 'ft', 'mit']
['2003', '212', '207']
['qa', 'admin,co', 'NULL', 'master']
2, john, doe, htw, 2000, dev,
Thanks in advance!
NumPy/pandas is a bit overkill for what you can achieve with csv.DictReader and csv.DictWriter together with a collections.OrderedDict, e.g.:
import csv
from collections import OrderedDict
# If using Python 2.x, use `open('output.csv', 'wb')` instead
with open('input.csv') as fin, open('output.csv', 'w', newline='') as fout:
    csvin = csv.DictReader(fin)
    csvout = csv.DictWriter(fout, fieldnames=csvin.fieldnames, quoting=csv.QUOTE_ALL)
    csvout.writeheader()
    for row in csvin:
        for k, v in row.items():
            # OrderedDict.fromkeys drops duplicates but keeps first-seen order
            row[k] = '|'.join(OrderedDict.fromkeys(v.split('|')))
        csvout.writerow(row)
Gives you:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"
If you don't care about the order when you have many items separated with |, this will work:
lst = ["id","fname","lname","education","gradyear","attributes",
       "1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL",
       "2","john","doe","htw","2000","dev"]
def no_duplicate(string):
    return "|".join(set(string.split("|")))
result = list(map(no_duplicate, lst))
print(result)
result (the order within each joined value may vary, since sets are unordered):
['id', 'fname', 'lname', 'education', 'gradyear', 'attributes', '1', 'john', 'smith', 'ft|harvard|mit', '2003|207|212', 'NULL|admin,co|master|qa', '2', 'john', 'doe', 'htw', '2000', 'dev']
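If the order does matter and you are on Python 3.7 or later, a plain dict can play the role of the ordered set, since dict keys are unique and keep insertion order. A minimal sketch (the helper name `dedupe_pipe` is made up for this example):

```python
def dedupe_pipe(value):
    """Drop repeated '|'-separated items, keeping first-seen order."""
    # dict keys are unique and keep insertion order (guaranteed from Python 3.7).
    return "|".join(dict.fromkeys(value.split("|")))


print(dedupe_pipe("mit|harvard|ft|ft|ft"))          # mit|harvard|ft
print(dedupe_pipe("qa|admin,co|master|NULL|NULL"))  # qa|admin,co|master|NULL
print(dedupe_pipe("htw"))                           # htw
```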

python csv to dictionary columnwise

Is it possible to read data from a csv file into a dictionary, such that the first row of a column is the key and the remaining rows of that same column constitute the value as a list?
E.g. I have a csv file
strings, numbers, colors
string1, 1, blue
string2, 2, red
string3, 3, green
string4, 4, yellow
using
with open(file,'rU') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print row
I obtain
{'color': 'blue', 'string': 'string1', 'number': '1'}
{'color': 'red', 'string': 'string2', 'number': '2'}
{'color': 'green', 'string': 'string3', 'number': '3'}
{'color': 'yellow', 'string': 'string4', 'number': '4'}
or using
with open(file,'rU') as f:
    reader = csv.reader(f)
    mydict = {rows[0]:rows[1:] for rows in reader}
print(mydict)
I obtain the following dictionary
{'string3': ['3', 'green'], 'string4': ['4', 'yellow'], 'string2': ['2', 'red'], 'string': ['number', 'color'], 'string1': ['1', 'blue']}
However, I would like to obtain
{'strings': ['string1', 'string2', 'string3', 'string4'], 'numbers': [1, 2, 3,4], 'colors': ['red', 'blue', 'green', 'yellow']}
You need to parse the first row, create the columns, and then progress to the rest of the rows.
For example:
import csv
columns = []
# the 'rU' mode was removed in Python 3.11; newline='' is what the csv docs recommend
with open(file, newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        if columns:
            for i, value in enumerate(row):
                columns[i].append(value)
        else:
            # first row
            columns = [[value] for value in row]
# you now have a column-major 2D array of your file.
as_dict = {c[0] : c[1:] for c in columns}
print(as_dict)
output:
{
' numbers': [' 1', ' 2', ' 3', ' 4'],
' colors ': [' blue', ' red', ' green', ' yellow'],
'strings': ['string1', 'string2', 'string3', 'string4']
}
(The stray spaces come from your input file. Remove the spaces before and after the commas, or use value.strip() if they are present in your real input.)
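As a rough sketch of how the stray spaces could be handled while reading, and how the numeric column could be turned into integers as the question asks: the sample text below is copied from the question, while the "convert anything made only of digits" rule is an assumption that may not suit every file.

```python
import csv
import io

# Sample data from the question, including the spaces after the commas.
SAMPLE = (
    "strings, numbers, colors\n"
    "string1, 1, blue\n"
    "string2, 2, red\n"
    "string3, 3, green\n"
    "string4, 4, yellow\n"
)

# skipinitialspace=True drops the whitespace that follows each delimiter.
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
header = [name.strip() for name in next(reader)]
columns = {name: [] for name in header}
for row in reader:
    for name, value in zip(header, row):
        value = value.strip()
        # Assumption: purely numeric fields should become ints.
        columns[name].append(int(value) if value.isdigit() else value)

print(columns)
# {'strings': ['string1', 'string2', 'string3', 'string4'], 'numbers': [1, 2, 3, 4], 'colors': ['blue', 'red', 'green', 'yellow']}
```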
This is why we have defaultdict:
from collections import defaultdict
from csv import DictReader
columnwise_table = defaultdict(list)
# the 'rU' mode was removed in Python 3.11; newline='' is what the csv docs recommend
with open(file, newline='') as f:
    reader = DictReader(f)
    for row in reader:
        for col, dat in row.items():
            columnwise_table[col].append(dat)
print(columnwise_table)
Yes, it is possible. Try it this way:
import csv
from collections import defaultdict
D=defaultdict(list)
csvfile=open('filename.csv')
reader= csv.DictReader(csvfile) # DictReader uses the first row as dictionary keys
for l in reader: # each row is in the form {k1 : v1, ... kn : vn}
    for k,v in l.items():
        D[k].append(v)
...................
...................
Assuming filename.csv has some data like
strings,numbers,colors
string1,1,blue
string2,2,red
string3,3,green
string4,4,yellow
then D will result in
defaultdict(<class 'list'>,
{'numbers': ['1', '2', '3', '4'],
'strings': ['string1', 'string2', 'string3', 'string4'],
'colors': ['blue', 'red', 'green', 'yellow']})

Write dictionary (keys and values) to a csv file

I just want the csv file to look like this:
key,item1,item2,item3
key2,itema,itemB,itemC
and so on
The dictionary has a key and the value is a list of floats.
This is the current code I have to write to the csv file but all it does is write out the key like this: k,e,y,s
Any help is appreciated
with open(outFileName1, 'w') as outfile:
    csv_Writer = csv.writer(outfile)
    csv_Writer.writerows(dict1)
import csv
dict_data = {'key1': [1, 2, 3], 'key2': [4, 5, 6]}
with open("dict2csv.txt", 'w') as outfile:
    csv_writer = csv.writer(outfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for k,v in dict_data.items():
        csv_writer.writerow([k] + v)
This code writes each key and its values in your desired format, one pair per line of the CSV file.
Without getting into the details of how CSV works, you can easily solve it with something like:
with open("out.txt", 'w') as outfile:
    for k,v in dict1.items():
        outfile.write(str(k))
        for item in v:
            outfile.write(","+str(item))
        outfile.write("\n")
Your current code iterates over the dictionary, which yields only the keys. Take a look at:
import csv
data = {
    'key1': ['item1', 'item2'],
    'key2': ['item3', 'item4']
}
with open('', 'w') as outfile:
    writer = csv.writer(outfile)
    for k, v in data.items():
        writer.writerow([k] + v)
Notice that it iterates over the key-value pairs returned by .items() (.iteritems() is the Python 2-only equivalent). The key is put into a list, which is concatenated with the value list.
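To see both behaviours side by side without touching the disk, here is a small sketch that writes into in-memory buffers. The sample dictionary is made up, and `io.StringIO` stands in for a file opened with `open(path, 'w', newline='')`.

```python
import csv
import io

# Assumed sample data: each key maps to a list of floats, as in the question.
data = {'key1': [1.5, 2.0, 3.25], 'key2': [4.0, 5.5, 6.0]}

# What the question's code does: iterating a dict yields its keys, and each
# key string is then treated as a row of single characters.
broken = io.StringIO()
csv.writer(broken, lineterminator="\n").writerows(data)
print(broken.getvalue())
# k,e,y,1
# k,e,y,2

# One row per key: the key followed by its values.
fixed = io.StringIO()
writer = csv.writer(fixed, lineterminator="\n")
for key, values in data.items():
    writer.writerow([key, *values])
print(fixed.getvalue())
# key1,1.5,2.0,3.25
# key2,4.0,5.5,6.0
```

The explicit `lineterminator="\n"` is only there to keep the printed output tidy; the csv module's default is "\r\n".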
