For the following CSV File:
A,B,C
-----
A1,B1,C1
A1,B2,C2
A2,B3,C3
A2,B4,C4
My dictionary currently looks like this:
{'A1': {'B':'B1', 'C':'C1'}, 'A2': {'B':'B3', 'C':'C3'}}
How do I get my dictionary to look like this:
{'A1': {'B': ['B1', 'B2'], 'C': ['C1', 'C2']}, 'A2': {'B': ['B3', 'B4'], 'C': ['C3', 'C4']}}
I'm using the following code at the moment:
import csv
reader = csv.DictReader(open('test.csv'))
result = {}
for row in reader:
    key = row.pop('A')
    if key in result: pass
    result[key] = row
print result
You need to create a base case for each key, such that the dictionary inserts the first value as a list. Then you can append values for duplicate keys as they are encountered.
The following code should do what you need:
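import csv

result = {}
with open('test.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        key = row.pop('A')
        if '-' in key:
            # skip the "-----" separator line
            continue
        if key not in result:
            # base case: the first value for this key starts the lists
            new_row = {'B': [row.pop('B')], 'C': [row.pop('C')]}
            result[key] = new_row
        else:
            # duplicate key: append to the existing lists
            result[key]['B'].append(row.pop('B'))
            result[key]['C'].append(row.pop('C'))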
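If you would rather not hard-code the 'B' and 'C' columns, the same base-case idea can be sketched with collections.defaultdict, which creates the empty list the first time a key is seen. This is only a sketch: the SAMPLE string stands in for test.csv, and group_rows and its key_column parameter are names made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Stand-in for test.csv, including the "-----" separator line (assumed layout).
SAMPLE = "A,B,C\n-----\nA1,B1,C1\nA1,B2,C2\nA2,B3,C3\nA2,B4,C4\n"


def group_rows(csvfile, key_column="A"):
    """Group every non-key column into lists, keyed by the key column."""
    # A missing key gets a fresh defaultdict(list), so no explicit base case is needed.
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(csvfile):
        key = row.pop(key_column)
        if "-" in key:
            # Skip the separator line, as in the answer above.
            continue
        for column, value in row.items():
            grouped[key][column].append(value)
    # Convert back to plain dicts for printing and comparison.
    return {key: dict(columns) for key, columns in grouped.items()}


result = group_rows(io.StringIO(SAMPLE))
print(result)
```

Like the original check, this skips any row whose key contains a hyphen, so it is not suitable if real keys can contain one.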
You don't have to use DictReader to achieve this. You can use a regular csv.reader and fill in your own dictionary.
Here is a simple, commented solution:
from __future__ import print_function
import csv

# readcsv.py
csv_fpath = 'test.csv'

# You want this:
# {'A1': {'B':['B1','B2'], 'C':['C1','C2']}, 'A2': {'B':['B3','B4'], ..}}
mydict = {}

# On Python 3.x the csv.reader documentation recommends also passing newline=''
with open(csv_fpath, mode='r') as csvfile:
    # A regular csv reader object
    myreader = csv.reader(csvfile, delimiter=',')
    # Header on first line
    hrow = next(myreader)
    # Tagging header names for dictionary keys later
    taga, tagb, tagc = hrow[0], hrow[1], hrow[2]
    # Skip separator line (delete this line if unnecessary)
    next(myreader)
    # Reading data and constructing our dictionary
    for row in myreader:
        if len(row) == 0:
            # ignore blank lines
            continue
        # Each row's key is the first column value
        key = row[0]
        if key in mydict:
            # If an item exists with the given key, that item itself is also a
            # dictionary with lists in keys tagb and tagc. So we append to those
            # lists the values in second and third columns
            mydict[key][tagb].append(row[1])
            mydict[key][tagc].append(row[2])
        else:
            # Note the list constructors, they are important as we are going to
            # append to them later in the iteration
            mydict[key] = {tagb: [row[1]],
                           tagc: [row[2]]}

print(mydict)
Slightly different approach:
import csv

reader = csv.DictReader(open("test.csv"))
result = {}
for row in reader:
    if reader.line_num <= 2:
        # line 1 is the header, line 2 is the "-----" separator
        continue
    key = row["A"]
    for subkey in [k for k in row.keys() if k != "A"]:
        if key not in result:
            result[key] = {}
        if subkey not in result[key]:
            result[key][subkey] = []
        result[key][subkey].append(row[subkey])

>>> print(result)
{'A2': {'C': ['C3', 'C4'], 'B': ['B3', 'B4']}, 'A1': {'C': ['C1', 'C2'], 'B': ['B1', 'B2']}}
Related
I am trying to create a dictionary of dictionaries in Python from a CSV file, the file looks something like this:
Column 1    Column 2    Column 3
A           flower      12
A           sun         13
B           cloud       14
B           water       34
C           rock        12
And I am trying to get a dictionary of dictionaries that looks like this:
dict = {
'A': {'flower': 12, 'sun': 13},
'B': {'cloud': 14, 'water': 34},
'C': {'rock': 12}
}
The code I tried so far is as follows:
import csv
with open('file.csv', 'r') as csvFile:
    rows=csv.reader(csvFile)
    d=dict()
    for row in rows:
        head,tail=row[0], row[1:]
        d[head]=dict(zip(tail[0:], tail[1:]))
print(d)
but it's not working well as I am getting this result:
dict = {
'A': {'sun': 13},
'B': {'water': 34},
'C': {'rock': 12}
}
You need to update your d[head] every iteration, not replace it:
import csv
with open('file.csv', 'r') as csvFile:
    rows=csv.reader(csvFile)
    d=dict()
    for row in rows:
        head,name,value=row[0], row[1], row[2]
        if head not in d:
            d[head]= {} # {} is like dict() but faster
        d[head][name] = value
print(d)
Or with defaultdict to be more concise:
import csv
from collections import defaultdict
with open('file.csv', 'r') as csvFile:
    rows=csv.reader(csvFile)
    d = defaultdict(dict)
    for row in rows:
        head,name,value=row[0], row[1], row[2]
        d[head][name] = value
print(d) # or print(dict(d))
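One thing to watch: csv.reader yields every field as a string, so the versions above give '12' rather than the integer 12 shown in the desired output. A minimal sketch of the conversion follows, assuming the file starts with a header row and the third column is always a whole number; SAMPLE, nested_from_csv and has_header are illustrative names, not part of the original code.

```python
import csv
import io
from collections import defaultdict

# Stand-in for file.csv; the header row is an assumption about the real file.
SAMPLE = (
    "Column 1,Column 2,Column 3\n"
    "A,flower,12\nA,sun,13\nB,cloud,14\nB,water,34\nC,rock,12\n"
)


def nested_from_csv(csvfile, has_header=True):
    rows = csv.reader(csvfile)
    if has_header:
        next(rows, None)  # discard the header row if there is one
    d = defaultdict(dict)
    for head, name, value in rows:
        # csv gives strings, so convert the number explicitly
        d[head][name] = int(value)
    return dict(d)


nested = nested_from_csv(io.StringIO(SAMPLE))
print(nested)
```

The three-way unpacking raises ValueError on blank or short lines, so add a guard if the real file may contain them.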
So my question is this. I have these JSON files stored in a list called json_list
['9.json',
'8.json',
'7.json',
'6.json',
'5.json',
'4.json',
'3.json',
'2.json',
'10.json',
'1.json',]
Each of these files contains a dictionary with an (ID NUMBER: Rating).
This is my code below. The idea is to store all of the keys and values of these files in a dictionary so it will be easier to search through. I've separated the keys and values so it will be easier to add them to the dictionary. The PROBLEM is that this iteration only goes through the file '1.json' and then stops. I'm not sure why it's not going through all 10.
for i in range(len(json_list)):
    f = open(os.path.join("data", json_list[i]), encoding = 'utf-8')
    file = f.read()
    f.close()
    data = json.loads(file)
    keys = data.keys()
    values = data.values()
Here:
data = json.loads(file)
keys = data.keys()
values = data.values()
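You're rebinding keys and values on every iteration instead of accumulating them, so after the loop they hold only the contents of the last file in the list, '1.json'.
Try appending instead, with keys and values initialized to empty lists before the loop (the keys MUST be unique across the files, or you'll overwrite data once they go into one dictionary):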
data = json.loads(file)
keys += list(data.keys())
values += list(data.values())
Or better yet, merge each file's dictionary into a single one (again, the keys MUST be unique across the files or else you'll be overwriting data):
all_data = {}
for i in range(len(json_list)):
    f = open(os.path.join("data", json_list[i]), encoding = 'utf-8')
    file = f.read()
    f.close()
    data = json.loads(file)
    all_data = {**all_data, **data}
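The same merge can be sketched with a with-block and dict.update, which avoids the manual close() and iterates over the names directly. This is a sketch under assumptions: load_all is a hypothetical helper, and the two throwaway files and their contents exist only to make the example runnable.

```python
import json
import os
import tempfile


def load_all(directory, filenames):
    """Merge the top-level dictionaries of several JSON files into one dict."""
    all_data = {}
    for name in filenames:
        # The with-block closes the file even if parsing fails.
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            # update() merges in place; later files win on duplicate keys
            all_data.update(json.load(f))
    return all_data


# Demonstration with two temporary files (made-up IDs and ratings).
with tempfile.TemporaryDirectory() as tmp:
    for name, content in [("1.json", {"101": 4.5}), ("2.json", {"102": 3.0})]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            json.dump(content, f)
    merged = load_all(tmp, ["1.json", "2.json"])

print(merged)
```

With the original layout you would call something like load_all("data", json_list).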
Working example:
import json
ds = ['{"1":"a","2":"b","3":"c"}','{"aa":"11","bb":"22","cc":"33", "dd":"44"}','{"foo":"bar","eggs":"spam","xxx":"yyy"}']
all_data = {}
for d in ds:
    data = json.loads(d)
    all_data = {**all_data, **data}
print (all_data)
Output:
{'1': 'a', '2': 'b', '3': 'c', 'aa': '11', 'bb': '22', 'cc': '33', 'dd': '44', 'foo': 'bar', 'eggs': 'spam', 'xxx': 'yyy'}
If the keys are not unique try appending the dictionaries to a list of dictionaries like this:
import json
ds = ['{"1":"a","2":"b","3":"c"}','{"aa":"11","bb":"22","cc":"33", "dd":"44"}','{"dd":"bar","eggs":"spam","xxx":"yyy"}']
all_dicts= []
for d in ds:
    data = json.loads(d)
    all_dicts.append(data)
print (all_dicts)
# to access key
print (all_dicts[0]["1"])
Output:
[{'1': 'a', '2': 'b', '3': 'c'}, {'aa': '11', 'bb': '22', 'cc': '33', 'dd': '44'}, {'dd': 'bar', 'eggs': 'spam', 'xxx': 'yyy'}]
a
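If you still want a single lookup table when keys repeat, another option is to collect every value seen for a key into a list. A minimal sketch, using shortened sample strings in place of the files; values_by_key is just an illustrative name.

```python
import json
from collections import defaultdict

# Shortened sample documents; "dd" appears in both on purpose.
ds = ['{"1":"a","dd":"44"}', '{"dd":"bar","eggs":"spam"}']

values_by_key = defaultdict(list)
for d in ds:
    for key, value in json.loads(d).items():
        # Nothing is overwritten: each key keeps all of its values in order.
        values_by_key[key].append(value)

values_by_key = dict(values_by_key)
print(values_by_key)
```

The trade-off is that every value becomes a list, even for keys that occur only once.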
I was wondering how I could iterate over two dictionaries: yin and BL.
I have the following code so far to iterate over yin only:
with open('output.csv', 'wb') as output:
    writer = csv.writer(output)
    for key, value in yin.iteritems():
        writer.writerow([key, value])
yin has values in a dictionary:
{'a': 2248433.0, 'b': 280955.0, 'c': 0.0}
BL has values in a dictionary:
{'a': 27.2, 'b': 57.6, 'c': 0.0}
I want to save it to an excel file so it looks like:
a 2248433.0 27.2
b 280955.0 57.6
c 0.0 0.0
Should I do the following?
with open('output.csv', 'wb') as output:
    writer = csv.writer(output)
    for key, value, valye in yin.iteritems(), BL.iteritems:
        writer.writerow([key, value, value])
I also want the dictionaries to be listed in the same corresponding order in the CSV file. As shown in the table, I want row1: 2248433.0 to correspond to 27.2.
This was the code used to generate dictionaries:
yin = {}
BL = {}
for asdf in glob.glob(ay):
    poregn = numpy.genfromtxt(asdf)
    btwnROIs = poregn[2:size+2, 0:size]
    BLu = poregn[(size*5)+2:(size*5)+size+2, 0:size]
    for upmatSC in (list(combinations(range(size_FC),2))):
        yin[FC_path1 + '_' + FC_path2 + '_' + str(upmatSC)] = btwnROIs[tuple(upmatSC)]
        BL[FC_path1 + '_' + FC_path2 + '_' + str(upmatSC)] = BLu[tuple(upmatSC)]
To explain the code: basically I'm taking two separate matrices and extracting the upper half of each of the matrix and storing these values in two separate dictionaries.
If you have two dictionaries yin and bl, this would be how you would combine the dictionaries in the manner you described and write them to a CSV file:
import csv
yin = {'a': 2248433.0, 'b': 280955.0, 'c': 0.0}
bl = {'a': 27.2, 'b': 57.6, 'c': 0.0}
with open('output.csv', 'w') as output:
    cw = csv.writer(output)
    for k in yin.keys():
        cw.writerow([k, yin[k], bl[k]])
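Because each row looks up both dictionaries by the same key, the two values always correspond. If you also want a predictable row order, or the dictionaries might not share exactly the same keys, something like the following sketch could help. It writes to an in-memory buffer instead of output.csv so it can be run as is; write_pairs is a hypothetical helper name.

```python
import csv
import io

yin = {'a': 2248433.0, 'b': 280955.0, 'c': 0.0}
bl = {'a': 27.2, 'b': 57.6, 'c': 0.0}


def write_pairs(output, first, second):
    writer = csv.writer(output)
    # Sorting the union of keys fixes the row order regardless of insertion order.
    for key in sorted(set(first) | set(second)):
        # .get returns None for a missing key, which csv writes as an empty field.
        writer.writerow([key, first.get(key), second.get(key)])


buffer = io.StringIO()
write_pairs(buffer, yin, bl)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file on Python 3, pass open('output.csv', 'w', newline='') instead of the buffer.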
So I have a CSV file with the data arranged like this:
X,a,1,b,2,c,3
Y,a,1,b,2,c,3,d,4
Z,l,2,m,3
I want to import the CSV to create a nested dictionary so that looks like this.
data = {'X' : {'a' : 1, 'b' : 2, 'c' : 3},
        'Y' : {'a' : 1, 'b' : 2, 'c' : 3, 'd' : 4},
        'Z' : {'l' : 2, 'm' :3}}
After updating the dictionary in the program I wrote (I got that part figured out), I want to be able to export the dictionary onto the same CSV file, overwriting/updating it. However I want it to be in the same format as the previous CSV file so that I can import it again.
I have been playing around with the import and have this so far
import csv
data = {}
with open('userdata.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        data[row[0]] = {row[i] for i in range(1, len(row))}
But this doesn't work as things are not arranged correctly. Some numbers are subkeys to other numbers, letters are out of place, etc. I haven't even gotten to the export part yet. Any ideas?
Since you're not interested in preserving order, something relatively simple should work:
import csv

# import
data = {}
with open('userdata.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        a = iter(row[1:])
        data[row[0]] = dict(zip(a, a))

# export
with open('userdata_exported.csv', 'w') as f:
    writer = csv.writer(f)
    for key, values in data.items():
        row = [key] + [value for item in values.items() for value in item]
        writer.writerow(row)
The latter could be done a little more efficiently by making only a single call to the csv.writer's writerows() method and passing it a generator expression.
# export2
with open('userdata_exported.csv', 'w') as f:
    writer = csv.writer(f)
    rows = ([key] + [value for item in values.items() for value in item]
            for key, values in data.items())
    writer.writerows(rows)
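To check that the exported format really can be imported again, here is a small round-trip sketch using in-memory buffers in place of userdata.csv. The import_data and export_data functions are hypothetical wrappers around the code above, and SAMPLE mirrors the data in the question.

```python
import csv
import io

# Stand-in for userdata.csv, using the rows from the question.
SAMPLE = "X,a,1,b,2,c,3\nY,a,1,b,2,c,3,d,4\nZ,l,2,m,3\n"


def import_data(csvfile):
    data = {}
    for row in csv.reader(csvfile):
        if not row:
            continue  # ignore blank lines
        pairs = iter(row[1:])
        # zip(pairs, pairs) pulls two items at a time from the same iterator
        data[row[0]] = dict(zip(pairs, pairs))
    return data


def export_data(data, csvfile):
    writer = csv.writer(csvfile)
    for key, values in data.items():
        # Flatten the (subkey, value) pairs back into one row.
        writer.writerow([key] + [field for item in values.items() for field in item])


data = import_data(io.StringIO(SAMPLE))
exported = io.StringIO()
export_data(data, exported)
round_tripped = import_data(io.StringIO(exported.getvalue()))
print(round_tripped == data)
```

Note that the values stay strings here; convert them with int() on import if you need numbers.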
You can use the grouper recipe from itertools:
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    # Python 2 name; on Python 3 this function is itertools.zip_longest
    return itertools.izip_longest(fillvalue=fillvalue, *args)
This will group your data into the a1/b2/c3 pairs you want. So you can do data[row[0]] = {k: v for k, v in grouper(row[1:], 2)} in your loop.
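For reference, a Python 3 sketch of the same recipe, applied to one row from the question. The int() conversion is an addition here and assumes every value is a whole number.

```python
import itertools


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one with fillvalue."""
    # n references to the SAME iterator, so each output tuple consumes n items
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)


row = ['Y', 'a', '1', 'b', '2', 'c', '3', 'd', '4']
entry = {k: int(v) for k, v in grouper(row[1:], 2)}
print(entry)
```

If a row has an odd number of trailing fields, the last value is padded with None and int() will raise, which is probably what you want for malformed data.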
from collections import defaultdict

data_lines = """X,a,1,b,2,c,3
Y,a,1,b,2,c,3,d,4
Z,l,2,m,3""".splitlines()

data = defaultdict(dict)
for line in data_lines:
    # you should probably add guards against invalid data, empty lines etc.
    main_key, sep, tail = line.partition(',')
    items = [item.strip() for item in tail.split(',')]
    items = zip(items[::2], map(int, items[1::2]))
    # data[main_key] = {key : value for key, value in items}
    data[main_key] = dict(items)

print(dict(data))
# {'Y': {'a': 1, 'c': 3, 'b': 2, 'd': 4},
#  'X': {'a': 1, 'c': 3, 'b': 2},
#  'Z': {'m': 3, 'l': 2}
# }
I'm lazy, so I might do something like this:
import csv

data = {}
with open('userdata.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        data[row[0]] = dict(zip(row[1::2], map(int,row[2::2])))
which works because row[1::2] gives every other element starting at 1, and row[2::2] gives every other element starting at 2. zip pairs those elements up as tuples, and then we pass that to dict. This gives
{'Y': {'a': 1, 'c': 3, 'b': 2, 'd': 4},
'X': {'a': 1, 'c': 3, 'b': 2},
'Z': {'m': 3, 'l': 2}}
(Note that I changed your open to use 'rb', which is right for Python 2: if you're using 3, you want 'r', newline='' instead.)
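To see the slicing step on its own, here is a tiny sketch with one row written out by hand (the row is the first line of the sample data; the variable names are only for illustration):

```python
row = ['X', 'a', '1', 'b', '2', 'c', '3']

keys = row[1::2]    # every other element starting at index 1
values = row[2::2]  # every other element starting at index 2

pairs = list(zip(keys, values))
entry = dict(zip(keys, map(int, values)))

print(keys, values)
print(pairs)
print(entry)
```

If a row ends with a key that has no value, zip silently drops that key, so validate the row length first if that matters.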
I have a CSV file which I am opening through this code:
open(file,"r")
When I read the file I get the output:
['hello', 'hi', 'bye']
['jelly', 'belly', 'heli']
['red', 'black', 'blue']
I want the output to be something like this:
{'hello':['jelly','red'], 'hi':['belly','black'], 'bye':['heli','blue']}
but I have no idea how.
You can use collections.defaultdict and csv.DictReader:
>>> import csv
>>> from collections import defaultdict
>>> with open('abc.csv') as f:
...     reader = csv.DictReader(f)
...     d = defaultdict(list)
...     for row in reader:
...         for k, v in row.items():
...             d[k].append(v)
...
>>> d
defaultdict(<type 'list'>,
            {'hi': ['belly', 'black'],
             'bye': ['heli', 'blue'],
             'hello': ['jelly', 'red']})
csv = [
    ['hello', 'hi', 'bye'],
    ['jelly', 'belly', 'heli'],
    ['red', 'black', 'blue'],
]
csv = zip(*csv)
result = {}
for row in csv:
    result[row[0]] = row[1:]
yourHash = {}
with open(yourFile, 'r') as inFile:
    for line in inFile:
        line = line.rstrip().split(',')
        yourHash[line[0]] = line[1:]
This assumes that each key is unique to one line. If not, this would have to be modified to:
yourHash = {}
with open(yourFile, 'r') as inFile:
    for line in inFile:
        line = line.rstrip().split(',')
        if line[0] in yourHash:
            yourHash[line[0]] += line[1:]
        else:
            yourHash[line[0]] = line[1:]
Of course, you can use csv, but I figured that someone would definitely post that, so I gave an alternative way to do it. Good luck!
You can use csv, read the first line to get the header, create the number of lists corresponding to the header and then create the dict:
import csv

with open(ur_csv) as fin:
    reader=csv.reader(fin, quotechar="'", skipinitialspace=True)
    header=[[head] for head in next(reader)]
    for row in reader:
        for i, e in enumerate(row):
            header[i].append(e)

data={l[0]:l[1:] for l in header}
print(data)
# {'hi': ['belly', 'black'], 'bye': ['heli', 'blue'], 'hello': ['jelly', 'red']}
If you want something more terse, you can use Jon Clements' excellent solution:
with open(ur_csv) as fin:
    csvin = csv.reader(fin, quotechar="'", skipinitialspace=True)
    header = next(csvin, [])
    data=dict(zip(header, zip(*csvin)))
# {'bye': ('heli', 'blue'), 'hello': ('jelly', 'red'), 'hi': ('belly', 'black')}
But that will produce a dictionary of tuples if that matters...
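If you do need lists, one way to adapt the terse version is to wrap each transposed column in list(). A minimal sketch, reading from an in-memory sample instead of a file; columns_as_lists is a made-up name.

```python
import csv
import io

# Stand-in for the CSV file from the question.
SAMPLE = "hello,hi,bye\njelly,belly,heli\nred,black,blue\n"


def columns_as_lists(csvfile):
    reader = csv.reader(csvfile)
    header = next(reader, [])
    # zip(*reader) transposes the remaining rows into columns (as tuples);
    # list() turns each column tuple into a list.
    return {name: list(column) for name, column in zip(header, zip(*reader))}


data = columns_as_lists(io.StringIO(SAMPLE))
print(data)
```

Be aware that with a header but no data rows, zip(*reader) is empty and the result is an empty dict rather than header names mapped to empty lists.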
And if your CSV file is huge, you may want to rewrite this to generate a dictionary row by row (similar to DictReader):
import csv

def key_gen(fn):
    with open(fn) as fin:
        reader=csv.reader(fin, quotechar="'", skipinitialspace=True)
        header=next(reader, [])
        for row in reader:
            yield dict(zip(header, row))

for e in key_gen(ur_csv):
    print(e)
# {'hi': 'belly', 'bye': 'heli', 'hello': 'jelly'}
# {'hi': 'black', 'bye': 'blue', 'hello': 'red'} etc...