How can I loop through this dictionary instead of hardcoding the keys - python

So far, I have this code (from cs50/pset6/DNA):
import csv
from sys import argv

data_dict = {}
with open(argv[1]) as data_file:
    reader = csv.DictReader(data_file)
    for record in reader:
        # `record` is a dictionary of column-name & value
        name = record["name"]
        data = {
            "AGATC": record["AGATC"],
            "AATG": record["AATG"],
            "TATC": record["TATC"],
        }
        data_dict[name] = data
print(data_dict)
Output
{'Alice': {'AATG': '8', 'AGATC': '2', 'TATC': '3'},
'Bob': {'AATG': '1', 'AGATC': '4', 'TATC': '5'},
'Charlie': {'AATG': '2', 'AGATC': '3', 'TATC': '5'}}
Here is the csv file:
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
But my goal is to achieve exactly the same result without hardcoding the keys AGATC, AATG, and so on. Since I'll be using a much bigger database that contains more values, I want to be able to loop through the data instead of doing this:
data = {
    "AGATC": record["AGATC"],
    "AATG": record["AATG"],
    "TATC": record["TATC"],
}
Could you please help me? Thanks

You could also try using pandas.
Using your example data as a .csv file:
pandas.read_csv('example.csv', index_col = 0).transpose().to_dict()
Outputs:
{'Alice': {'AGATC': 2, 'AATG': 8, 'TATC': 3},
'Bob': {'AGATC': 4, 'AATG': 1, 'TATC': 5},
'Charlie': {'AGATC': 3, 'AATG': 2, 'TATC': 5}}
index_col=0 because you have a name column, which I set as the index (so it later becomes the top-level keys in the dictionary)
.transpose() so the top-level keys are the names and not the features (AGATC, AATG, etc.)
.to_dict() to transform the pandas.DataFrame into a Python dictionary
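Put together with the import, a minimal self-contained version (assuming the data is saved as example.csv) might look like:
import pandas as pd

# Read the csv with the name column (column 0) as the index,
# then flip rows/columns and convert to a plain dict of dicts.
data_dict = pd.read_csv('example.csv', index_col=0).transpose().to_dict()
print(data_dict)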

You can simply use pandas:
import csv
import pandas as pd
from sys import argv

data_dict = {}
with open(argv[1]) as data_file:
    reader = csv.DictReader(data_file)
    df = pd.DataFrame(reader)
    df = df.set_index('name')  # set the name column as the index
    data_dict = df.transpose().to_dict()  # transpose so the names become the top-level keys
print(data_dict)

You can loop through a dictionary in Python simply enough like this:
for key in dictionary:
    print(key, dictionary[key])
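Applied to the nested data_dict from the question, a quick sketch using .items():
for name, strs in data_dict.items():       # outer dict: name -> STR counts
    for str_name, count in strs.items():   # inner dict: STR name -> count
        print(name, str_name, count)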

You are on the right track using csv.DictReader.
import csv
from pprint import pprint

data_dict = {}
with open('fasta.csv', 'r') as f:
    reader = csv.DictReader(f)
    for record in reader:
        name = record.pop('name')
        data_dict[name] = record
pprint(data_dict)
Prints
{'Alice': {'AATG': '8', 'AGATC': '2', 'TATC': '3'},
'Bob': {'AATG': '1', 'AGATC': '4', 'TATC': '5'},
'Charlie': {'AATG': '2', 'AGATC': '3', 'TATC': '5'}}
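The values come back as strings because that is what csv gives you; if you need the counts as integers later, one option (a small variation on the loop above) is a comprehension over the remaining columns:
for record in reader:
    name = record.pop('name')
    # convert every remaining column value from str to int
    data_dict[name] = {k: int(v) for k, v in record.items()}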

Related

Write array of dictionaries to csv in Python 3?

I have been wrestling with this for a day or two now, and I can't seem to get it right.
project_index = [
    {A: ['1', '2', '3']},
    {B: ['4', '5', '6']},
    {C: ['7', '8', '9']},
    {D: ['10', '11', '12']},
    {E: ['13', '14', '15']},
    {F: ['16', '17', '18']}
]
I have tried so many different things to get this into a .csv table, but it keeps coming out in a ridiculously incorrect format, e.g. the values tiling down diagonally, or a bunch of rows of just the keys over and over, like:
A B C D E F
A B C D E F
A B C D E F
A B C D E F
Also, even if I get the values to show up, the entire array of strings shows up in one cell.
Is there any way I can get it to make each dictionary a column, with each string in the array value as its own cell in said column?
Example:
Thank you in advance!
Assuming all your keys are unique... then this (Modified Slightly):
project_index = [
    {'A': ['1', '2', '3']},
    {'B': ['4', '5', '6']},
    {'C': ['7', '8', '9']},
    {'D': ['10', '11', '12', '20']},
    {'E': ['13', '14', '15']},
    {'F': ['16', '17', '18']}
]
Should probably look like this:
project_index_dict = {}
for x in project_index:
    project_index_dict.update(x)
print(project_index_dict)
# Output:
{'A': ['1', '2', '3'],
 'B': ['4', '5', '6'],
 'C': ['7', '8', '9'],
 'D': ['10', '11', '12', '20'],
 'E': ['13', '14', '15'],
 'F': ['16', '17', '18']}
At this point, rather than re-invent the wheel... you could just use pandas.
import pandas as pd
# Work-around for uneven lengths:
df = pd.DataFrame.from_dict(project_index_dict, 'index').T.fillna('')
df.to_csv('file.csv', index=False)
Output file.csv:
A,B,C,D,E,F
1,4,7,10,13,16
2,5,8,11,14,17
3,6,9,12,15,18
,,,20,,
csv module method:
import csv
from itertools import zip_longest, chain

# Header: one column name per dictionary key, in order
header = []
for d in project_index:
    header.extend(list(d))

# Transpose the value lists into rows, padding short columns with ''
project_index_rows = [dict(zip(header, x)) for x in
                      zip_longest(*chain(list(*p.values())
                                         for p in project_index),
                                  fillvalue='')]

with open('file.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=header)
    writer.writeheader()
    writer.writerows(project_index_rows)
My solution does not use Pandas. Here is the plan:
For the header row, grab all the keys from the dictionaries
For the data rows, use zip to transpose columns -> rows
import csv

def first_key(d):
    """Return the first key in a dictionary."""
    return next(iter(d))

def first_value(d):
    """Return the first value in a dictionary."""
    return next(iter(d.values()))

with open("output.csv", "w", encoding="utf-8", newline="") as stream:
    writer = csv.writer(stream)
    # Write the header row
    writer.writerow(first_key(d) for d in project_index)
    # Write the rest
    rows = zip(*[first_value(d) for d in project_index])
    writer.writerows(rows)
Contents of output.csv:
A,B,C,D,E,F
1,4,7,10,13,16
2,5,8,11,14,17
3,6,9,12,15,18
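One caveat: zip stops at the shortest list, so if the value lists ever have different lengths (like the modified project_index with the extra '20' earlier), trailing values would be dropped. A small variation, borrowing itertools.zip_longest from the answer above and assuming the same writer and project_index, pads the short columns instead:
from itertools import zip_longest

# inside the same `with` block as above, instead of the plain zip:
rows = zip_longest(*[first_value(d) for d in project_index], fillvalue='')
writer.writerows(rows)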

How can I create a list of dictionaries from a csv file?

I want to create a "dictionary of dictionaries" for each row of the following csv file
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
So the idea is, that mydict["Alice"] should be {'AGATC': 2, 'AATG': 8, 'TATC': 3} etc.
I really do not understand the .reader and .DictReader functions sufficiently. https://docs.python.org/3/library/csv.html#csv.DictReader
I am a newbie and cannot quite understand the docs. Do you have other, 'easier' resources that you can recommend?
First, I have to get the first column, i.e. names and put them as keys. How can I access that first column?
Second, I want to create a dictionary inside that name (as the value), with the keys being AGATC,AATG,TATC. Do you understand what I mean? Is that possible?
Edit, made progress:
# Open the CSV file and read its contents into memory.
with open(argv[1]) as csvfile:
    reader = list(csv.reader(csvfile))
    # Each row read from the csv file is returned as a list of strings.

# Establish dicts.
mydict = {}
for i in range(1, len(reader)):
    print(reader[i][0])
    mydict[reader[i][0]] = reader[i][1:]
print(mydict)
Out:
{'Alice': ['2', '8', '3'], 'Bob': ['4', '1', '5'], 'Charlie': ['3', '2', '5']}
But how to implement nested dictionaries as described above?
Edit #3:
# Open the CSV file and read its contents into memory.
with open(argv[1]) as csvfile:
    reader = list(csv.reader(csvfile))
    # Each row read from the csv file is returned as a list of strings.

# Establish dicts.
mydict = {}
for i in range(1, len(reader)):
    print(reader[i][0])
    mydict[reader[i][0]] = reader[i][1:]
print(mydict)
print(len(reader))

dictlist = [dict() for x in range(1, len(reader))]
#for i in range(1, len(reader))
for i in range(1, len(reader)):
    dictlist[i-1] = dict(zip(reader[0][1:], mydict[reader[i][0]]))
    #dictionary = dict(zip(reader[0][1:], mydict[reader[1][0]]))
print(dictlist)
Out:
[{'AGATC': '2', 'AATG': '8', 'TATC': '3'}, {'AGATC': '4', 'AATG': '1', 'TATC': '5'}, {'AGATC': '3', 'AATG': '2', 'TATC': '5'}]
So I solved it for myself:)
The following code will give you what you've asked for in terms of dict structure.
import csv

with open('file.csv', newline='') as csvfile:
    mydict = {}
    reader = csv.DictReader(csvfile)
    # Iterate through each line of the csv file
    for row in reader:
        # Create the dictionary structure as desired.
        # This uses a comprehension: for each item in the row, keep the key
        # and the value except when the key is 'name' (k != 'name')
        mydict[row['name']] = {k: v for k, v in row.items() if k != 'name'}
    print(mydict)
This will give you
{
'Alice': {'AGATC': '2', 'AATG': '8', 'TATC': '3'},
'Bob': {'AGATC': '4', 'AATG': '1', 'TATC': '5'},
'Charlie': {'AGATC': '3', 'AATG': '2', 'TATC': '5'}
}
There are plenty of videos and articles covering comprehensions on the net if you need more information on these.
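As a standalone illustration, here is the same comprehension applied to one hypothetical row dict matching the csv above:
row = {'name': 'Alice', 'AGATC': '2', 'AATG': '8', 'TATC': '3'}
inner = {k: v for k, v in row.items() if k != 'name'}
print(inner)  # {'AGATC': '2', 'AATG': '8', 'TATC': '3'}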

How to create a dictionary with lists as values out of a csv file?

I have a csv file with 19 columns and want to turn it into a dictionary where the first 2 columns form the key (maybe a tuple, or just merged into one string) and the other 17 columns form a list as the value. The file looks like this: image of the csv file
I want to have a dictionary like this :
d1 = { "A , 222" : [1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1]}
d2={"B, 223" : [1,1,1,1,0,0,0,1,1,0,0,1,1,1,1,1]}
d3 = {....}
....
Here's a solution using csv.reader
from csv import reader

d = {}
with open('infile.csv', newline='') as f:
    r = reader(f)
    for row in r:
        if not row:
            continue  # Handles blank rows
        key1, key2, *value = row
        d[(key1, key2)] = value
Edit:
The line key1, key2, *value = row will only work in Python 3. If that feature is not available to you, you can use
key1, key2 = row[:2]
value = row[2:]
Do you mean this?
res = {}
with open("fileName.csv", "r") as f:
    text = f.readlines()

for line in text[1:]:
    part = line.strip().split(",")
    key = ",".join(part[:2])
    value = [int(i) for i in part[2:]]
    res[key] = value
Using csv.reader you can do that like:
Code:
as_dict = {'{}, {}'.format(*row[:2]): row[2:] for row in reader if row}
Test Code:
import csv
from io import StringIO

data = StringIO(''.join('\n'.join(x.strip() for x in u"""
A,222,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1
B,223,1,1,1,1,0,0,0,1,1,0,0,1,1,1,1,1
""".split('\n')[1:-1])))
reader = csv.reader(data)
as_dict = {'{}, {}'.format(*row[:2]): row[2:] for row in reader if row}
print(as_dict)
Results:
{
'A, 222': ['1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],
'B, 223': ['1', '1', '1', '1', '0', '0', '0', '1', '1', '0', '0', '1', '1', '1', '1', '1']
}
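Note that the values stay as strings here; if you want integers like in the d1/d2 examples from the question, one possible tweak (same reader as above) is:
# convert everything after the first two key columns to int
as_dict = {'{}, {}'.format(*row[:2]): [int(v) for v in row[2:]]
           for row in reader if row}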

How to create dict using csv file with first row as keys

I'd like to create a list of dictionaries by reading from a large csv file, using the entries from the first row as keys. For example, test.csv:
Header1, Header2, Header3
A, 1, 10
B, 2, 20
C, 3, 30
The resulting list would look like:
MyList = [{'Header1': A, 'Header2': 1, 'Header3': 10}, {'Header1': B, 'Header2': 2, 'Header3': 20}, {'Header1': C, 'Header2': 3, 'Header3': 30}]
I know how to read a file, and think maybe a defaultdict from collections might be a good way, but can't get the syntax right.
This is exactly what csv.DictReader was made for.
import csv

with open('data.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
For the data.csv containing:
Header1,Header2,Header3
A,1,10
B,2,20
C,3,30
It prints:
{'Header1': 'A', 'Header2': '1', 'Header3': '10'}
{'Header1': 'B', 'Header2': '2', 'Header3': '20'}
{'Header1': 'C', 'Header2': '3', 'Header3': '30'}
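If you want to collect the rows into the list from the question rather than just print them, a minimal sketch (the dict() call just makes each row a plain dict; the values stay strings):
import csv

with open('data.csv') as f:
    MyList = [dict(row) for row in csv.DictReader(f)]
print(MyList)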

Flatten Entity-Attribute-Value (EAV) Schema in Python

I've got a csv file in something of an entity-attribute-value format (i.e., my event_id is non-unique and repeats k times for the k associated attributes):
event_id, attribute_id, value
1, 1, a
1, 2, b
1, 3, c
2, 1, a
2, 2, b
2, 3, c
2, 4, d
Are there any handy tricks to transform a variable number of attributes (i.e., rows) into columns? The key here is that the output ought to be an m x n table of structured data, where m = max(k); filling in missing attributes with NULL would be optimal:
event_id, 1, 2, 3, 4
1, a, b, c, null
2, a, b, c, d
My plan was to (1) convert the csv to a JSON object that looks like this:
data = [{'value': 'a', 'id': '1', 'event_id': '1', 'attribute_id': '1'},
{'value': 'b', 'id': '2', 'event_id': '1', 'attribute_id': '2'},
{'value': 'a', 'id': '3', 'event_id': '2', 'attribute_id': '1'},
{'value': 'b', 'id': '4', 'event_id': '2', 'attribute_id': '2'},
{'value': 'c', 'id': '5', 'event_id': '2', 'attribute_id': '3'},
{'value': 'd', 'id': '6', 'event_id': '2', 'attribute_id': '4'}]
(2) extract unique event ids:
events = set()
for item in data:
    events.add(item['event_id'])
(3) create a list of lists, where each inner list is a list the of attributes for the corresponding parent event.
attributes = [[k['value'] for k in j] for i, j in groupby(data, key=lambda x: x['event_id'])]
(4) create a dictionary that brings events and attributes together:
event_dict = dict(zip(events, attributes))
which looks like this:
{'1': ['a', 'b'], '2': ['a', 'b', 'c', 'd']}
I'm not sure how to get all inner lists to be the same length with NULL values populated where necessary. It seems like something that needs to be done in step (3). Also, creating n lists full of m NULL values had crossed my mind, then iterate through each list and populate the value using attribute_id as the list location; but that seems janky.
Your basic idea seems right, though I would implement it as follows:
import csv

events = {}  # we're going to keep track of the events we read in
with open('path/to/input') as infile:
    reader = csv.reader(infile, skipinitialspace=True)
    next(reader)  # skip the header row
    for event, _att, val in reader:
        if event not in events:
            events[event] = []
        events[event].append(val)  # track all the values for this event

maxAtts = max(len(v) for _k, v in events.items())  # the maximum number of attributes for any event

with open('path/to/output', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["event_id"] + list(range(1, maxAtts + 1)))  # write out the header row
    for k in sorted(events):  # look at the events in sorted order
        # write the event id, all the values for that event,
        # padded with "null" for any attributes without values
        writer.writerow([k] + events[k] + ['null'] * (maxAtts - len(events[k])))
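If pandas is an option (as in some of the answers above), a pivot does the same reshaping; a sketch assuming the input file really has the header row event_id, attribute_id, value shown in the question (skipinitialspace=True takes care of the spaces after the commas):
import pandas as pd

df = pd.read_csv('path/to/input', skipinitialspace=True)
# one row per event_id, one column per attribute_id, missing cells filled with 'null'
wide = df.pivot(index='event_id', columns='attribute_id', values='value').fillna('null')
wide.to_csv('path/to/output')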
