I have a CSV file named 'salaries.csv' The content of the files is as follows:
City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
London,Lawyers,700
London,Plumbers,300
Tokyo,Doctors,900
Tokyo,Lawyers,800
Tokyo,Plumbers,400
Lawyers,Doctors,300
Lawyers,Lawyers,400
Lawyers,Plumbers,500
Hong Kong,Doctors,1800
Hong Kong,Lawyers,1100
Hong Kong,Plumbers,1000
Moscow,Doctors,300
Moscow,Lawyers,200
Moscow,Plumbers,100
Berlin,Doctors,800
Berlin,Plumbers,900
Paris,Doctors,900
Paris,Lawyers,800
Paris,Plumbers,500
Paris,Dog catchers,400
I need to print the median salary of each profession. I tried a code, which shows some error.
My code is :
from StringIO import StringIO
import sqlite3
import csv
import operator #from operator import itemgetter, attrgetter
data = open('sal.csv', 'r').read()
string = ''.join(data)
f = StringIO(string)
reader = csv.reader(f)
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table data (City text, Job text, Salary real)''')
conn.commit()
count = 0
for e in reader:
if count==0:
print ""
else:
e[0]=str(e[0])
e[1]=str(e[1])
e[2] = float(e[2])
c.execute("""insert into data values (?,?,?)""", e)
count=count+1
conn.commit()
labels = []
counts = []
count = 0
c.execute('''select count(Salary),Job from data group by Job''')
for row in c:
for i in row:
if count==0:
counts.append(i)
count=count+1
else:
count=0
labels.append(i)
c.execute('''select Salary,Job from data order by Job''')
count = 1
count1 = 1
temp = 0
pri = 0
lis = []
for row in c:
lis.append(row)
for cons in counts:
if cons%2 == 0:
pri = cons/2
else:
pri = (cons+1)/2
if count1 == 1:
for li in lis:
if count == pri:
print "Median is ",li
count = count + 1
count = 0
temp = pri+cons
else:
for li in lis:
if count == temp:
print "Median is",li
count = count+1
count = 0
temp = temp + pri
count1 = count1 + 1
However, it is showing some error:
IndentationError('expected an indented block', ('', 28, 2, 'if count==0:\n'))
How do I fix the error?
You can use defaultdict to put all the salaries for each profession then just get the median.
import csv
from collections import defaultdict
with open("C:/Users/jimenez/Desktop/a.csv","r") as f:
d = defaultdict(list)
reader = csv.reader(f)
reader.next()
for row in reader:
d[row[1]].append(float(row[2]))
for k,v in d.iteritems():
print "{} median is {}".format(k,sorted(v)[len(v) // 2])
print "{} average is {}".format(k,sum(v)/len(v))
Outputs
Plumbers median is 500.0
Plumbers average is 475.0
Lawyers median is 700.0
Lawyers average is 628.571428571
Dog catchers median is 400.0
Dog catchers average is 400.0
Doctors median is 800.0
Doctors average is 787.5
It is easy if you use pandas (http://pandas.pydata.org):
import pandas as pd
df = pd.read_csv('test.csv', names=['City', 'Job', 'Salary'])
df.groupby('Job').median()
# Salary
# Job
# Doctors 800
# Dog catchers 400
# Lawyers 700
# Plumbers 450
If you want the average instead of the median,
df.groupby('Job').mean()
# Salary
# Job
# Doctors 787.500000
# Dog catchers 400.000000
# Lawyers 628.571429
# Plumbers 475.000000
If your problem is computing he median, and not inserting everything in a SQL databas and scrambling it about,
it is a matter of just reading all lines, group all salaries in a list, and get the median from there - this reduces your hundred-line-magnitude script to:
import csv
professions = {}
with open("sal.csv") as data:
for city, profession, salary in csv.reader(data):
professions.setdefault(profession.strip(), []).append(int(salary.strip()))
for profession, salaries in sorted(professions.items()):
print ("{}: {}".format(profession, sorted(salaries)[len(salaries//2)] ))
(give or take "1" to get the proper median from the sorted salaries)
Related
I've written a python program that takes some inputs and turns them into a matplotlib graph. Specifically, it displays wealth distributions by percentile for a country of the user's choosing. However, these inputs are currently given by changing variables in the program.
I want to put this code on a website, allowing users to choose any country and see the wealth distribution for that country, as well as how they compare. Essentially, I am trying to recreate this: https://wid.world/income-comparator/
The code in python is all done but I am struggling to incorporate it into an HTML file. I was trying to use pyscript but it currently loads forever and displays nothing. Would rather not rewrite it in javascript (mainly because I don't know js). My thoughts are that it has something to do with the code importing csv files from my device?
import csv
from typing import List
import matplotlib.pyplot as plt
import collections
import math
from forex_python.converter import CurrencyRates
# ---------------- #
# whether or not the graph includes the top 1 percent in the graph (makes the rest of the graph visible!)
one_percent = False # True or False
# pick which country(ies) you want to view
country = 'China' # String
# what currency should the graph use
currency_used = 'Canada' # String
# if you want to compare an income
compare_income = True # True or False
# what income do you want to compare
income = 100000 # Int
# ---------------- #
codes = {}
# get dictionary of monetary country codes
monetary_codes = {}
with open('codes-all.csv') as csv_file:
list = csv.reader(csv_file, delimiter=',')
for row in list:
if row[5] == "":
monetary_codes[row[0]] = (row[2], row[1])
# get dictionary of country names and codes for WID
with open('WID_countries.csv') as csv_file:
WID_codes = csv.reader(csv_file, delimiter=',')
next(WID_codes)
for row in WID_codes:
if len(row[0]) == 2:
if row[2] != "":
monetary_code = monetary_codes[row[1].upper()][0]
currency_name = monetary_codes[row[1].upper()][1]
codes[row[1].upper()] = (row[0], monetary_code, currency_name)
elif row[2] == "":
codes[row[1].upper()] = (row[0], 'USD', 'United States Dollar')
elif row[0][0] == 'U' and row[0][1] == 'S':
codes[row[1].upper()] = (row[0], 'USD', 'United States Dollar')
# converts user input to upper case
country = country.upper()
currency_used = currency_used.upper()
# gets conversion rate
c = CurrencyRates()
conversion_rate = c.get_rate(codes[country][1], codes[currency_used][1])
# convert money into correct currency
def convert_money(conversion_rate, value):
return float(value) * conversion_rate
# get and clean data
def get_data(country):
aptinc = {}
# cleaning the data
with open(f'country_data/WID_data_{codes[country][0]}.csv') as csv_file:
data = csv.reader(csv_file, delimiter=';')
for row in data:
# I only care about the year 2021 and the variable 'aptinc'
if 'aptinc992' in row[1] and row[3] == '2021':
# translates percentile string into a numerical value
index = 0
for i in row[2]:
# index 0 is always 'p', so we get rid of that
if index == 0:
row[2] = row[2][1:]
# each string has a p in the middle of the numbers we care about. I also only
# care about the rows which measure a single percentile
# (upper bound - lower bound <= 1)
elif i == 'p':
lb = float(row[2][:index - 1])
ub = float(row[2][index:])
# if the top one percent is being filtered out adds another requirement
if not one_percent:
if ub - lb <= 1 and ub <= 99:
row[2] = ub
else:
row[2] = 0
else:
if ub - lb <= 1:
row[2] = ub
else: row[2] = 0
index += 1
# adds wanted, cleaned data to a dictionary. Also converts all values to one currency
if row[2] != 0:
aptinc[row[2]] = convert_money(conversion_rate, row[4])
return aptinc
# find the closest percentile to an income
def closest_percentile(income, data):
closest = math.inf
percentile = float()
for i in data:
difference = income - data[i]
if abs(difference) < closest:
closest = difference
percentile = i
return percentile
# ---------------- #
unsorted_data = {}
percentiles = []
average_income = []
# gets data for the country
data = get_data(country)
for i in data:
unsorted_data[i] = data[i]
# sorts the data
sorted = collections.OrderedDict(sorted(unsorted_data.items()))
for i in sorted:
percentiles.append(i)
average_income.append(data[i])
# makes countries pretty for printing
country = country.lower()
country = country.capitalize()
# calculates where the income places against incomes from country(ies)
blurb = ""
if compare_income:
percentile = closest_percentile(income, sorted)
blurb = f"You are richer than {round(percentile)} percent of {country}'s population"
# plot this data!
plt.plot(percentiles,average_income)
plt.title(f'{country} Average Annual Income by Percentile')
plt.xlabel(f'Percentile\n{blurb}')
plt.ylabel(f'Average Annual Income of {country}({codes[currency_used][1]})')
plt.axvline(x = 99, color = 'r', label = '99th percentile', linestyle=':')
if compare_income:
plt.axvline(x = percentile, color = 'g', label = f'{income} {codes[currency_used][2]}')
plt.legend(bbox_to_anchor = (0, 1), loc = 'upper left')
plt.show()
below is c.txt
CO11 CSE C1 8
CO12 ETC C1 8
CO13 Electrical C2 12
CO14 Mech E 5
my program needs to print a course summary on screen and save that summary into a file
named cr.txt. Given the above c.txt, your program output should look like
below. The content of course_report.txt should also be the same, except the last line. Course
names in the second column use * to indicate a compulsory course and – to indicate an elective
course. The fourth column is the number of students enrolled in that course. The fifth column is the average score of the course.
CID Name Points. Enrollment. Average.
----------------------------------
CO11 * CSE 8 2 81
CO12 * ETC 8 10 71
CO13 * Electrical 12 8 61
CO14 - Mech 5 4 51
----------------------------------
poor-performing subject is CO14 with an average 51.
cr.txt generated!
below is what I've tried:
def read(self):
ctype = []
fi = open("c.txt", "r")
l = fi.readline()
while l != "":
fields = l.strip().split(" ")
self.c.append(fields)
l = fi.readline().strip()
f.close()
# print(f"{'CID'}{'Name':>20}{'Points.':>16}{'Enrollment.':>18}{'Average.':>10}")
# print("-" * 67, end="")
print()
for i in range(0, len(self.c)):
for j in range(len(self.c[i])):
obj = self.c[i][j]
print(obj.ljust(18), end="")
print()
print("-" * 67, end="")
print()
you can try use 'file.read' or 'file.readlines' after use 'open' function, if you choose 'file.readlines' you'll have to use 'for row in file.readlines()' look my example with 'file.read':
headers = ['CID', 'Name', 'Points.', 'Enrollment.', 'Average.']
compulsory_course = ['CO11', 'CO12', 'CO13']
elective_course = ['CO14']
count = 0
with open('c.txt', 'r') as file_c:
file_c.seek(0, 0)
file_string = file_c.read().replace('\n', ' ')
fields = file_string.split(' ')
with open('cr.txt', 'w') as file_cr:
for field in headers:
file_cr.write(f'{field} ')
file_cr.write('\n')
for v in fields:
if count == 4:
file_cr.write('\n')
count = 0
count += 1
if v in compulsory_course:
file_cr.write(f'{v} * ')
continue
elif v in elective_course:
file_cr.write(f'{v} - ')
continue
elif count == 3:
file_cr.write(f' ')
continue
file_cr.write(f'{v} ')
I have a CSV, OutputA with format:
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
I am trying to get an output of a CSV which gets the total points for each team, the average points per team and the number of riders.
So output would be:
Team,Points,AvgPoints,NumOfRiders
Team1,190,95,2
Team2,95,95,1
I have this function to convert each row to a namedtuple:
fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)
def csv_to_tuple(path):
with open(path, 'r', errors='ignore') as file:
reader = csv.reader(file)
for row in map(Results._make, reader):
yield row
Then this sorts the rows into a sorted list by there club:
moutputA = sorted(list(csv_to_tuple("Male/outputA.csv")), key=lambda k: k[3])
This returns a list like:
[CategoryResults(Position='13', Category='A', Name='Marek', Team='1', Points='48'), CategoryResults(Position='7', Category='A', Name='', Team='1', Points='70')]
I am confident that this so far is right although I could be wrong.
I am trying to create a new list of teams with the points (not yet added up).
For example:
[Team 1(1,2,3,4,5)]
[Team 2 (6,9,10)]
etc.
The idea is that I can find how many unique values of points there are (this equals the number of riders). However, when trying to group the list I have this code:
Clubs = []
Club_Points = []
for Names, Club in groupby(moutputA, lambda x: x[3]):
for Teams in Names:
Clubs.append(list(Teams))
for Club, Points in groupby(moutputA, lambda x: x[4]):
for Point in Clubs:
Club_Points.append(list(Point))
print(Clubs)
but this retuns this error:
Teams.append(list(Team))
AttributeError: 'itertools._grouper' object has no attribute 'append'
If data.csv contains:
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
Then this script:
import csv
from collections import namedtuple
from itertools import groupby
from statistics import mean
fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)
def csv_to_tuple(path):
with open(path, 'r', errors='ignore') as file:
next(file) # skip header
reader = csv.reader(file)
for row in map(Results._make, reader):
yield row
moutputA = sorted(csv_to_tuple("data.csv"), key=lambda k: k.Team)
out = []
for team, group in groupby(moutputA, lambda x: x.Team):
group = list(group)
d = {}
d['Team'] = team
d['Points'] = sum(int(i.Points) for i in group)
d['AvgPoints'] = mean(int(i.Points) for i in group)
d['NumOfRider'] = len(group)
out.append(d)
with open('data_out.csv', 'w', newline='') as csvfile:
fieldnames = ['Team', 'Points', 'AvgPoints', 'NumOfRider']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
for row in out:
writer.writerow(row)
Produces data_out.csv:
Team,Points,AvgPoints,NumOfRider
Team 1,190,95,2
Team 2,95,95,1
Screenshot from LibreOffice:
Here's a start. You should be able to figure out how to get what you want from this.
import csv, io
from collections import namedtuple
from itertools import groupby
data = '''\
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
'''
b = io.StringIO(data)
next(b)
fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)
def csv_to_tuple(file):
reader = csv.reader(file)
for row in map(Results._make, reader):
yield row
rows = sorted(list(csv_to_tuple(b)), key=lambda k: k[3])
for TeamName, TeamRows in groupby(rows, lambda x: x[3]):
print(TeamName)
TeamPoints = [row.Points for row in TeamRows]
print(TeamPoints)
print()
All of this would be made easier by just using pandas. Check out the code below.
import pandas as pd
import numpy as np
df = pd.read_csv(input_path)
teams = list(set(df['Team'])) # unique list of all the teams
num_teams = len(teams)
points = np.empty(shape=num_teams)
avg_points = np.empty(shape=num_teams)
num_riders = np.empty(shape=num_teams)
for i in range(num_teams):
# find all rows where the entry in the 'Team' column
# is the same as teams[i]
req = df.loc[df['Team'] == teams[i]]
points[i] = np.sum(req['Points'])
num_riders[i] = len(req)
avg_points[i] = point[i]/num_riders[i]
dict_out = {
'Team':teams,
'Points':points,
'AvgPoints':avg_points,
'NumOfRiders':num_riders
}
df_out = pd.DataFrame(data=dict_out)
df_out.to_csv(output_path)
I am trying to write a script to generate data. I am using random package for this. I execute the script and everything works fine. But when I check through the results, I found out that the script fails to generate the last 100+ rows for some reason.
Can someone suggest me why this could be happening?
from __future__ import print_function
from faker import Faker;
import random;
## Vaue declaration
population = 3;
product = 3;
years = 3;
months = 13;
days = 30;
tax= 3.5;
## Define Column Header
Column_Names = "Population_ID",";","Product_Name",";","Product_ID",";","Year",";",
"Month",";","Day","Quantity_sold",";","Sales_Price",";","Discount",
";","Actual_Sales_Price",tax;
## Function to generate sales related information
def sales_data():
for x in range(0,1):
quantity_sold = random.randint(5,20);
discount = random.choice(range(5,11));
sales_price = random.uniform(20,30);
return quantity_sold,round(sales_price,2),discount,round((sales_price)-(sales_price*discount)+(sales_price*tax));
## Format the month to quarter and return the value
def quarter(month):
if month >= 1 and month <= 3:
return "Q1";
elif month > 3 and month <= 6:
return "Q2";
elif month > 6 and month <= 9:
return "Q3";
else:
return "Q4";
## Generate product_id
def product_name():
str2 = "PROD";
sample2 = random.sample([1,2,3,4,5,6,7,8,9],5);
string_list = [];
for x in sample2:
string_list.append(str(x));
return (str2+''.join(string_list));
### Main starts here ###
result_log = open("C:/Users/Sangamesh.sangamad/Dropbox/Thesis/Data Preparation/GenData.csv",'w')
print (Column_Names, result_log);
### Loop and Generate Data ###
for pop in range(0,population):
pop = random.randint(55000,85000);
for prod_id in range(0,product):
product_name2 = product_name();
for year in range(1,years):
for month in range(1,months):
for day in range(1,31):
a = sales_data();
rows = str(pop)+";"+product_name2+";"+str(prod_id)+";"+str(year)+";"+str(month)+";"+quarter(month)+";"+str(day)+";"+str(a[0])+";"+str(a[1])+";"+str(a[2])+";"+str(tax)+";"+str(a[3]);
print(rows,file=result_log);
#print (rows);
tax = tax+1;
You need to close a file to have the buffers flushed:
result_log.close()
Better still, use the file object as a context manager and have the with statement close it for you when the block exits:
filename = "C:/Users/Sangamesh.sangamad/Dropbox/Thesis/Data Preparation/GenData.csv"
with result_log = open(filename, 'w'):
# code writing to result_log
Rather than manually writing strings with delimiters in between, you should really use the csv module:
import csv
# ..
column_names = (
"Population_ID", "Product_Name", "Product_ID", "Year",
"Month", "Day", "Quantity_sold", "Sales_Price", "Discount",
"Actual_Sales_Price", tax)
# ..
with result_log = open(filename, 'wb'):
writer = csv.writer(result_log, delimiter=';')
writer.writerow(column_names)
# looping
row = [pop, product_name2, prod_id, year, month, quarter(month), day,
a[0], a[1], a[2], tax, a[3]]
writer.writerow(row)
I am pretty new to python. I need to create a class that loads csv data into a dictionary.
I want to be able to control the keys and value
So let say the following code, I can pull out worker1.name or worker1.age anytime i want.
class ageName(object):
'''class to represent a person'''
def __init__(self, name, age):
self.name = name
self.age = age
worker1 = ageName('jon', 40)
worker2 = ageName('lise', 22)
#Now if we print this you see that it`s stored in a dictionary
print worker1.__dict__
print worker2.__dict__
#
'''
{'age': 40, 'name': 'jon'}
#
{'age': 22, 'name': 'lise'}
#
'''
#
#when we call (key)worker1.name we are getting the (value)
print worker1.name
#
'''
#
jon
#
'''
But I am stuck at loading my csv data into keys and value.
[1] I want to create my own keys
worker1 = ageName([name],[age],[id],[gender])
[2] each [name],[age],[id] and [gender] comes from specific a column in a csv data file
I really do not know how to work on this. I tried many methods but I failed. I need some helps to get started on this.
---- Edit
This is my original code
import csv
# let us first make student an object
class Student():
def __init__(self):
self.fname = []
self.lname = []
self.ID = []
self.sport = []
# let us read this file
for row in list(csv.reader(open("copy-john.csv", "rb")))[1:]:
self.fname.append(row[0])
self.lname.append(row[1])
self.ID.append(row[2])
self.sport.append(row[3])
def Tableformat(self):
print "%-14s|%-10s|%-5s|%-11s" %('First Name','Last Name','ID','Favorite Sport')
print "-" * 45
for (i, fname) in enumerate(self.fname):
print "%-14s|%-10s|%-5s|%3s" %(fname,self.lname[i],self.ID[i],self.sport[i])
def Table(self):
print self.lname
class Database(Student):
def __init__(self):
g = 0
choice = ['Basketball','Football','Other','Baseball','Handball','Soccer','Volleyball','I do not like sport']
data = student.sport
k = len(student.fname)
print k
freq = {}
for i in data:
freq[i] = freq.get(i, 0) + 1
for i in choice:
if i not in freq:
freq[i] = 0
print i, freq[i]
student = Student()
database = Database()
This is my current code (incomplete)
import csv
class Student(object):
'''class to represent a person'''
def __init__(self, lname, fname, ID, sport):
self.lname = lname
self.fname = fname
self.ID = ID
self.sport = sport
reader = csv.reader(open('copy-john.csv'), delimiter=',', quotechar='"')
student = [Student(row[0], row[1], row[2], row[3]) for row in reader][1::]
print "%-14s|%-10s|%-5s|%-11s" %('First Name','Last Name','ID','Favorite Sport')
print "-" * 45
for i in range(len(student)):
print "%-14s|%-10s|%-5s|%3s" %(student[i].lname,student[i].fname,student[i].ID,student[i].sport)
choice = ['Basketball','Football','Other','Baseball','Handball','Soccer','Volleyball','I do not like sport']
lst = []
h = 0
k = len(student)
# 23
for i in range(len(student)):
lst.append(student[i].sport) # merge together
for a in set(lst):
print a, lst.count(a)
for i in set(choice):
if i not in set(lst):
lst.append(i)
lst.count(i) = 0
print lst.count(i)
import csv
reader = csv.reader(open('workers.csv', newline=''), delimiter=',', quotechar='"')
workers = [ageName(row[0], row[1]) for row in reader]
workers now has a list of all the workers
>>> workers[0].name
'jon'
added edit after question was altered
Is there any reason you're using old style classes? I'm using new style here.
class Student:
sports = []
def __init__(self, row):
self.lname, self.fname, self.ID, self.sport = row
self.sports.append(self.sport)
def get(self):
return (self.lname, self.fname, self.ID, self.sport)
reader = csv.reader(open('copy-john.csv'), delimiter=',', quotechar='"')
print "%-14s|%-10s|%-5s|%-11s" % tuple(reader.next()) # read header line from csv
print "-" * 45
students = list(map(Student, reader)) # read all remaining lines
for student in students:
print "%-14s|%-10s|%-5s|%3s" % student.get()
# Printing all sports that are specified by students
for s in set(Student.sports): # class attribute
print s, Student.sports.count(s)
# Printing sports that are not picked
allsports = ['Basketball','Football','Other','Baseball','Handball','Soccer','Volleyball','I do not like sport']
for s in set(allsports) - set(Student.sports):
print s, 0
Hope this gives you some ideas of the power of python sequences. ;)
edit 2, shortened as much as possible... just to show off :P
Ladies and gentlemen, 7(.5) lines.
allsports = ['Basketball','Football','Other','Baseball','Handball',
'Soccer','Volleyball','I do not like sport']
sports = []
reader = csv.reader(open('copy-john.csv'))
for row in reader:
if reader.line_num: sports.append(s[3])
print "%-14s|%-10s|%-5s|%-11s" % tuple(s)
for s in allsports: print s, sports.count(s)
I know this is a pretty old question, but it's impossible to read this, and not think of the amazing new(ish) Python library, pandas. Its main unit of analysis is a think called a DataFrame which is modelled after the way R handles data.
Let's say you have a (very silly) csv file called example.csv which looks like this:
day,fruit,sales
Monday,Banana,10
Monday,Orange,20
Tuesday,Banana,12
Tuesday,Orange,22
If you want to read in a csv in double-quick time, and do 'stuff' with it, you'd be hard pressed to beat the following code for either brevity or ease of use:
>>> import pandas as pd
>>> csv = pd.read_csv('example.csv')
>>> csv
day fruit sales
0 Monday Banana 10
1 Monday Orange 20
2 Tuesday Banana 12
3 Tuesday Orange 22
>>> csv[csv.fruit=='Banana']
day fruit sales
0 Monday Banana 10
2 Tuesday Banana 12
>>> csv[(csv.fruit=='Banana') & (csv.day=='Monday')]
day fruit sales
0 Monday Banana 10
In my opinion, this is really fantastic stuff. Never iterate over a csv.reader object again!
I second Mark's suggestion. In particular, look at DictReader from csv module that allows reading a comma separated (or delimited in general) file as a dictionary.
Look at PyMotW's coverage of csv module for a quick reference and examples of usage of DictReader, DictWriter
Have you looked at the csv module?
import csv