I need to create a program that takes a CSV file and returns a nested dictionary. The keys of the outer dictionary should be the first value in each row, starting from the second row (so as to omit the row with the column names). The value for each key in the outer dictionary should be another dictionary, which I explain below.
The inner dictionary's keys should be the column names, and its values should be the corresponding values from that row.
Example:
For a CSV file like this:
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78
I would like to print out the data in this form:
my_dict = {
'4': {'column1':'4','column2':'12', 'column3':'5', 'column4':'11'},
'29': {'column1':'29', 'column2':'47', 'column3':'23', 'column4':'41'},
'66': {'column1':'66', 'column2':'1', 'column3':'98', 'column4':'78'}
}
The closest I've gotten so far (which isn't even close):
import csv
import collections
def csv_to_dict(file, delimiter, quotechar):
    list_inside_dict = collections.defaultdict(list)
    with open(file, newline = '') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar)
        for row in reader:
            for (k,v) in row.items():
                list_inside_dict[k].append(v)
    return dict(list_inside_dict)
If I try to run the function with the example CSV file above, delimiter = ",", and quotechar = "'", it returns the following:
{'column1': ['4', '29', '66'], ' column2': ['12', '47', '1'], ' column3': ['5', '23', '98'], ' column4': ['11', '41', '78']}
At this point I got lost. I tried to change:
list_inside_dict = collections.defaultdict(list)
to
list_inside_dict = collections.defaultdict(dict)
and then simply setting the value for each key, since I cannot append to a dictionary, but it all got really messy. So I started from scratch and ended up in the same place.
You can use a dictionary comprehension:
import csv
with open('filename.csv', newline='') as f:
    header, *data = csv.reader(f)
    final_dict = {a:dict(zip(header, [a, *b])) for a, *b in data}
Output:
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'},
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'},
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
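The keys ' column2', ' column3' and so on above keep the space that follows each comma in the header, so they do not quite match the desired output. A minimal sketch, assuming the file really has a space after each comma as in the question: csv.reader accepts skipinitialspace=True, which drops whitespace directly after the delimiter. The function name csv_to_nested_dict is just an illustrative choice, and io.StringIO stands in for the real file.

```python
import csv
import io


def csv_to_nested_dict(f):
    """Key each data row by its first field; map header names to row values."""
    # skipinitialspace=True drops the whitespace that follows each delimiter,
    # so ' column2' in the header becomes 'column2'
    header, *data = csv.reader(f, skipinitialspace=True)
    return {row[0]: dict(zip(header, row)) for row in data}


sample = io.StringIO(
    "column1, column2, column3, column4\n"
    "4,12,5,11\n"
    "29,47,23,41\n"
    "66,1,98,78\n"
)
result = csv_to_nested_dict(sample)
print(result)
```

With a real file, call it inside with open('filename.csv', newline='') as f: result = csv_to_nested_dict(f).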
You can use pandas for that task.
>>> import pandas as pd
>>> df = pd.read_csv('/path/to/file.csv')
>>> df.index = df.iloc[:, 0]
>>> df.to_dict('index')
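Note that pandas infers the column types, so the keys and values come back as integers rather than strings; pass dtype=str to read_csv if you want strings. Not sure why you want to duplicate the value of the first column, but if you don't, the above simplifies to: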
>>> pd.read_csv('/path/to/file.csv', index_col=0).to_dict('index')
This is similar to a previous answer; however, I believe it could be better explained.
import csv
with open('filename.csv', newline='') as f:
    headers, *data = csv.reader(f)
output = {}
for firstInRow, *restOfRow in data:
    output[firstInRow] = dict(zip(headers, [firstInRow, *restOfRow]))
print(output)
What this does is loop through the rows of data in the file, taking the first value as the key and the remaining values as a list. The value for that key in the output dictionary is then built by zipping the list of headers with the list of values. The output[firstInRow] = ... line is the same as writing output[firstInRow] = {headers[0]: firstInRow, headers[1]: restOfRow[0], headers[2]: restOfRow[1], ...}.
Output:
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'},
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'},
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
It takes a couple of zips to get what you want.
Instead of a file, I use a string for the CSV here; just replace that part with your file.
Given:
s='''\
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78'''
You can do:
import csv
data=[]
for row in csv.reader(s.splitlines()):   # replace 'splitlines' with your file
    data.append(row)
header=data.pop(0)
col1=[e[0] for e in data]
di={}
for c,row in zip(col1,data):
    di[c]=dict(zip(header, row))
Then:
>>> di
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'},
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'},
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
In Python 3.7+ (and already in CPython 3.6), dicts maintain insertion order; earlier versions do not.
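The csv.DictReader attempt from the question can also be made to work: each row it yields is already the inner dictionary, so the only remaining step is to key it by the value in the first column (reader.fieldnames[0]). A sketch under those assumptions; rows_by_first_column is a hypothetical name, io.StringIO stands in for the file, and skipinitialspace=True strips the spaces after the commas in the header.

```python
import csv
import io


def rows_by_first_column(file_obj, delimiter=",", quotechar='"'):
    """Return {first-column value: row dict} for every data row."""
    # skipinitialspace=True is passed through to the underlying csv.reader
    # and strips the space after each comma in the header
    reader = csv.DictReader(file_obj, delimiter=delimiter,
                            quotechar=quotechar, skipinitialspace=True)
    first_column = reader.fieldnames[0]  # reads the header row
    # Each row is already {column name: value}; key it by its first value
    return {row[first_column]: row for row in reader}


sample = io.StringIO(
    "column1, column2, column3, column4\n"
    "4,12,5,11\n"
    "29,47,23,41\n"
    "66,1,98,78\n"
)
result = rows_by_first_column(sample)
print(result)
```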
I have been wrestling with this for a day or two now, and I can't seem to get it right.
project_index = [
{A: ['1', '2', '3']},
{B: ['4', '5', '6']},
{C: ['7', '8', '9']},
{D: ['10', '11', '12']},
{E: ['13', '14', '15']},
{F: ['16', '17', '18']}
]
I have tried so many different things to get this into a .CSV table, but it keeps coming out in a ridiculously incorrect format, e.g. the values tiling down diagonally, or a bunch of rows of just the keys over and over, like this:
A B C D E F
A B C D E F
A B C D E F
A B C D E F
Also, even if I get the values to show up, the entire array of strings shows up in one cell.
Is there any way I can get it to make each dictionary a column, with each string in the array value as its own cell in said column?
Thank you in advance!
Assuming all your keys are unique, this (slightly modified) data:
project_index = [
{'A': ['1', '2', '3']},
{'B': ['4', '5', '6']},
{'C': ['7', '8', '9']},
{'D': ['10', '11', '12', '20']},
{'E': ['13', '14', '15']},
{'F': ['16', '17', '18']}
]
Should probably look like this:
project_index_dict = {}
for x in project_index:
    project_index_dict.update(x)
print(project_index_dict)
# Output:
{'A': ['1', '2', '3'],
'B': ['4', '5', '6'],
'C': ['7', '8', '9'],
'D': ['10', '11', '12', '20'],
'E': ['13', '14', '15'],
'F': ['16', '17', '18']}
At this point, rather than re-invent the wheel... you could just use pandas.
import pandas as pd
# Work-around for uneven lengths:
df = pd.DataFrame.from_dict(project_index_dict, 'index').T.fillna('')
df.to_csv('file.csv', index=False)
Output file.csv:
A,B,C,D,E,F
1,4,7,10,13,16
2,5,8,11,14,17
3,6,9,12,15,18
,,,20,,
csv module method:
import csv
from itertools import zip_longest, chain
header = []
for d in project_index:
    header.extend(list(d))
project_index_rows = [dict(zip(header, x)) for x in
                      zip_longest(*chain(list(*p.values())
                                         for p in project_index),
                                  fillvalue='')]
# newline='' stops the csv module from writing blank lines between rows on Windows
with open('file.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames = header)
    writer.writeheader()
    writer.writerows(project_index_rows)
My solution does not use pandas. Here is the plan:
For the header row, grab the key from each dictionary.
For the data rows, use zip to transpose columns -> rows.
import csv
def first_key(d):
    """Return the first key in a dictionary."""
    return next(iter(d))
def first_value(d):
    """Return the first value in a dictionary."""
    return next(iter(d.values()))
with open("output.csv", "w", encoding="utf-8", newline="") as stream:
    writer = csv.writer(stream)
    # Write the header row
    writer.writerow(first_key(d) for d in project_index)
    # Write the rest
    rows = zip(*[first_value(d) for d in project_index])
    writer.writerows(rows)
Contents of output.csv:
A,B,C,D,E,F
1,4,7,10,13,16
2,5,8,11,14,17
3,6,9,12,15,18
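zip stops at the shortest input, so if the lists have uneven lengths, like the modified project_index in the previous answer where 'D' has four values, the trailing '20' is silently dropped. A sketch that swaps in itertools.zip_longest with fillvalue='' keeps it; the data is the modified list from above, and io.StringIO stands in for the output file.

```python
import csv
import io
from itertools import zip_longest

project_index = [
    {'A': ['1', '2', '3']},
    {'B': ['4', '5', '6']},
    {'C': ['7', '8', '9']},
    {'D': ['10', '11', '12', '20']},
    {'E': ['13', '14', '15']},
    {'F': ['16', '17', '18']},
]

out = io.StringIO()  # use open("output.csv", "w", newline="") for a real file
writer = csv.writer(out)
# Header: the single key of each dictionary
writer.writerow(next(iter(d)) for d in project_index)
# Transpose columns -> rows, padding shorter columns with ''
columns = [next(iter(d.values())) for d in project_index]
writer.writerows(zip_longest(*columns, fillvalue=''))
print(out.getvalue())
```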
How can I combine the lists inside a list based on an element value?
Suppose I have this list:
lis = [['steve','reporter','12','34','22','98'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20'],['steve','dancer','66','31','54','12']]
Here, two of the inner lists start with 'steve', so I want to combine them as below:
new_lis = [['steve','reporter','12','34','22','98','dancer','66','31','54','12'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20']]
I tried the code below to achieve this:
new_dic = {}
for i in range(len(lis)):
    name = lis[i][0]
    if name in new_dic:
        new_dic[name].append([lis[i][1],lis[i][2],lis[i][3],lis[i][4],lis[i][5]])
    else:
        new_dic[name] = [lis[i][1],lis[i][2],lis[i][3],lis[i][4],lis[i][5]]
print(new_dic)
I ended up with a dictionary whose values contain nested lists, as below:
{'steve': ['reporter', '12', '34', '22', '98', ['dancer', '66', '31', '54', '12']], 'megan': ['arch', '44', '98', '32', '22'], 'jack': ['doctor', '80', '32', '65', '20']}
but I want a single flat list per name, so that I can convert it into the format below:
new_lis = [['steve','reporter','12','34','22','98','dancer','66','31','54','12'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20']]
Is there a different way to tackle this?
There is a different way to do it, using the groupby function from itertools. groupby only groups consecutive items that share a key, so the list is sorted by name first. There are also ways to convert your dict to a list; it depends on what you want.
from itertools import groupby
lis = [['steve','reporter','12','34','22','98'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20'],['steve','dancer','66','31','54','12']]
lis.sort(key = lambda x: x[0])
output = []
for name, groups in groupby(lis, key = lambda x: x[0]):
    temp_list = [name]
    for group in groups:
        temp_list.extend(group[1:])
    output.append(temp_list)
print(output)
OUTPUT
[['jack', 'doctor', '80', '32', '65', '20'], ['megan', 'arch', '44', '98', '32', '22'], ['steve', 'reporter', '12', '34', '22', '98', 'dancer', '66', '31', '54', '12']]
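The dictionary attempt in the question is close: the nesting comes from list.append, which adds the whole slice as a single element, whereas list.extend adds its items one by one. A sketch of that variant, using dict.setdefault to create each list on first sight of a name; unlike the sorted groupby version, it keeps names in order of first appearance (dicts preserve insertion order on Python 3.7+).

```python
lis = [['steve', 'reporter', '12', '34', '22', '98'],
       ['megan', 'arch', '44', '98', '32', '22'],
       ['jack', 'doctor', '80', '32', '65', '20'],
       ['steve', 'dancer', '66', '31', '54', '12']]

merged = {}
for name, *rest in lis:
    # setdefault creates the list the first time a name is seen;
    # extend appends the remaining fields individually (no nesting)
    merged.setdefault(name, []).extend(rest)

new_lis = [[name, *rest] for name, rest in merged.items()]
print(new_lis)
```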
Not sure whether this snippet answers your question. It is not the fastest approach in terms of time complexity; I will update this answer if I find a better way.
lis = [['steve','reporter','12','34','22','98'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20'],['steve','dancer','66','31','54','12']]
new_lis = []
element_value = 'steve'
for inner_lis in lis[:]:  # iterate over a copy, because lis is modified inside the loop
    if inner_lis[0] == element_value:
        if not new_lis:
            new_lis += inner_lis
        else:
            new_lis += inner_lis[1:]  # skip the repeated name
        lis.remove(inner_lis)
print([new_lis] + lis)
Output
[['steve', 'reporter', '12', '34', '22', '98', 'dancer', '66', '31', '54', '12'], ['megan', 'arch', '44', '98', '32', '22'], ['jack', 'doctor', '80', '32', '65', '20']]
I am trying to write a function that reads a file, splits it on newlines, and then splits each line on the comma delimiter (,). After that, I want to convert each string inside those lists to an integer, using only list comprehensions.
# My code, but it's not converting the split lists into integers.
def read_csv(filename):
    string_list = open(filename, "r").read().split('\n')
    string_list = string_list[1:len(string_list)]
    splitted = [i.split(",") for i in string_list]
    final_list = [int(i) for i in splitted]
    return final_list
read_csv("US_births_1994-2003_CDC_NCHS.csv")
Output:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
This is how the data looks after splitting on the comma delimiter (,):
us = open("US_births_1994-2003_CDC_NCHS.csv", "r").read().split('\n')
splitted = [i.split(",") for i in us]
print(splitted)
Output:
[['year', 'month', 'date_of_month', 'day_of_week', 'births'],
['1994', '1', '1', '6', '8096'],
['1994', '1', '2', '7', '7772'],
['1994', '1', '3', '1', '10142'],
['1994', '1', '4', '2', '11248'],
['1994', '1', '5', '3', '11053'],
['1994', '1', '6', '4', '11406'],
['1994', '1', '7', '5', '11251'],
['1994', '1', '8', '6', '8653'],
['1994', '1', '9', '7', '7910'],
['1994', '1', '10', '1', '10498']]
How do I convert each string in this output to an integer and assign the result to a single list using a list comprehension?
str.split() produces a new list; so splitted is a list of lists. You'd want to convert the contents of each contained list:
[[int(v) for v in row] for row in splitted]
Demo:
>>> csvdata = '''\
... year,month,date_of_month,day_of_week,births
... 1994,1,1,6,8096
... 1994,1,2,7,7772
... '''
>>> string_list = csvdata.splitlines() # better way to split lines
>>> string_list = string_list[1:] # you don't have to specify the second value
>>> splitted = [i.split(",") for i in string_list]
>>> splitted
[['1994', '1', '1', '6', '8096'], ['1994', '1', '2', '7', '7772']]
>>> splitted[0]
['1994', '1', '1', '6', '8096']
>>> final_list = [[int(v) for v in row] for row in splitted]
>>> final_list
[[1994, 1, 1, 6, 8096], [1994, 1, 2, 7, 7772]]
>>> final_list[0]
[1994, 1, 1, 6, 8096]
Note that you could just loop directly over the file to get separate lines too:
string_list = [line.strip().split(',') for line in openfileobject]
and skipping an entry (such as the header line) in such an object can be done with next(iterableobject, None).
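Putting those two remarks together, here is a sketch that reads line by line from a file object, skips the header with next(), and converts every field with a nested list comprehension. read_int_rows is a hypothetical name and io.StringIO stands in for open(filename). The blank-line filter is an added assumption: the read().split('\n') version in the question produces an empty last item when the file ends with a newline, and int('') would raise a ValueError on it.

```python
import io


def read_int_rows(fileobj):
    next(fileobj, None)  # skip the header line (no error if the file is empty)
    # strip() removes the trailing newline; blank lines are ignored
    return [[int(v) for v in line.strip().split(',')]
            for line in fileobj if line.strip()]


sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)
rows = read_int_rows(sample)
print(rows)
```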
Rather than read the whole file into memory and manually split the data, you could just use the csv module:
import csv
def read_csv(filename):
    with open(filename, 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip first row
        for row in reader:
            yield [int(c) for c in row]
The above is a generator function, producing one row at a time as you loop over it:
for row in read_csv("US_births_1994-2003_CDC_NCHS.csv"):
print(row)
You can still get a list with all rows with list(read_csv("US_births_1994-2003_CDC_NCHS.csv")).
The problem at hand is that I have a list of lists that I need to iterate through, comparing values one by one.
import csv
def stockcheck():
    stock = open("Stock.csv", "r")
    reader = csv.reader(stock)
    stockList = []
    for row in reader:
        stockList.append(row)
The output from print(stockList) is:
[['Product', 'Current Stock', 'Reorder Level', 'Target Stock'], ['plain blankets', '5', '10', '50'], ['mugs', '15', '20', '120'], ['100m rope', '60', '15', '70'], ['burner', '90', '20', '100'], ['matches', '52', '10', '60'], ['bucket', '85', '15', '100'], ['spade', '60', '10', '65'], ['wood', '100', '10', '200'], ['sleeping bag', '50', '10', '60'], ['chair', '30', '10', '60']]
I've searched the basics for this but had no luck... I'm sure the solution is simple, but it's escaping me! Essentially, I need to check whether the current stock is less than the reorder level, and if it is, save it to a CSV (that part I can do no problem).
for item in stockList:
    if stockList[1][1] < stockList[1][2]:
        print("do the add to CSV jiggle")
This is as much as I can do but it doesn't iterate through... Any ideas? Thanks in advance!
You could iterate through stockList with a list comprehension and then print out the results:
[sl for sl in stockList[1:] if sl[1] < sl[2]]
You will get the following results:
[['mugs', '15', '20', '120']]
In case you were wondering, stockList[1:] ensures that you skip the header.
However, note that the values being compared are strings, so they are compared lexicographically, character by character: '5' < '10' is False because '5' sorts after '1'. If you want integer comparisons, you must convert the strings to integers, assuming you are sure that sl[1] and sl[2] will always be integers that are just being presented as strings. Try:
[sl for sl in stockList[1:] if int(sl[1]) < int(sl[2])]
The result changes:
[['plain blankets', '5', '10', '50'], ['mugs', '15', '20', '120']]
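Since the goal is to save the low-stock rows to another CSV, csv.DictReader lets you compare by column name instead of by index. A sketch, assuming the column names from the header shown above; the data is abbreviated to three products, io.StringIO objects stand in for Stock.csv and the output file, and reorder.csv in the comment is a hypothetical file name.

```python
import csv
import io

stock_csv = io.StringIO(
    "Product,Current Stock,Reorder Level,Target Stock\n"
    "plain blankets,5,10,50\n"
    "mugs,15,20,120\n"
    "100m rope,60,15,70\n"
)

reader = csv.DictReader(stock_csv)
# Compare as integers; comparing the strings would treat '5' as larger than '10'
low_stock = [row for row in reader
             if int(row['Current Stock']) < int(row['Reorder Level'])]

out = io.StringIO()  # e.g. open("reorder.csv", "w", newline="") for a real file
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(low_stock)
print(out.getvalue())
```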
Use [1:] to skip the header, and then make the comparison (converting the strings to integers first, so that '5' is not treated as larger than '10'):
for item in stockList[1:]:
    if int(item[1]) < int(item[2]):
        print(item)
        print("do the add to CSV jiggle")
I have a dictionary that contains several keys and values, i.e.:
d = {'1' : "whatever", '2' : "something", '3' : "other", '4' : "more" ...etc}
I would like to quickly print selected values from that dictionary, for example the values of the following keys: 1, 3, 20, 23, 45, 46, etc.
I know I can print value by value:
print d['1'] + d['3'] +d['20'] + d['23'] + d['45'] + d['46']
but that'll take too long to type, and the number sequence is visually unclear. What would be the simplest way to print out selected values while keeping the sequence visually easy to read (ideally something like print d['1','3','20','23','45','46'])?
You could combine dict.get with the map function:
map(d.get, ['1', '3', '20', '23', '45', '46'])
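This gives you the corresponding values in order (None for any key missing from d). In Python 2 map returns a list; in Python 3 it returns an iterator, so wrap it in list(...). For example, with the four keys shown above, map(d.get, ['1', '3', '2', '4']) gives:
['whatever', 'other', 'something', 'more']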
You could print them with:
print ' '.join(map(d.get, ['1', '3', '20', '23', '45', '46']))
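If some of the requested keys might be missing, d.get returns None for them and ' '.join then raises a TypeError, since it only accepts strings. A sketch that passes a default to get instead; the '?' placeholder and the sample keys are arbitrary choices for illustration, and the code runs on both Python 2 and 3.

```python
d = {'1': "whatever", '2': "something", '3': "other", '4': "more"}
keys = ['1', '3', '20', '4']  # '20' is not in d

# d.get(k, '?') returns the placeholder instead of None for a missing key
values = [d.get(k, '?') for k in keys]
print(' '.join(values))
```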
I think this solution is the easiest to understand:
keys_to_print = ['1', '3', '20', '23', '45', '46']
for key in keys_to_print:
    print d[key] + ' ',
This just loops through all of the keys whose values you want to print.
There are several ways to do this.
Using operator.itemgetter with str.join:
>>> from operator import itemgetter
>>> keys = ['1', '3', '4', '2']
>>> print ' '.join(itemgetter(*keys)(d))
whatever other more something
>>> print ' '.join(itemgetter('1', '3', '4', '2')(d)) #without a `keys` variable.
whatever other more something
Using str.join with a generator expression:
>>> print ' '.join(d[k] for k in keys)
whatever other more something
If you're using Python 3 or willing to import print function in Python 2:
>>> print(*itemgetter(*keys)(d), sep=' ')
whatever other more something
>>> print(*itemgetter('1', '3', '4', '2')(d), sep=' ')
whatever other more something
>>> print(*(d[k] for k in keys), sep=' ')
whatever other more something
I'm surprised no one has proposed the obvious:
d = {'1' : "whatever", '2' : "something", '3' : "other", '4' : "more",
'20': 'vingt', '23': 'vingt-trois', '45':'quarante-cinq', '46':'quarante-six' }
keys_to_print = ['1', '3', '20', '23', '45', '46']
print ' '.join([d[key] for key in keys_to_print])
Producing:
whatever other vingt vingt-trois quarante-cinq quarante-six