KeyError with 2D dictionaries - Python

I'm trying to step through a CSV file and assign date and time values to their own point in a 2D dictionary. For example, a row with '11/02/16' and '23:24' in the respective columns should add 1 to the value at the position marked X in the dictionary Dates{'11/02/16': {'23': X}}.
Unfortunately, I get a KeyError with the following code:
import csv
import sys
from sys import argv
from collections import defaultdict
script, ReadFile = argv
f = open(ReadFile,'r')
l = f.readlines()
f.close()
file_list = [row.replace('\n','').split(',') for row in l]
header = file_list[0]
Total = 0
Dates = defaultdict(dict)
print Dates
index_variable = header.index('Time')
index_variable2 = header.index('# Timestamp')
for row in file_list[1:]:
    t = row[index_variable][:2]
    d = row[index_variable2][:10]
    if row[index_variable2][:10] in Dates:
        Dates[d][t] = 1
        Total += 1
        print "true"
    else:
        Dates[d] = {}
        Dates[d][t] = 1
        Total += 1
        print "false"
print Dates
If I replace the local variable t with 'Test', the code works, but obviously the results are not what I'm after.
Thanks in advance!
Update: If I replace d with 'Test' and keep t as it is, the program works fine. It's only when the dictionary is accessed specifically as Dates[d][t] that the program raises a KeyError.
Update 2: I've updated the code above to show my work. Currently the script works as long as no numbers are added.
Dates[d][t] = 1 #If I change this...
Dates[d][t] += 1 #To this...
A KeyError occurs.
Update 3:
I changed a portion of my code...
for row in file_list[1:]:
    t = row[index_variable][:2]
    d = row[index_variable2][:10]
    if d in Dates and t in Dates[d]:
        Dates[d][t] += 1
        print "true"
    else:
        Dates[d][t] = 1
        print "false"
And now the script works perfectly fine. I suppose this means the KeyError happened because I was not being specific enough (???).

Assuming the indentation of the if/else above was just lost in formatting...
I think the problem is in the else branch:
Dates is a dict whose keys are dates.
d is the first 10 characters of the date field in your input ('# Timestamp' in your header).
You want to count how many times each hour was hit on a specific date; t is the first two characters of the Time field, i.e. the hour.
Dates[d] is then a dictionary whose keys are hours (t) and whose values are counts.
In the else branch you have just set Dates[d] = {}, so Dates[d] is an empty dictionary with no key t in it yet.
But Dates[d][t] += 1 has to read Dates[d][t] before it can add 1 to it. This implies that Dates[d][t] already exists and holds a number, and because it doesn't, you get a KeyError.
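The mechanism can be reproduced in isolation. A minimal sketch in Python 3 syntax (the date and hour keys are made-up sample values): the outer defaultdict creates an empty inner dict on the first lookup, but that inner dict is an ordinary dict, so += on a missing hour still raises KeyError.

```python
from collections import defaultdict

Dates = defaultdict(dict)

# Looking up a missing date creates an empty inner dict automatically.
inner = Dates['2016-02-01']
print(inner)  # {}

# The inner dict is a plain dict, so "+=" must first read a key that does not exist.
try:
    Dates['2016-02-01']['23'] += 1
except KeyError as err:
    print('KeyError:', err)  # KeyError: '23'

# Storing an initial count first makes later increments work.
Dates['2016-02-01']['23'] = 1
Dates['2016-02-01']['23'] += 1
print(Dates['2016-02-01'])  # {'23': 2}
```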
I tried this on my system
import csv
import sys
from sys import argv
from collections import defaultdict
#script, ReadFile = argv
#f = open(ReadFile,'r')
#l = f.readlines()
#f.close()
#file_list = [row.replace('\n','').split(',') for row in l]
#header = file_list[0]
file_list = [['Date','Time','Otherstuff'],
             ['2016-02-01','23:12:00','Sillyme1'],
             ['2016-02-01','23:12:04','Sillyme2'],
             ['2016-02-02','22:10:00','Sillyme3']]
header = file_list[0]
Dates = defaultdict(dict)
print(Dates)
index_variable = header.index('Time')
index_variable2 = header.index('Date')
for row in file_list[1:]:
    t = row[index_variable][:2]
    d = row[index_variable2][:10]
    if d in Dates.keys():
        # The date exists, but this hour may not yet: start it at 0 before adding 1.
        Dates[d][t] = Dates[d].get(t, 0) + 1
        print("true")
    else:
        Dates[d] = {} #Now Dates[d] contains a dictionary
        Dates[d][t] = 1 ##Now we put the first counter in the Dates[d] dictionary with key t.
print(Dates)
Output was:
defaultdict(<class 'dict'>, {})
true
defaultdict(<class 'dict'>, {'2016-02-01': {'23': 2}, '2016-02-02': {'22': 1}})
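If the if/else is not needed at all, the inner dictionaries can get a default too. A sketch that swaps in defaultdict(int) for the inner level (an alternative to the code above, not what it does), reusing the same sample rows: every missing hour starts at 0, so += 1 always works.

```python
from collections import defaultdict

file_list = [['Date', 'Time', 'Otherstuff'],
             ['2016-02-01', '23:12:00', 'Sillyme1'],
             ['2016-02-01', '23:12:04', 'Sillyme2'],
             ['2016-02-02', '22:10:00', 'Sillyme3']]

header = file_list[0]
time_col = header.index('Time')
date_col = header.index('Date')

# Outer level: date -> inner dict; inner level: hour -> count (missing hours start at 0).
Dates = defaultdict(lambda: defaultdict(int))

for row in file_list[1:]:
    d = row[date_col][:10]   # date part
    t = row[time_col][:2]    # hour part
    Dates[d][t] += 1

# Convert to plain dicts for readable printing.
counts = {d: dict(hours) for d, hours in Dates.items()}
print(counts)  # {'2016-02-01': {'23': 2}, '2016-02-02': {'22': 1}}
```

The lambda is needed because defaultdict takes a zero-argument callable as its factory; defaultdict(defaultdict(int)) raises a TypeError, since a defaultdict instance is not callable.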

Related

Create a dictionary from a csv file with rows

I have a CSV file:
id,name,surname,age
"1, Johny, Black, 25"
"2, Armando, White, 18"
"3, Jack, Brown, ''"
"4, Ronn, Davidson, ''"
"5, Bill, Loney, 35"
The first row is a normal list of fields; each of the other rows is wrapped in quotes as a single string.
How can I convert this CSV into a dictionary, so that I can filter and sort it later?
import csv
dicts = list()
with open("test.csv", "r", encoding="utf-8") as file:
    csv_reader = csv.reader(file)
    field_list = list()
    record_list = list()
    line_counter = 0
    for row in csv_reader:
        if line_counter == 0:
            field_list = row
            line_counter += 1
        else:
            records = row[0].split(',')
            record_list.append(records)
counter = 0
full = dict()
for record in record_list:
    for field in field_list:
        try:
            if field in full.keys():
                full[field].append(record[counter])
                counter += 1
            else:
                full[field] = [record[counter]]
            if counter == len(record):
                break
        except Exception as e:
            pass
print(full)
My code converts only 2 rows. I tried splitting the rows, but that didn't help, and the csv library documentation didn't help me either. Maybe someone knows a solution?
You never reset your counter to zero. The first time through the nested for loop, the code creates the dictionary keys from the first row in record_list, and counter stays at 0 (so only the first value is placed, under every key). The second time, counter increments up to 4. Every time after that, counter is out of range for the record, so record[counter] raises an IndexError, which your except block silently swallows.
I think the second half of your code should look like this:
full = dict()
for record in record_list:
    counter = 0
    for field in field_list:
        try:
            if field in full.keys():
                full[field].append(record[counter])
            else:
                full[field] = [record[counter]]
            counter += 1
        except Exception as e:
            pass
print(full)
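zip can replace the manual counter entirely, because it pairs each field name with the value at the same position in a record. A sketch assuming field_list and record_list are built as in the question (rows_to_columns is a made-up helper name); it also strips the leading space that split(',') leaves on each value:

```python
def rows_to_columns(field_list, record_list):
    """Build {field: [value, value, ...]} from a header list and a list of records."""
    # Start every column with an empty list so no key checks are needed later.
    full = {field: [] for field in field_list}
    for record in record_list:
        # zip pairs field i with record[i] and stops at the shorter of the two.
        for field, value in zip(field_list, record):
            full[field].append(value.strip())  # drop the space after each comma
    return full

field_list = ['id', 'name', 'surname', 'age']
record_list = [['1', ' Johny', ' Black', ' 25'],
               ['2', ' Armando', ' White', ' 18']]
print(rows_to_columns(field_list, record_list))
# {'id': ['1', '2'], 'name': ['Johny', 'Armando'], 'surname': ['Black', 'White'], 'age': ['25', '18']}
```

Unlike the try/except version, a record with too few values does not raise at all: zip simply stops early, so that record's missing columns are not appended to.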
The csv module also has a reader that converts each row directly into a dictionary, csv.DictReader:
https://docs.python.org/3/library/csv.html#csv.DictReader
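csv.DictReader expects an ordinary CSV file, though; in the sample above every data row is wrapped in one pair of double quotes, so a reader sees each row as a single field. A sketch under that assumption: unwrap the quotes first, then let DictReader build one dict per row (read_records and SAMPLE are hypothetical names; skipinitialspace=True drops the space after each comma). The resulting list of dicts is easy to filter and sort.

```python
import csv

SAMPLE = '''id,name,surname,age
"1, Johny, Black, 25"
"2, Armando, White, 18"
"3, Jack, Brown, ''"
"4, Ronn, Davidson, ''"
"5, Bill, Loney, 35"
'''

def read_records(text):
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: each data row is wrapped in one pair of double quotes; remove them.
    cleaned = [lines[0]] + [line.strip('"') for line in lines[1:]]
    # DictReader accepts any iterable of lines; extra keyword arguments go to csv.reader.
    reader = csv.DictReader(cleaned, skipinitialspace=True)
    return [dict(row) for row in reader]

records = read_records(SAMPLE)

# Filter: keep rows with a real age ('' in the file marks a missing value).
known_age = [r for r in records if r['age'] != "''"]
# Sort: youngest first.
by_age = sorted(known_age, key=lambda r: int(r['age']))
print([r['name'] for r in by_age])  # ['Armando', 'Johny', 'Bill']
```

With the real file, read_records(open('test.csv', encoding='utf-8').read()) would be the equivalent call.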

Python list data filtering

I have a list of file names, some of which are identical except for their timestamp section. The names have the format name-subname-timestamp, for example:
myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
What I need is a list that holds, for every name-subname pair, the most recent file according to its timestamp. I started by creating a list that holds every name-subname:
name_subname_list = []
for row in myList:
    name_subname_list.append((row.rpartition('-')[0]))
name_subname_list = set(name_subname_list) # {'name1-001', 'name2-002', 'name1-002'}
I'm not sure if this is the right approach, and I'm not sure how to continue. Any ideas?
This code does what you asked for. For each name-subname, you get the corresponding newest file:
from datetime import datetime as dt
dic = {}
for i in myList:
    sp = i.split('-')
    name_subname = sp[0]+'-'+sp[1]
    mytime = sp[2].split('.')[0]
    if name_subname not in dic:
        dic[name_subname] = mytime
    else:
        if dt.strptime(mytime, "%Y%m%d%H%M") > dt.strptime(dic[name_subname], "%Y%m%d%H%M"):
            dic[name_subname] = mytime
result = []
for name_subname in dic:
    result.append(name_subname+'-'+dic[name_subname]+'.txt')
After the loop, result is:
['name1-001-202112021010.txt',
'name1-002-202112021010.txt',
'name2-002-202112020811.txt']
Try this:
myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
dic = {}
for name in myList:
    parts = name.split('-')
    dic.setdefault(parts[0] + '-' + parts[1], []).append(parts[2])
unique_list = []
for key,value in dic.items():
    unique_list.append(key + '-' + max(value))
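One caveat with max(value): it compares the timestamp strings character by character, which matches chronological order only when every timestamp has the same number of digits. The first name in myList has an 11-digit stamp (20211202811), and max('20211202811.txt', '202112021010.txt') picks the 11-digit one because '8' > '1' at the ninth character. A sketch that parses each stamp with datetime.strptime instead, building on the rpartition idea from the question (latest_files is a made-up name; the format '%Y%m%d%H%M' is the one the first answer uses, and strptime reads '20211202811' as 2021-12-02 08:11):

```python
from datetime import datetime

def latest_files(names, fmt='%Y%m%d%H%M'):
    latest = {}  # 'name-subname' -> (parsed timestamp, full filename)
    for name in names:
        prefix, _, stamp = name.rpartition('-')   # 'name1-001', '-', '202112021010.txt'
        when = datetime.strptime(stamp.split('.')[0], fmt)
        if prefix not in latest or when > latest[prefix][0]:
            latest[prefix] = (when, name)
    return [name for _, name in latest.values()]

myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt',
          'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
print(latest_files(myList))
# ['name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
```

Keeping the whole filename next to the parsed timestamp avoids rebuilding the name from its parts afterwards.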

python script to concatenate values by row and delete identical

I am using Python 2.7, and I have a text file that looks like this:
id value
--- ----
1 x
2 a
1 z
1 y
2 b
I am trying to get an output that looks like this:
id value
--- ----
1 x,z,y
2 a,b
Much appreciated!
The simplest solution would be to use collections.defaultdict and collections.OrderedDict. If you don't care about order you could also use sets instead of OrderedDict.
from collections import defaultdict, OrderedDict
# Keeps all unique values for each id
dd = defaultdict(OrderedDict)
# Keeps the unique ids in order of appearance
ids = OrderedDict()
with open(yourfilename) as f:
    f = iter(f)
    # skip first two lines
    next(f), next(f)
    for line in f:
        id_, value = list(filter(bool, line.split())) # split at whitespace and remove empty ones
        dd[id_][value] = None # dicts need a value, but here it doesn't matter which one...
        ids[id_] = None
print('id value')
print('--- ----')
for id_ in ids:
    print('{} {}'.format(id_, ','.join(dd[id_])))
Result:
id value
--- ----
1 x,z,y
2 a,b
If you want to write the result to another file, join the printed lines with \n and write that string to the file.
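The write-to-file step could look like this. A sketch that reuses the same defaultdict/OrderedDict idea but takes (id, value) pairs instead of reading the file itself; group_lines and write_lines are hypothetical helper names, and the code runs on both Python 2.7 and 3.

```python
from collections import defaultdict, OrderedDict

def group_lines(pairs):
    """Return the output table as a list of lines, one per id, duplicates dropped."""
    dd = defaultdict(OrderedDict)  # id -> ordered "set" of values
    ids = OrderedDict()            # ids in order of first appearance
    for id_, value in pairs:
        dd[id_][value] = None
        ids[id_] = None
    lines = ['id value', '--- ----']
    for id_ in ids:
        lines.append('{} {}'.format(id_, ','.join(dd[id_])))
    return lines

def write_lines(path, lines):
    # Join with '\n' and end the file with a final newline.
    with open(path, 'w') as out:
        out.write('\n'.join(lines) + '\n')

pairs = [('1', 'x'), ('2', 'a'), ('1', 'z'), ('1', 'y'), ('2', 'b'), ('1', 'x')]
print('\n'.join(group_lines(pairs)))
```

A call such as write_lines('output.txt', group_lines(pairs)) would then save the table (the file name is only an example).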
I think this could also work, although the other answer seems more sophisticated:
rows = ['1,x', # named rows so it doesn't shadow the built-in input
        '2,a',
        '1,z',
        '1,y',
        '2,b',
        '2,a', #added extra values to show duplicates won't be added
        '1,z',
        '1,y']
output = {}
for row in rows:
    parts = row.split(",")
    id_ = parts[0]
    value = parts[1]
    if id_ not in output:
        output[id_] = value
    else:
        # Split the stored string back into values; list() would give single characters.
        a_List = output[id_].split(",")
        if value not in a_List:
            output[id_] += "," + value
You end up with a dictionary similar to what you requested: {'1': 'x,z,y', '2': 'a,b'}.
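Another approach, using only built-ins (Python 2 syntax):
#read
fp = open('', 'r')  # put your file name here
d = fp.read().split("\n")
fp.close()
x = len(d)
for i in range(len(d)):
    n = d[i].split()
    d.append(n)
d = d[x:]
m = {}
for i in d:
    if not i:  # skip blank lines, e.g. the one after a trailing newline
        continue
    if i[0] not in m:
        m[i[0]] = [i[1]]
    else:
        if i[1] not in m[i[0]]:
            m[i[0]].append(i[1])
for i in m:
    print i, ",".join(m[i])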

How do you make a dictionary out of 2 elements in each list?

I have a list of lists that I want to make into a dictionary. Basically it's a list of births by date (year/month/day/day of week/births). I want to tally the births for each day of the week, to see how many births there were in total on each day.
List example:
[2000,12,3,2,12000],[2000,12,4,3,34000]...
Desired result:
days_counts = {1: 23000, 2: 43000, ..., 7: 11943}
Here's the code so far:
f = open('births.csv', 'r')
text = f.read()
text = text.split("\n")
header = text[0]
data = text[1:]
for d in data:
    split_data = d.split(",")
    print(split_data)
So basically I want to iterate over each row and add the births from duplicate days into the same key (obviously).
EDIT: I have to do this with an if statement that looks for the day of the week as a key in the dict. If it's found, add the corresponding births to its value. If it's not in the dict, add the key and value. I can't import anything or use lambda functions.
Use a collections.Counter() object to track the counts per day-of-the-week. You also want to use the csv module to handle the file parsing:
import csv
from collections import Counter
per_dow = Counter()
with open('births.csv', 'r') as f:
    reader = csv.reader(f)
    header = next(reader)
    for row in reader:
        dow, births = map(int, row[-2:])
        per_dow[dow] += births
I've used a with statement to manage the file object; Python auto-closes the file for you when the with block ends.
Now that you have a Counter object (which is a dictionary with some extra powers), you can now find the day of the week with the most births; the following loop prints out days of the week in order from most to least:
for day, births in per_dow.most_common():
    print(day, births)
Without using external libraries or if statements, you can use exception handling:
birth_dict = {}
birth_list = [[2000,12,3,2,12000],[2000,12,4,3,34000]]
for birth in birth_list:
    try:
        birth_dict[birth[3]]+=birth[4]
    except KeyError:
        birth_dict[birth[3]]=birth[4]
print(birth_dict)
OK, after playing around with the code and using print statements where I needed them for tests, I finally did it without using any external libraries. A very special thanks to Tobey and the others.
Here's the code with tests:
f = open('births.csv', 'r')
text = f.read()
text = text.split("\n")
header = text[0]
data = text[1:-1]
days_counts = {}
for d in data:
    r = d.split(",")
    print(r) #<--- used to test
    k = r[3]
    print(k)#<--- used to test
    v = int(r[4])
    print(v)#<--- used to test
    if k in days_counts:
        days_counts[k] += v
        print("If : " , days_counts)#<--- used to test
    else:
        days_counts[k] = v
        print("Else : ", days_counts)#<--- used to test
print(days_counts)
Code without tests:
f = open('births.csv', 'r')
text = f.read()
text = text.split("\n")
header = text[0]
data = text[1:-1]
days_counts = {}
for d in data:
    r = d.split(",")
    k = r[3]
    v = int(r[4])
    if k in days_counts:
        days_counts[k] += v
    else:
        days_counts[k] = v
print(days_counts)
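Two details differ from the desired days_counts in the question: r[3] is a string, so the keys come out as strings such as '2' rather than integers, and text[1:-1] assumes the file ends with exactly one newline. A sketch that keeps the same constraints (no imports, no lambda, an if statement) but converts the key with int() and skips blank lines instead. tally_births is a made-up name, it takes the file contents as a string so it can be tried without births.csv, and the column positions 3 and 4 come from the question's example rows:

```python
def tally_births(text):
    days_counts = {}
    for line in text.split("\n")[1:]:   # skip the header row
        if not line.strip():            # skip blank lines, e.g. after a trailing newline
            continue
        fields = line.split(",")
        day = int(fields[3])            # day of week
        births = int(fields[4])         # number of births
        if day in days_counts:
            days_counts[day] += births
        else:
            days_counts[day] = births
    return days_counts

# Hypothetical sample: a made-up header plus the two rows from the question and one extra row.
sample = "year,month,day,day_of_week,births\n2000,12,3,2,12000\n2000,12,4,3,34000\n2000,12,10,2,5000\n"
print(tally_births(sample))  # {2: 17000, 3: 34000}
```

With the real file, the call would be something like tally_births(open('births.csv').read()).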

Dictionaries overwriting in Python

This program should take the grammar rules found in Binary.txt and store them in a dictionary, where the rules are:
N = N D
N = D
D = 0
D = 1
but the current code returns D: D = 1, N: N = D, whereas I want N: N D, N: D, D: 0, D: 1.
import sys
import string
#default length of 3
stringLength = 3
#get last argument of command line(file)
filename1 = sys.argv[-1]
#get a length from user
try:
    stringLength = int(input('Length? '))
    filename = input('Filename: ')
except ValueError:
    print("Not a number")
#checks
print(stringLength)
print(filename)
def str2dict(filename="Binary.txt"):
    result = {}
    with open(filename, "r") as grammar:
        #read file
        lines = grammar.readlines()
        count = 0
        #loop through
        for line in lines:
            print(line)
            result[line[0]] = line
            print (result)
    return result
print (str2dict("Binary.txt"))
Firstly, your choice of data structure is wrong. A dictionary in Python is a simple key-to-value mapping; what you'd like is a map from a key to multiple values. For that you'll need:
from collections import defaultdict
result = defaultdict(list)
Next, where are you splitting on '='? You need to do that to get the key and value you are looking for:
key, value = line.split('=', 1) #Returns a list, which gets unpacked into 2 variables
Putting the two together, you'd go about it in the following way:
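from collections import defaultdict

def str2dict(filename="Binary.txt"):
    result = defaultdict(list)
    with open(filename, "r") as grammar:
        #read file
        lines = grammar.readlines()
        #loop through
        for line in lines:
            print(line)
            if not line.strip():  # skip blank lines, which have no '='
                continue
            key, value = line.split('=', 1)
            result[key.strip()].append(value.strip())
    return result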
Dictionaries, by definition, cannot have duplicate keys. Therefore there can only ever be a single 'D' key. You could, however, store a list of values at that key if you'd like. For example:
from collections import defaultdict
# rest of your code...
def str2dict(filename="Binary.txt"):
    result = defaultdict(list) # Use defaultdict so that an insert to an empty key creates a new list automatically
    with open(filename, "r") as grammar:
        #read file
        lines = grammar.readlines()
        count = 0
        #loop through
        for line in lines:
            print(line)
            result[line[0]].append(line)
            print (result)
    return result
This will result in something like the following (each stored line still ends with its newline character):
{"N": ["N = N D", "N = D"], "D": ["D = 0", "D = 1"]}
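To get exactly the N: N D, N: D, D: 0, D: 1 grouping from the question, split each line on the first '=' and strip both sides. A sketch using a plain dict with dict.setdefault (swapped in for defaultdict(list), so no import is needed); parse_rules is a hypothetical name, and it accepts any iterable of lines, such as an open file:

```python
def parse_rules(lines):
    rules = {}
    for line in lines:
        line = line.strip()
        if not line:                      # ignore blank lines
            continue
        lhs, rhs = line.split('=', 1)     # split on the first '=' only
        # setdefault returns the existing list for lhs, or stores and returns a new one.
        rules.setdefault(lhs.strip(), []).append(rhs.strip())
    return rules

grammar = ['N = N D\n', 'N = D\n', 'D = 0\n', 'D = 1\n']
print(parse_rules(grammar))  # {'N': ['N D', 'D'], 'D': ['0', '1']}
```

With the real file: with open('Binary.txt') as f: rules = parse_rules(f).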
