I was wondering whether there is a way to read all columns in a row except the first one as ints using csv.DictReader, kind of like this:
filename = sys.argv[1]
database = []

with open(filename) as file:
    reader = csv.DictReader(file)
    for row in reader:
        row[1:] = int(row[1:])
        database.append(row)
I know this isn't a correct way to do it, since it raises TypeError: unhashable type: 'slice'. I have a way to avoid doing this at all, but for future reference I'm curious whether, using slices or not, I can selectively interact with columns in a row without hardcoding each one.
You can do it by using the keys() dictionary method to get a list of the keys in each dictionary and slicing that list for the conversion:
import csv
import sys
from pprint import pprint

filename = sys.argv[1]
database = []

with open(filename) as file:
    reader = csv.DictReader(file)
    for row in reader:
        for key in list(row.keys())[1:]:
            row[key] = int(row[key])
        database.append(row)

pprint(database)
Output:
[{'name': 'John', 'number1': 1, 'number2': 2, 'number3': 3, 'number4': 4},
 {'name': 'Alex', 'number1': 4, 'number2': 3, 'number3': 2, 'number4': 1},
 {'name': 'James', 'number1': 1, 'number2': 3, 'number3': 2, 'number4': 4}]
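The loop over the sliced keys can also be compressed into a dict comprehension. A minimal self-contained sketch, with inline data standing in for the question's file (the column names here are illustrative):

```python
import csv
import io

# In-memory stand-in for the CSV file from the question
text = "name,number1,number2\nJohn,1,2\nAlex,4,3\n"

database = []
reader = csv.DictReader(io.StringIO(text))
for row in reader:
    first, *rest = row  # iterating a dict yields its keys, in column order
    converted = {first: row[first], **{k: int(row[k]) for k in rest}}
    database.append(converted)

print(database)
# [{'name': 'John', 'number1': 1, 'number2': 2},
#  {'name': 'Alex', 'number1': 4, 'number2': 3}]
```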
Use this:
import csv

filename = 'test.csv'
database = []

with open(filename) as file:
    reader = csv.DictReader(file)
    for row in reader:
        new_d = {}  # Create new dictionary to be appended into database
        for i, (k, v) in enumerate(row.items()):  # Loop through items of the row (i = index, k = key, v = value)
            new_d[k] = int(v) if i > 0 else v  # If i > 0, add int(v) to the dictionary, else add v
        database.append(new_d)  # Append to database

print(database)
test.csv:
Letter,Num1,Num2
A,123,456
B,789,012
C,345,678
D,901,234
E,567,890
Output:
[{'Letter': 'A', 'Num1': 123, 'Num2': 456},
 {'Letter': 'B', 'Num1': 789, 'Num2': 12},
 {'Letter': 'C', 'Num1': 345, 'Num2': 678},
 {'Letter': 'D', 'Num1': 901, 'Num2': 234},
 {'Letter': 'E', 'Num1': 567, 'Num2': 890}]
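csv.DictReader also exposes the header through reader.fieldnames, so the non-first columns can be picked once, outside the loop. A sketch with inline data in place of test.csv:

```python
import csv
import io

# In-memory stand-in for test.csv
text = "Letter,Num1,Num2\nA,123,456\nB,789,012\n"

database = []
reader = csv.DictReader(io.StringIO(text))
numeric_cols = reader.fieldnames[1:]  # every column except the first
for row in reader:
    for col in numeric_cols:
        row[col] = int(row[col])
    database.append(row)

print(database)
# [{'Letter': 'A', 'Num1': 123, 'Num2': 456}, {'Letter': 'B', 'Num1': 789, 'Num2': 12}]
```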
Related
I am trying to create a dictionary of dictionaries in Python from a CSV file, the file looks something like this:
Column 1    Column 2    Column 3
A           flower      12
A           sun         13
B           cloud       14
B           water       34
C           rock        12
And I am trying to get a dictionary of dictionaries that looks like this:
dict = {
    'A': {'flower': 12, 'sun': 13},
    'B': {'cloud': 14, 'water': 34},
    'C': {'rock': 12}
}
The code I tried so far is as follows:
import csv

with open('file.csv', 'r') as csvFile:
    rows = csv.reader(csvFile)
    d = dict()
    for row in rows:
        head, tail = row[0], row[1:]
        d[head] = dict(zip(tail[0:], tail[1:]))
print(d)
but it's not working well as I am getting this result:
dict = {
    'A': {'sun': 13},
    'B': {'water': 34},
    'C': {'rock': 12}
}
You need to update your d[head] every iteration, not replace it:
import csv

with open('file.csv', 'r') as csvFile:
    rows = csv.reader(csvFile)
    d = {}
    for row in rows:
        head, name, value = row[0], row[1], row[2]
        if head not in d:
            d[head] = {}  # {} is like dict() but faster
        d[head][name] = value
print(d)
Or with defaultdict to be more concise:
import csv
from collections import defaultdict

with open('file.csv', 'r') as csvFile:
    rows = csv.reader(csvFile)
    d = defaultdict(dict)
    for row in rows:
        head, name, value = row[0], row[1], row[2]
        d[head][name] = value
print(d)  # or print(dict(d))
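A runnable sketch of the defaultdict version, using io.StringIO in place of the actual file. The int() conversion is an assumption, since the question's expected output shows numbers rather than strings:

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for file.csv from the question
csv_text = "A,flower,12\nA,sun,13\nB,cloud,14\nB,water,34\nC,rock,12\n"

d = defaultdict(dict)
for head, name, value in csv.reader(io.StringIO(csv_text)):
    d[head][name] = int(value)  # drop int() to keep the values as strings

print(dict(d))
# {'A': {'flower': 12, 'sun': 13}, 'B': {'cloud': 14, 'water': 34}, 'C': {'rock': 12}}
```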
So my question is this. I have these JSON files stored in a list called json_list
['9.json',
'8.json',
'7.json',
'6.json',
'5.json',
'4.json',
'3.json',
'2.json',
'10.json',
'1.json',]
Each of these files contains a dictionary with an (ID NUMBER: Rating).
This is my code below. The idea is to store all of the keys and values of these files into a dictionary so it will be easier to search through. I've separated the keys and values so they will be easier to add into the dictionary. The PROBLEM is that this iteration only goes through the file '1.json' and then stops. I'm not sure why it's not going through all 10.
for i in range(len(json_list)):
    f = open(os.path.join("data", json_list[i]), encoding='utf-8')
    file = f.read()
    f.close()
    data = json.loads(file)
    keys = data.keys()
    values = data.values()
Here:
data = json.loads(file)
keys = data.keys()
values = data.values()
You're resetting the value for keys and values instead of appending to it.
Maybe try appending to lists that are initialised once, before the loop (the dictionary keys MUST be unique in each file or else you'll be overwriting data):
keys = []    # initialise before the loop
values = []
...
data = json.loads(file)
keys += list(data.keys())
values += list(data.values())
Or better yet just append the dictionary (The dictionary keys MUST be unique in each file or else you'll be overwriting data):
all_data = {}
for i in range(len(json_list)):
    f = open(os.path.join("data", json_list[i]), encoding='utf-8')
    file = f.read()
    f.close()
    data = json.loads(file)
    all_data = {**all_data, **data}
Working example:
import json

ds = ['{"1":"a","2":"b","3":"c"}',
      '{"aa":"11","bb":"22","cc":"33", "dd":"44"}',
      '{"foo":"bar","eggs":"spam","xxx":"yyy"}']
all_data = {}
for d in ds:
    data = json.loads(d)
    all_data = {**all_data, **data}
print(all_data)
Output:
{'1': 'a', '2': 'b', '3': 'c', 'aa': '11', 'bb': '22', 'cc': '33', 'dd': '44', 'foo': 'bar', 'eggs': 'spam', 'xxx': 'yyy'}
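Note that `{**all_data, **data}` rebuilds the whole dictionary on every iteration; dict.update mutates the existing dictionary in place and avoids that:

```python
import json

ds = ['{"1":"a","2":"b"}', '{"aa":"11","bb":"22"}']

all_data = {}
for d in ds:
    all_data.update(json.loads(d))  # later keys overwrite earlier duplicates

print(all_data)
# {'1': 'a', '2': 'b', 'aa': '11', 'bb': '22'}
```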
If the keys are not unique try appending the dictionaries to a list of dictionaries like this:
import json

ds = ['{"1":"a","2":"b","3":"c"}',
      '{"aa":"11","bb":"22","cc":"33", "dd":"44"}',
      '{"dd":"bar","eggs":"spam","xxx":"yyy"}']
all_dicts = []
for d in ds:
    data = json.loads(d)
    all_dicts.append(data)
print(all_dicts)

# to access a key
print(all_dicts[0]["1"])
Output:
[{'1': 'a', '2': 'b', '3': 'c'}, {'aa': '11', 'bb': '22', 'cc': '33', 'dd': '44'}, {'dd': 'bar', 'eggs': 'spam', 'xxx': 'yyy'}]
a
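If you need a single dict-like view over the list without merging the dictionaries, collections.ChainMap provides one; lookups scan the mappings left to right, so on duplicate keys the earliest mapping wins:

```python
import json
from collections import ChainMap

ds = ['{"1":"a","2":"b"}', '{"dd":"44"}', '{"dd":"bar","eggs":"spam"}']
dicts = [json.loads(d) for d in ds]

combined = ChainMap(*dicts)  # first mapping wins on duplicate keys
print(combined["1"], combined["dd"], combined["eggs"])
# a 44 spam
```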
I'm trying to create a data structure of nested dictionaries in Python. I read two relational, SQL-table-like CSV files into dataframes and then convert them row by row into dictionaries. Inside these dictionaries I store dictionaries created from another CSV.
My code below works as long as I just store a dictionary directly in a dict key.
But what I actually want is that data[id]['ticket'] contain a list of dictionaries. (1 customer could have multiple tickets)
import collections
import json
import pandas as pd

# Import csv into dataframe (maybe not necessary)
df1 = pd.read_csv('customer.csv', sep=';', header=0, dtype=object, na_filter=False)
df2 = pd.read_csv('tickets.csv', sep=';', header=0, dtype=object, na_filter=False)
df1['tickets'] = ''  # create new empty column in dataframe 1

data = collections.defaultdict(dict)

# Convert initial dataframe to a dictionary of dictionaries
for index, row in df1.iterrows():
    row_dict = row.to_dict()
    data[row_dict['id']] = row_dict
    data[row_dict['id']]['tickets'] = []

# Convert each row of dataframe 2 into a dictionary and store it on the correct key of dict 1
for index, row in df2.iterrows():
    row_dict = row.to_dict()
    data[row_dict['kundenid']]['tickets'].append(row_dict)

with open('json_file', 'w') as f:
    json.dump(data, f, indent=4)
With this code I get a KeyError for 'tickets'. However, when I use data[row_dict['id']]['tickets'] = row_dict to just assign the dict to the key 'tickets', the code works. I just need multiple dicts in this field.
What I finally want to achieve is a dictionary / JSON that looks like this:
{
    "1111": {
        "id": "1111",
        "name": "",
        "adr": "",
        "tickets": [
            {
                "ticketid": "123545",
                "id": "1111"
            },
            {
                "ticketid": "123545",
                "id": "1111"
            }
        ]
    },
    ....
}
How can I store a list of dictionaries under the key tickets?
Edit: Some sample input data:
tickets.csv
id;ticketid;xyz;message
1;9;1;fgsgfs
2;8;2;gdfg
3;7;3;gfsfgfg
4;6;4;fgsfdgfd
5;5;5;dgsgd
6;4;6;dfgsgdf
7;3;7;dfgdhfd
Customer.csv
id;name;surname;address;XID
1;Mueller;Hans;42553;1
2;Meier;Peter;42873;2
3;Schmidt;Micha;42567;213
4;Pauli;Ulli;98790;432
5;Dick;Franz;45632;423
6;Doof;Udo;76543;233
7;Pang;Lars;43232;234
8;Peutz;Lee;11342;4234
Your solution seems to work with the input data provided (see below). Is there something I am missing?
As you point out, you need to test for keys in your second loop, as below. This is only apparent in your full dataset.
Setup
I have modified your data slightly so it demonstrates the problem better.
from collections import defaultdict
import pandas as pd
from io import StringIO
df1 = pd.read_csv(StringIO("""id;name;surname;address;XID
1;Mueller;Hans;42553;1
2;Meier;Peter;42873;2
3;Schmidt;Micha;42567;213"""), sep=';')
df2 = pd.read_csv(StringIO("""id;ticketid;xyz;message
1;9;1;fgsgfs
1;8;2;gdfg
2;7;3;gfsfgfg
2;6;4;fgsfdgfd
3;5;5;dgsgd
3;4;6;dfgsgdf
3;3;7;dfgdhfd"""), sep=';')
Solution
data = defaultdict(dict)

for index, row in df1.iterrows():
    row_dict = row.to_dict()
    data[row_dict['id']] = row_dict
    data[row_dict['id']]['tickets'] = []

for index, row in df2.iterrows():
    row_dict = row.to_dict()
    if row_dict['id'] in data:
        data[row_dict['id']]['tickets'].append(row_dict)
Result
defaultdict(dict,
            {1: {'XID': 1,
                 'address': 42553,
                 'id': 1,
                 'name': 'Mueller',
                 'surname': 'Hans',
                 'tickets': [{'id': 1, 'message': 'fgsgfs', 'ticketid': 9, 'xyz': 1},
                             {'id': 1, 'message': 'gdfg', 'ticketid': 8, 'xyz': 2}]},
             2: {'XID': 2,
                 'address': 42873,
                 'id': 2,
                 'name': 'Meier',
                 'surname': 'Peter',
                 'tickets': [{'id': 2, 'message': 'gfsfgfg', 'ticketid': 7, 'xyz': 3},
                             {'id': 2, 'message': 'fgsfdgfd', 'ticketid': 6, 'xyz': 4}]},
             3: {'XID': 213,
                 'address': 42567,
                 'id': 3,
                 'name': 'Schmidt',
                 'surname': 'Micha',
                 'tickets': [{'id': 3, 'message': 'dgsgd', 'ticketid': 5, 'xyz': 5},
                             {'id': 3, 'message': 'dfgsgdf', 'ticketid': 4, 'xyz': 6},
                             {'id': 3, 'message': 'dfgdhfd', 'ticketid': 3, 'xyz': 7}]}})
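The same nested structure can also be built without explicit row loops by grouping the ticket frame once. A sketch under the same column assumptions, with tiny inline frames (names and values here are illustrative):

```python
from io import StringIO

import pandas as pd

df1 = pd.read_csv(StringIO("id;name\n1;Mueller\n2;Meier"), sep=';')
df2 = pd.read_csv(StringIO("id;ticketid\n1;9\n1;8\n2;7"), sep=';')

# Map each customer id to its list of ticket dicts
tickets = {cid: grp.to_dict('records') for cid, grp in df2.groupby('id')}

data = {}
for row in df1.to_dict('records'):
    row['tickets'] = tickets.get(row['id'], [])  # customers without tickets get []
    data[row['id']] = row

print(data)
```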
Is it possible to read data from a csv file into a dictionary, such that the first row of a column is the key and the remaining rows of that same column constitute the value as a list?
E.g. I have a csv file
strings, numbers, colors
string1, 1, blue
string2, 2, red
string3, 3, green
string4, 4, yellow
using
with open(file, 'rU') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
I obtain
{'color': 'blue', 'string': 'string1', 'number': '1'}
{'color': 'red', 'string': 'string2', 'number': '2'}
{'color': 'green', 'string': 'string3', 'number': '3'}
{'color': 'yellow', 'string': 'string4', 'number': '4'}
or using
with open(file, 'rU') as f:
    reader = csv.reader(f)
    mydict = {rows[0]: rows[1:] for rows in reader}
    print(mydict)
I obtain the following dictionary
{'string3': ['3', 'green'], 'string4': ['4', 'yellow'], 'string2': ['2', 'red'], 'string': ['number', 'color'], 'string1': ['1', 'blue']}
However, I would like to obtain
{'strings': ['string1', 'string2', 'string3', 'string4'], 'numbers': [1, 2, 3,4], 'colors': ['red', 'blue', 'green', 'yellow']}
You need to parse the first row, create the columns, and then progress to the rest of the rows.
For example:
columns = []

with open(file, 'rU') as f:
    reader = csv.reader(f)
    for row in reader:
        if columns:
            for i, value in enumerate(row):
                columns[i].append(value)
        else:
            # first row
            columns = [[value] for value in row]

# you now have a column-major 2D array of your file.
as_dict = {c[0]: c[1:] for c in columns}
print(as_dict)
output:
{
    ' numbers': [' 1', ' 2', ' 3', ' 4'],
    ' colors ': [' blue', ' red', ' green', ' yellow'],
    'strings': ['string1', 'string2', 'string3', 'string4']
}
(Note the stray spaces, which come from your input "file". Remove the spaces before/after the commas, or use value.strip() if they're in your real input.)
This is why we have the defaultdict
from collections import defaultdict
from csv import DictReader

columnwise_table = defaultdict(list)

with open(file, 'rU') as f:
    reader = DictReader(f)
    for row in reader:
        for col, dat in row.items():
            columnwise_table[col].append(dat)

print(columnwise_table)
Yes, it is possible. Try it this way:
import csv
from collections import defaultdict

D = defaultdict(list)
csvfile = open('filename.csv')
reader = csv.DictReader(csvfile)  # DictReader uses the first row as dictionary keys

for l in reader:  # each row is in the form {k1: v1, ..., kn: vn}
    for k, v in l.items():
        D[k].append(v)
...................
...................
Assuming filename.csv has some data like
strings,numbers,colors
string1,1,blue
string2,2,red
string3,3,green
string4,4,yellow
then D will result in
defaultdict(<class 'list'>,
            {'numbers': ['1', '2', '3', '4'],
             'strings': ['string1', 'string2', 'string3', 'string4'],
             'colors': ['blue', 'red', 'green', 'yellow']})
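Another compact option is to transpose the rows with zip(*...) and pair each header with its column. A sketch assuming all rows have the same number of fields, with inline data standing in for filename.csv:

```python
import csv
import io

# In-memory stand-in for filename.csv
text = "strings,numbers,colors\nstring1,1,blue\nstring2,2,red\n"

reader = csv.reader(io.StringIO(text))
header, *rows = reader                 # first row is the header
columns = {key: list(col) for key, col in zip(header, zip(*rows))}

print(columns)
# {'strings': ['string1', 'string2'], 'numbers': ['1', '2'], 'colors': ['blue', 'red']}
```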
So I have a CSV file with the data arranged like this:
X,a,1,b,2,c,3
Y,a,1,b,2,c,3,d,4
Z,l,2,m,3
I want to import the CSV to create a nested dictionary so that looks like this.
data = {'X': {'a': 1, 'b': 2, 'c': 3},
        'Y': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
        'Z': {'l': 2, 'm': 3}}
After updating the dictionary in the program I wrote (I got that part figured out), I want to be able to export the dictionary onto the same CSV file, overwriting/updating it. However I want it to be in the same format as the previous CSV file so that I can import it again.
I have been playing around with the import and have this so far
import csv

data = {}
with open('userdata.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        data[row[0]] = {row[i] for i in range(1, len(row))}
But this doesn't work as things are not arranged correctly. Some numbers are subkeys to other numbers, letters are out of place, etc. I haven't even gotten to the export part yet. Any ideas?
Since you're not interested in preserving order, something relatively simple should work:
import csv

# import
data = {}
with open('userdata.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        a = iter(row[1:])
        data[row[0]] = dict(zip(a, a))

# export
with open('userdata_exported.csv', 'w') as f:
    writer = csv.writer(f)
    for key, values in data.items():
        row = [key] + [value for item in values.items() for value in item]
        writer.writerow(row)
The latter could be done a little more efficiently by making only a single call to the csv.writer's writerows() method and passing it a generator expression.
# export2
with open('userdata_exported.csv', 'w') as f:
    writer = csv.writer(f)
    rows = ([key] + [value for item in values.items() for value in item]
            for key, values in data.items())
    writer.writerows(rows)
You can use the grouper recipe from itertools:
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(fillvalue=fillvalue, *args)
This will group your data into the a1/b2/c3 pairs you want. So you can do data[row[0]] = {k: v for k, v in grouper(row[1:], 2)} in your loop.
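For example, applied to one row from the question's CSV (using the Python 3 name zip_longest; the int() conversion matches the desired output):

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

row = ['X', 'a', '1', 'b', '2', 'c', '3']
data = {row[0]: {k: int(v) for k, v in grouper(row[1:], 2)}}

print(data)
# {'X': {'a': 1, 'b': 2, 'c': 3}}
```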
from collections import defaultdict

data_lines = """X,a,1,b,2,c,3
Y,a,1,b,2,c,3,d,4
Z,l,2,m,3""".splitlines()

data = defaultdict(dict)
for line in data_lines:
    # you should probably add guards against invalid data, empty lines etc.
    main_key, sep, tail = line.partition(',')
    items = [item.strip() for item in tail.split(',')]
    items = zip(items[::2], map(int, items[1::2]))
    # data[main_key] = {key: value for key, value in items}
    data[main_key] = dict(items)

print(dict(data))
# {'X': {'a': 1, 'b': 2, 'c': 3},
#  'Y': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
#  'Z': {'l': 2, 'm': 3}}
I'm lazy, so I might do something like this:
import csv

data = {}
with open('userdata.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        data[row[0]] = dict(zip(row[1::2], map(int, row[2::2])))
which works because row[1::2] gives every other element starting at index 1, and row[2::2] gives every other element starting at index 2. zip pairs up those elements, and then we pass the pairs to dict. This gives
{'Y': {'a': 1, 'c': 3, 'b': 2, 'd': 4},
 'X': {'a': 1, 'c': 3, 'b': 2},
 'Z': {'m': 3, 'l': 2}}
(Note that I changed your open to use 'rb', which is right for Python 2: if you're using 3, you want 'r', newline='' instead.)