So my question is this. I have these JSON files stored in a list called json_list
['9.json',
'8.json',
'7.json',
'6.json',
'5.json',
'4.json',
'3.json',
'2.json',
'10.json',
'1.json',]
Each of these files contains a dictionary mapping ID number to rating.
This is my code below. The idea is to store all of the keys and values of these files into a dictionary so it will be easier to search through. I've separated the keys and values so it will be easier to add to the dictionary. The PROBLEM is that this iteration only goes through the file '1.json' and then stops. I'm not sure why it's not going through all 10.
for i in range(len(json_list)):
    f = open(os.path.join("data", json_list[i]), encoding='utf-8')
    file = f.read()
    f.close()
    data = json.loads(file)
    keys = data.keys()
    values = data.values()
Here:
data = json.loads(file)
keys = data.keys()
values = data.values()
You're resetting the value for keys and values instead of appending to it.
Maybe try appending them, something like (The dictionary keys MUST be unique in each file or else you'll be overwriting data):
data = json.loads(file)
keys += list(data.keys())
values += list(data.values())
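A minimal runnable sketch of that accumulating version (the filenames, ratings, and temp directory here are just illustrative stand-ins for your data/ folder):

```python
import json
import os
import tempfile

# set up a throwaway directory with two sample files (demo only)
tmp = tempfile.mkdtemp()
for name, payload in [('1.json', {"101": 4.5}), ('2.json', {"102": 3.0})]:
    with open(os.path.join(tmp, name), 'w', encoding='utf-8') as f:
        json.dump(payload, f)

keys = []    # initialize once, before the loop,
values = []  # so += extends instead of replacing

for name in ['1.json', '2.json']:
    with open(os.path.join(tmp, name), encoding='utf-8') as f:
        data = json.load(f)
    keys += list(data.keys())
    values += list(data.values())

ratings = dict(zip(keys, values))
print(ratings)  # {'101': 4.5, '102': 3.0}
```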
Or better yet just append the dictionary (The dictionary keys MUST be unique in each file or else you'll be overwriting data):
all_data = {}
for i in range(len(json_list)):
    f = open(os.path.join("data", json_list[i]), encoding='utf-8')
    file = f.read()
    f.close()
    data = json.loads(file)
    all_data = {**all_data, **data}
Working example:
import json
ds = ['{"1":"a","2":"b","3":"c"}','{"aa":"11","bb":"22","cc":"33", "dd":"44"}','{"foo":"bar","eggs":"spam","xxx":"yyy"}']
all_data = {}
for d in ds:
    data = json.loads(d)
    all_data = {**all_data, **data}
print(all_data)
Output:
{'1': 'a', '2': 'b', '3': 'c', 'aa': '11', 'bb': '22', 'cc': '33', 'dd': '44', 'foo': 'bar', 'eggs': 'spam', 'xxx': 'yyy'}
If the keys are not unique try appending the dictionaries to a list of dictionaries like this:
import json
ds = ['{"1":"a","2":"b","3":"c"}','{"aa":"11","bb":"22","cc":"33", "dd":"44"}','{"dd":"bar","eggs":"spam","xxx":"yyy"}']
all_dicts= []
for d in ds:
    data = json.loads(d)
    all_dicts.append(data)
print(all_dicts)
# to access a key
print(all_dicts[0]["1"])
Output:
[{'1': 'a', '2': 'b', '3': 'c'}, {'aa': '11', 'bb': '22', 'cc': '33', 'dd': '44'}, {'dd': 'bar', 'eggs': 'spam', 'xxx': 'yyy'}]
a
I'm fairly new to python and have the following problem. I have a nested dictionary in the form of
dict = {'a': {'1','2'}, 'b':{'5','1'}, 'c':{'3','2'}}
and would like to find all the keys that have the same values. The output should look similar to this.
1 : [a,b]
2 : [a,c]
..
Many thanks in Advance for any help!
dict = {'a': {'1','2'}, 'b':{'5','1'}, 'c':{'3','2'}}
output = {}
for key, value in dict.items():
    for v in value:
        if v in output.keys():
            output[v].append(key)
        else:
            output[v] = [key]
print(output)
And the output will be
{'2': ['a', 'c'], '1': ['a', 'b'], '5': ['b'], '3': ['c']}
Before we get to the solution, lemme tell you something: what you've got there is not a nested dictionary but rather sets within a dictionary.
Some python terminologies to clear that up:
List: [ 1 , 2 ]
Lists (called arrays in some other languages) are enclosed in square brackets & separated by commas.
Dictionary: { "a":1 , "b":2 }
Dictionaries are enclosed in curly braces & separate "key":value pairs with comma. Here, "a" & "b" are keys & 1 & 2 would be their respective values.
Set: { 1 , 2 }
Sets are enclosed in curly braces & separated by commas.
dict = {'a': {'1','2'}, 'b':{'5','1'}, 'c':{'3','2'}}
Here, {'1', '2'} is a set in a dictionary with key 'a'. Thus, what you've got is actually set in a dictionary & not a nested dictionary.
Solution
Moving on to the solution: sets are unordered, but they are iterable, so you can loop through them one by one directly; converting them to a list first is optional.
# Initialize the dictionary to be processed
data = {'a': {'1', '2'}, 'b': {'5', '1'}, 'c': {'3', '2'}}

# Create a dictionary to store the solution
sol = {}  # maps each element to a list of the keys whose sets contain it
# E.g., sol = {"1": ["a", "b"]}
# This shows that the value 1 is present in the sets stored under keys a & b.

# Record every element & list every set containing it
for key in data.keys():        # iterate over all keys in the dictionary
    l = list(data[key])        # convert the set to a list (optional; sets are iterable)
    for elem in l:             # iterate over every element in the list
        if elem in sol.keys():          # check if elem already exists in the solution as a key
            sol[elem].append(key)       # record that this key's set contains elem
        else:
            sol[elem] = [key]           # create a new list recording that this key contains elem

# At this point, sol would be
# {
#     "1": ["a", "b"],
#     "2": ["a", "c"],
#     "3": ["c"],
#     "5": ["b"]
# }

# Since you want only the elements present in more than one set, remove the rest.
# Note: iterate over a copy (list(sol)) -- deleting from a dict while iterating
# over it directly raises a RuntimeError. Also, Python uses len(), not .length.
for key in list(sol):
    if len(sol[key]) < 2:  # only elements found in at least 2 sets are retained
        del sol[key]

# Now you have your required output in sol
print(sol)
# Prints:
# {
#     "1": ["a", "b"],
#     "2": ["a", "c"]
# }
# Prints:
# {
# "1" : [ "a" , "b" ] ,
# "2" : [ "a" , "c" ]
# }
I hope that helps you...
You can use a defaultdict to build the output easily (and sort it if you want the keys in sorted order):
from collections import defaultdict
d = {'a': {'1','2'}, 'b':{'5','1'}, 'c':{'3','2'}}
out = defaultdict(list)
for key, values in d.items():
    for value in values:
        out[value].append(key)
# for a sorted output (dicts are ordered since Python 3.7):
sorted_out = dict((k, out[k]) for k in sorted(out))
print(sorted_out)
#{'1': ['a', 'b'], '2': ['a', 'c'], '3': ['c'], '5': ['b']}
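If you only want the values that appear under more than one key, as in the question's expected output, you can filter the same defaultdict afterwards:

```python
from collections import defaultdict

d = {'a': {'1', '2'}, 'b': {'5', '1'}, 'c': {'3', '2'}}
out = defaultdict(list)
for key, values in d.items():
    for value in values:
        out[value].append(key)

# keep only the values shared by at least two keys
shared = {k: v for k, v in out.items() if len(v) > 1}
print(shared)  # {'1': ['a', 'b'], '2': ['a', 'c']} (key order may vary)
```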
You can reverse the key-value mapping in the dict to create a value-to-key dict. If you only want duplicated values (i.e. all the keys that share a value), you can filter it:
from collections import defaultdict
def get_duplicates(dict1):
    dict2 = defaultdict(list)
    for k, v in dict1.items():
        for c in v:
            dict2[c].append(k)
    # if you want all values, just return dict2
    # return dict2
    return dict(filter(lambda x: len(x[1]) > 1, dict2.items()))
output:
{'1': ['a', 'b'], '2': ['a', 'c']}
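For completeness, the same function as a self-contained snippet with a sample call on the question's dictionary:

```python
from collections import defaultdict

def get_duplicates(dict1):
    # reverse the mapping: value -> list of keys whose sets contain it
    dict2 = defaultdict(list)
    for k, v in dict1.items():
        for c in v:
            dict2[c].append(k)
    # keep only the values that occur under more than one key
    return dict(filter(lambda x: len(x[1]) > 1, dict2.items()))

d = {'a': {'1', '2'}, 'b': {'5', '1'}, 'c': {'3', '2'}}
print(get_duplicates(d))  # {'1': ['a', 'b'], '2': ['a', 'c']} (key order may vary)
```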
This can be easily done using defaultdict from collections,
>>> d = {'a': {'1','2'}, 'b':{'5','1'}, 'c':{'3','2'}}
>>> from collections import defaultdict
>>> dd = defaultdict(list)
>>> for key, vals in d.items():
...     for val in vals:
...         dd[val].append(key)
...
>>> dict(dd)
{'1': ['a', 'b'], '3': ['c'], '2': ['a', 'c'], '5': ['b']}
This can be easily achieved with two inner for loops:
dict = {'a': {'1','2'}, 'b':{'5','1'}, 'c':{'3','2'}}
out = {}
for key in dict:
    for value in dict[key]:
        if value not in out:
            out[value] = [key]
        else:
            out[value] += [key]
print(out)  # {'1': ['a', 'b'], '3': ['c'], '2': ['a', 'c'], '5': ['b']}
I have a question about dictionaries in Python.
Here it is:
I have a dict like dict = { 'abc':'a', 'cdf':'b', 'gh':'a', 'fh':'g', 'hfz':'g' }
Now I want to get all keys that share the same value and save them in a new dict.
The new dict should look like:
new_dict = { 'b':('cdf'), 'a':('abc','gh'), 'g':('fh','hfz')}
If you are fine with lists instead of tuples in the new dictionary, you can use
from collections import defaultdict
some_dict = { 'abc':'a', 'cdf':'b', 'gh':'a', 'fh':'g', 'hfz':'g' }
new_dict = defaultdict(list)
for k, v in some_dict.iteritems():
    new_dict[v].append(k)
If you want to avoid the use of defaultdict, you could also do
new_dict = {}
for k, v in some_dict.iteritems():
    new_dict.setdefault(v, []).append(k)
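If you specifically want the tuples shown in the question's desired output, one option is to build the lists first and convert at the end (sketched here with Python 3's items(); use iteritems() on Python 2):

```python
some_dict = {'abc': 'a', 'cdf': 'b', 'gh': 'a', 'fh': 'g', 'hfz': 'g'}

new_dict = {}
for k, v in some_dict.items():
    new_dict.setdefault(v, []).append(k)

# convert the accumulated lists to tuples to match the desired shape
new_dict = {v: tuple(keys) for v, keys in new_dict.items()}
print(new_dict)  # {'a': ('abc', 'gh'), 'b': ('cdf',), 'g': ('fh', 'hfz')}
```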
Here's a naive implementation. Someone with better Python skills can probably make it more concise and awesome.
dict = { 'abc':'a', 'cdf':'b', 'gh':'a', 'fh':'g', 'hfz':'g' }
new_dict = {}
for pair in dict.items():
    if pair[1] not in new_dict.keys():
        new_dict[pair[1]] = []
    new_dict[pair[1]].append(pair[0])
print(new_dict)
This produces
{'a': ['abc', 'gh'], 'b': ['cdf'], 'g': ['fh', 'hfz']}
If you do specifically want tuples as the values in your new dictionary, you can still use defaultdict, and use tuple concatenation. This solution works in Python 3.4+:
from collections import defaultdict
source = {'abc': 'a', 'cdf': 'b', 'gh': 'a', 'fh': 'g', 'hfz': 'g'}
target = defaultdict(tuple)
for key in source:
    target[source[key]] += (key, )
print(target)
Which will produce
defaultdict(<class 'tuple'>, {'a': ('abc', 'gh'), 'g': ('fh', 'hfz'), 'b': ('cdf',)})
This will probably be slower than generating a dictionary by list insertion, and will create more objects to be collected. So, you can build your dictionary out of lists, and then map it into tuples:
target2 = defaultdict(list)
for key in source:
    target2[source[key]].append(key)
for key in target2:
    target2[key] = tuple(target2[key])
print(target2)
Which will give the same result as above.
It can be done this way too, without using any extra functions.
some_dict = { 'abc':'a', 'cdf':'b', 'gh':'a', 'fh':'g', 'hfz':'g' }
new_dict = { }
for keys in some_dict:
    new_dict[some_dict[keys]] = []
for keys in some_dict:
    new_dict[some_dict[keys]].append(keys)
print(new_dict)
I have 2 long lists (extracted from a csv) both of the same index length.
Example:
l1 = ['Apple','Tomato','Cocos'] #name of product
l2 = ['1','2','3'] #some id's
I made my dictionary with this method:
from collections import defaultdict
d = defaultdict(list)
for x in l1:
    d['Product'].append(x)
for y in l2:
    d['Plu'].append(y)
print(d)
This will output:
{'Product': ['Apple', 'Tomato', 'Cocos'], 'Plu': ['1', '2', '3']}
(Product and Plu are my wanted keys)
Now I've tried to dump this to JSON like this:
import json
print(json.dumps(d, sort_keys=True, indent=4))
This will output:
{
"Plu": [
"1",
"2",
"3"
],
"Product": [
"Apple",
"Tomato",
"Cocos"
]
}
But my desired output is this:
{
Product:'Apple',
Plu:'1'
},
{
Product:'Tomato',
Plu:'2'
},
{
Product:'Cocos',
Plu:'3'
}
I will later use that to insert values in a MongoDB. What will I have to change in my json.dump (or in my dict?) in order to get a desired output? Also is there a way to save the output in a txt file? (since I will have a big code).
Rather than using a defaultdict (which doesn't buy you anything in this case), you're better off zipping the lists and creating a dict from each pair:
[{'Product': product, 'Plu': plu} for product, plu in zip(l1, l2)]
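To also cover the second part of the question (saving the output to a text file for later inserts), a minimal sketch using json.dump; the filename products.json is just an illustration:

```python
import json

l1 = ['Apple', 'Tomato', 'Cocos']  # product names
l2 = ['1', '2', '3']               # some ids

# one document per product, as in the desired output
docs = [{'Product': product, 'Plu': plu} for product, plu in zip(l1, l2)]

# write the list of documents to a text file as JSON
with open('products.json', 'w') as f:
    json.dump(docs, f, sort_keys=True, indent=4)
```

The resulting file is valid JSON, so it can be read back with json.load and passed straight to something like pymongo's insert_many.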
So I have a CSV file with the data arranged like this:
X,a,1,b,2,c,3
Y,a,1,b,2,c,3,d,4
Z,l,2,m,3
I want to import the CSV to create a nested dictionary so that looks like this.
data = {'X' : {'a' : 1, 'b' : 2, 'c' : 3},
'Y' : {'a' : 1, 'b' : 2, 'c' : 3, 'd' : 4},
'Z' : {'l' : 2, 'm' :3}}
After updating the dictionary in the program I wrote (I got that part figured out), I want to be able to export the dictionary onto the same CSV file, overwriting/updating it. However I want it to be in the same format as the previous CSV file so that I can import it again.
I have been playing around with the import and have this so far
import csv
data = {}
with open('userdata.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        data[row[0]] = {row[i] for i in range(1, len(row))}
But this doesn't work as things are not arranged correctly. Some numbers are subkeys to other numbers, letters are out of place, etc. I haven't even gotten to the export part yet. Any ideas?
Since you're not interested in preserving order, something relatively simple should work:
import csv
# import
data = {}
with open('userdata.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        a = iter(row[1:])
        data[row[0]] = dict(zip(a, a))

# export
with open('userdata_exported.csv', 'w') as f:
    writer = csv.writer(f)
    for key, values in data.items():
        row = [key] + [value for item in values.items() for value in item]
        writer.writerow(row)
The latter could be done a little more efficiently by making only a single call to the csv.writer's writerows() method and passing it a generator expression.
# export2
with open('userdata_exported.csv', 'w') as f:
    writer = csv.writer(f)
    rows = ([key] + [value for item in values.items() for value in item]
            for key, values in data.items())
    writer.writerows(rows)
You can use the grouper recipe from itertools:
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)
This will group your data into the a1/b2/c3 pairs you want. So you can do data[row[0]] = {k: v for k, v in grouper(row[1:], 2)} in your loop.
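The recipe above is written for Python 2 (izip_longest); under Python 3 the same function is itertools.zip_longest. A self-contained sketch applying it to a single CSV row:

```python
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

row = ['X', 'a', '1', 'b', '2', 'c', '3']
data = {}
# pair up the tail of the row two elements at a time
data[row[0]] = {k: v for k, v in grouper(row[1:], 2)}
print(data)  # {'X': {'a': '1', 'b': '2', 'c': '3'}}
```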
from collections import defaultdict
data_lines = """X,a,1,b,2,c,3
Y,a,1,b,2,c,3,d,4
Z,l,2,m,3""".splitlines()
data = defaultdict(dict)
for line in data_lines:
    # you should probably add guards against invalid data, empty lines etc.
    main_key, sep, tail = line.partition(',')
    items = [item.strip() for item in tail.split(',')]
    items = zip(items[::2], map(int, items[1::2]))
    # data[main_key] = {key: value for key, value in items}
    data[main_key] = dict(items)
print(dict(data))
# {'Y': {'a': 1, 'c': 3, 'b': 2, 'd': 4},
#  'X': {'a': 1, 'c': 3, 'b': 2},
#  'Z': {'m': 3, 'l': 2}
# }
I'm lazy, so I might do something like this:
import csv
data = {}
with open('userdata.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        data[row[0]] = dict(zip(row[1::2], map(int, row[2::2])))
which works because row[1::2] gives every other element starting at index 1, and row[2::2] every other element starting at index 2. zip makes tuple pairs of those elements, and then we pass them to dict. This gives
{'Y': {'a': 1, 'c': 3, 'b': 2, 'd': 4},
'X': {'a': 1, 'c': 3, 'b': 2},
'Z': {'m': 3, 'l': 2}}
(Note that I changed your open to use 'rb', which is right for Python 2: if you're using 3, you want 'r', newline='' instead.)
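For reference, a self-contained Python 3 version of the same idea (an in-memory buffer stands in for the real file here so the snippet runs as-is):

```python
import csv
import io

# demo stand-in for open('userdata.csv', 'r', newline='')
buf = io.StringIO("X,a,1,b,2,c,3\nY,a,1,b,2,c,3,d,4\nZ,l,2,m,3\n")

data = {}
for row in csv.reader(buf):
    # pair every other element starting at 1 with every other starting at 2
    data[row[0]] = dict(zip(row[1::2], map(int, row[2::2])))

print(data)
# {'X': {'a': 1, 'b': 2, 'c': 3}, 'Y': {'a': 1, 'b': 2, 'c': 3, 'd': 4}, 'Z': {'l': 2, 'm': 3}}
```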