Collection to 2d list in Python - python

I am trying to pass a MongoDB collection to a python 2d list. I need for each sublist to contain only the values of each key within the document. For example, if the MongoDB documents are:
{
_id: ObjectId("5099803df3f4948bd2f98392"),
name: 'marie',
age: '23',
gender: 'female'
}
and
{
_id: ObjectId("5099803df3f4948bd2f98391"),
name: 'john',
age: '43',
gender: 'male'
}
I need to get something like:
[
[ObjectId("5099803df3f4948bd2f98392"), 'marie', '23', 'female'],
[ObjectId("5099803df3f4948bd2f98391"), 'john', '43', 'male']
]
I am new to MongoDB and PyMongo. For now, the closest I have been able to do is something like this:
people = mongo.db.population
people_key_list = ['_id', 'name', 'age', 'gender']
people_list = []
for item in people.find():
    people_list.append(item)
But the structure of the results is not really what I need:
[
['ObjectId("5099803df3f4948bd2f98392")','ObjectId("5099803df3f4948bd2f98391")'],
['marie','john'],['23','43'],['female','male']
]
I could rotate the 2d list, but I am sure there should be a way to generate the structure I need efficiently from the start... but can't figure out how.

You already have people_key_list defined with the names of the keys, so just do a map over that list and extract the values:
from pymongo import MongoClient
from bson import ObjectId
data = [
{
'_id': ObjectId("5099803df3f4948bd2f98392"),
'name': 'marie',
'age': '23',
'gender': 'female'
},
{
'_id': ObjectId("5099803df3f4948bd2f98391"),
'name': 'john',
'age': '43',
'gender': 'male'
}
]
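client = MongoClient()
db = client['test']
db.population.delete_many({})  # clear the collection; Collection.remove() was removed in PyMongo 4
db.population.insert_many(data)
people_key_list = ['_id', 'name', 'age', 'gender']
people_list = []
for person in db.population.find():
    # list() is needed on Python 3, where map() returns a lazy iterator
    people_list.append(list(map(lambda k: person[k], people_key_list)))
print(people_list)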
Or even just nest the map for that matter:
people_list = list(map(
    lambda person: list(map(lambda k: person[k], people_key_list)),
    db.population.find()
))
Either would return (the u prefix on the strings appears only under Python 2):
[
[ObjectId('5099803df3f4948bd2f98392'), u'marie', u'23', u'female'],
[ObjectId('5099803df3f4948bd2f98391'), u'john', u'43', u'male']
]
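The same row extraction can also be written as a nested list comprehension, which yields real lists on both Python 2 and Python 3. The sketch below is only an illustration: it uses plain dictionaries as stand-ins for the documents a cursor would yield (an assumption, so that it runs without a MongoDB server), and the helper name rows_from_documents is hypothetical.

```python
def rows_from_documents(documents, keys):
    """Return one list per document, holding the values of `keys` in order."""
    return [[doc[key] for key in keys] for doc in documents]


# Plain dicts standing in for the documents from db.population.find() (assumption).
documents = [
    {'_id': 1, 'name': 'marie', 'age': '23', 'gender': 'female'},
    {'_id': 2, 'name': 'john', 'age': '43', 'gender': 'male'},
]
people_key_list = ['_id', 'name', 'age', 'gender']

people_list = rows_from_documents(documents, people_key_list)
print(people_list)
```

Because the key list fixes the column order, every row comes out in the same order even if individual documents store their fields differently.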

Related

List of Dictionary - How to combine a list of dictionary Python

a =[{
"id":"1",
"Name":'BK',
"Age":'56'
},
{
"id":"1",
"Sex":'Male'
},
{
"id":"2",
"Name":"AK",
"Age":"32"
}]
I have a list of dictionaries in which one person's information is split across several dictionaries, as above. For example, the information for id 1 is contained in the first two dictionaries. How can I get the output below?
{1: {'Name':'BK','Age':'56','Sex':'Male'}, 2: { 'Name': 'AK','Age':'32'}}
You can use a defaultdict to collect the results.
from collections import defaultdict
a =[{ "id":"1", "Name":'BK', "Age":'56' }, { "id":"1", "Sex":'Male' }, { "id":"2", "Name":"AK", "Age":"32" }]
results = defaultdict(dict)
for a_dict in a:
    # pop() removes 'id' from each input dict and returns its value
    results[a_dict.pop('id')].update(a_dict)
This gives you:
>>> results
defaultdict(<class 'dict'>, {'1': {'Name': 'BK', 'Age': '56', 'Sex': 'Male'}, '2': {'Name': 'AK', 'Age': '32'}})
The defaultdict type behaves like a normal dict, except that when you look up a key that does not exist yet, a default value is created, stored under that key, and returned. This means that as the dicts in a are iterated over, their values (except for id) are merged into either an existing dict or an automatically created new one.
How does collections.defaultdict work?
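To make the defaultdict behaviour concrete, here is a minimal sketch; the variable names groups and has_two are made up for this illustration. It shows that indexing a missing key creates the entry, while a plain membership test does not.

```python
from collections import defaultdict

groups = defaultdict(dict)

# Indexing a missing key calls the factory (dict), stores the new
# empty dict under that key, and returns it, so update() just works.
groups['1'].update({'Name': 'BK'})
groups['1'].update({'Sex': 'Male'})

# A membership test does not trigger the factory, so no entry is created.
has_two = '2' in groups

print(dict(groups))
print(has_two)
```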
Using defaultdict
from collections import defaultdict
a = [{
"id": "1",
"Name": 'BK',
"Age": '56'
},
{
"id": "1",
"Sex": 'Male'
},
{
"id": "2",
"Name": "AK",
"Age": "32"
}
]
final_ = defaultdict(dict)
for row in a:
    final_[row.pop('id')].update(row)
print(final_)
defaultdict(<class 'dict'>, {'1': {'Name': 'BK', 'Age': '56', 'Sex': 'Male'}, '2': {'Name': 'AK', 'Age': '32'}})
You can combine 2 dictionaries by using the .update() function
dict_a = { "id":"1", "Name":'BK', "Age":'56' }
dict_b = { "id":"1", "Sex":'Male' }
dict_a.update(dict_b) # {'Age': '56', 'Name': 'BK', 'Sex': 'Male', 'id': '1'}
Since the output you want is a dictionary:
combined_dict = {}
for item in a:
    item_id = item.pop("id")  # pop() removes the id key from item and returns its value
    if item_id in combined_dict:
        combined_dict[item_id].update(item)
    else:
        combined_dict[item_id] = item
print(combined_dict)  # {'1': {'Name': 'BK', 'Age': '56', 'Sex': 'Male'}, '2': {'Name': 'AK', 'Age': '32'}}
from collections import defaultdict
result = defaultdict(dict)
a =[{ "id":"1", "Name":'BK', "Age":'56' }, { "id":"1", "Sex":'Male' }, { "id":"2", "Name":"AK", "Age":"32" }]
for b in a:
    # b is not modified, so each merged dict also keeps its 'id' key
    result[b['id']].update(b)
print(result)
d = {}
for p in a:
    id = p["id"]
    if id not in d:
        d[id] = p
    else:
        d[id] = {**d[id], **p}
d is the result dictionary you want.
In the for loop, if you encounter an id for the first time, you just store the incomplete dict.
If the id is already among the existing keys, you update its entry.
The combination happens in {**d[id], **p}, where ** unpacks a dict.
It unpacks the existing incomplete dict associated with the id and the current dict, then combines them into a new dict.
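Most of the answers above either modify the input dictionaries (via pop) or keep the id inside each merged dict. The following is a minimal sketch of a variant that leaves the input untouched and uses integer keys, as in the output the question asks for. The function name merge_by_id is hypothetical, and converting with int() assumes every id is a numeric string.

```python
def merge_by_id(records, id_key='id'):
    """Merge dicts that share the same id, without mutating the input."""
    merged = {}
    for record in records:
        key = int(record[id_key])  # assumption: ids are numeric strings
        # Copy every field except the id itself.
        fields = {k: v for k, v in record.items() if k != id_key}
        merged.setdefault(key, {}).update(fields)
    return merged


a = [
    {"id": "1", "Name": "BK", "Age": "56"},
    {"id": "1", "Sex": "Male"},
    {"id": "2", "Name": "AK", "Age": "32"},
]
result = merge_by_id(a)
print(result)
```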

How to order the keys of Dictionary

I have the below dictionary and I would like to print the keys in the order:
[Name,Gender,Occupation,Location]
{'Gender': 'Male',
'Location': 'Nizampet,Hyderabad',
'Name': 'Srikanth',
'Occupation': 'Data Scientist'}
Can someone suggest how it can be done.
You can arrange the order of keys when you create the dictionary.
Python 3.7+
In Python 3.7.0 the insertion-order preservation nature of
dict objects has been declared to be an official part of the Python
language spec. Therefore, you can depend on it.
old_dict = {'Gender': 'Male', 'Location': 'Nizampet,Hyderabad', 'Name': 'Srikanth', 'Occupation': 'Data Scientist'}
new_dict = {
"Name": old_dict["Name"],
"Gender": old_dict["Gender"],
"Occupation": old_dict["Occupation"],
"Location": old_dict["Location"]
}
print(new_dict)
Output
{'Name': 'Srikanth', 'Gender': 'Male', 'Occupation': 'Data Scientist', 'Location': 'Nizampet,Hyderabad'}
If you have a predetermined order in which you'd like to print the data, store the order in a list of keys and loop through them.
data = {
'Gender': 'Male',
'Location': 'Nizampet,Hyderabad',
'Name': 'Srikanth',
'Occupation': 'Data Scientist'
}
print_order = [
'Name',
'Gender',
'Occupation',
'Location',
]
for key in print_order:
    print(f'{key}: {data[key]}')
Output:
$ python3 print_dict_in_order.py
Name: Srikanth
Gender: Male
Occupation: Data Scientist
Location: Nizampet,Hyderabad
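The two ideas above (rebuilding the dict in the desired order and keeping the order in a list of keys) can be combined in a dict comprehension. This is a small sketch that relies on dicts preserving insertion order, which is guaranteed from Python 3.7 onwards; the name ordered is made up for the example.

```python
data = {
    'Gender': 'Male',
    'Location': 'Nizampet,Hyderabad',
    'Name': 'Srikanth',
    'Occupation': 'Data Scientist',
}
print_order = ['Name', 'Gender', 'Occupation', 'Location']

# Build a new dict whose insertion order follows print_order (Python 3.7+).
ordered = {key: data[key] for key in print_order}
print(list(ordered))
```

A key listed in print_order but missing from data raises KeyError, which is usually what you want when the order list is meant to be exhaustive.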

How to get length of dict keys after specific element?

There is a dict
example_dict = {'spend': '3.91',
'impressions': '791',
'clicks': '19',
'campaign_id': '1111',
'date_start': '2017-11-01',
'date_stop': '2019-11-27',
'age': '18-24',
'gender': 'male'}
I have to check whether there are any additional keys after the date_stop key and, if so, get how many there are and their names.
So far I made a list of keys
list_keys = list(example_dict.keys())
list_keys =
['spend',
'impressions',
'clicks',
'campaign_id',
'date_start',
'date_stop',
'age',
'gender']
Checking that the 'date_stop' element is present is simple:
if 'date_stop' in list_keys:
    # what next
But I am not sure how to proceed. I appreciate any help.
I think this should be implemented in a different way rather than relying on the order of the keys in a dict, but if you really want to do it this way you could use OrderedDict from collections:
from collections import OrderedDict
my_dict = {
'spend': '3.91',
'impressions': '791',
'clicks': '19',
'campaign_id': '1111',
'date_start': '2017-11-01',
'date_stop': '2019-11-27',
'age': '18-24',
'gender': 'male'
}
# Keep the insertion order (Python 3.7+); sorting the keys would lose the original positions
ordered_dict = OrderedDict(my_dict)
if 'date_stop' in ordered_dict:
    keys = list(ordered_dict.keys())
    index = keys.index('date_stop')
    after_list = keys[index + 1:]  # only the keys that come after 'date_stop'
    print('len: ', len(after_list))
    print('list: ', after_list)
Use the code below:
new_dict = {}
list_keys = list(example_dict.keys())
k = ""
for i in list_keys:
    if 'date_stop' == i:
        k = "done"
    if k == "done":
        new_dict[i] = len(i)  # maps each key name to the length of that name
Output:
{'date_stop': 9, 'age': 3, 'gender': 6}
I hope I understood your question.
If you want just the names and the number of keys, use this (note that the result includes 'date_stop' itself):
new_dict = []
list_keys = list(example_dict.keys())
k = ""
for i in list_keys:
    if 'date_stop' == i:
        k = "done"
    if k == "done":
        new_dict.append(i)
print(new_dict)
print(len(new_dict))
Output:
['date_stop', 'age', 'gender']
3
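For comparison, the keys that come strictly after a marker key can also be obtained with a single list slice. This is only a sketch: the helper name keys_after is hypothetical, and it assumes the dict preserves insertion order (Python 3.7+).

```python
def keys_after(mapping, marker):
    """Return the keys that come after `marker`, or [] if it is absent."""
    keys = list(mapping)
    if marker not in keys:
        return []
    return keys[keys.index(marker) + 1:]


example_dict = {
    'spend': '3.91',
    'impressions': '791',
    'clicks': '19',
    'campaign_id': '1111',
    'date_start': '2017-11-01',
    'date_stop': '2019-11-27',
    'age': '18-24',
    'gender': 'male',
}

extra = keys_after(example_dict, 'date_stop')
print(len(extra), extra)
```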

Creating a dictionary from two lists in python

I have JSON data as below.
input_list = [["Richard",[],{"children":"yes","divorced":"no","occupation":"analyst"}],
["Mary",["testing"],{"children":"no","divorced":"yes","occupation":"QA analyst","location":"Seattle"}]]
I have another list that holds the prospective keys:
list_keys = ['name', 'current_project', 'details']
I am trying to create a dict from both to make the data usable for metrics.
I have shortened both lists for this question; the real ones are much longer. input_list is a nested list with 500k+ elements, and each element has 70+ elements of its own (except the details one).
list_keys also has 70+ elements in it.
I tried to create a dict using zip, but that is not helping given the size of the data. Also, with zip I am not able to exclude the "details" key from the result.
I am expecting output something like this:
[
{
"name": "Richard",
"current_project": "",
"children": "yes",
"divorced": "no",
"occupation": "analyst"
},
{
"name": "Mary",
"current_project" :"testing",
"children": "no",
"divorced": "yes",
"occupation": "QA analyst",
"location": "Seattle"
}
]
I have tried this so far
>>> for line in input_list:
...     zipbObj = zip(list_keys, line)
...     dictOfWords = dict(zipbObj)
...
>>> print(dictOfWords)
{'current_project': ['testing'], 'name': 'Mary', 'details': {'location': 'Seattle', 'children': 'no', 'divorced': 'yes', 'occupation': 'QA analyst'}}
But with this I am unable to get rid of the nested dict key "details", so I am looking for help with that.
It seems like what you wanted was a list of dictionaries. Here is something I coded up in the terminal and copied in here. Hope it helps.
>>> list_of_dicts = []
>>> for item in input_list:
...     record = {}  # named record rather than dict, to avoid shadowing the built-in
...     for i in range(0, len(item)-2, 3):
...         record[list_keys[0]] = item[i]
...         record[list_keys[1]] = item[i+1]
...         record.update(item[i+2])
...     list_of_dicts.append(record)
...
>>> list_of_dicts
[{'name': 'Richard', 'current_project': [], 'children': 'yes', 'divorced': 'no', 'occupation': 'analyst'},
 {'name': 'Mary', 'current_project': ['testing'], 'children': 'no', 'divorced': 'yes', 'occupation': 'QA analyst', 'location': 'Seattle'}]
I will mention it is not the ideal method of doing this since it relies on perfectly ordered items in the input_list.
people = input_list = [["Richard",[],{"children":"yes","divorced":"no","occupation":"analyst"}],
["Mary",["testing"],{"children":"no","divorced":"yes","occupation":"QA analyst","location":"Seattle"}]]
list_keys = ['name', 'current_project', 'details']
listout = []
for person in people:
    dict_p = {}
    for key in list_keys:
        if not key == 'details':
            dict_p[key] = person[list_keys.index(key)]
        else:
            subdict = person[list_keys.index(key)]
            for subkey in subdict.keys():
                dict_p[subkey] = subdict[subkey]
    listout.append(dict_p)
listout
The issue with using zip is that you have that additional dictionary in the people list. This will get the following output, and should work through a larger list of individuals:
[{'name': 'Richard',
'current_project': [],
'children': 'yes',
'divorced': 'no',
'occupation': 'analyst'},
{'name': 'Mary',
'current_project': ['testing'],
'children': 'no',
'divorced': 'yes',
'occupation': 'QA analyst',
'location': 'Seattle'}]
This script goes through every item of input_list and creates a new list of flat dictionaries, with no nested lists or dictionaries left in the values:
input_list = [
["Richard",[],{"children":"yes","divorced":"no","occupation":"analyst"}],
["Mary",["testing"],{"children":"no","divorced":"yes","occupation":"QA analyst","location":"Seattle"}]
]
list_keys = ['name', 'current_project', 'details']
out = []
for item in input_list:
    d = {}
    out.append(d)
    for value, keyname in zip(item, list_keys):
        if isinstance(value, dict):
            d.update(**value)
        elif isinstance(value, list):
            if value:
                d[keyname] = value[0]
            else:
                d[keyname] = ''
        else:
            d[keyname] = value
from pprint import pprint
pprint(out)
Prints:
[{'children': 'yes',
'current_project': '',
'divorced': 'no',
'name': 'Richard',
'occupation': 'analyst'},
{'children': 'no',
'current_project': 'testing',
'divorced': 'yes',
'location': 'Seattle',
'name': 'Mary',
'occupation': 'QA analyst'}]
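Since the question mentions 500k+ rows and 70+ keys, a generator may be worth considering so that records are produced one at a time instead of building everything up front. The sketch below is one possible shape, not a benchmarked solution: the function name flatten_records and the nested_key parameter are hypothetical, and it assumes the value at the "details" position is always a dict.

```python
def flatten_records(rows, keys, nested_key='details'):
    """Yield one flat dict per row, merging the nested dict into the record."""
    for row in rows:
        record = {}
        for key, value in zip(keys, row):
            if key == nested_key:
                record.update(value)  # assumption: this value is always a dict
            else:
                record[key] = value
        yield record


input_list = [
    ["Richard", [], {"children": "yes", "divorced": "no", "occupation": "analyst"}],
    ["Mary", ["testing"], {"children": "no", "divorced": "yes",
                           "occupation": "QA analyst", "location": "Seattle"}],
]
list_keys = ['name', 'current_project', 'details']

records = list(flatten_records(input_list, list_keys))
print(records)
```

Iterating over flatten_records(...) directly, instead of wrapping it in list(), keeps only one record in memory at a time.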

Find item in a list of dictionaries

I have this data
data = [
{
'id': 'abcd738asdwe',
'name': 'John',
'mail': 'test#test.com',
},
{
'id': 'ieow83janx',
'name': 'Jane',
'mail': 'test#foobar.com',
}
]
The ids are unique; it is impossible for multiple dictionaries to have the same id.
For example, I want to get the item with the id "ieow83janx".
My current solution looks like this:
search_id = 'ieow83janx'
item = [x for x in data if x['id'] == search_id][0]
Do you think that's the best solution, or does anyone know an alternative?
Since the ids are unique, you can store the items in a dictionary to achieve O(1) lookup.
lookup = {ele['id']: ele for ele in data}
Then you can retrieve an item with:
user_info = lookup[search_id]
If you are going to perform this kind of operation more than once on this particular object, I would recommend converting it into a dictionary with the id as the key.
data = [
{
'id': 'abcd738asdwe',
'name': 'John',
'mail': 'test#test.com',
},
{
'id': 'ieow83janx',
'name': 'Jane',
'mail': 'test#foobar.com',
}
]
data_dict = {item['id']: item for item in data}
#=> {'ieow83janx': {'mail': 'test#foobar.com', 'id': 'ieow83janx', 'name': 'Jane'}, 'abcd738asdwe': {'mail': 'test#test.com', 'id': 'abcd738asdwe', 'name': 'John'}}
data_dict['ieow83janx']
#=> {'mail': 'test#foobar.com', 'id': 'ieow83janx', 'name': 'Jane'}
In this case, each lookup costs constant O(1) time instead of O(N), after the one-time O(N) cost of building the dictionary.
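When an id might be absent, indexing the lookup dictionary with square brackets raises KeyError. A small sketch of using dict.get instead is shown below; the names index_by_id, found, and missing are made up for the example, and the mail values are simply copied from the data above.

```python
data = [
    {'id': 'abcd738asdwe', 'name': 'John', 'mail': 'test#test.com'},
    {'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'},
]

# Build the index once (O(N)); each later lookup is O(1) on average.
index_by_id = {item['id']: item for item in data}

found = index_by_id.get('ieow83janx')         # the matching dict
missing = index_by_id.get('does_not_exist')   # None instead of a KeyError

print(found)
print(missing)
```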
How about the built-in next function:
>>> data = [
... {
... 'id': 'abcd738asdwe',
... 'name': 'John',
... 'mail': 'test#test.com',
... },
... {
... 'id': 'ieow83janx',
... 'name': 'Jane',
... 'mail': 'test#foobar.com',
... }
... ]
>>> search_id = 'ieow83janx'
>>> next(x for x in data if x['id'] == search_id)
{'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'}
EDIT:
It raises StopIteration if no match is found, which is a beautiful way to handle absence:
>>> search_id = 'does_not_exist'
>>> try:
...     next(x for x in data if x['id'] == search_id)
... except StopIteration:
...     print('Handled absence!')
...
Handled absence!
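As an alternative to catching StopIteration, next also accepts a default value as its second argument, which is returned when the iterator is exhausted. A short sketch, with the helper name find_by_id made up for the example:

```python
data = [
    {'id': 'abcd738asdwe', 'name': 'John'},
    {'id': 'ieow83janx', 'name': 'Jane'},
]


def find_by_id(items, search_id):
    """Return the first item with a matching id, or None if there is none."""
    # The generator expression needs its own parentheses when a default is passed.
    return next((x for x in items if x['id'] == search_id), None)


match = find_by_id(data, 'ieow83janx')
no_match = find_by_id(data, 'does_not_exist')
print(match)
print(no_match)
```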
Without creating a new dictionary or writing several lines of code, you can use the built-in filter function to get the item lazily, so that iteration stops as soon as the match is found (this holds in Python 3, where filter returns an iterator).
next(filter(lambda d: d['id']==search_id, data))
should work just fine.
Would this not achieve your goal?
for i in data:
    if i.get('id') == 'ieow83janx':
        print(i)
$ python3.7 split.py
{'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'}
Using comprehension:
[i for i in data if i.get('id') == 'ieow83janx']
if any(item['id'] == 'ieow83janx' for item in data):
    print('found')  # any() only reports whether a match exists; it does not return the item
The any function returns True if at least one element of the iterable (the list of dictionaries in your case) satisfies the condition.
With a generator expression there is no need to create an intermediate list. Since there are no duplicate ids in the list of dictionaries, any stops iterating as soon as the condition is true, i.e. it short-circuits. A list comprehension creates the entire list in memory, whereas a generator expression produces elements on the fly, which is better for large inputs because it uses less memory.
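The short-circuiting can be observed directly by recording which items get examined. In this sketch the third record, the examined list, and the is_match helper are all made up for the illustration.

```python
data = [
    {'id': 'abcd738asdwe'},
    {'id': 'ieow83janx'},
    {'id': 'zzz'},  # hypothetical extra record, never reached below
]

examined = []


def is_match(item, search_id):
    """Record that the item was inspected, then test its id."""
    examined.append(item['id'])
    return item['id'] == search_id


# any() consumes the generator lazily and stops at the first True.
found = any(is_match(item, 'ieow83janx') for item in data)

print(found)
print(examined)
```

Replacing the generator expression with a list comprehension (square brackets inside any) would evaluate is_match for all three items before any starts looking at the results.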
