I have the following data, read from a CSV file:
venues =[{'capacity': 700, 'id': 1, 'name': 'AMD'},
{'capacity': 2000, 'id': 2, 'name': 'Honda'},
{'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
{'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}]
I get the unique keys with:
b = list({k for d in venues for k in d.keys()})
which returns them in arbitrary order:
['name', 'capacity', 'id']
I would like to sort the unique keys in the following order:
sorted_keys = ['id','name','capacity']
How can I achieve this?
In Python, tuples are compared element by element, so a key function that builds a tuple from each dictionary sorts the rows by id, then by name, then by capacity.
>>> sorted(venues, key=lambda row: (row['id'], row['name'], row['capacity']))
To be slightly more concise, you could use operator.itemgetter.
>>> from operator import itemgetter
>>> sorted(venues, key=itemgetter('id','name','capacity'))
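To make the element-wise comparison concrete, here is a minimal sketch. The shortened venues list and the choice of sorting by capacity and then name are assumptions made for illustration, not part of the original answer.

```python
from operator import itemgetter

# A shortened, deliberately unsorted copy of the venues data (illustrative).
venues = [
    {'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'},
    {'capacity': 700, 'id': 1, 'name': 'AMD'},
    {'capacity': 2000, 'id': 2, 'name': 'Honda'},
]

# Tuples compare position by position: the first differing element decides.
assert (2000, 'Austin Ventures') < (2000, 'Honda')
assert (700, 'Zed') < (2000, 'AMD')

# itemgetter with several keys returns a tuple, so ties on capacity
# are broken by name.
by_capacity_then_name = sorted(venues, key=itemgetter('capacity', 'name'))
names_in_order = [v['name'] for v in by_capacity_then_name]
print(names_in_order)  # ['AMD', 'Austin Ventures', 'Honda']
```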
You can use the list's sort() method and its key parameter to define the sort criterion:
venues =[{'capacity': 700, 'id': 1, 'name': 'AMD'},
{'capacity': 2000, 'id': 2, 'name': 'Honda'},
{'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
{'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}]
venues.sort(key=lambda x: x["capacity"])
print(venues)
Output (in this case the list is sorted by capacity):
[{'capacity': 700, 'id': 1, 'name': 'AMD'}, {'capacity': 2000, 'id': 2, 'name': 'Honda'}, {'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}, {'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'}]
You can also sort by several keys at once:
venues =[{'capacity': 700, 'id': 1, 'name': 'AMD'},
{'capacity': 2000, 'id': 2, 'name': 'Honda'},
{'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
{'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'}]
venues.sort(key=lambda x: (x["id"], x["name"], x["capacity"]))
print(venues)
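To get your sort order for these three keys, you could use the key length as the sort key. This only works because 'id', 'name' and 'capacity' happen to have increasing lengths.
b = sorted(b, key=len)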
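If the order should not depend on how long the names happen to be, one option is to rank the keys against an explicit list. This is a sketch under assumptions: preferred_order and rank are hypothetical names, and sending unknown keys to the end in alphabetical order is an arbitrary choice.

```python
venues = [
    {'capacity': 700, 'id': 1, 'name': 'AMD'},
    {'capacity': 2000, 'id': 2, 'name': 'Honda'},
    {'capacity': 2300, 'id': 3, 'name': 'Austin Kiddie Limits'},
    {'capacity': 2000, 'id': 4, 'name': 'Austin Ventures'},
]

# Hypothetical: the order the asker wants the keys in.
preferred_order = ['id', 'name', 'capacity']
rank = {key: position for position, key in enumerate(preferred_order)}

unique_keys = {k for d in venues for k in d}

# Known keys sort by their rank; any key missing from preferred_order
# gets rank len(rank) and therefore goes last, alphabetically.
sorted_keys = sorted(unique_keys, key=lambda k: (rank.get(k, len(rank)), k))
print(sorted_keys)  # ['id', 'name', 'capacity']
```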
I have two lists of dictionaries:
a = [{'id':1, 'name':'John Doe'}, {'id':2, 'name':'Jane Doe'}, {'id':4, 'name':'Sample Doe'}]
b = [{'id':1, 'rating':9}, {'id':2, 'rating':7}, {'id':3, 'rating':8}]
Is there a way to merge each entry of b into the entry of a with the same id, using a rating of 0 when b has no matching id? The expected result is:
[{'id':1, 'name':'John Doe', 'rating':9}, {'id':2, 'name':'Jane Doe', 'rating':7}, {'id':4, 'name':'Sample Doe', 'rating':0}]
You could use the dictionary merge operator (|) introduced in Python 3.9:
>>> a = [{'id': 1, 'name': 'John Doe'}, {'id': 2, 'name': 'Jane Doe'}, {'id': 4, 'name': 'Sample Doe'}]
>>> b = [{'id': 1, 'rating': 9}, {'id': 2, 'rating': 7}, {'id': 3, 'rating': 8}]
>>> b_id_to_d = {d['id']: d for d in b} # Create for O(1) lookup time by id.
>>> b_id_to_d
{1: {'id': 1, 'rating': 9}, 2: {'id': 2, 'rating': 7}, 3: {'id': 3, 'rating': 8}}
>>> c = [d | b_id_to_d.get(d['id'], {'rating': 0}) for d in a]
>>> c
[{'id': 1, 'name': 'John Doe', 'rating': 9}, {'id': 2, 'name': 'Jane Doe', 'rating': 7}, {'id': 4, 'name': 'Sample Doe', 'rating': 0}]
For older versions of Python, you can use dict unpacking instead:
>>> c = [{**d, **b_id_to_d.get(d['id'], {'rating': 0})} for d in a]
>>> c
[{'id': 1, 'name': 'John Doe', 'rating': 9}, {'id': 2, 'name': 'Jane Doe', 'rating': 7}, {'id': 4, 'name': 'Sample Doe', 'rating': 0}]
This should work:
[{**item1, **item2} for item1 in a for item2 in b if item1['id'] == item2['id']]
It iterates over both lists, so it is O(n^2), but it is clear and concise.
{**item1, **item2} builds a new dict from the key-value pairs of item1, followed by the key-value pairs of item2.
Here, the result will be as follows (id 4 is dropped because it has no match in b):
[{'id': 1, 'name': 'John Doe', 'rating': 9},
{'id': 2, 'name': 'Jane Doe', 'rating': 7}]
There is no built-in one-step solution to this problem, but you can use the following code:
a = [{'id':1, 'name':'John Doe'}, {'id':2, 'name':'Jane Doe'}]
b = [{'id':1, 'rating':9}, {'id':2, 'rating':7}, {'id':3, 'rating':8}]
# Map each id in a to its position in a.
key_pos_mapping = {}
for index, item in enumerate(a):
    key_pos_mapping[item['id']] = index
# Build a new list instead of calling b.remove() inside the loop,
# because removing items while iterating can skip elements.
merged = []
for item in b:
    if item['id'] in key_pos_mapping:
        merged.append({**item, **a[key_pos_mapping[item['id']]]})
b = merged
I'm trying to extract values from this JSON file, but I'm having trouble extracting the data inside the lists that appear as dict values. For example, for city and state I would like to get only the name values, and then create a pandas DataFrame containing only some of the keys.
I tried using for loops with the get method, but without success.
{'birthday': ['1987-07-13T00:00:00.000Z'],
'cpf': ['9999999999999'],
'rg': [],
'gender': ['Feminino'],
'email': ['my_user#bol.com.br'],
'phone_numbers': ['51999999999'],
'photo': [],
'id': 11111111,
'duplicate_id': -1,
'name': 'My User',
'cnpj': [],
'company_name': '[]',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
'type': 'Private Person',
'tags': [],
'pending_tickets_count': 0}
In [123]: data
Out[123]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
'cpf': ['9999999999999'],
'rg': [],
'gender': ['Feminino'],
'email': ['my_user#bol.com.br'],
'phone_numbers': ['51999999999'],
'photo': [],
'id': 11111111,
'duplicate_id': -1,
'name': 'My User',
'cnpj': [],
'company_name': '[]',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
'type': 'Private Person',
'tags': [],
'pending_tickets_count': 0}
In [124]: data2 = {k:v for k,v in data.items() if k in required}
In [125]: data2
Out[125]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
'gender': ['Feminino'],
'id': 11111111,
'name': 'My User',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}]}
In [126]: pd.DataFrame(data2).assign(
...: city_name=lambda x: x['city'].str.get('name'),
...: state_name=lambda x: x['state'].str.get('name'),
...: state_fs=lambda x: x['state'].str.get('fs')
...: ).drop(['state', 'city'], axis=1)
Out[126]:
                   birthday    gender        id     name     city_name         state_name state_fs
0  1987-07-13T00:00:00.000Z  Feminino  11111111  My User  Porto Alegre  Rio Grande do Sul       RS
Here required is the collection of keys to keep (birthday, gender, id, name, city and state in this example). data2 is needed because the columns of a DataFrame can't differ in length: pd.DataFrame(data) won't work, as rg has 0 items but birthday has 1 item.
If you are dealing directly with JSON files, pd.json_normalize is also worth a look.
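The same flattening can be sketched in plain Python before any DataFrame is built. This is an illustration under assumptions, not the answerer's method: first_or_none is a hypothetical helper, and the record is a trimmed copy of the data above.

```python
# Trimmed copy of the record from the question (illustrative).
data = {
    'birthday': ['1987-07-13T00:00:00.000Z'],
    'gender': ['Feminino'],
    'id': 11111111,
    'name': 'My User',
    'rg': [],
    'city': [{'id': '0001', 'name': 'Porto Alegre'}],
    'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
}

def first_or_none(value):
    """Hypothetical helper: first item of a list, None if the list is empty,
    otherwise the value unchanged."""
    if isinstance(value, list):
        return value[0] if value else None
    return value

# Fall back to an empty dict so .get() is safe when the list is empty.
city = first_or_none(data['city']) or {}
state = first_or_none(data['state']) or {}

row = {
    'birthday': first_or_none(data['birthday']),
    'gender': first_or_none(data['gender']),
    'id': data['id'],
    'name': data['name'],
    'city_name': city.get('name'),
    'state_name': state.get('name'),
    'state_fs': state.get('fs'),
}
print(row)
```

Since every value in row is a scalar, pd.DataFrame([row]) would give a one-row frame without the length mismatch described above.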
Currently I have a function (shown below) that makes a GET request to an API that I made myself:
def get_vehicles(self):
    result = "http://127.0.0.1:8000/vehicles"
    response = requests.get(result)
    data = response.content
    data_dict = json.loads(data)
    return data_dict
The data I get is in this format, which is a list of dictionaries:
data_dict = [{'colour': 'Black', 'cost': 10, 'latitude': -37.806152, 'longitude': 144.95787, 'rentalStatus': 'True', 'seats': 4, 'user': None, 'vehicleBrand': 'Toyota', 'vehicleID': 1, 'vehicleModel': 'Altis'}, {'colour': 'White', 'cost': 15, 'latitude': -37.803913, 'longitude': 144.964859, 'rentalStatus': 'False', 'seats': 4, 'user': {'firstname': 'Test', 'imageName': None, 'password': 'password', 'surname': 'Ing', 'userID': 15, 'username': 'Testing'}, 'vehicleBrand': 'Honda', 'vehicleID': 3, 'vehicleModel': 'Civic'}]
Is it possible to convert it to just a dictionary? Example:
data_dict = {'colour': 'Black', 'cost': 10, 'latitude': -37.806152, 'longitude': 144.95787, 'rentalStatus': 'True', 'seats': 4, 'user': None, 'vehicleBrand': 'Toyota', 'vehicleID': 1, 'vehicleModel': 'Altis'}, {'colour': 'White', 'cost': 15, 'latitude': -37.803913, 'longitude': 144.964859, 'rentalStatus': 'False', 'seats': 4, 'user': {'firstname': 'Test', 'imageName': None, 'password': 'password', 'surname': 'Ing', 'userID': 15, 'username': 'Testing'}, 'vehicleBrand': 'Honda', 'vehicleID': 3, 'vehicleModel': 'Civic'}
No, the second result is a tuple, not a dict.
data_dict = {'colour': 'Black', 'cost': 10, 'latitude': -37.806152, 'longitude': 144.95787, 'rentalStatus': 'True', 'seats': 4, 'user': None, 'vehicleBrand': 'Toyota', 'vehicleID': 1, 'vehicleModel': 'Altis'}, {'colour': 'White', 'cost': 15, 'latitude': -37.803913, 'longitude': 144.964859, 'rentalStatus': 'False', 'seats': 4, 'user': {'firstname': 'Test', 'imageName': None, 'password': 'password', 'surname': 'Ing', 'userID': 15, 'username': 'Testing'}, 'vehicleBrand': 'Honda', 'vehicleID': 3, 'vehicleModel': 'Civic'}
print(type(data_dict))
# <class 'tuple'>
It is the same as:
data_dict = ({'colour': 'Black', 'cost': 10, 'latitude': -37.806152, 'longitude': 144.95787, 'rentalStatus': 'True', 'seats': 4, 'user': None, 'vehicleBrand': 'Toyota', 'vehicleID': 1, 'vehicleModel': 'Altis'}, {'colour': 'White', 'cost': 15, 'latitude': -37.803913, 'longitude': 144.964859, 'rentalStatus': 'False', 'seats': 4, 'user': {'firstname': 'Test', 'imageName': None, 'password': 'password', 'surname': 'Ing', 'userID': 15, 'username': 'Testing'}, 'vehicleBrand': 'Honda', 'vehicleID': 3, 'vehicleModel': 'Civic'})
That's why it is a tuple.
If you want to merge them into a single dict, that is not possible as such, because a dict can't hold the same key twice. But you could merge the values for each key into a list, like this:
d = {key: list(value) for key, value in zip(data_dict[0].keys(), zip(data_dict[0].values(), data_dict[1].values()))}
print(d)
Result (make sure both dicts have the same length and the same key order):
{
'colour': ['Black', 'White'],
'cost': [10, 15],
'latitude': [-37.806152, -37.803913],
'longitude': [144.95787, 144.964859],
'rentalStatus': ['True', 'False'],
'seats': [4, 4],
'user': [None, {
'firstname': 'Test',
'imageName': None,
'password': 'password',
'surname': 'Ing',
'userID': 15,
'username': 'Testing'
}],
'vehicleBrand': ['Toyota', 'Honda'],
'vehicleID': [1, 3],
'vehicleModel': ['Altis', 'Civic']
}
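The zip-based line above pairs values by position and handles exactly two dicts. As a sketch of a more general variant, the loop below collects values per key for any number of dicts. The shortened records list, including a third record with a missing key, is made up for illustration.

```python
from collections import defaultdict

# Shortened, partly hypothetical records (the third one is invented).
records = [
    {'colour': 'Black', 'cost': 10, 'vehicleID': 1},
    {'colour': 'White', 'cost': 15, 'vehicleID': 3},
    {'colour': 'Red', 'vehicleID': 7},  # 'cost' is missing on purpose
]

# Collect the values for each key by name, not by position.
merged = defaultdict(list)
for record in records:
    for key, value in record.items():
        merged[key].append(value)
merged = dict(merged)

print(merged)
# {'colour': ['Black', 'White', 'Red'], 'cost': [10, 15], 'vehicleID': [1, 3, 7]}
```

Note that when a key is missing from some records, its list is shorter than the others, so positions no longer line up across keys.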
This is a list of dictionaries.
Therefore, you can access its elements with index syntax: for example, data_dict[0] is the first element.
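If a single dictionary is still wanted, one possible reading of the question is a dict keyed by vehicleID. The sketch below assumes that vehicleID is unique per vehicle, which the question does not state, and uses a shortened copy of the data.

```python
# Shortened copy of the API response (illustrative).
data_dict = [
    {'vehicleID': 1, 'vehicleBrand': 'Toyota', 'vehicleModel': 'Altis'},
    {'vehicleID': 3, 'vehicleBrand': 'Honda', 'vehicleModel': 'Civic'},
]

# Index access on the list.
first = data_dict[0]
print(first['vehicleBrand'])  # Toyota

# Assumption: vehicleID is unique, so it can serve as the dict key.
vehicles_by_id = {vehicle['vehicleID']: vehicle for vehicle in data_dict}
print(vehicles_by_id[3]['vehicleModel'])  # Civic
```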
I have a list of dictionaries, and I would like to obtain those that have the same value in a key:
my_list_of_dicts = [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
},{
'id': 6,
'name': 'Mariah'
},{
'id': 7,
'name': 'John'
},{
'id': 1,
'name': 'Louis'
}
]
I want to keep those items that have the same 'name', so, I would like to obtain something like:
duplicates: [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
}, {
'id': 7,
'name': 'John'
}
]
I'm trying this (unsuccessfully):
duplicates = [item for item in my_list_of_dicts if len(my_list_of_dicts.get('name', None)) > 1]
I can see what is wrong with this code, but I can't work out the right expression.
Another concise way using collections.Counter:
from collections import Counter
my_list_of_dicts = [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
},{
'id': 6,
'name': 'Mariah'
},{
'id': 7,
'name': 'John'
},{
'id': 1,
'name': 'Louis'
}
]
c = Counter(x['name'] for x in my_list_of_dicts)
duplicates = [x for x in my_list_of_dicts if c[x['name']] > 1]
You could use the following list comprehension:
>>> [d for d in my_list_of_dicts if len([e for e in my_list_of_dicts if e['name'] == d['name']]) > 1]
[{'id': 3, 'name': 'John'},
{'id': 5, 'name': 'Peter'},
{'id': 2, 'name': 'Peter'},
{'id': 7, 'name': 'John'}]
my_list_of_dicts = [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
},{
'id': 6,
'name': 'Mariah'
},{
'id': 7,
'name': 'John'
},{
'id': 1,
'name': 'Louis'
}
]
import pandas as pd
df = pd.DataFrame(my_list_of_dicts)
df[df.name.isin(df[df.name.duplicated()]['name'])].to_json(orient='records')
An attempt similar to @cucuru's, hopefully helpful.
The comments explain what I did differently.
my_list_of_dicts = [{
'id': 3,
'name': 'John'
},{
'id': 5,
'name': 'Peter'
},{
'id': 2,
'name': 'Peter'
},{
'id': 6,
'name': 'Mariah'
},{
'id': 7,
'name': 'John'
},{
'id': 1,
'name': 'Louis'
}
]
# Create a list of names
names = [person.get('name') for person in my_list_of_dicts]
# Add item to list if the name occurs more than once in names
duplicates = [item for item in my_list_of_dicts if names.count(item.get('name')) > 1]
print(duplicates)
This produces:
[{'id': 3, 'name': 'John'}, {'id': 5, 'name': 'Peter'}, {'id': 2, 'name': 'Peter'}, {'id': 7, 'name': 'John'}]
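Calling names.count() inside the comprehension rescans the whole list for every item. As a sketch of an alternative, the entries can be grouped by name in one pass, which also shows which records belong together. The names groups and duplicate_groups are made up for this example.

```python
from collections import defaultdict

my_list_of_dicts = [
    {'id': 3, 'name': 'John'},
    {'id': 5, 'name': 'Peter'},
    {'id': 2, 'name': 'Peter'},
    {'id': 6, 'name': 'Mariah'},
    {'id': 7, 'name': 'John'},
    {'id': 1, 'name': 'Louis'},
]

# Group the entries by name in a single pass.
groups = defaultdict(list)
for person in my_list_of_dicts:
    groups[person['name']].append(person)

# Keep only the names that occur more than once.
duplicate_groups = {
    name: members for name, members in groups.items() if len(members) > 1
}
print(duplicate_groups)

# Flatten back to a list in the original order if that shape is needed.
duplicates = [p for p in my_list_of_dicts if p['name'] in duplicate_groups]
```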