Please see the screenshot...
I'm trying to create another column pulling the first element of the column 'genre' (i.e. Animation for the first one, Adventure for the second one, Romance for the third one and so on...)
Could anyone please help?
You can split the data from the dictionary or from the dataframe. In this code, I convert the column to a list before splitting.
import pandas as pd
### from dictionary
dd = { 'genres':[
[{'id':16,'name':'Animation'},{'id':26,'name':'ChicFlick'}],
[{'id':12,'name':'Adventure'},{'id':22,'name':'Horror'}],
[{'id':18,'name':'Romance'},{'id':28,'name':'Crime'}],
]}
dd['genres2'] = [x[0]['name'] for x in dd['genres']]
print(dd)
### from dataframe
dd = { 'genres':[
[{'id':16,'name':'Animation'},{'id':26,'name':'ChicFlick'}],
[{'id':12,'name':'Adventure'},{'id':22,'name':'Horror'}],
[{'id':18,'name':'Romance'},{'id':28,'name':'Crime'}],
]}
df = pd.DataFrame(dd)
df['genres2'] = [x[0]['name'] for x in df['genres'].to_list()]
print(df.to_string(index=False))
Output
{'genres':
[[{'id': 16, 'name': 'Animation'}, {'id': 26, 'name': 'ChicFlick'}],
[{'id': 12, 'name': 'Adventure'}, {'id': 22, 'name': 'Horror'}],
[{'id': 18, 'name': 'Romance'}, {'id': 28, 'name': 'Crime'}]],
'genres2': ['Animation', 'Adventure', 'Romance']}
genres genres2
[{'id': 16, 'name': 'Animation'}, {'id': 26, 'name': 'ChicFlick'}] Animation
[{'id': 12, 'name': 'Adventure'}, {'id': 22, 'name': 'Horror'}] Adventure
[{'id': 18, 'name': 'Romance'}, {'id': 28, 'name': 'Crime'}] Romance
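The list comprehension above assumes every row has at least one genre; for a row holding an empty list, x[0] would raise an IndexError. A minimal sketch of a guard follows, where first_genre_name is a hypothetical helper name chosen for illustration and not part of pandas:

```python
def first_genre_name(genres):
    """Return the 'name' of the first genre dict, or None if the list is empty."""
    # Hypothetical helper: guards against rows with no genres at all.
    if not genres:
        return None
    return genres[0].get('name')


rows = [
    [{'id': 16, 'name': 'Animation'}, {'id': 26, 'name': 'ChicFlick'}],
    [{'id': 12, 'name': 'Adventure'}, {'id': 22, 'name': 'Horror'}],
    [],  # a row without any genre
]
first_names = [first_genre_name(x) for x in rows]
print(first_names)
```

With the dataframe from the answer, this could probably be applied as df['genres2'] = df['genres'].apply(first_genre_name).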
I have a list of dictionaries with the same keys, and I want to filter these dictionaries by conditions that are given to me as input strings.
I have this list of dictionaries:
dictlist: [[{'name': '"fight club"', 'rank': 12, 'budget': 450000},
{'name': '"Interstellar"', 'rank': 26, 'budget': 700000}], [{'name':
'"se7en"', 'rank': 19, 'budget': 200000}, {'name': '"Hamilton"',
'rank': 107, 'budget': 650000}]]
and I receive the conditions from the user for example:
SELECT (name,budget) FROM movies WHERE budget=450000
or
SELECT (name,budget) FROM movies WHERE rank>10
or
SELECT (name,budget,rank) FROM movies WHERE name=='fight club'
Unfortunately, I couldn't write code to filter my list of dictionaries this way.
I would appreciate your help.
I wrote this:
dictlist = [
[{'name': '"fight club"', 'rank': 12, 'budget': 450000}, {'name': '"Interstellar"', 'rank': 26, 'budget': 700000}],
[{'name': '"se7en"', 'rank': 19, 'budget': 200000}, {'name': '"Hamilton"', 'rank': 107, 'budget': 650000}]
]
def greaterThanFilter(filterName, filterValue, dictlist):
    for i in dictlist:
        for j in i:
            if j[filterName] > filterValue:
                print(j['name'])
greaterThanFilter('budget', 200000, dictlist)
This can only check if a value is greater than another.
Modifying this code should help you. I would recommend you restructure your dictlist to look like this:
dictlist = [
{'name': '"fight club"', 'rank': 12, 'budget': 450000},
{'name': '"Interstellar"', 'rank': 26, 'budget': 700000},
{'name': '"se7en"', 'rank': 19, 'budget': 200000},
{'name': '"Hamilton"', 'rank': 107, 'budget': 650000}
]
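The code to filter the list the same way would look something like this:
def greaterThanFilter(budget, dictlist):
    # Return the names of all movies whose budget exceeds the threshold
    return [x['name'] for x in dictlist if x['budget'] > budget]
# greaterThanFilter(450000, dictlist)
# Output: ['"Interstellar"', '"Hamilton"']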
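The question also asks for = and == conditions and for selecting specific columns. One possible way to generalize the filter is to look the comparison up in the standard operator module. This is only a sketch: filter_movies and OPERATORS are hypothetical names, it assumes the flattened list structure recommended above, and parsing the SELECT string itself is left out.

```python
import operator

# Map the comparison symbols from the question to functions.
OPERATORS = {
    '=': operator.eq,
    '==': operator.eq,
    '>': operator.gt,
    '<': operator.lt,
}


def filter_movies(rows, columns, field, op, value):
    """Keep rows where `row[field] op value` holds, returning only `columns`."""
    compare = OPERATORS[op]
    return [
        {col: row[col] for col in columns}
        for row in rows
        if compare(row[field], value)
    ]


dictlist = [
    {'name': '"fight club"', 'rank': 12, 'budget': 450000},
    {'name': '"Interstellar"', 'rank': 26, 'budget': 700000},
    {'name': '"se7en"', 'rank': 19, 'budget': 200000},
    {'name': '"Hamilton"', 'rank': 107, 'budget': 650000},
]

# SELECT (name,budget) FROM movies WHERE budget=450000
equal_budget = filter_movies(dictlist, ['name', 'budget'], 'budget', '=', 450000)
# SELECT (name,budget) FROM movies WHERE rank>10
high_rank = filter_movies(dictlist, ['name', 'budget'], 'rank', '>', 10)
print(equal_budget)
print(high_rank)
```

Note that the stored names contain literal double quotes, so a condition on name would have to compare against '"fight club"' or strip the quotes first.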
I'm experimenting on a project with a large dataset of movies. I have a large data frame with one column named "Genres" and one named "Vote Average". My goal is to find the 20 highest-rated genres based on "Vote Average".
I would use a group by but I can't seem to figure it out because the genre information looks like this in the column "Genres" :
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
How can I extract Comedy, Drama and Romance from the list above?
How can I group by individual genres while assigning each row's "Vote Average" to each of its genres, so I can print the top 20 rated genres in the data frame?
Genres Vote Average
1 [{'id': 16, 'name': 'Animation'}, {'id': 35, '... 7.7
2 [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... 6.9
3 [{'id': 10749, 'name': 'Romance'}, {'id': 35, ... 6.5
4 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam... 6.1
5 [{'id': 35, 'name': 'Comedy'}] 5.7
... ... ...
32255 [{'id': 878, 'name': 'Science Fiction'}] 3.5
32256 [{'id': 18, 'name': 'Drama'}, {'id': 28, 'name... 5.7
32257 [{'id': 28, 'name': 'Action'}, {'id': 18, 'nam... 3.8
32258 [] 0.0
32259 [] 0.0
EDIT: Example from Data frame is above. movies_metadata.csv from https://www.kaggle.com/rounakbanik/the-movies-dataset
EDIT:
Now that I have seen all the information on Kaggle, I think it may need a totally different method, because these genres are assigned to titles and can't be put in separate rows.
OLD:
Now you have to convert it to a correct DataFrame with genres in separate rows instead of
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, ...]
Here is my example data
import pandas as pd
df = pd.DataFrame([
{'Genre': [{'id': 16, 'name': 'Animation'}], 'Vote Average': 7.7},
{'Genre': [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}], 'Vote Average': 6.1},
{'Genre': [{'id': 10749, 'name': 'Romance'}], 'Vote Average': 6.5},
])
print(df)
result:
Genre Vote Average
0 [{'id': 16, 'name': 'Animation'}] 7.7
1 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam... 6.1
2 [{'id': 10749, 'name': 'Romance'}] 6.5
You can iterate over every row and use pd.DataFrame(row['Genre']) to create a correct dataframe, which you then add to a new combined dataframe
frames = []
for index, row in df.iterrows():
    temp_df = pd.DataFrame(row['Genre'])
    temp_df['Vote Average'] = row['Vote Average']
    frames.append(temp_df)
new_df = pd.concat(frames)  # DataFrame.append was removed in pandas 2.0
print(new_df)
result:
id name Vote Average
0 16 Animation 7.7
0 35 Comedy 6.1
1 18 Drama 6.1
2 10749 Romance 6.1
0 10749 Romance 6.5
and now you can do whatever you like.
Another method to correct the data:
First convert the list of dictionaries to separate rows of dictionaries
new_df = df.explode('Genre')
print(new_df)
result:
Genre Vote Average
0 {'id': 16, 'name': 'Animation'} 7.7
1 {'id': 35, 'name': 'Comedy'} 6.1
1 {'id': 18, 'name': 'Drama'} 6.1
1 {'id': 10749, 'name': 'Romance'} 6.1
2 {'id': 10749, 'name': 'Romance'} 6.5
and later convert every dictionary to columns
new_df['id'] = new_df['Genre'].str['id']
new_df['name'] = new_df['Genre'].str['name']
print(new_df)
result:
Genre Vote Average id name
0 {'id': 16, 'name': 'Animation'} 7.7 16 Animation
1 {'id': 35, 'name': 'Comedy'} 6.1 35 Comedy
1 {'id': 18, 'name': 'Drama'} 6.1 18 Drama
1 {'id': 10749, 'name': 'Romance'} 6.1 10749 Romance
2 {'id': 10749, 'name': 'Romance'} 6.5 10749 Romance
or using
new_df[['id','name']] = new_df['Genre'].apply(pd.Series)
Full example
import pandas as pd
df = pd.DataFrame([
{'Genre': [{'id': 16, 'name': 'Animation'}], 'Vote Average': 7.7},
{'Genre': [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}], 'Vote Average': 6.1},
{'Genre': [{'id': 10749, 'name': 'Romance'}], 'Vote Average': 6.5},
])
print('--- df ---')
print(df)
print('--- iterrows ---')
frames = []
for index, row in df.iterrows():
    temp_df = pd.DataFrame(row['Genre'])
    temp_df['Vote Average'] = row['Vote Average']
    frames.append(temp_df)
new_df = pd.concat(frames)  # DataFrame.append was removed in pandas 2.0
print(new_df)
print('--- explode #1 ---')
new_df = df.explode('Genre')
print(new_df)
print('--- columns #1 ---')
new_df['id'] = new_df['Genre'].str['id']
new_df['name'] = new_df['Genre'].str['name']
new_df.drop('Genre', inplace=True, axis=1)
new_df.reset_index(inplace=True)
print(new_df)
print('--- explode #2 ---')
new_df = df.explode('Genre')
print(new_df)
print('--- columns #2 ---')
new_df[['id','name']] = new_df['Genre'].apply(pd.Series)
new_df.drop('Genre', inplace=True, axis=1)
new_df.reset_index(inplace=True)
print(new_df)
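Once the genres sit in separate rows, the original goal (the top 20 genres by "Vote Average") could be reached with a group-by on the genre name. The following is a sketch on small made-up data rather than the Kaggle file; the variable names exploded and top_genres are illustrative, and it assumes the 'Genre' column already holds real lists (if it holds strings, they would need to be parsed first).

```python
import pandas as pd

df = pd.DataFrame([
    {'Genre': [{'id': 16, 'name': 'Animation'}], 'Vote Average': 7.7},
    {'Genre': [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'},
               {'id': 10749, 'name': 'Romance'}], 'Vote Average': 6.1},
    {'Genre': [{'id': 10749, 'name': 'Romance'}], 'Vote Average': 6.5},
    {'Genre': [], 'Vote Average': 0.0},  # a movie without genres
])

# One row per (movie, genre); empty lists become NaN and are dropped.
exploded = (
    df.explode('Genre')
    .dropna(subset=['Genre'])
    .assign(name=lambda d: d['Genre'].str['name'])
)

# Mean vote per genre, highest first, limited to the top 20.
top_genres = (
    exploded.groupby('name')['Vote Average']
    .mean()
    .sort_values(ascending=False)
    .head(20)
)
print(top_genres)
```

Each movie's vote counts once for every genre it belongs to, which is one reasonable reading of the question but not the only one.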
I need to initialize an empty list of dictionaries (LOD) which must have the following keys: "id", "name", "age", "gender". I want to create a loop/nested loop that starts populating the LOD. For populating it, I have a list of IDs, and the rest of the keys are generated using the random function.
The ID list looks like this: id = ['1','2','3']
The result must look something like this.
LOD = [
{
'id': '1',
'name':'122121',
'age':'2131',
'gender':'121'
},
{
'id': '2',
'name':'122121',
'age':'2131',
'gender':'121'
},
{
'id': '3',
'name':'122121',
'age':'2131',
'gender':'121'
},
]
CJDB already does what you want. But if you'd perhaps prefer another approach:
ids = ['1','2','3']
keys = ["name","age", "gender"]
LOD = []
and then populate your list with dictionaries
for i in ids:
    your_dictionary = {"id": i}
    for key in keys:
        your_dictionary[key] = '{}_rnd_function_output'.format(key)
    LOD.append(your_dictionary)
And the output would be
>>> LOD
[{'id': '1',
'name': 'name_rnd_function_output',
'age': 'age_rnd_function_output',
'gender': 'gender_rnd_function_output'},
{'id': '2',
'name': 'name_rnd_function_output',
'age': 'age_rnd_function_output',
'gender': 'gender_rnd_function_output'},
{'id': '3',
'name': 'name_rnd_function_output',
'age': 'age_rnd_function_output',
'gender': 'gender_rnd_function_output'}
]
You might consider having sub-dictionaries within a dictionary. Your ids would be the keys of the main dictionary and the sub-dictionaries would be the values.
LOD = {}
for i in ids:
    LOD[i] = {}
    for key in keys:
        LOD[i][key] = '{}_rnd_function_output'.format(key)
And the output
>>> LOD
{'1': {'name': 'name_rnd_function_output',
'age': 'age_rnd_function_output',
'gender': 'gender_rnd_function_output'},
'2': {'name': 'name_rnd_function_output',
'age': 'age_rnd_function_output',
'gender': 'gender_rnd_function_output'},
'3': {'name': 'name_rnd_function_output',
'age': 'age_rnd_function_output',
'gender': 'gender_rnd_function_output'}}
You can use a list comprehension that builds one dictionary per id:
ids = ['1','2','3']
LOD = [
{
'id': i,
'name':'122121',
'age':'2131',
'gender':'121'
} for i in ids
]
Output:
>>> LOD
[{'id': '1', 'name': '122121', 'age': '2131', 'gender': '121'},
{'id': '2', 'name': '122121', 'age': '2131', 'gender': '121'},
{'id': '3', 'name': '122121', 'age': '2131', 'gender': '121'}]
Or, using the random module:
import random
ids = ['1','2','3']
LOD = [
{
'id': i,
'name': str(random.randint(100000, 999999)),
'age': str(random.randint(1000, 9999)),
'gender': str(random.randint(100, 999))
} for i in ids
]
Output:
>>> LOD
[{'id': '1', 'name': '727325', 'age': '5367', 'gender': '238'},
{'id': '2', 'name': '316019', 'age': '8963', 'gender': '702'},
{'id': '3', 'name': '464023', 'age': '4324', 'gender': '155'}]
Note that you should not use id as a variable name, as it shadows the built-in Python id function.
You can do it by initializing dict objects in a list comprehension
import random
keys = ['id', 'name', 'age', 'gender']
ids = ['1', '2', '3']
LOD = [dict((key, i if key == 'id' else random.randint(1, 100)) for key in keys) for i in ids]
print(LOD)
'''
[{'id': '1', 'name': 34, 'age': 10, 'gender': 57},
{'id': '2', 'name': 64, 'age': 13, 'gender': 21},
{'id': '3', 'name': 11, 'age': 17, 'gender': 2}]
'''
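Because the values are random, the outputs shown in these answers will differ on every run. If repeatable results are wanted (for tests, say), one option is a seeded random.Random instance. This is a sketch only: build_lod and its seed parameter are hypothetical names, and the value ranges are copied from the answer above rather than from any real requirement.

```python
import random


def build_lod(ids, seed=None):
    """Build one dict per id; the non-id fields are random placeholder strings."""
    rng = random.Random(seed)  # private generator, global random state untouched
    return [
        {
            'id': i,
            'name': str(rng.randint(100000, 999999)),
            'age': str(rng.randint(1000, 9999)),
            'gender': str(rng.randint(100, 999)),
        }
        for i in ids
    ]


ids = ['1', '2', '3']
LOD = build_lod(ids, seed=42)
print(LOD)
```

Calling build_lod again with the same seed returns the same list, while seed=None gives fresh values each time.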
I have a dataframe about movies and one of the columns is genre.
The entries of this column are in the form of list like -
[{'id': 35, 'name': 'Comedy'},
{'id': 18, 'name': 'Drama'},
{'id': 10751, 'name': 'Family'},
{'id': 10749, 'name': 'Romance'}]
My aim is to extract the genre from the list and store them as a list such as ['Comedy', 'Drama', 'Family', 'Romance'].
When I print the entries of the column for example -
data['genres'][1] it returns the list within the quotes (datatype : string)
"[{'id': 35, 'name': 'Comedy'}]"
Can someone help to get the list without the quotes? like [{'id': 35, 'name': 'Comedy'}] I should be able to take it from there.
When I create my custom example, it works as expected and returns a list without quotes. For example -
ref = pd.DataFrame({'col':[[1,2,3],[4,3,2]]})
ref['col'][0]
This returns a list (without quotes).
The problem is that these are string representations of lists, so it is necessary to first convert them to lists of dicts and then extract the values with get:
a = [{'id': 35, 'name': 'Comedy'},
{'id': 18, 'name': 'Drama'},
{'id': 10751, 'name': 'Family'},
{'id': 10749, 'name': 'Romance'}]
df = pd.DataFrame({'col':a}).astype(str)
import ast
df['genres'] = df['col'].apply(lambda x: ast.literal_eval(x).get('name'))
print (df)
col genres
0 {'id': 35, 'name': 'Comedy'} Comedy
1 {'id': 18, 'name': 'Drama'} Drama
2 {'id': 10751, 'name': 'Family'} Family
3 {'id': 10749, 'name': 'Romance'} Romance
If it is necessary to get all values:
df = pd.DataFrame({'a':list('abcd'),'col':a}).astype(str)
df = df.join(pd.DataFrame([ast.literal_eval(x) for x in df.pop('col')], index=df.index))
print (df)
a id name
0 a 35 Comedy
1 b 18 Drama
2 c 10751 Family
3 d 10749 Romance
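The example above parses one dict per row, whereas the question has a whole list of dicts stored as one string per row. A minimal sketch for that case follows; genre_names is a hypothetical helper name, and it assumes every entry is a valid Python literal.

```python
import ast


def genre_names(text):
    """Parse a string such as "[{'id': 35, 'name': 'Comedy'}]" and return the names."""
    # ast.literal_eval only evaluates Python literals, so it is safer than eval.
    genres = ast.literal_eval(text)
    return [genre['name'] for genre in genres]


raw = "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]"
names = genre_names(raw)
print(names)  # ['Comedy', 'Drama']
```

On the dataframe from the question this could likely be applied as data['genres'].apply(genre_names); rows with missing or malformed values would need separate handling.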
I have a list of dictionaries where I want to drop any dictionaries that repeat their id key. What's the best way to do this e.g:
example dict:
product_1={ 'id': 1234, 'price': 234}
list_of_products = [product_1, product_2, ...]
How can I filter the list of products so that I have non-repeating products based on their product['id']?
If product dictionaries with the same id can have different values, select one dictionary per id. Use itertools.groupby:
import itertools
list_products= [{'id': 12, 'price': 234},
{'id': 34, 'price': 456},
{'id': 12, 'price': 456},
{'id': 34, 'price': 78}]
list_dicts = list()
for name, group in itertools.groupby(sorted(list_products, key=lambda d : d['id']), key=lambda d : d['id']):
    list_dicts.append(next(group))
print(list_dicts)
# Output
[{'price': 234, 'id': 12}, {'price': 456, 'id': 34}]
If the product dictionaries with the same id are totally the same, there is an easier way as described in Remove duplicate dict in list in Python. Here is a MWE.
list_products= [{'id': 12, 'price': 234},
{'id': 34, 'price': 456},
{'id': 12, 'price': 234},
{'id': 34, 'price': 456}]
result = [dict(t) for t in set([tuple(d.items()) for d in list_products])]
print(result)
# Output
[{'price': 456, 'id': 34}, {'price': 234, 'id': 12}]
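a = [{'id': 124, 'price': 234}, {'id': 125, 'price': 234}, {'id': 1234, 'price': 234}, {'id': 1234, 'price': 234}]
# Sort by id so duplicates become adjacent (dicts themselves cannot be ordered in Python 3)
a.sort(key=lambda d: d['id'])
# Keep an item only if it is the first one or its id differs from the previous item's id
a = [val for indx, val in enumerate(a) if indx == 0 or val['id'] != a[indx - 1]['id']]
print(a)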
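An order-preserving alternative that needs no sorting is to remember which ids have already been seen. This is a sketch under the assumption that the first dictionary for each id is the one to keep; unique_by_id is a hypothetical helper name.

```python
def unique_by_id(products):
    """Return products with duplicate ids dropped, keeping the first of each id."""
    seen = set()
    result = []
    for product in products:
        if product['id'] not in seen:
            seen.add(product['id'])
            result.append(product)
    return result


list_products = [
    {'id': 12, 'price': 234},
    {'id': 34, 'price': 456},
    {'id': 12, 'price': 456},
    {'id': 34, 'price': 78},
]
unique_products = unique_by_id(list_products)
print(unique_products)  # [{'id': 12, 'price': 234}, {'id': 34, 'price': 456}]
```

This runs in a single pass over the list and leaves the original list unchanged.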