django-tables2 sorting non-queryset data - python

Hi, I have a table defined like this:
class MyTable(tables.Table):
    brand = tables.Column(verbose_name='Brand')
    market = tables.Column(verbose_name='Market')
    date = tables.DateColumn(format='d/m/Y')
    value = tables.Column(verbose_name='Value')
I'm populating the table with a list of dicts and then I'm configuring it and rendering it in my view in the usual django-tables2 way:
data = [
    {'brand': 'Nike', 'market': 'UK', 'date': <datetime obj>, 'value': 100},
    {'brand': 'Django', 'market': 'FR', 'date': <datetime obj>, 'value': 100},
]
table = MyTable(data)
RequestConfig(request, paginate=False).configure(table)
context = {'table': table}
return render(request, 'my_tmp.html', context)
This all renders the table nicely with the correct data in it. However, I can sort by the date column and the value column on the webpage, but not by the brand and market. So it seems non-string values can be sorted but strings can't. I've been trying to figure out how to do this sorting, is it possible with non-queryset data?
I can't populate the table with a queryset, as I'm using custom methods to generate the value column data. Any help appreciated! I guess I need to specify the order_by parameter in my tables.Column but haven't found the correct setting yet.

data.sort(key=lambda item: item["brand"].lower()) will sort the list in place (it returns None; the original data list is modified), alphabetically by brand. The same can be done for any key.
Alternatively, sorted(data, key=lambda item: item["brand"].lower()) returns a sorted copy and leaves the original list untouched.
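A minimal runnable sketch of that list sort, using datetime.date stand-ins for the <datetime obj> placeholders above (the django-tables2 rendering itself is omitted):

```python
from datetime import date

# Sample rows mirroring the question's data; the dates are invented.
data = [
    {'brand': 'Nike', 'market': 'UK', 'date': date(2021, 1, 5), 'value': 100},
    {'brand': 'Django', 'market': 'FR', 'date': date(2021, 2, 1), 'value': 100},
]

# In-place, case-insensitive sort by brand; data.sort(...) returns None.
data.sort(key=lambda item: item['brand'].lower())
print([row['brand'] for row in data])  # ['Django', 'Nike']

# Non-destructive variant: sorted() leaves the original list untouched.
by_market = sorted(data, key=lambda item: item['market'].lower())
```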


Parsing nested dictionary to dataframe

I am trying to create a data frame from a JSON file.
Each album_details entry has a nested dict like this:
{'api_path': '/albums/491200',
'artist': {'api_path': '/artists/1421',
'header_image_url': 'https://images.genius.com/f3a1149475f2406582e3531041680a3c.1000x800x1.jpg',
'id': 1421,
'image_url': 'https://images.genius.com/25d8a9c93ab97e9e6d5d1d9d36e64a53.1000x1000x1.jpg',
'iq': 46112,
'is_meme_verified': True,
'is_verified': True,
'name': 'Kendrick Lamar',
'url': 'https://genius.com/artists/Kendrick-lamar'},
'cover_art_url': 'https://images.genius.com/1efc5de2af228d2e49d91bd0dac4dc49.1000x1000x1.jpg',
'full_title': 'good kid, m.A.A.d city (Deluxe Version) by Kendrick Lamar',
'id': 491200,
'name': 'good kid, m.A.A.d city (Deluxe Version)',
'url': 'https://genius.com/albums/Kendrick-lamar/Good-kid-m-a-a-d-city-deluxe-version'}
I want to create another column in the data frame with just the album name, which is in the above dict:
'name': 'good kid, m.A.A.d city (Deluxe Version)',
I have been looking at how to do this for a very long time. Can someone please help me? Thanks.
If that is the case, use .str to access the dict key:
df['name'] = df['album_details'].str['name']
If you have the dataframe stored in the df variable you could do:
df['artist_name'] = [x['artist']['name'] for x in df['album_details'].values]
You can use apply with lambda function:
df['album_name'] = df['album_details'].apply(lambda d: d['name'])
Basically you execute the lambda function for each value of the column 'album_details'. Note that the argument d in the function is the album dictionary. apply returns a Series of the function's return values, which you can assign to a new column.
See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
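A self-contained sketch combining the .str and apply approaches on a hypothetical two-row frame (the second album is invented for illustration):

```python
import pandas as pd

# Stand-in for the real Genius data: a column of nested dicts.
df = pd.DataFrame({
    'album_details': [
        {'name': 'good kid, m.A.A.d city (Deluxe Version)',
         'artist': {'name': 'Kendrick Lamar'}},
        {'name': 'DAMN.', 'artist': {'name': 'Kendrick Lamar'}},
    ]
})

# .str indexing reaches into each dict by key...
df['album_name'] = df['album_details'].str['name']
# ...while apply() can navigate the nested artist dict.
df['artist_name'] = df['album_details'].apply(lambda d: d['artist']['name'])
print(df[['album_name', 'artist_name']])
```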

Basic pandas dataframe manipulation question

I have the following JSON snippet:
{'search_metadata': {'completed_in': 0.027,
'count': 2},
'statuses': [{'contributors': None,
'coordinates': None,
'created_at': 'Wed Mar 31 19:25:16 +0000 2021',
'text': 'The text',
'truncated': True,
'user': {'contributors_enabled': False,
'screen_name': 'abcde',
'verified': False
}
}
,{...}]
}
The info that interests me is all in the statuses array. With pandas I can turn this into a DataFrame like this
df = pd.DataFrame(Data['statuses'])
Then I extract a subset out of this dataframe with
dfsub = df[['created_at', 'text']]
display(dfsub) shows exactly what I expect.
But I also want to include [user][screen_name] in the subset.
dfs = df[['user', 'created_at', 'text']]
is syntactically correct, but user contains too much information.
How do I add only the screen_name to the subset?
I have tried things like the following, but none of them work:
[user][screen_name]
user.screen_name
user:screen_name
I would normalize the data before constructing the DataFrame.
Take a look here: https://stackoverflow.com/a/41801708/14596032
Working example as an answer for your question:
df = pd.json_normalize(Data['statuses'], sep='_')
dfs = df[['user_screen_name', 'created_at', 'text']]
print(dfs)
You can try to access Dataframe, then Series, then Dict
df['user'] # user column = Series
df['user'][0] # 1st (only) item of the Series = dict
df['user'][0]['screen_name'] # screen_name in dict
You can use pd.Series.str. The docs don't do justice to all the wonderful things .str can do, such as accessing list and dict items. Case in point, you can access dict elements like this:
df['user'].str['screen_name']
That said, I agree with @VladimirGromes that a better way is to normalize your data into a flat table.
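For reference, a runnable sketch of the json_normalize approach on a trimmed stand-in for the payload above (the second status row is invented):

```python
import pandas as pd

# Cut-down version of the Twitter-style response from the question.
Data = {
    'statuses': [
        {'created_at': 'Wed Mar 31 19:25:16 +0000 2021',
         'text': 'The text',
         'user': {'screen_name': 'abcde', 'verified': False}},
        {'created_at': 'Thu Apr 01 08:00:00 +0000 2021',
         'text': 'Another text',
         'user': {'screen_name': 'fghij', 'verified': True}},
    ]
}

# sep='_' flattens the nested user.screen_name into 'user_screen_name'.
df = pd.json_normalize(Data['statuses'], sep='_')
dfs = df[['user_screen_name', 'created_at', 'text']]
print(dfs)
```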

Using Pymongo Upsert to Update or Create a Document in MongoDB using Python

I have a dataframe that contains data I want to upload into MongoDB. Below is the data:
MongoRow = pd.DataFrame.from_dict({'school': {1: schoolID}, 'student': {1: student}, 'date': {1: dateToday}, 'Probability': {1: probabilityOfLowerThanThreshold}})
school student date Probability
1 5beee5678d62101c9c4e7dbb 5bf3e06f9a892068705d8420 2020-03-27 0.000038
I have the following code which checks if a row in mongo contains the same student ID and date, if it doesn't then it adds the row:
def getPredictions(school):
    schoolDB = DB[school['database']['name']]
    schoolPredictions = schoolDB['session_attendance_predicted']
    Predictions = schoolPredictions.aggregate([{
        '$project': {
            'school': '$school',
            'student': '$student',
            'date': '$date'
        }
    }])
    return list(Predictions)
Predictions = getPredictions(school)
Predictions = pd.DataFrame(Predictions)
schoolDB = DB[school['database']['name']]
collection = schoolDB['session_attendance_predicted']
import json
for i in Predictions.index:
    schoolOld = Predictions.loc[i, 'school']
    studentOld = Predictions.loc[i, 'student']
    dateOld = Predictions.loc[i, 'date']
    if studentOld == student and date == dateOld:
        print("Student Exists")
        # UPDATE THE ROW WITH NEW VALUES
    else:
        print("Student Doesn't Exist")
        records = json.loads(df.T.to_json()).values()
        collection.insert(records)
However if it does exist, I want it to update the row with the new values. Does anyone know how to do this? I have looked at pymongo upsert but I'm not sure how to use it. Can anyone help?
UPDATE
The above is partly working now, however, I am now getting an error with the following code:
dateToday = datetime.datetime.combine(dateToday, datetime.time(0, 0))
MongoRow = pd.DataFrame.from_dict({'school': {1: schoolID}, 'student': {1: student}, 'date': {1: dateToday}, 'Probability': {1: probabilityOfLowerThanThreshold}})
data_dict = MongoRow.to_dict()
for i in Predictions.index:
print(Predictions)
collection.replace_one({'student': student, 'date': dateToday}, data_dict, upsert=True)
Error:
InvalidDocument: documents must have only string keys, key was 1
Probably a number of people are going to be confused by the accepted answer as it suggests using replace_one with the upsert flag.
Upserting means 'update or insert' (up = update, sert = insert). Most people looking to 'upsert' should be using update_one with the upsert flag.
For example:
collection.update_one({'matchable_field': field_data_to_match}, {"$set": upsertable_data}, upsert=True)
To upsert you cannot use insert() (deprecated), insert_one(), or insert_many(). You must use one of the collection-level operators that supports upserting.
To get started I would point you towards reading the dataframe line by line and using replace_one() on each line. There are more advanced ways of doing this but this is the easiest.
Your code will look a bit like:
collection.replace_one({'Student': student, 'Date': date}, record, upsert=True)
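A sketch of how the pieces could fit together without a live MongoDB connection (the update_one call is left commented out; the IDs are copied from the question's sample row). Converting with to_dict('records') yields plain dicts with string keys, which avoids the InvalidDocument error above: plain to_dict() keys each column's values by the index label 1.

```python
import datetime
import pandas as pd

dateToday = datetime.datetime(2020, 3, 27)
MongoRow = pd.DataFrame.from_dict({
    'school': {1: '5beee5678d62101c9c4e7dbb'},
    'student': {1: '5bf3e06f9a892068705d8420'},
    'date': {1: dateToday},
    'Probability': {1: 0.000038},
})

# One plain dict per row, with string keys throughout.
records = MongoRow.to_dict('records')
for record in records:
    filter_doc = {'student': record['student'], 'date': record['date']}
    update_doc = {'$set': record}
    # With a live connection you would then run:
    # collection.update_one(filter_doc, update_doc, upsert=True)
    print(filter_doc, update_doc)
```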

Retrieve value in JSON from pandas series object

I need help retrieving a value from a JSON response object in python. Specifically, how do I access the prices-asks-price value? I'm having trouble:
JSON object:
{'prices': [{'asks': [{'liquidity': 10000000, 'price': '1.16049'}],
'bids': [{'liquidity': 10000000, 'price': '1.15989'}],
'closeoutAsk': '1.16064',
'closeoutBid': '1.15974',
'instrument': 'EUR_USD',
'quoteHomeConversionFactors': {'negativeUnits': '1.00000000',
'positiveUnits': '1.00000000'},
'status': 'non-tradeable',
'time': '2018-08-31T20:59:57.748335979Z',
'tradeable': False,
'type': 'PRICE',
'unitsAvailable': {'default': {'long': '4063619', 'short': '4063619'},
'openOnly': {'long': '4063619', 'short': '4063619'},
'reduceFirst': {'long': '4063619', 'short': '4063619'},
'reduceOnly': {'long': '0', 'short': '0'}}}],
'time': '2018-09-02T18:56:45.022341038Z'}
Code:
data = pd.io.json.json_normalize(response['prices'])
asks = data['asks']
asks[0]
Out: [{'liquidity': 10000000, 'price': '1.16049'}]
I want to get the value 1.16049 - but having trouble after trying different things.
Thanks
asks[0] returns a list so you might do something like
asks[0][0]['price']
or
data = pd.io.json.json_normalize(response['prices'])
price = data['asks'][0][0]['price']
The data that you have contains dicts and lists nested inside one another, so you need to navigate through it accordingly. Lists are accessed through indexes (starting from 0) and dicts are accessed through keys.
price_value=data['prices'][0]['asks'][0]['price']
liquidity_value=data['prices'][0]['asks'][0]['liquidity']
Explaining the logic in this case: I assume that your big JSON object is stored in an object called data. First, access the prices key in this object. Then use index 0 because the next key is inside a list. Once inside the list, you have a key called asks. Again you have a list here, so access it using index 0. Finally, the liquidity and price keys are right there.
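A runnable version of that navigation on a trimmed copy of the response (only the keys used below are kept):

```python
# Cut-down stand-in for the pricing response above.
data = {
    'prices': [
        {'asks': [{'liquidity': 10000000, 'price': '1.16049'}],
         'bids': [{'liquidity': 10000000, 'price': '1.15989'}],
         'instrument': 'EUR_USD'},
    ],
    'time': '2018-09-02T18:56:45.022341038Z',
}

# Keys step into dicts; indexes step into lists.
price_value = data['prices'][0]['asks'][0]['price']
liquidity_value = data['prices'][0]['asks'][0]['liquidity']
print(price_value, liquidity_value)  # 1.16049 10000000
```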

How to Optimally Save Data in Python as Data Structure

My model returns information about PC games in the following format: game index and game value. This is my sims_sorted.
[(778, 0.99999994), (1238, 0.9999997), (1409, 0.99999905), (1212, 0.99999815)]
I retrieve the information about the game by indexing the database (df_indieGames):
sims_sorted = sorted(enumerate(sims), key=lambda item: -item[1])
results = {}
for val in sims_sorted[:4]:
    index, value = val[0], val[1]
    results[df_indieGames.game_name.loc[index]] = {
        "Genre": df_indieGames.genre.loc[index],
        "Rating": df_indieGames.score.loc[index],
        "Link": df_indieGames.game_link[index]
    }
However, such a data structure is hard to sort (by Rating). Is there a better way to store the information so retrieval and sorting is easier? Thanks.
Here's the output of results:
{u'Diehard Dungeon': {'Genre': u'Roguelike',
'Link': u'http://www.indiedb.com/games/diehard-dungeon',
'Rating': 8.4000000000000004},
u'Fork Truck Challenge': {'Genre': u'Realistic Sim',
'Link': u'http://www.indiedb.com/games/fork-truck-challenge',
'Rating': 7.4000000000000004},
u'Miniconomy': {'Genre': u'Realistic Sim',
'Link': u'http://www.indiedb.com/games/miniconomy',
'Rating': 7.2999999999999998},
u'World of Padman': {'Genre': u'First Person Shooter',
'Link': u'http://www.indiedb.com/games/world-of-padman',
'Rating': 9.0}}
UPDATE
The solution to the problem as suggested by ziddarth is the following:
result = sorted(results.iteritems(), key=lambda x: x[1]['Rating'], reverse=True)
You can sort by rating using code below. The lambda function is called with a tuple whose first element is the dictionary key and the second element is the dictionary value for the corresponding key, so you can use the lambda function to get to any value in the nested dictionary
sorted(results.iteritems(), key=lambda x: x[1]['Rating'])
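Note that iteritems() is Python 2 only; in Python 3 the same sort uses items(). A self-contained sketch with a cut-down results dict (links omitted):

```python
results = {
    'Diehard Dungeon': {'Genre': 'Roguelike', 'Rating': 8.4},
    'World of Padman': {'Genre': 'First Person Shooter', 'Rating': 9.0},
    'Miniconomy': {'Genre': 'Realistic Sim', 'Rating': 7.3},
}

# x is a (game_name, info_dict) tuple, so x[1]['Rating'] is the rating.
by_rating = sorted(results.items(), key=lambda x: x[1]['Rating'], reverse=True)
print([name for name, info in by_rating])
# ['World of Padman', 'Diehard Dungeon', 'Miniconomy']
```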
