I have JSON like the following and would like to transform it into a pandas DataFrame with specific column names.
{
"data": [
{
"id": 1,
"name": "3Way Result",
"suspended": false,
"bookmaker": {
"data": [
{
"id": 27802,
"name": "Ladbrokes",
"odds": {
"data": [
{
"label": "1",
"value": "1.61",
"probability": "62.11%",
"dp3": "1.610",
"american": -164,
"factional": null,
"winning": null,
"handicap": null,
"total": null,
"bookmaker_event_id": null,
"last_update": {
"date": "2021-10-01 16:41:27.000000",
"timezone_type": 3,
"timezone": "UTC"
}
},
{
"label": "X",
"value": "3.90",
"probability": "25.64%",
"dp3": "3.900",
"american": 290,
"factional": null,
"winning": null,
"handicap": null,
"total": null,
"bookmaker_event_id": null,
"last_update": {
"date": "2021-10-01 16:41:27.000000",
"timezone_type": 3,
"timezone": "UTC"
}
},
{
"label": "2",
"value": "5.20",
"probability": "19.23%",
"dp3": "5.200",
"american": 420,
"factional": null,
"winning": null,
"handicap": null,
"total": null,
"bookmaker_event_id": null,
"last_update": {
"date": "2021-10-01 16:41:27.000000",
"timezone_type": 3,
"timezone": "UTC"
}
}
]
}
},
{
"id": 70,
"name": "Pncl",
"odds": {
"data": [
{
"label": "1",
"value": "1.65",
"probability": "60.61%",
"dp3": "1.645",
"american": -154,
"factional": null,
"winning": null,
"handicap": null,
"total": null,
"bookmaker_event_id": null,
"last_update": {
"date": "2021-10-01 16:59:18.000000",
"timezone_type": 3,
"timezone": "UTC"
}
},
{
"label": "X",
"value": "4.20",
"probability": "23.81%",
"dp3": "4.200",
"american": 320,
"factional": null,
"winning": null,
"handicap": null,
"total": null,
"bookmaker_event_id": null,
"last_update": {
"date": "2021-10-01 16:59:18.000000",
"timezone_type": 3,
"timezone": "UTC"
}
},
{
"label": "2",
"value": "5.43",
"probability": "18.42%",
"dp3": "5.430",
"american": 443,
"factional": null,
"winning": null,
"handicap": null,
"total": null,
"bookmaker_event_id": null,
"last_update": {
"date": "2021-10-01 16:59:18.000000",
"timezone_type": 3,
"timezone": "UTC"
}
}
]
}
}
]
}
}
],
"meta": {
"plans": [
{
"name": "Football Free Plan",
"features": "Standard",
"request_limit": "180,60",
"sport": "Soccer"
}
],
"sports": [
{
"id": 1,
"name": "Soccer",
"current": true
}
]
}
}
Every column name consists of the label value plus the name of the bookmaker.
I would like to take the value in label, combine it with the bookmaker's name, and use the result as the column name. The float in value then becomes the row of the DataFrame.
Here is the expected output:
1_LadBrokes X_LadBrokes 2_LadBrokes last_update_LadBrokes 1_Pncl X_Pncl 2_Pncl last_update_Pncl
0 1.61 3.9 5.2 2021-10-01 16:41:27.000000 1.65 4.2 5.43 2021-10-01 16:59:18.000000
You can achieve it like so using json_normalize + apply.
import pandas as pd


def set_values(x):
    # After explode(), "odds.data" holds a single odds entry (a dict) per row.
    data = x["odds.data"]
    label = data.get("label")
    value = data.get("value")
    last_update_date = data["last_update"]["date"]
    name = x["name"]
    x[f"{label}_{name}"] = value
    x[f"last_update_{name}"] = last_update_date
    return x


# `data` is the parsed JSON shown in the question.
df = (
    pd.json_normalize(data["data"], record_path=["bookmaker", "data"])
    .explode("odds.data")
    .apply(set_values, axis=1)
    .drop(["odds.data", "id", "name"], axis=1)
    .ffill()
    .bfill()
    .head(1)
)
In [39]: df
Out[39]:
1_Ladbrokes 1_Pncl 2_Ladbrokes 2_Pncl X_Ladbrokes X_Pncl last_update_Ladbrokes last_update_Pncl
0 1.61 1.65 5.20 5.43 3.90 4.20 2021-10-01 16:41:27.000000 2021-10-01 16:59:18.000000
Use pd.json_normalize to create two sub-series, one for value and one for last_update, then join them.
import pandas as pd

# `data` is the parsed JSON shown in the question.
out = pd.json_normalize(
    data=data['data'],
    record_path=['bookmaker', 'data', 'odds', 'data'],
    meta=[['bookmaker', 'data', 'name']]
)[['label', 'value', 'last_update.date', 'bookmaker.data.name']]

# One entry per "<label>_<bookmaker>" holding the odds value.
df1 = out.set_index(out['label'] + '_' + out['bookmaker.data.name'])['value']
# One entry per "last_update_<bookmaker>" holding the timestamp.
df2 = out.set_index('bookmaker.data.name')['last_update.date'] \
    .add_prefix('last_update_').drop_duplicates()
df = pd.concat([df1, df2]).to_frame().T
Output:
>>> df
1_Ladbrokes X_Ladbrokes 2_Ladbrokes 1_Pncl X_Pncl 2_Pncl last_update_Ladbrokes last_update_Pncl
0 1.61 3.90 5.20 1.65 4.20 5.43 2021-10-01 16:41:27.000000 2021-10-01 16:59:18.000000
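For comparison, the same flattening can be sketched in plain Python before pandas is involved at all. This is only an illustration: the function name flatten_odds and the helper _odd are made up for this example, and the sample is a trimmed version of the JSON from the question.

```python
def flatten_odds(payload):
    """Build one flat dict: "<label>_<bookmaker>" -> float(value),
    plus "last_update_<bookmaker>" -> date string."""
    row = {}
    for market in payload["data"]:
        for bookmaker in market["bookmaker"]["data"]:
            name = bookmaker["name"]
            odds = bookmaker["odds"]["data"]
            for odd in odds:
                row[f"{odd['label']}_{name}"] = float(odd["value"])
            if odds:
                # Assumes all odds of one bookmaker share a timestamp,
                # as they do in the sample; the last entry is used.
                row[f"last_update_{name}"] = odds[-1]["last_update"]["date"]
    return row


def _odd(label, value, date):
    # Hypothetical helper that builds a trimmed odds entry for the sample.
    return {"label": label, "value": value, "last_update": {"date": date}}


sample = {"data": [{"bookmaker": {"data": [
    {"name": "Ladbrokes", "odds": {"data": [
        _odd("1", "1.61", "2021-10-01 16:41:27.000000"),
        _odd("X", "3.90", "2021-10-01 16:41:27.000000"),
        _odd("2", "5.20", "2021-10-01 16:41:27.000000"),
    ]}},
    {"name": "Pncl", "odds": {"data": [
        _odd("1", "1.65", "2021-10-01 16:59:18.000000"),
        _odd("X", "4.20", "2021-10-01 16:59:18.000000"),
        _odd("2", "5.43", "2021-10-01 16:59:18.000000"),
    ]}},
]}}]}

row = flatten_odds(sample)
print(row)
```

Passing [row] to pd.DataFrame should then give a one-row frame whose columns follow the dict's insertion order, which here matches the order in the expected output.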
This is the JSON list:
"chartOfAccounts": [
{
"id": 147,
"name": "Sales Product - Wholesale test",
"description": "Sales - Wholesale",
"balance": "0.00",
"is_active": true,
"is_editable": true,
"account_detail_type": {
"id": 5,
"name": "Accounts Receivable (A/R)",
"account_type": {
"id": 2,
"name": "Accounts Receivable (A/R)",
"principle": {
"id": 1,
"name": "Asset",
"calculation_type": "dr"
}
},
"calculation_type": "dr"
}
},
{
"id": 146,
"name": "Sales Product - Wholesale test",
"description": "Sales - Wholesale",
"balance": "0.00",
"is_active": true,
"is_editable": true,
"account_detail_type": {
"id": 5,
"name": "Accounts Receivable (A/R)",
"account_type": {
"id": 2,
"name": "Accounts Receivable (A/R)",
"principle": {
"id": 1,
"name": "Asset",
"calculation_type": "dr"
}
},
"calculation_type": "dr"
}
},
{
"id": 145,
"name": "Cash in hand rony",
"description": "Cash in hand rony",
"balance": "-45980.00",
"is_active": true,
"is_editable": true,
"account_detail_type": {
"id": 1,
"name": "Cash and cash equivalents",
"account_type": {
"id": 1,
"name": "Cash and cash equivalents",
"principle": {
"id": 1,
"name": "Asset",
"calculation_type": "dr"
}
},
"calculation_type": "dr"
}
},
{
"id": 144,
"name": "6yt4",
"description": "gyyy",
"balance": "5203.00",
"is_active": true,
"is_editable": true,
"account_detail_type": {
"id": 1,
"name": "Cash and cash equivalents",
"account_type": {
"id": 1,
"name": "Cash and cash equivalents",
"principle": {
"id": 1,
"name": "Asset",
"calculation_type": "dr"
}
},
"calculation_type": "dr"
}
},
{
"id": 99,
"name": "Cash in hand Monim",
"description": "monim cash",
"balance": "-1759.00",
"is_active": true,
"is_editable": true,
"account_detail_type": {
"id": 1,
"name": "Cash and cash equivalents",
"account_type": {
"id": 1,
"name": "Cash and cash equivalents",
"principle": {
"id": 1,
"name": "Asset",
"calculation_type": "dr"
}
},
"calculation_type": "dr"
}
},
{
"id": 98,
"name": "Monim Capital",
"description": "Monim Capital",
"balance": "50000.00",
"is_active": true,
"is_editable": true,
"account_detail_type": {
"id": 10,
"name": "Owen's Capital",
"account_type": {
"id": 6,
"name": "Owen's Capital",
"principle": {
"id": 3,
"name": "Owen's Equity",
"calculation_type": "cr"
}
},
"calculation_type": "cr"
}
},
{
"id": 93,
"name": "Payroll - m#m.bn",
"description": "Payroll - m#m.bn",
"balance": "0.00",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 12,
"name": "Expenses",
"account_type": {
"id": 7,
"name": "Revenue",
"principle": {
"id": 3,
"name": "Owen's Equity",
"calculation_type": "cr"
}
},
"calculation_type": "dr"
}
},
{
"id": 12,
"name": "Profit",
"description": "Profit",
"balance": "437690.75",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 11,
"name": "Income",
"account_type": {
"id": 7,
"name": "Revenue",
"principle": {
"id": 3,
"name": "Owen's Equity",
"calculation_type": "cr"
}
},
"calculation_type": "cr"
}
},
{
"id": 10,
"name": "Rajib",
"description": "Test",
"balance": "50000.00",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 11,
"name": "Income",
"account_type": {
"id": 7,
"name": "Revenue",
"principle": {
"id": 3,
"name": "Owen's Equity",
"calculation_type": "cr"
}
},
"calculation_type": "cr"
}
},
{
"id": 9,
"name": "Sales - Product",
"description": "Sales - Product",
"balance": "0.00",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 11,
"name": "Income",
"account_type": {
"id": 7,
"name": "Revenue",
"principle": {
"id": 3,
"name": "Owen's Equity",
"calculation_type": "cr"
}
},
"calculation_type": "cr"
}
},
{
"id": 8,
"name": "Purchases - Product",
"description": "Purchases - Product",
"balance": "47388.00",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 12,
"name": "Expenses",
"account_type": {
"id": 7,
"name": "Revenue",
"principle": {
"id": 3,
"name": "Owen's Equity",
"calculation_type": "cr"
}
},
"calculation_type": "dr"
}
},
{
"id": 7,
"name": "Payroll Expenses",
"description": "Payroll Expenses",
"balance": "0.00",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 12,
"name": "Expenses",
"account_type": {
"id": 7,
"name": "Revenue",
"principle": {
"id": 3,
"name": "Owen's Equity",
"calculation_type": "cr"
}
},
"calculation_type": "dr"
}
},
{
"id": 6,
"name": "Office expenses",
"description": "Office expenses",
"balance": "28899.00",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 12,
"name": "Expenses",
"account_type": {
"id": 7,
"name": "Revenue",
"principle": {
"id": 3,
"name": "Owen's Equity",
"calculation_type": "cr"
}
},
"calculation_type": "dr"
}
},
{
"id": 5,
"name": "Accounts Payable (A/P)",
"description": "Accounts Payable (A/P)",
"balance": "18491.00",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 9,
"name": "Accounts Payable (A/P)",
"account_type": {
"id": 5,
"name": "Current liabilities",
"principle": {
"id": 2,
"name": "Liability",
"calculation_type": "cr"
}
},
"calculation_type": "cr"
}
},
{
"id": 4,
"name": "Inventory",
"description": "Inventory",
"balance": "88682.75",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 7,
"name": "Inventory",
"account_type": {
"id": 3,
"name": "Current assets",
"principle": {
"id": 1,
"name": "Asset",
"calculation_type": "dr"
}
},
"calculation_type": "dr"
}
},
{
"id": 3,
"name": "Accounts Receivable (A/R)",
"description": "Accounts Receivable (A/R)",
"balance": "2500.00",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 5,
"name": "Accounts Receivable (A/R)",
"account_type": {
"id": 2,
"name": "Accounts Receivable (A/R)",
"principle": {
"id": 1,
"name": "Asset",
"calculation_type": "dr"
}
},
"calculation_type": "dr"
}
},
{
"id": 1,
"name": "Rajib",
"description": "Test",
"balance": "431248.00",
"is_active": true,
"is_editable": false,
"account_detail_type": {
"id": 1,
"name": "Cash and cash equivalents",
"account_type": {
"id": 1,
"name": "Cash and cash equivalents",
"principle": {
"id": 1,
"name": "Asset",
"calculation_type": "dr"
}
},
"calculation_type": "dr"
}
}
],
From this JSON list I want to filter by id and show the result in an HTML template. This is needed for an edit task: when someone clicks the edit button, a particular id is passed. For example, when the id is 147, I want to show the other data for 147, such as name, description, balance and account detail type, as a single object.
See the HTML below. I want to show the data (name, description, balance, etc.) in the value {{ }} placeholders:
<div class="form-group row">
<label class="col-form-label col-md-2">Name</label>
<div class="col-md-10">
<input type="text" class="form-control" name="name"
value="{{ }}">
</div>
</div>
<div class="form-group row">
<label class="col-form-label col-md-2">Description</label>
<div class="col-md-10">
<input type="text" class="form-control" name="desc"
value="{{ }}">
</div>
</div>
<div class="form-group row">
<label class="col-form-label col-md-2">Balance</label>
<div class="col-md-10">
<input type="text" class="form-control" name="bal"
value="{{ }}">
</div>
</div>
It is better to store this JSON as a model in the project and render the HTML when the required id is sent.
For example, if givenId = 147 is passed, the function in views.py can run:
chart = chartsOfAccounts.objects.get(id=givenId)
return render(request,<htmlfile>,{"chart":chart})
Note that objects.get returns a single object, whereas filter returns a queryset, which would not expose chart.name directly in the template.
In the HTML you can then use the chart object to get the data (assuming the model fields are named after the JSON keys):
{{ chart.name }}
{{ chart.description }}
Otherwise, convert the JSON file to a dictionary and pass the matching entry to the template.
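The dictionary route can be sketched without any framework. The example below is a minimal illustration, assuming the JSON has already been read into a string; find_account is a hypothetical helper name, and the sample is trimmed to two entries from the list in the question.

```python
import json

# Trimmed sample of the "chartOfAccounts" list from the question.
RAW = """
{"chartOfAccounts": [
  {"id": 147, "name": "Sales Product - Wholesale test",
   "description": "Sales - Wholesale", "balance": "0.00"},
  {"id": 145, "name": "Cash in hand rony",
   "description": "Cash in hand rony", "balance": "-45980.00"}
]}
"""


def find_account(accounts, given_id):
    """Return the first account dict whose "id" equals given_id, or None."""
    return next((a for a in accounts if a.get("id") == given_id), None)


accounts = json.loads(RAW)["chartOfAccounts"]
account = find_account(accounts, 147)
print(account["name"], account["description"], account["balance"])
```

The returned dict could then be passed to the template context (for instance under the key "chart"), so that {{ chart.name }} and {{ chart.balance }} resolve by dictionary lookup. Handling the None case for an unknown id is left to the view.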
I'm having a problem where I can no longer scrape data from a website by reading the JSON embedded in its JavaScript. I'm trying to scrape from Rocket League Tracker.
Here's my code:
import requests
import re
import json
import math


def rankGetter():
    trackerLink = 'https://rocketleague.tracker.network/rocket-league/profile/epic/DirectPanda/overview'
    # now we have the tracker link we're going to scrape the website
    # all the HTML of the site is now in result
    result = requests.get(trackerLink)
    # checker to make sure the user used the correct information
    if result.status_code == 400:
        print('profile not found')
    else:
        # Extract everything needed to render the current page. Data is stored as Json in the
        # JavaScript variable: window.__INITIAL_STATE__={"route":{"path":"\u0 ... }};
        json_string = re.search(r"window.__INITIAL_STATE__\s?=\s?(\{.*?\});", result.text).group(1)
        # convert text string to structured json data
        rocketleague = json.loads(json_string)
        # Save structured json data to a text file that helps you orient yourself and pick
        # the parts you are interested in.
        with open('rocketleague_json_data.txt', 'w') as outfile:
            outfile.write(json.dumps(rocketleague, indent=4, sort_keys=True))
The problem is that the generated text file no longer contains the ranks I want.
"stats": {
"standardLeaderboardLeaders": {},
"standardLeaderboards": [],
"standardPlayers": {},
"standardTitles": {}
},
**"stats-v2": {
"segments": {},
"standardProfileMatches": {},
"standardProfileSummaries": {},
"standardProfiles": {},
"standardProfilesHistory": {},
"standardSessions": {},
"subscriptions": {}
},**
"titles": {
"currentTitle": {
"name": "Rocket League",
"platforms": [
The ranks should be under stats-v2, but as you can see it's empty now.
What's happening and how do I fix it? I was able to get ranks for a week, but all of a sudden it stopped working today.
It seems that the data is loaded from an external URL:
import json
import requests
url = "https://api.tracker.gg/api/v2/rocket-league/standard/profile/epic/DirectPanda"
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
}
data = requests.get(url, headers=headers).json()
print(json.dumps(data, indent=4))
Prints:
{
"data": {
"platformInfo": {
"platformSlug": "epic",
"platformUserId": null,
"platformUserHandle": "DirectPanda",
"platformUserIdentifier": "DirectPanda",
"avatarUrl": null,
"additionalParameters": null
},
"userInfo": {
"userId": null,
"isPremium": false,
"isVerified": false,
"isInfluencer": false,
"isPartner": false,
"countryCode": null,
"customAvatarUrl": null,
"customHeroUrl": null,
"socialAccounts": [],
"pageviews": 592,
"isSuspicious": null
},
"metadata": {
"lastUpdated": {
"value": "2021-04-22T17:39:42.277-04:00",
"displayValue": "2021-04-22T21:39:42.2770000+00:00"
},
"playerId": 16603481,
"currentSeason": 17
},
"segments": [
{
"type": "overview",
"attributes": {},
"metadata": {
"name": "Lifetime"
},
"expiryDate": "0001-01-01T00:00:00+00:00",
"stats": {
"wins": {
"rank": 30357,
"percentile": 98.3,
"displayName": "Wins",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 4985,
"displayValue": "4,985",
"displayType": "Number"
},
"goals": {
"rank": 23698,
"percentile": 98.7,
"displayName": "Goals",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 14363,
"displayValue": "14,363",
"displayType": "Number"
},
"mVPs": {
"rank": 35646,
"percentile": 98.0,
"displayName": "MVPs",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 2093,
"displayValue": "2,093",
"displayType": "Number"
},
"saves": {
"rank": 30864,
"percentile": 98.3,
"displayName": "Saves",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 9231,
"displayValue": "9,231",
"displayType": "Number"
},
"assists": {
"rank": 29228,
"percentile": 98.4,
"displayName": "Assists",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 4763,
"displayValue": "4,763",
"displayType": "Number"
},
"shots": {
"rank": 24596,
"percentile": 98.6,
"displayName": "Shots",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 29139,
"displayValue": "29,139",
"displayType": "Number"
},
"goalShotRatio": {
"rank": 1409320,
"percentile": 15.0,
"displayName": "Goal Shot Ratio",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 49.29132777377398,
"displayValue": "49.3",
"displayType": "NumberPrecision1"
},
"score": {
"rank": 28260,
"percentile": 98.4,
"displayName": "TRN Score",
"displayCategory": "General",
"category": "general",
"metadata": {},
"value": 2398222.83,
"displayValue": "2,398,222.8",
"displayType": "NumberPrecision1"
},
"seasonRewardLevel": {
"rank": null,
"percentile": 85.0,
"displayName": "Season Reward Level",
"displayCategory": "General",
"category": "general",
"metadata": {
"iconUrl": "https://trackercdn.com/cdn/tracker.gg/rocket-league/ranks/s4-13.png",
"rankName": "Diamond"
},
"value": 5,
"displayValue": "5",
"displayType": "Number"
},
"seasonRewardWins": {
"rank": null,
"percentile": 95.8,
"displayName": "Season Reward Wins",
"displayCategory": "General",
"category": "general",
"metadata": {},
"value": 9,
"displayValue": "9",
"displayType": "Number"
}
}
},
{
"type": "playlist",
"attributes": {
"playlistId": 0,
"season": 17
},
"metadata": {
"name": "Un-Ranked"
},
"expiryDate": "0001-01-01T00:00:00+00:00",
"stats": {
"tier": {
"rank": null,
"percentile": null,
"displayName": "Matches",
"displayCategory": "General",
"category": "general",
"metadata": {
"iconUrl": "https://trackercdn.com/cdn/tracker.gg/rocket-league/ranks/s4-0.png",
"name": "Unranked"
},
"value": 0,
"displayValue": "0",
"displayType": "Number"
},
"division": {
"rank": null,
"percentile": null,
"displayName": "Matches",
"displayCategory": "General",
"category": "general",
"metadata": {
"name": "Division I"
},
"value": 0,
"displayValue": "0",
"displayType": "Number"
},
"matchesPlayed": {
"rank": null,
"percentile": null,
"displayName": "Matches",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 0,
"displayValue": "0",
"displayType": "Number"
},
"winStreak": {
"rank": null,
"percentile": null,
"displayName": "WinStreak",
"displayCategory": "Performance",
"category": "performance",
"metadata": {
"type": "win"
},
"value": 0,
"displayValue": "0",
"displayType": "Number"
},
"rating": {
"rank": 215152,
"percentile": 90.0,
"displayName": "Rating",
"displayCategory": "Skill",
"category": "skill",
"metadata": {},
"value": 1597,
"displayValue": "1,597",
"displayType": "Number"
}
}
},
{
"type": "playlist",
"attributes": {
"playlistId": 10,
"season": 17
},
"metadata": {
"name": "Ranked Duel 1v1"
},
"expiryDate": "0001-01-01T00:00:00+00:00",
"stats": {
"tier": {
"rank": null,
"percentile": 98.2,
"displayName": "Matches",
"displayCategory": "General",
"category": "general",
"metadata": {
"iconUrl": "https://trackercdn.com/cdn/tracker.gg/rocket-league/ranks/s4-16.png",
"name": "Champion I"
},
"value": 16,
"displayValue": "16",
"displayType": "Number"
},
"division": {
"rank": null,
"percentile": 88.0,
"displayName": "Matches",
"displayCategory": "General",
"category": "general",
"metadata": {
"deltaDown": 13,
"deltaUp": 6,
"name": "Division III"
},
"value": 2,
"displayValue": "2",
"displayType": "Number"
},
"matchesPlayed": {
"rank": null,
"percentile": 57.0,
"displayName": "Matches",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 2,
"displayValue": "2",
"displayType": "Number"
},
"winStreak": {
"rank": null,
"percentile": 60.0,
"displayName": "WinStreak",
"displayCategory": "Performance",
"category": "performance",
"metadata": {
"type": "win"
},
"value": 1,
"displayValue": "1",
"displayType": "Number"
},
"rating": {
"rank": 101541,
"percentile": 96.1,
"displayName": "Rating",
"displayCategory": "Skill",
"category": "skill",
"metadata": {},
"value": 1031,
"displayValue": "1,031",
"displayType": "Number"
}
}
},
{
"type": "playlist",
"attributes": {
"playlistId": 11,
"season": 17
},
"metadata": {
"name": "Ranked Doubles 2v2"
},
"expiryDate": "0001-01-01T00:00:00+00:00",
"stats": {
"tier": {
"rank": null,
"percentile": 87.0,
"displayName": "Matches",
"displayCategory": "General",
"category": "general",
"metadata": {
"iconUrl": "https://trackercdn.com/cdn/tracker.gg/rocket-league/ranks/s4-16.png",
"name": "Champion I"
},
"value": 16,
"displayValue": "16",
"displayType": "Number"
},
"division": {
"rank": null,
"percentile": 90.0,
"displayName": "Matches",
"displayCategory": "General",
"category": "general",
"metadata": {
"deltaDown": 15,
"deltaUp": 3,
"name": "Division IV"
},
"value": 3,
"displayValue": "3",
"displayType": "Number"
},
"matchesPlayed": {
"rank": null,
"percentile": 80.0,
"displayName": "Matches",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 40,
"displayValue": "40",
"displayType": "Number"
},
"winStreak": {
"rank": null,
"percentile": 34.0,
"displayName": "WinStreak",
"displayCategory": "Performance",
"category": "performance",
"metadata": {
"type": "loss"
},
"value": 1,
"displayValue": "-1",
"displayType": "Number"
},
"rating": {
"rank": 311789,
"percentile": 89.0,
"displayName": "Rating",
"displayCategory": "Skill",
"category": "skill",
"metadata": {},
"value": 1177,
"displayValue": "1,177",
"displayType": "Number"
}
}
},
{
"type": "playlist",
"attributes": {
"playlistId": 13,
"season": 17
},
"metadata": {
"name": "Ranked Standard 3v3"
},
"expiryDate": "0001-01-01T00:00:00+00:00",
"stats": {
"tier": {
"rank": null,
"percentile": 96.0,
"displayName": "Matches",
"displayCategory": "General",
"category": "general",
"metadata": {
"iconUrl": "https://trackercdn.com/cdn/tracker.gg/rocket-league/ranks/s4-17.png",
"name": "Champion II"
},
"value": 17,
"displayValue": "17",
"displayType": "Number"
},
"division": {
"rank": null,
"percentile": 79.0,
"displayName": "Matches",
"displayCategory": "General",
"category": "general",
"metadata": {
"deltaDown": 7,
"deltaUp": 27,
"name": "Division III"
},
"value": 2,
"displayValue": "2",
"displayType": "Number"
},
"matchesPlayed": {
"rank": null,
"percentile": 97.8,
"displayName": "Matches",
"displayCategory": "Performance",
"category": "performance",
"metadata": {},
"value": 95,
"displayValue": "95",
"displayType": "Number"
},
"winStreak": {
"rank": null,
"percentile": 16.0,
"displayName": "WinStreak",
"displayCategory": "Performance",
"category": "performance",
"metadata": {
"type": "loss"
},
"value": 2,
"displayValue": "-2",
"displayType": "Number"
},
"rating": {
"rank": 122500,
"percentile": 95.8,
"displayName": "Rating",
"displayCategory": "Skill",
"category": "skill",
"metadata": {},
"value": 1255,
"displayValue": "1,255",
"displayType": "Number"
}
}
},
...
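Once the response is available as a dictionary, the ranks can be picked out of the "segments" list. The following is a rough sketch based only on the structure printed above; playlist_ranks is a hypothetical helper, and the sample is a heavily trimmed copy of that output rather than a live response.

```python
def playlist_ranks(payload):
    """Collect playlist name, tier, division and rating for every
    segment of type "playlist"; other segments are skipped."""
    rows = []
    for segment in payload["data"]["segments"]:
        if segment.get("type") != "playlist":
            continue
        stats = segment["stats"]
        rows.append({
            "playlist": segment["metadata"]["name"],
            "tier": stats["tier"]["metadata"]["name"],
            "division": stats["division"]["metadata"]["name"],
            "rating": stats["rating"]["value"],
        })
    return rows


# Trimmed sample mirroring the printed response.
sample = {"data": {"segments": [
    {"type": "overview", "metadata": {"name": "Lifetime"}, "stats": {}},
    {"type": "playlist", "metadata": {"name": "Ranked Duel 1v1"}, "stats": {
        "tier": {"metadata": {"name": "Champion I"}, "value": 16},
        "division": {"metadata": {"name": "Division III"}, "value": 2},
        "rating": {"metadata": {}, "value": 1031},
    }},
]}}

for row in playlist_ranks(sample):
    print(row)
```

With the real response, the `data` variable from the request above would be passed in place of `sample`. The keys are taken from the output shown, so the function may need adjusting if the API changes its layout.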
I have the scenario shown in the Python code below.
I am trying to explicitly define new york and ny as synonyms, but unfortunately it is not working. Can you please guide me, as I am new to Elasticsearch?
I am also using a custom analyzer.
I also have the file synonyms.txt, which contains:
ny,newyork,nyork
from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch()
keywords = ['thousand eyes', 'facebook', 'superdoc', 'quora', 'your story', 'Surgery', 'lending club', 'ad roll',
'the honest company', 'Draft kings', 'newyork']
count = 1
doc_setting = {
"settings": {
"analysis": {
"analyzer": {
"my_analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
},
"my_analyzer_shingle": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": "true"
}
}
}
}, "mappings": {
"your_type": {
"properties": {
"keyword": {
"type": "string",
"index_analyzer": "my_analyzer_keyword",
"search_analyzer": "my_analyzer_shingle"
}
}
}
}
}
validate=es.index(index='test', doc_type='your_type', body=doc_setting)
print(validate)
for keyword in keywords:
    doc = {
        'id': count,
        'keyword': keyword
    }
    res = es.index(index="test", doc_type='your_type', id=count, body=doc)
    print(res['result'])
    count = count + 1
#res11 = es.get(index="test", doc_type='your_type', id=1)
#print(res11['_source'])
es.indices.refresh(index="test")
question = "I saw news on ny news channel of lending club on facebook, your story and quora"
print("Question asked: %s" % question)
res = es.search(index="test", doc_type='your_type', body={
    "query": {"match": {"keyword": question}}})
print("Got %d Hits:" % res['hits']['total'])
for hit in res['hits']['hits']:
    print(hit["_source"])
PUT /test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
},
"my_analyzer_shingle": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase",
"synonym"
]
}
},
"filter": {
"synonym" : {
"type" : "synonym",
"lenient": true,
"synonyms" : ["ny,newyork,nyork"]
}
}
}
}, "mappings": {
"your_type": {
"properties": {
"keyword": {
"type": "text",
"analyzer": "my_analyzer_keyword",
"search_analyzer": "my_analyzer_shingle"
}
}
}
}
}
Then analyze the text using
POST /test_index/_analyze
{
"analyzer" : "my_analyzer_shingle",
"text" : "I saw news on ny news channel of lending club on facebook, your story and quorat"
}
The tokens I get are
{
"tokens": [
{
"token": "i",
"start_offset": 0,
"end_offset": 1,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "saw",
"start_offset": 2,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "news",
"start_offset": 6,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "on",
"start_offset": 11,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "ny",
"start_offset": 14,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "newyork",
"start_offset": 14,
"end_offset": 16,
"type": "SYNONYM",
"position": 4
},
{
"token": "nyork",
"start_offset": 14,
"end_offset": 16,
"type": "SYNONYM",
"position": 4
},
{
"token": "news",
"start_offset": 17,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 5
},
{
"token": "channel",
"start_offset": 22,
"end_offset": 29,
"type": "<ALPHANUM>",
"position": 6
},
{
"token": "of",
"start_offset": 30,
"end_offset": 32,
"type": "<ALPHANUM>",
"position": 7
},
{
"token": "lending",
"start_offset": 33,
"end_offset": 40,
"type": "<ALPHANUM>",
"position": 8
},
{
"token": "club",
"start_offset": 41,
"end_offset": 45,
"type": "<ALPHANUM>",
"position": 9
},
{
"token": "on",
"start_offset": 46,
"end_offset": 48,
"type": "<ALPHANUM>",
"position": 10
},
{
"token": "facebook",
"start_offset": 49,
"end_offset": 57,
"type": "<ALPHANUM>",
"position": 11
},
{
"token": "your",
"start_offset": 59,
"end_offset": 63,
"type": "<ALPHANUM>",
"position": 12
},
{
"token": "story",
"start_offset": 64,
"end_offset": 69,
"type": "<ALPHANUM>",
"position": 13
},
{
"token": "and",
"start_offset": 70,
"end_offset": 73,
"type": "<ALPHANUM>",
"position": 14
},
{
"token": "quorat",
"start_offset": 74,
"end_offset": 80,
"type": "<ALPHANUM>",
"position": 15
}
]
}
and the search produces
POST /test_index/_search
{
"query" : {
"match" : { "keyword" : "I saw news on ny news channel of lending club on facebook, your story and quora" }
}
}
{
"took": 36,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.6858001,
"hits": [
{
"_index": "test_index",
"_type": "your_type",
"_id": "4",
"_score": 1.6858001,
"_source": {
"keyword": "newyork"
}
},
{
"_index": "test_index",
"_type": "your_type",
"_id": "2",
"_score": 1.1727304,
"_source": {
"keyword": "facebook"
}
},
{
"_index": "test_index",
"_type": "your_type",
"_id": "5",
"_score": 0.6931472,
"_source": {
"keyword": "quora"
}
}
]
}
}
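One thing that looks off in the Python script from the question is that the settings and mappings are passed to es.index, which stores them as an ordinary document instead of configuring the index; the PUT request above creates the index with them instead. A rough Python equivalent of building that request body is sketched below. build_index_body is a hypothetical helper, and the commented-out es.indices.create call is an assumption that depends on the client and server versions in use (mapping types such as your_type only exist in older Elasticsearch releases).

```python
def build_index_body(synonyms, doc_type="your_type"):
    """Return settings and mappings mirroring the PUT /test_index request."""
    filters = ["asciifolding", "lowercase", "synonym"]
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer_keyword": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": list(filters),
                    },
                    "my_analyzer_shingle": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": list(filters),
                    },
                },
                "filter": {
                    "synonym": {
                        "type": "synonym",
                        "lenient": True,
                        "synonyms": list(synonyms),
                    }
                },
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    "keyword": {
                        "type": "text",
                        "analyzer": "my_analyzer_keyword",
                        "search_analyzer": "my_analyzer_shingle",
                    }
                }
            }
        },
    }


body = build_index_body(["ny,newyork,nyork"])
# Assumption: with a running cluster and a matching client version, the index
# would be created before any documents are indexed, for example:
# es.indices.create(index="test_index", body=body)
print(body["settings"]["analysis"]["filter"]["synonym"])
```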
I have the following data:
[
{
"M": [
{
"id": 1,
"nk": "MATH$$SPRING$$INST1$$2",
"section": {
"nk": "MATH$$SPRING$$INST1",
"course": 1,
"id": 1
},
"location": {
"id": 1,
"nk": "mcu$$101",
"campus": {
"id": 1,
"nk": "mcu",
"name": "Main Campus"
},
"address": "1 st",
"building": "1",
"room": "101"
},
"day_of_week": 2,
"start_time": "09:00:00",
"end_time": "10:00:00"
},
{
"id": 3,
"nk": "ENG$$SPRING$$INST2$$2",
"section": {
"nk": "ENG$$SPRING$$INST2",
"course": 2,
"id": 4
},
"location": {
"id": 2,
"nk": "mcu$$201",
"campus": {
"id": 1,
"nk": "mcu",
"name": "Main Campus"
},
"address": "1 st",
"building": "1",
"room": "201"
},
"day_of_week": 2,
"start_time": "09:00:00",
"end_time": "10:00:00"
},
{
"id": 4,
"nk": "ENG$$SPRING$$INST2$$22",
"section": {
"nk": "ENG$$SPRING$$INST2",
"course": 2,
"id": 4
},
"location": {
"id": 2,
"nk": "mcu$$201",
"campus": {
"id": 1,
"nk": "mcu",
"name": "Main Campus"
},
"address": "1 st",
"building": "1",
"room": "201"
},
"day_of_week": 2,
"start_time": "10:00:00",
"end_time": "11:00:00"
}
]
},
{
"W": [
{
"id": 2,
"nk": "MATH$$SPRING$$INST1$$4",
"section": {
"nk": "MATH$$SPRING$$INST2",
"course": 1,
"id": 2
},
"location": {
"id": 2,
"nk": "mcu$$201",
"campus": {
"id": 1,
"nk": "mcu",
"name": "Main Campus"
},
"address": "1 st",
"building": "1",
"room": "201"
},
"day_of_week": 4,
"start_time": "08:00:00",
"end_time": "10:00:00"
}
]
}
]
I'm trying to extract "W"'s list.
When I do jq('[.[].W][]').transform(data) I get None, but when I do jq('[.[].M][]').transform(data) I get the desired result. Why am I experiencing this?
I'm trying to extract "W"'s list.
OK, so let's first deal with jq, and then with the python interface.
jq
.[] yields all the items in the top-level array, and therefore
.[] | .W will yield two items:
null (because the first item does not have .W), and
the desired list
To extract just "W"'s list, you could use any of the following filters,
depending on your precise requirements:
.[] | select(has("W")) | .W
.[] | .W | select(.)
.[] | .W // empty
.[1].W
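The difference between these filters can be mimicked in plain Python, without jq, to see why the first output for W is null. This is only an analogy under the assumption that jq's null maps to Python's None; the data is a trimmed stand-in for the input in the question.

```python
# Trimmed stand-in for the input: the first item has only "M", the second only "W".
data = [
    {"M": [{"id": 1}, {"id": 3}, {"id": 4}]},
    {"W": [{"id": 2}]},
]

# Analogue of `.[] | .W`: one output per top-level item,
# None (jq's null) where the key is missing.
all_outputs = [item.get("W") for item in data]

# Analogue of `.[] | select(has("W")) | .W`: items without "W" are dropped.
only_w = [item["W"] for item in data if "W" in item]

# Analogue of `.[] | .W // empty`: null (or false) outputs are dropped.
non_empty = [value for value in all_outputs if value]

# When only the first output is kept, W yields None while M yields its list.
first_w = all_outputs[0]
first_m = [item.get("M") for item in data][0]

print(all_outputs)
print(only_w)
print(first_w, first_m)
```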
from jq import jq
As the documentation at https://pypi.org/project/pyjq/ says:
If multiple_output is False (the default), then the first output is used
For example:
print(jq('1,2').transform(data))
yields just 1.
In summary
Depending on the precise requirements, you can use any of the filters given above, for example:
jq('.[] | .W // empty').transform(data)
Moral
If there's a moral to this tale, it might be that, when in doubt, you should consider using jq (the command-line executable) or jqplay to make sure your jq filter is doing what you want.