I believe the string below is in JSON format, but when I try to load it into a JSON variable so I can easily query and extract the data points, it errors out:
jdata = '<pre>{\n "askId": "AAABB110-000011",\n "dateCreated": "2009-09-01T00:00:00.000Z",\n "dateUpdated": "2021-06-24T00:00:00.000Z",\n "owners": [\n {\n "ownerType": "Service-Level Owner",\n "VendorId": "000111222"\n },\n {\n "ownerType": "Technical Owner",\n "CustomerId": "000333444"\n },\n {\n "ownerType": "Business Owner",\n "ServiceId": "000005556"\n }\n ],\n "createdBy": "SYSTEM",\n "lastUpdatedBy": "000667778",\n "applicationName": "Treasury Bank Data",\n "description": "Process Data",\n "aliases": [\n "Treasury Bank Data",\n "Bank Data",\n "Bank Reconciliation",\n "Bank Rec",\n "SERV-X"\n ],\n "billingBusinessSegmentId": 6,\n "category": {\n "categoryId": 1,\n "categoryName": "Application"\n },\n "lifecycleStage": {\n "lifecycleStageId": 3,\n "lifecycleStageName": "Production"\n },\n "acquiredEntity": {\n "acquiredEntityId": 0,\n "acquiredEntityName": "Not Applicable"\n },\n "enclaveEnvironment": {\n "enclaveEnvironmentId": 0,\n "enclaveEnvironmentName": "General Hosting (Internal, Cloud, or Vendor hosted)"\n },\n "softwareType": {\n "softwareTypeId": 7,\n "softwareTypeName": "Vendor Product"\n },\n "infrastructureRequired": false,\n "uhgHelpdeskRequired": false,\n "references": [\n {\n "referenceType": "disaster-recovery",\n "referenceValue": "APPX009919"\n }\n ]\n}\n</pre>'
jsonData = json.loads(jdata)
When I try to load the data into the jsonData variable, I get the error below:
Expecting value: line 1 column 1 (char 0)
Is there something wrong with my string that prevents it from loading as JSON?
If you strip off the HTML <pre> tags, what is left is valid JSON:
import json
jdata = '<pre>{\n "askId": "AAABB110-000011",\n "dateCreated": "2009-09-01T00:00:00.000Z",\n "dateUpdated": "2021-06-24T00:00:00.000Z",\n "owners": [\n {\n "ownerType": "Service-Level Owner",\n "VendorId": "000111222"\n },\n {\n "ownerType": "Technical Owner",\n "CustomerId": "000333444"\n },\n {\n "ownerType": "Business Owner",\n "ServiceId": "000005556"\n }\n ],\n "createdBy": "SYSTEM",\n "lastUpdatedBy": "000667778",\n "applicationName": "Treasury Bank Data",\n "description": "Process Data",\n "aliases": [\n "Treasury Bank Data",\n "Bank Data",\n "Bank Reconciliation",\n "Bank Rec",\n "SERV-X"\n ],\n "billingBusinessSegmentId": 6,\n "category": {\n "categoryId": 1,\n "categoryName": "Application"\n },\n "lifecycleStage": {\n "lifecycleStageId": 3,\n "lifecycleStageName": "Production"\n },\n "acquiredEntity": {\n "acquiredEntityId": 0,\n "acquiredEntityName": "Not Applicable"\n },\n "enclaveEnvironment": {\n "enclaveEnvironmentId": 0,\n "enclaveEnvironmentName": "General Hosting (Internal, Cloud, or Vendor hosted)"\n },\n "softwareType": {\n "softwareTypeId": 7,\n "softwareTypeName": "Vendor Product"\n },\n "infrastructureRequired": false,\n "uhgHelpdeskRequired": false,\n "references": [\n {\n "referenceType": "disaster-recovery",\n "referenceValue": "APPX009919"\n }\n ]\n}\n</pre>'
# Remove the literal <pre> wrapper (str.removeprefix/removesuffix need Python 3.9+)
jsonData = json.loads(jdata.removeprefix('<pre>').removesuffix('</pre>'))
print(jsonData)
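Be careful with str.lstrip()/str.rstrip() for this job: they strip any of the given characters, not a literal prefix or suffix, so they can eat into the payload itself. A slightly more robust sketch, assuming the JSON is always wrapped in a single <pre> element (the helper name parse_pre_json is hypothetical), extracts the element body with a regular expression and unescapes any HTML entities before parsing:

```python
import html
import json
import re

def parse_pre_json(text):
    """Parse JSON wrapped in a <pre>...</pre> element (hypothetical helper)."""
    # Grab everything between the first <pre ...> and the following </pre>.
    match = re.search(r"<pre[^>]*>(.*?)</pre>", text, flags=re.DOTALL | re.IGNORECASE)
    body = match.group(1) if match else text  # no wrapper: assume bare JSON
    # Servers sometimes HTML-escape the payload (&quot; etc.), so undo that first.
    return json.loads(html.unescape(body))

# lstrip() removes a *set of characters*, so it can eat into the content:
print("<pre>pretty</pre>".lstrip("<pre>"))  # tty</pre>

sample = '<pre>{\n "askId": "AAABB110-000011",\n "owners": [{"ownerType": "Technical Owner"}]\n}\n</pre>'
data = parse_pre_json(sample)
print(data["askId"])                   # AAABB110-000011
print(data["owners"][0]["ownerType"])  # Technical Owner
```

Once parsed, the result is an ordinary dict, so nested values such as data["owners"] can be looped over directly.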
I am trying to scrape all restaurant names from Deliveroo for a given postcode. I have somewhat managed this with the following code, but I only get 20 or so results, whereas Deliveroo shows nearly 100 restaurants.
import requests
from bs4 import BeautifulSoup
import pandas as pd
# get the page
url = 'https://deliveroo.co.uk/restaurants/oxford/port-meadow?geohash=gcpn7n35zy89&sort=distance'
page = requests.get(url)
# parse the page
soup = BeautifulSoup(page.content, 'html.parser')
# find all <li> elements with class "HomeFeedUILines-8bd2eacc5b5ba98e"
results = soup.find_all("li", {"class": "HomeFeedUILines-8bd2eacc5b5ba98e"})
# print the text of every <p> element inside each result
for result in results:
    ps = result.find_all("p")
    for p in ps:
        print(p.text)
I believe someone else faced a similar issue, but I don't understand their answer or how I'd implement it:
Scraping website only gets me 20 results, but there should be more.
I also found someone's GitHub project that does something similar, but it is much more complex and the code seems to break at the cookie-consent click: https://github.com/SelvinSelbaraju/deliveroo-web-scraper/blob/main/Deliveroo%20-%20Web%20Scraper%20and%20Data%20Analysis.ipynb
Why scrape the frontend when you have direct access to their GraphQL API? I took a look out of curiosity. Before scraping anything, I generally check whether I can tap into the data feed behind the DOM rather than scrape the DOM itself, because scraping the DOM is often unreliable, inconsistent, and error-prone.
Looking at the AJAX calls made when filtering these results, I noticed a GraphQL network request that carries all the data the page renders into the DOM, so that is the best live feed to tap into. The response is about 1.11 MB, so it very likely contains all of the restaurants.
See the GraphQL Python request below: you can consume this JSON, loop through the nested arrays to parse the results, and voilà, all your data is there.
import requests
import json
url = "https://api.uk.deliveroo.com/consumer/graphql/"
payload = json.dumps({
"query": "\n query getHomeFeed(\n $ui_actions: [UIActionType!]\n $ui_blocks: [UIBlockType!]\n $ui_controls: [UIControlType!]\n $ui_features: [UIFeatureType!]\n $ui_layouts: [UILayoutType!]\n $ui_layout_carousel_styles: [UILayoutCarouselStyle!]\n $ui_lines: [UILineType!]\n $ui_targets: [UITargetType!]\n $ui_themes: [UIThemeType!]\n $fulfillment_methods: [FulfillmentMethod!]\n $location: LocationInput!\n $url: String\n $options: SearchOptionsInput\n $uuid: String!\n ) {\n results: search(\n location: $location\n options: $options\n url: $url\n capabilities: {\n ui_actions: $ui_actions,\n ui_blocks: $ui_blocks,\n ui_controls: $ui_controls,\n ui_features: $ui_features,\n ui_layouts: $ui_layouts,\n ui_layout_carousel_styles: $ui_layout_carousel_styles,\n ui_lines: $ui_lines\n ui_targets: $ui_targets,\n ui_themes: $ui_themes,\n fulfillment_methods: $fulfillment_methods\n }\n uuid: $uuid\n ) {\n layoutGroups: ui_layout_groups {\n id\n subheader\n data: ui_layouts { ...uiHomeLayoutFields }\n }\n\n controlGroups: ui_control_groups {\n appliedFilters: applied_filters { ...uiControlAppliedFilterFields }\n filters { ...uiControlFilterFields }\n sort { ...uiControlFilterFields }\n queryResults: query_results { ...uiControlQueryResultFields }\n fulfillmentMethods: fulfillment_methods { ...uiControlFulfillmentMethodFields }\n }\n\n modals: ui_modals {\n ...uiModalFields\n ...uiChallengesModalFields\n ...uiPlusFullScreenModalFields\n }\n\n overlays: ui_feed_overlays {\n ...uiFeedOverlayFields\n }\n\n meta {\n ...searchResultMetaFields\n }\n }\n\n \n fulfillmentTimes: fulfillment_times(\n capabilities: {\n ui_actions: $ui_actions,\n ui_blocks: $ui_blocks,\n ui_controls: $ui_controls,\n ui_features: $ui_features,\n ui_layouts: $ui_layouts,\n ui_layout_carousel_styles: $ui_layout_carousel_styles,\n ui_lines: $ui_lines\n ui_targets: $ui_targets,\n ui_themes: $ui_themes,\n fulfillment_methods: $fulfillment_methods\n }\n fulfillment_methods: $fulfillment_methods\n location: $location\n 
uuid: $uuid\n ) {\n fulfillmentTimeMethods: fulfillment_time_methods {\n fulfillmentMethodLabel: fulfillment_method_label\n fulfillmentMethod: fulfillment_method\n asap {\n ...fulfillmentTimeOptionFields\n }\n days {\n day\n dayLabel: day_label\n times {\n ...fulfillmentTimeOptionFields\n }\n }\n }\n }\n\n }\n \n fragment fulfillmentTimeOptionFields on FulfillmentTimeOption {\n optionLabel: option_label\n selectedLabel: selected_label\n timestamp(format: UNIX)\n selectedTime: selected_time {\n day\n time\n }\n }\n\n \n fragment uiControlFulfillmentMethodFields on UIControlFulfillmentMethod {\n label\n targetMethod: target_method\n }\n\n \n fragment searchResultMetaFields on SearchResultMeta {\n location {\n cityName: city_name\n cityUname: city_uname\n isoCode: country_iso_code\n countryName: country_name\n geohash\n lat\n lon\n neighborhoodUname: neighborhood_uname\n neighborhoodName: neighborhood_name\n postcode\n }\n options {\n fulfillmentMethod: fulfillment_method\n deliveryTime: delivery_time\n params {\n id\n value\n }\n query\n }\n restaurantCount: restaurant_count {\n results\n location\n }\n searchPlaceholder: search_placeholder\n title\n collection {\n linkTitle: link_title\n targetParams: target_params {\n ...uiTargetParamsFields\n }\n previousTargetParams: previous_target_params {\n ...uiTargetParamsFields\n }\n searchBarMeta: search_bar_meta {\n searchBarPlaceholder: search_bar_placeholder\n searchBarParams: search_bar_params {\n id\n value\n }\n }\n }\n uuid\n web {\n url\n }\n warnings {\n type: warning_type\n }\n searchPills: search_pills {\n id\n label\n placeholder\n params {\n id\n value\n }\n }\n }\n\n \n fragment uiTargetFields on UITarget {\n typeName: __typename\n ... on UITargetRestaurant {\n ...uiTargetRestaurant\n }\n ... on UITargetParams {\n ...uiTargetParamsFields\n }\n ... on UITargetAction {\n action\n }\n ... on UITargetMenuItem {\n ...uiTargetMenuItem\n }\n ... 
on UITargetDeepLink {\n webTarget: fallback_target {\n uri: url\n }\n }\n ... on UITargetMenuItemModifier {\n ...uiTargetMenuItemModifier\n }\n }\n\n \n fragment uiBlockFields on UIBlock {\n typeName: __typename\n ... on UIBanner {\n key\n header\n caption\n backgroundColor: background_color {\n ...colorFields\n }\n buttonCaption: button_caption\n contentDescription: content_description\n target {\n ...uiTargetFields\n }\n images {\n icon {\n ...iconFields\n }\n image\n }\n theme: ui_theme\n trackingId: tracking_id\n trackingProperties: tracking_properties\n }\n ... on UIButton {\n key\n text\n contentDescription: content_description\n target {\n ...uiTargetFields\n }\n theme: ui_theme\n trackingId: tracking_id\n trackingProperties: tracking_properties\n }\n ... on UIShortcut {\n key\n images {\n default\n }\n name\n contentDescription: content_description\n nameColor: name_color {\n ...colorFields\n }\n backgroundColor: background_color {\n ...colorFields\n }\n target {\n ...uiTargetFields\n }\n theme: ui_theme\n trackingId: tracking_id\n }\n ... on UICard {\n key\n trackingId: tracking_id\n trackingProperties: tracking_properties\n theme: ui_theme\n contentDescription: content_description\n border {\n ...uiCardBorderFields\n }\n target {\n ...uiTargetFields\n }\n uiContent: properties {\n default {\n ...uiHomeCardFields\n }\n expanded {\n ...uiHomeCardFields\n }\n }\n }\n ... on UIMerchandisingCard {\n key\n headerImageUrl: header_image_url\n backgroundImageUrl: background_image_url\n contentDescription: content_description\n uiLines: ui_lines {\n ...uiLines\n }\n cardBackgroundColor: background_color {\n ...colorFields\n }\n buttonCaption: button_caption\n target {\n ...uiTargetFields\n }\n trackingId: tracking_id\n }\n ... on UICategoryPill {\n ...uiCategoryPillFields\n }\n ... on UITallMenuItemCard {\n ...uiTallMenuItemCardFields\n }\n ... 
on UIStoryCard {\n ...uiStoryCardFields\n }\n }\n\n \n fragment uiHomeLayoutFields on UILayout {\n typeName: __typename\n ... on UILayoutCarousel {\n header\n subheader\n style\n imageUrl: image_url\n target {\n ...uiTargetFields\n }\n uiLines: ui_lines {\n ...uiLines\n }\n targetPresentational: target_presentational\n key\n blocks: ui_blocks {\n ...uiBlockFields\n }\n trackingId: tracking_id\n rows\n }\n ... on UILayoutList {\n header\n key\n blocks: ui_blocks {\n ...uiBlockFields\n }\n trackingId: tracking_id\n }\n }\n\n \n fragment uiControlFilterFields on UIControlFilter {\n id\n header\n images {\n icon {\n name\n image\n }\n }\n optionsType: options_type\n options {\n count\n default\n name: header\n id\n selected\n target_params {\n ...uiTargetParamsFields\n }\n }\n styling {\n web {\n desktop {\n collapse\n }\n mobile {\n collapse\n }\n }\n }\n }\n\n \n fragment uiControlAppliedFilterFields on UIControlAppliedFilter {\n label\n target_params {\n ...uiTargetParamsFields\n }\n }\n\n \n fragment uiTargetParamsFields on UITargetParams {\n params {\n id\n value\n }\n queryParams: query_params\n title\n typeName: __typename\n }\n\n \n fragment uiTargetRestaurant on UITargetRestaurant {\n restaurant {\n id\n name\n links {\n self {\n href\n }\n }\n }\n typeName: __typename\n }\n\n \n fragment uiTargetMenuItem on UITargetMenuItem {\n menuItem: menu_item {\n id\n }\n links {\n href\n }\n typeName: __typename\n }\n\n \n fragment uiTargetMenuItemModifier on UITargetMenuItemModifier {\n restaurantId: restaurant_id\n menuItemId: menu_item_id\n uiTargetType: ui_target_type\n }\n\n \n fragment uiControlQueryResultFields on UIControlQueryResult {\n header\n key\n resultTarget: result_target {\n ...uiTargetFields\n }\n targetPresentational: result_target_presentational\n trackingId: tracking_id\n options {\n key\n count\n highlights {\n begin\n end\n }\n uiLines: ui_lines {\n ...uiLines\n }\n image {\n type: __typename\n ... on DeliverooIcon {\n ...iconFields\n }\n ... 
on UIControlQueryResultOptionImageSet {\n default\n }\n }\n label\n isAvailable: is_available\n target {\n ...uiTargetFields\n }\n trackingId: tracking_id\n }\n }\n\n \n fragment colorFields on Color {\n hex\n r: red\n g: green\n b: blue\n a: alpha\n }\n\n \n fragment colorGradientFields on ColorGradient {\n from {\n ...colorFields\n }\n to {\n ...colorFields\n }\n }\n\n \n fragment iconFields on DeliverooIcon {\n name\n image\n }\n\n \n fragment illustrationBadgeFields on DeliverooIllustrationBadge {\n name\n image\n }\n\n \n fragment uiHomeCardFields on UICardFields {\n bubble {\n uiLines: ui_lines {\n ...uiLines\n }\n }\n overlay {\n background: background {\n typeName: __typename\n ...colorFields\n ...colorGradientFields\n }\n text {\n position\n color {\n ...colorFields\n }\n value\n }\n promotionTag: promotion_tag {\n primaryTagLine: primary_tag_line {\n backgroundColor: background_color {\n ...colorFields\n ...colorGradientFields\n }\n text {\n ...uiLines\n }\n }\n secondaryTagLine: secondary_tag_line {\n backgroundColor: background_color {\n ...colorFields\n ...colorGradientFields\n }\n text {\n ...uiLines\n }\n }\n }\n }\n favouritesOverlay: favourites_overlay {\n id\n entity\n isSelected: is_selected\n backgroundColor: background_color {\n ...colorFields\n ...colorGradientFields\n }\n selectedColor: selected_color {\n ...colorFields\n }\n unselectedColor: unselected_color {\n ...colorFields\n }\n target {\n ...uiTargetFields\n }\n countData: count_data {\n count\n isMaxCount: is_max_count\n }\n }\n countdownBadgeOverlay: countdown_badge_overlay {\n backgroundColor: background_color {\n ...colorFields\n }\n uiLine: ui_line {\n ...uiLines\n }\n }\n image\n uiLines: ui_lines {\n ...uiLines\n }\n }\n\n \n fragment uiTextLine on UITextLine {\n typeName: __typename\n key\n uiSpans: ui_spans {\n ...uiSpansPrimitive\n ... on UISpanCountdown {\n endsAt: ends_at\n isBold: is_bold\n size\n key\n color {\n ...colorFields\n }\n }\n ... 
on UISpanTag {\n key\n uiSpans: ui_spans {\n ...uiSpansPrimitive\n }\n backgroundColor: background_color {\n ...colorFields\n }\n }\n }\n }\n\n \n fragment uiLines on UILine {\n typeName: __typename\n ... on UITextLine {\n ...uiTextLine\n }\n ... on UITitleLine {\n key\n text\n color {\n ...colorFields\n }\n size\n }\n ... on UIBulletLine {\n key\n iconSpan: icon_span {\n typeName: __typename\n color {\n ...colorFields\n }\n icon {\n ...iconFields\n }\n iconSize: size\n }\n bulletSpacerSpan: bullet_spacer_span {\n typeName: __typename\n width\n }\n uiSpans: ui_spans {\n ...uiSpansPrimitive\n }\n }\n }\n\n \n fragment uiSpansPrimitive on UISpan {\n typeName: __typename\n ... on UISpanIcon {\n key\n color {\n ...colorFields\n }\n icon {\n ...iconFields\n }\n iconSize: size\n }\n ... on UISpanSpacer {\n key\n width\n }\n ... on UISpanText {\n key\n color {\n ...colorFields\n }\n text\n isBold: is_bold\n textSize: size\n }\n }\n\n \n fragment uiCardBorderFields on UICardBorderType {\n topColor: top_color {\n ...colorFields\n }\n bottomColor: bottom_color {\n ...colorFields\n }\n leftColor: left_color {\n ...colorFields\n }\n rightColor: right_color {\n ...colorFields\n }\n borderWidth: border_width\n }\n\n \n fragment uiModalButtonFields on UIModalButton {\n title\n theme: ui_theme\n dismissOnAction: dismiss_on_action\n target {\n typeName: __typename\n ... on UITargetWebPage {\n url\n newWindow: new_window\n }\n ... on UITargetAction {\n action\n params {\n id\n value\n }\n }\n ... on UITargetParams {\n ...uiTargetParamsFields\n }\n }\n trackingId: tracking_id\n }\n\n \n fragment uiModalFields on UIModal {\n typeName: __typename\n header\n caption\n image {\n ... on UIModalImage {\n image\n }\n ... on DeliverooIcon {\n ...iconFields\n }\n ... 
on DeliverooIllustrationBadge {\n ...illustrationBadgeFields\n }\n }\n buttons {\n ...uiModalButtonFields\n }\n theme: ui_theme\n displayId: display_id\n trackingId: tracking_id\n }\n\n \n fragment uiChallengesModalFields on UIChallengesModal {\n typeName: __typename\n displayId: display_id\n trackingId: tracking_id\n challengeDrnId: challenges_drn_id\n mode\n smallView: small_view {\n header\n bodyText: body_text\n infoButton: info_button {\n ...uiModalButtonFields\n }\n icon {\n ... on UIChallengesIndicator {\n required\n completed\n }\n ... on UIChallengesBadge {\n url\n }\n ... on UIChallengesSteppedIndicator {\n steps {\n ... on UIChallengesSteppedStamp {\n text\n icon\n isHighlighted: is_highlighted\n }\n }\n stepsCompleted: steps_completed\n stepsRequired: steps_required\n }\n }\n }\n fullView: full_view {\n header\n headerSubtitle: header_subtitle\n bodyTitle: body_title\n bodyText: body_text\n confirmationButton: confirmation_button {\n ...uiModalButtonFields\n }\n infoButton: info_button {\n ...uiModalButtonFields\n }\n icon {\n ... on UIChallengesIndicator {\n required\n completed\n }\n ... on UIChallengesBadge {\n url\n }\n ... on UIChallengesSteppedIndicator {\n steps {\n ... on UIChallengesSteppedStamp {\n text\n icon\n isHighlighted: is_highlighted\n }\n }\n stepsCompleted: steps_completed\n stepsRequired: steps_required\n }\n }\n }\n }\n\n \n fragment uiPlusFullScreenModalFields on UIPlusFullScreenModal {\n typeName: __typename\n displayId: display_id\n trackingId: tracking_id\n image {\n typeName: __typename\n ... on UIModalImage {\n image\n }\n ... on DeliverooIllustrationBadge {\n ...illustrationBadgeFields\n }\n }\n header\n body\n footnote\n primaryButton {\n ...uiModalButtonFields\n }\n secondaryButton {\n ...uiModalButtonFields\n }\n confetti\n displayOnlyOnce: display_only_once\n }\n\n \n fragment uiFeedOverlayFields on UIFeedOverlay {\n id\n position\n blocks: overlay_blocks {\n ... 
on UIFeedOverlayBanner {\n typeName: __typename\n id: display_id\n trackingId: tracking_id\n header\n caption\n isDismissible: is_dismissible\n theme: ui_theme\n image {\n ... on DeliverooIllustrationBadge {\n name\n }\n }\n }\n }\n }\n\n \n fragment uiCategoryPillFields on UICategoryPill {\n typeName: __typename\n blocks: content {\n ...uiLines\n }\n backgroundColor: background_color {\n typeName: __typename\n ...colorFields\n }\n target {\n ...uiTargetFields\n }\n trackingId: tracking_id\n contentDescription: content_description\n }\n\n \n fragment uiTallMenuItemCardFields on UITallMenuItemCard {\n id: menu_item_id\n title\n key\n image\n target {\n ...uiTargetFields\n }\n price {\n ...currencyFields\n }\n trackingId: tracking_id\n }\n\n \n fragment currencyFields on Currency {\n code\n formatted\n fractional\n presentational\n }\n\n \n fragment uiStoryCardFields on UIStoryCard {\n preview {\n profile {\n imageUrl: image_url\n headingLines: heading_lines {\n ...uiLines\n }\n }\n video {\n sources {\n url\n type\n }\n placeholderImage: placeholder_url\n autoplay\n trackingId: tracking_id\n }\n overlay {\n typeName: __typename\n ... on UIStoryTextOverlay {\n background {\n typeName: __typename\n ...colorFields\n ...colorGradientFields\n }\n uiLines: ui_lines {\n ...uiLines\n }\n }\n }\n target {\n ...uiTargetFields\n }\n }\n main {\n profile {\n imageUrl: image_url\n headingLines: heading_lines {\n ...uiLines\n }\n }\n video {\n sources {\n url\n type\n }\n placeholderImage: placeholder_url\n autoplay\n trackingId: tracking_id\n }\n overlay {\n typeName: __typename\n ... 
on UIStoryButtonOverlay {\n background {\n typeName: __typename\n ...colorFields\n ...colorGradientFields\n }\n contentLines: content {\n ...uiLines\n }\n button {\n key\n text\n contentDescription: content_description\n target {\n ...uiTargetFields\n }\n theme: ui_theme\n trackingId: tracking_id\n trackingProperties: tracking_properties\n }\n }\n }\n }\n trackingId: tracking_id\n trackingProperties: tracking_properties\n key\n }\n\n ",
"variables": {
"ui_blocks": [
"BANNER",
"CARD",
"SHORTCUT",
"BUTTON",
"MERCHANDISING_CARD",
"STORY_CARD"
],
"ui_controls": [
"APPLIED_FILTER",
"FILTER",
"SORT"
],
"ui_layout_carousel_styles": [
"DEFAULT",
"PARTNER_HEADING"
],
"ui_lines": [
"TITLE",
"TEXT",
"BULLET"
],
"ui_targets": [
"PARAMS",
"RESTAURANT",
"MENU_ITEM",
"WEB_PAGE",
"DEEP_LINK"
],
"fulfillment_methods": [
"DELIVERY",
"COLLECTION"
],
"location": {
"geohash": "gcpn7n35zy89",
"city_uname": "oxford",
"neighborhood_uname": "port-meadow",
"postcode": ""
},
"options": {
"query": "",
"web_column_count": 4,
"user_preference": {
"seen_modals": [
{
"id": "nc_promos_voucher_take10",
"timestamp": 1674830243
}
]
}
},
"url": "https://deliveroo.co.uk/restaurants/oxford/port-meadow?sort=distance&geohash=gcpn7n35zy89",
"uuid": "7270af10-2722-4bda-b041-d768208912c6",
"ui_actions": [
"CHANGE_DELIVERY_TIME",
"CLEAR_FILTERS",
"NO_DELIVERY_YET",
"SHOW_MEAL_CARD_ISSUERS",
"SHOWCASE_PICKUP",
"TOGGLE_FAVOURITE",
"COPY_TO_CLIPBOARD",
"SHOW_PICKUP",
"SHOW_DELIVERY",
"REFRESH",
"SHOW_VIDEO_STORIES",
"SHOW_HOME_MAP_VIEW",
"ACCEPT_CHALLENGES",
"SHOW_CHALLENGES_DETAILS"
],
"ui_features": [
"UNAVAILABLE_RESTAURANTS",
"LIMIT_QUERY_RESULTS",
"UI_CARD_BORDER",
"UI_CAROUSEL_COLOR",
"UI_PROMOTION_TAG",
"UI_BACKGROUND",
"ILLUSTRATION_BADGES",
"SCHEDULED_RANGES",
"UI_SPAN_TAGS",
"UI_SPAN_COUNTDOWN",
"HOME_MAP_VIEW"
],
"ui_themes": [
"BANNER_CARD",
"BANNER_EMPTY",
"BANNER_MARKETING_A",
"BANNER_MARKETING_B",
"BANNER_MARKETING_C",
"BANNER_PICKUP_SHOWCASE",
"BANNER_SERVICE_ADVISORY",
"CARD_LARGE",
"CARD_MEDIUM",
"CARD_MEDIUM_HORIZONTAL",
"CARD_SMALL",
"CARD_SMALL_DIAGONAL",
"CARD_SMALL_HORIZONTAL",
"CARD_WIDE",
"CARD_TALL",
"CARD_TALL_GRADIENT",
"MODAL_DEFAULT",
"MODAL_PLUS",
"MODAL_BUTTON_PRIMARY",
"MODAL_BUTTON_SECONDARY",
"MODAL_BUTTON_TERTIARY",
"SHORTCUT_DEFAULT",
"SHORTCUT_STACKED",
"SHORTCUT_HORIZONTAL",
"BUTTON_PRIMARY",
"BUTTON_SECONDARY",
"ANY_MODAL"
],
"ui_layouts": [
"LIST",
"CAROUSEL"
]
}
})
headers = {
'authority': 'api.uk.deliveroo.com',
'accept': 'application/json, application/vnd.api+json',
'accept-language': 'en',
'authorization': '',
'content-type': 'application/json',
'origin': 'https://deliveroo.co.uk',
'referer': 'https://deliveroo.co.uk/',
'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
'x-roo-client': 'consumer-web-app',
'x-roo-client-referer': '',
'x-roo-country': 'uk',
'x-roo-external-device-id': '',
'x-roo-guid': '76ff73aa-cf71-47eb-aeaf-ada8b6c400c1',
'x-roo-session-guid': 'cdd20a08-5740-49b0-ba93-90f58def6d43',
'x-roo-sticky-guid': '76ff73aa-cf71-47eb-aeaf-ada8b6c400c1',
'Cookie': '__cf_bm=nWG7YHerxVjA.cYSDMt6MA4YytsDozK3asJXqyb5A.8-1674830393-0-AbXU1uPHJZDRHZhTAIgJDvPdu7i8l4jypJWlcqs5f091IupaQEClXx3fWikwZpO80QD3SWYBeZQB9YiUpCAC0KBdTvgpY7KNwZlPufclBwSM'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Give it a shot and you'll see what I mean :)
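To turn that response into a list of restaurant names, one option is to walk the nested JSON recursively instead of hard-coding the path through layout groups, layouts and blocks. This sketch assumes, based on the uiTargetRestaurant fragment in the query, that each restaurant appears as an object under a "restaurant" key with "id" and "name" fields; the helper name collect_restaurants is hypothetical, and the real response shape may differ or change over time.

```python
def collect_restaurants(node, found=None):
    """Recursively collect {id: name} for every "restaurant" object in a JSON tree."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        restaurant = node.get("restaurant")
        if isinstance(restaurant, dict) and "name" in restaurant:
            # Keyed by id, so a restaurant shown in several carousels is kept once.
            found[restaurant.get("id")] = restaurant["name"]
        for value in node.values():
            collect_restaurants(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_restaurants(item, found)
    return found

# Made-up sample shaped like the aliases in the query (results, layoutGroups, data, blocks).
sample = {"data": {"results": {"layoutGroups": [{"data": [{"blocks": [
    {"target": {"restaurant": {"id": 1, "name": "Pizza Place"}}},
    {"target": {"restaurant": {"id": 2, "name": "Noodle Bar"}}},
    {"target": {"restaurant": {"id": 1, "name": "Pizza Place"}}},
]}]}]}}}
print(sorted(collect_restaurants(sample).values()))  # ['Noodle Bar', 'Pizza Place']
```

With the live request, something like sorted(collect_restaurants(response.json()).values()) would give the de-duplicated names.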
I'm still learning to code in Python, and I really need help scraping an element from this website:
https://www.tokopedia.com/craftdale/crossback-apron-hijau-army?src=topads
I want to get the review data (the review time) from the Reviews ("Ulasan") section.
This is the HTML from the site ("1 bulan lalu" is Indonesian for "1 month ago"):
<p disabled="" data-testid="txtDateGivenReviewFilter0" class="css-oals0c-unf-heading e1qvo2ff8">1 bulan lalu</p>
I've tried to select the element with this code:
review = soup.findAll('p',class_='css-oals0c-unf-heading e1qvo2ff8')
or
review= soup.findAll('p',id_='txtDateGivenReviewFilter0')
But the result is always empty.
Can anybody fix this problem? Thank you very much
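If you inspect the site's network traffic, you will see that it makes AJAX calls to retrieve the different pieces of information on the page; the content is not in the initial HTML, which is why BeautifulSoup finds nothing. To get the review information, the page makes an AJAX call to a GraphQL endpoint with a JSON payload: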
import requests, json
payload = [{"operationName": "PDPReviewRatingQuery", "variables": {"productId": 353506414}, "query": "query PDPReviewRatingQuery($productId: Int!) {\n ProductRatingQuery(productId: $productId) {\n ratingScore\n totalRating\n totalRatingWithImage\n detail {\n rate\n totalReviews\n percentage\n __typename\n }\n __typename\n }\n}\n"}, {"operationName": "PDPReviewImagesQuery", "variables": {"productID": 353506414, "page": 1}, "query": "query PDPReviewImagesQuery($page: Int, $productID: Int!) {\n ProductReviewImageListQuery(page: $page, productID: $productID) {\n detail {\n reviews {\n reviewer {\n fullName\n profilePicture\n __typename\n }\n reviewId\n message\n rating\n updateTime\n isReportable\n __typename\n }\n images {\n imageAttachmentID\n description\n uriThumbnail\n uriLarge\n reviewID\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"}, {"operationName": "PDPReviewHelpfulQuery", "variables": {"productID": 353506414}, "query": "query PDPReviewHelpfulQuery($productID: Int!) {\n ProductMostHelpfulReviewQuery(productId: $productID) {\n shop {\n shopId\n __typename\n }\n list {\n reviewId\n message\n productRating\n reviewCreateTime\n reviewCreateTimestamp\n isReportable\n isAnonymous\n imageAttachments {\n attachmentId\n imageUrl\n imageThumbnailUrl\n __typename\n }\n user {\n fullName\n image\n url\n __typename\n }\n likeDislike {\n totalLike\n likeStatus\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"}, {"operationName": "PDPReviewListQuery", "variables": {"page": 1, "rating": 0, "withAttachment": 0, "productID": 353506414, "perPage": 10}, "query": "query PDPReviewListQuery($productID: Int!, $page: Int!, $perPage: Int!, $rating: Int!, $withAttachment: Int!) 
{\n ProductReviewListQuery(productId: $productID, page: $page, perPage: $perPage, rating: $rating, withAttachment: $withAttachment) {\n shop {\n shopId\n name\n image\n url\n __typename\n }\n list {\n reviewId\n message\n productRating\n reviewCreateTime\n reviewCreateTimestamp\n isReportable\n isAnonymous\n imageAttachments {\n attachmentId\n imageUrl\n imageThumbnailUrl\n __typename\n }\n reviewResponse {\n message\n createTime\n __typename\n }\n likeDislike {\n totalLike\n likeStatus\n __typename\n }\n user {\n userId\n fullName\n image\n url\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"}]
res = requests.post("https://gql.tokopedia.com/", json=payload)
data = res.json()
with open("data.json", "w") as f:
    json.dump(data, f)
The script above saves the review information as JSON to data.json.
To get the rating score:
print(data[0]['data']['ProductRatingQuery']['ratingScore'])
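For the review times the question asks about, the relevant operation is PDPReviewListQuery. The response should be a list with one entry per operation in the payload, but rather than relying on a fixed index, this minimal sketch looks for the entry that contains ProductReviewListQuery. It assumes each item in its "list" carries the reviewCreateTime and message fields requested in the query; the helper name review_times is hypothetical.

```python
def review_times(batch_response):
    """Return (reviewCreateTime, message) pairs from the batched GraphQL response."""
    for result in batch_response:
        # "data" can be null when an operation fails, hence the `or {}`.
        review_list = (result.get("data") or {}).get("ProductReviewListQuery")
        if review_list:
            return [(r["reviewCreateTime"], r["message"]) for r in review_list["list"]]
    return []

# Made-up sample shaped like the fields requested in the query above.
sample = [
    {"data": {"ProductRatingQuery": {"ratingScore": 4.9}}},
    {"data": {"ProductReviewListQuery": {"list": [
        {"reviewCreateTime": "1 bulan lalu", "message": "Nice apron"},
    ]}}},
]
print(review_times(sample))  # [('1 bulan lalu', 'Nice apron')]
```

With the live response, review_times(data) can be called on the data list loaded by the script above.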
I am trying to scrape this site for job openings:
https://recruiting.ultipro.com/UNI1029UNION/JobBoard/74c2a308-3bf1-4fb1-8a83-f92fa61499d3/?q=&o=postedDateDesc&w=&wc=&we=&wpst=
I looked in dev tools and saw that the page makes an XHR request to this URL to retrieve the job openings as a JSON object:
https://recruiting.ultipro.com/UNI1029UNION/JobBoard/74c2a308-3bf1-4fb1-8a83-f92fa61499d3/JobBoardView/LoadSearchResults
So I thought, "Great! I can parse this in two seconds with a Python program like this":
from bs4 import BeautifulSoup
import json
import requests

def crawl():
    union = requests.get('https://recruiting.ultipro.com/UNI1029UNION/JobBoard/74c2a308-3bf1-4fb1-8a83-f92fa61499d3/JobBoardView/LoadSearchResults').content
    soup = BeautifulSoup(union, 'html.parser')
    newDict = json.loads(str(soup))
    for job in newDict['opportunities']:
        print(job['Title'])

crawl()
Well, it turns out that this endpoint only returns 20 job openings out of 62. So I went back to the page and loaded all of it (by clicking "view more opportunities").
Dev tools showed that this sent another XHR request to the same URL, yet I still only get 20 records when I request it.
How can I scrape all of the records from this page? And if someone could explain what is going on behind the scenes, that would be great. I am a little new to web scraping, so any insight is appreciated.
You don't need to scrape anything. As you say, the API that returns all the JSON is
https://recruiting.ultipro.com/UNI1029UNION/JobBoard/74c2a308-3bf1-4fb1-8a83-f92fa61499d3/JobBoardView/LoadSearchResults but you need to send it a POST request with these search parameters in the request body:
import requests
headers = {
'Content-Type': 'application/json'
}
data = '{\n "opportunitySearch": {\n "Top": 62,\n "Skip": 0,\n "QueryString": "",\n "OrderBy": [\n {\n "Value": "postedDateDesc",\n "PropertyName": "PostedDate",\n "Ascending": false\n }\n ],\n "Filters": [\n {\n "t": "TermsSearchFilterDto",\n "fieldName": 4,\n "extra": null,\n "values": [\n \n ]\n },\n {\n "t": "TermsSearchFilterDto",\n "fieldName": 5,\n "extra": null,\n "values": [\n \n ]\n },\n {\n "t": "TermsSearchFilterDto",\n "fieldName": 6,\n "extra": null,\n "values": [\n \n ]\n }\n ]\n },\n "matchCriteria": {\n "PreferredJobs": [\n \n ],\n "Educations": [\n \n ],\n "LicenseAndCertifications": [\n \n ],\n "Skills": [\n \n ],\n "hasNoLicenses": false,\n "SkippedSkills": [\n \n ]\n }\n}'
response = requests.post('https://recruiting.ultipro.com/UNI1029UNION/JobBoard/74c2a308-3bf1-4fb1-8a83-f92fa61499d3/JobBoardView/LoadSearchResults', headers=headers, data=data)
print(response.text)
And here is the same request using pandas (pip install pandas):
import requests
import pandas as pd
pd.set_option('display.width', 1000)
headers = {
'Content-Type': 'application/json'
}
data = '{\n "opportunitySearch": {\n "Top": 62,\n "Skip": 0,\n "QueryString": "",\n "OrderBy": [\n {\n "Value": "postedDateDesc",\n "PropertyName": "PostedDate",\n "Ascending": false\n }\n ],\n "Filters": [\n {\n "t": "TermsSearchFilterDto",\n "fieldName": 4,\n "extra": null,\n "values": [\n \n ]\n },\n {\n "t": "TermsSearchFilterDto",\n "fieldName": 5,\n "extra": null,\n "values": [\n \n ]\n },\n {\n "t": "TermsSearchFilterDto",\n "fieldName": 6,\n "extra": null,\n "values": [\n \n ]\n }\n ]\n },\n "matchCriteria": {\n "PreferredJobs": [\n \n ],\n "Educations": [\n \n ],\n "LicenseAndCertifications": [\n \n ],\n "Skills": [\n \n ],\n "hasNoLicenses": false,\n "SkippedSkills": [\n \n ]\n }\n}'
response = requests.post('https://recruiting.ultipro.com/UNI1029UNION/JobBoard/74c2a308-3bf1-4fb1-8a83-f92fa61499d3/JobBoardView/LoadSearchResults', headers=headers, data=data)
data=response.json()
df=pd.DataFrame.from_dict(data['opportunities'])
df= df[['Id','Title','RequisitionNumber','JobCategoryName','PostedDate']]
print(df.head(5))
In the request body, "Top": 62 sets the maximum number of results returned:
{
"opportunitySearch": {
"Top": 62,
"Skip": 0,
"QueryString": "",
"OrderBy": [
{
"Value": "postedDateDesc",
"PropertyName": "PostedDate",
"Ascending": false
}
],
"Filters": [
{
"t": "TermsSearchFilterDto",
"fieldName": 4,
"extra": null,
"values": [
]
},
{
"t": "TermsSearchFilterDto",
"fieldName": 5,
"extra": null,
"values": [
]
},
{
"t": "TermsSearchFilterDto",
"fieldName": 6,
"extra": null,
"values": [
]
}
]
},
"matchCriteria": {
"PreferredJobs": [
],
"Educations": [
],
"LicenseAndCertifications": [
],
"Skills": [
],
"hasNoLicenses": false,
"SkippedSkills": [
]
}
}
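Instead of hand-editing the escaped JSON string, the payload can be built as a Python dict and the Top/Skip values changed per request. The sketch below mirrors the body shown above; build_search_payload and iter_opportunities are hypothetical helper names, and it assumes (not confirmed by this answer) that Skip acts as an offset, so that Top/Skip together allow paging through all results.

```python
import json


def build_search_payload(top=62, skip=0):
    """Request body for LoadSearchResults, mirroring the JSON shown above."""
    def empty_filter(field_name):
        return {"t": "TermsSearchFilterDto", "fieldName": field_name,
                "extra": None, "values": []}

    return {
        "opportunitySearch": {
            "Top": top,    # caps the number of results in one response
            "Skip": skip,  # ASSUMPTION: offset of the first result (paging)
            "QueryString": "",
            "OrderBy": [{"Value": "postedDateDesc",
                         "PropertyName": "PostedDate",
                         "Ascending": False}],
            "Filters": [empty_filter(n) for n in (4, 5, 6)],
        },
        "matchCriteria": {
            "PreferredJobs": [],
            "Educations": [],
            "LicenseAndCertifications": [],
            "Skills": [],
            "hasNoLicenses": False,
            "SkippedSkills": [],
        },
    }


def iter_opportunities(post, page_size=62):
    """Yield opportunities page by page (hypothetical helper).

    `post` is any callable that sends a payload dict and returns the decoded
    JSON response. Iteration stops when a page is shorter than page_size.
    """
    skip = 0
    while True:
        payload = build_search_payload(top=page_size, skip=skip)
        page = post(payload)["opportunities"]
        yield from page
        if len(page) < page_size:
            break
        skip += page_size


# The dict serializes to the same structure as the hand-written string.
print(json.dumps(build_search_payload(top=10), indent=2))
```

With requests, post could be lambda p: requests.post(url, headers=headers, json=p).json(), where url is the LoadSearchResults URL above; the json= argument serializes the dict and sets Content-Type: application/json when the headers don't already specify it. The results can then go straight into pd.DataFrame(list(iter_opportunities(post))).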
I have finally set up the Elasticsearch database and imported data into it.
Sometimes when I request data from the frontend I get a 500 error (not every time, just occasionally).
I sent the same request from Postman to see the Elasticsearch error message.
I got:
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[9m4uVcf3TLmQ9Kr7z_fSpQ][text][0]: QueryPhaseExecutionException[[text][0]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: [{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#56319fc9]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#60b46f02]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][1]: QueryPhaseExecutionException[[text][1]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: [{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#3ca7d41e]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#63daf999]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][2]: QueryPhaseExecutionException[[text][2]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: [{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#27521539]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#66dbac2b]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][3]: QueryPhaseExecutionException[[text][3]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: 
[{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#73bb4f5e]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#112dcf1c]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][4]: QueryPhaseExecutionException[[text][4]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: [{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#b650549]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#7fbe90f4]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }]",
"status": 500
}
Here is the request body:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "test",
"minimum_should_match": "-25%",
"type": "cross_fields",
"tie_breaker": 0.5,
"fields": ["title^3", "body", "url_words^2", "domain_words^8"]
}
},
"functions": [{
"field_value_factor": {
"field": "rank",
"factor": 1
}
},{
"field_value_factor": {
"field": "lang_en"
}
}]
}
},
"from": 0,
"size": 25
}
I understand that "Missing value for field [lang_en]" is the problem. I tried the fixes I found through Google, but without success.
ES version: 1.5.2
Any ideas?
EDIT:
I added "missing": 0 to the second field_value_factor, but got this error instead:
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[9m4uVcf3TLmQ9Kr7z_fSpQ][text][0]: SearchParseException[[text][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": {\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][1]: SearchParseException[[text][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": {\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][2]: SearchParseException[[text][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": 
{\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][3]: SearchParseException[[text][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": {\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][4]: SearchParseException[[text][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": {\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }]",
"status": 400
}
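In at least one document, the field lang_en is null, empty, or missing altogether.
You need to modify your field_value_factor function to tell it what to do in such a case, using the missing setting with whatever default value makes sense (0, 1, etc.). Note that Elasticsearch 1.5.2 rejects this setting (that is the "does not support [missing]" error in the question's edit), so it only works on a newer version: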
{
"field_value_factor": {
"field": "lang_en",
"missing": 1 <---- add this line
}
}
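On 1.5.2, where missing is rejected, one possible workaround is to apply the lang_en function only to documents that actually have the field, using the per-function filter that function_score accepts together with an exists filter. The Python sketch below builds the request body from the question either way; build_search_body and its lang_missing parameter are hypothetical names, and the exists-filter variant is an untested assumption for 1.5.2, not something confirmed in this thread.

```python
import json


def build_search_body(text, size=25, from_=0, lang_missing=None):
    """Build the function_score body from the question (hypothetical helper).

    lang_missing=None  -> apply the lang_en factor only to documents that have
                          the field (exists filter); ASSUMED workaround for
                          versions such as 1.5.2 that reject "missing".
    lang_missing=<num> -> use field_value_factor's "missing" option instead,
                          on versions that support it.
    """
    lang_factor = {"field": "lang_en"}
    lang_function = {"field_value_factor": lang_factor}
    if lang_missing is None:
        lang_function["filter"] = {"exists": {"field": "lang_en"}}
    else:
        lang_factor["missing"] = lang_missing

    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        "minimum_should_match": "-25%",
                        "type": "cross_fields",
                        "tie_breaker": 0.5,
                        "fields": ["title^3", "body", "url_words^2",
                                   "domain_words^8"],
                    }
                },
                "functions": [
                    {"field_value_factor": {"field": "rank", "factor": 1}},
                    lang_function,
                ],
            }
        },
        "from": from_,
        "size": size,
    }


print(json.dumps(build_search_body("test"), indent=2))
```

Documents without lang_en simply don't get that function applied, instead of failing the whole shard. Fixing the data at index time (so every document has a lang_en value) avoids the question altogether.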
The problem was the outdated Elasticsearch version (1.5.2) provided by AWS.
My solution: create an EC2 instance and deploy Elasticsearch on it manually.
I am running Python on Windows and trying to pretty-print JSON output. Here is what I get:
>>> ta
{u'Status': {u'code': 200, u'request': u'geocode'}, u'Placemark': [{u'Point': {u'coordinates': [34.777821, 32.066157, 0]}, u'ExtendedData': {u'LatLonBox': {u'west': 34.7137913, u'east': 34.8418507, u'north': 32.1039719, u'south': 32.0283265}}, u'AddressDetails': {u'Country': {u'CountryName': u'\u05d9\u05e9\u05e8\u05d0\u05dc', u'Locality': {u'LocalityName': u'\u05ea\u05dc \u05d0\u05d1\u05d9\u05d1 \u05d9\u05e4\u05d5'}, u'CountryNameCode': u'IL'}, u'Accuracy': 4}, u'id': u'p1', u'address': u'Tel Aviv, Israel'}], u'name': u'Tel Aviv'}
>>> json.dumps(ta, sort_keys=True, indent = 4)
'{\n "Placemark": [\n {\n "AddressDetails": {\n "Accuracy": 4, \n "Country": {\n "CountryName": "\\u05d9\\u05e9\\u05e8\\u05d0\\u05dc", \n "CountryNameCode": "IL", \n "Locality": {\n "LocalityName": "\\u05ea\\u05dc \\u05d0\\u05d1\\u05d9\\u05d1 \\u05d9\\u05e4\\u05d5"\n }\n }\n }, \n "ExtendedData": {\n "LatLonBox": {\n "east": 34.8418507, \n "north": 32.1039719, \n "south": 32.0283265, \n "west": 34.7137913\n }\n }, \n "Point": {\n "coordinates": [\n 34.777821, \n 32.066157, \n 0\n ]\n }, \n "address": "Tel Aviv, Israel", \n "id": "p1"\n }\n ], \n "Status": {\n "code": 200, \n "request": "geocode"\n }, \n "name": "Tel Aviv"\n}'
>>>
Why doesn't it work?
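It did work. json.dumps returns a string, and the interactive prompt echoes that string's repr(), so the newlines are shown as \n escapes and everything appears on one line. (The JSON itself also looks very much like Python dictionary syntax, which adds to the confusion.) Print the return value of json.dumps instead and it should look the way you expect: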
s = json.dumps(ta, sort_keys=True, indent=4)
print(s)
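A small Python 3 sketch of the difference, using a trimmed-down copy of the geocoder result from the question (only a few fields kept). It also shows ensure_ascii=False, a standard json.dumps option that keeps the Hebrew place names as characters instead of \uXXXX escapes; whether they then display correctly depends on your console's encoding.

```python
import json

# Trimmed-down version of the geocoder result from the question.
ta = {
    "name": "Tel Aviv",
    "Status": {"code": 200, "request": "geocode"},
    "Placemark": [{
        "id": "p1",
        "address": "Tel Aviv, Israel",
        "AddressDetails": {"Country": {
            "CountryNameCode": "IL",
            "Locality": {"LocalityName": "\u05ea\u05dc \u05d0\u05d1\u05d9\u05d1 \u05d9\u05e4\u05d5"},
        }},
    }],
}

pretty = json.dumps(ta, sort_keys=True, indent=4)
print(repr(pretty))  # what the >>> prompt shows: escaped \n, all on one line
print(pretty)        # real line breaks and indentation

# Keep non-ASCII text readable instead of escaping it as \uXXXX.
readable = json.dumps(ta, sort_keys=True, indent=4, ensure_ascii=False)
print(readable)
```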