I am trying to scrape all restaurant names on Deliveroo for a given postcode. I have partly managed this with the following code, but I only get 20 or so results, whereas Deliveroo lists nearly 100 restaurants.
import requests
from bs4 import BeautifulSoup
import pandas as pd
# get the page
url = 'https://deliveroo.co.uk/restaurants/oxford/port-meadow?geohash=gcpn7n35zy89&sort=distance'
page = requests.get(url)
# parse the page
soup = BeautifulSoup(page.content, 'html.parser')
# find the element with tag "li" and class "HomeFeedUILines-8bd2eacc5b5ba98e"
results = soup.findAll("li", {"class": "HomeFeedUILines-8bd2eacc5b5ba98e"})
# scrape all the "p" elements inside and get the text attribute
for result in results:
    ps = result.findAll("p")
    for p in ps:
        print(p.text)
I believe someone else faced a similar issue, but I do not understand their answer or how I would implement it:
Scraping website only gets me 20 results, but there should be more.
I also found someone's GitHub project that does something similar but is much more complex, and the code seems to break at the cookie-clicking part: https://github.com/SelvinSelbaraju/deliveroo-web-scraper/blob/main/Deliveroo%20-%20Web%20Scraper%20and%20Data%20Analysis.ipynb
Why scrape the frontend when you have direct access to their GraphQL API? I took a look out of curiosity. Before scraping anything, I generally check whether there is a way to tap into the data feed behind the DOM rather than scrape the DOM itself (scraping the DOM is often unreliable, inconsistent, and error-prone).
Looking at the AJAX calls made when filtering these results, I noticed a GraphQL network request that carries all the data you need and passes it straight into the DOM; that is the best live feed to tap into. The response is 1.11 MB, so all your restaurants are surely there.
See the GraphQL Python request below. You can consume this JSON and parse the results by looping through the nested arrays, and voilà, all your data is there.
import requests
import json
url = "https://api.uk.deliveroo.com/consumer/graphql/"
payload = json.dumps({
"query": "\n query getHomeFeed(\n $ui_actions: [UIActionType!]\n $ui_blocks: [UIBlockType!]\n $ui_controls: [UIControlType!]\n $ui_features: [UIFeatureType!]\n $ui_layouts: [UILayoutType!]\n $ui_layout_carousel_styles: [UILayoutCarouselStyle!]\n $ui_lines: [UILineType!]\n $ui_targets: [UITargetType!]\n $ui_themes: [UIThemeType!]\n $fulfillment_methods: [FulfillmentMethod!]\n $location: LocationInput!\n $url: String\n $options: SearchOptionsInput\n $uuid: String!\n ) {\n results: search(\n location: $location\n options: $options\n url: $url\n capabilities: {\n ui_actions: $ui_actions,\n ui_blocks: $ui_blocks,\n ui_controls: $ui_controls,\n ui_features: $ui_features,\n ui_layouts: $ui_layouts,\n ui_layout_carousel_styles: $ui_layout_carousel_styles,\n ui_lines: $ui_lines\n ui_targets: $ui_targets,\n ui_themes: $ui_themes,\n fulfillment_methods: $fulfillment_methods\n }\n uuid: $uuid\n ) {\n layoutGroups: ui_layout_groups {\n id\n subheader\n data: ui_layouts { ...uiHomeLayoutFields }\n }\n\n controlGroups: ui_control_groups {\n appliedFilters: applied_filters { ...uiControlAppliedFilterFields }\n filters { ...uiControlFilterFields }\n sort { ...uiControlFilterFields }\n queryResults: query_results { ...uiControlQueryResultFields }\n fulfillmentMethods: fulfillment_methods { ...uiControlFulfillmentMethodFields }\n }\n\n modals: ui_modals {\n ...uiModalFields\n ...uiChallengesModalFields\n ...uiPlusFullScreenModalFields\n }\n\n overlays: ui_feed_overlays {\n ...uiFeedOverlayFields\n }\n\n meta {\n ...searchResultMetaFields\n }\n }\n\n \n fulfillmentTimes: fulfillment_times(\n capabilities: {\n ui_actions: $ui_actions,\n ui_blocks: $ui_blocks,\n ui_controls: $ui_controls,\n ui_features: $ui_features,\n ui_layouts: $ui_layouts,\n ui_layout_carousel_styles: $ui_layout_carousel_styles,\n ui_lines: $ui_lines\n ui_targets: $ui_targets,\n ui_themes: $ui_themes,\n fulfillment_methods: $fulfillment_methods\n }\n fulfillment_methods: $fulfillment_methods\n location: $location\n 
uuid: $uuid\n ) {\n fulfillmentTimeMethods: fulfillment_time_methods {\n fulfillmentMethodLabel: fulfillment_method_label\n fulfillmentMethod: fulfillment_method\n asap {\n ...fulfillmentTimeOptionFields\n }\n days {\n day\n dayLabel: day_label\n times {\n ...fulfillmentTimeOptionFields\n }\n }\n }\n }\n\n }\n \n fragment fulfillmentTimeOptionFields on FulfillmentTimeOption {\n optionLabel: option_label\n selectedLabel: selected_label\n timestamp(format: UNIX)\n selectedTime: selected_time {\n day\n time\n }\n }\n\n \n fragment uiControlFulfillmentMethodFields on UIControlFulfillmentMethod {\n label\n targetMethod: target_method\n }\n\n \n fragment searchResultMetaFields on SearchResultMeta {\n location {\n cityName: city_name\n cityUname: city_uname\n isoCode: country_iso_code\n countryName: country_name\n geohash\n lat\n lon\n neighborhoodUname: neighborhood_uname\n neighborhoodName: neighborhood_name\n postcode\n }\n options {\n fulfillmentMethod: fulfillment_method\n deliveryTime: delivery_time\n params {\n id\n value\n }\n query\n }\n restaurantCount: restaurant_count {\n results\n location\n }\n searchPlaceholder: search_placeholder\n title\n collection {\n linkTitle: link_title\n targetParams: target_params {\n ...uiTargetParamsFields\n }\n previousTargetParams: previous_target_params {\n ...uiTargetParamsFields\n }\n searchBarMeta: search_bar_meta {\n searchBarPlaceholder: search_bar_placeholder\n searchBarParams: search_bar_params {\n id\n value\n }\n }\n }\n uuid\n web {\n url\n }\n warnings {\n type: warning_type\n }\n searchPills: search_pills {\n id\n label\n placeholder\n params {\n id\n value\n }\n }\n }\n\n \n fragment uiTargetFields on UITarget {\n typeName: __typename\n ... on UITargetRestaurant {\n ...uiTargetRestaurant\n }\n ... on UITargetParams {\n ...uiTargetParamsFields\n }\n ... on UITargetAction {\n action\n }\n ... on UITargetMenuItem {\n ...uiTargetMenuItem\n }\n ... 
on UITargetDeepLink {\n webTarget: fallback_target {\n uri: url\n }\n }\n ... on UITargetMenuItemModifier {\n ...uiTargetMenuItemModifier\n }\n }\n\n \n fragment uiBlockFields on UIBlock {\n typeName: __typename\n ... on UIBanner {\n key\n header\n caption\n backgroundColor: background_color {\n ...colorFields\n }\n buttonCaption: button_caption\n contentDescription: content_description\n target {\n ...uiTargetFields\n }\n images {\n icon {\n ...iconFields\n }\n image\n }\n theme: ui_theme\n trackingId: tracking_id\n trackingProperties: tracking_properties\n }\n ... on UIButton {\n key\n text\n contentDescription: content_description\n target {\n ...uiTargetFields\n }\n theme: ui_theme\n trackingId: tracking_id\n trackingProperties: tracking_properties\n }\n ... on UIShortcut {\n key\n images {\n default\n }\n name\n contentDescription: content_description\n nameColor: name_color {\n ...colorFields\n }\n backgroundColor: background_color {\n ...colorFields\n }\n target {\n ...uiTargetFields\n }\n theme: ui_theme\n trackingId: tracking_id\n }\n ... on UICard {\n key\n trackingId: tracking_id\n trackingProperties: tracking_properties\n theme: ui_theme\n contentDescription: content_description\n border {\n ...uiCardBorderFields\n }\n target {\n ...uiTargetFields\n }\n uiContent: properties {\n default {\n ...uiHomeCardFields\n }\n expanded {\n ...uiHomeCardFields\n }\n }\n }\n ... on UIMerchandisingCard {\n key\n headerImageUrl: header_image_url\n backgroundImageUrl: background_image_url\n contentDescription: content_description\n uiLines: ui_lines {\n ...uiLines\n }\n cardBackgroundColor: background_color {\n ...colorFields\n }\n buttonCaption: button_caption\n target {\n ...uiTargetFields\n }\n trackingId: tracking_id\n }\n ... on UICategoryPill {\n ...uiCategoryPillFields\n }\n ... on UITallMenuItemCard {\n ...uiTallMenuItemCardFields\n }\n ... 
on UIStoryCard {\n ...uiStoryCardFields\n }\n }\n\n \n fragment uiHomeLayoutFields on UILayout {\n typeName: __typename\n ... on UILayoutCarousel {\n header\n subheader\n style\n imageUrl: image_url\n target {\n ...uiTargetFields\n }\n uiLines: ui_lines {\n ...uiLines\n }\n targetPresentational: target_presentational\n key\n blocks: ui_blocks {\n ...uiBlockFields\n }\n trackingId: tracking_id\n rows\n }\n ... on UILayoutList {\n header\n key\n blocks: ui_blocks {\n ...uiBlockFields\n }\n trackingId: tracking_id\n }\n }\n\n \n fragment uiControlFilterFields on UIControlFilter {\n id\n header\n images {\n icon {\n name\n image\n }\n }\n optionsType: options_type\n options {\n count\n default\n name: header\n id\n selected\n target_params {\n ...uiTargetParamsFields\n }\n }\n styling {\n web {\n desktop {\n collapse\n }\n mobile {\n collapse\n }\n }\n }\n }\n\n \n fragment uiControlAppliedFilterFields on UIControlAppliedFilter {\n label\n target_params {\n ...uiTargetParamsFields\n }\n }\n\n \n fragment uiTargetParamsFields on UITargetParams {\n params {\n id\n value\n }\n queryParams: query_params\n title\n typeName: __typename\n }\n\n \n fragment uiTargetRestaurant on UITargetRestaurant {\n restaurant {\n id\n name\n links {\n self {\n href\n }\n }\n }\n typeName: __typename\n }\n\n \n fragment uiTargetMenuItem on UITargetMenuItem {\n menuItem: menu_item {\n id\n }\n links {\n href\n }\n typeName: __typename\n }\n\n \n fragment uiTargetMenuItemModifier on UITargetMenuItemModifier {\n restaurantId: restaurant_id\n menuItemId: menu_item_id\n uiTargetType: ui_target_type\n }\n\n \n fragment uiControlQueryResultFields on UIControlQueryResult {\n header\n key\n resultTarget: result_target {\n ...uiTargetFields\n }\n targetPresentational: result_target_presentational\n trackingId: tracking_id\n options {\n key\n count\n highlights {\n begin\n end\n }\n uiLines: ui_lines {\n ...uiLines\n }\n image {\n type: __typename\n ... on DeliverooIcon {\n ...iconFields\n }\n ... 
on UIControlQueryResultOptionImageSet {\n default\n }\n }\n label\n isAvailable: is_available\n target {\n ...uiTargetFields\n }\n trackingId: tracking_id\n }\n }\n\n \n fragment colorFields on Color {\n hex\n r: red\n g: green\n b: blue\n a: alpha\n }\n\n \n fragment colorGradientFields on ColorGradient {\n from {\n ...colorFields\n }\n to {\n ...colorFields\n }\n }\n\n \n fragment iconFields on DeliverooIcon {\n name\n image\n }\n\n \n fragment illustrationBadgeFields on DeliverooIllustrationBadge {\n name\n image\n }\n\n \n fragment uiHomeCardFields on UICardFields {\n bubble {\n uiLines: ui_lines {\n ...uiLines\n }\n }\n overlay {\n background: background {\n typeName: __typename\n ...colorFields\n ...colorGradientFields\n }\n text {\n position\n color {\n ...colorFields\n }\n value\n }\n promotionTag: promotion_tag {\n primaryTagLine: primary_tag_line {\n backgroundColor: background_color {\n ...colorFields\n ...colorGradientFields\n }\n text {\n ...uiLines\n }\n }\n secondaryTagLine: secondary_tag_line {\n backgroundColor: background_color {\n ...colorFields\n ...colorGradientFields\n }\n text {\n ...uiLines\n }\n }\n }\n }\n favouritesOverlay: favourites_overlay {\n id\n entity\n isSelected: is_selected\n backgroundColor: background_color {\n ...colorFields\n ...colorGradientFields\n }\n selectedColor: selected_color {\n ...colorFields\n }\n unselectedColor: unselected_color {\n ...colorFields\n }\n target {\n ...uiTargetFields\n }\n countData: count_data {\n count\n isMaxCount: is_max_count\n }\n }\n countdownBadgeOverlay: countdown_badge_overlay {\n backgroundColor: background_color {\n ...colorFields\n }\n uiLine: ui_line {\n ...uiLines\n }\n }\n image\n uiLines: ui_lines {\n ...uiLines\n }\n }\n\n \n fragment uiTextLine on UITextLine {\n typeName: __typename\n key\n uiSpans: ui_spans {\n ...uiSpansPrimitive\n ... on UISpanCountdown {\n endsAt: ends_at\n isBold: is_bold\n size\n key\n color {\n ...colorFields\n }\n }\n ... 
on UISpanTag {\n key\n uiSpans: ui_spans {\n ...uiSpansPrimitive\n }\n backgroundColor: background_color {\n ...colorFields\n }\n }\n }\n }\n\n \n fragment uiLines on UILine {\n typeName: __typename\n ... on UITextLine {\n ...uiTextLine\n }\n ... on UITitleLine {\n key\n text\n color {\n ...colorFields\n }\n size\n }\n ... on UIBulletLine {\n key\n iconSpan: icon_span {\n typeName: __typename\n color {\n ...colorFields\n }\n icon {\n ...iconFields\n }\n iconSize: size\n }\n bulletSpacerSpan: bullet_spacer_span {\n typeName: __typename\n width\n }\n uiSpans: ui_spans {\n ...uiSpansPrimitive\n }\n }\n }\n\n \n fragment uiSpansPrimitive on UISpan {\n typeName: __typename\n ... on UISpanIcon {\n key\n color {\n ...colorFields\n }\n icon {\n ...iconFields\n }\n iconSize: size\n }\n ... on UISpanSpacer {\n key\n width\n }\n ... on UISpanText {\n key\n color {\n ...colorFields\n }\n text\n isBold: is_bold\n textSize: size\n }\n }\n\n \n fragment uiCardBorderFields on UICardBorderType {\n topColor: top_color {\n ...colorFields\n }\n bottomColor: bottom_color {\n ...colorFields\n }\n leftColor: left_color {\n ...colorFields\n }\n rightColor: right_color {\n ...colorFields\n }\n borderWidth: border_width\n }\n\n \n fragment uiModalButtonFields on UIModalButton {\n title\n theme: ui_theme\n dismissOnAction: dismiss_on_action\n target {\n typeName: __typename\n ... on UITargetWebPage {\n url\n newWindow: new_window\n }\n ... on UITargetAction {\n action\n params {\n id\n value\n }\n }\n ... on UITargetParams {\n ...uiTargetParamsFields\n }\n }\n trackingId: tracking_id\n }\n\n \n fragment uiModalFields on UIModal {\n typeName: __typename\n header\n caption\n image {\n ... on UIModalImage {\n image\n }\n ... on DeliverooIcon {\n ...iconFields\n }\n ... 
on DeliverooIllustrationBadge {\n ...illustrationBadgeFields\n }\n }\n buttons {\n ...uiModalButtonFields\n }\n theme: ui_theme\n displayId: display_id\n trackingId: tracking_id\n }\n\n \n fragment uiChallengesModalFields on UIChallengesModal {\n typeName: __typename\n displayId: display_id\n trackingId: tracking_id\n challengeDrnId: challenges_drn_id\n mode\n smallView: small_view {\n header\n bodyText: body_text\n infoButton: info_button {\n ...uiModalButtonFields\n }\n icon {\n ... on UIChallengesIndicator {\n required\n completed\n }\n ... on UIChallengesBadge {\n url\n }\n ... on UIChallengesSteppedIndicator {\n steps {\n ... on UIChallengesSteppedStamp {\n text\n icon\n isHighlighted: is_highlighted\n }\n }\n stepsCompleted: steps_completed\n stepsRequired: steps_required\n }\n }\n }\n fullView: full_view {\n header\n headerSubtitle: header_subtitle\n bodyTitle: body_title\n bodyText: body_text\n confirmationButton: confirmation_button {\n ...uiModalButtonFields\n }\n infoButton: info_button {\n ...uiModalButtonFields\n }\n icon {\n ... on UIChallengesIndicator {\n required\n completed\n }\n ... on UIChallengesBadge {\n url\n }\n ... on UIChallengesSteppedIndicator {\n steps {\n ... on UIChallengesSteppedStamp {\n text\n icon\n isHighlighted: is_highlighted\n }\n }\n stepsCompleted: steps_completed\n stepsRequired: steps_required\n }\n }\n }\n }\n\n \n fragment uiPlusFullScreenModalFields on UIPlusFullScreenModal {\n typeName: __typename\n displayId: display_id\n trackingId: tracking_id\n image {\n typeName: __typename\n ... on UIModalImage {\n image\n }\n ... on DeliverooIllustrationBadge {\n ...illustrationBadgeFields\n }\n }\n header\n body\n footnote\n primaryButton {\n ...uiModalButtonFields\n }\n secondaryButton {\n ...uiModalButtonFields\n }\n confetti\n displayOnlyOnce: display_only_once\n }\n\n \n fragment uiFeedOverlayFields on UIFeedOverlay {\n id\n position\n blocks: overlay_blocks {\n ... 
on UIFeedOverlayBanner {\n typeName: __typename\n id: display_id\n trackingId: tracking_id\n header\n caption\n isDismissible: is_dismissible\n theme: ui_theme\n image {\n ... on DeliverooIllustrationBadge {\n name\n }\n }\n }\n }\n }\n\n \n fragment uiCategoryPillFields on UICategoryPill {\n typeName: __typename\n blocks: content {\n ...uiLines\n }\n backgroundColor: background_color {\n typeName: __typename\n ...colorFields\n }\n target {\n ...uiTargetFields\n }\n trackingId: tracking_id\n contentDescription: content_description\n }\n\n \n fragment uiTallMenuItemCardFields on UITallMenuItemCard {\n id: menu_item_id\n title\n key\n image\n target {\n ...uiTargetFields\n }\n price {\n ...currencyFields\n }\n trackingId: tracking_id\n }\n\n \n fragment currencyFields on Currency {\n code\n formatted\n fractional\n presentational\n }\n\n \n fragment uiStoryCardFields on UIStoryCard {\n preview {\n profile {\n imageUrl: image_url\n headingLines: heading_lines {\n ...uiLines\n }\n }\n video {\n sources {\n url\n type\n }\n placeholderImage: placeholder_url\n autoplay\n trackingId: tracking_id\n }\n overlay {\n typeName: __typename\n ... on UIStoryTextOverlay {\n background {\n typeName: __typename\n ...colorFields\n ...colorGradientFields\n }\n uiLines: ui_lines {\n ...uiLines\n }\n }\n }\n target {\n ...uiTargetFields\n }\n }\n main {\n profile {\n imageUrl: image_url\n headingLines: heading_lines {\n ...uiLines\n }\n }\n video {\n sources {\n url\n type\n }\n placeholderImage: placeholder_url\n autoplay\n trackingId: tracking_id\n }\n overlay {\n typeName: __typename\n ... 
on UIStoryButtonOverlay {\n background {\n typeName: __typename\n ...colorFields\n ...colorGradientFields\n }\n contentLines: content {\n ...uiLines\n }\n button {\n key\n text\n contentDescription: content_description\n target {\n ...uiTargetFields\n }\n theme: ui_theme\n trackingId: tracking_id\n trackingProperties: tracking_properties\n }\n }\n }\n }\n trackingId: tracking_id\n trackingProperties: tracking_properties\n key\n }\n\n ",
"variables": {
"ui_blocks": [
"BANNER",
"CARD",
"SHORTCUT",
"BUTTON",
"MERCHANDISING_CARD",
"STORY_CARD"
],
"ui_controls": [
"APPLIED_FILTER",
"FILTER",
"SORT"
],
"ui_layout_carousel_styles": [
"DEFAULT",
"PARTNER_HEADING"
],
"ui_lines": [
"TITLE",
"TEXT",
"BULLET"
],
"ui_targets": [
"PARAMS",
"RESTAURANT",
"MENU_ITEM",
"WEB_PAGE",
"DEEP_LINK"
],
"fulfillment_methods": [
"DELIVERY",
"COLLECTION"
],
"location": {
"geohash": "gcpn7n35zy89",
"city_uname": "oxford",
"neighborhood_uname": "port-meadow",
"postcode": ""
},
"options": {
"query": "",
"web_column_count": 4,
"user_preference": {
"seen_modals": [
{
"id": "nc_promos_voucher_take10",
"timestamp": 1674830243
}
]
}
},
"url": "https://deliveroo.co.uk/restaurants/oxford/port-meadow?sort=distance&geohash=gcpn7n35zy89",
"uuid": "7270af10-2722-4bda-b041-d768208912c6",
"ui_actions": [
"CHANGE_DELIVERY_TIME",
"CLEAR_FILTERS",
"NO_DELIVERY_YET",
"SHOW_MEAL_CARD_ISSUERS",
"SHOWCASE_PICKUP",
"TOGGLE_FAVOURITE",
"COPY_TO_CLIPBOARD",
"SHOW_PICKUP",
"SHOW_DELIVERY",
"REFRESH",
"SHOW_VIDEO_STORIES",
"SHOW_HOME_MAP_VIEW",
"ACCEPT_CHALLENGES",
"SHOW_CHALLENGES_DETAILS"
],
"ui_features": [
"UNAVAILABLE_RESTAURANTS",
"LIMIT_QUERY_RESULTS",
"UI_CARD_BORDER",
"UI_CAROUSEL_COLOR",
"UI_PROMOTION_TAG",
"UI_BACKGROUND",
"ILLUSTRATION_BADGES",
"SCHEDULED_RANGES",
"UI_SPAN_TAGS",
"UI_SPAN_COUNTDOWN",
"HOME_MAP_VIEW"
],
"ui_themes": [
"BANNER_CARD",
"BANNER_EMPTY",
"BANNER_MARKETING_A",
"BANNER_MARKETING_B",
"BANNER_MARKETING_C",
"BANNER_PICKUP_SHOWCASE",
"BANNER_SERVICE_ADVISORY",
"CARD_LARGE",
"CARD_MEDIUM",
"CARD_MEDIUM_HORIZONTAL",
"CARD_SMALL",
"CARD_SMALL_DIAGONAL",
"CARD_SMALL_HORIZONTAL",
"CARD_WIDE",
"CARD_TALL",
"CARD_TALL_GRADIENT",
"MODAL_DEFAULT",
"MODAL_PLUS",
"MODAL_BUTTON_PRIMARY",
"MODAL_BUTTON_SECONDARY",
"MODAL_BUTTON_TERTIARY",
"SHORTCUT_DEFAULT",
"SHORTCUT_STACKED",
"SHORTCUT_HORIZONTAL",
"BUTTON_PRIMARY",
"BUTTON_SECONDARY",
"ANY_MODAL"
],
"ui_layouts": [
"LIST",
"CAROUSEL"
]
}
})
headers = {
'authority': 'api.uk.deliveroo.com',
'accept': 'application/json, application/vnd.api+json',
'accept-language': 'en',
'authorization': '',
'content-type': 'application/json',
'origin': 'https://deliveroo.co.uk',
'referer': 'https://deliveroo.co.uk/',
'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
'x-roo-client': 'consumer-web-app',
'x-roo-client-referer': '',
'x-roo-country': 'uk',
'x-roo-external-device-id': '',
'x-roo-guid': '76ff73aa-cf71-47eb-aeaf-ada8b6c400c1',
'x-roo-session-guid': 'cdd20a08-5740-49b0-ba93-90f58def6d43',
'x-roo-sticky-guid': '76ff73aa-cf71-47eb-aeaf-ada8b6c400c1',
'Cookie': '__cf_bm=nWG7YHerxVjA.cYSDMt6MA4YytsDozK3asJXqyb5A.8-1674830393-0-AbXU1uPHJZDRHZhTAIgJDvPdu7i8l4jypJWlcqs5f091IupaQEClXx3fWikwZpO80QD3SWYBeZQB9YiUpCAC0KBdTvgpY7KNwZlPufclBwSM'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Give it a shot and you'll see what I mean :)
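As a rough sketch of the parsing step: the query above requests `restaurant { id name }` inside the `uiTargetRestaurant` fragment, so restaurant names should appear under `restaurant` keys somewhere in the response. The exact nesting of the response is an assumption here, so the hypothetical helper `collect_restaurant_names` below walks the whole JSON tree instead of hard-coding a path. The `sample` dict is made-up data for illustration, not real API output.

```python
def collect_restaurant_names(node, found=None):
    """Recursively collect the "name" of every dict stored under a "restaurant" key."""
    if found is None:
        found = []
    if isinstance(node, dict):
        restaurant = node.get("restaurant")
        if isinstance(restaurant, dict) and "name" in restaurant:
            # Skip duplicates: the same restaurant can appear in several carousels.
            if restaurant["name"] not in found:
                found.append(restaurant["name"])
        for value in node.values():
            collect_restaurant_names(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_restaurant_names(item, found)
    return found


# Made-up sample that only mimics the aliases used in the query (assumption).
sample = {
    "data": {
        "results": {
            "layoutGroups": [
                {
                    "data": [
                        {
                            "blocks": [
                                {"target": {"restaurant": {"id": "1", "name": "Cafe A"}}},
                                {"target": {"restaurant": {"id": "2", "name": "Cafe B"}}},
                                {"target": {"restaurant": {"id": "1", "name": "Cafe A"}}},
                            ]
                        }
                    ]
                }
            ]
        }
    }
}

names = collect_restaurant_names(sample)
print(names)
```

With the real response you would pass `response.json()` to the helper instead of `sample`.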
I am pretty new to JSON files. I want to create a JSON file with 10 JSON objects. Each object has a temperature, flow and pressure given by a sensor. The values for each are stored in a variable. I can create the JSON file, but the variable is always handled like a string. To keep it simple, I've created a similar loop where every JSON object has only one entry, the variable stored as ID.
This is my attempt:
json_Daten2 = [{}, {}, {}, {}, {}, {}, {}, {}, {}, {}]
for i in range(10):
    json_Daten2[i] = """
    {
        "ID": i,
    }
    """
And this is my result:
[
"\n {\n \"ID\": i,\n }\n ",
"\n {\n \"ID\": i,\n }\n ",
"\n {\n \"ID\": i,\n }\n ",
"\n {\n \"ID\": i,\n }\n ",
"\n {\n \"ID\": i,\n }\n ",
"\n {\n \"ID\": i,\n }\n ",
"\n {\n \"ID\": i,\n }\n ",
"\n {\n \"ID\": i,\n }\n ",
"\n {\n \"ID\": i,\n }\n ",
"\n {\n \"ID\": i,\n }\n "
]
Sorry if I've missed a similar board entry, but I am thankful for every hint or help!
Thanks in advance!
Max
That's because you used a string, so the i inside it is just text and never gets replaced by the loop variable. How about this:
for i in range(10):
    json_Daten2[i] = {"ID": i}
You don't need to do this; Python dumps JSON natively:
from json import dump, dumps
MY_DATA = [
    {'id': 112, 'temperature': 2.23},
    {'id': 112, 'temperature': 2.23}
]
# if you want the result as a string
result = dumps(MY_DATA, ensure_ascii=False, indent=4)
print(result)
# if you want the result in a file
with open('teste.json', 'w') as arq:
    dump(MY_DATA, arq)
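Applied to the original question, a minimal sketch might look like the following. The sensor values are placeholder numbers, and `read_sensor` is a hypothetical stand-in for however the real readings are obtained; the point is only that plain Python dicts keep the variables as numbers when serialized.

```python
import json


def read_sensor(i):
    # Hypothetical stand-in for real sensor readings; returns made-up numbers.
    return {"temperature": 20.0 + i, "flow": 1.5 * i, "pressure": 100 + i}


# Build one dict per reading; values stay numeric because they are not inside a string.
readings = [{"ID": i, **read_sensor(i)} for i in range(10)]

# Serialize the whole list in one call.
text = json.dumps(readings, indent=4)

# Parsing the text back shows that each ID is an integer, not the literal letter "i".
parsed = json.loads(text)
print(parsed[3])
```

To write a file instead of a string, the same list can be passed to `json.dump` together with an open file handle.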
I can successfully send a POST request and receive all the data through the Python requests module.
import requests
cookies = {
'split': 'n',
'split_tcv': 'n',
'__vst': '747e3d5a-52bd-4404-8290-0541c8ae3e8d',
'permutive-id': '83d956ce-dcc4-4013-ba11-deaa916d1d11',
'ab.storage.userId.7cc9d032-9d6d-44cf-a8f5-d276489af322': '%7B%22g%22%3A%22visitor_747e3d5a-52bd-4404-8290-0541c8ae3e8d%22%2C%22c%22%3A1647078965078%2C%22l%22%3A1648673046950%7D',
'ab.storage.sessionId.7cc9d032-9d6d-44cf-a8f5-d276489af322': '%7B%22g%22%3A%2279a8b95e-56ac-5536-8f6d-5a63de27b982%22%2C%22e%22%3A1648674868938%2C%22c%22%3A1648673046947%2C%22l%22%3A1648673068938%7D',
'ab.storage.deviceId.7cc9d032-9d6d-44cf-a8f5-d276489af322': '%7B%22g%22%3A%22ea184537-7399-0e15-1556-b3839d293b53%22%2C%22c%22%3A1647078965081%2C%22l%22%3A1648673046950%7D',
'_pxvid': 'a344163f-a1ea-11ec-a1b4-516177714d73',
'g_state': '{"i_p":1649259615738,"i_l":3}',
'__split': '64',
'G_ENABLED_IDPS': 'google',
'AMCV_8853394255142B6A0A4C98A4%40AdobeOrg': '-1124106680%7CMCIDTS%7C19082%7CMCMID%7C05773401507726959754212551189522089328%7CMCAID%7CNONE%7CMCOPTOUT-1648761666s%7CNONE%7CvVersion%7C5.2.0',
'threshold_value': '74',
'clstr': '',
'clstr_tcv': '',
'CSAT_RENTALS': 'true',
'__fp': '57f929a2b9fe50bc016ff3280cdc0a79',
'__snn': 'b9699e1f-99c3-44b9-b25a-f57e30a93a23',
'__ssnstarttime': '1648754211',
'recent_searches': '%5B%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T16%3A10%3A05.597Z%22%2C%22last_ran%22%3A%222022-03-30T18%3A19%3A49.021Z%22%2C%22query%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22city%22%3A%22Studio%20City%22%2C%22state_id%22%3A%22CA%22%2C%22state_code%22%3A%22CA%22%2C%22postal_code%22%3A%2291604%22%2C%22slug_id%22%3A%2291604%22%7D%2C%22resource_type%22%3A%22pdp%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%2C%22id%22%3A%2291604%22%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T18%3A21%3A43.199Z%22%2C%22last_ran%22%3A%222022-03-30T19%3A19%3A48.210Z%22%2C%22query%22%3A%7B%22city%22%3A%22Studio%20City%22%2C%22county%22%3A%22Los%20Angeles%22%2C%22postal_code%22%3A%2291604%22%2C%22state_code%22%3A%22CA%22%2C%22slug_id%22%3A%2291604%22%7D%2C%22resource_type%22%3A%22for_rent%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%2C%22id%22%3A%2291604%22%2C%22prop_status%22%3A%5B%5D%2C%22isSavable%22%3Atrue%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T19%3A22%3A59.176Z%22%2C%22last_ran%22%3A%222022-03-30T19%3A23%3A43.750Z%22%2C%22query%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22city%22%3A%22San%20Diego%22%2C%22state_id%22%3A%22CA%22%2C%22state_code%22%3A%22CA%22%2C%22postal_code%22%3A%2292101%22%2C%22slug_id%22%3A%2292101%22%7D%2C%22resource_type%22%3A%22pdp%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%2C%22id%22%3A%2292101%22%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T20%3A44%3A18.721Z%22%2C%22last_ran%22%3A%222022-03-30T20%3A44%3A18.721Z%22%2C%22query%22%3A%7B%22city%22%3A%22Chula%20Vista%22%2C%22postal_code%22%3A%2291913%22%2C%22state_code%22%3A%22CA%22%2C%22slug_id%22%3A%2291913%22%2C%22centroid%22%3A%7B%22lon%22%3A-116.985898%2C%22lat%22%3A32.619087%7D%2C%22srp_link%22%3A%22%2Frealestateandhomes-search%2F91913%22%7D%2C%22resource_type%22%3A%22for_sale%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afal
se%2C%22user_query%22%3A%7B%22search_query%22%3A%7B%22status%22%3A%5B%22for_sale%22%2C%22ready_to_build%22%5D%2C%22primary%22%3Atrue%2C%22search_location%22%3A%7B%22location%22%3A%2291913%2C%20Chula%20Vista%2C%20CA%22%7D%7D%2C%22search_params%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22slug_id%22%3A%2291913%22%2C%22discovery_mode%22%3Afalse%2C%22location%22%3A%2291913%2C%20Chula%20Vista%2C%20CA%22%7D%7D%2C%22srp_link%22%3A%22%2Frealestateandhomes-search%2F91913%22%2C%22isSavable%22%3Atrue%2C%22id%22%3A%2291913%22%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T22%3A45%3A41.450Z%22%2C%22last_ran%22%3A%222022-03-30T22%3A46%3A09.195Z%22%2C%22query%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22city%22%3A%22San%20Diego%22%2C%22state_id%22%3A%22CA%22%2C%22state_code%22%3A%22CA%22%2C%22postal_code%22%3A%2292123%22%2C%22slug_id%22%3A%2292123%22%7D%2C%22resource_type%22%3A%22pdp%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T19%3A22%3A11.736Z%22%2C%22last_ran%22%3A%222022-03-31T19%3A21%3A06.151Z%22%2C%22query%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22city%22%3A%22Chula%20Vista%22%2C%22state_id%22%3A%22CA%22%2C%22state_code%22%3A%22CA%22%2C%22postal_code%22%3A%2291913%22%2C%22slug_id%22%3A%2291913%22%7D%2C%22resource_type%22%3A%22pdp%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%2C%22id%22%3A%2291913%22%7D%5D',
'AMCVS_8853394255142B6A0A4C98A4%40AdobeOrg': '1',
}
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0',
'Accept': 'application/json',
'Accept-Language': 'en-US,en;q=0.5',
# 'Accept-Encoding': 'gzip, deflate, br',
# Already added when you pass json=
# 'Content-Type': 'application/json',
'Origin': 'https://www.realtor.com',
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
'Sec-GPC': '1',
'Referer': 'https://www.realtor.com/realestateandhomes-detail/779-Eastshore-Ter-Unit-232_Chula-Vista_CA_91913_M23699-12806',
'Connection': 'keep-alive',
# Requests sorts cookies= alphabetically
# 'Cookie': 'split=n; split_tcv=n; __vst=747e3d5a-52bd-4404-8290-0541c8ae3e8d; permutive-id=83d956ce-dcc4-4013-ba11-deaa916d1d11; ab.storage.userId.7cc9d032-9d6d-44cf-a8f5-d276489af322=%7B%22g%22%3A%22visitor_747e3d5a-52bd-4404-8290-0541c8ae3e8d%22%2C%22c%22%3A1647078965078%2C%22l%22%3A1648673046950%7D; ab.storage.sessionId.7cc9d032-9d6d-44cf-a8f5-d276489af322=%7B%22g%22%3A%2279a8b95e-56ac-5536-8f6d-5a63de27b982%22%2C%22e%22%3A1648674868938%2C%22c%22%3A1648673046947%2C%22l%22%3A1648673068938%7D; ab.storage.deviceId.7cc9d032-9d6d-44cf-a8f5-d276489af322=%7B%22g%22%3A%22ea184537-7399-0e15-1556-b3839d293b53%22%2C%22c%22%3A1647078965081%2C%22l%22%3A1648673046950%7D; _pxvid=a344163f-a1ea-11ec-a1b4-516177714d73; g_state={"i_p":1649259615738,"i_l":3}; __split=64; G_ENABLED_IDPS=google; AMCV_8853394255142B6A0A4C98A4%40AdobeOrg=-1124106680%7CMCIDTS%7C19082%7CMCMID%7C05773401507726959754212551189522089328%7CMCAID%7CNONE%7CMCOPTOUT-1648761666s%7CNONE%7CvVersion%7C5.2.0; threshold_value=74; clstr=; clstr_tcv=; CSAT_RENTALS=true; __fp=57f929a2b9fe50bc016ff3280cdc0a79; __snn=b9699e1f-99c3-44b9-b25a-f57e30a93a23; __ssnstarttime=1648754211; 
recent_searches=%5B%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T16%3A10%3A05.597Z%22%2C%22last_ran%22%3A%222022-03-30T18%3A19%3A49.021Z%22%2C%22query%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22city%22%3A%22Studio%20City%22%2C%22state_id%22%3A%22CA%22%2C%22state_code%22%3A%22CA%22%2C%22postal_code%22%3A%2291604%22%2C%22slug_id%22%3A%2291604%22%7D%2C%22resource_type%22%3A%22pdp%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%2C%22id%22%3A%2291604%22%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T18%3A21%3A43.199Z%22%2C%22last_ran%22%3A%222022-03-30T19%3A19%3A48.210Z%22%2C%22query%22%3A%7B%22city%22%3A%22Studio%20City%22%2C%22county%22%3A%22Los%20Angeles%22%2C%22postal_code%22%3A%2291604%22%2C%22state_code%22%3A%22CA%22%2C%22slug_id%22%3A%2291604%22%7D%2C%22resource_type%22%3A%22for_rent%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%2C%22id%22%3A%2291604%22%2C%22prop_status%22%3A%5B%5D%2C%22isSavable%22%3Atrue%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T19%3A22%3A59.176Z%22%2C%22last_ran%22%3A%222022-03-30T19%3A23%3A43.750Z%22%2C%22query%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22city%22%3A%22San%20Diego%22%2C%22state_id%22%3A%22CA%22%2C%22state_code%22%3A%22CA%22%2C%22postal_code%22%3A%2292101%22%2C%22slug_id%22%3A%2292101%22%7D%2C%22resource_type%22%3A%22pdp%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%2C%22id%22%3A%2292101%22%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T20%3A44%3A18.721Z%22%2C%22last_ran%22%3A%222022-03-30T20%3A44%3A18.721Z%22%2C%22query%22%3A%7B%22city%22%3A%22Chula%20Vista%22%2C%22postal_code%22%3A%2291913%22%2C%22state_code%22%3A%22CA%22%2C%22slug_id%22%3A%2291913%22%2C%22centroid%22%3A%7B%22lon%22%3A-116.985898%2C%22lat%22%3A32.619087%7D%2C%22srp_link%22%3A%22%2Frealestateandhomes-search%2F91913%22%7D%2C%22resource_type%22%3A%22for_sale%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%2
C%22user_query%22%3A%7B%22search_query%22%3A%7B%22status%22%3A%5B%22for_sale%22%2C%22ready_to_build%22%5D%2C%22primary%22%3Atrue%2C%22search_location%22%3A%7B%22location%22%3A%2291913%2C%20Chula%20Vista%2C%20CA%22%7D%7D%2C%22search_params%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22slug_id%22%3A%2291913%22%2C%22discovery_mode%22%3Afalse%2C%22location%22%3A%2291913%2C%20Chula%20Vista%2C%20CA%22%7D%7D%2C%22srp_link%22%3A%22%2Frealestateandhomes-search%2F91913%22%2C%22isSavable%22%3Atrue%2C%22id%22%3A%2291913%22%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T22%3A45%3A41.450Z%22%2C%22last_ran%22%3A%222022-03-30T22%3A46%3A09.195Z%22%2C%22query%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22city%22%3A%22San%20Diego%22%2C%22state_id%22%3A%22CA%22%2C%22state_code%22%3A%22CA%22%2C%22postal_code%22%3A%2292123%22%2C%22slug_id%22%3A%2292123%22%7D%2C%22resource_type%22%3A%22pdp%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%7D%2C%7B%22items_viewed%22%3A0%2C%22first_ran%22%3A%222022-03-30T19%3A22%3A11.736Z%22%2C%22last_ran%22%3A%222022-03-31T19%3A21%3A06.151Z%22%2C%22query%22%3A%7B%22area_type%22%3A%22postal_code%22%2C%22city%22%3A%22Chula%20Vista%22%2C%22state_id%22%3A%22CA%22%2C%22state_code%22%3A%22CA%22%2C%22postal_code%22%3A%2291913%22%2C%22slug_id%22%3A%2291913%22%7D%2C%22resource_type%22%3A%22pdp%22%2C%22search_type%22%3A%22postal_code%22%2C%22isAjax%22%3Afalse%2C%22id%22%3A%2291913%22%7D%5D; AMCVS_8853394255142B6A0A4C98A4%40AdobeOrg=1',
# Requests doesn't support trailers
# 'TE': 'trailers',
}
params = {
'client_id': 'rdc-x',
'schema': 'vesta',
}
json_data = {
'query': '{\n home(property_id: "2369912806") {\n advertisers {\n team_name\n address {\n city\n country\n line\n postal_code\n state\n state_code\n }\n builder {\n fulfillment_id\n }\n broker {\n accent_color\n designations\n fulfillment_id\n name\n logo\n }\n email\n fulfillment_id\n href\n mls_set\n name\n nrds_id\n office {\n address {\n city\n coordinate {\n lat\n lon\n }\n country\n line\n postal_code\n state\n state_code\n }\n application_url\n email\n lead_email {\n to\n cc\n }\n fulfillment_id\n hours\n href\n mls_set\n out_of_community\n name\n phones {\n ext\n number\n primary\n trackable\n type\n }\n photo {\n href\n }\n slogan\n }\n phones {\n ext\n number\n primary\n trackable\n type\n }\n photo {\n href\n }\n slogan\n type\n }\n buyers {\n address {\n city\n country\n line\n postal_code\n state\n state_code\n }\n broker {\n accent_color\n designations\n fulfillment_id\n name\n logo\n }\n email\n fulfillment_id\n href\n mls_set\n name\n nrds_id\n office {\n address {\n city\n coordinate {\n lat\n lon\n }\n country\n line\n postal_code\n state\n state_code\n }\n application_url\n email\n lead_email {\n to\n cc\n }\n fulfillment_id\n hours\n href\n mls_set\n out_of_community\n name\n phones {\n ext\n number\n primary\n trackable\n type\n }\n photo {\n href\n }\n slogan\n }\n phones {\n ext\n number\n primary\n trackable\n type\n }\n photo {\n href\n }\n slogan\n type\n }\n community {\n permalink\n }\n estimates {\n current_values(source: "corelogic")\n #include_if(field: "status", operator: in, value: "sold,off_market,other") {\n estimate\n estimate_high\n estimate_low\n date\n source {\n type\n name\n }\n }\n }\n days_on_market\n description {\n baths\n baths_3qtr\n baths_full\n baths_full_calc\n baths_half\n baths_max\n baths_min\n baths_partial_calc\n baths_total\n beds\n beds_max\n beds_min\n construction\n cooling\n exterior\n fireplace\n garage\n garage_max\n garage_min\n garage_type\n heating\n logo {\n href\n }\n lot_sqft\n name\n pool\n 
roofing\n rooms\n sqft\n sqft_max\n sqft_min\n stories\n styles\n sub_type\n text\n type\n units\n year_built\n year_renovated\n zoning\n }\n details {\n category\n parent_category\n text\n }\n flags {\n is_coming_soon\n is_contingent\n is_deal_available\n is_for_rent\n is_foreclosure\n is_garage_present\n is_new_construction\n is_pending\n is_price_excludes_land\n is_senior_community\n is_short_sale\n is_subdivision\n }\n href\n last_sold_date\n last_sold_price\n list_date\n list_price\n listing_id\n local {\n flood {\n firststreet_url\n fsid\n flood_factor_score\n flood_factor_severity\n environmental_risk\n trend_direction\n fema_zone\n insurance_quotes{\n provider_url\n provider_name\n provider_logo\n expires\n price\n home_coverage\n contents_coverage\n disclaimer\n }\n }\n noise {\n score\n }\n }\n location {\n address {\n city\n coordinate {\n lat\n lon\n }\n country\n line\n postal_code\n state\n state_code\n street_direction\n street_name\n street_number\n street_post_direction\n street_suffix\n unit\n validation_code\n }\n county {\n fips_code\n name\n state_code\n }\n neighborhoods {\n city\n id\n level\n name\n state_code\n slug_id\n }\n search_areas {\n city\n state_code\n }\n }\n nearby_schools {\n schools {\n coordinate {\n lat\n lon\n }\n distance_in_miles\n district {\n id\n name\n }\n education_levels\n funding_type\n grades\n greatschools_id\n id\n name\n nces_code\n parent_rating\n rating\n review_count\n slug_id\n student_count\n }\n }\n photo_count\n photos {\n title\n description\n href\n type\n }\n primary_photo {\n href\n }\n property_history {\n date\n event_name\n price\n price_sqft\n source_listing_id\n source_name\n listing #include_if(field: "status", operator: in, value: "sold,off_market,other") {\n list_price\n last_status_change_date\n last_update_date\n status\n list_date\n listing_id\n suppression_flags\n photos {\n href\n }\n description {\n text\n }\n advertisers {\n fulfillment_id\n nrds_id\n name\n email\n href\n slogan\n 
office {\n fulfillment_id\n name\n email\n href\n slogan\n out_of_community\n application_url\n mls_set\n }\n broker {\n fulfillment_id\n name\n accent_color\n logo\n }\n type\n mls_set\n }\n buyers {\n fulfillment_id\n nrds_id\n name\n email\n href\n slogan\n type\n mls_set\n address {\n line\n city\n postal_code\n state_code\n state\n country\n coordinate {\n lat\n lon\n }\n }\n office {\n fulfillment_id\n name\n email\n href\n slogan\n hours\n out_of_community\n application_url\n mls_set\n address {\n line\n city\n postal_code\n state_code\n state\n country\n }\n phones {\n number\n type\n primary\n trackable\n ext\n }\n county {\n name\n }\n }\n phones {\n number\n type\n primary\n trackable\n ext\n }\n broker {\n fulfillment_id\n name\n accent_color\n logo\n }\n }\n source {\n id\n agents {\n agent_id\n agent_name\n office_id\n office_name\n office_phone\n type\n }\n }\n }\n }\n property_id\n provider_url {\n href\n level\n type\n }\n source {\n agents {\n agent_id\n agent_name\n id\n office_id\n office_name\n office_phone\n type\n }\n disclaimer {\n href\n logo {\n href\n height\n width\n }\n text\n }\n id\n plan_id\n listing_id\n name\n raw {\n status\n style\n tax_amount\n }\n type\n community_id\n }\n status\n suppression_flags\n tags\n tax_history {\n assessment {\n building\n land\n total\n }\n market {\n building\n land\n total\n }\n tax\n year\n }\n }\n }',
'variables': {},
'callfrom': 'PDP',
'isClient': True,
}
response = requests.post('https://www.realtor.com/api/v1/hulk', headers=headers, params=params, cookies=cookies, json=json_data)
This POST request returns a neat JSON dict.
I tried removing the newlines and stripping the extra spaces from the json_data['query'] string, but the cleaned query string gives a 400 response.
When I send the same request with Scrapy I get a 500 error. Here is my Scrapy code:
>>> import json
>>> from scrapy import Request
>>> from urllib.parse import urlencode
>>> url = 'https://www.realtor.com/api/v1/hulk?' + urlencode(params)
>>> r = Request(url=url, method='POST', body=json.dumps(json_data), cookies=cookies, headers=headers)
>>> fetch(r)
Error:
2022-04-01 00:35:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST https://www.realtor.com/api/v1/hulk?client_id=rdc-x&schema=vesta> (failed 1 times): 500 Internal Server Error
2022-04-01 00:35:50 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST https://www.realtor.com/api/v1/hulk?client_id=rdc-x&schema=vesta> (failed 2 times): 500 Internal Server Error
2022-04-01 00:35:50 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <POST https://www.realtor.com/api/v1/hulk?client_id=rdc-x&schema=vesta> (failed 3 times): 500 Internal Server Error
2022-04-01 00:35:50 [scrapy.core.engine] DEBUG: Crawled (500) <POST https://www.realtor.com/api/v1/hulk?client_id=rdc-x&schema=vesta> (referer: https://www.realtor.com/realestateandhomes-detail/779-Eastshore-Ter-Unit-232_Chula-Vista_CA_91913_M23699-12806)
There is another POST request to the same API endpoint whose payload is neat and clean, but its response does not contain all the data I am after. Here is the code:
import requests
cookies = {
'split': 'n',
'split_tcv': '143',
'__vst': '152dcbd8-0835-4216-8fb0-08acbd3d84cb',
'__ssn': '363de20b-632d-44df-8a7e-710530f17aca',
'__ssnstarttime': '1648756568',
'__split': '49',
'G_ENABLED_IDPS': 'google',
'pxcts': '9f300f24-b12c-11ec-acd0-6d6763416465',
'_pxvid': '9f3002da-b12c-11ec-acd0-6d6763416465',
'_fbp': 'fb.1.1648756575432.190210113',
's_ecid': 'MCMID%7C02791291578246133719093373803026546195',
'AMCVS_8853394255142B6A0A4C98A4%40AdobeOrg': '1',
'__fp': '3f685e7829acae9bfd1b24cc585822f2',
'_pxff_bdd': '2000',
'_pxff_cde': '5,10',
'srchID': 'c1621829c2ba44eeaf6730f485681000',
'criteria': 'pg%3D1%26sprefix%3D%252Frealestateandhomes-search%26area_type%3Dcity%26search_type%3Dcity%26city%3DLos%2520Angeles%26state_code%3DCA%26state_id%3DCA%26lat%3D34.0193936%26long%3D-118.4108248%26county_fips%3D06037%26county_fips_multi%3D06037%26loc%3DLos%2520Angeles%252C%2520CA%26locSlug%3DLos-Angeles_CA%26county_needed_for_uniq%3Dfalse',
'user_activity': 'return',
'last_ran': '-1',
'last_ran_threshold': '1648756647928',
'AMCV_8853394255142B6A0A4C98A4%40AdobeOrg': '-1124106680%7CMCIDTS%7C19083%7CMCMID%7C02791291578246133719093373803026546195%7CMCAID%7CNONE%7CMCOPTOUT-1648763850s%7CNONE%7CvVersion%7C5.2.0',
'_px3': '58197ee5ca6ae8cbda1bc68ab010ab5904d51ef28caab952f12e1404be5e6d8d:mUP+py2YqvMEeklbv4mq/DiHWSgUe4L8Xv96S935fYZ3bvsPObKY9AS52EOZkVABZx1qkwAkoez6rcs8069F8Q==:1000:AW3uJyae1jL3UWtkn9QKYWs1b7EbIVN8qGQHX5trb1NKOPHiJNpqayoDXtaZ1BoiDGbq1JsSBvWtug3vKcLVWEXi53XECTZVNMgC3H9pJcN6gFUsLyZiTRiDsA+ZuZcFkHrYpm1vtKeYAMgxlHId23nBDicSlnJ9JeB9SN1/CaBdrMbUPcOtsbKCmpU1puXo0pCuuGIe+wWaalpDoqyPRg==',
}
headers = {
'authority': 'www.realtor.com',
'accept': 'application/json',
'accept-language': 'en-US,en;q=0.9',
# Already added when you pass json=
# 'content-type': 'application/json',
# Requests sorts cookies= alphabetically
# 'cookie': 'split=n; split_tcv=143; __vst=152dcbd8-0835-4216-8fb0-08acbd3d84cb; __ssn=363de20b-632d-44df-8a7e-710530f17aca; __ssnstarttime=1648756568; __split=49; G_ENABLED_IDPS=google; pxcts=9f300f24-b12c-11ec-acd0-6d6763416465; _pxvid=9f3002da-b12c-11ec-acd0-6d6763416465; _fbp=fb.1.1648756575432.190210113; s_ecid=MCMID%7C02791291578246133719093373803026546195; AMCVS_8853394255142B6A0A4C98A4%40AdobeOrg=1; __fp=3f685e7829acae9bfd1b24cc585822f2; _pxff_bdd=2000; _pxff_cde=5,10; srchID=c1621829c2ba44eeaf6730f485681000; criteria=pg%3D1%26sprefix%3D%252Frealestateandhomes-search%26area_type%3Dcity%26search_type%3Dcity%26city%3DLos%2520Angeles%26state_code%3DCA%26state_id%3DCA%26lat%3D34.0193936%26long%3D-118.4108248%26county_fips%3D06037%26county_fips_multi%3D06037%26loc%3DLos%2520Angeles%252C%2520CA%26locSlug%3DLos-Angeles_CA%26county_needed_for_uniq%3Dfalse; user_activity=return; last_ran=-1; last_ran_threshold=1648756647928; AMCV_8853394255142B6A0A4C98A4%40AdobeOrg=-1124106680%7CMCIDTS%7C19083%7CMCMID%7C02791291578246133719093373803026546195%7CMCAID%7CNONE%7CMCOPTOUT-1648763850s%7CNONE%7CvVersion%7C5.2.0; _px3=58197ee5ca6ae8cbda1bc68ab010ab5904d51ef28caab952f12e1404be5e6d8d:mUP+py2YqvMEeklbv4mq/DiHWSgUe4L8Xv96S935fYZ3bvsPObKY9AS52EOZkVABZx1qkwAkoez6rcs8069F8Q==:1000:AW3uJyae1jL3UWtkn9QKYWs1b7EbIVN8qGQHX5trb1NKOPHiJNpqayoDXtaZ1BoiDGbq1JsSBvWtug3vKcLVWEXi53XECTZVNMgC3H9pJcN6gFUsLyZiTRiDsA+ZuZcFkHrYpm1vtKeYAMgxlHId23nBDicSlnJ9JeB9SN1/CaBdrMbUPcOtsbKCmpU1puXo0pCuuGIe+wWaalpDoqyPRg==',
'newrelic': 'eyJ2IjpbMCwxXSwiZCI6eyJ0eSI6IkJyb3dzZXIiLCJhYyI6IjM3ODU4NCIsImFwIjoiMTI5NzQxMzUyIiwiaWQiOiI0ZDk3N2RmNjg2ZWI5NjMwIiwidHIiOiI4NjMxYTMzMGYzODExZDg3YjdmZWExMWM2NzIzMzg4MCIsInRpIjoxNjQ4NzU2NjU3MDA5LCJ0ayI6IjEwMjI2ODEifX0=',
'origin': 'https://www.realtor.com',
'referer': 'https://www.realtor.com/realestateandhomes-detail/8522-N-Adir-Dr_Canoga-Park_CA_91304_M10239-84718?ex=2941654578',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'same-origin',
'sec-gpc': '1',
'traceparent': '00-8631a330f3811d87b7fea11c67233880-4d977df686eb9630-01',
'tracestate': '1022681#nr=0-1-378584-129741352-4d977df686eb9630----1648756657009',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36',
}
params = {
'client_id': 'rdc-x',
'schema': 'vesta',
}
json_data = {
'query': None,
'propertyId': '1023984718',
'callfrom': 'LDP',
'nrQueryType': 'MAIN_QV',
'user_id': None,
'variables': {
'propertyId': '1023984718',
'relatedHomesQuery': {
'type': 'similar_homes',
},
'query': {
'type': 'home',
'property_id': '1023984718',
},
'listing_id': '2941654578',
},
'seoPayload': {
'pageType': {
'silo': 'detail_page',
'status': 'for_sale',
},
},
'queryLoader': {
'appType': 'FOR_SALE',
'pageType': 'LDP',
'serviceType': 'MAIN_QV',
},
'isClient': True,
}
response = requests.post('https://www.realtor.com/api/v1/hulk', headers=headers, params=params, cookies=cookies, json=json_data)
Where am I wrong?
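One difference between the two attempts that may be worth checking: requests.post(..., json=json_data) sets the content-type: application/json header on its own (the comment in the headers dict above notes this), whereas a Scrapy Request built with body=json.dumps(json_data) sends only the headers it is given. The sketch below shows the idea with the standard library only. build_json_post is a hypothetical helper, not part of Scrapy or requests, and whether the missing header is the actual cause of the 500 is an assumption.

```python
import json


def build_json_post(headers, payload):
    """Return (headers, body) for a JSON POST, adding the content-type that requests' json= would set."""
    merged = dict(headers)  # work on a copy so the caller's dict is untouched
    # Header names are case-insensitive, so compare in lower case.
    if not any(name.lower() == "content-type" for name in merged):
        merged["content-type"] = "application/json"
    body = json.dumps(payload)
    return merged, body


# Hypothetical, shortened stand-ins for the headers and json_data above.
example_headers = {"accept": "application/json", "origin": "https://www.realtor.com"}
example_payload = {"query": None, "propertyId": "1023984718", "isClient": True}

post_headers, post_body = build_json_post(example_headers, example_payload)
print(post_headers["content-type"])
print(post_body)
```

In the Scrapy shell the result would be passed as Request(url=url, method='POST', body=post_body, headers=post_headers, cookies=cookies). Scrapy also ships scrapy.http.JsonRequest, which takes a data= argument and sets this header itself. Separately, the long query string contains #include_if(...) lines; in GraphQL a # starts a comment that runs to the end of the line, so collapsing the newlines may change how the query is parsed, which could explain the 400.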
I'm still learning to code with Python, and I really need help scraping an element from this website:
https://www.tokopedia.com/craftdale/crossback-apron-hijau-army?src=topads
I want to get the review data (the review time) from the Review (Ulasan) container.
This is the HTML from the site:
<p disabled="" data-testid="txtDateGivenReviewFilter0" class="css-oals0c-unf-heading e1qvo2ff8">1 bulan lalu</p>
I've tried to get the element with this code:
review = soup.findAll('p',class_='css-oals0c-unf-heading e1qvo2ff8')
or
review= soup.findAll('p',id_='txtDateGivenReviewFilter0')
But the result is always empty.
Can anybody help me fix this problem? Thank you very much.
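If you analyse the site's network traffic, you will see that it makes AJAX calls to retrieve the different pieces of information on the page, which is why the review text is not in the static HTML that was parsed. To get the review information, it makes an AJAX call to a specific endpoint with a JSON payload.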
import requests, json
payload = [{"operationName": "PDPReviewRatingQuery", "variables": {"productId": 353506414}, "query": "query PDPReviewRatingQuery($productId: Int!) {\n ProductRatingQuery(productId: $productId) {\n ratingScore\n totalRating\n totalRatingWithImage\n detail {\n rate\n totalReviews\n percentage\n __typename\n }\n __typename\n }\n}\n"}, {"operationName": "PDPReviewImagesQuery", "variables": {"productID": 353506414, "page": 1}, "query": "query PDPReviewImagesQuery($page: Int, $productID: Int!) {\n ProductReviewImageListQuery(page: $page, productID: $productID) {\n detail {\n reviews {\n reviewer {\n fullName\n profilePicture\n __typename\n }\n reviewId\n message\n rating\n updateTime\n isReportable\n __typename\n }\n images {\n imageAttachmentID\n description\n uriThumbnail\n uriLarge\n reviewID\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"}, {"operationName": "PDPReviewHelpfulQuery", "variables": {"productID": 353506414}, "query": "query PDPReviewHelpfulQuery($productID: Int!) {\n ProductMostHelpfulReviewQuery(productId: $productID) {\n shop {\n shopId\n __typename\n }\n list {\n reviewId\n message\n productRating\n reviewCreateTime\n reviewCreateTimestamp\n isReportable\n isAnonymous\n imageAttachments {\n attachmentId\n imageUrl\n imageThumbnailUrl\n __typename\n }\n user {\n fullName\n image\n url\n __typename\n }\n likeDislike {\n totalLike\n likeStatus\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"}, {"operationName": "PDPReviewListQuery", "variables": {"page": 1, "rating": 0, "withAttachment": 0, "productID": 353506414, "perPage": 10}, "query": "query PDPReviewListQuery($productID: Int!, $page: Int!, $perPage: Int!, $rating: Int!, $withAttachment: Int!) 
{\n ProductReviewListQuery(productId: $productID, page: $page, perPage: $perPage, rating: $rating, withAttachment: $withAttachment) {\n shop {\n shopId\n name\n image\n url\n __typename\n }\n list {\n reviewId\n message\n productRating\n reviewCreateTime\n reviewCreateTimestamp\n isReportable\n isAnonymous\n imageAttachments {\n attachmentId\n imageUrl\n imageThumbnailUrl\n __typename\n }\n reviewResponse {\n message\n createTime\n __typename\n }\n likeDislike {\n totalLike\n likeStatus\n __typename\n }\n user {\n userId\n fullName\n image\n url\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"}]
res = requests.post("https://gql.tokopedia.com/", json=payload)
data = res.json()
with open("data.json", "w") as f:
    json.dump(data, f)
The script above saves the review information to a file as JSON.
To get the rating score:
print(data[0]['data']['ProductRatingQuery']['ratingScore'])
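Since the question asks for the review time, here is a minimal sketch of pulling it out of the same response. It assumes the response is a list in the same order as the payload (which the data[0] lookup above suggests), so the PDPReviewListQuery result sits at index 3, and that each review carries the reviewCreateTime field requested in the query. extract_review_times and the sample response are hypothetical, for illustration only.

```python
def extract_review_times(data, index=3):
    """Collect reviewCreateTime from the PDPReviewListQuery part of the response."""
    result = data[index].get("data") or {}
    reviews = (result.get("ProductReviewListQuery") or {}).get("list") or []
    return [review.get("reviewCreateTime") for review in reviews]


# Made-up response fragment shaped like the query above; not real Tokopedia data.
sample = [
    {"data": {"ProductRatingQuery": {"ratingScore": "5.0"}}},
    {"data": {}},
    {"data": {}},
    {"data": {"ProductReviewListQuery": {"list": [
        {"reviewId": 1, "reviewCreateTime": "1 bulan lalu"},
        {"reviewId": 2, "reviewCreateTime": "2 bulan lalu"},
    ]}}},
]

for review_time in extract_review_times(sample):
    print(review_time)
```

With the real response, extract_review_times(data) would be called on the data variable from the script above. If the server returns the operations in a different order, the index argument has to be adjusted.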
I have finally set up the Elasticsearch database and imported data into it.
Sometimes when I request data from the frontend, I get a 500 error (not every time, only occasionally).
I sent the same request from Postman to see the ES error message.
I got:
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[9m4uVcf3TLmQ9Kr7z_fSpQ][text][0]: QueryPhaseExecutionException[[text][0]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: [{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#56319fc9]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#60b46f02]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][1]: QueryPhaseExecutionException[[text][1]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: [{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#3ca7d41e]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#63daf999]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][2]: QueryPhaseExecutionException[[text][2]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: [{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#27521539]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#66dbac2b]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][3]: QueryPhaseExecutionException[[text][3]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: 
[{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#73bb4f5e]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#112dcf1c]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][4]: QueryPhaseExecutionException[[text][4]: query[filtered(function score (blended(terms: [url_words:test, domain_words:test, title:test, body:test]), functions: [{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#b650549]}{filter(*:*), function [org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction#7fbe90f4]}]))->cache(_type:page)],from[0],size[25]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[Missing value for field [lang_en]]; }]",
"status": 500
}
Here is the request body:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "test",
"minimum_should_match": "-25%",
"type": "cross_fields",
"tie_breaker": 0.5,
"fields": ["title^3", "body", "url_words^2", "domain_words^8"]
}
},
"functions": [{
"field_value_factor": {
"field": "rank",
"factor": 1
}
},{
"field_value_factor": {
"field": "lang_en"
}
}]
}
},
"from": 0,
"size": 25
}
I understand that "Missing value for field [lang_en]" is the problem. I tried several fixes I found through Google, but without success.
ES version: 1.5.2
Any ideas?
EDIT:
I added "missing": 0 to the second field_value_factor, but I got this error instead:
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[9m4uVcf3TLmQ9Kr7z_fSpQ][text][0]: SearchParseException[[text][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": {\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][1]: SearchParseException[[text][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": {\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][2]: SearchParseException[[text][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": 
{\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][3]: SearchParseException[[text][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": {\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }{[9m4uVcf3TLmQ9Kr7z_fSpQ][text][4]: SearchParseException[[text][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n \"query\": {\n \"function_score\": {\n \"query\": {\n \"multi_match\": {\n \"query\": \"test\",\n \"minimum_should_match\": \"-25%\",\n \"type\": \"cross_fields\",\n \"tie_breaker\": 0.5,\n \"fields\": [\"title^3\", \"body\", \"url_words^2\", \"domain_words^8\"]\n }\n\n },\n \"functions\": [{\n \"field_value_factor\": {\n \"field\": \"rank\",\n \"factor\": 1\n }\n },{\n \"field_value_factor\": {\n \"field\": \"lang_en\",\n \"missing\": 0\n }\n }]\n }\n },\n \"from\": 0,\n \"size\": 25\n }\n]]]; nested: QueryParsingException[[text] field_value_factor query does not support [missing]]; }]",
"status": 400
}
In at least one document, the field lang_en is null, empty or simply non-existent.
You need to modify your field_value_factor function to tell it what to do in such a case, by using the missing setting with whatever default value makes sense (0, 1, etc.):
{
"field_value_factor": {
"field": "lang_en",
"missing": 1 <---- add this line
}
}
The problem was the AWS-hosted Elasticsearch version, 1.5.2, which rejects the missing setting (see the second error above).
My solution: create an EC2 instance and deploy Elasticsearch manually.
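If staying on 1.5.2 is required, one possible workaround (an assumption, not tested against that cluster) is to give the second function a filter so that it only runs for documents that actually have lang_en. The sketch below builds that request body in Python. build_search_body is a hypothetical helper, and the exists filter is the only change from the request body in the question.

```python
import json


def build_search_body(text, start=0, size=25):
    """Build the function_score request, applying lang_en only where the field exists."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        "minimum_should_match": "-25%",
                        "type": "cross_fields",
                        "tie_breaker": 0.5,
                        "fields": ["title^3", "body", "url_words^2", "domain_words^8"],
                    }
                },
                "functions": [
                    {"field_value_factor": {"field": "rank", "factor": 1}},
                    {
                        # Skip this function for documents without lang_en.
                        "filter": {"exists": {"field": "lang_en"}},
                        "field_value_factor": {"field": "lang_en"},
                    },
                ],
            }
        },
        "from": start,
        "size": size,
    }


print(json.dumps(build_search_body("test"), indent=2))
```

Documents without lang_en should then be scored by the remaining function instead of failing the whole query. The rank function would need the same treatment if that field can be missing too.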