I have dumped a dict of DataFrames by extending the JSON encoder as described in this answer. The only thing I changed is how each DataFrame is dumped: I switched orient="records" to orient="table" for my own purposes.
Somehow I can't manage to read the DataFrames back from the JSON. To be precise, pandas seems to read it all right (no exception is raised), but the resulting DataFrame is filled with NaN values.
Can anybody check whether I'm doing something wrong or whether this is a pandas bug (perhaps related to multi-indexed DataFrames)?
I'm using pandas version 1.1.4.
The following code should be enough (I hope) to test whether pandas is broken on my machine or whether I have somehow messed up the DataFrame's format. I also tried to reproduce this with a dummy DataFrame that has a two-level index and didn't run into any trouble.
Note also that the JSON shows "pandas_version": "0.20.0", which does not match my installed version (I did a fresh installation to be sure, and it is still there). The same 0.20.0 version appears in the documentation's example for the current pandas version...
import pandas as pd
s = """{
"schema": {
"fields": [{
"name": "grandeur",
"type": "string"
}, {
"name": "unite",
"type": "string"
}, {
"name": "year",
"type": "integer"
}, {
"name": 1,
"type": "number"
}, {
"name": 2,
"type": "number"
}, {
"name": 3,
"type": "number"
}, {
"name": 4,
"type": "number"
}, {
"name": 5,
"type": "number"
}, {
"name": 6,
"type": "number"
}, {
"name": 7,
"type": "number"
}, {
"name": 8,
"type": "number"
}, {
"name": 9,
"type": "number"
}, {
"name": 10,
"type": "number"
}, {
"name": 11,
"type": "number"
}, {
"name": 12,
"type": "number"
}
],
"primaryKey": ["grandeur", "unite", "year"],
"pandas_version": "0.20.0"
},
"data": [{
"grandeur": "Volumetric soil water layer 1",
"unite": "m3 m-3",
"year": 1981,
"1": 0.3893150916,
"2": 0.3614713229,
"3": 0.3965121538,
"4": 0.3513062306,
"5": 0.3860211495,
"6": 0.3507631742,
"7": 0.3499931922,
"8": 0.3195245205,
"9": 0.3078848032,
"10": 0.3917079828,
"11": 0.380486904,
"12": 0.3987094194
}, {
"grandeur": "Volumetric soil water layer 1",
"unite": "m3 m-3",
"year": 1982,
"1": 0.3924450997,
"2": 0.360954089,
"3": 0.3714920435,
"4": 0.3366828332,
"5": 0.329994006,
"6": 0.3659116305,
"7": 0.3035419171,
"8": 0.3143600073,
"9": 0.3099404359,
"10": 0.3938543858,
"11": 0.383870834,
"12": 0.3909665621
}]
}"""
pd.read_json(s, orient="table")
This happens because some field names in the schema do not match the corresponding keys in data.
For example,
schema
{
"name": 1, // integer
"type": "number"
}
data
"1": 0.3893150916 // "1" is string
====================================================
If you change the schema so that the field name matches the data key, read_json should read the data properly.
schema
{
"name": "1", // string
"type": "number"
}
data
"1": 0.3893150916 // "1" is string
If the example JSON string was generated by pandas to_json, then to_json is writing a wrong schema for integer column names.
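As a minimal sketch of that fix, using only the standard library's json module: the helper below rewrites every schema field name as a string so that it matches the keys in data. The name stringify_field_names is made up for this illustration, and the code assumes the schema/fields/data layout shown in the question.

```python
import json


def stringify_field_names(table_json):
    """Return a table-orient JSON string whose schema field names are all strings.

    Hypothetical helper for illustration; assumes a top-level "schema" object
    containing a "fields" list, as in the question's example.
    """
    doc = json.loads(table_json)
    for field in doc["schema"]["fields"]:
        # JSON object keys are always strings, so "name": 1 never matches the key "1".
        field["name"] = str(field["name"])
    return json.dumps(doc)


# Shortened stand-in for the question's JSON (assumed sample data).
sample = (
    '{"schema": {"fields": [{"name": "year", "type": "integer"},'
    ' {"name": 1, "type": "number"}], "primaryKey": ["year"]},'
    ' "data": [{"year": 1981, "1": 0.389}]}'
)
fixed = stringify_field_names(sample)
print([f["name"] for f in json.loads(fixed)["schema"]["fields"]])  # ['year', '1']
```

The fixed string can then be passed to pd.read_json(fixed, orient="table") as in the question; the month columns should then come back with string names such as "1" rather than integers.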
I have a pandas DataFrame with two columns: ticket number and history.
History is a string with the following structure. I need to create a third column containing the name of the author who changed the status from New to Open. Is that possible?
[
{
"id": "1",
"author": {
"name": "user1",
"emailAddress": "user1#test.com",
"displayName": "user1"
},
"created": "2021-06-09T12:54:22.915+0000",
"items": [
{
"field": "name",
"from": "1",
"fromString": null,
"to": "2",
"toString": "test"
}
]
},
{
"id": "2",
"author": {
"name": "user2",
"emailAdress": "user2#test.com",
"displayName": "user2"
},
"created": "2021-06-11T09:33:18.692+0000",
"items": [
{
"field": "status",
"from": 3,
"fromString": "New",
"to": "7",
"toString": "Open"
}
]
}]
If your DataFrame is named df, the history column (the second column) is named history, and the entries in that column really are JSON strings with a structure like the one you've provided, you could do the following:
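import json

def extract_author(json_string):
    # Parse one history entry (a JSON array of change records).
    records = json.loads(json_string)
    for record in records:
        # A record may list several changed fields, so check each item.
        for item in record['items']:
            if (item['field'] == 'status'
                    and item['fromString'] == 'New'
                    and item['toString'] == 'Open'):
                return record['author']['name']
    # No New -> Open status change was found in this history.
    return None

# Apply the function to every row to build the new column.
df['author'] = df['history'].map(extract_author)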
I am using the Python 3 avro_validator library.
The schema I want to validate references other schemas in separate Avro files. The files are in the same folder. How do I compile all the referenced schemas using the library?
Python code as follows:
from avro_validator.schema import Schema
schema_file = 'basketEvent.avsc'
schema = Schema(schema_file)
parsed_schema = schema.parse()
data_to_validate = {"test": "test"}
parsed_schema.validate(data_to_validate)
The error I get back:
ValueError: Error parsing the field [contentBasket]: The type [ContentBasket] is not recognized by Avro
And example Avro file(s) below:
basketEvent.avsc
{
"type": "record",
"name": "BasketEvent",
"doc": "Indicates that a user action has taken place with a basket",
"fields": [
{
"default": "basket",
"doc": "Restricts this event to having type = basket",
"name": "event",
"type": {
"name": "BasketEventType",
"symbols": ["basket"],
"type": "enum"
}
},
{
"default": "create",
"doc": "What is being done with the basket. Note: create / delete / update will always follow a product event",
"name": "action",
"type": {
"name": "BasketEventAction",
"symbols": ["create","delete","update","view"],
"type": "enum"
}
},
{
"default": "ContentBasket",
"doc": "The set of values that are specific to a Basket event",
"name": "contentBasket",
"type": "ContentBasket"
},
{
"default": "ProductDetail",
"doc": "The set of values that are specific to a Product event",
"name": "productDetail",
"type": "ProductDetail"
},
{
"default": "Timestamp",
"doc": "The time stamp for the event being sent",
"name": "timestamp",
"type": "Timestamp"
}
]
}
contentBasket.avsc
{
"name": "ContentBasket",
"type": "record",
"doc": "The set of values that are specific to a Basket event",
"fields": [
{
"default": [],
"doc": "A range of details about product / basket availability",
"name": "availability",
"type": {
"type": "array",
"items": "Availability"
}
},
{
"default": [],
"doc": "A range of care plans applicable to the basket",
"name": "carePlan",
"type": {
"type": "array",
"items": "CarePlan"
}
},
{
"default": "Category",
"name": "category",
"type": "Category"
},
{
"default": "",
"doc": "Unique identifier for this basket",
"name": "id",
"type": "string"
},
{
"default": "Price",
"doc": "Overall pricing info about the basket as a whole - individual product pricings will be dealt with at a product level",
"name": "price",
"type": "Price"
}
]
}
availability.avsc
{
"name": "Availability",
"type": "record",
"doc": "A range of values relating to the availability of a product",
"fields": [
{
"default": [],
"doc": "A list of offers associated with the overall basket - product level offers will be dealt with on an individual product basis",
"name": "shipping",
"type": {
"type": "array",
"items": "Shipping"
}
},
{
"default": "",
"doc": "The status of the product",
"name": "stockStatus",
"type": {
"name": "StockStatus",
"symbols": ["in stock","out of stock",""],
"type": "enum"
}
},
{
"default": "",
"doc": "The ID for the store when the stock can be collected, if relevant",
"name": "storeId",
"type": "string"
},
{
"default": "",
"doc": "The status of the product",
"name": "type",
"type": {
"name": "AvailabilityType",
"symbols": ["collection","shipping",""],
"type": "enum"
}
}
]
}
maxDate.avsc
{
"type": "record",
"name": "MaxDate",
"doc": "Indicates the timestamp for latest day a delivery should be made",
"fields": [
{
"default": "Timestamp",
"doc": "The time stamp for the delivery",
"name": "timestamp",
"type": "Timestamp"
}
]
}
minDate.avsc
{
"type": "record",
"name": "MinDate",
"doc": "Indicates the timestamp for earliest day a delivery should be made",
"fields": [
{
"default": "Timestamp",
"doc": "The time stamp for the delivery",
"name": "timestamp",
"type": "Timestamp"
}
]
}
shipping.avsc
{
"name": "Shipping",
"type": "record",
"doc": "A range of values relating to shipping a product for delivery",
"fields": [
{
"default": "MaxDate",
"name": "maxDate",
"type": "MaxDate"
},
{
"default": "MinDate",
"name": "minDate",
"type": "minDate"
},
{
"default": 0,
"doc": "Revenue generated from shipping - note, once a specific shipping object is selected, the more detailed revenue data sits within the one of object in pricing - this is more just to define if shipping is free or not",
"name": "revenue",
"type": "int"
},
{
"default": "",
"doc": "The shipping supplier",
"name": "supplier",
"type": "string"
}
]
}
timestamp.avsc
{
"name": "Timestamp",
"type": "record",
"doc": "Timestamp for the action taking place",
"fields": [
{
"default": 0,
"name": "timestampMs",
"type": "long"
},
{
"default": "",
"doc": "Timestamp converted to a string in ISO format",
"name": "isoTimestamp",
"type": "string"
}
]
}
I'm not sure if that library supports what you are trying to do, but fastavro should.
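If you put the first schema in a file called BasketEvent.avsc and the second schema in a file called ContentBasket.avsc, then you can do the following (load_schema looks for each referenced named type in a file named after that type, with the .avsc extension, in the same directory):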
from fastavro.schema import load_schema
from fastavro import validate
schema = load_schema("BasketEvent.avsc")
validate({"test": "test"}, schema)
Note that when I tried this I got the error fastavro._schema_common.UnknownType: Availability, because it seems there are other referenced schemas that you haven't posted here.
I want to convert JSON1 into JSON2 with minimal looping in Python, because there are many records like the ones below.
JSON1:
{
"1-5":[
{
"NAME": "A",
"AGE": "1"
},
{
"NAME": "A",
"AGE": "2"
},
{
"NAME": "B",
"AGE": "3"
}
],
"6-10":[
{
"NAME": "x",
"AGE": "6"
},
{
"NAME": "y",
"AGE": "6"
},
{
"NAME": "z",
"AGE": "10"
}
]
}
JSON2:
{
"1": [
{
"NAME": "A",
"AGE": "1"
}
],
"2": [
{
"NAME": "A",
"AGE": "2"
},
{
"NAME": "B",
"AGE": "2"
}
],
"3": [
{
"NAME": "B",
"AGE": "1"
}
],
...
}
Is there any way to do this? Could anyone help me with it?
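One possible approach, sketched under the assumption that the goal is to regroup every record by its AGE value (the JSON2 sample is not fully consistent with JSON1, so this is a guess at the intent). The function name regroup_by_age is made up for this illustration. Each record is visited exactly once, which is about as little looping as the data allows.

```python
from collections import defaultdict


def regroup_by_age(ranges):
    """Regroup records keyed by age range into lists keyed by exact AGE.

    Hypothetical helper; assumes every record is a dict with an "AGE" key.
    """
    grouped = defaultdict(list)
    for records in ranges.values():
        for record in records:
            # The AGE string itself becomes the new key.
            grouped[record["AGE"]].append(record)
    return dict(grouped)


json1 = {
    "1-5": [{"NAME": "A", "AGE": "1"}, {"NAME": "A", "AGE": "2"}, {"NAME": "B", "AGE": "3"}],
    "6-10": [{"NAME": "x", "AGE": "6"}, {"NAME": "y", "AGE": "6"}, {"NAME": "z", "AGE": "10"}],
}
json2 = regroup_by_age(json1)
print(sorted(json2, key=int))  # ['1', '2', '3', '6', '10']
```

If the input arrives as a JSON string, json.loads and json.dumps from the standard library can be applied before and after this function.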
I have a site that I'm working on using Django that makes API requests to get information about cars. These requests are triggered by JS events, which go to a created URL that connects to a view. In my view, I make the actual API request and use the .json() method to get the returned JSON from the response.
What I'm having trouble doing however is getting a certain index/value from this response.
Here is an example of a response I would receive:
{
"equipment": [{
"id": "20047746549",
"name": "Specifications",
"equipmentType": "OTHER",
"availability": "STANDARD",
"attributes": [{
"name": "Aerodynamic Drag (cd)",
"value": "0.26"
}, {
"name": "Ege Highway Mpg",
"value": "29"
}, {
"name": "Epa Combined Mpg",
"value": "23"
}, {
"name": "Epa City Mpg",
"value": "20"
}, {
"name": "Curb Weight",
"value": "3957"
}, {
"name": "Turning Diameter",
"value": "39.0"
}, {
"name": "Manufacturer 0 60mph Acceleration Time (seconds)",
"value": "6.6"
}, {
"name": "Epa Highway Mpg",
"value": "29"
}, {
"name": "Tco Curb Weight",
"value": "3957"
}, {
"name": "Ege Combined Mpg",
"value": "23"
}, {
"name": "Fuel Capacity",
"value": "19.8"
}, {
"name": "Ege City Mpg",
"value": "20"
}]
}],
"equipmentCount": 1
}
What I'm trying to get is the "value": "3957" that corresponds to the "name": "Curb Weight" attribute.
The way I thought about doing this (in Python) was jsonResponse['equipment'][0]['attributes'][4]['value']; however, the index of the Curb Weight attribute is not always the same. Sometimes it is 5, sometimes 6, etc.
Is there any way to get this Curb Weight attribute, or any other attribute, by the value of its "name" key?
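A minimal sketch of a lookup by name rather than by position, assuming the response has the equipment/attributes layout shown above. The function name find_attribute_value and its default parameter are made up for this illustration.

```python
def find_attribute_value(response, name, default=None):
    """Return the "value" of the first attribute whose "name" matches.

    Hypothetical helper; searches every equipment entry in the response and
    returns `default` when no attribute has the requested name.
    """
    for equipment in response.get("equipment", []):
        for attribute in equipment.get("attributes", []):
            if attribute.get("name") == name:
                return attribute.get("value")
    return default


# Shortened stand-in for the API response in the question.
json_response = {
    "equipment": [{
        "name": "Specifications",
        "attributes": [
            {"name": "Epa City Mpg", "value": "20"},
            {"name": "Curb Weight", "value": "3957"},
        ],
    }],
    "equipmentCount": 1,
}
print(find_attribute_value(json_response, "Curb Weight"))  # 3957
```

Because the search compares names, it keeps working when the API returns the attributes in a different order.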
I have the following json
{
"response": {
"message": null,
"exception": null,
"context": [
{
"headers": null,
"name": "aname",
"children": [
{
"type": "cluster-connectivity",
"name": "cluster-connectivity"
},
{
"type": "consistency-groups",
"name": "consistency-groups"
},
{
"type": "devices",
"name": "devices"
},
{
"type": "exports",
"name": "exports"
},
{
"type": "storage-elements",
"name": "storage-elements"
},
{
"type": "system-volumes",
"name": "system-volumes"
},
{
"type": "uninterruptible-power-supplies",
"name": "uninterruptible-power-supplies"
},
{
"type": "virtual-volumes",
"name": "virtual-volumes"
}
],
"parent": "/clusters",
"attributes": [
{
"value": "true",
"name": "allow-auto-join"
},
{
"value": "0",
"name": "auto-expel-count"
},
{
"value": "0",
"name": "auto-expel-period"
},
{
"value": "0",
"name": "auto-join-delay"
},
{
"value": "1",
"name": "cluster-id"
},
{
"value": "true",
"name": "connected"
},
{
"value": "synchronous",
"name": "default-cache-mode"
},
{
"value": "true",
"name": "default-caw-template"
},
{
"value": "blah",
"name": "default-director"
},
{
"value": [
"blah",
"blah"
],
"name": "director-names"
},
{
"value": [
],
"name": "health-indications"
},
{
"value": "ok",
"name": "health-state"
},
{
"value": "1",
"name": "island-id"
},
{
"value": "blah",
"name": "name"
},
{
"value": "ok",
"name": "operational-status"
},
{
"value": [
],
"name": "transition-indications"
},
{
"value": [
],
"name": "transition-progress"
}
],
"type": "cluster"
}
],
"custom-data": null
}
}
which I'm trying to parse using the json module in Python. I am only interested in getting the following information out of it.
Name Value
operational-status Value
health-state Value
Here is what I have tried.
In the script below, data is the JSON returned from a web page:
json = json.loads(data)
healthstate= json['response']['context']['operational-status']
operationalstatus = json['response']['context']['health-status']
Unfortunately I think I must be missing something, as the above results in an error saying that indices must be integers, not strings.
If I try
healthstate= json['response'][0]
it fails with an error saying index 0 is out of range.
Any help would be gratefully received.
json['response']['context'] is a list, so that object requires you to use integer indices.
Each item in that list is itself a dictionary again. In this case there is only one such item.
To get all "name": "health-state" dictionaries out of that structure you'd need to do a little more processing:
[attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
would give you a list of matching values for health-state in the first context.
Demo:
>>> [attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
[u'ok']
You have to follow the data structure. It's best to manipulate the data interactively and check what every item is. If it's a list, you'll have to index it positionally or iterate through it and check the values. If it's a dict, you'll have to index it by its keys. For example, here is a function that gets the first context and then iterates through its attributes, checking for a particular name.
def get_attribute(data, attribute):
    for attrib in data['response']['context'][0]['attributes']:
        if attrib['name'] == attribute:
            return attrib['value']
    return 'Not Found'
>>> data = json.loads(s)
>>> get_attribute(data, 'operational-status')
u'ok'
>>> get_attribute(data, 'health-state')
u'ok'
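When several attributes are needed, a variation on the same idea is to build a name-to-value dictionary once and then look names up directly. This is only a sketch: the function name attributes_by_name and the context_index parameter are assumptions made for this illustration, and the sample string is a shortened stand-in for the JSON in the question.

```python
import json


def attributes_by_name(data, context_index=0):
    """Map each attribute name to its value for one context entry.

    Hypothetical helper; assumes the response/context/attributes layout
    shown in the question.
    """
    context = data["response"]["context"][context_index]
    return {attr["name"]: attr["value"] for attr in context["attributes"]}


s = (
    '{"response": {"context": [{"name": "aname", "attributes": ['
    '{"value": "ok", "name": "health-state"},'
    ' {"value": "ok", "name": "operational-status"},'
    ' {"value": [], "name": "transition-progress"}]}]}}'
)
lookup = attributes_by_name(json.loads(s))
print(lookup["operational-status"], lookup["health-state"])  # ok ok
```

A missing name then raises KeyError, or lookup.get(name) can be used to return None instead.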
json['response']['context'] is a list, not a dict. The structure is not exactly what you think it is.
For example, the only "operational-status" I see in there is the attribute at index 14 of the first context, and its value can be read with the following:
json['response']['context'][0]['attributes'][14]['value']