I have a source JSON file of Steam reviews, in this format:
{
"reviews": {
"69245216": {
"recommendationid": "69245216",
"author": {
"steamid": "76561198166378463",
"num_games_owned": 31,
"num_reviews": 4,
"playtime_forever": 60198,
"playtime_last_two_weeks": 5899,
"last_played": 1589654367
},
"language": "english",
"review": "Me:*Playing Heroes of Hammrwatch\nAlso me 1 year later:*Playing Heroes of Hammrwatch\nIt's one of the best rougelites I've ever played. You can easly say that by the amount of hours I have on this game. I also have every achievement in the game.\nThe things I don't like about this game:\n-Limit- The game has limits like max damage you can deal. This is not that big problem because you would have to play this game as long as me to hit \"the wall\". And its because the damage is codded in 32bit number which makes the limit around 2 billion.\n-Tax- There is tax in the game for gold which scales with the amount of gold you have on you what makes no sense.\nThe things I like about this game:\n-Music- There are different themed ones depending on the act you are on.\n-Pixel Art-\n-Graphics- Game feels so smooth.\n-Classes- 9 Playable characters with unique sets.\n-Challanging gameplay- You can get far on the first run if you play good.\n-Bosses- There is a boss for every act in the game with different skills which can be harder for some characters.\n-Replayable- There are higher difficulty levels called NewGamePlus (NG+).\n-COOP- Playing with friends makes the game much better and also the game balances the difficulty.\n-DLC- There are DLCs for the game with new content (locations,game modes and playable characters).\n-Builds- There are different combination of items which makes game interesting in some situations.\n-Quality of life- Game has many quality of life improvements\n-Price- The game is very cheap. The only price is your soul beacuse you won't stop playing it! ;)\n\n\n\n",
"timestamp_created": 1589644982,
"timestamp_updated": 1589644982,
"voted_up": true,
"votes_up": 0,
"votes_funny": 0,
"weighted_vote_score": 0,
"comment_count": 0,
"steam_purchase": true,
"received_for_free": false,
"written_during_early_access": false
},
"69236471": {
"recommendationid": "69236471",
"author": {
"steamid": "76561198279405449",
"num_games_owned": 595,
"num_reviews": 46,
"playtime_forever": 1559,
"playtime_last_two_weeks": 1559,
"last_played": 1589652037
},
"language": "english",
"review": "Yes",
"timestamp_created": 1589635540,
"timestamp_updated": 1589635540,
"voted_up": true,
"votes_up": 0,
"votes_funny": 0,
"weighted_vote_score": 0,
"comment_count": 0,
"steam_purchase": true,
"received_for_free": false,
"written_during_early_access": false
},
"69226790": {
"recommendationid": "69226790",
"author": {
"steamid": "76561198004456693",
"num_games_owned": 82,
"num_reviews": 14,
"playtime_forever": 216,
"playtime_last_two_weeks": 216,
"last_played": 1589579174
},
"language": "english",
"review": "I really like how Hipshot/Crackshell is improving their formula from game to game. Altough SS Bogus Detour I didn't really like, I see how they implemented what they've learnt there to this game. Visuals just keep getting better and better and for that I really can't wait to see Hammerwatch 2 (check their YoutTube channel, early footage is out there).\nGameplay-wise I think it's a perfect match between the classic Hammerwatch feeling and a rougelike setting. My only issue with this game is the random map generator. Most of the time like 1/5 of all levels are just empty dead-ends. Otherwise highly recommend, already see huge amount of gameplay ahead of me.",
"timestamp_created": 1589623437,
"timestamp_updated": 1589623437,
"voted_up": true,
"votes_up": 0,
"votes_funny": 0,
"weighted_vote_score": 0,
"comment_count": 0,
"steam_purchase": true,
"received_for_free": false,
"written_during_early_access": false
},
and so on..
reading this in with df = pd.read_json(r'review_677120.json')
gives the following
reviews query_summary cursors
69245216 {'recommendationid': '69245216', 'author': {'s... NaN NaN
69236471 {'recommendationid': '69236471', 'author': {'s... NaN NaN
69226790 {'recommendationid': '69226790', 'author': {'s... NaN NaN
However I'd like something more along the lines of
steamid num_games_owned num_reviews playtime_forever playtime_last_two_weeks last_played language review
69245216 76561198166378463 31 4 60198 5899 1589654367 english "me..
so that each review is expanded into one row.
I've tried playing around with json_normalize, but none of my attempts seem to work: I either get errors like AttributeError: 'str' object has no attribute 'values' (from df = json_normalize(df)), or everything ends up in one row, which isn't usable.
I'd appreciate any help.
import pandas as pd
import json

with open('content.json') as f:
    # read in the json file
    d = json.load(f)

for id in d['reviews']:
    # pull the nested author information up into the main dictionary
    for key, val in d['reviews'][id]['author'].items():
        d['reviews'][id][key] = val
    del d['reviews'][id]['author']

print(d)
# OUTPUT:
{
'reviews': {
'69245216': {'recommendationid': '69245216', 'language': 'english', 'review': 'Me:*Playing Heroes of Hammrwatch\nAlso me 1 year later:*Playing Heroes of Hammrwatch\nIt\'s one of the best rougelites I\'ve ever played. You can easly say that by the amount of hours I have on this game. I also have every achievement in the game.\nThe things I don\'t like about this game:\n-Limit- The game has limits like max damage you can deal. This is not that big problem because you would have to play this game as long as me to hit "the wall". And its because the damage is codded in 32bit number which makes the limit around 2 billion.\n-Tax- There is tax in the game for gold which scales with the amount of gold you have on you what makes no sense.\nThe things I like about this game:\n-Music- There are different themed ones depending on the act you are on.\n-Pixel Art-\n-Graphics- Game feels so smooth.\n-Classes- 9 Playable characters with unique sets.\n-Challanging gameplay- You can get far on the first run if you play good.\n-Bosses- There is a boss for every act in the game with different skills which can be harder for some characters.\n-Replayable- There are higher difficulty levels called NewGamePlus (NG+).\n-COOP- Playing with friends makes the game much better and also the game balances the difficulty.\n-DLC- There are DLCs for the game with new content (locations,game modes and playable characters).\n-Builds- There are different combination of items which makes game interesting in some situations.\n-Quality of life- Game has many quality of life improvements\n-Price- The game is very cheap. The only price is your soul beacuse you won\'t stop playing it! 
;)\n\n\n\n', 'timestamp_created': 1589644982, 'timestamp_updated': 1589644982, 'voted_up': True, 'votes_up': 0, 'votes_funny': 0, 'weighted_vote_score': 0, 'comment_count': 0, 'steam_purchase': True, 'received_for_free': False, 'written_during_early_access': False, 'steamid': '76561198166378463', 'num_games_owned': 31, 'num_reviews': 4, 'playtime_forever': 60198, 'playtime_last_two_weeks': 5899, 'last_played': 1589654367},
'69236471': {'recommendationid': '69236471', 'language': 'english', 'review': 'Yes', 'timestamp_created': 1589635540, 'timestamp_updated': 1589635540, 'voted_up': True, 'votes_up': 0, 'votes_funny': 0, 'weighted_vote_score': 0, 'comment_count': 0, 'steam_purchase': True, 'received_for_free': False, 'written_during_early_access': False, 'steamid': '76561198279405449', 'num_games_owned': 595, 'num_reviews': 46, 'playtime_forever': 1559, 'playtime_last_two_weeks': 1559, 'last_played': 1589652037},
'69226790': {'recommendationid': '69226790', 'language': 'english', 'review': "I really like how Hipshot/Crackshell is improving their formula from game to game. Altough SS Bogus Detour I didn't really like, I see how they implemented what they've learnt there to this game. Visuals just keep getting better and better and for that I really can't wait to see Hammerwatch 2 (check their YoutTube channel, early footage is out there).\nGameplay-wise I think it's a perfect match between the classic Hammerwatch feeling and a rougelike setting. My only issue with this game is the random map generator. Most of the time like 1/5 of all levels are just empty dead-ends. Otherwise highly recommend, already see huge amount of gameplay ahead of me.", 'timestamp_created': 1589623437, 'timestamp_updated': 1589623437, 'voted_up': True, 'votes_up': 0, 'votes_funny': 0, 'weighted_vote_score': 0, 'comment_count': 0, 'steam_purchase': True, 'received_for_free': False, 'written_during_early_access': False, 'steamid': '76561198004456693', 'num_games_owned': 82, 'num_reviews': 14, 'playtime_forever': 216, 'playtime_last_two_weeks': 216, 'last_played': 1589579174}
}
}
df = pd.DataFrame.from_dict(d['reviews'], orient='index')
print(df)
# OUTPUT:
recommendationid language ... playtime_last_two_weeks last_played
69245216 69245216 english ... 5899 1589654367
69236471 69236471 english ... 1559 1589652037
69226790 69226790 english ... 216 1589579174
[3 rows x 19 columns]
print(df.axes)
# OUTPUT:
[Index(['69245216', '69236471', '69226790'], dtype='object'), Index(['recommendationid', 'language', 'review', 'timestamp_created',
'timestamp_updated', 'voted_up', 'votes_up', 'votes_funny',
'weighted_vote_score', 'comment_count', 'steam_purchase',
'received_for_free', 'written_during_early_access', 'steamid',
'num_games_owned', 'num_reviews', 'playtime_forever',
'playtime_last_two_weeks', 'last_played'],
dtype='object')]
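Alternatively, assuming pandas >= 1.0, pd.json_normalize can do the flattening in one call. A self-contained sketch using a hypothetical minimal sample that mirrors the file's structure:

```python
import pandas as pd

# Hypothetical minimal sample mirroring the file's structure
d = {
    "reviews": {
        "69236471": {
            "recommendationid": "69236471",
            "author": {"steamid": "76561198279405449", "num_games_owned": 595},
            "language": "english",
            "review": "Yes",
        }
    }
}

# json_normalize flattens each review; nested "author" fields become "author.*" columns
df = pd.json_normalize(list(d["reviews"].values()))
df.index = list(d["reviews"].keys())  # keep the recommendation ids as the index
```

The default separator is a dot, so you get columns like author.steamid; pass sep='_' if you prefer underscores.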
I have this text:
What can cause skidding on bends?
All of the following can:
SA Faulty shock-absorbers
SA Insufficient or uneven tyre pressure
[| Load is too small
What can cause a dangerous situation?
SA Brakes which engage heavily on one side
SA Too much steering-wheel play
[| Disturbed reception of traffic information on the radio
It starts raining. Why must you immediately increase the safe distance?
What is correct
[| Because the brakes react more quickly
SA Because a greasy film may form which increases the braking distance
SA Because a second greasy film may form which increases the braking distance
What the text is about
Above are multiple-choice questions with multiple options.
The question stem almost always ends with '?', but sometimes there is additional text before the options start.
All options start with either 'SA' or '[|'; options starting with 'SA' are correct, and options starting with '[|' or '[]' are wrong.
What I want to do
I want to split out the questions and all of their options and save them into a Python dictionary/list, ideally as key-value pairs:
{'ques': 'blalal', 'opt1': 'this is option one', 'opt2': 'this is option two'} and so on
What I have tried
rx = r'.*\?$\s*\w*(?:SA|\[\|)'
Here is the Regex101 link.
Assuming you have three options at all times:
p = r'(?m)^(?P<ques>\w[^?]*\?)[\s\S]*?^(?P<opt1>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt2>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt3>(?:SA|\[(?:\||\s])).*)'
dt = [x.groupdict() for x in re.finditer(p, string)]
See regex proof and Python proof.
Results:
[{'ques': 'What can cause skidding on bends?', 'opt1': 'SA Faulty shock-absorbers', 'opt2': 'SA Insufficient or uneven tyre pressure', 'opt3': '[| Load is too small'}, {'ques': 'What can cause a dangerous situation?', 'opt1': 'SA Brakes which engage heavily on one side', 'opt2': 'SA Too much steering-wheel play', 'opt3': '[| Disturbed reception of traffic information on the radio'}, {'ques': 'It starts raining. Why must you immediately increase the safe distance?', 'opt1': '[| Because the brakes react more quickly', 'opt2': 'SA Because a greasy film may form which increases the braking distance', 'opt3': 'SA Because a second greasy film may form which increases the braking distance'}]
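For reference, here is a self-contained run of that pattern against the first question from the sample (text inlined so it can be executed directly):

```python
import re

text = """What can cause skidding on bends?
All of the following can:
SA Faulty shock-absorbers
SA Insufficient or uneven tyre pressure
[| Load is too small"""

# (?m) makes ^ match at line starts; each option must begin with SA, [| or [ ]
p = r'(?m)^(?P<ques>\w[^?]*\?)[\s\S]*?^(?P<opt1>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt2>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt3>(?:SA|\[(?:\||\s])).*)'
dt = [m.groupdict() for m in re.finditer(p, text)]
```

Note that the lazy [\s\S]*? simply skips any extra stem text (like "All of the following can:") between the question and the first option.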
This is one of the cases where I would recommend not using regex, since it can get very complex very fast. My solution would be the following parser:
def parse(fname="/tmp/data.txt"):
    questions = []
    with open(fname) as f:
        for line in f:
            lstrip = line.strip()
            # Skip empty lines
            if not lstrip:
                continue
            # Check whether this line is an option
            is_option = (
                lstrip.startswith("[]")
                or lstrip.startswith("[|")
                or lstrip.startswith("SA")
            )
            if not is_option:
                # Here we know that this line is not empty and is not
                # an option... We have two possibilities:
                # 1. This is a continuation of the last question
                # 2. This is a new question
                if not questions or questions[-1]["options"]:
                    # The last question already has options (or there is
                    # no question yet), so this is a new question!
                    questions.append({
                        "ques": [lstrip],
                        "options": []
                    })
                else:
                    # We are still parsing the question part. Add a new line
                    questions[-1]["ques"].append(lstrip)
                # We are done with the question part, move on
                continue
            # We are only here if we are parsing options!
            is_correct = lstrip.startswith("SA")
            # We _must_ have at least one question by now
            assert questions
            # Add the option
            questions[-1]["options"].append({
                "option": lstrip,
                "correct": is_correct,
                "number": len(questions[-1]["options"]) + 1,
            })
    return questions
An example usage of the above and its output:
# main
data = parse()
# json just for pretty printing
import json
print(json.dumps(data, indent=4))
---
$ python3 ~/tmp/so.py
[
    {
        "ques": [
            "What can cause skidding on bends?",
            "All of the following can:"
        ],
        "options": [
            {
                "option": "SA Faulty shock-absorbers",
                "correct": true,
                "number": 1
            },
            {
                "option": "SA Insufficient or uneven tyre pressure",
                "correct": true,
                "number": 2
            },
            {
                "option": "[| Load is too small",
                "correct": false,
                "number": 3
            }
        ]
    },
    {
        "ques": [
            "What can cause a dangerous situation?"
        ],
        "options": [
            {
                "option": "SA Brakes which engage heavily on one side",
                "correct": true,
                "number": 1
            },
            {
                "option": "SA Too much steering-wheel play",
                "correct": true,
                "number": 2
            },
            {
                "option": "[| Disturbed reception of traffic information on the radio",
                "correct": false,
                "number": 3
            }
        ]
    },
    {
        "ques": [
            "It starts raining. Why must you immediately increase the safe distance?",
            "What is correct"
        ],
        "options": [
            {
                "option": "[| Because the brakes react more quickly",
                "correct": false,
                "number": 1
            },
            {
                "option": "SA Because a greasy film may form which increases the braking distance",
                "correct": true,
                "number": 2
            },
            {
                "option": "SA Because a second greasy film may form which increases the braking distance",
                "correct": true,
                "number": 3
            }
        ]
    }
]
There are a few advantages to using a custom parser instead of regex:
A lot more readable (think about what you would like to read when you come back to this project in 6 months :) )
More control over which lines you keep and how you trim them
Easier to deal with bad input data (debug it using logging)
That said, data is rarely perfect, and in most cases a few workarounds will be required to get the desired output. For example, in your original data, "All of the following can:" does not look like an option, since it does not start with any of the option prefixes. However, it also does not look to me like part of the question! You will have to deal with such cases based on your dataset (and doing so in regex will be a lot harder). In this particular case you could:
Only treat as part of the question lines that end with '?' (problematic for the 3rd question)
Treat lines starting with "None" or "All" as options
etc.
The exact solution depends on your data quality and cases, but the code above should be easy to adjust in most cases.
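As a sketch of the second suggestion (the extra "All"/"None" prefixes are a hypothetical tweak, adjust them to your data), the option check could be pulled out into a small helper:

```python
def is_option_line(lstrip):
    # Treat "All ..." / "None ..." lines as options too (hypothetical tweak)
    return lstrip.startswith(("[]", "[|", "SA", "All", "None"))
```

str.startswith accepts a tuple of prefixes, so adding cases stays a one-line change.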
Edit: So far my code is finding the matches. I am working on appending the JSON object's data to the row where the word match occurs.
I'm trying to find the matching words between my JSON file and my CSV, then check where that word has a low rating (the column with decimal values) in the CSV.
If the word has a low rating, I record the time of the word and the index of the word. Is there a way for me to use something like pandas to loop over all my JSON objects and append each object's data where its word matches the rightmost column of my CSV?
Edit (per the answers given below):
row, col = dfSynsets.shape
for value in contents['words']:
    current_word = value['word']
    for csv_row in range(row):
        curr_csv_word = dfSynsets.loc[csv_row][-1]
        if curr_csv_word == current_word:
            print(curr_csv_word)
            print(current_word)
This code block produces this output:
universe
universe
in
in
apparent
apparent
mention
mention
passing
passing
way
way
even
even
over
over
there
there
total
total
experiment
experiment
most
most
work
work
by
by
low
low
empty
empty
in
in
fill
fill
Here's an example of my JSON file:
{
"transcript": "The universe is bustling with matter and energy. Even in the vast apparent emptiness of intergalactic space, there's one hydrogen atom per cubic meter. That's not the mention a barrage of particles and electromagnetic radiation passing every which way from stars, galaxies, and into black holes. There's even radiation left over from the Big Bang. So is there such thing as a total absence of everything? This isn't just a thought experiment. Empty spaces, or vacuums, are incredibly useful. Inside our homes, most vacuum cleaners work by using a fan to create a low-pressure relatively empty area that sucks matter in to fill the void. But that's far from empty. There's still plenty of matter bouncing around. Manufacturers rely on more thorough, sealed vacuums for all sorts of purposes. That includes vacuum-packed food that stays fresh longer, and the vacuums inside early light bulbs that protected filaments from degrading. These vacuums are generally created with some version of what a vacuum cleaner does using high-powered pumps that create enough suction to remove as many stray atoms as possible. But the best of these industrial processes tends to leave hundreds of millions of atoms per cubic centimeter of space. That isn't empty enough for scientists who work on experiments, like the Large Hadron Collider, where particle beams need to circulate at close to the speed of light for up to ten hours without hitting any stray atoms. So how do they create a vacuum? The LHC's pipes are made of materials, like stainless steel, that don't release any of their own molecules and are lined with a special coating to absorb stray gases. Raising the temperature to 200 degrees Celsius burns off any moisture, and hundreds of vacuum pumps take two weeks to trap enough gas and debris out of the pipes for the collider's incredibly sensitive experiments. Even with all this, the Large Hadron Collider isn't a perfect vacuum. 
In the emptiest places, there are still about 100,000 particles per cubic centimeter. But let's say an experiment like that could somehow get every last atom out. There's still an unfathomably huge amount of radiation all around us that can pass right through the walls. Every second, about 50 muons from cosmic rays, 10 million neutrinos coming directly from the Big Bang, 30 million photons from the cosmic microwave background, and 300 trillion neutrinos from the Sun pass through your body. It is possible to shield vacuum chambers with substances, including water, that absorb and reflect this radiation, except for neutrinos. Let's say you've somehow removed all of the atoms and blocked all of the radiation. Is the space now totally empty? Actually, no. All space is filled with what physicists call quantum fields. What we think of as subatomic particles, electrons and photons and their relatives, are actually vibrations in a quantum fabric that extends throughout the universe. And because of a physical law called the Heisenberg Principle, these fields never stop oscillating, even without any particles to set off the ripples. They always have some minimum fluctuation called a vacuum fluctuation. This means they have energy, a huge amount of it. Because Einstein's equations tell us that mass and energy are equivalent, the quantum fluctuations in every cubic meter of space have an energy that corresponds to a mass of about four protons. In other words, the seemingly empty space inside your vacuum would actually weigh a small amount. Quantum fluctuations have existed since the earliest moments of the universe. In the moments after the Big Bang, as the universe expanded, they were amplified and stretched out to cosmic scales. Cosmologists believe that these original quantum fluctuations were the seeds of everything we see today: galaxies and the entire large scale structure of the universe, as well as planets and solar systems. 
They're also the center of one of the greatest scientific mysteries of our time because according to the current theories, the quantum fluctuations in the vacuum of space ought to have 120 orders of magnitude more energy than we observe. Solving the mystery of that missing energy may entirely rewrite our understanding of physics and the universe. ",
"words": [
{
"alignedWord": "the",
"end": 6.31,
"start": 6.17,
"word": "The"
},
{
"alignedWord": "universe",
"end": 6.83,
"start": 6.31,
"word": "universe"
},
{
"alignedWord": "is",
"end": 7.05,
"start": 6.85,
"word": "is"
},
{
"alignedWord": "bustling",
"end": 7.4799999999999995,
"start": 7.05,
"word": "bustling"
},
{
"alignedWord": "with",
"end": 7.65,
"start": 7.48,
"word": "with"
},
{
"alignedWord": "matter",
"end": 7.970000000000001,
"start": 7.65,
"word": "matter"
},
{
"alignedWord": "and",
"end": 8.09,
"start": 7.97,
"word": "and"
},
{
"alignedWord": "energy",
"end": 8.579999,
"start": 8.099999,
"word": "energy"
},
{
"alignedWord": "even",
"end": 9.35,
"start": 9.08,
"word": "Even"
},
{
"alignedWord": "in",
"end": 9.439999,
"start": 9.349999,
"word": "in"
},
{
"alignedWord": "the",
"end": 9.53,
"start": 9.44,
"word": "the"
},
{
"alignedWord": "vast",
"end": 9.84,
"start": 9.53,
"word": "vast"
},
{
"alignedWord": "apparent",
"end": 10.17,
"start": 9.84,
"word": "apparent"
},
{
"alignedWord": "emptiness",
"end": 10.67,
"start": 10.19,
"word": "emptiness"
},
{
"alignedWord": "of",
"end": 10.8,
"start": 10.67,
"word": "of"
}
]
}
Here's my CSV file:
572714 0.0 ['knocked out', 'kayoed', '"KOd"', 'out', 'stunned'] "KOd"
0 1771194 0.500000 ['get', '"get under ones skin"'] "get under ones skin"
1 462301 0.125000 ['south-southwest', '"sou-sou-west"'] "sou-sou-west"
2 250898 0.500000 ['between', '"tween"'] "tween"
3 2203763 0.400000 ['thirteenth', '13th'] 13th
4 2202047 0.333333 ['first', '1st'] 1st
... ... ... ... ...
5552 1848465 0.000000 ['move over', 'give way', 'give', 'ease up', '... yield
5553 7176243 0.000000 ['concession', 'conceding', 'yielding'] yielding
5554 14425853 0.000000 ['youth'] youth
5555 8541841 0.250000 ['zone', 'geographical zone'] zone
5556 1943718 0.500000 ['soar', 'soar up', 'soar upwards', 'surge', '... zoom
Example of desired output
col1:synset col2:rating col3:list col4:word col5:json data
9466280 0.5 ['universe', 'existence', 'creation', 'world', 'cosmos', 'macrocosm'] macrocosm
{
"alignedWord": "universe",
"end": 178.109999,
"start": 177.599999,
"word": "universe"
},
As per your question, I gather that you want to traverse the JSON file, retrieve the value of the 'word' key, and compare that value with the last column of the CSV file; if the two words are the same, print 'EQUAL', otherwise 'NOT EQUAL'.
If this is correct, then try the approach below:
import pandas as pd

df = pd.read_csv(CSV_FILE_NAME)  # replace CSV_FILE_NAME with your file's path
row, col = df.shape
for value in contents['words']:
    current_word = value['word']
    for csv_row in range(row):
        curr_csv_word = df.loc[csv_row][-1]
        if curr_csv_word == current_word:
            print("EQUAL")
        else:
            print("NOT EQUAL")
I hope you find your answer.
First, define a mapping function:
import json
import pandas

def apply_fun(row):
    for value in contents['words']:
        if value['word'] in row['word']:
            return json.dumps(value)
    return ""
Then apply it to your dataframe:
x = dfSynsets.apply(lambda row: apply_fun(row), axis=1)
dfSynsets.insert(4, 'json_ref', x)
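A linear scan over the word list for every row gets slow on large inputs; indexing the JSON words in a dict first gives O(1) lookups. A self-contained sketch (the sample data below is hypothetical; column names follow the question):

```python
import json
import pandas as pd

# Hypothetical minimal versions of the question's data structures
contents = {"words": [
    {"alignedWord": "universe", "end": 6.83, "start": 6.31, "word": "universe"},
]}
dfSynsets = pd.DataFrame(
    [[9466280, 0.5, "['universe', 'cosmos']", "universe"],
     [14425853, 0.0, "['youth']", "youth"]],
    columns=["synset", "rating", "synonyms", "word"],
)

# Index the JSON words once, then map each CSV word to its JSON object
by_word = {w["word"]: w for w in contents["words"]}
dfSynsets["json_ref"] = dfSynsets["word"].map(
    lambda w: json.dumps(by_word[w]) if w in by_word else ""
)
```

Rows whose word has no JSON match get an empty string, mirroring the apply_fun behavior above.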
It appears that NLU is not recognizing the full blob of data I am providing it. Am I doing something wrong in my code, or do I have misplaced assumptions about how the API should work? The response from the API is included below; it contains the text that was analyzed as well as the full submitted text, and there is a delta between them that I can't explain.
Here's my code:
def nlu(text):
    print("Calling NLU")
    url = "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze?version=2017-02-27"
    data = {
        'text': text,
        'language': "en",
        'return_analyzed_text': True,
        'clean': True,
        'features': {
            'entities': {
                'emotion': True,
                'sentiment': True,
                'limit': 2
            },
            "concepts": {
                "limit": 15
            },
            'keywords': {
                'emotion': True,
                'sentiment': True,
                'limit': 2
            }
        }
    }
    headers = {
        'content-type': "application/json"
    }
    username = os.getenv("nlu-username")
    password = os.getenv("nlu-password")
    print("NLU", username, password)
    print("data", json.dumps(data))
    response = requests.request("POST", url, data=json.dumps(data),
                                headers=headers, auth=(username, password))
    print("Done calling NLU")
    print(response.text)
Here's the request/response:
"keywords": [
{
"text": "anthropologists study skeletons",
"sentiment": {
"score": 0.0
},"analyzed_text": "move between two thousand eight and two thousand twelve archaeologists excavated the rubble of an ancient hospital in England in the process they uncovered a number of skeletons one in particular belong to a wealthy Mel who lived in the eleventh or twelfth century and died of leprosy between the ages of eighteen and twenty five how do we know all this simply by examining some old soil Kate bones even centuries after death skeletons carry unique features that tell us about their identities and using modern tools and techniques we can read those features as clues this is a branch of science known as biological anthropology it allows researchers to piece together details about Incheon individuals and identify historical events that affected whole populations when researchers uncover a skeleton some of the first clues they gather like age and gender line its morphology which is the structure appearance and size of a skeleton mostly the clavicle stop growing at age twenty five so a skeleton with the clavicle that hasn't fully formed must be younger than similarly the plates in the cranium can continue fusing up to age forty and sometimes beyond by combining these with some microscopic skeletal clues physical anthropologists can estimate an approximate age of death meanwhile pelvic bones reveal gender biologically female palaces are wider allowing women to give birth whereas males are narrower those also betrayed the signs of aging disease disorders like anemia leave their traces on the bones and the condition of teeth can reveal clues to factors like diet and malnutrition which sometimes correlate with wealth or poverty a protein called collagen can give us even more profound details the air we breathe water we drink and food we eat leaves permanent traces in our bones and teeth in the form of chemical compounds these compounds contain measurable quantities called isotopes stable isotopes in bone collagen and tooth enamel varies among mammals 
dependent on where they lived and what they eat so but analyzing these isotopes we can draw direct inferences regarding the diet and location of historic people not only that but during life bones undergo a constant cycle of remodeling so if someone moves from one place to another bones synthesized after that move will also reflect the new isotopic signatures of the surrounding environment that means that skeletons can be used like migratory maps for instance between one and six fifty A. D. the great city of TOT Makana Mexico bustled with thousands of people researchers examined the isotope ratios and skeletons to the now which held details of their diets when they were young they found evidence for significant migration into the city a majority of the individuals were born elsewhere with further geological and skeletal analysis they may be able to map where those people came from that work in tier two Akon is also an example of how bio anthropologists study skeletons in cemeteries and mass graves and analyze their similarities and differences from not information they can learn about cultural beliefs social norms wars and what caused their deaths today we use these tools to answer big questions about how forces like migration and disease shape the modern world DNA analysis is even possible in some relatively well preserved ancient remains that's helping us understand how diseases like tuberculosis have evolved over the centuries so we can build better treatments for people today ocean skeletons can tell us a surprisingly great deal about the past two of your remains are someday buried intact what might archaeologists of the distant future learn from them"
I just tried NLU with your text and got a proper response; check the result below. I think you should try the Watson API Explorer first with your service credentials. It will also help you fix any misplaced headers or missing params in the API call.
Note: just remove the "metadata": {} entry from the parameter object before making the POST call, as it only applies to URL and HTML input.
{
    "semantic_roles": [{
        "subject": {
            "text": "anthropologists"
        },
        "sentence": "anthropologists study skeletons",
        "object": {
            "text": "skeletons"
        },
        "action": {
            "verb": {
                "text": "study",
                "tense": "present"
            },
            "text": "study",
            "normalized": "study"
        }
    }],
    "language": "en",
    "keywords": [{
            "text": "anthropologists",
            "relevance": 0.966464
        },
        {
            "text": "skeletons",
            "relevance": 0.896147
        }
    ],
    "entities": [],
    "concepts": [{
        "text": "Cultural studies",
        "relevance": 0.86926,
        "dbpedia_resource": "http://dbpedia.org/resource/Cultural_studies"
    }],
    "categories": [{
            "score": 0.927751,
            "label": "/science/social science/anthropology"
        },
        {
            "score": 0.219365,
            "label": "/education/homework and study tips"
        },
        {
            "score": 0.128377,
            "label": "/science"
        }
    ],
    "warnings": [
        "emotion: cannot locate keyphrase",
        "relations: Not Found",
        "sentiment: cannot locate keyphrase"
    ]
}
In your code you have
data=json.dumps(data)
which converts the whole JSON object to a string. That should just be:
data=data
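Alternatively, requests can do the serialization itself via the json= keyword, which also sets the Content-Type header for you. A sketch with a placeholder URL and payload, prepared but never sent over the network:

```python
import json
import requests

# Hypothetical payload; the URL is a placeholder
payload = {"text": "anthropologists study skeletons",
           "features": {"keywords": {}}}

# Preparing (without sending) shows what json= does: it serializes the
# dict into the body and sets the application/json content type
req = requests.Request("POST", "https://example.com/analyze", json=payload).prepare()
```

With json=payload you drop both the manual json.dumps and the explicit headers dict.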
Also, I would recommend using the Python WDC SDK, as it will make things easier for you.
The same example as above:
import os
import json
from watson_developer_cloud import NaturalLanguageUnderstandingV1
import watson_developer_cloud.natural_language_understanding.features.v1 as Features

username = os.getenv("nlu-username")
password = os.getenv("nlu-password")

nluv1 = NaturalLanguageUnderstandingV1(
    username=username,
    password=password)

features = [
    Features.Entities(),
    Features.Concepts(),
    Features.Keywords()
]

def nlu(text):
    print('Calling NLU')
    response = nluv1.analyze(text, features=features, language='en')
    print('Done calling NLU')
    print(json.dumps(response, indent=2))
I have recently started working with JSON. The snippet below shows part of what I'm working with. In this example, I want to extract the set {1411, 1410, 2009, 3089}. Does JSON provide a method for this, or do I need to write it myself?
In case it is relevant, I'm working in Python.
{
"1411": {
"id": 1411,
"plaintext": "Increases Attack Speed, and gives increasing power as you kill Jungle Monsters and Champions",
"description": "<stats>+40% Attack Speed<br>+30 Magic Damage on Hit<\/stats><br><br><unique>UNIQUE Passive - Devouring Spirit:<\/unique> Takedowns on large monsters and Champions increase the magic damage of this item by +1. Takedowns on Rift Scuttlers and Rift Herald increase the magic damage of this item by +2. Takedowns on Dragon and Baron increase the magic damage of this item by +5. At 30 Stacks, your Devourer becomes Sated, granting extra on Hit effects.",
"name": "Enchantment: Devourer",
"group": "JungleItems"
},
"1410": {
"id": 1410,
"plaintext": "Grants Ability Power and periodically empowers your Spells",
"description": "<stats>+60 Ability Power<br>+7% Movement Speed<\/stats><br><br><unique>UNIQUE Passive - Echo:<\/unique> Gain charges upon moving or casting. At 100 charges, the next damaging spell hit expends all charges to deal 60 (+10% of Ability Power) bonus magic damage to up to 4 targets on hit.<br><br>This effect deals 250% damage to Large Monsters. Hitting a Large Monster with this effect will restore 18% of your missing Mana.",
"name": "Enchantment: Runic Echoes",
"group": "JungleItems"
},
"2009": {
"id": 2009,
"description": "<consumable>Click to Consume:<\/consumable> Restores 80 Health and 50 Mana over 10 seconds.",
"name": "Total Biscuit of Rejuvenation"
},
"3089": {
"id": 3089,
"plaintext": "Massively increases Ability Power",
"description": "<stats>+120 Ability Power <\/stats><br><br><unique>UNIQUE Passive:<\/unique> Increases Ability Power by 35%.",
"name": "Rabadon's Deathcap"
}
}
No, JSON does not provide a method for that, or any methods at all. JSON is just a format for representing data, nothing more.
As mentioned by others, JSON is a format and doesn't provide any API. Since you are using Python, what you can do is:
import json
my_data = json.loads(my_json)
print(my_data.keys())
I assume your ids are the same as the keys. Also, you won't need a set, since keys are already unique.
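To make that concrete, here is a minimal sketch that parses a trimmed-down stand-in for the item JSON above and extracts the ids as a set of integers (the keys come back as strings, so they need converting):

```python
import json

# Trimmed-down stand-in for the item JSON shown in the question.
my_json = """
{
  "1411": {"id": 1411, "name": "Enchantment: Devourer"},
  "1410": {"id": 1410, "name": "Enchantment: Runic Echoes"},
  "2009": {"id": 2009, "name": "Total Biscuit of Rejuvenation"},
  "3089": {"id": 3089, "name": "Rabadon's Deathcap"}
}
"""

my_data = json.loads(my_json)

# The top-level keys are strings like "1411"; convert them to ints.
ids = {int(k) for k in my_data}
print(sorted(ids))  # [1410, 1411, 2009, 3089]
```

Note that sets are unordered, so `sorted(ids)` is used for a stable printout.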
I'm trying to read a json file into a pandas dataframe:
df = pd.read_json('output.json',orient='index')
but I'm getting the error:
/usr/local/lib/python2.7/dist-packages/pandas/io/json.pyc
in read_json(path_or_buf, orient, typ, dtype, convert_axes,
convert_dates,keep_default_dates, numpy, precise_float, date_unit)
196 if exists:
197 with open(filepath_or_buffer, 'r') as fh:
--> 198 json = fh.read()
199 else:
200 json = filepath_or_buffer
MemoryError:
I've also tried reading it using gzip:
import gzip
import pandas as pd

def parse(path):
    g = gzip.open(path, 'rb')
    for l in g:
        yield eval(l)

def getDF(path):
    i = 0
    df = {}
    for d in parse(path):
        df[i] = d
        i += 1
        # if i == 10000: break  ## hack for local testing
    return pd.DataFrame.from_dict(df, orient='index')

pathname = './output.json.gz'
df = getDF(pathname)
But I get a segmentation fault. How can I read in a JSON file (or json.gz) that's this large?
The head of the json file looks like this:
{"reviewerID": "ARMDSTEI0Z7YW", "asin": "0077614992", "reviewerName": "dodo", "helpful": [0, 0], "unixReviewTime": 1360886400, "reviewText": "This book was a requirement for a college class. It was okay to use although it wasn't used much for my particular class", "overall": 5.0, "reviewTime": "02 15, 2013", "summary": "great"}
{"reviewerID": "A3FYN0SZYWN74", "asin": "0615208479", "reviewerName": "Marilyn Mitzel", "helpful": [0, 0], "unixReviewTime": 1228089600, "reviewText": "This is a great gift for anyone who wants to hang on to what they've got or get back what they've lost. I bought it for my 77 year old mom who had a stroke and myself.I'm 55 and like many of us at that age my memory started slipping. You know how it goes. Can't remember where I put my keys, can't remember names and forget about numbers. As a medical reporter I was researching the importance of exercising the brain. I heard about BrainAerobics and that it can help improve and even restore memory. I had nothing to lose, nor did mom so we tried it and were actually amazed how well it works.My memory improved pretty quickly. I used to have to write notes to myself about every thing. Not any more. I can remember my grocery list and errands without writing it all down. I can even remember phone numbers now. You have to keep doing it. Just like going to the gym for your body several times a week, you must do the same for your brain.But it's a lot of fun and gives you a new sense of confidence because you just feel a lot sharper. On top of your game so to speak.That's important in this competitive world today to keep up with the younger one's in the work force. As for mom, her stroke was over two years ago and we thought she would never regain any more brain power but her mind continues to improve. We've noticed a big difference in just the last few months since she has been doing the BrainAerobics program regularly. She's hooked on it and we are believers.Marilyn Mitzel/Aventura, FL", "overall": 5.0, "reviewTime": "12 1, 2008", "summary": "AMAZING HOW QUICKLY IT WORKS!"}
{"reviewerID": "A2J0WRZSAAHUAP", "asin": "0615269990", "reviewerName": "icu-rn", "helpful": [0, 0], "unixReviewTime": 1396742400, "reviewText": "Very helpful in learning about different disease processes and easy to understand. You do not have to be a med student to play. Also you can play alone or with several players", "overall": 5.0, "reviewTime": "04 6, 2014", "summary": "Must have"}