Why is NLU not picking up an entire text of data - python

It appears that NLU is not recognizing the full blob of text I am providing it. Am I doing something wrong in my code, or do I have misplaced assumptions about how the API should work? The response from the API is included below; it contains the analyzed text as well as the full submitted text, and there is a delta between the two that I can't explain.
Here's my code:
import os
import json
import requests

def nlu(text):
    print("Calling NLU")
    url = "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze?version=2017-02-27"
    data = {
        'text': text,
        'language': "en",
        'return_analyzed_text': True,
        'clean': True,
        'features': {
            'entities': {
                'emotion': True,
                'sentiment': True,
                'limit': 2
            },
            'concepts': {
                'limit': 15
            },
            'keywords': {
                'emotion': True,
                'sentiment': True,
                'limit': 2
            }
        }
    }
    headers = {
        'content-type': "application/json"
    }
    username = os.getenv("nlu-username")
    password = os.getenv("nlu-password")
    print("NLU", username, password)
    print("data", json.dumps(data))
    response = requests.request("POST", url, data=json.dumps(data), headers=headers, auth=(username, password))
    print("Done calling NLU")
    print(response.text)
Here's the relevant part of the response:
"keywords": [
{
"text": "anthropologists study skeletons",
"sentiment": {
"score": 0.0
},"analyzed_text": "move between two thousand eight and two thousand twelve archaeologists excavated the rubble of an ancient hospital in England in the process they uncovered a number of skeletons one in particular belong to a wealthy Mel who lived in the eleventh or twelfth century and died of leprosy between the ages of eighteen and twenty five how do we know all this simply by examining some old soil Kate bones even centuries after death skeletons carry unique features that tell us about their identities and using modern tools and techniques we can read those features as clues this is a branch of science known as biological anthropology it allows researchers to piece together details about Incheon individuals and identify historical events that affected whole populations when researchers uncover a skeleton some of the first clues they gather like age and gender line its morphology which is the structure appearance and size of a skeleton mostly the clavicle stop growing at age twenty five so a skeleton with the clavicle that hasn't fully formed must be younger than similarly the plates in the cranium can continue fusing up to age forty and sometimes beyond by combining these with some microscopic skeletal clues physical anthropologists can estimate an approximate age of death meanwhile pelvic bones reveal gender biologically female palaces are wider allowing women to give birth whereas males are narrower those also betrayed the signs of aging disease disorders like anemia leave their traces on the bones and the condition of teeth can reveal clues to factors like diet and malnutrition which sometimes correlate with wealth or poverty a protein called collagen can give us even more profound details the air we breathe water we drink and food we eat leaves permanent traces in our bones and teeth in the form of chemical compounds these compounds contain measurable quantities called isotopes stable isotopes in bone collagen and tooth enamel varies among mammals 
dependent on where they lived and what they eat so but analyzing these isotopes we can draw direct inferences regarding the diet and location of historic people not only that but during life bones undergo a constant cycle of remodeling so if someone moves from one place to another bones synthesized after that move will also reflect the new isotopic signatures of the surrounding environment that means that skeletons can be used like migratory maps for instance between one and six fifty A. D. the great city of TOT Makana Mexico bustled with thousands of people researchers examined the isotope ratios and skeletons to the now which held details of their diets when they were young they found evidence for significant migration into the city a majority of the individuals were born elsewhere with further geological and skeletal analysis they may be able to map where those people came from that work in tier two Akon is also an example of how bio anthropologists study skeletons in cemeteries and mass graves and analyze their similarities and differences from not information they can learn about cultural beliefs social norms wars and what caused their deaths today we use these tools to answer big questions about how forces like migration and disease shape the modern world DNA analysis is even possible in some relatively well preserved ancient remains that's helping us understand how diseases like tuberculosis have evolved over the centuries so we can build better treatments for people today ocean skeletons can tell us a surprisingly great deal about the past two of your remains are someday buried intact what might archaeologists of the distant future learn from them"

I just tried NLU with your text and I am getting a proper response; check the result below. I suggest you try the Watson API Explorer first with your service credentials. It will also help you catch any misplaced headers or missing params in the API call.
Note: just remove "metadata": {} from the parameters object before making the POST call, as it only applies to URL and HTML input.
{
"semantic_roles": [{
"subject": {
"text": "anthropologists"
},
"sentence": "anthropologists study skeletons",
"object": {
"text": "skeletons"
},
"action": {
"verb": {
"text": "study",
"tense": "present"
},
"text": "study",
"normalized": "study"
}
}],
"language": "en",
"keywords": [{
"text": "anthropologists",
"relevance": 0.966464
},
{
"text": "skeletons",
"relevance": 0.896147
}
],
"entities": [],
"concepts": [{
"text": "Cultural studies",
"relevance": 0.86926,
"dbpedia_resource": "http://dbpedia.org/resource/Cultural_studies"
}],
"categories": [{
"score": 0.927751,
"label": "/science/social science/anthropology"
},
{
"score": 0.219365,
"label": "/education/homework and study tips"
},
{
"score": 0.128377,
"label": "/science"
}
],
"warnings": [
"emotion: cannot locate keyphrase",
"relations: Not Found",
"sentiment: cannot locate keyphrase"
]
}

In your code you have

data=json.dumps(data)

which serializes the payload yourself. That works as long as you also set the Content-Type header, but note that changing it to a bare data=data would make requests form-encode the dict instead of sending JSON. The cleaner option is the json parameter, which serializes the dict and sets the header for you:

response = requests.request("POST", url, json=data, auth=(username, password))
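One thing to note about requests here: passing a plain dict to data= form-encodes it rather than sending JSON, while the json= parameter serializes the dict and sets the Content-Type header automatically. A minimal sketch, using requests.Request(...).prepare() so nothing is actually sent (the URL is a placeholder):

```python
import json
import requests

payload = {'text': 'some long text', 'features': {'keywords': {'limit': 2}}}

# json= serializes the dict and sets Content-Type: application/json for you
req = requests.Request('POST', 'https://example.invalid/analyze', json=payload).prepare()

print(req.headers['Content-Type'])      # application/json
print(json.loads(req.body) == payload)  # True — the body round-trips intact
```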
Also, I would recommend using the Python WDC SDK, as it will make things easier for you. The same example as above:
import os
import json
from watson_developer_cloud import NaturalLanguageUnderstandingV1
import watson_developer_cloud.natural_language_understanding.features.v1 as Features

username = os.getenv("nlu-username")
password = os.getenv("nlu-password")

nluv1 = NaturalLanguageUnderstandingV1(
    username=username,
    password=password)

features = [
    Features.Entities(),
    Features.Concepts(),
    Features.Keywords()
]

def nlu(text):
    print('Calling NLU')
    response = nluv1.analyze(text, features=features, language='en')
    print('Done calling NLU')
    print(json.dumps(response, indent=2))

Related

Python Check if Key/Value exists in JSON output

I have a JSON output and I want to create an IF statement so that if it contains the value I am looking for, I do something, ELSE I do something else.
JSON Blob 1
[
{
"domain":"www.slatergordon.co.uk",
"displayed_link":"https://www.slatergordon.co.uk/",
"description":"We Work With Thousands Of People Across The UK In All Areas Of Personal Legal Services. Regardless Of How You Have Been Injured Through Negligence We're Here To Help You. Personal Injury Experts.",
"position":1,
"block_position":"top",
"title":"Car Claims Solicitors - No Win No Fee Solicitors - SlaterGordon.co.uk",
"link":"https://www.slatergordon.co.uk/personal-injury-claim/road-traffic-accidents-solicitors/",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABABGgJwdg&sig=AOD64_3u1ct0jmXAnvemxFHh_tfK5UK8Xg&q&adurl"
},
{
"is_phone_ad":true,
"phone_number":"0333 358 0496",
"domain":"www.accident-claimsline.co.uk",
"displayed_link":"http://www.accident-claimsline.co.uk/",
"description":"Car Insurance Claims Advice - Car Accident Claims Helpline",
"sitelinks":[
{
"title":"Replacement Vehicle Hire",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABALGgJwdg&ae=2&sig=AOD64_20YjAoyMY_c6XVTnBU1vQAD2tDTA&q=&ved=2ahUKEwjvlM-djLDwAhVmJzQIHSZHDLEQvrcBegQIBRAM&adurl="
},
{
"title":"Request a Call Back",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABAOGgJwdg&ae=2&sig=AOD64_36-Pd831AXrPbh1yvUyTbhXH2irg&q=&ved=2ahUKEwjvlM-djLDwAhVmJzQIHSZHDLEQvrcBegQIBRAN&adurl="
}
],
"position":6,
"block_position":"bottom",
"title":"Car Insurance Claims Advice - Car Accident Claims Helpline",
"link":"http://www.accident-claimsline.co.uk/",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABAGGgJwdg&ae=2&sig=AOD64_09pMtWxFo9s8c1dL16NJo5ThOlrg&q&adurl"
}
]
JSON Blob 2
[
{
"domain":"www.slatergordon.co.uk",
"displayed_link":"https://www.slatergordon.co.uk/",
"description":"We Work With Thousands Of People Across The UK In All Areas Of Personal Legal Services. Regardless Of How You Have Been Injured Through Negligence We're Here To Help You. Personal Injury Experts.",
"position":1,
"block_position":"top",
"title":"Car Claims Solicitors - No Win No Fee Solicitors - SlaterGordon.co.uk",
"link":"https://www.slatergordon.co.uk/personal-injury-claim/road-traffic-accidents-solicitors/",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABABGgJwdg&sig=AOD64_3u1ct0jmXAnvemxFHh_tfK5UK8Xg&q&adurl"
},
{
"is_phone_ad":true,
"phone_number":"0333 358 0496",
"domain":"www.accident-claimsline.co.uk",
"displayed_link":"http://www.accident-claimsline.co.uk/",
"description":"Car Insurance Claims Advice - Car Accident Claims Helpline",
"sitelinks":[
{
"title":"Replacement Vehicle Hire",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABALGgJwdg&ae=2&sig=AOD64_20YjAoyMY_c6XVTnBU1vQAD2tDTA&q=&ved=2ahUKEwjvlM-djLDwAhVmJzQIHSZHDLEQvrcBegQIBRAM&adurl="
},
{
"title":"Request a Call Back",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABAOGgJwdg&ae=2&sig=AOD64_36-Pd831AXrPbh1yvUyTbhXH2irg&q=&ved=2ahUKEwjvlM-djLDwAhVmJzQIHSZHDLEQvrcBegQIBRAN&adurl="
}
],
"position":6,
"block_position":"top",
"title":"Car Insurance Claims Advice - Car Accident Claims Helpline",
"link":"http://www.accident-claimsline.co.uk/",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABAGGgJwdg&ae=2&sig=AOD64_09pMtWxFo9s8c1dL16NJo5ThOlrg&q&adurl"
}
]
Desired Output
if "block_position":"bottom" in JSONBlob:
do something
else:
do something else
but I can't seem to get it to trigger for me. I want it to search through the entire output and, if it contains that key/value, do something, and if it doesn't contain it, do something else.
Blob 1 would go down the IF path
Blob 2 would go down the else path
The main problem you have here is that the JSON output is a list/array with two objects inside. Since the block_position key can appear in any of the inner objects, you could do something like this:

if any(obj.get('block_position') == 'bottom' for obj in JSONBlob):
    print('I do something')
else:
    print('I do something else')
EDIT 1: OK, I think I got your point. You only need to do something for each object with block_position set to bottom. Then the following should do it:
for obj in JSONBlob:
    if obj.get('block_position') == 'bottom':
        print('I do something with the object')
    else:
        print('I do something else with the object')
EDIT 2: As discussed above, if you only want to act on the objects with block_position set to bottom, you can drop the else clause:

for obj in JSONBlob:
    if obj.get('block_position') == 'bottom':
        print('I do something with the object')
You can use the JMESPath library. It's a query language for JSON.
A basic jmespath expression for your case would be [?block_position=='bottom'] (note the quotes around the string literal). This will filter out the specific nodes for you.
I tried it online with the data you provided.
If you are looking for a more deeply nested node, you only have to alter your expression to search for that specific node.

Regex for specific Pattern

I have this text:
What can cause skidding on bends?
All of the following can:
SA Faulty shock-absorbers
SA Insufficient or uneven tyre pressure
[| Load is too small
What can cause a dangerous situation?
SA Brakes which engage heavily on one side
SA Too much steering-wheel play
[| Disturbed reception of traffic information on the radio
It starts raining. Why must you immediately increase the safe distance?
What is correct
[| Because the brakes react more quickly
SA Because a greasy film may form which increases the braking distance
SA Because a second greasy film may form which increases the braking distance
What is the text about?
Above are multiple choice questions, each with multiple options.
The question stem almost always ends with '?', but sometimes there is additional text before the options start.
All options start with either 'SA' or '[|'; options starting with 'SA' are correct, and options starting with '[|' or '[]' are wrong.
What I want to Do
I want to split out the questions and all their options and save them into a python dictionary/list, ideally as key-value pairs
{'ques': 'blalal','opt1':'this is option one', 'option2': 'this is option two'} and so on
What have I tried?
rx = r'.*\?$\s*\w*(?:SA|\[\|)'
Here is the Regex101 link.
Assuming you have three options at all times:
p = r'(?m)^(?P<ques>\w[^?]*\?)[\s\S]*?^(?P<opt1>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt2>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt3>(?:SA|\[(?:\||\s])).*)'
dt = [x.groupdict() for x in re.finditer(p, string)]
See regex proof and Python proof.
Results:
[{'ques': 'What can cause skidding on bends?', 'opt1': 'SA Faulty shock-absorbers', 'opt2': 'SA Insufficient or uneven tyre pressure', 'opt3': '[| Load is too small'}, {'ques': 'What can cause a dangerous situation?', 'opt1': 'SA Brakes which engage heavily on one side', 'opt2': 'SA Too much steering-wheel play', 'opt3': '[| Disturbed reception of traffic information on the radio'}, {'ques': 'It starts raining. Why must you immediately increase the safe distance?', 'opt1': '[| Because the brakes react more quickly', 'opt2': 'SA Because a greasy film may form which increases the braking distance', 'opt3': 'SA Because a second greasy film may form which increases the braking distance'}]
This is one of the cases that I would recommend not using regex since it can get very complex very fast. My solution would be the following parser:
def parse(fname="/tmp/data.txt"):
    questions = []
    with open(fname) as f:
        for line in f:
            lstrip = line.strip()
            # Skip empty lines
            if not lstrip:
                continue
            # Check for Questions
            is_option = (
                lstrip.startswith("[]")
                or lstrip.startswith("[|")
                or lstrip.startswith("SA")
            )
            if not is_option:
                # Here we know that this line is not empty and is not
                # an option... We have two options:
                # 1. This is continuation of the last question
                # 2. This is a new question
                if not questions or questions[-1]["options"]:
                    # Last question has options, this is a new question!
                    questions.append({
                        "ques": [lstrip],
                        "options": []
                    })
                else:
                    # We are still parsing the question part. Add a new line
                    questions[-1]["ques"].append(lstrip)
                # We are done with the question part, move on
                continue
            # We are only here if we are parsing options!
            is_correct = lstrip.startswith("SA")
            # We _must_ have at least one question
            assert questions
            # Add the option
            questions[-1]["options"].append({
                "option": lstrip,
                "correct": is_correct,
                "number": len(questions[-1]["options"]) + 1,
            })
    # End of with
    return questions
An example usage of the above and its output:
# main
data = parse()
# json just for pretty printing
import json
print(json.dumps(data, indent=4))
---
$ python3 ~/tmp/so.py
[
{
"ques": [
"What can cause skidding on bends?",
"All of the following can:"
],
"options": [
{
"option": "SA Faulty shock-absorbers",
"correct": true,
"number": 1
},
{
"option": "SA Insufficient or uneven tyre pressure",
"correct": true,
"number": 2
},
{
"option": "[| Load is too small",
"correct": false,
"number": 3
}
]
},
{
"ques": [
"What can cause a dangerous situation?"
],
"options": [
{
"option": "SA Brakes which engage heavily on one side",
"correct": true,
"number": 1
},
{
"option": "SA Too much steering-wheel play",
"correct": true,
"number": 2
},
{
"option": "[| Disturbed reception of traffic information on the radio",
"correct": false,
"number": 3
}
]
},
{
"ques": [
"It starts raining. Why must you immediately increase the safe distance?",
"What is correct"
],
"options": [
{
"option": "[| Because the brakes react more quickly",
"correct": false,
"number": 1
},
{
"option": "SA Because a greasy film may form which increases the braking distance",
"correct": true,
"number": 2
},
{
"option": "SA Because a second greasy film may form which increases the braking distance",
"correct": true,
"number": 3
}
]
}
]
There are a few advantages to using a custom parser instead of regex:
A lot more readable (think of what you would like to read when you come back to this project in 6 months :) )
More control over which lines you keep and how you trim them
Easier to deal with bad input data (debug it using logging)
That said, data are rarely perfect, and in most cases a few workarounds may be required to get the desired output. For example, in your original data, "All of the following can:" does not look like an option, since it does not start with any of the option prefixes. However, it also does not look to me like part of the question! You will have to deal with such cases based on your dataset (and doing so in regex would be a lot harder). In this particular case you can:
Only treat as part of the question anything that ends with ? (problematic for the 3rd question)
Treat lines starting with "None" or "All" as options
etc.
The exact solution depends on your data quality and cases, but the code above should be easy to adjust in most cases.
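For instance, the second workaround (treating "None"/"All" lines as options) is a one-line change to the parser's option check, since str.startswith accepts a tuple of prefixes. A hypothetical sketch:

```python
# Hypothetical extension of the parser's option check; the extra
# "All"/"None" prefixes are a workaround for lines like the sample's
# "All of the following can:"
OPTION_PREFIXES = ("[]", "[|", "SA", "All", "None")

def is_option(line: str) -> bool:
    return line.strip().startswith(OPTION_PREFIXES)

print(is_option("All of the following can:"))          # True
print(is_option("SA Faulty shock-absorbers"))          # True
print(is_option("What can cause skidding on bends?"))  # False
```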

Compare json objects with csv file

Edit: So far my code is finding the matches. I am working on appending the JSON object's data to the row where the word match occurs.
I'm trying to find the matching words between my JSON file and my CSV, then check where that word has a low rating (the column with decimal values) in the CSV.
If the word has a low rating, I record the time of the word and the index of the word (edited). Is there a way for me to use something like pandas to loop over all my JSON objects and append the objects' data when words match in the rightmost column of my CSV?
Edit (per the answers given below):

row, col = dfSynsets.shape
for value in contents['words']:
    current_word = value['word']
    for csv_row in range(row):
        curr_csv_word = dfSynsets.loc[csv_row][-1]
        if curr_csv_word == current_word:
            print(curr_csv_word)
            print(current_word)
This code block produces this output:
universe
universe
in
in
apparent
apparent
mention
mention
passing
passing
way
way
even
even
over
over
there
there
total
total
experiment
experiment
most
most
work
work
by
by
low
low
empty
empty
in
in
fill
fill
Here's an example of my json file
Json File:
{
"transcript": "The universe is bustling with matter and energy. Even in the vast apparent emptiness of intergalactic space, there's one hydrogen atom per cubic meter. That's not the mention a barrage of particles and electromagnetic radiation passing every which way from stars, galaxies, and into black holes. There's even radiation left over from the Big Bang. So is there such thing as a total absence of everything? This isn't just a thought experiment. Empty spaces, or vacuums, are incredibly useful. Inside our homes, most vacuum cleaners work by using a fan to create a low-pressure relatively empty area that sucks matter in to fill the void. But that's far from empty. There's still plenty of matter bouncing around. Manufacturers rely on more thorough, sealed vacuums for all sorts of purposes. That includes vacuum-packed food that stays fresh longer, and the vacuums inside early light bulbs that protected filaments from degrading. These vacuums are generally created with some version of what a vacuum cleaner does using high-powered pumps that create enough suction to remove as many stray atoms as possible. But the best of these industrial processes tends to leave hundreds of millions of atoms per cubic centimeter of space. That isn't empty enough for scientists who work on experiments, like the Large Hadron Collider, where particle beams need to circulate at close to the speed of light for up to ten hours without hitting any stray atoms. So how do they create a vacuum? The LHC's pipes are made of materials, like stainless steel, that don't release any of their own molecules and are lined with a special coating to absorb stray gases. Raising the temperature to 200 degrees Celsius burns off any moisture, and hundreds of vacuum pumps take two weeks to trap enough gas and debris out of the pipes for the collider's incredibly sensitive experiments. Even with all this, the Large Hadron Collider isn't a perfect vacuum. 
In the emptiest places, there are still about 100,000 particles per cubic centimeter. But let's say an experiment like that could somehow get every last atom out. There's still an unfathomably huge amount of radiation all around us that can pass right through the walls. Every second, about 50 muons from cosmic rays, 10 million neutrinos coming directly from the Big Bang, 30 million photons from the cosmic microwave background, and 300 trillion neutrinos from the Sun pass through your body. It is possible to shield vacuum chambers with substances, including water, that absorb and reflect this radiation, except for neutrinos. Let's say you've somehow removed all of the atoms and blocked all of the radiation. Is the space now totally empty? Actually, no. All space is filled with what physicists call quantum fields. What we think of as subatomic particles, electrons and photons and their relatives, are actually vibrations in a quantum fabric that extends throughout the universe. And because of a physical law called the Heisenberg Principle, these fields never stop oscillating, even without any particles to set off the ripples. They always have some minimum fluctuation called a vacuum fluctuation. This means they have energy, a huge amount of it. Because Einstein's equations tell us that mass and energy are equivalent, the quantum fluctuations in every cubic meter of space have an energy that corresponds to a mass of about four protons. In other words, the seemingly empty space inside your vacuum would actually weigh a small amount. Quantum fluctuations have existed since the earliest moments of the universe. In the moments after the Big Bang, as the universe expanded, they were amplified and stretched out to cosmic scales. Cosmologists believe that these original quantum fluctuations were the seeds of everything we see today: galaxies and the entire large scale structure of the universe, as well as planets and solar systems. 
They're also the center of one of the greatest scientific mysteries of our time because according to the current theories, the quantum fluctuations in the vacuum of space ought to have 120 orders of magnitude more energy than we observe. Solving the mystery of that missing energy may entirely rewrite our understanding of physics and the universe. ",
"words": [
{
"alignedWord": "the",
"end": 6.31,
"start": 6.17,
"word": "The"
},
{
"alignedWord": "universe",
"end": 6.83,
"start": 6.31,
"word": "universe"
},
{
"alignedWord": "is",
"end": 7.05,
"start": 6.85,
"word": "is"
},
{
"alignedWord": "bustling",
"end": 7.4799999999999995,
"start": 7.05,
"word": "bustling"
},
{
"alignedWord": "with",
"end": 7.65,
"start": 7.48,
"word": "with"
},
{
"alignedWord": "matter",
"end": 7.970000000000001,
"start": 7.65,
"word": "matter"
},
{
"alignedWord": "and",
"end": 8.09,
"start": 7.97,
"word": "and"
},
{
"alignedWord": "energy",
"end": 8.579999,
"start": 8.099999,
"word": "energy"
},
{
"alignedWord": "even",
"end": 9.35,
"start": 9.08,
"word": "Even"
},
{
"alignedWord": "in",
"end": 9.439999,
"start": 9.349999,
"word": "in"
},
{
"alignedWord": "the",
"end": 9.53,
"start": 9.44,
"word": "the"
},
{
"alignedWord": "vast",
"end": 9.84,
"start": 9.53,
"word": "vast"
},
{
"alignedWord": "apparent",
"end": 10.17,
"start": 9.84,
"word": "apparent"
},
{
"alignedWord": "emptiness",
"end": 10.67,
"start": 10.19,
"word": "emptiness"
},
{
"alignedWord": "of",
"end": 10.8,
"start": 10.67,
"word": "of"
}
]
}
Here's my csv file
CSV File:
572714 0.0 ['knocked out', 'kayoed', '"KOd"', 'out', 'stunned'] "KOd"
0 1771194 0.500000 ['get', '"get under ones skin"'] "get under ones skin"
1 462301 0.125000 ['south-southwest', '"sou-sou-west"'] "sou-sou-west"
2 250898 0.500000 ['between', '"tween"'] "tween"
3 2203763 0.400000 ['thirteenth', '13th'] 13th
4 2202047 0.333333 ['first', '1st'] 1st
... ... ... ... ...
5552 1848465 0.000000 ['move over', 'give way', 'give', 'ease up', '... yield
5553 7176243 0.000000 ['concession', 'conceding', 'yielding'] yielding
5554 14425853 0.000000 ['youth'] youth
5555 8541841 0.250000 ['zone', 'geographical zone'] zone
5556 1943718 0.500000 ['soar', 'soar up', 'soar upwards', 'surge', '... zoom
Example of desired output
col1:synset col2:rating col3:list col4:word col5:json data
9466280 0.5 ['universe', 'existence', 'creation', 'world', 'cosmos', 'macrocosm'] macrocosm
{
"alignedWord": "universe",
"end": 178.109999,
"start": 177.599999,
"word": "universe"
},
As per your question, I understand that you want to traverse the JSON file, retrieve the value of the 'word' key, and compare that value with the last column of the CSV file. If both words are the same, print "EQUAL", otherwise "NOT EQUAL".
If this is correct, then try the below approach:
import pandas as pd

df = pd.read_csv(CSV FILE NAME)
row, col = df.shape
for value in contents['words']:
    current_word = value['word']
    for csv_row in range(row):
        curr_csv_word = df.loc[csv_row][-1]
        if curr_csv_word == current_word:
            print("EQUAL")
        else:
            print("NOT EQUAL")
I hope you find your answer.
First define a mapping function:

import json
import pandas

def apply_fun(row):
    for value in contents['words']:
        if value['word'] in row['word']:
            return json.dumps(value)
    return ""

Then add it to your dataframe:

x = dfSynsets.apply(lambda row: apply_fun(row), axis=1)
dfSynsets.insert(4, 'json_ref', x)
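A runnable miniature of this approach, with invented sample data standing in for the question's JSON and CSV (names and values are illustrative only):

```python
import json
import pandas as pd

# Invented miniature stand-ins for the question's data
contents = {'words': [
    {'alignedWord': 'universe', 'end': 6.83, 'start': 6.31, 'word': 'universe'},
    {'alignedWord': 'matter', 'end': 7.97, 'start': 7.65, 'word': 'matter'},
]}
dfSynsets = pd.DataFrame({
    'synset': [9466280, 14425853],
    'rating': [0.5, 0.0],
    'word': ['universe', 'youth'],
})

def apply_fun(row):
    # Return the first matching JSON object as a string, or "" if none matches
    for value in contents['words']:
        if value['word'] in row['word']:
            return json.dumps(value)
    return ""

# Append the matches as a new rightmost column (position 3 in this 3-column frame)
dfSynsets.insert(3, 'json_ref', dfSynsets.apply(apply_fun, axis=1))
print(dfSynsets)
```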

JSON listing all identities in a set

I have recently started working with JSON. The code below shows a snippet of what I'm working with. In this example, I want to extract the set {1411, 1410, 2009, 3089}. Does JSON provide a method for this, or do I need to create it myself?
In case it is relevant, I'm working with Python.
{
"1411": {
"id": 1411,
"plaintext": "Increases Attack Speed, and gives increasing power as you kill Jungle Monsters and Champions",
"description": "<stats>+40% Attack Speed<br>+30 Magic Damage on Hit<\/stats><br><br><unique>UNIQUE Passive - Devouring Spirit:<\/unique> Takedowns on large monsters and Champions increase the magic damage of this item by +1. Takedowns on Rift Scuttlers and Rift Herald increase the magic damage of this item by +2. Takedowns on Dragon and Baron increase the magic damage of this item by +5. At 30 Stacks, your Devourer becomes Sated, granting extra on Hit effects.",
"name": "Enchantment: Devourer",
"group": "JungleItems"
},
"1410": {
"id": 1410,
"plaintext": "Grants Ability Power and periodically empowers your Spells",
"description": "<stats>+60 Ability Power<br>+7% Movement Speed<\/stats><br><br><unique>UNIQUE Passive - Echo:<\/unique> Gain charges upon moving or casting. At 100 charges, the next damaging spell hit expends all charges to deal 60 (+10% of Ability Power) bonus magic damage to up to 4 targets on hit.<br><br>This effect deals 250% damage to Large Monsters. Hitting a Large Monster with this effect will restore 18% of your missing Mana.",
"name": "Enchantment: Runic Echoes",
"group": "JungleItems"
},
"2009": {
"id": 2009,
"description": "<consumable>Click to Consume:<\/consumable> Restores 80 Health and 50 Mana over 10 seconds.",
"name": "Total Biscuit of Rejuvenation"
},
"3089": {
"id": 3089,
"plaintext": "Massively increases Ability Power",
"description": "<stats>+120 Ability Power <\/stats><br><br><unique>UNIQUE Passive:<\/unique> Increases Ability Power by 35%.",
"name": "Rabadon's Deathcap"
}
}
No, JSON does not provide a method for that or any methods for anything at all. JSON is just a format for representing data, nothing more.
As mentioned by others, JSON is a format and doesn't provide any API. Since you are using python, what you can do is
import json

my_data = json.loads(my_json)
print(my_data.keys())

I assume your ids are the same as the keys. Also, you won't need to build a set yourself, as keys are already unique.
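Putting that together, a short sketch against a trimmed version of the snippet above (descriptions dropped for brevity):

```python
import json

my_json = '''{
  "1411": {"id": 1411, "name": "Enchantment: Devourer"},
  "1410": {"id": 1410, "name": "Enchantment: Runic Echoes"},
  "2009": {"id": 2009, "name": "Total Biscuit of Rejuvenation"},
  "3089": {"id": 3089, "name": "Rabadon's Deathcap"}
}'''

my_data = json.loads(my_json)

# JSON keys are always strings; convert if you want integer ids
ids = {int(k) for k in my_data}
print(ids)  # e.g. {1410, 1411, 2009, 3089} — set ordering is arbitrary

# Equivalent, pulling from the nested "id" fields instead of the keys
assert ids == {v['id'] for v in my_data.values()}
```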

Json file to dictionary

I am using the yelp dataset and I want to parse the review json file into a dictionary. I tried loading it into a pandas DataFrame and then creating the dictionary, but because the file is so big it is time consuming. I want to keep only the user_id and stars values. A line of the json file looks like this:
{
"votes": {
"funny": 0, "useful": 2, "cool": 1},
"user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
"review_id": "15SdjuK7DmYqUAj6rjGowg", "stars": 5, "date": "2007-05-17",
"text": "dr. goldberg offers everything i look for in a general practitioner. he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first. really, what more do you need? i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank.",
"type": "review", "business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}
How can I iterate over every 'field' (for lack of a better word)? So far I can only iterate over each line.
EDIT
As requested, the pandas code:
Reading the json:

import json
import pandas as pd

with open('yelp_academic_dataset_review.json') as f:
    df = pd.DataFrame(json.loads(line) for line in f)

Creating the dictionary (renamed from dict, which shadows the built-in):

ratings = {}
for i, row in df.iterrows():
    business_id = row['business_id']
    user_id = row['user_id']
    rating = row['stars']
    key = (business_id, user_id)
    ratings[key] = rating
You don't need to read this into a DataFrame. json.load() returns a dictionary. For example:
sample.json
{
"votes": {
"funny": 0,
"useful": 2,
"cool": 1
},
"user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
"review_id": "15SdjuK7DmYqUAj6rjGowg",
"stars": 5,
"date": "2007-05-17",
"text": "dr. goldberg offers everything i look for in a general practitioner. he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first. really, what more do you need? i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank.",
"type": "review",
"business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}
read_json.py
import json

with open('sample.json', 'r') as fh:
    result_dict = json.load(fh)

print(result_dict['user_id'])
print(result_dict['stars'])
output
Xqd0DzHaiyRqVH3WRG7hzg
5
With that output you can easily create a DataFrame.
There are several good discussions about parsing JSON as a stream on SO, but the gist is that it's not possible natively, although some tools attempt it.
In the interest of keeping your code simple and with minimal dependencies, you might check whether reading the JSON directly into a dictionary is a sufficient improvement.
