Extracting value data from multiple JSON strings in a single file - python

I know I am missing the obvious here, but I have the following Python code in which I am trying to:
1. Take a specified JSON file containing multiple strings as an input.
2. Start at line 1 and look for the key value of "content_text".
3. Add the key value to a new dictionary and write said dictionary to a new file.
4. Repeat 1-3 on additional JSON files.
import json
def OpenJsonFileAndPullData (JsonFileName, JsonOutputFileName):
    output_file=open(JsonOutputFileName, 'w')
    result = []
    with open(JsonFileName, 'r') as InputFile:
        for line in InputFile:
            Item=json.loads(line)
            my_dict={}
            print item
            my_dict['Post Content']=item.get('content_text')
            my_dict['Type of Post']=item.get('content_type')
            print my_dict
            result.append(my_dict)
    json.dumps(result, output_file)
OpenJsonFileAndPullData ('MyInput.json', 'MyOutput.txt')
However, when run I receive this error:
AttributeError: 'str' object has no attribute 'get'

Python is case-sensitive.
Item = json.loads(line) # variable "Item"
my_dict['Post Content'] = item.get('content_text') # another variable "item"
By the way, why don't you load the whole file as JSON at once?
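For reference, here is a minimal Python 3 sketch of the function with the variable name used consistently. It assumes the input holds one JSON object per line; the name pull_post_data and the blank-line handling are my own additions, not part of the question. Note also that json.dumps() only returns a string, so json.dump() is the call that actually writes to the file.

```python
import json

def pull_post_data(input_path, output_path):
    """Read one JSON object per line and write the selected fields as a JSON list."""
    result = []
    with open(input_path, "r", encoding="utf-8") as input_file:
        for line in input_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines (assumption: they carry no data)
            item = json.loads(line)  # same name, same case, everywhere below
            result.append({
                "Post Content": item.get("content_text"),
                "Type of Post": item.get("content_type"),
            })
    # json.dump() writes to the file object; json.dumps() would only return a string
    with open(output_path, "w", encoding="utf-8") as output_file:
        json.dump(result, output_file)
    return result
```

Calling pull_post_data('MyInput.json', 'MyOutput.txt') once per input file covers the "repeat on additional files" step.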

Related

Deleting specific JSON lines while iterating through key in Python

I have a large JSON file that contains image annotation data. I am iterating through one of its keys, as shown below:
import json
# Opening JSON file
f = open('annotations.json')
# returns JSON object as
# a dictionary
data = json.load(f)
# Iterating through the json
# list
for i in data['annotations']:
    if i['segmentation'] == [[]]:
        print(i['segmentation'])
        del i
        #print(i['segmentation'])
# Closing file
f.close()
Printing the returned dictionaries, they look like this:
{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":339,"area":0}
I am trying to remove lines like the one above from the annotations key, i.e. those that contain no segmentation data. I am able to extract these lines; I am just not sure how to remove them without breaking the format of the file.
{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":339,"area":0}
,{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":340,"area":0}
,{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":341,"area":0}
,{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":342,"area":0},
...
Here is what finally got it working for me:
import json
# Opening JSON file
f = open('annotations.json')
# returns JSON object as
# a dictionary
data = json.load(f)
# Closing file
f.close()
# Iterating through the list backwards by index, so that
# pop() never shifts the entries that are still to be visited
for index in range(len(data['annotations']) - 1, -1, -1):
    entry = data['annotations'][index]
    if entry['segmentation'] == [[]] or entry['bbox'] == []:
        print(entry['segmentation'])
        data["annotations"].pop(index)
with open("newannotations.json", "w") as json_file:
    json.dump(data, json_file)
The function json.load() returns Python objects (here a dictionary whose "annotations" key holds a list of dictionaries), which you can then modify as you'd like. Similarly, json.dump() can be used to write a JSON file from a Python object.
In order to remove an entry from a list, you can use the list remove() or pop() method. Assuming in the above you want to delete each entry i (as per the del i) for which i["segmentation"] == [[]], one could do it approximately as follows:
import json
# Opening JSON file
f = open('annotations.json')
# returns JSON object as
# a dictionary
data = json.load(f)
# Closing file
f.close()
# Iterating through a copy of the list, so that removing
# entries from the original does not disturb the loop
for entry in list(data['annotations']):
    if entry['segmentation'] == [[]]:
        print(entry['segmentation'])
        data["annotations"].remove(entry)
with open("newannotations.json", "w") as json_file:
    json.dump(data, json_file)
Is this what you wanted to do?
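Another option, if you would rather not mutate the list while looping at all, is to build a filtered copy. This is only a sketch under the assumption that data["annotations"] is a list of dictionaries shaped like the ones printed in the question; the helper name drop_empty_annotations is made up for illustration.

```python
def drop_empty_annotations(data):
    """Return a copy of `data` whose annotations all have segmentation and bbox data."""
    kept = [
        ann for ann in data["annotations"]
        # keep an entry only if neither field is empty
        if ann["segmentation"] != [[]] and ann["bbox"] != []
    ]
    # every other top-level key is carried over unchanged
    return {**data, "annotations": kept}
```

The result can be passed to json.dump() as before. Because nothing is removed from the list being iterated, consecutive empty entries cannot be skipped.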

Accessing items in a dump of dictionary objects in Python

I have a strange dataset from our customer. It is a .json file, but inside it looks like this:
{"a":"aaa","b":"bbb","text":"hello"}
{"a":"aaa","b":"bbb","text":"hi"}
{"a":"aaa","b":"bbb","text":"hihi"}
As you notice, this is just a dump of dictionary objects. It is neither a list (no [] and no comma separator between objects) nor proper JSON, although the file extension is .json. So I am really confused about how to read this file.
All I care about is reading all the text keys from each of the dictionary objects.
This "strange dataset" is actually an existing format that builds upon JSON, called JSONL (JSON Lines).
As @user655321 said, you can parse each line. Here's a more complete example that collects the complete dataset in a list of dicts named dataset:
import json
dataset = []
with open("my_file.json") as file:
    for line in file:
        dataset.append(json.loads(line))
In [51]: [json.loads(i)["text"] for i in open("file.json").readlines()]
Out[51]: ['hello', 'hi', 'hihi']
Use a list comprehension; it's easier.
You can read it line by line, convert each line to a JSON object, and extract the data you need (text in your case).
You can do something as follows:
import json
lines = open("file.txt").readlines()
for line in lines:
    dictionary = json.loads(line)
    print(dictionary["text"])
Since it's not a single JSON document, you can read the input line by line and deserialize each line independently:
import json
with open('my_file.json') as fh:
    for line in fh:
        json_obj = json.loads(line)
        keys = json_obj.keys() # eg, 'a', 'b', 'text'
        text_val = json_obj['text'] # eg, 'hello', 'hi', or 'hihi'
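If the file may contain blank lines or the occasional malformed line, a slightly more defensive variant could look like the sketch below. The function name iter_text_values and the choice to raise ValueError with the line number are my own assumptions, not something from the question.

```python
import json

def iter_text_values(path, key="text"):
    """Yield the value stored under `key` for each JSON object in a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line_number, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
            # only dictionaries can carry the key we are looking for
            if isinstance(record, dict) and key in record:
                yield record[key]
```

Being a generator, it reads the file lazily; wrap the call in list(...) if you want all the values at once.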
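How about splitting the content by \n and then using json to load each dictionary? Something like:
import json
# your_file is the path to the input file
with open(your_file) as f:
    data = f.read()
my_dicts = []
# splitlines() splits on line breaks only; split() would also split on spaces inside values
for line in data.splitlines():
    my_dicts.append(json.loads(line))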
import ast
with open('my_file.json') as fh:
    for line in fh:
        try:
            dict_data = ast.literal_eval(line)
            assert isinstance(dict_data,dict)
            ### Process Dictionary Data here or append to list to convert to list of dicts
        except (SyntaxError, ValueError, AssertionError):
            print('ERROR - {} is not a dictionary'.format(line))

I'm getting an error while trying to read a .fasta file in Python

I'm trying to read a .fasta file as a dictionary and extract the header and sequence separately. There are several headers and sequences in the file.
An example is below.
header= CMP12
sequence=agcgtmmnngucnncttsckkld
But when I try to read a FASTA file using the function read_f and test it using print(dict.keys()), I get an empty list.
def read_f(fasta):
    '''Read a file from a FASTA format'''
    dictionary = {}
    with open(fasta) as file:
        text = file.readlines()
        print(text)
    name=''
    seq= ''
    #Create blocks of fasta text for each sequence, EXCEPT the last one
    for line in text:
        if line[0]=='>':
            dictionary[name] = seq
            name=line[1:].strip()
            seq=''
        else: seq = seq + line.strip()
    yield name,seq
fasta= ("sample.prot.fasta")
dict = read_f(fasta)
print(dict.keys())
This is the error I get:
'generator' object has no attribute 'keys'
Using the yield keyword implies that when you call the function read_f, the function body is not executed. Instead, a generator is returned, and you have to iterate over this generator to get the elements the function yields.
In concrete terms, a generator has no keys() method. Either consume it, for example with for name, seq in read_f(fasta), or, since the function already builds dictionary, replace the yield with return dictionary (after storing the last sequence in it) so that dict.keys() works.
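As a rough illustration of the "return a dictionary" route, here is a minimal sketch. It assumes a plain-text FASTA file in which header lines start with '>' and sequences may span several lines; the name read_fasta and the handling of blank lines are my own choices.

```python
def read_fasta(path):
    """Return a dict mapping each FASTA header (without '>') to its full sequence."""
    sequences = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if name is not None:
                    sequences[name] = "".join(chunks)  # store the finished record
                name = line[1:].strip()
                chunks = []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)  # store the last record as well
    return sequences
```

With this, records = read_fasta("sample.prot.fasta") followed by print(records.keys()) lists the headers. Using a name other than dict for the result also avoids shadowing the built-in.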
As Iguananaut already mentioned, Biopython helps you out (requires the biopython package to be installed).
See Biopython "sequence file to dictionary"
from Bio import SeqIO
fasta= "sample.prot.fasta"
seq_record_dict = SeqIO.to_dict(SeqIO.parse(fasta, "fasta"))

Creating runtime variable in python to fetch data from dictionary object

I have created a dictionary object by parsing a JSON file in Python. Let's assume the data is as follows:
plants = {}
# Add three key-value tuples to the dictionary.
plants["radish"] = {"color":"red", "length":4}
plants["apple"] = {"smell":"sweet", "season":"winter"}
plants["carrot"] = {"use":"medicine", "juice":"sour"}
This could be a very long dictionary object.
But at runtime, I need only a few values to be stored in a comma-delimited CSV file. The list of desired properties is in a file, e.g.
radish.color
carrot.juice
So, how would I create a Python app in which I can create dynamic variables such as the ones below to get data from the JSON object and create a CSV file?
At runtime I need the variables
plants[radish][color]
plants[carrot][juice]
Thank you to all who help
Sanjay
Consider parsing the text file line by line to retrieve the file contents. While reading, split each line on the period, which separates the dictionary keys. From there, use the resulting list of keys to retrieve the dictionary values. Then, iteratively output the values to CSV, conditioned on the number of items:
Txt file
radish.color
carrot.juice
Python code
import csv
plants = {}
plants["radish"] = {"color":"red", "length":4}
plants["apple"] = {"smell":"sweet", "season":"winter"}
plants["carrot"] = {"use":"medicine", "juice":"sour"}
data = []
with open("Input.txt", "r") as f:
    for line in f:
        data.append(line.replace("\n", "").strip().split("."))
with open("Output.csv", "w") as w:
    writer = csv.writer(w, lineterminator = '\n')
    for item in data:
        if len(item) == 2: # ONE-NEST DEEP
            writer.writerow([item[0], item[1], plants[item[0]][item[1]]])
        if len(item) == 3: # SECOND NEST DEEP
            writer.writerow([item[0], item[1], item[2], plants[item[0]][item[1]][item[2]]])
Output csv
radish,color,red
carrot,juice,sour
(Note: the deeper the nesting, the more columns each row will have, so key/value pairs will no longer line up across columns. Consider writing differently structured CSV files, such as one file for one-level lookups and another for second-level lookups.)
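If the nesting depth is not known in advance, the per-depth if branches can be replaced by a generic lookup. The sketch below uses functools.reduce to walk the keys one by one; the helper name lookup is hypothetical, and a missing key simply raises KeyError.

```python
from functools import reduce

def lookup(nested, dotted_path):
    """Follow a dotted path such as 'radish.color' through nested dictionaries."""
    keys = dotted_path.strip().split(".")
    # start from the whole dictionary and descend one key at a time
    return reduce(lambda current, key: current[key], keys, nested)

plants = {
    "radish": {"color": "red", "length": 4},
    "carrot": {"use": "medicine", "juice": "sour"},
}
print(lookup(plants, "radish.color"))  # red
print(lookup(plants, "carrot.juice"))  # sour
```

Each CSV row could then be written as the list of keys followed by the looked-up value, whatever the depth.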

Python: Parse string objects into python objects from a file

I've got a .csv file and I need to get some information from it. If I open the file, I can see two lines in it that say "data" and "notes", and I need to get the information that these two variables hold.
When I open the .csv file, it shows these lines:
data =
[0,1,2,3,4,5,3,2,3,4,5,]
notes = [{"text": "Hello", "position":(2,3)}, {"text": "Bye", "position":(4,5)}]
To open the file I use:
import csv
class A():
    def __init__(self):
        #Some stuff in here
        pass
    def get_data(self):
        file = open(self.file_name, "r")
        data = csv.reader(file, delimiter = "\t")
        rows = [row for row in data]
Now, to read the information in data, I just write:
for line in row[1][0]:
    try:
        value_list = int(line)
        print value_list
    except ValueError:
        pass
And with this I can create another list with these values and print it. Now I need to read the data from "notes". As you can see, it is a list with dictionaries as elements. What I need to do is read the "position" element inside each dictionary and print it.
This is the code that I have:
for notes in row[3][0]:
    if notes["position"]:
        print notes["position"]
But this, gives me this error:
TypeError: string indices must be integers, not str
How can I access these elements of each dictionary and then print it? Hope you can help me.
This is the .csv file from where I am trying to get the information.
You can change the last part of your code to:
for note in eval(rows[3][0].strip("notes = ")):
    if note["position"]:
        print note["position"]
If you need the position to be an actual tuple instead of a string, you can change the last line to:
print tuple(note["position"])
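A possible alternative that avoids eval() is ast.literal_eval(), which only accepts Python literals (lists, dicts, tuples, numbers, strings). The sketch below is written for Python 3 and assumes each line has the form name = literal on a single line; the helper name parse_assignment is made up. It also splits on the first "=" instead of using str.strip("notes = "), which strips a set of characters rather than a prefix.

```python
import ast

def parse_assignment(line):
    """Split a 'name = <Python literal>' line into (name, value)."""
    name, _, literal = line.partition("=")  # split on the first '=' only
    return name.strip(), ast.literal_eval(literal.strip())

line = 'notes = [{"text": "Hello", "position":(2,3)}, {"text": "Bye", "position":(4,5)}]'
name, notes = parse_assignment(line)
for note in notes:
    print(note["position"])  # (2, 3) then (4, 5)
```

Here each position is already a tuple, so no extra conversion is needed.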
