Accessing items in a dump of dictionary objects in Python - python

I have a strange dataset from our customer. It is a .json file but inside it looks like below
{"a":"aaa","b":"bbb","text":"hello"}
{"a":"aaa","b":"bbb","text":"hi"}
{"a":"aaa","b":"bbb","text":"hihi"}
As you can see, this is just a dump of dictionary objects. It is neither a list (no [] and no comma separator between objects) nor proper JSON, although the file extension is .json. So I am really confused about how to read this file.
All I care about is reading all the text keys from each of the dictionary objects.

This "strange dataset" is actually an existing format that builds upon JSON, called JSONL (JSON Lines).
As @user655321 said, you can parse each line. Here's a more complete example that collects the complete dataset in a list of dicts named dataset:
import json
dataset = []
with open("my_file.json") as file:
    for line in file:
        dataset.append(json.loads(line))

In [51]: [json.loads(i)["text"] for i in open("file.json").readlines()]
Out[51]: ['hello', 'hi', 'hihi']
Use list comprehension, it's easier

You can read the file line by line, parse each line as JSON, and extract the data you need (the text key in your case).
You can do something as follows:
import json
lines = open("file.txt").readlines()
for line in lines:
    dictionary = json.loads(line)
    print(dictionary["text"])

Since the file is not a single JSON document, you can read the input line by line and deserialize each line independently:
import json
with open('my_file.json') as fh:
    for line in fh:
        json_obj = json.loads(line)
        keys = json_obj.keys() # eg, 'a', 'b', 'text'
        text_val = json_obj['text'] # eg, 'hello', 'hi', or 'hihi'
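Real files in this format often end with an empty line or contain the odd malformed record. The following is a minimal sketch of a line-by-line reader that tolerates blank lines and reports the offending line number. The helper name iter_jsonl and the sample data are assumptions made for illustration, not part of the original answers.

```python
import json

def iter_jsonl(lines):
    """Yield one parsed object per non-blank line of a JSON Lines source.

    `lines` can be an open file handle or any iterable of strings.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at EOF
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Re-raise with the line number so the bad record is easy to find
            raise ValueError(f"line {lineno} is not valid JSON: {exc}") from exc

# Hypothetical sample standing in for the lines of my_file.json
sample = [
    '{"a":"aaa","b":"bbb","text":"hello"}\n',
    '\n',
    '{"a":"aaa","b":"bbb","text":"hi"}\n',
]
texts = [record["text"] for record in iter_jsonl(sample)]
print(texts)
```

With a real file, the same generator can be fed the handle directly, for example inside `with open("my_file.json") as fh:`.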

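How about splitting the content by \n and then using json to load each dictionary? Something like:
import json
with open(your_file) as f:  # your_file: path to the input file
    data = f.read()
my_dicts = []
# splitlines() splits on line breaks only; split() would also split on spaces inside values
for line in data.splitlines():
    my_dicts.append(json.loads(line))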

import ast
with open('my_file.json') as fh:
    for line in fh:
        try:
            dict_data = ast.literal_eval(line)
            assert isinstance(dict_data,dict)
            ### Process Dictionary Data here or append to list to convert to list of dicts
        except (SyntaxError, ValueError, AssertionError):
            print('ERROR - {} is not a dictionary'.format(line))
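One caveat with the ast.literal_eval approach: it parses Python literals, not JSON, so lines containing the JSON literals true, false, or null are rejected even though they are valid JSON. A small sketch of the difference, using a made-up sample line:

```python
import ast
import json

# Hypothetical line containing JSON-only literals
line = '{"ok": true, "value": null}'

# json.loads maps true/false/null to True/False/None
parsed = json.loads(line)
print(parsed)

# ast.literal_eval sees `true` and `null` as unknown names and refuses them
literal_eval_error = None
try:
    ast.literal_eval(line)
except ValueError as exc:
    literal_eval_error = exc
    print("literal_eval failed:", exc)
```

For data that is known to be JSON, json.loads is therefore the safer choice.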

Related

How to parse dictionary syntaxed string to dictionary object

I have a file whose syntax resembles a dictionary, as follows:
{YEARS:5}
{GROUPS:[1,2]}
{SAVE_FILE:{USE:1,NAME:CustomCalendar.ics}}
{SAVE_ONLINE:{USE:1,NAME:Custom Calendar,EMAIL:an.email#something.com,PASSWORD:AcompLExP#ssw0rd}}
{COURSES:[BTC,CIT,CQN,OSA,PRJCQN,PRJIOT,PILS,SPO,SHS1]}
I would like to find a way to parse each individual line into a dictionary as it is written. The difficulty I have is that some of these lines contain a dictionary as their value.
I am capable of taking the single lines and converting them to actual dictionaries but I am having an issue when working with the other lines.
Here is the code I have so far:
def get_config(filename):
    with open(filename, encoding="utf8") as config:
        years = config.read().split()[0]
    print(parse_line(years))

def parse_line(input_line):
    input_line = input_line.strip("{}")
    input_line = input_line.split(":")
    return {input_line[i]: input_line[i + 1] for i in range(0, len(input_line), 2)}
If at all possible, I'd love to be able to deal with any line within a single function and hopefully deal with more than two nested dictionaries.
Thanks in advance!
If your file contained valid JSON, it would be easy to read it and convert your data structures to dictionaries.
To give an example, consider the following line of text in a file text.txt:
{"SAVE_ONLINE":{"USE":1,"NAME":"Custom Calendar","EMAIL":"an.email#something.com","PASSWORD":"AcompLExP#ssw0rd"}}
Note that the only difference is the double quotes " around the strings.
You can then parse the line into a dictionary with:
import json
with open('text.txt', 'r') as f:
    d = json.loads(f.read())
Output
print(d)
# {'SAVE_ONLINE': {'USE': 1, 'NAME': 'Custom Calendar', 'EMAIL': 'an.email#something.com', 'PASSWORD': 'AcompLExP#ssw0rd'}}
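If the file cannot be changed, one possible workaround is to add the missing quotes programmatically and then hand the result to json.loads. The sketch below assumes that keys and values never contain the characters `{ } [ ] : ,` and that only plain integers should stay unquoted. The name parse_line is reused from the question, while TOKEN and _quote are illustrative helpers.

```python
import json
import re

# A token is any run of characters that is not JSON punctuation
TOKEN = re.compile(r'[^{}\[\]:,]+')

def _quote(match):
    token = match.group(0).strip()
    if not token:
        return match.group(0)      # pure whitespace between brackets: leave as-is
    if re.fullmatch(r'-?\d+', token):
        return token               # keep integers as numbers
    return json.dumps(token)       # quote (and escape) everything else

def parse_line(line):
    """Parse one line of the unquoted format by quoting its tokens first."""
    return json.loads(TOKEN.sub(_quote, line.strip()))

print(parse_line("{YEARS:5}"))
print(parse_line("{GROUPS:[1,2]}"))
print(parse_line("{SAVE_FILE:{USE:1,NAME:CustomCalendar.ics}}"))
```

Because the nesting is handled by the JSON parser, any depth of nested dictionaries works, but a value containing a colon or comma would break the tokenization.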

need help to improve my Python script performance that uses nested loop and json file

I need help with improving my script's execution time.
It does what it is supposed to do:
Reads a file line by line
Matches the line with the content of json file
Writes both the matching lines with the corresponding information from json file into a new txt file
The problem is with execution time, the file has more than 500,000 lines and the json file contains much more.
How can I optimize this script?
import json
import time
start = time.time()
print(start)
JsonFile=open('categories.json')
data = json.load(JsonFile)
Annotated_Data={}
FileList = [line.rstrip('\n') for line in open("FilesNamesID.txt")]
for File in FileList:
    for key, value in data.items():
        if File == key:
            Annotated_Data[key]=(value)
with open('Annotated_Files.txt', 'w') as outfile:
    json.dump(Annotated_Data, outfile, indent=4)
end = time.time()
print(end - start)
There is no need for the nested for loop to look up the File in data. You could replace it with the following code:
for File in FileList:
    if File in data:
        Annotated_Data[File]=data[File]
or with a comprehension:
Annotated_Data = {File: data[File] for File in FileList if File in data}
You can also avoid copying the contents of the whole FilesNamesID.txt to the new list - you are consuming it line by line anyway - but it would be a relatively minor improvement.
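A minimal sketch of that last point, iterating over the file handle directly instead of building FileList first. The function name annotate and the in-memory sample data are assumptions for illustration. In the real script, `lines` would be the handle returned by open("FilesNamesID.txt") and `data` the result of json.load.

```python
import io

def annotate(lines, data):
    """Return {name: data[name]} for every line whose content is a key of `data`."""
    annotated = {}
    for line in lines:                 # a file handle yields one line at a time
        name = line.rstrip("\n")
        if name in data:               # dict membership is a hash lookup, not a scan
            annotated[name] = data[name]
    return annotated

# Hypothetical stand-ins for categories.json and FilesNamesID.txt
data = {"file1": "data1", "file2": "data2", "file3": "data3"}
ids = io.StringIO("file1\nfile3\nmissing\n")

Annotated_Data = annotate(ids, data)
print(Annotated_Data)
```

The single loop does one dictionary lookup per line, so the work grows with the number of lines rather than with lines times JSON keys.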
I don't know exact format of your data, but you could try speed-up your script by using set():
json_data = '''
{
"file1": "data1",
"file2": "data2",
"file3": "data3"
}
'''
filenames_id_txt = '''
file1
file3
'''
import json
data = json.loads(json_data)
lines = [l.strip() for l in filenames_id_txt.splitlines() if l.strip()]
s = set(data.keys())
Annotated_Data = {k: data[k] for k in s.intersection(lines)}
print(json.dumps(Annotated_Data))
Prints (the key order may vary, since sets are unordered):
{"file3": "data3", "file1": "data1"}
EDIT: If I understand your question correctly, you want to find the intersection between the keys of your JSON data and the lines in your TXT file.
I chose set() to store the JSON keys (a set is a collection of unique elements). Sets have very fast methods; one of them is intersection(), which accepts other iterables (e.g. the lines from the TXT file) and returns a new set with the common elements.
I use this new set to construct a new dictionary and output it as JSON.

Check if line is a dictionary type and print the data?

I have a .txt file loaded with a lot of text, but every 2-3 paragraphs there is a line that looks like a dictionary:
somerandomtextinthisline
{"key1":"value1","key2":"value2"}
somerandomtextinthislineblasd
asbdjalsdnlasd
dasdjasdkjn
<space>
{"key1":"value1","key2":"value2"}
someranomtextaganinasdlasd
asdasd
So what I want to do is read the whole file, grab every 'key2' value, and write them to a file called result.txt.
How can I code this?
Use ast.literal_eval to convert it to a dictionary (if possible) and check if the parsed line can be indexed using 'key2':
import ast
with open(filename) as fin:
    for line in fin:
        try:
            parsed = ast.literal_eval(line)
            key2 = parsed['key2']
        except Exception:
            continue
        print(key2) # I just print it here, you probably need to write it to another file instead
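Since the dictionary lines in the question are also valid JSON, the same idea can be sketched with json.loads, collecting the values so they can be written to result.txt. The helper name extract_key and the sample lines are assumptions for illustration.

```python
import json

def extract_key(lines, key):
    """Yield the value of `key` from every line that parses as a JSON object containing it."""
    for line in lines:
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue                   # ordinary text line, skip it
        if isinstance(parsed, dict) and key in parsed:
            yield parsed[key]

# Hypothetical sample mirroring the layout shown in the question
sample = [
    "somerandomtextinthisline\n",
    '{"key1":"value1","key2":"value2"}\n',
    "asbdjalsdnlasd\n",
    "\n",
    '{"key1":"value1","key2":"other"}\n',
]
values = list(extract_key(sample, "key2"))
print(values)

# With real files, the generator can be consumed while writing:
# with open("input.txt") as fin, open("result.txt", "w") as fout:
#     for value in extract_key(fin, "key2"):
#         fout.write(f"{value}\n")
```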
You can use regex to match a dictionary in the file:
import re
import ast
data = [i.strip('\n') for i in open('filename.txt')]
final_dicts = list(map(ast.literal_eval, [re.sub(r"\s+", '', i) for i in data if re.findall(r'\{.*?:.*?,*\}', re.sub(r"\s+", '', i))]))

Extracting value data from multiple JSON strings in a single file

I know I am missing something obvious here, but I have the following Python code in which I am trying to:
1. Take a specified JSON file containing multiple strings as input.
2. Start at line 1 and look for the value of the key "content_text".
3. Add the value to a new dictionary and write said dictionary to a new file.
4. Repeat 1-3 on additional JSON files.
import json
def OpenJsonFileAndPullData (JsonFileName, JsonOutputFileName):
    output_file=open(JsonOutputFileName, 'w')
    result = []
    with open(JsonFileName, 'r') as InputFile:
        for line in InputFile:
            Item=json.loads(line)
            my_dict={}
            print item
            my_dict['Post Content']=item.get('content_text')
            my_dict['Type of Post']=item.get('content_type')
            print my_dict
            result.append(my_dict)
    json.dumps(result, output_file)
OpenJsonFileAndPullData ('MyInput.json', 'MyOutput.txt')
However, when run I receive this error:
AttributeError: 'str' object has no attribute 'get'
Python is case-sensitive.
Item = json.loads(line) # variable "Item"
my_dict['Post Content'] = item.get('content_text') # another variable "item"
By the way, why don't you load the whole file as JSON at once?
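Putting the fix together, a corrected version might look like the sketch below. It uses one consistent lowercase name for the parsed line and switches to json.dump for the output, since json.dumps only returns a string and does not write to a file. The function name pull_post_data and the in-memory sample are assumptions; io.StringIO stands in for the real input and output files.

```python
import io
import json

def pull_post_data(lines):
    """Collect the post content and type from each JSON line."""
    result = []
    for line in lines:
        item = json.loads(line)        # same lowercase name is used below
        result.append({
            "Post Content": item.get("content_text"),
            "Type of Post": item.get("content_type"),
        })
    return result

# Hypothetical input standing in for MyInput.json
sample = io.StringIO(
    '{"content_text": "first post", "content_type": "status"}\n'
    '{"content_text": "second post"}\n'
)
result = pull_post_data(sample)

output_file = io.StringIO()            # stands in for open("MyOutput.txt", "w")
json.dump(result, output_file)         # dump writes to the file object
print(output_file.getvalue())
```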

open a .json file with multiple dictionaries

I have a problem that I can't solve with python, it is probably very stupid but I didn't manage to find the solution by myself.
I have a .json file where the results of a simulation are stored. The result is stored as a series of dictionaries like
{"F_t_in_max": 709.1800264942982, "F_t_out_max": 3333.1574129603068, "P_elec_max": 0.87088836042046958, "beta_max": 0.38091242406098391, "r0_max": 187.55175182942901, "r1_max": 1354.8636763521174, " speed ": 8}
{"F_t_in_max": 525.61428305710433, "F_t_out_max": 2965.0538075438467, "P_elec_max": 0.80977406754203796, "beta_max": 0.59471606595464666, "r0_max": 241.25371753877008, "r1_max": 688.61786996066826, " speed ": 9}
{"F_t_in_max": 453.71124051199763, "F_t_out_max": 2630.1763649193008, "P_elec_max": 0.64268078173342935, "beta_max": 1.0352896471221695, "r0_max": 249.32706230502498, "r1_max": 709.11415981343885, " speed ": 10}
I would like to open the file and access the values, for example to plot "r0_max" as a function of "speed", but I can't open it unless it contains only one dictionary.
I use
with open('./results/rigid_wing_opt.json') as data_file:
data = json.load(data_file)
but when the file contains more than one dictionary I get the error
ValueError: Extra data: line 5 column 1 - line 6 column 1 (char 217 - 431)
If your input data is exactly as provided, then you should be able to interpret each individual dictionary using json.loads. If each dictionary is on its own line, then this should be sufficient:
with open('filename', 'r') as handle:
    json_data = [json.loads(line) for line in handle]
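Once the lines are parsed, the two series needed for the plot can be pulled out with list comprehensions. The sketch below uses abbreviated records based on the question's data. Note that the speed key in the file is literally " speed " with surrounding spaces, so it has to be looked up that way.

```python
import json

# Abbreviated records from the question (only the two fields of interest)
lines = [
    '{"r0_max": 187.55175182942901, " speed ": 8}',
    '{"r0_max": 241.25371753877008, " speed ": 9}',
    '{"r0_max": 249.32706230502498, " speed ": 10}',
]
records = [json.loads(line) for line in lines]

# The key includes the spaces exactly as written in the file
speeds = [rec[" speed "] for rec in records]
r0_max = [rec["r0_max"] for rec in records]

print(speeds)
print(r0_max)
```

The two lists can then be passed to whichever plotting library is in use.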
I would recommend reading the file line by line and converting each line independently to a dictionary.
You can place each line into a list with the following code:
import ast
# Read all lines into a list
with open(fname) as f:
    content = f.readlines()
# Convert each list item to a dict
content = [ ast.literal_eval( line ) for line in content ]
Or an even shorter version performing the list comprehension on the same line:
import ast
# Read all lines and convert each one to a dict
with open(fname) as f:
    content = [ ast.literal_eval( l ) for l in f.readlines() ]
{...} {...} is not proper JSON. It is two JSON objects separated by whitespace. Unless you can change the format of the input file to correct this, I'd suggest you try something a little different. If the data is as simple as in your example, then you could do something like this:
import json
import re
with open('filename', 'r') as handle:
    text_data = handle.read()
    # Join adjacent objects with commas and wrap everything in a list
    text_data = '[' + re.sub(r'\}\s*\{', '},{', text_data) + ']'
    json_data = json.loads(text_data)
This should work even if your dictionaries are not on separate lines.
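An alternative that avoids rewriting the text is to let the standard library decode one object at a time with json.JSONDecoder.raw_decode, which returns the parsed object together with the index where it stopped. This is a minimal sketch; the helper name iter_concatenated_json and the sample text are assumptions for illustration.

```python
import json

def iter_concatenated_json(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical input: objects separated by a space, a newline, or nothing at all
text = '{"speed": 8} {"speed": 9}\n{"speed": 10}{"speed": 11}'
objects = list(iter_concatenated_json(text))
print(objects)
```

Unlike the regex approach, this does not get confused by a `} {` sequence that happens to appear inside a string value.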
That is not valid JSON. You can't have multiple objects at the top level without surrounding them with a list and inserting commas between them.
