I'm trying to import a file which contains lines like this:
{ "dictitem" : 1, "anotherdictitem" : 2 }
I want to import them into a list of dictionaries like this:
[{ "dictitem" : "henry", "anotherdictitem" : 2 },{ "dictitem" : "peter", "anotherdictitem" : 4 },{ "dictitem" : "anna", "anotherdictitem" : 6 }]
I tried this: tweetlist = open("sample.out").readlines()
But then each line ends up in the list as a string rather than a dictionary. Does anyone have an idea?
Thanks!
You need to decode each line using the json module. Example:
import json
records = []
with open("data.txt", 'r') as f:
    for line in f:
        # each line holds one JSON object
        records.append(json.loads(line))
print(records)
You can use the ast module's literal_eval function on each line with a list comprehension like this:
import ast
with open("sample.out") as f:
    tweetlist = [ast.literal_eval(line) for line in f]
ast.literal_eval is a safer alternative to eval: it only evaluates Python literals (strings, numbers, tuples, lists, dicts, booleans and None) and never calls functions.
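To illustrate the difference between the two answers, here is a small sketch (the sample strings are made up for illustration). json.loads accepts only JSON syntax, while ast.literal_eval accepts Python literals, including single-quoted strings, and refuses anything that is not a literal, such as a function call.

```python
import ast
import json

json_line = '{ "dictitem" : 1, "anotherdictitem" : 2 }'
python_line = "{ 'dictitem' : 1, 'anotherdictitem' : 2 }"

# Both parsers understand the double-quoted form.
from_json = json.loads(json_line)
from_ast = ast.literal_eval(json_line)

# Only literal_eval understands single-quoted Python syntax.
from_python = ast.literal_eval(python_line)
try:
    json.loads(python_line)
    json_accepts_single_quotes = True
except json.JSONDecodeError:
    json_accepts_single_quotes = False

# A function call is not a literal, so literal_eval rejects it.
try:
    ast.literal_eval("len('abc')")
    call_rejected = False
except ValueError:
    call_rejected = True
```

The reverse also holds: JSON's true, false and null are not Python literals, so literal_eval fails on lines that contain them. If the file is real JSON, json.loads is the better fit.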
Related
I am trying to convert XML to JSON (a condensed version of the code is provided below).
The issue I am facing is with a tag that can occur multiple times (example below). I cannot directly turn it into a dict, since the key (NAME) can have multiple values. The output generated by the code vs. the expected output is given below.
python script:
import json
mylist = ['"Event" : "BATCHS01-wbstp01"', '"Event" : "BATCHS01-wbstrt01"']
tmpdict = {}
tmpdict['Events'] = mylist
with open('test.json','w') as fp:
    json.dump(tmpdict,fp,indent=4, sort_keys=False)
Output Generated:
{
    "Events": [
        "\"Event\" : \"BATCHS01-wbstp01\"",
        "\"Event\" : \"BATCHS01-wbstrt01\""
    ]
}
Expected Output:
{
    "Events": [
        {"Event" : "BATCHS01-wbstp01"},
        {"Event" : "BATCHS01-wbstrt01"}
    ]
}
The issue is that your mylist is a list of strings rather than a list of dicts.
You need to drop the outer quotes and wrap each item in braces:
mylist = [{"Event" : "BATCHS01-wbstp01"}, {"Event" : "BATCHS01-wbstrt01"}]
I don't see why you cannot produce this structure from XML. It's rather simple, regardless of whether 'key (NAME) can have multiple values'.
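A minimal sketch of building that structure directly from XML with the standard-library xml.etree.ElementTree module. The XML layout used here (a root element holding repeated Event tags) is an assumption, since the original XML is not shown in the question.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical input: the real XML is not shown in the question.
xml_text = """
<Events>
    <Event>BATCHS01-wbstp01</Event>
    <Event>BATCHS01-wbstrt01</Event>
</Events>
""".strip()

root = ET.fromstring(xml_text)

# One single-key dict per repeated tag, so duplicate tag names are no problem.
events = [{child.tag: child.text} for child in root]

tmpdict = {"Events": events}
result = json.dumps(tmpdict, indent=4)
print(result)
```

Because each repeated tag becomes its own dict inside a list, no key is ever overwritten.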
You can salvage your data by first converting it to valid JSON piecewise and then dumping the JSON into a string or a file:
tmpdict = {"Events" : [json.loads('{' + item + '}') for item in mylist]}
json.dumps(tmpdict)
'{"Events": [{"Event": "BATCHS01-wbstp01"}, {"Event": "BATCHS01-wbstrt01"}]}'
You can first convert the XML pieces to dicts like this:
Code:
tmpdict['Events'] = [json.loads('{%s}' % x) for x in mylist]
Test Code:
import json
mylist = ['"Event" : "BATCHS01-wbstp01"', '"Event" : "BATCHS01-wbstrt01"']
tmpdict = {}
tmpdict['Events'] = [json.loads('{%s}' % x) for x in mylist]
with open('test.json', 'w') as fp:
    json.dump(tmpdict, fp, indent=4, sort_keys=False)
Results:
{
    "Events": [
        {
            "Event": "BATCHS01-wbstp01"
        },
        {
            "Event": "BATCHS01-wbstrt01"
        }
    ]
}
How can I convert this text file to json? Ultimately, I'll be inserting the json blobs into a NoSQL database, but for now I plan to parse the text files and build a python dict, then dump to json.
I think there has to be a way to do this with a dict comprehension that I'm just not seeing/following (I'm new to python).
Example of a file:
file_1.txt
[namespace1] => metric_A = value1
[namespace1] => metric_B = value2
[namespace2] => metric_A = value3
[namespace2] => metric_B = value4
[namespace2] => metric_B = value5
Example of dict I want to build to convert to json:
{ "file1" : {
    "namespace1" : {
        "metric_A" : "value1",
        "metric_B" : "value2"
    },
    "namespace2" : {
        "metric_A" : "value3",
        "metric_B" : ["value4", "value5"]
    }
  }
}
I currently have this working, but my code is a total mess (and much more complex than this example, with clean-up etc.). I'm basically going line by line through the file, building a Python dict. I check each namespace for existence in the dict; if it exists, I check the metric. If the metric already exists, I know I have duplicates and need to convert the value to an array that contains the existing value and my new value(s). There has to be a simpler, cleaner way.
import glob
import json
answer = {}
for fname in glob.glob('file_*.txt'):  # loop over all filenames
    answer[fname] = {}
    with open(fname) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue
            splits = line.split()[::2]  # drop the '=>' and '=' tokens
            splits[0] = splits[0][1:-1]  # strip the surrounding brackets
            namespace, metric, value = splits  # all the values in the line that we're interested in
            # setdefault stores the new inner dict; get() would silently discard it
            answer[fname].setdefault(namespace, {})[metric] = value  # populate the dict
required_json = json.dumps(answer)  # turn the dict into proper JSON
You can use a regex for that. re.findall(r'\w+', line) will find all the text groups you are after; the rest is saving them in a dictionary of dictionaries. The simplest way to do that is to use defaultdict from collections.
import re
from collections import defaultdict
answer = defaultdict(lambda: defaultdict(list))
with open('file_1.txt', 'r') as f:
    for line in f:
        namespace, metric, value = re.findall(r'\w+', line)
        answer[namespace][metric].append(value)
Since we expect exactly three alphanumeric groups, we unpack them into three variables: namespace, metric and value. The outer defaultdict creates an inner defaultdict the first time a namespace is seen, and the inner defaultdict supplies an empty list for the first append, which keeps the code compact.
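Note that this approach stores every value in a list, whereas the question wants a plain string when a metric occurs only once. One possible follow-up step is sketched here, using an inline sample instead of a file and a hypothetical helper named collapse that unwraps single-element lists before dumping to JSON.

```python
import json
import re
from collections import defaultdict

# Inline sample standing in for file_1.txt.
lines = [
    "[namespace1] => metric_A = value1",
    "[namespace1] => metric_B = value2",
    "[namespace2] => metric_A = value3",
    "[namespace2] => metric_B = value4",
    "[namespace2] => metric_B = value5",
]

answer = defaultdict(lambda: defaultdict(list))
for line in lines:
    namespace, metric, value = re.findall(r"\w+", line)
    answer[namespace][metric].append(value)


def collapse(values):
    """Return the lone item of a one-element list, else the list itself."""
    return values[0] if len(values) == 1 else values


# Plain dicts with scalars for unique metrics and lists for duplicates.
result = {
    namespace: {metric: collapse(values) for metric, values in metrics.items()}
    for namespace, metrics in answer.items()
}
print(json.dumps({"file_1": result}, indent=4))
```

Whether to collapse at all is a design choice: keeping every value as a list makes the consumer's code more uniform, while collapsing matches the shape asked for in the question.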
I'm trying to figure out what is the best way to go about this problem:
I'm reading text lines from a certain buffer that eventually creates a certain log that looks something like this:
Some_Information: here there's some information about date and hour
Additional information: log summary #1234:
details {
name: "John Doe"
address: "myAdress"
phone: 01234567
}
information {
age: 30
height: 1.70
weight: 70
}
I would like to get all the fields in this log into a dictionary which I can later turn into a JSON file. The different sections in the log are not important, so for example, if myDictionary is a dictionary variable in Python, I would like:
> myDictionary['age']
to show me 30,
and the same for all other fields.
Speed is very important here, which is why I would like to go through every line just once and put it in a dictionary.
My approach would be, for each line that contains a colon (":"), to split the string and store the key and the value in the dictionary.
Is there a better way to do it?
Is there any Python module that would be sufficient?
If more information is needed, please let me know.
Edit:
So I've tried something that seems to work best so far.
I am currently reading from a file to simulate reading from the buffer.
My code:
import json
import shlex
newDict = dict()
with open('log.txt') as f:
    for line in f:
        try:
            line = line.replace(" ", "")
            stringSplit = line.split(':')
            key = stringSplit[0]
            value = stringSplit[1]
            value = shlex.split(value)
            newDict[key] = value[0]
        except:
            continue
with open('result.json', 'w') as fp:
    json.dump(newDict, fp)
Resulting in the following .json:
{"name": "JohnDoe", "weight": "70", "Additionalinformation": "logsummary#1234",
"height": "1.70", "phone": "01234567", "address": "myAdress", "age": "30"}
You haven't described exactly what the desired output should be for the sample input, so it's not completely clear what you want done. So I guessed: the following only extracts data values from lines between one that contains a '{' and the next one that contains a '}', and ignores all others.
It uses the re module to isolate the two parts of each dictionary item definition found on the line, and then uses the ast module to convert the value portion of that into a valid Python literal (i.e. string, number, tuple, list, dict, bool, and None).
import ast
import json
import re
pat = re.compile(r"""(?P<key>\w+)\s*:\s*(?P<value>.+)$""")
data_dict = {}
with open('log.txt') as f:
    braces = 0
    for line in (line.strip() for line in f):
        # check the braces first, so that a closing '}' is always counted
        if '{' in line:
            braces += 1
        elif '}' in line:
            braces -= 1
        elif braces > 0:
            match = pat.search(line)
            if match:
                key = match.group('key')
                raw = match.group('value')
                try:
                    value = ast.literal_eval(raw)
                except (ValueError, SyntaxError):
                    # e.g. 01234567 is not a valid literal in Python 3; keep the text
                    value = raw
                data_dict[key] = value
        # lines outside any braces are ignored
print(json.dumps(data_dict, indent=4))
Output from your example input (Python 3):
{
    "name": "John Doe",
    "address": "myAdress",
    "phone": "01234567",
    "age": 30,
    "height": 1.7,
    "weight": 70
}
If I have a dictionary like:
{
"cats": {
"sphinx": 3,
"british": 2
},
"dogs": {}
}
And try to save it to a text file, I get something like this:
{"cats": {"sphinx": 3, "british": 2}, "dogs": {}}
How can I save a dictionary in a pretty format, so that it is easy for a human to read?
You can import json and specify an indent level:
import json
d = {
"cats": {
"sphinx": 3,
"british": 2
},
"dogs": {}
}
j = json.dumps(d, indent=4)
print(j)
{
    "cats": {
        "sphinx": 3,
        "british": 2
    },
    "dogs": {}
}
Note that this is a string, however:
>>> j
'{\n    "cats": {\n        "sphinx": 3,\n        "british": 2\n    },\n    "dogs": {}\n}'
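Since the question is about saving to a file, json.dump can write the indented text directly instead of going through a string. A short sketch, where the file name cats.json and the temporary directory are only illustrative choices:

```python
import json
import os
import tempfile

d = {"cats": {"sphinx": 3, "british": 2}, "dogs": {}}

# Write into a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "cats.json")

    # json.dump writes straight to the open file, indented by 4 spaces.
    with open(path, "w") as f:
        json.dump(d, f, indent=4)

    # Read the raw text back to see what landed on disk.
    with open(path) as f:
        text = f.read()

    # json.load turns the file back into a dictionary.
    with open(path) as f:
        restored = json.load(f)

print(text)
```

The file content is the same text that json.dumps(d, indent=4) returns, so it stays both human-readable and machine-readable.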
You can use pprint for that:
import pprint
pprint.pformat(thedict)
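pformat returns a string, so it still has to be written out somewhere. A sketch under the assumption that the dictionary holds only Python literals; width=20 is an arbitrary small value chosen to force line wrapping on this tiny example, and io.StringIO stands in for a real open file:

```python
import ast
import io
import pprint

thedict = {"cats": {"sphinx": 3, "british": 2}, "dogs": {}}

# pprint.pprint accepts any writable text stream, e.g. an open file.
buffer = io.StringIO()
pprint.pprint(thedict, stream=buffer, width=20)
text = buffer.getvalue()
print(text)

# The output is a Python literal, so ast.literal_eval can read it back.
restored = ast.literal_eval(text)
```

Unlike JSON, this output uses Python syntax, so it is mainly useful when only Python programs will read the file again.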
If you want to save it in a more standard format, you can also use, for example, a yaml file (and the related python package http://pyyaml.org/wiki/PyYAMLDocumentation), and the code would look like:
import yaml
dictionary = {"cats": {"sphinx": 3, "british": 2}, "dogs": {}}
with open('dictionary_file.yml', 'w') as yaml_file:
    yaml.dump(dictionary, stream=yaml_file, default_flow_style=False)
dump serializes the dictionary to YAML. Passing stream writes the content straight to the file. If you need the YAML text as a string before writing it, omit stream: dump then returns the string, which you can write yourself with the file's write method.
Note also that default_flow_style=False gives a nicer block-style layout; in this example the file looks like:
cats:
  british: 2
  sphinx: 3
dogs: {}
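To load the YAML file back into a dictionary (safe_load only builds plain Python objects, and recent PyYAML versions no longer accept load without a Loader argument):
import yaml
with open('dictionary_file.yml', 'r') as yaml_file:
    dictionary = yaml.safe_load(yaml_file)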
You can dump it by using the Python Object Notation module (pon; disclaimer: I am the author of that module):
from pon import PON, loads
data = {
"cats": {
"sphinx": 3,
"british": 2
},
"dogs": {}
}
pon = PON(obj=data)
pon.dump()
which gives:
dict(
cats=dict(
sphinx=3,
british=2,
),
dogs=dict( ),
)
which again is correct Python, but uses dict() keyword arguments instead of quoted strings for the keys.
You can load this again with:
read_back = loads(open('file_name.pon').read())
print(read_back)
giving:
{'cats': {'sphinx': 3, 'british': 2}, 'dogs': {}}
Please note that loads() does not evaluate the string, it actually parses it safely using python's built-in parser.
PON also allows you to load Python dictionaries from files that have commented entries, and dump them while preserving the comments. This is where its real usefulness comes into play.
Alternatively, if you would like something arguably more readable, like the YAML format, you can use ruamel.yaml and do:
import ruamel.yaml
ruamel.yaml.round_trip_dump(data, stream=open('file_name.yaml', 'wb'), indent=4)
which gives you a file file_name.yaml with contents:
cats:
    sphinx: 3
    british: 2
dogs: {}
which uses the indent you seem to prefer (and is more efficient than @alberto's version)
I have a txt file that contains a dictionary in Python and I have opened it in the following manner:
with open('file') as f:
    test = list(f)
The result when I look at test is a list of one element. This first element is a string of the dictionary (which also contains other dictionaries), so it looks like:
["ID": 1, date: "2016-01-01", "A": {name: "Steve", id: "534", players:{last: "Smith", first: "Joe", job: "IT"}}
Is there any way to store this as the dictionary without having to find a way to determine the indices of the characters where the different keys and corresponding values begin/end? Or is it possible to read in the file in a way that recognizes the data as a dictionary?
If you are reading a json file then you can use the json module.
import json
with open('data.json') as f:
    data = json.load(f)
If you are sure that the file you are reading contains python dictionaries, then you can use the built-in ast.literal_eval to convert those strings to a Python dictionary:
>>> import ast
>>> a = ast.literal_eval("{'a' : '1', 'b' : '2'}")
>>> a
{'a': '1', 'b': '2'}
>>> type(a)
<type 'dict'>
There is an alternative method, eval. But using ast.literal_eval would be better. This answer will explain why.
You can use the json module.
For writing:
import json
data = {"ID": 1, "date": "2016-01-01", "A": {"name": "Steve", "id": "534", "players": {"last": "Smith", "first": "Joe", "job": "IT"}}}
with open("out.json", "w") as f:
    json.dump(data, f)
For reading:
import json
with open("out.json") as f:
    data = json.load(f)
print(data)
Just use eval() when you read it from your file.
for example:
>>> f = open('file.txt', 'r').read()
>>> mydict = eval(f)
>>> type(f)
<class 'str'>
>>> type(mydict)
<class 'dict'>
The Python interpreter assumes you are just reading an external file as text. It does not know that your file contains formatted content. One easy way to import the data as a dictionary would be to write a second Python file that contains the dictionary:
# mydict.py
myImportedDict = {
"ID": 1,
"date": "2016-01-01",
"A": {
"name": "Steve",
"id": "534",
"players": {
"last": "Smith",
"first": "Joe",
"job": "IT"
}
}
}
Then, you can import the dictionary and use it in another file:
#import_test.py
from mydict import myImportedDict
print(type(myImportedDict))
print(myImportedDict)
If you move mydict.py into a sub-folder and import it as a package, Python 2 also requires that folder to contain a file called
__init__.py
which can be blank. This is not needed when both files sit in the same directory as the script you run, nor on Python 3.3 and later.
If your source file is meant to be in JSON format, you can use the json library instead, which comes packaged with Python: https://docs.python.org/2/library/json.html