Say that I have a JSON file whose structure is either unknown or may change over time. I want to replace all values of "REPLACE_ME" with a string of my choice in Python.
Everything I have found assumes I know the structure. For example, I can read the JSON in with json.load and walk through the dictionary to do replacements then write it back. This assumes I know Key names, structure, etc.
How can I replace ALL of a given string value in a JSON file with something else?
This function recursively replaces all strings which equal the value original with the value new.
This function works on the Python structure, but of course you can use it on a JSON file by loading the file with json.load first.
It doesn't replace keys in the dictionary - just the values.
def nested_replace(structure, original, new):
    if type(structure) == list:
        return [nested_replace(item, original, new) for item in structure]
    if type(structure) == dict:
        return {key: nested_replace(value, original, new)
                for key, value in structure.items()}
    if structure == original:
        return new
    else:
        return structure
d = [ 'replace', {'key1': 'replace', 'key2': ['replace', 'don\'t replace'] } ]
new_d = nested_replace(d, 'replace', 'now replaced')
print(new_d)
['now replaced', {'key1': 'now replaced', 'key2': ['now replaced', "don't replace"]}]
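To apply this to a file on disk, something along these lines should work. It is only a minimal sketch: the helper name replace_in_json_file and both paths are assumptions for illustration, and nested_replace is repeated so the block runs on its own.

```python
import json

def nested_replace(structure, original, new):
    """Return a copy of structure with every value equal to original swapped for new."""
    if isinstance(structure, list):
        return [nested_replace(item, original, new) for item in structure]
    if isinstance(structure, dict):
        # Keys are kept as they are; only values are visited.
        return {key: nested_replace(value, original, new)
                for key, value in structure.items()}
    return new if structure == original else structure

def replace_in_json_file(src_path, dst_path, original, new):
    """Hypothetical helper: load JSON, replace matching values, write the result."""
    with open(src_path) as f:
        data = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(nested_replace(data, original, new), f, indent=2)
```

Writing to a separate destination path keeps the original file intact if something goes wrong; pass the same path twice to overwrite in place.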
I think there's no big risk if you want to replace any key or value enclosed with quotes (since quotes are escaped in json unless they are part of a string delimiter).
I would dump the structure, perform a str.replace (with double quotes), and parse again:
import json
d = { 'foo': {'bar' : 'hello'}}
d = json.loads(json.dumps(d).replace('"hello"','"hi"'))
print(d)
result:
{'foo': {'bar': 'hi'}}
I wouldn't risk replacing parts of strings, or strings without quotes, because that could change other parts of the file. I can't think of an example where replacing a string together with its double quotes changes something else.
There are "clean" solutions like adapting from Replace value in JSON file for key which can be nested by n levels but is it worth the effort? Depends on your requirements.
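If the search or replacement text may itself contain quotes or backslashes, the dump-and-replace idea can be made a little more robust by building both tokens with json.dumps. This is a sketch under that assumption, and the function name replace_via_dump is made up for illustration. Like the snippet above, it also replaces dictionary keys that match.

```python
import json

def replace_via_dump(data, original, new):
    """Replace whole JSON strings equal to original, using properly escaped tokens."""
    text = json.dumps(data)
    # json.dumps adds the surrounding quotes and escapes special characters,
    # so only complete string tokens match and the result stays valid JSON.
    return json.loads(text.replace(json.dumps(original), json.dumps(new)))

d = {'foo': {'bar': 'hello'}, 'note': 'say "hello"'}
print(replace_via_dump(d, 'hello', 'hi'))
```

The value of 'note' is left alone because, once dumped, the inner quotes are escaped and no longer look like string delimiters.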
Why not modify the file directly instead of treating it as a JSON?
with open('filepath') as f:
    lines = f.readlines()
# Open the output file once, in write mode, rather than reopening it for every line
with open('filepath_new', 'w') as f:
    for line in lines:
        f.write(line.replace('REPLACE_ME', 'whatever'))
You could load the JSON file into a dictionary and recurse through that to find the proper values but that's unnecessary muscle flexing.
The best way is to simply treat the file as a string and do the replacements that way.
import json
json_file = 'my_file.json'
with open(json_file) as f:
    file_data = f.read()
file_data = file_data.replace('REPLACE_ME', 'new string')
<...>
with open(json_file, 'w') as f:
    f.write(file_data)
json_data = json.loads(file_data)
From here the file can be re-written and you can continue to use json_data as a dict.
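One thing to watch with the raw-text approach: if the new string contains a double quote or a backslash, the file stops being valid JSON. A hedged sketch of one way around that is below; the function name replace_in_json_text is hypothetical, and it assumes the placeholder itself contains no characters that JSON would escape.

```python
import json

def replace_in_json_text(text, placeholder, new):
    """Replace placeholder in raw JSON text, escaping new so the text stays valid."""
    # JSON-escape the replacement, then drop the quotes json.dumps wraps around it.
    escaped = json.dumps(new)[1:-1]
    result = text.replace(placeholder, escaped)
    # Parse once as a sanity check; raises ValueError if the result is not valid JSON.
    json.loads(result)
    return result
```

As with plain str.replace, this also touches the placeholder when it appears inside a longer string or in a key.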
Well, that depends. If you want to replace every value equal to "REPLACE_ME" with the same string, you can use this. The for loop goes through all the keys in the dictionary and recurses into nested dictionaries. If a value is equal to your search string, it is replaced with the string you want. Note that values inside lists are not visited.
search_string = "REPLACE_ME"
replacement = "SOME STRING"
test = {"test1":"REPLACE_ME", "test2":"REPLACE_ME", "test3":"REPLACE_ME", "test4":"REPLACE_ME","test5":{"test6":"REPLACE_ME"}}
def replace_nested(test):
    for key, value in test.items():
        if type(value) is dict:
            replace_nested(value)
        else:
            if value == search_string:
                test[key] = replacement
replace_nested(test)
print(test)
To solve this problem in a dynamic way, I chose to declare the variables we want to replace in the same JSON file.
JSON file:
{
    "properties": {
        "property_1": "value1",
        "property_2": "value2"
    },
    "json_file_content": {
        "key_to_find": "{{property_1}} is my value",
        "dict1": {
            "key_to_find": "{{property_2}} is my other value"
        }
    }
}
Python code (references Replace value in JSON file for key which can be nested by n levels):
import json
from os import path

def fixup(a_dict: dict, k: str, subst_dict: dict) -> None:
    """
    Function inspired by the answer referenced above; modifies a_dict in place.
    """
    for key in a_dict.keys():
        if key == k:
            for s_k, s_v in subst_dict.items():
                a_dict[key] = a_dict[key].replace("{{" + s_k + "}}", s_v)
        elif type(a_dict[key]) is dict:
            fixup(a_dict[key], k, subst_dict)

# ...
file_path = "my/file/path"
if path.exists(file_path):
    with open(file_path, 'rt') as f:
        json_dict = json.load(f)
    fixup(json_dict["json_file_content"], "key_to_find", json_dict["properties"])
    print(json_dict)  # json with variables resolved
else:
    print("file not found")
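If the placeholders can appear under any key, or inside lists, a variant that resolves every string in the structure may be more convenient. This is a sketch rather than the method above: the names PLACEHOLDER and resolve are invented here, and it assumes property names consist of letters, digits and underscores only.

```python
import re

# Matches {{name}} where name is made of word characters (an assumption).
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def resolve(node, properties):
    """Return a copy of node with {{name}} placeholders filled in every string."""
    if isinstance(node, dict):
        return {key: resolve(value, properties) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, properties) for item in node]
    if isinstance(node, str):
        # Unknown names are left untouched instead of raising an error.
        return PLACEHOLDER.sub(
            lambda m: str(properties.get(m.group(1), m.group(0))), node)
    return node
```

With the JSON file shown earlier, resolve(json_dict["json_file_content"], json_dict["properties"]) would return the content with both placeholders filled in, without modifying json_dict itself.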
Hope it helps
I have a text file, something.txt, that holds data like:
sql_memory: 300
sql_hostname: server_name
sql_datadir: DEFAULT
I have a dict parameter={"sql_memory":"900", "sql_hostname":"1234" }
I need to write the values from the parameter dict into the txt file. If a key in the txt file has no matching key in the parameter dict, its value in the txt file should be left as it is.
For example, sql_datadir is not in the parameter dict, so its value in the txt file does not change.
Here is what I have tried :
import json

def create_json_file():
    with open(something.txt_path, 'r') as meta_data:
        lines = meta_data.read().splitlines()
    lines_key_value = [line.split(':') for line in lines]
    final_dict = {}
    for lines in lines_key_value:
        final_dict[lines[0]] = lines[1]
    with open(json_file_path, 'w') as foo:
        json.dump(final_dict, foo, indent=4)

def generate_server_file(parameters):
    create_json_file()
    with open(json_file_path, 'r') as foo:
        server_json_data = json.load(foo)
    for keys in parameters:
        if keys not in server_json_data:
            raise KeyError("Cannot find keys")
    # Need to update the parameter in json file
    # and convert json file into txt again

x={"sql_memory":"900", "sql_hostname":"1234" }
generate_server_file(x)
Is there a way I can do this without converting the txt file into a JSON ?
Expected output file(something.txt) :
sql_memory: 900
sql_hostname: 1234
sql_datadir: DEFAULT
Using Python 3.6
If you want to import data from a text file use numpy.genfromtxt.
My Code:
import numpy
data = numpy.genfromtxt("something.txt", dtype='str', delimiter=';')
print(data)
something.txt:
Name;Jeff
Age;12
My Output:
[['Name' 'Jeff']
['Age' '12']]
It's very useful and I use it all of the time.
If your full example is using Python dict literals, a way to do this would be to implement a serializer and a deserializer. Since yours closely follows object literal syntax, you could try using ast.literal_eval, which safely parses a literal from a string. Notice, it will not handle variable names.
import ast
def split_assignment(string):
    '''Split on a variable assignment, only splitting on the first =.'''
    return string.split('=', 1)

def deserialize_collection(string):
    '''Deserialize the collection to a key as a string, and a value as a dict.'''
    key, value = split_assignment(string)
    return key, ast.literal_eval(value)

def dict_doublequote(dictionary):
    '''Print dictionary using double quotes.'''
    pairs = [f'"{k}": "{v}"' for k, v in dictionary.items()]
    return f'{{{", ".join(pairs)}}}'

def serialize_collection(key, value):
    '''Serialize the collection to a string'''
    return f'{key}={dict_doublequote(value)}'
An example using the data above produces:
>>> data = 'parameter={"sql_memory":"900", "sql_hostname":"1234" }'
>>> key, value = deserialize_collection(data)
>>> key, value
('parameter', {'sql_memory': '900', 'sql_hostname': '1234'})
>>> serialize_collection(key, value)
'parameter={"sql_memory": "900", "sql_hostname": "1234"}'
Please note you'll probably want to use json.dumps rather than the hack I implemented to serialize the value, since the hack may incorrectly quote some complicated values. If single quotes are fine, a much simpler solution would be:
def serialize_collection(key, value):
    '''Serialize the collection to a string'''
    return f'{key}={str(value)}'
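For the task as originally asked (updating something.txt from the parameter dict without going through JSON), a direct line-by-line rewrite may be enough. The sketch below assumes every line has the form key: value; the function names update_lines and update_file are made up for illustration. Unlike the attempt in the question, it silently ignores parameter keys that are missing from the file instead of raising KeyError.

```python
def update_lines(lines, parameters):
    """Return lines with the value replaced wherever the key is in parameters."""
    updated = []
    for line in lines:
        # partition splits on the first ':' only, so values may contain colons.
        key, sep, _old_value = line.partition(":")
        if sep and key.strip() in parameters:
            line = f"{key}: {parameters[key.strip()]}"
        updated.append(line)
    return updated

def update_file(path, parameters):
    """Rewrite the file at path in place with the updated values."""
    with open(path) as f:
        lines = f.read().splitlines()
    with open(path, "w") as f:
        f.write("\n".join(update_lines(lines, parameters)) + "\n")
```

Calling update_file("something.txt", {"sql_memory": "900", "sql_hostname": "1234"}) would leave the sql_datadir line unchanged, since that key is not in the dict.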
I'm importing data from a text file, and then made a dictionary out of that. I'm now trying to make a separate one, with the entries that have the same value only. Is that possible?
Sorry if that's a little confusing! But basically, the text file looks like this:
"Andrew", "Present"
"Christine", "Absent"
"Liz", "Present"
"James", "Present"
I made it into a dictionary first, so I could group them into keys and values, and now I'm trying to make a list of the people who were 'present' only (I don't want to delete the absent ones, I just want a separate list), and then pick one from that list randomly.
This is what I tried:
d = {}
with open('directory.txt') as f:
    for line in f:
        name, attendance = line.strip().split(',')
        d[name.strip()] = attendance.strip()
present_list = []
present_list.append({"name": str(d.keys), "attendance": "Present"})
print(random.choice(present_list))
When I tried running it, I only get:
{'name': '<built-in method keys of dict object at 0x02B26690>', 'attendance': 'Present'}
Which part should I change? Thank you so much in advance!
You can try this:
present_list = [key for key in d if d[key] == "Present"]
First, change the way you read the lines so that your initial dict uses the attendance status as the key:
import random
from collections import defaultdict
d = defaultdict(list)
with open('directory.txt') as f:
    for line in f:
        name, attendance = line.strip().split(',')
        # strip(' "') also removes the double quotes shown in the sample file
        d[attendance.strip(' "')].append(name.strip(' "'))
present_list = d["Present"]
print(random.choice(present_list) if present_list else "All absent")
d.keys is a method, not an attribute, so you must call it:
d.keys()
This returns a view of the keys. If you want a comma-separated list with square brackets, str(list(d.keys())) is enough. If you want different formatting, consider ','.join(d.keys()) for a simple comma-separated list with no square brackets.
UPDATE:
You also have no filtering in place. Instead I'd try something like this, which groups the names by status while reading the file:
d = {}
with open('directory.txt') as f:
    for line in f:
        name, attendance = line.strip().split(',')
        if attendance.strip() not in d:
            d[attendance.strip()] = [name.strip()]
        else:
            # append modifies the list in place and returns None, so do not reassign
            d[attendance.strip()].append(name.strip())
This way you don't need to go through all those intermediate steps, and you will have something like {"Present": ["Andrew", "Liz", "James"], "Absent": ["Christine"]}
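Since the sample file wraps every field in double quotes, the csv module can take care of both the quotes and the space after the comma. The following is only a sketch: names_with_status is a hypothetical helper, and an in-memory StringIO stands in for directory.txt.

```python
import csv
import io
import random

def names_with_status(lines, status="Present"):
    """Return the names whose attendance column equals status."""
    # skipinitialspace=True ignores the blank after each comma, so the
    # quoted second field is parsed as a quoted field.
    reader = csv.reader(lines, skipinitialspace=True)
    return [row[0] for row in reader if len(row) >= 2 and row[1].strip() == status]

sample = io.StringIO(
    '"Andrew", "Present"\n"Christine", "Absent"\n"Liz", "Present"\n"James", "Present"\n'
)
present = names_with_status(sample)
print(present)
print(random.choice(present) if present else "All absent")
```

With a real file, pass the open file object instead of the StringIO.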
How can I convert this text file to json? Ultimately, I'll be inserting the json blobs into a NoSQL database, but for now I plan to parse the text files and build a python dict, then dump to json.
I think there has to be a way to do this with a dict comprehension that I'm just not seeing/following (I'm new to python).
Example of a file:
file_1.txt
[namespace1] => metric_A = value1
[namespace1] => metric_B = value2
[namespace2] => metric_A = value3
[namespace2] => metric_B = value4
[namespace2] => metric_B = value5
Example of dict I want to build to convert to json:
{ "file1" : {
    "namespace1" : {
        "metric_A" : "value1",
        "metric_B" : "value2"
    },
    "namespace2" : {
        "metric_A" : "value3",
        "metric_B" : ["value4", "value5"]
    }
  }
}
I currently have this working, but my code is a total mess (and much more complex than this example, with clean-up etc.). I'm basically going line by line through the file, building a Python dict. I check each namespace for existence in the dict; if it exists, I check the metric. If the metric exists already, I know I have duplicates and need to convert the value to an array that contains the existing value and my new value(s). There has to be a simpler, cleaner way.
import glob
import json
answer = {}
for fname in glob.glob('file_*.txt'):  # loop over all filenames
    answer[fname] = {}
    with open(fname) as infile:
        for line in infile:
            line = line.strip()
            if not line: continue
            splits = line.split()[::2]
            splits[0] = splits[0][1:-1]
            namespace, metric, value = splits  # all the values in the line that we're interested in
            # setdefault stores the inner dict; a repeated metric overwrites the earlier value
            answer[fname].setdefault(namespace, {})[metric] = value
required_json = json.dumps(answer)  # turn the dict into proper JSON
You can use a regex for that. re.findall(r'\w+', line) finds all the text groups you are after; the rest is saving them in a dictionary of dictionaries. The simplest way to do that is to use defaultdict from collections.
import re
from collections import defaultdict
answer = defaultdict(lambda: defaultdict(list))
with open('file_1.txt', 'r') as f:
    for line in f:
        namespace, metric, value = re.findall(r'\w+', line)
        answer[namespace][metric].append(value)
Since we expect exactly three alphanumeric groups per line, we unpack them into three variables: namespace, metric, value. The outer defaultdict creates an inner defaultdict the first time a namespace is seen, and the inner defaultdict creates an empty list on the first append, which keeps the code compact.
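To get from the lists-everywhere structure to the shape in the question (a plain string for a single value, a list only for duplicates) and then to JSON, a final pass can collapse one-element lists. This is a sketch; parse_metrics is a hypothetical name, and the sample lines are the ones from the question.

```python
import json
import re
from collections import defaultdict

def parse_metrics(lines):
    """Group values by namespace and metric; repeated metrics become lists."""
    grouped = defaultdict(lambda: defaultdict(list))
    for line in lines:
        parts = re.findall(r"\w+", line)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        namespace, metric, value = parts
        grouped[namespace][metric].append(value)
    # Collapse one-element lists to a plain string.
    return {
        namespace: {metric: values[0] if len(values) == 1 else values
                    for metric, values in metrics.items()}
        for namespace, metrics in grouped.items()
    }

sample = [
    "[namespace1] => metric_A = value1",
    "[namespace1] => metric_B = value2",
    "[namespace2] => metric_A = value3",
    "[namespace2] => metric_B = value4",
    "[namespace2] => metric_B = value5",
]
print(json.dumps({"file1": parse_metrics(sample)}, indent=2))
```

The result is built from plain dicts, so json.dumps needs no special handling for defaultdict.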
I have several very large not quite csv log files.
Given the following conditions:
value fields have unescaped newlines and commas, almost anything can be in the value field including '='
each valid line has an unknown number of valid value fields
valid value looks like key=value such that a valid line looks like key1=value1, key2=value2, key3=value3 etc.
the start of each valid line should begin with eventId=<some number>,
What is the best way to read a file, split the file into correct lines and then parse each line into correct key value pairs?
I have tried
file_name = 'file.txt'
read_file = open(file_name, 'r').read().split(',\neventId')
This correctly parses the first entry, but all other entries start with =# instead of eventId=#. Is there a way to keep the delimiter and split on the valid newline?
Also, speed is very important.
Example Data:
eventId=123, key=value, key2=value2:
this, will, be, a problem,
maybe?=,
anotherkey=anothervalue,
eventId=1234, key1=value1, key2=value2, key3=value3,
eventId=12345, key1=
msg= {this is not a valid key value pair}, key=value, key21=value=,
Yes, the file really is this messy (sometimes). Each event here has 3 key-value pairs, although in reality there is an unknown number of key-value pairs in each event.
This problem is pretty insane, but here's a solution that seems to work. Always use an existing library to output formatted data, kids.
import re
in_string = """eventId=123, goodkey=goodvalue, key2=somestuff:
this, will, be, a problem,
maybe?=,
anotherkey=anothervalue, gotit=see,
the problem===s,
eventId=1234, key1=value1, key2=value2, key3=value3,
eventId=12345, key1=
msg= {this is not a valid key value pair}, validkey=validvalue,"""
line_matches = list(re.finditer(r'(,\n)?eventId=\d', in_string))
lines = []
for i in range(len(line_matches)):
    match_start = line_matches[i].start()
    next_match_start = line_matches[i+1].start() if i < len(line_matches)-1 else len(in_string)-1
    line = in_string[match_start:next_match_start].lstrip(',\n')
    lines.append(line)
lineDicts = []
for line in lines:
    d = {}
    pad_line = ', '+line
    matches = list(re.finditer(r', [\w\d]+=', pad_line))
    for i in range(len(matches)):
        match = matches[i]
        key = match.group().lstrip(', ').rstrip('=')
        next_match_start = matches[i+1].start() if i < len(matches)-1 else len(pad_line)
        value = pad_line[match.end():next_match_start]
        d[key] = value
    lineDicts.append(d)
print(lineDicts)
Outputs [{'eventId': '123', 'key2': 'somestuff:\nthis, will, be, a problem,\nmaybe?=,\nanotherkey=anothervalue', 'goodkey': 'goodvalue', 'gotit': 'see,\nthe problem===s'}, {'eventId': '1234', 'key2': 'value2', 'key1': 'value1', 'key3': 'value3'}, {'eventId': '12345', 'key1': '\nmsg= {this is not a valid key value pair}', 'validkey': 'validvalue'}]
If it is correct that each valid line begins with eventId=, you can groupby those lines and find valid pairs with a regex:
from itertools import groupby
import re
with open("test.txt") as f:
    r = re.compile(r"\w+=\w+")
    grps = groupby(f, key=lambda x: x.startswith("eventId="))
    d = dict(l.split("=") for k, v in grps if k
             for l in r.findall(next(v))[1:])
print(d)
{'key3': 'value3', 'key2': 'value2', 'key1': 'value1', 'goodkey': 'goodvalue'}
If you want to keep the eventIds:
from itertools import groupby
import re
with open("test.txt") as f:
    r = re.compile(r"\w+=\w+")
    grps = groupby(f, key=lambda x: x.startswith("eventId="))
    d = list(r.findall(next(v)) for k, v in grps if k)
print(d)
[['eventId=123', 'goodkey=goodvalue', 'key2=somestuff'], ['eventId=1234', 'key1=value1', 'key2=value2', 'key3=value3']]
It is not clear from your description exactly what the output should be. If you want all the valid key=value pairs, and the assumption that each valid line begins with eventId= is not accurate:
from itertools import groupby
import re
def parse(fle):
    with open(fle) as f:
        r = re.compile(r"\w+=\w+")
        grps = groupby(f, key=lambda x: x.startswith("eventId="))
        for k, v in grps:
            if k:
                # join the eventId lines with the continuation lines that follow, if any
                sub = "".join(list(v) + list(next(grps, (None, []))[1]))
                yield from r.findall(sub)
print(list(parse("test.txt")))
Output:
['eventId=123', 'key=value', 'key2=value2', 'anotherkey=anothervalue',
'eventId=1234', 'key1=value1', 'key2=value2', 'key3=value3',
'eventId=12345', 'key=value', 'key21=value']
If your values can really contain anything, there's no unambiguous way of parsing. Any key=value pair could be part of the preceding value. Even an eventId=# pair on a new line could be part of a value from the previous line.
Now, perhaps you can do a "good enough" parse on the data despite the ambiguity, if you assume that values will never contain valid looking key= substrings. If you know the possible keys (or at least, what constraints they have, like being alphanumeric), it will be a lot easier to guess at what is a new key and what is just part of the previous value.
Anyway, if we assume that all alphanumeric strings followed by equals signs are indeed keys, we can do a parse with regular expressions. Unfortunately, there's no easy way to do this line by line, nor is there a good way to capture all the key-value pairs in a single scan. However, it's not too hard to scan once to get the log lines (which may have embedded newlines) and then separately get the key=value, pairs for each one.
import re
with open("my_log_file") as infile:
    text = infile.read()
# (?s) is the DOTALL flag, so . also matches the embedded newlines
line_pattern = r'(?s)eventId=\d+,.*?(?:$|(?=\neventId=\d+))'
kv_pattern = r'(?s)(\w+)=(.*?),\s*(?:$|(?=\w+=))'
results = [re.findall(kv_pattern, line) for line in re.findall(line_pattern, text)]
I'm assuming that the file is small enough to fit into memory as a string. It would be quite a bit more obnoxious to solve the problem if the file can't all be handled at once.
If we run this regex matching on your example text, we get:
[[('eventId', '123'), ('key', 'value'), ('key2', 'value2:\nthis, will, be, a problem,\nmaybe?='), ('anotherkey', 'anothervalue')],
[('eventId', '1234'), ('key1', 'value1'), ('key2', 'value2'), ('key3', 'value3')],
[('eventId', '12345'), ('key1', '\nmsg= {this is not a valid key value pair}'), ('key', 'value'), ('key21', 'value=')]]
maybe? is not considered a key because of the question mark. msg and the final value are not considered keys because there were no commas separating them from a previous value.
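On the narrower point of splitting while keeping eventId at the start of each record: a lookahead in re.split consumes only the comma and newline, not the text that follows. The sketch below assumes, like the rest of this thread, that a new record always starts with eventId= followed by a digit right after a comma and newline; split_events is a hypothetical name.

```python
import re

def split_events(text):
    """Split after a comma + newline only where the next line starts a new event."""
    # (?=...) is a lookahead: it must match, but it is not consumed,
    # so 'eventId=' stays at the start of the following record.
    return re.split(r",\n(?=eventId=\d)", text)

text = (
    "eventId=123, key=value, key2=value2:\n"
    "this, will, be, a problem,\n"
    "maybe?=,\n"
    "anotherkey=anothervalue,\n"
    "eventId=1234, key1=value1, key2=value2, key3=value3,\n"
    "eventId=12345, key1=\n"
    "msg= {this is not a valid key value pair}, key=value, key21=value=,"
)
for record in split_events(text):
    print(repr(record))
```

This still reads the whole file into memory, so for very large files it would need to be adapted to work on chunks.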
Oh! This is an interesting problem. You'll want to process each line, and each part of a line, separately without iterating through the file more than once.
data_dict = {}
with open('file.txt', 'r') as f:
    for line in f:
        line_list = line.split(',')
        if 'eventId' in line_list[0]:
            for item in line_list:
                if '=' not in item:
                    continue  # skip fragments that are not key=value pairs
                pair = item.split('=', 1)
                data_dict.update({pair[0]: pair[1]})
That should do it. Enjoy! Note that this only handles lines that start with eventId; values that continue on following lines are ignored, and keys from later events overwrite earlier ones.
If there are spaces in the 'pseudo csv' please change the last line to:
data_dict.update({pair[0].strip(): pair[1].strip()})
This removes surrounding spaces from the strings for your key and value.
p.s. If this answers your question, please click the check mark on the left to record this as an accepted answer. Thanks!
p.p.s. A set of lines from your actual data would be very helpful in writing something to avoid error cases.
Here is how I dump a file
with open('es_hosts.json', 'w') as fp:
    json.dump(','.join(host_list.keys()), fp)
The result is
"a,b,c"
I would like:
a,b,c
Thanks
Before doing a string replace, you might want to strip the quotation marks:
print('"a,b,c"'.strip('"'))
Output:
a,b,c
That's closer to what you want to achieve. Even just removing the first and the last character works: '"a,b,c"'[1:-1].
But have you looked into this question?
To remove the quotation marks in the keys only, which may be important if you are parsing it later (presumably with some tolerant parser or maybe you just pipe it directly into node for bizarre reasons), you could try the following regex.
re.sub(r'(?<!: )"(\S*?)"', '\\1', json_string)
One issue is that this regex expects fields to be separated as key: value, and it will fail for key:value. You could make it work for the latter with a minor change, but similarly it won't work for variable amounts of whitespace after the colon.
There may be other edge cases but it will work with outputs of json.dumps, however the results will not be parseable by json. Some more tolerant parsers like yaml might be able to read the results.
import json
import re
regex = r'(?<!: )"(\S*?)"'
o = {"noquotes" : 127, "put quotes here" : "and here", "but_not" : "there"}
s = json.dumps(o)
s2 = json.dumps(o, indent=3)
strip_s = re.sub(regex,'\\1',s)
strip_s2 = re.sub(regex,'\\1',s2)
print(strip_s)
print(strip_s2)
assert(json.loads(strip_s) == json.loads(s) == json.loads(strip_s2) == json.loads(s2) == object)
Will raise a ValueError but prints what you want.
Well, that's not valid json, so the json module won't help you to write that data. But you can do this:
import json
with open('es_hosts.json', 'w') as fp:
    data = ['a', 'b', 'c']
    fp.write(json.dumps(','.join(data)).replace('"', ''))
That's because you asked for json, but since that's not json, this should suffice:
with open('es_hosts.json', 'w') as fp:
    data = ['a', 'b', 'c']
    fp.write(','.join(data))
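Use Python's built-in string replace function on the serialized string. The quotes are added by the JSON serializer, so they have to be removed after serializing, not before:
with open('es_hosts.json', 'w') as fp:
    fp.write(json.dumps(','.join(host_list.keys())).replace('"', ''))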
Just use a for loop to print each item of the list.
import json
with open('json_file') as f:
    data = json.loads(f.read())
for value_wo_bracket in data['key_name']:
    print(value_wo_bracket)
Note that there is a difference between json.load and json.loads: json.load reads from a file object, while json.loads parses a string.
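A small illustration of the two functions side by side, as a sketch: the key name key_name is taken from the snippet above, and io.StringIO stands in for a real open file.

```python
import io
import json

text = '{"key_name": ["a", "b", "c"]}'

from_string = json.loads(text)            # loads: parse a str
from_file = json.load(io.StringIO(text))  # load: parse a file-like object

# Both calls give the same Python object; joining the list gives the
# unquoted, comma-separated form.
print(",".join(from_string["key_name"]))
```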