Suppose I have a config file like this where I list a bunch of values. I am running a function in which I am checking that a set of strings will always begin with one of these defined values.
start_values = [
"cats",
"dogs",
"birds",
"horses"
]
And I also have a json file on which I want to run unit tests to make sure that my function is running properly, like this.
{
"sentence_tests": [
"horses eat grass.",
"birds fly high.",
"cats like to nap.",
"dogs are cool."
]
}
However, the problem I am facing is that if I want to change one of my start_values to be something else, I want to also update my json file for that specific value. For example, if I change "dogs" to "cows", I want that to update automatically in my json file instead of having to do that manually.
So this is how I would want it to be after I change the start_values:
Modified start_values:
start_values = [
"cats",
"cows",
"birds",
"horses"
]
Modified json file:
{
"sentence_tests": [
"horses eat grass.",
"birds fly high.",
"cats like to nap.",
"cows are cool."
]
}
Is there a way to do this in python?
import json
# filename and start_values are assumed to be defined earlier
with open(filename, "rt") as f:
    sentences = json.loads(f.read())
# Assumes sentence i belongs to start_values[i], i.e. both lists are in the same order
for i, value in enumerate(start_values):
    words = sentences["sentence_tests"][i].split()
    if words[0] != value:
        words[0] = value
        words = " ".join(words)
        sentences["sentence_tests"][i] = words
with open(filename, "wt") as f:
    f.write(json.dumps(sentences, indent=4))
Yes, you can easily load a JSON file using Python's json library. It is parsed into a Python dictionary. All you have to do after changing it is write it back to the file.
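If the sentences in the JSON file are not in the same order as start_values, one option is to tell the script which value was renamed rather than pairing the two lists by position. The following is only a sketch: rename_start_value is a hypothetical helper name, and it assumes every test sentence begins with its start value followed by a space.

```python
import json

def rename_start_value(data, old, new):
    """Return a copy of data in which sentences starting with `old` start with `new`."""
    updated = []
    for sentence in data["sentence_tests"]:
        # Split off the first word; sep is "" when the sentence is a single word.
        first, sep, rest = sentence.partition(" ")
        if first == old:
            sentence = new + sep + rest
        updated.append(sentence)
    return {**data, "sentence_tests": updated}

# Small in-memory example standing in for the loaded JSON file.
data = {"sentence_tests": ["horses eat grass.", "dogs are cool."]}
print(json.dumps(rename_start_value(data, "dogs", "cows"), indent=4))
```

To apply this to the real file, load it with json.load, pass the result through the function, and write the returned dictionary back with json.dump.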
So I have a dictionary with a bunch of words as keys and their definitions as values.
E.g., word_list.txt
words = {
happy: "feeling or showing pleasure or contentment.",
apple: "the round fruit which typically has thin green or red skin and crisp flesh.",
today: "on or in the course of this present day."
faeces: "waste matter remaining after food has been digested, discharged from the bowels; excrement."
}
How do I print a random word from the dictionary that is in the text file in Python?
You need to open that file in your code, load it with the json library, and then you can do any random operation.
For the file to load, you have to add the missing , at the end of each element.
Also, since your file has a 'words = ' before the keys, you need to split it off. You also need to replace single quotes with double quotes:
import json, random
with open('word_list.txt', 'r') as file:
    file_text = file.read()
words = json.loads(file_text.split(' = ')[1].replace("'", '"'))
random_word = random.choice(list(words))
print(random_word)
random.choice() picks a random element from a list, so you just need to pass your dict to it as a list: random.choice(list(your_dict))
EDIT: the OP has edited the question, removing the single quotes from every key in the word_list.txt sample. This code will only work if the keys are single- or double-quoted.
First, you will need to fix your txt file. This could also be a json file, but to make it one you will need to modify the code. For the future, though, json is the proper way to do this. You need to remove words =. You also need to put your keys (apple, today, and so on) in quotes. Here is the fixed file:
{
"happy": "feeling or showing pleasure or contentment.",
"apple": "the round fruit which typically has thin green or red skin and crisp flesh.",
"today": "on or in the course of this present day.",
"faeces": "waste matter remaining after food has been digested, discharged from the bowels; excrement."
}
Here is some code to do it.
# Necessary imports.
import json, random
# Open the txt file and read its contents into a string.
with open("words.txt", "r") as words_file:
    words_string = words_file.read()
# Convert the JSON string into a dictionary so we can use the data easily.
words_json = json.loads(words_string)
# Get the value (the definition) of each item in the dictionary. This drops the "apple" or whatever the key is for that entry.
words_json_values = words_json.values()
# Turn it into a list that random.choice() can use.
words_list = list(words_json_values)
# Get a random entry from the list.
picked_word = random.choice(words_list)
# Print it so we can see it.
print(picked_word)
If you want it all on the same line, here you go.
# Necessary imports.
import json, random
# The code to do it.
print(random.choice(list(json.loads(open("words.txt", "r").read()).values())))
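Note that sampling from .values() prints a definition rather than the word itself. If the word is what is wanted, sampling from the keys is one option. A minimal sketch, with two entries from the fixed file inlined as a string so that it runs without any file on disk:

```python
import json
import random

# Two entries from the fixed file, inlined so the sketch is self-contained.
text = (
    '{"happy": "feeling or showing pleasure or contentment.",'
    ' "today": "on or in the course of this present day."}'
)
words = json.loads(text)

# list(words) is the list of keys, so this picks a random word.
word = random.choice(list(words))
print(word, "-", words[word])
```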
I need some help parsing a JSON file. I've tried a couple of different ways to get the data I need. Below is a sample of the code and also a section of the JSON data, but when I run the code I get the error listed above.
There are 500K lines of text in the JSON, and it first fails about 1400 lines in; I can't see anything in that section to indicate why.
I've run it successfully by only checking blocks of JSON up to the first 1400 lines, and I've used a different parser and got the same error.
I'm debating whether it's an error in the code, an error in the JSON, or a result of the JSON being made up of different kinds of data, as some (like the example below) is for a forklift and some is for fixed machines, but it is all structured just like below.
All help is sincerely appreciated.
Code:
import json
file_list = ['filename.txt'] #insert filename(s) here
for x in range(len(file_list)):
    with open(file_list[x], 'r') as f:
        distros_dict = json.load(f)
#list the headlines to be parsed
for distro in distros_dict:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0], distro['smoothedPosition'][1], distro['smoothedPosition'][2])
And here is a section of the JSON:
{
"id": "b4994c877c9c",
"name": "Trukki_0001",
"areaId": "Tracking001",
"areaName": "Ajoneuvo",
"color": "#FF0000",
"coordinateSystemId": "CoordSys001",
"coordinateSystemName": null,
"covarianceMatrix": [
0.47,
0.06,
0.06,
0.61
],
"position": [
33.86,
33.07,
2.15
],
"positionAccuracy": 0.36,
"positionTS": 1489363199493,
"smoothedPosition": [
33.96,
33.13,
2.15
],
"zones": [
{
"id": "Zone001",
"name": "Halli1"
}
],
"direction": [
0,
0,
0
],
"collisionId": null,
"restrictedArea": "",
"tagType": "VEHICLE_MANNED",
"drivenVehicleId": null,
"drivenByEmployeeIds": null,
"simpleXY": "33|33",
"EventProcessedUtcTime": "2017-03-13T00:00:00.3175072Z",
"PartitionId": 1,
"EventEnqueuedUtcTime": "2017-03-13T00:00:00.0470000Z"
}
The actual problem was that the JSON file was encoded in UTF, not ASCII. If you change the encoding using something like Notepad++, the problem is solved.
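If re-saving the file is not convenient, the encoding can also be given when the file is opened. The sketch below assumes the file is UTF-8 with a byte order mark, which the answer above does not confirm; the utf-8-sig codec skips the mark if it is present and otherwise behaves like plain UTF-8. The temporary file is only a stand-in for the real data file.

```python
import json
import os
import tempfile

# Write a small UTF-8 file that starts with a BOM, standing in for the real file.
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write('[{"name": "Trukki_0001", "positionTS": 1489363199493}]')

# Opening with utf-8-sig strips the BOM before json.load sees the text.
with open(path, "r", encoding="utf-8-sig") as f:
    distros = json.load(f)

print(distros[0]["name"])
```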
Using the file provided, I got it to work by changing "distros_dict" to a list. In your code you assign to distros_dict rather than add to it, so if more than one file were read, it would hold only the last one.
This is my implementation
import json
file_list = ['filename.txt'] #insert filename(s) here
distros_list = []
for x in range(len(file_list)):
    with open(file_list[x], 'r') as f:
        distros_list.append(json.load(f))
#list the headlines to be parsed
for distro in distros_list:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0], distro['smoothedPosition'][1], distro['smoothedPosition'][2])
You will be left with a list of dictionaries.
I'm guessing that your JSON is actually a list of objects, i.e. the whole stream looks like:
[
{ x:1, y:2 },
{ x:3, y:4 },
...
]
... with each element being structured like the section you provided above. This is perfectly valid JSON, and if I store it in a file named file.txt and paste your snippet between a set of [ ], thus making it a list, I can parse it in Python. Note, however, that the result will be again a Python list, not a dict, so you'd iterate like this over each list-item:
import json
import pprint
file_list = ['file.txt']
# Just iterate over the file-list like this, no need for range()
for x in file_list:
    with open(x, 'r') as f:
        # distros is a list!
        distros = json.load(f)
    for distro in distros:
        print(distro['name'])
        print(distro['positionTS'])
        print(distro['smoothedPosition'][1])
        pprint.pprint(distro)
Edit: I moved the second for-loop into the loop over the files. This seems to make more sense, as otherwise you'll iterate once over all files, store the last one in distros, then print elements only from the last one. By nesting the loops, you'll iterate over all files, and for each file iterate over all elements in the list. Hat-tip to the commenters for pointing this out!
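When a large file fails somewhere in the middle, the exception itself reports where the parser stopped. A small sketch, assuming Python 3.5 or later (where json.JSONDecodeError exists); describe_json_error is a hypothetical helper name, and the two-object string is only an illustration of a file that holds several top-level objects without enclosing [ ].

```python
import json

def describe_json_error(text):
    """Return None if text parses, else the line, column and message of the failure."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions in the original text.
        return "line {}, column {}: {}".format(err.lineno, err.colno, err.msg)
    return None

# Two objects back to back, not wrapped in a list.
print(describe_json_error('{"a": 1}\n{"b": 2}'))
```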
How can I convert this text file to json? Ultimately, I'll be inserting the json blobs into a NoSQL database, but for now I plan to parse the text files and build a python dict, then dump to json.
I think there has to be a way to do this with a dict comprehension that I'm just not seeing/following (I'm new to python).
Example of a file:
file_1.txt
[namespace1] => metric_A = value1
[namespace1] => metric_B = value2
[namespace2] => metric_A = value3
[namespace2] => metric_B = value4
[namespace2] => metric_B = value5
Example of dict I want to build to convert to json:
{ "file1" : {
    "namespace1" : {
        "metric_A" : "value1",
        "metric_B" : "value2"
    },
    "namespace2" : {
        "metric_A" : "value3",
        "metric_B" : ["value4", "value5"]
    }
  }
}
I currently have this working, but my code is a total mess (and much more complex than this example, with clean-up etc.). I'm basically going line by line through the file, building a python dict. I check each namespace for existence in the dict; if it exists, I check the metric. If the metric exists already, I know I have duplicates and need to convert the value to an array that contains the existing value and my new value(s). There has to be a simpler, cleaner way.
import glob
import json
answer = {}
for fname in glob.glob('file_*.txt'): # loop over all filenames
    answer[fname] = {}
    with open(fname) as infile:
        for line in infile:
            line = line.strip()
            if not line: continue
            splits = line.split()[::2]
            splits[0] = splits[0][1:-1]
            namespace, metric, value = splits # all the values in the line that we're interested in
            # setdefault stores the new inner dict; .get(namespace, {}) would throw it away
            answer[fname].setdefault(namespace, {})[metric] = value # populate the dict (a repeated metric overwrites the earlier value)
required_json = json.dumps(answer) # turn the dict into proper JSON
You can use regex for that. re.findall(r'\w+', line) will find all the text groups you are after; the rest is saving them in a dictionary of dictionaries. The simplest way to do that is to use defaultdict from collections.
import re
from collections import defaultdict
answer = defaultdict(lambda: defaultdict(list))
with open('file_1.txt', 'r') as f:
    for line in f:
        namespace, metric, value = re.findall(r'\w+', line)
        answer[namespace][metric].append(value)
Since we know that we expect exactly 3 alphanumeric groups, we assign them to 3 variables, i.e. namespace, metric, value. Finally, the outer defaultdict returns a new defaultdict the first time we see a namespace, and the inner defaultdict returns an empty list for the first append, making the code more compact.
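With this approach every metric ends up as a list, even when it occurred only once. To get the shape asked for in the question, the single-element lists can be unwrapped before dumping. A sketch under that assumption, with the sample lines inlined instead of read from file_1.txt and "file_1" used as an illustrative top-level key:

```python
import json
import re
from collections import defaultdict

# Sample lines from the question, inlined so the sketch runs on its own.
lines = [
    "[namespace1] => metric_A = value1",
    "[namespace1] => metric_B = value2",
    "[namespace2] => metric_A = value3",
    "[namespace2] => metric_B = value4",
    "[namespace2] => metric_B = value5",
]

answer = defaultdict(lambda: defaultdict(list))
for line in lines:
    namespace, metric, value = re.findall(r"\w+", line)
    answer[namespace][metric].append(value)

# Keep a list only where a metric occurred more than once.
result = {
    ns: {m: vals[0] if len(vals) == 1 else vals for m, vals in metrics.items()}
    for ns, metrics in answer.items()
}
print(json.dumps({"file_1": result}, indent=4))
```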
I made a big mistake when I chose the way to dump my data.
Now I have a text file that consists of
{ "13234134": ["some", "strings", ...]}{"34545345": ["some", "strings", ...]} ..so on
How can I read it into python?
edit:
I have tried json.
When I manually add curly braces at the beginning and end of the file, I get "ValueError: Expecting property name:", maybe because the string "13234134" is invalid for json. I do not know how to avoid it.
edit1
with open('new_file.txt', 'w') as outfile:
    for index, user_id in enumerate(users):
        json.dump(get_user_tweets(user_id), outfile)
It looks like what you have is an undelimited stream of JSON objects. As if you'd called json.dump over and over on the same file, or ''.join(json.dumps(…) for …). And, in fact, the first one is exactly what you did. :)
So, you're in luck. JSON is a self-delimiting format, which means you can read up to the end of the first JSON object, then read from there up to the end of the next JSON object, and so on. The raw_decode method essentially does the hard part.
There's no stdlib function that wraps it up, and I don't know of any library that does it, but it's actually very easy to do yourself:
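import json
def loads_multiple(s):
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(s):
        # raw_decode() returns (obj, end_index), in that order
        obj, pos = decoder.raw_decode(s, pos)
        yield obj
        # raw_decode() does not skip whitespace, so step over any before the next object
        while pos < len(s) and s[pos].isspace():
            pos += 1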
So, instead of doing this:
obj = json.loads(s)
do_stuff_with(obj)
… you do this:
for obj in loads_multiple(s):
    do_stuff_with(obj)
Or, if you want to combine all the objects into one big list:
objs = list(loads_multiple(s))
Consider simply rewriting it to something that is valid json. If indeed your bad data only contains the format that you've shown (a series of json structures that are not comma-separated), then just add commas and square brackets:
import json
with open('/tmp/sto/junk.csv') as f:
    data = f.read()
print(data)
s = "[ {} ]".format(data.strip().replace("}{", "},{"))
print(s)
data = json.loads(s)
print(type(data))
Output:
{ "13234134": ["some", "strings"]}{"34545345": ["some", "strings", "like", "this"]}
[ { "13234134": ["some", "strings"]},{"34545345": ["some", "strings", "like", "this"]} ]
<class 'list'>
If I have many of these in a text file;
<Vertex> 0 {
-0.597976 -6.85293 8.10038
<UV> { 0.898721 0.149503 }
<RGBA> { 0.92549 0.92549 0.92549 1 }
}
...
<Vertex> 1507 {
12 -5.3146 -0.000708352
<UV> { 5.7487 0.180395 }
<RGBA> { 0.815686 0.815686 0.815686 1 }
}
How can I read through the text file and add 25 to the first number in the second row? (-0.597976 in Vertex 0)
I have tried splitting the second line's text at each space with .split(' '), then using float() on the third element, and adding 25, but I don't know how to implicitly select the line in the text file.
Try to ignore the lines that start with "<", for example:
L=["<Vertex> 0 {",
   "-0.597976 -6.85293 8.10038",
   "<UV> { 0.898721 0.149503 }",
   "<RGBA> { 0.92549 0.92549 0.92549 1 }"
]
for l in L:
    if not l.startswith("<"):
        print(l.split(' ')[0])
Or if you read your data from a file:
with open("test.txt", "r") as f:
    for line in f:
        line = line.strip().split(' ')
        try:
            print(float(line[0]) + 25)
        except ValueError:
            # not a line that starts with a number
            pass
The hard way is to use Python Lex/Yacc tools.
The hardest (did you expect "easy"?) way is to write a custom function that recognizes tokens (the tokens would be <Vertex>, numbers, braces, <UV> and <RGBA>; the token separators would be spaces).
I'm sorry, but what you're asking for is a mini language if you cannot guarantee that the entries respect the CRs and LFs.
Another ugly (and even harder!) way, since that mini language has no recursion, is to use regex. But the regex solution would be just as long and ugly (trust me: a really long one).
Try the Python Lex/Yacc library, since what you need is to parse a language; even though regex is possible here, you'll end up with an ugly and unmaintainable one. You have to learn the basics of language parsing to use it.
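For the narrow task in the question (changing only the first number after each <Vertex> header), a fairly short pattern may be enough, as long as nothing else in the file looks like a <Vertex> header. The sketch below is an illustration rather than a full parser for the format; shift_first_coordinate and VERTEX_X are hypothetical names, and the numbers are rewritten with Python's default float formatting.

```python
import re

# A <Vertex> header, its index, the opening brace, then the first number.
VERTEX_X = re.compile(
    r"(<Vertex>\s+\d+\s*\{\s*)([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)

def shift_first_coordinate(text, offset=25):
    """Add offset to the first number after every <Vertex> header in text."""
    def bump(match):
        return match.group(1) + repr(float(match.group(2)) + offset)
    return VERTEX_X.sub(bump, text)

sample = (
    "<Vertex> 1507 {\n"
    "  12 -5.3146 -0.000708352\n"
    "  <UV> { 5.7487 0.180395 }\n"
    "}\n"
)
print(shift_first_coordinate(sample))
```

Because \s also matches newlines, the pattern does not depend on the numbers being on their own line.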
If the vertex coordinates will always be on the line after <Vertex>, you can look for that as a marker, then read the next line. If you read the second line, .strip() leading and trailing whitespace, then .split() by the space character, you will have a list of your three coordinates, like so (assuming you have read the line into a string variable line):
>>> line = line.strip()
>>> vertices = line.split(' ')
>>> vertices
['-0.597976', '-6.85293', '8.10038']
What now? Call float() on the first item in your list, then add 25 to the result.
The real challenge here is finding the <Vertex> marker and reading the subsequent line. This looks like a homework assignment, so I'll let you puzzle that out a bit first!
If your file is well-formatted, then you should be able to parse through the file pretty easily. Assuming <Vertex> is always on the line preceding a line with just the three numbers, you could do this:
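# fileName and newFileName are placeholders for the input and output paths
newFile = []
with open(fileName) as file:
    for line in file:
        newFile.append(line)
        if '<Vertex>' in line:
            # the coordinates are on the line right after the marker
            line = next(file)
            entries = line.strip().split()
            entries[0] = str(25+float(entries[0]))
            # put back the newline that strip() removed
            line = ' ' + ' '.join(entries) + '\n'
            newFile.append(line)
with open(newFileName, 'w') as fileToWrite:
    fileToWrite.writelines(newFile)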
This syntax looks like a Panda3D .egg file.
I suggest you use Panda's file load, modify, and save functions to work on the file safely; see https://www.panda3d.org/manual/index.php/Modifying_existing_geometry_data
Something like:
INPUT = "path/to/myfile.egg"
def processGeomNode(node):
    # something using modifyVertexData()
    pass
def main():
    # loader is the global that Panda3D's ShowBase sets up
    model = loader.loadModel(INPUT)
    for nodePath in model.findAllMatches('**/+GeomNode').asList():
        processGeomNode(nodePath.node())
if __name__=="__main__":
    main()
It is a Panda3D .egg file. The easiest and most reliable way to modify data in it is by using Panda3D's EggData API to parse the .egg file, modify the desired value through these structures, and write it out again, without loss of data.