Python - Extracting data from text to use

Probably a basic question to most.
I am looking to get some specific values that follow certain words within a large amount of text, like this: https://character-service.dndbeyond.com/character/v5/character/00000001
In this instance, if I wanted to take the value after baseHitPoints and also the value after removedHitPoints, how would I go about this in the simplest way? The aim is to take these values and any others, even potentially text after a certain word, and store them as strings/integers to use in calculations.
I am also using BeautifulSoup to get this information, if that affects anything in your answers.
Thanks in advance!
Tried a few different approaches.

Learning JSON document structure
First of all, you don't actually need BeautifulSoup here. The URL points to a JSON document, which is easy to navigate and extract values from once you know its structure.
To examine the structure of a JSON document, I can recommend browser extensions that prettify JSON, like JSON lite for Opera. This is not the best approach if the document is very large: the browser will struggle to render that much text, and the extension will struggle to format it. If the document is large, use the second approach.
You can download the JSON document and open it in a text editor with a JSON-prettifying plugin (for example Notepad++ with the JSFormat plugin).
You can also examine the structure directly in Python using pprint, the standard library's pretty-printing module (again, not the best option for large documents).
import pprint
document = {"data": {"actions": {"items": [1, 2, 3], "colors": ['red', 'green', 'blue'], "items1": [1, 2, 3], "items2": {"numbers": [1, 2, 3], "characters": ['a', 'b', 'c']}}, "some_other_data": ['one', 'two', 'three']}, "new_data": [11, 22, 33, 44, 55]}  # named "document" to avoid shadowing the json module
pprint.pprint(document)
The output is the same data laid out by nesting level, so you can easily see the path to a given value:
{'data': {'actions': {'colors': ['red', 'green', 'blue'],
                      'items': [1, 2, 3],
                      'items1': [1, 2, 3],
                      'items2': {'characters': ['a', 'b', 'c'],
                                 'numbers': [1, 2, 3]}},
          'some_other_data': ['one', 'two', 'three']},
 'new_data': [11, 22, 33, 44, 55]}
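The standard json module can produce a similar view: json.dumps with an indent argument re-serializes the data with one key or list element per line. A minimal sketch, using a small made-up dictionary (the name sample and its contents are illustrative, not part of the original answer):

```python
import json

# Made-up sample data, shaped like the example above
sample = {"data": {"items": [1, 2, 3], "colors": ["red", "green"]}, "new_data": [11, 22]}

# indent=2 puts every key and list element on its own line;
# sort_keys=True orders keys alphabetically, as pprint does by default
pretty = json.dumps(sample, indent=2, sort_keys=True)
print(pretty)
```

Unlike pprint, the result is itself valid JSON, so it can be saved to a file and opened in any JSON viewer.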
Retrieving data from your URL
As we can see from the JSON document at your URL, the path to baseHitPoints is data -> baseHitPoints, and the path to removedHitPoints is data -> removedHitPoints.
Knowing that, we can retrieve those values with the requests library:
import requests
URL = "https://character-service.dndbeyond.com/character/v5/character/00000001"
req = requests.get(URL)
json_data = req.json()
baseHitPoints = json_data['data']['baseHitPoints']
removedHitPoints = json_data['data']['removedHitPoints']
print(f"{baseHitPoints = }\n{removedHitPoints = }")
Output:
baseHitPoints = 8
removedHitPoints = 2
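If the path to a key is not known in advance, one option is to search the parsed document recursively. The sketch below is only an illustration: find_key is a hypothetical helper, and json_data is a hand-written stand-in for req.json() in which only the two hit-point values come from the output above.

```python
def find_key(node, key):
    """Yield every value stored under `key`, at any depth of nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Stand-in for req.json(); apart from the two hit-point values,
# this structure is invented for illustration
json_data = {
    "success": True,
    "data": {"baseHitPoints": 8, "removedHitPoints": 2, "classes": [{"level": 1}]},
}

base_hit_points = next(find_key(json_data, "baseHitPoints"))
removed_hit_points = next(find_key(json_data, "removedHitPoints"))
print(base_hit_points - removed_hit_points)  # 6
```

The values come back as ordinary Python ints and strings, so they can go straight into calculations. If the same key appears in several places, list(find_key(...)) collects all of them.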

What you have there is JSON, not really a standard string. Because that's JSON you can get the data out of there easily by doing something like this:
import json
myData = json.loads(myJSON)  # myJSON is the text content of the link you provided (https://character-service.dndbeyond.com/character/v5/character/00000001); json.loads parses a string, json.load reads from a file object
baseHitPoints = myData["data"]["baseHitPoints"]
removedHitPoints = myData["data"]["removedHitPoints"]
The values of those variables in my example using your JSON:
baseHitPoints = 8
removedHitPoints = 2
To make your life a little easier going forward, I'd look for a tutorial on how to parse JSON.
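The two entry points of the json module are easy to mix up: json.loads takes text that is already in memory, while json.load takes an open file (or any object with a read() method). A short sketch, where io.StringIO stands in for a real file and the JSON text is a cut-down version of the document discussed above:

```python
import io
import json

text = '{"data": {"baseHitPoints": 8, "removedHitPoints": 2}}'

# json.loads parses a str (or bytes) that is already in memory
from_string = json.loads(text)

# json.load reads from a file-like object; io.StringIO stands in for an open file
from_file = json.load(io.StringIO(text))

print(from_string == from_file)  # True
```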

Related

Store f-string with function calls in a file

I'm storing f-strings with function calls in a separate file (with lots of variables).
I am writing a script that has hundreds of variables which are then loaded into an HTML table. Some of the contents in the HTML table require function calls.
This works:
def add_one(a):
    return a + 1
a = 1
s = f"A is {a} and next comes {add_one(a)}"
print(s)
When I store the template in a file, I can format it with **locals(), and this works as long as s.txt only references variables.
Contents of s.txt:
A is {a}
Contents of script that works:
a = 1
print(open('s.txt').read().format(**locals()))
However, when I try to call functions, it does not work:
Contents of s.txt:
A is {a} and next comes {add_one(a)}
Contents of script that does not work:
def add_one(a):
    return a + 1
a = 1
print(open('s.txt').read().format(**locals()))
What can I do to make it work (given my actual case is hundreds of function calls and not this simple 2 variable example)?
In this example it should result in A is 1 and next comes 2.
You might want to consider using a templating language rather than f-strings if you have a complex HTML table with hundreds of variables. e.g. Jinja2.
For simplicity I've stored the a value in a dictionary as this then simplifies passing it to the Jinja2 render and also converting it to JSON for storing it in a file.
Here is your example using Jinja2 templates and storing the data to a json file:
import json
from pathlib import Path
import jinja2
json_file = Path('/tmp/test_store.json')
jinja_env = jinja2.Environment()
# Set variable values
values = {'a': 3}
# Save to json file
json_file.write_text(json.dumps(values))
# Read from json file to dictionary with new variable name
read_values = json.loads(json_file.read_text())
def add_one(a):
    return a + 1
# Add custom filter to jinja environment
jinja_env.filters['add_one'] = add_one
# Define template
template = jinja_env.from_string("A is {{a}} and next comes {{a | add_one}}")
# Print rendered template
print(template.render(read_values))
This gave the output of:
A is 3 and next comes 4
The JSON file is the following:
{"a": 3}
As mentioned in the discussion e.g. here, what you want does not really work in any simple way. There is one obvious workaround: storing an f-string (e.g. f"A is {a} and next comes {add_one(a)}") in your text file and then eval'ing it:
with open('s.txt', 'r') as f:
    print(eval(f.read()))  # A is 1 and next comes 2
Of course, all the usual warnings about shooting yourself in the foot apply here, but your problem definition sounds exactly like this use case. You can try sandboxing your functions and whatnot, but it generally does not work well. I would say it is still a viable use case for homebrew automation, but it has a massive potential for backfiring, and the only reason I am suggesting it is because alternative solutions are likely to be about as dangerous.
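For comparison, here is a sketch of a variant that avoids eval. It assumes the template file can be changed to contain only plain named placeholders, so that each function call is made in the script and its result is passed in under a name. The placeholder name next_a is made up for this example.

```python
def add_one(a):
    return a + 1


a = 1

# What s.txt would contain under this assumption: placeholders only, no calls
template = "A is {a} and next comes {next_a}"

# str.format only looks names up; "{add_one(a)}" would raise KeyError('add_one(a)')
values = {"a": a, "next_a": add_one(a)}
result = template.format_map(values)
print(result)  # A is 1 and next comes 2
```

The cost is one dictionary entry per function call, which may be tedious with hundreds of calls, but the template file can then never execute code.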
Use serialization and deserialization to store data
import json
data = {
    "a": 1,
    "b": 2,
    "name": "Jack",
    "bunch_of_numbers": [1, 2, 3, 5, 6]
}
file_name = "s.txt"
with open(file_name, 'w') as file:
    file.write(json.dumps(data))  # serialization
with open(file_name, 'rb') as file:
    data = json.load(file)  # de-serialization
print(data)
Output:
{'a': 1, 'b': 2, 'name': 'Jack', 'bunch_of_numbers': [1, 2, 3, 5, 6]}

How to convert this data structure of JSON to python dict?

I have parsed some JSON data to dict in python3, using json module.
In the result dict, some data remains in the form of string, similar to
's:4:"name";s:5:"value";s:5:"array";a:4:{etc...}'
What is the proper name of this, and how can I further convert to a dict, like
{"name":"value", "array": [etc...]}
One of the simplest ways is to use json:
import json
str_dict = '{"one": "three", "two": "four"}'  # JSON requires double-quoted strings
new_dict = json.loads(str_dict)
Edit: You can also use ast:
import ast
str_dict = "{'five': 'seven', 'six': 'eight'}"
new_dict = ast.literal_eval(str_dict)
Edit: You can also use eval(), however the docs recommend using literal_eval:
str_dict = "{'nine': 'ten', 'eleven': 'twelve'}"
new_dict = eval(str_dict)
There are more ways out there on the internet, however these are what I consider to be the simplest of them.
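The json and ast approaches accept different syntaxes, which matters when choosing between them: json.loads only understands JSON (double-quoted strings), while ast.literal_eval understands Python literals. A small sketch of the difference, using made-up strings:

```python
import ast
import json

python_literal = "{'one': 'three', 'two': 'four'}"  # single quotes: Python syntax
json_text = '{"one": "three", "two": "four"}'       # double quotes: JSON syntax

# ast.literal_eval accepts Python literals and, unlike eval, never runs code
print(ast.literal_eval(python_literal))

# json.loads accepts only JSON, so the single-quoted form is rejected
try:
    json.loads(python_literal)
except json.JSONDecodeError as exc:
    print("not valid JSON:", exc)

print(json.loads(json_text))
```

Neither function handles the s:4:"name";... format shown in the question; this only illustrates the two approaches named in the answer.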

Alternate way to read a file as json using python

I have a file that contains information similar to the example shown below. I would like to know if there is a way to read this file using python and use it as json. I am aware of how to open it and process the text but is there another way?
Example:
key={{name='foo', etc..},{},{},{}}
If you want your content to be treated as json you need to have a valid json syntax:
{"key":[{"name": "foo"},{},{},{}]}
Then you can use json library to convert it to a dictionary:
>>> import json
>>> json.loads('{"key":[{"name": "foo"},{},{},{}]}')
{u'key': [{u'name': u'foo'}, {}, {}, {}]}
Additionally, if your file content cannot be changed to a correct json syntax, then you will need to develop a parser for your specific case.
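To give an idea of what such a parser could look like, here is a rough sketch for a cleaned-up version of the example line. It is an assumption-heavy illustration, not a general solution: it only handles nested braces, quoted strings, numbers, true/false and name=value pairs, and all function names are made up.

```python
import re

# One token: punctuation, a quoted string, a number, or a bare name
TOKEN = re.compile(r"""
    \s*(?:
        (?P<punct>[{}=,])
      | '(?P<sq>[^']*)'
      | "(?P<dq>[^"]*)"
      | (?P<num>-?\d+(?:\.\d+)?)
      | (?P<name>[A-Za-z_]\w*)
    )
""", re.VERBOSE)


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    tokens.append(("end", ""))  # sentinel so the parser never runs off the list
    return tokens


def parse_value(tokens, i):
    """Parse one value starting at tokens[i]; return (value, next_index)."""
    kind, text = tokens[i]
    if (kind, text) == ("punct", "{"):
        return parse_table(tokens, i + 1)
    if kind in ("sq", "dq"):
        return text, i + 1
    if kind == "num":
        return (float(text) if "." in text else int(text)), i + 1
    if kind == "name" and text in ("true", "false"):
        return text == "true", i + 1
    raise ValueError(f"unexpected token {text!r}")


def parse_table(tokens, i):
    """Parse the inside of {...}: positional items give a list, name=value pairs a dict."""
    items, fields = [], {}
    while tokens[i] != ("punct", "}"):
        if tokens[i][0] == "end":
            raise ValueError("missing closing brace")
        if tokens[i][0] == "name" and tokens[i + 1] == ("punct", "="):
            key = tokens[i][1]
            value, i = parse_value(tokens, i + 2)
            fields[key] = value
        else:
            value, i = parse_value(tokens, i)
            items.append(value)
        if tokens[i] == ("punct", ","):
            i += 1
    if items and fields:
        raise ValueError("mixed tables are not handled in this sketch")
    return (items if items else fields), i + 1


def parse_assignment(text):
    """Parse a single `name = value` line into a one-key dict."""
    tokens = tokenize(text)
    if tokens[0][0] != "name" or tokens[1] != ("punct", "="):
        raise ValueError("expected name = value")
    value, i = parse_value(tokens, 2)
    if tokens[i][0] != "end":
        raise ValueError("trailing input")
    return {tokens[0][1]: value}


parsed = parse_assignment("key={{name='foo'},{},{},{}}")
print(parsed)  # {'key': [{'name': 'foo'}, {}, {}, {}]}
```

An empty {} is ambiguous between an empty list and an empty dict; the sketch arbitrarily returns a dict. The result can be passed to json.dumps to get the valid JSON form shown above.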
After looking at a few different languages, I was able to determine that the data structure was a Lua container. I found a simple Lua-Python parser, currently available at github.com/SirAnthony/slpp, and was able to parse (decode) the Lua data structure with it. Below is an example of parsing a Lua container with slpp.py. I hope this helps anyone with a similar problem.
>>> from slpp import slpp as lua
>>> data = lua.decode('{ array = { 65, 23, 5 }, dict = { string = "value", array = { 3, 6, 4}, mixed = { 43, 54.3, false, string = "value", 9 } } }')
>>> print data
{'array': [65, 23, 5], 'dict': {'mixed': {0: 43, 1: 54.3, 2: False, 4: 9, 'string': 'value'}, 'array': [3, 6, 4], 'string': 'value'}}

How does JSON work in my python program?

So I use the Java Debugger JSON in my python program because a few months ago I was told that this was the best way of opening a text file and making it into a dictionary and also saving the dictionary to a text file. However I am not sure how it works.
Below is how I am using it within my program:
with open ("totals.txt", 'r') as f30:
    totaldict = json.load(f30)
and
with open ("totals.txt", 'w') as f29:
    json.dump(totaldict, f29)
I need to explain how it works for my project so could anyone explain for me how exactly json works when loading a text file into dictionary format and when dumping contents into the text file?
Thanks.
Edit: please don't just post links to other articles as I have tried to look at these and they have offered me not much help as they are not in my context of using JSON for dictionaries and a bit overwhelming as I am only a beginner.
JSON is JavaScript Object Notation. It works in Python like it does anywhere else, by giving you a syntax for describing arbitrary things as objects.
Most JSON is primarily composed of JavaScript arrays, which look like this:
[1, 2, 3, 4, 5]
Or lists of key-value pairs describing an object, which look like this:
{"key1": "value1", "key2": "value2"}
These can also be nested in either direction:
[{"object1": "data1"}, {"object2": "data2"}]
{"object1": ["list", "of", "data"]}
Naturally, Python can very easily treat these types as lists and dicts, which is exactly what the json module tries to do.
>>> import json
>>> json.loads('[{"object1": "data1"}, {"object2": "data2"}]')
[{'object1': 'data1'}, {'object2': 'data2'}]
>>> json.dumps(_)
'[{"object1": "data1"}, {"object2": "data2"}]'
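The same conversion happens with files: json.dump turns the dictionary into JSON text and writes it, and json.load reads the text back and rebuilds the dictionary. A minimal sketch of that round trip; the dictionary contents are invented, and a temporary directory stands in for wherever totals.txt really lives:

```python
import json
import os
import tempfile

totaldict = {"apples": 3, "pears": 5}

# A throwaway file path standing in for totals.txt
path = os.path.join(tempfile.mkdtemp(), "totals.txt")

# json.dump converts the dict to JSON text and writes that text to the file
with open(path, "w") as f:
    json.dump(totaldict, f)

# The file now holds ordinary text
with open(path) as f:
    raw_text = f.read()
print(raw_text)  # {"apples": 3, "pears": 5}

# json.load reads that text and rebuilds the equivalent Python objects
with open(path) as f:
    loaded = json.load(f)
print(loaded == totaldict)  # True
```

One thing to watch for: JSON object keys are always strings, so a dictionary with integer keys comes back with string keys after a round trip.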
Try this: Python Module of the Week
The json module provides an API similar to pickle for converting in-memory Python objects to a serialized representation known as JavaScript Object Notation (JSON). Unlike pickle, JSON has the benefit of having implementations in many languages (especially JavaScript)
Encoding and Decoding Simple Data Types
The encoder understands Python’s native types by default (string, unicode, int, float, list, tuple, dict).
import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
data_string = json.dumps(data)
print 'JSON:', data_string
Values are encoded in a manner very similar to Python’s repr() output.
$ python json_simple_types.py
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
JSON: [{"a": "A", "c": 3.0, "b": [2, 4]}]
Encoding, then re-decoding may not give exactly the same type of object.
import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
data_string = json.dumps(data)
print 'ENCODED:', data_string
decoded = json.loads(data_string)
print 'DECODED:', decoded
print 'ORIGINAL:', type(data[0]['b'])
print 'DECODED :', type(decoded[0]['b'])
In particular, strings are converted to unicode and tuples become lists.
$ python json_simple_types_decode.py
ENCODED: [{"a": "A", "c": 3.0, "b": [2, 4]}]
DECODED: [{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]
ORIGINAL: <type 'tuple'>
DECODED : <type 'list'>
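The quoted examples are written for Python 2 (print statements, u'' prefixes). For reference, a sketch of the same round trip in Python 3.7 or later, where every string is already unicode, so only the tuple-to-list change remains visible:

```python
import json

data = [{'a': 'A', 'b': (2, 4), 'c': 3.0}]

data_string = json.dumps(data)
decoded = json.loads(data_string)

print('ENCODED:', data_string)             # ENCODED: [{"a": "A", "b": [2, 4], "c": 3.0}]
print('DECODED:', decoded)                 # DECODED: [{'a': 'A', 'b': [2, 4], 'c': 3.0}]
print('ORIGINAL:', type(data[0]['b']))     # ORIGINAL: <class 'tuple'>
print('DECODED :', type(decoded[0]['b']))  # DECODED : <class 'list'>
```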

How to MySQL to store multi-params for HTTP POST?

HTTP Post may have multiple params such as:
http://example.com/controller/ (with params=({'country':'US'},{'city':'NYC'}))
I am developing a web spider in Python, and I have a problem tracking the differences between requests to the same URL with different params. I can load the content, but I have no idea how to store the POST params in a field of an SQLite3 table. Storing the params in a database like MySQL is easy for a system developer, but since the params vary from site to site, I prefer to store the POST params in a single field rather than in a one-to-one mapping table.
HTTP GET
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read()
HTTP POST
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()
My project configuration:
Python + SQLite3
Store the HTTP URL and POST params and track the changes.
The POST params contain multiple key-value pairs.
The stored params should be decodable back to params.
Encoding issues should be covered.
I saw multiple solutions like JSON, XML and YAML. I guess these formats are actually stored as a string (CHAR) type in SQLite, in UTF-8. But is there a handy way to convert them back to a Python tuple? Or can I encode the POST params as GET params with the + and & symbols, and decode them back to POST params?
Sorry, I am just a newbie to Python.
You can convert to and from json easily like this:
>>> import json
>>> json.dumps({'spam': 1, 'eggs': 2, 'bacon': 0})
'{"eggs": 2, "bacon": 0, "spam": 1}'
>>> json.loads('{"eggs": 2, "bacon": 0, "spam": 1}')
{u'eggs': 2, u'bacon': 0, u'spam': 1}
>>> json.dumps((1,2,3,4))
'[1, 2, 3, 4]'
>>> json.loads('[1, 2, 3, 4]')
[1, 2, 3, 4]
>>>
It is better to use JSON because it is more versatile than a home-made &-separated encoding: it supports any depth of nesting.
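Put together with SQLite, this could look roughly like the sketch below. The table name requests, its columns, and the use of a single dict for the params are all assumptions made for illustration; an in-memory database is used so nothing is written to disk.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE requests (url TEXT, params TEXT)")

url = "http://example.com/controller/"
params = {"country": "US", "city": "NYC"}

# sort_keys=True gives a stable text form, so equal params compare equal as strings
conn.execute(
    "INSERT INTO requests (url, params) VALUES (?, ?)",
    (url, json.dumps(params, sort_keys=True)),
)

row = conn.execute("SELECT params FROM requests WHERE url = ?", (url,)).fetchone()
restored = json.loads(row[0])
print(restored)  # {'city': 'NYC', 'country': 'US'}
conn.close()
```

Note that tuples are stored as JSON arrays and come back as lists, so a tuple of params needs an explicit tuple(...) call after loading if the type matters.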
I would probably go with Frost's suggestion - JSON encoding is far more robust. However, there have been situations in the past where I've been forced to go a simpler route:
>>> d = {'spam': 1, 'eggs': 2, 'bacon': 0}
>>> l = [(a +":"+str(b)) for a,b in d.items()]
>>> ','.join(l)
'eggs:2,bacon:0,spam:1'
Obviously your delimiters (, and : in this case) need to be carefully chosen, but this works in a pinch.
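The question also asks about encoding the params in GET style with & and decoding them again. In Python 3 the standard library covers both directions for flat key-value pairs; a brief sketch, reusing the params from the question:

```python
from urllib.parse import parse_qsl, urlencode

params = {'spam': 1, 'eggs': 2, 'bacon': 0}

# urlencode escapes special characters, so no delimiter needs to be chosen by hand
encoded = urlencode(params)
print(encoded)  # spam=1&eggs=2&bacon=0

# parse_qsl reverses the encoding, but every value comes back as a string
decoded = dict(parse_qsl(encoded))
print(decoded)  # {'spam': '1', 'eggs': '2', 'bacon': '0'}
```

Because the original types are lost and nesting is not supported, this suits simple form data only; JSON remains the more robust choice.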
