Force YAML to format string as a quoted string - python

I have the following string "058" that I need to dump into YAML, but when I do a dump, it gets converted into a "number". (no quotes). All other number-like strings seem to work fine.
yaml.dump({'a': '058'})
returns:
'a: 058\n'
as you will notice, the string doesn't have the quote around it. Compare to another number:
yaml.dump({'a': '057'})
returns:
"a: '057'\n"
and that one has the single quotes around the string. Every other number I have tested does the quotes except for '058'.
How do I force YAML to have the quotes around it?

The yaml library defines a set of regular expressions whose purpose is to recognise common scalar formats (e.g. octal numbers). The exact regexp that causes this behaviour is
[-+]?0[0-7_]+
To handle this problem you need to add a custom implicit resolver, but keep in mind that zero-prefixed values containing digits outside the range 0-7 will then be parsed improperly, that is, as if they were proper octal values.
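The effect of that pattern can be checked with the standard re module alone. This is only a sketch: OCTAL_BRANCH, DECIMAL_BRANCH and looks_like_int are illustrative names, and the decimal pattern is an assumption about the resolver's plain-integer branch, not a copy of the library source.

```python
import re

# The octal branch quoted above, anchored to the whole string.
OCTAL_BRANCH = re.compile(r'^[-+]?0[0-7_]+$')
# Assumed shape of the plain decimal branch: no leading zero allowed.
DECIMAL_BRANCH = re.compile(r'^[-+]?(?:0|[1-9][0-9_]*)$')


def looks_like_int(text):
    """Return True if either pattern would treat the text as an integer."""
    return bool(OCTAL_BRANCH.match(text) or DECIMAL_BRANCH.match(text))


for candidate in ('057', '058', '58'):
    print(candidate, looks_like_int(candidate))
```

A string that matches has to be quoted on dump to remain a string. '058' matches neither pattern (8 is not an octal digit, and a decimal may not start with 0), which would explain why it is emitted without quotes.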
And here's the solution:
import re
import yaml
from yaml.resolver import Resolver
# Treat any zero-prefixed run of digits as int-like, so the dumper quotes such strings
Resolver.add_implicit_resolver(
    'tag:yaml.org,2002:int',
    re.compile(r'''^([-+]?0[0-9_]+)$''', re.X),
    list('-+0123456789'))
yaml.dump({'a': '058'})
Then you'll get
"a: '058'\n"

It is a string. Quotes aren't required in YAML; the value doesn't denote an octal number.
import yaml
from yaml import CLoader as Loader
yaml.load('a: 058\n', Loader=Loader)
# {'a': '058'}
type(yaml.load('a: 058\n', Loader=Loader)['a'])
# str

Related

How to get rid of double quotes when dumping a string that includes single quotes

I am trying to add single quotes in dumping the following yaml string:
yaml_str = 'Type: modified'
But the output includes double quotes which are not required.
Here is my code:
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
data['Type'] = f"'{data['Type']}'"
yaml.dump(data, sys.stdout)
The output:
Type: "'modified'"
The expected output:
Type: 'modified'
Any ideas, please?
I tried all kinds of string formatting, nothing helped.
I also tried to add yaml.preserve_quotes = True which also didn't do any good.
Your expectation is completely wrong, so string formatting is not going to help you at all. YAML, like many other languages, needs to be able to handle scalars that have embedded quotes, and YAML has multiple ways to handle that:
if a string to be dumped into a scalar has special characters that need backslash escaping (e.g. the audible bell \a), the scalar needs to be between double quotes (and double quotes in the string escaped in the scalar)
if a string to be dumped into a scalar has no special characters, but starts with a double quote, the whole scalar can be single quoted (and any existing single quotes in the string, will need to be duplicated '' in the scalar)
If you want to force single quotes in ruamel.yaml, even if they are superfluous, you can use:
data['Type'] = ruamel.yaml.scalarstring.SingleQuotedScalarString(data['Type'])
although the much better solution would be to get rid of the program that reads your output file and requires the unnecessary quotes to be there in the first place.
Please note that having quotes in a string doesn't necessarily require the corresponding scalar to have quotes. E.g. a string that has no spaces and a quote somewhere between normal readable characters can be dumped without (extra) quotes.

Python YAML dumper single quote and double quote issue

I am reading rows from an Excel file and dumping them to a YAML file. After dumping, I found that some rows are written in single quotes, some in double quotes, and some as plain text.
Data without any special characters is created as plain text.
Data with \n character and parenthesis are created as 'Data here'
Data with special characters are created as "Data here"
I am using yaml dumper to create YAML file
with open(myprops['output'], "w") as f:
    ruamel.yaml.dump(doc, f, Dumper=ruamel.yaml.RoundTripDumper, default_flow_style=False)
How to represent all data to be in single quote - 'Data here'?
You can force the dumper to use single quotes, when the scalar can be represented
using single quoted strings by providing the default_style="'" parameter.
This is not guaranteed to get you single quotes though, single quotes cannot do
the escape sequences that double quotes have (i.e. it is not like Python) and
some values might still get double quotes.
Using ruamel.yaml's new API (where round-trip-dumping is the default):
import sys
import ruamel.yaml
data = [
"25",
"with an\n embedded newline",
"entry with single quote: (')",
42
]
yaml = ruamel.yaml.YAML()
yaml.default_style = "'"
yaml.dump(data, sys.stdout)
which gives:
- '25'
- "with an\n embedded newline"
- 'entry with single quote: ('')'
- !!int '42'
Please note that in order to recognise 42 as an integer, because of
the quotes, that scalar needs to be tagged. The same holds for the
other special types YAML can represent (float, booleans, etc.) If you
don't want that make sure all the values you dump are strings.
You can also see the one escape mechanism single quoted scalars in YAML have:
a single quote in the scalar is doubled. (And if it had been at the end of the
Python string, you would have three single quotes in a row at the end of the scalar.)
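The doubling rule can be sketched in a few lines of plain Python. The helper single_quote_scalar is hypothetical (it is not part of ruamel.yaml), and it assumes the text contains no newlines or non-printable characters, which would need double quotes instead.

```python
def single_quote_scalar(text):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


print(single_quote_scalar("entry with single quote: (')"))
print(single_quote_scalar("ends with a quote'"))
```

The first call reproduces the third entry of the output above; the second ends in three single quotes, two for the escaped quote and one closing the scalar.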
If you want consistency in your quoting, you should use double quotes, as that can represent all valid characters. Single quoted scalars in YAML can span multiple lines, so in principle it is possible to embed a newline. But there are restrictions on whitespace around the newline.
If you have a mix of string and non-string values in your input data, and you don't want to get the non-strings quoted, then you have to recurse over the data structure and replace each string x with ruamel.yaml.scalarstring.SingleQuotedScalarString(x), that is the
internal representation that ruamel.yaml uses if you specify yaml.preserve_quotes = True to distinguish single quoted input from plain/double/literal/folded scalars.
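A minimal sketch of that recursion, written so it runs without ruamel.yaml installed: wrap_strings is a hypothetical helper and Marked is a stand-in class. In real use you would pass ruamel.yaml.scalarstring.SingleQuotedScalarString as the wrap argument. It assumes plain dict/list input and leaves mapping keys untouched.

```python
def wrap_strings(node, wrap):
    """Return a copy of node with every str value passed through wrap."""
    if isinstance(node, str):
        return wrap(node)
    if isinstance(node, dict):
        # Keys are left as they are; only values are converted.
        return {key: wrap_strings(value, wrap) for key, value in node.items()}
    if isinstance(node, (list, tuple)):
        return [wrap_strings(item, wrap) for item in node]
    return node  # ints, floats, booleans and None are left alone


class Marked(str):
    """Stand-in for SingleQuotedScalarString, used only for this sketch."""


data = ["25", {"count": 42, "note": "plain text"}]
converted = wrap_strings(data, Marked)
print([type(item).__name__ for item in converted])
```

Because the non-string values are returned unchanged, they would be dumped without quotes or tags, while every wrapped string carries its own quoting style.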

Yaml load converting string to UTF8?

I have this YAML:
---
test: {"gender":0,"nacionality":"Alem\u00e3o"}
I am reading it using python 3.5 as follow:
with open('teste.yaml', 'r') as stream:
    doc = yaml.load_all(stream)
    for line in doc:
        print(line)
This is the result I get:
{'test': {'gender': 0, 'nacionality': 'Alemão'}}
But If I change " for ' in my YAML, I get this:
{'test': {'nacionality': 'Alem\\u00e3o', 'gender': 0}}
As you can see, when I use " the string Alem\\u00e3o is converted to UTF, but with ' it is not.
So I have two questions:
Why do I get different outputs when I use ' and "?
What can I do to get the output as Alem\\u00e3o when using "?
That's how the YAML data format is defined. Within double quotes, specific escape sequences are interpreted. Within single quotes, they're not.
7.3.1. Double-Quoted Style
The double-quoted style is specified by surrounding “"” indicators. This is the only style capable of expressing arbitrary strings, by using “\” escape sequences. This comes at the cost of having to escape the “\” and “"” characters.
http://yaml.org/spec/1.2/spec.html#id2787109
What can I do to get the output as Alem\u00e3o when using "?
Escape the escape character:
test: {"gender":0,"nacionality":"Alem\\u00e3o"}
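The flow mapping in the question is also valid JSON, so the standard json module can stand in to illustrate the same double-quote escape behaviour without a YAML library. This is an analogy under that assumption, not a demonstration of the YAML parser itself.

```python
import json

# Backslash-u inside double quotes is an escape sequence: the parser decodes it.
decoded = json.loads(r'{"gender":0,"nacionality":"Alem\u00e3o"}')
# Escaping the backslash itself keeps the characters \u00e3 literally.
literal = json.loads(r'{"gender":0,"nacionality":"Alem\\u00e3o"}')

print(decoded["nacionality"])
print(literal["nacionality"])
```

The raw string prefix (r'...') keeps Python from interpreting the backslashes before the parser sees them, so the only difference between the two inputs is the doubled backslash.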
Backslash escaping in YAML is only available in double-quoted scalars, not in single-quoted scalars, unquoted scalars, or (literal) block scalars.
To get the output you want, the best way is to drop the quotes altogether and use this as input:
---
test: {gender: 0, nacionality: Alem\u00e3o}
Your program however is up for some improvement.
You should never use load_all() or load() on this kind of non-tagged YAML. That is unsafe and can lead to arbitrary code being executed on your machine if you don't have complete control over the source YAML. Newer versions of ruamel.yaml will throw a warning if you don't explicitly specify the unsafe Loader as an argument. Do yourself a favour and get into the habit of using safe_load() and safe_load_all().
load_all() gives back an iterator over documents so using doc and line are misleading variable names. You should use:
import ruamel.yaml as yaml
with open('teste.yaml', 'r') as stream:
    for doc in yaml.safe_load_all(stream):
        print(doc)
or if there is always just one document in teste.yaml you can simplify that to:
import ruamel.yaml as yaml
with open('teste.yaml') as stream:
    print(yaml.safe_load(stream))
both of which will give you:
{'test': {'gender': 0, 'nacionality': 'Alem\\u00e3o'}}
Please note that it is mandatory in YAML to have a space after the : separating key and value in a mapping. Only for compatibility with JSON is it allowed to drop the space assuming the key is quoted (double and single quotes both work). So this works as input as well:
---
test: {"gender":0, 'nacionality':Alem\u00e3o}

Why does json.loads care which type of quotes are used?

In a python script I am parsing the return of
gsettings get org.gnome.system.proxy ignore-hosts
which looks like it should be properly formatted JSON
['localhost', '127.0.0.0/8']
however, when passing this output to json.loads it throws
ValueError: No JSON object could be decoded
I make the call to gsettings via:
import subprocess
proc = subprocess.Popen(["gsettings", "get", "org.gnome.system.proxy", "ignore-hosts"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout,stderr = proc.communicate()
which assigns "['localhost', '127.0.0.0/8']\n" to stdout.
Then I strip the newline and pass to json.loads:
ignore = json.loads(stdout.strip("\n"))
But, this throws a ValueError.
I've tracked the issue down to the string being defined by single-quotes or double-quotes as shown in the following snippet:
# tested in python 2.7.3
import json
ignore_hosts_works = '["localhost", "127.0.0.0/8"]'
ignore_hosts_fails = "['localhost', '127.0.0.0/8']"
json.loads(ignore_hosts_works) # produces list of unicode strings
json.loads(ignore_hosts_fails) # ValueError: No JSON object could be decoded
import string
table = string.maketrans("\"'", "'\"")
json.loads(string.translate(ignore_hosts_fails, table)) # produces list of unicode strings
Why is ignore_hosts_fails not successfully parsed by json.loads without swapping the quote types?
In case it might matter, I'm running Ubuntu 12.04 with Python 2.7.3.
From the JSON RFC 7159:
string = quotation-mark *char quotation-mark
[...]
quotation-mark = %x22 ; "
JSON strings must use " quotes.
You can parse that list as a Python literal instead, using ast.literal_eval():
>>> import ast
>>> ast.literal_eval("['localhost', '127.0.0.0/8']")
['localhost', '127.0.0.0/8']
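A small sketch that combines both parsers, trying strict JSON first and falling back to a Python literal. The name parse_string_list is hypothetical, and the sample text assumes the gsettings output shown in the question.

```python
import ast
import json


def parse_string_list(text):
    """Parse text as JSON, falling back to a Python literal."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return ast.literal_eval(text)


print(parse_string_list('["localhost", "127.0.0.0/8"]'))
print(parse_string_list("['localhost', '127.0.0.0/8']\n".strip()))
```

ast.literal_eval() only accepts literals (strings, numbers, lists, dicts and so on), so unlike eval() it does not execute arbitrary code from the command output.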
Because RFC 7159 says so. Strings in JSON documents are enclosed in double quotes.
JSON is not just JavaScript.
JSON strings are double quoted according to the spec pdf or json.org.
JSON object keys are strings.
You must use double quotes for your strings and keys (to follow the spec). Many JSON parsers will be more permissive.
From object definition:
An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs.
A name is a string. A single colon token follows each name, separating the name from the value. A single comma token separates a value from a following name.
From string definition:
A string is a sequence of Unicode code points wrapped with quotation marks (U+0022).
That U+0022 is the (double) quotation mark: ".
As said before, that is invalid JSON. To parse, there are two other possibilities: use either demjson or yaml
>>> demjson.decode(" ['localhost', '127.0.0.0/8']")
[u'localhost', u'127.0.0.0/8']
>>> yaml.load(" ['localhost', '127.0.0.0/8']")
['localhost', '127.0.0.0/8']
Yes, it only accepts valid JSON. But you can tweak the simplejson code to parse unquoted and single-quoted JSON strings.
I have given my answer on this post
Single versus double quotes in json loads in Python

Loading document as raw string in yaml with PyYAML

I want to parse yaml documents like the following
meta-info-1: val1
meta-info-2: val2
---
Plain text/markdown content!
jhaha
If I load_all this with PyYAML, I get the following
>>> list(yaml.load_all(open('index.yml')))
[{'meta-info-1': 'val1', 'meta-info-2': 'val2'}, 'Plain text/markdown content! jhaha']
What I am trying to achieve here is that the yaml file should contain two documents, and the second one is supposed to be interpreted as a single string document, more specifically any large body of text with markdown formatting. I don't want it to be parsed as YAML syntax.
In the above example, PyYAML returns the second document as a single string. But if the second document has a : character in place of the ! for instance, I get a syntax error. This is because PyYAML is parsing the stuff in that document.
Is there a way I can tell PyYAML that the second document is a just a raw string and not to parse it?
Edit: A few excellent answers there. While using quotes or the literal syntax solves the said problem, I'd like the users to be able to write the plain text without any extra cruft. Just the three -'s (or .'s) and write away a large body of plain text. Which might also include quotes too. So, I'd like to know if I can tell PyYAML to parse only one document, and give the second to me raw.
Edit 2: So, adapting agf's idea, instead of using a try/except as the second document could be valid yaml syntax,
config_content, body_content = open(filename).read().split('\n---', 1)
config = yaml.safe_load(config_content)
# Keep the second document as a raw, unparsed string
body = body_content
Thanks agf.
If you don't have control over the format of the original document, you can do:
import yaml
raw = open(filename).read()
docs = []
for raw_doc in raw.split('\n---'):
    try:
        docs.append(yaml.safe_load(raw_doc))
    except yaml.YAMLError:
        # Not valid YAML: keep this document as a raw string
        docs.append(raw_doc)
From the PyYAML docs,
Double-quoted is the most powerful style and the only style that can express any scalar value. Double-quoted scalars allow escaping. Using escaping sequences \x** and \u****, you may express any ASCII or Unicode character.
So it sounds like there is no way to represent an arbitrary scalar in the parsing if it's not double quoted.
If all you want is to escape the colon character in YAML, then enclose it within single or double quotes. Also, you can try literal style for your second document which should be treated as single scalar.
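For the case raised in the question's edit, where the body should never be parsed at all, the split can be done before any YAML library is involved. This is a sketch with a hypothetical helper, split_front_matter; it assumes the first line starting with --- after the header is the separator, and only the returned header would then be handed to a YAML loader.

```python
def split_front_matter(text, separator='\n---'):
    """Split text into (header, body) at the first document separator."""
    header, found, body = text.partition(separator)
    if not found:
        return header, ''
    # Drop the newline that follows the separator so the body starts cleanly.
    return header, body.lstrip('\n')


source = "meta-info-1: val1\nmeta-info-2: val2\n---\nPlain text: with a colon\njhaha\n"
header, body = split_front_matter(source)
print(repr(header))
print(repr(body))
```

Because the body is returned exactly as written, colons, quotes and Markdown syntax in it cannot trigger a YAML syntax error.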
