write parameter and value to yaml using python

I have the following YAML file:
heat_template_version: 2015-10-15
parameters:
  image:
    type: string
    label: Image name or ID
    default: CirrOS
  private_network_id:
    type: string
    label: Private network name or ID
  floating_ip:
    type: string
I want to add the key default to private_network_id and floating_ip (if default doesn't already exist), and set its value to one I get from the user.
How can I achieve this in Python?
The resulting YAML should look like:
heat_template_version: 2015-10-15
parameters:
  image:
    type: string
    label: Image name or ID
    default: CirrOS
  private_network_id:
    type: string
    label: Private network name or ID
    default: <private_network_id>
  floating_ip:
    type: string
    default: <floating_ip>

For this kind of round-tripping you should use ruamel.yaml (disclaimer: I am the author of that package).
Assuming your input is in a file input.yaml, run the following program:
from ruamel.yaml import YAML
from pathlib import Path
yaml = YAML()
path = Path('input.yaml')
data = yaml.load(path)
parameters = data['parameters']
# replace assigned values with user input
parameters['private_network_id']['default'] = '<private_network_id>'
parameters['floating_ip']['default'] = '<floating_ip>'
yaml.dump(data, path)
After that your file will exactly match the output you requested.
Please note that comments in the YAML file, as well as the key ordering, are automatically preserved (this is not guaranteed by the YAML specification).
If you are still using Python 2 (which has no pathlib in the standard library), use from ruamel.std.pathlib import Path, or rewrite the .load() and .dump() lines with appropriately opened, old-style file objects. E.g.
with open('input.yaml', 'w') as fp:
    yaml.dump(data, fp)

Related

How can I update a value in a YAML file?

I have this YAML file:
id: "bundle-1"
version: "1"
apiVersion: "1"
description: "Desc"
jcasc:
  - "jenkins.yaml"
plugins:
  - "plugins.yaml"
I want to modify the file by increasing the version number by 1.
I tried this code:
import sys
from ruamel.yaml import YAML
import yaml

file_name = 'bundle.yaml'
yaml.preserve_quotes = True
with open(file_name) as yml_file:
    data = yaml.safe_load(yml_file)
value = int(data['version'])
print(type(value))
value += 1
str(value)
print(type(value))
data['version'] = str(value)
data = str(data)
print(value)
with open(file_name, 'w') as yaml_file:
    yaml_file.write( yaml.dump(data, sys.stdout))
But I get this output, without double quotes and ordered differently:
id: bundle-1
apiVersion: 1
description: Desc
jcasc:
  - jenkins.yaml
plugins:
  - plugins.yaml
version: 1
Since this is an object, the order of the keys doesn't matter. Also, YAML doesn't require quotes around strings; it looks like the library you are using omits them.
This is not a problem since the YAML is valid.
A more important problem I see is that the version is not incremented. You will have to debug your code to figure out why. I don't see anything obviously wrong with what you are doing.
On a side note, this line looks strange to me:
yaml_file.write( yaml.dump(data, sys.stdout))
I don't know which yaml library you are using, but yaml.dump() returns None when you pass it a stream. You probably need to do this instead:
yaml.dump(data, yaml_file)
You should refer to the documentation for this library to learn the correct usage of the `dump()` function.
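This return-value behaviour is easy to verify with PyYAML itself: dump() returns the serialized string when called without a stream, and None when given one. A quick sketch:

```python
import io
import yaml  # PyYAML

data = {'version': '1'}

# without a stream, dump() builds and returns the YAML text
text = yaml.dump(data)

# with a stream, dump() writes to it and returns None
buf = io.StringIO()
result = yaml.dump(data, buf)

print(repr(text))   # "version: '1'\n"
print(result)       # None
```

So writing the return value to a file, as in the question, writes the string "None".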
What you need to do is delete the line import yaml, instantiate a YAML() instance, and set the preserve_quotes attribute on it. Then use the load() and dump() methods.
There is never, ever, a use for safe_load() and safe_dump() here.
import sys
import ruamel.yaml
yaml_str = """\
id: "bundle-1"
version: "1"
apiVersion: "1"
description: "Desc"
jcasc:
  - "jenkins.yaml"
plugins:
  - "plugins.yaml"
"""
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=4, offset=2)
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
data['version'] = ruamel.yaml.scalarstring.DoubleQuotedScalarString(int(data['version']) + 1)
yaml.dump(data, sys.stdout)
which gives:
id: "bundle-1"
version: "2"
apiVersion: "1"
description: "Desc"
jcasc:
  - "jenkins.yaml"
plugins:
  - "plugins.yaml"
Please note that I do not import yaml and neither should you.
This code:
with open(file_name, 'w') as yaml_file:
    yaml_file.write( yaml.dump(data, sys.stdout))
will write None to the file you open, as that is what yaml.dump() returns. Instead do:
from pathlib import Path
path = Path('bundle.yaml')
yaml.dump(data, path) # with yaml being a YAML instance

Python - Get empty key from yml file

I have a lot of YAML files with a similar structure but different data. I need to parse out selected data and put it into a single CSV (Excel) file as three columns.
But I am facing an issue with an empty key, which always gives me "KeyError: 'port'".
my yaml file example:
base:
  server: 10.100.80.47
  port: 3306
  namePrefix: well
  user: user1
  password: kjj&%$
base:
  server: 10.100.80.48
  port:
  namePrefix: done
  user: user2
  password: fhfh#$%
In the second block I have an empty "port", and my script gets stuck at that point.
I need it so that whenever an empty key is found, nothing is written for it.
from asyncio.windows_events import NULL
from queue import Empty
import yaml
import csv
import glob

yaml_file_names = glob.glob('./*.yaml')
rows_to_write = []
for i, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file {} of {} file name: {}".format(
        i+1, len(yaml_file_names), each_yaml_file))
    with open(each_yaml_file) as file:
        data = yaml.safe_load(file)
    for v in data:
        if "port" in v == "":
            data['base']['port'] = ""
    rows_to_write.append([data['base']['server'], data['base']['port'], data['server']['host'], data['server']['contex']])
with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out)
    csv_writer.writerow(["server", "port", "hostname", "contextPath"])
    csv_writer.writerows(rows_to_write)
print("Output file output_csv_file.csv created")
You are trying to access the key by subscripting, e.g.
data['base']['port']
But what you want is to access it with the get method, like so:
data['base'].get('port')
This way, if the key does not exist, it returns None by default, and you can even change the default to whatever you want by passing it as the second parameter.
In PyYAML, an empty element is returned as None, not an empty string.
if data['base']['port'] is None:
    data['base']['port'] = ""
Strictly speaking, a key with no value (like port: in your example) is still valid YAML: the missing value is parsed as null, which PyYAML loads as None. So the file does parse; the error comes from how the data is accessed afterwards. If you are the creator of those YAML files and you don't want to provide a value for the port, it is cleanest to omit the port key entirely, or keep the empty value and handle the resulting None in your code.
If the files are provided to you by someone else, the other solutions posted here should work.
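The suggestions above combine into a short sketch: PyYAML loads the empty value as None, and .get() plus a None check keeps the CSV cell empty instead of raising an error. The YAML string mirrors the question's second block, trimmed to the relevant keys:

```python
import yaml  # PyYAML

# mirrors the question's second block, with the empty port value
doc = """\
base:
  server: 10.100.80.48
  port:
  namePrefix: done
"""

data = yaml.safe_load(doc)

# an empty value loads as None; .get() also covers a missing key
port = data['base'].get('port')
if port is None:
    port = ""

row = [data['base'].get('server', ''), port]
print(row)  # ['10.100.80.48', '']
```

The row can then be appended to rows_to_write as in the original script.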

Intake: catalogue level parameters

I am reading about "parameters" here and wondering whether I can define catalogue level parameters that I can later use in the definition of the catalogue's sources?
Consider a simple YAML-catalogue with two sources:
sources:
  data1:
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data1.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
  data2:
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data2.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
Note that both data sources (data1 and data2) use the snapshot_date parameter inside the urlpath argument. With this definition I can load the data sources with:
cat = intake.open_catalog("./catalog.yaml")
cat.data1(snapshot_date="latest").read() # reads from data/latest/data1.csv
cat.data2(snapshot_date="20211029").read() # reads from data/20211029/data2.csv
Please note that cat.data1().read() will not work, since snapshot_date defaults to an empty string, so the CSV driver cannot find the path "./data//data1.csv".
I can set the default value by adding a parameters section to every (!) source, like below:
sources:
  data1:
    parameters:
      snapshot_date:
        type: str
        default: "latest"
        description: ""
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data1.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
  data2:
    parameters:
      snapshot_date:
        type: str
        default: "latest"
        description: ""
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data2.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
But this looks complicated (too much repetitive code) and a little inconvenient for the end user -- if a user wants to load all data sources from a given date, they have to explicitly provide the snapshot_date parameter to every (!) data source at initialization. IMO, it would be nice if the user could provide this value once, when initializing the catalog.
Is there a way I can define the snapshot_date parameter at the catalog level? So that:
I can set the default value (e.g. "latest" in my example) in the YAML definition of the catalog's parameters,
or pass the catalog parameter's value at runtime during the call intake.open_catalog("./catalog.yaml", snapshot_date="20211029"),
and this value is accessible in the definitions of this catalog's data sources?
cat = intake.open_catalog("./catalog.yaml", snapshot_date="20211029")
cat.data1.read() # will return data from ./data/20211029/data1.csv
cat.data2.read() # will return data from ./data/20211029/data2.csv
cat.data2(snapshot_date="latest").read() # will return data from ./data/latest/data2.csv
cat = intake.open_catalog("./catalog.yaml")
cat.data1.read() # will return data from ./data/latest/data1.csv
cat.data2.read() # will return data from ./data/latest/data2.csv
Thanks in advance
This idea has been suggested before (https://github.com/intake/intake/pull/562, https://github.com/intake/intake/issues/511), and I have an inkling that maybe https://github.com/zillow/intake-nested-yaml-catalog supports something like what you are asking.
However, I fully support adding this functionality in Intake, either based on #562, above, or otherwise. Adding it to the base Catalog and YAML file(s) catalog should be easy, but doing it so that it works for all subclasses might be tricky.
Currently, you can achieve what you want using environment variables, e.g. "{{snapshot_date}}" -> "{{env(SNAPSHOT_DATE)}}", but you would need to communicate to the user that this variable should be set. In addition, if the value is not to be used within a string, you would still need a parameter definition to cast it to the right type.
This is a bit of a hack, but consider a yaml file with this content:
global_params:
  snapshot_date: &global
    default: latest
    description: ''
    type: str
sources:
  data1:
    args:
      urlpath: '{{CATALOG_DIR}}/data/{{snapshot_date}}/data1.csv'
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
    parameters:
      snapshot_date: *global
  data2:
    args:
      urlpath: '{{CATALOG_DIR}}/data/{{snapshot_date}}/data2.csv'
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
    parameters:
      snapshot_date: *global
Now intake will accept a snapshot_date keyword argument for the individual sources, with "latest" as the default.
Some relevant answers: 1 and 2.
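The anchor/alias trick is plain YAML, so its effect can be checked independently of Intake. In a sketch with any loader, both sources receive the same parameter definition:

```python
import yaml  # PyYAML; anchors/aliases are resolved by any YAML loader

doc = """\
global_params:
  snapshot_date: &global
    default: latest
    type: str
sources:
  data1:
    parameters:
      snapshot_date: *global
  data2:
    parameters:
      snapshot_date: *global
"""

data = yaml.safe_load(doc)
d1 = data['sources']['data1']['parameters']['snapshot_date']
d2 = data['sources']['data2']['parameters']['snapshot_date']

# the alias resolves to the very same object, not a copy
print(d1 is d2)       # True
print(d1['default'])  # latest
```

This is why the duplicated parameters blocks in the catalog collapse into a single definition.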

How do I add a pipe the vertical bar (|) into a yaml file from Python

I have a task. I need to write python code to generate a yaml file for kubernetes. So far I have been using pyyaml and it works fine. Here is my generated yaml file:
apiVersion: v1
kind: ConfigMap
data:
  info:
    name: hostname.com
    aio-max-nr: 262144
    cpu:
      cpuLogicalCores: 4
    memory:
      memTotal: 33567170560
    net.core.somaxconn: 1024
...
However, when I try to create this ConfigMap, the error says that info expects a string, not a map. So I explored a bit, and it seems the easiest way to resolve this is to add a pipe after info, like this:
apiVersion: v1
kind: ConfigMap
data:
  info: | # this will translate everything in data into a string but still keep the format in yaml file for readability
    name: hostname.com
    aio-max-nr: 262144
    cpu:
      cpuLogicalCores: 4
    memory:
      memTotal: 33567170560
    net.core.somaxconn: 1024
...
This way, my ConfigMap is created successfully. My struggle is that I don't know how to add that pipe from Python code. Here I added it manually, but I want to automate the whole process.
Part of the Python code I wrote (pretend data is a dict):
content = dict()
content["apiVersion"] = "v1"
content["kind"] = "ConfigMap"
data = {...}
info = {"info": data}
content["data"] = info
# Get all contents ready. Now write into a yaml file
fileName = "out.yaml"
with open(fileName, 'w') as outfile:
    yaml.dump(content, outfile, default_flow_style=False)
I searched online and found a lot of cases, but none of them fits my needs. Thanks in advance.
The pipe makes the contained values a string. That string is not processed by YAML, even if it contains data with YAML syntax. Consequently, you will need to give a string as value.
Since the string contains data in YAML syntax, you can create the string by processing the contained data with YAML in a previous step. To make PyYAML dump the scalar in literal block style (i.e. with |), you need a custom representer:
import yaml, sys
from yaml.resolver import BaseResolver

class AsLiteral(str):
    pass

def represent_literal(dumper, data):
    return dumper.represent_scalar(BaseResolver.DEFAULT_SCALAR_TAG,
                                   data, style="|")

yaml.add_representer(AsLiteral, represent_literal)

info = {
    "name": "hostname.com",
    "aio-max-nr": 262144,
    "cpu": {
        "cpuLogicalCores": 4
    }
}

info_str = AsLiteral(yaml.dump(info))

data = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "data": {
        "info": info_str
    }
}

yaml.dump(data, sys.stdout)
By putting the rendered YAML data into the type AsLiteral, the registered custom representer will be called which will set the desired style to |.

Reading values from python object loaded from yaml file

I have a script that reads a YAML file into a python dictionary. How do I read the values and concatenate some of them to be more meaningful?
#script to load the yaml file into a python object
import yaml
from yaml import load, dump
#read data from the config yaml file
with open("config.yaml", "r") as stream:
    try:
        print(yaml.load(stream))
    except yaml.YAMLError as exc:
        print(exc)
Contents of YAML file:
os2:
  host:hostname
  ip:10.123.3.182
  path:/var/log/syslog
  file:syslog
Your YAML is inappropriately formatted. There should be a space after the : in each of the sub-items, like so:
os2:
  host: hostname
  ip: 10.123.3.182
  path: /var/log/syslog
  file: syslog
After that if you do a data = yaml.load(stream) it should pass the data correctly as such:
{'os2': {'file': 'syslog',
         'host': 'hostname',
         'ip': '10.123.3.182',
         'path': '/var/log/syslog'}}
Also, you don't need the line from yaml import load, dump since you already import yaml in its entirety.
Once the data is loaded, you can do pretty much anything you wish with it. You might want to use str.format() or f strings (Python 3.6+) as such:
'{host}#{ip}:{path}'.format(**data['os2'])
# 'hostname#10.123.3.182:/var/log/syslog'
This is called string formatting. The **data['os2'] bit essentially unpacks the dictionary data['os2'] so you can refer to its keys directly in your string, as such:
{'file': 'syslog',
 'host': 'hostname',
 'ip': '10.123.3.182',
 'path': '/var/log/syslog'}
Note that since your YAML doesn't include the key or value "ubuntu", there's no way for you to reference that string unless you update your YAML.
Also note: don't confuse dictionary keys with attributes. You cannot reference data.os2.file, as no such attribute exists on a dictionary. You can, however, reference data['os2']['file'] (note the string keys) to retrieve the stored data.
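The unpacking step works on any dict, so it can be sketched without the YAML part at all (the keys mirror the os2 block above):

```python
# the dict that yaml.safe_load would produce for the os2 block
os2 = {
    'host': 'hostname',
    'ip': '10.123.3.182',
    'path': '/var/log/syslog',
    'file': 'syslog',
}

# str.format(**os2) unpacks the dict so its keys become named placeholders
joined = '{host}#{ip}:{path}'.format(**os2)
print(joined)  # hostname#10.123.3.182:/var/log/syslog

# the equivalent f-string (Python 3.6+)
joined_f = f"{os2['host']}#{os2['ip']}:{os2['path']}"
```

Either form gives the same concatenated string.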
Your YAML is perfectly normal, and it loads as you can see here.
You have one key (os2) and as value, a multiline plain scalar that loads, following the YAML standard, as a string with a space where the YAML has newline+spaces. That value thus loads as "host:hostname ip:10.123.3.182 path:/var/log/syslog file:syslog".
Since you indicate you expect multiple values, you either have to make the value for os2 a flow-style mapping (in which case you must quote the scalars, otherwise you could e.g. not write plain URLs as scalars in valid YAML):
os2: {
  "host":"hostname",
  "ip":"10.123.3.182",
  "path":"/var/log/syslog",
  "file":"syslog"
}
or you should follow the guideline from the YAML standard that
Normally, YAML insists the “:” mapping value indicator be separated from the value by white space.
os2:
  host: hostname
  ip: 10.123.3.182
  path: /var/log/syslog
  file: syslog
You should load YAML (when using PyYAML), using yaml.safe_load() as there is absolutely no need to use yaml.load() function, which is documented to be potentially unsafe.
With either of the above in config.yaml, you can do:
import sys
import yaml
with open('config.yaml') as stream:
    d = yaml.safe_load(stream)
os2 = d['os2']
# "concatenate" host, ip and path
print('{host}#{ip}:{path}'.format(**d['os2']))
to get:
hostname#10.123.3.182:/var/log/syslog
Your YAML file is incorrectly formatted. There should be a space between each key and its value. You should have something like:
os2:
  host: hostname
  ip: 10.123.3.182
  path: /var/log/syslog
  file: syslog
yaml.load will return a dictionary whose values you can access normally.
{'os2': {'host': 'hostname', 'ip': '10.123.3.182', 'path': '/var/log/syslog', 'file': 'syslog'}}
Your code will look like this
#script to load the yaml file into a python object
import yaml
from yaml import load, dump
#read data from the config yaml file
with open("config.yaml", "r") as stream:
    try:
        config = yaml.load(stream)
        #concatenate into string
        string = f"{config['os2']['host']}#{config['os2']['ip']}:{config['os2']['path']}"
    except yaml.YAMLError as exc:
        print(exc)
