I have a lot of YAML files with a similar structure but different data. I need to parse out selected data and write it into a single CSV (Excel) file as three columns.
But I'm facing an issue with an empty key, which always gives me a "KeyError: 'port'".
My YAML file example:

base:
  server: 10.100.80.47
  port: 3306
  namePrefix: well
  user: user1
  password: kjj&%$
base:
  server: 10.100.80.48
  port:
  namePrefix: done
  user: user2
  password: fhfh#$%
In the second block I have an empty "port", and my script gets stuck at that point.
I need it so that whenever an empty key is found, nothing is written for that cell.
from asyncio.windows_events import NULL
from queue import Empty
import yaml
import csv
import glob

yaml_file_names = glob.glob('./*.yaml')
rows_to_write = []
for i, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file {} of {} file name: {}".format(
        i + 1, len(yaml_file_names), each_yaml_file))
    with open(each_yaml_file) as file:
        data = yaml.safe_load(file)
        for v in data:
            if "port" in v == "":
                data['base']['port'] = ""
        rows_to_write.append([data['base']['server'], data['base']['port'], data['server']['host'], data['server']['contex']])

with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out)
    csv_writer.writerow(["server", "port", "hostname", "contextPath"])
    csv_writer.writerows(rows_to_write)
print("Output file output_csv_file.csv created")
You are trying to access the key by subscript, e.g.
data['base']['port']
But what you want is to access it with the get method, like so:
data['base'].get('port')
This way, if the key does not exist it returns None as the default, and you can even change the default to whatever you want by passing it as the second argument.
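For instance, on a plain dict (a minimal sketch, independent of YAML):

```python
data = {'base': {'server': '10.100.80.48'}}  # no 'port' key at all

print(data['base'].get('port'))       # None (default default)
print(data['base'].get('port', ''))   # '' (custom default)
# data['base']['port'] would raise KeyError here
```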
In PyYAML, an empty element is returned as None, not an empty string.
if data['base']['port'] is None:
    data['base']['port'] = ""
A note on the YAML itself: leaving a value empty after a key (like port: in your example) is actually valid YAML; the parser loads it as null, which PyYAML turns into None, and that is why the other answers check for None. What is genuinely problematic in your file is the duplicated top-level base: key: a YAML mapping must not contain duplicate keys, and PyYAML will silently keep only the last block, so your first server is lost. If you are the creator of these YAML files, give each block a distinct key or put each server in its own file or document.
If they are provided to you by someone else, ask them to provide valid YAML files.
Then the other solutions posted should work.
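Putting the pieces together, here is a minimal sketch of the row-building step, assuming data is the dict that yaml.safe_load returns for one file (the host/contex fields from the original script are left out, since they don't appear in the sample YAML):

```python
# data as yaml.safe_load would return it for the second block:
# the empty "port:" entry comes back as None, not "".
data = {
    'base': {
        'server': '10.100.80.48',
        'port': None,
        'namePrefix': 'done',
    }
}

base = data.get('base', {})
# .get with a default, plus "or ''" to turn a None value into an empty cell
row = [
    base.get('server', ''),
    base.get('port') or '',
    base.get('namePrefix', ''),
]
print(row)  # ['10.100.80.48', '', 'done']
```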
Related
I have a yaml file as below:
server1:
  host: os1
  ip: ##.###.#.##
  path: /var/log/syslog
  file: syslog
  identityfile: /identityfile/keypair.pub
server2:
  host: os2
  ip: ##.###.#.##
  path: /var/log/syslog
  file: syslog.1
  identityfile: /identityfile/id_rsa.pub
I have a piece of code to parse the yaml and read entries.
# read data from the config yaml file
def read_yaml(file):
    with open(file, "r") as stream:
        try:
            config = yaml.load(stream)
            print(config)
        except yaml.YAMLError as exc:
            print(exc)
            print("\n")
    return config

read_yaml("config_file")
print(config)
My problems:
1. I am unable to return values and I get a "NameError: name 'config' is not defined" at the print statement called outside the function.
2. How can I iterate and read the values in my yaml file by passing only the parameters?
Ex:
print('{host}#{ip}:{path}'.format(**config['os1']))
but without the 'os1' as the yaml file may have 100s of entries
3. I ensured there are no duplicates by using sets, but I want to use a loop and store the values from my string formatting command into a variable without hard-coding 'os1', 'os2', or 'os#'.
def iterate_yaml():
    remotesys = set()
    for key, val in config.items():
        print("{} = {}".format(key, val))
    # check to ensure duplicates are removed by storing it in a set
    remotesys.add('{host}#{ip}:{path}'.format(**config['os1']))
    remotesys.add('{host}#{ip}:{path}'.format(**config['os2']))
    remotesys.add('{host}#{ip}:{path}'.format(**config['os3']))
Thanks for the help.
You get the NameError exception because the value never leaves the function's scope: read_yaml does return config, but you never assign the result when you call it. Capture the return value:

config = read_yaml("config_file")
print(config)

Check the Python documentation & tutorials for details on function return values.
2-3. You can perform a for loop using the dict.items method.
For example:

x = {'lol': 1, 'kek': 2}
for name, value in x.items():
    print(name, value)
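Applied to the question, the hard-coded 'os1'/'os2' lookups can be replaced by one loop over the parsed mapping. A sketch, using a plain dict in place of the loaded config (the IPs here are made-up placeholders):

```python
# stand-in for the dict returned by yaml.safe_load
config = {
    'server1': {'host': 'os1', 'ip': '10.0.0.1', 'path': '/var/log/syslog'},
    'server2': {'host': 'os2', 'ip': '10.0.0.2', 'path': '/var/log/syslog'},
}

# one loop instead of one .add() call per hard-coded key;
# the set still removes any duplicate entries
remotesys = {'{host}#{ip}:{path}'.format(**val) for val in config.values()}
print(sorted(remotesys))
# ['os1#10.0.0.1:/var/log/syslog', 'os2#10.0.0.2:/var/log/syslog']
```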
I have a property file "holder.txt" like this which is in key=value format. Here key is clientId and value is hostname.
p10=machineA.abc.host.com
p11=machineB.pqr.host.com
p12=machineC.abc.host.com
p13=machineD.abc.host.com
Now I want to read this file in Python and get the clientId corresponding to the host the script is running on. For example: if the script is running on machineA.abc.host.com then it should give me p10 as the clientId, and similarly for the others.
import socket, ConfigParser
hostname=socket.getfqdn()
print(hostname)
# now basis on "hostname" figure out whats the clientId
# by reading "holder.txt" file
Now, I have worked with ConfigParser, but my confusion is: how can I get the key (the clientId) based on what the hostname is? Can we do this in Python?
You need to read the holder file into memory as a dictionary, keyed by hostname:

mappings = {}
with open('holder.txt', 'r') as f:
    for line in f:
        mapping = line.split('=')
        # key the dict by hostname so the clientId can be looked up from it
        mappings[mapping[1].rstrip()] = mapping[0]
Then perform a lookup every time you want to get the clientId from the hostname (the ConfigParser import isn't needed for this):

import socket

hostname = socket.getfqdn()
clientId = mappings[hostname]
Hope that helps.
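A self-contained sketch of the whole lookup; the file contents are inlined as a string here so the example stands alone, and .get avoids a KeyError for unknown hosts:

```python
holder = """\
p10=machineA.abc.host.com
p11=machineB.pqr.host.com
p12=machineC.abc.host.com
p13=machineD.abc.host.com
"""

mappings = {}
for line in holder.splitlines():
    clientId, _, host = line.partition('=')  # split on the first '=' only
    mappings[host.strip()] = clientId.strip()

# in the real script this would come from socket.getfqdn()
hostname = 'machineA.abc.host.com'
print(mappings.get(hostname))  # p10
```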
I have a python function that will open a YAML file and read the data. The YAML file contains two api keys and a domain. I want to return each value in a dictionary so it can be used in the program. However I get the error
"list indices must be integers, not str".
Should I just make the variables global, so it doesn't have to return anything?
The code is:
def ImportConfig():
    with open("config.yml", 'r') as ymlfile:
        config = yaml.load(ymlfile)
    darksky_api = config['darksky']['api_key']
    gmaps_api = ['gmaps']['api_key']
    gmaps_domain = ['gmaps']['domain']
    return {'darksky_api_key': darksky_api, 'gmaps_api_key': gmaps_api, 'gmaps_domain': gmaps_domain}
What does it mean that the list indices must be integers? I thought curly brackets indicated a dictionary? Also is there a better way to do this?
Independent of your YAML file: if you type ['xy'] at the Python prompt you create a list with one element, and if you then index that list with another string:
['xy']['abc']
you'll get that error.
You are missing config in line 5 and 6 of your program:
def ImportConfig():
    with open("config.yml", 'r') as ymlfile:
        config = yaml.safe_load(ymlfile)
    darksky_api = config['darksky']['api_key']
    gmaps_api = config['gmaps']['api_key']
    gmaps_domain = config['gmaps']['domain']
    return {'darksky_api_key': darksky_api, 'gmaps_api_key': gmaps_api, 'gmaps_domain': gmaps_domain}
Please note that using load() in PyYAML is a security risk; for your data you should use safe_load().
I'm trying to make a script to back up a MySQL database. I have a config.yml file:
DB_HOST :'localhost'
DB_USER : 'root'
DB_USER_PASSWORD:'P#$$w0rd'
DB_NAME : 'moodle_data'
BACKUP_PATH : '/var/lib/mysql/moodle_data'
Now I need to read this file. My Python code so far:
import yaml
config = yaml.load(open('config.yml'))
print(config.DB_NAME)
And this is an error that comes up:
File "conf.py", line 4, in <module>
    print(config.DB_NAME)
AttributeError: 'str' object has no attribute 'DB_NAME'
Does anyone have an idea where I made a mistake?
There are 2 issues:
As others have said, yaml.load() turns a YAML mapping into a Python dict, not an object, so you need to use config['DB_NAME'] rather than attribute access.
The syntax in your config file is not correct: in YAML, keys are separated from values by a colon+space.
Should work if the file is formatted like this:
DB_HOST: 'localhost'
DB_USER: 'root'
DB_USER_PASSWORD: 'P#$$w0rd'
DB_NAME: 'moodle_data'
BACKUP_PATH: '/var/lib/mysql/moodle_data'
To backup your data base, you should be able to export it as a .sql file. If you're using a specific interface, look for Export.
Then, for Python's yaml parser,
DB_HOST :'localhost'
DB_USER : 'root'
DB_USER_PASSWORD:'P#$$w0rd'
DB_NAME : 'moodle_data'
BACKUP_PATH : '/var/lib/mysql/moodle_data'
is a key-value mapping. In certain languages (such as PHP, I think) these are converted to objects; in Python, though, they are converted to dicts (the yaml parser does it, the JSON parser too).
# access an object's attribute
my_obj.attribute = 'something cool'
my_obj.attribute # something cool
del my_obj.attribute
my_obj.attribute # error
# access a dict's key's value
my_dict = {}
my_dict['hello'] = 'world!'
my_dict['hello'] # world!
del my_dict['hello']
my_dict['hello'] # error
So, that's a really quick presentation of dicts, but that should get you going (run help(dict), and/or have a look at the Python docs, you won't regret it).
In your case:
config['DB_NAME'] # moodle_data
Try this:

import yaml

with open('config.yaml', 'r') as f:
    doc = yaml.safe_load(f)

To access "DB_NAME" you can use:

txt = doc["DB_NAME"]
print(txt)
I have a python bolt which parses information from a file. The bolt in question receives a file path, parses the file and then emits a number of tuples from within a for loop.
The problem is that when it runs only two tuples are emitted and then it hangs. In the logs I can see that the correct number of keys have been parsed from the file and the first two tuples have been emitted but after this there are no other logs related to the bolt. (Only metrics logs)
38640 [Thread-19] INFO backtype.storm.task.ShellBolt - ShellLog
pid:14644, name:ParseFileBolt Number of keys = 1373
38870 [Thread-21] INFO backtype.storm.daemon.task - Emitting:
ParseFileBolt default ["177328623"]
38870 [Thread-21] INFO backtype.storm.daemon.task - Emitting:
ParseFileBolt default ["177328532"]
Here is a simplified version of the code which produces the issue.
As noted in the code, if I manually enter the keys instead of parsing them from the file, they all get emitted successfully.
import gzip
import storm

class ParseFileBolt(storm.BasicBolt):
    def process(self, tup):
        file_path = tup.values[0]
        # If I parse keys from a file only two get emitted
        keys = get_keys(file_path)
        # e.g. keys = {'393548331', '177329025', '123456789'}
        # If I manually enter the keys they all get emitted
        # keys = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
        storm.logInfo("Number of keys = {0}".format(len(keys)))
        for key in keys:
            storm.emit([key])

def get_keys(file_name):
    with gzip.open(file_name, 'rt') as file:
        key_set = set()
        for line in file:
            if line.startswith("#"):
                continue
            else:
                columns = line.split("|")
                key = columns[0].strip(' \t\n\r')
                key_set.add(key)
        return key_set

ParseFileBolt().run()
The file which is being parsed is a .gz file which contains a header row starting with # followed by rows of '|' separated data.
# Header Row
177328623|columns1|column2|column3
177328532|columns1|column2|column3
123456789|columns1|column2|column3
...
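The parsing step itself can be checked in isolation. A sketch using the sample rows above as in-memory lines rather than a .gz file, applying the same logic as get_keys:

```python
lines = [
    "# Header Row\n",
    "177328623|columns1|column2|column3\n",
    "177328532|columns1|column2|column3\n",
    "123456789|columns1|column2|column3\n",
]

key_set = set()
for line in lines:
    if line.startswith("#"):   # skip the header row
        continue
    key_set.add(line.split("|")[0].strip(' \t\n\r'))

print(sorted(key_set))  # ['123456789', '177328532', '177328623']
```

This confirms the key extraction works on its own, which matches the logs showing the correct key count; the hang appears to be in the emit loop, not the parsing.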
I'm using apache-storm-0.9.4 on Windows.
The issue occurs on both local and remote clusters.
Any thoughts on what the issue might be would be greatly appreciated.