I am using the ConfigParser module like this:
from ConfigParser import ConfigParser
c = ConfigParser()
c.read("pymzq.ini")
However, the section names get botched up like this:
>>> c.sections()
['pyzmq:platform.architecture()[0']
for the pymzq.ini file, which has a ] in the section title that means something:
[pyzmq:platform.architecture()[0] == '64bit']
url = ${pkgserver:fullurl}/pyzmq/pyzmq-2.2.0-py2.7-linux-x86_64.egg
Looks like ConfigParser uses a regex that only parses section lines up to the first closing bracket, so this is as expected.
You should be able to subclass ConfigParser/RawConfigParser and change that regexp to something that better suits your case, such as ^\[(?P<header>.+)\]$, maybe.
Thanks for the pointer @AKX, I went with:
import re
from ConfigParser import ConfigParser

class MyConfigParser(ConfigParser):
    _SECT_TMPL = r"""
        \[                 # [
        (?P<header>[^$]+)  # Till the end of line
        \]                 # ]
        """
    SECTCRE = re.compile(_SECT_TMPL, re.VERBOSE)
Please let me know if you have any better versions. For reference, the source code of the original ConfigParser.
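For what it's worth, the same subclass idea works on Python 3, where the module is named configparser; a minimal sketch using the section header from the question (the `url = value` line is simplified so default interpolation stays out of the way):

```python
import configparser
import re

class MyConfigParser(configparser.ConfigParser):
    # Greedy match: capture everything between the first "[" and the
    # last "]" on the line, so brackets inside the name survive.
    SECTCRE = re.compile(r"\[(?P<header>.+)\]")

ini = """\
[pyzmq:platform.architecture()[0] == '64bit']
url = value
"""

c = MyConfigParser()
c.read_string(ini)
print(c.sections())  # ["pyzmq:platform.architecture()[0] == '64bit'"]
```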
I have a url that contains just a list. For example, the path
https://somepath.com/dev/doc/72
returns simply (no html code):
[
"A/RES/72/1",
"A/RES/72/2",
"A/RES/72/3",
"A/RES/72/4"
]
I want to take the entire contents (including the square brackets) and make this into a list. Doing it by hand, I can copy/paste as a list like this:
docs = [
"A/RES/72/1",
"A/RES/72/2",
"A/RES/72/3",
"A/RES/72/4"
]
print(docs)
['A/RES/72/1', 'A/RES/72/2', 'A/RES/72/3', 'A/RES/72/4']
I would like to pass the content of the url to the list.
I tried the following:
from urllib.request import urlopen

link = "https://somepath.com/dev/doc/72"
f = urlopen(link)
myfile = f.read()
print(myfile)
b'[\n    "A/RES/72/1", \n    "A/RES/72/2", \n    "A/RES/72/3", \n    "A/RES/72/4"\n]\n'
It's a mess with new lines and not a list.
I'm guessing I would have to parse each line, removing the \n characters, or do something like file.read().splitlines(), but that seems overly complicated for such a simple input.
I've seen many solutions that parse .csv files, read inputs from each line, etc. But nothing to deal with a list that is already made and just needs to be called. Thanks for any help and pointers.
Edit: I tried this:
import urllib.request  # the lib that handles the url stuff

link = "https://somepath.com/dev/doc/72"
a = []
for line in urllib.request.urlopen(link):
    print(line.decode('utf-8'))
    a.append(line)
a
The print command gives me something close to what I want. But the append command gives me a mess again:
[b'[\n',
b' "A/RES/72/1", \n',
b' "A/RES/72/2", \n',
b' "A/RES/72/3", \n',
b' "A/RES/72/4"\n',
b']\n']
Edit: Turns out the URL is serving JSON. The solution by fuglede below (https://stackoverflow.com/a/60119016/10764078):
import requests
docs = requests.get('https://somepath.com/dev/doc/72').json()
I'm going to do some reading on JSON.
Assuming what the site is sending you is JSON, with requests, this would be obtainable through
import requests
docs = requests.get('https://somepath.com/dev/doc/72').json()
This works with the example you provided:
import ast

ast.literal_eval(myfile.decode('utf-8'))
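If you would rather stay in the standard library and skip requests entirely, the payload is plain JSON, so json.loads handles it directly; here the literal bytes from the question stand in for f.read():

```python
import json

# The raw bytes as shown in the question, in place of the network call.
myfile = b'[\n    "A/RES/72/1", \n    "A/RES/72/2", \n    "A/RES/72/3", \n    "A/RES/72/4"\n]\n'

# json.loads parses the whole payload, newlines and all.
docs = json.loads(myfile.decode('utf-8'))
print(docs)  # ['A/RES/72/1', 'A/RES/72/2', 'A/RES/72/3', 'A/RES/72/4']
```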
I'm trying to write a sphinx extension that performs a source-level transformation, but I don't know how to actually change the output file.
My extension looks something like this:
def my_source_handler(app, docname, source):
    import re
    print('test')
    source = [re.sub("foo", "bar", source[0])]
    return source

def setup(app):
    app.connect('source-read', my_source_handler)
    app.add_config_value('my_source_handler_include', True, False)
However, when I add the module to the extensions list and build html, it prints the 'test' but does not actually change the "foo"s to "bar"s in the output HTML file.
The Sphinx documentation is a little vague, saying, "You can process the contents and replace this item to implement source-level transformations" with regards to the source argument.
The problem is I'm not sure how I'm supposed to go about replacing the source argument.
Actually, after a little digging I figured it out: you're supposed to replace the contents of the first (and only) element of source, not rebind source itself, like so:
def my_source_handler(app, docname, source):
    import re
    print('test')
    source[0] = re.sub("foo", "bar", source[0])
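The distinction matters because Sphinx keeps its own reference to that list: rebinding the parameter only repoints a local name, while assigning to source[0] mutates the list Sphinx passed in. A minimal illustration outside Sphinx:

```python
def rebind(source):
    # Rebinding the parameter points the local name at a new list;
    # the caller's list is untouched.
    source = ["bar"]

def mutate(source):
    # Assigning to an element mutates the caller's list in place.
    source[0] = "bar"

data = ["foo"]
rebind(data)
print(data)  # ['foo']
mutate(data)
print(data)  # ['bar']
```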
I am creating a script which needs to parse the YAML output that Puppet produces.
When I make a request against, for example, https://puppet:8140/production/catalog/my.testserver.no, I get some YAML back that looks something like:
--- &id001 !ruby/object:Puppet::Resource::Catalog
  aliases: {}
  applying: false
  classes:
    - s_baseconfig
  ...
  edges:
    - &id111 !ruby/object:Puppet::Relationship
      source: &id047 !ruby/object:Puppet::Resource
        catalog: *id001
        exported:
and so on... The problem is that when I do a yaml.load(yamlstream), I get an error like:
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!ruby/object:Puppet::Resource::Catalog'
in "<string>", line 1, column 5:
--- &id001 !ruby/object:Puppet::Reso ...
^
As far as I know, this &id001 part is supported in yaml.
Is there any way around this? Can I tell the yaml parser to ignore them?
I only need a couple of lines from the yaml stream, maybe regex is my friend here?
Anyone done any yaml cleanup regexes before?
You can get the yaml output with curl like:
curl --cert /var/lib/puppet/ssl/certs/$(hostname).pem --key /var/lib/puppet/ssl/private_keys/$(hostname).pem --cacert /var/lib/puppet/ssl/certs/ca.pem -H 'Accept: yaml' https://puppet:8140/production/catalog/$(hostname)
I also found some info about this on the puppet mailing list at http://www.mail-archive.com/puppet-users@googlegroups.com/msg24143.html, but I can't get it to work correctly...
I have emailed Kirill Simonov, the creator of PyYAML, to get help parsing Puppet YAML files.
He gladly helped with the following code. This code is for parsing a Puppet log, but I'm sure you can modify it to parse other Puppet YAML files.
The idea is to create the correct loader for the Ruby object, then PyYAML can read the data after that.
Here goes:
#!/usr/bin/env python
import yaml

def construct_ruby_object(loader, suffix, node):
    return loader.construct_yaml_map(node)

def construct_ruby_sym(loader, node):
    return loader.construct_yaml_str(node)

yaml.add_multi_constructor(u"!ruby/object:", construct_ruby_object)
yaml.add_constructor(u"!ruby/sym", construct_ruby_sym)

stream = open('201203130939.yaml', 'r')
mydata = yaml.load(stream)
print(mydata)
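For a quick check without a catalog file at hand, the same constructors can be exercised on an inline snippet; the snippet below is invented for illustration, and on newer PyYAML versions yaml.load needs an explicit Loader argument:

```python
import yaml

def construct_ruby_object(loader, suffix, node):
    # Treat any !ruby/object:* tagged node as a plain mapping.
    return loader.construct_yaml_map(node)

def construct_ruby_sym(loader, node):
    # Treat !ruby/sym tagged nodes as plain strings.
    return loader.construct_yaml_str(node)

yaml.add_multi_constructor("!ruby/object:", construct_ruby_object)
yaml.add_constructor("!ruby/sym", construct_ruby_sym)

# Invented stand-in for the Puppet catalog output.
snippet = """\
--- !ruby/object:Puppet::Resource::Catalog
name: my.testserver.no
classes:
  - s_baseconfig
"""

mydata = yaml.load(snippet, Loader=yaml.Loader)
print(mydata)
```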
I believe the crux of the matter is the fact that puppet is using yaml "tags" for ruby-fu, and that's confusing the default python loader. In particular, PyYAML has no idea how to construct a ruby/object:Puppet::Resource::Catalog, which makes sense, since that's a ruby object.
Here's a link showing some various uses of yaml tags: http://www.yaml.org/spec/1.2/spec.html#id2761292
I've gotten past this in a brute-force approach by simply doing something like:
cat the_yaml | sed 's#\!ruby/object.*$##gm' > cleaner.yaml
but now I'm stuck on an issue where the *resource_table* block is confusing PyYAML with its complex keys (the use of '? ' to indicate the start of a complex key, specifically).
If you find a nice way around that, please let me know... but given how tied at the hip puppet is to ruby, it may just be easier to write your script directly in ruby.
I only needed the classes section, so I ended up creating this little python function to strip it out...
Hope it's useful for someone :)
#!/usr/bin/env python
import re

def getSingleYamlClass(className, yamlList):
    printGroup = False
    groupIndent = 0
    firstInGroup = False
    output = ''
    for line in yamlList:
        # Count how many spaces there are at the beginning of our line
        spaceCount = len(re.findall(r'^[ ]*', line)[0])
        cleanLine = line.strip()
        if cleanLine == className:
            printGroup = True
            groupIndent = spaceCount
            firstInGroup = True
        if printGroup and (spaceCount > groupIndent) or firstInGroup:
            # Strip away the X amount of spaces for this group, so we get valid yaml
            output += re.sub(r'^[ ]{%s}' % groupIndent, '', line) + '\n'
            firstInGroup = False  # Reset this
        else:
            # End of our group, reset
            groupIndent = 0
            printGroup = False
    return output

getSingleYamlClass('classes:', open('puppet.yaml').readlines())
Simple YAML parser:
import yaml

with open("file", "r") as f:
    for line in f:
        cleaned = '\n'.join(line.split('?')[1:-1]).replace('?', '\n').replace('""', '\'').replace('"', '\'')
        # print cleaned
        parsed = yaml.load(cleaned)
        print(line)
        print(parsed)
I'm trying to read a Java multiline i18n properties file, with lines like:
messages.welcome=Hello\
World!
messages.bye=bye
Using this code:
import configobj
properties = configobj.ConfigObj(propertyFileName)
But it fails on multiline values.
Any suggestions?
According to the ConfigObj documentation, configobj requires you to surround multiline values in triple quotes:
Values that contain line breaks (multi-line values) can be surrounded by triple quotes. These can also be used if a value contains both types of quotes. List members cannot be surrounded by triple quotes.
If modifying the properties file is out of the question, I suggest using configparser:
In config parsers, values can span multiple lines as long as they are indented more than the key that holds them. By default parsers also let empty lines to be parts of values.
Here's a quick proof of concept:
#!/usr/bin/env python
# coding: utf-8
from __future__ import print_function
try:
    import ConfigParser as configparser
except ImportError:
    import configparser

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

test_ini = """
[some_section]
messages.welcome=Hello\
    World
messages.bye=bye
"""

config = configparser.ConfigParser()
config.readfp(StringIO(test_ini))
print(config.items('some_section'))
Output:
[('messages.welcome', 'Hello World'),
('messages.bye', 'bye')]
Thanks for the answers, this is what I finally did:

1. Add the section to the first line of the properties file
2. Remove empty lines
3. Parse with configparser
4. Remove the first line (the section added in the first step)

This is an extract of the code:
#!/usr/bin/python
...
# Add the section
subprocess.Popen(['/bin/bash','-c','sed -i \'1i [default]\' '+srcDirectory+"/*.properties"], stdout=subprocess.PIPE)
# Remove empty lines
subprocess.Popen(['/bin/bash','-c','sed -i \'s/^$/#/g\' '+srcDirectory+"/*.properties"], stdout=subprocess.PIPE)
# Get all i18n files
files=glob.glob(srcDirectory+"/"+baseFileName+"_*.properties")
config = ConfigParser.ConfigParser()
for propFile in files:
    ...
    config.read(propertyFileName)
    value = config.get('default', "someproperty")
    ...
# Remove section
subprocess.Popen(['/bin/bash','-c','sed -i \'1d\' '+srcDirectory+"/*.properties"], stdout=subprocess.PIPE)
I still have trouble with multiline values that don't start with a space. I fixed those manually, but a sed could do the trick.
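The shelling out to sed can be avoided entirely: read the file into memory, prepend a dummy section header, and hand the result to the parser, which covers steps 1 and 3 without ever modifying the file on disk. A sketch with Python 3 names (the property lines are taken from the question; the section name "default" is the same placeholder used above):

```python
import configparser
import io

# Stand-in for the contents of the .properties file from the question.
properties = """\
messages.welcome=Hello\\
    World!
messages.bye=bye
"""

config = configparser.ConfigParser()
# Prepend a dummy section header in memory instead of editing the file.
config.read_file(io.StringIO("[default]\n" + properties))
print(config.get("default", "messages.bye"))  # bye
```

Since the original file is never touched, step 4 (removing the section again) goes away too.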
Format your properties file like this:
messages.welcome="""Hello
World!"""
messages.bye=bye
Give ConfigParser a try.
I don't know anything about the Java side of this, but a regex may help you, I hope:
import re
ch = '''messages.welcome=Hello
World!
messages.bye=bye'''
regx = re.compile('^(messages\.[^= \t]+)[ \t]*=[ \t]*(.+?)(?=^messages\.|\Z)',re.MULTILINE|re.DOTALL)
print(regx.findall(ch))
result
[('messages.welcome', 'Hello\n World! \n'), ('messages.bye', 'bye')]
Using the ConfigParser module, how can I filter out and throw away every comment from an ini file?
import ConfigParser

config = ConfigParser.ConfigParser()
config.read("sample.cfg")

for section in config.sections():
    print section
    for option in config.options(section):
        print option, "=", config.get(section, option)
E.g. with the ini file below, the basic script above prints out the further comment lines as well, like:
something = 128 ; comment line1
; further comments
; one more line comment
What I need is only the section names and the pure key-value pairs inside them, without any comments. Does ConfigParser handle this somehow, or should I use a regexp... or? Cheers
According to the docs, lines starting with ; or # will be ignored. It doesn't seem like your format satisfies that requirement. Can you by any chance change the format of your input file?
Edit: since you cannot modify your input files, I'd suggest pre-parsing them with something along these lines:
tmp_fname = 'config.tmp'
with open(config_file) as old_file:
    with open(tmp_fname, 'w') as tmp_file:
        tmp_file.writelines(i.replace(';', '\n;') for i in old_file.readlines())
# then use tmp_fname with ConfigParser
Obviously, if a semicolon is present in option values, you'll have to be more creative.
The best way is to write a commentless file subclass:
class CommentlessFile(file):
    def readline(self):
        line = super(CommentlessFile, self).readline()
        if line:
            line = line.split(';', 1)[0].strip()
            return line + '\n'
        else:
            return ''
You could use it then with configparser (your code):
import ConfigParser

config = ConfigParser.ConfigParser()
config.readfp(CommentlessFile("sample.cfg"))

for section in config.sections():
    print section
    for option in config.options(section):
        print option, "=", config.get(section, option)
It seems your comments are not on lines that start with the comment leader. It should work if the comment leader is the first character on the line.
As the doc says: "(For backwards compatibility, only ; starts an inline comment, while # does not.)" So use ";" and not "#" for inline comments. It works well for me.
Python 3 comes with a built-in solution: the class configparser.RawConfigParser has the constructor argument inline_comment_prefixes. Example:
class MyConfigParser(configparser.RawConfigParser):
    def __init__(self):
        configparser.RawConfigParser.__init__(self, inline_comment_prefixes=('#', ';'))
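A quick check of inline_comment_prefixes against the snippet from the question; a subclass isn't strictly needed, since the argument can be passed straight to the constructor (the section name below is invented for the example):

```python
import configparser

# Strip inline comments starting with ';' or '#'; full-line comments
# are already stripped by the default comment_prefixes.
cfg = configparser.RawConfigParser(inline_comment_prefixes=(';', '#'))
cfg.read_string("""\
[section]
something = 128 ; comment line1
; further comments
""")
print(cfg.get('section', 'something'))  # 128
```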