Writing strings that have quotes correctly in Python

I have a Dockerfile that contains lines like:
LABEL "com.datadoghq.ad.logs"='[{"source": "mysource", "name": "myservicename"}]'
I have a Python script in which I want to grab this LABEL value (everything after "LABEL ") and add it to a YAML file. Here's my script:
#!/usr/bin/python3
import yaml

def _extract_dockerfile_labels():
    labels = []
    with open("testing/dockerfile") as dockerfile:
        for line in dockerfile.readlines():
            if line.startswith('LABEL'):
                label_val = line.replace("LABEL ", "", 1)
                labels.append(label_val.rstrip())  # rstrip() gets rid of the newline
    with open("output.yml", "w+") as out_file:
        yaml.dump(labels, out_file, default_flow_style=False)

_extract_dockerfile_labels()
The output file looks like this:
- '"com.datadoghq.ad.logs"=''[{"source": "mysource", "name": "myservicename"}]'''
What I need for that to look like is:
- "com.datadoghq.ad.logs"='[{"source": "mysource", "name": "myservicename"}]'
How can I make this work without adding extra quotes?

I guess the issue is that you are using the yaml.dump() function for output, which serializes the object as a YAML document. In YAML notation, strings like these get quoted.
For plain text output, try writing the lines yourself with out_file.write(), closing the file afterwards.
Note that your string is not currently parsed as a dictionary, so it is interpreted as a plain string. If you intended to parse it into an object, you have to do that yourself first.
Your encoding format doesn't look familiar to me; maybe there is a parsing library available for it. To convert it to YAML format you could consider using something like this:
labels = labels.replace("='[{", ":\n- ").replace(", ", "\n- ").replace("}]'", "")
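The write-the-lines-yourself idea can be sketched like this (assuming the labels list built by the question's script; note the result matches the desired text, but is plain text in a YAML-looking layout rather than guaranteed-valid YAML):

```python
# Sketch: emit each label verbatim with a "- " list marker, bypassing
# yaml.dump() so no extra quoting is added.
labels = ['"com.datadoghq.ad.logs"=\'[{"source": "mysource", "name": "myservicename"}]\'']

with open("output.yml", "w") as out_file:
    for label in labels:
        out_file.write("- " + label + "\n")
```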

Related

How to stop printing the properties of an .rtf file out when I use print(file.read()) in python

I am new to coding in Python and have trouble when I print out from a file (only tried from .rtf), as it displays all the file's formatting codes. I've tried a variety of ways to code the same thing, but the output is always similar. Example of the code and the output:
opener=open("file.rtf","r")
print(opener.read())
opener.close()
The file only contains this:
Camila
Employee
Try it
But the outcome is always:
{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\f0\fs24 \cf0 Camila\
\
Employees\
\
Try it}
Help? How to stop that from happening or what am I doing wrong?
The RTF file type contains more information than just the text, like fonts, colors, etc.
Python reads the RTF file as plain text, and therefore includes this markup.
If you want to get the plain text, you need a module that can translate it, like striprtf.
Make sure the module is installed by running this on the command line:
pip install striprtf
Then, to get your text:
from striprtf.striprtf import rtf_to_text
file = open("file.rtf", "r")
plaintext = rtf_to_text(file.read())
file.close()
Use this package https://github.com/joshy/striprtf.
from striprtf.striprtf import rtf_to_text
rtf = "some rtf encoded string"
text = rtf_to_text(rtf)
print(text)

Select specific fields from DB export that includes Json

I have an export from a DB table that has the following columns:
name|value|age|external_atributes
The external_atributes column is in JSON format. So the export looks like this:
George|10|30|{"label1":1,"label2":2,"label3":3,"label4":4,"label5":5,"label6":"6","label7":"7","label8":"8"}
What is the most efficient way (since the export has more than 1M lines) to keep only the name and the values of label2, label5 and label6? For example, from the above export I would like to keep only:
George|2|5|6
Edit: I am not sure for the sequence of the fields/variables on the JSON part. Data could be also for example:
George|10|30|{"label2":2,"label1":1,"label4":4,"label3":3,"label6":6,"label8":"8","label7":"7","label5":"5"}
Also the fact that some of the values are double quoted, while some are not, is intentional (This is how they appear also on the export).
My understanding until now is that I have to use something that has a JSON parser like Python or jq.
This is what I created in Python and it seems to work as expected:
from __future__ import print_function
import sys, json

with open(sys.argv[1], 'r') as file:
    for line in file:
        fields = line.split('|')
        print(fields[0], json.loads(fields[3])['label2'], json.loads(fields[3])['label5'], json.loads(fields[3])['label6'], sep='|')
output:
George|2|5|6
Since I am looking for the most efficient way to do this, any comment is more than welcome.
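As a side note on efficiency, the script above calls json.loads() three times per line; here is a hedged sketch (the sample line is made up to match the described format) that parses each line's JSON only once:

```python
import json

# Hypothetical sample line in the format described in the question.
line = 'George|10|30|{"label2":2,"label1":1,"label6":"6","label8":"8","label5":"5"}'

name, _, _, blob = line.rstrip("\n").split("|", 3)
attrs = json.loads(blob)  # parse the JSON once per line, not once per label
print(name, attrs["label2"], attrs["label5"], attrs["label6"], sep="|")  # → George|2|5|6
```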
Even if the data is easy to parse, I advise using a JSON parser like jq to extract your JSON data:
<file jq -rR '
split("|")|[ .[0], (.[3]|fromjson|(.label2,.label5,.label6)|tostring)]|join("|")'
The options -R and -r allow jq to accept and emit raw strings as input and output (instead of JSON data).
The split function gets all fields into an array that can be indexed by number: .[0] and .[3].
The fourth field (.[3]) is then parsed as JSON data with the function fromjson so that the wanted labels can be extracted.
All wanted fields are put into an array and joined together with the | delimiter.
You could split with multiple delimiters using a character class.
The following prints the desired result:
awk 'BEGIN { FS = "[|,:]";OFS="|"} {gsub(/"/,"",$15)}{print $1,$7,$13,$15}'
The above solution assumes that the JSON keys always appear in the same order and with the same quoting.
Since this is about record-based text edits, awk is most probably the best tool for the task. However, here is a sed solution:
sed 's/\([^|]*\).*label2[^0-9]*\([0-9]*\).*label5[^0-9]*\([0-9]*\).*label6[^0-9]*\([0-9]*\).*/\1|\2|\3|\4/' inputFile

YAML vs Python configuration/parameter files (but perhaps also vs JSON vs XML)

I see Python used to do a fair amount of code generation for C/C++ header and source files. Usually, the input files which store parameters are in JSON or YAML format, although most of what I see is YAML. However, why not just use Python files directly? Why use YAML at all in this case?
That also got me thinking: since Python is a scripted language, its files, when containing only data and data structures, could literally be used the same as XML, JSON, YAML, etc. Do people do this? Is there a good use case for it?
What if I want to import a configuration file into a C or C++ program? What about into a Python program? In the Python case it seems to me there is no sense in using YAML at all, as you can just store your configuration parameters and variables in pure Python files. In the C or C++ case, it seems to me you could still store your data in Python files and then just have a Python script import that and auto-generate header and source files for you as part of the build process. Again, perhaps there's no need for YAML or JSON in this case at all either.
Thoughts?
Here's an example of storing some nested key/value hash table pairs in a YAML file:
my_params.yml:
---
dict_key1:
  dict_key2:
    dict_key3a: my string message
    dict_key3b: another string message
And the same exact thing in a pure Python file:
my_params.py
data = {
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message",
        }
    }
}
And to read in both the YAML and Python data and print it out:
import_config_file.py:
import yaml  # Module for reading in YAML files
import json  # Module for pretty-printing Python dictionary types
# See: https://stackoverflow.com/a/34306670/4561887

# 1) import .yml file
with open("my_params.yml", "r") as f:
    data_yml = yaml.safe_load(f)  # safe_load avoids executing arbitrary tags

# 2) import .py file
from my_params import data as data_py
# OR: Alternative method of doing the above:
# import my_params
# data_py = my_params.data

# 3) print them out
print("data_yml = ")
print(json.dumps(data_yml, indent=4))
print("\ndata_py = ")
print(json.dumps(data_py, indent=4))
Reference for using json.dumps: https://stackoverflow.com/a/34306670/4561887
SAMPLE OUTPUT of running python3 import_config_file.py:
data_yml =
{
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message"
        }
    }
}

data_py =
{
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message"
        }
    }
}
Yes people do this, and have been doing this for years.
But many make the mistake you do and make it unsafe by using import my_params. That is the same as loading YAML using YAML(typ='unsafe') in ruamel.yaml (or yaml.load() in PyYAML, which is unsafe).
What you should do is use the ast module that comes with Python to parse your "data" structure, which makes such an import safe. My package pon has code to update these kinds of structures, and in each of my __init__.py files there is such a piece of data named _package_data that is read by some code (the function literal_eval) in the setup.py for the package. The ast based code in setup.py takes around ~100 lines.
The advantages of doing this in a structured way are the same as with using YAML: you can programmatically update the data structure (version numbers!), although I consider PON (Python Object Notation) less readable than YAML and slightly less easy to update manually.
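A minimal sketch of that ast-based idea (not pon's actual code; the data layout follows the my_params.py example above): parse the file's source and evaluate only the literal assigned to data, instead of importing, and thus executing, the module.

```python
import ast

# Instead of importing (and executing) my_params.py, read its source
# and evaluate only the literal assigned to `data`.
source = '''
data = {
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message",
        }
    }
}
'''

data = None
for node in ast.parse(source).body:
    # Look for a top-level assignment to the name `data`.
    if isinstance(node, ast.Assign) and getattr(node.targets[0], "id", None) == "data":
        data = ast.literal_eval(node.value)  # safe: only literals are allowed

print(data["dict_key1"]["dict_key2"]["dict_key3a"])  # → my string message
```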

remove single quotes from dict values while adding content to yaml file using python ruamel.yaml

I have a YAML file as mentioned below
test1.yaml
resources:
  name: {get_param: vname}
  ssh_keypair: {get_param: ssh_keypair}
Now I want to add test1_routable_net: { get_param: abc_routable_net } under resources of test1.yaml
Here is the code which I tried
import ruamel.yaml

yaml = ruamel.yaml.YAML()
test = "{ get_param: abc_routable_net }".strip('\'')
with open('/tmp/test1.yaml') as fp:
    data = yaml.load(fp)
data['resources'].update({'test1_routable_net': test})
yaml.dump(data, open('/tes2.yaml', 'w'))
output of above code is
tes2.yaml
resources:
  name: {get_param: vname}
  ssh_keypair: {get_param: ssh_keypair}
  test1_routable_net: '{ get_param: abc_routable_net }'
Desired output is
tes2.yaml
resources:
  name: {get_param: vname}
  ssh_keypair: {get_param: ssh_keypair}
  test1_routable_net: { get_param: abc_routable_net }
I tried using test.strip('\''), but to no avail; I still see single quotes around the value. How can I remove those quotes from the value?
In your program test is a string. Strings normally don't get quoted when dumped, but if their interpretation would be ambiguous, they will be. That is the reason your output has single quotes around the value: to make sure that, when read back in, this node is not incorrectly interpreted as a mapping instead of a string.
Removing the non-existent quotes with .strip() therefore doesn't do anything.
You should work backwards from what you want to accomplish (you actually want a mapping instead of a string, as one can see from the desired output).
If you load your desired output, you will see that the value for test1_routable_net is a python dict (or a subclass thereof), so make sure that is what you assign to test:
import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
test = {'get_param': 'abc_routable_net'}
with open('./test1.yaml') as fp:
    data = yaml.load(fp)
data['resources'].update({'test1_routable_net': test})
yaml.dump(data, sys.stdout)
Which gives:
resources:
  name: {get_param: vname}
  ssh_keypair: {get_param: ssh_keypair}
  test1_routable_net:
    get_param: abc_routable_net
This is semantically the same as your desired output, but since you want the get_param: abc_routable_net in flow-style, you could add:
yaml.default_flow_style=None
to get your desired output. You can also look at assigning, to test, a ruamel.yaml.comments.CommentedMap, which gives you more fine-grained control over its style (and comments, etc.).
"test" is not a string, it is a dict:
example:
import ruamel.yaml

yaml = ruamel.yaml.YAML()
test = {'get_param': 'abc_routable_net'}
with open('test1.yaml') as fp:
    data = yaml.load(fp)
data['resources'].update({'test1_routable_net': test})
yaml.dump(data, open('test2.yaml', 'w'))

Newbie question about file formatting in Python

I'm writing a simple program in Python 2.7 using pycURL library to submit file contents to pastebin.
Here's the code of the program:
#!/usr/bin/env python2
import pycurl, os

def send(file):
    print "Sending file to pastebin...."
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, "http://pastebin.com/api_public.php")
    curl.setopt(pycurl.POST, True)
    curl.setopt(pycurl.POSTFIELDS, "paste_code=%s" % file)
    curl.setopt(pycurl.NOPROGRESS, True)
    curl.perform()

def main():
    content = raw_input("Provide the FULL path to the file: ")
    open = file(content, 'r')
    send(open.readlines())
    return 0

main()
The output on pastebin looks like a standard Python list: ['string\n', 'line of text\n', ...] etc.
Is there any way I could format it so it looks better and it's actually human-readable? Also, I would be very happy if someone could tell me how to use multiple data inputs in POSTFIELDS. Pastebin API uses paste_code as its main data input, but it can use optional things like paste_name that sets the name of the upload or paste_private that sets it private.
First, use .read() as virhilo said.
The other step is to use urllib.urlencode() to get a string:
curl.setopt(pycurl.POSTFIELDS, urllib.urlencode({"paste_code": file}))
This will also allow you to post more fields:
curl.setopt(pycurl.POSTFIELDS, urllib.urlencode({"paste_code": file, "paste_name": name}))
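For completeness, a sketch of the same call in Python 3, where the helper moved to urllib.parse (the field values here are made up):

```python
from urllib.parse import urlencode  # in Python 2 this was urllib.urlencode

# Hypothetical form fields for the paste request.
fields = {"paste_code": "hello world", "paste_name": "test"}
encoded = urlencode(fields)  # percent/plus-encodes keys and values, joins with &
print(encoded)  # → paste_code=hello+world&paste_name=test
```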
import pycurl, os

def send(file_contents, name):
    print "Sending file to pastebin...."
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, "http://pastebin.com/api_public.php")
    curl.setopt(pycurl.POST, True)
    curl.setopt(pycurl.POSTFIELDS, "paste_code=%s&paste_name=%s"
                % (file_contents, name))
    curl.setopt(pycurl.NOPROGRESS, True)
    curl.perform()

if __name__ == "__main__":
    content = raw_input("Provide the FULL path to the file: ")
    with open(content, 'r') as f:
        send(f.read(), "yournamehere")
    print
When reading files, use the with statement (this makes sure your file gets closed properly if something goes wrong).
There's no need to be having a main function and then calling it. Use the if __name__ == "__main__" construct to have your script run automagically when called (unless when importing this as a module).
For posting multiple values, you can manually build the url: just separate different key=value pairs with an ampersand (&). Like this: key1=value1&key2=value2. Or you can build one with urllib.urlencode (as others suggested).
EDIT: using urllib.urlencode on strings which are to be posted makes sure content is encoded properly when your source string contains some funny / reserved / unusual characters.
use .read() instead of .readlines()
The POSTFIELDS value should be sent the same way as you send query string arguments. So, first, it's necessary to encode the string that you're sending as paste_code, and then, using &, you can add more POST arguments.
Example:
paste_code=hello%20world&paste_name=test
Good luck!
