I have a JSON file and need to extract only the value of the key "data" and decode it from base64.
This is the JSON file:
{
"equno": "229151246954324320",
"data": "CdwL703m6njjyJ7tdEXdRoYKKv49oatv9UWvdLONpWgd407fakXNfbt18j+qKb/toYgHZvj34ig7iu4XN7BfrqTf/wNsIdVNJ67cH9hCyIfULWLgyT01vQ7u5zS5vSB9hazXTQHRYGxpV+eCrVFMEdxHId54sJDn+KL8hea93WyQKlFwMDO0QQD5X2lK02Y88uI7MehBUFXzK5jNTgKmLtUE9KoM4xF6bEzm2oNMrxz0QwOB4b/tvRvhThb/r+Wgb32gV3UBZBZ/RP4ID+lc7JE7TLVRyeGCNA+uV11/no358XZdGo/E5Aq7KZ95W+rQ8TE3/PLbodAhWZ+1wAcXvfuxJxpIm0giOZv3Dys/pZesM5wbdwaNrFnD+ngHfXB67IxiBM3oRRxC7CBHoFvQFjC8g2E3dk1ELP6VPex24lPJY1JeBwuy8DQroN7rxa4bwLAE6Z3SyL16dpwYdMmAB2YN2h4nMGfl1TMXPGsJKxcSv4tBLj905WTGYgDKG3sQ0AR4YHoPKni7/rUZQb/hM25wXFKYNRzGU6EIleCTP4fl1vqASLFUHDS0GqjwcYkCOilvDbb3PqNqDzPEI84L7XDidiWQ8XKfzw7ryjuIaw1b1ODqN3+ctnny88WXTzANzwA5wjqfhDJDGpHr58fQgi1/j2QIsFBt+VoOslxvx1YQvbubDTwM7dTEwWDY0U5+l8KvSv9fYIMbNYxjwXU3tR2SFhClumNjisUE0lHgCBAfkbTA5Fw9eW3+QRZdB5rY/7DWgxlRnBORZ54c5xxTnsk2ntFm14zQA8HN7zv09FQW+sMk6B767cyzi5HoEkf+PjnNh78OrIPVOFtigMYUb5PdDWDzjDCu2+9dN4mm5aml+/SIOFDUHg6aX+GLj7c0tI0thMFAR6dKP6QtmVbUanF7gSt+L2c4qRq48s3QYMlrTr++PqeoCYNuhWIo2iXllvzERarLDU/pxZNfGB39bFmjmiAnLwmDqNZuVTi40/A38AI+r4f39Y/eywskz/rco1CZGUXxd0FJj0pwdO9H0eedwVgXAmi3KYy3j5MZBWeObqs/ufvRpHjDeh54Bq91DrxcKPya/b6FGDxH73jIgB9Y9x/mbZq2h20H9fbbV+hTk8XIA5ItY+2N9J7FHiJ+NyQbl4UNZT/GVF4HS+NXplgzEAEIlzgRwrNoY0GJzeocxZlAa5f5ANu7OHltqpSTAZ0PzVCopG1NgwaQEpS08mVAtgXo7jq34VejdNuHiTo+/ht3Dn+C+WzKXHZIABkhHjGg1Bv4hJHuLXIpQjIE0xwQo2UcTmcAYvrGO6FcHZz+eRUmJyrtsJczwZK7nimfgJ6T/iuggPVwyn9pifU9VA=="
}
I tried using jq
jq -r '.[].data' < test.json | base64 --decode
But I got this error:
jq: error (at <stdin>:3): Cannot index string with string "data"
I have no idea how to resolve this error. I also tried using python but I couldn't decode it.
Help me, please!
Assuming you have already loaded the JSON data into an object, you can try the following:
import json
import base64
json_obj = {
"equno": "229151246954324320",
"data": "CdwL703m6njjyJ7tdEXdRoYKKv49oatv9UWvdLONpWgd407fakXNfbt18j+qKb/toYgHZvj34ig7iu4XN7BfrqTf/wNsIdVNJ67cH9hCyIfULWLgyT01vQ7u5zS5vSB9hazXTQHRYGxpV+eCrVFMEdxHId54sJDn+KL8hea93WyQKlFwMDO0QQD5X2lK02Y88uI7MehBUFXzK5jNTgKmLtUE9KoM4xF6bEzm2oNMrxz0QwOB4b/tvRvhThb/r+Wgb32gV3UBZBZ/RP4ID+lc7JE7TLVRyeGCNA+uV11/no358XZdGo/E5Aq7KZ95W+rQ8TE3/PLbodAhWZ+1wAcXvfuxJxpIm0giOZv3Dys/pZesM5wbdwaNrFnD+ngHfXB67IxiBM3oRRxC7CBHoFvQFjC8g2E3dk1ELP6VPex24lPJY1JeBwuy8DQroN7rxa4bwLAE6Z3SyL16dpwYdMmAB2YN2h4nMGfl1TMXPGsJKxcSv4tBLj905WTGYgDKG3sQ0AR4YHoPKni7/rUZQb/hM25wXFKYNRzGU6EIleCTP4fl1vqASLFUHDS0GqjwcYkCOilvDbb3PqNqDzPEI84L7XDidiWQ8XKfzw7ryjuIaw1b1ODqN3+ctnny88WXTzANzwA5wjqfhDJDGpHr58fQgi1/j2QIsFBt+VoOslxvx1YQvbubDTwM7dTEwWDY0U5+l8KvSv9fYIMbNYxjwXU3tR2SFhClumNjisUE0lHgCBAfkbTA5Fw9eW3+QRZdB5rY/7DWgxlRnBORZ54c5xxTnsk2ntFm14zQA8HN7zv09FQW+sMk6B767cyzi5HoEkf+PjnNh78OrIPVOFtigMYUb5PdDWDzjDCu2+9dN4mm5aml+/SIOFDUHg6aX+GLj7c0tI0thMFAR6dKP6QtmVbUanF7gSt+L2c4qRq48s3QYMlrTr++PqeoCYNuhWIo2iXllvzERarLDU/pxZNfGB39bFmjmiAnLwmDqNZuVTi40/A38AI+r4f39Y/eywskz/rco1CZGUXxd0FJj0pwdO9H0eedwVgXAmi3KYy3j5MZBWeObqs/ufvRpHjDeh54Bq91DrxcKPya/b6FGDxH73jIgB9Y9x/mbZq2h20H9fbbV+hTk8XIA5ItY+2N9J7FHiJ+NyQbl4UNZT/GVF4HS+NXplgzEAEIlzgRwrNoY0GJzeocxZlAa5f5ANu7OHltqpSTAZ0PzVCopG1NgwaQEpS08mVAtgXo7jq34VejdNuHiTo+/ht3Dn+C+WzKXHZIABkhHjGg1Bv4hJHuLXIpQjIE0xwQo2UcTmcAYvrGO6FcHZz+eRUmJyrtsJczwZK7nimfgJ6T/iuggPVwyn9pifU9VA=="
}
print(base64.b64decode(json_obj["data"]))
Here's a way to do that in Python:
import base64
import json
with open("sample_data.json") as f:
    text = f.read()
d = json.loads(text)
data = base64.b64decode(d["data"])
The variable data now contains the decoded content of the relevant item in the json file.
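A minimal sketch of the same steps on a small in-memory document, so the result is easy to check. The sample payload and the names doc_text, raw and text are made up for illustration; the payload in the question looks like binary data, so the sketch only converts to a string when the bytes really are UTF-8.

```python
import base64
import json

# Hypothetical sample: "aGVsbG8=" is the base64 encoding of b"hello".
doc_text = '{"equno": "1", "data": "aGVsbG8="}'

d = json.loads(doc_text)
raw = base64.b64decode(d["data"])  # b64decode always returns bytes

# Only convert to str if the payload really is UTF-8 text.
try:
    text = raw.decode("utf-8")
except UnicodeDecodeError:
    text = None  # binary payload: keep working with the raw bytes
```

For a binary payload, write raw to a file opened in "wb" mode instead of printing it.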
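The jq error comes from .[], which iterates over the values of the top-level object; those values are strings, and a string cannot be indexed with "data". Use .data instead of .[].data.
If the string in .data were a valid base64 encoding of a UTF-8 string, the following would be equivalent: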
jq -r .data input.json | base64 -D
and
jq -r '.data|@base64d' input.json
As it happens, with the given JSON, base64 shows there is a problem:
$ jq -r .data input.json | base64 -D
Invalid character in input stream.
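A similar check can be sketched in Python. By default base64.b64decode silently discards characters outside the base64 alphabet, so passing validate=True is one way to surface a malformed payload. The helper name strict_b64decode and the sample strings are assumptions for illustration, not part of the question.

```python
import base64
import binascii


def strict_b64decode(s):
    """Return the decoded bytes, or None if s is not strictly valid base64."""
    try:
        # validate=True rejects any character outside the base64 alphabet.
        return base64.b64decode(s, validate=True)
    except binascii.Error:
        return None


ok = strict_b64decode("aGVsbG8=")    # well-formed input
bad = strict_b64decode("aGVs*bG8=")  # contains a stray '*'
```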
How can I convert a complete XML file to a base64 string using Python or Scala?
I have tried the base64 module, but it requires a string (bytes-like object) to be passed to it. How do I do that with XML, given its multiline structure and hierarchy?
Could anyone give an example of how to do it?
Thanks.
Python solution:
import base64
# convert file content to base64 encoded string
with open("input.xml", "rb") as file:
    encoded = base64.encodebytes(file.read()).decode("utf-8")
# output base64 content
print(encoded)
decoded = base64.decodebytes(encoded.encode('utf-8'))
# write decoded base64 content to file
with open("output.xml", "wb") as file:
    file.write(decoded)
# output decoded base64 content
print(decoded.decode('utf-8'))
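One detail worth knowing about the code above: base64.encodebytes wraps its output in lines of at most 76 characters, each ending in a newline, whereas base64.b64encode returns a single unbroken line. The sketch below compares the two on made-up XML bytes (xml_bytes is a stand-in for the file content); either form decodes back to the same input.

```python
import base64

# Stand-in for the bytes read from input.xml.
xml_bytes = b"<root>\n  <item>1</item>\n</root>\n" * 5

# encodebytes inserts a newline after every 76 output characters.
wrapped = base64.encodebytes(xml_bytes).decode("ascii")

# b64encode produces one line, convenient for embedding in JSON or a URL.
single = base64.b64encode(xml_bytes).decode("ascii")

# Both forms round-trip to the original bytes.
from_wrapped = base64.decodebytes(wrapped.encode("ascii"))
from_single = base64.b64decode(single)
```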
I got some data from an API with Python, and I'm trying to print it to a file. My understanding was that the indent argument lets you pretty print. Here's my code:
import urllib2, json
APIKEY_VALUE = "APIKEY"
APIKEY = "?hapikey=" + APIKEY_VALUE
HS_API_URL = "http://api.hubapi.com"
def getInfo():
    xulr = "/engagements/v1/engagements/paged"
    url = HS_API_URL + xulr + APIKEY + params
    response = urllib2.urlopen(url).read()
    with open("hubdataJS.json", "w") as outfile:
        json.dump(response, outfile, sort_keys=True, indent=4, ensure_ascii=False)
getInfo()
What I expected hubdataJS.json to look like when I opened it in Sublime text is some JSON with a format like this:
{
a: some data
b: [
some list of data,
more data
]
c: some other data
}
What I got instead was all the data on one line, in quotes (I thought dumps was for outputting as a string), with lots of \s, \rs, and \ns.
Confused about what I'm doing wrong.
In your code, response is a bytestring containing data that is already serialized as JSON. When you call json.dump on it, you serialize that string again, so the file ends up holding a single JSON string whose content is the original JSON: JSON inside JSON.
To fix that, decode (deserialize) the bytestring you got from the web before re-encoding it as JSON to write to the file:
response = json.load(urllib2.urlopen(url))
That converts the serialized data from the web into a real Python object, which json.dump can then pretty-print.
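The difference can be seen without a network call. The sketch below uses Python 3 and a made-up response_bytes value as a stand-in for what urlopen(url).read() returns; it contrasts parsing first and then dumping with dumping the raw text directly.

```python
import io
import json

# Stand-in for urlopen(url).read(): bytes holding serialized JSON.
response_bytes = b'{"b": [1, 2], "a": "some data"}'

# Parse first (json.loads accepts bytes in Python 3.6+), then dump.
parsed = json.loads(response_bytes)
outfile = io.StringIO()  # in-memory stand-in for the output file
json.dump(parsed, outfile, sort_keys=True, indent=4, ensure_ascii=False)
pretty = outfile.getvalue()

# What the original code effectively did: serialize a string, not an object.
double_encoded = json.dumps(response_bytes.decode("utf-8"))
```

Here pretty is indented over several lines, while double_encoded is one quoted string full of backslash escapes.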
I have this
import json
header = """{"fields": ["""
print(header)
with open('fields.json', 'w') as outfile:
    json.dump(header, outfile)
The output of print is fine:
{"fields": [
but what ends up in fields.json is
"{\"fields\": ["
Why does this happen, and how can I fix it?
Thanks
There is no issue there. The \ in the file escapes the ". To see this, read the JSON back and print it. For example, extending your code:
import json
header = """{"fields": ["""
print(header)
with open('C:\\Users\\tkaghdo\\Documents\\fields.json', 'w') as outfile:
    json.dump(header, outfile)
with open('C:\\Users\\tkaghdo\\Documents\\fields.json') as data_file:
    data = json.load(data_file)
print(data)
You will see that the data is printed as {"fields": [
The JSON you are trying to write is being treated as a string,
so " is converted to \". To avoid that, decode the JSON with json.loads() before writing it to the file.
The JSON must be complete, or json.loads() will raise an error.
import json
header = """{"name":"sk"}"""
h = json.loads(header)
print(header)
with open('fields.json', 'w') as outfile:
    json.dump(h, outfile)
First of all, it's not /, it's \.
Second, it's not in your JSON, because you don't really seem to have a JSON here, you have just a string. And json.dump() converts the string into a string escaped for JSON format, replacing all " with \".
By the way, you should not try to write incomplete JSON as a string yourself. It is better to build a dictionary such as {"fields": []}, fill it with the values you want, and then save it to a file with json.dump().
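A minimal sketch of that approach follows. The field entries are invented for illustration, and json.dumps is used instead of json.dump only so that the result can be inspected as a string.

```python
import json

# Start from a complete structure instead of a partial JSON string.
doc = {"fields": []}

# Fill it with whatever values are needed (these entries are made up).
doc["fields"].append({"name": "sk"})
doc["fields"].append({"name": "other"})

# Serialize once, at the end; no hand-written quotes or brackets.
serialized = json.dumps(doc)
```

Writing to disk works the same way with json.dump(doc, outfile) inside a with open(...) block.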
I am trying to decompress a base64-encoded string in Python 2.7.
I can verify that my string is valid by running this in the command line:
echo -n "MY_BASE64_ENCODED_STRING" | base64 -d | zcat
However, if I run this in Python2.7:
b64_data = 'MY_BASE64_ENCODED_STRING'
text_data = zlib.decompress(base64.b64decode(b64_data))
I get an exception:
Error -3 while decompressing data: incorrect header check
Should I pass extra parameters to zlib.decompress to make it work?
As noted in the comments, your data is in gzip format and not just zlib compressed data. In Python 2.7, you can use GzipFile with StringIO to process the string:
>>> from gzip import GzipFile
>>> from StringIO import StringIO
>>> from base64 import b64decode
>>> data = 'H4sIAEm2algAAytJLS7hAgDGNbk7BQAAAA=='
>>> GzipFile(fileobj=StringIO(b64decode(data))).read()
'test\n'
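To answer the question about extra parameters directly: zlib.decompress takes a wbits argument, and passing 16 + zlib.MAX_WBITS tells it to expect a gzip header and trailer. The sketch below is written for Python 3, where gzip.decompress is also available, and builds its own sample payload so it is self-contained; the variable names are assumptions for illustration.

```python
import base64
import gzip
import zlib

# Build a sample payload: gzip-compress some bytes, then base64-encode.
b64_data = base64.b64encode(gzip.compress(b"test\n")).decode("ascii")

compressed = base64.b64decode(b64_data)

# Option 1: the gzip module handles the gzip container directly (Python 3).
via_gzip = gzip.decompress(compressed)

# Option 2: zlib with wbits = 16 + MAX_WBITS expects a gzip header/trailer.
via_zlib = zlib.decompress(compressed, 16 + zlib.MAX_WBITS)
```

The plain zlib.decompress(compressed) call from the question fails because the default wbits expects a zlib header rather than a gzip one.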
I have a bson file: xyz.bson full of useful data and I'd like to query/process the data using python. Is there a simple example/tutorial out there I can get started with?
I don't understand this one.
You could use the mongorestore command to import the data into a mongoDB server and then query it by connecting to that server.
If you want to stream the data as though it were a flat JSON file on disk rather than loading it into a mongod, you can use this small python-bson-streaming library:
https://github.com/bauman/python-bson-streaming
from bsonstream import KeyValueBSONInput
from sys import argv
for file in argv[1:]:
    f = open(file, 'rb')
    stream = KeyValueBSONInput(fh=f, fast_string_prematch="somthing")  # remove fast string match if not needed
    for id, dict_data in stream:
        if id:
            pass  # process dict_data here
You can use sonq to query a .bson file directly from bash, or import it and use it as a library in Python.
A few examples:
Query a .bson file
sonq -f '{"name": "Stark"}' source.bson
Convert query results to a newline separated .json file
sonq -f '{"name": {"$ne": "Stark"}}' -o target.json source.bson
Query a .bson file in python
from sonq.operation import query_son
record_list = list(query_son('source.bson', filters={"name": {"$in": ["Stark"]}}))