I have a situation where I can only read a BSON file as bytes (I am using the Apache Beam bytes coder), so I get the BSON file content as bytes. Now I am trying to convert it to JSON. My code is:
from bson import json_util
import apache_beam as beam
from apache_beam import coders

class ParseBsontoJson(beam.DoFn):
    def process(self, element):
        print(type(element))
        # data = bson.BSON.decode(bson.BSON(element))
        data = element.decode('utf-8')
        # data = bson.decode_all(element)
        # data2 = json_util.dumps(data)
        # print(type(data))
        return [data]

p = beam.Pipeline(options=pipeline_options)
# This gives me a PCollection of bytes (elements)
test = (p | 'test_r' >> beam.io.ReadFromText(known_args.input + '/' + 'test.bson', coder=coders.BytesCoder())
          | 'test_parse' >> beam.ParDo(ParseBsontoJson()))  # here I have a problem
data = element.decode('latin-1').encode("utf-8")
data2 = json_util.dumps(data)
print(data2)
where element is a line of the BSON file.
What I get is:
{
"$binary": "w5MBAAAHX2lkAF5cw5/Dvj3Cu3FsIcKMOsO+Am5hbWUABgAAAHNjZW5lAAdhY2NvdW50SWQAXlzDn8OKw6B1OEp4TcKmXQdkYXRhQ2VudGVyAF5cw5/DizpiwrIWY8KLw5EIB3RzaGlydFNpemUAXlzDn8OLPcK7cWwhwow6Sgdvc0ltYWdlAF5cw5/Diz3Cu3FsIcKMOksHcHJvdmlzaW9uZWRCeVdvcmtPcmRlcgBeXMOfw5Q9wrtxbCHCjDpcB3BoeXNpY2FsU2VydmVyTm9kZQBeXMOfw5U9wrtxbCHCjDpmBGJvbmRzAHoAAAADMAByAAAAEGluZGV4AAAAAAAEbmljcwAoAAAAAzAAIAAAAAJtYWMAEgAAADhjOmZkOjFiOjAwOjlmOjk5AAAAB25ldHdvcmsAXlzDn8OmOmLCshZjwovDkTwCaXB2NEFkZHJlc3MADgAAADM4LjEzMy4xNjQuNjAAAAAEbHVucwAUAAAABzAAXlzDn8O+PcK7cWwhwow6w70AAnN0YXR1cwAOAAAAZGVwcm92aXNpb25lZAAJX3VwZGF0ZWQAMMO4w4rCmnABAAAJX2NyZWF0ZWQAMMO4w4rCmnABAAACX2V0YWcAKQAAAGMxN2QyZGNjOGU2ZGQ3ZGQ1NGI1ZGQzMjVlYjkzMDcyZTE2NWVmZjEAAMK5AgAAB19pZABeXMOgCTpiwrIWY8KLw5HCsQJuYW1lAAUAAABhd2F5AAdhY2NvdW50SWQAXlzDn8OKOmLCshZjwovDkQMHZGF0YUNlbnRlcgBeXMOfw4s6YsKyFmPCi8ORCQd0c2hpcnRTaXplAF5cw5/DjDpiwrIWY8KLw5E=",
"$type": "00"
}
I tried other recommendations, for example from similar Stack Overflow answers:
bson.decode_all(element)
or
# This gives an exception: InvalidBSON: bad eoo
data = bson.BSON.decode(bson.BSON(element))
But neither of them converts it to JSON.
element.decode('latin-1')
gives
"b²cÓ:nameimpactaccountId^\à=»ql!<RdataCenter^\à=»ql!<UtshirtSize^\à:b²cÒÞosImage^\à:b²cÒßprovisionedByWorkOrder^\à:b²cÒäphysicalServerNode^\à=»ql!<]bonds|0tindexnics(0 mac5d:b1:d1:82:d5:99network^"
Could you help me? I can't find anywhere on the internet how to do this. Everything I try raises an exception.
I am using Python 3.7.
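One thing worth checking, offered as a hedged sketch rather than a confirmed fix: BSON is a binary, length-prefixed format (each document starts with a little-endian int32 giving its total size and ends with a zero byte), so reading it "line by line" can cut a document wherever a newline byte happens to occur, which would be consistent with the "bad eoo" error. The $binary output most likely appears because json_util.dumps was handed raw bytes rather than a decoded document. The standard-library sketch below splits the bytes of a whole file into individual documents; split_bson_documents is a hypothetical helper name and the sample bytes are hand-built for illustration.

```python
import struct

def split_bson_documents(raw):
    """Yield each length-prefixed BSON document contained in raw bytes.

    Hypothetical helper: it only checks the framing, not the contents.
    """
    offset = 0
    while offset < len(raw):
        if len(raw) - offset < 5:
            raise ValueError("truncated BSON data")
        # The first four bytes are the document size (little-endian int32),
        # counting the size field itself and the trailing zero byte.
        (size,) = struct.unpack_from("<i", raw, offset)
        if size < 5 or offset + size > len(raw):
            raise ValueError("invalid BSON document length")
        document = raw[offset:offset + size]
        if document[-1] != 0:
            raise ValueError("BSON document is missing its terminator")
        yield document
        offset += size

# Hand-built sample: the document {"a": "\n"} followed by an empty document.
doc_with_newline = (
    struct.pack("<i", 14)      # total size of the document
    + b"\x02a\x00"             # element type 0x02 (string), key "a"
    + struct.pack("<i", 2)     # string length including its terminator
    + b"\n\x00"                # string value "\n"
    + b"\x00"                  # end of document
)
empty_doc = struct.pack("<i", 5) + b"\x00"
raw = doc_with_newline + empty_doc

documents = list(split_bson_documents(raw))
print(len(documents))            # number of documents recovered
print(len(raw.split(b"\n")))     # a line-based split cuts through the first one
```

Each recovered chunk could then be handed to the calls already tried in the question, for example bson.BSON(chunk).decode() followed by json_util.dumps on the resulting dict.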
Related
I am creating an Azure Data Factory pipeline using the Python SDK (azure.mgmt.datafactory.models.PipelineResource). I need to convert the PipelineResource object to a JSON file. Is that possible?
I tried json.loads(pipeline_object) and json.dumps(pipeline_object), but no luck.
You can try the following code snippet as suggested by mccoyp:
You can pass a default argument to json.dumps so that objects that are not JSON serializable are converted to dicts:
import json
from azure.mgmt.datafactory.models import Activity, PipelineResource
activity = Activity(name="activity-name")
resource = PipelineResource(activities=[activity])
json_dict = json.dumps(resource, default=lambda obj: obj.__dict__)
print(json_dict)
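To get an actual JSON file rather than a string, the same default hook can be passed to json.dump. The sketch below uses two stand-in classes (FakeActivity and FakePipeline are hypothetical and not part of the Azure SDK) so that it runs without the azure packages; whether obj.__dict__ captures everything needed from a real PipelineResource is an assumption worth checking against your SDK version.

```python
import json
import os
import tempfile

class FakeActivity:
    """Hypothetical stand-in for an SDK activity model."""
    def __init__(self, name):
        self.name = name

class FakePipeline:
    """Hypothetical stand-in for PipelineResource."""
    def __init__(self, activities):
        self.activities = activities

def to_jsonable(obj):
    # json calls this only for objects it cannot serialize on its own.
    return obj.__dict__

pipeline = FakePipeline(activities=[FakeActivity(name="activity-name")])

# Write the object to a JSON file, then read it back to check the result.
path = os.path.join(tempfile.mkdtemp(), "pipeline.json")
with open(path, "w") as handle:
    json.dump(pipeline, handle, default=to_jsonable, indent=2)

with open(path) as handle:
    restored = json.load(handle)
print(restored)
```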
You can use this:
# Create a copy activity
act_name = 'copyBlobtoBlob'
blob_source = BlobSource()
blob_sink = BlobSink()
dsin_ref = DatasetReference(reference_name=ds_name)
dsOut_ref = DatasetReference(reference_name=dsOut_name)
copy_activity = CopyActivity(name=act_name,inputs=[dsin_ref], outputs=[dsOut_ref], source=blob_source, sink=blob_sink)
#Create a pipeline with the copy activity
# Note 1: To pass parameters to the pipeline, add them to the params_for_pipeline dictionary shown below in the format { "ParameterName1" : "ParameterValue1" } for each of the parameters needed in the pipeline.
# Note 2: To pass parameters to a dataflow, create a pipeline parameter to hold the parameter name/value, and then consume the pipeline parameter in the dataflow parameter in the format @pipeline().parameters.parametername.
p_name = 'copyPipeline'
params_for_pipeline = {}
p_obj = PipelineResource(activities=[copy_activity], parameters=params_for_pipeline)
p = adf_client.pipelines.create_or_update(rg_name, df_name, p_name, p_obj)
print_item(p)
I am new to Python. I have been trying to parse a response that is passed as a parameter to a function.
I have been trying to convert the function from Perl to Python.
The Perl block looks something like this:
sub fetchId_byusername
{
    my ($self,$resString,$name) = @_;
    my $my_id;
    my @arr = @{$json->allow_nonref->utf8->decode($resString)};
    foreach(@arr)
    {
        my %hash = %{$_};
        foreach my $keys (keys %hash)
        {
            $my_id = $hash{id} if($hash{name} eq $name);
        }
    }
    print "Fetched Id is : $my_id\n";
    return $my_id;
}
The part where the JSON data is parsed is troubling me. How do I write this in Python 3?
I tried something like
def fetchID_byUsername(self, resString, name):
    arr = []
    user_id = 0
    arr = resString.content.decode('utf-8', errors="replace")
    for item in arr:
        temp_hash = {}
        temp_hash = item
        for index in temp_hash.keys():
            if temp_hash[name] == name:
                user_id = temp_hash[id]
    print("Fetched ID is: {}".format(user_id))
    return user_id
Now I am not sure whether this is the right way to do it.
The json inputs are something like:
[{"id":12345,"name":"11","email":"11@test.com","groups":[{"id":6967,"name":"Test1"},{"id":123456,"name":"E1"}],"department":{"id":3863,"name":"Department1"},"comments":"111","adminUser":false},{"id":123457,"name":"1234567","email":"1234567@test.com","groups":[{"id":1657,"name":"mytest"},{"id":58881,"name":"Service Admin"}],"department":{"id":182,"name":"Service Admin"},"comments":"12345000","adminUser":true}]
Thanks in advance.
If you paste the input into your code as a Python literal, it has to be valid Python, so I changed false to False and true to True. If it is a JSON-formatted string, you can parse it instead:
import json
data = json.loads(json_formatted_string_here)  # data will be a Python list of dicts here
I tried it like this: the function iterates over the records and returns the id as soon as a match is found.
data = [{"id":12345,"name":"11","email":"11@test.com","groups":[{"id":6967,"name":"Test1"},{"id":123456,"name":"E1"}],"department":{"id":3863,"name":"Department1"},"comments":"111","adminUser":False},{"id":123457,"name":"1234567","email":"1234567@test.com","groups":[{"id":1657,"name":"mytest"},{"id":58881,"name":"Service Admin"}],"department":{"id":182,"name":"Service Admin"},"comments":"12345000","adminUser":True}]

def fetch_id_by_name(list_records, name):
    for record in list_records:
        if record["name"] == name:
            return record["id"]

print(fetch_id_by_name(data, "11"))
First of all, import the json library and use json.loads() like this:
import json
x = json.loads(json_feed) #This converts the json feed to a python dictionary
print(x["key"]) #values to "key"
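Putting the two answers together: when the response body is a JSON string, json.loads already maps true/false to True/False, so the text does not need to be edited by hand. A minimal sketch, assuming the body is a JSON array of objects like the sample input in the question (the shortened sample string and the None fallback are illustrative choices, not part of the original code):

```python
import json

def fetch_id_by_name(response_text, name):
    """Return the id of the first record whose name matches, else None."""
    records = json.loads(response_text)  # a list of dicts for this input
    for record in records:
        if record.get("name") == name:
            return record.get("id")
    return None

# Shortened version of the sample input, kept as real JSON (lowercase true/false).
sample = (
    '[{"id":12345,"name":"11","adminUser":false},'
    '{"id":123457,"name":"1234567","adminUser":true}]'
)

print(fetch_id_by_name(sample, "1234567"))
print(fetch_id_by_name(sample, "missing"))
```

With the response object from the question, resString.content.decode('utf-8') would be the value passed as response_text.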
The following code is giving me:
Runtime.MarshalError: Unable to marshal response: {'Yes'} is not JSON serializable
from calendar import monthrange

def time_remaining_less_than_fourteen(year, month, day):
    a_year = int(input['year'])
    b_month = int(input['month'])
    c_day = int(input['day'])
    days_in_month = monthrange(int(a_year), int(b_month))[1]
    time_remaining = ""
    if (days_in_month - c_day) < 14:
        time_remaining = "No"
        return time_remaining
    else:
        time_remaining = "Yes"
        return time_remaining

output = {time_remaining_less_than_fourteen((input['year']), (input['month']), (input['day']))}
#print(output)
When I remove {...} it then throws: 'unicode' object has no attribute 'copy'
I encountered this issue when working with the Lambda transformation blueprint kinesis-firehose-process-record-python for Kinesis Firehose, which led me here. So I will post a solution for anyone who also finds this question when having issues with that Lambda.
The blueprint is:
from __future__ import print_function
import base64

print('Loading function')

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        print(record['recordId'])
        payload = base64.b64decode(record['data'])
        # Do custom processing on the payload here
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(payload)
        }
        output.append(output_record)
    print('Successfully processed {} records.'.format(len(event['records'])))
    return {'records': output}
The thing to note is that the Firehose Lambda blueprints for Python provided by AWS are written for Python 2.7 and don't work with Python 3. The reason is that in Python 3, strings (str) and byte strings (bytes) are different types.
The key change to make it work with lambda powered by Python 3.x runtime was:
changing
'data': base64.b64encode(payload)
into
'data': base64.b64encode(payload).decode("utf-8")
Otherwise, the Lambda failed because the bytes object returned by base64.b64encode cannot be serialized to JSON.
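The difference can be reproduced outside Lambda with only the standard library. In this minimal sketch the payload and the record ID are made-up values; it shows json.dumps rejecting the bytes returned by base64.b64encode under Python 3 and accepting the decoded string:

```python
import base64
import json

payload = b"hello firehose"            # made-up payload for illustration
encoded = base64.b64encode(payload)    # bytes in Python 3

try:
    json.dumps({"data": encoded})
except TypeError as error:
    print("cannot serialize:", error)

# Decoding the base64 bytes to str makes the record serializable.
record = {
    "recordId": "example-id",          # hypothetical record ID
    "result": "Ok",
    "data": encoded.decode("utf-8"),
}
serialized = json.dumps(record)
print(serialized)
```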
David here, from the Zapier Platform team.
Per the docs:
output: A dictionary or list of dictionaries that will be the "return value" of this code. You can explicitly return early if you like. This must be JSON serializable!
In your case, output is a set:
>>> output = {'Yes'}
>>> type(output)
<class 'set'>
>>> json.dumps(output)
Object of type set is not JSON serializable
To be serializable, you need a dict (which has keys and values). Change your last line to include a key and it'll work like you expect:
#         \ here /
output = {'result': time_remaining_less_than_fourteen((input['year']), (input['month']), (input['day']))}
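As a side note, the function in the question ignores its parameters and reads from input directly. A hedged sketch of a version that uses its arguments and keeps the same Yes/No mapping is shown below; input_data is a hypothetical stand-in for the dictionary the platform supplies as input, and its values are made up.

```python
import json
from calendar import monthrange

def time_remaining_less_than_fourteen(year, month, day):
    """Return "No" when fewer than 14 days remain in the month, else "Yes"."""
    days_in_month = monthrange(int(year), int(month))[1]
    return "No" if days_in_month - int(day) < 14 else "Yes"

# input_data stands in for the dictionary provided by the platform as `input`.
input_data = {"year": "2020", "month": "2", "day": "20"}

# A dict with a key is JSON serializable; a bare set is not.
output = {"result": time_remaining_less_than_fourteen(
    input_data["year"], input_data["month"], input_data["day"])}
print(json.dumps(output))
```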
What I have so far is the first request, which gathers the IDs. I would then like to insert each returned draftgroupid into the second URL request. Is it possible to send two requests in the same script, and if so, how would I do a for i in range(draftgroupid) in the second URL request?
import requests
import json

req1 = requests.get(url="https://www.draftkings.com/lobby/getcontests?sport=NHL")
req1.raise_for_status()
data = req1.json()
for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
Output of draftgroupid:
16901
16905
16902
16903
req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/THEVALUEIWANTTOLOOPTHROUGH/draftables?format=json")
EDIT
import csv
import requests
import json

req = requests.get(url="https://www.draftkings.com/lobby/getcontests?sport=NHL")
req.raise_for_status()
data = req.json()
for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
    req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/" + str(draftgroupid) + "/draftables?format=json")
    data2 = req2.json
    for i, player_info in enumerate(data2['draftables'][0]):
        date = player_info['competition']['startTime']
        print(date)
Running into a TypeError: 'method' object is not subscriptable
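That TypeError most likely comes from data2 = req2.json: json is a method on the response object, so without parentheses data2 is the method itself rather than the parsed data. The sketch below reproduces this offline with a stand-in class (FakeResponse and its payload are hypothetical; the real API's structure is only assumed, from the access pattern in the question, to be a list of dicts under 'draftables'). Under that assumption, iterating over data2['draftables'] rather than data2['draftables'][0] is probably what was intended.

```python
class FakeResponse:
    """Hypothetical stand-in for a requests response, so no network is needed."""

    def json(self):
        # Made-up payload shaped like the access pattern in the question.
        return {"draftables": [
            {"competition": {"startTime": "example-start-1"}},
            {"competition": {"startTime": "example-start-2"}},
        ]}

response = FakeResponse()

try:
    response.json["draftables"]        # missing parentheses: subscripting a method
except TypeError as error:
    print("TypeError:", error)

data2 = response.json()                # call the method to get the parsed data
dates = [player["competition"]["startTime"] for player in data2["draftables"]]
print(dates)
```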
As I understand it, your problem is related to string manipulation rather than to the requests library.
So basically,
import requests
import json

req = requests.get(url="https://www.draftkings.com/lobby/getcontests?sport=NHL")
req.raise_for_status()
data = req.json()
for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
    requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/" + str(draftgroupid) + "/draftables?format=json")
should do the job.
More elegant ways to concatenate strings can be found at http://www.pythonforbeginners.com/concatenation/string-concatenation-and-formatting-in-python
Edit
For example,
"some string " + str(123)
"some string %d" % 123
"some string %s" % 123
Will all give the same output. There are more ways to concatenate strings. You just need to choose the best fit based on the context.
for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
    req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/%d/draftables?format=json" % draftgroupid)
I assume you didn't actually mean for i in range(draftgroupid) as you stated in the question, because that would mean making 16901 requests, followed by 16905 requests (all of which except the last four would be duplicates of the first batch), followed by 16902 requests (of which all would be duplicates), etc.
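A small sketch of that loop with the URL construction pulled into a helper. The names DRAFTABLES_URL and draftables_url are hypothetical, and the IDs are the ones printed in the question; the actual HTTP call is left as a comment so the snippet runs offline.

```python
# URL template for the second request; {group_id} is filled in per draft group.
DRAFTABLES_URL = (
    "https://api.draftkings.com/draftgroups/v1/draftgroups/"
    "{group_id}/draftables?format=json"
)

def draftables_url(group_id):
    """Build the second request's URL for one draft group ID."""
    return DRAFTABLES_URL.format(group_id=group_id)

# IDs as printed in the question; in the real script they come from the first request.
draft_group_ids = [16901, 16905, 16902, 16903]

for group_id in draft_group_ids:
    url = draftables_url(group_id)
    print(url)
    # With requests installed, each URL could then be fetched:
    # data2 = requests.get(url).json()
```

This makes one request per draft group ID, which is four requests for the sample output.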
I'm trying to import JSON data via an API, and use the imported data to construct a DataFrame.
import json
import pandas as pd
import numpy as np
import requests
api_username = 'acb'
api_password = 'efg'
germany_name = 'Germany'
germany_api_url = "https://api.country_data.com/stats/?country=" + germany_name + "&year=2014"
germany_api_resp = requests.get(germany_api_url,auth=(api_username,api_password))
germany_data_json = json.loads(germany_api_resp)
germany_frame = pd.DataFrame(germany_data_json['data']).set_index('tag')
print(germany_frame) shows me the desired DataFrame.
I want to repeat the process for many countries, not just 'Germany', so I created a country object like this:
class Country(object):
    def __init__(self,name):
        self.name = name
        self.api_url = "https://api.country_data.com/stats/?country=" + name + "&year=2014"
        self.api_resp = requests.get(self.api_url,auth=(api_username,api_password))
        self.data_json = json.loads(self.api_resp)
        self.frame = pd.DataFrame(self.data_json['data']).set_index('tag')
When I create my first object, like this:
Germany = Country('Germany')
I get an Error message:
TypeError: expected string or buffer
Can someone help me with this issue?
I don't know which versions of Python and requests you're using, but I recommend updating everything. Here is an error I found:
self.data_json = json.loads(self.api_resp)
You are trying to parse a requests Response object as if it were a JSON string, so change it to:
self.data_json = self.api_resp.json()
I replaced your API URL with another one because yours is wrong, and it works for me.
See ya !
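To illustrate why the original line fails: json.loads expects text (or bytes), not a response object. The sketch below uses a stand-in class so that it runs without network access; StubResponse and its payload are hypothetical, and mimic only the text attribute and json() method of a requests response.

```python
import json

class StubResponse:
    """Hypothetical stand-in for a requests response (text attribute, json method)."""

    def __init__(self, text):
        self.text = text

    def json(self):
        return json.loads(self.text)

# Made-up body shaped like the 'data' / 'tag' access in the question.
api_resp = StubResponse('{"data": [{"tag": "population", "value": 1}]}')

try:
    json.loads(api_resp)               # passing the object itself fails
except TypeError as error:
    print("TypeError:", error)

data_json = api_resp.json()            # same result as json.loads(api_resp.text)
print(data_json["data"])
```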