AWS - transforming objects from wrong double JSON to single JSON - Python

I created an S3 bucket to store incoming data.
The flow looked like this:
Whenever a message came in via messengerpeople or a heyflow was submitted by a customer, a webhook was triggered and the data was sent to my Lambda URL. There I had the following code:
const AWS = require('aws-sdk');
const firehose = new AWS.Firehose();

exports.handler = async (event) => {
    console.log(JSON.stringify(event, null, 4));
    try {
        await firehose
            .putRecord({
                DeliveryStreamName: 'delivery-stream-name',
                Record: {
                    Data: JSON.stringify(event.body)
                }
            })
            .promise();
    } catch (error) {
        console.error(error);
        return {
            statusCode: 400,
            body: `Cannot process event: ${error}`
        };
    }
    return {
        statusCode: 200,
        body: JSON.stringify({
            ack: new Date().getTime()
        })
    };
};
So the Lambda was putting the data into my Firehose delivery stream, which I used to get the partitioning of yyyy/mm/dd/h.
Then, when I wanted to crawl the data, the crawler didn't recognize what was inside the JSON file. So when a table was created, it didn't give me the columns with their types,
e.g. id: string
created_at: timestamp
etc.
After reviewing it, a friend of mine noticed that two things needed to be changed.
First, the code should look like this:
const AWS = require('aws-sdk');
const firehose = new AWS.Firehose();

exports.handler = async (event) => {
    console.log(JSON.stringify(event, null, 4));
    try {
        await firehose
            .putRecord({
                DeliveryStreamName: 'delivery-stream-name',
                Record: {
                    Data: event.body
                }
            })
            .promise();
    } catch (error) {
        console.error(error);
        return {
            statusCode: 400,
            body: `Cannot process event: ${error}`
        };
    }
    return {
        statusCode: 200,
        body: JSON.stringify({
            ack: new Date().getTime()
        })
    };
};
So we changed one part,
from
Data: JSON.stringify(event.body)
to
Data: event.body
since event.body is already a JSON string, so wrapping it in JSON.stringify again produced double-encoded records.
Next, I set the compression to GZIP in the delivery stream UI.
Since I made these changes, the crawler now recognizes the columns and their types when I crawl the data.
So in order to make the old data, which was sent in before these changes, usable, I need to apply these same two changes to it.
So one idea was to list all the objects from before the change, as they are in a different bucket, and apply these two changes to them in VS Code.
When I was listing objects to compare, I noticed this difference.
This is how a JSON file in the wrong format reads.
I used the following code in VS Code:
import boto3

s3_client = boto3.client("s3")
response = s3_client.get_object(Bucket='bucketname', Key='key')
data = response['Body'].read()
print(data)
and I get the following result when reading the JSON file:
b'"{\\"event_id\\":\\"123456789\\", etc.}\\n"'
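For illustration, a minimal sketch of how that double-encoded payload can be unwrapped in Python; data is the bytes printed above, and the field name is just the one visible in the example output:

import json

# The stored object is a JSON-encoded string that itself contains the JSON document.
outer = json.loads(data)    # -> '{"event_id":"123456789", ...}\n'  (a str)
record = json.loads(outer)  # -> {'event_id': '123456789', ...}     (a dict)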
When I read the correctly saved JSON with the same code as before, i.e. with this:
s3_client = boto3.client("s3")
response = s3_client.get_object(Bucket='bucketname', Key='key')
data = response['Body'].read()
print(data)
I get the following output:
b'\x1f\x8b\x'
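Those first bytes are the gzip magic number (0x1f 0x8b), i.e. the new objects are GZIP-compressed as configured in the delivery stream. A minimal sketch for reading them, assuming the decompressed content is newline-delimited JSON records:

import gzip
import json

# data is the object body read above (gzip-compressed)
text = gzip.decompress(data).decode('utf-8')
records = [json.loads(line) for line in text.splitlines() if line.strip()]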
So I need a way to bring the data that was sent in before the two changes into the same format.
I was thinking of doing it in two ways (a sketch of the transformation itself follows this list):
1. Do everything in VS Code: get all objects, fix the JSON problem, and then either gzip the data in VS Code and put it back into the bucket, or send it to a delivery stream and compress it there.
2. Write a Lambda that is triggered by newly uploaded files in an S3 bucket: fix the JSON problem within the Lambda, then put the record to a delivery stream, compress it there, and from there deliver it to the correct bucket.
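As a rough sketch of option 1, the whole repair could look like this in Python; the bucket names are placeholders, and it assumes one double-encoded record per object, as in the example output above:

import gzip
import json
import boto3

s3 = boto3.client("s3")
SOURCE_BUCKET = "old-bucket-name"    # bucket holding the double-encoded objects (placeholder)
TARGET_BUCKET = "fixed-bucket-name"  # bucket that should hold the corrected objects (placeholder)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        body = s3.get_object(Bucket=SOURCE_BUCKET, Key=key)["Body"].read()

        # Unwrap the double encoding: bytes -> JSON string -> dict
        record = json.loads(json.loads(body))

        # Serialize once and gzip, matching the new delivery stream output
        fixed = gzip.compress((json.dumps(record) + "\n").encode("utf-8"))
        s3.put_object(Bucket=TARGET_BUCKET, Key=key + ".gz", Body=fixed)

Alternatively, the unwrapped record could be pushed to the Firehose delivery stream again and compressed there, as described in option 2.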


Bug in Boto3 AWS S3 generate_presigned_url in Lambda Python 3.X with specified region?

I tried to write a Python Lambda function that returns a pre-signed URL to put an object.
import os
import json
import boto3

session = boto3.Session(region_name=os.environ['AWS_REGION'])
s3 = session.client('s3', region_name=os.environ['AWS_REGION'])

upload_bucket = 'BUCKET_NAME'  # Replace this value with your bucket name!
URL_EXPIRATION_SECONDS = 30000  # Specify how long the pre-signed URL will be valid for

# Main Lambda entry point
def lambda_handler(event, context):
    return get_upload_url(event)

def get_upload_url(event):
    key = 'testimage.jpg'  # Random filename we will use when uploading files
    # Get signed URL from S3
    s3_params = {
        'Bucket': upload_bucket,
        'Key': key,
        'Expires': URL_EXPIRATION_SECONDS,
        'ContentType': 'image/jpeg'  # Change this to the media type of the files you want to upload
    }
    # Get signed URL
    upload_url = s3.generate_presigned_url(
        'put_object',
        Params=s3_params,
        ExpiresIn=URL_EXPIRATION_SECONDS
    )
    return {
        'statusCode': 200,
        'isBase64Encoded': False,
        'headers': {
            'Access-Control-Allow-Origin': '*'
        },
        'body': json.dumps(upload_url)
    }
The code itself works and returns a signed URL in the format "https://BUCKET_NAME.s3.amazonaws.com/testimage.jpg?[...]"
However, when using Postman to try to put an object, the request loads without ever ending.
Originally I thought it was because of my code, and after a while I wrote a NodeJS function that does the same thing:
const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const s3 = new AWS.S3()

const uploadBucket = 'BUCKET_NAME' // Replace this value with your bucket name!
const URL_EXPIRATION_SECONDS = 30000 // Specify how long the pre-signed URL will be valid for

// Main Lambda entry point
exports.handler = async (event) => {
    return await getUploadURL(event)
}

const getUploadURL = async function(event) {
    const randomID = parseInt(Math.random() * 10000000)
    const Key = 'testimage.jpg' // Random filename we will use when uploading files
    // Get signed URL from S3
    const s3Params = {
        Bucket: uploadBucket,
        Key,
        Expires: URL_EXPIRATION_SECONDS,
        ContentType: 'image/jpeg' // Change this to the media type of the files you want to upload
    }
    return new Promise((resolve, reject) => {
        // Get signed URL
        let uploadURL = s3.getSignedUrl('putObject', s3Params)
        resolve({
            "statusCode": 200,
            "isBase64Encoded": false,
            "headers": {
                "Access-Control-Allow-Origin": "*"
            },
            "body": JSON.stringify(uploadURL)
        })
    })
}
The Node.js version gives me a URL in the format "https://BUCKET_NAME.s3.eu-west-1.amazonaws.com/testimage.jpg?"
The main difference between the two is the AWS subdomain: with Node.js I get "BUCKET_NAME.s3.eu-west-1.amazonaws.com", while with Python I get "https://BUCKET_NAME.s3.amazonaws.com".
When using Python, the region does not appear in the URL.
I tried adding "s3.eu-west-1" manually to the signed URL generated in Python, and it works!
Is this a bug in the AWS Boto3 Python library?
As you can see, in the Python code I tried to specify the region, but it does not seem to do anything.
Any ideas, guys?
I want to solve this mystery :)
Thanks a lot in advance,
Léo
I was able to reproduce the issue in us-east-1. There are a few bug reports on GitHub (e.g., this and this), but the proposed resolutions are inconsistent.
The workaround is to create an Internet-facing access point for the bucket and then assign the full ARN of the access point to your upload_bucket variable.
Please note that the Lambda will create a pre-signed URL, but it will only work if the Lambda has an appropriate permissions policy attached to its execution role.
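As a small illustration of that workaround (the account ID and access point name below are placeholders, not values from the question):

# Instead of the bare bucket name, point upload_bucket at the full ARN of an
# Internet-facing access point created for the bucket (placeholder values).
upload_bucket = 'arn:aws:s3:eu-west-1:123456789012:accesspoint/my-access-point'

# generate_presigned_url accepts the access point ARN wherever a bucket name is
# expected, and the resulting URL should then use the access point's regional endpoint.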

how do I consume django api from react frontend

I'm new to React. I've written a Django API endpoint to perform some kind of sync, and I want this API to be called from a React page on clicking a text link, say "Sync Now". How should I go about doing it? Based on the response from the API (200 or 400/500), I want to display "Sync failed"; on a successful sync, I want to show "Sync Now" again, but with another text, "Last Sync Time: DateTime" (for this I've added a key in my Django model). How can I use this as well?
Also, I have a follow-up: say instead of "Sync Now" and "Synced" we have another state, "Syncing", which is shown until a success or failure is returned. Is polling the server a good option, or is there another way? I know of websockets but I'm not sure which can be used efficiently.
I've been stuck here for 3 days with no real progress. Anything will help. Thanks in advance.
I'd do something like this:
import { useState } from 'react';

const SyncNow = () => {
    // Store last sync date in state with default value null
    const [lastSyncDate, setLastSyncDate] = useState(null);
    const [isLoading, setIsLoading] = useState(false);

    const handleApiCall = async () => {
        try {
            setIsLoading(true);
            const response = await fetch('api.url');
            if (response.status === 200) {
                const currentDate = Date.now().toString();
                setLastSyncDate(currentDate);
            }
        } catch (error) {
            // handle error here
        } finally {
            setIsLoading(false);
        }
    };

    if (isLoading) {
        return 'Loading...';
    }

    return (
        <div onClick={() => handleApiCall()}>
            {lastSyncDate !== null ? `lastSynced: ${lastSyncDate}` : "Sync now"}
        </div>
    );
};
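For completeness, a minimal sketch of what the Django endpoint behind 'api.url' could return so the component above can show the stored last-sync time; the SyncState model, its last_synced field, and the run_sync helper are assumptions, not code from the question:

from django.http import JsonResponse
from django.utils import timezone

from .models import SyncState  # hypothetical model with a last_synced DateTimeField
from .sync import run_sync     # hypothetical function that performs the actual sync

def sync_now(request):
    state, _ = SyncState.objects.get_or_create(pk=1)
    try:
        run_sync()
    except Exception:
        return JsonResponse({'detail': 'Sync failed'}, status=500)
    state.last_synced = timezone.now()
    state.save()
    return JsonResponse({'last_synced': state.last_synced.isoformat()})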

How can I upload a file with JSON data to django rest api?

I am using Angular on the front-end and Django REST on the back-end. I am facing a situation where I want to create a model. The structure of the model is really complex, and while I could use some simpler workarounds, using JSON and passing the files along with it would really simplify the logic and make the process efficient.
I have been trying a lot, but none of the ways seem to work.
Can someone help me with a standard way, or tell me whether it is even possible or not?
This is the structure of the TypeScript object I want to upload.
import { v4 as uuid4 } from 'uuid';

export interface CompleteReport {
    title: string;
    description: string;
    author: string;
    article_upload_images: Array<uuid4>,
    presentation_upload_images: Array<uuid4>,
    report_article: ReportArticle,
    report_image: ReportImage,
    report_podcast: ReportPodcast,
    report_presentation: ReportPresentation,
    report_video: ReportVideo,
}

export interface ReportArticle {
    file: File;
    body: string;
}

export interface ReportPodcast {
    file: any;
}

export interface ReportVideo {
    file: Array<File>;
}

export interface ReportImage {
    file: File;
    body: string;
}

export interface ReportPresentation {
    body: string;
}

export interface UploadImage {
    file: File;
}
I don't know exactly how you want to send the data, but if you want to send it as multipart/form-data, I think you should make small changes to your report structure.
JSON doesn't support binary, so you can't put files in it. You need to split the file and the report JSON.
(async () => {
    let formData = new FormData();

    // here's how to send a file via multipart/form-data with fetch
    let reportFile = document.querySelector('#file');
    formData.append("file", reportFile.files[0]);

    // here's your report json
    let report = {
        ...
    };
    formData.append("report", JSON.stringify(report));

    // send request and upload
    let response = await fetch(url, {
        method: 'POST',
        body: formData
    });

    // do something with response
    let responseText = await response.text();
    console.log(responseText)
})();
And I see a UUID in your frontend code; I think it's better to generate that kind of thing on the backend to prevent manipulated requests. In general, it's better to keep complicated logic and anything tied to server data on your backend. Just my opinion.
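For the receiving side, a minimal sketch of how a Django REST Framework view could pick the pieces apart again; the view name and response shape are assumptions, not part of the question:

import json

from rest_framework.parsers import MultiPartParser, FormParser
from rest_framework.response import Response
from rest_framework.views import APIView

class ReportUploadView(APIView):
    # multipart/form-data carries both the file and the JSON string
    parser_classes = [MultiPartParser, FormParser]

    def post(self, request):
        uploaded_file = request.FILES.get('file')         # the binary part
        report = json.loads(request.data.get('report'))   # the JSON part sent as a string
        # ... validate `report` with a serializer and attach `uploaded_file` here ...
        return Response({'received': report.get('title'), 'filename': uploaded_file.name})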

How to post complex type to WCF using Python's requests?

I am trying to query a WCF web service using Python's requests package.
I created a very simple web service in WCF, following the default VS template:
[ServiceContract]
public interface IHWService
{
    [OperationContract]
    [WebInvoke(Method="GET", UriTemplate="SayHello", ResponseFormat=WebMessageFormat.Json)]
    string SayHello();

    [OperationContract]
    [WebInvoke(Method = "POST", UriTemplate = "GetData", ResponseFormat = WebMessageFormat.Json)]
    string GetData(int value);

    [OperationContract]
    [WebInvoke(Method = "POST", UriTemplate = "GetData2", BodyStyle=WebMessageBodyStyle.Bare, RequestFormat=WebMessageFormat.Json, ResponseFormat = WebMessageFormat.Json)]
    CompositeType GetDataUsingDataContract(CompositeType composite);

    // TODO: Add your service operations here
}
From Python, I manage to call the first two and get the data back easily.
However, I am trying to call the third one, which adds the concept of complex types.
This is my Python code:
import requests as req
import json

wsAddr = "http://localhost:58356/HWService.svc"
methodPath = "/GetData2"
cType = {'BoolValue': "true", 'StringValue': 'Hello world'}
headers = {'content-type': 'application/json'}
result = req.post(wsAddr + methodPath,
                  params=json.dumps({'composite': json.dumps(cType)}),
                  headers=headers)
But it does not work, i.e., if I put a breakpoint in VS in the GetDataUsingDataContract method, I see that the composite argument is null. I think this comes from a problem in parsing, but I can't quite see what's wrong.
Do you see an obvious mistake there?
Do you know how I can debug inside the parsing mechanism?
EDIT:
Here is the complex type definition:
[DataContract]
public class CompositeType
{
    bool boolValue = true;
    string stringValue = "Hello ";

    [DataMember]
    public bool BoolValue
    {
        get { return boolValue; }
        set { boolValue = value; }
    }

    [DataMember]
    public string StringValue
    {
        get { return stringValue; }
        set { stringValue = value; }
    }
}
You need to send JSON in the POST body, but you are attaching it to the query parameters instead.
Use data instead, and only encode the outer structure:
result = req.post(wsAddr + methodPath,
                  data=json.dumps({'composite': cType}),
                  headers=headers)
If you encoded cType, you'd send a JSON-encoded string containing another JSON-encoded string, which in turn contains your cType dictionary.

Calling a python function using dojo/request

Firstly, I'm very new to the world of web development, so sorry if this question is overly simple. I'm trying to use Python to handle AJAX requests. From reading the documentation it seems as though dojo/request should be able to do this for me, however I've not found any examples to help get this working.
Assume I've got a Python file (myFuncs.py) with some functions that return JSON data that I want to get from the server. For this call I'm interested in a particular function inside this file:
import simplejson

def sayhello():
    return simplejson.dumps({'message': 'Hello from python world'})
What is not clear to me is how to call this function using Dojo/request. The documentation suggests something like this:
require(["dojo/dom", "dojo/request", "dojo/json", "dojo/domReady!"],
    function(dom, request, JSON){
        // Results will be displayed in resultDiv
        var resultDiv = dom.byId("resultDiv");

        // Request the JSON data from the server
        request.get("../myFuncs.py", {
            // Parse data from JSON to a JavaScript object
            handleAs: "json"
        }).then(function(data){
            // Display the data sent from the server
            resultDiv.innerHTML = data.message;
        },
        function(error){
            // Display the error returned
            resultDiv.innerHTML = error;
        });
    }
);
Is this even close to what I'm trying to achieve? I don't understand how to specify which function to call inside myFuncs.py.
What you could also do is create a small JSON-RPC server and use Dojo to make an AJAX call to that server and get the JSON data.
For the Python side you can follow this:
jsonrpclib
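A minimal server-side sketch, assuming the jsonrpclib package and a hypothetical myfunction matching the Dojo call below:

from jsonrpclib.SimpleJSONRPCServer import SimpleJSONRPCServer

def myfunction():
    # hypothetical example: return the data the Dojo code below iterates over
    return [{'message': 'Hello from python world'}]

server = SimpleJSONRPCServer(('localhost', 8080))
server.register_function(myfunction)
server.serve_forever()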
For Dojo you could try something like this:
<script>
    require(['dojox/rpc/Service', 'dojox/rpc/JsonRPC'],
        function(Service, JsonRpc) {
            function refreshContent() {
                var methodParams = {
                    envelope: "JSON-RPC-2.0",
                    transport: "POST",
                    target: "/jsonrpc",
                    contentType: "application/json-rpc",
                    services: {}
                };
                methodParams.services['myfunction'] = { parameters: [] };
                service = new Service(methodParams);

                function getjson() {
                    dojo.xhrGet({
                        url: "/jsonrpc",
                        load: function() {
                            var data_list = [];
                            service.myfunction().then(
                                function(data) {
                                    dojo.forEach(data, function(dat) {
                                        data_list.push(dat);
                                    });
                                    console.log(data_list);
                                },
                                function(error) {
                                    console.log(error);
                                }
                            );
                        }
                    });
                }
                getjson();
            }
            refreshContent();
        }
    );
</script>
I've used this approach with Django, where I'm not creating a separate server for the RPC calls but instead using Django's URL routing to forward the call to my function. But you can always create a small RPC server to do the same.
