Creating a CKAN package/dataset with resources using ckanapi and Python

CKAN provides the ckanapi package for accessing the CKAN API via Python or the command line.
I can use it to download metadata, create resources, etc. But I can't create a package and upload resources to it in a single API call. (A package is also referred to as a dataset.)
Internally, ckanapi scans all keys moving any file-like parameters into a separate dict, which it passes to the requests.session.post(files=..) parameter.
This is the closest I can get but CKAN returns an HTTP 500 error (copied from this guide to requests):
with ckanapi.RemoteCKAN('http://myckan.example.com', apikey='real-key', user_agent=ua, username='joe', password='pwd') as ckan:
    ckan.action.package_create(name='joe_data',
                               resources=('report.xls',
                                          open('/path/to/file.xlsx', 'rb'),
                                          'application/vnd.ms-excel',
                                          {'Expires': '0'}))
I've also tried resources=open('path/file'), files=open('file'), shorter or longer tuples, but get the same 500 error.
The requests documentation says:
:param files: (optional) Dictionary of ``'filename': file-like-objects``
for multipart encoding upload.
I can't pass ckanapi resources={'filename': open('file')} as ckanapi doesn't detect the file, attempts to pass it to requests as a normal parameter, and fails ("BufferedReader is not JSON serializable" as it attempts to make the file a POST parameter). I get the same if I try to pass a list of files. But the API is able to create a package and add a number of resources in a single call.
So how do I create a package and multiple resources with a single ckanapi call?

I was curious about this and put something together to test it. Unfortunately I haven't used the CLI you mentioned, but I hope this helps you and others who stumble across this.
I'm not positive, but I'm guessing your resources argument isn't formatted properly. resources needs to be a list of dictionaries.
Here's a Ruby script that does the insert in a single API call (Ruby is my preferred language at the moment):
# Ruby script to create a package and resource in one api call.
# You can run this in https://repl.it/languages/ruby
# Don't forget to update URLs and API key.
require 'csv'
require 'json'
require 'net/http'
hash_to_json = {
  "title" => 'test1',
  "name" => 'test1',
  "owner_org" => 'bbb9682e-b58c-4826-bf4b-b161581056be',
  "resources" => [
    {
      "url" => 'http://www.resource_domain.com/doc.kml'
    }
  ]
}.to_json
uri = URI('http://ckan_app_domain.com:5000/api/3/action/package_create')
Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Post.new uri
  request['Authorization'] = 'user-api-key'
  request.body = hash_to_json
  response = http.request request
  puts response.body
end
And here's a plain Python 2 script to do the same thing (thank you, CKAN docs, for the template I modified):
#!/usr/bin/env python
import urllib2
import urllib
import json
import pprint
# Put the details of the dataset we're going to create into a dict.
# 'resources' is a list of dictionaries, one per resource.
dataset_dict = {
    'name': 'my_dataset_name',
    'notes': 'A long description of my dataset',
    'owner_org': 'bbb9682e-b58c-4826-bf4b-b161581056be',
    'resources': [
        {
            'url': 'example.com'
        }
    ]
}
# Use the json module to dump the dictionary to a string for posting.
data_string = urllib.quote(json.dumps(dataset_dict))
# We'll use the package_create function to create a new dataset.
request = urllib2.Request(
    'http://ckan_app_domain.com:5000/api/3/action/package_create')
# Creating a dataset requires an authorization header.
# Replace 'user-api-key' with your API key, from your user account on the
# CKAN site that you're creating the dataset on.
request.add_header('Authorization', 'user-api-key')
# Make the HTTP request.
response = urllib2.urlopen(request, data_string)
assert response.code == 200
# Use the json module to load CKAN's response into a dictionary.
response_dict = json.loads(response.read())
assert response_dict['success'] is True
# package_create returns the created package as its result.
created_package = response_dict['result']
pprint.pprint(created_package)
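For readers who want to stay with ckanapi, here is a minimal sketch under a few assumptions. It assumes `ckan` is a `ckanapi.RemoteCKAN` instance, that link-only resources are passed to `package_create` as a list of dictionaries (as in the scripts above), and that each file is then uploaded through its own `resource_create` call with the `upload` argument, which is how the ckanapi README shows file uploads. The function name and its parameters are hypothetical. Note that this makes one call for the package plus one call per file, so it is not a true single-call solution.

```python
import os


def create_dataset_with_files(ckan, name, owner_org, link_urls, file_paths):
    """Create a dataset with link resources, then upload files one by one.

    `ckan` is assumed to behave like a ckanapi.RemoteCKAN instance.
    Returns the created package dict and the list of uploaded resources.
    """
    # Link-only resources can go into the same call as the package itself.
    package = ckan.action.package_create(
        name=name,
        owner_org=owner_org,
        resources=[{"url": url} for url in link_urls],
    )
    uploaded = []
    for path in file_paths:
        # Each file upload is a separate resource_create call.
        with open(path, "rb") as fh:
            resource = ckan.action.resource_create(
                package_id=package["id"],
                name=os.path.basename(path),
                upload=fh,
            )
        uploaded.append(resource)
    return package, uploaded


# Usage (not run here; the URL, key and paths are placeholders):
# import ckanapi
# with ckanapi.RemoteCKAN('http://myckan.example.com', apikey='real-key') as ckan:
#     create_dataset_with_files(ckan, 'joe_data', 'my-org-id',
#                               ['http://example.com/doc.kml'],
#                               ['/path/to/file.xlsx'])
```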

Related

Create asset for GitLab release

I am able to successfully create a release with the GitLab API but I am trying to create an additional asset that has a link in the release, package.zip. The release currently has the entire code as a zip but I am wanting to create a zip out of a subset of the repo.
Reading here: https://docs.gitlab.com/ee/user/project/releases/index.html#permanent-links-to-release-assets
It looks like I needed to do something similar to the following:
if __name__ == '__main__':
    url = "https://gitlab.com/api/v4/projects/12345678/releases"
    headers = {'PRIVATE-TOKEN': os.environ['CI_JOB_TOKEN']}
    data = {'tag_name': 'Lite-Release',
            'assets': {
                'links': [{
                    'name': 'link_test',
                    'url': 'https://gitlab.com/api/v4/projects/12345678/releases/Lite-Release/downloads',
                    'filepath': '/package.zip', 'link_type': 'other'}]
            }}
    post_resp = requests.post(url, headers=headers, data=data)
    print(post_resp.text)
This returns the error: {"error":"assets is invalid"}
What am I missing here?
Is the url field supposed to be what I want the url to be or what?
Edit: It does not appear to be a JSON formatting issue as the following works fine and creates a release.
if __name__ == '__main__':
    url = "https://gitlab.com/api/v4/projects/12345678/releases"
    headers = {'PRIVATE-TOKEN': os.environ['PRIVATE_TOKEN']}
    data = {'tag_name': 'tag_test', 'ref': 'HEAD'}
    post_resp = requests.post(url, headers=headers, data=data)
    print(post_resp.text)

I did the same as you using curl, compared the JSON data, and everything seems OK.
I previously had problems with CI_JOB_TOKEN: it can create a release (as stated in the docs - https://docs.gitlab.com/ee/api/README.html#gitlab-cicd-job-token), but it does not have enough privileges to include assets. I got a 401 instead of the error message you get, which might be down to a difference in GitLab server version.
Using a Personal Access Token, I was able to create a release with a package asset.

When creating a release via API the data is a JSON-object.
The JSON standard requires double quotes and will not accept single quotes.
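In Python, the quotes used in a dict literal never reach the server; the serializer decides what is sent. A minimal sketch, assuming the releases endpoint wants a JSON body with a matching Content-Type header: serialize the payload with `json.dumps`, or, with requests, pass `json=` instead of `data=`, which serializes the dict and sets that header for you. The helper name and the asset URL below are placeholders, not part of the original posts.

```python
import json


def build_release_request(tag_name, link_name, link_url):
    """Return (headers, body) for a JSON release-creation request."""
    payload = {
        "tag_name": tag_name,
        "assets": {"links": [{"name": link_name, "url": link_url}]},
    }
    headers = {"Content-Type": "application/json"}
    # json.dumps always emits double-quoted JSON strings, whatever
    # quote style the Python source used.
    return headers, json.dumps(payload)


headers, body = build_release_request(
    "Lite-Release", "link_test", "https://example.com/package.zip"
)
print(body)

# With requests (not run here), the equivalent call would be:
# requests.post(url, headers={'PRIVATE-TOKEN': token}, json=payload)
```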

Python sending POST requests/ multipart/form-data

I'm working on an API connection at my work. I've already made some GET and PUT requests, but now I have a problem with POST. The API documentation is here. Here is the code I tested, which returns 400 Bad Request:
import requests
files = {'files': ('fv.pdf', open(r"C:\python\API\fv.pdf", 'rb'))}
data = {"order_documents":[{'file_name':"fv.pdf", 'type_code':'CUSTOMER_INVOICE' }]}
header = {
'Authorization': '###########################',
}
response = requests.post("https://######.com/api/orders/40100476277994-A/documents", headers=header, files = files, data = data)
print(response.status_code)
print(response.url)
Does anyone have any idea how I can handle this?

It looks like you are missing the order_documents parameter. It needs to be an array, and it needs to be called order_documents.
Try changing your data variable into:
data = {"order_documents": [ {'file_name':"fv.pdf", 'type_code':'CUSTOMER_INVOICE' } ] }
The API expects files as the parameter name and your dictionary sends file to the server. The parameter name files that you give to session.post is just for the requests library and is not the actual parameter sent to the server.
The API also expects multiple files in an array, so you need to change your files object.
files = [
    ('files', ('fv.pdf', open(r"C:\python\API\fv.pdf", 'rb'))),
]
Also, I don't think you need to use requests.Session(); just use requests.post(), unless you're planning on using the session object multiple times for subsequent requests.
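To make the shape of those two arguments concrete, here is a small sketch that builds them without sending anything. It relies on requests accepting a list of `(field_name, (filename, fileobj, content_type))` tuples for `files`, so several parts can share the field name `files`. Sending the metadata as a JSON string in the `order_documents` form field is an assumption on my part; the original post does not confirm what the server expects there. The helper name and the `application/pdf` content type are also illustrative.

```python
import io
import json


def build_multipart_parts(documents):
    """Build the `files` and `data` arguments for requests.post.

    `documents` is a list of (file_name, file_object, type_code) tuples.
    """
    # A list of tuples lets several parts share the field name "files".
    files = [
        ("files", (name, fileobj, "application/pdf"))
        for name, fileobj, _code in documents
    ]
    # Assumption: the server accepts the metadata as a JSON string.
    data = {
        "order_documents": json.dumps(
            [{"file_name": name, "type_code": code}
             for name, _fileobj, code in documents]
        )
    }
    return files, data


# An in-memory stand-in for the PDF, so the sketch needs no file on disk.
files, data = build_multipart_parts(
    [("fv.pdf", io.BytesIO(b"%PDF-1.4 placeholder"), "CUSTOMER_INVOICE")]
)

# Not run here:
# requests.post(url, headers=header, files=files, data=data)
```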

How to upload files to slack using file.upload and requests

I've been searching a lot and I haven't found an answer to what I'm looking for.
I'm trying to upload a file from /tmp to slack using python requests but I keep getting {"ok":false,"error":"no_file_data"} returned.
file={'file':('/tmp/myfile.pdf', open('/tmp/myfile.pdf', 'rb'), 'pdf')}
payload={
    "filename":"myfile.pdf",
    "token":token,
    "channels":['#random'],
    "media":file
}
r=requests.post("https://slack.com/api/files.upload", params=payload)
Mostly trying to follow the advice posted here

Sending files through http requires a bit more extra work than sending other data. You have to set content type and fetch the file and all that, so you can't just include it in the payload parameter in requests.
You have to give your file information to the files parameter of the .post method so that it can add all the file transfer information to the request.
my_file = {
    'file' : ('/tmp/myfile.pdf', open('/tmp/myfile.pdf', 'rb'), 'pdf')
}
payload={
    "filename":"myfile.pdf",
    "token":token,
    "channels":['#random'],
}
r = requests.post("https://slack.com/api/files.upload", params=payload, files=my_file)

I'm writing this post to potentially save you all the time I've wasted. I tried to create a new file and upload it to Slack without actually creating a file on disk (just having its content). Because of various unhelpful errors from the Slack API, I wasted a few hours before finding out that I had had good code from the beginning and had simply not added the bot to the channel.
This code can also be used to open an existing file, get its content, modify it, and upload it to Slack.
Code:
import os
import requests
from io import StringIO  # lets us build the CSV content in memory,
                         # without actually creating a file.
# df (a DataFrame) and comment (a string) are assumed to be defined elsewhere.
sio = StringIO()
df.to_csv(sio)  # save dataframe to CSV
csv_content = sio.getvalue()
filename = 'some_data.csv'
token = os.environ.get("SLACK_BOT_TOKEN")
url = "https://slack.com/api/files.upload"
request_data = {
    'channels': 'C123456',  # somehow required if you want to share the file
                            # it will still be uploaded to the Slack servers and you will get the link back
    'content': csv_content,  # required
    'filename': filename,  # required
    'filetype': 'csv',  # helpful :)
    'initial_comment': comment,  # optional
    'text': 'File uploaded',  # optional
    'title': filename,  # optional
    #'token': token,  # Don't bother - it won't work. Send a header instead (example below).
}
headers = {
    'Authorization': f"Bearer {token}",
}
response = requests.post(
    url, data=request_data, headers=headers
)
OFFTOPIC - about the docs
I just had the worst experience (probably of this year) with Slack's files.upload documentation. I think this might be useful for you in the future.
Things that did not work as documented:
token - it cannot be a parameter of the POST request; it must be sent as a header. This was stated in a GitHub bug report by an actual Slack employee.
channel_not_found - I provided an existing, correct channel ID and still got this message. That is somewhat understandable for security reasons (obfuscation), but then why does this error message exist: not_in_channel - Authenticated user is not in the channel? After adding the bot to the channel, everything worked.
Lack of examples for using the content param (that's why I am sharing my code with you).
Different coding resulted in different errors regarding form data, and nothing in the docs helped to understand what might be wrong or which encoding is required for which upload type.
The main issue is that they do not version their API; they change it and do not update the docs, so many statements in the docs are false or outdated.
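Since these errors come back inside the response body (for example `{"ok":false,"error":"no_file_data"}`), a small check after each call makes them harder to miss. This is only a sketch: the exception class and function name are made up for illustration, and the only fields it relies on are `ok` and `error`, as seen in the responses quoted in this thread.

```python
import json


class SlackApiError(Exception):
    """Raised when a Slack response body reports ok == false."""


def parse_slack_response(body_text):
    """Parse a Slack response body and raise if it reports a failure."""
    payload = json.loads(body_text)
    if not payload.get("ok"):
        # Surface the error code (e.g. not_in_channel) instead of
        # silently continuing with a failed upload.
        raise SlackApiError(payload.get("error", "unknown_error"))
    return payload


# Usage with the response object from the code above (not run here):
# result = parse_slack_response(response.text)
```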

Based on the Slack API files.upload documentation, what you need are:
Token: Authentication token bearing required scopes.
Channel ID: Channel to upload the file to
File: File to upload
Here is the sample code. I am using the WebClient class from the @slack/web-api package to upload the file to a Slack channel.
import { createReadStream } from 'fs';
import { WebClient } from '@slack/web-api';
const token = 'token'
const channelId = 'channelID'
const web = new WebClient(token);
const uploadFileToSlack = async () => {
  await web.files.upload({
    filename: 'fileName',
    file: createReadStream('path/file'),
    channels: channelId,
  });
}

Flask Restful: change representation based on URL parameter

I am building an API using Flask and Flask-Restful. The API might be accessed by different sorts of tools (web apps, automated tools, etc.) and one of the requirements is to provide different representations (let's say JSON and CSV for the sake of the example).
As explained in the Flask-Restful docs, it's easy to change the serialization based on the content type, so for my CSV serialization I've added this:
@api.representation('text/csv')
def output_csv(data, code, headers=None):
    # some CSV serialized data
    data = 'some,csv,fields'
    resp = app.make_response(data)
    return resp
And it's working when using curl and passing the correct -H "Accept: text/csv" parameter.
The issue is that, since some browsers might be routed to a URL directly to download a CSV file, I would like to be able to force my serialization via a URL parameter, for example http://my.domain.net/api/resource?format=csv, where format=csv would have the same effect as -H "Accept: text/csv".
I've gone through both the Flask and Flask-Restful documentation and I don't see how to correctly handle this.

Simply create a subclass of Api and override the mediatypes method:
from flask import request
from flask_restful import Api
from werkzeug.exceptions import NotAcceptable
class CustomApi(Api):
    FORMAT_MIMETYPE_MAP = {
        "csv": "text/csv",
        "json": "application/json"
        # Add other mimetypes as desired here
    }
    def mediatypes(self):
        """Allow all resources to have their representation
        overridden by the `format` URL argument"""
        preferred_response_type = []
        format = request.args.get("format")
        if format:
            # The map is a class attribute, so look it up through self.
            mimetype = self.FORMAT_MIMETYPE_MAP.get(format)
            if not mimetype:
                raise NotAcceptable()
            preferred_response_type.append(mimetype)
        return preferred_response_type + super(CustomApi, self).mediatypes()
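The ordering logic can be tried out on its own, without Flask. The sketch below is a hypothetical stand-in, not part of Flask-Restful: the function takes the `format` argument and the list of accepted types as plain parameters, and raises `LookupError` where the class above raises `NotAcceptable`.

```python
FORMAT_MIMETYPE_MAP = {
    "csv": "text/csv",
    "json": "application/json",
}


def preferred_mediatypes(format_arg, accepted):
    """Return media types in priority order, with the format override first.

    `format_arg` plays the role of request.args.get("format") and
    `accepted` the role of the media types taken from the Accept header.
    """
    if not format_arg:
        return list(accepted)
    mimetype = FORMAT_MIMETYPE_MAP.get(format_arg)
    if mimetype is None:
        # Unknown format: refuse rather than fall back silently.
        raise LookupError("unsupported format: %s" % format_arg)
    return [mimetype] + list(accepted)


print(preferred_mediatypes("csv", ["application/json"]))
# prints ['text/csv', 'application/json']
```

Putting the override at the front of the list is what makes ?format=csv win over whatever the browser sends in its Accept header.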

Basically you want to retrieve parameters from GET method. Please refer to:
How do I get the url parameter in a Flask view

Python, can someone explain this script to me?

I know nothing about Python but would like to clone this script with jQuery using an AJAX POST.
To do that I need to know what this script is doing in the first place.
import requests
import json
params = {'nearest': True, 'imageurl': img, 'timestamp':140000}
request = requests.post('http://example.com/api/upload/', data=params)
output = request.json()
print json.dumps(output['files'][0]['predicted_classes'])
Thanks. If something is unclear, please comment and I'll clarify.

import requests
import json
These lines import two modules: requests (which provides functions for sending HTTP requests to a server) and json (which serializes and deserializes JSON data).
params = {'nearest': True, 'imageurl': img, 'timestamp':140000}
This creates a dictionary of key/value pairs. Here it holds the parameters to send.
response = requests.post('http://example.com/api/upload/', data=params)
This sends a POST request. post is a function in the requests module, called here with the URL and the data to send.
output = response.json()
output holds the response body, parsed from JSON into Python objects.
print json.dumps(output['files'][0]['predicted_classes'])
json.dumps converts the selected value back into a JSON string, which is then printed.
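The last two steps are easier to see with a concrete body. The sketch below uses a made-up response; the only structure taken from the question is that the body has a files list whose first entry has a predicted_classes field. The class names are invented.

```python
import json

# Hypothetical response body; the real service's output is unknown
# apart from the files[0].predicted_classes path used in the script.
sample_body = '{"files": [{"predicted_classes": ["cat", "dog"]}]}'

# response.json() does the equivalent of this: parse the JSON text
# into Python dicts and lists.
output = json.loads(sample_body)

# Pick the first file's predicted classes and turn them back into JSON text.
classes = output["files"][0]["predicted_classes"]
print(json.dumps(classes))  # prints ["cat", "dog"]
```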

This code does the following:
1) First, it imports the external modules.
2) Next, it defines params as a dictionary with 3 entries.
3) Then it uses the requests library to send params in a POST request and parses the JSON response into output.
4) Lastly, the code prints the predicted_classes of the first entry in output['files'], as JSON.
To see the request you may need to use an image library to see what you gathered from the World Wide Web.
