API

There are Python and Javascript libraries available for communicating with Picarus and they are kept in sync with the server. As the API is being changed somewhat frequently at this point, it is best to use these libraries for communication. The documentation below explains the encodings, REST calls, and parameters available.

Data access

You can access data by row (/data/:table/:row) or by slice (/slice/:table/:startRow/:stopRow, which covers the half-open range [startRow, stopRow)). Slices exploit the contiguous nature of the rows in HBase and allow for batch execution on Hadoop.
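The half-open slice range can be illustrated without a server. The sketch below is only a local model: it assumes row keys are ordered as byte strings, and `slice_rows` is a hypothetical helper that is not part of the Picarus client.

```python
import bisect

def slice_rows(sorted_rows, start_row, stop_row):
    """Return the keys in [start_row, stop_row) from a sorted list of byte-string keys."""
    lo = bisect.bisect_left(sorted_rows, start_row)
    hi = bisect.bisect_left(sorted_rows, stop_row)
    return sorted_rows[lo:hi]

# Illustrative row keys only; real keys are arbitrary byte strings.
rows = sorted([b"img:001", b"img:002", b"img:003", b"vid:001"])

# startRow is included, stopRow is excluded.
print(slice_rows(rows, b"img:001", b"img:003"))
```

Because stopRow is excluded, consecutive slices such as [a, b) and [b, c) cover a key range without overlap.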

Two-Factor Authentication: Yubikey/Email

Picarus supports two forms of additional authentication: Yubikey (yubico.com/yubikey), a hardware token that can be programmed and entered through a Picarus admin tool (api/yubikey.py), and email, where a key is sent to the user’s email address. Using a Yubikey makes the login process more streamlined (one press versus checking email and pasting a key) and is preferred if available.

Authentication

All calls use HTTP Basic Authentication with an email as the user and either the Login Key (only for /auth/) or API Key (everything but /auth/) as the password.

  • Email: Used to send API/Login keys, used in all calls as the “user”.
  • Login Key: Used only for /auth/ calls as they are used to get an API key.
  • API Key: Used for all other calls.
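For readers not using the client libraries, the following is a minimal sketch of how the Basic Authentication header could be built by hand. The helper names `basic_auth_header` and `pick_key` are hypothetical, and the check for "/auth/" in the path is an assumption based on the rule stated above.

```python
import base64

def basic_auth_header(email, key):
    """Build the value of an HTTP Basic 'Authorization' header (email as the user)."""
    token = base64.b64encode(f"{email}:{key}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def pick_key(path, login_key, api_key):
    """Use the Login Key for /auth/ calls and the API Key for everything else."""
    return login_key if "/auth/" in path else api_key

# Placeholder credentials for illustration only.
path = "/a1/data/images"
key = pick_key(path, login_key="my-login-key", api_key="my-api-key")
headers = {"Authorization": basic_auth_header("user@example.com", key)}
```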

Get an API Key (email)

Send user an email with an API key.

EXAMPLE RESPONSE

{}

Example: Python

r = picarus.PicarusClient(server=server, email=email, login_key=login_key).auth_email_api_key()
assert r == {}

Example: Javascript

p = new PicarusClient(server=server)
p.setAuth(email, loginKey)
p.authEmailAPIKey({success: testPassed, fail: testFailed})

Get an API Key (yubikey)

Return an API Key given a Yubikey One-Time Password (OTP).

PARAMETERS

  • otp (string): Yubikey token

EXAMPLE RESPONSE

{"apiKey": "w0tnnb7wcUbpZFp8wH57"}

Example: Python

r = picarus.PicarusClient(server=server, email=email, login_key=login_key).auth_yubikey(otp)
assert 'apiKey' in r

Example: Javascript

p = new PicarusClient(server=server)
p.setAuth(email, loginKey)
p.authYubikey({success: function (r) {if (_.has(r, 'apiKey')) testPassed(); else testFailed();}, fail: testFailed})

Encodings

JSON has become the standard interchange format for REST services; however, it cannot carry binary data without encoding, and in HBase the row/column/value is, in general, binary because the underlying data is a byte string. Moreover, rows/columns often appear in URLs, which rules out both standard URL escaping (primarily because of %00) and plain base64 (because of + and /), as various browsers and intermediate servers have issues with URLs containing these characters. Values, on the other hand, are never used in URLs, but they must still be JSON safe. Base64 encoding is often implemented natively, and as values are often large (much larger than rows/columns) it makes sense to keep encoding/decoding them as efficient as possible. Consequently, rows/columns are always “urlsafe” base64 (+ -> - and / -> _) and values are always standard base64. Below are implementations of the enc/dec functions for all the encodings used in Picarus. The encodings are referred to by their abbreviated names (e.g., ub64), and context makes clear whether enc or dec is intended.

Python

import base64
import json
b64_enc = base64.b64encode
b64_dec = base64.b64decode
ub64_enc = base64.urlsafe_b64encode
ub64_dec = base64.urlsafe_b64decode
json_ub64_b64_enc = lambda x: json.dumps({ub64_enc(k): b64_enc(v)
                                          for k, v in x.items()})
json_ub64_b64_dec = lambda x: {ub64_dec(k): b64_dec(v)
                               for k, v in json.loads(x).items()}
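As written, the helpers above rely on base64 output being directly JSON-serializable, which holds on Python 2 but not on Python 3, where the base64 functions return bytes. The following is a sketch of a Python 3 adaptation, assuming rows, columns, and values are handled as bytes on the Python side; it is not taken from the Picarus client.

```python
import base64
import json

def json_ub64_b64_enc(x):
    """Encode a {bytes: bytes} mapping as JSON with ub64 keys and b64 values."""
    return json.dumps({
        base64.urlsafe_b64encode(k).decode("ascii"): base64.b64encode(v).decode("ascii")
        for k, v in x.items()
    })

def json_ub64_b64_dec(s):
    """Invert json_ub64_b64_enc, returning a {bytes: bytes} mapping."""
    return {
        base64.urlsafe_b64decode(k): base64.b64decode(v)
        for k, v in json.loads(s).items()
    }

# Round trip with keys and values that are not valid text.
record = {b"meta:class": b"horse", b"data:\xff\xfe": b"\x00\x01"}
assert json_ub64_b64_dec(json_ub64_b64_enc(record)) == record
```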

Javascript

// Requires underscore.js (http://underscorejs.org/) and base64
// (http://stringencoders.googlecode.com/svn-history/r210/trunk/javascript/base64.js)
// b64
b64_enc = base64.encode
b64_dec = base64.decode
// ub64
function ub64_enc(x) {
    return base64.encode(x).replace(/\+/g , '-').replace(/\//g , '_');
}
function ub64_dec(x) {
    return base64.decode(x.replace(/\-/g , '+').replace(/\_/g , '/'));
}
// json_ub64_b64
function json_ub64_b64_enc(x) {
    return JSON.stringify(_.object(_.map(_.pairs(x), function (i) {
        return [ub64_enc(i[0]), b64_enc(i[1])];
    })));
}
function json_ub64_b64_dec(x) {
    return _.object(_.map(_.pairs(JSON.parse(x)), function (i) {
        return [ub64_dec(i[0]), b64_dec(i[1])];
    }));
}

Versioning

All API calls are prefixed with a version (currently /a1/) that is an opaque string.

HTTP Status Codes

Standard status codes used are 400, 401, 403, 404, and 500. In general 4xx is a user error and 5xx is a server error.

Column Semantics

Several API calls accept a “columns” parameter, in which each column is b64 encoded and the columns are separated by commas (,). The parameter is optional; if it is not specified, all columns are returned. For GET operations, a row is returned if it contains at least one of the specified columns, or any column at all if none are specified. As these columns are used in HBase, a column family may also be specified, with the same semantics as in the Thrift API (i.e., all columns in the column family are returned); however, this property only holds for tables stored in HBase.
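A minimal sketch of these semantics follows. `encode_columns` builds the comma-separated parameter value, and `row_matches` is a local model of the row-selection rule for GET, including the column-family case; both names are hypothetical and the model is an interpretation of the description above, not server code.

```python
import base64

def encode_columns(columns):
    """Join b64-encoded column names with commas for the 'columns' parameter."""
    return ",".join(base64.b64encode(c).decode("ascii") for c in columns)

def row_matches(row_columns, requested):
    """Local model: would a row holding row_columns be returned for this request?"""
    if not requested:
        # No columns specified: any row that has at least one column is returned.
        return bool(row_columns)
    for req in requested:
        for col in row_columns:
            # A request ending in ':' names a whole column family (HBase tables only).
            if col == req or (req.endswith(b":") and col.startswith(req)):
                return True
    return False

print(encode_columns([b"meta:class", b"data:image"]))
```

Standard base64 output never contains a comma, so the separator cannot be confused with encoded data.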

Content-Type: application/json

If the request “Content-Type” is set to “application/json”, parameters may be provided as a JSON object, in which “columns” is a list of b64 encoded values rather than a comma-delimited string.
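As a sketch, a JSON request body carrying the columns as a list might be prepared as follows. The helper name `columns_json_request` is hypothetical, and only the “columns” key is shown.

```python
import base64
import json

def columns_json_request(columns):
    """Return (headers, body) for a request that passes columns as a JSON list."""
    headers = {"Content-Type": "application/json"}
    body = json.dumps({
        "columns": [base64.b64encode(c).decode("ascii") for c in columns]
    })
    return headers, body

headers, body = columns_json_request([b"meta:class", b"data:image"])
print(body)
```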

Table Permissions

The table below contains the data commands for Picarus. GET/PATCH/DELETE are idempotent (multiple applications have the same impact as one). Each table defines which columns can be modified directly by a user (see individual table docs for details).

Verb    Path                              Table                                     Encoding
                                          images  models  parameters  annotation-*  Input        Output
GET     /data/:table                      N       Y       Y           Y             col          row list
POST    /data/:table                      Y       Y       N           N             b64/b64      {row: b64}
GET     /data/:table/:row                 Y       Y       N           N             col          b64/b64
POST    /data/:table/:row                 Y       N       N           N             raw/b64      b64/b64
PATCH   /data/:table/:row                 Y       Y       N           N             b64/b64      {}
DELETE  /data/:table/:row                 Y       Y       N           N             none         {}
DELETE  /data/:table/:row/:column         Y       Y       N           N             none         {}
GET     /slice/:table/:startRow/:stopRow  Y       N       N           N             col+raw/raw  row list
POST    /slice/:table/:startRow/:stopRow  Y       N       N           N             raw/b64      b64/b64
PATCH   /slice/:table/:startRow/:stopRow  Y       N       N           N             b64/b64      {}
DELETE  /slice/:table/:startRow/:stopRow  N       N       N           N             none         {}

  • “col”: a key of “columns” with a value that is b64’d columns separated by commas.
  • “b64/b64”: key/value pairs that are both base64 encoded.
  • “raw/b64”: keys that are plaintext and values that are base64 encoded.
  • “row list”: a JSON list of objects, each with an attribute of “row” that is the base64 encoded row key. All other key/values are base64 encoded.
  • In the URL, “table” is plaintext. “row”, “column”, “startRow”, and “stopRow” are ub64.
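The URL rules above can be sketched as a pair of path builders. This assumes the version prefix /a1/ described under Versioning and treats row keys as bytes; `data_path` and `slice_path` are hypothetical helper names, not client functions.

```python
import base64

API_VERSION = "a1"  # current version prefix; treated as an opaque string

def ub64(x):
    """urlsafe base64 of a byte string, returned as text."""
    return base64.urlsafe_b64encode(x).decode("ascii")

def data_path(table, row=None, column=None):
    """Build /a1/data/:table[/:row[/:column]] with ub64 row and column."""
    parts = ["", API_VERSION, "data", table]
    if row is not None:
        parts.append(ub64(row))
        if column is not None:
            parts.append(ub64(column))
    return "/".join(parts)

def slice_path(table, start_row, stop_row):
    """Build /a1/slice/:table/:startRow/:stopRow with ub64 row keys."""
    return "/".join(["", API_VERSION, "slice", table, ub64(start_row), ub64(stop_row)])

print(data_path("images"))
print(slice_path("images", b"\xfb\xff", b"\xfb\xff\x00"))
```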

Row Operations

Create a row

Upload data without specifying a row.

PARAMETERS

  • *b64 column* (b64): One or more base64 encoded column/value pairs. See table permissions for what values you can set.

EXAMPLE RESPONSE

{"row": b64 row}

Create/Modify a row

Upload data specifying a row. A row need not be created with POST before this operation can be called. Use this operation when you want the row key to be a specific value (normally the case), and use the POST method for temporary data.

PARAMETERS

  • *b64 column* (b64): One or more base64 encoded column/value pairs. See table permissions for what values you can set.

EXAMPLE RESPONSE

{}

Get row

Get data from the specified row

PARAMETERS

  • columns (string): Optional list of columns (b64 encoded separated by ‘,’).

EXAMPLE RESPONSE

{"meta:class": "horse"}

DELETE /data/:table/:row

Delete a specified row

PARAMETERS

None

EXAMPLE RESPONSE

{}

Example: Python

c = picarus.PicarusClient(server=server, email=email, api_key=api_key)
# POST /data/images
r = c.post_table('images', {'meta:class': 'horse', 'data:image': 'not image'})
assert 'row' in r
row = r['row']
# GET /data/images/:row
r = c.get_row('images', row, ['meta:class'])
assert r == {'meta:class': 'horse'}
r = c.get_row('images', row, ['meta:'])
assert r == {'meta:class': 'horse'}
r = c.get_row('images', row, ['data:image'])
assert r == {'data:image': 'not image'}
r = c.get_row('images', row)
assert r == {'meta:class': 'horse', 'data:image': 'not image'}
# PATCH /data/images/:row
r = c.patch_row('images', row, {'meta:class': 'cat', 'data:image': 'image not'})
assert r == {}
# GET /data/images/:row
r = c.get_row('images', row)
assert r == {'meta:class': 'cat', 'data:image': 'image not'}
# DELETE /data/images/:row
r = c.delete_row('images', row)
assert r == {}

Creating a Model

Create a model that doesn’t require training data.

PARAMETERS

  • path (string): Model path (valid values found by GET /data/parameters)
  • model-* (string): Model parameter
  • module-* (string): Module parameter
  • key-* (ub64): Input parameter key

EXAMPLE RESPONSE

{"row": b64 row}

POST /data/:table/:row

Perform an action on a row.

Each action specifies its own return value and semantics.

PARAMETERS

  • action: Execute this on the row

action      parameters          description
i/classify  imageColumn, model  Classify an image using model
i/search    imageColumn, model  Query search index using image
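The permissions table lists the input encoding for this call as “raw/b64” (plaintext keys, base64 values). A minimal sketch of that encoding is shown below. It assumes that every value, including the action name, is base64 encoded; the source does not confirm this. The helper name `encode_raw_b64` and the model value are placeholders.

```python
import base64

def encode_raw_b64(params):
    """Encode a {str: bytes} mapping as raw/b64: plaintext keys, base64 values."""
    return {k: base64.b64encode(v).decode("ascii") for k, v in params.items()}

params = encode_raw_b64({
    "action": b"i/classify",           # assumption: the action name is encoded like any other value
    "imageColumn": b"data:image",
    "model": b"example-model-row",     # placeholder, not a real model row
})
print(sorted(params))
```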

GET /slice/:table/:startRow/:stopRow

Get a slice of rows.

PARAMETERS

  • maxRows: Maximum number of rows to return (int, max value of 100).
  • filter: Valid HBase Thrift filter.
  • excludeStart: If 1, skip startRow; up to maxRows rows are still returned if stopRow is not reached.
  • cacheKey: A user-provided key (opaque string). If it is reused on a repeated call with excludeStart=1 and startRow set to the last row of the previous result, the internal scanner may be reused. This is a significant optimization when enumerating long slices.
  • column: Optional and repeatable; the columns that should be returned (if not specified, all columns are returned).
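The interplay of maxRows, excludeStart, and cacheKey when enumerating a long slice can be sketched as a paging loop. Here `fetch(start_row, stop_row, params)` is a hypothetical callable standing in for the HTTP request; it is assumed to return the decoded “row list” (a list of dicts whose "row" entry is the row key as bytes). It is not part of the Picarus client.

```python
def iter_slice(fetch, start_row, stop_row, max_rows=100, cache_key="scan-1"):
    """Yield every row object in [start_row, stop_row), one page at a time."""
    # The same cacheKey is sent on every call so the server may reuse its scanner.
    params = {"maxRows": max_rows, "cacheKey": cache_key}
    while True:
        page = fetch(start_row, stop_row, params)
        if not page:
            return  # an empty page means the slice is exhausted
        yield from page
        # Resume from the last row seen, excluding it from the next page.
        start_row = page[-1]["row"]
        params = dict(params, excludeStart=1)

# In-memory stand-in for the server, for illustration only.
ROWS = [bytes([i]) for i in range(10)]

def fake_fetch(start_row, stop_row, params):
    selected = [r for r in ROWS if start_row <= r < stop_row]
    if params.get("excludeStart") == 1 and selected and selected[0] == start_row:
        selected = selected[1:]
    return [{"row": r} for r in selected[:params["maxRows"]]]

keys = [obj["row"] for obj in iter_slice(fake_fetch, b"\x00", b"\x07", max_rows=3)]
```

The loop stops on an empty page rather than on a short page, which costs one extra request but does not depend on how the server fills the final page.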

Perform an action on a slice

Each action specifies its own return value and semantics.

PARAMETERS

  • action: Execute this on the slice

action                         parameters                                                                       description
io/thumbnail
io/exif
io/preprocess                  model
io/classify                    model
io/feature                     model
io/hash                        model
i/dedupe/identical             column
o/crawl/flickr                 className, query, apiKey, apiSecret, hasGeo, minUploadDate, maxUploadDate, page
io/annotate/image/query        imageColumn, query
io/annotate/image/entity       imageColumn, entityColum
io/annotate/image/query_batch  imageColumn, query
i/train/classifier/svmlinear   key-meta, model-class_positive, key-feature
i/train/classifier/nbnnlocal   key-meta, key-multi_feature
i/train/hasher/rrmedian        module-hash_bits, key-feature
i/train/index/linear           *TODO*