HBase Tables¶
Images¶
Row¶
Each row corresponds to an “image” along with all associated features, metadata, etc. The image itself is stored in data:image.
Permissions¶
Users can read all columns and write to data:image and meta: (i.e., anything under meta:).
Column Families¶
Column Family | Description |
data | Image data. data:image is where the “source” image goes. Preprocessors place other copies in data: |
thum | Where visualization-only thumbnails exist (these are not to be used for actual analysis) |
feat | Image features |
pred | Image predictions stored as a binary double. |
hash | Hash codes stored as binary bytes. Separated from feat so that it can be scanned fast. |
meta | Image labels, tags, etc. |
Videos¶
There are a variety of ways to store videos, this approach allows for client/server/workers to only have a constant amount of the video in memory at any given time which is essential for long videos. As the video isn’t re-encoded, if we want to execute a video in parallel with frame-accurate processing then each client will need to have access to portion of video they need to process along with all video preceeding it. This is a very conservative approach that simplifies the implementation considerably and future optimizations will allow us to only require the preceding chunk. However, any non-trivial analysis of the video will dominate the execution time so this isn’t a priority at this point.
Row¶
Each row corresponds to a “video” along with all associated features, metadata, etc. The video data is broken into chunks, with the chunk size stored in meta:video_chunk_size (the last chunk may be partial) and the number of chunks stored in meta:video_chunks. The current recommended chunk size is 1MB. To ensure that when the video is reconstituted it is identical to the original, the sha1 hash is stored in meta:video_sha1. These must all be set by the user from Picarus’s point of view, but practically this will be done by a client side library that developers will interact with.
Permissions¶
Users can read all columns and write to data:video- and meta: (i.e., anything under meta:).
Column Families¶
Column Family | Description |
data | Video data. data:video-* is where the “source” video goes. |
thum | Where visualization-only thumbnails exist (these are not to be used for actual analysis) |
feat | Image/video features |
pred | Image/video predictions stored as a binary double. |
hash | Hash codes stored as binary bytes. Separated from feat so that it can be scanned fast. |
meta | Image labels, tags, etc. |
Models¶
Row¶
Each row corresponds to a “model” which is something derived from data, primarily from the images table. Parameters of the model should be included, along with the source columns used to produce it.
Permissions¶
Users can read all columns and write to data:tags, data:notes, and user: (i.e., anything under user).
Column Families¶
Column Family | Description |
user | Stored user permissions (“r” or “rw”) as user:name@domain.com |
data | Used for everything not in user: |