Data

  • S3-compatible interface for easy uploads and downloads (see the sketch after this list)
  • Compatible with many storage backends through OpenDAL
    • Currently FS, S3, FTP, Postgres
  • Manual replication or configurable replication policies
  • Easy ingestion of existing data via the API
  • Content-hash-addressable data resources
  • Easy discovery of content because content hashes are stored in the DHT
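
Because the interface is S3-compatible, standard S3 tooling can be used for uploads and downloads. A minimal sketch using boto3, assuming a node endpoint at `http://localhost:1337` with placeholder credentials, bucket, and key (none of these are Aruna defaults):

```python
import boto3

# Hypothetical endpoint and credentials; replace with the values of your node.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:1337",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload a file; bucket and key are ordinary S3 paths.
s3.upload_file("genome.fasta", "my-project", "raw/genome.fasta")

# Download it again via the same S3 path.
s3.download_file("my-project", "raw/genome.fasta", "genome_copy.fasta")
```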

More info

Files can be uploaded independently of metadata creation via the S3 interface. Data can be located at nodes by its content hash or by its S3 path. While S3 paths can change, content hashes cannot, which allows pinning specific data versions via content hashes while still keeping paths flexible via the S3 interface.

Data can be stored in different data backends, allowing for flexible deployment of nodes in vastly different environments. In addition to newly created data, existing data on supported backends can be ingested into a node, allowing for data reuse and integration of existing data sources into Aruna.

Data can be replicated to other nodes via replication policies. Disclaimer: Filters in replication policies are currently ignored. If a complete bucket gets replicated, the associated paths and permissions are replicated as well. If only some objects are replicated, they are replicated by content hash only, without their S3 paths. This means that such data can be accessed at any node by its content hash, but only on some nodes by its S3 path.

Uploaded data can be linked to metadata objects, including their technical metadata. This allows workflows to first create all necessary data and load it into Aruna, and then link only the necessary data to one or more metadata objects.
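
Since content hashes are immutable while S3 paths can change, a client can record the hash of an uploaded file to pin that exact data version. A small sketch, assuming SHA-256 as the content-hash algorithm (the actual hashing scheme is an assumption here, not confirmed by this page):

```python
import hashlib

def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 hash of a local file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record the hash alongside the (mutable) S3 path to pin this exact version.
print(content_hash("genome.fasta"))
```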