Update: The state of Ceph support in openATTIC

In addition to managing "traditional" storage resources like CIFS/NFS, iSCSI and Fibre Channel, we started adding Ceph management capabilities to openATTIC 2.0 some time last year.

For us, Ceph is the answer for users looking to grow their existing storage systems from individual storage nodes into a scalable, distributed and self-healing storage cluster. In such a setup, an openATTIC node can also act as a "bridge", translating between legacy applications that still depend on established storage protocols and methods and this new storage paradigm.

In this blog post, we'd like to give you a brief summary/update on the state of support for the Ceph distributed storage system in openATTIC.

Earlier this year, we announced a development cooperation with SUSE to enhance and improve the Ceph functionality in openATTIC. We are really grateful for the expertise and guidance from the SUSE folks - the collaboration has been quite fruitful and both teams have been busy since then! SUSE's feedback and support has been invaluable in making sure that openATTIC becomes a Ceph management and monitoring tool that provides useful functionality and that people actually want to use.

It took us some time to get over the initial hump of creating the required foundation and infrastructure in the openATTIC backend code. We initially spent some time figuring out the best method of communicating with a Ceph cluster in order to perform management tasks. While Calamari (Server) sounded like an option at first, we eventually decided against using it and went with the native librados Python bindings instead (the details of this decision process probably deserve a blog post of their own).
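
To give a rough idea of what the librados Python bindings provide, here is a minimal sketch; the configuration path is just the common default and this is not openATTIC code:

```python
# Minimal sketch of using the native librados Python bindings to talk to a
# Ceph cluster; /etc/ceph/ceph.conf is simply the usual default path.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    # Cluster-wide usage statistics (kb, kb_used, kb_avail, num_objects)
    print(cluster.get_cluster_stats())
    # Names of all pools known to the cluster
    print(cluster.list_pools())
finally:
    cluster.shutdown()
```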

In addition to that, we needed to come up with an alternative way to retrieve information about the Ceph cluster's various objects (e.g. pools, RBDs, etc.) that did not involve storing the data in a local database. Since we still want a Ceph administrator to be able to use other tools to configure or manage the cluster (and some of this information changes quite frequently), storing any of it locally (as would usually be done with Django models) would have resulted in consistency and synchronization issues.

To solve this problem, we came up with a concept we dubbed "nodb models" - from a developer's perspective, they pretty much look and feel like regular Django models, with one key difference: instead of persisting any data in a local database, they obtain the required information from a different resource on demand. In our case, this resource is the Ceph cluster itself, which ensures that the openATTIC API always returns up-to-date information.
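
To illustrate the concept, here is a sketch (illustrative only, not the actual openATTIC "nodb" implementation) of a model that fetches its objects from the cluster whenever they are requested:

```python
# Illustrative sketch of the "nodb" idea -- not the real openATTIC classes.
# Instead of querying a database table, the model's objects are built from
# live librados calls every time they are requested.
import rados


class CephPool(object):
    """A Ceph pool, populated straight from the cluster."""

    def __init__(self, name):
        self.name = name

    @classmethod
    def get_all_objects(cls):
        cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
        cluster.connect()
        try:
            # The cluster is the single source of truth, so there is
            # nothing that could get out of sync with a local copy.
            return [cls(name) for name in cluster.list_pools()]
        finally:
            cluster.shutdown()
```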

From a feature implementation perspective, we basically have to implement each component of the Ceph management functionality on two different layers: first, the backend code, which communicates with the Ceph cluster via librados and exposes the results via the Django REST framework.
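
A hypothetical sketch of this REST layer could look as follows; the serializer and viewset names are made up for illustration and build on the CephPool sketch above:

```python
# Hypothetical sketch of exposing the nodb-style objects through the
# Django REST framework; class names are illustrative, not openATTIC's.
from rest_framework import serializers, viewsets
from rest_framework.response import Response


class CephPoolSerializer(serializers.Serializer):
    name = serializers.CharField()


class CephPoolViewSet(viewsets.ViewSet):
    def list(self, request):
        # get_all_objects() queries the cluster via librados (see the
        # CephPool sketch above), so no local database table is involved.
        pools = CephPool.get_all_objects()
        return Response(CephPoolSerializer(pools, many=True).data)
```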

Once this backend part has been finished, the Web UI components can be built on top of it. The first stage of the Web UI usually consists of creating "data tables" - views that display the required information in table form, with the option to click on individual elements for detailed information. These views will be further refined and improved in upcoming releases, as simply displaying the "raw" data is usually not that useful. Also, the creation of new objects like Ceph pools requires gathering a lot of information and making the right choices, so these parts of the UI will continue to evolve over time to simplify the workflows and reduce the amount of unnecessary information.

The high-level Ceph development plan and roadmap can be reviewed on our public Confluence Wiki page openATTIC Ceph Management Roadmap and Implementation Plan.

The Wiki page openATTIC Ceph REST API overview shows a matrix of the existing functionality that is currently available via the API.

These documents will be updated continuously to reflect the current state of development. Currently, our development focus is primarily on the Ceph "Jewel" release; older versions have not been tested.

So, in a nutshell, what has been implemented so far?

Based on the newly developed "nodb" backend infrastructure, openATTIC currently provides the following functionality:

  • Read access to a Ceph cluster's health status information. This information will be visible on a dedicated Ceph dashboard (which is currently under development).
  • Read-write access to a REST collection that allows creating replicated and erasure-coded Ceph pools, setting up cache tiering, modifying a few pool properties and deleting pools (see the request example after this list).
  • Creation and deletion of erasure-code profiles are supported.
  • Read access to Ceph OSDs, displaying configuration options and performance data of each OSD in the UI. Enabling and disabling of OSDs is currently work in progress.
  • Read access to Ceph Placement Groups, to display status information for each PG.
  • We are currently working on the code for managing RADOS block devices (RBDs), so you will soon be able to create, list, modify and delete RBDs via the openATTIC API (and UI).
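
As a rough example of how this API can be consumed with python-requests - note that the host name, credentials and endpoint path below are assumptions; the openATTIC Ceph REST API overview Wiki page is the authoritative reference:

```python
# Example only: host, credentials and endpoint path are assumptions --
# consult the openATTIC Ceph REST API overview for the real routes.
import requests

BASE = 'http://openattic.example.com/openattic/api'
AUTH = ('openattic', 'secret')  # placeholder credentials

# List the pools of the managed cluster
pools = requests.get(BASE + '/ceph/pools', auth=AUTH).json()
for pool in pools:
    print(pool)
```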

The following Ceph functionality is based on traditional Django models. Via the REST API, you can:

  • Show the contents of the Ceph CRUSH map tree.
  • Create and map RADOS block devices as system devices using the Ceph kernel module (see the sketch below).
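
Conceptually, this boils down to something like the following sketch; the pool and image names are examples, and openATTIC drives this through its API rather than an ad-hoc script:

```python
# Conceptual sketch: create an RBD image and map it through the Ceph kernel
# module; pool/image names are examples, not anything openATTIC-specific.
import subprocess

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')  # 'rbd' is the default pool name
    try:
        rbd.RBD().create(ioctx, 'demo-image', 10 * 1024 ** 3)  # 10 GiB image
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

# Mapping via the kernel module makes the image show up as /dev/rbdX
subprocess.check_call(['rbd', 'map', 'rbd/demo-image'])
```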

It may also be worth mentioning that support for multiple Ceph clusters has already been built into the code base and web UI - you can manage and monitor multiple Ceph clusters within one openATTIC instance.

The upcoming Ceph dashboard page will display basic cluster status information like overall storage utilization, the current read/write bandwidth and IOPS of the cluster and a summary of the health status of all nodes (e.g. OSDs and MONs). A click on a degraded node will redirect to a more detailed view of that object/node ("drill-down").

Note that our current development focus is on implementing management and monitoring functionality that can be achieved by using the existing librados and librbd Python bindings. In the next steps, we'll work on monitoring and remote management functionality that will allow openATTIC to perform arbitrary actions on any node of the Ceph cluster directly. We intend to use Salt Open for these remote management purposes, in particular with the Ceph Salt modules that SUSE is working on.

The Ceph support in openATTIC is still very much a "work in progress" and we are continuously improving and extending the functionality, so any feedback and suggestions from early adopters would be very welcome.

So if you are using Ceph and would like to get a free and open source management/monitoring platform for it, we would like to invite you to give openATTIC a try!

If you have any comments or ideas, please leave them on the Wiki pages or Jira issues, or get in touch with us via any of our other communication channels (see the Get Involved page for details).

Your input is key - we want to make sure that openATTIC turns into a Ceph management tool that provides value and is fun to use!
