Using CockroachDB storage attributes for heterogeneous data

Wondering how to leverage heterogeneous storage with CockroachDB? Wonder no more!

This tutorial covers the steps needed to intentionally pin data to a specific storage device.

Motivation

CockroachDB is a cloud-native SQL database for building global, scalable cloud services that survive disasters. It makes building geo-distributed databases easy, and the ability to anchor data to geographic localities is one of its unique capabilities. Today, I will demonstrate how this ability extends beyond regions, availability zones, data centers, racks and nodes: the architecture also allows us to domicile data on specific storage devices using storage attributes.

High Level Steps

  • Start a CockroachDB cluster with multiple disks per node, tagged with storage attributes
  • Create a table pinned to a specific disk type
  • Create another table and verify its location on disk
  • Evict the second table to demonstrate the granularity of control
  • Assign storage constraints deterministically
  • Verify the setup

Step-by-step instructions

First, start a local three-node cluster where each node gets two stores: one tagged with the attribute hot and another tagged warm.

export storepath="${PWD}/workdir"
mkdir -p "${storepath}"
cockroach start \
--insecure \
--store=path="${storepath}/node1/data1",attrs=hot \
--store=path="${storepath}/node1/data2",attrs=warm \
--listen-addr=127.0.0.1:26257 \
--http-addr=127.0.0.1:8080 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--log-dir="${storepath}/node1/logs" \
--background

cockroach start \
--insecure \
--store=path="${storepath}/node2/data1",attrs=hot \
--store=path="${storepath}/node2/data2",attrs=warm \
--listen-addr=127.0.0.1:26259 \
--http-addr=127.0.0.1:8081 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--log-dir="${storepath}/node2/logs" \
--background

cockroach start \
--insecure \
--store=path="${storepath}/node3/data1",attrs=hot \
--store=path="${storepath}/node3/data2",attrs=warm \
--listen-addr=127.0.0.1:26261 \
--http-addr=127.0.0.1:8082 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--log-dir="${storepath}/node3/logs" \
--background
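
As an aside, a store is not limited to a single attribute: the attrs field of the --store flag accepts multiple attributes separated by colons. A hypothetical variant of the flag above (the ssd tag is purely illustrative and not used in this tutorial):

--store=path="${storepath}/node1/data1",attrs=hot:ssd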

After the nodes are started, we need to initialize the cluster:

cockroach init --insecure
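
The SQL statements in the remainder of this tutorial are executed from an interactive SQL shell. You can open one against any node; for example, against the first node, assuming the ports used above:

cockroach sql --insecure --host=127.0.0.1:26257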

At this point, we have a three-node cluster with two stores per node. Here's what the filesystem looks like right now:

workdir
├── node1
│   ├── data1
│   │   ├── 000002.log
│   │   ├── CURRENT
│   │   ├── LOCK
│   │   ├── MANIFEST-000001
│   │   ├── OPTIONS-000003
│   │   ├── auxiliary
│   │   ├── cockroach-temp784405882
│   │   │   ├── 000002.log
│   │   │   ├── CURRENT
│   │   │   ├── LOCK
│   │   │   ├── MANIFEST-000001
│   │   │   ├── OPTIONS-000003
│   │   │   ├── TEMP_DIR.LOCK
│   │   │   └── auxiliary
│   │   ├── cockroach.advertise-addr
│   │   ├── cockroach.advertise-sql-addr
│   │   ├── cockroach.http-addr
│   │   ├── cockroach.listen-addr
│   │   ├── cockroach.sql-addr
│   │   └── temp-dirs-record.txt
│   ├── data2
│   │   ├── 000002.log
│   │   ├── CURRENT
│   │   ├── LOCK
│   │   ├── MANIFEST-000001
│   │   ├── OPTIONS-000003
│   │   ├── auxiliary
│   │   ├── cockroach.advertise-addr
│   │   ├── cockroach.advertise-sql-addr
│   │   ├── cockroach.http-addr
│   │   ├── cockroach.listen-addr
│   │   └── cockroach.sql-addr
│   └── logs
│       ├── cockroach-pebble.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│       ├── cockroach-pebble.log -> cockroach-pebble.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│       ├── cockroach-stderr.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│       ├── cockroach-stderr.log -> cockroach-stderr.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│       ├── cockroach.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│       └── cockroach.log -> cockroach.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
├── node2
│   ├── data1
│   │   ...
│   ├── data2
│   │   ...
│   └── logs
│       ...
└── node3
    ├── data1
    │   ...
    ├── data2
    │   ...
    └── logs
        ...

Let's verify all of the disks are present

SELECT node_id, store_id, attrs
FROM crdb_internal.kv_store_status;
  node_id | store_id |  attrs
----------+----------+-----------
        1 |        1 | ["hot"]
        1 |        2 | ["warm"]
        2 |        3 | ["hot"]
        2 |        4 | ["warm"]
        3 |        5 | ["hot"]
        3 |        6 | ["warm"]

We can observe that every node and its associated disks are present. We can now create two tables and deterministically place them on the proper devices.

CREATE TABLE t1 (id UUID DEFAULT gen_random_uuid() PRIMARY KEY, val STRING);
CREATE TABLE t2 (id UUID DEFAULT gen_random_uuid() PRIMARY KEY, val STRING);

Let's verify the replica placement for each table

SHOW RANGES FROM TABLE t1;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       37 |             0 |            1 |                       | {1,4,5}  | {"","",""}

We can observe that range 37 resides across stores 1, 4 and 5. By default, the placement is not deterministic: the allocator places replicas based on various heuristics, and storage attributes are not one of them.
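
To confirm the mix, we can map the replica store IDs above back to their attributes; per the earlier store status output, stores 1 and 5 are hot while store 4 is warm:

SELECT node_id, store_id, attrs
FROM crdb_internal.kv_store_status WHERE store_id IN (1, 4, 5);
  node_id | store_id |  attrs
----------+----------+-----------
        1 |        1 | ["hot"]
        2 |        4 | ["warm"]
        3 |        5 | ["hot"]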

Let's see what the placement for table t2 looks like.

SHOW RANGES FROM TABLE t2;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       38 |             0 |            1 |                       | {1,3,6}  | {"","",""}

Table t2 has range ID 38 and is spread across stores 1, 3 and 6. Again, a mix of hot and warm storage disks.

Let's add some data and observe the placement behavior.

INSERT INTO t1 (val) VALUES ('one'), ('two'), ('three');
INSERT INTO t2 (val) VALUES ('one'), ('two'), ('three');
SELECT * FROM t1;
SELECT * FROM t2;
                   id                  |  val
---------------------------------------+--------
  32e392ca-b16a-4b76-a988-5a80a5be6699 | one
  3b82ac72-9d8b-4234-997a-325875b755b3 | two
  d9d947aa-8b08-4267-820b-48fbfc349a75 | three

                   id                  |  val
---------------------------------------+--------
  83111a80-035d-4a53-bd9a-e0d36971363e | two
  881f70b4-8efe-4d0e-9496-001433d85644 | three
  8b49bc83-f4da-491e-a4f0-8a8b455a7a35 | one

Let's verify the placement after some data was inserted

SHOW RANGES FROM TABLE t1;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       37 |      0.000137 |            1 |                       | {1,4,5}  | {"","",""}

and t2 has the same placement as before; nothing changed:

SHOW RANGES FROM TABLE t2;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       38 |      0.000138 |            1 |                       | {1,3,6}  | {"","",""}

The disk placement is all over the place. Let's assign the placement explicitly:

ALTER TABLE t1 CONFIGURE ZONE USING constraints = '[+hot]';

Let's verify

SHOW ZONE CONFIGURATION FOR TABLE t1;
  target  |           raw_config_sql
-----------+--------------------------------------
  TABLE t1 | ALTER TABLE t1 CONFIGURE ZONE USING
           |     range_min_bytes = 134217728,
           |     range_max_bytes = 536870912,
           |     gc.ttlseconds = 90000,
           |     num_replicas = 3,
           |     constraints = '[+hot]',
           |     lease_preferences = '[]'

Ok, we know the constraints are assigned, but did the data move?

SHOW RANGES FROM TABLE t1;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       37 |      0.000137 |            1 |                       | {1,3,5}  | {"","",""}

Indeed, the replicas field shows stores 1, 3 and 5. If you recall from earlier, those are exactly the stores with the attribute hot:

SELECT node_id, store_id, attrs
FROM crdb_internal.kv_store_status WHERE store_id IN (1, 3, 5);
  node_id | store_id |  attrs
----------+----------+----------
        1 |        1 | ["hot"]
        2 |        3 | ["hot"]
        3 |        5 | ["hot"]

At this point table t1 resides on the hot disks but table t2 is still split between hot and warm. Let's evict the table from the hot disks.

ALTER TABLE t2 CONFIGURE ZONE USING constraints = '[-hot]';

This command will make sure table t2 is placed anywhere but the hot disks.
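
Note that the move is not instantaneous; the rebalancer needs a few moments to relocate the replicas. One way to watch the progress is to query the range status repeatedly, here using CockroachDB's ability to treat a SHOW statement as a subquery:

SELECT range_id, replicas FROM [SHOW RANGES FROM TABLE t2];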

The zone config for the table now looks like so:

SHOW ZONE CONFIGURATION FOR TABLE t2;
  target  |           raw_config_sql
-----------+--------------------------------------
  TABLE t2 | ALTER TABLE t2 CONFIGURE ZONE USING
           |     range_min_bytes = 134217728,
           |     range_max_bytes = 536870912,
           |     gc.ttlseconds = 90000,
           |     num_replicas = 3,
           |     constraints = '[-hot]',
           |     lease_preferences = '[]'

The store IDs for warm disks are 2, 4 and 6.

SELECT node_id, store_id, attrs
FROM crdb_internal.kv_store_status WHERE store_id IN (2, 4, 6);
  node_id | store_id |  attrs
----------+----------+-----------
        1 |        2 | ["warm"]
        2 |        4 | ["warm"]
        3 |        6 | ["warm"]

The range for table t2 should now have its replicas on stores 2, 4 and 6.

SHOW RANGES FROM TABLE t2;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       38 |      0.000138 |            6 | NULL                  | {2,4,6}  | {"","",""}

That said, we want to be explicit about our intent. With the constraint above, table t2 can be placed anywhere but the hot disks; if we had a third storage type, say cold, the replicas could legitimately land there. I demonstrated the negative constraint only to show how to evict ranges from a disk type. Let's pin table t2 to the warm disks explicitly.

ALTER TABLE t2 CONFIGURE ZONE USING constraints = '[+warm]';

The result is identical, since we only have two disk types, hot and warm, but this constraint is explicit about the type of disk we're pinning table t2 to.

SHOW ZONE CONFIGURATION FOR TABLE t2;
  target  |           raw_config_sql
-----------+--------------------------------------
  TABLE t2 | ALTER TABLE t2 CONFIGURE ZONE USING
           |     range_min_bytes = 134217728,
           |     range_max_bytes = 536870912,
           |     gc.ttlseconds = 90000,
           |     num_replicas = 3,
           |     constraints = '[+warm]',
           |     lease_preferences = '[]'
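
As an aside, zone configurations also support per-replica constraints, where a JSON object maps each constraint to the number of replicas it applies to. A hypothetical sketch, not something this tutorial relies on, that would keep one replica on hot storage and two on warm:

ALTER TABLE t2 CONFIGURE ZONE USING
num_replicas = 3,
constraints = '{"+hot": 1, "+warm": 2}';

And should you want to return a table to the default placement policy, ALTER TABLE t2 CONFIGURE ZONE DISCARD; removes the custom configuration.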

This is a short overview of storage attributes in CockroachDB. The capability opens the door to some sophisticated tiered-storage use cases. That said, CockroachDB also supports node-level attributes, but that's a story for another day!

Clean up

pkill cockroach
rm -rf "${storepath}"
