Using CockroachDB storage attributes for heterogeneous data
This tutorial covers steps to intentionally pin data to a specific storage device.
CockroachDB is a cloud-native SQL database for building global, scalable cloud services that survive disasters. It makes building geo-distributed databases easy, and the ability to anchor data to geographic localities is one of its unique capabilities. Today, I will demonstrate how this ability extends beyond regions, availability zones, data centers, racks and nodes: the architecture also lets us domicile data on specific storage devices using storage attributes.
- Start a CockroachDB cluster with multiple disks per node using storage attributes
- Create a table pinned to a specific disk type
- Create another table and verify its location on disk
- Evict the second table to demonstrate granularity
- Assign proper storage constraints deterministically
- Verify setup
export storepath="${PWD}/workdir"
mkdir -p "${storepath}"
cockroach start \
--insecure \
--store=path=$storepath/node1/data1,attrs=hot \
--store=path=$storepath/node1/data2,attrs=warm \
--listen-addr=127.0.0.1 \
--port=26257 \
--http-port=8080 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--log-dir=$storepath/node1/logs \
--background
cockroach start \
--insecure \
--store=path=$storepath/node2/data1,attrs=hot \
--store=path=$storepath/node2/data2,attrs=warm \
--listen-addr=127.0.0.1 \
--port=26259 \
--http-port=8081 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--log-dir=$storepath/node2/logs \
--background
cockroach start \
--insecure \
--store=path=$storepath/node3/data1,attrs=hot \
--store=path=$storepath/node3/data2,attrs=warm \
--listen-addr=127.0.0.1 \
--port=26261 \
--http-port=8082 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--log-dir=$storepath/node3/logs \
--background
After the nodes are started, we need to initialize the cluster
cockroach init --insecure
At this point we have a three-node cluster with two disks per node. Here is what the filesystem looks like right now
workdir
├── node1
│ ├── data1
│ │ ├── 000002.log
│ │ ├── CURRENT
│ │ ├── LOCK
│ │ ├── MANIFEST-000001
│ │ ├── OPTIONS-000003
│ │ ├── auxiliary
│ │ ├── cockroach-temp784405882
│ │ │ ├── 000002.log
│ │ │ ├── CURRENT
│ │ │ ├── LOCK
│ │ │ ├── MANIFEST-000001
│ │ │ ├── OPTIONS-000003
│ │ │ ├── TEMP_DIR.LOCK
│ │ │ └── auxiliary
│ │ ├── cockroach.advertise-addr
│ │ ├── cockroach.advertise-sql-addr
│ │ ├── cockroach.http-addr
│ │ ├── cockroach.listen-addr
│ │ ├── cockroach.sql-addr
│ │ └── temp-dirs-record.txt
│ ├── data2
│ │ ├── 000002.log
│ │ ├── CURRENT
│ │ ├── LOCK
│ │ ├── MANIFEST-000001
│ │ ├── OPTIONS-000003
│ │ ├── auxiliary
│ │ ├── cockroach.advertise-addr
│ │ ├── cockroach.advertise-sql-addr
│ │ ├── cockroach.http-addr
│ │ ├── cockroach.listen-addr
│ │ └── cockroach.sql-addr
│ └── logs
│ ├── cockroach-pebble.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│ ├── cockroach-pebble.log -> cockroach-pebble.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│ ├── cockroach-stderr.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│ ├── cockroach-stderr.log -> cockroach-stderr.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│ ├── cockroach.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
│ └── cockroach.log -> cockroach.Artems-MacBook-Pro.artem.2021-05-19T17_39_52Z.013639.log
├── node2
│ ├── data1
│ │ ...
│ ├── data2
│ │ ...
│ └── logs
│   ...
└── node3
  ├── data1
  │ ...
  ├── data2
  │ ...
  └── logs
    ...
Let's verify all of the disks are present
SELECT node_id, store_id, attrs
FROM crdb_internal.kv_store_status;
node_id | store_id | attrs
----------+----------+-----------
1 | 1 | ["hot"]
1 | 2 | ["warm"]
2 | 3 | ["hot"]
2 | 4 | ["warm"]
3 | 5 | ["hot"]
3 | 6 | ["warm"]
Every node and its two stores are present, along with their attributes. We can now create a couple of tables and then place them deterministically on the proper devices.
CREATE TABLE t1 (id UUID DEFAULT gen_random_uuid() PRIMARY KEY, val STRING);
CREATE TABLE t2 (id UUID DEFAULT gen_random_uuid() PRIMARY KEY, val STRING);
Let's verify the replica placement for each table
SHOW RANGES FROM TABLE t1;
start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
NULL | NULL | 37 | 0 | 1 | | {1,4,5} | {"","",""}
We can observe that range 37 resides across stores 1, 4 and 5. By default the placement is not deterministic: the balancer places replicas based on various heuristics, and storage attributes are not one of them.
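To confirm that this is indeed a mix of disk types, we can look up the attributes of those stores. The store IDs 1, 4 and 5 are taken from the SHOW RANGES output above; yours may differ.
SELECT node_id, store_id, attrs
FROM crdb_internal.kv_store_status WHERE store_id IN (1, 4, 5);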
Let's see what the placement for table t2 looks like.
SHOW RANGES FROM TABLE t2;
start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
NULL | NULL | 38 | 0 | 1 | | {1,3,6} | {"","",""}
Table t2 has range ID 38 and is spread out across stores 1, 3 and 6, again a mix of hot and warm storage disks.
Let's add some data and observe the placement behavior.
INSERT INTO t1 (val) VALUES ('one'), ('two'), ('three');
INSERT INTO t2 (val) VALUES ('one'), ('two'), ('three');
SELECT * FROM t1;
SELECT * FROM t2;
id | val
---------------------------------------+--------
32e392ca-b16a-4b76-a988-5a80a5be6699 | one
3b82ac72-9d8b-4234-997a-325875b755b3 | two
d9d947aa-8b08-4267-820b-48fbfc349a75 | three
id | val
---------------------------------------+--------
83111a80-035d-4a53-bd9a-e0d36971363e | two
881f70b4-8efe-4d0e-9496-001433d85644 | three
8b49bc83-f4da-491e-a4f0-8a8b455a7a35 | one
Let's verify the placement after some data was inserted
SHOW RANGES FROM TABLE t1;
start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
NULL | NULL | 37 | 0.000137 | 1 | | {1,4,5} | {"","",""}
And t2 has the same placement as before; nothing changed.
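The same query, this time for t2:
SHOW RANGES FROM TABLE t2;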
start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
NULL | NULL | 38 | 0.000138 | 1 | | {1,3,6} | {"","",""}
The replica placement is all over the place, so let's assign it explicitly.
ALTER TABLE t1 CONFIGURE ZONE USING constraints = '[+hot]';
Let's verify
SHOW ZONE CONFIGURATION FOR TABLE t1;
target | raw_config_sql
-----------+--------------------------------------
TABLE t1 | ALTER TABLE t1 CONFIGURE ZONE USING
| range_min_bytes = 134217728,
| range_max_bytes = 536870912,
| gc.ttlseconds = 90000,
| num_replicas = 3,
| constraints = '[+hot]',
| lease_preferences = '[]'
OK, we know the constraints are assigned, but did the data actually move?
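Let's run the same SHOW RANGES query as before to find out:
SHOW RANGES FROM TABLE t1;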
start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
NULL | NULL | 37 | 0.000137 | 1 | | {1,3,5} | {"","",""}
Indeed, the replicas field now shows stores 1, 3 and 5. If you recall from earlier, those are exactly the stores carrying the hot attribute.
SELECT node_id, store_id, attrs
FROM crdb_internal.kv_store_status WHERE store_id IN (1, 3, 5);
node_id | store_id | attrs
----------+----------+----------
1 | 1 | ["hot"]
2 | 3 | ["hot"]
3 | 5 | ["hot"]
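Instead of copying store IDs by hand, we can join the two views directly. This is a convenience sketch; crdb_internal is not a stable interface, and the replicas array column on crdb_internal.ranges may differ between versions:
SELECT s.node_id, s.store_id, s.attrs
FROM crdb_internal.kv_store_status AS s
WHERE s.store_id IN (
  SELECT unnest(replicas) FROM crdb_internal.ranges WHERE table_name = 't1'
);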
At this point table t1 resides on the hot disks, but table t2 is still split between hot and warm. Let's evict it from the hot disks.
ALTER TABLE t2 CONFIGURE ZONE USING constraints = '[-hot]';
This command will make sure table t2 is placed anywhere but the hot disks.
The zone config for the table now looks like so
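You can reproduce it with the same statement we used for t1:
SHOW ZONE CONFIGURATION FOR TABLE t2;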
target | raw_config_sql
-----------+--------------------------------------
TABLE t2 | ALTER TABLE t2 CONFIGURE ZONE USING
| range_min_bytes = 134217728,
| range_max_bytes = 536870912,
| gc.ttlseconds = 90000,
| num_replicas = 3,
| constraints = '[-hot]',
| lease_preferences = '[]'
The store IDs for warm disks are 2, 4 and 6.
SELECT node_id, store_id, attrs
FROM crdb_internal.kv_store_status WHERE store_id IN (2, 4, 6);
node_id | store_id | attrs
----------+----------+-----------
1 | 2 | ["warm"]
2 | 4 | ["warm"]
3 | 6 | ["warm"]
The range for table t2 should now have its replicas on stores 2, 4 and 6.
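Let's confirm with SHOW RANGES one more time:
SHOW RANGES FROM TABLE t2;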
start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
NULL | NULL | 38 | 0.000138 | 6 | NULL | {2,4,6} | {"","",""}
That said, we want to be explicit about what we're trying to do. With the constraint above, table t2 will be placed anywhere but the hot disks; if there were a third storage type, replicas could land there as well. The reason I demonstrated the negative constraint is to show how to evict ranges. Let's pin t2 to the warm disks explicitly.
ALTER TABLE t2 CONFIGURE ZONE USING constraints = '[+warm]';
The result is identical, since we only have two disk types, warm and hot, but this constraint is explicit about the type of disk we're pinning table t2 to.
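The zone config now shows the positive constraint:
SHOW ZONE CONFIGURATION FOR TABLE t2;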
target | raw_config_sql
-----------+--------------------------------------
TABLE t2 | ALTER TABLE t2 CONFIGURE ZONE USING
| range_min_bytes = 134217728,
| range_max_bytes = 536870912,
| gc.ttlseconds = 90000,
| num_replicas = 3,
| constraints = '[+warm]',
| lease_preferences = '[]'
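As an aside, constraints are not limited to all-or-nothing placement. If your version supports per-replica constraints (this is a sketch, not verified against every release), you could keep two replicas of a table on hot disks and one on warm:
ALTER TABLE t1 CONFIGURE ZONE USING num_replicas = 3, constraints = '{"+hot": 2, "+warm": 1}';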
This is a short overview of storage attributes in CockroachDB. The capability opens the door to some sophisticated placement use cases. CockroachDB also supports node attributes, but that's a story for another day! To clean up the local cluster:
pkill cockroach
rm -rf "${storepath}"