Instaclustr SSTable tools

https://www.instaclustr.com/update-released-instaclustr-sstable-analysis-tools-apache-cassandra/
https://www.instaclustr.com/instaclustr-open-sources-cassandra-sstable-analysis-tools/

Compile

$ git clone git@github.com:instaclustr/cassandra-sstable-tools.git
$ cd cassandra-sstable-tools
# Select the correct branch for major version (default is cassandra-4.1)
$ git checkout cassandra-4.1
$ mvn clean install

Install

Copy ic-sstable-tools.jar to the Cassandra JAR folder, e.g. /usr/share/cassandra/lib

Copy the bin/ic-sstable-tools script into your $PATH

We also offer RPM and DEB packages. If you install one of them, you do not need to perform the steps above.
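The two manual copy steps above can be sketched as a small shell function. This is only an illustration: `install_ic_tools` is a hypothetical helper that is not part of the project, and all four paths are supplied by the caller because the location of the built jar and of your Cassandra installation may differ.

```shell
#!/bin/sh
# Sketch of the two manual install steps. install_ic_tools is a hypothetical
# helper, not something shipped with the project.

# install_ic_tools JAR SCRIPT LIB_DIR BIN_DIR
#   JAR      path to the built ic-sstable-tools.jar
#   SCRIPT   path to bin/ic-sstable-tools from the repository checkout
#   LIB_DIR  Cassandra JAR folder, e.g. /usr/share/cassandra/lib
#   BIN_DIR  a directory that is on your $PATH
install_ic_tools() {
    jar="$1"; script="$2"; lib_dir="$3"; bin_dir="$4"

    # Fail early if a source file or a target directory is missing.
    [ -f "$jar" ]     || { echo "jar not found: $jar" >&2; return 1; }
    [ -f "$script" ]  || { echo "script not found: $script" >&2; return 1; }
    [ -d "$lib_dir" ] || { echo "no such directory: $lib_dir" >&2; return 1; }
    [ -d "$bin_dir" ] || { echo "no such directory: $bin_dir" >&2; return 1; }

    # Step 1: jar into the Cassandra JAR folder.
    cp "$jar" "$lib_dir/" || return 1
    # Step 2: launcher script into a directory on $PATH, marked executable.
    cp "$script" "$bin_dir/" || return 1
    chmod +x "$bin_dir/$(basename "$script")"
}
```

A call might look like `install_ic_tools <path-to-built-jar> bin/ic-sstable-tools /usr/share/cassandra/lib /usr/local/bin`, where `/usr/local/bin` is an assumed example of a directory on your $PATH.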

Documentation

$ ./bin/ic-sstable-tools 
Missing required sub-command.
Usage: <main class> [-hV] [COMMAND]
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  cfstats   Detailed statistics about cells in a column family
  pstats    Partition size statistics for a column family
  purge     Statistics about reclaimable data for a column family
  sstables  Print out metadata for sstables that belong to a column family
  summary   Summary information about all column families including how much of the data is repaired

To display help for a specific subcommand, run:

$ ./bin/ic-sstable-tools help cfstats

summary

Provides summary information about all column families. Useful for finding the largest column families and how much data has been repaired by incremental repairs.

Usage

ic-sstable-tools summary

Output

| Column | Description |
|---|---|
| Keyspace | Keyspace the column family belongs to |
| Column Family | Name of column family |
| SSTables | Number of sstables on this node for the column family |
| Disk Size | Compressed size on disk for this node |
| Data Size | Uncompressed size of the data for this node |
| Last Repaired | Maximum repair timestamp on sstables |
| Repair % | Percentage of data marked as repaired |

sstables

Prints sstable metadata for a column family. Useful for tuning compaction settings.

Usage

ic-sstable-tools sstables <keyspace> <column-family>

Output

| Column | Description |
|---|---|
| SSTable | Data.db filename of sstable |
| Disk Size | Size of sstable on disk |
| Total Size | Uncompressed size of data contained in the sstable |
| Min Timestamp | Minimum cell timestamp contained in the sstable |
| Max Timestamp | Maximum cell timestamp contained in the sstable |
| Duration | The time span between minimum and maximum cell timestamps |
| Min Deletion Time | The minimum deletion time |
| Max Deletion Time | The maximum deletion time |
| Level | Leveled Compaction Strategy (LCS) level of the sstable |
| Keys | Number of partition keys |
| Avg Partition Size | Average partition size |
| Max Partition Size | Maximum partition size |
| Avg Column Count | Average number of columns in a partition |
| Max Column Count | Maximum number of columns in a partition |
| Droppable | Estimated droppable tombstones |
| Repaired At | Time when marked as repaired by incremental repair |

pstats

Tool for finding the largest partitions. It reads the Index.db files, so it is relatively quick.

Usage

ic-sstable-tools pstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

| Option | Description |
|---|---|
| -h | Display help |
| -b | Batch mode. Uses progress indicator that is friendly for running in batch jobs. |
| -n <num> | Number of partitions to display |
| -t <name> | Snapshot to analyse. Snapshot is created if none is specified. |
| -f <files> | Comma separated list of Data.db sstables to filter on |
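
As an illustration of how the options combine in a scheduled job, here is a minimal sketch of a wrapper around pstats. The function name `pstats_report`, the `IC_TOOLS` variable, and the default of 10 partitions are assumptions of this example and not part of the tool. Only the flags documented above are used.

```shell
#!/bin/sh
# Command to run; override IC_TOOLS if the script is not on your $PATH.
# (IC_TOOLS is a convention of this sketch, not something the tool reads.)
IC_TOOLS="${IC_TOOLS:-ic-sstable-tools}"

# pstats_report KEYSPACE TABLE [NUM] [SNAPSHOT]
# Runs pstats in batch mode (-b) so the progress output suits log files.
# NUM defaults to 10 here; that default belongs to this wrapper only.
pstats_report() {
    keyspace="$1"; table="$2"; num="${3:-10}"; snapshot="${4:-}"

    if [ -z "$keyspace" ] || [ -z "$table" ]; then
        echo "usage: pstats_report <keyspace> <column-family> [num] [snapshot]" >&2
        return 2
    fi

    if [ -n "$snapshot" ]; then
        # Analyse an existing snapshot by name.
        "$IC_TOOLS" pstats -b -n "$num" -t "$snapshot" "$keyspace" "$table"
    else
        # No -t given: the tool creates a snapshot itself.
        "$IC_TOOLS" pstats -b -n "$num" "$keyspace" "$table"
    fi
}
```

For example, `pstats_report my_keyspace my_table 20 my_snapshot` would show the 20 largest partitions from the snapshot named `my_snapshot`; all three names are placeholders.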

Output

Summary: Summary statistics about partitions

| Column | Description |
|---|---|
| Count (Size) | Number of partition keys on this node |
| Total (Size) | Total uncompressed size of all partitions on this node |
| Total (SSTable) | Number of sstables on this node |
| Minimum (Size) | Minimum uncompressed partition size |
| Minimum (SSTable) | Minimum number of sstables a partition belongs to |
| Average (Size) | Average (mean) uncompressed partition size |
| Average (SSTable) | Average (mean) number of sstables a partition belongs to |
| std dev. (Size) | Standard deviation of partition sizes |
| std dev. (SSTable) | Standard deviation of number of sstables for a partition |
| 50% (Size) | Estimated 50th percentile of partition sizes |
| 50% (SSTable) | Estimated 50th percentile of sstables for a partition |
| 75% (Size) | Estimated 75th percentile of partition sizes |
| 75% (SSTable) | Estimated 75th percentile of sstables for a partition |
| 90% (Size) | Estimated 90th percentile of partition sizes |
| 90% (SSTable) | Estimated 90th percentile of sstables for a partition |
| 95% (Size) | Estimated 95th percentile of partition sizes |
| 95% (SSTable) | Estimated 95th percentile of sstables for a partition |
| 99% (Size) | Estimated 99th percentile of partition sizes |
| 99% (SSTable) | Estimated 99th percentile of sstables for a partition |
| 99.9% (Size) | Estimated 99.9th percentile of partition sizes |
| 99.9% (SSTable) | Estimated 99.9th percentile of sstables for a partition |
| Maximum (Size) | Maximum uncompressed partition size |
| Maximum (SSTable) | Maximum number of sstables a partition belongs to |

Largest partitions: The top N largest partitions

| Column | Description |
|---|---|
| Key | The partition key |
| Size | Total uncompressed size of the partition |
| SSTable Count | Number of sstables that contain the partition |

SSTable Leaders: The top N partitions that belong to the most sstables

| Column | Description |
|---|---|
| Key | The partition key |
| SSTable Count | Number of sstables that contain the partition |
| Size | Total uncompressed size of the partition |

SSTables: Metadata about sstables as it relates to partitions.

| Column | Description |
|---|---|
| SSTable | Data.db filename of SSTable |
| Size | Uncompressed size |
| Min Timestamp | Minimum cell timestamp in the sstable |
| Max Timestamp | Maximum cell timestamp in the sstable |
| Level | Leveled Compaction Strategy (LCS) level of the sstable |
| Partitions | Number of partition keys in the sstable |
| Avg Partition Size | Average uncompressed partition size in sstable |
| Max Partition Size | Maximum uncompressed partition size in sstable |

cfstats

Tool for getting detailed cell statistics that can help identify issues with the data model.

Usage

ic-sstable-tools cfstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

| Option | Description |
|---|---|
| -h | Display help |
| -b | Batch mode. Uses progress indicator that is friendly for running in batch jobs. |
| -n <num> | Number of partitions to display |
| -t <name> | Snapshot to analyse. Snapshot is created if none is specified. |
| -f <files> | Comma separated list of Data.db sstables to filter on |
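
The -f option takes a comma separated list, which is awkward to type by hand for more than a few sstables. The sketch below builds such a list from file names. `sstable_filter` is a hypothetical helper, and the sstable names in the usage line are placeholders. The documentation above does not state whether -f expects bare file names or full paths, so the helper passes each name through unchanged.

```shell
#!/bin/sh
# sstable_filter FILE...
# Joins the given Data.db file names with commas, the format -f expects.
# Names that do not end in Data.db are skipped with a warning on stderr.
sstable_filter() {
    list=""
    for f in "$@"; do
        case "$f" in
            *Data.db) ;;   # keep Data.db components only
            *) echo "skipping non-Data.db file: $f" >&2; continue ;;
        esac
        # Prepend a comma only when the list already has an entry.
        list="${list:+$list,}$f"
    done
    printf '%s\n' "$list"
}
```

It could then be used as `ic-sstable-tools cfstats -f "$(sstable_filter nb-1-big-Data.db nb-3-big-Data.db)" my_keyspace my_table`.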

Output

Summary: Summary statistics about partitions

| Column | Description |
|---|---|
| Count (Size) | Number of partition keys on this node |
| Rows (Size) | Number of clustering rows |
| (deleted) | Number of clustering row deletions |
| Total (Size) | Total uncompressed size of all partitions on this node |
| Total (SSTable) | Number of sstables on this node |
| Minimum (Size) | Minimum uncompressed partition size |
| Minimum (SSTable) | Minimum number of sstables a partition belongs to |
| Average (Size) | Average (mean) uncompressed partition size |
| Average (SSTable) | Average (mean) number of sstables a partition belongs to |
| std dev. (Size) | Standard deviation of partition sizes |
| std dev. (SSTable) | Standard deviation of number of sstables for a partition |
| 50% (Size) | Estimated 50th percentile of partition sizes |
| 50% (SSTable) | Estimated 50th percentile of sstables for a partition |
| 75% (Size) | Estimated 75th percentile of partition sizes |
| 75% (SSTable) | Estimated 75th percentile of sstables for a partition |
| 90% (Size) | Estimated 90th percentile of partition sizes |
| 90% (SSTable) | Estimated 90th percentile of sstables for a partition |
| 95% (Size) | Estimated 95th percentile of partition sizes |
| 95% (SSTable) | Estimated 95th percentile of sstables for a partition |
| 99% (Size) | Estimated 99th percentile of partition sizes |
| 99% (SSTable) | Estimated 99th percentile of sstables for a partition |
| 99.9% (Size) | Estimated 99.9th percentile of partition sizes |
| 99.9% (SSTable) | Estimated 99.9th percentile of sstables for a partition |
| Maximum (Size) | Maximum uncompressed partition size |
| Maximum (SSTable) | Maximum number of sstables a partition belongs to |

Row Histogram: Histogram of number of rows per partition

| Column | Description |
|---|---|
| Percentile | Minimum, average, standard deviation (std dev.), percentile, maximum |
| Count | Estimated number of rows per partition for the given percentile |

Largest partitions: Partitions with largest uncompressed size

| Column | Description |
|---|---|
| Key | The partition key |
| Size | Total uncompressed size of the partition |
| Rows | Total number of clustering rows in the partition |
| (deleted) | Number of row deletions in the partition |
| Tombstones | Number of cell or range tombstones |
| (droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
| Cells | Number of cells in the partition |
| SSTable Count | Number of sstables that contain the partition |

Widest partitions: Partitions with the most cells

| Column | Description |
|---|---|
| Key | The partition key |
| Rows | Total number of clustering rows in the partition |
| (deleted) | Number of row deletions in the partition |
| Cells | Number of cells in the partition |
| Tombstones | Number of cell or range tombstones |
| (droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
| Size | Total uncompressed size of the partition |
| SSTable Count | Number of sstables that contain the partition |

Most Deleted Rows: Partitions with the most row deletions

| Column | Description |
|---|---|
| Key | The partition key |
| Rows | Total number of clustering rows in the partition |
| (deleted) | Number of row deletions in the partition |
| Size | Total uncompressed size of the partition |
| SSTable Count | Number of sstables that contain the partition |

Tombstone Leaders: Partitions with the most tombstones

| Column | Description |
|---|---|
| Key | The partition key |
| Tombstones | Number of cell or range tombstones |
| (droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
| Rows | Total number of clustering rows in the partition |
| Cells | Number of cells in the partition |
| Size | Total uncompressed size of the partition |
| SSTable Count | Number of sstables that contain the partition |

SSTable Leaders: Partitions that are in the most sstables

| Column | Description |
|---|---|
| Key | The partition key |
| SSTable Count | Number of sstables that contain the partition |
| Size | Total uncompressed size of the partition |
| Rows | Total number of clustering rows in the partition |
| Cells | Number of cells in the partition |
| Tombstones | Number of cell or range tombstones |
| (droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |

SSTables: Metadata about sstables as it relates to partitions.

| Column | Description |
|---|---|
| SSTable | Data.db filename of SSTable |
| Size | Uncompressed size |
| Min Timestamp | Minimum cell timestamp in the sstable |
| Max Timestamp | Maximum cell timestamp in the sstable |
| Partitions | Number of partitions |
| (deleted) | Number of row level partition deletions |
| (avg size) | Average uncompressed partition size in sstable |
| (max size) | Maximum uncompressed partition size in sstable |
| Rows | Total number of clustering rows in sstable |
| (deleted) | Number of row deletions in sstable |
| Cells | Number of cells in the SSTable |
| Tombstones | Number of cell or range tombstones in the SSTable |
| (droppable) | Number of tombstones that are droppable according to gc_grace_seconds |
| (range) | Number of range tombstones |
| Cell Liveness | Percentage of live cells, that is, the ratio of non-tombstoned cells to the total number of cells. Does not account for tombstones or cell updates that shadow other cells. |
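
To make the Cell Liveness definition concrete, the arithmetic can be sketched as below. `cell_liveness` is a hypothetical helper for illustration only. It takes the number of non-tombstoned cells and the total number of cells as inputs, and it makes no claim about how the tool itself derives or rounds the value.

```shell
#!/bin/sh
# cell_liveness LIVE_CELLS TOTAL_CELLS
# Prints non-tombstoned cells as a percentage of all cells, to one decimal
# place. Prints "n/a" when there are no cells at all.
cell_liveness() {
    awk -v live="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * live / total
    }'
}
```

For example, an sstable with 1000 cells of which 750 are not tombstoned gives `cell_liveness 750 1000`, which prints `75.0`.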

purge

Finds the largest reclaimable (GCable) partitions. This is an intensive process: it effectively performs "fake" compactions to calculate the metrics.

Usage

ic-sstable-tools purge [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

| Option | Description |
|---|---|
| -h | Display help |
| -b | Batch mode. Uses progress indicator that is friendly for running in batch jobs. |
| -n <num> | Number of partitions to display |
| -t <name> | Snapshot to analyse. Snapshot is created if none is specified. |

Output

Largest reclaimable partitions: Partitions with the largest amount of reclaimable data

| Column | Description |
|---|---|
| Key | The partition key |
| Size | Total uncompressed size of the partition |
| Reclaim | Reclaimable uncompressed size |
| Generations | SSTable generations the partition belongs to |

Please see https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/ for the Instaclustr support status of this project.