Awesome

Description

Simple scripts for working with Apache Cassandra logs.

Overview

These are scripts I created to make it simpler to read/summarise/parse the system.log.

They were intended to be overly simple for readability and for portability, i.e. they can just run on any machine that can run Bourne shell or Perl without having to download additional modules or plugins.

Scripts

`compaction_rate.sh`

Extracts the compaction throughput from a Cassandra system.log.

Use this script to get a feel for the compaction in MB/s.

Usage: compaction_rate.sh <system_log> [min_data_size_bytes]

Sample output:

$ compaction_rate.sh system.log 20000000
  Throughput: 16.228230MB/s. | Data size: 38798031 (37 MB) | SSTables count: 2
  Throughput: 15.939397MB/s. | Data size: 41834507 (39 MB) | SSTables count: 2
  Throughput: 15.885001MB/s. | Data size: 45639407 (43 MB) | SSTables count: 2

`count_entries_per_hour.sh`

Counts the occurrences of a string for each hour in the Cassandra system.log.

Use this script to work out whether load has increased during a particular period.

Useful query strings:

"ParNew" - shows distribution of GC pauses
"Compacted" - shows distribution of compaction activity
"flush of Memtable" - shows flushing activity, indicates traffic
"ConcurrentMarkSweep" - indicates GC pressure
"Started hinted handoff" - indicates existence of unresponsive nodes

Sample output:

$ count_entries_per_hour.sh ParNew system.log
2015-04-28 12:00 - 8
2015-04-28 16:00 - 108
2015-04-28 20:00 - 202

`display_large_rows.sh`

Displays entries in Cassandra logs relating to compaction of large rows.

Useful for showing rows larger than a given size, e.g. 100MB.

Usage: display_large_rows.sh <system_log> [min_row_size_bytes]

Sample output:

$ display_large_rows.sh system.log 1000000000
INFO [CompactionExecutor:73] 2015-01-14 19:11:45,959 CompactionController.java (line 192) Compacting large row myKS/myCF:8e6fb0b72937 (1407625692 bytes) incrementally | 1342 MB
INFO [CompactionExecutor:73] 2015-01-14 19:13:09,901 CompactionController.java (line 192) Compacting large row myKS/myCF:2eec906de37b (1410187132 bytes) incrementally | 1344 MB
INFO [CompactionExecutor:73] 2015-01-14 19:14:34,765 CompactionController.java (line 192) Compacting large row myKS/myCF:ce49043461ce (2871138316 bytes) incrementally | 2738 MB

`initial_log_assess.sh`

A script I use to do a quick assessment of a Cassandra node. It highlights restarts, GC activity, dropped mutations, large rows, errors, etc.

Usage: initial_log_assess.sh system.log

`rename_node_dirs.sh`

The directory names for nodes in the OpsCenter Diagnostic tarball are quite long so I wrote this script to simplify them to just the IP address of the nodes.

Usage: cd <diagnostic_dir>/nodes && rename_node_dirs.sh

Sample output:

$ rename_node_dirs.sh 
Renaming [opsc-2015-04-29-11-54-21-UTC-172.31.3.100] to [172.31.3.100]... OK
Renaming [opsc-2015-04-29-11-54-21-UTC-172.31.36.54] to [172.31.36.54]... OK
Renaming [opsc-2015-04-29-11-54-21-UTC-172.31.43.125] to [172.31.43.125]... OK

`summarise_log_cass.pl`

Summarises a Cassandra log file into line counts by message type, thread names, classes and de-personalised messages.

Usage: summarise_log_cass.pl -f system.log

Sample output:

===== Summarising log file [system_678571075.log] =====

===== Count of message entries by log level =====
    15708 --- INFO 
      311 --- WARN 

===== Count of message entries by thread =====
     9871 --- [ScheduledTasks] 
     1512 --- [FlushWriter] 
     1263 --- [MemoryMeter] 
     1068 --- [RequestResponseStage] 
      814 --- [CompactionExecutor] 
      305 --- [HANDSHAKE#.#.#.#/#.#.#.#] 
      279 --- [WRITE-/#.#.#.#] 
      228 --- [main] 
      219 --- [HintedHandoff] 
...

===== Count of message entries by class =====
     9239 --- StatusLogger.java 
     2775 --- Memtable.java 
     1233 --- Gossiper.java 
      808 --- ColumnFamilyStore.java 
      593 --- CompactionTask.java 
      305 --- OutboundTcpConnection.java 
      280 --- SSLFactory.java 
      202 --- HintedHandOffManager.java 
      118 --- GCInspector.java 
...

===== Count of message entries =====
     9239 --- INFO [ScheduledTasks] StatusLogger.java ... 
     1234 --- INFO [MemoryMeter] Memtable.java CFS(Keyspace='...') liveRatio is #.# (just-counted was #.#).  calculation took #ms for # cells 
     1067 --- INFO [RequestResponseStage] Gossiper.java InetAddress /#.#.#.# is now UP 
      756 --- INFO [FlushWriter] Memtable.java Writing Memtable-<table>@#(#/# serialized/live bytes, # ops) 
      743 --- INFO [FlushWriter] Memtable.java Completed flushing <db_file> (# bytes) for commitlog position ReplayPosition(segmentId=#, position=#) 
      500 --- INFO [ScheduledTasks] ColumnFamilyStore.java Enqueuing flush of Memtable-<table> 
      298 --- INFO [CompactionExecutor] CompactionTask.java Compacting [SSTableReader(path=<sstable_files>), ...] 
      295 --- INFO [CompactionExecutor] CompactionTask.java Compacted # to [<sstable_files>]... 
      279 --- INFO [HANDSHAKE#.#.#.#/#.#.#.#] OutboundTcpConnection.java Handshaking version with #.#.#.#/#.#.#.# 
      279 --- WARN [WRITE-/#.#.#.#] SSLFactory.java Filtering out TLS_RSA_WITH_AES_#_CBC_SHA,TLS_DHE_RSA_WITH_AES_#_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_#_CBC_SHA as it isnt supported by the socket 
      155 --- INFO [GossipTasks] Gossiper.java InetAddress /#.#.#.# is now DOWN 
      106 --- INFO [ScheduledTasks] GCInspector.java GC for ConcurrentMarkSweep: # ms for # collections, # used; max is # 
      100 --- INFO [HintedHandoff] HintedHandOffManager.java Started hinted handoff for host: UUID with IP: /#.#.#.#
...

`show_log_ranges.sh`

This script is also run from the nodes directory, it will look for "system.log" files and then grab the first and last lines with the date string in them. The idea is to give the user the range the logs cover

`nodeuptime`

This script takes input from nodetool info and prints the nodes uptime in a more 'friendly' format . Useful to see how long has been up when looking at other nodetool commands.