Monday, November 07, 2011

Cassandra: Recover SSTable Due To Version Mismatch

Cassandra 0.7 shipped with a bug that caused incorrect row-level bloom filters to be generated when compacting sstables created by earlier versions. The bug manifests as IOExceptions during column name-based queries. Cassandra provides "nodetool scrub" to rebuild sstables with correct bloom filters, with no data lost. (If your cluster was never on 0.7.0 or earlier, you don't have to worry about this.) Note that nodetool scrub snapshots your data files before rebuilding, just in case.

When adding a node to the cluster, the bootstrapping process gets stuck when it receives a file that is apparently from an older version. This is the exception:

java.lang.RuntimeException: Cannot recover SSTable /var/lib/cassandra/data/MYKEYSPACE/UserUpdateStatusWallTimeline-tmp-f-1 due to version mismatch. (current version is g).
at org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
at org.apache.cassandra.db.compaction.CompactionManager.submitSSTableBuild(CompactionManager.java:1090)
at org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:110)
at org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:177)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)

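The file name in the exception carries the sstable version: the "f" in "-tmp-f-1", while the running node writes version "g". As a rough sketch, the helper below pulls that version token out of a file name. The function name sstable_version is hypothetical, and it assumes names shaped like ColumnFamily[-tmp]-version-generation[-Component], as in the trace above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cassandra): print the version token
# of an sstable file name.
# Assumes the layout <ColumnFamily>[-tmp]-<version>-<generation>[-<Component>].
sstable_version() {
    name=$(basename "$1")
    # Drop the column family name and the optional "tmp" marker,
    # then keep the first remaining dash-separated field.
    echo "$name" | sed -e 's/^[^-]*-//' -e 's/^tmp-//' | cut -d'-' -f1
}

# Prints "f" for the file named in the exception above.
sstable_version /var/lib/cassandra/data/MYKEYSPACE/UserUpdateStatusWallTimeline-tmp-f-1
```

Any file that reports a version other than the one named in the exception message ("g" here) belongs to a column family that is a candidate for scrubbing.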

The following shell script simplifies scrubbing your Cassandra data.

#!/bin/sh
# Cassandra: Recover SSTable Due To Version Mismatch
# (by) Asep Andria

# http://www.linkedin.com/in/asepandria

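NODETOOL="/usr/local/cassandra/bin/nodetool"
CASSANDRA_HOST="192.168.0.1"
KEYSPACE="MYKEYSPACE"
CASSANDRA_DATA="/data/cassandra/data/$KEYSPACE"

# Find every column family that still has version "f" sstables on disk
# and scrub each one in turn.
ls "$CASSANDRA_DATA"/*-f-* | sed 's#^.*/##' | cut -f1 -d'-' | sort -u |
while read -r CF; do
    "$NODETOOL" -h "$CASSANDRA_HOST" scrub "$KEYSPACE" "$CF" < /dev/null
done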
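Before pointing the script at a live node, it may help to check the name-extraction step on its own. The sketch below is a dry run under stated assumptions: the function name column_families and the sample file names (Users, Timeline) are made up for illustration, and the loop only prints the scrub commands rather than running them. It uses sort -u so each column family is listed exactly once, however many component files it has.

```shell
#!/bin/sh
# Hypothetical helper: read sstable file paths on stdin and print each
# column family name once. Assumes the name precedes the first dash.
column_families() {
    sed 's#^.*/##' | cut -d'-' -f1 | sort -u
}

# Dry run with invented file names: print the command that would be
# executed for each column family instead of calling nodetool.
printf '%s\n' \
    /data/cassandra/data/MYKEYSPACE/Users-f-1-Data.db \
    /data/cassandra/data/MYKEYSPACE/Users-f-1-Index.db \
    /data/cassandra/data/MYKEYSPACE/Timeline-f-3-Data.db |
column_families |
while read -r cf; do
    echo "nodetool -h 192.168.0.1 scrub MYKEYSPACE $cf"
done
```

With the three sample paths, this prints one scrub command for Timeline and one for Users. Replacing the printf with a listing of the real data directory gives a preview of what the script would scrub.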