Text

We’re using Cassandra for a project at work. At the beginning of the week I launched a 6-node cluster in one of our lower environments, without problems. I say “without problems” because all the boxes came up OK and no clients had trouble talking to the cluster.

But when I ran nodetool ring from one of the boxes, it showed only 5 machines, one of which owned 33.33% of the ring:

Address  DC       Rack  Status  State   Load       Owns    Token
                                                           141784319550391026443072753098378663705
<IP1>    us-east  1a    Up      Normal  271.02 KB  16.67%  1808575600
<IP2>    us-east  1d    Up      Normal  417.66 KB  33.33%  56713727820156410577229101240436610842
<IP3>    us-east  1a    Up      Normal  324.21 KB  16.67%  85070591730234615865843651859750628463
<IP4>    us-east  1d    Up      Normal  125.87 KB  16.67%  113427455640312821154458202479064646084
<IP5>    us-east  1e    Up      Normal  268.65 KB  16.67%  141784319550391026443072753098378663705

Running nodetool ring from any of the 6 boxes, including the one missing from the list (let’s call it IPX), showed every listed node as Up and Normal, with pretty much the same information. Running nodetool info on IPX, it seemed to think it had the same token as IP5 (the other box in AZ 1e):

Token : 141784319550391026443072753098378663705
Gossip active : true
Load : 322.3 KB
Generation No : 1388719379
Uptime (seconds) : 254182
Heap Memory (MB) : 89.50 / 3753.88
Data Center : us-east
Rack : 1e
Exceptions : 0

To try to make things whole again, I decided to move IPX’s token to a better place on the ring.
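Assuming the cluster uses RandomPartitioner (the tokens above all fall in its 0 to 2**127 range), “a better place” for N nodes usually means evenly spaced tokens i * 2**127 / N. A minimal Python sketch; balanced_tokens is a hypothetical helper, not part of Cassandra:

```python
# Assumption: RandomPartitioner, whose token space runs from 0 to 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial tokens for a RandomPartitioner ring (a sketch)."""
    # Integer arithmetic keeps the 39-digit tokens exact.
    return [i * RING_SIZE // node_count for i in range(node_count)]


for index, token in enumerate(balanced_tokens(6)):
    print(index, token)
```

For N = 6, index 1 gives 28356863910078205288614550619314017621, the token IPX ends up with in the later ring listing; the other tokens in the listings appear to be these values shifted by roughly 1808575600 (IP1’s token).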

As a consequence, running nodetool info on IP5 produced:

Token : 113427455640312821154458202479064646084
Gossip active : true
Load : 317.96 KB
Generation No : 1388719379
Uptime (seconds) : 257647
Heap Memory (MB) : 85.41 / 3753.88
Exception in thread "main" java.lang.AssertionError: Could not find myself in the endpoint list, something is very wrong!
at org.apache.cassandra.tools.NodeProbe.getEndpoint(NodeProbe.java:557)
at org.apache.cassandra.tools.NodeProbe.getDataCenter(NodeProbe.java:564)
at org.apache.cassandra.tools.NodeCmd.printInfo(NodeCmd.java:313)
at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:651)

And running nodetool ring from another box showed:

Address  DC       Rack  Status  State   Load       Owns    Token
                                                           141784319550391026443072753098378663705
<IP1>    us-east  1a    Up      Normal  275.44 KB  16.67%  1808575600
<IPX>    us-east  1d    Up      Normal  273.17 KB  16.67%  28356863910078205288614550619314017621
<IP2>    us-east  1d    Up      Normal  422.08 KB  16.67%  56713727820156410577229101240436610842
<IP3>    us-east  1a    Up      Normal  315.42 KB  16.67%  85070591730234615865843651859750628463
<IP4>    us-east  1e    Up      Normal  273.07 KB  33.33%  141784319550391026443072753098378663705

So it looks like IPX got into the ring and IP5 got kicked out (and it’s not happy about it)…

Restarting Cassandra on IP5 seemed to make everyone happy, so I ran nodetool cleanup followed by nodetool repair to wrap things up.

All that’s left now is to find some metric (hopefully exposed via JMX) that can detect such intriguing situations.
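While looking for that metric, a crude stopgap could be to parse nodetool ring output and flag the two symptoms seen here: fewer nodes than expected, and a node owning well over its fair share. A minimal Python sketch; check_ring, the 5% tolerance and the expected node count are hypothetical, and it assumes the single-token ring layout shown above (one row per node, with Load split into a value and a unit):

```python
# The first ring listing from this post, used as sample input.
SAMPLE = """\
Address DC Rack Status State Load Owns Token
141784319550391026443072753098378663705
<IP1> us-east 1a Up Normal 271.02 KB 16.67% 1808575600
<IP2> us-east 1d Up Normal 417.66 KB 33.33% 56713727820156410577229101240436610842
<IP3> us-east 1a Up Normal 324.21 KB 16.67% 85070591730234615865843651859750628463
<IP4> us-east 1d Up Normal 125.87 KB 16.67% 113427455640312821154458202479064646084
<IP5> us-east 1e Up Normal 268.65 KB 16.67% 141784319550391026443072753098378663705
"""


def check_ring(ring_output: str, expected_nodes: int, tolerance: float = 0.05) -> list[str]:
    """Return warnings for a `nodetool ring` listing (hypothetical helper)."""
    rows = []
    for line in ring_output.splitlines():
        fields = line.split()
        # Node rows: address, DC, rack, status, state, load value, load unit, owns %, token.
        # The header (8 fields) and the lone max-token line (1 field) are skipped.
        if len(fields) == 9 and fields[7].endswith("%"):
            rows.append((fields[0], float(fields[7].rstrip("%")) / 100))

    warnings = []
    if len(rows) != expected_nodes:
        warnings.append(f"ring lists {len(rows)} nodes, expected {expected_nodes}")
    fair_share = 1 / expected_nodes
    for address, owns in rows:
        if abs(owns - fair_share) > tolerance:
            warnings.append(f"{address} owns {owns:.2%}, expected about {fair_share:.2%}")
    return warnings


print(check_ring(SAMPLE, expected_nodes=6))
```

In the situation above every box reported the same ring, so running such a check from any single node would have caught both symptoms.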

Text

For the past couple of months I’ve been creating and updating a fair number of Graphite dashboards. Sometimes, for reasons I can’t quite figure out, the toolbar disappears.

The latest incarnation of the disappearing toolbar was extra sad for me, as I had just finished making some substantial changes to the dashboard and had no way of saving them…

After messing around for a while in Chrome’s Inspector console, I found the solution was simply to run:

toggleToolbar()

Text

The project is coming along. A bit slower, but I’m making progress. A basic version should be up on GitHub in a couple of weeks.

Text

I have a friend who likes to use a custom HTML page as the home page on all his browsers & computers. The page is pretty basic: a table with a number of links and a few images from webcams overlooking parts of the Panama Canal. The page gets refreshed periodically using the “classic” meta refresh method:

<meta http-equiv="refresh" content="20;">

All simple & sweet, except that in IE9 none of the images got refreshed (most have timestamps, so it was easy to tell), even though the page itself did. In Chrome everything got refreshed without problems.

Opening IE9’s Developer Tools, I noticed that the Document Mode was set to Quirks and that, on refresh, the Network tab listed “aborted” in the Result column for every image instead of a numeric HTTP status code.

After I forced the Document Mode to IE9, the images began refreshing. So the next step was finding out how IE9 chooses the Document Mode. Fortunately, the documentation was pretty easy to find, and the fix was adding a <!DOCTYPE> directive.

What I couldn’t find was the reason for the strange behavior of not (re)loading external resources on refresh. Quirks mode indeed…

Text

I’m currently working on a little script that will generate a user-friendly description of a Voluptuous schema using the very nice Python ast module.

So far it looks quite promising, or at least more promising than when I tried using inspect.
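The script itself isn’t shown, so here is a minimal sketch of the general idea rather than the actual script: parse source code that builds a Schema({...}) with ast, walk the dict literal, and render each key and validator as text. The sample schema, the describe_schema/describe_key names and the output format are all hypothetical. It only parses source, so Voluptuous doesn’t need to be installed, and it needs Python 3.9+ for ast.unparse.

```python
import ast

# Hypothetical schema source: parsed only, never executed.
SOURCE = '''
from voluptuous import Schema, Required, Optional, All, Length

schema = Schema({
    Required('name'): All(str, Length(min=1)),
    Optional('port', default=8080): int,
    'debug': bool,
})
'''


def describe_key(key: ast.expr) -> str:
    """Render a key such as Required('name') as: name (required)."""
    if isinstance(key, ast.Call) and isinstance(key.func, ast.Name) and key.args:
        first = key.args[0]
        name = first.value if isinstance(first, ast.Constant) else ast.unparse(first)
        return f"{name} ({key.func.id.lower()})"
    if isinstance(key, ast.Constant):  # plain string key
        return str(key.value)
    return ast.unparse(key)


def describe_schema(source: str) -> list[str]:
    """Describe every Schema({...}) dict literal found in the source."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "Schema" and node.args
                and isinstance(node.args[0], ast.Dict)):
            mapping = node.args[0]
            for key, value in zip(mapping.keys, mapping.values):
                if key is None:  # skip **spread entries, which have no key
                    continue
                lines.append(f"{describe_key(key)}: {ast.unparse(value)}")
    return lines


for line in describe_schema(SOURCE):
    print(line)
```

Working on the syntax tree keeps the validators exactly as written in the source, e.g. Length(min=1).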

Text

A while ago I had to modify a Python script to collect overall OK/failure metrics and send them to a statsd server as counters. Nothing really special, except that the numbers didn’t look quite right: they were always way lower than expected.
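For reference, a statsd counter is just a line of the form <bucket>:<value>|c sent over UDP (port 8125 by default). A minimal Python sketch of the sending side; the bucket names and the localhost address are hypothetical, not taken from the actual script:

```python
import socket

# Assumption: statsd listening on its default UDP port on this host.
STATSD_ADDR = ("localhost", 8125)


def counter_line(bucket: str, value: int = 1) -> bytes:
    """Encode a counter in the statsd line protocol: <bucket>:<value>|c"""
    return f"{bucket}:{value}|c".encode("ascii")


def send_counter(bucket: str, value: int = 1, addr=STATSD_ADDR) -> None:
    """Fire-and-forget UDP send; no reply is expected from statsd."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(counter_line(bucket, value), addr)


print(counter_line("myscript.failure", 2))
```

Because the transport is fire-and-forget UDP, tcpdump is a handy way to confirm the lines actually leave the box.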

After ruling out that the stats weren’t being sent in the first place (tcpdump & the script’s logs said otherwise) and that a Graphite aggregation rule was causing lost precision, the next step was to look at statsd itself.

It turned out that the cause was the implementation of a feature in our statsd fork meant to treat counters matching a certain prefix pretty much like gauges: if statsd receives multiple values during the flushInterval, only the last one seen is sent to the backend instead of their sum.

The check for whether a key was considered pre-summarized was pretty straightforward (apart from what happens if the key does start with summarizedPrefix: indexOf then returns 0, which is falsy):

if (key.indexOf(config.summarizedPrefix)) {…}

The problem is that when summarizedPrefix is undefined, indexOf searches for the string “undefined”, which normally doesn’t appear in a key, so it returns -1, and -1 is truthy in JavaScript. Thus all counters were treated as summarized ones, since there was no summarizedPrefix in the config file… 

So now that piece of code looks like this (kind of giving a new meaning to the word prefix):

if (config.summarizedPrefix && key.indexOf(config.summarizedPrefix) >= 0) {…}
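To see why this made the numbers come out low, here is a minimal Python model of the two flush behaviours (not the fork’s actual JavaScript); flush, the is_summarized predicate and the bucket names are hypothetical:

```python
from typing import Callable, Iterable


def flush(samples: Iterable[tuple[str, int]],
          is_summarized: Callable[[str], bool]) -> dict[str, int]:
    """Aggregate counter samples received during one flushInterval."""
    totals: dict[str, int] = {}
    for key, value in samples:
        if is_summarized(key):
            totals[key] = value                       # pre-summarized: last value seen wins
        else:
            totals[key] = totals.get(key, 0) + value  # regular counter: values are summed
    return totals


# Five successes and one failure reported within a single flushInterval.
samples = [("myscript.ok", 1)] * 5 + [("myscript.failure", 1)]

# Intended: no summarizedPrefix configured, so nothing is summarized.
print(flush(samples, lambda key: False))  # {'myscript.ok': 5, 'myscript.failure': 1}
# Buggy: the truthy -1 from indexOf made every key count as summarized.
print(flush(samples, lambda key: True))   # {'myscript.ok': 1, 'myscript.failure': 1}
```

With the bug, each bucket reports only the last increment per flush, which is exactly the “way lower than expected” symptom.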