For folks who run large datacenters or server farms, systems monitoring is a routine, boring, day-to-day thing. However, mountains of available data typically go unused or get lost for lack of good analysis and visualization tools. Brendan Gregg has a great writeup on alternative ways to visualize the real-time performance of multiple parameters across multiple systems. An excerpt:

For any given device type (CPUs, disks, network interfaces), and any number of devices (from a single device to a cloud of servers), we’d like to identify the following:

  • single or multiple devices at 100% utilization
  • average, minimum and maximum device utilization
  • device utilization balance (tight or loose distribution)
  • time-based characteristics

By including the time domain, we can identify whether utilization is steady or changing, and various finer details. These may include short bursts of high utilization, where it is useful to know the length of the bursts and the interval between them.

via Brendan’s blog » Visualizing Device Utilization.
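
To make the excerpt's checklist a bit more concrete, here is a minimal Python sketch of the numbers one might compute before drawing anything. It is not taken from Brendan's post: the function names (`summarize`, `find_bursts`, `burst_intervals`), the 90% burst threshold, and the use of population standard deviation as a stand-in for "balance" are all my own assumptions for illustration.

```python
from statistics import mean, pstdev


def summarize(utilization, saturated_at=100.0):
    """Summarize one sample of per-device utilization (percent).

    `utilization` maps a device name to its utilization, 0-100.
    The `saturated_at` cutoff is an assumption, not from the post.
    """
    values = list(utilization.values())
    return {
        # devices sitting at (or above) full utilization
        "saturated": sorted(d for d, u in utilization.items() if u >= saturated_at),
        "min": min(values),
        "avg": mean(values),
        "max": max(values),
        # small spread = tight distribution, large spread = loose
        "spread": pstdev(values),
    }


def find_bursts(series, threshold=90.0):
    """Return (start_index, length) for each run of samples >= threshold.

    `series` is one device's utilization over time, one value per sample.
    """
    bursts = []
    start = None
    for i, u in enumerate(series):
        if u >= threshold:
            if start is None:
                start = i  # a new burst begins here
        elif start is not None:
            bursts.append((start, i - start))
            start = None
    if start is not None:  # series ended while still in a burst
        bursts.append((start, len(series) - start))
    return bursts


def burst_intervals(bursts):
    """Number of quiet samples between the end of one burst and the next."""
    return [nxt[0] - (cur[0] + cur[1]) for cur, nxt in zip(bursts, bursts[1:])]
```

For example, `summarize({"cpu0": 100.0, "cpu1": 20.0, "cpu2": 60.0})` flags `cpu0` as saturated, and `find_bursts` run over a single device's samples gives the burst lengths and, via `burst_intervals`, the gaps between them. Summary numbers like these throw away most of the shape of the data, which is the reason to go read the visualization ideas in the original post.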