Cooling Doesn’t Manage Itself

Of the primary components driving data center operations (IT assets, power, space and cooling), the first three command the lion’s share of attention. Schneider Electric (StruxureWare), Panduit (PIM), ABB (Decathlon), Nlyte, Emerson (Trellis) and others have created superb asset and power tracking systems. With these and similar systems, companies can see where their assets are located, how to get power to them and even how to manage them optimally under changing conditions.

Less well understood, and I would argue not understood at all, is how to remove IT-generated heat from the data center as efficiently as possible.

Some believe that efficient cooling can be “designed in” rather than operationally managed, and that this is good enough.

On the day a new data center goes live, the cooling will no doubt operate superbly, right up until something changes, which could happen the next day or weeks or months later. Even the most efficiently designed data centers eventually operate inefficiently. At that point your assets are at risk, and you probably won’t even know it. Changes, and the inefficiencies that follow them, are inevitable.

Moreover, efficiency by design applies only to new data centers. The vast majority of data centers operating today are aging, and all of them have accumulated incremental cooling issues over time. IT changes, infrastructure updates and failures (essentially any physical change or incident in the data center) affect cooling in ways that traditional operations or “walk around” management may not detect.

Data center managers must manage their cooling infrastructure as dynamically and closely as they do their IT assets.  The health of the cooling system directly impacts the health of those very same IT assets.

Further, cooling must be managed operationally. Beyond the cost savings of continually optimized efficiency, cooling management systems provide clearer insight into where to add capacity or redundancy, and where potential thermal problems and other risks lie.

Data centers have grown beyond the point where they can be managed manually. It’s time to stop treating cooling as the red-headed stepchild of the data center. Cooling deserves the same attention, and the same sophisticated management systems, that are already in common use for IT assets. There’s no time to lose.

Machine Learning

Why Machine Learning-based DCIM Systems Are Becoming Best Practice.

Here’s a conundrum. While data center IT equipment has a lifespan of about three years, data center cooling equipment will endure about 15 years. In other words, your data center will likely undergo at least five complete IT refreshes within the lifetime of your cooling equipment. In reality, changes happen far more often than full refreshes: racks and servers come and go, floor tiles are moved, maintenance is performed, and rack densities change with containment operations. Any one of these affects the ability of the cooling system to work efficiently and effectively.

If cooling operations are not reconfigured as IT changes are made, which is typically the case, the data center develops hot and cold spots, stranded cooling capacity and wasted energy. Every equipment refresh also carries risk, particularly if the work is done manually.

There’s a better way. The ubiquitous availability of low-cost sensors, together with the emergence of machine learning technology, is leading to new best practices for data center cooling management. Sensor-driven machine learning software allows the impact of IT changes on cooling performance to be anticipated and managed more safely.

Data centers instrumented with sensors gather real-time data that can inform software of minute-by-minute changes in cooling capacity. Machine learning software uses this information to understand the influence of each and every cooling unit on each and every rack, in real time, as IT loads change. When loads or IT infrastructure change, the software re-learns and updates itself, so its influence predictions stay current and accurate. This granular understanding of cooling influence also lets the software learn which cooling units are working effectively, at expected performance levels, and which aren’t.
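One way to picture “learning the influence” of each cooling unit on each rack is as an online regression. The sketch below is a hypothetical illustration, not Vigilent’s actual algorithm: it assumes a rack’s inlet-temperature change is roughly a weighted sum of changes in each cooling unit’s output, and learns those weights with a normalized least-mean-squares (NLMS) update as each new sensor reading arrives. The class name `InfluenceModel`, the `simulate` helper and all numbers are made up for illustration, and the readings are synthetic and noise-free.

```python
import random


class InfluenceModel:
    """Hypothetical per-rack linear model of cooling influence.

    Assumes a rack's inlet-temperature change is approximately a weighted
    sum of changes in each cooling unit's output:
        dT_rack ~= sum(w[unit] * d_output[unit])
    A negative weight means more output from that unit cools the rack.
    """

    def __init__(self, n_racks, n_units, step=0.5, eps=1e-6):
        self.w = [[0.0] * n_units for _ in range(n_racks)]
        self.step = step  # NLMS step size; 0 < step < 2 for stability
        self.eps = eps    # guards against dividing by a near-zero input norm

    def predict(self, rack, d_output):
        """Predicted inlet-temperature change at `rack` for output changes."""
        return sum(wi * xi for wi, xi in zip(self.w[rack], d_output))

    def update(self, rack, d_output, observed_dt):
        """One normalized LMS step toward the newly observed sensor reading."""
        error = observed_dt - self.predict(rack, d_output)
        norm = self.eps + sum(x * x for x in d_output)
        gain = self.step * error / norm
        weights = self.w[rack]
        for j, x in enumerate(d_output):
            weights[j] += gain * x
        return error


def simulate(model, rack, true_weights, n_samples, rng):
    """Feed synthetic, noise-free readings generated from `true_weights`."""
    for _ in range(n_samples):
        d_output = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        observed = sum(w * x for w, x in zip(true_weights, d_output))
        model.update(rack, d_output, observed)


if __name__ == "__main__":
    rng = random.Random(0)
    model = InfluenceModel(n_racks=1, n_units=3)
    simulate(model, 0, [-0.8, -0.3, -0.05], 2000, rng)
    print([round(w, 3) for w in model.w[0]])
    # Simulated IT change: unit 2 now matters far more to this rack.
    simulate(model, 0, [-0.4, -0.3, -0.6], 2000, rng)
    print([round(w, 3) for w in model.w[0]])
```

Because each update nudges the weights toward the latest reading, the learned weights converge to the true values and then track them again after the simulated change, which mirrors the re-learning behavior described above. A production system would also have to cope with sensor noise, the lag between an output change and its thermal effect, and units that change together; this sketch handles none of those.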

This understanding also makes a data-supported case for targeted corrective maintenance. With a clearer understanding and visualization of cooling unit health, operators can justify the right budget to maintain equipment effectively, thereby improving the overall health of the data center and reducing risk.

In one recent experience at a large US data center, machine learning software revealed that 40% of the cooling units were consuming power but not cooling. The data center operator was aware of the problem but couldn’t convince senior management to commit budget, because he could neither quantify the problem nor prove the need for a specific expenditure to resolve it. With new and clear data in hand, the operator was able to identify the failed CRACs (computer room air conditioners) and present the budget required to repair or replace them.
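How the software reached that diagnosis isn’t detailed here. As a much simpler, rule-based stand-in (not machine learning), a unit that draws meaningful power while barely lowering the temperature of the air passing through it is a candidate for “consuming power but not cooling.” In this hypothetical sketch, the `UnitReading` fields, the `flag_idle_consumers` function, the 1 kW and 2 °C thresholds and the readings themselves are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class UnitReading:
    unit_id: str
    power_kw: float      # electrical power drawn by the unit
    return_air_c: float  # air temperature entering the unit
    supply_air_c: float  # air temperature leaving the unit


def flag_idle_consumers(readings, min_power_kw=1.0, min_delta_t_c=2.0):
    """Return IDs of units drawing power while barely cooling their airflow.

    Thresholds are illustrative assumptions, not vendor guidance.
    """
    flagged = []
    for r in readings:
        delta_t = r.return_air_c - r.supply_air_c
        if r.power_kw >= min_power_kw and delta_t < min_delta_t_c:
            flagged.append(r.unit_id)
    return flagged


# Made-up readings for illustration only.
readings = [
    UnitReading("CRAC-01", 5.2, 29.0, 17.5),
    UnitReading("CRAC-02", 4.8, 24.1, 23.6),  # fans running, almost no cooling
    UnitReading("CRAC-03", 0.0, 24.0, 24.0),  # switched off, so not flagged
    UnitReading("CRAC-04", 5.0, 27.5, 26.9),  # fans running, almost no cooling
]
flagged = flag_idle_consumers(readings)
print(flagged, f"{len(flagged) / len(readings):.0%} of units")
```

Run as-is, this prints `['CRAC-02', 'CRAC-04'] 50% of units`. A fixed threshold is crude: a unit serving a lightly loaded zone can legitimately show a small temperature drop, which is one reason a learned, per-unit view of influence is more useful than a single rule.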

This clearer view of how IT changes affect cooling equipment enables personnel to keep cooling capacity adjustments in step with those changes and, in most cases, eliminates the need for manual control. Fewer “on-the-fly, floor time corrections” also frees operators to focus on problems that require more creativity and to manage physical changes, such as floor tile adjustments, more effectively.
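To make “anticipating” concrete: once the influence of each cooling unit on each rack is known, software can predict what turning a unit down would do before anyone touches it. This hypothetical sketch assumes a linear influence matrix `influence[r][j]` (degrees of inlet-temperature change at rack r per unit change in unit j’s output) has already been learned, and checks which units could be turned down by one step while every rack is predicted to stay at or below a temperature limit. The function names, the matrix values and the 26.5 °C limit are illustrative only.

```python
def predict_rack_temps(influence, rack_temps_c, d_output):
    """Predict rack inlet temperatures after changing cooling unit outputs.

    influence[r][j] is the assumed, already-learned change in rack r's inlet
    temperature per unit change in cooling unit j's output (negative = cools).
    """
    return [
        t + sum(g * d for g, d in zip(row, d_output))
        for t, row in zip(rack_temps_c, influence)
    ]


def units_safe_to_turn_down(influence, rack_temps_c, limit_c, step=1.0):
    """Return indices of units whose output could drop by `step` with every
    rack predicted to stay at or below `limit_c` (one unit at a time)."""
    n_units = len(influence[0])
    safe = []
    for j in range(n_units):
        d_output = [0.0] * n_units
        d_output[j] = -step  # reduce only unit j
        predicted = predict_rack_temps(influence, rack_temps_c, d_output)
        if max(predicted) <= limit_c:
            safe.append(j)
    return safe


# Illustrative values only.
influence = [
    [-2.0, -0.5, -0.1],  # rack 0 depends mostly on unit 0
    [-0.3, -1.8, -0.2],  # rack 1 depends mostly on unit 1
]
rack_temps_c = [24.0, 25.0]
print(units_safe_to_turn_down(influence, rack_temps_c, limit_c=26.5))
```

With these values, units 0 and 2 are predicted to be safe to turn down, while unit 1 is not, because rack 1 leans heavily on it. Checking one unit at a time keeps the example short; a real controller would also weigh combined changes and hold a safety margin below the limit.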

There’s no replacement for experience-based human expertise. But why not leverage your staff to do what they do best, and eliminate the tasks that are better served by software control? Data centers using machine learning software are undeniably more efficient and more robust. Operators can more confidently future-proof themselves against inefficiency or adverse capacity impacts as conditions change. For these reasons alone, machine learning-based software should be considered an emerging best practice.

2012 Retrospective

It’s getting better all the time.

Despite our relentless appetite for data, fueled by ever more interesting and arguably useful multimedia applications, data center energy consumption is growing more slowly than historical trends would predict.

We should be proud of that success, while remaining focused on even greater efficiency innovation.

Large companies have stepped up with powerful sustainability initiatives that affect energy use throughout their enterprises. We’ve gotten better at leveraging natural resources, such as outside air, to moderate data center temperatures. We are using denser, smarter racks for space and other efficiencies. Data center cooling units are built with variable-speed devices, improving energy efficiency machine by machine. Utility companies increasingly offer sophisticated, results-generating incentives to jump-start efficiency programs.
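Much of the gain from variable-speed devices follows from the fan affinity laws: for a given fan and system, airflow scales with speed while power scales roughly with the cube of speed, so a modest speed reduction saves a disproportionate amount of energy. A minimal, idealized sketch (the function name is hypothetical, and it ignores motor and drive losses as well as any minimum-speed limits):

```python
def fan_power_fraction(speed_fraction):
    """Fan power relative to full speed, per the fan affinity law (P ~ N^3).

    Idealized: ignores motor and drive losses and minimum-speed limits.
    """
    if not 0.0 <= speed_fraction <= 1.0:
        raise ValueError("speed_fraction must be between 0 and 1")
    return speed_fraction ** 3


for speed in (1.0, 0.9, 0.8, 0.7):
    print(f"{speed:.0%} speed -> {fan_power_fraction(speed):.0%} power")
```

At 80% speed, for example, the idealized fan draws about 51% of full-speed power (0.8³ = 0.512).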

These and other factors are making a difference, as Jonathan Koomey’s 2011 report, “Growth in Data Center Electricity Use 2005 to 2010,” clearly showed: energy use flattened rather than rising in lockstep with data center growth. Earlier estimates by Koomey and other analysts projected that world data center energy usage would double from 2005 to 2010. Actual growth was closer to 56%, a reduction Koomey attributes both to fewer server installations than expected and to reduced electricity use per server.
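Put in annual terms, the gap is larger than it may first appear. The compounding below is our own arithmetic on the two figures above (a doubling versus roughly 56% growth over five years), not numbers taken from the report:

```latex
\begin{aligned}
\text{projected: } & E_{2010} = 2\,E_{2005}
  \;\Rightarrow\; 2^{1/5} - 1 \approx 14.9\%\ \text{per year} \\
\text{actual: }    & E_{2010} \approx 1.56\,E_{2005}
  \;\Rightarrow\; 1.56^{1/5} - 1 \approx 9.3\%\ \text{per year} \\
\text{ratio: }     & \frac{1.56\,E_{2005}}{2\,E_{2005}} = 0.78
\end{aligned}
```

In other words, 2010 usage came in roughly 22% below the doubling projection.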

I am proud of what our industry – and what our company – has achieved.  Consider some of this year’s highlights.

The New York Times raised the profile, and the ire, of the data center industry by calling attention to the massive energy consumed by, well, consumers. Data center operators and analysts alike criticized the article, saying it ignored the many significant sustainability and energy-reduction measures already in use.

Vigilent received an astounding eight industry awards this year, recognizing our technology innovation, business success and workplace values. I’m very proud that several of these awards were presented by, or achieved in partnership with, our customers. For example, Vigilent and NTT won the prestigious Uptime GEIT 2012 award in the Facility Product Deployment category. NTT Facilities, together with NTT Communications, received the 2012 Green Grid Grand Prix award, recognizing NTT’s innovative efforts to raise energy efficiency levels in Japan using Vigilent and other contributing DCIM tools. And Verizon, in recognition of our support for its commitment to continuing quality and service, presented us with its Supplier Recognition award in the green and sustainability category.

We moved strongly into the Japanese and Canadian markets with the help of NTT Facilities and Telus, both of which made strategic investments in Vigilent following highly successful deployments. Premier Silicon Valley venture firm Accel Partners became an investor early in the year.

We launched Version 5 of our intelligent energy management system, which adds enhanced cooling system control with Intelligent Analytics-driven trending and visualization, along with a new alarm and notification product to further reduce downtime risk.

And, perhaps most satisfying of all, we helped our customers avert more than a few data center failures through real-time monitoring and intervention, along with early notification of possible issues.

This year, we will reduce energy consumption by more than 72 million kWh in the US alone, and this figure grows with each new deployment. We do this profitably, while contributing directly to our customers’ bottom lines through energy cost savings.

Things are getting better. And we’re just getting started.