The Real Cost of Cooling Configuration Errors

Hands in the network cause problems. A setting adjusted once, based on someone’s instinct about what needed to change at one moment in time, often remains unmodified years later.

This is configuration rot. If your data center has been running for a while, the chances are pretty high that your cooling configurations, to name one example, are wildly out of sync. It’s even more likely you don’t know about it.

Every air conditioner is controlled by an embedded computer. Each computer supports multiple configuration parameters. Each of these different configurations can be perfectly acceptable. But a roomful of air conditioners with individually sensible configurations can produce bad outcomes when their collective impact is considered.

I recently toured a new data center in which each air conditioner supported 17 configuration parameters affecting temperature and humidity. There was a lot of unexplained variation in the configurations. Six of the 17 settings varied by more than 30% from unit to unit. Only five settings were identical across all units. Initial configuration variation, compounded by entropy over time, wastes energy and prevents the overall air conditioning system from producing an acceptable temperature and humidity distribution.

Configuration errors contribute to accidental de-rating and loss of capacity. This wastes energy, and it’s costly from a capex perspective. Perhaps you don’t need a new air conditioner. Instead, perhaps you can optimize or synchronize the configurations of the air conditioners you already have and unlock the capacity you need. Another common configuration error is incompatible set points. If one air conditioner is trying to make a room colder and another is trying to make it warmer, the units will fight each other.

Configuration errors also contribute to poor free cooling performance. Misconfiguration can lock out free cooling in many ways.

The problem is significant. Large organizations use thousands of air conditioners. Manual management of individual configurations is impossible. Do the math. If you have 2,000 air conditioners, each with up to 17 configuration parameters, that is 34,000 individual settings to manage, not to mention the additional external variables. How can you manage, much less optimize, those configurations over time?
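
To make that math concrete, here is a minimal sketch, with hypothetical unit names and parameter names, of how software could quantify configuration drift across a fleet by counting how many distinct values each setting takes:

```python
# Minimal sketch: quantify configuration drift across a fleet of cooling units.
# Unit names and parameters are hypothetical; real units expose many more settings.
from collections import Counter

FLEET = {
    "crah-01": {"supply_temp_sp": 18.0, "humidity_sp": 45, "fan_min_pct": 30},
    "crah-02": {"supply_temp_sp": 21.5, "humidity_sp": 45, "fan_min_pct": 60},
    "crah-03": {"supply_temp_sp": 18.0, "humidity_sp": 45, "fan_min_pct": 30},
}

def drift_report(fleet):
    """Count how many distinct values exist for each parameter across the fleet."""
    pairs = Counter()
    for unit, config in fleet.items():
        for name, value in config.items():
            pairs[(name, value)] += 1
    distinct = Counter(name for (name, _value) in pairs)
    # Report only parameters that take more than one value across the fleet.
    return {name: count for name, count in distinct.items() if count > 1}

if __name__ == "__main__":
    for param, n_values in drift_report(FLEET).items():
        print(f"{param}: {n_values} distinct values across the fleet")
```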

Ideally, you need intelligent software that manages these configurations automatically. You need templates that prescribe optimized configuration. You need visibility to determine, on a regular basis, which configurations are necessary as conditions change. You need exception handling, so you can temporarily change configurations when you perform tasks such as maintenance, equipment swaps, and new customer additions, and then make sure the configurations return to their optimized state afterward. And, you need a system that will alert you when someone tries to change a configuration, and/or enforce optimized configurations automatically.
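
As a rough illustration of the template and exception-handling ideas (a sketch only, with made-up field names, not a description of any particular product), a configuration manager can hold a prescriptive template, grant a temporary override with an expiry, and flag any deviation that has no active exception:

```python
# Sketch of template-driven configuration management with temporary exceptions.
# Field names, units, and values are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

TEMPLATE = {"supply_temp_sp": 18.0, "humidity_sp": 45, "fan_min_pct": 30}

@dataclass
class ConfigException:
    unit: str
    param: str
    value: float
    expires: datetime

@dataclass
class ConfigManager:
    template: dict
    exceptions: list = field(default_factory=list)

    def grant_exception(self, unit, param, value, hours):
        """Allow a temporary deviation, e.g. during maintenance or an equipment swap."""
        self.exceptions.append(
            ConfigException(unit, param, value, datetime.now() + timedelta(hours=hours))
        )

    def audit(self, unit, observed):
        """Return parameters that deviate from the template with no active exception."""
        now = datetime.now()
        active = {
            (e.unit, e.param): e.value
            for e in self.exceptions
            if e.unit == unit and e.expires > now
        }
        return {
            p: v for p, v in observed.items()
            if v != self.template.get(p) and active.get((unit, p)) != v
        }

mgr = ConfigManager(TEMPLATE)
mgr.grant_exception("crah-02", "fan_min_pct", 60, hours=8)   # planned maintenance
print(mgr.audit("crah-02", {"supply_temp_sp": 21.5, "fan_min_pct": 60}))
# prints {'supply_temp_sp': 21.5}: a deviation with no active exception, worth an alert
```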

This concept isn’t new. It’s just rarely done. But if you aren’t aggressively managing configurations, you are losing money.

When Free Cooling Isn’t Free

Published in Data Center Dynamics.

The use of free cooling systems is quickly becoming common practice – particularly in new mission critical facility builds. Using outside air, either directly or indirectly, to cool ICT equipment is undeniably compelling, both logically and financially.

But is free air really free? Not always. Free cooling systems add considerable complexity to the operation and maintenance of mechanical equipment. If this complexity isn’t recognized or managed well, free cooling will add to energy costs and increase operational risk.

Watch the weather

Weather is the most obvious variable. Free cooling capacity declines in hot weather, requiring a design that either allows for elevated indoor temperatures or combines free cooling with conventional mechanical cooling to ensure that indoor temperatures remain within an acceptable range.

Multiple operating modes are another complicating factor. For example, the free cooling system at Facebook’s Prineville data center uses eight distinct operating conditions to optimize the use of direct outside air and direct evaporative cooling under different weather conditions. Free cooling systems that use direct outside air augmented by compressorized cooling have at least three distinct operating conditions.
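
To illustrate why multiple modes complicate control, here is a deliberately simplified, hypothetical decision rule for a direct outside-air system with evaporative assist and DX backup; the thresholds are assumptions, not any vendor’s actual sequence:

```python
# Simplified, hypothetical mode selection for a direct outside-air system with
# evaporative assist and DX backup. Thresholds are illustrative only.
def select_mode(outdoor_c, supply_setpoint_c, outdoor_rh_pct):
    if outdoor_c <= supply_setpoint_c - 2:
        return "economizer"              # outside air alone can hold the setpoint
    if outdoor_c <= supply_setpoint_c + 6 and outdoor_rh_pct < 60:
        return "economizer+evaporative"  # dry enough for evaporative assist
    return "dx"                          # too hot or humid: fall back to compressors

for conditions in [(12.0, 18.0, 70), (22.0, 18.0, 40), (33.0, 18.0, 75)]:
    print(conditions, "->", select_mode(*conditions))
```

Each added mode multiplies the ways the system can be misconfigured, which is why the transitions between modes deserve as much scrutiny as the modes themselves.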

Maintenance also becomes more complex. Free cooling adds to the number of moving mechanical components (e.g. air dampers and actuators) that are in direct contact with outdoor air. Outdoor air is corrosive, which can cause the dampers and actuators to get stuck, and either fail to provide cooling or cause the system to bring in hot outdoor air when it should not. Free cooling systems with evaporative cooling have the added maintenance of cooling water, which requires chemical treatment and periodic flushing.

This complexity can significantly impact the energy reduction that free cooling can deliver, while creating real thermal management problems.

High failure rates

Accordingly, the high failure rates of free cooling systems are well documented in energy efficiency and building technology literature. A particularly good and practical paper entitled “Free Cooling, At What Cost” was written by Kristen Heinemeier and presented at the 2014 ACEEE Summer Study on Energy Efficiency in Buildings. My direct experience with free cooling systems throughout the US and Europe is completely consistent with Heinemeier’s paper. Specifically, I have seen even higher failure rates in mission critical facilities than in the commercial buildings referenced in Heinemeier’s paper.

Heinemeier examined the prevalence and impact of air-side economizer (direct free cooling) failure. She found that although economizers are an excellent energy saving technology, they do not perform well in practice. In California alone, she cites that in surveyed facilities, the economizer is disabled and outside air dampers are closed 30 – 40 percent of the time. She states: “This type of failure means that the economizer is not providing any savings, and that the building may not be bringing in any outside air. Other studies have found that the high-limit setpoints, set by technicians, are incorrect on the majority of RTUs in California, resulting in very few hours in the ‘free cooling’ range.”

I recently toured five sites in two countries, owned by different multinational companies, using cooling equipment from three different manufacturers.

Among the dozens of free cooling units that I observed on this trip, nearly all either had a problem that limited capacity and function or weren’t working at all. Problems included controller configuration, sensor failure, installation faults, and mechanical failures.

Some examples:

  • In one site, the outdoor air was cool but the outside air dampers were fully closed and the unit was recirculating indoor air. The temperature remained within an acceptable range; however, this was because the DX compressors were running unnecessarily – at massive cost. The operators knew that the free cooling should be operating, but didn’t know why it wasn’t. The facility had been operating that way since the free cooling units had been installed – about a year prior. Inspection of the units revealed that the controls weren’t configured properly, and that misconfigured control logic was preventing the free cooling from operating. I saw a similar scenario in a second site.
  • At another site I observed that the controls were working and appeared to be pulling in outside air. However, the discharge air on one particular unit wasn’t as cold as I would have expected. Inspection of the unit revealed that BOTH the outside air dampers and the return air dampers were closed. The damper actuator clamp on the outside air damper had either fallen off or been removed, leaving that damper stuck in the fully closed position. This problem was identified by analyzing data from the cooling optimization sensor network; a simplified version of that cross-check is sketched after this list.
  • At yet another site, I saw that the controls were working, the dampers were working and that cold air was produced – just not very much. We measured a large temperature difference in the outdoor air intake across the outside wall. The outside air duct was installed with a flanged connection to the wall. At a nearby site with the same free cooling equipment, the outside air duct penetrated the wall. The flanged installation caused the cooling units to draw air from the hollow wall construction, reducing the capacity of the free cooling by up to 40 percent. This problem was also identified by analyzing sensor network data.
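
The stuck-damper case can be caught with a simple cross-check of three temperature sensors. The following is a minimal sketch using the standard outdoor-air-fraction calculation; the sensor names and thresholds are illustrative assumptions, not the actual analytics referenced above:

```python
# Sketch: infer the outdoor-air fraction from three temperature sensors and flag
# a likely stuck damper. Sensor names and thresholds are illustrative assumptions.
def outdoor_air_fraction(t_return_c, t_mixed_c, t_outdoor_c):
    """Mixed-air energy balance: OAF = (Tret - Tmix) / (Tret - Toa)."""
    if abs(t_return_c - t_outdoor_c) < 1.0:
        return None  # temperatures too close together to tell anything
    return (t_return_c - t_mixed_c) / (t_return_c - t_outdoor_c)

def check_damper(commanded_open_pct, t_return_c, t_mixed_c, t_outdoor_c):
    oaf = outdoor_air_fraction(t_return_c, t_mixed_c, t_outdoor_c)
    if oaf is None:
        return "indeterminate"
    if commanded_open_pct > 80 and oaf < 0.2:
        return "suspect stuck-closed outside air damper"
    if commanded_open_pct < 20 and oaf > 0.5:
        return "suspect stuck-open outside air damper"
    return "ok"

# Controls command 100% outside air, but the mixed air looks like return air:
print(check_damper(100, t_return_c=30.0, t_mixed_c=29.5, t_outdoor_c=12.0))
```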

What’s important to note is that while in each case the free cooling system had problems, they were all fixable problems – often with little or no investment. More significantly, operators didn’t always recognize that their free cooling was compromised, nor how it could be fixed. Besides the additional energy costs and potential thermal risk incurred by this lack of visibility, these facilities were on the verge of spending a lot of money in pursuit of a solution, when in fact their existing equipment would achieve the desired operation.

Monitor your cooling system

Because free cooling systems are highly efficient when they do work as intended, best practice would suggest that risk mitigation and visibility through a monitoring system is required to realize the safe operation and full benefit of free cooling. In California, Title 24 requires diagnostics for use with free cooling systems. Dynamic monitoring, analytics, and diagnostics in conjunction with visual inspection will reveal issues and help ensure the ongoing and proper operation of free cooling within a complex cooling infrastructure. In mission critical facilities that are operated lights-out, use of remote monitoring and analytics combined with intelligent alerting is the only way to ensure reliable operation of free cooling.
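
As one concrete example of such an alert rule (illustrative field names and margins, not a description of any specific product), a monitoring system can flag units that run compressors while outdoor conditions would support free cooling:

```python
# Sketch of a simple alert rule: compressors running while outdoor conditions
# would support free cooling. Field names and the 3 C margin are assumptions.
from dataclasses import dataclass

@dataclass
class UnitSample:
    unit: str
    outdoor_temp_c: float
    return_temp_c: float
    compressor_on: bool
    economizer_enabled: bool

def free_cooling_lockout_alerts(samples, margin_c=3.0):
    """Yield human-readable alerts for units that appear to be wasting compressor energy."""
    for s in samples:
        if s.compressor_on and s.outdoor_temp_c < s.return_temp_c - margin_c:
            reason = "economizer disabled" if not s.economizer_enabled else "economizer not delivering"
            yield f"{s.unit}: DX running with {s.outdoor_temp_c:.1f} C outdoors ({reason})"

samples = [
    UnitSample("ahu-07", 10.0, 27.0, compressor_on=True, economizer_enabled=False),
    UnitSample("ahu-08", 24.0, 27.0, compressor_on=True, economizer_enabled=True),
]
for alert in free_cooling_lockout_alerts(samples):
    print(alert)
```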

As free cooling becomes a standard means of cooling mission critical facilities, consideration of the risk and complexity it adds is critical. Data-driven oversight of cooling operations, in combination with a layer of smart analytics and control, is the best-practice way to ensure your thermal environment continually operates in the most efficient way possible. This oversight also ensures that you continue to optimize your capital investment, even as conditions, weather and physical changes occur over time.

2016 and Looking Forward

To date, Vigilent has saved more than 1 billion kilowatt hours of energy, delivering $100 million in savings to our customers.  This also means we reduced the amount of CO2 released into the atmosphere by over 700,000 metric tons, equivalent to not acquiring and burning almost 4000 railcars of coal.  This matters because climate change is real.

Earlier this year, Vigilent announced its support for the Low-Carbon USA initiative, a consortium of leading businesses across the United States that support the Paris Climate Accord with the goal of limiting global temperature rise to well below 2 degrees Celsius.  Conservation plays its part, but innovation driving efficiency and renewable power creation will make the real difference.  Vigilent and its employees are fiercely proud to be making a tangible difference every day with the work that we do.

Beyond this remarkable energy savings milestone, I am very proud of the market recognition Vigilent achieved this year.  Bloomberg recognized Vigilent as a “New Energy Pioneer.”  Fierce Innovation named Vigilent the Best in Show: Green Application & Data Centers (telecom category).

Of equal significance, Vigilent has become broadly recognized as a leader in the emerging field of industrial IoT.  With our early start in this industry, integrating sensors and machine learning for measurable advantage long before they ever became a “thing,” Vigilent has demonstrated significant market traction with concrete results.  The industry has recognized Vigilent’s IoT achievements with the following awards this year:

  • TiE50 – Top Startup: IoT
  • IoT Innovator – Best Product: Commercial and Industrial Software

We introduced Vigilent prescriptive analytics this summer with shocking results, and I say that in a good way.  Our customers have uniformly received insights that surprised them.  These insights have ranged from unrealized capacity to failing equipment in critical areas.  The analytics are also helping customers meet SLA requirements with virtually no extra work, and to identify areas drifting out of compliance, enabling facility operators to quickly resolve issues as soon as a temperature goes beyond a specified threshold.
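
As a rough illustration of that threshold-compliance check (a generic sketch with made-up SLA limits and sensor data, not the Vigilent analytics themselves), scanning rack inlet temperatures against an SLA band reduces to a few lines:

```python
# Generic sketch: flag sensors whose readings drift outside an SLA temperature band.
# The SLA limits and sensor data are made-up examples.
SLA_MIN_C, SLA_MAX_C = 18.0, 27.0   # e.g. an ASHRAE-style allowable inlet range

readings = {
    "rack-A12-inlet": [22.1, 22.4, 23.0, 27.6, 28.1],
    "rack-B03-inlet": [20.0, 20.2, 19.8, 20.1, 20.0],
}

def sla_violations(readings, lo=SLA_MIN_C, hi=SLA_MAX_C):
    """Return, per sensor, the count of samples outside the SLA band."""
    return {
        sensor: sum(1 for t in series if t < lo or t > hi)
        for sensor, series in readings.items()
        if any(t < lo or t > hi for t in series)
    }

print(sla_violations(readings))   # prints {'rack-A12-inlet': 2}
```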

Vigilent dynamic cooling management systems are actively used in the world’s largest colos and telcos, and in Fortune 500 companies spanning the globe.  We have expanded relationships with long-term partners NTT Facilities and Schneider Electric, who have introduced Vigilent to new regions such as Latin America and Greater Asia.  We signed a North America-focused partnership with Siemens, which leverages Siemens Demand Flow and the Vigilent system to optimize efficiency and manage data center challenges across the white space and chiller plant. We are very pleased that the world’s leading data center infrastructure and service vendors have chosen to include Vigilent in their solution portfolios.

We thank you, our friends, customers and partners, for your continued support and look forward to another breakout year as we help the businesses of the world manage energy use intelligently and combat climate change.

 

The Fastest Route to Using Data Analysis in Data Center Operations

The transition to data-driven operations within data centers is inevitable.  In fact, it has already begun.

With this in mind, my last blog questioned why data centers still resist data use, surmising that because data use doesn’t fall within traditional roles and training, third parties – and new tools – will be needed to help with the transition. “Retrofitting” existing personnel, at least in the short term, is unrealistic.  And time matters.

Consider the example of my Chevy Volt.  The Volt illustrates just how quickly a traditional industry can be caught flat-footed in a time of transition, opening opportunities for others to seize market share. The Volt is as much a rolling mass of interconnected computers as it is a car. It has 10 million lines of code. 10 million!  That’s more than an F-22 Raptor, the most advanced fighter plane on earth.

The Volt of course, needs regular service just like any car.  While car manufacturers were clearly pivoting toward complex software-driven engines, car dealerships were still staffed with engine mechanics, albeit highly skilled mechanics.  During my service experience, the dealership had one guy trained and equipped to diagnose and tune the Volt.  One guy.  Volts were and are selling like crazy.  And when that guy was on vacation, I had to wait.

So, the inevitable happened.  Third party service shops, which were fully staffed with digitally-savvy technicians specifically trained in electric vehicle maintenance, quickly gained business.  Those shops employed mechanics, but the car diagnostics were performed by technology experts who could provide the mechanics with very specific guidance from the car’s data.  In addition, I had direct access to detail about the operation of my car from monthly reports delivered by OnStar, enabling me to make more informed driving, maintenance and purchase decisions.

Most dealerships weren’t prepared for the rapid shift from servicing mechanical systems to servicing computerized systems.  In my own experience, the independent shop that had been servicing my other, older car very quickly transitioned to servicing all kinds of electric vehicles.  Their agility in adjusting to new market conditions brought them a whole new set of service opportunities.  The Chevy dealership, on the other hand, created a service vacuum that opened business for others.

The lesson here is to adapt rapidly to new market conditions.  Oftentimes, using external resources is the fastest way to transition to a new skill set without taking your eye off operations, without making a giant investment, and while creating a path to incorporating these skills into your standard operating procedures over time.

During transitions, and as your facility faces learning curve challenges, it makes sense to turn to resources that have the expertise and the tools at hand.  Because external expert resources work with multiple companies, they also bring the benefit of collective perspective, which can be brought to bear on many different types of situations.

In an outsourced model, and specifically in the case of data analytics services, highly experienced and focused data specialists can be responsible for collecting, reviewing and regularly reporting back to facility managers on trends, exceptions, actions to take and potentially developing issues.  These specialists augment the facility manager’s ability to steer his or her data centers through a transition to more software and data intensive systems, without the time hit or distraction of engaging a new set of skills.  Also, as familiarity with using data evolves, the third party can train data center personnel, providing operators with direct access to data and indicative metrics in the short term, while creating a foundation for the eventual onboarding of data analysis operations.  

Data analysis won’t displace existing data center personnel.  It is an additional and critical function that can be supported internally or externally.  Avoiding the use of data to improve data center operations is career-limiting.  Until data analysis skills and tools are embedded within day-to-day operations, hiring a data analysis service can provide immediate relief and help your team transition to adopt these skills over time.  

Why Don’t Data Centers Use Data?

Data analysis doesn’t readily fall into the typical data center operator’s job description.   That fact, and the traditional hands-on focus of those operators, isn’t likely to change soon.

But turning a blind eye to the floodgate of data now available to data centers through IoT technology, sensors and cloud-based analytics is no longer tenable.  While the data impact of IoT has yet to be truly realized, most data centers have already become too complex to be managed manually.

What’s needed is a new role entirely, one with dotted line/cross-functional responsibility to operations, energy, sustainability and planning teams.

Consider this.  The aircraft industry has historically been driven by design, mechanical and engineering teams.  Yet General Electric aircraft engines, as an example, throw off terabytes of data on every single flight.  This massive quantity of data isn’t managed by these traditional teams.  It’s managed by data analysts who continually monitor this information to assess safety and performance, and update the traditional teams who can take any necessary actions.

Like aircraft, data centers are complex systems.  Why aren’t they operated in the same data-driven way given that the data is available today?

Data center operators aren’t trained in data analysis, nor can they be expected to take it on.  The new data analyst role requires an understanding and mastery of an entirely different set of tools.  It requires domain-specific knowledge so that incoming information can be intelligently monitored and triaged to determine what constitutes a red-flag event versus something that could be addressed during normal work hours to improve reliability or reduce energy costs.
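
A crude illustration of that triage step (purely hypothetical severity rules and event shapes, not a real operational policy):

```python
# Purely illustrative triage rule: classify incoming events as red-flag (act now)
# or routine (handle during normal work hours). Thresholds are assumptions.
def triage(event):
    temp = event.get("inlet_temp_c")
    if event.get("type") == "cooling_unit_failure":
        return "red_flag"
    if temp is not None and temp > 32.0:
        return "red_flag"            # approaching thermal risk: act immediately
    if temp is not None and temp > 27.0:
        return "routine"             # out of the efficient band: fix during work hours
    if event.get("type") == "config_drift":
        return "routine"             # costs energy, not uptime
    return "informational"

events = [
    {"type": "cooling_unit_failure", "unit": "crah-04"},
    {"type": "temperature", "inlet_temp_c": 28.5},
    {"type": "config_drift", "unit": "crah-09"},
]
print([triage(e) for e in events])   # prints ['red_flag', 'routine', 'routine']
```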

It’s increasingly clear that managing solely through experience and physical oversight is no longer best practice and will no longer keep pace with the increasing complexity of modern data centers.  Planning or modeling based only on current conditions – or a moment in time –  is also not sufficient.  The rate of change, both planned and unplanned, is too great.  Data, like data centers, is fluid and multidimensional. 

Beyond the undeniable necessity of incorporating data into day-to-day operations to manage operational complexity, data analysis provides significant value-added benefit by revealing cost savings and revenue generating opportunities in energy use, capacity and risk avoidance.  It’s time to build this competency into data center operations.

Does Efficiency Matter?

Currently, it seems that lots of things matter more than energy efficiency. Investments in reliability, capacity expansion and revenue protection all receive higher priority in data centers than any investment focusing on cutting operating expenses through greater efficiency.

So does this mean that efficiency really doesn’t matter? Of course efficiency matters. Lawrence Berkeley National Laboratory just issued a data center energy report showing just how much efficiency improvements have slowed the growth of the data center industry’s energy consumption, saving a projected 620 billion kWh between 2010 and 2020.

The investment priority disconnect occurs when people view efficiency from the too narrow perspective of cutting back.

Efficiency, in fact, has transformational power – when viewed through a different lens.

Productivity is an area ripe for improvements specifically enabled by IoT and automation. Automation’s impact on productivity often gets downplayed by employees who believe automation is the first step toward job reductions. And sure, this happens. Automation will replace some jobs. But if you have experienced and talented people working on tasks that could be automated, your operational productivity is suffering. Those employees can and should be repurposed for work that’s more valuable. And, as most data centers run with very lean staffing, your employees are already working under enormous pressure to keep operations running perfectly and without downtime. Productivity matters here as well. Making sure your employees are working on the right, highest impact activities generates direct returns in cost, facility reliability and job satisfaction.

Outsourcing is another target. Outsourcing maintenance operations has become common practice. Yet how often are third party services monitored for efficiency? Viewing the before and after performance of a room or a piece of equipment following maintenance is telling. These details, in context with operational data, can identify where you are over-spending on maintenance contracts or where dollars can be allocated elsewhere for higher benefit.

And then there is time. In a 2014 Harvard Business Review article, Bain & Company called time “your scarcest resource,” which makes it a logical target for efficiency improvement.  Here’s an example. Quite often data center staff will automatically add cooling equipment to facilities to support new or additional IT load. A quick, deeper look into the right data often reveals that the facilities can handle the additional load immediately and without new equipment. A quick data dive can save months of procurement and deployment time, while simultaneously accelerating your time to the revenue generated by the additional IT load.
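
A back-of-the-envelope version of that data dive (illustrative numbers and a simple sensible-capacity comparison, not a substitute for a proper analysis) is just a check of installed cooling capacity against current and planned IT load:

```python
# Back-of-the-envelope headroom check: can the installed cooling units absorb a
# planned IT load increase? Numbers and the redundancy assumption are illustrative.
installed_units_kw = [105.0, 105.0, 105.0, 105.0]   # rated sensible capacity per unit
current_it_load_kw = 240.0
planned_addition_kw = 60.0
redundancy = "N+1"                                  # keep one unit's worth in reserve

def usable_capacity_kw(units, redundancy="N+1"):
    reserve = max(units) if redundancy == "N+1" else 0.0
    return sum(units) - reserve

usable = usable_capacity_kw(installed_units_kw, redundancy)
projected = current_it_load_kw + planned_addition_kw
print(f"usable cooling: {usable:.0f} kW, projected load: {projected:.0f} kW")
print("new unit needed" if projected > usable else "existing units can carry the added load")
```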

Every time employees can stop or reduce time spent on a low-value activity, they can achieve results in a different area, faster. Likewise, every time you free up employee time for more creative or innovative endeavors, you have an opportunity to capture competitive advantage. According to a KPMG report cited by the Silicon Valley Beat, the tech sector is already focused on this concept, leveraging automation and machine learning for new revenue advantages as well as efficiency improvements.

“Tech CEOs see the benefits of digital labor augmenting workforce capabilities,” said Gary Matuszak, global and U.S. chair of KPMG’s Technology, Media and Telecommunications practice.

“The increased automation and machine learning could enable new ways for tech companies to conduct business so they can add customer value, become more efficient and slash costs.”

Investments in efficiency, when viewed through the lens of “cutting back,” will continue to receive low priority. However, efficiency projects focusing on productivity or time to revenue will pay off with immediate top-line effect. They will uncover ways to simultaneously increase return on capital, improve workforce productivity, and accelerate new sources of revenue. And that’s where you need to put your money.
