Why Don’t Data Centers Use Data?

Data analysis doesn’t readily fall into the typical data center operator’s job description. That fact, and the traditional hands-on focus of those operators, aren’t likely to change soon.

But turning a blind eye to the flood of data now available to data centers through IoT technology, sensors and cloud-based analytics is no longer tenable. While the data impact of IoT has yet to be truly realized, most data centers have already become too complex to be managed manually.

What’s needed is a new role entirely, one with dotted line/cross-functional responsibility to operations, energy, sustainability and planning teams.

Consider this.  The aircraft industry has historically been driven by design, mechanical and engineering teams.  Yet General Electric aircraft engines, as an example, throw off terabytes of data on every single flight.  This massive quantity of data isn’t managed by these traditional teams.  It’s managed by data analysts who continually monitor this information to assess safety and performance, and update the traditional teams who can take any necessary actions.

Like aircraft, data centers are complex systems.  Why aren’t they operated in the same data-driven way given that the data is available today?

Data center operators aren’t trained in data analysis nor can they be expected to take it on.  The new data analyst role requires an understanding and mastery of an entirely different set of tools.  It requires domain-specific knowledge so that incoming information can be intelligently monitored and triaged to determine what constitutes a red flag event, versus something that could be addressed during normal work hours to improve reliability or reduce energy costs.
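One way to picture that triage is a small rule-based filter. The Python sketch below is illustrative only: the `Reading` fields, the two temperature thresholds and the priority names are assumptions made for this example, not values taken from any standard or product.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    RED_FLAG = "act now"
    WORK_HOURS = "address during normal work hours"
    ROUTINE = "no action needed"


@dataclass
class Reading:
    sensor_id: str
    inlet_temp_f: float       # rack inlet temperature, degrees Fahrenheit
    redundant_cooling: bool   # True if more than one cooling unit serves this rack


# Hypothetical thresholds, chosen only to make the example concrete.
RED_FLAG_TEMP_F = 90.0
WATCH_TEMP_F = 80.0


def triage(reading: Reading) -> Priority:
    """Classify one sensor reading as urgent, deferrable or routine."""
    if reading.inlet_temp_f >= RED_FLAG_TEMP_F:
        return Priority.RED_FLAG
    if reading.inlet_temp_f >= WATCH_TEMP_F:
        # A warm rack with no backup cooling is treated as urgent.
        if reading.redundant_cooling:
            return Priority.WORK_HOURS
        return Priority.RED_FLAG
    return Priority.ROUTINE


warm_but_covered = triage(Reading("rack-12-inlet", 84.0, True))
warm_and_exposed = triage(Reading("rack-31-inlet", 84.0, False))
```

In this sketch the same 84-degree reading lands in two different queues depending on whether backup cooling exists, which is the kind of domain-specific judgment the analyst role would encode.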

It’s increasingly clear that managing solely through experience and physical oversight is no longer best practice and will no longer keep pace with the increasing complexity of modern data centers.  Planning or modeling based only on current conditions – or a moment in time –  is also not sufficient.  The rate of change, both planned and unplanned, is too great.  Data, like data centers, is fluid and multidimensional. 

Beyond the undeniable necessity of incorporating data into day-to-day operations to manage operational complexity, data analysis provides significant value-added benefit by revealing cost savings and revenue generating opportunities in energy use, capacity and risk avoidance.  It’s time to build this competency into data center operations.

Analytics in Action for Data Center Cooling

When a data center is first designed, everything is tightly controlled. Rack densities are all the same. The layout is precisely planned and very consistent. Power and space constraints are well-understood. The cooling system is modeled – sometimes even with CFD – and all of the cooling units operate at the same level.

But the original design is often a short-lived utopia. The reality of most data centers becomes much more complex as business needs and IT requirements change and equipment moves in and out.

As soon as physical infrastructure changes, cooling capacity and redundancy are affected. Given the complexity of design versus operational reality, many organizations have not had the tools to understand what has changed or degraded, so they cannot make informed decisions about their cooling infrastructure. Traditional DCIM products often focus on space, network and power. They don’t provide detailed, measured data on the cooling system. So, decisions about cooling are made without visibility into actual conditions.

Analytics can help. Contrary to prevailing views, analytics don’t necessarily take a lot of know-how or data analysis skills to be extremely helpful in day-to-day operations management. Analytics can be simple and actionable. Consider the following examples of how a daily morning glance at thermal analytics helped these data center managers quickly identify and resolve some otherwise tricky thermal issues.

In our first example, the manager of a legacy, urban colo data center with DX CRAC units was asked to determine the right place for some new IT equipment. There were several areas with space and power available, but determining which of these areas had sufficient cooling was more challenging. The manager used a cooling influence map to identify racks cooled by multiple CRACs. He then referenced a cooling capacity report to confirm that more than one of these CRACs had capacity to spare. By using these visual analytics, the manager was able to place the IT equipment in an area with sufficient, and redundant, cooling.
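The manager's two-step check can be sketched in a few lines of Python. Everything here is hypothetical: the rack and CRAC names, the spare-capacity figures, and the rule that at least two influencing CRACs must each be able to absorb the new load on their own. The influence map is assumed to be a simple mapping from each rack to the CRACs that cool it.

```python
def racks_with_redundant_cooling(influence, spare_capacity_kw, new_load_kw, min_units=2):
    """Return racks where at least `min_units` influencing CRACs can each absorb the new load.

    influence:          dict mapping rack name -> list of CRAC names that cool it
    spare_capacity_kw:  dict mapping CRAC name -> spare cooling capacity in kW
    new_load_kw:        heat load of the equipment to be placed, in kW
    """
    candidates = []
    for rack, cracs in influence.items():
        able = [c for c in cracs if spare_capacity_kw.get(c, 0.0) >= new_load_kw]
        if len(able) >= min_units:
            candidates.append(rack)
    return sorted(candidates)


# Hypothetical data standing in for a cooling influence map and a capacity report.
influence_map = {
    "rack-A1": ["CRAC-1", "CRAC-2"],
    "rack-B4": ["CRAC-3"],
    "rack-C2": ["CRAC-2", "CRAC-3", "CRAC-4"],
}
spare_kw = {"CRAC-1": 12.0, "CRAC-2": 8.0, "CRAC-3": 1.5, "CRAC-4": 6.0}

suitable = racks_with_redundant_cooling(influence_map, spare_kw, new_load_kw=5.0)
```

With these made-up numbers, `rack-B4` is excluded because only one CRAC influences it, even though space and power might be available there.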

In a second facility, a mobile switching center for a major telco, the manager noticed a hot spot on the thermal map and sent a technician to investigate the location. The technician saw that some of the cooling coils had low delta T even though the valves were open, which implied a problem with the hydronics. Upon physical investigation of the area, he discovered that this was caused by trapped air in the coil, so he bled it off. The delta T quickly went from 3 to 8.5 – a capacity increase of more than 65 percent – as displayed on the following graph:

 

[Graph: DeltaT]
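Why does delta T matter so much? For sensible heat transfer, the heat a coil removes is proportional to mass flow times specific heat times delta T, so at a fixed flow rate the capacity scales directly with delta T. The sketch below applies that relationship to the readings quoted above. It assumes the flow rate and fluid properties did not change when the air was bled off, which is a simplification; the function name is made up for this example.

```python
def capacity_ratio(delta_t_before, delta_t_after):
    """Ratio of coil heat removal after vs. before, assuming constant flow.

    Based on Q = m_dot * c_p * delta_T: with m_dot and c_p held fixed,
    Q_after / Q_before reduces to delta_T_after / delta_T_before.
    """
    if delta_t_before <= 0:
        raise ValueError("delta_t_before must be positive")
    return delta_t_after / delta_t_before


ratio = capacity_ratio(3.0, 8.5)          # readings quoted in the text
percent_increase = (ratio - 1.0) * 100.0  # relative gain over the degraded state
```

Under that constant-flow assumption the coil ends up removing close to three times as much heat as it did with air trapped in it, comfortably above the "more than 65 percent" threshold cited.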

These examples are deceptively simple. But without analytics, the managers would not have been able to identify the exact location of the problem and the cooling units involved as easily, or to gather enough information to direct troubleshooting within the short time available for resolving problems in a mission critical facility.

Analytics typically use the information already available in a properly monitored data center. They complement the experienced intuition of data center personnel with at-a-glance data that helps identify potential issues more quickly and bypasses much of the tedious, blood pressure-raising and time-consuming diagnostic work of hotspot resolution.

Analytics are not the future. Analytics have arrived. Data centers that aren’t taking advantage of them are riskier and more expensive to operate, and place themselves at a competitive disadvantage.

A Look at 2014

In 2014 we leveraged the significant company, market and customer expansion we achieved in 2013 to focus on strategic partnerships.  Our goal was to significantly increase our global footprint with the considerable resources and vision of these industry leaders.  We have achieved that goal and more.

Together with our long-standing partner NTT Facilities, we continue to add power and agility to complementary data center product lines managed by NTT in pan-Asia deployments.  In partnership with Schneider Electric, we are proud to announce the integration of Vigilent dynamic cooling management technology into the Cooling Optimize module of Schneider Electric’s industry-leading DCIM suite, StruxureWare for Data Centers.

Beyond the technical StruxureWare integration, Vigilent has also worked closely with Schneider Electric to train hundreds of Schneider Electric sales and field operations professionals in preparation for the worldwide roll-out of Cooling Optimize.  Schneider Electric’s faith in us has already proven well-founded as deployments are already underway across multiple continents.  With the reach of Schneider Electric’s global sales and marketing operations, their self-described “Big Green Machine,” and NTT Facilities’ expanding traction in and outside of Japan, we anticipate a banner year.

As an early adopter of machine learning, Vigilent has been recognized as a pioneer of the Internet of Things (IoT) for energy.  Data collected over seven years from hundreds of deployments continually informs and improves Vigilent system performance.  The analytics we have developed provide unprecedented visibility into data center operations and are driving the introduction of new Vigilent capabilities.

Business success aside, our positive impact on the world continues to grow.  In late 2014, we announced that Vigilent systems have reduced energy consumption by more than half a billion kilowatt hours and eliminated more than 351,000 tons of CO2 emissions.  These figures are persistent and grow with each new deployment.

We are proud to see our customers turn pilot projects into multiple deployments as the energy savings and data center operational benefits of the system prove themselves over and over again.  This organic growth is testimony to the consistency of the Vigilent product’s operation in widely varying mission critical environments.

Stay tuned to watch this process repeat itself as we add new Fortune 50 logos to our customer base in 2015.  We applaud the growing sophistication of the data center industry as it struggles with the dual challenges of explosive growth and environmental stewardship and remain thankful for our part in that process.

 

Data Center Risk

Surprising Areas of Data Center Risk and How to Proactively Manage Them

Mission critical facilities need a different level of scrutiny and control over cooling management.

It’s no surprise that cooling is critical to the security of these facilities. With requirements for 99.999 percent uptime and multimillion-dollar facilities at risk, cooling is often the thin blue line between data safety and disaster.
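To put that uptime requirement in perspective, the short Python sketch below converts an availability percentage into a yearly downtime allowance. It assumes the 99.999 figure means 99.999 percent availability measured over a 365-day year; the function name is invented for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent):
    """Minutes of downtime per year allowed at a given availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


five_nines_budget = downtime_budget_minutes(99.999)
```

The allowance works out to a little over five minutes per year, so a single cooling-related outage lasting six minutes would use up the entire annual budget.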

And yet, many mission critical facilities use cooling control systems that were designed for comfort cooling, versus the reliable operation of hugely valuable and sensitive equipment.

When people get warm, they become uncomfortable. When IT equipment overheats, it fails – often with catastrophically expensive results.

In one recent scenario, a 6-minute chiller plant failure resulted in lost revenue and penalties totaling $14 million. In another scenario, the failure of a single CRAC unit caused temperatures to shoot up to over 100 degrees Fahrenheit in a particular zone, resulting in the failure of a storage array.

These failures result from a myriad of complex and usually unrecognized risk areas. My recent talk at the i4Energy Seminar series hosted by the California Institute for Energy and Environment (CIEE) exposes some of these hidden risk areas and what you can do about them.

You can watch that talk here:

 

Unexpected Savings

Data Center Cooling Systems Return Unexpected Maintenance Cost Savings

Advanced cooling management in critical facilities such as data centers and telecom central offices can save tons of energy (pun intended). Using advanced cooling management to achieve always-ready, inlet-temperature-controlled operation, versus the typical always-on, always-cold approach, yields huge energy savings.

But energy savings aren’t the only benefit of advanced cooling management. NTT America recently took a hard look at some of the direct, non-energy savings of an advanced cooling system. They quantified savings from reduced maintenance costs, increased cooling capacity from existing resources, improved thermal management and deferred capital expenditures. Their analysis found that the non-energy benefits increased the total dollar savings by one-third.

Consider first the broader advantages of reduced maintenance costs. Advanced cooling management identifies when CRACs are operating inefficiently. Turning off equipment that doesn’t need to be on reduces wear and tear. Equipment that isn’t running isn’t wearing out. Reducing wear and tear reduces the chance of an unexpected failure, which is always something to avoid in a mission-critical facility. One counter-intuitive result of turning off lightly provisioned CRACs is that inlet air temperatures are reduced by a few degrees. Reducing inlet air temperature also reduces the risk of IT equipment failure and increases the ride-through time in the event of a cooling system failure.

The maintenance and operations cost savings of advanced cooling management are significant, but avoiding downtime is priceless.

Occam’s Razor

Data Center Energy Savings

The simplest approach to data center energy savings might suggest that a facility manager’s best option is to turn off a few air conditioners.  And there’s truth to this.  See the graph below, showing before and after energy usage, and the impact of turning off some of the cooling units.

[Graph: Before & After Energy Management Software Started]

But the simplicity suggested here is deceptive.

Which air conditioners?

How many?

How will this truly affect the temperature?

What’s the risk to uptime or ridethrough?

While turning things off or down is likely our greatest opportunity for significant, immediate savings, the science driving the decision of which device to turn off, and when, is complex and dynamic.

Fortunately, a convergence of new technology – wireless sensors for continuous, real-time and location-specific data, along with predictive, adaptive software algorithms that take into account all immediate and known variables at any given moment – can predict the future impact of energy management decisions, taking on/off decision-making to a new level.  Now, for the first time, it’s possible – thanks to the latest AI technology – to automatically, constantly and dynamically manage cooling resources to reduce average temperatures across a facility and avoid hot and cold spots of localized temperature extremes. Simultaneously, overall cooling energy consumption is reduced by intelligently turning down, or off, the right CRACs at the right time. The result is continually optimized cooling with greater assurance that the overall integrity of the data center is preserved.
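To make the "which units, and how many" question concrete, here is a deliberately naive Python sketch. It is not the predictive, adaptive approach described above: it ignores temperatures and thermal dynamics entirely and applies a single static rule, namely that every rack must keep at least two running CRACs that influence it. The unit names, utilization values and that redundancy rule are all assumptions made for illustration.

```python
def choose_units_to_turn_off(influence, utilization, min_running_per_rack=2):
    """Greedily pick CRACs to switch off, starting with the least-utilized unit.

    influence:    dict mapping rack name -> list of CRAC names that cool it
    utilization:  dict mapping CRAC name -> current utilization (0.0 to 1.0)
    A unit is switched off only if every rack still has at least
    `min_running_per_rack` running CRACs influencing it afterwards.
    """
    running = set(utilization)
    turned_off = []
    for unit in sorted(utilization, key=utilization.get):
        remaining = running - {unit}
        still_covered = all(
            len(remaining.intersection(cracs)) >= min_running_per_rack
            for cracs in influence.values()
        )
        if still_covered:
            running = remaining
            turned_off.append(unit)
    return turned_off


# Hypothetical floor: two racks served by four CRACs.
influence_map = {"R1": ["A", "B", "C"], "R2": ["B", "C", "D"]}
unit_utilization = {"A": 0.2, "B": 0.6, "C": 0.5, "D": 0.1}

to_turn_off = choose_units_to_turn_off(influence_map, unit_utilization)
```

Even this toy version shows why the decision is not obvious: the answer depends on how cooling influence overlaps across the floor, and a real system would also have to predict the temperature effect of each change before acting on it.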