Analytics in Action for Data Center Cooling

When a data center is first designed, everything is tightly controlled. Rack densities are all the same. The layout is precisely planned and very consistent. Power and space constraints are well-understood. The cooling system is modeled – sometimes even with CFD – and all of the cooling units operate at the same level.

But the original design is often a short-lived utopia. The reality of most data centers becomes much more complex as business needs and IT requirements change and equipment moves in and out.

As soon as physical infrastructure changes, cooling capacity and redundancy are affected. Given the gap between design and operational reality, many organizations have lacked the tools to understand what has changed or degraded, and so cannot make informed decisions about their cooling infrastructure. Traditional DCIM products focus on space, network and power. They don’t provide detailed, measured data on the cooling system, so decisions about cooling are made without visibility into actual conditions.

Analytics can help. Contrary to prevailing views, analytics don’t necessarily take a lot of know-how or data analysis skills to be extremely helpful in day-to-day operations management. Analytics can be simple and actionable. Consider the following examples of how a daily morning glance at thermal analytics helped these data center managers quickly identify and resolve some otherwise tricky thermal issues.

In our first example, the manager of a legacy, urban colo data center with DX CRAC units was asked to determine the right place for some new IT equipment. There were several areas with space and power available, but determining which of these areas had sufficient cooling was more challenging. The manager used a cooling influence map to identify racks cooled by multiple CRACs. He then referenced a cooling capacity report to confirm that more than one of these CRACs had capacity to spare. By using these visual analytics, the manager was able to place the IT equipment in an area with sufficient, and redundant, cooling.

In a second facility, a mobile switching center for a major telco, the manager noticed a hot spot on the thermal map and sent a technician to investigate the location. The technician saw that some of the cooling coils had low delta T even though the valves were open, which implied a problem with the hydronics. Upon physical investigation of the area, he discovered that the cause was trapped air in the coil, so he bled it off. The delta T quickly went from 3 to 8.5 degrees, nearly tripling the coil’s sensible capacity, as displayed on the following graph:
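At constant airflow, a coil’s sensible capacity is proportional to its delta T (Q = ṁ·cp·ΔT), so the capacity gain from a delta-T improvement can be read straight off the two readings. A minimal sketch of that arithmetic:

```python
# At constant airflow, sensible cooling capacity scales linearly with
# delta T (Q = m_dot * cp * dT), so the fractional capacity gain is
# simply the ratio of the after/before readings, minus one.

def capacity_gain(delta_t_before: float, delta_t_after: float) -> float:
    """Return the fractional capacity increase at constant airflow."""
    return delta_t_after / delta_t_before - 1.0

gain = capacity_gain(3.0, 8.5)
print(f"Capacity increase: {gain:.0%}")  # prints "Capacity increase: 183%"
```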

[Graph: coil delta T over time, before and after bleeding the trapped air]

These examples are deceptively simple. But without analytics, the managers could not have identified the exact location of the problem and the cooling units involved, or gathered enough information to direct troubleshooting, within the short time available to resolve problems in a mission-critical facility.

Analytics typically use the information already available in a properly monitored data center. They complement the experienced intuition of data center personnel with at-a-glance data that helps identify potential issues more quickly and bypasses much of the tedious, blood pressure-raising and time-consuming diagnostic activities of hotspot resolution.

Analytics are not the future. Analytics have arrived. Data centers that aren’t taking advantage of them are riskier and more expensive to operate, and place themselves at a competitive disadvantage.

A Look at 2014

In 2014 we leveraged the significant company, market and customer expansion we achieved in 2013 to focus on strategic partnerships.  Our goal was to significantly increase our global footprint with the considerable resources and vision of these industry leaders.  We have achieved that goal and more.

Together with our long-standing partner NTT Facilities, we continue to add power and agility to complementary data center product lines managed by NTT in pan-Asia deployments.  In partnership with Schneider Electric, we are proud to announce the integration of Vigilent dynamic cooling management technology into the Cooling Optimize module of Schneider Electric’s industry-leading DCIM suite, StruxureWare for Data Centers.

Beyond the technical StruxureWare integration, Vigilent has also worked closely with Schneider Electric to train hundreds of Schneider Electric sales and field operations professionals in preparation for the worldwide roll-out of Cooling Optimize. Schneider Electric’s faith in us has already proven well-founded, with deployments underway across multiple continents. With the reach of Schneider Electric’s global sales and marketing operations, their self-described “Big Green Machine,” and NTT Facilities’ expanding traction in and outside of Japan, we anticipate a banner year.

As an early adopter of machine learning, Vigilent has been recognized as a pioneer of the Internet of Things (IoT) for energy.  Data collected over seven years from hundreds of deployments continually informs and improves Vigilent system performance.  The analytics we have developed provide unprecedented visibility into data center operations and are driving the introduction of new Vigilent capabilities.

Business success aside, our positive impact on the world continues to grow.  In late 2014, we announced that Vigilent systems have reduced energy consumption by more than half a billion kilowatt hours and eliminated more than 351,000 tons of CO2 emissions.  These figures are persistent and grow with each new deployment.

We are proud to see our customers turn pilot projects into multiple deployments as the energy savings and data center operational benefits of the system prove themselves over and over again.  This organic growth is testimony to the consistency of the Vigilent product’s operation in widely varying mission critical environments.

Stay tuned to watch this process repeat itself as we add new Fortune 50 logos to our customer base in 2015.  We applaud the growing sophistication of the data center industry as it struggles with the dual challenges of explosive growth and environmental stewardship and remain thankful for our part in that process.

 

Data Center Capacity Planning – Why Keep Guessing?

Capacity management involves decisions about space, power, and cooling.

Space is the easiest. You can assess it by inspection.

Power is also fairly easy. The capacity of a circuit is knowable. It never changes. The load on a circuit is easy to measure.

Cooling is the hardest. The capacity of cooling equipment changes with time. Capacity depends on how the equipment is operated, and it degrades over time. Even harder is the fact that cooling is distributed. Heat and air follow the paths of least resistance and don’t always go where you would expect. For these reasons and more, mission-critical facilities are designed for and built with far more cooling capacity than they need. And yet many operators add even more cooling each time there is a move, add, or change to IT equipment, because that’s been a safer bet than guessing wrong.

Here is a situation we frequently observe:

Operations receives frequent requests to add or change IT loads in the normal course of business. In large or multi-site facilities, these requests may occur daily. Say operations receives a request to add 50 kW of IT load to a particular room; it will typically add 70 kW of new cooling.

This provisioning is calculated assuming a full load for each server, with full load determined from server nameplate data. In reality, it is highly unlikely that all cabinets in a room will be fully loaded, and equally unlikely that a server will ever draw its nameplate power. And remember, the room was originally designed with excess cooling capacity. Adding even more cooling to such rooms escalates the over-provisioning, wasting capital and energy.
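The over-provisioning math can be sketched with a back-of-the-envelope calculation. All the numbers here (nameplate rating, measured-draw fraction, cooling margin) are illustrative assumptions, not measurements from any particular site:

```python
# Illustrative sketch (hypothetical numbers): compare the cooling
# provisioned against server nameplate ratings with the load servers
# typically draw in practice. Nameplate figures overstate real power,
# so sizing cooling against them compounds the excess capacity already
# built into the room at design time.

NAMEPLATE_KW_PER_SERVER = 0.5   # label rating per server (assumed)
MEASURED_FRACTION = 0.6         # typical real draw vs nameplate (assumed)
COOLING_MARGIN = 1.4            # e.g. 70 kW cooling added per 50 kW IT load

def cooling_overprovision(num_servers: int) -> dict:
    nameplate_kw = num_servers * NAMEPLATE_KW_PER_SERVER
    measured_kw = nameplate_kw * MEASURED_FRACTION
    cooling_added_kw = nameplate_kw * COOLING_MARGIN
    return {
        "nameplate_kw": nameplate_kw,
        "measured_kw": measured_kw,
        "cooling_added_kw": cooling_added_kw,
        # Cooling added per kW of load actually drawn:
        "excess_ratio": cooling_added_kw / measured_kw,
    }

print(cooling_overprovision(100))
```

For 100 hypothetical servers, 70 kW of cooling is added against roughly 30 kW of real load, i.e. more than 2x the cooling the new equipment actually needs, before counting the room’s original design margin.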

We find that cooling utilization is typically 35 to 40%, which leaves plenty of excess capacity for IT equipment expansions. We also find that in 5-10% of situations, equipment performance and capacity have degraded to the point where cooling redundancy is compromised. In these cases, maintenance becomes difficult and there is a greater risk of IT failure due to a thermal event. So it is important to know how a room is running before adding cooling. But it isn’t always easy to tell whether cooling units are performing as designed and specified.
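The redundancy concern can be expressed as a simple N+1 check: can the room’s load still be met with the single largest unit offline, counting each unit at its measured rather than nameplate capacity? A sketch with illustrative numbers:

```python
# Sketch of an N+1 redundancy check (illustrative numbers): a room keeps
# redundancy if the load can still be met with the single largest unit
# offline. Degraded units are counted at their measured capacity, which
# is exactly what nameplate-based planning misses.

def has_n_plus_1(measured_capacities_kw: list[float], load_kw: float) -> bool:
    """True if load is covered with the largest unit out of service."""
    remaining = sum(measured_capacities_kw) - max(measured_capacities_kw)
    return remaining >= load_kw

# Four nominally 100 kW units, one degraded to 60 kW, serving 170 kW:
print(has_n_plus_1([100, 100, 100, 60], 170))  # True  (260 kW remains)

# Three units with two degraded to 60 kW, serving 130 kW:
print(has_n_plus_1([100, 60, 60], 130))        # False (120 kW remains)
```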

How can operations managers make more cost effective – and safe – planning decisions?  Analytics.

Analytics using real-time data provide managers with the insight to determine whether cooling infrastructure can handle a change or expansion to IT equipment, and to manage these changes while minimizing risk. Specifically, analytics can quantify actual cooling capacity, expose equipment degradation, and reveal where there is more or less cooling reserve in a room for optimal placement of physical and virtual IT assets.

Consider the following analytics-driven capacity report. Continually updated by a sensor network, the report displays exactly where capacity is available and where it is not. With this data alone, you can determine where you can safely and immediately add IT load with no CapEx investment. And in those situations where you do need additional cooling, it will predict with high confidence how much you need.

[Image: cooling capacity report]

Yet you can go deeper still. By pairing the capacity report with a cooling reserve map (below), you can determine where to safely place additional load in the desired room. You can also see where to locate your most critical assets and, when you do need that new air conditioner, where to place it.

[Image: cooling reserve map]

Using these reports, operations can:

  • avoid the CapEx cost of more cooling every time IT equipment is added;
  • avoid the risk of cooling construction in production data rooms when it is often not needed;
  • avoid the delayed time to revenue from adding cooling to a facility that doesn’t need it.

In addition, analytics used in this way avoid unnecessary energy and maintenance OpEx costs.

Stop guessing and start practicing the art of avoidance with analytics.

 

 

Maintenance is Risky

No real surprise here. Mission critical facilities that pride themselves on and/or are contractually obligated to provide the “five 9’s” of reliability know that sooner or later they must turn critical cooling equipment off to perform maintenance. And they know that they face risk each time they do so.

This is true even for the newest facilities. The minute a facility is turned up, or IT load is added, things start to change. The minute a brand new cooling unit is deployed, it starts to degrade – however incrementally. And that degree of degradation is different from unit to unit, even when those units are nominally identical.

In a risk and financial performance panel at a recent data center event sponsored by Digital Realty, eBay’s Vice President of Global Foundation Services Dean Nelson stated that “touching equipment for maintenance increases Probability of Failure (PoF).” Nelson actively manages and focuses on reducing eBay’s PoF metric throughout the facilities he oversees.

Performing maintenance puts most facility managers between the proverbial rock and a hard place. If equipment isn’t maintained, by definition you have a “run to failure” maintenance policy. If you do maintain equipment, you incur risk each time you turn something off. The telecom industry calls this “hands in the network” which they manage as a significant risk factor.

What if maintenance risks could be mitigated? What if you could predict what would happen to the thermal conditions of a room and, even more specifically, what racks or servers could be affected if you took a particular HVAC unit offline?

This ability is available today. It doesn’t require computational fluid dynamics (CFD) or other complicated tools that rely on physical models. It can be accomplished through data and analytics. That is, analytics continually updated by real-time data from sensors instrumented throughout a data center floor. Gartner Research says that hindsight based on historical data, followed by insight based on current trends, drives foresight.

Using predictive analytics, facility managers can also determine exactly which units to maintain and when, in addition to understanding the potential thermal effect that each maintenance action will have on every location on the data center floor.
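The idea can be sketched with a toy influence matrix. All names and numbers below are hypothetical, and this is not Vigilent’s actual model: the sketch simply estimates each rack’s temperature rise if a given unit goes offline and flags racks that would exceed a limit.

```python
# Illustrative sketch (hypothetical data, not Vigilent's actual model):
# given how many degrees of cooling each CRAC contributes to each rack,
# estimate each rack's temperature if one unit is taken offline for
# maintenance, and flag racks that would exceed an inlet-temperature limit.

INFLUENCE = {                       # cooling contribution in deg C (assumed)
    "CRAC-1": {"rack-A": 4.0, "rack-B": 1.0},
    "CRAC-2": {"rack-A": 1.5, "rack-B": 3.5},
}
CURRENT_TEMP = {"rack-A": 24.0, "rack-B": 25.0}  # current inlet temps, deg C
LIMIT_C = 27.0                                   # allowable inlet temperature

def racks_at_risk(offline_unit: str) -> list[str]:
    """Racks predicted to exceed LIMIT_C if offline_unit is shut down."""
    lost = INFLUENCE[offline_unit]
    return [rack for rack, temp in CURRENT_TEMP.items()
            if temp + lost.get(rack, 0.0) > LIMIT_C]

print(racks_at_risk("CRAC-1"))  # ['rack-A']
print(racks_at_risk("CRAC-2"))  # ['rack-B']
```

With a report like this in hand, the safest maintenance window for each unit becomes a query rather than a judgment call.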

If this knowledge was easily available, what facility manager wouldn’t choose to take advantage of it before taking a maintenance action? My next blog post will provide a visual example of the analysis facility managers can perform to determine when and where to perform maintenance while simultaneously reducing risk to more critical assets and the floor as a whole.

Predictive Analytics & Data Centers: A Technology Whose Time Has Come

Back in 1993, ASHRAE organized the “Great Energy Predictor Shootout,” a competition designed to evaluate various analytical methods used to predict energy usage in buildings. Five of the top six entries used artificial neural networks. ASHRAE organized a second shootout in 1994, and this time the winners included a balance of neural network and non-linear regression approaches to prediction and machine learning. And yet, as successful as these case studies were, there was little to no adoption of this compelling technology.

Fast forward to 2014 when Google announced its use of machine learning leveraging neural networks to “optimize data center operations and drive…energy use to new lows.”  Google uses neural networks to predict power usage effectiveness (PUE) as a function of exogenous variables such as outdoor temperature, and operating variables such as pump speed. Microsoft too has stepped up to endorse the significance of machine learning for more effective prediction analysis.  Joseph Sirosh, corporate vice president at Microsoft, says:  “traditional analysis lets you predict the future. Machine learning lets you change the future.”  And this recent article advocates the use of predictive analytics for the power industry.
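As a rough illustration of the approach, one can fit PUE as a function of outdoor temperature and pump speed and then predict PUE at a proposed operating point. This is a toy linear fit on synthetic data, not Google’s actual neural network; a real deployment would use far more variables and a richer model:

```python
import numpy as np

# Toy illustration (synthetic data, not Google's actual model): fit PUE
# as a function of outdoor temperature and pump speed from "history",
# then predict PUE for a proposed operating point.

rng = np.random.default_rng(0)
n = 200
outdoor_temp = rng.uniform(5, 35, n)      # deg C
pump_speed = rng.uniform(40, 100, n)      # percent of full speed
# Synthetic ground truth: PUE rises with both variables, plus noise.
pue = 1.2 + 0.004 * outdoor_temp + 0.002 * pump_speed \
      + rng.normal(0.0, 0.01, n)

# Ordinary least squares on [1, temp, pump].
X = np.column_stack([np.ones(n), outdoor_temp, pump_speed])
coef, *_ = np.linalg.lstsq(X, pue, rcond=None)

def predict_pue(temp_c: float, pump_pct: float) -> float:
    """Predict PUE at a candidate operating point."""
    return float(coef @ np.array([1.0, temp_c, pump_pct]))

print(f"Predicted PUE at 20 C, 70% pump: {predict_pue(20.0, 70.0):.3f}")
```

The point of a model like this is not the fit itself but what it enables: evaluating candidate operating points before committing to them.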

The Vigilent system also embraces this thinking, and uses machine learning as an integral part of its control software.  Specifically, Vigilent uses continuous machine learning to ensure that predictions driving cooling control decisions remain accurate over time, even as conditions change (see my May 2013 blog for more details).  Vigilent predictive analysis continually informs the software of the likely result of any particular control decision, which in turn allows the software to extinguish hot spots – and most effectively optimize cooling operations with desired parameters to the extent that data center design, layout and physical configuration will allow.

This is where additional analysis tools, such as Vigilent’s influence maps, become useful.  The influence maps provide a current, real-time and highly visual display of which cooling units are cooling which parts of the data floor.
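One way such a map could be derived (a simplified sketch, not Vigilent’s proprietary method) is to perturb each cooling unit briefly and record how strongly each rack’s inlet temperature responds; the stronger the response, the greater that unit’s influence:

```python
# Hedged sketch of deriving an influence map (not Vigilent's proprietary
# method): perturb each cooling unit's setpoint briefly and record how
# much each rack's inlet temperature responds. The unit producing the
# largest response at a rack is that rack's dominant influence.

def influence_map(responses: dict[str, dict[str, float]]) -> dict[str, str]:
    """responses[crac][rack] = inlet-temperature change (deg C) observed
    at the rack when that CRAC was perturbed. Returns, for each rack,
    the CRAC with the strongest influence on it."""
    strongest: dict[str, tuple[str, float]] = {}
    for crac, per_rack in responses.items():
        for rack, delta in per_rack.items():
            if rack not in strongest or abs(delta) > abs(strongest[rack][1]):
                strongest[rack] = (crac, delta)
    return {rack: crac for rack, (crac, _) in strongest.items()}

# Hypothetical perturbation data for two CRACs and two racks:
observed = {
    "CRAC-1": {"rack-A": -1.8, "rack-B": -0.2},
    "CRAC-2": {"rack-A": -0.3, "rack-B": -1.1},
}
print(influence_map(observed))  # {'rack-A': 'CRAC-1', 'rack-B': 'CRAC-2'}
```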

As an example, one of our customers saw that he had a hot spot in a particular area that hadn’t been automatically corrected by Vigilent.  He reviewed his Vigilent influence map and saw that the three cooling units closest to the hot spot had little or no influence on the hot spot.  The influence map showed that cooling units located much farther away were providing some cooling to the problem area.  Armed with this information, he investigated the cooling infrastructure near the hot spot and found that dampers in the supply ductwork from the three closest units were closed.  Opening them resolved the hot spot.  The influence map provided insight that helped an experienced data center professional more quickly identify and resolve his problem and ensure high reliability of the data center.

Operating a data center without predictive analytics is like driving a car facing backwards: all you can see is where you’ve been and where you are right now. That is a dangerous way to drive. Why would anyone “drive” their data center this way?

Predictive analytics are available, proven and endorsed by technology’s most respected organizations.  This is a technology whose time has not only come, but is critical to the reliability of increasingly complex data center operations.


A Look at 2013

We grew!

We moved!

We’ve had a heck of a year!

In 2013 alone, we eliminated or avoided the generation of more than 85 thousand tons of carbon emissions.

This is a statistic of which I am very, very proud and one that clearly demonstrates the double bottom line impact of the Vigilent solution.

We have directly impacted the planet by reducing energy requirements and CO2 emissions, even as the demands of our digital lifestyles increase.  We have impacted individual quality of life by increasing uptime reliability and contributing to the safety of treasured documents and photos, as well as helping to ensure the uninterrupted transmission of information that makes our world operate.  We are honored and privileged to contribute so directly to the well-being of our world and our customers.

While analysts have cited a DCIM market contraction in 2013, Vigilent has thrived. We attracted new customers and engendered even deeper loyalty among existing customers, evidenced by organic growth as one deployment turns into three, then ten, then dozens across the United States as actual energy savings and thermal condition insights are realized.

I am pleased to share some of the milestones we achieved in 2013:

We moved to terrific new facilities in uptown Oakland.  Not only does our new facility (within a literally green building)  provide us with space for in-house product commissioning and expanded R&D,  it provides a vibrant collaborative atmosphere for employees.  The new location is adjacent to public transportation, honoring our commitment to a green corporate culture, and offers dozens of great restaurants, coffee shops and diverse entertainment options for employees.

We grew – in revenues, in customer base, into new markets and in staff. With growth comes the responsibility to provide more directed leadership in business functions and market focus. With this in mind, we expanded our executive management staff, hiring Dave Hudson to oversee sales and operations worldwide and Alex Fielding to introduce Vigilent to federal markets, and added many new field engineers, software engineers, QA and support staff.

We expanded our product offering with new functionality including out-of-the-box reports that help with energy savings, SLA adherence, maintenance and capacity planning.  We continued to refine our trademark intelligence and control functionality enhancing both usability and energy savings in ever more complex data center environments – achieving an additional 30% savings in some cases.

Ultimately, all of this helps our customers succeed not only in direct bottom line impact, but with large-scale sustainability efforts that are widely recognized.  Avnet used the Vigilent system in corporate sustainability initiatives that garnered the company the Uptime Institute GEIT award, as well as recognition by InfoWorld as a top Green IT award winner.    Our sales partner, NTT Facilities, continues to roll out  Vigilent deployments in Japan.

Our ability to contribute to the Federal Government’s initiative to consolidate data centers and reduce overall energy consumption is significant indeed. Watch this space.

With a great year behind us, we recognize that there is much to do, as the data center industry – at last – is realizing how significantly data and analytics can improve day to day operations and efficiency endeavors.

The Ponemon Institute, in a study sponsored by Emerson Network Power, recently reported on data center outages that accidental human error remains among the top three cited causes of downtime and that 52% of survey respondents believe these accidents could have been prevented.

Intelligent software control and analytics will help operators make better,  more informed decisions and reduce such human errors.   These tools will increasingly help data centers proactively avoid trouble, while at the same time helping them diagnose and resolve actual issues more quickly.

This will be the year of analytics for data centers. Vigilent is equipped and prepared to lead this charge, leveraging years of institutional knowledge gleaned from hundreds of deployments in every conceivable configuration in mission-critical facilities on four continents. This mass of data informs the control decisions we make at every site and, more recently, puts the benefit of this accumulated knowledge into the hands and minds of data center managers for more informed process management.

Happy New Year.
