Data centre construction projects are imperfect, with a traditional project team structure limiting transfer of essential knowledge to the operations team - that's the view of David Cameron (no relation) of risk management consultancy Operational Intelligence. Here he puts the case for those operating the centres to have more input.
9 April 2015 | By David Cameron
Few would argue against inclusion of the Integrated Systems Test (IST) as an accepted part of the data centre delivery model in a new build or refurbishment project delivery plan.
Definitions of an IST vary from project to project but, broadly speaking, they involve the final demonstration of a data centre's critical infrastructure in order to determine whether designed levels of resilience and redundancy work in practice.
There are many data centre project team structure arrangements but, in principle, a client engages with a design team to produce a design reflecting its needs. A contractor is then engaged to deliver the project.
At this early stage the extent of systems commissioning (Level 4 commissioning) and IST (Level 5 commissioning) is identified only as an overview.
The Kolb learning cycle states that to learn effectively we must touch all four of its quadrants, valuing experience, reflection, conceptualisation and experimentation equally. The difficulty in the construction industry is that these quadrants tend to be dominated by different business sectors, which become barriers to effective communication, learning and transfer of knowledge. There also tend to be contractual boundaries.
The interface between client and designer (or design build contractor) is particularly important. In the US, ASHRAE has proposed the development of an 'owner's project requirement' (OPR) document that identifies what the client requires and a 'basis of design' document (BOD) identifying how it will work.
There is a tendency for concept design reports to be superseded by the detailed design. The BOD, however, is intended to be modified throughout the project so that at handover it reflects the installed systems and commissioning that has taken place.
The two remaining interfaces vary from project to project with little consistency.
The transfer of knowledge from construction to operations teams is always difficult, regardless of the type of project being delivered. This can impact on both risk of downtime and energy efficiency, and the latter has been identified by BSRIA as a problem area with regard to energy performance of commercial office buildings. It's this that led to the development of BSRIA's 'soft landings' framework.
Looking at each of the construction handover deliverables allows us to consider how much more effective the traditional handover process could be.
Demonstrating capacity performance of equipment at full load conditions is important in showing that the supplier has delivered an appropriately sized item of plant. However, there is far more that can be gained from this period of testing.
Of particular relevance to the operations team is the energy and stability performance at part-load conditions.
Full-load performance is important for contractual reasons - but from an operations perspective part-load performance is far more relevant. Many data centre facilities are running at part load in an inefficient way purely because the facility was optimised to full load conditions during commissioning. The operations team has no reference point to make changes to optimise performance, concerned that any change it makes might remove the contractual responsibility from the design/construction team. Clearly, this is a barrier to effective energy optimisation.
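One way to give the operations team such a reference point is to record measured efficiency at several load steps during Level 4 and 5 testing, not only at full load. The sketch below is illustrative only: the `LoadPoint` structure, the load steps and the power readings are hypothetical, and the choice of power usage effectiveness (total facility power divided by IT power) as the metric is an assumption rather than something a project specification necessarily requires.

```python
from dataclasses import dataclass


@dataclass
class LoadPoint:
    it_load_kw: float         # IT load applied at this test step (hypothetical)
    facility_power_kw: float  # total facility power measured at this step (hypothetical)

    @property
    def pue(self) -> float:
        # Power usage effectiveness: total facility power / IT power
        return self.facility_power_kw / self.it_load_kw


# Hypothetical commissioning readings at part load and full load
baseline = [
    LoadPoint(250, 500),
    LoadPoint(500, 800),
    LoadPoint(1000, 1400),
]


def pue_table(points):
    """Map each IT load step to its measured PUE, rounded to two places."""
    return {p.it_load_kw: round(p.pue, 2) for p in points}


print(pue_table(baseline))
```

A table like this, kept with the handover documentation, lets the operations team compare any later optimisation against the commissioned part-load figures.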
Automatic recovery under pre-defined failure events is generally used to demonstrate the satisfaction of failure scenarios identified within the specification.
For basic concurrent maintainability, a good demonstration would be the independent isolation and reinstatement of all component parts of the critical services infrastructure for an extended period. Satisfactory demonstration of this requirement would fulfil the contractual obligations; however, it does not demonstrate how the systems will operate under failure events.
This is an area of divergence between the contractual obligations on the installation contractor and the requirements of the operations team. Within a concurrently maintainable design, components can and will fail. Although such failures may be acceptable within the design, it's still important that the operations team understands the implications of such failure events and, more importantly, how it would recover from them.
A principal reason for a total mains failure test is that it generates the most alarms and as a result bombards the monitoring station with hundreds of alarm reports, all of differing priorities. Consultants, contractors and specialists will each have an idea of how these alarms should be prioritised and who should be notified, and how. The tendency is always to over-prioritise on the basis that no one ever got criticised for categorising a low-priority alarm as a high-priority alarm.
Once the facility is handed over, the operations team's next opportunity to witness a full mains failure is likely to be a real event. Alarm priorities are often modified after such an event, because the operations team's terms of reference and experience are totally different from those of the construction team, and so its priority list will differ too. There are therefore benefits in engaging with the operations team during development of the equipment references, graphical and alarm interfaces.
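A simple way to make that engagement concrete is to compare the alarm priorities set during commissioning with those proposed by the operations team, and to review only the differences. The following is a minimal sketch under stated assumptions: the alarm names, the priority labels and the `priority_changes` helper are hypothetical and do not come from any particular monitoring system.

```python
def priority_changes(commissioned, reviewed):
    """Return {alarm: (commissioned_priority, reviewed_priority)} where the two differ."""
    return {
        alarm: (commissioned[alarm], reviewed[alarm])
        for alarm in commissioned
        if alarm in reviewed and commissioned[alarm] != reviewed[alarm]
    }


# Hypothetical priorities as set by the construction team (everything "high")
commissioned = {
    "UPS-A on battery": "high",
    "Generator 1 running": "high",
    "Chiller 2 restart": "high",
}

# Hypothetical priorities after review by the operations team
reviewed = {
    "UPS-A on battery": "high",
    "Generator 1 running": "medium",
    "Chiller 2 restart": "low",
}

for alarm, (old, new) in priority_changes(commissioned, reviewed).items():
    print(f"{alarm}: {old} -> {new}")
```

Running the comparison before the mains failure test, rather than after the first real event, gives the operations team a chance to see its own priority list exercised during Level 5 commissioning.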
O&M documentation needs to be relevant and of use to the operations team - yet they are rarely involved in the review process. Who better to review the documentation? Better still, who better to specify the make-up of the manuals than the operations team?
Manuals should include the Basis of Design document and the Close Out Report from the Level 5 commissioning.
Record drawings are generally the initial reference point for routine maintenance and fault recovery events. It's important that they contain the same references and notation as the equipment in the field.
Inaccuracies in equipment references between drawings and plant are cited as contributing factors in failure events where the mean time to recover (MTTR) has been extended because of uncertainty in identifying equipment and circuits in the pressured environment of a real-life scenario.
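Mismatches of this kind can be caught before handover by comparing the references used on the record drawings with the labels surveyed on the installed plant. The sketch below assumes both lists are available as plain text identifiers; the equipment tags and the `reference_mismatches` helper are hypothetical.

```python
def reference_mismatches(drawing_refs, field_refs):
    """List references that appear on the drawings or in the field, but not both."""
    drawing, field = set(drawing_refs), set(field_refs)
    return {
        "on_drawings_only": sorted(drawing - field),
        "in_field_only": sorted(field - drawing),
    }


# Hypothetical references taken from record drawings and from a site label survey
drawing_refs = ["UPS-A1", "UPS-B1", "GEN-01", "CH-02"]
field_refs = ["UPS-A1", "UPS-B1", "GEN-1", "CH-02"]

print(reference_mismatches(drawing_refs, field_refs))
```

Here the generator is tagged `GEN-01` on the drawings but labelled `GEN-1` on site, exactly the sort of small inconsistency that costs time during a real failure.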
Responsibility for handover training is generally with the installation contractor, but is passed down to the specialist supplier and generally delivered by a commissioning or sales engineer. The focus is always on the specific equipment and often misses its relevance in the overall system and, in particular, operational interfaces.
Training is better received when delivered in context because practical experience reinforces theory. The equipment in isolation is important, but it does not operate on its own. It has operational interfaces that also need to be understood, maintained and tested. These interfaces tend to fall between suppliers and they are seldom highlighted during supplier training.
The focus of handover training for critical facilities should be the transfer of all knowledge from the construction team to the operations team. This can only be conducted effectively by allowing the operations team time to get familiar with the systems before undertaking any training. To this end, Level 4 and 5 testing are perfect introductions into the operational context of the equipment.
Handover training should include: a review of the project brief; presentation of the overall schematics and layout drawings; review of the high-level commissioning plan; the scope and purpose of monitoring and control systems; a programme for Level 4 and 5 commissioning; and a detailed review of the asset list.
The operations team should be allowed time to review and comment on the failure scenarios covered under the Level 4 and 5 commissioning if tests are to satisfy the objective of maximum information transfer as opposed to satisfaction of a contractual milestone. The operations team must also develop Standard and Emergency Operating Procedures (SOP, EOP). While SOPs can be based on good practice and experience, and as such be similar from project to project, EOPs are site-specific.
The order in which systems must be recovered is based on dynamic conditions and will vary depending on the failure event, which is why EOPs cannot simply be carried over from one site to another. The starting point for their development should be Level 4 (systems tests) and Level 5 (integrated systems test) commissioning tests.
Transfer of knowledge from design/construction teams to operations teams is an evident weakness in both new and legacy data centres. The process needs a change of mindset from all stakeholders including operations, maintenance contractors, construction teams and consultants. But this problem does not sit with any one party - it needs to be addressed as an industry.
When we talk specifically about energy optimisation we get comments along the lines of "we're not changing anything, otherwise it becomes our responsibility". This stops facilities realising significant energy savings. A conservative estimate based on our experience sets this figure at 10 per cent of annual energy consumption and applies to new as well as legacy data centres.
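To put the 10 per cent estimate in context, the arithmetic can be sketched as below. The annual consumption figure is purely hypothetical and chosen only to make the sum easy to follow; the `annual_saving_kwh` helper is likewise illustrative.

```python
def annual_saving_kwh(annual_consumption_kwh, saving_percent=10):
    """Energy saved per year for a given percentage reduction in consumption."""
    return annual_consumption_kwh * saving_percent / 100


# Hypothetical facility consuming 10,000,000 kWh per year
saving = annual_saving_kwh(10_000_000)
print(f"Estimated saving: {saving:,.0f} kWh per year")
```

For a facility of that assumed size, a 10 per cent reduction would be 1,000,000 kWh a year left unrealised.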
We propose two modifications to the existing process. Firstly, preparation at the outset of the project of a concept design or basis of design document, to be updated throughout the design, construction and commissioning process. This should then form part of the handover documentation, providing readers with a clear overview of the purpose and limitations of the facility. This should also be used as the reference point for future upgrades and there should be an obligation on the operations team to keep it up to date.
Secondly, change the final milestone from 'project handover/completion' to 'design and construction knowledge transfer' - thus providing a better description and focus. There are contractual definitions for practical and project completion; however, provided that knowledge transfer is stated as a key deliverable of practical completion, there should be no contradiction.