Monitoring & Management Tools
They say that if you can’t monitor your Data Centre, you can’t manage it. Implementing effective monitoring of your core business automation systems is key to maintaining visibility of the Data Centre IT estate. Data Centre tools have evolved over the years from manufacturer-specific tools to tools that integrate with multiple manufacturers’ systems, providing a single pane of glass or dashboard showing the current status of your estate. With our broad experience across the many areas of a Data Centre, we are able to apply that experience to your tooling needs. The key is to clearly define your requirements and then prioritise them. A quick scan of the Internet will show more than 100 tools all calling themselves Data Centre Infrastructure Management (DCIM) tools, but many are far from being a true DCIM tool. Having delivered tooling assessment and implementation projects for major organisations, PTS is an independent specialist who can work with your team and the market suppliers to source and implement the right tool for your business needs. The following are some of the activities we carry out in the delivery of a DCIM project:
Needs Assessment and Requirements Gathering
This is where PTS works with the internal technology teams and stakeholders to capture the identified needs and define the key items. PTS provides market and technology context and advises on the captured requirements. We create a prioritised list using the MoSCoW method (Must have, Should have, Could have, Won’t have) and gain agreement on it.
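As an illustration only, a MoSCoW-prioritised requirements list could be held and sorted as in the sketch below. The requirement names and the `Requirement` class are hypothetical examples, not items from a real engagement.

```python
from dataclasses import dataclass

# MoSCoW categories in priority order (lowest number = highest priority).
MOSCOW_ORDER = {"Must": 0, "Should": 1, "Could": 2, "Won't": 3}


@dataclass
class Requirement:
    name: str      # hypothetical requirement description
    category: str  # one of the MOSCOW_ORDER keys


def prioritise(requirements):
    """Sort requirements Must -> Should -> Could -> Won't.

    sorted() is stable, so items keep their captured order within a category.
    """
    return sorted(requirements, key=lambda r: MOSCOW_ORDER[r.category])


# Hypothetical requirements captured from stakeholders.
example = [
    Requirement("Capacity trend reports", "Should"),
    Requirement("Asset inventory database", "Must"),
    Requirement("3D rack visualisation", "Could"),
    Requirement("Real-time power monitoring", "Must"),
]

for req in prioritise(example):
    print(f"{req.category:7} {req.name}")
```

The sorted list is what would then go to the stakeholders for agreement.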
Undertake a Data Centre Infrastructure Audit and Inventory
Working with the Data Centre Operations and Technology Teams we will undertake a full Data Centre audit and inventory – this will provide the basis of an Asset Management Database (a fundamental element of the DCIM tool).
Tool Selection and Evaluation
With so many tools in the market, where do you start? Like so many markets, the DCIM tool market is dominated by the big boys who have the marketing power, but that doesn’t mean their tools are the best suited to you or will be the cheapest. Our experience of evaluating the marketplace helps ensure that you see the right tools to meet your needs. This stage can appear lengthy, as first PTS and then your Evaluation Team reduce the products down to a final two.
Tool Deployment and Integration
Once a decision has been made then there is a period of Planning, Deployment and Integration with your existing systems. Like all new systems it will be important that the build meets your internal Build and Security requirements and, in many cases, would have to go through a standard Application roll out process. The important part at this stage is to ensure no steps are missed as this will cause issues later on.
Configuration and Customization
The long and arduous task of adding and integrating all the systems needs to be undertaken in a controlled and systematic way, to ensure the attributes are correctly configured and that systems are behaving as normal within the current environments. Expect some abnormal results initially which do not make sense; the usual cause is a badly configured attribute.
Alerting and Notification Setup
Once all the systems have been added to the tool, the first thing the Supplier would do is turn off all Notifications, so that you are not flooded by a tsunami of Error Messages. Identify the critical items and the error criteria you are specifically interested in and build from there. Remember you don’t want so many messages that you ‘can’t see the forest for the trees’. We will recommend which notifications need to be turned on so that the flow of messages stays manageable.
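The ‘start with everything off and build from there’ approach can be sketched as an allow-list that begins empty and grows deliberately. This is a minimal illustration, not the behaviour of any particular DCIM product; the device names, metric names and severity labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source: str    # hypothetical device name
    metric: str    # hypothetical metric name
    severity: str  # e.g. "info", "warning", "critical"


# Allow-list of (source, metric) pairs that may notify.
# An empty set is the "all notifications off" starting point.
ENABLED_RULES = {
    ("ups-a", "battery_status"),
    ("crac-1", "supply_temp"),
}


def should_notify(alert, enabled=ENABLED_RULES):
    """Notify only for critical alerts on explicitly enabled items."""
    return alert.severity == "critical" and (alert.source, alert.metric) in enabled


print(should_notify(Alert("ups-a", "battery_status", "critical")))
print(should_notify(Alert("pdu-7", "load", "critical")))
```

Each new rule added to the allow-list is a conscious decision, which keeps the message flow at a level the team can act on.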
Performance Baseline Establishment
In many cases this may be the first time integrated Performance Reports are available. Building high-level dashboards for senior stakeholders will go down well, as they can check performance without disturbing you. Expect the occasional unexpected phone call, though, when management see a change for the worse on their dashboards; there is no hiding now. We help you create the baselines, and we recommend being generous to start with and tightening them later to show improvements. Do not expect everything to be perfect on day one; the baselines will be tweaked many times until the environment and its ‘real performance’ are known.
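One simple way to picture a ‘generous’ baseline is a threshold set at the historical mean plus a margin of several standard deviations, with the margin reduced as the environment becomes better understood. The sketch below assumes this mean-plus-margin approach purely for illustration; the sample values and the `margin_sigmas` parameter are hypothetical, and a real tool may derive baselines differently.

```python
import statistics


def baseline_threshold(samples, margin_sigmas=3.0):
    """Return an alert threshold of mean + margin_sigmas * standard deviation.

    A large margin_sigmas gives a generous baseline; lower it later to tighten.
    """
    mean = statistics.mean(samples)
    spread = statistics.pstdev(samples)  # population standard deviation
    return mean + margin_sigmas * spread


# Hypothetical weekly CPU utilisation readings (percent).
cpu_samples = [40, 42, 38, 41, 39]

generous = baseline_threshold(cpu_samples, margin_sigmas=3.0)
tightened = baseline_threshold(cpu_samples, margin_sigmas=1.5)
print(f"initial threshold: {generous:.1f}%, tightened: {tightened:.1f}%")
```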
Automation and Orchestration
Once the tool is set up and monitoring and there is stability, the installation team can start implementing standard automation items. This is where the Project Team can start to list all the activities that take place when a monitored value crosses a KPI threshold or a Syslog message is sent by a certain Server or Application. The list will quickly grow with the help of the technology team leads. We work with the various technology stakeholders to identify their initial lists of automation requirements and then work with the tool’s integrators to configure the tool. This initial list of automation scripts will be the ‘tip of the iceberg’ as the teams take small steps towards addressing all of their real needs.
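The list of ‘when this happens, do that’ activities can be thought of as a table of conditions and actions. The following is a minimal sketch of that idea; the event fields, the two actions and the rule conditions are hypothetical and stand in for whatever the chosen tool actually provides.

```python
def open_ticket(event):
    """Hypothetical action: raise a ticket with the owning team."""
    return f"ticket opened for {event['host']}"


def restart_service(event):
    """Hypothetical action: request a service restart."""
    return f"restart requested on {event['host']}"


# Each rule pairs a condition with the action to run when it matches.
RULES = [
    (lambda e: e["type"] == "kpi" and e["value"] > e["threshold"], open_ticket),
    (lambda e: e["type"] == "syslog" and "service stopped" in e["message"], restart_service),
]


def handle(event):
    """Run every action whose condition matches the event."""
    return [action(event) for matches, action in RULES if matches(event)]


kpi_event = {"type": "kpi", "host": "db01", "value": 95, "threshold": 90}
syslog_event = {"type": "syslog", "host": "app02", "message": "service stopped unexpectedly"}

print(handle(kpi_event))
print(handle(syslog_event))
```

New automation requirements from the technology team leads then become extra rows in the rule table.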
Capacity Planning and Optimization
Many tools these days allow for an element of virtual test installation, in which the tool estimates the impact of a proposed installation, as well as providing trend reports on current run rates. Together these enable the Data Centre Teams to produce Capacity Management Reports on the current estate, so that they can head off a potential capacity constraint before it arrives. Many of these reports and tracking items will need to be set up, or existing reports customised. Using our experience of the environment and the tool, we work with the Tools integrator to configure appropriate Capacity algorithms.
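A trend report of this kind can be approximated by fitting a straight line to recent usage and projecting when it meets the capacity limit. The sketch below uses an ordinary least-squares fit as an assumption for illustration; the readings and the capacity figure are hypothetical, and real DCIM tools may use more sophisticated algorithms.

```python
def months_until_full(usage, capacity):
    """Estimate months from the latest reading until usage reaches capacity.

    usage: one reading per month, oldest first (at least two readings).
    Returns None if usage is flat or falling, so no constraint is projected.
    """
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(usage) / n
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Month index where the fitted line hits capacity, relative to the last reading.
    return (capacity - intercept) / slope - (n - 1)


# Hypothetical monthly power draw (kW) against a 160 kW limit.
readings = [100, 110, 120, 130]
print(months_until_full(readings, capacity=160))
```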
Patch and Update Management
Like all Servers that sit on the ‘Corporate Network’, the tool’s Servers need to comply with standard build standards, and this will mean regular patching to meet the organisation’s standards. There will need to be an agreement on who will do this and how. This has been the biggest challenge in most of the projects we have run, with varying solutions implemented so that all parties are happy with the result. Our experience of the options will help in resolving this key issue.
Configuration and Monitoring Data Backup
Just like any other key IT System, it is important that both the data collected and the configuration data are backed up. Although many will not see this system as business critical, it holds a lot of important historical data, and losing it would affect the trending reports. Losing the configuration data would be disastrous. We work with the data storage team to ensure that data backups are implemented for the platform and meet the needs of the system’s integrity.
Performance Tuning and Optimisation
Once the system has been running for at least three months the Project Team will need to review the performance of the platform and start the process of Optimising the installation.
Data Centre Team Training and Documentation
Training is an important element, and the best time to commence it is once the platform has been initially installed. In many cases the best training will be undertaken on your platform by the installation engineers. Training offsite can be confusing when staff return to find everything set up differently from the training platform. Insist that training is done onsite on your platform (if possible, record the training as there is a lot to learn). Account for the different levels of staff: not everyone will be the Administrator, and some will only need Operations Training on the tool.
We ensure that the Installation Engineers provide comprehensive documentation before the platform is signed off. This covers the installed Configurations, so that a baseline configuration is documented.
Moving to 24/7 Monitoring and Support
Even though your Data Centre Team may not work 24 x 7, the tool will continue monitoring. Alerts that would be important during a Working Day may not be important on a Sunday Afternoon, so we make sure that alerts are downgraded when they are not required. For example, the main Server CPU reaching 100% could be a serious situation during the day, but if the overnight batch system is running, the Server is simply being used efficiently.
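A time-based downgrade of this sort can be sketched as a small function that looks at when the alert fired. The metric name, the severity labels and the 00:00 to 06:00 batch window are assumptions made for the example, not settings from any specific tool.

```python
from datetime import datetime


def effective_severity(metric, severity, when, batch_hours=range(0, 6)):
    """Downgrade CPU alerts raised at the weekend or in the batch window.

    batch_hours is a hypothetical overnight batch window (00:00 to 05:59).
    """
    is_weekend = when.weekday() >= 5  # Saturday = 5, Sunday = 6
    in_batch_window = when.hour in batch_hours
    if metric == "cpu_utilisation" and (is_weekend or in_batch_window):
        return "info"
    return severity


# Wednesday afternoon: the alert keeps its severity.
print(effective_severity("cpu_utilisation", "critical", datetime(2024, 1, 3, 14, 0)))
# Wednesday 02:00, during the batch run: downgraded.
print(effective_severity("cpu_utilisation", "critical", datetime(2024, 1, 3, 2, 0)))
```

Alerts on other metrics pass through unchanged, so genuinely urgent conditions are still raised out of hours.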
Provision of Reporting and Dashboard Panes
There will be requirements for Daily, Weekly and Monthly Reports covering event issues, exceptional events, capacity and performance. We work with you to identify, create and customise reports to meet the needs of the stakeholders. We also develop Dashboards for Stakeholders, train them in the use of the boards and explain the content. In many cases the Dashboards allow drilling down to the detail, so it will be important that continuity of the system is maintained and that the data is managed to the highest standard.
Regular Reviews and Updates
As with all IT systems, it is important to undertake a regular review, identifying ‘the good, the bad and the ugly’ and addressing what needs to change to keep the tool aligned with the environment it supports. We recommend that reviews take place at least Quarterly, with formal Meeting Notes and Actions recorded. PTS is able to ensure the maximum benefit is gained from these reviews and can develop the plans that come out of the review meeting.
Vendor Management and Licensing
The final element is the ongoing management of the Systems Integrator (Tooling Vendor). As part of the Agreement there will probably be an element of maintenance and Consulting Time included in the Annual Contract for the platform. Ensuring this addresses the real needs of the team will be important as part of the initial engagement and Procurement process. The Team then needs to manage the vendor and use the available time in the contract wisely. We help set up the Vendor Management process and guide the local team in managing the vendor so that they remain in control of the engagement.