Nagios Monitoring Standards & Guidelines

Naming Standards

Hostname Naming Standards:

Pattern: host.domain
Example: Hostname.domain.com

Each hostname will be a fully qualified domain name.

Host Groups Naming Standards:

Pattern: Operatingsystem_cc
Examples: Linux_us, Windows_us, Cisco_jp, Hpux_us

Each host group name contains the operating system followed by a two-letter, lower-case country code.
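As a minimal sketch, the host group rule could be checked with a regular expression. The function name `is_valid_host_group` is hypothetical, and the expression assumes the operating system part is a single capitalized word, as in the examples above.

```python
import re

# Assumed reading of the pattern: a capitalized operating-system name,
# an underscore, then a two-letter lower-case country code.
HOST_GROUP_RE = re.compile(r"[A-Z][a-z0-9]*_[a-z]{2}")


def is_valid_host_group(name: str) -> bool:
    """Return True if the name follows the Operatingsystem_cc pattern."""
    return HOST_GROUP_RE.fullmatch(name) is not None
```

Under this reading, `Linux_us` and `Cisco_jp` pass, while `linux_us` and `Linux_US` are rejected.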

Service Groups Naming Standards:

Pattern: Servicegroup_loc_env_cc
Examples: Apache_ext_prod_us, Drupal_int_test_us

Must start with a descriptive name of the service type.
Should contain the network location: external (ext) or internal (int).
Must state the environment: production (prod) or test.
Must end with a two-letter, lower-case country code.
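The four parts of a service group name can be pulled apart mechanically. The following is a rough sketch, not an official tool: `ServiceGroup` and `parse_service_group` are hypothetical names, and the sketch assumes the location part is always present even though the rule above only says it "should" be.

```python
import re
from typing import NamedTuple, Optional


class ServiceGroup(NamedTuple):
    service: str      # descriptive service type, e.g. "Apache"
    location: str     # "ext" or "int"
    environment: str  # "prod" or "test"
    country: str      # two-letter lower-case country code


# Assumption: all four parts are required and separated by underscores.
SERVICE_GROUP_RE = re.compile(
    r"(?P<service>[A-Z][A-Za-z0-9]*)"
    r"_(?P<location>ext|int)"
    r"_(?P<environment>prod|test)"
    r"_(?P<country>[a-z]{2})"
)


def parse_service_group(name: str) -> Optional[ServiceGroup]:
    """Split a name such as Apache_ext_prod_us into its parts, or return None."""
    match = SERVICE_GROUP_RE.fullmatch(name)
    if match is None:
        return None
    return ServiceGroup(**match.groupdict())
```

For example, `parse_service_group("Drupal_int_test_us")` yields the parts `Drupal`, `int`, `test` and `us`, while a name missing a part returns `None`.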

Service Naming Standards:

Pattern: Service_host_cc
Examples: Httpd_hostname_us, Mysql_hostname_us, Ping_hostname_us

Must start with a capitalized, descriptive name of the service type.
Must contain the host name.
Must end with a two-letter, lower-case country code.

Contact Groups Naming Standards:

Pattern: Contactgroup_dept
Examples: Admins_noc, Helpdesk_noc, SA_noc

Will start with a name that describes the group.
Will end with the abbreviated department name to help identify the group within the company.

Contact Naming Standards:

Pattern: FLast
Example: jsmith

Will match the user's Windows login name.

Monitoring Frequency Standards

The default will be to monitor services every 2 minutes.

When monitoring detects an error, the service will be rechecked 3 times at 30-second intervals before a notification is sent.

If a production service is not critical, it can be checked less frequently.

Test systems, if important, will be checked every 15 minutes and then rechecked 5 more times at 5-minute intervals before a notification is sent. (Only important test systems will be monitored, and they could be down for 25-45 minutes with no alerts.)
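The delay between the first failed check and the notification follows directly from the recheck schedules above. The sketch below works it out for both schedules; the function name `time_to_notification` is hypothetical, and it assumes the notification goes out as soon as the last recheck fails.

```python
from datetime import timedelta


def time_to_notification(rechecks: int, recheck_interval: timedelta) -> timedelta:
    """Time from the first failed check until the notification is sent.

    Assumes each recheck waits one full interval and the notification
    follows the final failed recheck immediately.
    """
    return rechecks * recheck_interval


# Default production schedule: 3 rechecks, 30 seconds apart.
PRODUCTION_DEFAULT = time_to_notification(3, timedelta(seconds=30))

# Important test systems: 5 rechecks, 5 minutes apart.
IMPORTANT_TEST = time_to_notification(5, timedelta(minutes=5))
```

This gives 90 seconds for the production default and 25 minutes for important test systems. The time a failure goes unnoticed before the first failed check, up to one regular check interval, comes on top of that.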

Where Monitoring Should Occur

The goal of our monitoring design is to have each monitored system run as many of its own checks as possible, reducing the load on the main Nagios monitoring server. This allows more frequent checks and makes the monitoring server more stable.

Common Status Code Standards

The 4 status codes that we will standardize on are:

0 = OK: the process ran to completion and is running within acceptable parameters.
1 = Warning: the process didn't fail, but it is in a state where some action may be required.
2 = Error: an error occurred with the process and action needs to be taken.
3 = Unknown: something unknown may have happened to the process and it should be checked.
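A check script might map its result onto these four codes roughly as follows. This is an illustrative sketch only: the `Status` enum, the `classify` function and the threshold values are hypothetical, and the labels follow this document (Nagios itself names code 2 CRITICAL).

```python
from enum import IntEnum
from typing import Optional


class Status(IntEnum):
    OK = 0
    WARNING = 1
    ERROR = 2
    UNKNOWN = 3


def classify(value: Optional[float], warn: float, error: float) -> Status:
    """Map a measured value onto a status code using two thresholds.

    A missing measurement (None) is reported as UNKNOWN.
    """
    if value is None:
        return Status.UNKNOWN
    if value >= error:
        return Status.ERROR
    if value >= warn:
        return Status.WARNING
    return Status.OK
```

A check would typically print a one-line summary and then exit with the numeric code, for example by passing `int(classify(usage, warn=80.0, error=90.0))` to `sys.exit`.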

Subject

Nagios
