How To Monitor Network Switch and Ports Using Nagios

by Ramesh Natarajan on November 3, 2008

Nagios is hands-down the best tool for monitoring hosts and network equipment. Using Nagios plugins, you can monitor pretty much anything.

I use Nagios intensively, and it gives me peace of mind knowing that I will get an alert on my phone when there is a problem. More than that, if warning levels are set up properly, Nagios will proactively alert you before a problem becomes critical.

Earlier, I wrote about how to set up Nagios to monitor a Linux host, a Windows host, and a VPN device.

In this article, I’ll explain how to configure Nagios to monitor a network switch and its active ports.

1. Enable switch.cfg in nagios.cfg

Uncomment the switch.cfg line in /usr/local/nagios/etc/nagios.cfg as shown below.

[nagios-server]# grep switch.cfg /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/switch.cfg
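
If the line is still commented out, it can be enabled with a small sed edit instead of opening the file by hand. The following is a minimal sketch, assuming the line appears in nagios.cfg as #cfg_file=.../switch.cfg (as in the stock sample configuration). The function name enable_switch_cfg is made up for this example.

```shell
# Hypothetical helper: uncomment the switch.cfg line in a Nagios main config file.
# Assumes the line is present and commented out as "#cfg_file=.../switch.cfg".
enable_switch_cfg() {
    cfg="$1"
    cp "$cfg" "$cfg.bak"    # keep a backup copy next to the original
    # Strip the leading "#" only from the cfg_file line that ends in /switch.cfg
    sed 's|^#[[:space:]]*\(cfg_file=.*/switch\.cfg\)[[:space:]]*$|\1|' "$cfg.bak" > "$cfg"
}

# Usage (path from this article):
# enable_switch_cfg /usr/local/nagios/etc/nagios.cfg
```

Running the grep command shown above afterwards should print the line without a leading #.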

2. Add new hostgroup for switches in switch.cfg

Add the following switches hostgroup to the /usr/local/nagios/etc/objects/switch.cfg file.

define hostgroup{
hostgroup_name  switches
alias           Network Switches
}

3. Add a new host for the switch to be monitored

In this example, I’ve defined a host to monitor the core switch in the /usr/local/nagios/etc/objects/switch.cfg file. Change the address directive to the IP address of your switch.

define host{
use             generic-switch
host_name       core-switch
alias           Cisco Core Switch
address         192.168.1.50
hostgroups      switches
}

4. Add common services for all switches

Displaying the uptime of the switch and verifying whether the switch is alive are services common to all switches. So, define these services under the switches hostgroup_name as shown below.

# Service definition to ping the switch using check_ping
define service{
use                     generic-service
hostgroup_name          switches
service_description     PING
check_command           check_ping!200.0,20%!600.0,60%
normal_check_interval   5
retry_check_interval    1
}

# Service definition to monitor switch uptime using check_snmp
define service{
use                     generic-service
hostgroup_name          switches
service_description     Uptime
check_command           check_snmp!-C public -o sysUpTime.0
}

5. Add service to monitor port bandwidth usage

check_local_mrtgtraf uses the Multi Router Traffic Grapher (MRTG). So, you need to install MRTG for this to work properly. The *.log file mentioned below should point to the MRTG log file on your system.

define service{
use                     generic-service
host_name               core-switch
service_description     Port 1 Bandwidth Usage
check_command           check_local_mrtgtraf!/var/lib/mrtg/192.168.1.11_1.log!AVG!1000000,2000000!5000000,5000000!10
}
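
To see what the values separated by ! in the check_command mean, it can help to expand them into the plugin command line by hand. The sketch below does that under two assumptions: that check_local_mrtgtraf is defined as in the stock sample commands.cfg ($USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$), and that $USER1$ points to /usr/local/nagios/libexec. The function name expand_mrtgtraf is hypothetical.

```shell
# Hypothetical helper: print the plugin command line that a
# check_local_mrtgtraf check_command would expand to.
# Assumed command definition (stock sample commands.cfg):
#   $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$
# Assumed plugin directory ($USER1$): /usr/local/nagios/libexec
expand_mrtgtraf() (
    # The function body runs in a subshell, so IFS and -f do not leak out.
    IFS='!'
    set -f              # no filename globbing while splitting
    set -- $1           # split "name!arg1!arg2!..." on "!"
    echo "/usr/local/nagios/libexec/check_mrtgtraf -F $2 -a $3 -w $4 -c $5 -e $6"
)

# Usage:
# expand_mrtgtraf 'check_local_mrtgtraf!/var/lib/mrtg/192.168.1.11_1.log!AVG!1000000,2000000!5000000,5000000!10'
```

Read this way, the first argument is the MRTG log file, AVG selects average values, the two number pairs are the warning and critical thresholds for incoming and outgoing traffic, and the last value is the age in minutes after which the log is considered stale. Running the printed command by hand as the nagios user is a quick way to test the check outside of Nagios.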

6. Add service to monitor an active switch port

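Use check_snmp to monitor a specific port as shown below. The following two services monitor port 1 and port 5. To monitor additional ports, change the n in ifOperStatus.n accordingly, where n is the SNMP interface index of the port. On many switches this index matches the port number, but not on all of them, so check the interface list of your switch if a check reports on the wrong port.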

# Monitor status of port number 1 on the Cisco core switch
define service{
use                  generic-service
host_name            core-switch
service_description  Port 1 Link Status
check_command        check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}

# Monitor status of port number 5 on the Cisco core switch
define service{
use                  generic-service
host_name            core-switch
service_description  Port 5 Link Status
check_command        check_snmp!-C public -o ifOperStatus.5 -r 1 -m RFC1213-MIB
}
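
Since the service definitions differ only in the port number, they can be generated instead of copied by hand. The following is a minimal sketch that prints the same definition as above for each number passed in. The function name gen_port_services is made up for this example, and the community string public is taken from the definitions above.

```shell
# Hypothetical helper: print one "Port N Link Status" service definition
# per port number, using the same check_command as in this article.
gen_port_services() {
    host="$1"
    shift
    for port in "$@"; do
        printf 'define service{\n'
        printf 'use                  generic-service\n'
        printf 'host_name            %s\n' "$host"
        printf 'service_description  Port %s Link Status\n' "$port"
        printf 'check_command        check_snmp!-C public -o ifOperStatus.%s -r 1 -m RFC1213-MIB\n' "$port"
        printf '}\n\n'
    done
}

# Usage: review the output first, then append it to switch.cfg
# gen_port_services core-switch 2 3 4
# gen_port_services core-switch 2 3 4 >> /usr/local/nagios/etc/objects/switch.cfg
```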

7. Add services to monitor multiple switch ports together

Sometimes you may need to monitor the status of multiple ports together, i.e. Nagios should send you an alert even if only one of the ports is down. In this case, define the following service to monitor multiple ports.

# Monitor ports 1 - 6 on the Cisco core switch.
define service{
use                   generic-service
host_name             core-switch
service_description   Ports 1-6 Link Status
check_command         check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB, -o ifOperStatus.2 -r 1 -m RFC1213-MIB, -o ifOperStatus.3 -r 1 -m RFC1213-MIB, -o ifOperStatus.4 -r 1 -m RFC1213-MIB, -o ifOperStatus.5 -r 1 -m RFC1213-MIB, -o ifOperStatus.6 -r 1 -m RFC1213-MIB
}

8. Validate configuration and restart nagios

Verify the Nagios configuration to make sure there are no warnings or errors.

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check

Restart the nagios server to start monitoring the switch.

# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.

# /etc/rc.d/init.d/nagios start
Starting nagios: done.
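
The verify and restart steps can also be chained so that Nagios is restarted only when the configuration check passes. This is a sketch, assuming that nagios -v exits with a non-zero status when the pre-flight check finds errors. The function name restart_if_valid and the NAGIOS_BIN, NAGIOS_CFG and NAGIOS_INIT variables are hypothetical conveniences for overriding the paths used in this article.

```shell
# Restart Nagios only if the configuration check passes.
# Paths default to the ones used in this article; the NAGIOS_* variables
# are hypothetical overrides, not something Nagios itself reads.
restart_if_valid() {
    nagios_bin="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
    nagios_cfg="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
    nagios_init="${NAGIOS_INIT:-/etc/rc.d/init.d/nagios}"

    # Assumption: "nagios -v" returns non-zero when it detects errors.
    if "$nagios_bin" -v "$nagios_cfg" > /dev/null 2>&1; then
        "$nagios_init" stop
        "$nagios_init" start
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}

# Usage (as root):
# restart_if_valid
```

Note that this only guards against errors. Warnings do not stop the restart, so it is still worth reading the full output of the verify command after larger changes.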

Verify the status of the switch from the Nagios web UI: http://{nagios-server}/nagios as shown below:

Fig: Nagios GUI displaying status of a Network Switch

9. Troubleshooting

Issue 1: Nagios GUI displays the “check_mrtgtraf: Unable to open MRTG log file” error message for the Port Bandwidth Usage service.

Solution 1: Make sure the *.log file defined in the check_local_mrtgtraf service points to the correct location.

Issue 2: Nagios GUI displays the “Return code of 127 is out of bounds – plugin may be missing” error message for Port Link Status.

Solution 2: Make sure both the net-snmp and net-snmp-utils packages are installed. In my case, I was missing the net-snmp-utils package, and installing it resolved this issue as shown below.

[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2

[nagios-server]# rpm -ivh net-snmp-utils-5.1.2-11.EL4.10.i386.rpm
Preparing...       ########################################### [100%]
1:net-snmp-utils   ########################################### [100%]

[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2
net-snmp-utils-5.1.2-11.EL4.10

Note: After you’ve installed net-snmp and net-snmp-utils, re-compile and re-install nagios plugins as explained in “6. Compile and install nagios plugins” in the Nagios 3.0 jumpstart guide.
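
Return code 127 is what a shell reports when a command cannot be found, so a quick first check is whether the plugin file exists and is executable. The following is a minimal sketch, assuming the plugins are installed under /usr/local/nagios/libexec. The function name check_plugin is made up for this example.

```shell
# Hypothetical helper: report whether a Nagios plugin file is present
# and executable. Defaults to check_snmp in the assumed plugin directory.
check_plugin() {
    plugin="${1:-/usr/local/nagios/libexec/check_snmp}"
    if [ -x "$plugin" ]; then
        echo "OK: $plugin is present and executable"
    else
        echo "MISSING: $plugin not found or not executable"
        return 1
    fi
}

# Usage:
# check_plugin
# check_plugin /usr/local/nagios/libexec/check_mrtgtraf
#
# If the plugin is there, run it by hand against the switch from this article:
# /usr/local/nagios/libexec/check_snmp -H 192.168.1.50 -C public -o sysUpTime.0
```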


27 comments

1 Prune November 26, 2008 at 4:44 am

You should maybe consider graphing tools like cacti or big brother for your network usage. Nagios is mainly targeted as a status monitoring application.

2 Bryan December 10, 2008 at 8:26 pm

Nagios is a really good status monitoring system, but it is better to install Cacti to graph your devices. The main advantage of Cacti is its simple but powerful usage.

3 G S Kharel February 24, 2009 at 1:03 am

hello every one

I want to find the health of all the routers and switches connected to my organization’s LAN using Java. Could you help me with this topic?

4 Jeremy April 3, 2009 at 1:30 pm

What if there are 2 switches in a stack? How do I monitor the ports on the 2nd switch on the stack?

5 Ramesh April 11, 2009 at 12:55 am

@Prune, @Bryan,

Thanks for the information about Cacti – The complete RRDtool based graphing solution. I’ll check it out.
 

@Jeremy,

If you are using Cisco switches, check out the check_cisco.pl Nagios plugin found on Nagios Exchange, which seems to monitor ports that are part of multiple stacked switches.

6 salaz July 9, 2009 at 2:56 am

Hello,

How about if I want to monitor network bandwidth usage within a building? Where should I place the Nagios server? After the firewall?

7 Cameron September 14, 2009 at 10:12 pm

Hi Guys,

I’m interested in monitoring a switch stack (echoing Jeremy’s question above). The cisco plugin that Ramesh linked to looks good, but I’m using Enterasys equipment.

Is anyone aware of any ways to monitor switch stacks via snmp?

Cheers,
Cameron

8 Bronislav Kaminsky October 29, 2009 at 4:32 am

Hi guys!
Does anyone have an idea for making something like subservices in Nagios? E.g. I need to monitor the throughput and the status of each port of the switch. I wrote a small script using the snmpwalk command that gives a list of ports and their status (or anything that snmpwalk shows) as a result. But I can’t imagine how to make Nagios understand it in this way: host -> service (my script) -> port (as a result of the previous) -> something I choose from snmpwalk
Thanks

9 Norman Paterson November 30, 2009 at 9:38 am

Where is the mapping between snmp port numbers and switch port numbers defined? It’s simple enough for the first 24 ports, but my 2950 then has two more ports labelled 1 and 2 (ie duplicating the first 2 port numbers). If I use snmpwalk to see what ports are being monitored, I get 28 in all:

# snmpwalk -v1 -c public my_switch -m ALL .1 | grep ifOperStatus
RFC1213-MIB::ifOperStatus.1 = INTEGER: down(2)
RFC1213-MIB::ifOperStatus.2 = INTEGER: down(2)
RFC1213-MIB::ifOperStatus.3 = INTEGER: down(2)

RFC1213-MIB::ifOperStatus.26 = INTEGER: down(2)
RFC1213-MIB::ifOperStatus.27 = INTEGER: up(1)
RFC1213-MIB::ifOperStatus.28 = INTEGER: up(1)

I’ve not been able to find this information from the usual sources (ie Google, Cisco).

10 sawsen December 1, 2009 at 5:37 am

Hi,
I always get: check_mrtgtraf: Unable to open MRTG log file
In the configuration of the switch I have:
check_command check_local_mrtgtraf!/var/lib/mrtg/192.168.1.253_1.log!AVG!1000000,1000000!5000000,5000000!10
but in /var/lib/mrtg I don’t have a file 192.168.1.253_1.log
Can someone help me?
Thank you in advance

11 baggage April 23, 2010 at 3:55 am

Can you take a look at this, please?
Or Can somebody tell me how to add a switch monitoring on nagios3 without centreon interface?
Actually, I mean about the services since i’ve added the switch. =)
But I would like to monitor it’s Bandwidth Usage and CPU load/stats.
I did sth like the following :

- Bandwidth/traffic :
Error message is : Port 1 Bandwidth Usage CRITICAL(Return code of 139 is out of bounds)

define service{
use generic-service
host_name switch1
service_description Port 1 Bandwidth Usage
check_command check_mrtg!10.4.126.234_1.log!1!1000000!1500000!In!Bytes/Sec
# check_command traffic_average!10.4.126.234 10 !AVG 1000000 2000000 5000000 5000000
}

along with the following given command in /etc/nagios-plugins/config/mrtg.cfg :

# 'check_mrtg' command definition
define command{
command_name check_mrtg
command_line /usr/lib/nagios/plugins/check_mrtg '$ARG1$' 10 AVG '$ARG2$' '$ARG3$' '$ARG4$' '$ARG5$' '$ARG6$'
}

# 'traffic_average' command definition
define command{
command_name traffic_average
command_line /usr/lib/nagios/plugins/check_mrtgtraf '$ARG1$' 10 AVG '$ARG2$' '$ARG3$' '$ARG4$' '$ARG5$'
}

- CPU stats :
Error message : Port2CPUstats snmp_cpustat -> UNKNOWN : External command error: Cannot find module (”): At line 1 in (none)

define service{
hostgroup_name Switches
use generic-service
service_description Port2CPUstats snmp_cpustats
check_command snmp_cpustats!public!50
}

along with the following given command definition in /etc/nagios-plugins/config/snmp.cfg:

# 'snmp_cpustats' command definition
define command{
command_name snmp_cpustats
command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o .1.3.6.1.4.1.2021.11.9.0,.1.3.6.1.4.1.2021.11.10.0,.1.3.6.1.4.1.2021.11.11.0 -l 'CPU usage (user system idle)' -u '%'
}

For instance, I really abandon centreon matter because it’s a kinda “urgent” project for me. :s
for mrtg, do we need to install it? That’s kinda what i got from googling. :/

Thanks,

kpwstarina@yahoo.com

12 David Merrick June 28, 2010 at 5:50 pm

@baggage and @sawsen–
I’m no expert, and am still trying to get mrtg working myself, but most tutorials on how to install mrtg have you put the log files in /var/www/mrtg.

Instead of:
check_local_mrtgtraf!/var/lib/mrtg/192.168.1.11_1.log!AVG!1000000,2000000!5000000,5000000!10

You might try:
check_local_mrtgtraf!/var/www/mrtg/192.168.1.11_1.log!AVG!1000000,2000000!5000000,5000000!10

(Notice the /var/www instead of /var/lib)
~David

13 Vaibhav Mishra September 23, 2010 at 2:05 am

Hi Experts,

Can I request for some suggestion for my query as below? I have NAGIOS 3.2 installed on Red Hat Enterprise Linux AS release 4. When I try to monitor traffic on a particular port of my router (router support SNMP), I get Return code of 255 is out of bounds.

I learnt that my OS should have the Net-SNMP agent installed. When I checked, I found that the Net-SNMP agent is installed, as you can see in the output below.

rpm -qa | grep net-snmp
net-snmp-devel-5.1.2-11.EL4.6
net-snmp-5.1.2-11.EL4.6
net-snmp-utils-5.1.2-11.EL4.6
net-snmp-perl-5.1.2-11.EL4.6
net-snmp-libs-5.1.2-11.EL4.6

Can someone tell me where I am making a mistake, or whether the Net-SNMP agent needs some customization?

Your kind help would be appreciated.

Regards,

Vaibhav

14 Jay September 27, 2010 at 12:34 pm

Hoping someone can help me here. I am currently running Nagios Core Version 3.2.1 and am having an issue monitoring ports on my Cisco 3750 switches. Currently I have all of my switches setup within a host group and can see all other data with the exception of port status. I followed the directions from this site on how to set it up, but to no avail. For instance, I keep getting the error “External command error: Error in packet” and it shows a status of UNKNOWN. What’s strange is that I also have a monitor setup on port 1 (Port 1 Link Status) and it shows as green, but yet the port is in down/down status within the switch. Here is my syntax:

check_snmp! -C -o ifOperstatus.4 -r 1 -m RFC1213-MIB

Any help would be greatly appreciated.

-Jay

15 ramesh December 24, 2010 at 1:35 am

hi experts,

Hi, I am configuring Nagios monitoring, but I’m confused about the web access.
Can you please suggest how it should be set up?

regards,
ramesh.

16 PT May 11, 2011 at 4:13 pm

Is it possible to use check_mrtgtraf if the server running mrtg is a windows server? In my environment I have mrtg running on a windows 2000 box.

17 Chitti Prasad December 24, 2011 at 2:01 am

In steps #6 and #7, to monitor switch ports, the port number given in the service command, i.e. the 1 in “ifOperStatus.1”, is not always the switch port number. The port numbers differ between switches. For example, on a Cisco 2960 Gigabit switch, they start at 10101 for the first port. It is better to run an snmpwalk command on the switch with “ifDescr” as the OID, as below for a switch with IP address 172.16.16.1:
# snmpwalk -v 1 -c public 172.16.16.1 ifDescr
IF-MIB::ifDescr.1 = STRING: Vlan1
IF-MIB::ifDescr.10101 = STRING: GigabitEthernet0/1
IF-MIB::ifDescr.10102 = STRING: GigabitEthernet0/2
IF-MIB::ifDescr.10103 = STRING: GigabitEthernet0/3
IF-MIB::ifDescr.10104 = STRING: GigabitEthernet0/4
IF-MIB::ifDescr.10105 = STRING: GigabitEthernet0/5
.
.
.
The number following “ifDescr” in the output above is the actual switch port number. As displayed in the first line, “ifDescr.1” denotes “Vlan1”, which can be taken as the whole switch itself. Thus it can be used to get the status of the switch as a whole!

18 Francesco Pucci March 9, 2012 at 5:50 am

I did not find plugin check_snmp into my Nagios installation (Ubuntu 10.04 – Nagios Core 3.2.3).
Searching Internet I read that I have to install the service snmp and then recompile the Nagios plugin; but when I launched the make command I received the following errors:

nagios-plugins-1.4.15/plugins/check_http.c:785: undefined reference to `np_net_ssl_write'
check_http.o(.text+0x120b):/home/machielr/nagios-plugins-1.4.11/plugins/check_http.c:789: undefined reference to `np_net_ssl_read'
check_http.o(.text+0x12bd):/home/machielr/nagios-plugins-1.4.15/plugins/check_http.c:828: undefined reference to `np_net_ssl_cleanup'
check_http.o(.text+0x14f4):/home/machielr/nagios-plugins-1.4.15/plugins/check_http.c:734: undefined reference to `np_net_ssl_init'
check_http.o(.text+0x1513):/home/machielr/nagios-plugins-1.4.15/plugins/check_http.c:736: undefined reference to `np_net_ssl_check_cert'
check_http.o(.text+0x151a):/home/machielr/nagios-plugins-1.4.15/plugins/check_http.c:737: undefined reference to `np_net_ssl_cleanup'

I resolved this by cleaning the previous build before recompiling the plugins:

make distclean
./configure --with-nagios-user=nagios --with-nagios-group=nagios --with-openssl
make all
make install

Now, I find the check_snmp plugin under the folder /usr/local/nagios/libexec and all works well !

19 Cooltechie March 22, 2012 at 3:11 pm

Although the information on this page is very useful I still have trouble understanding why the Author will call it Monitoring “Ports” when in reality the only thing this setting is doing is monitoring “Interfaces” (NICs)
If I have an HP Switch with 24 “ports” these ports will never be monitored by this writings, only the Interface with an IP address will, the 24 ports will not.
I wish to find something that will monitor bandwidth at each port level, in order for me to know how much bandwidth a particular port in the switch is pulling at any time.

20 Gaurav September 15, 2012 at 8:38 pm

Hi ,
I have to implement cpu utilization for pktr device so I download the plugin check_cpu and did following :

1) Put check_cpu into your Nagios libexec directory
(e.g. /usr/local/Nagios/libexec, or wherever Nagios is installed on your server)

2) Create the symbolic links, being careful not to overwrite any files by the same name already there:
ln -s check_cpu check_load
ln -s check_cpu check_ram
ln -s check_cpu check_swap
(these symlinks dictate how the script is run, as it can check CPU, RAM, load and swap, depending on how it is invoked)

3) Edit your Nagios commands.cfg file and add some entries like this:

# ‘check_cpu’ command definition
# w = Warning level (if CPU % idle falls below this level – must be a percentage)
# c = Critical level
define command{
command_name check_cpu
command_line $USER1$/check_cpu -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p $USER3$
}

# ‘check_swap’ command definition
# w = Warning level (if % or MB swap free drops below this level)
# c = Critical level
define command{
command_name check_swap
command_line $USER1$/check_swap -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p $USER3$
}

# ‘check_ram’ command definition
# w = Warning level defined as percentage (with % sign) or in megabytes (without % sign)
# c = Critical level (can be defined as percentage or MB independently of warning)
# o = below|above --> if mem [free drops BELOW] or [used rises ABOVE] thresholds
define command{
command_name check_ram
command_line $USER1$/check_ram -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p $USER3$ -o $ARG3$
}

# ‘check_load’ command definition
# w = Warning levels (alert if n,n,n (1,5,15 minute load averages) go above these levels)
# c = Critical levels
define command{
command_name check_load
command_line $USER1$/check_load -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p $USER3$
}
and then in the switch.cfg file I entered the following:
define host{
use generic-switch
host_name atx-pktr
alias atx-pktr
address xx.xx.xx.xxx
hostgroups dti
}

define service{
use generic-service
host_name atx-pktr
service_description PING
check_command check_ping!700.0,40%!1200.0,70%
normal_check_interval 5
retry_check_interval 1
}

define service{
use generic-service ; Inherit values from a template
host_name atx-pktr
service_description Uptime
check_command check_snmp!-C encrypted -o sysUpTime.0
}

define service{
use generic-service ; Inherit values from a template
host_name atx-pktr
service_description cpu
check_command check_cpu!10%!20%
}

But when I go to Nagios and search for this device, it is not giving me anything for CPU utilization. Can anyone please help? This is really urgent for my project.

Output on Nagios page:

Service Status Details For Host ‘atx-pktr’

Host: atx-pktr

Service  Status   Last Check           Duration         Attempt  Status Information
PING     OK       09-15-2012 19:33:09  16d 20h 47m 26s  1/3      PING OK – Packet loss = 0%, RTA = 63.53 ms
Uptime   OK       09-15-2012 19:33:06  17d 7h 59m 36s   1/3      SNMP OK – Timeticks: (1417832615) 164 days, 2:25:26.15
cpu      UNKNOWN  09-15-2012 19:28:49  0d 1h 41m 0s     3/3      CPU ERROR: SNMP Connection Problem

Regards
Gaurav

21 Rajendra December 23, 2012 at 11:03 am

Hi All,

The information is worthwhile and useful. I have installed Nagios 3.2 in my project.

I am able to monitor the port status and traffic utilization of the ports on the Cisco switches.

Now I would like to know how to configure monitoring of CPU utilization on Cisco switches.

It would be great if you could provide a configuration example.

Thank you in advance

Best Regards,
Rajendra

22 Roth January 12, 2013 at 2:13 pm

Hi there,
I am not sure how could I know the OID from my switch port. I would like to monitor my switch port from nagios.
Anyone could help me?

Thanks in advance,
Roth

23 Ben Mobabbe February 17, 2013 at 12:44 pm

Hi,
you should give this plugin a try.

check_nwc_health not only can be used in a multi-vendor environment (Cisco, HP, F5, Juniper,…), it brings together many features in one single plugin.
--mode uptime
--mode interface-usage // monitor _all_ interfaces
--mode interface-usage --name // monitor _one_ specific interface
--mode interface-usage --name --regexp // monitor interfaces which match
--mode cpu-usage
--mode hardware-health
--mode memory-usage
…and many more

24 tonmoy May 20, 2013 at 11:12 pm

not getting my switchport snmp reply…status unknown.any help

25 vijayscsa July 31, 2013 at 1:04 am

Hi ..!

Right now, i’m monitoring a device through device for all services it is giving the following error.
“Return code of 127 is out of bounds – plugin may be missing”, request your suggestions on how to fix the same.

Thanks.

26 Manish Jagtap August 12, 2013 at 12:26 am

Hi,

How do I monitor an SNMP table in Nagios? Here, the row count would be dynamic.

27 Muharrem Aydin June 24, 2014 at 4:10 am

Hi,

I want to do this with SNMPv3. Are there any examples of this? I want to use only a user name and password.
