August 27, 2004
OpenNMS .III.
Monitoring Squid via ONMS
Khoo Kah Jin
v1.3, August 2004
Abstract:
This walkthrough shows how to configure ONMS to monitor Squid’s utilization.
1. We’ll begin by determining whether or not squid was compiled with the --enable-snmp flag.
# squid -v
Squid Cache: Version 2.5.STABLE3
configure options: --host=i386-redhat-linux --build=i386-redhat-linux --target=
i386-redhat-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindi
r=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --included
ir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --localstatedir=/var
--sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --e
xec_prefix=/usr --bindir=/usr/sbin --libexecdir=/usr/lib/squid --localstatedir=/
var --sysconfdir=/etc/squid --enable-poll --enable-snmp --enable-removal-policie
s=heap,lru --enable-storeio=aufs,coss,diskd,null,ufs --enable-ssl --with-openssl
=/usr/kerberos --enable-delay-pools --enable-linux-netfilter --with-pthreads --e
nable-basic-auth-helpers=LDAP,NCSA,PAM,SMB,SASL,MSNT,winbind --enable-ntlm-auth-
helpers=SMB,winbind,fakeauth --enable-external-acl-helpers=ip_user,ldap_group,un
ix_group,wbinfo_group,winbind_group --enable-auth=basic,ntlm --enable-useragent-
log --enable-referer-log
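The check above can also be scripted. The following is a minimal Python sketch, not part of the original walkthrough; the function name `has_configure_flag` is hypothetical, and it assumes you pass it the captured text of `squid -v`. Because the terminal hard-wraps the long option line, the sketch joins the lines before searching.

```python
import re


def has_configure_flag(version_output: str, flag: str = "--enable-snmp") -> bool:
    """Return True if `flag` appears in the text printed by `squid -v`.

    Hypothetical helper: it only inspects text, it does not run squid.
    """
    # The option list is hard-wrapped by the terminal, so a flag may be
    # split across two lines; joining without a separator undoes the wrap.
    joined = "".join(version_output.splitlines())
    # Reject longer options that merely start with the flag
    # (for example --enable-snmp-something).
    pattern = re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, joined) is not None


if __name__ == "__main__":
    sample = "configure options: --enable-poll --enable-sn\nmp --enable-ssl"
    print(has_configure_flag(sample))
```

To use it against a live system, capture the output of `squid -v` yourself (for instance with `subprocess.run`) and pass the text in.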
Once this has been confirmed, proceed by editing the necessary options in /etc/squid/squid.conf.
2. We’ll need to specify a community string:
acl snmp182 snmp_community public
The port on which squid will listen for snmp queries:
snmp_port 3401
Don’t forget to allow snmp queries from the acl you created:
snmp_access allow snmp182 all
Specify the address on which squid accepts incoming snmp messages (0.0.0.0 means all interfaces):
snmp_incoming_address 0.0.0.0
And finally the address from which replies are sent (255.255.255.255 tells squid to reply from the same socket that received the query):
snmp_outgoing_address 255.255.255.255
Keep in mind the following note, taken from squid’s config file:
“NOTE, snmp_incoming_address and snmp_outgoing_address can not have
same value since they both use port 3401.”
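The settings in this step can be sanity-checked before restarting squid. This is a rough Python sketch under stated assumptions: the helper names `snmp_directives` and `snmp_problems` are made up for illustration, the parser only understands simple one-line directives, and the three checks are just the ones mentioned above (a port, an allow rule, and differing incoming and outgoing addresses).

```python
def snmp_directives(conf_text: str) -> dict:
    """Collect every snmp_* directive from squid.conf text (hypothetical helper)."""
    directives = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if name.startswith("snmp_"):
            directives.setdefault(name, []).append(value)
    return directives


def snmp_problems(directives: dict) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if not directives.get("snmp_port"):
        problems.append("snmp_port is not set")
    if not any(v.startswith("allow ") for v in directives.get("snmp_access", [])):
        problems.append("no 'snmp_access allow' rule")
    incoming = directives.get("snmp_incoming_address", [])
    outgoing = directives.get("snmp_outgoing_address", [])
    # The squid.conf note quoted above: both use the same port, so they must differ.
    if incoming and outgoing and incoming[-1] == outgoing[-1]:
        problems.append("snmp_incoming_address and snmp_outgoing_address are identical")
    return problems


if __name__ == "__main__":
    conf = """
    acl snmp182 snmp_community public
    snmp_port 3401
    snmp_access allow snmp182 all
    snmp_incoming_address 0.0.0.0
    snmp_outgoing_address 255.255.255.255
    """
    print(snmp_problems(snmp_directives(conf)))
```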
Restart squid. Just to make sure it works, run an snmpwalk:
# snmpwalk -m /tmp/squid.mib -v 1 -c public localhost:3401 1.3.6.1.4.1.3495.1
You should see some results here.
3. Now that we are able to query squid’s mib, it’s time to include the respective oids in datacollection-config.xml. Start by running this script:
/opt/OpenNMS/contrib/mibparser/dist/parseMib.sh /etc/squid/mib.txt
Copy the results into datacollection-config.xml as a group, like so:
<group name="squid-stat" ifType="ignore">
<!-- add the oids in here -->
</group>
Add the following oids; not all of the objects in squid’s mib are compatible with ONMS:
<mibObj oid=".1.3.6.1.4.1.3495.1.1.1" instance="0" alias="cacheSysVMsize" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.1.2" instance="0" alias="cacheSysStorage" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.1.3" instance="0" alias="cacheUptime" type="TimeTicks" />
<mibObj oid=".1.3.6.1.4.1.3495.1.2.5.1" instance="0" alias="cacheMemMaxSize" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.2.5.2" instance="0" alias="cacheSwapMaxSize" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.2.5.3" instance="0" alias="cacheSwapHighWM" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.2.5.4" instance="0" alias="cacheSwapLowWM" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.1" instance="0" alias="cacheSysPageFaults" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.3" instance="0" alias="cacheMemUsage" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.4" instance="0" alias="cacheCpuTime" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.5" instance="0" alias="cacheCpuUsage" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.6" instance="0" alias="cacheMaxResSize" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.7" instance="0" alias="cacheNumObjCount" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.8" instance="0" alias="cacheCurrentLRUExpirationTOOLONG" type="TimeTicks" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.9" instance="0" alias="cacheCurrentUnlinkRequestsTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.10" instance="0" alias="cacheCurrentUnusedFDescrCntTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.1.11" instance="0" alias="cacheCurrentResFileDescrCntTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.1" instance="0" alias="cacheProtoClientHttpRequestsTOOLONG" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.2" instance="0" alias="cacheHttpHits" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.3" instance="0" alias="cacheHttpErrors" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.4" instance="0" alias="cacheHttpInKb" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.5" instance="0" alias="cacheHttpOutKb" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.6" instance="0" alias="cacheIcpPktsSent" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.7" instance="0" alias="cacheIcpPktsRecv" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.8" instance="0" alias="cacheIcpKbSent" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.9" instance="0" alias="cacheIcpKbRecv" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.10" instance="0" alias="cacheServerRequests" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.11" instance="0" alias="cacheServerErrors" type="Integer32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.12" instance="0" alias="cacheServerInKb" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.13" instance="0" alias="cacheServerOutKb" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.14" instance="0" alias="cacheCurrentSwapSizeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.3.2.1.15" instance="0" alias="cacheClients" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.4.2.1" instance="0" alias="cacheFqdnEntries" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.4.2.2" instance="0" alias="cacheFqdnRequests" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.4.2.3" instance="0" alias="cacheFqdnHits" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.4.2.4" instance="0" alias="cacheFqdnPendingHitsTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.4.2.5" instance="0" alias="cacheFqdnNegativeHitsTOOLONG" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.4.2.6" instance="0" alias="cacheFqdnMisses" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.4.3.1" instance="0" alias="cacheDnsRequests" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.4.3.2" instance="0" alias="cacheDnsReplies" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.3495.1.4.3.3" instance="0" alias="cacheDnsNumberServersTOOLONG" type="Counter32" />
Remember to include the group in the respective system definition. (Note: a suitable system definition might already exist, so look for one before defining your own.)
4. At this point, we have not yet defined how ONMS is supposed to query Squid at port 3401. Thankfully, net-snmp has a solution for this: we can get net-snmp to proxy requests for Squid. Edit snmpd.conf and add the following line:
proxy -m /tmp/squid.mib -c public -v 1 localhost:3401 .1.3.6.1.4.1.3495.1
This line makes snmpd forward any request for an OID under .1.3.6.1.4.1.3495.1 to squid. Perform another snmpwalk, this time without specifying port 3401:
# snmpwalk -m /tmp/squid.mib -v 1 -c public localhost 1.3.6.1.4.1.3495.1
5. If you have gotten this far, it might be a good idea to check whether or not rrds have been created in the node id’s directory. If they have, you’re all set to define those graphs. If they’re not there, read on.
6. First up, check your logs. Pay special attention to collectd.log because that’s where all your clues are. It might be a good idea to grep for the IP address of the monitored server.
2004-08-19 17:29:27,213 DEBUG [CollectdScheduler-5 Pool-fiber1] SnmpCollector: Failed to retrieve interface count from remote host 134.211.22.72
2004-08-19 17:29:27,463 DEBUG [CollectdScheduler-5 Pool-fiber1] CollectableService: run: change in collection status, generating event.
2004-08-19 17:29:27,464 DEBUG [CollectdScheduler-5 Pool-fiber1] CollectableService: sendEvent: Sent event uei.opennms.org/nodes/dataCollectionFailed for 37/134.211.22.72/SNMP
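As an alternative to grep, the same filtering can be sketched in Python. This is only an illustration: the function name `failures_for_host` is hypothetical, and the two keywords it looks for are simply the ones that appear in the excerpt above, not an exhaustive list of collectd failure messages.

```python
import re


def failures_for_host(log_text: str, ip: str) -> list:
    """Return collectd.log lines that mention `ip` and look like a failure."""
    # Guard the address with digit boundaries so that 134.211.22.7
    # does not accidentally match 134.211.22.72.
    host = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\d)")
    keywords = ("Failed", "dataCollectionFailed")
    return [
        line
        for line in log_text.splitlines()
        if host.search(line) and any(word in line for word in keywords)
    ]


if __name__ == "__main__":
    sample = (
        "2004-08-19 17:29:27,213 DEBUG [CollectdScheduler-5 Pool-fiber1] SnmpCollector: "
        "Failed to retrieve interface count from remote host 134.211.22.72\n"
        "2004-08-19 17:29:27,463 DEBUG [CollectdScheduler-5 Pool-fiber1] CollectableService: "
        "run: change in collection status, generating event.\n"
    )
    for line in failures_for_host(sample, "134.211.22.72"):
        print(line)
```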
The collectd.log excerpt above is very generic; it doesn’t really tell you what went wrong. Next, check your snmpd service to see if it’s still running. If it’s not, you’ll probably find this message in snmpd.log:
Aug 19 09:46:16 localhost snmpd[16055]: response to proxy request illegal. We're screwed.
Looks like it died. This is where things get interesting. For now, edit your datacollection-config.xml and change the ifType from ‘ignore’ to ‘all’:
<group name="squid-stat" ifType="all">
Restart opennms and snmpd. You will now have your oids in line for collection:
2004-08-18 17:15:35,771 DEBUG [main] SnmpCollector: buildDataSourceList: ds_name: cacheSysVMsize ds_oid: .1.3.6.1.4.1.3495.1.1.1.0 ds_max: U ds_min: U
But the rrds were not generated!
2004-08-18 17:16:35,782 DEBUG [CollectdScheduler-5 Pool-fiber4] SnmpCollector: updateRRDs: Skipping update, no data retrieved for node/ifindex: 14/3 datasource: cacheSysVMsize
Upon closer inspection:
2004-08-18 17:16:35,328 DEBUG [SnmpPortal--1] SnmpIfCollector: getNextSnmpV2Pdu: adding oid to pdu: .1.3.6.1.4.1.3495.1.1.1
2004-08-18 17:16:35,328 DEBUG [SnmpPortal--1] SnmpIfCollector: SnmpCollector.snmpReceivedPdu(): Sending next GETBULK packet.
2004-08-18 17:16:35,329 DEBUG [SnmpPortal--1] SnmpIfCollector: snmpReceivedPdu: got an SNMP pdu, num vars=1
2004-08-18 17:16:35,330 DEBUG [SnmpPortal--1] SnmpIfCollector: snmpReceivedPdu: interface SNMP response arrived. Handling GETBULK response.
Notice how SnmpIfCollector is issuing a GETBULK via snmpv2. This is where our problem is: squid’s snmp module only supports snmpv1, and GETBULK does not exist in snmpv1. So in this case, we have to reconfigure snmp-config.xml and add the following definition:
<definition version="v1">
<specific>134.211.22.72</specific>
</definition>
This forces collection via snmpv1, which solves the issue. Now it’s time to see if it worked:
2004-08-19 18:45:19,301 DEBUG [CollectdScheduler-5 Pool-fiber0] SnmpCollector: createRRD: rrd path and file name to create: /var/opennms/rrd/snmp/64/cacheSysVMsize.rrd
2004-08-19 18:45:19,301 DEBUG [CollectdScheduler-5 Pool-fiber0] SnmpCollector: updateRRDs: Issuing RRD update command: update /var/opennms/rrd/snmp/64/cacheSysVMsize.rrd N:84
And there you have it. If needed, you may want to change your ifType back to ‘ignore’.
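To double-check the snmp-config.xml change from this step, a short Python sketch can report which SNMP version is pinned for a host. The function name `snmp_version_for` is hypothetical, the bare `<snmp-config>` wrapper in the sample is an assumption about the surrounding file, and only `<specific>` entries are examined (ranges are ignored).

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{uri}definition', if present."""
    return tag.rsplit("}", 1)[-1]


def snmp_version_for(config_xml: str, ip: str):
    """Return the version attribute of the definition listing `ip`, or None."""
    root = ET.fromstring(config_xml)
    for element in root.iter():
        if _local(element.tag) != "definition":
            continue
        for child in element:
            if _local(child.tag) == "specific" and (child.text or "").strip() == ip:
                return element.get("version")
    return None


if __name__ == "__main__":
    # Assumed minimal wrapper around the definition shown in this step.
    sample = """
    <snmp-config>
      <definition version="v1">
        <specific>134.211.22.72</specific>
      </definition>
    </snmp-config>
    """
    print(snmp_version_for(sample, "134.211.22.72"))
```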
7. All you need to do now is to edit snmp-graph.properties and define those graphs. For data sources whose names exceed the 19 character limit, be sure to specify the truncated data source name, like below:
report.cacheCurrentSwapSiz.name=Disk Space used by Squid
report.cacheCurrentSwapSiz.columns=cacheCurrentSwapSiz
report.cacheCurrentSwapSiz.type=node
report.cacheCurrentSwapSiz.command=--title="Disk Space used by Squid" \
DEF:cacheCurrentSwapSiz={rrd1}:cacheCurrentSwapSiz:AVERAGE \
LINE2:cacheCurrentSwapSiz#0000ff:"MB" \
GPRINT:cacheCurrentSwapSiz:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:cacheCurrentSwapSiz:MIN:"Min \\: %8.2lf %s" \
GPRINT:cacheCurrentSwapSiz:MAX:"Max \\: %8.2lf %s\\n"
Failure to do so will result in missing graphs from the SNMP performance report.
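The truncation rule is easy to get wrong by hand, so here is a small Python sketch of it. The names `rrd_ds_name` and `colliding_names` are hypothetical; the 19 character limit is the one stated above, and the collision check is an extra precaution, not something ONMS is claimed to perform.

```python
RRD_DS_NAME_LIMIT = 19  # limit stated in this walkthrough


def rrd_ds_name(alias: str) -> str:
    """Return the data source name to use in snmp-graph.properties."""
    return alias[:RRD_DS_NAME_LIMIT]


def colliding_names(aliases: list) -> dict:
    """Map each truncated name to its aliases when two or more aliases share it."""
    seen = {}
    for alias in aliases:
        seen.setdefault(rrd_ds_name(alias), []).append(alias)
    return {name: group for name, group in seen.items() if len(group) > 1}


if __name__ == "__main__":
    print(rrd_ds_name("cacheCurrentSwapSizeTOOLONG"))
    print(colliding_names(["cacheCurrentUnlinkRequestsTOOLONG",
                           "cacheCurrentUnusedFDescrCntTOOLONG"]))
```

Short aliases such as cacheHttpHits pass through unchanged, so the same helper can be applied to every alias in the group.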
Data sources that are graphable on onms are:
• cacheHttpInKb
• cacheHttpOutKb
• cacheServerOutKb
• cacheServerInKb
• cacheSysVMsize
• cacheUptime
• cacheMemUsage
• cacheCpuUsage
• cacheNumObjCount
• cacheClients
• cacheCurrentSwapSize
• cacheDnsReplies
• cacheDnsRequests
• cacheHttpHits
• cacheHttpErrors
• cacheFqdnEntries
• cacheFqdnRequests
• cacheFqdnHits
• cacheSysStorage
• cacheSysPageFaults
• cacheProtoClientHttpRequests
Posted by kahjin at 03:37 PM
July 30, 2004
OpenNMS .II.
Packaging Thresholds
Khoo Kah Jin
v1.0, July 2004
Abstract:
This walkthrough describes threshold maintenance across a range of nodes with various mapped drives or storage allocation units. For this example, a threshold is assigned to trigger whenever disk usage for a particular logical drive and node exceeds its intended cap.
1. Depending on the logical drive you'd like to set a threshold on, specify an OID with a relevant alias in datacollection-config.xml. Before that, run an snmpwalk against the node in question to check the relevant values, e.g.
# snmpwalk -c public -v 2c 192.168.10.33 |grep "hrStorage*"
HOST-RESOURCES-MIB::hrStorageType.1 = OID: HOST-RESOURCES-TYPES::hrStorageRemovableDisk
HOST-RESOURCES-MIB::hrStorageType.2 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageType.3 = OID: HOST-RESOURCES-TYPES::hrStorageCompactDisc
HOST-RESOURCES-MIB::hrStorageType.4 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageType.5 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageType.6 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageType.7 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageType.8 = OID: HOST-RESOURCES-TYPES::hrStorageVirtualMemory
HOST-RESOURCES-MIB::hrStorageDescr.1 = STRING: A:\
HOST-RESOURCES-MIB::hrStorageDescr.2 = STRING: C:\ Label: Serial Number 58cfaf67
HOST-RESOURCES-MIB::hrStorageDescr.3 = STRING: D:\
HOST-RESOURCES-MIB::hrStorageDescr.4 = STRING: E:\ Label:4096 Serial Number 64889427
HOST-RESOURCES-MIB::hrStorageDescr.5 = STRING: F:\ Label:2048 Serial Number a079525e
HOST-RESOURCES-MIB::hrStorageDescr.6 = STRING: G:\ Label:1024 Serial Number 986bffb0
HOST-RESOURCES-MIB::hrStorageDescr.7 = STRING: H:\ Label:512 Serial Number 3414ec32
HOST-RESOURCES-MIB::hrStorageDescr.8 = STRING: Virtual Memory
HOST-RESOURCES-MIB::hrStorageAllocationUnits.1 = INTEGER: 0 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.2 = INTEGER: 4096 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.3 = INTEGER: 0 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.4 = INTEGER: 4096 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.5 = INTEGER: 2048 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.6 = INTEGER: 1024 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.7 = INTEGER: 512 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.8 = INTEGER: 65536 Bytes
HOST-RESOURCES-MIB::hrStorageSize.1 = INTEGER: 0
HOST-RESOURCES-MIB::hrStorageSize.2 = INTEGER: 1281175
HOST-RESOURCES-MIB::hrStorageSize.3 = INTEGER: 0
HOST-RESOURCES-MIB::hrStorageSize.4 = INTEGER: 512063
HOST-RESOURCES-MIB::hrStorageSize.5 = INTEGER: 510047
HOST-RESOURCES-MIB::hrStorageSize.6 = INTEGER: 514048
HOST-RESOURCES-MIB::hrStorageSize.7 = INTEGER: 3068351
HOST-RESOURCES-MIB::hrStorageSize.8 = INTEGER: 8996
HOST-RESOURCES-MIB::hrStorageUsed.1 = INTEGER: 0
HOST-RESOURCES-MIB::hrStorageUsed.2 = INTEGER: 746309
HOST-RESOURCES-MIB::hrStorageUsed.3 = INTEGER: 0
HOST-RESOURCES-MIB::hrStorageUsed.4 = INTEGER: 3222
HOST-RESOURCES-MIB::hrStorageUsed.5 = INTEGER: 3863
HOST-RESOURCES-MIB::hrStorageUsed.6 = INTEGER: 5177
HOST-RESOURCES-MIB::hrStorageUsed.7 = INTEGER: 21180
HOST-RESOURCES-MIB::hrStorageUsed.8 = INTEGER: 0
For this instance, we’ll monitor the disk usage of drive C (hrStorage index 2). Multiplying hrStorageUsed.2 and hrStorageSize.2 by the 4096-byte allocation unit shows that roughly 3GB of the 5GB on that drive has been used. Now that we have the figures in hand, let’s set a 3.5GB threshold on C.
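The arithmetic behind those figures can be sketched in Python. The helper name `storage_bytes` is made up for illustration; the input numbers are the index 2 values from the snmpwalk output above, and gigabytes are taken as decimal (10^9 bytes), which is an assumption about how the figures in the text were rounded.

```python
def storage_bytes(units: int, allocation_unit: int) -> int:
    """Convert an hrStorageSize or hrStorageUsed value to bytes."""
    return units * allocation_unit


if __name__ == "__main__":
    allocation_unit = 4096  # hrStorageAllocationUnits.2
    size = storage_bytes(1281175, allocation_unit)  # hrStorageSize.2
    used = storage_bytes(746309, allocation_unit)   # hrStorageUsed.2
    print(f"size: {size / 1e9:.2f} GB")
    print(f"used: {used / 1e9:.2f} GB")
    print(f"used: {100 * used / size:.1f} %")
```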
2. The first thing we need to do is to make sure that ONMS uniquely identifies this OID when the threshold kicks in. We’ll start with the datacollection-config.xml file.
<group name = "windows-host" ifType = "ignore">
<mibObj oid=".1.3.6.1.2.1.25.2.3.1.6" instance="2" alias="usedDriveC35" type="integer" />
</group>
Breaking down a couple of variables from above:
• mibObj oid=".1.3.6.1.2.1.25.2.3.1.6" – Refers to the object-type hrStorageUsed.
• alias="usedDriveC35" – A very generic term was used for this example.
The instance attribute selects the table row. Here hrStorageDescr.2 is drive C, but on another node drive C may sit at a different index, so set the instance to match that node’s snmpwalk output.
Next, we have the thresholds.xml file to configure. Start off by adding a new group for that threshold.
<group name="cused35"
rrdRepository = "/var/opennms/rrd/snmp/">
<threshold type="high" ds-name="usedDriveC35" ds-type="node" value="860000" rearm="733070" trigger="1"/>
</group>
A little explanation:
• group name="cused35" – I only intend to monitor usage on drive C with a 3.5GB threshold. If you’re planning to have more than one threshold in a single group, name the group to suit.
• ds-name="usedDriveC35" – Make sure to match your data source with the alias previously stated in the datacollection configuration.
Do note that hrStorage values are counted in allocation units, and the allocation unit differs from one logical drive and node to another. Therefore you must specify a threshold value expressed in the appropriate allocation unit.
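A sketch of that conversion in Python, going from a size limit to the number to put in the value attribute. The function name `threshold_units` is hypothetical and gigabytes are assumed to be decimal. For comparison, the value of 860000 used above corresponds to about 3.52 GB with 4096-byte units, so it is a rounded figure.

```python
def threshold_units(limit_bytes: int, allocation_unit: int) -> int:
    """Express a byte limit in hrStorage allocation units (rounded down)."""
    if allocation_unit <= 0:
        # Drives reporting a 0-byte unit (such as the empty A: and D: drives
        # in the snmpwalk output) cannot be thresholded this way.
        raise ValueError("allocation unit must be positive")
    return limit_bytes // allocation_unit


if __name__ == "__main__":
    limit = 3_500_000_000  # 3.5 GB, decimal
    for unit in (4096, 2048, 1024, 512):
        print(unit, threshold_units(limit, unit))
```

The same 3.5GB limit therefore yields a different threshold value for each allocation unit, which is why a value copied from one drive or node cannot be reused blindly on another.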
3. Now it’s time to include our packages in the threshd-configuration.xml file.
<package name="cused_35">
<filter>IPADDR IPLIKE *.*.*.*</filter>
<specific>192.168.10.33</specific>
<service name="SNMP" interval="150000" user-defined="false" status="on">
<parameter key="thresholding-group" value="cused35"/>
</service>
</package>
• package name="cused_35" – Once again, name this according to your preference.
• <specific>192.168.10.33</specific> – This will be the node that we’ll bind the threshold to.
• value="cused35" – The group “cused35” is added into this package.
As you could probably tell by now, each node would require its very own package. If you wish to include more thresholds on a single node, just add a service tag with the relevant value (group name).
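Since every node needs its own package, generating the XML can save some typing. The following Python sketch builds one package element with the same layout as the example above; the function name `threshd_package` is hypothetical, and the output still has to be pasted into threshd-configuration.xml by hand.

```python
import xml.etree.ElementTree as ET


def threshd_package(name: str, ip: str, group: str, interval_ms: int = 150000) -> str:
    """Build a <package> element mirroring the example in this step."""
    package = ET.Element("package", {"name": name})
    ET.SubElement(package, "filter").text = "IPADDR IPLIKE *.*.*.*"
    ET.SubElement(package, "specific").text = ip
    service = ET.SubElement(
        package,
        "service",
        {
            "name": "SNMP",
            "interval": str(interval_ms),
            "user-defined": "false",
            "status": "on",
        },
    )
    # One <parameter> per thresholding group bound to this node.
    ET.SubElement(service, "parameter", {"key": "thresholding-group", "value": group})
    return ET.tostring(package, encoding="unicode")


if __name__ == "__main__":
    print(threshd_package("cused_35", "192.168.10.33", "cused35"))
```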
And that’s it! Save your changes and restart ONMS.
Posted by kahjin at 04:07 PM
July 22, 2004
OpenNMS .I.
Unfortunately I won't be telling my life story in this column :) So my sincere apologies to the technically declined!
Entries in this category will keep track of my onms progress. I've never dealt with any sort of nms system before so this is a good start to reflect and learn. Also, if the onms faq site didn't help much, you might get some answers here.
Anyway I had an interesting problem to fix today. While checking out the cpu utilization graph for a winxp workstation I was polling from, I noticed that it hadn't been updated for the past 12 hours! Now, how could that have happened when none of the xml config files were screwed around with? Sensibly, I did a rrdtool dump on the said winxp node and came up with NaN (not a number) values. Which meant 2 things:
1. The snmp agent on the winxp node somehow altered its own mib. (Don't know how this is possible, I have yet to view the default set, if any. If anyone knows where I can find mibs in winxp do let me know. Previously installed snmp informant, just in case, but a snmpwalk on one of its oids (object ids) didn't fetch anything!)
2. cpuPercentBusy oid was wrongly specified in the datacollection-config.xml file. (Very unlikely, since polling did graph results after the last alteration)
With these two possibilities in mind, one could tell that not a single value was polled because of an unknown or a non integer value from the cpuPercentBusy oid, hence the NaN values and the empty graph!
So, how was this fixed? With the knowledge that an invalid oid could as well have been the cause, I compared two snmpwalk dumps of the said oid: one I did the day before, and a newer one from this morning. Which resulted in this:
old dump
HOST-RESOURCES-MIB::hrProcessorLoad.1 = INTEGER: 99
new dump
HOST-RESOURCES-MIB::hrProcessorLoad.2 = INTEGER: 99
How the instance changed from 1 to 2 is something I'm still trying to figure out. AFAIK the snmp agent on winxp has static mib values, which shouldn't contribute to this problem.
So, a quick change of the instance value from "1" to "2" in the datacollection-config.xml file fixed it:
<mibObj oid=".1.3.6.1.2.1.25.3.3.1.2" instance="2" alias="cpuPercentBusy" type="integer" />
Restart and voilà, no more NaN!
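The dump comparison that uncovered the problem could be automated along these lines. This is just a Python sketch: the function name `instances` is hypothetical, and it assumes snmpwalk output in the `MIB::object.instance = TYPE: value` form shown in the two dumps above.

```python
import re

# Matches lines such as "HOST-RESOURCES-MIB::hrProcessorLoad.1 = INTEGER: 99".
_WALK_LINE = re.compile(r"^(?P<name>\S+)\.(?P<instance>\d+) = ")


def instances(walk_text: str, object_name: str) -> set:
    """Return the set of instance numbers seen for `object_name` in a walk."""
    found = set()
    for line in walk_text.splitlines():
        match = _WALK_LINE.match(line.strip())
        if match and match.group("name").split("::")[-1] == object_name:
            found.add(int(match.group("instance")))
    return found


if __name__ == "__main__":
    old_dump = "HOST-RESOURCES-MIB::hrProcessorLoad.1 = INTEGER: 99"
    new_dump = "HOST-RESOURCES-MIB::hrProcessorLoad.2 = INTEGER: 99"
    before = instances(old_dump, "hrProcessorLoad")
    after = instances(new_dump, "hrProcessorLoad")
    if before != after:
        print("instance changed:", sorted(before), "->", sorted(after))
```

Running something like this against a saved walk before touching datacollection-config.xml would flag a shifted instance before the graphs go blank.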
Posted by kahjin at 06:33 PM