How netflow made my users happy.
Recently I started receiving automated bandwidth alerts for a couple of our offices, so I decided to take a deeper look at what was generating enough traffic to saturate four T1s.
How?
The first technology that came to mind was netflow, but I had absolutely no experience with it. I only knew the basics: it would show me source and destination IPs and the type of traffic being transmitted. I also knew that it works with flow summaries, unlike a SPAN port, which forwards a copy of all traffic (something I wasn't interested in doing).
Since netflow isn't useful without something collecting the data (netflow data is pushed, not polled), the first thing I did was install ManageEngine's Netflow Analyzer. They have a free version that is good for up to 2 monitored interfaces (note this is not the same thing as monitored IPs, so you can theoretically just monitor your router's public interface to get useful data!). I've also used ManageEngine demos in the past, and I've always been pleased with their products.
Once I had the Netflow Analyzer installed, it was time to enable netflow on my router, a Cisco 2811 running IP Base 12.4(3f). Initially I was pretty concerned about CPU usage, because we're already doing PPP multilink (which is handled by the CPU) and netflow can be CPU intensive, but Cisco's documentation indicated I should be good. Netflow was surprisingly easy to set up:
router>enable
Password:*****
router#configure terminal
router(config)#interface [INTERFACE YOU WISH TO MONITOR]
router(config-if)#ip route-cache flow
router(config-if)#exit
router(config)#ip flow-export destination [YOUR ANALYZER] 9996
router(config)#ip flow-export source [INTERFACE YOU WISH TO MONITOR]
router(config)#ip flow-export version 9
router(config)#ip flow-cache timeout active 1
router(config)#ip flow-cache timeout inactive 15
Note: If ip flow-export version 9 doesn't work, try version 5
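If you do fall back to version 5, a quick way to confirm that the router is actually pushing data is to listen on the export port yourself. The sketch below is not part of Netflow Analyzer. It is a minimal Python listener that assumes the fixed NetFlow v5 layout (a 24-byte header followed by 48-byte records) and the port 9996 used in the config above. The function names parse_v5 and listen are my own, and it will not decode version 9, which is template based.

```python
import socket
import struct

# NetFlow v5 layout: a 24-byte header followed by 48-byte flow records.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_v5(packet):
    """Return a list of flow dicts from one NetFlow v5 export packet."""
    version, count = V5_HEADER.unpack_from(packet, 0)[:2]
    if version != 5:
        raise ValueError(f"expected NetFlow v5, got version {version}")
    flows = []
    for i in range(count):
        offset = V5_HEADER.size + i * V5_RECORD.size
        fields = V5_RECORD.unpack_from(packet, offset)
        flows.append({
            "src": socket.inet_ntoa(fields[0]),   # source IP
            "dst": socket.inet_ntoa(fields[1]),   # destination IP
            "packets": fields[5],                 # packets in the flow
            "bytes": fields[6],                   # layer 3 bytes in the flow
            "src_port": fields[9],
            "dst_port": fields[10],
            "protocol": fields[13],               # IP protocol number (6 = TCP)
        })
    return flows


def listen(port=9996):
    """Print flows as they arrive; the router pushes them over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        packet, (sender, _) = sock.recvfrom(65535)
        for flow in parse_v5(packet):
            print(sender, flow)
```

Calling listen() on the machine named in ip flow-export destination (with the analyzer stopped, so the port is free) should start printing flows once the router's cache timers expire.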
In short order, I was in netflow land. Because it works via flows, it took some time for enough data to accumulate. Roughly an hour later I already had a big enough picture to start acting (IP addresses obfuscated to protect the guilty):
The top bandwidth destination - 192.168.23.5 - is a NAS device in another office (the other office that we were also having issues with).
All of our offices are connected via MPLS and can talk to each other as if they were on the same network. In every office, we have a NAS device for backups. For the backups themselves we use a piece of freeware called Cobian, which uses a hard-coded path for its destination (it isn't location aware).
As it turned out, we had a couple of users move without informing IT, so we never changed their backup locations:
I did some quick math, and figured out that these two users were maxing out our bandwidth in BOTH offices experiencing regular bandwidth shortages. Holy smokes!
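For a sense of scale, here is the kind of back-of-the-envelope calculation involved. This is an illustration rather than my exact numbers: it assumes the nominal T1 line rate of 1.544 Mbps, ignores protocol overhead, and uses a made-up 10 GB backup size.

```python
T1_MBPS = 1.544          # nominal T1 line rate in megabits per second
LINKS = 4                # four bonded T1s, as in our offices
BACKUP_GB = 10           # hypothetical backup size, not a measured value


def transfer_hours(size_gb, links=LINKS, mbps_per_link=T1_MBPS):
    """Hours needed to move size_gb (decimal gigabytes) over the bundle."""
    bits = size_gb * 1e9 * 8
    bits_per_second = links * mbps_per_link * 1e6
    return bits / bits_per_second / 3600


if __name__ == "__main__":
    print(f"{transfer_hours(BACKUP_GB):.1f} hours at full line rate")
```

Under those assumptions, a single 10 GB backup would occupy the entire bundle for about three and a half hours, which is why a couple of misdirected backups can crowd out everything else on the link.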
Without netflow, I would never have known this was the issue (I always blamed sporting events), and with it, I can take corrective action:
1. By changing the backup drives, we can make this go away.
2. I set up an automatic report that runs nightly and emails the department a list of all traffic destined for backup drives at the wrong location.
3. We can start looking for backup software that limits bandwidth usage (we've needed a new backup package for a while).
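The logic behind the nightly report in item 2 is simple enough to sketch. This is not how Netflow Analyzer implements it. It is a rough Python illustration that assumes each office has one subnet and one NAS, and that flows are available as dicts with src, dst and bytes keys. The office names, subnets and the 192.168.10.x addresses are hypothetical.

```python
import ipaddress

# Hypothetical office layout: each office has one subnet and one NAS.
OFFICES = {
    "office-a": {"subnet": ipaddress.ip_network("192.168.10.0/24"),
                 "nas": "192.168.10.5"},
    "office-b": {"subnet": ipaddress.ip_network("192.168.23.0/24"),
                 "nas": "192.168.23.5"},
}
NAS_TO_OFFICE = {info["nas"]: name for name, info in OFFICES.items()}


def office_of(ip):
    """Return the office whose subnet contains ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for name, info in OFFICES.items():
        if addr in info["subnet"]:
            return name
    return None


def misdirected_backups(flows):
    """Return flows sent to a NAS that sits outside the sender's office."""
    report = []
    for flow in flows:
        nas_office = NAS_TO_OFFICE.get(flow["dst"])
        src_office = office_of(flow["src"])
        if nas_office and src_office and nas_office != src_office:
            report.append(flow)
    return report
```

Anything this returns is a workstation backing up across the MPLS link instead of to its local NAS, which is exactly the list the department needs to see each morning.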
Bandwidth is no longer in such short supply, which cuts down on latency and makes users happy!
If you have any netflow stories, please comment!