
Computing Systems


Informatics Computing Staff jottings

Network disruption yesterday

Apologies for the short network disruption yesterday afternoon. It was caused by a 10Gbps forwarding loop, which was created as the second-last fibre was being connected as part of our core switch upgrade programme. As soon as we realised there was a problem, the fibre was disconnected again and the port configuration was corrected.

Background: we (Informatics, and the constituent Departments before that) set up our network with redundant paths for resilience, using Rapid Spanning Tree Protocol (RSTP) to manage the links and prevent loops. EdLAN as a whole has different constraints: it runs a different STP variant across the core and no STP at the edge. Over the years there have been incompatibilities between the way these variants operate, and we have seen some instability as a result of STP-related events elsewhere. We have therefore, for many years, filtered BPDUs (the control frames that STP uses to learn the topology) at all of our interfaces to EdLAN. This has generally worked well.
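To illustrate why redundant paths need something like STP, here is a minimal sketch of the underlying idea: given a set of switches and links, keep a loop-free tree of forwarding links and block the rest. This is a plain breadth-first search, not the real RSTP election (which uses bridge priorities, path costs and BPDU exchange), and the switch names are made up for the example.

```python
from collections import deque


def blocked_links(links, root):
    """Return the redundant links that a loop-free tree rooted at `root` leaves unused.

    Simplified illustration only: real RSTP picks the root and the
    forwarding ports from bridge IDs and path costs carried in BPDUs.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    visited = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        for peer in neighbours.get(switch, []):
            if peer not in visited:
                visited.add(peer)
                forwarding.add(frozenset((switch, peer)))
                queue.append(peer)

    # Anything not on the tree is a redundant path and must be blocked.
    return [link for link in links if frozenset(link) not in forwarding]


# Hypothetical three-switch topology with one redundant link.
links = [("core", "edge1"), ("core", "edge2"), ("edge1", "edge2")]
print(blocked_links(links, "core"))
```

With the triangle above, the link between the two edge switches is the one left blocked; it is available as a standby path if one of the core links fails.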

So what went wrong yesterday? The cards in the new switch being installed are slightly different from the ones in the old switch, and the port involved in yesterday’s problems was previously set up as a hot-spare EdLAN link. (We keep some links pre-configured so that they can be quickly swapped into operation should there be a fault with our principal link.) As part of the upgrade process that port became one of our “normal” infrastructure links and the hot-spare EdLAN link was moved to a different port. The VLAN configurations were moved correctly, but the BPDU filtering was accidentally left applied to the original port rather than being moved along with the hot-spare link. When that port was patched in, therefore, its BPDUs were being filtered, so STP could not see the redundant path and did not know to block one of the downstream links, and a loop was set up. Unicast traffic would still have been forwarded normally, but we carry enough multicast traffic that, once it was being looped around, it completely saturated our infrastructure links.
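The effect of a loop on flooded traffic can be shown with a toy model. Ethernet frames carry no hop limit, so a multicast frame that each switch floods out of every port except the one it arrived on dies out in a tree but circulates indefinitely in a loop. The sketch below counts frame transmissions per hop under that assumption; the topology and names are hypothetical, and it ignores link speed, MAC learning and multicast snooping.

```python
def flood(links, source, hops):
    """Count transmissions per hop for one flooded frame injected at `source`.

    Each switch forwards the frame to every neighbour except the one
    it received the frame from. Toy model for illustration only.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    # Each entry is (switch holding a copy, switch the copy came from).
    in_flight = [(source, None)]
    counts = []
    for _ in range(hops):
        next_flight = []
        for switch, came_from in in_flight:
            for peer in neighbours.get(switch, []):
                if peer != came_from:
                    next_flight.append((peer, switch))
        counts.append(len(next_flight))
        in_flight = next_flight
    return counts


tree = [("core", "edge1"), ("core", "edge2")]
loop = tree + [("edge1", "edge2")]  # redundant link left unblocked

print(flood(tree, "core", 5))
print(flood(loop, "core", 5))
```

In the tree the single frame is delivered once and disappears; in the loop the copies never drain away, so every new multicast frame adds to the traffic already circulating until the links are full.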

The fix was to disconnect the problem link, thereby breaking the loop. The BPDU filter was then applied to the correct link, and everything was connected up again.
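A mismatch of this kind is the sort of thing a simple consistency check could flag before a port is patched in. The following is only a sketch of such a check, not a description of any existing tooling: the port names, the role labels and the dictionary layout are all assumptions made for the example.

```python
# Hypothetical role labels for ports that face EdLAN.
EDLAN_ROLES = {"edlan", "edlan-spare"}


def audit_bpdu_filter(ports):
    """Report ports whose BPDU-filter setting does not match their role.

    `ports` maps a port name to {"role": str, "bpdu_filter": bool}.
    The rule assumed here: filter BPDUs on EdLAN-facing ports only.
    """
    problems = []
    for name, cfg in sorted(ports.items()):
        is_edlan = cfg["role"] in EDLAN_ROLES
        if cfg["bpdu_filter"] and not is_edlan:
            problems.append(f"{name}: BPDU filter on an internal link")
        elif is_edlan and not cfg["bpdu_filter"]:
            problems.append(f"{name}: EdLAN-facing link without BPDU filter")
    return problems


# Made-up configuration resembling the state described above:
# the role moved to a new port, but the filter stayed behind.
ports = {
    "A1": {"role": "infrastructure", "bpdu_filter": True},
    "B4": {"role": "edlan-spare", "bpdu_filter": False},
    "C2": {"role": "edlan", "bpdu_filter": True},
}
for problem in audit_bpdu_filter(ports):
    print(problem)
```

Run against the made-up configuration, the check reports both halves of the mistake: the leftover filter on the internal link and the missing filter on the relocated spare.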

As usual, our technical network documentation is here.
