AFS Restores
Following this morning’s unplanned restart of one of the Forum SAN storage machines, various AFS volumes (where your files are stored) have been affected. We are bringing these back on-line a partition at a time. Unfortunately there is a delay while we check each partition for file system consistency, to make sure there has been no corruption.
We are doing this manually so we can bring each partition and associated volumes back as they pass their checks. This means volumes will come back gradually for users, rather than having everyone wait until the last partition is checked.
However, reattaching each checked partition does mean a short break in access to the volumes already recovered. So please bear with us if your home directory has been recovered but you find that, every hour or so, it seems to freeze for a couple of minutes while we reattach another recovered partition.
We are giving priority to user volumes, followed by group volumes. Missing group volumes will affect web pages served from those areas.
Apologies for the prolonged recovery, but we’re going as fast as we dare.
Neil
Update: A report explaining the actions that caused the unplanned break in service, and the steps taken to restore the data, is available at https://wiki.inf.ed.ac.uk/twiki/pub/DICE/ServicesUnit/incident-report-10-11-11.pdf
squonk: vicepe – vicepl
crocotta: vicepe – vicepm
bunyip: vicepe – vicepl
cameleopard: vicepd – vicepf, viceph
A bit terse and poorly formatted, but FYI, the volumes still affected:
#            user  group
bunyip          0    116
cameleopard     0     24
crocotta      384      3
squonk        647      1
Partitions still to be remounted:
squonk: vicep f, g, h
crocotta: vicep g, h, i, j
bunyip: vicep e, f, g, h, i, j, k, l
cameleopard: vicep d, e, f
Volumes still affected:
#            user  group
bunyip          0    116
cameleopard     0     13
crocotta        0      3
squonk        519      1
Out of a total of 5336 user volumes and 386 group volumes
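For anyone wanting the figures above as overall proportions, here is a minimal Python sketch of the arithmetic. The counts are copied from the table in this update; the names `affected` and `summarise` are purely illustrative and are not part of any tool we use.

```python
# Counts copied from the table above: (user volumes, group volumes) per server.
affected = {
    "bunyip": (0, 116),
    "cameleopard": (0, 13),
    "crocotta": (0, 3),
    "squonk": (519, 1),
}

# Totals quoted in this update.
TOTAL_USER = 5336
TOTAL_GROUP = 386


def summarise(counts, total_user, total_group):
    """Sum the affected volumes and express them as a percentage of the totals."""
    user = sum(u for u, _ in counts.values())
    group = sum(g for _, g in counts.values())
    return {
        "user": user,
        "group": group,
        "user_pct": 100 * user / total_user,
        "group_pct": 100 * group / total_group,
    }


summary = summarise(affected, TOTAL_USER, TOTAL_GROUP)
print(f"user volumes affected:  {summary['user']} ({summary['user_pct']:.1f}%)")
print(f"group volumes affected: {summary['group']} ({summary['group_pct']:.1f}%)")
```

So roughly a tenth of user volumes and a third of group volumes were still affected at this point.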
Partitions still to be remounted:
squonk: vicep g*, h
crocotta: vicep g, i*, j
bunyip: vicep f*, g, h, i, j, k, l
cameleopard: vicep e, f*
* indicates the partition currently being checked, and so the next one back. Unfortunately the ones left to check are 0.5TB or 1TB partitions, so they take the longest to check.
#            user  group
bunyip          0     93
cameleopard     0     13
crocotta        0      3
squonk        519      1
All user volumes are now back. Some 70 group areas on:
bunyip vicep g, h, i, j, k, l
cameleopard vicep e
are still being checked and will then be restored.
All file partitions and volumes have been restored. If you are still having problems, please contact support.