How to archive entire websites and content that is no longer useful, is outdated or irrelevant, gets little web traffic, or is no longer managed.
Practice data reduction by not keeping old information. It’s also a good practice for search engine optimization to review your website content on a regular basis to make sure you are providing accurate information.
UCSC websites are electronic communications covered under existing policy.
Can my website be the official place where I keep my records?
No. UCSC’s enterprise content management systems (WordPress and WCMS) are not records keeping systems.
- WordPress is not to be used as a final repository for the retention of Institutional Information. University business information must be stored in a separate final repository and managed in accordance with the department’s records retention practices.
- The WordPress site itself should not be regarded as an archivable record and is not considered an archivable format. Upon the retirement of a WordPress site, a unit should only keep the data as long as it is being referenced and no longer than one year.
- If your unit chooses to archive information as it is displayed publicly, the recommended practice is to save the page as a PDF/A and submit it to the unit’s final repository.
- Keep these principles in mind when migrating information from WCMS to WordPress.
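The one-year limit on retired site data can be tracked with a simple date check. The following is a minimal sketch only: the function names are hypothetical, and it assumes "one year" means the same calendar date in the following year, counted from the day the site was retired. Confirm the actual retention period with the Campus Records Manager.

```python
from datetime import date


def retention_deadline(retired_on: date) -> date:
    """Return the last day retired site data may be kept (assumed: one calendar year)."""
    try:
        return retired_on.replace(year=retired_on.year + 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return retired_on.replace(year=retired_on.year + 1, day=28)


def is_overdue(retired_on: date, today: date) -> bool:
    """True when the data has been kept past the assumed one-year limit."""
    return today > retention_deadline(retired_on)


# Example: a site retired on 1 July 2023, checked on 2 July 2024.
print(retention_deadline(date(2023, 7, 1)))
print(is_overdue(date(2023, 7, 1), date(2024, 7, 2)))
```

The check covers only the upper limit. Data that is no longer being referenced should be removed sooner.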
If you have questions about University records retention policies, please contact Diane Lallemand, Campus Records Manager, or see the UC Records Retention Schedule.
What content should you archive?
Archiving may be needed to maintain publications and research, and in other situations where it is necessary to keep official records. To do this, follow guidelines for maintaining publications.
Keeping outdated websites live for aesthetic or personal purposes is not aligned with policy and can present risks.
Risks with archiving websites
Simply downloading an entire website for archiving purposes creates risk.
- Some types of content on a website may require different types of archiving processes, and mixing retention types might present legal, privacy, or data retention risks.
- It creates extra copies of records, which may:
- Not be in alignment with policy, which recommends that we practice data reduction and follow a pre-determined retention schedule.
- Create potential security risks if the storage of the records does not follow policy.
How to archive or remove content
This process might be as basic as identifying content on websites you manage and removing it, or moving it into a shared storage location.
- Best practice: Audit your website content as you develop a content strategy. In addition to identifying all of the pages on your website in an audit, you also can review your pages to see which ones you should keep and which ones to archive or remove.
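As a starting point for an audit, pages that have not been updated in a long time can be listed from a site's XML sitemap. The sketch below is illustrative only: it assumes the site publishes a standard sitemap with `lastmod` dates, and the function name, the sample URLs, and the 365-day threshold are all hypothetical choices, not UCSC requirements. A flagged page is a candidate for review, not an automatic removal.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap format.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def stale_pages(sitemap_xml: str, today: date, max_age_days: int = 365):
    """Return (url, age_in_days) for pages older than max_age_days, oldest first."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not loc or not lastmod:
            # Pages without a date are skipped here; review them by hand.
            continue
        # lastmod may be a date or a full timestamp; the first 10 characters are the date.
        modified = date.fromisoformat(lastmod[:10])
        age = (today - modified).days
        if age > max_age_days:
            flagged.append((loc, age))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample sitemap for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.edu/about</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.edu/events-2019</loc><lastmod>2019-10-15T08:00:00+00:00</lastmod></url>
  <url><loc>https://example.edu/no-date</loc></url>
</urlset>"""

for page, age in stale_pages(SAMPLE, today=date(2024, 6, 1)):
    print(f"{page} was last modified {age} days ago")
```

Last-modified dates are only one signal. Web traffic and content accuracy still need a human review before deciding what to keep, archive, or remove.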
Risks associated with legacy or outdated websites and content
Websites that are not actively managed can present risks to our UCSC digital campus. They may:
- No longer meet web accessibility standards
- Cost UCSC time and resources to continue to maintain
- Present security risks because of outdated technology that is no longer supported
- Present poor or dated web designs that may harm reputation
- Include outdated brand or identity components that may harm reputation
- Include outdated or incorrect content that can be found through search engines
- Actively mislead users with old content
How to archive old web content
Be sure that you are following practices set in the UC Records Retention Schedule.
There are options for users with limited finances or technical expertise, and options that may automate the process but may involve costs.
- Identify where you want to keep the content offline. Consider a shared storage location like Google Drive.
- Capture your webpages using one of these methods:
- Copy content from a webpage and place it into a Google Doc.
- Print from your browser and save as a PDF.
- Alternatively, use the Chrome extension Awesome Screenshot (not a supported tool) to capture pages as they display on the web in PDF or JPG format.
- (Optional) Assemble the PDFs into a single file using Adobe Acrobat.
- Store the file offline.
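For more than a handful of pages, the "print and save as a PDF" step can be scripted with Chrome's headless mode. The following is a rough sketch, not a supported UCSC tool: the helper names are hypothetical, and it assumes a Chrome or Chromium binary is installed and reachable as `google-chrome` (the name varies by platform). The output is an ordinary PDF, so a unit that needs PDF/A would still have to convert the files separately.

```python
import re
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def pdf_name(url: str) -> str:
    """Build a file name from a URL, e.g. host and path joined with hyphens."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return f"{slug or 'page'}.pdf"


def print_command(url: str, out_dir: Path, browser: str = "google-chrome") -> list:
    """Return the headless Chrome command that prints one page to a PDF file."""
    target = out_dir / pdf_name(url)
    return [browser, "--headless", "--disable-gpu", f"--print-to-pdf={target}", url]


def capture(urls, out_dir: Path, browser: str = "google-chrome") -> None:
    """Print each URL to a PDF in out_dir; requires the browser to be installed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for url in urls:
        subprocess.run(print_command(url, out_dir, browser), check=True)


# Show the command that would run for one hypothetical page (nothing is executed here).
print(" ".join(print_command("https://example.edu/about/staff/", Path("archive"))))
```

Calling `capture([...], Path("archive"))` with a list of page URLs would write one PDF per page into the `archive` folder, which can then be moved to the unit's chosen storage location.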
Unsupported automated and paid tools for archiving
- Wayback Machine: a third-party, view-only tool for browsing old versions of a website.
- Webrecorder: records a screen capture of a website as you browse it, but not the content on it.
- Mirror Web
What websites are being archived?
University Archives selectively crawls and archives UCSC websites in line with its Collecting Policy.
Generally, departmental and divisional websites are crawled on an annual basis and are available to access via the Internet Archive’s Wayback Machine.
Are websites created by individuals affiliated with UCSC archived?
Faculty and other websites maintained by individuals (i.e., students and staff) are not part of the Collecting Policy and are not actively archived.