Overview
One of the hardest things to do when managing a web site is creating a catalog of all of your pages and content, then verifying that the content is up to date and follows current standards. This is why we created the “Ellucian Web Surveyor” tool.
The Web Surveyor will index and scan your site, creating an internal catalog of all pages and content on your site. It will then analyze that data, generate statistics and measurements, and create several reports that you can use as a starting point to begin cleaning up, correcting, and modernizing your web site.
The Surveyor tool consolidates several scripts and tools into one unified interface with a focus on websites designed for higher education. We pull from many open source tools along with some in-house custom scripts and tools, and finally combine that with our internal knowledge and experience to give you some recommendations for best practices in higher education.
What does the Surveyor do?
The Surveyor starts by finding as many pages as possible on your site. After an index of the site is created, it extracts content from every indexed page and performs scans checking for:
- accessibility
- broken links
- general page metrics and analysis
After your site has been scanned, reports will be generated summarizing the information found on your site.
Indexing
Before we start analyzing your site, we have to find as many pages as possible by generating an index of pages. During this process, our crawler will start by going to your home page, gathering as many links as possible, and adding them to the index. From there it will continue to crawl your site until no more unique pages can be found.
Some pages may not be crawled during the indexing of your site. These pages are typically isolated pages that aren’t referenced anywhere else on your site. An example would be a campaign page that is only referenced in an email or ad. Moreover, any pages that are self-referencing or recursive will not be indexed. An example would be event pages that contain a calendar; pages like these would cause the indexing to continue indefinitely.
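To make the indexing behavior concrete, here is a minimal Python sketch of a breadth-first crawl. It is an illustration only, not the Surveyor's actual crawler: the `build_index` function, the `fetch_links` callback, the `max_pages` cap, and the in-memory `SITE` map are all hypothetical names made up for this example. In particular, the page cap is just one simple way to keep a calendar-style chain of links from running forever; the Surveyor may handle recursive pages differently.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def build_index(home_url, fetch_links, max_pages=1000):
    """Breadth-first crawl from home_url; returns same-site URLs in discovery order."""
    site = urlparse(home_url).netloc
    index = [home_url]
    seen = {home_url}
    queue = deque([home_url])

    while queue:
        page = queue.popleft()
        for href in fetch_links(page):
            # Resolve relative links and drop #fragments so each page is counted once.
            url, _fragment = urldefrag(urljoin(page, href))
            # Skip links that leave the site and pages already in the index.
            if urlparse(url).netloc != site or url in seen:
                continue
            # Assumed safety cap: stop once the index is full.
            if len(index) >= max_pages:
                return index
            seen.add(url)
            index.append(url)
            queue.append(url)
    return index


# Hypothetical in-memory site standing in for real HTTP requests.
SITE = {
    "https://example.edu/": ["/about", "/events", "https://other.example/"],
    "https://example.edu/about": ["/", "/about#staff"],
    "https://example.edu/events": ["/events?month=2"],
    "https://example.edu/events?month=2": ["/events?month=3"],
    "https://example.edu/events?month=3": ["/events?month=4"],
    "https://example.edu/campaign": ["/"],  # never linked from another page
}

pages = build_index("https://example.edu/", lambda url: SITE.get(url, []), max_pages=4)
```

In this sketch the `/campaign` page never appears in `pages` because no crawled page links to it, which mirrors the isolated-page case described above. The chain of `?month=` links is cut off by the cap rather than followed to the end.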
Content Extraction
Over time, the number of pages and content on a web site can grow in a way that is untraceable. Pages get created, moved, deleted, and dumped onto a server, perhaps even outside of the CMS. We frequently heard requests for a way to get a list of all of the pages on a site. Since our crawler starts at the home page and crawls through every linked page, we were already building an index of all of those pages. We created a way for users to download a file with a list of those pages for auditing purposes. This now exists on a tab of its own inside of Surveyor, and you can use it to get a complete enumeration of your overall web presence.
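Once you have the page list, a quick first audit step is to see how many pages live under each top-level section of your site. The Python sketch below is one way to do that. It assumes the downloaded file contains one URL per line, which may not match the actual export format, and the `pages_per_section` function and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def pages_per_section(lines):
    """Count URLs by their first path segment ("/" for the home page)."""
    counts = Counter()
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = ("/" + segments[0]) if segments else "/"
        counts[section] += 1
    return counts


# Hypothetical sample standing in for the lines of a downloaded page list.
sample_lines = [
    "https://example.edu/\n",
    "https://example.edu/admissions/apply\n",
    "https://example.edu/admissions/visit\n",
    "https://example.edu/news/2019/welcome\n",
    "\n",
]

counts = pages_per_section(sample_lines)
```

With a real file you could pass the open file object in place of `sample_lines`, since iterating over a text file yields its lines.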
Accessibility
The main reason we created this tool, to be completely honest, was because we see a huge need for 508/ADA compliance scanning. Accessibility is a huge problem in higher education and has, sadly, been an afterthought to many institutions over the years. With the dangers of lawsuits and threats of government funding being withheld, we wanted to create a starting point to evaluate your site for accessibility issues so that you can begin mitigating those issues in an organized manner.
You may have already noticed that our scan results have two tabs for accessibility. Because this is such an important topic, we provide you with two different tabs with two different sets of scan results from two different tools. You will find a lot of duplication in these reports, and that is perfectly normal and expected. But since there are a great number of nuances in scanning tools as compared to compliance requirements, we decided it was more thorough to have two scans in case one scan misses some issues that the other may catch. We went with the idea that, since this is so important, we should be “better safe than sorry” and provide two different result sets to analyze.
Broken Links
Page Metrics & Analysis
Reports
Tools
The Surveyor currently uses technologies from Google (Lighthouse) for performance and best practices checking, pa11y for advanced accessibility checking, which follows the a11y project, and some other in-house tools written by our very talented team of developers. The entire interface is custom written for Ellucian technology management customers and is hosted on Amazon Web Services.
If you are an existing customer and are interested in using this service at your institution, please contact your Service Delivery Manager, who will reach out to Ellucian Web Services. We will be glad to help you get started with this tool and any of our web services.