Article By: MC1 Grant Ammon
The Naval Postgraduate School’s Dudley Knox Library has announced the launch of an extensive online institutional archive. The electronic repository, known as Calhoun, was created to bring NPS' scholarly contributions together, creating an easily searchable collection of scholarly, instructional and institutional publications and research products authored by members of the NPS community.
“Calhoun gives everyone one place to go to find out about the Naval Postgraduate School’s scholarly contributions,” said University Librarian Eleanor Uhlinger. “Right now you have to browse through many, many web pages, look through faculty websites or look through our library catalog to find theses. This is one central location that says these are the products of NPS.”
Calhoun, named after Prof. Guy K. Calhoun, the first known appointment and published author from NPS, is based on the open-source software DSpace, created by the Massachusetts Institute of Technology and Hewlett-Packard, and reflects a new wave of thinking amongst library professionals.
Calhoun extends the reach of NPS-authored content by digitizing old documents and assigning electronic descriptions, or meta-tags, that allow search engines such as Google or Yahoo! to find them. Documents that were originally electronic are also assigned metadata so that search engines can easily find them.
“If you just go in the Internet and search a specific topic, you can’t see what is in our library’s catalog. Search engines can’t see beyond a certain point and it is meta-tagging that makes it possible for the search engines to index our content out to the world,” noted Berry. “Now when you look for terms in a search engine, it goes straight to this institution. Our stuff isn’t hidden under a layer of cataloging and other stuff; it’s right out there on the web now.”
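To illustrate the meta-tagging idea described above, here is a minimal sketch in Python of how a catalog record might be turned into HTML meta tags that a search engine crawler can read. It assumes Dublin Core-style field names (such as "DC.title"), a common convention for repository metadata; the article does not say which scheme Calhoun uses. The function name and the sample record are hypothetical.

```python
from html import escape


def build_meta_tags(record):
    """Render one HTML <meta> tag per (field, value) pair in a record."""
    tags = []
    for field, values in record.items():
        # A field such as "creator" may hold several values.
        if isinstance(values, str):
            values = [values]
        for value in values:
            name = escape(f"DC.{field}", quote=True)
            content = escape(value, quote=True)
            tags.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(tags)


# Hypothetical record for illustration; not a real Calhoun entry.
sample_record = {
    "title": "An Example Thesis on Sonar & Signal Processing",
    "creator": ["Doe, Jane"],
    "date": "1985",
    "type": "Thesis",
}

print(build_meta_tags(sample_record))
```

Tags like these sit in the header of each document's web page, which is what lets a crawler index the record without going through the library catalog.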
Members of the team responsible for the creation of the Institutional Archive, Calhoun, pose for a photo in front of the Dudley Knox Library.
DKL officials noted that a significant portion of the repository's content consists of dissertations and theses authored by NPS students, already an integral component of the library's collections.
“We have 20,000 theses, with the earliest dating back to 1923, and maybe only about 5,000 of those are in electronic format,” noted Berry. “If you want to find anything older than 2001 it’s tough,” she said. “One of the things the library has taken on is this big project to do this digitization, and it’s a massive effort. Works were on a shelf in the library waiting for people to come and find them and take them home. That is not the way libraries work anymore.”
To aid in the digitization of the extensive quantity of articles and works created by NPS students and faculty, team members charged with creating Calhoun at DKL came up with a unique partnership with the San Francisco-based non-profit organization Internet Archive.
“When faced with the retrospective digitization of 14,000 or 15,000 paper documents, we had to come up with creative solutions and we found a way to do it that we feel really good about,” said Berry. “We have partnered with the non-profit Internet Archive. We did a win-win so the documents become much more accessible and their service provides us with state-of-the-art scans. They are doing it on an unbelievable scale.”
The scanning or digitization of content is one step in preparing documents for entry into Calhoun, but another big challenge that Berry and her team at DKL had to address was organizing and categorizing the newly created electronic documents. Through automation and a cleverly designed computer script, the team was able to extract data from the documents scanned by Internet Archive and flow the pertinent data into a format that allowed for easier categorization and labeling.
“We have been really trying to work at automating the process, but the question of how to extract data out of a piece of paper was asked,” noted Berry. “We have very clever minds here on staff, and one created a script that reads the PDF (Portable Document Format) pages of every document in the collection. That script actually takes the important text on the PDF and imports it to a table in Excel so that no human beings have to touch the document.”
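The article does not describe the script itself, but the general idea can be sketched in Python. The example below assumes the text of a title page has already been pulled out of the PDF, and it writes the extracted fields as CSV, a table format that Excel can open. The field patterns, function names and sample page are hypothetical, and real thesis title pages vary far more than this.

```python
import csv
import io
import re

# Hypothetical patterns for illustration; real title pages vary widely.
FIELD_PATTERNS = {
    "title": re.compile(r"^Title:\s*(.+)$", re.MULTILINE | re.IGNORECASE),
    "author": re.compile(r"^by\s+(.+)$", re.MULTILINE | re.IGNORECASE),
    "year": re.compile(r"\b(19\d{2}|20\d{2})\b"),
}


def extract_fields(page_text):
    """Return one table row (a dict) of fields found in the page text."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(page_text)
        # Leave the cell empty when a field cannot be found.
        row[name] = match.group(1).strip() if match else ""
    return row


def rows_to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_PATTERNS), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up title page text standing in for the output of a PDF reader.
sample_page = """NAVAL POSTGRADUATE SCHOOL
Title: An Example Thesis
by Jane Doe
June 1985
"""

print(rows_to_csv([extract_fields(sample_page)]))
```

Leaving a cell empty when a pattern fails, rather than guessing, makes it easy for a person doing quality checks to spot the documents that need attention.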
Even with the automated scanning of older texts and the computer script extracting relevant data through optical character recognition (OCR), some documents proved troubling to translate to the digital archive, and again, Berry turned to her staff for solutions.
“OCR can read fonts that were generated by other computers and it’s fairly accurate, but as we’re going back to paper documents that were typed on onion skin with typewriters it gets difficult,” noted Berry. “Particularly when it runs into complex mathematic formulas and diagrams it gets very, very hard to get good clean documents. We have another person on our team that is quality checking the documents and she is doing a fantastic job at troubleshooting those problems.”
Although much of the content being prepared and populated in Calhoun comes from student theses and dissertations, all NPS-authored content is welcome and can be submitted to Berry at the library. Calhoun accepts anything created by NPS authors, from faculty articles and lectures to publications created on campus and even newsletters from various departments.
The system is currently in beta testing and is planned to go live Apr. 2. To view and explore the archive, visit http://calhoun.nps.edu/.