It’s taken us a while, but as of yesterday the complete Flight International 100-year archive of scanned and OCR’d (searchable text) pdfs is up on Flightglobal.
Without meaning to blow our own trumpet, this is a pretty impressive achievement.
There aren’t that many publishers that have been around for 100 years, let alone ones that have scanned in their entire history and put it online for free. And I don’t just mean within the aviation industry but in the broader publishing world.
The technical aspects of the project also represent a significant achievement. 210,000 individual pages have been scanned in and subjected to Optical Character Recognition to ensure they are searchable. In addition, every page has been thumbnailed to aid the user when browsing. And the unique pdf viewer has been built pretty much from scratch.
All that remains now is the task of crawling the individual pages to extract metadata, so that search engines such as Google have the best idea of what each page contains.
I hope everyone is getting as much out of the archive as we are. There are some true gems in there, and as we go into 2008 we’ll be making much more of the pdfs throughout the site.