Wiktionary:Searchable external archives
Wiktionary has an attestation process for entries (WT:ATTEST), which requires that citations used in attestation come from permanently recorded media (that is, media that is durably archived).
This is a list of durable archives that are searchable for free. It is intended as a resource for finding citations of words to show that they satisfy the Criteria for Inclusion. Not listed are resources that require paid membership, even if only after a trial period, or that otherwise charge a fee to perform the search or retrieve quotations. This excludes sites such as Amazon.com, which requires a previous purchase to preview material, and Jstor.org, which requires a subscription.
The most commonly cited sources are printed books, magazine and journal articles, and newspapers. Books.google.com is Wiktionary's go-to engine for searching books and some magazines, and scholar.google.com is a good engine for searching academic, scientific and medical journals. (Note that Google Scholar can be used to find mathematical symbols that are otherwise ignored by search engines and hence unfindable: search for the symbols' LaTeX notation.) Issuu.com is a large index of newspapers and magazines. (Note that Google Books, Google Scholar and Issuu sometimes index e-publications which do not exist in print, so mere inclusion in one of these indices is not a guarantee that a source is durably archived. Use parameters such as id in Template:quote-book, Template:quote-journal, etc. to help confirm that printed materials are durably archived.)
Wikisource and Project Gutenberg also maintain large, searchable collections of printed works, as does the HathiTrust Digital Library. The Internet Archive provides full-text search over its large collection of scanned books and magazines; material there that is still under copyright can be consulted for free with an account.
Laws are also durably archived, and several websites provide searchable corpora of them:
- Ireland: Houses of the Oireachtas (debates.oireachtas.ie, historical-debates.oireachtas.ie)
- United Nations Educational, Scientific and Cultural Organization
- United States of America, Federal: FindLaw
- United States of America: Legal Information Institute
Resources with narrower scope
Several institutions maintain corpora of English-language works; in alphabetical order, these include:
- Brigham Young University Corpus of Contemporary American English
- British National Corpus
- The Free Library thefreelibrary.com
When attempting to attest an obscure, obsolete, or dialectal term, it can be useful to consult the Century Dictionary and Wright's English Dialect Dictionary, as these often provide pointers to books/manuscripts where the terms have been used.
Numerous websites maintain searchable copies of the Hebrew and Greek texts of the Bible, as well as numerous English, Latin, and other-language translations. These include BibleGateway.com, Biblehub.com, and Bible.cc.
Resources specific to languages other than English
- Austrian literature online (German)
- Biblio (Portuguese)
- Bibliothèque nationale de France (French)
- Germany: Klaus Graf's Zeitungsarchive search engine (German)
  - custom search in several German newspaper archives at once, including:
    - zeus.zeit.de, welt.de, netzeitung., taz.de, berlinonline.de, spiegel.de, stern.de, freitag.de, jungewelt.de, nd-online.de
- Germany: Internet-Links für Journalisten (German) recherchetipps.de
- Germany, Berlin: taz - die tageszeitung (German) www.taz.de
- Vietnam: Thư viện Quốc gia Việt Nam (Vietnamese; look for collections like )
Audio and video media
Some audio and video media produced in some countries are durably archived by libraries; these include commercially released songs, motion pictures, and television shows. The Internet Movie Script Database (imsdb.com) provides a searchable archive of movie scripts.
Usenet is considered durably archived because its archives are decentralized. It has been accessible continuously since 1980, before the creation of the World Wide Web. It can be accessed through Google Groups.
Other online media: websites are not durable
Websites are not considered durably archived; do not add any web search engines here. Sites such as web.archive.org and WebCite attempt to archive the Internet where possible, but at present cannot be considered durable because they are at the mercy of the original copyright holders. (Note: citations from the web may be useful if they are particularly good examples of the use of a word or sense, and may be retained for this reason even though they do not help the word meet CFI.)
Media resources such as YouTube, intended for online use only, are not considered durably archived. If the material is taken from another source, such as a movie or television show, cite the original source.
Monumental inscriptions such as runestones are also durable, particularly because they are often reproduced in printed literature. Various websites document runic and other inscriptions in a searchable way; these include CISP for Celtic inscriptions, Rundata for Norse Runic inscriptions (requires downloading a client), and the Epigraphic Database Heidelberg for Latin inscriptions (search page).