User:Pengo/2gram-species

Binomials found in books

The one hundred most common binomial names found in English-language books.

  1. Homo sapiens (Animalia, Chordata)
  2. Escherichia coli (Bacteria, Proteobacteria)
  3. Staphylococcus aureus (Bacteria, Firmicutes) Staphylococcus
  4. Candida albicans (Fungi, Ascomycota)
  5. Pseudomonas aeruginosa (Bacteria, Proteobacteria) Pseudomonas
  6. Mycobacterium tuberculosis (Bacteria, Actinobacteria) Mycobacterium
  7. Saccharomyces cerevisiae (Fungi, Ascomycota)
  8. Drosophila melanogaster (Animalia)
  9. Zea mays (Plantae, Tracheophyta)
  10. Bacillus subtilis (Bacteria, Firmicutes) Bacillus
  11. Haemophilus influenzae (Bacteria, Proteobacteria) Haemophilus
  12. Pneumocystis carinii (Fungi, Ascomycota) Pneumocystis
  13. Salmonella typhimurium (Bacteria) Salmonella
  14. Treponema pallidum (Bacteria, Spirochaetes)
  15. Streptococcus pneumoniae (Bacteria, Firmicutes) Streptococcus
  16. Phaseolus vulgaris (Plantae)
  17. Clostridium botulinum (Bacteria, Firmicutes)
  18. Listeria monocytogenes (Bacteria, Firmicutes) Listeria
  19. Klebsiella pneumoniae (Bacteria, Proteobacteria)
  20. Xenopus laevis - African clawed frog (Animalia, Chordata) Xenopus
  21. Helicobacter pylori (Bacteria, Proteobacteria)
  22. Neisseria gonorrhoeae (Bacteria, Proteobacteria) Neisseria
  23. Vibrio cholerae - epidemic cholera (Bacteria, Proteobacteria) Vibrio
  24. Pisum sativum - pea (Plantae, Tracheophyta)
  25. Clostridium perfringens (Bacteria, Firmicutes) Clostridium
  26. Entamoeba histolytica (Protozoa, Not assigned) Entamoeba
  27. Chlamydia trachomatis (Bacteria, Chlamydiae) Chlamydia
  28. Streptococcus pyogenes (Bacteria, Firmicutes) *
  29. Aspergillus niger (Fungi, Ascomycota) Aspergillus
  30. Mus musculus - house mouse (Animalia, Chordata)
  31. Nicotiana tabacum - tobacco (Plantae, Tracheophyta)
  32. Giardia lamblia (Protozoa, Sarcomastigophora)
  33. Cannabis sativa - marijuana (Plantae, Tracheophyta)
  34. Salmonella Typhi (Salmonella enterica subsp. enterica, serovar Typhi; Salmonella enterica; Salmonella enterica subsp. enterica) (Bacteria) *
  35. Bacillus thuringiensis (Bacteria, Firmicutes) *
  36. Oryza sativa (Plantae, Tracheophyta)
  37. Serratia marcescens (Bacteria, Proteobacteria) Serratia
  38. Vicia faba - broad bean (Plantae, Tracheophyta)
  39. Neisseria meningitidis (Bacteria, Proteobacteria) *
  40. Triticum aestivum (Plantae, Tracheophyta)
  41. Glycine max - soya bean (Plantae, Tracheophyta)
  42. Bacillus cereus (Bacteria, Firmicutes) *
  43. Bacillus anthracis (Bacteria, Firmicutes) *
  44. Hordeum vulgare (Plantae, Tracheophyta)
  45. Caenorhabditis elegans (Animalia, Nematoda)
  46. Pinus sylvestris - Scots pine (Plantae, Tracheophyta)
  47. Staphylococcus epidermidis (Bacteria, Firmicutes) *
  48. Ricinus communis (Plantae, Tracheophyta) Ricinus
  49. Aedes aegypti (Animalia, Arthropoda)
  50. Cryptococcus neoformans (Fungi, Basidiomycota) Cryptococcus
  51. Neurospora crassa (Fungi, Ascomycota) Neurospora
  52. Medicago sativa - alfalfa (Plantae, Tracheophyta)
  53. Solanum tuberosum (Plantae)
  54. Ginkgo biloba (Plantae, Tracheophyta)
  55. Streptococcus faecalis (now Enterococcus faecalis) (Bacteria) *
  56. Clostridium tetani (Bacteria, Firmicutes) *
  57. Allium cepa (Plantae, Tracheophyta)
  58. Mycobacterium avium (Bacteria, Actinobacteria) *
  59. Mycoplasma pneumoniae (Bacteria, Firmicutes) Mycoplasma
  60. Macaca mulatta - rhesus monkey (Animalia, Chordata)
  61. Clostridium difficile (Bacteria, Firmicutes)
  62. Aspergillus fumigatus (Fungi, Ascomycota) *
  63. Brassica oleracea (Plantae, Tracheophyta)
  64. Histoplasma capsulatum (Fungi, Ascomycota) Histoplasma
  65. Rattus norvegicus - Norway rat (Animalia, Chordata) Rattus
  66. Rana pipiens (Animalia) Rana
  67. Daucus carota (Plantae, Tracheophyta)
  68. Arabidopsis thaliana (Plantae, Tracheophyta)
  69. Mytilus edulis - blue mussel (Animalia, Mollusca)
  70. Beta vulgaris - mangelwurzel (Plantae, Tracheophyta) Beta
  71. Proteus mirabilis (Bacteria, Proteobacteria) Proteus
  72. Corynebacterium diphtheriae (Bacteria, Actinobacteria) Corynebacterium
  73. Schistosoma mansoni (Animalia, Platyhelminthes) Schistosoma
  74. Helianthus annuus (Plantae, Tracheophyta)
  75. Aspergillus flavus (Fungi, Ascomycota) *
  76. Picea abies - Norway spruce (Plantae, Tracheophyta)
  77. Trichinella spiralis (Animalia, Nematoda) Trichinella
  78. Bordetella pertussis (Bacteria, Proteobacteria) Bordetella
  79. Bombyx mori (Animalia, Arthropoda)
  80. Proteus vulgaris (Bacteria, Proteobacteria) *
  81. Mycobacterium leprae (Bacteria, Actinobacteria) *
  82. Borrelia burgdorferi (Bacteria, Spirochaetes) Borrelia
  83. Canis lupus - gray wolf (Animalia, Chordata)
  84. Vitis vinifera (Plantae)
  85. Cyprinus carpio (Animalia, Chordata)
  86. Lycopersicon esculentum (Plantae) Lycopersicon
  87. Apis mellifera - honey bee (Animalia, Arthropoda)
  88. Agrobacterium tumefaciens (Bacteria, Proteobacteria) Agrobacterium
  89. Papaver somniferum - opium poppy (Plantae, Tracheophyta)
  90. Sus scrofa - pig (Animalia, Chordata)
  91. Datura stramonium - thorn apple (Plantae, Tracheophyta)
  92. Trifolium repens (Plantae) Trifolium
  93. Avena sativa (Plantae, Tracheophyta)
  94. Yersinia pestis - bubonic plague (Bacteria, Proteobacteria)
  95. Coccidioides immitis (Fungi, Ascomycota) Coccidioides
  96. Brucella abortus (Bacteria, Proteobacteria) Brucella
  97. Pinus strobus (Plantae)
  98. Brassica napus (Plantae, Tracheophyta)
  99. Lolium perenne (Plantae, Tracheophyta) Lolium
  100. Pseudomonas fluorescens (Bacteria, Proteobacteria) *

Notes

  • Sorted by the total number of books (or volumes) in which the term is found, with the highest count first.
  • Uses Google's 2012 "English (All)" Ngram corpus. This includes scanned English-language fiction and non-fiction books.
  • Homo sapiens (#1) was found in 87,380 volumes. Pseudomonas fluorescens (#100) in 6,562.
  • 30,019,350,634 lines of Google's 2gram data were parsed to create the list. 0.009% of lines were relevant (i.e. contained a species name). The script completed after running continuously for 6.5 days and downloaded around 120 GB of compressed data.
  • If anyone's interested in further processing the generated data or in some variant of it, let me know what you'd like to do so I can send you the relevant file(s) or see what I can do.
  • 51 of the 100 listed scientific names are red links on en.wikt (at the time of this posting). Twelve of the top 1000 are red links on English Wikipedia, but none of those twelve are in the top 100 (this list).
  • I've generated more specific lists which show only plants or vertebrates. When I have time, I may attempt to generate lists for other categories such as invertebrates, or a list just of genera, or perhaps an auto-updating list of only those with missing entries or etymologies, or a list restricted to fiction.
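For a sense of scale, the figures in the notes above can be turned into rough totals. The sketch below is only back-of-the-envelope arithmetic: the constants are copied from the notes, the function names are made up for illustration, and since 0.009% is a rounded figure the results are approximate. It works out to roughly 2.7 million relevant lines, and on the order of 53,000 lines handled per second if the 6.5 days are treated as pure processing time (they also included downloading).

```python
# Figures quoted in the notes above.
TOTAL_LINES = 30_019_350_634      # 2gram lines parsed
RELEVANT_FRACTION = 0.009 / 100   # 0.009% of lines contained a species name
RUN_DAYS = 6.5                    # continuous running time of the script


def relevant_lines(total=TOTAL_LINES, fraction=RELEVANT_FRACTION):
    """Approximate number of 2gram lines that contained a species name."""
    return round(total * fraction)


def lines_per_second(total=TOTAL_LINES, days=RUN_DAYS):
    """Average throughput, assuming the whole run was spent processing lines."""
    return total / (days * 24 * 60 * 60)


if __name__ == "__main__":
    print(f"relevant lines: about {relevant_lines():,}")
    print(f"throughput: about {lines_per_second():,.0f} lines per second")
```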

Inclusion criteria for species found in books:

  • Had to be found already capitalized according to the modern convention for binomial names (i.e. the only capital being the first letter of the genus).
  • Also had to be found in the Catalogue of Life. Of CoL's approximately 2,485,495 binomial species and synonyms, only 52,669 were found in books.
  • Had to appear in a minimum of 40 books (or volumes). This appears to be the minimum to be included in Google's 2gram data.
  • Trinomial names (e.g. subspecies) would have appeared as binomial to the search, and counted as such.
  • Only books published after 1950 were included in the tally. This is partly to keep it current and partly to keep it fair: some binomials were capitalized differently before this date, so would have been left out in the final tally. (See the note on "botanical works published before the 1950s"). However, volume counts in these notes include the totals of all years available.
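The criteria above can be sketched as a small filter. This is not the script that produced the list, only a minimal illustration under stated assumptions: each 2gram line is taken to be tab-separated as ngram, year, match count, volume count; `known_names` is a hypothetical stand-in for the set of binomials and synonyms from the Catalogue of Life; the regular expression is a simplification that ignores hyphenated epithets; and applying the 40-volume minimum to the post-1950 tally is a guess at how the threshold was used.

```python
import re
from collections import defaultdict

# Modern binomial capitalization: only the first letter of the genus is a capital.
# Simplified pattern (assumption): plain ASCII letters, no hyphens.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+$")


def tally_binomials(lines, known_names, after_year=1950, min_volumes=40):
    """Rank binomials by the number of volumes they appear in.

    lines:       iterable of 2gram records, assumed to be
                 "ngram<TAB>year<TAB>match_count<TAB>volume_count"
    known_names: hypothetical set of accepted binomials (e.g. from Catalogue of Life)
    """
    totals = defaultdict(int)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip malformed records
        ngram, year, _match_count, volume_count = parts
        if not BINOMIAL.match(ngram):
            continue  # wrong capitalization, or not two plain words
        if ngram not in known_names:
            continue  # not a known species name
        if int(year) <= after_year:
            continue  # only books published after the cutoff year
        totals[ngram] += int(volume_count)

    ranked = [(name, vols) for name, vols in totals.items() if vols >= min_volumes]
    # Highest volume count first; ties broken alphabetically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Because the check only looks at two-word strings, the first two words of a trinomial would pass as a binomial, which matches the note on trinomial names above.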

Pengo (talk) 08:31, 29 September 2014 (UTC)