Text mining and data mining are related methods for identifying patterns in large bodies of text and data, respectively. Each covers a range of different techniques.
"What is Text Mining?" from Elsevier
"How does Text Mining Work?" from Elsevier
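As a small illustration of what "identifying patterns" can mean in practice, the Python sketch below counts the most frequent words in a short passage. Word-frequency counting is only one of the simplest text mining techniques; the function name `word_frequencies`, the tokenizing rule, and the sample sentence are assumptions made for this example, not part of any particular tool.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common lowercase word tokens in text."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    # Real projects usually need a more careful tokenizer.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


# Made-up sample text, used only to show the shape of the result.
sample = "The cat sat on the mat. The mat was red."
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('mat', 2)]
```

On a real corpus you would typically also remove very common words such as "the" before counting, so that more informative terms rise to the top.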
Resources and Training
Voyant Tools is a web-based platform that generates statistical information about text corpora and can offer a preliminary overview of your text(s). For text-wrangling and text-mining skills, consult the University of Southern California's excellent list of training resources. Programming Historian also has excellent tutorials on working with text and textual data.
Getting Textual Datasets
Some vendors, publishers, journals, and other organizations make text available through application programming interfaces (APIs); those available to University of Toronto community members are listed below. University of Toronto Libraries also has some locally loaded materials available for text mining. Some openly accessible collections may also be useful: the University of Illinois at Urbana-Champaign has compiled a list of open resources for text mining.
For help with using APIs or to inquire about available materials for text mining, contact us.
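To give a sense of what working with one of these APIs involves, here is a minimal Python sketch against the public Crossref REST API, using only the standard library. It assumes the `/works` endpoint with its `query` and `rows` parameters and the usual `message` / `items` response layout; the helper names (`build_query_url`, `extract_titles`, `search`) and the sample payload are hypothetical and exist only for illustration. Other APIs in the list below differ in their endpoints, authentication, and terms of use, so check each provider's documentation before relying on a pattern like this.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the Crossref REST API "works" endpoint.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(terms, rows=5):
    """Build a search URL for free-text terms, limited to `rows` results."""
    return CROSSREF_WORKS + "?" + urlencode({"query": terms, "rows": rows})


def extract_titles(payload):
    """Pull (DOI, first title) pairs out of a decoded JSON response."""
    items = payload.get("message", {}).get("items", [])
    # Titles are returned as a list; some records have none.
    return [(item.get("DOI"), (item.get("title") or [""])[0]) for item in items]


def search(terms, rows=5):
    """Query the API over the network and return (DOI, title) pairs."""
    with urlopen(build_query_url(terms, rows), timeout=30) as resp:
        return extract_titles(json.load(resp))


# Offline demonstration with a made-up payload in the assumed layout.
sample_payload = {
    "message": {
        "items": [{"DOI": "10.1234/example", "title": ["An Example Article"]}]
    }
}
print(build_query_url("text mining", rows=2))
print(extract_titles(sample_payload))
```

Calling `search("text mining", rows=2)` would send a live request. For larger jobs, follow the provider's guidance on rate limits and on identifying yourself in requests.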
APIs
Scholarly Publishing APIs
- arXiv
- BioMed Central
- SAO/NASA
- CORE
- CrossRef REST
- Dataverse Network
- Europe PubMed
- HathiTrust Bibliographic
- HathiTrust Data
- IEEE Xplore
- JSTOR Data for Research
- National Library of Medicine
- OpenAlex
- ORCID
- PLOS Article-Level Metrics
- PLOS Search
- Scholars Portal (SP) Journals API (Beta)
- ScienceDirect
- Scopus
- Springer
- Web of Science
- Wiley Text and Data Mining
Humanities Research APIs
- Chronicling America
- Digital Public Library of America
- Europeana
- Library of Congress
- Metropolitan Museum of Art Collection
- OCLC WorldCat Search
Scientific Research APIs
Government and Institutional Data APIs
GIS Research APIs
We acknowledge MIT Libraries, Berkeley Libraries, and CEU Libraries for providing some content and inspiration for this page.