en.unionpedia.org

Text mining, the Glossary

Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text.[1]

Table of Contents

  1. 134 relations: Advertising network, Affect (psychology), Algorithm, Annotation, Association of European Research Libraries, Australian Law Reform Commission, Authors Guild, Inc. v. Google, Inc., Automatic summarization, Big data, Bioinformatics, Biology, Biomedicine, Biotechnology and Biological Sciences Research Council, Blog, Book, Business intelligence, Business rule, Classification, Commercial software, Competitive intelligence, Concept mining, Content analysis, Context (linguistics), Copyright and Information Society Directive 2001, Copyright law of Australia, Copyright law of Japan, Copyright law of the European Union, Copyright law of the United States, Coreference, Corpus manager, Counterintelligence, Customer attrition, Customer relationship management, Data and information visualization, Data mining, Data model, Database, Database Directive, Database index, Digital journalism, Dimensionality reduction, Discovery (observation), Document, Document classification, Document clustering, Document processing, Document type definition, Electronic discovery, Email, Email filtering, and 84 more.

  2. Applied data mining
  3. Text

Advertising network

An online advertising network or ad network is a company that connects advertisers to websites that want to host advertisements.

See Text mining and Advertising network

Affect (psychology)

Affect, in psychology, is the underlying experience of feeling, emotion, attachment, or mood.

See Text mining and Affect (psychology)

Algorithm

In mathematics and computer science, an algorithm is a finite sequence of mathematically rigorous instructions, typically used to solve a class of specific problems or to perform a computation.

See Text mining and Algorithm

Annotation

An annotation is extra information associated with a particular point in a document or other piece of information.

See Text mining and Annotation

Association of European Research Libraries

The Association of European Research Libraries (Ligue des Bibliothèques Européennes de Recherche or LIBER) is a professional association of national and university research libraries in Europe.

See Text mining and Association of European Research Libraries

Australian Law Reform Commission

The Australian Law Reform Commission (often abbreviated to ALRC) is an Australian independent statutory body established to conduct reviews into the law of Australia.

See Text mining and Australian Law Reform Commission

Authors Guild, Inc. v. Google, Inc.

Authors Guild v. Google, 804 F.3d 202 (2nd Cir. 2015), was a copyright case heard in federal court for the Southern District of New York, and then the Second Circuit Court of Appeals, between 2005 and 2015.

See Text mining and Authors Guild, Inc. v. Google, Inc.

Automatic summarization

Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Text mining and automatic summarization both fall under computational linguistics and natural language processing.

See Text mining and Automatic summarization

Big data

Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software.

See Text mining and Big data

Bioinformatics

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex.

See Text mining and Bioinformatics

Biology

Biology is the scientific study of life.

See Text mining and Biology

Biomedicine

Biomedicine (also referred to as Western medicine, mainstream medicine or conventional medicine) is a branch of medical science that applies biological and physiological principles to clinical practice.

See Text mining and Biomedicine

Biotechnology and Biological Sciences Research Council

Biotechnology and Biological Sciences Research Council (BBSRC), part of UK Research and Innovation, is a non-departmental public body (NDPB), and is the largest UK public funder of non-medical bioscience.

See Text mining and Biotechnology and Biological Sciences Research Council

Blog

A blog (a truncation of "weblog") is an informational website consisting of discrete, often informal diary-style text entries (posts).

See Text mining and Blog

Book

A book is a medium for recording information in the form of writing or images.

See Text mining and Book

Business intelligence

Business intelligence (BI) consists of strategies and technologies used by enterprises for the data analysis and management of business information.

See Text mining and Business intelligence

Business rule

A business rule defines or constrains some aspect of a business.

See Text mining and Business rule

Classification

Classification is usually understood to mean the allocation of objects to certain pre-existing classes or categories.

See Text mining and Classification

Commercial software

Commercial software, or, less commonly, payware, is computer software that is produced for sale or that serves commercial purposes.

See Text mining and Commercial software

Competitive intelligence

Competitive intelligence (CI) is the process and forward-looking practices used in producing knowledge about the competitive environment to improve organizational performance.

See Text mining and Competitive intelligence

Concept mining

Concept mining is an activity that results in the extraction of concepts from artifacts. Text mining and concept mining both fall under natural language processing.

See Text mining and Concept mining

Content analysis

Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video.

See Text mining and Content analysis

Context (linguistics)

In semiotics, linguistics, sociology and anthropology, context refers to those objects or entities which surround a focal event, in these disciplines typically a communicative event, of some kind.

See Text mining and Context (linguistics)

Copyright and Information Society Directive 2001

The Copyright and Information Society Directive 2001 is a directive in European Union law that was enacted to implement the WIPO Copyright Treaty and to harmonise aspects of copyright law across Europe, such as copyright exceptions.

See Text mining and Copyright and Information Society Directive 2001

Copyright law of Australia

The copyright law of Australia defines the legally enforceable rights of creators of creative and artistic works under Australian law.

See Text mining and Copyright law of Australia

Copyright law of Japan

Japanese copyright laws consist of two parts: "Author's Rights" and "Neighbouring Rights".

See Text mining and Copyright law of Japan

Copyright law of the European Union

The copyright law of the European Union is the copyright law applicable within the European Union.

See Text mining and Copyright law of the European Union

Copyright law of the United States

The copyright law of the United States grants monopoly protection for "original works of authorship".

See Text mining and Copyright law of the United States

Coreference

In linguistics, coreference, sometimes written co-reference, occurs when two or more expressions refer to the same person or thing; they have the same referent.

See Text mining and Coreference

Corpus manager

A corpus manager (corpus browser or corpus query system) is a tool for multilingual corpus analysis, which allows effective searching in corpora.

See Text mining and Corpus manager

Counterintelligence

Counterintelligence (counter-intelligence) or counterespionage (counter-espionage) is any activity aimed at protecting an agency's intelligence program from an opposition's intelligence service.

See Text mining and Counterintelligence

Customer attrition

Customer attrition, also known as customer churn, customer turnover, or customer defection, is the loss of clients or customers.

See Text mining and Customer attrition

Customer relationship management

Customer relationship management (CRM) is a process in which a business or other organization administers its interactions with customers, typically using data analysis to study large amounts of information.

See Text mining and Customer relationship management

Data and information visualization

Data and information visualization (data viz/vis or info viz/vis) is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items.

See Text mining and Data and information visualization

Data mining

Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

See Text mining and Data mining

Data model

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities.

See Text mining and Data model

Database

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data.

See Text mining and Database

Database Directive

The Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases is a directive of the European Union in the field of copyright law, made under the internal market provisions of the Treaty of Rome.

See Text mining and Database Directive

Database index

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.

See Text mining and Database index

Digital journalism

Digital journalism, also known as netizen journalism or online journalism, is a contemporary form of journalism where editorial content is distributed via the Internet, as opposed to publishing via print or broadcast.

See Text mining and Digital journalism

Dimensionality reduction

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.

See Text mining and Dimensionality reduction

Discovery (observation)

Discovery is the act of detecting something new, or something previously unrecognized as meaningful.

See Text mining and Discovery (observation)

Document

A document is a written, drawn, presented, or memorialized representation of thought, often the manifestation of non-fictional, as well as fictional, content.

See Text mining and Document

Document classification

Document classification or document categorization is a problem in library science, information science and computer science. Text mining and document classification both fall under natural language processing.

See Text mining and Document classification

Document clustering

Document clustering (or text clustering) is the application of cluster analysis to textual documents.

See Text mining and Document clustering

Document processing

Document processing is a field of research and a set of production processes aimed at making an analog document digital. Text mining and document processing both fall under applied data mining.

See Text mining and Document processing

Document type definition

A document type definition (DTD) is a specification file that contains a set of markup declarations that define a document type for an SGML-family markup language (GML, SGML, XML, HTML).

See Text mining and Document type definition

Electronic discovery

Electronic discovery (also ediscovery or e-discovery) refers to discovery in legal proceedings such as litigation, government investigations, or Freedom of Information Act requests, where the information sought is in electronic format (often referred to as electronically stored information or ESI).

See Text mining and Electronic discovery

Email

Electronic mail (email or e-mail) is a method of transmitting and receiving messages using electronic devices.

See Text mining and Email

Email filtering

Email filtering is the processing of email to organize it according to specified criteria.

See Text mining and Email filtering

Encryption

In cryptography, encryption is the process of transforming (more specifically, encoding) information in a way that, ideally, only authorized parties can decode.

See Text mining and Encryption

Engineering and Physical Sciences Research Council

The Engineering and Physical Sciences Research Council (EPSRC) is a British Research Council that provides government funding for grants to undertake research and postgraduate degrees in engineering and the physical sciences, mainly to universities in the United Kingdom.

See Text mining and Engineering and Physical Sciences Research Council

Entity–relationship model

An entity–relationship model (or ER model) describes interrelated things of interest in a specific domain of knowledge.

See Text mining and Entity–relationship model

European Commission

The European Commission (EC) is the primary executive arm of the European Union (EU).

See Text mining and European Commission

Exploratory data analysis

In statistics, exploratory data analysis (EDA) is an approach of analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods.

See Text mining and Exploratory data analysis

Fair dealing

Fair dealing is a limitation and exception to the exclusive rights granted by copyright law to the author of a creative work.

See Text mining and Fair dealing

Fair use

Fair use is a doctrine in United States law that permits limited use of copyrighted material without having to first acquire permission from the copyright holder.

See Text mining and Fair use

File system

In computing, a file system or filesystem (often abbreviated to FS or fs) governs file organization and access.

See Text mining and File system

Full-text search

In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database.

See Text mining and Full-text search

Gensim

Gensim is an open-source library for unsupervised topic modeling, document indexing, retrieval by similarity, and other natural language processing functionalities, using modern statistical machine learning.

See Text mining and Gensim

GoPubMed

GoPubMed was a knowledge-based search engine for biomedical texts.

See Text mining and GoPubMed

Homonym

In linguistics, homonyms are words which are either homographs—words that have the same spelling (regardless of pronunciation)—or homophones—words that have the same pronunciation (regardless of spelling)—or both.

See Text mining and Homonym

IBM

International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American multinational technology company headquartered in Armonk, New York and present in over 175 countries.

See Text mining and IBM

Information

Information is an abstract concept that refers to something which has the power to inform.

See Text mining and Information

Information Awareness Office

The Information Awareness Office (IAO) was established by the United States Defense Advanced Research Projects Agency (DARPA) in January 2002 to bring together several DARPA projects focused on applying surveillance and information technology to track and monitor terrorists and other asymmetric threats to U.S.

See Text mining and Information Awareness Office

Information extraction

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. Text mining and information extraction both fall under natural language processing.

See Text mining and Information extraction

Information retrieval

Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. Text mining and information retrieval both fall under natural language processing.

See Text mining and Information retrieval

Intelligence analysis

Intelligence analysis is the application of individual and collective cognitive methods to weigh data and test hypotheses within a secret socio-cultural context.

See Text mining and Intelligence analysis

Jisc

Jisc is a United Kingdom not-for-profit organisation that provides network and IT services and digital resources in support of further and higher education and research, as well as the public sector.

See Text mining and Jisc

Lexical analysis

Lexical tokenization is the conversion of a text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a "lexer" program.

See Text mining and Lexical analysis
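
The definition above can be illustrated with a minimal sketch of a regex-based lexer. The token categories (NUMBER, WORD, PUNCT), their patterns, and the function name tokenize are illustrative assumptions, not taken from any particular tool; a real lexer would define categories suited to its own language.

```python
import re

# Hypothetical token categories, tried in this order for each position.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD", r"[A-Za-z]+(?:'[A-Za-z]+)?"),
    ("PUNCT", r"[^\w\s]"),
]

# One alternation of named groups; the group that matches names the category.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return (category, lexeme) pairs; whitespace is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


tokens = tokenize("Text mining found 42 patterns.")
for category, lexeme in tokens:
    print(category, lexeme)
```

Each token carries both its text and the category assigned by the lexer, which is what later stages such as parsing or part-of-speech tagging would consume.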

Limitations and exceptions to copyright

Limitations and exceptions to copyright are provisions, in local copyright law or the Berne Convention, which allow for copyrighted works to be used without a license from the copyright owner.

See Text mining and Limitations and exceptions to copyright

Linguistics

Linguistics is the scientific study of language.

See Text mining and Linguistics

List of life sciences

This list of life sciences comprises the branches of science that involve the scientific study of life – such as microorganisms, plants, and animals including human beings.

See Text mining and List of life sciences

List of text mining software

Text mining computer programs are available from many commercial and open source companies and sources.

See Text mining and List of text mining software

Machine learning

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data and thus perform tasks without explicit instructions.

See Text mining and Machine learning

Machine translation

Machine translation is the use of computational techniques to translate text or speech from one language to another, including the contextual, idiomatic and pragmatic nuances of both languages. Text mining and machine translation both fall under computational linguistics and natural language processing.

See Text mining and Machine translation

Macromolecular docking

Macromolecular docking is the computational modelling of the quaternary structure of complexes formed by two or more interacting biological macromolecules.

See Text mining and Macromolecular docking

Market sentiment

Market sentiment, also known as investor attention, is the general prevailing attitude of investors as to anticipated price development in a market.

See Text mining and Market sentiment

Microsoft

Microsoft Corporation is an American multinational corporation and technology company headquartered in Redmond, Washington.

See Text mining and Microsoft

Name resolution (semantics and text extraction)

In semantics and text extraction, name resolution refers to the ability of text mining software to determine which actual person, actor, or object a particular use of a name refers to. Text mining and name resolution (semantics and text extraction) both fall under computational linguistics.

See Text mining and Name resolution (semantics and text extraction)

Named-entity recognition

Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. Text mining and named-entity recognition both fall under computational linguistics.

See Text mining and Named-entity recognition

National Centre for Text Mining

The National Centre for Text Mining (NaCTeM) is a publicly funded text mining (TM) centre. Text mining and the National Centre for Text Mining both fall under computational linguistics.

See Text mining and National Centre for Text Mining

National Institutes of Health

The National Institutes of Health, commonly referred to as NIH, is the primary agency of the United States government responsible for biomedical and public health research.

See Text mining and National Institutes of Health

National security

National security, or national defence (national defense in American English), is the security and defence of a sovereign state, including its citizens, economy, and institutions, which is regarded as a duty of government.

See Text mining and National security

Natural language

In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that occurs naturally in a human community by a process of use, repetition, and change without conscious planning or premeditation. Text mining and natural language both fall under natural language processing.

See Text mining and Natural language

Natural language processing

Natural language processing (NLP) is an interdisciplinary subfield of computer science and artificial intelligence. Text mining and natural language processing both fall under computational linguistics.

See Text mining and Natural language processing

Natural Language Toolkit

The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English, written in the Python programming language. Text mining and the Natural Language Toolkit both fall under natural language processing.

See Text mining and Natural Language Toolkit

Nature (journal)

Nature is a British weekly scientific journal founded and based in London, England.

See Text mining and Nature (journal)

News analytics

In trading strategy, news analysis refers to the measurement of the various qualitative and quantitative attributes of textual (unstructured data) news stories. Text mining and news analytics both fall under natural language processing.

See Text mining and News analytics

Noun phrase

A noun phrase – or NP or nominal (phrase) – is a phrase that usually has a noun or pronoun as its head, and has the same grammatical functions as a noun.

See Text mining and Noun phrase

Novelty (patent)

Novelty is one of the patentability requirements for a patent claim, whose purpose is to prevent issuing patents on known things, i.e. to prevent public knowledge from being taken away from the public domain.

See Text mining and Novelty (patent)

Offender profiling

Offender profiling, also known as criminal profiling, is an investigative strategy used by law enforcement agencies to identify likely suspects and has been used by investigators to link cases that may have been committed by the same perpetrator.

See Text mining and Offender profiling

Ontology learning

Ontology learning (ontology extraction, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. Text mining and ontology learning both fall under natural language processing.

See Text mining and Ontology learning

Open access

Open access (OA) is a set of principles and a range of practices through which research outputs are distributed online, free of access charges or other barriers.

See Text mining and Open access

Open Mind Common Sense

Open Mind Common Sense (OMCS) is an artificial intelligence project based at the Massachusetts Institute of Technology (MIT) Media Lab whose goal is to build and utilize a large commonsense knowledge base from the contributions of many thousands of people across the Web.

See Text mining and Open Mind Common Sense

Open source

Open source is source code that is made freely available for possible modification and redistribution.

See Text mining and Open source

Parsing

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar.

See Text mining and Parsing

Part-of-speech tagging

In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context.

See Text mining and Part-of-speech tagging

Pattern matching

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern.

See Text mining and Pattern matching

Pattern recognition

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data.

See Text mining and Pattern recognition

Plain text

In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters.

See Text mining and Plain text

Predictive analytics

Predictive analytics is a form of business analytics applying machine learning to generate a predictive model for certain business applications.

See Text mining and Predictive analytics

Protein

Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues.

See Text mining and Protein

PubGene

PubGene AS is a bioinformatics company located in Oslo, Norway and is the daughter company of PubGene Inc.

See Text mining and PubGene

Readability

Readability is the ease with which a reader can understand a written text.

See Text mining and Readability

Record linkage

Record linkage (also known as data matching, data linkage, entity resolution, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases).

See Text mining and Record linkage

Relevance (information retrieval)

In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user.

See Text mining and Relevance (information retrieval)

Research

Research is "creative and systematic work undertaken to increase the stock of knowledge".

See Text mining and Research

Research Councils UK

Research Councils UK, sometimes known as RCUK, was a non-departmental public body that coordinated science policy in the United Kingdom from 2002 to 2018.

See Text mining and Research Councils UK

Review

A review is an evaluation of a publication, product, service, or company or a critical take on current affairs in literature, politics or culture.

See Text mining and Review

Search engine

A search engine is a software system that provides hyperlinks to web pages and other relevant information on the Web in response to a user's query.

See Text mining and Search engine

Security appliance

A security appliance is any form of server appliance that is designed to protect computer networks from unwanted traffic.

See Text mining and Security appliance

Semantic Web

The Semantic Web, sometimes known as Web 3.0 (not to be confused with Web3), is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C).

See Text mining and Semantic Web

Sentiment analysis

Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Text mining and sentiment analysis both fall under natural language processing.

See Text mining and Sentiment analysis

Sequential pattern mining

Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence.

See Text mining and Sequential pattern mining

Sexism

Sexism is prejudice or discrimination based on one's sex or gender.

See Text mining and Sexism

Social media

Social media are interactive technologies that facilitate the creation, sharing and aggregation of content (such as ideas, interests, and other forms of expression) amongst virtual communities and networks.

See Text mining and Social media

Social science

Social science is one of the branches of science, devoted to the study of societies and the relationships among individuals within those societies.

See Text mining and Social science

Statistics

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

See Text mining and Statistics

Subject–verb–object word order

In linguistic typology, subject–verb–object (SVO) is a sentence structure where the subject comes first, the verb second, and the object third.

See Text mining and Subject–verb–object word order

Tag (metadata)

In information systems, a tag is a keyword or term assigned to a piece of information (such as an Internet bookmark, multimedia, database record, or computer file).

See Text mining and Tag (metadata)

Text Analysis Portal for Research

TAPoR (Text Analysis Portal for Research) is a gateway that highlights tools and code snippets usable for textual criticism of all types. Text mining and the Text Analysis Portal for Research both fall under computational linguistics.

See Text mining and Text Analysis Portal for Research

Text corpus

In linguistics and natural language processing, a corpus (plural: corpora) or text corpus is a dataset consisting of natively digital and older, digitized language resources, either annotated or unannotated. Text mining and text corpus both fall under computational linguistics.

See Text mining and Text corpus

Tribune Media

Tribune Media Company, also known as Tribune Company, was an American multimedia conglomerate headquartered in Chicago, Illinois.

See Text mining and Tribune Media

UC Berkeley School of Information

The University of California, Berkeley, School of Information, also known as the UC Berkeley School of Information or the I School, is a graduate school and, created in 1994, the newest of the schools at the University of California, Berkeley.

See Text mining and UC Berkeley School of Information

University of Alberta

The University of Alberta (also known as U of A or UAlberta) is a public research university located in Edmonton, Alberta, Canada.

See Text mining and University of Alberta

University of California, Berkeley

The University of California, Berkeley (UC Berkeley, Berkeley, Cal, or California) is a public land-grant research university in Berkeley, California.

See Text mining and University of California, Berkeley

University of Manchester

The University of Manchester is a public research university in Manchester, England.

See Text mining and University of Manchester

University of Tokyo

The University of Tokyo (abbreviated as Tōdai (東大) in Japanese and UTokyo in English) is a public research university in Bunkyō, Tokyo, Japan.

See Text mining and University of Tokyo

Unstructured data

Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner.

See Text mining and Unstructured data

W-shingling

In natural language processing, a w-shingling is a set of unique shingles (that is, n-grams), each of which is a contiguous subsequence of tokens within a document, which can then be used to ascertain the similarity between documents. Text mining and w-shingling both fall under natural language processing.

See Text mining and W-shingling
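
As a rough illustration of the definition above, the sketch below builds the set of unique w-token shingles for two short documents and compares them. The entry does not name a similarity measure; Jaccard similarity (intersection size over union size) is assumed here as one common choice, and the function names shingles and jaccard are hypothetical.

```python
def shingles(tokens, w):
    """Return the set of unique contiguous w-token shingles in tokens."""
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure; 1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc_a = "a rose is a rose is a rose".split()
doc_b = "a rose is a flower".split()

s_a = shingles(doc_a, 4)  # repeated windows collapse into 3 unique shingles
s_b = shingles(doc_b, 4)  # 2 unique shingles
similarity = jaccard(s_a, s_b)
print(similarity)  # 0.25: one shared shingle out of four distinct ones
```

Because shingles are stored in a set, repeated phrases within one document count only once, which is what makes the shingling a "set of unique shingles".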

Website

A website (also written as a web site) is a collection of web pages and related content that is identified by a common domain name and published on at least one web server.

See Text mining and Website

Weka (software)

Waikato Environment for Knowledge Analysis (Weka) is a collection of machine learning and data analysis free software licensed under the GNU General Public License.

See Text mining and Weka (software)

WordNet

WordNet is a lexical database that links words through semantic relations, including synonyms, hyponyms, and meronyms. Text mining and WordNet both fall under computational linguistics.

See Text mining and WordNet

See also

Applied data mining

Text

References

[1] https://en.wikipedia.org/wiki/Text_mining

Also known as Applications of text mining, Auto-entity extraction, Data and text mining, Intelligent text analysis, Text analytics, Text and data mining, Text-mining, Textmining.
