I S K O

edited by Birger Hjørland and Claudio Gnoli

 

Document retrieval

by

Table of contents:
1. Introduction
    1.1 Terminological issues
    1.2 Theoretical developments in information science
    1.3 Methodological approach
2. The epistemological importance of the concepts “document” and “literature”
    2.1 The epistemic importance of documents
    2.2 Epistemology of fact retrieval
    2.3 Source criticism
3. ChatGPT as an information provider
    3.1 Case 1: The application of logical division in the sciences
    3.2 Case 2: The Flakpanzer Gepard and drone warfare in Ukraine
    3.3 Case 3: Functional classification
    3.4 Case 4: Theory of knowledge in the Dewey Decimal Classification
    3.5 Lessons learned about ChatGPT's relation to documents
4. Conclusion
Acknowledgments
Endnotes
References
Colophon

Abstract:
ChatGPT and related technologies have revived an old issue in information science (IS) concerning information retrieval (IR) versus document retrieval. Since 1950, the term IR has primarily been used as a misnomer for document retrieval. This problematic terminology reflects a desire to go beyond documents and provide, in response to user queries, not lists of documents but direct answers. Only with the emergence of large language models such as ChatGPT has the goal of directly informing users appeared to many as justifiable in relation to IR. Such models, however, still depend on input in the form of documents. A basic problem with large language models is their inability to establish a valid connection between their answers and the sources on which they are based. Whereas scholarly norms dictate that all claims be explicitly supported by the sources and arguments used, this cannot be done satisfactorily by ChatGPT, which represents a fundamental limitation of this technology. Neglecting the documentary basis in all forms of IR is naïve, and the core concept in IS should be understood as document retrieval. Recognizing this distinction is essential for enabling users to maintain control over the search and to perform “source criticism”.

1. Introduction


1.1 Terminological issues

Mooers (1950, 5) coined the term → information retrieval (IR) at almost the same time as the term → information science (IS) and related phrases were introduced [1]. He defined IR as “finding information whose location or very existence is a priori unknown”. However, two senses of “finding information” are conflated in this definition and should be distinguished:

  1. Finding concrete answers to queries (i.e., information recorded in documents) [2]. This may be termed IR. Near synonyms and associated terms include fact retrieval [3], question answering, chatbots [4], and lookup searches [5].
  2. Finding → documents that contain answers to queries (or rather, that with some probability contain such answers). This is better termed document retrieval. Synonyms include searching information sources; near-synonyms include literature searching, reference retrieval, and the retrieval of document surrogates such as abstracts or extracts.
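
The two senses can be contrasted in a toy sketch. The corpus, the scoring scheme, and the function names below are all invented for illustration; real systems are vastly more sophisticated, but the division of labor is the same: sense 2 returns document identifiers, sense 1 returns an answer.

```python
# Toy contrast between the two senses of "finding information".
# All documents, scores, and names here are invented for illustration.

corpus = {
    "doc1": "Mooers coined the term information retrieval in 1950.",
    "doc2": "Document retrieval returns references, not answers.",
    "doc3": "Fact retrieval aims to answer queries directly.",
}

def document_retrieval(query: str) -> list[str]:
    """Sense 2: return identifiers of documents that may contain an answer,
    ranked by a crude term-overlap score."""
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(q_terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

def fact_retrieval(query: str) -> str:
    """Sense 1: return a direct answer; here simply the text of the
    top-ranked document, where a real system would synthesize an answer."""
    hits = document_retrieval(query)
    return corpus[hits[0]] if hits else "No answer found."

print(document_retrieval("who coined information retrieval"))
print(fact_retrieval("who coined information retrieval"))
```

Note that even this toy fact-retrieval function depends on document retrieval internally; it merely hides the documents from the user, which is the very issue discussed in this article.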

The idea and desirability of “fact retrieval” predate the establishment of IS, as it was advocated, for example, by Paul Otlet and Julius Otto Kaiser, as documented by information researcher Dousa (2010) [6]. Distinguished researchers in IS have also regarded fact retrieval as the ultimate goal of the discipline. Karen Spärck Jones (1987, 9), for instance, claimed: “we are concerned with access and, more materially, indirect access to the information the user wants: he wants the information in the documents, but the system only gives him the documents”.

Despite this aim of moving from providing documents to providing answers, many researchers in the early development of IR realized that the field was in practice about document retrieval. Lancaster (1968), for example, wrote:

An information retrieval system does not inform (i.e., change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request.

Kochen (1974, 1) formulated it this way: “When an information technologist speaks of an information retrieval (IR) system, he usually means a way of providing people with documents they need”. Similarly, Rijsbergen and Lalmas (1996, 386) wrote: “In the early days of information retrieval (van Rijsbergen, 1979), people used to qualify their statements about IR by saying that really they were working on document retrieval. It was denied strenuously that information was being retrieved”.

Over time, the distinction between document retrieval and IR, as expressed by Lancaster and others, was largely neglected, and the terms were often used in the same ambiguous way as by Mooers. However, in 1996 the issue was revisited by Rijsbergen and Lalmas, who provided a defense of the term IR as contrasted with document retrieval. This is the only well-developed defense found in the literature. I interpret their article as part of a long tradition in IS of seeking to replace document retrieval with IR and shall return to it in Section 2.2 on the epistemology of fact retrieval.

1.2 Theoretical developments in information science

The issue discussed in this article involves broader aspects of the development of IS. When this discipline was introduced, “information” often replaced “document”, and the former term documentation was largely replaced by IS, as indicated by the name change in 1968 of the American Documentation Institute to the American Society for Information Science. However, this downplaying of the term document has been met by a countermovement, sometimes termed the neo-documentary turn [7]. A notable contribution is Buckland's “information-as-thing” (1991), which argued that → information has multiple meanings, but that the most important within library and information science (LIS) is “information-as-thing” (i.e., document) [8]. This is crucial because, as Buckland noted (1991, 358), “information storage and retrieval systems can deal directly only with ‘information-as-thing’”.

The neo-documentary movement, however, has not engaged deeply with specific information technologies such as recent large language model (LLM)-based systems. This article therefore examines the conflict between the dominant understanding of IR and the document perspective in order to understand the limitations of technologies such as ChatGPT.

1.3 Methodological approach

Section 1.1 documented that the term IR has often been used as a misnomer for document retrieval. A core argument in this article is that the importance of document retrieval is primarily based on epistemological arguments. Therefore, Section 2 examines the epistemological implications of neglecting the concepts document and literature, divided into the following subsections: 2.1, the epistemic importance of documents; 2.2, the epistemology of fact retrieval; and 2.3, source criticism.

Section 3 discusses ChatGPT as an information provider. Today, artificial intelligence (AI) — more precisely, generative AI, which creates new content, such as text, images, audio, video, and code, by learning patterns from large datasets — has become increasingly pervasive and is heavily promoted by the information technology industry. LLMs are a specific type of generative AI focused on generating human-like text; ChatGPT is one of the best-known examples. Such models do not retrieve text verbatim but create new output by statistically modeling patterns, structures, and associations found in the data. Their behavior thus reflects the limitations of probabilistic pattern modeling rather than conceptually based understanding. Well-known problems with LLM technology are that such systems “encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes” (cf. Gao et al. 2024). In contrast to traditional bibliographical databases and search engines, they provide concrete answers rather than lists of documents, and it is not possible to specify which documents have contributed to a given answer, although newer versions often cite specific sources. Section 3 presents four case studies illustrating ChatGPT's dependence on input documents as well as on the user's subject knowledge. The study focuses not on user trust but on the epistemic validity and justifiability of AI-generated information. It thus points to an inherent weakness in the fused and opaque processing of document retrieval and information synthesis, that is, in not separating the discovery of documents that contain answers from the provision of interpretations and syntheses based on these documents.

2. The epistemological importance of the concepts “document” and “literature”


2.1 The epistemic importance of documents

Information scientist Henning Spang-Hanssen (2001) provided an important analysis of documentary provenance as an essential element of any piece of information:

Information about some physical property of a material is actually incomplete without information about the precision of the data and about the conditions under which these data were obtained. Moreover, various investigations of a property have often led to different results that cannot be compared and evaluated apart from information about their background. An empirical fact always has a history and a perhaps not too certain future. This history and future can be known only through information from particular documents, i.e., by document retrieval.
The so-called fact retrieval centers seem to me to be just information centers that keep their information sources—i.e., their documents—exclusively to themselves.

Public norms for documenting knowledge claims have evolved. A hallmark of academic work is to provide sources for all claims, either by referring to one's own observations and reflections or to those of others as documented in public sources. It is well known, for example, that Wikipedia often calls for documentation by inserting “[citation needed]”.

Providing proper bibliographic references is not merely a formalistic academic procedure. The hallmark of scholarship is to provide relevant and authoritative references that support the claims made. This may be irritating to some, but without such documentation, errors, myths, confusions, and problematic claims flourish.

As Spang-Hanssen argued, a given claim may have been contradicted in the literature, and it is naïve to consider “the universe of knowledge” to consist only of verified statements. As information researcher Mai (2009, 1) wrote:

The notion that there is such a thing as the universe of knowledge has been challenged many times in modern philosophy, and most recent work in classification rejects the notion. Most scholars accept that there are multiple universes of knowledge, that people come to the world from different perspectives, view the world differently, and understand the world differently.
Further, most scholars today accept that things and topics cannot be classified independently from their function, interpretation, or relation to other things and topics — it is generally accepted that things and topics do not have innate essences. In this line of thought, things and topics make sense only in context, and they are always understood from a particular epistemic standpoint.

Consequently, the term universe of knowledge should either be avoided or replaced, or it should be made explicit that it refers to a universe encompassing different claims, perspectives, theories, and viewpoints on given subject matters — and indeed different views of what counts as knowledge, and thereby of what the universe of knowledge is and what it contains.

According to philosopher Gelfert (2010, 90), Immanuel Kant emphasized the role of documents in science:

According to Kant, science, with its goal of “exactly determining the phenomena through observations and experiments”, established a new “accuracy in the reporting of experience”, which “was then transferred from natural science to history”. Improvements in the publication and communication of reports contributed to this development: “The printing presses, and also the gazettes, immediately reveal where someone has erred in his writings”; unlike in ancient times, “one can no longer get away with telling merely anecdotal stories”.

Further:

Kant believes in the capacity of the social world to furnish objective grounds for knowledge, even in cases where direct first-hand experience of the facts concerned is out of our reach.

Placing the role of public documents — that is, publications — in the perspective of social epistemology, Gelfert shows how Kant generalizes Spang-Hanssen's point by emphasizing that the public nature of scientific communication contributes to eliminating errors and overcoming the subjectivity of individual scientists.

The role of disciplinary literature is all too often underestimated. The literature of a field is the alpha and omega of research: all research should take its starting point in the literature, identify unsolved or problematic issues, and contribute to improving and clarifying them. In other words, research should both begin and end in the literature. Recorded knowledge is what scientists and scholars must draw upon to become professionals; it is the intellectual heritage passed down to future scholars and practitioners in each field. Of course, many other elements matter, such as the mastery of various skills. However, the knowledge recorded in the disciplinary literature represents the core. Therefore, it is an important — though often under-prioritized — task to nurture the scientific literature, correct errors, provide conceptual clarity, and synthesize its most important knowledge.

2.2 Epistemology of fact retrieval

In the positivist tradition behind IS, the aim of recording pure facts has long prevailed, as pointed out by information researcher Sundin (2025, 293–294):

Otlet wanted to create what he called a ‘Universal book of knowledge’, with all possible facts extracted from books, journals, statistics and so on. These facts ‘will be recorded on separate leaves or cards rather than being confined in volumes’ and organised according to a standardised classification system (Universal Decimal Classification) (Otlet in Frohmann, 2008, p. 79). Otlet's idea of extracting facts from documents conveys an understanding of facts as abstract, neutral, and decontextualised from authors as well as from a specific volume (or similar).

Otlet referred to this process of extracting facts from documents by professional documentalists as “documentation”. In this process, new, highly standardized, short documents were created. The positivist attitude is especially clear in what Frohmann (2008, 82) described as Otlet's wish “to eliminate authorial subjectivity” (i.e., the opposite of providing signed articles). This reveals a naïve, positivist view in IS, opposed to the hermeneutical position, in which the objectivity of claimed facts is questioned, and which considers their extraction not as a mechanical activity but as contextualized and theory-dependent [9].

Spärck Jones and Kay (1973, 175) wrote: “The difference [between document retrieval and fact retrieval] is not one of principle because, clearly, a document retrieval system is nothing more than a fact retrieval system restricted to operate on facts about documents” [10]. The authors further found that the main difference between the two types of retrieval is that fact retrieval, to a much higher degree, must be able to find facts in a file by inferring correct answers from facts given in the file. For example, if the file contains the information that the Mona Lisa is in the Louvre and that the Louvre is in Paris, the system should be able to infer that the Mona Lisa is in Paris. I shall not discuss here whether this difference is one of principle (or whether document retrieval may also demand such inference) [11]. Of interest here is that Spärck Jones and Kay (1973, 177) concluded that research on fact retrieval must be based on human psychology:

This is to say that the intellectual capabilities of the machine would have to be precisely those of a human being. […] The argument that such a machine was a valid psychological model would be immensely compelling. It is for this reason that work on fact retrieval systems is usually said to belong to the discipline of artificial intelligence, and hence to psychology.

While many commentators hold, as Spärck Jones and Kay did, that psychology contributed to — and is a precondition for — the development of AI (e.g., through the idea of “neural networks”), this concept seems to have been more metaphorical and formal than psychologically realistic. Modern AI (with LLMs) is, at its engineering core, independent of psychology and brain research: it is based on statistics, linear algebra, optimization methods, and scalable data infrastructures. However, it is outside the scope of the present article to discuss the relation between AI and psychology in general, and this does not seem necessary either, as it is difficult to see the relevance of a universal psychological model for research on fact retrieval. Clearly, people's ability to ask questions depends on the knowledge they possess, including their ability to infer a correct answer from implicit information. Therefore, what is required is a socio-cultural or epistemological research perspective rather than one based on experimental psychology assuming a universal psychological model.
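
The kind of inference Spärck Jones and Kay had in mind can be illustrated with a minimal sketch: deriving “the Mona Lisa is in Paris” by chaining stored location facts. The fact base and function below are invented for illustration and stand in for no actual system.

```python
# Minimal sketch of inference over stored facts: each entry maps an item
# to the place that directly contains it. The fact base is invented.

facts = {
    "Mona Lisa": "Louvre",
    "Louvre": "Paris",
    "Paris": "France",
}

def located_in(item: str, place: str) -> bool:
    """Follow the containment chain from item, checking whether place occurs.
    This derives facts that are only implicit in the file."""
    current = item
    while current in facts:
        current = facts[current]
        if current == place:
            return True
    return False

print(located_in("Mona Lisa", "Louvre"))  # stated directly
print(located_in("Mona Lisa", "Paris"))   # inferred via the Louvre
print(located_in("Louvre", "Mona Lisa"))  # not derivable
```

The sketch shows why fact retrieval requires more than lookup: the answer “the Mona Lisa is in Paris” is nowhere stated in the file and must be derived.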

Information researcher Furner (2016, 288) observed that the relation between → data (i.e., facts) and documents is often confused in the literature:

There is little consensus on the precise nature of the conceptual relationship between ‘data’ and ‘document’. The default position appears to be the view that all documents are in some sense made up of data […]. The position I wish to develop in this paper, however, is that it is not in fact the case that documents are made up of data. On the contrary, it is the other way round: datasets are made up of documents.

Furner's point can be understood in the context of the debate between → atomism and holism (see Hjørland 2025, 11–13). Atomists understand data and facts as independent statements, whereas holists regard them as relative to contexts. Holism and hermeneutics thus correspond to Spang-Hanssen's view that a fact always has a history and an uncertain future and therefore must be considered in the context of the document in which it is published — and, more broadly, in the literature of which it forms a part.

A relatively late article by Rijsbergen and Lalmas (1996, 386), however, defends the view that IR concerns information rather than documents:

We believe that the purpose of an information retrieval system is to provide information about a request and that a request is a representation of an information need that an IR system attempts to satisfy. Hence, a fundamental problem is how to compute the information contained in one object (e.g., a document) about another (e.g., a query). Thus, if a user states a query then it behooves the IR system to find the objects that contain information about that query.

Rijsbergen and Lalmas treat information as that which informs one information object (e.g., a document) about another (the query) and seek the solution to IR in the semantic similarity between the two. Two basic objections to their view are: (1) the informativeness of a document in relation to a query is not objective but theory-dependent, and (2) the query itself is also shaped by the “paradigm” of its author and must therefore be interpreted before an answer can be provided.
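
One common operationalization of computing “the information contained in one object about another” is a similarity score between term vectors. The following sketch, with invented texts, computes a bag-of-words cosine similarity. It also makes the first objection concrete: a purely lexical measure cannot recognize that two different vocabularies may describe the same condition, so what counts as “informative” depends entirely on the chosen representation.

```python
# Bag-of-words cosine similarity between a query and documents -- one
# common operationalization of "aboutness". Texts are invented examples.

import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity of raw word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

query = "causes of bedwetting"
docs = [
    "psychoanalytic explanation of bedwetting in children",
    "bladder physiology and causes of nocturnal enuresis",
]
for d in docs:
    print(round(cosine(query, d), 3), d)
```

Here the second document, which names the same condition as “nocturnal enuresis”, scores no better than the first, although a physiologically oriented inquirer might consider it far more informative: the score reflects the representation, not an objective relation of informativeness.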

As philosopher A.F. Chalmers (1982, 92) showed from the history of science, prior to Newton there was no agreement about which observations (or documents reporting those observations) were relevant to understanding and guiding research on the nature of light. In other words, there was no agreement on which documents were informative for this problem. This means that Rijsbergen and Lalmas are wrong in assuming the existence of a fixed relationship between two documents in which one objectively contains information about the other. The example shows that informativeness is theory-dependent. Different theories of a subject — for instance, bedwetting (nocturnal enuresis) — make different investigations and documents informative in relation to queries on the topic. At one time, psychoanalytic theories of bedwetting prevailed, but today they have been replaced by other theories, which in turn provide new criteria for which studies to consult. This perspective is completely absent from Rijsbergen and Lalmas (1996) and from much of the IR literature.

In addition, as physicist and philosopher Kuhn (1962) pointed out, a paradigm influences how researchers view the world and the kinds of queries they pose. As information researcher Kwon (2016, ii) emphasized:

If ‘information’ is a central concept for library and information science, then ‘questions’ are fundamental, for information ‘informs’ relative to the question. But research focusing on questions as a central theoretical concept has been stymied by the paradox of the question, which observes that in order to ask one must know enough to know what one does not know.

Rather than relying on the traditional methodology in IS, where users are asked which documents they consider relevant, it seems preferable to examine how advanced students and researchers receive feedback from advisors and peer reviewers, and how different views of relevance are connected to paradigms in the research field. Such feedback does not take a student's or researcher's assumptions as given but may suggest alternative perspectives. It can highlight the relevance of new documents and problematize the use of others. This shifts the question of relevance (or informativeness) from psychological studies of users to philosophical studies of subject fields.

Based on Spang-Hanssen's analysis and the considerations in this section, the wish to replace document retrieval with IR, as expressed by Spärck Jones and others, appears naïve and philosophically problematic. I shall consider this further in the remainder of this article.

2.3 Source criticism

As noted in Section 2.1, many users may be irritated by retrieving sets of references rather than direct answers to their queries. This reaction is understandable and even logical because it places an additional workload on the users: they must extract information from a set of sources, evaluate and select what is true and relevant, synthesize the information, and formulate an answer to the query. (For example, we commonly rely on our doctor's diagnosis and prescriptions without performing the work of searching, interpreting, and concluding on the basis of the medical literature.) Systems like ChatGPT carry out this work for us, whereas document-retrieval systems do not. An important task, therefore, is to justify this workload — or more precisely, to specify when it is justified. (For instance, it seems seldom justified for patients to search and read the medical literature, but it is important for medical specialists, and particularly for researchers.) This question involves the issue of when a given information source should be trusted and when its claims should be questioned. In other words, it involves source criticism, which, in a broad interdisciplinary sense, concerns evaluating whether a given source is reliable for the intended use and whether it provides valid knowledge (cf. Hjørland 2012).

Fact-retrieval and question-answering systems and services are not problematic in themselves. There has always been a rich variety of such systems and services, including statistical databases, directories, dictionaries, encyclopedias, maps, and databases providing scientific and mathematical constants and tables. Such information sources vary in their degree of “cognitive authority” [12], which plays a role in their use, and sometimes multiple sources are compared to check the validity of the information received. In the case of encyclopedias, one quality norm is that entries are written and signed by experts, thereby enabling a certain kind of source criticism. The point here is not that fact-retrieval systems are inherently problematic, but that the term IR has typically been applied in IS to document-retrieval systems. Moreover, this misnomer has been associated with the idea that document retrieval is a poor substitute for “real” IR and with a positivist view that seeks to eliminate authorial subjectivity.

The need for source criticism is also associated with the problem that consensus in science is often overrated, as is the view that synthesizing knowledge from different documents can be done objectively and free of authorial subjectivity. Regarding consensus, philosopher Broadfield (1946, 69–70) wrote:

Consensus is most likely to appear among the unenlightened, of whom it is characteristic to be unanimous on the truth of what is false. In intellectual matters agreement is rare, especially in live issues.

By implication, different documents often represent disagreements and diverse perspectives. If the first seemingly relevant document is used uncritically, the user risks being poorly informed, perhaps even becoming a victim of “fake news”. As part of academic training, students are taught methodology and how to evaluate studies and claims (which is itself a field with multiple perspectives). The process of moving from the literature (i.e., identifying a set of documents deemed sufficient to provide authoritative answers) to answering questions must be as explicit, traceable, and transparent as possible. This is at the core of the problem with attempts to replace document retrieval with IR. Sundin (2025, 296) presented this point of view, writing:

[T]his increasing invisibility of sources [documents used to provide answers] is important and likely consequential.

Further (2025, 298):

Increasing inbuilt AI support for searching and drawing conclusions from information sources may lead to de-skilling of searching for and evaluating information. Such de-skilling of information literacy transfers some of the previously required analytical skills from the user to the system. As a result, this transfer can also be seen as a transfer of agency in which the user has less control and becomes increasingly more dependent on the functions of the system—not only in the search as such, but also in evaluating and analysing the results.

Users' control of the search process is important for their ability to perform source criticism. As we shall see in the next section on ChatGPT, the way queries are formulated influences the answers. We thus face a hermeneutical circle: the quality of the answers depends on the knowledge of the user, and his or her knowledge depends on prior answers. When users have limited subject knowledge, ChatGPT and related technologies may therefore provide poor answers.

3. ChatGPT as an information provider

A bit of background information about ChatGPT was given in Section 1.3. In keeping with the topic of this article, this section focuses on how the provision of information depends on documents that contain the desired information. Systems based on LLMs obviously depend heavily on the input used for their training and fine-tuning. The base GPT models (including GPT-5) are trained on large static datasets, which are not updated in real time. However, beginning in 2023, ChatGPT could import external information during a chat session through a web search function (a kind of “RAG [retrieval-augmented generation]-like behavior”) [13], whereby retrieved web content may be incorporated into the generation process. At the time of writing, users of ChatGPT do not have direct control over whether web-browsing or retrieval functions are enabled, and from the user's perspective this retrieval is non-transparent, non-persistent, and not documentable. Consequently, it is not possible for an author to state with certainty whether a given response relied solely on the model's training data or incorporated live web retrieval.
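
Such “RAG-like behavior” can be sketched schematically. Everything below (corpus, retriever, generator) is a toy stand-in, not OpenAI's actual pipeline; the point is that document retrieval and answer generation are fused into one step that is opaque from the user's perspective.

```python
# Schematic sketch of RAG-like behavior: retrieve candidate documents for
# a query, then hand them to a generator. All names and texts are invented
# stand-ins for illustration only.

corpus = [
    "The Gepard is a German self-propelled anti-aircraft gun.",
    "Interceptor drones are used against high-altitude attack drones.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for an LLM: a real model would synthesize fluent text from
    the context without exposing which passages shaped which claim."""
    return f"Answer to {query!r}, drawing (opaquely to the user) on: {context}"

print(generate("What is the Gepard?", retrieve("What is the Gepard?")))
```

In a real system the retrieved passages are folded into the model's prompt, and the final text gives the user no reliable way to trace individual claims back to individual sources, which is the documentary problem at issue here.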

To examine the quality and trustworthiness of ChatGPT's answers, four cases are presented here as illustrative examples [14]. Three of them relate to my own research in knowledge organization, while Case 2 was added to provide a non-academic example.

3.1 Case 1: The application of logical division in the sciences

When I asked ChatGPT-4o (July 21, 2025) [15] about the application of → logical division in the sciences, it produced a rather comprehensive and detailed answer. Part of this response was that Linnaean classification was based on morphological traits and logical division, assuming essential characteristics for each group (e.g., “birds have feathers”). This answer could only have been generated from sources on the history and philosophy of biology, which ChatGPT had access to. Indeed, a large amount of literature exists that explains why ChatGPT produced this answer, for example, Cain (1958). However, when I pointed out to the system that more recent literature considers this view insufficient or wrong and labels it “the essentialism story” (cf. Richards 2016, 38), ChatGPT produced a qualified summary of this criticism of essentialism and logical division, thereby correcting its initial response.

This case demonstrates that, to obtain fully adequate answers, the user must already possess detailed knowledge about the subject of the query.

3.2 Case 2: The Flakpanzer Gepard and drone warfare in Ukraine

On September 26, 2025, I asked ChatGPT-5: “The German Flakpanzer Gepard was some time ago a very effective weapon for Ukraine to shoot down Russian/Iranian drones. Is this still the case?” ChatGPT concluded:

Final take: Yes—the Gepard is still an effective counter-drone solution for Ukraine. Its proven record, cost-efficiency, and ongoing refreshes keep it valuable, although sustaining operations hinges on addressing spare parts and maintenance challenges.

I then commented: “As far as I have heard, Russia has changed strategy and now uses high-altitude drone attacks (>2 km altitude). Machine guns cannot shoot this high, and Ukraine is now developing and (until now on a small scale) using interceptor drones. How does the Gepard fit into this problem?”

ChatGPT revised its response:

The Gepard remains a vital asset — but its optimal use is increasingly restricted to lower altitudes and terminal drone phases.

Again, this case illustrates the need for subject knowledge to obtain a sufficient answer. (This example is included to show that the problem is not confined to academic discourse.) It cannot be determined whether the first answer drew on the webpages of Rheinmetall, the manufacturer of the Gepard, and the revised answer on other sources, but this remains a plausible hypothesis.

3.3 Case 3: Functional classification

In a query on functional classification (August 1, 2025), ChatGPT-4o claimed that classification researcher Ingetraut Dahlberg's concept theory was about functional classification and even supplied references and quotes, the most precise being:

Dahlberg (1993, p. 36): “Functional concepts denote actions, processes, or roles which objects or subjects fulfill in given contexts”.

After a futile search, I told the system that I could not verify this quotation. ChatGPT then conducted a search and responded:

You're absolutely right to question that reference. I wasn't able to locate the exact quote attributed to Dahlberg (‘Functional concepts denote actions, processes, or roles…’); neither my prior publications nor other reliable summaries seem to contain it verbatim.

This is not simply an error in bibliographic referencing. Even after acknowledging the problem, ChatGPT continued to link Dahlberg with functional classification, which I consider incorrect and probably due to “personalization” [16]. → Dahlberg's theory should not be labeled “functional” [17], which ChatGPT-5 admitted in another chat on 16 August 2025:

There is a fundamental tension — and possibly even an incompatibility — between Ingetraut Dahlberg's formalist concept theory and a genuinely functionalist point of view, especially if we understand functionalism in its pragmatic, context-sensitive, and goal-relative sense (as often emphasized in fields like archival science, cognitive science, and pragmatist epistemology).

This example also demonstrates that deep subject knowledge is needed to obtain good answers. In addition, it illustrates bias due to “personalization”.


3.4 Case 4: Theory of knowledge in the Dewey Decimal Classification

My question on August 19, 2025, was: “The → DDC makes a distinction between Theory of knowledge (001.01) and Philosophical theory of knowledge (epistemology) (121). Which theories of knowledge are not philosophical?”

ChatGPT's first answer was:

Dewey (and librarianship more generally) conceives epistemology versus more applied or disciplinary theories of knowledge […] In other words, 001.01 covers “theories of knowledge” developed within the sciences, social sciences, or applied fields—often empirical or pragmatic rather than explicitly philosophical [18]. Thus, non-philosophical theories of knowledge are those in psychology, sociology, information science, cybernetics, AI, and science studies. They are concerned with knowledge but are not treated as philosophy per se.

I then commented: “I wonder if you are right in putting those non-philosophical theories in 001.01. Is the sociology of knowledge not placed under sociology, and domain analysis under library and information sciences?”

ChatGPT answered: “Excellent point — you are absolutely right to challenge my quick placement”. After citing some DDC Manual notes — which, upon my comments, it later acknowledged were incorrect [19] — the final conclusion of ChatGPT was:

The distinction between general/non-disciplinary “theory of knowledge” (sometimes put in 001.01) and philosophical epistemology (in 121) emerges from editorial decisions or in-house practice, rather than from an explicit Manual note. Librarians often infer this separation based on standard classification logic — 121 is strictly philosophical, while 001.01 would only be used for works that are general or interdisciplinary — but these are ad hoc decisions rather than codified instructions.

This case again demonstrates that ChatGPT tends to provide incorrect answers, which it can revise if challenged by the user. But this requires detailed subject knowledge, and in that case the revised answer is often less informative, because the user already knew the most important part.


3.5 Lessons learned about ChatGPT's relation to documents

The preceding four case studies of ChatGPT's treatment of documentary information allow us to formulate the following six points about its relation to documents:

  1. A system (and a person) can only provide answers to questions about the empirical world based on information sources available to it. If we disregard systems that make their own observations, such information sources are documents in the broad sense of the term (including secondary sources such as Wikipedia, meaning that system outputs also depend on the quality of those secondary sources). As information scientist Donner (2025, 802) wrote: “the truthfulness of the model output would have to be discussed at the level of the model's primary data first, which could then have an impact on the LLM output. If the model's primary data is not truthful, how could the model's output be truthful?”
  2. The total accumulated documents and literature in the world are, of course, only partially represented in any given system, including ChatGPT, even though its database is extremely comprehensive. This limits the answers it can provide. Imagine a document that has not been cited elsewhere: it may or may not be included in full text within the system's database. If it is included, the system may say something about it based on derived information. If not, the system cannot, in principle, answer questions about it. In such cases, it may “hallucinate” and provide incorrect answers based on generalizations of information in the query. In general, the fewer sources available to the system, the poorer its performance compared to cases where abundant sources exist. This issue can be termed domain-dependent reliability, and a major problem is that users cannot know how reliable the system is for the subject inquired about.
  3. Donner (2025, 795) wrote: “While the printing press produces a more or less accurate copy of human testimony, LLMs' mostly coherent text outputs are based on mixed-up and blended testimony”. This means that answer quality depends not only on input but also on the specific technology that selects and processes parts of the input to produce an output. In the case of the logical division mentioned above, the system had to manage conflicting views (the essentialist version and the revisionist version). This leads to the next point.
  4. The “universe of knowledge” should not be conceived in a positivist way—as if it were a universe of verified statements—but rather in a pluralist way, as a universe offering different perspectives, theories, and viewpoints on subject matters. In the first case above, it contains documents based on the essentialist story as well as documents opposing this myth [20]. In such cases, the system may miss one of the perspectives and provide a one-sided (or incorrect) answer, as it initially did. A crucial point is that this can only be corrected if the user recognizes the problem, which requires detailed subject knowledge.
  5. Donner (2025, 795) noted: “LLMs operate by predicting the likelihood of a sequence of words based on statistical patterns learned from large text datasets”. This technique means that LLMs do not normally base their answers on specific sources, but on a static sample of mixed sources. ChatGPT-5 often provides bibliographic references and now has the ability to verify them (e.g., in WorldCat and ERIC), though it does not always use this feature. In principle, bibliographic accuracy is therefore less of a problem because of its RAG-like capabilities, which we must expect to be improved in the future. The central issue is thus not whether the references are verifiable but whether the claims in its answers correspond directly to the contents of the cited sources. Case 3 on functional classification clearly demonstrated that this is not always the case. The linking process is non-transparent and based on untraceable reasoning processes, and references are often inappropriate for the answer, unsupported, or may even contradict the sources. This contrasts with scholarly literature, where linking is based on direct quotes or paraphrases with precise analysis, interpretation, and references (though even scholarly practice sometimes fails here). This seems to be a fundamental problem with the technique and suggests that the dream of providing information rather than documents may be deeply mistaken.
  6. A comment should be made about “cognitive authority” in relation to the references. An anonymous reviewer wrote: “It seems to me, however, that the leveraging of cognitive authority is the primary raison d'être for the use of references to indicate sources of information in documents”. This is correct if we, following Kuhn (1962), recognize that cognitive authorities are theory- and paradigm-dependent, a view that is unfortunately absent from the literature on cognitive authority. Therefore, simplistic or mechanical solutions, such as demanding references from, for example, journals with high journal impact factors (JIF) or highly respected encyclopedias like the Stanford Encyclopedia of Philosophy, are not sufficient. References must be used precisely to support the claims made, while recognizing their wider theoretical context. In the examples above, ChatGPT often refers to “cognitive authorities” in the sense of well-known researchers in the field of inquiry but still fails to provide sufficient support for the claims made.
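
Point 5 above can be made concrete with a deliberately tiny sketch. The following bigram model is a toy illustration, not how any production LLM is built (real systems use neural networks over subword tokens), but it shows how a purely frequency-based next-word predictor already “blends” its sources: once the transitions are counted, the model no longer records which document contributed which evidence. The corpus sentences are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "large text datasets" (invented for illustration).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigram transitions: how often each word follows each predecessor.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def next_word_probabilities(word):
    """Estimate P(next word | word) from raw corpus frequencies."""
    counts = transitions[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# "the" is followed once each by "cat", "mat", "dog" and "rug", so each
# continuation gets probability 0.25 -- the counts mix both "sentences"
# without keeping track of which one contributed what.
print(next_word_probabilities("the"))
```

The same loss of provenance, scaled up enormously, is what Donner describes as “mixed-up and blended testimony”.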

The four cases provided above do not compare ChatGPT's performance to a human-produced strategy for document retrieval. In cases involving explicit search strategies, such as systematic reviews in evidence-based medicine, it is possible to ask ChatGPT to apply a predefined list of inclusion and exclusion criteria. Such searches are more “mechanical” and therefore easier for LLMs to learn. In medical informatics, there is today a focus on examining LLMs' performance in conducting systematic reviews (or assisting researchers in conducting them), but the results so far support the claims of the present article [21, 22, 23].

A final consideration is that the coverage of ChatGPT—its indexed sources—is not published, not even broadly described [24]. Users and researchers therefore cannot know ChatGPT's omissions and weaknesses, which adds to its nature as a “black box”. This contrasts with the scholarly database tradition of publishing lists of indexed journals and sources (with exceptions such as Google Scholar), which allows users to adapt their search strategies to the database's coverage.


4. Conclusion

Why has the IS community so stubbornly adhered to a problematic terminology? Spang-Hanssen (2001) suggested that professional prestige, rather than theoretical arguments, may be the reason:

The word information — and combinations like information retrieval, information center — definitely contributed to raising the public opinion of library and documentation work, which is generally held to be a little dull, dusty, and distant from what is actually going on in society.

If Spang-Hanssen is right, terms such as information and IR, and the associated goal of providing direct answers, may have been selected to increase the status of the profession, with efforts to define their meaning only coming afterward [25]. This would also explain why, for a long time and across most applications of these terms, it was not considered a major issue to clarify Mooers's ambiguity — namely, the distinction between finding information recorded in documents and finding documents that contain information. This ambiguity has led to confusion, and as White (2017, 3927) observed: “When IS [information science] is defined as the study of literature-based answering, much else falls into place”.

This article allies itself with the neo-documentalist movement in IS and, as such, argues that the document perspective is important for understanding modern front-end technologies such as LLMs. The main argument has been that the inputs to such systems are documents containing many “paradigms” or “voices” that are important to distinguish. However, the design of systems such as ChatGPT makes the quality of the output dependent both on the system's training documents and on the users' queries, which require appropriate expertise. This means that only users with deep subject knowledge can recognize problematic outputs and prompt the system to modify them.

What are the practical implications of this paper, and how may future research or system design operationalize its conceptual insights? Most importantly, document retrieval and knowledge synthesis from sets of documents should be distinguished as two different processes, each of which contains its own difficulties and its own sources of bias. To provide users with control over the outcome, each process must be made as transparent as possible, which, however, is not possible in ChatGPT, where these processes are fused and opaque.

The essential difference between document retrieval and IR is, as shown in this article, that in IR with ChatGPT the user has insufficient control over which documents are retrieved and how their information has been synthesized, thereby abdicating a significant part of his or her responsibility as an agent (e.g., as a researcher).


Acknowledgments

I thank three anonymous peer reviewers who have contributed significantly to the improvement of this article.

Endnotes

1. Proffitt (2010) wrote that information retrieval dated from 1950, followed by information scientist in 1953, and information science in 1955.

2. Vickery (1961, 2–3) provided the expressions finding “information recorded in documents” versus “information about documents”.

3. The term fact retrieval is unfortunate, because what is retrieved is not necessarily something known or proved to be true or relevant. It shares with terms such as question answering, information extraction, knowledge-based querying, and others the underlying goal of returning contents from documents rather than the documents containing this content.

4. A chatbot is an artificial intelligence program designed to simulate conversation with human users, especially over the Internet (cf. Adamopoulou and Moussiades 2020, 1).

5. Lookup searches are described by Marchionini (2006, 42).

6. Dousa (2010, 20) provides a quote from Kaiser (1911) strongly similar to the one cited from Spärck Jones: “Because, Kaiser argued, men of affairs are interested in the information contained in documents rather than in the documents themselves, ‘we must try to dissociate information from literature’ and so render it more accessible to its potential users [Kaiser 1911, §83]”.

7. The first documentation movement was developed in the late 19th and early 20th centuries and led by figures such as Paul Otlet and Henri La Fontaine (first generation) and Suzanne Briet (second generation). The neo-documentalist movement is a contemporary development, initiated in the 1990s, with Michael Buckland as a leading figure.

8. Buckland (1991, 351) wrote: “The term ‘information’ is also used attributively for objects, such as data and documents, that are referred to as ‘information’ because they are regarded as being informative”.

9. The hermeneutic circle is a process of interpretation in which we continually move between smaller and larger units of meaning to determine the meaning of both. For a hermeneutical understanding of facts, see Caputo (2018, 4–10).

10. Spärck Jones and Kay's (1973) view that a document-retrieval system is nothing more than a kind of fact-retrieval system does not correspond to the way these terms were expressed in Section 1. The authors overlook Spang-Hanssen's point that fact retrieval centers keep their information sources exclusively to themselves.

11. Turtle and Croft (1990, 1) wrote: “Recent work suggests that significant improvements in retrieval performance will require techniques that, in some sense, ‘understand’ the content of documents and queries [Rijsbergen 1986; Croft 1987] and can be used to infer probable relationships between documents and queries”.

12. Wilson (1983) argued that all people know about the world outside their own lives is what others have told them. However, people do not consider all rumors to be equally reliable; only those who are considered to “know what they are talking about” become cognitive authorities.

13. A family of technologies known as retrieval-augmented generation (RAG) enables LLMs to retrieve relevant documents from an external knowledge source at query time and use those documents as context for generating answers (see Gao et al. 2024). ChatGPT itself does not use RAG but can be said to exhibit RAG-like behavior.
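
The retrieve-then-generate pattern described in this note can be sketched in a few lines. Everything here is invented for illustration — the bare term-overlap scoring, the two documents, and the prompt template; production RAG systems use dense vector embeddings and a neural generator — but the division of labour is the same: retrieval first, then generation conditioned on the retrieved document, which is what makes the answer traceable to a source.

```python
import re

def tokenize(text):
    """Lowercase and split into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=1):
    """Rank documents by the number of query terms they contain."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

# Invented mini-corpus for illustration.
documents = [
    "The Gepard is a German self-propelled anti-aircraft gun.",
    "The Dewey Decimal Classification places epistemology at 121.",
]
query = "Where does the DDC place epistemology?"
context = retrieve(query, documents)[0]

# The retrieved source is supplied to the generator as explicit context,
# so the generated answer can be linked back to a specific document.
prompt = f"Answer using only this source:\n{context}\n\nQuestion: {query}"
```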

14. In contrast to bibliometric studies, the cases provided here cannot be replicated by other researchers because (1) ChatGPT produces personalized answers and (2) ChatGPT models are continuously updated, often silently, so that a specific version, say 4.1, is a model family rather than a static database. The case studies do not claim to be representative, also because of domain-dependent reliability, as presented in Section 3.

15. ChatGPT-5 was released on August 7, 2025. On August 9, I made the same query using the new ChatGPT-5. Again, I first received one answer, but after I informed it about the debate on the “essentialism story”, the system modified its answer.

16. In a ChatGPT-5 chat on November 9, 2025, I asked: “I would like you to expand on (1) how you use personalization/customization. […] For example, when I asked about functional classification, you mentioned Ingetraut Dahlberg. I have a feeling that this was because I am connected to the community that she established”. ChatGPT confirmed that this answer was given because it knew that I was a knowledge organization scholar, but that it could also answer from the perspective of a biologist or an archival scholar if asked to (or if asked not to use the information it has stored about me). Also, prior prompts within the same chat session may influence subsequent responses, insofar as ChatGPT maintains conversational context. However, the nature and extent of such influence are not transparent to the user and cannot be systematically controlled or documented.

17. It is outside the scope of this paper to present and discuss Dahlberg's concept theory (there was a comprehensive discussion of it in German in 1996 in the journal Ethik und Sozialwissenschaften, which unfortunately has not been considered in the English-language literature). Dahlberg (1978) used the phrase “referent-oriented, analytic concept theory”, which is a better characterization of her theoretical position.

18. Among the theories listed by ChatGPT as belonging to 001.01 are theories of knowledge in library and information science (e.g., Hjørland's domain analysis, facet analysis, information-seeking models), Mannheim's sociology of knowledge, science and technology studies (STS), Kuhn's paradigms, and Latour's actor-network theory.

19. ChatGPT wrote: “Upon reviewing available official documentation of DDC 23, including the printed volumes of schedules and the Manual (Volume 1 and its guidance sections), there is no explicit note in the Manual that directly compares 001.01 with 121 in the way I described”.

20. The essentialism story was especially popularized by Ernst Mayr in the mid-20th century (Mayr 1976; 1982). Critiques of the essentialism story were raised by Mary P. Winsor (2003; 2006), Staffan Müller-Wille (2007; 2011; 2013), and others.

21. Bourgeois and Ellingson (2025) found: “[LLMs] should not be given agency to complete complex tasks, such as generating systematic review search strategies, independently. As shown by our research, which corroborates an already robust body of scholarly literature, the results tend to be simplistic and incomplete. We do not use the trendy term ‘slop,’ though any review published using only AI search strategies would clearly have major issues”.

22. Chen et al. (2025) found: “We found that the LLM-FA [fully automated] approach retrieved less than 30% of papers in the original studies, while the LLM-SA [assisted semi-automated] approach retrieved over 80%. In addition, LLM-SA correctly excluded 92.2% of irrelevant papers. These findings highlight that although LLMs are not yet capable of autonomously and independently selecting papers in systematic reviews, the LLM-SA approach shows considerable promise in enhancing and streamlining the systematic review process”.

23. Pratte et al. (2026) found that discrepancies between results from SRMAs [systematic reviews and meta-analyses] and LLMs were due to differences in study selection rather than to model hallucination or misinterpretation and concluded (2026, 7): “We are at the beginning of a new era in medical research, where LLMs have shown early potential to rapidly and accurately synthesize clinical evidence. While they are not yet ready to replace SRMAs, their speed, scalability, and emerging accuracy suggest a valuable role as adjunctive tools”.

24. The public has some idea of the sources used by systems like ChatGPT through copyright owners' lawsuits against AI companies, for example, the one raised by The New York Times against OpenAI and Microsoft.

25. Shannon's (1948) so-called information theory was also an important background for the strong influence of the information concept (cf. endnote 1). Although this theory later lost influence in information science, it is an important prerequisite for the development of systems like ChatGPT.


References

Adamopoulou, E. and Moussiades, L. (2020). Chatbots: History, technology, and applications. Machine Learning with Applications, 2, 100006. https://doi.org/10.1016/j.mlwa.2020.100006

Bourgeois, J. P. and Ellingson, H. (2025). Ability of ChatGPT to generate systematic review search strategies compared to a published search strategy. Medical Reference Services Quarterly, 44(3), 279–291. https://doi.org/10.1080/02763869.2025.2537075

Broadfield, A. (1946). The philosophy of classification. Grafton.

Buckland, M. K. (1991). Information as thing. Journal of the American Society for Information Science, 42(5), 351–360. https://doi.org/10.1002/(SICI)1097-4571(199106)42:5<351::AID-ASI5>3.0.CO;2-3

Cain, A. J. (1958). Logic and memory in Linnaeus's system of taxonomy. Proceedings of the Linnean Society of London, 169(1–2), 144–163. https://doi.org/10.1111/j.1095-8312.1958.tb00819.x

Caputo, J. D. (2018). Hermeneutics: Facts and interpretation in the age of information. Penguin Random House.

Chalmers, A. F. (1982). What is this thing called science? An assessment of the nature and status of science and its methods (2nd ed.). Hackett Publishing Company.

Chen, H., Jiang, Z., Liu, X., Xue, C. C., Yew, S. M. E., Sheng, B., Zheng, Y. F., Wang, X., Wu, Y., Sivaprasad, S., Wong, T. Y., Chaudhary, V., & Tham, Y. C. (2025). Can large language models fully automate or partially assist paper selection in systematic reviews? British Journal of Ophthalmology, 109(8), 962–966. https://doi.org/10.1136/bjo-2024-326254

Croft, W. B. (1987). Approaches to intelligent information retrieval. Information Processing & Management, 23(4), 249–254. https://doi.org/10.1016/0306-4573(87)90016-1

Dahlberg, I. (1978). A referent-oriented, analytical concept theory for INTERCONCEPT. International Classification, 5(3), 142–151. https://doi.org/10.5771/0943-7444-1978-3-142

Donner, S. (2025). What kind of documents can AI language model outputs be? The concept of artificially blended testimony. Journal of Documentation, 81(3), 788–809. https://doi.org/10.1108/JD-11-2024-0271

Dousa, T. M. (2010). Facts and frameworks in Paul Otlet's and Julius Otto Kaiser's theories of knowledge organization. Bulletin of the American Society for Information Science and Technology, 36(2), 19–25. https://doi.org/10.1002/bult.2010.1720360208

Frohmann, B. (2008). The role of facts in Paul Otlet's modernist project of documentation. In W. B. Rayward (Ed.), European modernism and the information society: Informing the present, understanding the past (pp. 75–88). Ashgate.

Furner, J. (2016). “Data”: The data. In M. Kelly & J. Bielby (Eds.), Information cultures in the digital age: A festschrift in honor of Raphael Capurro (pp. 287–306). Springer.

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2024). Retrieval-augmented generation for large language models: A survey. arXiv: 1–21. https://arxiv.org/abs/2312.10997

Gelfert, A. (2010). Kant and the Enlightenment's contribution to social epistemology. Episteme, 7(1), 79–99. https://doi.org/10.3366/E1742360009000823

Hjørland, B. (2012). Methods for evaluating information sources: An annotated catalogue. Journal of Information Science, 38(3), 258–268. https://doi.org/10.1177/0165551512439178

Hjørland, B. (2025). Semantic primitives and compositionality: An annual review of information science and technology (ARIST) paper. Journal of the Association for Information Science and Technology, 77, 198–223. https://doi.org/10.1002/asi.70011. Also in ISKO Encyclopedia of Knowledge Organization, https://www.isko.org/cyclo/primitives.htm.

Kaiser, J. (1911). Systematic indexing. Isaac Pitman & Sons.

Kochen, M. (1974). Principles of information retrieval. Melville.

Kuhn, T. S. (1962). The structure of scientific revolutions. University of Chicago Press.

Kwon, H. (2016). On the social epistemological nature of questions: A comparison of knowledge domain's question formulations on the topic “memory” (Doctoral dissertation, Rutgers University). Rutgers University Libraries. https://rucore.libraries.rutgers.edu/rutgers-lib/50032/

Lancaster, F. W. (1968). Information retrieval systems: Characteristics, testing and evaluation. Wiley.

Mai, J.-E. (2009). The boundaries of classification. Advances in Classification Research Online, 20, 1–7. https://doi.org/10.7152/acro.v20i1.12887

Marchionini, G. (2006). Exploratory search: From finding to understanding. Communications of the ACM, 49(4), 41–46. https://doi.org/10.1145/1121949.1121979

Mayr, E. (1976). Evolution and the diversity of life: Selected essays. Belknap Press of Harvard University Press.

Mayr, E. (1982). The growth of biological thought: Diversity, evolution, and inheritance. Belknap Press of Harvard University Press.

Mooers, C. N. (1950). The theory of digital handling of non-numerical information and its implications to machine economics (Zator Technical Bulletin no. 48). https://babel.hathitrust.org/cgi/pt?id=mdp.39015034570583&seq=4

Müller-Wille, S. (2007). Collection and collation: Theory and practice of Linnaean botany. Studies in History and Philosophy of Biological and Biomedical Sciences, 38(3), 541–562. https://doi.org/10.1016/j.shpsc.2007.06.010

Müller-Wille, S. (2011). Making sense of essentialism. Critical Quarterly, 53(4), 61–67. https://doi.org/10.1111/j.1467-8705.2011.02022.x

Müller-Wille, S. (2013). Systems and how Linnaeus looked at them in retrospect. Annals of Science, 70(3), 305–317. https://doi.org/10.1080/00033790.2013.783109

Pratte, M., Thirukumar, S., Zhang, C., Slessarev, M., Basmaji, J. and Prager, R. (2026). Can large language models approximate the results of meta-analyses in critical care? A meta-research study. Journal of Critical Care, 92, 155358. https://doi.org/10.1016/j.jcrc.2025.155358

Proffitt, M. (2010). Information. In Word stories, Oxford English Dictionary. Oxford University Press. https://www.oed.com/discover/information/

Richards, R. A. (2016). Biological classification: A philosophical introduction. Cambridge University Press.

Rijsbergen, C. J. van (1979). Information retrieval (2nd ed.). Butterworths. https://openlib.org/...

Rijsbergen, C. J. van (1986). A non-classical logic for information. The Computer Journal, 29(6), 481–485. https://doi.org/10.1093/comjnl/29.6.481

Rijsbergen, C. J. van, & Lalmas, M. (1996). Information calculus for information retrieval. Journal of the American Society for Information Science, 47(5), 385–398. https://doi.org/10.1002/(SICI)1097-4571(199605)47:5<385::AID-ASI6>3.0.CO;2-S

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423; 27(4), 623–656. https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

Spang-Hanssen, H. (2001). How to teach about information as related to documentation. Human IT, 5(1), 125–143. http://etjanst.hb.se/bhs/ith/1-01/hsh.htm

Spärck Jones, K. (1987). Architecture problems in the construction of expert systems for document retrieval. In I. Wormell (Ed.), Knowledge engineering: Expert systems and information retrieval (pp. 7–33). Taylor Graham.

Spärck Jones, K., & Kay, M. (1973). Linguistics and information science. Academic Press.

Sundin, O. (2025). Theorising notions of searching, (re)sources and evaluation in the light of generative AI. Information Research, 30(CoLIS), 291–302. https://doi.org/10.47989/ir30CoLIS52258

Turtle, H., & Croft, W. B. (1990). Inference networks for document retrieval. In Proceedings of the 13th annual international ACM SIGIR conference on research and development in information retrieval (pp. 1–24). ACM. https://doi.org/10.1145/96749.98006

Vickery, B. C. (1961). On retrieval system theory. Butterworths.

White, H. D. (2017). Relevance in theory. In J. D. McDonald & M. Levine-Clark (Eds.), Encyclopedia of library and information sciences (Vol. 6, 4th ed., pp. 3926–3939). CRC Press.

Wilson, P. (1983). Second-hand knowledge: An inquiry into cognitive authority. Greenwood Press.

Winsor, M. P. (2003). Non-essentialist methods in pre-Darwinian taxonomy. Biology and Philosophy, 18(3), 387–400. https://doi.org/10.1023/A:1024139523966

Winsor, M. P. (2006). The creation of the essentialism story: An exercise in metahistory. History and Philosophy of the Life Sciences, 28(2), 149–174.

[top of entry]

 



Version 1.0 published 2026-03-16, last edited 2026-04-02

Article category: Theoretical concepts

This article (version 1.0) is also published in JASIST. How to cite it:
Hjørland, Birger. 2026. “Information retrieval or document retrieval? Terminological confusions and unrealistic goals in information science, exemplified in relation to generative artificial intelligence”. Journal of the Association for Information Science and Technology 77(5), 714–726. https://doi.org/10.1002/asi.70057. Also available in ISKO Encyclopedia of Knowledge Organization, eds. Birger Hjørland and Claudio Gnoli.
To quote text edited in a later version, you should save it in the Wayback Machine and cite the saved version.

CC BY 4.0