I S K O

edited by Birger Hjørland and Claudio Gnoli

 

Statistical classification

Preliminary editorial placeholder article; to be replaced when an author is found for an improved article

Table of contents:
1. Definition
2. Examples of statistical classifications
3. Functions of statistical classifications
4. Research and development about statistical classification
Endnotes
References
Colophon

1. Definition

The term statistical classification in this article means the classification of numerical data (or sets of numerical data or documents providing numerical data, i.e., statistics in sense 1). Statistical classifications are the classifications used by, for example, national [1] or international statistical services like Statistics Denmark or Eurostat [2] for classifying their products.

Statistical classification in this sense must be distinguished from the application of statistical techniques for classifying data (for example, in numerical taxonomy, cluster analysis, factor analysis and multidimensional scaling; cf. Krauth 1981; 1982), even though these techniques are described in Wikipedia under the entry "Statistical classification". Statistics in sense 2 has been defined (Mann 2007, 2) as “a group of methods used to collect, analyze, present, and interpret data and to make decisions”. It must also be distinguished from the classification of statistics as a research field, whether as part of the social sciences, as in DDC class 310 and its subclasses, or of → mathematics, as in DDC class 519.5 and its subclasses.

The present article is thus about the classification of statistical information, not about approaches to classification based on statistical methods (which are intended to be covered in separate articles in IEKO) nor about the classification of topics in statistics as a discipline.

The United Nations Statistics Division (2013, 5), in its Best Practice Guidelines for Developing International Statistical Classifications, recommended that national statistical agencies use the following definition:

A statistical classification is a set of categories which may be assigned to one or more variables registered in statistical surveys or administrative files, and used in the production and dissemination of statistics. The categories are defined in terms of one or more characteristics of a particular population of units of observation. A statistical classification may have a flat, linear structure or may be hierarchically structured, such that all categories at lower levels are sub-categories of a category at the next level up. The categories at each level of the classification structure must be mutually exclusive and jointly exhaustive of all objects in the population of interest.
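The two formal requirements in this definition (mutual exclusivity and joint exhaustiveness at each level) and the rule that every lower-level category belongs to a category at the next level up can be checked mechanically. The following is only a minimal sketch under stated assumptions: the age groups, the codes and the function names check_level and orphans are hypothetical illustrations and are not taken from the guidelines.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical flat classification of persons by one characteristic (age).
# Each category is described by a membership rule.
AGE_GROUPS: Dict[str, Callable[[int], bool]] = {
    "0-14": lambda age: 0 <= age <= 14,
    "15-64": lambda age: 15 <= age <= 64,
    "65+": lambda age: age >= 65,
}


def check_level(
    categories: Dict[str, Callable[[int], bool]],
    population: Iterable[int],
) -> Dict[str, List[int]]:
    """Report the units that violate the two requirements for one level.

    'unclassified' lists units that match no category
    (the level is not jointly exhaustive for this population).
    'ambiguous' lists units that match more than one category
    (the categories are not mutually exclusive).
    """
    problems: Dict[str, List[int]] = {"unclassified": [], "ambiguous": []}
    for unit in population:
        matches = [code for code, rule in categories.items() if rule(unit)]
        if not matches:
            problems["unclassified"].append(unit)
        elif len(matches) > 1:
            problems["ambiguous"].append(unit)
    return problems


def orphans(parent_of: Dict[str, str], upper_level: Set[str]) -> List[str]:
    """Return lower-level codes whose parent is not a category one level up."""
    return sorted(
        code for code, parent in parent_of.items() if parent not in upper_level
    )


if __name__ == "__main__":
    print(check_level(AGE_GROUPS, [0, 14, 15, 64, 65, 90]))
    print(orphans({"A1": "A", "A2": "A", "C1": "C"}, {"A", "B"}))
```

Note that exhaustiveness is tested only against the population supplied; a classification that passes for one population of units may still fail for another.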


2. Examples of statistical classifications

The statistics produced by national and international statistical services cover a wide span of different areas (Boeda 2008; 2009), for example [3],

  • Population and elections
  • Labour, income and wealth
  • Prices and consumption
  • External economy
  • Living conditions
  • Education and knowledge
  • National accounts and government finances
  • Culture and national church
  • Money and credit market
  • Business sector in general
  • Business sectors
  • Geography, environment and energy

Each of these areas is classified in the databases and publications in which its statistics are produced.

Examples of specialized statistical classifications include:

The same subject area may obviously be classified for different purposes, for example, for bibliographic data or for statistical data. This raises the question of whether such classifications can and should be aligned. Should, for example, the same classification of diseases underlie both bibliographical classification systems and statistical systems? Or should the same classification of the sciences be used for both (and more) purposes?

Some differences between bibliographic and statistical classifications can be justified by the concept of warrants: → Literary warrant (Barité 2018) for bibliographical purposes and other kinds of warrant for other purposes.

There is, in the field of → knowledge organization (KO), a clear tendency to consider classification of documents to be based on the classification of knowledge (i.e., to be based on scholarly principles in different domains, e.g., on diseases) (e.g. Bliss 1929). Therefore, the field of knowledge organization and the field of statistical classification should not be considered as separate domains, but as essentially related domains about the principles and methodologies of classification. In this connection it can be mentioned that the most influential classification of mental disorders, the → Diagnostic and Statistical Manual of Mental Disorders (DSM), which is widely used by psychiatrists and other professionals all over the world, partly evolved from systems for collecting census and psychiatric hospital statistics (as also reflected in its name).


3. Functions of statistical classifications

Hancock (2017, 130) listed the following functions for statistical classifications: they

  • identify the similarity between events, ideas, people, information and concepts;
  • provide a framework to assist information and knowledge management, policy and decision making;
  • establish trends for comparison of information and data over time;
  • allow the aggregation and disaggregation of big data and complex datasets in a meaningful way for analysis; and
  • facilitate the harmonisation and coordination of statistical information and data compilation and comparability worldwide.

These purposes are familiar to researchers in KO and information science, and are essentially identical to the functions of most → knowledge organization systems.
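The aggregation function in Hancock's list can be illustrated with a small sketch: counts recorded for the most detailed categories are rolled up to every ancestor in a hierarchical classification. The codes, the PARENT mapping and the function name aggregate are hypothetical and serve only as an illustration; they do not come from any actual statistical classification.

```python
from collections import defaultdict
from typing import Dict, Optional

# Hypothetical two-level classification: code -> parent code.
# None marks a top-level category.
PARENT: Dict[str, Optional[str]] = {
    "A": None,
    "B": None,
    "A1": "A",
    "A2": "A",
    "B1": "B",
}


def aggregate(
    leaf_counts: Dict[str, int],
    parent: Dict[str, Optional[str]],
) -> Dict[str, int]:
    """Add each detailed count to its own category and to all its ancestors."""
    totals: Dict[str, int] = defaultdict(int)
    for code, count in leaf_counts.items():
        node: Optional[str] = code
        while node is not None:
            totals[node] += count
            node = parent[node]  # move one level up
    return dict(totals)


if __name__ == "__main__":
    print(aggregate({"A1": 3, "A2": 4, "B1": 5}, PARENT))
```

The roll-up gives unambiguous totals only because each category has exactly one parent, which is what the requirement of mutually exclusive categories at each level guarantees.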


4. Research and development about statistical classification

Knüppel and Kunzler (2001) describe the influence of the Internet on data collection and dissemination in the European statistical system. Hancock (2017, 126) found:

“Traditional approaches to the development, maintenance and revision of statistical classifications no longer support or enable description of data in ways that are as useful to users as they could be. The ability to search and discover information in ways that were previously not possible means that new methodologies for managing and describing the data, and its associated metadata, are required. The development of structured lists of categories, often hierarchic in nature, based on a single concept, limited by the constraints of the printed page, statistical survey processing system needs, sequential code structures or narrow user defined scopes results in statistical classifications neither dynamically reflecting the real world of official statistics nor maintaining relevance in a fast changing information society.”

The article then explores the application of semantic web technology, the Simple Knowledge Organization System (SKOS), the Resource Description Framework (RDF) and other technologies and standards for use by statistical classifications. It also analyses the need to modernise approaches for developing statistical classifications. A part of Hancock’s conclusion (142) is:

Traditional statistical classifications have not yet reached a point of extinction but this paper has tried to demonstrate that demand for instant data and information is better enabled through innovative and newer ways of classifying information. Traditional hierarchies and parent-child relationships limit flexibility, and reduce the ability to better interpret and understand the data. Concepts and categories, relationship matrices and more user defined views of content need to be the norm. Thinking using the principles of SKOS or RDF frameworks is the most viable way to modernise statistical classifications and how they are used in knowledge or information management systems.
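To make the contrast between a pure parent-child hierarchy and a concept-based description more concrete, the sketch below stores a few categories as subject-predicate-object statements in the manner of RDF, using property names from the SKOS vocabulary (prefLabel, broader, related). It is an assumption-laden illustration in plain Python, not Hancock's model: the example.org namespace, the three concepts and the helper functions objects and ancestors are hypothetical.

```python
from typing import List, Set, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"  # SKOS vocabulary namespace
EX = "http://example.org/classification/"      # hypothetical namespace

Triple = Tuple[str, str, str]

# Hypothetical concepts; labels are illustrative only.
TRIPLES: List[Triple] = [
    (EX + "agriculture", SKOS + "prefLabel", "Agriculture"),
    (EX + "crops", SKOS + "prefLabel", "Crop production"),
    (EX + "crops", SKOS + "broader", EX + "agriculture"),
    (EX + "food", SKOS + "prefLabel", "Food manufacturing"),
    # A non-hierarchical link that a strict tree cannot express.
    (EX + "crops", SKOS + "related", EX + "food"),
]


def objects(subject: str, predicate: str, triples: List[Triple]) -> Set[str]:
    """Return all objects of statements with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def ancestors(concept: str, triples: List[Triple]) -> Set[str]:
    """Follow skos:broader links transitively from a concept."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        for parent in objects(stack.pop(), SKOS + "broader", triples):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(ancestors(EX + "crops", TRIPLES))
    print(objects(EX + "crops", SKOS + "related", TRIPLES))
```

Because every relationship is a separate statement, further links (for example to categories in another classification) can be added without restructuring the hierarchy, which is one way to read the call for "relationship matrices and more user defined views of content".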

Hancock (submitted) outlines principles of metadata modelling and concept-based classification in the subfield of statistical classification of economic statistics, and presents relevant standards in the field. Roos (2010) discusses the involvement of Statistics Netherlands in the Dutch Taxonomy Project, including the application of the eXtensible Business Reporting Language (XBRL) [4] in the Dutch Taxonomy. Statistics Netherlands is considered a pioneer in this field, and the paper discusses the many practical and methodological difficulties that the organization has faced.


Endnotes

1. A list of national statistical offices is given by the United Nations Statistics Division at: https://unstats.un.org/home/nso_sites/.

2. For a listing of statistical services see: https://en.wikipedia.org/wiki/List_of_national_and_international_statistical_services.

3. These examples are taken from Statistics Denmark, https://www.dst.dk/en.

4. XBRL’s official homepage is: https://www.xbrl.org.


References

Barité, Mario. 2018. “Literary warrant”. Knowledge Organization 45, no. 6: 517-536. Also available in ISKO Encyclopedia of Knowledge Organization, eds. Birger Hjørland and Claudio Gnoli. http://www.isko.org/cyclo/literary_warrant.

Bliss, Henry Evelyn. 1929. The organization of knowledge and the system of the sciences. New York: Holt.

Boeda, Michel. 2008. “Les nomenclatures statistiques: pourquoi et comment”. Courrier des statistiques, French series, no. 125, Nov.-Dec. 2008: 5-11. https://www.epsilon.insee.fr/jspui/bitstream/1/8283/1/cs125b.pdf (English translation as Boeda 2009).

Boeda, Michel. 2009. “The How and Why of Statistical Classifications”. Courrier des statistiques, English series no. 15: 3-9. https://www.semanticscholar.org/paper/... (French original as Boeda 2008).

Hancock, Andrew. 2017. “The Modernisation of Statistical Classifications in Knowledge and Information Management Systems”. The Electronic Journal of Knowledge Management 15, no. 2: 126-44. Available online at http://www.ejkm.com.

Hancock, Andrew. Submitted. “The Use of Metadata Modelling for the Modernisation of Information Management of Statistical Classifications”. Journal of the International Association for Official Statistics.

Knüppel, Wolfgang and Kunzler, Uwe. 2001. “Influence of the Internet on Data Collection and Dissemination in the European Statistical System”. Paper presented at the IAOS Satellite Meeting on Statistics for the Information Society, August 30 and 31, 2001, Tokyo, Japan. Available at http://www.stat.go.jp/english/info/meetings/iaos/pdf/knuppel.pdf.

Krauth, Joachim. 1981. “Techniques of Classification in Psychology I: Factor Analysis, Facet Analysis, Multidimensional Scaling, Latent Structure Analysis”. International Classification 8, no. 3: 126-32.

Krauth, Joachim. 1982. “Techniques of Classification in Psychology II: Cluster Analysis, Typal Analysis, Configural Frequency Analysis, Discriminant Analysis, Regression Analysis”. International Classification 9, no. 1: 1-10.

Mann, Prem S. 2007. Introductory Statistics, 6th Edition. New York: Wiley.

Roos, Marko. 2010. “Using XBRL in a Statistical Context: The Case of the Dutch Taxonomy Project”. Journal of Official Statistics 26, no. 3: 559-75. https://www.scb.se/contentassets/...

United Nations Statistics Division. 2013. Best Practice Guidelines for Developing International Statistical Classifications. Retrieved from https://unstats.un.org/unsd/classifications/bestpractices/Best_practice_Nov_2013.pdf

World Health Organization. 2010. International Statistical Classification of Diseases and Related Health Problems, 10th Revision. Retrieved from http://apps.who.int/classifications/icd10/browse/2010/en.


 



Version 1.1 (= 1.0 plus reference to Hancock submitted); version 1.0 published 2020-03-04, this version 2020-03-16
Article category: KO in different contexts and applications

This editorial article is not peer-reviewed and is not being published in the journal Knowledge Organization.

©2020 ISKO. All rights reserved.