Measuring Information
Cheatsheet Content
### 1.0 Objectives
- Understand the difficulties in developing operational frameworks for measuring information and informational objects.
- Grasp measurement features and techniques for various entities and phenomena.
- Know the notion of Informativeness and its application to user-text interaction and database evaluation.

### 1.1 Introduction to Measurement
- **Lord Kelvin's Insight:** "through measurement to knowledge." Measurements yield numerical values, crucial for human communication and knowledge systems.
- **Simplest Form: Counting:** Applies to discrete, independent objects (e.g., 543 books; 20 lakh 92 thousand 5 hundred, i.e. 2,092,500, unemployed persons).
- **Challenges:** Many entities (water, electricity, performance, information) do not come in discrete units and cannot be counted directly.
- **Indirect Measurement:** For non-discrete items, properties such as volume, weight, density, or viscosity are measured (e.g., 2 liters of water, 200 grams of water).
- **Intangible Measurement:** For concepts like knowledge or understanding, physical devices are insufficient; conceptual setups and logical processes are used instead.

### 1.2 Information Revisited
- **Problem:** No universally accepted scales or devices for measuring information existed up to AD 2000. The nature and parameters of information must be understood first.
- **Information Definition:** Information means different things to different people but is accepted as a basic entity. It relates to knowledge, wisdom, and data.

#### 1.2.1 Status of Information
- **Pervasiveness:** Information is everywhere, similar to energy or gravitation, yet subtler.
- **Definition Challenge:** No workable general definition applies to all types of information processes (multimedia, books, messages).
- **Scientific Soundness:** Unlike matter or energy, information lacks an unambiguous, literal description and mathematical relations in a scientific sense.

#### 1.2.2 Three Attitudes toward Information
1. **Human Context:** Information is relevant only in human cognition and communication; there is no "physics of information."
2. **Organization/Orientation:** Information expresses the organization or orientation of elements in a set; it indicates a change in pattern.
3. **Action-Oriented:** Information is considered in terms of response to a stimulus; it gives an idea of system activity.

#### 1.2.3 Information in Discourse
- **Derr's Requirements:** For information to be defined, it must:
  - Be a representation (abstract and meaningful).
  - Consist of determinations that have been made.
  - Have those determinations made of (about) certain objects.

#### 1.2.4 Defining Information
- **Belkin (1978):** Information as the 'structure' of any text, changing the recipient's image structure.
- **MacKay (1969):** Information does logical work on system organization.
- **Brookes' Definition (Modified):** Information is an entity manifesting communicable knowledge, reducing uncertainty, and switching the action/pattern of a system. It is semantic in essence; shared meaning is crucial.
- **Conditions for Information:**
  - Communication (a material system, with human beings as generator/receiver/preserver) is necessary.
  - Information processes are embedded in energetic processes (an energy change occurs), but there is no direct linear relationship between energy and information amount.

#### 1.2.5 Informetrics and Information Measurement
- **Informetrics:** Quantitative (and qualitative) accounting of information processes; includes bibliometrics.
- **Bibliometrics:** Understanding knowledge diffusion, growth/decay of subject fields, and author productivity through word counts, journal rankings, bibliographic references, and theorem counts.
- **Implicit Hypothesis:** Counted items are linearly related to cognitive knowledge/information.
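
The kind of counting bibliometrics relies on can be shown with a small sketch. The records and author names below are invented for illustration, and `author_productivity` is a hypothetical helper, not part of any bibliometric library. The output is only a tally of items; reading it as a measure of knowledge rests on the implicit linearity hypothesis noted above.

```python
from collections import Counter

# Hypothetical bibliographic records: (title, authors). Invented data.
records = [
    ("Paper A", ["Rao", "Sen"]),
    ("Paper B", ["Rao"]),
    ("Paper C", ["Das", "Rao"]),
    ("Paper D", ["Sen"]),
]

def author_productivity(recs):
    """Count the number of records each author appears on (full counting)."""
    counts = Counter()
    for _title, authors in recs:
        counts.update(set(authors))  # set(): credit an author once per record
    return counts

productivity = author_productivity(records)
for author, n in productivity.most_common():
    print(author, n)
```

The tally says nothing about how much each paper contributes cognitively, which is exactly the gap the choice of "information unit" has to address.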
### Informetrics Challenges
Informetrics faces several foundational and practical challenges in its pursuit of understanding information phenomena:
- **Information Unit Definition:** A critical challenge lies in selecting the appropriate "information unit" for bibliometric analysis. This choice directly affects how accurately the analysis can model the underlying cognitive flow and knowledge transfer processes.
- **Terminology Debates:** The field is characterized by ongoing debates and overlapping definitions for terms such as "bibliometrics," "librametrics," "informetrics," "scientometrics," and "cybermetrics." "Sciento-informetrics" has emerged as a composite term to encompass these interdisciplinary areas.
- **Measurement Ambiguity:** The measurement processes within sciento-informetrics are not inherently foolproof. They need robust theoretical models that connect observable parameters (e.g., citations, publications) to meaningful and interpretable results, moving beyond mere quantitative counts.
- **Spiral of Cumulative Knowledge:** Informetrics often operates within a "spiral of cumulative knowledge": models are built from initial intuition and observed empirical regularities, which leads to model refinement, deeper theoretical understanding, and the discovery of new regularities.
- **Informetric Laws:** The field has identified several well-established regularities, often referred to as informetric laws:
  - **Lotka's Law:** Describes the frequency of publication by scientists: the number of scientists publishing $n$ papers is proportional to $1/n^2$ (more generally $1/n^k$, with the exponent $k$ typically close to 2).
  - **Zipf's Law:** Pertains to the rank of word occurrence in a text: the frequency of any word is inversely proportional to its rank in the frequency table.
  - **Bradford's Law:** Illustrates the relationship between journal rank and the cumulative number of articles on a specific subject: a small core of journals accounts for a large proportion of relevant articles.
  - **Zipfian Distribution:** A generalization that encompasses these laws, describing power-law distributions where the frequency of an item is inversely proportional to its rank. These are not always strict statistical regularities across all datasets.
- **Conceptual Issues:** Fundamental conceptual issues, as highlighted by Bookstein, question the very underpinnings guiding data collection. These include defining "core journals," distinguishing a "word" in different contexts, and delineating areas of "specialization" in research.

### 1.3 Framework for Information Exchange
- **K-structure & K-state:** Every information system has an inherent K-structure (subjective knowledge/knowledge within) and exists in a K-state at any time.
- **Information Flow:** Output doesn't necessarily change the K-state, but input might (if the knowledge is new). K-structural activities can also change the K-state.
- **K-amount:** Abstract, subjective counterpart of "information amount." Can decrease (memory loss) or increase (discovery, learning).
- **Communication:** Sharing meaning; incorporation of information in K-structures to change the K-state. Equivalent K-state changes across systems mean the same information amount.
- **Information Coding:** Information is coded in symbols, transferred via channels, or preserved in material bodies (messages).
- **Atomic Information Units (AIU):** Information can be shared if the sharers can produce/receive semantic elements in AIUs (information-carrying concepts).
- **Information Units (IU):** Larger strings of AIUs, analogous to molecular concepts. For measurement, IUs are the units of information (byte, word, document, parts of a document).
- **Semantic Integrity:** If the symbolic string breaks, the semantic essence of the IU is lost. AIUs cannot be divided while preserving information.
- **Information/K-state:** A particular pattern/organization of information elements embedded in material/energetic elements (molecules, atoms, waves, electrons).
- **Notation:**
  - `I`: Information
  - `J`: Information units
  - `E`: Energy
  - `K`: K-state
  - `m`: Measure of any parameter
  - `I(I)`: Information on/about information
  - `U`: Uncertainty
- **Information Equations/Relations:**
  - `I = I₁ - I₂ = *(I₁/I₂)`: Represents composition on `p` and `r`.
  - `J`: AIUs (core/prominent information elements).
  - `j`: AIUs (supporting elements).
  - `m(AI) = in (*J)(*j)`: Measure of AI.
  - `m(ap ar)`: Values of AIUs equivalent to J and j.
  - **Output ΔI:** `K → K`. When `ΔK` is sent as output, `ΔI ≥ 0` and the K-state of the sender does not change.
  - **Input ΔI:** `K → K + ΔK`, `ΔK ≥ 0`, `ΔI > 0`. K changes if the input knowledge is new (`ΔK > 0`) and stays the same otherwise (`ΔK = 0`).
  - **Energy Relation:** `ΔK = f(E), ΔI = g(E)`. If `ΔI > 0`, `ΔE >> 0`, `ΔK ≥ 0`. A change in information/knowledge corresponds to a change in energy.
  - **Meta-information:** `m(I(I))`, the measure of information about information.

### 1.4 Measurement
- **Definition:** Assigning numbers to observed phenomena based on rules (not at random). Enhances information value, reduces uncertainty.
- **Formal Definition:** Quantified observation of an attribute of an object/process. Mathematically, a one-to-one mapping from empirical observations to the real number line.
- **Key Principle:** Only observable parameters can be measured. The mapping preserves relationships and operations.

#### 1.4.1 Requisite Conditions of Measurement
- **Context-Dependent:** Measurement varies for different phenomena (time, drug action, student performance, market value).
- **Conditions:**
- **Energetic Process:** Requires effort and expenditure of energy. The stimulus-response complex affects the system and generates energy fluctuation (increases entropy), which reduces precision and increases noise.
- **Interference:** Measurement interferes with the system, leading to imprecision and non-repeatability.
- **Well-defined Parameter:** Must be made on a well-defined parameter with an exact expectation of outcome and meaning.
- **Datum Expression:** Must be expressed as a datum (numerical value) following a 'code', 'scale', 'unit', and 'point of reference' (zero). Requires a well-developed frame of reference, often linear.
- **Tools & Process:** Requires a device/equipment/tool and a step-by-step measuring process.
- **General Facts:** All measurements are (i) arbitrary, (ii) approximate, (iii) relative, and (iv) a 'validation' and 'evaluation.'

### 1.5 Scales of Measurement
- **Definition:** A mathematical homomorphism between relational systems.
- **n-dimensional scale:** Homomorphism from an empirical relation system into an n-dimensional numerical relation system.
- **Linear/One-dimensional Scale:** A mapping from empirical observations to the real number system.
- **Types of Scales** (each type is characterized by the transformations that may be applied to the scale `f` without losing its meaning):
  - **Nominal (Category/Qualitative) Scale:** `f` is nominal if any one-to-one mapping from R to R is admissible. Ordered/unordered words or numbers identify or describe observables (e.g., good, fair, average). Qualitative assignments can be numerical (e.g., 10, 6, 3, 0).
  - **Ordinal Scale:** `f` is ordinal if any monotonically increasing continuous mapping from R to R is admissible. Examples: ratings, rankings, numerical rolls (accession numbers, competition scores). The numbers identify and order items; differences between them lack quantitative significance.
  - **Interval Scale:** `f` is an interval scale if the admissible transformations are positive linear transformations (`x → ax + b`, `a > 0`). Examples: physical measurements like temperature, time, distance. Numbers denote parameter values, not just identification.
  - **Difference Scale:** `f` is a difference scale if the admissible transformations shift `f` by a constant `C` (`x → x + C`). Has a unique unit of measure but arbitrary origin.
  - **Ratio Scale:** `f` is a ratio scale if the admissible transformations are positive multiples (similarities, `x → ax`, `a > 0`). Examples: sizes, weights, "three times more informative." Has a unique origin but arbitrary unit of measure. Zero values (e.g., zero informativeness) have unique meaning.
  - **Absolute Scale:** `f` is an absolute scale if the only admissible transformation is the identity. Applies to counting (e.g., number of words, sentences, articles, books).

### 1.6 Measurement Techniques
- **Variety:** Techniques differ widely across subject areas.
- **Physical Sciences:** Physical scales and devices are used. Results are observed optically (length, angle, temperature, time). Numerical values are displayed digitally (LCD).
- **Comparison/Equivalence:** Measurement by comparison to a standard unit (e.g., volume, weight).

#### 1.6.1 Utility of Measurement
- **Key Uses:** Identification, Comparison, Value judgment, Classification, and enhanced understanding.
- **Information Measurement:** Crucial for understanding information state, system structure, and the user's decoding capacity.
- **Sciento-informetrics:** Utility best seen in exploiting bibliographic data.
- **Bibliographic Data:** Represents socio-cultural output. When recorded, folk information becomes bibliographic data. Manipulation of this data helps understand socio-cultural activities. Applied to scientific/technological activities via bibliometrics.

#### 1.6.2 Counting and Measurement
- **Counting (Enumeration):** Possible only for discrete objects (human beings, grains, stones). Not for non-discrete items like water.
- **Measurement:** Requires a device and a scale defining the unit.
- **Counting vs. Measurement:** Counting is primitive. In sciento-informetrics, bibliographic data is mostly counted.

#### 1.6.3 Classification and Counting
- **Counting Condition:** Possible only if an item has a unitary property uniquely defined over a set of countable items.
- **Sub-classification:** Counting items in a sub-class (e.g., cows from animals) measures the speciating parameter.
- **Semantic Challenge:** For information, words are lowest semantic units. Problem arises with synonyms and when words combine into sentences/documents.

### 1.8 Information Objects and Content
- **Buckland's View:** Information exists as 'thing,' 'process,' or 'knowledge.' Recorded information is 'information as thing' (informational objects).
- **Informational Objects:** Countable items, from whole documents (book, CD-ROM) to alphanumeric symbols or bits.
- **Content Measurement:** Amount of message (information) is measured by number of such objects (e.g., 32 megabytes, 3000 words).

#### 1.8.1 Information Content
- **Semantic Value:** Counting informational objects doesn't give a true picture of information content. Semantic value (meaning) is the key.
- **Challenge:** No unit for semantic value. Words are lowest semantic units, but synonyms and language differences pose problems.
- **Combination Problem:** No uniform rule for combining semantic values of words into sentences/texts. Semantic values are not additive and don't follow uniform rules.
- **Classification Approach:** Classificationists assign subject components hierarchically to represent thought content. But classification numbers only represent nature, not total amount or semantic value.

#### 1.8.2 Range of Information Content
- **Debate:** What constitutes information content in transmission/exchange? All components (details, redundancies) or just indicative package?
- **Example: Euclidean Geometry:** Sending all axioms/theorems vs. sending only axioms and methods. Are these messages informationally equivalent?
- **Conceptual Contraction:** A new term replacing many terms (e.g., a single term having equivalent information content to a chain of description).
- **Attempts to Overcome:** Response to stimulus, uncertainty reduction, change of informational status/state, assessment of informativeness.
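
The gap between counting informational objects and capturing semantic value can be made concrete with a small sketch. The two messages, the `SYNONYMS` table, and the helper names are all invented for illustration; the synonym table is a toy assumption, not a real thesaurus.

```python
def object_counts(text):
    """Count informational objects at two granularities: bytes and words."""
    return {
        "bytes": len(text.encode("utf-8")),
        "words": len(text.split()),
    }

# Two invented messages that a reader would treat as equivalent in meaning.
msg_a = "The car is fast."
msg_b = "The automobile is quick."

counts_a = object_counts(msg_a)
counts_b = object_counts(msg_b)
print(counts_a)
print(counts_b)

# Toy synonym table (an assumption made only for this example).
SYNONYMS = {"automobile": "car", "quick": "fast"}

def normalise(text):
    """Lower-case, strip punctuation, and map each word to a canonical synonym."""
    tokens = (t.lower().strip(".,") for t in text.split())
    return [SYNONYMS.get(t, t) for t in tokens]

same_meaning = normalise(msg_a) == normalise(msg_b)
print(same_meaning)
```

The byte counts differ while the word counts agree, and neither number reveals that the two messages say the same thing; that only appears once a (hand-made) semantic mapping is introduced.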
### 1.9 Informativeness
- **Definition (Tague-Sutcliffe):** Intangible aspect of the interaction between text and reader. The reader's benefit determines usefulness/information content.
- **Dynamic Concept:** Informativeness is dynamic, similar to relevance (Schamber, Eisenberg, Nilan). Depends on user judgments, the quality of the information item, and the user's information need.
- **Measurable Relevance:** Degree of relevance can be quantified. Related to "information wanted" and "informativeness."
- **Contextual:** Informativeness measurement depends on database development, item organization, retrieval, and user interaction.
- **User-centric:** A text is informative to a user only if it adds to their K-store (`K + I → K'` with `K' > K`). Informativeness cannot be negative; it can be zero (non-relevant).
- **Properties of Informativeness:**

1. **Non-negative:** A non-negative number associated with user-record interaction. Varies from user to user.
2. **Indirect Measurement:** Cannot be measured directly. The user's ranking of records (relative to information amount) preserves ordering.
3. **Non-commutative/Non-additive:** Concatenation (reading/using records sequentially) is neither commutative nor additive. Order affects informativeness ranking.
   - **Assumptions:**
     - Two texts T1, T2 related by content.
     - Two clients C1, C2 with similar background knowledge.
     - C1 reads T1 then T2; C2 reads T2 then T1.
     - Judgments of I(T1), I(T2) may differ.
   - **Non-commutativity in concatenation:** If the texts cover a similar topic, reading order matters: `I(T1) * I(T2) ≠ I(T2) * I(T1)`, where `*` denotes concatenation.
   - **Non-cumulativeness in concatenation:** Only if the texts are on different subjects may pure additivity hold: `I(T1) * I(T2) = I(T2) * I(T1) = I(T1) + I(T2) = I(T2) + I(T1)`.
4. **Invariance under Granularity/Partitioning:** Total informativeness is invariant if the sum of partitions equals the whole record, provided the sequence of partitions is maintained (subsequence condition).
5. **Decreasing Informativeness:** When records are ordered by non-increasing user relevance, informativeness of a subsequence is proportional to the logarithm of the number of records in the subsequence. Effectivity decreases with use (knowledge acquisition). Caveat: this holds only if the records have related topical content.
6. **Exponential Decrease:** Probability of acquiring new information decreases exponentially. `I(T)a = k * n^b`. Taking logarithms turns this into a linear (straight-line) relation.
7. **Information Service Response:** Informativeness of an information service response relates to the completeness and ordering of retrieved records compared to an ideal chain satisfying the user need.

- **Query Categorization:** By number of records in the ideal chain, alternative ideal chains, and variation across users. Short-answer queries have stable ideal chains; major research queries vary.
- **Service Organization:** Informativeness of an information service organization measures the extent to which it satisfies user needs.
- **Note:** An informativeness measure must be aggregatable (accumulation/cumulation) over records/users for evaluating service organizations.

#### 1.9.1 Use of Informativeness Measure
- **Applications:**
  1. **Collection Development/Assessment:** Comparing user assessment of informative values across databases or over time. Assessing a single document's long-term information provision.
  2. **Document Description:** Assessing effectiveness of indexing, abstracting, classifying, cataloguing for different user groups.
  3. **Retrieval Processes:** Estimating information from retrieved documents/records using a search strategy. Assessing useful/pertinent information missed. Evaluating retrieval efficiencies of different strategies.
  4. **Repackaging/Consolidation:** Evaluating against usefulness and original items. Important for knowledge management, content analysis.

#### 1.9.2 Observations
1. **Relative Measure:** Informativeness is relative, not universal/absolute.
2. **Dependencies:** Informativeness depends on:
   - Topical content/thought content.
   - Informational state/K-state of clients.
   - Concatenation/sequence of interaction.
   - Amount of information content (hypothesized but not directly measurable).
   - Commonality/disjointedness of I(T)s.
   - Informativeness I of texts is not additive unless the I(T)s are disjoint.
   - `C₁: K₁ + I(T₁) → K₁ + [ΔK]`: `ΔK` is the absolute information amount of Text T₁.

### 1.10 Appendix: Algorithm to Determine Informative Chains
- **Source:** Tague-Sutcliffe, 1995.
- **Input:** `N` (number of records), and array `PREFER`.
- `PREFER(I, J)`:
  - `1`: I is more informative than J.
  - `0`: I and J are equally informative.
  - `-1`: J is more informative than I.
  - `9`: I and J are incomparable.
- **Algorithm Steps:**
- **Determine `NUM` array:** `NUM(I)` = number of records `J` such that `T(I)` is more informative than `T(J)`.
- **Recalculate `NUM`?** `Q1` index determines if `NUM` needs recalculation due to revisions.
- **Initialize:** `Q1=1`, `NUM` array to zeros.
- **Outer Loop (I):** Iterate from `1` to `N`.
- **Inner Loop (J):** Iterate from `I` to `N`.
- **`PREFER` Check:** If `PREFER(I, J) = 1`, increment `NUM(I)`.
- **Determine non-ordinal pairs:**
  - `M=0` (count of non-transitive triples).
  - Nested loops for `J, K, H` to check for non-transitive conditions:
    - `PREFER(J,K)=1 AND NUM(J)>NUM(K)`
    - `PREFER(J,K)=-1 AND NUM(K)>NUM(J)`
    - `PREFER(J,K)=0 AND NUM(J)=NUM(K)`
    - `PREFER(J,K)=9`
  - If any condition is met, `GO TO LABEL (1)` (skip to next `J` iteration for `NUM` calculation).
  - If `PREFER(J,K)=1` and `J<>H` and `K<>H` and (`PREFER(J,H)=-1` or `PREFER(J,H)=0`) and (`PREFER(H,K)=-1` or `PREFER(H,K)=0`), increment `M` and store `J, K, H` in `NON` array.
- **Revise preferences:** If `M > 0`, set `Q1=1` and print "PREFERENCES ... ARE NOT CONSISTENT. PLEASE REVISE".
- **Input `PREFER` again:** For `J=1 TO M`, `K=1 TO 3`, prompt for `PREFER(A, B)`.
- **Update `PREFER` based on input:** If `PREFER(A, B)=1`, set `PREFER(B, A)=-1`, etc.
- **Calculate `OUTT` array:** `OUTT(I, J)` is `1` if `I`-th and `J`-th records are comparable and not assigned, `0` otherwise.
- **Sorting `NUM`:** Sort `NUM` in descending order. `ID(I)` is the original number of the `I`-th sorted record.
- **Determine comparable sets:** Calculate `COMPARE` matrix. `COMPARE(I, J)=1` if `J`-th record is in `I`-th comparable set, `0` otherwise.
- **Determine first comparable set:**
  - `P=1`. `K1=ID(1)`. `COMPARE(P, K1)=1`. `OUTT(K1, K1)=0`.
  - Loop `J=2 TO N`. `K2=ID(J)`.
  - If `PREFER(K1, K2)=9`, `TEST=0`. Else `TEST=1`.
  - Loop `K=1 TO J-1`. `K3=ID(K)`.
  - If `COMPARE(P, K3)=1` and `PREFER(K3, K2)=9`, `TEST=0`.
  - If `TEST=1`, `COMPARE(P, K2)=1`. `OUTT(K1, K2)=0`. `OUTT(K2, K1)=0`.
- **Iteratively determine all pairs:** Loop `I=1 TO N`, `J=1 TO I`. `K1=ID(I)`, `K2=ID(J)`.
  - If `OUTT(K1, K2)=1`, then `COMPARE(P, K1)=1`, `COMPARE(P, K2)=1`. `OUTT(K1, K2)=0`, `OUTT(K2, K1)=0`.
  - Loop `H=1 TO N`. `K3=ID(H)`.
  - If `PREFER(K1, K3)=9` or `PREFER(K2, K3)=9`, `TEST=0`. Else `TEST=1`.
- **Redetermine `NUM` for `I`-th chain:** Set `NUM(J)=0` for `J=1 TO N`.
  - If `COMPARE(I, J)=0`, `NUM(J)=-1`.
  - Loop `K=1 TO N`. If `COMPARE(I, K)=1` and `PREFER(J, K)=1`, increment `NUM(J)`.
- **Sort `NUM` in descending order.**
- **Determine chains:** `EDGE(I, J)=K` if `J`-th record is in `K`-th edge of `I`-th chain, `0` otherwise.
- **Assign records to edges:** `NUM(N+1)=-1`. `R=0`.
  - Loop `J=N TO 1 STEP-1`. If `NUM(J)<>NUM(J+1)`, increment `R`. `EDGE(I, ID(J))=R`.

### 1.11 Summary
- **Measurement of Information:** Not directly possible due to lack of definition and effective devices.
- **Informetrics:** Studies quantitative aspects of informational processes; needs clear understanding of measurement.
- **Information Concepts:** Reintroduced information from various viewpoints. Information systems have a K-structure (abstract knowledge/information structure) and a K-state (active memory).
- **K-state Dynamics:** K-state changes with information input/output (if new/relevant).
- **Information Measurement:** Measure information through atomic information units (semantic, unique meaning).
- **Informetrics Components:** Bibliometrics, scientometrics, librametrics are components of "sciento-informetrics." Crucial to define "information units."
- **Measurement Definition:** Assigning numbers/quantified tags to parameters based on units. It's an energetic process, interfering with the system, expressed as a "datum."
- **Scales:** Essential for measurement. Various types: nominal, ordinal, interval, difference, ratio, absolute.
- **Techniques:** Differ; classification and counting are simple universal means. Measurements need standardization for accuracy, stability, reliability, validity.
- **Informativeness:** Discussed as a key concept. Very useful for measuring effectiveness of information services, following Tague-Sutcliffe's seven properties.
- **Seven Properties of Informativeness:** Non-negative, not additive, not directly measurable, invariant under granularity, decreasing with use, exponential decrease, and related to information service response.
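
As a closing illustration, the first two steps of the Appendix algorithm (1.10), building `NUM` and detecting non-transitive triples, might be sketched in Python as below. This is a simplified reading, not Tague-Sutcliffe's program: it uses 0-based indices, counts over all `J` rather than from `I` to `N`, omits the early-exit conditions and the interactive revision loop, and the function names `num_array` and `non_transitive_triples` are hypothetical.

```python
def num_array(prefer):
    """NUM[i] = number of records j that record i is more informative than."""
    n = len(prefer)
    return [
        sum(1 for j in range(n) if j != i and prefer[i][j] == 1)
        for i in range(n)
    ]

def non_transitive_triples(prefer):
    """Return triples (j, k, h) where j is preferred to k, yet j is not
    preferred to h and h is not preferred to k (values -1 or 0)."""
    n = len(prefer)
    triples = []
    for j in range(n):
        for k in range(n):
            if j == k or prefer[j][k] != 1:
                continue
            for h in range(n):
                if h in (j, k):
                    continue
                if prefer[j][h] in (-1, 0) and prefer[h][k] in (-1, 0):
                    triples.append((j, k, h))
    return triples

# Invented preference matrices: 1 = row more informative, -1 = column more
# informative, 0 = equal, 9 = incomparable (same coding as PREFER).
consistent = [
    [0, 1, 1],
    [-1, 0, 1],
    [-1, -1, 0],
]
cyclic = [
    [0, 1, -1],
    [-1, 0, 1],
    [1, -1, 0],
]

print(num_array(consistent))
print(non_transitive_triples(consistent))
print(non_transitive_triples(cyclic))
```

A non-empty result corresponds to the `M > 0` case, where the user would be asked to revise the inconsistent preferences before chains are built.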