Intellectual Organization of Info
Cheatsheet Content
1.0 Objectives
Understand intellectual organization of information.
Distinguish intellectual from non-intellectual efforts.
Perceive intellectual organization in indexing and indexing languages.
Assimilate intellectual organization in user services and searches.
Gain insight into current trends in computerized information retrieval systems.
Note current research efforts in information retrieval (IR) systems and their future.

1.1 Introduction
Information and knowledge are prime resources for human growth and socio-economic development.
Information and knowledge are products of human intellect, dynamic and ever-growing.
Presented in various forms: journal articles, conference papers, technical reports, theses, monographs.
Access provided through indexes, abstracts, and other condensation methods.
"Intellectual organization of information" refers to methods for organizing secondary communication to access primary communication.
Broader perspective: organizing thoughts and ideas, from generation to final presentation, is an intellectual effort.
Intellectual efforts include reference interviews, storage methods, search techniques, classification schedules, thesauri design, retrieval efficiency measures, and evaluation of indexing systems.
Computers and communication technologies have widened the scope of searching databases.

1.2 Intellectual Organisation of Information (IOI)
Information is a product of human intellect meant for use by others, disseminated and communicated.
Productive transfer and absorption depend on IOI at every level.
The process of human thoughts and ideas:
Stage | Mental process | Formalisation | Additional process
1 | Observation | Ideas / Data | Instrument, Extra-senses
2 | Organisation (Logical relation) | Information | Systematic Principles
3 | Conference and Compaction | Subject | Fundamental Principles
4 | Learning, Assimilation | Knowledge | Perception, Modeling, Process
5 | Judgment, Correlation, Application to Contexts | Wisdom | Sensing the Context, Retrieval of Knowledge, Appropriate Decision

1.2.1 Meaning of Intellectual Organisation of Information (IOI)
Human intellect: power of thinking, learning, understanding, assimilating, acquiring, and organizing mental constructs for communication.
IOI aims to provide easy access to primary information in various documents (print, non-print, digital).
Ziman (1969) on 'information' turning into 'knowledge':
  Science aims for understanding, not just data accumulation.
  Synthesis is crucial for combining separate pieces of information into a coherent intellectual machine.
Three aspects of knowledge organization:
  Organization by creation.
  Self-organization.
  Bibliographic organization.
Indexing and abstracting are 'bibliographic organization' and 'extraction processes'.
Reviewing, surveying, and consolidation are 'summation processes'.
Secondary services (indexes, abstracts, bibliographies) are essential feeders to primary information communication.

1.2.2 Why IOI Is Necessary?
Purpose: provide relevant information to users with speed, accuracy, and ease.
Figure 1.1: Aim of intellectual organization of information (figure not reproduced; its labels are A: Not Relevant, C: Relevant, E: Dodged, Wasted, Retrieved, F: Hits, Missed).
Ideal situation: the search result is 100% accurate (EF and CB coincide); this is rarely achieved.
Indexing techniques and tools aim to retrieve as many relevant documents as possible.
IOI plays a crucial role in achieving the APUPA pattern (Alien, Penumbra, Umbra, Penumbra, Alien):
  A = Aliens (unwanted parts)
  P = Penumbra (related parts, wider range)
  U = Umbra (precise parts)
Indexing systems aim for ideal results.
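As a rough illustration of the regions named in Figure 1.1, the sketch below splits a small collection into hits, missed, wasted, and dodged documents, and computes the recall and precision ratios that section 1.9 defines. The document IDs, the function names, and the reading of each label (hits = relevant and retrieved, missed = relevant but not retrieved, wasted = retrieved but not relevant, dodged = neither) are assumptions made for this example; the figure itself is not reproduced in these notes.

```python
def partition_results(collection, relevant, retrieved):
    """Split a collection into the four regions of a search outcome.

    Assumed reading of the Figure 1.1 labels:
      hits   = relevant and retrieved
      missed = relevant but not retrieved
      wasted = retrieved but not relevant
      dodged = neither relevant nor retrieved
    """
    collection, relevant, retrieved = set(collection), set(relevant), set(retrieved)
    return {
        "hits": relevant & retrieved,
        "missed": relevant - retrieved,
        "wasted": retrieved - relevant,
        "dodged": collection - relevant - retrieved,
    }


def recall_ratio(relevant_hits, total_relevant):
    """(Relevant hits / total relevant references in the database) * 100."""
    return 100.0 * relevant_hits / total_relevant if total_relevant else 0.0


def precision_ratio(relevant_hits, total_retrieved):
    """(Relevant hits / total retrieved references) * 100."""
    return 100.0 * relevant_hits / total_retrieved if total_retrieved else 0.0


# Hypothetical eight-document collection.
collection = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d3", "d4", "d5", "d6"}

regions = partition_results(collection, relevant, retrieved)
hits = len(regions["hits"])
print(sorted(regions["hits"]), sorted(regions["missed"]))
print(recall_ratio(hits, len(relevant)), precision_ratio(hits, len(retrieved)))
```

In the ideal situation described above, the missed and wasted sets would both be empty, so both ratios would reach 100.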
1.3 IOI in Primary Communication
Macro documents (books, monographs, treatises, textbooks) often include indexes to locate specific ideas.
Indexes connect different scattered aspects of a concept (e.g., "roses" with cultivation, varieties, scent technology).
By extracting substantive words/phrases, the author's thoughts and ideas are retained.
Intellectual skills required: understanding the subject, choosing substantive words, establishing relationships, and displaying them appropriately in the index.

1.4 IOI in Indexing Systems
Indexing work involves both intellectual and non-intellectual (clerical) parts, both equally important for retrieval efficiency.
A.C. Foskett divides indexing systems into:
  Derived Indexing
  Assigned Indexing
Subject approach poses challenges for accurate retrieval efficiency, unlike fact, author, or title retrieval.

1.4.1 Derived Indexing Systems
Information for indexing is derived directly from the document, without adding external information.
Used in printed, computerized, and online indexes.
Examples: title-based systems like Catchword, KWIC, KWOC, KWAC.
Citation indexing has a different philosophy.
Minimal intellectual effort, suitable for computer operations (speed, efficiency, accuracy).

Expert Systems
Computerized systems with high intellectual capability in design and operation.
Features (Foskett):
  Represent expert's domain-specific knowledge.
  Incorporate explanation processes and handle uncertainty.
  Pertain to symbolically represented problems.
  More tolerant of user errors than conventional programs.
Requirements:
  Acknowledged expert in the subject area.
  Expertise based on judgment and experience.
  Expert willing to explain knowledge.
  Problem must be well bounded.
  Problem area must have real consensus.
  Test data easily available.

1.4.2 Assigned Indexing Systems
Indexer assigns terms to documents for storage and retrieval.
Intellectual skills required:
  Good subject knowledge.
  Ability to analyze documents and identify concepts for representation.
  Recognize sequence/syntactic relationship between concepts.
  Identify broad/narrow/semantic relationships between concepts.
Once intellectual aspects are done, index entry format can be determined.
Mechanical, routine operations can be relegated to computer software.
Challenges in English language indexing: multiword concepts, citation order, spelling, synonyms, antonyms, homographs.
Tools: Classification Systems, Subject Headings Lists, Thesauri, Thesaurofacets, Classaurus.

1.5 IOI and Indexing Languages
Index languages are tools with syntactic and semantic features for indexing.
Include: Classification systems, Subject Headings Lists, Thesauri, Thesaurofacets.
Classification schemes: organize books, display arrangement, study subject ramifications, understand concept relationships.
Subject Headings Lists: understand concept relationships, used in indexing.
Information Retrieval Thesauri: based on equivalence, hierarchical, and associative relationships, displayed alphabetically.
Thesaurofacets: combine classificatory principles with alphabetical thesaural features, used for shelf organization and information retrieval.
Natural language indexing and free indexing languages are common in computerized databases.

1.5.1 Classification Systems
Colon Classification (CC): Explicit syntactic system, systematic schedules, hierarchical and associative relationships via notation. Designed by postulates and principles (Analytico-Synthetic).
Bibliographic Classification (BC), Dewey Decimal Classification (DDC), Universal Decimal Classification (UDC): Less explicit syntactic/semantic features, systematic schedules with notation.
Library of Congress Classification (LCC): Fully enumerated scheme, implicit design principles, extensive schedules.

1.5.2 Subject Headings Lists and Thesauri
Subject Headings Lists: valuable for indexing, indicate conceptual relationships, include choice of terms and preference.
Increasingly incorporate thesaural features.
  Library of Congress Subject Headings (LCSH): Best for indexing and retrieval.
  Sears List: Shorter version for smaller libraries.
  Medical Subject Headings (MeSH), Subject Headings in Engineering (SHE): Specialized.
  Public Affairs Information Service (PAIS) Subject Headings List: Used in social sciences.
Information retrieval thesauri (since 1950s): useful for index term choice and displaying concept relations.
  Engineers Joint Council (EJC) Thesaurus, Thesaurus of Engineering and Scientific Terms (TEST): For scientific/engineering subjects.
  Educational Resources Information Center (ERIC) Thesaurus: For social sciences.
Thesaurofacets: Combine analytico-synthetic classification and information retrieval thesaurus features (e.g., 'Thesaurofacet of the English Electric Company', 'Root Thesaurus').
Design and construction of these tools require high intellectual competence and subject knowledge.
Using the tools requires an understanding of their principles; constructing them is the more demanding intellectual effort.
Natural indexing language: Natural language of the document, terms taken directly from text.
Free indexing language: No constraints on terms, appropriate terms can be assigned regardless of document language.

1.6 IOI in User Services
Purpose: effectively serve users and satisfy their information needs.
Effectiveness depends on understanding users and their needs.
Information seekers often lack clarity; need intellectual support (hints/cues).
Requires analyzing user behavior and search style.
Interaction with users (formal/informal meetings) helps clarify problems:
  Technical knowledge level.
  Organizational position.
  Attitudes towards subject, work, search.
  How retrieved documents are used.
  Trust in information professionals.

1.6.1 Search Aspects
Users can perform searches in libraries using catalogues, printed indexes, or computer files.
Users can modify searches as they progress.
Familiarity with search procedures leads to successful results.
All users need tailored information.
Search approaches vary (e.g., nephrologist seeking surgical research vs. consumer looking for manual).
Multipronged approach to searching is needed to cater to diverse user needs.
Basic issues in IOI for varied physical documents:
  Analyze searchers' needs (knowledge level, position, attitudes, use).
  Searchers can be expert, semi-expert, or non-expert.
  Searcher response influenced by emotions, motivation, preferences.
  Searchers differ in information requirements (abstract, specific sections, full text, review, hard copies).
  Multiple searchers for a document require content organization and display for primary and secondary info.
  Information searching/organization must be tuned to searchers' needs (interaction/discussion useful).
  If one IOI cannot meet all needs, multiple organizations and displays are necessary.

1.7 IOI and Content Analysis
For IOI, content of literature must first be analyzed.
Contents: thoughts/ideas author communicates. Can be descriptive, explanatory, investigative, experimental, imaginative, creative.
Physical media: paper-print, non-print.
Content analysis is crucial for organizing information sources and providing services.

1.7.1 Meaning and Purpose of Content Analysis
Meaning
Content Analysis: analyzing records of human experience/knowledge; studying communication, its nature, meanings, processes, and people involved.
Concerned with message phase of communication: sender's motives, message effect on audience, relationship between messages and senders.
Activity following document creation; infers/interprets message content and effect on users.
In library/information services: means to organize intellectual content of documents for easy user access.
Purpose
Varies by activity. In library/information science, it's fundamental to operations.
Due to exponential growth of recorded communication, three events occurred:
  Information storage/retrieval became specialized, involving people from various fields.
  Content analysis developed as a research method.
  Man-machine interface developed user-friendly database systems to explore user intentions.
Used for: indexing, abstracting, user studies, information product/service production.
Applied in social sciences, psychology (counselling), analysis of voting patterns, newspaper reports, election manifestoes.
Analysis often by experts, providing comprehensive view of opinions and facts for new interpretations.

1.8 Information Retrieval Systems – Changing Environment
Discusses general aspects of IR systems, their purpose, functions, intellectual/non-intellectual components, user approach.
Current trends in IR systems:
  Increasingly computerized IR systems (libraries switching from printed indexes/card catalogues to automation).
  Phenomenal growth of computer databases in all areas (academic, business, government, current affairs, finance, legal). Accessible via Internet/online services. Full texts, surrogates (indexes, abstracts), bibliographic info. Digital/multimedia versatility.
  Increased number of users (domain experts, laypersons, information intermediaries) making direct access.
Key issue: retrieval depends on indexing and storage stages.
Computerized indexes vary greatly; searchers need to recognize strengths/limitations.
Common features across IR systems (CD-ROM, Internet, online search, document management, OPACs).
Trend: all users (novices, domain experts, occasional users, intermediaries) use these networks.
Information managers/intermediaries must be expert users, collaborating with users to understand needs and review results.

1.8.1 Search Facilities
Standard search facilities in computerized IR systems.
Provide access to full texts and pre-assigned keywords.
Table 1.2: General Facilities Available in Computerised Information Retrieval Systems
Facility | Purpose | Intellectual activity
Set-up, e.g. Help | To set up the environment | No
Selecting search terms | To identify search terms by viewing index terms from standard lists such as SHL or thesauri | Yes
Entering search terms | Searcher enters the terms | No
Combining search terms | To develop search strategies through search logic | Yes
Searching fields | To choose the field(s) in which the term(s) appear | Yes
Truncation | To search on text strings using truncation | No
Syntactic/semantic | To set words in combination and semantic relations | Yes
Setting specific range | To set ranges, e.g. numeric, publication date | No
Displaying search results | To show the number of references | No
Displaying records | To display records on the screen | No
Search management | To review the search | Yes
Display the thesaurus | To display syntactic/semantic relations | No
Hyperlinks | To navigate for associated terms | Yes

Set-up facilities: Normal system operations (Non-intellectual).
Selecting terms: Intellectual activity, choosing terms from display.
Entering terms: Routine operation (Non-intellectual).
Combining search terms: Important IOI activity, using search logic for statements.
Searching fields: Important IOI activity for precise searching in specific sections/fields.
Truncation: Routine activity for wider retrieval (Non-intellectual).
Syntactic/semantic relations: Intellectual activity for contextual search terms and wider results.
Setting specific range: Routine activity (Non-intellectual).
Displaying results: Routine activities (Non-intellectual).
Search management: Permits reviewing search results (Intellectual).
Hyperlink: Facilitates navigation to associated terms (Non-intellectual).

1.8.2 Search Strategies
Decisions taken at various stages constitute the search strategy.
Strategies should be methodical and yield best results.
Iterative or heuristic searches allow reviewing and modifying strategies for maximum success.
Broad indications for accessing databases in computerized IR systems.
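Two of the facilities in Table 1.2, combining search terms through search logic and truncation, can be sketched over a small inverted index. This is a minimal illustration, not the behaviour of any particular IR system: the record numbers, keyword lists, function names, and the use of a trailing `*` as the truncation symbol are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(records):
    """Map each lower-cased keyword to the set of record IDs that carry it."""
    index = defaultdict(set)
    for rec_id, keywords in records.items():
        for keyword in keywords:
            index[keyword.lower()].add(rec_id)
    return index


def lookup(index, term):
    """Return the record IDs for a term; a trailing '*' means right-hand truncation."""
    term = term.lower()
    if term.endswith("*"):
        stem = term[:-1]
        found = set()
        for keyword, rec_ids in index.items():
            if keyword.startswith(stem):
                found |= rec_ids
        return found
    return set(index.get(term, set()))


def boolean_and(*posting_sets):
    """Records present in every set (narrows the search)."""
    return set.intersection(*posting_sets) if posting_sets else set()


def boolean_or(*posting_sets):
    """Records present in at least one set (widens the search)."""
    return set().union(*posting_sets)


def boolean_not(include, exclude):
    """Records in the first set but not in the second."""
    return include - exclude


# Hypothetical records with pre-assigned keywords.
records = {
    1: ["Cotton fibers", "Chemical treatment", "Fire-proof resistance"],
    2: ["Cotton fabrics", "Dyeing"],
    3: ["Synthetic fibers", "Fire-proof resistance"],
    4: ["Cotton fibers", "Strengthening"],
}
index = build_index(records)

# cotton* AND "fire-proof resistance"
narrow = boolean_and(lookup(index, "cotton*"), lookup(index, "fire-proof resistance"))
# (cotton* OR "synthetic fibers") NOT dyeing
wide = boolean_not(
    boolean_or(lookup(index, "cotton*"), lookup(index, "synthetic fibers")),
    lookup(index, "dyeing"),
)
print(sorted(narrow), sorted(wide))
```

The matching itself is routine set arithmetic; choosing which terms to combine, and with which operator, is the step the table marks as intellectual.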
1.9 Examples Illustrating Information Storage and Retrieval Functions
Examples show indexing, storage, search, retrieval, intellectual/non-intellectual efforts, typical user queries, and concepts of recall/precision, exhaustivity/specificity.

Example 1: Textile research scientist
Query: Research papers on chemical treatment of cotton fibers for fire-proof resistance in men's garment production.
Process of Indexing:
  Keywords: Textile technology, Cotton fibers, Fire-proof resistance, Strengthening, Chemical treatment, Men's garments.
  Keywords cited according to syntactic rules of assigned indexing system.
  Computerized systems: keywords entered with correct spelling, using RDBMS.
  Additional references: associated, hierarchical, equivalence keywords using thesauri (e.g., Cotton fabrics: Woolen fibers, Synthetic fibers; Fire-proof resistance: Fire intensity, Optimum temperature).
  Additional "See also" references.
  Activities combine textile knowledge, indexing skills, standard forms (Intellectual).
  Storing keywords, providing additional entries are computer operations (Non-intellectual).
Searching and Retrieval:
  Dialogue with user to ascertain specific needs.
  Good results if search words match indexed keywords.
  Boolean operators used for syntactic relations.
  Rest of operations can be computer-executed or manual with minimal intellectual effort.

Example 2: Young mother on dyslexia
Query: Information on dyslexia in Indian children at primary/secondary levels learning English.
Indexing and Storing:
  Understanding "Dyslexia" (abnormal spelling/reading difficulties).
  Keywords: Education, Indian Children, Language Learning, Dyslexia, Corrective methods.
  Additional references: Indian Children (linguistic groups), Language learning (English, Hindi, other Indian languages).
Searching and Retrieval: Same steps as Example 1.
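The "additional references" step in Examples 1 and 2 can be sketched as a query expansion: each search keyword is widened with its thesaurus references, the alternatives are joined with OR, and the resulting groups are joined with AND. The two reference entries are taken from Example 1; the flat `SEE_ALSO` table, the function names, and the quoted output syntax are assumptions for illustration, since a real thesaurus would distinguish equivalence, hierarchical, and associative links.

```python
# Reference entries quoted from Example 1; the flat structure is an assumption.
SEE_ALSO = {
    "cotton fabrics": ["woolen fibers", "synthetic fibers"],
    "fire-proof resistance": ["fire intensity", "optimum temperature"],
}


def expand_query(terms, see_also):
    """Return one group per search term: the term followed by its references."""
    groups = []
    for term in terms:
        key = term.lower()
        extras = [t for t in see_also.get(key, []) if t != key]
        groups.append([key] + extras)
    return groups


def to_boolean(groups):
    """Join alternatives inside a group with OR, and the groups with AND."""
    parts = []
    for group in groups:
        joined = " OR ".join(f'"{term}"' for term in group)
        parts.append(f"({joined})" if len(group) > 1 else joined)
    return " AND ".join(parts)


query = ["Cotton fabrics", "Fire-proof resistance", "Chemical treatment"]
print(to_boolean(expand_query(query, SEE_ALSO)))
```

Expanding a term in this way tends to retrieve more references at the cost of more irrelevant ones, so the decision on how far to expand remains with the indexer or searcher.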
Example 3: MBA student on curriculum
Query: MBA course details on Personality Development and Current socio-economic-cultural studies of India at IIPM, Delhi (2004).
Context: Personality development (behavior patterns, public relations, communication skills); Socio-economic/cultural studies (state policies, fiscal measures, cultural heritage).
Formulation of search statement:
  Keywords: Education, Post graduate professional courses, MBA, Curriculum, Personality Development, Indian Socio-Economic-Cultural Studies, IIPM.
  Additional references: MBA Curriculum (Personality Development, Public Relations, Communication Skills); Indian socio-economic-cultural studies (State policies, Fiscal Measures, Budget, Growth Rate, Cultural heritage); IIPM (IIM, Business Schools, Management Institutions).
Sources: databases, websites, teacher/past student contacts.

Example 4: Storywriter on Cinderella story
Query: Equivalent of Cinderella story in Indian folk tales.
Context: Cinderella (heroine oppressed by her stepmother, who finds happiness with the help of a fairy godmother).
Search statement: Folk tales: Anthologies of folk tales, English translations, Indian language, Regional folk tales, Indian fairy tales, Cinderella story.
Additional references: Literature (Fiction, Short Stories); Anthologies of folk tales (Fairy tales); Indian language (Folk tales); Cinderella stories (Translations).

Recall and Precision Measures
Recall ratio = (relevant hits / total relevant references in the database) × 100.
  Example: 40 relevant hits out of 100 total relevant references gives 40%.
  The total number of relevant references in a database is rarely known and is usually estimated, so the recall figure is approximate.
Precision ratio = (relevant hits / total retrieved references) × 100.
  Example: 5 relevant hits out of 40 retrieved references gives 12.5%.
Exhaustivity: widen retrieval by using broader terms.
Specificity: gain precision by introducing narrower terms and fixing the context.

1.10 Future Trends
Active research aims to refine and develop efficient IR systems that give users the best results.
Directions of research:
Better Systems Design
  Improve matching of document descriptors with query descriptors.
  Research in new search methods (beyond Boolean logic).
  Aim for optimum retrieval efficiency (relevant hits, reduced irrelevant hits).
  Focus on speed of retrieval.
Improved Retrieval Facilities and Strategies
  Improve efficiency/effectiveness, storage requirements, retrieval speed.
  Overcome inverted file limitations by developing fast text scanning algorithms.
  Hardware-based solutions for text-scanning speed.
Human-Computer Interface
  Develop user-friendly, self-explanatory, intermediary computer systems.
  Stimulate best-match searching using knowledge-based/AI techniques.
Ranganathan's Ideas and Information Architecture
  New research trend (since 1991): reawakening of Ranganathan's ideas for faceted classification systems.
  Information Architects apply Ranganathan's contributions (Laws, Principles, Categories, Canons, Postulates) to Net/Web-based services.
  Colon Classification (faceted, analytico-synthetic) is seen as ideal for improving web-based searches.
Continuous research involving domain experts, information specialists, software/hardware experts is essential.
Many efforts are intellectual; operational parts can be handled by technology once design is stable.

1.11 Summary
Unit identifies intellectual/non-intellectual activities for efficient IR systems.
Highlights importance of retrieving relevant information from various networks.
Discusses measures of retrieval efficiency.
Contextualizes concepts within computerized systems, networks, and user approaches.
Emphasizes continuous research for maximum IR system efficiency.

1.12 Answers to Self Check Exercises
1) Intellectual organization of information:
  Human Intellect: power of thinking, learning, understanding, assimilating, acquiring, and organizing mental constructs for communication.
  In IR: intellectual effort to organize storage/retrieval of information for precise, accurate, speedy user access.
2) Accessibility of information:
  Secondary services developed to provide easy, speedy, accurate access to primary information, meeting user requirements.
  Without this, valuable information is lost.
3) Derived indexing systems (minimum intellectual effort):
  Derive information directly from document(s), highly amenable to computer organization.
  Examples: KWIC, KWAC, Citation indexing.
4) Intellectual requirements for assigned indexing:
  Ability to assign keywords that effectively represent document contents for precise retrieval.
  Effective use of multiword search statements and context fixing.
  Good knowledge of syntactics and semantics of indexing languages.
  Aspects relegated to computers: routine operations of arranging references, matching search terms with text/surrogates.
5) Types of indexing languages:
  Classification systems, Subject Headings Lists, Thesauri, Thesaurofacets.
6) Intellectual components in constructing indexing languages:
  Selecting terms, establishing relationships, indicating relationships, displaying terms.
7) Purpose of information service provider interacting with information seeker:
  Extract maximum information to understand user needs.
  Formulate query statement to decide search strategy.
  Search results depend on provider's database knowledge and ability to get results (often requires multiple searches).
  Intellectual component: formulating search strategy after eliciting correct user response.
8) Meaning of 'Content Analysis':
  Analyzing human experience/knowledge records; studying communication, its nature, meanings, processes, and people.
9) Purpose and use of Content Analysis:
  Purpose varies by activity. In library/information science, it's fundamental to operations.
  Used as research technique in social sciences, psychology (counselling/diagnostic studies), analysis of voting patterns, newspaper reports, election manifestoes.
  In library/information science: indexing, abstracting, user studies, information product/service production.
  Analysis by experts provides comprehensive view of opinions/facts for new interpretations.
10) Distinct current trends in IR systems:
  Increasing computerization.
  Phenomenal growth of computer databases.
  Increased number of users with direct access.
  Research in Better Systems Design, Improved Retrieval Facilities and Strategies, Human-Computer Interface.

1.13 Keywords
Algorithm: Instructions for carrying out a series of logical procedural steps in a specific order.
Boolean Search: Searching databases using Boolean operators (AND, OR, SAME, NOT) in same/combined search fields.
Exact Match: Searching databases on an exact phrase entered by the searcher.
Best Match: Searching databases for records that most closely match the criteria given by the searcher, without requiring an exact match.
Exhaustivity: Navigating in a database, choosing broader search terms to obtain a greater number of references.
Heuristic search: A user's search that is modified continuously based on knowledge gained at each stage of the search results.
Intellectual organization of information: The power or faculty of the intellect and the mind of thinking, learning, understanding, assimilating, acquiring and organizing mental constructs and recording them in a suitable form for communication.
Iterative search: Searches by an information service provider that are modified in the light of users' reactions to the search results.
Kaleidoscopic analysis: Analysis of changing, complex and varied forms or images, in a manner suggesting the changing pattern of a kaleidoscope. A kaleidoscope is an optical instrument in which bits of glass, beads, etc., held loosely at the end of a rotating tube, are shown in continuously changing symmetrical forms by reflection in two or more mirrors set at angles to each other.
Keyword Search: Searching databases on the topic of a keyword entered by the searcher.
Penumbra: Partly shaded area around the shadow of an opaque object; here it refers to the related areas of information that yield additional material relevant to a query.
Semantic: Meaning of words and their relationships.
Specificity: Obtaining exact references to a query by choosing very specific terms in a search.
Surrogate: Person or thing that acts or is used instead of another as substitute; an entry standing for a document.
Syntactic: Words and their order of representation in a multiword entry of an index file.
Umbra: Dark central part of the shadow cast by the earth or the moon in an eclipse; here it refers to the exact references to a query.
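Two of the search modes in the keyword list, Exact Match and Best Match, can be contrasted with a short sketch. The record titles are hypothetical, and ranking by the number of shared words is only one simple stand-in for best-match scoring, chosen here for brevity; it is not a method named in the unit.

```python
def exact_match(records, phrase):
    """Return IDs of records whose text contains the phrase exactly as entered."""
    phrase = phrase.lower()
    return [rec_id for rec_id, text in records.items() if phrase in text.lower()]


def best_match(records, query, top_n=3):
    """Rank records by how many query words they share (an assumed, simple score)."""
    query_words = set(query.lower().split())
    scored = []
    for rec_id, text in records.items():
        overlap = len(query_words & set(text.lower().split()))
        if overlap:
            scored.append((overlap, rec_id))
    # Highest overlap first; ties broken by record ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rec_id for _, rec_id in scored[:top_n]]


# Hypothetical record titles.
records = {
    1: "Indian folk tales in English translation",
    2: "Anthology of regional fairy tales",
    3: "Folk songs of India",
}
print(exact_match(records, "folk tales"))
print(best_match(records, "Indian folk tales"))
```

The exact-match call returns only the record that contains the phrase, while the best-match call also surfaces partially matching records, ranked below it.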