Information Retrieval Strategies and Techniques (.ppt)
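Sources: 黃慕萱 (Huang Mu-hsuan), Chap. 6; Harter, Chap. 7

Search Strategy vs. Search Tactics
- Both terms originally came from military usage
- How different authors use them:
  - Marcia Bates (1979), "Information Search Tactics"
  - Hartly:
    - ways to avoid retrieving irrelevant articles
    - possible responses when too many or too few relevant articles are retrieved
  - Palmer: building blocks and citation pearl growing
  - Pao: Boolean logic, citation, and probabilistic search strategies
- Search strategy: an overall plan for a search problem as a whole, e.g., building blocks or citation pearl growing
- Search tactics (search heuristics): the specific actions taken to accomplish a particular goal during a search

Briefsearch
- The most common type of search
- Fast and simple (fast and inexpensive)
- Usually low in both recall and precision
- Suitable when:
  - the topic is clearly defined
  - you want to learn which descriptors and index terms the database producer uses
  - you are verifying bibliographic data, e.g., when the title or author is already known

Building Blocks Search
- Also called "block building" or "building block"
- Procedure:
  - Break the search problem into several facets
  - Determine the relationships among the facets; these are usually AND, and OR or NOT relationships are less common
  - Find search terms that represent each facet
  - Combine the terms within each facet with Boolean OR (union) so the search is complete
- The most widely used strategy; early reference-interview forms were often designed around it

Building Blocks Search Strategy (1/4)
1. Conduct reference interviews
2. Formulate search objectives:
   - High recall
   - High precision
   - Moderate levels of recall and precision
3. Select database(s) and search system
4. Identify major concepts or facets and their logical relationships with one another

Building Blocks Search Strategy (2/4)
5. Identify:
   - search strings that represent the concepts, including synonyms, near-synonyms, narrower terms, and related terms:
     - Words
     - Full-text phrases
     - Pieces of words
     - Descriptors
     - Identifiers
     - Codes
     - Non-semantic bibliographic characteristics (non-subject fields such as document type, language, and year)
   - fields to be searched

Building Blocks Search Strategy (3/4)
6. For each distinct facet of the search, a set of postings is created for each search string within that facet. These sets are then combined with Boolean OR into a single set representing that facet.
7. Following step 6, the facet sets themselves are combined with Boolean AND and NOT.
8. Plan alternatives.

Building Blocks Search Strategy (4/4)
9. Formulate the initial search statements in the command language of the system.
10. Log on and submit the search to the system.
11. Evaluate the intermediate results.
12. Iterate: use the interactive features of the system to carry out search heuristics (tactics, maneuvers, strategies, tricks, devices, approaches) to try to improve the search results.

Building Blocks Approach (diagram)
- Facet A: Term A1 OR Term A2 OR ... OR Term Ap
- Facet B: Term B1 OR Term B2 OR ... OR Term Bq
- Facet C: Term C1 OR Term C2 OR ... OR Term Cr
- Answer set: Boolean combination of the facets (AND, OR, NOT)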
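
Steps 6 and 7 and the diagram above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any search system's actual behavior. The inverted file `POSTINGS`, its record IDs, and the helpers `facet_set`, `building_blocks`, and `query_string` are all hypothetical. The parenthesized syntax with quoted phrases is a generic convention, not a particular system's command language.

```python
# Hypothetical inverted file: search term -> set of record IDs (its postings).
POSTINGS = {
    "risk": {1, 2, 3, 4, 5},
    "measurement": {1, 2, 6},
    "assessment": {3, 7},
    "insurance contract": {2},
}

def facet_set(terms, postings):
    """Step 6: OR together the postings of every term in one facet."""
    result = set()
    for term in terms:
        result |= postings.get(term, set())
    return result

def building_blocks(and_facets, not_facets, postings):
    """Step 7: AND the facet sets together, then subtract any NOT facets."""
    sets = [facet_set(terms, postings) for terms in and_facets]
    answer = set.intersection(*sets) if sets else set()
    for terms in not_facets:
        answer -= facet_set(terms, postings)
    return answer

def query_string(and_facets, not_facets):
    """Render the same plan as a generic Boolean statement (phrases in quotes)."""
    def group(terms):
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        return "(" + " OR ".join(quoted) + ")"
    query = " AND ".join(group(terms) for terms in and_facets)
    for terms in not_facets:
        query += " NOT " + group(terms)
    return query

if __name__ == "__main__":
    facets = [["risk"], ["measurement", "assessment"]]
    exclude = [["insurance contract"]]
    print(query_string(facets, exclude))
    print(building_blocks(facets, exclude, POSTINGS))
```

With these postings, the two facets intersect to {1, 2, 3}. The NOT facet then removes record 2, leaving {1, 3}, and the rendered statement is `(risk) AND (measurement OR assessment) NOT ("insurance contract")`.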
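
Building Blocks Search: Sample
Topic: Measurement of Risk Tendencies (looking for high recall)
- Facet 1, RISK: risk
- Facet 2, MEASUREMENT: measurement, assessment, choice, decision, outcome
- Facet 3, RISK AVERSION: risk aversion, risk avoidance, risk neutrality, risk prone, risk tendency
- Facet 4, BEHAVIORAL DECISION THEORY: behavioral, decision, theory
- Facet 5, INSURANCE: insurance contract, bank, finance, stock, investment, advertisement
- Boolean combination: ((RISK AND MEASUREMENT) OR RISK AVERSION OR BEHAVIORAL DECISION THEORY) NOT INSURANCE

Review the Results and Search Again
- To increase recall:
  - find additional concepts or search terms to add to one or more facets
  - delete a facet
- To increase precision:
  - delete some of the broader or more ambiguous terms in the facets
  - add a facet to be intersected (ANDed) with the others

Successive Facet Strategies (1/3)
- Other names:
  - fewest postings first
  - most specific concept first
  - successive fractions (successive searching that does not begin with a subject facet)
- Building blocks vs. successive facets:
  - the building blocks strategy uses every facet
  - the successive facet strategy tries to use as few facets as possible
- After identifying the facets of the search problem, put them in priority order, then decide from each result whether to continue searching

Successive Facet Strategies (2/3)
- Diagram: First Facet AND Second Facet (optional) AND Other Facets (optional), giving the Solution Set
- Example 1: "members and activities of 4-H clubs"
- Example 2: "the emotional, physical, and intellectual characteristics of children who have studied violin with the Suzuki method"

Successive Facet Strategies (3/3)
- When to use:
  - when combining all the facets with Boolean operators is likely to retrieve zero records
  - when one or two facets of the search problem are quite vague
  - when the search problem has non-subject criteria, such as document type, language, or year of publication, that can serve as the first search concept
  - when the searcher would rather tolerate false drops than miss relevant articles
  - when the time and money spent adding more facets might exceed the cost of simply printing the results
  - when there is very little relevant literature and the searcher is willing to examine some less relevant articles

Pairwise Facets (1/3)
- Pair the facets two at a time, intersect each pair, then take the union of the results
- When to use:
  - all facets are equally important
  - the facets are roughly equal in specificity or vagueness
  - combining all the facets would retrieve zero records
- Note: when there are many facets, use groups of 3-4 facets as the basic unit of intersection to avoid confusion

Pairwise Facets (2/3)
- Building blocks search: A AND B AND C
- Pairwise facets search: (A AND B) OR (A AND C) OR (B AND C)

Pairwise Facets (3/3)
- Sample: a doctoral student wants a high-recall bibliography on the relationship between facial musculature and the physiological (autonomic) responding of emotions, e.g., fear.
- Diagram: Facets #1, #2, and #3 are ANDed in pairs to form Solution Sets A, B, and C
- Final solution set: A OR B OR C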
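
Toy postings sets make the difference between the successive facet and pairwise facet strategies easy to see. In this sketch, `FACETS` holds hypothetical facet sets whose terms are already ORed together. `max_hits` is a hypothetical stand-in for the searcher's judgment of when a result is small enough to stop. Ordering facets by posting count implements only the "fewest postings first" variant.

```python
from itertools import combinations

# Hypothetical facet sets: each value is the OR-ed postings of one facet.
FACETS = {
    "A": {1, 2, 3, 4, 5, 6},
    "B": {2, 3, 7},
    "C": {3, 4, 8, 9},
}

def successive_facets(facets, max_hits):
    """Fewest postings first: start with the smallest facet and AND in the
    others one at a time, stopping once the result is small enough to review."""
    order = sorted(facets, key=lambda name: len(facets[name]))
    used = [order[0]]
    result = set(facets[order[0]])
    for name in order[1:]:
        if len(result) <= max_hits:
            break  # small enough: the remaining facets are never used
        result &= facets[name]
        used.append(name)
    return used, result

def pairwise_facets(facets):
    """(A AND B) OR (A AND C) OR (B AND C): intersect every pair, then union."""
    result = set()
    for x, y in combinations(facets, 2):
        result |= facets[x] & facets[y]
    return result

if __name__ == "__main__":
    print(successive_facets(FACETS, max_hits=2))
    print(pairwise_facets(FACETS))
    print(FACETS["A"] & FACETS["B"] & FACETS["C"])  # building blocks, for comparison
```

Building blocks (A AND B AND C) returns only {3} with these sets. The pairwise combination returns {2, 3, 4}, which shows why pairwise facets help when ANDing every facet retrieves too little. With `max_hits=2`, the successive strategy uses facets B and C only and never needs facet A.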

Citation Pearl Growing
- Aims at high precision
- Starts from 100% precision (articles already known to be relevant) and works outward toward recall
- Repeatedly takes descriptors, identifiers, and words from known relevant documents and searches again with them
- When to use:
  - the database has no thesaurus or controlled vocabulary list
  - emerging disciplines
- Usually requires many search iterations, so it is not well suited to beginners

Other Facet Strategies
- Multiple briefsearch: search several databases to get as high recall as possible
- Interactive scanning: the most time-consuming and interactive approach, e.g., using classification codes or natural language
- Implied concepts: grasp concepts that are only implied, choosing different terms according to the subject coverage of each database
  - Example: possible health hazards from foods cooked using microwave ovens

Citation Indexing Strategies
- Build the search strategy from the relationships between citing and cited documents
- Offer highly interdisciplinary and multidisciplinary approaches to online searching
- Search approaches: cited publication, cited author, co-cited authors
- Example database: the humanities citation database (THCI) of the National Science Council (國科會) Research Center for the Humanities, http:/

Non-subject, Fact, and Multiple Database Searching
- Non-subject searching: document type, year of publication, language, author, corporate source; double limiting
- Fact searching: search for a known item
- Multiple database searching: watch for differences in the fields each database covers and in how controlled vocabulary is used

Search Techniques (Heuristics)
- Language heuristics
- Command language, database, and file structure heuristics
- Recall and precision heuristics:
  - heuristics for increasing recall
  - heuristics for increasing precision
- Personal heuristics

Language Heuristics (1/2)
- Use natural-language searching when:
  - one or more of the concepts of interest involves a subtle nuance of meaning
  - one or more of the concepts of interest is highly specific
  - one or more of the concepts is relatively new and appropriate terms do not exist in the controlled vocabulary
  - a highly comprehensive search (high recall) is desired
  - the literature to be searched is "soft"

Language Heuristics (2/2)
- Use controlled-vocabulary searching when:
  - the concepts of interest can be expressed precisely and unambiguously in the controlled vocabulary
  - a limited search retrieving a limited number of highly pertinent items is desired
  - the literature to be searched is "hard"

Command Language, Database, and File Structure Heuristics (1/2)
- Know the stop words used by the search system
- Know the sort order associated with the binary coding system used by the host computer
- Know which fields are searched by default if search fields are not explicitly specified
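
Citation pearl growing, described earlier, works as an iterative loop. The searcher harvests terms from known relevant records, searches again, keeps the new relevant hits, and repeats until no new terms appear. The sketch below assumes a hypothetical `RECORDS` table of descriptors. It also assumes a caller-supplied `is_relevant` function that stands in for the searcher's relevance judgments. It harvests descriptors only, whereas the slide also mentions identifiers and words.

```python
# Hypothetical mini-database: record ID -> descriptors assigned to it.
RECORDS = {
    1: {"pearl growing", "search strategy"},
    2: {"search strategy", "online searching"},
    3: {"online searching", "boolean logic"},
    4: {"cooking", "microwave ovens"},
    5: {"online searching", "job search"},
}

def search_any(terms, records):
    """OR search: IDs of records indexed with at least one of the terms."""
    return {rid for rid, descriptors in records.items() if descriptors & terms}

def pearl_growing(seed_ids, records, is_relevant, max_rounds=5):
    """Grow a result set from known relevant 'pearls' by harvesting their terms."""
    pearls = set(seed_ids)
    terms = set()
    for _ in range(max_rounds):
        # Harvest descriptors from every pearl found so far.
        new_terms = set().union(*(records[r] for r in pearls)) - terms
        if not new_terms:
            break  # no new terms to search with, so stop iterating
        terms |= new_terms
        hits = search_any(terms, records)
        # Keep only the hits the searcher judges relevant.
        pearls |= {r for r in hits if is_relevant(r)}
    return pearls, terms

if __name__ == "__main__":
    pearls, terms = pearl_growing({1}, RECORDS, lambda r: r in {1, 2, 3})
    print(sorted(pearls), sorted(terms))
```

Suppose the search starts from record 1 and judges records 1 to 3 relevant. The loop then ends with pearls {1, 2, 3}. Record 5 is retrieved through "online searching" but judged not relevant, so its term "job search" is never harvested.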

Command Language, Database, and File Structure Heuristics (2/2)
- Know the parsing rule used to index each field searched:
  - know which fields are included in the basic index
  - note whether each search field is indexed by single word or by phrase
- Always question null sets
- Understand Boolean operations with the null set, and use this knowledge when reformulating search statements

Questions to Ask When Recall Is Low (1/2)
- Am I in the correct database?
- Have I overspecified the search problem?
- Has anything been done on the topic or problem? Is there a literature on this search problem?
- Have enough search terms been included to properly represent each concept of the search?

Questions to Ask When Recall Is Low (2/2)
- Were the proximity specifications placed on the search terms too restrictive?
- Was Boolean logic used correctly?
- Did I make a technical error, e.g., in spelling or command syntax?
- Should I be searching in natural-language fields?
- Have all word forms of the search terms been used? Should truncation be employed?

Heuristics for Increasing Recall (1/2)
- Use additional synonyms and near-synonyms combined with Boolean OR to represent search concepts
- Use more generic terms in addition to specific terms to represent search concepts
- Use natural language in addition to controlled vocabulary terms
- Search additional subject fields

Heuristics for Increasing Recall (2/2)
- Delete AND and NOT facets from the formulation
- Increase term truncation
- Use less restrictive proximity operators, e.g., require that terms appear in the same paragraph rather than the same sentence
- Remove any restrictions from the formulation, e.g., language, date of publication, type of publication

Questions to Ask When Precision Is Low (1/2)
- Am I in the correct database?
- Have I underspecified the search problem?
- Do I need to disambiguate a concept of the problem?
- Have I used Boolean logic correctly?
- Have I included vague or ambiguous terms, or terms that are too generic?
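
A tiny document collection can illustrate several of the recall heuristics above, along with the null-set advice. The heuristics shown are truncation and looser proximity. The `*` truncation symbol, the within-n-words proximity rule, the simple tokenizer, and the sample `DOCS` are all assumptions for illustration. Real systems use their own truncation symbols and proximity operator syntax.

```python
import re

# Hypothetical sample collection (document ID -> text).
DOCS = {
    1: "Risk assessment of investors",
    2: "Measuring risk aversion in investment choices",
    3: "A survey of household risks. Later chapters cover the assessment of savings.",
}

def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer for illustration."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(term, word):
    """Right truncation: a trailing '*' matches any word that starts with the stem."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def within(text, a, b, n):
    """Proximity: True if terms a and b occur within n words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if matches(a, w)]
    pos_b = [i for i, w in enumerate(words) if matches(b, w)]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def search(a, b, n):
    """IDs of documents where a and b appear within n words of each other."""
    return {d for d, text in DOCS.items() if within(text, a, b, n)}

if __name__ == "__main__":
    print(search("risk", "assessment", 1))   # exact term, tight proximity
    print(search("risk*", "assessment", 5))  # truncation plus looser proximity
    # Null-set behaviour: ANDing with an empty set empties the result,
    # while ORing with it leaves the other set unchanged.
    empty = search("zzz", "risk", 10)
    print(search("risk*", "assessment", 5) & empty, search("risk*", "assessment", 5) | empty)
```

The exact term with a one-word window retrieves only document 1. Adding truncation (`risk*` also matches "risks") and widening the window to five words retrieves document 3 as well.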

Questions to Ask When Precision Is Low (2/2)
- Should I restrict search terms to elements of a controlled vocabulary?
- Were the proximity specifications placed on the search terms too loose?
- Are false drops resulting from concepts having an unintended relationship with one another?
- Has a search term been truncated too severely?

Heuristics for Increasing Precision (1/2)
- Delete near-synonyms and potentially ambiguous terms
- Use more specific terms to represent concepts
- Use controlled vocabulary terms if a concept is precisely represented by them; delete controlled vocabulary terms that do not describe a concept precisely
- If multiple meanings do not appear to be a major problem, search natural-language terms that represent the concepts of interest precisely

Heuristics for Increasing Precision (2/2)
- If none of the above conditions applies, search fewer subject fields, dropping fields in approximately this order: full text, abstract, title, identifier, and descriptor
- Add additional facets with AND and NOT
- Decrease term truncation
- Use more restrictive proximity operators
- Add restrictions to the formulation, e.g., by date of publication, type of publication, language, etc.

Personal Heuristics (1/2)
- Be flexible; stay loose; be willing to look at a search in more than one way. Avoid rigidity in thought and action.
- Browse samples of retrieved citations to assess relevance.
- Browse samples of retrieved citations to generate additional search terms.
- Be heuristic and interactive. Don't do "fast batch" searching.

Personal Heuristics (2/2)
- Evaluate your own work critically.
- Always be skeptical of search output.
- A mindless faith in controlled vocabularies is not always justified. Be critical of the adequacy of artificial languages for representing the concepts in documents.
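
Evaluating search output critically, as the personal heuristics urge, starts with two measures. Recall is the number of relevant items retrieved divided by the number of all relevant items. Precision is the number of relevant items retrieved divided by the number of all items retrieved. The sketch below also illustrates the precision heuristic of searching fewer subject fields. The records, field names, and relevance judgments in `RELEVANT` are invented for illustration, and "descriptor" merely plays the role of a controlled-vocabulary field.

```python
# Hypothetical records; "descriptor" plays the role of a controlled-vocabulary field.
RECORDS = {
    1: {"title": "Measuring risk tendencies",
        "abstract": "We measure risk aversion in students.",
        "descriptor": "risk aversion"},
    2: {"title": "Aversion therapy for smoking",
        "abstract": "Taste aversion and relapse.",
        "descriptor": "behavior therapy"},
    3: {"title": "Choice under uncertainty",
        "abstract": "Loss and risk aversion in lotteries.",
        "descriptor": "risk aversion"},
    4: {"title": "Insurance pricing",
        "abstract": "Premiums reflect risk aversion of buyers.",
        "descriptor": "insurance"},
}
# Hypothetical relevance judgments for a search on risk tendencies.
RELEVANT = {1, 3}

def search(term, fields, records):
    """IDs of records whose chosen fields contain the term (case-insensitive substring)."""
    term = term.lower()
    return {rid for rid, rec in records.items()
            if any(term in rec.get(field, "").lower() for field in fields)}

def precision_recall(retrieved, relevant):
    """Return (precision, recall); 0.0 when a denominator would be zero."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    all_fields = search("aversion", ["title", "abstract", "descriptor"], RECORDS)
    descriptor_only = search("aversion", ["descriptor"], RECORDS)
    print(sorted(all_fields), precision_recall(all_fields, RELEVANT))
    print(sorted(descriptor_only), precision_recall(descriptor_only, RELEVANT))
```

On this toy data, searching all fields for "aversion" gives precision 0.5 and recall 1.0. Restricting the search to the descriptor field raises precision to 1.0 without losing recall. On real data, narrowing the fields can also cost recall, so the result sets still need to be checked.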