Research and Design of a Network Intelligent Inquiry System Based on Remote Education (Translation)

April 19th, 2008



Research and Design of a Network Intelligent Inquiry System Based on Remote Education

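Abstract: As the Internet has come into wide use in China, remote education has attracted growing attention. Remote education is not just a matter of publishing teaching materials online. More importantly, it depends on communication between students and teachers, so a remote education system must be able to answer students' questions. At present such systems answer questions mainly in three ways: e-mail, online discussion, and keyword search. The first two waste a great deal of teacher time and of answers that already exist. Keyword search requires students to be skilled at picking out keywords, which adds to their burden and gives unsatisfactory results. Solving these problems calls for new techniques and methods.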

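The topic of this thesis arose from the requirements of the online teaching system for quality courses at the computer center where the author works. In modern remote education and online teaching, the questions students raise about any one course tend to be concentrated, with many repeated or similar questions. These questions can therefore be organized into a library of Frequently Asked Questions (FAQ).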

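Based on this observation, the thesis designs and implements a Chinese natural-language intelligent inquiry system for frequently asked questions in the computer science course "Operating System". The system adopts the ICTCLAS Chinese lexical analysis system for word segmentation and uses Latent Semantic Analysis (LSA) for information retrieval, which mitigates the synonym and polysemy problems of traditional information retrieval systems and improves segmentation efficiency and retrieval precision. The system uses a B/S (browser/server) architecture and combines natural language processing with web development. Natural-language question answering is the primary mode, with keyword search and message-board answering as supplements. This takes full advantage of online education, improves teaching efficiency, and reduces teachers' workload.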
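The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: whitespace splitting stands in for ICTCLAS segmentation, a small power iteration stands in for a proper SVD routine, and the function names (`tokenize`, `rank_faq`, and so on) and the toy English FAQ entries are all assumptions made for the example.

```python
import math
import random


def tokenize(text):
    # Placeholder for a real Chinese segmenter such as ICTCLAS (assumption):
    # the toy text below is already space-separated.
    return text.lower().split()


def term_document_matrix(docs):
    """Return (term -> row index, rows); rows[i][j] counts term i in doc j."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    rows = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for t in tokenize(doc):
            rows[index[t]][j] += 1.0
    return index, rows


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_left_singular_vectors(rows, k, iters=200, seed=0):
    """Approximate the top-k left singular vectors of the term-document
    matrix A by power iteration with deflation on A A^T."""
    rng = random.Random(seed)
    m = len(rows)
    gram = [[dot(ri, rj) for rj in rows] for ri in rows]  # A A^T, m x m
    basis = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        for _ in range(iters):
            w = [dot(g, v) for g in gram]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # rank of A is below k; stop early
                return basis
            v = [x / norm for x in w]
        lam = dot(v, [dot(g, v) for g in gram])  # Rayleigh quotient (sigma^2)
        # Deflate: remove the component just found before the next round.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(m)]
                for i in range(m)]
        basis.append(v)
    return basis


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def rank_faq(query, questions, k=2):
    """Cosine score of the query against each stored question in the
    k-dimensional latent space."""
    index, rows = term_document_matrix(questions)
    basis = top_left_singular_vectors(rows, k)
    q = [0.0] * len(rows)
    for t in tokenize(query):
        if t in index:  # words unseen in the FAQ library are ignored
            q[index[t]] += 1.0
    q_latent = [dot(q, u) for u in basis]
    scores = []
    for j in range(len(questions)):
        column = [row[j] for row in rows]
        scores.append(cosine(q_latent, [dot(column, u) for u in basis]))
    return scores


# Hypothetical, pre-segmented FAQ entries standing in for real questions.
FAQ = [
    "process scheduling cpu",
    "thread scheduling cpu",
    "memory paging virtual",
    "memory segmentation virtual",
]
scores = rank_faq("thread", FAQ, k=2)
for question, score in zip(FAQ, scores):
    print(f"{score:.2f}  {question}")
```

In this toy data the query "thread" shares no word with the first entry, so plain keyword matching would give it a score of zero. In the two-dimensional latent space, however, both scheduling questions score close to 1 and both memory questions close to 0, because "process" and "thread" occur in the same contexts. That is the synonym effect the paragraph attributes to LSA.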

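The main research goal of this thesis is to study, in both theory and practice, the two key technologies of an intelligent inquiry system: Chinese word segmentation and information retrieval. On that basis, it builds an initial intelligent inquiry system that supports natural language for our university's online "Operating System" course. The system is fairly accurate, practical, and extensible, and is easy to develop and maintain. It can be applied widely to answering subjective Chinese-language questions in all kinds of courses. Once completed, the project will provide technical support for online teaching at the author's institution.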

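In designing and implementing the intelligent inquiry system, the thesis carried out the following work:

1. Analyzed the current state of intelligent inquiry systems and the shortcomings of existing ones.

2. Studied the development of automatic Chinese word segmentation, why segmentation is necessary, where its difficulties lie, and the commonly used segmentation methods. Several existing segmentation systems are described as examples, along with the methods they use and the results they have achieved.

3. Compared several retrieval techniques commonly used in the IR field and introduced latent semantic analysis to compute text similarity, which to some extent solves the synonym and polysemy problems of natural language and improves retrieval precision.

4. Designed the system in detail, focusing on integrating the Chinese Academy of Sciences' ICTCLAS lexical analysis system and implementing the latent semantic analysis method, and finally presented the test results.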

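Because of time constraints, the system still has shortcomings that remain to be addressed. First, we introduced latent semantic analysis as a dimensionality-reduction method to improve the Vector Space Model (VSM). Subject to the minimum-variance principle (that is, keeping the variance between each original vector and its projection as small as possible across the mapping), it removes noise from the semantic space and addresses polysemy and synonymy in IR. However, we did not consider the factors that affect LSA itself, such as data sparsity, term weighting, feature selection, and the proportion of dimensions retained. These factors can strongly affect the quality of LSA and deserve further study in their own right. Second, traditional LSA is computed by singular value decomposition, which gives good results but has high time and space complexity. A new algorithm with lower time and space complexity is needed so that LSA can be applied more effectively to large-scale text processing.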
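For reference, the dimensionality reduction and its optimality property can be written out as below. The notation is assumed here rather than taken from the thesis: A is the m-by-n term-document matrix (m terms, n FAQ entries) of rank r, and k < r is the number of latent dimensions kept. The "minimum-variance principle" is read in the usual least-squares sense.

```latex
\begin{align*}
A &= U \Sigma V^{\top}, \qquad
  \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
  \sigma_1 \ge \dots \ge \sigma_r > 0 \\
A_k &= U_k \Sigma_k V_k^{\top}
  \quad \text{(keep only the $k$ largest singular values)} \\
\lVert A - A_k \rVert_F
  &= \min_{\operatorname{rank}(B) \le k} \lVert A - B \rVert_F
   = \sqrt{\sum_{i=k+1}^{r} \sigma_i^{2}} \\
\hat{q} &= U_k^{\top} q, \qquad
\hat{d}_j = U_k^{\top} a_j = \Sigma_k V_k^{\top} e_j, \qquad
\operatorname{sim}(q, d_j) =
  \frac{\hat{q}^{\top} \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
\end{align*}
```

The third line is the Eckart-Young result: among all matrices of rank at most k, the truncated SVD is closest to A in the Frobenius norm. The cost concern raised above comes from the decomposition itself. A full SVD of a dense m-by-n matrix takes on the order of min(m^2 n, m n^2) operations, which is why the choice of k and the sparsity of A matter at scale.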

Keywords: remote education, intelligent inquiry, latent semantic analysis, FAQ

Abstract: With the extensive application of the Internet in China, more and more priority is being given to remote education. Remote education means not only delivering teaching material on a website but, more importantly, communication between teachers and students, which requires the remote education system to provide an online inquiry function. Current remote education systems mainly adopt three forms of inquiry: e-mail, online discussion, and keyword search. The first two cause a great waste of teaching staff and of existing answers. Keyword search demands that students be able to extract keywords, which adds to their burden and gives poor search results. To solve these problems, new techniques and approaches must be introduced.

The thesis topic was proposed in accordance with the requirements of the network teaching system for quality courses at the author's computer center. For any one course in contemporary remote education and network teaching, the questions raised by students are relatively concentrated, with much repetition and similarity, so they can be organized into a bank of Frequently Asked Questions (FAQ).

Based on this characteristic, the thesis designed and implemented a Chinese natural-language intelligent inquiry system for the computer science course "Operating System". The system introduced ICTCLAS for word segmentation and performed information retrieval with LSA, avoiding the synonym and polysemy problems of conventional information retrieval systems and improving segmentation efficiency and retrieval precision. The system adopted the B/S mode and integrated natural language processing with website development, taking natural-language intelligent inquiry as its main mode, with keyword retrieval and message-board inquiry as assistance. It exploits the advantages of network education, improves teaching efficiency, and reduces teachers' teaching burden.

The principal research objective of the thesis is to carry out theoretical and practical research on the two crucial technologies of an intelligent inquiry system, Chinese word segmentation and information retrieval, and on that basis to implement an initial intelligent inquiry system with natural-language support for the network course "Operating System". With relatively high accuracy, practicability, and extensibility, and being easy to develop and maintain, the system can be used extensively for subjective Chinese-language questions in various courses. After completion, it will provide technical support for network teaching at the author's institution.

The thesis carried out the following work in the course of designing and implementing the intelligent inquiry system:

1) Analyzed the current state of intelligent inquiry systems and the inadequacies of existing ones.

2) Explored the development of automatic Chinese word segmentation, the necessity of segmentation, its difficulties, and the common segmentation methods, and cited several existing segmentation systems with the methods they use and the results they have obtained.

3) Compared several commonly used retrieval technologies in IR and introduced LSA to compute text similarity, which to some extent solves the problems of synonyms and polysemy in natural language and improves retrieval precision.

4) Designed the system in detail, focusing on the integration of the Chinese Academy of Sciences' Chinese lexical analysis system ICTCLAS and the implementation of LSA, and presented the final test results.

Owing to the limitation of time, deficiencies still exist in the system and await further improvement. First, we introduced LSA as a dimensionality reduction to improve the Vector Space Model (VSM): while satisfying the principle of minimum variance (i.e., keeping the variance between the original vectors and their projections minimal), it reduces noise in the semantic space and addresses polysemy and synonymy in IR. We did not, however, consider the factors affecting LSA itself, such as data sparsity, weight computation, feature selection, and the dimension-reduction ratio. These factors have a significant effect on LSA and deserve further research. Second, conventional LSA is computed via singular value decomposition, which obtains good results but has high time and space complexity. We should seek a new algorithm that reduces this complexity so that LSA can be better applied to large-scale text processing.

Key Words: Remote Education; Intelligent Inquiry; Latent Semantic Analysis; FAQ
