A Reading Guide to the Maximum Entropy Model Literature

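　　The Maximum Entropy Model is a machine learning method that works well in many areas of natural language processing, such as part-of-speech tagging, Chinese word segmentation, sentence boundary detection, shallow parsing, and text classification. The manual for Dr. Zhang Le's maximum entropy toolkit has a well-written "Further Reading" section, so I am reproducing it here as a reading guide to the maximum entropy literature.
　　Unlike the "Reading Guide to the Statistical Machine Translation Literature", I am still working hard to learn the Maximum Entropy Model myself and have little authority on the subject, so I will keep my own comments short. These papers are easy to find on Google, but most are fairly long (30-plus pages), and two are doctoral dissertations of more than 100 pages. I hope beginners are not scared off; classic work is worth reading closely again and again!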

Maximum Entropy Model Tutorial Reading

　　This section lists some recommended papers for your further reference.

1. A Maximum Entropy Approach to Natural Language Processing [Berger et al., 1996]
　　(Must read) A must-read paper on applying the maxent technique to Natural Language Processing. This paper describes maxent in detail and presents an incremental feature selection algorithm for building a maxent model step by step, as well as several examples in statistical Machine Translation.

2. Inducing Features of Random Fields [Della Pietra et al., 1997]
　　(Must read) Another must-read paper on maxent. It deals with a more general framework, Random Fields, and proposes an Improved Iterative Scaling algorithm for estimating the parameters of Random Fields. This paper gives the theoretical background to Random Fields (and hence the maxent model). A greedy field induction method is presented to automatically construct a detailed random field from a set of atomic features. A word morphology application for English is developed.

3. Adaptive Statistical Language Modeling: A Maximum Entropy Approach [Rosenfeld, 1996]
　　This paper applied the ME technique to the statistical language modeling task. More specifically, it built a conditional Maximum Entropy model that incorporated traditional N-gram, distant N-gram and trigger pair features. A significant perplexity reduction over the baseline trigram model was reported. Later, Rosenfeld and his group proposed a Whole Sentence Exponential Model that overcame the computational bottleneck of the conditional ME model.

4. Maximum Entropy Models For Natural Language Ambiguity Resolution [Ratnaparkhi, 1998]
　　This dissertation discussed in detail the application of the maxent model to various natural language disambiguation tasks. Several problems were attacked within the ME framework: sentence boundary detection, part-of-speech tagging, shallow parsing and text categorization. Comparisons with other machine learning techniques (Naive Bayes, Transformation-Based Learning, Decision Trees, etc.) are given.

5. The Improved Iterative Scaling Algorithm: A Gentle Introduction [Berger, 1997]
　　This paper describes the IIS algorithm in detail. The description is easier to understand than [Della Pietra et al., 1997], which involves more mathematical notation.

6. Stochastic Attribute-Value Grammars [Abney, 1997]
　　Abney applied the Improved Iterative Scaling algorithm to parameter estimation for Attribute-Value grammars, whose parameters cannot be correctly estimated by the ERF (empirical relative frequency) method (though it works for PCFGs). Random Fields are the model of choice here, with general Metropolis-Hastings sampling used to calculate feature expectations under the newly constructed model.

7. A comparison of algorithms for maximum entropy parameter estimation [Malouf, 2003]
　　Four iterative parameter estimation algorithms were compared on several NLP tasks. L-BFGS was observed to be the most effective parameter estimation method for the Maximum Entropy model, much better than IIS and GIS. [Wallach, 2002] reported similar results on parameter estimation for Conditional Random Fields.
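
　　To make the iterative scaling methods mentioned in the list above a little more concrete, here is a minimal sketch of a conditional maxent model, where p(y|x) is proportional to the exponential of the summed weights of the active (predicate, label) features, trained with Generalized Iterative Scaling (GIS). It is an illustration only and is not taken from any of the papers or from the toolkit: the toy data, the predicate strings, and the names `predict` and `train_gis` are all hypothetical. It also assumes that every example has the same number of active predicates and that every (predicate, label) pair occurs in the data, so the feature sum is a constant C and no correction feature is needed.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (active context predicates, label).
# Every example has exactly two predicates, and every (predicate, label)
# pair is observed at least once, so the GIS constant C is exactly 2.
DATA = (
    [(("prev=the", "suf=s"), "NOUN")] * 3 + [(("prev=the", "suf=s"), "VERB")] * 1
    + [(("prev=the", "suf=ing"), "NOUN")] * 2 + [(("prev=the", "suf=ing"), "VERB")] * 2
    + [(("prev=to", "suf=s"), "NOUN")] * 2 + [(("prev=to", "suf=s"), "VERB")] * 2
    + [(("prev=to", "suf=ing"), "NOUN")] * 1 + [(("prev=to", "suf=ing"), "VERB")] * 3
)
LABELS = ("NOUN", "VERB")


def predict(weights, preds, labels):
    """Return p(y | x) for each label under the current weights."""
    scores = {y: sum(weights.get((p, y), 0.0) for p in preds) for y in labels}
    z = sum(math.exp(s) for s in scores.values())  # normalizer Z(x)
    return {y: math.exp(s) / z for y, s in scores.items()}


def train_gis(data, labels, iterations=200):
    """Estimate feature weights with Generalized Iterative Scaling."""
    c = len(data[0][0])  # constant feature sum C (an assumption of this sketch)
    assert all(len(preds) == c for preds, _ in data)
    n = len(data)

    # Empirical expectation of each binary feature (predicate, label).
    empirical = defaultdict(float)
    for preds, y in data:
        for p in preds:
            empirical[(p, y)] += 1.0 / n

    weights = {feat: 0.0 for feat in empirical}
    for _ in range(iterations):
        # Expectation of each feature under the current model,
        # using the empirical distribution over contexts.
        expected = defaultdict(float)
        for preds, _ in data:
            for y, prob in predict(weights, preds, labels).items():
                for p in preds:
                    if (p, y) in weights:
                        expected[(p, y)] += prob / n
        # GIS update: move each weight by (1/C) * log(empirical / expected).
        for feat in weights:
            weights[feat] += math.log(empirical[feat] / expected[feat]) / c
    return weights


weights = train_gis(DATA, LABELS)
probs = predict(weights, ("prev=the", "suf=s"), LABELS)
print(round(probs["NOUN"], 3))
```

　　On this toy data the fitted model can match the relative frequencies in each context, so the printed probability of NOUN given "prev=the" and "suf=s" should come out close to 0.75. As item 7 notes, GIS is mostly of teaching value; for real tasks a quasi-Newton method such as L-BFGS was found to be much more effective.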

Appendix:
Dr. Zhang Le's maximum entropy toolkit:
　http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html
Two reference pages on the maximum entropy model; the second is also a reading list, though an older one:
　1. MaxEnt and Exponential Models
　2. A maxent reading list

Note: when reposting, please credit the source, "我爱自然语言处理" (I Love Natural Language Processing): www.52nlp.cn

Permalink: https://www.52nlp.cn/maximum-entropy-model-tutorial-reading

5 comments on "A Reading Guide to the Maximum Entropy Model Literature"

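Mars said:

November 5, 2009, 01:32

When I first read Berger I was completely lost too. Later I followed the progression Linear Regression -> Logistic Regression -> MaxEnt and found things somewhat clearer. Offered for reference.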

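admin said:

November 5, 2009, 07:14

Right, I forgot to mention this. Chapter 6, "Hidden Markov and Maximum Entropy Models", newly added in the second edition of Speech and Language Processing (《自然语言处理综论》), explains the maximum entropy model along exactly this Linear Regression -> Logistic Regression -> MaxEnt line. It starts from the background knowledge behind the maximum entropy model, explains it very well, and is also worth reading. Thanks for the reminder!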

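justmewei said:

December 31, 2009, 22:11

Speech and Language Processing (《自然语言处理综论》) is a great book. Unfortunately the major online bookstores are badly out of stock right now, though it can still be bought from the publisher. I am also reading up on the maximum entropy model and will be back often to ask for advice. Haha.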

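52nlp replied:
January 1, 2010 at 12:57
However, the edition of Speech and Language Processing (《自然语言处理综论》) currently available in China is still the first edition, translated by Professor Feng Zhiwei. The first edition does not seem to cover the maximum entropy model; only the English second edition does.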

Narts said:

April 30, 2013, 22:20

Blogger, is there any difference between the Maximum entropy model and the log linear model?


Author: 52nlp
