Some readers have probably already read Professor Kevin Knight's "Bayesian Inference with Tears". In that tutorial on Bayesian inference, written for natural language processing researchers, he notes:
  "I've assembled this tutorial workbook from natural language papers that I've tried to understand. If you want to read original work, check out Sharon Goldwater's reading list on the web."
  In other words, the workbook was "assembled" from natural language processing papers Knight had been trying to understand, and readers who want the original work are pointed to Sharon Goldwater's reading list. A quick Google search for "Sharon Goldwater's reading list" turns up her "Reading list on Bayesian modeling for language". It is mainly a list of references on Bayesian models for language processing, so I am reposting it here as this installment's reading guide.

Reading list on Bayesian modeling for language

People often ask me what they can read to learn more about recent Bayesian modeling techniques and their applications to language learning. Here is a list of the papers I have found to be most useful and relevant to my own research. I try to emphasize the papers aimed at a slightly less technical/more cognitively inclined audience. This is not intended to be a complete list, only a starting point.


General introductory material

Thomas L. Griffiths and Alan Yuille (2006). A primer on probabilistic inference. Trends in Cognitive Sciences. Supplement to special issue on Probabilistic Models of Cognition (volume 10, issue 7).

  • Reviews many of the basic concepts underlying probabilistic (especially Bayesian) modeling and inference, using simple examples.

Sharon Goldwater (2006). Nonparametric Bayesian Models of Lexical Acquisition. Unpublished doctoral dissertation, Brown University.

  • Aimed primarily at computational linguists, but should (I hope) be accessible to anyone who has a basic familiarity with generative probabilistic models. Chapters 2 and 3 cover many useful topics, including Bayesian integration in finite and infinite models (i.e., Dirichlet distribution, Dirichlet process, Chinese restaurant process) and a brief introduction to sampling techniques (Gibbs sampling and Metropolis-Hastings sampling).

Daniel J. Navarro, Thomas L. Griffiths, Mark Steyvers, and Michael D. Lee (2006). Modeling individual differences using Dirichlet processes. Journal of Mathematical Psychology, 50, 101-122.

  • A very nice introduction to Dirichlet processes aimed at cognitive scientists. Slightly more in-depth, covers the stick-breaking construction for the Dirichlet process (which is not in my thesis) as well as the Chinese restaurant process.
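
To make the Chinese restaurant process and the stick-breaking construction mentioned above a little more concrete, here is a minimal Python sketch. It is my own illustration rather than code from any of these papers, and the concentration parameter and toy settings are arbitrary:

    import random

    def crp(num_customers, alpha):
        """Sample a table assignment for each customer from a CRP.

        Table k is chosen with probability n_k / (i + alpha) and a new
        table with probability alpha / (i + alpha), where n_k is the
        number of customers at table k and i the number already seated.
        """
        assignments = []       # table index chosen by each customer
        table_counts = []      # number of customers at each table
        for i in range(num_customers):
            weights = table_counts + [alpha]     # existing tables plus a new one
            r = random.uniform(0, sum(weights))
            acc, k = 0.0, 0
            for k, w in enumerate(weights):
                acc += w
                if r <= acc:
                    break
            if k == len(table_counts):           # open a new table
                table_counts.append(1)
            else:                                # join an existing table
                table_counts[k] += 1
            assignments.append(k)
        return assignments

    def stick_breaking(alpha, num_weights):
        """Truncated stick-breaking construction for a Dirichlet process:
        beta_k ~ Beta(1, alpha), pi_k = beta_k * prod_{j<k} (1 - beta_j)."""
        remaining, weights = 1.0, []
        for _ in range(num_weights):
            b = random.betavariate(1, alpha)
            weights.append(remaining * b)
            remaining *= 1 - b
        return weights

    random.seed(0)
    print(crp(20, alpha=1.0))         # e.g. [0, 0, 1, 1, 0, 2, ...]
    print(stick_breaking(1.0, 5))     # first five mixture weights of a DP draw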

Bayesian language models for learning

Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson (2007). Distributional Cues to Word Segmentation: Context is Important. Proceedings of the 31st Boston University Conference on Language Development.

Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson (2006). Contextual Dependencies in Unsupervised Word Segmentation. Proceedings of Coling/ACL.

  • These two papers apply the Dirichlet process and hierarchical Dirichlet process to word segmentation. The BUCLD paper is more conceptual; the ACL paper is more technical. For a more in-depth treatment, see also Chapter 5 of my thesis (above).

Sharon Goldwater and Thomas L. Griffiths (2007). A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging. Proceedings of the Association for Computational Linguistics.

  • This paper provides a direct comparison between Bayesian methods (averaging over parameters and estimation using Gibbs sampling) and standard methods (estimating parameters directly using EM) using the same underlying model (a standard finite HMM).
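
As a rough numerical illustration of the contrast this paper studies (my own sketch, not code from the paper): in the collapsed Gibbs sampler the multinomial parameters are integrated out, so each transition or emission probability becomes a Dirichlet-multinomial predictive computed from the current counts, whereas EM commits to a maximum-likelihood point estimate. The counts and hyperparameter below are made up:

    # Hypothetical transition counts out of one tag (made-up numbers).
    counts = {"NOUN": 30, "VERB": 10, "DET": 0}
    total = sum(counts.values())
    alpha = 0.5                     # symmetric Dirichlet hyperparameter
    K = len(counts)                 # number of possible next tags

    # EM-style maximum-likelihood point estimate of the transition distribution.
    mle = {t: c / total for t, c in counts.items()}

    # Collapsed Bayesian predictive with the parameters integrated out:
    # P(next = t | counts, alpha) = (c_t + alpha) / (total + K * alpha)
    predictive = {t: (c + alpha) / (total + K * alpha) for t, c in counts.items()}

    print(mle)          # {'NOUN': 0.75, 'VERB': 0.25, 'DET': 0.0}
    print(predictive)   # roughly {'NOUN': 0.73, 'VERB': 0.25, 'DET': 0.01}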

Mark Johnson (2007). Why Doesn't EM Find Good HMM POS-Taggers? Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

  • Includes Variational Bayes as well as Gibbs sampling and EM as estimation procedures. The results partly contradict those of Goldwater and Griffiths, possibly due to the combination of a simpler model and more training data.
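
For comparison with the sketch above, Variational Bayes replaces each count with an exp-digamma transform of the count plus the Dirichlet hyperparameter; for counts above roughly one this behaves like subtracting about 0.5 from each count, and it drives very small counts toward zero. A tiny illustration of my own, assuming scipy and numpy are available and using the same made-up counts:

    import numpy as np
    from scipy.special import digamma

    counts = np.array([30.0, 10.0, 0.0])   # same made-up counts as above
    alpha = 0.5                             # symmetric Dirichlet hyperparameter
    K, total = len(counts), counts.sum()

    em_estimate = counts / total
    # VB weight: exp(E[log theta_t]) under the variational Dirichlet posterior.
    vb_weights = np.exp(digamma(counts + alpha) - digamma(total + K * alpha))

    print(em_estimate)   # [0.75, 0.25, 0.0]
    print(vb_weights)    # roughly [0.73, 0.24, 0.003]: rare events are suppressed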

Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP/CoNLL).

Jenny Rose Finkel, Trond Grenager and Christopher D. Manning (2007). The Infinite Tree. Proceedings of the Association for Computational Linguistics.

Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater (2007). Adaptor Grammars: a Framework for Specifying Compositional Nonparametric Bayesian Models. Advances in Neural Information Processing Systems 19.

  • These three papers all deal with nonparametric models of syntax (dependency or context-free grammars). They might be a bit tough for those with less background in nonparametrics, although the exposition in Liang et al. is very nice.

Thomas L. Griffiths, Michael Steyvers, and Joshua B. Tenenbaum (2007). Topics in semantic representation. Psychological Review, 114, 211-244.

Thomas L. Griffiths, Michael Steyvers, David M. Blei, and Joshua B. Tenenbaum (2005). Integrating topics and syntax. Advances in Neural Information Processing Systems 17.

David Blei, Andrew Ng, and Michael Jordan (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022. (A shorter version appeared in NIPS 2002).

  • These three papers are about Latent Dirichlet Allocation (a.k.a. topic models) for learning semantic structure. The Psych Review paper provides a less technical introduction and considers LDA as a cognitive model. The JMLR paper is the original one, suitable if you want more technical details. The NIPS paper is just cool.
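
For readers who want to see the mechanics, here is a toy collapsed Gibbs sampler for LDA in the style of Griffiths and Steyvers, rather than the variational inference used in the original JMLR paper. It is a bare-bones sketch of my own for illustration only; the hyperparameters, corpus, and iteration count are arbitrary:

    import random

    def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
        """Collapsed Gibbs sampler for LDA on a list of documents, where each
        document is a list of integer word ids. Returns the topic assignment
        of every word token after the final iteration."""
        random.seed(seed)
        V = 1 + max(w for doc in docs for w in doc)        # vocabulary size
        ndk = [[0] * num_topics for _ in docs]             # doc-topic counts
        nkw = [[0] * V for _ in range(num_topics)]         # topic-word counts
        nk = [0] * num_topics                              # topic totals
        z = []                                             # topic of each token

        # Random initialization of topic assignments.
        for d, doc in enumerate(docs):
            zd = []
            for w in doc:
                k = random.randrange(num_topics)
                zd.append(k)
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
            z.append(zd)

        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]
                    # Remove the token's current assignment from the counts.
                    ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                    # P(z = t | rest) is proportional to
                    # (ndk + alpha) * (nkw + beta) / (nk + V * beta)
                    weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                               for t in range(num_topics)]
                    r = random.uniform(0, sum(weights))
                    acc, k = 0.0, 0
                    for k, wt in enumerate(weights):
                        acc += wt
                        if r <= acc:
                            break
                    z[d][i] = k
                    ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        return z

    # Toy corpus of word ids; real use would first map words to integer ids.
    toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 1, 5, 3, 2]]
    print(lda_gibbs(toy_docs, num_topics=2, iters=50))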

Fei Xu and Joshua B. Tenenbaum (2007). Word learning as Bayesian inference. Psychological Review, 114, 245-272.

  • Develops a Bayesian model to explain how children learn words at different levels of specificity (basic-level categories versus subordinate or superordinate).
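
The core of that model is the "size principle": hypotheses with smaller extensions gain likelihood rapidly as consistent examples accumulate. Here is a toy illustration of my own with made-up extension sizes and a flat prior (the actual model uses a structured prior over a taxonomy of categories):

    # Toy version of the "size principle": hypotheses covering fewer objects
    # gain likelihood rapidly as consistent examples accumulate.
    hypotheses = {"dalmatian": 5, "dog": 50, "animal": 500}   # made-up extension sizes
    prior = {h: 1.0 / len(hypotheses) for h in hypotheses}    # flat prior, for illustration

    def posterior(num_examples):
        # Likelihood of n examples drawn from a hypothesis of size s: (1/s)**n
        scores = {h: prior[h] * (1.0 / size) ** num_examples
                  for h, size in hypotheses.items()}
        z = sum(scores.values())
        return {h: s / z for h, s in scores.items()}

    print(posterior(1))   # broader hypotheses still keep a little probability
    print(posterior(3))   # probability concentrates on the most specific hypothesis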

Bayesian models of language processing

This isn't really my area, but here are a couple of interesting papers I know of:

Dennis Norris (2006). The Bayesian reader: explaining word recognition as an optimal Bayesian decision process. Psychological Review, 113(2), 327-357.

Naomi Feldman and Thomas L. Griffiths (2007). A rational account of the perceptual magnet effect. Proceedings of the Twenty-Ninth Annual Conference of the Cognitive Science Society.

Inference

A bunch of the papers mentioned above have descriptions of sampling algorithms and/or variational inference procedures for specific models. For more general information on these topics, consider reading some of the following:

Sharon Goldwater (2006). Nonparametric Bayesian Models of Lexical Acquisition. Unpublished doctoral dissertation, Brown University.

  • As I mentioned above, there is a brief overview of Markov chain Monte Carlo methods (Gibbs sampling and Metropolis-Hastings) in Chapter 2. Examples of Gibbs sampling algorithms are described in chapters 4 and 5.

Julian Besag (2000). Markov chain Monte Carlo for statistical inference. Working paper no. 9. University of Washington Center for Statistics and the Social Sciences.

  • A longer and more technical introduction to Markov chain Monte Carlo methods.
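
For readers who prefer to see MCMC in code before reading the theory, here is a minimal random-walk Metropolis-Hastings sampler for a one-dimensional target. This is my own sketch; the target density and step size are arbitrary:

    import math
    import random

    def metropolis_hastings(log_p, x0, num_samples, step=0.5, seed=0):
        """Random-walk Metropolis-Hastings for a one-dimensional target.

        log_p returns the unnormalized log density. The Gaussian proposal is
        symmetric, so the acceptance ratio reduces to p(proposal) / p(current).
        """
        random.seed(seed)
        x, samples = x0, []
        for _ in range(num_samples):
            proposal = x + random.gauss(0, step)
            accept_prob = math.exp(min(0.0, log_p(proposal) - log_p(x)))
            if random.random() < accept_prob:
                x = proposal             # accept the move
            samples.append(x)            # otherwise keep the current state
        return samples

    # Target: unnormalized log density of a Normal(2, 1) distribution.
    samples = metropolis_hastings(lambda x: -0.5 * (x - 2.0) ** 2, x0=0.0,
                                  num_samples=5000)
    burned_in = samples[1000:]               # discard burn-in
    print(sum(burned_in) / len(burned_in))   # should be close to 2.0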

Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater (2007). Bayesian Inference for PCFGs via Markov Chain Monte Carlo. Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT).

  • How to do efficient sampling for PCFGs.

Matthew Beal (2003). Variational Algorithms for Approximate Bayesian Inference. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London.

  • I don't know much about variational methods myself, but I've been told this is a good place to start.

Further Reading

Yee Whye Teh, Michael Jordan, Matthew Beal, and David Blei (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581.

  • The original HDP paper. Comprehensive, but I would suggest getting familiar with the ideas using some of the resources above before reading this one.

Radford Neal (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical report CRG-TR-93-1. University of Toronto Department of Computer Science.

  • Even more information about Markov chain Monte Carlo methods.

Note: when reposting, please credit the source "我爱自然语言处理" (www.52nlp.cn).

本文链接地址:https://www.52nlp.cn/bayesian-modeling-for-language-tutorial-reading

Author: 52nlp

6 comments on "Reading Guide to the Bayesian Modeling Literature"
  1. I find this material quite hard to follow. I have read a lot of these papers recently, but there is so much that the overall structure still isn't clear in my head.

    The order our reading group follows is: Basics (probability, Bayes' rule, etc.) -> Bayesian Networks (Graphical Models) -> Inference (Exact & Variational/Sampling) -> LDA -> Dirichlet Process -> Nonparametric Bayesian -> HDP, Infinite HMM, etc.

    That said, once you have read through all of these you are already close to the frontier of current research.

    52nlp replied:

    Thanks for the recommendation! Then again, you are already right at that frontier at CLSP!

  2. This workbook really is excellent: it explains deep material in an accessible way, though it is sprinkled with a bunch of jokes that are hard to get.

    52nlp replied:

    I still haven't managed to spot the jokes.

  3. Your posts give me a sense of the overall landscape. I am new to NLP and will be transferring to a PhD program next year; I hope to exchange ideas with the blogger often.

    tqc replied:

    Brother, how do you actually do big data analysis??? I'm a newbie, please advise.
