分类目录归档:机器学习

Coursera上机器学习课程(公开课)汇总推荐

Deep Learning Specialization on Coursera

Coursera上有很多机器学习课程,这里做个总结,因为机器学习相关的概念和应用很多,这里推荐的课程仅限于和机器学习直接相关的课程,虽然深度学习属于机器学习范畴,这里暂时也将其排除在外,后续会专门推出深度学习课程的系列推荐。

1. Andrew Ng 老师的 机器学习课程(Machine Learning)

机器学习入门首选课程,没有之一。这门课程从一开始诞生就备受瞩目,据说全世界有数百万人通过这门课程入门机器学习。课程的级别是入门级别的,对学习者的背景要求不高,Andrew Ng 老师讲解的又很通俗易懂,所以强烈推荐从这门课程开始走入机器学习。课程简介:

机器学习是一门研究在非特定编程条件下让计算机采取行动的学科。最近二十年,机器学习为我们带来了自动驾驶汽车、实用的语音识别、高效的网络搜索,让我们对人类基因的解读能力大大提高。当今机器学习技术已经非常普遍,您很可能在毫无察觉情况下每天使用几十次。许多研究者还认为机器学习是人工智能(AI)取得进展的最有效途径。在本课程中,您将学习最高效的机器学习技术,了解如何使用这些技术,并自己动手实践这些技术。更重要的是,您将不仅将学习理论知识,还将学习如何实践,如何快速使用强大的技术来解决新问题。最后,您将了解在硅谷企业如何在机器学习和AI领域进行创新。 本课程将广泛介绍机器学习、数据挖掘和统计模式识别。相关主题包括:(i) 监督式学习(参数和非参数算法、支持向量机、核函数和神经网络)。(ii) 无监督学习(集群、降维、推荐系统和深度学习)。(iii) 机器学习实例(偏见/方差理论;机器学习和AI领域的创新)。课程将引用很多案例和应用,您还需要学习如何在不同领域应用学习算法,例如智能机器人(感知和控制)、文本理解(网络搜索和垃圾邮件过滤)、计算机视觉、医学信息学、音频、数据库挖掘等领域。

这里有老版课程评论,非常值得参考推荐:Machine Learning

2. 台湾大学林轩田老师的 機器學習基石上 (Machine Learning Foundations)---Mathematical Foundations

如果有一定的基础或者学完了Andrew Ng老师的机器学习课程,这门机器学习基石上-数学基础可以作为进阶课程。林老师早期推出的两门机器学习课程口碑和难度均有:机器学习基石机器学习技法 ,现在重组为上和下,非常值得期待:

Machine learning is the study that allows computers to adaptively improve their performance with experience accumulated from the data observed. Our two sister courses teach the most fundamental algorithmic, theoretical and practical tools that any user of machine learning needs to know. This first course of the two would focus more on mathematical tools, and the other course would focus more on algorithmic tools. [機器學習旨在讓電腦能由資料中累積的經驗來自我進步。我們的兩項姊妹課程將介紹各領域中的機器學習使用者都應該知道的基礎演算法、理論及實務工具。本課程將較為著重數學類的工具,而另一課程將較為著重方法類的工具。]

3. 台湾大学林轩田老师的 機器學習基石下 (Machine Learning Foundations)---Algorithmic Foundations

作为2的姊妹篇,这个机器学习基石下-算法基础 更注重机器学习算法相关知识:

Machine learning is the study that allows computers to adaptively improve their performance with experience accumulated from the data observed. Our two sister courses teach the most fundamental algorithmic, theoretical and practical tools that any user of machine learning needs to know. This second course of the two would focus more on algorithmic tools, and the other course would focus more on mathematical tools. [機器學習旨在讓電腦能由資料中累積的經驗來自我進步。我們的兩項姊妹課程將介紹各領域中的機器學習使用者都應該知道的基礎演算法、理論及實務工具。本課程將較為著重方法類的工具,而另一課程將較為著重數學類的工具。

可参考早期的老版本课程评论:機器學習基石 (Machine Learning Foundations) 機器學習技法 (Machine Learning Techniques)

4. 华盛顿大学的 "机器学习专项课程(Machine Learning Specialization)"

这个系列课程包含4门子课程,分别是 机器学习基础:案例研究 , 机器学习:回归 , 机器学习:分类, 机器学习:聚类与检索:

This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data.

4.1 Machine Learning Foundations: A Case Study Approach(机器学习基础: 案例研究)

你是否好奇数据可以告诉你什么?你是否想在关于机器学习促进商业的核心方式上有深层次的理解?你是否想能同专家们讨论关于回归,分类,深度学习以及推荐系统的一切?在这门课上,你将会通过一系列实际案例学习来获取实践经历。

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies. At the end of the first course you will have studied how to predict house prices based on house-level features, analyze sentiment from user reviews, retrieve documents of interest, recommend products, and search for images. Through hands-on practice with these use cases, you will be able to apply machine learning methods in a wide range of domains. This first course treats the machine learning method as a black box. Using this abstraction, you will focus on understanding tasks of interest, matching these tasks to machine learning tools, and assessing the quality of the output. In subsequent courses, you will delve into the components of this black box by examining models and algorithms. Together, these pieces form the machine learning pipeline, which you will use in developing intelligent applications. Learning Outcomes: By the end of this course, you will be able to: -Identify potential applications of machine learning in practice. -Describe the core differences in analyses enabled by regression, classification, and clustering. -Select the appropriate machine learning task for a potential application. -Apply regression, classification, clustering, retrieval, recommender systems, and deep learning. -Represent your data as features to serve as input to machine learning models. -Assess the model quality in terms of relevant error metrics for each task. -Utilize a dataset to fit a model to analyze new data. -Build an end-to-end application that uses machine learning at its core. -Implement these techniques in Python.

4.2 Machine Learning: Regression(机器学习: 回归问题)

这门课程关注机器学习里面的一个基本问题: 回归(Regression), 也通过案例研究(预测房价)的方式进行回归问题的学习,最终通过Python实现相关的机器学习算法。

Case Study - Predicting Housing Prices In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms,...). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression. In this course, you will explore regularized linear regression models for the task of prediction and feature selection. You will be able to handle very large sets of features and select between models of various complexity. You will also analyze the impact of aspects of your data -- such as outliers -- on your selected models and predictions. To fit these models, you will implement optimization algorithms that scale to large datasets. Learning Outcomes: By the end of this course, you will be able to: -Describe the input and output of a regression model. -Compare and contrast bias and variance when modeling data. -Estimate model parameters using optimization algorithms. -Tune parameters with cross validation. -Analyze the performance of the model. -Describe the notion of sparsity and how LASSO leads to sparse solutions. -Deploy methods to select between models. -Exploit the model to form predictions. -Build a regression model to predict prices using a housing dataset. -Implement these techniques in Python.

4.3 Machine Learning: Classification(机器学习:分类问题)

这门课程关注机器学习里面的另一个基本问题: 分类(Classification), 通过两个案例研究进行学习:情感分析和贷款违约预测,最终通过Python实现相关的算法(也可以选择其他语言,但是强烈推荐Python)。

Case Studies: Analyzing Sentiment & Loan Default Prediction In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information,...). In our second case study for this course, loan default prediction, you will tackle financial data, and predict when a loan is likely to be risky or safe for the bank. These tasks are an examples of classification, one of the most widely used areas of machine learning, with a broad array of applications, including ad targeting, spam detection, medical diagnosis and image classification. In this course, you will create classifiers that provide state-of-the-art performance on a variety of tasks. You will become familiar with the most successful techniques, which are most widely used in practice, including logistic regression, decision trees and boosting. In addition, you will be able to design and implement the underlying algorithms that can learn these models at scale, using stochastic gradient ascent. You will implement these technique on real-world, large-scale machine learning tasks. You will also address significant tasks you will face in real-world applications of ML, including handling missing data and measuring precision and recall to evaluate a classifier. This course is hands-on, action-packed, and full of visualizations and illustrations of how these techniques will behave on real data. We've also included optional content in every module, covering advanced topics for those who want to go even deeper! Learning Objectives: By the end of this course, you will be able to: -Describe the input and output of a classification model. -Tackle both binary and multiclass classification problems. -Implement a logistic regression model for large-scale classification. -Create a non-linear model using decision trees. -Improve the performance of any model using boosting. -Scale your methods with stochastic gradient ascent. -Describe the underlying decision boundaries. -Build a classification model to predict sentiment in a product review dataset. -Analyze financial data to predict loan defaults. -Use techniques for handling missing data. -Evaluate your models using precision-recall metrics. -Implement these techniques in Python (or in the language of your choice, though Python is highly recommended).

4.4 Machine Learning: Clustering & Retrieval(机器学习:聚类和检索)

这门课程关注的是机器学习里面的另外两个基本问题:聚类和检索,同样通过案例研究进行学习:相似文档查询,一个非常具有实际应用价值的问题:

Case Studies: Finding Similar Documents A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to a retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover? In this third case study, finding similar documents, you will examine similarity-based algorithms for retrieval. In this course, you will also examine structured representations for describing the documents in the corpus, including clustering and mixed membership models, such as latent Dirichlet allocation (LDA). You will implement expectation maximization (EM) to learn the document clusterings, and see how to scale the methods using MapReduce. Learning Outcomes: By the end of this course, you will be able to: -Create a document retrieval system using k-nearest neighbors. -Identify various similarity metrics for text data. -Reduce computations in k-nearest neighbor search by using KD-trees. -Produce approximate nearest neighbors using locality sensitive hashing. -Compare and contrast supervised and unsupervised learning tasks. -Cluster documents by topic using k-means. -Describe how to parallelize k-means using MapReduce. -Examine probabilistic clustering approaches using mixtures models. -Fit a mixture of Gaussian model using expectation maximization (EM). -Perform mixed membership modeling using latent Dirichlet allocation (LDA). -Describe the steps of a Gibbs sampler and how to use its output to draw inferences. -Compare and contrast initialization techniques for non-convex optimization objectives. -Implement these techniques in Python.

5. 密歇根大学的 Applied Machine Learning in Python(在Python中应用机器学习)

Python机器学习应用课程,这门课程主要聚焦在通过Python应用机器学习,包括机器学习和统计学的区别,机器学习工具包scikit-learn的介绍,有监督学习和无监督学习,数据泛化问题(例如交叉验证和过拟合)等。这门课程同时属于"Python数据科学应用专项课程系列(Applied Data Science with Python Specialization)"。

This course will introduce the learner to applied machine learning, focusing more on the techniques and methods than on the statistics behind these methods. The course will start with a discussion of how machine learning is different than descriptive statistics, and introduce the scikit learn toolkit. The issue of dimensionality of data will be discussed, and the task of clustering data, as well as evaluating those clusters, will be tackled. Supervised approaches for creating predictive models will be described, and learners will be able to apply the scikit learn predictive modelling methods while understanding process issues related to data generalizability (e.g. cross validation, overfitting). The course will end with a look at more advanced techniques, such as building ensembles, and practical limitations of predictive models. By the end of this course, students will be able to identify the difference between a supervised (classification) and unsupervised (clustering) technique, identify which technique they need to apply for a particular dataset and need, engineer features to meet that need, and write python code to carry out an analysis. This course should be taken after Introduction to Data Science in Python and Applied Plotting, Charting & Data Representation in Python and before Applied Text Mining in Python and Applied Social Analysis in Python.

6. 俄罗斯国立高等经济学院和Yandex联合推出的 高级机器学习专项课程系列(Advanced Machine Learning Specialization)

该系列授课语言为英语,包括深度学习,Kaggle数据科学竞赛,机器学习中的贝叶斯方法,强化学习,计算机视觉,自然语言处理等7门子课程,截止目前前3门课程已开,感兴趣的同学可以关注:

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of 7 courses you will be able to apply modern machine learning methods in enterprise and understand the caveats of real-world data and settings.

以下是和机器学习直接相关的子课程,其他这里略过:

6.3 Bayesian Methods for Machine Learning(面向机器学习的贝叶斯方法)

该课程关注机器学习中的贝叶斯方法,贝叶斯方法在很多领域都很有用,例如游戏开发和毒品发现。它们给很多机器学习算法赋予了“超能力”,例如处理缺失数据,从小数据集中提取大量有用的信息等。当贝叶斯方法被应用在深度学习中时,它可以让你将模型压缩100倍,并且自动帮你调参,节省你的时间和金钱。

Bayesian methods are used in lots of fields: from game development to drug discovery. They give superpowers to many machine learning algorithms: handling missing data, extracting much more information from small datasets. Bayesian methods also allow us to estimate uncertainty in predictions, which is a really desirable feature for fields like medicine. When Bayesian methods are applied to deep learning, it turns out that they allow you to compress your models 100 folds, and automatically tune hyperparametrs, saving your time and money. In six weeks we will discuss the basics of Bayesian methods: from how to define a probabilistic model to how to make predictions from it. We will see how one can fully automate this workflow and how to speed it up using some advanced techniques. We will also see applications of Bayesian methods to deep learning and how to generate new images with it. We will see how new drugs that cure severe diseases be found with Bayesian methods.

7. 约翰霍普金斯大学的 Practical Machine Learning(机器学习实战)

这门课程从数据科学的角度来应用机器学习进修实战,课程将会介绍机器学习的基础概念譬如训练集,测试集,过拟合和错误率等,同时这门课程也会介绍机器学习的基本模型和算法,例如回归,分类,朴素贝叶斯,以及随机森林。这门课程最终会覆盖一个完整的机器学习实战周期,包括数据采集,特征生成,机器学习算法应用以及结果评估等。这门机器学习实践课程同时属于约翰霍普金斯大学的 数据科学专项课程(Data Science Specialization)系列:

One of the most common tasks performed by data scientists and data analysts are prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and tests sets, overfitting, and error rates. The course will also introduce a range of model based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.

8. 卫斯理大学 Regression Modeling in Practice(回归模型实战)

这门课程关注的是数据分析以及机器学习领域的最重要的一个概念和工具:回归(模型)分析。这门课程使用SAS或者Python,从线性回归开始学习,到了解整个回归模型,以及应用回归模型进行数据分析:

This course focuses on one of the most important tools in your data analysis arsenal: regression analysis. Using either SAS or Python, you will begin with linear regression and then learn how to adapt when two variables do not present a clear linear relationship. You will examine multiple predictors of your outcome and be able to identify confounding variables, which can tell a more compelling story about your results. You will learn the assumptions underlying regression analysis, how to interpret regression coefficients, and how to use regression diagnostic plots and other tools to evaluate the quality of your regression model. Throughout the course, you will share with others the regression models you have developed and the stories they tell you.

这门课程同时属于卫斯理大学的 数据分析与解读专项课程系列(Data Analysis and Interpretation Specialization)

9. 卫斯理大学的 Machine Learning for Data Analysis(面向数据分析的机器学习)

这门课程关注数据分析里的机器学习,机器学习的过程是一个开发、测试和应用预测算法来实现目标的过程,这门课程以 Regression Modeling in Practice(回归模型实战) 为基础,介绍机器学习中的有监督学习概念,同时从基础的分类算法到决策树以及聚类都会覆盖。通过完成这门课程,你将会学习如何应用、测试和解读机器学习算法用来解决实际问题。

Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course will provide an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.

这门课程同时属于卫斯理大学的 数据分析与解读专项课程系列(Data Analysis and Interpretation Specialization)

10. 加州大学圣地亚哥分校的 Machine Learning With Big Data(大数据机器学习)

这门课程关注大数据中的机器学习技术,将会介绍相关的机器学习算法和工具。通过这门课程,你可以学到:通过机器学习过程来设计和利用数据;将机器学习技术用于探索和准备数据来建模;识别机器学习问题的类型;通过广泛可用的开源工具来使用数据构建模型;在Spark中使用大规模机器学习算法分析大数据。

Want to make sense of the volumes of data you have collected? Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to scale those models up to big data problems. At the end of the course, you will be able to: • Design an approach to leverage data using the steps in the machine learning process. • Apply machine learning techniques to explore and prepare data for modeling. • Identify the type of machine learning problem in order to apply the appropriate set of techniques. • Construct models that learn from data using widely available open source tools. • Analyze big data problems using scalable machine learning algorithms on Spark.

这门课程同时属于 加州大学圣地亚哥分校的大数据专项课程系列(Big Data Specialization)

11. 俄罗斯搜索巨头Yandex推出的 Big Data Applications: Machine Learning at Scale(大数据应用:大规模机器学习)

机器学习正在改变世界,通过这门课程,你将会学习到:识别实战中需要用机器学习算法解决的问题;通过Spark MLLib构建、调参、和应用线性模型;里面文本处理的方法;用决策树和Boost方法解决机器学习问题;构建自己的推荐系统。

Machine learning is transforming the world around us. To become successful, you’d better know what kinds of problems can be solved with machine learning, and how they can be solved. Don’t know where to start? The answer is one button away. During this course you will: - Identify practical problems which can be solved with machine learning - Build, tune and apply linear models with Spark MLLib - Understand methods of text processing - Fit decision trees and boost them with ensemble learning - Construct your own recommender system. As a practical assignment, you will - build and apply linear models for classification and regression tasks; - learn how to work with texts; - automatically construct decision trees and improve their performance with ensemble learning; - finally, you will build your own recommender system! With these skills, you will be able to tackle many practical machine learning tasks. We provide the tools, you choose the place of application to make this world of machines more intelligent.

这门课程同时属于Yandex推出的 面向数据工程师的大数据专项课程系列(Big Data for Data Engineers Specialization)

注:本文首发“课程图谱博客”:http://blog.coursegraph.com ,同步发布到这里, 本文链接地址:http://blog.coursegraph.com/coursera上机器学习课程公开课汇总推荐 http://blog.coursegraph.com/?p=696

机器学习保险行业问答开放数据集: 2. 使用案例

Deep Learning Specialization on Coursera

上一篇文章中,介绍了数据集的设计,该语料可以用于研究和学习,从规模和质量上,是目前中文问答语料中,保险行业垂直领域最优秀的语料,关于该语料制作过程可以通过语料主页了解,本篇的主要内容是使用该语料实现一个简单的问答模型,并且给出准确度和损失函数作为数据集的Baseline。

DeepQA-1

为了展示如何使用该语料训练模型和评测算法,我做了一个示例项目 - DeepQA-1,本文接下来会介绍DeepQA-1,假设读者了解深度学习基本概念和Python语言。

Data Loader

数据加载包含两部分:加载语料和预处理。 加载数据使用 insuranceqa_data  载入训练,测试和验证集的数据。

预处理是按照模型的超参数处理问题和答案,将它们组合成输入需要的格式,在本文介绍的baseline model中,预处理包含下面工作:

  1. 在词汇表(vocab)中添加辅助Token: <PAD>, <GO>. 假设x是问题序列,是u回复序列,输入序列可以表示为:

超参数question_max_length代表模型中问题的最大长度。 超参数utterance_max_length代表模型中回复的最大长度,回复可能是正例,也可能是负例。

其中,Token <GO> 用来分隔问题和回复,Token <PAD> 用来补齐问题或回复。

训练数据包含了141,779条,正例:负例=1:10,根据超参数生成输入序列:

上图 中 x 就是输入序列。y_代表标注数据:正例还是负例,正例标为[1,0],负例标为[0,1],这样做的好处是方便计算损失函数和准确度。测试数据和验证数据也用同样的方式进行处理,唯一不同的是它们不需要做成mini-batch。需要强调的是,处理词汇表和构建输入序列的方式可以尝试用不同的方法,上述方案仅作为表达baseline结果而采用,一些有助于增强模型能力的,比如使用word2vec训练词向量都值得尝试。

Network

baseline model使用了最简单的神经网络,输入序列从左侧进入,输出序列输出包含2个数值的vector,然后使用损失函数计算误差。

超参数,Hyper params

损失函数

神经网络的激活函数使用函数,损失函数使用最大似然的思想。

 

迭代训练

使用mini-batch加载数据,迭代训练的大部分工作在back_propagation中完成,它计算出每次迭代的损失和b,W 的误差率,然后使用学习率和误差率更新每个b,W 。

 

执行训练脚本

python3 deep_qa_1/network.py

Visual

在训练过程中,观察损失函数和准确度的变化可以帮助优化超参数的设计。

loss

python3 visual/loss.py

accuracy

python3 visual/accuracy.py

在迭代了25,000步后就基本维持在一个固定值,学习停止了。

Baseline

使用获得的Baseline数据为:

Epoch 25, total step 36400, accuracy 0.9031, cost 1.056221.

总结

Baseline model设计的非常简单,它展示了如何使用insuranceqa-corpus-zh训练FAQ问答模型,项目的源码参考这里。在过去两周中,为了能让这个数据集能满足使用,体现其价值,我花了很多时间来建设,仓促之中仍然会包含一些不足,比如数据集中,每个问题是唯一的,不包含相似问题,是这个数据集目前最大的缺陷,另外一方面,因为该数据集的回复包含一个正例和多个负例,可以用用于训练分类器,也可以用于训练ranking model。如果在使用的过程中,遇到任何问题,可以通过数据集的地址 反馈。

反向传播算法入门资源索引

Deep Learning Specialization on Coursera

1、一切从维基百科开始,大致了解一个全貌:
反向传播算法 Backpropagation

2、拿起纸和笔,再加上ipython or 计算器,通过一个例子直观感受反向传播算法:
A Step by Step Backpropagation Example

3、再玩一下上篇例子对应的200多行Python代码: Neural Network with Backpropagation

4、有了上述直观的反向传播算法体验,可以从1986年这篇经典的论文入手了:Learning representations by back-propagating errors

5、如果还是觉得晦涩,推荐读一下"Neural Networks and Deep Learning"这本深度学习在线书籍的第二章:How the backpropagation algorithm works

6、或者可以通过油管看一下这个神经网络教程的前几节关于反向传播算法的视频: Neural Network Tutorial

7、hankcs 同学对于上述视频和相关材料有一个解读: 反向传播神经网络极简入门

8、这里还有一个比较简洁的数学推导:Derivation of Backpropagation

9、神牛gogo 同学对反向传播算法原理及代码解读:神经网络反向传播的数学原理

10、关于反向传播算法,更本质一个解释:自动微分反向模式(Reverse-mode differentiation )Calculus on Computational Graphs: Backpropagation

注:原创文章,转载请注明出处及保留链接“我爱自然语言处理”:http://www.52nlp.cn

本文链接地址:反向传播算法入门资源索引 http://www.52nlp.cn/?p=9350

QA问答系统中的深度学习技术实现

Deep Learning Specialization on Coursera

应用场景

智能问答机器人火得不行,开始研究深度学习在NLP领域的应用已经有一段时间,最近在用深度学习模型直接进行QA系统的问答匹配。主流的还是CNN和LSTM,在网上没有找到特别合适的可用的代码,自己先写了一个CNN的(theano),效果还行,跟论文中的结论是吻合的。目前已经应用到了我们的产品上。

原理

参看《Applying Deep Learning To Answer Selection: A Study And An Open Task》,文中比较了好几种网络结构,选择了效果相对较好的其中一个来实现,网络描述如下:

qacnn_v2

Q&A共用一个网络,网络中包括HL,CNN,P+T和Cosine_Similarity,HL是一个g(W*X+b)的非线性变换,CNN就不说了,P是max_pooling,T是激活函数Tanh,最后的Cosine_Similarity表示将Q&A输出的语义表示向量进行相似度计算。

详细描述下从输入到输出的矩阵变换过程:

  1. Qp:[batch_size, sequence_len],Qp是Q之前的一个表示(在上图中没有画出)。所有句子需要截断或padding到一个固定长度(因为后面的CNN一般是处理固定长度的矩阵),例如句子包含3个字ABC,我们选择固定长度sequence_len为100,则需要将这个句子padding成ABC<a><a>...<a>(100个字),其中的<a>就是添加的专门用于padding的无意义的符号。训练时都是做mini-batch的,所以这里是一个batch_size行的矩阵,每行是一个句子。
  2. Q:[batch_size, sequence_len, embedding_size]。句子中的每个字都需要转换成对应的字向量,字向量的维度大小是embedding_size,这样Qp就从一个2维的矩阵变成了3维的Q
  3. HL层输出:[batch_size, embedding_size, hl_size]。HL层:[embedding_size, hl_size],Q中的每个句子会通过和HL层的点积进行变换,相当于将每个字的字向量从embedding_size大小变换到hl_size大小。
  4. CNN+P+T输出:[batch_size, num_filters_total]。CNN的filter大小是[filter_size, hl_size],列大小是hl_size,这个和字向量的大小是一样的,所以对每个句子而言,每个filter出来的结果是一个列向量(而不是矩阵),列向量再取max-pooling就变成了一个数字,每个filter输出一个数字,num_filters_total个filter出来的结果当然就是[num_filters_total]大小的向量,这样就得到了一个句子的语义表示向量。T就是在输出结果上加上Tanh激活函数。
  5. Cosine_Similarity:[batch_size]。最后的一层并不是通常的分类或者回归的方法,而是采用了计算两个向量(Q&A)夹角的方法,下面是网络损失函数。t2,m是需要设定的参数margin,VQ、VA+、VA-分别是问题、正向答案、负向答案对应的语义表示向量。损失函数的意义就是:让正向答案和问题之间的向量cosine值要大于负向答案和问题的向量cosine值,大多少,就是margin这个参数来定义的。cosine值越大,两个向量越相近,所以通俗的说这个Loss就是要让正向的答案和问题愈来愈相似,让负向的答案和问题越来越不相似。

实现

代码点击这里,使用的数据是一份英文的insuranceQA,下面介绍代码重点部分:

字向量。本文采用字向量的方法,没有使用词向量。使用字向量的目的主要是为了解决未登录词的问题,这样在测试的时候就很少会遇到Unknown的字向量的问题了。而且字向量的效果也不一定比词向量的效果差,还省去了分词的各种麻烦。先用word2vec生成一份字向量,相当于我们在做pre-training了(之后测试了随机初始化字向量的方法,效果差不多)

原理中的步骤2。这里没有做HL层的变换,实际测试中,增加HL层有非常非常小的提升,所以在这里就省去了改步骤。

t4

CNN可以设置多种大小的filter,最后各种filter的结果会拼接起来。

t5

原理中的步骤4。这里执行卷积,max-pooling和Tanh激活。

t6

生成的ouputs_1是一个python的list,使用concatenate将list的多个tensor拼接起来(list中的每个tensor表示一种大小的filter卷积的结果)t7

原理中的步骤5。计算问题、正向答案、负向答案的向量夹角

t8

生成Loss损失函数和Accuracy。t9

核心的网络构建代码就是这些,其他的代码都是训练数据、验证数据的读入,以及theano构建训练时的一些常规代码。

如果需要增加HL层,可参照如下的代码。Whl即是HL层的网络,将input和Whl点积即可。t10

dropout的实现。

t11

结果

使用上面的代码,Test 1的Top-1 Accuracy可以达到61%-62%,和论文中的结论基本一致了,至于论文中提到的GESD、AESD等方法没有再测试了,运行较慢,其他数据集也没有再测试了。

下面是国外友人用一个叫keras的工具(封装的theano和tensorflow)弄的类似代码,Test 1的Top-1准确率在50%左右,比他这个要高:)

http://benjaminbolte.com/blog/2016/keras-language-modeling.html

Test set Top-1 Accuracy Mean Reciprocal Rank
Test 1 0.4933 0.6189
Test 2 0.4606 0.5968
Dev 0.4700 0.6088

另外,原始的insuranceQA需要进行一些处理才能在这个代码上使用,具体参看github上的说明吧。

一些技巧

  1. 字向量和词向量的效果相当。所以优先使用字向量,省去了分词的麻烦,还能更好的避免未登录词的问题,何乐而不为。
  2. 字向量不是固定的,在训练中会更新
  3. Dropout的使用对最高的准确率没有很大的影响,但是使用了Dropout的结果更稳定,准确率的波动会更小,所以建议还是要使用Dropout的。不过Dropout也不易过度使用,比如Dropout的keep_prob概率如果设置到0.25,则模型收敛得更慢,训练时间长很多,效果也有可能会更差,设置会差很多。我这版代码使用的keep_prob为0.5,同时保证准确率和训练时间。另外,Dropout只应用到了max-pooling的结果上,其他地方没有再使用了,过多的使用反而不好。
  4. 如何生成训练集。每个训练case需要一个问题+一个正向答案+一个负向答案,很明显问题和正向答案都是有的,负向答案的生成方法就是随机采样,这样就不需要涉及任何人工标注工作了,可以很方便的应用到大数据集上。
  5. HL层的效果不明显,有很微量的提升。如果HL层的大小是200,字向量是100,则HL层相当于将字向量再放大一倍,这个感觉没有多少信息可利用的,还不如直接将字向量设置成200,还省去了HL这一层的变换。
  6. margin的值一般都设置得比较小。这里用的是0.05
  7. 如果将Cosine_similarity这一层换成分类或者回归,印象中效果是不如Cosine_similarity的(具体数据忘了)
  8. num_filters越大并不是效果越好,基本到了一定程度就很难提升了,反而会降低训练速度。
  9. 同时也写了tensorflow版本代码,对比theano的,效果差不多
  10. Adam和SGD两种训练方法比较,Adam训练速度貌似会更快一些,效果基本也持平吧,没有太细节的对比。不过同样的网络+SGD,theano好像训练要更快一些。
  11. Loss和Accuracy是比较重要的监控参数。如果写一个新的网络的话,类似的指标是很有必要的,可以在每个迭代中评估网络是否正在收敛。因为调试比较麻烦,所以通过这些参数能评估你的网络写对没,参数设置是否正确。
  12. 网络的参数还是比较重要的,如果一些参数设置不合理,很有可能结果千差万别,记得最初用tensorflow实现的时候,应该是dropout设置得太小,导致效果很差,很久才找到原因。所以调参和微调网络还是需要一定的技巧和经验的,做这版代码的时候就经历了一段比较痛苦的调参过程,最开始还怀疑是网络设计或是代码有问题,最后总结应该就是参数没设置好。

结语

如果关注这个东西的人多的话,后面还可以有tensorflow版本的QA CNN,以及LSTM的代码奉上:)

补充

tensorflow的CNN代码已添加到github上,点击这里

Contact: jiangwen127@gmail.com weibo:码坛奥沙利文

斯坦福大学深度学习与自然语言处理第四讲:词窗口分类和神经网络

Deep Learning Specialization on Coursera

斯坦福大学在三月份开设了一门“深度学习与自然语言处理”的课程:CS224d: Deep Learning for Natural Language Processing,授课老师是青年才俊 Richard Socher,以下为相关的课程笔记。

第四讲:词窗口分类和神经网络(Word Window Classification and Neural Networks)

推荐阅读材料:

  1. [UFLDL tutorial]
  2. [Learning Representations by Backpropogating Errors]
  3. 第四讲Slides [slides]
  4. 第四讲视频 [video]

以下是第四讲的相关笔记,主要参考自课程的slides,视频和其他相关资料。
继续阅读

斯坦福大学深度学习与自然语言处理第三讲:高级的词向量表示

Deep Learning Specialization on Coursera

斯坦福大学在三月份开设了一门“深度学习与自然语言处理”的课程:CS224d: Deep Learning for Natural Language Processing,授课老师是青年才俊 Richard Socher,以下为相关的课程笔记。

第三讲:高级的词向量表示(Advanced word vector representations: language models, softmax, single layer networks)

推荐阅读材料:

  1. Paper1:[GloVe: Global Vectors for Word Representation]
  2. Paper2:[Improving Word Representations via Global Context and Multiple Word Prototypes]
  3. Notes:[Lecture Notes 2]
  4. 第三讲Slides [slides]
  5. 第三讲视频 [video]

以下是第三讲的相关笔记,主要参考自课程的slides,视频和其他相关资料。
继续阅读

斯坦福大学深度学习与自然语言处理第二讲:词向量

Deep Learning Specialization on Coursera

斯坦福大学在三月份开设了一门“深度学习与自然语言处理”的课程:CS224d: Deep Learning for Natural Language Processing,授课老师是青年才俊 Richard Socher,以下为相关的课程笔记。

第二讲:简单的词向量表示:word2vec, Glove(Simple Word Vector representations: word2vec, GloVe)

推荐阅读材料:

  1. Paper1:[Distributed Representations of Words and Phrases and their Compositionality]]
  2. Paper2:[Efficient Estimation of Word Representations in Vector Space]
  3. 第二讲Slides [slides]
  4. 第二讲视频 [video]

以下是第二讲的相关笔记,主要参考自课程的slides,视频和其他相关资料。
继续阅读

斯坦福大学深度学习与自然语言处理第一讲:引言

Deep Learning Specialization on Coursera

斯坦福大学在三月份开设了一门“深度学习与自然语言处理”的课程:CS224d: Deep Learning for Natural Language Processing,授课老师是青年才俊 Richard Socher,他本人是德国人,大学期间涉足自然语言处理,在德国读研时又专攻计算机视觉,之后在斯坦福大学攻读博士学位,拜师NLP领域的巨牛 Chris ManningDeep Learning 领域的巨牛 Andrew Ng,其博士论文是《Recursive Deep Learning for Natural Language Processing and Computer Vision》,也算是多年求学生涯的完美一击。毕业后以联合创始人及CTO的身份创办了MetaMind,作为AI领域的新星创业公司,MetaMind创办之初就拿了800万美元的风投,值得关注。

回到这们课程CS224d,其实可以翻译为“面向自然语言处理的深度学习(Deep Learning for Natural Language Processing)”,这门课程是面向斯坦福学生的校内课程,不过课程的相关材料都放到了网上,包括课程视频,课件,相关知识,预备知识,作业等等,相当齐备。课程大纲相当有章法和深度,从基础讲起,再讲到深度学习在NLP领域的具体应用,包括命名实体识别,机器翻译,句法分析器,情感分析等。Richard Socher此前在ACL 2012和NAACL 2013 做过一个Tutorial,Deep Learning for NLP (without Magic),感兴趣的同学可以先参考一下: Deep Learning for NLP (without Magic) - ACL 2012 Tutorial - 相关视频及课件 。另外,由于这门课程的视频放在Youtube上,@爱可可-爱生活 老师维护了一个网盘链接:http://pan.baidu.com/s/1pJyrXaF ,同步更新相关资料,可以关注。
继续阅读

PRML读书会第十四章 Combining Models

Deep Learning Specialization on Coursera

PRML读书会第十四章 Combining Models

主讲人 网神

(新浪微博: @豆角茄子麻酱凉面

网神(66707180) 18:57:18

大家好,今天我们讲一下第14章combining models,这一章是联合模型,通过将多个模型以某种形式结合起来,可以获得比单个模型更好的预测效果。包括这几部分:
committees, 训练多个不同的模型,取其平均值作为最终预测值。

boosting: 是committees的特殊形式,顺序训练L个模型,每个模型的训练依赖前一个模型的训练结果。
决策树:不同模型负责输入变量的不同区间的预测,每个样本选择一个模型来预测,选择过程就像在树结构中从顶到叶子的遍历。
conditional mixture model条件混合模型:引入概率机制来选择不同模型对某个样本做预测,相比决策树的硬性选择,要有很多优势。

本章主要介绍了这几种混合模型。讲之前,先明确一下混合模型与Bayesian model averaging的区别,贝叶斯模型平均是这样的:假设有H个不同模型h,每个模型的先验概率是p(h),一个数据集的分布是:
整个数据集X是由一个模型生成的,关于h的概率仅仅表示是由哪个模型来生成的 这件事的不确定性。而本章要讲的混合模型是数据集中,不同的数据点可能由不同模型生成。看后面讲到的内容就明白了。
继续阅读

PRML读书会第十三章 Sequential Data

Deep Learning Specialization on Coursera

PRML读书会第十三章 Sequential Data

主讲人 张巍

(新浪微博: @张巍_ISCAS

软件所-张巍<zh3f@qq.com> 19:01:27
我们开始吧,十三章是关于序列数据,现实中很多数据是有前后关系的,例如语音或者DNA序列,例子就不多举了,对于这类数据我们很自然会想到用马尔科夫链来建模:

例如直接假设观测数据之间服从一阶马尔科夫链,这个假设显然太简单了,因为很多数据时明显有高阶相关性的,一个解决方法是用高阶马尔科夫链建模:

但这样并不能完全解决问题 :1、高阶马尔科夫模型参数太多;2、数据间的相关性仍然受阶数限制。一个好的解决方法,是引入一层隐变量,建立如下的模型:

继续阅读