# A Roundup of Recommended Machine Learning Courses (MOOCs) on Coursera

There are many machine learning courses on Coursera, so this post pulls them together. Because the concepts and applications related to machine learning are so numerous, the recommendations here are limited to courses directly about machine learning. Although deep learning falls within machine learning, it is excluded for now; a dedicated series of deep learning course recommendations will follow.

1. Andrew Ng's machine learning course (Machine Learning)

Machine learning is the study that allows computers to adaptively improve their performance with experience accumulated from the data observed. Our two sister courses teach the most fundamental algorithmic, theoretical and practical tools that any user of machine learning needs to know. This first course of the two would focus more on mathematical tools, and the other course would focus more on algorithmic tools.

Machine learning is the study that allows computers to adaptively improve their performance with experience accumulated from the data observed. Our two sister courses teach the most fundamental algorithmic, theoretical and practical tools that any user of machine learning needs to know. This second course of the two would focus more on algorithmic tools, and the other course would focus more on mathematical tools.

4. The University of Washington's "Machine Learning Specialization"

This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data.

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies. At the end of the first course you will have studied how to predict house prices based on house-level features, analyze sentiment from user reviews, retrieve documents of interest, recommend products, and search for images. Through hands-on practice with these use cases, you will be able to apply machine learning methods in a wide range of domains. This first course treats the machine learning method as a black box. Using this abstraction, you will focus on understanding tasks of interest, matching these tasks to machine learning tools, and assessing the quality of the output. In subsequent courses, you will delve into the components of this black box by examining models and algorithms. Together, these pieces form the machine learning pipeline, which you will use in developing intelligent applications.

Learning Outcomes: By the end of this course, you will be able to:

- Identify potential applications of machine learning in practice.
- Describe the core differences in analyses enabled by regression, classification, and clustering.
- Select the appropriate machine learning task for a potential application.
- Apply regression, classification, clustering, retrieval, recommender systems, and deep learning.
- Represent your data as features to serve as input to machine learning models.
- Assess the model quality in terms of relevant error metrics for each task.
- Utilize a dataset to fit a model to analyze new data.
- Build an end-to-end application that uses machine learning at its core.
- Implement these techniques in Python.

Case Study - Predicting Housing Prices. In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms,...). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression. In this course, you will explore regularized linear regression models for the task of prediction and feature selection. You will be able to handle very large sets of features and select between models of various complexity. You will also analyze the impact of aspects of your data -- such as outliers -- on your selected models and predictions. To fit these models, you will implement optimization algorithms that scale to large datasets.

Learning Outcomes: By the end of this course, you will be able to:

- Describe the input and output of a regression model.
- Compare and contrast bias and variance when modeling data.
- Estimate model parameters using optimization algorithms.
- Tune parameters with cross validation.
- Analyze the performance of the model.
- Describe the notion of sparsity and how LASSO leads to sparse solutions.
- Deploy methods to select between models.
- Exploit the model to form predictions.
- Build a regression model to predict prices using a housing dataset.
- Implement these techniques in Python.

Case Studies: Finding Similar Documents. A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover? In this third case study, finding similar documents, you will examine similarity-based algorithms for retrieval. In this course, you will also examine structured representations for describing the documents in the corpus, including clustering and mixed membership models, such as latent Dirichlet allocation (LDA). You will implement expectation maximization (EM) to learn the document clusterings, and see how to scale the methods using MapReduce.

Learning Outcomes: By the end of this course, you will be able to:

- Create a document retrieval system using k-nearest neighbors.
- Identify various similarity metrics for text data.
- Reduce computations in k-nearest neighbor search by using KD-trees.
- Produce approximate nearest neighbors using locality sensitive hashing.
- Compare and contrast supervised and unsupervised learning tasks.
- Cluster documents by topic using k-means.
- Describe how to parallelize k-means using MapReduce.
- Examine probabilistic clustering approaches using mixture models.
- Fit a mixture of Gaussians model using expectation maximization (EM).
- Perform mixed membership modeling using latent Dirichlet allocation (LDA).
- Describe the steps of a Gibbs sampler and how to use its output to draw inferences.
- Compare and contrast initialization techniques for non-convex optimization objectives.
- Implement these techniques in Python.

Applied Machine Learning in Python: this course focuses on applying machine learning with Python, covering topics such as the difference between machine learning and statistics, an introduction to the scikit-learn toolkit, supervised and unsupervised learning, and data generalization issues (e.g., cross-validation and overfitting). The course is also part of the "Applied Data Science with Python Specialization".

This course will introduce the learner to applied machine learning, focusing more on the techniques and methods than on the statistics behind these methods. The course will start with a discussion of how machine learning is different than descriptive statistics, and introduce the scikit learn toolkit. The issue of dimensionality of data will be discussed, and the task of clustering data, as well as evaluating those clusters, will be tackled. Supervised approaches for creating predictive models will be described, and learners will be able to apply the scikit learn predictive modelling methods while understanding process issues related to data generalizability (e.g. cross validation, overfitting). The course will end with a look at more advanced techniques, such as building ensembles, and practical limitations of predictive models. By the end of this course, students will be able to identify the difference between a supervised (classification) and unsupervised (clustering) technique, identify which technique they need to apply for a particular dataset and need, engineer features to meet that need, and write python code to carry out an analysis. This course should be taken after Introduction to Data Science in Python and Applied Plotting, Charting & Data Representation in Python and before Applied Text Mining in Python and Applied Social Analysis in Python.

6. The "Advanced Machine Learning Specialization", jointly launched by Russia's Higher School of Economics and Yandex

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of 7 courses you will be able to apply modern machine learning methods in enterprise and understand the caveats of real-world data and settings.

Bayesian methods are used in lots of fields: from game development to drug discovery. They give superpowers to many machine learning algorithms: handling missing data, extracting much more information from small datasets. Bayesian methods also allow us to estimate uncertainty in predictions, which is a really desirable feature for fields like medicine. When Bayesian methods are applied to deep learning, it turns out that they allow you to compress your models 100-fold and automatically tune hyperparameters, saving your time and money. In six weeks we will discuss the basics of Bayesian methods: from how to define a probabilistic model to how to make predictions from it. We will see how one can fully automate this workflow and how to speed it up using some advanced techniques. We will also see applications of Bayesian methods to deep learning and how to generate new images with it. We will see how new drugs that cure severe diseases can be found with Bayesian methods.

7. Johns Hopkins University's Practical Machine Learning

One of the most common tasks performed by data scientists and data analysts is prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.

This course focuses on one of the most important tools in your data analysis arsenal: regression analysis. Using either SAS or Python, you will begin with linear regression and then learn how to adapt when two variables do not present a clear linear relationship. You will examine multiple predictors of your outcome and be able to identify confounding variables, which can tell a more compelling story about your results. You will learn the assumptions underlying regression analysis, how to interpret regression coefficients, and how to use regression diagnostic plots and other tools to evaluate the quality of your regression model. Throughout the course, you will share with others the regression models you have developed and the stories they tell you.

Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course will provide an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.

10. UC San Diego's Machine Learning With Big Data

Want to make sense of the volumes of data you have collected? Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to scale those models up to big data problems. At the end of the course, you will be able to:

• Design an approach to leverage data using the steps in the machine learning process.
• Apply machine learning techniques to explore and prepare data for modeling.
• Identify the type of machine learning problem in order to apply the appropriate set of techniques.
• Construct models that learn from data using widely available open source tools.
• Analyze big data problems using scalable machine learning algorithms on Spark.

11. Big Data Applications: Machine Learning at Scale, from Russian search giant Yandex

Machine learning is transforming the world around us. To become successful, you’d better know what kinds of problems can be solved with machine learning, and how they can be solved. Don’t know where to start? The answer is one button away. During this course you will:

- Identify practical problems which can be solved with machine learning
- Build, tune and apply linear models with Spark MLLib
- Understand methods of text processing
- Fit decision trees and boost them with ensemble learning
- Construct your own recommender system

As a practical assignment, you will:

- build and apply linear models for classification and regression tasks;
- learn how to work with texts;
- automatically construct decision trees and improve their performance with ensemble learning;
- finally, you will build your own recommender system!

With these skills, you will be able to tackle many practical machine learning tasks. We provide the tools, you choose the place of application to make this world of machines more intelligent.

# Stanford Deep Learning and NLP, Lecture 4: Word Window Classification and Neural Networks

1. [UFLDL tutorial]
2. [Learning Representations by Backpropagating Errors]
3. Lecture 4 slides [slides]
4. Lecture 4 video [video]

# PRML Reading Group, Chapter 7: Sparse Kernel Machines


(Sina Weibo: @豆角茄子麻酱凉面)

# PRML Reading Group, Chapter 4: Linear Models for Classification


planktonli(1027753147) 19:52:28

1) Classification by Fisher's criterion, and its relationship to least-squares classification (Fisher classification is a special case of least-squares classification)
2) Classification with probabilistic generative models
3) Classification with probabilistic discriminative models
4) The Laplace approximation for the fully Bayesian treatment

1) Fully Bayesian
2) Empirical Bayes
3) MAP Bayesian

MAP ("poor man's Bayesian") involves no marginalization; it is merely a point estimate that maximizes the posterior probability, so MAP in this sense belongs to the family of point estimates. Fully Bayesian inference, by contrast, can be viewed as a weighted average over the entire set of parameters for a test sample. The "Bayesian" treatment discussed in PRML mainly refers to empirical Bayes.
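A tiny numerical illustration of the distinction (my own sketch, not from the reading notes, using a Beta-Bernoulli coin model): MAP collapses the posterior to its mode, while the fully Bayesian prediction marginalizes the parameter out.

```python
# Beta(2, 2) prior on a coin's heads probability theta;
# observe 3 heads in 4 flips -> posterior is Beta(2+3, 2+1) = Beta(5, 3).
a, b = 5, 3

# MAP ("poor man's Bayesian"): a single point estimate, the posterior mode.
theta_map = (a - 1) / (a + b - 2)   # (5-1)/(5+3-2) = 2/3

# Fully Bayesian prediction for the next flip: marginalize theta out,
# i.e. average over the whole posterior (the posterior mean, for a Bernoulli).
p_heads_full = a / (a + b)          # 5/8 = 0.625

print(theta_map, p_heads_full)
```

Note that the two answers differ: the point estimate ignores the spread of the posterior, which is exactly the information that marginalization keeps.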

# Stanford Machine Learning, Lecture 8: Neural Networks: Representation

1) Non-linear hypotheses

2) Neurons and the brain

3) Model representation I

4) Model representation II

5) Examples and intuitions I

6) Examples and intuitions II

7) Multi-class classification

1) Non-linear hypotheses

2) Neurons and the brain

• Originated in algorithms that try to make machines mimic the brain;
• Very popular in the 1980s and early 1990s, then faded in the late 1990s;
• Recently resurgent, thanks to increased computer hardware capability: for many applications, neural networks are once again the "hot" technique;

3) Model representation I

$a^{(j)}_i$ = activation of unit i in layer j

$\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer j to layer j+1
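As a concrete sketch of these definitions (my own illustration with made-up weights, not from the lecture), one forward-propagation step computes the layer-2 activations $a^{(2)} = g(\Theta^{(1)} a^{(1)})$:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

# Theta^(1): made-up weights mapping layer 1 (bias + 2 inputs) to layer 2 (3 units).
# Shape (3, 3): one row per layer-2 unit, one column per layer-1 value.
Theta1 = np.array([[ 0.1, 0.4, -0.3],
                   [-0.2, 0.6,  0.5],
                   [ 0.3, -0.1, 0.2]])

# a^(1): the input layer, with the bias term 1 prepended to the features (x1, x2).
a1 = np.array([1.0, 0.5, -1.5])

# a^(2)_i = g( row i of Theta^(1) dot a^(1) ): activations of layer 2.
a2 = sigmoid(Theta1 @ a1)
print(a2)   # three values, each strictly between 0 and 1
```

Stacking more such steps (with a bias unit appended at each layer) gives the full network; the weight shapes are an assumption here, chosen only to make the example self-contained.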

4) Model representation II

5) Examples and intuitions I

6) Examples and intuitions II

7) Multi-class classification

http://en.wikipedia.org/wiki/Neural_network

http://en.wikipedia.org/wiki/Artificial_neural_network

## Getting Started with Neural Network Programming

### A Serialized Introduction to Neural Networks

http://library.thinkquest.org/29483/neural_index.shtml

http://home.agh.edu.pl/~vlsi/AI/xor_t/en/main.htm

http://en.wikipedia.org/wiki/NOR_logic

http://en.wikipedia.org/wiki/Logic_gate

# Coursera Notes: Stanford Machine Learning, Lecture 6: Logistic Regression

1) Classification

2) Hypothesis Representation

3) Decision boundary

4) Cost function

5) Simplified cost function and gradient descent

7) Multi-class classification: One-vs-all

1) Classification

1. Email: spam / not spam?
2. Online transactions: fraudulent (yes/no)?
3. Tumor: malignant / benign?

• If $h_\theta(x) \geq 0.5$, predict y = 1, i.e. y is a positive example;
• If $h_\theta(x) < 0.5$, predict y = 0, i.e. y is a negative example;

2) Hypothesis Representation

The sigmoid function has a nice "S" shape, as shown in the figure below (taken from Wikipedia):

An intuitive interpretation of the hypothesis output:

$h_\theta(x)$ = the estimated probability that y = 1 for a given input x

3) Decision boundary

When $h_\theta(x) < 0.5$, y = 0;

$g(z) \geq 0.5$ when $z \geq 0$;

Suppose $\theta_0, \theta_1, \theta_2$ take the values -3, 1, and 1, respectively.
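With those values the hypothesis is $h_\theta(x) = g(-3 + x_1 + x_2)$, so the decision boundary is the line $x_1 + x_2 = 3$: predict y = 1 exactly when $x_1 + x_2 \geq 3$. A quick check (my own sketch, not from the notes):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x1, x2, theta=(-3.0, 1.0, 1.0)):
    # h_theta(x) = g(theta0 + theta1*x1 + theta2*x2);
    # predict 1 iff h >= 0.5, i.e. iff theta0 + theta1*x1 + theta2*x2 >= 0.
    h = sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)
    return 1 if h >= 0.5 else 0

print(predict(1, 1))   # x1 + x2 = 2 < 3  -> predict 0
print(predict(2, 2))   # x1 + x2 = 4 >= 3 -> predict 1
```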

4) Cost function

The hypothesis can be written as:

$h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}$

Cost Function:

(1) 0-1 loss function:

(3) Absolute loss function

(4) Logarithmic loss function, or log-likelihood loss function
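It is the logarithmic loss that logistic regression uses. A minimal sketch (my own illustration, with made-up probabilities) of computing it for a single example:

```python
import math

def log_loss(y, h):
    # Logarithmic (log-likelihood) loss for one example with label y in {0, 1}
    # and predicted probability h = h_theta(x):
    #   -log(h)     if y = 1
    #   -log(1 - h) if y = 0
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

print(log_loss(1, 0.9))   # confident and correct -> small loss
print(log_loss(1, 0.1))   # confident and wrong   -> large loss
```

Confidently wrong predictions are punished much more heavily than hesitant ones, which is what makes this loss well suited to probability outputs.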

5) Simplified cost function and gradient descent

The overall cost function can then be written as the single expression

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]$

and fitting the parameters means solving

$\min_\theta J(\theta)$

For a new input $x$, the fitted model outputs $h_\theta(x)$, the estimated probability that y = 1.

To minimize $J(\theta)$, gradient descent repeatedly updates each parameter $\theta_j$ (simultaneously for all j):

$\theta_j := \theta_j - \alpha\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}_j$

This update rule looks identical to the one for linear regression; the difference is that $h_\theta(x)$ is now the sigmoid hypothesis.

6) Advanced optimization

Besides gradient descent, more advanced algorithms can be used to minimize $J(\theta)$:

• Quasi-Newton methods
• BFGS method
• L-BFGS (Limited-memory BFGS)

These methods typically converge faster than gradient descent and do not require manually choosing the learning rate $\alpha$, at the cost of being considerably more complex.

Readers interested in the details of Quasi-Newton methods and L-BFGS can consult two classic references:
1) Numerical Methods for Unconstrained Optimization and Nonlinear Equations (J.E. Dennis Jr. and Robert B. Schnabel)
2) Numerical Optimization (Jorge Nocedal and Stephen J. Wright)

7) Multi-class classification: One-vs-all

Some examples of multi-class problems:

Email foldering/tagging: work, friends, family, hobby

Medical diagrams: not ill, cold, flu

Weather: sunny, cloudy, rain, snow

One-vs-all (one-vs-rest):

Train a logistic regression classifier $h^{(i)}_\theta(x)$ for each class i to estimate the probability that y = i; for a new input x, predict the class i that maximizes $h^{(i)}_\theta(x)$.
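A minimal one-vs-all sketch (my own illustration, with toy data, not from the notes): train one binary logistic regression classifier per class with plain batch gradient descent, then predict the class whose classifier is most confident.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, lr=0.5, iters=2000):
    # Batch gradient descent on the logistic-regression cost J(theta).
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

def one_vs_all(X, y, num_classes):
    # One classifier h^(i) per class i: relabel class i as 1, everything else as 0.
    return [train_binary(X, (y == i).astype(float)) for i in range(num_classes)]

def predict(thetas, x):
    # Pick the class i whose h^(i)(x) is largest.
    return int(np.argmax([sigmoid(x @ th) for th in thetas]))

# Toy data: column 0 is the bias term; three linearly separable classes.
X = np.array([[1., 0., 0.], [1., 1., 1.],   # class 0, near the origin
              [1., 4., 0.], [1., 5., 1.],   # class 1, large x1
              [1., 0., 4.], [1., 1., 5.]])  # class 2, large x2
y = np.array([0, 0, 1, 1, 2, 2])

thetas = one_vs_all(X, y, 3)
print([predict(thetas, x) for x in X])   # expected: [0, 0, 1, 1, 2, 2]
```

The learning rate and iteration count are assumptions tuned to this toy dataset; in practice one of the advanced optimizers mentioned above (e.g. L-BFGS) would replace the hand-rolled gradient descent loop.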

http://en.wikipedia.org/wiki/Sigmoid_function

http://en.wikipedia.org/wiki/Logistic_function

http://en.wikipedia.org/wiki/Loss_function