标签归档:机器学习

为了夸夸聊天机器人,爬了一份夸夸语料库

上周为了娱乐,写了一篇《一行Python代码实现夸夸聊天机器人》,虽然只有几十条人工整理的通用夸夸语料,但是貌似也能应付一些简单需求。不过这篇文章在微博、AINLP微信公众号、知乎专栏推送后,还是有很多同学强烈建议丰富语料库。这个建议其实是很不错的,所以周末认真调研了一番,决定从豆瓣上的夸夸小组入手,这里面有很多现成的语料,至于混进微信、QQ夸夸群,收集语料,我觉得不太现实。

豆瓣上有很多夸夸小组,貌似最大的莫过于“相互表扬小组”,最近因为这股夸夸风,据说这个小组已经开始限制加入新人了,我针对这个小组写了一个小爬虫,爬了一份夸夸语料,总计2万6千多个帖子,采集了标题、内容和回复的相关信息,保存为json格式,1个帖子1条,大概是这样的:

{"title": "因为没有男朋友,求夸", "url": "https://www.douban.com/group/topic/135844056/", "author": "71277500", "last_reply_time": "03-17 16:40", "content": "笨人原本一个人好好的,都单了两三年了,一直觉得挺开心的。最近不知道抽了什么风,突然特别想找个男朋友。但是但是,偏偏找不到靠谱的男朋友!现在一个人睡不着,没想明白这事,求夸。\n", "replies_num": "14", "replies": [{"content": "你这么可爱肯定会有一个很好很好的人在等你!", "post_id": "135844056", "comment_id": "1834208628", "user_id": "189783421", "pub_time": "2019-03-16 01:08:38"}, {"content": "最好的肯定要晚点出现哦", "post_id": "135844056", "comment_id": "1834208775", "user_id": "189783421", "pub_time": "2019-03-16 01:08:52"}, {"content": "“笨人”,刚看到开头就笑了", "post_id": "135844056", "comment_id": "1834282396", "user_id": "192799520", "pub_time": "2019-03-16 07:50:50"}, {"content": "一个好可耐的宝宝", "post_id": "135844056", "comment_id": "1834282931", "user_id": "192799520", "pub_time": "2019-03-16 07:52:24"}, {"content": "也许明天就出现了", "post_id": "135844056", "comment_id": "1834290527", "user_id": "185989534", "pub_time": "2019-03-16 08:11:38"}, {"content": "你知道有一个适合你的那个在等你吧", "post_id": "135844056", "comment_id": "1834308924", "user_id": "192597621", "pub_time": "2019-03-16 08:46:23"}, {"content": "如果没有男朋友,肯定是你太优秀", "post_id": "135844056", "comment_id": "1834313229", "user_id": "171520899", "pub_time": "2019-03-16 08:53:19"}, {"content": "没有男朋友多好,省钱", "post_id": "135844056", "comment_id": "1834320533", "user_id": "130379006", "pub_time": "2019-03-16 09:03:42"}, {"content": "哈哈,谢谢好可爱的你呀!", "post_id": "135844056", "comment_id": "1835717925", "user_id": "71277500", "pub_time": "2019-03-17 16:16:58"}, {"content": "有道理", "post_id": "135844056", "comment_id": "1835718260", "user_id": "71277500", "pub_time": "2019-03-17 16:17:22"}, {"content": "也许吧,哈哈哈", "post_id": "135844056", "comment_id": "1835718395", "user_id": "71277500", "pub_time": "2019-03-17 16:17:32"}, {"content": "原本想写本人,一不小心错别字,看样子还是很符合的", "post_id": "135844056", "comment_id": "1835719069", "user_id": "71277500", "pub_time": "2019-03-17 16:18:17"}, {"content": "没有,只是单纯地觉得很可爱,很符合你写一段话的文风😄ཽ……退一步讲,古人讲究谦辞,称呼自己要自谦,本人要说鄙人,你用“笨人”活泼可爱,也能称得上是一种自谦,还是你自创的,有趣", "post_id": "135844056", "comment_id": "1835734308", "user_id": "192799520", "pub_time": "2019-03-17 16:35:21"}, {"content": "哈哈,有道理,我懂了", "post_id": "135844056", "comment_id": "1835738373", "user_id": "71277500", "pub_time": "2019-03-17 16:40:00"}]}

写到这里,估计还是会有同学准备留言索要数据了,因为即使上次区区几十条语料,随便google一下就可以得到的“夸夸语料”都有同学留言索取,所以这里准备多说几句,关于夸夸聊天机器人,关于夸夸语料库。

上个周,在看到清华刘知远老师的评论后,我是用娱乐的心态写了上周的那篇文章:《一行Python代码实现夸夸聊天机器人》,没想到,反响还不错,甚至有一些同学提了很好的建议。所以当周末认真思考这件事的可行性时,突然觉得,夸夸聊天机器人是一个绝好的机器学习实践项目:仅从一个idea出发,怎样做一个不错的夸夸聊天机器人?

作为自然语言处理四大难题之一的自动问答,个人觉得目前还远远不够“智能”,虽然市面上有很多聊天机器人,但是观察来看,以娱乐的心态来对话是可以的,或者完成一些简单的任务是没有问题的,例如询问天气,但是如果抱着很高的期望,很多轮对话下来,基本可以认为这个聊天机器人“不靠谱”, “答非所问”,甚至是个“智障”。虽然通用领域的智能问答或者聊天机器人还有很长的路要走,但是如果把这个问题限定在垂直领域或者很小的需求范围,那么问题可能就有解了,例如夸夸聊天机器人,需求就很简单:做啥都夸。简单的就是随便夸,复杂一点或者个性化的就是夸某个点、某件事、某个人,前者吗,就是上次《一行Python代码实现夸夸聊天机器人》做得事情,准备一些通用夸奖的语料,然后随机夸;后者,需要准备一些夸夸规则和夸夸语料库。

开个玩笑,二十一世纪什么最贵?当然是数据了,确切的说,是面向特定任务的特定数据。现在不缺机器学习框架,不缺算法,不缺机器,甚至不缺“人”,缺什么,就缺数据。这段时间,因为夸夸群的兴起,很多人看到了商机,说不定哪一天你的老板把你找来,直接给扔给你一个任务:做一个夸夸聊天机器人?怎么办,当然要调研啦。花了大半天时间,你了解了聊天机器人的前世今生,发现了人工智能标记语言AIML,知道了Chatbot的种种玩法,基于规则的、基于机器学习模型的、基于知识图谱的等等等等,甚至还有很多智能问答开源框架可以直接套用,最后,当你兴高采烈的准备动手实践的时候,你突然发现,还没有数据,你需要数据,需要夸夸语料库。
继续阅读

Start your future on Coursera today.

那些值得推荐和收藏的线性代数学习资源

关于线性代数的重要性,很多做机器学习的同学可能会感同身受,这里引用“牛人林达华推荐有关机器学习的数学书籍”这篇文章中关于线性代数的一段话:

线性代数 (Linear Algebra):

我想国内的大学生都会学过这门课程,但是,未必每一位老师都能贯彻它的精要。这门学科对于Learning是必备的基础,对它的透彻掌握是必不可少的。我在科大一年级的时候就学习了这门课,后来到了香港后,又重新把线性代数读了一遍,所读的是

Introduction to Linear Algebra (3rd Ed.) by Gilbert Strang.

这本书是MIT的线性代数课使用的教材,也是被很多其它大学选用的经典教材。它的难度适中,讲解清晰,重要的是对许多核心的概念讨论得比较透彻。我个人觉得,学习线性代数,最重要的不是去熟练矩阵运算和解方程的方法——这些在实际工作中MATLAB可以代劳,关键的是要深入理解几个基础而又重要的概念:子空间(Subspace),正交(Orthogonality),特征值和特征向量(Eigenvalues and eigenvectors),和线性变换(Linear transform)。从我的角度看来,一本线代教科书的质量,就在于它能否给这些根本概念以足够的重视,能否把它们的联系讲清楚。Strang的这本书在这方面是做得很好的。

而且,这本书有个得天独厚的优势。书的作者长期在MIT讲授线性代数课(18.06),课程的video在MIT的Open courseware网站上有提供。有时间的朋友可以一边看着名师授课的录像,一边对照课本学习或者复习。

https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/
(注:这里我修正了一下链接,原文链接已经没有了)

那么这里首推的线性代数学习资源就是 Gilbert Strang 教授的这门线性代数课程了,除了上面链接中官方主页的英文原版外,国内网易公开课也早已引进并有同步翻译。

1. 麻省理工公开课:线性代数

http://open.163.com/special/opencourse/daishu.html

课程介绍:

“线性代数”,同微积分一样,是高等数学中两大入门课程之一,不仅是一门非常好的数学课程,也是一门非常好的工具学科,在很多领域都有广泛的用途。它的研 究对象是向量,向量空间(或称线性空间),线性变换和有限维的线性方程组。本课程讲述了矩阵理论及线性代数的基本知识,侧重于那些与其他学科相关的内容, 包括方程组、向量空间、行列式、特征值、相似矩阵及正定矩阵。

课程主讲人:Gilbert Strang 教授

吉尔伯特-斯特朗:1934年11月27日出生,是美国享有盛誉的数学家,在有限元理论、变分法、小波分析及线性代数方面均有所建树。他对教育的贡献尤为 卓著,包括所著有的七部经典数学教材及一部专著。斯特朗自1962年至今担任麻省理工学院教授,其所授课程《线性代数导论》、《计算科学与工程》均在 MIT开放课程软件(MIT OpenCourseWare)中收录,获得广泛好评。

我大概在2013年学习过这门课程,也花了很多时间找这门课程的书籍资源,最终锁定了这本书的第四版英文版电子版:Introduction to Linear Algebra_4ED_Strang ,感兴趣的同学可以关注我们的公众号AINLP,后台回复"xiandai"获取下载链接。

2. 3Blue1Brown: Essence of linear algebra(线性代数的本质)

如果说上面 Gilbert Strang 教授的线性代数课程和书籍都是大部头,那么鼎鼎大名的3Blue1Brown出品的这个线性代数的本质系列视频就是开胃菜,总共14个小视频,视频控制在9-18分钟之间,很适合短时间快速温习。不过这套视频的评价也很高,以下是来自《3Blue1Brown:“线性代数的本质”完整笔记》的点评:

我最早系统地学习线性代数是在大二时候,当时特意选修了学校物理系开设的4学分的线代,大概也就是比我们自己专业的线代多了一章向量空间的内容,其实最后上完发现,整个课程内容还是偏向于计算,对线性代数的几何直觉少有提起,对线性代数的实际运用更是鲜有涉及。同济的那本薄薄的如同九阴真经一般的教材,把线性代数讲的云里雾里,当时一个人在自习教室度过多少不眠之夜,一点一点去思考其概念定理背后的实际意义,多半也是边猜边想,苦不堪言。直到多年以后,有幸在网上听到了MIT的Strang老师开设的线代公开课,才对一些基础概念渐渐明朗,虽然至今又过去了很多年,但是对一些本质的理解,依然清晰。
不过,仔细想想,国内的教材写的云里雾里,才促使了我自发的思考,如果一切得来太容易,也许就不会那么刻骨铭心。我很早之前就想过这个问题,国内的教科书作者简直就是在下一盘大棋,自己出版的书写的高深莫测,翻译国外的书又翻译的含糊曲折,那么留给学生的只有两条路,要么去看原版的英语书,要么就是自己一点点看云雾缭绕的国产书,边猜边想边证明,不管走哪条路,都能走向成功。

最近,在youtube上看到了3Blue1Brown的Essence of linear algebra这门课,有种如获至宝的感觉,整个课程的时间并不长,但是对线性代数的讲解却十分到位,有种浓缩版的Gilbert Strang线代课程的感觉。希望通过这个课程,重温一下Linear Algebra。

这个视频,可以在油管上看官方原版:Essence of linear algebra
也可以在B站上观看:线性代数的本质 - 01 - 向量究竟是什么?
https://www.bilibili.com/video/av5987715/

3. Immersive Linear Algebra

用交互式可视化方法学习数学估计是很多同学梦寐以求的,前两天看到这条微博:

《英文版的线性代数电子书:Immersive Linear Algebra》该书是今天 Hacker News 首页头条。号称是全球第一个全交互式图形的线代电子书。

所以在这里收藏一下,有空的同学可以试一下这个在线学习线性代数的网站,不过看似还有最后两个章节没有完成:http://immersivemath.com/ila/index.html

4. Matrix Algebra for Engineers

http://coursegraph.com/coursera-matrix-algebra-engineers

香港科技大学的面向工程师的矩阵代数(Matrix Algebra for Engineers),该课程介绍的全部是关于矩阵的知识,涵盖了工程师应该知道的线性代数相关知识。学习这门课程的前提是高中数学知识,最好完成了单变量微积分课程之后选修该课程效果更佳。

This course is all about matrices, and concisely covers the linear algebra that an engineer should know. We define matrices and how to add and multiply them, and introduce some special types of matrices. We describe the Gaussian elimination algorithm used to solve systems of linear equations and the corresponding LU decomposition of a matrix. We explain the concept of vector spaces and define the main vocabulary of linear algebra. We develop the theory of determinants and use it to solve the eigenvalue problem. After each video, there are problems to solve and I have tried to choose problems that exemplify the main idea of the lecture. I try to give enough problems for students to solidify their understanding of the material, but not so many that students feel overwhelmed and drop out. I do encourage students to attempt the given problems, but if they get stuck, full solutions can be found in the lecture notes for the course. The mathematics in this matrix algebra course is presented at the level of an advanced high school student, but typically students would take this course after completing a university-level single variable calculus course.

这门课程有个lecture-notes可以直接下载:
http://www.math.ust.hk/~machas/matrix-algebra-for-engineers.pdf

5. Mathematics for Machine Learning: Linear Algebra

http://coursegraph.com/coursera-linear-algebra-machine-learning

伦敦帝国理工学院的 面向机器学习的数学-线性代数课程(Mathematics for Machine Learning: Linear Algebra),这个课程属于Mathematics for Machine Learning Specialization 系列,该系列包含3门子课程,涵盖线性代数,多变量微积分,以及主成分分析(PCA),这个专项系列课程的目标是弥补数学与机器学习以及数据科学鸿沟:Mathematics for Machine Learning。Learn about the prerequisite mathematics for applications in data science and machine learning

In this course on Linear Algebra we look at what linear algebra is and how it relates to vectors and matrices. Then we look through what vectors and matrices are and how to work with them, including the knotty problem of eigenvalues and eigenvectors, and how to use these to solve problems. Finally we look at how to use these to do fun things with datasets - like how to rotate images of faces and how to extract eigenvectors to look at how the Pagerank algorithm works. Since we're aiming at data-driven applications, we'll be implementing some of these ideas in code, not just on pencil and paper. Towards the end of the course, you'll write code blocks and encounter Jupyter notebooks in Python, but don't worry, these will be quite short, focussed on the concepts, and will guide you through if you’ve not coded before. At the end of this course you will have an intuitive understanding of vectors and matrices that will help you bridge the gap into linear algebra problems, and how to apply these concepts to machine learning.

6. 可汗学院公开课:线性代数

http://open.163.com/special/Khan/linearalgebra.html

网易公开课引进翻译的可汗学院线性代数公开课,总共143集,每集短小精悍:

在这个课程里面,主讲者介绍了线性代数的很多内容,包括:矩阵,线性方程组,向量及其运算,向量空间,子空间,零空间,变换,秩与维数,正交化,特征值与特征向量,等等。以上这些内容是线性代数的关键内容,它们也被广泛地应用到现代科学当中。

关于线性代数学习资源,还有很多,这里仅仅抛砖引玉,欢迎大家留言提供线索。

最后,提供一个线性代数学习资源的“大礼包”,包括Gilbert Strang 教授线性代数英文教材第四版电子版,香港科技大学的面向工程师的矩阵代数课程notes,以及从其他地方收集的线性代数网盘资源,感兴趣的同学可以关注我们的公众号AINLP,回复"xiandai"获取:

注:本文首发于课程图谱,转载请注明出处“课程图谱博客”:http://blog.coursegraph.com

本文链接地址:那些值得推荐和收藏的线性代数学习资源 http://blog.coursegraph.com/?p=1014

Start your future on Coursera today.

凸优化及无约束最优化相关资料

很多年前,我的师兄 Jian Zhu 在这里发表过一个系列《无约束最优化》,当时我写下了一段话:

估计有些读者看到这个题目的时候会觉得很数学,和自然语言处理没什么关系,不过如果你听说过最大熵模型、条件随机场,并且知道它们在自然语言处理中被广泛应用,甚至你明白其核心的参数训练算法中有一种叫LBFGS,那么本文就是对这类用于解无约束优化算法的Quasi-Newton Method的初步介绍。

事实上,无论机器学习还是机器学习中的深度学习,数值优化算法都是核心之一,而在这方面,斯坦福大学Stephen Boyd教授等所著的《凸优化》堪称经典:Convex Optimization – Boyd and Vandenberghe ,而且该书的英文电子版在该书主页上可以直接免费下载:

http://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf

还附带了长达301页的Slides:

http://web.stanford.edu/~boyd/cvxbook/bv_cvxslides.pdf

以及额外的练习题、相关代码数据文件:

http://web.stanford.edu/~boyd/cvxbook/bv_cvxbook_extra_exercises.pdf
http://web.stanford.edu/~boyd/cvxbook/cvxbook_additional_exercises/

相当贴心,另外Stephen Boyd教授2014年还在斯坦福大学自家的MOOC平台上开过相关课程: CVX101

https://class.stanford.edu/courses/Engineering/CVX101

提示是:A MOOC on convex optimization, CVX101, was run from 1/21/14 to 3/14/14. If you register for it, you can access all the course materials.

不知道现在注册是否还可以访问课程材料,我当年竟然注册过这门课程,所以还能访问相关资料:

这本书也有中文翻译版,由清华大学出版社出版:

http://www.tup.tsinghua.edu.cn/bookscenter/book_03184902.html

最后提供上述相关材料的打包下载,包括凸优化课程视频、英文原版书籍、练习题和Slides,另外也包括《无约束最优化》的PDF文档,感兴趣的同学可以关注我们的公众号AINLP,回复"youhua"下载:

注:原创文章,转载请注明出处及保留链接“我爱自然语言处理”:http://www.52nlp.cn

本文链接地址:凸优化及无约束最优化相关资料 http://www.52nlp.cn/?p=11222

Start your future on Coursera today.

谷歌云平台上基于TensorFlow的高级机器学习专项课程

Coursera近期推了一门新专项课程:谷歌云平台上基于TensorFlow的高级机器学习专项课程(Advanced Machine Learning with TensorFlow on Google Cloud Platform Specialization),看起来很不错。这个系列包含5门子课程,涵盖端到端机器学习、生产环境机器学习系统、图像理解、面向时间序列和自然语言处理的序列模型、推荐系统等内容,感兴趣的同学可以关注:Learn Advanced Machine Learning with Google Cloud. Build production-ready machine learning models with TensorFlow on Google Cloud Platform.

课程链接:http://coursegraph.com/coursera-specializations-advanced-machine-learning-tensorflow-gcp
继续阅读

Start your future on Coursera today.

Andrew Ng 老师新推的通俗人工智能课程以及其他相关资料

Andrew Ng 老师是我的偶像,他在普及机器学习和深度学习的道路上纵情向前,这不他又在 Coursera 上新推了一门通俗人工智能课程:AI For Everyone(全民AI) :

http://coursegraph.com/coursera-ai-for-everyone

这门课程面向大众进行AI科普,将于2019年年初开课,目前已经可以注册课程。AI不仅适用于工程师,这门非技术性人工智能课程将帮助学习者了解机器学习和深度学习等相关技术,以及将AI应用于自己组织中的问题和机会。 通过这门课程,学习者将会了解当前人工智能可以或者不能做的事情。最后,学习者将了解AI如何影响社会以及我们将如何应对这种技术变革。

AI is not only for engineers. This non-technical course will help you understand technologies like machine learning and deep learning and spot opportunities to apply AI to problems in your own organization. You will see examples of what today’s AI can – and cannot – do. Finally, you will understand how AI is impacting society and how to navigate through this technological change.

If you are a non-technical business leader, “AI for Everyone” will help you understand how to build a sustainable AI strategy. If you are a machine learning engineer or data scientist, this is the course to ask your manager, VP or CEO to take if you want them to understand what you can (and cannot!) do.

继续阅读

Start your future on Coursera today.

Coursera专项课程推荐:金融中的机器学习和强化学习

Coursera近期新推了一个金融和机器学习的专项课程系列:Machine Learning and Reinforcement Learning in Finance Specialization(金融中的机器学习和强化学习),看起来很有意思。

课程链接:http://coursegraph.com/coursera-specializations-machine-learning-reinforcement-finance

这个专项课程的主要目标是为金融相关的机器学习核心范式和算法奠定坚实的基础而提供必要的知识和实战技能,特别关注机器学习在金融投资中不同的实际问题中的应用。

该系列旨在帮助学生解决他们在现实生活中可能遇到的实际的机器学习问题,包括:

(1)将问题映射到可用的机器学习方法的泛化场景,

(2)选择最适合解决问题的特定机器学习方法,以及

(3)成功实施解决方案,并评估其性能。

该专业课程面向三类学生设计:

· 在银行,资产管理公司或对冲基金等金融机构工作的从业人员

· 对将机器学习应用于日内交易感兴趣的个人

· 目前正在攻读金融学,统计学,计算机科学,数学,物理学,工程学或其他相关学科的学位的全日制学生,这些学生希望了解机器学习在金融领域的实际应用。
继续阅读

Start your future on Coursera today.

AI Challenger 2018 细粒度用户评论情感分析 fastText Baseline

上一篇《AI Challenger 2018 进行时》文尾我们提到 AI Challenger 官方已经在 GitHub 上提供了多个赛道的 Baseline: AI Challenger 2018 Baseline ,其中文本挖掘相关的3个主赛道均有提供,非常适合用来学习:英中文本机器翻译的 baseline 就直接用了Google官方基于Tensorflow实现的Tensor2Tensor跑神经网络机器翻译Transformer模型,这个思路是我在去年《AI Challenger 2017 奇遇记》里的终极方案,今年已成标配;细粒度用户评论情感分析提供了一个基于支持向量机(SVM)的多分类模型 baseline;观点型问题阅读理解提供一个深度学习模型 baseline , 基于pytorch实现论文《Multiway Attention Networks for Modeling Sentence Pairs》里的思路。

本次 AI Challenger 2018, 除了英中文本机器翻译,另一个我比较关注的赛道是: 细粒度用户评论情感分析。情感分析是自然语言处理里面的一个经典任务,估计很多同学入门NLP的时候都玩过 IMDB Movie Reviews Dataset , 这个可以定义为一个二分类的情感分类问题。不过这次 AI Challenger 的细粒度用户评论情感分析问题,并不是这么简单:
继续阅读

Start your future on Coursera today.

Coursera上数据科学相关课程(公开课)汇总推荐

Coursera上的数据科学课程有很多,这里汇总一批。

1、 Introduction to Data Science Specialization

IBM公司推出的数据科学导论专项课程系列(Introduction to Data Science Specialization),这个系列包括4门子课程,涵盖数据科学简介,面向数据科学的开源工具,数据科学方法论,SQL基础,感兴趣的同学可以关注:Launch your career in Data Science。Data Science skills to prepare for a career or further advanced learning in Data Science.

1) What is Data Science?
2) Open Source tools for Data Science
3) Data Science Methodology
4) Databases and SQL for Data Science

2、Applied Data Science Specialization

IBM公司推出的 应用数据科学专项课程系列(Applied Data Science Specialization),这个系列包括4门子课程,涵盖面向数据科学的Python,Python数据可视化,Python数据分析,数据科学应用毕业项目,感兴趣的同学可以关注:Get hands-on skills for a Career in Data Science。Learn Python, analyze and visualize data. Apply your skills to data science and machine learning.

1) Python for Data Science
2) Data Visualization with Python
3) Data Analysis with Python
4) Applied Data Science Capstone

3、Applied Data Science with Python Specialization

密歇根大学的Python数据科学应用专项课程系列(Applied Data Science with Python),这个系列的目标主要是通过Python编程语言介绍数据科学的相关领域,包括应用统计学,机器学习,信息可视化,文本分析和社交网络分析等知识,并结合一些流行的Python工具包进行讲授,例如pandas, matplotlib, scikit-learn, nltk以及networkx等Python工具。感兴趣的同学可以关注:Gain new insights into your data-Learn to apply data science methods and techniques, and acquire analysis skills.

1) Introduction to Data Science in Python
2) Applied Plotting, Charting & Data Representation in Python
3) Applied Machine Learning in Python
4) Applied Text Mining in Python
5) Applied Social Network Analysis in Python

4、Data Science Specialization

约翰霍普金斯大学的数据科学专项课程系列(Data Science Specialization),这个系列课程有10门子课程,包括数据科学家的工具箱,R语言编程,数据清洗和获取,数据分析初探,可重复研究,统计推断,回归模型,机器学习实践,数据产品开发,数据科学毕业项目,感兴趣的同学可以关注: Launch Your Career in Data Science-A nine-course introduction to data science, developed and taught by leading professors.

1) The Data Scientist’s Toolbox
2) R Programming
3) Getting and Cleaning Data
4) Exploratory Data Analysis
5) Reproducible Research
6) Statistical Inference
7) Regression Models
8) Practical Machine Learning
9) Developing Data Products
10) Data Science Capstone

5、Data Science at Scale Specialization

华盛顿大学的大规模数据科学专项课程系列(Data Science at Scale ),这个系列包括3门子课程和1个毕业项目课程,包括大规模数据系统和算法,数据分析模型与方法,数据科学结果分析等,感兴趣的同学可以关注: Tackle Real Data Challenges-Master computational, statistical, and informational data science in three courses.

1) Data Manipulation at Scale: Systems and Algorithms
2) Practical Predictive Analytics: Models and Methods
3) Communicating Data Science Results
4) Data Science at Scale – Capstone Project

6、Advanced Data Science with IBM Specialization

IBM公司推出的高级数据科学专项课程系列(Advanced Data Science with IBM Specialization),这个系列包括4门子课程,涵盖数据科学基础,高级机器学习和信号处理,结合深度学习的人工智能应用等,感兴趣的同学可以关注:Expert in DataScience, Machine Learning and AI。Become an IBM-approved Expert in Data Science, Machine Learning and Artificial Intelligence.

1) Fundamentals of Scalable Data Science
2) Advanced Machine Learning and Signal Processing
3) Applied AI with DeepLearning
4) Advanced Data Science Capstone

7、Data Mining Specialization

伊利诺伊大学香槟分校的数据挖掘专项课程系列(Data Mining Specialization),这个系列包含5门子课程和1个毕业项目课程,涵盖数据可视化,信息检索,文本挖掘与分析,模式发现和聚类分析等,感兴趣的同学可以关注:Data Mining Specialization-Analyze Text, Discover Patterns, Visualize Data. Solve real-world data mining challenges.

1) Data Visualization
2) Text Retrieval and Search Engines
3) Text Mining and Analytics
4) Pattern Discovery in Data Mining
5) Cluster Analysis in Data Mining
6) Data Mining Project

8、Data Analysis and Interpretation Specialization

数据分析和解读专项课程系列(Data Analysis and Interpretation Specialization),该系列包括5门子课程,分别是数据管理和可视化,数据分析工具,回归模型,机器学习,毕业项目,感兴趣的同学可以关注:Learn Data Science Fundamentals-Drive real world impact with a four-course introduction to data science.

1) Data Management and Visualization
2) Data Analysis Tools
3) Regression Modeling in Practice
4) Machine Learning for Data Analysis
5) Data Analysis and Interpretation Capstone

9、Executive Data Science Specialization

可管理的数据科学专项课程系列(Executive Data Science Specialization),这个系列包含4门子课程和1门毕业项目课程,涵盖数据科学速成,数据科学小组建设,数据分析管理,现实生活中的数据科学等,感兴趣的同学可以关注:Be The Leader Your Data Team Needs-Learn to lead a data science team that generates first-rate analyses in four courses.

1)A Crash Course in Data Science
2)Building a Data Science Team
3)Managing Data Analysis
4)Data Science in Real Life
5)Executive Data Science Capstone

10、其他相关的数据科学课程

1) Data Science Math Skills
2) Data Science Ethics
3) How to Win a Data Science Competition: Learn from Top Kagglers

注:本文首发“课程图谱博客”:http://blog.coursegraph.com

同步发布到这里, 本本文链接地址:http://blog.coursegraph.com/coursera上数据科学相关课程数据科学公开课汇总推荐 http://blog.coursegraph.com/?p=851

Start your future on Coursera today.

Coursera上数学类相关课程(公开课)汇总推荐

数学课程是基础,Coursera上有很多数学公开课,这里做个汇总,注意由于Coursera上有一批很有特色的统计学相关的数学课程,我们将在下一期里单独汇总。

1 斯坦福大学 Introduction to Mathematical Thinking(数学思维导论)

http://coursegraph.com/coursera-mathematical-thinking

引用老版课程一个同学的评价,供参考:

这门课是高中数学到大学数学的一个过度。高中数学一般重计算不太注重证明,这门课讲了基本的逻辑,数学语言(两个 quantifier,there exists, for all)和证明的几个基本方法,比如证明充要条件要从两个方向证、证伪只需要举个反例,原命题不好证的时候可以证等价的逆否命题以及很常用的数学归纳法。课程讲了数论里一些基本定理,然后通过让你证一些看起来显然而不需要证明的证明题来训练你证明的技能和逻辑思考的能力,看起来显然的命题也是要证明才能说服人的,课程最后简略的讲了下数学分析里面实数的引入,但这部分讲的不完整。Keith Devlin 是个 old school 的讲师,上课只用纸和笔,也是属于比较热情的讲师,他每周都会录几个答疑的视频。这门比较适合大一的新生上,开得也比较频繁。

课程简介:

Learn how to think the way mathematicians do – a powerful cognitive process developed over thousands of years. Mathematical thinking is not the same as doing mathematics – at least not as mathematics is typically presented in our school system. School math typically focuses on learning procedures to solve highly stereotyped problems. Professional mathematicians think a certain way to solve real problems, problems that can arise from the everyday world, or from science, or from within mathematics itself. The key to success in school math is to learn to think inside-the-box. In contrast, a key feature of mathematical thinking is thinking outside-the-box – a valuable ability in today’s world. This course helps to develop that crucial way of thinking.

2 加州大学尔湾分校 初级微积分系列课程

1)Pre-Calculus: Functions(初级微积分:函数)

http://coursegraph.com/coursera-pre-calculus

This course covers mathematical topics in college algebra, with an emphasis on functions. The course is designed to help prepare students to enroll for a first semester course in single variable calculus. Upon completing this course, you will be able to: 1. Solve linear and quadratic equations 2. Solve some classes of rational and radical equations 3. Graph polynomial, rational, piece-wise, exponential and logarithmic functions 4. Find integer roots of polynomial equations 5. Solve exponential and logarithm equations 6. Understand the inverse relations between exponential and logarithm equations 7. Compute values of exponential and logarithm expressions using basic properties

2)Pre-Calculus: Trigonometry(初级微积分:三角)

http://coursegraph.com/coursera-trigonometry

This course covers mathematical topics in trigonometry. Trigonometry is the study of triangle angles and lengths, but trigonometric functions have far reaching applications beyond simple studies of triangles. This course is designed to help prepare students to enroll for a first semester course in single variable calculus. Upon completing this course, you will be able to: 1. Evaluate trigonometric functions using the unit circle and right triangle approaches 2. Solve trigonometric equations 3. Verify trigonometric identities 4. Prove and use basic trigonometric identities. 5. Manipulate trigonometric expressions using standard identities 6. Solve right triangles 7. Apply the Law of Sines and the Law of Cosines

3 宾夕法尼亚大学的 单变量微积分系列课程

1)Calculus: Single Variable Part 1 - Functions(单变量微积分1:函数)
http://coursegraph.com/coursera-single-variable-calculus

Calculus is one of the grandest achievements of human thought, explaining everything from planetary orbits to the optimal size of a city to the periodicity of a heartbeat. This brisk course covers the core ideas of single-variable Calculus with emphases on conceptual understanding and applications. The course is ideal for students beginning in the engineering, physical, and social sciences. Distinguishing features of the course include: 1) the introduction and use of Taylor series and approximations from the beginning; 2) a novel synthesis of discrete and continuous forms of Calculus; 3) an emphasis on the conceptual over the computational; and 4) a clear, dynamic, unified approach. In this first part--part one of five--you will extend your understanding of Taylor series, review limits, learn the *why* behind l'Hopital's rule, and, most importantly, learn a new language for describing growth and decay of functions: the BIG O.

2)Calculus: Single Variable Part 2 - Differentiation(单变量微积分2:微分)

http://coursegraph.com/coursera-differentiation-calculus

Calculus is one of the grandest achievements of human thought, explaining everything from planetary orbits to the optimal size of a city to the periodicity of a heartbeat. This brisk course covers the core ideas of single-variable Calculus with emphases on conceptual understanding and applications. The course is ideal for students beginning in the engineering, physical, and social sciences. Distinguishing features of the course include: 1) the introduction and use of Taylor series and approximations from the beginning; 2) a novel synthesis of discrete and continuous forms of Calculus; 3) an emphasis on the conceptual over the computational; and 4) a clear, dynamic, unified approach. In this second part--part two of five--we cover derivatives, differentiation rules, linearization, higher derivatives, optimization, differentials, and differentiation operators.

3)Calculus: Single Variable Part 3 - Integration(单变量微积分3:积分)

http://coursegraph.com/coursera-integration-calculus

Calculus is one of the grandest achievements of human thought, explaining everything from planetary orbits to the optimal size of a city to the periodicity of a heartbeat. This brisk course covers the core ideas of single-variable Calculus with emphases on conceptual understanding and applications. The course is ideal for students beginning in the engineering, physical, and social sciences. Distinguishing features of the course include: 1) the introduction and use of Taylor series and approximations from the beginning; 2) a novel synthesis of discrete and continuous forms of Calculus; 3) an emphasis on the conceptual over the computational; and 4) a clear, dynamic, unified approach. In this third part--part three of five--we cover integrating differential equations, techniques of integration, the fundamental theorem of integral calculus, and difficult integrals.

4) Calculus: Single Variable Part 4 - Applications(单变量微积分4:应用)

http://coursegraph.com/coursera-applications-calculus

Calculus is one of the grandest achievements of human thought, explaining everything from planetary orbits to the optimal size of a city to the periodicity of a heartbeat. This brisk course covers the core ideas of single-variable Calculus with emphases on conceptual understanding and applications. The course is ideal for students beginning in the engineering, physical, and social sciences. Distinguishing features of the course include: 1) the introduction and use of Taylor series and approximations from the beginning; 2) a novel synthesis of discrete and continuous forms of Calculus; 3) an emphasis on the conceptual over the computational; and 4) a clear, dynamic, unified approach. In this fourth part--part four of five--we cover computing areas and volumes, other geometric applications, physical applications, and averages and mass. We also introduce probability.

4 杜克大学 Data Science Math Skills(数据科学中的数学技巧)

http://coursegraph.com/coursera-datasciencemathskills

这门课程主要介绍数据科学中涉及的相关数学概念,让学生了解基本的数学概念,掌握基本的数学语言,内容涵盖集合论、求和的Sigma符号、数学上的笛卡尔(x,y)平面、指数、对数和自然对数函数,概率论以及叶斯定理等:

Data science courses contain math—no avoiding that! This course is designed to teach learners the basic math you will need in order to be successful in almost any data science math course and was created for learners who have basic math skills but may not have taken algebra or pre-calculus. Data Science Math Skills introduces the core math that data science is built upon, with no extra complexity, introducing unfamiliar ideas and math symbols one-at-a-time. Learners who complete this course will master the vocabulary, notation, concepts, and algebra rules that all data scientists must know before moving on to more advanced material.

5 加州大学圣迭戈分校 Introduction to Discrete Mathematics for Computer Science Specialization(面向计算机科学的离散数学专项课程)

http://coursegraph.com/coursera-specializations-discrete-mathematics

面向计算机科学的离散数学专项课程(Introduction to Discrete Mathematics for Computer Science Specialization),这个系列包含5门子课程,涵盖证明、组合数学与概率、图论,数论和密码学,配送问题项目等,感兴趣的同学可以关注: Build a Foundation for Your Career in IT-Master the math powering our lives and prepare for your software engineer or security analyst career

Discrete Math is needed to see mathematical structures in the object you work with, and understand their properties. This ability is important for software engineers, data scientists, security and financial analysts (it is not a coincidence that math puzzles are often used for interviews). We cover the basic notions and results (combinatorics, graphs, probability, number theory) that are universally needed. To deliver techniques and ideas in discrete mathematics to the learner we extensively use interactive puzzles specially created for this specialization. To bring the learners experience closer to IT-applications we incorporate programming examples, problems and projects in our courses.

1) What is a Proof(什么是证明)

http://coursegraph.com/coursera-what-is-a-proof

There is a perceived barrier to mathematics: proofs. In this course we will try to convince you that this barrier is more frightening than prohibitive: most proofs are easy to understand if explained correctly, and often they are even fun. We provide an accompanied excursion in the “proof zoo” showing you examples of techniques of different kind applied to different topics. We use some puzzles as examples, not because they are “practical”, but because discussing them we learn important reasoning and problem solving techniques that are useful. We hope you enjoy playing with the puzzles and inventing/understandings the proofs. As prerequisites we assume only basic math (e.g., we expect you to know what is a square or how to add fractions), basic programming in python (functions, loops, recursion), common sense and curiosity. Our intended audience are all people that work or plan to work in IT, starting from motivated high school students.

2)Combinatorics and Probability(组合和概率)

Counting is one of the basic mathematically related tasks we encounter on a day to day basis. The main question here is the following. If we need to count something, can we do anything better than just counting all objects one by one? Do we need to create a list of all phone numbers to ensure that there are enough phone numbers for everyone? Is there a way to tell that our algorithm will run in a reasonable time before implementing and actually running it? All these questions are addressed by a mathematical field called Combinatorics. In this course we discuss most standard combinatorial settings that can help to answer questions of this type. We will especially concentrate on developing the ability to distinguish these settings in real life and algorithmic problems. This will help the learner to actually implement new knowledge. Apart from that we will discuss recursive technique for counting that is important for algorithmic implementations. One of the main `consumers’ of Combinatorics is Probability Theory. This area is connected with numerous sides of life, on one hand being an important concept in everyday life and on the other hand being an indispensable tool in such modern and important fields as Statistics and Machine Learning. In this course we will concentrate on providing the working knowledge of basics of probability and a good intuition in this area. The practice shows that such an intuition is not easy to develop. In the end of the course we will create a program that successfully plays a tricky and very counterintuitive dice game. As prerequisites we assume only basic math (e.g., we expect you to know what is a square or how to add fractions), basic programming in python (functions, loops, recursion), common sense and curiosity. Our intended audience are all people that work or plan to work in IT, starting from motivated high school students.

3)Introduction to Graph Theory(图论导论)

We invite you to a fascinating journey into Graph Theory — an area which connects the elegance of painting and the rigor of mathematics; is simple, but not unsophisticated. Graph Theory gives us, both an easy way to pictorially represent many major mathematical results, and insights into the deep theories behind them. In this course, among other intriguing applications, we will see how GPS systems find shortest routes, how engineers design integrated circuits, how biologists assemble genomes, why a political map can always be colored using a few colors. We will study Ramsey Theory which proves that in a large system, complete disorder is impossible! By the end of the course, we will implement an algorithm which finds an optimal assignment of students to schools. This algorithm, developed by David Gale and Lloyd S. Shapley, was later recognized by the conferral of Nobel Prize in Economics. As prerequisites we assume only basic math (e.g., we expect you to know what is a square or how to add fractions), basic programming in python (functions, loops, recursion), common sense and curiosity. Our intended audience are all people that work or plan to work in IT, starting from motivated high school students.

4) Number Theory and Cryptography(数论和密码学)

We all learn numbers from the childhood. Some of us like to count, others hate it, but any person uses numbers everyday to buy things, pay for services, estimated time and necessary resources. People have been wondering about numbers’ properties for thousands of years. And for thousands of years it was more or less just a game that was only interesting for pure mathematicians. Famous 20th century mathematician G.H. Hardy once said “The Theory of Numbers has always been regarded as one of the most obviously useless branches of Pure Mathematics”. Just 30 years after his death, an algorithm for encryption of secret messages was developed using achievements of number theory. It was called RSA after the names of its authors, and its implementation is probably the most frequently used computer program in the word nowadays. Without it, nobody would be able to make secure payments over the internet, or even log in securely to e-mail and other personal services. In this short course, we will make the whole journey from the foundation to RSA in 4 weeks. By the end, you will be able to apply the basics of the number theory to encrypt and decrypt messages, and to break the code if one applies RSA carelessly. You will even pass a cryptographic quest! As prerequisites we assume only basic math (e.g., we expect you to know what is a square or how to add fractions), basic programming in python (functions, loops, recursion), common sense and curiosity. Our intended audience are all people that work or plan to work in IT, starting from motivated high school students.

5)Solving Delivery Problem(解决旅行商问题)

http://coursegraph.com/coursera-delivery-problem

We’ll implement together an efficient program for a problem needed by delivery companies all over the world millions times per day — the travelling salesman problem. The goal in this problem is to visit all the given places as quickly as possible. How to find an optimal solution to this problem quickly? We still don’t have provably efficient algorithms for this difficult computational problem and this is the essence of the P versus NP problem, the most important open question in Computer Science. Still, we’ll implement several efficient solutions for real world instances of the travelling salesman problem. While designing these solutions, we will rely heavily on the material learned in the courses of the specialization: proof techniques, combinatorics, probability, graph theory. We’ll see several examples of using discrete mathematics ideas to get more and more efficient solutions.

6 伦敦帝国理工学院 Mathematics for Machine Learning Specialization(面向机器学习的数学专项课程系列)

http://coursegraph.com/coursera-specializations-mathematics-machine-learning

伦敦帝国理工学院的面向机器学习的数学专项课程系列(Mathematics for Machine Learning Specialization),该系列包含3门子课程,涵盖线性代数,多变量微积分,以及主成分分析(PCA),这个专项系列课程的目标是弥补数学与机器学习以及数据科学鸿沟,感兴趣的同学可以关注:Mathematics for Machine Learning。Learn about the prerequisite mathematics for applications in data science and machine learning

For a lot of higher level courses in Machine Learning and Data Science, you find you need to freshen up on the basics in maths - stuff you may have studied before in school or university, but which was taught in another context, or not very intuitively, such that you struggle to relate it to how it’s used in Computer Science. This specialisation aims to bridge that gap, getting you up to speed in the underlying maths, building an intuitive understanding, and relating it to Machine Learning and Data Science. In the first course on Linear Algebra we look at what linear algebra is and how it relates to data. Then we look through what vectors and matrices are and how to work with them. The second course, Multivariate Calculus, builds on this to look at how to optimise fitting functions to get good fits to data. It starts from introductory calculus and then uses the matrices and vectors from the first course to look at data fitting. The third course, Dimensionality Reduction with Principal Components Analysis, uses the maths from the first two courses to do simple optimisation for the situation where you don’t have an understanding of how the data variables relate to each other. At the end of this specialisation you will have gained the prerequisite mathematical knowledge to continue your journey and take more advanced courses in machine learning.

1) Mathematics for Machine Learning: Linear Algebra(面向机器学习的数学:线性代数)

http://coursegraph.com/coursera-linear-algebra-machine-learning

In this course on Linear Algebra we look at what linear algebra is and how it relates to vectors and matrices. Then we look through what vectors and matrices are and how to work with them, including the knotty problem of eigenvalues and eigenvectors, and how to use these to solve problems. Finally we look at how to use these to do fun things with datasets - like how to rotate images of faces and how to extract eigenvectors to look at how the Pagerank algorithm works. Since we're aiming at data-driven applications, we'll be implementing some of these ideas in code, not just on pencil and paper. Towards the end of the course, you'll write code blocks and encounter Jupyter notebooks in Python, but don't worry, these will be quite short, focussed on the concepts, and will guide you through if you’ve not coded before. At the end of this course you will have an intuitive understanding of vectors and matrices that will help you bridge the gap into linear algebra problems, and how to apply these concepts to machine learning.

2)Mathematics for Machine Learning: Multivariate Calculus(面向机器学习的数学:多变量微积分)

http://coursegraph.com/coursera-multivariate-calculus-machine-learning

This course offers a brief introduction to the multivariate calculus required to build many common machine learning techniques. We start at the very beginning with a refresher on the “rise over run” formulation of a slope, before converting this to the formal definition of the gradient of a function. We then start to build up a set of tools for making calculus easier and faster. Next, we learn how to calculate vectors that point up hill on multidimensional surfaces and even put this into action using an interactive game. We take a look at how we can use calculus to build approximations to functions, as well as helping us to quantify how accurate we should expect those approximations to be. We also spend some time talking about where calculus comes up in the training of neural networks, before finally showing you how it is applied in linear regression models. This course is intended to offer an intuitive understanding of calculus, as well as the language necessary to look concepts up yourselves when you get stuck. Hopefully, without going into too much detail, you’ll still come away with the confidence to dive into some more focused machine learning courses in future.

3)Mathematics for Machine Learning: PCA(面向机器学习的数学:主成分分析)

http://coursegraph.com/coursera-pca-machine-learning

This course introduces the mathematical foundations to derive Principal Component Analysis (PCA), a fundamental dimensionality reduction technique. We'll cover some basic statistics of data sets, such as mean values and variances, we'll compute distances and angles between vectors using inner products and derive orthogonal projections of data onto lower-dimensional subspaces. Using all these tools, we'll then derive PCA as a method that minimizes the average squared reconstruction error between data points and their reconstruction. At the end of this course, you'll be familiar with important mathematical concepts and you can implement PCA all by yourself. If you’re struggling, you’ll find a set of jupyter notebooks that will allow you to explore properties of the techniques and walk you through what you need to do to get on track. If you are already an expert, this course may refresh some of your knowledge.

注:本文首发“课程图谱博客”:http://blog.coursegraph.com

同步发布到这里, 本文链接地址:http://blog.coursegraph.com/coursera上数学类相关课程数学公开课汇总推荐 http://blog.coursegraph.com/?p=804

Start your future on Coursera today.

Andrew Ng 深度学习公开课系列第五门课程序列模型开课

Andrew Ng 深度学习课程系列第五门课程序列模型(Sequence Models)在1月的尾巴终于开课 ,在跳票了几次之后,这门和NLP比较相关的深度学习课程终于开课了。这门课程属于Coursera上的深度学习专项系列 ,这个系列有5门课,目前终于完备,感兴趣的同学可以关注:Deep Learning Specialization

This course will teach you how to build models for natural language, audio, and other sequence data. Thanks to deep learning, sequence algorithms are working far better than just two years ago, and this is enabling numerous exciting applications in speech recognition, music synthesis, chatbots, machine translation, natural language understanding, and many others. You will: - Understand how to build and train Recurrent Neural Networks (RNNs), and commonly-used variants such as GRUs and LSTMs. - Be able to apply sequence models to natural language problems, including text synthesis. - Be able to apply sequence models to audio applications, including speech recognition and music synthesis. This is the fifth and final course of the Deep Learning Specialization.

这门课程主要面向自然语言,语音和其他序列数据进行深度学习建模,将会学习递归神经网络,GRU,LSTM等内容,以及如何将其应用到语音识别,机器翻译,自然语言理解等任务中去。个人认为这是目前互联网上最适合入门深度学习的系列系列课程了,Andrew Ng 老师善于讲课,另外用Python代码抽丝剥茧扣作业,课程学起来非常舒服,希望最后这门RNN课程也不负众望。参考我之前写得两篇小结:

Andrew Ng 深度学习课程小记

Andrew Ng (吴恩达) 深度学习课程小结

额外推荐: 深度学习课程资源整理

Start your future on Coursera today.