A sudden sense of urgency: if I don't get into Chinese NLP now, I may miss the opportunity of this era

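A conversation with an old friend in the field: putting real effort into "use"
Ringing in my ears is Vice Chairman Lin's earnest instruction on system development: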
Quote
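Build with the problem in mind, build flexibly and use flexibly, combine building with using, build first what is urgently needed, get immediate results, and put real effort into "use".
This came out of a private exchange with a friend. I am just following the fashion of making up quotations from famous people.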
~~~~~~~~~~~~
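After I published the post 【坚持四项基本原则,开发鲁棒性NLP系统】 ("Uphold the Four Basic Principles, Develop Robust NLP Systems"), a senior old friend in the field said he found it very interesting and suggested that I collect and rework my NLP blog series, with a view to publishing a book: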
Quote
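A good piece of hard-won experience. Somehow it reminds me of this:
Study with the problem in mind, study flexibly and apply flexibly, combine study with application, study first what is urgently needed, get immediate results, and put real effort into "use".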

You made a hidden preamble: a given type of application in a given domain.

A recommendation: expand your blog posts into a series, heading toward a book.

My friend Wu Jun (吴军) did that quite successfully, though of course with a statistics background, so he approached NLP from a math perspective: the 数学之美 (The Beauty of Mathematics) series.

You have very good thoughts and raw material. You just need to put in a bit more time to make your writing more approachable. I am referring to reader comments like "I can't learn from this." and "Reading this is a lot of pressure".

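I know you said: "Sometimes I think it shouldn't be made too readable either. This is all experience gathered over many years, and young people who want to learn it ought to suffer a little. :=)"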

But since you have already put in the effort, why not make it more approachable?

The issue is, even if I am willing to suffer a little, I still don't know where to start suffering if I have never built a real-life NLP system.

For example, lexicalism (词汇主义) by itself is enough for an article. You need to mention its opponents and its history to put it into context. Then you need to give some examples.

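Writing is a matter for the ages; how would I dare turn online scribbles into a book? This is not false modesty. Mainly, publishing a book is too much trouble and cannot keep pace with the times. I replied: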

Wu Jun's series is super popular. When I first read one of his articles on the Google Blackboard, recommended by a friend, I was amazed at how well he structured and carried the content. It is intriguing. (Side note: of course, his article on PageRank is somewhat one-sided. It gives young people the impression that success in the IT business is driven by technology, when in fact technology always comes second. A so-called high-tech company absolutely cannot do without technology, but technology is not the key to the company's success. That much is plain to see.) For me, to be honest, I do not aim that high. I never bothered polishing things in pursuit of perfection, although I did make an effort to link my posts into a series for the convenience of cross-referencing within it. There are missing links that I know I want to write about, but whether I do depends on my mood and available time. I guess I am just not pressed or motivated to do the writing part. Popularizing the technology is only an occasional side effect of the blogging hobby. The way I prove myself is to show that I can build products worth millions, or even hundreds of millions, of dollars.

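What I write online simply follows my mood. I never write on assigned topics, including topics I assign myself. Sometimes, when the interest strikes, I announce what I plan to write next. That counts as a self-assigned topic, a sign that some subject has caught my attention. But two days later I get sidetracked, the interest and the time are gone, and I let it drop.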

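I write about whatever comes along. That is my attitude toward being online. The day job is tiring enough, so I refuse to let the web add to my burden.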

So far I have been fairly straightforward in what I write about. If there is a readability issue, it is mainly due to my lack of time. Young people should be able to benefit from my writings, especially once they start getting their hands dirty building a system.

Your discussion is fun. You can see and appreciate the things hidden behind my work better than other readers can. After all, you have published in THE CL, and you have almost closed off segmentation as a scientific area. Seriously, it is my view that there is not much left to do there after your work on tokenization, both in theory and in practice.

I now feel some urgency about having to do Chinese NLP as soon as possible. Not many people have been through as much as I have (luckily), so I am in a position to potentially build a much more powerful system and make an impact on Chinese NLP, and hopefully on the IT landscape as well. But time passes fast. That is why my focus is now on Chinese processing, day and night. I am also keeping my hands dirty with a couple of European languages, but they are less challenging and less exciting.

This entry was posted in the Natural Language Processing category.

4 comments on this post

  1. J. Ma says:

    Really looking forward to it…

    Also, is this friend of Teacher Liwei's Jin Guo? And is that segmentation-ending paper "Critical Tokenization and its Properties"?

    liwei999 replies:

    yes

  2. Dong Wang says:

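    Actually, a lot of work is being done on Chinese NLP now, at many institutions in China and many abroad as well.
    To take just the U.S. side: any project that involves multiple languages is bound to include Chinese. Resources, methods, and systems have all seen solid progress, and many projects are under way.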

    liwei999 replies:

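    Of course. I would invite this friend to discuss that progress in detail when time permits, especially which systems have been put to practical use and what challenges they have met in practice. These are all very meaningful topics.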
