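The Young Are to Be Reckoned With: A Newcomer to the Field Seems to Split the Difference in the "Myths" (《迷思》) Debate, but Actually Sees It Clearly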

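Still, the "newcomer to the field" (early stage researcher) should not let my praise go to his head. The Chinese word for know-how, mendao, is made of two parts: men, the door, and dao, the path. Seeing the door clearly is not the same as seeing the path clearly. Seeing the door takes only intelligence and insight. Seeing the path takes patience, experience, time, the tempering that comes from losing battle after battle and fighting on after every loss, and a measure of luck besides. As the saying goes, three feet of ice does not form in a single cold day.
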
On Thu, Dec 29, 2011 G wrote:

>> As you describe yourself as an early stage researcher, I'd recommend a recent dialog on a related topic:
>> http://blog.sciencenet.cn/home.php?mod=space&uid=362400&do=blog&id=523458
>> He has a point as an experienced practitioner.

>> I quote him here because overall he is negative about what you are going to work on [Note: this refers to word segmentation research]. And I agree with him that it's time to shift focus to parsing.
2011/12/29 G
Continuation of the dialog, but with an "early stage researcher". FYI, as I actually recommended your blog posts to him in place of my PhD thesis 🙂

On Dec 29, 2011, M wrote:
Hi Dr. G,

I just read Liwei's posts and your comments. I partly agree with Liwei's arguments. I think it's just a different perspective on one of the core problems in NLP: disambiguation.

Usually, beginners take the pipeline architecture for granted, i.e. segmentation --> POS tagging --> chunking --> parsing, etc. However, given that the ultimate goal is to predict the overall syntactic structure of sentences, the early stages of disambiguation can be considered as pruning of the exponential number of possible parse trees. In this sense, Liwei is correct. As ambiguity is the enemy, it's the system designer's choice to decide what architecture to use and/or when to resolve it.

I guess that recently many other people in NLP have also realized (and may even widely agree on) the disadvantages of pipeline architectures, which explains why there have been many "joint learning of X and Y" papers in the past 5 years. In Chinese word segmentation, there are also attempts at doing word segmentation and parsing in one go, which seems promising to me.

On the other hand, I think your comments are quite to the point. Current applications mostly use very shallow NLP information, so accurate tokenizers, POS taggers, and chunkers have value of their own.

As for the interaction between linguistic theory and computational linguistics, I think it's quite similar to the relationship between other pairs of science and engineering. Basically, science decides the upper bound of engineering. But given the level of scientific achievement, engineering by itself has a huge space of possibilities. Moreover, in this specific case of our interest, CL itself may serve as a tool to advance linguistic theory, as the corpus-based study of linguistics seems to be an inevitable trend.

From: Wei Li
Date: Fri, Dec 30, 2011

He is indeed a very promising young researcher who is willing to think and air his own opinions.

I did not realize that my series gives the impression that I am against the pipeline architecture. In fact I am all for it, as this is the proven, solid architecture for modular engineering development. Of course, for someone who has read only my three recent posts, it is not surprising that he got that impression. There is something deeper than that: a balance between the pipeline structure and the principle of keeping ambiguity untouched. Making that relationship clear is not very easy, but there is a way of doing it based on experience with "adaptive development" (another important principle).
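
As an illustration of that balance, here is a minimal Python sketch of one possible reading of it. It is not the system discussed in this exchange: the lexicon, the input string, the function names, and the toy bigram preference are all hypothetical. The segmentation stage can either commit early (greedy longest match) or hand every candidate to a later stage that has more context and resolves the ambiguity there.

```python
from typing import Callable, List, Set

# Hypothetical toy lexicon, chosen only to make the ambiguity visible.
LEXICON: Set[str] = {"he", "is", "now", "here", "no", "where", "nowhere"}


def all_segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Keep ambiguity untouched: enumerate every split into lexicon words."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in all_segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


def greedy_longest_match(text: str, lexicon: Set[str]) -> List[str]:
    """Early commitment: at each position take the longest lexicon word."""
    words: List[str] = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            if text[pos:end] in lexicon:
                words.append(text[pos:end])
                pos = end
                break
        else:
            raise ValueError(f"no lexicon word starts at position {pos}")
    return words


def resolve_late(candidates: List[List[str]],
                 score: Callable[[List[str]], int]) -> List[str]:
    """A later stage picks among the candidates using its own evidence."""
    return max(candidates, key=score)


# Hypothetical downstream evidence: this stage happens to prefer "now here".
PREFERRED_BIGRAMS = {("now", "here")}


def bigram_score(words: List[str]) -> int:
    """Count adjacent word pairs that the downstream stage prefers."""
    return sum(1 for pair in zip(words, words[1:]) if pair in PREFERRED_BIGRAMS)


if __name__ == "__main__":
    text = "heisnowhere"
    print(greedy_longest_match(text, LEXICON))
    candidates = all_segmentations(text, LEXICON)
    print(candidates)
    print(resolve_late(candidates, bigram_score))
```

Both variants are still pipelines with a separate segmentation module. The difference is only in what the module passes downstream: a single committed answer, which a later stage cannot undo, or a set of candidates that is pruned once better evidence is available.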

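[Related posts]
A veteran friend in the field sharply criticizes Liwei's "Myths" (《迷思》) series for upsetting the order of NLP; Liwei stands his ground

The Young Are to Be Reckoned With: A Newcomer to the Field Seems to Split the Difference in the "Myths" (《迷思》) Debate, but Actually Sees It Clearly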

Author: liwei999
