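The "newcomer to the field" (early stage researcher) should not let my praise go to his head either. The Chinese word for know-how, mendao, joins men (the door) and dao (the path). Being clear about the door is not the same as being clear about the path. Finding the door takes only intelligence and insight. Knowing the path takes patience, experience, time, the tempering of fighting on through defeat after defeat, and luck as well. As the proverb has it, three feet of ice does not come from a single day of cold.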
On Thu, Dec 29, 2011 G wrote:
>> As you call yourself an early stage researcher, I'd recommend a recent dialog on a related topic -
>> He has a point as an experienced practitioner.
>> I quote him here because overall he is negative about what you are going to work on [Note: this refers to word segmentation research]. And I agree with him that it's time to shift focus to parsing.
Continuation of the dialog, but with an "early stage researcher". FYI, as I actually recommended your blog posts to him in place of my PhD thesis 🙂
On Dec 29, 2011, M wrote:
Hi Dr. G,
I just read Liwei's posts and your comments. I partly agree with Liwei's arguments. I think it's just a different perspective on one of the core problems in NLP, disambiguation.
Usually, beginners take the pipeline architecture for granted, i.e. segmentation-->POS tagging-->chunking-->parsing, etc. However, given that the ultimate goal is to predict the overall syntactic structure of sentences, the early stages of disambiguation can be considered as pruning the exponential number of possible parse trees. In this sense, Liwei's correct. As ambiguity is the enemy, it's the system designer's choice to decide what architecture to use and/or when to resolve it.
I guess recently many other people in NLP have also realized (and might even widely agree on) the disadvantages of pipeline architectures, which explains why there have been many "joint learning of X and Y" papers in the past 5 years. In Chinese word segmentation, there are also attempts at doing word segmentation and parsing in one go, which seems promising to me.
On the other hand, I think your comments are quite to the point. Current applications mostly utilize very shallow NLP information. So accurate tokenizers, POS taggers, and chunkers have their own value.
As for the interaction between linguistic theory and computational linguistics, I think it's quite similar to the relationship between other pairs of science and engineering. Basically, science decides the upper bound of engineering. But at any given level of scientific achievement, engineering by itself has a huge space of possibilities. Moreover, in this specific case of our interest, CL itself may serve as a tool to advance linguistic theory, as the corpus-based study of linguistics seems to be an inevitable trend.
From: Wei Li
Date: Fri, Dec 30, 2011
He is indeed a very promising young researcher who is willing to think and air his own opinions.
I did not realize that my series gives the impression that I am against the pipeline architecture. In fact I am all for it, as this is the proven, solid architecture for modular engineering development. Of course, for someone who has read only my three recent posts, it is not surprising that he got that impression. There is something deeper than that: a balance between the pipeline structure and the principle of "keeping ambiguity untouched". Making the relationship clear is not very easy, but there is a way of doing so based on experience with "adaptive development" (another important principle).