<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: A Detailed Guide to SRILM, the Language Model Training Toolkit</title>
	<atom:link href="http://www.52nlp.cn/language-model-training-tools-srilm-details/feed" rel="self" type="application/rss+xml" />
	<link>http://www.52nlp.cn/language-model-training-tools-srilm-details</link>
	<description>I Love Natural Language Processing</description>
	<lastBuildDate>Sun, 05 Feb 2012 11:54:59 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: 52nlp</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-3260</link>
		<dc:creator>52nlp</dc:creator>
		<pubDate>Mon, 05 Dec 2011 15:30:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-3260</guid>
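		<description>-ppl textfile takes the text file itself as its argument (not a list of files).</description>
		<content:encoded><![CDATA[<p>-ppl textfile takes the text file itself as its argument (not a list of files).</p>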
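<p>Because -ppl reads a single text file, one possible workaround for a test set spread over several files is to merge them into one file with one sentence per line and score that. Below is a minimal Python sketch, assuming each input file is already tokenized with one sentence per line; the helper name merge_test_files and the file names are hypothetical.</p>

```python
from pathlib import Path


def merge_test_files(paths, out_path):
    """Concatenate several test files into one file for `ngram -ppl`.

    Hypothetical helper: assumes every input already holds one
    tokenized sentence per line, the format -ppl expects.
    Returns the number of sentences written.
    """
    n_lines = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            for line in Path(path).read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line:  # skip blank lines so each output line is one sentence
                    out.write(line + "\n")
                    n_lines += 1
    return n_lines


if __name__ == "__main__":
    import tempfile

    tmp = Path(tempfile.mkdtemp())
    (tmp / "part1.txt").write_text("the cat sat\n\na dog ran\n", encoding="utf-8")
    (tmp / "part2.txt").write_text("birds sing\n", encoding="utf-8")
    n = merge_test_files([tmp / "part1.txt", tmp / "part2.txt"], tmp / "test.txt")
    print(n)  # prints 3
```

<p>The merged file can then be scored in one run, e.g. ngram -lm your.lm -ppl test.txt -debug 1 (your.lm is a placeholder); with -debug 1, statistics are printed for each individual sentence.</p>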
]]></content:encoded>
	</item>
	<item>
		<title>By: conanlancc</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-3240</link>
		<dc:creator>conanlancc</dc:creator>
		<pubDate>Mon, 05 Dec 2011 03:52:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-3240</guid>
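		<description>I'd like to ask: when using ngram -ppl to compute the probability of each sentence, can -ppl process a whole test set in batch? Can its argument be a list of the test-set files?</description>
		<content:encoded><![CDATA[<p>I'd like to ask: when using ngram -ppl to compute the probability of each sentence, can -ppl process a whole test set in batch? Can its argument be a list of the test-set files?</p>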
]]></content:encoded>
	</item>
	<item>
		<title>By: 52nlp</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-3109</link>
		<dc:creator>52nlp</dc:creator>
		<pubDate>Sat, 26 Nov 2011 02:20:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-3109</guid>
		<description>SRILM supports this out of the box: run ngram with the -ppl option. For details, see: http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html

-ppl textfile
    Compute sentence scores (log probabilities) and perplexities from the sentences in textfile, which should contain one sentence per line. The -debug option controls the level of detail printed, even though output is to stdout (not stderr).

    -debug 0
        Only summary statistics for the entire corpus are printed, as well as partial statistics for each input portion delimited by escaped lines (see -escape). These statistics include the number of sentences, words, out-of-vocabulary words and zero-probability tokens in the input, as well as its total log probability and perplexity. Perplexity is given with two different normalizations: counting all input tokens (“ppl”) and excluding end-of-sentence tags (“ppl1”). 
    -debug 1
        Statistics for individual sentences are printed. 
    -debug 2
        Probabilities for each word, plus LM-dependent details about backoff used etc., are printed. 
    -debug 3
        Probabilities for all words are summed in each context, and the sum is printed. If this differs significantly from 1, a warning message to stderr will be issued.</description>
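		<content:encoded><![CDATA[<p>SRILM supports this out of the box: run ngram with the -ppl option. For details, see: <a href="http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html" rel="nofollow">http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html</a></p>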
<p>-ppl textfile<br />
    Compute sentence scores (log probabilities) and perplexities from the sentences in textfile, which should contain one sentence per line. The -debug option controls the level of detail printed, even though output is to stdout (not stderr).</p>
<p>    -debug 0<br />
        Only summary statistics for the entire corpus are printed, as well as partial statistics for each input portion delimited by escaped lines (see -escape). These statistics include the number of sentences, words, out-of-vocabulary words and zero-probability tokens in the input, as well as its total log probability and perplexity. Perplexity is given with two different normalizations: counting all input tokens (“ppl”) and excluding end-of-sentence tags (“ppl1”).<br />
    -debug 1<br />
        Statistics for individual sentences are printed.<br />
    -debug 2<br />
        Probabilities for each word, plus LM-dependent details about backoff used etc., are printed.<br />
    -debug 3<br />
        Probabilities for all words are summed in each context, and the sum is printed. If this differs significantly from 1, a warning message to stderr will be issued.</p>
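<p>To make the two normalizations concrete, here is a small Python sketch that recomputes ppl and ppl1 from the summary totals. It assumes logprob is the base-10 total log probability, that OOV and zero-probability tokens are left out of the token count, and that ppl adds one end-of-sentence tag per sentence while ppl1 omits them. The totals in the example are made up for illustration.</p>

```python
def srilm_perplexities(logprob, sentences, words, oovs, zeroprobs):
    """Return (ppl, ppl1) from the totals printed by `ngram -ppl`.

    Assumption: logprob is the base-10 total log probability, and
    OOV and zero-probability tokens are excluded from the count.
    ppl counts one end-of-sentence tag per sentence; ppl1 does not.
    """
    scored = words - oovs - zeroprobs
    ppl = 10 ** (-logprob / (scored + sentences))
    ppl1 = 10 ** (-logprob / scored)
    return ppl, ppl1


# Made-up totals, for illustration only.
ppl, ppl1 = srilm_perplexities(logprob=-12.0, sentences=2, words=10, oovs=0, zeroprobs=0)
print(round(ppl, 2), round(ppl1, 2))  # prints 10.0 15.85
```

<p>Because ppl divides the same (non-positive) log probability by a larger count, ppl1 is never smaller than ppl for the same input.</p>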
]]></content:encoded>
	</item>
	<item>
		<title>By: Kelp</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-3097</link>
		<dc:creator>Kelp</dc:creator>
		<pubDate>Fri, 25 Nov 2011 14:54:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-3097</guid>
		<description>Is there a tool that can use an already trained LM to compute the probability of a sentence?

Input:   S = (w1,w2,w3...wn)
Output: P(S) = P(w1)P(w2|w1)...P(wn|wn-1)

Thanks.</description>
		<content:encoded><![CDATA[<p>Is there a tool that can use an already trained LM to compute the probability of a sentence?</p>
<p>Input:   S = (w1,w2,w3&#8230;wn)<br />
Output: P(S) = P(w1)P(w2|w1)&#8230;P(wn|wn-1)</p>
<p>Thanks.</p>
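<p>The formula above is the bigram chain rule. In log space the product becomes a sum, which is the form LM toolkits report (SRILM uses base-10 logs). Below is a toy Python sketch of just that formula; the probability tables are invented, and unlike SRILM it neither adds sentence-boundary tags nor backs off for unseen bigrams.</p>

```python
import math


def sentence_log10_prob(words, unigram, bigram):
    """log10 P(S) = log10 P(w1) + sum over i of log10 P(w_i | w_{i-1}).

    Toy sketch: `unigram` maps w -> P(w) and `bigram` maps
    (prev, w) -> P(w | prev). There is no smoothing or backoff,
    so an unseen word or bigram raises KeyError.
    """
    total = math.log10(unigram[words[0]])
    for prev, cur in zip(words, words[1:]):
        total += math.log10(bigram[(prev, cur)])
    return total


# Invented probabilities, for illustration only.
unigram = {"I": 0.1}
bigram = {("I", "love"): 0.2, ("love", "NLP"): 0.05}
lp = sentence_log10_prob(["I", "love", "NLP"], unigram, bigram)
print(round(lp, 6))  # prints -3.0, i.e. P(S) = 0.1 * 0.2 * 0.05 = 0.001
```

<p>Summing logs rather than multiplying probabilities avoids floating-point underflow on long sentences.</p>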
]]></content:encoded>
	</item>
	<item>
		<title>By: 52nlp</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-1556</link>
		<dc:creator>52nlp</dc:creator>
		<pubDate>Mon, 08 Nov 2010 16:30:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-1556</guid>
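		<description>For that, you might start by getting a clear picture of how the SRILM source code is organized. I suggest carefully reading the "SRILM Reading Notes" (《srilm阅读文档》) series:
http://blog.chinaunix.net/u1/58264/index.html

Also, since you want to add your own language model to SRILM, you could contact the original author, Andreas Stolcke at SRI, directly. I think he is the better person to ask!</description>
		<content:encoded><![CDATA[<p>For that, you might start by getting a clear picture of how the SRILM source code is organized. I suggest carefully reading the “SRILM Reading Notes” (《srilm阅读文档》) series:<br />
<a href="http://blog.chinaunix.net/u1/58264/index.html" rel="nofollow">http://blog.chinaunix.net/u1/58264/index.html</a></p>
<p>Also, since you want to add your own language model to SRILM, you could contact the original author, Andreas Stolcke at SRI, directly. I think he is the better person to ask!</p>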
]]></content:encoded>
	</item>
	<item>
		<title>By: jeffery</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-1554</link>
		<dc:creator>jeffery</dc:creator>
		<pubDate>Mon, 08 Nov 2010 09:41:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-1554</guid>
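		<description>Hello. Reading your articles, I can tell you know the SRILM toolkit very well. My PhD research is also on language modeling, and we would like to add our own language model to this toolkit. If you have time, I would be glad to discuss it with you. Could you reply to me by email? (junfeiguo@hotmail.com)</description>
		<content:encoded><![CDATA[<p>Hello. Reading your articles, I can tell you know the SRILM toolkit very well. My PhD research is also on language modeling, and we would like to add our own language model to this toolkit. If you have time, I would be glad to discuss it with you. Could you reply to me by email? (junfeiguo@hotmail.com)</p>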
]]></content:encoded>
	</item>
	<item>
		<title>By: 52nlp</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-1519</link>
		<dc:creator>52nlp</dc:creator>
		<pubDate>Tue, 26 Oct 2010 00:41:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-1519</guid>
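		<description>Yes, each line is just plain text with words separated by spaces. To train a Chinese model, run word segmentation on the text first.</description>
		<content:encoded><![CDATA[<p>Yes, each line is just plain text with words separated by spaces. To train a Chinese model, run word segmentation on the text first.</p>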
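<p>A minimal Python sketch of producing that format from sentences that are already tokenized. It assumes Chinese text has been word-segmented beforehand by a separate tool (not shown); the function name to_srilm_lines and the example sentences are hypothetical.</p>

```python
def to_srilm_lines(sentences):
    """Format tokenized sentences as plain training text.

    Each sentence is a list of tokens (already word-segmented for
    Chinese). Output: one sentence per line, tokens joined by single
    spaces. Tokens must be non-empty and contain no whitespace.
    """
    lines = []
    for tokens in sentences:
        if any(not tok or tok.split() != [tok] for tok in tokens):
            raise ValueError(f"bad token in {tokens!r}")
        if tokens:  # drop empty sentences
            lines.append(" ".join(tokens))
    return "".join(line + "\n" for line in lines)


corpus = [["我", "爱", "自然", "语言", "处理"], ["SRILM", "is", "handy"]]
print(to_srilm_lines(corpus), end="")
# prints:
# 我 爱 自然 语言 处理
# SRILM is handy
```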
]]></content:encoded>
	</item>
	<item>
		<title>By: duckyaya</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-1517</link>
		<dc:creator>duckyaya</dc:creator>
		<pubDate>Sun, 24 Oct 2010 17:44:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-1517</guid>
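		<description>What is the format of the file europarl-v3b.en?

Is it one sentence per line, with words separated by spaces?

Also, if I want to train a Chinese language model, can I just give SRILM training data in the same format? :-)

I haven't used it much before. Thanks a lot for the help!</description>
		<content:encoded><![CDATA[<p>What is the format of the file europarl-v3b.en?</p>
<p>Is it one sentence per line, with words separated by spaces?</p>
<p>Also, if I want to train a Chinese language model, can I just give SRILM training data in the same format? <img src='http://www.52nlp.cn/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>I haven't used it much before. Thanks a lot for the help!</p>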
]]></content:encoded>
	</item>
	<item>
		<title>By: 52nlp</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-1400</link>
		<dc:creator>52nlp</dc:creator>
		<pubDate>Mon, 09 Aug 2010 13:14:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-1400</guid>
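		<description>To change that, you would probably need to modify the source code and recompile.</description>
		<content:encoded><![CDATA[<p>To change that, you would probably need to modify the source code and recompile.</p>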
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin</title>
		<link>http://www.52nlp.cn/language-model-training-tools-srilm-details/comment-page-1#comment-1399</link>
		<dc:creator>Kevin</dc:creator>
		<pubDate>Mon, 09 Aug 2010 05:09:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.52nlp.cn/?p=954#comment-1399</guid>
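		<description>Is there a way to make the LM it trains keep the original value instead of writing -99?</description>
		<content:encoded><![CDATA[<p>Is there a way to make the LM it trains keep the original value instead of writing -99?</p>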
]]></content:encoded>
	</item>
</channel>
</rss>

