A Complete Record of Setting Up a Moses Test Platform on Ubuntu 8.10

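  Our lab got new machines, and I installed the latest Ubuntu 8.10 on mine, so I had to rebuild the Moses test platform from scratch. I used my own earlier write-up, "An Introduction to Moses", as a reference, but found it was not detailed enough. This post therefore records the whole installation as a step-by-step walkthrough; I hope it is useful to others.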

I. Create the main Moses platform directory, mtworkdir, under the home directory of user 52nlp:
  52nlp@52nlp-desktop:~$ mkdir mtworkdir
  52nlp@52nlp-desktop:~$ cd mtworkdir/

II. Install the language modeling toolkit SRILM:
1. Create the srilm directory:
  52nlp@52nlp-desktop:~/mtworkdir$ mkdir srilm
  52nlp@52nlp-desktop:~/mtworkdir$ cd srilm/
2. Download the latest SRILM package (currently srilm-1.5.7.tar.gz):
  52nlp@52nlp-desktop:~/mtworkdir/srilm$ wget 'ftp://ftp.speech.sri.com/pub/people/stolcke/srilm/srilm-1.5.7.tar.gz'
  The output looks like this:
=> `srilm-1.5.7.tar.gz'
Resolving ftp.speech.sri.com... 130.107.33.205
Connecting to ftp.speech.sri.com|130.107.33.205|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /pub/people/stolcke/srilm ... done.
==> SIZE srilm-1.5.7.tar.gz ... done.
==> PASV ... done. ==> RETR srilm-1.5.7.tar.gz ... done.
Length: 48526656 (46M) (unauthoritative)
A progress bar appears; after a while the download completes.
3. Unpack the archive: tar -zxvf srilm-1.5.7.tar.gz
4. First confirm that the tools SRILM depends on are installed:
 A template-capable ANSI-C/C++ compiler, preferably gcc version 3.4.3 or higher.
 GNU make, to control compilation and installation.
 GNU gawk, required for many of the utility scripts.
 GNU gzip to unpack the distribution, and to allow SRILM programs to handle “.Z” and “.gz” compressed datafiles (highly recommended).
 bzip2 to handle “.bz2” compressed files (optional).
 p7zip to handle “7-zip” compressed files (optional).
 The Tcl embeddable scripting language library (only required for some of the test executables).
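 Besides the tools above, you also need csh. Ubuntu 8.10 ships with little software preinstalled; install whatever is missing with apt-get or the Synaptic package manager.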
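 The dependency check in step 4 can be scripted. What follows is a minimal sketch rather than part of the original setup: the function name missing_tools is made up for illustration, and the command names in the usage comment are assumptions about how each tool is invoked on a typical Ubuntu system.

```shell
# Print every command from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper, not something SRILM provides.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (command names are assumptions; adjust them to your system):
#   missing_tools gcc g++ make gawk gzip bzip2 csh tclsh
# An empty result means every listed command was found.
```

 Note that finding tclsh does not prove the Tcl development headers are present, so the Tcl paths in step 5 may still need checking by hand.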
5. Edit the Makefiles:
 Once all the tools above are installed, first edit srilm/Makefile:
   cp Makefile Makefile.bak   (back up the original)
   vi Makefile
 Edit line 7, or add a line below it:
 # SRILM = /home/speech/stolcke/project/srilm/devel   (original)
 SRILM = $(PWD)   (modified)
 Then edit srilm/common/Makefile.machine.i686:
  cd common/
  cp Makefile.machine.i686 Makefile.machine.i686.bak
  vi Makefile.machine.i686
 Change the three lines under "# Use the GNU C compiler" (line 15) to:
  GCC_FLAGS = -mtune=pentium3 -Wreturn-type -Wimplicit
  CC = gcc $(GCC_FLAGS)
  CXX = g++ $(GCC_FLAGS) -DINSTANTIATE_TEMPLATES
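 Note: my new machine has a 64-bit Intel CPU, so I tried the 64-bit build, but it did not go well. The i686 settings shown here also work on a 64-bit machine.
 Change the two lines under "# Tcl support (standard in Linux)" (line 51) to: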
  TCL_INCLUDE = -I/usr/include/tcl8.5
  TCL_LIBRARY = -L/usr/lib/tcl8.5
 Note: I installed Tcl 8.5; if you have a different version, adjust these paths accordingly.
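 The SRILM root edit in step 5 can also be made without opening an editor. This is only a sketch under one assumption: that the Makefile contains the commented-out "# SRILM = ..." line quoted above. The helper name set_srilm_root is hypothetical.

```shell
# Back up a Makefile, then insert "SRILM = <value>" right after the first
# commented-out "# SRILM = ..." line. (Hypothetical helper for illustration.)
set_srilm_root() {
    makefile=$1
    value=$2
    cp "$makefile" "$makefile.bak"
    awk -v value="$value" '
        { print }                                   # copy every line through
        /^# *SRILM *=/ && !done {                   # first commented default
            print "SRILM = " value
            done = 1
        }
    ' "$makefile.bak" > "$makefile"
}

# Usage, run from the srilm directory:
#   set_srilm_root Makefile '$(PWD)'
```

 The single quotes keep the shell from expanding $(PWD), so make receives the same text as in the manual edit.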
6. Return to the srilm directory and build:
  cd ..
  make World
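 If all goes well, SRILM compiles. If something goes wrong, the most likely cause is a missing dependency; go back to step 4 and check.
7. Run the tests in the srilm/test directory:
 A build that finishes without errors is not necessarily a working build; you must verify it with the test suite that SRILM provides.
 First add the directories containing the compiled SRILM tools to PATH:
  export PATH=$PATH:/home/52nlp/mtworkdir/srilm/bin/i686:/home/52nlp/mtworkdir/srilm/bin
 Then change into test and run the tests: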
  cd test
  make all
 The output looks like this:
*** Running test class-ngram-simple ***
0.50user 0.11system 0:00.61elapsed 100%CPU  (0avgtext+0avgdata 0maxresident)k
0inputs+1288outputs (0major+4684minor)pagefaults 0swaps
class-ngram-simple: stdout output IDENTICAL.
class-ngram-simple: stderr output IDENTICAL.
….
 This takes a while. If most results are IDENTICAL and only a few are DIFFERS, SRILM has been built successfully!
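 Counting the results by eye is tedious when the test output is long. The sketch below tallies the two verdicts from the test output; it assumes the result lines contain the words IDENTICAL and DIFFERS as in the excerpt above, and summarize_tests is a name invented here.

```shell
# Read test output on stdin and report how many result lines say
# IDENTICAL and how many say DIFFERS. (Hypothetical helper.)
summarize_tests() {
    awk '
        /IDENTICAL/ { same++ }
        /DIFFERS/   { diff++ }
        END { printf "identical=%d differs=%d\n", same, diff }
    '
}

# Usage, run from srilm/test (test.log is an arbitrary file name):
#   make all 2>&1 | tee test.log | summarize_tests
```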

III. Install the translation model training tools GIZA++ and mkcls

1. Download and unpack GIZA++ in the mtworkdir directory:
  cd /home/52nlp/mtworkdir
  wget http://ling.umd.edu/~redpony/software/giza++.gcc41.tar.gz
  tar -zxvf giza++.gcc41.tar.gz
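 Unpacking produces the GIZA++-v2/ directory.
2. Build GIZA++:
  cd GIZA++-v2
  make
 This step used to go smoothly, but this time it failed: the compiler reported that stream.h could not be found. At first I thought my build environment was incomplete, but everything required was installed. Searching Google turned up no one in China who seemed to have hit this problem; I finally found the answer in an issue on the giza-pp Google Code project (http://code.google.com/p/giza-pp/issues/detail?id=7):
  Cannot compile with gcc 4.3 or greater
 GIZA++ cannot be compiled with gcc/g++ 4.3 or later. This is a recently discovered bug, and the gcc and g++ that Ubuntu 8.10 installs by default are both version 4.3. The issue thread proposes several workarounds; I used the simplest one:
 Install g++-4.1: sudo apt-get install g++-4.1
 Edit the Makefile in GIZA++-v2: vi Makefile
 Replace line 5, CXX=g++,
 with: CXX=g++-4.1
 OK, now run make again: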
  make
  make snt2cooc.out
 Everything went smoothly!
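 Whether this workaround is needed depends on the installed compiler version. The following sketch decides that from a version string; the function name gxx_too_new is invented here, and the 4.3 threshold comes from the issue report quoted above.

```shell
# Succeed (exit status 0) if the given g++ version string is 4.3 or later,
# i.e. a version the issue report says cannot build GIZA++.
# (Hypothetical helper for illustration.)
gxx_too_new() {
    major=${1%%.*}
    case $1 in
        *.*) rest=${1#*.}; minor=${rest%%.*} ;;   # "4.3.2" -> minor 3
        *)   minor=0 ;;                           # bare major such as "9"
    esac
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 3 ]; }
}

# Usage:
#   if gxx_too_new "$(g++ -dumpversion)"; then
#       make CXX=g++-4.1
#   else
#       make
#   fi
```

 Passing CXX=g++-4.1 on the make command line should have the same effect as editing line 5, assuming the Makefile uses $(CXX) throughout; editing the file as described above remains the method that was actually tested here.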
3. Download, unpack, and build mkcls:
  cd ..   (back to the mtworkdir directory)
  wget http://ling.umd.edu/~redpony/software/mkcls.gcc41.tar.gz
  tar -zxvf mkcls.gcc41.tar.gz
  cd mkcls-v2
  make
 This step usually causes no trouble.
4. Create a bin directory and copy the GIZA++ and mkcls tools into it:
  cd ..
  mkdir -p bin
  cp GIZA++-v2/GIZA++ bin/
  cp GIZA++-v2/snt2cooc.out bin/
  cp mkcls-v2/mkcls bin/
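 A quick check that the three copied tools are in place and executable can save trouble later, when the training scripts look for them. This is a small sketch; missing_bins is a hypothetical helper name.

```shell
# Print each named file that is absent or not executable in the directory.
# (Hypothetical helper for illustration.)
missing_bins() {
    dir=$1
    shift
    for name in "$@"; do
        [ -x "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Usage:
#   missing_bins /home/52nlp/mtworkdir/bin GIZA++ snt2cooc.out mkcls
# No output means all three tools are present.
```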

IV. Install the Moses decoder and related scripts
1. Create the directory and check out Moses via svn:
  mkdir -p moses
  svn co https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk moses
 On Ubuntu 8.10 you need to install svn yourself.
2. After the checkout completes, build:
  cd moses
  ./regenerate-makefiles.sh
  ./configure --with-srilm=/home/52nlp/mtworkdir/srilm
  make -j 4
  cd ..
 Note: the srilm path must be absolute.
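 Since a wrong SRILM path is an easy mistake to make, it can be checked before running configure. The sketch below only verifies that the path is absolute and that it contains include and lib subdirectories; that layout is an assumption about a finished SRILM build, and check_srilm_root is a made-up name.

```shell
# Succeed if the argument is an absolute path containing include/ and lib/.
# (Hypothetical helper; the expected layout is an assumption.)
check_srilm_root() {
    case $1 in
        /*) ;;
        *)  echo "not an absolute path: $1" >&2
            return 1 ;;
    esac
    for sub in include lib; do
        if [ ! -d "$1/$sub" ]; then
            echo "missing directory: $1/$sub" >&2
            return 1
        fi
    done
}

# Usage:
#   check_srilm_root /home/52nlp/mtworkdir/srilm &&
#       ./configure --with-srilm=/home/52nlp/mtworkdir/srilm
```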
3. Install the Moses training scripts
 Create the training scripts directory:
  mkdir -p bin/moses-scripts
 Edit the Makefile:
  vi moses/scripts/Makefile
 Change lines 13 and 14 to:
  TARGETDIR=/home/52nlp/mtworkdir/bin/moses-scripts
  BINDIR=/home/52nlp/mtworkdir/bin
 Build:
  cd moses/scripts/
  make release
  cd ../..
 Before using the scripts, set this environment variable:
  export SCRIPTS_ROOTDIR=/home/52nlp/mtworkdir/bin/moses-scripts/scripts-20090113-1019
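 The directory name scripts-20090113-1019 appears to carry a build timestamp, so it will differ from one installation to the next. Rather than typing it by hand, the newest such directory can be looked up. This is a sketch that assumes the name always follows the scripts-YYYYMMDD-HHMM pattern seen above; latest_scripts_dir is a hypothetical helper.

```shell
# Print the last scripts-* directory under the given base directory.
# Shell globs expand in sorted order, so with timestamped names the last
# match is the most recent release. (Hypothetical helper.)
latest_scripts_dir() {
    base=$1
    latest=
    for d in "$base"/scripts-*/; do
        [ -d "$d" ] && latest=${d%/}
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage:
#   export SCRIPTS_ROOTDIR=$(latest_scripts_dir /home/52nlp/mtworkdir/bin/moses-scripts)
```

 The function returns a non-zero status and prints nothing when no matching directory exists, so an empty SCRIPTS_ROOTDIR signals that make release has not been run yet.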
4. Install the additional Moses scripts and evaluation tools
 Download and unpack scripts.tgz:
  wget http://www.statmt.org/wmt07/scripts.tgz
  tar -zxvf scripts.tgz
 These scripts include:
  Tokenizer scripts/tokenizer.perl
  Lowercaser scripts/lowercase.perl
  SGML-Wrapper scripts/wrap-xml.perl
 Download the NIST/BLEU evaluation tool:
  wget ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v11b.pl

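  That completes the installation. The biggest problem this time was that GIZA++ cannot be compiled with gcc/g++ 4.3 or later. Many people are bound to run into this, so I hope recording it here helps.

Note: this is an original article; if you repost it, please credit the source, "我爱自然语言处理" (I Love Natural Language Processing): www.52nlp.cn

Permalink to this article: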
http://www.52nlp.cn/ubuntu-moses-platform-build-process-record/

155 comments on "A Complete Record of Setting Up a Moses Test Platform on Ubuntu 8.10"

  1. 52nlp says:

    It needs to be compiled.

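    Ran replied:

    I followed the instructions on the official site and built with ./bjam, which differs from your steps. I only get moses_chart and moses_test; there is no moses anywhere. Could you help me figure out what is wrong?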

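    52nlp replied:

    I also built with bjam a while ago, and the bin directory did contain moses. I suggest checking the warnings emitted during the build, and also whether your Boost libraries are complete, since that can affect the build of moses itself.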

  2. zhangxin says:

    make all-recursive
    make[1]: Entering directory `/home/zhangxin/tools/mtworkdir/moses'
    Making all in moses/src
    make[2]: Entering directory `/home/zhangxin/tools/mtworkdir/moses/moses/src'
    make all-am
    make[3]: Entering directory `/home/zhangxin/tools/mtworkdir/moses/moses/src'
    /bin/bash ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -std=c++0x -DTRACE_ENABLE=1 -I/home/zhangxin/tools/mtworkdir/srilm/include -g -O2 -MT FloydWarshall.lo -MD -MP -MF .deps/FloydWarshall.Tpo -c -o FloydWarshall.lo FloydWarshall.cpp
    /bin/bash ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -std=c++0x -DTRACE_ENABLE=1 -I/home/zhangxin/tools/mtworkdir/srilm/include -g -O2 -MT Parameter.lo -MD -MP -MF .deps/Parameter.Tpo -c -o Parameter.lo Parameter.cpp
    /bin/bash ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -std=c++0x -DTRACE_ENABLE=1 -I/home/zhangxin/tools/mtworkdir/srilm/include -g -O2 -MT PartialTranslOptColl.lo -MD -MP -MF .deps/PartialTranslOptColl.Tpo -c -o PartialTranslOptColl.lo PartialTranslOptColl.cpp
    /bin/bash ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -std=c++0x -DTRACE_ENABLE=1 -I/home/zhangxin/tools/mtworkdir/srilm/include -g -O2 -MT Phrase.lo -MD -MP -MF .deps/Phrase.Tpo -c -o Phrase.lo Phrase.cpp
    libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -std=c++0x -DTRACE_ENABLE=1 -I/home/zhangxin/tools/mtworkdir/srilm/include -g -O2 -MT FloydWarshall.lo -MD -MP -MF .deps/FloydWarshall.Tpo -c FloydWarshall.cpp -o FloydWarshall.o
    libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -std=c++0x -DTRACE_ENABLE=1 -I/home/zhangxin/tools/mtworkdir/srilm/include -g -O2 -MT Phrase.lo -MD -MP -MF .deps/Phrase.Tpo -c Phrase.cpp -o Phrase.o
    libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -std=c++0x -DTRACE_ENABLE=1 -I/home/zhangxin/tools/mtworkdir/srilm/include -g -O2 -MT Parameter.lo -MD -MP -MF .deps/Parameter.Tpo -c Parameter.cpp -o Parameter.o
    libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -std=c++0x -DTRACE_ENABLE=1 -I/home/zhangxin/tools/mtworkdir/srilm/include -g -O2 -MT PartialTranslOptColl.lo -MD -MP -MF .deps/PartialTranslOptColl.Tpo -c PartialTranslOptColl.cpp -o PartialTranslOptColl.o
    FloydWarshall.cpp: In function ‘void floyd_warshall(const std::vector<std::vector >&, std::vector<std::vector >&)’:
    FloydWarshall.cpp:16:3: error: ‘size_t’ was not declared in this scope
    FloydWarshall.cpp:16:3: note: suggested alternatives:
    /usr/include/c++/4.6/i686-linux-gnu/./bits/c++config.h:155:26: note: ‘std::size_t’
    /usr/include/c++/4.6/i686-linux-gnu/./bits/c++config.h:155:26: note: ‘std::size_t’
    FloydWarshall.cpp:16:10: error: expected ‘;’ before ‘num_edges’
    FloydWarshall.cpp:18:15: error: expected ‘;’ before ‘i’
    FloydWarshall.cpp:18:20: error: ‘i’ was not declared in this scope
    FloydWarshall.cpp:18:22: error: ‘num_edges’ was not declared in this scope
    FloydWarshall.cpp:19:17: error: expected ‘;’ before ‘j’
    FloydWarshall.cpp:19:22: error: ‘j’ was not declared in this scope
    FloydWarshall.cpp:28:15: error: expected ‘;’ before ‘k’
    FloydWarshall.cpp:28:20: error: ‘k’ was not declared in this scope
    FloydWarshall.cpp:28:22: error: ‘num_edges’ was not declared in this scope
    FloydWarshall.cpp:29:17: error: expected ‘;’ before ‘i’
    FloydWarshall.cpp:29:22: error: ‘i’ was not declared in this scope
    FloydWarshall.cpp:30:19: error: expected ‘;’ before ‘j’
    FloydWarshall.cpp:30:24: error: ‘j’ was not declared in this scope
    make[3]: *** [FloydWarshall.lo] Error 1
    make[3]: *** Waiting for unfinished jobs....
    In file included from PhraseDictionaryMemory.h:26:0,
    from Hypothesis.h:33,
    from SentenceStats.h:30,
    from StaticData.h:42,
    from Phrase.cpp:30:
    PhraseDictionary.h:142:37: warning: ‘auto_ptr’ is deprecated (declared at /usr/include/c++/4.6/backward/auto_ptr.h:87) [-Wdeprecated-declarations]
    Phrase.cpp:222:6: warning: unused parameter ‘factorDelimiter’ [-Wunused-parameter]
    In file included from PhraseDictionaryMemory.h:26:0,
    from Hypothesis.h:33,
    from TranslationOption.h:31,
    from PartialTranslOptColl.h:27,
    from PartialTranslOptColl.cpp:22:
    PhraseDictionary.h:142:37: warning: ‘auto_ptr’ is deprecated (declared at /usr/include/c++/4.6/backward/auto_ptr.h:87) [-Wdeprecated-declarations]
    mv -f .deps/Phrase.Tpo .deps/Phrase.Plo
    mv -f .deps/PartialTranslOptColl.Tpo .deps/PartialTranslOptColl.Plo
    mv -f .deps/Parameter.Tpo .deps/Parameter.Plo
    make[3]: Leaving directory `/home/zhangxin/tools/mtworkdir/moses/moses/src'
    make[2]: *** [all] Error 2
    make[2]: Leaving directory `/home/zhangxin/tools/mtworkdir/moses/moses/src'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/home/zhangxin/tools/mtworkdir/moses'
    make: *** [all] Error 2
    This is what I get when I run make -j 4. How can I fix it? This is really frustrating.

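  3. 52nlp says:

    FloydWarshall.cpp:16:3: error: ‘size_t’ was not declared in this scope

    This is probably related to your gcc/g++ version. I have not touched srilm in a long time, though, so you will have to search Google for this one yourself. Sorry.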
