.TH CUNEIFORM 1 "2010-09-14" "1.0.0" "multi-language OCR system"
.SH NAME
cuneiform \- multi-language OCR system
.SH SYNOPSIS
\fBcuneiform\fR [\-\-dotmatrix] [\-\-fax] [\-\-singlecolumn] [\-f \fIformat\fR] [\-l \fIlanguage\fR] [\-o \fIoutput\fR] \fIinput\fR
.SH DESCRIPTION
Cuneiform is an optical character recognition (OCR) system. In addition to recognizing text, it analyzes page layout and recognizes text formatting. Cuneiform supports several languages; see the \fB\-l\fR option for the list.
.SH OPTIONS
.IP "\fB\-\-dotmatrix\fR" 4
Use recognition mode optimized for text printed with a dot matrix printer.
.IP "\fB\-\-fax\fR" 4
Use recognition mode optimized for text that has been faxed.
.IP "\fB\-\-singlecolumn\fR" 4
Disable page layout analysis and assume that the image consists of a single column of text.
.IP "\fB\-f\fR \fIformat\fR" 4
Select output format. The following formats are available:
\fBhtml\fR (HTML format),
\fBhocr\fR (hOCR HTML format),
\fBnative\fR (native Cuneiform 2000),
\fBrtf\fR (RTF format),
\fBsmarttext\fR (plain text with TeX paragraphs),
\fBtext\fR (plain text).
The default is plain text.
.IP "\fB\-l\fR \fIlanguage\fR" 4
By default, Cuneiform recognizes English text. To change the language, use the command line switch \fB\-l\fR followed by a language code (typically an ISO 639-2 three-letter code). The following languages are supported:
.TS
tab(;);
ll.
\fBbul\fR;Bulgarian
\fBcze\fR;Czech
\fBdan\fR;Danish
\fBdut\fR;Dutch
\fBeng\fR;English
\fBest\fR;Estonian
\fBfra\fR;French
\fBger\fR;German
\fBhrv\fR;Croatian
\fBhun\fR;Hungarian
\fBita\fR;Italian
\fBlav\fR;Latvian
\fBlit\fR;Lithuanian
\fBpol\fR;Polish
\fBpor\fR;Portuguese
\fBrum\fR;Romanian
\fBrus\fR;Russian
\fBruseng\fR;mixed Russian/English
\fBslv\fR;Slovenian
\fBspa\fR;Spanish
\fBsrp\fR;Serbian
\fBswe\fR;Swedish
\fBtur\fR;Turkish
\fBukr\fR;Ukrainian
.TE
.
.IP "\fB\-o\fR \fIoutput\fR" 4
Write the result to the file \fIoutput\fR. If you do not define an output file with the \fB\-o\fR switch, Cuneiform writes the result to a file named \[oq]cuneiform-out.\fIformat\fR\[cq], where the file extension depends on the selected output format.
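.SH EXAMPLES
The following invocations are illustrative sketches that combine only the options described above. The file names \fIpage.png\fR, \fIfax.png\fR and \fIresult.html\fR are hypothetical placeholders, not files shipped with Cuneiform.
.PP
Recognize German text and write the result as HTML to a chosen file:
.PP
.RS 4
.nf
cuneiform \-l ger \-f html \-o result.html page.png
.fi
.RE
.PP
Recognize a faxed page that contains a single column of English text, without \fB\-o\fR, so that the result goes to the default output file:
.PP
.RS 4
.nf
cuneiform \-\-fax \-\-singlecolumn fax.png
.fi
.RE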
.SH INPUT FORMAT
Cuneiform can process any single-page image that GraphicsMagick knows how to open. Please consult the \fBgm\fR(1) manual page for the comprehensive list of supported image formats.
.SH HOMEPAGE
More information about cuneiform can be found at <\fIhttp://launchpad.net/cuneiform-linux/\fR>.
.SH AUTHOR
cuneiform was written by Cognitive Technologies and Jussi Pakkanen <\fIjpakkane@gmail.com\fR>.
.PP
This manual page was written by Daniel Baumann <\fIdaniel@debian.org\fR>, for the Debian project (but may be used by others).