.TH CUNEIFORM 1 "2010-09-14" "1.0.0" "multi-language OCR system"

.SH NAME
cuneiform \- multi-language OCR system

.SH SYNOPSIS
\fBcuneiform\fR [\-\-dotmatrix] [\-\-fax] [\-\-singlecolumn] [\-f \fIformat\fR] [\-l \fIlanguage\fR] [\-o \fIoutput\fR] \fIinput\fR

.SH DESCRIPTION
Cuneiform is an OCR system. In addition to recognizing text, it performs page layout analysis and recognizes text formatting. Cuneiform supports several languages.

.SH OPTIONS
.IP "\fB\-\-dotmatrix\fR" 4
Use a recognition mode optimized for text printed with a dot-matrix printer.
.IP "\fB\-\-fax\fR" 4
Use a recognition mode optimized for text that has been faxed.
.IP "\fB\-\-singlecolumn\fR" 4
Disable page layout analysis and assume that the image contains only a single column of text.
.IP "\fB\-f\fR \fIformat\fR" 4
Select the output format. The following formats are available:
\fBhtml\fR (HTML format),
\fBhocr\fR (hOCR HTML format),
\fBnative\fR (native Cuneiform 2000 format),
\fBrtf\fR (RTF format),
\fBsmarttext\fR (plain text with TeX paragraphs),
\fBtext\fR (plain text).
The default is plain text.
.IP "\fB\-l\fR \fIlanguage\fR" 4
By default Cuneiform recognizes English text. To select a different language, use the \fB\-l\fR switch followed by a language code (in most cases an ISO 639-2 three-letter code). The following languages are supported:
.TS
ll.
T{
\fBbul\fR
T}	T{
Bulgarian
T}
T{
\fBcze\fR
T}	T{
Czech
T}
T{
\fBdan\fR
T}	T{
Danish
T}
T{
\fBdut\fR
T}	T{
Dutch
T}
T{
\fBeng\fR
T}	T{
English
T}
T{
\fBest\fR
T}	T{
Estonian
T}
T{
\fBfra\fR
T}	T{
French
T}
T{
\fBger\fR
T}	T{
German
T}
T{
\fBhrv\fR
T}	T{
Croatian
T}
T{
\fBhun\fR
T}	T{
Hungarian
T}
T{
\fBita\fR
T}	T{
Italian
T}
T{
\fBlav\fR
T}	T{
Latvian
T}
T{
\fBlit\fR
T}	T{
Lithuanian
T}
T{
\fBpol\fR
T}	T{
Polish
T}
T{
\fBpor\fR
T}	T{
Portuguese
T}
T{
\fBrum\fR
T}	T{
Romanian
T}
T{
\fBrus\fR
T}	T{
Russian
T}
T{
\fBruseng\fR
T}	T{
mixed Russian/English
T}
T{
\fBslv\fR
T}	T{
Slovenian
T}
T{
\fBspa\fR
T}	T{
Spanish
T}
T{
\fBsrp\fR
T}	T{
Serbian
T}
T{
\fBswe\fR
T}	T{
Swedish
T}
T{
\fBtur\fR
T}	T{
Turkish
T}
T{
\fBukr\fR
T}	T{
Ukrainian
T}
.TE
.
.IP "\fB\-o\fR \fIoutput\fR" 4
Write the result to the file \fIoutput\fR. If no output file is given with the \fB\-o\fR switch, Cuneiform writes the result to a file named \[oq]cuneiform-out.\fIformat\fR\[cq], where the file extension depends on the selected output format.

.SH INPUT FORMAT
Cuneiform can process any single-page image that GraphicsMagick can open. See the \fBgm\fR(1) manual page for the complete list of supported image formats.

.SH HOMEPAGE
More information about Cuneiform can be found at <\fIhttp://launchpad.net/cuneiform-linux/\fR>.

.SH AUTHOR
Cuneiform was written by Cognitive Technologies and Jussi Pakkanen <\fIjpakkane@gmail.com\fR>.
.PP
This manual page was written by Daniel Baumann <\fIdaniel@debian.org\fR>, for the Debian project (but may be used by others).