Pandoc is an open-source program written in Haskell that converts between several markup formats, among them HTML, LaTeX and DocBook.
The program does its job by using a Reader and a Writer for each of the formats. All source code is available from:
[login to view URL]
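To picture the Reader/Writer split described above, here is a toy sketch in Haskell. It is not Pandoc's code: the `Inline` type and the functions `readTiny`, `writeHtml` and `writeLatex` are hypothetical names made up for illustration, and Pandoc's real document types are far richer. The point is only that every reader produces one shared representation and every writer consumes it, so a new reader works with all existing writers.

```haskell
-- Toy model of a reader/writer pipeline. All names are hypothetical and
-- much simpler than Pandoc's real document types.
module Main where

import Data.List (isPrefixOf, isSuffixOf)

-- Shared intermediate representation: readers produce it, writers consume it.
data Inline = Plain String | Strong String
  deriving (Eq, Show)

-- A "reader" for a tiny made-up markup where a word wrapped in ** is bold.
readTiny :: String -> [Inline]
readTiny = map toInline . words
  where
    toInline w
      | length w > 4 && "**" `isPrefixOf` w && "**" `isSuffixOf` w
          = Strong (drop 2 (take (length w - 2) w))
      | otherwise = Plain w

-- Two independent "writers" over the same representation.
-- No escaping of special characters is attempted in this sketch.
writeHtml :: [Inline] -> String
writeHtml = unwords . map render
  where
    render (Plain s)  = s
    render (Strong s) = "<strong>" ++ s ++ "</strong>"

writeLatex :: [Inline] -> String
writeLatex = unwords . map render
  where
    render (Plain s)  = s
    render (Strong s) = "\\textbf{" ++ s ++ "}"

main :: IO ()
main = do
  let doc = readTiny "RTF is **old** but common"
  putStrLn (writeHtml doc)
  putStrLn (writeLatex doc)
```

An RTF Reader would take the place of `readTiny` here: once it yields the shared representation, no writer needs to change.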
I would like a Reader created for the RTF format, able to read files up to version 1.9.1 of the RTF specification, to be added to the Pandoc tool.
The Reader should be written in Haskell alone, follow the structure of Pandoc and work with Pandoc to translate RTF files into the other formats. I'll provide RTF test files.
Resources:
RTF ver. 1.6 Specs:
[login to view URL]
RTF ver. 1.5 Specs:
[login to view URL]
Wikipedia:
[login to view URL]
unRTF:
[login to view URL]
rtf2latex2e:
[login to view URL]
Additional Project Description:
04/08/2013 at 15:01 EDT
Attached is an RTF file with the examples of what could/should be converted, to be used in tests.
04/08/2013 at 16:48 EDT
w2LaTeX: [login to view URL]
04/10/2013 at 19:09 EDT
The reader for RTF should at least handle:
* text styles: bold, italic, color, big, small, ...
* underlines and strikethroughs
* expanded and condensed text
* text shadowing, outlining, embossing and engraving
* changes in the foreground and background colors
* embedded figures: PICT, EMF, GIF, TIFF, WMF, PNG, JPEG
* tables
* superscripts and subscripts
* equations: embedded MathType equations
* symbols: Greek and math symbols
* footnotes (on pages and in tables)
* hyperlinks
* internal document page references
and it should be able to process the RTF file attached here.
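As a possible starting point for the text-style items in the list above, here is a minimal sketch of how RTF input could be tokenized and how bold/italic state could be tracked per group. It uses only the base library. The types `Token`, `Style` and `Run` and the functions `tokenize` and `runs` are hypothetical names, not part of Pandoc. It assumes `\'hh` bytes can be read as Latin-1 code points, which a real reader would have to decide from the document's code page, and it does not skip header groups such as the font or color tables.

```haskell
module Main where

import Data.Char (chr, digitToInt, isAlpha, isDigit, isHexDigit)

-- Hypothetical token type for a small subset of RTF syntax.
data Token
  = GroupOpen
  | GroupClose
  | Control String (Maybe Int)  -- control word and optional numeric parameter
  | Chars String                -- literal text
  deriving (Eq, Show)

tokenize :: String -> [Token]
tokenize [] = []
tokenize ('{' : rest) = GroupOpen : tokenize rest
tokenize ('}' : rest) = GroupClose : tokenize rest
-- \'hh: one byte in hex, read here as a Latin-1 code point (an assumption).
tokenize ('\\' : '\'' : h1 : h2 : rest)
  | isHexDigit h1 && isHexDigit h2 =
      Chars [chr (16 * digitToInt h1 + digitToInt h2)] : tokenize rest
tokenize ('\\' : c : rest)
  | isAlpha c =
      let (name, r1)   = span isAlpha (c : rest)
          (sign, r2)   = case r1 of
                           ('-' : d : _) | isDigit d -> ("-", drop 1 r1)
                           _                         -> ("", r1)
          (digits, r3) = span isDigit r2
          param | null digits = Nothing
                | otherwise   = Just (read (sign ++ digits))
          r4 = case r3 of        -- one space after a control word is a delimiter
                 (' ' : r) -> r
                 _         -> r3
      in Control name param : tokenize r4
  | c `elem` "\\{}" = Chars [c] : tokenize rest  -- escaped literal character
  | otherwise       = tokenize rest              -- other control symbols dropped
tokenize "\\" = []                               -- stray trailing backslash
tokenize s =
  let (txt, rest) = break (`elem` "{}\\") s
      visible     = filter (`notElem` "\r\n") txt  -- raw line breaks are not text
  in [Chars visible | not (null visible)] ++ tokenize rest

-- Character formatting that is scoped to the enclosing group.
data Style = Style { bold :: Bool, italic :: Bool }
  deriving (Eq, Show)

-- A run of text together with the style in force when it was read.
data Run = Run Style String
  deriving (Eq, Show)

-- Walk the tokens with a stack of styles: '{' pushes a copy of the current
-- style and '}' pops it, so formatting set inside a group ends with the group.
runs :: [Token] -> [Run]
runs = go [Style False False]
  where
    go _ [] = []
    go stack@(cur : _) (GroupOpen : ts)      = go (cur : stack) ts
    go (_ : outer@(_ : _)) (GroupClose : ts) = go outer ts
    go stack (GroupClose : ts)               = go stack ts  -- unbalanced '}'
    go (cur : outer) (Control w p : ts)      = go (apply w p cur : outer) ts
    go stack@(cur : _) (Chars s : ts)        = Run cur s : go stack ts
    go [] ts = go [Style False False] ts     -- keeps the match total

    on p = p /= Just 0                       -- "\b" turns on, "\b0" turns off
    apply "b" p st     = st { bold = on p }
    apply "i" p st     = st { italic = on p }
    apply "plain" _ _  = Style False False
    apply _ _ st       = st                  -- other control words ignored

main :: IO ()
main = mapM_ print
  (runs (tokenize "{\\rtf1\\ansi Plain {\\b bold {\\i both}} text\\par}"))
```

For the sample input in `main`, this yields four runs: unstyled "Plain ", bold "bold ", bold and italic "both", and unstyled " text". The remaining items in the list (colors, tables, pictures, footnotes, links) would need the same group-scoped approach plus handling of RTF destinations, which this sketch leaves out.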
Pandoc has readers for 7 formats: markdown, reStructuredText, textile, HTML, DocBook, LaTeX and MediaWiki markup. Study these readers closely to get an idea of how to proceed with a new one.