Specification for the technical implementation of a MediaWiki parser in C#
This is a small parser program that makes a static dump (HTML output) of Wikipedia, which is based on MediaWiki ([url removed, login to view])
The program must recognize MediaWiki markup in the text (including the templates listed at [url removed, login to view]:Template_messages/All) and transform it into HTML output. The HTML must comply with the W3C HTML specifications.
The parser must be written in C#. The main class should expose an easy-to-use method that returns the parsed text.
I would like to use the API like this:
String originalText = "wiki text";
WikimediaParser parser = new WikimediaParser();
String textParsed = [url removed, login to view](originalText);
You can use this site as a starting point:
[url removed, login to view]
• .NET C#
• Regular expressions
Wikitext language or wiki markup is a markup language that offers a simplified alternative to HTML and is used to write pages in wiki websites.
Wikitext is text in this language.
There is no commonly accepted standard wikitext language. The grammar, structure, features, keywords and so on are dependent on the particular wiki software used on the particular website. For example, all wikitext markup languages have a simple way of hyperlinking to other pages within the site, but there are several different syntax conventions for these links.
Some wiki programs allow extensive optional use of HTML tags within wikitext, others a smaller subset, and still others no HTML at all. Other wiki programs allow the restrictions on HTML to be set by the particular site.
MediaWiki's wikitext lets you freely mix wiki markup and HTML, but its simple, readable syntax means users do not need to know HTML at all.
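To illustrate how a few wiki constructs could map to HTML with regular expressions, here is a minimal sketch. The class name `InlineMarkupSketch` and the `Target.html` link scheme are assumptions made for illustration, not part of the requested API. It covers only bold, italic and internal links, does not escape text, and is written for a current C# compiler (lambdas are not available on .NET Framework 1.1).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: converts a few inline wikitext constructs to HTML.
public static class InlineMarkupSketch
{
    // '''bold''' must be handled before ''italic'', because ''' contains ''.
    private static readonly Regex Bold = new Regex(@"'''(.+?)'''");
    private static readonly Regex Italic = new Regex(@"''(.+?)''");

    // Matches [[Target]] or [[Target|label]].
    private static readonly Regex Link =
        new Regex(@"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]");

    public static string Convert(string wikitext)
    {
        string html = Bold.Replace(wikitext, "<b>$1</b>");
        html = Italic.Replace(html, "<i>$1</i>");
        html = Link.Replace(html, match =>
        {
            string target = match.Groups[1].Value.Trim();
            // Without an explicit label, the target doubles as the link text.
            string label = match.Groups[2].Success ? match.Groups[2].Value : target;
            // Assumed static-dump naming scheme: "Main Page" -> "Main_Page.html".
            string href = Uri.EscapeDataString(target.Replace(' ', '_')) + ".html";
            return "<a href=\"" + href + "\">" + label + "</a>";
        });
        return html;
    }
}
```

For example, `InlineMarkupSketch.Convert("'''Bold''' and ''italic''")` returns `<b>Bold</b> and <i>italic</i>`. Any HTML already present in the wikitext passes through unchanged, which matches the mixing described above but leaves validation of that HTML to a later step.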
I would like the parser to translate all the wiki markup described on this page:
[url removed, login to view]:How_to_edit_a_page
It must also handle the wiki markup templates listed on this page:
[url removed, login to view]:Template_messages/All
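Template handling could be sketched as a lookup from template name to a rendering function. Everything here is an assumption for illustration: the class name `TemplateExpander`, the `Register` method, and the restriction to non-nested `{{name|arg}}` calls. The code targets a current C# compiler (generics and lambdas are not available on .NET Framework 1.1).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: expands non-nested {{name|arg1|arg2}} template calls.
public sealed class TemplateExpander
{
    // Group 1 is the template name, group 2 the raw "|arg1|arg2" tail.
    private static readonly Regex Call =
        new Regex(@"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}");

    private readonly Dictionary<string, Func<string[], string>> renderers =
        new Dictionary<string, Func<string[], string>>(StringComparer.OrdinalIgnoreCase);

    // Adding support for a new template only requires one Register call.
    public void Register(string name, Func<string[], string> renderer)
    {
        renderers[name] = renderer;
    }

    public string Expand(string wikitext)
    {
        return Call.Replace(wikitext, match =>
        {
            string name = match.Groups[1].Value.Trim();
            string rawArgs = match.Groups[2].Value;
            string[] args = rawArgs.Length == 0
                ? new string[0]
                : rawArgs.Substring(1).Split('|');

            Func<string[], string> renderer;
            // Unknown templates are left untouched so they stay visible in the dump.
            return renderers.TryGetValue(name, out renderer)
                ? renderer(args)
                : match.Value;
        });
    }
}
```

A caller might register an illustrative renderer with `expander.Register("main", args => "<i>Main article: " + args[0] + "</i>")`, after which `expander.Expand("{{main|Paris}}")` yields `<i>Main article: Paris</i>`. Nested templates and named parameters would need a real tokenizer rather than a single regular expression.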
I don't need the “User talk namespace”.
I want to use Logger4Net to log every error, and to log precise debug messages when debug logging is enabled.
The code must be flexible enough that new wiki markup or wiki templates can be added later. It must be commented very clearly.
The API must run on Windows with .NET Framework 1.1 or later. It must be written in C#.
We pay only at the end of the project. Any payment method is accepted (PayPal, wire transfer, etc.).
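One way to keep the code open to future markup is to treat each construct as an independent rule applied in sequence. The sketch below is only one possible design; the names `IMarkupRule`, `RegexRule` and `RulePipeline` are hypothetical, and the generic `List<T>` would need replacing with `ArrayList` on .NET Framework 1.1.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical extension point: one rule per wiki construct.
public interface IMarkupRule
{
    string Apply(string text);
}

// A rule defined by a regular expression and a replacement pattern.
public sealed class RegexRule : IMarkupRule
{
    private readonly Regex pattern;
    private readonly string replacement;

    public RegexRule(string pattern, string replacement)
    {
        // Multiline lets ^ and $ match at line boundaries,
        // which line-based markup such as headings relies on.
        this.pattern = new Regex(pattern, RegexOptions.Multiline);
        this.replacement = replacement;
    }

    public string Apply(string text)
    {
        return pattern.Replace(text, replacement);
    }
}

// Applies the registered rules in the order they were added.
public sealed class RulePipeline
{
    private readonly List<IMarkupRule> rules = new List<IMarkupRule>();

    public RulePipeline Add(IMarkupRule rule)
    {
        rules.Add(rule);
        return this; // allows chained registration
    }

    public string Run(string text)
    {
        foreach (IMarkupRule rule in rules)
        {
            text = rule.Apply(text);
        }
        return text;
    }
}
```

With this layout, supporting a new construct means adding one rule, for example `pipeline.Add(new RegexRule(@"^----$", "<hr>"))`, without touching existing ones. Rule order still matters whenever one pattern is a prefix of another.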
The input must be a string, a text file, or an XML file.
The output must comply with the HTML specifications.
I need two methods; you can implement this interface:
public interface IWikiParser
{
    string Parse(string wikitext);
    string Parse(string wikitext, int length);
}
For the second method, be careful not to cut the output between an opening HTML tag and its closing tag.
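The length-limited variant could be built on a helper that truncates already generated HTML without breaking tags. This is a minimal sketch under stated assumptions: `length` is taken to mean the maximum number of visible text characters, the class name `HtmlTruncator` is hypothetical, and attribute values containing `>` are not handled. It uses generic collections, so it needs adapting for .NET Framework 1.1.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: cuts HTML after a number of visible characters
// and closes every element that is still open at the cut point.
public static class HtmlTruncator
{
    // Elements that never have a closing tag (partial list for the sketch).
    private static readonly HashSet<string> VoidTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "br", "hr", "img" };

    public static string Truncate(string html, int maxTextLength)
    {
        StringBuilder output = new StringBuilder();
        Stack<string> open = new Stack<string>();
        int visible = 0;
        int i = 0;

        while (i < html.Length && visible < maxTextLength)
        {
            if (html[i] == '<')
            {
                int end = html.IndexOf('>', i);
                if (end < 0) break; // malformed tail: drop it
                string tag = html.Substring(i, end - i + 1);
                output.Append(tag); // tags are copied whole, never split
                TrackTag(tag, open);
                i = end + 1;
            }
            else if (html[i] == '&')
            {
                // Copy an entity such as &amp; whole and count it as one character.
                int semi = html.IndexOf(';', i);
                int len = (semi > i && semi - i <= 10) ? semi - i + 1 : 1;
                output.Append(html, i, len);
                visible++;
                i += len;
            }
            else
            {
                output.Append(html[i]);
                visible++;
                i++;
            }
        }

        // Close whatever is still open, innermost element first.
        while (open.Count > 0)
        {
            output.Append("</").Append(open.Pop()).Append('>');
        }
        return output.ToString();
    }

    private static void TrackTag(string tag, Stack<string> open)
    {
        if (tag.StartsWith("</", StringComparison.Ordinal))
        {
            if (open.Count > 0) open.Pop();
            return;
        }
        if (tag.StartsWith("<!", StringComparison.Ordinal) ||
            tag.EndsWith("/>", StringComparison.Ordinal))
        {
            return; // comments, doctype and self-closing tags open nothing
        }
        int nameEnd = 1;
        while (nameEnd < tag.Length && char.IsLetterOrDigit(tag[nameEnd])) nameEnd++;
        string name = tag.Substring(1, nameEnd - 1);
        if (name.Length > 0 && !VoidTags.Contains(name)) open.Push(name);
    }
}
```

For example, `HtmlTruncator.Truncate("<p>Hello <b>world</b>!</p>", 8)` returns `<p>Hello <b>wo</b></p>`. An implementation of `Parse(string wikitext, int length)` could then call `Parse(wikitext)` and pass the result through such a helper.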
You can test against all Wikipedia articles with this database dump:
[url removed, login to view]
I will also provide smaller files for testing the parser.
The API must be ready for production release by mid-January. In the meantime, I would like to see a working parser every x days to check the quality of the dump.