How to convert formatted text to Unicode? (OR) How to convert formatted MS-Word documents to Unicode (preserving ALL the formatting information like color, bold, highlight, text alignment, table structures, WordArt etc.)?

Converting formatted text

·      Open your document (say a.doc) in MS-Word.

Tip: If you have formatted Tamil text (in TSCII encoding) that you have typed in an editor (say Azhagi) but not yet saved, first save it from the editor in '.RTF' mode, or copy it into MS-Word and save it there (say as a.rtf or a.doc).

·      Click on 'File -> Save as Web Page...' and save the file in HTML format (say a.html). Then, close the HTML file.

·      Now, open the Azhagi application and click on 'File -> Convert to Unicode'. Azhagi's converter tool will open.

·      Click on 'Choose HTML file' and open 'a.html'. This will load the HTML file and keep it ready for conversion.

·      Click on the 'Convert HTML File' button.

·      Your file will be converted (you can choose the name of the converted output file) and displayed in your web browser.

·      Open the created HTML file in MS-Word and save it back in the original format, giving it a different name (say a-unicode.doc).

Tip: Even if you wish to keep the converted file in HTML format, you still have to open it in MS-Word again and save it under a different name (say a-unicode.html). Otherwise, some characters (like bullets) may not appear properly when you view the converted HTML file in your web browser.
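
For readers curious why the steps above go through HTML: in an HTML file the formatting lives in the tags and style rules, so a converter can rewrite only the visible text and leave the markup untouched. The Python sketch below illustrates that idea only. It is not Azhagi's actual converter, and every name in it (TextNodeConverter, convert_html, convert_text) is hypothetical. The real TSCII-to-Unicode character mapping is not reproduced here; str.upper stands in for it as a placeholder. The sketch is also simplified: it passes comments and entity references through unchanged and ignores less common constructs such as processing instructions and marked sections.

```python
from html.parser import HTMLParser


class TextNodeConverter(HTMLParser):
    """Rebuild an HTML document, converting only its visible text.

    convert_text is a caller-supplied function (hypothetical here) that
    maps legacy-encoded text to Unicode text.
    """

    RAW_TAGS = {"style", "script"}  # content of these is not document text

    def __init__(self, convert_text):
        # Keep entity references as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.convert_text = convert_text
        self.out = []
        self.in_raw = False

    def handle_starttag(self, tag, attrs):
        # Emit the tag exactly as written, so attributes stay untouched.
        self.out.append(self.get_starttag_text())
        if tag in self.RAW_TAGS:
            self.in_raw = True

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag in self.RAW_TAGS:
            self.in_raw = False

    def handle_data(self, data):
        # Only ordinary text is converted; CSS and scripts pass through.
        self.out.append(data if self.in_raw else self.convert_text(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def convert_html(html_text, convert_text):
    """Return html_text with convert_text applied to its text nodes."""
    parser = TextNodeConverter(convert_text)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<p style="color:red"><b>abc</b> &amp; x</p>'
    # str.upper is only a placeholder for a real character mapping.
    print(convert_html(sample, str.upper))
```

In this sketch the tags, attributes and style rules come out exactly as they went in, which is the property that lets bold, color, alignment and table structure survive the conversion. A real converter would also have to deal with font names and the file's declared character set, which are outside the scope of this illustration.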

Document version 6.3.1. Copyright 2000-2012