How to view Bengali characters instead of their HTML encoding
I am creating an ebook of content written in Bengali. The file is created in MS Word (using the on-screen keyboard) and saved as an .htm file (Save As Webpage - Filtered). However, when I open this .htm file in Notepad++, I see the html equivalents of the Bengali characters instead of the characters themselves.
Notepad++ is allowing me to type in Bengali in the Editor so I suspect this may not be a font issue. Does anyone know what I have to do to display the Bengali characters? Thanks.
Notepad++ shows exactly what Word saved. Because you chose to save the file
as HTML, Word replaced the Bengali characters with their HTML-encoded versions. That is actually nice, because it means that most browsers shouldn’t have an issue
displaying the page correctly.
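To illustrate what "HTML-encoded" most likely means here: HTML has no named entities for Bengali letters, so the saved file presumably contains numeric character references of the form `&#NNNN;`. The sketch below is a standalone Python 3 snippet (not something from Word or Notepad++), and the sample word is just an assumed example; it shows that the reference form and the real characters carry the same information.

```python
import html

# The Bengali word "বাংলা", written as code points
# (U+09AC U+09BE U+0982 U+09B2 U+09BE) so the snippet itself stays ASCII.
word = "\u09ac\u09be\u0982\u09b2\u09be"

# Build decimal numeric character references, one per character.
encoded = "".join(f"&#{ord(ch)};" for ch in word)
print(encoded)  # &#2476;&#2494;&#2434;&#2482;&#2494;

# A browser (or html.unescape) maps the references back to the same characters.
decoded = html.unescape(encoded)
print(decoded == word)  # True
```

So nothing is lost in the file; the editor simply shows the references literally, while a browser renders them as glyphs.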
If you want to see the “real” glyphs, then you need to replace the HTML-encoded versions with
the actual characters in the encoding you use, but this could mean that a browser might have
an issue displaying the page correctly (for example, if the file's encoding is not declared correctly).
Not very sure what would be the best solution under the circumstances. Should I write a script to convert the file? I believe there is also a plugin called HTML Tags which can do this for me…
Thank you for responding so promptly!
I have to admit that I don’t have any experience in creating ebooks.
Does the ebook format specify a certain encoding? UTF-8?
Or is it basically html?
If the HTML Tags plugin can do this, then yes, why not use it?
If it can’t, I assume it should be possible to write two Python scripts
which do something like this:
the first converts the HTML-encoded characters into the “real” UTF-8 characters
so you can start writing, and once you’re finished, the second
reverts them to HTML-encoded strings again.
Of course, the Python Script plugin needs to be installed in this case.
Let me know if you wanna go this way.
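A minimal sketch of the two conversions described above, written as plain Python 3 rather than against the Python Script plugin API. The function names `refs_to_chars` and `chars_to_refs` are made up for illustration. It assumes the file uses numeric character references (`&#2476;` or `&#x9AC;`) and deliberately leaves references to ASCII characters such as `&#60;` alone, along with named entities like `&amp;`, so the markup itself is not changed.

```python
import re

# Matches decimal (&#2476;) and hexadecimal (&#x9AC;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def refs_to_chars(text: str) -> str:
    """Replace numeric references to non-ASCII characters with the characters."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references (e.g. &#60; for "<"), surrogates and
        # out-of-range values exactly as they are.
        if code < 128 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)
    return _NUMERIC_REF.sub(repl, text)


def chars_to_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical sample line, similar to what the saved .htm file might contain.
sample = "<p>&#2476;&#2494;&#2434;&#2482;&#2494; &amp; &#60;b&#62;</p>"
readable = refs_to_chars(sample)    # Bengali is now real characters
restored = chars_to_refs(readable)  # back to the reference form
print(restored == sample)  # True
```

If the file is kept in the readable form, it has to be saved as UTF-8 and the charset declared in the document should match; otherwise converting back with `chars_to_refs` before publishing keeps the file pure ASCII.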
I read somewhere that an epub is a website in a box (can’t remember where I came across that phrase) and it is very apt. Yes, it’s basically XHTML 1.1, with CSS for styling.
Will post back with test results; I am familiar with python so that may be the way to go.
Can you paste here some of the words that you need to understand?