I have a Word document and I want to get its word count programmatically using the Open XML SDK.
I managed to read a word count, but Open XML returns the wrong value.
Note that the test document mixes two languages, Arabic and English (Arabic is a right-to-left language).
If you open the document in Microsoft Word, the UI shows the correct number of words,
but if you read the value stored in the app.xml file of the same document, you get a different value.
I tried the code from this MSDN article:
msdn.microsoft.com/en-us/library/office/bb521237(v=office.14).aspx
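using System.IO;
using System.Windows.Forms;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

// Reads the word count that was cached in docProps/app.xml when the file was last saved.
public static void GetPropertyFromDocument(string document)
{
    XmlDocument xmlProperties = new XmlDocument();
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Open(document, false))
    {
        // app.xml is optional, so the part can be null.
        ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
        if (appPart == null) return;
        using (Stream stream = appPart.GetStream())
        {
            xmlProperties.Load(stream);
        }
    }
    // <Words> holds the word count (the MSDN sample reads <Characters> instead).
    XmlNodeList words = xmlProperties.GetElementsByTagName("Words");
    string value = words.Count > 0 ? words[0].InnerText : "(not present)";
    MessageBox.Show("Number of words in the file = " + value, "Word Count");
}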
For the mixed Arabic/English file I tested, Word reports 13 words, but the code above returns 11.
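The Words value in app.xml is a statistic cached by the application that last saved the file; the Open XML SDK only reads it and does not recompute it. For comparison, here is a minimal sketch that recounts words directly from the main document body. It assumes the DocumentFormat.OpenXml NuGet package is referenced, and the class and method names (`BodyWordCounter`, `CountWordsInText`, `CountWordsInBody`) are hypothetical. Counting whitespace-separated tokens is a swapped-in approximation, not Word's own algorithm: it ignores tabs and line breaks between runs, headers, footers, and footnotes, and may still differ from the status bar.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BodyWordCounter
{
    // Approximation (assumption): a word is any run of non-whitespace characters.
    // Arabic and English words are both space-delimited, so no RTL handling is needed.
    public static int CountWordsInText(string text)
    {
        return Regex.Matches(text, @"\S+").Count;
    }

    // Recounts words from the main document body instead of trusting app.xml.
    public static int CountWordsInBody(string path)
    {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false))
        {
            Body body = wordDoc.MainDocumentPart?.Document?.Body;
            if (body == null) return 0;

            int count = 0;
            foreach (Paragraph paragraph in body.Descendants<Paragraph>())
            {
                // Join only the w:t elements whose nearest paragraph is this one, so a word
                // split across several runs is counted once and text nested in text-box
                // paragraphs is not also attributed to the outer paragraph.
                string text = string.Concat(
                    paragraph.Descendants<Text>()
                             .Where(t => t.Ancestors<Paragraph>().First() == paragraph)
                             .Select(t => t.Text));
                count += CountWordsInText(text);
            }
            return count;
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: pass the path of a .docx file.");
            return;
        }
        Console.WriteLine("Words recounted from body: " + CountWordsInBody(args[0]));
    }
}
```

Run it with the .docx path as the first argument and compare the printed number with both Word's status bar and the cached app.xml value.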