Simple VBScript program to extract data from a Microsoft Word document

by Greg Thatcher, MCSD, MCDBA, MCSE

The script below demonstrates how to extract the text of a Microsoft Word document, one word at a time. It currently writes each word to a console window, but it can easily be modified to write the data to a file or database instead. The script can also be converted to C++ by using the "VB-to-VC Automation Code Converter".
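As a rough sketch of the "write to a file" modification, the console output can be swapped for a text file created with Scripting.FileSystemObject, which ships with Windows Script Host. The paths C:\Data\Doc1.doc and C:\Data\Doc1.txt are placeholders. The file is created as Unicode (the third argument to CreateTextFile) on the assumption that the document may contain characters a plain ANSI file cannot hold.

```vbscript
Option Explicit
REM Sketch: write each word of a Word document to a text file
REM instead of the console. Both paths are placeholders.

Dim objWord, currentDocument, fso, outFile
Dim wordPath, outputPath, i

wordPath = "C:\Data\Doc1.doc"
outputPath = "C:\Data\Doc1.txt"

REM Start an invisible instance of Word and suppress alerts (0 = wdAlertsNone)
Set objWord = CreateObject("Word.Application")
objWord.DisplayAlerts = 0

REM Open(FileName, ConfirmConversions, ReadOnly) returns the opened Document
Set currentDocument = objWord.Documents.Open(wordPath, False, True)

REM CreateTextFile(path, overwrite, unicode): overwrite any old file, write Unicode
Set fso = CreateObject("Scripting.FileSystemObject")
Set outFile = fso.CreateTextFile(outputPath, True, True)

REM Write one word per line
For i = 1 To currentDocument.Words.Count
	outFile.WriteLine currentDocument.Words(i).Text
Next
outFile.Close

REM Close the document without saving, then shut Word down
currentDocument.Close False
objWord.Quit False
Set currentDocument = Nothing
Set objWord = Nothing

WScript.Echo "Wrote words to " & outputPath
```

Writing to a database would follow the same pattern, with the WriteLine call replaced by an insert through whatever data-access object you use.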

To run the script, save it to a file (e.g. word.vbs). Then change the line that sets the wordPath variable so that it points to your Word file. For example, if your Word file is in the directory C:\StockData and its name is IBM.doc, you would change the line to look like this:

wordPath = "C:\StockData\IBM.doc"

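You would then run the script from a command prompt (Start->Programs->Command Prompt) with cscript, so that the output goes to the console window; under wscript, each WScript.Echo call would open a separate message box, one per word. The command looks like this: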

cscript word.vbs
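If editing the script for every file becomes tedious, one possible variation (not part of the original script) is to read the path from the command line through WScript.Arguments and fall back to a default when no argument is given. A minimal sketch of just that part, which could stand in for the hard-coded wordPath assignment; the default path is a placeholder:

```vbscript
Option Explicit
REM Sketch: take the Word file path from the command line, e.g.
REM   cscript word.vbs C:\StockData\IBM.doc
REM Quote the path on the command line if it contains spaces.

Dim wordPath

If WScript.Arguments.Count > 0 Then
	REM The first command-line argument is the Word file to read
	wordPath = WScript.Arguments(0)
Else
	REM No argument given: fall back to a placeholder default
	wordPath = "C:\Data\Doc1.doc"
End If

WScript.Echo "Extract Data from " & wordPath
```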

Here is the script:

Option Explicit
REM We use "Option Explicit" to help us check for coding mistakes

REM the Word Application
Dim objWord

REM the path to the Word file
Dim wordPath

REM the document we are currently reading data from
Dim currentDocument
REM the number of words in the current document
Dim numberOfWords
REM loop counter
Dim i

REM where is the Word file located?
wordPath = "C:\Data\Doc1.doc"

WScript.Echo "Extract Data from " & wordPath

REM Start a new instance of Microsoft Word (it is invisible by default)
Set objWord = CreateObject("Word.Application") 

REM suppress alerts and message boxes while the script runs
REM (0 = wdAlertsNone)
objWord.DisplayAlerts = 0

REM open the Word document as read-only
REM Open(FileName, ConfirmConversions, ReadOnly) returns the opened Document
Set currentDocument = objWord.Documents.Open(wordPath, False, True)

REM How many words are in the document?
numberOfWords = currentDocument.Words.Count
WScript.Echo "There are " & numberOfWords & " words" & vbCrLf

REM Words(i) returns a Range object; its Text property holds the word
For i = 1 To numberOfWords
	WScript.Echo currentDocument.Words(i).Text
Next

REM Close the document without saving changes (False = wdDoNotSaveChanges)
currentDocument.Close False
REM Free memory used to store the document object
Set currentDocument = Nothing

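REM Exit Microsoft Word without saving changes; without Quit, an
REM invisible WINWORD.EXE process may keep running after the script ends
objWord.Quit False
Set objWord = Nothing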
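Words are not the only unit the Word object model exposes. As a sketch, assuming you would rather get one line of output per paragraph, the same approach can walk the Paragraphs collection. Each paragraph's Range.Text ends with the paragraph mark (a carriage return), which this version strips before printing. The path is a placeholder.

```vbscript
Option Explicit
REM Sketch: print a Word document one paragraph at a time.
REM The path below is a placeholder.

Dim objWord, currentDocument, para, paraText, paraCount

REM Start an invisible instance of Word and suppress alerts (0 = wdAlertsNone)
Set objWord = CreateObject("Word.Application")
objWord.DisplayAlerts = 0

REM Open(FileName, ConfirmConversions, ReadOnly)
Set currentDocument = objWord.Documents.Open("C:\Data\Doc1.doc", False, True)

paraCount = 0
For Each para In currentDocument.Paragraphs
	paraText = para.Range.Text
	REM Drop the trailing paragraph mark (vbCr) if present
	If Right(paraText, 1) = vbCr Then paraText = Left(paraText, Len(paraText) - 1)
	WScript.Echo paraText
	paraCount = paraCount + 1
Next

WScript.Echo "There are " & paraCount & " paragraphs"

REM Close without saving, then shut Word down
currentDocument.Close False
objWord.Quit False
Set currentDocument = Nothing
Set objWord = Nothing
```

Every Word document contains at least one paragraph mark, so paraCount should never be zero for a document that opened successfully.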