Simple VBScript program to extract data from a Microsoft Word document
by Greg Thatcher, MCSD, MCDBA, MCSE
The script below demonstrates how to extract all of the text from a Microsoft Word document, one word at a time. It currently writes the data to a console window, but it can easily be modified to write the data to a file or database instead. In addition, the script can be converted to C++ with the "VB-to-VC Automation Code Converter".
To run the script, save it to a file (e.g. word.vbs). Then change the line that sets the wordPath variable so that it specifies the location of your Word file. For example, if your Word file were in the directory C:\StockData and its name were IBM.doc, you would change the line to look like this:
wordPath = "C:\StockData\IBM.doc"
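Then run the script from a command prompt (Start->Programs->Accessories->Command Prompt), in the directory where you saved word.vbs, like this:
cscript word.vbs
Use cscript rather than double-clicking the file: under wscript, each WScript.Echo call opens its own message box, so you would have to dismiss one dialog per word.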
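If you extract data from many files, editing wordPath each time gets tedious. A minimal sketch, assuming you are willing to change the top of the script: read the path from the first command-line argument through WScript.Arguments, and fall back to the hard-coded path when no argument is given. These lines would take the place of the line that sets wordPath.

```vbscript
Option Explicit
REM the path to the Word file
Dim wordPath
REM use the first command-line argument if one was given,
REM otherwise fall back to the hard-coded path
If WScript.Arguments.Count > 0 Then
    wordPath = WScript.Arguments(0)
Else
    wordPath = "C:\Data\Doc1.doc"
End If
WScript.Echo "Extract Data from " & wordPath
```

You could then run it as cscript word.vbs "C:\StockData\IBM.doc". The quotes matter only when the path contains spaces.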
Here is the script:
Option Explicit
REM "Option Explicit" requires every variable to be declared with Dim,
REM so a misspelled variable name is reported as an error
REM the Word Application
Dim objWord
REM the path to the Word file
Dim wordPath
REM the document we are currently reading data from
Dim currentDocument
REM the number of words in the current document
Dim numberOfWords
REM loop counter
Dim i
REM where is the Word file located?
wordPath = "C:\Data\Doc1.doc"
WScript.Echo "Extract Data from " & wordPath
REM Create an invisible instance of Microsoft Word (Word started through automation is hidden)
Set objWord = CreateObject("Word.Application")
REM don't display alert dialogs (0 = wdAlertsNone), such as messages about
REM documents needing to be converted from older Word file formats
objWord.DisplayAlerts = 0
REM open the Word document as read-only
REM Open(FileName, ConfirmConversions, ReadOnly) returns the opened document,
REM so we don't have to assume it is Documents(1)
Set currentDocument = objWord.Documents.Open(wordPath, False, True)
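REM How many words are in the document?
REM (Word's Words collection also counts punctuation marks and paragraph marks)
numberOfWords = currentDocument.Words.Count
WScript.Echo "There are " & numberOfWords & " words " & vbCRLF
For i = 1 To numberOfWords
    REM each item is a Range; its Text property holds the word itself
    WScript.Echo currentDocument.Words(i).Text
Next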
REM Close the document without saving (0 = wdDoNotSaveChanges)
currentDocument.Close 0
REM Release our reference to the document object
Set currentDocument = Nothing
REM exit Microsoft Word
objWord.Quit
Set objWord = Nothing
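As mentioned at the start, the script can be changed to write the data to a file instead of the console. A minimal sketch, assuming the standard Scripting.FileSystemObject from the Windows scripting runtime is available and using a hypothetical output path (outPath), writes each word's text on its own line of a text file. The file is created in Unicode mode so that non-ASCII characters in the document do not cause a write error.

```vbscript
Option Explicit
Dim objWord, currentDocument, fso, outFile
Dim wordPath, outPath, i

wordPath = "C:\Data\Doc1.doc"
REM hypothetical output file; change to suit
outPath = "C:\Data\Doc1.txt"

REM CreateTextFile(filename, overwrite, unicode): overwrite any old file,
REM write Unicode so non-ASCII characters are kept
Set fso = CreateObject("Scripting.FileSystemObject")
Set outFile = fso.CreateTextFile(outPath, True, True)

REM open the document read-only in a hidden Word instance, as before
Set objWord = CreateObject("Word.Application")
objWord.DisplayAlerts = 0
Set currentDocument = objWord.Documents.Open(wordPath, False, True)

REM write each word on its own line instead of echoing it
For i = 1 To currentDocument.Words.Count
    outFile.WriteLine currentDocument.Words(i).Text
Next

REM close the file, the document and Word, and release the objects
outFile.Close
Set outFile = Nothing
Set fso = Nothing
currentDocument.Close 0
Set currentDocument = Nothing
objWord.Quit
Set objWord = Nothing
WScript.Echo "Wrote words to " & outPath
```

Writing one line per word keeps the output shaped like the console version. If you only need the text and not the individual words, currentDocument.Content.Text returns the document's main text in a single call, which avoids one automation call per word on large documents.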