Convert HTML to Plain Text in VBA
Solution 1:
Set a reference to "Microsoft HTML Object Library" (Tools > References in the VBA editor).
Function HtmlToText(sHTML) As String
    Dim oDoc As HTMLDocument
    Set oDoc = New HTMLDocument
    ' Let the HTML parser build the document, then read back its text.
    oDoc.body.innerHTML = sHTML
    HtmlToText = oDoc.body.innerText
End Function
Tim
Solution 2:
A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.
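Function StripTags(ByVal html As String) As String
    Dim text As String
    Dim accumulating As Boolean
    Dim n As Long    ' Long rather than Integer, so inputs over 32,767 characters do not overflow
    Dim c As String

    text = ""
    accumulating = True
    n = 1
    Do While n <= Len(html)
        c = Mid(html, n, 1)
        If c = "<" Then
            ' Entering a tag: stop collecting characters.
            accumulating = False
        ElseIf c = ">" Then
            ' Leaving a tag: resume collecting.
            accumulating = True
        Else
            If accumulating Then
                text = text & c
            End If
        End If
        n = n + 1
    Loop
    StripTags = text
End Function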
This can leave lots of extraneous whitespace, but it will help in removing the tags.
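The leftover whitespace can be cleaned up in a second pass. The following is only a sketch: CollapseWhitespace is a hypothetical helper name, not part of the answer above, and it assumes you are happy to turn tabs and line breaks into single spaces.

```vba
' Hypothetical helper: collapse runs of whitespace into single spaces.
Function CollapseWhitespace(ByVal s As String) As String
    ' Treat tabs, carriage returns and line feeds as ordinary spaces.
    s = Replace(s, vbTab, " ")
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    ' Squeeze double spaces repeatedly until none remain.
    Do While InStr(s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop
    ' Drop leading and trailing spaces.
    CollapseWhitespace = Trim$(s)
End Function
```

It would typically be applied to the output of the tag stripper, for example CollapseWhitespace(StripTags(html)).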
Solution 3:
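Tim's solution was great, worked like a charm.
I'd like to contribute: use this code to add the "Microsoft HTML Object Library" reference at runtime:
    Set ID = ThisWorkbook.VBProject.References
    ID.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 2, 5
It worked on Windows XP and Windows 7. Note that touching VBProject from code requires "Trust access to the VBA project object model" to be enabled in the Trust Center.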
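If adding the reference is not an option, late binding is a commonly used alternative. This is a sketch of a different technique from the one above, not a replacement for it: HtmlToTextLateBound is a hypothetical name, and the code assumes a Windows machine where the "htmlfile" ProgID (provided by MSHTML) is registered.

```vba
' Late-bound variant: no "Microsoft HTML Object Library" reference needed.
' Assumption: the "htmlfile" ProgID is available on this machine.
Function HtmlToTextLateBound(ByVal sHTML As String) As String
    Dim oDoc As Object
    ' Create the HTML document at runtime instead of using New HTMLDocument.
    Set oDoc = CreateObject("htmlfile")
    oDoc.body.innerHTML = sHTML
    HtmlToTextLateBound = oDoc.body.innerText
End Function
```

The trade-off is that oDoc is declared As Object, so the editor offers no IntelliSense or compile-time checking for its members.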
Solution 4:
Tim's answer is excellent. However, a small guard can be added to avoid one foreseeable error: a Null input.
Function HtmlToText(sHTML) As String
    Dim oDoc As HTMLDocument
    ' Return an empty string instead of failing on a Null input.
    If IsNull(sHTML) Then
        HtmlToText = ""
        Exit Function
    End If
    Set oDoc = New HTMLDocument
    oDoc.body.innerHTML = sHTML
    HtmlToText = oDoc.body.innerText
End Function
Solution 5:
Yes! I managed to solve my problem as well. Thanks, everybody!
In my case, I had this sort of input:
<p>Lorem ipsum dolor sit amet.</p>
<p>Ut enim ad minim veniam.</p>
<p>Duis aute irure dolor in reprehenderit.</p>
And I did not want the result to be all jammed together without line breaks.
So I first split my input at every <p> tag into an array 'paragraphs', then for each element I used Tim's answer to get the text out of the HTML (very sweet answer, by the way).
In addition, I concatenated each cleaned 'paragraph' with the line break character Chr(10) for VBA/Excel.
The final code is:
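Public Function HtmlToText(ByVal sHTML As Variant) As String
    Dim oDoc As HTMLDocument
    Dim result As String
    Dim paragraphs() As String
    Dim paragraph As Variant    ' For Each over an array needs a Variant loop variable

    ' sHTML is declared As Variant so that a Null input can reach this check;
    ' a String parameter would raise "Invalid use of Null" before the function runs.
    If IsNull(sHTML) Then
        HtmlToText = ""
        Exit Function
    End If

    result = ""
    paragraphs = Split(sHTML, "<p>")
    For Each paragraph In paragraphs
        Set oDoc = New HTMLDocument
        oDoc.body.innerHTML = paragraph
        ' Separate paragraphs with a blank line.
        result = result & Chr(10) & Chr(10) & oDoc.body.innerText
    Next paragraph
    HtmlToText = result
End Function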
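Because every paragraph is prefixed with two Chr(10) characters, the returned string begins with blank lines. If that is unwanted, they can be trimmed afterwards. A minimal sketch, where TrimLeadingLineFeeds is a hypothetical helper name rather than part of the answer above:

```vba
' Hypothetical helper: remove any Chr(10) characters at the start of a string.
Function TrimLeadingLineFeeds(ByVal s As String) As String
    ' Drop one leading line feed at a time until the first character is something else.
    Do While Left$(s, 1) = Chr(10)
        s = Mid$(s, 2)
    Loop
    TrimLeadingLineFeeds = s
End Function
```

It would be applied to the final value, for example TrimLeadingLineFeeds(HtmlToText(sHTML)); line breaks between paragraphs are left untouched.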