
Convert HTML to Plain Text in VBA

I have an Excel sheet with cells containing HTML. How can I batch convert them to plain text? At the moment there are so many useless tags and styles. I want to write it from scratc

Solution 1:

Set a reference to "Microsoft HTML Object Library" (in the VBA editor: Tools > References).

Function HtmlToText(sHTML) As String
  Dim oDoc As HTMLDocument
  Set oDoc = New HTMLDocument
  ' Let the HTML parser build the document, then read back only its text
  oDoc.body.innerHTML = sHTML
  HtmlToText = oDoc.body.innerText
End Function

Tim
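
Since the question asks about batch conversion, here is a minimal sketch of applying the function to many cells at once. It assumes the HtmlToText function above is in the same project and that the cells to convert are selected; the name ConvertSelectionToText is hypothetical and not part of the original answer.

```vba
' Hypothetical helper: replaces the HTML in each selected cell with its plain text.
' Assumes HtmlToText (above) is available in the same VBA project.
Sub ConvertSelectionToText()
    Dim cell As Range

    ' Do nothing if a chart, shape, or other non-range object is selected
    If TypeName(Selection) <> "Range" Then Exit Sub

    For Each cell In Selection.Cells
        ' Leave formulas alone; only literal text cells are rewritten
        If Not cell.HasFormula Then
            ' VarType filters out empty cells, numbers, and error values
            If VarType(cell.Value) = vbString Then
                cell.Value = HtmlToText(cell.Value)
            End If
        End If
    Next cell
End Sub
```

The sub overwrites the cells in place, so it may be safer to run it on a copy of the sheet first.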

Solution 2:

A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.

Function StripTags(ByVal html As String) As String
    Dim text As String
    Dim accumulating As Boolean
    Dim n As Long   ' Long, because an Integer counter overflows on a 32,767-character cell
    Dim c As String

    text = ""
    accumulating = True

    n = 1
    Do While n <= Len(html)

        c = Mid(html, n, 1)
        If c = "<" Then
            ' Entering a tag: stop collecting characters
            accumulating = False
        ElseIf c = ">" Then
            ' Leaving a tag: resume collecting
            accumulating = True
        Else
            If accumulating Then
                text = text & c
            End If
        End If

        n = n + 1
    Loop

    StripTags = text
End Function

This can leave a lot of extraneous whitespace, and it does not decode entities such as &amp;, but it does remove the tags.
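
One way to deal with the leftover whitespace is a second pass over the stripped text. The following is a minimal sketch, not part of the original answer; the name CollapseWhitespace is hypothetical. It uses only built-in string functions, so it needs no references.

```vba
' Hypothetical helper: collapses runs of whitespace into single spaces.
Function CollapseWhitespace(ByVal s As String) As String
    ' Treat tabs, line feeds, and carriage returns as plain spaces
    s = Replace(s, vbTab, " ")
    s = Replace(s, vbLf, " ")
    s = Replace(s, vbCr, " ")

    ' Keep replacing double spaces until only single spaces remain
    Do While InStr(s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop

    ' Drop leading and trailing spaces
    CollapseWhitespace = Trim$(s)
End Function
```

It could be combined with the function above as CollapseWhitespace(StripTags(html)). Note that this flattens everything onto one line, so paragraph breaks are lost as well.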

Solution 3:

Tim's solution was great, worked like a charm.

I'd like to contribute: use this code to add the "Microsoft HTML Object Library" reference at runtime:

Set ID = ThisWorkbook.VBProject.References
ID.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 2, 5

It worked on Windows XP and Windows 7.
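
Note that code touching ThisWorkbook.VBProject only runs when "Trust access to the VBA project object model" is enabled in the Trust Center. An alternative that avoids the reference altogether is late binding. The following is a minimal sketch, assuming the "htmlfile" ProgID is available on the machine; the name HtmlToTextLateBound is hypothetical.

```vba
' Hypothetical late-bound variant of HtmlToText: needs no project reference.
Function HtmlToTextLateBound(ByVal sHTML As String) As String
    Dim oDoc As Object

    ' "htmlfile" is the ProgID of the MSHTML document object
    Set oDoc = CreateObject("htmlfile")
    oDoc.body.innerHTML = sHTML
    HtmlToTextLateBound = oDoc.body.innerText
End Function
```

The trade-off is that oDoc is declared As Object, so the editor offers no IntelliSense and typos in member names only surface at runtime.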

Solution 4:

Tim's answer is excellent. However, a minor adjustment avoids one foreseeable error: being called with a Null value.

Function HtmlToText(sHTML) As String
    Dim oDoc As HTMLDocument

    ' Return an empty string instead of failing on a Null input
    If IsNull(sHTML) Then
        HtmlToText = ""
        Exit Function
    End If

    Set oDoc = New HTMLDocument
    oDoc.body.innerHTML = sHTML
    HtmlToText = oDoc.body.innerText
End Function

Solution 5:

Yes! I managed to solve my problem as well. Thanks, everybody!

In my case, I had this sort of input:

<p>Lorem ipsum dolor sit amet.</p>

<p>Ut enim ad minim veniam.</p>

<p>Duis aute irure dolor in reprehenderit.</p>

And I did not want the result to be all jammed together without line breaks.

So I first split my input at every <p> tag into an array 'paragraphs', then for each element I used Tim's answer to get the text out of the HTML (very sweet answer btw).

In addition, I concatenated each cleaned 'paragraph' with the line-break character Chr(10) for VBA/Excel.

The final code is:

Public Function HtmlToText(ByVal sHTML As Variant) As String
    ' sHTML is a Variant so that a Null argument can reach the IsNull check
    Dim oDoc As HTMLDocument
    Dim result As String
    Dim paragraphs() As String
    Dim paragraph As Variant

    If IsNull(sHTML) Then
      HtmlToText = ""
      Exit Function
    End If

    result = ""
    paragraphs = Split(sHTML, "<p>")

    ' Convert each paragraph separately and put two line feeds before it
    For Each paragraph In paragraphs
        Set oDoc = New HTMLDocument
        oDoc.body.innerHTML = paragraph
        result = result & Chr(10) & Chr(10) & oDoc.body.innerText
    Next paragraph

    HtmlToText = result
End Function
