C-Sharp
20 Aug 2021

How to remove any tag/element from an HTML string in C#.

I was working on a project where an HTML article is returned from an API as a string. Some of these articles have an inline style section added at the top, which was breaking my UI. So I decided to remove the whole style section from the HTML string, and here is how I did it.

HTML

<style type="text/css">
  .body{
    padding: 1rem;
  }
  .page-title{
    /* some style */
  }
</style>
<article>
  <h1>...</h1>
</article>

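I am using LINQ to XML to remove the element, because well-formed HTML has the same tree structure as an XML document. Keep in mind that this only works when the HTML also parses as XML, meaning every tag is closed and every attribute value is quoted. Here is how it goes.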

C#

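using System.Xml.Linq;

// Getting html string from an API
var htmlString = await someService.GetHtmlStringAsync(ArticleId);
if (!string.IsNullOrWhiteSpace(htmlString))
{
  // Parse as xml doc, wrap in a single root element
  // (XElement.Parse never returns null; it throws XmlException on malformed markup)
  var xmlDoc = XElement.Parse("<div>" + htmlString + "</div>");

  // Remove every top-level style element. Elements() returns an empty
  // sequence (never null) when nothing matches, so no null check is needed
  xmlDoc.Elements("style").Remove();

  // Join the child nodes so the wrapper div is not part of the result
  htmlString = string.Concat(xmlDoc.Nodes());
}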
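
One thing to watch out for: XElement.Parse throws an XmlException when the markup is not well-formed XML, for example an unclosed <br> tag or an undeclared entity like &nbsp;. Below is a minimal sketch of a defensive wrapper, assuming you would rather keep the original string than fail the request. The class name HtmlCleaner and the method name TryRemoveStyle are made up for this example.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class HtmlCleaner
{
    // Hypothetical helper: strips top-level <style> elements and returns
    // the input unchanged when it cannot be parsed as XML.
    public static string TryRemoveStyle(string html)
    {
        if (string.IsNullOrWhiteSpace(html)) return html;

        try
        {
            // Wrap in a single root element so the fragment is a valid XML document
            var root = XElement.Parse("<div>" + html + "</div>");
            root.Elements("style").Remove();

            // Join the children so the wrapper div is left out
            return string.Concat(root.Nodes());
        }
        catch (XmlException)
        {
            // Not well-formed XML, so leave the markup alone
            return html;
        }
    }
}
```

If the articles you receive are often not well-formed, a real HTML parser is probably a better fit than this fallback.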

And here is the result:

HTML

<article>
  <h1>...</h1>
</article>
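
The same idea works for any tag, not just style. Here is a rough sketch of a general version, assuming the HTML is well-formed XML. It uses Descendants() instead of Elements(), so the tag is removed at any depth and not only at the top level. The names HtmlTagRemover and RemoveElements are my own and not part of any library.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class HtmlTagRemover
{
    // Hypothetical helper: removes every element named tagName, at any depth.
    // Throws XmlException if html is not well-formed XML.
    public static string RemoveElements(string html, string tagName)
    {
        var root = XElement.Parse("<div>" + html + "</div>");

        // Compare names case-insensitively, since HTML tag names may be upper case
        root.Descendants()
            .Where(e => string.Equals(e.Name.LocalName, tagName, StringComparison.OrdinalIgnoreCase))
            .Remove();

        // Join the children so the wrapper div is left out
        return string.Concat(root.Nodes());
    }
}
```

For example, HtmlTagRemover.RemoveElements(htmlString, "script") would drop every script element from the article, including ones nested inside other elements.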