C-Sharp
20 Aug 2021
How to remove any tag/element from an HTML string in C#
I was working on a project where an API returns an HTML article as a string. Some of these articles have an inline style section at the top, which was breaking my UI, so I decided to remove the whole style section from the HTML string. Here is how I did it.
HTML
<style type="text/css">
.body{
padding: 1rem;
}
.page-title{
/* some style */
}
</style>
<article>
<h1>...</h1>
</article>
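I am using LINQ to XML to remove the element, because this HTML is also well-formed XML. Note that this only works when every tag is closed and the markup has no HTML-only entities such as &nbsp;; otherwise XElement.Parse throws an XmlException. Here is how it goes.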
C#
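using System.Xml.Linq;

// Get the HTML string from an API
var htmlString = await someService.GetHtmlStringAsync(ArticleId);
if (!string.IsNullOrWhiteSpace(htmlString))
{
    // Wrap in a single root element so the fragment parses as one XML tree.
    // XElement.Parse never returns null; it throws XmlException on malformed markup.
    var xmlDoc = XElement.Parse("<div>" + htmlString + "</div>");

    // Remove every top-level <style> element (use Descendants("style") for nested ones).
    // Elements() returns an empty sequence, never null, when nothing matches.
    xmlDoc.Elements("style").Remove();

    // Serialize only the child nodes so the wrapper <div> is not part of the result
    htmlString = string.Concat(xmlDoc.Nodes());
}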
And here is the result:
HTML
<article>
<h1>...</h1>
</article>
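Since the title promises "any tag/element", here is a minimal sketch of the same idea as a reusable helper. The class name HtmlCleaner and the method RemoveElements are hypothetical, not part of the project above. The sketch assumes the input is well-formed XML and simply returns the input unchanged when it is not; it also matches tag names case-insensitively and at any depth, which goes a bit beyond the snippet above.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class HtmlCleaner
{
    // Hypothetical helper: removes every element with the given tag name,
    // at any depth, from an HTML fragment that is also well-formed XML.
    public static string RemoveElements(string html, string tagName)
    {
        if (string.IsNullOrWhiteSpace(html))
        {
            return html;
        }

        XElement root;
        try
        {
            // Wrap in a single root so fragments with several top-level nodes parse.
            // PreserveWhitespace keeps the original line breaks and indentation.
            root = XElement.Parse("<div>" + html + "</div>", LoadOptions.PreserveWhitespace);
        }
        catch (XmlException)
        {
            // Not well-formed XML (e.g. an unclosed <br> or an &nbsp; entity):
            // return the input untouched instead of throwing.
            return html;
        }

        // XML names are case-sensitive, HTML tag names are not, so compare
        // local names ignoring case. Remove() snapshots the matches first.
        root.Descendants()
            .Where(e => string.Equals(e.Name.LocalName, tagName, StringComparison.OrdinalIgnoreCase))
            .Remove();

        // Serialize only the children, without re-indenting, so the wrapper
        // <div> is dropped and the remaining markup keeps its layout.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

With this in place, the earlier snippet reduces to htmlString = HtmlCleaner.RemoveElements(htmlString, "style"); and the same call with "script" would strip script blocks. For HTML that is not valid XML, a real HTML parser is the safer choice; this sketch deliberately leaves such input untouched.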