Remove All Strings In { } Delimiter Using Regex Or Html Agility Pack In ASP.NET Web Forms
Solution 1:
I have been using HtmlAgilityPack to load a web page and extract only its text content. When I loaded the page and extracted the text, the CSS and JavaScript text was extracted as well. I first tried a regex to strip the JavaScript and CSS from the output by detecting the { } delimiters, but that proved hard. I then tried another way that works and is much simpler: removing the unwanted nodes with the Descendants() method from HtmlAgilityPack. My code is:
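// Requires: using System.Linq; using System.Text.RegularExpressions; using HtmlAgilityPack;
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
// Remove <script>, <style>, and comment nodes so their text does not end up in InnerText.
doc.DocumentNode.Descendants()
    .Where(n => n.Name == "script" || n.Name == "style" || n.Name == "#comment")
    .ToList()   // copy first, so nodes can be removed while iterating
    .ForEach(n => n.Remove());
string s = doc.DocumentNode.InnerText;
// Strip tabs, newlines, and any leftover tags.
TextArea1.Value = Regex.Replace(s, @"\t|\n|<.*?>", "");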
I found this approach here: THIS LINK
and everything works now.
Solution 2:
Why don't you simply try:
/\{.*?\}/g
and replace with nothing.
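In .NET the /.../g wrapper is not written; Regex.Replace already replaces every match. Below is a minimal sketch of this suggestion in C#. The class and method names are only illustrative, and RegexOptions.Singleline is an assumption added here so that . also spans line breaks. Because the lazy pattern stops at the first }, it does not handle nested braces.

```csharp
using System;
using System.Text.RegularExpressions;

static class SimpleBraceStripper
{
    // Removes each shortest "{ ... }" span, braces included. Not nesting-aware.
    public static string Strip(string input)
    {
        // Singleline (an assumption, not part of the original answer)
        // lets '.' match '\n' so multi-line blocks are removed too.
        return Regex.Replace(input, @"\{.*?\}", "", RegexOptions.Singleline);
    }
}
```

For example, SimpleBraceStripper.Strip("a{b}c") returns "ac".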
Solution 3:
You have nested braces.
In Perl and PHP, you could match the nested braces using (?R) (recursion syntax); Ruby offers \g<0> for the same purpose. But .NET does not have recursion. Does this mean we are lost? Luckily, no.
Balancing Groups to the Rescue
C# regex cannot use recursion, but it has an awesome feature called balancing groups.
This regex will match complete nested braces.
(?<counter>{)(?>(?<counter>{)|(?<-counter>})|[^{}]+)+?(?(counter)(?!))
For instance, it will match
{sdfs{sdfs}sd{d{ab}}fs}
{ab}
But not:
{aa
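As a rough sketch of how this pattern could be applied to the original problem, the snippet below removes every balanced { ... } group from a string. The class and method names are hypothetical; only the pattern itself comes from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class NestedBraceStripper
{
    // "counter" is pushed on each '{' and popped on each '}'.
    // The final conditional fails the match if any '{' is still open.
    private static readonly Regex Balanced = new Regex(
        @"(?<counter>{)(?>(?<counter>{)|(?<-counter>})|[^{}]+)+?(?(counter)(?!))");

    // Removes every balanced brace group, including nested ones.
    public static string Strip(string input)
    {
        return Balanced.Replace(input, "");
    }
}
```

For example, NestedBraceStripper.Strip("x{sdfs{sdfs}sd{d{ab}}fs}y{ab}z") returns "xyz", while an unbalanced input such as "{aa" is left unchanged.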
Solution 4:
If you want to match every span from '{' to the next '}', including every character between the pair that isn't '}', use the following:
/\{[^\}]+\}/g
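A small C# sketch of this pattern follows, with hypothetical class and method names. Two properties are worth noting: because of the +, an empty pair {} is not matched, and on nested input the match ends at the first }.

```csharp
using System;
using System.Text.RegularExpressions;

static class FlatBraceStripper
{
    // Removes "{...}" spans whose content is one or more non-'}' characters.
    public static string Strip(string input)
    {
        return Regex.Replace(input, @"\{[^}]+\}", "");
    }
}
```

For example, FlatBraceStripper.Strip("a{b}c{}d") returns "ac{}d".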
Solution 5:
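// Removes the text between each '{' and the next '}' (the braces themselves are kept).
// "text" is assumed to hold the string to clean; Remove returns a new string.
int x = 0;
while (true)
{
    int open = text.IndexOf('{', x);
    if (open < 0) break;          // no more opening braces
    x = open + 1;                 // first character after '{'
    int y = text.IndexOf('}', x);
    if (y < 0) break;             // unmatched '{': stop
    text = text.Remove(x, y - x); // drop everything between the braces
}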