regex - Regular Expression for Digits and Special Characters - C#


I use Html Agility Pack to extract information from websites. In the process I get data in the form of strings, and I use that data in my program.

Sometimes the data includes the same detail more than once in a single string. For example, the name of a movie comes back as "dog eats dog (2012) (2012)". The name should have been "dog eats dog (2012)" rather than the first one.

The above is one example of many. To correct this issue I tried the string.Distinct() method to remove duplicates, but for the example above it returns "dog eats (2012)". That solved the initial problem by removing the second (2012), but it created a new one by changing the actual title.

I thought the problem could be solved with a regex, but I have no idea how to use one here. As far as I know, a regex will only tell me whether there are duplicate items in the string according to the pattern I define.

But how do I remove the duplicate? There can be a string like "meme 2013 (2013) (2013)". Here the actual title is "meme 2013", followed by the year (2013) and a duplicate year (2013). Even if I get a bool value indicating that the string has a duplicate year, I can't think of a method to remove the duplicate substring.

The duplicate year always comes at the end of the string. What regex should I use to determine that a string has two years in it, like (2012) (2012)?

If I can correctly identify that a string contains a duplicate, maybe I can use string.LastIndexOf() to try to remove the duplicate part. If there is a better way, please let me know.

Thanks.

The right regex is "( \(\d{4}\))\1+".

// requires: using System.Text.RegularExpressions;
string pattern = @"( \(\d{4}\))\1+";
string result = new Regex(pattern).Replace(s, "$1"); // s is the input title

Example here: https://repl.it/evcy/2

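Explanation:
Capture one " (dddd)" block (a space followed by a four-digit year in parentheses), then remove the identical blocks that follow it.
( \(\d{4}\)) is the capture group; \1+ matches a non-empty sequence of copies of the captured block.

Finally, replace the initial block and its copies with the initial block alone. Because \1 repeats the exact captured text, two different years side by side, such as (2012) (2013), are left untouched.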
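The steps above can be sketched as a small self-contained program run against the two titles from the question. The class name TitleCleaner and the method name RemoveDuplicateYears are hypothetical and only here for illustration; the pattern and the "$1" replacement are the ones from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleCleaner
{
    // One " (dddd)" block followed by one or more identical copies of it.
    private static readonly Regex DuplicateYear = new Regex(@"( \(\d{4}\))\1+");

    // Hypothetical helper: collapses a run of identical year blocks into one.
    public static string RemoveDuplicateYears(string title)
    {
        // "$1" re-inserts the captured block once in place of the whole run.
        return DuplicateYear.Replace(title, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDuplicateYears("dog eats dog (2012) (2012)")); // dog eats dog (2012)
        Console.WriteLine(RemoveDuplicateYears("meme 2013 (2013) (2013)"));    // meme 2013 (2013)
    }
}
```

The bare 2013 in the second title is not touched because the pattern requires the parentheses. Since the question says the duplicate always sits at the end of the string, appending $ to the pattern would presumably restrict the match to that position, but the unanchored form already handles both examples.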

