Setting a regex in Python for a complex string
I have a string of product ingredients like this:
text = 'pork , beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (e450), glucose, antioxidant (e316), preservative (e250), flavorings'
I want to detect the ingredient names in the text, so the result should look like this:
ingredientslist= ['pork , beef', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'preservative', 'flavorings']
The regex I am currently using is the following:
ingredients = re.findall(r'\([^()]*\)|([^\W\d]+(?:\s+[^\W\d]+)*)', text)
But it does not return the text inside brackets. I do not want to include the codes and percentages, but I do want the ingredients inside brackets. What should I do here? Thanks in advance.
You may restrict the first branch to match only codes that start with e followed by a number:
\(e\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)
See the regex demo.
Now, \(e\d+\) matches only (eXXX)-like substrings, and everything else is processed by the second branch. You may add percentages here, too, to explicitly skip them: \((?:e\d+|\d+(?:[.,]\d+)?%)\).
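import re

rx = r"\(e\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
s = "pork , beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (e450), glucose, antioxidant (e316), preservative (e250), flavorings"
# re.findall returns group 1 for every match; a skipped (eXXX) code leaves an empty string, which is filtered out
res = [x for x in re.findall(rx, s) if x]
print(res)
# ['pork', 'beef', 'water', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'preservative', 'flavorings']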
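The percentage-skipping variant mentioned above can be sketched the same way. This is only a minimal illustration: the names SKIP_OR_NAME and extract_ingredients are hypothetical, and the re.IGNORECASE flag is an added assumption so that uppercase codes such as (E450) are skipped as well.

```python
import re

# Hypothetical name. First branch: skip "(e450)"-style codes and "(1,7%)"-style
# percentages. Second branch (captured): runs of letters separated by whitespace.
# re.IGNORECASE is an assumption, added so "(E450)" is skipped too.
SKIP_OR_NAME = re.compile(
    r"\((?:e\d+|\d+(?:[.,]\d+)?%)\)|([^\W\d]+(?:\s+[^\W\d]+)*)",
    re.IGNORECASE,
)

def extract_ingredients(text):
    """Return ingredient names, dropping the empty strings left by skipped spans."""
    # findall yields group 1 for each match; the skip branch leaves it empty.
    return [name for name in SKIP_OR_NAME.findall(text) if name]

s = ("pork , beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, "
     "cardamom), stabilizer (e450), glucose, antioxidant (e316), "
     "preservative (e250), flavorings")
print(extract_ingredients(s))
```

On the sample string this gives the same list as the shorter pattern, because a percentage contains no letters for the second branch to pick up; the explicit branch simply makes the intent to skip it visible.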