Setting regex in Python for a complex string

I have a string listing a product's ingredients, like this:

text = 'pork , beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (e450), glucose, antioxidant (e316), preservative (e250), flavorings'

I want to extract the ingredient names so that the result looks like this:

ingredientslist= ['pork , beef', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'preservative', 'flavorings']

The regex I am currently using is the following:

ingredients = re.findall(r'\([^()]*\)|([^\W\d]+(?:\s+[^\W\d]+)*)', text)

But it does not return the text inside the brackets. I do not want to include the codes and percentages, but I do want the ingredients inside the brackets. What should I do here? Thanks in advance.

You may restrict the first branch so that it matches only codes that start with e followed by a number:

\(e\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)

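Now, \(e\d+\) matches only (eXXX)-like substrings and consumes them without capturing, while all other text, including the ingredients inside brackets, is processed by the second branch. You may add percentages here, too, to skip them explicitly: \((?:e\d+|\d+(?:[.,]\d+)?%)\).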

Python demo:

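import re
# Branch 1 consumes (e450)-like codes without capturing; branch 2 captures runs of letters.
rx = r"\(e\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
s = "pork , beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (e450), glucose, antioxidant (e316), preservative (e250), flavorings"
# A match of branch 1 leaves group 1 empty, so drop the empty strings.
res = [x for x in re.findall(rx, s) if x]
print(res)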
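The percentage alternative mentioned in the answer can be tried the same way. The following is only a minimal sketch: the helper name extract_ingredients, the constant PATTERN, and the shortened sample string are assumptions made for illustration and are not part of the original answer.

```python
import re

# PATTERN and extract_ingredients are hypothetical names used for this sketch.
# Branch 1 consumes "(e450)"-style codes and "(1,7%)"-style percentages
# without capturing them; branch 2 captures runs of letters that may be
# separated by whitespace (for example "white pepper").
PATTERN = re.compile(
    r"\((?:e\d+|\d+(?:[.,]\d+)?%)\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
)

def extract_ingredients(text):
    # With one capturing group, findall returns the group 1 value for each
    # match; it is an empty string whenever branch 1 matched.
    return [name for name in PATTERN.findall(text) if name]

# Shortened sample, assumed for illustration.
sample = "salt (1,7%), spices (white pepper, nutmeg), stabilizer (e450), glucose"
print(extract_ingredients(sample))
```

If the real data contains upper-case codes such as (E450), compiling the pattern with re.IGNORECASE would be one way to cover them; that depends on the input and is not something the original question states.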
