Setting regex in Python for a complex string


I have a string of product ingredients like this:

text = 'pork , beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (e450), glucose, antioxidant (e316), preservative (e250), flavorings' 

I want to detect the ingredient text so that the result looks like this:

ingredientslist= ['pork , beef', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'preservative', 'flavorings'] 

The regex I am currently using is the following:

ingredients = re.findall(r'\([^()]*\)|([^\W\d]+(?:\s+[^\W\d]+)*)', text)

But it does not return the text inside the brackets. I do not want to include the codes and percentages, but I do want the ingredients inside the brackets. What should I do here? Thanks in advance.

You may restrict the first branch so that it matches only codes that start with e followed by a number:

\(e\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)


Now, \(e\d+\) matches only (eXXX)-like substrings, and everything else is handled by the second branch. You may add percentages here as well to skip them explicitly: \((?:e\d+|\d+(?:[.,]\d+)?%)\).
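To see why narrowing the first branch matters, here is a small sketch comparing the two patterns. The shortened sample string and the helper name extract are mine and are for illustration only:

```python
import re

# Shortened sample based on the ingredient string in the question.
sample = "spices (white pepper, nutmeg), stabilizer (e450), glucose"

# Runs of letters, optionally joined by whitespace (captured in group 1).
# [^\W\d] means "a word character that is not a digit".
words = r"([^\W\d]+(?:\s+[^\W\d]+)*)"

broad = r"\([^()]*\)|" + words   # first branch consumes ANY parenthesized group
narrow = r"\(e\d+\)|" + words    # first branch consumes only (e<digits>) codes

def extract(pattern, text):
    # findall returns group 1 for each match; it is '' when the first branch matched.
    return [m for m in re.findall(pattern, text) if m]

broad_result = extract(broad, sample)
narrow_result = extract(narrow, sample)
print(broad_result)   # the spices inside the brackets are lost
print(narrow_result)  # the spices inside the brackets are kept
```

With the broad branch the whole "(white pepper, nutmeg)" group is consumed and discarded, while the narrow branch only discards "(e450)".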

Python demo:

import re

rx = r"\(e\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
s = "pork , beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (e450), glucose, antioxidant (e316), preservative (e250), flavorings"
# Keep only non-empty captures; matches of the first branch yield ''.
res = [x for x in re.findall(rx, s) if x]
print(res)
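The variant that also skips percentages explicitly could be sketched as follows. Note that re.IGNORECASE, the function name extract_ingredients, and the sample label are my own assumptions and are not part of the original answer; the flag is there so that codes written as "E450" are skipped as well:

```python
import re

# Assumption: re.IGNORECASE is added so that "(E450)" is skipped like "(e450)".
SKIP_OR_WORDS = re.compile(
    r"\((?:e\d+|\d+(?:[.,]\d+)?%)\)"   # skip (e450)-style codes and (1,7%)-style percentages
    r"|([^\W\d]+(?:\s+[^\W\d]+)*)",    # capture letter-only words joined by whitespace
    re.IGNORECASE,
)

def extract_ingredients(text):
    """Return the non-empty captures of group 1, in order of appearance."""
    # group(1) is None whenever the skipping branch matched.
    return [m.group(1) for m in SKIP_OR_WORDS.finditer(text) if m.group(1)]

# Hypothetical label text, modelled on the string in the question.
label = "Salt (1,7%), spices (white pepper, nutmeg), stabilizer (E450), glucose"
result = extract_ingredients(label)
print(result)
```

Since the second branch only matches letters, a percentage such as "(1,7%)" would not be captured anyway; listing it in the first branch just makes the intent explicit.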
