Poor regex performance for word matching on Postgres

I have a list of blocked phrases and I want to check whether any of those phrases occurs in user-submitted text, but performance is very bad.
I am using this query:
SELECT value FROM blocked_items WHERE lower(unaccent( 'my input text' )) ~* ('[[:<:]]' || value || '[[:>:]]') LIMIT 1;
After investigating, I found that the word boundary assertions [[:<:]] and [[:>:]] perform very badly, given that blocked_items has 24k records in it.
For instance, when I run this one:
SELECT value FROM blocked_items WHERE lower(unaccent( 'my input text' )) ilike ('%' || value || '%') LIMIT 1;
it's very fast compared to the first one. The problem is that I need to keep the word boundary test.
This check is performed frequently in a large application, so performance is very important to me.
Do you guys have any suggestions for making this faster?
EXPLAIN ANALYZE screenshot
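The screenshot itself is not reproduced here; the comparison behind it amounts to running the two queries above under EXPLAIN ANALYZE:

-- Original word-boundary version (slow)
EXPLAIN ANALYZE
SELECT value FROM blocked_items
WHERE lower(unaccent('my input text')) ~* ('[[:<:]]' || value || '[[:>:]]')
LIMIT 1;

-- Substring version (fast, but no word boundaries)
EXPLAIN ANALYZE
SELECT value FROM blocked_items
WHERE lower(unaccent('my input text')) ILIKE ('%' || value || '%')
LIMIT 1;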
You have not just removed the word boundary assertions, you have changed the operator from regex matching to substring matching. Of course doing fundamentally different things takes different amounts of time. If I simply add or remove word boundary assertions while still using the same operator, I get trivial differences in time. Of course I had to make up my own regex, because I don't know what is in your blocked_items table.
– jjanes
Jun 30 at 13:51
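A minimal sketch of the like-for-like comparison jjanes describes: keep the ~* operator in both queries and only toggle the boundary assertions, so any timing difference comes from the boundary checks alone.

-- Bare regex match, no boundary assertions
EXPLAIN ANALYZE
SELECT value FROM blocked_items
WHERE lower(unaccent('my input text')) ~* value
LIMIT 1;

-- Same operator with boundary assertions; per jjanes, the difference is trivial
EXPLAIN ANALYZE
SELECT value FROM blocked_items
WHERE lower(unaccent('my input text')) ~* ('[[:<:]]' || value || '[[:>:]]')
LIMIT 1;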
Maybe you can use Full Text Search? That might perform better.
– sticky bit
Jun 30 at 13:13
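A minimal sketch of the full text search idea, assuming blocked phrases of one or more words in blocked_items.value: the 'simple' configuration avoids stemming, to_tsvector already lowercases its input, and phraseto_tsquery (PostgreSQL 9.6+) matches whole words by construction.

-- Hypothetical FTS variant of the check; word boundaries come for free
-- because to_tsvector splits the input into individual words before matching.
SELECT value
FROM blocked_items
WHERE to_tsvector('simple', unaccent('my input text'))
      @@ phraseto_tsquery('simple', value)
LIMIT 1;

Whether this actually beats the regex scan depends on the data, since a tsquery still has to be built for each blocked phrase.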