Reinhart Previano K.

The whole Trust+ / Internet Sehat blocklist database, now in one regular expression;


(#_ )!

At the end of 2022, I decided to experiment with building a lightweight Indonesian internet blocklist database that can be consumed offline.

No network connections to the servers of Kominfo, Telkom Indonesia, or community-run services like indi.wtf, because all you need is a freakin' huge regular expression.

Research methods

We wrote a simple Go script to compile the official Indonesian internet blocklist, found at https://trustpositif.kominfo.go.id, into a freakin' huge trie. That trie is then converted into a regular expression.
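The trie-to-regex step can be sketched roughly as follows. This is a minimal illustration, not the script from the repo: the names `node`, `insert`, `pattern`, `buildRegex`, and `matches` are all made up for this example, and the three sample domains are placeholders. Shared prefixes are stored once in the trie, and each branching point becomes a non-capturing alternation group.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// node is one trie node; end marks that a complete domain stops here.
type node struct {
	children map[rune]*node
	end      bool
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// insert adds one domain to the trie, character by character.
func (n *node) insert(s string) {
	cur := n
	for _, r := range s {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.end = true
}

// pattern renders everything below n as a regex fragment.
func (n *node) pattern() string {
	if len(n.children) == 0 {
		return ""
	}
	// Sort the keys so the output is deterministic.
	keys := make([]rune, 0, len(n.children))
	for r := range n.children {
		keys = append(keys, r)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	alts := make([]string, 0, len(keys))
	for _, r := range keys {
		alts = append(alts, regexp.QuoteMeta(string(r))+n.children[r].pattern())
	}
	body := alts[0]
	if len(alts) > 1 || n.end {
		body = "(?:" + strings.Join(alts, "|") + ")"
	}
	if n.end {
		// A domain ends at this node, so the rest is optional.
		body += "?"
	}
	return body
}

// buildRegex (hypothetical helper) turns a domain list into one anchored regex.
func buildRegex(domains []string) string {
	root := newNode()
	for _, d := range domains {
		root.insert(d)
	}
	return "^" + root.pattern() + "$"
}

// matches reports whether the generated pattern accepts s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	pattern := buildRegex([]string{"a.com", "a.co", "b.com"})
	fmt.Println(pattern) // ^(?:a\.co(?:m)?|b\.com)$
	fmt.Println(matches(pattern, "a.co"), matches(pattern, "a.c")) // true false
}
```

On a toy list like this the standard `regexp` package is enough; the full-size blocklist is a different story, as described in the results.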

To check whether the regex is effective, we tested the generated regex against the original list of blocked domains.
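A check like this could look roughly like the sketch below. The function name `missed`, the sample pattern, and the sample domains are assumptions made for illustration; the real test ran against the full blocklist.

```go
package main

import (
	"fmt"
	"regexp"
)

// missed returns every domain in the list that the pattern fails to match.
func missed(pattern string, domains []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, d := range domains {
		if !re.MatchString(d) {
			out = append(out, d)
		}
	}
	return out, nil
}

func main() {
	// Placeholder data standing in for the real blocklist and its regex.
	domains := []string{"a.com", "a.co", "b.com"}
	pattern := `^(?:a\.co(?:m)?|b\.com)$`

	miss, err := missed(pattern, domains)
	if err != nil {
		fmt.Println("pattern failed to compile:", err)
		return
	}
	fmt.Printf("matched %d of %d domains\n", len(domains)-len(miss), len(domains))
}
```

Note that this only measures how many listed domains the regex recognizes. It says nothing about domains that match but were never on the list.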

Results

The experiment produced a 20MB-ish regex file, representing the freakin' huge trie I mentioned earlier. That said, there are always ways to improve, including reversing the characters of each domain (e.g. "alterine0101.id" ➡️ "di.1010eniretla") to yield more compact results (because more domains end with ".com" than start with "www.", so the reversed strings share more prefixes in the trie).
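To see why reversing helps, here is a small sketch that counts how many trie nodes a list needs before and after reversal. The helper names `reverse` and `trieNodes` and the three sample domains are made up for this example. The node count is approximated as the number of distinct non-empty prefixes.

```go
package main

import "fmt"

// reverse returns s with its characters in reverse order.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// trieNodes counts the nodes a trie of words would need,
// which equals the number of distinct non-empty prefixes.
func trieNodes(words []string) int {
	seen := map[string]bool{}
	for _, w := range words {
		r := []rune(w)
		for i := 1; i <= len(r); i++ {
			seen[string(r[:i])] = true
		}
	}
	return len(seen)
}

func main() {
	domains := []string{"a.com", "b.com", "c.com"}
	reversed := make([]string, len(domains))
	for i, d := range domains {
		reversed[i] = reverse(d)
	}
	fmt.Println(reverse("alterine0101.id"))              // di.1010eniretla
	fmt.Println(trieNodes(domains), trieNodes(reversed)) // 15 7
}
```

With a regex built from reversed domains, each candidate domain has to be reversed the same way before it is matched.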

Unfortunately, these gigantic regex files cannot be parsed by Go's standard regexp package, so we decided to use the regexp2 library instead, which is based on Microsoft's regex parser implementation for .NET.

And even after switching to regexp2, only the reversed version of the regex works well. I feel confident that the generated regex is 99.9% accurate; it was tested on my M1 MacBook Air with no issues.

You can see my GitHub repo here for the code and the results. Feel free to use it as a benchmark tool for PCRE regex engines out there. We may eventually update the blocked domains list to keep these regex-based blocklists fresh.

That's all and (#_ )!


Reinhart Previano Koentjoro