Text Filter & Filter Packs

Maxthon 2.0's text filter is the most effective way of removing ads, annoyances, and malicious content from web pages. Essentially, the text filter uses regular expressions to match and replace the source code of web pages on the fly, before the rendering engine has a chance to look at it. The text filter can therefore not only remove any unwanted content from web pages but also change how web pages appear or behave on the user's computer. Before Maxthon 2.0, text filtering was available only in sophisticated, dedicated web filtering applications such as Proxomitron and Privoxy.

To enable the use of Text Filter in Maxthon 2.0:
Tools > Maxthon Setup Center > Ad Hunter > Enable Text Filter & Filter Packs

Writing text filters requires some moderately advanced skills, which will be discussed in the next posts. But users don't need to write their own filters to enjoy the advantages of the text filter. In Maxthon 2.0, text filters can be conveniently shared in the form of a Filter Pack, which is a collection of text filters packed in a file with the extension "m2f". A filter pack is installed automatically when it is double-clicked or dragged into Maxthon 2.0's main window. Installed filter packs can be enabled or disabled individually from the status bar, and edited or removed in Maxthon Setup Center > Filter Packs.



A filter pack can be made global, i.e. it contains text filters with no restriction on where they are applied. If a global filter pack causes problems on specific websites, the user can exclude those sites by editing the text filters.

The following are some global filter packs for various purposes:
Disable Mouse Restrictions
Disable Custom Cursors
Disable Custom Scrollbar
Disable Blank Download Tab
Open Links in New Tab

Generally, however, it is advisable to make filter packs site specific, i.e. to restrict the filters to work only on specific sites. Being site specific has the advantage that the filters are skipped while loading sites they do not apply to. This not only improves page loading speed but also reduces erroneous filtering. In addition, users can choose to install only the filter packs relevant to the websites they visit.

The following are some site specific filter packs for removing ads in specific sites:
Yahoo Sports
Nba.com
Espn

Writing text filters requires some skill. A text filter author must be able to identify the HTML/JS/CSS code corresponding to ads, annoyances, or malicious content, so some HTML knowledge is needed. The author must then be able to match the identified code with an expression so that it can be replaced or removed; in the case of Maxthon 2.0, this requires some knowledge of standard regular expressions. Finally, the author needs to know the format of Maxthon 2.0 text filters, which is documented in detail here.

As complicated as it sounds, it is actually not difficult to block ads with the text filter. But why would users block ads with the relatively complicated text filter when the good old content filter works just fine? The content filter does work well in most cases, but it has limitations:

Example 1 - text ad

CODE
<a href=...><font size=7>You have won a lottery, check it out!!!</font></a>

The content filter can do nothing about this.

Example 2 - empty space

CODE
<table height=90 width=728><tr><td><a href=...><img src=ad.gif></a></td></tr></table>

The content filter can block "ad.gif", but leaves an empty space defined by the <table> element. The text filter can remove the entire <table>.

Example 3 - random ad

CODE
<div id=upperad><a href=adclick?...><img src=random_image></a></div>

The content filter can block this random_image (meaning an image with a random name, not one literally containing the word "random"), but may not block the next one. The text filter can instead remove the link which contains "adclick?", or the <div> which has the id "upperad".

Example 4 - ad section

CODE
<!-- start ads -->
<a href=...><img src=pic.gif></a>
<a href=...><img src=image.gif></a>
<a href=...><img src=graphic.gif></a>
<a href=...><img src=icon.gif></a>
<a href=...><img src=art.gif></a>
<!-- end ads -->

The content filter can block "pic.gif", "image.gif", "graphic.gif", "icon.gif", "art.gif" one by one. The text filter can instead simply remove everything from "<!-- start ads -->" to "<!-- end ads -->" in one go.
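As a rough illustration of Examples 2 to 4, the sketch below uses Python's `re` module as a stand-in for the text filter. It is not Maxthon's engine: the `strip` helper, the `href="#"` values and the image names are made up for the demo, and `re.DOTALL` is used on the assumption that the filter lets `.` run across line breaks.

```python
import re

def strip(pattern: str, html: str) -> str:
    """Remove every match of pattern, letting '.' span line breaks."""
    return re.sub(pattern, "", html, flags=re.DOTALL)

# Hypothetical page fragments modelled on Examples 2, 3 and 4.
banner = '<table height=90 width=728><tr><td><a href="#"><img src=ad.gif></a></td></tr></table>'
random_ad = '<div id=upperad><a href="adclick?x"><img src=r4nd0m.gif></a></div>'
section = (
    '<!-- start ads -->\n'
    '<a href="#"><img src=pic.gif></a>\n'
    '<a href="#"><img src=art.gif></a>\n'
    '<!-- end ads -->'
)
page = "<p>top</p>" + banner + random_ad + section + "<p>bottom</p>"

cleaned = strip(r"<table height=90 width=728>.*?</table>", page)  # Example 2: whole table
cleaned = strip(r"<div id=upperad>.*?</div>", cleaned)            # Example 3: div by id
cleaned = strip(r"<!-- start ads -->.*?<!-- end ads -->", cleaned)  # Example 4: whole section

print(cleaned)  # <p>top</p><p>bottom</p>
```

Each pattern removes the surrounding markup as well as the image, which is what a URL-based content filter cannot do.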

...

Naturally, if the content filter already serves your needs, you don't have to use the text filter. If not, the text filter is always there for you to call on.

Now let's walk through the process.

 

Create a Filter Pack

- Tools > Maxthon Setup Center > Filter Packs > Create New

- Enter a name eg. "My Filters" and click OK

- Enable the new filter pack and select "Edit"

- A dialog containing 2 sample filters (for illustration only; both can be deleted) will open

- Click the "Properties" button and change the information accordingly

 

 

Add a Text Filter

- Click the "Add" button and select "Text Filter" from the drop down menu

- Enter a name eg. "Filter1" and click OK

- Select "Filter1" on the left panel and enter relevant information on the right panel

 

 

For ordinary ad blocking, only the following fields are required in most cases:

- action: the action to take when a match is found, just enter "3" (replace) here

- match: the regular expression which will be used for matching

- replace: the replacement for the matched content, can be left blank for ad removal

- match_url: the url(s) where the filter should work, will work on all pages if left blank

- exclude_url: the url(s) where the filter should not work, has higher priority than match_url

- bound: the regular expression for "pre-matching", more on this later.
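To make the interplay of these fields concrete, here is a minimal Python model. It is only a sketch: the `TextFilter` class and its methods are hypothetical names, not Maxthon's data model, and it assumes that match_url and exclude_url are regular expressions searched against the page URL (which the escaped dots in the cases below suggest). The "bound" field is left out here.

```python
import re
from dataclasses import dataclass

@dataclass
class TextFilter:
    """Hypothetical stand-in for one text filter with action=3 (replace)."""
    match: str
    replace: str = ""
    match_url: str = ""
    exclude_url: str = ""

    def applies_to(self, url: str) -> bool:
        # exclude_url has higher priority than match_url
        if self.exclude_url and re.search(self.exclude_url, url):
            return False
        # a blank match_url means the filter works on all pages
        return not self.match_url or re.search(self.match_url, url) is not None

    def apply(self, url: str, html: str) -> str:
        if not self.applies_to(url):
            return html  # filter skipped, page source untouched
        return re.sub(self.match, self.replace, html, flags=re.DOTALL)

# example.com and other.org are placeholder sites for the demo.
f = TextFilter(
    match=r"<!-- start ads -->.*?<!-- end ads -->",
    match_url=r"example\.com",
    exclude_url=r"example\.com/safe",
)
page = "<p>a</p><!-- start ads --><img src=pic.gif><!-- end ads --><p>b</p>"

print(f.apply("http://example.com/news", page))    # <p>a</p><p>b</p>
print(f.apply("http://example.com/safe/x", page))  # unchanged: excluded
print(f.apply("http://other.org/", page))          # unchanged: not in match_url
```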

Case 1

nba.com contains ads which are enclosed in the comment tags <!-- begin ad tag --> and <!-- End ad tag -->, eg.

CODE
<!-- begin ad tag -->
<script language="javascript" type="text/javascript">
document.write('<script language="javascript1.1" ...
...
<!-- End ad tag -->


We can remove all such code as follows (the words in parentheses are explanations, not part of the filter):

action=3 (replace matched contents)
match=<!-- begin ad tag -->.*?<!-- End ad tag --> (see note 3 below)
replace= (empty replacement)
match_url=nba\.com (only active in nba.com)
exclude_url= (no exclude_url is required in this case)
bound= (no bound is required in this case)

Note
1. In a regular expression, . means any single character (equivalent to ? in traditional wildcards). To specify a literal . , \. is required
2. In a regular expression, .* means any number of any character (equivalent to * in wildcards). But we should use .*? so that the match stops after the first occurrence of "<!-- End ad tag -->"
3. <!-- begin ad tag -->.*?<!-- End ad tag --> means matching <!-- begin ad tag --> and then anything until after the first <!-- End ad tag -->
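The difference between .* and .*? in note 2, and between . and \. in note 1, can be checked with a few lines of Python. This is a sketch using Python's `re` module, not Maxthon itself; the sample page and the host name `nbaxcom.example` are invented for the demo, and `re.DOTALL` is assumed so that `.` also matches line breaks.

```python
import re

# Hypothetical page with two ad blocks and real content in between.
html = (
    "<p>score</p>"
    "<!-- begin ad tag --><script>ad1</script><!-- End ad tag -->"
    "<p>article</p>"
    "<!-- begin ad tag --><script>ad2</script><!-- End ad tag -->"
    "<p>footer</p>"
)

# Greedy .* runs to the LAST end tag and swallows the article as well.
greedy = re.sub(r"<!-- begin ad tag -->.*<!-- End ad tag -->", "", html, flags=re.DOTALL)
# Lazy .*? stops after the FIRST end tag, so each ad block is removed separately.
lazy = re.sub(r"<!-- begin ad tag -->.*?<!-- End ad tag -->", "", html, flags=re.DOTALL)

print(greedy)  # <p>score</p><p>footer</p>
print(lazy)    # <p>score</p><p>article</p><p>footer</p>

# An unescaped dot matches any character, so it also accepts unrelated hosts.
print(bool(re.search(r"nba.com", "http://nbaxcom.example/")))   # True
print(bool(re.search(r"nba\.com", "http://nbaxcom.example/")))  # False
```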

Case 2

Most ads on sports.yahoo.com are hosted in table cells <td> which include an "ADVERTISEMENT" note, eg:

CODE
<td align="center"><font face="arial" size="-2">ADVERTISEMENT</font><br><IFRAME SRC=...
...
</td>


We can remove all such code as follows (the words in parentheses are explanations, not part of the filter):

action=3 (replace matched contents)
match=<td.*?><font.*?>ADVERTISEMENT.*?</td> (see notes below)
replace= (empty replacement)
match_url=sports\.yahoo\.com (only active in sports.yahoo.com)
exclude_url= (no exclude_url is required in this case)
bound= (no bound is required in this case)

Note
1. <td.*?> means matching <td and then anything until after the first >
2. <font.*?> means matching <font and then anything until after the first >
3. ADVERTISEMENT.*?</td> means matching ADVERTISEMENT and then anything until after the first </td>
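The Case 2 pattern can be tried out in Python as below. This is a sketch, not Maxthon's engine; the sample rows, the `ad.html` source and the "Box score" cell are invented. It also shows one thing worth checking on a real page: because <td.*?> may run across several tags before it reaches <font, an ordinary cell directly in front of the ad cell can be swallowed too. The stricter variant with [^>]* is an alternative suggested here, not part of the original filter.

```python
import re

pattern = r"<td.*?><font.*?>ADVERTISEMENT.*?</td>"     # pattern from Case 2
strict = r"<td[^>]*><font[^>]*>ADVERTISEMENT.*?</td>"  # alternative: stay inside one tag

AD_CELL = (
    '<td align="center"><font face="arial" size="-2">ADVERTISEMENT</font><br>'
    '<IFRAME SRC="ad.html"></IFRAME>\n</td>'
)

# Ad cell first: the Case 2 pattern removes exactly the ad cell.
ad_first = "<tr>" + AD_CELL + "<td>Box score</td></tr>"
print(re.sub(pattern, "", ad_first, flags=re.DOTALL))  # <tr><td>Box score</td></tr>

# Ordinary cell first: the lazy .*? starts at the earlier <td and the cell is lost.
ad_second = "<tr><td>Box score</td>" + AD_CELL + "</tr>"
print(re.sub(pattern, "", ad_second, flags=re.DOTALL))  # <tr></tr>
print(re.sub(strict, "", ad_second, flags=re.DOTALL))   # <tr><td>Box score</td></tr>
```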

Case 3

Each espn.go.com page contains several ads contained in <div> elements with the following ids:

upperad
rightcolad
ad_Poster
ad_InContent
ad_MarketingLogo

eg.
CODE
<div id="upperad" style="width: 772px; height: 90px; background-image: url(http://espn-ak.starwave.com/i/ad_bgd.gif); background-repeat: repeat;">
<center><iframe src...
...
</div>


action=3 (replace matched contents)
match=<div id="(upperad|rightcolad|ad_Poster|ad_InContent|ad_MarketingLogo)".*?</div> (| means "or" )
replace= (empty replacement)
match_url=espn\.go\.com (only active in espn.go.com)
exclude_url= (no exclude_url is required in this case)
bound= (no bound)

The above filter removes the ads, but sometimes pages are distorted, because the <div> element is sometimes nested. It can be <div id="upperad">...</div> or <div id="upperad">...<div>...</div>...</div>. The problem is that in the latter case the match stops at the first </div>, so only <div id="upperad">...<div>...</div> is removed and the trailing ...</div> is left behind. In this case we can use the "bound" field and revise the filter as follows:

action=3 (replace matched contents)
match=<div id="(upperad|rightcolad|ad_Poster|ad_InContent|ad_MarketingLogo)".* (see note2 below)
replace= (empty replacement)
match_url=espn\.go\.com (only active in espn.go.com)
exclude_url= (no exclude_url is required in this case)
bound=<div.*?<div.*?</div>.*?</div>|<div.*?</div> (see note1 below)

Note
1. The "bound" essentially pre-matches a code segment for the subsequent match/replace. In this case the "bound" will return either <div...>...<div...>...</div>...</div> or <div...</div>, so there is no longer any possibility of erroneously matching only up to the inner </div>.
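The effect of "bound" can be imitated in Python as a two-stage substitution. This is only a sketch of the behaviour described in note 1: the helper `apply_with_bound` is hypothetical, and how Maxthon actually implements bound (including max_bound_size) is not modelled. The sample markup is invented.

```python
import re

IDS = "upperad|rightcolad|ad_Poster|ad_InContent|ad_MarketingLogo"
nested = 'A<div id="upperad" style="x"><center><div>ad</div></center></div>B'

# Without bound: the lazy match stops at the inner </div> and leaves debris behind.
naive = re.sub(rf'<div id="({IDS})".*?</div>', "", nested, flags=re.DOTALL)
print(naive)  # A</center></div>B

def apply_with_bound(html: str, bound: str, match: str, replace: str = "") -> str:
    """Pre-match segments with bound, then run match/replace inside each segment."""
    match_re = re.compile(match, re.DOTALL)
    return re.sub(
        bound,
        lambda seg: match_re.sub(replace, seg.group(0)),
        html,
        flags=re.DOTALL,
    )

BOUND = r"<div.*?<div.*?</div>.*?</div>|<div.*?</div>"
MATCH = rf'<div id="({IDS})".*'

print(apply_with_bound(nested, BOUND, MATCH))  # AB

# A nested <div> that is not an ad is pre-matched by bound but left unchanged.
other = 'X<div id="content"><div>text</div></div>Y'
print(apply_with_bound(other, BOUND, MATCH) == other)  # True
```

In this model the greedy .* in the revised match is safe because it can never run past the end of the segment that bound selected.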

So blocking ads with the text filter isn't difficult. If you run into any problem writing a text filter, feel free to discuss it here.

 

And last but not least, why not share your good work with other users?

Tools > Maxthon Setup Center > Filter Packs > [select a filter pack] > Export

 

A filter pack (.m2f file) will be created and can be uploaded to the forum or the addons site for sharing.

Don't mean to be rude, but there's a lack of filters available, and there are more in your screenshot. Maybe you'd like to share some more? :-D I would like a filter for those annoying "keyword" ads, where an ad pops up when you hover over a keyword, the way neowin.net has it. Also one for removing Google ads.
However, I have tested and found that for simple ads, the web content filter seems much faster than the text filter. I tested both filters with 600+ rules. The text filter will sometimes cause Maxthon to go "not responding", and I have to force close and restart Maxthon.
Adblock Plus for Maxthon2 -- SiteListUpdater
Quote: Original posted by pekkle at 2007-02-27 10:10
However, I have tested and found that for simple ads, the web content filter seems much faster than the text filter. I tested both filters with 600+ rules. The text filter will sometimes cause Maxthon to go "not responding", and I have to force close and restart Maxthon.
That could happen with a lot of text filters. But as mentioned in previous posts, text filters for ad blocking are better made site specific. In that case, only relevant text filters (maybe a dozen or two at most) would be applied and the browsing speed would not be much affected.
Where can I read information about regexp expressions?
Quote: Original posted by ZlydenGL at 2007-02-27 01:14
Where can I read information about regexp expressions?
There are plenty of regexp tutorials if you google [:smile:]

Here's one that is quite good and elaborate http://www.regular-expressions.info/tutorial.html
I have a simple question: how can I replace &quot; --> " ? It works with quot or quot; but not with &quot; or \&quot; . The & symbol doesn't want to be replaced! What do I need to write in the match rule? The same goes for &amp; --> &
how about \\\&quot\; for " \\\&amp\; for &
Quote: Original posted by Eugenga at 2007-02-28 05:57
I have simple question - how I can replace &quot; --> "
May I ask what you are trying to achieve? Are you sure the quote character is actually present as an HTML entity and not just as the actual character?
QUOTE(JarC @ 2007-03-01 02:41:44 PM) [snapback]392403[/snapback]
QUOTE(Eugenga @ 2007-02-28 05:57:47 PM) [snapback]392354[/snapback]
I have simple question - how I can replace &quot; --> "

May I ask what you are trying to achieve? are you sure the quote character is actually present as html entity and not just as actual character?

It's my most frequently visited site, and on one page there is this bug (&quot; instead of "). The admin of this torrent tracker answered me, "That is a feature, not a bug :-)". But I don't want to see that "feature" any more, especially since it appears only on one page, in the names of torrents containing ".

!!! Problem solved, thanks to all, and sorry for such a simple question and my lack of attention.
I looked at the HTML code and saw &amp;quot; there (how stupid). I put this in the match rule and now it works very well.
QUOTE(Eugenga @ 2007-03-02 09:03:19 PM) [snapback]392519[/snapback]
!!! Problem solved, thanks to all, and sorry for such a simple question and my lack of attention.
I looked at the HTML code and saw &amp;quot; there (how stupid). I put this in the match rule and now it works very well.

Glad to see your problem solved. Care to share your filter with us? [:smile:]
Quote: Original posted by abc@home at 2007-03-02 03:09
Glad to see your problem solved. Care to share your filter with us? [:smile:]
It's absolutely useless for all of you. The page http://www.torrents.md/watcher.php is available only inside the .md domain and only to site members.

<filter enable="1" name="torrents.md Quot" author="Eugenga" type="text" comment="" action="3" postaction="0" priority="0" match="&amp;quot;" match_count="0" replace=""" update="" match_url="http://www.torrents.md/watcher.php" exclude_url="" bound="" max_bound_size="0" exclude="" ver="1.0" keywords="" />

As you can see, this is the site admins' error: on this page ALL special sequences starting with & have been replaced by ones starting with &amp; and Maxthon quite correctly turns &amp; into &. I couldn't solve this for a long time, until I looked at the HTML code. BTW, is there any manual for all the filter options (other than this topic)? I know only action=3, but what do the other values do?
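What happened on that page can be reproduced with Python's standard `html` module. This is a sketch under the assumption, taken from the post above, that the site escaped the ampersand of every entity a second time; the sample torrent name is invented.

```python
import html
import re

# Source as the site sends it: the entity &quot; was escaped a second time.
raw = "Artist - &amp;quot;Album&amp;quot;"

# A browser decodes entities once, so the user sees a literal &quot; on screen.
once = html.unescape(raw)
print(once)   # Artist - &quot;Album&quot;

# Decoding a second time would give the intended text.
twice = html.unescape(once)
print(twice)  # Artist - "Album"

# The filter works on the page source, so the match has to be &amp;quot;
# (what is in the source), not &quot; (what is shown on screen).
fixed = re.sub(r"&amp;quot;", '"', raw)
print(fixed)  # Artist - "Album"
```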
I have a question; it may be simple for others, but not for me.
I am trying to change the background of some table rows, and it works, but not for the rows I want :-) . Look at the image.


I think the match rule is wider than I need: the rule starts working 1-2 rows too early. How can I limit the rule? I need it to work within a single <tr></tr> tag. I played with bound and other settings, but I don't have enough knowledge. Sorry.

QUOTE
<tr>
<td align=center style='padding: 0px'><a href="browse.php?cat=2"><img border="0" src="pic/categs/cat_music.gif" alt="Music" /></a></td>
<td align=left><a href="details.php?id=85842"><b>Gavriil Gronic - Din neamuri vechi de lutari [2001/MP3/256] [Folklore]</b></a>
</td>
<td align="right"><b><a href="details.php?id=85842&amp;filelist=1">24</a></b></td>
<td align="right">0</td>
<td align=center><nobr>18.</nobr></td>
<td align=center><nobr>135.64</nobr></td>
<td align=center>0<br>times</td>
<td align=right><b><a href=details.php?id=85842&amp;toseeders=1><font color=#000000>1</font></a></b></td>
<td align="right">0</td>
<td align=center nowrap><b><a href=userdetails.php?id=21433>Excelsior</a></b></td>
</tr>

In bold is what I am trying to match (I want to highlight torrents from certain people/teams I prefer)

match = (<tr.*?>)(.*?(bibigon123|JAndrew|Nexx))
replace = <tr style="background-color: #ACACF2;">$2

Other matches that do not work well:
(<tr.*?>)(.*?(bibigon123|JAndrew|Nexx))(.*?</tr) or
(<tr.*?>)(.*?(bibigon123|JAndrew|Nexx).*?</tr)

When I set bound = <tr.*?</tr>
or max_bound_size = 1000
-- it does not work at all

In future I plan to add ...|name1|name2|name3..... - the names of my friends - and highlight their rows.
Excuse my English :-(
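
One possible reason for the behaviour described above, sketched in Python rather than in Maxthon's filter syntax: (<tr.*?>)(.*?(names)) starts at the first <tr it finds and the lazy .*? is free to run across </tr> into later rows until it reaches a name, so the row that gets restyled is an earlier one. A pattern that refuses to cross </tr> keeps the match inside one row. Whether Maxthon's regex engine supports the negative lookahead used here is an assumption not confirmed in this thread; Python also writes back-references as \1 where the filter uses $1. The two sample rows are simplified and invented.

```python
import re

NAMES = "bibigon123|JAndrew|Nexx"
STYLE = '<tr style="background-color: #ACACF2;">'

# Two simplified rows; only the second one is uploaded by a preferred user.
rows = (
    "<tr>\n<td>Album A</td>\n<td><a href=userdetails.php?id=1>Excelsior</a></td>\n</tr>\n"
    "<tr>\n<td>Album B</td>\n<td><a href=userdetails.php?id=2>JAndrew</a></td>\n</tr>\n"
)

# Pattern from the post: the match begins at the FIRST <tr and spans both rows.
loose = rf"(<tr.*?>)(.*?({NAMES}))"
loose_out = re.sub(loose, STYLE + r"\2", rows, flags=re.DOTALL)

# Alternative: (?:(?!</tr>).)*? advances one character at a time but never past </tr>.
tight = rf"<tr[^>]*>((?:(?!</tr>).)*?({NAMES}))"
tight_out = re.sub(tight, STYLE + r"\1", rows, flags=re.DOTALL)

print(loose_out.startswith(STYLE))  # True: the Excelsior row was restyled by mistake
print(tight_out.index("Excelsior") < tight_out.index(STYLE))  # True: only the JAndrew row changed
```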