Find as you type sorting on large record sets

One of the sorting methods I’ve been working on this week is implementing a fast find as you type method for close to 30,000 records in AS3. The feature works exactly like iTunes library search, just with a lot more data (unless you’re rocking 1800 hours worth of music). The only rule with implementing this kind of feature is that it has to feel instant to the end user, otherwise we’ll have to switch to a Find-after-pressing-search method.

Setup
The records are stored locally as a flat xml structure.

<entry id="20081" name="Antiques" fullName="Antiques" />
<entry id="37903" name="Antiquities (Classical, Amer.)" fullName="Antiques|Antiquities (Classical, Amer.)" />
<entry id="37905" name="Egyptian" fullName="Antiques|Antiquities (Classical, Amer.)|Egyptian" />
<entry id="37906" name="Greek" fullName="Antiques|Antiquities (Classical, Amer.)|Greek" />
<entry id="37907" name="Roman" fullName="Antiques|Antiquities (Classical, Amer.)|Roman" />

The data is then loaded in as an XMLList and bound to a datagrid. From there, searches should match keywords in the fullName attribute and the datagrid will show only the entries matched.

Method #1 – Iteration
There are 2 ways to iterate the search, either use AS3’s built in E4X expressions to generate unique xml lists, or use collection filtering on the existing list. The first method flexes the power of E4X expressions, but takes a hit on swapping out the datagrid with a new dataprovider, the second works on the existing dataprovider but lacks any indexing power xml might provide.
E4X expression filtering

filteredXMLList = rawXMLList.(@fullName.indexOf(searchTerm) > -1);

Filter Time: 200 ms
Collection Filtering

filteredXMLListCollection = new XMLListCollection(rawXMLList);
filteredXMLListCollection.filterFunction = collectionFilter;
private function collectionFilter(item:XML):Boolean{
	return item.@fullName.indexOf(searchTerm) > -1;
}

Filter Time: 300-200 ms
Using E4X expressions is more powerful then filtering, but the total time for both methods still feels sluggish in the UI.

Method #2 – Pattern matching full text.
Again, there are 2 approaches to pattern matching strings in AS3, RegExp and String.indexOf(). The idea is to apply pattern matching on the raw string representation of the xml to create a new xml string containing only the nodes matched. The new string is then cast back into an XMLList to be placed in the datagrid.
RexExp matching

var pattern:RegExp = new RegExp("<[^<]*" + searchTerm + "[^>]*>", "ig");
var result:Array = rawXMLString.match(pattern);
filteredXMLList = new XMLList(result.join(""));

Filter Time: 31000-14000 ms (31-14 seconds)

IndexOf matching

private function searchStringFor(source:String, term:String):String{
	var lowersource:String = source.toLowerCase();
	var lowerterm:String = term.toLowerCase();
	var i:Number = 0;
	var out:String = "";
	var pos:Number = 0;
	while(i > -1){
		pos = lowersource.indexOf(lowerterm, i);
		if(pos > -1){
			out += source.substring(source.lastIndexOf("<", pos), source.indexOf(">", pos)+1);
			i = pos+term.length;
		}else{
			i = pos;
		}
	}
	return out;
}
filteredXMLList = new XMLList(searchStringFor(rawXMLString, searchTerm));

Filter Time: 200-50 ms

While the RegExp and indexOf are performing the same match, RegExp clearly has a long way to go before being viable in AS3. Even using simply pattern matches with String.search() during Method #1 above was 3 times slower then an indexOf during iteration. IndexOf in Method #2 on the other hand has gotten us down to a 200-50 ms filter time. 50 ms is decent but 200 ms still feels slow.

Final Method – Key search + reference copy
At this point, indexOf searches on the full string block appear to be the fastest way of filtering, and it provides the most room for refinement. Two aspects are contributing to the slowdown at this point: 1. unneccessarily searching the full xml structure, 2. constructing a new xml string and casting it to XMLList after each search. We can cover both issues by initially loading the xml string into an XMLList and iterating that list to generate a second string containing just the fullName field and the index that fullName belongs to in the original XMLList

rawXMLList = XMLList(urlLoader.data);
var count:int = 0;
for each(var node:XML in rawXMLList){
	fullNameIndex += node.@fullName.toLowerCase() + "<" + count + ">";
	count++;
}

This will generate a string looking like

Antiques<0>
Antiques|Antiquities (Classical, Amer.)<1>
Antiques|Antiquities (Classical, Amer.)|Egyptian<2>
Antiques|Antiquities (Classical, Amer.)|Greek<3>
Antiques|Antiquities (Classical, Amer.)|Roman<4>

Now when we do an indexOf search on the above string, we can look ahead to the index inside the < > tags to find which reference in the original rawXMLList to use. Those references are then set into a new XMLList to display.

private function searchStringInAttribute(list:XMLList, attributelist:String, term:String):XMLList{
	var output:XMLList = new XMLList();
	var i:Number = 0;
	var pos:Number = 0;
	var index:int = 0;
	var count:int = 0;
	while(i > -1){
		pos = attributelist.indexOf(term, i);
		if(pos > -1){
			index = int( attributelist.substring(attributelist.indexOf("<", pos)+1, attributelist.indexOf(">", pos)) );
			output[count] = list[index];
			count++;
			i = pos+term.length;
		}else{
			i = pos;
		}
	}
	return output;
}
filteredXMLList = searchStringInAttribute(rawXMLList, fullNameIndex, searchTerm.toLowerCase());

Filter Time: 70-10 ms

These filtering times are excellent and feel extremely smooth. I was a bit surprised to see that performing an indexOf search on a large block of text like this, then looking for the XMLList key in string form, was faster then performing a search inside an array iteration and mapping the array key to the XMLList key. The final source is listed below.

<?xml version="1.0" encoding="utf-8"?>
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute" creationComplete="init()">
	<mx:Script>
		<![CDATA[
			import flash.utils.getTimer;

			[Bindable]
			private var filteredXMLList:XMLList;
			private var rawXMLList:XMLList;
			private var fullNameIndex:String;
			private var urlLoader:URLLoader;

			private function init():void{
				urlLoader = new URLLoader();
				urlLoader.addEventListener(Event.COMPLETE, entriesLoaded);
				urlLoader.load(new URLRequest("entryList.xml"));
			}
			private function entriesLoaded(e:Event):void{
 				rawXMLList = new XMLList(urlLoader.data);
 				var count:int = 0;
 				var node:XML;
 				for each(node in rawXMLList){
 					fullNameIndex += node.@fullName.toLowerCase()+"<"+count+">";
 					count++;
 				}
 				applyFilter();
 				urlLoader.removeEventListener(Event.COMPLETE, entriesLoaded);
 				urlLoader = null;
			}
			private function searchStringInAttribute(list:XMLList, attributelist:String, term:String):XMLList{
				var output:XMLList = new XMLList();
				var i:Number = 0;
				var pos:Number = 0;
				var index:int = 0;
				var count:int = 0;
				while(i > -1){
					pos = attributelist.indexOf(term, i);
					if(pos > -1){
						index = int( attributelist.substring(attributelist.indexOf("<", pos)+1, attributelist.indexOf(">", pos)) );
						output[count] = list[index];
						count++;
						i = pos+term.length;
					}else{
						i = pos;
					}
				}
				return output;
			}
			private function applyFilter():void{
				if(filtertext.text.length < 2){
					filteredXMLList = rawXMLList;
				}else{
					var searchTerm:String = filtertext.text;
					var s:int = getTimer();
					filteredXMLList = searchStringInAttribute(rawXMLList, fullNameIndex, filtertext.text.toLowerCase());
					trace("filter time: "+(getTimer()-s));
				}
			}
		]]>
	</mx:Script>
	<mx:VBox width="100%" height="100%">
		<mx:TextInput id="filtertext" width="200" keyUp="applyFilter()"/>
		<mx:DataGrid id="category_list" width="100%" height="100%" dataProvider="{filteredXMLList}">
			<mx:columns>
				<mx:DataGridColumn id="dateCol" dataField="@id" headerText="Category ID" width="100"/>
				<mx:DataGridColumn id="titleCol" dataField="@name" headerText="Category Name"/>
				<mx:DataGridColumn id="idCol" dataField="@fullName" headerText="Category Path"/>
			</mx:columns>
		</mx:DataGrid>
	</mx:VBox>
</mx:Application>

13 thoughts on “Find as you type sorting on large record sets”

  1. I just watched your demo from Apollo Camp, and this was one of the things I was curious about. Thanks for the details. Great stuff.

  2. i’m so glad i found this page on google, i’m trying to implement exactly what you’re talking about (iTunes-like search on a directory with a huge XML dataset) and am running into issues with search speeds… was trying to do it in AS2 and flash but after reading your extremely helpful and enlightening post i’m going to start educating myself in AS3 and flex. thanks again for the great information and inspiration!

  3. I don´t get it; what exactly happens here:
    ———
    for each(node in rawXMLList){
    fullNameIndex = node.@fullName.toLowerCase()””; count++; }
    ——
    What is this about and where does this come from?
    I am very curious, because I need more XML parsing speed very urgent.

    Kind regards
    ______
    Arne

  4. Now I understand: “ins” and ”\ins” as tags are some artefacts of your blogging tool; both should just be ”+” in AS code.

    Sorry :-)

  5. I added code to pull additional text from other attributes however sometimes there are multiple occurrences of search term and the output will send multiple instances of that node. Any way to prevent that?

  6. Arne, I found a solution for multiple fields. In the searchStringInAttribute method replace the setting of the new search position with this line:

    i = pos + attributelist.indexOf(”>”, pos);

    instead of what it is currently. This way when 1 match is found in the row it will go onto the next row.

  7. Hi,

    thanks for the code, it really helped a lot. I found a little flaw in it: If the user types in a number which matches an index in the filtertext variable the first XML node is returned.

    To avoid this change
    [code]
    pos = attributelist.indexOf(term, i);
    if(pos > -1){
    index = int( attributelist.substring(attributelist.indexOf(“”, pos)) );
    output[count] = list[index];
    count++;
    i = pos+term.length;
    }
    [/code]

    to

    [code]
    pos = attributelist.indexOf(term, i);
    if(pos > -1){
    if (attributelist.charAt(pos-1) != “<“){
    index = int( attributelist.substring(attributelist.indexOf(“”, pos)) );
    output[count] = list[index];
    count++;
    }
    i = (pos + term.length < attributelist.length) ? pos + term.length : -1;
    }
    [/code]

    I hope I got the code highlighting right ;)

  8. Thanks Cornelius for the fix.
    I have another change:
    this line:
    i = pos+term.length;

    to this:
    i = pos+attributelist.indexOf(“>”, pos);

    (we can also reuse the seach of “>” made before…)

    In that way we can be sure our output doesn’t have repeated results.

Comments are closed.