All posts by Sean Christmann

Find as you type sorting on large record sets

One of the sorting methods I’ve been working on this week is implementing a fast find as you type method for close to 30,000 records in AS3. The feature works exactly like iTunes library search, just with a lot more data (unless you’re rocking 1800 hours worth of music). The only rule with implementing this kind of feature is that it has to feel instant to the end user, otherwise we’ll have to switch to a Find-after-pressing-search method.

Setup
The records are stored locally as a flat xml structure.

<entry id="20081" name="Antiques" fullName="Antiques" />
<entry id="37903" name="Antiquities (Classical, Amer.)" fullName="Antiques|Antiquities (Classical, Amer.)" />
<entry id="37905" name="Egyptian" fullName="Antiques|Antiquities (Classical, Amer.)|Egyptian" />
<entry id="37906" name="Greek" fullName="Antiques|Antiquities (Classical, Amer.)|Greek" />
<entry id="37907" name="Roman" fullName="Antiques|Antiquities (Classical, Amer.)|Roman" />

The data is then loaded in as an XMLList and bound to a datagrid. From there, searches should match keywords in the fullName attribute and the datagrid will show only the entries matched.

Method #1 – Iteration
There are 2 ways to iterate the search, either use AS3’s built in E4X expressions to generate unique xml lists, or use collection filtering on the existing list. The first method flexes the power of E4X expressions, but takes a hit on swapping out the datagrid with a new dataprovider, the second works on the existing dataprovider but lacks any indexing power xml might provide.
E4X expression filtering

filteredXMLList = rawXMLList.(@fullName.indexOf(searchTerm) > -1);

Filter Time: 200 ms
Collection Filtering

filteredXMLListCollection = new XMLListCollection(rawXMLList);
filteredXMLListCollection.filterFunction = collectionFilter;
private function collectionFilter(item:XML):Boolean{
	return item.@fullName.indexOf(searchTerm) > -1;
}

Filter Time: 300-200 ms
Using E4X expressions is more powerful then filtering, but the total time for both methods still feels sluggish in the UI.

Method #2 – Pattern matching full text.
Again, there are 2 approaches to pattern matching strings in AS3, RegExp and String.indexOf(). The idea is to apply pattern matching on the raw string representation of the xml to create a new xml string containing only the nodes matched. The new string is then cast back into an XMLList to be placed in the datagrid.
RexExp matching

var pattern:RegExp = new RegExp("<[^<]*" + searchTerm + "[^>]*>", "ig");
var result:Array = rawXMLString.match(pattern);
filteredXMLList = new XMLList(result.join(""));

Filter Time: 31000-14000 ms (31-14 seconds)

IndexOf matching

private function searchStringFor(source:String, term:String):String{
	var lowersource:String = source.toLowerCase();
	var lowerterm:String = term.toLowerCase();
	var i:Number = 0;
	var out:String = "";
	var pos:Number = 0;
	while(i > -1){
		pos = lowersource.indexOf(lowerterm, i);
		if(pos > -1){
			out += source.substring(source.lastIndexOf("<", pos), source.indexOf(">", pos)+1);
			i = pos+term.length;
		}else{
			i = pos;
		}
	}
	return out;
}
filteredXMLList = new XMLList(searchStringFor(rawXMLString, searchTerm));

Filter Time: 200-50 ms

While the RegExp and indexOf are performing the same match, RegExp clearly has a long way to go before being viable in AS3. Even using simply pattern matches with String.search() during Method #1 above was 3 times slower then an indexOf during iteration. IndexOf in Method #2 on the other hand has gotten us down to a 200-50 ms filter time. 50 ms is decent but 200 ms still feels slow.

Final Method – Key search + reference copy
At this point, indexOf searches on the full string block appear to be the fastest way of filtering, and it provides the most room for refinement. Two aspects are contributing to the slowdown at this point: 1. unneccessarily searching the full xml structure, 2. constructing a new xml string and casting it to XMLList after each search. We can cover both issues by initially loading the xml string into an XMLList and iterating that list to generate a second string containing just the fullName field and the index that fullName belongs to in the original XMLList

rawXMLList = XMLList(urlLoader.data);
var count:int = 0;
for each(var node:XML in rawXMLList){
	fullNameIndex += node.@fullName.toLowerCase() + "<" + count + ">";
	count++;
}

This will generate a string looking like

Antiques<0>
Antiques|Antiquities (Classical, Amer.)<1>
Antiques|Antiquities (Classical, Amer.)|Egyptian<2>
Antiques|Antiquities (Classical, Amer.)|Greek<3>
Antiques|Antiquities (Classical, Amer.)|Roman<4>

Now when we do an indexOf search on the above string, we can look ahead to the index inside the < > tags to find which reference in the original rawXMLList to use. Those references are then set into a new XMLList to display.

private function searchStringInAttribute(list:XMLList, attributelist:String, term:String):XMLList{
	var output:XMLList = new XMLList();
	var i:Number = 0;
	var pos:Number = 0;
	var index:int = 0;
	var count:int = 0;
	while(i > -1){
		pos = attributelist.indexOf(term, i);
		if(pos > -1){
			index = int( attributelist.substring(attributelist.indexOf("<", pos)+1, attributelist.indexOf(">", pos)) );
			output[count] = list[index];
			count++;
			i = pos+term.length;
		}else{
			i = pos;
		}
	}
	return output;
}
filteredXMLList = searchStringInAttribute(rawXMLList, fullNameIndex, searchTerm.toLowerCase());

Filter Time: 70-10 ms

These filtering times are excellent and feel extremely smooth. I was a bit surprised to see that performing an indexOf search on a large block of text like this, then looking for the XMLList key in string form, was faster then performing a search inside an array iteration and mapping the array key to the XMLList key. The final source is listed below.

<?xml version="1.0" encoding="utf-8"?>
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute" creationComplete="init()">
	<mx:Script>
		<![CDATA[
			import flash.utils.getTimer;

			[Bindable]
			private var filteredXMLList:XMLList;
			private var rawXMLList:XMLList;
			private var fullNameIndex:String;
			private var urlLoader:URLLoader;

			private function init():void{
				urlLoader = new URLLoader();
				urlLoader.addEventListener(Event.COMPLETE, entriesLoaded);
				urlLoader.load(new URLRequest("entryList.xml"));
			}
			private function entriesLoaded(e:Event):void{
 				rawXMLList = new XMLList(urlLoader.data);
 				var count:int = 0;
 				var node:XML;
 				for each(node in rawXMLList){
 					fullNameIndex += node.@fullName.toLowerCase()+"<"+count+">";
 					count++;
 				}
 				applyFilter();
 				urlLoader.removeEventListener(Event.COMPLETE, entriesLoaded);
 				urlLoader = null;
			}
			private function searchStringInAttribute(list:XMLList, attributelist:String, term:String):XMLList{
				var output:XMLList = new XMLList();
				var i:Number = 0;
				var pos:Number = 0;
				var index:int = 0;
				var count:int = 0;
				while(i > -1){
					pos = attributelist.indexOf(term, i);
					if(pos > -1){
						index = int( attributelist.substring(attributelist.indexOf("<", pos)+1, attributelist.indexOf(">", pos)) );
						output[count] = list[index];
						count++;
						i = pos+term.length;
					}else{
						i = pos;
					}
				}
				return output;
			}
			private function applyFilter():void{
				if(filtertext.text.length < 2){
					filteredXMLList = rawXMLList;
				}else{
					var searchTerm:String = filtertext.text;
					var s:int = getTimer();
					filteredXMLList = searchStringInAttribute(rawXMLList, fullNameIndex, filtertext.text.toLowerCase());
					trace("filter time: "+(getTimer()-s));
				}
			}
		]]>
	</mx:Script>
	<mx:VBox width="100%" height="100%">
		<mx:TextInput id="filtertext" width="200" keyUp="applyFilter()"/>
		<mx:DataGrid id="category_list" width="100%" height="100%" dataProvider="{filteredXMLList}">
			<mx:columns>
				<mx:DataGridColumn id="dateCol" dataField="@id" headerText="Category ID" width="100"/>
				<mx:DataGridColumn id="titleCol" dataField="@name" headerText="Category Name"/>
				<mx:DataGridColumn id="idCol" dataField="@fullName" headerText="Category Path"/>
			</mx:columns>
		</mx:DataGrid>
	</mx:VBox>
</mx:Application>