X

New Custom String Parsing

Introduction

Custom strings have long been at the heart of what makes TradeSkillMaster such a versatile and powerful tool for gold making. They allow the user to focus on their gold making strategy, and allow the addon to worry about doing all the complex math for them. For example, the default Auctioning operation settings ensure that items will be posted above their vendor sell price and a percentage of an average of the crafting cost, realm market value, and region market value, while transparently accommodating items that can’t be either vendored or crafted, all without requiring the user to ever need to manually calculate or input those values. In v4.13, we completely rewrote how the addon parses and evaluates custom strings to allow for much greater optimization and user debuggability. This blog post will dive deep into the changes that were made and the performance improvements they’ve enabled.

Background

TSM’s custom strings are defined by a set of functions and sources that can be used together in a single expression. As an example, let’s dissect the default Auctioning Operation minimum price setting:

check(
	first(crafting,dbmarket,dbregionmarketavg),
	max(
		0.25*avg(crafting,dbmarket,dbregionmarketavg),
		1.5*vendorsell
	)
)

There are a few things going on here. First of all, the outer check() function ensures that the first parameter is valid (and greater than 0); making the entire thing invalid if not. This is used here to ensure that the item has either a crafting cost, realm market value, or region market value. If all of these are missing for the item, we don’t have enough information available to properly post the item, so the custom string is made invalid to prevent posting the item at a potentially erroneous price. Assuming that check passes, we’re left with:

max(
	0.25*avg(crafting,dbmarket,dbregionmarketavg),
	1.5*vendorsell
)

The max() function here takes the higher of its two parameters. The first parameter is the average of the crafting cost, realm market value, and region market value multiplied by 0.25. The second parameter is 1.5 times the vendor sell price. This ensures that this custom string always evaluates to at least 1.5 times the vendor sell price, even if the market values / crafting cost is very low. Both avg() and max() also have a side-effect of silently ignoring any invalid parameters, meaning if the item doesn’t have a crafting cost or a vendor sell price, the custom string can still be evaluated based on the other parameters.

Let’s plug some values into this custom string and see how it gets evaluated. Let’s assume we’re trying to post “Spice Bread” which has the following prices:

  • Crafting Cost: 35c
  • Vendor Sell Price: 5c
  • Market Value: 12s 70c
  • Region Market Value: 11s 57c

If we simply plug these into our custom string, we get the following:

check(
	first(35c, 12s70c, 11s57c),
	max(
		0.25 * avg(35c, 12s70c, 11s57c),
		1.5 * 5c
	)
)

After evaluating the first() and avg() functions:

check(
	35c,
	max(0.25 * 8s21c, 1.5 * 5c)
)

The first parameter of check() is valid (greater than 0) so it simply evaluates to its second parameter, the max(), which calculated out results in 2s5c.

This is the same math that TSM is doing under the hood, but obviously TSM needs to do it in a well-defined and programmatic manner. It has traditionally accomplished this by relying on the fact that the custom string syntax (that is, the format used to write custom strings) is very similar to valid Lua code (the programming language addons are written in), and Lua allows for dynamically loading and executing code. In other words, we can treat the custom string as Lua code and execute it directly as if it were programmed directly into the addon. However, we don’t simply want to directly execute what gets entered into a custom string setting for multiple reasons:

  1. We need to be able to plug in the correct value for the sources depending on the item which the custom string is being evaluated for.
  2. The set of functions which are valid in custom strings are specific to TSM, and not necessarily part of the Lua language itself.
  3. We need a way to ensure that what the user enters is a valid TSM custom string, and not just some arbitrary snippet of Lua code.

This was previously accomplished by doing a very rough tokenization of the string followed by applying some rules and string replacements to the tokens to ensure the input was valid and create a valid, loadable snippet of Lua code to present the custom string. Tokenization is the process of turning an arbitrary string (i.e. the entire custom string the user enters into a setting input) into a series of known words (a.k.a. tokens). For example, the custom string “max(crafting, vendorsell)” gets turned into the tokens:

  • max
  • (
  • crafting
  • ,
  • vendorsell
  • )

In this tokenized form, we can then easily perform a series of validation checks, such as requiring that there is an equal number of left and right parentheses, and replace sources and functions as appropriate to turn this into a valid snippet of Lua code we can execute:

function(helpers, _item)
	local result = helpers._max(
		helpers._priceHelper(_item, “crafting”),
		helpers._priceHelper(_item, “vendorsell”)
	)
	if not result or helpers.IsInvalid(result) or result <= 0 then return end
	return result
end

Above is a simplified version of what the code generated by the original custom string implementation would look like. It looks very similar to the custom string we started with, but the sources have been replaced with function calls to look up the specified price for the item being evaluated, and the max() function call has been replaced with a call to a max() function TSM defines and provides.

This implementation has suited TSM well for a very long time, but has a few key limitations. First of all, the amount of validation that can be performed on a list of tokens is fairly limited and isn’t robust enough to catch all possible syntax errors. Having more robust processes in place for how custom strings are parsed will also enable better user-facing error messages and debugging functionality. Also, there’s no chance to perform any optimization on the generated code with this implementation. For example, if a custom string references the same source multiple times, the generated Lua code will look up the value of that source each and every time, with these lookups accounting for the majority of the computation time needed to evaluate custom strings in general.

New Implementation

The new implementation works much more like a traditional compiler and consists of 4 high-level steps: tokenization, abstract syntax tree generation, optimization, and code generation. We’ll use the following (somewhat contrived) custom string to demonstrate this process:
ifgt(crafting, 50g, crafting + 150% crafting, 10g + 5g)

For reference, here’s what that custom string would be converted to using the old system:

function(helpers, _item)
	local result = helpers._ifgt(
		helpers._priceHelper(_item, “crafting”),
		500000,
		helpers._priceHelper(_item, “crafting”) +
			1.5 * helpers._priceHelper(_item, “crafting”),
		100000 + 50000,
	)
	if not result or helpers.IsInvalid(result) or result <= 0 then return end
	return result
end

Tokenization

The tokenization step is similar to the prior implementation, but much more in-depth. It is responsible for taking the input string and converting it into a series of well-defined tokens which can then be more easily processed further. A token can be thought of similar to a word in an English sentence. It’s the smallest discrete unit which is meaningful by itself. The tokenizer works by methodically going through each character of the input string and separating out distinct tokens which are often separated by whitespace or formatting characters (i.e. parentheses or commas). In the case of our example (ignoring whitespace tokens), the tokenizer splits up our custom string into the following tokens:

ifgt ( crafting , 50g , crafting + 150 % crafting , 10g + 5g )

In addition to just splitting up the string into tokens, the tokenizer also classifies each token based on their type:

  • FUNCTION – ifgt
  • LEFT_PAREN – (
  • IDENTIFIER – crafting
  • COMMA – ,
  • MONEY – 50g
  • COMMA – ,
  • IDENTIFIER – crafting
  • MATH_OPERATOR – +
  • NUMBER – 150
  • MATH_OPERATOR – %
  • IDENTIFIER – crafting
  • COMMA – ,
  • MONEY – 10g
  • MATH_OPERATOR – +
  • MONEY – 5g
  • RIGHT_PAREN – )

The tokenization step takes care of a lot of the ambiguity of string parsing, and we now have a well-defined representation of the custom string which can more easily be worked with programmatically. The next step is to take this series of tokens and convert it into an abstract syntax tree, or AST for short.

AST Generation

An abstract syntax tree (AST) is a tree-based (per the name) form which better represents the structure of the custom string and the discrete operations (i.e. mathematical operations and / or function calls) which it consists of. A description of the algorithm for transforming the list of tokens into an AST is beyond the scope of this blog post, but we’ll visualize it with our example string, starting with a slightly more formatted version of the original string:

ifgt(
	crafting,
	50g,
	crafting + 150 % crafting,
	10g + 5g
)

In this form, we can visually see that this example has an outer “ifgt” function which has 4 parameters. Its 3rd parameter consists of 2 separate math operations (“+” and “%”). The 4th parameter of the outer “ifgt” also consists of a “+” math operation. Each of these distinct functions and operations is turned into a node in the tree, with its arguments becoming its children:

This AST form is helpful for doing a few things:

  1. It abstracts away all the original syntax of the custom string with the order of operations entirely determined by the location of a node within the tree. For example, we can look at the tree and see that we must calculate the “%” operation of “150” and “crafting” before we can calculate the “+” operation, as the result of the former is a dependency of the latter.
  2. We can easily validate that functions are being used properly. The “ifgt” function can only take either 3 or 4 parameters. We can easily see from this tree form that the “ifgt” node in the tree has 4 direct children, and therefore the “ifgt” function has a valid number of arguments. One thing worth noting here is that TSM stores enough information in both this tree form and the token list in order to reverse the process back to the original string form. This allows TSM to point the user at exactly where their custom string is invalid, which is one of the main benefits of this new custom string implementation as a whole.
  3. Lastly, we can perform very well-informed optimizations based on this AST form, which we’ll dive into next.

Optimizer

Now that we have the custom string represented as an AST, we can methodically go through it and look for opportunities where we can optimize it without changing the result. There are dozens of different optimizations which are attempted at this stage, but for this example there are just a few which apply.

The first optimization we can perform here is to realize that taking “150%” of something is the same as multiplying it by 1.5. Multiplication is generally a lot easier to reason about than a “%” operator, so we’ll update that part of the tree accordingly:

If we zoom out one level from that new multiplication, we can see that we now have the expression “crafting + (1.5 * crafting)” which, using some simple rules of algebra, we can rewrite as “2.5 * crafting”, or in AST form as:

Lastly, we can see that the 4th argument of our “ifgt” function is simply adding two constant values together, which can be precomputed to just “15g” with our optimized AST looking like this:

Although this was a relatively contrived example, we were able to reduce the number of nodes in the AST from 11 down to just 7 just through some simple optimizations. Most importantly, we were able to get rid of one out of three of the “crafting” source nodes. These optimizations mean that TSM users can format their custom strings in a way which makes managing them as easy as possible rather than worrying about trying to write them in an optimal way. The addon itself will take care of all the optimization.

Code Generation

The last step is to turn our optimized AST into Lua code that can be executed by the addon. Compared to the previous custom string implementation, having the AST allows us to do this in a much more controlled and optimized way. For example, we can cache intermediate values so we only need to compute them a single time, even if they are used multiple times within the custom string. The generated code for our example custom string (edited for readability) is as follows:

return function(itemString, helpers)
	-- Locals
	local res_ifgt_1901082 = nil

	-- Code
	local var_crafting = helpers.GetPrice(itemString, "crafting")
	if var_crafting == helpers.INVALID then
		return nil
	end

	if var_crafting > 500000 then
		res_ifgt_1901082 = (2.5 * var_crafting)
	else
		res_ifgt_1901082 = 150000
	end

	return helpers.ValidateResult(res_ifgt_1901082)
end

This code very closely matches our optimized AST, and has many optimizations when compared to our original custom string and the previous implementation. Most importantly, we only need to look up the “crafting” source for the item a single time, and then are able to reuse that multiple times within the generated code. One important optimization that’s not shown here is that we only evaluate sources as they are needed, and at most a single time. For example, if “crafting” was referenced in the “else” part of the “ifgt”, it would reference the same “var_crafting” variable rather than re-computing the crafting cost for the item.

Conclusion

Custom strings are one of the most powerful aspects of TSM, and we continue to dedicate significant time to improving their functionality and efficiency. With these under-the-hood improvements we made in 4.13, we are able to make the process of writing and debugging custom strings much better for all TSM users, while also improving their performance.

sapu:
Related Post