JULY 2011

Dmitry Brusilovsky & Hossein Ahari, Ph.D

1. Introduction

The general problem in response modeling is to identify a response curve and also the corresponding diminishing returns. The response curve could be described as a relationship between marketing efforts (promotion) and results (sales or revenue), controlling for other relevant variables, where the result is treated as a function of marketing effort. While response modeling is industry-specific, a response curve is the product of response modeling and could be defined as a mathematical construct (depending on the nature of its application) that relates promotional effort to its response.

2. Problem Definition

The problem is to find the best piecewise linear (PWL) approximation for a discrete representation of a response curve. The data set contains customer/client responses to different marketing efforts. The marketing efforts lie on the X axis and the corresponding responses on the Y axis, resulting in a 2-D curve. Therefore, the data consists of a set of points ( x 1 , y 1 ), ( x 2 , y 2 ), etc. where x k is less than the next value. This discrete curve must be approximated by a PWL curve that has no more than 3 pieces. The concavity of the PWL approximation is a condition that has to be satisfied. The output of this PWL approximation is a set of a maximum four nodal points which provide the best approximation considering the convergence in the solution or the limitation in calculation time. The PWL approximation is necessary for the optimization of sales force structure.

3. Methodology

The piecewise linear approximation (PWL) problem is formulated as an optimization problem. The deviation area between the actual data curve and the PWL curve shows the accuracy of the PWL approximation and is considered as the objective function for this optimization problem, as depicted in Figure 1. Each piece of PWL approximation, which is shown by P i in Figure 2, is recognized by the coordinates of the nodes, and the corresponding slope value a i .

Figure 1: The deviation area between the actual curve F and its PWL approximation f
Figure 1: The deviation area between the actual curve F and its PWL approximation f

Figure 2: Nodes and slopes in a PWL approximation curve
Figure 2: Nodes and slopes in a PWL approximation curve

The goal of the optimization is to find the best set of nodes N i such that the error value (objective function) depicted in Figure 1 is minimized.
As mentioned in [1], the solution space is considered to be all points inside the rectangle created by X min , X max , Y min  and  Y max . This means all the candidate points to create PWL approximation for actual data are located in the area surrounded by the rectangle, as visualized in Figure 3. Figure 3: The solution space

Figure 3: The solution space

It is obvious that the solution space in this case contains too many sets of nodes for each actual set of data points. Each set creates a PWL approximation for the actual data points. To select the set of nodes which create an approximation with a reasonable accuracy, the deviation area between the actual curve and its PWL approximation must be evaluated. This defines an optimization problem with the error as the objective function. Genetic algorithm (GA) was selected to solve this optimization problem.

4. Why Genetic Algorithm?

Prior to applying GA, we developed and implemented an original approach to response curve approximation that was formulated as a specific Non-Linear Programming optimization problem with a continuous but non-differentiable objective function and linear and nonlinear constraints. However, after extensive experimentation with different data sets it was discovered that this algorithm often failed to find a global optimal solution and found the local optimum instead. This local solution greatly depends on the values of first approximation ("original values") provided by the algorithm randomly or by an analyst. Only in 15 - 20 % of cases were original values successfully selected in terms of finding a global optimum. In other words, the algorithm was not robust with regards to original values.

GA consists of a class of population-based algorithms. They are algorithms that work based on the notion of evolution in nature. A schematic representation of how a genetic algorithm works is depicted in Figure 4. It starts with a randomly selected population from the initial population, which is the solution space. Then, the value of an objective function is calculated for all members of the population. If the termination criterion has not yet been satisfied, the genetic operators, such as crossover and mutation, are applied to the population. This will produce a new population which is called the offspring. The iteration is repeated for this new population.

GA works on a population of solutions to solve the optimization problems. By considering a sufficiently large population of possible solutions, a large solution space can be searched for a global optimum with a decreased chance of getting trapped in a local optimum.
A population of solutions in a PWL approximation problem is a population of sets of four nodes such that N 1 , N 2 , N 3 and N 4 , where corresponding x-coordinates
x 1 < x 2 < x 3 < x 4 . As depicted in Figure 2 each member of the population can represent a PWL approximation for the given set of data points.

As shown in Figure 4, GA starts with a randomly created population of sets, whereas each set contains four nodes. Then the objective function value, or the fitness value (genetic algorithm jargon), is calculated for each member of the population.


Figure 4: A scheme of genetic algorithm based optimization process [2]

To check if the solution was found and terminate the GA procedure, two criteria are used in this application. The first one is the convergence criterion and the second one is the calculation time.

The convergence criterion is used to find the best relative fitness RFitness in the population. The best relative fitness as it is shown by (1) is a quantity that estimates the magnitude of the fitness value for each member of the population in comparison with all other members. I.e. for a sample member m of the population:

Here, I is the population size.

Then, if this value does not change for a user defined number of generations, the program stops due to the convergence criterion applied to the population. This means the optimization process could find a global or at least a reliable local solution for the problem. In other case, if the convergence criterion was not satisfied during the generations, the optimization process continues until a user defined number of generations is reached.

As mentioned in Section 1, the solution of the PWL approximation problem is a set of four points:

N 1 ( x 1 , y 1 ), N 2 ( x 2 , y 2 ), N 3 ( x 3 , y 3 ) and N 4 ( x 4 , y 4 ). Hence eight unknowns have to be found to draw PWL approximation corresponding to this set of four nodes. Since all possible solutions are located inside the rectangle shown in Figure 3, the values of x  s for the first and last points are known at the beginning. So the number of unknowns reduces to six values.

As depicted in Figure 3:

On the other hand:

PNG Hence the area surrounded by x min  and  x max is searched for x 2  and  x 3 . Also, the area surrounded by Y min  and  Y max is searched for all values of y .
Figure 5 represents the GA based optimization algorithm for this PWL approximation procedure. Here, K is the number of generations; and i is the counter for population number.

As depicted in Figure 5, the program starts by random selection of x 2 , x 3 , and y j , j = 1 to 4. Then, by having x 1 = X min  and  x 4 = X max all eight coordinates of  three pieces PWL approximation are known. The next step in to evaluate the fitness function for this PWL sample. In order to calculate the fitness function value for a population member, imagine the PWL approximation generated by that specific member is drawn. Figure 6 shows the method used to find the error value. As depicted in Figure 6, to find the error value between the actual data points (actual response curve) and its three-piece PWL approximation in the distance between x 1 = X min and x 4 = X max , some sample points (such that x 5 ) are selected along X axis. Then for each sample point two values for Y are found, one from the actual response curve and the other from the suggested PWL approximation. The deviation between these two values

Figure 5: the algorithm of PWL approximation
Figure 5: the algorithm of PWL approximation

represents the error value in that specific point. The more sample points are selected, the more accurate the error value is calculated. The number of sample points is a user choice of the algorithm.

Referring to Figure 5, as soon as the error value for one member of the population is calculated, the value of I is checked to see if the error values for all population members are calculated.

Figure 6: Deviation in a sample point x 5
Figure 6: Deviation in a sample point x 5

As soon as all fitness values are found, the fitness function evaluation process is conducted to make sure the termination criterion is satisfied. If it is satisfied, the program writes the coordinates of four nodes that create the best objective value, which is the minimum error, but if the termination criterion has not yet been satisfied, the GA operators will act on all population members. This procedure is explained in the next section.

5. Genetic operators

To create the next generation of the population members, it is necessary to apply genetic operators on the population. The first operator is the selection process.

5.1 Selection operator

Among different selection operators which are introduced through literature, the tournament selection is used for the aim of this application. This selection operator takes some members of population randomly. Then, the one with the highest fitness value is selected for the next generation. The process continues until the mating pool, which is comprised of only those members of the population producing offspring, is the same size as the population size. After the initial population, or the mating pool, is created, the crossover operator is applied to the population.

5.2 Crossover operator

Crossover is one of the genetic algorithm operators which let the system search more areas of the solution space. It randomly selects two chromosomes (each chromosome contains a set of parameters which define a proposed solution) from the current mating pool. These are called parental chromosomes. By switching the genes between these parental chromosomes, two new chromosomes, which are considered their "children", are created.

Here, after two parental chromosomes are selected, they are converted to the binary. Then a single point crossover is applied to these two parental chromosomes.

5.2.1 Single point crossover

In this method, one randomly selected location on both chromosomes is selected as the crossover point. All the data (genes) beyond this point are switched between the two parental chromosomes to create two children (Figure 7). As the final genetic operator, mutation is applied to the mating pool.

5.3 Mutation operator

Mutation has a perturbation effect on a randomly chosen chromosome. It produces a child chromosome with new characteristics that did not exist in the parent chromosome. It helps the G.A. to avoid a local optimum in order to approach a global optimum. Like crossover, the mutation is not applied to all the members of the population. The mutation rate varies based on the optimization problem and it is usually around a single digit percentage of the entire population. Mutation is carried out by different methods; one of the two most commonly used types found in this application is flip bit mutation.

Flip bit mutation

In this method, which is used for binary coded GAs, a chromosome is chosen randomly. Between genes inside that chromosome, the allele of one randomly selected gene flips from 0 to 1, or vice versa. For example, if the chromosome [1001101110110] with 13 genes is one of the members of the population, a randomly selected number between 1 and 13, say 7, introduces the genes selected for mutation. Since the allele for that gene is currently 1, becomes 0 during the mutation process. Hence, the new chromosome is [1001100110110].

Figure 7: Single point crossover
Figure 7: Single point crossover

6. New generation

After all genetic operators are applied; the next generation of the population is created. The whole process as depicted in Figure 5 is now repeated and this procedure continues until the termination criterion is satisfied.

7. Improvement Accuracy of the Result

In some cases, despite a reasonable overall error, the local error that is the error value for each piece might be high. This results in an inaccurate approximation for that specific piece. Such a case is illustrated in Figure 8. As can be seen in this figure, the first piece represents a reasonable approximation for the actual data in its bound, whereas the second and specially the third piece have a comparably high local error values.

Figure 8: Result after 10 generations
Figure 8: Result after 10 generations

Figure 9: Result after the convergence criterion is satisfied
Figure 9: Result after the convergence criterion is satisfied

To figure out this issue and provide more accurate approximation, besides the overall error, the local error values are also calculated in each piece. Then the final solution is found based on both global and local errors criteria satisfaction.

Experiments showed that applying this condition led to considerable improvement in the approximation accuracy.

8. Concavity vs. Semi-Concavity

Figure 2 represents the case of a concave linear approximation, which has been considered so far on this research. In a concave approximation, as depicted in Figure 2, the values of slopes for each piece, a 's, are constantly reduced and can be any positive or negative values. While in a semi-concave situation which is illustrated in Figure 10, the values of a 's cannot assign negative values. This case is applicable where no declination is acceptable for a given data.

To consider this situation, a new option was added to the program. In the case of the semi-concave approximation approach, the program is restricted to accept only positive values of slopes. It should be mentioned that for a predefined accuracy, there might be situations where finding a semi-concave approximation is not possible for a given set of data ponts. If this case happens, the error bound must be changed to a higher value. The program is then able to find a semi concave approximation.

Figure 10: A semi concave approximation
Figure 10: A semi concave approximation

9. Description of the Application

We have implemented the piecewise linear approximation (PWLA) engine as a .NET component. The computational experiments were conducted with a stand-alone test application built on top of the PWLA engine component that allows a user to specify
the PWLA parameters for a given data set. The parameters are a number of linear pieces
in PWL approximation constraints (concavity condition), and maximum number
of generations.

Currently, we are developing an Excel Add-In application based on the PWLA engine that will provide decision support for marketing and sales departments to help identify the optimal sales force structure and marketing budget.

Figure 11: Example of good PWL approximation

Figure 11: Example of good PWL approximation