On a reduction for a class of resource allocation problems (2020)



Introduction
The resource allocation problem (RAP) is a classical problem within operations research and has been studied extensively and continuously since the 1950s [61]. In its most basic and most studied form, this problem asks for the allocation of a given amount of resource over a set of activities while minimizing a given separable cost function (or, equivalently, maximizing a given separable utility function). Over the years, several variations and extensions of this basic setting have been studied, with different types of individual cost functions, additional constraints, and allocation restrictions such as integer-valued allocations [38].
With regard to the constraint structure, we focus on a general version of the RAP that occurs widely in applications, namely the RAP with additional submodular constraints (see, e.g., [25,18]). In this problem, for each subset of the activities, there is an upper bound on the total amount of resource allocated to these activities and this bound is given by a submodular set function. This problem has many applications in, e.g., machine learning [4,3], scheduling [74,44], and game theory [36,28,26]. Moreover, important special cases of this problem are the RAP with box constraints (see [61]), the RAP with generalized bound constraints (see [69]), and the RAP with nested constraints (see [82]). Important application areas for these special cases in particular include, among many others, regularized learning [12,47], telecommunications and energy management [60,79,88], and statistics [57,15] (see also the overviews in [61] and [82]).
Concerning the objective, state-of-the-art solution approaches for RAPs generally allow each activity to have its own arbitrary (convex) cost function. Although this is an interesting aspect from a mathematical point of view, it is questionable whether this generality is needed in practical problems. In applications, often the structure of the cost functions is the same for all activities (e.g., all functions are quadratic) and the difference lies primarily in different parameters for these functions (e.g., each function has different multiplicative factors). In fact, this is the case for the large majority of applications studied in (the works surveyed in) [61,62,1,82].
Scanning existing solution approaches for RAPs, one observes that, to obtain algorithms with a low computational complexity, many approaches rely on advanced data structures and procedures to store and manipulate problem parameters and intermediate bookkeeping values. However, it is often unclear whether such structures and procedures are actually fast in practice due to the lack of computational experiments (see also [62]). Moreover, the fact that these algorithms are highly complex often makes it difficult to implement them, which limits their adaptability in practice, where other aspects such as code maintainability and ease of use are often considered more important (see also [53]).
The aspects discussed above motivate us to study RAPs where the cost functions share a common structure and differ only in the parameter choice. More precisely, we introduce the notion of (a, b, f)-separable RAPs, wherein the cost function of each activity i is of the form $a_i f\left(\frac{x_i}{a_i} + b_i\right)$, where f is the common convex function and $a_i > 0$ and $b_i$ are given parameters that can be different for different activities i. Such cost functions are an extension of so-called d-separable functions introduced in [81]. Moreover, they are closely related to the concept of perspective functions [10,9], which arise naturally in many problems in applied mathematics. In particular, most of the applications in (works surveyed in) [61,62,1,82] can be modeled as (a, b, f)-separable RAPs.
In this article, we show a reduction result concerning (a, b, f)-separable RAPs with submodular constraints. More precisely, we show that for given parameters a and b and an instance of this class of RAPs, there exists a feasible solution to this instance that is optimal for any choice of the convex function f. In particular, we show that any solution that is optimal for the basic quadratic version of this RAP, i.e., where $f(y) = \frac{1}{2} y^2$, is also optimal for the (a, b, f)-separable version for any choice of f. This means that solving any (a, b, f)-separable RAP reduces to solving the quadratic version of this RAP, which allows us to solve this problem using any tailored algorithm for the quadratic RAP. Thus, to solve this problem, we do not require algorithms designed for the more general version with arbitrary convex cost functions, which are in general much slower and less efficient than the tailored algorithms for the quadratic RAP. Moreover, especially for the quadratic RAP over box constraints, many different types of algorithms exist, each of which has different pros and cons depending on the application [61,62]. Thus, our reduction result allows us to solve a wide range of RAPs using the extensive collection of solution approaches and algorithms for quadratic RAPs.
In the literature, similar results already exist for specific RAPs. For RAPs over submodular constraints, [16] showed that the problem with quadratic cost functions is equivalent to the problem of computing a lexicographically optimal base with regard to a given weight vector. [55] extends this result to a range of different strictly convex cost functions for the case of continuous variables. Their result is used in [56] to solve optimization problems on graphs and in [73] to derive efficient algorithms for processor scheduling problems. For a special case of RAPs with nested constraints, the equivalence of (a, b, f )-separable RAPs is proven in [1] for the case where the functions f are strictly convex and differentiable, b = 0, and with continuous variables.
Some reduction results can be derived from existing algorithms for specific applications in the literature. An example of this concerns the vessel speed optimization problem (see, e.g., [58]). In this problem, a ship traverses a given route between ports and must dock at each given port within a specific time window. The goal is to determine the ship's speed on each leg of the route while minimizing fuel costs. The authors in [58] propose a recursive-smoothing algorithm (RSA) for this problem, which is shown to be optimal by [33]. This algorithm does not require any knowledge of the fuel cost function other than that it is convex. Thus, the optimal solution output by this algorithm is independent of the choice of cost function.
Another example is the scheduling of tasks on a single processor with agreeable deadlines (see, e.g., [20]). Here, we are given a number of tasks that must be processed on a single processor, each of which has its own workload, arrival time, and deadline. The goal is to assign processor speeds to tasks such that all tasks are finished before their deadline while minimizing the total energy usage of the processor. This energy usage depends on the workload of each task and on the power required to maintain a given processor speed. Analogously to the vessel speed optimization problem, the processor scheduling problem can be solved using the RSA without any knowledge of the nature of the convex cost function [32]. Thus, the optimal solution output by this algorithm also does not depend on the power function.
Our reduction result generalizes all the above results to general convex functions f, i.e., not necessarily strictly convex or differentiable, and to both continuous and integer variables. In particular, in the case of continuous variables and a strictly convex function f, our reduction result becomes an equivalence result since the optimal solution to any strictly convex optimization problem is unique. In fact, given the parameters a and b, an instance of RAP, and two strictly convex functions f and $\tilde{f}$, we show that the (a, b, f)-separable and (a, b, $\tilde{f}$)-separable versions of this RAP have the same unique optimal solution and are thus equivalent.
Next to the theoretical impact of our reduction result, we demonstrate the added value of this result for several applications. For a number of problems from the areas of telecommunications, statistics, and energy management, we show that our results provide new insights and improve several existing solution approaches. In particular, we show that the vessel speed optimization problem and the processor scheduling problem mentioned above can be solved in O(n log n) time. This is an improvement over their currently best known time complexity of O(n^2).
Summarizing, our technical contributions are as follows:
1. We show that (a, b, f)-separable RAPs with submodular constraints reduce to their quadratic versions, making available the more extensive collection of fast and efficient algorithms for quadratic RAPs to solve these problems.
2. We discuss the impact of this result on some special cases of the considered RAPs and derive new worst-case time complexity results for these cases.
3. We apply our results to core problems from several application areas and show how they can be solved more efficiently using our reduction result. For two of these problems, we improve their worst-case time complexity from O(n^2) to O(n log n).

Moreover, on a higher level and perhaps of independent interest, our work demonstrates that methodological research on RAPs is conducted independently in many different research fields, often under different names. As a consequence, many conceptual insights, structural properties, and solution approaches for RAPs have been re-invented and re-discovered many times over the years, both within the same field and independently in several fields. Therefore, we aim to promote a cross-disciplinary approach for studying RAPs. Such an approach will both reduce the number of future re-discoveries and re-inventions and allow researchers to benefit from the many available different perspectives on RAPs.
The organization of this article is as follows. In Section 2, we provide formal problem definitions of the studied RAPs and introduce the used notation. In Section 3, we prove the reduction result and in Section 4, we discuss the impact of this result on each of the studied RAPs. In Section 5, we demonstrate the impact of our reduction result on several application areas. Finally, Section 6 contains our conclusions.

Problem formulation and preliminaries
In this section, we formulate the studied resource allocation problems, i.e., the RAP over submodular constraints and its special cases, and introduce the used notation and definitions.

Notation and definitions
In the following, we introduce some notation and properties of used functions and sets. For this, let N := {1, . . . , n} be the index set of the given activities. We call a convex function $\Phi : \mathbb{R}^n \to \mathbb{R}$ separable if it can be written as the sum of single-variable convex functions, i.e., if $\Phi(x) = \sum_{i \in N} \phi_i(x_i)$ for some single-variable convex functions $\phi_i : \mathbb{R} \to \mathbb{R}$, i ∈ N. Moreover, given two vectors $a \in \mathbb{R}^n_{>0}$ and $b \in \mathbb{R}^n$ and a single-variable convex function $f : \mathbb{R} \to \mathbb{R}$, we say that Φ is (a, b, f)-separable if each function φ_i can be written as
$$\phi_i(x_i) = a_i f\left(\frac{x_i}{a_i} + b_i\right).$$
Note that we do not pose any restrictions on f other than convexity. Hence, neither f nor φ_i is necessarily strictly convex or differentiable. We denote the left and right derivatives of f by $f^-$ and $f^+$, respectively. It follows that the left derivative $\phi_i^-$ of φ_i is given by
$$\phi_i^-(x_i) = f^-\left(\frac{x_i}{a_i} + b_i\right).$$
Analogously, the right derivative $\phi_i^+$ of φ_i is given by $\phi_i^+(x_i) = f^+\left(\frac{x_i}{a_i} + b_i\right)$. Throughout this article, we call a resource allocation problem (a, b, f)-separable if its objective function is (a, b, f)-separable.
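The chain rule behind these derivative formulas (the factor $a_i$ cancels against the inner derivative $1/a_i$) can be checked numerically. The following minimal Python sketch assumes, purely for illustration, the non-differentiable convex function f(y) = max(y, 0); the helper names `phi`, `phi_left`, `phi_right`, and `one_sided_quotients` are hypothetical. It compares the one-sided derivatives of φ_i obtained from $f^-$ and $f^+$ with backward and forward difference quotients, including at the kink $x_i = -a_i b_i$.

```python
# Sketch: one-sided derivatives of phi_i(x) = a_i * f(x / a_i + b_i)
# for the illustrative, non-differentiable convex f(y) = max(y, 0).

def f(y):
    return max(y, 0.0)

def f_left(y):
    # Left derivative of max(y, 0): 0 for y <= 0, 1 for y > 0.
    return 1.0 if y > 0 else 0.0

def f_right(y):
    # Right derivative of max(y, 0): 0 for y < 0, 1 for y >= 0.
    return 1.0 if y >= 0 else 0.0

def phi(x, a, b):
    return a * f(x / a + b)

def phi_left(x, a, b):
    # a * f^-(x/a + b) * (1/a): the factor a cancels.
    return f_left(x / a + b)

def phi_right(x, a, b):
    return f_right(x / a + b)

def one_sided_quotients(x, a, b, h=1e-7):
    """Backward and forward difference quotients of phi at x."""
    left = (phi(x, a, b) - phi(x - h, a, b)) / h
    right = (phi(x + h, a, b) - phi(x, a, b)) / h
    return left, right

a, b = 2.0, 0.5  # the kink of phi lies at x = -a * b = -1
for x in (-3.0, -1.0, 1.0):
    print(x, (phi_left(x, a, b), phi_right(x, a, b)), one_sided_quotients(x, a, b))
```

At x = -1 the exact one-sided derivatives are 0 and 1, and the difference quotients agree with them up to rounding.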
We denote the vector of ones of dimension n by $\bar{e}$. Furthermore, for each index i ∈ N, we denote by $e_i \in \mathbb{R}^n$ the standard basis vector associated with i, i.e., the vector whose i-th entry is 1 and whose other entries are all zero. For a given set $C \subset \mathbb{R}^n$ and x ∈ C, let $E_C(x)$ denote the set of index pairs $(i, k) \in N^2$ for which we can always shift a small amount from $x_i$ to $x_k$ without violating feasibility. More precisely, we define
$$E_C(x) := \left\{(i, k) \in N^2 : \exists\, \delta > 0 \text{ such that } x + \epsilon\,(e_k - e_i) \in C \text{ for all } \epsilon \in [0, \delta]\right\}.$$
Each pair $(i, k) \in E_C(x)$ is called an exchangeable pair. Finally, let $r : 2^N \to \mathbb{R}$ be a set function on the ground set N. The set function r is submodular if $r(X \cup Y) + r(X \cap Y) \le r(X) + r(Y)$ for any $X, Y \subseteq N$, where we assume that $r(\emptyset) = 0$.
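To make exchangeable pairs concrete, consider the box-constrained set $C = \{x \in \mathbb{R}^n : l \le x \le u,\ \sum_{i} x_i = R\}$, where the bounds $l_i, u_i$ are chosen here for illustration. Shifting a small amount ε from $x_i$ to $x_k$ keeps the total unchanged, so the shifted point stays in C exactly when $x_i > l_i$ and $x_k < u_k$. A minimal sketch of $E_C(x)$ for this particular C, omitting the trivial pairs (i, i); the function name `exchangeable_pairs` is hypothetical:

```python
def exchangeable_pairs(x, l, u):
    """E_C(x) for C = {x : l <= x <= u, sum(x) = R}, without the trivial pairs (i, i).

    Moving a small eps > 0 from x_i to x_k leaves sum(x) unchanged, so the result
    stays in C for all small eps exactly when x_i > l_i and x_k < u_k.
    """
    n = len(x)
    donors = [i for i in range(n) if x[i] > l[i]]     # can give up resource
    receivers = [k for k in range(n) if x[k] < u[k]]  # can absorb more resource
    return {(i, k) for i in donors for k in receivers if i != k}


pairs = exchangeable_pairs(x=[0, 1, 2], l=[0, 0, 0], u=[2, 2, 2])
print(sorted(pairs))  # [(1, 0), (2, 0), (2, 1)]
```

For general sets C, deciding membership in $E_C(x)$ requires knowledge of the constraint structure; the box case is simply the easiest one to write down.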

Problem classification
The basic version of the resource allocation problem calls for an allocation $x \in \mathbb{R}^n$ of a given amount of resource $R \in \mathbb{R}$ over a set of activities N such that a given convex cost function Φ(x) of the allocation is minimized. This problem can be formulated as follows:
$$\text{RAP}:\quad \min_{x \in \mathbb{R}^n} \Phi(x) \quad \text{s.t.} \quad \sum_{i \in N} x_i = R.$$
Based on this basic version, we can formulate several extensions of the problem RAP with different types of cost functions, additional constraints, and different types of decision variables. To clearly distinguish between these problems, we adopt a classification scheme similar to that in [34] and [38], i.e., we specify each problem by three fields α/β/γ, where α specifies the objective function, β describes the constraint set, and γ specifies the nature of the decision variables.
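For α, we consider the following options:
1. S: Φ is separable.
2. (a, b, f)-S: Φ is (a, b, f)-separable.
3. (a, b)-Q: Φ is (a, b, f)-separable with $f(y) = \frac{1}{2} y^2$. This means that Φ is both separable and quadratic.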
For β, we consider the following special constraint structures, where we use M := {1, . . . , m} as an index set for additional constraints:
1. Box constraints (Box): Constraints of the form $l_i \le x_i \le u_i$, $i \in N$, are given.
2. Generalized bound constraints (GBC): Next to the box constraints, also constraints of the form $L_j \le \sum_{i \in N_j} x_i \le U_j$, $j \in M$, are given, where the sets $N_1, \ldots, N_m$ form a partition of N.
3. Nested constraints (NC): Next to the box constraints, also constraints of the form $L_j \le \sum_{i \in N_j} x_i \le U_j$, $j \in M$, are given, where the sets $N_1, \ldots, N_m$ are such that $N_1 \subset \cdots \subset N_m \subset N$.
4. Laminar (or tree) constraints (LC): Constraints of the form $L_j \le \sum_{i \in N_j} x_i \le U_j$, $j \in M$, are given, where the subsets $N_1, \ldots, N_m$ of N have the following property: if $N_j \cap N_\ell \neq \emptyset$, then either $N_j \subset N_\ell$ or $N_j \supset N_\ell$ for all $j, \ell \in M$.
5. Submodular constraints (SC): Constraints of the form $\sum_{i \in S} x_i \le r(S)$, $S \subset N$, and $\sum_{i \in N} x_i = r(N)$ are given, where r is a given submodular function with $r(N) = R$.
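For very small ground sets, both submodularity of r and feasibility of a vector for the constraints in SC can be checked by brute force over all $2^n$ subsets. The sketch below does exactly that; the helper names and the example function r(S) = min(Σ_{i∈S} cap_i, 3) with hypothetical capacities are illustrative assumptions (a capped sum of nonnegative weights is a standard example of a submodular function). This exponential enumeration is only a teaching aid and not how the algorithms discussed later handle SC.

```python
from itertools import combinations


def all_subsets(n):
    """All subsets of {0, ..., n-1} as frozensets (2^n of them)."""
    return [frozenset(c) for size in range(n + 1) for c in combinations(range(n), size)]


def is_submodular(r, n, tol=1e-12):
    """Brute-force test of r(X | Y) + r(X & Y) <= r(X) + r(Y), with r(empty) = 0."""
    sets = all_subsets(n)
    if abs(r(frozenset())) > tol:
        return False
    return all(r(X | Y) + r(X & Y) <= r(X) + r(Y) + tol for X in sets for Y in sets)


def satisfies_sc(x, r, tol=1e-12):
    """Check x(S) <= r(S) for every S and x(N) = r(N)."""
    n = len(x)
    if abs(sum(x) - r(frozenset(range(n)))) > tol:
        return False
    return all(sum(x[i] for i in S) <= r(S) + tol for S in all_subsets(n))


# Hypothetical capacities; r(S) = min(sum of capacities in S, 3) is submodular.
caps = [1.0, 2.0, 2.0]
r = lambda S: min(sum(caps[i] for i in S), 3.0)
print(is_submodular(r, 3), satisfies_sc([1.0, 1.0, 1.0], r), satisfies_sc([2.0, 0.0, 1.0], r))
# True True False
```

The vector [2, 0, 1] fails because $x_0 = 2 > r(\{0\}) = 1$, although its total matches $r(N) = 3$.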
Note that the constraint structures Box, GBC, and NC are special cases of the structure LC. Moreover, it can be shown that the structure LC is a special case of the structure SC (see Appendix A). Thus, all constraint structures are special cases of SC. We discuss each of these special cases in more detail in Section 4. For γ, we consider the following two cases:
1. Continuous decision variables (C): $x \in \mathbb{R}^n$.
2. Integer decision variables (I): $x \in \mathbb{Z}^n$.

Table 1 summarizes the possible entries of α, β, and γ as a compact reference. To simplify the presentation, we assume that whenever we specify β, the parameters that define the corresponding constraints are fixed. For example, when we consider the problems (a, b, f)-S/LC/C and (a, b)-Q/LC/C for some vectors a, b and convex function f, we assume that the subsets $N_1, \ldots, N_m$ and the vectors $L := (L_j)_{j \in M}$ and $U := (U_j)_{j \in M}$ are fixed.
Finally, apart from these extensions of RAP, we study a general constraint class where a constraint x ∈ C for some set $C \subset \mathbb{R}^n$ is given. We pose no restrictions on this set other than that it is nonempty. We denote this constraint class by C and denote the corresponding separable, (a, b, f)-separable, and (a, b)-quadratic versions of this problem by S/C, (a, b, f)-S/C, and (a, b)-Q/C, respectively.

Reduction of (a, b, f)-separable RAPs to quadratic RAPs

The goal of this section is to show for all the RAPs introduced in the previous section that their (a, b, f)-separable versions reduce to their quadratic versions. More precisely, given a constraint structure β, variable type γ, convex function f, and vectors $a \in \mathbb{R}^n_{>0}$ and $b \in \mathbb{R}^n$, we show that any optimal solution to (a, b)-Q/β/γ is also optimal for (a, b, f)-S/β/γ. This means that we can solve (a, b, f)-S/β/γ by solving (a, b)-Q/β/γ. Note that for many of these quadratic RAPs, tailored algorithms exist that are faster and more efficient than algorithms for the case with arbitrary convex cost functions. Thus, this reduction result allows us to solve (a, b, f)-S/β/γ problems using fast algorithms for their quadratic special case.
We start by considering the general constrained optimization problem S/C with a convex separable objective function:
$$\text{S/C}:\quad \min_{x} \sum_{i \in N} \phi_i(x_i) \quad \text{s.t.} \quad x \in C,$$
where $C \subset \mathbb{R}^n$. Recall that we do not assume any properties of the set C other than that it is nonempty, and that all RAPs introduced in the previous section are special instances of this problem. We show that if S/C satisfies a certain optimality condition, any optimal solution to (a, b)-Q/C is also optimal for (a, b, f)-S/C. This optimality condition states that a feasible solution x to S/C is optimal if and only if moving an arbitrary amount from one variable $x_i$ to another variable $x_k$ while maintaining feasibility never leads to a decrease in objective value. We state this condition as Condition 1 and give the mentioned reduction result in Theorem 1.

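Condition 1. Given convex functions $\phi_i : \mathbb{R} \to \mathbb{R}$, i ∈ N, and a set $C \subset \mathbb{R}^n$, a feasible solution x to S/C is optimal if and only if we have for each exchangeable pair $(i, k) \in E_C(x)$ that
$$\phi_k^+(x_k) - \phi_i^-(x_i) \ge 0.$$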
Theorem 1. Let the set C, a convex function f, and vectors $a \in \mathbb{R}^n_{>0}$ and $b \in \mathbb{R}^n$ be given. If Condition 1 is satisfied by S/C and x ∈ C is optimal for (a, b)-Q/C, then x is also optimal for (a, b, f)-S/C.
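Proof. Let x be an optimal solution to (a, b)-Q/C. Note that for the problem (a, b)-Q/C, we have that $\phi_i^-(x_i) = \phi_i^+(x_i) = \frac{x_i}{a_i} + b_i$ for all i ∈ N. Hence, by Condition 1, we have $\frac{x_k}{a_k} + b_k \ge \frac{x_i}{a_i} + b_i$ for each exchangeable pair $(i, k) \in E_C(x)$; note that $E_C(x)$ depends only on C and x, not on the objective. Since by convexity of f the right derivative $f^+$ is non-decreasing and we have $f^-(y) \le f^+(y)$ for all $y \in \mathbb{R}$, it follows that for each such pair
$$\phi_k^+(x_k) = f^+\left(\frac{x_k}{a_k} + b_k\right) \ge f^+\left(\frac{x_i}{a_i} + b_i\right) \ge f^-\left(\frac{x_i}{a_i} + b_i\right) = \phi_i^-(x_i).$$
Thus, by applying Condition 1 to (a, b, f)-S/C, this implies that x is optimal for (a, b, f)-S/C.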
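Theorem 1 can be illustrated numerically for the box-constrained case, for which Condition 1 holds. The minimal Python sketch below is only an illustration under assumptions: the data and the helper names `solve_quadratic_box` and `condition1_holds` are hypothetical, and the quadratic problem is solved by plain bisection on the Lagrange multiplier rather than by any algorithm from the cited works. It solves (a, b)-Q/Box/C and then checks the inequality $\phi_k^+(x_k) \ge \phi_i^-(x_i)$ from Condition 1 for two other convex functions, $f(y) = e^y$ and the non-differentiable $f(y) = |y|$.

```python
import math


def solve_quadratic_box(a, b, l, u, R, iters=200):
    """Bisection on the multiplier lam for (a,b)-Q/Box/C (continuous variables).

    With f(y) = y**2 / 2 we have phi_i'(x_i) = x_i / a_i + b_i, so the KKT conditions
    give x_i(lam) = clip(a_i * (lam - b_i), l_i, u_i); sum_i x_i(lam) is non-decreasing.
    """
    def x_of(lam):
        return [min(max(ai * (lam - bi), li), ui) for ai, bi, li, ui in zip(a, b, l, u)]

    lo = min(li / ai + bi for ai, bi, li in zip(a, b, l))  # here every x_i equals l_i
    hi = max(ui / ai + bi for ai, bi, ui in zip(a, b, u))  # here every x_i equals u_i
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(x_of(mid)) < R:
            lo = mid
        else:
            hi = mid
    return x_of(0.5 * (lo + hi))


def condition1_holds(x, a, b, l, u, f_left, f_right, tol=1e-9):
    """Condition 1 for the box-constrained set: for all pairs (i, k) with
    x_i > l_i and x_k < u_k, require phi_k^+(x_k) >= phi_i^-(x_i)."""
    z = [xi / ai + bi for xi, ai, bi in zip(x, a, b)]  # arguments of f
    n = len(x)
    return all(f_right(z[k]) >= f_left(z[i]) - tol
               for i in range(n) if x[i] > l[i] + tol
               for k in range(n) if k != i and x[k] < u[k] - tol)


# Illustrative instance (hypothetical data).
a, b = [1.0, 2.0, 0.5, 3.0], [0.0, -1.0, 0.5, 0.2]
l, u, R = [0.0] * 4, [1.5, 4.0, 1.0, 2.0], 3.0
x = solve_quadratic_box(a, b, l, u, R)

abs_left = lambda y: -1.0 if y <= 0 else 1.0   # left derivative of |y|
abs_right = lambda y: -1.0 if y < 0 else 1.0   # right derivative of |y|
print(condition1_holds(x, a, b, l, u, math.exp, math.exp),
      condition1_holds(x, a, b, l, u, abs_left, abs_right))  # True True
```

The check mirrors the proof: for every exchangeable pair, the quadratic optimum satisfies $\frac{x_k}{a_k} + b_k \ge \frac{x_i}{a_i} + b_i$, and the monotonicity of $f^+$ carries this over to any convex f.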
Note that Theorem 1 does not require the problem S/C to be a RAP. This means that this theorem and thus our reduction result is more widely applicable to other problems, provided that they satisfy Condition 1.
It is well known that S/SC/γ satisfies Condition 1 for γ ∈ {C, I} (see, e.g., [25,18]). To give some insight into why this is the case, we provide for the interested reader in Appendix B an alternative proof of this claim for the relevant special case S/LC/γ that relies only on basic concepts from convex analysis such as subgradients. It follows that Theorem 1 can be applied to S/SC/γ and in particular also to all special cases of this problem:

Corollary 1. Let a convex function f, vectors $a \in \mathbb{R}^n_{>0}$, $b \in \mathbb{R}^n$, and entries β and γ as specified in Table 1 be given. If x is optimal for (a, b)-Q/β/γ, then x is also optimal for (a, b, f)-S/β/γ.

This corollary is an extension of the equivalence results in [55], where the reduction result is shown for the two special cases with continuous variables where f is strictly convex and differentiable or where

The validity of the reduction result of Theorem 1 for RAPs with submodular constraints and its special cases implies that any algorithm for solving the quadratic version of this problem can be used to solve the (a, b, f)-separable version. In particular, any time complexity or efficiency results for the quadratic version also apply to the (a, b, f)-separable version:

Corollary 2. Let a convex function f, vectors $a \in \mathbb{R}^n_{>0}$, $b \in \mathbb{R}^n$, and entries β and γ as specified in Table 1 be given. The worst-case time complexity of (a, b, f)-S/β/γ equals that of (a, b)-Q/β/γ.
Finally, for the case of continuous variables, Theorem 1 also holds when we replace the problem (a, b)-Q/β/C by (a, b, $\tilde{f}$)-S/β/C, where $\tilde{f}$ is a strictly convex function. This effectively turns our reduction result into an equivalence result between these two problems:

Corollary 3. Let vectors $a \in \mathbb{R}^n_{>0}$, $b \in \mathbb{R}^n$, and entries β as specified in Table 1 be given, and let $\tilde{f}$ be a strictly convex function and f be an arbitrary convex function. If x is optimal for (a, b, $\tilde{f}$)-S/β/C, then x is also optimal for (a, b, f)-S/β/C.
Proof. Since $\tilde{f}$ is strictly convex, x is the unique optimal solution to (a, b, $\tilde{f}$)-S/β/C. Let x' be the unique optimal solution to the strictly convex problem (a, b)-Q/β/C. By Theorem 1, x' is also optimal for (a, b, $\tilde{f}$)-S/β/C, so x' = x by uniqueness. Applying Theorem 1 once more, x is also optimal for (a, b, f)-S/β/C.

Corollary 3 allows us to solve a given continuous (a, b, f)-separable RAP using any algorithm that solves the (a, b, $\tilde{f}$)-separable version of the problem for some strictly convex function $\tilde{f}$, i.e., not only for quadratic objectives. This can be beneficial in cases where efficient algorithms have already been developed for a specific choice of a non-quadratic objective function, motivated by the given application.
In Section 4, we focus in more detail on each of the special cases of α/SC/γ. In particular, using the reduction result in Theorem 1 and Corollary 2, we establish worst-case complexity results for the (a, b, f )-separable versions of these problems.

Algorithms and complexity results for special cases
In this section, we first provide for each of the constraint types specified in Table 1 a brief overview of known algorithms for the given special case and other known complexity results. In particular, we focus on algorithms and complexity results for the quadratic versions of these problems. Second, we use the complexity results on the quadratic versions of the problems to prove complexity results on the (a, b, f)-separable versions. These results are based on Theorem 1 and Corollary 1, which state that we can solve each of these problems by solving the same problem with a quadratic objective function, i.e., where $f(y) = \frac{1}{2} y^2$. As a compact reference, Tables 2 and 3 summarize the complexity results discussed and obtained in this section.

α/Box/γ: Optimization over a single linear constraint
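The resource allocation problem over a single linear constraint, (a, b, f)-S/Box/γ, can be formulated as follows:
$$
\begin{aligned}
(a, b, f)\text{-S/Box}/\gamma:\quad \min_{x}\ & \sum_{i \in N} a_i f\left(\frac{x_i}{a_i} + b_i\right) \\
\text{s.t. } & \sum_{i \in N} x_i = R, && (1) \\
& l_i \le x_i \le u_i, \quad i \in N, && (2) \\
& x \in \mathbb{R}^n \text{ if } \gamma = \mathrm{C}, \quad x \in \mathbb{Z}^n \text{ if } \gamma = \mathrm{I}.
\end{aligned}
$$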
This problem and its more general version S/Box/γ have been studied since the 1950s [61]. Since then, many solution approaches and algorithms have been proposed for this problem, especially for the problems Q/Box/γ. We refer to [61,62] for surveys on the continuous version S/Box/C and to [38] for a brief but thorough review of the integer version S/Box/I. The best known complexities for S/Box/C and S/Box/I are O(n log(nR/ε)) and O(n log(R/n)), respectively, where ε is an accuracy parameter [14,29]. Furthermore, their quadratic versions (a, b)-Q/Box/C and (a, b)-Q/Box/I can be solved in O(n) time ([7] and [34], respectively). Through Corollary 2, this yields the following complexity result: both (a, b, f)-S/Box/C and (a, b, f)-S/Box/I can be solved in O(n) time.

The linear-time algorithms for (a, b)-Q/Box/C belong to the class of so-called breakpoint search algorithms that solve the problem by efficiently searching for the optimal Lagrange multiplier corresponding to the resource constraint (1) (see also [41]). The linear-time algorithm for (a, b)-Q/Box/I in [34] first solves the continuous version (a, b)-Q/Box/C of this problem using a linear-time algorithm such as in [7]. Subsequently, it uses this solution and a specific rounding scheme to construct an instance of (a, b)-Q/Box/I with R = O(n) that has the same optimal solution as the original instance of (a, b)-Q/Box/I. Using the algorithm in, e.g., [14,29] for S/Box/I, this instance can be solved in O(n log(O(n)/n)) = O(n) time.
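The breakpoint idea can be sketched compactly. For the quadratic objective, the KKT conditions give $x_i(\lambda) = \min(\max(a_i(\lambda - b_i), l_i), u_i)$, which is piecewise linear in λ with breakpoints $l_i/a_i + b_i$ and $u_i/a_i + b_i$; between consecutive breakpoints the total allocation is linear, so the optimal multiplier follows from a search over the sorted breakpoints plus one interpolation. The sketch below is a sorting-based O(n log n) variant (binary search over sorted breakpoints), not the median-based linear-time algorithms of [7]; the function name `breakpoint_search` is hypothetical.

```python
def breakpoint_search(a, b, l, u, R):
    """Sorting-based breakpoint search for (a,b)-Q/Box/C in O(n log n) time.

    x_i(lam) = clip(a_i * (lam - b_i), l_i, u_i) is piecewise linear in lam with
    breakpoints l_i/a_i + b_i and u_i/a_i + b_i, so sum_i x_i(lam) is linear
    between consecutive breakpoints.
    """
    if not sum(l) <= R <= sum(u):
        raise ValueError("infeasible instance")

    def x_of(lam):
        return [min(max(ai * (lam - bi), li), ui) for ai, bi, li, ui in zip(a, b, l, u)]

    def total(lam):
        return sum(x_of(lam))

    bps = sorted({li / ai + bi for ai, bi, li in zip(a, b, l)}
                 | {ui / ai + bi for ai, bi, ui in zip(a, b, u)})
    # Invariant: total(bps[lo]) <= R <= total(bps[hi]); O(log n) evaluations of O(n) each.
    lo, hi = 0, len(bps) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if total(bps[mid]) <= R:
            lo = mid
        else:
            hi = mid
    s_lo, s_hi = total(bps[lo]), total(bps[hi])
    if s_hi == s_lo:
        lam = bps[lo]
    else:  # linear interpolation on the final breakpoint interval
        lam = bps[lo] + (R - s_lo) * (bps[hi] - bps[lo]) / (s_hi - s_lo)
    return lam, x_of(lam)


print(breakpoint_search([1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [10.0, 10.0], 4.0))
# (2.0, [2.0, 2.0])
```

By Corollary 1, the returned allocation is also optimal for (a, b, f)-S/Box/C for any convex f, even though the search only uses the quadratic structure.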
With regard to practical execution time, there are several classes of algorithms that outperform the aforementioned linear-time algorithms. For example, for the problem (a, b)-Q/Box/C, [42] shows that so-called variable-fixing algorithms that run in O(n^2) time are in general faster than linear-time algorithms such as in [7]. These algorithms first compute a solution to the problem without the box constraints (2) and subsequently determine the optimal value of several variables that exceed their bounds in this solution. This process continues until none of the variables in the solution to the relaxed problem exceeds its bounds. The worst-case time complexity of O(n^2) is attained when only one variable can be fixed to its optimal value during each step in the procedure. However, this is quite a pathological case since it has the property that in the optimal solution all variables are equal to one of their bounds.
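A minimal sketch of the variable-fixing principle for (a, b)-Q/Box/C follows. The fixing rule used here is one common textbook variant and an assumption on our side, not necessarily the exact rule evaluated in [42]: after solving the relaxation without box constraints, compare the total lower-bound violation with the total upper-bound violation. If clipping does not decrease the total, the optimal multiplier is at most the current one, so every variable below its lower bound is at that bound in the optimum; otherwise, the variables above their upper bounds are at those bounds. Each round fixes at least one variable, which gives the O(n^2) worst case mentioned above. The function name `variable_fixing` is hypothetical.

```python
def variable_fixing(a, b, l, u, R):
    """Variable-fixing sketch for (a,b)-Q/Box/C; at most n rounds, O(n^2) worst case."""
    n = len(a)
    x = [0.0] * n
    free = set(range(n))
    rem = R  # resource still to be allocated to the free variables
    while free:
        # Relaxation without box constraints: x_i = a_i (lam - b_i) with
        # sum_i a_i (lam - b_i) = rem, i.e. lam = (rem + sum a_i b_i) / sum a_i.
        lam = (rem + sum(a[i] * b[i] for i in free)) / sum(a[i] for i in free)
        y = {i: a[i] * (lam - b[i]) for i in free}
        below = [i for i in free if y[i] < l[i]]
        above = [i for i in free if y[i] > u[i]]
        if not below and not above:
            for i in free:
                x[i] = y[i]
            break
        viol_low = sum(l[i] - y[i] for i in below)
        viol_up = sum(y[i] - u[i] for i in above)
        # Clipping y changes its total by viol_low - viol_up. If the total does not
        # drop, the optimal multiplier is at most lam, so variables below their lower
        # bound sit at that bound in the optimum; otherwise fix the upper violators.
        fixed, bound = (below, l) if viol_low >= viol_up else (above, u)
        for i in fixed:
            x[i] = bound[i]
            rem -= bound[i]
            free.discard(i)
    return x


print(variable_fixing([1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [1.0, 10.0], 4.0))  # [1.0, 3.0]
```

In the example, the relaxation gives (2, 2); the first variable exceeds its upper bound of 1, is fixed there, and the remaining 3 units go to the second variable.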
Moreover, [86] shows that for several instances of (a, b, f)-S/Box/C, a specialized interior-point method significantly outperforms other approaches, including the linear-time breakpoint search approaches. Interior-point methods are iterative approaches where each intermediate solution is obtained from the previous one by taking a step in a search direction that is the solution of a perturbed version of the Karush-Kuhn-Tucker optimality conditions (see also [24]). Normally, the computation of this search direction is the computationally most expensive step of the interior-point method since it requires solving a linear system involving the constraint matrix. However, by exploiting the sparse structure of the constraint matrix for S/Box/C, the number of operations required to solve this system can be reduced from O(n^3) to O(n).
One reason for the rather poor practical performance of linear-time algorithms for (a, b)-Q/Box/C is that they require computing the median of sets of numbers. To attain a linear-time complexity, linear-time procedures for median finding such as in [6] must be used, and such methods are in general significantly slower than alternative sorting-based approaches that run in linearithmic time [40,2].
The linear-time complexity of (a, b)-Q/Box/I is based on the linear-time complexity of (a, b)-Q/Box/C and the existence of linear-time algorithms for selecting a k-th smallest element from a collection of sorted lists [34]. For the latter problem, many studies refer to [14] for such a linear-time algorithm. Analogously to the breakpoint search algorithms for (a, b)-Q/Box/C, this algorithm requires a linear-time algorithm for median finding to attain a linear-time complexity and may thus be slower in practice than alternative sorting-based approaches. It should be noted, however, that recently new linear-time algorithms have been developed that are based on specialized heap data structures and have been shown to have a better practical performance (see, e.g., [37]).
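The integer problem itself, and Corollary 1 in the integer case, can be illustrated with the classical marginal-cost greedy method: starting from x = l, repeatedly give one unit to the activity whose cost increases least. This is a deliberately simple substitute and not the linear-time algorithm of [34]; it is optimal for separable convex costs under the constraints (1) and (2), but it runs in O((R - Σ_i l_i) log n) time, which is only pseudo-polynomial. The data and the names `greedy_box_integer` and `cost` are hypothetical. The usage example solves the same instance once with the quadratic f and once with f(y) = e^y.

```python
import heapq
import math


def greedy_box_integer(a, b, l, u, R, f):
    """Marginal-cost greedy for (a,b,f)-S/Box/I with integer l, u, and R."""
    if not sum(l) <= R <= sum(u):
        raise ValueError("infeasible instance")

    def phi(i, xi):
        return a[i] * f(xi / a[i] + b[i])

    x = list(l)
    # Heap of (marginal cost of the next unit, activity index).
    heap = [(phi(i, x[i] + 1) - phi(i, x[i]), i) for i in range(len(a)) if x[i] < u[i]]
    heapq.heapify(heap)
    for _ in range(R - sum(l)):
        _, i = heapq.heappop(heap)
        x[i] += 1
        if x[i] < u[i]:
            heapq.heappush(heap, (phi(i, x[i] + 1) - phi(i, x[i]), i))
    return x


def cost(x, a, b, f):
    return sum(ai * f(xi / ai + bi) for xi, ai, bi in zip(x, a, b))


a, b = [1.0, 2.0, 3.0], [0.0, 0.5, -0.5]
l, u, R = [0, 0, 0], [5, 5, 5], 7
x_q = greedy_box_integer(a, b, l, u, R, lambda y: 0.5 * y * y)
x_e = greedy_box_integer(a, b, l, u, R, math.exp)
print(x_q, x_e, cost(x_q, a, b, math.exp) - cost(x_e, a, b, math.exp))
# [1, 1, 5] [1, 1, 5] 0.0
```

As predicted by Corollary 1, the quadratic optimum also attains the optimal exponential cost.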

α/GBC/γ: Optimization over generalized bound constraints
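Let $N_1, \ldots, N_m$ be a partition of the index set N. Given parameters $L, U \in \mathbb{R}^m$, the resource allocation problem with generalized bound constraints can be formulated as
$$
\begin{aligned}
(a, b, f)\text{-S/GBC}/\gamma:\quad \min_{x}\ & \sum_{i \in N} a_i f\left(\frac{x_i}{a_i} + b_i\right) \\
\text{s.t. } & \sum_{i \in N} x_i = R, \\
& L_j \le \sum_{i \in N_j} x_i \le U_j, \quad j \in M, && (3) \\
& l_i \le x_i \le u_i, \quad i \in N, \\
& x \in \mathbb{R}^n \text{ if } \gamma = \mathrm{C}, \quad x \in \mathbb{Z}^n \text{ if } \gamma = \mathrm{I}.
\end{aligned}
$$
Applications of this problem include portfolio optimization [45], transportation problems [11], stratified sampling [67], and electric vehicle charging [69].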
In the literature, this problem is studied primarily with only the upper bound constraints in (3). [29] shows that S/GBC/γ with only generalized upper bound constraints can be solved in the same time as S/Box/γ by reducing the problem to a sequence of subproblems S/Box/γ over in total n variables. [69] shows a similar result for (a, b)-Q/GBC/γ with both generalized lower and upper bound constraints, which yields an O(n) algorithm for solving (a, b)-Q/GBC/γ. Thus, by Corollary 2, the problems (a, b, f)-S/GBC/γ can also be solved in O(n) time.
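The reduction of the generalized bound problem to box subproblems can be sketched as follows for the continuous quadratic case. This is our own illustration under stated assumptions, not the O(n) method of [69]: we assume a feasible instance with $L_j \le U_j$, and we use the KKT-based observation that, for a fixed multiplier λ of constraint (1), the optimal total of group $N_j$ is the box-constrained group total clipped to $[L_j, U_j]$. Plain bisection locates λ; each group is then solved as a separate box subproblem with its total fixed. All function names are hypothetical.

```python
def clip(v, lo, hi):
    return min(max(v, lo), hi)


def bisect_increasing(fun, target, lo, hi, iters=200):
    """Return t in [lo, hi] with fun(t) ~= target, for a non-decreasing fun."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fun(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_gbc(a, b, l, u, groups, L, U, R):
    """Sketch for (a,b)-Q/GBC/C via nested bisection (a feasible instance is assumed)."""
    n = len(a)
    lam_lo = min(l[i] / a[i] + b[i] for i in range(n))
    lam_hi = max(u[i] / a[i] + b[i] for i in range(n))

    def x_i(i, lam):
        # Box-constrained minimizer of the quadratic term for multiplier lam.
        return clip(a[i] * (lam - b[i]), l[i], u[i])

    def group_total(j, lam):
        # Total of group j for multiplier lam, clipped to the generalized bounds.
        return clip(sum(x_i(i, lam) for i in groups[j]), L[j], U[j])

    def total(lam):
        return sum(group_total(j, lam) for j in range(len(groups)))

    lam = bisect_increasing(total, R, lam_lo, lam_hi)
    x = [0.0] * n
    for j, Nj in enumerate(groups):
        # Box subproblem on N_j whose resource amount is the group total.
        target = group_total(j, lam)
        mu = bisect_increasing(lambda t: sum(x_i(i, t) for i in Nj), target, lam_lo, lam_hi)
        for i in Nj:
            x[i] = x_i(i, mu)
    return x


# Illustrative instance (hypothetical data): groups {0, 1} and {2, 3}.
x = solve_gbc(a=[1.0, 2.0, 1.0, 1.0], b=[0.0, 0.0, 0.5, 0.0],
              l=[0.0] * 4, u=[3.0] * 4, groups=[[0, 1], [2, 3]],
              L=[0.0, 1.0], U=[2.0, 4.0], R=5.0)
print([round(v, 6) for v in x])  # [0.666667, 1.333333, 1.25, 1.75]
```

In the example, the upper bound $U_1 = 2$ of the first group is tight, and the remaining 3 units are spread over the second group.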

α/NC/γ: Optimization over nested constraints
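Let $N_1, \ldots, N_m$ be subsets of N such that $N_1 \subset \cdots \subset N_m \subset N$. The resource allocation problem with nested constraints, (a, b, f)-S/NC/γ, is stated as follows:
$$
\begin{aligned}
(a, b, f)\text{-S/NC}/\gamma:\quad \min_{x}\ & \sum_{i \in N} a_i f\left(\frac{x_i}{a_i} + b_i\right) \\
\text{s.t. } & \sum_{i \in N} x_i = R, \\
& L_j \le \sum_{i \in N_j} x_i \le U_j, \quad j \in M, && (4) \\
& l_i \le x_i \le u_i, \quad i \in N, && (5) \\
& x \in \mathbb{R}^n \text{ if } \gamma = \mathrm{C}, \quad x \in \mathbb{Z}^n \text{ if } \gamma = \mathrm{I}.
\end{aligned}
$$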
Research on this problem and the more general problem S/NC/γ has almost exclusively focused on the case with either the lower or upper nested constraints in (4) but not both. We refer to [1] for a survey on this version of the problem.
The most efficient algorithm for both S/NC/γ and (a, b)-Q/NC/γ is the decomposition algorithm in [82]. This algorithm solves the problem as a sequence of S/Box/γ subproblems where the single-variable bounds (2)  The algorithm in [82] attains the O(n log m) time complexity by utilizing the linear-time algorithms for (a, b)-Q/Box/γ to solve the subproblems. As mentioned in Section 4.1, these are not the fastest algorithms for these subproblems. As a consequence, it can be expected that using, e.g., variable-fixing algorithms [42] for the subproblems significantly improves the overall execution time of the algorithm.
It has been shown [87,68] that infeasibility-guided algorithms such as in [79,87] are significantly faster than the decomposition algorithm in [82]. These algorithms first compute a solution to S/NC/γ without the nested constraints (4) and, based on which nested constraint is violated most in this solution, subsequently divide the problem into two smaller instances of this problem. Analogously to the variable-fixing algorithms for (a, b)-Q/Box/C, the maximum number of divisions is O(n), which results in a worst-case time complexity of O(n^2 log(nR/ε)) for S/NC/C [79] and Θ(n^2 log(R/n)) for S/NC/I [87]. However, this worst case occurs only in pathological cases where each nested constraint is tight in an optimal solution, whereas the number of tight constraints can be expected to be relatively small in practice. In particular, for the case with only upper nested constraints (4), lower single-variable bounds (5), and randomly generated problem parameters, it is shown in [83] that the expected number of tight constraints in an optimal solution to (a, b, f)-S/NC/C is O(log n).
An alternative algorithm for (a, b)-Q/NC/C that attains the same time complexity as [82] for m = n is given in [68]. This algorithm is similar to the decomposition algorithm of [82] in the sense that it solves a (slightly different) sequence of (a, b)-Q/Box/C subproblems where the single-variable bounds for each subproblem are optimal solutions to previous subproblems. However, this algorithm avoids the time-consuming explicit computation of solutions to subproblems by exploiting the properties of a specific breakpoint searching algorithm for (a, b)-Q/Box/C and computing only the optimal Lagrange multiplier of each subproblem. As a consequence, this algorithm is shown to be one order of magnitude faster than the decomposition algorithm of [82], while attaining the same worst-case time complexity of O(n log n) for m = O(n).
Recently, for the problem (a, b)-Q/NC/C with only upper nested constraints, [85] shows that a specialized interior-point method is able to outperform the decomposition-based approach in [83], which is similar to the approach in [82], when the ratio m/n is larger than 0.1. Analogously to [86] as mentioned in Section 4.1, this method exploits the constraint structure of S/NC/C to compute search directions in O(n) time instead of O(n^3) time. Although the authors in [85] consider only upper nested constraints, it is straightforward to generalize their results to problems that also involve lower nested constraints [75]. Interestingly, [83,1] show that we can solve the problem (a, b, f)-S/NC/C with only nested upper constraints and without the box constraints (5) in O(n) time. More precisely, they show that this problem can be reduced to the problem of finding a concave cover of n points in $\mathbb{R}^2$ and give an O(n) time algorithm to find this cover. This algorithm is very similar to the recursive-smoothing algorithm mentioned in Section 1 that is used to solve the vessel speed optimization problem [58] and the processor scheduling problem with agreeable deadlines [32].
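The geometric idea behind the last result can be sketched in a few lines. This is our own illustration under assumptions, not the algorithm of [83,1]: we assume the nested sets are prefixes $N_k = \{1, \ldots, k\}$ (after relabeling, with $U_k = \infty$ where no constraint is present), only upper constraints $\sum_{i \le k} x_i \le U_k$, the resource constraint $\sum_i x_i = R$, and no box constraints. Substituting $y_i = x_i + a_i b_i$ removes b; the optimal cumulative allocation, plotted against the cumulative weights $A_k = a_1 + \cdots + a_k$, then follows the lower convex hull of the points $(A_k, U_k + B_k)$ with $B_k = \sum_{i \le k} a_i b_i$, and $y_i / a_i$ equals the slope of the hull segment containing activity i. Whether one speaks of a convex minorant or a concave cover depends on the sign convention; we use the lower hull here. Since each point enters and leaves the hull at most once, this takes O(n) time. The function name `nested_upper_no_box` is hypothetical.

```python
import math


def nested_upper_no_box(a, b, U, R):
    """Sketch: min sum_i a_i f(x_i/a_i + b_i) s.t. sum_{i<=k} x_i <= U[k-1]
    (k = 1..n-1, math.inf if absent) and sum_i x_i = R; no box constraints."""
    n = len(a)
    A, B = [0.0], [0.0]  # prefix sums of a_i and a_i * b_i
    for ai, bi in zip(a, b):
        A.append(A[-1] + ai)
        B.append(B[-1] + ai * bi)
    # Points (k, A_k, bound on the cumulative y = x + a*b up to k); both ends are fixed.
    pts = [(0, 0.0, 0.0)]
    pts += [(k, A[k], U[k - 1] + B[k]) for k in range(1, n) if math.isfinite(U[k - 1])]
    pts.append((n, A[n], R + B[n]))

    # Monotone-chain lower hull (points are already sorted by A_k).
    hull = []
    for p in pts:
        while len(hull) >= 2:
            _, x1, y1 = hull[-2]
            _, x2, y2 = hull[-1]
            # Drop hull[-1] if it lies on or above the segment from hull[-2] to p.
            if (x2 - x1) * (p[2] - y1) - (y2 - y1) * (p[1] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)

    # Activity i (1-based) lies on the hull segment with k_left < i <= k_right;
    # y_i / a_i equals that segment's slope and x_i = y_i - a_i * b_i.
    x, seg = [], 0
    for i in range(1, n + 1):
        while hull[seg + 1][0] < i:
            seg += 1
        _, x_left, y_left = hull[seg]
        _, x_right, y_right = hull[seg + 1]
        slope = (y_right - y_left) / (x_right - x_left)
        x.append(a[i - 1] * (slope - b[i - 1]))
    return x


print(nested_upper_no_box([1.0] * 4, [0.0] * 4, [math.inf, 1.0, math.inf], 4.0))
# [0.5, 0.5, 1.5, 1.5]
```

The construction never evaluates f, which is consistent with the reduction result: the same allocation is optimal for every convex f.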

α/LC/γ: Optimization over laminar constraints
Let $N_1, \ldots, N_m$ be subsets of N that satisfy the following property: if $N_j \cap N_\ell \neq \emptyset$, then either $N_j \subset N_\ell$ or $N_j \supset N_\ell$ for all $j, \ell \in M$. We formulate the resource allocation problem with laminar constraints, (a, b, f)-S/LC/γ, as follows:
$$
\begin{aligned}
(a, b, f)\text{-S/LC}/\gamma:\quad \min_{x}\ & \sum_{i \in N} a_i f\left(\frac{x_i}{a_i} + b_i\right) \\
\text{s.t. } & \sum_{i \in N} x_i = R, \\
& L_j \le \sum_{i \in N_j} x_i \le U_j, \quad j \in M, && (6) \\
& x \in \mathbb{R}^n \text{ if } \gamma = \mathrm{C}, \quad x \in \mathbb{Z}^n \text{ if } \gamma = \mathrm{I}.
\end{aligned}
$$
Similarly to S/NC/γ, the problem S/LC/γ has been studied mainly with only the upper laminar constraints in (6). The algorithms with the lowest computational complexities for these problems are given by [29] and have time complexities of O(n log n log(nR/ε)) for γ = C and O(n log n log(R/n)) for γ = I. For the general problem S/LC/γ, we obtain an efficient algorithm by combining results on the complexity of general separable convex optimization problems with linear constraints [31] and of the problem S/LC/C with a linear objective function [59]. More precisely, the time complexities of S/LC/C and S/LC/I are O(P_linear(8n^2, m) log(Rn/ε)) and O(P_linear(4n^2, m) log(Rm/n)), respectively, where P_linear(n, m) is the time complexity of solving an instance of S/LC/C with a linear objective function [31]. The latter problem can be solved in O(n log n) time using the algorithm in [59]; hence, we obtain time complexities of O(n^2 log n log(Rn/ε)) and O(n^2 log n log(Rm/n)) for S/LC/C and S/LC/I, respectively. With regard to the quadratic version of the problem, the special case of (a, b)-Q/LC/C with only upper laminar constraints can be solved in O(n log n) time [30]. This is done by reducing the problem to an instance of (a, b)-Q/NC/C with only upper nested constraints, which can be solved in O(n log n) time [30]. The general version of (a, b)-Q/LC/C with both lower and upper laminar constraints can be solved in O(n^2) time as an instance of the quadratic convex cost flow problem on a tree network [77].
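The laminarity property can be checked directly by comparing all pairs of sets. The brute-force sketch below (function name `is_laminar` hypothetical, example families illustrative) also shows concretely why the nested, partition, and singleton families underlying NC, GBC, and Box are laminar, whereas two overlapping but non-nested sets are not.

```python
def is_laminar(family):
    """True if any two sets in the family are disjoint or one contains the other."""
    sets = [frozenset(s) for s in family]
    for j in range(len(sets)):
        for k in range(j + 1, len(sets)):
            S, T = sets[j], sets[k]
            if S & T and not (S <= T or T <= S):
                return False
    return True


nested = [{1}, {1, 2}, {1, 2, 3}]   # NC-type family
partition = [{1, 2}, {3}, {4, 5}]   # GBC-type family
singletons = [{1}, {2}, {3}]        # Box-type family
crossing = [{1, 2}, {2, 3}]         # overlapping, neither contains the other
print([is_laminar(F) for F in (nested, partition, singletons, crossing)])
# [True, True, True, False]
```

This pairwise check takes O(m^2 n) time, which is sufficient for illustration.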
Finally, the integer-valued problem (a, b)-Q/LC/I can be solved in O(n^2) time by first computing a solution to the continuous version of this problem and subsequently applying a specific rounding procedure to obtain an optimal integer solution from this continuous solution [51]. By Corollary 2, these results carry over to (a, b, f)-S/LC/C and (a, b, f)-S/LC/I and yield the corresponding worst-case time complexities for these problems. As far as we are aware, the problem S/LC/γ has been studied primarily from an academic point of view in the literature, i.e., little attention has been paid to possible applications. One relevant application that has gained considerable importance in recent years is the scheduling of the (dis)charging of an electrical storage system within a smart grid (see also Section 5.2), where the energy can be drawn from each of the three phases within the low-voltage distribution network (see also [69]). The resulting problem is an instance of S/LC/C where the feasible set is the intersection of nested constraints (to model the storage capacity limits) and generalized upper bound constraints (to model the charging limits). We plan to investigate this topic further in future research.
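The laminar property defined at the beginning of this subsection can be checked directly from its definition. The brute-force sketch below (hypothetical helper `is_laminar`, pairwise comparison in O(m^2 n) time) also shows that a nested family, i.e., a single chain of subsets, is a special case.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def is_laminar(family: Sequence[FrozenSet[int]]) -> bool:
    """Return True if every two sets in the family are disjoint or nested."""
    for a, b in combinations(family, 2):
        overlap = a & b
        # Overlapping sets must be comparable by inclusion.
        if overlap and not (a <= b or b <= a):
            return False
    return True
```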

α/SC/γ: Optimization over submodular constraints
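Given a submodular function r over the ground set N, the (a, b, f)-separable resource allocation problem over submodular constraints can be formulated as follows:

$$
\begin{aligned}
(a, b, f)\text{-S/SC/}\gamma:\quad \min_{x}\ & \sum_{i\in N} a_i f\!\left(\frac{x_i}{a_i} + b_i\right)\\
\text{s.t.}\ & \sum_{i\in S} x_i \le r(S), \quad S \subset N, \qquad (7)\\
& \sum_{i\in N} x_i = r(N),\\
& x \in \mathbb{R}^n \text{ if } \gamma = C, \quad x \in \mathbb{Z}^n \text{ if } \gamma = I.
\end{aligned}
$$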
For this problem, one can find two classes of algorithms in the literature. The first class consists of decomposition algorithms that first compute a solution to the problem without the submodular constraints (7) and, based on which constraints are violated by this solution, split up the problem into two smaller instances of S/SC/γ [16,25]. Note that the infeasibility-guided algorithms for S/NC/γ discussed in Section 4.3 are based on the same principle. The best worst-case time complexities of such algorithms are O(n^2 log(n r(N)/ε) + n·EO) for S/SC/C [55] and O(n^2 (log(r(N)/n) + F log r(N)) + nF) for S/SC/I [38], where EO is the time required to minimize a given submodular function and F is the time required to check the feasibility of a given vector for the submodular constraints. Moreover, F̃ is the time required to determine, for a given solution x that is feasible for the submodular constraints (7), by how much a given variable x_i can be increased without violating any of these constraints. For the quadratic problems (a, b)-Q/SC/γ, these complexities reduce to O(n^2 + n·EO) for (a, b)-Q/SC/C and to O(n^2 F log r(N) + nF) for (a, b)-Q/SC/I. By Corollary 1, these are also the complexities for solving the problems (a, b, f)-S/SC/C and (a, b, f)-S/SC/I using decomposition algorithms.
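The quantity whose computation time is denoted by F̃ above, i.e., how far a variable x_i can be increased while staying feasible for (7), equals the minimum slack r(S) − x(S) over all subsets S that contain i. The brute-force sketch below evaluates this definition directly; the name `saturation_capacity` is hypothetical, and enumerating all 2^(n−1) subsets is only meant for tiny illustrative instances, not as the way decomposition algorithms compute it.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

SetFn = Callable[[FrozenSet[int]], float]


def saturation_capacity(x: List[float], i: int, r: SetFn) -> float:
    """Largest d >= 0 such that x + d * e_i still satisfies x(S) <= r(S) for all S.

    Only subsets containing i are affected by increasing x_i, so we scan those.
    """
    others = [k for k in range(len(x)) if k != i]
    best = float("inf")
    for size in range(len(others) + 1):
        for rest in combinations(others, size):
            s = frozenset(rest) | {i}
            slack = r(s) - sum(x[k] for k in s)  # remaining room in constraint S
            best = min(best, slack)
    return best
```

For example, with the submodular function r(S) = min(|S|, 2) and x = (0.5, 0.5, 0), the third variable can be raised by 1 and the first by 0.5.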
The second class consists of greedy algorithms that solve the integer version S/SC/I by incrementally building an optimal solution (see, e.g., [29,50]). However, instead of incrementing the total amount of allocated resource in unit steps, these algorithms apply a scaling procedure to determine larger step sizes that speed up the building process while still maintaining feasibility of the current solution. To solve the continuous version S/SC/C, these algorithms exploit a proximity result between optimal solutions of S/SC/C and S/SC/I (see, e.g., [51]), which states that for any optimal solution x* to S/SC/I there exists an optimal solution x̄ to S/SC/C such that |x̄_i − x*_i| ≤ n − 1. As a consequence, to solve S/SC/C with a given accuracy ε, one can scale all problem parameters by a factor ⌈n/ε⌉, solve the scaled problem with integer variables using the greedy algorithm, and scale back the resulting solution. The most efficient algorithms of this class run in O(n(log n + F̃) log(r(N)/(εn))) time for S/SC/C and O(n(log n + F̃) log(r(N)/n)) time for S/SC/I [29,50]; unfortunately, these complexities cannot be improved for the quadratic cases (a, b)-Q/SC/C and (a, b)-Q/SC/I.
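The basic greedy idea, without the scaling procedure, can be sketched as follows: starting from x = 0, repeatedly add one unit to the variable with the cheapest marginal cost among those whose increment keeps x feasible, until x(N) = r(N). The sketch assumes that r is integer-valued, non-decreasing, and submodular with r(∅) = 0 (a polymatroid rank function), so that starting from 0 is valid; under these assumptions this unit-step greedy rule is a classical optimal method for separable convex costs. It performs r(N) iterations with brute-force feasibility checks, so it only illustrates the principle; all names are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

SetFn = Callable[[FrozenSet[int]], int]


def is_feasible(x: List[int], r: SetFn) -> bool:
    """Check x(S) <= r(S) for every nonempty S (brute force, O(2^n))."""
    n = len(x)
    for size in range(1, n + 1):
        for s in combinations(range(n), size):
            if sum(x[k] for k in s) > r(frozenset(s)):
                return False
    return True


def greedy_sc_integer(n: int, r: SetFn, phi: Callable[[int, int], float]) -> List[int]:
    """Unit-step greedy for min sum_i phi(i, x_i) over the integer base polytope of r."""
    x = [0] * n
    for _ in range(r(frozenset(range(n)))):
        best_i, best_inc = -1, float("inf")
        for i in range(n):
            x[i] += 1
            ok = is_feasible(x, r)
            x[i] -= 1
            inc = phi(i, x[i] + 1) - phi(i, x[i])  # marginal cost of one more unit
            if ok and inc < best_inc:
                best_i, best_inc = i, inc
        if best_i < 0:
            raise ValueError("no feasible increment; is r a polymatroid rank function?")
        x[best_i] += 1
    return x
```

Ties are broken in favor of the smallest index; the scaled variants of [29,50] replace the unit increments by larger, adaptively chosen steps.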
One relevant special case of (a, b)-Q/SC/C is the problem of computing the minimum-norm point of a base polytope (see, e.g., [18]). This problem is equivalent to (ē, 0)-Q/SC/C and plays an important role as a subroutine in several algorithms for machine learning problems and submodular function minimization [19,3]. One of the most popular algorithms in practice for finding the minimum-norm point is Wolfe's algorithm [84], which solves the problem by iteratively updating a hyperplane and the minimum-norm point on this hyperplane based on the feasibility of this point. The authors of [8] show that this algorithm computes an ε-approximate solution to (ē, 0)-Q/SC/C in O(nM^2/ε) time, where M is the norm of the maximum-norm point. Although there are algorithms for finding the minimum-norm point with a better computational complexity, e.g., the aforementioned decomposition and greedy algorithms, Wolfe's algorithm has been shown to be among the fastest algorithms in practice [19,3].
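When Wolfe's algorithm is applied to a base polytope B(r), its linear minimization step, i.e., finding a vertex y of B(r) that minimizes w^T y for the current iterate w, can be carried out by Edmonds' greedy algorithm: sort the elements by increasing weight and assign to each element its marginal value of r. The sketch below shows only this oracle, not Wolfe's full loop of affine minimizations; it assumes r is submodular with r(∅) = 0, and the name `greedy_base_vertex` is hypothetical.

```python
from typing import Callable, FrozenSet, List


def greedy_base_vertex(w: List[float], r: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Vertex of the base polytope B(r) minimizing w^T y (Edmonds' greedy algorithm)."""
    order = sorted(range(len(w)), key=lambda i: w[i])  # increasing weights first
    y = [0.0] * len(w)
    prefix: FrozenSet[int] = frozenset()
    for i in order:
        extended = prefix | {i}
        y[i] = r(extended) - r(prefix)  # marginal value of adding element i
        prefix = extended
    return y
```

For r(S) = min(|S|, 2) and w = (3, 1, 2), the oracle returns y = (0, 1, 1), which indeed has the smallest weight among the three vertices of this base polytope.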

Impact on applications
The goal of this section is to show the relevance of (a, b, f)-separable resource allocation problems in applications. As we discussed in the previous section, our newly derived complexity results might not directly lead to practically faster algorithms for these problems. However, for a number of applications from the domains of telecommunications, statistics, and energy management, we show that our reduction result leads to new insights into common practices in these fields. In particular, we show that two problems in the areas of vessel routing and processor scheduling can be solved in O(n log n) time rather than O(n^2) time, which was the best previously known complexity for these problems. Finally, with this collection of applications and the included references, we intend to stimulate cross-disciplinary research that leads to new structural results and algorithms for RAPs that are applicable to many different research fields.

Power allocation in multi-channel communication systems
In many telecommunication systems, data can be transmitted over several parallel channels to reduce the amount of noise experienced when transmitting the data (see, e.g., [71]). The amount of data that can be transmitted through a given channel i, i.e., the channel capacity, depends on the power x_i spent on this channel, its bandwidth B_i, and a "gain" parameter c_i that represents the amount of noise on the channel. One goal in these systems is to allocate a given budget of total power P_tot over a set N of n channels such that the overall channel capacity is maximized while respecting power limits on each channel. This problem can be formulated mathematically as Problem (P), where P̄_i is the maximum allowed power on channel i. Note that for a given channel i ∈ N, the capacity can be split into two terms. Since the second term B_i log(B_i c_i) in this expression is constant, we can drop it and replace the objective function of Problem (P) by the sum over i ∈ N of the remaining first terms. Note that this is more efficient than several existing approaches for solving Problem (P) that claim a linear time complexity (see, e.g., [43,39]). The reason for this is that these algorithms achieve this complexity only if the gain parameters c_i have already been sorted, which is, however, only the case for some specific communication systems (see, e.g., [60]).
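For intuition, the classical water-filling solution of such a power allocation problem can be sketched as below. The capacity model B_i log(1 + c_i x_i) is an assumption on our part; it is one model consistent with the constant term B_i log(B_i c_i) mentioned above. The sketch finds the Lagrange multiplier ν of the power budget by bisection, using the KKT condition B_i c_i / (1 + c_i x_i) = ν clipped to the box [0, P̄_i]. It is not the O(n) method implied by the reduction, and the function name is hypothetical.

```python
from typing import List


def water_filling(B: List[float], c: List[float], p_max: List[float],
                  p_tot: float, iters: int = 200) -> List[float]:
    """Maximize sum_i B_i*log(1 + c_i*x_i) s.t. sum_i x_i = p_tot, 0 <= x_i <= p_max_i."""
    if not 0.0 <= p_tot <= sum(p_max):
        raise ValueError("total power budget is infeasible")

    def alloc(nu: float) -> List[float]:
        # Stationarity B_i*c_i/(1 + c_i*x_i) = nu, clipped to the box [0, p_max_i].
        return [min(max(b / nu - 1.0 / ci, 0.0), pm) for b, ci, pm in zip(B, c, p_max)]

    lo, hi = 1e-12, max(b * ci for b, ci in zip(B, c))  # alloc(hi) is all zeros
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > p_tot:
            lo = mid  # nu too small: too much power allocated
        else:
            hi = mid
    return alloc(hi)
```

The total allocated power is non-increasing in ν, which is what makes the bisection valid; channels whose cap P̄_i binds simply stay at that cap.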
Another common objective for the channel power allocation problem (P) is to minimize the mean square error between different channels from a set N (see, e.g., [88]). Moreover, several variations of the channel power allocation problem have been studied with, e.g., bounds on disjoint or nested subsets of allocations (see, e.g., [27] and [13], respectively). Analogously to Problem (P), one can show that these problems are instances of (a, b, f)-S/GBC/C and (a, b, f)-S/NC/C, respectively, and thus can be solved as instances of (a, b)-Q/GBC/C and (a, b)-Q/NC/C.

Storage operation in energy systems
Storage systems are becoming a crucial part of current and future sustainable energy systems (see, e.g., [65,46,89]). Such systems help to satisfy the energy demand of, e.g., a neighborhood when renewable energy sources such as solar and wind are insufficient due to, e.g., unfavorable weather conditions. Commonly, storage systems are operated such that the stress on the overall grid is reduced as much as possible. Determining the best operational schedule for the storage over a given time horizon, i.e., how much energy should be (dis)charged at each moment to reach the overall goal in the best way, leads to an optimization problem. In this problem, we divide the overall time horizon into n equidistant time intervals of length ∆t indexed by the set N := {1, ..., n} and determine for each interval i ∈ N the (dis)charging power x_i during this interval. This power is limited by the minimum and maximum charging rates X_min and X_max. Moreover, the charging must be done such that the storage capacity D is not exceeded. Given the initial amount of energy S_start in the storage and a desired target amount S_end at the end of the horizon, the storage operation problem can be formulated as Problem (B) (see also [79]), where the functions φ_i represent the desired grid objective. Note that if each function φ_i is convex, which is in general the case in this problem setting, this problem is an instance of S/NC/C. Three commonly used objectives to reduce grid stress and congestion are minimal import and export of energy from the main grid (also known as energy autarky, see, e.g., [52]), load profile flattening (see, e.g., [23]), and minimizing peak consumption (see, e.g., [78]). One way to model the latter objective is to set a maximum level M for the overall power consumption of the neighborhood.
Given the power consumption p := (p_i)_{i∈N} of the neighborhood, we can model each of these objectives through a suitable choice of the functions φ_i in Problem (B); for minimizing peak consumption, this involves a convex non-decreasing function f with f(M) = 0. Note that for the objective of load profile flattening, Problem (B) is an instance of (ē, p)-Q/NC/C, where ē is the vector of ones. Moreover, for the other two objectives, Problem (B) is an instance of (ē, p, f)-S/NC/C, where f is the absolute value function or a piecewise function, respectively. It follows by Corollary 1 that the optimal solution to (ē, p)-Q/NC/C is also optimal for (ē, p, f)-S/NC/C for these two functions. This implies that we can schedule the storage (dis)charging such that all three objectives are optimized simultaneously by aiming for load profile flattening. This effect can also be observed for other energy systems such as photovoltaic (solar panel) systems and electric vehicle charging (see, e.g., [54]) and heat pumps (see, e.g., [80]). Moreover, energy tariff systems that employ piecewise linear cost functions have been shown to be able to flatten the load profile, i.e., to achieve the objective modeled by a quadratic cost function (see, e.g., [64]). Since such tariff systems are simpler to explain to end users, they are more likely to be accepted than systems using quadratic cost functions, while still achieving the desired objective of load profile flattening.
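To make the nested structure of Problem (B) concrete, the sketch below checks a (dis)charging profile against the box and state-of-charge constraints and evaluates the three grid objectives. It assumes that the state of charge after interval t is S_start + ∆t ∑_{k≤t} x_k (these prefix sums are what produce nested constraints), that negative x_i means discharging, and, for the peak objective, the hypothetical choice f(y) = max(0, y − M), which is convex, non-decreasing, and satisfies f(M) = 0. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Storage:
    dt: float        # interval length (Delta t)
    x_min: float     # minimum charging rate X_min (negative = discharging, assumed)
    x_max: float     # maximum charging rate X_max
    capacity: float  # storage capacity D
    s_start: float   # initial energy S_start
    s_end: float     # required final energy S_end


def is_feasible(x: List[float], st: Storage, tol: float = 1e-9) -> bool:
    """Box constraints plus nested state-of-charge constraints 0 <= S_t <= D."""
    if any(xi < st.x_min - tol or xi > st.x_max + tol for xi in x):
        return False
    soc = st.s_start
    for xi in x:
        soc += st.dt * xi  # state of charge after this interval (prefix sum)
        if soc < -tol or soc > st.capacity + tol:
            return False
    return abs(soc - st.s_end) <= tol


def grid_objectives(x: List[float], p: List[float], m: float) -> Dict[str, float]:
    """Evaluate the three grid objectives on the net load x_i + p_i."""
    net = [xi + pi for xi, pi in zip(x, p)]
    return {
        "flattening": sum(v * v for v in net),
        "exchange": sum(abs(v) for v in net),
        "peak": sum(max(0.0, v - m) for v in net),  # hypothetical f(y) = max(0, y - M)
    }
```

For a load p = (1, 3, 1, 3), a storage with D = 4, S_start = S_end = 2, and rates in [−2, 2], the flattening profile x = (1, −1, 1, −1) is feasible and yields a constant net load of 2, so its peak excess above M = 2.5 is zero, whereas leaving the storage idle gives an excess of 1.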

Stratified sampling
Stratified sampling is a sampling method suitable for situations in which a random sample is likely not a proper representation of the population [57]. Such a situation occurs, e.g., when several subclasses of the population take extreme values of the characteristic to be estimated. To deal with this case, we partition the given population into n so-called strata with sizes N_1, ..., N_n that, ideally, represent the aforementioned subclasses. Given the desired overall sample size R, the goal is to determine for each stratum i ∈ N := {1, ..., n} the number of samples x_i drawn from this stratum while minimizing the variance of the estimate of the given characteristic. Following the formulation in [15], the optimal sample allocation is the solution of an optimization problem that minimizes this variance subject to a total of R samples and the sample bounds 0 ≤ x_i ≤ N_i, where S_i^2 is the variance of the characteristic within stratum i. Similarly to [15], the sample bounds 0 and N_i can be chosen differently to ensure a minimum or maximum number of samples drawn from a given stratum.
Let D ∈ R^n be a vector with D_i := N_i^2 S_i^2 for all i ∈ N. Then the above problem is an instance of the problem (D, 0, f)-S/Box/I with f(x_i) = 1/x_i. Thus, by Corollary 4, we can solve this problem as an instance of (D, 0)-Q/Box/I in O(n) time. Note that, in contrast to the approaches in, e.g., [15], this complexity depends only on the number n of strata and not on the actual strata sizes N_1, ..., N_n or the desired sample size R. As a consequence, our reduction result yields a promising approach to determine optimal sample sizes in large datasets, which can contain billions of samples (see, e.g., [49]).
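As a point of comparison, a simple reference method for the integer allocation is the marginal-gain greedy, which is optimal for separable convex objectives with a single sum constraint and simple bounds. The sketch below assumes the standard stratified-sampling variance term ∑_i N_i^2 S_i^2 / x_i (constant terms dropped) and uses a lower bound of one sample per stratum instead of 0 so that 1/x_i stays finite; both are assumptions, and the name `allocate_samples` is hypothetical. Its running time O((R − n) log n) depends on R, which is exactly the dependence that the O(n) reduction-based approach above avoids.

```python
import heapq
from typing import List


def allocate_samples(sizes: List[int], std_devs: List[float], total: int) -> List[int]:
    """Greedy integer allocation minimizing sum_i (N_i * S_i)^2 / x_i.

    Constraints (assumed): sum_i x_i = total and 1 <= x_i <= N_i.
    """
    n = len(sizes)
    if not n <= total <= sum(sizes):
        raise ValueError("total sample size is infeasible for the bounds")
    weight = [(size * s) ** 2 for size, s in zip(sizes, std_devs)]
    x = [1] * n

    def gain(i: int) -> float:
        # Decrease of the variance term when one more sample is drawn from stratum i.
        return weight[i] / x[i] - weight[i] / (x[i] + 1)

    # Max-heap on the marginal gain (heapq is a min-heap, so gains are negated).
    heap = [(-gain(i), i) for i in range(n) if x[i] < sizes[i]]
    heapq.heapify(heap)
    for _ in range(total - n):
        _, i = heapq.heappop(heap)
        x[i] += 1
        if x[i] < sizes[i]:
            heapq.heappush(heap, (-gain(i), i))
    return x
```

Without binding bounds, the result matches the Neyman-type proportional allocation x_i ∝ N_i S_i; for instance, two strata of size 100 with S = (1, 3) and R = 40 receive 10 and 30 samples.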

Vessel speed optimization
A recent trend in ship routing is to actively manage the ship's sailing speed to reduce fuel costs and carbon emissions [63]. As a consequence, when determining the routes of a fleet of ships to deliver cargo within given timing constraints, one must be able to determine the minimum cost of having a ship sail a given route. This problem is known as the vessel speed optimization problem (see, e.g., [58,33]). In this problem, we are given a route between n + 1 ports, starting at port 0 at time t_start and required to finish at port n at time t_end. The distance between consecutive ports i − 1 and i is given by d_i, and each port i must be serviced by the ship within a given time window [A_i, D_i]. The goal is to determine for each leg i ∈ N := {1, ..., n} of the tour, i.e., for each distance d_i, a speed v_i such that the fuel cost of sailing at these speeds is minimized. Following [58,33], we formulate this problem as Problem (V). Here, v_min and v_max are the minimum and maximum cruising speeds, and c is a non-decreasing convex function that models the relation between sailing speed and fuel costs per unit distance. From this formulation, a reformulation in terms of a function q follows by induction on i. Note that q is convex since c is convex and non-decreasing. This means that Problem (V) is equivalent to a convex optimization problem, which is an instance of (d, 0, q)-S/NC/C. Hence, by Corollary 6, this problem and thus Problem (V) can be solved in O(n log n) time by, e.g., the fast algorithm in [68]. This result is relevant since Problem (V) often occurs as a subproblem in fleet routing algorithms [63], and thus using a faster algorithm for this subproblem can lead to significant speed-ups of the overall algorithm.
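One way to see why a (d, 0, q)-separable objective arises is to switch from speeds to sailing times. The sketch below assumes that the variables of the reformulation are the sailing times x_i = d_i / v_i and that the ship does not wait at ports (our reading, not spelled out above); with q(y) = c(1/y), the fuel cost d_i c(v_i) equals d_i q(x_i / d_i), and q is convex whenever c is convex and non-decreasing. Arrival times are then prefix sums of the x_i, which is where the nested time-window constraints come from. The cost function c(v) = v^2 used in the example and all function names are hypothetical.

```python
from typing import Callable, List

Fuel = Callable[[float], float]


def cost_from_speeds(d: List[float], v: List[float], c: Fuel) -> float:
    """Fuel cost sum_i d_i * c(v_i), with c the cost per unit distance."""
    return sum(di * c(vi) for di, vi in zip(d, v))


def cost_from_times(d: List[float], x: List[float], c: Fuel) -> float:
    """Same cost written as sum_i d_i * q(x_i / d_i) with q(y) = c(1 / y)."""
    def q(y: float) -> float:
        return c(1.0 / y)  # convex whenever c is convex and non-decreasing
    return sum(di * q(xi / di) for di, xi in zip(d, x))


def arrival_times(t_start: float, x: List[float]) -> List[float]:
    """Arrival times at ports 1..n: t_start plus prefix sums of the sailing times."""
    times, t = [], t_start
    for xi in x:
        t += xi
        times.append(t)
    return times
```

For two legs with d = (10, 20) sailed at v = (5, 4) and c(v) = v^2, both expressions give a cost of 570, and the ship arrives at times 2 and 7 when departing at time 0.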

Speed scaling
Efficient energy usage is an important topic within the development of computing systems [90]. Modern computer processors can adjust their speed to reduce energy consumption while still meeting their performance constraints. This leads to scheduling problems where a set of tasks needs to be scheduled and processor speeds need to be chosen such that all tasks are executed before their deadlines (see [21] for a survey). One special case of these types of scheduling problems is the case where the deadlines are agreeable, i.e., the deadlines are ordered according to the arrival times of the tasks (see also [5]). In this problem, we are given n tasks indexed by the set N that must be processed on a single processor. Each task i ∈ N has an arrival time A_i, a deadline D_i, and an amount of work w_i that can be interpreted as the number of operations and calculations the processor must execute to perform this task. The goal is to select for each task i an execution speed s_i and a starting time B_i such that each task is processed before its deadline and the total energy usage of the processor is minimized.
Since the deadlines are agreeable, we have that D_i ≤ D_k if A_i ≤ A_k for any two tasks i, k ∈ N. Moreover, in an optimal schedule, the tasks can be scheduled in non-decreasing order of their deadlines [5]. This means that we can formulate this speed scaling problem as Problem (S) (see also [20, Chapter 4]), where s_max is the maximum processor speed and p is a convex function that models the relation between processor speed and its energy usage. Note that we can impose a nonzero lower bound on each s_i so that the feasible set of this problem is guaranteed to be closed. Since we must choose the speeds such that each task can be executed within the maximum time that is available for it, we have that w_i/s_i ≤ D_i − A_i. This yields a lower bound of w_i/(D_i − A_i) on s_i, which is nonzero since w_i > 0 and D_i > A_i. If the processor is active until the latest deadline regardless of the scheduling of the tasks, then there exists an optimal schedule with no idle time [35]. This means that we can add, without loss of generality, the constraint B_i = ∑_{k=1}^{i} w_k/s_k for all i ∈ N to the formulation of Problem (S). Let x_i := w_i/s_i for all i ∈ N and q(x_i) := x_i p(1/x_i) (note that q is convex). It follows that x_i p(s_i) = x_i p(w_i/x_i) = w_i q(x_i/w_i), i.e., the energy used for task i can be expressed in terms of q. Using the lower bound on s_i, the added constraint on B_i, the transformation x_i = w_i/s_i, and the function q, we can reformulate Problem (S) as a minimization problem over x ∈ R^n with objective ∑_{i∈N} w_i q(x_i/w_i) and nested constraints. This is an instance of (w, 0, q)-S/NC/C. Hence, by Corollary 6, this problem and thus Problem (S) can be solved in O(n log n) time. This result also leads to complexity improvements for speed scaling problems that can be reduced to Problem (S), e.g., the multi-core processor scheduling problem considered in [22].
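The change of variables above can be checked numerically. The sketch interprets p as the power drawn at speed s, so that task i consumes (w_i / s_i) p(s_i) energy, and confirms that this equals w_i q(x_i / w_i) with q(y) = y p(1/y) and x_i = w_i / s_i. The schedule check assumes that processing starts at the first arrival time, without idle time, in the given (deadline) order. The cubic power function p(s) = s^3 in the example and all function names are hypothetical.

```python
from typing import Callable, List

Power = Callable[[float], float]


def energy_from_speeds(w: List[float], s: List[float], p: Power) -> float:
    """Energy sum_i (w_i / s_i) * p(s_i): processing time times power."""
    return sum((wi / si) * p(si) for wi, si in zip(w, s))


def energy_from_times(w: List[float], x: List[float], p: Power) -> float:
    """Same energy as sum_i w_i * q(x_i / w_i) with q(y) = y * p(1 / y)."""
    def q(y: float) -> float:
        return y * p(1.0 / y)
    return sum(wi * q(xi / wi) for wi, xi in zip(w, x))


def is_valid_schedule(arrivals: List[float], deadlines: List[float], x: List[float]) -> bool:
    """No-idle schedule from the first arrival: start_i >= A_i and completion_i <= D_i."""
    t = arrivals[0]
    for a, d, xi in zip(arrivals, deadlines, x):
        if t < a:
            return False  # task would start before it arrives
        t += xi
        if t > d:
            return False  # task would finish after its deadline
    return True
```

With w = (4, 6), s = (2, 3), and p(s) = s^3, both energy expressions evaluate to 70, and the processing times x = (2, 2) meet deadlines (3, 5) for arrivals (0, 1).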
Recently, [73] applied the equivalence result in [55] to improve the time complexity of several other speed scaling problems. Together with the result in this section, this suggests that there is great potential for using the reduction result in this article to contribute to more efficient algorithms within this research field.

Conclusions and outlook
In this article, we studied the resource allocation problem (RAP) with additional submodular constraints. We proved that the class of RAPs whose objective function is (a, b, f)-separable can be solved efficiently as quadratic RAPs if a certain optimality condition of the general separable problem is satisfied. Using this reduction result, we derived new worst-case time complexity results for several relevant special cases of the studied problem. Moreover, we have shown the impact of our reduction result on several core problems in wireless communications, smart grids, statistics, routing, and processor management.
One major direction for future research is the extension of the reduction result to other problems. The most intuitive starting point for this is to search for other optimization problems that satisfy the required optimality condition. Promising candidates are variations of the RAPs studied in this article, e.g., RAPs with interval and cardinality constraints [76,70] and with additional nonseparable terms in the objective function [66,72,69]. Besides this more technical direction, in light of a more cross-disciplinary approach towards the study of RAPs, it is worthwhile to identify more research fields and applications, in addition to the ones that we discussed in this article, where RAPs are being studied and where our results can have an impact and lead to new insights.
A Laminar constraints are a special case of submodular constraints
In this appendix, we show that laminar (or tree) constraints are a special case of submodular constraints. Recall that
• laminar constraints are of the form L_j ≤ ∑_{i∈N_j} x_i ≤ U_j, j ∈ M, where the subsets N_1, ..., N_m of N have the property that either N_j ∩ N_ℓ = ∅, N_j ⊂ N_ℓ, or N_j ⊃ N_ℓ for all j, ℓ ∈ M;
• submodular constraints are of the form ∑_{i∈S} x_i ≤ r(S), S ⊂ N, and ∑_{i∈N} x_i = r(N), where r is a submodular function.
For this, we use a result from [17,18] on so-called cross-free families of subsets. A family F ⊆ 2^N is called cross-free if none of its elements cross, i.e., for any two subsets X, Y ∈ F, at least one of the sets X ∩ Y, X ∩ (N\Y), (N\X) ∩ Y, or (N\X) ∩ (N\Y) is empty. For a given cross-free family F containing ∅ and N and for any set function r : F → R with r(∅) = 0, the set

$$
B(\mathcal{F}, r) := \Bigl\{ x \in \mathbb{R}^n \ \Big|\ \sum_{i\in X} x_i \le r(X) \text{ for all } X \in \mathcal{F},\ \ \sum_{i\in N} x_i = r(N) \Bigr\}
$$

is a base polyhedron [17,18]. This means that there exists a submodular function r′ : 2^N → R such that B(F, r) = {x ∈ R^n | ∑_{i∈S} x_i ≤ r′(S) for all S ⊂ N, ∑_{i∈N} x_i = r′(N)}. Thus, we can show that laminar constraints are a special case of submodular constraints if, for a given feasible set C′ determined by laminar constraints, we can find a cross-free family F and a set function r : F → R such that B(F, r) = C′. For given laminar constraints L_j ≤ ∑_{i∈N_j} x_i ≤ U_j, j ∈ M, and the corresponding feasible set C′ := {x ∈ R^n | L_j ≤ ∑_{i∈N_j} x_i ≤ U_j, j ∈ M}, we define a family N′ of subsets of N. The feasible set C′ is then equal to B(N′, r′), where r′ : N′ → R is a suitable set function on N′. We claim that N′ is a cross-free family, which immediately implies that the set B(N′, r′) is a base polyhedron and thus that laminar constraints are a special case of submodular constraints. For this, we consider four cases for two different sets X, Y ∈ N′:

B An alternative proof that Condition 1 holds for the resource allocation problem with laminar constraints
Here we present an alternative proof of the claim that Condition 1 holds for the resource allocation problem with laminar constraints (S/LC/γ). Before we prove this result in Lemma 3, we first show that the difference between any two feasible solutions x and z of S/LC/γ can be written as a nonnegative combination of vectors in E_{C′}(x), where C′ is the feasible set of S/LC/γ. In other words, z − x belongs to the cone generated by the vectors in E_{C′}(x). To this end, we present the following procedure to obtain this combination.
Starting from the solution x̄^0 := x, we construct a sequence of intermediate vectors (x̄^t)_{t≥0} that finally leads to z by iteratively transferring amounts between two variables. We do this in such a way that the distance ∑_{i∈N} |z_i − x̄^t_i| decreases as t increases and becomes zero for some t̄ ≥ 0, meaning that x̄^{t̄} = z. To ensure finiteness of this process, we always choose two variables with indices i, k such that x̄^t_i > z_i and x̄^t_k < z_k. By transferring an amount of λ_ik := min(x̄^t_i − z_i, z_k − x̄^t_k) from variable i to variable k, we have for the subsequent vector x̄^{t+1} that either x̄^{t+1}_i = z_i or x̄^{t+1}_k = z_k. By repeating this process, we finally reach an intermediate vector x̄^{t̄} that equals z. For each selected pair (i, k), the value λ_ik represents a positive coefficient in the desired conic combination.
To ensure that each index pair with a positive coefficient is an exchangeable pair (see also Lemma 2), i.e., is in E_{C′}(x), we restrict the choice of index pair in the procedure as follows. First, we order the subsets such that N_j ⊂ N_{j′} implies j > j′ for all j, j′ ∈ M. Moreover, we define N_0 := N. Now we iterate through the subsets from N_m to N_0, and during iteration j we allow only exchanges between variables whose indices belong to the current subset N_j.
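The restricted transfer procedure can be sketched as follows. The sketch assumes that the given subsets form a laminar family (not checked) and realizes the required order (N_j ⊂ N_{j′} implies j > j′) by sorting the subsets by decreasing cardinality after N_0 := N. When several donors i or receivers k are available, it picks the smallest indices, a choice the text leaves open, and it uses a small tolerance for floating-point comparisons. The name `conic_decomposition` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def conic_decomposition(x: List[float], z: List[float], subsets: Iterable[Iterable[int]],
                        tol: float = 1e-12) -> Dict[Tuple[int, int], float]:
    """Return lam with z - x = sum over (i, k) of lam[(i, k)] * (e_k - e_i)."""
    n = len(x)
    # N_0 := N first; decreasing size ensures that N_j strictly inside N_j' gets j > j'.
    ordered = [frozenset(range(n))] + sorted(
        (frozenset(s) for s in subsets), key=len, reverse=True)
    xt = list(x)
    lam: Dict[Tuple[int, int], float] = {}
    for subset in reversed(ordered):  # iterate from N_m down to N_0
        members = sorted(subset)
        while True:
            donors = [i for i in members if xt[i] > z[i] + tol]
            receivers = [k for k in members if xt[k] < z[k] - tol]
            if not donors or not receivers:
                break  # no exchange possible inside this subset any more
            i, k = donors[0], receivers[0]
            amount = min(xt[i] - z[i], z[k] - xt[k])
            xt[i] -= amount  # transfer from variable i ...
            xt[k] += amount  # ... to variable k
            lam[(i, k)] = lam.get((i, k), 0.0) + amount
    return lam
```

In the example x = (2, 0, 1, 1), z = (0, 1, 2, 1) with subsets {0, 1} ⊂ {0, 1, 2}, the innermost subset is processed first, which yields z − x = 1·(e_1 − e_0) + 1·(e_2 − e_0).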
The procedure is summarized in Algorithm 1. In this algorithm, for any j ∈ {0} ∪ M, t_j is the last iteration index such that no exchanges are allowed between a variable whose index is in N_j and a variable whose index is not in N_j.
Algorithm 1 Computing z − x as a conic combination of vectors in E_{C′}(x).
1: Input: Two feasible solutions x, z to S/LC/γ
2: Output: Weight matrix λ ∈ R^{n×n}_{≥0}
3: Initialize λ_ik = 0 for all i, k ∈ N
4: Order subsets such that N_j ⊂ N_{j′} implies j > j′ for all j, j′ ∈ M
5: N_0 := N; t = 0; x̄^0 := x
6: for j = m down to 0 do
7:     while there exist i, k ∈ N_j such that x̄^t_i > z_i and x̄^t_k < z_k do
8:

If
3. If x̄^t_i < z_i for a given t ≥ 0, then z_i ≥ x̄^{t′}_i ≥ x̄^t_i ≥ x_i for all t′ > t;
4. If x̄^t_i = z_i for a given t ≥ 0, then x̄^{t′}_i = z_i for all t′ > t;
5. For a given j and t ≥ t_j, we have that either x̄^t_i ≤ z_i for all i ∈ N_j or x̄^t_i ≥ z_i for all i ∈ N_j;
6. Each index pair (i, k) ∈ N^2 is selected at most once over the entire course of the algorithm;
7. For a given j and any t ≤ t_j, it holds that ∑_{ℓ∈N_j} x̄^t_ℓ = ∑_{ℓ∈N_j} x_ℓ.
Proof. Part (1): Follows by induction on t since ∑_{ℓ∈N} x̄^{t+1}_ℓ = ∑_{ℓ∈N} x̄^t_ℓ for all t ≥ 0. Part (2): For a given t ≥ 0, x̄^t_i > z_i implies that either x̄^{t+1}_i = x̄^t_i (if i is not selected during iteration t) or z_i ≤ x̄^{t+1}_i < x̄^t_i (if i is selected during iteration t). Thus, x̄^t_i > z_i implies that z_i ≤ x̄^{t+1}_i ≤ x̄^t_i. By induction, one can deduce that if x̄^t_i > z_i, then z_i ≤ x̄^{t′}_i ≤ x̄^t_i ≤ x_i for all t ≥ 0 and t′ > t.
Part (3): Is analogous to the proof of Part (2). Part (4): If x̄^t_i = z_i, then i will not be selected anymore as part of an exchangeable pair. Hence, x̄^t_i = x̄^{t+1}_i = ··· = z_i. Part (5): By definition of t_j, we have that either x̄^{t_j}_ℓ ≥ z_ℓ for all ℓ ∈ N_j or x̄^{t_j}_ℓ ≤ z_ℓ for all ℓ ∈ N_j. It follows directly from Parts (2)-(4) that in the first case x̄^t_ℓ ≥ z_ℓ for all ℓ ∈ N_j and that in the second case x̄^t_ℓ ≤ z_ℓ for all ℓ ∈ N_j. Part (6): If the pair (i, k) is chosen during some iteration t, then either x̄^{t+1}_i = z_i or x̄^{t+1}_k = z_k. Thus, at least one of the indices i, k cannot be chosen again as part of a pair; hence, the pair (i, k) is selected at most once. Part (7): For a given t ≤ t_j, let (i, k) denote the pair selected during iteration t − 1. Thus, there is a subset N_{j′} with j′ > j such that i, k ∈ N_{j′}. By the ordering of the subsets, we have either N_j ∩ N_{j′} = ∅ or N_{j′} ⊂ N_j. Thus, either both or neither of the indices i and k are in N_j. This implies that ∑_{ℓ∈N_j} x̄^t_ℓ = ∑_{ℓ∈N_j} x̄^{t−1}_ℓ. By induction on t, it follows that ∑_{ℓ∈N_j} x̄^t_ℓ = ∑_{ℓ∈N_j} x̄^0_ℓ = ∑_{ℓ∈N_j} x_ℓ. Part (8): Follows from Part (6) and the fact that λ_ik = 0 if the pair (i, k) has not been chosen during any iteration.
Lemma 1 implies that for any two feasible solutions x and z, the difference z − x can be written as a nonnegative combination of the vectors (e_k − e_i)_{(i,k)∈N^2}. We strengthen this result in Lemma 2 by proving that z − x can be written as a nonnegative combination of the vectors in E_{C′}(x).
Lemma 2. Let λ and (x̄^t)_{t≥0} be the output of Algorithm 1 applied to two feasible solutions x and z of the problem S/LC/γ. If λ_ik > 0 for a given pair (i, k) ∈ N^2, then (i, k) ∈ E_{C′}(x) and λ_{ℓi} = λ_{kℓ} = 0 for all ℓ ∈ N.
Proof. Note that for any two indices i, k ∈ N, the solution x + ε(e_k − e_i) is feasible for some ε > 0 if and only if ∑_{ℓ∈N_j} x_ℓ > L_j for each subset N_j that contains i but not k, and ∑_{ℓ∈N_{j′}} x_ℓ < U_{j′} for each subset N_{j′} that contains k but not i. Let N_{j′} be the minimal subset that contains both i and k, i.e., there is no other subset N_j such that N_j ⊂ N_{j′} and i, k ∈ N_j. If λ_ik > 0, then there exists t_{j′+1} < t ≤ t_{j′} such that the pair (i, k) has been selected during iteration t. Thus, x̄^t_i > x̄^{t+1}_i ≥ z_i and x̄^t_k < x̄^{t+1}_k ≤ z_k. By Parts (2) and (3) of Lemma 1, this means that x_i > z_i and x_k < z_k and that x̄^t_i ≥ z_i and x̄^t_k ≤ z_k for all t ≥ 0. By Part (5) of Lemma 1, this means that for any subset N_j that contains i but not k, we have that x̄^{t_j}_ℓ ≥ z_ℓ for all ℓ ∈ N_j since j > j′. In particular, we have by Part (2) that x̄^{t_j}_i > z_i since x̄^t_i > z_i. It follows from feasibility of z and Part (7) that L_j ≤ ∑_{ℓ∈N_j} z_ℓ < ∑_{ℓ∈N_j} x̄^{t_j}_ℓ = ∑_{ℓ∈N_j} x_ℓ. Analogously, we can show that U_j > ∑_{ℓ∈N_j} x_ℓ for any subset N_j that contains k but not i. Thus, the solution x + ε(e_k − e_i) is feasible for some ε > 0; hence, (i, k) ∈ E_{C′}(x).
Note that for any ℓ ∈ N, we can only have λ_{ℓi} > 0 if there is some iteration t with x̄^t_i < z_i. Since x̄^t_i ≥ z_i for all t ≥ 0, we must have that λ_{ℓi} = 0. Analogously, we must have that λ_{kℓ} = 0 since x̄^t_k ≤ z_k for all t ≥ 0.
Lemma 2 implies that we can partition N into three subsets: one subset contains all indices i for which λ_ik > 0 for at least one k ∈ N, one subset contains all indices i for which λ_ki > 0 for at least one k ∈ N, and one subset contains all indices i such that λ_ik = λ_ki = 0 for all k ∈ N. More precisely, we can define the following partition of N:
L(x) := {i ∈ N | λ_ik > 0 for some k ∈ N},
U(x) := {i ∈ N | λ_ki > 0 for some k ∈ N},
F(x) := N \ (L(x) ∪ U(x)) = {i ∈ N | λ_ik = λ_ki = 0 for all k ∈ N}.
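The feasibility characterization used at the beginning of the proof of Lemma 2 translates directly into a test for exchangeable pairs. The sketch assumes that x already satisfies all laminar constraints L_j ≤ x(N_j) ≤ U_j; subsets containing both i and k, or neither, are unaffected by the exchange and are skipped implicitly. The name `is_exchangeable` and the tolerance are hypothetical.

```python
from typing import Sequence, Set


def is_exchangeable(x: Sequence[float], i: int, k: int, subsets: Sequence[Set[int]],
                    lower: Sequence[float], upper: Sequence[float],
                    tol: float = 1e-12) -> bool:
    """Can x + eps*(e_k - e_i) remain feasible for some eps > 0?

    Assumes x already satisfies lower[j] <= x(N_j) <= upper[j] for all j.
    """
    for s, lo, up in zip(subsets, lower, upper):
        total = sum(x[idx] for idx in s)
        if i in s and k not in s and total <= lo + tol:
            return False  # moving mass out of N_j would violate its lower bound
        if k in s and i not in s and total >= up - tol:
            return False  # moving mass into N_j would violate its upper bound
    return True
```

For x = (1, 1, 1) with constraints 0 ≤ x_0 + x_1 ≤ 2 and 0 ≤ x_2 ≤ 2, moving mass from index 0 to index 2 is allowed, whereas moving mass from index 2 to index 0 is blocked by the tight upper bound on {0, 1}.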
Using this partition and Lemma 2, we can show that S/LC/γ satisfies Condition 1. Proof. First, we prove the "only if" part. Suppose x is optimal for S/LC/γ and there exists an index pair (i, k) ∈ E_{C′}(x) such that φ^+_k(x_k) < φ^−_i(x_i). By definition of E_{C′}(x) and the left and right derivatives φ^−_i and φ^+_k, there exists ε > 0 such that x + ε(e_k − e_i) is feasible and

$$
\phi_k(x_k + \epsilon) - \phi_k(x_k) \;<\; \phi_i(x_i) - \phi_i(x_i - \epsilon).
$$

This implies that the objective value of x + ε(e_k − e_i) is smaller than that of x. Hence, x cannot be optimal, which is a contradiction. It follows that φ^+_k(x_k) ≥ φ^−_i(x_i) for all (i, k) ∈ E_{C′}(x). Second, we prove the "if" part. Let x be a feasible solution such that φ^+_k(x_k) ≥ φ^−_i(x_i) for all (i, k) ∈ E_{C′}(x), and let z be an arbitrary feasible solution. Moreover, let λ ∈ R^{n×n} denote the output of Algorithm 1 when applied to x and z. By Lemma 2 and the definition of the sets L(x), U(x), and F(x), we have that z − x = ∑_{i∈L(x)} ∑_{k∈U(x)} λ_ik (e_k − e_i).
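We define the following subgradient g ∈ R^n at the solution x:

$$
g_i := \begin{cases} \phi_i^-(x_i) & \text{if } i \in L(x),\\ \phi_i^+(x_i) & \text{if } i \in U(x) \cup F(x). \end{cases}
$$

By convexity of the functions φ_i, it follows that

$$
\sum_{i\in N} \phi_i(z_i) - \sum_{i\in N} \phi_i(x_i) \;\ge\; \sum_{i\in N} g_i (z_i - x_i) \;=\; \sum_{i\in L(x)} \sum_{k\in U(x)} \lambda_{ik} \bigl( \phi_k^+(x_k) - \phi_i^-(x_i) \bigr) \;\ge\; 0.
$$

It follows that x is optimal since z is an arbitrary feasible solution.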
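For differentiable objectives, the optimality condition established above can be checked pair by pair. The sketch below assumes that each φ_i is differentiable, so that its left and right derivatives coincide, and leaves the membership test (i, k) ∈ E_{C′}(x) to a caller-supplied predicate, e.g., one based on the feasibility characterization at the beginning of the proof of Lemma 2. It makes O(n^2) predicate calls; all names are hypothetical.

```python
from typing import Callable, Sequence

Derivative = Callable[[int, float], float]
Exchangeable = Callable[[int, int], bool]


def satisfies_condition_1(x: Sequence[float], deriv: Derivative,
                          exchangeable: Exchangeable, tol: float = 1e-9) -> bool:
    """Check phi_k'(x_k) >= phi_i'(x_i) for every exchangeable pair (i, k).

    Assumes differentiable phi_i, so left and right derivatives coincide.
    """
    n = len(x)
    return all(
        deriv(k, x[k]) >= deriv(i, x[i]) - tol
        for i in range(n)
        for k in range(n)
        if i != k and exchangeable(i, k)
    )
```

As a small example, take φ_i(y) = (y − t_i)^2 / 2 with t = (2, 1, 0) over the laminar feasible set {x ≥ 0, x_0 + x_1 + x_2 = 3}, where (i, k) is exchangeable exactly when x_i > 0. The point x = t satisfies the condition, whereas x = (3, 0, 0) violates it for the pair (0, 1).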