从子模优化之后继续往前:Projected Gradient Descent in Isabelle/HOL
在完成子模优化项目的 AFP cleanup 之后,我原本以为下一步会很直接,比如整理+提交+等待 review,然后再慢慢考虑下一个 Isabelle 项目,但实际流程没有这么线性。AFP 那边暂时出现了一些技术性的 blocking issue,我问了负责人之后得到的答复是还需要等一等,于是这段时间变成了一个很微妙的空档,刚好又考完闲得慌。
我不太想把这段时间完全空掉,于是我开始做另一个 Isabelle/HOL 项目:Projected Gradient Descent in Isabelle/HOL。
这次的对象从 combinatorial optimization 里的 submodular greedy 转到了 smooth convex optimization 里的 first-order methods。表面上看它像是换了一个方向,从集合函数和贪心算法换成梯度下降、投影、凸性、光滑性和收敛率。但对我来说这两个项目其实在同一条线上,它们都不是简单地把一个定理搬进 Isabelle,而是在问同一个问题:如果一个算法证明要变成真正可复用、可检查、可继续扩展的形式化结构,它需要被拆成哪些层?
为什么是 projected gradient descent
Projected gradient descent 是优化里非常基础但也非常核心的一类方法。无约束的 gradient descent 已经是 first-order optimization 的入口,而 projected gradient descent 则把可行域约束放进了迭代过程里:每一步先沿着负梯度方向走,再投影回可行集合。
纸面上来讲,这个算法的形式其实非常简洁。无约束时是:
x_{n+1} = x_n - alpha * G x_n
有约束时是:
x_{n+1} = P_C (x_n - alpha * G x_n)
这里 P_C 是到闭凸集 C 上的 metric projection。
但越是这种看起来标准的东西,放进 Isabelle 里越会暴露很多隐藏结构。比如什么叫 gradient-like map?smoothness 要用 Lipschitz gradient 表述,还是用 quadratic upper bound 表述?projection 的 variational inequality 应该放在哪一层?收敛证明里的 telescoping argument 应该绑定在具体算法上,还是抽出来作为独立 infrastructure?这些问题在纸面证明里常常被默认处理,但在形式化里必须被明确组织。
所以这个项目一开始我就没有把目标写成证明 projected gradient descent 收敛这么单薄。更准确地说我想做的是一套 reusable descent/projection convergence infrastructure。
从算法证明到 library 结构
目前 repo 的结构已经比较清楚,它是按层拆开的 Isabelle session。
最底层是一些 gradient 和 convex differentiability 的基础接口,比如函数在 real inner-product space 上的梯度式描述、convex differentiable function 的一阶不等式,以及 smooth upper-bound 这种更适合做 descent proof 的假设形式。
再往上是 algorithm-independent 的 descent infrastructure。这里我比较在意的一点是不要把所有 telescoping、averaging、finite-sum arguments 都锁死在某一个算法里,因为这些结构以后不只会服务于 gradient descent,也可能服务于 projected methods、proximal methods、conditional gradient,甚至更复杂的 first-order schemes。形式化证明里最容易写成一团的就是这些看起来只是技术引理的部分,但它们其实决定了整个 library 以后能不能继续下去。
然后是 gradient descent 本身,包括 one-step descent estimates、sublinear O(1/N) convergence、gradient residual bounds 等等。这一层可以看成 projected case 之前的基础版本:没有可行域约束,先把 smooth convex first-order method 的主线跑通。
真正的 projected 部分从 projection geometry 开始。这里需要把 closed convex set 上的 metric projection 作为一个数学对象处理,并证明它满足对应的 variational inequality。这个部分对我来说很有意思,因为 projection 在算法描述里只是一个符号,但在形式化里它本身就需要一层几何理论作为支撑。
在此基础上,项目再进入 projected-gradient descent:可行性保持、函数值收敛、O(1/N) 型 finite-horizon bound、projected-gradient mapping,以及 residual / epsilon-stationarity certificate。这个 residual layer 是我比较喜欢的一部分,因为它不只是说函数值下降而是开始把某一步已经接近一阶最优性这种算法分析里很常用的证书也 formalise 出来。
最后还有 strong convexity 和 linear-rate convergence。这一层让整个项目不仅停留在 basic convex case,也能覆盖更强假设下的线性收敛结论。
这次我更在意 public theorem surface
做子模优化项目的时候,我已经意识到一个问题:一个 Isabelle project 如果只是内部能 build,其实还不够。尤其是如果目标是 AFP 或者未来给别人复用,那么它必须有一个相对稳定的 public surface。
所以这次我特意放了 Main_Results.thy 作为主要入口,而不是让下游用户直接 import 一堆内部 theory file。Main_Results 里收集的是比较稳定、可以被引用和复用的 theorem aliases,比如 gradient descent 的 sublinear complexity、projected gradient descent 的 sublinear complexity、projected-gradient mapping 的 optimality certificate、epsilon-stationarity complexity、strongly convex case 下的 distance / function-value linear complexity 等。
这件事看起来像命名和整理但其实比想象中重要很多。因为 formalisation 一旦往 library 方向走,内部证明文件迟早会重构,如果下游依赖直接绑在内部结构上,那么每次重构都会把别人也拖下水。public API 的意义就是把“我内部怎么证明”和“别人应该怎么使用”分开。
这个项目和子模优化项目的关系
如果说 submodular greedy 项目让我第一次比较系统地接触了 algorithmic theorem formalisation,那么这个 projected-gradient descent 项目更像是把我带向了 optimization library building。
子模优化那边的核心是:怎样把 greedy construction、marginal gain、submodularity、approximation recurrence 拆开,并让 LazyGreedy 这种更接近实现层的变体继承 classical guarantee。
而这个项目的核心则变成了:怎样把 convexity、smoothness、projection geometry、descent lemma、residual certificate、strong convexity 和 convergence rate 拆成可以复用的 Isabelle 组件。
两者的数学对象不同但形式化思维是一致的。它们都在逼我面对一个问题:一个看起来标准的证明,到底依赖了多少没有被说出来的结构?
这也是我越来越喜欢 Isabelle 的地方。它不是只帮我检查最后一步对不对,而是不断迫使我把整条证明路线里的接口、假设和依赖关系都暴露出来。很多时候真正有研究价值的东西并不只是最终 theorem statement,而是在 formalisation 过程中被迫整理出来的那套 architecture。
当前状态
目前这个 repo 已经包含 smooth convex first-order methods 的主要基础设施,并以 projected-gradient descent 作为中心应用。当前完成的部分包括:
- gradient 和 convex differentiability 的基础接口;
- smooth upper-bound 和 Lipschitz-smoothness bridge;
- algorithm-independent descent / telescoping / averaging lemmas;
- gradient descent 的 one-step estimates 和
O(1/N)收敛; - projection geometry 和 projected-gradient descent;
- projected-gradient mapping 与一阶最优性证书;
- residual convergence 和 epsilon-stationarity complexity;
- strong convexity 下的 distance-gap estimate 与 linear-rate convergence;
- quadratic 和 projected quadratic examples;
Main_Results.thy作为下游使用的 public theorem interface;- AFP-oriented 的 session、document 和 bibliography 结构。
我现在不打算在这个阶段继续疯狂加新算法,因为我感觉目前最重要的事情不是再多塞几个 method,而是把现有 entry 打磨到足够稳定,比如命名、文档、public aliases、build hygiene、AFP document,以及不同 theory file 之间的依赖关系都要尽量干净。
接下来
我现在越来越觉得,形式化证明对我来说不是一个单独的技能而是一种研究方式。它让我在读算法证明、写优化理论、设计 abstraction 的时候,都更自然地问:这个结论真正依赖什么?哪些假设可以抽象出去?哪些证明结构可以复用?如果以后别人要在这个基础上继续做,他们应该 import 什么、相信什么、避免碰什么?
Projected-gradient descent 本身当然是经典内容,但把它整理成一套 Isabelle/HOL 里的 reusable convergence infrastructure 对我来说是很自然的下一步。
Moving Forward After Submodular Optimization: Projected Gradient Descent in Isabelle/HOL
After finishing the AFP cleanup for my submodular optimization project, I thought the next step would be linear: polish, submit, wait for review, and only later think about another Isabelle project. In practice, the process has not been that linear. The AFP side is temporarily blocked by a technical issue, and after asking the maintainer I learned that I need to wait a bit. So this became a strange open interval after exams.
I did not want that interval to go empty, so I started another Isabelle/HOL project: Projected Gradient Descent in Isabelle/HOL.
This time the object moves from submodular greedy in combinatorial optimization to first-order methods in smooth convex optimization. On the surface this looks like a different direction: from set functions and greedy algorithms to gradient descent, projection, convexity, smoothness, and convergence rates. But to me the two projects belong to the same line. They are not just about moving one theorem into Isabelle; they ask how an algorithmic proof should be decomposed if it is meant to become reusable, checkable, and extensible formal infrastructure.
Why projected gradient descent
Projected gradient descent is one of the basic but central methods in optimization. Unconstrained gradient descent is already the entry point to first-order optimization, and projected gradient descent folds feasibility constraints directly into the iteration: take a negative-gradient step, then project back to the feasible set.
On paper, the algorithm is very compact. The unconstrained step is:
x_{n+1} = x_n - alpha * G x_n
With constraints, it becomes:
x_{n+1} = P_C (x_n - alpha * G x_n)
Here P_C is the metric projection onto a closed convex set C.
But the more standard something looks, the more hidden structure it reveals once it enters Isabelle. What exactly is a gradient-like map? Should smoothness be stated as a Lipschitz-gradient assumption or as a quadratic upper bound? Where should the variational inequality for projection live? Should the telescoping argument in the convergence proof be tied to the concrete algorithm, or extracted as independent infrastructure? Paper proofs often pass over these decisions silently, but formalisation has to organize them explicitly.
So I did not frame the project as merely proving convergence of projected gradient descent. More precisely, I want it to become reusable descent/projection convergence infrastructure.
From algorithm proof to library structure
The repository is already organized as a layered Isabelle session.
The bottom layer contains interfaces for gradients and convex differentiability, including gradient-style descriptions of functions on real inner-product spaces, first-order inequalities for convex differentiable functions, and smooth upper-bound assumptions that are convenient for descent proofs.
Above that sits algorithm-independent descent infrastructure. I care about not locking telescoping, averaging, and finite-sum arguments into one specific algorithm, because those structures may later serve gradient descent, projected methods, proximal methods, conditional gradient, or more complicated first-order schemes. In formal proofs, these technical lemmas are easy to write into a knot, but they decide whether the library can keep growing.
Then comes gradient descent itself: one-step descent estimates, sublinear O(1/N) convergence, gradient residual bounds, and related results. This is the unconstrained base case before the projected setting.
The projected part begins with projection geometry. Metric projection onto a closed convex set has to be treated as a mathematical object, together with the variational inequality it satisfies. I like this layer because projection is just a symbol in the algorithm, but in formalisation it needs its own geometric theory.
On top of that, the project enters projected-gradient descent: feasibility preservation, function-value convergence, O(1/N) finite-horizon bounds, projected-gradient mapping, and residual / epsilon-stationarity certificates. The residual layer is one of my favorite parts because it formalises not only decrease of the objective value, but also the common algorithmic certificate that an iterate is close to first-order optimality.
Finally, there is strong convexity and linear-rate convergence. That lets the project cover not only the basic convex case, but also the stronger linear-convergence regime.
The public theorem surface matters more this time
While doing the submodular optimization project, I realized that an Isabelle project being able to build internally is not enough. If the target is AFP or future reuse, it needs a relatively stable public surface.
So this time I intentionally added Main_Results.thy as the main entry point, instead of expecting downstream users to import many internal theory files. Main_Results collects theorem aliases that should be stable and reusable: sublinear complexity for gradient descent, sublinear complexity for projected gradient descent, optimality certificates for the projected-gradient mapping, epsilon-stationarity complexity, and distance / function-value linear complexity in the strongly convex case.
This looks like naming and cleanup, but it matters a lot. Once a formalisation becomes library-like, internal proof files will eventually be refactored. If downstream users depend directly on internal structure, every refactor drags them along. A public API separates how I prove things internally from how others should use the results.
How this relates to the submodular project
If the submodular greedy project was my first systematic encounter with algorithmic theorem formalisation, this projected-gradient descent project feels like a step toward optimization library building.
In the submodular project, the core question was how to separate greedy construction, marginal gains, submodularity, and approximation recurrences, and how to let a more implementation-level variant such as LazyGreedy inherit the classical guarantee.
Here the core question becomes how to separate convexity, smoothness, projection geometry, descent lemmas, residual certificates, strong convexity, and convergence rates into reusable Isabelle components.
The mathematical objects differ, but the formalisation mindset is the same. Both force me to ask how much unspoken structure a standard proof actually depends on.
That is also why I increasingly like Isabelle. It does not only check the final theorem; it forces the interfaces, assumptions, and dependencies of the whole proof route to become visible. Often the research value is not only the final theorem statement, but the architecture that formalisation forces into the open.
Current status
The repository currently contains the main infrastructure for smooth convex first-order methods, with projected-gradient descent as the central application. It includes:
- interfaces for gradients and convex differentiability;
- a bridge between smooth upper bounds and Lipschitz smoothness;
- algorithm-independent descent, telescoping, and averaging lemmas;
- one-step estimates and
O(1/N)convergence for gradient descent; - projection geometry and projected-gradient descent;
- projected-gradient mapping and first-order optimality certificates;
- residual convergence and epsilon-stationarity complexity;
- distance-gap estimates and linear-rate convergence under strong convexity;
- quadratic and projected quadratic examples;
Main_Results.thyas the public theorem interface for downstream use;- AFP-oriented session, document, and bibliography structure.
At this stage I do not want to keep adding algorithms aggressively. The more important task is to make the current entry stable enough: names, documentation, public aliases, build hygiene, the AFP document, and the dependency structure between theory files all need to stay clean.
Next
I increasingly feel that formal proof is not just an isolated skill for me, but a way of doing research. When I read algorithm proofs, write optimization theory, or design abstractions, it makes me ask more naturally: what does this conclusion really depend on, which assumptions can be abstracted away, which proof structures can be reused, and if someone builds on this later, what should they import, trust, and avoid touching?
Projected-gradient descent is of course classical, but organizing it as reusable convergence infrastructure inside Isabelle/HOL feels like the natural next step.
Project link: https://github.com/lyuf09/projected-gradient-descent