Matrix Base
0.1 Determinant
Theorem: $ det(AB) = det(A)det(B) $
Let $ A, B $ be square matrices of order $ n $.
Let $ det(A) $ be the determinant of $ A $.
Let $ AB $ be the matrix product of $ A $ and $ B $.
Then:
$ det(AB) = det(A)det(B) $
Proof
The proof I provide is based on Determinant of Matrix Product
Consider two cases:
$ A $ is singular.
$ A $ is nonsingular.
Proof of case 1
Assume $ A $ is singular.
Then:
$$ det(A) = 0 $$
Also, if $ A $ is singular, then so is $ AB $.
Indeed, if $ AB $ has an inverse $ C $, then:
$$ ABC = E $$
whereby $ BC $ is a right inverse of $ A $.
It follows by Left or Right Inverse of Matrix is Inverse that in that case $ BC $ is the inverse of $ A $. This contradicts the assumption.
It follows that:
$$ det(AB)=0 $$
Thus:
$$ 0 = 0 \times det(B) $$
that is:
$$ det(AB) = 0 = det(A)det(B) $$
Proof of case 2
Assume $ A $ is nonsingular.
Then $ A $ can be written as a product of elementary row matrices $ E_i $.
Let $ A=E_kE_{k−1}⋯E_1 $.
So:
$$ det(AB) = det(E_kE_{k−1}⋯E_1B) $$
It remains to be shown that for any square matrix $ D $ of order n:
$$ det(ED) = det(E) det(D) $$
$ det(ED) = -|D| = det(E) det(D) $ when $ E $ swaps two rows
$ det(ED) = k|D| = det(E) det(D) $ when $ E $ multiplies one row by $ k $
$ det(ED) = |D| = det(E) det(D) $ when $ E $ adds $ k $ times row $ j $ to row $ i $
Then:
$ det(AB) = det(E_kE_{k−1}⋯E_1B) $
$ det(AB) = det(E_k) det(E_{k−1}⋯E_1B) $
Repeating this for each elementary factor gives
$ det(AB) = \alpha det(B) $, where $ \alpha = det(E_k) det(E_{k−1}) \cdots det(E_1) $
and:
$ det(A) = det(E_kE_{k−1}⋯E_1I) $
$ = \alpha det(I) $
$ = \alpha $
Therefore:
$ det(AB)=det(A)det(B) $
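A quick numerical sanity check of the theorem (a sketch assuming NumPy is available; the matrices are arbitrary random examples, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Nonsingular case: det(AB) and det(A)det(B) agree numerically.
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# Singular case (case 1 of the proof): det(A) = 0 forces det(AB) = 0.
A_singular = A.copy()
A_singular[3] = A_singular[0]          # duplicate a row, so A_singular is singular
assert np.isclose(np.linalg.det(A_singular @ B), 0.0, atol=1e-9)
```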
0.2 Real Symmetric Matrix
Theorem: all real eigenvalues
If $ A $ is a (real) $ n \times n $ symmetric matrix, then $ A $ has n real eigenvalues (counted with their multiplicities). For each eigenvalue, we can find a real eigenvector associated with it.
Proof
The proof I provide is based on Orthogonally Diagonalizable Matrices
Theorem: orthogonally diagonalizable
A real symmetric matrix must be orthogonally diagonalizable.
Proof
The proof I provide is based on Orthogonally Diagonalizable Matrices
This is a proof by induction, and it uses some simple facts about partitioned matrices and change of coordinates.
This is obviously true for every $ 1 \times 1 $ matrix $ A $: if $ A = (a) $, then $ E_1^{-1} (a) E_1 = (a) $.
Assume now that every $ (n - 1) \times (n - 1) $ real symmetric matrix is orthogonally diagonalizable.
Consider an $ n \times n $ real symmetric matrix $ A $ where $ n \gt 1 $. By the preceding theorem, we can find a real eigenvalue $ \lambda_1 $ of $ A $, together with a real eigenvector $ v_1 $. By normalizing, we can assume $ v_1 $ is a unit eigenvector. Add vectors to extend $ (v_1) $ to a basis for $ R^n $ and then use the Gram-Schmidt process to get an orthonormal basis for $ R^n $: $ (v_1, v_2, …, v_n) $.
Let
$ T_1 = (v_1, v_2, …, v_n) $.
$ T_1 $ is orthogonal.
Now observe that
$$ T_1^{-1} A T_1 = T_1^{-1} (A v_1, A v_2, …, A v_n) $$
$$ T_1^{-1} A T_1 = (T_1^{-1} \lambda_1 v_1, T_1^{-1} A v_2, …, T_1^{-1} A v_n) $$
Because $ T_1^{-1} T_1 = E $, we have
$$ T_1^{-1} (v_1, v_2, …, v_n) = (\epsilon_1, \epsilon_2, …, \epsilon_n) $$
Thus, $ T_1^{-1} v_1 = \epsilon_1 $, so column 1 of $ T_1^{-1} A T_1 $ is $ \lambda_1 \epsilon_1 $. We may therefore write
$$ T_1^{-1} A T_1 = \begin{pmatrix} \lambda_1 & \alpha \\ \mathbf{0} & \beta \end{pmatrix} $$
$ T_1^{-1} A T_1 $ is symmetric, because
$$ (T_1^{-1} A T_1)^{\top} = (T_1^{\top} A T_1)^{\top} = T_1^{\top} A^{\top} T_1^{\top \top} = T_1^{\top} A T_1 = T_1^{-1} A T_1 $$
So $ \alpha = \mathbf{0} $, and $ \beta $ is a real symmetric matrix of order $ n - 1 $. By the induction hypothesis, $ \beta $ is orthogonally diagonalizable. Thus, we have an orthogonal $ T_2 $ such that
$$ T_2^{-1} \beta T_2 = diag\{\lambda_2, \lambda_3, …, \lambda_n\} $$
Let
$$ T = T_1 \begin{pmatrix} 1 & \mathbf{0} \\ \mathbf{0} & T_2 \end{pmatrix} $$
$$ T^{-1} A T = \begin{pmatrix} 1 & \mathbf{0} \\ \mathbf{0} & T_2 \end{pmatrix}^{-1} T_1^{-1} A T_1 \begin{pmatrix} 1 & \mathbf{0} \\ \mathbf{0} & T_2 \end{pmatrix} $$
$$ T^{-1} A T = \begin{pmatrix} 1 & \mathbf{0} \\ \mathbf{0} & T_2^{-1} \end{pmatrix} \begin{pmatrix} \lambda_1 & \mathbf{0} \\ \mathbf{0} & \beta \end{pmatrix} \begin{pmatrix} 1 & \mathbf{0} \\ \mathbf{0} & T_2 \end{pmatrix} $$
$$ T^{-1} A T = \begin{pmatrix} \lambda_1 & \mathbf{0} \\ \mathbf{0} & T_2^{-1} \beta T_2 \end{pmatrix} $$
$$ T^{-1} A T = diag\{\lambda_1, \lambda_2, …, \lambda_n\} $$
Therefore, a real symmetric matrix must be orthogonally diagonalizable.
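As a minimal numerical illustration of the theorem (assuming NumPy; `eigh` is used only to produce an orthogonal diagonalizer for a random symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                       # a real symmetric matrix

eigvals, T = np.linalg.eigh(A)          # columns of T: orthonormal eigenvectors
assert np.allclose(T.T @ T, np.eye(5))              # T is orthogonal
assert np.allclose(T.T @ A @ T, np.diag(eigvals))   # T^{-1} A T is diagonal
```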
Theorem: r-fold root, $ A - \lambda E $ has a rank of $ n - r $
Let $ A $ be a real symmetric matrix of order $ n $, and let $ \lambda $ be an $ r $-fold root of its characteristic polynomial.
Then the matrix $ A - \lambda E $ has a rank of $ n - r $. This implies that there are exactly $ r $ linearly independent eigenvectors associated with the eigenvalue $ \lambda $.
Proof
The proof I provide is based on Yiwen’s Zhihu Answer
Suppose that the $ r $-fold root of $ A $ is $ \lambda_k $.
Because $ A $ is a real symmetric matrix, there is an orthogonal matrix $ P $ such that:
$$ P^{-1} A P = \begin{pmatrix} \lambda_1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \blue{\lambda_{k}} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 0 & \blue{\lambda_{k}} & 0 & \cdots & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \blue{\lambda_{k}} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & \blue{\lambda_{k}} & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & \lambda_n \end{pmatrix} $$
We don’t care whether $ \lambda_i $ is 0 or not.
For matrix $ A - \lambda_k E $, we have:
$$ P^{-1} (A - \lambda_k E) P = P^{-1} A P - \lambda_k P^{-1} E P = $$
$$ \varLambda - \lambda_k E = \begin{pmatrix} \lambda_1 - \blue{\lambda_{k}} & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \lambda_2 - \blue{\lambda_{k}} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \blue{0} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 0 & \blue{0} & 0 & \cdots & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \blue{0} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & \blue{0} & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & \lambda_n - \blue{\lambda_{k}} \end{pmatrix} $$
$\because \lambda_i \neq \lambda_k \Rightarrow \lambda_i - \lambda_k \neq 0, (i \neq k) $
$ \therefore rank(A - \lambda_k E) = rank(\varLambda - \lambda_k E) = n - r $
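A numerical check of the statement (a sketch assuming NumPy; the symmetric matrix below is built to have a triple eigenvalue, an arbitrary choice made for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, lam_k = 6, 3, 2.0
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # a random orthogonal matrix
eigvals = np.array([lam_k] * r + [5.0, -1.0, 7.0])  # lam_k repeated r times
A = Q @ np.diag(eigvals) @ Q.T                      # real symmetric by construction

assert np.linalg.matrix_rank(A - lam_k * np.eye(n)) == n - r
```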
0.3 Vector spaces
Regard $ A \in M_{m,n}(\mathbf{F}) $ as a linear transformation $ x \rightarrow Ax $ from $ F^n $ to $ F^m $. The domain of this linear transformation is $ F^n $; its range is $ range A = \{y \in F^m : y = Ax \text{ for some } x \in F^n \} $; its null space is $ nullspace A = \{x \in F^n : Ax = 0\} $. The range of $ A $ is a subspace of $ F^m $, and the null space of $ A $ is a subspace of $ F^n $. The dimension of $ nullspace A $ is denoted by $ nullity A $; the dimension of $ range A $ is denoted by $ rank A $.
To avoid ambiguities,
The range of a linear transformation can be denoted as $ \operatorname{range} A $, $ \operatorname{im} A $.
The dimension of the range of a linear transformation can be denoted as $ \operatorname{rank} A $.
The null space of a linear transformation can be denoted as $ \operatorname{nullspace} A $, $ \operatorname{ker} A $.
The dimension of the null space of a linear transformation can be denoted as $ \operatorname{nullity} A $.
The dimension of $ \operatorname{range} A $ equals $ \operatorname{rank} A $
The proof I provide is based on rank-of-matrix-equals-dimension-of-range
Suppose that we are given that $ A \in M_{m, n}(\mathbf{F}) $ has column-rank (and therefore coincident row-rank) $ r $. That is, the maximal set of linearly independent columns of $ A $ contains $ r $ vectors. It follows that the span of the columns of $ A $, henceforth the column space of $ A $, is an r-dimensional subspace of $ F^m $.
Each column of $ A $ is a vector with $ m $ entries, so the columns live in $ F^m $. If $ n \gt m $, these columns must be linearly dependent, so the dimension of their span satisfies $ r \leq m $. Hence, the column space of $ A $ is a subspace of $ F^m $.
Let $ a_1, \cdots, a_n $ denote the columns of $ A $. Consider an arbitrary element $ y $ inside the column space of $ A $. By definition, this means that there exist coefficients $ x_1, \cdots, x_n $ such that
$$ y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n $$
Note that we can rewrite the above sum as a product. In particular, we have
$$ y = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = A x $$
where $ x = (x_1, x_2, \cdots, x_n)^{\top} \in F^n $. So, any element $ y $ from the column space of $ A $ is also an element of the range of $ A $. Conversely, every $ A x $ is, by the same identity, a linear combination of the columns of $ A $, so $ range A $ coincides with the column space of $ A $.
So the dimension of $ range A $ equals $ rank A $.
Rank-nullity theorem
Matrices
For $ A \in M_{m, n}(\mathbf{F}) $, we have:
$$ rank(A) + nullity(A) = n $$
Proof
The proof I provide is based on The Rank-Nullity Theorem.
If $ rank(A) = n $, then by the Invertible Matrix Theorem, the only solution to $ Ax = 0 $ is the trivial solution $ x = 0 $. Hence, in this case, $ nullspace(A) = \{0\} $, so $ nullity(A) = 0 $ and the equation holds.
Now suppose $ rank(A) = r \lt n $. In this case, there are $ n − r > 0 $ free variables in the solution to $ Ax = 0 $. Let $ t_{1}, t_{2},…,t_{n−r} $ denote these free variables (chosen as those variables not attached to a leading one in any row-echelon form of $ A $), and let $ x_1, x_2,…, x_{n−r} $ denote the solutions obtained by sequentially setting each free variable to 1 and the remaining free variables to zero. Note that $ \{ x_1, x_2,…, x_{n−r} \} $ is linearly independent. Moreover, every solution to $ Ax = 0 $ is a linear combination of $ x_1, x_2,…, x_{n−r} $:
$$ x = t_1x_1 + t_2x_2 +···+ t_{n−r}x_{n−r}, $$
which shows that $ \{ x_1, x_2,…, x_{n−r} \} $ spans $ nullspace(A) $. Thus, $ \{ x_1, x_2,…, x_{n−r} \} $ is a basis for $ nullspace(A) $, so $ nullity(A) = n − r $ and the equation holds.
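A small numerical check of $ rank(A) + nullity(A) = n $ (a sketch assuming NumPy and SciPy; the matrix is an arbitrary random example):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 7))          # a 4 x 7 matrix, so n = 7

rank = np.linalg.matrix_rank(A)          # dim(range A)
nullity = null_space(A).shape[1]         # dim(nullspace A): columns form a basis
assert rank + nullity == A.shape[1]      # rank(A) + nullity(A) = n
```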
Linear transformations
Let $ T:V \rightarrow W $ be a linear transformation between two vector spaces where $ T $ ’s domain $ V $ is finite dimensional. Then
$$ rank(T) + nullity(T) = dim V $$
Proof
The proof I provide is based on Rank–nullity theorem.
Let $ V, W $ be vector spaces over some field $ F $, and $ T $ defined as in the statement of the theorem with $ \dim V = n $.
As $ \operatorname{Ker}T \subset V $ is a subspace, there exists a basis for it. Suppose $ \dim\operatorname{Ker}T = k $ and let $ \mathcal{K} := \{v_1, \ldots, v_k\} \subset \operatorname{Ker}(T) $ be such a basis.
We may now, by the Steinitz exchange lemma, extend $ \mathcal{K} $ with $ n-k $ linearly independent vectors $ w_1, \ldots, w_{n-k} $ to form a full basis of $ V $.
Let
$ \mathcal{S} := \{w_1, \ldots, w_{n-k}\} \subset V \setminus \operatorname{Ker}(T) $
such that
$ \mathcal{B} := \mathcal{K} \cup \mathcal{S} = \{v_1, \ldots, v_k, w_1, \ldots, w_{n-k}\} \subset V $
is a basis for $ V $. From this, we know that
$$ \operatorname{Im} T = \operatorname{Span}T(\mathcal{B}) = \operatorname{Span}\{T(v_1), \ldots, T(v_k), T(w_1), \ldots, T(w_{n-k})\} = \operatorname{Span}\{T(w_1), \ldots, T(w_{n-k})\} = \operatorname{Span}T(\mathcal{S}). $$ Here $ T(v_i) = 0 $ for every $ v_i \in \mathcal{K} $, since $ \mathcal{K} \subset \operatorname{Ker} T $.
We now claim that $ T(\mathcal{S}) $ is a basis for $ \operatorname{Im} T $. The above equality already states that $ T(\mathcal{S}) $ is a generating set for $ \operatorname{Im} T $; it remains to be shown that it is also linearly independent to conclude that it is a basis.
Suppose $ T(\mathcal{S}) $ is not linearly independent, and let
$ \sum_{j=1}^{n-k} \alpha _j T(w_j) = 0_W $
for some $ \alpha _j \in F $.
Thus, owing to the linearity of $ T $, it follows that
$$ T \left(\sum_{j=1}^{n-k} \alpha_j w_j \right) = 0_W \implies \left(\sum_{j=1}^{n-k} \alpha_j w_j \right) \in \operatorname{Ker} T = \operatorname{Span} \mathcal{K} \subset V. $$
This is a contradiction to $ \mathcal{B} $ being a basis, unless all $ \alpha _j $ are equal to zero. This shows that $ T(\mathcal{S}) $ is linearly independent, and more specifically that it is a basis for $ \operatorname{Im}T $.
To summarize, we have $ \mathcal{K} $, a basis for $ \operatorname{Ker}T $, and $ T(\mathcal{S}) $, a basis for $ \operatorname{Im}T $.
Finally we may state that
$$ \operatorname{Rank}(T) + \operatorname{Nullity}(T) = \dim \operatorname{Im} T + \dim \operatorname{Ker}T = |T(\mathcal{S})| + |\mathcal{K}| = (n-k) + k = n = \dim V . $$
This concludes our proof.
Application: $ \dim C(AB) = \dim C(B) - \dim ( \operatorname{nullspace} (A) \cap C(B)) $
The application I provide is based on $ \dim C(AB) = \dim C(B) - \dim ( \operatorname{nullspace} (A) \cap C(B)) $
Consider the map $ T: C(B) \rightarrow C(AB) $ given by $ T(y) = Ay $ for all $ y \in C(B) $.
The kernel of $ T $ is $ \operatorname{nullspace} (A) \cap C(B) $. Indeed, $ y \in \operatorname{Ker} (T) $ if and only if $ y \in C(B) $ and $ A y = 0 $, if and only if $ y \in \operatorname{nullspace} (A) \cap C(B) $.
Now, using the rank-nullity theorem, we have
$$ \operatorname{Rank}(T) + \operatorname{Nullity}(T) = \dim C(AB) + \dim (\operatorname{nullspace} (A) \cap C(B)) = \dim C(B). $$
$$ \dim C(AB) = \dim C(B) - \dim (\operatorname{nullspace} (A) \cap C(B)) $$
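A numerical spot-check of this identity (a sketch assuming NumPy and SciPy; the intersection dimension is computed from $ \dim(U \cap V) = \dim U + \dim V - \dim(U + V) $):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 6))   # rank-deficient A
B = rng.standard_normal((6, 4))

N = null_space(A)                        # columns: a basis of nullspace(A)
dim_int = (N.shape[1] + np.linalg.matrix_rank(B)
           - np.linalg.matrix_rank(np.hstack([N, B])))   # dim(nullspace(A) ∩ C(B))

assert np.linalg.matrix_rank(A @ B) == np.linalg.matrix_rank(B) - dim_int
```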
0.4 Rank
Sylvester inequality
If $ A \in M_{m,k} (F) $ and $ B \in M_{k,n} (F) $, then
$$ (rank A + rank B) - k \leq rank (AB) $$
Proof
The proof I provide is based on Prove Sylvester rank inequality.
If we can prove that $ dim ker A + dim ker B \geq dim ker (AB) $, then by the rank-nullity theorem we can deduce that $ (rank A + rank B) - k \leq rank (AB) $.
Firstly, we show that $ kerB \subseteq ker(AB) $.
For any $ x \in kerB $, we have:
$$ B x = 0 $$
Now, applying the matrix $ A $ to both sides of this equation:
$$ A B x = A 0 = 0 \Rightarrow x \in ker(AB) $$
Thus $ kerB \subseteq ker(AB) $.
Then we show that $ dim ker A + dim ker B \geq dim ker (AB) $.
Let $ \beta = \{\alpha_1,…,\alpha_r\} $ be a basis for $ kerB $, so $ r = dim ker B $. Because $ kerB \subseteq ker(AB) $, we can extend $ \beta $ to a basis $ \{\alpha_1, \cdots, \alpha_r, \alpha_{r+1}, \cdots, \alpha_s \} $ for $ ker(AB) $, where $ s = dim ker (AB) $. Since $ \alpha_{r+1}, \cdots, \alpha_s \notin kerB $, we have that $ B \alpha_i \neq 0 $ for $ r \lt i \leq s $.
We show that $ \{B \alpha_{r+1}, \cdots, B \alpha_s \} $ is linearly independent. Each $ B \alpha_i $ lies in $ ker A $, because $ A (B \alpha_i) = (AB) \alpha_i = 0 $; so once linear independence is shown, we have $ dim ker A \geq s - r $. (Note: we can only conclude $ dim ker A \geq s - r $, not equality: since $ B \alpha_i = 0 $ for $ 1 \leq i \leq r $, adding, say, $ B \alpha_1 $ to $ \{B \alpha_{r+1}, \cdots, B \alpha_s \} $ would give a linearly dependent set.)
Assume that there exist scalars $ \lambda_{r+1}, \cdots, \lambda_s $ such that $ \sum_{i = r + 1}^s \lambda_i B \alpha_i = 0 $. Since $ B $ is linear, we have $ B \sum_{i = r + 1}^s \lambda_i \alpha_i = 0 $, so that $ \sum_{i = r + 1}^s \lambda_i \alpha_i $ belongs to the kernel of $ B $, and hence is a linear combination of $ \alpha_1, \cdots, \alpha_r $. Since the set $ \{\alpha_1, \cdots, \alpha_r, \alpha_{r+1}, \cdots, \alpha_s \} $ is linearly independent, we infer that $ \lambda_i $ must be zero for all $ i = r + 1, \cdots, s $.
Now one can see that
$$ dim ker A + dim ker B \geq (s - r) + r = s = dim ker (AB) $$
Using the rank-nullity theorem, we have
$$ k - rank A + n - rank B \geq n - rank (AB) \Rightarrow (rank A + rank B) - k \leq rank (AB) $$
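A quick numerical spot-check of the inequality (assuming NumPy; the sizes and ranks are arbitrary choices that make the bound non-trivial):

```python
import numpy as np

rng = np.random.default_rng(5)
m, k, n = 5, 6, 7
A = rng.standard_normal((m, k))                                 # rank 5
B = rng.standard_normal((k, 5)) @ rng.standard_normal((5, n))   # rank 5

rank = np.linalg.matrix_rank
assert rank(A) + rank(B) - k <= rank(A @ B)
print(rank(A) + rank(B) - k, rank(A @ B))   # lower bound vs. actual rank of AB
```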
Frobenius inequality
Let $ A \in M_{m \times k}(\bf F) $, $ B \in M_{k \times p}(\bf F) $ and $ C \in M_{p \times n}(\bf F) $, then $ \operatorname{rank}(AB) + \operatorname{rank}(BC) \leq \operatorname{rank}(B) + \operatorname{rank}(ABC) $
Proof
The proof I provide is based on Frobenius rank inequality.
We can use the rank-nullity theorem to show that
$$ \tag{1} \operatorname{rank}(AB) = \operatorname{rank}(B) - \dim (\operatorname{im}(B) \cap \operatorname{ker} (A)) $$
Since $ \operatorname{im}(BC) \subseteq \operatorname{im}(B) $, we have
$$ \tag{2} \operatorname{im}(BC) \cap \operatorname{ker}(A) \subseteq \operatorname{im}(B) \cap \operatorname{ker}(A) $$
Now we want to write $ \operatorname{rank}(ABC) $ in such a way that $ \operatorname{im}(BC) \cap \operatorname{ker}(A) $ pops up, so we could make use of (2). Analogously to (1):
$$ \tag{3} \operatorname{rank} (ABC) = \operatorname{rank} (BC) - \dim (\operatorname{im} (BC) \cap \operatorname{ker} (A)) $$
From (1) and (3), we have
$$ \operatorname{rank}(AB) + \operatorname{rank}(BC) = \operatorname{rank}(B) + \operatorname{rank}(ABC) + \underbrace{ \dim (\operatorname{im} (BC) \cap \operatorname{ker} (A)) - \dim (\operatorname{im}(B) \cap \operatorname{ker} (A)) }_{\leq 0 \text{ due to (2)}} $$
which implies the desired inequality.
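A numerical spot-check of the Frobenius inequality (assuming NumPy; $ B $ is made rank-deficient so the inequality is not vacuous):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # rank 3
C = rng.standard_normal((5, 7))

rank = np.linalg.matrix_rank
assert rank(A @ B) + rank(B @ C) <= rank(B) + rank(A @ B @ C)
```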
Left or right multiplication by a nonsingular matrix leaves rank unchanged
If $ A \in M_m(\bf F) $ and $ C \in M_n(\bf F) $ are nonsingular and $ B \in M_{m,n}(\bf F) $, then $ \operatorname{rank} AB = \operatorname{rank} B = \operatorname{rank} BC = \operatorname{rank} ABC $; that is, left or right multiplication by a nonsingular matrix leaves rank unchanged.
Proof
Firstly, we show that $ \operatorname{rank} (B) = \operatorname{rank} (BC) $.
The proof is based on How to prove that $ \operatorname{im}(B) = \operatorname{im}(BA)$?.
We know that $ C $ is a linear transformation from $ \bf F^n $ to $ \operatorname{im} (C) $. Since $ C $ is nonsingular, it follows that $ C $ is onto, implying $ \operatorname{im} (C) = \bf F^n $.
Now consider the linear transformation $ BC $. This composition can be viewed in two stages:
The transformation $ x \rightarrow Cx $, where $ C: \bf F^n \rightarrow \bf F^n $.
The transformation $ Cx \rightarrow BCx $, where $ B: \bf F^n \rightarrow \operatorname{im} (B) $.
Since $ Cx $ ranges over all of $ \bf F^n $, which is exactly the domain of $ B $, the transformation $ Cx \rightarrow BCx $ behaves exactly as $ B $, mapping $ \bf F^n $ onto $ \operatorname{im} (B) $.
Hence, $ \operatorname{im} (BC) = \operatorname{im} (B) $.
This proves $ \operatorname{rank} (B) = \operatorname{rank} (BC) $.
Then we show that $ \operatorname{rank} (AB) = \operatorname{rank} (B) $.
We know that $ A $ is a linear transformation from $ \bf F^m $ to $ \operatorname{im} (A) $. Since $ A $ is nonsingular, it follows that $ A $ is onto, meaning $ \operatorname{im} (A) = \bf F^m $. Furthermore, because $ \dim(\operatorname{im}(A)) = \dim(\operatorname{domain}(A)) = m $, $ A $ is injective as well.
Now consider the linear transformation $ AB $. This composition can be viewed in two stages:
The transformation $ x \rightarrow Bx $, where $ B: \bf F^n \rightarrow \operatorname{im} (B) $.
Then $ Bx \rightarrow ABx $, where $ A $ maps $ \operatorname{im} (B) $ onto $ \operatorname{im} (AB) $.
Since $ A $ is injective, $ \dim (\operatorname{im} (AB)) = \dim (\operatorname{im} (B)) $.
Combining the two results also gives $ \operatorname{rank} (ABC) = \operatorname{rank} (BC) = \operatorname{rank} (B) $, which completes the proof.
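A quick numerical illustration (assuming NumPy; random square matrices are nonsingular with probability one, which is good enough for a spot-check):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 5, 6
B = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))   # rank 2
A = rng.standard_normal((m, m))       # nonsingular (almost surely)
C = rng.standard_normal((n, n))       # nonsingular (almost surely)

rank = np.linalg.matrix_rank
assert rank(A @ B) == rank(B) == rank(B @ C) == rank(A @ B @ C) == 2
```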
If $ A \in M_{m,n}(\bf C) $, then $ \operatorname{rank} A^* A = \operatorname{rank} A $
Proof
The proof I provide is based on Prove $\operatorname{rank}A^TA=\operatorname{rank}A$.
Let $ x \in \operatorname{nullspace} (A) $.
$$ A x = 0 $$ $$ \Rightarrow A^* A x = 0 $$ $$ \Rightarrow x \in \operatorname{nullspace} (A^* A) $$
Hence, $ \operatorname{nullspace} (A) \subseteq \operatorname{nullspace} (A^* A) $.
Again, let $ x \in \operatorname{nullspace} (A^* A) $.
$$ A^* A x = 0 $$ $$ \Rightarrow x^* A^* A x = 0 $$ $$ \Rightarrow (A x)^* A x = 0 $$ $$ \Rightarrow A x = 0 $$ $$ \Rightarrow x \in \operatorname{nullspace} (A) $$
Hence, $ \operatorname{nullspace} (A^* A) \subseteq \operatorname{nullspace} (A) $.
Therefore, $$ \operatorname{nullspace} (A) = \operatorname{nullspace} (A^* A) $$ $$ \dim (\operatorname{nullspace} (A)) = \dim (\operatorname{nullspace} (A^* A)) $$ $$ \dim (\operatorname{domain} (A)) - \operatorname{rank} (A) = \dim (\operatorname{domain} (A^* A)) - \operatorname{rank} (A^* A) $$ $$ \operatorname{rank} (A) = \operatorname{rank} (A^* A) $$
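A numerical check with a complex matrix (assuming NumPy; `.conj().T` plays the role of $ A^* $):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
A[:, 3] = A[:, 0] + A[:, 1]           # force a rank deficiency: rank(A) = 3

rank = np.linalg.matrix_rank
assert rank(A.conj().T @ A) == rank(A) == 3
```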
CR Factorization
Let $ A \in M_{m,n}(\bf F) $, $ \operatorname{rank} (A) = r $. Suppose $ C $ contains the first $ r $ independent columns of $ A $. Suppose $ R $ contains the $ r $ nonzero rows of $ \operatorname{rref} (A) $. Then $ A = C R $.
Now suppose the matrix $ B $ contains the first $ r $ independent rows of $ A $. $ W $ is the $ r $ by $ r $ matrix where $ C $ meets $ B $. Then $ A = C W^{-1} B $.
The derivation I provide is based on LU and CR Elimination.
Establish $ A = C R $
Here are the steps to establish $ A = C R $. We know that an invertible elimination matrix $ E $ (a product of simple steps) gives $ E A = R_0 = \operatorname{rref} (A) $. Then $ A = E^{-1} R_0 $. Drop the $ m - r $ zero rows of $ R_0 $ and the last $ m - r $ columns of $ E^{-1} $. This leaves $ A = C R $ with $ R = \begin{bmatrix} I & F \end{bmatrix} P $, where the identity matrix in $ R $ allows us to identify $ C $ among the columns of $ E^{-1} $.
Establish $ A = C W^{-1} B $
Suppose the $ r $ independent columns of $ A $ are its first $ r $ columns, and the $ r $ independent rows of $ A $ are its first $ r $ rows.
Let $ A = \begin{bmatrix} W & H \\ J & K \end{bmatrix} $, $ C = \begin{bmatrix} W \\ J \end{bmatrix} $, $ B = \begin{bmatrix} W & H \end{bmatrix} $.
Combinations $ V $ of the rows of $ B $ must produce the dependent rows in $ \begin{bmatrix} J & K \end{bmatrix} $. Then $ \begin{bmatrix} J & K \end{bmatrix} = V B = \begin{bmatrix} V W & V H \end{bmatrix} $. And $ C = \begin{bmatrix} I \\ V \end{bmatrix} W $.
Combinations $ T $ of the columns of $ A $ must produce the dependent columns in $ \begin{bmatrix} H \\ K \end{bmatrix} $. Then $ \begin{bmatrix} H \\ K \end{bmatrix} = C T = \begin{bmatrix} W T \\ J T \end{bmatrix} $. And $ B = W \begin{bmatrix} I & T \end{bmatrix} $.
$$ A = \begin{bmatrix} W & H \\ J & K \end{bmatrix} = \begin{bmatrix} W & H \\ VW & VH \end{bmatrix} = \begin{bmatrix} W & WT \\ VW & VWT \end{bmatrix} = \begin{bmatrix} I \\ V \end{bmatrix} \begin{bmatrix} W \end{bmatrix} \begin{bmatrix} I & T \end{bmatrix} = C W^{-1} B $$
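A small exact example of $ A = C R $ (a sketch assuming SymPy for the reduced row echelon form; the matrix is an arbitrary rank-2 example):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 7],
               [3, 6, 10]])             # rank 2: column 2 is twice column 1

R0, pivots = A.rref()                   # rref(A) and the pivot column indices
r = len(pivots)                         # rank of A
C = A.extract(list(range(A.rows)), list(pivots))   # the r independent columns of A
R = R0[:r, :]                           # the r nonzero rows of rref(A)
assert C * R == A
```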
Wedderburn’s rank-one reduction formula
Let $ A \in M_{m,n}(\bf F) $. If $ x \in \bf F^n $ and $ y \in \bf F^m $, and if
$$ \omega = y^{\top} A x \neq 0. $$
Then
$$ \operatorname{rank} (A - \omega^{-1} A x y^{\top} A ) = \operatorname{rank} (A) - 1. $$
In general, if $ X \in M_{n,k} (\bf F) $ and $ Y \in M_{m,k} (\bf F) $, and if
$$ W = Y^{\top} A X $$
is nonsingular, then
$$ \operatorname{rank} (A - A X W^{-1} Y^{\top} A) = \operatorname{rank} A - k $$
Proof
The proof I provide is based on Generalized Wedderburn Rank Reduction.
Proof of rank-one reduction formula
For any $ v $ in $ \operatorname{nullspace} (A) $, $$ A v = 0 \Rightarrow (A - \omega^{-1} A x y^{\top} A) v = 0 $$
Hence, $ \operatorname{nullspace} (A) \subseteq \operatorname{nullspace} (A - \omega^{-1} A x y^{\top} A) $.
Since $ \omega = y^{\top} A x \neq 0 \Rightarrow A x \neq 0 \Rightarrow x \notin \operatorname{nullspace} (A) $, but
$$ (A - \omega^{-1} A x y^{\top} A) x = A x - \omega^{-1} A x (y^{\top} A x) = A x - A x = 0 \Rightarrow x \in \operatorname{nullspace} (A - \omega^{-1} A x y^{\top} A) . $$
Hence, $ \operatorname{nullity} (A - \omega^{-1} A x y^{\top} A) - 1 \geq \operatorname{nullity} (A) $. Also, $ \dim (\operatorname{domain} (A)) = \dim (\operatorname{domain} (A - \omega^{-1} A x y^{\top} A)) $.
Therefore
$$ \operatorname{rank} (A - \omega^{-1} A x y^{\top} A) \leq \operatorname{rank} (A) - 1. $$
Assume now $ (A - \omega^{-1} A x y^{\top} A) u = 0 $. Let $ \lambda = \omega^{-1} y^{\top} A u $. Then
$$ A (u - \lambda x) = 0 $$
hence $ u \in \operatorname{nullspace} (A) + \bf F x $. Thus $ \operatorname{nullity} (A - \omega^{-1} A x y^{\top} A) \leq \operatorname{nullity} (A) + 1 $, and combining this with the inequality above gives
$$ \operatorname{rank} (A - \omega^{-1} A x y^{\top} A) = \operatorname{rank} (A) - 1. $$
Note that:
$ A x $: A vector in the column space of $ A $.
$ y^{\top} A $: A row vector in the row space of $ A $.
$ (Ax)(y^{\top} A) $: A rank-1 matrix since it’s the outer product of a vector $ (A x) $ and a row vector $ (y^{\top} A) $.
Thus, the term $ A x y^{\top} A $ represents a modification to $ A $ in the direction of a rank-1 update.
Proof of general formula
The proof is similar.
Since $ W = Y^{\top} A X $ is nonsingular, $ \operatorname{rank} (W) = k $; hence $ A X \in \bf F^{m \times k} $ is of rank $ k $, the $ k $ columns of $ X $ are linearly independent, and no nonzero linear combination of the columns of $ X $ is contained in the nullspace of $ A $. But we have
$$ ( A - A X W^{-1} Y^{\top} A ) X = 0 .$$
We know that $ \operatorname{nullspace} (A) \subseteq \operatorname{nullspace} (A - A X W^{-1} Y^{\top} A) $, so
$$ \operatorname{rank} (A - A X W^{-1} Y^{\top} A) = n - \dim(\operatorname{nullspace} (A - A X W^{-1} Y^{\top} A)) \leq n - ((n - \operatorname{rank} (A)) + k) = \operatorname{rank} (A) - k. $$
Let $ u \in \bf F^n $ be any vector such that
$$ (A - A X W^{-1} Y^{\top} A) u = 0. $$
Let $ \lambda = W^{-1} Y^{\top} A u \in \bf F^k $, then
$$ A (u - X \lambda) = 0, $$
hence
$$ u \in \operatorname{nullspace} (A) + \mathcal{C} (X). $$
This implies that
$$ \operatorname{nullspace} (A - A X W^{-1} Y^{\top} A) \subseteq \operatorname{nullspace} (A) + \mathcal{C} (X) \cong \operatorname{nullspace} (A) \oplus \mathcal{C} (X) ,$$
and finally,
$$ \operatorname{rank} (A - A X W^{-1} Y^{\top} A) = \operatorname{rank} (A) - k. $$
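A numerical spot-check of the rank-one reduction formula (assuming NumPy; $ A $, $ x $, $ y $ are arbitrary random choices with $ \omega \neq 0 $):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 6))   # rank 3
x = rng.standard_normal(6)
y = rng.standard_normal(5)

omega = y @ A @ x                            # omega = y^T A x
assert abs(omega) > 1e-12                    # nonzero, so the formula applies
A1 = A - np.outer(A @ x, y @ A) / omega      # A - omega^{-1} A x y^T A
assert np.linalg.matrix_rank(A1) == np.linalg.matrix_rank(A) - 1
```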
0.5 The Euclidean inner product
Coordinates form of inner product and $ cos(\theta) $
The inner product $ <F, s> $ was introduced to describe the work done by a force $ F $ acting along a displacement $ s $.
$$ W = <F, s> = |F| \cdot |s| \cdot cos(\theta) $$
From this, the cosine of the angle $ \theta $ between $ F $ and $ s $ can be expressed as:
$$ cos(\theta) = \frac{<F, s>}{|F| \cdot |s|} $$
In coordinates, the inner product $ <F, s> $ is defined as:
$$ <F, s> = s^* F = s_x F_x + s_y F_y + s_z F_z $$
This decomposition shows that the work done in each direction corresponds to the component-wise product: The work in the $ x $ direction is $ s_x F_x $, the work in the $ y $ direction is $ s_y F_y $, and the work in the $ z $ direction is $ s_z F_z $.
Thus, the coordinate form provides a clear and meaningful breakdown of the total work into contributions from each axis.
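A tiny numerical example of the coordinate formula (assuming NumPy; the force and displacement below are made-up numbers):

```python
import numpy as np

F = np.array([3.0, 0.0, 4.0])     # force
s = np.array([2.0, 1.0, 2.0])     # displacement

W = np.dot(F, s)                  # F_x s_x + F_y s_y + F_z s_z = 14.0
cos_theta = W / (np.linalg.norm(F) * np.linalg.norm(s))
print(W, cos_theta)               # 14.0 and 14 / (5 * 3) ≈ 0.933
```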
0.6 Determinants again
The Cauchy-Binet formula
Let $ A \in M_{m,n} (\bf F) $, $ B \in M_{n,m} (\bf F) $, and $ C = A B $.
The Cauchy-Binet formula expresses the determinant $ | C | $ in terms of the minors of $ A $ and $ B $:
$$ \tag{1} \begin{vmatrix} c_{11} & \cdots & c_{1m} \\ \vdots & \ddots & \vdots \\ c_{m1} & \cdots & c_{mm} \end{vmatrix} = \sum_{1 \leq k_1 < k_2 < \cdots < k_m \leq n} \begin{vmatrix} a_{1 k_1} & \cdots & a_{1 k_m} \\ \vdots & \ddots & \vdots \\ a_{m k_1} & \cdots & a_{m k_m} \end{vmatrix} \begin{vmatrix} b_{k_1 1} & \cdots & b_{k_1 m} \\ \vdots & \ddots & \vdots \\ b_{k_m 1} & \cdots & b_{k_m m} \end{vmatrix} $$
or in a concise notation,
$$ \tag{1’} C \begin{pmatrix} 1 & 2 & \cdots & m \\ 1 & 2 & \cdots & m \end{pmatrix} = \sum_{1 \leq k_1 < k_2 < \cdots < k_m \leq n} A \begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} B \begin{pmatrix} k_1 & k_2 & \cdots & k_m \\ 1 & 2 & \cdots & m \end{pmatrix} $$
Derivation
The derivation I provide is based on Volume 1 of Gantmacher’s classic The Theory of Matrices, in chapter 1 $ \S $ 2.
The determinant of $ C $ can be represented in the form
$$ \begin{vmatrix} c_{11} & \cdots & c_{1m} \\ \vdots & \ddots & \vdots \\ c_{m1} & \cdots & c_{mm} \end{vmatrix} = \begin{vmatrix} \sum_{\alpha_1 = 1}^n a_{1 \alpha_1} b_{\alpha_1 1} & \cdots & \sum_{\alpha_m = 1}^n a_{1 \alpha_m} b_{\alpha_m m} \\ \vdots & \ddots & \vdots \\ \sum_{\alpha_1 = 1}^n a_{m \alpha_1} b_{\alpha_1 1} & \cdots & \sum_{\alpha_m = 1}^n a_{m \alpha_m} b_{\alpha_m m} \end{vmatrix} $$
Expanding the determinant, we consider the multilinear property of determinants: the determinant is linear in each row. Substituting each term into the determinant:
$$ = \sum_{\alpha_1, \alpha_2, \cdots, \alpha_m = 1}^n \begin{vmatrix} a_{1 \alpha_1} b_{\alpha_1 1} & \cdots & a_{1 \alpha_m} b_{\alpha_m m} \\ \vdots & \ddots & \vdots \\ a_{m \alpha_1} b_{\alpha_1 1} & \cdots & a_{m \alpha_m} b_{\alpha_m m} \end{vmatrix} $$
Now, by the property of determinants that “multiplying a column by a number multiplies the determinant by this number,” we can factor out $ b_{\alpha_i i} $ from the $ i $ -th column of the determinant:
$$ = \sum_{\alpha_1, \alpha_2, \cdots, \alpha_m = 1}^n \begin{vmatrix} a_{1 \alpha_1} & \cdots & a_{1 \alpha_m} \\ \vdots & \ddots & \vdots \\ a_{m \alpha_1} & \cdots & a_{m \alpha_m} \end{vmatrix} b_{\alpha_1 1} b_{\alpha_2 2} \cdots b_{\alpha_m m} $$
$$ \tag{2} = \sum_{\alpha_1, \alpha_2, \cdots, \alpha_m = 1}^n A \begin{pmatrix} 1 & 2 & \cdots & m \\ \alpha_1 & \alpha_2 & \cdots & \alpha_m \end{pmatrix} b_{\alpha_1 1} b_{\alpha_2 2} \cdots b_{\alpha_m m} $$
If m > n, the matrices $ A $ and $ B $ do not have minors of order $ m $. In that case the right-hand sides of (1) and (1’) are to be replaced by zero.
Now let $ m \leq n $. Then in the sum on the right-hand side of (2) all those summands will be zero in which at least two of the subscripts $ \alpha_1, \alpha_2, \cdots, \alpha_m $ are equal. (If two columns of $ A $ are equal, its determinant is zero.) All the remaining summands of (2) can be split into groups of $ m! $ terms each by combining into one group those summands that differ from each other only in the order of the subscripts $ \alpha_1, \alpha_2, \cdots, \alpha_m $. Now within one such group the sum of the corresponding terms is
$$ \sum \varepsilon (\alpha_1, \alpha_2, \cdots, \alpha_m) A \begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} b_{\alpha_1 1} b_{\alpha_2 2} \cdots b_{\alpha_m m} $$
$$ = A \begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} \sum \varepsilon (\alpha_1, \alpha_2, \cdots, \alpha_m) b_{\alpha_1 1} b_{\alpha_2 2} \cdots b_{\alpha_m m} $$
By the Leibniz formula for determinants, the summation over all permutations can be expressed compactly. Thus, the expression becomes:
$$ = A \begin{pmatrix} 1 & 2 & \cdots & m \\ k_1 & k_2 & \cdots & k_m \end{pmatrix} B \begin{pmatrix} k_1 & k_2 & \cdots & k_m \\ 1 & 2 & \cdots & m \end{pmatrix} $$
Here, $ k_1 < k_2 < \cdots < k_m $ is the normal order of the subscripts $ \alpha_1, \alpha_2, \cdots, \alpha_m $ and $ \varepsilon (\alpha_1, \alpha_2, \cdots, \alpha_m) = (-1)^N $ where $ N $ is the number of transpositions of the indices needed to put the permutation $ \alpha_1, \alpha_2, \cdots, \alpha_m $ into normal order.
Hence from (2) we obtain (1’).
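A direct numerical check of the formula (assuming NumPy; the right-hand side sums over all $ m $-element column selections $ k_1 < \cdots < k_m $):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10)
m, n = 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

lhs = np.linalg.det(A @ B)
rhs = sum(np.linalg.det(A[:, k]) * np.linalg.det(B[k, :])
          for k in (list(c) for c in combinations(range(n), m)))
assert np.isclose(lhs, rhs)
```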
Generalization
Let $ A \in M_{m,k} (\bf F) $, $ B \in M_{k,n} (\bf F) $, and $ C = A B $. Furthermore, let $ 1 \leq r \leq min \{m, k, n\} $, and let $ \alpha \subseteq \{1, \dots, m\} $ and $ \beta \subseteq \{1, \dots, n\} $ be index sets, each of cardinality $ r $. An expression for the $ \alpha $, $ \beta $ minor of $ C $ is
$$ \operatorname{det} C [\alpha, \beta] = \sum_\gamma \operatorname{det} A [\alpha, \gamma] \operatorname{det} B [\gamma, \beta] $$
in which the sum is taken over all index sets $ \gamma \subseteq \{1, \dots, k\} $ of cardinality $ r $.
Derivation
Given that
$$ C [\alpha, \beta] = A [\alpha, [k]] B [[k], \beta] $$
and applying the Cauchy-Binet formula, we arrive at the generalized form above.
The Sylvester-Franke Theorem
If $ A \in M_n $ and $ 1 \leq k \leq n $, then $ \operatorname{det} C_k(A) = (\operatorname{det} A)^e $, where $ e = {\begin{pmatrix} n - 1 \\ k - 1\end{pmatrix}} $.
Proof
The proof I provide is based on Leonard Tornheim, The Sylvester-Franke Theorem, The American Mathematical Monthly.
There are three types $ E_1 $, $ E_2 $, $ E_3 $ of elementary transformations: $ E_1 $ is the multiplication of a row or column by a constant $ c $; $ E_2 $ is the interchange of two adjacent rows or columns; and $ E_3 $ is the addition of $ c $ times a row or column to another row or column. If $ B $ is transformed into $ C $ by $ E_i $, then
$$ \tag{1} \operatorname{det} C = \mu_i \operatorname{det} B, $$
where $ \mu_1 = c $, $ \mu_2 = -1 $, and $ \mu_3 = +1 $.
The proof of the theorem will be based on the following lemma.
Lemma 0.6.1 Let $ B $ be an arbitrary $ n \times n $ square matrix such that
$$ \tag{2} \operatorname{det} C_k(B) = (\operatorname{det} B)^e $$
If $ B $ is transformed into $ C $ by an elementary transformation $ E $, then $ \operatorname{det} C_k(C) = (\operatorname{det} C)^e $.
Proof of Lemma 0.6.1
We state first that
$$ \tag{3} \operatorname{det} C_k(C) = \mu_i^e \operatorname{det} C_k(B). $$
We shall prove this in detail only for $ i = 2 $; the other two cases are easier. Suppose that the $ u $-th and ($ u + 1 $)-th rows of $ B $ have been interchanged to give a matrix $ C $. The minors of $ C $ which do not involve the $ u $-th or ($ u + 1 $)-th rows equal the corresponding minors of $ B $. If both $ u $ and $ u + 1 $ appear in $ \lambda (p) $, then $ C_{\lambda (p) \lambda (q)} = -B_{\lambda (p) \lambda (q)} $. The number of rows of $ C_k (C) $ having such minors is $ {\begin{pmatrix} n - 2 \\ k - 2 \end{pmatrix}} $. Next suppose $ u $ appears in $ \lambda (p) $ but $ u + 1 $ does not. Let $ \lambda (p’) $ be obtained from $ \lambda (p) $ by replacing $ u $ by $ u + 1 $. Then $ C_{\lambda (p) \lambda (q)} = B_{\lambda (p’) \lambda (q)} $ and $ C_{\lambda (p’) \lambda (q)} = B_{\lambda (p) \lambda (q)} $ since the same elements are used and in the same arrangement, because the $ u $-th and ($ u + 1 $)-th rows are adjacent. Thus from this cause $ {\begin{pmatrix} n - 2 \\ k - 1 \end{pmatrix}} $ pairs of rows have been interchanged in going from $ C_k(B) $ to $ C_k(C) $. (There are $ 2 \times {\begin{pmatrix} n - 2 \\ k - 1 \end{pmatrix}} $ rows in $ C $ that need to be changed to match $ B $. However, these rows must be interchanged within $ C $, forming $ {\begin{pmatrix} n - 2 \\ k - 1 \end{pmatrix}} $ pairs.) Thus the total number of changes in sign in going from $ \operatorname{det} C_k(B) $ to $ \operatorname{det} C_k(C) $ is
$$ {\begin{pmatrix} n - 2 \\ k - 2 \end{pmatrix}} + {\begin{pmatrix} n - 2 \\ k - 1 \end{pmatrix}} = {\begin{pmatrix} n - 1 \\ k - 1 \end{pmatrix}} = e.$$
From (3), (1), and (2) it now follows that
$$ \operatorname{det} C_k(C) = \mu_i^e \operatorname{det} C_k(B) = \mu_i^e (\operatorname{det} B)^e = (\mu_i \operatorname{det} B)^e = (\operatorname{det} C)^e, $$
and the proof of the Lemma 0.6.1 is complete.
Proof of Sylvester-Franke Theorem
The proof of the Sylvester-Franke Theorem can now be given as follows. It is known that there exists an $ n \times n $ matrix
$$ D = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} $$
with $ 0 \leq r \leq n $.
$ D $ can be transformed into the given $ n \times n $ matrix $ A $ by a finite sequence of elementary transformations $ E^{(1)} $, $ E^{(2)} $, $ \dots $, $ E^{(v)} $.
If $ r = n $, $ \operatorname{det} D = 1 $ and if $ r < n $, $ \operatorname{det} D = 0 $. Also if $ r = n $, $ C_k (D) = I_{\begin{pmatrix} n \\ k \end{pmatrix}} $, and $ \operatorname{det} C_k (D) = 1 $; otherwise $ \operatorname{det} C_k (D) = 0 $. Hence the relation $ \operatorname{det} C_k (D) = (\operatorname{det} D)^e $ holds in both cases. Then by the lemma, $ \operatorname{det} C_k (E^{(1)} D) = \operatorname{det} C_k (D_1) = (\operatorname{det} D_1)^e $, $ \operatorname{det} C_k (E^{(2)} D_1) = \operatorname{det} C_k (D_2) = (\operatorname{det} D_2)^e $, and so on. It follows by induction that $ \operatorname{det} C_k (E^{(v)} D_{v - 1}) = \operatorname{det} C_k (A) = (\operatorname{det} A)^e $, and the proof is complete.
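A numerical check of the theorem (a sketch assuming NumPy; the $ k $-th compound matrix $ C_k(A) $ is built directly from all $ k \times k $ minors of $ A $ in lexicographic order):

```python
import numpy as np
from itertools import combinations
from math import comb

def compound(A, k):
    # k-th compound matrix: k x k minors over all k-subsets of rows/columns
    n = A.shape[0]
    subsets = [list(c) for c in combinations(range(n), k)]
    return np.array([[np.linalg.det(A[np.ix_(rows, cols)]) for cols in subsets]
                     for rows in subsets])

rng = np.random.default_rng(11)
n, k = 4, 2
A = rng.standard_normal((n, n))

lhs = np.linalg.det(compound(A, k))
rhs = np.linalg.det(A) ** comb(n - 1, k - 1)   # e = C(n-1, k-1)
assert np.isclose(lhs, rhs)
```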
Laplace expansion
Let $ n \in \mathbf{N} $. Let $ A = (a_{i,j})^{1 \leq i \leq n, 1 \leq j \leq n} $ be an $ n \times n $ matrix.
(a) For every $ p \in \{1, 2, \dots, n\} $, we have
$$ \operatorname{det} A = \sum_{q = 1}^n (-1)^{p + q} a_{p,q} \operatorname{det} (A_{\sim p, \sim q}). $$
(b) For every $ q \in \{1, 2, \dots, n\} $, we have
$$ \operatorname{det} A = \sum_{p = 1}^n (-1)^{p + q} a_{p,q} \operatorname{det} (A_{\sim p, \sim q}). $$
Proof
The proof I provide is based on 6.12. Laplace expansion, in Notes on the combinatorial fundamentals of algebra, Darij Grinberg.
Lemma 0.6.2
For every $ n \in N $, let $ [n] $ denote the set $ \{1, 2, \dots, n\}$. Let $ n \in \mathbf{N} $. For every $ p \in [n] $, we define a permutation $ g_p \in S_n $ by $ g_p = cyc_{p, p + 1, \dots, n} $ (where I am using the notations of Definition 5.37 in Notes).
(a) We have $ (g_p(1), g_p(2), \dots, g_p(n - 1)) = (1, 2, \dots, \hat{p}, \dots, n) $ for every $ p \in [n] $.
(b) We have $ (-1)^{g_p} = (-1)^{n-p} $ for every $ p \in [n] $.
(c) Let $ p \in [n] $. We define a map
$$ g_p^{\prime}: [n - 1] \to [n] \setminus \{p\} $$
by
$$ (g_p^{\prime} (i) = g_p (i) \quad \text{for every } i \in [n - 1]). $$
This map $ g_p^{\prime} $ is well-defined and bijective.
(d) Let $ p \in [n] $ and $ q \in [n] $. We define a map
$$ T: \{\tau \in S_n | \tau(n) = n\} \to \{\tau \in S_n | \tau(p) = q\} $$
by
$$ (T(\sigma) = g_q \circ \sigma \circ g_p^{-1} \quad \text{for every } \sigma \in \{\tau \in S_n | \tau(n) = n\}). $$
Then, this map $ T $ is well-defined and bijective.
Lemma 0.6.3
Let $ n \in \mathbf{N} $. Let $ A = (a_{i,j})^{1 \leq i \leq n, 1 \leq j \leq n} $ be an $ n \times n $ matrix. Let $ p \in \{1, 2, \dots, n\} $ and $ q \in \{1, 2, \dots, n\} $. Then,
$$ \sum_{\substack{\sigma \in S_n; \\ \sigma(p) = q}} (-1)^{\sigma} \prod_{\substack{i \in \{1, 2, \dots, n\}; \\ i \neq p}} a_{i, \sigma(i)} = (-1)^{p + q} \operatorname{det} (A_{\sim p, \sim q}). $$
Proof of Lemma 0.6.3
Let us use all notations introduced in Lemma 0.6.2.
Now, the definition of $ A_{\sim p, \sim q} $ yields
$$ \begin{aligned} A_{\sim p, \sim q} &= \operatorname{sub} \substack{1, 2, \dots, \hat{q}, \dots, n \\ 1, 2, \dots, \hat{p}, \dots, n} A = \operatorname{sub} \substack{g_q(1), g_q(2), \dots, g_q(n - 1) \\ g_p(1), g_p(2), \dots, g_p(n - 1)} A \\ &= (a_{g_p(i), g_q(j)})^{1 \leq i \leq n - 1, 1 \leq j \leq n - 1} \end{aligned} $$
Now, let us recall the map $ T: \{\tau \in S_n | \tau(n) = n\} \to \{\tau \in S_n | \tau(p) = q\} $ defined in Lemma 0.6.2 (d). Lemma 0.6.2 (d) says that this map $ T $ is well-defined and bijective. Every $ \sigma \in \{\tau \in S_n | \tau(n) = n\} $ satisfies
$$ \tag{1} (-1)^{T(\sigma)} = (-1)^{p + q} \cdot (-1)^{\sigma} $$
and
$$ \tag{2} \prod_{\substack{i \in \{1, 2, \dots, n\}; \\ i \neq p}} a_{i, (T(\sigma))(i)} = \prod_{i = 1}^{n - 1} a_{g_p(i), g_q(\sigma(i))}. $$
Proof of result (1)
Applying Lemma 0.6.2 (b) to $ q $ instead of $ p $, we obtain $ (-1)^{g_q} = (-1)^{n-q} = (-1)^{n+q} $ (since $ n - q \equiv n + q \mod 2 $).
The definition of $ T(\sigma) $ yields $ T(\sigma) = g_q \circ \sigma \circ g_p^{-1} $. Thus,
$$ T(\sigma) \circ g_p = g_q \circ \sigma \circ g_p^{-1} \circ g_p = g_q \circ \sigma, $$
so that
$$ \begin{aligned} (-1)^{T(\sigma) \circ g_p} &= (-1)^{g_q \circ \sigma} = (-1)^{g_q} \cdot (-1)^{\sigma} \\ &= (-1)^{n + q} \cdot (-1)^{\sigma}. \end{aligned} $$
Compared with
$$ \begin{aligned} (-1)^{T(\sigma) \circ g_p} &= (-1)^{T(\sigma)} \cdot (-1)^{g_p} \\ &= (-1)^{T(\sigma)} \cdot (-1)^{n - p}, \end{aligned} $$
this yields
$$ (-1)^{T(\sigma)} \cdot (-1)^{n - p} = (-1)^{n + q} \cdot (-1)^{\sigma} $$
We can divide both sides of this equality by $ (-1)^{n-p} $ (since $ (-1)^{n-p} \in \{1, -1\} $ is clearly an invertible integer), and thus we obtain
$$ (-1)^{T(\sigma)} = (-1)^{p + q} \cdot (-1)^{\sigma}. $$
This proves result (1).
Proof of result (2)
Let $ \sigma \in \{\tau \in S_n | \tau(n) = n\} $. Let us recall the map $ g_p^{\prime}: [n-1] \to [n] \setminus \{p\} $ introduced in Lemma 0.6.2 (c). Lemma 0.6.2 (c) says that this map $ g_p^{\prime} $ is well-defined and bijective. In other words, $ g_p^{\prime} $ is a bijection. Let $ i \in [n-1] $. Then, $ g_p^{\prime} (i) = g_p(i) $ (by the definition of $ g_p^{\prime} $). Also, the definition of $ T $ yields $ T(\sigma) = g_q \circ \sigma \circ g_p^{-1} $, so that
$$ T(\sigma) (g_p^{\prime} (i)) = (g_q \circ \sigma \circ g_p^{-1}) (g_p (i)) = g_q (\sigma (g_p^{-1} (g_p (i)))) = g_q (\sigma(i)). $$
From $ g_p^{\prime} (i) = g_p (i) $ and $ (T(\sigma)) (g_p^{\prime} (i)) = g_q (\sigma(i)) $, we obtain
$$ a_{g_p^{\prime} (i), (T(\sigma)) (g_p^{\prime} (i))} = a_{g_p (i), g_q (\sigma(i))}. $$
Now, let us forget that we fixed $ i $. We thus have proven the above equation for every $ i \in [n-1] $. But now, we have
$$ \begin{aligned} \prod_{\substack{i \in \{1, 2, \dots, n\}; \\ i \neq p}} a_{i, (T(\sigma))(i)} \\ &= \prod_{\substack{i \in [n]; \\ i \neq p}} a_{i, (T(\sigma))(i)} = \prod_{i \in [n] \setminus \{p\}} a_{i, (T(\sigma))(i)} = \prod_{i \in [n - 1]} a_{g_p^{\prime} (i), (T(\sigma)) (g_p^{\prime} (i))} \\ &\quad (\text{here, we have substituted } g_p^{\prime} (i) \text{ for } i \text{, since } g_p^{\prime}: [n - 1] \to [n] \setminus \{p\} \text{ is a bijection}) \\ &= \prod_{i \in [n - 1]} a_{g_p(i), g_q(\sigma(i))}. \end{aligned} $$
This proves result (2).
Now,
$$ \begin{aligned} \sum_{\substack{\sigma \in S_n; \\ \sigma(p) = q}} (-1)^{\sigma} \prod_{\substack{i \in \{1, 2, \dots, n\}; \\ i \neq p}} a_{i, \sigma(i)} \\ &= \sum_{\substack{\sigma \in \{\tau \in S_n | \tau(p) = q\}}} (-1)^{\sigma} \prod_{\substack{i \in \{1, 2, \dots, n\}; \\ i \neq p}} a_{i, \sigma(i)} \\ &= \sum_{\substack{\sigma \in \{\tau \in S_n | \tau(n) = n\}}} (-1)^{T(\sigma)} \prod_{\substack{i \in \{1, 2, \dots, n\}; \\ i \neq p}} a_{i, (T(\sigma))(i)} \\ &\quad (\text{here, we have substituted }T(\sigma) \text{ for } \sigma \text{ in the sum, since the map } T: \{\tau \in S_n | \tau(n) = n\} \to \{\tau \in S_n | \tau(p) = q\} \text{ is a bijection}) \\ &= \sum_{\substack{\sigma \in S_n; \\ \sigma(n) = n}} (-1)^{p + q} \cdot (-1)^{\sigma} \prod_{i = 1}^{n - 1} a_{g_p(i), g_q(\sigma(i))} \\ &= (-1)^{p + q} \sum_{\substack{\sigma \in S_n; \\ \sigma(n) = n}} (-1)^{\sigma} \prod_{i = 1}^{n - 1} a_{g_p(i), g_q(\sigma(i))} \\ &= (-1)^{p + q} \det \left(a_{g_p(i), g_q(j)}\right)^{1 \leq i \leq n - 1, 1 \leq j \leq n - 1} \\ &= (-1)^{p + q} \operatorname{det}(A_{\sim p, \sim q}). \end{aligned} $$
This proves Lemma 0.6.3.
Now, we can finally prove Laplace expansion
Proof of Laplace expansion (a)
Let $ p \in \{1, 2, \dots, n\} $.
$$ \begin{aligned} \operatorname{det} A &= \sum_{\sigma \in S_n} (-1)^{\sigma} \prod_{i = 1}^{n} a_{i, \sigma(i)} \\ &= \sum_{q \in \{1, 2, \dots, n\}} \sum_{\substack{\sigma \in S_n; \\ \sigma(p) = q}} (-1)^{\sigma} \prod_{i = 1}^{n} a_{i, \sigma(i)} \\ &= \sum_{q \in \{1, 2, \dots, n\}} \sum_{\substack{\sigma \in S_n; \\ \sigma(p) = q}} (-1)^{\sigma} \prod_{i \in \{1, 2, \dots, n\}} a_{i, \sigma(i)} \\ &= \sum_{q \in \{1, 2, \dots, n\}} \sum_{\substack{\sigma \in S_n; \\ \sigma(p) = q}} (-1)^{\sigma} a_{p, \sigma(p)} \prod_{\substack{i \in \{1, 2, \dots, n\}; \\ i \neq p}} a_{i, \sigma(i)} \\ &= \sum_{q = 1}^{n} \sum_{\substack{\sigma \in S_n; \\ \sigma(p) = q}} (-1)^{\sigma} a_{p, q} \prod_{\substack{i \in \{1, 2, \dots, n\}; \\ i \neq p}} a_{i, \sigma(i)} \\ &= \sum_{q = 1}^{n} a_{p, q} \sum_{\substack{\sigma \in S_n; \\ \sigma(p) = q}} (-1)^{\sigma} \prod_{\substack{i \in \{1, 2, \dots, n\}; \\ i \neq p}} a_{i, \sigma(i)} \\ &= \sum_{q = 1}^{n} a_{p, q} (-1)^{p + q} \operatorname{det} (A_{\sim p, \sim q}) = \sum_{q = 1}^{n} (-1)^{p + q} a_{p, q} \operatorname{det} (A_{\sim p, \sim q}) \end{aligned} $$
This proves Laplace expansion (a).
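A numerical check of the expansion along a single row (assuming NumPy and 0-based indices, whose parity of $ p + q $ matches the 1-based one; $ A_{\sim p, \sim q} $ is obtained by deleting row $ p $ and column $ q $):

```python
import numpy as np

rng = np.random.default_rng(12)
n, p = 5, 2                                   # expand along row p (0-based)
A = rng.standard_normal((n, n))

def minor(A, p, q):
    return np.delete(np.delete(A, p, axis=0), q, axis=1)   # drop row p, column q

expansion = sum((-1) ** (p + q) * A[p, q] * np.linalg.det(minor(A, p, q))
                for q in range(n))
assert np.isclose(np.linalg.det(A), expansion)
```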
Laplace expansion in multiple rows/columns
Let $ n \in \mathbf{N} $. Let $ A \in \mathbf{F}^{n \times n} $. For any subset $ I $ of $ \{1, 2, \dots, n\} $, we let $ \tilde{I} $ denote the complement $ \{1, 2, \dots, n\} \setminus I $ of $ I $.
(a) For every subset $ P $ of $ \{1, 2, \dots, n\} $, we have
$$ \operatorname{det} A = \sum_{\substack{Q \subseteq \{1, 2, \dots, n\}; \\ |Q| = |P|}} (-1)^{\sum P + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} A). $$
(b) For every subset $ Q $ of $ \{1, 2, \dots, n\} $, we have
$$ \operatorname{det} A = \sum_{\substack{P \subseteq \{1, 2, \dots, n\}; \\ |P| = |Q|}} (-1)^{\sum P + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} A). $$
Proof
The proof I provide is based on 6.22. Laplace expansion in multiple rows/columns, in Notes on the combinatorialfundamentals of algebra, Darij Grinberg.
Lemma 0.6.4
Let $ n \in \mathbf{N} $. For any subset $ I $ of $ \{1, 2, \dots, n\} $, we let $ \tilde{I} $ denote the complement $ \{1, 2, \dots, n\} \setminus I $ of $ I $.
Let $ A = (a_{i,j})^{1 \leq i \leq n, 1 \leq j \leq n} $ and $ B = (b_{i,j})^{1 \leq i \leq n, 1 \leq j \leq n} $ be two $ n \times n $ matrices. Let $ Q $ be a subset of $ \{1, 2, \dots, n\} $. Let $ k = |Q| $. Then,
$$ \sum_{\substack{\sigma \in S_n; \\ \sigma(\{1, 2, \dots, k\}) = Q}} (-1)^{\sigma} (\prod_{i \in \{1, 2, \dots, k\}} a_{i, \sigma(i)}) (\prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{i, \sigma(i)}) \\ = (-1)^{(1 + 2 + \dots + k) + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ (1, 2, \dots, k)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ (k + 1, k + 2, \dots, n)} B). $$
Proof of Lemma 0.6.4
We begin with some simple observations. The definition of $ \tilde{Q} $ yields $ \tilde{Q} = \{1, 2, \dots, n\} \setminus Q $. Since $ Q $ is a subset of $ \{1, 2, \dots, n\} $, this leads to $ |\tilde{Q}| = |\{1, 2, \dots, n\}| - |Q| = n - k $. Thus, $ n - k = |\tilde{Q}| \geq 0 $, so that $ n \geq k $ and thus $ k \in \{0, 1, \dots, n\} $.
Let $ (q_1, q_2, \dots, q_k) $ be the list of all elements of $ Q $ in increasing order (with no repetitions). Let $ (r_1, r_2, \dots, r_{n−k}) $ be the list of all elements of $ \tilde{Q} $ in increasing order (with no repetitions).
The lists $ w(Q) $ and $ (q_1, q_2, \dots, q_k) $ must be identical. In other words, we have $ w(Q) = (q_1, q_2, \dots, q_k) $. Similarly, $ w(\tilde{Q}) = (r_1, r_2, \dots, r_{n−k}) $.
For every $ \alpha \in S_k $ and $ \beta \in S_{n−k} $, we can define an element $ \sigma_{Q, \alpha, \beta} \in S_n $ according to Exercise 5.14 (a) in Notes. Consider this $ \sigma_{Q, \alpha, \beta} $. Exercise 5.14 (b) in Notes shows that for every $ \alpha \in S_k $ and $ \beta \in S_{n−k} $, we have
$$ \ell (\sigma_{Q, \alpha, \beta}) = \ell (\alpha) + \ell (\beta) + \sum Q - (1 + 2 + \dots + k) $$
and
$$ (-1)^{\sigma_{Q, \alpha, \beta}} = (-1)^{\alpha} \cdot (-1)^{\beta} \cdot (-1)^{\sum Q - (1 + 2 + \dots + k)}. $$
Exercise 5.14 (c) in Notes shows that the map
$$ S_k \times S_{n−k} \to \{\tau \in S_n | \tau(\{1, 2, \dots, k\}) = Q\}, \\ (\alpha, \beta) \mapsto \sigma_{Q, \alpha, \beta} $$
is well-defined and a bijection.
We have $ w(Q) = (q_1, q_2, \dots, q_k) $. Thus,
$$ \begin{aligned} \operatorname{sub} \substack{w(Q) \\ (1, 2, \dots, k)} A &= \operatorname{sub} \substack{(q_1, q_2, \dots, q_k) \\ (1, 2, \dots, k)} A = \operatorname{sub} \substack{q_1, q_2, \dots, q_k \\ 1, 2, \dots, k} A = (a_{x, q_y})^{1 \leq x \leq k, 1 \leq y \leq k} \\ &= (a_{i, q_j})^{1 \leq i \leq k, 1 \leq j \leq k} \end{aligned} $$
Thus,
$$ \begin{aligned} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ (1, 2, \dots, k)} A) &= \sum_{\sigma \in S_k} (-1)^{\sigma} \prod_{i = 1}^{k} a_{i, q_{\sigma(i)}} \\ &= \sum_{\alpha \in S_k} (-1)^{\alpha} \prod_{i = 1}^{k} a_{i, q_{\alpha(i)}} \\ \end{aligned} $$
Also, $ w(\tilde{Q}) = (r_1, r_2, \dots, r_{n−k}) $. Thus,
$$ \begin{aligned} \operatorname{sub} \substack{w(\tilde{Q}) \\ (k + 1, k + 2, \dots, n)} B &= \operatorname{sub} \substack{(r_1, r_2, \dots, r_{n−k}) \\ (k + 1, k + 2, \dots, n)} B = \operatorname{sub} \substack{r_1, r_2, \dots, r_{n−k} \\ k + 1, k + 2, \dots, n} B = (b_{k + x, r_y})^{1 \leq x \leq n - k, 1 \leq y \leq n - k} \\ &= (b_{k + i, r_j})^{1 \leq i \leq n - k, 1 \leq j \leq n - k} \end{aligned} $$
Thus,
$$ \begin{aligned} \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ (k + 1, k + 2, \dots, n)} B) &= \sum_{\sigma \in S_{n−k}} (-1)^{\sigma} \prod_{i = 1}^{n - k} b_{k + i, r_{\sigma(i)}} \\ &= \sum_{\beta \in S_{n−k}} (-1)^{\beta} \prod_{i = 1}^{n - k} b_{k + i, r_{\beta(i)}} \\ \end{aligned} $$
Now, we claim the following: For any $ \alpha \in S_k $ and $ \beta \in S_{n−k} $, we have
$$ \prod_{i \in \{1, 2, \dots, k\}} a_{i, \sigma_{Q, \alpha, \beta}(i)} = \prod_{i = 1}^{k} a_{i, q_{\alpha(i)}} $$
and
$$ \prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{i, \sigma_{Q, \alpha, \beta}(i)} = \prod_{i = 1}^{n - k} b_{k + i, r_{\beta(i)}}. $$
Now,
$$ \begin{aligned} & \sum_{\substack{\sigma \in S_n; \\ \sigma(\{1, 2, \dots, k\}) = Q}} (-1)^{\sigma} (\prod_{i \in \{1, 2, \dots, k\}} a_{i, \sigma(i)}) (\prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{i, \sigma(i)}) \\ &= \sum_{\sigma \in \{\tau \in S_n | \tau(\{1, 2, \dots, k\}) = Q \}} (-1)^{\sigma} (\prod_{i \in \{1, 2, \dots, k\}} a_{i, \sigma(i)}) (\prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{i, \sigma(i)}) \\ &= \sum_{(\alpha, \beta) \in S_k \times S_{n−k}} (-1)^{\sigma_{Q, \alpha, \beta}} (\prod_{i \in \{1, 2, \dots, k\}} a_{i, \sigma_{Q, \alpha, \beta}(i)}) (\prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{i, \sigma_{Q, \alpha, \beta}(i)}) \\ & (\text{here, we have substituted } \sigma_{Q, \alpha, \beta} \text{ for } \sigma \text{ in the sum, since the map } S_k \times S_{n−k} \to {\tau \in S_n | \tau(\{1, 2, \dots, k\}) = Q} \text{, } (\alpha, \beta) \mapsto \sigma_{Q, \alpha, \beta} \text{ is a bijection}) \\ &= \sum_{\alpha \in S_k} \sum_{\beta \in S_{n−k}} (-1)^{\sigma_{Q, \alpha, \beta}} (\prod_{i = 1}^{k} a_{i, q_{\alpha(i)}}) (\prod_{i = 1}^{n - k} b_{k + i, r_{\beta(i)}}) \\ &= \sum_{\alpha \in S_k} \sum_{\beta \in S_{n−k}} (-1)^{\alpha} \cdot (-1)^{\beta} \cdot (-1)^{\sum Q - (1 + 2 + \dots + k)} (\prod_{i = 1}^{k} a_{i, q_{\alpha(i)}}) (\prod_{i = 1}^{n - k} b_{k + i, r_{\beta(i)}}) \\ &= (-1)^{\sum Q - (1 + 2 + \dots + k)} (\sum_{\alpha \in S_k} (-1)^{\alpha} (\prod_{i = 1}^{k} a_{i, q_{\alpha(i)}})) (\sum_{\beta \in S_{n−k}} (-1)^{\beta} (\prod_{i = 1}^{n - k} b_{k + i, r_{\beta(i)}})) \\ &= (-1)^{(1 + 2 + \dots + k) + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ (1, 2, \dots, k)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ (k + 1, k + 2, \dots, n)} B) . \end{aligned} $$
This proves Lemma 0.6.4.
Lemma 0.6.5
Let $ n \in \mathbf{N} $. Let $ \gamma \in S_n $. Then, the map $ S_n \to S_n $, $ \sigma \to \sigma \circ \gamma $ is a bijection.
Lemma 0.6.6
Let $ n \in \mathbf{N} $. For any subset $ I $ of $ \{1, 2, \dots, n\} $, we let $ \tilde{I} $ denote the complement $ \{1, 2, \dots, n\} \setminus I $ of $ I $. Let $ I $ be a subset of $ \{1, 2, \dots, n\} $. Let $ k = |I| $. Then, there exists a $ \sigma \in S_n $ satisfying $ (\sigma (1), \sigma (2), \dots, \sigma (k)) = w(I) $, $ (\sigma (k + 1), \sigma (k + 2), \dots, \sigma(n)) = w(\tilde{I}) $, and $ (−1)^{\sigma} = (−1)^{\sum I − (1 + 2 + \dots + k)} $.
Lemma 0.6.7
Let $ n \in \mathbf{N} $. For any subset $ I $ of $ \{1, 2, \dots, n\} $, we let $ \tilde{I} $ denote the complement $ \{1, 2, \dots, n\} \setminus I $ of $ I $.
Let $ A = (a_{i, j})^{1 \leq i \leq n, 1 \leq j \leq n} $ and $ B = (b_{i, j})^{1 \leq i \leq n, 1 \leq j \leq n} $ be two $ n \times n $ matrices. Let $ P $ and $ Q $ be two subsets of $ \{1, 2, \dots, n\} $ such that $ |P| = |Q| $. Then,
$$ \begin{aligned} & \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i \in P} a_{i, \sigma(i)}) (\prod_{i \in \tilde{P}} b_{i, \sigma(i)}) \\ &= (-1)^{\sum P + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} B). \end{aligned} $$
Proof of Lemma 0.6.7
Define $ k \in \mathbf{N} $ by $ k = |P| = |Q| $.
The definition of $ \tilde{P} $ yields $ \tilde{P} = \{1, 2, \dots, n\} \setminus P $. $ |\tilde{P}| = n - k $.
Let $ (p_1, p_2, \dots, p_k) $ be the list of all elements of $ P $ in increasing order (with no repetitions).
Let $ (r_1, r_2, \dots, r_{n−k}) $ be the list of all elements of $ \tilde{P} $ in increasing order (with no repetitions).
Lemma 0.6.6 shows that there exists a $ \sigma \in S_n $ satisfying $ (\sigma (1), \sigma (2), \dots, \sigma (k)) = w(P) $, $ (\sigma (k + 1), \sigma (k + 2), \dots, \sigma(n)) = w(\tilde{P}) $, and $ (−1)^{\sigma} = (−1)^{\sum P − (1 + 2 + \dots + k)} $. Denote this $ \sigma $ by $ \gamma $.
Notice that
$$ \begin{aligned} (-1)^{\sum P - (1 + 2 + \dots + k)} (-1)^{\gamma} &= (-1)^{\sum P - (1 + 2 + \dots + k)} (-1)^{\sum P - (1 + 2 + \dots + k)} \\ &= 1 \end{aligned} $$
Define an $ n \times n $ matrix $ A^{\prime} $ by $ A^{\prime} = a_{\gamma (i), j}^{1 \leq i \leq n, 1 \leq j \leq n} $. Define an $ n \times n $ matrix $ B^{\prime} $ by $ B^{\prime} = b_{\gamma(i), j}^{1 \leq i \leq n, 1 \leq j \leq n} $. Then,
$$ \operatorname{sub} \substack{w(Q) \\ w(P)} A = \operatorname{sub} \substack{w(Q) \\ (1, 2, \dots, k)} (A^{\prime}) $$
and
$$ \operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} B = \operatorname{sub} \substack{w(\tilde{Q}) \\ (k + 1, k + 2, \dots, n)} (B^{\prime}) . $$
Lemma 0.6.5 shows that the map $ S_n \to S_n $, $ \sigma \to \sigma \circ \gamma $ is a bijection.
But $ \gamma (\{1, 2, \dots, k\}) = P $. Hence, every $ \sigma \in S_n $ satisfies
$$ (\sigma \circ \gamma) (\{1, 2, \dots, k\}) = \sigma (\gamma (\{1, 2, \dots, k\})) = \sigma (P) . $$
Furthermore, every $ \sigma \in S_n $ satisfies
$$ \prod_{i \in \{1, 2, \dots, k\}} a_{\gamma (i), (\sigma \circ \gamma) (i)} = \prod_{i \in P} a_{i, \sigma(i)} $$
and
$$ \prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{\gamma (i), (\sigma \circ \gamma) (i)} = \prod_{i \in \tilde{P}} b_{i, \sigma(i)}. $$
Now, Lemma 0.6.4 yields
$$ \begin{aligned} & \sum_{\substack{\sigma \in S_n; \\ \sigma(\{1, 2, \dots, k\}) = Q}} (-1)^{\sigma} (\prod_{i \in \{1, 2, \dots, k\}} a_{\gamma (i), \sigma(i)}) (\prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{\gamma (i), \sigma(i)}) \\ &= \sum_{\substack{\sigma \in S_n; \\ \sigma(\{1, 2, \dots, k\}) = Q}} (-1)^{\sigma} (\prod_{i \in \{1, 2, \dots, k\}} a_{i, \sigma(i)}^{\prime})(\prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{i, \sigma(i)}^{\prime}) \\ &= (-1)^{(1 + 2 + \dots + k) + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ \{1, 2, \dots, k\}} A^{\prime}) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ \{k + 1, k + 2, \dots, n\}} B^{\prime}) \\ &= (-1)^{(1 + 2 + \dots + k) + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} B). \end{aligned} $$
Comparing this with
$$ \begin{aligned} & \sum_{\substack{\sigma \in S_n; \\ \sigma(\{1, 2, \dots, k\}) = Q}} (-1)^{\sigma} (\prod_{i \in \{1, 2, \dots, k\}} a_{\gamma (i), \sigma(i)}) (\prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{\gamma (i), \sigma(i)}) \\ &= \sum_{\substack{\sigma \in S_n; \\ (\sigma \circ \gamma)(\{1, 2, \dots, k\}) = Q}} (-1)^{\sigma \circ \gamma} (\prod_{i \in \{1, 2, \dots, k\}} a_{\gamma (i), (\sigma \circ \gamma)(i)}) (\prod_{i \in \{k + 1, k + 2, \dots, n\}} b_{\gamma (i), (\sigma \circ \gamma)(i)}) \\ & (\text{here, we have substituted } \sigma \circ \gamma \text{ for } \sigma \text{ in the sum, since the map } S_n \to S_n \text{, } \sigma \mapsto \sigma \circ \gamma \text{ is a bijection}) \\ &= \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} \cdot (-1)^{\gamma} (\prod_{i \in P} a_{i, \sigma(i)})(\prod_{i \in \tilde{P}} b_{i, \sigma(i)}) \\ &= (-1)^{\gamma} \cdot \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i \in P} a_{i, \sigma(i)})(\prod_{i \in \tilde{P}} b_{i, \sigma(i)}) , \end{aligned} $$
we obtain
$$ \begin{aligned} & (-1)^{(1 + 2 + \dots + k) + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} B) \\ &= (-1)^{\gamma} \cdot \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i \in P} a_{i, \sigma(i)})(\prod_{i \in \tilde{P}} b_{i, \sigma(i)}) . \end{aligned} $$
Multiplying both sides of this equality by $ (-1)^{\sum P - (1 + 2 + \dots + k)} $, we obtain
$$ \begin{aligned} & (-1)^{\sum P - (1 + 2 + \dots + k)} (-1)^{(1 + 2 + \dots + k) + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} B) \\ &= (-1)^{\sum P - (1 + 2 + \dots + k)} (-1)^{\gamma} \cdot \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i \in P} a_{i, \sigma(i)})(\prod_{i \in \tilde{P}} b_{i, \sigma(i)}) \\ & (\text{here, we use } (-1)^{\gamma} = (-1)^{\sum P - (1 + 2 + \dots + k)} \text{, so the two sign factors multiply to } 1) \\ &= \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i \in P} a_{i, \sigma(i)})(\prod_{i \in \tilde{P}} b_{i, \sigma(i)}) \end{aligned} $$
Thus,
$$ \begin{aligned} & \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i \in P} a_{i, \sigma(i)})(\prod_{i \in \tilde{P}} b_{i, \sigma(i)}) \\ &= (-1)^{\sum P - (1 + 2 + \dots + k)} (-1)^{(1 + 2 + \dots + k) + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} B) \\ &= (-1)^{\sum P + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} B) \end{aligned} $$
This proves Lemma 0.6.7.
For the sake of convenience, let us restate a simplified particular case of Lemma 0.6.7 for $ A = B $:
Lemma 0.6.8
Let $ n \in \mathbf{N} $. For any subset $ I $ of $ \{1, 2, \dots, n\} $, we let $ \tilde{I} $ denote the complement $ \{1, 2, \dots, n\} \setminus I $ of $ I $.
Let $ A = a_{i, j}^{1 \leq i \leq n, 1 \leq j \leq n} $ be an $ n \times n $ matrix. Let $ P $ and $ Q $ be two subsets of $ \{1, 2, \dots, n\} $ such that $ |P| = |Q| $. Then,
$$ \begin{aligned} & \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) \\ &= (-1)^{\sum P + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} A). \end{aligned} $$
Proof of Lemma 0.6.8
Every $ \sigma \in S_n $ satisfies
$$ \prod_{i = 1}^{n} a_{i, \sigma(i)} = (\prod_{i \in P} a_{i, \sigma(i)}) (\prod_{i \in \tilde{P}} a_{i, \sigma(i)}) . $$
Now,
$$ \begin{aligned} & \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) \\ &= \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i \in P} a_{i, \sigma(i)}) (\prod_{i \in \tilde{P}} a_{i, \sigma(i)}) \\ &= (-1)^{\sum P + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} A). \end{aligned} $$
This proves Lemma 0.6.8.
We can now step to the proof of Laplace expansion in multiple rows/columns (a):
Write the $ n \times n $ matrix $ A $ in the form $ A = a_{i,j}^{1 \leq i \leq n, 1 \leq j \leq n} $. If $ P $ and $ Q $ are two subsets of $ \{1, 2, \dots, n\} $ satisfying $ |Q| \neq |P| $, then
$$ \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) = 0 . $$
Indeed, fix some $ \sigma \in S_n $ satisfying $ \sigma (P) = Q $. The map $ \sigma: \{1, 2, \dots, n\} \to \{1, 2, \dots, n\} $ is injective, so the set $ \sigma (P) $ is a subset of $ \{1, 2, \dots, n\} $ satisfying $ |\sigma (P)| = |P| $. Hence, $ |P| = |\sigma (P)| = |Q| \neq |P| $. This is absurd. Hence, we have found a contradiction.
Now, forget that we fixed $ \sigma $. We thus have found a contradiction for every $ \sigma \in S_n $ satisfying $ \sigma (P) = Q $. Thus, there exists no $ \sigma \in S_n $ satisfying $ \sigma (P) = Q $. Hence, the sum $ \sum_{\substack{\sigma \in S_n; \\ \sigma (P) = Q}} (−1)^{\sigma} \prod_{i = 1}^{n} a_{i, \sigma (i)} $ is an empty sum. Thus
$$ \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) = (\text{empty sum}) = 0 . $$
Let $ P $ be a subset of $ \{1, 2, \dots, n\} $. Then,
$$ \begin{aligned} \operatorname{det} A &= \sum_{\sigma \in S_n} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) = \sum_{Q \subseteq \{1, 2, \dots, n\}} \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) \\ &= \sum_{\substack{Q \subseteq \{1, 2, \dots, n\}; \\ |Q| = |P|}} \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) + \sum_{\substack{Q \subseteq \{1, 2, \dots, n\}; \\ |Q| \neq |P|}} \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) \\ & (\text{since every subset } Q \text{ of } \{1, 2, \dots, n\} \text{ satisfies either } |Q| = |P| \text{ or } |Q| \neq |P|. \text{ When } |Q| \neq |P| \text{, this is empty sum.}) \\ &= \sum_{\substack{Q \subseteq \{1, 2, \dots, n\}; \\ |Q| = |P|}} \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) + \sum_{\substack{Q \subseteq \{1, 2, \dots, n\}; \\ |Q| \neq |P|}} 0 \\ &= \sum_{\substack{Q \subseteq \{1, 2, \dots, n\}; \\ |Q| = |P|}} \sum_{\substack{\sigma \in S_n; \\ \sigma(P) = Q}} (-1)^{\sigma} (\prod_{i = 1}^{n} a_{i, \sigma(i)}) \\ &= \sum_{\substack{Q \subseteq \{1, 2, \dots, n\}; \\ |Q| = |P|}} (-1)^{\sum P + \sum Q} \operatorname{det} (\operatorname{sub} \substack{w(Q) \\ w(P)} A) \operatorname{det} (\operatorname{sub} \substack{w(\tilde{Q}) \\ w(\tilde{P})} A) . \end{aligned} $$
This proves Laplace expansion in multiple rows/columns (a).
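A numerical check of part (a) (assuming NumPy; index sets are 0-based here, which leaves the signs unchanged because converting to 1-based indices shifts $ \sum P + \sum Q $ by the even amount $ 2|P| $):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(13)
n = 5
A = rng.standard_normal((n, n))
P = [0, 2]                                       # a fixed row set (0-based)

total = 0.0
for Q in combinations(range(n), len(P)):
    Pc = [i for i in range(n) if i not in P]     # complement of P
    Qc = [j for j in range(n) if j not in Q]     # complement of Q
    sign = (-1) ** (sum(P) + sum(Q))
    total += (sign * np.linalg.det(A[np.ix_(P, list(Q))])
                   * np.linalg.det(A[np.ix_(Pc, Qc)]))
assert np.isclose(np.linalg.det(A), total)
```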