拉格朗日乘数法复习

发表于 2025-05-04

拉格朗日乘数法复习

$\hspace{1cm}$ 给定一个多元函数$f(\mathbf{x})$, 和若干个限制：$g_1(\mathbf{x}) = g_2(\mathbf{x}) = ... = g_m(\mathbf{x}) = 0$, 求$f(\mathbf{x})$的最小值。这里的$\mathbf{x}$是一个向量。$\mathbf{x} = \{x_1,x_2,...x_n\}$

$\hspace{1cm}$ 设$L(\mathbf{x}, \lambda_1, \lambda_2, ..., \lambda_m) = f(\mathbf{x}) - \lambda_1g_1(\mathbf{x}) - ... - \lambda_mg_m(\mathbf{x})$,若$\frac{\partial L}{\partial \mathbf{x}} = \mathbf{0}$，$\forall i \in [1,m],\frac{\partial L}{\partial \lambda_i} = 0$, 则此时的$\mathbf{x}$是一个候选的极小值点（极小值点必定在这样的一个$\mathbf{x}$中）

证明：

$\hspace{1cm}$ 考虑$\mathbf{x} = \mathbf{x_0}$时，此时$\mathbf{x}$想要有任何的微小位移$\Delta \mathbf{x}$,都必须满足： \[ \Delta \mathbf{x} \cdot \nabla g_i(\mathbf{x_0}) = 0 \ \ \ \forall i \in[1,m] \] $\hspace{1cm}$ 我们称任意一个满足上式的微小位移$\Delta \mathbf{x}$为合法微小位移。

$\hspace{1cm}$ 若$\mathbf{x_0}$还是一个最小值点，那么对于任意一个合法微小位移，它肯定需要满足： \[ \Delta \mathbf{x} \cdot \nabla f(\mathbf{x_0}) = 0 \] $\hspace{1cm}$ 因为对于任意一个合法微小位移都需要满足，所以显然$\nabla f(\mathbf{x_0})$必定在$\nabla g_i(\mathbf{x_0})$张成的线性空间里。这就对应：$\frac{\partial L}{\partial \mathbf{x}} = \mathbf{0}$的这个条件。至于$\forall i \in [1,m],\frac{\partial L}{\partial \lambda_i} = 0$，这是为了满足约束条件而显然必须的条件。

Hello World

发表于 2025-01-27

Welcome to Hexo! This is your very first post. Check documentation for more info. If you get any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub.

Quick Start

Create a new post

1	$ hexo new "My New Post"

More info: Writing

Run server

1	$ hexo server

More info: Server

Generate static files

1	$ hexo generate

More info: Generating

Deploy to remote sites

1	$ hexo deploy

More info: Deployment

概率论

发表于 2025-01-01 更新于 2025-02-06

概率论

1. 泊松分布

$\hspace{1cm}$ 定义为$P(x=k) = \frac{\lambda ^k}{k!}e^{-\lambda}$，$\lambda > 0$，也可以记作$X$ ~ $P(\lambda)$

$\hspace{1cm}$ 对于一个二项分布$B(n,p)$，如果$n$比较大，$np$适中（$n \ge 100， np \le 10$）那么可以把二项分布$B(n,p)$近似为$P(np)$

2. 超几何分布

$\hspace{1cm}$ 定义为$H(N,M,n)$，表示从有限$N$个物品中（其中包含$M$个指定种类的物件）中抽出$n$个物件，成功抽出该指定种类的物件的次数（不放回）

3. 均匀分布

$\hspace{1cm}$ 即在$[a,b]$范围内等概率分布的情况，记作$X$ ~ $U(a,b)$

4. 指数分布

$\hspace{1cm}$ 其概率密度函数为 \[ f(x) = \begin{cases}\lambda e^{-\lambda x}\ \ \ \ x > 0\\ 0 \ \ \ \ \ \ \ \ \ \ \ \ x \le 0 \end{cases} \] $\hspace{1cm}$ 记作$X$ ~ $Exp(\lambda)$

5. 几何分布

$\hspace{1cm}$ 定义为$P(X = k) = (1 - p)^{k - 1} \times p$，也可记作$X$ ~ $G(p)$，表示第$k$次刚好第一次发生概率为$p$的事

6. 正态分布

$\hspace{1cm}$ 其密度函数记为 \[ \varphi(x) = \frac 1{\sqrt{2\pi} \sigma}e^{-\frac{(x - \mu)^2}{2\sigma^2}} \] $\hspace{1cm}$ 记作$X$ ~ $N(\mu, \sigma^2)$

$\hspace{1cm}$ 其分布函数记为$\Phi(x) = \int_{-\infty}^x \phi(t) dt$

$\hspace{1cm}$ 当$\mu = 0, \sigma = 1$时，认为是标准正态分布，其概率密度函数记为$\varphi_0(x)$，分布函数记为$\Phi_0(x)$

$\hspace{1cm}$ 对于任何任意一个$\varphi(x)$，有$\varphi(x) = \frac 1 \sigma \varphi_0(\frac {x - \mu}{\sigma})$，以及$\Phi(x) = \Phi_0(\frac{x - \mu}{\sigma})$

多维随机变量相关

1.多维随机变量的联合分布函数，以及边缘分布函数（对于离散型和连续型均适用）

$\hspace{1cm}$ 仿照单元模式，我们有如下定义：$F(x,y) = P(X \le x, Y \le y)$ 为$(X,Y)$的联合分布函数

$\hspace{1cm}$ 根据定义，容易证明

\[ P(x_1 < X \le x_2, y_1 < Y \le y_2) = F(x_2,y_2) - F(x_1,y_2) - F(x_2,y_1) + F(x_1,y_1) \]

$\hspace{1cm}$ 注意到$F(x,y)$有如下性质：

\[ F(-\infty,y) = F(x, -\infty) = 0 \]

\[ F(-\infty,-\infty) = 0, F(+\infty,+\infty) = 1 \]

$\hspace{1cm}$ 定义边缘分布函数：称$F_X(x),F_Y(y)$分别为$(X,Y)$关于$X$和$Y$的边缘分布函数，其中$F_X(x) = F(x,+\infty)$，$F_Y(y) = F(y, +\infty)$。

2. 多维随机变量的联合概率密度以及边缘概率密度（连续型）

$\hspace{1cm}$ 若存在二元函数$f(x,y)$使得$\int_{-\infty}^x \int_{-\infty}^y f(u,v)dudv = F(x,y)$，则称$f(x,y)$为联合概率密度

$\hspace{1cm}$ 定义边缘概率密度函数：称$f_X(x), f_Y(y)$分别为$(X,Y)$关于$X$和$Y$的边缘密度函数，其中$f_X(x) = \int_{-\infty}^{+\infty}f(x,y)dy$, $f_Y(y) = \int_{-\infty}^{+\infty}f(x,y)dx$

3. 多维随机变量的联合分布列以及边缘分布列（离散型）

$\hspace{1cm}$ 若$(X,Y)$均是离散的，那么定义$p_{ij} =P(X = x_i, Y = y_i)$为联合分布列

$\hspace{1cm}$ 同时定义$p_{i·} = \sum_j p_{ij}$为$(X,Y)$关于$X$的边缘分布列，$p_{·j} =\sum _i p_{ij}$ 为$(X,Y)$关于$Y$的边缘分布列

4. 多维变量的分布

$\hspace{1cm}$ 设$Z = \frac XY$，且$f(x,y)$表示变量$X,Y$的联合概率密度函数，则有： \[ f_Z(z) = \int_{-\infty}^{+\infty} |y| f(yz,y)dy \]

期望，方差，协方差

常见分布的期望和方差：

类型	E:期望	D：方差
$U(a,b)$	$\frac {a + b}2$	$\frac{(b - a)^2}{12}$
$P(\lambda)$	$\lambda$	$\lambda$
$E(\lambda)$	$\frac 1\lambda$	$\frac 1{\lambda^2}$
$G(p)$	$\frac 1p$	$\frac q{p^2}$
$N(\mu, \sigma^2)$	$\mu$	$\sigma^2$
$B(n,p)$	$np$	$npq$

注：$E(\lambda)$为指数分布，$G(p)$为几何分布，$q = 1 - p$

切比雪夫不等式：对于一切分布$X$，有： \[ P(|X - EX| > \varepsilon) \le \frac {DX}{\varepsilon^2} \]
注意：如果随机变量$X,Y$满足$E(XY) = E(X)E(Y)$那么不能说明$X$和$Y$是独立的！
对于任意随机变量$X_1,X_2,...,X_n$以及任意常数$k_0,k_1,...,k_n$，我们有： \[ D(k_0 + k_1X_1+ k_2X_2 + ... + k_nX_n) = \sum_{i = 1}^n{k_i}^2DX_i + 2 \sum_{1 \le i < j \le n}k_ik_jCov(X_i,X_j) \]
Cauchy-schwarz inequation: \[ EX^2 \times EY^2 \ge (E[XY])^2 \]
最小二乘法拟合：对于随机变量$X,Y$，尝试用$y = \hat{a}x + \hat{b}$拟合，那么最好的拟合直线中: \[ \hat{a} = \rho_{XY} \sqrt{\frac {DY}{DX} }, \hat{b} = EY - \hat{a}EX \]
若$(X,Y)$服从二维正态分布$N(\mu_1,\mu_2,{\sigma_1}^2,{\sigma_2}^2,\rho)$，那么$Cov(X,Y) = \sigma_1\sigma_2\rho$
条件期望的概念：书p69开始，注意：$E(X|Y)$是一个关于$Y$的函数，也是一个随机变量，而$E(X|Y = y)$是一个常数。全期望公式： \[ E(X) = E[E(X|Y)] \]

数理统计部分

样本方差的定义：设有$n$个样本$X_1,X_2,...,X_n$，则定义样本方差 \[ S^2 = \frac 1{n - 1} \sum_{i = 1}^n(X_i - \bar{X})^2 \] 其中： \[ \bar{X} = \frac 1n \sum_{i = 1}^nX_i \]
设$n$个样本都是独立同分布的，且每个样本作为随机变量均值为$\mu$,方差为$\sigma^2$，则： \[ ES^2 =E(\frac 1{n - 1} \sum_{i = 1}^n(X_i - \bar{X})^2) = \frac 1{n - 1} \sum_{i = 1}^n(E^2(X_i) - 2E(X_i\bar{X}) + E^2(\bar{x}))\\ = \frac 1{n - 1} \sum_{i = 1}^n(\mu^2 + \sigma^2 + \mu^2 + \frac 1n \sigma^2 - 2E(X_i\bar{X}))\\ \because E(X_i\bar{X}) = \frac 1nE(\sum_{1 \le j\le n, j \neq i}X_iX_j) + \frac 1nE({X_i}^2) = \frac 1n\sum_{1 \le j\le n, j \neq i}E(X_i)E(X_j) + \frac 1nE({X_i}^2)\\ = \mu^2 + \frac 1n \sigma^2\\ \therefore ES^2 = \frac 1{n - 1} \times n \times \frac{n - 1}n\sigma^2 = \sigma^2 \]
设$\chi^2$ ~ $\chi^2(n)$，则当$n$足够大时，$\sqrt{2\chi^2}$近似服从正态分布$N(\sqrt{2n-1},1)$
t分布：若$X$ ~ $N(0,1)$, $Y$ ~ $\chi^2(N)$，且$X$和$Y$相互独立，那么定义$T$ ~ $t(n) = \frac{X}{\sqrt{Y/n}}$为t分布.

当$n \to +\infty$时，$t(n)$ 可近似为 $N(0,1)$
F分布：若$X$ ~ $\chi^2(n_1)$， $Y$ ~ $\chi^2(n_2)$，且$X$和$Y$相互独立，那么定义$F(n_1,n_2) = \frac{X/{n_1}}{Y/_{n_2}}$为F分布

若$F$ ~ $F(n_1,n_2)$,则有$1F $ ~ $F(n_2,n_1)$。同时我们有：$F_{\alpha}(n_2,n_1) = \frac 1{F_{1 - \alpha}(n_1,n_2)}$
假设$X_1,X_2,...,X_n$均服从$N(\mu,\sigma^2)$且相互独立，设$\bar{X} = \frac 1n \sum_{i = 1}^n {X_i}$，$S^2 = \frac 1n \sum_{i = 1}^n(X_i - \bar{X})^2$，那么：
- $ _{i = 1}^n (X_i - )^2$ ~ $\chi^2(n)$
- $_{i = 1}^n (X_i - {X})^2 $ ~ $\chi^2 (n - 1)$ 这条等价为$(n - 1)\frac {S^2}{\sigma^2}$ ~ $\chi^2(n - 1)$
- $\frac{\bar{X} - \mu} {\sigma}\sqrt{n}$ ~ $N(0,1)$；
- $\frac{\bar{X} - \mu}{S}\sqrt{n}$ ~ $t(n - 1)$；（这里的S表示标准差，即$\sqrt{S^2}$，同时我们可以比较上下发现，当$n \to \infty$时，$t(n - 1) \to N(0,1)$）
p96 推论6.3 考前看一下，难记
在正态分布中，不相关和独立是等价的，这是一条很强的结论
正态分布随机抽样得到$X_1,X_2,...,X_n$。那么有$S^2$与$\overline{X}$独立，且有$\tilde{S}$与$\overline{X}$独立。

参数估计部分

$\hspace{1cm}$ 这部分主要用于处理已经知道一系列样本，要用这些样板来估计分布函数里面的一些参数的问题。

矩估计：
- 若只有一个参数要估计，那只需要用一阶矩来估计即可；如果有n个参数要估计，那么用前n阶矩来估计
- 用第$k$阶矩来估计的方法：
  - 首先计算出总体的$k$阶原点矩，这等于$E(X^k)$，这应该是一个关于待估计参数的函数
  - 然后计算出样本的$k$阶原点矩，这等于$\frac 1n \sum_{i = 1}^n {X_i}^k$ ，这算出来是一个确定的数
  - 令样本的$k$阶原点矩等于总体的$k$阶原点矩，得到一个方程用来解待估计的参数
- 更加一般化地说，就是我们可以用样本的$k$阶原点矩作为总体的$k$阶原点矩。例如，我们知道样本的$1$阶原点矩是$a$，样本的$2$阶原点矩是$b$，现在要求样本的标准差的矩估计。
  
  那么，我们就可以认为$EX = a$, $EX^2 = b$，于是$D = EX^2 -(EX)^2 = b - a^2$，所以我们就认为$\sqrt{b - a^2}$是样本的标准差的矩估计。
- 我们有如下记号约定：对于一个样本$(X_1,X_2,...,X_n)$，令 $\tilde{S}^2 = \frac 1n \sum_{i = 1}^n (X_i - \overline{X})^2$
- 在矩估计法中，认为$DX = \tilde{S}^2$
  
  证明如下： \[ \tilde{S}^2 =\frac 1n \sum_{i = 1}^n (X_i - \overline{X})^2 \\ = \frac 1n \sum_{i = 1}^n{X_i}^2 - \frac 2n \sum_{i = 1}^n{X_i}\overline{X} + \overline{X}^2\\ = \frac 1n \sum_{i = 1}^n{X_i}^2 - 2\overline{X}^2 + \overline{X}^2 = \frac 1n \sum_{i = 1}^n{X_i}^2 - \overline{X}^2\\ = EX^2 - (EX)^2 = DX \]
最大似然估计

即为要让参数取该估计值时，样本发生的概率最大
设$\hat{\theta} = \hat{\theta}(X_1,X_2,...,X_n)$是参数$\theta$的估计量，若对于任意的$\theta \in \Theta$，都有： \[ E(\hat{\theta}) = \theta \] 那么称$\hat{\theta}$是$\theta$的无偏估计。

设总体方差$DX=\sigma^2$存在，那么$S^2 = \frac 1{n - 1} \sum_{i = 1}^n (X_i - \overline{X})^2$是$\sigma^2$的无偏估计，而$\tilde{S}^2 = \frac 1n \sum_{i = 1}^n(X_i - \overline{X})^2$是$\sigma^2$的有偏估计。

概率论好题整理

若$X,Y$独立，且$X$ ~ $P(\lambda_1)$,$Y$ ~ $P(\lambda_2)$。
1. 证明$X + Y$ ~ $P(\lambda_1 + \lambda_2)$; （2）求在已知$X + Y = m$的条件下$X$的分布
已知随机变量$X$和$Y$相互独立，且都服从$N(0,1)$，求$E[max\{X,Y\}]$
已知$X$ ~ $N(\mu, \sigma ^2)$，求$DX$

首先$EX = \mu$, 然后： \[ DX = E((X - EX)^2) = E((X - \mu)^2) = \int_{-\infty}^{+\infty} \frac 1 {\sqrt{2\pi}\sigma}e^{-\frac {(x - \mu)^2}{2\sigma ^2}}\times (x - \mu)^2 dx\\ = \int_{-\infty}^{+\infty} \frac 1 {\sqrt{2\pi}\sigma} e^{\frac{-x^2}{2\sigma^2} }\times x^2 dx = \int_{-\infty}^{+\infty} \frac 1 {\sqrt{2\pi}\sigma} e^{-(\frac x{\sqrt 2 \sigma})^2} \times (\frac x{\sqrt 2 \sigma})^2d(\frac {x}{\sqrt 2 \sigma}) \times 2\sqrt 2 \sigma ^3\\ = \frac{2\sigma ^2}{\sqrt{\pi}} \int_{-\infty}^{+\infty} e^{-x^2} x^2dx \]

考察积分$\int_{-\infty}^{+\infty}e^{-x^2}x^2dx$： \[ \int_{-\infty}^{+\infty} e^{-x^2}dx = \left. xe^{-x^2}\right|_{-\infty}^{+\infty} + 2\int_{-\infty}^{+\infty}e^{x^2}x^2 dx = \sqrt{\pi} \] 于是： \[ \int_{-\infty}^{+\infty}e^{x^2}x^2 dx =\frac {\sqrt \pi}2 \] 所以: \[ DX = \sigma^2 \]

已知$X$ ~ $N(0,\sigma^2)$, $Y$ ~ $N(0,\sigma ^2)$，求证：$X + Y$ ~ $N(0, 2\sigma^2)$， $X - Y$ ~ $N(0, 2\sigma^2)$。

超级强的一个结论：若$X$ ~ $N(\mu_1,{\sigma_1}^2)$，$Y$ ~ $N(\mu_2,{\sigma_2}^2)$，则$X + Y$ ~ $N(\mu_1+\mu_2，{\sigma_1}^2+{\sigma_2}^2 )$，

$X - Y$ ~ $N(\mu1 - \mu 2,{\sigma_1}^2+{\sigma_2}^2 )$

设随机变量$X$与$Y$相互独立，且都服从$N(0, \frac 12)$分布，求$E|X - Y|$以及$D|X-Y|$ \[ E|X-Y| = \sqrt{\frac 2 \pi}\ \ \ \ \ \ \ D|X-Y| = 1 - \frac 2 \pi \]
设随机变量$x$具有概率密度： \[ f(x) = \begin{cases} \frac{x^m}{m!} e^{-x}\ \ \ \ \ \ \ \ x \ge 0\\ 0 \ \ \ \ \ \ \ \ \ \ \ \ \ else\end{cases} \] 其中$m$为正整数。请证明：$P(0 \le X \le 2(m + 1)) \ge \frac m{m + 1}$

hint: 实际上这个概率密度函数和欧拉给出的 x! 的连续函数的被积函数很像
设随机变量$X,Y$独立，且分别服从参数为$\lambda$和$\mu$的泊松分布，求$E(X|X + Y = m)$

$ans:\frac{m\lambda}{\lambda + \mu}$
假设随机变量$X_1$在$[0,1]$上有均匀分布,$X_i$在$[X_{i-1},X_{i-1}+1]$上有均匀分布，其中$i = 2,3,...,n$，求$EX_n$

解：首先$E(X_i|X_{i - 1}) = X_{i - 1} + \frac 12$，根据全期望公式，有

\[ E(X_i) = E(E(X_i|X_{i - 1})) = E(x_{i - 1} + \frac 12) = EX_{i - 1} + \frac 12 \] 而$E_1 = \frac 12$，所以$E_n = \frac n2$
已知$X$ ~ $N(0,1)$，求$E(X^2),D(X^2)$ \[ E(X^2) = E^2(X) + D(X) = 1\\ E(X^4) = \int_{-\infty}^{+\infty}\frac 1{\sqrt{2\pi}}e^{-\frac{x^2}2}x^4dx\\ while\ \ \ \int_{-\infty}^{+\infty}e^{-x^2}x^4dx = \frac 34 \sqrt{\pi}\ \ (This \ \ is \ \ similar \ \ to \ \ calculating \ \ \int_{-\infty}^{+\infty}e^{-x^2}x^2 dx)\\ \therefore E(X^4) = 3\\ \therefore D(X^2) = E(X^4)-E^2(X^2) = 3 - 1 = 2 \]
设总体$B$ ~ $B(1,p)$,$p$为未知参数，$(X_1,X_2,X_3,...,X_n)$ 为来自总体$X$的样本，求$p$的极大似然估计

第一种做法是$L(\theta) = \prod(X_ip + (1-X_i)(1-p))$，这种后续取对然后求驻点是做不下去的

需要写成$L(\theta) = \prod(p^{X_i}(1 - p)^{1 - X_i})$，然后后续取对然后求驻点可以做
设总体$X$的密度函数为： \[ f(x) = \begin{cases}\frac 1 \theta e^{-\frac{x - \mu}\theta}\ \ \ \ x \ge \mu \\ 0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x < \mu \\ \end{cases} \] 其中$\theta > 0$，求未知参数$\mu, \theta$的矩估计量

注意到：$X - \mu$ ~ $E(\frac 1\theta)$，

所以：$E(X - \mu)$ = $\theta$, $D(X - \mu) = \theta^2$，所以：$EX = \theta + \mu, DX = \theta ^2$，所以：$\overline{X} = \theta + \mu, \tilde{S}^2 = \theta^2$ 所以：$\theta = \tilde{S}, \mu = \overline{X} - \tilde{S}$

若$X$与$Y$不相关，是否有$f(X)$与$g(Y)$不相关？
若$X_1,X_2,...,X_n$独立同分布，是否有$\overline{X}$与$S^2$不相关？

ICS Notes

发表于 2024-11-25 更新于 2025-01-28

ICS Notes

Carry Flag将两个操作数视作无符号数，无符号数加减溢出的时候Carry Flag会被置为1。在实际执行的加法中，如果两个操作数的符号位相等，但不等于结果的符号位，CF会被置为1，反之为0. 在减法中，如果不够减，那么CF被置为1.

CF (bit 0) Carry flag — Set if an arithmetic operation generates a carry or a borrow out of the most significant bit of the result; cleared otherwise.

Over Flag将两个操作数视为有符号数。加法时，若CF=1表示加法有进位。减法时，若CF=1表示不够减。

SHL 逻辑左移，每左移一次，把最高位移到CF里，并在低位补0
SHR 逻辑右移，每右移一次，把最低位移到CF里，并在高位补0
SAL 算术左移，和SHL一样，只不过若移位前后sign bit发生变化，那么OF会被置为1
SAR 算术右移，每右移一次，把最低位移到CF里，并在高位补符号位
ROL 循环左移，每左移一次，把最高位放到最低位，同时把最高位放到CF里
ROR 循环右移，每右移一次，把最低位放到最高位，同时把最低位放到CF里
RCL 把CF当成最高位加到原来的数左边，一起循环左移。
RCR 把CF当成最低位加到原来的数右边，一起循环右移。

JCXZ/JECXZ： jump if CX/ECX is 0
MUL
- 单操作数例如MUL A 意思是DX:AX = （AX * A)
- 双操作数例如MUL S, D 意思是 D *= S
- 三操作数例如MUL n,S,D 意思是 D = S * n, 这里D必须是寄存器
IMUL: 与MUL同理
机器码里面的操作数如果是立即数存储方法也是小端序的，例如e8 6e ff ff ff实际上的操作数是ff ff ff 6e

编译器方面的优化方式：

提高cache的命中率，例如把二位数组求和改成行优先
提高流水线的命中率，例如①把循环拆开，②采用cmov指令
提高单条指令的运行速度，例如把lea 0(%eax,%eax,3) %eax 改为 SAL $2,%eax
使用串指令，例如在把一个数组的全部赋为0的任务中，使用memset这一封装了串指令的命令
使用多线程，例如在把一个数组的全部赋为0的任务中，开多个线程分别对数组的各个部分赋0
使用SIMD：一条指令成组操作，节约了时间
变量与寄存器绑定

C语言方面的优化方式：

把*2改为<< 1，提高单调指令的效率
循环展开，提高流水线效率
二位数组行优先访问，提高cache命中率
使用memset等封装了串指令的命令
register int i
if （A && B）如果A满足概率90%, B 10%，那么写成if (B && A)
把递归改为迭代
用封装了SIMD的代码
多线程
减少重复计算

异常：CPU内部产生的错误；中断：由外部设备产生的“外部事件”

Question:

rela.data节具体是怎么存储信息的？之前在实验四用readelf看的时候没有看到过rela.data节

存储信息方式同rela.text;
软中断（int命令）是异常还是中断？

是异常
栈帧栈顶地址是什么意思？是存返回地址的那个地方是栈顶吗？

答：不是，是%esp的地址。栈帧的范围是从%ebp到%esp之间（闭区间）
我想要让代码跳到0x4011d6处执行，对应的汇编代码不能直接写jmp 0x4011d6，而是得写成

考试时可以这样写
1
2
mov $0x4011d6,%r10
jmp *%r10
这是为什么？

另外，如果我新建一个.s文件然后在里面写jmp 0x4011d6然后汇编，然后再反汇编，得到的反汇编结果是jmp 0x05,这是为什么？
一般来说，用switch效率是比用很多if要高吗？

不一定，按习题来
extern定义的变量名会不会出现在符号表里面（因为如果不用就不会）？这种变量是不是没有强弱符号的概念？
在同一个.o文件下，要么全是R_386_32重定位方式，要么全部都是R_386_PC32重定位方式？看起来不是：（
栈顶是低地址
b8 mov literal, a1 mov 直接寻址
外部变量不会在.data,.bss,.text节中

假设LP是一个标号，那么JMP LP等价的语句是：

LEA LP, %EAX

JMP *%EAX

字符之间的比较最好用ja,jb

.data 段里面定义的变量使用的时候可以替换为它的地址。

所有的符号（包括外部符号）只要被引用了就要重定位

如果$x$在C语言中是一个无符号数，那么x >>= 1翻译成汇编语言就会时shr

如果$x$在C语言中是一个有符号数，那么x >>= 1翻译成汇编语言就会时sar

基址加变址寻址：A(B,C,D)中B和C都必须要是寄存器

没直接问寻址方式就详细答，不放心全都写上

中断描述符表就是中断向量表。

欧拉函数&线性筛

发表于 2024-11-07 更新于 2025-01-28

欧拉函数&线性筛

1. 定义

$\varphi(n)$ 表示所有与$n$互质的正整数$i$的个数，其中$i <= n$.

2. 欧拉定理

欧拉定理是指对于任意互质的正整数$a$ 和 $n$，有:

\[ a^{\varphi(n)} \equiv 1\pmod{n} \]

3. 证明

记小于$n$且与$n$互质的正整数集合$R$为：

\[ R = \{ r_1,r_2,r_3,……,r_{\varphi(n)}\} \]

再令集合S为：

\[ S = \{a\times r_1 \% n,a \times r_2 \% n,……,a\times r_{\varphi(n)}\%n\} \]

现在我们先证明$S$中的元素两两互不相同。

反证法：假设存在两个整数$i$ 和$j$（方便起见，令$i < j$）使得$a \times r_i \% n = a \times r_j \% n$,则：

\[ a \times r_i \equiv a \times r_j \pmod{n} \]

又因为$a$与$n$互质，所以$r_i = r_j$,而$i < j$，即 $r_i \neq r_j$，矛盾，所以$S$中元素两两互不相同。

又因为$a$ 与 $n$互质，$r_i$与 n互质，所以易得$a * r_i \% n$与 $n$ 互质，即$S$中的元素都与 $n$ 互质。

所以可以得到 $R = S$.

所以：

\[ \prod_{i = 1}^{\varphi(n)}{r_i} = \prod_{i = 1}^{\varphi(n)}{a \times r_i\%n} \]

\[ \prod_{i = 1}^{\varphi(n)}{r_i} \equiv \prod_{i = 1}^{\varphi(n)}{r_i \times a} \pmod{n} \]

\[ \prod_{i = 1}^{\varphi(n)}{r_i} \equiv a^{\varphi(n)} \times \prod_{i = 1}^{\varphi(n)}{r_i} \pmod{n} \]

又因为$_{i = 1}^{(n)}{r_i} $ 与 n 互质，

所以： \[ a^{\varphi(n)} \equiv 1 \pmod{n} \]

4.线性筛

先上代码：

#include<iostream>
#include<cstdio>
#include<cmath>
using namespace std;

const int maxn = 1e6 + 5;
int prime[maxn],px,l,r,n;
bool b[maxn];

int main() {
	scanf("%d",&n);
	for (int i=2; i<=n; i++) {
		if (!b[i]) 
			prime[++px] = i;
		for (int j=1; j<=px && i * prime[j] <= n; j++) {
			b[prime[j] * i] = 1;
			if (i % prime[j] == 0)
				break;
		}
	}
	for (int i=1; i<=px; i++)
		printf("%d\n",prime[i]);
	return 0;
}

算法的正确性证明中，其实我们要证明的就是两点：

每一个合数都会被筛掉。
每一个合数只能被筛一遍（以保证时间复杂度$O(n)$)

对于第一点，证明如下：

设这个合数为$v$,则 $v = p_1 p_2 …… p_k(p_i<=p_{i+1})$，$p$ 为 $v$的质因数。

可以发现，当$i = p_2 \cdot p_3 \cdot …… \cdot p_k$, $prime[j] = p_1$时，$i$与所有小于$p_1$的质数互质，所以中途不会被break掉，所以此时$v$可以被筛掉。

对于第二点，证明如下：

在第一点中，我们是证明了v会在$i = p_2 \times p_3 \times …… \times p_k$, $prime[j] = p_1$时被筛掉，那么现在就是要证明除了这种情况外，v不会被筛掉。

设当$i = p_1 \times p_2 \times ... \times p_{m - 1} \times p_{m + 1} \times ... \times p_k$ $(p_i <= p_{i + 1})$ 时

$prime[j] = p_m(p_m \neq p_1)$时 v 被筛掉.

可以发现 $p_1 | i$,所以当$prime[j] = p_1$时就会被break掉，所以v不会在此时被筛掉。

由此，我们可以证明，算法的正确性和时间复杂度的正确性。

5.求欧拉函数

首先，欧拉函数是一个积性函数。

积性函数的定义是：函数$f(x)$,若满足$f(m \times n) = f(m) \times f(n) (gcd(n,m) = 1)$,那么$f(x)$就是一个积性函数。

欧拉函数是一个积性函数。

设$n = p_1^{m_1} p_2^{m_2} p_3^{m_3} …… p_k^{m_k} $,$p_iP$，则有：

\[ \varphi(n) = \varphi(p_1^{m_1}) \times \varphi(p_2^{m_2}) \times …… \times \varphi(p_k^{m_k}) \]

对于$\varphi(p^k),p_i \in P$，易得$\varphi(p^k) = p^k - p^{k - 1} = p^k \times (1 - \frac{1}{p})$

所以有：

\[ \varphi(n) = p_1^{m_1} \times p_2^{m_2} \times p_3^{m_3} \times …… \times p_k^{m_k} \times (1 - \frac{1}{p_1}) \times (1 - \frac{1}{p_2}) \times …… \times (1 - \frac{1}{p_k}) \\= n \times (1 - \frac{1}{p1}) \times (1 - \frac{1}{p2}) \times …… \times (1 - \frac{1}{p_k})) \]

特别的，当$n \in P$时，$\varphi(n) = n - 1$.

理解了上面这些，可以上代码了：

#include<iostream>
#include<cstdio>
using namespace std;

const int maxn = 1e6 + 5;
int n,phi[maxn],prime[maxn],px;
bool b[maxn];

int main() {
	scanf("%d",&n);
	for (int i=2; i<=n; i++) {
		if (!b[i]) {
			prime[++px] = i;
			phi[i] = i - 1;
		}
		for (int j=1; j<=px && prime[j] * i <= n; j++) {
			b[i * prime[j]] = 1;
			if (i % prime[j] == 0) {
				phi[i * prime[j]] = phi[i] * prime[j];
				break;
			}
			phi[i * prime[j]] = phi[i] * phi[prime[j]];
		}
	}
	phi[1] = 1;
	for (int i=1; i<=n; i++)
		printf("%d\n",phi[i]);
	return 0;
}

可以看到，这个代码是由线性筛变过来的。

当$i \in P$时，毫无疑问$\varphi(i) = i - 1$.

当$i \notin P$时，分以下两种情况讨论：

若$i \% prime[j] \neq 0$，因为$prime[j] \in P$,所以$gcd(i,prime[j])) = 1$。因此根据积性函数的定义，有：

\[ \varphi(i \times prime[j]) = \varphi(i) \times \varphi(prime[j]) \]

若$i \% prime[j] = 0$，因为$prime[j] \in P$,所以$prime[j]$必定为 i 的一个质因数，所以$prime[j]$必定是$p_1,p_2,p_3,……,p_k$中的一个，因此得到： \[ \varphi(i \times prime[j]) = i \times prime[j] \times (1 - \frac{1}{p1}) \times (1 - \frac{1}{p2}) \times …… \times (1 - \frac{1}{p_k})) = \varphi(i) \times prime[j] \]

最后和线性筛一样，可以证明欧拉函数求法时间复杂度也是线性的。

特别的，$\varphi(1) = 1$.

6.一些神奇定理

$\sum_{d|n}\varphi(d) = n$
$\forall n > 1,1-n$中与n互质的数的和为$n \times \varphi(n) / 2$
$若gcd(a,n) = 1$ $ 正整数b$ $a^b \equiv a^{b \ mod \ \varphi(n)}(mod\ n)$
若$a$,$n$互质，则满足$a^x \equiv 1 (mod \ n)$的最小正整数$x_0$是$\varphi(n)$的约数。

gdb过程中的一些碎碎念

发表于 2024-11-03 更新于 2025-01-28

gdb

$esp表示寄存器%esp
可以在特定机器指令的地址上打断点，例如

b *0xff3ef23f

也可以这样：

b *phase_1 + 22
寄存器%pc和%eip中存放的值貌似都是将要执行的下一条指令所在地址

x/1i $pc可以显示将要执行的下一条指令
call命令会先把寄存器%ip压栈，然后把%ip赋值为调用的函数开头命令的地址
ret命令相当于就是pop %ip

2024牛客D1

发表于 2024-08-25 更新于 2025-01-28

A

Given two integer $n$ and $m$, among all the sequence containing n non-negative integers less than $2^m$, you need to count the number of such sequences $A$ that there exists a non-empty subsequence of $A$ in which the bitwise AND of the integers is $1$.

Since the answer may be very large, output it modulo a positive integer $q$.

Input

The only line contains three integers $n,m,q$. ($1 \le n,m \le 5000, 1 \le q \le 10^9$)

Output

Output a line containing an integer, denoting the answer.

Sample input 1

1	2 3 998244353

Sample output 1

Sample input 2

1	5000 5000 998244353

Sample output 2

Solution:牛客2024D1TA_哔哩哔哩_bilibili

Code

#include<bits/stdc++.h>
#define f(i,l,r) for (int i=l; i<=r; i++)
#define int long long

using namespace std;

const int maxn = 5e3 + 5;
int n,m,mo;
int C[maxn][maxn];

void pre() {
	f(i,1,5000)
		C[i][i] = C[i][0] = 1;
	f(i,2,5000)
		f(j,1,i - 1) {
			C[i][j] = (C[i - 1][j] + C[i - 1][j - 1]) % mo;			
		}

}

int ksm(int x,int y) {
	int ans = 1;
	while (y) {
		if (y & 1)
			(ans *= x) %= mo;
		y >>= 1;
		(x *= x) %= mo;
	}
	return ans;
}

signed main() {
	cin>>n>>m>>mo;
	pre();
	if (m == 1) {
		int ans = ksm(2,n) - 1;
		if (ans < 0) ans += mo;
		cout<<ans<<'\n';
		return 0; 
	}
	int ans = 0;
	
	f(i,0,n) {
		int tmp = ksm(ksm(2,i) - 1,m - 1);
		if (tmp < 0) tmp += mo;
		int oth = ksm(ksm(2,n - i),m - 1);
		int cnt = ((tmp * oth) % mo * C[n][i]) % mo;
		(ans += cnt) %= mo; 
	}
	if (ans < 0) ans += mo;
	cout<<ans<<'\n';
	return 0;
}

B

Given two integer $n$ and $m$, among all the sequence containing n non-negative integers less than $2^m$, you need to count the number of such sequences $A$ that there exists at least two different non-empty subsequence of $A$ in which the bitwise AND of the integers is $1$.

Since the answer may be very large, output it modulo a positive integer $q$.

Input

The only line contains three integers $n,m,q$. ($1 \le n,m \le 5000, 1 \le q \le 10^9$)

Output

Output a line containing an integer, denoting the answer.

Sample input 1

1	2 3 998244353

Sample output 1

Sample input 2

1	5000 5000 998244353

Sample output 2

530227736

Solution: 牛客2024D1TB_哔哩哔哩_bilibili

Code:

#include<bits/stdc++.h>
#define f(i,l,r) for (int i=l; i<=r; ++i)
#define int long long

using namespace std;

const int maxn = 5e3 + 5;
int n,m,mo,ans1,ans2;
int C[maxn][maxn],dp[maxn][maxn];
int f[maxn];

int ksm(int x,int y) {
	int ans = 1;
	while (y) {
		if (y & 1)
			(ans *= x) %= mo;
		y >>= 1;
		(x *= x) %= mo;
	}
	return ans;
}

void pre() {
	f(i,1,5000)
		C[i][i] = C[i][0] = 1;
	f(i,2,5000)
		f(j,1,i - 1) {
			C[i][j] = (C[i - 1][j] + C[i - 1][j - 1]) % mo;			
		}
	f[0] = 1;
	f(i,1,5000) 
		f[i] = (f[i - 1] << 1) % mo;
	//dp[i][j] = ksm(2^i - 1 - j, m - 1)
	dp[0][0] = 0;
	f(i,1,n) {  // j in [0,i]
		for (int j=1; j<=i; j+=2)
			dp[i][j] = (dp[i - 1][((1 + j) >> 1) - 1] * f[m - 1]) % mo;
		for (int j=0; j<=i; j+=2)
			dp[i][j] = ksm(f[i] - 1 - j,m - 1);
	}
}

void solve1() {
	pre();
	int ans = 0;
	
	f(i,0,n) {
		int v = f[i] - 1; if (v < 0) v += mo;
		int tmp = (ksm(f[i], m - 1) - ksm(v,m - 1)) % mo;
		if (tmp < 0) tmp += mo;
		int oth = ksm(f[n - i],m - 1);
		int cnt = ((tmp * oth) % mo * C[n][i]) % mo;
		ans += cnt;
	}
	ans = (ksm(f[n],m) - ans) % mo;
	if (ans < 0) ans += mo;
	ans1 = ans;
}
signed main() {
// 	freopen("in.in","r",stdin);
// 	freopen("1.out","w",stdout); 
	ios::sync_with_stdio(0); cin.tie(0); cout.tie(0);
	cin>>n>>m>>mo;
	if (mo == 1) {
		cout<<"0\n";
		return 0;
	}
	solve1();
	if (m == 1) {
		int final = f[n] - n - 1; final %= mo; if (final < 0) final += mo;
		cout<<final<<'\n';
		return 0; 
	}
	
	f(i,0,n) { // i: the number of how many entry with the least significant bit being 1
		if (i > m - 1) break;
		int sum = 0;
		int p = f[i] - 1; 
		f(j,0,i) {
			//dp[i][j] = ksm(2^i - 1 - j, m - 1)
			int tmp = (dp[i][j] * C[i][j]) % mo;
			if (j & 1) tmp = (~tmp + 1);
			sum += tmp;	
		} 
		sum %= mo;
//		if (sum < 0) sum += mo;
		int oth = ksm(f[n - i],m - 1);
		int loc = ((sum * oth) % mo * C[n][i]) % mo;
		ans2 += loc;
		
	}
	int final = (ans1 - ans2) % mo; if (final < 0) final += mo;
	cout<<final<<'\n';
	return 0;
}

/*
	int sum = 0;
		f(j,0,i - 1) {
			int tmp = ksm(n - j,m - 1) * C[i][j];
			if (j & 1) tmp = -tmp; tmp %= mo;
			(sum += tmp) %= mo;
		}
*/

CSAPP-notes

发表于 2024-08-03 更新于 2025-05-04

CSAPP notes

Lecture 2

Argument X X = 10100010

<< 3 X = 00010000

logical >> 2 X = 00101000

Arithmetic >> 2 X = 11101000

Logical right shift would fill the high digit with zero.

Arithmetic right shift would fill the high digit with the sign digit.
In real world practice, when you do X << Y in the machine, most machine would actually do X << (Y mod 8) (However, when I test it locally, this mod operation didn't apply)
The difference between && and &：
- && would only return 0 or 1, while & can return all kinds of integers
- && would consider any non-zero integer as 1, and consider 0 as 0. & certainly don't have this trait
- When the outcome of an expression is known, && would not calculate the rest of the expression. For example, if int x = 0, and we calculate expression a&&(5/a), it would not raise an DIVIDEZERO error as the second part would not be calculated. Similarly, if p is a null pointer, then p&&*p++ would not access a null pointer as the second part would be omitted.
If you shift left by a negative number, the outcome might not be a right shift. For example, 1 << (-2) might not be 1 >> 2
A different way to understand two's complement

For example, the two's complement of a number is 10110001, then the value of this number should be: \[ 1\times (-2^7) + 0 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 0 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 \]

In other word, when it comes to two's complement, we consider the weight of the highest digit to be negative. More accurately, the weight of the highest digit should be the opposite number of its original weight.
If a comparison operation involves two signed numbers, then it would do it signed ways. If a comparison operation involves two unsigned umbers, then it would do it unsigned ways.

However, if a comparison operation involves a signed number and an unsigned number, then it would first transform the signed number into unsigned form then compare these two unsigned numbers.

For example, if int x = -1; unsigned int y = 0, then x would be greater than y. In other word, x > y would be true.

This rule applies to other operations, including add, minus, etc.
Define int x = -2147483648 might make the compiler confused for obscure reason. You can do the same thing by int x = -2147483647 - 1
An example:
1
2
3
int i;
for (i = 10; i - sizeof(char) >= 0; i--)
printf("%d\",i);
For this program, what you will get is just an endless loop. Why？because sizeof(char) is an unsigned number, making the whole bunch of i - sizeof(char) to be considered unsigned. Thus, i - sizeof(char)>=0 always holds, making the loop endless.
How to promote a smaller type of integer to a larger type of integer?

Argument X	X = 10100010
<< 3	X = 00010000
logical >> 2	X = 00101000
Arithmetic >> 2	X = 11101000

$\hspace{1cm}$ For example, if we have a variable x of type int, how do we cast it to type long long?

$\hspace{1cm}$ Simply by repeatedly copying the sign digit and insert it to the left until the number of digit reach the word size of long long.

$\hspace{1cm}$ This is both true when it is a two's complement number and when it is an unsigned number.

Lecture 3

Addition for unsigned number (A.K.A. UAdd) :

Computer will just add the two numbers up and abandon the higher digit if it exceeds the word size.

For unsigned number u and v，perform addition between them would lead to the result of $(u + v) \% 2^w$ , with w representing the word size.
Addition for two's complement number (A.K.A TAdd):

Computer will do the same thing in UAdd: Add the two numbers up and abandon the higher digit if it exceeds the word size.

For example, if we have a 4-digit two's complement number u = 1101 and v = 0101. So u is actually -3 and v is actually 5. Thus, u + v should be 2 arithmetically. When we add the 1101 and 0101, we got 10010, and we abandon the most significant digit and we got 0010, which is 2 and it matches the arithmetic result.

Another example: u = 1101, v = 1010, so u is actually -3 and v is actually -6, so u + v should be -9 arithmetically. If we add 1101 and 1010, we got 10111. Abandon the highest digit we got 0111, which is 7.
UAdd and TAdd have identical Bit-level behavior
Multiplication for unsigned number:

Similar to addition, computer would first calculate the product, then abandon the higher digit exceeding the word size. Simply put, $u \times v$ would be $(u \times v) \% (2^w)$, with $w$ representing the word size
Multiplication for two's complement number:

Computer would first calculate the product, then abandon the higher digit exceeding the word size.

For example, if u=0101 and v = 0101, then $u \times v = 11001$. Abandon the highest digit we got 1001 , that would be -7.
A property for two's complement

Assume a number x can be represented $(\overline{a_1 a_2 a_3 ... a_n})_2$ in two's complement. Let's denote $(\overline{a_1 a_2 a_3 ... a_n})_2$ as $[a_1,a_2,...,a_n]$. And we could assert that: if $x = [a_1,a_2,...,a_n]$, then: \[ -x = [1 - a_1,1 - a_2, ... , 1 - a_n] +1 \] Notice: the last "+" operation will be done in computer ways, which means if it overflows, then the highest digit would be omitted.

For example, 0 = [0,0,0,0], so -0 = [1,1,1,1] +1 = [1,0,0,0,0], then we omit the highest number, then we could get [0,0,0,0], and the result is correct.
It appears that in my computer, when you do a ">>" operation then it would actually perform arithmetic >>
In a 64-bit word size machine, the address of the memory would be 8-bytes (which is also 64 bits). Thus, if we do printf("%d\n",sizeof(int*)), then it would probably get "8".
Terminology time:

Word means a range of memory unit storing a specific type of data. For example, an int word is comprised of 2 bytes, an long long word is comprised of 4 bytes.
So, how are the bytes within a multi-byte word ordered in memory?

Two ways: Big Endian; Little Endian;

Little Endian is the major way today.

The internet still use Big Endian.

The way to store strings:

Computer would just store the string from head to tail char-wise. There's no big-endian or little-endian here.

(Notes: IA32 is a little-endian machine, and Sun is a big-endian machine)

Lecture 4

An example
1
printf("%d\n",(2/3)==(2/3.0));
In fact, this program would output 0. Because (2/3) is an integer while (2 / 3.0) is a double number. Thus, the equation can not hold.
Implicit Conversion

In the C language, when an operation involves multiple types of variables, type conversions occurs to make sure that the type of different variables match. In operations, implicit conversions take place. Type conversions can also be called as cast.
- All integer type smaller than int (like short, char) will be promoted to int
- Low precision type will be cast to high precision type.(int would be cast to float, float would be cast to double)
Immediate Value

In the C language, immediate value refers to a constant value that is directly specified in code.

The immediate value is also known as a literal.

In the C language, the immediate value 3.14 are treated as type duoble by default.
IEEE floating point representation

The IEEE float point representation represents a number in the form $V = (-1)^s \times M \times 2^E$

The sign s determine whether this number is positive or negative

The exponent E weights the value by a (possibly negative) power of 2.

Floating-point number are represented in three fields above.
- The single sign bit in the front directly encode sign s
- The $k-bit$ exponent field $exp = e_{k-1}e_{k - 2}...e_0$ encodes the exponent $E$
- The $n - bit$ fraction field $frac = f_{n - 1}f_{n - 2}...f_0$ encodes the significand $M$
- Normalized Case
  
  When exp is not all 0 and exp is not all 1，it is the normalized case
  
  In this case, $E = e - bias$，where $bias = 2^{k - 1} - 1$ and $e$ is the unsigned number having the bit representation of $e_{k - 1}e_{k- 2}...e_0$.
  
  In this case, $M = (1.f_{n - 1}f_{n - 2}...f_0)_2$
- Denormalized Case
  
  When exp is all 0, it is the denormalized case, used to represent floating-point number that is close to zero.
  
  In this case, $E = 1 - bias$, where $bias = 2^{k - 1} - 1$.
  
  In this case, $M = (0.f_{n - 1}f_{n - 2}...f_0)_2$
- Special Case A
  
  When exp is all 1, and frac is all 0, this number means infinity. By assigning s to 1 or 0, we could get positive infinity and negative infinity.
- Special Case B
  
  When exp is all 1, and frac is not all 0, this number means NaN (Not a number). Such values are returned as the result of an operation where the result can not be given as a real number or as infinity, as when computing $\sqrt{-1}$ or $\infty - \infty$ .
Why do we have to define denormalized case?

In normalized case, we can not represent 0.

In denormalized case, we can represent 0 as well as some numbers really close to 0.
Floating-point Operation

The IEEE standard specifies a simple of way of floating-point operation. Viewing the two number as real numbers and just do the operation in mathematical ways, then round the mathematical result to yield the final result.

If one of the value is special (such as NaN, infinity,0), the IEEE standard specify a convention that attempt to be reasonable. To be specific, 1/-0 is defined to yield $-\infty$, $\infty - \infty$ is defined to yield NaN.
The floating-point addition is commutative but not associative. For example, a float x = 3.14 + 1e20 - 1e20 would yield 0, while a float x = 3.14 + (1e20 - 1e20) would yield 3.14
The floating-point multiplication is commutative but not associative. For example, 1e20 * 1e20 * 1e-20 would yield $+\infty$, while 1e20 * (1e20 * 1e-20) would yield 1e20

The floating-point multiplication does not distribute over addition.
Conversion between int, float and double:
- int -> float: The number will not overflow, but it may be rounded.
- int or float -> double: The number will not overflow and it will not be rounded. Because it has both greater range and greater precision.
- double -> float: The number might overflow(leading to $+\infty$ or $-\infty$) or be rounded.
- float or double -> int: The number would be rounded toward $0$.
Infinity can be compared. It is compared in mathematical ways.
For floating-point number $a$,$b$,$c$
- if $ a> b$, then $a + c > b + c$
- If $a > b$, then $-a < -b$
- For every floating-point number $x$, we have $x \times x \ge 0$

Data lab

Lecture 5

how to assemble a file
1
gcc -S test.c
Can assemble test.c into assemble code test.s
how to disassemble an object code file
1
objdump -d test > test.d
Can disassemble the object code file test into assemble code file test.d
Use gdb to disassemble a function:

Suppose we have a program code test.c and we want to disassemble the function build which is in this program.

First we have to compile test.c:
1
gcc -Og -o test test.c
Then we type
1
gdb test
Then we would get into the gdb interface.

After it, we type
1
disassemble build
and we could get the disassemble code
The concept of instruction set:

There are two kinds of prevalent instruction set, intel x86-64 and ARM. Most intel chip use x86-64 while mobile phone chip and apple chip use ARM. The course is about x86-64
The assembly code usually implement data manipulation in memory and register. There are 16 register, and there names are as follow:

%rax %r8

%rbx %r9

%rcx %r10

%rdx %r11

%rsi %r12

%rdi %r13

%rsp %r14

%rbp %r15
the syntax %rax refers to the register %rax, and the syntax (%rax) refers to the value in register %rax
A general way to designate address: \[ D(R_b, R_i, S) \] $D$ is a constant, $R_b$ and $R_i$ are the name of two general-purpose register, and the $S$ is the scaling factor restricted to be 1, 2, 4, or 8. This expression specifies an address, which would be: \[ D + ([numbers \ \ in \ \ R_b] + S \times [numbers \ \ in \ \ R_i]) \] For example, provided that the number stored in $R_b$ is 0x100, the number stored in $R_i$ is 0x200, $D = 2$, $S = 4$， then $D(R_b,R_i,S)$ would designate the address 2 + 0x100 + 0x200 * 4, which is 0x902.

(Denote: D is short for displacement, and S is short for scaling)
The MOV grammar:
- Move an immediate value to the register:
  
  mov $0x4, %rax
  
  This would move the immediate value 0x4 into the register %rax
- Move an immediate value to the memory:
  
  mov $-147, (%rax)
  
  In this case, there is an address stored in the register %rax and it would move the immediate value -147 to that address in the memory
- Move a value from a register to register
  
  mov %rad, %rdx
  
  It would copy the value stored in register %rad to register %rdx
- Move a value from a register to memory
  
  mov %rax, (%rdx)
  
  It would copy the value stored in register %rax to the address (%rdx) in the memory
- Move a value from memory to register
  
  mov (%rax), %rdx
  
  vice versa
Note: you can't move a value from memory directly to memory

Observation:

This operation simply move something into a register or the memory. If it is a register, the second argument should be the name of the register. If it is the memory, then the second argument should be the address, usually designated by (the name of a register) or D(Rb,Ri,S).

The first argument specifies the number we want to move. The first argument could be an immediate number, the name of a register or an address. If it is an immediate value, then it means it would move this literal. If it is the name of a register, then it means it would move the number stored in the register. If it is an address, it means it would move the number stored in this address.
The LEA grammar

The general form:LEA first_argument, second_argument

The first argument is an address, the second argument must be the name of a register.

This operation will move the address (the first argument) itself to the second argument (a register or an address).
- The difference between LEA and MOV: LEA would simply move the address itself while MOV would dereference the address and move the value stored in that address

%rax	%r8
%rbx	%r9
%rcx	%r10
%rdx	%r11
%rsi	%r12
%rdi	%r13
%rsp	%r14
%rbp	%r15

In practice, the first argument can be literal or the name of a register. This might violate the rule, but compiler still use this, especially in some arithmetic operation.

Can't understand MOV and LEA? Watch this video: 4 MOV vs LEA (youtube.com)

In a function, the first parameter passed to the function will be stored in register %rdi, the second parameter would be stored in %rsi. The return value of a function would be stored in register %rax
movl instruction：

This instruction could move a 32-bit value from one place to another.
- From register to register:
  
  movl %eax, %ebx
  
  This would move the value in %eax to %ebx
  
  (Note: %eax is the lower 32 bits of %rax, so does %ebx)
- From memory to register
  
  movl 0x100(%rbp), %eax
  
  This would move the 32-bit value from memory location at 0x100(%rbp) to register %eax. This would load 4 memory units and would load it upside down (because we are talking about little-endian machine)
  
  For example, here are the number stored in memory:
  
  Memory Location Value
  
  0x100 0x17
  
  0x101 0x25
  
  0x102 0x33
  
  0x103 0x64
  
  0x104 0x89
  
  0x105 0x26
  
  0x106 0x71
  
  If %rbp = 0, then after this operation, 0x64332517 would be stored in register %eax.
- From register to memory
  
  Similar, so omit it.
- Move a literal to memory & Move a literal to register
  
  Similar, so omit it.
Note: After the movl operation, the upper-32 bits of the destination(could be register or memory location) would be set to 0
A general property for instructions copying and generating 1-, 2-, 4-, 8- byte values: When the destination is a register, those that generates 1-,2- byte values leaves the remaining upper bytes unchanged, while those that generate 4- byte values will set the upper 4 byte to 0.
The program counter(commonly called %rip) indicates the address in memory of the next instruction to be executed.
The regular movq operation can only have immediate source operand that can be represented in 32-bit two's complement (Simply put, the immediate source operand must be in the range of $[-2147483648, 2147483647])$. Then the compiler would do sign extension to the 32-bit two's complement to make it 64-bit, then carry out the move.

What is sign extension?

Given a number represented in short-word-length two's complement, we want to represent this number in a long-word-length two's complement without changing its arithmetic value. This transformation is called sign extension.

We can finish this task by replicating the sign digit. Whether the number is positive or not, you will find by doing the replication the value remain unchanged.

For example, 0x0011 after sign extension becomes 0x000011, the value remains unchanged. (before: 3; after: 3)

Another example: 0x1100 after sign extension becomes 0x111100， and the value remains unchanged (before: -4; after: -32+16+8+4 = -4;)
As mentioned above, the regular movq instruction has limitation on literal, that's why the instructionmovabsq comes into life. It can move arbitrary 64-bit two's complement literal, and the destination must be a register.
When copying from a smaller source to a larger destination (for example, from a 32-bit to 64-bit), the x86-64 ISA provides two instructions: movz and movs.

movz would set the remaining higher bits to 0, while movs would fill the remaining higher bits by replicating the most significant bits of the source operand.

Here are some example:

Memory Location	Value
0x100	0x17
0x101	0x25
0x102	0x33
0x103	0x64
0x104	0x89
0x105	0x26
0x106	0x71

The operation cltq has the same effect of the operation movslq, but it has more compact encoding.

cltq is short for convert l(double word) to q(quad word)

The unary operation:

It has a single operand serving as both resource and destination. For example, we could have:

1 2	incq(%rax) # to increment register %rax by 1 decq(%rax) # to decrement register %rax by

The arithmetic binary operation:

addq %rcx, %rax   # to increment register %rax by the value stored in %rcx
subq %rcx, %rax   # to decrement register %rax by the value stored in %rax
mulq %rcx, %rax   # rax <---- (rax * rcx)   (value stored in rax and rcx are considered to be unsigned)
imulq $16, (%rax,%rdx,8) # to multiply the address %rax + 8 * %rdx by a factor of 16 (value involved are considered to be two's complement)

For these binary operations, there are two observations:

The source would be the first argument, and the destination would be the second argument.

The first operand could be an immediate number, a register or a memory location. The second operand could be an register or a memory location. However, the first operand and the second operand could not both be memory location.

Shift operation:

The operation involves two operand, the first operand specifies the shift amount, and the second value designates the value we want to shift. Shift amount usually is a immediate number, but it can also be the value stored in single-byte register %cl.

A shift operation on the data type that are w bits long would use the lower m bits of the register %cl, where $2^m = w$. For example, if the register %cl stores the hexadecimal value of $0XFF, then SALB would shift by 7 digits, SALW would shift by 15 digits, SALL would shift by 31 digits, and SALQ would shift by 63 digits.
- Left shift:
  
  Comprises two instructions: SAL and SHL. These instructions have same effect
- Right shift:
  
  Comprises two instructions: SAR and SHR. The former is the arithmetic right shift, and the latter is the logical right shift.
Single Operand Multiplication:

This types comprise two operations: imulq S and mulq S. The former one is for two's complement, and the latter one is for unsigned. Apart from this difference, the rest are the same, so we would only talk about imulq S.

The operation imulq S would calculate the value $S \times M[\%rax]$ keep the all 128-bit without truncate. Then it would store the higher 64 bits to register %rdx and store the lower 64 bits to register %rax.
cqto operation:

This would convert a 64-bit value to 128-bit value. Specifically, it would sign extend %rax to make it to be a 128-bit value, and then store the upper 64 bit in %rdx and store the lower-64 bit in %rax
divide operation:

There are divide operation: idivq and divq. Similarly, former one is for two's complement, and the latter one is for unsigned. Let's just talk about idivq since divq is the same case.

Syntax for idivq: idivq S
- case 1: If the bits of $rdx are all set to the sign bit of $rax (for operation divq S, we should look at whether the bits of $rdx are all set to zero), then we would consider that the dividend is a 64-bit value stored in $rax.
- case 2: If the bits of $rdx are not all set to the sign bit of $rax (for operation divq S, we should look at whether the bits of $rdx are all set to zero), then we would consider that the dividend is a 128-bit value, with higher 64 bits stored in $rdx and lower 64 bits stored in $rax
Anyway, after specifying the dividend, then the machine would do the division, and yield a quotient and a remainder.

The quotient would be stored in $rax and the remainder would be stored in $rdx.

CSAPP-lecture-6

发表于 2024-08-02 更新于 2025-05-04

CSAPP lecture 6

1. Condition code

Four condition codes:
- CF: Carry Flag. Will be 1 if there is a carry out of the most significant bit. Used to detect the overflow of the unsigned number
- ZF: Zero Flag. Will be 1 if the most recent operation yield 0.
- SF: Sign Flag. Will be 1 if the most recent operation yield a value with the sign bit being 1.
- OF: Overflow Flag. Will be 1 if the most recent operation caused an overflow (for two's complement), either positive overflow or negative overflow.
The leaq operation would not alter any of the condition code. All other arithmetic operation would make these condition code to change.
For the logical operation (such as xor or add), they will leave the CF and OF to be 0.
For the shift operation, the CF will be set to the last digit shifted out, and the OF will be set to 0.
For the inc and dec operation, they will set the OF and ZF, but they will leave the CF unchanged.
There are two instruction classes which can only alter the condition codes without changing the other value.

We can check the sign of a value or whether this value is 0 by test %rax %rax. (Suppose this value is stored in register %rax)

2. Access the condition code

Access the condition code: Rather than directly access the condition code, we prefer to access the condition code indirectly, using the following 3 methods:
- We can set a single byte to 0 or 1, depending on some combination of the condition codes.
- We can conditionally jump to some part of the program
- We can conditionally transfer data.

2.1 How to set a single byte according to the condition code

These set class of instruction have a single operand D indicating the destination. The destination have to be a register or a memory location. If it is a register, then it would set the least significant byte of the register to 1 (usually this register have to be %al kind rather than %rax kind.)

an example:

comp:
	cmpq	%rsi %rdi
	setl	%al
	movzbq	%al %rax
	ret

2.2 (Conditionally) jump to some part of the program

Unconditionally jump:

	movq $0, %rax
	jmp .L1
	movq (%rax), %rdx
.L1:
	popq %rdx

The jmp instruction would jump to .L1 label

jmp is the unconditional jump operation. It can jump directly or indirectly. When it is directly, it's something like jmp .L1. When it is indirectly, it is something like jmp *%rax or jmp *(%rax), where the label is stored in the register %rax or the memory location (%rax)

Conditional jump:

2.3 Jump instruction encodings

In assembly code, to specify where we want to jump to we can simply write something like .L1 happily. In the disassembled machinery code (the assemble code translated from the machinery code), the jump operation would written in the form like jmp 0x1137. There will be an address operand like 0x1137 to specify which operation will the program jump to. In this case, the jump operation would jump to 0x1137 and do the operation in 0x1137

But in machinery code, how to specify where we want to jump to? There are two ways to do this, first is so called PC relatives. That is saying, they encode the target address as the difference between the target address and the address of the instruction immediately following the jump instruction.

The difference between a and b is a - b

The reason why we try to use this kind of notation is that the jump target would remain unchanged when the whole program is shifted to another address.

1
2

eb 03 # In this case, 0x03 is the encoded jump target. 0x03 has to be 							added by the address of the next instruction to get the 								target address. 
# "eb 03" is a machine code in binary. "eb" specifies the type of the operation and "03" specifies the encoded jump target

Note: this 0x03 is a two's complement number, so for example, if now it is 0xFF, then it is actually -1, and the actual jump target would be the address of the next instruction minus 1.

Question

Sometimes we would see operation jmp .L1 and sometimes we would see operation jmp 0x1137. Which one (probably both) is the assembly code? What's difference between these two command?

2.4 Conditional Move

Apart from the jump operation, we have an operation called conditional move.

long comp(long x,long y) {
    long res;
    if (x < y)
        res = y - x; else
        res = x - y;
    return res;
}

For example, this C code might be translated into the following assembly code:

absdiff:
	movq 	%rsi, %rax
	subq	%rdi, %rax
	movq	%rdi, %rdx
	subq	%rsi, %rdx
	cmpq	%rsi, %rdi
	cmovge	%rdx, %rax	#If the last comparison said "%rdi >= %rsi", then this move will 						  conditional move will apply. Otherwise nothing will happen.
	ret

Question

Why will the conditional move operation reduce the penalty for misprediction? It also need to decide whether to do the move instruction, and the misprediction should affect the following pipeline as well.

The class of conditional move:

Some property for the conditional move:

The destination must be register
The data type can be 2,4,8 bytes long, but can't be 1 byte long
The instruction don't need a suffix to specify the data type. The processor could infer the operand length from the name of the destination register.

The condition move may not be valid if one expression is invalid.

for example
1
2
3
4
int *p;
if (!p)
    return 0; else
    return *p;
You can not calculate *p beforehand as p may be a null pointer, making the dereferencing invalid.

Three Formulae

发表于 2024-06-09 更新于 2025-01-28

\[ \int\sqrt{a^2 - x^2}dx = \frac x2\sqrt{a^2 - x^2} + \frac {a^2}2\arcsin(\frac xa) + C \] \[ \int \frac 1{\sqrt{x^2 \pm a^2}}dx = \ln|x + \sqrt{x^2 \pm a^2}| + C \] \[ \int \sqrt{x^2 \pm a^2}dx = \frac x2 \sqrt{x^2 \pm a^2} \pm\frac {a^2}2\ln|x + \sqrt{x^2 \pm a^2}| + C \]

类型	E:期望	D：方差
\(U(a,b)\)	\(\frac {a + b}2\)	\(\frac{(b - a)^2}{12}\)
\(P(\lambda)\)	\(\lambda\)	\(\lambda\)
\(E(\lambda)\)	\(\frac 1\lambda\)	\(\frac 1{\lambda^2}\)
\(G(p)\)	\(\frac 1p\)	\(\frac q{p^2}\)
\(N(\mu, \sigma^2)\)	\(\mu\)	\(\sigma^2\)
\(B(n,p)\)	\(np\)	\(npq\)