Applied Statistics for Social Science Research
\[ \DeclareMathOperator{\E}{\mathbb{E}} \DeclareMathOperator{\P}{\mathbb{P}} \DeclareMathOperator{\V}{\mathbb{V}} \DeclareMathOperator{\L}{\mathscr{L}} \DeclareMathOperator{\I}{\text{I}} \]
“The purrr::map* functions transform their input by applying a function to each element of a list or atomic vector and returning an object of the same length as the input.”
min = 1, second min = 2 … last vector has min = 10, and max = 15 for all
“pivot_longer() "lengthens" data, increasing the number of rows and decreasing the number of columns. The inverse transformation is pivot_wider().”
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
iris_long <- iris |> pivot_longer(!Species, names_to = "length_width", values_to = "measure")
head(iris_long)
# A tibble: 6 × 3
Species length_width measure
<fct> <chr> <dbl>
1 setosa Sepal.Length 5.1
2 setosa Sepal.Width 3.5
3 setosa Petal.Length 1.4
4 setosa Petal.Width 0.2
5 setosa Sepal.Length 4.9
6 setosa Sepal.Width 3
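The inverse transformation mentioned in the quote can be sketched as follows. This is a minimal example, not part of the original slides: the `id` column and the `iris_id` and `iris_wide` names are assumptions, added because `pivot_wider()` needs some way to tell which four measurements belong to the same flower.

```r
library(dplyr)
library(tidyr)

# Assumption: add a row identifier so each flower can be reassembled later.
iris_id <- iris |> mutate(id = row_number())

# Lengthen: four measurement columns become name/value pairs.
iris_long <- iris_id |>
  pivot_longer(!c(Species, id), names_to = "length_width", values_to = "measure")

# Widen: spread the four measurement names back into separate columns.
iris_wide <- iris_long |>
  pivot_wider(names_from = length_width, values_from = measure)

dim(iris_long)  # four rows per flower
dim(iris_wide)  # one row per flower again
```

Without the identifier, the combination of Species and measurement name would not identify a single value, and `pivot_wider()` would warn and return list-columns.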
1 | 2 | 3 |
---|---|---|
105.1271 | 110.5171 | 116.1834 |
106.8939 | 114.2631 | 122.1403 |
108.6904 | 118.1360 | 128.4025 |
110.5171 | 122.1403 | 134.9859 |
112.3745 | 126.2802 | 141.9068 |
114.2631 | 130.5605 | 149.1825 |
116.1834 | 134.9859 | 156.8312 |
118.1360 | 139.5612 | 164.8721 |
120.1215 | 144.2917 | 173.3253 |
122.1403 | 149.1825 | 182.2119 |
map() functions.
purrr::map() functions instead of R’s *apply().
rates <- seq(0.05, 0.20, length = 10)
P <- 100
time <- seq(1, 5, length = 50)
Pe <- function(A, r, t) A * exp(r * t)
time |> map(\(x) Pe(A = P, r = rates, t = x))
# above is a shortcut for
map(time, function(x) Pe(A = P, r = rates, t = x))
# and the above is a shortcut for the following loop
l <- list()
for (i in seq_along(time)) {
l[[i]] <- Pe(A = P, r = rates, t = time[i])
}
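To see that the `map()` call and the loop really do the same thing, one can compare their results directly. The sketch below repeats the definitions so it runs on its own; the names `via_map` and `via_loop` are introduced here for illustration.

```r
library(purrr)

rates <- seq(0.05, 0.20, length.out = 10)
P <- 100
time <- seq(1, 5, length.out = 50)
Pe <- function(A, r, t) A * exp(r * t)

# Functional version: one list element per value of time.
via_map <- map(time, \(x) Pe(A = P, r = rates, t = x))

# Loop version: pre-allocate the list, then fill it element by element.
via_loop <- vector("list", length(time))
for (i in seq_along(time)) {
  via_loop[[i]] <- Pe(A = P, r = rates, t = time[i])
}

identical(via_map, via_loop)      # same contents
length(via_map) == length(time)   # output is as long as the input
lengths(via_map)[1]               # each element holds one value per rate
```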
Here is a cheat sheet explaining tidyr functions.
rate | year | value |
---|---|---|
0.05 | 1 | 105.1271 |
0.05 | 2 | 110.5171 |
0.05 | 3 | 116.1834 |
0.05 | 4 | 122.1403 |
0.05 | 5 | 128.4025 |
0.05 | 6 | 134.9859 |
0.05 | 7 | 141.9068 |
0.05 | 8 | 149.1825 |
0.05 | 9 | 156.8312 |
0.05 | 10 | 164.8721 |
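A long table of this shape can be produced in more than one way. The sketch below is one possibility and not necessarily how the table above was made: it assumes whole years 1 to 10 and a starting amount of 100, and builds every rate and year combination with `tidyr::expand_grid()`. The name `growth_long` is hypothetical.

```r
library(dplyr)
library(tidyr)

Pe <- function(A, r, t) A * exp(r * t)
rates <- seq(0.05, 0.20, length.out = 10)
years <- 1:10  # assumption: whole years, as in the table

# One row per (rate, year) pair; the last column varies fastest.
growth_long <- expand_grid(rate = rates, year = years) |>
  mutate(value = Pe(A = 100, r = rate, t = year))

head(growth_long, 10)
```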
RStudio cheatsheets
install.packages('HistData')
library(HistData)
?Arbuthnot
Year | Males | Females | Plague | Mortality | Ratio | Total |
---|---|---|---|---|---|---|
1629 | 5218 | 4683 | 0 | 8771 | 1.114243 | 9.901 |
1630 | 4858 | 4457 | 1317 | 10554 | 1.089971 | 9.315 |
1631 | 4422 | 4102 | 274 | 8562 | 1.078011 | 8.524 |
1632 | 4994 | 4590 | 8 | 9535 | 1.088017 | 9.584 |
1633 | 5158 | 4839 | 0 | 8393 | 1.065923 | 9.997 |
1634 | 5035 | 4820 | 1 | 10400 | 1.044606 | 9.855 |
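Judging from the rows shown, `Ratio` looks like `Males / Females` and `Total` like the sum of both in thousands. The sketch below checks that reading on the first three rows, typed in by hand so that it runs without the HistData package; `arbuthnot_head` is a name made up for this example.

```r
# First three rows of the table above, copied by hand.
arbuthnot_head <- data.frame(
  Year    = c(1629, 1630, 1631),
  Males   = c(5218, 4858, 4422),
  Females = c(4683, 4457, 4102)
)

# Assumed definitions of the derived columns.
arbuthnot_head$Ratio <- arbuthnot_head$Males / arbuthnot_head$Females
arbuthnot_head$Total <- (arbuthnot_head$Males + arbuthnot_head$Females) / 1000

arbuthnot_head
```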
Image Source: Calculus Volume 1
\[ \begin{eqnarray} v(t) & = & \frac{d}{dt} \left( 2 + 3t^2 \right) = 6t \\ x(t) & = & \int{6t\, dt} = 3t^2 + C \end{eqnarray} \]
Why almost? Look at the constant \(2\) in \(\frac{d}{dt}(2 + 3t^2)\). You can replace it with any other constant and the result will still be \(6t\).
To put it another way, to characterize the position function you need to know the initial position, and you can’t get that from the velocity function alone.
The position function for different values of initial position \(C\). Notice that the only thing that changes is the intercept.
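The point about the constant can be checked with base R's symbolic derivative `D()`. This is a small sketch: the constants 0, 2, and 10 and the evaluation point t = 2 are arbitrary choices made for illustration.

```r
# Symbolic derivative of the position function from the slide.
velocity <- D(quote(2 + 3 * t^2), "t")
velocity

# Swap in other constants for 2 and evaluate the derivative at t = 2.
constants <- c(0, 2, 10)
slopes <- sapply(constants, function(C) {
  position <- substitute(C0 + 3 * t^2, list(C0 = C))
  eval(D(position, "t"), list(t = 2))
})
slopes  # the same velocity for every constant
```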
OpenStax: Here is a more complete list
\[ \begin{eqnarray} \int [f(x) + g(x)] \, dx &=& \int f(x) \, dx + \int g(x) \, dx \\ \int [c \cdot f(x)] \, dx &=& c \int f(x) \, dx \end{eqnarray} \]
Unlike derivatives, there are generally no mechanical rules for finding integrals.
Most integrals do not have a closed-form, analytical solution. This is true for almost all integrals in statistics.
In one or two dimensions, it is easy to evaluate most integrals numerically.
In higher dimensions, you need more sophisticated methods, such as Markov Chain Monte Carlo (MCMC). We will not cover MCMC in this course.
For simple integrals we can sometimes find a closed-form solution by relying on u-substitution and integration by parts.
Notice that the function takes a function as an argument. These are called higher order functions.
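As a small illustration of a higher-order function, base R's `integrate()` takes the integrand itself as its first argument. The sketch below reuses the velocity function 6t from earlier; the interval from 0 to 2 is an arbitrary choice.

```r
# The integrand is an ordinary R function.
velocity <- function(t) 6 * t

# integrate() receives that function as an argument.
displacement <- integrate(velocity, lower = 0, upper = 2)

displacement$value  # numerically close to 3 * 2^2 - 3 * 0^2 = 12
```

The result agrees with the antiderivative 3t² evaluated at the two endpoints.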
Source: OpenStax Calculus Volume 1
\[ \begin{eqnarray} \int \sqrt{u} \cdot \sin(u) \frac{1}{2\sqrt{u}} \, du & = & \frac{1}{2}\int \sin(u)\, du \\ & = & -\frac{1}{2} \cos(u) + C \\ & = & -\frac{1}{2} \cos(x^2) + C \end{eqnarray} \]
When in doubt, you can always try WolframAlpha
Python library SymPy through the R package caracas
library(caracas); library(stringr)
add_align <- function(latex) {
str_c("\\begin{align} ", latex, " \\end{align}")
}
add_int <- function(latex) {
str_c("\\int ", latex, "\\, dx")
}
x <- symbol('x'); f <- x^2 / sqrt(x^2 + 4)
tex(f) %>% add_int() %>% str_c(" =") %>% add_align() %>% cat()
\[\begin{align} \int \frac{x^{2}}{\sqrt{x^{2} + 4}}\, dx = \end{align}\]
\[\begin{align} \frac{x \sqrt{x^{2} + 4}}{2} - 2 \operatorname{asinh}{\left(\frac{x}{2} \right)} \end{align}\]
\[ f(x) = \lambda e^{-\lambda x}, \, x > 0, \text{and } \lambda > 0 \]
This distribution is sometimes called the waiting time (to some event) distribution, where \(\lambda\) is the rate of events we expect.
One property of this distribution is that it is memoryless: no matter how long you have already waited, the probability of seeing an event in the next stretch of time remains the same.
One of the properties of a PDF is that it must integrate to 1.
Let’s check that this is true:
\[ \int_{0}^{\infty} \lambda e^{-\lambda x} dx = \]
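Analytically, an antiderivative of \(\lambda e^{-\lambda x}\) is \(-e^{-\lambda x}\), which goes to 0 as \(x \to \infty\) and equals \(-1\) at \(x = 0\), so the integral is \(0 - (-1) = 1\). A numerical check with `integrate()` might look like the sketch below; the rate of 2 is an arbitrary assumption, and any positive value should behave the same way.

```r
lambda <- 2  # assumption: any positive rate works

# Exponential PDF written out by hand.
exp_pdf <- function(x) lambda * exp(-lambda * x)

# integrate() accepts Inf as an upper limit.
total_area <- integrate(exp_pdf, lower = 0, upper = Inf)

total_area$value  # should be numerically close to 1
```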
\[ \begin{eqnarray} (f g)' & = & f'g + g'f \\ \int (f g)' \, dx & = & \int f'g dx + \int g'f \, dx \\ fg & = & \int f'g \, dx + \int g'f \, dx \\ \int f g' \, dx & = & fg - \int f' g \, dx \\ u & = & f(x) \\ v & = & g(x) \\ du & = & f'(x) \, dx \\ dv & = & g'(x) \, dx \\ \int u \, dv & = & uv - \int v \, du \end{eqnarray} \]
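As a short worked example of the last formula (the integrand \(x e^{-x}\) is chosen here for illustration and does not come from the slides), take \(u = x\) and \(dv = e^{-x} \, dx\), so that \(du = dx\) and \(v = -e^{-x}\):

\[ \begin{eqnarray} \int x e^{-x} \, dx & = & uv - \int v \, du \\ & = & -x e^{-x} - \int \left( -e^{-x} \right) dx \\ & = & -x e^{-x} + \int e^{-x} \, dx \\ & = & -x e^{-x} - e^{-x} + C \end{eqnarray} \]

Differentiating the result gives \(-e^{-x} + x e^{-x} + e^{-x} = x e^{-x}\), which recovers the integrand.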
Recall that at the beginning we defined exponential growth with the following differential equation:
\[ \frac{dy(t)}{dt} = k \cdot y(t) \]
We can now solve it:
\[ \begin{align*} \frac{1}{y} \, dy &= k \, dt \\ \int \frac{1}{y} \, dy &= \int k \, dt \\ \log(y) &= k \cdot t + C \\ y(t) &= e^{C} \cdot e^{kt} \\ y(t) &= y_0 \cdot e^{kt}, \quad \text{where } y_0 = e^{C} = y(0) \end{align*} \]
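One way to convince yourself that this solution satisfies the differential equation is to compare a numerical derivative of \(y(t)\) with \(k \cdot y(t)\). The values of `y0` and `k`, the time grid, and the step size `h` below are all arbitrary assumptions made for this sketch.

```r
y0 <- 100   # assumed initial value
k  <- 0.05  # assumed growth rate

y <- function(t) y0 * exp(k * t)

# Central finite difference as a numerical stand-in for dy/dt.
t_grid <- seq(0, 10, by = 1)
h <- 1e-6
dy_dt <- (y(t_grid + h) - y(t_grid - h)) / (2 * h)

# Largest gap between dy/dt and k * y(t) on the grid; should be tiny.
max_gap <- max(abs(dy_dt - k * y(t_grid)))
max_gap
```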
iris (?iris)
group_by and summarise functions from dplyr
geom_density and facet_wrap functions
\[ \int 2x \cos(x^2)\, dx \]
integrate function to validate your answer.