2017-08-02 20:40:27
函数式编程(functional programming),与指令性编程(imperative programming)相对,是以函数为核心的编程范式(paradigm)
在FP范式中,函数是一等公民 (像变量一样,可以被创建,被作为参数,被作为返回结果)
FP思想起源于Princeton Univ的 Alonzo Church 提出的\(\lambda\)演算
FP最早应用在Lisp语言,如今Python、JavaScript、Haskell、Erlang、Clojure、R、Mathematica等均支持FP
FP是学术界为数值计算而设计出来的,尤其适合数值计算
R在本质上是一种FP语言 (回忆一下apply家族)
一个调查问卷,采集了5个人对a-f六个问题的答案。部分答案为-9,表示漏填
要求: 将df数据集里的-9都改为NA
set.seed(1234) df <- as.data.frame(matrix(sample(c(1:5, -9), 30, replace=TRUE), ncol=6)) names(df) <- letters[1:6]; print(df)
## a b c d e f ## 1 1 4 5 -9 2 5 ## 2 4 1 4 2 2 4 ## 3 4 2 2 2 1 -9 ## 4 4 4 -9 2 1 5 ## 5 -9 4 2 2 2 1
df$a[df$a == -9] <- NA df$b[df$b == -9] <- NA df$c[df$c == -9] <- NA df$d[df$d == -9] <- NA df$e[df$e == -9] <- NA df$f[df$f == -9] <- NA
问题来了
for (col in letters[1:6]){ df[df[, col] == -9, col] <- NA }
或
for (col in letters[1:6]){ df[, col][df[, col] == -9] <- NA }
symb2na <- function(x) { x[x == -9] <- NA; x } df$a <- symb2na(df$a) df$b <- symb2na(df$b) df$c <- symb2na(df$c) df$d <- symb2na(df$d) df$e <- symb2na(df$e) df$f <- symb2na(df$f)
symb2na <- function(x) { x[x == -9] <- NA x } df[] <- lapply(df, symb2na)
或用匿名函数,连函数名都省了
df[] <- lapply(df, function(x) { x[x == -9] <- NA x })
symb2na <- function(x, symb){ x[x %in% symb] <- NA x } df[] <- lapply(df, symb2na, symb=c(-9, -99))
replaceVal <- function(x, v1, v2){ x[x %in% v1] <- v2 x } df[] <- lapply(df, replaceVal, v1=c(-9, -99), v2=9)
R是多范式的,应当灵活综合OOP和FP的优点
apply
家族%>%
匿名函数同样有头、体、环境
formals(function(x) mean(x)+c(1,-1)*sd(x))
## $x
body(function(x) mean(x)+c(1,-1)*sd(x))
## mean(x) + c(1, -1) * sd(x)
environment(function(x) mean(x)+c(1, -1)*sd(x))
## <environment: R_GlobalEnv>
匿名函数最常用于apply
家族函数
lapply(iris[,1:4], function(x) mean(x)+c(1, -1)*sd(x))
## $Sepal.Length ## [1] 6.671399 5.015267 ## ## $Sepal.Width ## [1] 3.493200 2.621467 ## ## $Petal.Length ## [1] 5.523298 1.992702 ## ## $Petal.Width ## [1] 1.9615710 0.4370957
square
和cube
都是函数闭包
power <- function(exp){ function(x) x ^ exp } square <- power(2) cube <- power(3)
power
用来生成函数闭包,叫工厂函数(function factory)
c(square(3), cube(3))
## [1] 9 27
c(power(2)(3), power(3)(3))
## [1] 9 27
square
和cube
函数定义是一样的
all.equal(square, cube)
## [1] TRUE
unenclose后则不同
library(pryr) c(unenclose(square), unenclose(cube))
## [[1]] ## function (x) ## x^2 ## ## [[2]] ## function (x) ## x^3
函数闭包会保留数据
new_counter <- function() { i <- 0 function() { i <<- i + 1 i } } counter1 <- new_counter() counter2 <- new_counter()
测一下
counter1()
## [1] 1
counter1()
## [1] 2
counter1()
## [1] 3
counter2()
## [1] 1
library(magrittr) a <- lapply(1:4, function(i) data.frame( I(rownames(mtcars)[1:5]), mtcars[1:5,i]) %>% `names<-`(c("Car", names(mtcars)[i]))) str(a, vec.len=1)
## List of 4 ## $ :'data.frame': 5 obs. of 2 variables: ## ..$ Car:Class 'AsIs' chr [1:5] "Mazda RX4" ... ## ..$ mpg: num [1:5] 21 21 ... ## $ :'data.frame': 5 obs. of 2 variables: ## ..$ Car:Class 'AsIs' chr [1:5] "Mazda RX4" ... ## ..$ cyl: num [1:5] 6 6 ... ## $ :'data.frame': 5 obs. of 2 variables: ## ..$ Car :Class 'AsIs' chr [1:5] "Mazda RX4" ... ## ..$ disp: num [1:5] 160 160 ... ## $ :'data.frame': 5 obs. of 2 variables: ## ..$ Car:Class 'AsIs' chr [1:5] "Mazda RX4" ... ## ..$ hp : num [1:5] 110 110 ...
merge(merge(merge(a[[1]], a[[2]]), a[[3]]), a[[4]])
Reduce
Reduce(merge, a)
Car mpg cyl disp hp 1 Datsun 710 22.8 4 108 93 2 Hornet 4 Drive 21.4 6 258 110 3 Hornet Sportabout 18.7 8 360 175 4 Mazda RX4 21.0 6 160 110 5 Mazda RX4 Wag 21.0 6 160 110
Map
就等于 mapply
Filter(function(df) min(df[,2]>100), a)
## [[1]] ## Car disp ## 1 Mazda RX4 160 ## 2 Mazda RX4 Wag 160 ## 3 Datsun 710 108 ## 4 Hornet 4 Drive 258 ## 5 Hornet Sportabout 360
Find
返回第一个满足条件的元素Position
返回第一个满足条件的元素的索引号Filter(Negate(function(df) max(df[,2]>10)), a)
## [[1]] ## Car cyl ## 1 Mazda RX4 6 ## 2 Mazda RX4 Wag 6 ## 3 Datsun 710 4 ## 4 Hornet 4 Drive 6 ## 5 Hornet Sportabout 8
set.seed(12345) f <- function(fun, ...){ stopifnot(is.function(fun)) fun(...) } f(mean, 1:10)
## [1] 5.5
matrix(f(rnorm, 6, 0, 1), nrow=2)
## [,1] [,2] [,3] ## [1,] 0.5855288 -0.1093033 0.6058875 ## [2,] 0.7094660 -0.4534972 -1.8179560
set.seed(12345) hiOrdFun <- function(fun, ...){ stopifnot(is.function(fun)) function(...) fun(...) } hiOrdFun(mean)(1:10)
## [1] 5.5
matrix(hiOrdFun(rnorm)(6, 0, 1), nrow=2)
## [,1] [,2] [,3] ## [1,] 0.5855288 -0.1093033 0.6058875 ## [2,] 0.7094660 -0.4534972 -1.8179560
将函数组成列表,结合高阶函数,可以套用apply
家族函数
求一组数据的mean、sd、fivenum:
x <- c(sample(1:100, 10), NA) list(mean(x, na.rm=TRUE), sd(x, na.rm=TRUE), fivenum(x))
## [[1]] ## [1] 47.3 ## ## [[2]] ## [1] 30.86188 ## ## [[3]] ## [1] 1.0 30.0 40.5 74.0 98.0
函数列表解决方案
f.lst <- list(mean, sd, fivenum) lapply(f.lst, f, x, na.rm=TRUE)
## [[1]] ## [1] 47.3 ## ## [[2]] ## [1] 30.86188 ## ## [[3]] ## [1] 1.0 30.0 40.5 74.0 98.0
工厂函数 + 函数列表 + 高阶函数 + 匿名函数
lst.fun <- lapply(f.lst, hiOrdFun) lapply(lst.fun, function(f) f(x, na.rm=TRUE))
## [[1]] ## [1] 47.3 ## ## [[2]] ## [1] 30.86188 ## ## [[3]] ## [1] 1.0 30.0 40.5 74.0 98.0
power.fun <- function(exp, ...) function(...) list(...)[[1]] ^ exp f.lst <- c(square=2, cube=3, powfour=4) pow <- lapply(f.lst, power.fun)
c(pow$square(4), pow$cube(4), pow$powfour(4))
或
sapply(pow, function(f) f(4))
## square cube powfour ## 16 64 256
用list2env
将函数列表pow提取到环境中
list2env(pow, environment())
## <environment: R_GlobalEnv>
c(square(3), cube(3), powfour(3))
## [1] 9 27 81
即实现了函数的批量生成
Reduce
的例子中,合并四个列表也可以用递归法
f <- function(dfs, ...) { if (length(dfs) == 2) merge(dfs[[1]], dfs[[2]], ...) else merge(dfs[[1]], f(dfs[-1]), ...)} print(f(a))
## Car mpg cyl disp hp ## 1 Datsun 710 22.8 4 108 93 ## 2 Hornet 4 Drive 21.4 6 258 110 ## 3 Hornet Sportabout 18.7 8 360 175 ## 4 Mazda RX4 21.0 6 160 110 ## 5 Mazda RX4 Wag 21.0 6 160 110
Thank you!