4. R Language [reshape2 package]: Introduction, melt(), cast() Functions, and Other Tips

Blog editor (94) 2024-06-13 22:01:01

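Bilibili course video:
https://www.bilibili.com/video/BV19x411X7C6?p=1
Tencent Classroom course (the newest version, but paid; I spent 99 yuan on it 😢😢. The teaching itself is fine, but the topics are somewhat disorganized and there is some filler):
https://ke..com/course/#term_id=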

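The earlier notes follow the Bilibili videos; the [later plotting sections] follow the paid course.
I reordered the notes (into what I find a more logical flow), dropped some unimportant content, and added my own understanding.
Series notes index [continually updated]: 4. R Language [reshape2 package]: Introduction, melt(), cast() Functions, and Other Tips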

Table of Contents

1. Introduction to the reshape2 package
2. Preparation before use
   - 2.1 melt(): wide data → long data
   - 2.2 cast functions: long data → wide data
     - ① dcast()
     - ② acast()
3. Other functions
   - 3.1 `add_margins()`
   - 3.2 `recast()`
   - 3.3 `melt_check()`
   - 3.4 `colsplit()`

1. Introduction to the reshape2 package

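reshape2 is an R package developed by Hadley Wickham. As the name suggests, it reshapes data, much like smelting iron: first melt the data, then recast it into a new shape. Its core functions are melt() and the cast family (dcast() and acast()), which convert data between long and wide formats.
For example, multivariate analyses such as regression with glm() generally need wide-format data;
plotting with ggplot2, on the other hand, following Hadley's grammar of graphics, mostly calls for long-format data.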

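Wide format: each variable is in its own column. Long format: the variable names are not separate columns; they are stacked into a single column (variable), with the corresponding measurements in another column (value). Code:
airquality
names(airquality) <- tolower(names(airquality))
aql <- melt(airquality, id.vars = c("month", "day"))
View(aql)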
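The difference is easiest to see on a tiny table. A minimal sketch, assuming reshape2 is installed; the data frame `wide` and its columns `id`, `height`, and `weight` are hypothetical, made up for illustration:

```r
library(reshape2)

# Hypothetical wide data: one row per subject, one column per measurement
wide <- data.frame(id = c(1, 2), height = c(170, 165), weight = c(65, 58))

# Wide -> long: id stays as the identifier; height and weight are stacked
# into a "variable" column (the names) and a "value" column (the measurements)
long <- melt(wide, id.vars = "id")
long
```

Two subjects times two measurements give four rows in `long`, each holding one (id, variable, value) triple.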

2. Preparation before use

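# Install
install.packages("reshape2")
# Load
library(reshape2)
# The reshape2 examples mainly use the airquality dataset recommended by the official documentation.
# Convert the column names to lowercase first; the code below refers to lowercase names
# such as "month" and "day", so it would fail otherwise.
names(airquality) <- tolower(names(airquality))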

2.1 melt(): wide data → long data

Official description

As the official documentation shows, melt() "melts" an object into a data frame.
It has a separate method for each data structure:
(1) data frame: melt.data.frame()
(2) array: melt.array(), melt.matrix(), melt.table()
(3) list: melt.list()
(4) vector: melt.default()

Usage
Because data frames are the structure used most often in practice, data frames are used as the example here.

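melt(data,                        # the dataset
     id.vars,                     # vector of ID variables, as integers (positions) or strings (names); if blank, all non-measure variables are used
     measure.vars,                # vector of measure variables, as integers (positions) or strings (names); if blank, all non-ID variables are used
     variable.name = "variable",  # name of the column that stores the measure-variable names
     ...,
     na.rm = FALSE,               # logical: remove rows whose value is NA?
     value.name = "value",        # name of the column that stores the values
     factorsAsStrings = TRUE)     # logical: convert factor measure variables to character?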

melt() example:

aml1 <- melt(airquality, id.vars=c("month", "day"))

Code:
airquality

names(airquality) <- tolower(names(airquality))
aml1 <- melt(airquality, id.vars = c("month", "day"))
View(aml1)

aml2 <- melt(airquality, id.vars = c("month", "day"), value.name = "my value")
head(aml2)

The yellow highlighting marks what changed compared with the previous step.
[Figure: output of head(aml2)]

aml3 <- melt(airquality, id.vars = c("month", "day"), value.name = "my value", na.rm = TRUE)
head(aml3)

As you can see, the rows whose value was NA have been removed.
[Figure: output of head(aml3)]

aml4 <- melt(airquality, id.vars = c("month", "day"), value.name = "my value",
             na.rm = TRUE, variable.name = "my variable")
head(aml4)

The yellow highlighting marks what changed compared with the previous step.
[Figure: output of head(aml4)]

aml5 <- melt(airquality, id.vars = c("month", "day"), value.name = "my value",
             na.rm = TRUE, variable.name = "my variable", measure.vars = "temp")
head(aml5)

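The yellow highlighting marks what changed compared with the previous step. The measure.vars argument selects which columns are melted into the variable/value pair: a single column (as here) or, if left blank, every column other than id.vars.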
[Figure: output of head(aml5)]
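The converse also holds: when only measure.vars is supplied, melt() treats every remaining column as an ID variable. A small sketch that checks this on a copy of airquality, assuming reshape2 is installed (the names `aq`, `m`, and `m2` are illustrative only):

```r
library(reshape2)

# Work on a copy with lowercase column names
aq <- airquality
names(aq) <- tolower(names(aq))

# Only measure.vars is given: id.vars defaults to all other columns
m <- melt(aq, measure.vars = "temp")

# The same call with the ID variables spelled out explicitly
m2 <- melt(aq, id.vars = c("ozone", "solar.r", "wind", "month", "day"),
           measure.vars = "temp")

identical(m, m2)
```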

Next, a brief look at list data.
melt.list() melts list elements recursively.

Parameter details
[Figure: melt.list() arguments from the official documentation]

level: the list level, used to create the label columns (L1, L2, ...); the default is 1.
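To see how level drives the labels, here is a sketch on a two-level nested list, assuming reshape2 is installed; the list `nested` and its element names are hypothetical. Each recursion adds one label column, so the outer names land in L1 and the inner names in L2:

```r
library(reshape2)

# Hypothetical nested list: outer names a/b, inner names x/y/z
nested <- list(a = list(x = 1, y = 2), b = list(z = 3))

m <- melt(nested)

# One row per leaf value; L1 holds the outer name, L2 the inner name
m[, c("L1", "L2", "value")]
```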

Example:

list1 <- as.list(c(1:10, c(NA, 2, 3, 4)))
list1

[Figure: printed list1]

melt(list1)

[Figure: output of melt(list1)]

names(list1) <- letters[1:14]
melt(list1)

[Figure: output of melt(list1) with names]

If the list contains matrices:

a <- list(matrix(1:4, ncol = 2), matrix(1:6, ncol = 2))
a
melt(a)

[Figure: output of a and melt(a)]

The other melt() methods are not covered here.

2.2 cast functions: long data → wide data

Casting comes in two forms:
1️⃣ dcast(): returns a data frame
2️⃣ acast(): returns a vector, matrix, or array
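The only difference between the two is the output type. A minimal sketch, assuming reshape2 is installed; the data frame `long` is a hypothetical example with one value per (id, variable) pair:

```r
library(reshape2)

# Hypothetical long data: exactly one value per (id, variable) pair
long <- data.frame(id = c(1, 1, 2, 2),
                   variable = c("x", "y", "x", "y"),
                   value = c(10, 20, 30, 40))

d <- dcast(long, id ~ variable)  # data frame: an id column plus columns x and y
a <- acast(long, id ~ variable)  # matrix: the ids become row names

str(d)
a
```

dcast() keeps the row variables as ordinary columns, which suits further data-frame work; acast() moves them into dimnames, which suits matrix algebra.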

① The dcast() function

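dcast(data,                       # the dataset (long format)
      formula,                    # casting formula: x_variable + x_2 ~ y_variable + y_2 ~ z_variable ~ …; "..." stands for all other variables not used in the formula, and "." stands for no variable
      fun.aggregate = NULL,       # aggregation function such as mean or sum; needed when the variables do not identify a single observation for each output cell
      ...,                        # further arguments passed to fun.aggregate
      margins = NULL,             # vector of variable names (may include "grand_col" and "grand_row") to compute margins for, or TRUE for all margins; variables that cannot be margined over are silently dropped
      subset = NULL,              # quoted expression used to subset the data before reshaping
      fill = NULL,                # value used to fill in missing cells
      drop = TRUE,                # drop or keep missing combinations?
      value.var = guess_value(data))  # name of the column holding the values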
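fun.aggregate and fill often work together. A sketch, assuming reshape2 is installed, on a hypothetical long data frame in which one output cell has two values (id 1, x) and another has none (id 2, y):

```r
library(reshape2)

# Hypothetical long data: (1, "x") appears twice; (2, "y") never appears
long <- data.frame(id = c(1, 1, 1, 2),
                   variable = c("x", "x", "y", "x"),
                   value = c(1, 2, 5, 7))

# sum collapses the duplicated cell (1 + 2 = 3);
# fill = 0 replaces the missing (2, "y") cell instead of leaving NA
wide <- dcast(long, id ~ variable, fun.aggregate = sum, fill = 0)
wide
```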

Example:

# First convert the airquality data frame to long format
data <- melt(airquality, id = c("month", "day"), na.rm = TRUE)

dcs1 <- dcast(data, month + day ~ variable)
head(dcs1)

dcs1 contains the same data as airquality, except that month and day are moved to the first two columns.

[Figure: output of head(dcs1)]

dcs2 <- dcast(data, month ~ variable, mean)  # average effect of month
dcs2

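This collapses the day column: all days in a month are aggregated into a single row, and mean specifies that the aggregation takes the average.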
[Figure: dcs2]
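One way to convince yourself what dcs2 contains is to recompute a column with base R. A sketch, assuming reshape2 is installed; `aq`, `molten`, and `base_ozone` are illustrative names. Because na.rm = TRUE drops the NA rows before casting, the monthly ozone mean should match mean(..., na.rm = TRUE):

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))
molten <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

dcs2 <- dcast(molten, month ~ variable, mean)

# Base-R cross-check for one column: mean ozone per month, NAs excluded
base_ozone <- tapply(aq$ozone, aq$month, mean, na.rm = TRUE)
all.equal(dcs2$ozone, as.vector(base_ozone))
```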

dcs3 <- dcast(data, month ~ variable, mean, margins = c("month", "variable"))
dcs3

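The highlighted parts show the difference from the previous result: margins adds an "(all)" row and an "(all)" column holding the means over all months and over all variables.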
[Figure: dcs3]
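Margins are easier to check on numbers small enough to add by hand. A sketch, assuming reshape2 is installed and that margin cells are labelled "(all)"; the data frame `long` is hypothetical. Each margin cell applies fun.aggregate (here sum) to every value it covers:

```r
library(reshape2)

# Hypothetical long data: groups a/b, variables x/y
long <- data.frame(g = c("a", "a", "b", "b"),
                   variable = c("x", "y", "x", "y"),
                   value = c(1, 2, 3, 4))

# margins = TRUE: one "(all)" row (over g) and one "(all)" column (over variable)
res <- dcast(long, g ~ variable, sum, margins = TRUE)
res
# Row a: 1 + 2 = 3 in its (all) column; the (all)/(all) corner is 1 + 2 + 3 + 4 = 10
```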

② The acast() function

Usage is similar to dcast(); acast() is illustrated here with the ChickWeight dataset.

Preparation:

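# Load the dataset
data(ChickWeight)
head(ChickWeight)
#   weight Time Chick Diet
# 1     42    0     1    1
# 2     51    2     1    1
# 3     59    4     1    1
# 4     64    6     1    1
# 5     76    8     1    1
# 6     93   10     1    1
# Convert the column names to lowercase first
names(ChickWeight) <- tolower(names(ChickWeight))
# Build the long dataset: time, chick and diet (columns 2:4) are the ID variables
chick <- melt(ChickWeight, id = 2:4, na.rm = TRUE)  # adds a variable column; see the wide/long description above
head(chick)  # first 6 rows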

Values of time: 0 2 4 6 8 10 12 14 16 18 20 21
Values of chick: 1 2 3 4 5 …50
Values of diet: 1 2 3 4
[Figure: output of head(chick)]

Usage:

ac1 <- acast(chick, diet ~ time, mean)  # average effect of diet & time
ac1

ac1 is a matrix with diet as rows and time as columns; each cell is the mean weight.
[Figure: ac1]
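The same matrix can be rebuilt with base R's tapply(), which is a handy sanity check. A sketch, assuming reshape2 is installed; `cw` and `base_means` are illustrative names, and ChickWeight is converted with as.data.frame() here only to work on a plain data frame:

```r
library(reshape2)

cw <- as.data.frame(ChickWeight)
names(cw) <- tolower(names(cw))
chick <- melt(cw, id.vars = c("time", "chick", "diet"), na.rm = TRUE)

ac1 <- acast(chick, diet ~ time, mean)

# Base-R cross-check: mean weight for every diet x time combination
base_means <- tapply(cw$weight, list(cw$diet, cw$time), mean)
all.equal(ac1, base_means, check.attributes = FALSE)
```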

ac2 <- acast(chick, time ~ diet, length)
head(ac2)

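ac2 gives, for each time point (days of age), the number of observations under each diet: length counts the values that fall into each cell.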
[Figure: output of head(ac2)]
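Since length simply counts rows per cell, ac2 should agree with a plain contingency table. A sketch, assuming reshape2 is installed (`cw` and `counts` are illustrative names):

```r
library(reshape2)

cw <- as.data.frame(ChickWeight)
names(cw) <- tolower(names(cw))
chick <- melt(cw, id.vars = c("time", "chick", "diet"), na.rm = TRUE)

ac2 <- acast(chick, time ~ diet, length)

# The same counts from base R: observations per time x diet
counts <- table(cw$time, cw$diet)
all(ac2 == unclass(counts))
```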

library(plyr)  # provides .(), which quotes the variables used in subset
ac3 <- acast(chick, chick ~ time, mean, subset = .(time < 10 & chick < 20))
ac3

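The subset argument filters the data before casting. Note that in ChickWeight, chick is an ordered factor, so chick < 20 compares positions in the level order rather than numeric chick IDs.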
[Figure: ac3]
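If the intent is "chicks numbered below 20", one option is to filter on the numeric ID before casting. A sketch under that assumption, with reshape2 installed; `chick_id`, `small`, and `ac3_num` are hypothetical names, and this base-R filtering step is a swapped-in alternative to the subset argument, not part of the original example:

```r
library(reshape2)

cw <- as.data.frame(ChickWeight)
names(cw) <- tolower(names(cw))
chick <- melt(cw, id.vars = c("time", "chick", "diet"), na.rm = TRUE)

# chick stays an ordered factor after melt(), so "<" compares level positions
is.ordered(chick$chick)

# Filter on the numeric chick ID explicitly, then cast as before
chick_id <- as.integer(as.character(chick$chick))
small <- chick[chick$time < 10 & chick_id < 20, ]
ac3_num <- acast(small, chick ~ time, mean)
dim(ac3_num)  # chicks 1-19 by time points 0, 2, 4, 6, 8
```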

as4 <- acast(chick, chick ~ time ~ diet)  # produces a three-dimensional array
as4

[Figure: as4]
Of course, acast() has many other uses, which are left to explore in practice.
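For the three-dimensional case, inspecting dim() is quicker than printing the whole array. A sketch, assuming reshape2 is installed: each chick is fed a single diet, so most cells are NA and only one cell per original observation is filled:

```r
library(reshape2)

cw <- as.data.frame(ChickWeight)
names(cw) <- tolower(names(cw))
chick <- melt(cw, id.vars = c("time", "chick", "diet"), na.rm = TRUE)

as4 <- acast(chick, chick ~ time ~ diet)
dim(as4)           # chicks x time points x diets
sum(!is.na(as4))   # filled cells: one per row of ChickWeight
```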

3. Other functions

3.1 The `add_margins()` function

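I have not found much practical use for this function myself, so only a simple example is given; if you know more, please leave a comment below! According to the reshape2 documentation, add_margins() adds margins to a data frame (the same margins that the margins argument of dcast() produces); row names are silently stripped, and all margining variables are converted to factors.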

data <- data.frame(a = c(1:5), b = c(6:10), c = c('a', 'b', 'c', 'd', 'e'))
rownames(data) <- c('ass', 'xxx', 'ccc', 'fff', 'rr')
data1 <- add_margins(data, vars = "c")
View(data)
View(data1)

[Figure: data and data1 in the viewer]
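A quick way to see what changed is to count rows instead of viewing them. A sketch, assuming reshape2 is installed and that the margin rows are labelled "(all)" as in dcast(); with one margining variable, the original rows should be followed by one copy of each row with c set to "(all)":

```r
library(reshape2)

data <- data.frame(a = 1:5, b = 6:10, c = c("a", "b", "c", "d", "e"))
rownames(data) <- c("ass", "xxx", "ccc", "fff", "rr")
data1 <- add_margins(data, vars = "c")

nrow(data1)                              # original rows plus their "(all)" copies
sum(as.character(data1$c) == "(all)")    # number of margin rows
rownames(data1)                          # the custom row names are gone
```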

3.2 The `recast()` function

Performs melt() and dcast() in a single step, combining the "melt" and "cast" stages.

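# french_fries is a dataset that ships with reshape2
head(french_fries)
#    time treatment subject rep potato buttery grassy rancid painty
# 61    1         1       3   1    2.9     0.0    0.0    0.0    5.5
# 25    1         1       3   2   14.0     0.0    0.0    1.1    0.0
# 62    1         1      10   1   11.0     6.4    0.0    0.0    0.0
# 26    1         1      10   2    9.9     5.9    2.9    2.2    0.0
# 63    1         1      15   1    1.2     0.1    0.0    1.1    5.1
# 27    1         1      15   2    8.8     3.0    3.6    1.5    2.3
# No fun.aggregate is given and each cell holds many values,
# so dcast() falls back to length() (with a message): the result contains counts
recast(french_fries, time ~ variable, id.var = 1:4)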

[Figure: output of recast()]
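To confirm that recast() is just melt() followed by dcast(), the two routes can be compared directly. A sketch, assuming reshape2 is installed and that extra arguments to recast() are forwarded to dcast(); here fun.aggregate = mean and na.rm = TRUE are added so the cells hold averages rather than counts:

```r
library(reshape2)

# One step: melt with the first four columns as IDs, then cast
r1 <- recast(french_fries, time ~ variable, id.var = 1:4,
             fun.aggregate = mean, na.rm = TRUE)

# Two explicit steps
molten <- melt(french_fries, id.vars = 1:4)
r2 <- dcast(molten, time ~ variable, fun.aggregate = mean, na.rm = TRUE)

all.equal(r1, r2)
```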

3.3 The `melt_check()` function

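This function checks, before melting, whether a dataset is suitable for melting, and returns the ID variables and the measure variables.
See the official documentation if you are interested.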

3.4 The `colsplit()` function

Compared with strsplit(), I find this function more capable; see the example below.

x <- c('x_1', 'a_1', 'z_1')
strsplit(x, split = '_', fixed = TRUE)
# [[1]]
# [1] "x" "1"
# [[2]]
# [1] "a" "1"
# [[3]]
# [1] "z" "1"
colsplit(string = x, pattern = "_", names = c('str', 'num'))
#   str num
# 1   x   1
# 2   a   1
# 3   z   1

After this example, you may well prefer colsplit(): a single call returns a data frame with named columns. That said, strsplit() has its own conveniences.
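The extra work that colsplit() saves becomes visible if you build the same table from strsplit() by hand. A sketch, assuming reshape2 is installed; `parts`, `manual`, and `auto` are illustrative names. colsplit() also runs type.convert() on each piece, so the num column comes back numeric rather than character:

```r
library(reshape2)

x <- c("x_1", "a_1", "z_1")

# strsplit(): a list of character vectors; binding and type conversion are manual
parts <- strsplit(x, split = "_", fixed = TRUE)
manual <- data.frame(str = sapply(parts, `[`, 1),
                     num = as.integer(sapply(parts, `[`, 2)))

# colsplit(): a typed data frame in one call
auto <- colsplit(string = x, pattern = "_", names = c("str", "num"))
str(auto)
```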

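In addition, reshape2 provides parse_formula(), whose main job is to convert cast formulas into an internal form, and three datasets: french_fries, smiths, and tips. For details, read the official documentation or run ?smiths.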

THE END
