ETRI-Activity3D - Introduction
Provider: Dohyung Kim
Created: 2020-02-19 (updated 2020-10-14)
Category: Human Understanding
License Agreement: not required
Description

ETRI-Activity3D

A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly

 

 

Background

 

     As part of the solutions to an aging society, research on elder care robots has been actively carried out around the world. For robots to understand the elderly and provide context-sensitive services, robotic intelligence technologies that can identify various human attributes are essential. Among them, action recognition is a fundamental technology for understanding the intentions behind human behavior and grasping the daily life patterns of human users.

     The massive success of the deep learning approach has enabled rapid improvement in many computer vision tasks. Efforts to create large-scale datasets that accelerate deep learning studies have been underway in extensive research areas, including human action understanding. However, despite the large number of publicly available datasets, there is a serious lack of data suitable for robots to recognize the daily activities of human users. Most datasets give no consideration to the robotic environment in which humans and robots live together. Furthermore, no large-scale visual dataset deals with the everyday behavior of the elderly at all. The absence of datasets centered on robots and humans has been a serious impediment to robot intelligence research, especially for elder care robots.



Introduction



[Figure 1] Sample frames from daily actions in the proposed dataset, displayed together with the corresponding depth maps and skeleton information obtained from Kinect v2 sensors. Actions (from left to right): eating food with a fork, vacuuming the floor, spreading bedding, washing a towel by hands, hanging out laundry, handshaking.

 

     To resolve this shortage of datasets, we collect and release ETRI-Activity3D, the first large-scale RGB-D dataset of the daily activities of the elderly for human care robots.

     The dataset is collected with Kinect v2 sensors and consists of three synchronized data modalities: RGB videos, depth maps, and skeleton sequences. To capture the visual data, we recruit 50 elderly people ranging in age from 64 to 88, which leads to realistic intra-class variation in the actions. In addition, we acquire data from 50 young people in their 20s in the same way as from the older subjects. In total, 112,620 3D data samples are obtained.

     We hope that the proposed dataset, which comprehensively considers the elderly, the robots, and the environment in which they interact, can contribute to the advancement of robot intelligence.


Number of samples:        112,620
Number of action classes: 55
Number of subjects:       100 (50 elderly people, 50 young people)
Collection environment:   residential environment in an apartment
Data modalities:          RGB videos, depth map frames, body index frames, 3D skeletal data
Sensor:                   Kinect v2


Unique characteristics and advantages of the proposed dataset over existing ones are as follows.


1) A new visual dataset based on observations of the daily activities of the elderly

2) A realistic dataset considering the service situation of human care robots

3) A large-scale RGB-D action recognition dataset that overcomes the limitations of previous datasets



Action Classes

 

A close understanding of what older people actually do in their daily lives is important for determining practical action categories. We visited the homes of 53 elderly people over the age of 70 and carefully observed and documented their daily behavior from morning to night. Based on the most frequently observed behaviors, 55 action classes are defined.

ID  Action description
1   eating food with a fork
2   pouring water into a cup
3   taking medicine
4   drinking water
5   putting food in the fridge/taking food from the fridge
6   trimming vegetables
7   peeling fruit
8   using a gas stove
9   cutting vegetables on the cutting board
10  brushing teeth
11  washing hands
12  washing face
13  wiping face with a towel
14  putting on cosmetics
15  putting on lipstick
16  brushing hair
17  blow drying hair
18  putting on a jacket
19  taking off a jacket
20  putting on/taking off shoes
21  putting on/taking off glasses
22  washing the dishes
23  vacuuming the floor
24  scrubbing the floor with a rag
25  wiping off the dining table
26  rubbing up furniture
27  spreading bedding/folding bedding
28  washing a towel by hands
29  hanging out laundry
30  looking around for something
31  using a remote control
32  reading a book
33  reading a newspaper
34  handwriting
35  talking on the phone
36  playing with a mobile phone
37  using a computer
38  smoking
39  clapping
40  rubbing face with hands
41  doing freehand exercise
42  doing neck roll exercise
43  massaging a shoulder oneself
44  taking a bow
45  talking to each other
46  handshaking
47  hugging each other
48  fighting each other
49  waving a hand
50  flapping a hand up and down (beckoning)
51  pointing with a finger
52  opening the door and walking in
53  fallen on the floor
54  sitting up/standing up
55  lying down


     The resolution of the RGB videos is 1920 × 1080. Depth maps are stored frame by frame at 512 × 424. Skeleton information contains the locations of 25 body joints in 3D space for each tracked human body.
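     As a rough illustration of how these modalities might be read, here is a minimal Python sketch. The 16-bit PNG encoding of the depth frames and the 75-values-per-row CSV layout (x, y, z for each of the 25 joints per frame) are assumptions made for this example, not the documented format.

    import csv

    import cv2          # pip install opencv-python
    import numpy as np


    def load_depth_frame(path: str) -> np.ndarray:
        """Read one 512 x 424 depth map stored as a PNG (16-bit encoding assumed)."""
        depth = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # keep the raw bit depth
        if depth is None or depth.shape != (424, 512):
            raise ValueError(f"unexpected or unreadable depth frame: {path}")
        return depth


    def load_skeleton_csv(path: str) -> np.ndarray:
        """Parse a skeleton CSV into an array of shape (frames, 25, 3).

        Assumes each row holds x, y, z for all 25 joints of one frame
        (75 floats); the real column layout may differ.
        """
        frames = []
        with open(path, newline="") as f:
            for row in csv.reader(f):
                joints = np.asarray(row[:75], dtype=np.float32).reshape(25, 3)
                frames.append(joints)
        return np.stack(frames)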



Collected Data

 

Collected Data       Resolution   File Format   Size
RGB videos           1920×1080    MP4           296 GB
Depth map frames     512×424      PNG           4.08 TB
Body index frames    512×424      PNG           42.60 GB
3D skeletal data     25 joints    CSV           20.83 GB
Total                                           4.44 TB



Setup

 

     Considering the height of home robots, each shooting device is equipped with two Kinect sensors at heights of 70 cm and 120 cm, as shown in Figure 2. Four shooting devices form a group, and the eight synchronized sensors in a group capture the subject's action at the same time. Instead of placing the devices at fixed horizontal angular intervals, we place them at positions where a robot could actually appear inside the house. The distance between the sensors and the subject also varies, from 1.5 to 3.5 meters. Actions that can occur anywhere (e.g., taking medicine and talking on the phone) are shot up to five times, each time in a different place where they might occur. In this way, we provide further intra-class variation through different views and background conditions. The group and camera numbers are encoded in the filename of each video sample (see the sketch after Figure 2).





[Figure 2] Configuration of the data acquisition system.
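As a concrete illustration, the following minimal Python sketch recovers the action, subject, group, and camera indices from a file name. The pattern A###_P###_G###_C### is a hypothetical naming convention used only for this example; check the actual file names shipped with the dataset.

    import re
    from typing import NamedTuple


    class SampleInfo(NamedTuple):
        action: int   # action class, 1..55
        person: int   # subject, 1..100
        group: int    # shooting-device group
        camera: int   # sensor within the group, 1..8


    _NAME = re.compile(r"A(\d{3})_P(\d{3})_G(\d{3})_C(\d{3})")


    def parse_name(filename: str) -> SampleInfo:
        """Extract the indices encoded in a (hypothetical) sample file name."""
        m = _NAME.search(filename)
        if m is None:
            raise ValueError(f"unrecognized file name: {filename}")
        return SampleInfo(*(int(g) for g in m.groups()))


    print(parse_name("A001_P001_G001_C001.mp4"))
    # SampleInfo(action=1, person=1, group=1, camera=1)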




Publications

 

All documents and papers that report on research that uses the ETRI-Activity3D dataset should cite the following paper:


Jinhyeok Jang, Dohyung Kim, Cheonshu Park, Minsu Jang, Jaeyeon Lee, Jaehong Kim, “ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly,” arXiv:2003.01920
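For convenience, a BibTeX entry can be assembled from the fields above; the entry key and the journal field are our own formatting choices:

    @article{jang2020etriactivity3d,
      author  = {Jinhyeok Jang and Dohyung Kim and Cheonshu Park and Minsu Jang and Jaeyeon Lee and Jaehong Kim},
      title   = {{ETRI-Activity3D}: A Large-Scale {RGB-D} Dataset for Robots to Recognize Daily Activities of the Elderly},
      journal = {arXiv preprint arXiv:2003.01920},
      year    = {2020}
    }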

 



Contact


Please email dhkim008@etri.re.kr if you have any questions or comments.

 

 

Acknowledgment


The protocol and consent procedure of the data collection were approved by the Institutional Review Board (IRB) at Suwon Science College, our joint research institute.

This work was supported by the ICT R&D program of MSIP/IITP [2017-0-00162, Development of Human-care Robot Technology for Aging Society].




Datasets File(s) (Total 1 Unit)

SampleVideos
  • Provider: Dohyung Kim
  • Created: 2020-02-19
  • File: SampleVideos.zi...
  • Size: 90.9 MB