A Novel Framework for Unique People Count from Monocular Videos


Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Doctoral

Degree

Doctor of Philosophy

Department

Department of Computing Science

Supervisor / Co-Supervisor and Their Department(s)

Examining Committee Member(s) and Their Department(s)

Citation for Previous Publication

Link to Related Item

Abstract

Counting the number of unique people in a video (i.e., counting each person only once as the person passes through the field of view (FOV)) is required in many video analytics applications, such as transit passenger and pedestrian volume counts in railway stations, malls, and road intersections, as well as in security and resource management, urban planning, advertising, and many others.

In this PhD thesis, I develop a robust algorithm to generate a unique people count from monocular videos taken from an arbitrary angle. From an application point of view, my algorithm is among the most economical, because it can work with video cameras that are already mounted. Within a region of interest (ROI) in the FOV of the camera, I compute the influx/outflux rate of people, i.e., the number of people entering or leaving the ROI per unit time. I then sum the influx/outflux rate between any two time points to estimate the number of people who entered and/or left the ROI within that time interval. I employ two well-known computer vision techniques for this purpose: Gaussian process regression (GPR) to estimate the number of people present within an ROI, and optical flow-based tracking of the boundary of the ROI.
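The summation step described above can be illustrated with a minimal sketch. It assumes the influx and outflux rates are already available as one value per frame, in people per second, and that frames are evenly spaced. The function name `count_over_interval`, the sample rates, and the frame spacing are hypothetical and are not taken from the thesis; the GPR count estimation and the ROI boundary tracking that would produce these rates are not shown.

```python
from typing import Sequence


def count_over_interval(
    rates: Sequence[float], frame_dt: float, start: int, end: int
) -> float:
    """Sum per-frame rates (people per second) over frames [start, end).

    Each rate is multiplied by the frame spacing in seconds, so the result
    is an estimated number of people crossing the ROI boundary in that
    interval. This is a plain rectangle-rule sum, chosen for illustration.
    """
    if frame_dt <= 0:
        raise ValueError("frame_dt must be positive")
    if not 0 <= start <= end <= len(rates):
        raise ValueError("start and end must satisfy 0 <= start <= end <= len(rates)")
    return sum(rate * frame_dt for rate in rates[start:end])


# Hypothetical per-frame rates; real values would come from the estimator.
influx_rate = [0.0, 2.0, 4.0, 2.0, 0.0]   # people per second entering the ROI
outflux_rate = [0.0, 0.0, 2.0, 2.0, 2.0]  # people per second leaving the ROI
frame_dt = 0.5                            # assumed seconds between frames

entered = count_over_interval(influx_rate, frame_dt, 0, len(influx_rate))
left = count_over_interval(outflux_rate, frame_dt, 0, len(outflux_rate))
print(f"entered: {entered}, left: {left}")
```

Because the count for an interval is a sum of rates, the estimate for any sub-interval is obtained by changing `start` and `end`, without reprocessing the video.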

The principal roadblock in most computer vision problems is occlusion. To avoid this bottleneck, I combine (a) the concept of influx and outflux of fluid mass from computational fluidics, (b) GPR to estimate the number of people within an ROI, and (c) ROI boundary tracking (as opposed to object or feature tracking) over a short period. Thus, the principal contribution of the thesis is to handle occlusions successfully by computing the average influx and/or outflux of people while avoiding people detection and tracking.

I validate the proposed algorithm on 19 publicly available monocular benchmark videos. Occlusions are abundant in these videos, yet the algorithm achieves more than 95% accuracy on most of them. I also extend the proposed framework beyond monocular videos and apply it to multiple views of a publicly available dataset, with about 99% accuracy.

Item Type

http://purl.org/coar/resource_type/c_46ec

Alternative

License

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en

Location

Time Period

Source