| Abstrak/Abstract |
The growing potential of big data is enhancing insights and decision-making processes, especially with the advent
of technologies like cloud computing and the Internet of Things (IoT). In Indonesia, Mobile Positioning Data (MPD) obtained
from smartphones is a valuable big data resource due to the widespread use of these devices. Although analyzing MPD offers
crucial insights into public behavior and preferences, handling it is quite challenging due to its large volume and high velocity.
Traditional Python-based approaches fall short when dealing with such large and dynamic datasets. The lack of an efficient big data
analytics solution for large-scale MPD processing prompted us to develop a distributed solution integrating Apache Spark, Hadoop,
and Docker Swarm container orchestration in a cluster of three interconnected computers. This solution was compared against
traditional Python methods run on a single computer, demonstrating its capability to execute the MPD processing stages and a use case
on datasets of up to 26 GB, which was not possible with basic Python. Moreover, the distributed approach showed faster execution
times in 4 out of 6 tested stages and was more efficient in terms of CPU, memory, and data exchange.
Consequently, this distributed solution presents a robust and efficient alternative for big data analytics, particularly for MPD, as
it has demonstrated the ability to outperform traditional methods in both time and resource efficiency. These findings
position the developed solution as a valuable case study for regions and industries facing
similar big data challenges, offering a blueprint for implementing efficient big data analytics solutions. The approach could
extend to various industries, enabling more effective analysis of public behavior and preferences using MPD, with applications
in marketing, urban planning, and public policy. |