METHODOLOGY OF BIG DATA SCIENCE ALONG WITH
SMART CITIES AND INTERNET OF THINGS
Er. Prakash Chandra Das
Balasore College of Engineering and Technology,
Balasore, Odisha, India
*Corresponding Author E-Mail:[email protected]
This paper considers the Internet of Things (IoT) for cities, offering examples of IoT-powered 21st-century smart cities that implement their own IoT-driven services to improve the quality of life of their people through measures that promote an eco-friendly, sustainable environment. The potential benefits as well as the challenges associated with IoT for cities are discussed. Much of the ‘big data’ continuously generated by IoT sensors, devices, systems and services is geo-tagged or geo-located. The importance of having robust, intelligent geospatial analytics systems in place to process and make sense of such data in real time therefore cannot be overestimated. The journey of data from valueless to valuable has been made possible by powerful analytics tools and processing platforms. Organizations have realized the potential of data, and they are looking beyond traditional relational databases to unstructured and semi-structured data generated from heterogeneous sources. With the numerous devices and sensors surrounding our ecosystem, IoT has become a reality, and with the use of data science, IoT analytics has become a tremendous opportunity to obtain valuable insights. However, despite the various benefits of IoT analytics, organizations are apprehensive about the dark side of IoT, such as security and privacy concerns. In this research, we discuss the opportunities and concerns of IoT analytics. Moreover, we propose a generic data science methodology for IoT data analytics named Plan, Collect and Analytics for Internet-of-Things (PCA-IoT). The proposed methodology could be applied in IoT scenarios to perform data analytics for effective and efficient decision-making.
Keywords: Internet-of-Things, Big Data science, Smart cities, Environmental Sustainability
I. IOT-POWERED SMART CITIES OF THE 21ST CENTURY: PROMISES AND CHALLENGES
As the Internet and other relevant technologies continued to develop and mature over the first decade of the 21st century, a number of solutions from market giants such as Cisco, IBM and others emerged that made IoT a feasible option for a good number of modern cities and metropolises today. Indeed, IoT is often perceived as a major enabler for the ‘smart cities’ of the present and future. IoT-powered smart cities aim at improving the quality of life of their populations in a variety of ways, including through measures that promote eco-friendly, sustainable environments and the delivery of ‘connected health/care’ services to citizens at home and on the move.
There was a time when data communication between human beings was a challenge, but nowadays, owing to revolutionary developments in standards and network protocols, communication and data exchange are possible even among devices and sensors. These devices can represent literally anything in our ecosystem, such as wearable accessories, t-shirts, automobiles, keychains, sphygmomanometers, chairs, game consoles, air-conditioners, refrigerators, projectors, boilers, smartphones, plants, animals, application platforms, human beings, and bots ‘connected with smart sensors’, to name a few. The communication and data generated by these devices fall under the Internet-of-Things (IoT). Kevin Ashton coined the term IoT in 1999, and its advancement has been closely tied to the advancement of internet technology. Gartner forecast that there would be 25 billion internet-connected wired and wireless devices by 2020, and that those devices would generate data that could be collected, prepared and analyzed to make intelligent decisions. IoT platforms have been deployed in various domains including healthcare, agriculture, the military, food processing, energy, security surveillance, and environmental monitoring. For example, IoT applications already serve the community in weather forecasting and in monitoring the health and well-being of individuals. The data generated in an IoT environment are processed instantly to enhance the effectiveness and improve the efficiency of the entire service domain. Using IoT applications such as Lenovo smart shoes, one can track and monitor fitness data. Furthermore, electrical appliances including refrigerators and washing machines can be controlled remotely using IoT, and surveillance cameras installed for security purposes can be monitored remotely.
Since data plays an integral role in an IoT environment, IoT data can be regarded as valuable when it is treated effectively using state-of-the-art data science methodology, tools, algorithms and techniques, and as mere dust when it is analyzed improperly or inappropriately. An IoT system should be able to gather raw data from various network sources and analyze it to produce knowledge. The field of data science could make IoT platforms more intelligent. Data science is a mixture of diverse scientific domains. It uses techniques such as data mining, machine learning and big data analytics to identify new insights and patterns in data. A better understanding of IoT data therefore leads to effective results that could benefit business processes. However, IoT has its limitations, because IoT devices generate and collect a huge amount of personal data whose management poses severe legal and ethical issues related to security and privacy. The objective of this paper is to highlight the role of data science in IoT. In order to contribute to the domain of data science and IoT, we propose a data science methodology. The methodology will assist data scientists in performing an accurate analysis of telemetry to obtain effective insights and make smart decisions. This paper is structured as follows: examples of smart cities are presented first, then the relationship between IoT and data science and the opportunities they offer are discussed, followed by security and privacy concerns, the proposed methodology, and the future scope of data science and IoT.
II. EXAMPLES OF SMART CITY IOT APPLICATIONS AND SERVICES
Examples of smart cities are found in Bhubaneswar (India), Masdar (United Arab Emirates), South Korea, over a dozen cities in China, many more cities in Europe (see also the Smart Santander project), and the rest of the world. In the UK, Bristol City Council is developing a smart city service within the SPHERE project to monitor the health and well-being of people living at home. Bristol also has a more extensive smart city agenda that goes beyond home telehealthcare services. Moving north, Glasgow (Scotland) is similarly running an ambitious £24 million (GBP) programme to demonstrate how technology can make life in the city smarter, safer and more sustainable. Love Clean Streets is a UK app that enables Internet-connected citizens to use their mobile phones’ built-in GPS (Global Positioning System) and camera to document and directly report to their local authorities any environmental or neighborhood issues or crimes they might come across while travelling in the city.
Barcelona (Spain) and the ‘Internet of Everything’
Cisco calls its own “version” of IoT the ‘Internet of Everything’. Barcelona, the capital city of the autonomous community of Catalonia in Spain, teamed up with Cisco to deploy city-wide IoT systems and services to better serve its citizens and visitors. The ‘Internet of Everything’ acts as the backbone around which technological initiatives are being undertaken in Barcelona, rather than carrying out projects in silos. A 500 km long underground fibre network is being installed progressively as the city carries out routine maintenance on its roads and other underground services, which helps reduce installation costs significantly. Barcelona’s smart bus stops are connected to the city’s fibre network. They display real-time bus timetables, tourist information and digital advertising, offer USB (Universal Serial Bus) charging sockets for mobile devices such as smartphones and tablets, and act as free WiFi hotspots, allowing people to connect to the Internet using their mobile devices while waiting for a bus. The city’s smart parking spots are also connected to Barcelona’s WiFi network. They detect the presence of cars through a combination of light and metal detectors, but do not currently work with motorcycles. Online searching and payment for the smart parking spots is possible using dedicated smartphone apps. A city-wide network of sensors provides valuable real-time information on the flow of citizens, noise and other forms of environmental pollution, as well as traffic and weather conditions.
III. THE IOT AND DATA SCIENCE
Data are generated by IoT devices such as RFIDs, sensors, satellites, business transactions, actuators (such as machines/equipment fitted with sensors and deployed for mining, oil exploration, or manufacturing operations), lab instruments (e.g., a high energy physics synchrotron), smart consumer appliances (TV, phone, etc.), and social media as well as clickstreams. Fig. 1 illustrates the landscape of IoT and data science, in which various applications such as smart transportation, smart home and smart grid generate data using embedded sensors and objects. These data are transferred via networks and stored in the cloud for processing using numerous big data technologies. Data scientists use big data analytics (BDA) applications with well-defined data science methods to analyze volumes of structured and unstructured data with various characteristics generated by IoT devices.
BDA is used to extract information that assists in identifying trends, discovering correlations, predicting patterns and making effective decisions. However, since IoT data is mostly collected from sensors, it differs from ordinary big data in characteristics such as extreme noise, heterogeneity, and rapid evolution. By 2030, the number of sensors is expected to increase by 1 trillion, which would further swell the volume of big data.
IV. BIG OPPORTUNITIES OF IOT AND DATA SCIENCE
IoT is one of the most vital domains of next-generation technology and is attracting wide attention from industry. IoT technologies offer enhanced data collection, enable real-time responses, improve the access to and control of devices, increase efficiency and productivity, and connect technologies. IoT can be considered a deployment of smart devices that use data and connectivity. The devices are connected and communicate with each other, and IoT technologies integrate the data collected from the devices with customer support systems, vendor-managed inventory systems, business intelligence applications, and business analytics tools. The integrated IoT devices rapidly produce a huge amount of data.
Hence, data science can play a substantial role in IoT by extracting useful information for pattern recognition, trend prediction, and decision-making. The following are some of the opportunities in which IoT and data science together can deliver more benefits for industry and academia.
V. BIG IOT DATA AND BUSINESS ANALYTICS
An enormous volume of data is generated by the actuators and sensors embedded in IoT machines and devices. This data can be fed into business analytics and intelligence tools to improve the accuracy of decision-making outcomes. Analyzing market trends and conditions as well as customer behaviors can help business organizations detect and solve their business issues and increase their customers’ satisfaction. Business analytics technologies can be integrated with IoT devices such as wearable health monitoring sensors. This integration provides real-time decision-making possibilities at the source of the data. For instance, Humana’s Healthsense eNeighbor® remote monitoring system uses in-home sensors to report changes in the normal activities of its members. Health data collected through such sensors and monitoring systems gives healthcare providers the opportunity to analyze the collected data and monitor patients far more regularly and efficiently.
VI. MONITORING AND CONTROL SYSTEM OF IOT
Monitoring environmental conditions, the level of energy consumption, and even the performance of equipment requires IoT technologies to collect data from the available sources and data science to extract useful information, so that automated controllers and managers can monitor the performance of and changes in the related objects. Advanced technologies such as smart grids and smart metering offer higher productivity and lower costs by exposing operational patterns, optimizing operations and predicting future changes and trends. One of the well-known IoT monitoring and control systems is smart home technology, whose main aims are to save energy and to protect family and property. For instance, the Verizon Home Monitoring and Control network developed remote control applications for home automation using a special wireless communications technology. Users of the applications can monitor and control IoT-enabled devices via a smartphone, tablet or computer. They can control home appliances and the climate, adjust the lights, lock and unlock the doors, operate cameras, manage security systems, etc. The applications also send event notifications to the users automatically. None of these functionalities would be possible without analyzing the data received from IoT devices. The same story is unfolding in smart cars, where IoT technologies are used to monitor and control various parts of the vehicle.
VII. COLLABORATION AND INFORMATION SHARING OF IOT
Different types of information sharing can occur using IoT technologies. They can be categorized as human-to-human, human-to-things, things-to-human, and things-to-things. For example, in the human-to-human category, communication and information sharing commonly occur when a manager assigns a task to staff using IoT-enabled mobile devices. When alerts from sensors embedded in a machine are sent to the person in charge to inform them about an event, such as a drop in the temperature of the machine, a things-to-human type of information sharing takes place. The user can then send a command to the system in reaction to the alert, which is a human-to-things type of collaboration. Sending raw information from a complex machine to an ordinary user may lead to a wrong interpretation, so the data collected from IoT-enabled devices must be analyzed before proper action is taken.
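The things-to-human and human-to-things exchange described above can be sketched as follows. This is only an illustration under stated assumptions and not part of the original study: the names `check_temperature` and `handle_command`, the 60.0 degree threshold, the machine identifier, and the set of allowed commands are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    machine_id: str
    message: str


def check_temperature(machine_id: str, reading_c: float,
                      min_temp_c: float = 60.0) -> Optional[Alert]:
    """Things-to-human: turn a raw reading into an interpreted alert.

    min_temp_c is an assumed threshold chosen only for illustration.
    """
    if reading_c < min_temp_c:
        return Alert(
            machine_id,
            f"Temperature dropped to {reading_c:.1f} C "
            f"(minimum {min_temp_c:.1f} C)",
        )
    return None  # reading is normal, so nobody needs to be notified


def handle_command(alert: Alert, command: str) -> str:
    """Human-to-things: the person in charge reacts to the alert."""
    allowed = {"restart_heater", "shut_down", "ignore"}  # hypothetical commands
    if command not in allowed:
        raise ValueError(f"unknown command: {command}")
    return f"{alert.machine_id}: executing {command}"


alert = check_temperature("press-07", 48.5)
if alert is not None:
    print(alert.message)
    print(handle_command(alert, "restart_heater"))
```

The user receives an interpreted message rather than the raw sensor value, which reflects the point that raw machine data should be analyzed before it reaches an ordinary user.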
VIII. IOT FOR E-COMMERCE
IoT for e-commerce platforms delivers intelligent insights that lead to new business outcomes. The future of retail is claimed to be e-commerce, and the shift to online shopping and marketing is attracting customers by offering them more benefits. Hence, it is necessary for retailers to adjust their business strategies and embed new technologies such as IoT into their systems. Certainly, IoT and big data play a key role in this ongoing technological disruption. The generated data need to be analyzed in order to come up with new solutions that improve the retailers’ business and increase their annual profit. At the same time, retailers should not underestimate the contribution their data can make to a customized and improved shopping experience for users, which in turn brings further benefits.
IX. SMART LEARNING
Activity and behavioral data can be collected from digital sources using IoT devices on various platforms such as social media and online shopping systems. These web-based behavioral data are recorded in different forms, such as transactional purchase information or cookie data. IoT devices can observe consumers’ habits, preferences, tendencies, and environments using data science. These IoT-enabled devices can learn from the patterns and outcomes extracted by the analytical processes that data science applies to IoT data. This offers markets, providers, and websites opportunities to learn more about consumers’ needs and interests. The learning process is based on consumers’ behaviors in the physical world as opposed to the strictly online world. During the COVID-19 pandemic in 2020, meetings and learning activities moved to online mode.
X. IOT: SMART HEALTHCARE
Internet of Things technology has attracted much attention in recent years for its potential to alleviate the strain on healthcare systems caused by an aging population and a rise in chronic illness. Standardization is a key issue limiting progress in this area, and thus this paper proposes a standard model for application in future IoT healthcare systems. This survey paper then presents the state-of-the-art research relating to each area of the model, evaluating their strengths.
XI. IOT: SECURITY AND PRIVACY
Since telemetry travels via several hops in a network, a strong encryption mechanism is essential to guarantee data confidentiality, integrity, and availability. Moreover, Machine-to-Machine (M2M) communication, Cyber-Physical Systems (CPSs) and Wireless Sensor Networks (WSNs) have become essential components of IoT. Therefore, the security issues related to M2M, CPS, and WSN are growing in relation to IoT. The whole deployment architecture needs to be secured against attacks, which may obstruct the services provided by IoT and may pose a threat to privacy, integrity and confidentiality. IoT can bring opportunities for major industries such as healthcare, military, energy, and e-commerce. These opportunities could also encourage hackers to steal the wealth of data generated by IoT sensors for political or commercial gain. If the security of IoT sensors is violated, the result could be a breach of service integrity. IoT sensors can retrieve a wide range of data, including the personal information of users, because those sensors can be integrated into a wide variety of things throughout our ecosystem. Hackers could launch a variety of identity theft attacks on vulnerable IoT devices for malicious purposes.
The ownership of personal data is another concern, especially when data is collected without the awareness of the users, or with their awareness but without their knowing how the data related to them is going to be used and who remains the owner of the data. The European Commission also has doubts regarding data ownership. These challenges related to IoT security and privacy remain open areas of research. However, efforts have been reported in research and industry standards to make IoT a secure, reliable and trusted platform. Standardization organizations such as IEEE are also focused on strengthening IoT security by developing the necessary communication technologies. These technologies are imperative to enhance IoT reliability and power efficiency. IoT has an extraordinary capability for flexibility and scalability. One of the main goals is to ensure the availability of authentication mechanisms to thwart any attacks that could compromise the integrity of data and services.
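As an illustration of an authentication mechanism that protects the integrity of telemetry, the sketch below attaches a keyed hash (HMAC-SHA256) to each message using only the Python standard library. It is a minimal example under stated assumptions, not a mechanism prescribed by this paper: the pre-shared key, the field names, and the function names `sign_telemetry` and `verify_telemetry` are hypothetical. Note that an HMAC provides integrity and authenticity only; confidentiality would additionally require encryption.

```python
import hashlib
import hmac
import json

# Hypothetical pre-shared key, for illustration only.
SHARED_KEY = b"demo-key-do-not-use-in-production"


def sign_telemetry(payload: dict, key: bytes = SHARED_KEY) -> dict:
    """Serialize the payload and attach an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_telemetry(message: dict, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(
        key, message["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


message = sign_telemetry({"sensor_id": "t-1", "temp_c": 21.5})
print(verify_telemetry(message))   # True: message is untouched

# Simulate tampering on one of the network hops.
tampered = dict(message, body=message["body"].replace("21.5", "99.9"))
print(verify_telemetry(tampered))  # False: the tag no longer matches
```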
XII. METHODOLOGY AS DATA SCIENCE FOR IOT ANALYTICS
Although IoT and data science are frequently discussed research topics nowadays, to the best of our knowledge we could not find any paper with a systematic description and application of a data science approach to performing analytics on telemetry. To fill this gap, in this paper we provide a generic data science methodology named Plan, Collect and Analytics for Internet-of-Things (PCA-IoT), as shown in Fig. 7. The proposed methodology could be applied in IoT scenarios to perform data analytics for effective and efficient decision-making. PCA-IoT begins with the planning of the project, proceeds through the collection and analysis of telemetry, and ends with the reporting of analytical insights and actions. However, the entire methodology is iterative, i.e., it is possible to move backward and forward from one stage to another. For example, a data scientist could return from the analytics stage to the plan stage to modify the initial strategy after seeing the preliminary visualization results. The detailed steps of each stage of the methodology are discussed in the following sub-sections.
XIII. PLANNING AND ANALYSIS
Since every project has a certain set of goals to achieve, it is imperative for the project to start with an analysis of the requirements. All the stakeholders of an IoT project, especially those who require an analytical solution, must be involved in the planning stage to ensure that their requirements are properly understood and analyzed. The main stakeholders, such as the domain experts, must be involved in every cycle of the project to provide domain knowledge and to review and revise the progress and direction of the project, so that valuable insights and the required solutions are obtained. After the requirements have been gathered and analyzed, a data scientist can formulate preliminary analytical approaches using statistical techniques and machine learning algorithms to address the problem. With the preliminary findings, the team of data scientists, domain experts and appropriate representatives of the project sponsor can work together to decide on the most suitable analytics tools to be used, the algorithms and techniques to be applied, the type of models to be generated, and the hosting platform, such as in-house or cloud infrastructure. For instance, if the goal is to estimate the relationship between a dependent variable and one or more predictor (independent) variables, data scientists may choose to generate a regression model. In the planning stage, it is also important to identify the sources of IoT data, because telemetry generated from unknown or unreliable sources may lead to inaccurate and invalid analysis.
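To make the regression example concrete, the following sketch fits a simple linear regression by ordinary least squares using only the Python standard library. It is an illustration under stated assumptions: the function name `fit_simple_regression` and the sample data (outdoor temperature as the predictor and energy consumption as the dependent variable) are hypothetical and do not come from the study.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_simple_regression(x: Sequence[float],
                          y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical telemetry: outdoor temperature (C) and energy use (kWh).
temperature = [18.0, 22.0, 26.0, 30.0, 34.0]
energy_kwh = [10.1, 12.0, 14.2, 15.9, 18.1]

slope, intercept = fit_simple_regression(temperature, energy_kwh)
predicted = slope * 28.0 + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}, "
      f"predicted use at 28 C: {predicted:.2f} kWh")
```

In a real project the choice of model would follow from the requirements gathered at the planning stage; a single-predictor linear model is used here only because it is the simplest instance of a regression model.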
XIV. COLLECT STAGE OF IOT ANALYTICS
Due to the rapidly expanding volume and velocity of telemetry, it is practical to perform IoT analytics using third-party cloud services such as AWS IoT Core, IBM Watson IoT, and Azure IoT Hub. The gathering of telemetry can begin after the successful completion of the activities defined at the planning stage. Communication between the IoT hub and the devices, i.e., the IoT data sources, takes place via a gateway, which manages all active device connections and implements semantics for multiple protocols to ensure that devices can communicate securely and effectively using protocols such as MQTT, CoAP, WebSockets, and HTTP. In addition, the gateway can apply rules and restrictions to the incoming data using SQL-like statements. A rule can be applied to data from one or many devices. For example, the gateway may filter out and reject data from certain sensors of the IoT network, or it may accept only certain types of data from specific sensors. The gateway bridge publishes all device telemetry to the cloud, where it can then be consumed by downstream analytic systems using stream or batch processing.
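The filtering behavior of the gateway can be sketched in a few lines. The example below is a plain Python stand-in for such rules and does not reproduce the SQL-like rule syntax of any particular cloud service; the `Rule` class, the `apply_rules` function, the sensor identifiers and the message fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Telemetry = Dict[str, object]


@dataclass
class Rule:
    name: str
    accept: Callable[[Telemetry], bool]  # returns True to keep a message


def apply_rules(messages: Iterable[Telemetry],
                rules: List[Rule]) -> List[Telemetry]:
    """Keep only the messages accepted by every rule."""
    return [m for m in messages if all(r.accept(m) for r in rules)]


BLOCKED_SENSORS = {"sensor-13"}  # hypothetical unreliable source

rules = [
    # Reject data from certain sensors of the IoT network.
    Rule("drop blocked sensors",
         lambda m: m.get("sensor_id") not in BLOCKED_SENSORS),
    # Accept only a certain type of data.
    Rule("temperature only",
         lambda m: m.get("type") == "temperature"),
]

incoming = [
    {"sensor_id": "sensor-01", "type": "temperature", "value": 21.4},
    {"sensor_id": "sensor-13", "type": "temperature", "value": 85.0},
    {"sensor_id": "sensor-02", "type": "humidity", "value": 40.0},
]

for message in apply_rules(incoming, rules):
    print(message)
```

Only the first message passes both rules; the second comes from a blocked sensor and the third is of a type the gateway does not accept.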
XV. ANALYTICS OF DATA SCIENCE AND IOT
Batch processing is suitable, for example, for processing all the transactions that a major financial firm has performed in a week. Stream processing, by contrast, is appropriate when real-time analytics is required, as in fraud detection and live application monitoring. In an IoT environment, both types of processing can be useful, depending on the requirements and nature of the project and especially on the type of analytics required. Batch processing fits best in situations where real-time analytics results are not the priority and more importance is given to processing large volumes of data than to obtaining fast results. Since sensors can generate inappropriate or null data values, the next step is to pre-process the telemetry using typical data science approaches such as removing duplicates, filtering unwanted outliers, and handling missing data. Unlike the manual data processing of traditional data analytics systems, data processing in an IoT analytics environment is fast and automated by means of well-defined program code. If, during the analysis, data scientists find that the data needs further pre-processing, they return to the pre-processing step before continuing with the analysis. The prepared data is then analyzed using various machine learning and statistical techniques to generate models, following the steps decided in the project plan. Finally, the models are visualized to perform various kinds of analytics, such as descriptive, predictive and prescriptive analytics. Thanks to real-time analytics, organizations and individuals can make efficient as well as effective decisions using telemetry.
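The pre-processing steps named above (removing duplicates, handling missing data, and filtering outliers) can be sketched as follows. This is one possible approach under stated assumptions rather than the procedure used in the study: missing values are simply dropped, outliers are detected with the median absolute deviation, and the function name `preprocess`, the field names and the cut-off `max_dev=5.0` are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional

Reading = Dict[str, Optional[object]]


def preprocess(readings: List[Reading], max_dev: float = 5.0) -> List[Reading]:
    """Deduplicate, drop missing values, and filter outliers."""
    # 1. Remove duplicates: same sensor and timestamp counts as a repeat.
    seen = set()
    unique = []
    for r in readings:
        key = (r["sensor_id"], r["timestamp"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 2. Handle missing data (here: drop readings whose value is None).
    present = [r for r in unique if r["value"] is not None]
    if not present:
        return []

    # 3. Filter outliers using the median absolute deviation (MAD).
    values = [r["value"] for r in present]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return present  # no spread, so nothing can be flagged
    return [r for r in present if abs(r["value"] - med) / mad <= max_dev]


raw = [
    {"sensor_id": "s1", "timestamp": 1, "value": 20.0},
    {"sensor_id": "s1", "timestamp": 1, "value": 20.0},   # duplicate
    {"sensor_id": "s1", "timestamp": 2, "value": None},   # missing
    {"sensor_id": "s1", "timestamp": 3, "value": 21.0},
    {"sensor_id": "s1", "timestamp": 4, "value": 19.5},
    {"sensor_id": "s1", "timestamp": 5, "value": 500.0},  # outlier
    {"sensor_id": "s1", "timestamp": 6, "value": 20.5},
]

clean = preprocess(raw)
print([r["value"] for r in clean])  # [20.0, 21.0, 19.5, 20.5]
```

Dropping missing values is the simplest option; imputation or interpolation may be preferable when gaps are frequent, and that choice belongs to the planning stage.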
XVI. FUTURE SCOPE OF DATA SCIENCE AND IOT
IoT has both a bright and a dark side. The research world is currently focused on eliminating the concerns related to IoT in order to make it a trusted, reliable and secure platform for obtaining valuable insights. Research in the field is increasing rapidly, and we can predict that it will continue to do so, because data is of high value to organizations and IoT is the major source for gathering and generating large volumes and varieties of data. The relationship between IoT and data science is a lasting one, because analytical approaches are required to convert data into diamonds. There are several opportunities to contribute to the areas of IoT and data science. New systems are required to guarantee the security and privacy of users’ data and the trustworthiness of IoT sensors. Given the developments in the world of technology, there is a need to establish new policies, standards, and guidelines for the entire IoT ecosystem in order to gain the trust of all users and to make IoT analytics an opportunity for all types of organizations.
From the above discussion, we see many similarities but also significant differences when it comes to data science for IoT. Some differences are obvious (for example, the use of hardware and radio networks). In our view, however, the most exciting development is that IoT powers new greenfield domains such as drones, self-driving cars, enterprise AI, cloud robotics, smart cities, home appliances and, more recently, IoT-based equipment deployed in hospitals during the COVID-19 pandemic in 2020, among many others.
The author (PCDAS) thanks the Department of Electronics and Communication Engineering for permission to carry out the research work. The author also acknowledges the necessary facilities provided by Balasore College of Engineering and Technology (Biju Patnaik University of Technology), Odisha, India.