Big Data and Its Impact on Its Users
Student name:
Tutor name:
Class: INFS 401
Date:
Introduction
Today, digital data is largely produced by fast computers with substantial storage capacity on devices connected to the Internet. Computers, tablets, and smartphones all transmit data about their users. Every second, connected smart devices convey personal information about how they are being used, which is then compiled and stored on server systems.
Beyond smart connected devices, data is obtained from a broad range of other sources, for instance: demographic data, scientific and climate data, population and medical data, power and energy consumption data, and so on. Taken together, these data reveal the usual locations of users and their devices, their usage, their travel patterns, their interests, habits, and leisure and outdoor activities, and even the projects they are working on, as well as how machinery, infrastructure, and equipment are used. With the ever-increasing number of mobile phone and Internet users, the volume of big data is snowballing every day.
Today's world is an information society, and as the years pass we are moving toward a society based on knowledge. To keep pace, we need vast amounts of data from which the best knowledge can be extracted. An information society is one in which information plays a significant role in cultural, economic, and political life.
What is big data?
The term “big data” refers to the massive volumes of digital data produced by companies and individuals, characterized by large volume, high processing speed, and varied formats. Producing and exploiting this data requires increasingly sophisticated storage, processing, and analysis tools.
Big data is an evolution: technologies that deliver the right information at the right time from a mass of data, from which a community can compare and extract knowledge. For years, digital data has grown exponentially, and users, companies, and organizations now face challenges not only in the rapid increase in data volume but also in managing the many formats and the complexity of the data.
Being a polymorphic object, the definition of big data varies with the community and the user, depending on the service they want the data to serve. Pioneered by the giants of the web, big data presents itself as a solution that gives users and the community real-time access to information across numerous platforms on the Internet. Because big data is so broad, it cannot be defined as a single technology; rather, it describes a group of technologies and techniques. It is an emerging field, and as we learn how to implement this new paradigm and harness its value, the definition keeps changing. ()
Characteristics of big data
As we have just seen in the introduction, big data consists of massive datasets (volume) that are more diversified, including structured, semi-structured, and unstructured data (variety), and that arrive faster than before (velocity). Below is the 3V concept indicating volume, variety, and velocity:
From the diagram;
Volume: represents the amount of data that is generated, extracted, stored, and operated on within a given system. The growth shown in the diagram reflects the increase in the total amount of data generated and stored as the community and users exploit it at ever greater speed.
Variety: represents the multiplication of the types of data collected by an information system. This multiplication creates a complex web of links between these data. Variety also relates to all the possible uses associated with raw data.
Velocity: represents the frequency at which data is captured, recorded, shared, and generated. Data arrives in streams and is analyzed in real time.
How is data gathered and recorded?
A decade ago, companies held collections of data that, in some cases, would have been considered impossible to process, record, and, importantly, store. Processing such bulk quantities of data demands new, advanced methods: a classic database management system is too small to process, record, and store such information. To solve this problem, Hadoop steps in. So, what is Hadoop, and why use it to store such large quantities of data?
Hadoop, by definition, is an open-source program (more accurately, a library framework) that is collaboratively produced and freely distributed by the Apache Foundation. Effectively, it is a developer’s toolkit designed to simplify the building of big data solutions (). Currently, Hadoop is used by companies with massive data that needs to be processed and stored, including Facebook, Google, Twitter, LinkedIn, eBay, and Amazon. Hadoop is a distributed data processing and management system with many components, including YARN, HDFS, and MapReduce. HDFS provides distributed, high-performance access to data, while YARN is a core resource-management component of the Apache Hadoop framework.
Additionally, Hadoop relies on two critical server roles:
- JobTracker: there is only one JobTracker per Hadoop cluster. It receives Map/Reduce jobs to run and organizes their execution on the cluster. When you submit your code to the Hadoop cluster, it is the JobTracker’s responsibility to build an execution plan. This includes determining the nodes that contain the data to operate on, assigning tasks to nodes close to that data, monitoring running tasks, and relaunching tasks if they fail. ()
- TaskTracker: there are several per cluster. Each executes the Map/Reduce work itself (as Map and Reduce tasks with the associated input data).
The JobTracker server communicates with HDFS; it knows where the Map/Reduce program’s input data is and where the output data must be stored. It can thus optimize the distribution of tasks according to data locality.
To run a Map/Reduce program, we must:
- Write input data in HDFS
- Submit the program to the cluster’s JobTracker.
- Retrieve output data from HDFS.
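The three steps above can be sketched in miniature. The following Python snippet simulates, on a single machine, the map, shuffle, and reduce phases that Hadoop distributes across a cluster, using a word count (the classic MapReduce example). The function names and sample input are illustrative, not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle/sort: group values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Stand-in for input data written to HDFS
input_data = ["big data needs big tools", "data tools"]
result = reduce_phase(shuffle(map_phase(input_data)))
print(result)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```

On a real cluster, each mapper would process one HDFS block on the node that holds it, and the framework, not our `shuffle` function, would route each key to a reducer.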
Below is the Hadoop architecture:
What is a database?
By definition, a database is an organized collection of structured data or information, stored electronically in a computer system. A database is controlled by a DBMS (database management system). Together, the database, the DBMS, and the programs involved in processing, sorting, and storing the data are known as a database system. Just as one part of the human body requires the others to function effectively, a database system cannot function properly without each of these components. Examples of databases include relational databases, distributed databases, object-oriented databases, and data warehouses, among others. ()
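As a small illustration, SQLite, a lightweight relational DBMS bundled with Python's standard library, shows how a DBMS mediates between a program and structured data. The table and sample rows below are invented for the example.

```python
import sqlite3

# An in-memory SQLite database; the DBMS manages storage and structure
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Parameterized inserts keep the data and the SQL separate
conn.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Alice", "Accra"))
conn.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Ben", "Kumasi"))
conn.commit()

# The DBMS answers structured queries over the stored data
rows = conn.execute("SELECT name FROM users ORDER BY name").fetchall()
print(rows)  # [('Alice',), ('Ben',)]
```

The program never touches the bytes on disk directly; every read and write goes through the DBMS, which is exactly the role described above.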
How is a database used?
From the definition of big data, it is clear that massive data and information are stored and managed by databases. Databases are therefore used in many forms, for instance in business programs and financial transactions, at your bank, store, or rental agency, and in any other place in the world where you key in your information. In daily life, databases are used in numerous ways, and without them most of these activities would come to a halt.
What is DBMS?
A database management system is a system program used for creating, storing, and managing databases. One purpose of a DBMS is to ensure the end user can create, read, update, and delete information in the system at any given time. In simple terms, a DBMS is an interface between the end user and the databases or other programs, ensuring that data stays consistently organized and remains safe from corruption or loss.
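Those four end-user operations (create, read, update, delete) can be sketched against the same kind of SQLite database; the `clients` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")

# Create: add a record
conn.execute("INSERT INTO clients (name, status) VALUES ('Dana', 'active')")

# Read: query it back
row = conn.execute("SELECT status FROM clients WHERE name = 'Dana'").fetchone()
print(row[0])  # active

# Update: change the record in place
conn.execute("UPDATE clients SET status = 'inactive' WHERE name = 'Dana'")

# Delete: remove records matching a condition
conn.execute("DELETE FROM clients WHERE status = 'inactive'")
remaining = conn.execute("SELECT COUNT(*) FROM clients").fetchone()[0]
print(remaining)  # 0
```

In each case the user states *what* should happen in SQL, and the DBMS decides *how* to apply it to the stored data, keeping the database consistent throughout.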
How do databases and the DBMS factor into their use?
Managing information and data means taking great care of data so that it remains useful to us and can support all the tasks we need it for. Using a DBMS, the information collected, recorded, and stored is protected from accidental distortion or disorganization. Additionally, data saved on the server becomes more accessible, available, and integrated with your work. Users access data for various reasons, among them creating mailing lists, writing management reports, generating files of selected news stories, and identifying multiple client needs.
Conclusion
Big data is a crucial element of knowledge, and, as we have seen, without sophisticated databases for processing and storage we are prone to failure. Without a secure, fast, and reliable storage system, it is hard to process and extract data from databases, so we need to understand databases and how they function together with the DBMS. Having this knowledge improves our ability to extract, store, and retrieve data with confidence.
References
Big data analytics: challenges and applications for text, audio, video, and social media data. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), 5(1), February 2016.
Najafabadi, M. M., et al. (2015). Deep learning applications and challenges in big data analytics. Journal of Big Data.
Ohlhorst, F. J. (2012). Big Data Analytics: Turning Big Data into Big Money. Wiley.
Perspectives on Big Data and Big Data Analytics. Database Systems Journal (2014).
Riahi, A., & Riahi, S. The Big Data Revolution. IJARCSSE, 5(8).