"BIG DATA IS THE TERM FOR A COLLECTION OF DATA SETS SO LARGE AND COMPLEX THAT IT BECOMES DIFFICULT TO PROCESS USING ON-HAND DATABASE SYSTEM TOOLS OR TRADITIONAL DATA PROCESSING APPLICATIONS."
EVOLUTION OF DATA:
1⇒EVOLUTION OF TECHNOLOGY
2⇒IOT
3⇒SOCIAL MEDIA
4⇒OTHER FACTORS
3 V'S OF BIG DATA:
💨VOLUME: BY 2020, THE ACCUMULATED DIGITAL UNIVERSE OF DATA WAS FORECAST TO GROW FROM 4.4 ZETTABYTES TO 44 ZETTABYTES, OR 44 TRILLION GIGABYTES.
💨VARIETY: DIFFERENT KINDS OF DATA ARE BEING GENERATED FROM VARIOUS SOURCES.
💨VELOCITY: DATA IS BEING GENERATED AT AN EVER-INCREASING RATE.
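The volume figure above can be checked with a little arithmetic. This is a minimal sketch that assumes decimal (SI) units, where 1 zettabyte is 10^21 bytes and 1 gigabyte is 10^9 bytes; the function name is only illustrative.

```python
# Decimal (SI) units, assumed for this check.
ZETTABYTE = 10**21   # bytes in one zettabyte
GIGABYTE = 10**9     # bytes in one gigabyte

def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * ZETTABYTE // GIGABYTE

forecast_gb = zettabytes_to_gigabytes(44)
print(f"{forecast_gb:,} GB")  # 44 followed by 12 zeros, i.e. 44 trillion GB
```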
BIG DATA AS AN OPPORTUNITY:
Big Data Analytics
Cost reduction: Cost-effective storage systems for huge data sets
Faster and better decision making: Provides ways to analyse information quickly and make decisions
Next-generation products: Automated cars, healthcare, etc.
Improved services or products: Evolving with customer needs and satisfaction
And many more opportunities.
PROBLEMS WITH BIG DATA:
PROBLEM 1: STORING EXPONENTIALLY GROWING HUGE DATA-SETS.
PROBLEM 2: PROCESSING DATA HAVING COMPLEX STRUCTURE.
PROBLEM 3: PROCESSING DATA FASTER.
HADOOP AS A SOLUTION:
"HADOOP IS A FRAMEWORK THAT ALLOWS US TO STORE AND PROCESS DATA SETS IN A PARALLEL AND DISTRIBUTED FASHION."
HDFS (STORAGE)
ALLOWS US TO STORE ANY KIND OF DATA ACROSS THE CLUSTER.
MAPREDUCE (PROCESSING)
ALLOWS PARALLEL PROCESSING OF THE DATA STORED IN HDFS.
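To make the MapReduce idea concrete, here is a small word-count sketch in plain Python. It does not use Hadoop at all: the function names are hypothetical, and the map, shuffle and reduce steps run in a single process, whereas a real cluster would run them on many machines at once.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # On a cluster each line (or block of lines) could be mapped on a
    # different machine; here it is just a loop in one process.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data is big", "hadoop stores big data"])
print(counts)
```

Because every map call works on its own line and every reduce call works on its own key, the calls are independent of each other, which is what allows the work to be spread across a cluster.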
HADOOP DISTRIBUTED FILE SYSTEM:
HDFS HAS TWO CORE COMPONENTS: THE NAMENODE AND THE DATANODES.
⇒THE NAMENODE IS THE MAIN NODE THAT CONTAINS METADATA ABOUT THE DATA STORED.
⇒DATA IS STORED ON THE DATANODES, WHICH ARE COMMODITY HARDWARE IN THE DISTRIBUTED ENVIRONMENT.
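The split between metadata and data can be pictured with a toy model. Everything below is hypothetical and heavily simplified: the class and method names are invented for illustration, the block size is only 4 bytes, and data passes through the NameNode object for brevity, whereas in real HDFS clients exchange block data directly with the DataNodes.

```python
class DataNode:
    """Toy DataNode: holds the actual block contents."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

class NameNode:
    """Toy NameNode: holds only metadata about where blocks live."""
    def __init__(self, datanodes, block_size=4):
        self.datanodes = datanodes
        self.block_size = block_size  # tiny on purpose, for the demo
        self.metadata = {}            # filename -> [(block_id, datanode_name)]

    def write(self, filename, data):
        locations = []
        for offset in range(0, len(data), self.block_size):
            index = offset // self.block_size
            block_id = f"{filename}#{index}"
            # Spread blocks over the DataNodes in round-robin order.
            node = self.datanodes[index % len(self.datanodes)]
            node.blocks[block_id] = data[offset:offset + self.block_size]
            locations.append((block_id, node.name))
        self.metadata[filename] = locations

    def read(self, filename):
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(
            by_name[node_name].blocks[block_id]
            for block_id, node_name in self.metadata[filename]
        )

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.write("notes.txt", b"hello hadoop")
print(namenode.metadata["notes.txt"])  # block locations only, no file contents
print(namenode.read("notes.txt"))      # file reassembled from the DataNodes
```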
STORING DATA (SOLUTION):
⇒PROBLEM 1: STORING EXPONENTIALLY GROWING HUGE DATASETS
⇒SOLUTION : HDFS
⇒STORAGE UNIT OF HADOOP
⇒IT IS A DISTRIBUTED FILE SYSTEM
⇒DIVIDES FILES (INPUT DATA) INTO SMALLER CHUNKS AND STORES THEM ACROSS THE CLUSTER
⇒SCALABLE AS PER REQUIREMENT
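As a rough sketch of how a file is divided into chunks, the snippet below estimates how many blocks a file needs. The 128 MB block size and the replication factor of 3 are the usual HDFS defaults in Hadoop 2 and later, but they are not stated in the notes above and both are configurable; the function name is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed: default HDFS block size in Hadoop 2 and later
REPLICATION = 3      # assumed: default HDFS replication factor

def plan_storage(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Estimate how many blocks a file needs and how much raw space it uses."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        # The last block only occupies the space it needs, so raw usage
        # is simply the file size times the replication factor.
        "raw_storage_mb": file_size_mb * replication,
    }

plan = plan_storage(1024)  # a 1 GB file
print(plan)
```

Adding DataNodes to the cluster gives more places to put these blocks, which is how the storage scales as requirements grow.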
PROBLEM 2: STORING UNSTRUCTURED DATA:
⇒SOLUTION: HDFS
⇒ALLOWS US TO STORE ANY KIND OF DATA, BE IT STRUCTURED, SEMI-STRUCTURED OR UNSTRUCTURED
HADOOP ECOSYSTEM:
⇒HADOOP: HADOOP PROVIDES A SCALABLE SOLUTION TO STORE AND PROCESS HUGE DATA SETS IN A PARALLEL AND DISTRIBUTED FASHION.
⇒APACHE HIVE: APACHE HIVE IS A DATA WAREHOUSING TOOL THAT ALLOWS US TO PERFORM BIG DATA ANALYTICS USING HIVE QUERY LANGUAGE, WHICH IS VERY SIMILAR TO SQL.
⇒APACHE PIG: APACHE PIG IS A PLATFORM USED TO ANALYSE LARGE DATA SETS BY REPRESENTING THEM AS DATA FLOWS.
⇒APACHE SPARK
⇒APACHE HBASE
KEEP VISITING...