
Aktu Data Analytics KCS-051/KIT-601 Btech Short Question Quantum Book

Visit the AKTU Quantum Book Short Question Notes on Data Analytics for B.Tech. Explore cutting-edge methods and tools for gathering information and making data-driven decisions across a range of fields.

Unit-I: Introduction to Data Analytics (Short Question)

Q1. What is data analytics?

Ans. The science of studying raw data to draw conclusions about that information is known as data analytics.

Q2. What are the sources of data?

Ans. Sources of data are:

  • 1. Social data
  • 2. Machine data
  • 3. Transactional data

Q3. What are the classifications of data?

Ans. Data is classified into three types:

  • 1. Unstructured data
  • 2. Structured data
  • 3. Semi-structured data

Q4. Write the difference between structured and semi-structured data.


S. No. | Structured data | Semi-structured data
1. | It is schema dependent and less flexible. | It is more flexible than structured data.
2. | It is very difficult to scale the database schema. | It is more scalable than structured data.
3. | It is based on relational database tables. | It is based on XML/RDF.

Q5. List the characteristics of data. 

Ans. Characteristics of data:

  • 1. Volume 
  • 2. Velocity
  • 3. Variety
  • 4. Veracity

Q6. Define Big Data platform.

Ans. A big data platform is a kind of IT solution that combines the functions and features of several big data tools and applications into a single product.

Q7. Define analytic sandbox.

Ans. An analytic sandbox provides a set of tools and resources that can be used for in-depth analysis to answer critical business questions.

Q8. List some of the modern data analytic tools.

Ans. Modern data analytic tools are:

  • 1. Apache Hadoop
  • 2. KNIME
  • 3. OpenRefine
  • 4. RapidMiner
  • 5. R programming language
  • 6. DataWrapper

Q9. List the benefits of an analytic sandbox.

Ans. Benefits of an analytic sandbox:

  • 1. Independence
  • 2. Flexibility
  • 3. Efficiency
  • 4. Freedom
  • 5. Speed

Q10. Write the applications of data analytics.

Ans. Applications of data analytics are:

  • 1. Security
  • 2. Transportation 
  • 3. Risk detection
  • 4. Internet searching 
  • 5. Digital advertisement

Q11. What are the phases of the data analytics life cycle?

Ans. Phases of the data analytics life cycle:

  • 1. Discovery
  • 2. Data preparation
  • 3. Model planning
  • 4. Model building
  • 5. Communicate results
  • 6. Operationalize

Q12. What are the activities performed during the discovery phase?

Ans. Activities performed during the discovery phase are:

  • 1. Identify data sources
  • 2. Capture aggregate data sources
  • 3. Review the raw data 
  • 4. Evaluate the data structure and tools needed.

Q13. What are the sub-phases of the discovery phase?

Ans. Sub-phases of the discovery phase are:

  • 1. Learning the business domain
  • 2. Resources
  • 3. Framing the problem
  • 4. Identifying key stakeholders
  • 5. Interviewing the analytics sponsor
  • 6. Developing initial hypotheses

Q14. What are the sub-phases of the data preparation phase?

Ans. Sub-phases of the data preparation phase:

  • 1. Preparing an analytic sandbox
  • 2. Performing ETL (Extract, Transform, Load) process
  • 3. Learning about data
  • 4. Data conditioning
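
Example: a minimal sketch of the ETL step listed above, using only Python's standard library. The CSV text, the `sales` table, and the function names `extract`, `transform`, and `load` are hypothetical and chosen only for illustration; a real project would read from actual source systems.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the second record has a missing amount.
RAW = "name,amount\nalice, 10\nbob,\ncarol,7.5\n"

def extract(text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and normalise the types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((row["name"].strip().title(), float(amount)))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)

conn = sqlite3.connect(":memory:")  # in-memory database standing in for the sandbox
load(transform(extract(RAW)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

The same three functions could also be reordered as ELT (load first, transform inside the sandbox) when the raw data must be kept.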

Q15. List the tools used for the model planning phase.

Ans. Tools used for the model planning phase:

  • 1. R
  • 2. SQL Analysis Services
  • 3. SAS/ACCESS
  • 4. MATLAB
  • 5. Alpine Miner
  • 6. SPSS Modeler

Unit-II: Data Analysis (Short Question)

Q1. What is the regression technique?

Ans. Regression is a technique for identifying and estimating possible relationships between a variable of interest and the factors that influence it.
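
Example: a minimal sketch of simple linear regression fitted by ordinary least squares. The data (advertising spend versus sales) and the function name `fit_line` are hypothetical; the points are chosen to lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: influencing factor (spend) and variable of interest (sales)
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(spend, sales)
print(slope, intercept)
```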

Q2. What are the types of regression analysis?

Ans. Types of regression analysis:

  • 1. Linear regression
  • 2. Non-linear regression
  • 3. Logistic regression
  • 4. Time series regression

Q3. Define Bayesian network.

Ans. A Bayesian network is a type of probabilistic graphical model that uses Bayesian inference to compute probabilities.
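
Example: a minimal sketch of Bayesian inference on the smallest possible network, with a single edge Rain -> WetGrass. All probabilities are made-up values for illustration only.

```python
# Hypothetical probabilities for a two-node network: Rain -> WetGrass
P_RAIN = 0.2               # prior P(Rain)
P_WET_GIVEN_RAIN = 0.9     # P(WetGrass | Rain)
P_WET_GIVEN_NO_RAIN = 0.1  # P(WetGrass | no Rain)

def posterior_rain_given_wet():
    """Apply Bayes' rule to get P(Rain | WetGrass)."""
    joint_rain = P_RAIN * P_WET_GIVEN_RAIN
    joint_no_rain = (1 - P_RAIN) * P_WET_GIVEN_NO_RAIN
    return joint_rain / (joint_rain + joint_no_rain)

p = posterior_rain_given_wet()
print(round(p, 4))
```

Observing wet grass raises the probability of rain from the prior 0.2 to 0.18 / 0.26, roughly 0.69.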

Q4. List the applications of time series analysis.

Ans. Applications of time series analysis:

  • 1. Retail sales
  • 2. Spare parts planning
  • 3. Stock trading

Q5. What are the components of time series?

Ans. Components of a time series are:

  • 1. Trend
  • 2. Seasonality
  • 3. Cyclic variation
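
Example: a minimal sketch of separating the trend from the seasonality with a moving average. The series is synthetic (a linear trend plus a hypothetical four-period seasonal pattern that sums to zero), so averaging over one full season recovers the trend.

```python
def moving_average(series, window):
    """Average every run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Synthetic data: trend value i plus a seasonal effect that repeats every 4 steps
SEASON = [1, -1, 2, -2]
series = [i + SEASON[i % 4] for i in range(8)]

trend = moving_average(series, window=4)
print(series)
print(trend)  # the seasonal swings cancel, leaving a steadily rising trend
```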

Q6. Define rule induction.

Ans. Rule induction is a data mining technique that uses a dataset to infer if-then rules. The attributes and class labels in the dataset have an underlying link that is explained by these symbolic decision rules.

Q7. Define sequential covering.

Ans. Sequential covering is an iterative procedure that extracts rules from the data set one at a time. It attempts to find all the rules in the data set class by class.

Q8. What are the steps in sequential covering?

Ans. Steps in sequential covering are:

  • 1. Class selection
  • 2. Rule development
  • 3. Learn-one-rule
  • 4. Next rule
  • 5. Development of rule set

Q9. Define supervised learning.

Ans. Supervised learning, also called associative learning, is the process in which a network is trained by providing it with input patterns and the matching output patterns.

Q10. What are the categories of supervised learning?

Ans. Supervised learning can be classified into two categories:

  • i. Classification
  • ii. Regression

Q11. Define unsupervised learning.

Ans. In unsupervised learning, an output unit is trained to respond to clusters of patterns within the input, without matching output patterns being supplied.

Q12. What are the categories of unsupervised learning?

Ans. Unsupervised learning can be classified into two categories:

  • i. Clustering 
  • ii. Association

Q13. What is the difference between supervised and unsupervised learning?

Ans.

S. No. | Supervised learning | Unsupervised learning
1. | It uses known and labeled data as input. | It uses unlabeled data as input.
2. | It uses offline analysis. | It uses real-time analysis of data.
3. | The number of classes is known. | The number of classes is not known.

Q14. List the algorithms used to optimize the network size.

Ans. Algorithms used to optimize the network size:

  • 1. Growing algorithm
  • 2. Pruning algorithm

Q15. Define learning rate.

Ans. The speed and extent of weight matrix corrections are determined by the learning rate, which is a constant used in learning algorithms.
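
Example: a minimal sketch of how the learning rate scales each weight correction, using plain gradient descent on the hypothetical one-weight error function E(w) = (w - 3)^2. The function name `descend` and the chosen rates are illustrative assumptions.

```python
def descend(learning_rate, steps, start=0.0):
    """Repeatedly apply w <- w - learning_rate * dE/dw for E(w) = (w - 3)^2."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)          # derivative of the error function
        w = w - learning_rate * gradient
    return w

small = descend(learning_rate=0.1, steps=100)  # small, stable corrections
large = descend(learning_rate=1.1, steps=20)   # corrections overshoot and grow
print(small, large)
```

With a rate of 0.1 the weight settles near the minimum at 3, while a rate of 1.1 makes every correction overshoot, so the weight moves further away on each step.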

Q16. What are the various parameters in back propagation network (BPN)?

Ans. Parameters in BPN are:

  • 1. Number of hidden nodes
  • 2. Momentum coefficient
  • 3. Sigmoidal gain
  • 4. Learning coefficient

Q17. Define multivariate analysis.

Ans. Multivariate analysis (MVA) is based on multivariate statistics, which involves observing and analyzing more than one statistical outcome variable at the same time.

Q18. Define principal component analysis.

Ans. Principal component analysis (PCA) is a technique for reducing the number of variables in a dataset by extracting the most important ones (the principal components) from a large set of variables. It reduces the dimension of the data while retaining as much information as possible.
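
Example: a minimal sketch of PCA for two variables, using the closed-form largest eigenvalue of the 2x2 covariance matrix. The data points and the function name `first_principal_component` are hypothetical, and the code assumes the points are not all identical; library implementations handle any number of variables.

```python
import math

def first_principal_component(points):
    """Return the unit direction of maximum variance and its variance share."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centered data
    a = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = b, largest - a          # matching eigenvector
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    explained = largest / (a + c)        # share of total variance retained
    return (vx / norm, vy / norm), explained

# Hypothetical data lying exactly on the line y = 2x
points = [(-2, -4), (-1, -2), (0, 0), (1, 2), (2, 4)]
direction, explained = first_principal_component(points)
print(direction, explained)
```

Because the two variables here are perfectly correlated, one component along the line keeps all of the variance, so the two columns can be replaced by a single one.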

Unit-III: Mining Data Streams (Short Question)

Q1. Define data stream management system.

Ans. A computer software programme called a data stream management system (DSMS) is used to handle a continuous data stream. A DSMS also provides flexible query processing so that queries can be used to represent the required information.

Q2. Define data stream.

Ans. A data stream is a sequence of digitally encoded, coherent signals used to transmit or receive information that is in the process of being transferred.

Q3. What are the steps in query processing?

Ans. Steps in query processing:

  • 1. Formulation of continuous queries.
  • 2. Translation of declarative queries.
  • 3. Optimization of queries.
  • 4. Transformation of queries.
  • 5. Execution of queries.

Q4. What are the characteristics of Big Data input stream?

Ans. Characteristics of Big Data input stream are:

  • 1. High speed
  • 2. Real time information
  • 3. Large volume data

Q5. What is the main drawback of Bernoulli sampling?

Ans. The main drawback of Bernoulli sampling is that the sample size cannot be controlled: it varies randomly from run to run, which matters most when the desired sample size is small.
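
Example: a minimal sketch of Bernoulli sampling over a stream, in which each element is kept independently with probability p. The function name `bernoulli_sample`, the stream of 100 integers, and p = 0.05 are assumptions for illustration.

```python
import random

def bernoulli_sample(stream, p, rng):
    """Keep each element of the stream independently with probability p."""
    return [item for item in stream if rng.random() < p]

rng = random.Random(0)  # fixed seed only so that repeated runs are comparable
stream = range(100)
sizes = [len(bernoulli_sample(stream, 0.05, rng)) for _ in range(10)]
print(sizes)  # the expected size is 5, but each sample can be larger or smaller
```

The sample size is only fixed on average (n times p), which is the drawback described above.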

Q6. What is Real-time Analytic Platform (RTAP)?

Ans. By assisting in the extraction of useful information and patterns from real-time data, a real-time analytics platform enables companies to maximise their potential.

Q7. What are the steps in RTAP?

Ans. Steps in RTAP are:

  • 1. Real-time stream sources
  • 2. Real-time stream ingestion
  • 3. Real-time stream storage
  • 4. Real-time stream processing

Q8. What are the sources of streaming data?

Ans. Sources of streaming data are:

  • 1. Sensor data 
  • 2. Social media system
  • 3. Click stream

Q9. List the tools used for real-time stream ingestion.

Ans. Tools used for real-time stream ingestion:

  • 1. StreamSets
  • 2. Apache NiFi

Q10. List the tools used for real-time stream storage.

Ans. Tools used for real-time stream storage:

  • 1. Apache Kafka
  • 2. Apache Pulsar

Q11. List the tools used for real-time stream processing.

Ans. Tools used for real-time stream processing:

  • 1. Apache Spark
  • 2. Apache Apex
  • 3. Apache Flink
  • 4. Apache Storm
  • 5. Apache Beam

Q12. Define sentiment analysis.

Ans. Sentiment analysis is a sort of natural language processing used to monitor public sentiment about a specific product. The term “opinion mining” also applies to sentiment analysis.

Q13. What are the steps in the architecture of sentiment analysis?

Ans. Steps in the architecture of sentiment analysis are:

  • 1. Data collection
  • 2. Text preparation 
  • 3. Sentiment detection
  • 4. Sentiment classification
  • 5. Presentation of output
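
Example: a minimal sketch of the text preparation, sentiment detection, and sentiment classification steps using a tiny word list. The `POSITIVE` and `NEGATIVE` lexicons and the sample reviews are hypothetical; practical systems use much larger lexicons or trained classifiers.

```python
import re

# Hypothetical mini-lexicons used only for this illustration
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def prepare(text):
    """Text preparation: lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def classify(text):
    """Detect sentiment words, then classify by the sign of the score."""
    words = prepare(text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "I love this phone, great battery!",
    "Terrible screen and poor sound.",
    "It arrived on Monday.",
]
for review in reviews:
    print(classify(review), "->", review)  # presentation of output
```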

Q14. Define stock market prediction.

Ans. Trying to anticipate the future value of a company’s shares or another financial instrument traded on an exchange is known as stock market prediction.

Unit-IV: Frequent Itemsets and Clustering (Short Question)

Q1. Define itemset and k-itemset.

Ans. An itemset is a set of one or more items that occur together, for example in a transaction. A k-itemset is an itemset that contains k items.

Q2. What is the Apriori property?

Ans. All nonempty subsets of a frequent itemset must also be frequent. This is known as the Apriori property. Equivalently, if any subset of an itemset is not frequent, the itemset itself cannot be frequent.

Q3. What is the two-step process of association rule mining?

Ans. Association rule mining can be viewed as a two-step process:

  • 1. Find all frequent itemsets: By definition, each of these itemsets occurs at least as frequently as a predetermined minimum support count.
  • 2. Generate strong association rules from the frequent itemsets: By definition, these rules must satisfy minimum support and minimum confidence.
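
Example: a minimal sketch of the two steps on four hypothetical market-basket transactions. For brevity it finds frequent itemsets by brute-force enumeration rather than by the Apriori candidate-pruning algorithm; the function names and the minimum support count of 2 are assumptions.

```python
from itertools import combinations

# Hypothetical transactions
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support_count(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def frequent_itemsets(min_count):
    """Step 1: enumerate every itemset and keep those meeting minimum support."""
    items = sorted(set().union(*TRANSACTIONS))
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = support_count(set(combo))
            if count >= min_count:
                found[frozenset(combo)] = count
    return found

def confidence(antecedent, consequent):
    """Step 2: confidence of the rule antecedent -> consequent."""
    return support_count(antecedent | consequent) / support_count(antecedent)

frequent = frequent_itemsets(min_count=2)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
print(confidence({"bread"}, {"milk"}))
```

Every subset of each frequent itemset found here is itself frequent, which is the Apriori property in action; the rule bread -> milk holds in two of the three transactions that contain bread.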

Q4. What are the various categories of clustering techniques?

Ans. Clustering techniques are organized into the following categories:

  • 1. Partitioning methods
  • 2. Hierarchical methods
  • 3. Density-based methods
  • 4. Grid-based methods

Q5. What are the two major approaches to subspace clustering based on search strategy?

Ans. Two major approaches to subspace clustering based on search strategy:

  • 1. A top-down approach that first determines an initial grouping in all available dimensions, assesses the subspaces of each cluster, and then iteratively enhances the outcomes.
  • 2. Bottom-up strategies that combine dense areas in low-dimensional spaces to create clusters.

Q6. Define subspace clustering.

Ans. The goal of subspace clustering is to find all clusters in all subspaces. The same point may therefore belong to different clusters in different subspaces. The subspaces can be axis-parallel or general.

Q7. What is a cluster?

Ans. A cluster is a group of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.

Q8. What is clustering?

Ans. Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. In machine learning it is a form of unsupervised learning.

Q9. What are the applications of cluster analysis?

Ans. Applications of cluster analysis are:

  • 1. Business intelligence
  • 2. Image pattern recognition 
  • 3. Web search 
  • 4. Biology
  • 5. Security
  • 6. Data mining tool

Q10. Explain the partitioning method.

Ans. A partitioning method first creates an initial set of k partitions, where k is the desired number of partitions. It then uses an iterative relocation technique that tries to improve the partitioning by moving objects from one group to another.
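
Example: a minimal sketch of iterative relocation, using k-means on one-dimensional data. The points, the deliberately poor initial centers, and the function name `k_means_1d` are assumptions for illustration.

```python
def k_means_1d(points, centers, max_iter=100):
    """Assign points to the nearest center, recompute centers, repeat."""
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # New center = mean of its group; an empty group keeps its old center
        new_centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break  # no object was relocated, so the partitioning is stable
        centers = new_centers
    return centers, groups

points = [1, 2, 3, 10, 11, 12]
centers, groups = k_means_1d(points, centers=[1.0, 2.0])  # k = 2
print(centers, groups)
```

Starting from the poor initial partition, the points 2 and 3 are relocated to the first group on the second pass, after which nothing moves.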

Q11. Explain grid-based method.

Ans. A grid-based approach executes clustering on the grid structure after first quantizing the object space into a limited number of cells.
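
Example: a minimal sketch of the quantization step, mapping two-dimensional points to grid cells and keeping the dense cells. This is a simplification and not the STING or CLIQUE algorithm; the cell size, the density threshold, and the points are hypothetical.

```python
from collections import Counter

def dense_cells(points, cell_size, min_points):
    """Quantize points into square cells and return cells with enough points."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return {cell for cell, n in counts.items() if n >= min_points}

points = [
    (0.1, 0.2), (0.4, 0.9), (0.8, 0.3),  # three points in cell (0, 0)
    (5.1, 5.2), (5.6, 5.9),              # two points in cell (5, 5)
    (9.5, 0.5),                          # an isolated point in cell (9, 0)
]
cells = dense_cells(points, cell_size=1.0, min_points=2)
print(sorted(cells))
```

A full grid-based method would go on to merge adjacent dense cells into clusters; the work after quantization depends on the number of cells rather than the number of points.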

Q12. List the algorithms for the grid-based method of clustering.

Ans. Algorithms for the grid-based method of clustering are:

  • 1. STING
  • 2. CLIQUE

Q13. Define clustering evaluation.

Ans. Clustering evaluation assesses the feasibility of clustering analysis on a data set and the quality of the results generated by a clustering method. Its tasks include assessing clustering tendency, determining the number of clusters, and measuring clustering quality.

Q14. Define centroids and clustroids.

Ans. Centroids: The centroid of a cluster is the average of all of the cluster members in a Euclidean space.

Clustroids: In non-Euclidean spaces there is no guarantee that points have an “average,” so one of the cluster’s own members must be chosen as the representative or typical element of the cluster. That representative is called the clustroid.
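
Example: a minimal sketch contrasting the two. The centroid is the coordinate-wise mean of Euclidean points; the clustroid is picked here as the member with the smallest total distance to the other members, which is one common criterion and an assumption of this sketch. The strings and the use of Hamming distance are hypothetical.

```python
def centroid(points):
    """Coordinate-wise average of points in a Euclidean space."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))

def clustroid(members, distance):
    """Member with the smallest sum of distances to all members."""
    return min(members, key=lambda m: sum(distance(m, o) for o in members))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
words = ["aaaa", "aaab", "aabb", "abbb", "bbbb"]
print(centroid(square))           # need not be one of the points
print(clustroid(words, hamming))  # always one of the members
```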

Q15. List different partitioning methods.

Ans. Partitioning methods are:

  • 1. K-means
  • 2. K-medoids

Unit-V: Frameworks and Visualization (Short Question)

Q1. What are the ways to execute a Pig program?

Ans. A Pig program can be executed in the following ways:

  • 1. Interactive mode
  • 2. Batch mode
  • 3. Embedded mode

Q2. What is not supported by NoSQL?

Ans. The following are typically not supported by NoSQL:

  • 1. Joins
  • 2. Group by
  • 3. ACID transactions
  • 4. SQL
  • 5. Integration with applications that are based on SQL.

Q3. Define sharding.

Ans. Sharding is a method of database partitioning that splits very large databases into smaller, faster, and more easily managed parts called “data shards,” which can then be distributed across two or more separate servers.
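
Example: a minimal sketch of hash-based sharding, in which a record's key decides which shard stores it. The key names, the shard count of 3, and the function name `shard_for` are hypothetical; range-based and directory-based sharding are other options.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

users = ["user-1", "user-2", "user-3", "user-4", "user-5"]
placement = {user: shard_for(user, 3) for user in users}
print(placement)
```

A stable hash such as MD5 is used instead of Python's built-in `hash()` because the built-in string hash changes between interpreter runs. With plain modulo placement, changing the number of shards moves most keys.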

Q4. What are the main differences between HDFS and S3?

Ans. The main differences between HDFS and S3 are:

  • 1. S3 is more scalable than HDFS.
  • 2. When it comes to durability, S3 has the edge over HDFS.
  • 3. Data in S3 is always persistent, unlike data in HDFS.
  • 4. S3 is more cost-efficient and likely cheaper than HDFS.
  • 5. HDFS excels when it comes to performance, outshining S3.

Q5. Define Amazon S3 (Simple Storage Service).

Ans. Amazon S3 (Simple Storage Service) is an easy-to-use, web-based cloud IaaS (infrastructure as a service) solution from Amazon Web Services for storing objects.

Q6. What is the basic idea of visual data mining?

Ans. The fundamental tenet of visual data mining is to visualize the data in some way so that the user can understand it, make inferences from it, and interact with it.

Q7. What are the benefits of data visualization?

Ans. Benefits of data visualization:

  • 1. Identify areas that need attention or improvement.
  • 2. Clarify which factors influence customer behaviour.
  • 3. Predict sales volumes.

Q8. What are the benefits of data analytics?

Ans. Benefits of data analytics:

  • 1. Identifies the underlying models and patterns.
  • 2. Serves as a source of input for data visualization.
  • 3. Contributes to business improvement by predicting needs.

Q9. What types of data are visualized?

Ans. The types of data to be visualized are:

  • 1. One-dimensional data
  • 2. Two-dimensional data
  • 3. Multi-dimensional data
  • 4. Text and hypertext
  • 5. Hierarchies and graphs
  • 6. Algorithms and software

Q10. What is the classification of visualization techniques?

Ans. Visualization techniques may be classified as:

  • 1. Standard 2D/3D displays
  • 2. Geometrically-transformed displays
  • 3. Icon-based displays
  • 4. Dense pixel displays
  • 5. Stacked displays

Q11. What are the uses of data visualization?

Ans. Uses of data visualization:

  • 1. Effective data exploration with clear results.
  • 2. It is primarily used in the preprocessing stage of the data mining process.
  • 3. Aids in the process of cleaning the data by identifying inaccurate and missing values.

Q12. What are the edit() and fix() functions in R?

Ans. The edit() and fix() functions allow the user to update the contents of an R variable.

Q13. What are the four window panes in RStudio?

Ans. The four window panes are as follows:

  • 1. Scripts
  • 2. Workspace
  • 3. Plots
  • 4. Console

Q14. Define Hadoop Distributed File System.

Ans. The Hadoop Distributed File System (HDFS) is the core component, or backbone, of the Hadoop ecosystem. HDFS makes it possible to store different types of very large data sets.

Q15. Define hash ring.

Ans. A hash ring is a collection of servers, each of which is responsible for a particular range of hash values.
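
Example: a minimal sketch of a hash ring (consistent hashing without virtual nodes). Each server sits at the position given by its hash and owns the keys that hash to the range ending at that position. The server names, the use of MD5, and the class name `HashRing` are assumptions for illustration.

```python
import bisect
import hashlib

def ring_position(name):
    """Stable position on the ring for a server name or a key."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        # Sort the servers by their position on the ring
        self._ring = sorted((ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key):
        """Return the first server at or after the key's position, wrapping."""
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["server-a", "server-b", "server-c"])
for key in ["order-17", "order-18", "order-19"]:
    print(key, "->", ring.lookup(key))
```

When a server is removed, only the keys in its range move to the next server on the ring; keys owned by the other servers stay where they are.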

Q16. What is the difference between Apache Pig and Hive?

Ans.

S. No. | Apache Pig | Hive
1. | It uses a language called Pig Latin. | It uses a language called HiveQL.
2. | It handles all types of data. | It handles only structured data.
