Unit-4: Frequent Itemsets and Clustering (Data Analytics, B.Tech AKTU Quantum notes)
Q1. Write short notes on frequent patterns in data mining.
- 1. Frequent patterns are patterns that appear frequently in a dataset (such as itemsets, subsequences, or substructures).
- 2. A substructure can refer to many structural forms such as subgraphs, subtrees, or sublattices that can be combined with itemsets or subsequences.
- 3. A (frequent) structured pattern is one in which a substructure occurs frequently.
- 4. Identifying frequent patterns is critical in mining associations, correlations, and other interesting relationships among data.
- 5. It is beneficial for data classification, clustering, and other data mining activities.
- 6. Frequent pattern mining searches a dataset for repeating relationships.
- 7. A frequent itemset is a group of items, such as milk and bread, that appear frequently together in a supermarket transaction dataset.
- 8. A subsequence, such as buying a computer first, then a digital camera, and then a memory card, is a (frequent) sequential pattern if it occurs frequently in a shopping-history database.
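The counting step behind these definitions can be sketched in a few lines of Python. The transaction database and the support threshold below are hypothetical, and only 1- and 2-itemsets are counted for brevity:

```python
from itertools import combinations
from collections import Counter

# Toy transaction database (hypothetical supermarket baskets).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "eggs"},
]

min_support = 3  # an itemset is "frequent" if it appears in >= 3 baskets

# Count every 1- and 2-itemset across all transactions.
counts = Counter()
for basket in transactions:
    for size in (1, 2):
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1

frequent = {itemset for c, itemset in
            ((cnt, iset) for iset, cnt in counts.items()) if c >= min_support}
print(frequent)  # {('bread',), ('milk',), ('bread', 'milk')}
```

Here {milk, bread} qualifies as a frequent itemset because the pair occurs in three of the four baskets, matching the "milk and bread" example above.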
Q2. Explain frequent itemset mining.
- 1. Frequent itemset mining reveals relationships and correlations between items in big transactional or relational datasets.
- 2. Massive volumes of data are constantly collected and stored. Many companies are interested in extracting such patterns from their databases.
- 3. The finding of interesting correlation relationships across massive volumes of business transaction records can aid in numerous commercial decision-making processes, such as catalogue design, consumer cross-selling, and shopping behaviour analysis.
- 4. Market basket analysis is a common example of frequent itemset mining.
- 5. This approach examines customer purchasing habits by identifying links between the various goods placed in customers’ “shopping baskets.”
- 6. The finding of these correlations might assist merchants in developing marketing strategies by providing insight into which things customers usually purchase together.
- 7. For example, if a customer buys one product, how likely are they to buy another at the same time? Such information can help merchants increase sales through targeted marketing.
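The market-basket arithmetic described above can be sketched as follows; the baskets and the rule milk => bread are hypothetical examples, not data from the text:

```python
# Toy baskets (hypothetical); rule under study: milk => bread.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "eggs"},
]

n = len(baskets)
milk = sum(1 for b in baskets if "milk" in b)             # baskets with milk
both = sum(1 for b in baskets if {"milk", "bread"} <= b)  # baskets with both

support = both / n        # fraction of all baskets containing milk AND bread
confidence = both / milk  # P(bread | milk): "how likely to buy another"
print(f"support={support:.2f}, confidence={confidence:.2f}")
# support=0.50, confidence=0.67
```

Support measures how common the combination is overall, while confidence answers exactly the question in point 7: given that a customer bought milk, how likely is bread in the same basket.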
Q3. How does the k-means algorithm work ? Write k-means algorithm for partitioning.
- 1. Initially, it chooses k items at random from D, each of which represents a cluster mean or centre at first.
- 2. Based on the Euclidean distance between the object and the cluster mean, each of the remaining objects is assigned to the cluster to which it is most similar.
- 3. The k-means method then improves the within-cluster variation iteratively.
- 4. It computes the new mean for each cluster using the objects assigned to the cluster in the previous iteration.
- 5. All of the objects are then reassigned to clusters using the updated means as the new cluster centres.
- 6. The iterations continue until the assignment is stable, that is, the clusters formed in the current round are the same as those formed in the previous round.
Input:
k : the number of clusters,
D : a data set containing n objects.
Output: A set of k clusters.
- 1. Arbitrarily choose k objects from D as the initial cluster centers;
- 2. repeat
- 3. (re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster;
- 4. update the cluster means, that is, calculate the mean value of the objects for each cluster;
- 5. until no change.
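The steps above can be sketched as a minimal pure-Python implementation of k-means; the 2-D data set and the helper name `kmeans` below are hypothetical illustrations, not part of the syllabus text:

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """k-means on a list of (x, y) tuples; returns the final cluster means."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # step 1: arbitrary initial centres from D
    for _ in range(max_iter):
        # step 3: (re)assign each object to its nearest mean (Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, means[j]))
            clusters[i].append(p)
        # step 4: update the cluster means from the objects assigned to them
        new_means = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else means[i]
            for i, pts in enumerate(clusters)
        ]
        if new_means == means:  # step 5: stop when nothing changes
            return new_means
        means = new_means
    return means

# Two well-separated blobs; the means should settle near each blob's centre.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres = sorted(kmeans(data, k=2))
print(centres)
```

Each iteration first fixes the means and reassigns points, then fixes the assignment and recomputes the means, which is why the within-cluster variation can only decrease until the assignment stabilizes.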
Q4. What are the approaches for high dimensional data clustering ?
Ans. Approaches for high dimensional data clustering are:
1. Subspace clustering:
- a. Subspace clustering algorithms localize the search for relevant dimensions, allowing them to find clusters that exist in multiple, possibly overlapping subspaces.
- b. This technique is a feature selection extension that seeks clusters in distinct subspaces of the same dataset.
- c. Subspace clustering necessitates the use of a search strategy and evaluation criteria.
- d. It restricts the scope of the assessment criteria so that distinct subspaces are considered for each cluster.
2. Projected clustering:
- a. In high-dimensional spaces, although a good partition cannot be defined using all dimensions because of data sparsity, some subsets of the dimensions can always be found on which some subsets of the data form high-quality, significant clusters.
- b. Projected clustering algorithms seek clusters that are specific to a set of dimensions. Each cluster may correspond to distinct dimensions subsets.
- c. The output of a typical projected clustering algorithm, searching for k clusters in subspaces of dimension l, is twofold:
- i. A partition of data of k + 1 different clusters, where the first k clusters are well shaped, while the (k + 1)th cluster elements are outliers, which by definition do not cluster well.
- ii. A possibly different set of dimensions for each of the first k clusters, such that the points in each of those clusters are well clustered in the subspaces defined by these vectors.
3. Biclustering:
- a. Biclustering (or two-way clustering) is an approach that clusters the feature set and the data points at the same time, i.e., it finds clusters of samples with similar characteristics together with the features that cause these similarities.
- b. The result of biclustering is a partition of the entire matrix into sub-matrices or patches, rather than a partition (or hierarchy of partitions) of either rows or columns.
- c. The purpose of biclustering is to discover patches that are as large and as numerous as possible while retaining strong homogeneity within each patch.
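A toy numeric illustration of the core idea above, that each cluster may be well formed only in its own subset of dimensions. The 4-D data, the helper names, and the variance threshold are all hypothetical; the sketch marks a dimension as relevant to a cluster when the cluster's variance along it is small:

```python
# Hypothetical 4-D points: cluster A is tight in dims (0, 1) and noisy
# elsewhere; cluster B is tight in dims (2, 3).
cluster_a = [(1.0, 2.0, 9.1, 0.4), (1.1, 2.1, 3.7, 8.2), (0.9, 1.9, 6.5, 4.9)]
cluster_b = [(7.3, 0.2, 5.0, 5.0), (2.8, 9.6, 5.1, 4.9), (5.5, 4.4, 4.9, 5.1)]

def variance_per_dim(points):
    """Population variance of the points along each dimension."""
    dims = list(zip(*points))  # transpose: one tuple per dimension
    return [sum((x - sum(d) / len(d)) ** 2 for x in d) / len(d) for d in dims]

def relevant_dims(points, threshold=0.5):
    """Dimensions along which the cluster is tight (variance below threshold)."""
    return [i for i, v in enumerate(variance_per_dim(points)) if v < threshold]

print(relevant_dims(cluster_a))  # [0, 1]
print(relevant_dims(cluster_b))  # [2, 3]
```

Neither cluster looks tight when all four dimensions are used at once, which is exactly why subspace and projected clustering evaluate each candidate cluster only on its own subset of dimensions.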
Q5. What are the major tasks of clustering evaluation ?
Ans. The major tasks of clustering evaluation include the following:
1. Assessing clustering tendency:
- a. In this task, we determine whether a non-random structure exists in a given data set.
- b. Using a clustering method to a data set blindly will produce clusters; however, the clusters produced may be misleading.
- c. Clustering analysis on a data set is only useful when the data has a nonrandom structure.
2. Determining the number of clusters in a data set:
- a. The number of clusters in a data set is required as a parameter by a few algorithms, such as k-means.
- b. Furthermore, the number of clusters can be viewed as an intriguing and significant summary statistic of a data collection.
- c. As a result, it is preferable to estimate this number before employing a clustering technique to generate detailed clusters.
3. Measuring clustering quality:
- a. After applying a clustering approach to a data collection, we want to evaluate the quality of the generated clusters.
- b. A variety of measures are available.
- c. Some methods assess how well the clusters fit the data set, while others assess how well the clusters match the ground truth, if one exists.
- d. There are other measures that score clustering and so allow you to compare two sets of clustering results from the same data set.
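One widely used measure of the kind described in point 3 is the silhouette coefficient, which compares each point's cohesion within its own cluster against its separation from the nearest other cluster. A minimal sketch on hypothetical 2-D data, assuming every cluster holds at least two points:

```python
import math

def silhouette(clusters):
    """Mean silhouette over all points; clusters is a list of point lists.
    Assumes each cluster contains at least two points."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            # a: mean distance from p to the other points in its own cluster
            others = [q for q in cluster if q is not p]
            a = sum(math.dist(p, q) for q in others) / len(others)
            # b: mean distance from p to the nearest other cluster
            b = min(
                sum(math.dist(p, q) for q in c) / len(c)
                for cj, c in enumerate(clusters) if cj != ci
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Well-separated clusters score close to +1 (hypothetical data).
good = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
# Interleaved "clusters" score near or below 0.
bad = [[(0, 0), (10, 11)], [(10, 10), (0, 1)]]
print(round(silhouette(good), 3))
print(round(silhouette(bad), 3))
```

Because the score needs no ground truth, it measures how well the clusters fit the data set itself, and it also allows two clusterings of the same data to be compared, as points c and d describe.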
Q6. Explain initialization of cluster tree in GRGPF algorithm.
- 1. The clusters are organized into a tree whose nodes can be quite large, such as disk blocks or pages, much as in a B-tree, which the cluster-representing tree resembles.
- 2. Each tree leaf can carry as many cluster representations as it can.
- 3. The size of a cluster representation is independent of the number of points in the cluster.
- 4. An interior node of the cluster tree contains a sample of the clustroids of the clusters represented by each of its subtrees, along with pointers to the roots of those subtrees.
- 5. Because the samples are fixed in size, the number of children an inner node can have is independent of its level.
- 6. As we go up the tree, the probability that a given cluster’s clustroid is part of the sample diminishes.
- 7. We begin the cluster tree by clustering a main-memory sample of the dataset hierarchically.
- 8. The clustering produces a tree T, but T is not the same as the tree used by the GRGPF algorithm. Rather, we select from T some of its nodes that represent clusters of approximately the desired size n.
- 9. These are the initial clusters for the GRGPF algorithm, and their representations are placed at the leaves of the cluster-representing tree. Clusters that have a common ancestor in T are then grouped under interior nodes of the cluster-representing tree. In some circumstances, rebalancing the cluster-representing tree may be required.