Answer the following questions. Include the question number in the response. No format needed. No reference needed. No table or chart.
1. Answer this question (200 words)
What is Dimension reduction in Data Mining? Please provide one example and explain why and how you have utilized it in your analysis.
1a. Respond to the following answer to the question above (100 words)
Dimension reduction is the technique of lowering the number of random variables or attributes under consideration. In many practical systems, reducing high-dimensional data as part of the pre-processing step is critical, and it is one of the most important issues in data mining applications.
Suppose, for instance, that we have a dataset with hundreds of features (columns in a database). Dimensionality reduction lowers the number of features by combining or integrating them in a way that does not lose much of the vital information in the original dataset. The “curse of dimensionality” is a well-known difficulty that arises when dealing with high-dimensional data: as the number of features grows, the observations become sparse relative to the space they occupy, and models need far more data to generalize. If we want to use such data for analysis, we usually need to reduce its dimensions first.
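As a minimal sketch of one common dimension reduction technique, principal component analysis (PCA), the Python below projects a hypothetical 10-feature dataset onto its top two principal components using NumPy’s SVD. The function name `pca_reduce` and the synthetic data are illustrative assumptions, not part of the original answer.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Project X (n_samples x n_features) onto its top-k principal components.

    Returns the reduced data (n_samples x k) and the share of total
    variance explained by each retained component.
    """
    # Center each feature so the components describe variation around the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Xc @ Vt[:k].T, explained[:k]


# Hypothetical data: 200 rows, 10 features built from 2 hidden signals plus noise.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = hidden @ mixing + 0.05 * rng.normal(size=(200, 10))

Z, ratio = pca_reduce(X, k=2)
print(Z.shape)                # (200, 2)
print(round(ratio.sum(), 3))  # close to 1: two components keep nearly all the variance
```

Because the ten columns are mixtures of only two underlying signals, ten features can be replaced by two components with little loss of information, which is the situation the answer describes.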
1b. Respond to the following answer to the question above (100 words)
Dimension reduction is a technique that transforms high-dimensional data into a lower-dimensional representation. It aims to reduce the work needed to process and analyze the data while retaining its important properties. Depending on the underlying data, dimension reduction can be applied to continuous data with methods such as principal component analysis, or to categorical data with methods that reduce the number of categories.
Take the example of modeling the return on a single stock, AAPL. To build a factor model for it, we might naturally treat its return as a linear combination of the global equity index return, the North American equity index return (since it is a US company), the TMT sector equity index return (since it is in the TMT sector) and the computer industry equity index return (since its main business is producing electronic products). However, when we build the model, we would find that the global equity index return is highly correlated with the US equity index return, and that the TMT sector return is highly correlated with the computer industry return. We can therefore drop the highly correlated variables and keep only the US equity index and the TMT equity index as the factors regressed against AAPL’s return, with the remaining unexplained variation treated as the idiosyncratic component. The dimension is thus reduced from 4 to 2.
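The correlation-based factor selection described above can be sketched as follows, assuming NumPy and synthetic, hypothetical return series rather than real index or AAPL data. The helper `drop_correlated` and the 0.9 threshold are illustrative choices: the helper keeps a factor only if its absolute correlation with every factor already kept is below the threshold. The sketch then regresses the hypothetical AAPL returns on the two remaining factors and treats the residual as the idiosyncratic part.

```python
import numpy as np


def drop_correlated(X: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Greedy filter: keep column j only if |corr| with every kept column < threshold."""
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature correlation matrix
    kept: list[int] = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, i]) < threshold for i in kept):
            kept.append(j)
    return kept


# Synthetic, hypothetical daily returns (not real market data).
rng = np.random.default_rng(42)
n = 250
us = rng.normal(size=n)
global_idx = us + 0.1 * rng.normal(size=n)  # global index closely tracks the US index
tmt = 0.3 * us + rng.normal(size=n)
computer = tmt + 0.1 * rng.normal(size=n)   # computer industry closely tracks TMT
names = ["US", "Global", "TMT", "Computer"]
X = np.column_stack([us, global_idx, tmt, computer])

kept = drop_correlated(X)
print([names[j] for j in kept])  # ['US', 'TMT']

# Regress the hypothetical AAPL returns on the kept factors (with an intercept).
aapl = 1.1 * us + 0.5 * tmt + 0.5 * rng.normal(size=n)
design = np.column_stack([np.ones(n), X[:, kept]])
coef, *_ = np.linalg.lstsq(design, aapl, rcond=None)
idiosyncratic = aapl - design @ coef  # part not explained by the two factors
print(coef.round(2))  # intercept, US beta, TMT beta
```

Column order matters for this greedy filter. Listing the US index before the global index is what makes the US index the one that is kept.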
2. Please define ‘Association Rules and Collaborative Filtering’ and give examples (include generating rules, terms etc.) (300 words)
2a. Respond to the following answer to the question above (100 words)
In association rules, the goal is to identify item clusters in transaction-type databases. Association rule discovery in marketing is termed “market basket analysis” and aims to discover which groups of products tend to be purchased together. These items can then be displayed together, offered in post-transaction coupons, or recommended in online shopping. Rule discovery is a two-stage process: candidate rules are first generated, most popularly with the Apriori algorithm, and their strength is then assessed with criteria such as support, confidence and lift in order to choose a useful subset.
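The two stages can be sketched in plain Python on a handful of hypothetical baskets. This is a minimal, unoptimized illustration of the Apriori idea under those assumptions, not a production implementation. Stage one finds the frequent itemsets level by level and prunes any candidate with an infrequent subset. Stage two splits each frequent itemset into rules and computes confidence = support(X ∪ Y) / support(X) and lift = confidence / support(Y). The function names are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    items = {i for b in baskets for i in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates, then prune any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent -> consequent rules."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((set(antecedent), set(consequent), sup, confidence, lift))
    return rules


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=0.6)
for lhs, rhs, sup, conf, lift in generate_rules(frequent, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

With these baskets, the rule beer -> diapers has confidence 1.0: every basket that contains beer also contains diapers.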
In collaborative filtering, the goal is to provide personalized recommendations that leverage user-level information. User-based collaborative filtering starts with a user, then finds users who have purchased a similar set of items or ranked items in a similar fashion, and makes a recommendation to the initial user based on what the similar users purchased or liked. Item-based collaborative filtering starts with an item being considered by a user, then locates other items that tend to be co-purchased with that first item.
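User-based collaborative filtering can be sketched in a few lines of Python, assuming a small, hypothetical ratings dictionary. In this sketch, similarity is the cosine between rating vectors, with unrated items treated as 0. An unseen item is scored by the similarity-weighted average rating given by the `k` most similar users. Cosine similarity and `k=2` are illustrative choices; real systems often use other similarity measures and normalize ratings per user.

```python
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two users' rating vectors (unrated items count as 0)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings: dict, user: str, k: int = 2) -> list:
    """User-based CF: predict scores for unseen items from the k most similar users."""
    target = ratings[user]
    neighbours = sorted(
        ((cosine(target, r), other) for other, r in ratings.items() if other != user),
        reverse=True,
    )[:k]
    weighted, total_sim = {}, {}
    for sim, other in neighbours:
        if sim <= 0:
            continue
        for item, rating in ratings[other].items():
            if item not in target:  # only recommend items the user has not seen
                weighted[item] = weighted.get(item, 0.0) + sim * rating
                total_sim[item] = total_sim.get(item, 0.0) + sim
    predicted = {item: weighted[item] / total_sim[item] for item in weighted}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical 1-5 ratings.
ratings = {
    "ana": {"Squid Game": 5, "Alice in Borderland": 4, "Kingdom": 4},
    "ben": {"Squid Game": 5, "Alice in Borderland": 5, "Narcos": 2},
    "cara": {"Squid Game": 1, "Narcos": 5, "Ozark": 5},
    "me": {"Squid Game": 5},
}
for title, score in recommend(ratings, "me"):
    print(f"{title}: {score:.2f}")
```

An item-based variant would instead compare items by their columns of ratings (or co-purchase counts) and recommend the items most similar to the one the user is considering.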
2b. Respond to the following answer to the question above (100 words)
Association rules are “if-then” statements that help to show the probability of relationships between data items within large data sets in various types of databases. Association rule mining has a number of applications and is widely used to help discover sales correlations (Lutkevich, 2020). One example is in retailing, where retailers collect data about purchasing patterns by recording purchase data as item barcodes are scanned by point-of-sale systems. Machine learning models can look for co-occurrence in this data to determine which products are most likely to be purchased together. The retailer can then adjust its marketing and sales strategy to take advantage of this information (Lutkevich, 2020).
Collaborative filtering makes predictions about a user’s interests by collecting preference data from many users. One example is video platforms such as Amazon or Netflix, which match videos to a viewer based on past viewing: the system identifies the videos and dramas that the viewer, and viewers with similar histories, have watched before. For example, I have watched Squid Game on Netflix. Thanks to collaborative filtering, Netflix might recommend Alice in Borderland, or New World starring Lee Jung-jae, the lead actor of Squid Game, because other viewers who watched Squid Game also watched those titles.
3. What is the usefulness of Cluster Analysis? What is Hierarchical Clustering? Give examples. (300 words)
3a. Respond to the following answer to the question above (100 words)
Cluster analysis is commonly used for segmentation, that is, for grouping observations into classes. In market research, it could be used to define segments by characteristics such as age range, wage range, and urban, rural, or suburban location. Cluster analysis can be used for personalization in branding, to target distinct client groups with the most appropriate communications. Healthcare researchers could use cluster analysis to see whether specific geographic locations are associated with high or low levels of certain illnesses, allowing them to look into probable local factors that contribute to medical conditions.
Hierarchical clustering, also known as hierarchical cluster analysis, is a method of grouping related objects into clusters arranged in a tree-like hierarchy. The end result is a collection of clusters that are distinct from one another, while the items within each cluster are broadly similar.
3b. Respond to the following answer to the question above (100 words)
As mentioned in the question, cluster analysis tries to identify structures within the data. In other words, it tries to identify homogeneous groups in the data when the groups are not known in advance. It can be used in various areas, such as medicine, marketing, education and biology. In medicine, a diagnostic questionnaire may include multiple possible symptoms, such as anxiety and depression, and cluster analysis can identify groups of patients with similar symptoms. In marketing, a market researcher may conduct a survey to build a picture of customers’ needs, demographics and behaviors, and then use cluster analysis to identify groups of customers with similar attributes. In education, researchers may measure aptitude and achievement characteristics, and cluster analysis can help identify which groups of students need special attention. In biology, researchers record different attributes of plants and can then divide the plants into groups and subgroups to build a taxonomy of species. These are only a few examples; cluster analysis is used in many other areas to provide insights into the structure of a dataset.
Hierarchical clustering is one of the most common methods of cluster analysis. It creates clusters that have a predetermined ordering from top to bottom, much as the files and folders on a hard disk are organized in a hierarchy. There are two types of hierarchical clustering: agglomerative and divisive. Agglomerative clustering is a “bottom-up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. By contrast, divisive clustering is a “top-down” approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
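The agglomerative (bottom-up) procedure can be sketched in plain Python. This minimal version assumes single linkage, where the distance between two clusters is the distance between their closest members, and uses a few hypothetical 2-D points. Other linkage rules (complete, average, Ward) follow the same merge loop with a different distance function. The returned merge history is the information a dendrogram draws.

```python
import math


def agglomerative(points, n_clusters):
    """Bottom-up (agglomerative) clustering with single linkage.

    Every point starts in its own cluster; the two closest clusters are merged
    repeatedly until n_clusters remain. Returns the clusters (lists of point
    indices) and the merge history.
    """
    clusters = [[i] for i in range(len(points))]
    history = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        d, x, y = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        history.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
    return clusters, history


points = [(1, 1), (1.5, 1), (5, 5), (5, 6), (9, 1)]  # hypothetical 2-D observations
clusters, history = agglomerative(points, n_clusters=3)
print(clusters)  # [[4], [0, 1], [2, 3]]
for a, b, d in history:
    print(f"merge {a} + {b} at distance {d:.2f}")
```

Running with `n_clusters=1` continues merging until a single cluster remains, which produces the full hierarchy rather than one cut of it.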
4. Please explain the difference between the predictive nature of time series forecasting vs. the descriptive or explanatory task of time series analysis. Give examples. (300 words)
4a. Respond to the following answer to the question above (100 words)
Time series forecasting applies statistics and modeling to time series data to make predictions that support strategic decisions. Forecasts are not always accurate, and their accuracy can vary significantly when dealing with time series data and other elements that are out of our control. Nevertheless, forecasting provides insight into which events are more or less likely to occur (Hajifar, Sun, Megahed, Jones-Farmer, Rashedi, & Cavuoto, 2021).
Time series tasks can be separated into two distinct categories: predictive (forecasting) and descriptive (analysis). As a general rule, descriptive tasks focus on finding patterns or rules in the data that humans can interpret. Common data mining tasks include classification, clustering, association rule mining, time series mining and regression; of these, classification and regression are predictive and explanatory tasks (Makridakis, Spiliotis, & Assimakopoulos, 2021).
When I was a marketer, we used historical sales data to estimate whether we would meet our sales goals in a certain time period. In my opinion, this is the predictive aspect of time series forecasting, because we use data to predict possible outcomes. We could also use time series analysis to uncover trends in the demographics of our customers over time, which falls under the descriptive goal of time series analysis. Although the “predictive nature” and the “descriptive/explanatory task” are complementary, they are used at different points in the research process (Pasini, Khouadjia, Samé, Trépanier, & Oukhellou, 2021).
4b. Respond to the following answer to the question above (100 words)
Time series analysis is the process of analyzing data points collected over time. In a time series, the data are recorded at regular time intervals rather than at random points, which analysts should keep in mind. Applications of time series forecasting include stock price forecasting, business planning and weather forecasting. Forecasting can also be framed as a supervised regression problem.
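To show how forecasting can be framed as a supervised regression problem, the minimal Python sketch below uses a sliding window. Each row of features holds the previous `n_lags` observations, and the target is the value that follows. The function name `make_supervised` and the sales figures are hypothetical, and any regression model could then be trained on the resulting pairs.

```python
def make_supervised(series, n_lags):
    """Turn a time series into (features, target) pairs for a regression model.

    Each row of X holds the n_lags previous values; y holds the value that follows.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


monthly_sales = [120, 125, 130, 128, 135, 140]  # hypothetical values
X, y = make_supervised(monthly_sales, n_lags=3)
for features, target in zip(X, y):
    print(features, "->", target)
# [120, 125, 130] -> 128
# [125, 130, 128] -> 135
# [130, 128, 135] -> 140
```

Because the rows keep their time order, a model trained this way should be validated on later windows rather than on randomly shuffled ones.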
Predictive forecasting is an automated forecasting technique that helps a company produce forecasts and adjust them continuously. Predictive forecasts help the company identify new opportunities and risks at an early stage and can help it grow profits. Predictive forecasting draws on a multitude of inputs, including trends, values, cycles and fluctuations of data in different business areas, and each piece of data is analyzed to make the necessary predictions for the business. It is one of the most powerful data-driven approaches for supporting the business, and it can substantially change corporate planning. With predictive forecasting, companies can also identify patterns in near real time.
Descriptive time series analysis is another useful approach, and it relies on descriptive methods such as models with a deterministic trend, which are fitted with regression methods. Descriptive or explanatory time series analysis focuses on analyzing existing data to understand the current situation of the company, or of whatever entity the data set describes. It provides a clear summary of the data, reveals the systematic patterns in a particular time series, and helps explain their underlying causes.
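A deterministic-trend regression makes the contrast in this question concrete. In the minimal Python sketch below, which uses hypothetical sales figures and illustrative function names, the same least-squares line serves two purposes. Descriptively, the fitted slope summarizes how sales have been changing. Predictively, the line is extended past the last observation. A linear trend is only one simple choice, and real series often also need seasonal and autocorrelation terms.

```python
def fit_linear_trend(y):
    """Ordinary least-squares fit of y_t = a + b*t for t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def extrapolate(a, b, n, horizon):
    """Predictive use: extend the fitted trend `horizon` steps past the data."""
    return [a + b * t for t in range(n, n + horizon)]


sales = [100, 104, 109, 111, 116, 120]  # hypothetical monthly sales
a, b = fit_linear_trend(sales)
# Descriptive use: the slope summarizes the systematic pattern in the history.
print(f"trend: about {b:.2f} units per month, starting near {a:.2f}")
# Predictive use: project the same trend forward two months.
print([round(v, 1) for v in extrapolate(a, b, len(sales), horizon=2)])
```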
