Objective:
The objective of this assignment is to enable students to apply data science and AI techniques to a real-world dataset. It will help you understand the intricacies of selecting an appropriate dataset, formulating a research problem, choosing a suitable methodology, and interpreting results in a scientific manner. The assignment is divided into two parts: a proposal submission and a full research paper.
Part A: Proposal Submission
Dataset Selection: Select a dataset that is not commonly used in typical data science tutorials or coursework. Examples of overused datasets include Iris and mtcars. Your dataset selection must be approved by the instructor.
Research Problem Definition: Clearly define the problem you wish to address with this dataset. This could be a classification problem, a clustering task, time series analysis, regression, etc.
Preliminary Methodology: Provide a brief description of the techniques or algorithms you plan to use to address your research problem.
Submission:
Dataset Description: A brief description of your dataset, including its source, the type of data it contains, and why it is significant.
Research Problem: A clear definition of the problem you aim to solve.
Planned Methodology: A brief description of your planned approach.
Evaluation Criteria:
- Relevance and uniqueness of the dataset.
- Clarity in the definition of the research problem.
- Suitability of the planned methodology.
Here are some example projects to give you ideas:
Predicting Housing Prices
Dataset: Historical housing sales data from a specific region/country.
Objective: Utilize regression techniques to predict future housing prices based on features like location, square footage, number of bedrooms, etc.
Methods: Linear regression, decision trees, random forests, etc.
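To make the regression idea concrete, here is a minimal sketch of the simplest case: a one-feature linear regression fitted by ordinary least squares and scored on held-out data. The numbers are made up for illustration (the prices are an exact multiple of the area, so the fit is perfect), and the function names are hypothetical. A real project would use many features, noisy data, and most likely a library implementation.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical data: floor area (square feet) and sale price (thousands).
area = [800, 1000, 1200, 1500, 1800, 2000]
price = [160, 200, 240, 300, 360, 400]

# Fit on the first four sales, evaluate on the last two.
train_x, test_x = area[:4], area[4:]
train_y, test_y = price[:4], price[4:]
slope, intercept = fit_line(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]
test_rmse = rmse(test_y, predictions)
```

Reporting the error on data the model did not see during fitting, as `test_rmse` does here, is the habit worth carrying over to the full paper.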
Analyzing Customer Sentiment from Reviews
Dataset: Customer reviews for products from an e-commerce website.
Objective: Classify the sentiment of the review (e.g., positive, negative, neutral) and determine key factors that contribute to customer satisfaction.
Methods: Natural language processing (NLP), sentiment analysis, Naive Bayes, SVM, etc.
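As an illustration of one of the listed methods, the sketch below implements a small multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The four training reviews are invented, the function names are hypothetical, and tokenization is a plain whitespace split, so treat it as a way to see the mechanics rather than a usable sentiment model.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(samples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior for the given text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the score.
            score += log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training reviews, for illustration only.
reviews = [
    ("great product works well", "positive"),
    ("love it excellent quality", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("awful waste of money", "negative"),
]
model = train_nb(reviews)
```

Calling `predict_nb(model, "excellent product")` returns `"positive"` on this toy data, because both words appear only in positive training reviews.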
Recommendation System for Movies or Books
Dataset: User ratings for movies or books from platforms like IMDb, Goodreads, etc.
Objective: Develop a recommendation system that suggests movies/books to users based on their historical preferences.
Methods: Collaborative filtering, matrix factorization, deep learning techniques.
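The sketch below shows one simple form of collaborative filtering: user-based, with cosine similarity between rating vectors (unrated items count as zero) and a similarity-weighted average to predict ratings for unseen items. The users, items, and ratings are hypothetical, as are the function names. Matrix factorization and deep learning approaches are not shown.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two {item: rating} dicts; missing items count as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(ratings, user, k=1):
    """Return up to k unseen items ranked by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:k]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ana": {"movie_a": 5, "movie_b": 4, "movie_c": 1},
    "ben": {"movie_a": 5, "movie_b": 5, "movie_c": 1, "movie_d": 5, "movie_e": 1},
    "cat": {"movie_a": 1, "movie_b": 2, "movie_c": 5, "movie_d": 2, "movie_e": 5},
}
top_for_ana = recommend(ratings, "ana", k=1)
```

Here `ana` rates most like `ben`, so the item `ben` rated highly (`movie_d`) is ranked first for her.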
Forecasting Stock Market Prices
Dataset: Historical stock market data for selected companies.
Objective: Utilize time series analysis to predict future stock prices or identify patterns that could suggest buy/sell decisions.
Methods: ARIMA, Prophet, LSTM neural networks, etc.
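Before fitting any of these models, it helps to know what a trivial forecast achieves. The sketch below is not ARIMA, Prophet, or an LSTM. It is a baseline: a one-step-ahead moving-average forecast evaluated walk-forward, so each prediction uses only earlier observations. The price series and function names are made up for illustration.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def walk_forward_mae(series, window, start):
    """Mean absolute error of one-step-ahead forecasts from index `start` onward.

    At each step t the forecast sees only series[:t], never future values.
    """
    errors = []
    for t in range(start, len(series)):
        forecast = moving_average_forecast(series[:t], window)
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)

# Hypothetical daily closing prices with an upward trend.
prices = [100, 102, 101, 105, 107, 106, 110, 112, 111, 115]

naive_mae = walk_forward_mae(prices, window=1, start=3)  # "tomorrow equals today"
ma3_mae = walk_forward_mae(prices, window=3, start=3)    # 3-day moving average
```

On this trending series the naive forecast has the lower error, which is a useful reminder that a more elaborate model is only worth reporting if it beats baselines like these under the same evaluation.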
Clustering News Articles
Dataset: A collection of news articles from various sources over a specific time period.
Objective: Group articles into clusters based on their content to identify common themes or topics being discussed.
Methods: K-means clustering, hierarchical clustering, topic modeling (e.g., Latent Dirichlet Allocation), etc.
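A minimal sketch of K-means (Lloyd's algorithm) follows. It assumes each article has already been turned into a numeric vector. Here each vector is a hypothetical pair of keyword counts, whereas a real project would use something like TF-IDF features. The initial centroids are passed in explicitly so the result is deterministic, and the function names are illustrative.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assignment/update rounds and return (labels, centroids)."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return [nearest(p, centroids) for p in points], centroids

# Hypothetical article vectors: (count of a politics keyword, count of a sports keyword).
articles = [(5, 0), (4, 1), (6, 1), (0, 5), (1, 4), (1, 6)]
labels, centroids = kmeans(articles, centroids=[articles[0], articles[3]])
```

With this toy data the first three articles end up in one cluster and the last three in the other. In practice you would also need to choose the number of clusters and try several initializations.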