Black Orange Bold Modern Project Proposal Presentation

Published on Apr 21, 2024

Scene 1 (0s)

& Genetics. Presented by Kartik (2101730051).

Scene 2 (21s)

[Audio] This slide gives an overview of our project. It starts with the project timeline, followed by a description of the datasets used. Next come the data preprocessing and feature engineering techniques used to prepare the data for the machine learning model. Lastly, the ML model is presented along with its accuracy results and expected outcome.

Scene 3 (45s)

[Audio] The goal of this project is to create a machine learning model that predicts genes associated with genetic disorders and classifies those disorders to support accurate diagnosis. We will collaborate with geneticists and medical professionals, use curated datasets, and apply feature engineering techniques to identify significant genetic characteristics. Our objectives are to enable early detection and personalized treatment for the public, to promote precision medicine and scientific research in genetics, and to ensure model accuracy and applicability.

Scene 4 (1m 19s)

[Audio] The project was broken down into three distinct stages. First, we chose our idea and assessed the problem. Next, we concentrated on data preprocessing and feature extraction. Finally, we implemented the model itself. Further technical details of each stage are provided in the subsequent slides.

Scene 5 (1m 40s)

[Audio] The dataset for this project has 22,083 records for training and 9,465 for testing. Each record contains 43 attributes plus a target variable denoting the genetic disorder and its subclass. The machine learning model is used to predict the target variable from the other fields in the dataset.
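The record counts quoted here work out to roughly a 70/30 split between training and testing data. A quick check of that arithmetic, using only the two counts from the narration (nothing else about the dataset is assumed):

```python
# Record counts quoted in the narration.
n_train = 22083
n_test = 9465

n_total = n_train + n_test
train_share = n_train / n_total
test_share = n_test / n_total

print(f"total records: {n_total}")
print(f"train share:   {train_share:.1%}")
print(f"test share:    {test_share:.1%}")
```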

Scene 6 (2m 1s)

[Audio] This slide describes the data preprocessing steps we took while developing our machine learning model. We explored the dataset to understand how its features relate to genetic disorders, and computed statistical summaries and correlations to guide feature selection. We addressed missing values and handled outliers. We used visualizations to identify useful patterns in the data, and then optimized the feature set by removing irrelevant or redundant features identified during exploratory data analysis.
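The narration does not say which techniques were used for each step, so the following is only a minimal sketch under stated assumptions: median imputation for missing values, IQR-based clipping for outliers, and dropping constant columns as one simple form of redundant-feature removal. The column names and values are hypothetical and are not taken from the project dataset.

```python
from statistics import median, quantiles

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

def drop_constant_columns(table):
    """Drop columns with a single distinct value, which carry no signal."""
    return {name: col for name, col in table.items() if len(set(col)) > 1}

# Toy columns (hypothetical names and values, for illustration only).
table = {
    "patient_age": [4, 7, None, 5, 6, 90],
    "blood_cell_count": [4.8, 5.1, 4.9, None, 5.0, 5.2],
    "site_code": [1, 1, 1, 1, 1, 1],
}

# Step 1: fill missing values column by column.
cleaned = {name: impute_median(col) for name, col in table.items()}
# Step 2: limit the influence of extreme values in one numeric column.
cleaned["patient_age"] = clip_outliers_iqr(cleaned["patient_age"])
# Step 3: remove columns that cannot help the model.
cleaned = drop_constant_columns(cleaned)

print(sorted(cleaned))
```

In a real pipeline the imputation values and clipping bounds would be computed on the training records only and then reused on the test records, so that no information leaks from the test set.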

Scene 7 (2m 41s)

[Audio] We use four different algorithms in this machine learning model to predict accurate results: CatBoost, Random Forest, XGBoost, and LightGBM. CatBoost delivers robust performance with effective built-in handling of categorical features and no need for manual feature scaling. Random Forest offers strong performance through ensemble learning and also provides feature importance analysis and interpretability. XGBoost achieves high accuracy and efficiency through optimized gradient boosting and hyperparameter tuning. Lastly, LightGBM provides fast training and prediction using histogram-based algorithms, which helps with accuracy and scalability.
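One way the four classifiers could be set up side by side is sketched below. It assumes the standard Python packages for each algorithm (`catboost`, `scikit-learn`, `xgboost`, `lightgbm`) and simply skips any that are not installed. The helper name `available_models` is hypothetical, and the narration does not state which hyperparameters the project used, so none are set here.

```python
from importlib import import_module
from importlib.util import find_spec

# (display name, module path, classifier class name)
CANDIDATES = [
    ("CatBoost", "catboost", "CatBoostClassifier"),
    ("RandomForest", "sklearn.ensemble", "RandomForestClassifier"),
    ("XGBoost", "xgboost", "XGBClassifier"),
    ("LightGBM", "lightgbm", "LGBMClassifier"),
]

def available_models():
    """Return {name: classifier class} for each library that can be imported."""
    models = {}
    for name, module_path, class_name in CANDIDATES:
        top_level = module_path.split(".")[0]
        if find_spec(top_level) is None:
            continue  # package not installed, skip it
        try:
            module = import_module(module_path)
        except (ImportError, OSError):
            continue  # installed but not loadable in this environment
        models[name] = getattr(module, class_name)
    return models

models = available_models()
print("usable classifiers:", sorted(models))
```

All four classes follow the scikit-learn estimator convention, so each can be trained with `model = cls(); model.fit(X_train, y_train)` and evaluated with `model.predict(X_test)`, where `X_train`, `y_train`, and `X_test` stand for the prepared feature and label arrays.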

Scene 8 (3m 27s)

[Audio] We evaluated our model with the four machine learning algorithms: CatBoost, RandomForest, XGBoost, and LightGBM. Overall accuracy was 56% with CatBoost, 57% with RandomForest, 59% with XGBoost, and 56% with LightGBM. Subclass prediction accuracy was higher: 68% with CatBoost, 70% with RandomForest, 71% with XGBoost, and 70% with LightGBM. XGBoost scored highest on both measures. We are pleased with these results and extend our sincere gratitude to the team for their hard work and commitment.
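For reference, accuracy is the fraction of predictions that match the true labels. The sketch below defines that metric and ranks the figures reported in the narration. The toy labels are made up for illustration, and the narration does not say which data split the reported percentages were measured on.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Accuracies as reported in the narration, in percent.
reported = {
    "CatBoost": {"overall": 56, "subclass": 68},
    "RandomForest": {"overall": 57, "subclass": 70},
    "XGBoost": {"overall": 59, "subclass": 71},
    "LightGBM": {"overall": 56, "subclass": 70},
}

best_overall = max(reported, key=lambda name: reported[name]["overall"])
best_subclass = max(reported, key=lambda name: reported[name]["subclass"])

# Toy labels (illustrative only): 3 of 4 predictions are correct.
toy_true = ["A", "B", "A", "C"]
toy_pred = ["A", "B", "C", "C"]
toy_score = accuracy(toy_true, toy_pred)

print(f"toy accuracy: {toy_score:.2f}")
print(f"best overall: {best_overall}, best subclass: {best_subclass}")
```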