Data Science & ML

Applied machine learning with a quality-first mindset.

My data science work emphasizes model evaluation, data quality, interpretable results, and practical business use cases. It is aimed at QA Automation, Data Quality, ML QA, and applied Data Science roles.

Interactive Study Lab

Data Science Flashcards

A small portfolio feature for learning and interview prep. Each card includes a visual memory cue, mini diagram, or formula-style hint so concepts are easier to remember. I can use this area to add flashcards from my courses, and visitors can test themselves on machine learning, metrics, QA-for-ML, and World Publishing Houses examples.

Add a new flashcard

Add temporary study cards directly in the browser. They are saved on this device with local storage. For permanent public cards, add them to assets/script.js in the baseFlashcards list. Each card can include a memory cue to help create a mental picture.

How this supports my portfolio

This feature shows more than course knowledge. It demonstrates how I think about learning systems, product usability, local state, clear interaction design, and explainable machine learning concepts.

  • Recruiters see applied data science topics, not only a static resume.
  • Hiring managers can connect my QA mindset to ML evaluation and risk.
  • I can keep adding cards from BU courses, interview prep, and World Publishing Houses (WPH) use cases.
  • The diagrams make the page feel more like a small learning product than a static portfolio.

Capstone: Residential Property Value Prediction

This project predicts residential property values using a Zillow-derived dataset of approximately 64,894 records and 19 numeric variables. The target variable is property tax assessed value, and the project compares linear regression, random forest, and gradient boosting.

Best model: Random Forest (RMSE ~$290,919, MAE ~$184,569, R² ~0.517).
Primary metric: RMSE, supported by MAE and R².
Model comparison: Random Forest performed best among the evaluated linear regression, random forest, and gradient boosting approaches.
Business use case: Valuation support for investors, lenders, assessors, analysts, and real-estate decision makers.
Ethics lens: Location features can proxy socioeconomic patterns; monitoring and retraining are important.
  • Python
  • Pandas
  • scikit-learn
  • Random Forest
  • Gradient Boosting
  • Model evaluation
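
A minimal sketch of how a three-model comparison like this can be set up in scikit-learn. It is not the capstone code. The data is a synthetic stand-in generated with make_regression (an assumption, since the Zillow-derived dataset is not included here), and the hyperparameters, split ratio, and names such as evaluate and results are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 19 numeric features (assumption: NOT the capstone dataset).
X, y = make_regression(n_samples=500, n_features=19, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three model families named in the project; settings are illustrative.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}


def evaluate(model):
    """Fit on the training split and score on the held-out split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "r2": float(r2_score(y_test, pred)),
    }


results = {name: evaluate(model) for name, model in models.items()}

# RMSE is the primary metric, so the lowest RMSE decides the ranking.
best_name = min(results, key=lambda name: results[name]["rmse"])

for name, scores in sorted(results.items(), key=lambda kv: kv[1]["rmse"]):
    print(f"{name:18s} RMSE={scores['rmse']:.2f} "
          f"MAE={scores['mae']:.2f} R2={scores['r2']:.3f}")
print("best by RMSE:", best_name)
```

The ranking on this synthetic data says nothing about the capstone results. make_regression produces a linear target, so the linear model will likely do well here, whereas on the real property data Random Forest performed best.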

Text mining and classification

I am connecting course concepts like embeddings, text classification, evaluation metrics, and responsible AI to World Publishing Houses. Potential applications include classifying publisher descriptions, detecting genre signals, identifying translation-related text, and improving search/discovery.
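
As a rough illustration of the text classification idea, the sketch below trains a tiny baseline that flags translation-related descriptions. Everything in it is an assumption for demonstration: the example sentences and the two labels are made up, and TF-IDF features with logistic regression stand in for embeddings as a simple starting point.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training examples; real publisher descriptions would replace these.
texts = [
    "Publishes literary fiction translated from Spanish and Portuguese",
    "Specializes in English translations of contemporary Korean novels",
    "Translated poetry and essays from Arabic into French",
    "Independent press focused on translation of Nordic crime fiction",
    "Academic publisher of engineering and computer science textbooks",
    "Children's picture books and early reader series",
    "Regional press printing local history and cookbooks",
    "Publisher of business, finance, and management titles",
]
labels = [
    "translation", "translation", "translation", "translation",
    "other", "other", "other", "other",
]

# Word and bigram TF-IDF features feeding a linear classifier.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

new_texts = [
    "Small house publishing novels translated from Japanese",
    "Publisher of medical and nursing textbooks",
]
predictions = clf.predict(new_texts)
probabilities = clf.predict_proba(new_texts)

for text, label, probs in zip(new_texts, predictions, probabilities):
    confidence = dict(zip(clf.classes_, probs.round(2)))
    print(f"{label:12s} {confidence}  <- {text}")
```

With only eight examples the predictions are not reliable. A real evaluation would need a labeled holdout set and metrics such as precision and recall per class.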

Data product applications

World Publishing Houses creates many natural ML opportunities: entity resolution for publisher and translator names, clustering countries or publishers by activity, predicting translation likelihood, and identifying metadata conflicts that require human review.
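
To make the entity resolution idea concrete, here is a small sketch that normalizes names and proposes likely duplicates for human review. It uses only the Python standard library. The sample names, the 0.85 threshold, and the helper names normalize, similarity, and candidate_pairs are illustrative assumptions, not part of the World Publishing Houses codebase.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_pairs(names, threshold=0.85):
    """Return (name_a, name_b, score) for pairs at or above the threshold.

    These are candidates for human review, not confirmed matches.
    """
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = round(similarity(a, b), 3)
            if score >= threshold:
                pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Illustrative sample of name variants (assumption, not project data).
names = [
    "Éditions Gallimard",
    "Editions Gallimard",
    "Penguin Books",
    "Penguin Books Ltd.",
    "Suhrkamp Verlag",
]

pairs = candidate_pairs(names)
for a, b, score in pairs:
    print(f"{score:.3f}  {a!r} <-> {b!r}")
```

The pairwise loop is quadratic in the number of names, so a larger catalog would need a blocking step (for example, comparing only names that share a first token) before scoring.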

Tools and concepts

Technologies and methods I use or study in my QA-to-DS transition.

Python data stack

Pandas, NumPy, scikit-learn, notebooks, data cleaning, model comparison, and evaluation.

ML concepts

Regression, classification, clustering, embeddings, decision trees, gradient boosting, cross-validation, and error analysis.

Engineering context

SQL, API testing, logs, automation, CI/CD awareness, cloud/data pipeline concepts, and production QA thinking.