
Unlocking Insights with Data Science: My Internship Journey at Cognifyz Technologies

  • Writer: Geoffrey Ogato
  • Feb 19
  • 4 min read

Introduction

Data science has rapidly evolved into one of the most crucial fields in technology, impacting industries ranging from finance to healthcare.


My journey into data science took a significant leap when I was selected as a Data Science Intern at Cognifyz Technologies.


This opportunity allowed me to work on real-world datasets, build predictive models, and explore geospatial analytics, reinforcing my passion for machine learning and data-driven decision-making.


This article serves as a comprehensive breakdown of my experience, including the challenges I faced, the tools I used, and the key takeaways from my internship.


Project Overview


During my internship, I worked extensively with a dataset containing 9,551 entries, which primarily focused on the restaurant industry.


My tasks involved data cleaning, exploratory data analysis (EDA), feature engineering, predictive modeling, and geospatial analysis.


Through these tasks, I was able to develop a deeper understanding of data processing, visualization, and model optimization.


Exploratory Data Analysis (EDA)

EDA is the foundation of any data science project. The first step in my workflow was to explore, clean, and visualize the dataset to uncover patterns and inconsistencies.


  • Identifying Missing Values: I used Pandas and NumPy to analyze and handle missing values in the dataset.


  • Statistical Analysis: I computed summary statistics, such as mean, median, mode, and standard deviation, for numerical features like restaurant pricing, customer ratings, and votes.


  • Visualization Techniques: I employed Matplotlib and Seaborn to create histograms, KDE plots, and bar charts to better understand data distributions and feature relationships.


  • Key Findings:

    • Only 12.1% of restaurants in the dataset offer table booking, while 25.7% provide online delivery.

    • Higher-end restaurants (Price Range 4) tend to receive the highest average ratings.
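The EDA steps above can be sketched in a few lines of pandas. This is a minimal illustration on a tiny, made-up DataFrame, not the internship code. The column names (`Price range`, `Aggregate rating`, `Votes`, `Has Table booking`, `Has Online delivery`) are assumptions about the dataset's schema, and the numbers this toy example produces say nothing about the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the restaurant dataset; column names are assumptions
df = pd.DataFrame({
    "Price range": [1, 2, 4, 4, 3, 1],
    "Aggregate rating": [3.1, 3.6, 4.4, np.nan, 4.0, 2.9],
    "Votes": [12, 80, 900, 450, 300, 5],
    "Has Table booking": ["No", "No", "Yes", "Yes", "No", "No"],
    "Has Online delivery": ["Yes", "Yes", "No", "No", "Yes", "No"],
})

# 1. Count missing values per column
missing = df.isna().sum()

# 2. Summary statistics (mean, std, quartiles including the median)
summary = df[["Aggregate rating", "Votes"]].describe()
# describe() has no mode; mode() may return several rows when values tie
modes = df["Price range"].mode()

# 3. Percentage of restaurants offering each service
service_share = (
    df[["Has Table booking", "Has Online delivery"]]
    .eq("Yes")
    .mean()
    .mul(100)
    .round(1)
)

# 4. Average rating per price range (mean() skips the NaN rating)
rating_by_price = df.groupby("Price range")["Aggregate rating"].mean()

print(missing, summary, service_share, rating_by_price, sep="\n\n")
```

On the full dataset, a call such as `seaborn.histplot(data=df, x="Aggregate rating", kde=True)` would give the histogram-with-KDE view described above.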

Data Preprocessing & Feature Engineering

To ensure that the dataset was suitable for machine learning models, I implemented various data preprocessing techniques:


  • Data Cleaning: Used SimpleImputer to handle missing values and StandardScaler to standardize numerical features (rescaling each to zero mean and unit variance).


  • Feature Transformation: Applied OneHotEncoder to convert categorical variables into numerical representations.


  • Outlier Detection: Identified anomalies in restaurant votes and pricing data and handled them appropriately to avoid skewed model performance.


  • Class Imbalance: Addressed imbalance in restaurant ratings by using upsampling, downsampling, and data augmentation techniques.
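The preprocessing steps above can be sketched with scikit-learn as follows. This is a hedged illustration, not the internship pipeline: the column names are assumed, the IQR capping rule is one common outlier technique swapped in because the exact handling is not stated, and only upsampling (via `sklearn.utils.resample`) is shown from the imbalance techniques listed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils import resample

# Toy data; column names are assumptions about the restaurant dataset's schema
df = pd.DataFrame({
    "Average Cost for two": [400, 800, np.nan, 3000, 600, 50000],
    "Votes": [10, 120, 35, np.nan, 60, 8],
    "City": ["New Delhi", "Gurgaon", "New Delhi", "Noida", "Gurgaon", "Noida"],
    "Has Online delivery": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "Rating text": ["Good", "Average", "Average", "Excellent", "Average", "Average"],
})
NUMERIC = ["Average Cost for two", "Votes"]
CATEGORICAL = ["City", "Has Online delivery"]


def clip_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR]; NaNs are left untouched."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Outlier handling: cap extreme prices instead of dropping rows
df["Average Cost for two"] = clip_iqr(df["Average Cost for two"])

# Impute + standardize numeric columns, one-hot encode categorical ones
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ],
    sparse_threshold=0.0,  # always return a dense array
)
X = preprocess.fit_transform(df)

# Class imbalance: upsample each minority rating class to the majority size
target_size = df["Rating text"].value_counts().max()
parts = []
for _, group in df.groupby("Rating text"):
    if len(group) < target_size:
        group = resample(group, replace=True, n_samples=target_size, random_state=0)
    parts.append(group)
balanced = pd.concat(parts, ignore_index=True)
```

In a real workflow, the clipping thresholds, imputer and scaler would be fitted on the training split only, and resampling applied only to training rows, so that no information from the test set leaks into the model.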


Building Predictive Models

With a well-prepared dataset, I developed various machine learning models to predict restaurant ratings based on key features such as location, price range, and availability of online services.


Model Selection & Training

I experimented with multiple models to identify the best-performing one:


  • Linear Regression: Established a baseline but proved insufficient because of non-linear relationships in the data.


  • Decision Trees: Offered easily interpretable, rule-based splits but were prone to overfitting.


  • Random Forest Regressor: Achieved superior performance by reducing variance and improving generalization.
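A comparison like the one above can be sketched with cross-validation. This minimal example uses scikit-learn's synthetic `make_friedman1` data, which is deliberately non-linear, as a stand-in for the restaurant features; it illustrates the procedure and does not reproduce the internship results.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic, non-linear regression data standing in for the restaurant features
X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate model
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>6}: R^2 = {score:.3f}")
```

Averaging over folds rather than trusting a single train/test split makes the ranking less sensitive to which rows happen to land in the test set.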


Hyperparameter Optimization

To fine-tune the models, I used GridSearchCV, which evaluates every combination in a grid of hyperparameter values using cross-validation, to improve predictive performance. The model evaluation metrics included:


  • Mean Absolute Error (MAE)


  • Mean Squared Error (MSE)


  • R² Score
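For reference, these are the standard definitions of the three metrics, where y_i is the true rating of restaurant i, ŷ_i the predicted rating, ȳ the mean true rating and n the number of restaurants evaluated:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

As a small worked example, for true ratings (3, 4, 5) and predictions (2.5, 4, 6) the errors are (0.5, 0, -1). That gives MAE = 1.5 / 3 = 0.5 and MSE = 1.25 / 3 ≈ 0.417. Since ȳ = 4 and the total sum of squares is 2, R² = 1 - 1.25 / 2 = 0.375.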


The Random Forest Regressor emerged as the most accurate model, reducing error significantly while still offering some interpretability through its feature-importance scores.
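A minimal sketch of tuning a Random Forest with GridSearchCV and scoring it on held-out data with the three metrics above. The grid values are hypothetical (the parameters actually tuned are not stated), and synthetic `make_friedman1` data stands in for the restaurant dataset.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; hold out 20% for a final, untouched evaluation
X, y = make_friedman1(n_samples=600, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid: 2 x 2 x 2 = 8 combinations, each scored with 5-fold CV
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 3],
}

# scikit-learn maximizes scores, so error metrics are passed in negated form
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# best_estimator_ is refit on the full training split with the best parameters
pred = search.best_estimator_.predict(X_test)
metrics = {
    "MAE": mean_absolute_error(y_test, pred),
    "MSE": mean_squared_error(y_test, pred),
    "R2": r2_score(y_test, pred),
}
print(search.best_params_, metrics)
```

Keeping the test split out of the search means the reported metrics are not inflated by the tuning itself.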



Geospatial Analysis: Understanding Restaurant Density

One of the most exciting aspects of my internship was exploring geospatial analytics. Using Folium and GeoPandas, I built interactive heatmaps to visualize restaurant concentration across different regions.


  • Created heatmaps to display restaurant density in India, the Middle East, and Southeast Asia.


  • Analyzed correlations between location and customer ratings, revealing trends in restaurant quality.


  • Identified data gaps in Africa and Europe, highlighting areas for potential market research.
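The density analysis above can be sketched as a grid-cell aggregation in pandas, plus an optional Folium heatmap. This is an illustrative sketch: the coordinates are made up, the column names (`Latitude`, `Longitude`, `Aggregate rating`) are assumptions, `density_by_cell` and `build_heatmap` are hypothetical helpers, and GeoPandas is not used here. Folium is imported inside `build_heatmap` so the aggregation runs even where Folium is not installed.

```python
import pandas as pd

# Toy coordinates; column names are assumptions about the dataset's schema
df = pd.DataFrame({
    "Latitude": [28.61, 28.62, 28.63, 1.29, 1.30, 25.20],
    "Longitude": [77.20, 77.21, 77.22, 103.85, 103.86, 55.27],
    "Aggregate rating": [3.5, 3.9, 4.1, 4.3, 4.0, 4.5],
})


def density_by_cell(frame: pd.DataFrame, cell_deg: float = 1.0) -> pd.DataFrame:
    """Count restaurants and average their rating per lat/lon cell of cell_deg degrees."""
    cells = frame.assign(
        lat_cell=(frame["Latitude"] // cell_deg) * cell_deg,
        lon_cell=(frame["Longitude"] // cell_deg) * cell_deg,
    )
    return (
        cells.groupby(["lat_cell", "lon_cell"])
        .agg(
            restaurants=("Aggregate rating", "size"),
            mean_rating=("Aggregate rating", "mean"),
        )
        .reset_index()
        .sort_values("restaurants", ascending=False)
    )


def build_heatmap(frame: pd.DataFrame, path: str = "restaurant_density.html") -> None:
    """Render an interactive Folium heatmap of restaurant locations to an HTML file."""
    import folium
    from folium.plugins import HeatMap

    center = [frame["Latitude"].mean(), frame["Longitude"].mean()]
    m = folium.Map(location=center, zoom_start=3)
    HeatMap(frame[["Latitude", "Longitude"]].values.tolist(), radius=12).add_to(m)
    m.save(path)


cells = density_by_cell(df)
print(cells)
```

Cells with many restaurants show concentration, the `mean_rating` column gives a first look at how ratings vary by location, and regions with no cells at all correspond to the data gaps mentioned above.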



Key Business Insights

Beyond technical development, this project also offered valuable business insights:


  • Urban Market Insights: The dataset confirmed that major metropolitan areas dominate the restaurant industry, emphasizing the need for targeted marketing.


  • Consumer Behavior Analysis: Low-cost restaurants were more likely to offer online delivery, while premium restaurants focused on in-house dining experiences.


  • Market Gaps & Opportunities: Analyzed customer ratings by geography, helping businesses identify areas with demand for new restaurants.
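Insights like these usually come from simple group-by summaries. The sketch below uses made-up rows and assumed column names (`City`, `Price range`, `Has Online delivery`, `Aggregate rating`) to show how delivery availability by price range and rating by city could be tabulated; it does not reproduce the internship's figures.

```python
import pandas as pd

# Made-up rows; column names are assumptions about the dataset's schema
df = pd.DataFrame({
    "City": ["New Delhi", "New Delhi", "New Delhi", "Gurgaon", "Gurgaon", "Noida"],
    "Price range": [1, 1, 4, 2, 4, 1],
    "Has Online delivery": ["Yes", "Yes", "No", "Yes", "No", "No"],
    "Aggregate rating": [3.2, 3.6, 4.4, 3.8, 4.2, 3.0],
})

# Share of restaurants offering online delivery within each price range
delivery_by_price = (
    df["Has Online delivery"].eq("Yes").groupby(df["Price range"]).mean()
)

# Listings and average rating per city; few listings may signal a market gap,
# or simply missing data, so this is a starting point rather than a conclusion
city_summary = (
    df.groupby("City")["Aggregate rating"]
    .agg(restaurants="size", mean_rating="mean")
    .sort_values("restaurants", ascending=False)
)
print(delivery_by_price, city_summary, sep="\n\n")
```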


Challenges & Lessons Learned

Throughout my internship, I encountered several challenges that strengthened my problem-solving skills:


  • Handling Large Datasets: Processing over 9,500 records required efficient data manipulation techniques to ensure smooth analysis.


  • Feature Selection: Choosing the right features for predictive modeling was crucial to improving model accuracy.


  • Interpreting Model Performance: Understanding why some models outperformed others deepened my knowledge of bias-variance tradeoffs.


  • Visualizing Data Effectively: I learned that clear, insightful visualizations significantly enhance decision-making.
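Two of these lessons, feature selection and reading the bias-variance tradeoff, can be checked directly in code. This hedged sketch uses synthetic `make_friedman1` data (only the first five of its ten features affect the target) in place of the restaurant dataset. It compares train/test R² gaps as a sign of variance and ranks features with scikit-learn's permutation importance, one common technique that is not necessarily the one used during the internship.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: features 0-4 drive y, features 5-9 are pure noise
X, y = make_friedman1(n_samples=800, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Bias-variance check: a large gap between train and test R^2 signals overfitting
gaps = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    gaps[name] = model.score(X_train, y_train) - model.score(X_test, y_test)

# Feature selection aid: average drop in test R^2 when each feature is shuffled
result = permutation_importance(
    models["forest"], X_test, y_test, n_repeats=10, random_state=0
)
ranking = result.importances_mean.argsort()[::-1]
print(gaps, ranking)
```

An unpruned tree fits its training data almost perfectly, so its gap is typically much larger than the forest's; averaging many trees trades a little bias for a large reduction in variance.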


Tools & Technologies Used

The internship provided hands-on experience with various data science tools and libraries, including:


  • Python: The primary language for data processing and model building.


  • Pandas & NumPy: Data manipulation and numerical computations.


  • Matplotlib & Seaborn: Data visualization.


  • Scikit-learn: Machine learning model development and evaluation.


  • Folium & GeoPandas: Geospatial data analysis and mapping.


  • GridSearchCV: Hyperparameter tuning for model optimization.


Future Directions & Applications

This project has inspired me to explore further applications of data science, such as:


  • Deploying Machine Learning Models: Packaging models into APIs or integrating them into web applications.


  • Time Series Forecasting: Predicting restaurant demand trends over time.


  • Deep Learning for Sentiment Analysis: Analyzing customer reviews to gain deeper insights into restaurant popularity.


Conclusion

My internship at Cognifyz Technologies has been an incredible learning experience. From data wrangling and predictive modeling to geospatial analytics and business insights, I have gained invaluable skills that will shape my career in data science and artificial intelligence.


This experience has reinforced my belief that data is the key to unlocking powerful insights. I am excited to continue applying my knowledge to real-world problems and expanding my expertise in machine learning and AI-driven solutions.


Let's Connect!


If you're interested in collaborating on data science projects or discussing machine learning innovations, feel free to reach out! 🚀


 
 
 
