About Me

My name is Prakhar Srivastava. I'm a Full Stack Data Scientist with a blend of skills in machine learning, data engineering, and data science, combining industrial expertise, academic rigor, and over three years of experience. I'm a recent master's graduate in Computer Science with a specialization in AI from the University of Tartu, Estonia.

Three words describe me: challenge seeker, passionate learner, and jokester.

I strongly believe that a data-driven approach to organizing data, combined with explainability in machine learning models, can reduce computation costs and carbon footprint 🌏

Employment History

Project: Deep Learning for Large-Scale Image Processing, Engineering and Analysis

1. Leveraged Euclidean distance-based clustering to group and preprocess massive volumes of optics-free image data, enabling downstream analysis and streamlined data pipelines.

2. Conducted comprehensive analysis on image data classes, employing advanced data wrangling techniques and building robust data structures to enable efficient data processing and manipulation.

3. As part of this project, completed my master's thesis on training deep metric models for optics-free image classification, using advanced machine learning libraries and frameworks to build scalable and optimized data pipelines.

4. Conducted performance evaluation and optimization of the model using sophisticated data engineering techniques, including data cleaning, feature engineering, and data normalization, delivering significant improvements in model accuracy and efficiency. 

    Skills: ETL Tools, Data Analysis, Deep Metric Learning, Image Processing, Python (Programming Language), Big Data, Data Mining
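The Euclidean-distance clustering step above can be sketched as a nearest-centroid assignment over image feature vectors (a minimal illustration with toy data; function names and vectors are hypothetical, not from the actual project):

```python
import numpy as np

def assign_clusters(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each image feature vector to its nearest centroid by Euclidean distance."""
    # Pairwise Euclidean distance matrix of shape (n_images, n_centroids)
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Toy example: four flattened "images" and two cluster centroids
features = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [4.8, 5.1]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_clusters(features, centroids)  # → [0, 0, 1, 1]
```

In practice the centroids would come from an iterative procedure such as k-means; the distance computation shown here is the core of each assignment step.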


Project 1: Cloud removal in remote sensing Sentinel-2 images 

1. Worked on removing clouds from satellite images using conditional-GAN image-to-image translation. 

2. Prepared paired clouded and cloud-free images for the model using time-series selection. Applied data augmentation using albumentations.

3. Built a network architecture that learns the mapping from clouded input images to cloud-free output images under a combined loss function.

4. Evaluated the cGAN-synthesized images using a pre-trained semantic classifier.

Tools: PyTorch, GIS, GDAL, rasterio, albumentations; Language: Python; AWS cloud services: SageMaker, S3 bucket.
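The time-series selection in step 2 can be sketched as pairing each clouded acquisition with the cloud-free acquisition closest in time (a simplified stand-in using dates only; real Sentinel-2 pairing would also account for tile, orbit, and cloud-cover metadata):

```python
from datetime import date

def pair_by_nearest_date(clouded, cloudless):
    """For each clouded acquisition date, pick the cloud-free acquisition
    closest in time, yielding aligned (input, target) training pairs."""
    pairs = []
    for c in clouded:
        target = min(cloudless, key=lambda d: abs((d - c).days))
        pairs.append((c, target))
    return pairs

clouded = [date(2021, 6, 10), date(2021, 7, 5)]
cloudless = [date(2021, 6, 8), date(2021, 6, 30), date(2021, 7, 20)]
pairs = pair_by_nearest_date(clouded, cloudless)
# → [(2021-06-10, 2021-06-08), (2021-07-05, 2021-06-30)]
```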


Project 2: Crop parcel delineation in remote sensing Sentinel-2 images 

1. Worked on delineating crop parcels using the hybrid Transformer and CNN architecture. 

2. Collaborated with a space agency to obtain labels and prepared the rasters from Sen2Cor. Performed cloud mask-out on labels using cloud coordinates and applied controlled data augmentation using albumentations. 

3. Built a hybrid UNet architecture in which features extracted by the CNN are converted into feature maps and passed sequentially to Transformer layers for up- and down-sampling, yielding a segmented binary image of the crop parcels. 

4. Evaluated the model using MIoU, FwIoU, and Dice-score metrics. 

Tools: PyTorch, TensorFlow, GIS, GDAL, rasterio, albumentations; Language: Python; AWS cloud services: SageMaker, S3 bucket. 
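The IoU and Dice-score metrics used for evaluation reduce to overlap counts on binary masks; a minimal NumPy sketch (toy masks, not project data):

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, target: np.ndarray):
    """IoU and Dice score for a pair of binary segmentation masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
target = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
iou, dice = iou_and_dice(pred, target)  # → (0.5, 0.666...)
```

MIoU averages this IoU over classes, and FwIoU weights each class by its pixel frequency.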


Project: Model optimization for pedestrian detection/tracking using a dual Transformer architecture.

1. Collaborated with a Ph.D. student to develop and improve pedestrian detection/tracking models based on a dual Transformer architecture.

2. Conducted extensive experimentation and analysis to identify and fine-tune the most effective hyperparameters and settings for the model.

3. Implemented various optimizations and techniques to improve the speed, accuracy, and efficiency of the model, such as pruning, quantization, and knowledge distillation.

4. Evaluated the performance of the model on benchmark datasets and compared it with other state-of-the-art approaches on detection and tracking metrics such as AP, MOTA, and MOTP.

5. Developed and maintained code repositories, documentation, and experiment-tracking systems to facilitate collaboration and reproducibility. 
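Of the optimizations in step 3, unstructured magnitude pruning is the simplest to illustrate: zero out the smallest-magnitude weights so the network becomes sparse (a NumPy sketch with a toy weight matrix; the project would apply this to Transformer layers, typically via a framework's pruning utilities):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly a
    `sparsity` fraction of the weights become zero."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([[0.5, -0.01], [0.02, -0.8]])
pruned = magnitude_prune(w, 0.5)  # → [[0.5, 0.0], [0.0, -0.8]]
```

Quantization and knowledge distillation follow the same spirit of trading a little accuracy for speed, but operate on weight precision and on a student model trained against the teacher's outputs, respectively.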


Project type: Telecom 

1. Worked in a team to develop, deploy, and maintain ETL packages using SSIS workflows on Snowflake for new and legacy mobile tariff plans within the billing and CRM systems. 

2. Collaborated with development teams to deploy data quality solutions and design data warehousing systems that met machine learning team needs (for example, mobile tariff plan preference prediction). 

3. Analyzed existing business processes and interacted with clients to finalize their requirements. 

4. Used Spark SQL and Kafka for data pre-processing, cleaning and joining very large data sets. 

Tools: Spark, Kafka, and SSIS; Stream processing: Spark Streaming; Databases: Oracle and Snowflake; AWS cloud service: Redshift; Languages: Spark SQL, Python, Shell Scripting 


Project type: Healthcare

1. Utilized statistical analysis models to interpret trends and produce comprehensive reports on patterns in e-healthcare services. 

2. Assisted data scientists with analysis. 

3. Reviewed metrics reports and performance indicators to correct data-code issues, improving treatment and medicine delivery processes. 

Tools: Tableau, Microsoft Excel, SAS Enterprise Miner; Analytical coding languages: SQL, Python; Database: Oracle.