Monday, June 12, 2023

Database replication using Confluent (Kafka) and Debezium

I have been playing with Confluent Cloud and Debezium for a while and have found them extremely useful for streaming data ingestion. The usual use cases I come across fall into two scenarios:

1. Use a Debezium CDC connector to publish change records to Kafka topics, then dump the change records to either cloud storage or a Delta lake; this landing area is usually called the raw zone. You can subsequently consume these change records in your favourite data platform, such as Databricks or Snowflake, both of which have robust streaming-ingestion support (a source connector sketch follows this list).
2. Often you just want a copy of the production database for analytics usage, in which case a like-for-like replication is what you need, and you can use a JDBC sink connector for that. The additional benefit is that you can replicate data to a different target database platform, for example MySQL to SQL Server, Postgres to SQL Server, or MySQL to Postgres (a sink connector sketch also follows this list).
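
As a sketch of scenario 1, a Debezium MySQL source connector config might look roughly like this (Debezium 2.x property names; the hostnames, credentials, server id and topic prefix are all placeholders, and your converter and snapshot settings will vary):

    {
      "name": "mysql-cdc-source",
      "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.example.com",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "********",
        "database.server.id": "184054",
        "topic.prefix": "prod",
        "database.include.list": "inventory",
        "schema.history.internal.kafka.bootstrap.servers": "broker:9092",
        "schema.history.internal.kafka.topic": "schema-changes.inventory"
      }
    }

And for scenario 2, a sketch of a JDBC sink connector replicating one of those topics into SQL Server (connection details are again placeholders). Note the Debezium ExtractNewRecordState single message transform, which unwraps the change-event envelope into plain rows the sink can upsert, and the record-key-based primary key mode, which lets tombstones become deletes:

    {
      "name": "sqlserver-jdbc-sink",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "prod.inventory.customers",
        "connection.url": "jdbc:sqlserver://sqlserver.example.com:1433;databaseName=analytics",
        "connection.user": "etl",
        "connection.password": "********",
        "insert.mode": "upsert",
        "pk.mode": "record_key",
        "delete.enabled": "true",
        "auto.create": "true",
        "auto.evolve": "true",
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "transforms.unwrap.drop.tombstones": "false"
      }
    }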

Friday, May 5, 2023

Migrating IBM DB2 to Google Bigtable and achieving FIPS-compliant encryption using a custom Java encryption library

This is about a project I undertook recently. The purpose was to migrate a large volume of on-prem DB2 data to Google Bigtable using Dataproc, a Spark-based service on Google Cloud. A few notable things from the project:

1. I had to use Scala to develop the solution because the encryption library was developed in Java, and although it offers interoperability with Python, that interop has enough limitations to stop me from using Python. Scala and Java, on the other hand, just work together seamlessly (see the interop sketch after this list).

2. For FIPS compliance I had to use the Bouncy Castle FIPS library, which introduced problems in managing dependencies, "dependency hell" as some call it; in the end I had to use Maven to manage dependencies and apply shading, rather than sbt, due to the complexity (a shading sketch is included after this list).

3. I used the hbase-spark connector for talking to Bigtable. Since I was using Spark 3, I had to compile the connector library manually; see https://github.com/apache/hbase-connectors/tree/master/spark (a usage sketch follows this list).
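
To illustrate point 1, here is a minimal sketch of the Scala/Java interop in a Spark job. FieldEncryptor stands in for the real in-house Java encryption class, so the package, class and method names are purely illustrative:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.udf

    // Hypothetical Java class standing in for the in-house FIPS-compliant
    // encryption library, which exposes a plain Java API such as:
    //   public class FieldEncryptor { public String encrypt(String plaintext); }
    object FieldEncryption extends Serializable {
      // lazily constructed per executor, never serialised with the closure
      @transient lazy val encryptor = new com.example.crypto.FieldEncryptor()
    }

    // Calling the Java method from Scala needs no bridging code at all.
    val encryptUdf = udf((value: String) => FieldEncryption.encryptor.encrypt(value))

    def encryptColumn(df: DataFrame, col: String): DataFrame =
      df.withColumn(col, encryptUdf(df(col)))

On point 2, for anyone who stays on sbt rather than switching to Maven, the equivalent shading is expressed through sbt-assembly shade rules; a rough sketch, with Guava used only as an example of a conflicting transitive dependency:

    // build.sbt (requires the sbt-assembly plugin)
    // Relocate a conflicting transitive dependency into a private namespace.
    // The FIPS provider jar itself should generally be left alone, since
    // relocating a signed JCE provider breaks its signature.
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
    )

And for point 3, once the connector is compiled against Spark 3, writing a DataFrame to Bigtable looks roughly like this. It assumes the hbase-site.xml on the classpath routes the HBase API to Bigtable via the bigtable-hbase client, and the table name, column mapping and source path are placeholders:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("db2-to-bigtable").getOrCreate()

    // The connector picks up the most recent HBaseContext, so create one up front.
    new HBaseContext(spark.sparkContext, HBaseConfiguration.create())

    val df = spark.read.parquet("gs://my-bucket/db2-extract/")

    df.write
      .format("org.apache.hadoop.hbase.spark")
      .option("hbase.table", "customer")
      .option("hbase.columns.mapping",
        "id STRING :key, name STRING cf:name, email STRING cf:email")
      .save()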

(this project was done about a year ago)

Handling Large Messages With Apache Kafka

While working on handling large messages with Kafka, I came across a few useful reference articles; bookmarking them here for anyone who needs them:

https://dzone.com/articles/processing-large-messages-with-apache-kafka

https://www.morling.dev/blog/single-message-transforms-swiss-army-knife-of-kafka-connect/

https://www.kai-waehner.de/blog/2020/08/07/apache-kafka-handling-large-messages-and-files-for-image-video-audio-processing/

https://docs.confluent.io/cloud/current/connectors/single-message-transforms.html#cc-single-message-transforms-limitations

Tuesday, April 4, 2023

Object Tracking Demo

In a proof-of-concept project I undertook a while ago, the YOLO (You Only Look Once) object detection model was used in combination with the Deep SORT (Simple Online and Realtime Tracking) algorithm to track objects in real time. The aim was to showcase how this technology can be applied to traffic monitoring, specifically measuring the time it takes for vehicles to pass through a road junction. The project demonstrated that the combination of these technologies can accurately detect and track vehicles as they move through a monitored area, and the results showed that the method can provide accurate data on traffic flow, which could be useful for traffic management and infrastructure planning purposes.


Elevating LLM Deployment with FastAPI and React: A Step-By-Step Guide

In a previous exploration, I delved into creating a Retrieval-Augmented Generation (RAG) demo, utilising Google's Gemma model, Hugging ...