I first tried this three years ago: deploying and running Apache Kafka in a Kubernetes cluster and using Debezium for CDC to source from an Oracle database. At that time most of the Kubernetes APIs were still in beta and Debezium was at something like version 0.1 beta. There was no Kafka operator for Kubernetes, so I had to do every Kafka-related deployment by hand. Kafka is a stateful application, and without an operator it is very difficult to manage on Kubernetes, which is built around stateless workloads. Debezium support for Oracle was also in the alpha stage. I failed to connect to the Oracle database with the Debezium connector because of the many bugs in the connector plugin. I followed the Debezium mailing list to track updates and the maturity of the project so I could try again later. Now its latest version is 1.6 and support for Oracle DB is much more stable.
Last week I tried this setup again. I found the Strimzi project, a Kafka operator for Kubernetes. The latest version is 0.25.0. Although it is still at version 0.*, it works perfectly. Here I will show how easy it is to deploy Kafka in Kubernetes using Strimzi and how easy it is to set up and configure CDC using Debezium.
You need the Kubernetes-related utilities, such as kubectl and helm, and an Oracle database with LogMiner or XStream support. If you would like to start with a minimal setup, you can use minikube or kind. Here are the steps you need to follow.
- Prepare Docker image for Debezium Oracle connector
- Set up helm repository and pull helm chart to get sample values.yaml
- Update Strimzi helm values
- Deploy Strimzi operator and Kafka cluster with updated values
- Deploy Debezium Oracle connector plugin cluster
- Prepare Oracle database
- Apply Kafka connector configuration for Debezium Oracle connector
- Verify and check data stream
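Before starting, it can help to confirm that the command-line tools are installed. The following is only a minimal sketch: missing_tools is a hypothetical helper name, and the tool list (docker, curl and tar in addition to kubectl and helm) is my assumption based on the commands used in the steps above.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list; docker is only needed if you build the image yourself.
missing=$(missing_tools kubectl helm docker curl tar)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```

If anything is reported missing, install it before moving on to the first step.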
Prepare Docker image for Debezium Oracle connector
The Debezium Oracle connector needs the Oracle JDBC driver (ojdbc8.jar) and xstreams.jar from the Oracle database client library. You need to get them from the Oracle download site. Here is the Dockerfile to build the Debezium Oracle connector container image.
FROM quay.io/strimzi/kafka:0.25.0-kafka-2.8.0
ENV KAFKA_HOME=/opt/kafka
USER root:root
RUN cd /tmp; \
    curl -LO https://download.oracle.com/otn_software/linux/instantclient/213000/instantclient-basic-linux.x64-21.3.0.0.0.zip; \
    unzip instantclient-basic-linux.x64-21.3.0.0.0.zip; \
    cp instantclient_21_3/* ${KAFKA_HOME}/libs; \
    rm -rf instantclient_21_3; \
    rm instantclient-basic-linux.x64-21.3.0.0.0.zip
##########
# Connector plugin debezium-oracle-connect
##########
RUN mkdir -p ${KAFKA_HOME}/plugins/debezium-oracle-connect/deaf1cc4 \
    && curl -L --output ${KAFKA_HOME}/plugins/debezium-oracle-connect/deaf1cc4.tgz https://repo1.maven.org/maven2/io/debezium/debezium-connector-oracle/1.6.1.Final/debezium-connector-oracle-1.6.1.Final-plugin.tar.gz \
    && echo "fe5eb4d0dda150b10d24a6d9f3a631c493267a0dee2d72167a8841af5804c43a908d149e5bc4a87dc48f0747e26a1d85a930225eea7b9709726ea2abda95b487 ${KAFKA_HOME}/plugins/debezium-oracle-connect/deaf1cc4.tgz" > ${KAFKA_HOME}/plugins/debezium-oracle-connect/deaf1cc4.tgz.sha512 \
    && sha512sum --check ${KAFKA_HOME}/plugins/debezium-oracle-connect/deaf1cc4.tgz.sha512 \
    && rm -f ${KAFKA_HOME}/plugins/debezium-oracle-connect/deaf1cc4.tgz.sha512 \
    && tar xvfz ${KAFKA_HOME}/plugins/debezium-oracle-connect/deaf1cc4.tgz -C ${KAFKA_HOME}/plugins/debezium-oracle-connect/deaf1cc4 \
    && rm -vf ${KAFKA_HOME}/plugins/debezium-oracle-connect/deaf1cc4.tgz
USER 1001
Or you can use the prebuilt image registry.gitlab.com/herzcthu/debezium-oracle:1.6.1
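If you build the image yourself, you also need to tag and push it to a registry your cluster can pull from. Here is a rough sketch of that step, assuming the Dockerfile is in the current directory and you are already logged in to the registry. The image_ref and build_and_push helpers are hypothetical names of mine, not part of docker.

```shell
#!/bin/sh
# Compose a full image reference from registry, repository, name and tag.
image_ref() {
  printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Build the Dockerfile in the current directory and push the result.
# Nothing runs until this function is called.
build_and_push() {
  ref=$(image_ref "$@")
  docker build -t "$ref" . && docker push "$ref"
}

# Example call (replace the placeholders with your own registry details):
# build_and_push <your-registry-server> <your-repository> debezium-oracle 1.6.1
```

The same four parts (registry, repository, name, tag) are what the Strimzi helm values ask for in the kafkaConnect image section.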
Set up helm repository and pull helm chart to get sample values.yaml
helm repo add strimzi https://strimzi.io/charts/
helm pull strimzi/strimzi-kafka-operator
tar -zxvf strimzi-kafka-operator-helm-3-chart-0.25.0.tgz
code strimzi-kafka-operator/values.yaml
Update Strimzi helm values
At line 5, update this value so the operator watches Kafka deployments in the "kafka" namespace.
watchNamespaces:
  - kafka
At lines 47 to 50, update the values to use the image built earlier.
kafkaConnect:
  image:
    registry: registry.gitlab.com
    repository: herzcthu
    name: debezium-oracle
    tag: 1.6.1
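A quick sanity check of the edited file can catch indentation mistakes before installing. This is only a sketch, assuming values.yaml keeps watchNamespaces as a top-level key followed by a YAML list. The watches_namespace helper is a hypothetical name, not something helm or Strimzi provides.

```shell
#!/bin/sh
# Succeed if the values file lists the given namespace under watchNamespaces.
watches_namespace() {
  file=$1
  ns=$2
  # Print the list items that directly follow the watchNamespaces key,
  # then look for an exact match of the namespace.
  awk '/^watchNamespaces:/ { in_list = 1; next }
       in_list && /^[ \t]*-/ { print $2; next }
       { in_list = 0 }' "$file" | grep -qx "$ns"
}

# Example call:
# watches_namespace strimzi-kafka-operator/values.yaml kafka && echo "ok"
```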
Deploy Strimzi operator and Kafka cluster with updated values
Deploy the Strimzi operator
kubectl create ns strimzi
kubectl create ns kafka
helm install strimzi strimzi/strimzi-kafka-operator -n strimzi -f strimzi-kafka-operator/values.yaml
Download the sample deployment file from https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml
Change replicas to 3 and adjust the other parameters according to your requirements
kubectl apply -f kafka-persistent-single.yaml -n kafka
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
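Once the wait returns, you may want to look at the pods the operator created. A small sketch, assuming the namespace and cluster name used above: Strimzi labels the resources it manages with strimzi.io/cluster, so the pods can be selected by that label. The cluster_pods helper is a hypothetical name.

```shell
#!/bin/sh
# List the pods belonging to a Strimzi-managed cluster.
# $1 = namespace, $2 = cluster name from the Kafka resource metadata.
cluster_pods() {
  kubectl get pods -n "$1" -l "strimzi.io/cluster=$2"
}

# Example call:
# cluster_pods kafka my-cluster
```

You should see the Kafka and ZooKeeper pods for my-cluster in the list.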
Deploy Debezium Oracle connector plugin cluster
If you are using a private repository for the Debezium Oracle connector image, you need to create the container registry credentials as a Kubernetes secret.
kubectl -nkafka create secret docker-registry regcred --docker-server=<your-registry-server> --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>
Connector deployment file
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect-cluster
  annotations:
    # use-connector-resources configures this KafkaConnect
    # to use KafkaConnector resources to avoid
    # needing to call the Connect REST API directly
    strimzi.io/use-connector-resources: "true"
spec:
  version: 2.8.0
  replicas: 1
  image: "registry.gitlab.com/herzcthu/debezium-oracle:1.6.1"
  template:
    pod:
      imagePullSecrets:
        - name: regcred
  bootstrapServers: my-cluster-kafka-bootstrap:9093
  tls:
    trustedCertificates:
      - secretName: my-cluster-cluster-ca-cert
        certificate: ca.crt
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
    # -1 means it will use the default replication factor configured in the broker
    config.storage.replication.factor: -1
    offset.storage.replication.factor: -1
    status.storage.replication.factor: -1
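The deployment file still has to be applied to the cluster. A minimal sketch, assuming you saved it as kafka-connect.yaml (a file name I made up) and kept the resource name my-connect-cluster. The deploy_connect helper is hypothetical as well.

```shell
#!/bin/sh
# Apply the KafkaConnect manifest and wait until the operator reports it Ready.
# $1 = path to the manifest file.
deploy_connect() {
  manifest=$1
  kubectl apply -f "$manifest" -n kafka &&
    kubectl wait kafkaconnect/my-connect-cluster --for=condition=Ready --timeout=300s -n kafka
}

# Example call:
# deploy_connect kafka-connect.yaml
```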
Prepare Oracle database
This is a job for the Oracle DBA. You can follow this documentation:
https://debezium.io/documentation/reference/connectors/oracle.html#_preparing_the_database
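To give an idea of what the DBA will be asked for, here is a rough sketch that prints the supplemental logging statements for one captured table. It is only part of the preparation; archive log mode and the connector user with its grants are covered in the documentation above. The supplemental_log_sql helper is a hypothetical name, and the schema and table are placeholders.

```shell
#!/bin/sh
# Print the supplemental logging statements for one table.
# $1 = schema name, $2 = table name.
supplemental_log_sql() {
  cat <<EOF
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER TABLE $1.$2 ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
EOF
}

# Example call (hand the output to your DBA or SQL client):
# supplemental_log_sql SCHEMA_NAME TABLE_NAME
```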
Apply Kafka connector configuration for Debezium Oracle connector
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: debezium-oracle-server1
  labels:
    # must match the name of the KafkaConnect resource deployed earlier
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.oracle.OracleConnector
  tasksMax: 1
  config:
    database.server.name: "server1"
    database.hostname: "192.168.1.1"
    database.port: "1521"
    database.user: "dbzuser"
    database.password: "dbz"
    database.dbname: "DBZDB"
    # broker port 9092 is plain text and 9093 is for SSL/TLS
    database.history.kafka.bootstrap.servers: "my-cluster-kafka-bootstrap:9092"
    database.history.kafka.topic: "schema-changes.collections"
    schema.include.list: "SCHEMA_NAME"
    table.include.list: "SCHEMA_NAME.TABLE_NAME"
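Applying the connector configuration works the same way as the other resources. A minimal sketch, assuming the configuration is saved as oracle-connector.yaml (a file name I made up). The apply_connector helper is hypothetical too.

```shell
#!/bin/sh
# Apply a KafkaConnector manifest, wait for it to become Ready,
# then print the resource so the status section can be inspected.
# $1 = path to the manifest, $2 = connector name from metadata.name.
apply_connector() {
  manifest=$1
  name=$2
  kubectl apply -f "$manifest" -n kafka &&
    kubectl wait "kafkaconnector/$name" --for=condition=Ready --timeout=300s -n kafka &&
    kubectl get kafkaconnector "$name" -n kafka -o yaml
}

# Example call:
# apply_connector oracle-connector.yaml debezium-oracle-server1
```

If the wait times out, the status section of the printed resource is usually the first place to look for the connector error.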
Verify and check data stream
Check configuration
kubectl -n kafka run connect-cluster-configs -ti --image=quay.io/strimzi/kafka:0.25.0-kafka-2.8.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic connect-cluster-configs --from-beginning
Check connector status
kubectl -n kafka run connect-cluster-status -ti --image=quay.io/strimzi/kafka:0.25.0-kafka-2.8.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic connect-cluster-status --from-beginning
Check data streaming. The topic name format is SERVER_NAME.SCHEMA_NAME.TABLE_NAME, where SERVER_NAME is the database.server.name value from the connector configuration.
kubectl -n kafka run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.25.0-kafka-2.8.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic server1.SCHEMA_NAME.TABLE_NAME --from-beginning
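If the consumer shows nothing, it helps to confirm that the topic exists at all. The sketch below builds the expected topic name from its three parts and lists the topics on the broker with kafka-topics.sh, which ships in the same Strimzi Kafka image. The cdc_topic and list_topics helpers are hypothetical names, and the pod name kafka-topics is arbitrary.

```shell
#!/bin/sh
# Build the change-event topic name: SERVER_NAME.SCHEMA_NAME.TABLE_NAME
cdc_topic() {
  printf '%s.%s.%s\n' "$1" "$2" "$3"
}

# List all topics using a throwaway pod, in the same style as the consumers above.
list_topics() {
  kubectl -n kafka run kafka-topics -ti --image=quay.io/strimzi/kafka:0.25.0-kafka-2.8.0 --rm=true --restart=Never -- bin/kafka-topics.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --list
}

# Example calls:
# cdc_topic server1 SCHEMA_NAME TABLE_NAME
# list_topics
```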
References:
- https://github.com/strimzi/strimzi-kafka-operator/tree/0.25.0/helm-charts/helm3/strimzi-kafka-operator
- https://debezium.io/documentation/reference/connectors/oracle.html