Where are the dynamic configurations for a topic stored?
- A . In Zookeeper
- B . In an internal Kafka topic __topic_configurations
- C . In server.properties
- D . On the Kafka broker file system
A
Explanation:
Dynamic topic configurations are maintained in Zookeeper.
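For reference, a minimal AdminClient sketch of setting a dynamic topic override (topic name, bootstrap address and the retention value are assumptions, not part of the question). The override ends up persisted in Zookeeper, not in server.properties or an internal topic.
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (AdminClient admin = AdminClient.create(props)) {
    ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
    // dynamically override retention.ms for this topic; brokers pick the override up from Zookeeper
    AlterConfigOp op = new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"),
                                         AlterConfigOp.OpType.SET);
    Map<ConfigResource, Collection<AlterConfigOp>> update =
            Collections.singletonMap(topic, Collections.singletonList(op));
    admin.incrementalAlterConfigs(update).all().get();
}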
What happens when the broker.rack configuration is provided in the broker configuration of a Kafka cluster?
- A . You can use the same broker.id as long as they have different broker.rack configuration
- B . Replicas for a partition are placed in the same rack
- C . Replicas for a partition are spread across different racks
- D . Each rack contains all the topics and partitions, effectively making Kafka highly available
C
Explanation:
Replicas for the partitions of newly created topics are assigned across racks in an alternating manner; this is the only behavior that broker.rack changes.
What is the disadvantage of request/response communication?
- A . Scalability
- B . Reliability
- C . Coupling
- D . Cost
C
Explanation:
Point-to-point (request-response) style will couple client to the server.
Is KSQL ANSI SQL compliant?
- A . Yes
- B . No
B
Explanation:
KSQL is not ANSI SQL compliant; for now there are no defined standards for streaming SQL languages.
When using plain JSON data with Connect, you see the following error message: org.apache.kafka.connect.errors.DataException: JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields.
How will you fix the error?
- A . Set key.converter, value.converter to JsonConverter and the schema registry url
- B . Use Single Message Transforms to add schema and payload fields in the message
- C . Set key.converter.schemas.enable and value.converter.schemas.enable to false
- D . Set key.converter, value.converter to AvroConverter and the schema registry url
C
Explanation:
You need to set the converter's schemas.enable parameters to false when working with plain JSON that carries no embedded schema.
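A minimal configuration sketch of the fix (whether you set these at the worker or the connector level depends on your deployment; the JsonConverter class names are the standard ones):
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false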
There are 3 brokers in the cluster. You want to create a topic with a single partition that is resilient to one broker failure and one broker maintenance .
What replication factor will you specify when creating the topic?
- A . 6
- B . 3
- C . 2
- D . 1
B
Explanation:
1 is not possible as it provides no resilience to failure; 2 is not enough because if we take a broker down for maintenance, we cannot tolerate an additional broker failure; and 6 is impossible as we only have 3 brokers (the replication factor cannot be greater than the number of brokers). The correct answer is therefore 3.
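A minimal AdminClient sketch of creating such a topic (topic name and bootstrap address are assumptions):
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (AdminClient admin = AdminClient.create(props)) {
    // 1 partition, replication factor 3: one replica can be down for maintenance
    // and another can fail, and the partition still has a copy online
    NewTopic topic = new NewTopic("orders", 1, (short) 3);
    admin.createTopics(Collections.singleton(topic)).all().get();
}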
Two consumers share the same group.id (consumer group id). Each consumer will
- A . Read mutually exclusive offsets blocks on all the partitions
- B . Read all the data on mutually exclusive partitions
- C . Read all data from all partitions
B
Explanation:
Each consumer is assigned a different partition of the topic to consume.
A consumer starts and has auto.offset.reset=none, and the topic partition currently has data for offsets going from 45 to 2311. The consumer group has committed the offset 10 for the topic before.
Where will the consumer read from?
- A . offset 45
- B . offset 10
- C . it will crash
- D . offset 2311
C
Explanation:
auto.offset.reset=none means that the consumer will crash if the offsets it’s recovering from have been deleted from Kafka, which is the case here, as 10 < 45
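A hedged consumer-configuration sketch for this scenario (group id, deserializers and bootstrap address are assumptions):
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.util.Properties;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringDeserializer");
// none: if the committed offset (10) is no longer available (the log starts at 45),
// poll() throws an exception instead of silently resetting
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);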
A kafka topic has a replication factor of 3 and min.insync.replicas setting of 2 .
How many brokers can go down before a producer with acks=all can’t produce?
- A . 0
- B . 2
- C . 1
- D . 3
C
Explanation:
acks=all and min.insync.replicas=2 means we must have at least 2 brokers up for the partition to be available
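A producer-side sketch of that setup (topic name and addresses are assumptions); min.insync.replicas=2 is a topic/broker setting, not a producer one:
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// With 2 of the 3 brokers down, fewer than min.insync.replicas replicas are in sync,
// so this send fails (NotEnoughReplicasException); with only 1 broker down it succeeds.
producer.send(new ProducerRecord<>("payments", "key", "value"));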
Where are KSQL-related data and metadata stored?
- A . Kafka Topics
- B . Zookeeper
- C . PostgreSQL database
- D . Schema Registry
A
Explanation:
Metadata is stored in and built from the KSQL command topic. Each KSQL server has its own in-memory copy of the metastore.
You want to sink data from a Kafka topic to S3 using Kafka Connect. There are 10 brokers in the cluster, the topic has 2 partitions with replication factor of 3 .
How many tasks will you configure for the S3 connector?
- A . 10
- B . 6
- C . 3
- D . 2
D
Explanation:
You cannot have more sink tasks (= consumers) than the number of partitions, so 2.
To enhance compression, I can increase the chances of batching by using
- A . acks=all
- B . linger.ms=20
- C . batch.size=65536
- D . max.message.size=10MB
B
Explanation:
linger.ms forces the producer to wait before sending messages, hence increasing the chance of creating batches that can be heavily compressed.
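A hedged producer-configuration sketch combining the batching-related settings (the exact values are illustrative):
import org.apache.kafka.clients.producer.ProducerConfig;
import java.util.Properties;

Properties props = new Properties();
props.put(ProducerConfig.LINGER_MS_CONFIG, "20");            // wait up to 20 ms so batches can fill up
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");        // allow batches of up to 64 KB
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"); // each (larger) batch is compressed client-side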
How can you make a Kafka consumer immediately stop polling data from Kafka and shut down gracefully?
- A . Call consumer.wakeup() and catch a WakeupException
- B . Call consumer.poll() in another thread
- C . Kill the consumer thread
A
Explanation:
See https://stackoverflow.com/a/37748336/3019499
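A minimal sketch of that pattern (the consumer properties and topic name are assumptions):
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import java.time.Duration;
import java.util.Collections;

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props); // props: usual consumer config
consumer.subscribe(Collections.singleton("my-topic"));
// From another thread (e.g. a shutdown hook):  consumer.wakeup();
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        // process records ...
    }
} catch (WakeupException e) {
    // expected once wakeup() has been called; fall through to close()
} finally {
    consumer.close(); // leaves the group cleanly (and commits offsets if auto-commit is enabled)
}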
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> textLines = builder.stream("word-count-input");
KTable<String, Long> wordCounts = textLines
.mapValues(textLine -> textLine.toLowerCase())
.flatMapValues(textLine -> Arrays.asList(textLine.split("\\W+")))
.selectKey((key, word) -> word)
.groupByKey()
.count(Materialized.as("Counts"));
wordCounts.toStream().to("word-count-output", Produced.with(Serdes.String(), Serdes.Long()));
builder.build();
What is an adequate topic configuration for the topic word-count-output?
- A . max.message.bytes=10000000
- B . cleanup.policy=delete
- C . compression.type=lz4
- D . cleanup.policy=compact
D
Explanation:
The result is aggregated into a table whose key is the unique word and whose value is its frequency. We have to enable log compaction for this topic so that its cleanup policy matches KTable semantics.
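A hedged AdminClient sketch of creating the output topic as compacted (partition count, replication factor and bootstrap address are assumptions):
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
NewTopic output = new NewTopic("word-count-output", 1, (short) 1)
        .configs(Collections.singletonMap("cleanup.policy", "compact")); // keep only the latest count per word
try (AdminClient admin = AdminClient.create(props)) {
    admin.createTopics(Collections.singleton(output)).all().get();
}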
Where are the ACLs stored in a Kafka cluster by default?
- A . Inside the broker’s data directory
- B . Under Zookeeper node /kafka-acl/
- C . In Kafka topic __kafka_acls
- D . Inside the Zookeeper’s data directory
B
Explanation:
ACLs are stored in the Zookeeper node /kafka-acl/ by default, not on the broker's file system.
What kind of delivery guarantee does this consumer offer?
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    try {
        consumer.commitSync();
    } catch (CommitFailedException e) {
        log.error("commit failed", e);
    }
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("topic = %s, partition = %s, offset = %d, customer = %s, country = %s%n",
                record.topic(), record.partition(), record.offset(), record.key(), record.value());
    }
}
- A . Exactly-once
- B . At-least-once
- C . At-most-once
C
Explanation:
Here the offset is committed before the records are processed. If the consumer crashes before processing a message, that message is lost when the consumer comes back up.
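For contrast, a hedged sketch of the at-least-once variant of the same loop (process(...) stands in for the application-specific handling and is an assumed helper): commit only after processing, so a crash re-delivers records instead of losing them.
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        process(record); // application-specific processing
    }
    try {
        consumer.commitSync(); // committing after processing gives at-least-once
    } catch (CommitFailedException e) {
        log.error("commit failed", e);
    }
}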
The exactly once guarantee in the Kafka Streams is for which flow of data?
- A . Kafka => Kafka
- B . Kafka => External
- C . External => Kafka
A
Explanation:
Kafka Streams can only guarantee exactly once processing if you have a Kafka to Kafka topology.
You are using JDBC source connector to copy data from a table to Kafka topic. There is one connector created with max.tasks equal to 2 deployed on a cluster of 3 workers .
How many tasks are launched?
- A . 3
- B . 2
- C . 1
- D . 6
C
Explanation:
JDBC connector allows one task per table.
You want to perform table lookups against a KTable every time a new record is received from the KStream.
What is the output of KStream-KTable join?
- A . KTable
- B . GlobalKTable
- C . You choose between KStream or KTable
- D . KStream
D
Explanation:
The result of a KStream-KTable join is another KStream.
You are doing complex calculations using a machine learning framework on records fetched from a Kafka topic. It takes about 6 minutes to process a record batch, and the consumer enters rebalances even though it is still running.
How can you improve this scenario?
- A . Increase max.poll.interval.ms to 600000
- B . Increase heartbeat.interval.ms to 600000
- C . Increase session.timeout.ms to 600000
- D . Add consumers to the consumer group and kill them right away
A
Explanation:
Here, we need to increase max.poll.interval.ms (default 300000) to double its default, so that a consumer is only considered dead if it hasn't called .poll() for 10 minutes instead of 5.
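A hedged consumer-configuration sketch of the fix (the max.poll.records line is an optional extra lever, not part of the question):
import org.apache.kafka.clients.consumer.ConsumerConfig;
import java.util.Properties;

Properties props = new Properties();
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000"); // allow up to 10 minutes between poll() calls
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");        // or shrink the batch so processing fits the interval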
Which actions will trigger partition rebalance for a consumer group? (select three)
- A . Increase partitions of a topic
- B . Remove a broker from the cluster
- C . Add a new consumer to consumer group
- D . A consumer in a consumer group shuts down
- E . Add a broker to the cluster
A,C,D
Explanation:
A rebalance occurs when a consumer is added to or removed from the group, when a consumer dies, or when the number of partitions of a subscribed topic increases.
Which of the following settings increases the chance of batching for a Kafka producer?
- A . Increase batch.size
- B . Increase message.max.bytes
- C . Increase the number of producer threads
- D . Increase linger.ms
D
Explanation:
linger.ms forces the producer to wait before sending messages, hence increasing the chance of creating batches.
What data format isn’t natively available with the Confluent REST Proxy?
- A . avro
- B . binary
- C . protobuf
- D . json
C
Explanation:
Protocol Buffers isn't a natively supported format for the Confluent REST Proxy, but you may use the binary format instead.
You are using JDBC source connector to copy data from 2 tables to two Kafka topics. There is one connector created with max.tasks equal to 2 deployed on a cluster of 3 workers .
How many tasks are launched?
- A . 6
- B . 1
- C . 2
- D . 3
C
Explanation:
We have two tables, so the maximum number of tasks is 2.
To import data from external databases, I should use
- A . Confluent REST Proxy
- B . Kafka Connect Sink
- C . Kafka Streams
- D . Kafka Connect Source
D
Explanation:
Kafka Connect Sink is used to export data from Kafka to external databases and Kafka Connect Source is used to import from external databases into Kafka.
What is a generic unique id that I can use for messages I receive from a consumer?
- A . topic + partition + timestamp
- B . topic + partition + offset
- C . topic + timestamp
B
Explanation:
(Topic,Partition,Offset) uniquely identifies a message in Kafka
What happens if you write the following code in your producer?
producer.send(producerRecord).get()
- A . Compression will be increased
- B . Throughput will be decreased
- C . It will force all brokers in Kafka to acknowledge the producerRecord
- D . Batching will be increased
B
Explanation:
Using Future.get() to wait for a reply from Kafka will limit throughput.
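For comparison, a hedged sketch of the non-blocking pattern that keeps throughput high (reusing the producerRecord from the question; the error handling is illustrative):
producer.send(producerRecord, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace(); // handle the failure: log, retry, dead-letter, ...
    } else {
        System.out.printf("written to %s-%d@%d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
    }
});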
Compaction is enabled for a topic in Kafka by setting log.cleanup.policy=compact .
What is true about log compaction?
- A . After cleanup, only one message per key is retained with the first value
- B . Each message stored in the topic is compressed
- C . Kafka automatically de-duplicates incoming messages based on key hashes
- D . After cleanup, only one message per key is retained with the latest value
- E . Compaction changes the offset of messages
D
Explanation:
Log compaction retains at least the last known value for each record key within a single topic partition. All offsets remain valid even if the record at an offset has been compacted away, since a consumer will simply get the next higher offset.
Which of these joins does not require input topics to be sharing the same number of partitions?
- A . KStream-KTable join
- B . KStream-KStream join
- C . KStream-GlobalKTable
- D . KTable-KTable join
C
Explanation:
GlobalKTables have their datasets replicated on each Kafka Streams instance and therefore no repartitioning is required
How often is log compaction evaluated?
- A . Every time a new partition is created
- B . Every time a segment is closed
- C . Every time a message is sent to Kafka
- D . Every time a message is flushed to disk
B
Explanation:
Log compaction is evaluated every time a segment is closed. It will be triggered if enough data is "dirty" (see dirty ratio config)
A Zookeeper ensemble contains 3 servers.
Over which ports should the members of the ensemble be able to communicate in the default configuration? (select three)
- A . 2181
- B . 3888
- C . 443
- D . 2888
- E . 9092
- F . 80
A,B,D
Explanation:
2181 is the client port, 2888 is the peer port, and 3888 is the leader-election port.
A client connects to a broker in the cluster and sends a fetch request for a partition in a topic. It gets a NotLeaderForPartitionException in the response.
How does the client handle this situation?
- A . Get the Broker id from Zookeeper that is hosting the leader replica and send request to it
- B . Send metadata request to the same broker for the topic and select the broker hosting the leader replica
- C . Send metadata request to Zookeeper for the topic and select the broker hosting the leader replica
- D . Send fetch request to each Broker in the cluster
B
Explanation:
If the client has the wrong leader for a partition, it issues a metadata request. Metadata requests can be handled by any broker, so the client then knows which broker is the designated leader for each topic partition. Produce and fetch requests can only be sent to the broker hosting the partition leader.
If I supply the setting compression.type=snappy to my producer, what will happen? (select two)
- A . The Kafka brokers have to de-compress the data
- B . The Kafka brokers have to compress the data
- C . The Consumers have to de-compress the data
- D . The Consumers have to compress the data
- E . The Producers have to compress the data
C,E
Explanation:
Kafka transfers data with zero copy and no transformation. Any transformation (including compression) is the responsibility of clients.
If I produce to a topic that does not exist, and the broker setting auto.create.topics.enable=true, what will happen?
- A . Kafka will automatically create the topic with 1 partition and 1 replication factor
- B . Kafka will automatically create the topic with the indicated producer settings num.partitions and default.replication.factor
- C . Kafka will automatically create the topic with the broker settings num.partitions and default.replication.factor
- D . Kafka will automatically create the topic with num.partitions=#of brokers and replication.factor=3
C
Explanation:
The broker settings come into play when a topic is auto-created.
How will you read all the messages from a topic in your KSQL query?
- A . KSQL reads from the beginning of a topic, by default.
- B . KSQL reads from the end of a topic. This cannot be changed.
- C . Use KSQL CLI to set auto.offset.reset property to earliest
C
Explanation:
Consumers can set the auto.offset.reset property to earliest to start consuming from the beginning. In KSQL: SET 'auto.offset.reset'='earliest';
The kafka-console-consumer CLI, when used with the default options
- A . uses a random group id
- B . always uses the same group id
- C . does not use a group id
A
Explanation:
If a group is not specified, the kafka-console-consumer generates a random consumer group.
A producer is sending messages with null key to a topic with 6 partitions using the DefaultPartitioner. Where will the messages be stored?
- A . Partition 5
- B . Any of the topic partitions
- C . The partition for the null key
- D . Partition 0
B
Explanation:
Messages with a null key are distributed across partitions in a round-robin fashion (a sticky strategy in newer clients), so they can land on any of the topic's partitions.
Which of the following Kafka Streams operators are stateless? (select all that apply)
- A . map
- B . filter
- C . flatMap
- D . branch
- E . groupBy
- F . aggregate
A,B,C,D,E
Explanation:
See https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#stateless-transformations
Suppose you have 6 brokers and you decide to create a topic with 10 partitions and a replication factor of 3. The brokers 0 and 1 are on rack A, the brokers 2 and 3 are on rack B, and the brokers 4 and 5 are on rack C. If the leader for partition 0 is on broker 4, and the first replica is on broker 2, which broker can host the last replica? (select two)
- A . 6
- B . 1
- C . 2
- D . 5
- E . 0
- F . 3
B,E
Explanation:
When you create a new topic, partition replicas are spread across racks to maintain availability. Hence a broker on rack A, which does not yet hold a replica of this partition, will be selected for the last replica.
Your topic is log compacted and you are sending a message with the key K and value null .
What will happen?
- A . The broker will delete all messages with the key K upon cleanup
- B . The producer will throw a Runtime exception
- C . The broker will delete the message with the key K and null value only upon cleanup
- D . The message will get ignored by the Kafka broker
A
Explanation:
Sending a message with a null value is called a tombstone in Kafka, and it ensures that, upon compaction, the log-compacted topic retains no messages with the key K.
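A minimal sketch of producing such a tombstone (topic name and an already-configured producer are assumed):
// A null value is a tombstone: after compaction (and delete.retention.ms),
// no record with key "K" remains in the topic.
producer.send(new ProducerRecord<String, String>("compacted-topic", "K", null));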
A kafka topic has a replication factor of 3 and min.insync.replicas setting of 1 .
What is the maximum number of brokers that can be down so that a producer with acks=all can still produce to the topic?
- A . 3
- B . 0
- C . 2
- D . 1
C
Explanation:
Two brokers can go down, and one replica will still be able to receive and serve data
You have a Kafka cluster and all the topics have a replication factor of 3. One intern at your company stopped a broker, and accidentally deleted all the data of that broker on the disk .
What will happen if the broker is restarted?
- A . The broker will start, and other topics will also be deleted as the broker data on the disk got deleted
- B . The broker will start, and won’t be online until all the data it needs to have is replicated from other leaders
- C . The broker will crash
- D . The broker will start, and won’t have any data. If the broker becomes leader, we have a data loss
B
Explanation:
Kafka's replication mechanism makes it resilient to scenarios where a broker loses its data on disk: the broker can recover by replicating the data back from the other brokers. This makes Kafka amazing!
A consumer application is using KafkaAvroDeserializer to deserialize Avro messages .
What happens if the message schema is not present in the KafkaAvroDeserializer's local cache?
- A . Throws SerializationException
- B . Fails silently
- C . Throws DeserializationException
- D . Fetches schema from Schema Registry
D
Explanation:
First the local cache is checked for the message schema. On a cache miss, the schema is fetched from the Schema Registry. An exception is thrown if the Schema Registry does not have the schema (which should never happen if it is set up properly).
In the Kafka consumer metrics it is observed that fetch-rate is very high and each fetch is small .
What steps will you take to increase throughput?
- A . Increase fetch.max.wait
- B . Increase fetch.max.bytes
- C . Decrease fetch.max.bytes
- D . Decrease fetch.min.bytes
- E . Increase fetch.min.bytes
E
Explanation:
This will allow consumers to wait and receive more bytes in each fetch request.
In Avro, removing or adding a field that has a default is a __ schema evolution
- A . full
- B . backward
- C . breaking
- D . forward
A
Explanation:
Clients with the new schema will be able to read records saved with the old schema, and clients with the old schema will be able to read records saved with the new schema.
You have a consumer group of 12 consumers, and when a consumer gets killed rather abruptly by the process management system, it does not trigger a graceful shutdown of your consumer. It therefore takes up to 10 seconds for a rebalance to happen. The business would like to have a 3-second rebalance time.
What should you do? (select two)
- A . Increase session.timeout.ms
- B . Decrease session.timeout.ms
- C . Increase heartbeat.interval.ms
- D . decrease max.poll.interval.ms
- E . increase max.poll.interval.ms
- F . Decrease heartbeat.interval.ms
B,F
Explanation:
session.timeout.ms must be decreased to 3 seconds so that a dead consumer is detected faster, and the heartbeat thread must send heartbeats more frequently, so heartbeat.interval.ms must be decreased as well.
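A hedged sketch of the adjusted consumer settings (values chosen to meet the 3-second goal; note that the broker-side group.min.session.timeout.ms, 6000 ms by default, may also need lowering to accept such a small session timeout):
import org.apache.kafka.clients.consumer.ConsumerConfig;
import java.util.Properties;

Properties props = new Properties();
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "3000");    // a silent consumer is evicted after 3 s
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "1000"); // heartbeats must fit well inside the session timeout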
A consumer starts and has auto.offset.reset=latest, and the topic partition currently has data for offsets going from 45 to 2311. The consumer group has committed the offset 643 for the topic before.
Where will the consumer read from?
- A . it will crash
- B . offset 2311
- C . offset 643
- D . offset 45
C
Explanation:
The offsets have already been committed for this consumer group and topic partition, so the auto.offset.reset property is ignored.
You are receiving orders from different customer in an "orders" topic with multiple partitions. Each message has the customer name as the key. There is a special customer named ABC that generates a lot of orders and you would like to reserve a partition exclusively for ABC. The rest of the message should be distributed among other partitions .
How can this be achieved?
- A . Add metadata to the producer record
- B . Create a custom partitioner
- C . All messages with the same key will go to the same partition, but the same partition may have messages with different keys. It is not possible to reserve a partition
- D . Define a Kafka Broker routing rule
B
Explanation:
A Custom Partitioner allows you to easily customise how the partition number gets computed from a source message.
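A minimal custom-partitioner sketch (the class name is an assumption, and it presumes every record has a non-null customer key as stated in the question): partition 0 is reserved for ABC and all other customers are hashed over the remaining partitions. It would then be registered on the producer via partitioner.class.
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;
import java.util.Map;

public class AbcPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if ("ABC".equals(key)) {
            return 0; // reserved partition for the high-volume customer
        }
        // hash every other customer over partitions 1 .. numPartitions-1
        return 1 + Utils.toPositive(Utils.murmur2(keyBytes)) % (numPartitions - 1);
    }

    @Override public void configure(Map<String, ?> configs) {}
    @Override public void close() {}
}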
A Zookeeper ensemble contains 5 servers .
What is the maximum number of servers that can go missing and the ensemble still run?
- A . 3
- B . 4
- C . 2
- D . 1
C
Explanation:
A majority (quorum) of a 5-node Zookeeper ensemble is 3 nodes, so 2 servers can fail and the ensemble still runs.
If I want to send binary data through the REST proxy, it needs to be base64 encoded .
Which component needs to encode the binary data into base 64?
- A . The Producer
- B . The Kafka Broker
- C . Zookeeper
- D . The REST Proxy
A
Explanation:
The REST Proxy requires the binary data it receives over REST to already be base64 encoded; hence, encoding is the responsibility of the producer.
Which of the following is not an Avro primitive type?
- A . string
- B . long
- C . int
- D . date
- E . null
D
Explanation:
date is a logical type
An ecommerce website maintains two topics – a high volume "purchase" topic with 5 partitions and low volume "customer" topic with 3 partitions. You would like to do a stream-table join of these topics .
How should you proceed?
- A . Repartition the purchase topic to have 3 partitions
- B . Repartition customer topic to have 5 partitions
- C . Model customer as a GlobalKTable
- D . Do a KStream / KTable join after a repartition step
C
Explanation:
A KStream-KTable join requires both topics to be co-partitioned (same number of partitions). This restriction does not apply when joining against a GlobalKTable, which is the most efficient option here.
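A hedged Kafka Streams sketch of that join (topic names come from the question; serdes, key semantics and the value-join logic are assumptions): because customers is a GlobalKTable, neither topic needs repartitioning.
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> purchases = builder.stream("purchase");
GlobalKTable<String, String> customers = builder.globalTable("customer");

KStream<String, String> enriched = purchases.join(
        customers,
        (purchaseKey, purchaseValue) -> purchaseKey,                             // map each purchase to its customer id
        (purchaseValue, customerValue) -> purchaseValue + "," + customerValue);  // combine the two values
enriched.to("purchase-enriched");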
A topic "sales" is being produced to in the Americas region. You are mirroring this topic using Mirror Maker to the European region. From there, you are only reading the topic for analytics purposes .
What kind of mirroring is this?
- A . Passive-Passive
- B . Active-Active
- C . Active-Passive
C
Explanation:
This is active-passive, as the replicated topic is used only for reads.
In Kafka, every broker… (select three)
- A . contains all the topics and all the partitions
- B . knows all the metadata for all topics and partitions
- C . is a controller
- D . knows the metadata for the topics and partitions it has on its disk
- E . is a bootstrap broker
- F . contains only a subset of the topics and the partitions
B,E,F
Explanation:
Kafka topics are divided into partitions and spread across brokers. Each broker knows the metadata for all topics and partitions, and each broker is a bootstrap broker, but only one of them is elected controller.
What is true about partitions? (select two)
- A . A broker can have a partition and its replica on its disk
- B . You cannot have more partitions than the number of brokers in your cluster
- C . A broker can have different partitions numbers for the same topic on its disk
- D . Only out-of-sync replicas are followers, while the remaining in-sync replicas are all leaders
- E . A partition has one replica that is a leader, while the other replicas are followers
C,E
Explanation:
Only one of the replicas is elected partition leader. And a broker can definitely hold many partitions of the same topic on its disk: try creating a topic with 12 partitions on a single broker!
A consumer has auto.offset.reset=latest, and the topic partition currently has data for offsets going from 45 to 2311. The consumer group never committed offsets for the topic before.
Where will the consumer read from?
- A . offset 2311
- B . offset 0
- C . offset 45
- D . it will crash
A
Explanation:
auto.offset.reset=latest means that reads will start from where the offsets currently end (offset 2311).
You are running a Kafka Streams application in a Docker container managed by Kubernetes, and upon application restart, it takes a long time for the docker container to replicate the state and get back to processing the data .
How can you improve dramatically the application restart?
- A . Mount a persistent volume for your RocksDB
- B . Increase the number of partitions in your inputs topic
- C . Reduce the Streams caching property
- D . Increase the number of Streams threads
A
Explanation:
Although any Kafka Streams application is stateless in the sense that its state is stored in Kafka, recovering that state from Kafka can take a while and a lot of resources. To speed up recovery, it is advised to store the Kafka Streams state on a persistent volume, so that only the missing part of the state needs to be recovered.
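A hedged configuration sketch (application id, bootstrap address and mount path are assumptions): point state.dir at the persistent volume so the RocksDB stores survive container restarts.
import org.apache.kafka.streams.StreamsConfig;
import java.util.Properties;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.STATE_DIR_CONFIG, "/mnt/streams-state"); // mounted persistent volume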
What isn’t a feature of the Confluent schema registry?
- A . Store avro data
- B . Enforce compatibility rules
- C . Store schemas
A
Explanation:
The Avro data itself is stored on the Kafka brokers, not in the Schema Registry; the registry only stores schemas and enforces compatibility rules.
A producer just sent a message to the leader broker for a topic partition. The producer used acks=1 and therefore the data has not yet been replicated to followers.
Under which conditions will the consumer see the message?
- A . Right away
- B . When the message has been fully replicated to all replicas
- C . Never, the produce request will fail
- D . When the high watermark has advanced
D
Explanation:
The high watermark is only advanced once all the in-sync replicas have replicated the latest offsets. A consumer can only read up to the value of the high watermark (which can be less than the highest offset on the leader, in the case of acks=1).
To continuously export data from Kafka into a target database, I should use
- A . Kafka Producer
- B . Kafka Streams
- C . Kafka Connect Sink
- D . Kafka Connect Source
C
Explanation:
Kafka Connect Sink is used to export data from Kafka to external databases and Kafka Connect Source is used to import from external databases into Kafka.