🚀 Advanced: GitHub - werowe/cassandraTwitter · GitHub | Tech Viral News

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
README.md		README.md
TwitterStream.scala		TwitterStream.scala
sbt.build		sbt.build

Repository files navigation

This Scala program is a Spark Streaming application that consumes real-time tweet data from a Kafka topic and extracts the tweet text for each message. Here's how it works step-by-step:

1. Imports and Spark/Kafka Integration

Imports Spark Streaming, Kafka, and deserializer utilities needed for integration.
Sets up standard imports for Spark Streaming and Kafka 0.10 connector.

2. Kafka and Spark Configuration

Defines kafkaParams, a configuration map for Kafka consumer parameters (brokers, deserializers, consumer group id, etc.).
Creates a SparkConf to configure the Spark application (named "tweeter").
Initializes a StreamingContext with a batch interval of 2 seconds.

3. Kafka Stream Creation

Specifies the Kafka topic ("trump") to listen to as an array.
Uses KafkaUtils.createDirectStream to connect Spark Streaming to Kafka, subscribing to the "trump" topic.

4. Streaming Processing Pipeline

For each RDD (micro-batch of messages) in the Kafka stream:
- Iterates over every record in the batch.
- Gets the message value and parses it

About

No description, website, or topics provided.

Report repository

Releases

No releases published

Packages

Contributors

Languages

Scala 100.0%