Skip to content

werowe/cassandraTwitter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 

Repository files navigation

This Scala program is a Spark Streaming application that consumes real-time tweet data from a Kafka topic and extracts the tweet text for each message. Here's how it works step-by-step:


1. Imports and Spark/Kafka Integration

  • Imports Spark Streaming, Kafka, and deserializer utilities needed for integration.
  • Sets up standard imports for Spark Streaming and Kafka 0.10 connector.

2. Kafka and Spark Configuration

  • Defines kafkaParams, a configuration map for Kafka consumer parameters (brokers, deserializers, consumer group id, etc.).
  • Creates a SparkConf to configure the Spark application (named "tweeter").
  • Initializes a StreamingContext with a batch interval of 2 seconds.

3. Kafka Stream Creation

  • Specifies the Kafka topic ("trump") to listen to as an array.
  • Uses KafkaUtils.createDirectStream to connect Spark Streaming to Kafka, subscribing to the "trump" topic.

4. Streaming Processing Pipeline

  • For each RDD (micro-batch of messages) in the Kafka stream:
    • Iterates over every record in the batch.
    • Gets the message value and parses it

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages