streaming-data-pipeline

Streaming pipeline repo for a data engineering training program.

See the setup READMEs for the producers and consumers in their respective directories.

#local environment setup

###Prerequisites:

###Steps

  1. Run ./sbin/buildAndRunLocal.sh. This creates various Docker containers (each with an independent purpose) for running and testing this setup on your local machine.

  2. If everything is up and running, you should be able to see data in Hadoop. To check for data:

    1. docker ps | grep hadoop - you should see at least one container referencing hadoop (we can ignore hadoop_seed for now)
    2. docker exec -it $CONTAINER_ID bash
    3. /usr/local/hadoop/bin/hadoop fs -ls /free2wheelers/stationMart/data
    4. Tada! We have data! (If you don't, something went wrong; check "Considerations".)
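
The manual check above can be sketched as a small shell helper. This is only a sketch, not part of the repo: the function names `find_hadoop_container` and `check_hadoop_data` are hypothetical, and it assumes the first running container that mentions hadoop (other than hadoop_seed) is the one holding the data. It runs the listing non-interactively instead of opening a bash session in the container.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and not part of this repo.

# Print the ID of the first running container whose image or name mentions
# "hadoop", ignoring hadoop_seed (assumption: that container has the data).
find_hadoop_container() {
  docker ps --format '{{.ID}} {{.Image}} {{.Names}}' \
    | grep hadoop \
    | grep -v hadoop_seed \
    | head -n 1 \
    | cut -d ' ' -f 1
}

# List the station mart data directory inside that container.
check_hadoop_data() {
  local container_id
  container_id="$(find_hadoop_container)"
  if [ -z "$container_id" ]; then
    echo "No running hadoop container found" >&2
    return 1
  fi
  docker exec "$container_id" \
    /usr/local/hadoop/bin/hadoop fs -ls /free2wheelers/stationMart/data
}
```

After sourcing the snippet, run `check_hadoop_data`. A non-zero exit status means either no matching container is running or the listing itself failed.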

###Considerations