
📨 🚀 AWS Crypto Data Pipeline: Your Free Blueprint — Part 1 of 3

Edition 1: Setup + Ingest + Raw Landing Zone

In this 3-part series, we’re walking you through how to build a lightweight but production-worthy data pipeline on AWS using free services.

🧱 Part 1: Setup, Batch Data Ingestion & Raw Landing Zones

✅ What You’ll Build

  • Pull JSON data from a public URL (coingecko.com)

  • Store it in an S3 raw landing zone

  • Use AWS Lambda (free tier eligible) for automation

  • (Bonus): Schedule it with EventBridge

🔧 Step 1: Create Your S3 Raw Zone

  1. Go to the S3 Console.

  2. Click "Create bucket"

  3. Name it something like desidataduo-crypto-data (or any globally unique name you prefer)

  4. Enable versioning (optional)

  5. Create the bucket
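Prefer doing this in code? Here's a minimal boto3 sketch of the same setup (the bucket name and region below are placeholders; swap in your own):

import boto3

s3 = boto3.client("s3", region_name="us-east-1")  # placeholder region
bucket_name = "desidataduo-crypto-data"           # must be globally unique

# Create the bucket. Outside us-east-1, also pass
# CreateBucketConfiguration={"LocationConstraint": "<your-region>"}.
s3.create_bucket(Bucket=bucket_name)

# Optional: enable versioning, same as the console checkbox
s3.put_bucket_versioning(
    Bucket=bucket_name,
    VersioningConfiguration={"Status": "Enabled"},
)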

🛡️ Step 2: Create an IAM Role for Lambda (with S3 + Logs Access)

🔹 Step 1: Go to the IAM Console

  • Open AWS Console → Search for IAM

  • Click "Roles" in the left sidebar

  • Click "Create role"

🔹 Step 2: Choose Trusted Entity Type

  • Under "Trusted entity type", select:
    ✅ AWS Service

  • Under "Use case", select:
    ✅ Lambda

🔹 Step 3: Attach Permissions Policy

At this step, you choose what actions the role can perform. To keep the project simple, use AWS managed policies and check:

  • AmazonS3FullAccess (full S3 access; broader than strictly needed, but the easiest way to get this project running)

  • AWSLambdaBasicExecutionRole (lets the function write its logs to CloudWatch, covering the "Logs" part of this role)

🔹 Step 4: Name and Create Role

  • Name the role something like: write-from-lambda-to-s3

  • Optionally add a tag

  • Click Create Role

  • You should see something like below:

Role that writes from API to S3 using Lambda
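If you'd rather script the role, here is a minimal boto3 sketch of the same setup (the role name matches the console steps above; the policy ARNs are standard AWS managed policies):

import json
import boto3

iam = boto3.client("iam")

ROLE_NAME = "write-from-lambda-to-s3"

# Trust policy: allow the Lambda service to assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

role = iam.create_role(
    RoleName=ROLE_NAME,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the AWS managed policies from Step 3
for policy_arn in [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]:
    iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=policy_arn)

print(role["Role"]["Arn"])  # you'll reference this role when creating the Lambda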

🧠 Step 3: Write the Lambda Function (Python)

import datetime  # For getting the current date
import urllib.request  # To make HTTP requests
import urllib.parse  # To build a properly encoded URL
import os  # To read environment variables like S3 bucket name
import json  # To handle JSON encoding/decoding
import boto3  # AWS SDK for Python to interact with S3

# Create an S3 client
s3 = boto3.client('s3')

# Get the S3 bucket name from an environment variable
# (set BUCKET_NAME in the Lambda configuration in Step 4, or update the fallback below)
BUCKET_NAME = os.environ.get("BUCKET_NAME", "your-bucket-name")

# Number of top cryptocurrencies to fetch
COINS_LIMIT = 100

def lambda_handler(event, context):
    # Get today's date in YYYY-MM-DD format
    today = datetime.datetime.utcnow().strftime("%Y-%m-%d")
    
    # Base URL of the CoinGecko API
    base_url = "https://api.coingecko.com/api/v3/coins/markets"

    # API query parameters
    query_params = {
        "vs_currency": "usd",        # Get prices in USD
        "order": "market_cap_desc",  # Sort by market cap, desc 
        "per_page": COINS_LIMIT, # Limit to top N coins
        "page": 1,               # Get page 1
        "sparkline": "false"     # Don't include sparkline data
    }

    # Construct the full URL with encoded parameters
    url = f"{base_url}?{urllib.parse.urlencode(query_params)}"

    try:
        # Send the API request and read the response
        with urllib.request.urlopen(url) as response:
            # Parse the JSON response into a Python object
            data = json.loads(response.read())

        # Create the S3 key (file path) using today's date
        s3_key = f"raw/coins/{today}/top_{COINS_LIMIT}_coins.json"

        # Upload the JSON data to the specified S3 path
        s3.put_object(
            Bucket=BUCKET_NAME,
            Key=s3_key,
            Body=json.dumps(data),
            ContentType="application/json"
        )

        # Return a success message
        return {
            "statusCode": 200,
            "body": f"Successfully saved data to s3://{BUCKET_NAME}/{s3_key}"
        }

    except Exception as e:
        # Return the error message if anything goes wrong
        return {
            "statusCode": 500,
            "body": str(e)
        }

🚀 Step 4: Deploy the Lambda

  1. Go to the Lambda Console

  2. Click Create Function

  3. Runtime: Python 3.9
    Execution role: Use an existing role → write-from-lambda-to-s3

  4. Paste the code, set a BUCKET_NAME environment variable to your bucket name (Configuration → Environment variables), and hit Deploy

  5. Test it once manually, then check S3 for the file (raw/coins/<today's date>/top_100_coins.json). You should see something like below:
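If you want to sanity-check the run from your own machine, here is a small boto3 sketch; the function name crypto-ingest is a placeholder, and the bucket name should be whatever you created in Step 1:

import json
import boto3

lambda_client = boto3.client("lambda")
s3 = boto3.client("s3")

# Invoke the function once, synchronously (function name is a placeholder)
resp = lambda_client.invoke(
    FunctionName="crypto-ingest",
    InvocationType="RequestResponse",
)
print(json.loads(resp["Payload"].read()))

# List what landed in the raw zone
listing = s3.list_objects_v2(
    Bucket="desidataduo-crypto-data",
    Prefix="raw/coins/",
)
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"], "bytes")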

⏰ Bonus: Schedule It with EventBridge

  1. Go to EventBridge → Schedules (EventBridge Scheduler)

  2. Click Create schedule

  3. Use cron(0 2 * * ? *) for daily 2 AM UTC runs

  4. Add your Lambda as the target

  5. On the settings page, make sure Schedule state is set to “Enable” and Permissions is set to “Create new role for this schedule”

  6. The last page is “Review and create”; check everything and hit “Create schedule”

Screenshots from the console: the schedule name and 2 AM UTC cron details, the existing Lambda selected as the target (highlighted in green), Schedule state set to Enable with Permissions on “Create new role for this schedule”, and the final Review and create page. Created 🙂
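If you prefer to script the schedule instead of clicking through the console, here is a minimal boto3 sketch using the EventBridge Scheduler API. The Lambda ARN and the role ARN are placeholders; the console creates the invoke role for you, but with the API you supply a role that Scheduler can assume and that allows lambda:InvokeFunction on your function:

import boto3

scheduler = boto3.client("scheduler")

# Placeholders: your function's ARN and the role Scheduler assumes to invoke it
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:crypto-ingest"
INVOKE_ROLE_ARN = "arn:aws:iam::123456789012:role/eventbridge-invoke-crypto-ingest"

scheduler.create_schedule(
    Name="daily-crypto-ingest",
    ScheduleExpression="cron(0 2 * * ? *)",  # every day at 2 AM UTC
    FlexibleTimeWindow={"Mode": "OFF"},      # fire exactly on the cron time
    State="ENABLED",
    Target={
        "Arn": LAMBDA_ARN,
        "RoleArn": INVOKE_ROLE_ARN,
    },
)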

🔍 Summary

Service      | Free Tier      | Role
S3           | ✅             | Store raw files
Lambda       | ✅             | Lightweight ingestion logic
EventBridge  | ✅             | Scheduled batch jobs (optional)
IAM Role     | ✅ (no charge) | Secure S3 access
Glue         | ❌             | Not free; skip for now

🧠 Why Not Glue for Ingestion?

  • Too heavy for simple batch pulls

  • Not free tier eligible, and this series is all about keeping you in the free tier 🙂

  • Overkill unless you're transforming large datasets

🔄 That’s a Wrap on Part 1

You’ve just built a clean, serverless foundation to ingest crypto data: no servers, no manual uploads, no fuss.

Next up: We’ll show you how to clean and structure that raw JSON into analytics-ready data, so you can start unlocking insights.

📩 Part 2: Data Transformation and Storage for Analytics lands in your inbox soon.

💬 Enjoyed this tutorial?
Forward it to a friend, or share it with your team.

📬 Not subscribed yet?
Join here to get the full 3-part series — and more hands-on data projects — straight to your inbox.

Until then, happy building! 🛠️