
📨 🚀 AWS Crypto Data Pipeline: Your Free Blueprint — Part 1 of 3

Edition 1: Setup + Ingest + Raw Landing Zone

In this 3-part series, we’re walking you through how to build a lightweight but production-worthy data pipeline on AWS using free services.

🧱 Part 1: Setup, Batch Data Ingestion & Raw Landing Zones

✅ What You’ll Build

  • Pull JSON data from a public URL (coingecko.com)

  • Store it in an S3 raw landing zone

  • Use AWS Lambda (free tier eligible) for automation

  • (Bonus): Schedule it with EventBridge

🔧 Step 1: Create Your S3 Raw Zone

  1. Go to the S3 Console.

  2. Click "Create bucket"

  3. Name it something like desidataduo-crypto-data (or any globally unique name you prefer)

  4. Enable versioning (optional)

  5. Create the bucket
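Prefer doing this in code? Here's a minimal boto3 sketch of the same setup (the bucket name and region below are placeholders; swap in your own):

import boto3

s3 = boto3.client("s3", region_name="us-east-1")  # placeholder region
bucket_name = "desidataduo-crypto-data"           # must be globally unique

# Create the bucket. Outside us-east-1, also pass
# CreateBucketConfiguration={"LocationConstraint": "<your-region>"}.
s3.create_bucket(Bucket=bucket_name)

# Optional: enable versioning, same as the console checkbox
s3.put_bucket_versioning(
    Bucket=bucket_name,
    VersioningConfiguration={"Status": "Enabled"},
)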

🛡️ Step 2: Create an IAM Role for Lambda (with S3 + Logs Access)

🔹 Step 1: Go to the IAM Console

  • Open AWS Console → Search for IAM

  • Click "Roles" in the left sidebar

  • Click "Create role"

🔹 Step 2: Choose Trusted Entity Type

  • Under "Trusted entity type", select:
    ✅ AWS Service

  • Under "Use case", select:
    ✅ Lambda

🔹 Step 3: Attach Permissions Policy

At this step, you choose what actions the role can perform. To keep the project simple, use AWS managed policies and check:

  • AmazonS3FullAccess (full S3 access; broader than strictly needed, but the easiest way to get this project running)

  • AWSLambdaBasicExecutionRole (lets the function write its logs to CloudWatch, covering the "Logs" part of this role)

🔹 Step 4: Name and Create Role

  • Name the role something like: write-from-lambda-to-s3

  • Optionally add a tag

  • Click Create Role

  • You should see something like below:

Role that writes from API to S3 using Lambda
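If you'd rather script the role, here is a minimal boto3 sketch of the same setup (the role name matches the console steps above; the policy ARNs are standard AWS managed policies):

import json
import boto3

iam = boto3.client("iam")

ROLE_NAME = "write-from-lambda-to-s3"

# Trust policy: allow the Lambda service to assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

role = iam.create_role(
    RoleName=ROLE_NAME,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the AWS managed policies from Step 3
for policy_arn in [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]:
    iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=policy_arn)

print(role["Role"]["Arn"])  # you'll reference this role when creating the Lambda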

🧠 Step 3: Write the Lambda Function (Python)

import datetime  # For getting the current date
import urllib.request  # To make HTTP requests
import urllib.parse  # To build a properly encoded URL
import os  # To read environment variables like S3 bucket name
import json  # To handle JSON encoding/decoding
import boto3  # AWS SDK for Python to interact with S3

# Create an S3 client
s3 = boto3.client('s3')

# Get the S3 bucket name from an environment variable
# (set BUCKET_NAME in the Lambda configuration in Step 4, or update the fallback below)
BUCKET_NAME = os.environ.get("BUCKET_NAME", "your-bucket-name")

# Number of top cryptocurrencies to fetch
COINS_LIMIT = 100

def lambda_handler(event, context):
    # Get today's date in YYYY-MM-DD format
    today = datetime.datetime.utcnow().strftime("%Y-%m-%d")
    
    # Base URL of the CoinGecko API
    base_url = "https://api.coingecko.com/api/v3/coins/markets"

    # API query parameters
    query_params = {
        "vs_currency": "usd",        # Get prices in USD
        "order": "market_cap_desc",  # Sort by market cap, desc 
        "per_page": COINS_LIMIT, # Limit to top N coins
        "page": 1,               # Get page 1
        "sparkline": "false"     # Don't include sparkline data
    }

    # Construct the full URL with encoded parameters
    url = f"{base_url}?{urllib.parse.urlencode(query_params)}"

    try:
        # Send the API request and read the response
        with urllib.request.urlopen(url) as response:
            # Parse the JSON response into a Python object
            data = json.loads(response.read())

        # Create the S3 key (file path) using today's date
        s3_key = f"raw/coins/{today}/top_{COINS_LIMIT}_coins.json"

        # Upload the JSON data to the specified S3 path
        s3.put_object(
            Bucket=BUCKET_NAME,
            Key=s3_key,
            Body=json.dumps(data),
            ContentType="application/json"
        )

        # Return a success message
        return {
            "statusCode": 200,
            "body": f"Successfully saved data to s3://{BUCKET_NAME}/{s3_key}"
        }

    except Exception as e:
        # Return the error message if anything goes wrong
        return {
            "statusCode": 500,
            "body": str(e)
        }

🚀 Step 4: Deploy the Lambda

  1. Go to the Lambda Console

  2. Click Create Function

  3. Runtime: Python 3.9
    Execution role: Use an existing role → write-from-lambda-to-s3

  4. Paste the code, set a BUCKET_NAME environment variable to your bucket name (Configuration → Environment variables), and hit Deploy

  5. Test it once manually, then check S3 for the file (raw/coins/<today's date>/top_100_coins.json). You should see something like below:
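If you want to sanity-check the run from your own machine, here is a small boto3 sketch; the function name crypto-ingest is a placeholder, and the bucket name should be whatever you created in Step 1:

import json
import boto3

lambda_client = boto3.client("lambda")
s3 = boto3.client("s3")

# Invoke the function once, synchronously (function name is a placeholder)
resp = lambda_client.invoke(
    FunctionName="crypto-ingest",
    InvocationType="RequestResponse",
)
print(json.loads(resp["Payload"].read()))

# List what landed in the raw zone
listing = s3.list_objects_v2(
    Bucket="desidataduo-crypto-data",
    Prefix="raw/coins/",
)
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"], "bytes")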

⏰ Bonus: Schedule It with EventBridge

  1. Go to EventBridge → Schedules (EventBridge Scheduler)

  2. Click Create schedule

  3. Use cron(0 2 * * ? *) for daily 2 AM UTC runs

  4. Add your Lambda as the target

  5. On the settings page, make sure Schedule state is set to “Enable” and Permissions is set to “Create new role for this schedule”

  6. The last page is “Review and create”; check everything and hit “Create schedule”

Screenshots from the console: the schedule name and 2 AM UTC cron details, the existing Lambda selected as the target (highlighted in green), Schedule state set to Enable with Permissions on “Create new role for this schedule”, and the final Review and create page. Created 🙂
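If you prefer to script the schedule instead of clicking through the console, here is a minimal boto3 sketch using the EventBridge Scheduler API. The Lambda ARN and the role ARN are placeholders; the console creates the invoke role for you, but with the API you supply a role that Scheduler can assume and that allows lambda:InvokeFunction on your function:

import boto3

scheduler = boto3.client("scheduler")

# Placeholders: your function's ARN and the role Scheduler assumes to invoke it
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:crypto-ingest"
INVOKE_ROLE_ARN = "arn:aws:iam::123456789012:role/eventbridge-invoke-crypto-ingest"

scheduler.create_schedule(
    Name="daily-crypto-ingest",
    ScheduleExpression="cron(0 2 * * ? *)",  # every day at 2 AM UTC
    FlexibleTimeWindow={"Mode": "OFF"},      # fire exactly on the cron time
    State="ENABLED",
    Target={
        "Arn": LAMBDA_ARN,
        "RoleArn": INVOKE_ROLE_ARN,
    },
)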

🔍 Summary

Service      | Free Tier      | Role
S3           | ✅             | Store raw files
Lambda       | ✅             | Lightweight ingestion logic
EventBridge  | ✅             | Scheduled batch jobs (optional)
IAM Role     | ✅ (no charge) | Secure S3 access
Glue         | ❌             | Not free; skip for now

🧠 Why Not Glue for Ingestion?

  • Too heavy for simple batch pulls

  • Not free tier eligible, and this series is all about keeping you in the free tier 🙂

  • Overkill unless you're transforming large datasets

🔄 That’s a Wrap on Part 1

You’ve just built a clean, serverless foundation to ingest crypto data: no servers, no manual uploads, no fuss.

Next up: We’ll show you how to clean and structure that raw JSON into analytics-ready data, so you can start unlocking insights.

📩 Part 2: Data Transformation and Storage for Analytics lands in your inbox soon.

💬 Enjoyed this tutorial?
Forward it to a friend, or share it with your team.

📬 Not subscribed yet?
Join here to get the full 3-part series — and more hands-on data projects — straight to your inbox.

Until then, happy building! 🛠️