📨 🚀 AWS Crypto Data Pipeline: Your Free Blueprint — Part 1 of 3
Edition 1: Setup + Ingest + Raw Landing Zone
In this 3-part series, we’re walking you through how to build a lightweight but production-worthy data pipeline on AWS using free services.
🧱 Part 1: Setup, Batch Data Ingestion & Raw Landing Zones
✅ What You’ll Build
Pull JSON data from a public API (CoinGecko)
Store it in an S3 raw landing zone
Use AWS Lambda (free tier eligible) for automation
(Bonus): Schedule it with EventBridge
🔧 Step 1: Create Your S3 Raw Zone
Go to the S3 Console.
Click "Create bucket"
Name it something like
desidataduo-crypto-data
(or anything you want; bucket names must be globally unique)
Enable versioning (optional)
Create the bucket
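If you'd rather script this step than click through the console, here's a minimal boto3 sketch of the same bucket setup. The bucket name is just the example above, and us-east-1 is an assumption (other regions need a LocationConstraint):
import boto3

# Assumes us-east-1; in other regions pass
# CreateBucketConfiguration={"LocationConstraint": "<region>"}
s3 = boto3.client("s3", region_name="us-east-1")

# Create the raw landing zone bucket (names must be globally unique)
s3.create_bucket(Bucket="desidataduo-crypto-data")

# Optional: enable versioning, mirroring the console checkbox above
s3.put_bucket_versioning(
    Bucket="desidataduo-crypto-data",
    VersioningConfiguration={"Status": "Enabled"},
)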
🛡️ Step 2: Create an IAM Role for Lambda (with S3 + Logs Access)
🔹 Step 2a: Go to the IAM Console
Open AWS Console → Search for IAM
Click "Roles" in the left sidebar
Click "Create role"
🔹 Step 2b: Choose Trusted Entity Type
Under "Trusted entity type", select:
✅ AWS Service
Under "Use case", select:
✅ Lambda
🔹 Step 2c: Attach Permissions Policies
At this step, you choose what actions the role can perform. To keep the project simple, attach these AWS managed policies (full S3 access is convenient here; scope it down for anything production-grade):
AmazonS3FullAccess
AWSLambdaBasicExecutionRole (so the function can write CloudWatch Logs — the "Logs Access" in the title)
🔹 Step 2d: Name and Create the Role
Name the role something like:
write-from-lambda-to-s3
Optionally add a tag
Click Create Role
You should see something like below:

Role that writes from API to S3 using Lambda
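If you prefer to script the role instead, here's a minimal boto3 sketch of the same console steps. The role name matches the one above; the policy ARNs are the standard AWS managed ones:
import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the Lambda service assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="write-from-lambda-to-s3",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the managed policies chosen in the console steps above
for arn in [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]:
    iam.attach_role_policy(RoleName="write-from-lambda-to-s3", PolicyArn=arn)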
🧠 Step 3: Write the Lambda Function (Python)
import datetime        # For getting the current date
import urllib.request  # To make HTTP requests
import urllib.parse    # To build a properly encoded URL
import os              # To read environment variables like the S3 bucket name
import json            # To handle JSON encoding/decoding
import boto3           # AWS SDK for Python, used to interact with S3

# Create an S3 client
s3 = boto3.client('s3')

# Get the S3 bucket name from environment variables
# PLEASE UPDATE THE BUCKET NAME
BUCKET_NAME = os.environ.get("BUCKET_NAME", "your-bucket-name")

# Number of top cryptocurrencies to fetch
COINS_LIMIT = 100

def lambda_handler(event, context):
    # Get today's date in YYYY-MM-DD format
    today = datetime.datetime.utcnow().strftime("%Y-%m-%d")

    # Base URL of the CoinGecko API
    base_url = "https://api.coingecko.com/api/v3/coins/markets"

    # API query parameters
    query_params = {
        "vs_currency": "usd",        # Get prices in USD
        "order": "market_cap_desc",  # Sort by market cap, descending
        "per_page": COINS_LIMIT,     # Limit to top N coins
        "page": 1,                   # Get page 1
        "sparkline": "false"         # Don't include sparkline data
    }

    # Construct the full URL with encoded parameters
    url = f"{base_url}?{urllib.parse.urlencode(query_params)}"

    try:
        # Send the API request and parse the JSON response into a Python object
        with urllib.request.urlopen(url) as response:
            data = json.loads(response.read())

        # Create the S3 key (file path) using today's date
        s3_key = f"raw/coins/{today}/top_{COINS_LIMIT}_coins.json"

        # Upload the JSON data to the specified S3 path
        s3.put_object(
            Bucket=BUCKET_NAME,
            Key=s3_key,
            Body=json.dumps(data),
            ContentType="application/json"
        )

        # Return a success message
        return {
            "statusCode": 200,
            "body": f"Successfully saved data to s3://{BUCKET_NAME}/{s3_key}"
        }
    except Exception as e:
        # Return the error message if anything goes wrong
        return {
            "statusCode": 500,
            "body": str(e)
        }
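If you want to sanity-check the handler before deploying, you can append a small guard like this to the bottom of the same file (a local-testing convenience, not part of the Lambda itself) and run it with your own AWS credentials and BUCKET_NAME exported in your shell:
if __name__ == "__main__":
    # Local smoke test: requires AWS credentials and BUCKET_NAME in your environment
    print(lambda_handler({}, None))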
🚀 Step 4: Deploy the Lambda
Go to the Lambda Console
Click Create Function
Runtime: Python 3.9
Role: Use existing → write-from-lambda-to-s3
Paste the code and hit Deploy
Under Configuration → Environment variables, set BUCKET_NAME to your bucket name (the code falls back to a placeholder otherwise)
Test it once manually, then check S3 for the file (top_100_coins.json). You should have a file like below:

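You can also verify from code rather than the console; a quick boto3 listing under the raw prefix (assuming the example bucket name from Step 1) should show the dated key:
import boto3

s3 = boto3.client("s3")

# List everything under the raw landing zone prefix
resp = s3.list_objects_v2(Bucket="desidataduo-crypto-data", Prefix="raw/coins/")
for obj in resp.get("Contents", []):
    print(obj["Key"])  # e.g. raw/coins/2025-01-01/top_100_coins.json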
⏰ Bonus: Schedule It with EventBridge
Go to EventBridge → Schedules (EventBridge Scheduler)
Click Create schedule
Use
cron(0 2 * * ? *)
for daily 2 AM UTC runs
Add your Lambda as the target
On the settings page, make sure Schedule state is set to "Enable" and Permissions is set to "Create new role for this schedule"
The last page is "Review and create": check everything and hit "Create schedule"

Name and schedule details for 2 AM UTC

The green box shows our existing lambda name used here

Schedule state = Enable; Permissions = "Create new role for this schedule"

Review and Create Schedule

Created 🙂
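If you'd rather script the schedule too, here's a minimal EventBridge Scheduler sketch with boto3. The schedule name and both ARNs are placeholders you'd swap for your own:
import boto3

scheduler = boto3.client("scheduler")  # EventBridge Scheduler API

scheduler.create_schedule(
    Name="daily-crypto-ingest",              # hypothetical schedule name
    ScheduleExpression="cron(0 2 * * ? *)",  # minute 0, hour 2 UTC, every day
    FlexibleTimeWindow={"Mode": "OFF"},      # fire exactly at the cron time
    State="ENABLED",
    Target={
        # Placeholder ARNs: your Lambda function, and a role
        # that Scheduler can assume to invoke it
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:your-function-name",
        "RoleArn": "arn:aws:iam::123456789012:role/your-scheduler-role",
    },
)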
🔍 Summary
| Service | Free Tier | Role |
|---|---|---|
| S3 | ✅ | Store raw files |
| Lambda | ✅ | Lightweight ingestion logic |
| EventBridge | ✅ | Scheduled batch jobs (optional) |
| IAM Role | ✅ | Secure S3 access |
| Glue | ❌ | Not free; skip for now |
🧠 Why Not Glue for Ingestion?
Too heavy for simple batch pulls
Not free tier eligible, and this series sticks to the free tier 🙂
Overkill unless you're transforming large datasets
🔄 That’s a Wrap on Part 1
You’ve just built a clean, serverless foundation to ingest crypto data — no servers, no manual uploads, no fuss.
Next up: We’ll show you how to clean and structure that raw JSON into analytics-ready data, so you can start unlocking insights.
📩 Part 2: Data Transformation and Storage for Analytics lands in your inbox soon.
💬 Enjoyed this tutorial?
Forward it to a friend, or share it with your team.
📬 Not subscribed yet?
Join here to get the full 3-part series — and more hands-on data projects — straight to your inbox.
Until then, happy building! 🛠️