Job Type: Contract
Contract Length: 4-6 Months (High likelihood of extension)
Pay Range: $25 - $35/hr
Start Date: Immediate
Location: Remote
About the Opportunity:
Our client, a leader in the Analytics and AI space, is looking for a skilled Data Engineer (Data Preprocessing) to join their team for a 4-6 month engagement. The project is a strategic migration off a legacy Informatica platform (being sunset) to a modern Spark-based environment for product-agnostic data preprocessing. This is a high-impact role for a self-motivated professional who can hit the ground running and deliver results quickly.
Key Responsibilities & Deliverables:
This role is focused on completing specific tasks and deliverables. Your responsibilities will include:
- Legacy Code Migration: Reviewing and interpreting legacy code (including Scala) to refactor and rewrite into Python/PySpark for the new platform.
- Pipeline Development: Building and maintaining a new data migration module and data pipelines for both historical data resets and ongoing batch ingestion.
- Data Transformation: Developing product-agnostic data layers to ensure clean data flow across multiple internal products.
- Cloud Data Management: Utilizing AWS Glue for Spark jobs, Lambda for serverless functions, and managing data files within S3 buckets.
- Quality Assurance: Ensuring data integrity through rigorous validation and end-to-end testing during the baseline reset for migrated clients.
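To illustrate the kind of product-agnostic Raw-to-Cleansed transformation this role involves, here is a minimal sketch. All table, column, and product names are hypothetical, and plain Python dicts stand in for PySpark DataFrame rows so the example is self-contained; in the actual role this logic would live in PySpark on AWS Glue.

```python
from datetime import datetime, timezone

# Per-source schema maps: raw column name -> canonical (cleansed) column name.
# Each upstream product lands data with its own naming; the cleansed layer
# normalizes everything to one schema. (All names here are invented.)
SCHEMA_MAPS = {
    "product_a": {"cust_id": "customer_id", "ts": "event_time", "amt": "amount"},
    "product_b": {"CustomerID": "customer_id", "EventTS": "event_time", "Amount": "amount"},
}

def to_cleansed(raw_row: dict, source: str) -> dict:
    """Rename columns to the canonical schema and normalize types."""
    mapping = SCHEMA_MAPS[source]
    row = {canonical: raw_row[raw] for raw, canonical in mapping.items()}
    row["amount"] = float(row["amount"])            # enforce numeric type
    row["event_time"] = datetime.fromtimestamp(     # epoch seconds -> UTC datetime
        int(row["event_time"]), tz=timezone.utc
    )
    row["source_product"] = source                  # keep lineage for auditing
    return row

cleansed = to_cleansed({"cust_id": "C1", "ts": "1700000000", "amt": "19.99"}, "product_a")
```

The same mapping-driven approach carries over directly to PySpark, where the rename/cast steps become `withColumnRenamed` and `cast` calls over DataFrames.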
Required Skills & Experience:
We are looking for someone with a proven track record of successful contract engagements. The ideal candidate will have:
- Deep expertise in Python and PySpark (DataFrames) for data transformation and ETL. This is not a learning role; you need to already be a subject matter expert.
- The ability to read and understand Scala well enough to rewrite it into Python/PySpark (all new coding will be in Python/PySpark).
- Strong hands-on experience with the Core AWS Stack: AWS Glue, S3, and Lambda.
- Proven experience building multi-layer pipelines (Raw to Cleansed) and handling complex schema mapping.
- Strong SQL skills for data validation and reconciliation.
- Demonstrated ability to work autonomously and manage your own time effectively to meet project goals.
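As a sketch of the SQL validation and reconciliation work described above, the snippet below compares a migrated table against its legacy source using row counts, totals, and a row-level diff. Table and column names are invented, and an in-memory SQLite database stands in for the real warehouse engine.

```python
import sqlite3

# Hypothetical legacy and migrated tables seeded with identical data,
# standing in for the pre- and post-migration copies of a dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legacy_orders   (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE migrated_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO legacy_orders   VALUES (1, 10.0), (2, 25.5), (3, 7.25);
    INSERT INTO migrated_orders VALUES (1, 10.0), (2, 25.5), (3, 7.25);
""")

# Aggregate reconciliation: both deltas should be zero after a clean migration.
count_delta, total_delta = conn.execute("""
    SELECT (SELECT COUNT(*)    FROM legacy_orders) - (SELECT COUNT(*)    FROM migrated_orders),
           (SELECT SUM(amount) FROM legacy_orders) - (SELECT SUM(amount) FROM migrated_orders)
""").fetchone()

# Row-level diff: rows present in the legacy table but missing or changed
# in the migrated one. An empty result means the tables agree row for row.
mismatches = conn.execute("""
    SELECT order_id, amount FROM legacy_orders
    EXCEPT
    SELECT order_id, amount FROM migrated_orders
""").fetchall()
```

Running the diff in both directions (legacy EXCEPT migrated, then migrated EXCEPT legacy) catches rows dropped by the migration as well as rows it introduced.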
#LI-RB1