Junior Data Engineer – Python, SQL, Spark (Remote Option)

Remote
Full-time

A fast-growing technology firm turns raw data into predictive insights for Fortune 100 enterprises. An agile, cross-functional culture rewards curiosity, experimentation, and continuous learning—regardless of where you log in from.


Why This Role Rocks  

You dive into code that actually ships. You’ll refine data ingestion, cleanse terabytes of data, and automate workflows that data scientists depend on. Expect mentorship from senior engineers, a dedicated training budget, and hands-on exposure to modern cloud stacks.


Core Responsibilities  

- Construct and deploy scalable ETL pipelines using Python 3.11, Apache Spark, and SQL.  

- Schedule, monitor, and troubleshoot DAGs in Airflow for real-time and batch data movement.  

- Profile raw datasets—identify anomalies, handle missing values, and enforce data quality rules.  

- Integrate diverse sources (REST APIs, streaming topics, relational databases) into a unified lakehouse on AWS S3.  

- Write modular, testable code and maintain CI/CD workflows in GitHub Actions.  

- Optimize query performance by tuning partitions, indexes, and file formats (Parquet, ORC).  

- Document schemas, lineage, and transformation logic in Confluence for analytics and ML teams.  

- Collaborate with data scientists to prep features for training, validation, and model monitoring.  

- Respond to pipeline incidents; execute root-cause analysis and preventive fixes.  

- Continuously research emerging tools—Delta Lake, dbt, Iceberg—and propose adoption roadmaps.


Required Qualifications  

- Bachelor’s degree in Computer Science, Data Engineering, or a related STEM field.  

- 0-2 years of professional or internship experience writing production Python.  

- Solid grasp of SQL (window functions, CTEs, query optimization).  

- Familiarity with distributed processing (Spark or PySpark) and ETL best practices.  

- Understanding of version control, preferably Git, and basic Linux command line.  

- Clear verbal and written communication; ability to explain technical concepts to non-technical partners.  

- An analytical mindset, relentless problem-solving, and eagerness to learn new frameworks quickly.


Preferred Extras  

- Exposure to AWS analytics services (Glue, Redshift, Kinesis) or Azure equivalents.  

- Experience with BI visualization tools like Tableau or Looker.  

- Knowledge of Docker, Kubernetes, or Terraform for infrastructure as code.  

- Participation in open-source projects or hackathons showcasing data skills.  

- Familiarity with data governance concepts—GDPR, HIPAA, SOC 2.


Growth Path  

Starting as a Junior Data Engineer, you can progress to Data Engineer II within 12-18 months, specialize in MLOps, or pivot toward Analytics Engineering. Your career path is flexible—driven by measurable achievements and the business impact of your pipelines.