Blue Ridge Global

Senior Data Engineer

Remote · Full-time · Senior · 🇬🇪 Georgia · Development

As we transition to a modern big data infrastructure, PySpark plays a critical role in powering high-performance data processing. We are seeking a Data Engineer / PySpark expert to optimize data pipelines, enhance processing efficiency, and drive cost-effective cloud operations. This role will have a direct impact on scalability, performance, and real-time data processing, ensuring the company remains competitive in data-driven markets.

You’ll be working closely with a Data Platform Architect and a newly formed team of four Data Engineers based in India (GMT+5:30) and one Data Engineer in Uzbekistan (GMT+5). Additionally, we're planning to hire two more Senior Data Engineers in Georgia later this year. In this role, you’ll report to the CTO, who is based in the GMT-8 time zone, and to the VP of Engineering (EDT/EST).

Position Details

  • Role: Senior Data Engineer
  • Location: Remote (We’re looking for candidates based in Georgia, Romania, and the Czech Republic only)
  • Employment: Service Agreement (B2B contract; you’ll need a legal entity to sign)
  • Start Date: ASAP
  • Salary: $5,500 - $8,000 USD per month GROSS (fixed income, paid via SWIFT)
  • Working Hours: 11 AM to 7 PM local time; no night or weekend work is expected.
  • Time Overlaps: Sync-ups with R&D (Pune, India) in GMT+5:30 and developers in GMT-5, plus occasional meetings with the VP of Engineering (EST/EDT) and the CTO (GMT-8).
  • Equipment: The company will provide a laptop.

What You’ll Be Doing

  • Optimize Data Processing Pipelines: Fine-tune PySpark jobs for maximum performance, scalability, and cost efficiency, enabling smooth real-time and batch data processing.
  • Modernize Legacy Systems: Drive the migration from traditional .NET, C#, and relational database systems to a modern big data tech stack.
  • Build Scalable ETL Pipelines: Design and maintain robust ETL/ELT workflows capable of handling large volumes of data within our Bronze/Silver/Gold data lake architecture.
  • Enhance Apache Spark Workloads: Apply best practices such as memory tuning, efficient partitioning, and caching to optimize Spark jobs.
  • Leverage Cloud Platforms: Use AWS EMR, Databricks, and other cloud services to support scalable, low-maintenance, high-performance analytics environments.
  • Balance Cost & Performance: Continuously monitor resource usage, optimize Spark cluster configurations, and manage cloud spend without compromising availability.
  • Support Real-Time Data Streaming: Contribute to event-driven architectures by developing and maintaining real-time streaming data pipelines.
  • Collaborate Across Teams: Partner closely with data scientists, ML engineers, integration specialists, and developers to prepare and optimize data assets.
  • Enforce Best Practices: Implement strong data governance, security, and compliance policies to ensure data integrity and protection.
  • Drive Innovation: Participate in global initiatives to advance supply chain technology and real-time decision-making capabilities.
  • Mentor Junior Engineers: Share your knowledge of PySpark, distributed systems, and scalable architectures to help develop the team’s capabilities.
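For a flavor of the tuning work described above, a Spark job submission with explicit memory, partitioning, and serialization settings might look like the following sketch (the job name, resource sizes, and partition count are illustrative placeholders, not values from this posting):

```shell
# Hypothetical spark-submit invocation showing common tuning levers:
# executor sizing, shuffle partition count, adaptive query execution,
# and Kryo serialization for faster object (de)serialization.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 8g \
  --executor-cores 4 \
  --num-executors 20 \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  etl_job.py
```

In practice these values are chosen per workload by profiling the Spark UI (shuffle spill, task skew, GC time) rather than fixed up front.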

Experience & Expertise:

  • 5+ years as a Data Engineer, with solid experience in big data ecosystems.
  • 7+ years of hands-on AWS experience is a must, including deep familiarity with EMR, IAM, VPC, EKS, ALB, and Lambda.
  • Cloud experience beyond AWS (GCP or Azure) is a strong plus.
  • Proficiency with Python (including data structures and algorithms), SQL, and data modeling.
  • Strong expertise in distributed computing frameworks, particularly Apache Spark and Airflow.
  • Experience with streaming technologies such as Kafka.
  • Proven track record optimizing Spark jobs for scalability, reliability, and performance.
  • Familiarity with cloud-native ETL/ELT workflows, data sharing techniques, and query optimization (e.g., AWS Athena, Glue, Databricks).
  • Experience with complex business logic implementation and enabling application engineers through APIs and abstractions.
  • Solid understanding of data modeling, warehousing, and schema design.

Soft Skills:

  • Strong problem-solving skills and proactive communication.
  • Fluent English, B2 or higher (both written and verbal).

Preferred Skills & Certifications:

  • Familiarity with .NET applications structure and deployment.
  • Relevant cloud certifications (AWS Solutions Architect, Developer, Big Data Specialty).
  • Certifications or proven experience in Databricks, Apache Spark, Apache Airflow, and data modeling are a plus.

Recruitment Process

  • # 1 Initial Interview: Up to 1 hour with HR and/or a self-assessment form. If you prefer, you can skip the call and discuss all questions and details in writing instead. Just let us know!
  • # 2 Managerial Interview (Optional): 30-60 minutes with the CTO to learn more about the company, the position, and future plans directly from the source.
  • # 3 Test Assignment: Up to 113 minutes on the iMocha platform (graph data structures plus array and string manipulation, all in Python, with a few multiple-choice questions on Spark).
  • # 4 Technical Interview: Up to 1 hour with a Platform/Application Architect.
  • # 5 Offer & Paperwork: Up to 30 minutes with the CTO to finalize conditions and complete necessary paperwork.
  • # 6 Onboarding: Get ready to join the team and start your journey!

Ready to apply for this role?

Apply Now →
