AI Infrastructure Engineer

Key Responsibilities:

Collaborative Engineering: Work within a larger team to rapidly develop proof-of-concept prototypes that validate research ideas, and integrate them into production systems and infrastructure.

Performance Analysis: Conduct in-depth profiling and tuning of operating systems and large-scale distributed systems, leveraging heterogeneous hardware (CPU, NPU).

Documentation and Reporting: Maintain clear technical documentation of research findings, design decisions, and implementation details to ensure reproducibility and facilitate knowledge transfer within the team.

Research and Technology Exploration: Stay current with the latest advancements in AI infrastructure, cloud-native technologies, and operating systems. Examples include techniques for efficiently executing inference workloads through SW/HW co-design, and exploiting workload characteristics to prefetch memory or minimize communication.

Stakeholder Communication: Present project milestones, performance metrics, and key findings to internal stakeholders.

Person Specification:

Required:
Bachelor's or Master's degree in Computer Science or a related technical field.
A solid background in operating systems, distributed systems, and/or ML systems.
Excellent programming skills, with mastery of at least one language, such as C/C++.
Good communication and teamwork skills.
Comfort with research methodology.

Desired:
Familiarity with current LLM architectures (e.g. Llama3, DeepSeek V3).
Familiarity with production LLM ...