Inference Performance Optimization Specialist

Chemonics is seeking a skilled Inference Performance Optimization Specialist for a short-term assignment. The consultant will help one of BREB’s beneficiaries optimize the inference performance of the Speech-to-Text (STT) and Text-to-Speech (TTS) AI models used in the beneficiary’s platform and its product, DQ Call Studio, which is tailored to the diverse global operations of its client, AS Watson.

The Inference Performance Optimization Specialist is responsible for ensuring real-time processing and efficient performance across various technical environments, languages, and requirements.  

Scope of Work:  

  • Assess the beneficiary platform and DQ Call Studio to analyze the performance of three different STT/TTS AI models across multiple natural languages (Dutch, English, and Arabic).
  • Identify bottlenecks and suggest practical solutions. 
  • Implement optimization techniques for improving inference speed. 
  • Collaborate with the development team to integrate the optimized models across the diverse technical infrastructure of the beneficiary’s client, AS Watson.
  • Increase the inference speed of the AI models by at least 2x.
  • Write an API (Application Programming Interface) level integration guide. 
  • Take responsibility for all aspects of inference performance optimization, including analysis, implementation, testing, and documentation, while accounting for regional differences.
  • Report monthly to the beneficiary: present findings and actionable recommendations to the beneficiary’s senior management to guide strategic decisions based on the insights derived from the analysis.

Qualifications:

  • Master’s degree in Artificial Intelligence.
  • At least 10 years of experience as a Software Engineer, with a minimum of 4 years in a senior role.
  • Proven ability to architect, implement, and maintain cloud infrastructure and data pipelines to support large-scale AI model training and deployment.
  • Ability to optimize memory management and inference speed across CPUs and GPUs for improved AI model performance.
  • Experience in developing and maintaining CI/CD workflows for reliable and consistent application performance.
  • Expertise in building and supporting backend services with Node.js and C#, including for applications such as voice assistants and interactive speakers.
  • Strong competency in identifying processing bottlenecks and implementing targeted solutions to enhance speed and functionality.
  • In-depth knowledge of designing and building AI-powered platforms with real-time interaction, including AI bots.
  • Experience with multilingual STT and TTS AI models (e.g., Arabic and English) for real-time transcription.
  • Skilled in developing and documenting APIs for seamless AI model integration across web and mobile environments.
  • R&D background with experience in setting up cloud infrastructure using diverse tools (advantageous).

Expected Level of Effort (LOE):

This is a short-term technical assistance (STTA) assignment. The expected level of effort is 90 working days over a period of 6 months.

Location of assignment:

Ramallah, Palestine (office and online).

Date posted: Today
Posted by: Jobs