Research Engineer

Verita HR Polska is a Human Resources service provider operating under number 5694.
We are acting as a recruitment provider and are searching on our Client’s behalf for a candidate for the following role:

Responsibilities

About the company: US-based AI startup focused on building the next generation of training data for LLMs. The team partners with top AI labs to create realistic RL environments where models encounter research and engineering challenges, iterate, and learn from feedback, pushing AI closer to its full potential.
Project: Design and build reinforcement learning environments to teach LLMs advanced reasoning and modern ML concepts. Candidates will work on realistic feedback loops where models encounter research and engineering problems and iterate on solutions.

What you will do:
• Build and maintain RL/ML environments for LLM training
• Implement robust, production-quality Python code (not just notebooks)
• Deploy and run environments in Docker with a focus on reliability and iteration speed
• Analyze model performance and respond to feedback efficiently
• Collaborate with research teams to translate papers and ideas into RL problems

Requirements

• Strong Python (engineering-quality)
• Docker and production mindset
• Understanding of LLMs and their limitations
• Ability to meet throughput expectations
• Advanced English (C1/C2) and ≥4 hours overlap with US time zones

Nice-to-have:
• Deep knowledge of transformer internals and LLM training/inference
• Experience with inference libraries (vLLM, SGLang, etc.)
• CUDA or Pallas kernel development experience
• Publications or open-source contributions in active DL/ML research
• Experience building interactive RL environments and RL-based learning systems

The offer

• Fully remote, flexible work schedule with some overlap with US time zones
• Direct impact on how LLMs learn
• Collaboration with top AI researchers and labs
• Exposure to cutting-edge RL and ML projects

Job offer details
  • Salary:
  • Category: IT
  • Country: Poland
  • State: Abroad / Remote
  • City: Remote
  • Valid until: 30/06/2026
Contact your recruiter

milena.gorka@veritahr.com

Candidate form

The controllers of personal data are Verita HR Polska Sp. z o.o. and HRO Personnel Sp. z o.o. Personal data will be processed in order to respond to the inquiry submitted through the contact form. More information on the rules of data processing, including its purposes and your rights, is available in the Privacy Policy.
DATA PROTECTION OFFICER
Data Protection Officer at Verita HR Sp. z o.o.:
dane.osobowe@veritahr.com
Data Protection Officer at HRO Personnel Sp. z o.o.:
dane.osobowe@hropersonnel.com
