This is a job posting for a Data Engineer at the Forecasting Research Institute, a nonprofit conducting research on high-stakes problems like AI progress and biosecurity. The role involves building and maintaining ELT pipelines, managing a cloud data warehouse, and collaborating with analysts. The salary range is $75k-$130k.
What they want, where you stand, and the exact résumé edits to qualify.
Biggest lever: Gain practical, hands-on experience building and managing cloud data warehouses with formal dimensional modeling and orchestration.
A starter prompt for Claude Code, what you'll need, and how to reach them.
You are a senior data engineer. Your task is to outline a detailed, step-by-step plan for implementing and maintaining ELT pipelines and a dimensional data warehouse for the Forecasting Research Institute (FRI). FRI is a nonprofit collecting forecasting data from various sources (surveys, expert panels, AI systems) and needs to transition from an external vendor to in-house ownership. The core technology stack for the implementation should leverage Python for scripting, an orchestration tool like Apache Airflow or Dagster, a cloud data warehouse (e.g., Snowflake, BigQuery, or Redshift, assuming one is already in use or will be chosen based on current vendor setup, but specify a general approach), and SQL for dimensional modeling. Focus on building robust, scalable, and maintainable data infrastructure.
Outline the following sections:
1. **Phase 1: Discovery & Integration (1 week)**
* Steps for understanding existing vendor setup and data sources (e.g., identifying survey platforms, external APIs).
* Initial collaboration points with current external vendor and internal analysts.
* Tools/scripts for initial data exploration and schema assessment.
2. **Phase 2: ELT Pipeline Development (3-4 weeks)**
* Detailed steps for designing and implementing Python-based data extraction scripts for various sources.
* Strategies for incremental data loading and handling data quality issues.
* Selection and setup of an orchestration framework (Airflow/Dagster) for scheduling and monitoring.
* Steps for building the initial dimensional model (facts, dimensions) in the cloud warehouse using SQL.
3. **Phase 3: Ownership & Maintenance (Ongoing)**
* Strategies for taking full ownership from the external vendor.
* Plans for ongoing data pipeline monitoring, alerting, and error handling.
* Processes for collaborating with research analysts for new data requirements and ad-hoc queries.
* Approaches for optimizing warehouse performance and cost.
For each step, specify potential challenges and how to address them. The output should be a detailed, actionable plan ready for execution.

Forecasting Research Institute (FRI) | Data Engineer | REMOTE | Full-time

We're a ~20-person nonprofit doing forecasting research on high-stakes problems, including AI progress, biosecurity, and nuclear risk. We have dozens of active projects that generate forecasting data from surveys, expert panels, and AI systems.

We're looking for our first dedicated data engineer. You'd start alongside an external vendor extending an existing warehouse, then take full ownership. Concretely, the work would involve building ELT pipelines from survey platforms and external sources into a cloud warehouse, maintaining a dimensional model, collaborating with analysts, and overseeing orchestration & monitoring.

You have solid Python + ETL/ELT, strong SQL and dimensional modeling, cloud warehouse experience. Nice to have: dbt/Airflow/Dagster, prior SWE work, interest in forecasting. Apply even if you don't tick every box!

Conditions:
- 100% Remote (worldwide) / Remote (global)
- 30 days PTO, health insurance contribution
- $75k–$130k, depending on experience
- 3 team retreats/year

Hiring process: short work test → paid 10-hour test → a few interviews.

Apply at https://forecastingresearch.org/careers/
Build a small end-to-end data engineering project: ingest data from an API (e.g., public forecasting data), load into a free tier cloud data warehouse (e.g., BigQuery), apply dimensional modeling with dbt, and orchestrate with a serverless cron job or simple Airflow setup. Document the architecture and code on GitHub. (4-6 weeks)
Standard for a solo operator and data engineering role.
Fundamental database skill for data engineering.
Familiarity with at least one major cloud warehouse is assumed for a data engineer.
Common tools for data orchestration and transformation.
Standard practice for software development.
Apply directly through the link provided in the job description: https://forecastingresearch.org/careers/
“Submit a well-crafted application highlighting your Python, ETL/ELT, SQL, and cloud data warehousing skills, emphasizing your ability to quickly take ownership of complex data infrastructure, as demonstrated by [briefly mention a relevant project/experience from your portfolio].”