[Remote] Co-founder, Senior AI-Native Infrastructure and DevOps Engineer (High Equity-Based)

SQOR.ai

Note: This is a remote position open to candidates in the USA. SQOR.ai is an AI-native Decision Intelligence platform that aims to simplify business intelligence through machine learning and real-time data analysis. The role involves managing and optimizing the company's Google Cloud infrastructure, ensuring reliability and performance for AI queries, and collaborating closely with the CTO and engineering teams.

Responsibilities

  • Manage and optimize GKE clusters (Autopilot and Standard) for performance, reliability, and scalability.
  • Configure pod resources, autoscaling, workload classes, and node pools to support high-demand AI workloads.
  • Ensure parity between Dev, Staging, and Prod environments for accurate performance comparison.
  • Improve latency and efficiency across the entire infrastructure footprint.
  • Build and maintain CI/CD pipelines using GitHub Actions, Cloud Build, and ArgoCD.
  • Automate deployments, infrastructure provisioning, and configuration updates.
  • Implement automated checks, including performance tests, health checks, and stability gates.
  • Implement and manage observability across all services using Cloud Monitoring, Cloud Logging, Prometheus, Grafana, and OpenTelemetry.
  • Build dashboards to track CPU, memory, DNS latency, network paths, pod health, vector search latency, and AI inference behavior.
  • Create automated alerts and anomaly detection for infrastructure-level issues.
  • Diagnose latency differences between VMs and GKE environments.
  • Identify issues caused by CPU throttling, DNS behavior, logging overhead, networking paths, and service mesh or gateway overhead.
  • Run load, stress, and concurrency testing using tools like k6, Locust, or equivalent.
  • Optimize configurations for speed, reliability, and cost efficiency.
  • Manage and tune RAG pipelines and vector databases (Vertex AI Matching Engine or comparable).
  • Instrument vector search performance and optimize index configurations.
  • Improve integration with Vertex AI and Gemini models for consistent, low-latency responses.
  • Work closely with the CTO, AI team, and backend team to ensure infrastructure supports their requirements.
  • Provide clear reporting on performance findings, system behavior, and recommended improvements.
  • Operate independently with high ownership and accountability.

Skills

  • Google Cloud (GKE, IAM, Cloud Build, Cloud Run, Cloud Logging, Cloud Monitoring)
  • Kubernetes performance tuning (Autopilot and Standard)
  • Terraform and IaC
  • CI/CD (GitHub Actions, Cloud Build, ArgoCD)
  • Distributed tracing and observability (OpenTelemetry, Cloud Trace, Prometheus, Grafana)
  • Vector databases and RAG infrastructure (Vertex AI Matching Engine, Pinecone, or equivalent)
  • Gemini model integration and Vertex AI pipelines
  • Multi-agent AI orchestration fundamentals
  • BigQuery performance profiling and understanding of execution behavior
  • Networking, DNS, load balancing, Kgateway, and service mesh basics
  • Stress testing and performance profiling (k6, Locust, etc.)
  • Strong diagnostic and troubleshooting abilities across compute, DNS, logging, networking, and data layers
  • Experience running AI or ML-heavy workloads in production
  • Familiarity with MCP (Model Context Protocol): agent setup, gateway configuration, context-sharing patterns
  • Apache Flink, Kafka Streams, or Ray for advanced streaming or distributed compute
  • GPU cost optimization or inference-time optimization strategies
  • Anthos or hybrid/multi-cloud environments
  • Experience with high-throughput data ingestion pipelines
  • Prior experience in regulated enterprise environments

Benefits

  • 60,000 company shares, which is approximately 1 percent equity.
  • As revenue and funding increase, a competitive full-time salary will be added.

Company Overview

  • The future of Business Intelligence is not more dashboards. It is real-time answers. SQOR.ai was founded in 2021 and is headquartered in New York, New York, USA, with a workforce of 11-50 employees. Its website is https://www.sqor.ai.

Job Type

  • Full Time

Location

  • United States
