Research Methodology

Complete technical documentation of our institutional-grade AI market research factory methodology, including architecture, data sources, quality assurance, and validation procedures.

Overview: AI Market Research Factory Methodology

Our AI research factory combines multiple specialized systems to deliver investment-grade market intelligence with complete transparency and auditability.

Hybrid AI Architecture

Our architecture combines multiple AI models (GPT-4, Claude Sonnet) with specialized Python engines for quantitative analysis, matching each task type to the model or engine best suited to it.
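
As a minimal sketch, assuming a simple task-type routing table (the mapping below is illustrative, not the factory's published configuration):

# Hypothetical task-to-model routing; assignments follow the model
# selection strategy described later on this page.
MODEL_ROUTING = {
    "planning": "gpt-4",             # strategic planning, multi-step reasoning
    "synthesis": "claude-sonnet",    # summaries, presentation content
    "quantitative": "python-engine", # deterministic financial modeling
}

def route(task_type: str) -> str:
    """Return the backend assumed to handle a given task type."""
    return MODEL_ROUTING.get(task_type, "gpt-4")  # default is an assumption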

Multi-Phase Processing

A 13-task workflow divided into four distinct phases (data collection, consolidation, analysis, and presentation generation), with a quality gate between each phase.
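
A minimal sketch of the phase layout, assuming the task groupings described in the process flow below (the quality_gate hook is a hypothetical placeholder for the real validation step):

# Phase groupings per the 13-task process flow; quality_gate is a
# hypothetical stand-in for the validation between phases.
PHASES = {
    "data_collection": [1, 2, 3, 4, 5],
    "consolidation": [6],
    "analysis": [7, 8, 9],
    "presentation": [10, 11, 12, 13],
}

def run_pipeline(run_task, quality_gate):
    for phase, tasks in PHASES.items():
        results = [run_task(task_id) for task_id in tasks]
        if not quality_gate(phase, results):  # gate between phases
            raise RuntimeError(f"Quality gate failed after {phase}")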

Source Verification

Every data point is traced back to its original source and carries a confidence score, publication date, and reliability assessment for complete transparency.
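
A sketch of what a fully attributed data point might look like as a record; the field names are assumptions based on the attributes listed above:

from dataclasses import dataclass
from datetime import date

# Illustrative record shape for a source-attributed data point.
@dataclass
class SourcedDataPoint:
    value: float
    source_url: str
    publication_date: date
    confidence_score: float  # assumed 0.0-1.0 scale
    reliability: str         # e.g. "high" / "medium" / "low"

point = SourcedDataPoint(
    value=12.4,
    source_url="https://example.com/report",
    publication_date=date(2024, 1, 15),
    confidence_score=0.92,
    reliability="high",
)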

Parallel Processing

Advanced N8N orchestration enables simultaneous processing of multiple analysis streams, reducing total processing time while maintaining quality.
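
The orchestration itself lives in N8N, but the fan-out/fan-in pattern is easy to illustrate in Python; this conceptual sketch is not part of the actual workflow:

import asyncio

# Conceptual only: N8N performs the real orchestration. This shows the
# equivalent pattern of running analysis streams concurrently.
async def analysis_stream(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a long-running task
    return f"{name}: done"

async def main():
    streams = ["content", "charts", "images"]  # cf. Tasks 10-12 below
    results = await asyncio.gather(*(analysis_stream(s) for s in streams))
    print(results)

asyncio.run(main())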

System Architecture

Detailed breakdown of our technical infrastructure and service integration

Orchestration: N8N Workflows (localhost:5678) - Task coordination and workflow management
Vector Database: ChromaDB (localhost:8001) - Isolated collections per research task
Embedding Service: Text Embeddings (localhost:8010) - Document chunking and vector generation
Analysis Engine: Python FastAPI (localhost:8020) - Quantitative analysis and modeling
Data Storage: Google Drive (OAuth2 authenticated) - Unique folders per research task
Tracking: Airtable - Metadata and audit trail storage
# Core Services Health Checks
curl http://localhost:8001/api/v1/heartbeat   # ChromaDB
curl http://localhost:8010/health             # Embedder Service
curl http://localhost:8020/health             # Financial Engine

# Docker Compose Stack
services:
  - chroma: Vector database (port 8001)
  - embedder: Text processing (port 8010)
  - financial-engine: Analysis engine (port 8020)

13-Task Process Flow

Detailed breakdown of each processing phase with quality gates and validation steps

Phase 1: Research Data Collection (Tasks 1-5)

Hybrid Data Extraction

Task 1 calls the financial engine directly for quantitative data extraction, bypassing slower LLM calls. It combines Tavily API research with Python-based time-series analysis for maximum accuracy and speed.
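
A hedged example of calling the time-series endpoint listed in the API reference at the end of this page; the request payload shape is an assumption for illustration:

import requests

# POST /extract_time_series is documented below; the payload fields
# here (run_id, documents) are hypothetical.
payload = {
    "run_id": "demo-001",
    "documents": ["Global market revenue grew 12% in 2023 to $48B..."],
}
resp = requests.post(
    "http://localhost:8020/extract_time_series", json=payload, timeout=60
)
resp.raise_for_status()
print(resp.json()["metadata"])  # e.g. processing_time, confidence_score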

RAG-Powered Research

Tasks 2-5 follow a standard RAG architecture: Query Tavily → Save & Embed in ChromaDB → Retrieve Context → Build LLM Prompt → Generate Analysis → Create Deliverable Files → Upload to Drive.
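
A minimal sketch of the save and retrieve steps using the ChromaDB client against the service on port 8001 (in the real pipeline the embedder service on port 8010 generates vectors; here ChromaDB's default embedding function stands in, and the collection name is assumed):

import chromadb

# Per-task isolation: one collection per research task.
client = chromadb.HttpClient(host="localhost", port=8001)
collection = client.get_or_create_collection("demo-001_task02")

# Save & embed research results from the Tavily query step.
collection.add(
    ids=["doc-1"],
    documents=["Tavily result text..."],
    metadatas=[{"source_url": "https://example.com", "task": 2}],
)

# Retrieve context and build the LLM prompt.
hits = collection.query(query_texts=["market size forecast"], n_results=5)
context = "\n".join(hits["documents"][0])
prompt = f"Using only the context below, analyze the market.\n\n{context}"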

Phase 2: Data Consolidation (Task 6)

Master Dataset Creation

Retrieve all Phase 1 deliverables from ChromaDB → Build consolidated prompt → OpenAI analysis → Generate master_dataset.csv + data_quality_report.md. This is a critical validation step before analysis proceeds.
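
A sketch of the validation step, assuming pandas and illustrative check names; the real checks that feed data_quality_report.md are not published:

import pandas as pd

# Hypothetical quality checks over the consolidated dataset.
df = pd.read_csv("master_dataset.csv")
report = {
    "rows": len(df),
    "missing_cells": int(df.isna().sum().sum()),
    "duplicate_rows": int(df.duplicated().sum()),
}

with open("data_quality_report.md", "w") as f:
    f.write("# Data Quality Report\n")
    for name, value in report.items():
        f.write(f"- {name}: {value}\n")

if report["missing_cells"] or report["duplicate_rows"]:
    raise RuntimeError("Quality gate failed: see data_quality_report.md")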

Phase 3: Analysis (Tasks 7-9)

Quantitative Analysis

Pure Financial Engine processing: Monte Carlo simulations → Sensitivity analysis → Upload results. No LLM involvement, ensuring deterministic mathematical accuracy.
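
A minimal Monte Carlo forecast sketch with a confidence interval; the growth-rate distribution parameters are illustrative assumptions, not calibrated values:

import numpy as np

rng = np.random.default_rng(42)
base_market = 10_000.0   # current market size, illustrative units
years, n_sims = 5, 10_000

# Simulate annual growth rates and compound them into market-size paths.
growth = rng.normal(loc=0.08, scale=0.03, size=(n_sims, years))
paths = base_market * np.cumprod(1.0 + growth, axis=1)

p5, p50, p95 = np.percentile(paths[:, -1], [5, 50, 95])
print(f"Year-{years} median {p50:,.0f} (90% interval {p5:,.0f}-{p95:,.0f})")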

Strategic Analysis

Enhanced LLM analysis with CSV context: Build prompt with master dataset → LLM strategic insights → Process results → Upload deliverables

Phase 4: Presentation (Tasks 10-13)

Asset Generation & Assembly

Tasks 10-12 run in parallel: image generation (Task 10), content generation with Claude (Task 11), and chart generation with the Financial Engine (Task 12). Final assembly (Task 13) produces the HTML/PPTX output.

Data Sources & Integration

Data sources integrated into the research pipeline, each tracked with a reliability rating and update frequency

🌐 Tavily API - Real-time web research and data aggregation
📊 Financial APIs - Market data, company financials, and economic indicators
🏛️ Regulatory Filings - SEC EDGAR, company reports, and compliance documents
📰 News & Analysis - Financial news, analyst reports, and market commentary
🔍 Research Databases - Industry reports, market studies, and academic research
📈 Market Data - Real-time pricing, trading volumes, and market sentiment

Quality Assurance Framework

Multi-layer validation and verification protocols ensuring institutional-grade accuracy

95% - Accuracy rate
100% - Source attribution
3,000 - Character chunk size
200 - Character overlap buffer

Validation Protocols

Source Verification

Every data point includes a source URL, publication date, author credentials, and a reliability score based on historical accuracy and institutional recognition.

Cross-Reference Validation

Key findings are verified against multiple independent sources. Discrepancies are flagged and resolved through additional research or expert consultation.

Mathematical Verification

All quantitative analyses are independently verified using alternative calculation methods. Monte Carlo simulations include confidence intervals and sensitivity analysis.

Completeness Checks

Automated verification ensures all required sections, citations, and supporting documentation are present before final delivery.
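
A sketch of what such an automated check might look like; the required-section list is an assumption for illustration:

# Hypothetical pre-delivery completeness check.
REQUIRED_SECTIONS = ["Executive Summary", "Methodology", "Findings", "Citations"]

def completeness_check(report_text: str) -> list[str]:
    """Return any required sections missing from a draft report."""
    return [s for s in REQUIRED_SECTIONS if s not in report_text]

with open("draft_report.md") as f:
    missing = completeness_check(f.read())
if missing:
    raise ValueError(f"Delivery blocked; missing sections: {missing}")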

AI Model Selection Strategy

Strategic deployment of different AI models optimized for specific task types

GPT-4 for Planning

Strategic planning, complex reasoning, and multi-step analysis tasks requiring sophisticated decision-making capabilities.

Claude Sonnet for Synthesis

Content synthesis, executive summaries, and presentation generation where natural language quality is paramount.

Python Engines for Quantitative

Financial modeling, statistical analysis, and mathematical computations requiring deterministic accuracy.

Sentence Transformers for Embeddings

The all-MiniLM-L6-v2 model generates embeddings for chunked documents and powers semantic search in the RAG pipeline.
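
Loading and using this model follows the standard sentence-transformers API:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["First 3000-character chunk...", "Second chunk..."]
embeddings = model.encode(chunks)  # all-MiniLM-L6-v2 yields 384-dim vectors
print(embeddings.shape)            # (2, 384)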

Technical Specifications

Detailed technical parameters and system requirements

Processing Speed: 4 hours average - Complete research cycle from brief to delivery
Document Chunking: 3,000 characters - 200-character overlap for context preservation (see the sketch after this list)
Embedding Model: all-MiniLM-L6-v2 - Sentence transformers for semantic search
File Naming Convention: Structured format - {runId}_Task{XX}_{Description}_{Industry}_{Region}.{ext}
Data Isolation: Per-task collections - Unique ChromaDB collections for each research task
API Endpoints: 5 core functions - Time series, Monte Carlo, sensitivity, charts, assembly
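
A sketch combining the chunking parameters and file naming convention above; the helper names are illustrative, not the factory's actual code:

CHUNK_SIZE, OVERLAP = 3000, 200

def chunk_document(text: str) -> list[str]:
    """Split text into 3000-character chunks with 200-character overlap."""
    step = CHUNK_SIZE - OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

def deliverable_name(run_id, task, description, industry, region, ext):
    """Build a name per {runId}_Task{XX}_{Description}_{Industry}_{Region}.{ext}."""
    return f"{run_id}_Task{task:02d}_{description}_{industry}_{region}.{ext}"

print(deliverable_name("demo-001", 7, "MonteCarloForecast", "Fintech", "EMEA", "csv"))
# demo-001_Task07_MonteCarloForecast_Fintech_EMEA.csv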

# Financial Engine API Endpoints
POST /extract_time_series             # Task 1: Market data extraction
POST /monte_carlo_forecast            # Task 7: Probabilistic modeling
POST /sensitivity_analysis            # Task 7: Variable impact analysis
POST /generate_presentation_charts    # Task 12: Visualization
POST /assemble_presentation           # Task 13: Final assembly

# Example Response Format
{
  "status": "success",
  "extracted_datapoints": [...],
  "interpolated_series": {...},
  "base_forecast": {...},
  "metadata": {
    "processing_time": "2.3s",
    "confidence_score": 0.94,
    "source_count": 47
  }
}