Experimental Data Report Generator¶

Overview¶

The src/utils/generate_experimental_data_report.py script generates a comprehensive markdown report containing all experimental datasets used in the quantum pattern analysis research. This report provides complete datasets for reproducibility and further analysis.

Purpose¶

This script addresses the need to provide complete experimental data for research papers, including:

Framework Concept Extractions: Complete datasets from Classiq, PennyLane, and Qiskit
Pattern Matching Results: Full analysis results from pattern matching
Pattern Atlas Data: Complete set of quantum patterns from PlanQK Pattern Atlas
Statistical Analysis: All generated tables and metrics

Usage¶

Command Line¶

# Using just (recommended)
just experimental-data

# Direct execution
python -m src.utils.generate_experimental_data_report

Output¶

The script generates docs/experimental_data.md containing all experimental datasets.

Generated Content¶

1. Framework Concept Tables¶

Classiq Quantum Concepts: Complete dataset from data/classiq_quantum_concepts.csv (comma-delimited)
PennyLane Quantum Concepts: Complete dataset from data/pennylane_quantum_concepts.csv (comma-delimited)
Qiskit Quantum Concepts: Complete dataset from data/qiskit_quantum_concepts.csv (semicolon-delimited)
Row Numbers: All tables include sequential row numbers for easy reference

2. Pattern Analysis Results¶

Top Matched Concepts: From data/report/top_matched_concepts.csv
Match Type Analysis: From data/report/match_type_counts.csv
Framework Analysis: From data/report/matches_by_framework.csv
Pattern Frequency: From data/report/patterns_by_match_count.csv

3. Pattern Atlas Data¶

Complete Pattern List: From data/quantum_patterns.json
Pattern Metadata: Names, aliases, intents, and descriptions
Source Reference: PlanQK Pattern Atlas citation

File Structure¶

docs/
└── experimental_data.md          # Generated experimental data report

data/
├── classiq_quantum_concepts.csv     # Classiq concepts (input)
├── pennylane_quantum_concepts.csv   # PennyLane concepts (input)
├── qiskit_quantum_concepts.csv      # Qiskit concepts (input)
├── quantum_patterns.json            # Pattern Atlas data (input)
└── report/                          # Analysis results (input)
    ├── top_matched_concepts.csv
    ├── match_type_counts.csv
    ├── matches_by_framework.csv
    └── patterns_by_match_count.csv

Features¶

Error Handling¶

Graceful handling of missing files
Clear error messages for debugging
Continues processing even if some files are missing

Data Validation¶

Checks file existence before processing
Validates CSV structure
Handles empty datasets appropriately
Delimiter Detection: Automatically uses semicolon (;) for Qiskit files and comma (,) for others
Row Numbering: Adds sequential row numbers to all tables for easy reference

Formatting¶

Clean markdown tables
Proper section organization
Academic citation format
Professional presentation

Integration¶

With Research Workflow¶

Run the main analysis: just identify-concepts
Generate pattern matching: just run_main
Create final report: just report
Generate experimental data: just experimental-data

With Paper Writing¶

The generated markdown file can be: - Converted to LaTeX for academic papers - Used directly in documentation - Referenced for data citations - Shared for reproducibility

Customization¶

Adding New Data Sources¶

To include additional datasets, modify the ExperimentalDataReportGenerator class:

# Add new file paths
self.new_data_file = self.data_dir / "new_data.csv"

# Add new table generation method
def _generate_new_data_table(self):
    # Implementation here
    pass

Modifying Output Format¶

The script uses pandas to_markdown() for table formatting. To change the format:

# In table generation methods
table_md = df.to_markdown(index=False, tablefmt="grid")

Dependencies¶

pandas: Data manipulation and table formatting
pathlib: File path handling
json: JSON data processing
src.conf.config: Project configuration

Testing¶

The script includes comprehensive tests in tests/test_generate_experimental_data_report.py:

16 test cases covering all functionality
Error handling for missing files and data issues
Mock testing for isolated unit testing
Integration testing for end-to-end functionality

Run tests with:

just test-file tests/test_generate_experimental_data_report.py

Benefits¶

Reproducibility: Complete datasets for research replication
Transparency: Full experimental data available
Academic Standards: Proper citations and references
Automation: No manual data compilation required
Consistency: Standardized format across all datasets
Accessibility: Easy to share and reference

Example Output¶

The generated markdown file includes:

# Experimental Data

## Overview
This document contains the complete experimental datasets...

## Classiq Quantum Concepts
The complete dataset of quantum concepts extracted from the Classiq framework.

**File**: `classiq_quantum_concepts.csv`
**Total Concepts**: 150

| name | summary |
|------|---------|
| classiq.circuit | Quantum circuit implementation |
| ... | ... |

## References
@online{PlanQK_QuantumPatterns_2024,
  author       = {{PlanQK}},
  title        = {Quantum Computing Patterns},
  year         = {2025},
  url          = {https://patternatlas.planqk.de/...},
  urldate      = {2025-09-28}
}

This comprehensive experimental data report ensures that all research data is properly documented and accessible for reproducibility and further analysis.