A Strategy That Builds Strategies: My Automated Crypto Strategy Generator (Dev Diary + How to Run It)

I didn’t start this project because I believed I could “outsmart the market” with one perfect idea. I started it because I kept catching myself doing the same dangerous thing: falling in love with a backtest curve. You know the pattern. You tweak a threshold. You change one indicator window. You rerun the backtest. The curve improves. And quietly, without noticing, you stop doing research and start doing cosmetic surgery on your results.

At some point I wanted a system that would argue with me. A system that would force every strategy idea to survive outside the exact period it was tuned on. So instead of writing one strategy, I built a program that can generate, mutate, evaluate, reject, and evolve strategies automatically. This is my developer diary: how I built the loop, what it’s meant to prevent, and how you can run it yourself.

The core idea: stop “handcrafting” and start “evolving”

This project is an AI-assisted strategy evolution framework for crypto research. At a high level it does this:

- Start from a candidate strategy file (strategy_candidate.py)
- Evaluate it with walk-forward testing (train → test, multiple folds)
- Score it by rewarding ROI and punishing drawdown (and filtering fragile samples)
- Keep an elite pool of the best candidates
- Mutate the strategy logic (optionally with AI)
- Repeat and save the best output as best_strategy.py

The goal is not one lucky run. The goal is a process that makes overfitting hard.

Why walk-forward testing became non-negotiable for me

I used to trust “one big backtest.” Now I don’t. A single backtest can hide all kinds of illusions:

- a strategy that only works in one market regime
- parameters tuned to a specific volatility pattern
- a lucky sequence of trades that never repeats

Walk-forward testing is my antidote:

- Split history into train and test
- Tune only on train (optional)
- Lock the configuration
- Measure performance on test
- Roll forward and repeat

If a strategy can’t survive that loop, it’s not robust; it’s just decorated.

What makes this system realistic (and not just a toy)

Backtests aren’t useful unless they include the boring, painful stuff. So the engine intentionally models real execution friction:

- fees
- slippage
- leverage and margin accounting
- limits on maximum open positions (portfolio risk control)
- optional “one position per symbol” constraint

This matters because many strategies look great only because they assume magical fills. I’m not trying to create perfection. I’m trying to create research that doesn’t collapse the moment it meets reality.

How the evolver works (the loop I actually run)

The evolver iterates like this:

1) Verify the strategy

Before anything runs, the system checks:

- the strategy file is valid Python
- the allowed edit block is intact
- required features are declared correctly
- nothing unsafe is being imported or executed

This is the part that keeps the system from slowly destroying itself as it evolves.

2) Evaluate with walk-forward (and optionally regime split)

Then it runs walk-forward evaluation across multiple folds. Optionally, it can split evaluation by market regime (up / down / range), forcing strategies to survive more than one type of market.
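To make the fold mechanics concrete, here is a minimal sketch of how rolling train/test windows can be generated. It illustrates the technique, not the project’s actual evaluation code; the function name make_walkforward_folds and the parameters train_bars, test_bars, and step_bars are assumptions for the example.

    # Minimal walk-forward sketch (illustrative; not the framework's real code).
    # Each fold is a (train, test) pair of index ranges over the candle history,
    # and the whole window rolls forward by step_bars between folds.
    def make_walkforward_folds(n_bars, train_bars, test_bars, step_bars):
        folds = []
        start = 0
        while start + train_bars + test_bars <= n_bars:
            train = range(start, start + train_bars)
            test = range(start + train_bars, start + train_bars + test_bars)
            folds.append((train, test))
            start += step_bars
        return folds

    # Example: ~180 days of 1h candles, 60-day train, 20-day test, 20-day step.
    folds = make_walkforward_folds(n_bars=180 * 24, train_bars=60 * 24,
                                   test_bars=20 * 24, step_bars=20 * 24)
    for i, (train, test) in enumerate(folds):
        print(f"fold {i}: train bars {train.start}-{train.stop - 1}, "
              f"test bars {test.start}-{test.stop - 1}")

Only the test ranges should ever feed the score; any tuning happens on the train ranges, and the configuration is locked before the test range is touched.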
3) Score (ROI vs drawdown, plus “don’t trust tiny samples”)

A strategy isn’t “good” because it has high ROI. It’s good if ROI is high without unacceptable drawdown, and if it traded enough times to be meaningful. So scoring is designed to:

- reward return
- penalize max drawdown
- discard strategies with too few trades (tiny samples are how you fool yourself)

4) Maintain an elite pool

Instead of only keeping “the current best,” the system maintains a top-K elite pool. This matters because evolution can get stuck in local optima. Sometimes the next breakthrough comes from mutating the 3rd-best candidate, not #1.
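As a rough illustration of how those two ideas fit together, here is a sketch of a drawdown-penalized score feeding a top-K pool. The names and values (roi_weight, dd_weight, min_trades, pool_size) are made up for the example and are not the framework’s real configuration.

    # Illustrative scoring + elite-pool sketch; not the framework's actual code.
    # The score rewards ROI, penalizes max drawdown, and rejects tiny samples.
    def score_candidate(roi, max_drawdown, n_trades,
                        roi_weight=1.0, dd_weight=2.0, min_trades=30):
        if n_trades < min_trades:
            return None  # too few trades: the result is noise, so discard it
        return roi_weight * roi - dd_weight * max_drawdown

    def update_elite_pool(pool, candidate, score, pool_size=5):
        # Keep the top-K (score, candidate) pairs instead of only the single best.
        pool = sorted(pool + [(score, candidate)],
                      key=lambda item: item[0], reverse=True)
        return pool[:pool_size]

    # Example: a high-ROI candidate with a deep drawdown can lose to a calmer one,
    # and a candidate with too few trades never enters the pool at all.
    pool = []
    for name, roi, dd, trades in [("A", 0.40, 0.30, 80),
                                  ("B", 0.25, 0.08, 120),
                                  ("C", 0.90, 0.05, 6)]:
        s = score_candidate(roi, dd, trades)
        if s is not None:
            pool = update_elite_pool(pool, name, s)
    print(pool)  # "B" (score ~0.09) outranks "A" (~-0.20); "C" was discarded

Keeping a pool rather than a single champion is exactly what the section above describes: the next mutation round can start from any member of the pool, which helps the search escape local optima.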
5) Mutate and repeat

The next iteration is generated by mutation:

- parameter tweaks
- logic variations
- optional AI rewrite of only the editable block

The best candidate is saved as best_strategy.py, and the run artifacts go into a timestamped folder under runs/.

AI rewriting: I constrain it on purpose

I don’t let the AI edit everything. That’s not “safety theater”; it’s survival. In strategy_candidate.py, there’s a clearly marked section:

    # === AI_EDIT_START ===
    # === AI_EDIT_END ===

If AI is enabled, it is allowed to rewrite only that section, under strict constraints:

- Python code only (no commentary)
- no redefining the entire Strategy class
- only use supported feature names
- keep indentation correct
- if it introduces parameters, it must also define the parameter space

Then the program validates the output before it’s accepted. This is the difference between “AI as a tool” and “AI as chaos.”

About sharing: I removed my personal strategy logic

Yes, my original repository included parts of my own strategy logic (entry/exit rules, parameter ranges, etc.). For sharing, I created a public template:

- The framework remains intact
- The strategy file is replaced with an empty shell (no trading logic)
- No run artifacts are included
- API keys are never included (environment variables only)

That way you can explore the system without inheriting my edge or my private research trail.

How to run it (friendly setup guide)

Step 1) Install Python and create a virtual environment

Inside the project folder:

    python -m venv .venv

Activate it:

Windows (PowerShell):

    .venv\Scripts\activate

macOS/Linux:

    source .venv/bin/activate

Install dependencies:

    pip install -r requirements.txt

Step 2) Set environment variables (only if you use AI mutation)

You only need this if you want the AI to mutate the strategy block.

Windows (PowerShell):

    setx OPENAI_API_KEY "YOUR_KEY_HERE"

macOS/Linux:

    export OPENAI_API_KEY="YOUR_KEY_HERE"

(If you don’t set this, you can still run without AI; see below.)

Step 3) Configure run_settings.py

Open run_settings.py and adjust:

- data source: binance or csv
- timeframe (e.g., 1h)
- number of symbols by volume (Top N)
- total history days loaded
- walk-forward windows (train / test / step)
- scoring weights and risk caps

This file is the control panel. Everything else should “just run.”

Step 4) Run the evolver

The simplest run:

    python run_main.py

Each run creates a new folder like:

    ./runs/evolve_YYYYMMDD_HHMMSS/

Inside you’ll find logs, evaluation summaries, and the best discovered candidate: best_strategy.py

Step 5) Run without AI (deterministic mode)

If you want pure evaluation and selection without AI rewriting:

    python -m hf_lite.cli --no-ai --base-dir . --source binance --timeframe 1h --top 30 --iters 30

CSV mode (controlled experiments):

    python -m hf_lite.cli --no-ai --base-dir . --source csv --csv-folder ./my_csv --csv-symbols BTCUSDT,ETHUSDT --timeframe 1h --iters 20

If you want the project files, leave a comment

I’m happy to share the public template. If you’re interested:

- leave a comment below
- include your email address (or comment and email me directly if you don’t want to post it publicly)

I’ll send the ZIP + a short setup checklist.

What I learned building this

The biggest change wasn’t technical. It was psychological.

I stopped asking: “Can I make a backtest look great?”
And started asking: “Can I build a process that makes it hard to lie to myself?”

That’s the entire point of this program. It doesn’t guarantee profit. It doesn’t eliminate risk. But it’s my attempt to build research that can survive reality.

(Not financial advice. Research and engineering only.)
