After 8 Months with Codex CLI, Here Is My Advanced AI Coding Workflow

Hi, I’m Haijun, a Full-Stack Developer | AI Enthusiast | Indie Developer. It has been a while since my last article. I have been genuinely busy recently, mainly pushing forward AI projects inside my company. From requirement breakdown, solution design, and frontend and backend development to Agent workflows, RAG, API integration, and production debugging, I have been carrying many of these pieces largely on my own. Fortunately, I have had a reliable teammate fighting alongside me: Codex.

Since last year, one feeling about AI tools has kept growing stronger: things are changing incredibly fast. Last year, a major AI update every four or five months already felt intense. Now many capabilities are reshaped every one or two months. Models are changing, tools are changing, and the way we build software is changing as well. In the past, we asked whether AI could help us write code. Now the more practical question is: can you turn AI into a stable, controllable, and reusable engineering workflow?

I have been exploring Codex CLI for about 8 months. During this time, I have used it to build quite a few projects and stepped into many traps. Sometimes it is very smart. Sometimes it does strange things. Sometimes it is unbelievably efficient. But if you do not give it boundaries, it can also make things unnecessarily complicated.

So in this article, I want to share the advanced Codex CLI workflow I have distilled over the past 8 months. It is a practical AI coding productivity system that I have repeatedly used, adjusted, and stabilized in real projects. Task context, AGENTS.md, configuration, MCP, Skills, and automation are the key levers for making Codex stable.
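Project rules belong in AGENTS.md and personal defaults in ~/.codex/config.toml. A minimal sketch of bootstrapping both is below. It is an illustration under assumptions, not the author's setup: the function name `scaffold_codex_files` is hypothetical, the rule text is paraphrased from the prompts in this article, the `CODEX_HOME` fallback is assumed, and the two config keys (`approval_policy`, `sandbox_mode`) are assumed to mirror the `--ask-for-approval` and `--sandbox` flags used later. Check the Codex configuration reference for your version.

```shell
# Hypothetical helper: bootstrap project rules (AGENTS.md) and personal
# defaults (config.toml). Existing files are never overwritten.
# Assumptions: the rule text is illustrative, and the two config keys are
# assumed to mirror the --ask-for-approval / --sandbox CLI flags.
scaffold_codex_files() {
  local project_dir="$1"
  # Second argument wins; otherwise fall back to $CODEX_HOME, then ~/.codex.
  local codex_home="${2:-${CODEX_HOME:-$HOME/.codex}}"

  mkdir -p "$project_dir" "$codex_home" || return 1

  # Project rules are committed with the repository, so every session sees them.
  if [ -f "$project_dir/AGENTS.md" ]; then
    echo "keeping existing $project_dir/AGENTS.md" >&2
  else
    cat > "$project_dir/AGENTS.md" <<'EOF'
# Project rules for Codex

- Read the relevant files and propose a plan before cross-module changes.
- Reuse existing components, design tokens, and Tailwind configuration.
- Make the smallest change that solves the task; do not refactor unrelated code.
- After changes, run lint, typecheck, and build, and report the results.
- Finish by listing changed files, assumptions, and items to verify manually.
EOF
  fi

  # Personal defaults stay outside the repository.
  if [ -f "$codex_home/config.toml" ]; then
    echo "keeping existing $codex_home/config.toml" >&2
  else
    cat > "$codex_home/config.toml" <<'EOF'
approval_policy = "on-request"
sandbox_mode = "workspace-write"
EOF
  fi
}
```

After sourcing the file, run `scaffold_codex_files .` from the repository root. Keeping the two files separate means teammates share the project rules while each person keeps their own approval and sandbox defaults.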
In short:

• Project rules go into AGENTS.md.
• Personal default configuration goes into ~/.codex/config.toml.
• Repeated tasks become Skills.
• External context is handled by MCP.
• Complex tasks should be planned before execution.
• Frontend tasks should use screenshots + Playwright feedback loops.
• Risky tasks should be controlled through review / diff / sandbox.
• Stable tasks can be automated with codex exec.

Codex CLI is not just a tool for generating code snippets. It can read repositories, edit files, and run commands inside the directory you choose, which makes it useful for:

• New feature development
• Bug investigation
• Frontend page implementation
• Component refactoring
• API integration
• Test coverage
• Code review
• Documentation generation
• Script automation
• Project structure understanding
• Technical debt analysis

The more vague, cross-module, or high-impact a task is, the more you should first let Codex plan, locate the relevant files, and confirm boundaries before asking it to modify code. Enter Plan mode first: let Codex collect context, ask questions, and form a plan before implementation.

Codex officially supports adding images to the initial prompt with -i or --image. You can attach one or more images, either separated by commas or by passing the flag repeatedly. Common formats such as PNG and JPEG are supported.

    codex -i ./screenshots/home.png "Please analyze this frontend screenshot and implement the corresponding page in the current project."

The boundary here is important: images provide the visual target, but they do not replace engineering context. With only one screenshot, Codex can usually produce an approximate result. With multiple page states and detailed screenshots, the reconstruction becomes much more stable.

A more efficient way to use it:

    codex \
      --sandbox workspace-write \
      --ask-for-approval on-request \
      -i ./screenshots/home-desktop.png \
      -i ./screenshots/home-mobile.png \
      "Please implement the homepage in the current project based on these two screenshots.

    First, read the project structure and confirm the routing, component directories, styling approach, design system, and existing page implementations.
    Then analyze the layout, component hierarchy, spacing, typography, colors, and responsive rules in the screenshots.
    During implementation, prioritize reusing existing components, tokens, Tailwind configuration, and project conventions. Do not create a separate visual style.
    After completion, run lint, typecheck, and build.
    Finally, explain: which files were changed, which parts were inferred from the screenshots, and which visual differences still need manual confirmation."

We can also combine Codex with Playwright, allowing it to open a real browser, compare the implementation with the reference image, and iterate on layout and behavior across different screen sizes. A complete feedback loop looks like this:

    Reference image → Implement page → Start local server → Open browser → Compare screenshots → Fix → Check again

The prompt can be written like this:

    codex -i ./references/home.png "
    Please implement the page shown in this reference image and use Playwright for visual verification.

    Requirements:
    1. First confirm how to start the local development server for this project.
    2. After implementing the page, start the service.
    3. Use Playwright to open the page and take a screenshot.
    4. Compare the reference image with the current screenshot and fix obvious differences.
    5. Check at least two breakpoints: desktop and mobile.
    6. Finally, output the difference summary and verification result.
    "

Do not let it start coding immediately. First, ask Codex to analyze the image elements:

    codex -i ./screenshots/home.png "
    Do not modify code yet. Please analyze this screenshot and combine it with the current project structure.

    Output:
    1. Page structure
    2. Component breakdown
    3. Mapping to the existing styling system
    4. Components that should be added or reused
    5. Screenshot details that may be unclear
    6. Implementation plan
    "

Then let it implement the plan:

    codex resume --last "
    Implement the page according to the previous plan. Make small incremental changes and prioritize reusing existing components.
    After completion, run lint, typecheck, and build. If there are visual differences, explain the reason.
    "

You can also let Codex implement a first version based on the image, then use the following prompt to have it compare and fix the differences:

    codex resume --last \
      -i ./references/target.png \
      -i ./screenshots/current.png \
      "The first image is the target design. The second image is the current implementation. Please compare the differences and only fix visual issues. Do not refactor unrelated code.

    Focus on:
    1. Overall page proportions
    2. Top spacing
    3. Title font size and weight
    4. Card border radius, shadow, and border
    5. Button size and position
    6. Mobile breakpoint
    7. Whether there is horizontal overflow

    After completion, run the checks and output the changes made."

Bug investigation needs special care. If it is handled poorly, the codebase can become messy, and the more you change, the more chaotic it gets. Provide the error logs and the affected scope: the clearer the context, the more stable Codex’s understanding and modifications will be, and once the scope is defined, it is less likely to make random changes. For example:

    The error logs are in History.log. Please investigate and fix this issue based on the logs.

    Working method:
    1. Do not modify code first.
    2. Find potentially related pages, components, state management, API requests, and style files.
    3. Explain the 2-3 most likely causes.
    4. If the project can be run, try to reproduce the issue.
    5. After confirming the root cause, make the smallest possible change.
    6. Run relevant checks after the change.
    7. Finally, explain the root cause, changed files, and verification method.

Especially when modifying legacy projects, never let AI rewrite everything directly.
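To hand Codex the error logs without pasting an entire file, one option is to pre-filter the log and embed only the relevant lines in the debugging prompt. The sketch below is hypothetical: the function name `build_debug_prompt`, the grep pattern, and the 40-line default are assumptions to adapt to your own log format.

```shell
# Hypothetical helper: build a scoped debugging prompt from a log file.
# Assumption: error lines contain "error", "exception", "failed", or "traceback".
build_debug_prompt() {
  local log_file="$1"
  local max_lines="${2:-40}"
  local excerpt

  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 1
  fi

  # Keep only the most recent matching lines, prefixed with line numbers,
  # so the prompt stays short and points at concrete evidence.
  excerpt=$(grep -n -i -E 'error|exception|failed|traceback' "$log_file" | tail -n "$max_lines")

  cat <<EOF
Relevant lines from $log_file (line:content):
${excerpt:-(no matching lines found)}

Please investigate and fix this issue based on the logs. Working method:
1. Do not modify code first.
2. Find potentially related pages, components, state management, API requests, and style files.
3. Explain the 2-3 most likely causes.
4. If the project can be run, try to reproduce the issue.
5. After confirming the root cause, make the smallest possible change.
6. Run relevant checks after the change.
7. Finally, explain the root cause, changed files, and verification method.
EOF
}
```

Usage: `codex "$(build_debug_prompt History.log)"`. The line numbers in the excerpt give Codex exact places to look, which keeps the investigation scoped instead of letting it wander through the codebase.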
For refactoring, keep in mind that legacy projects are already complex enough, and AI also needs time to understand them. I usually split the work into four stages, which is much more stable:

    Understand the current state → Identify risks → Create migration plan → Execute in batches

The prompt can be written like this:

    codex "
    I am preparing to refactor the component structure of the current project. Do not modify code yet.

    Please output:
    1. Current component directory structure and main responsibilities
    2. Duplicate code and possible abstraction points
    3. High-risk files
    4. Recommended target directory structure
    5. Phased migration plan
    6. Verification method for each phase
    7. Areas that should not be changed right now
    "

Codex CLI has a /review command, which can review uncommitted changes, a specific commit, or the diff against a base branch. According to the official description, it reads the diff as an independent reviewer and outputs prioritized, actionable issues without modifying the working tree. You can use the built-in review as is, or give Codex your own review criteria. My prompt looks like this:

    codex "
    Please review the current uncommitted changes.

    Focus on:
    1. Whether there are behavioral regressions
    2. Whether there are type risks
    3. Whether responsive behavior is broken
    4. Whether unrelated files were modified
    5. Whether there are maintainability issues
    6. Whether necessary tests are missing

    Only output high-risk issues and suggested fixes. Do not modify code.
    "

We can use external MCP servers to extend Codex’s capabilities. For example, to implement designs from design files, we can use the Figma MCP server; to develop against the latest documentation, we can use Context7 MCP. Installation is simple:

    codex mcp add context7 -- npx -y @upstash/context7-mcp

Prompts that we use frequently can be extracted into Skills. In real development, you can invoke these Skills directly and only provide the variable parts, and Codex will follow the workflow you previously extracted.
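A Skill can start life as a saved copy of one of the prompts above. The sketch below scaffolds a project-level Skill folder. It assumes the common layout of a `SKILL.md` file with `name` and `description` front matter and uses the `.agents/skills` location described in this article; the helper name `new_skill` and the workflow text are illustrative, so compare against the Codex Skills documentation for your version before relying on it.

```shell
# Hypothetical helper: scaffold a project-level Skill from a repeated prompt.
# Assumptions: Skills are folders containing SKILL.md with name/description
# front matter, stored under .agents/skills in the project.
new_skill() {
  local name="$1"
  local description="$2"
  local root="${3:-.}"
  local dir="$root/.agents/skills/$name"

  # The name doubles as a folder name, so restrict it to a safe character set.
  case "$name" in
    ''|*[!a-z0-9-]*)
      echo "skill name must use lowercase letters, digits, and hyphens" >&2
      return 1
      ;;
  esac

  if [ -e "$dir/SKILL.md" ]; then
    echo "skill already exists: $dir/SKILL.md" >&2
    return 1
  fi

  mkdir -p "$dir" || return 1
  cat > "$dir/SKILL.md" <<EOF
---
name: $name
description: $description
---

# Workflow

1. Read the project structure: routing, component directories, styling approach.
2. Reuse existing components, tokens, and conventions; do not invent a new visual style.
3. Make small incremental changes.
4. Run lint, typecheck, and build.
5. Report changed files, inferred details, and items needing manual confirmation.
EOF
  # Print the path so callers can open the file for editing.
  echo "$dir/SKILL.md"
}
```

Usage: `new_skill page-from-screenshot "Implement a page from screenshots using project conventions"`, then replace the workflow section with your own prompt and ask Codex to use the skill by name.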
Skills can be divided into system-level and project-level Skills. If a Skill is general-purpose, you can install it globally. If it is only used frequently in the current project, package it as a Skill and place it inside .agents/skills in your project. Codex can recognize your intent and invoke the matching Skill, or you can explicitly tell it which Skill to use. For example:

    codex "
    Use the haijun skill to implement this page.
    "

I want to share one honest observation. Many people feel that using AI to write code is unstable. The problem is not necessarily the tool itself. It may be that we are still treating it as a place to “submit requirements,” instead of managing it the way we would manage a teammate.

If you do not provide project background, it can only guess. If you do not provide boundaries, it may modify things at random. If you do not provide verification standards, it does not know what “done” means. If you do not make the final judgment yourself, it may hand you a solution that seems to run but is not actually reliable.

The ceiling of AI is set by the professional ability and judgment of the person using it.

I used Cursor for a long time, and later Claude Code for a while. Claude Code is indeed very powerful and often impressive. But for users in China, the restrictions are obvious, and its stability and long-term sustainability were not ideal for me. So I gradually switched to Codex CLI. I have now used it for more than half a year, and I can say it has become one of the most important collaboration tools in my daily development work.

Over the past few months, it has helped me write pages, modify components, investigate bugs, refactor projects, organize architecture, design Agent tools, optimize RAG pipelines, generate documentation, and run pre-launch checks. It is not just helping me “write code.” It is helping me reorganize many parts of the development process. But the premise is: you need to give it a workflow.

So I am becoming more and more certain about one thing: in the future, the real gap will not come from who knows how to use a specific AI tool. It will come from who can turn AI tools into their own engineering system, content system, business system, and even a personal capability amplifier.

Key Takeaways

•This story was reported by Dev.to, covering developments in the dev space.

•AI advancements continue to reshape industries — read the full article on Dev.to for complete coverage.
