Andy (@viewerisland) 's Twitter Profile
Andy

@viewerisland

Building Gulp (YC W25)

ID: 2557375100

Link: http://baiqinglyu.com

Joined: 09-06-2014 19:12:15

225 Tweets

158 Followers

93 Following

Andy (@viewerisland) 's Twitter Profile Photo

So much folklore of injecting text like checking docs or extra think for models to improve performance. Folks, just say you are forcing a certain distribution of behavior, it ain't that deep

Andy (@viewerisland) 's Twitter Profile Photo

ByteDance has allegedly surpassed 1 million GPUs of the H100/H800 SKU, putting them in the same league as Google (1-1.5M H100 level) and Microsoft (750k-900k). Combining MegaScale, Verl, and joint efforts with AReaL, there are some incredibly exciting times ahead

Andy (@viewerisland) 's Twitter Profile Photo

With releases like MiMo it's clear the future for production AI systems will be a strong and small reasoning model paired with a strong retrieval and knowledge recommendation system.

Kasey Zhang (@_weexiao) 's Twitter Profile Photo

We used RL to train a model for MCP! Connect any MCP client to any MCP server - you can run MCP workflows fully with local models (+ tune it further). It works with Ollama / any MCP client that supports Qwen3 models - download it below 👇1/

Andy (@viewerisland) 's Twitter Profile Photo

For people who have tried post-training the MiMo RL models, has anyone noticed that this model is *incredibly* verbose? Even for simple tasks it will go on for 20k+ tokens before an answer.

Andy (@viewerisland) 's Twitter Profile Photo

Many frameworks have claimed to support RL for LLMs. Only one that would actually work in production and even it lacks support on standardizing things like tool invocation and reward policies. It's like selling you a fully functional car with no gas and the nearest gas station is

Kasey Zhang (@_weexiao) 's Twitter Profile Photo

Don't use structured output mode for reasoning tasks. We’re open sourcing Osmosis-Structure-0.6B: an extremely small model that can turn any unstructured data into any format (e.g. JSON schema). Use it with any model - download and blog below!

Andy (@viewerisland) 's Twitter Profile Photo

Am I a boomer in the ai agent world... I don't care about your fancy abstractions show me the jinja template and your tools k? Thx

Andy (@viewerisland) 's Twitter Profile Photo

Nvidia releases Python DSL for CuTe and within weeks there's already a 200+ WeChat group chat about this discussing use cases and findings. Recommending a blog post here: veitner.bearblog.dev/bridging-math-…

Kasey Zhang (@_weexiao) 's Twitter Profile Photo

It’s easy to fine-tune small models w/ RL to outperform foundation models on vertical tasks. We’re open sourcing Osmosis-Apply-1.7B: a small model that merges code (similar to Cursor’s instant apply) better than foundation models. Links to download and try out the model below!

Andy (@viewerisland) 's Twitter Profile Photo

Anyone coding with OSS AI IDEs should have at least a Perplexity MCP and a language server MCP for linting, type checking, etc.