

1·
25 days agoIf I’m creating a corpus for an LLM to consume, I feel like I would probably create some data source quality score and drop anything that makes my model worse.


If I’m creating a corpus for an LLM to consume, I feel like I would probably create some data source quality score and drop anything that makes my model worse.
Yeah, after reading a bit into it. It seems like most of the work is up front, pre filtering and classifying before it hits the model, to your point the model training part is expensive…
I think broadly though, the idea that they are just including the kitchen sink into the models without any consideration of source quality isn’t true