Jack actually has a point. Even if we have the source of an AI algorithm, we don't know what it's been trained on, or how it's different from what it was yesterday.
I disagree that open sourcing them is a waste of effort but it IS insufficient. I don't think his "plug in an AI of your choice" is a great idea though. That requires massive flows of data and seems like a security nightmare waiting to happen.
Open sourcing is still a must so they cannot hide the most deliberate thumb on the scales. Bias through training data is trickier.
the big VC firms are talking about this stuff heavily, not because of the censored/uncensored issue, but because there are a TON of practical problems that overlap it.
businesses don't want their confidential info going into someone else's model, where it might get trained into the system
multiple AI companies have been busted adding confidential info to training even when they said they wouldn't. so for example, if you give a model unfettered access to company data, it can end up spitting employees' account and routing numbers back out in its answers.
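to make that concrete, here's a toy sketch of the kind of scrubbing layer companies bolt on before any text leaves the building for a third-party model. the patterns and names are made up for illustration, real shops use proper PII/DLP tooling:

```python
import re

# illustrative patterns only: US routing numbers are 9 digits, account
# numbers are commonly 8-17 digits. this is just the shape of the idea.
ROUTING_RE = re.compile(r"\b\d{9}\b")
ACCOUNT_RE = re.compile(r"\b\d{8,17}\b")

def redact_financial_ids(text: str) -> str:
    """mask anything that looks like a routing or account number before
    the text is sent to an external model that might train on it."""
    text = ROUTING_RE.sub("[REDACTED-ROUTING]", text)
    text = ACCOUNT_RE.sub("[REDACTED-ACCOUNT]", text)
    return text

print(redact_financial_ids("wire to routing 021000021, account 1234567890"))
# -> wire to routing [REDACTED-ROUTING], account [REDACTED-ACCOUNT]
```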
so not even talking about political bias, where modern chat AI will tell you the nazis were bad, but even while admitting communist genocides dwarf the nazis' death toll will still say with communism "it's complicated"... not even talking about that level of bias... no one wants to use someone else's trained model, because it can carry bias around companies and other shit.
result is that there are TONS of open source, uncensored models freely available. and many companies are already using them, because of these exact issues.
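for example, running an open-weights model entirely on your own hardware is a few lines with the hugging face transformers library. the model id below is just an example of a real open-weights checkpoint, swap in whatever model you actually trust; prompts and outputs never leave your machine:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # example open-weights model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize the indemnification clause in plain English:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```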
most people don't realize that both before your query reaches the model and after it produces output, RAG-style pre/post processing layers are applied that alter your query and the output. "jailbreaking" chat AI means breaking out of those pre/post processing RAGs. when it comes to uncensored, it means the training data is untainted, and usually the pre/post processing RAGs are either absent, or at least not loaded with any bias.
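here's a toy illustration of that pre/post sandwich (not any vendor's actual code, `call_model` is a stand-in for whatever model you're running): the query gets rewritten before the model ever sees it, and the answer gets filtered before you do:

```python
# illustrative only: hidden rewrite of the query going in, filter on the answer coming out
BLOCKED_TOPICS = {"internal pricing", "competitor x"}  # made-up list

def preprocess(user_query: str) -> str:
    # prepend hidden instructions the user never sees
    return (
        "You are a careful assistant. Refuse questions about "
        + ", ".join(sorted(BLOCKED_TOPICS))
        + ".\n\nUser: " + user_query
    )

def postprocess(model_output: str) -> str:
    # scrub or replace anything the operator doesn't want going out
    for topic in BLOCKED_TOPICS:
        if topic in model_output.lower():
            return "I can't help with that, let me connect you with a person."
    return model_output

def answer(user_query: str, call_model) -> str:
    return postprocess(call_model(preprocess(user_query)))

# runs standalone with a fake model:
print(answer("what's our internal pricing?", lambda p: "Our internal pricing is..."))
# -> "I can't help with that, let me connect you with a person."
```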
you can readily download uncensored models.
What's your opinion on the agent RAG debate?
which part of the RAG debate?
this issue exists because the RAGs in the public chat AI space are ridiculously biased. like it got outed that google's RAGs specifically were forcing massive over-representation of non-whites in image generation. the dall-e RAGs are better, but there are still some that specifically protect certain political figures.
another part of the RAG debate is whether to use agentic RAGs at all. agentic RAGs only exist because humans are bad at thinking of the big picture, and on a long enough timeline that problem won't exist. humans also want to be able to re-use those RAGs in different areas. that re-usability is a much different issue, and the benefit is real.
so long as humans want to keep a hand on the wheel, agentic RAGs will exist, because people want to influence the outcome. some of that is nefarious (e.g. political bias) but some is perfectly fine. for example, if my company has a chatbot and it's not trained to deal with contract amendments, i don't want the chatbot talking about the contract at all. so if the answer isn't high precision/recall with ~0 probability of hallucination, i want a RAG that has the chatbot grab a human to handle the issue instead. this has become a big enough problem that court cases have already been litigated over customers being promised better deal terms by chatbots, with sellers backing out and blaming the chatbot. courts ruled in favor of the customers.
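a rough sketch of that kind of guardrail (names and thresholds are made up, not any real product): if the question touches an out-of-scope topic like contract amendments, or the model's own confidence is too low, the bot hands off to a human instead of answering:

```python
# illustrative escalation guardrail, not a real product
OUT_OF_SCOPE = ("contract amendment", "amend the contract", "deal terms")
CONFIDENCE_FLOOR = 0.9  # made-up threshold

def route(user_query: str, model_answer: str, confidence: float) -> str:
    q = user_query.lower()
    if any(phrase in q for phrase in OUT_OF_SCOPE) or confidence < CONFIDENCE_FLOOR:
        return ("That's something a member of our team needs to handle. "
                "I'm connecting you with a person now.")
    return model_answer

print(route("Can you amend the contract to give me 20% off?",
            "Sure, 20% off is fine!", confidence=0.97))
# -> escalation message, no matter what the model wanted to say
```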