We heard you: let's focus on substantive discussion

anon18632855 · September 24, 2024, 10:39pm

I hand-counted the votes in the spreadsheet immediately to the right of your screenshot, then double-checked them by tallying the results from the working group reports for Llama 2, Pythia, OpenCV, and Bloom. Both counts confirm that the results relied upon to recommend excluding training data were erroneous:

Model	Use	Study	Modify	Share	Total
Llama 2	0	2	1	0	3
Pythia	0	4	0	0	4
OpenCV	4	4	2	0	10
Bloom	2	5	3	1	11
Total	6	15	6	1	28
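The Total row and column are plain sums, so they are easy to re-check mechanically. A minimal Python sketch, with the counts transcribed from the table above (the names `VOTES`, `row_totals` and `column_totals` are illustrative, not from any working group tool):

```python
# Per-model, per-freedom vote counts, transcribed from the table above.
FREEDOMS = ["Use", "Study", "Modify", "Share"]
VOTES = {
    "Llama 2": [0, 2, 1, 0],
    "Pythia": [0, 4, 0, 0],
    "OpenCV": [4, 4, 2, 0],
    "Bloom": [2, 5, 3, 1],
}


def row_totals(votes):
    """Total votes per model (sum across the four freedoms)."""
    return {model: sum(counts) for model, counts in votes.items()}


def column_totals(votes):
    """Total votes per freedom (sum down the model rows)."""
    return {
        freedom: sum(counts[i] for counts in votes.values())
        for i, freedom in enumerate(FREEDOMS)
    }


if __name__ == "__main__":
    rows = row_totals(VOTES)
    for model, total in rows.items():
        print(f"{model}\t{total}")
    print("Per freedom:", column_totals(VOTES))
    print("Grand total:", sum(rows.values()))
```

Running it reproduces the Total column (3, 4, 10, 11), the Total row (6, 15, 6, 1) and the grand total of 28.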

Furthermore, if you consolidate training and testing data (as is often the case), you’d pick up another 4 votes (DT, JL, JT, and RA), taking you into hard “Yes = Required (≥2μ votes)” territory.
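Consolidating two components and then applying the “Yes = Required (≥2μ votes)” tier can be sketched in Python. This is a sketch under stated assumptions, not the working group's actual method: each vote is assumed to be a hashable tuple such as `(voter, model, freedom)`, μ is assumed to be the mean of the per-component vote totals, and the function names and sample votes are hypothetical.

```python
from statistics import mean


def consolidate(*components):
    """Union the vote sets of several components so a vote cast for both counts once.

    Assumption: each vote is identified by a hashable tuple such as
    (voter, model, freedom); the real spreadsheet layout may differ.
    """
    merged = set()
    for votes in components:
        merged |= set(votes)
    return merged


def is_required(component_total, all_totals):
    """Apply the 'Yes = Required (>= 2*mu votes)' tier.

    Assumption: mu is the mean of the per-component vote totals.
    """
    mu = mean(all_totals)
    return component_total >= 2 * mu


if __name__ == "__main__":
    # Hypothetical votes, for illustration only (not the working group's data).
    training = {("AB", "Pythia", "Study"), ("CD", "Bloom", "Use")}
    testing = {("AB", "Pythia", "Study"), ("EF", "Bloom", "Study")}
    merged = consolidate(training, testing)
    print(len(merged))  # 3: the shared AB vote is counted once
    print(is_required(len(merged), [len(merged), 1, 0]))
```

With the numbers in this post, consolidation adds the 4 testing-data votes to the 28 above, for 32 in total; whether that clears 2μ depends on the other components' totals in the spreadsheet, which are not reproduced here.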

For good measure, I then combined the vote totals from the final reports to examine the votes by component and freedom.

I also applied the “pretty solid” statistics to see what that methodology would actually recommend, whether or not I agree with its validity (you can’t vote butter out of a pound cake and still have it function as a pound cake!).

Finally, I went back to the original raw voting data, consolidated incomplete rows, flattened the votes into a single raw dataset, and analysed that data to create a heatmap.
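The consolidate, flatten, and count steps can be sketched in Python. Everything about the input layout here is an assumption for illustration: each raw row is taken to be a dict with `voter`, `component` and a comma-separated `freedoms` cell, and an "incomplete" row is assumed to be a continuation row whose blank `voter`/`component` cells inherit the previous row's values. The function names and sample rows are hypothetical.

```python
from collections import Counter

FREEDOMS = ("Use", "Study", "Modify", "Share")


def forward_fill(rows, keys=("voter", "component")):
    """Consolidate incomplete rows by filling blank cells from the previous row (assumed layout)."""
    filled, last = [], {}
    for row in rows:
        row = dict(row)
        for key in keys:
            if row.get(key):
                last[key] = row[key]
            else:
                row[key] = last.get(key)
        filled.append(row)
    return filled


def flatten(rows):
    """Yield one (voter, component, freedom) record per freedom listed in a row's cell."""
    for row in rows:
        for freedom in (row.get("freedoms") or "").split(","):
            freedom = freedom.strip()
            if freedom:
                yield (row["voter"], row["component"], freedom)


def heatmap_matrix(records):
    """Count distinct votes per (component, freedom); rows are components, columns are FREEDOMS."""
    counts = Counter((component, freedom) for _, component, freedom in set(records))
    components = sorted({component for component, _ in counts})
    return components, [[counts[(c, f)] for f in FREEDOMS] for c in components]


if __name__ == "__main__":
    # Hypothetical rows, for illustration only (not the actual voting data).
    raw = [
        {"voter": "AB", "component": "Training data", "freedoms": "Study, Modify"},
        {"voter": "", "component": "", "freedoms": "Use"},  # continuation row
        {"voter": "CD", "component": "Model weights", "freedoms": "Use, Study"},
    ]
    components, matrix = heatmap_matrix(flatten(forward_fill(raw)))
    for name, counts in zip(components, matrix):
        print(name, counts)
```

The resulting matrix could then be handed to a plotting routine such as matplotlib's `imshow` to render the heatmap; the sketch stops at the counts so that it stays dependency-free.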

As for calls for civility, here, as on Wikipedia, I’m careful to comment on content rather than the contributor, and I have plenty of respect for those doing the work led by @Mer. As a design thinking advocate, I’m also intrigued by the co-design process.

I disagree with the results in part because they throw me and my Open Source AI project under the bus in favour of the AI behemoths (who got a vote while we didn’t, which is like the fox guarding the henhouse). I also disagree because voting is not an appropriate way to reach consensus on technical topics, so the technical question of what is actually required to protect the four freedoms in practice remains open.

That said, the vote tallies from the final reports do support the inclusion of training data, so I’ll respect those results provided you do too. Given that training, validation, and testing code scores lower yet is required by the checklist, training data sets and testing data sets should be required as well. I look forward to seeing them added to the 0.0.10 draft.