mia748•9h ago

Had a chat with my old professor about AI bias and it totally shifted how I see training data

I was catching up with my professor from college last week, and she mentioned how her team found that 60% of their image dataset had a lighting bias toward outdoor shots. That one stat made me realize I've been ignoring how data collection quirks can mess up results. How do you folks check for hidden biases in your own datasets?

3 comments

emery_lopez•9h ago•Most Upvoted

Oh man, that stat about lighting bias is wild! My friend Sarah works at a small dev shop and she told me this story about their traffic sign recognition project. They trained the model on like 10,000 images, but it kept messing up on signs photographed at night or in the rain. Turned out their dataset was almost entirely scraped from sunny-day photos on Google Street View. They didn't realize until one of their testers drove through a storm and the system failed to recognize a stop sign. She said they had to go back and manually add a bunch of night and bad-weather shots just to fix it. It made me realize how easy it is to overlook something as simple as lighting or weather in your training data.
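
For anyone wondering what a basic check for this could look like, here's a rough sketch, not what Sarah's team actually did. It assumes each image already has metadata tags; the `lighting` and `weather` fields, the `condition_coverage` helper, and the 10% threshold are all made up for illustration. It just counts how much of the dataset falls under each condition and flags the ones that are barely there or missing.

```python
from collections import Counter


def condition_coverage(records, key, expected, min_share=0.10):
    """Return (shares, flagged) for one metadata field.

    shares:  fraction of records carrying each expected value
    flagged: expected values whose share is below min_share
    """
    # Records missing the field are counted under "unknown".
    counts = Counter(r.get(key, "unknown") for r in records)
    total = sum(counts.values())
    shares = {v: (counts[v] / total if total else 0.0) for v in expected}
    flagged = [v for v in expected if shares[v] < min_share]
    return shares, flagged


# Made-up metadata, for illustration only (hypothetical tags).
records = (
    [{"lighting": "day", "weather": "clear"}] * 18
    + [{"lighting": "night", "weather": "clear"}] * 1
    + [{"lighting": "day", "weather": "rain"}] * 1
)

shares, flagged = condition_coverage(records, "lighting", ["day", "night"])
print("lighting shares:", shares)
print("underrepresented lighting:", flagged)

w_shares, w_flagged = condition_coverage(
    records, "weather", ["clear", "rain", "snow"]
)
print("underrepresented weather:", w_flagged)
```

The catch is that you have to list the conditions you expect up front, so it only catches gaps you already thought to look for. It won't replace actually testing in bad conditions.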

williams.kim•9h ago

That 60% stat is honestly pretty wild when you think about it.

morgan.logan•7h ago

Man, that's the thing nobody talks about enough. It's not just lighting bias either. Think about how models handle things like snow covering signs, or signs that are partially blocked by tree branches or dirt. We had a similar issue at my last job with a license plate reader. It worked fine in Arizona but totally choked in Michigan winters because nobody thought to train it on plates covered in road salt and slush. Simple stuff like that completely breaks these systems in real-world use.