Denise ([staff profile] denise) wrote 2009-07-25 11:45 pm

Teaching People to Fish

The transcript and video of [personal profile] damned_colonial's OSCON keynote about women in open source are now available! Well worth a read/listen; she provides some really thought-provoking material.

After she presented the lecture, someone sent her an email saying (in summary) that encouraging beginners is all well and good for small projects, but for larger or more complex projects, it's impractical: beginners will make mistakes, the project maintainers will reject their patches, the newcomer will get frustrated and leave, and nothing will get accomplished. There was also a suggestion that in order for open source software to achieve wide adoption, it has to be high quality, with an undercurrent (if not outright statement) that (a) newcomers' contributions won't be quality contributions and (b) it's all well and good for small projects to encourage newcomers, but "real", serious open source projects should be staffed entirely by experienced developers. His point seemed to imply that the world is divided into programmers and nonprogrammers, and if you aren't born A Programmer, you have no hope of ever achieving that status. I'm not certain that was what he meant to imply, but it's what was there between the lines.

Skud copied me (and Mark) in on her response to him, in the hopes that one or both of us would have something to say, and indeed I did, because that attitude is exactly what drives people away from contributing to projects. Skud suggested I take my response and turn it into a blog post nearly verbatim, but after some consideration, I decided I'd rather rewrite it to be a little less off-the-cuff and a little more well-structured.

It's slightly facile to say that everyone contributing to Open Source was a beginner once, because some people are better at self-education than others, some people had the benefit of a university computer science degree, some people had the benefit of hands-on coaching from a parent or mentor in their teenage years, etc. But 'experience' isn't a binary, all-or-nothing quality.

If a project keeps turning away people who don't have enough experience for them, the end result will not be that project having a large pool of only experienced developers, while interested people who don't quite match the appropriate experience level go off, educate themselves, and return when they're qualified enough to match the project's requirements. The end result will be that project having a small pool of developers who are 'experienced' enough to meet the project's requirements, while interested people who don't match that level of experience go away and never come back. And because nobody stays with a project forever -- burnout, life commitments, and the lure of other shiny projects ensure that there will be perpetual turnover -- the project is likely to see its contributor pool slowly dwindling over time.

To that end, I've put together a list of steps that any project can take to lure in new contributors and ensure that beginners have a successful learning experience with them, without compromising that project's quality and professionalism:

1. Lower the barrier to entry.

Make a list of the steps a newcomer to your project would need to go through in order to start hacking on your code. Downloading the source, obviously, but what else? Do you require libraries that the average user isn't likely to have installed? Do you assume the programmer will have, or require the programmer to have, certain tools or utilities on their system? Is your code architecture weird or nonstandard, so people will have trouble finding things in the code? Is your install process well-documented?

Barriers to installation aren't the only barriers to analyze, either. Take a look at the process by which people can contribute patches to you. Is it well-documented? How long would it take for a newcomer to figure out where to send a patch? Do you make it clear what form you require patches to be in? Are your coding style guidelines clearly available somewhere? Is your code review process transparent?

(The classic measure of this is the "typo test" -- if someone notices a typo in your project and wants to submit a patch to correct it, how long would it take them to obtain your code, make the fix, generate a patch, and submit the patch?)
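For a sense of how small the "generate a patch" step can be, here is a minimal sketch using Python's standard difflib module to produce a unified diff for a one-line typo fix. The file path and contents are made up, and this only illustrates the kind of artifact a newcomer has to produce; it is not Dreamwidth's actual submission workflow, and in practice a project's version control tool would usually generate the diff.

```python
import difflib


def make_patch(path: str, before: str, after: str) -> str:
    """Return a unified diff that turns `before` into `after` for one file."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical file path and contents: one line with a typo, then fixed.
original = "Welcome to the sit!\nEnjoy your stay.\n"
fixed = "Welcome to the site!\nEnjoy your stay.\n"

patch = make_patch("htdocs/welcome.txt", original, fixed)
print(patch, end="")
```

Run as-is, it prints the two file headers and a single hunk: the old line prefixed with "-", the corrected line prefixed with "+", and one unchanged line of context. Everything else the typo test measures (finding the code, finding where to send the patch) is the part a project can actually streamline.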

Then, once you've done this analysis, figure out how you can streamline the process. For example, on Dreamwidth, we knew that the process of installing the code was tedious, confusing, and poorly documented. The technical requirements for a development environment are also fairly strict. We're doing work to improve the documentation and the install process, but even if it were much simpler, the technical requirements alone would be a barrier.

Our solution was to add another VPS solely dedicated to providing development sandboxes for anyone who wants to contribute to the project. We call them Dreamhacks, and anyone who's interested in coding with us gets their own install (with all the libraries, plus the code already checked out for them, plus a few useful scripts), their own database to run against, and a little bit of Perlbal magic to make their assigned port answer to requests for username.hack.dreamwidth.net. Thanks to a little scripting on our part, setting up a new instance takes us less than five minutes, while it might take a newcomer several days of fighting with it.
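The post doesn't show the Dreamhack scripts or the Perlbal configuration, so the following is only a hypothetical Python sketch of the bookkeeping involved: give each developer their own port, and route a request for username.hack.dreamwidth.net to that port. BASE_PORT, SandboxRegistry, and the sequential port assignment are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

HACK_DOMAIN = "hack.dreamwidth.net"
BASE_PORT = 8000  # hypothetical: the post doesn't say which ports are used


@dataclass
class SandboxRegistry:
    """Hypothetical map from each developer's sandbox to its own port."""

    ports: dict[str, int] = field(default_factory=dict)

    def add(self, username: str) -> int:
        """Assign the next free port; re-adding a user returns the same port."""
        username = username.lower()
        if username not in self.ports:
            self.ports[username] = BASE_PORT + len(self.ports)
        return self.ports[username]

    def route(self, host: str) -> Optional[int]:
        """Map 'username.hack.dreamwidth.net' to that user's port, else None."""
        suffix = "." + HACK_DOMAIN
        host = host.lower()
        if not host.endswith(suffix):
            return None
        return self.ports.get(host[: -len(suffix)])


registry = SandboxRegistry()
registry.add("alice")
registry.add("bob")
print(registry.route("bob.hack.dreamwidth.net"))
```

In a real deployment the routing step lives in the proxy (Perlbal, in Dreamwidth's case) rather than in application code; the sketch just makes the username-to-port mapping explicit, and making add() idempotent means re-running the setup script for an existing user is harmless.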

2. Keep a public task list -- no matter how small the tasks. (Especially the small tasks.)

Every project collects them like they're candy: the little flaws or bugs that have been there since the dawn of time. You know, the ones that will take an experienced developer five minutes to fix. Typos. Missing fields. Missing options. Legacy code that doesn't actually do anything, but is still there because nobody ever bothered to take it out. Cases where you switched over to a new way of doing things halfway through the project, so half your code uses the old way and half of it uses the new way.

In most projects, those little flaws and bugs are on everyone's "when I get a chance" list: "it'll only take me five minutes; I'll do it when I get a chance". Except there are always bigger and shinier things to do, or bugs that are more crucial to fix, or new features that will knock everyone's socks off. (There's also, sometimes, a pervasive atmosphere of "those little things are beneath me".)

If you keep a public "when I get a chance" list, categorized by amount of effort required to complete the task, a new developer won't have to go looking. Some people come to a project with a particular itch that they want to scratch, and they won't have any problems finding something they want to work on. Other people come in to the project knowing that they want to help out, but not really having any particular direction they want to go in. Being able to pick a task off the list means that those people won't have to go searching for something they want to do. The idea is to present them with a wide variety of options, in the hopes that one will catch their eye and appeal to them. This increases the chance that a casual, drive-by viewer will get lured in and decide to stick around.

On Dreamwidth, we handle this by logging everything -- no matter how big or small, no matter if it's a bug or a planned feature -- into our Bugzilla. Through judicious use of keywords, we separate them into "minor effort", "medium effort", and "major effort". People who are looking for a quick hit can choose something off the "effort-minor" list, while people looking for something to dig their teeth into can choose from the "effort-major" list.
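To show how effort keywords turn one big bug list into several browsable ones, here is a hypothetical Python sketch that groups bugs by keyword. The post names effort-minor and effort-major; effort-medium is assumed by analogy, and the Bug class, the catch-all bucket name, and the sample bugs are invented for illustration rather than taken from Dreamwidth's Bugzilla.

```python
from __future__ import annotations

from dataclasses import dataclass

# "effort-minor" and "effort-major" are named in the post; "effort-medium" is assumed.
EFFORT_KEYWORDS = ("effort-minor", "effort-medium", "effort-major")
UNSORTED = "needs-effort-keyword"  # hypothetical bucket for untagged bugs


@dataclass(frozen=True)
class Bug:
    number: int
    summary: str
    keywords: frozenset[str]


def by_effort(bugs: list[Bug]) -> dict[str, list[Bug]]:
    """Group bugs by effort keyword so newcomers can browse one list at a time."""
    groups: dict[str, list[Bug]] = {kw: [] for kw in (*EFFORT_KEYWORDS, UNSORTED)}
    for bug in bugs:
        # First matching effort keyword wins; bugs without one still get listed.
        tag = next((kw for kw in EFFORT_KEYWORDS if kw in bug.keywords), UNSORTED)
        groups[tag].append(bug)
    return groups


# Hypothetical bugs, not real Dreamwidth bug numbers.
bugs = [
    Bug(1, "Typo on the settings page", frozenset({"effort-minor"})),
    Bug(2, "Half the code still uses the old API", frozenset({"effort-major"})),
    Bug(3, "Profile page is missing an option", frozenset()),
]
for keyword, found in by_effort(bugs).items():
    print(keyword, [b.number for b in found])
```

The catch-all bucket is a design choice: a bug nobody has estimated yet still shows up somewhere, which keeps untriaged items visible instead of letting them quietly fall off the list.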

Categorizing every task by the amount of effort and experience it will take to implement it gives newcomers a solid idea of what might be suitable for them to tackle. And the benefit of having everything logged and recorded means that you're not going to forget it when it pops off your mental stack. I've seen multiple projects use their bug trackers either solely for major bugs, or as a release tracker -- only entering items that are planned for the next release and saving the others for "later", in a list in someone's email or on a wiki or whatever. By keeping the list public and entering everything on it, no matter how minor, you increase the chances that a random passerby will see an item, think "that's been annoying me for ages, too!" and provide you a fix.

Think of your bug list/issue tracking list not as a list of flaws that you have to correct (which will lead to you getting annoyed or angry every time another one gets added) but as a list of opportunities to improve. That way, every item that gets added to the list is not only proof that people are using your product and want it to be more awesome, but another item added to the list of things that can be used to lure in new people.

3. Have clear coding guidelines.

Many projects have a series of best-practice guidelines -- sometimes written, sometimes unwritten -- for coding style or architecture style. If the best-practice guidelines are written, they're often written in a shallow or confusing fashion, or phrased/presented in such a way that it's assumed everyone reading the guidelines will be able to extrapolate the guidelines that are only written between the lines. This leads to a newcomer submitting a patch that doesn't meet the project's guidelines, then having the patch rejected because it's not good enough.

It's also common for code reviewers to treat "it would be nice, but not necessary" guidelines as Holy Writ and reject patches that meet all of the required guidelines, but violate one or more of the undocumented, optional guidelines or the community standards that have evolved over the years. This is frustrating as hell, and often leads to people having a patch rejected, and rather than rewriting it just throwing up their hands and saying "screw you guys, I'm going home".

Fixing this problem involves that thing that so many developers hate: writing documentation. First, look at your coding guidelines. (You do have written coding guidelines, right? If you don't, go write some. Make them detailed. Then go back and make them more detailed.) Is there anything in there that could be misinterpreted? Are you using language that assumes your reader already knows everything about your particular programming language? Are there any unwritten rules, missing from your guidelines, that will cause a reviewer to reject a patch?

Then, ask someone who's new to programming, or new to programming in your project's language, to read over the coding guidelines, noting down sections that are unclear or that assume a familiarity that not everyone will have. You don't have to turn your coding guidelines into a tutorial on programming in $language (and in fact you shouldn't, as that will make more experienced people take one look at them, assume they know everything that's in there, and stop reading), but if there's heavy jargon, or the guidelines assume a computer-science education, it would be a good idea to footnote a "further reading" list for any particularly jargony bits.

You should also include clear, concise examples of each point you include. Instead of saying:
Use postfix conditionals whenever possible. Postfix conditionals should not use parentheses.

Say this instead:
Use postfix conditionals whenever possible. Instead of:
if ( $u->is_person ) {
    return "User is a personal account";
}

Do this:
return "User is a personal account" if $u->is_person;


Don't use parentheses in postfix conditionals. Instead of:
return "User is a personal account" if ( $u->is_person );

Do this:
return "User is a personal account" if $u->is_person;


Ideally, your coding guidelines document should be the only document anyone new to your project should have to read in order to write a patch that will pass your code review on the first try (assuming that there are no technical errors with the patch, of course).
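When a guideline is this mechanical, a project can also flag it automatically, so neither the contributor nor the reviewer has to catch it by eye. A Perl-aware tool such as Perl::Critic would be the more robust route; the following is only a hypothetical, line-based Python sketch that checks the second rule above (no parentheses around a postfix condition). It does not parse Perl and will miss conditions that span several lines.

```python
from __future__ import annotations

import re

# A statement ending in a postfix "if"/"unless" whose whole condition is in
# parentheses, e.g.:  return "x" if ( $u->is_person );
# Heuristic only: it reads one line at a time and does not parse Perl.
POSTFIX_PARENS = re.compile(r"\S.*\b(?:if|unless)\s*\(.*\)\s*;\s*$")


def check_postfix_parens(perl_source: str) -> list[int]:
    """Return 1-based line numbers that wrap a postfix condition in parentheses."""
    return [
        lineno
        for lineno, line in enumerate(perl_source.splitlines(), start=1)
        if POSTFIX_PARENS.search(line.strip())
    ]


sample = '''\
return "User is a personal account" if ( $u->is_person );
return "User is a personal account" if $u->is_person;
if ( $u->is_person ) {
    return "User is a personal account";
}
'''
print(check_postfix_parens(sample))
```

Block-form conditionals like the "Instead of" example are left alone, because they start with the keyword and end in a brace rather than a semicolon.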

4. Lower your pedantry level.

This is perhaps the hardest thing for a lot of developers to accept, because to them, it sounds like "lower your standards". There are often a number of reasons for reviewers to reject a patch, though:
  1. This patch is wrong/doesn't work/doesn't compile.
  2. This patch introduces a security flaw.
  3. This patch fixes part of the issue, but not all of it.
  4. This patch doesn't meet the functional style guidelines (weird logic, bad architecture, etc.).
  5. This patch is perfectly okay as is, but the reviewer sees a more elegant way to implement it.
  6. This patch doesn't meet the cosmetic style guidelines (missing spaces, wrong formatting, etc.).
  7. This patch fixes the issue as described, but the reviewer would like to expand the scope of the issue (e.g. changing "fix a missing </table> in the HTML this produces" to "rewrite the page so it uses <div> instead of <table>").


In order to attract, train, and retain newer or less-experienced developers, it's important for project leaders to recognize that there's a difference between all of those reasons -- and that some of those reasons are valid reasons to reject a patch, and some of them are pedantry that will drive a newcomer away.

Obviously, if a patch doesn't work, doesn't compile, or introduces a security flaw, that's a perfectly justified reason to reject the patch. Whether or not you reject a patch for fixing part of the issue but not all of it depends on your particular project and on the nature of the issue. For projects that have a very rapid release cycle, or for issues that are particularly severe, it can be better to take a patch that fixes 80% of the issue and expect the remaining 20% in a later release: fixing the issue for a subset of your users now is often better than waiting for someone else to come along and fix it for everyone later, particularly if the remaining 20% requires a specialized skillset that only a few people have.

Rejecting a patch because it doesn't conform to the functional style guidelines is also something that depends on your particular project. If your project combines a number of different functional styles, due to issues with legacy code or different maintainers over the years, it can be better to commit the patch and open a new bug to improve it later. (For instance, if you have a functional rule that anything that can be abstracted into a function that can be called anywhere should be, but the existing code doesn't always do that, it might be better to commit the patch now and open up a new bug to refactor it later.) If the particular functional rule is for performance reasons, or the entire codebase follows that rule, rejecting the patch for that reason is justified.

The other issues, however, are things you should think twice -- or three times -- before rejecting a patch for, especially if the patch comes from a new contributor. If the fix is perfectly valid, but the reviewer sees a more elegant way to implement it, it's often better to commit the patch as-is and then open a new bug to patch the patch later (especially, again, in projects with a rapid release cycle), under the theory that it's better to have a functional fix now than an elegant fix later. If the patch has cosmetic style flaws, it's better for the committer to add the missing spaces or remove the extraneous parentheses or convert tabs to spaces (or vice versa) at the time of commit.

And if the patch fully fixes the issue as described, but the reviewer rejects it because he or she wants to expand the scope of the issue, that is the fastest way to drive people away from contributing to your project. Your community developers will learn that nothing they submit will ever be good enough and stop even trying.
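The reasoning in this section amounts to a small decision table. As a sketch of one reading of it, here is a hypothetical Python function mapping each rejection reason from the list above to a suggested action. The keyword flags (rapid_release, severe, strict_rule) stand in for judgment calls a real reviewer makes case by case, and rejecting a partial fix when neither flag is set is an assumption; the text says only that it depends on the project.

```python
from enum import Enum, auto


class Reason(Enum):
    BROKEN = auto()            # wrong, doesn't work, doesn't compile
    SECURITY_FLAW = auto()
    PARTIAL_FIX = auto()
    FUNCTIONAL_STYLE = auto()  # weird logic, bad architecture
    MORE_ELEGANT = auto()      # fine as is, reviewer sees a nicer way
    COSMETIC_STYLE = auto()    # missing spaces, wrong formatting
    SCOPE_CREEP = auto()       # reviewer wants to widen the issue


def review_action(reason: Reason, *, rapid_release: bool = False,
                  severe: bool = False, strict_rule: bool = False) -> str:
    """Suggest what to do with a patch, following this section's guidance.

    strict_rule: the functional rule exists for performance, or the whole
    codebase already follows it.
    """
    if reason in (Reason.BROKEN, Reason.SECURITY_FLAW):
        return "reject and explain"
    if reason is Reason.PARTIAL_FIX:
        # Take the 80% fix now when releases are frequent or the issue is severe.
        if rapid_release or severe:
            return "commit, file bug for the rest"
        return "reject and explain"
    if reason is Reason.FUNCTIONAL_STYLE:
        return "reject and explain" if strict_rule else "commit, file refactoring bug"
    if reason is Reason.COSMETIC_STYLE:
        return "fix cosmetics at commit time and explain"
    # MORE_ELEGANT and SCOPE_CREEP: the patch does what the issue asked for.
    return "commit, file follow-up bug"
```

Written out this way, only the first two reasons lead to rejection unconditionally; everything else either depends on context or ends in a commit plus a follow-up bug.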

5. Never reject a patch without explaining.

If you reject a patch, explain why you're rejecting it. Tailor your explanation for the contributor's experience level. If you know that the person who submitted it has been programming for twenty years, it's okay to bounce a patch with just a quick note about what's wrong, but if someone's newer, getting back that one line is more likely to leave them sitting there staring at your response and wondering what the heck you mean by it.

If you accept the patch, but there's a better way to do it, explain it. If you make corrections to the style when you commit it, explain the corrections that you made, and why. The idea is to train people with specific, real-world examples, offering them feedback on things that they've done so that they'll learn from their mistakes and not make the same mistake again. (And if they are making the same mistakes frequently, ask yourself why. Are your explanations not clear enough? Are your coding guidelines not explicit enough? What can you improve to make sure that it doesn't keep happening?)

In either case, provide your feedback in as constructive a manner as possible:
  • Explain any problems with the patch in a clear, explicit fashion.
  • Provide concrete suggestions about how to improve the code.
  • Direct the newcomer to resources they can use to learn about the problems in more depth.
  • Give examples of what would be considered good coding practice in that particular situation.


And, as a side note to this item: Review and commit patches quickly. Ideally within a day or two, especially for small fixes. If someone's a frequent contributor and their patches sit for a while, they'll likely be off doing something else while they're waiting, but if someone's new to the open source world and their first patch sits for a while, they'll be sitting there wondering what they did wrong.
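One way to keep the "day or two" target honest is a periodic report of patches still awaiting review. This is a hypothetical Python sketch: the Patch fields, the sample queue, and the exact two-day cutoff are assumptions, and a real project would pull this data from its bug tracker. It lists overdue patches with first-time contributors first, since those are the people most likely to be left wondering what they did wrong.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

REVIEW_TARGET = timedelta(days=2)  # assumed reading of "within a day or two"


@dataclass
class Patch:
    bug: int
    author: str
    submitted: datetime
    first_patch: bool  # is this the author's first contribution?


def overdue(patches: list[Patch], now: datetime) -> list[Patch]:
    """Patches waiting longer than REVIEW_TARGET, first-timers first, then oldest."""
    late = [p for p in patches if now - p.submitted > REVIEW_TARGET]
    # False sorts before True, so "not first_patch" puts newcomers at the top.
    return sorted(late, key=lambda p: (not p.first_patch, p.submitted))


# Hypothetical review queue.
now = datetime(2009, 7, 25, 12, 0)
queue = [
    Patch(101, "veteran", now - timedelta(days=5), first_patch=False),
    Patch(102, "newcomer", now - timedelta(days=3), first_patch=True),
    Patch(103, "someone", now - timedelta(hours=6), first_patch=False),
]
for p in overdue(queue, now):
    print(p.bug, p.author)
```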

Think back to how excited you felt the first time you saw your name in the changelog of a project you liked and respected. If you make someone's first experience with you be positive, quick, and seamless, they're far more likely to stick around.

6. Get buy-in from your existing contributors.

This methodology is, without a doubt, a burden on the existing contributors, especially at the start:
  • It requires them to spend less time coding and more time coaching.
  • It requires them to take the time to log everything into an issue tracking database, with enough detail that a random passerby will be able to understand the issue.
  • It requires a heavy commitment to code review and rapid commit cycles.


The benefit to this methodology, though, is that all of that effort you put into coaching and training will pay off. The people you mentor in this fashion will not only gain technical experience that can be applied to your project, they're likely to be tremendously loyal to your project, thanks to all of the effort you put into teaching them. (Humans are social animals: if you welcome people as part of the pack or the tribe, and demonstrate to them that they are valued, they will continue to participate with you.)

This is not to say that every single one of your developers needs to take on the task of mentoring newcomers. (In fact, it's often a good idea to keep certain people far, far away from those who are just starting out.) A good percentage of your team should commit to mentoring, though, and you should provide fora for newcomers to help each other, and experienced developers to help newcomers, in as many ways as possible.

Designate at least one, and possibly more, people on your team (who have good social skills) to be the "newcomer liaison". This should be a person who believes that there's no such thing as a stupid question and is willing to explain the same basic steps over and over again, in whatever fashion the newcomer learns best. (Some people learn just fine from a link or a book reference or a basic pointer about what to research; some people need painstaking hand-holding.) "JFGI" should never, ever be the answer to a question about how to do something. At most, the response should be "Here, let me link you to this page where someone else explained this, and if you still have questions after reading that, I can answer them for you."

Remember that not everyone you train in this fashion will stick around after doing one or two small things. Maybe you're not a good fit for them; maybe their life gets insanely complicated; maybe they've run out of things they feel like hacking on. Don't expect to have a 100% retention rate. If your retention rate stays at 50% or better, though, you're still ahead of the game.

*

On the surface, this whole methodology looks like a tremendous amount of effort for very little payoff. After all, if you have to keep holding newcomers' hands, when will you have time to do any hacking of your own? And the stuff that the newcomers are working on, well, you could do that in five minutes, and it takes someone who's new to programming five days.

But as you deploy this method, your newcomers gain experience, and start coaching other people. Because they were welcomed into your project with these methods in place, it will seem natural and normal to them to do the same for others, and thus your pool of available mentors will grow over time. By fostering a communal, convivial atmosphere, where everyone helps everyone else, you make your project a pleasant place for contributors to be, where people want to spend time helping to achieve your goals. And sure, you could have done those little tasks in five minutes, but when was the last time you really spent any amount of time doing those little, five-minute tasks, instead of viewing them as annoyances? Using your little tasks to train your newcomers relieves you of the burden of having to do those little tasks yourself.

In Skud's keynote at OSCON, she said something that really stuck with me: You can teach programming. You can't teach passion. And that, really, is the core of the Dreamwidth development methodology: we take people who are passionate about the project, people who want to contribute, and teach them the skills. Because they're so passionate about the project, they put incredible effort into learning. Because we teach them the skills they want to learn, they turn around and pay it forward by teaching others.

We average 30-50 commits a week. We've had contributions from some 40 unique contributors since the project began. Of those, I'd estimate that anywhere between 50% and 65% have either never programmed in Perl before, never contributed to an Open Source project before, or never programmed before, period. At the beginning, [staff profile] mark was our only committer; he spent several months coaching and mentoring others, using these methods, and now we have five committers and about three or four more people doing regular code reviews.

I'm really happy with how well our way is working for us. I encourage any project that's looking for more people to try these methods, too, because experienced programmers are made, not born. By turning away people who have the passion, but don't have the experience, projects are narrowing their potential range of contributors down to a very small talent pool -- the group of people who already have the narrow skillset the project is looking to recruit. Those skillsets might be relatively common in the open-source world, but there are thousands of projects competing for that talent pool.

A project looking to recruit new participants could do far worse than to train their own.

[personal profile] jeshyr 2009-08-08 01:01 am (UTC)
Drop a comment to any dev

If you need a dev's help, post on [site community profile] dw_dev_training and somebody will come help.

I'm with you on getting a dreamhack seeming like a big huge step! I haven't actually used mine much, but I keep being told that leaving it there unused doesn't use up any resources, so that makes me feel a lot better!


[personal profile] ninetydegrees 2009-08-08 01:05 am (UTC)
Exactly. I think what you said is a better way to phrase what cesy aims to convey.

Really?! That would be a very good thing to mention then (even if you then have to remind people that you can't request them for fun :))