I Spent 24 Hours with GitHub Copilot Workspaces (every.to)
extr 4 days ago [-]
I've noticed the same issue with AI coding, where you start to write requirements and then realize that you yourself don't have a perfect idea of what exactly this feature should be, or how it should work. It's easy to say the answer should be to simply think harder, or enter a dialogue with the AI about missing details, but if you try that you'll find yourself supplying an enormous amount of context you didn't expect to have to communicate. Context not even directly related to the code at hand, but about the broader business or industry, past lessons learned, something the CEO said to you last week about the feature, etc.

It's this kind of thing that makes me think tackling big feature requests is still an AGI-complete problem. Perhaps if it gets good enough at pure coding you can iterate your way to success.

ossobuco 4 days ago [-]
> but if you try that you'll find yourself supplying an enormous amount of context you didn't expect to have to communicate. Context not even directly related to the code at hand, but about the broader business or industry, past lessons learned, something the CEO said to you last week about the feature, etc.

Basically you go from programmer to product manager, except you also get to micromanage a non-sentient programmer.

lordswork 4 days ago [-]
What prevents an AI agent from becoming the product manager as well, and communicating with you (the customer) to clarify requirements?
datadrivenangel 4 days ago [-]
A failure mode of product managers is to just pass customer requests to the developers.

I don't see an AI agent doing a good job of avoiding that.

awill88 3 days ago [-]
The tenor of the conversation, I imagine, since it’s a chatbot.
krainboltgreene 4 days ago [-]
Slavery laws, presumably.
duxup 4 days ago [-]
I don't know if we're talking about exactly the same thing but this is my side story:

Even with small requests to AI, I find myself accidentally including some words or phrases that seem to signal to the AI "Oh, he wants this as a function that does all the things very manually".

So I get some fairly capable, but very verbose and often inflexible code.

Yet, that's not what I was asking for, but something in the context set the AI off in that direction. In reality I'm not sure what I want and I'm open to anything.

Often I suddenly realize "Wait, there's gotta be some built-in thing in this language that does this, or part of this..." and often there is, one that's far more reliable and a better way to do it. Somehow the AI skipped it and gave me a different answer.
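
(As a concrete illustration of that pattern, in Python and not from the comment itself: the manual version an over-specified prompt tends to produce, next to the built-in that already covers it.)

    from collections import Counter

    # The verbose, manual version an over-specified prompt often yields:
    def count_words_manually(text):
        counts = {}
        for word in text.split():
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
        return counts

    # The built-in the language already ships with for the same job:
    def count_words(text):
        return Counter(text.split())

    print(count_words_manually("a b a"))  # {'a': 2, 'b': 1}
    print(count_words("a b a"))           # Counter({'a': 2, 'b': 1})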

It strikes me as similar to customers who come to me with "I want an email that's sent on Tuesdays that are single-digit calendar dates and this field contains the letter Q in them and ..." But when I ask them what they're trying to accomplish, I find all that specificity isn't needed: they really mean that they order all their grapes on Tuesdays at the beginning of the month, and they just want a list of their grape orders every few weeks.
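
(To make the contrast concrete, a hypothetical Python sketch with invented field names: the rule as literally requested vs. what the customer actually meant.)

    from datetime import date

    # The rule as literally requested: Tuesdays, single-digit calendar
    # dates, some field containing the letter "Q"...
    def matches_literal_request(order_date: date, field: str) -> bool:
        return order_date.weekday() == 1 and order_date.day <= 9 and "Q" in field

    # What they actually meant: a periodic list of their grape orders.
    def grape_orders(orders: list[dict]) -> list[dict]:
        return [o for o in orders if o["product"] == "grapes"]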

extr 4 days ago [-]
Yeah, this is a similar phenomenon. AI is not so good at recognizing that you're looking for the "general" solution to the problem, one that will fit holistically with the rest of the codebase/objective, and that what has been provided as an example is really just a special case.

I think part of the problem is that instruction fine tuning is not done on full codebases, just shorter problems that fit into reasonable (8K, 32K) context windows. By nature these problems are more specific, so they are biased in that direction from the start.

steve1977 4 days ago [-]
And that is why the I in AI is still misleading.
sdesol 4 days ago [-]
> then realize that you yourself don't have a perfect idea of what exactly this feature should be

I talked about this the last time Copilot Workspaces reached the front page, two days ago: I don't think the value is in the code generation, but rather in the ability to capture our thought process. CW is currently a bottleneck, in my opinion, and I think the code generation will have to get pretty good before we can see the value in writing everything down vs. just coding as we always have.

HanClinto 4 days ago [-]
Agreed.

The most compelling part of the demo showcased in this post is the way the tool built the bulleted list of success criteria. That's so often a tedious and overlooked part of writing user stories, but its importance can't be overstated. The fact that CW bakes that step into the workflow feels like the most valuable piece of the puzzle here.

throwup238 4 days ago [-]
I only got to use the Workspaces feature for an hour before they fixed the waitlist access check but IMO the real value is that it provides a familiar PR-style interface for the whole process that enables fast iterations.

TFA didn't show a screenshot of it, but the per-file plans and the diffs are side by side on a single screen, so you can update the per-file plan (adding and removing files as needed) and then "re-roll" the code changes as you go. With the Codespaces feature you can even launch the project and get access to a terminal to run things, and presumably feed the output back into the plan.

It makes it really easy to spot deficiencies in the code, add comments in the plan, and instantly regenerate the code (well, not instantly, there was a queue when I used it). It was a lot smoother than my experience with Copilot Chat, Aider, and Plandex.

layer8 4 days ago [-]
The same is true on the code level, which can be viewed as a more detailed specification.

Part of the fun of software development is exploring the solution space by implementing, and gaining a deeper understanding in the process, as well as coming up with the corresponding design decisions.

It seems that with current AI, in order to steer it and evaluate its output, you would have to build that deeper understanding up front without doing the work, which seems difficult.

mamcx 4 days ago [-]
The moment you have the requirements clear, you have already solved the program.

Programming is the task of finding the real requirements!

jprete 4 days ago [-]
To me this looks similar to rubber-ducking or technical writing. All three involve mentally modeling the perspective of someone who may not share your knowledge or assumptions.
Swizec 4 days ago [-]
> It's this kind of thing that makes me think tackling big feature requests is still an AGI-complete problem. Perhaps if it gets good enough at pure coding you can iterate your way to success.

I think you’ve just invented product managers. This used to be part of a software engineer’s job, back when inputting code into a computer was so labor-intensive that you’d write your program and then hand it off to another human to translate into machine code.

Then we invented compilers, and now programming can take up a whole person’s day, so programmers stopped having time to do product management. That became its own full-time job: supplying 4+ programmers with enough work to stay busy.

If we can replace those 4 programmers with AI, software engineers will once more turn back into product managers.

The best product managers I’ve worked with have some combination of a comp sci and business background. The CS background helps a lot.

And some of the best software engineers I’ve worked with are basically their product manager’s right hand. Partnering smoothly in developing requirements, communicating technical feasibility, and deeply understanding their customers. They could be product managers but choose not to.

willsmith72 4 days ago [-]
and that's because software engineering is <50% "writing code"

TDD is a great way to show exactly how much you understand what you're about to build: it makes you make all the decisions about edge cases and various conditions ahead of time, before even getting to the code.
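
(A minimal pytest sketch of that idea, with a hypothetical order_total function: the tests pin down the edge-case decisions before any implementation exists.)

    import pytest

    # Writing these first forces the decisions up front:
    # what happens on an empty order? On a negative quantity?
    def test_total_of_empty_order_is_zero():
        assert order_total([]) == 0

    def test_negative_quantity_is_rejected():
        with pytest.raises(ValueError):
            order_total([{"price": 5, "qty": -1}])

    def test_total_sums_price_times_quantity():
        assert order_total([{"price": 5, "qty": 2}]) == 10

    # Only now write the implementation the tests demand.
    def order_total(items):
        total = 0
        for item in items:
            if item["qty"] < 0:
                raise ValueError("quantity must be non-negative")
            total += item["price"] * item["qty"]
        return total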

chasd00 4 days ago [-]
It sounds sort of like pseudocode: you analyze the requirements and needs to the point that all that's left is typing it out in whatever programming language you're using. I can see an LLM being pretty good at that, but then that's just a higher-level version of a compiler going from a programming language to what the machine understands. You start with very well-structured human language, the LLM turns that into something the compiler understands, and that is turned into something the machine understands.

It sounds like using an LLM to write code requires so much careful preparation and wording ahead of time that it's basically like writing in a very high-level programming language itself.

extr 4 days ago [-]
Yeah, this is my experience as well. Once I've fully fleshed out the requirements to the point that there is zero ambiguity in what I want, I've basically written a pseudocode implementation already and the AI is just saving me some typing.
sottol 4 days ago [-]
The main thing that makes me skeptical is still what happens to a codebase when you do this longer-term. And not just to the codebase, but to the company, when nobody understands the code any longer. But maybe neither is a problem.

A couple questions:

* Will the codebase turn into a mess over time by having the AI apply changes over changes over changes? Do we even care? Or do we want a human to still be able to follow what is going on?

* Will you just be able to ask the AI to refactor it all and clean it up? Then it wouldn't be a problem, I presume.

* Are product-based tech companies/startups still defensible if anyone can basically recreate the product with some English?

* I don't know Copilot Workspaces - are the prompts that generate and change the code kept somewhere? IMO they're part of the codebase now.

bunderbunder 4 days ago [-]
My sense, after a year of working at a company with an enterprise Copilot subscription:

If your idea of high-quality code is "follows all the standard clean coding practices, uses design patterns, doesn't do anything Sonarqube would complain about, etc.", then it does a great job.

In terms of more abstract, design-level aspects of code quality, though, I have been less impressed. So, things like limiting statefulness and avoiding unnecessary temporal coupling, good high-level abstractions that obey regular and predictable - ideally algebraic - rules, preservation of well-defined bounded contexts, things like that. Left unchecked, Copilot will happily help you turn a large monolithic codebase into architectural spaghetti.

But then, most humans will do that, too.
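
(A small, hypothetical Python contrast of the temporal-coupling point: the first version only works if its methods are called in a hidden order; the second makes the state an explicit parameter.)

    # Temporally coupled: query() silently depends on connect() having
    # been called first -- exactly the kind of hidden ordering that
    # accretes when code is generated change-over-change.
    class Report:
        def connect(self):
            self.conn = "db-connection"  # stand-in for a real connection

        def query(self):
            return f"results via {self.conn}"  # AttributeError if connect() was skipped

    # Explicit state: the dependency is visible in the signature, so
    # there is no hidden calling order to get wrong.
    def run_query(conn: str) -> str:
        return f"results via {conn}"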

theshrike79 4 days ago [-]
Is it verbose? Yes

Will it work? Also yes.

This is usually enough for most cases. Despite HN skewing to the fancier side of programming, the vast majority of day to day programming is just slapping together API glue.

For those cases LLMs like Copilot are excellent. It's a lot faster to ask Copilot about some specific C# thing than start searching through Microsoft's documentation for it. In most cases it can just insert whatever you want at the cursor.

Like just today I pasted a SQL CREATE statement into Copilot and asked it to create a FooModel class from it. Took me 3 seconds of typing, about 5-10 seconds of waiting and clicking "insert at cursor", and I had a 15-property model class.

Repeat a few more times and I've cut down stupid tedious writing by at least 30 minutes and I can go do the more fun bits of attaching some actual logic to those models.
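
(The comment describes C#; a rough Python analogue of the same time-saver, with invented table and field names, looks like this.)

    from dataclasses import dataclass
    from datetime import datetime

    # Given a pasted statement like:
    #   CREATE TABLE foo (id INT, name VARCHAR(100), created_at TIMESTAMP, ...);
    # the LLM emits the matching boilerplate model in seconds:
    @dataclass
    class FooModel:
        id: int
        name: str
        created_at: datetime
        # ...one field per column; fifteen of them in the case above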

danpalmer 4 days ago [-]
Limiting state is the big problem here in my opinion. I’ve also noticed the tendency of AI tools to just add more variables to make things work, which is fine for write-only code, but makes it harder for humans and AI tools to maintain it in the future.

However I think this is also the hard bit for humans to do. It’s one of the most frequent stumbling blocks I see for more junior engineers, and one of the things I notice most when working with code from people who are really good programmers.

simonw 4 days ago [-]
My experience so far with LLM generated code is that it tends to be pretty easy to maintain in the future, because it uses obvious code patterns and includes genuinely relevant comments.

The trick is to know how to program already, and avoid checking in LLM-generated code unless you completely understand every line.

If you don't do that you'll run into the same problems as you would if you hire a contractor to build your codebase without understanding what they did for you.

HanClinto 4 days ago [-]
> because it uses obvious code patterns and includes genuinely relevant comments.

I often (simplistically) explain LLMs to people by saying that they're essentially running a statistical average over language. Next-token prediction (generally) aims to predict the least surprising next word in a sequence. It aims to "make sense" and be unsurprising.

If you want creative writing and innovative research papers and novel ideas, this isn't going to get you very far.

But if the things you want are "unsurprising" or "predictable" (great attributes of good, maintainable source code), then using this to write code feels like a pretty darn good fit.
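
(A toy sketch of that framing, with made-up probabilities and nothing from a real model: greedy decoding just takes the least surprising next token.)

    # Toy next-token distribution after "def add(a, b): return" --
    # the probabilities are invented for illustration.
    next_token_probs = {
        "a + b": 0.90,        # the unsurprising continuation
        "a - b": 0.06,
        "sum((a, b))": 0.03,
        "a ^ b": 0.01,        # the "creative" option almost never wins
    }

    # Greedy decoding: pick the argmax, i.e. the least surprising token.
    best = max(next_token_probs, key=next_token_probs.get)
    print(best)  # a + b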

easton 4 days ago [-]
> If you don't do that you'll run into the same problems as you would if you hire a contractor to build your codebase without understanding what they did for you.

I guess the difference now is that the contractor is cheap or free (because it’s an LLM), whereas in the old days you’d either hire a person to do the work and not understand it, or pick up a book and figure it out yourself (or go to school, or whatever). Figuring it out yourself was often cheaper, and then you actually understood it.

(Not that humans can be replaced by LLM devs yet, or that LLM generated code is necessarily unreadable. It’s usually fine as you say.)

sottol 4 days ago [-]
I have a feeling that if there was not a healthy amount of competition in the space, the prices would start to trend towards the cost of human work.
antifa 4 days ago [-]
And the self-hosted options are better than nothing; I'm currently getting code autocompletions via Starcoder+TabbyML on an M1 MacBook Pro.
sottol 4 days ago [-]
> If you don't do that you'll run into the same problems as you would if you hire a contractor to build your codebase without understanding what they did for you.

I really like this way of thinking about using LLMs, I think that's a great analogy in many ways.

Swizec 4 days ago [-]
> Are product-based tech companies/startups still defensible if anyone can basically recreate the product with some English?

The code is not the asset. It never has been. Deeply understanding your customer, their problem, and how to solve it is the asset. The code is just the current manifestation of that understanding.

Problem is that for many companies the code is also the only manifestation of that understanding.

sottol 4 days ago [-]
Maybe put another way: if I can get an LLM/AI to build exactly the product that I need, is a company that serves many customers simultaneously, but probably serves each of them worse, still necessary?

I think it'll be hard enough to reason about what you really want that most customers won't care enough to roll their own. And personally, I'd happily pay someone to keep the product maintained. A product is usually not one and done.

usea 4 days ago [-]
My experience at companies is that the vast majority of the code is not understood by anybody working there, nor even attempted to be. It sits in third party libraries that nobody audits.

That's a very bad thing, but this sounds like just more of that. Which most developers seem totally fine with.

layer8 4 days ago [-]
I think there is a natural trend for implementations to drift in complexity toward the edge of what can be understood (and quite often beyond that edge). I would expect the same to happen for AI-authored code with respect to what the AI can understand. Maybe refactoring and reducing tech debt will have to be a more explicit part of development and maintenance in the future?
lordswork 4 days ago [-]
>Will you just be able ask the AI to refactor it all and clean it up? Then it wouldn't be a problem I presume.

For smaller contexts, LLMs tend to be really good at reviewing, suggesting changes, and refactoring. I haven't seen this applied successfully at larger contexts, though.

siliconc0w 4 days ago [-]
This debunking video (https://www.youtube.com/watch?v=tNmgmwEtoWE) of Devin really called its usefulness into question for me. It created a file in the repo and spent a lot of time debugging its own unnecessary code, rather than reading the README to understand that the code it needed already existed and just needed to be run with different inputs.

It's not clear if we're even near a point where these tools can independently and meaningfully contribute to an existing codebase, rather than just these greenfield demos. It feels similar to the self-driving AI hype, where level 5 is still pretty far from realized (Waymo is closest, but AIUI still uses a lot of remote human intervention).

whamlastxmas 4 days ago [-]
Waymo is definitely not the closest
HanClinto 4 days ago [-]
When reading posts such as these, it occurs to me that AI is increasing the rate / lowering the bar for developers to make the jump to leader / architect.

Look at the lessons that the author has learned here:

* More specificity == better

* The importance of clear bulleted delivery items / criteria-for-success

* Unspecified details around a general goal are a ripe area for disappointment

All of these are things that a product owner / team leader learns in their first few projects (and so often must re-learn as the years go by).

AI is lowering barriers and promoting more developers to this role earlier. But everything that we learned about good Agile development in the past will still apply to the future.

ec109685 4 days ago [-]
This isn’t architect-level planning. Even a junior developer should be able to work from vague requirements and build a mental model like that.
morbicer 3 days ago [-]
Exactly. Companies relying on architects to make all the decisions and provide detailed specs are doomed. Architects in general often suck. Empower any rank to create designs and make technological decisions. They will grow. If you can't trust them, you have a problem.
frereit 4 days ago [-]
I'm honestly surprised at the relatively positive reception to this. While there isn't any problem with the code shown, the same effect could've probably been achieved with a few well-thought-out shortcuts in any IDE (delete the outerHTML of the svg tag, add a new tag, add attributes). The only "more complex" output shown is the specification that CW produces, which literally contains an error in its first line ("Sp<logo>ral").

Moving on to the complex task, the author simply hand-waves "this isn't good yet but surely it will be". No evidence is given as to _why_ there should be any expectation of LLMs getting there.

And the perceived benefit of discovering that their idea of the more complex task was not thought out enough did not come from the LLM; it came from the author themselves. They may as well have spoken to ELIZA or a rubber duck.

What am I missing?

tymscar 4 days ago [-]
You're missing the koolaid. I do wonder if people who cut too much slack to this sort of tech are just doing it because they’re scared of going against the grain. Sort of a vicious cycle.
doug_durham 4 days ago [-]
This is a pretty reductive argument. I'm not quite sure what "a few well thought out IDE shortcuts" are. I've never experienced an IDE that allows any kind of sophisticated "shortcut" that will write arbitrary code.
throwaway71271 4 days ago [-]
Copilot is so strange for me, I use it, but it deeply conflicts with the way I code.

As I type the code I get a feeling for whether I like it. I also pretend to use it even when it's unfinished, kind of like playing a game. Even if I spent a lot of time thinking about what I am going to write, until it exists and I play with the code, I don't know if it's good.

Now Copilot writes so much code that, even if it's exactly what I was going to type, I've kind of lost that intuition, and I hate it.

So I just enable it when I do things that I don't consider programming anymore.

I still think it is absolutely amazing tech though, and I know it will get better and better, and at some point it will be hard to not use it, but I really enjoy playing with the code as I write it.

anotherpaulg 4 days ago [-]
My open source tool aider [0] has long offered an "AI pair programming" workflow that is similar but not identical to Copilot Workspaces.

Aider is more of a collaborative chat, where you work with the LLM interactively asking for a sequence of changes to your git repo. The changes can be non-trivial, modifying a group of files in a coordinated way.

Workspaces seems more agentic. You need to do a bunch of up-front work to (fully) specify the requirements. Even with a perfectly formulated request, agents often go down wrong paths and waste a lot of time and token costs doing the wrong thing.

That's also not how I code personally. My process is usually more iterative.

Another big difference compared to Workspaces is that aider is primarily a CLI tool. Although I just released an experimental browser UI [1] yesterday, making it more approachable for folks who are not fully comfortable on the command line.

[0] https://github.com/paul-gauthier/aider

[1] https://aider.chat/2024/05/02/browser.html

throwaway918274 4 days ago [-]
Nobody is running faster towards the cliff of their own destruction than programmers.
isoprophlex 4 days ago [-]
Good. I'm looking forward to my future career as a woodworker.
layer8 4 days ago [-]
How will you make a living though?
panarky 3 days ago [-]
AI automates the jobs of sheep, not (yet) the jobs of shepherds.
ike2792 4 days ago [-]
I might be a curmudgeon, but I think that even teaching CS in Python is too new-fangled and high-level for CS students. Learning the hard way with C/C++ (or, for a more modern flair, Go or Rust) and understanding how to handle pointers and memory allocation makes it a lot easier to debug things when the higher-level languages and frameworks have issues. A class or two on coding with AI would be great at the undergrad level, but not basing an entire curriculum on it.
FrustratedMonky 4 days ago [-]
Agree.

And not joking, I think there should be engineering classes taught with slide rules, to get students to learn the old-school ability to work with orders of magnitude in their heads.

Of course students have to learn new things too. But I do think we are really losing some of the basic skills, and methods of thinking, that you get with the old methods.

Like tracking down some pointer errors, it takes time, it's a difficult struggle, but you do learn a lot about how things work.

Have classes with 'new' tech, then have classes that require 'old' tech. Exams without calculators, or make an Assembly language class mandatory.

tbeseda 4 days ago [-]
My experience was similar[0] and my conclusions line up with the author here. Summed up: thinking about the problem is the hard part. I can think faster than I can code, but I can code faster than I can write out (in a detailed enough way to achieve my goal with Copilot Workspace) the spec.

[0] https://tbeseda.com/blog/previewing-github-copilot-workspace...

vundercind 4 days ago [-]
… am I wrong for thinking the actual play Workspaces is making is in corporate spyware, and the rest is mostly secondary as far as what may get businesses to pay for it?
kbenson 4 days ago [-]
I don't know, but I think you and I have vastly different base assumptions.

It's a huge legal liability to publish statements about how data won't be used and then use it anyway, when you're a company that might compete in similar spaces, and Microsoft competes almost everywhere.

While I trusted GitHub when they were independent, I trust this feature from MS-owned GitHub more than I would have trusted it from them, because the liability that misuse opens them up to is so much greater. If I were building a product and could prove some MS department used my info in an unauthorized way to build a product, I could sue that product out of existence. And someone always talks, so MS can't assume it will never be known, and they know that.

Marsymars 4 days ago [-]
> It's a huge legal liability to publish statements about how data won't be used and then use it anyway, when you're a company that might compete in similar spaces, and Microsoft competes almost everywhere.

Almost everywhere in tech, but almost nowhere outside of tech. I work for a large non-tech conglomerate, and as far as I'm aware, we don't compete with any MS products/services.

jmole 4 days ago [-]
Yeah, but look at this through the lens of enshittification.

Microsoft will sell "Copilot enterprise" to companies that can afford to negotiate. But every individual out there on a normal subscription gets data mined.

OpenAI is similar - you can't negotiate a "no-logs" deal with them unless you are a player the size of say, Epic (the health industry giant).

airstrike 4 days ago [-]
> OpenAI is similar - you can't negotiate a "no-logs" deal with them unless you are a player the size of say, Epic (the health industry giant).

OpenAI's API license states that they won't use your data to train models, if that's any consolation. Unlike ChatGPT.

vundercind 4 days ago [-]
I mean “here’s some telemetry (spying) data on your employees, in a nice little dashboard”
kbenson 4 days ago [-]
Ah, that makes more sense. I was conflating corporate spyware with corporate espionage. I imagine providing additional info to employers is something MS would offer as a value-add to organizations using this, but that's sort of expected with all organization-based tooling in my eyes.

Don't use your personal account for work; don't assume any work-provided service isn't giving data on you to your employer; and if at all possible, try to work for a company that cares about what you deliver and not how you do it (meaning they don't micromanage, not that they want you to skirt laws...). Some of those are obviously easier than others to control.

kulor 4 days ago [-]
There was an impressive demo at AWS Summit London of their CodeWhisperer and Q products taking a similar route to CW: provide a user story and it'd create a PR.

I could see "AI workspace driven development" being the future of, at the very least, cutting through the smaller tickets of work and generally improving developer workflows.

HanClinto 4 days ago [-]
It feels like CW is taking a step further left -- it takes a description of the problem and the codebase and creates a detailed user story (with bulleted points for success criteria and everything).

That feels like the right way to go -- almost baking an "agile done right" workflow into its engine.

Kon5ole 3 days ago [-]
I think AI copilots are great for coding. The IDE and compiler are a second source of truth so you can quickly eliminate AI generated nonsense and figure out what kind of problems it is good at solving.

To me the effect seems similar to going from assembly language to C or from C to Java or Visual Basic. It's a new level of abstraction that saves massive amounts of time.

I think the amount of work for software developers will increase just like it did back then. Many software projects are never started because they will be too expensive. If they can be done by half the number of people in half the time using AI tools, they might get a "go" instead.

bengale 4 days ago [-]
I think the disconnect with these tools is that their endgame is not to be a developer tool, it’s to take them out of the loop.

This is a tool for product owners, it’s just too early for them to use it by itself.

aantix 4 days ago [-]
The author is a software engineer and his last name is "Shipper".

Talk about high expectations!

This guy ships code.

ozten 4 days ago [-]
No mention of the cost for a completed task.

With a similar system, CrewAI, I ran their hello world and it cost $4 against GPT-4.

There is a trade-off between my time plus the cost of the feature, versus just coding it up myself with LLM assistance, which has a fixed cost of $20 per month.
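
(The break-even arithmetic, as a Python sketch using the numbers above; the task count is my assumption.)

    # Rough break-even: flat $20/month LLM assistance vs. ~$4 per
    # agent task (the CrewAI hello-world figure above).
    flat_monthly = 20.00
    per_task = 4.00

    tasks_per_month = 10  # assumed workload
    agent_cost = per_task * tasks_per_month
    print(f"agent: ${agent_cost:.2f} vs. flat: ${flat_monthly:.2f}")
    # With these numbers the agent loses past 5 tasks a month, before
    # counting re-runs when it goes down a wrong path.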

Fin_Code 4 days ago [-]
I'm still not sure how this is different from the VS Code plugin. It seems to function in about the same way, just with a somewhat different context reference. But that scope can easily lead to incorrect code targeting.
HanClinto 4 days ago [-]
I haven't played with CW yet, but based on the screenshots and whatnot, it feels like CW adds another layer of requirements-gathering to its workflow (along with clear bullet points for what the terms of success look like) that regular Copilot doesn't have.
justinclift 3 days ago [-]
> CW took two to three minutes to return.

Hmmm, wonder if there's cheaply sourced labour of the human variety in that loop then?

akiselev 4 days ago [-]
Any way to get access? All the AI product waitlists are killing me and I was stuck for months on the last GHNext waitlist.
andrewstuart 4 days ago [-]
I tried GitHub copilot in vscode. It was immensely frustrating.

The main problem was context. It didn’t seem to know what files to use for our discussion, didn’t listen when I told it, didn’t remember when I told it, and gave me no effective way to bring files in and out of the discussion.

All this led to a deeply frustrating session of interaction and frankly I hated it. Easier to use ChatGPT web ui and copy and paste in and out.

I found GitHub Copilot better in JetBrains IDEs. It seemed mostly to know what I was asking about, though it was a very long way from being good at managing context.

It’s surprising that after the amount of development they’ve put into copilot it still is so bad at what I’d consider to be barest minimum functionality to integrate into an IDE.

intended 4 days ago [-]
ChatGPT will happily tell you how to build ocean liners in landlocked deserts, or how to ice skate up a hill.
marc_ranieri 4 days ago [-]
It's pretty much like having an assistant change hieroglyphs into the alphabet...
ianbutler 4 days ago [-]
To answer the question posed at the end of the article, of whether something like this is the future of programming: I think in a lot of ways, yes. It reduces the iteration time for making a new feature and handles a lot of the project management too. As AI gets smarter, it makes sense to design workflows around how its capabilities can complement ours as developers, and not just force it into existing workflows.

We're working on something similar to workspaces: https://www.bismuthos.com

We provide a workspace to build Python backends: chat on the left, code and visual editors on the right. However, we also handle deployments, data storage (we have a blob store), serving (we built a home-grown function runtime), and logging.

The experience is tightly integrated with our copilot and the idea is to get ideas off the ground as quickly as possible with as little devops hassle. Right now the focus is on building something new, but we're in the process of making it easier for existing projects to integrate with us too.

Feel free to drop by our (very) new Discord too: https://discord.gg/E5Yn3vaM
