How to use GPT-4 Turbo with Vision to build apps faster

OpenAI announced the general availability of GPT-4 Turbo with Vision through their API. You'll see it baked into every product near you soon. We've had it in Create since the preview release in November 2023.

If you want to make something from an image with GPT-4 Turbo with Vision, Create is the fastest way. You can try it here.

I thought it might be a good time to share what we've learned since November about using Vision to build sites and applications in Create.

I hope these insights also give you ideas for apps you'd want to make that utilize this new Lego piece. We've added GPT-4 Turbo with Vision as an integration you can add to your own apps made with Create, so making your own Vision-powered apps is easier than ever.

What is GPT-4 Turbo with Vision?

You've heard of ChatGPT, right? GPT-4 Turbo is the AI model that powers it. You know it can take in text and output text. The Vision version can take in images as well, which is a powerful new capability: it can look at an image and understand what's in it. It was OpenAI's first "multimodal" model, which is a fancy way of saying "it doesn't just take text, but also other formats like images."

Since OpenAI first released GPT-4 Turbo with Vision, a whole host of other models has been introduced, all vying to be the metaphorical king of the hill as the all-around best model. Some of them are even multimodal. Some of them claim they're better.

At least today, GPT-4 Turbo is still the OG workhorse "just get it done" model. In Create, it's our base free tier model (and you can upgrade to Pro to build with other models at the same time). Since the preview release, OpenAI has been iterating on it to make it better and now it's generally available for everyone to use.

So in a nutshell: GPT-4 Turbo with Vision lets you go from an image to text.
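Under the hood, "image to text" is a single API call. Here's a minimal sketch using the OpenAI Python SDK; the prompt and the data-URL helper are illustrative, and you'll need your own OPENAI_API_KEY set in the environment:

```python
import base64


def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Base64-encode raw image bytes as a data URL the Vision API accepts."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def describe_image(image_bytes: bytes, prompt: str = "Describe this image.") -> str:
    """Send one image plus a text prompt to GPT-4 Turbo with Vision.

    Requires `pip install openai` and OPENAI_API_KEY in the environment.
    """
    from openai import OpenAI  # imported here so the helper above works without the SDK

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # vision is built into the GA gpt-4-turbo model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_to_data_url(image_bytes)}},
            ],
        }],
    )
    return response.choices[0].message.content
```

You can also pass a plain `https://` URL instead of a data URL if the image is already hosted somewhere.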

The obvious questions are:

  1. If it can take in images, couldn't I give it screenshots or images of apps and designs?
  2. And if I do that, can I ask GPT-4 Turbo with Vision to describe them, or even better, ask it to output code?
  3. And if I can do that, can I do that in my favorite new creative tool called Create that lets me turn text and images into apps?

Yes. The answer is yes.

8 killer ways to use GPT-4 Turbo with Vision to build apps

Since its release, the team at Create and our users have discovered 8 killer ways to use vision models when building.

Recreate existing websites like Hacker News

Having a vision model handy is like having a giant copy and paste button for the web. You can now recreate entire pages, like this demo taking a screenshot of Hacker News and turning it into a working version of the page in Create.


Screenshot of Hacker News to app for Hacker News

From here, you can make modifications or add more details in text about what you want to change. It's a strong starting point.

Vision models still have trouble when there are too many details in the image. So a simple site like Hacker News with just links works well, but you might struggle with larger pages with more components and details. As the models improve, this workflow should improve.

Still, it's a useful starting point. After you get a first version in, you can add more text in Create to get it just right. The more details you give it, the better it can do.

Grab components from around the web

Instead of recreating entire pages, you can also use GPT-4 Turbo with Vision to grab specific components from websites.

See a cool navigation bar, footer, or card design? Take a screenshot, feed it into Create, and watch as the AI generates the code for that component. It's like having a personal web designer at your fingertips. Since components are typically smaller than entire pages, Vision models can usually get them right in one screenshot.

Here's an example grabbing a card from Airbnb and turning it into my own card component. Notice how you can add more details in text beyond the initial image to make it more custom:


Airbnb has a cool card, so now I have a cool card

I've been using Create as an "inspo" Pinterest board for building. I grab screenshots of things I like, paste them into Create, and then later refine them into component libraries I can use in my own project.

In fact, this is the primary workflow for some of our most active users on Create.

Turn a napkin sketch into an app

Have you ever had a brilliant app idea but struggled to bring it to life? With GPT-4 Turbo with Vision, you can turn your napkin sketches into functional prototypes.

Simply take a picture of your sketch, upload it to Create, and let the AI work its magic. It'll generate the necessary code and components, allowing you to refine and iterate on your idea faster than ever before.

I'll admit this one is more of a "party trick" but it's still super cool.

Don't have a napkin? No worries! You can use Excalidraw or Tldraw to make a sketch fast and paste the image of it into Create.

This workflow helps you brainstorm much faster. You don't need to over-invest in high-fidelity Figma designs to have something interactive that you can get feedback on and feel.

Last month we profiled Shaunak Turaga and his team at Docsum (YC S23) and how they use Create to speed up how fast they can move. It's a game-changer for rapid prototyping and ideation.

Grab lots of text at once

Whenever you're building something real, you know that one of the most time-consuming parts is taking the text you have in your design, document, creative brief, or another website and bringing it into the right place in your app. It's among the most tedious things you do when building software.

GPT-4 Turbo with Vision has a chance to eliminate the tedium once and for all. It's scary good at extracting text from images and turning it into usable content for your apps.

Want to iterate on an existing page, like your pricing page? Just grab a screenshot, and GPT-4 Turbo with Vision will pull out all the existing options for you. From there, you can easily change them to your liking.


Copying and pasting all of this would have taken forever...

Need more realistic lorem ipsum for your designs? Feed it an image of the kind of text you want, and voila! Instant, context-aware placeholder content that feels way more authentic than your typical "lorem ipsum dolor sit amet."
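If you're curious what this extraction looks like at the API level, one approach (not necessarily how Create does it) is to pair a vision prompt with OpenAI's JSON mode, so the text comes back structured instead of as free prose. The prompt and field names below are illustrative assumptions:

```python
def extraction_messages(image_url: str) -> list[dict]:
    """Build the chat payload for extracting on-page text as JSON."""
    prompt = (
        "Extract every pricing tier in this screenshot as JSON with keys "
        "'name', 'price', and 'features'."  # the word JSON must appear for JSON mode
    )
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]


# With the OpenAI SDK you would then call (requires OPENAI_API_KEY):
# client.chat.completions.create(
#     model="gpt-4-turbo",
#     response_format={"type": "json_object"},  # JSON mode keeps the output parseable
#     messages=extraction_messages("https://example.com/pricing.png"),
# )
```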

Turn any existing offline form into your own online tool

If there's a paper process, you can probably turn it into an online process very fast using Create and Vision. Just snap a pic of the form, feed it into Create, and let the AI generate the necessary fields and logic. A few tweaks later, and you've got yourself a shiny new web app that streamlines the process for everyone involved.

A few months ago, I was riding in an Uber when the driver told me he always wanted to make an app to help his riders understand their vehicle smog reports. He had been looking at them for years, so he knew what they meant, but most people found them unintelligible. I took a picture of the form and asked GPT-4 Turbo with Vision to make it. He could then customize it with a few more lines of text!
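Behind the scenes, a workflow like this boils down to two steps: ask the vision model to describe the form's fields as JSON, then render those fields as inputs. Here's a hypothetical sketch of the rendering step; the field-spec shape is an assumption for illustration, not a fixed Create schema:

```python
import html


def fields_to_html(fields: list[dict]) -> str:
    """Render a list of {'label': ..., 'type': ...} field specs
    (as a vision model might extract from a paper form) into HTML inputs."""
    rows = []
    for field in fields:
        name = field["label"].lower().replace(" ", "_")  # derive an input name
        label = html.escape(field["label"])
        input_type = field.get("type", "text")  # default to a plain text input
        rows.append(f'<label>{label} <input name="{name}" type="{input_type}"></label>')
    return "\n".join(rows)
```

Feed that a list like `[{"label": "Owner Name"}, {"label": "Smog Score", "type": "number"}]` and you get a working form skeleton you can style and wire up.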

So what manual forms are you going to turn into your own personal tool?

Turn static design files into interactive prototypes

Just feed your design files into Create, and watch as the AI generates functional code based on your designs. Since it's actual code under the hood, your engineering team can then take that as a starting point and run with it.

Need designs? Some of our users even go from generative AI design tools like Galileo AI straight to a Create project.

No more static designs gathering dust – now they can come to life with just a quick snapshot. You can feel what they'll be like instead of just imagining it.

And once you get it just right in Create, you're ready to go live to your team in one tap.

Make still images move

This one's also just for fun (or maybe your next great party trick), but they say a picture is worth a thousand words. Sometimes you see something that's not moving, and you just know it would be cool if it were animated.

For example, one of our users took a picture of the Star Wars opening crawl and made it animated in Create using GPT-4 Turbo with Vision. He even turned it into a Japanese crawl! But you could take a picture of pretty much anything still and say "animate it" or "translate it", and it would probably work.


Talk about a moving picture

This is possible because Create is generating code under the hood based on your image and instructions. So not only can you bring static images to life, but you're also getting a foundation you can build on. Pretty wild, right?

Create your own tools that use images as an input

We've gone over a bunch of ways to use images to make apps. But what if your apps could also use images?

Now that we've added Vision as an integration in Create, you can make your own apps that take in images and do cool stuff with them. Just type / in your project spec and find GPT Vision in the menu.

For example, here's an AI resume scanner I made on Create that just takes in a resume, scans it as an image, and uses AI models to give feedback. If you don't like my version, you can easily make your own by remixing the app and tweaking it to your liking.

The possibilities are endless – from image-based search engines to custom tools that parse unstructured data, the world is your oyster when you combine Create with the power of GPT-4 Turbo with Vision.

Wrapping up

GPT-4 Turbo with Vision is a game-changer for app building, and we're just scratching the surface of what's possible. By leveraging the power of this multimodal AI model in Create, you can turn sketches into prototypes, design files into functional code, and offline processes into streamlined web apps – all faster than ever before.

But the real magic happens when you start exploring your own ideas. With GPT-4 Turbo with Vision as an integration in Create, the only limit is your imagination. So go forth, experiment, and build something awesome – we can't wait to see what you come up with!