When I wrote my last article on AI, I was concerned that people were experiencing burnout from AI news and debates, but the view counts said otherwise. So, here I am again. If you're jaded with the AI topic, save yourself some time and click away; if you're not, let me flag something I haven't seen discussed anywhere else.
I have been using AI in various capacities for several years, but the past 18 months have given birth to a completely new breed of AI, more powerful than anything we'd seen before by light-years. For the past 9 months or so, I have been experimenting with Midjourney, one of the premier image-generating AI tools, similar to OpenAI's DALL-E (which I used before Midjourney) and Stable Diffusion (which I've barely used, but which is held in high regard). For the uninitiated, let's do a quick summary.
What Is Midjourney?
Midjourney is a text-to-image AI model (not a large language model like ChatGPT, though the two are often lumped together) that creates images from written descriptions. By describing the image you want to see, you can have Midjourney generate results drawing on the enormous dataset of images it was trained on. These text descriptions are called "prompts," and they can be as simple or as complicated as you choose. While early versions of Midjourney were impressive, the most recent model version, 5.2, allows the creation of images indistinguishable from photographs.
The Edge Photographers and Videographers Have
The first thing to note is that anybody can get a photo-realistic image out of Midjourney, even with the most basic prompts. You might be surprised just how strong the results can be from prompts of only a few words. Many people take this to mean that anyone can create anything, but that's not necessarily true. What makes prompt-driven AI difficult is controlling the output. Yes, anybody could create a photo-realistic image of an elephant, but having full control over the setting, the colors, the depth of field, the angle, the light, and so on requires some know-how. Although it pertains not to Midjourney but rather to LLMs such as ChatGPT and Bard, there is a reason why "Prompt Engineer" is the most in-demand new job, with over 7,000 roles listed from June 2022 to June 2023, according to Workyard.
Now, having used many different AI tools, I feel confident in saying that the skill ceiling for Midjourney is significantly lower than that of the likes of ChatGPT. Nevertheless, most people do not use text-to-image AI particularly well, typing basic prompts and hoping to get lucky. You can improve on this with various parameters, but where photographers and videographers have the advantage is in bringing our expertise into the prompt itself.
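To make that concrete, here is a minimal sketch of a lazy prompt next to a controlled one. The `/imagine` command and the `--ar` (aspect ratio) and `--v` (model version) parameters are real Midjourney syntax; the prompt wording itself is my own illustration, not one behind any image in this article.

```
/imagine prompt: elephant --v 5.2

/imagine prompt: editorial wildlife photograph of an African elephant crossing a dusty riverbed at golden hour, low-angle, backlit dust, telephoto compression, shallow depth of field --ar 3:2 --v 5.2
```

Both will return convincing elephants; only the second gives you any real say over the setting, angle, and light.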
Cameras and Lenses
Firstly, including cameras in your prompts has been shown to affect quality. It isn't known how many images Midjourney was trained on, but the general consensus is that the number is comfortably in the billions. When you include a camera in your prompt, the model will likely draw on images taken with that camera (among many other images). In fact, some people found that simply adding "H6D" to the end of a prompt could yield higher-quality results; I suspect many doing this don't even know that it refers to the $33,000 Hasselblad H6D medium format DSLR.
In my experience, which modern camera you choose doesn't affect the final result all that much in terms of quality, though the sensor size of the camera does often affect depth of field. For example, the images below were generated from identical prompts, with the results varied only by camera: one specified the Hasselblad H6D and one the Fujifilm X100V. That is, one is a medium format sensor and one is an APS-C sensor.
What's important here is not that the lighting changed, or some elements, or even the model; those variations are par for the course when you regenerate. What's interesting is the depth of field. The background of the X100V image is far closer to being in focus than that of the medium format image. This is accurate, and as photographers, we understand why it happened. So, using a combination of aperture and camera, we can dictate the depth of field.
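For reference, such a pair, with everything held constant except the camera, looks like this. These are illustrative reconstructions rather than the exact prompts behind the images above:

```
/imagine prompt: street portrait of a woman at dusk, natural light, shot on a Hasselblad H6D --ar 4:5 --v 5.2

/imagine prompt: street portrait of a woman at dusk, natural light, shot on a Fujifilm X100V --ar 4:5 --v 5.2
```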
Settings
As I mentioned above, the aperture can be used to affect the depth of field of an image, just as it does in real life. If you want a narrow depth of field in your image, you want Midjourney trawling fast apertures. Although it is far from an exact science, primarily because Midjourney has no way of gauging the distance of the subject from the camera in the reference images, the results will at least be in the right direction. Below are two prompts for a headshot on the street: in one, I included f/1.4, and in the other, f/11.
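The structure of such a pair is simply the same prompt with the aperture swapped. This is an illustrative sketch, not the exact prompts used here:

```
/imagine prompt: headshot of a man on a busy street, f/1.4 --ar 4:5 --v 5.2

/imagine prompt: headshot of a man on a busy street, f/11 --ar 4:5 --v 5.2
```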
You can see from the people on the left of the frame how much the aperture impacts the image, and you can find more extreme examples than this. Remember, though, your words affect the depth of field too.
Terminology
So, words, rather expectedly, play a massive role and will often overpower the settings you use in your prompt if the two are at odds with one another (for example, a "cinematic headshot" at "f/18" isn't going to give you a headshot with everything in focus). If you type "snapshot of a man on the street," your depth of field will likely be wildly different to that of "editorial headshot of a man on the street." Below are the results for exactly those two prompts.
What's more, you don't have to use photography terms logically for them to work well. One example would be "macro photography" added to any prompt that has nothing to do with macro photography. Those two words will often cause your results to have a narrow depth of field and a generally cinematic look. The below examples show just how much the term "macro photography" can improve the results.
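To illustrate the trick, you simply prepend the term to a scene that has nothing macro about it. This is an illustrative pair, not the exact prompts behind the examples:

```
/imagine prompt: man reading a newspaper in a diner, morning light --v 5.2

/imagine prompt: macro photography, man reading a newspaper in a diner, morning light --v 5.2
```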
Lighting
As every photographer and videographer knows, light is the be-all and end-all of our craft. So, put that to work in Midjourney too. The average person doesn't know lighting styles by name, but if you do, you can use them to control the lighting in Midjourney. As with every tip, remember that Midjourney isn't a simulator and you'll sometimes miss your target, but with some experimentation, you can control the output and look of generated images.
It wildly overcooked the eyes, but you can see how impactful it can be when you dictate the lighting.
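Named lighting styles slot straight into the prompt like any other term. The following are illustrative examples of the approach, not the prompts behind the image above:

```
/imagine prompt: portrait of a boxer in a gym, Rembrandt lighting, black and white --ar 4:5 --v 5.2

/imagine prompt: portrait of a boxer in a gym, rim lighting against a dark background --ar 4:5 --v 5.2
```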
Miscellaneous Tips
Remember, there is a lot of what will seem like randomness in the results, but really, we just don't know all the interactions or Midjourney's source material. Here are some photography-centric tips, with a combined example after the list:
- Midjourney can replicate film stocks quite well, so use them for a certain aesthetic
- "Tilt-shift" sometimes works, but it often chooses a high point of view
- "Color grading" tends to shoot for complementary and bold colors
- "HDR" does exactly what HDR does most of the time
- "Cinematic" often results in darker, low-key images
- "8K" — lots of people add this to the end of prompts, but it causes the results to look fake and CGI in my experience
- Obscure photography types such as "pinhole" or "infrared" often work well
- Unusual lenses can work too if they're well known enough, such as Lensbaby
- "Low-key" and "high-key" do exactly what you'd hope
- You can dictate the angle of the shot with "low-angle" or even "drone"
- Not including something, such as lighting, doesn't mean "no lighting", it means "Midjourney, pick the lighting based on what I've said in this prompt"
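To tie several of these together, a single prompt can stack a film stock, a lighting key, and an angle. This is an illustrative example rather than a tested recipe; the interactions vary, so expect to iterate:

```
/imagine prompt: low-key portrait of a jazz musician on stage, low-angle, Kodak Portra 400, shallow depth of field --ar 2:3 --v 5.2
```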
The Ethical Elephant
I put some thought into whether I would add this section, but at this point, it's a widely known functionality of Midjourney and other AI image generators, so I will address it. However, I have decided not to include any example images.
The "in the style of" component to prompts is arguably the most powerful influence on the look of the final image. This can be used ethically and to great effect, as I have shown above, with the likes of "in the style of a National Geographic photograph" or "in the style of a Vogue cover", but by getting more specific, you tread on ethically difficult ground. For example, you could add to your prompt, "in the style of Annie Leibowitz," and it will get you closer to her aesthetic. If you combine this with other details — which I am not going to provide — you can get to an image that I'm confident I could fool people into thinking is hers. These sorts of prompts make me uncomfortable, whether you're referencing a photographer, an artist, or a DoP. This is also one thread to an entire rope of copyright issues surrounding AI image generation.
Final Thoughts
AI is a mixed bag for photographers; it's powerful, valuable, and revolutionary, but it's also scary, damaging, and legally uncharted. I resolved to practice using AI of all forms as part of my skill set, and while that is helping me in many ways, it's also making me aware of where photographers are vulnerable. That vulnerability is spoken about regularly, so I thought I'd balance the scales a little with some of the advantages we 'togs have with text-to-image generators such as Midjourney.