Swift Regex Tutorial: Getting Started

Master the pattern-matching superpowers of Swift Regex. Learn to write regular expressions that are easy to understand, work with captures and try out RegexBuilder, all while making a Marvel Movies list app! By Ehab Amer.


Looking Ahead

To get the expression to look ahead, you want the operation called NegativeLookahead. In regular expression syntax, it's denoted as (?!pattern), where pattern is the expression that must not match at the current position.

titleField should look ahead for fieldSeparator before resuming the repetition of its any-character expression.

Change the declaration of titleField to the following:

let titleField = OneOrMore {
  NegativeLookahead { fieldSeparator }
  CharacterClass.any
}

Build and run. Observe the output in the console log:

Found 49 matches
tt10857160	She-Hulk: Attorney at Law	                  |
tt10648342	Thor: Love and Thunder	                    |
tt13623148	I Am Groot	                                |
tt9419884	  Doctor Strange in the Multiverse of Madness	|
tt10872600	Spider-Man: No Way Home	                    |

Excellent. You fixed the expression, and it's back to only picking up the fields you requested.
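If you'd like to see the effect of the lookahead in isolation, here's a minimal playground-style sketch. It assumes Swift 5.7 or later and, purely for illustration, a single tab as the separator. The names sketchSeparator, sketchLine, greedyField and boundedField are hypothetical and not part of the sample project.

```swift
import RegexBuilder

// Assumption for this sketch: a single tab separates the fields.
let sketchSeparator = "\t"
let sketchLine = "tt10857160\tShe-Hulk: Attorney at Law\t(2022– )"

// Without a lookahead, the greedy any-character repetition swallows the separators too.
let greedyField = OneOrMore { CharacterClass.any }

// With the lookahead, every repetition first checks that a separator does NOT start here.
let boundedField = OneOrMore {
  NegativeLookahead { sketchSeparator }
  CharacterClass.any
}

let greedyText = sketchLine.firstMatch(of: greedyField).map { String($0.output) }
let boundedText = sketchLine.firstMatch(of: boundedField).map { String($0.output) }

print(greedyText ?? "no match")   // the whole line, tabs included
print(boundedText ?? "no match")  // only the first field
```

The lookahead itself consumes no characters. It only vetoes the next repetition, so the separator stays available for the component that follows the field.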

Before you add the remaining fields, update the other fields that use an any-character expression so they also include a negative lookahead. Change the declaration of premieredOnField to:

let premieredOnField = OneOrMore {
  NegativeLookahead { fieldSeparator }
  CharacterClass.any
}

Then, change imdbRatingField to:

let imdbRatingField = OneOrMore {
  NegativeLookahead { CharacterClass.newlineSequence }
  CharacterClass.any
}

Since you expect the rating at the end of the line, the negative lookahead searches for a newline character instead of a field separator.
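As a quick check of that idea, the sketch below runs the same kind of expression over two short, made-up rows. The sample string and the names twoRows, lastField and rowTexts are assumptions for illustration only.

```swift
import RegexBuilder

// Two hypothetical rows, each terminated by a newline.
let twoRows = "tt10857160\t5.7\ntt10648342\t6.7\n"

// Any character is accepted, tabs included, as long as a newline does not start here.
let lastField = OneOrMore {
  NegativeLookahead { CharacterClass.newlineSequence }
  CharacterClass.any
}

// Each match stops right before the newline that ends its row.
let rowTexts = twoRows.matches(of: lastField).map { String($0.output) }
print(rowTexts.count)
```

Because the lookahead only watches for newlines, this expression happily runs across tabs. That's fine for the last field of a record, since recordMatcher has already consumed everything before it.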

Update recordMatcher to include the remaining fields:

let recordMatcher = Regex {
  idField
  fieldSeparator
  titleField
  fieldSeparator
  yearField
  fieldSeparator
  premieredOnField
  fieldSeparator
  urlField
  fieldSeparator
  imdbRatingField
}

Build and run. The console will show that it found 49 matches and will print all the rows correctly. Now, you want to hold or capture the relevant parts of the string that the expressions found so you can convert them to the proper objects.

Capturing Matches

Capturing data inside a Regex object is straightforward. Simply wrap the expressions you want to capture in a Capture block.

Change the declaration of recordMatcher to the following:

let recordMatcher = Regex {
  Capture { idField }
  fieldSeparator
  Capture { titleField }
  fieldSeparator
  Capture { yearField }
  fieldSeparator
  Capture { premieredOnField }
  fieldSeparator
  Capture { urlField }
  fieldSeparator
  Capture { imdbRatingField }
  /\n/
}

Then change the loop that goes over the matches to the following:

for match in matches {
  print("Full Row: " + match.output.0)
  print("ID: " + match.output.1)
  print("Title: " + match.output.2)
  print("Year: " + match.output.3)
  print("Premiered On: " + match.output.4)
  print("Image URL: " + match.output.5)
  print("Rating: " + match.output.6)
  print("---------------------------")
}

Build and run. The console log should output each row in full with a breakdown of each value underneath:

Found 49 matches
Full Row: tt10857160	She-Hulk: Attorney at Law......

ID: tt10857160
Title: She-Hulk: Attorney at Law
Year: (2022– )
Premiered On: Aug 18, 2022
Image URL: https://m.media-amazon.com/images/M/MV5BMjU4MTkxNz......jpg
Rating: 5.7
---------------------------
Full Row: tt10648342	Thor: Love and Thunder.....

ID: tt10648342
Title: Thor: Love and Thunder
Year: (2022)
Premiered On: July 6, 2022
Image URL: https://m.media-amazon.com/images/M/MV5BYmMxZWRiMT......jpg
Rating: 6.7
---------------------------

Before you added any captures, the output object contained the whole row. By adding captures, it became a tuple whose first value is the whole row. Each capture adds a value to that tuple. With six captures, your tuple has seven values.
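To make the shape of that tuple concrete, here's a small sketch with only two captures. It assumes a tab separator and uses hypothetical names (captureLine, twoFieldMatcher, sketchOutput) that don't exist in the sample project.

```swift
import RegexBuilder

let captureLine = "tt10648342\tThor: Love and Thunder\n"

// Two captures, so the output is a three-value tuple: the whole match plus one value per capture.
let twoFieldMatcher = Regex {
  Capture {
    OneOrMore {
      NegativeLookahead { "\t" }
      CharacterClass.any
    }
  }
  "\t"
  Capture {
    OneOrMore {
      NegativeLookahead { CharacterClass.newlineSequence }
      CharacterClass.any
    }
  }
}

let sketchOutput = captureLine.firstMatch(of: twoFieldMatcher)?.output

// Destructuring documents the tuple's shape; all three values are Substrings.
if let (wholeRow, id, title) = sketchOutput {
  print("Full Row: " + wholeRow)
  print("ID: " + id)
  print("Title: " + title)
}
```

Positional access such as match.output.1 and destructuring are two views of the same tuple, so both depend on the order of the captures.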

Naming Captures

Depending on order isn't always a good idea for API design. If an update to the raw data introduces a new column anywhere but at the end, the change ripples beyond the Regex itself. You'll need to revisit every place that reads a captured value and make sure it still picks the right item.

A better way is to give a reference name to each value that matches its column name. That'll make your code more resilient and more readable.

You can do this by using Reference. Add the following at the top of loadData():

let idFieldRef = Reference(Substring.self)
let titleFieldRef = Reference(Substring.self)
let yearFieldRef = Reference(Substring.self)
let premieredOnFieldRef = Reference(Substring.self)
let urlFieldRef = Reference(Substring.self)
let imdbRatingFieldRef = Reference(Substring.self)

You create a Reference object for each field in the document, typed to match the value it will hold. Since captures are of type Substring by default, all the references use that type for now. Later, you'll see how to convert the captured values to a different type.

Next, change the declaration of recordMatcher to:

let recordMatcher = Regex {
  Capture(as: idFieldRef) { idField }
  fieldSeparator
  Capture(as: titleFieldRef) { titleField }
  fieldSeparator
  Capture(as: yearFieldRef) { yearField }
  fieldSeparator
  Capture(as: premieredOnFieldRef) { premieredOnField }
  fieldSeparator
  Capture(as: urlFieldRef) { urlField }
  fieldSeparator
  Capture(as: imdbRatingFieldRef) { imdbRatingField }
  /\n/
}

Notice that each capture now receives its reference object through the as parameter.

Finally, change the contents of the loop that prints the values to:

print("Full Row: " + match.output.0)
print("ID: " + match[idFieldRef])
print("Title: " + match[titleFieldRef])
print("Year: " + match[yearFieldRef])
print("Premiered On: " + match[premieredOnFieldRef])
print("Image URL: " + match[urlFieldRef])
print("Rating: " + match[imdbRatingFieldRef])
print("---------------------------")

Notice how you access the values through the reference objects. If the data changes, you only need to update the regex that reads the values and capture them with the proper references. The rest of your code won't need any updates.

Build and run to ensure everything is correct. You won't see any differences in the console log.
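Here's a minimal sketch of why references help. In this made-up input the title comes before the ID, yet the code that reads the values never mentions a position. The names sketchIDRef, sketchTitleRef, reorderedMatcher and reorderedLine are hypothetical, and the tab separator is an assumption.

```swift
import RegexBuilder

let sketchIDRef = Reference(Substring.self)
let sketchTitleRef = Reference(Substring.self)

// Hypothetical layout: title first, ID second.
let reorderedLine = "I Am Groot\ttt13623148\n"

let reorderedMatcher = Regex {
  Capture(as: sketchTitleRef) {
    OneOrMore {
      NegativeLookahead { "\t" }
      CharacterClass.any
    }
  }
  "\t"
  Capture(as: sketchIDRef) {
    OneOrMore {
      NegativeLookahead { CharacterClass.newlineSequence }
      CharacterClass.any
    }
  }
}

let reorderedMatch = reorderedLine.firstMatch(of: reorderedMatcher)

// The lookup goes through the reference, so column order doesn't matter here.
let sketchID = reorderedMatch.map { String($0[sketchIDRef]) }
let sketchTitle = reorderedMatch.map { String($0[sketchTitleRef]) }

print("ID: " + (sketchID ?? "none"))
print("Title: " + (sketchTitle ?? "none"))
```

If the columns moved again, only reorderedMatcher would change. The two lookups would stay exactly as they are.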

At this point, you're probably thinking that it would be nice to access the value like a property instead of through a subscript.

The good news is that you can! But you'll need to write the expression as a literal and not use RegexBuilder. You'll see how it's done soon. :]
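As a small preview, and only as a sketch, named capture groups in a regex literal surface as labeled tuple elements. The pattern below is a simplified assumption (a tt-prefixed ID, a tab, then a title) rather than the expression the tutorial builds later, and it uses the #/.../# delimiters so it doesn't depend on the bare-slash build setting.

```swift
// Named groups (?<name>...) become labels on the output tuple.
let namedMatcher = #/(?<id>tt\d+)\t(?<title>[^\t\n]+)/#

let namedLine = "tt13623148\tI Am Groot"
let namedMatch = namedLine.firstMatch(of: namedMatcher)

if let output = namedMatch?.output {
  // Property-style access through the tuple labels.
  print("ID: " + output.id)
  print("Title: " + output.title)
}
```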

Transforming Data

One great feature of Swift Regex is the ability to transform captured data into different types.

Currently, you capture all the data as Substring. There are two fields that are easy to convert:

  • The image URL, which doesn't need to stay as a string — it's more convenient to convert it to a URL
  • The rating, which works better as a number so you'll convert it to a Float

You'll change these now.

In ProductionsDataProvider.swift, change the declaration of urlFieldRef to:

let urlFieldRef = Reference(URL.self)

This changes the expected type to URL.

Then, change imdbRatingFieldRef to:

let imdbRatingFieldRef = Reference(Float.self)

Similarly, this changes the expected data type to Float.

Next, change the declaration of recordMatcher to the following:

let recordMatcher = Regex {
  Capture(as: idFieldRef) { idField }
  fieldSeparator
  Capture(as: titleFieldRef) { titleField }
  fieldSeparator
  Capture(as: yearFieldRef) { yearField }
  fieldSeparator
  Capture(as: premieredOnFieldRef) { premieredOnField }
  fieldSeparator
  TryCapture(as: urlFieldRef) { // 1
    urlField
  } transform: {
    URL(string: String($0))
  }
  fieldSeparator
  TryCapture(as: imdbRatingFieldRef) { // 2
    imdbRatingField
  } transform: {
    Float(String($0))
  }
  /\n/
}

Notice how the captures for urlField and imdbRatingField changed from Capture(as:_:) to TryCapture(as:_:transform:). TryCapture attempts to capture the value and, if that succeeds, passes it to the transform closure to convert it to the desired type. If the closure returns nil, that attempt to match fails. In this case, you converted urlField to a URL and imdbRatingField to a Float.
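To see both outcomes of a transform side by side, here's a minimal sketch. The "Rating: " prefix and the names sketchRatingRef, ratingMatcher, goodRating and badRating are made up for illustration. It uses wholeMatch(of:) so the entire input has to fit the expression.

```swift
import RegexBuilder

let sketchRatingRef = Reference(Float.self)

let ratingMatcher = Regex {
  "Rating: "
  TryCapture(as: sketchRatingRef) {
    OneOrMore { CharacterClass.any }
  } transform: {
    // Returns nil when the text isn't a valid number.
    Float(String($0))
  }
}

// The transform succeeds, so the match carries a Float.
let goodRating = "Rating: 6.7".wholeMatch(of: ratingMatcher).map { $0[sketchRatingRef] }

// The transform returns nil for "N/A", so there is no match at all.
let badRating = "Rating: N/A".wholeMatch(of: ratingMatcher).map { $0[sketchRatingRef] }

print(goodRating as Any)
print(badRating as Any)
```

This is the trade-off to keep in mind: with TryCapture, a row whose value can't be converted is dropped from the matches rather than delivered with a placeholder.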

Now that you have the proper types, it's time to populate the data source.

Replace the code you have inside the loop to print to the console with:

let production = MarvelProductionItem(
  imdbID: String(match[idFieldRef]), // 1
  title: String(match[titleFieldRef]),
  productionYear: ProductionYearInfo.fromString(String(match[yearFieldRef])), // 2
  premieredOn: PremieredOnInfo.fromString(String(match[premieredOnFieldRef])), // 3
  posterURL: match[urlFieldRef], // 4
  imdbRating: match[imdbRatingFieldRef]) // 5

marvelProductions.append(production)

This creates an instance of MarvelProductionItem and appends it to the array, but there's a little more happening:

  1. You convert the first two Substring parameters to strings.
  2. ProductionYearInfo is an enum. You're creating an instance from the string value. You'll implement this part in the next section. For now, the value is always ProductionYearInfo.unknown.
  3. PremieredOnInfo is also an enum you'll implement in the next section. The value for now is PremieredOnInfo.unknown.
  4. The value provided for the poster is a URL and not a string.
  5. The rating value is already a Float.
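Putting the pieces together outside the app, the sketch below parses two made-up rows into a simplified model. SketchProduction, the sample IDs and the example.com URLs are all hypothetical stand-ins for MarvelProductionItem and the real data, and the tab separator is an assumption.

```swift
import Foundation
import RegexBuilder

// Hypothetical, trimmed-down stand-in for MarvelProductionItem.
struct SketchProduction {
  let imdbID: String
  let posterURL: URL
  let rating: Float
}

// Made-up rows: ID, poster URL and rating, separated by tabs.
let sketchData = "tt0000001\thttps://example.com/a.jpg\t7.5\ntt0000002\thttps://example.com/b.jpg\t6.0\n"

let rowIDRef = Reference(Substring.self)
let rowURLRef = Reference(URL.self)
let rowRatingRef = Reference(Float.self)

// A field that runs up to, but not including, the next tab.
let tabBoundedField = OneOrMore {
  NegativeLookahead { "\t" }
  CharacterClass.any
}

// A field that runs up to, but not including, the end of the line.
let lineBoundedField = OneOrMore {
  NegativeLookahead { CharacterClass.newlineSequence }
  CharacterClass.any
}

let sketchRecord = Regex {
  Capture(as: rowIDRef) { tabBoundedField }
  "\t"
  TryCapture(as: rowURLRef) { tabBoundedField } transform: { URL(string: String($0)) }
  "\t"
  TryCapture(as: rowRatingRef) { lineBoundedField } transform: { Float(String($0)) }
  "\n"
}

// The transformed captures already have their final types; only the ID needs a String conversion.
let sketchProductions = sketchData.matches(of: sketchRecord).map { match in
  SketchProduction(
    imdbID: String(match[rowIDRef]),
    posterURL: match[rowURLRef],
    rating: match[rowRatingRef])
}

print(sketchProductions.count)
```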

Build and run. You should see the movies and TV shows listed in the app.

The app showing data presented on the screen