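Regular expressions are not a good fit for this kind of matching, and the small fragment of code you posted does not show any discernible pattern either.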
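Consider a library such as HtmlAgilityPack or NSoup instead. It lets you work with the HTML document through node selection, much like selecting elements in jQuery, which is both more fault-tolerant and more convenient.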
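As a rough illustration of the parser-based approach, here is a minimal sketch using HtmlAgilityPack. It assumes the HtmlAgilityPack NuGet package is referenced; the module name `LinkExtractor`, the function `ExtractLinks`, and the sample markup are hypothetical, since the original question's HTML is not shown. Note that HtmlAgilityPack itself selects nodes with XPath rather than CSS selectors.

```vbnet
Imports System
Imports System.Collections.Generic

Module LinkExtractor
    ' Hypothetical helper: returns "text -> href" for every <a> element with an href.
    ' Requires the third-party HtmlAgilityPack package (assumed to be installed).
    Public Function ExtractLinks(ByVal html As String) As List(Of String)
        Dim results As New List(Of String)()

        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes takes an XPath expression and returns Nothing when nothing matches.
        Dim anchors As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then Return results

        For Each a As HtmlAgilityPack.HtmlNode In anchors
            results.Add(a.InnerText.Trim() & " -> " & a.GetAttributeValue("href", String.Empty))
        Next
        Return results
    End Function

    Sub Main()
        ' Sample markup invented for this example.
        Dim sample As String = "<div><a href=""/news/1"">First item</a><a href=""/news/2"">Second item</a></div>"
        For Each line As String In ExtractLinks(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Checking for `Nothing` before the loop matters here, because an empty result is reported as `Nothing` rather than as an empty collection.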
Imports System.Text.RegularExpressions
Public Class Form1
    Public Function MadeRegexArray(ByVal strText As String, ByVal strRegx As String, ByVal rexOpt As RegexOptions, ByVal Groups As Integer) As String()
        ' Returns the regex results directly as a string array.
        ' strText: the string to search
        ' strRegx: the regular expression
        ' rexOpt:  the regex options
        ' Groups:  index of the capture group to return
        Return MadeMatchString(MadeRegexMatch(strText, strRegx, rexOpt), Groups)
    End Function
    Public Function MadeMatchString(ByVal mc As MatchCollection, ByVal Groups As Integer) As String()
        ' Copies one capture group from each match in a MatchCollection into a string array.
        ' mc: the collection of regex matches
        ' Returns the string array.
        ' VB arrays are declared by upper bound, so Count - 1 gives exactly Count elements.
        Dim strRegCode(mc.Count - 1) As String
        For i As Integer = 0 To mc.Count - 1
            strRegCode(i) = mc(i).Groups(Groups).Value
        Next i
        Return strRegCode
    End Function
    Public Function MadeRegexMatch(ByVal strText As String, ByVal strRegex As String, ByVal rexOpt As RegexOptions) As MatchCollection
        ' Gets the collection of matches for a regular expression.
        ' strText:  the string to search
        ' strRegex: the regular expression
        ' rexOpt:   the regex options
        ' Returns a MatchCollection.
        Dim rex As Regex = New Regex(strRegex, rexOpt)
        Return rex.Matches(strText)
    End Function
    Public Function MadeRegexReplace(ByVal sText As String, ByVal sRegex As String, ByVal sReplace As String) As String
        ' Regex-based text replacement (case-insensitive).
        ' sText:    the original text
        ' sRegex:   the regular expression
        ' sReplace: the replacement text
        Dim rex As Regex = New Regex(sRegex, RegexOptions.IgnoreCase)
        Return rex.Replace(sText, sReplace)
    End Function
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' The delimiters around the capture group appear to have been lost from this pattern;
        ' as written, the lazy group only produces empty matches.
        Dim sRegex As String = "([\s\S]*?)"
        Dim aTable() As String = MadeRegexArray(TextBox1.Text, sRegex, RegexOptions.IgnoreCase, 1)
        For i As Integer = 0 To aTable.Length - 1
            TextBox2.Text &= aTable(i)
        Next i
    End Sub
End Class
As shown in the figure:
(a|A)href=(.*?[\u4e00-\u9fa5]{8,19}.*?)(a|A)
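Wrap the part you want to extract in parentheses to make it an unnamed capture group,
then retrieve that group with $2 (group 1 is the leading (a|A) alternation, so the wrapped part is group 2).
Alternatively, you can use a named group.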
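To make the named-group option concrete, here is a minimal sketch. The pattern and sample string are hypothetical, written only for this illustration; they are not the pattern above, whose surrounding markup appears to have been lost. The module name `GroupDemo` and both function names are likewise invented.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    ' Hypothetical sample input and pattern, chosen only for illustration.
    Public Const Sample As String = "<a href=""/news/1"">Example headline</a>"
    Public Const Pattern As String = "<a\s+href=""(?<url>[^""]*)""[^>]*>(?<text>.*?)</a>"

    ' Reads the named groups from the first match.
    Public Function ExtractLink(ByVal input As String) As String
        Dim m As Match = Regex.Match(input, Pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return String.Empty
        Return m.Groups("text").Value & " -> " & m.Groups("url").Value
    End Function

    ' Uses the named groups in a replacement string via ${name}.
    Public Function RewriteLink(ByVal input As String) As String
        Return Regex.Replace(input, Pattern, "${text} (${url})", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(ExtractLink(Sample))
        Console.WriteLine(RewriteLink(Sample))
    End Sub
End Module
```

Named groups keep working when another group is added in front of them, whereas a numbered reference such as `$2` would then point at a different group. With `RegexOptions.IgnoreCase` set, an alternation like `(a|A)` is also unnecessary.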