Awesome
Parse HTML to Go Struct or Dump to JSON
The parser project provides a method to convert HTML bytes to Golang Object or JSON bytes.
Installation
To install and set up the project, follow these steps:
- Clone the repository:
git clone https://github.com/html2any/parser.git
- Navigate to the project directory:
cd parser/example
- Install the required dependencies:
go build && ./example
Usage
Follow these instructions to use the project:
HTML TO JSON
if data, err := os.ReadFile("complicated.html"); err != nil {
panic(err)
} else {
if data, err := html2json.Convert(data); err != nil {
panic(err)
} else {
os.WriteFile("complicated.json", data, 0644)
}
}
HTML TO Go Object
- You should define a struct with methods
- Implement Methods
- Examples could find in html2json
type Tag struct {
TagName string `json:"tagName"`
Attrs map[string]string `json:"attrs"`
Children []*Tag `json:"children"`
Content string `json:"content"`
}
// When Find A New Tag, The Call Step is:
// 0. Check Content Before Tag, If It Is Not Empty, Add To Parent.
// 1. Create A NewTag.
// 2. Set The TagName.
// 3. Set Attrs.
// 4. Add Child To Parent.
// 0. Check Content Before Tag, If It Is Not Empty, Add To Parent.
func (t *Tag) SetContent(content string) parser.IHtmlTag {
t.Content = content
return t
}
// 1. Create A NewTag.
func (t *Tag) NewTag() parser.IHtmlTag {
tag := new(Tag)
return tag
}
// 2. Set The TagName.
func (t *Tag) SetTagName(tagname string) parser.IHtmlTag {
t.TagName = tagname
t.Children = make([]*Tag, 0)
t.Attrs = make(map[string]string)
t.TagName = tagname
return t
}
// 3. Set Attrs
func (t *Tag) SetAttr(attr string, value string) parser.IHtmlTag {
t.Attrs[attr] = value
return t
}
// 4. Add Child To Parent
func (t *Tag) AddChild(child parser.IHtmlTag) parser.IHtmlTag {
t.Children = append(t.Children, child.(*Tag))
return t
}
// GetTagName Help To Check Tag Close is Correct?
func (t *Tag) GetTagName() string {
return t.TagName
}
- Use method just like
json.Unmarshal
var root Tag
if err := parser.ParseHTML(data, &root); err != nil {
panic(err)
}
Performance
goos: linux
goarch: amd64
pkg: parser/example
cpu: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
BenchmarkHTMLParser-24 6390 988853 ns/op 197689 B/op 4407 allocs/op
BenchmarkHTMLParserByNetHTML-24 4892 1192781 ns/op 200719 B/op 4020 allocs/op
BenchmarkJSONParser-24 1194 5007065 ns/op 234991 B/op 5561 allocs/op
BenchmarkSonicParser-24 4796 1198912 ns/op 235447 B/op 2232 allocs/op
PASS
ok parser/example 19.045s
Contributing
Contributions to the project are welcome! If you would like to contribute, please follow these guidelines:
- Fork the repository.
- Create a new branch:
git checkout -b feature/your-feature
. - Make your changes and commit them:
git commit -m 'Add some feature'
. - Push to the branch:
git push origin feature/your-feature
. - Open a pull request.
Please ensure that your code adheres to the project's coding style and includes appropriate tests.
License
This project is licensed under the MIT License. See the LICENSE file for more information.