
New OnPage SEO Function - Need Feedback


Comments

  • edited June 2020
    CompaniesKaine said:
    ... I don't know how this could be approached, it would take semantic dictionaries and some sort of AI to do the job as a search engine...
    So you plug in two text corpora and calculate a similarity score between them, ranging from 0 to 1. This can be approached with NLP (Natural Language Processing), for example with a BERT model (https://en.wikipedia.org/wiki/BERT_(language_model)) trained on a special loss function with some specifications. If you need to get AI going for SEO, I am happy to hear ideas and to team up on launching a service such as an API, for example on https://rapidapi.com/
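
As a rough illustration of the "two texts in, score from 0 to 1 out" idea, here is a minimal Python sketch. It does not use BERT: purely as an assumption to keep the example self-contained, plain word counts stand in for learned embeddings, and the function names (`bag_of_words`, `cosine_similarity`, `similarity_score`) are hypothetical. With a real model you would swap the count vectors for its embeddings and keep the cosine step.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Stand-in for a learned embedding (assumption): lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(text_a, text_b):
    """Score two texts; counts are non-negative, so the result lies in [0, 1]."""
    return cosine_similarity(bag_of_words(text_a), bag_of_words(text_b))
```

Identical texts score close to 1 and texts with no shared words score 0. Word counts ignore meaning, which is exactly the gap a trained language model is meant to close.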

    If Sven is interested in lifting the captcha breaker game to AI level, I could help too. It is a combination of CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks), set up behind the scenes as the AI component. The user only needs to collect enough labeled samples (annotations), roughly 50-100 per captcha type, and press a button. A new net would then be trained automatically on that captcha type, with high accuracy even on hard captchas. Sven wouldn't implement a Python interpreter in the GSA products, so I think the only way to go is a microservice approach via an API.
  • KaineKaine thebestindexer.com
    edited July 2020
    Hello @TOPtActics :) ,
    The idea seems nice, but I still need to know how to achieve it.
    I was talking about semantic dictionaries because it would have been possible to find (by correspondence) the dictionary that would have the most similarity with a text and therefore to know what it is talking about.
    I do not know if this corresponds with the initial request but the interests would be multiple and appropriate for a software like Gsa Content Generator for example.
    For GSA Keyword Research, this could make it possible to create a semantic graph which could be useful for the realization of a silo, or the study of the tree structure of a site.

    Semantic Cocoon:


    Some interesting links:
    https://gephi.org/
    https://medialab.sciencespo.fr/en/tools/navicrawler/
  • SvenSven www.GSA-Online.de
    Latest update will create reports in HTML with all kinds of data that your SEO customers might need/want.
    It's not perfect but slowly getting there. Let me know what you think.

    Thanked by 1Kaine
  • Kaine said:
    Hello @TOPtActics :) ,
    The idea seems nice, but I still need to know how to achieve it.
    I was talking about semantic dictionaries because it would have been possible to find (by correspondence) the dictionary that would have the most similarity with a text and therefore to know what it is talking about.
    I do not know if this corresponds with the initial request but the interests would be multiple and appropriate for a software like Gsa Content Generator for example.
    For GSA Keyword Research, this could make it possible to create a semantic graph which could be useful for the realization of a silo, or the study of the tree structure of a site.

    Semantic Cocoon:
    ...
    Hi Kaine, 

    my suggestion would be to go with a similarity score from a BERT model between all corpora (corpus = the content of a page, for example) and, to know what each corpus is talking about, to also run topic modelling (https://en.wikipedia.org/wiki/Topic_model) on every corpus.

    If there is interest in building such an API or an implementation, let me know.


    Thanked by 1Kaine
  • edited July 2020
    Kaine said:
    ...
    Some interesting links:
    https://gephi.org/
    ...
    Nice tool, this could be done too. BERT, like any neural network, has an n-dimensional (n can be chosen) representation in the middle of the network. These n dimensions can be reduced with dimensionality reduction algorithms like t-SNE or PCA to get a 2-dimensional representation of an input, e.g. a word, for graphical analysis. The distance in this 2-D space or in the n-D space has a meaning. If you calculate the Euclidean distance, the Manhattan distance or any other Lp norm in this space, you can interpret it as a distance in word "meaning". Similar words stand next to each other.
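
To make the distance idea concrete, here is a small Python sketch that computes Euclidean and Manhattan distances and looks up the nearest word. The 3-dimensional vectors are made-up toy values, not output from BERT or any real model, and the names `embeddings` and `nearest` are hypothetical.

```python
import math


def euclidean(u, v):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical toy embeddings (assumption): real ones have hundreds of dimensions.
embeddings = {
    "backlink": (0.9, 0.1, 0.0),
    "link": (0.8, 0.2, 0.1),
    "recipe": (0.0, 0.9, 0.7),
}


def nearest(word, vectors, distance=euclidean):
    """Return the other word whose vector lies closest to `word`."""
    return min(
        (w for w in vectors if w != word),
        key=lambda w: distance(vectors[word], vectors[w]),
    )
```

With these toy values, `nearest("backlink", embeddings)` picks "link" under either distance, which is the "similar words stand next to each other" behaviour described above.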
  • KaineKaine thebestindexer.com
    edited July 2020
    Sven said:
    Latest update will create reports in HTML with all kinds of data that your SEO customers might need/want.
    It's not perfect but slowly getting there. Let me know what you think.


    It still adds value, but I don't think I would use it in the short term. It is rather the "Keywords/Content" aspect that attracts me to this software.
    I take this opportunity to respond to @TOPtActics since it all fits together: everything that relates to content, to its understanding, its optimization and even its generation is the most vital sector of present and future SEO and deserves all the research going in this direction. The more search engines develop their AI, the more flawless our content will have to be. Since I launched our indexing service, I have seen tons of different strategies, and I think I can tell you that the age of spun text should soon come to an end. The detection is much better since the last update.
    You will therefore have to adapt: write your own texts or outsource this part to writers, if something new does not happen quickly enough.
  • SvenSven www.GSA-Online.de
    edited July 2020
    Latest update also includes keyword suggestions.

    Thanked by 1Kaine
  • edited August 2020
    I coded a multi-categorizer with multi-language support some time ago, maybe this is interesting for you?

    You put in any text you like in any language you like (de, en, fr, pl, sp, etc.) and you give it any categories you like (maybe your site categories), also in any language you like, and the model will output, for each category, the probability that the text belongs to that category.

    Also sentiment is possible:



    Here is an example:

    https://www.nytimes.com/2020/08/18/dining/black-jam-makers.html -- Food

    pasted first paragraph in
    pasted original nyt categories in

    the model got it right.



    Another Example:

    https://www.nytimes.com/2020/08/28/technology/microsoft-tiktok-lobbying.html -- Tech

    Tech, Business, Politics & USA > 50% --> For me it seems the model got it right! :)



    The special thing here is that normally you have to train a new neural net for every new label set. Here I can put in whatever label I like, and the model will do its best to measure the similarity of the text to the label (the category in this example). This is totally new in the game! One model to serve your needs for categorization. :)

    For example, I can just paste any new category I think of, like "app", into my category space and the model will handle it:


    Thanked by 1Kaine
  • SvenSven www.GSA-Online.de
    Sounds useful but how did you train it? It's a NN?
  • edited August 2020
    It's a NN with a special kind of training, architecture and a special cost function. The network puts each label into a sentence called the hypothese_template, compares the distance between the input text and the filled-in hypothese_template, and calculates the probabilities from that. The hypothese_template is something like --> 'This text is about {}.' It can be modified for better accuracy, but the base hypothese_template works pretty well out of the box.
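
The template mechanism described above can be sketched in a few lines of Python. This is not the author's model: `toy_score` is a deliberately crude placeholder (shared-word count) standing in for the trained network, and `categorize` and `HYPOTHESIS_TEMPLATE` are hypothetical names. The sketch also normalizes with a softmax, which forces the probabilities to sum to 1; the examples in this thread, where several categories exceed 50% at once, suggest the real model scores each label independently.

```python
import math
import re

HYPOTHESIS_TEMPLATE = "This text is about {}."


def tokens(s):
    """Lowercase alphabetic tokens as a set."""
    return set(re.findall(r"[a-z]+", s.lower()))


def toy_score(text, hypothesis):
    """Placeholder scorer (assumption): a real system would use a trained model."""
    return len(tokens(text) & tokens(hypothesis))


def categorize(text, labels, template=HYPOTHESIS_TEMPLATE, score=toy_score):
    """Fill each label into the template, score it against the text, softmax the scores."""
    if not labels:
        return {}
    hypotheses = [template.format(label) for label in labels]
    raw = [score(text, h) for h in hypotheses]
    peak = max(raw)  # subtract the maximum for numerical stability
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}
```

Because the labels are only filled in at call time, new categories need no retraining in this scheme; all the quality comes from whatever replaces `toy_score`.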
  • edited September 2020


    I'm now done coding a little test API (nothing for much load, no Redis etc.; maybe it can handle 200 requests/min). I'm on holiday next week, so maybe I'll send you guys a link the week after. This screen shows a GET API; I will only deploy a POST API, because the sequences can get too big.
  • SvenSven www.GSA-Online.de
    yes perfect :)
  • z3rz3r
    edited September 2020
    Thanks @Sven for all the updates
    Sven said:
    1) Thanks for the explanation. Though I think it's just the same as it's added now but with different statistics behind. What you see in the listing now is really the data you would expect, maybe in some different order but still I think it's ranked good.

    But yes, I will probably make a new listing for found terms on a new form to show more stats on each term/phrase.

    2) I really would like not to add google APIs here.
    - Still no chance of getting the Google NLP API? Maybe as an option for those who want it. I'm using it with another tool and it provides a lot of different keywords that we might miss with ngram/tf-idf
    - I still have an issue when I search English keywords for the US: the top 10 sometimes shows local SERPs. Any solution?
    - Is it possible to export all sites to HTML? At the moment only the personal report is HTML/Excel
    - An option to export only the keywords to HTML/Excel without opening the full research. The list is so long that it starts lagging and sometimes freezes
    - Can we select what we want in the full competitor research to speed up the process?
  • SvenSven www.GSA-Online.de
    - Still no chance of getting the Google NLP API? Maybe as an option for those who want it. I'm using it with another tool and it provides a lot of different keywords that we might miss with ngram/tf-idf
    What's that other app? I will try and have a look again.
    - I still have an issue when I search English keywords for the US: the top 10 sometimes shows local SERPs. Any solution?
    Please give a sample.
    - Is it possible to export all sites to HTML? At the moment only the personal report is HTML/Excel
    Yes, but that would be a bit useless as there is nothing to compare against!?
    - Can we select what we want in the full competitor research to speed up the process?
    Yes, you can click on configure and edit the ranking factor filters, or right click on the factor to filter it out.
    - An option to export only the keywords to HTML/Excel without opening the full research. The list is so long that it starts lagging and sometimes freezes
    Where exactly is it loading slowly? The ngram on full competitor research? You don't need this at all to get the ngram data. You can get it on the main form with "Add->Extract from Website->Search".
  • z3rz3r
    edited September 2020
    Sven said:
    - Still no chance of getting the Google NLP API? Maybe as an option for those who want it. I'm using it with another tool and it provides a lot of different keywords that we might miss with ngram/tf-idf
    What's that other app? I will try and have a look again.
    - I still have an issue when I search English keywords for the US: the top 10 sometimes shows local SERPs. Any solution?
    Please give a sample.
    - Is it possible to export all sites to HTML? At the moment only the personal report is HTML/Excel
    Yes, but that would be a bit useless as there is nothing to compare against!?
    - Can we select what we want in the full competitor research to speed up the process?
    Yes, you can click on configure and edit the ranking factor filters, or right click on the factor to filter it out.
    - An option to export only the keywords to HTML/Excel without opening the full research. The list is so long that it starts lagging and sometimes freezes
    Where exactly is it loading slowly? The ngram on full competitor research? You don't need this at all to get the ngram data. You can get it on the main form with "Add->Extract from Website->Search".
    - SurferSEO: they usually show keywords from Google NLP alongside their own suggestions http://prntscr.com/ufj5di
    http://prntscr.com/ufj60a you can see some websites have the /it filter and some are just Italian websites
    - Like you said, it's useful when you already have something to compare against, but when we research some keywords/topic we have nothing to compare; in that case we can see what is going on in the top 10
    - Option -> edit filter works; right click -> filter highlighted and remove doesn't remove anything on my end
    - It lags and freezes (sometimes) when I scroll down to the bottom of the full competitor research. I just checked the ngram export from the main interface; it just shows the full keyword list without all the info we have inside the full competitor research, am I right?
    - Are tf-idf keywords alongside the ngram still on the roadmap?

  • SvenSven www.GSA-Online.de
    - Like you said it's useful when you have already something to compare but when we research some
    next update will allow you to export it in html and use the HTML report template as base
    -> filter highlighted and remove, it doesn't remove anything on my end
    what did you try to filter out? Some factors are dynamic.
    - It lags and freezes (sometimes) when I scroll down to the bottom of the full competitor research. I just checked the ngram export from the main interface; it just shows the full keyword list without all the info we have inside the full competitor research, am I right?
    Yes, it just shows you the extracted keywords, not details on them. Though I might add some more details if you need them.
    - Are tf-idf keywords alongside the ngram still on the roadmap?
    Well, tf-idf is just a different metric for how important an ngram-extracted keyword might be. I don't see how I can get more keywords extracted here.
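
For readers unfamiliar with the metric, here is a minimal Python sketch of one common tf-idf variant (term frequency as count divided by document length, inverse document frequency as log(N / df)). It illustrates the point that tf-idf only re-weights terms that are already extracted. The function name `tf_idf` is hypothetical and this is not how GSA Keyword Research computes anything.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / number of words in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for words in tokenized:
        doc_freq.update(set(words))

    weights = []
    for words in tokenized:
        counts = Counter(words)
        weights.append({
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets weight 0, while a term concentrated in one document is weighted up; the set of candidate terms itself does not grow.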
  • z3rz3r
    edited September 2020
    Thanks @Sven :)
    - Regarding the keyword export: all the data inside the full report is pretty useful, that's the main reason
    - An option to filter keywords by words 1- 2 - 3 etc
    - I tried to filter out the whole security section, the server line and the HTTP version line; it doesn't remove anything (refreshed). Can we have a search function or something to find what we want inside the filter list? http://prntscr.com/ufldai

  • Hello Sven,
    It's great. Thanks for sharing these new OnPage SEO functions.
  • SvenSven www.GSA-Online.de
    - An option to filter keywords by words 1- 2 - 3 etc
    you have that already. You can edit how many sites must have a word to take notice of it. edit ngram filter is what you are looking for.

  • edited September 2020
    Sven said:
    ...
    - Are tf-idf keywords alongside the ngram still on the roadmap?
    Well, tf-idf is just a different metric for how important an ngram-extracted keyword might be. I don't see how I can get more keywords extracted here.

    OK, one freebie for you guys: a simple yet powerful unsupervised algorithm for extracting keywords from texts is called TextRank (http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf). It is not as powerful as a good neural network, but it's way more powerful than tf-idf wrangling. It applies graph-based ranking algorithms (explicitly, the good old PageRank ;) ) to natural language texts. So TextRank builds a graph that represents the text and interconnects words or other text entities with meaningful relations. This "text graph" is then evaluated with the PageRank algorithm.
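
The graph-then-PageRank idea can be sketched in plain Python as follows. This is a simplified illustration, not the reference implementation: it uses a small hard-coded stopword list where the paper filters candidates by part of speech, it ranks single words only, and the names `textrank_keywords` and `STOPWORDS` as well as the default parameters are assumptions.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption); real use needs a proper list or POS filter.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def textrank_keywords(text, window=2, damping=0.85, iterations=30, top_n=5):
    """Rank words by running PageRank on a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected graph: connect words that occur within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank update: each node passes its score on, split evenly among its neighbors.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Words that co-occur with many other well-connected words float to the top, so the output is a relevance ranking over words already in the text rather than a longer keyword list.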



    Sven is right on that one: TextRank will not extract "more" keywords, but it gives a better ranking among all the words and n-grams in the text with respect to their importance/relevance to the text. More keywords are not always better! You want the relevant keywords, not all keywords. If you want to extract all keywords, just extract all n-grams without a weight :). One method to get more "good/important/relevant" keywords that are not in the text itself is to look up the neighbors of the best TextRank keywords in a pretrained embedding matrix.

    If Sven can implement it, great. If you need an API, let me know; I already have this in my repo and I can set it up for you after I have done the setup for the "Content-Categorizer". I can also set up a much more powerful keyword extractor based on deep learning if you guys are interested.
    Thanked by 1z3r
  • KaineKaine thebestindexer.com
    edited September 2020
    @TOPtActics I don't understand all the abbreviations, but it sounds really very interesting. We really have to push in this direction, and you seem to have studied the subject well. There seems to be something to gain from OpenAI with Elon Musk's GPT-3(4)

    https://www.cnbc.com/2020/07/23/openai-gpt3-explainer.html#:~:text=OpenAI first described GPT-3,and spam in vast quantities.
  • Elon Musk is just an investor in OpenAI, like Microsoft is too. GPT-3 is closed, so we can't rebuild it, and it will be opened for ~400$/month. What I see in the GPT-3 API playground is really cool. It is by now the best general-purpose model out there, but I think it's too expensive, and for most tasks it has to be fine-tuned on a specific problem set to work well.
  • SvenSven www.GSA-Online.de
    @TOPtActics thanks for the ideas here. Though the extraction of phrases and keywords is already good in my eyes. The labeling of what might be more important than others is something to look at right now. This algorithm can help here as well.
  • z3rz3r
    edited October 2020
    @Sven is it currently possible to run the quick competitor research in bulk? I have tried selecting two or more keywords, but it always searches only the first keyword and opens the top 10 list. Or is it the same as using tool -> collect SEO score? Searching manually one by one takes a few seconds, while using the tool takes a few minutes even with two keywords.
  • SvenSven www.GSA-Online.de
    @z3r you select the keywords you want the data for and click tools->collect meta data->on the dialog you choose to do this for selected items only.
  • Sven said:
    @z3r you select the keywords you want the data for and click tools->collect meta data->on the dialog you choose to do this for selected items only.
    It takes a few minutes for two keywords.
  • SvenSven www.GSA-Online.de
    that depends on the proxy setup here. Because a query to the search engine has to be done + all the search results have to be parsed to get a score.
  • KaineKaine thebestindexer.com
    edited October 2020
    @Sven
    I find myself in a situation which could lead to a new option.
    I have a list of keywords and I would like to be able to test their presence in a page of one of my sites.
    It might be interesting to do a mass check of all the words so that you can only keep the words that are missing on the page.
    I think that the check could be done directly on the html version (which would therefore include all the tags without distinction) and have in return only the missing words.
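
The check described here could look roughly like the following Python sketch. It is only an illustration of the request, not the feature that ended up in the software: `missing_keywords` is a hypothetical name, and the match is a plain case-insensitive substring search over the raw HTML, tags included, as suggested above.

```python
def missing_keywords(html, keywords):
    """Return the keywords that do not occur anywhere in the raw HTML.

    Matching is a case-insensitive substring search over the whole source,
    so text inside tags and attributes counts as present.
    """
    haystack = html.lower()
    return [kw for kw in keywords if kw.lower() not in haystack]


page = "<html><title>SEO Tools</title><body>Keyword research guide</body></html>"
todo = missing_keywords(page, ["seo tools", "backlinks", "keyword research"])
```

One caveat of substring matching is that a short keyword such as "art" also counts as present inside "start"; matching on word boundaries would be stricter.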
  • SvenSven www.GSA-Online.de
    sounds like a useful option. I will add support for it.
    Thanked by 1Kaine
  • KaineKaine thebestindexer.com
    Yes, I think so too, thank you Sven!